S5-05 — Attempt log

Attempt 1 — 2026-05-17 — phase-story-executor

Outcome: GREEN (all 18 ACs satisfied; full suite + lint + mypy --strict + forbidden-patterns green).

Inputs read

  • Story S5-05-runtime-trace-freshness-and-drift.md (HARDENED — 18 ACs incl. the validator's 6 consistency-fixes against on-disk shapes).
  • scip_freshness precedent at src/codegenie/probes/layer_b/index_health.py:143-204 — the canonical "freshness function" shape.
  • FreshnessRegistry contract at src/codegenie/indices/registry.py:67: Callable[[dict[str, object], str], IndexFreshness].
  • IndexFreshness discriminated union at codegenie.indices.freshness: Fresh | Stale(reason: DigestMismatch | IndexerError | ...).
  • S5-04 integration test for cache-key invalidation as the template for Scenario A.
  • tests/adv/phase02/test_stale_scip_fixture.py outer-key invariant — the only existing test that pinned the registry size at 1.

Upstream-AC patch to S5-02 (one new field)

The story's "Notes for implementer" anticipated this: S5-02's slice did NOT carry last_traced_at, so the freshness function's Fresh(indexed_at=...) branch had no source. Per the inline-patch instructions:

  1. Added last_traced_at to _EXPECTED_SLICE_KEYS (the snapshot pin).
  2. Added a _now_utc_iso() helper module-local seam.
  3. Threaded last_traced_at: str | None = None through _empty_slice and _slice_from_aggregate — each defaults to _now_utc_iso() when None (the probe DID run; the timestamp is honest).
  4. Updated the inline _build_envelope_build_failed to stamp last_traced_at too.
  5. Existing slice-key snapshot tests at tests/unit/probes/layer_c/test_runtime_trace.py:317,324 absorbed the addition because they read from _EXPECTED_SLICE_KEYS (no test edit needed — the constant is the contract).

The widening is additive; no existing tests broke. Branch (b) of the freshness function (trace_coverage_confidence == "unavailable") fires before the type validation in branch (c), so failure-path slices with last_traced_at=None would never reach the type check anyway. The timestamp is still stamped on all paths for honest-confidence rendering downstream.

Discoveries that mattered

  1. Pydantic model_dump(mode="json") renders UTC datetimes with the Z suffix. The B2 integration test's freshness["indexed_at"] is the wire-shape string after model_dump, not the source ISO string. Initial assertion against "2026-05-17T00:00:00+00:00" failed; corrected to "2026-05-17T00:00:00Z". Pin the wire shape, not the source.
  2. IndexHealthProbe imports codegenie.exec as _exec. The HEAD-resolver monkeypatch must target codegenie.exec.run_allowlisted (the imported module attribute), NOT ih.run_allowlisted (which doesn't exist on the index_health module). Mirrors the pattern that other index_health tests use indirectly via fixtures.
  3. tests/adv/phase02/test_stale_scip_fixture.py had a load-bearing set(...) == {"scip"} assertion that S5-05 widens. Updated to {"scip", "runtime_trace"} with a comment naming the future-widening trigger (S6-08).
  4. Ruff's UP031 flagged %-formatting in adversarial assertion messages; converted to f-strings. F811 flagged the pytest-fixture import that also names a test parameter — applied a targeted noqa: F811 (the pytest pattern requires both the import and the parameter).
  5. No edits to index_health.py. AC-16's structural promise held: git diff origin/master..HEAD --name-only does NOT include src/codegenie/probes/layer_b/index_health.py. The registry decorator pattern + read_raw_slices kernel is the Open/Closed seam working as designed.

Refactor decisions

  • Single _now_utc_iso() seam instead of three duplicated _dt.datetime.now(_dt.UTC).isoformat() call-sites — keeps the I/O surface narrow (one impure-line in S5-02's pure-helper layer) and gives the AST-purity audit a stable boundary to assert against.
  • Six-branch isinstance discipline for the freshness function — mirrors scip_freshness lines 168-184 verbatim. The branch order is load-bearing per the story validator's consistency block; (b) catches failure paths before (c) type-validates the optional last_traced_at.
  • Hypothesis property under tests/property/, the same directory where the existing test_index_freshness_roundtrip.py lives. The property strategies are intentionally broad (None / int / bool / list / str) to exercise every isinstance arm, including the "weird object" defensive paths.
  • Adversarial helpers in tests/adv/phase02/_helpers.py — second helper file in the adv corpus (after tests/adv/_helpers.py); the rule-of-three threshold is not yet reached so no kernel extraction. Documented for the next consumer.
  • No shared _FreshnessHelpers base. Story Notes explicitly defer this to S6-08 (the rule-of-three trigger fires at the 3rd consumer + the 4th & 5th together). Followed Rule 2 / Rule 11 — the duplication is fine; the trigger is recorded.

Acceptance criteria — evidence

| AC | Evidence |
| --- | --- |
| AC-1: function placement + decorator + signature + __all__ | test_function_signature_matches_registry_contract, test_function_exported_in_all |
| AC-2: branch table over six cases, total | test_branch_{a,b,c,d,e,f,g}_* (8 tests) + test_function_never_raises_on_arbitrary_object_values |
| AC-3: Final[str] message constants | test_all_message_constants_annotated_final_str, test_message_constants_values_are_unique, test_message_constants_match_id_pattern |
| AC-4: purity AST-walk audit | test_function_body_has_no_clock_or_io_calls, test_function_body_has_no_await_or_subprocess, test_function_body_is_pure_no_assignments_to_outer_state |
| AC-5: registry membership + identity | test_runtime_trace_registered_in_default_registry |
| AC-6: B2 drift end-to-end (four-part) | test_b2_emits_drift_for_runtime_trace |
| AC-7: B2 clean = Fresh | test_b2_emits_fresh_for_runtime_trace |
| AC-8: B2 absent slice → upstream_unavailable | test_b2_emits_stale_for_absent_runtime_trace_slice |
| AC-9: mutation-resistance table (5 stubs) | test_mutant_fails_at_least_one_named_check parametrized over 5 mutants; every one fails AC-6, AC-7, or AC-8 |
| AC-10: Hypothesis totality + purity | tests/property/test_runtime_trace_freshness_purity.py::test_totality_and_purity, test_wall_clock_under_soft_budget |
| AC-11: argument-order canary | test_arg_order_is_slice_then_head |
| AC-12: adversarial three scenarios | test_image_digest_change_changes_cache_key + test_drift_detected_through_b2 + test_clean_run_emits_fresh |
| AC-13: no real subprocess in adv | forbid_real_subprocess fixture + test_no_real_subprocess_in_adv_layer smoke |
| AC-14: ADR refs in adv assertion messages | test_assertion_messages_carry_adr_refs (AST-introspect) |
| AC-15: duplicate-registration smoke | test_runtime_trace_duplicate_registration_rejected |
| AC-16: no edits to index_health.py | test_no_edit_to_index_health_module (git diff audit) |
| AC-17: mypy --strict clean | mypy --strict src/codegenie → 109 files, 0 errors |
| AC-18: forbidden-patterns green | python scripts/check_forbidden_patterns.py exit 0 |
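
The AST-walk purity audit named under AC-4 can be approximated with the standard ast module. The sketch below is an assumption about how such an audit might look, not the project's test: the forbidden-name set, the helper forbidden_calls_in, and the two sample functions are all hypothetical.

```python
import ast
import textwrap

# Hypothetical deny-list of call names that would indicate clock, file,
# or subprocess access inside a function meant to be pure.
FORBIDDEN_CALLS = frozenset({"now", "utcnow", "today", "time", "open", "run", "Popen"})


def forbidden_calls_in(source: str, func_name: str) -> list:
    """Return the forbidden call names found inside `func_name`'s body."""
    tree = ast.parse(textwrap.dedent(source))
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            hits = []
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call):
                    callee = inner.func
                    # `a.b.now()` exposes "now" via .attr; `open()` via .id.
                    if isinstance(callee, ast.Attribute):
                        name = callee.attr
                    else:
                        name = getattr(callee, "id", None)
                    if name in FORBIDDEN_CALLS:
                        hits.append(name)
            return hits
    raise LookupError(f"function not found: {func_name}")


PURE_SOURCE = '''
def pure_check(slice_, head):
    ts = slice_.get("last_traced_at")
    return isinstance(ts, str)
'''

IMPURE_SOURCE = '''
import datetime

def impure_check(slice_, head):
    return datetime.datetime.now().isoformat()
'''

pure_hits = forbidden_calls_in(PURE_SOURCE, "pure_check")
impure_hits = forbidden_calls_in(IMPURE_SOURCE, "impure_check")
```

Matching on bare call names is deliberately coarse: it can flag an unrelated method that happens to be called run, so a real audit would likely also inspect the receiver or keep the audited function small enough that false positives are obvious.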

Gates

  • ruff check (src + tests) — clean
  • ruff format --check — 330 files formatted
  • mypy --strict src/codegenie — 109 files, 0 errors
  • python scripts/check_forbidden_patterns.py — exit 0
  • Full suite — 2669 passed, 15 skipped, 3 deselected, 2 xfailed (one initial flake on test_stale_scip_regenerate_guard.py resolved on re-run; the test passes in isolation — order-pollution from a pre-existing test, not new in this story)

Files touched

  • Extended (S5-02 inline-patch): src/codegenie/probes/layer_c/runtime_trace.py. Added the _dt, IndexFreshness, IndexerError, Fresh, Stale, DigestMismatch, register_index_freshness_check, and IndexName imports; added _MSG_* Final[str] constants; added last_traced_at to _EXPECTED_SLICE_KEYS; added _now_utc_iso(); threaded last_traced_at through _empty_slice, _slice_from_aggregate, and the inline _build_envelope_build_failed; added _check_runtime_trace_freshness + the @register_index_freshness_check decorator; updated __all__.
  • Updated test: tests/adv/phase02/test_stale_scip_fixture.py — outer-key set widened to include runtime_trace.
  • New tests:
      • tests/unit/probes/layer_c/test_runtime_trace_freshness.py (21 tests: signature, registry, branch table, B2 integration, arg-order, duplicate, no-edit-to-B2)
      • tests/unit/probes/layer_c/test_runtime_trace_freshness_purity.py (6 tests: Final[str] audit + AST-walk purity audit)
      • tests/unit/probes/layer_c/test_runtime_trace_freshness_mutation.py (5 mutants killed)
      • tests/property/test_runtime_trace_freshness_purity.py (2 Hypothesis properties)
      • tests/adv/phase02/test_image_digest_drift.py (5 tests: cache-key, drift, clean, no-real-subproc, ADR-message audit)
      • tests/adv/phase02/_helpers.py (shared build_drift_slice + forbid_real_subprocess + clean_freshness_registry)

Lessons for future Phase 2 stories

  • Pydantic JSON-mode renders UTC as Z. When a story asserts the wire shape of a datetime, pin against "...Z", not "...+00:00". The source datetime is built from fromisoformat(...) (which produces +00:00) but model_dump(mode="json") is the wire serializer.
  • Outer-key invariants widen with each freshness-check registration. The tests/adv/phase02/test_stale_scip_fixture.py test pinned the registry to a single name; S5-05 widens it to 2; S6-08 will widen it to 5. Each new registration must update this assertion at the same time — leave a comment naming the next-widening story (S6-08).
  • scip_freshness is the load-bearing template. Every future @register_index_freshness_check candidate should clone its shape: pure function, isinstance-discipline, try/except ValueError for timestamp parsing, Stale(IndexerError(_MSG_*)) for every failure return. Deviation from the template is a smell.
  • The HEAD-resolver monkeypatch target is codegenie.exec.run_allowlisted (via the module attribute), NOT index_health.run_allowlisted (which doesn't exist). The pattern is consistent — patch the imported module, not the importing module.