S5-05 — Attempt log
Attempt 1 — 2026-05-17 — phase-story-executor
Outcome: GREEN (all 18 ACs satisfied; full suite + lint + mypy --strict + forbidden-patterns green).
Inputs read
- Story `S5-05-runtime-trace-freshness-and-drift.md` (HARDENED: 18 ACs incl. the validator's 6 consistency-fixes against on-disk shapes).
- `scip_freshness` precedent at `src/codegenie/probes/layer_b/index_health.py:143-204`: the canonical "freshness function" shape.
- `FreshnessRegistry` contract at `src/codegenie/indices/registry.py:67`: `Callable[[dict[str, object], str], IndexFreshness]`.
- `IndexFreshness` discriminated union at `codegenie.indices.freshness`: `Fresh | Stale(reason: DigestMismatch | IndexerError | ...)`.
- `S5-04` integration test for cache-key invalidation as the template for Scenario A.
- `tests/adv/phase02/test_stale_scip_fixture.py` outer-key invariant: the only existing test that pinned the registry size at 1.
Upstream-AC patch to S5-02 (one new field)
The story's "Notes for implementer" anticipated this: S5-02's slice did NOT carry `last_traced_at`, so the freshness function's `Fresh(indexed_at=...)` branch had no source. Per the inline-patch instructions:
- Added `last_traced_at` to `_EXPECTED_SLICE_KEYS` (the snapshot pin).
- Added a `_now_utc_iso()` helper as a module-local seam.
- Threaded `last_traced_at: str | None = None` through `_empty_slice` and `_slice_from_aggregate`; each defaults to `_now_utc_iso()` when `None` (the probe DID run; the timestamp is honest).
- Updated the inline `_build_envelope_build_failed` to stamp `last_traced_at` too.
- Existing slice-key snapshot tests at `tests/unit/probes/layer_c/test_runtime_trace.py:317,324` absorbed the addition because they read from `_EXPECTED_SLICE_KEYS` (no test edit needed; the constant is the contract).
The widening is additive; no existing tests broke. Branch (b) of the freshness function (`trace_coverage_confidence == "unavailable"`) fires before the type validation in branch (c), so failure-path slices with `last_traced_at=None` would never reach the type check anyway. The timestamp is still stamped in all paths for honest-confidence rendering downstream.
Discoveries that mattered
- Pydantic `model_dump(mode="json")` renders UTC datetimes with the `Z` suffix. The B2 integration test's `freshness["indexed_at"]` is the wire-shape string after `model_dump`, not the source ISO string. Initial assertion against `"2026-05-17T00:00:00+00:00"` failed; corrected to `"2026-05-17T00:00:00Z"`. Pin the wire shape, not the source.
- `IndexHealthProbe` imports `codegenie.exec` as `_exec`. The HEAD-resolver monkeypatch must target `codegenie.exec.run_allowlisted` (the imported module attribute), NOT `ih.run_allowlisted` (which doesn't exist on the `index_health` module). Mirrors the pattern that other `index_health` tests use indirectly via fixtures.
- `tests/adv/phase02/test_stale_scip_fixture.py` had a load-bearing `set(...) == {"scip"}` assertion that S5-05 widens. Updated to `{"scip", "runtime_trace"}` with a comment naming the future-widening trigger (S6-08).
- Ruff's `UP031` flagged `%`-formatting in adversarial assertion messages; converted to f-strings. `F811` flagged the pytest-fixture import that also names a test parameter; applied a targeted `noqa: F811` (the pytest pattern requires both the import and the parameter).
- No edits to `index_health.py`. AC-16's structural promise held: `git diff origin/master..HEAD --name-only` does NOT include `src/codegenie/probes/layer_b/index_health.py`. The registry decorator pattern + `read_raw_slices` kernel is the Open/Closed seam working as designed.
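The monkeypatch-target discovery can be reproduced without the real packages. The sketch below uses two stand-in module objects (hypothetical names `exec_standin` and `index_health_standin`) to show why patching the attribute on the imported module takes effect, while the importing module has no attribute of that name to patch.

```python
import types

# Stand-in for codegenie.exec (hypothetical body).
exec_mod = types.ModuleType("exec_standin")
exec_mod.run_allowlisted = lambda argv: "real-head"

# Stand-in for index_health, which does `import codegenie.exec as _exec`:
# it holds a reference to the module object, not a copy of the function.
ih = types.ModuleType("index_health_standin")
ih._exec = exec_mod


def resolve_head() -> str:
    # The attribute is looked up on the module object at call time.
    return ih._exec.run_allowlisted(["git", "rev-parse", "HEAD"])


# Patching the attribute on the imported module is visible to the caller.
exec_mod.run_allowlisted = lambda argv: "stub-head"
patched = resolve_head()

# The importing module never had its own `run_allowlisted` attribute.
has_own_attr = hasattr(ih, "run_allowlisted")
```

In a pytest test the equivalent patch would go through `monkeypatch.setattr` on the imported module; with its default `raising=True`, pointing it at a missing attribute on the importing module fails with an `AttributeError` instead of silently doing nothing.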
Refactor decisions
- Single `_now_utc_iso()` seam instead of three duplicated `_dt.datetime.now(_dt.UTC).isoformat()` call-sites: keeps the I/O surface narrow (one impure line in S5-02's pure-helper layer) and gives the AST-purity audit a stable boundary to assert against.
- Six-branch isinstance discipline for the freshness function: mirrors `scip_freshness` lines 168-184 verbatim. The branch order is load-bearing per the story validator's consistency block; (b) catches failure paths before (c) type-validates the optional `last_traced_at`.
- Hypothesis property under `tests/property/`: the same directory where the existing `test_index_freshness_roundtrip.py` lives. The property strategies are intentionally broad (None / int / bool / list / str) to exercise every isinstance arm including the "weird object" defensive paths.
- Adversarial helpers in `tests/adv/phase02/_helpers.py`: second helper file in the adv corpus (after `tests/adv/_helpers.py`); the rule-of-three threshold is not yet reached so no kernel extraction. Documented for the next consumer.
- No shared `_FreshnessHelpers` base. Story Notes explicitly defer this to S6-08 (the rule-of-three trigger fires at the 3rd consumer + the 4th & 5th together). Followed Rule 2 / Rule 11: the duplication is fine; the trigger is recorded.
Acceptance criteria: evidence
| AC | Evidence |
|---|---|
| AC-1: function placement + decorator + signature + `__all__` | `test_function_signature_matches_registry_contract`, `test_function_exported_in_all` |
| AC-2: branch table over six cases, total | `test_branch_{a,b,c,d,e,f,g}_*` (8 tests) + `test_function_never_raises_on_arbitrary_object_values` |
| AC-3: `Final[str]` message constants | `test_all_message_constants_annotated_final_str`, `test_message_constants_values_are_unique`, `test_message_constants_match_id_pattern` |
| AC-4: purity AST-walk audit | `test_function_body_has_no_clock_or_io_calls`, `test_function_body_has_no_await_or_subprocess`, `test_function_body_is_pure_no_assignments_to_outer_state` |
| AC-5: registry membership + identity | `test_runtime_trace_registered_in_default_registry` |
| AC-6: B2 drift end-to-end (four-part) | `test_b2_emits_drift_for_runtime_trace` |
| AC-7: B2 clean = Fresh | `test_b2_emits_fresh_for_runtime_trace` |
| AC-8: B2 absent slice → upstream_unavailable | `test_b2_emits_stale_for_absent_runtime_trace_slice` |
| AC-9: mutation-resistance table (5 stubs) | `test_mutant_fails_at_least_one_named_check` parametrized over 5 mutants; every one fails AC-6, AC-7, or AC-8 |
| AC-10: Hypothesis totality + purity | `tests/property/test_runtime_trace_freshness_purity.py::test_totality_and_purity`, `test_wall_clock_under_soft_budget` |
| AC-11: argument-order canary | `test_arg_order_is_slice_then_head` |
| AC-12: adversarial three scenarios | `test_image_digest_change_changes_cache_key` + `test_drift_detected_through_b2` + `test_clean_run_emits_fresh` |
| AC-13: no real subprocess in adv | `forbid_real_subprocess` fixture + `test_no_real_subprocess_in_adv_layer` smoke |
| AC-14: ADR refs in adv assertion messages | `test_assertion_messages_carry_adr_refs` (AST-introspect) |
| AC-15: duplicate-registration smoke | `test_runtime_trace_duplicate_registration_rejected` |
| AC-16: no edits to `index_health.py` | `test_no_edit_to_index_health_module` (git diff audit) |
| AC-17: `mypy --strict` clean | `mypy --strict src/codegenie` → 109 files, 0 errors |
| AC-18: forbidden-patterns green | `python scripts/check_forbidden_patterns.py` exit 0 |
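For readers unfamiliar with the AST-walk style of audit used for AC-4, a minimal sketch follows. It is not the story's test code: the helper name `impure_calls`, the deny-list, and the two sample sources are all hypothetical, and a real audit would likely cover more cases.

```python
import ast
import textwrap

# Hypothetical deny-list of call names treated as clock, I/O, or subprocess use.
FORBIDDEN_CALLS = {"now", "utcnow", "today", "open", "run", "Popen"}


def impure_calls(source: str, func_name: str) -> list[str]:
    """Return the forbidden constructs found inside the named function."""
    tree = ast.parse(textwrap.dedent(source))
    hits: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            for inner in ast.walk(node):
                if isinstance(inner, (ast.Global, ast.Nonlocal)):
                    # Writes to outer state break purity.
                    hits.append(type(inner).__name__)
                elif isinstance(inner, ast.Call):
                    func = inner.func
                    # `x.now()` is an Attribute call; `open()` is a Name call.
                    name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
                    if name in FORBIDDEN_CALLS:
                        hits.append(name)
    return hits


PURE_SOURCE = '''
def check(slice_, head):
    return slice_.get("last_traced_at")
'''

IMPURE_SOURCE = '''
def check(slice_, head):
    import datetime
    return datetime.datetime.now()
'''
```

A name-based deny-list like this is deliberately coarse: it can flag an unrelated method that happens to be called `run`, which is usually acceptable for a function that is supposed to contain no such calls at all.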
Gates
- `ruff check` (src + tests): clean
- `ruff format --check`: 330 files formatted
- `mypy --strict src/codegenie`: 109 files, 0 errors
- `python scripts/check_forbidden_patterns.py`: exit 0
- Full suite: `2669 passed, 15 skipped, 3 deselected, 2 xfailed` (one initial flake on `test_stale_scip_regenerate_guard.py` resolved on re-run; the test passes in isolation, so this is order-pollution from a pre-existing test, not new in this story)
Files touched
- Extended (S5-02 inline-patch): `src/codegenie/probes/layer_c/runtime_trace.py`. Added `_dt`, `IndexFreshness`, `IndexerError`, `Fresh`, `Stale`, `DigestMismatch`, `register_index_freshness_check`, `IndexName` imports; added `_MSG_*` `Final[str]` constants; added `last_traced_at` to `_EXPECTED_SLICE_KEYS`; added `_now_utc_iso()`; threaded `last_traced_at` through `_empty_slice`, `_slice_from_aggregate`, and the inline `_build_envelope_build_failed`; added `_check_runtime_trace_freshness` + the `@register_index_freshness_check` decorator; updated `__all__`.
- Updated test: `tests/adv/phase02/test_stale_scip_fixture.py`. Outer-key set widened to include `runtime_trace`.
- New tests:
  - `tests/unit/probes/layer_c/test_runtime_trace_freshness.py` (21 tests: signature, registry, branch table, B2 integration, arg-order, duplicate, no-edit-to-B2)
  - `tests/unit/probes/layer_c/test_runtime_trace_freshness_purity.py` (6 tests: `Final[str]` audit + AST-walk purity audit)
  - `tests/unit/probes/layer_c/test_runtime_trace_freshness_mutation.py` (5 mutants killed)
  - `tests/property/test_runtime_trace_freshness_purity.py` (2 Hypothesis properties)
  - `tests/adv/phase02/test_image_digest_drift.py` (5 tests: cache-key, drift, clean, no-real-subproc, ADR-message audit)
  - `tests/adv/phase02/_helpers.py` (shared `build_drift_slice` + `forbid_real_subprocess` + `clean_freshness_registry`)
Lessons for future Phase 2 stories
- Pydantic JSON-mode renders UTC as `Z`. When a story asserts the wire shape of a datetime, pin against `"...Z"`, not `"...+00:00"`. The source datetime is built from `fromisoformat(...)` (and its `isoformat()` output ends in `+00:00`), but `model_dump(mode="json")` is the wire serializer.
- Outer-key invariants widen with each freshness-check registration. The `tests/adv/phase02/test_stale_scip_fixture.py` test pinned the registry to a single name; S5-05 widens it to 2; S6-08 will widen it to 5. Each new registration must update this assertion at the same time; leave a comment naming the next-widening story (S6-08).
- `scip_freshness` is the load-bearing template. Every future `@register_index_freshness_check` candidate should clone its shape: pure function, isinstance-discipline, `try/except ValueError` for timestamp parsing, `Stale(IndexerError(_MSG_*))` for every failure return. Deviation from the template is a smell.
- The HEAD-resolver monkeypatch target is `codegenie.exec.run_allowlisted` (via the module attribute), NOT `index_health.run_allowlisted` (which doesn't exist). The pattern is consistent: patch the imported module, not the importing module.
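The outer-key lesson follows from how a decorator-based registry behaves: each registration adds one name, and a duplicate is rejected. The sketch below is a generic illustration of that pattern, not the real `FreshnessRegistry` or `register_index_freshness_check`; the class body, the `register` and `names` methods, and the choice of `ValueError` for duplicates are assumptions.

```python
from __future__ import annotations

from typing import Callable

# Same argument order as the contract in this log: (slice, head).
FreshnessCheck = Callable[[dict, str], object]


class FreshnessRegistry:
    """Hypothetical registry keyed by index name."""

    def __init__(self) -> None:
        self._checks: dict[str, FreshnessCheck] = {}

    def register(self, name: str) -> Callable[[FreshnessCheck], FreshnessCheck]:
        def decorator(fn: FreshnessCheck) -> FreshnessCheck:
            # Reject duplicates so one name always maps to one function.
            if name in self._checks:
                raise ValueError(f"duplicate freshness check: {name}")
            self._checks[name] = fn
            return fn

        return decorator

    def names(self) -> set[str]:
        return set(self._checks)


registry = FreshnessRegistry()


@registry.register("scip")
def scip_freshness(slice_: dict, head: str) -> object:
    return "fresh"  # placeholder result


@registry.register("runtime_trace")
def runtime_trace_freshness(slice_: dict, head: str) -> object:
    return "fresh"  # placeholder result
```

A test that pins `registry.names()` to an exact set has to change whenever a new check registers, which is why the comment naming the next widening story is worth leaving next to that assertion.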