Story S5-05: runtime_trace freshness-check registration + image_digest_drift adversarial
Step: Step 5 — Ship Layer C (runtime + container) probes
Status: Done — GREEN 2026-05-17 (phase-story-executor; see _attempts/S5-05.md for the per-AC evidence table + gate log)
Effort: S
Depends on: S5-02 (RuntimeTraceProbe writes <raw_dir>/runtime_trace.json containing built_image_digest, last_traced_image_digest, last_traced_at), S1-02 (@register_index_freshness_check decorator-registry — landed at codegenie.indices.registry), S1-01 (IndexFreshness, Fresh, Stale, DigestMismatch, IndexerError), S4-01 (IndexHealthProbe loops the freshness registry via default_freshness_registry.dispatch_all; read_raw_slices(raw_dir(repo.root)) is the kernel that hydrates the per-name slice dict), S1-05 (IndexName newtype)
ADRs honored: 02-ADR-0004 (image-digest as declared_inputs special token — the adversarial proves cache invalidation), 02-ADR-0006 (sum-type freshness location), 02-ADR-0003 (IndexHealthProbe runs_last=True — depends on this freshness function executing at the right moment)
Validation notes (2026-05-17, phase-story-validator v1, verdict HARDENED)
The original draft was structurally sound (registry-side Open/Closed seam, freshness function in runtime_trace.py not index_health.py, four-part Stale(DigestMismatch(...)) assertion discipline, load-bearing adversarial test path) but encoded six load-bearing contract drifts against code that already exists on disk:
- Function signature contradiction (Consistency block). The draft prescribed _check_runtime_trace_freshness(slice_: RuntimeTraceSlice, head: GitSha) -> IndexFreshness. The actual registry contract at src/codegenie/indices/registry.py:67 is FreshnessCheck = Callable[[dict[str, object], str], "IndexFreshness"]. The registered scip_freshness precedent at src/codegenie/probes/layer_b/index_health.py:144 takes a dict[str, object] and a plain str head. The codebase has no RuntimeTraceSlice model and no GitSha newtype; the existing newtypes are IndexName, IndexId, SkillId, TaskClassId, ProbeId, Language and ConventionId. A freshness function typed on a Pydantic model would not register, so the AC was unachievable. The story was rewritten to _check_runtime_trace_freshness(slice_: dict[str, object], head: str) -> IndexFreshness, which mirrors scip_freshness exactly.
- Error class has the wrong name and the wrong constructor shape (Consistency block). The draft prescribed IndexFreshnessRegistryError(reason="duplicate_name"). The actual class is FreshnessRegistryError (at src/codegenie/errors.py:155). Its message is the positional string f"duplicate index_name {name!r}: {prior} and {origin}", and it takes no reason= kwarg. The existing duplicate-registration test at tests/unit/indices/test_freshness_registry.py:78 asserts the message-string shape (the offending name plus both call-site qualnames). The story was rewritten to match.
- Slice-field name index_freshness is wrong (Consistency block). Draft AC-6 Scenario B asserted against the B2 slice's index_freshness["runtime_trace"]. The actual IndexHealthProbe.run emits schema_slice={"index_health": results}, where results[str(name)] = {"freshness": <model_dump>, "confidence": ..., "current_commit": ..., "last_indexed_at": ...} (index_health.py:397-404). The story was rewritten to use out.schema_slice["index_health"]["runtime_trace"]["freshness"]["kind"] and the sibling keys.
- Upstream-unavailable signal is wrong (Consistency block). The draft branched on slice_.confidence == "unavailable", but the runtime_trace slice has no confidence field.
  - The slice carries trace_coverage_confidence: Literal["high","medium","low","unavailable"] (S5-02 AC lines 86 and 87).
  - The envelope confidence is Literal["high","medium","low"], per S5-02's contract-preservation pin (AC line 84).
  - More importantly, the empty-dict sentinel established by scip_freshness (line 164) is the registry's canonical upstream-unavailable path. default_freshness_registry.dispatch_all passes slices.get(name, {}) for any name that has no slice file.
  - The story was rewritten to use the empty-dict sentinel as the primary upstream-unavailable signal, with trace_coverage_confidence == "unavailable" as a secondary, slice-driven signal.
- Fresh.indexed_at derivation was hand-waved (Consistency block). The draft said "Fresh(indexed_at=)", which does not work for three reasons:
  - Fresh.indexed_at: datetime requires a real datetime.
  - The function is pure, so it cannot read the clock.
  - No slice field carries a timestamp today.
  - Resolution: pin a new S5-02 slice field, last_traced_at: str (ISO-8601 UTC). It is added as an explicit upstream dependency in this story's "Notes for the implementer", with the small-AC-patch surface.
  - The freshness function parses the field with _dt.datetime.fromisoformat. A malformed string yields Stale(IndexerError("runtime_trace_slice_malformed")), mirroring scip_freshness lines 200-203.
- _check was not added to __all__ (Consistency harden). The draft said "leave it module-private", but the scip_freshness precedent IS exported in __all__ (index_health.py:97). The story was rewritten to require the same export so unit tests can import the function symbolically.
Plus five harden-tier improvements: replaced source-grep purity check with AST-walk audit (mirrors S5-04 T4/T5 + S5-03 ASTAudit); added a Hypothesis purity / totality property test; added a mutation-resistance suite; added Final[str] constants for every IndexerError.message value; added an argument-order canary test (registry calls positional (slice, head) — swapping silently corrupts every freshness check; this is a structural defense). One design-pattern surfaced in Notes: this is the rule-of-three threshold for @register_index_freshness_check (scip + runtime_trace + S6-08's three rule-pack-versioned checks). Per Rule 2 / S5-04 D2 precedent, NO kernel extraction in this story — the trigger is recorded for S6-08 authors.
Eighteen in-place edits applied. Full audit trail in _validation/S5-05-runtime-trace-freshness-and-drift.md.
Context
S5-02's RuntimeTraceProbe writes <raw_dir>/runtime_trace.json with three fields B2 (IndexHealthProbe, S4-01) reads to decide freshness:
- built_image_digest: str | None — the digest of the image just built (or the cached one). None when the resolver was unbound, returned None, or raised (S5-02 AC line 81–83).
- last_traced_image_digest: str | None — the digest of the image actually traced (which equals built_image_digest on a clean run, but differs when the on-disk slice from a prior gather is stale relative to a freshly-resolved current digest — the drift case).
- last_traced_at: str — ISO-8601 UTC timestamp of when the trace was recorded (this field is added by S5-02; pinned as a small upstream-dependency AC in Notes-for-implementer below — it does not exist in S5-02's hardened slice today and must be added before this story executes).
When last_traced_image_digest != built_image_digest, the runtime-trace index is stale — the trace evidence reflects an older image than the one that's currently built. IndexHealthProbe must emit IndexFreshness.Stale(reason=DigestMismatch(expected=built, actual=last_traced)), not a silent Fresh. The mechanism is S1-02's @register_index_freshness_check("runtime_trace") registry — IndexHealthProbe (S4-01) loops the registry via default_freshness_registry.dispatch_all; freshness functions land alongside their probe (Open/Closed at the file boundary — phase-arch-design.md §"Component design" #1 + §"Gap 3"; existing scip_freshness precedent at index_health.py:144).
The registry's contract is unforgiving and structural: the function takes slice_: dict[str, object], head: str (positionally) and returns IndexFreshness (a discriminated union of Fresh | Stale). The dispatch passes slices.get(name, {}) so a missing slice file lands as an empty dict — the canonical "upstream unavailable" sentinel (mirrors scip_freshness line 164).
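The contract above can be illustrated with a small, self-contained registry. This is a sketch under stated assumptions, not the codegenie.indices.registry implementation:

- FreshnessRegistry, register, dispatch_all and runtime_trace_check are hypothetical stand-ins.
- The checks return plain strings instead of IndexFreshness, so the example runs on its own.

The sketch shows the three properties this story relies on: positional (slice, head) dispatch, the slices.get(name, {}) sentinel, and rejection of a second registration under the same name.

```python
from collections.abc import Callable

# Stand-in for the real contract, which returns IndexFreshness instead of str.
FreshnessCheck = Callable[[dict[str, object], str], str]


class FreshnessRegistryError(Exception):
    """Stand-in duplicate-registration error carrying one positional message."""


class FreshnessRegistry:
    """Toy decorator registry keyed by index name (hypothetical, not codegenie's)."""

    def __init__(self) -> None:
        self._checks: dict[str, FreshnessCheck] = {}

    def register(self, name: str) -> Callable[[FreshnessCheck], FreshnessCheck]:
        def decorator(fn: FreshnessCheck) -> FreshnessCheck:
            origin = f"{fn.__module__}.{fn.__qualname__}"
            prior = self._checks.get(name)
            if prior is not None:
                prior_origin = f"{prior.__module__}.{prior.__qualname__}"
                raise FreshnessRegistryError(
                    f"duplicate index_name {name!r}: {prior_origin} and {origin}"
                )
            self._checks[name] = fn  # stored unchanged, so `is` identity checks hold
            return fn

        return decorator

    def registered_names(self) -> frozenset[str]:
        return frozenset(self._checks)

    def dispatch_all(
        self, slices: dict[str, dict[str, object]], head: str
    ) -> dict[str, str]:
        # Positional (slice, head) call; a missing slice file arrives as {}.
        return {name: fn(slices.get(name, {}), head) for name, fn in self._checks.items()}


registry = FreshnessRegistry()


@registry.register("runtime_trace")
def runtime_trace_check(slice_: dict[str, object], head: str) -> str:
    # head is accepted and ignored, as in the runtime_trace check.
    return "upstream_unavailable" if not slice_ else "has_slice"
```

The registry stores the decorated function unchanged, so an identity assertion (is) in the spirit of AC-5 holds. A decorator that returned a wrapper would break it.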
The load-bearing adversarial: tests/adv/phase02/test_image_digest_drift.py proves three things without invoking real docker:
1. Resolving two distinct digests produces two distinct cache keys via the image-digest:<resolved> declared-input token (02-ADR-0004 / S5-02 step 0 wires cache/keys.py::_resolve_special_token). Subject-under-test: cache-key derivation, not _execute_scenario invocation count.
2. Constructed runtime_trace.json slices with built ≠ last_traced cause IndexHealthProbe to emit Stale(DigestMismatch(expected=built, actual=last_traced)) in its schema_slice["index_health"]["runtime_trace"]["freshness"].
3. Slices with built == last_traced emit Fresh(indexed_at=...) in the same field (Rule 12 — fail loud; negative-case discipline).
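Point 1 can be pictured with a toy key derivation. Everything in it is hypothetical:

- resolve_token and cache_key stand in for cache/keys.py::_resolve_special_token and its caller. The story does not reproduce their real signatures.
- Ordinary inputs such as "Dockerfile" contribute only their names here, not file contents.

The property being illustrated is that the resolved digest string must be folded into the hashed tuple.

```python
import hashlib
from collections.abc import Callable
from typing import Optional

# Token spelling taken from the declared_inputs example in AC-12; the real
# resolution lives in cache/keys.py::_resolve_special_token (not reproduced).
IMAGE_DIGEST_TOKEN = "image-digest:<resolved>"

Resolver = Callable[[], Optional[str]]


def resolve_token(token: str, image_digest_resolver: Resolver) -> str:
    """Replace the special token with the resolver's answer; pass others through."""
    if token == IMAGE_DIGEST_TOKEN:
        digest = image_digest_resolver()
        return f"image-digest:{digest if digest is not None else '<none>'}"
    return token


def cache_key(declared_inputs: list[str], image_digest_resolver: Resolver) -> str:
    """Fold every resolved declared input, in order, into one SHA-256 key.

    In this toy, ordinary inputs contribute only their name; a real key
    would hash file contents.
    """
    h = hashlib.sha256()
    for token in declared_inputs:
        h.update(resolve_token(token, image_digest_resolver).encode("utf-8"))
        h.update(b"\0")  # separator so adjacent inputs cannot run together
    return h.hexdigest()


inputs = ["Dockerfile", IMAGE_DIGEST_TOKEN]
key_abc = cache_key(inputs, lambda: "sha256:abc")
key_def = cache_key(inputs, lambda: "sha256:def")
```

If resolve_token passed the token through unresolved, key_abc and key_def would be equal. That is the regression Scenario A's assertion message describes.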
References
- phase-arch-design.md §"Component design" #1 (IndexHealthProbe): registry-dispatched freshness checks.
- phase-arch-design.md §"Component design" #6 (RuntimeTraceProbe): the last_traced_image_digest and built_image_digest slice fields.
- phase-arch-design.md §"Edge cases" rows 5, 6, 14: docker-build failure, macOS strace, image-digest resolver returns None.
- phase-arch-design.md §"Testing strategy": test_image_digest_drift.py.
- 02-ADR-0004 §Consequences: names tests/adv/phase02/test_image_digest_drift.py as the load-bearing adversarial.
- 02-ADR-0006: IndexFreshness lives at codegenie.indices.freshness.
- High-level-impl.md §"Step 5": registry entry for @register_index_freshness_check("runtime_trace") in runtime_trace.py.
- final-design.md §"Adversarial corpus": test_image_digest_drift.py is listed alongside test_stale_scip_fixture.py as load-bearing.
- src/codegenie/indices/registry.py: FreshnessCheck signature contract.
- src/codegenie/probes/layer_b/index_health.py:143-204: scip_freshness precedent (signature, branch shape, malformed-slice handling, IndexerError.message discipline).
- tests/unit/indices/test_freshness_registry.py:78-106: existing duplicate-registration test shape.
Goal
Register @register_index_freshness_check(IndexName("runtime_trace")) in src/codegenie/probes/layer_c/runtime_trace.py — a pure function _check_runtime_trace_freshness(slice_: dict[str, object], head: str) -> IndexFreshness that returns Stale(DigestMismatch(...)) when last_traced_image_digest != built_image_digest and Fresh(indexed_at=...) (parsed from the slice's last_traced_at) otherwise. Land tests/adv/phase02/test_image_digest_drift.py — the load-bearing adversarial proving (a) image-digest mutation produces two distinct cache keys via the declared-input special token and (b) B2 surfaces the drift through schema_slice["index_health"]["runtime_trace"].
Acceptance criteria
- [ ] AC-1 (function placement + decorator + signature). src/codegenie/probes/layer_c/runtime_trace.py (the module from S5-02) gains a new top-level function, _check_runtime_trace_freshness(slice_: dict[str, object], head: str) -> IndexFreshness, decorated with @register_index_freshness_check(IndexName("runtime_trace")).
  - The signature is verbatim: dict[str, object] for the slice and a plain str for the head. It is NOT a typed RuntimeTraceSlice model and NOT a GitSha newtype, because neither exists.
  - The function is added to runtime_trace.py's __all__ (mirroring scip_freshness in index_health.py:97), so unit tests can import it symbolically: from codegenie.probes.layer_c.runtime_trace import _check_runtime_trace_freshness.
  - The head parameter is unused. The freshness signal is digest-based, not commit-based, but the registry contract is a uniform (slice, head) for all freshness checks. Document the no-op in a one-line docstring.
- [ ] AC-2 (branch table: seven cases, total over the input domain). The function body is total: every input produces exactly one IndexFreshness value, and the function never raises (matching scip_freshness's "never raises" property). Branches, in order:
| # | Condition | Result |
|---|---|---|
| a | slice_ == {} (registry sentinel: dispatch passed slices.get(name, {}) because runtime_trace.json was absent on disk) | Stale(reason=IndexerError(message=_MSG_UPSTREAM_UNAVAILABLE)), where _MSG_UPSTREAM_UNAVAILABLE: Final[str] = "upstream_runtime_trace_unavailable" |
| b | slice_.get("trace_coverage_confidence") == "unavailable" (S5-02's four-valued field reports this when the probe ran but produced no usable trace: build failed, resolver returned None, macOS, etc.) | Same as (a): Stale(reason=IndexerError(message=_MSG_UPSTREAM_UNAVAILABLE)). The sentinel string is shared so that a downstream renderer collapses both upstream-degraded paths identically. |
| c | Any required field is missing or has the wrong type (built_image_digest is not str or None; last_traced_image_digest is not str or None; last_traced_at is not str) | Stale(reason=IndexerError(message=_MSG_SLICE_MALFORMED)), where _MSG_SLICE_MALFORMED: Final[str] = "runtime_trace_slice_malformed". Mirrors scip_freshness lines 168-184: isinstance checks, with a bool discriminator on any int paths added later. |
| d | built_image_digest is None (the resolver was unbound, returned None, or raised; S5-02 records this state on disk) | Stale(reason=IndexerError(message=_MSG_NO_BUILT_IMAGE)), where _MSG_NO_BUILT_IMAGE: Final[str] = "no_built_image" |
| e | last_traced_image_digest is None (the trace did not complete, e.g. docker build failed) | Stale(reason=IndexerError(message=_MSG_NO_TRACE_RECORDED)), where _MSG_NO_TRACE_RECORDED: Final[str] = "no_trace_recorded" |
| f | last_traced_image_digest != built_image_digest (the drift case) | Stale(reason=DigestMismatch(expected=built_image_digest, actual=last_traced_image_digest)). The argument order is load-bearing: expected is the currently built digest and actual is the traced one. A test asserts that the swapped order is wrong. |
| g | Otherwise (digests match) | Fresh(indexed_at=parsed_last_traced_at). parsed_last_traced_at = datetime.fromisoformat(slice_["last_traced_at"]), called inside a try / except ValueError that returns Stale(reason=IndexerError(message=_MSG_SLICE_MALFORMED)) (mirrors scip_freshness lines 200-203). The parsed datetime carries its own tzinfo, per S5-02's UTC discipline. |
- [ ] AC-3 (Final[str] message constants, module scope). All four message strings appear as Final[str] module-level constants: _MSG_UPSTREAM_UNAVAILABLE, _MSG_NO_BUILT_IMAGE, _MSG_NO_TRACE_RECORDED and _MSG_SLICE_MALFORMED.
  - No string literal is duplicated between the function body and the test file; the test imports the constants.
  - This mirrors the _WARNING_IDS / _SCIP_REQUIRED_KEYS Final discipline in index_health.py.
  - A test reads the module's __annotations__ and asserts that each constant is annotated Final[str].
- [ ] AC-4 (purity: AST-walk audit, NOT source-grep). tests/unit/probes/layer_c/test_runtime_trace_freshness_purity.py AST-walks the body of _check_runtime_trace_freshness.
  - It asserts that NO ast.Name or ast.Attribute references datetime.now, datetime.datetime.now, datetime.utcnow, time.time, time.monotonic, os.path.getmtime, os.stat, Path.stat or pathlib.Path.stat, and that there is no subprocess or asyncio call.
  - The walker also asserts that the function contains no await and no for loop over a side-effecting iterator. The body is straight-line if/elif/return.
  - This mirrors the AST-audit pattern in S5-04 T4/T5 and S5-03 ASTAudit. Source-grep is bypassable via string concatenation; an AST walk is not.
- [ ] AC-5 (registry membership + retrieval). After runtime_trace.py is imported, default_freshness_registry.registered_names() contains IndexName("runtime_trace").
  - A test imports the module, then asserts default_freshness_registry._checks[IndexName("runtime_trace")] is _check_runtime_trace_freshness.
  - The assertion is identity, not equality: the registry stores the function object unchanged, mirroring scip_freshness's identity at index_health.py:143-144.
  - The test uses the same clean_freshness_registry snapshot+restore fixture pattern that test_index_health_probe.py uses (lines 74-87).
- [ ] AC-6 (B2 end-to-end integration: drift surfaces in schema_slice["index_health"]). A unit test, tests/unit/probes/layer_c/test_runtime_trace_freshness.py::test_b2_emits_drift_for_runtime_trace:
  - Imports codegenie.probes.layer_c.runtime_trace (planting the registry entry as an import side effect).
  - Writes a synthetic runtime_trace.json to raw_dir(tmp_path) with built_image_digest="sha256:def...", last_traced_image_digest="sha256:abc...", trace_coverage_confidence="high" and last_traced_at="2026-05-17T00:00:00+00:00".
  - Monkeypatches _exec.run_allowlisted to return HEAD_SHA for git rev-parse HEAD.
  - Runs await IndexHealthProbe().run(repo, ctx).
  - Makes a four-part assertion against out.schema_slice["index_health"]["runtime_trace"]["freshness"] (the model_dump(mode="json") shape per index_health.py:370):
    - freshness["kind"] == "stale"
    - freshness["reason"]["kind"] == "digest_mismatch"
    - freshness["reason"]["expected"] == "sha256:def..."
    - freshness["reason"]["actual"] == "sha256:abc..."
  - Asserts out.schema_slice["index_health"]["runtime_trace"]["confidence"] == "medium" (the DigestMismatch → "medium" rule at index_health.py:259-260).
  - Asserts that "runtime_trace" in default_freshness_registry.registered_names() still holds after the run (no transient registration).
Asserting all four parts (not just kind == "stale") is the load-bearing mutation-resistance pin — a buggy implementation that regressed DigestMismatch to IndexerError("idk") would still pass kind == "stale" (this is S4-02's "implementation risk #3" applied here).
- [ ] AC-7 (B2 end-to-end integration: clean run is Fresh). Sister test test_b2_emits_fresh_for_runtime_trace has the same shape as AC-6, but with built_image_digest == last_traced_image_digest == "sha256:abc...".
  - It asserts freshness["kind"] == "fresh", freshness["indexed_at"] == "2026-05-17T00:00:00+00:00" and confidence == "high".
  - This is negative-case discipline (Rule 12: fail loud). The clean case is asserted so that silent breakage is caught.
- [ ] AC-8 (B2 end-to-end integration: upstream unavailable). In sister test test_b2_emits_stale_for_absent_runtime_trace_slice, no runtime_trace.json is written to raw_dir. The test asserts:
  - freshness["kind"] == "stale"
  - freshness["reason"]["kind"] == "indexer_error"
  - freshness["reason"]["message"] == "upstream_runtime_trace_unavailable"

  This exercises the empty-dict sentinel path.
- [ ] AC-9 (mutation-resistance suite). tests/unit/probes/layer_c/test_runtime_trace_freshness_mutation.py defines a parametrized table of 5+ intentionally wrong stub implementations and asserts that each fails at least one named test from AC-2/5/6/7/8:
  - always_fresh: returns Fresh(indexed_at=...) regardless of input → must fail AC-6 (drift case) AND AC-8 (absent slice).
  - always_stale: returns Stale(IndexerError("x")) regardless of input → must fail AC-7 (clean case).
  - swap_expected_actual: emits DigestMismatch(expected=last_traced, actual=built) (swapped) → must fail AC-6 (the expected/actual field assertions).
  - wrong_reason_kind: returns Stale(IndexerError("digest_mismatch")) for the drift case (collapsed) → must fail AC-6 (the reason["kind"] == "digest_mismatch" discriminator check).
  - drops_upstream_unavailable_branch: returns Stale(IndexerError("scip_slice_malformed")) for empty-dict input → must fail AC-8 (message string).
The test parametrizes the stub list; per-stub, it monkeypatches default_freshness_registry._checks[IndexName("runtime_trace")] = stub and asserts at least one of the named tests under AC-6/7/8 fails. Mirrors S5-04 T2 / S5-03 T16 mutation-resistance discipline. The test would itself fail (false-pass) if any stub were behaviorally correct; that is the structural defense.
- [ ] AC-10 (Hypothesis property: totality + purity). tests/property/test_runtime_trace_freshness_purity.py uses Hypothesis to generate arbitrary dict[str, object] slices. It uses a text strategy for digest strings, none() | text(), and an occasional int/bool to exercise the malformed branch. For every drawn input:
  - _check_runtime_trace_freshness(slice_, head) returns exactly one of Fresh | Stale (totality: it never raises and never returns None).
  - Called twice with the same input, it returns equal values (purity: no hidden state).
  - Called with the same input from two separate test processes, it returns equal values (no hidden mutable globals). This runs under pytest-xdist -n2 and is skipped if pytest-xdist is not installed, per Rule 11.
  - The wall-clock time between two calls with the same input is < 5 ms (no I/O fallthrough). This is a soft signal; the AST audit in AC-4 is the hard structural defense.
- [ ] AC-11 (argument-order canary). tests/unit/probes/layer_c/test_runtime_trace_freshness.py::test_arg_order_is_slice_then_head asserts that the registry's positional call signature is honored:
  - _check_runtime_trace_freshness({"built_image_digest": "sha256:abc", "last_traced_image_digest": "sha256:abc", "last_traced_at": "2026-01-01T00:00:00+00:00", "trace_coverage_confidence": "high"}, "deadbeef") returns Fresh.
  - Calling with swapped args, _check_runtime_trace_freshness("deadbeef", {...}) (string as slice, dict as head), raises TypeError (or AttributeError on .get). The test asserts this explicitly.
  - This pins the positional contract that FreshnessRegistry.dispatch_all (registry.py:174) relies on.
  - A silent regression in which the function defensively accepts either order is structurally wrong: the registry MUST be the source of order truth.
- [ ] AC-12 (tests/adv/phase02/test_image_digest_drift.py: load-bearing adversarial, three scenarios). The file lands and is registered for the adv-phase02 CI job (S8-03).
  - The job glob tests/adv/phase02/test_*.py already picks it up, so this story needs no S8-03 amendment. S8-03 lists test_image_digest_drift.py explicitly in phase-arch-design.md L953.
  - The story's PR description re-confirms the path.
  - Scenario A: cache-key invalidation via the image-digest declared-input token. tests/adv/phase02/test_image_digest_drift.py::test_image_digest_change_changes_cache_key:
    - Constructs a RuntimeTraceProbe instance. If S5-02's cache-key API is not yet stable at execute time, it instead exercises cache/keys.py::declared_inputs_for directly with a synthetic declared_inputs=["Dockerfile", "image-digest:<resolved>"].
    - Binds two distinct image_digest_resolver callables: one returning "sha256:abc...", one returning "sha256:def...".
    - Computes the cache key under each binding.
    - Asserts the two cache keys are distinct. The subject under test is the token-resolution + cache-key derivation path landed by S5-02's cache/keys.py::_resolve_special_token (S5-02 step 0). This story's adversarial proves the structural integrity of that path at the integration boundary.
    - Invokes neither _execute_scenario nor docker build; the test runs in < 100 ms.
    - Embeds a mutation hint in the assertion message: "image-digest:<resolved> token must produce distinct cache keys under distinct resolver returns; see 02-ADR-0004 §Consequences. If equal, the dispatch arm in cache/keys.py::_resolve_special_token is not folding the resolved string into the content-hash tuple."
  - Scenario B: drift detection through B2 (schema_slice["index_health"]["runtime_trace"]). test_drift_detected_through_b2 has the same shape as AC-6 but lives in tests/adv/phase02/. It makes the identical four-part assertion against schema_slice["index_health"]["runtime_trace"]["freshness"].
    - The duplication is intentional. The unit-test sibling under tests/unit/probes/layer_c/ is the development-time safety net; the adversarial under tests/adv/phase02/ is the CI-gating mirror.
    - The two tests share a fixture helper, _build_drift_slice(built, last_traced) -> dict[str, object], imported from a shared tests/adv/phase02/_helpers.py.
  - Scenario C: clean run is Fresh. test_clean_run_emits_fresh has the same shape as AC-7 (sibling assertion) and is also CI-gating.
- [ ] AC-13 (adversarial does NOT invoke real docker or any subprocess). A pytest fixture, _forbid_real_subprocess (shared via tests/adv/phase02/_helpers.py), guards the adversarial tests.
  - It monkeypatches subprocess.run, subprocess.Popen.__init__, asyncio.create_subprocess_exec and asyncio.create_subprocess_shell to raise AssertionError("real subprocess forbidden in adversarial layer") if any of them is invoked from this file's tests.
  - The fixture is autouse for all three scenario tests.
  - The Phase 0 fence job stays green, and the test file completes in ≤ 5 s.
  - Tests use the clean_freshness_registry snapshot+restore fixture from the shared conftest.
- [ ] AC-14 (informative failure messages: ADR-0004 in the assertion narrative). Failure messages in test_image_digest_drift.py must carry ADR substrings:
  - Every assertion that targets the cache-key derivation embeds "02-ADR-0004" and "image-digest".
  - Every assertion that targets B2 drift embeds "02-ADR-0006" and "DigestMismatch".
  - A test introspects the file's assertion messages via AST and asserts this coverage. The messages are greppable on a build break, for operator-side debuggability.
- [ ] AC-15 (duplicate-registration smoke). tests/unit/probes/layer_c/test_runtime_trace_freshness.py::test_runtime_trace_duplicate_registration_rejected imports runtime_trace (planting the entry).
  - It then attempts default_freshness_registry.register(IndexName("runtime_trace"))(dummy_check) and asserts that FreshnessRegistryError is raised.
  - exc_info.value.args[0] must contain "duplicate index_name", "runtime_trace" AND both call-site module.qualname strings. This mirrors the existing structural test at tests/unit/indices/test_freshness_registry.py:78-106.
  - This is a smoke test of S1-02's registry hardening at the runtime_trace integration boundary, not a re-test of S1-02 itself.
  - The test uses clean_freshness_registry to leave the singleton intact.
- [ ] AC-16 (no edits to IndexHealthProbe). A structural test reads git diff --name-only origin/master..HEAD and asserts that src/codegenie/probes/layer_b/index_health.py is NOT in the diff.
  - The Open/Closed promise is observable: adding a new index source must require ZERO edits to B2.
  - This mirrors the registry-symmetry discipline (02-ADR-0006 §Consequences "Gap 3 improvement"). S6-08 will assert the same for semgrep/gitleaks/conventions.
  - If the executor finds a reason to edit index_health.py, this AC fails. The executor must escalate to an ADR amendment rather than edit silently.
- [ ] AC-17 (mypy --strict clean). The new module section and the tests pass mypy --strict.
  - _check_runtime_trace_freshness has no Any and no untyped dict, and its return type is the discriminated union IndexFreshness.
  - The repo-wide --warn-unreachable (Phase 0) turns a missing case arm in any future consumer-side match into a build error.
- [ ] AC-18 (forbidden-patterns stays green). No new pattern violations are introduced; the existing layer_c-scope predicate (S5-01) covers the new function and the new test files. The new test files contain no model_construct, no plaintext persistence and no subprocess.run literal.
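The AC-9 rule is that every wrong stub fails at least one case and the correct function fails none. It can be checked in miniature without the registry or IndexHealthProbe. This sketch makes several assumptions:

- Results use the model_dump(mode="json") dict shape that AC-6 to AC-8 assert on.
- reference is a hypothetical simplification covering only branches (a), (f) and (g).
- The stub names follow the AC-9 list.

```python
from collections.abc import Callable
from datetime import datetime

# Results use the model_dump(mode="json")-style dict shape asserted in AC-6..AC-8.
Result = dict[str, object]
Check = Callable[[dict[str, object], str], Result]
TS = "2026-05-17T00:00:00+00:00"


def stale_error(message: str) -> Result:
    return {"kind": "stale", "reason": {"kind": "indexer_error", "message": message}}


def reference(slice_: dict[str, object], head: str) -> Result:
    """Simplified correct check covering only branches (a), (f) and (g)."""
    if not slice_:
        return stale_error("upstream_runtime_trace_unavailable")
    built, traced = slice_["built_image_digest"], slice_["last_traced_image_digest"]
    if built != traced:
        return {"kind": "stale",
                "reason": {"kind": "digest_mismatch", "expected": built, "actual": traced}}
    indexed_at = datetime.fromisoformat(str(slice_["last_traced_at"]))
    return {"kind": "fresh", "indexed_at": indexed_at.isoformat()}


def _slice(built: str, traced: str) -> dict[str, object]:
    return {"built_image_digest": built, "last_traced_image_digest": traced, "last_traced_at": TS}


def drift_ok(check: Check) -> bool:  # AC-6: all four parts, not just kind
    out = check(_slice("sha256:def", "sha256:abc"), "HEAD")
    reason = out.get("reason")
    return (out.get("kind") == "stale" and isinstance(reason, dict)
            and reason.get("kind") == "digest_mismatch"
            and reason.get("expected") == "sha256:def"
            and reason.get("actual") == "sha256:abc")


def clean_ok(check: Check) -> bool:  # AC-7
    return check(_slice("sha256:abc", "sha256:abc"), "HEAD") == {"kind": "fresh", "indexed_at": TS}


def absent_ok(check: Check) -> bool:  # AC-8
    return check({}, "HEAD") == stale_error("upstream_runtime_trace_unavailable")


def swap_expected_actual(slice_: dict[str, object], head: str) -> Result:
    out = reference(slice_, head)
    reason = out.get("reason")
    if isinstance(reason, dict) and reason.get("kind") == "digest_mismatch":
        swapped = {**reason, "expected": reason["actual"], "actual": reason["expected"]}
        return {"kind": "stale", "reason": swapped}
    return out


def wrong_reason_kind(slice_: dict[str, object], head: str) -> Result:
    if slice_ and slice_["built_image_digest"] != slice_["last_traced_image_digest"]:
        return stale_error("digest_mismatch")  # collapsed discriminator
    return reference(slice_, head)


STUBS: dict[str, Check] = {
    "always_fresh": lambda s, h: {"kind": "fresh", "indexed_at": TS},
    "always_stale": lambda s, h: stale_error("x"),
    "swap_expected_actual": swap_expected_actual,
    "wrong_reason_kind": wrong_reason_kind,
    "drops_upstream_unavailable_branch":
        lambda s, h: stale_error("scip_slice_malformed") if not s else reference(s, h),
}
CASES: dict[str, Callable[[Check], bool]] = {
    "drift (AC-6)": drift_ok, "clean (AC-7)": clean_ok, "absent (AC-8)": absent_ok,
}


def failing_cases(check: Check) -> list[str]:
    """Names of the cases a check fails; empty means it survived every case."""
    return [name for name, ok in CASES.items() if not ok(check)]
```

The four-part drift check is what catches swap_expected_actual and wrong_reason_kind. A check on kind alone would let both survive.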
Implementation outline
- Add the four Final[str] message constants (names and values per AC-2 / AC-3) at module scope in src/codegenie/probes/layer_c/runtime_trace.py. Place them after the existing S5-02 imports and before the RuntimeTraceProbe class.
- Add the freshness function after the RuntimeTraceProbe class. The signature is verbatim per AC-1:

```python
import datetime as _dt

from codegenie.indices.freshness import (
    IndexFreshness, Fresh, Stale, DigestMismatch, IndexerError,
)
from codegenie.indices.registry import register_index_freshness_check
from codegenie.types.identifiers import IndexName

# The _MSG_* names are the module-level Final[str] constants from the previous step.


@register_index_freshness_check(IndexName("runtime_trace"))
def _check_runtime_trace_freshness(
    slice_: dict[str, object], head: str
) -> IndexFreshness:
    """Pure ``(slice, head) -> IndexFreshness`` for runtime_trace.

    The ``head`` parameter is unused (the freshness signal is digest-based,
    not commit-based) but the registry signature is uniform across all
    freshness checks per S1-02; accept-and-ignore.
    """
    # Branch (a): empty-dict sentinel; runtime_trace.json absent on disk.
    if not slice_:
        return Stale(reason=IndexerError(message=_MSG_UPSTREAM_UNAVAILABLE))
    # Branch (b): probe ran but produced no usable trace.
    if slice_.get("trace_coverage_confidence") == "unavailable":
        return Stale(reason=IndexerError(message=_MSG_UPSTREAM_UNAVAILABLE))
    # Branch (c): presence + isinstance validation. Mirrors scip_freshness
    # lines 168-184; a missing key is malformed, not an implicit None.
    required = ("built_image_digest", "last_traced_image_digest", "last_traced_at")
    if any(key not in slice_ for key in required):
        return Stale(reason=IndexerError(message=_MSG_SLICE_MALFORMED))
    built = slice_["built_image_digest"]
    last_traced = slice_["last_traced_image_digest"]
    last_traced_at = slice_["last_traced_at"]
    if (
        not (built is None or isinstance(built, str))
        or not (last_traced is None or isinstance(last_traced, str))
        or not isinstance(last_traced_at, str)
    ):
        return Stale(reason=IndexerError(message=_MSG_SLICE_MALFORMED))
    # Branch (d): no built image (resolver was unbound / returned None / raised).
    if built is None:
        return Stale(reason=IndexerError(message=_MSG_NO_BUILT_IMAGE))
    # Branch (e): no trace recorded.
    if last_traced is None:
        return Stale(reason=IndexerError(message=_MSG_NO_TRACE_RECORDED))
    # Branch (f): drift case; argument order is load-bearing.
    if last_traced != built:
        return Stale(reason=DigestMismatch(expected=built, actual=last_traced))
    # Branch (g): clean; parse the timestamp. Mirrors scip lines 200-203.
    try:
        parsed = _dt.datetime.fromisoformat(last_traced_at)
    except ValueError:
        return Stale(reason=IndexerError(message=_MSG_SLICE_MALFORMED))
    return Fresh(indexed_at=parsed)
```
- Add _check_runtime_trace_freshness to __all__ in runtime_trace.py (parallel to scip_freshness at index_health.py:97). The decorator registration is what makes the function findable in the registry; the __all__ export is what makes it importable symbolically in unit tests.
- Write four test files:
  - tests/unit/probes/layer_c/test_runtime_trace_freshness.py: registry membership (AC-5); B2 integration (AC-6 / AC-7 / AC-8); arg-order canary (AC-11); duplicate smoke (AC-15); no-edit audit (AC-16).
  - tests/unit/probes/layer_c/test_runtime_trace_freshness_purity.py: AST-walk audit (AC-4); Final[str] constant audit (AC-3).
  - tests/unit/probes/layer_c/test_runtime_trace_freshness_mutation.py: mutation-resistance suite (AC-9).
  - tests/property/test_runtime_trace_freshness_purity.py: Hypothesis totality + purity (AC-10).
- Write the adversarial:
  - tests/adv/phase02/_helpers.py: the _build_drift_slice(built, last_traced) -> dict[str, object] builder, the _forbid_real_subprocess autouse fixture, and clean_freshness_registry (shared with the unit-test conftest).
  - tests/adv/phase02/test_image_digest_drift.py: three scenarios per AC-12, with informative ADR-cross-referenced messages per AC-14.
- Make no edits to IndexHealthProbe itself (AC-16). S4-01 already loops the registry. The whole point of the registry seam is that this story plants a new entry, which the loop dispatches generically.
- Make no edits to cache/keys.py unless Scenario A's adversarial demands one. S5-02 step 0 is supposed to have wired _resolve_special_token. If that has not landed by execute time, the executor escalates rather than silently fixing it here.
TDD plan: red / green / refactor
Red:
1. `test_runtime_trace_registered_by_module_import` (`tests/unit/probes/layer_c/test_runtime_trace_freshness.py`): import `runtime_trace`; assert `IndexName("runtime_trace")` is in `default_freshness_registry.registered_names()`; assert `_checks[IndexName("runtime_trace")] is _check_runtime_trace_freshness`. (AC-5)
2. `test_freshness_signature_matches_registry_contract`: introspect `_check_runtime_trace_freshness` via `inspect.signature`; assert two positional params named `slice_` and `head`, annotated `dict[str, object]` and `str`. (AC-1)
3. `test_arg_order_is_slice_then_head`: the positive call returns `Fresh`; a swapped-argument call raises `TypeError` or `AttributeError`. (AC-11)
4. `test_message_constants_are_Final_str` (`test_runtime_trace_freshness_purity.py`): introspect module annotations; assert each of the four `_MSG_*` symbols is `Final[str]`. (AC-3)
5. `test_function_is_pure_via_ast_walk` (`test_runtime_trace_freshness_purity.py`): AST-walk audit. (AC-4)
6. `test_freshness_fresh_when_digests_match`: construct a slice with matching digests and a valid ISO timestamp; assert `isinstance(result, Fresh)`; assert `result.indexed_at == datetime(2026, 5, 17, tzinfo=UTC)`. (AC-2-g)
7. `test_freshness_stale_digest_mismatch`: `built="sha256:def"`, `last_traced="sha256:abc"`; make the four-part assertion `isinstance(result, Stale)` AND `isinstance(result.reason, DigestMismatch)` AND `result.reason.expected == "sha256:def"` AND `result.reason.actual == "sha256:abc"`. (AC-2-f)
8. `test_freshness_stale_no_built_image`: `built_image_digest=None`; assert `Stale(reason=IndexerError(message=_MSG_NO_BUILT_IMAGE))`. (AC-2-d)
9. `test_freshness_stale_no_trace_recorded`: `last_traced_image_digest=None`; assert `Stale(reason=IndexerError(message=_MSG_NO_TRACE_RECORDED))`. (AC-2-e)
10. `test_freshness_stale_upstream_unavailable_empty_dict`: `slice_={}`; assert `Stale(reason=IndexerError(message=_MSG_UPSTREAM_UNAVAILABLE))`. (AC-2-a)
11. `test_freshness_stale_upstream_unavailable_trace_coverage`: `slice_={"trace_coverage_confidence": "unavailable", "built_image_digest": "sha256:abc", "last_traced_image_digest": "sha256:abc", "last_traced_at": "2026-01-01T00:00:00+00:00"}`; assert the same `Stale(_MSG_UPSTREAM_UNAVAILABLE)`. (AC-2-b)
12. `test_freshness_stale_slice_malformed_wrong_type`: parametrized over field-type errors (e.g., `built_image_digest=123`, `last_traced_at=42`); assert `Stale(_MSG_SLICE_MALFORMED)`. (AC-2-c)
13. `test_freshness_stale_slice_malformed_bad_timestamp`: `last_traced_at="not-a-timestamp"` with matching digests; assert `Stale(_MSG_SLICE_MALFORMED)`. (AC-2-g fallback)
14. `test_b2_emits_drift_for_runtime_trace`: write the drift fixture to `raw_dir`; run `IndexHealthProbe`; make the four-part assertion on `schema_slice["index_health"]["runtime_trace"]["freshness"]`. (AC-6)
15. `test_b2_emits_fresh_for_runtime_trace`: clean fixture; assert `kind == "fresh"`, `indexed_at`, and `confidence == "high"`. (AC-7)
16. `test_b2_emits_stale_for_absent_runtime_trace_slice`: no slice file; assert `message == "upstream_runtime_trace_unavailable"`. (AC-8)
17. `test_runtime_trace_duplicate_registration_rejected`: try to re-register; assert `FreshnessRegistryError` with the expected message-string components. (AC-15)
18. `test_no_edit_to_index_health_module`: `git diff` audit; assert `src/codegenie/probes/layer_b/index_health.py` is not in the diff. (AC-16)
19. `test_image_digest_change_changes_cache_key` (`tests/adv/phase02/test_image_digest_drift.py`): two resolvers; assert distinct cache keys; the failure message names ADR-0004. (AC-12 Scenario A)
20. `test_drift_detected_through_b2` (`tests/adv/phase02/`): adversarial mirror of test 14; the failure message names ADR-0006. (AC-12 Scenario B)
21. `test_clean_run_emits_fresh` (`tests/adv/phase02/`): adversarial mirror of test 15. (AC-12 Scenario C)
22. `test_no_real_subprocess_in_adv` (`tests/adv/phase02/`): the `_forbid_real_subprocess` autouse fixture raises `AssertionError` if any subprocess call escapes. (AC-13)
23. `test_assertion_messages_carry_adr_refs` (`tests/adv/phase02/`): AST-introspect the assertion messages in `test_image_digest_drift.py`; assert ADR substring coverage. (AC-14)
24. `test_mutation_resistance_table` (`tests/unit/probes/layer_c/test_runtime_trace_freshness_mutation.py`): parametrized over 5+ wrong stubs; assert each fails at least one named test. (AC-9)
25. `test_hypothesis_totality_and_purity` (`tests/property/test_runtime_trace_freshness_purity.py`): Hypothesis property over arbitrary slice dicts. (AC-10)
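Scenario A (test 19) rests on ADR-0004: the image digest is a special token inside the probe's declared inputs, so two resolvers that report different digests must produce different cache keys. The sketch below shows that property. It assumes a hypothetical `IMAGE_DIGEST_TOKEN` spelling and a hypothetical `compute_cache_key` helper. The real resolution lives in S5-02's `cache/keys.py::_resolve_special_token`, whose signature is not reproduced here.

```python
import hashlib
from typing import Callable, Sequence

# Hypothetical spelling of the ADR-0004 special token; the real string is
# owned by S5-02's cache/keys.py and may differ.
IMAGE_DIGEST_TOKEN = "<image-digest>"


def compute_cache_key(
    declared_inputs: Sequence[str],
    resolve_image_digest: Callable[[], str],
) -> str:
    """Hash the declared inputs, substituting the special token with the live digest."""
    hasher = hashlib.sha256()
    for item in sorted(declared_inputs):
        resolved = resolve_image_digest() if item == IMAGE_DIGEST_TOKEN else item
        encoded = resolved.encode("utf-8")
        # Length-prefix each entry so ("ab", "c") and ("a", "bc") cannot collide.
        hasher.update(len(encoded).to_bytes(8, "big"))
        hasher.update(encoded)
    return hasher.hexdigest()


def test_image_digest_change_changes_cache_key() -> None:
    inputs = ["Dockerfile", IMAGE_DIGEST_TOKEN]
    key_before = compute_cache_key(inputs, lambda: "sha256:abc")
    key_after = compute_cache_key(inputs, lambda: "sha256:def")
    assert key_before != key_after, (
        "ADR-0004: a rebuilt image (new digest) must invalidate the runtime_trace cache key"
    )
```

Sorting the inputs keeps the key independent of declaration order. Only the token's resolved value can move the key when the rest of the inputs are fixed.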
Green:
- Add the `Final[str]` constants and `_check_runtime_trace_freshness` to `runtime_trace.py` per the Implementation outline. Update `__all__`.
- Implement the adversarial test in `tests/adv/phase02/test_image_digest_drift.py` and the shared helpers.
Refactor:
- Confirm the freshness function body is < 40 LOC; if longer, collapse to the seven canonical branches above.
- Confirm `_check_runtime_trace_freshness` reuses `scip_freshness`'s shape verbatim: the same isinstance discipline, the same `try / except ValueError` for timestamp parsing, and the same `Stale(IndexerError(_MSG_*))` return path on every failure. Do NOT extract a shared base. The rule-of-three threshold (scip + runtime_trace + the three S6-08 checks = five) is met, but the kernel extraction is deferred to S6-08 per Rule 2 (see Notes for the implementer).
- Confirm the adversarial reads cleanly: fixtures are named, scenarios are separated, and assertions narrate ADR-0004 and ADR-0006 in their failure messages.
Files to touch
- Extend `src/codegenie/probes/layer_c/runtime_trace.py` (existing, from S5-02): add the four `Final[str]` constants, the freshness function, the `@register_index_freshness_check` decorator, and the `__all__` export.
- New tests:
  - `tests/unit/probes/layer_c/test_runtime_trace_freshness.py` (AC-5, 6, 7, 8, 11, 15, 16)
  - `tests/unit/probes/layer_c/test_runtime_trace_freshness_purity.py` (AC-3, 4)
  - `tests/unit/probes/layer_c/test_runtime_trace_freshness_mutation.py` (AC-9)
  - `tests/property/test_runtime_trace_freshness_purity.py` (AC-10)
  - `tests/adv/phase02/test_image_digest_drift.py` (AC-12, 13, 14)
  - `tests/adv/phase02/_helpers.py` (shared fixtures, the `_build_drift_slice` builder, and `_forbid_real_subprocess`)
- PR description note: the new adversarial test path is already declared load-bearing in phase-arch-design.md §"Testing strategy" L953. S8-03's `adv-phase02` job glob (`tests/adv/phase02/test_*.py`) picks it up by convention, so this story requires no S8-03 amendment. The PR description re-confirms the path and verifies the glob with a dry run: `pytest --collect-only tests/adv/phase02/test_image_digest_drift.py`.
Out of scope
- Other freshness-check registrations: `semgrep`, `gitleaks`, and `conventions` register their freshness functions in their own files (S6-08).
- Full materialization of the `stale-scip` fixture: S7-02 (the SCIP staleness adversarial is S4-02).
- The `adversarial_dockerfile` container-hardening test: S5-06.
- Real `docker build` in CI: the adversarial mocks it. A separate Phase-2 `integration` CI job runs a real `docker build` against a fixture image (S8-03), but that job does not gate this story's correctness.
- Modifying `IndexHealthProbe` itself: S4-01 already loops the registry; this story plants a new entry that the loop will dispatch (AC-16 makes the no-edit promise observable).
- Extracting a shared "freshness-check kernel" base: the rule-of-three threshold is met (scip + runtime_trace + S6-08's three = five total), but per Rule 2 and the S5-04 D2 precedent, the kernel extraction is deferred to S6-08, where the duplication becomes load-bearing. See Notes for the implementer.
- Editing `cache/keys.py::_resolve_special_token`: that is S5-02's deliverable (step 0). If the executor finds it absent at execute time, escalate; do NOT silently land it here. Doing so would inflate this story past the small-effort budget and split the cache contract across two stories.
Notes for the implementer
- Upstream-AC dependency on S5-02: the `last_traced_at` slice field. This story's AC-2(g) requires the runtime_trace slice to carry `last_traced_at: str` (ISO-8601 UTC) so the freshness function can construct `Fresh.indexed_at: datetime`. Inspection of S5-02's hardened ACs (specifically the slice-key enumeration at AC line 86) shows that this field is NOT present today. Before executing S5-05, verify that the field is in S5-02's slice. If it is not, the executor must:
  - Open an inline AC patch to S5-02 (one new field; the slice key set widens by one; the snapshot test in S5-02 absorbs the addition).
  - Surface this to the user as a one-line note, "S5-02 needs `last_traced_at`; patching", before proceeding.
  - Update both implementations.

  Alternative considered and rejected: using a sentinel datetime (`_dt.datetime.min.replace(tzinfo=_dt.UTC)`) for `Fresh.indexed_at`. It regresses the honest-confidence surface: the consumer (`confidence_section.py`, S8-01) would render a misleading timestamp ("indexed at 0001-01-01") on every clean runtime_trace gather.
- Why the freshness function lives in `runtime_trace.py` and not in `index_health.py`. This is Open/Closed at the file boundary (phase-arch-design.md §"Component design" #1; §"Gap 3"). The SCIP-index freshness check already lives in `index_health.py` (at `index_health.py:143`). Placing the runtime-trace check beside it would cement the precedent that every new index edits B2's module. The registry decorator inverts this: each probe owns its own freshness function, and B2 dispatches them all generically. AC-16 makes this observable: zero edits to `index_health.py`. S1-02's registry exists to prevent exactly the "S4-01 has a `match index_name:` block that grows every phase" anti-pattern (final-design.md §"Improvement" #14).
- The freshness function is pure: no `datetime.now()`, no I/O, no subprocess. The signal comes from slice content (the `last_traced_at` field that S5-02 records at write time). The `head` parameter is unused here. It matters for `scip_index` freshness, where `head != last_indexed_commit` is the signal, but for `runtime_trace` the signal is digest-based. Accept the parameter to match the registry's uniform `(slice, head) -> IndexFreshness` shape, and document the no-op in the function's docstring. AC-4's AST-walk audit is the structural defense against a future contributor sneaking a `datetime.now()` into the function body. A source grep is bypassable via string concatenation. An AST walk is not, provided it also flags dynamic lookups such as `getattr`.
- Function signature exactness. The registry's `FreshnessCheck` type alias (`registry.py:67`) is `Callable[[dict[str, object], str], "IndexFreshness"]`, and the `scip_freshness` precedent (`index_health.py:144`) uses it verbatim. A freshness function that took a typed Pydantic-model slice (the original draft's `RuntimeTraceSlice`) would not register. `mypy --strict` would refuse the assignment because `Callable[[RuntimeTraceSlice, str], …]` is not a subtype of `Callable[[dict[str, object], str], …]`. Callable parameters are contravariant, so the function would have to accept any `dict[str, object]`. The function MUST therefore take `dict[str, object]` and do its isinstance checks inline.
- Scenario B's four-part assertion is the load-bearing one. Implementation risk #3 was written for the S4-02 stale-scip fixture, but its discipline applies here too: "asserting only `Stale` is too weak; also assert the reason and its inner fields, because a future bug could regress from `DigestMismatch` to a generic `IndexerError("idk")` and the test would still pass on the weaker assertion." Don't weaken the check to `isinstance(result, Stale)`; assert all four conditions. Assert them against the `model_dump(mode="json")` shape that lands in `schema_slice["index_health"][...]["freshness"]`, NOT against the in-process `IndexFreshness` typed value, because B2 dumps it before emitting.
- The argument-order canary (AC-11) is structural defense, not paranoia. `FreshnessRegistry.dispatch_all` (`registry.py:174`) calls `check(slices.get(name, {}), head)` positionally. Suppose a freshness function defensively accepted either order. A future registry refactor that reversed the call, or a typo such as `check(head, slices.get(...))`, would then silently corrupt every freshness check in the registry. The canary makes the order load-bearing: the function explicitly fails if called wrong.
- Rule-of-three threshold reached but NOT extracted in this story. Five `@register_index_freshness_check` registrations exist by phase-end: `scip` (S4-01), `runtime_trace` (this story), and three more in S6-08 (`semgrep`, `gitleaks`, `conventions`). The duplication across the five would justify a `_FreshnessHelpers` micro-kernel (`_require_str_field`, `_parse_iso_or_stale`, `_empty_dict_sentinel`). Per Rule 2 and the S5-04 D2 precedent, however, the extraction is deferred to the consumer that triggers it. S6-08 is the kernel-extraction trigger story; do NOT extract here. This is documented for S6-08 authors in this Notes paragraph and in a cross-reference in S6-08's own validation report.
- The adversarial does not run real `docker`. Building two real images in CI per PR is too slow and too brittle. AC-13's `_forbid_real_subprocess` fixture is the structural defense. A future contributor who wants to "quickly add an end-to-end check" cannot silently un-mock the subprocess layer, because the fixture's monkeypatched stubs raise `AssertionError` on any escape. A separate `integration` CI job (S8-03) optionally runs a real `docker build` against the `distroless-target` fixture as an end-to-end smoke test, but it does not gate this story.
- The `IndexHealthProbe` integration call uses `read_raw_slices`. The adversarial reaches into S4-01's machinery indirectly:
  1. Write a synthetic `runtime_trace.json` to `raw_dir(tmp_path)`.
  2. Construct an `IndexHealthProbe` (S4-01) and call `await probe.run(repo, ctx)`.
  3. Read the emitted `schema_slice["index_health"]["runtime_trace"]["freshness"]`.

  The `read_raw_slices` kernel (`index_health.py:212`) hydrates the per-name slice dict that the registry dispatches against. This mirrors S5-04's sibling-slice-read precedent.
- `Fresh.indexed_at` honesty. The `Fresh` variant's `indexed_at` is rendered in the Confidence section of `CONTEXT_REPORT.md` (S8-01). A sentinel datetime would render as a confusing "indexed at 0001-01-01". Reading from `slice_["last_traced_at"]` carries the real timestamp that the runtime_trace probe recorded. The malformed-timestamp fallback (AC-2-g: `try / except ValueError → Stale(_MSG_SLICE_MALFORMED)`) is the honest answer when the upstream writer has corrupted the field. It is better than a sentinel rendered silently.
- Open question: re-trace on drift. When the operator rebuilds the image without re-running the gather, B2 surfaces `Stale(DigestMismatch)`. The response (re-running `codegenie gather`) belongs to the operator; this story does not auto-re-trace. Document this in the freshness function's docstring: "the resolution path is a `codegenie gather` re-run; B2's job is detection, not remediation."
- Open question: multi-image repos. A repo with `Dockerfile`, `apps/api/Dockerfile` and `apps/web/Dockerfile` could in principle trace all three. Today, `RuntimeTraceProbe` traces only the canonical one, since the slice's `built_image_digest` is singular. Multi-image support is a matter for a future ADR. The freshness function does NOT defensively handle `built_image_digest` as a list. If S5-02's slice ever widens this field's type, the AC-2-c `isinstance(...)` check fails fast with `Stale(_MSG_SLICE_MALFORMED)`. That is the right structural answer until a follow-up ADR pins the multi-image shape.
- The `clean_freshness_registry` fixture is shared. The canonical idiom is the fixture pattern in `tests/unit/probes/layer_b/test_index_health_probe.py:74-87`, which snapshots the singleton's `_checks` and `_origins` dicts and restores them in a `finally` block. Reuse it via a shared conftest at `tests/conftest.py` (check whether one already exists before adding it), or duplicate it inline. Do NOT call `unregister_for_tests` and forget to restore: the pollution propagates to every downstream test in the same process.
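The purity note above (AC-4) calls for an AST walk rather than a source grep. Below is a minimal sketch of such an audit. The forbidden-name sets and the `impurity_findings` helper are illustrative assumptions, not the story's actual test. In the real test, the source would come from something like `inspect.getsource(runtime_trace)`.

```python
import ast

# Attribute calls that read the clock, whatever alias they are reached through.
_CLOCK_ATTRS = frozenset({"now", "utcnow", "today"})
# Builtins that do I/O or enable dynamic bypasses such as getattr(datetime, "n" + "ow").
_FORBIDDEN_BUILTINS = frozenset({"open", "getattr", "eval", "exec", "__import__", "input", "print"})
# Modules whose mere reference inside the function body is a purity smell.
_FORBIDDEN_MODULES = frozenset({"subprocess", "os", "socket", "shutil"})


def impurity_findings(source: str, func_name: str) -> list[str]:
    """Return one human-readable finding per impure construct inside ``func_name``."""
    tree = ast.parse(source)
    matches = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name
    ]
    if not matches:
        raise LookupError(f"function {func_name!r} not found")
    findings: list[str] = []
    for node in ast.walk(matches[0]):
        if isinstance(node, ast.Call):
            target = node.func
            if isinstance(target, ast.Attribute) and target.attr in _CLOCK_ATTRS:
                findings.append(f"line {node.lineno}: clock read .{target.attr}()")
            elif isinstance(target, ast.Name) and target.id in _FORBIDDEN_BUILTINS:
                findings.append(f"line {node.lineno}: call to {target.id}()")
        elif isinstance(node, ast.Name) and node.id in _FORBIDDEN_MODULES:
            findings.append(f"line {node.lineno}: reference to {node.id}")
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            findings.append(f"line {node.lineno}: import inside the function body")
    return findings
```

Flagging `getattr` is what closes the string-concatenation loophole. A walk that only matched `.now()` attribute calls would miss `getattr(_dt.datetime, "n" + "ow")()`.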
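The Scenario B note above asks for the four-part assertion against the dumped JSON shape rather than the typed value. The sketch below assumes field names for that shape. Only `kind` (with `"fresh"` attested by test 15) comes from this story. The `"stale"` and `"digest_mismatch"` discriminator values, and the `reason`/`expected`/`actual` keys, are guesses to be confirmed against the real S1-01 models. `assert_digest_mismatch` is a hypothetical helper.

```python
from typing import Any


def assert_digest_mismatch(freshness: dict[str, Any], *, expected: str, actual: str) -> None:
    """Four-part check on a model_dump(mode="json") freshness payload (field names assumed)."""
    reason = freshness.get("reason") or {}
    assert freshness.get("kind") == "stale", (
        f"ADR-0006: expected a Stale verdict, got {freshness!r}"
    )
    assert reason.get("kind") == "digest_mismatch", (
        f"ADR-0006: Stale must carry DigestMismatch, not a generic IndexerError; got {reason!r}"
    )
    assert reason.get("expected") == expected, (
        f"ADR-0004: expected digest should be the built image {expected!r}, got {reason!r}"
    )
    assert reason.get("actual") == actual, (
        f"ADR-0004: actual digest should be the last traced image {actual!r}, got {reason!r}"
    )
```

A regression to a generic `IndexerError("idk")` passes the first check but fails the second. Each failure message also carries the ADR reference that AC-14 audits.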
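The argument-order canary note above is easiest to see against a toy registry that dispatches positionally, as `dispatch_all` is described to do. Everything here is an illustrative stand-in:

- `ToyFreshnessRegistry` for `FreshnessRegistry`
- `RegistrationError` for `FreshnessRegistryError`
- string verdicts for `IndexFreshness`

```python
from typing import Callable

Check = Callable[[dict[str, object], str], str]


class RegistrationError(Exception):
    """Hypothetical stand-in for FreshnessRegistryError."""


class ToyFreshnessRegistry:
    """Toy registry; only the positional dispatch mirrors registry.py:174."""

    def __init__(self) -> None:
        self._checks: dict[str, Check] = {}

    def register(self, name: str) -> Callable[[Check], Check]:
        def decorator(fn: Check) -> Check:
            if name in self._checks:
                raise RegistrationError(f"freshness check {name!r} is already registered")
            self._checks[name] = fn
            return fn

        return decorator

    def dispatch_all(self, slices: dict[str, dict[str, object]], head: str) -> dict[str, str]:
        # Positional call, matching the documented check(slices.get(name, {}), head).
        return {name: check(slices.get(name, {}), head) for name, check in self._checks.items()}


registry = ToyFreshnessRegistry()


@registry.register("runtime_trace")
def check_runtime_trace(slice_: dict[str, object], head: str) -> str:
    # Canary: a swapped call (head, slice_) fails here instead of being "helpfully" accepted.
    if not isinstance(slice_, dict) or not isinstance(head, str):
        raise TypeError("expected (slice_: dict[str, object], head: str)")
    return "fresh" if slice_ else "stale"
```

An absent slice still dispatches, as `{}`, so the check decides what "missing" means. A reversed call site raises at once rather than producing plausible-looking verdicts.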
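The no-real-`docker` note above relies on stubs that raise `AssertionError` on any escape. The sketch below builds that with only the standard library (`unittest.mock.patch.object` inside a `contextlib.ExitStack`). The entry-point list is an assumption about what the probe might call. In the suite, the context manager would be wrapped by a pytest autouse fixture such as `_forbid_real_subprocess`.

```python
import asyncio
import contextlib
import os
import subprocess
from collections.abc import Iterator
from unittest import mock

# Entry points that can spawn a real process (e.g. `docker build`). Names bound
# earlier via `from subprocess import run` are NOT covered by patching the module.
_SPAWNERS = (
    (subprocess, "run"),
    (subprocess, "Popen"),
    (subprocess, "call"),
    (subprocess, "check_call"),
    (subprocess, "check_output"),
    (asyncio, "create_subprocess_exec"),
    (asyncio, "create_subprocess_shell"),
    (os, "system"),
)


def _refuse(*args: object, **kwargs: object) -> None:
    raise AssertionError(f"real subprocess escaped the mock layer: args={args!r}")


@contextlib.contextmanager
def forbid_real_subprocess() -> Iterator[None]:
    """Replace every spawner with a stub that fails loudly; restore them on exit."""
    with contextlib.ExitStack() as stack:
        for module, attr in _SPAWNERS:
            stack.enter_context(mock.patch.object(module, attr, new=_refuse))
        yield
```

Test-specific mocks for the probe's own subprocess seam would be layered on top of this guard. A call that slips past those mocks then lands on `_refuse` instead of the host's `docker`.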