Story S5-02: RuntimeTraceProbe, sequential 5-scenario harness + image-digest token
Step: Step 5 — Ship Layer C (runtime + container) probes
Status: Done (GREEN 2026-05-17 — see _attempts/S5-02.md)
Effort: L
Depends on: S5-01 (ScenarioResult discriminated union), S1-09 (ProbeContext.image_digest_resolver), S1-08 (@register_probe(heaviness="heavy")), S1-06 (docker, strace in ALLOWED_BINARIES), S1-07 (run_external_cli exists — but Layer C calls run_allowlisted directly, not via this wrapper), S3-03 (writer chokepoint with RedactedSlice), S4-01 (IndexHealthProbe consumes last_traced_image_digest / built_image_digest from this probe's slice — S5-05 wires the freshness check)
ADRs honored: 02-ADR-0001 (docker/strace allowlist), 02-ADR-0003 (heaviness="heavy" sort), 02-ADR-0004 (image-digest as declared_inputs special token via ProbeContext.image_digest_resolver), 02-ADR-0007 (no Plugin Loader, no plugin.yaml — the probe is in-tree), 02-ADR-0010 (writer chokepoint RedactedSlice)
Validation notes (2026-05-16)
Story hardened by phase-story-validator (_validation/S5-02-runtime-trace-probe.md). This is the 1st canonical consumer of S5-01's ScenarioResult, the 1st canonical Layer-C probe (setting the precedent that Layer C calls run_allowlisted directly, not via run_external_cli), and the 1st consumer of the image-digest:<resolved> declared-input special-token mechanism (so this story also lands the cache/keys.py::_resolve_special_token dispatch arm — S1-09 added the ProbeContext.image_digest_resolver field but not the cache-side resolver). Verdict: HARDENED. Eighteen in-place edits applied (full details in _validation/):
- Envelope confidence contract preserved (CF1 / NF-B). Probe.confidence: Literal["high","medium","low"] (frozen at base.py:68 / localv2.md §4 line 328) does not admit "unavailable". The four prior occurrences of envelope confidence="unavailable" are routed to envelope confidence="low" + slice trace_coverage_confidence="unavailable" (the Phase-2 extension of localv2.md §5.3 C4's tri-state), and a new AC pins the contract preservation.
- ImageDigestUnresolved routed to Failed, not Skipped (NF-A). S5-01's HARDENED variant set places ImageDigestUnresolved in TraceFailureReason, NOT TraceSkipReason. AC-11 + AC-12 rewritten: the resolver-returned-None and resolver-is-None paths emit TraceScenarioFailed(reason=ImageDigestUnresolved()). Docker-build-failure paths remain TraceScenarioSkipped(reason=ImageBuildUnavailable()) (already correct).
- Cache _resolve_special_token dispatch arm scoped into this story (CF7 / NF-C). Inspection of cache/keys.py::declared_inputs_for confirms the resolver does NOT exist: image-digest:<resolved> is silently rglobbed and dropped. Three new ACs land it here: (a) recognition by r"^[a-z0-9_-]+:<resolved>$" syntax; (b) the image-digest: arm calls ctx.image_digest_resolver(snapshot.root) and folds the digest (or the None-fallback) into the content-hash tuple; (c) unknown tokens raise CacheKeyError(reason="unknown_special_token", token=…) via match + assert_never (the Open/Closed seam for future tokens like scip-index-output:).
- image_digest_resolver raising → Failed(ImageDigestUnresolved()) (CF6). Any exception from the resolver is caught at the call site, translated, and structured-logged with image_digest_unresolved_reason="resolver_raised". The probe never raises out of run().
- _DEFAULT_SCENARIOS carries ScenarioSpec instances (NF-D). The list-of-strings phrasing in AC-4 is replaced with the Final[tuple[ScenarioSpec, ...]] form prescribed by the implementation outline; _DEFAULT_SCENARIO_NAMES: Final[tuple[str, ...]] is the names-only constant where a name list is needed.
- Citation correction (NF-E). final-design.md §"Implementation risks" #7 (4 references) → final-design.md §"Components" #6 + §"Where security/best-practices traded off perf" (a). The load-bearing rationale for sequential scenarios lives in those two sections, not in a numbered risks list (final-design.md only enumerates 5 top-level Risks).
- cache_strategy = "content" pinned literally (CF2). The criterion names the exact value instead of the looser "NOT none"; it matches the Probe.cache_strategy default and the dep_graph / scip precedent.
- applies_to_tasks=["*"], applies_to_languages=["*"], requires=[] (NF-F / NF-G). The probe is Dockerfile-driven, not source-language-driven, and has no sibling-slice prerequisite (the image digest comes via a ProbeContext callable, not a sibling artifact).
- macOS-detection mechanism pinned to sys.platform (TF-D). The parenthetical os.uname().sysname == "Darwin" is removed; only sys.platform != "linux" is canonical.
- _HARDENING_FLAGS: Final[tuple[str, ...]] module constant (DF-4). The three hardening flags live in one place; the argv builder and tests import the constant, so a typo-in-one-flag mutation becomes catchable.
- _image_ref_for_digest(digest: str) -> str pure smart constructor (CF3 / DF-3). Returns exactly "codegenie-trace:" + _short(digest) (the first 12 hex chars after stripping any sha256: prefix); a parametrized test pins the format, including sha256:-prefixed, empty-string, and non-hex edge cases.
- _parse_strace_lines pure-function AC + golden fixture + property-test entry (TF-B / DF-1). Set-valued slice fields are permutation-stable under line-ordering shuffles (excluding execve ordering for shell-invocation counting); tests/property/test_strace_parser_commutativity.py adds a Hypothesis-driven entry.
- _aggregate_scenarios exhaustive match with assert_never (TF-C / DF-5). Mirrors S5-01 AC-6; the match is rehearsed at every level of the sum (ScenarioResult top-level + TraceFailureReason + TraceSkipReason via S5-01 consumers); a mypy --warn-unreachable deletion smoke-test is included.
- _build_strace_argv / _build_docker_run_argv pure builders + explicit -- separator pin (TF-3 / DF-1 / mutation #3). Each is importable and unit-testable without subprocess mocking; the strace argv test asserts that an explicit -- token separates strace's arguments from the wrapped command, so argv-merge regressions (mutation #3) are catchable.
- scenarios_run / scenarios_failed derivation pinned (CF4). scenarios_run = [r.scenario_name for r in results if isinstance(r, TraceScenarioCompleted)]; scenarios_failed = [r.scenario_name for r in results if isinstance(r, TraceScenarioFailed)]; Skipped scenarios appear in neither list (only in per_scenario_artifacts, with a None value, plus a structured log).
- Slice schema is the COMPLETE observable surface (CF8). A snapshot test asserts the slice dict's keys are EXACTLY the localv2.md §5.3 C4 set (no extras, none missing); drift in either direction flips the test red.
- 6+-scenario operator-extensibility test (DF-2). tests/fixtures/scenarios/seven_scenarios.yaml exercises a 7-scenario run end-to-end with zero runtime_trace.py edits; a source-scan test asserts _DEFAULT_SCENARIOS is the ONLY in-source scenario list, so a future hardcoded 6th scenario in some sibling module is caught.
- _image_built is per-run(), not per-instance (TF-E / mutation #15). The flag is a local in _run_all_scenarios, passed explicitly into _execute_scenario. A regression test runs the probe twice on the same instance and asserts docker build is invoked exactly once per run().
Notes-for-implementer extended with seven new paragraphs (contract-confidence routing, image-ref format, macOS detection mechanism, cache-token dispatch lives here, newtype deferral S1-05, trace-backend Protocol deferral, _image_built lifecycle). Three new test files land: tests/unit/cache/test_special_token_dispatch.py, tests/property/test_strace_parser_commutativity.py, tests/unit/probes/layer_c/test_runtime_trace_argv_builders.py.
Context
RuntimeTraceProbe is the densest single probe in Phase 2 and the single most valuable probe for distroless confidence (localv2.md §5.3 C4: "without this, distroless migration breaks silently in production"). It runs five scenarios (startup, smoke_test, healthcheck, shutdown, error_path) against the analyzed repo's container and captures syscalls, loaded libraries, shell invocations, and network endpoints via strace -f -e trace=openat,execve,connect,bind,mmap on Linux. On macOS it deterministically emits TraceScenarioFailed(StraceUnavailable()) per scenario: no sudo prompt, no dtruss, and this macOS path is permanent per final-design.md §"Where security/best-practices traded off perf" (a). The five scenarios are serialized, with at most one scenario task in flight at any time, because concurrent docker run invocations of the same image race for resources and confuse trace attribution (final-design.md §"Components" #6 + §"Where security/best-practices traded off perf" (a)). Per-scenario timeout: 120 s; aggregate timeout: 600 s.
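The serialization and the two timeout layers can be sketched with plain asyncio. This is a minimal sketch under stated assumptions: scenario execution is injected as a coroutine function (execute), the "completed" / "timed_out" strings stand in for the S5-01 ScenarioResult variants, and scenarios the aggregate guard never reaches are recorded as timed out (the story does not specify that case). Each scenario runs in a task named runtime_trace_scenario_<name>, the prefix the concurrency test counts, and the loop awaits that task before creating the next one.

```python
import asyncio
from collections.abc import Callable, Coroutine, Sequence
from typing import Any, Final, Literal

_PER_SCENARIO_TIMEOUT_S: Final[int] = 120
_AGGREGATE_TIMEOUT_S: Final[int] = 600

Outcome = Literal["completed", "timed_out"]
ExecuteFn = Callable[[str], Coroutine[Any, Any, None]]


async def run_all_scenarios(
    names: Sequence[str],
    execute: ExecuteFn,
    per_scenario_timeout: float = _PER_SCENARIO_TIMEOUT_S,
    aggregate_timeout: float = _AGGREGATE_TIMEOUT_S,
) -> dict[str, Outcome]:
    outcomes: dict[str, Outcome] = {}

    async def _loop() -> None:
        for name in names:
            # One named task at a time: awaiting it before the next iteration
            # keeps at most one scenario task alive (no asyncio.gather).
            task = asyncio.create_task(
                execute(name), name=f"runtime_trace_scenario_{name}"
            )
            try:
                await asyncio.wait_for(task, timeout=per_scenario_timeout)
                outcomes[name] = "completed"
            except asyncio.TimeoutError:
                # wait_for has already cancelled the overrunning task.
                outcomes[name] = "timed_out"

    try:
        await asyncio.wait_for(_loop(), timeout=aggregate_timeout)
    except asyncio.TimeoutError:
        pass  # the in-flight scenario task was cancelled along with _loop()
    # Assumption: scenarios cut off by the aggregate guard count as timed out.
    for name in names:
        outcomes.setdefault(name, "timed_out")
    return outcomes
```

Passing the timeouts as parameters (defaulting to the module constants) lets tests use sub-second values while production keeps 120 s / 600 s.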
The cache-correctness story (02-ADR-0004): a package.json-only change whose rebuilt-and-pushed image keeps the same digest must cache-HIT; a FROM-line bump or base-image rebuild (new digest) must cache-MISS. The signal lives in declared_inputs as the special token image-digest:<resolved>. The Phase 0 cache's special-token resolver (whose dispatch arm this story lands in cache/keys.py) calls ProbeContext.image_digest_resolver(repo_root) -> str | None, the one Phase 0 contract extension Phase 2 makes (S1-09).
The container-hardening triple --network=none --cap-drop=ALL --security-opt=no-new-privileges is non-negotiable — test_adversarial_dockerfile.py (S5-06) is the proof that a forkbomb/infinite-loop Dockerfile is contained.
References
- phase-arch-design.md §"Component design" #6 (RuntimeTraceProbe): the canonical internal-structure prose.
- phase-arch-design.md §"Edge cases" rows 5, 6, 14: docker-build failure, macOS strace, image-digest resolver returns None.
- phase-arch-design.md §"Data model": ProbeContext additive field image_digest_resolver.
- final-design.md §"Components" #6: sequential scenarios, p50 ~90 s, image-digest cache key.
- final-design.md §"Where security/best-practices traded off perf" (a): "sequential runtime trace scenarios (~75 s wall-clock floor vs. theoretical 15 s if parallel) — accepted because parallel traces against the same image race resources and confuse attribution". This is the load-bearing rationale test_concurrent_task_count_le_one defends.
- final-design.md §"Conflict-resolution table" rows 9, 16: cache_key strategy + cache-key shape.
- 02-ADR-0001: docker, strace in ALLOWED_BINARIES; Layer C calls run_allowlisted directly (not run_external_cli).
- 02-ADR-0004: image-digest as a declared_inputs special token; cache_key() override refused; the resolver is Optional[Callable]. §Consequences names cache.py's token-recognizer dispatch as the new extension surface.
- High-level-impl.md §"Step 5": Risks specific to Step 5 (parallel-scenarios redirect; macOS determinism; container-hardening flags non-negotiable).
- localv2.md §5.3 C4: output slice shape (shared_libs_loaded, cert_paths_read, files_read_at_runtime, shell_invocations, network_endpoints_touched, trace_coverage_confidence).
- Phase 0 run_allowlisted (src/codegenie/exec/__init__.py): the direct call site for docker / strace.
- Phase 0 cache/keys.py::declared_inputs_for: the function this story extends with the _resolve_special_token dispatch (S1-09 added ProbeContext.image_digest_resolver but did NOT extend cache/keys.py; this story is the first consumer and lands the dispatch arm).
- S5-01 sum-type modules: src/codegenie/probes/layer_c/scenario_result.py (ScenarioResult; TraceFailureReason with variants StraceUnavailable | DockerBuildFailed | ScenarioTimeout | ImageDigestUnresolved; TraceSkipReason with variants NoDockerfile | ImageBuildUnavailable). ImageDigestUnresolved is a TraceFailureReason variant, NOT a TraceSkipReason variant; this story honors the S5-01 variant placement.
Goal
Implement src/codegenie/probes/layer_c/runtime_trace.py — a @register_probe(heaviness="heavy") probe that builds the analyzed repo's container, runs five scenarios sequentially under the container-hardening triple, captures syscalls via strace -f (Linux) or emits TraceScenarioFailed(StraceUnavailable()) per scenario (macOS) deterministically, declares the image-digest:<resolved> special token in declared_inputs, and emits a ProbeOutput whose slice round-trips through the Phase 2 writer chokepoint with RedactedSlice. Per-scenario 120 s timeout; aggregate 600 s; docker build failure → all five scenarios TraceScenarioSkipped(reason=ImageBuildUnavailable()), envelope confidence="low", slice trace_coverage_confidence="unavailable". Also lands the Phase 0 cache/keys.py::_resolve_special_token dispatch arm (the first consumer of the special-token mechanism) honoring the localv2.md §4 syntax. The frozen Probe.confidence: Literal["high","medium","low"] contract is preserved verbatim: "unavailable" is a slice-level signal on trace_coverage_confidence, never an envelope value.
Acceptance criteria
- [ ] src/codegenie/probes/layer_c/runtime_trace.py exists; it declares class RuntimeTraceProbe(Probe) decorated with @register_probe(heaviness="heavy", runs_last=False) (S1-08 decorator). Class attributes pinned: name = "runtime_trace", layer = "C", tier = "base", applies_to_tasks: list[str] = ["*"], applies_to_languages: list[str] = ["*"] (Dockerfile-driven, not source-language-driven; operator-extensibility per DF-2), requires: list[str] = [] (no sibling-slice prerequisite; the image digest comes via a ProbeContext callable, not via a sibling artifact), timeout_seconds: int = 300 (envelope-side default; the per-scenario and aggregate timeouts are separate constants below). A unit test asserts the literal class-attribute shape.
- [ ] RuntimeTraceProbe.declared_inputs is the literal list ["Dockerfile", ".codegenie/scenarios.yaml", "image-digest:<resolved>"], the last entry being the literal token string with its <resolved> placeholder. The Phase 0 cache layer's _resolve_special_token dispatch (extended by this story; see the dispatch AC below) recognizes the <name>:<resolved> syntax and expands image-digest:<resolved> via ctx.image_digest_resolver(snapshot.root). A unit test asserts the literal three-entry shape; a separate unit test (under tests/unit/cache/) asserts the dispatch arm works.
- [ ] RuntimeTraceProbe.cache_strategy: Literal["content"] = "content" (exact value). This matches the Probe.cache_strategy default (base.py:83) and the dep_graph / scip precedent. cache_strategy="none" is reserved for B2 (S4-01); this probe's whole point is to cache against image-digest equality, so "content" is the right value. A unit test asserts the literal annotation Literal["content"] via inspect.get_annotations and the runtime value "content".
- [ ] Reads .codegenie/scenarios.yaml via the Phase 1 safe_yaml.load chokepoint and Pydantic-validates it against an internal ScenariosConfig(BaseModel) model with the required field scenarios: list[ScenarioSpec]. When the file is absent, it falls back to a module-level constant _DEFAULT_SCENARIOS: Final[tuple[ScenarioSpec, ...]] carrying five ScenarioSpec instances (the five canonical names + minimal command argvs). File present but malformed → the probe envelope reports a load error, not a silent fallback: envelope confidence="low", slice scenarios_run=[], slice scenarios_failed listing all five names (per the scenarios_run / scenarios_failed derivation AC below), and all five scenarios as TraceScenarioFailed with the closest existing S5-01 reason (S5-01 defines no ScenarioYamlMalformed variant; the implementation outline picks DockerBuildFailed). Document the choice in Notes; do NOT add a new S5-01 variant here (Rule 3). A names-only constant _DEFAULT_SCENARIO_NAMES: Final[tuple[str, ...]] = ("startup", "smoke_test", "healthcheck", "shutdown", "error_path") exists for places where only a name list is needed (e.g., logs).
- [ ] Cache _resolve_special_token dispatch arm lands in this story (because S5-02 is the first consumer of the mechanism; S1-09 added the ProbeContext.image_digest_resolver field but did NOT extend cache/keys.py). Three observable sub-criteria:
  - (a) Recognition syntax. cache/keys.py::declared_inputs_for recognizes any entry in probe.declared_inputs matching r"^[a-z0-9_-]+:<resolved>$" as a special token (NOT a glob). All other entries continue to be rglobbed as today.
  - (b) image-digest: arm. When the recognized token name is image-digest, the dispatch calls ctx.image_digest_resolver(snapshot.root) if it is non-None, and folds the resulting string (or the sentinel "" if the resolver returned None, is unbound, or raised) into the content-hash tuple alongside the file content hashes. A unit test runs key_for(...) over a synthetic probe with the token in declared_inputs and two different resolvers (returning different digests) and asserts the two cache keys differ; it then runs with the SAME resolver twice and asserts the keys are byte-identical.
  - (c) Unknown-token fail-loud. The dispatch is a match on the token name whose fallback branch plays the assert_never role by raising CacheKeyError(reason="unknown_special_token", token=<full_token_string>). A unit test asserts that an unknown token (bogus:<resolved>) raises a CacheKeyError whose message contains both "unknown_special_token" AND the full token string. The match is the Open/Closed seam for future tokens (scip-index-output:, tree-sitter-grammar-set:); adding a new arm requires an ADR amendment to 02-ADR-0004.
- [ ] Sequential per-scenario execution, verified by tests/unit/probes/layer_c/test_runtime_trace.py::test_concurrent_task_count_le_one: an asyncio.Event-driven instrumentation hook (self._scenario_in_progress: asyncio.Event, set inside _execute_scenario for the duration of one scenario run and cleared between scenarios) plus a test-side _observer_task that loops await asyncio.sleep(0); count = len([t for t in asyncio.all_tasks() if t.get_name().startswith("runtime_trace_scenario_")]) until the run completes. The test asserts count <= 1 at every sampled tick AND len(samples) >= 10. The assertion is on the observed task count, not on the absence of asyncio.gather in the source (which a future contributor could re-introduce subtly).
- [ ] _HARDENING_FLAGS: Final[tuple[str, ...]] = ("--network=none", "--cap-drop=ALL", "--security-opt=no-new-privileges") is exposed as a module-level constant. Both _build_docker_run_argv and the test_hardening_flags_in_argv test import the constant; none of the three flags is duplicated as a string anywhere in the source. A unit test imports the constant and asserts it equals exactly that 3-tuple (catching a typo-in-one-flag mutation that set-checking alone would miss).
- [ ] Per scenario: docker build → docker run <_HARDENING_FLAGS unpacked> -- <image_ref> <scenario-command argv>, wrapped by strace -f -e trace=openat,execve,connect,bind,mmap on Linux. The three hardening flags AND the explicit -- separator are passed as separate argv tokens (no string concatenation). A unit test mocks run_allowlisted and asserts that the captured argv (a) contains all three _HARDENING_FLAGS tokens, in any order; (b) contains a literal "--" token immediately before image_ref; (c) does NOT contain any string equal to the _HARDENING_FLAGS concatenated (catching a mutation that joins them with spaces).
- [ ] Pure argv builder functions: _build_strace_argv(image_ref: str, command_argv: list[str]) -> list[str] and _build_docker_run_argv(image_ref: str, command_argv: list[str]) -> list[str] are pure module-private functions (no I/O, no subprocess, no logging). Each is imported by a dedicated unit test under tests/unit/probes/layer_c/test_runtime_trace_argv_builders.py and asserted without mocking run_allowlisted. The strace builder's test additionally asserts that strace's own arguments are followed by exactly one literal "--" token that separates them from the wrapped docker run invocation (mutation #3); the docker run argv carries its own "--" before image_ref.
- [ ] All docker and strace calls route through run_allowlisted DIRECTLY, not run_external_cli. A grep test (tests/unit/probes/layer_c/test_runtime_trace_no_external_cli_wrap.py) asserts the probe's source has zero run_external_cli references and ≥ 1 run_allowlisted reference (02-ADR-0001 + final-design.md §"Departures" reaffirmation).
- [ ] Per-scenario asyncio.wait_for(..., timeout=120); aggregate guard asyncio.wait_for(..., timeout=600) around the for-loop. Both timeouts are exported constants, _PER_SCENARIO_TIMEOUT_S: Final[int] = 120 and _AGGREGATE_TIMEOUT_S: Final[int] = 600 (the test imports them and asserts the values, so a deliberate edit to 60 / 300 flips the test red).
- [ ] macOS path is deterministic, with no sudo prompt: on sys.platform != "linux" (the canonical detector; the os.uname() form is NOT used, so there is exactly one detection mechanism), the probe does not invoke strace or dtruss; each scenario short-circuits to TraceScenarioFailed(scenario_name=..., reason=StraceUnavailable()). Unit test: under a monkey-patched sys.platform = "darwin", the probe's run completes without any run_allowlisted("strace", ...) or run_allowlisted("sudo", ...) invocation, verified by a mock spy on run_allowlisted that rejects any argv[0] in {"strace", "sudo", "dtruss"} (assertion-by-rejection, so a wrong probe path crashes the spy rather than silently passing).
- [ ] docker build failure (non-zero exit) → all five scenarios skip with TraceScenarioSkipped(reason=ImageBuildUnavailable(...)); the probe envelope's confidence is "low" (the frozen Literal["high","medium","low"] contract: Probe.confidence does NOT admit "unavailable"); the slice's trace_coverage_confidence is "unavailable"; the slice's built_image_digest and last_traced_image_digest are both None. IndexHealthProbe (S4-01) reads built_image_digest and last_traced_image_digest from this slice and emits IndexFreshness.Stale(IndexerError(message="upstream_runtime_trace_unavailable")). This story emits the slice fields B2 reads and asserts the slice-emission shape; the roundtrip through B2's freshness loop (a fixture test that constructs the slice and feeds it to the S4-01 freshness loop) lives in a small integration test landed via S5-05.
- [ ] Image-digest cache HIT skips scenarios. Two-layer test: (1) tests/unit/cache/test_special_token_dispatch.py::test_image_digest_resolver_changes_cache_key exercises the new dispatch arm on a synthetic probe with the token in declared_inputs (proving the resolver wiring works); (2) tests/unit/probes/layer_c/test_runtime_trace.py::test_cache_hit_skips_scenarios runs the probe twice with the same (Dockerfile, scenarios.yaml, fixed-digest) tuple, expects the second run to hit the cache, and asserts that a mock spy on _execute_scenario is called five times on the first run and zero times on the second.
- [ ] image_digest_resolver returns None (no built image yet) → envelope confidence="low" (NOT "unavailable"; contract preservation); slice trace_coverage_confidence="unavailable"; slice built_image_digest=None; all scenarios are TraceScenarioFailed(reason=ImageDigestUnresolved()) (NOT Skipped: ImageDigestUnresolved lives in S5-01's TraceFailureReason, not TraceSkipReason). The cache key folds in the sentinel "" for the unresolved token (per the dispatch AC above), so the cache still has a stable key across repeated resolver-returns-None runs. Reference: phase-arch-design.md §"Edge cases" row 14 + 02-ADR-0004 §Consequences.
- [ ] image_digest_resolver is None on ProbeContext (the operator never bound one) → envelope and slice shape identical to "resolver returned None"; the probe never raises. Covered by a dedicated unit test. The two None paths are distinguished only in the structured log field image_digest_unresolved_reason: Literal["resolver_unbound", "resolver_returned_none", "resolver_raised"] (the third value belongs to the raising path below).
- [ ] image_digest_resolver raises → the exception is caught at the call site and translated to a per-scenario TraceScenarioFailed(reason=ImageDigestUnresolved()) for ALL five scenarios; the structured log emits image_digest_unresolved_reason="resolver_raised"; the original exception's repr goes in a separate structured-log field, image_digest_resolver_error_repr (never in the message body, as a defense against PII leaking via exception text). The probe never raises out of run(). A unit test mocks the resolver to raise, asserts the probe completes, and asserts the structured-log field values.
- [ ] Envelope-confidence contract preservation pin. A unit test asserts that inspect.get_annotations(ProbeOutput)["confidence"] evaluates to Literal["high", "medium", "low"] (not widened); a parametrized test runs the probe across all six envelope-failure paths (build failure / resolver None-returned / resolver None-bound / resolver raised / aggregate timeout / all-scenarios-timed-out) and asserts the envelope confidence is always in {"high", "medium", "low"}, never "unavailable". This is the structural pin against a future contributor silently widening the contract.
- [ ] scenarios_run / scenarios_failed / Skipped routing pinned. Slice fields derive deterministically from results: list[ScenarioResult]: scenarios_run = [r.scenario_name for r in results if isinstance(r, TraceScenarioCompleted)]; scenarios_failed = [r.scenario_name for r in results if isinstance(r, TraceScenarioFailed)]. TraceScenarioSkipped scenarios appear in neither list; they surface only in per_scenario_artifacts (with a None value for that scenario name) and in the per-scenario structured log. A parametrized test covers all combinations.
- [ ] The output slice schema matches the relevant subset of localv2.md §5.3 C4: artifact_uri, per_scenario_artifacts: dict[str, Path | None], scenarios_run: list[str], scenarios_failed: list[str], binaries_executed: list[str], shared_libs_loaded: list[str], cert_paths_read: list[str], files_read_at_runtime: {summary, full_list_uri}, shell_invocations: int, network_endpoints_touched: {outbound, inbound}, built_image_digest: str | None, last_traced_image_digest: str | None, trace_coverage_confidence: Literal["high", "medium", "low", "unavailable"]. The slice schema is the COMPLETE observable surface: a snapshot test asserts set(slice.keys()) == EXPECTED_SLICE_KEYS for both a healthy run and an all-skipped run, and drift in either direction (extra or missing key) flips the test red. The sub-schema lands as part of S5-03 / S5-04 (src/codegenie/schema/probes/layer_c/); this story emits the dict shape that the sub-schema validates.
- [ ] trace_coverage_confidence derivation: 5/5 scenarios completed → "high"; smoke-only or 2–4 completed → "medium"; startup-only → "low"; 0 completed → "unavailable" (matches localv2.md §5.3 C4 plus this story's explicit extension of the tri-state to a tetra-state). Pure function _derive_trace_coverage_confidence(results: list[ScenarioResult]) -> Literal["high", "medium", "low", "unavailable"]; a table-driven test runs over n_completed 5..0 with the documented mapping, with separate rows for the two single-completion cases (smoke-only, startup-only); the type checker confirms exhaustiveness.
- [ ] _aggregate_scenarios(results: list[ScenarioResult]) -> SliceFields is a pure function over the per-scenario outcome list that returns the slice fields (scenarios_run, scenarios_failed, binaries_executed, ..., trace_coverage_confidence). It matches on every variant of ScenarioResult with assert_never on the fallback branch, mirroring S5-01 AC-6's exhaustive-match discipline (in the producer/consumer ladder S5-01 documents, S5-02's _aggregate_scenarios is the 1st canonical consumer of ScenarioResult). A mypy --warn-unreachable smoke-test verifies that deleting one case arm produces a type-check error.
- [ ] _image_ref_for_digest(digest: str) -> str is a pure smart constructor returning exactly "codegenie-trace:" + _short(digest), where _short strips any leading "sha256:" prefix and takes the first 12 hex characters. A parametrized test pins the format over: "sha256:cafef00ddeadbeef..." → "codegenie-trace:cafef00ddead"; bare hex "cafef00ddeadbeef..." → "codegenie-trace:cafef00ddead"; empty "" → ValueError("empty digest"); non-hex "not-a-digest" → ValueError("non-hex digest"). The tag prefix "codegenie-trace:" is itself a module-level Final[str] constant (no string duplication).
- [ ] _parse_strace_lines(lines: Iterable[str]) -> ParsedTrace is a pure function over an iterable of strace output lines, returning a frozen ParsedTrace Pydantic model with fields binaries_executed: frozenset[str], shared_libs_loaded: frozenset[str], cert_paths_read: frozenset[str], files_read_at_runtime: frozenset[str], shell_invocations: int, network_endpoints_touched: frozenset[tuple[str, str]]. Tested via (a) a golden fixture, tests/fixtures/strace/minimal.strace, over a known-good snippet, asserting the exact parsed model; (b) malformed-line resilience: _parse_strace_lines(["this is not strace output", "neither is this"]) returns the all-empty ParsedTrace (it does NOT raise); (c) a Hypothesis property test in tests/property/test_strace_parser_commutativity.py: for any permutation of the fixture lines, the set-valued fields are byte-identical (shell_invocations is the only count-valued field, documented in the module as the one non-commutative exception).
- [ ] _image_built is per-run(), not per-instance. The flag lives as a local in _run_all_scenarios(...) and is passed explicitly into _execute_scenario(...); there is no attribute on self. A regression test runs the probe twice on the same instance (probe = RuntimeTraceProbe(); await probe.run(...); await probe.run(...)) and asserts docker build is invoked once per run(): two builds in total across the two runs, not one across both (which would be wrong, because the image may have been rebuilt between gathers).
- [ ] Operator-extensibility for scenarios (Open/Closed). A fixture, tests/fixtures/scenarios/seven_scenarios.yaml, declares 7 scenario names (the 5 defaults + 2 operator-added: db_migrate, worker_drain). A test runs the probe end-to-end against a fixture repo with this YAML and asserts: (a) all 7 scenarios were executed in declared order; (b) the slice's scenarios_run (or scenarios_failed / per_scenario_artifacts) covers all 7 names; (c) zero edits to src/codegenie/probes/layer_c/runtime_trace.py were required to support the 6th and 7th scenarios, verified by a separate source-scan test (assert _DEFAULT_SCENARIO_NAMES == ("startup", "smoke_test", "healthcheck", "shutdown", "error_path"), unchanged from the canonical 5).
- [ ] _DEFAULT_SCENARIOS source-scan uniqueness. A test greps the src/codegenie/ tree and asserts that the symbol _DEFAULT_SCENARIOS (or any literal list/tuple constant of equivalent shape) appears in EXACTLY ONE source file (probes/layer_c/runtime_trace.py). This catches a future drift where a sibling module silently hardcodes its own 6th-scenario list; the operator-side scenarios.yaml is the only legitimate extension surface.
- [ ] The slice flows through the writer chokepoint as RedactedSlice (S3-02 / S3-03); a test asserts secrets_redacted_count == 0 for a clean fixture and >= 1 for a fixture whose smoke-test command echoes an AWS-format key (the SecretRedactor from S3-01 must catch it on the runtime-trace path).
- [ ] Structured log fields emitted at least once per probe run: probe.runtime_trace.dispatch, probe.runtime_trace.scenario_started (per scenario), probe.runtime_trace.scenario_finished (per scenario, including wall_clock_ms and the kind of the ScenarioResult), probe.runtime_trace.image_digest_resolved (or …unresolved), probe.runtime_trace.cache_hit (when applicable), probe.runtime_trace.finish.
- [ ] mypy --strict is clean on the new code. No per-module override is needed: [tool.mypy] warn_unreachable = true has been repo-wide since Phase 0 S1-02 (established by S1-11 validation; reaffirmed in S5-01 validation). A unit test asserts the repo-wide flag is present and unmodified after this story; both modules this story touches (probes/layer_c/runtime_trace.py and the cache/keys.py extension) are included in the default mypy --strict glob (no exclude entry).
- [ ] The Phase 0 fence job stays green: no httpx, requests, socket, anthropic, openai, or langgraph imports are added.
- [ ] The forbidden-patterns pre-commit hook covers src/codegenie/probes/layer_c/runtime_trace.py. S5-01 already extended _is_under_phase2_banned_package for probes/layer_c/scenario_result.py; verify by inspecting scripts/check_forbidden_patterns.py::_is_under_phase2_banned_package that the predicate matches the whole probes/layer_c/ subdirectory, and if it is narrower, extend it in this story's PR, mirroring S5-01 AC-11's pattern. A dedicated test (tests/unit/pre_commit/test_forbidden_patterns_phase2_runtime_trace.py) writes synthetic model_construct / subprocess.run / asyncio.create_subprocess_exec source under src/codegenie/probes/layer_c/synth_runtime_trace.py (tmp_path-rooted) and asserts the script exits non-zero, emitting both the 02-ADR-0010 §Decision and production ADR-0033 §3 substrings. Negative coverage: the same source under probes/layer_a/synth.py exits zero.
Implementation outline
- Step 0: Extend `src/codegenie/cache/keys.py::declared_inputs_for` to dispatch special tokens before the existing rglob path. Add a new helper `_resolve_special_token(token: str, snapshot: RepoSnapshot, ctx: ProbeContext) -> str`, a pure function over strings whose only impure call is `ctx.image_digest_resolver(snapshot.root)`. The regex `_SPECIAL_TOKEN_RE = re.compile(r"^([a-z0-9_-]+):<resolved>$")` decides which entries are tokens and which are globs. The dispatch is `match token_name: case "image-digest": ...; case _: raise CacheKeyError(reason="unknown_special_token", token=token)`, which is `assert_never`-equivalent via the explicit raise. Inject the resolved string into the content-hash tuple that `key_for` already constructs, using the sentinel `""` when the resolver returns `None`, is unbound (`None`), or raises, so the cache key is stable across all three "unresolved" paths. Add `CacheKeyError` to `cache/keys.py` (alongside the existing types in that file). No `ProbeContext` schema edit: the field S1-09 added is already present.
- Step 1: Define a `ScenarioSpec` Pydantic model in `src/codegenie/probes/layer_c/runtime_trace.py` with required `name: str`, optional `command: list[str]` (the argv passed to `docker run`), and optional `expected_exit_code: int = 0`. Define `ScenariosConfig(scenarios: list[ScenarioSpec])`. Both use `frozen=True, extra="forbid"`.
- Step 2: Define `_DEFAULT_SCENARIOS: Final[tuple[ScenarioSpec, ...]]` with the five canonical names. Each default carries a minimal `command` argv (e.g., `["sh", "-c", "exit 0"]` for `startup`; the `smoke_test` / `healthcheck` / `shutdown` / `error_path` defaults follow the `localv2.md §5.3 C4` prose). A names-only constant, `_DEFAULT_SCENARIO_NAMES: Final[tuple[str, ...]] = ("startup", "smoke_test", "healthcheck", "shutdown", "error_path")`, serves log and render sites that need only the names.
- Step 3: Implement `RuntimeTraceProbe.declared_inputs` as a class attribute (matching the kernel ABC at `base.py:81`): `declared_inputs: list[str] = ["Dockerfile", ".codegenie/scenarios.yaml", "image-digest:<resolved>"]`. The `image-digest:<resolved>` form is the literal string the new `cache/keys.py::_resolve_special_token` dispatch recognizes (Step 0); the resolver substitution happens inside `cache/keys.py`, not here.
- Step 4: Implement `RuntimeTraceProbe.run(self, snapshot: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput`:
  - (a) Resolve `image_digest, unresolved_reason = _resolve_image_digest(ctx, snapshot.root)`, a pure-ish helper returning `(digest_str_or_None, "resolver_unbound" | "resolver_returned_none" | "resolver_raised" | None)`. It wraps the resolver call in `try/except Exception` (Rule 5/12: fail loud at the call site, not silently) and structured-logs the outcome.
  - (b) If `image_digest is None`: short-circuit and emit a `ProbeOutput` with `scenarios_run=[]`, `scenarios_failed=list(_DEFAULT_SCENARIO_NAMES)` (all five emit `TraceScenarioFailed(reason=ImageDigestUnresolved())`), the all-`TraceScenarioFailed` per-scenario list, slice `built_image_digest=None`, slice `trace_coverage_confidence="unavailable"`, and envelope `confidence="low"` (contract preservation).
  - (c) Otherwise, load `scenarios.yaml` (Pydantic-validated via `ScenariosConfig`), falling back to `_DEFAULT_SCENARIOS` if the file is absent. Malformed YAML yields all five as `TraceScenarioFailed(reason=DockerBuildFailed(stderr_tail="scenarios.yaml malformed: <error>"))` (the closest S5-01 variant; do NOT add new variants here, per Rule 3) with envelope `confidence="low"`.
  - (d) Detect the platform: if `sys.platform != "linux"`, emit one `TraceScenarioFailed(reason=StraceUnavailable())` per scenario and do not call `run_allowlisted` at all. Envelope `confidence="low"`; slice `trace_coverage_confidence="unavailable"`.
  - (e) Otherwise (Linux), wrap the loop in `asyncio.wait_for(_run_all_scenarios(scenarios=…, image_digest=…, ctx=ctx, snapshot=snapshot), timeout=_AGGREGATE_TIMEOUT_S)`. Inside `_run_all_scenarios`, declare `image_built = False` as a local (NOT `self._image_built`) and iterate the scenarios with an explicit `await` per iteration (no `asyncio.gather`, no `TaskGroup`). For each scenario, `await asyncio.wait_for(task, timeout=_PER_SCENARIO_TIMEOUT_S)`, where `task` is `_execute_scenario(scenario, image_ref, image_built, ctx, snapshot)` wrapped in `asyncio.create_task(..., name=_SCENARIO_TASK_NAME_PREFIX + scenario.name)`. The explicit name matters because `wait_for` does not name a task itself (and on Python 3.12+ does not create one for a bare coroutine), so without it `test_concurrent_task_count_le_one` would observe nothing. The first iteration receives `image_built=False` and triggers the `docker build`; subsequent iterations receive `image_built=True` from the loop's accumulator. On a per-scenario `TimeoutError`, emit `TraceScenarioFailed(reason=ScenarioTimeout(seconds=120))`. On the aggregate `TimeoutError`, not-yet-started scenarios get `TraceScenarioSkipped(reason=ImageBuildUnavailable())` (the closest S5-01 variant for "didn't run").
  - (f) `_execute_scenario` accepts `image_built: bool` as input and returns `(ScenarioResult, image_built_after: bool)`. When `image_built=False` it first calls `run_allowlisted("docker", ["build", "-t", _image_ref_for_digest(image_digest), "-f", "Dockerfile", str(snapshot.root)])` and sets `image_built_after=True`. It then calls `run_allowlisted("strace", _build_strace_argv(image_ref, scenario.command))`. The strace argv (built by the pure `_build_strace_argv` helper) is `["-f", "-e", "trace=openat,execve,connect,bind,mmap", "--", "docker", "run", *_HARDENING_FLAGS, "--", image_ref, *scenario.command]`. Capture stdout/stderr and parse the strace output via `_parse_strace_lines` into the slice fields.
  - (g) `_aggregate_scenarios(results)` (a pure function) folds the per-scenario `ScenarioResult`s into the slice; `_derive_trace_coverage_confidence(results)` derives the tetra-state. Envelope `confidence` is `trace_coverage_confidence` clipped to the tri-state Literal: `{"high": "high", "medium": "medium", "low": "low", "unavailable": "low"}`. Pin this mapping in a pure `_envelope_confidence(slice_confidence) -> Literal["high","medium","low"]` function.
- Step 5: Implement the strace-output parser as a small pure function, `_parse_strace_lines(lines: Iterable[str]) -> ParsedTrace`, returning a frozen `ParsedTrace(BaseModel, frozen=True, extra="forbid")` model with set-valued fields (`binaries_executed: frozenset[str]`, `shared_libs_loaded: frozenset[str]`, `cert_paths_read: frozenset[str]`, `files_read_at_runtime: frozenset[str]`, `network_endpoints_touched: frozenset[tuple[str, str]]`) and one count-valued field (`shell_invocations: int`). It is golden-tested against a fixture strace snippet at `tests/fixtures/strace/minimal.strace`, property-tested for permutation stability of the set fields (`tests/property/test_strace_parser_commutativity.py`), and resilience-tested against malformed input (it returns all-empty frozensets and a zero count, and does NOT raise).
- Step 6: Write artifacts (one `.strace` per scenario plus a merged `runtime-trace.json`) under `.codegenie/context/raw/`; the slice carries `artifact_uri` and `per_scenario_artifacts`. Skipped scenarios get `per_scenario_artifacts[name] = None`, and a test asserts this representation explicitly.
- Step 7: The slice flows back to the coordinator as `ProbeOutput.schema_slice: dict[str, JSONValue]`; the writer chokepoint (S3-03) wraps it in `RedactedSlice` via `SecretRedactor`. There is no `model_construct` anywhere in the module (the `forbidden-patterns` test backstops this).
- Step 8: Required module-level constants: `_HARDENING_FLAGS: Final[tuple[str, ...]]`, `_PER_SCENARIO_TIMEOUT_S: Final[int]`, `_AGGREGATE_TIMEOUT_S: Final[int]`, `_IMAGE_REF_PREFIX: Final[str] = "codegenie-trace:"`, `_DEFAULT_SCENARIOS: Final[tuple[ScenarioSpec, ...]]`, `_DEFAULT_SCENARIO_NAMES: Final[tuple[str, ...]]`, and `_SCENARIO_TASK_NAME_PREFIX: Final[str] = "runtime_trace_scenario_"`. All are exposed at module level so tests can import them.
- Registering `@register_index_freshness_check("runtime_trace")` is deferred to S5-05; this story does not register it.
TDD plan: red / green / refactor
Red:
1. `test_register_probe_heaviness_heavy`: registry introspection asserts `RuntimeTraceProbe` is registered with `heaviness == "heavy"` and `runs_last is False`. Initial state: module import fails.
2. `test_declared_inputs_literal_three_entries`: asserts `RuntimeTraceProbe().declared_inputs == ["Dockerfile", ".codegenie/scenarios.yaml", "image-digest:<resolved>"]`. Failure mode: drift in order, count, or token shape.
3. `test_class_attributes_pinned`: asserts `applies_to_tasks == ["*"]`, `applies_to_languages == ["*"]`, `requires == []`, `cache_strategy == "content"`, `tier == "base"`, `layer == "C"`, `name == "runtime_trace"`.
4. `test_concurrent_task_count_le_one`: instrument the probe via a `self._scenario_in_progress: asyncio.Event` hook set inside `_execute_scenario` for the duration of one scenario run; an observer task loops on `await asyncio.sleep(0)` and snapshots `len([t for t in asyncio.all_tasks() if t.get_name().startswith(_SCENARIO_TASK_NAME_PREFIX)])`. The test asserts (a) `count <= 1` at every sample; (b) `len(samples) >= 10`; (c) at least one sample observes `count == 1`, so the test cannot pass vacuously if the per-scenario tasks are left unnamed. This is the load-bearing test for `final-design.md §"Where security/best-practices traded off perf" (a)`: it encodes the risk that "per-scenario sequential `RuntimeTraceProbe` execution can be silently parallelized by a future contributor." The assertion is on observed task count, not on the absence of `asyncio.gather` in source (which is bypassable).
5. `test_macos_no_strace_invocation`: monkeypatch `sys.platform` to `"darwin"`; mock `run_allowlisted` with a spy that raises on `argv[0] in {"strace", "sudo", "dtruss"}`; run the probe; assert the spy never raises and every scenario is `TraceScenarioFailed(reason=StraceUnavailable())`. (No `os.uname` variant: the canonical detector is `sys.platform != "linux"`.)
6. `test_macos_no_tty_interaction`: mock `run_allowlisted` to fail loudly if `stdin` is anything other than `DEVNULL`; run the macOS-platform path; assert no failure (the probe never opens a TTY).
7. `test_hardening_flags_constant_pinned`: `_HARDENING_FLAGS == ("--network=none", "--cap-drop=ALL", "--security-opt=no-new-privileges")` exactly. This catches typo mutations in any one of the three flags that pure set membership would miss.
8. `test_hardening_flags_in_argv`: mock `run_allowlisted` to capture argv; run a single-scenario fixture on the Linux-platform path; assert (a) the captured argv for the `docker run` segment contains every element of `_HARDENING_FLAGS` exactly once (order-independent set membership); (b) the argv contains a literal `"--"` token immediately preceding `image_ref`; (c) no argv element equals the string concatenation of the three flags. Mutation test: dropping `*_HARDENING_FLAGS` from the `docker run` argv flips this red; deleting `--network=none` from the constant itself is caught by `test_hardening_flags_constant_pinned`, because this test reads the expected flags from the constant.
9. `test_no_run_external_cli_in_source`: open `src/codegenie/probes/layer_c/runtime_trace.py` and `assert "run_external_cli" not in source and "run_allowlisted" in source`.
10. `test_per_scenario_timeout_120s_constant` / `test_aggregate_timeout_600s_constant`: import `_PER_SCENARIO_TIMEOUT_S` / `_AGGREGATE_TIMEOUT_S` and assert their literal values (120 and 600).
11. `test_per_scenario_timeout_triggers_failed`: mock `_execute_scenario` to sleep 200 seconds on a mocked clock; assert the result is `TraceScenarioFailed(reason=ScenarioTimeout(seconds=120))`, and assert the aggregate loop did not also time out.
12. `test_aggregate_timeout_triggers_failed_all_remaining`: mock the first scenario to run past the 600 s aggregate budget, with `_PER_SCENARIO_TIMEOUT_S` monkeypatched above that budget so the aggregate timer fires first; subsequent scenarios must not start. The slice reflects 1 `TraceScenarioFailed(ScenarioTimeout)` (the cancelled in-flight scenario) plus 4 `TraceScenarioSkipped(ImageBuildUnavailable)` for the not-yet-started scenarios (the closest S5-01 variant for "didn't run"). Documented in the module docstring.
13. `test_docker_build_failure_all_skipped`: mock `run_allowlisted` to return a non-zero exit for the `docker build` argv; assert all five `ScenarioResult`s are `TraceScenarioSkipped(reason=ImageBuildUnavailable(...))`; assert envelope `confidence == "low"` (NOT `"unavailable"`: contract preservation); assert slice `trace_coverage_confidence == "unavailable"`.
14. `test_image_digest_resolver_returns_none_failed`: bind a resolver returning `None`; assert all five `ScenarioResult`s are `TraceScenarioFailed(reason=ImageDigestUnresolved())` (NOT Skipped: the variant lives in S5-01's `TraceFailureReason`); assert envelope `confidence == "low"`, slice `built_image_digest is None`, slice `trace_coverage_confidence == "unavailable"`, and structured-log field `image_digest_unresolved_reason == "resolver_returned_none"`.
15. `test_image_digest_resolver_unbound_failed`: `ctx.image_digest_resolver is None`; same envelope/slice shape as test 14 (Failed, not Skipped); structured-log field `image_digest_unresolved_reason == "resolver_unbound"`.
16. `test_image_digest_resolver_raises_translated_to_failed`: mock the resolver to raise `RuntimeError("boom")`; assert the probe completes (does NOT raise out of `run()`); assert all five are `TraceScenarioFailed(reason=ImageDigestUnresolved())`; envelope `confidence == "low"`; structured-log `image_digest_unresolved_reason == "resolver_raised"` AND `image_digest_resolver_error_repr` contains `"RuntimeError"` and does NOT contain the message body `"boom"` (defensive against PII).
17. `test_envelope_confidence_contract_preserved`: `inspect.get_annotations(ProbeOutput)["confidence"]` evaluates to `Literal["high","medium","low"]`; a parametrized run over all six envelope-failure paths (build failure / resolver returned None / resolver unbound / resolver raised / aggregate timeout / all scenarios timed out) asserts envelope `confidence ∈ {"high","medium","low"}`, never `"unavailable"`.
18. `test_cache_special_token_dispatch_recognizes_image_digest` (`tests/unit/cache/test_special_token_dispatch.py`): a synthetic probe with `declared_inputs=["Dockerfile", "image-digest:<resolved>"]` and two `ctx` instances whose resolvers return different digests; assert `key_for(probe, snapshot, task)` produces two distinct cache keys; the same resolver invoked twice on identical state yields a byte-identical key; the three "unresolved" paths (returned None / unbound / raised) all fold to the same sentinel and produce the same cache key.
19. `test_cache_special_token_dispatch_unknown_raises` (same file): a synthetic probe with `declared_inputs=["bogus:<resolved>"]`; `key_for(...)` raises `CacheKeyError` whose `str(exc)` contains both `"unknown_special_token"` AND `"bogus:<resolved>"`.
20. `test_cache_resolves_runtime_trace_hit_skips_scenarios`: an integration test depending on tests 18 and 19. Run twice with the same fixture and the same resolver returning the same digest, spying on `_execute_scenario`; the first-run call count is 5 and the second-run call count is 0; the second-run slice JSON is byte-identical to the first run's (modulo `gathered_at` / wall-clock fields).
21. `test_scenarios_yaml_pydantic_validation`: malformed `scenarios.yaml` → envelope `confidence="low"` and all five `TraceScenarioFailed(reason=DockerBuildFailed(stderr_tail="scenarios.yaml malformed: ..."))`. Missing file → default fallback (the envelope-succeeds path with `_DEFAULT_SCENARIOS`).
22. `test_trace_coverage_confidence_derivation`: table-driven over `n_completed` = 5, 4, 3, 2, 1, 0 → `"high"`, `"medium"`, `"medium"`, `"medium"`, `"low"`, `"unavailable"`. Tests `_derive_trace_coverage_confidence(...)` directly as a pure function.
23. `test_aggregate_scenarios_is_exhaustive_match`: pass `[completed, failed, skipped]`; assert the returned slice fields match the expected mapping. A separate `test_aggregate_scenarios_warn_unreachable_smoke` runs `mypy --warn-unreachable` against a copy of `runtime_trace.py` with one `case` arm of `_aggregate_scenarios` deleted and asserts mypy errors with `"Statement is unreachable"` (mirrors the S5-01 AC-6 smoke test).
24. `test_writer_chokepoint_secret_redaction`: a fixture whose smoke-test command echoes `AKIA0123456789ABCDEF`; capture the writer's `RedactedSlice`; assert `findings_count >= 1`; assert the plaintext is absent from every `.codegenie/context/raw/` output file (a grep-walk asserts 0 occurrences).
25. `test_image_ref_for_digest_format` (`tests/unit/probes/layer_c/test_runtime_trace_argv_builders.py`): parametrized over `[("sha256:cafef00ddeadbeef0123456789abcdef", "codegenie-trace:cafef00ddead"), ("cafef00ddeadbeef0123456789abcdef", "codegenie-trace:cafef00ddead")]`; `ValueError` for `""` and for `"not-a-digest"`.
26. `test_build_strace_argv_explicit_dash_dash_separator` (same file): `_build_strace_argv("codegenie-trace:cafef00ddead", ["sh", "-c", "exit 0"])` returns argv whose first `"--"` token sits immediately before `"docker"` (separating strace's own args from the wrapped command) and whose only other `"--"` is the `docker run` separator immediately before `image_ref`; `--network=none` appears after the first `"--"` and before `image_ref`.
27. `test_build_docker_run_argv_contains_all_hardening_flags` (same file): `_build_docker_run_argv(...)` (a) contains every `_HARDENING_FLAGS` element exactly once; (b) contains a literal `"--"` immediately before `image_ref`; (c) contains no element equal to the string concatenation of the three flags.
28. `test_parse_strace_lines_golden_fixture`: load `tests/fixtures/strace/minimal.strace`; `_parse_strace_lines(lines)` returns exactly the expected `ParsedTrace` instance.
29. `test_parse_strace_lines_malformed_resilience`: `_parse_strace_lines(["this is not strace", "neither is this"])` returns the all-empty `ParsedTrace` and does NOT raise.
30. `test_parse_strace_lines_permutation_stability` (`tests/property/test_strace_parser_commutativity.py`, Hypothesis-driven): for any permutation of the golden fixture's lines, the set-valued fields are equal. `shell_invocations` is the documented non-commutative exception.
31. `test_image_built_local_not_instance`: spy on `run_allowlisted`; `probe = RuntimeTraceProbe(); await probe.run(...); await probe.run(...)`; assert the `docker build` argv is invoked exactly twice in total (once per run); assert `getattr(probe, "_image_built", "<absent>") == "<absent>"` (no such instance attribute exists).
32. `test_six_plus_scenarios_via_yaml_zero_source_edit`: a fixture with `.codegenie/scenarios.yaml` declaring 7 scenarios (`startup`, `smoke_test`, `healthcheck`, `shutdown`, `error_path`, `db_migrate`, `worker_drain`); run the probe; assert 7 scenarios executed; assert `set(slice["scenarios_run"]) | set(slice["scenarios_failed"]) | set(slice["per_scenario_artifacts"].keys())` covers all 7 names; assert `_DEFAULT_SCENARIO_NAMES == ("startup", "smoke_test", "healthcheck", "shutdown", "error_path")` (unchanged from the canonical 5: the operator-side extension required no source edit).
33. `test_default_scenarios_source_scan_uniqueness`: `grep -rn "_DEFAULT_SCENARIOS" src/codegenie/` finds matches in exactly one file (`probes/layer_c/runtime_trace.py`).
34. `test_scenarios_run_failed_skipped_routing`: parametrized over the result list `[Completed("a"), Failed("b"), Skipped("c"), Completed("d"), Failed("e")]`; assert `scenarios_run == ["a", "d"]`, `scenarios_failed == ["b", "e"]`, `set(per_scenario_artifacts.keys()) == {"a","b","c","d","e"}`, and `per_scenario_artifacts["c"] is None`.
35. `test_slice_schema_is_complete_observable_surface`: a snapshot test asserting `set(slice.keys()) == EXPECTED_SLICE_KEYS` for both a healthy 5/5 run and an all-`Failed(ImageDigestUnresolved)` run.
36. `test_forbidden_patterns_phase2_runtime_trace`: parametrized synthesis (mirrors S5-01 AC-11): `model_construct` / `subprocess.run` / `asyncio.create_subprocess_exec` source forms under `src/codegenie/probes/layer_c/synth_runtime_trace.py` (in `tmp_path`) each exit non-zero with both `02-ADR-0010 §Decision` and `production ADR-0033 §3`. Negative: the same source under `probes/layer_a/synth.py` exits zero.
37. `test_mypy_warn_unreachable_is_repo_wide`: parses `pyproject.toml`; asserts `[tool.mypy] warn_unreachable == True`; asserts that neither the global `[tool.mypy] exclude` pattern nor any `[[tool.mypy.overrides]]` block whose `module` pattern covers `codegenie.probes.layer_c.runtime_trace` removes or relaxes the module (it is covered by default). Since `exclude` is a global option rather than a per-module override, the override check looks for `ignore_errors = true` or `warn_unreachable = false`.
Green:
- Land Step 0 first: extend `cache/keys.py::declared_inputs_for` with the `_resolve_special_token` dispatch and `CacheKeyError`. Tests 18 and 19 turn green here.
- Implement `RuntimeTraceProbe` per the implementation outline (steps 1–8).
- Implement `_parse_strace_lines` against `tests/fixtures/strace/minimal.strace`.
- Make all red tests pass; do NOT introduce mocks the tests don't already expect.
Refactor:
- Confirm `_execute_scenario(scenario, image_ref, image_built, ctx, snapshot) -> (ScenarioResult, bool)` is a module-level async function that holds no probe-instance state, so it is testable in isolation without mocking the probe class. (It performs I/O through `run_allowlisted`, so it is not pure in the strict sense.)
- Confirm `_build_strace_argv` / `_build_docker_run_argv` / `_image_ref_for_digest` / `_envelope_confidence` / `_derive_trace_coverage_confidence` / `_parse_strace_lines` / `_aggregate_scenarios` are pure module-private functions, each with at least one dedicated unit test that does NOT mock `run_allowlisted`.
- Confirm structured-log fields land via `structlog`'s context binding (`logger.bind(probe="runtime_trace", scenario=name)`), with each per-scenario dispatch carrying its own binding.
- Confirm `__all__` exports only `RuntimeTraceProbe`; internal builders are module-private (leading underscore).
- Confirm `cache/keys.py`'s `_resolve_special_token` is a `match` over token names with an `assert_never`-equivalent otherwise arm (the explicit `raise CacheKeyError(reason="unknown_special_token", token=…)`); future tokens add arms via ADR amendment to 02-ADR-0004.
Files to touch
- New: `src/codegenie/probes/layer_c/runtime_trace.py`, `tests/fixtures/strace/minimal.strace`, `tests/fixtures/scenarios/{empty.yaml,malformed.yaml,three_only.yaml,seven_scenarios.yaml}`.
- New tests: `tests/unit/probes/layer_c/test_runtime_trace.py` (the red tests above that are not assigned to a dedicated file below), `tests/unit/probes/layer_c/test_runtime_trace_no_external_cli_wrap.py` (source-grep test), `tests/unit/probes/layer_c/test_runtime_trace_argv_builders.py` (pure-function builders), `tests/unit/cache/test_special_token_dispatch.py` (the new `cache/keys.py` extension), `tests/property/test_strace_parser_commutativity.py` (Hypothesis-driven permutation stability), and `tests/unit/pre_commit/test_forbidden_patterns_phase2_runtime_trace.py` (mirrors the S5-01 AC-11 shape).
- Existing, edit required: `src/codegenie/cache/keys.py`: extend `declared_inputs_for` with the `_resolve_special_token` dispatch and add `CacheKeyError`. S5-02 is the first consumer of the special-token mechanism; S1-09 added the `ProbeContext.image_digest_resolver` field but did NOT extend `cache/keys.py`. Possibly extend: `scripts/check_forbidden_patterns.py::_is_under_phase2_banned_package`, if the predicate doesn't already cover `probes/layer_c/runtime_trace.py` (S5-01 covered `probes/layer_c/scenario_result.py`; verify the predicate is path-scoped to the whole `probes/layer_c/` subdirectory).
- Existing, read-only references: `src/codegenie/probes/layer_c/scenario_result.py` (S5-01: variant set), `src/codegenie/probes/_shared/scanner_outcome.py` (S5-01: NOT consumed by this probe; listed for cross-reference), `src/codegenie/probes/base.py` (the `ProbeContext.image_digest_resolver` field after S1-09, and the `Probe.confidence: Literal["high","medium","low"]` contract pinned by this story's test), `src/codegenie/exec/__init__.py` (`run_allowlisted`; S1-06 lands `docker` / `strace` in `ALLOWED_BINARIES`), `src/codegenie/output/writer.py` (the writer's `RedactedSlice` signature, S3-03), and `src/codegenie/cache/keys.py::key_for` (downstream consumer of the new dispatch).
- No edit: `pyproject.toml` `[tool.mypy]` (`warn_unreachable = true` is already repo-wide, S1-02 / S1-11); `docs/localv2.md §4` (the special-token form is already permitted there); `src/codegenie/probes/base.py` (`image_digest_resolver` is already present, S1-09).
Out of scope
- The freshness-check registration `@register_index_freshness_check("runtime_trace")`: S5-05 lands it. This story's probe emits the slice fields B2 reads; S5-05 wires the freshness function.
- The `image_digest_drift` adversarial test: S5-05.
- The `adversarial_dockerfile` container-hardening test: S5-06. This story makes the hardening flags present and tested at unit level; S5-06 proves the flags actually contain a forkbomb.
- `DockerfileProbe`, `EntrypointProbe`, `ShellUsageProbe`, `CertificateProbe`: S5-03.
- `SyftProbe`, `GrypeProbe`: S5-04 (which declares `requires=["runtime_trace"]` per the dispatch-ordering ADR; see S5-04's `requires` mechanism).
- The sub-schema `src/codegenie/schema/probes/layer_c/runtime_trace.schema.json`: S5-03 lands it (this story emits the dict shape; S5-03's sub-schema validates it).
- Bench (cold p50 ~90 s): S8-03 lands the canary; this story's unit tests do not exercise wall-clock targets.
Notes for the implementer
- The single most load-bearing test in this story is `test_concurrent_task_count_le_one`. It encodes `final-design.md §"Where security/best-practices traded off perf" (a)`, "sequential runtime trace scenarios (~75 s wall-clock floor vs. theoretical 15 s if parallel)", which was accepted because parallel traces against the same image race resources and confuse attribution. A future PR that introduces `asyncio.gather` over scenarios will flip this red. Do not weaken the assertion to "no `gather` literal in source"; that is bypassable. Assert on observed task count, not on syntax.
- Envelope `confidence` contract preservation. `Probe.confidence: Literal["high","medium","low"]` is frozen at `src/codegenie/probes/base.py:68` (and `localv2.md §4` line 328). The probe's slice carries `trace_coverage_confidence: Literal["high","medium","low","unavailable"]` (a tetra-state: the Phase-2 extension of the tri-state in `localv2.md §5.3 C4`); the envelope's `confidence` clips this to the contract via `_envelope_confidence`. NEVER widen the envelope `confidence` Literal, even if it seems harmless: that contract amendment requires an ADR-gated `Probe`-ABC change this story does not have. `test_envelope_confidence_contract_preserved` is the structural defense.
- `ImageDigestUnresolved` is a `TraceFailureReason` variant, NOT a `TraceSkipReason` variant (S5-01 HARDENED variant set). The resolver-returned-None, resolver-unbound, and resolver-raised paths all emit `TraceScenarioFailed(reason=ImageDigestUnresolved())`, NOT `Skipped`. Docker-build-failure paths emit `TraceScenarioSkipped(reason=ImageBuildUnavailable())`. The semantic distinction: a scenario that failed to acquire its prerequisite (the image digest) was attempted; a scenario that was never attempted because the image build itself failed was skipped. Do NOT add new variants to S5-01 from this story (Rule 3: surgical). If the implementer encounters a path that S5-01's variant set genuinely doesn't cover, surface it to the user and amend S5-01 in a separate PR.
- The cache `_resolve_special_token` dispatch lives in this story. S1-09 added the `ProbeContext.image_digest_resolver` field but did NOT extend `cache/keys.py`; inspection confirms `cache/keys.py::declared_inputs_for` (lines 94–126) literally rglobs every entry and silently drops non-matches. As the first consumer, this story lands the resolver. The dispatch is a `match` on the token name with an explicit `raise CacheKeyError(reason="unknown_special_token", token=…)` on the otherwise arm: an Open/Closed seam for future tokens (`scip-index-output:`, `tree-sitter-grammar-set:`), whose arms are added via ADR amendment to 02-ADR-0004. Fold the resolved string (or the sentinel `""` for the None / unbound / raised cases) into the content-hash tuple `key_for` already constructs.
- The macOS path is permanent. Resist the urge to add a "TODO: implement dtruss with sudo" comment. The synthesis explicitly chose `StraceUnavailable` over a sudo-prompting dtruss path because the sudo prompt would break determinism and CI is Linux-canonical. The macOS path emits the typed failure so S5-05's freshness check and S8-01's renderer surface it loudly. The canonical detector is `sys.platform != "linux"` (NOT `os.uname().sysname`); use it consistently.
- Layer C does NOT use `run_external_cli` (02-ADR-0001; `final-design.md §"Departures"` #1). The `run_external_cli` wrapper (S1-07) adds `bubblewrap --unshare-net` and env-stripping for Layer B/G scanners. For Layer C, the equivalent isolation is the `--network=none --cap-drop=ALL --security-opt=no-new-privileges` flags constructed at the call site: a different mechanism with the same outcome. Wrapping `docker` inside `bubblewrap --unshare-net` would prevent `docker build` from working (Docker daemon socket access). The `test_no_run_external_cli_in_source` smoke test is the structural enforcement.
- `image_digest_resolver` raising path. Any exception from `ctx.image_digest_resolver(repo_root)` is caught and translated to a per-scenario `TraceScenarioFailed(reason=ImageDigestUnresolved())` plus structured-log `image_digest_unresolved_reason="resolver_raised"`. Only the exception's type (e.g. `type(exc).__name__`), never `str(exc)` or the message body, goes into the separate structured-log field `image_digest_resolver_error_repr`, as a defense against PII leaking via exception text. Note that `repr(exc)` is not sufficient: `repr(RuntimeError("boom"))` is `"RuntimeError('boom')"`, which includes the message. The probe never raises out of `run()`.
- Cache HIT semantics. When the Phase 0 `Cache` returns a HIT (the resolved `image-digest:<digest>` token matches the cached token), the probe's `run()` is not re-entered for the scenarios block; the cached slice is returned. The "second-run `_execute_scenario` call count == 0" assertion guards this. If you find yourself touching `Cache.get` / `put`, stop: the Phase 0 `Cache` already handles HIT short-circuiting via `key_for`; this probe only needs to emit `declared_inputs` correctly and accept the cached envelope.
- Aggregate timeout semantics. When the aggregate 600 s budget expires mid-scenario, the not-yet-started scenarios get `TraceScenarioSkipped(reason=ImageBuildUnavailable())` (the closest existing S5-01 variant for "didn't run"). The currently-executing scenario, on `asyncio.CancelledError`, gets `TraceScenarioFailed(reason=ScenarioTimeout(seconds=<remaining>))`. Document this in the module docstring so a future maintainer doesn't conflate the two paths.
- `_image_built` is per-`run()`, not per-instance. The flag lives as a local in `_run_all_scenarios(...)` (or as an accumulator threaded through the per-scenario tuple return) and is passed explicitly into `_execute_scenario(...)`. There is NO `self._image_built` attribute. If the coordinator ever reuses a probe instance across gathers (a possibility the kernel hasn't ruled out), instance-level state would poison the second `run()`: the image may have been rebuilt between gathers, and `docker build` MUST run exactly once per `run()` invocation. `test_image_built_local_not_instance` is the structural defense.
- No `pytest-xdist`: Phase 2 ADR-0009 vetoed parallel test execution, so even this probe's unit tests run serially. The wall-clock cost is paid in CI's `unit` job budget (≤ 90 s per the Step 5 README; verify in S8-03's bench canary).
- The slice's `built_image_digest` and `last_traced_image_digest` are what S4-01's `IndexHealthProbe` reads. Today they are identical when a fresh trace succeeds; S5-05 introduces the `image_digest_drift` adversarial that drives them apart so B2 emits `Stale(DigestMismatch(...))`.
- Strace parsing is pure. `_parse_strace_lines(lines: Iterable[str]) -> ParsedTrace` runs over an iterable of lines and returns a frozen Pydantic model. The set-valued fields (`binaries_executed`, `shared_libs_loaded`, `cert_paths_read`, `files_read_at_runtime`, `network_endpoints_touched`) are `frozenset`s, so permutation stability is structural; only `shell_invocations: int` is non-commutative (the count may differ under reorderings that group or un-group exec lineages; documented in the module docstring). The Hypothesis property test (`tests/property/test_strace_parser_commutativity.py`) exercises permutation stability for the set fields.
- Image-ref smart-constructor format. `_image_ref_for_digest(digest: str) -> str` returns `_IMAGE_REF_PREFIX + _short(digest)`, where `_IMAGE_REF_PREFIX: Final[str] = "codegenie-trace:"` (module constant) and `_short` strips any leading `"sha256:"` prefix, then takes the first 12 hex characters. Empty or non-hex inputs raise `ValueError`. The format is pinned in this one helper; no call site concatenates `"codegenie-trace:"` itself.
- Pure argv builders. `_build_strace_argv` and `_build_docker_run_argv` are module-private pure functions; each is importable by `tests/unit/probes/layer_c/test_runtime_trace_argv_builders.py` and asserted without mocking `run_allowlisted`. In the strace builder's output, the first `"--"` token separates strace's own args from the wrapped `docker run` invocation, and the only other `"--"` is the `docker run` separator before `image_ref`; mutation #3 (argv-merge regressions) is caught by `test_build_strace_argv_explicit_dash_dash_separator`.
- Operator extensibility for scenarios. Adding a 6th, 7th, … scenario is a `.codegenie/scenarios.yaml` operator edit requiring zero `runtime_trace.py` edits; `test_six_plus_scenarios_via_yaml_zero_source_edit` is the structural defense. Adding a new canonical default scenario is a separate (rarer) event: a `localv2.md §5.3 C4` doc amendment plus a `_DEFAULT_SCENARIOS` constant edit, with the source-scan uniqueness test catching drift if a sibling module hardcodes its own list.
- Newtype deferral (S1-05). `image_ref: str`, `image_digest: str`, and `scenario_name: str` each cross ≥ 2 module boundaries (probe ↔ cache, probe ↔ slice, probe ↔ structured log). S1-05 is the canonical newtype story; mirror S5-01 DF-5's deferral and do NOT introduce newtypes here (Rule 2: premature abstraction with only one in-tree producer). When S1-05 lands (or a 3rd consumer of these strings emerges), the migration is a one-pass rename plus alias.
- Trace-backend Protocol deferral (Phase 5 / Phase 7). The macOS/Linux split is one `if` today (two cases, below the rule-of-three threshold). When a 3rd backend lands (microVM ptrace in Phase 5? dtrace under Chainguard distroless in Phase 7?), refactor to a `_TraceBackend` Protocol with `Strace`, `Unavailable`, and `Ptrace` implementations. Today's `if` is the boring shape and is fine per CLAUDE.md Rule 2 ("three similar lines is better than a premature abstraction").
- Producer/consumer `assert_never` ladder. This story is the 1st canonical producer of `ScenarioResult` (S5-01 introduced the type). Document in the module docstring: producers = {`RuntimeTraceProbe`}; consumers = {`_aggregate_scenarios` (in-module), the S5-05 freshness check, the S8-01 renderer}. Mirror S5-01's "rehearse the discipline at every level": the `match` runs on the top-level `ScenarioResult` AND on the inner `TraceFailureReason` / `TraceSkipReason` reasons where used.
- `forbidden-patterns` extension. S5-01 covered `probes/layer_c/scenario_result.py`. This story adds the 2nd `probes/layer_c/` module; verify by inspection that `scripts/check_forbidden_patterns.py::_is_under_phase2_banned_package` matches the whole `probes/layer_c/` subdirectory (most likely it does, since S5-01 prescribed a path-scoped predicate). If not, extend it in this story's PR mirroring the S5-01 AC-11 pattern; the dedicated `tests/unit/pre_commit/test_forbidden_patterns_phase2_runtime_trace.py` is the structural defense.
- mypy enforcement. `[tool.mypy] warn_unreachable = true` has been repo-wide since Phase 0 S1-02 (`pyproject.toml` line 141; established by S1-11 validation, reaffirmed in S5-01 validation). No per-module override is needed. `test_mypy_warn_unreachable_is_repo_wide` is the cross-cutting defense.
- Open question: distroless target image (Phase 7, forward-looking). The `distroless-target` fixture (S7-01) exercises `RuntimeTraceProbe` against an image where `strace` cannot attach (distroless has no `/proc/self/exe` symlink for the host strace to read against). Today it takes the same shape as macOS: `TraceScenarioFailed(reason=StraceUnavailable())` per scenario, surfaced via structured log. Document this in the module docstring as an open path that S7-01 stresses.