
Story S5-02 — RuntimeTraceProbe — sequential 5-scenario harness + image-digest token

Step: Step 5, Ship Layer C (runtime + container) probes
Status: Done (GREEN 2026-05-17; see _attempts/S5-02.md)
Effort: L

Depends on:

  • S5-01 (ScenarioResult discriminated union)
  • S1-09 (ProbeContext.image_digest_resolver)
  • S1-08 (@register_probe(heaviness="heavy"))
  • S1-06 (docker, strace in ALLOWED_BINARIES)
  • S1-07 (run_external_cli exists, but Layer C calls run_allowlisted directly, not via this wrapper)
  • S3-03 (writer chokepoint with RedactedSlice)
  • S4-01 (IndexHealthProbe consumes last_traced_image_digest / built_image_digest from this probe's slice; S5-05 wires the freshness check)

ADRs honored:

  • 02-ADR-0001 (docker/strace allowlist)
  • 02-ADR-0003 (heaviness="heavy" sort)
  • 02-ADR-0004 (image-digest as declared_inputs special token via ProbeContext.image_digest_resolver)
  • 02-ADR-0007 (no Plugin Loader, no plugin.yaml; the probe is in-tree)
  • 02-ADR-0010 (writer chokepoint RedactedSlice)

Validation notes (2026-05-16)

Story hardened by phase-story-validator (_validation/S5-02-runtime-trace-probe.md). This is the 1st canonical consumer of S5-01's ScenarioResult, the 1st canonical Layer-C probe (setting the precedent that Layer C calls run_allowlisted directly, not via run_external_cli), and the 1st consumer of the image-digest:<resolved> declared-input special-token mechanism (so this story also lands the cache/keys.py::_resolve_special_token dispatch arm — S1-09 added the ProbeContext.image_digest_resolver field but not the cache-side resolver). Verdict: HARDENED. Eighteen in-place edits applied (full details in _validation/):

  1. Envelope confidence contract preserved (CF1 / NF-B). Probe.confidence: Literal["high","medium","low"] (frozen at base.py:68 / localv2.md §4 line 328) does not admit "unavailable". The four prior occurrences of envelope confidence="unavailable" are routed to envelope confidence="low" + slice trace_coverage_confidence="unavailable" (the Phase-2 extension of localv2.md §5.3 C4's tri-state, surfaced via a new AC pinning the contract preservation).
  2. ImageDigestUnresolved routed to Failed, not Skipped (NF-A). S5-01's HARDENED variant set places ImageDigestUnresolved in TraceFailureReason, NOT TraceSkipReason. AC-11 + AC-12 rewritten: resolver-returned-None and resolver-is-None paths emit TraceScenarioFailed(reason=ImageDigestUnresolved()). Docker-build-failure paths remain TraceScenarioSkipped(reason=ImageBuildUnavailable()) (already correct).
  3. Cache _resolve_special_token dispatch arm scoped into this story (CF7 / NF-C). Inspection of cache/keys.py::declared_inputs_for confirms the resolver does NOT exist — image-digest:<resolved> is silently rglobbed and dropped. Three new ACs land it here: (a) recognition by r"^[a-z0-9_-]+:<resolved>$" syntax; (b) image-digest: arm calls ctx.image_digest_resolver(snapshot.root) and folds the digest (or None-fallback) into the content-hash tuple; (c) unknown tokens raise CacheKeyError(reason="unknown_special_token", token=…) via match + assert_never (the Open/Closed seam for future tokens like scip-index-output:).
  4. image_digest_resolver raising → Failed(ImageDigestUnresolved()) (CF6). Any exception from the resolver is caught at the call site, translated, and structured-logged with image_digest_unresolved_reason="resolver_raised". The probe never raises out of run().
  5. _DEFAULT_SCENARIOS: list[ScenarioSpec] (NF-D). The list-of-strings phrasing in AC-4 is replaced with the list-of-ScenarioSpec form prescribed by the implementation outline; _DEFAULT_SCENARIO_NAMES: Final[tuple[str, ...]] is the names-only constant where a name list is needed.
  6. Citation correction (NF-E). final-design.md §"Implementation risks" #7 (4 references) → final-design.md §"Components" #6 + §"Where security/best-practices traded off perf" (a). The sequential-scenario load-bearing rationale lives in those two sections, not in a numbered risks list (final-design.md only enumerates 5 top-level Risks).
  7. cache_strategy = "content" pinned literally (CF2). Not "NOT none" — exact value matches Probe.cache_strategy default + dep_graph / scip precedent.
  8. applies_to_tasks=["*"], applies_to_languages=["*"], requires=[] (NF-F / NF-G). Dockerfile-driven, not source-language-driven; no sibling-slice prerequisite (image-digest comes via ProbeContext callable, not a sibling artifact).
  9. macOS-detection mechanism pinned to sys.platform (TF-D). The parenthetical os.uname().sysname == "Darwin" is removed; only sys.platform != "linux" is canonical.
  10. _HARDENING_FLAGS: Final[tuple[str, ...]] module constant (DF-4). The three hardening flags live in one place; argv builder + tests import the constant; typo-in-one-flag mutations become catchable.
  11. _image_ref_for_digest(digest: str) -> str pure smart constructor (CF3 / DF-3). Returns exactly "codegenie-trace:" + _short(digest) (first 12 hex chars after stripping any sha256: prefix); parametrized test pins format including sha256:-prefixed, empty-string, non-hex edge cases.
  12. _parse_strace_lines pure-function AC + golden fixture + property-test entry (TF-B / DF-1). Set-valued slice fields are permutation-stable under line-ordering shuffles (excluding execve ordering for shell-invocation counting); tests/property/test_strace_parser_commutativity.py adds a Hypothesis-driven entry.
  13. _aggregate_scenarios exhaustive match with assert_never (TF-C / DF-5). Mirrors S5-01 AC-6; the match is rehearsed at every level of the sum (ScenarioResult top-level + TraceFailureReason + TraceSkipReason via S5-01 consumers); mypy --warn-unreachable deletion smoke-test included.
  14. _build_strace_argv / _build_docker_run_argv pure builders + explicit -- separator pin (TF-3 / DF-1 / mutation #3). Each is importable and unit-testable without subprocess mocking; the strace argv asserts the explicit -- token separates strace args from the wrapped command (mutation #3 — argv-merge regressions are catchable).
  15. scenarios_run / scenarios_failed derivation pinned (CF4). scenarios_run = [r.scenario_name for r in results if isinstance(r, TraceScenarioCompleted)]; scenarios_failed = [r.scenario_name for r in results if isinstance(r, TraceScenarioFailed)]; Skipped scenarios appear in neither list (only in per_scenario_artifacts with a None value + structured log).
  16. Slice schema is the COMPLETE observable surface (CF8). Snapshot test asserts the slice dict's keys are EXACTLY the localv2.md §5.3 C4 set (no extras, no missing); drift in either direction flips the test red.
  17. 6+-scenario operator-extensibility test (DF-2). tests/fixtures/scenarios/seven_scenarios.yaml exercises a 7-scenario run end-to-end with zero runtime_trace.py edit; a source-scan test asserts _DEFAULT_SCENARIOS is the ONLY in-source scenario list (so a future hardcoded-6th-scenario drift in some sibling module is caught).
  18. _image_built per-run() not per-instance (TF-E / mutation #15). The flag is a local in _run_all_scenarios passed explicitly into _execute_scenario. A regression test runs the probe twice on the same instance and asserts docker build is invoked exactly once per run().

Notes-for-implementer extended with seven new paragraphs (contract-confidence routing, image-ref format, macOS detection mechanism, cache-token dispatch lives here, newtype deferral S1-05, trace-backend Protocol deferral, _image_built lifecycle). Three new test files land: tests/unit/cache/test_special_token_dispatch.py, tests/property/test_strace_parser_commutativity.py, tests/unit/probes/layer_c/test_runtime_trace_argv_builders.py.

Context

RuntimeTraceProbe is the densest single probe in Phase 2 and the single most valuable probe for distroless confidence (localv2.md §5.3 C4 — "without this, distroless migration breaks silently in production"). It runs five scenarios (startup, smoke_test, healthcheck, shutdown, error_path) against the analyzed-repo's container, captures syscalls / loaded libraries / shell invocations / network endpoints via strace -f -e trace=openat,execve,connect,bind,mmap (Linux) or deterministically emits TraceScenarioFailed(StraceUnavailable()) per scenario (macOS — no sudo prompt, no dtruss, the macOS path is permanent per final-design.md §"Where security/best-practices traded off perf" (a)). The five scenarios serialize through a single asyncio task — concurrent docker run of the same image races resources and confuses trace attribution (final-design.md §"Components" #6 + §"Where security/best-practices traded off perf" (a)). Per-scenario timeout 120 s; aggregate 600 s.
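
The serialization and the two timeout layers described above can be sketched roughly as follows. This is not the probe's actual code: `run_sequentially` and the string outcomes ("completed" / "timeout") are hypothetical stand-ins for the real `ScenarioResult` variants, and only the two timeout constants come from this story.

```python
import asyncio
from typing import Awaitable, Callable, Final

_PER_SCENARIO_TIMEOUT_S: Final[int] = 120
_AGGREGATE_TIMEOUT_S: Final[int] = 600


async def run_sequentially(
    names: list[str],
    execute: Callable[[str], Awaitable[str]],
    per_scenario_timeout_s: float = _PER_SCENARIO_TIMEOUT_S,
    aggregate_timeout_s: float = _AGGREGATE_TIMEOUT_S,
) -> dict[str, str]:
    """Run scenarios one at a time; never more than one in flight."""
    outcomes: dict[str, str] = {}

    async def _loop() -> None:
        # A plain for-loop with await (no gather) keeps the scenarios strictly ordered.
        for name in names:
            try:
                outcomes[name] = await asyncio.wait_for(
                    execute(name), timeout=per_scenario_timeout_s
                )
            except asyncio.TimeoutError:
                outcomes[name] = "timeout"

    try:
        await asyncio.wait_for(_loop(), timeout=aggregate_timeout_s)
    except asyncio.TimeoutError:
        # Aggregate budget exhausted: scenarios that never ran are marked too.
        for name in names:
            outcomes.setdefault(name, "timeout")
    return outcomes
```

A per-scenario timeout affects only that scenario, and the loop then moves on to the next one; the aggregate guard cancels whatever is still pending.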

The cache-correctness story (02-ADR-0004): a package.json-only change with the image rebuilt-and-pushed-with-same-digest must cache-HIT; a FROM-line bump or base-image rebuild (new digest) must cache-MISS. The signal is in declared_inputs as the special token image-digest:<resolved> — Phase 0 Cache's special-token resolver calls ProbeContext.image_digest_resolver(repo_root) -> str | None, the one Phase 0 contract extension Phase 2 makes (S1-09).
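
To make the HIT/MISS behavior concrete, here is a small sketch of how a content-hash key could fold the resolved digest in alongside the declared-input file hashes. The function name `cache_key`, its signature, and the use of SHA-256 are assumptions for illustration; the real `key_for` in Phase 0 may be shaped differently.

```python
import hashlib
from typing import Mapping, Optional


def cache_key(declared_file_hashes: Mapping[str, str], image_digest: Optional[str]) -> str:
    """Fold declared-input file hashes and the resolved image digest into one key."""
    hasher = hashlib.sha256()
    for path in sorted(declared_file_hashes):  # sorted so the key is deterministic
        hasher.update(f"{path}\0{declared_file_hashes[path]}\0".encode())
    # The sentinel "" stands in for an unresolved digest so the key stays stable.
    hasher.update(("image-digest:" + (image_digest or "")).encode())
    return hasher.hexdigest()


# package.json is not a declared input, so editing it cannot change the key.
declared = {"Dockerfile": "d0c", ".codegenie/scenarios.yaml": "5ce"}
same_digest_hits = cache_key(declared, "sha256:aaa") == cache_key(declared, "sha256:aaa")
new_digest_misses = cache_key(declared, "sha256:aaa") != cache_key(declared, "sha256:bbb")
```

Under these assumptions, a rebuild that yields the same digest reuses the key, and a base-image bump that yields a new digest produces a different one.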

The container-hardening triple --network=none --cap-drop=ALL --security-opt=no-new-privileges is non-negotiable; test_adversarial_dockerfile.py (S5-06) is the proof that a forkbomb/infinite-loop Dockerfile is contained.

References

  • phase-arch-design.md §"Component design" #6 (RuntimeTraceProbe) — the canonical internal-structure prose.
  • phase-arch-design.md §"Edge cases" rows 5, 6, 14 — docker-build failure, macOS strace, image-digest resolver returns None.
  • phase-arch-design.md §"Data model": ProbeContext additive field image_digest_resolver.
  • final-design.md §"Components" #6 — sequential scenarios, p50 ~90 s, image-digest cache key.
  • final-design.md §"Where security/best-practices traded off perf" (a) — "sequential runtime trace scenarios (~75 s wall-clock floor vs. theoretical 15 s if parallel) — accepted because parallel traces against the same image race resources and confuse attribution" — the load-bearing rationale test_concurrent_task_count_le_one defends.
  • final-design.md §"Conflict-resolution table" rows 9, 16: cache_key strategy + cache-key shape.
  • 02-ADR-0001: docker, strace in ALLOWED_BINARIES; Layer C calls run_allowlisted directly (not run_external_cli).
  • 02-ADR-0004 — image-digest as declared_inputs special token; cache_key() override refused; resolver is Optional[Callable]. §Consequences names cache.py's token-recognizer dispatch as the new extension surface.
  • High-level-impl.md §"Step 5" — Risks specific to Step 5: parallel-scenarios redirect; macOS determinism; container-hardening flags non-negotiable.
  • localv2.md §5.3 C4 — output slice shape (shared_libs_loaded, cert_paths_read, files_read_at_runtime, shell_invocations, network_endpoints_touched, trace_coverage_confidence).
  • Phase 0 run_allowlisted (src/codegenie/exec/__init__.py) — direct call site for docker/strace.
  • Phase 0 cache/keys.py::declared_inputs_for — the function this story extends with _resolve_special_token dispatch (S1-09 added ProbeContext.image_digest_resolver but did NOT extend cache/keys.py; this story is the first consumer and lands the dispatch arm).
  • S5-01 sum-type modules — src/codegenie/probes/layer_c/scenario_result.py (ScenarioResult, TraceFailureReason with variants StraceUnavailable | DockerBuildFailed | ScenarioTimeout | ImageDigestUnresolved, TraceSkipReason with variants NoDockerfile | ImageBuildUnavailable). ImageDigestUnresolved is a TraceFailureReason variant, NOT TraceSkipReason — this story honors the S5-01 variant placement.

Goal

Implement src/codegenie/probes/layer_c/runtime_trace.py — a @register_probe(heaviness="heavy") probe that builds the analyzed repo's container, runs five scenarios sequentially under the container-hardening triple, captures syscalls via strace -f (Linux) or emits TraceScenarioFailed(StraceUnavailable()) per scenario (macOS) deterministically, declares the image-digest:<resolved> special token in declared_inputs, and emits a ProbeOutput whose slice round-trips through the Phase 2 writer chokepoint with RedactedSlice. Per-scenario 120 s timeout; aggregate 600 s; docker build failure → all five scenarios TraceScenarioSkipped(reason=ImageBuildUnavailable()), envelope confidence="low", slice trace_coverage_confidence="unavailable". Also lands the Phase 0 cache/keys.py::_resolve_special_token dispatch arm (the first consumer of the special-token mechanism) honoring the localv2.md §4 syntax. The frozen Probe.confidence: Literal["high","medium","low"] contract is preserved verbatim: "unavailable" is a slice-level signal on trace_coverage_confidence, never an envelope value.
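
The docker-build-failure path named in the goal can be sketched as below. The dataclasses are simplified stand-ins for the S5-01 sum-type variants, `build_failure_outcome` is a hypothetical helper, and the returned dict shows only the slice fields this path pins (not the full localv2.md §5.3 C4 key set).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union


@dataclass(frozen=True)
class ImageBuildUnavailable:
    """Stand-in for the S5-01 TraceSkipReason variant."""


@dataclass(frozen=True)
class TraceScenarioCompleted:
    scenario_name: str


@dataclass(frozen=True)
class TraceScenarioFailed:
    scenario_name: str
    reason: object


@dataclass(frozen=True)
class TraceScenarioSkipped:
    scenario_name: str
    reason: object


ScenarioResult = Union[TraceScenarioCompleted, TraceScenarioFailed, TraceScenarioSkipped]


def build_failure_outcome(scenario_names: List[str]) -> Tuple[str, Dict[str, Any]]:
    """Return (envelope confidence, partial slice) when docker build fails."""
    results: List[ScenarioResult] = [
        TraceScenarioSkipped(name, ImageBuildUnavailable()) for name in scenario_names
    ]
    slice_fields: Dict[str, Any] = {
        # Skipped scenarios land in neither list.
        "scenarios_run": [r.scenario_name for r in results if isinstance(r, TraceScenarioCompleted)],
        "scenarios_failed": [r.scenario_name for r in results if isinstance(r, TraceScenarioFailed)],
        "per_scenario_artifacts": {r.scenario_name: None for r in results},
        "built_image_digest": None,
        "last_traced_image_digest": None,
        # "unavailable" is a slice-level value only.
        "trace_coverage_confidence": "unavailable",
    }
    # The envelope stays inside Literal["high", "medium", "low"].
    return "low", slice_fields
```

Keeping "unavailable" out of the return value's first element is what preserves the frozen envelope contract while still surfacing the signal to slice consumers.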

Acceptance criteria

  • [ ] src/codegenie/probes/layer_c/runtime_trace.py exists; declares class RuntimeTraceProbe(Probe) decorated with @register_probe(heaviness="heavy", runs_last=False) (S1-08 decorator). Class attributes pinned: name = "runtime_trace", layer = "C", tier = "base", applies_to_tasks: list[str] = ["*"], applies_to_languages: list[str] = ["*"] (Dockerfile-driven, not source-language-driven — operator-extensibility per DF-2), requires: list[str] = [] (no sibling-slice prerequisite; image-digest comes via ProbeContext callable, not via sibling artifact), timeout_seconds: int = 300 (envelope-side default; per-scenario + aggregate timeouts are separate constants below). A unit test asserts the literal class-attribute shape.
  • [ ] RuntimeTraceProbe.declared_inputs is the literal list ["Dockerfile", ".codegenie/scenarios.yaml", "image-digest:<resolved>"] — the literal token string with the <resolved> placeholder. The Phase 0 cache layer's _resolve_special_token dispatch (extended by this story — see new AC below) recognizes the <name>:<resolved> syntax and expands image-digest:<resolved> via ctx.image_digest_resolver(snapshot.root). A unit test asserts the literal three-entry shape; a separate unit test (under tests/unit/cache/) asserts the dispatch arm works.
  • [ ] RuntimeTraceProbe.cache_strategy: Literal["content"] = "content" — exact value. Matches the Probe.cache_strategy default (base.py:83) and the dep_graph / scip precedent. cache_strategy="none" is reserved for B2 (S4-01); this probe's whole point is to cache against image-digest equality, so "content" is the right value. A unit test asserts the literal annotation Literal["content"] and runtime value "content" via inspect.get_annotations.
  • [ ] Reads .codegenie/scenarios.yaml via Phase 1 safe_yaml.load chokepoint; Pydantic-validates against an internal ScenariosConfig(BaseModel) model with required field scenarios: list[ScenarioSpec]; falls back to a module-level constant _DEFAULT_SCENARIOS: Final[tuple[ScenarioSpec, ...]] carrying five ScenarioSpec instances (the five canonical names + minimal command argvs) when the file is absent. File present but malformed → the probe envelope reports a load error (envelope confidence="low", slice scenarios_run=[], scenarios_failed=[], all five scenarios as TraceScenarioFailed(reason=ScenarioYamlMalformed | DockerBuildFailed) per the S5-01 variant set — pick the closest existing S5-01 reason and document in Notes; do NOT add a new S5-01 variant here, Rule 3); not silent-fallback. Names-only constant _DEFAULT_SCENARIO_NAMES: Final[tuple[str, ...]] = ("startup", "smoke_test", "healthcheck", "shutdown", "error_path") exists for places where only a name list is needed (e.g., logs).
  • [ ] Cache _resolve_special_token dispatch arm lands in this story (because S5-02 is the first consumer of the mechanism; S1-09 added the ProbeContext.image_digest_resolver field but did NOT extend cache/keys.py). Three observable sub-criteria:
      • (a) Recognition syntax. cache/keys.py::declared_inputs_for recognizes any entry in probe.declared_inputs matching r"^[a-z0-9_-]+:<resolved>$" as a special token (NOT a glob). All other entries continue to rglob as today.
      • (b) image-digest: arm. When the recognized token name is image-digest, the dispatch calls ctx.image_digest_resolver(snapshot.root) if non-None; folds the resulting string (or the sentinel "" if None-returned / None-bound / resolver raised) into the content-hash tuple alongside the file content hashes. A unit test runs key_for(...) over a synthetic probe with the token in declared_inputs + two different resolvers (returning different digests) and asserts the two cache keys differ; runs with the SAME resolver twice and asserts the keys are byte-identical.
      • (c) Unknown-token fail-loud. The dispatch is a match on the token name whose fallthrough branch raises CacheKeyError(reason="unknown_special_token", token=<full_token_string>); the explicit raise is the assert_never equivalent for a string-keyed match. A unit test asserts an unknown token (bogus:<resolved>) raises CacheKeyError whose message contains both "unknown_special_token" AND the full token string. The match is the Open/Closed seam for future tokens (scip-index-output:, tree-sitter-grammar-set:); adding a new arm requires an ADR amendment to 02-ADR-0004.
  • [ ] Sequential per-scenario execution — verified by tests/unit/probes/layer_c/test_runtime_trace.py::test_concurrent_task_count_le_one: an asyncio.Event-driven instrumentation hook (self._scenario_in_progress: asyncio.Event set inside _execute_scenario for the duration of one scenario run; cleared between scenarios) plus a test-side _observer_task that loops await asyncio.sleep(0); count = len([t for t in asyncio.all_tasks() if t.get_name().startswith("runtime_trace_scenario_")]) until the run completes. The test asserts count <= 1 at every sampled tick AND len(samples) >= 10. The assertion is on observed task count, not absence of asyncio.gather in source (which a future contributor could re-introduce subtly).
  • [ ] _HARDENING_FLAGS: Final[tuple[str, ...]] = ("--network=none", "--cap-drop=ALL", "--security-opt=no-new-privileges") is exposed as a module-level constant. Both _build_docker_run_argv and the test_hardening_flags_in_argv test import the constant — there is no string duplication of any of the three flags anywhere in the source. A unit test imports the constant and asserts it equals exactly that 3-tuple (catches a typo-in-one-flag mutation that set-checking alone would miss).
  • [ ] Per scenario: docker build → docker run <_HARDENING_FLAGS unpacked> -- <image_ref> <scenario-command argv> wrapped by strace -f -e trace=openat,execve,connect,bind,mmap on Linux. The three hardening flags AND the explicit -- separator are passed as separate argv tokens (no string-concat); a unit test mocks run_allowlisted and asserts the captured argv (a) contains all three of _HARDENING_FLAGS tokens in any order; (b) contains a literal "--" token immediately before image_ref; (c) does NOT contain any string equal to _HARDENING_FLAGS concatenated (catches a mutation that joins them with spaces).
  • [ ] Pure argv builder functions. _build_strace_argv(image_ref: str, command_argv: list[str]) -> list[str] and _build_docker_run_argv(image_ref: str, command_argv: list[str]) -> list[str] are pure module-private functions (no I/O, no subprocess, no logging). Each is imported by a dedicated unit test under tests/unit/probes/layer_c/test_runtime_trace_argv_builders.py and asserted without mocking run_allowlisted. The strace builder additionally asserts the argv contains exactly one literal "--" token (separating strace's own arguments from the wrapped docker run invocation, mutation #3).
  • [ ] All docker and strace calls route through run_allowlisted DIRECTLY — not run_external_cli. A grep test (tests/unit/probes/layer_c/test_runtime_trace_no_external_cli_wrap.py) asserts the probe's source has zero run_external_cli references and ≥ 1 run_allowlisted reference (02-ADR-0001 + final-design.md §"Departures" reaffirmation).
  • [ ] Per-scenario asyncio.wait_for(..., timeout=120); aggregate guard asyncio.wait_for(..., timeout=600) around the for-loop. Both timeouts are constants exported as _PER_SCENARIO_TIMEOUT_S: Final[int] = 120 and _AGGREGATE_TIMEOUT_S: Final[int] = 600 (test imports them and asserts the values; a deliberate edit to 60/300 flips the test red).
  • [ ] macOS path is deterministic — no sudo prompt: on sys.platform != "linux" (canonical detector; the os.uname() form is NOT used — pick one and stick with it), the probe does not invoke strace or dtruss; each scenario short-circuits to TraceScenarioFailed(scenario_name=..., reason=StraceUnavailable()). Unit test: under monkey-patched sys.platform = "darwin", the probe's run completes without any run_allowlisted("strace", ...) or run_allowlisted("sudo", ...) invocation (verified by mock-spy on run_allowlisted rejecting any argv[0] in {"strace", "sudo", "dtruss"} — assertion-by-rejection so a wrong probe path crashes the spy rather than silently passing).
  • [ ] docker build failure (non-zero exit) → all five scenarios skip with TraceScenarioSkipped(reason=ImageBuildUnavailable(...)); the probe envelope's confidence is "low" (the frozen Literal["high","medium","low"] contract — Probe.confidence does NOT admit "unavailable"); the slice's trace_coverage_confidence is "unavailable"; the slice's built_image_digest is None; the slice's last_traced_image_digest is None. IndexHealthProbe (S4-01) reads built_image_digest and last_traced_image_digest from this slice and emits IndexFreshness.Stale(IndexerError(message="upstream_runtime_trace_unavailable")) — covered by a fixture test that constructs the slice and roundtrips through B2's freshness loop (the S4-01 freshness loop call is exercised in a small integration test landed via S5-05; this story emits the slice fields B2 reads and asserts the slice-emission shape; the freshness-side roundtrip lives in S5-05).
  • [ ] Image-digest cache HIT skips scenarios. Two-layer test: (1) tests/unit/cache/test_special_token_dispatch.py::test_image_digest_resolver_changes_cache_key exercises the new dispatch arm on a synthetic probe with the token in declared_inputs (this proves the resolver works); (2) tests/unit/probes/layer_c/test_runtime_trace.py::test_cache_hit_skips_scenarios runs the probe twice with the same (Dockerfile, scenarios.yaml, fixed-digest) tuple, expects the second run hits cache, asserts a mock-spy on _execute_scenario is called five times on the first run and zero times on the second.
  • [ ] image_digest_resolver returns None (no built image yet) → the probe envelope confidence="low" (NOT "unavailable" — contract preservation); slice trace_coverage_confidence="unavailable"; slice built_image_digest=None; scenarios are all TraceScenarioFailed(reason=ImageDigestUnresolved()) (NOT Skipped — ImageDigestUnresolved lives in S5-01's TraceFailureReason, not TraceSkipReason); cache key folds in the sentinel "" for the unresolved token (per the new dispatch AC above), so the cache still has a stable key over multiple resolver-returns-None runs. Reference: phase-arch-design.md §"Edge cases" row 14 + 02-ADR-0004 §Consequences.
  • [ ] image_digest_resolver is None on ProbeContext (operator never bound one) → identical envelope/slice shape to "resolver returned None"; the probe never raises. Covered by a dedicated unit test. The two None paths are distinguished only in the structured log field image_digest_unresolved_reason: Literal["resolver_unbound", "resolver_returned_none"].
  • [ ] image_digest_resolver raises → caught at the call site, translated to per-scenario TraceScenarioFailed(reason=ImageDigestUnresolved()) for ALL five scenarios; structured log emits image_digest_unresolved_reason="resolver_raised"; the original exception's repr is in a separate structured-log field image_digest_resolver_error_repr (never the message body — defensive against PII leak via exception text). The probe never raises out of run(). A unit test mocks the resolver to raise; asserts the probe completes; asserts the structured-log field values.
  • [ ] Envelope-confidence contract preservation pin. A unit test asserts inspect.get_annotations(ProbeOutput)["confidence"] evaluates to Literal["high", "medium", "low"] (not widened); a parametrized test runs the probe across all six envelope-failure paths (build failure / resolver None-returned / resolver None-bound / resolver raised / aggregate timeout / all-scenarios-timed-out) and asserts the envelope confidence is always in {"high", "medium", "low"} — never "unavailable". This is the structural pin against a future contributor widening the contract silently.
  • [ ] scenarios_run / scenarios_failed / Skipped routing pinned. Slice fields derive deterministically from results: list[ScenarioResult]: scenarios_run = [r.scenario_name for r in results if isinstance(r, TraceScenarioCompleted)]; scenarios_failed = [r.scenario_name for r in results if isinstance(r, TraceScenarioFailed)]; TraceScenarioSkipped scenarios appear in neither list — they surface only in per_scenario_artifacts (with a None value for that scenario name) and in the per-scenario structured log. A parametrized test covers all combinations.
  • [ ] Output slice schema matches the relevant subset of localv2.md §5.3 C4: artifact_uri, per_scenario_artifacts: dict[str, Path | None], scenarios_run: list[str], scenarios_failed: list[str], binaries_executed: list[str], shared_libs_loaded: list[str], cert_paths_read: list[str], files_read_at_runtime: {summary, full_list_uri}, shell_invocations: int, network_endpoints_touched: {outbound, inbound}, built_image_digest: str | None, last_traced_image_digest: str | None, trace_coverage_confidence: Literal["high", "medium", "low", "unavailable"]. The slice schema is the COMPLETE observable surface: a snapshot test asserts set(slice.keys()) == EXPECTED_SLICE_KEYS for both a healthy run and an all-skipped run; drift in either direction (extra or missing key) flips the test red. Sub-schema lands as part of S5-03 / S5-04 (src/codegenie/schema/probes/layer_c/); this story emits the dict shape that the sub-schema validates.
  • [ ] trace_coverage_confidence derivation: 5/5 scenarios completed → "high"; smoke-only or 2–4 completed → "medium"; startup-only → "low"; 0 completed → "unavailable" (matches localv2.md §5.3 C4 + this story's explicit extension of the tri-state to a tetra-state). Pure function _derive_trace_coverage_confidence(results: list[ScenarioResult]) -> Literal["high", "medium", "low", "unavailable"]; table-driven test over (n_completed: 5..0) with the documented mapping; type checker confirms exhaustiveness.
  • [ ] _aggregate_scenarios(results: list[ScenarioResult]) -> SliceFields is a pure function over the per-scenario outcome list that returns the slice fields (scenarios_run, scenarios_failed, binaries_executed, ..., trace_coverage_confidence). It matches on every variant of ScenarioResult with assert_never on the otherwise branch (mirrors S5-01 AC-6 exhaustive-match discipline; the producer/consumer ladder S5-01 documents — S5-02's _aggregate_scenarios is the 1st canonical consumer of ScenarioResult). A mypy --warn-unreachable smoke-test verifies that deleting one case arm produces a type-check error.
  • [ ] _image_ref_for_digest(digest: str) -> str is a pure smart constructor returning exactly "codegenie-trace:" + _short(digest) where _short strips any leading "sha256:" prefix and takes the first 12 hex characters. A parametrized test pins the format over: "sha256:cafef00ddeadbeef..." → "codegenie-trace:cafef00ddead"; bare hex "cafef00ddeadbeef..." → "codegenie-trace:cafef00ddead"; empty "" → ValueError("empty digest"); non-hex "not-a-digest" → ValueError("non-hex digest"). The tag prefix "codegenie-trace:" is itself a module-level Final[str] constant (no string duplication).
  • [ ] _parse_strace_lines(lines: Iterable[str]) -> ParsedTrace is a pure function over an iterable of strace output lines, returning a frozen ParsedTrace Pydantic model with fields binaries_executed: frozenset[str], shared_libs_loaded: frozenset[str], cert_paths_read: frozenset[str], files_read_at_runtime: frozenset[str], shell_invocations: int, network_endpoints_touched: frozenset[tuple[str, str]]. Tested via (a) golden fixture tests/fixtures/strace/minimal.strace over a known-good snippet asserting the exact parsed model; (b) malformed-line resilience — _parse_strace_lines(["this is not strace output", "neither is this"]) returns the all-empty ParsedTrace (does NOT raise); (c) Hypothesis property test in tests/property/test_strace_parser_commutativity.py — for any permutation of the fixture lines, the set-valued fields are byte-identical (shell_invocations is the only count-valued field; documented in the module as the one non-commutative exception).
  • [ ] _image_built is per-run(), not per-instance. The flag lives as a local in _run_all_scenarios(...) and is passed explicitly into _execute_scenario(...). No attribute on self. A regression test runs the probe twice on the same instance (probe = RuntimeTraceProbe(); await probe.run(...); await probe.run(...)) and asserts docker build is invoked once per run() — total two builds across two runs (not one across both, which would be wrong — image may have been rebuilt between gathers).
  • [ ] Operator-extensibility for scenarios (Open/Closed). A fixture tests/fixtures/scenarios/seven_scenarios.yaml declares 7 scenario names (the 5 defaults + 2 operator-added: db_migrate, worker_drain). A test runs the probe end-to-end against a fixture repo with this YAML and asserts: (a) all 7 scenarios were executed in declared order; (b) the slice's scenarios_run (or scenarios_failed/per_scenario_artifacts) covers all 7 names; (c) zero edits to src/codegenie/probes/layer_c/runtime_trace.py were required to support the 6th + 7th — verified by a separate source-scan test (assert _DEFAULT_SCENARIO_NAMES == ("startup", "smoke_test", "healthcheck", "shutdown", "error_path") — unchanged from the canonical 5).
  • [ ] _DEFAULT_SCENARIOS source-scan uniqueness. A test greps the src/codegenie/ tree and asserts the symbol _DEFAULT_SCENARIOS (or any literal list/tuple constant equivalent shape) appears in EXACTLY ONE source file (probes/layer_c/runtime_trace.py). This catches a future drift where a sibling module silently hardcodes its own 6th-scenario list — the operator-side scenarios.yaml is the only legitimate extension surface.
  • [ ] Slice flows through the writer chokepoint as RedactedSlice (S3-02 / S3-03); a test asserts secrets_redacted_count == 0 for a clean fixture and >= 1 for a fixture whose smoke-test command echoes an AWS-format key (the SecretRedactor from S3-01 must catch it on the runtime trace path).
  • [ ] Structured log fields emitted at least once per probe run: probe.runtime_trace.dispatch, probe.runtime_trace.scenario_started (per scenario), probe.runtime_trace.scenario_finished (per scenario, includes wall_clock_ms and the kind of the ScenarioResult), probe.runtime_trace.image_digest_resolved (or …unresolved), probe.runtime_trace.cache_hit (when applicable), probe.runtime_trace.finish.
  • [ ] mypy --strict clean on the new modules. No per-module override needed: [tool.mypy] warn_unreachable = true is already repo-wide since Phase 0 S1-02 (established by S1-11 validation; reaffirmed in S5-01 validation). A unit test asserts the repo-wide flag is present and unmodified after this story; both new modules (probes/layer_c/runtime_trace.py and the cache/keys.py extension) are included in the default mypy --strict glob (no exclude entry).
  • [ ] Phase 0 fence job stays green — no httpx, requests, socket, anthropic, openai, langgraph imports added.
  • [ ] forbidden-patterns pre-commit covers src/codegenie/probes/layer_c/runtime_trace.py (S5-01 already extended _is_under_phase2_banned_package for probes/layer_c/scenario_result.py — verify the predicate matches the whole probes/layer_c/ subdirectory by inspection of scripts/check_forbidden_patterns.py::_is_under_phase2_banned_package; if narrower, extend in this story PR mirroring S5-01 AC-11's pattern). A dedicated test (tests/unit/pre_commit/test_forbidden_patterns_phase2_runtime_trace.py) writes synthetic model_construct / subprocess.run / asyncio.create_subprocess_exec source under src/codegenie/probes/layer_c/synth_runtime_trace.py (tmp_path-rooted) and asserts the script exits non-zero with both 02-ADR-0010 §Decision and production ADR-0033 §3 substrings emitted. Negative coverage: same source under probes/layer_a/synth.py exits zero.

Implementation outline

  1. Extend src/codegenie/cache/keys.py::declared_inputs_for to dispatch special tokens before the existing rglob path. New helper _resolve_special_token(token: str, snapshot: RepoSnapshot, ctx: ProbeContext) -> str (pure function over the strings; the only impure call is ctx.image_digest_resolver(snapshot.root)). Regex _SPECIAL_TOKEN_RE = re.compile(r"^([a-z0-9_-]+):<resolved>$") decides which entries are tokens vs globs. The dispatch is match token_name: case "image-digest": ...; case _: raise CacheKeyError(reason="unknown_special_token", token=token), which is assert_never-equivalent via the explicit raise. Inject the resolved string into the content-hash tuple key_for already constructs (sentinel "" for None-returned / None-bound / resolver-raised cases, so the cache key is stable across all three "unresolved" paths). Add CacheKeyError to cache/keys.py (sibling of existing types in the file). No ProbeContext schema edit: the field S1-09 added is already present.
  2. Define ScenarioSpec Pydantic model in src/codegenie/probes/layer_c/runtime_trace.py: required name: str, optional command: list[str] (argv to pass to docker run), optional expected_exit_code: int = 0. Define ScenariosConfig(scenarios: list[ScenarioSpec]). Both frozen=True, extra="forbid".
  3. Define _DEFAULT_SCENARIOS: Final[tuple[ScenarioSpec, ...]] with the five canonical names; each default carries a minimal command argv (e.g., ["sh", "-c", "exit 0"] for startup; smoke/healthcheck/shutdown/error_path defaults follow the localv2.md §5.3 C4 prose). Names-only constant _DEFAULT_SCENARIO_NAMES: Final[tuple[str, ...]] = ("startup", "smoke_test", "healthcheck", "shutdown", "error_path") exists for log/render places that need only names.
  4. Implement RuntimeTraceProbe.declared_inputs as a class attribute (matching the kernel ABC at base.py:81): declared_inputs: list[str] = ["Dockerfile", ".codegenie/scenarios.yaml", "image-digest:<resolved>"]. The image-digest:<resolved> form is the literal string the new cache/keys.py::_resolve_special_token dispatch recognizes (step 1 above); the resolver substitution happens inside cache/keys.py, not here.
  5. Implement RuntimeTraceProbe.run(self, snapshot: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput:
  6. (a) Resolve image_digest, unresolved_reason = _resolve_image_digest(ctx, snapshot.root) — a pure-ish helper that returns (digest_str_or_None, "resolver_unbound" | "resolver_returned_none" | "resolver_raised" | None). Wraps the resolver call in try/except Exception (Rule 5/12 — fail loud at the call site, not silently). Structured-log the outcome.
  7. (b) If image_digest is None: short-circuit → emit ProbeOutput with scenarios_run=[], scenarios_failed=[name for name in _DEFAULT_SCENARIO_NAMES] (because all five emit TraceScenarioFailed(reason=ImageDigestUnresolved())), all TraceScenarioFailed per-scenario list, slice built_image_digest=None, slice trace_coverage_confidence="unavailable", envelope confidence="low" (contract preservation).
  8. (c) Else: load scenarios.yaml (Pydantic-validate via ScenariosConfig); fall back to _DEFAULT_SCENARIOS on absence. Malformed YAML → all five TraceScenarioFailed(reason=DockerBuildFailed(stderr_tail="scenarios.yaml malformed: <error>")) (closest S5-01 variant; do NOT add new variants here — Rule 3); envelope confidence="low".
  9. (d) Detect platform: if sys.platform != "linux": emit one TraceScenarioFailed(reason=StraceUnavailable()) per scenario; do not call run_allowlisted at all. Envelope confidence="low"; slice trace_coverage_confidence="unavailable".
  10. (e) Else (Linux): wrap the for-loop in asyncio.wait_for(_run_all_scenarios(scenarios=…, image_digest=…, ctx=ctx, snapshot=snapshot), timeout=_AGGREGATE_TIMEOUT_S); inside _run_all_scenarios, declare image_built = False as a local (NOT self._image_built), iterate scenarios with explicit await between iterations (no asyncio.gather, no TaskGroup). For each scenario: await asyncio.wait_for(_execute_scenario(scenario, image_ref, image_built, ctx, snapshot), timeout=_PER_SCENARIO_TIMEOUT_S). The first iteration receives image_built=False and triggers the docker build; subsequent iterations receive image_built=True from the loop's accumulator. On TimeoutError: emit TraceScenarioFailed(reason=ScenarioTimeout(seconds=120)). On the aggregate TimeoutError: not-yet-started scenarios get TraceScenarioSkipped(reason=ImageBuildUnavailable()) (closest S5-01 variant for "didn't run").
  11. (f) _execute_scenario accepts image_built: bool (input) and returns (ScenarioResult, image_built_after: bool). On image_built=False it calls run_allowlisted("docker", ["build", "-t", _image_ref_for_digest(image_digest), "-f", "Dockerfile", str(snapshot.root)]) first; sets image_built_after=True. Then calls run_allowlisted("strace", _build_strace_argv(image_ref, scenario.command)). The strace argv (built by the pure _build_strace_argv helper) is ["-f", "-e", "trace=openat,execve,connect,bind,mmap", "--", "docker", "run", *_HARDENING_FLAGS, "--", image_ref, *scenario.command]. Capture stdout/stderr; parse strace output via _parse_strace_lines into the slice fields.
  12. (g) _aggregate_scenarios(results) (pure function) folds per-scenario ScenarioResults into the slice; _derive_trace_coverage_confidence(results) derives the tetra-state. Envelope confidence is the lift of trace_coverage_confidence clipped to the tri-state Literal: {"high": "high", "medium": "medium", "low": "low", "unavailable": "low"}. Pin this lift in a _envelope_confidence(slice_confidence) -> Literal["high","medium","low"] pure function.
  13. Implement strace-output parser as a small pure function _parse_strace_lines(lines: Iterable[str]) -> ParsedTrace returning a frozen ParsedTrace(BaseModel, frozen=True, extra="forbid") model with set-valued fields (binaries_executed: frozenset[str], shared_libs_loaded: frozenset[str], cert_paths_read: frozenset[str], files_read_at_runtime: frozenset[str], network_endpoints_touched: frozenset[tuple[str, str]]) and one count-valued field (shell_invocations: int). Pure function — golden-tested against a fixture strace output snippet under tests/fixtures/strace/minimal.strace; property-tested for permutation-stability of the set fields (tests/property/test_strace_parser_commutativity.py); resilience-tested against malformed input (returns all-empty frozensets + zero count, does NOT raise).
  14. Write artifacts (one .strace per scenario + a merged runtime-trace.json) under .codegenie/context/raw/; the slice carries artifact_uri and per_scenario_artifacts. Skipped scenarios get per_scenario_artifacts[name] = None; the test asserts this representation explicitly.
  15. Slice flows back to the coordinator as ProbeOutput.schema_slice: dict[str, JSONValue]; the writer chokepoint (S3-03) wraps it in RedactedSlice via SecretRedactor. No model_construct anywhere in the module (the forbidden-patterns test backstops this).
  16. Module-level constants required: _HARDENING_FLAGS: Final[tuple[str, ...]], _PER_SCENARIO_TIMEOUT_S: Final[int], _AGGREGATE_TIMEOUT_S: Final[int], _IMAGE_REF_PREFIX: Final[str] = "codegenie-trace:", _DEFAULT_SCENARIOS: Final[tuple[ScenarioSpec, ...]], _DEFAULT_SCENARIO_NAMES: Final[tuple[str, ...]], _SCENARIO_TASK_NAME_PREFIX: Final[str] = "runtime_trace_scenario_". All exposed at module level; tests import them.
  17. Register @register_index_freshness_check("runtime_trace"): deferred to S5-05; this story does not register it.
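
The sequential loop with its two timeout layers (sub-steps (e) and (f) above) might be sketched like this. It is a simplified illustration under stated assumptions: `Outcome` is a stand-in for S5-01's ScenarioResult union, `Runner` is a stand-in for run_allowlisted, `run_harness` is a hypothetical name, and the reasons are plain strings rather than typed variants.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Stand-in for run_allowlisted: (binary, scenario_name) -> awaitable.
Runner = Callable[[str, str], Awaitable[None]]


@dataclass(frozen=True)
class Outcome:  # simplified stand-in for the ScenarioResult union
    name: str
    kind: str  # "completed" | "failed" | "skipped"
    reason: str = ""


async def _execute_scenario(name: str, image_built: bool, run: Runner) -> tuple[Outcome, bool]:
    if not image_built:
        await run("docker", name)  # the build happens on the first scenario only
    await run("strace", name)
    return Outcome(name, "completed"), True


async def _run_all_scenarios(names: list[str], run: Runner, out: list[Outcome], per_timeout: float) -> None:
    image_built = False  # local accumulator, never an instance attribute
    for name in names:  # explicit await per iteration: no gather, no TaskGroup
        try:
            outcome, image_built = await asyncio.wait_for(
                _execute_scenario(name, image_built, run), timeout=per_timeout
            )
        except asyncio.TimeoutError:
            # Simplification: a timed-out scenario leaves image_built unchanged.
            outcome = Outcome(name, "failed", "ScenarioTimeout")
        out.append(outcome)


async def run_harness(
    names: list[str], run: Runner, per_timeout: float = 120.0, aggregate_timeout: float = 600.0
) -> list[Outcome]:
    out: list[Outcome] = []
    try:
        await asyncio.wait_for(
            _run_all_scenarios(names, run, out, per_timeout), timeout=aggregate_timeout
        )
    except asyncio.TimeoutError:
        remaining = names[len(out):]
        if remaining:  # the scenario that was in flight when the budget expired
            out.append(Outcome(remaining[0], "failed", "ScenarioTimeout"))
        out.extend(Outcome(n, "skipped", "ImageBuildUnavailable") for n in remaining[1:])
    return out
```

Because results are appended only after a scenario settles, the aggregate-timeout handler can tell the in-flight scenario apart from the ones that never started by position alone.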

TDD plan — red / green / refactor

Red:

  1. test_register_probe_heaviness_heavy — registry introspection asserts RuntimeTraceProbe is registered with heaviness == "heavy" and runs_last is False. Initial state: module import fails.
  2. test_declared_inputs_literal_three_entries — asserts RuntimeTraceProbe().declared_inputs == ["Dockerfile", ".codegenie/scenarios.yaml", "image-digest:<resolved>"]. Failure mode: order or count or token-shape drift.
  3. test_class_attributes_pinned — asserts applies_to_tasks == ["*"], applies_to_languages == ["*"], requires == [], cache_strategy == "content", tier == "base", layer == "C", name == "runtime_trace".
  4. test_concurrent_task_count_le_one — instrument the probe via a self._scenario_in_progress: asyncio.Event hook set inside _execute_scenario for the duration of one scenario run; an observer task loops await asyncio.sleep(0) and snapshots len([t for t in asyncio.all_tasks() if t.get_name().startswith(_SCENARIO_TASK_NAME_PREFIX)]). The test asserts (a) count <= 1 at every sample; (b) len(samples) >= 10. This is the load-bearing test for final-design.md §"Where security/best-practices traded off perf" (a) — encodes "per-scenario sequential RuntimeTraceProbe execution can be silently parallelized by a future contributor." Assertion is on observed task count, not absence of asyncio.gather in source (bypassable).
  5. test_macos_no_strace_invocation — monkeypatch sys.platform to "darwin"; mock run_allowlisted with a spy that raises on argv[0] in {"strace", "sudo", "dtruss"}; run the probe; assert no spy raise; assert every scenario is TraceScenarioFailed(reason=StraceUnavailable()). (No os.uname variant — canonical detector is sys.platform != "linux".)
  6. test_macos_no_tty_interaction — mock run_allowlisted to fail-loud if stdin is anything other than DEVNULL; run on macOS-platform path; assert no failure (the probe never opens a TTY).
  7. test_hardening_flags_constant_pinned: _HARDENING_FLAGS == ("--network=none", "--cap-drop=ALL", "--security-opt=no-new-privileges") exactly. Catches typo mutations in any one of the three flags that pure set-membership would miss.
  8. test_hardening_flags_in_argv — mock run_allowlisted to capture argv; run a single-scenario fixture on Linux-platform path; assert (a) the captured argv for the docker run segment contains every element of _HARDENING_FLAGS exactly once (order-independent set membership); (b) the argv contains a literal "--" token immediately preceding image_ref; (c) no argv element equals the string-concat of the three flags. Mutation test: deleting --network=none from _HARDENING_FLAGS flips this red.
  9. test_no_run_external_cli_in_source — open src/codegenie/probes/layer_c/runtime_trace.py and assert "run_external_cli" not in source and "run_allowlisted" in source.
  10. test_per_scenario_timeout_120s_constant / test_aggregate_timeout_600s_constant — import _PER_SCENARIO_TIMEOUT_S / _AGGREGATE_TIMEOUT_S and assert their literal values.
  11. test_per_scenario_timeout_triggers_failed: mock _execute_scenario to sleep 200 s under a mocked clock (no real wall-clock wait); assert the result is TraceScenarioFailed(reason=ScenarioTimeout(seconds=120)); assert the aggregate loop did not also time out.
  12. test_aggregate_timeout_triggers_failed_all_remaining — mock the first scenario to consume 540 s; subsequent scenarios should not start; the slice reflects 1 TraceScenarioFailed(ScenarioTimeout) (the in-flight one cancelled) + 4 TraceScenarioSkipped(ImageBuildUnavailable) for not-yet-started scenarios (closest S5-01 variant for "didn't run"). Documented in module docstring.
  13. test_docker_build_failure_all_skipped — mock run_allowlisted to return non-zero exit for the docker build argv; assert all five ScenarioResult are TraceScenarioSkipped(reason=ImageBuildUnavailable(...)); assert envelope confidence == "low" (NOT "unavailable" — contract preservation); assert slice trace_coverage_confidence == "unavailable".
  14. test_image_digest_resolver_returns_none_failed — bind a resolver returning None; assert all five ScenarioResult are TraceScenarioFailed(reason=ImageDigestUnresolved()) (NOT Skipped — variant lives in S5-01's TraceFailureReason); assert envelope confidence == "low"; slice built_image_digest is None; slice trace_coverage_confidence == "unavailable"; structured-log field image_digest_unresolved_reason == "resolver_returned_none".
  15. test_image_digest_resolver_unbound_failed: ctx.image_digest_resolver is None; same envelope/slice shape as test 14 (Failed not Skipped); structured-log field image_digest_unresolved_reason == "resolver_unbound".
  16. test_image_digest_resolver_raises_translated_to_failed — mock the resolver to raise RuntimeError("boom"); assert the probe completes (does NOT raise out of run()); assert all five TraceScenarioFailed(reason=ImageDigestUnresolved()); envelope confidence == "low"; structured-log image_digest_unresolved_reason == "resolver_raised" AND image_digest_resolver_error_repr contains "RuntimeError" (NOT the message body "boom" — defensive against PII).
  17. test_envelope_confidence_contract_preserved: inspect.get_annotations(ProbeOutput)["confidence"] evaluates to Literal["high","medium","low"]; a parametrized run over all six envelope-failure paths (build failure / resolver None-returned / resolver None-bound / resolver raised / aggregate timeout / all-scenarios-timed-out) asserts envelope confidence ∈ {"high","medium","low"}, never "unavailable".
  18. test_cache_special_token_dispatch_recognizes_image_digest (tests/unit/cache/test_special_token_dispatch.py) — synthetic probe with declared_inputs=["Dockerfile", "image-digest:<resolved>"]; two ctx instances with resolvers returning different digests; assert key_for(probe, snapshot, task) produces two distinct cache keys; same resolver invoked twice on identical state → byte-identical key; the three "unresolved" paths (None-returned / unbound / raised) all fold to the same sentinel and produce the same cache key.
  19. test_cache_special_token_dispatch_unknown_raises (same file) — synthetic probe with declared_inputs=["bogus:<resolved>"]; key_for(...) raises CacheKeyError whose str(exc) contains both "unknown_special_token" AND "bogus:<resolved>".
  20. test_cache_resolves_runtime_trace_hit_skips_scenarios — integration test depending on tests 18+19. Run twice with the same fixture + same resolver returning the same digest; spy on _execute_scenario; first-run call count == 5, second-run call count == 0; second-run slice JSON byte-identical to first-run (modulo gathered_at / wall-clock).
  21. test_scenarios_yaml_pydantic_validation — malformed scenarios.yaml → envelope confidence="low"; all five TraceScenarioFailed(reason=DockerBuildFailed(stderr_tail="scenarios.yaml malformed: ...")). Missing file → default-fallback (envelope-succeeds path with _DEFAULT_SCENARIOS).
  22. test_trace_coverage_confidence_derivation — table-driven over (n_completed: 5..0) -> ("high","medium","medium","medium","low","unavailable"). Directly tests _derive_trace_coverage_confidence(...) as a pure function.
  23. test_aggregate_scenarios_is_exhaustive_match — pass [completed, failed, skipped]; assert returned slice fields match the expected mapping. Separate test_aggregate_scenarios_warn_unreachable_smoke runs mypy --warn-unreachable against a copy of runtime_trace.py with one case arm of _aggregate_scenarios deleted; asserts mypy errors with "Statement is unreachable" (mirrors S5-01 AC-6 smoke test).
  24. test_writer_chokepoint_secret_redaction — fixture whose smoke-test command echoes AKIA0123456789ABCDEF; capture the writer's RedactedSlice; assert findings_count >= 1; assert plaintext absent from every .codegenie/context/raw/ output file (grep-walk asserts 0 occurrences).
  25. test_image_ref_for_digest_format (tests/unit/probes/layer_c/test_runtime_trace_argv_builders.py) — parametrized over [("sha256:cafef00ddeadbeef0123456789abcdef", "codegenie-trace:cafef00ddead"), ("cafef00ddeadbeef0123456789abcdef", "codegenie-trace:cafef00ddead")]; ValueError for "" and for "not-a-digest".
  26. test_build_strace_argv_explicit_dash_dash_separator (same file): _build_strace_argv("codegenie-trace:cafef00ddead", ["sh", "-c", "exit 0"]) returns argv whose strace-owned prefix contains exactly one "--" token, positioned immediately before "docker" (separating strace's own args from the wrapped command); the docker run segment carries its own "--" immediately before image_ref, and --network=none appears after the first "--" and before image_ref.
  27. test_build_docker_run_argv_contains_all_hardening_flags (same file) — _build_docker_run_argv(...): (a) contains every _HARDENING_FLAGS element exactly once; (b) contains a literal "--" immediately before image_ref; (c) no argv element equals string-concat of the three flags.
  28. test_parse_strace_lines_golden_fixture — load tests/fixtures/strace/minimal.strace; _parse_strace_lines(lines) returns the exact expected ParsedTrace instance.
  29. test_parse_strace_lines_malformed_resilience: _parse_strace_lines(["this is not strace", "neither is this"]) returns the all-empty ParsedTrace; does NOT raise.
  30. test_parse_strace_lines_permutation_stability (tests/property/test_strace_parser_commutativity.py, Hypothesis-driven) — for any permutation of the golden fixture's lines, the set-valued fields are byte-identical. shell_invocations is the documented non-commutative exception.
  31. test_image_built_local_not_instance: probe = RuntimeTraceProbe(); await probe.run(...); await probe.run(...); spy on run_allowlisted; assert docker build argv is invoked exactly twice total (once per run); assert getattr(probe, "_image_built", "<absent>") == "<absent>" (no instance attribute exists).
  32. test_six_plus_scenarios_via_yaml_zero_source_edit — fixture with .codegenie/scenarios.yaml declaring 7 scenarios (startup, smoke_test, healthcheck, shutdown, error_path, db_migrate, worker_drain); run the probe; assert 7 scenarios executed; assert set(slice["scenarios_run"]) | set(slice["scenarios_failed"]) | set(slice["per_scenario_artifacts"].keys()) covers all 7 names; assert _DEFAULT_SCENARIO_NAMES == ("startup", "smoke_test", "healthcheck", "shutdown", "error_path") (unchanged from canonical 5 — operator-side extension didn't require source edit).
  33. test_default_scenarios_source_scan_uniqueness: grep -rn "_DEFAULT_SCENARIOS" src/codegenie/ returns exactly one file (probes/layer_c/runtime_trace.py).
  34. test_scenarios_run_failed_skipped_routing — parametrized: result list [Completed("a"), Failed("b"), Skipped("c"), Completed("d"), Failed("e")]; assert scenarios_run == ["a", "d"], scenarios_failed == ["b", "e"], set(per_scenario_artifacts.keys()) == {"a","b","c","d","e"}, per_scenario_artifacts["c"] is None.
  35. test_slice_schema_is_complete_observable_surface — snapshot test asserts set(slice.keys()) == EXPECTED_SLICE_KEYS for both a healthy 5/5 run and an all-Failed(ImageDigestUnresolved) run.
  36. test_forbidden_patterns_phase2_runtime_trace — parametrized synthesis (mirrors S5-01 AC-11): model_construct / subprocess.run / asyncio.create_subprocess_exec source forms under src/codegenie/probes/layer_c/synth_runtime_trace.py (tmp_path) — each exits non-zero with both 02-ADR-0010 §Decision and production ADR-0033 §3. Negative: same source under probes/layer_a/synth.py exits zero.
  37. test_mypy_warn_unreachable_is_repo_wide — parses pyproject.toml; asserts [tool.mypy] warn_unreachable == True; asserts no [[tool.mypy.overrides]] block has exclude matching layer_c/runtime_trace (covered by default).

Green:

  1. Land outline step 1 first: extend cache/keys.py::declared_inputs_for with _resolve_special_token dispatch + CacheKeyError. Tests 18 + 19 turn green here.
  2. Implement RuntimeTraceProbe per the implementation outline (steps 2–16).
  3. Implement _parse_strace_lines against tests/fixtures/strace/minimal.strace.
  4. Make all red tests pass; do NOT introduce mocks the test didn't already expect.

Refactor:

  1. Confirm _execute_scenario(scenario, image_ref, image_built, ctx, snapshot) -> (ScenarioResult, bool) is a module-level async function with no instance state (its only side effects go through run_allowlisted), so it is testable in isolation without mocking the probe class.
  2. Confirm _build_strace_argv / _build_docker_run_argv / _image_ref_for_digest / _envelope_confidence / _derive_trace_coverage_confidence / _parse_strace_lines / _aggregate_scenarios are pure module-private functions; each has at least one dedicated unit test that does NOT mock run_allowlisted.
  3. Confirm structured-log fields land via structlog's context binding (logger.bind(probe="runtime_trace", scenario=name)); one dispatch per scenario carries the binding.
  4. Confirm __all__ exports only RuntimeTraceProbe; internal builders are module-private (leading underscore).
  5. Confirm cache/keys.py's _resolve_special_token is a match over token names with assert_never-equivalent on the otherwise branch (the explicit raise CacheKeyError(reason="unknown_special_token", token=…)); future tokens add arms via ADR amendment to 02-ADR-0004.

Files to touch

  • New: src/codegenie/probes/layer_c/runtime_trace.py, tests/fixtures/strace/minimal.strace, tests/fixtures/scenarios/{empty.yaml,malformed.yaml,three_only.yaml,seven_scenarios.yaml}.
  • New tests: tests/unit/probes/layer_c/test_runtime_trace.py (covers AC tests 1–37 above), tests/unit/probes/layer_c/test_runtime_trace_no_external_cli_wrap.py (source-grep test), tests/unit/probes/layer_c/test_runtime_trace_argv_builders.py (pure-function builders), tests/unit/cache/test_special_token_dispatch.py (the new cache/keys.py extension), tests/property/test_strace_parser_commutativity.py (Hypothesis-driven permutation stability), tests/unit/pre_commit/test_forbidden_patterns_phase2_runtime_trace.py (mirrors S5-01 AC-11 shape).
  • Existing — edit required: src/codegenie/cache/keys.py — extend declared_inputs_for with _resolve_special_token dispatch and add CacheKeyError. S5-02 is the first consumer of the special-token mechanism; S1-09 added the ProbeContext.image_digest_resolver field but did NOT extend cache/keys.py. Possibly extend: scripts/check_forbidden_patterns.py::_is_under_phase2_banned_package if the predicate doesn't already cover probes/layer_c/runtime_trace.py (S5-01 covered probes/layer_c/scenario_result.py; verify the predicate is path-scoped to the whole probes/layer_c/ subdirectory).
  • Existing — read-only references: src/codegenie/probes/layer_c/scenario_result.py (S5-01 — variant set), src/codegenie/probes/_shared/scanner_outcome.py (S5-01 — NOT consumed by this probe; documented for cross-reference), src/codegenie/probes/base.py (read ProbeContext.image_digest_resolver field after S1-09; Probe.confidence: Literal["high","medium","low"] contract — pinned in this story's test), src/codegenie/exec/__init__.py (run_allowlisted — S1-06 lands docker/strace in ALLOWED_BINARIES), src/codegenie/output/writer.py (writer's RedactedSlice signature — S3-03), src/codegenie/cache/keys.py::key_for (downstream consumer of the new dispatch).
  • No edit: pyproject.toml [tool.mypy] (warn_unreachable = true is already repo-wide; S1-02 / S1-11). docs/localv2.md §4 (already permits the special-token form). src/codegenie/probes/base.py (image_digest_resolver already present; S1-09).

Out of scope

  • The freshness-check registration @register_index_freshness_check("runtime_trace"): S5-05 lands it. This story's probe emits the slice fields B2 reads; S5-05 wires the freshness function.
  • The image_digest_drift adversarial test — S5-05.
  • The adversarial_dockerfile container-hardening test — S5-06 (this story makes the hardening flags present and tested at unit level; S5-06 proves the flags actually contain a forkbomb).
  • DockerfileProbe, EntrypointProbe, ShellUsageProbe, CertificateProbe: S5-03.
  • SyftProbe, GrypeProbe: S5-04 (which requires=["runtime_trace"] per the dispatch-ordering ADR; see S5-04's requires mechanism).
  • Sub-schema src/codegenie/schema/probes/layer_c/runtime_trace.schema.json: S5-03 lands it (this story emits the dict shape; S5-03's sub-schema validates it).
  • Bench (cold p50 ~90 s) — S8-03 lands the canary; this story's unit tests do not exercise wall-clock targets.

Notes for the implementer

  • The single most load-bearing test in this story is test_concurrent_task_count_le_one. It encodes final-design.md §"Where security/best-practices traded off perf" (a) — "sequential runtime trace scenarios (~75 s wall-clock floor vs. theoretical 15 s if parallel) — accepted because parallel traces against the same image race resources and confuse attribution." A future PR that introduces asyncio.gather over scenarios will flip this red. Do not weaken the assertion to "no gather literal in source" — that is bypassable. Assert on observed task count, not on syntax.
  • Envelope confidence contract preservation. Probe.confidence: Literal["high","medium","low"] is frozen at src/codegenie/probes/base.py:68 (and localv2.md §4 line 328). The probe's slice carries trace_coverage_confidence: Literal["high","medium","low","unavailable"] (tetra-state — a Phase-2 extension of the tri-state in localv2.md §5.3 C4); the envelope's confidence clips this to the contract via _envelope_confidence. NEVER widen the envelope confidence Literal — even if it seems harmless; the contract amendment requires an ADR-gated Probe-ABC change that this story does not have. test_envelope_confidence_contract_preserved is the structural defense.
  • ImageDigestUnresolved is a TraceFailureReason variant, NOT a TraceSkipReason variant (S5-01 HARDENED variant set). Resolver-returned-None / resolver-unbound / resolver-raised paths all emit TraceScenarioFailed(reason=ImageDigestUnresolved()) — NOT Skipped. Docker-build-failure paths emit TraceScenarioSkipped(reason=ImageBuildUnavailable()). The semantic distinction: a scenario that failed to acquire its prerequisite (image digest unresolved) was attempted; a scenario that was never attempted because the image build itself failed was skipped. Do NOT add new variants to S5-01 from this story (Rule 3 — surgical). If the implementer encounters a path that S5-01's variant set genuinely doesn't cover, surface to user and amend S5-01 in a separate PR.
  • Cache _resolve_special_token dispatch lives in this story. S1-09 added the ProbeContext.image_digest_resolver field but did NOT extend cache/keys.py; inspection confirms cache/keys.py::declared_inputs_for (lines 94–126) literally rglobs every entry and silently drops non-matches. As the first consumer, this story lands the resolver. The dispatch is a match on the token name with an explicit raise CacheKeyError(reason="unknown_special_token", token=…) on the otherwise arm — Open/Closed seam for future tokens (scip-index-output:, tree-sitter-grammar-set:); future arms add via ADR amendment to 02-ADR-0004. Fold the resolved string (or sentinel "" for None / unbound / raised) into the content-hash tuple key_for already constructs.
  • The macOS path is permanent. Resist the urge to add a "TODO: implement dtruss with sudo" comment. The synthesis explicitly chose StraceUnavailable over a sudo-prompting dtruss path because the sudo prompt would break determinism and CI is Linux-canonical. The macOS path emits the typed failure so S5-05's freshness check + S8-01's renderer surface it loudly. Canonical detector is sys.platform != "linux" (NOT os.uname().sysname); pick one and stick with it.
  • Layer C does NOT use run_external_cli. 02-ADR-0001 (final-design.md §"Departures" #1). The run_external_cli wrapper (S1-07) adds bubblewrap --unshare-net and env-strip for Layer B/G scanners. For Layer C the equivalent isolation is the --network=none --cap-drop=ALL --security-opt=no-new-privileges flags constructed at the call site — different mechanism, same outcome. Wrapping docker inside bubblewrap --unshare-net would prevent docker build from working (Docker daemon socket access). The test_no_run_external_cli_in_source smoke test is the structural enforcement.
  • image_digest_resolver raising path. Any exception from ctx.image_digest_resolver(repo_root) is caught and translated to per-scenario TraceScenarioFailed(reason=ImageDigestUnresolved()) + structured-log image_digest_unresolved_reason="resolver_raised". Only the exception's type name (type(exc).__name__) goes into the separate structured-log field image_digest_resolver_error_repr; neither str(exc) nor repr(exc) is logged, because both embed the message body and could leak PII via exception text. The probe never raises out of run().
  • Cache HIT semantics. When Phase 0 Cache returns a HIT (resolved image-digest:<digest> token matches cached token), the probe's run() should NOT be re-entered for the scenarios block — the cached slice is returned. The "second-run _execute_scenario call count == 0" assertion guards this. If you find yourself touching Cache.get/put, stop — Phase 0 Cache already handles HIT short-circuiting via key_for; this probe just needs to emit declared_inputs correctly and accept the cached envelope.
  • Aggregate timeout semantics. When the aggregate 600 s budget expires mid-scenario, the not-yet-started scenarios get TraceScenarioSkipped(reason=ImageBuildUnavailable()) (closest existing S5-01 variant for "didn't run"). The currently-executing scenario, on asyncio.CancelledError, gets TraceScenarioFailed(reason=ScenarioTimeout(seconds=<remaining>)). Document this in the module docstring so a future maintainer doesn't conflate the two paths.
  • _image_built is per-run(), not per-instance. The flag lives as a local in _run_all_scenarios(...) (or as an accumulator threaded through the per-scenario tuple return) and is passed explicitly into _execute_scenario(...). NO self._image_built attribute. If the coordinator ever reuses a probe instance across gathers (a possibility the kernel hasn't ruled out), instance-level state would poison the second run() — the image may have been rebuilt between gathers, and we MUST run docker build exactly once per run() invocation. test_image_built_local_not_instance is the structural defense.
  • No pytest-xdist — Phase 2 ADR-0009 vetoed parallel test execution. Even this probe's unit tests are serial. Wall-clock cost is paid in CI's unit job budget (≤ 90 s per Step 5 README; verify in S8-03's bench canary).
  • The slice's built_image_digest and last_traced_image_digest are what S4-01's IndexHealthProbe reads. Today they are identical when a fresh trace succeeds; S5-05 introduces the image_digest_drift adversarial that mutates them apart so B2 emits Stale(DigestMismatch(...)).
  • Strace-parsing is pure. _parse_strace_lines(lines: Iterable[str]) -> ParsedTrace over an iterable of lines, returning a frozen Pydantic model. Set-valued fields (binaries_executed, shared_libs_loaded, cert_paths_read, files_read_at_runtime, network_endpoints_touched) are frozenset so permutation stability is structural; only shell_invocations: int is non-commutative (count may differ under reorderings that group/un-group exec lineages — documented in module docstring). The Hypothesis property test (tests/property/test_strace_parser_commutativity.py) exercises permutation stability for the set fields.
  • Image-ref smart-constructor format. _image_ref_for_digest(digest: str) -> str returns _IMAGE_REF_PREFIX + _short(digest) where _IMAGE_REF_PREFIX: Final[str] = "codegenie-trace:" (module constant) and _short strips any leading "sha256:" prefix then takes the first 12 hex characters. Empty / non-hex inputs raise ValueError. The format is pinned in one helper — no string concatenation of "codegenie-trace:" at any call site.
  • Pure argv builders. _build_strace_argv and _build_docker_run_argv are module-private pure functions; each is importable by tests/unit/probes/layer_c/test_runtime_trace_argv_builders.py and asserted without mocking run_allowlisted. The strace builder emits exactly one "--" token of its own, separating strace's args from the wrapped docker run invocation (the docker run segment then carries a second "--" before image_ref); mutation #3 (argv-merge regressions) is caught by test_build_strace_argv_explicit_dash_dash_separator.
  • Operator-extensibility for scenarios. Adding a 6th, 7th, … scenario is a .codegenie/scenarios.yaml operator edit. Zero runtime_trace.py edit required. test_six_plus_scenarios_via_yaml_zero_source_edit is the structural defense. Adding a new canonical default scenario is a separate (rarer) event — that's a localv2.md §5.3 C4 doc amendment + _DEFAULT_SCENARIOS constant edit, with the source-scan uniqueness test catching drift if a sibling module hardcodes its own list.
  • Newtype deferral (S1-05). image_ref: str, image_digest: str, scenario_name: str each cross ≥ 2 module boundaries (probe ↔ cache, probe ↔ slice, probe ↔ structured-log). S1-05 is the canonical newtype story; mirror S5-01 DF-5's deferral — do NOT introduce newtypes here (Rule 2 — premature abstraction with only one in-tree producer). When S1-05 lands (or when a 3rd consumer of these strings emerges), the migration is a one-pass rename + alias.
  • Trace-backend Protocol deferral (Phase 5 / Phase 7). The macOS/Linux split is one if today (two cases, below the rule-of-three threshold). When a 3rd backend lands (microVM ptrace? Phase 5? dtrace under Chainguard distroless? Phase 7?), refactor to a _TraceBackend Protocol with Strace, Unavailable, and Ptrace implementations. Today's if is the boring shape and is fine per CLAUDE.md Rule 2 ("three similar lines is better than a premature abstraction").
  • Producer/consumer assert_never ladder. This story is the 1st canonical producer of ScenarioResult (S5-01 was the type introduction). Document in the module docstring: producers = {RuntimeTraceProbe}; consumers = {_aggregate_scenarios (in-module), S5-05 freshness check, S8-01 renderer}. Mirror S5-01's "rehearse the discipline at every level" — the match runs on ScenarioResult top-level AND on the inner TraceFailureReason / TraceSkipReason reasons where used.
  • forbidden-patterns extension. S5-01 covered probes/layer_c/scenario_result.py. This story adds the 2nd probes/layer_c/ module; verify by inspection that scripts/check_forbidden_patterns.py::_is_under_phase2_banned_package matches the whole probes/layer_c/ subdirectory (most likely it does, since S5-01 prescribed the path-scoped predicate). If it does not, extend it in this story's PR, mirroring the S5-01 AC-11 pattern; the dedicated tests/unit/pre_commit/test_forbidden_patterns_phase2_runtime_trace.py is the structural defense.
  • mypy enforcement. [tool.mypy] warn_unreachable = true is repo-wide since Phase 0 S1-02 (pyproject.toml line 141 — established by S1-11 validation, reaffirmed in S5-01 validation). No per-module override needed. test_mypy_warn_unreachable_is_repo_wide is the cross-cut defense.
  • Open question — distroless target image (Phase 7 forward-looking): the distroless-target fixture (S7-01) exercises RuntimeTraceProbe against an image where strace cannot attach (distroless has no /proc/self/exe symlink for the host strace to read against). Today: same shape as macOS — TraceScenarioFailed(reason=StraceUnavailable()) per scenario, surfaced via structured log. Document in the module docstring as an open path that S7-01 stresses.