Validation report: S6-06 — `Semgrep` + `AstGrep` + `RipgrepCurated` Layer G scanners¶

Validated: 2026-05-17 Verdict: HARDENED Validator version: phase-story-validator v1 Story: ../S6-06-layer-g-curated-scanners.md

Summary¶

S6-06's intent — three sibling Layer G scanner files (semgrep.py, ast_grep.py, ripgrep_curated.py) that decline to share a ScannerRunner base because each has a genuinely-different I/O shape (and semgrep's "exit code 1 = findings present" carve-out is the textbook reason a shared abstraction would be wrong) — is well-formed and traces cleanly to phase-arch-design.md §"Design patterns applied" row 7, §"Component design" #5, and ADR-0001. The story is also the stated test of that design discipline: a reviewer who collapses the four scanners into a shared base immediately fails the architectural tests in test_scanner_loc_ceiling.py. None of that is in dispute.

However, the original draft repeats the same systemic kernel-drift mis-statements every S5-04 / S6-01 / S6-02 / S6-03 / S6-04 / S6-05 validation has had to undo: the frozen Phase-0 Probe ABC (src/codegenie/probes/base.py:74-96) is async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput with a name: str class attribute and a six-field ProbeOutput(schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors) constructor. The draft consistently encodes _run(ctx), probe_id = ProbeId("..."), ProbeOutput(probe_id=..., schema_slice=..., errors=[]) (four fields, wrong names), ProbeContext.for_test(...) (does not exist), tuple[str, ...] for applies_to_* (must be list[str]), from codegenie.ids import ProbeId (wrong module — actual is codegenie.types.identifiers), from codegenie.exec import ToolMissingError (actual: codegenie.errors; exec re-exports), run_external_cli(binary, argv, timeout_seconds=...) synchronously (actual is await run_external_cli(probe_name, argv, *, cwd, timeout_s) async with probe_name separate from argv), and pytest-subprocess's fp.register([...]) mocking pattern (repo precedent is monkeypatch.setattr(mod, "run_external_cli", _spy) — see tests/unit/probes/layer_c/test_sbom.py:140 and test_cve.py:141-363).

A second, deeper structural flaw: the story declares per-scanner Pydantic Finding models (SemgrepFinding, AstGrepFinding, RipgrepFinding with different fields) as the ScannerRan.findings payload. But ScannerRan.findings: list[Finding] is typed (S5-01, src/codegenie/probes/_shared/scanner_outcome.py:82-88) against the shared Finding(kind, id, severity, metadata: dict[str, JSONValue]). The S5-04 precedent (sbom.py:244-254) shows how to honor both: ScannerRan(findings=[]) is fine (empty list passes the contract); the scanner-specific rich payload lives in slice-level fields (packages_by_source, package_count, etc.) — not in ScannerRan.findings. The story must adopt this two-level shape, or amend ADR-0006 to widen ScannerRan.findings. The first is by-far cheaper.

A third drift: AC-13 invokes a phantom OutputTooLarge exception and a ScannerFailed(reason="output_too_large") value that does not exist in the ScannerFailed.reason: Literal["invalid_json", "sbom_artifact_missing"] | None closed set. run_external_cli does not raise on cap-exceeded — it returns a ProcessResult with tail-truncated bytes (src/codegenie/exec/__init__.py:579-598). The test must be reframed: "stdout truncated past cap → parse-then-emit-finding still succeeds (tail-preservation guarantee) OR ScannerFailed(reason='invalid_json') if the tail starts mid-token."

Verdict: HARDENED. Twenty-six in-place edits applied — sixteen consistency fixes mirroring S6-05's pattern, six new ACs covering coverage gaps (truncation, timeout, empty-findings, registry-membership smoke, name-identity discipline, ABC-required-attrs), three test-quality hardens (AST audit instead of source-grep, property-based classifier totality, mutation-resistance smoke), and three design-pattern hardens (pure-total classifier per scanner, _PROBE_ID: Final[ProbeId] dual-form identity, slice-level rich findings instead of breaking the ScannerRan.findings contract). No NEEDS RESEARCH findings — every fix has direct in-repo precedent (S5-04 sbom/cve, S6-05 layer-e).

Context Brief (Stage 1)¶

Story snapshot¶

Goal: Ship src/codegenie/probes/layer_g/{semgrep,ast_grep,ripgrep_curated}.py. Each ≤ 200 LOC, each @register_probe(heaviness="medium"), each follows the five-step inline shape (tool-cache check → run_external_cli → Pydantic parse → ScannerOutcome → ProbeOutput), each imports zero code from sibling scanners. The fourth scanner (gitleaks.py) is S6-07.
Non-goals: GitleaksProbe (S6-07), TestCoverageMappingProbe (S6-08), custom rule pack authoring, cross-scanner correlation, SecretRedactor invocation (writer-chokepoint job).
Effort: M.
Depends on: S1-07 (run_external_cli), S3-03 (writer + sanitizer composition), S5-01 (ScannerOutcome sum type), S1-08 (@register_probe(heaviness=...)).

Phase / arch constraints touched¶

Phase 0 ADR-0007 — Probe ABC frozen. Eight class attributes (name, layer, tier, applies_to_tasks, applies_to_languages, requires, declared_inputs, timeout_seconds, optional cache_strategy). One async method run(repo, ctx). Six-field ProbeOutput.
02-ADR-0001 — semgrep, ast-grep, ripgrep in ALLOWED_BINARIES. Layer C uses run_allowlisted directly; Layer B/G goes through run_external_cli.
02-ADR-0003 Option D — @register_probe accepts only heaviness + runs_last. requires is a class attribute. Coordinator does NOT topologically sort.
02-ADR-0005 — no plaintext persistence; secret findings flow through SecretRedactor at the writer chokepoint. Probe captures honestly.
02-ADR-0006 — ScannerOutcome variant set is closed; extension is ADR-amendment-gated. ScannerFailed.reason is Literal["invalid_json", "sbom_artifact_missing"] | None. ScannerSkipped.reason is Literal["tool_missing", "tool_unhealthy", "upstream_unavailable"].
02-ADR-0010 — RedactedSlice smart constructor at writer boundary.
phase-arch-design.md §"Design patterns applied" row 7 — no shared ScannerRunner abstraction; the duplication is the discipline.
CLAUDE.md "Extension by addition" — adding a 4th scanner (gitleaks) is a new file, not edits to the three landed here.
CLAUDE.md "Honest confidence" — confidence="high" when the scanner ran and parsed; confidence="low" when skipped or failed.

Findings by critic¶

Coverage critic¶

ID	Severity	Finding	Proposed fix
K1	harden	No AC for timeout → typed outcome. `run_external_cli` raises `ProbeTimeoutError` on `timeout_s` exceeded; the story doesn't say what happens. S5-04 precedent: `_ProcessExited(exit_code=124, stdout=b"", stderr_tail="<probe>.timeout")` → `ScannerFailed(exit_code=124, ...)`.	New AC-T1: timeout → `ScannerFailed(exit_code=124, stderr_tail="<probe>.timeout")`. New test per probe.
K2	harden	No AC for empty findings success (semgrep exit 0 with `results: []`). Mutation: an implementation that returned `ScannerSkipped(...)` on empty findings would pass every other AC.	New AC-E1: exit 0 + valid JSON + empty `results` → `ScannerRan(findings=[])` with `confidence="high"`.
K3	harden	No AC for registry-membership smoke (mirror S6-05 AC-NEW-2 / S5-04 K2). The `@register_probe(heaviness="medium")` decorator effect is unverified — a future maintainer who drops the decorator silently loses dispatch.	New AC-R1: `_PROBE_REGISTRY["semgrep"]` / `["ast_grep"]` / `["ripgrep_curated"]` entries exist with `heaviness="medium"` and `runs_last=False`.
K4	harden	No AC for the `name` class-attribute identity (mirror S6-05 AC-NEW-1 / S6-04 dual-form discipline). The story uses `probe_id = ProbeId("semgrep")` which is not the ABC attribute (ABC has `name: str`). Per S5-04 sbom/cve precedent, the right shape is `_PROBE_ID: Final[ProbeId] = ProbeId("semgrep")` module constant + `name: str = "semgrep"` class attribute — the `_PROBE_ID` is what feeds `run_external_cli(probe_name=...)`; the `name` is what the kernel introspects.	New AC-N1 per probe: module-level `_PROBE_ID: Final[ProbeId]` + class-level `name: str = "semgrep"` matching the filename.
K5	harden	No AC pinning the eight ABC class attributes (`layer="G"`, `tier="base"`, `requires=[]`, `declared_inputs=[<globs>]`, `cache_strategy` default). S5-04 precedent: `cve.py:177-185` pins every one. Without these the registry can still load but `mypy --strict` will let `layer = "F"` slip.	New AC-B1: every probe pins `layer="G"`, `tier="base"`, `requires: list[str] = []`, `declared_inputs: list[str] = [<per-scanner globs>]`.
K6	harden	AC-13 invokes a phantom `OutputTooLarge` exception and `ScannerFailed(reason="output_too_large")`. Neither exists — `run_external_cli` tail-truncates and returns a `ProcessResult` (`src/codegenie/exec/__init__.py:579-598`); `ScannerFailed.reason` is closed to `{"invalid_json", "sbom_artifact_missing", None}` (`scanner_outcome.py:120`).	Rewrote AC-13: stdout-cap-exceeded path is exercised by parametrized test that mocks `run_external_cli` to return a `ProcessResult` with truncated bytes; assert (a) the probe doesn't raise, (b) if the tail still parses → `ScannerRan(...)`, (c) if the tail starts mid-token → `ScannerFailed(reason="invalid_json")`.
K7	harden	AC-14 (`bubblewrap` no-op on macOS) tests `sys.platform = "darwin"` — but the platform check is inside `run_external_cli`, not the probe. The probe sees no difference between Linux+bwrap and macOS+no-bwrap. The correct AC is "probe is platform-independent": same monkeypatched `run_external_cli` stub → same probe outcome on either platform.	Rewrote AC-14: the probe contains zero `sys.platform` / `platform.system()` / `shutil.which("bwrap")` references (AST audit). The platform-handling is the wrapper's job.
K8	harden	AC-18 (sub-schema round-trip) defers to S6-08 but the story doesn't say what to assert in S6-06's TDD lane. S6-05 precedent: pin Pydantic `model_config["extra"] == "forbid"` on the slice model in this story; defer JSON-Schema round-trip to S6-08.	Soft AC-18a: each slice model has `model_config = ConfigDict(frozen=True, extra="forbid")`. JSON-Schema round-trip deferred to S6-08.
K9	harden	The per-scanner Pydantic `Finding` shape (`SemgrepFinding`/`AstGrepFinding`/`RipgrepFinding`) collides with `ScannerRan.findings: list[Finding]` where `Finding` is the shared S5-01 model. AC-9 needs reframing OR ADR-0006 amendment (the latter is expensive — closed-set discipline).	Adopt the S5-04 precedent: `ScannerRan(findings=[])` (empty); the rich shape lives in slice-level fields. Rewrote AC-9 to use `findings_detail: list[SemgrepFinding]` etc. on the slice, with `ScannerRan.findings` left empty (the typed sum signals "scanner ran"; the rich payload is slice-side).
K10	nit	AC-7's ripgrep argv embeds the literal pattern `"exec\\("` as a string in the argv list — but `--type-not lock` argument order matters (rg parses positional flags). Also `--max-count` is per-file, NOT per-pattern (rg's documented semantics).	Tightened AC-7: argv order pinned with mutation note; replaced "per-pattern" with "per-file" in description.

Test-Quality critic¶

ID	Severity	Finding	Proposed fix
T1	block	Every TDD test imports `from codegenie.probes.layer_g import semgrep as sg` and calls `sg.SemgrepProbe()._run(ctx)`. The ABC's only method is `async def run(self, repo, ctx)` — `_run` does not exist. Tests will fail at AttributeError, not catch any business logic.	Rewrote every test to use `await SemgrepProbe().run(repo, ctx)` with helper builders `_make_repo()` / `_make_ctx()` mirroring `tests/unit/probes/layer_c/test_sbom.py` conftest.
T2	block	Tests use `pytest-subprocess`'s `fp.register([...])` — but the repo's actual mocking precedent (10+ call sites) is `monkeypatch.setattr(mod, "run_external_cli", _spy)` returning an awaitable `ProcessResult`. `pytest-subprocess` is not in `pyproject.toml § [project.optional-dependencies] dev`.	Rewrote every test to monkeypatch `run_external_cli` directly on the scanner module. The argv-introspection assertion is now a captured-args spy (`captured["argv"]`) inside the stub.
T3	harden	No property-based test on the per-scanner classifier. Each scanner has a pure classifier (`_classify_semgrep_outcome`, etc.) that maps `(ProcessResult \\| _ToolMissing \\| _ProcessTimedOut)` → `ScannerOutcome`. Totality (every input → exactly one variant, no raises) is the load-bearing property. S5-04 T3 precedent.	New parametrized Hypothesis property test per scanner: for any drawn `ScannerAttempt`, classifier returns exactly one `ScannerOutcome` and never raises.
T4	harden	AC-16's "no direct subprocess" test uses source-grep — defeatable by string concatenation or re-export. S5-04 T4 precedent: AST audit.	Replaced with AST-walk: parse the module source, walk for `ast.Call` whose `.func` resolves to `subprocess.run` / `subprocess.Popen` / `asyncio.create_subprocess_exec` / `asyncio.create_subprocess_shell` / `os.system` / `os.popen`. Also assert `run_allowlisted` is not imported in any layer_g module (Layer G uses the `run_external_cli` wrapper; Layer C uses `run_allowlisted` directly).
T5	harden	No mutation-resistance smoke (mirror S5-04 T2 / S6-05 mutation table). A scanner that always returned `ScannerRan(findings=[])` regardless of exit code would pass most happy-path tests.	New Test-MR: parametrized table of 6+ wrong stubs per scanner (always-ran, always-skipped, always-failed, swallow-ValidationError, ignore-exit-1, raise-past-boundary); assert each fails at least one named test.
T6	harden	The `test_semgrep_exit_code_1_is_findings_not_failure` test pins only that `outcome` is `ScannerRan`. A trivially-wrong implementation that returned `ScannerRan(findings=[])` regardless of stdout would pass. Mutation hint missing.	Strengthened to assert `len(slice_.findings_detail) == 1` AND `slice_.findings_detail[0].check_id == "p/nodejs.eval-detected"`. Mutation note added.
T7	harden	No test asserts `run_external_cli` is awaited with the correct `probe_name=` argument (the `ProbeId` value). A copy-paste between scanners could leave `semgrep.py` calling `run_external_cli(ProbeId("ast_grep"), ...)`.	New test per scanner: spy captures the `probe_name` positional arg; assert equality to the scanner's `_PROBE_ID`.

Consistency critic¶

ID	Severity	Finding	Proposed fix
C1	block	Every probe-class skeleton uses `probe_id = ProbeId("...")` as a class attribute. The frozen ABC (`base.py:75`) declares `name: str`. The story violates Phase 0 ADR-0007's "byte-for-byte localv2 §4" snapshot test (`tests/unit/test_probe_contract.py`).	Rewrote class skeleton: `name: str = "<scanner>"` class attribute + module-level `_PROBE_ID: Final[ProbeId] = ProbeId("<scanner>")` constant (S5-04 / S6-05 dual-form precedent). The `_PROBE_ID` is the value passed to `run_external_cli(probe_name=..., ...)`.
C2	block	`from codegenie.ids import ProbeId`. The actual module is `codegenie.types.identifiers` (`src/codegenie/types/identifiers.py`; cve.py:81 + sbom.py:78 import precedent).	Corrected to `from codegenie.types.identifiers import ProbeId`.
C3	block	`from codegenie.exec import ProcessResult, ToolMissingError, run_external_cli`. `ToolMissingError` actually lives in `codegenie.errors`; `exec` re-exports it but the sibling-precedent (`sbom.py:59-60`) imports `ToolMissingError` (and `ProbeTimeoutError`) from `codegenie.errors`, not from `codegenie.exec`.	Corrected the import lines: `from codegenie.errors import ProbeTimeoutError, ToolMissingError`; `from codegenie.exec import ProcessResult, run_external_cli`.
C4	block	The Green skeleton's `_run(self, ctx)` returns `ProbeOutput(probe_id=..., confidence=..., schema_slice=..., errors=[])` — four kwargs. The actual `ProbeOutput` dataclass is `(schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors)`. There is no `probe_id` field; missing `raw_artifacts`, `duration_ms`, `warnings`.	Rewrote the skeleton's `_envelope` helper to construct `ProbeOutput(schema_slice={"semgrep": slice_dict}, raw_artifacts=raw_artifacts, confidence=..., duration_ms=int((time.perf_counter() - t0) * 1000), warnings=[], errors=[])` matching `sbom.py:289-297`.
C5	block	`_run` is the wrong method name. The ABC's frozen method is `async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput`. Story's signature drops `repo` entirely, but the scanner needs `repo.root` as `cwd` to `run_external_cli`.	Rewrote signature to `async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput` everywhere; `cwd=repo.root` in the `run_external_cli` call.
C6	block	`run_external_cli(binary, argv, timeout_seconds=...)` — wrong call site. Actual signature is `await run_external_cli(probe_name: ProbeId, argv: list[str], , cwd: Path, timeout_s: float, allowlisted_egress=frozenset(), max_stdout_bytes=64 1024 * 1024)`. `argv[0]` is the binary; the first positional is `probe_name`; the kwarg is `timeout_s` (not `_seconds`); `cwd` is required.	Rewrote AC-5/6/7 and the skeleton's call sites: `await run_external_cli(_PROBE_ID, ["semgrep", "--config", ..., str(repo.root)], cwd=repo.root, timeout_s=60.0)`.
C7	block	`applies_to_tasks: tuple[str, ...] = ("*",)`. ABC's frozen type is `list[str]` (mirror S6-05 finding). Tuple vs list collide on registry introspection.	Corrected to `applies_to_tasks: list[str] = [""]` and `applies_to_languages: list[str] = [""]` on every probe.
C8	block	Tests use `ProbeContext.for_test(repo_root=...)`. No such constructor exists; `ProbeContext` is a stdlib `@dataclass` with required fields (`cache_dir, output_dir, workspace, logger, config`). Same drift S6-05 / S6-04 had to fix.	Replaced with `_make_repo()` + `_make_ctx()` fixtures (mirror `tests/unit/probes/layer_c/test_sbom.py` conftest pattern); construct `RepoSnapshot(root=..., git_commit=None, detected_languages={}, config={})` and `ProbeContext(cache_dir=tmp_path / "cache", output_dir=tmp_path / "out", workspace=tmp_path / "ws", logger=logging.getLogger("test"), config={})`.
C9	harden	`_PROBE_REGISTRY["semgrep"].heaviness` — registry-entry shape unverified. Per S5-03/S5-04 validation reports, the registry entry stores only `heaviness` + `runs_last`.	Asserted both fields and asserted no `requires` key (defensive mutation note that `@register_probe(requires=...)` was a kwarg-phantom in earlier drafts).
C10	harden	`ScannerRan(findings=[SemgrepFinding(...)])` — type-fails because `ScannerRan.findings: list[Finding]` (the shared S5-01 model), not `list[SemgrepFinding]`.	Per K9 fix: keep `ScannerRan(findings=[])`; put rich findings on the slice as `findings_detail: list[SemgrepFinding]`.
C11	harden	The `_config_path` helper reads `ctx.config.get("semgrep_config", "p/nodejs")` but `ctx.config: dict[str, Any]` semantics aren't pinned (mirror S6-05's `config` discipline).	Pinned: keyed lookup with explicit default; type cast to `Path` only when the value is a filesystem path, else passed as a registry slug string to semgrep's `--config`.
C12	harden	Schema path `src/codegenie/schema/probes/layer_g/{semgrep,ast_grep,ripgrep_curated}.schema.json`. Existing convention (`src/codegenie/schema/probes/layer_c/`) supports the nesting — confirmed precedent.	Surfaced — convention holds; AC-18a defers JSON-Schema round-trip to S6-08.

Design-Patterns critic¶

ID	Severity	Finding	Proposed fix
D1	harden	The story's Green skeleton runs the five steps inline in `_run`. The S5-04 precedent (`sbom.py:100-112`) introduces a pure, total classifier (`_classify_syft_outcome(attempt: ScannerAttempt) -> ScannerOutcome`) on a per-scanner tagged-union input. This gives (a) `match`-exhaustive `assert_never` discipline, (b) a property-based totality test (mutation-resistance), (c) functional core / imperative shell separation. The story misses this opportunity.	New Refactor step: each scanner declares a private tagged union `<Scanner>Attempt = _ToolMissing \| _ProcessTimedOut \| _ProcessExited(...)` and a pure classifier `_classify_<scanner>_outcome(attempt) -> ScannerOutcome`. The `run()` method's I/O is reduced to: build `attempt` → classify → wrap. Mirrors `sbom.py` line-for-line.
D2	harden	The rule-of-three trigger is mentioned in the design table but the story does not pin the threshold. S5-04 D2 hardening surfaced that "extract a shared kernel at the 4th scanner" is the next trigger; here it's the 3rd + 4th scanner that materially duplicate.	New Notes-for-implementer paragraph: when S6-07 (gitleaks) lands, three of the four scanners will share (a) the `_ProcessExited` tagged-union element + `_stderr_tail` helper, (b) the `_envelope` helper, (c) the `_PROBE_ID` + `_TIMEOUT` Final constants. The trigger for extraction to `codegenie.probes._shared.scanner_common` (a NEW module — not edits to existing scanners) is "third scanner ships and three concrete behaviors duplicate verbatim across files." The semgrep exit-1 carve-out and ripgrep's NDJSON parser stay per-scanner (genuinely different).
D3	harden	`_CURATED_PATTERNS: Final[tuple[str, ...]]` is correct but missing the `Final` annotation in the skeleton (only mentioned in Notes). Module constants without `Final` lose mypy's rebinding backstop.	Pin `Final[...]` annotation on every module constant (`_PROBE_ID`, `_TIMEOUT_S`, `_CURATED_PATTERNS`, `_SLICE_FILENAME`, `_RAW_TOOL_FILENAME`) in the skeleton + a one-line refactor step.
D4	harden	Per-scanner `_RAW_TOOL_FILENAME` / `_SLICE_FILENAME` constants are missing — sibling-precedent (`sbom.py:85-86`) makes the writer's two-file split (typed slice file + raw tool bytes) explicit and grep-discoverable. Story implicitly serializes the slice but never writes the raw tool bytes.	New AC-W1 per probe: writes `<raw_dir>/<scanner>.json` (typed slice via `IndexName("<scanner>")` for downstream sibling-slice readers) AND `<raw_dir>/<scanner>-raw.json` (the raw scanner stdout, retained for audit). Failure paths (`ScannerFailed`) only write the slice file, never the malformed raw file.
D5	harden	Per-scanner rich Finding model on the slice (`SemgrepFinding` / `AstGrepFinding` / `RipgrepFinding`) plus the empty `ScannerRan.findings: list[Finding]` from S5-01 (K9 fix) — three independent Pydantic models with `frozen=True, extra="forbid"`. Open/Closed at the file boundary: a future variant (e.g. semgrep `--severity=info` tier) is a sibling model under tagged-union, not a backward-compatible additive field.	Pinned per-scanner Finding-model discipline in AC-9 and the Refactor step. Phase-3+ widening goes through tagged-union extension.
D6	nit	`--metrics=off` is one of three semgrep "phone-home" flags (the others are `SEMGREP_USER_AGENT`, `SEMGREP_SEND_METRICS`). A future maintainer could re-introduce telemetry via env.	Notes-for-implementer paragraph: the env-strip in `run_external_cli._filter_env` is the structural defense; `--metrics=off` is belt-and-suspenders. Both must hold.

Research briefs (if any)¶

None — every fix has direct in-repo precedent:

S5-04 sbom/cve for the _PROBE_ID: Final[ProbeId] + name: str dual-form, the _classify_*_outcome pure-total classifier, the _envelope helper, the two-file write split, the monkeypatch.setattr(mod, "run_external_cli", _spy) mocking pattern, the AST audit for direct-subprocess violations.
S6-05 layer-e for the kernel-drift fix table (eight blocks: _run/run, probe_id/name, ProbeContext.for_test, _PROBE_REGISTRY, from codegenie.ids, ProbeOutput(probe_id=...), tuple vs list[str], ("*",) vs ["*"]).
S5-01 for the ScannerOutcome discriminated-union typing and the closed-set reason enums.
phase-arch-design.md row 7 for the "no shared ScannerRunner" stance.
02-ADR-0006 §Consequences for the closed-set discipline that forbids invented reason values like "output_too_large".

The Hypothesis property test for classifier totality is the standard pattern — no arXiv lookup needed.

Conflict resolutions¶

Design-Patterns D1 (per-scanner pure classifier) vs Rule 2 (Simplicity First) vs the 200 LOC ceiling. D1 wins — the classifier is small (~10 lines), the tagged-union input is _ToolMissing | _ProcessTimedOut | _ProcessExited (3 elements per scanner), and the totality property test buys mutation-resistance not otherwise reachable. Verified: semgrep ~150 LOC + classifier fits under 200; ast-grep ~120 LOC + classifier fits; ripgrep ~150 LOC + classifier fits.
Design-Patterns D2 (rule-of-three trigger) vs phase-arch-design.md row 7 (no shared base). Row 7 wins for classes (no shared ScannerRunner base); D2 wins for helpers (the third copy of _stderr_tail / _envelope / _ProcessExited triggers extraction to a new codegenie.probes._shared.scanner_common module — additive, no edits to existing scanner files). The discipline is "no shared class/base", not "no shared module" — final-design table row 7 explicitly admits ScannerOutcome itself as a shared module under _shared/.
Coverage K9 (per-scanner Finding model) vs ADR-0006 (ScannerRan.findings: list[Finding] is closed). ADR-0006 wins — amending the closed set is expensive and would force every Phase-2 consumer to re-discriminate. The K9 fix moves the rich shape to the slice level (findings_detail: list[SemgrepFinding] on the slice), keeping ScannerRan(findings=[]). Sibling consumers see the closed sum; rich consumers (renderer, Phase-3 planner) read the slice.
Coverage K6 (output_too_large reason) vs ADR-0006 (closed reason enum). ADR-0006 wins — invented reason values violate the smart-constructor discipline. Truncation is observable but not a typed failure mode; it just shifts the invalid_json path's likelihood. Reframed accordingly.
Test-Quality T2 (mock at run_external_cli level) vs the story's stated rationale ("pytest-subprocess mocks at the subprocess layer, exercising the wrapper"). Repo precedent (10+ call sites) wins. The wrapper is exercised by its own test suite (tests/unit/exec/test_run_external_cli.py); each probe's tests stub run_external_cli directly and assert the contract surface (argv, probe_name, cwd, timeout_s, awaitable result). Documented in Notes-for-implementer #4.

Edits applied¶

Twenty-six in-place edits. Categories:

Header — Status: HARDENED (validated 2026-05-17); expanded Depends on: (added S5-01, S5-04 (precedent), S1-08); expanded ADRs honored: (clarified 02-ADR-0006 closed-set discipline).
Validation notes block inserted under header — records the verdict and the four critic categories.
ACs rewritten / added — 12 blocks → 19 ACs (six new: AC-T1 timeout, AC-E1 empty-findings, AC-R1 registry-membership, AC-N1 name identity, AC-B1 ABC-required-attrs, AC-W1 two-file write). Original AC-13 (output_too_large) rewrote to truncation-tolerance; AC-9 rewrote to put rich findings on the slice; AC-14 rewrote to pure platform-independence (AST audit); AC-3/4/5/6/7 corrected to list[str]/timeout_s/probe_name=/async/cwd=repo.root.
TDD plan rewritten — every test uses await SemgrepProbe().run(repo, ctx) with _make_repo() / _make_ctx(); every subprocess mock is monkeypatch.setattr(mod, "run_external_cli", _spy); argv assertions via captured-args spy; AST audit replaces source-grep for AC-16/AC-8; property-based classifier-totality test added per scanner; mutation-resistance smoke table added.
Green skeleton rewritten — name: str = "semgrep"; _PROBE_ID: Final[ProbeId] = ProbeId("semgrep"); full 8-attr ABC compliance; pure _classify_semgrep_outcome(attempt); _envelope helper matching sbom.py; async def run(self, repo, ctx) with await run_external_cli(_PROBE_ID, argv, cwd=repo.root, timeout_s=60.0); six-field ProbeOutput construction.
Refactor section rewritten — pure-classifier extraction step; Final[...] annotations step; two-file write step; rule-of-three trigger documented for S6-07 (gitleaks).
Notes-for-implementer extended — added the four new disciplines (slice-side rich findings, monkeypatch mocking, env-strip + --metrics=off belt-and-suspenders, helper-extraction rule-of-three trigger for S6-07 author).
Files-to-touch updated — added the two raw artifact paths per probe; tests/unit/probes/layer_g/conftest.py for _make_repo / _make_ctx helpers (mirror sbom/cve precedent).

Final verdict¶

HARDENED. The story is now ready for phase-story-executor. The discipline it tests (no shared ScannerRunner base) is preserved; the kernel-drift mis-statements that would have wasted at least one executor attempt are resolved; the per-scanner pure classifier + property-based totality test materially strengthen mutation-resistance; the slice-level rich Finding model honors the closed-set ScannerRan.findings contract; the architectural tests (AST audit, no-cross-scanner-imports, LOC ceiling) are the structural backstop the design row 7 promised.

Next story for the validator (when invoked): S6-07-gitleaks-and-secret-adversarial.md — same scanner shape, with the additional load-bearing concern of the writer-chokepoint test_secret_in_source.py adversarial test (which the design intentionally isolated from this story to keep the four scanners visibly separate).

Validation report: S6-06 — Semgrep + AstGrep + RipgrepCurated Layer G scanners¶