Skip to content

Validation report: S6-06 — Semgrep + AstGrep + RipgrepCurated Layer G scanners

Validated: 2026-05-17 Verdict: HARDENED Validator version: phase-story-validator v1 Story: ../S6-06-layer-g-curated-scanners.md

Summary

S6-06's intent — three sibling Layer G scanner files (semgrep.py, ast_grep.py, ripgrep_curated.py) that decline to share a ScannerRunner base because each has a genuinely-different I/O shape (and semgrep's "exit code 1 = findings present" carve-out is the textbook reason a shared abstraction would be wrong) — is well-formed and traces cleanly to phase-arch-design.md §"Design patterns applied" row 7, §"Component design" #5, and ADR-0001. The story is also the stated test of that design discipline: a reviewer who collapses the four scanners into a shared base immediately fails the architectural tests in test_scanner_loc_ceiling.py. None of that is in dispute.

However, the original draft repeats the same systemic kernel-drift mis-statements every S5-04 / S6-01 / S6-02 / S6-03 / S6-04 / S6-05 validation has had to undo: the frozen Phase-0 Probe ABC (src/codegenie/probes/base.py:74-96) is async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput with a name: str class attribute and a six-field ProbeOutput(schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors) constructor. The draft consistently encodes _run(ctx), probe_id = ProbeId("..."), ProbeOutput(probe_id=..., schema_slice=..., errors=[]) (four fields, wrong names), ProbeContext.for_test(...) (does not exist), tuple[str, ...] for applies_to_* (must be list[str]), from codegenie.ids import ProbeId (wrong module — actual is codegenie.types.identifiers), from codegenie.exec import ToolMissingError (actual: codegenie.errors; exec re-exports), run_external_cli(binary, argv, timeout_seconds=...) synchronously (actual is await run_external_cli(probe_name, argv, *, cwd, timeout_s) async with probe_name separate from argv), and pytest-subprocess's fp.register([...]) mocking pattern (repo precedent is monkeypatch.setattr(mod, "run_external_cli", _spy) — see tests/unit/probes/layer_c/test_sbom.py:140 and test_cve.py:141-363).

A second, deeper structural flaw: the story declares per-scanner Pydantic Finding models (SemgrepFinding, AstGrepFinding, RipgrepFinding with different fields) as the ScannerRan.findings payload. But ScannerRan.findings: list[Finding] is typed (S5-01, src/codegenie/probes/_shared/scanner_outcome.py:82-88) against the shared Finding(kind, id, severity, metadata: dict[str, JSONValue]). The S5-04 precedent (sbom.py:244-254) shows how to honor both: ScannerRan(findings=[]) is fine (empty list passes the contract); the scanner-specific rich payload lives in slice-level fields (packages_by_source, package_count, etc.) — not in ScannerRan.findings. The story must adopt this two-level shape, or amend ADR-0006 to widen ScannerRan.findings. The first is by-far cheaper.

A third drift: AC-13 invokes a phantom OutputTooLarge exception and a ScannerFailed(reason="output_too_large") value that does not exist in the ScannerFailed.reason: Literal["invalid_json", "sbom_artifact_missing"] | None closed set. run_external_cli does not raise on cap-exceeded — it returns a ProcessResult with tail-truncated bytes (src/codegenie/exec/__init__.py:579-598). The test must be reframed: "stdout truncated past cap → parse-then-emit-finding still succeeds (tail-preservation guarantee) OR ScannerFailed(reason='invalid_json') if the tail starts mid-token."

Verdict: HARDENED. Twenty-six in-place edits applied — sixteen consistency fixes mirroring S6-05's pattern, six new ACs covering coverage gaps (truncation, timeout, empty-findings, registry-membership smoke, name-identity discipline, ABC-required-attrs), three test-quality hardens (AST audit instead of source-grep, property-based classifier totality, mutation-resistance smoke), and three design-pattern hardens (pure-total classifier per scanner, _PROBE_ID: Final[ProbeId] dual-form identity, slice-level rich findings instead of breaking the ScannerRan.findings contract). No NEEDS RESEARCH findings — every fix has direct in-repo precedent (S5-04 sbom/cve, S6-05 layer-e).

Context Brief (Stage 1)

Story snapshot

  • Goal: Ship src/codegenie/probes/layer_g/{semgrep,ast_grep,ripgrep_curated}.py. Each ≤ 200 LOC, each @register_probe(heaviness="medium"), each follows the five-step inline shape (tool-cache check → run_external_cli → Pydantic parse → ScannerOutcomeProbeOutput), each imports zero code from sibling scanners. The fourth scanner (gitleaks.py) is S6-07.
  • Non-goals: GitleaksProbe (S6-07), TestCoverageMappingProbe (S6-08), custom rule pack authoring, cross-scanner correlation, SecretRedactor invocation (writer-chokepoint job).
  • Effort: M.
  • Depends on: S1-07 (run_external_cli), S3-03 (writer + sanitizer composition), S5-01 (ScannerOutcome sum type), S1-08 (@register_probe(heaviness=...)).

Phase / arch constraints touched

  • Phase 0 ADR-0007Probe ABC frozen. Eight class attributes (name, layer, tier, applies_to_tasks, applies_to_languages, requires, declared_inputs, timeout_seconds, optional cache_strategy). One async method run(repo, ctx). Six-field ProbeOutput.
  • 02-ADR-0001semgrep, ast-grep, ripgrep in ALLOWED_BINARIES. Layer C uses run_allowlisted directly; Layer B/G goes through run_external_cli.
  • 02-ADR-0003 Option D@register_probe accepts only heaviness + runs_last. requires is a class attribute. Coordinator does NOT topologically sort.
  • 02-ADR-0005 — no plaintext persistence; secret findings flow through SecretRedactor at the writer chokepoint. Probe captures honestly.
  • 02-ADR-0006ScannerOutcome variant set is closed; extension is ADR-amendment-gated. ScannerFailed.reason is Literal["invalid_json", "sbom_artifact_missing"] | None. ScannerSkipped.reason is Literal["tool_missing", "tool_unhealthy", "upstream_unavailable"].
  • 02-ADR-0010RedactedSlice smart constructor at writer boundary.
  • phase-arch-design.md §"Design patterns applied" row 7 — no shared ScannerRunner abstraction; the duplication is the discipline.
  • CLAUDE.md "Extension by addition" — adding a 4th scanner (gitleaks) is a new file, not edits to the three landed here.
  • CLAUDE.md "Honest confidence"confidence="high" when the scanner ran and parsed; confidence="low" when skipped or failed.

Findings by critic

Coverage critic

ID Severity Finding Proposed fix
K1 harden No AC for timeout → typed outcome. run_external_cli raises ProbeTimeoutError on timeout_s exceeded; the story doesn't say what happens. S5-04 precedent: _ProcessExited(exit_code=124, stdout=b"", stderr_tail="<probe>.timeout")ScannerFailed(exit_code=124, ...). New AC-T1: timeout → ScannerFailed(exit_code=124, stderr_tail="<probe>.timeout"). New test per probe.
K2 harden No AC for empty findings success (semgrep exit 0 with results: []). Mutation: an implementation that returned ScannerSkipped(...) on empty findings would pass every other AC. New AC-E1: exit 0 + valid JSON + empty resultsScannerRan(findings=[]) with confidence="high".
K3 harden No AC for registry-membership smoke (mirror S6-05 AC-NEW-2 / S5-04 K2). The @register_probe(heaviness="medium") decorator effect is unverified — a future maintainer who drops the decorator silently loses dispatch. New AC-R1: _PROBE_REGISTRY["semgrep"] / ["ast_grep"] / ["ripgrep_curated"] entries exist with heaviness="medium" and runs_last=False.
K4 harden No AC for the name class-attribute identity (mirror S6-05 AC-NEW-1 / S6-04 dual-form discipline). The story uses probe_id = ProbeId("semgrep") which is not the ABC attribute (ABC has name: str). Per S5-04 sbom/cve precedent, the right shape is _PROBE_ID: Final[ProbeId] = ProbeId("semgrep") module constant + name: str = "semgrep" class attribute — the _PROBE_ID is what feeds run_external_cli(probe_name=...); the name is what the kernel introspects. New AC-N1 per probe: module-level _PROBE_ID: Final[ProbeId] + class-level name: str = "semgrep" matching the filename.
K5 harden No AC pinning the eight ABC class attributes (layer="G", tier="base", requires=[], declared_inputs=[<globs>], cache_strategy default). S5-04 precedent: cve.py:177-185 pins every one. Without these the registry can still load but mypy --strict will let layer = "F" slip. New AC-B1: every probe pins layer="G", tier="base", requires: list[str] = [], declared_inputs: list[str] = [<per-scanner globs>].
K6 harden AC-13 invokes a phantom OutputTooLarge exception and ScannerFailed(reason="output_too_large"). Neither exists — run_external_cli tail-truncates and returns a ProcessResult (src/codegenie/exec/__init__.py:579-598); ScannerFailed.reason is closed to {"invalid_json", "sbom_artifact_missing", None} (scanner_outcome.py:120). Rewrote AC-13: stdout-cap-exceeded path is exercised by parametrized test that mocks run_external_cli to return a ProcessResult with truncated bytes; assert (a) the probe doesn't raise, (b) if the tail still parses → ScannerRan(...), (c) if the tail starts mid-token → ScannerFailed(reason="invalid_json").
K7 harden AC-14 (bubblewrap no-op on macOS) tests sys.platform = "darwin" — but the platform check is inside run_external_cli, not the probe. The probe sees no difference between Linux+bwrap and macOS+no-bwrap. The correct AC is "probe is platform-independent": same monkeypatched run_external_cli stub → same probe outcome on either platform. Rewrote AC-14: the probe contains zero sys.platform / platform.system() / shutil.which("bwrap") references (AST audit). The platform-handling is the wrapper's job.
K8 harden AC-18 (sub-schema round-trip) defers to S6-08 but the story doesn't say what to assert in S6-06's TDD lane. S6-05 precedent: pin Pydantic model_config["extra"] == "forbid" on the slice model in this story; defer JSON-Schema round-trip to S6-08. Soft AC-18a: each slice model has model_config = ConfigDict(frozen=True, extra="forbid"). JSON-Schema round-trip deferred to S6-08.
K9 harden The per-scanner Pydantic Finding shape (SemgrepFinding/AstGrepFinding/RipgrepFinding) collides with ScannerRan.findings: list[Finding] where Finding is the shared S5-01 model. AC-9 needs reframing OR ADR-0006 amendment (the latter is expensive — closed-set discipline). Adopt the S5-04 precedent: ScannerRan(findings=[]) (empty); the rich shape lives in slice-level fields. Rewrote AC-9 to use findings_detail: list[SemgrepFinding] etc. on the slice, with ScannerRan.findings left empty (the typed sum signals "scanner ran"; the rich payload is slice-side).
K10 nit AC-7's ripgrep argv embeds the literal pattern "exec\\(" as a string in the argv list — but --type-not lock argument order matters (rg parses positional flags). Also --max-count is per-file, NOT per-pattern (rg's documented semantics). Tightened AC-7: argv order pinned with mutation note; replaced "per-pattern" with "per-file" in description.

Test-Quality critic

ID Severity Finding Proposed fix
T1 block Every TDD test imports from codegenie.probes.layer_g import semgrep as sg and calls sg.SemgrepProbe()._run(ctx). The ABC's only method is async def run(self, repo, ctx)_run does not exist. Tests will fail at AttributeError, not catch any business logic. Rewrote every test to use await SemgrepProbe().run(repo, ctx) with helper builders _make_repo() / _make_ctx() mirroring tests/unit/probes/layer_c/test_sbom.py conftest.
T2 block Tests use pytest-subprocess's fp.register([...]) — but the repo's actual mocking precedent (10+ call sites) is monkeypatch.setattr(mod, "run_external_cli", _spy) returning an awaitable ProcessResult. pytest-subprocess is not in pyproject.toml § [project.optional-dependencies] dev. Rewrote every test to monkeypatch run_external_cli directly on the scanner module. The argv-introspection assertion is now a captured-args spy (captured["argv"]) inside the stub.
T3 harden No property-based test on the per-scanner classifier. Each scanner has a pure classifier (_classify_semgrep_outcome, etc.) that maps (ProcessResult \| _ToolMissing \| _ProcessTimedOut)ScannerOutcome. Totality (every input → exactly one variant, no raises) is the load-bearing property. S5-04 T3 precedent. New parametrized Hypothesis property test per scanner: for any drawn ScannerAttempt, classifier returns exactly one ScannerOutcome and never raises.
T4 harden AC-16's "no direct subprocess" test uses source-grep — defeatable by string concatenation or re-export. S5-04 T4 precedent: AST audit. Replaced with AST-walk: parse the module source, walk for ast.Call whose .func resolves to subprocess.run / subprocess.Popen / asyncio.create_subprocess_exec / asyncio.create_subprocess_shell / os.system / os.popen. Also assert run_allowlisted is not imported in any layer_g module (Layer G uses the run_external_cli wrapper; Layer C uses run_allowlisted directly).
T5 harden No mutation-resistance smoke (mirror S5-04 T2 / S6-05 mutation table). A scanner that always returned ScannerRan(findings=[]) regardless of exit code would pass most happy-path tests. New Test-MR: parametrized table of 6+ wrong stubs per scanner (always-ran, always-skipped, always-failed, swallow-ValidationError, ignore-exit-1, raise-past-boundary); assert each fails at least one named test.
T6 harden The test_semgrep_exit_code_1_is_findings_not_failure test pins only that outcome is ScannerRan. A trivially-wrong implementation that returned ScannerRan(findings=[]) regardless of stdout would pass. Mutation hint missing. Strengthened to assert len(slice_.findings_detail) == 1 AND slice_.findings_detail[0].check_id == "p/nodejs.eval-detected". Mutation note added.
T7 harden No test asserts run_external_cli is awaited with the correct probe_name= argument (the ProbeId value). A copy-paste between scanners could leave semgrep.py calling run_external_cli(ProbeId("ast_grep"), ...). New test per scanner: spy captures the probe_name positional arg; assert equality to the scanner's _PROBE_ID.

Consistency critic

ID Severity Finding Proposed fix
C1 block Every probe-class skeleton uses probe_id = ProbeId("...") as a class attribute. The frozen ABC (base.py:75) declares name: str. The story violates Phase 0 ADR-0007's "byte-for-byte localv2 §4" snapshot test (tests/unit/test_probe_contract.py). Rewrote class skeleton: name: str = "<scanner>" class attribute + module-level _PROBE_ID: Final[ProbeId] = ProbeId("<scanner>") constant (S5-04 / S6-05 dual-form precedent). The _PROBE_ID is the value passed to run_external_cli(probe_name=..., ...).
C2 block from codegenie.ids import ProbeId. The actual module is codegenie.types.identifiers (src/codegenie/types/identifiers.py; cve.py:81 + sbom.py:78 import precedent). Corrected to from codegenie.types.identifiers import ProbeId.
C3 block from codegenie.exec import ProcessResult, ToolMissingError, run_external_cli. ToolMissingError actually lives in codegenie.errors; exec re-exports it but the sibling-precedent (sbom.py:59-60) imports ToolMissingError (and ProbeTimeoutError) from codegenie.errors, not from codegenie.exec. Corrected the import lines: from codegenie.errors import ProbeTimeoutError, ToolMissingError; from codegenie.exec import ProcessResult, run_external_cli.
C4 block The Green skeleton's _run(self, ctx) returns ProbeOutput(probe_id=..., confidence=..., schema_slice=..., errors=[]) — four kwargs. The actual ProbeOutput dataclass is (schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors). There is no probe_id field; missing raw_artifacts, duration_ms, warnings. Rewrote the skeleton's _envelope helper to construct ProbeOutput(schema_slice={"semgrep": slice_dict}, raw_artifacts=raw_artifacts, confidence=..., duration_ms=int((time.perf_counter() - t0) * 1000), warnings=[], errors=[]) matching sbom.py:289-297.
C5 block _run is the wrong method name. The ABC's frozen method is async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput. Story's signature drops repo entirely, but the scanner needs repo.root as cwd to run_external_cli. Rewrote signature to async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput everywhere; cwd=repo.root in the run_external_cli call.
C6 block run_external_cli(binary, argv, timeout_seconds=...) — wrong call site. Actual signature is await run_external_cli(probe_name: ProbeId, argv: list[str], *, cwd: Path, timeout_s: float, allowlisted_egress=frozenset(), max_stdout_bytes=64 * 1024 * 1024). argv[0] is the binary; the first positional is probe_name; the kwarg is timeout_s (not _seconds); cwd is required. Rewrote AC-5/6/7 and the skeleton's call sites: await run_external_cli(_PROBE_ID, ["semgrep", "--config", ..., str(repo.root)], cwd=repo.root, timeout_s=60.0).
C7 block applies_to_tasks: tuple[str, ...] = ("*",). ABC's frozen type is list[str] (mirror S6-05 finding). Tuple vs list collide on registry introspection. Corrected to applies_to_tasks: list[str] = ["*"] and applies_to_languages: list[str] = ["*"] on every probe.
C8 block Tests use ProbeContext.for_test(repo_root=...). No such constructor exists; ProbeContext is a stdlib @dataclass with required fields (cache_dir, output_dir, workspace, logger, config). Same drift S6-05 / S6-04 had to fix. Replaced with _make_repo() + _make_ctx() fixtures (mirror tests/unit/probes/layer_c/test_sbom.py conftest pattern); construct RepoSnapshot(root=..., git_commit=None, detected_languages={}, config={}) and ProbeContext(cache_dir=tmp_path / "cache", output_dir=tmp_path / "out", workspace=tmp_path / "ws", logger=logging.getLogger("test"), config={}).
C9 harden _PROBE_REGISTRY["semgrep"].heaviness — registry-entry shape unverified. Per S5-03/S5-04 validation reports, the registry entry stores only heaviness + runs_last. Asserted both fields and asserted no requires key (defensive mutation note that @register_probe(requires=...) was a kwarg-phantom in earlier drafts).
C10 harden ScannerRan(findings=[SemgrepFinding(...)]) — type-fails because ScannerRan.findings: list[Finding] (the shared S5-01 model), not list[SemgrepFinding]. Per K9 fix: keep ScannerRan(findings=[]); put rich findings on the slice as findings_detail: list[SemgrepFinding].
C11 harden The _config_path helper reads ctx.config.get("semgrep_config", "p/nodejs") but ctx.config: dict[str, Any] semantics aren't pinned (mirror S6-05's config discipline). Pinned: keyed lookup with explicit default; type cast to Path only when the value is a filesystem path, else passed as a registry slug string to semgrep's --config.
C12 harden Schema path src/codegenie/schema/probes/layer_g/{semgrep,ast_grep,ripgrep_curated}.schema.json. Existing convention (src/codegenie/schema/probes/layer_c/) supports the nesting — confirmed precedent. Surfaced — convention holds; AC-18a defers JSON-Schema round-trip to S6-08.

Design-Patterns critic

ID Severity Finding Proposed fix
D1 harden The story's Green skeleton runs the five steps inline in _run. The S5-04 precedent (sbom.py:100-112) introduces a pure, total classifier (_classify_syft_outcome(attempt: ScannerAttempt) -> ScannerOutcome) on a per-scanner tagged-union input. This gives (a) match-exhaustive assert_never discipline, (b) a property-based totality test (mutation-resistance), (c) functional core / imperative shell separation. The story misses this opportunity. New Refactor step: each scanner declares a private tagged union <Scanner>Attempt = _ToolMissing | _ProcessTimedOut | _ProcessExited(...) and a pure classifier _classify_<scanner>_outcome(attempt) -> ScannerOutcome. The run() method's I/O is reduced to: build attempt → classify → wrap. Mirrors sbom.py line-for-line.
D2 harden The rule-of-three trigger is mentioned in the design table but the story does not pin the threshold. S5-04 D2 hardening surfaced that "extract a shared kernel at the 4th scanner" is the next trigger; here it's the 3rd + 4th scanner that materially duplicate. New Notes-for-implementer paragraph: when S6-07 (gitleaks) lands, three of the four scanners will share (a) the _ProcessExited tagged-union element + _stderr_tail helper, (b) the _envelope helper, (c) the _PROBE_ID + _TIMEOUT Final constants. The trigger for extraction to codegenie.probes._shared.scanner_common (a NEW module — not edits to existing scanners) is "third scanner ships and three concrete behaviors duplicate verbatim across files." The semgrep exit-1 carve-out and ripgrep's NDJSON parser stay per-scanner (genuinely different).
D3 harden _CURATED_PATTERNS: Final[tuple[str, ...]] is correct but missing the Final annotation in the skeleton (only mentioned in Notes). Module constants without Final lose mypy's rebinding backstop. Pin Final[...] annotation on every module constant (_PROBE_ID, _TIMEOUT_S, _CURATED_PATTERNS, _SLICE_FILENAME, _RAW_TOOL_FILENAME) in the skeleton + a one-line refactor step.
D4 harden Per-scanner _RAW_TOOL_FILENAME / _SLICE_FILENAME constants are missing — sibling-precedent (sbom.py:85-86) makes the writer's two-file split (typed slice file + raw tool bytes) explicit and grep-discoverable. Story implicitly serializes the slice but never writes the raw tool bytes. New AC-W1 per probe: writes <raw_dir>/<scanner>.json (typed slice via IndexName("<scanner>") for downstream sibling-slice readers) AND <raw_dir>/<scanner>-raw.json (the raw scanner stdout, retained for audit). Failure paths (ScannerFailed) only write the slice file, never the malformed raw file.
D5 harden Per-scanner rich Finding model on the slice (SemgrepFinding / AstGrepFinding / RipgrepFinding) plus the empty ScannerRan.findings: list[Finding] from S5-01 (K9 fix) — three independent Pydantic models with frozen=True, extra="forbid". Open/Closed at the file boundary: a future variant (e.g. semgrep --severity=info tier) is a sibling model under tagged-union, not a backward-compatible additive field. Pinned per-scanner Finding-model discipline in AC-9 and the Refactor step. Phase-3+ widening goes through tagged-union extension.
D6 nit --metrics=off is one of three semgrep "phone-home" flags (the others are SEMGREP_USER_AGENT, SEMGREP_SEND_METRICS). A future maintainer could re-introduce telemetry via env. Notes-for-implementer paragraph: the env-strip in run_external_cli._filter_env is the structural defense; --metrics=off is belt-and-suspenders. Both must hold.

Research briefs (if any)

None — every fix has direct in-repo precedent:

  • S5-04 sbom/cve for the _PROBE_ID: Final[ProbeId] + name: str dual-form, the _classify_*_outcome pure-total classifier, the _envelope helper, the two-file write split, the monkeypatch.setattr(mod, "run_external_cli", _spy) mocking pattern, the AST audit for direct-subprocess violations.
  • S6-05 layer-e for the kernel-drift fix table (eight blocks: _run/run, probe_id/name, ProbeContext.for_test, _PROBE_REGISTRY, from codegenie.ids, ProbeOutput(probe_id=...), tuple vs list[str], ("*",) vs ["*"]).
  • S5-01 for the ScannerOutcome discriminated-union typing and the closed-set reason enums.
  • phase-arch-design.md row 7 for the "no shared ScannerRunner" stance.
  • 02-ADR-0006 §Consequences for the closed-set discipline that forbids invented reason values like "output_too_large".

The Hypothesis property test for classifier totality is the standard pattern — no arXiv lookup needed.

Conflict resolutions

  • Design-Patterns D1 (per-scanner pure classifier) vs Rule 2 (Simplicity First) vs the 200 LOC ceiling. D1 wins — the classifier is small (~10 lines), the tagged-union input is _ToolMissing | _ProcessTimedOut | _ProcessExited (3 elements per scanner), and the totality property test buys mutation-resistance not otherwise reachable. Verified: semgrep ~150 LOC + classifier fits under 200; ast-grep ~120 LOC + classifier fits; ripgrep ~150 LOC + classifier fits.
  • Design-Patterns D2 (rule-of-three trigger) vs phase-arch-design.md row 7 (no shared base). Row 7 wins for classes (no shared ScannerRunner base); D2 wins for helpers (the third copy of _stderr_tail / _envelope / _ProcessExited triggers extraction to a new codegenie.probes._shared.scanner_common module — additive, no edits to existing scanner files). The discipline is "no shared class/base", not "no shared module" — final-design table row 7 explicitly admits ScannerOutcome itself as a shared module under _shared/.
  • Coverage K9 (per-scanner Finding model) vs ADR-0006 (ScannerRan.findings: list[Finding] is closed). ADR-0006 wins — amending the closed set is expensive and would force every Phase-2 consumer to re-discriminate. The K9 fix moves the rich shape to the slice level (findings_detail: list[SemgrepFinding] on the slice), keeping ScannerRan(findings=[]). Sibling consumers see the closed sum; rich consumers (renderer, Phase-3 planner) read the slice.
  • Coverage K6 (output_too_large reason) vs ADR-0006 (closed reason enum). ADR-0006 wins — invented reason values violate the smart-constructor discipline. Truncation is observable but not a typed failure mode; it just shifts the invalid_json path's likelihood. Reframed accordingly.
  • Test-Quality T2 (mock at run_external_cli level) vs the story's stated rationale ("pytest-subprocess mocks at the subprocess layer, exercising the wrapper"). Repo precedent (10+ call sites) wins. The wrapper is exercised by its own test suite (tests/unit/exec/test_run_external_cli.py); each probe's tests stub run_external_cli directly and assert the contract surface (argv, probe_name, cwd, timeout_s, awaitable result). Documented in Notes-for-implementer #4.

Edits applied

Twenty-six in-place edits. Categories:

  • HeaderStatus: HARDENED (validated 2026-05-17); expanded Depends on: (added S5-01, S5-04 (precedent), S1-08); expanded ADRs honored: (clarified 02-ADR-0006 closed-set discipline).
  • Validation notes block inserted under header — records the verdict and the four critic categories.
  • ACs rewritten / added — 12 blocks → 19 ACs (six new: AC-T1 timeout, AC-E1 empty-findings, AC-R1 registry-membership, AC-N1 name identity, AC-B1 ABC-required-attrs, AC-W1 two-file write). Original AC-13 (output_too_large) rewrote to truncation-tolerance; AC-9 rewrote to put rich findings on the slice; AC-14 rewrote to pure platform-independence (AST audit); AC-3/4/5/6/7 corrected to list[str]/timeout_s/probe_name=/async/cwd=repo.root.
  • TDD plan rewritten — every test uses await SemgrepProbe().run(repo, ctx) with _make_repo() / _make_ctx(); every subprocess mock is monkeypatch.setattr(mod, "run_external_cli", _spy); argv assertions via captured-args spy; AST audit replaces source-grep for AC-16/AC-8; property-based classifier-totality test added per scanner; mutation-resistance smoke table added.
  • Green skeleton rewrittenname: str = "semgrep"; _PROBE_ID: Final[ProbeId] = ProbeId("semgrep"); full 8-attr ABC compliance; pure _classify_semgrep_outcome(attempt); _envelope helper matching sbom.py; async def run(self, repo, ctx) with await run_external_cli(_PROBE_ID, argv, cwd=repo.root, timeout_s=60.0); six-field ProbeOutput construction.
  • Refactor section rewritten — pure-classifier extraction step; Final[...] annotations step; two-file write step; rule-of-three trigger documented for S6-07 (gitleaks).
  • Notes-for-implementer extended — added the four new disciplines (slice-side rich findings, monkeypatch mocking, env-strip + --metrics=off belt-and-suspenders, helper-extraction rule-of-three trigger for S6-07 author).
  • Files-to-touch updated — added the two raw artifact paths per probe; tests/unit/probes/layer_g/conftest.py for _make_repo / _make_ctx helpers (mirror sbom/cve precedent).

Final verdict

HARDENED. The story is now ready for phase-story-executor. The discipline it tests (no shared ScannerRunner base) is preserved; the kernel-drift mis-statements that would have wasted at least one executor attempt are resolved; the per-scanner pure classifier + property-based totality test materially strengthen mutation-resistance; the slice-level rich Finding model honors the closed-set ScannerRan.findings contract; the architectural tests (AST audit, no-cross-scanner-imports, LOC ceiling) are the structural backstop the design row 7 promised.

Next story for the validator (when invoked): S6-07-gitleaks-and-secret-adversarial.md — same scanner shape, with the additional load-bearing concern of the writer-chokepoint test_secret_in_source.py adversarial test (which the design intentionally isolated from this story to keep the four scanners visibly separate).