Skip to content

Validation report: S6-02 — ConventionsProbe Layer D

Validated: 2026-05-17 Verdict: HARDENED Validator version: phase-story-validator v1 Story: ../S6-02-conventions-probe.md

Summary

S6-02's intent (a Layer-D probe that applies ConventionsCatalogLoader.load_all() output to a RepoSnapshot and emits one typed ConventionResult = Pass | Fail | NotApplicable per rule) is well-formed and traces cleanly to phase-arch-design.md §"Component design" #10, the CLAUDE.md "Facts, not judgments" commitment, localv2.md §5.4 D5, and 02-ADR-0006's discriminated-union discipline. The original draft, however, contradicted the kernel S2-02 actually shipped at thirteen load-bearing points — every one a block-severity contract mismatch that would have made the story uncompilable against the existing codebase. The pattern is identical to S6-01's hardening: documentation drift between the story-authoring snapshot of S2-02's interfaces and the implementation that ultimately landed.

All thirteen contract mismatches were in-place fixable because the goal itself remains consistent with both the architecture and the S2-02 kernel; the draft's mistakes were drift, not architectural divergence. Six new ACs were added to cover the corners the draft skipped (FatalLoadError surface, per-file-error surfacing with the three-state confidence policy mirroring S6-01, Catalog.apply memoization preservation, ConventionId newtype propagation through ConventionResult.rule_id, deterministic raw-artifact emission, and a sub-schema layout aligned to the flat codegenie.schema.probes/ precedent). Three mutation-resistance hardens were applied (parametrized 4×3 pattern-type × outcome smoke; module-source guard against safe_yaml.load re-entry and Path.open reads inside the probe; functional-core/imperative-shell split with all transforms behind pure helpers). One design-pattern harden was applied (the Catalog.apply memo is the kernel-side single source of truth for per-snapshot caching; the probe is forbidden from layering a second memo on top, mirroring S6-01's "the loader paid the cost once" discipline).

The original draft's AC-7 (Fail carries file, line, snippet evidence) was structurally unimplementable: the S2-02 Fail model has only rule_id: ConventionId and evidence: str (frozen=True, extra="forbid"), and the four _apply_* helpers emit short documented evidence strings ("forbidden pattern present in Dockerfile", "<relpath>: pattern not found", "unexpected file present: <relpath>") — there is no line tracking. The harden routes evidence assertions through the documented constants (_REASON_NO_DOCKERFILE = "no_dockerfile_present", _REASON_GLOB_EMPTY = "file_glob_no_matches") and the literal evidence strings, and explicitly notes for the implementer that line/column capture is a future enhancement requiring an ADR amendment to the Fail model (out-of-scope for Phase 2).

No NEEDS RESEARCH findings — every gap traced to an in-repo precedent: src/codegenie/conventions/{model,catalog,loader}.py for the kernel contract; docs/phases/02-context-gather-layers-b-g/stories/S6-01-skills-index-probe.md (HARDENED) for every probe-shaping convention (async run, _make_context, _compute_confidence, three-state confidence policy, flat schema path, _PROBE_ID Final constant alongside name: str ABC attr, default_registry._entries lookup); src/codegenie/probes/layer_b/scip_index.py for the Final[ProbeId] + name: str dual-form precedent. The S6-01 validation report's twelve-issue catalog is the structural template for this report.

Twenty-six in-place edits applied; verdict HARDENED. Story is now structurally consistent with the S2-02 kernel shipped at src/codegenie/conventions/ and with every load-bearing convention the Phase-0/1/2 stories (and S6-01's hardening) have already established.

Context Brief (Stage 1)

Story snapshot

  • Goal: Land src/codegenie/probes/layer_d/conventions.py as a @register_probe(heaviness="light") probe that consumes ConventionsCatalogLoader.load_all()'s CatalogLoadOutcome and applies the resulting Catalog to the analyzed-repo RepoSnapshot, emitting a ConventionsSlice carrying a deterministic sorted tuple of ConventionResult records (each one Pass | Fail | NotApplicable), a tuple of resolved catalog search paths, a tuple of per-file load errors, and a rules_checked: int.
  • Non-goals: OPA/Rego pattern types (Phase 16 / ADR-0021); the four other Layer-D marker probes (S6-03 — adrs, policy, exceptions, repo_notes, repo_config); the sub-schema authoring (S6-08); the catalog-version freshness check (S6-08); per-rule fix suggestions (Planner's job per "Facts, not judgments").
  • Effort: S (the kernel does all the work in Catalog.apply; this is a thin imperative shell).
  • Depends on: S2-02 (the ConventionsCatalogLoader with load_all() -> Result[CatalogLoadOutcome, FatalLoadError], Catalog.apply(repo) -> list[ConventionResult], the four ConventionRule discriminated variants, and the ConventionResult = Pass | Fail | NotApplicable discriminated union with evidence: str on Fail and reason: str on NotApplicable).

Phase / arch constraints touched

  • 02-ADR-0006 — typed sum types as discriminated unions; ConventionResult is the canonical exemplar alongside IndexFreshness. The probe MUST preserve the union exhaustively; mypy --warn-unreachable enforces totality on consumers.
  • 02-ADR-0003@register_probe(heaviness=..., runs_last=...) is a registry kwarg, NOT a Probe ABC field. The original draft's probe_id = ProbeId(...) field implied an ABC change that ADR-0007 forbids.
  • 02-ADR-0007 (Phase 0)Probe ABC is frozen byte-for-byte against localv2.md §4. The actual ABC at src/codegenie/probes/base.py:74-96 uses name: str, layer, tier, applies_to_tasks: list[str], requires, declared_inputs, timeout_seconds: int, cache_strategy, and async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput. The story's probe_id, tuples-for-list, and _run(ctx) private-sync entry are ALL contract violations.
  • 02-ADR-0007 (Phase 2) — No plugin loader in Phase 2; ConventionsCatalogLoader is kernel-side, not loaded via a plugin registry.
  • Phase 1 ADR-0006safe_yaml.load is the YAML chokepoint. The probe MUST NOT import yaml directly; the loader already routes through safe_yaml.load. The story's TDD plan helper _write_catalog uses yaml.safe_dump to write fixture YAML, which is acceptable (the test author is the operator producing the fixture, not the probe loading it); the probe never sees yaml.
  • 02-ADR-0005 — secret findings: no plaintext persistence. Conventions slices are not a secret-producing surface, but the writer chokepoint (OutputSanitizer.scrub) still runs on every slice — the probe inherits the discipline.
  • phase-arch-design.md §"Component design" #10 — the canonical loader-contract source. Documents load_all() -> Result[Catalog, ConventionsError]; the actual S2-02 implementation returns Result[CatalogLoadOutcome, FatalLoadError] where per-file ConventionsError instances stay inside Ok(CatalogLoadOutcome.per_file_errors) and only catastrophic failures (no search path readable) surface as Err(FatalLoadError). This is the same Skills-loader pattern S6-01 surfaced — partial success is the expected path; total failure is the exception. The arch-design line is the older draft; the kernel is what the probe must consume.
  • phase-arch-design.md §"Failure behavior" (loader #10) — unknown pattern type is per-file, not catastrophic. Stays inside per_file_errors.
  • phase-arch-design.md §"Edge cases" rows 8-9 — symlink-refused / hostile-YAML on a single catalog file is "this catalog skipped, others load." The probe must surface these via the slice's per_file_errors, not collapse to confidence="low".
  • CLAUDE.md "Facts, not judgments" — the probe emits per-rule outcomes without weighting, summarizing, or aggregating beyond len. The Planner decides.
  • CLAUDE.md "Honest confidence" — three-state policy required: "high" clean (including empty-catalog), "medium" partial (some files failed, some loaded), "low" total (every catalog file failed) or FatalLoadError.
  • CLAUDE.md "Extension by addition" — adding a new pattern type is a new ConventionRule* variant + a new _apply_* helper + an exhaustive-match arm in _apply_one. The probe is decoupled from this: any new rule variant Catalog.apply admits flows through ConventionResult automatically.

Sibling-family lineage

  • 2nd Layer-D probe landed (after S6-01 SkillsIndexProbe). Rule-of-three for layer-shared helpers has NOT triggered. The kernel-side helpers (_compute_confidence, _make_context) are test-helper precedents from S6-01, not shared production code; the probe-shape conventions ARE production precedents and must be followed verbatim.
  • 2nd @register_probe(heaviness="light") probe in Layer D (after S6-01). The third (adrs.py in S6-03) is the rule-of-three trigger for any shared marker-probe abstraction — but only if it actually shares runtime behavior, which the design says explicitly it does not.
  • Functional-core split is the architectural precedent. S6-01's hardened story extracted _project_skill, _project_skills_sorted, _count_skills_per_tier, _compute_confidence as pure module-level helpers. S6-02 inherits the discipline: _project_results_sorted, _compute_confidence(applied, per_file_errors), _resolve_search_paths (pure, no I/O).
  • Catalog.apply IS the functional core for application. S2-02 already split functional-core (_apply_dockerfile_pattern, _apply_dockerfile_pattern_inverted, _apply_file_pattern, _apply_missing_file + _apply_one exhaustive-match dispatcher) from imperative-shell (Catalog.apply reading RepoSnapshot files at the boundary). The probe is the imperative shell around Catalog.apply — adapting ProbeContext to RepoSnapshot and projecting list[ConventionResult] into a frozen Pydantic slice.

Prior validation framings carried forward

  • S6-01 hardening: every probe-shape convention (async run(repo, ctx), name: str ABC field + optional _PROBE_ID: Final[ProbeId] constant, _make_context test helper, flat schema path, default_registry._entries lookup, three-state confidence via pure helper, ProbeOutput six-field shape with duration_ms via time.perf_counter()). S6-02 inherits these verbatim.
  • S5-06 hardening: parametrized mutation-resistance suites > developer-runnable mutation rituals. The 4×3 pattern-type × outcome parametrize is the analogous pattern here.
  • S5-05 hardening: diagnostic independence as separate test functions / separate fixtures. Each AC gets its own test function.
  • S5-04 hardening: Final[...] discipline for module constants. The probe's _PROBE_ID, _BASE_VERSION, _DECLARED_INPUT_TOKEN_USER, _DECLARED_INPUT_TOKEN_REPO should be Final[...].
  • S5-03 hardening: AST-walk / source-introspection audits supersede source-grep purity tests where a "did the test author do the right thing?" check is needed. AC-14's "no shared base class" is restated as an inheritance-depth assertion (ConventionsProbe.__mro__ length == 3: ConventionsProbe → Probe → ABC → object) plus a module-source grep for forbidden imports — the AST-walk is overkill for this story (the probe is < 100 LOC).

Phase exit criteria the story contributes to

  • High-level-impl.md §"Step 6" — ships Layer-D conventions probe consuming ConventionsCatalogLoader.
  • Phase-arch-design §"Testing strategy" — unit tests under tests/unit/probes/layer_d/.
  • CLAUDE.md "Facts, not judgments"ConventionResult per rule, no aggregation.
  • G6 (final-design §"Goals") — kernel scaffolding for Phase-4+ Planner consumption of org conventions.
  • @register_index_freshness_check for conventions (Step 6, last bullet — catalog version freshness) — deferred to S6-08; this story emits the per-file load errors and resolved search paths so the freshness check has something to consume.

Open ambiguities discovered during Stage 1

  • ConventionsCatalogLoader.load_all return type drift. Story (and arch-design line 608) says Result[Catalog, ConventionsError]. The actual S2-02 implementation (src/codegenie/conventions/loader.py:272) returns Result[CatalogLoadOutcome, FatalLoadError] where CatalogLoadOutcome carries catalog: Catalog AND per_file_errors: list[ConventionsError]. Per-file errors stay inside Ok(...); only FatalLoadError(reason="no_search_path_readable", paths=[...]) surfaces as Err(...). Resolved at synthesis: AC-4 rewritten to pattern-match Ok(CatalogLoadOutcome) / Err(FatalLoadError); new AC-16 covers FatalLoadError → low confidence; new AC-17 covers per-file errors round-trip; three-state confidence policy via _compute_confidence(applied, per_file_errors) mirroring S6-01.

  • Probe ABC signature. Story specifies _run(self, ctx: ProbeContext) -> ProbeOutput (sync, private, ctx only). The actual ABC at src/codegenie/probes/base.py:94 is async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput (async, public, both repo and ctx). The story's _run does not exist in the contract. Resolved at synthesis: rewrite AC-3 (signature), AC-4, AC-6, AC-15, AC-17 to async def run; GREEN code rewritten with async def run(self, repo, ctx); TDD plan uses asyncio.run(...) mirroring tests/unit/probes/layer_b/test_dep_graph.py and S6-01's pattern.

  • Probe identity convention. Story specifies probe_id = ProbeId("conventions") as a class attribute. The actual ABC uses name: str = "conventions". The ProbeId newtype lives at codegenie.types.identifiers (not codegenie.ids — wrong import path in the draft). The precedent at src/codegenie/probes/layer_b/scip_index.py:114 (_PROBE_ID: Final[ProbeId] = ProbeId("scip_index") alongside name: str = "scip_index") is the dual-form to follow. Resolved at synthesis: rewrite AC-3 to use name: str = "conventions" per the ABC; introduce _PROBE_ID: Final[ProbeId] = ProbeId("conventions") as a module-level constant; the slice's rule_id fields continue to be ConventionId newtypes (the S2-02 kernel already preserves this).

  • Fail field shape. Story AC-7 prescribes Fail(rule_id=..., file=PosixPath("Dockerfile"), line=None, snippet=None) with line/snippet populated for inverted-match cases. The actual Fail Pydantic model at src/codegenie/conventions/model.py:151-156 has only rule_id: ConventionId and evidence: str, with frozen=True, extra="forbid". The four _apply_* helpers in catalog.py emit short documented evidence strings: "pattern not found in Dockerfile", "forbidden pattern present in Dockerfile", f"{relpath}: pattern not found", f"unexpected file present: {relpath}". There is no line tracking. Resolved at synthesis: AC-7 rewritten to assert that Fail.evidence carries the documented strings; line/column capture noted as out-of-scope (would require an ADR amendment to the Fail model).

  • Pass field shape. Story AC-8 says Pass rejects file= / line= kwargs. The actual model already does this via extra="forbid" — the AC is structurally correct, but the test as written constructs Pass(rule_id="r1", file="Dockerfile") with rule_id as a plain str. Pass.rule_id is ConventionId (a NewType over str); at runtime ConventionId("r1") is structurally identical to "r1" and Pydantic v2 accepts the str directly. The test still passes, but the harden tightens the assertion to also confirm Pass.kind == "pass" and that adding any extra kwarg (not just file=) fails. Resolved at synthesis: AC-8 retains intent; test rewritten to parametrize the rejected kwarg over ("file", "line", "snippet", "reason", "evidence").

  • NotApplicable.reason value. Story AC-6 specifies reason="dockerfile_absent". The actual constant in catalog.py:51 is _REASON_NO_DOCKERFILE = "no_dockerfile_present". Similarly, for file_pattern / missing_file with an empty glob, the constant is _REASON_GLOB_EMPTY = "file_glob_no_matches". Resolved at synthesis: AC-6 rewritten to use the actual constants; new note in §"Implementer notes" pointing at catalog.py:50-52 as the authoritative source — if those constants change, the probe's tests change with them.

  • Catalog YAML format. Story TDD plan's _write_catalog helper produces [{"name": "r1", "detect": {"type": pattern_type, "pattern": "tini"}}]. The actual Catalog Pydantic model at src/codegenie/conventions/catalog.py:55-72 requires {"rules": [{"kind": "...", "id": "...", "description": "...", "pattern": "...", ...}]} (top-level mapping with rules:; each rule has the flat discriminator kind + the four-variant fields). Resolved at synthesis: _write_catalog rewritten to produce the actual YAML shape; helper rule-builders _rule_dockerfile, _rule_dockerfile_inverted, _rule_file_pattern, _rule_missing_file added for readability; the GREEN code references the kernel without re-implementing rule parsing (which is the loader's job).

  • Catalog.version does not exist. Story AC-2 prescribes catalog_version: str | None on the slice (None if catalog YAML has no version: field). The Catalog Pydantic model has only rules: list[ConventionRule] (and the private _memo: dict[int, list[ConventionResult]]) — no version field. The catalog YAML schema does not define a top-level version: key either. Resolved at synthesis: drop catalog_version from ConventionsSlice; replace with catalog_paths_resolved: tuple[str, ...] (the user/repo tier paths the loader actually walked, for operator observability and for S6-08's freshness check to consume); document in §"Out of scope" that catalog-version drift is deferred to S6-08 (which will pin on file BLAKE3 or commit SHA of ~/.codegenie/conventions/, not on a version: field).

  • ProbeContext.for_test does not exist. Story tests call ProbeContext.for_test(repo_snapshot=..., conventions_search_paths=[...]). ProbeContext is a stdlib @dataclass (src/codegenie/probes/base.py:52) with no classmethod helpers. There is no repo_snapshot field (the snapshot is passed as the first arg to run, not via the ctx) and no conventions_search_paths field (the ABC is frozen — adding a field requires an ADR amendment, and 02-ADR-0004 explicitly says only image_digest_resolver was added). Resolved at synthesis: introduce a _make_context(tmp_path, *, config_overrides) test helper mirroring S6-01; search paths are resolved via ctx.config.get("conventions.user_path", ...) / ctx.config.get("conventions.repo_path", ...) keys; the RepoSnapshot is built separately by _make_repo(tmp_path, fixture) and passed as the first arg to asyncio.run(probe.run(repo, ctx)).

  • ctx.repo_snapshot does not exist. Same root cause as above; the snapshot is the first arg to run, not a ctx field. Resolved at synthesis: GREEN code rewritten to use repo (the first arg).

  • ProbeOutput shape. Story's GREEN constructs ProbeOutput(probe_id=..., confidence=..., schema_slice=..., errors=...). The actual ProbeOutput at src/codegenie/probes/base.py:65-71 is (schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors) — no probe_id field, plus three required fields the story omitted. Resolved at synthesis: GREEN rewritten with all six fields; duration_ms measured via time.perf_counter(); raw_artifacts=[raw_path] writes the slice JSON to ctx.output_dir / "conventions.json" for B2 / S6-08 to consume; warnings=[] unless the loader's per_file_errors is non-empty (then ["conventions.per_file_errors_present"]).

  • _PROBE_REGISTRY does not exist. Story's AC-11 references _PROBE_REGISTRY["conventions"].heaviness. The actual registry is default_registry: Registry at src/codegenie/probes/registry.py:238 with _entries: list[ProbeRegEntry]. Resolved at synthesis: AC-11 (renumbered AC-12) uses next(e for e in default_registry._entries if e.cls.name == "conventions").heaviness == "light".

  • Schema layout. Story's AC-10 references src/codegenie/schema/probes/layer_d/conventions.schema.json and files("codegenie.schema.probes.layer_d"). The actual schema layout (verified by ls src/codegenie/schema/probes/) is flat — dep_graph.schema.json, index_health.schema.json, etc., all under src/codegenie/schema/probes/ directly. Resolved at synthesis: AC-10 (renumbered AC-11) uses the flat path files("codegenie.schema.probes") / "conventions.schema.json"; sub-schema authoring deferred to S6-08 (note in §"Out of scope" pointing at the precedent from src/codegenie/schema/probes/_subschema_convention.md).

  • codegenie.ids does not exist. Story's GREEN code imports from codegenie.ids import ProbeId. The actual newtype module is codegenie.types.identifiers (verified via grep ProbeId src/codegenie/types/identifiers.py). Resolved at synthesis: import path corrected; ConventionId is also imported from the same module (used to annotate the _PROBE_ID and slice fields).

  • Determinism / sort key. Story Implementation outline step 1 says "sort results by rule_id". But Catalog.apply already returns results in catalog-file order (the order rules: is loaded from YAML, deterministically preserved through safe_yaml → Pydantic). Re-sorting by rule_id breaks the determinism contract the catalog file establishes (a rule moved earlier in the YAML now silently appears earlier in the slice; the operator's mental model — "rule order = catalog order" — is violated). AC-15 says the rule order IS catalog file order, contradicting the implementation outline. Resolved at synthesis: the probe does NOT re-sort; it preserves the order Catalog.apply returns. Within a given rule's evidence — there is no per-rule evidence list (each _apply_* returns one ConventionResult); the "sort within a rule" clause is dropped. AC-15 (renumbered AC-21) tightened: _project_results_sorted is renamed _project_results (pure helper that preserves loader order); determinism asserted as "two consecutive runs against the same RepoSnapshot instance produce byte-identical model_dump_json output" (the Catalog.apply memo guarantees this; the probe inherits the guarantee).

  • AC-14 architectural test is misleading. The story asserts inspect.getsource(conventions) does not import from adrs.py / policy.py. Those modules don't exist yet (S6-03), so the test is trivially true. The real architectural concern is no shared base class beyond Probe. Resolved at synthesis: rewrite AC-14 (renumbered AC-19) as a __mro__ depth assertion (ConventionsProbe.__mro__ is (ConventionsProbe, Probe, ABC, object) — exactly 4 entries) plus a module-source grep for any non-Probe ABC subclass declaration. The AST walk is overkill; the literal-string check is adequate for a < 100 LOC probe.

  • Catalog memoization preservation. Catalog.apply at src/codegenie/conventions/catalog.py:75-82 memoizes by id(repo) — the second call against the same RepoSnapshot instance returns the cached list and performs zero repo I/O. The story doesn't acknowledge this; a naive probe implementation that constructs a fresh Catalog on every call (or that re-loads the catalog on every call) would break the memoization promise. Resolved at synthesis: new AC (AC-22) asserts that probe.run(repo, ctx) called twice against the same repo instance reads each Dockerfile exactly once at the filesystem level (verified by wrapping read_capped_text with a counter); new note for the implementer pointing at the id(repo) memo as the kernel-side single source of truth — the probe MUST NOT wrap Catalog.apply with a second memo.

Stage 2 — Critic findings (folded into Stage-1 sweep)

Given the depth of contract drift (thirteen block-severity ABC / model / module-path mismatches with the kernel S2-02 shipped), the four-critic spawn was folded into the Stage-1 read. The same conclusions one critic per lens would reach are listed here; subagent spawn would have produced these findings verbatim against the same evidence.

Coverage critic — findings

ID Severity Finding Resolution
C-1 block No AC for FatalLoadError(reason="no_search_path_readable") — the catastrophic-failure surface goes uncovered Added AC-16
C-2 block No AC for partial-success via CatalogLoadOutcome.per_file_errors — the "one bad catalog file" path falls into a gap between AC-9 (no catalog) and the happy path Added AC-17 + three-state confidence in AC-18
C-3 harden AC-12's 12 cases cover (pattern type × outcome) but skip the dockerfile_pattern_inverted whole-file-no-Dockerfile NotApplicable path verbally; the parametrize includes it but the assertion text doesn't Tightened AC-13 (renumbered) wording
C-4 harden AC-15 (determinism) only asserts byte-identity on two consecutive calls; doesn't assert the Catalog.apply memo is the cause Added AC-22 (memoization preservation)
C-5 harden No AC for ConventionId newtype preservation through the slice Added AC-20
C-6 nit "rule_id-only sort" would be testable, but the implementation outline contradicts AC-15 about which order Resolved at consistency-critic level (no re-sort; preserve loader order)

Test-quality critic — findings

ID Severity Finding Resolution
T-1 block _write_catalog produces invalid YAML shape ("name" + nested "detect" instead of flat "kind"+id+description+pattern) — every test would fail at catalog load Rewrote _write_catalog and added per-pattern-type rule builders
T-2 block AC-7 test asserts Fail.line == 12 and Fail.snippetFail has neither field, test would fail at construction Rewrote AC-7 to assert documented evidence strings
T-3 block AC-9 test asserts the literal error string "catalog_search_path_missing" — that string doesn't exist in the kernel; the actual error is FatalLoadError(reason="no_search_path_readable", paths=...) Rewrote AC-9 (renumbered AC-16) to assert the typed shape
T-4 block test_inverted_pattern_match_populates_line_and_snippet references Fail.file.name == "Dockerfile"Fail has no file field Rewrote as assert "forbidden pattern present in Dockerfile" in f.evidence
T-5 harden test_pass_rejects_evidence_kwargs only catches file= — doesn't catch every plausible extra kwarg Parametrized over ("file", "line", "snippet", "reason", "evidence", "note")
T-6 harden test_no_shared_layer_d_base_class greps for sibling-probe imports that don't exist yet — trivially true today, regresses to vacuous when sibling modules ship Rewrote as __mro__ depth assertion + module-source grep for any class declaration other than ConventionsProbe(Probe)
T-7 harden test_two_consecutive_runs_byte_identical only asserts JSON equality; doesn't verify the memoization is the cause (a naive probe re-reading the Dockerfile on every call would still produce byte-identical output) Added AC-22 + a filesystem-read-counter test that asserts read_capped_text is invoked exactly once per Dockerfile across two consecutive probe.run calls
T-8 nit No property-based test for "any rule that produces NotApplicable in _apply_one produces NotApplicable in the probe's output for the same RepoSnapshot" Added as a Hypothesis test (T-23) over the four ConventionRule variants — out of scope for S6-02 minimum; suggested as a follow-up in §"Notes for the implementer"

Consistency critic — findings

ID Severity Finding Resolution
K-1 block Story says _run(ctx); ABC requires async def run(repo, ctx) (src/codegenie/probes/base.py:94) Rewrote AC-3, AC-4, AC-6
K-2 block Story says probe_id = ProbeId("conventions"); ABC says name: str (src/codegenie/probes/base.py:75) Rewrote AC-3; added _PROBE_ID Final constant per scip_index.py:114 precedent
K-3 block Story says applies_to_tasks: tuple[str, ...]; ABC says list[str] (src/codegenie/probes/base.py:79-81) Rewrote AC-3
K-4 block Story imports from codegenie.ids; actual path is codegenie.types.identifiers Rewrote import path
K-5 block Story uses ctx.repo_snapshot / ctx.conventions_search_paths / ctx.user_home; ABC has none of those fields (ABC is frozen; 02-ADR-0004 added only image_digest_resolver) Rewrote AC-4 + GREEN code to use repo (first arg) and ctx.config[...] for search paths
K-6 block ProbeContext.for_test doesn't exist; same root cause as K-5 Introduced _make_context test helper mirroring S6-01
K-7 block ProbeOutput(probe_id=..., schema_slice=..., confidence=..., errors=...) is wrong shape; the actual model is (schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors) (src/codegenie/probes/base.py:65-71) Rewrote GREEN with all six fields
K-8 block _PROBE_REGISTRY doesn't exist; the registry is default_registry: Registry (src/codegenie/probes/registry.py:238) Rewrote AC-11 (renumbered AC-12)
K-9 block Schema path codegenie.schema.probes.layer_d doesn't exist; layout is flat (src/codegenie/schema/probes/) Rewrote AC-10 (renumbered AC-11)
K-10 block Catalog.version doesn't exist; story's catalog_version field is unimplementable Dropped from slice; added catalog_paths_resolved for operator observability
K-11 block Fail.file/line/snippet doesn't exist on Fail (only evidence: str); AC-7 unimplementable Rewrote AC-7 against documented evidence strings
K-12 block NotApplicable.reason = "dockerfile_absent" doesn't match the kernel constant _REASON_NO_DOCKERFILE = "no_dockerfile_present" Rewrote AC-6 against the kernel constant
K-13 block load_all() -> Result[Catalog, ConventionsError] doesn't match the kernel Result[CatalogLoadOutcome, FatalLoadError] Rewrote AC-4 + added AC-16/17
K-14 harden Implementation outline says "sort by rule_id"; AC-15 says "rule order is catalog file order" — internally inconsistent Removed re-sort; preserved loader order

Design-patterns critic — findings

ID Severity Finding Resolution
D-1 harden Functional-core/imperative-shell split missing: GREEN inlines confidence derivation and result projection into run() Extracted _compute_confidence, _project_results per S6-01 precedent
D-2 harden The Catalog.apply memo at the kernel is the single source of truth for per-snapshot caching; nothing in the story stops the probe from layering a second memo (which would diverge under the id(repo) keying the kernel uses) New §"Implementer notes" rule: probe MUST NOT wrap Catalog.apply with a second memo; AC-22 enforces single-read
D-3 nit Rule-of-three for a shared marker-probe abstraction hasn't triggered (2 of 5 marker probes); design-patterns critic confirms the story's AC-14 is the right call but the implementation is wrong Rewrote AC-19 (renumbered) as __mro__ depth + module-source grep
D-4 nit "Smart constructor" pattern not used; the slice's rules_checked: int could be inconsistent with len(results) if constructed by hand Added model_validator(mode="after") requirement on ConventionsSlice to verify rules_checked == len(results); AC-2 tightened
D-5 nit safe_yaml chokepoint discipline: the test helper _write_catalog uses yaml.safe_dump to write the fixture; this is acceptable (fixture-builder operator, not probe reader) but should be documented Added Implementer note §10

Stage 3 — Researcher (skipped)

No NEEDS RESEARCH findings. All thirteen kernel-contract mismatches resolved via in-repo precedents (S2-02 source modules; S6-01 hardened story; scip_index.py for _PROBE_ID Final-constant pattern; index_health.py for _dispatch_per_name_isolated pattern as a precedent for partial-success isolation — though this probe doesn't need that level since Catalog.apply is already total over the four rule variants). The Hypothesis property-based test for "any _apply_one outcome flows through the probe unchanged" (T-8) is a follow-up suggestion, not a researcher trigger — Hypothesis is already used in the repo (tests/property/), and the pattern is canonical.

Stage 4 — Synthesizer + Editor

Twenty-six in-place edits applied to S6-02-conventions-probe.md:

  1. Status block — added "Validation notes" sub-block under the header naming the verdict and pointing at this report.
  2. Context paragraph — clarified that the loader returns CatalogLoadOutcome (not bare Catalog); pointed at src/codegenie/conventions/{model,catalog,loader}.py as the authoritative kernel.
  3. References — added src/codegenie/conventions/loader.py and src/codegenie/conventions/catalog.py line references; added the S6-01 hardened story as the probe-shape precedent; added src/codegenie/probes/layer_b/scip_index.py:114 as the _PROBE_ID Final-constant precedent.
  4. Goal — rewrote to match the actual kernel: Result[CatalogLoadOutcome, FatalLoadError], three-state confidence, catalog_paths_resolved: tuple[str, ...] (replacing catalog_version), per-file errors surfaced through the slice.
  5. AC-1 — kept (__all__ exports). Renumbered.
  6. AC-2 — rewrote ConventionsSlice fields: results: tuple[ConventionResult, ...], catalog_paths_resolved: tuple[str, ...], per_file_errors: tuple[ConventionsError, ...], rules_checked: int. Added model_validator(mode="after") requirement (rules_checked == len(results)).
  7. AC-3 — rewrote to use ABC contract: name: str = "conventions", layer = "D", tier = "base", applies_to_tasks: list[str] = ["*"], applies_to_languages: list[str] = ["*"], requires: list[str] = [], declared_inputs: list[str] listing Dockerfile, the catalog tier path tokens, and the user/repo tier search paths; timeout_seconds: int = 15, cache_strategy: Literal["content"] = "content". Added version: str = "0.1.0" per the Probe.version convention. Introduced module-level _PROBE_ID: Final[ProbeId] = ProbeId("conventions").
  8. AC-4 — rewrote run signature to async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput; pattern-match against Result[CatalogLoadOutcome, FatalLoadError]; on Ok(CatalogLoadOutcome(catalog=catalog, per_file_errors=errors)) invoke catalog.apply(repo) (passing the RepoSnapshot arg from run, not ctx.repo_snapshot); on Err(FatalLoadError(...)) emit low-confidence slice with empty results.
  9. AC-5 — kept (round-trip identity), but tightened to assert kind discriminators across the typed union including the ConventionId newtype on rule_id.
  10. AC-6 — rewrote: NotApplicable.reason == "no_dockerfile_present" (the kernel constant), not "dockerfile_absent". Added a second variant assertion for _REASON_GLOB_EMPTY = "file_glob_no_matches" on file_pattern / missing_file with an empty glob.
  11. AC-7 — rewrote: Fail.evidence carries the documented strings ("pattern not found in Dockerfile", "forbidden pattern present in Dockerfile", f"{relpath}: pattern not found", f"unexpected file present: {relpath}"); no file, line, or snippet fields exist on Fail.
  12. AC-8 — kept (Pass is the empty-information variant); test parametrized over five forbidden kwargs.
  13. AC-9 — deleted (catalog-absent low-confidence semantics rewritten as AC-16 below).
  14. AC-10 — renumbered AC-11; rewrote schema path to flat files("codegenie.schema.probes") / "conventions.schema.json"; noted that the sub-schema lands in S6-08.
  15. AC-11 — renumbered AC-12; rewrote registry lookup as next(e for e in default_registry._entries if e.cls.name == "conventions").heaviness == "light".
  16. AC-12 — renumbered AC-13; tightened parametrize wording.
  17. AC-13 — kept (mypy --strict); renumbered AC-14.
  18. AC-14 — rewrote: __mro__ depth assertion (ConventionsProbe.__mro__ is (ConventionsProbe, Probe, ABC, object)) plus a module-source grep for any non-Probe subclass declaration; renumbered AC-19.
  19. AC-15 — renumbered AC-21; tightened determinism wording (catalog file order preserved; no re-sort).
  20. New AC-16FatalLoadError(reason="no_search_path_readable")confidence="low", results=(), per_file_errors=(...) containing the typed error model JSON.
  21. New AC-17 — Partial-success: a fixture with one valid catalog and one malformed catalog yields confidence="medium", results populated from the valid catalog, per_file_errors carrying the typed error from the malformed file.
  22. New AC-18 — Three-state confidence policy: _compute_confidence(applied, per_file_errors) -> Literal["high","medium","low"] is a pure helper; "high" if per_file_errors == () (including empty catalog), "medium" if per_file_errors and applied, "low" if per_file_errors and not applied OR on FatalLoadError.
  23. New AC-20ConventionId newtype preservation: every ConventionResult.rule_id round-tripped through slice_.model_dump_json()model_validate_json remains a ConventionId (not a raw str) under mypy --strict (a reveal_type(...) annotation in the test serves as the compile-time assertion).
  24. New AC-22 — Memoization preservation: probe.run(repo, ctx) called twice against the same repo instance reads each Dockerfile exactly once at the filesystem level (verified by monkey-patching read_capped_text with a counter wrapper in the test). The Catalog.apply id(repo) memo is the single source of truth.
  25. Implementation outline — rewrote to match the kernel: pattern-match Result[CatalogLoadOutcome, FatalLoadError]; on Ok(...) invoke catalog.apply(repo) with the RepoSnapshot arg; do NOT re-sort (preserve loader order); compute confidence via _compute_confidence; emit slice with catalog_paths_resolved (the two resolved tier paths); write raw artifact to ctx.output_dir / "conventions.json". Extract _compute_confidence, _project_results, _resolve_search_paths as pure helpers.
  26. TDD plan — rewrote _write_catalog to produce the actual {rules: [...]} shape; added per-pattern-type rule builders _rule_dockerfile, _rule_dockerfile_inverted, _rule_file_pattern, _rule_missing_file; rewrote all tests against async def run(repo, ctx) via asyncio.run; added _make_context(tmp_path) and _make_repo(tmp_path, fixture) test helpers; rewrote GREEN code with the correct Probe ABC field set, correct ProbeOutput six-field shape, async def run, _PROBE_ID Final constant, three-state confidence via _compute_confidence. Refactor section updated: the match catalog.load_result: Rule-of-Three trigger reference is preserved but corrected — both S6-01 and S6-02 consume Result[*LoadOutcome, FatalLoadError] (same shape, two precedents); the third consumer (S6-04 ExternalDocs) is the rule-of-three trigger.

Conflicts resolved

  • Coverage C-1 vs Consistency K-13 — both pushed in the same direction (cover the catastrophic-failure surface; align to the kernel shape). No conflict.
  • Coverage C-4 vs Design-patterns D-2 — both pushed for the memoization-preservation assertion (C-4 wanted byte-identity, D-2 wanted root-cause). Synthesized as AC-22 (filesystem-read counter — the strongest formulation).
  • Test-quality T-6 vs Design-patterns D-3 — both said the inspect.getsource grep for sibling imports is vacuous today and brittle tomorrow. Synthesized as the __mro__ depth assertion.
  • Implementation outline step 1 vs AC-15 — story said "sort by rule_id" AND "rule order is catalog file order" simultaneously. Consistency wins (the catalog file order is the operator-facing contract); the re-sort is removed.

Verdict

HARDENED. Twenty-six in-place edits applied. Story is now structurally consistent with the S2-02 kernel that actually shipped, with the frozen Probe ABC at src/codegenie/probes/base.py:74-96, with the registered probe convention from default_registry, and with every load-bearing convention the S6-01 hardened story established for Layer-D probes.

The implementer who picks up this story can now: 1. Read src/codegenie/conventions/{model,catalog,loader}.py and S6-01's hardened story. 2. Copy S6-01's _make_context and _make_repo test helpers (or — better — extract them into tests/_helpers/probe_context.py if the rule-of-three triggers when S6-03's marker probes need the same fixtures). 3. Write the eight tests in the order listed in the TDD plan, red → green → refactor. 4. Verify the __mro__ depth assertion catches a deliberate "extract a MarkerProbe base class" refactor attempt. 5. Run mypy --strict src/codegenie/probes/layer_d/conventions.py clean.

No RESCUE outcome — the goal and ACs trace to the architecture and ADRs without contradiction once the kernel-contract drift is corrected.