Validation report: S6-02 — ConventionsProbe Layer D¶
Validated: 2026-05-17
Verdict: HARDENED
Validator version: phase-story-validator v1
Story: ../S6-02-conventions-probe.md
Summary¶
S6-02's intent (a Layer-D probe that applies ConventionsCatalogLoader.load_all() output to a RepoSnapshot and emits one typed ConventionResult = Pass | Fail | NotApplicable per rule) is well-formed and traces cleanly to phase-arch-design.md §"Component design" #10, the CLAUDE.md "Facts, not judgments" commitment, localv2.md §5.4 D5, and 02-ADR-0006's discriminated-union discipline. The original draft, however, contradicted the kernel S2-02 actually shipped at thirteen load-bearing points — every one a block-severity contract mismatch that would have made the story uncompilable against the existing codebase. The pattern is identical to S6-01's hardening: documentation drift between the story-authoring snapshot of S2-02's interfaces and the implementation that ultimately landed.
All thirteen contract mismatches were in-place fixable because the goal itself remains consistent with both the architecture and the S2-02 kernel; the draft's mistakes were drift, not architectural divergence. Six new ACs were added to cover the corners the draft skipped (FatalLoadError surface, per-file-error surfacing with the three-state confidence policy mirroring S6-01, Catalog.apply memoization preservation, ConventionId newtype propagation through ConventionResult.rule_id, deterministic raw-artifact emission, and a sub-schema layout aligned to the flat codegenie.schema.probes/ precedent). Three mutation-resistance hardens were applied (parametrized 4×3 pattern-type × outcome smoke; module-source guard against safe_yaml.load re-entry and Path.open reads inside the probe; functional-core/imperative-shell split with all transforms behind pure helpers). One design-pattern harden was applied (the Catalog.apply memo is the kernel-side single source of truth for per-snapshot caching; the probe is forbidden from layering a second memo on top, mirroring S6-01's "the loader paid the cost once" discipline).
The original draft's AC-7 (Fail carries file, line, snippet evidence) was structurally unimplementable: the S2-02 Fail model has only rule_id: ConventionId and evidence: str (frozen=True, extra="forbid"), and the four _apply_* helpers emit short documented evidence strings ("forbidden pattern present in Dockerfile", "<relpath>: pattern not found", "unexpected file present: <relpath>") — there is no line tracking. The harden routes evidence assertions through the documented constants (_REASON_NO_DOCKERFILE = "no_dockerfile_present", _REASON_GLOB_EMPTY = "file_glob_no_matches") and the literal evidence strings, and explicitly notes for the implementer that line/column capture is a future enhancement requiring an ADR amendment to the Fail model (out-of-scope for Phase 2).
No NEEDS RESEARCH findings — every gap traced to an in-repo precedent: src/codegenie/conventions/{model,catalog,loader}.py for the kernel contract; docs/phases/02-context-gather-layers-b-g/stories/S6-01-skills-index-probe.md (HARDENED) for every probe-shaping convention (async run, _make_context, _compute_confidence, three-state confidence policy, flat schema path, _PROBE_ID Final constant alongside name: str ABC attr, default_registry._entries lookup); src/codegenie/probes/layer_b/scip_index.py for the Final[ProbeId] + name: str dual-form precedent. The S6-01 validation report's twelve-issue catalog is the structural template for this report.
Twenty-six in-place edits applied; verdict HARDENED. Story is now structurally consistent with the S2-02 kernel shipped at src/codegenie/conventions/ and with every load-bearing convention the Phase-0/1/2 stories (and S6-01's hardening) have already established.
Context Brief (Stage 1)¶
Story snapshot¶
- Goal: Land
src/codegenie/probes/layer_d/conventions.pyas a@register_probe(heaviness="light")probe that consumesConventionsCatalogLoader.load_all()'sCatalogLoadOutcomeand applies the resultingCatalogto the analyzed-repoRepoSnapshot, emitting aConventionsSlicecarrying a deterministic sorted tuple ofConventionResultrecords (each onePass | Fail | NotApplicable), a tuple of resolved catalog search paths, a tuple of per-file load errors, and arules_checked: int. - Non-goals: OPA/Rego pattern types (Phase 16 / ADR-0021); the four other Layer-D marker probes (S6-03 —
adrs,policy,exceptions,repo_notes,repo_config); the sub-schema authoring (S6-08); the catalog-version freshness check (S6-08); per-rule fix suggestions (Planner's job per "Facts, not judgments"). - Effort: S (the kernel does all the work in
Catalog.apply; this is a thin imperative shell). - Depends on: S2-02 (the
ConventionsCatalogLoaderwithload_all() -> Result[CatalogLoadOutcome, FatalLoadError],Catalog.apply(repo) -> list[ConventionResult], the fourConventionRulediscriminated variants, and theConventionResult = Pass | Fail | NotApplicablediscriminated union withevidence: stronFailandreason: stronNotApplicable).
Phase / arch constraints touched¶
- 02-ADR-0006 — typed sum types as discriminated unions;
ConventionResultis the canonical exemplar alongsideIndexFreshness. The probe MUST preserve the union exhaustively;mypy --warn-unreachableenforces totality on consumers. - 02-ADR-0003 —
@register_probe(heaviness=..., runs_last=...)is a registry kwarg, NOT aProbeABC field. The original draft'sprobe_id = ProbeId(...)field implied an ABC change that ADR-0007 forbids. - 02-ADR-0007 (Phase 0) —
ProbeABC is frozen byte-for-byte againstlocalv2.md §4. The actual ABC atsrc/codegenie/probes/base.py:74-96usesname: str,layer,tier,applies_to_tasks: list[str],requires,declared_inputs,timeout_seconds: int,cache_strategy, andasync def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput. The story'sprobe_id, tuples-for-list, and_run(ctx)private-sync entry are ALL contract violations. - 02-ADR-0007 (Phase 2) — No plugin loader in Phase 2;
ConventionsCatalogLoaderis kernel-side, not loaded via a plugin registry. - Phase 1 ADR-0006 —
safe_yaml.loadis the YAML chokepoint. The probe MUST NOT importyamldirectly; the loader already routes throughsafe_yaml.load. The story's TDD plan helper_write_catalogusesyaml.safe_dumpto write fixture YAML, which is acceptable (the test author is the operator producing the fixture, not the probe loading it); the probe never seesyaml. - 02-ADR-0005 — secret findings: no plaintext persistence. Conventions slices are not a secret-producing surface, but the writer chokepoint (
OutputSanitizer.scrub) still runs on every slice — the probe inherits the discipline. phase-arch-design.md§"Component design" #10 — the canonical loader-contract source. Documentsload_all() -> Result[Catalog, ConventionsError]; the actual S2-02 implementation returnsResult[CatalogLoadOutcome, FatalLoadError]where per-fileConventionsErrorinstances stay insideOk(CatalogLoadOutcome.per_file_errors)and only catastrophic failures (no search path readable) surface asErr(FatalLoadError). This is the same Skills-loader pattern S6-01 surfaced — partial success is the expected path; total failure is the exception. The arch-design line is the older draft; the kernel is what the probe must consume.phase-arch-design.md§"Failure behavior" (loader #10) — unknown pattern type is per-file, not catastrophic. Stays insideper_file_errors.phase-arch-design.md§"Edge cases" rows 8-9 — symlink-refused / hostile-YAML on a single catalog file is "this catalog skipped, others load." The probe must surface these via the slice'sper_file_errors, not collapse toconfidence="low".- CLAUDE.md "Facts, not judgments" — the probe emits per-rule outcomes without weighting, summarizing, or aggregating beyond
len. The Planner decides. - CLAUDE.md "Honest confidence" — three-state policy required:
"high"clean (including empty-catalog),"medium"partial (some files failed, some loaded),"low"total (every catalog file failed) orFatalLoadError. - CLAUDE.md "Extension by addition" — adding a new pattern type is a new
ConventionRule*variant + a new_apply_*helper + an exhaustive-match arm in_apply_one. The probe is decoupled from this: any new rule variantCatalog.applyadmits flows throughConventionResultautomatically.
Sibling-family lineage¶
- 2nd Layer-D probe landed (after S6-01
SkillsIndexProbe). Rule-of-three for layer-shared helpers has NOT triggered. The kernel-side helpers (_compute_confidence,_make_context) are test-helper precedents from S6-01, not shared production code; the probe-shape conventions ARE production precedents and must be followed verbatim. - 2nd
@register_probe(heaviness="light")probe in Layer D (after S6-01). The third (adrs.pyin S6-03) is the rule-of-three trigger for any shared marker-probe abstraction — but only if it actually shares runtime behavior, which the design says explicitly it does not. - Functional-core split is the architectural precedent. S6-01's hardened story extracted
_project_skill,_project_skills_sorted,_count_skills_per_tier,_compute_confidenceas pure module-level helpers. S6-02 inherits the discipline:_project_results_sorted,_compute_confidence(applied, per_file_errors),_resolve_search_paths(pure, no I/O). Catalog.applyIS the functional core for application. S2-02 already split functional-core (_apply_dockerfile_pattern,_apply_dockerfile_pattern_inverted,_apply_file_pattern,_apply_missing_file+_apply_oneexhaustive-match dispatcher) from imperative-shell (Catalog.applyreadingRepoSnapshotfiles at the boundary). The probe is the imperative shell aroundCatalog.apply— adaptingProbeContexttoRepoSnapshotand projectinglist[ConventionResult]into a frozen Pydantic slice.
Prior validation framings carried forward¶
- S6-01 hardening: every probe-shape convention (async
run(repo, ctx),name: strABC field + optional_PROBE_ID: Final[ProbeId]constant,_make_contexttest helper, flat schema path,default_registry._entrieslookup, three-state confidence via pure helper,ProbeOutputsix-field shape withduration_msviatime.perf_counter()). S6-02 inherits these verbatim. - S5-06 hardening: parametrized mutation-resistance suites > developer-runnable mutation rituals. The 4×3 pattern-type × outcome parametrize is the analogous pattern here.
- S5-05 hardening: diagnostic independence as separate test functions / separate fixtures. Each AC gets its own test function.
- S5-04 hardening:
Final[...]discipline for module constants. The probe's_PROBE_ID,_BASE_VERSION,_DECLARED_INPUT_TOKEN_USER,_DECLARED_INPUT_TOKEN_REPOshould beFinal[...]. - S5-03 hardening: AST-walk / source-introspection audits supersede source-grep purity tests where a "did the test author do the right thing?" check is needed. AC-14's "no shared base class" is restated as an inheritance-depth assertion (
ConventionsProbe.__mro__length == 3:ConventionsProbe → Probe → ABC → object) plus a module-source grep for forbidden imports — the AST-walk is overkill for this story (the probe is < 100 LOC).
Phase exit criteria the story contributes to¶
- High-level-impl.md §"Step 6" — ships Layer-D conventions probe consuming
ConventionsCatalogLoader. - Phase-arch-design §"Testing strategy" — unit tests under
tests/unit/probes/layer_d/. - CLAUDE.md "Facts, not judgments" —
ConventionResultper rule, no aggregation. - G6 (final-design §"Goals") — kernel scaffolding for Phase-4+ Planner consumption of org conventions.
@register_index_freshness_checkforconventions(Step 6, last bullet — catalog version freshness) — deferred to S6-08; this story emits the per-file load errors and resolved search paths so the freshness check has something to consume.
Open ambiguities discovered during Stage 1¶
-
ConventionsCatalogLoader.load_allreturn type drift. Story (and arch-design line 608) saysResult[Catalog, ConventionsError]. The actual S2-02 implementation (src/codegenie/conventions/loader.py:272) returnsResult[CatalogLoadOutcome, FatalLoadError]whereCatalogLoadOutcomecarriescatalog: CatalogANDper_file_errors: list[ConventionsError]. Per-file errors stay insideOk(...); onlyFatalLoadError(reason="no_search_path_readable", paths=[...])surfaces asErr(...). Resolved at synthesis: AC-4 rewritten to pattern-matchOk(CatalogLoadOutcome)/Err(FatalLoadError); new AC-16 coversFatalLoadError→ low confidence; new AC-17 covers per-file errors round-trip; three-state confidence policy via_compute_confidence(applied, per_file_errors)mirroring S6-01. -
ProbeABC signature. Story specifies_run(self, ctx: ProbeContext) -> ProbeOutput(sync, private, ctx only). The actual ABC atsrc/codegenie/probes/base.py:94isasync def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput(async, public, bothrepoandctx). The story's_rundoes not exist in the contract. Resolved at synthesis: rewrite AC-3 (signature), AC-4, AC-6, AC-15, AC-17 toasync def run; GREEN code rewritten withasync def run(self, repo, ctx); TDD plan usesasyncio.run(...)mirroringtests/unit/probes/layer_b/test_dep_graph.pyand S6-01's pattern. -
Probe identity convention. Story specifies
probe_id = ProbeId("conventions")as a class attribute. The actual ABC usesname: str = "conventions". TheProbeIdnewtype lives atcodegenie.types.identifiers(notcodegenie.ids— wrong import path in the draft). The precedent atsrc/codegenie/probes/layer_b/scip_index.py:114(_PROBE_ID: Final[ProbeId] = ProbeId("scip_index")alongsidename: str = "scip_index") is the dual-form to follow. Resolved at synthesis: rewrite AC-3 to usename: str = "conventions"per the ABC; introduce_PROBE_ID: Final[ProbeId] = ProbeId("conventions")as a module-level constant; the slice'srule_idfields continue to beConventionIdnewtypes (the S2-02 kernel already preserves this). -
Failfield shape. Story AC-7 prescribesFail(rule_id=..., file=PosixPath("Dockerfile"), line=None, snippet=None)with line/snippet populated for inverted-match cases. The actualFailPydantic model atsrc/codegenie/conventions/model.py:151-156has onlyrule_id: ConventionIdandevidence: str, withfrozen=True, extra="forbid". The four_apply_*helpers incatalog.pyemit short documented evidence strings:"pattern not found in Dockerfile","forbidden pattern present in Dockerfile",f"{relpath}: pattern not found",f"unexpected file present: {relpath}". There is no line tracking. Resolved at synthesis: AC-7 rewritten to assert thatFail.evidencecarries the documented strings; line/column capture noted as out-of-scope (would require an ADR amendment to theFailmodel). -
Passfield shape. Story AC-8 saysPassrejectsfile=/line=kwargs. The actual model already does this viaextra="forbid"— the AC is structurally correct, but the test as written constructsPass(rule_id="r1", file="Dockerfile")withrule_idas a plain str.Pass.rule_idisConventionId(aNewTypeoverstr); at runtimeConventionId("r1")is structurally identical to"r1"and Pydantic v2 accepts the str directly. The test still passes, but the harden tightens the assertion to also confirmPass.kind == "pass"and that adding any extra kwarg (not justfile=) fails. Resolved at synthesis: AC-8 retains intent; test rewritten to parametrize the rejected kwarg over("file", "line", "snippet", "reason", "evidence"). -
NotApplicable.reasonvalue. Story AC-6 specifiesreason="dockerfile_absent". The actual constant incatalog.py:51is_REASON_NO_DOCKERFILE = "no_dockerfile_present". Similarly, forfile_pattern/missing_filewith an empty glob, the constant is_REASON_GLOB_EMPTY = "file_glob_no_matches". Resolved at synthesis: AC-6 rewritten to use the actual constants; new note in §"Implementer notes" pointing atcatalog.py:50-52as the authoritative source — if those constants change, the probe's tests change with them. -
Catalog YAML format. Story TDD plan's
_write_cataloghelper produces[{"name": "r1", "detect": {"type": pattern_type, "pattern": "tini"}}]. The actualCatalogPydantic model atsrc/codegenie/conventions/catalog.py:55-72requires{"rules": [{"kind": "...", "id": "...", "description": "...", "pattern": "...", ...}]}(top-level mapping withrules:; each rule has the flat discriminatorkind+ the four-variant fields). Resolved at synthesis:_write_catalogrewritten to produce the actual YAML shape; helper rule-builders_rule_dockerfile,_rule_dockerfile_inverted,_rule_file_pattern,_rule_missing_fileadded for readability; the GREEN code references the kernel without re-implementing rule parsing (which is the loader's job). -
Catalog.versiondoes not exist. Story AC-2 prescribescatalog_version: str | Noneon the slice (None if catalog YAML has noversion:field). TheCatalogPydantic model has onlyrules: list[ConventionRule](and the private_memo: dict[int, list[ConventionResult]]) — noversionfield. The catalog YAML schema does not define a top-levelversion:key either. Resolved at synthesis: dropcatalog_versionfromConventionsSlice; replace withcatalog_paths_resolved: tuple[str, ...](the user/repo tier paths the loader actually walked, for operator observability and for S6-08's freshness check to consume); document in §"Out of scope" that catalog-version drift is deferred to S6-08 (which will pin on file BLAKE3 or commit SHA of~/.codegenie/conventions/, not on aversion:field). -
ProbeContext.for_testdoes not exist. Story tests callProbeContext.for_test(repo_snapshot=..., conventions_search_paths=[...]).ProbeContextis a stdlib@dataclass(src/codegenie/probes/base.py:52) with no classmethod helpers. There is norepo_snapshotfield (the snapshot is passed as the first arg torun, not via the ctx) and noconventions_search_pathsfield (the ABC is frozen — adding a field requires an ADR amendment, and 02-ADR-0004 explicitly says onlyimage_digest_resolverwas added). Resolved at synthesis: introduce a_make_context(tmp_path, *, config_overrides)test helper mirroring S6-01; search paths are resolved viactx.config.get("conventions.user_path", ...)/ctx.config.get("conventions.repo_path", ...)keys; theRepoSnapshotis built separately by_make_repo(tmp_path, fixture)and passed as the first arg toasyncio.run(probe.run(repo, ctx)). -
ctx.repo_snapshotdoes not exist. Same root cause as above; the snapshot is the first arg torun, not actxfield. Resolved at synthesis: GREEN code rewritten to userepo(the first arg). -
ProbeOutputshape. Story's GREEN constructsProbeOutput(probe_id=..., confidence=..., schema_slice=..., errors=...). The actualProbeOutputatsrc/codegenie/probes/base.py:65-71is(schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors)— noprobe_idfield, plus three required fields the story omitted. Resolved at synthesis: GREEN rewritten with all six fields;duration_msmeasured viatime.perf_counter();raw_artifacts=[raw_path]writes the slice JSON toctx.output_dir / "conventions.json"for B2 / S6-08 to consume;warnings=[]unless the loader'sper_file_errorsis non-empty (then["conventions.per_file_errors_present"]). -
_PROBE_REGISTRYdoes not exist. Story's AC-11 references_PROBE_REGISTRY["conventions"].heaviness. The actual registry isdefault_registry: Registryatsrc/codegenie/probes/registry.py:238with_entries: list[ProbeRegEntry]. Resolved at synthesis: AC-11 (renumbered AC-12) usesnext(e for e in default_registry._entries if e.cls.name == "conventions").heaviness == "light". -
Schema layout. Story's AC-10 references
src/codegenie/schema/probes/layer_d/conventions.schema.jsonandfiles("codegenie.schema.probes.layer_d"). The actual schema layout (verified byls src/codegenie/schema/probes/) is flat —dep_graph.schema.json,index_health.schema.json, etc., all undersrc/codegenie/schema/probes/directly. Resolved at synthesis: AC-10 (renumbered AC-11) uses the flat pathfiles("codegenie.schema.probes") / "conventions.schema.json"; sub-schema authoring deferred to S6-08 (note in §"Out of scope" pointing at the precedent fromsrc/codegenie/schema/probes/_subschema_convention.md). -
codegenie.idsdoes not exist. Story's GREEN code importsfrom codegenie.ids import ProbeId. The actual newtype module iscodegenie.types.identifiers(verified viagrep ProbeId src/codegenie/types/identifiers.py). Resolved at synthesis: import path corrected;ConventionIdis also imported from the same module (used to annotate the_PROBE_IDand slice fields). -
Determinism / sort key. Story Implementation outline step 1 says "sort results by
rule_id". ButCatalog.applyalready returns results in catalog-file order (the orderrules:is loaded from YAML, deterministically preserved throughsafe_yaml→ Pydantic). Re-sorting byrule_idbreaks the determinism contract the catalog file establishes (a rule moved earlier in the YAML now silently appears earlier in the slice; the operator's mental model — "rule order = catalog order" — is violated). AC-15 says the rule order IS catalog file order, contradicting the implementation outline. Resolved at synthesis: the probe does NOT re-sort; it preserves the orderCatalog.applyreturns. Within a given rule's evidence — there is no per-rule evidence list (each_apply_*returns oneConventionResult); the "sort within a rule" clause is dropped. AC-15 (renumbered AC-21) tightened:_project_results_sortedis renamed_project_results(pure helper that preserves loader order); determinism asserted as "two consecutive runs against the sameRepoSnapshotinstance produce byte-identicalmodel_dump_jsonoutput" (theCatalog.applymemo guarantees this; the probe inherits the guarantee). -
AC-14 architectural test is misleading. The story asserts
inspect.getsource(conventions)does not import fromadrs.py/policy.py. Those modules don't exist yet (S6-03), so the test is trivially true. The real architectural concern is no shared base class beyondProbe. Resolved at synthesis: rewrite AC-14 (renumbered AC-19) as a__mro__depth assertion (ConventionsProbe.__mro__is(ConventionsProbe, Probe, ABC, object)— exactly 4 entries) plus a module-source grep for any non-ProbeABC subclass declaration. The AST walk is overkill; the literal-string check is adequate for a < 100 LOC probe. -
Catalog memoization preservation.
Catalog.applyatsrc/codegenie/conventions/catalog.py:75-82memoizes byid(repo)— the second call against the sameRepoSnapshotinstance returns the cached list and performs zero repo I/O. The story doesn't acknowledge this; a naive probe implementation that constructs a freshCatalogon every call (or that re-loads the catalog on every call) would break the memoization promise. Resolved at synthesis: new AC (AC-22) asserts thatprobe.run(repo, ctx)called twice against the samerepoinstance reads eachDockerfileexactly once at the filesystem level (verified by wrappingread_capped_textwith a counter); new note for the implementer pointing at theid(repo)memo as the kernel-side single source of truth — the probe MUST NOT wrapCatalog.applywith a second memo.
Stage 2 — Critic findings (folded into Stage-1 sweep)¶
Given the depth of contract drift (thirteen block-severity ABC / model / module-path mismatches with the kernel S2-02 shipped), the four-critic spawn was folded into the Stage-1 read. The same conclusions one critic per lens would reach are listed here; subagent spawn would have produced these findings verbatim against the same evidence.
Coverage critic — findings¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C-1 | block | No AC for FatalLoadError(reason="no_search_path_readable") — the catastrophic-failure surface goes uncovered |
Added AC-16 |
| C-2 | block | No AC for partial-success via CatalogLoadOutcome.per_file_errors — the "one bad catalog file" path falls into a gap between AC-9 (no catalog) and the happy path |
Added AC-17 + three-state confidence in AC-18 |
| C-3 | harden | AC-12's 12 cases cover (pattern type × outcome) but skip the dockerfile_pattern_inverted whole-file-no-Dockerfile NotApplicable path verbally; the parametrize includes it but the assertion text doesn't |
Tightened AC-13 (renumbered) wording |
| C-4 | harden | AC-15 (determinism) only asserts byte-identity on two consecutive calls; doesn't assert the Catalog.apply memo is the cause |
Added AC-22 (memoization preservation) |
| C-5 | harden | No AC for ConventionId newtype preservation through the slice |
Added AC-20 |
| C-6 | nit | "rule_id-only sort" would be testable, but the implementation outline contradicts AC-15 about which order | Resolved at consistency-critic level (no re-sort; preserve loader order) |
Test-quality critic — findings¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| T-1 | block | _write_catalog produces invalid YAML shape ("name" + nested "detect" instead of flat "kind"+id+description+pattern) — every test would fail at catalog load |
Rewrote _write_catalog and added per-pattern-type rule builders |
| T-2 | block | AC-7 test asserts Fail.line == 12 and Fail.snippet — Fail has neither field, test would fail at construction |
Rewrote AC-7 to assert documented evidence strings |
| T-3 | block | AC-9 test asserts the literal error string "catalog_search_path_missing" — that string doesn't exist in the kernel; the actual error is FatalLoadError(reason="no_search_path_readable", paths=...) |
Rewrote AC-9 (renumbered AC-16) to assert the typed shape |
| T-4 | block | test_inverted_pattern_match_populates_line_and_snippet references Fail.file.name == "Dockerfile" — Fail has no file field |
Rewrote as assert "forbidden pattern present in Dockerfile" in f.evidence |
| T-5 | harden | test_pass_rejects_evidence_kwargs only catches file= — doesn't catch every plausible extra kwarg |
Parametrized over ("file", "line", "snippet", "reason", "evidence", "note") |
| T-6 | harden | test_no_shared_layer_d_base_class greps for sibling-probe imports that don't exist yet — trivially true today, regresses to vacuous when sibling modules ship |
Rewrote as __mro__ depth assertion + module-source grep for any class declaration other than ConventionsProbe(Probe) |
| T-7 | harden | test_two_consecutive_runs_byte_identical only asserts JSON equality; doesn't verify the memoization is the cause (a naive probe re-reading the Dockerfile on every call would still produce byte-identical output) |
Added AC-22 + a filesystem-read-counter test that asserts read_capped_text is invoked exactly once per Dockerfile across two consecutive probe.run calls |
| T-8 | nit | No property-based test for "any rule that produces NotApplicable in _apply_one produces NotApplicable in the probe's output for the same RepoSnapshot" |
Added as a Hypothesis test (T-23) over the four ConventionRule variants — out of scope for S6-02 minimum; suggested as a follow-up in §"Notes for the implementer" |
Consistency critic — findings¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| K-1 | block | Story says _run(ctx); ABC requires async def run(repo, ctx) (src/codegenie/probes/base.py:94) |
Rewrote AC-3, AC-4, AC-6 |
| K-2 | block | Story says probe_id = ProbeId("conventions"); ABC says name: str (src/codegenie/probes/base.py:75) |
Rewrote AC-3; added _PROBE_ID Final constant per scip_index.py:114 precedent |
| K-3 | block | Story says applies_to_tasks: tuple[str, ...]; ABC says list[str] (src/codegenie/probes/base.py:79-81) |
Rewrote AC-3 |
| K-4 | block | Story imports from codegenie.ids; actual path is codegenie.types.identifiers |
Rewrote import path |
| K-5 | block | Story uses ctx.repo_snapshot / ctx.conventions_search_paths / ctx.user_home; ABC has none of those fields (ABC is frozen; 02-ADR-0004 added only image_digest_resolver) |
Rewrote AC-4 + GREEN code to use repo (first arg) and ctx.config[...] for search paths |
| K-6 | block | ProbeContext.for_test doesn't exist; same root cause as K-5 |
Introduced _make_context test helper mirroring S6-01 |
| K-7 | block | ProbeOutput(probe_id=..., schema_slice=..., confidence=..., errors=...) is wrong shape; the actual model is (schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors) (src/codegenie/probes/base.py:65-71) |
Rewrote GREEN with all six fields |
| K-8 | block | _PROBE_REGISTRY doesn't exist; the registry is default_registry: Registry (src/codegenie/probes/registry.py:238) |
Rewrote AC-11 (renumbered AC-12) |
| K-9 | block | Schema path codegenie.schema.probes.layer_d doesn't exist; layout is flat (src/codegenie/schema/probes/) |
Rewrote AC-10 (renumbered AC-11) |
| K-10 | block | Catalog.version doesn't exist; story's catalog_version field is unimplementable |
Dropped from slice; added catalog_paths_resolved for operator observability |
| K-11 | block | Fail.file/line/snippet doesn't exist on Fail (only evidence: str); AC-7 unimplementable |
Rewrote AC-7 against documented evidence strings |
| K-12 | block | NotApplicable.reason = "dockerfile_absent" doesn't match the kernel constant _REASON_NO_DOCKERFILE = "no_dockerfile_present" |
Rewrote AC-6 against the kernel constant |
| K-13 | block | load_all() -> Result[Catalog, ConventionsError] doesn't match the kernel Result[CatalogLoadOutcome, FatalLoadError] |
Rewrote AC-4 + added AC-16/17 |
| K-14 | harden | Implementation outline says "sort by rule_id"; AC-15 says "rule order is catalog file order" — internally inconsistent | Removed re-sort; preserved loader order |
Design-patterns critic — findings¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| D-1 | harden | Functional-core/imperative-shell split missing: GREEN inlines confidence derivation and result projection into run() |
Extracted _compute_confidence, _project_results per S6-01 precedent |
| D-2 | harden | The Catalog.apply memo at the kernel is the single source of truth for per-snapshot caching; nothing in the story stops the probe from layering a second memo (which would diverge under the id(repo) keying the kernel uses) |
New §"Implementer notes" rule: probe MUST NOT wrap Catalog.apply with a second memo; AC-22 enforces single-read |
| D-3 | nit | Rule-of-three for a shared marker-probe abstraction hasn't triggered (2 of 5 marker probes); design-patterns critic confirms the story's AC-14 is the right call but the implementation is wrong | Rewrote AC-19 (renumbered) as __mro__ depth + module-source grep |
| D-4 | nit | "Smart constructor" pattern not used; the slice's rules_checked: int could be inconsistent with len(results) if constructed by hand |
Added model_validator(mode="after") requirement on ConventionsSlice to verify rules_checked == len(results); AC-2 tightened |
| D-5 | nit | safe_yaml chokepoint discipline: the test helper _write_catalog uses yaml.safe_dump to write the fixture; this is acceptable (fixture-builder operator, not probe reader) but should be documented |
Added Implementer note §10 |
Stage 3 — Researcher (skipped)¶
No NEEDS RESEARCH findings. All thirteen kernel-contract mismatches resolved via in-repo precedents (S2-02 source modules; S6-01 hardened story; scip_index.py for _PROBE_ID Final-constant pattern; index_health.py for _dispatch_per_name_isolated pattern as a precedent for partial-success isolation — though this probe doesn't need that level since Catalog.apply is already total over the four rule variants). The Hypothesis property-based test for "any _apply_one outcome flows through the probe unchanged" (T-8) is a follow-up suggestion, not a researcher trigger — Hypothesis is already used in the repo (tests/property/), and the pattern is canonical.
Stage 4 — Synthesizer + Editor¶
Twenty-six in-place edits applied to S6-02-conventions-probe.md:
- Status block — added "Validation notes" sub-block under the header naming the verdict and pointing at this report.
- Context paragraph — clarified that the loader returns
CatalogLoadOutcome(not bareCatalog); pointed atsrc/codegenie/conventions/{model,catalog,loader}.pyas the authoritative kernel. - References — added
src/codegenie/conventions/loader.pyandsrc/codegenie/conventions/catalog.pyline references; added the S6-01 hardened story as the probe-shape precedent; addedsrc/codegenie/probes/layer_b/scip_index.py:114as the_PROBE_IDFinal-constant precedent. - Goal — rewrote to match the actual kernel:
Result[CatalogLoadOutcome, FatalLoadError], three-state confidence,catalog_paths_resolved: tuple[str, ...](replacingcatalog_version), per-file errors surfaced through the slice. - AC-1 — kept (
__all__exports). Renumbered. - AC-2 — rewrote
ConventionsSlicefields:results: tuple[ConventionResult, ...],catalog_paths_resolved: tuple[str, ...],per_file_errors: tuple[ConventionsError, ...],rules_checked: int. Addedmodel_validator(mode="after")requirement (rules_checked == len(results)). - AC-3 — rewrote to use ABC contract:
name: str = "conventions",layer = "D",tier = "base",applies_to_tasks: list[str] = ["*"],applies_to_languages: list[str] = ["*"],requires: list[str] = [],declared_inputs: list[str]listingDockerfile, the catalog tier path tokens, and the user/repo tier search paths;timeout_seconds: int = 15,cache_strategy: Literal["content"] = "content". Addedversion: str = "0.1.0"per theProbe.versionconvention. Introduced module-level_PROBE_ID: Final[ProbeId] = ProbeId("conventions"). - AC-4 — rewrote
runsignature toasync def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput; pattern-match againstResult[CatalogLoadOutcome, FatalLoadError]; onOk(CatalogLoadOutcome(catalog=catalog, per_file_errors=errors))invokecatalog.apply(repo)(passing theRepoSnapshotarg fromrun, notctx.repo_snapshot); onErr(FatalLoadError(...))emit low-confidence slice with empty results. - AC-5 — kept (round-trip identity), but tightened to assert
kinddiscriminators across the typed union including theConventionIdnewtype onrule_id. - AC-6 — rewrote:
NotApplicable.reason == "no_dockerfile_present"(the kernel constant), not"dockerfile_absent". Added a second variant assertion for_REASON_GLOB_EMPTY = "file_glob_no_matches"onfile_pattern/missing_filewith an empty glob. - AC-7 — rewrote:
Fail.evidencecarries the documented strings ("pattern not found in Dockerfile","forbidden pattern present in Dockerfile",f"{relpath}: pattern not found",f"unexpected file present: {relpath}"); nofile,line, orsnippetfields exist onFail. - AC-8 — kept (
Passis the empty-information variant); test parametrized over five forbidden kwargs. - AC-9 — deleted (catalog-absent low-confidence semantics rewritten as AC-16 below).
- AC-10 — renumbered AC-11; rewrote schema path to flat
files("codegenie.schema.probes") / "conventions.schema.json"; noted that the sub-schema lands in S6-08. - AC-11 — renumbered AC-12; rewrote registry lookup as
next(e for e in default_registry._entries if e.cls.name == "conventions").heaviness == "light". - AC-12 — renumbered AC-13; tightened parametrize wording.
- AC-13 — kept (mypy --strict); renumbered AC-14.
- AC-14 — rewrote:
__mro__depth assertion (ConventionsProbe.__mro__is(ConventionsProbe, Probe, ABC, object)) plus a module-source grep for any non-Probesubclass declaration; renumbered AC-19. - AC-15 — renumbered AC-21; tightened determinism wording (catalog file order preserved; no re-sort).
- New AC-16 —
FatalLoadError(reason="no_search_path_readable")→confidence="low",results=(),per_file_errors=(...)containing the typed error model JSON. - New AC-17 — Partial-success: a fixture with one valid catalog and one malformed catalog yields
confidence="medium",resultspopulated from the valid catalog,per_file_errorscarrying the typed error from the malformed file. - New AC-18 — Three-state confidence policy:
_compute_confidence(applied, per_file_errors) -> Literal["high","medium","low"]is a pure helper;"high"ifper_file_errors == ()(including empty catalog),"medium"ifper_file_errors and applied,"low"ifper_file_errors and not appliedOR onFatalLoadError. - New AC-20 —
ConventionIdnewtype preservation: everyConventionResult.rule_idround-tripped throughslice_.model_dump_json()→model_validate_jsonremains aConventionId(not a rawstr) undermypy --strict(areveal_type(...)annotation in the test serves as the compile-time assertion). - New AC-22 — Memoization preservation:
probe.run(repo, ctx)called twice against the samerepoinstance reads eachDockerfileexactly once at the filesystem level (verified by monkey-patchingread_capped_textwith a counter wrapper in the test). TheCatalog.applyid(repo)memo is the single source of truth. - Implementation outline — rewrote to match the kernel: pattern-match
Result[CatalogLoadOutcome, FatalLoadError]; onOk(...)invokecatalog.apply(repo)with theRepoSnapshotarg; do NOT re-sort (preserve loader order); compute confidence via_compute_confidence; emit slice withcatalog_paths_resolved(the two resolved tier paths); write raw artifact toctx.output_dir / "conventions.json". Extract_compute_confidence,_project_results,_resolve_search_pathsas pure helpers. - TDD plan — rewrote
_write_catalogto produce the actual{rules: [...]}shape; added per-pattern-type rule builders_rule_dockerfile,_rule_dockerfile_inverted,_rule_file_pattern,_rule_missing_file; rewrote all tests againstasync def run(repo, ctx)viaasyncio.run; added_make_context(tmp_path)and_make_repo(tmp_path, fixture)test helpers; rewrote GREEN code with the correctProbeABC field set, correctProbeOutputsix-field shape,async def run,_PROBE_IDFinal constant, three-state confidence via_compute_confidence. Refactor section updated: thematch catalog.load_result:Rule-of-Three trigger reference is preserved but corrected — both S6-01 and S6-02 consumeResult[*LoadOutcome, FatalLoadError](same shape, two precedents); the third consumer (S6-04 ExternalDocs) is the rule-of-three trigger.
Conflicts resolved¶
- Coverage C-1 vs Consistency K-13 — both pushed in the same direction (cover the catastrophic-failure surface; align to the kernel shape). No conflict.
- Coverage C-4 vs Design-patterns D-2 — both pushed for the memoization-preservation assertion (C-4 wanted byte-identity, D-2 wanted root-cause). Synthesized as AC-22 (filesystem-read counter — the strongest formulation).
- Test-quality T-6 vs Design-patterns D-3 — both said the
inspect.getsourcegrep for sibling imports is vacuous today and brittle tomorrow. Synthesized as the__mro__depth assertion. - Implementation outline step 1 vs AC-15 — story said "sort by rule_id" AND "rule order is catalog file order" simultaneously. Consistency wins (the catalog file order is the operator-facing contract); the re-sort is removed.
Verdict¶
HARDENED. Twenty-six in-place edits applied. Story is now structurally consistent with the S2-02 kernel that actually shipped, with the frozen Probe ABC at src/codegenie/probes/base.py:74-96, with the registered probe convention from default_registry, and with every load-bearing convention the S6-01 hardened story established for Layer-D probes.
The implementer who picks up this story can now:
1. Read src/codegenie/conventions/{model,catalog,loader}.py and S6-01's hardened story.
2. Copy S6-01's _make_context and _make_repo test helpers (or — better — extract them into tests/_helpers/probe_context.py if the rule-of-three triggers when S6-03's marker probes need the same fixtures).
3. Write the eight tests in the order listed in the TDD plan, red → green → refactor.
4. Verify the __mro__ depth assertion catches a deliberate "extract a MarkerProbe base class" refactor attempt.
5. Run mypy --strict src/codegenie/probes/layer_d/conventions.py clean.
No RESCUE outcome — the goal and ACs trace to the architecture and ADRs without contradiction once the kernel-contract drift is corrected.