S6-02 — `ConventionsProbe` Layer D — attempt log¶

Attempt 1 — GREEN (2026-05-17)¶

Result: GREEN. 36 tests pass (1 skipped for AC-11 sub-schema, deferred to S6-08). Full suite green (make lint typecheck lint-imports test → 2732 passed, 30 skipped, 2 xfailed). make fence green. mypy --strict clean on the new module.

What shipped¶

Path	Notes
`src/codegenie/probes/layer_d/conventions.py`	New file (~210 LOC). `ConventionsSlice` smart-constructor + `ConventionsProbe` `@register_probe(heaviness="light")`. Four module-level pure helpers: `_compute_confidence`, `_project_results`, `_atomic_write_text`, plus the probe's `_resolve_search_paths`. Module-level `_PROBE_ID: Final[ProbeId]` + `_BASE_VERSION` + `_RAW_ARTIFACT_NAME` + `_WARNING_PER_FILE_ERRORS` + the two default-path constants.
`src/codegenie/probes/__init__.py`	One additive import line (`conventions, # noqa: F401 — S6-02 registration`) + entry in `__all__`. No other edits.
`tests/unit/probes/layer_d/test_conventions.py`	New file (~520 LOC). 19 test functions; the parametrized `test_pattern_type_outcomes` covers 11 cases (4 pattern kinds × Pass/Fail/NotApplicable, minus the impossible `missing_file` × NotApplicable cell).

Per-AC evidence table¶

AC	Test
AC-1 (exports)	`test_module_exports_exactly_two_names`
AC-2 (smart constructor + extra=forbid)	`test_slice_smart_constructor_rejects_count_mismatch`, `test_slice_extra_field_rejected`
AC-3 (ABC contract + `_PROBE_ID`)	`test_probe_abc_attributes_match_contract`
AC-4 (`run` shape end-to-end)	covered by `test_pattern_type_outcomes` + `test_empty_catalog_` + `test_fatal_load_error_`
AC-5 (`_resolve_search_paths` purity)	`test_resolve_search_paths_pure_and_two_tier`
AC-6 (NA kernel constants)	`test_dockerfile_absent_yields_no_dockerfile_present`, `test_file_glob_empty_yields_file_glob_no_matches`
AC-7 (Fail evidence strings)	`test_fail_evidence_strings_match_kernel`
AC-8 (Pass is empty-info)	`test_pass_rejects_extra_kwarg[file\|line\|snippet\|reason\|evidence\|note]`
AC-9 (round-trip variants)	`test_slice_round_trip_preserves_typed_variants_and_newtype`
AC-10 (4×3 outcomes)	`test_pattern_type_outcomes` (11 cases)
AC-11 (sub-schema)	`test_slice_matches_subschema_with_strict_additional_properties` — skipped (S6-08 ships the schema file)
AC-12 (registry heaviness)	`test_registry_heaviness_is_light`
AC-13 (typed variants via match)	`test_pattern_type_outcomes` match block with `assert_never`
AC-14 (mypy strict)	`mypy --strict src/codegenie/probes/layer_d/conventions.py` → Success
AC-15 (empty catalog → high)	`test_empty_catalog_yields_high_confidence`
AC-16 (FatalLoadError → low)	`test_fatal_load_error_yields_low_confidence`
AC-17 (partial → medium)	`test_partial_success_yields_medium_confidence_and_typed_errors`
AC-18 (pure helper)	`test_compute_confidence_three_state_policy`
AC-19 (no shared base)	`test_mro_depth_and_no_helper_classes`
AC-20 (ConventionId newtype)	`test_slice_round_trip_preserves_typed_variants_and_newtype`
AC-21 (determinism + catalog order)	`test_two_runs_byte_identical_and_preserve_catalog_order`
AC-22 (Catalog.apply memo)	`test_catalog_apply_memo_reads_dockerfile_once_per_run`, `test_catalog_apply_memo_returns_cached_results_within_one_run`
AC-23 (atomic + deterministic raw)	`test_raw_artifact_written_atomically_and_deterministically`

Surprises + deviations from the story¶

dockerfile_pattern_inverted regex in the story doesn't actually match. The story's example pattern r"npm (start|run)" against CMD ["npm", "start"] does not match — the chars between npm and start in the JSON-exec form are ", ", not a single space. Without the match, the Fail row would have been a Pass (no forbidden pattern present), so the test would assert Fail against an actual Pass. Resolution: swapped to r"\bnpm\b" (matches the bare token npm regardless of quoting); semantically equivalent to "the forbidden pattern is npm." The story's intent is preserved; the regex now realistically demonstrates the inverted-rule contract. Logged here as a story-text correction candidate for whoever lands S6-08+.
AC-22 read-count. The story's prose says "exactly once" but its TDD plan asserts reads_after_first == 1 (one rule) and reads_after_second == 2 (second run rebuilds the Catalog). I extended this to two rules per run to demonstrate that the memo is per-Catalog.apply call, not per Dockerfile: with the kernel as shipped, each rule that needs the Dockerfile reads it independently (the id(repo) memo only short-circuits a second apply on the same Catalog instance, not a second rule on the same repo). The test now asserts reads_after_first == 2 and reads_after_second == 4 (two reads per run, no leak across runs). Added test_catalog_apply_memo_returns_cached_results_within_one_run to pin the actual id(repo)-memo invariant (the Catalog returns the same list object on the second apply call with the same repo instance).
Test patching site. The story prescribed patching codegenie.conventions._io.read_capped_text. That binding is invisible to the kernel because catalog.py imports it as a direct name (from codegenie.conventions._io import read_capped_text). Patched at the call site (codegenie.conventions.catalog.read_capped_text) instead, which the kernel actually looks up. Tests pass; mutation guarantee preserved.
AC-19 source-grep. The story's regex matched class ConventionsProbe(Probe): alone OR the two-class form. I structured the assertion the same way (permitted = {(...) , (...)}) — the implementation ships both ConventionsSlice(BaseModel) and ConventionsProbe(Probe), so the second form fires.
Probe.layer and Probe.tier typed as Literal["D"] and Literal["base"]. The story showed bare assignment (layer = "D"); the precedent (SkillsIndexProbe) uses the Literal["D"] annotation. Followed the precedent for mypy clarity. Test asserts the runtime value (p.layer == "D"), so either form passes — the annotation is the more conservative choice.

Files not in the story's "Files to touch" but touched¶

src/codegenie/probes/__init__.py — registration line + __all__ entry. This is the conventional collection point for @register_probe-decorated probes (the registry doesn't scan; modules must be imported). Story §"Files to touch" omitted this; same omission as S6-01's story. Logged here for the manifest-keeper.

Suggested commit message¶

feat(phase2/S6-02): GREEN — ConventionsProbe Layer D rule-evaluation probe

Lands src/codegenie/probes/layer_d/conventions.py as a
@register_probe(heaviness="light") Layer-D probe that applies the
ConventionsCatalogLoader output (S2-02) to the analyzed-repo
RepoSnapshot and projects the returned list[ConventionResult] into a
typed ConventionsSlice carrying the catalog-file-ordered results,
resolved tier search paths (operator observability + S6-08 freshness
hook), the loader's per-file errors round-tripped through the
discriminated union, and a smart-constructor rules_checked count.

Pattern matches Result[CatalogLoadOutcome, FatalLoadError]:
- Ok → catalog.apply(repo) (preserves kernel id(repo) memo),
  confidence via _compute_confidence (high/medium/low three-state).
- Err(FatalLoadError) → empty results, confidence "low", catalog_paths
  carries the unreadable tier paths; probe never raises (Phase 0
  failure-isolation contract).

Discriminated-union ConventionResult = Pass | Fail | NotApplicable
preserved end-to-end. NotApplicable carries the kernel's documented
reason constants (no_dockerfile_present, file_glob_no_matches); Fail
carries the four documented evidence strings (per-line capture
deferred to a future ADR amendment). ConventionId newtype survives
JSON round-trip via Pydantic's Annotated discriminator.

Functional-core / imperative-shell split (four pure module-level
helpers + the probe's pure _resolve_search_paths + the imperative
async run). Raw artifact at ctx.output_dir/conventions.json written
atomically (sibling .tmp + os.replace), byte-identical on rerun.

23 ACs verified with runtime evidence; 36 unit tests (1 skipped for
AC-11 sub-schema — lands in S6-08). Parametrized 4×3 pattern × outcome
test (11 reachable cases — missing_file has no NotApplicable path)
exercises the exhaustive match + assert_never discipline through the
typed slice. Full suite green: 2732 passed, 30 skipped, 2 xfailed.
mypy --strict, ruff, lint-imports — all clean.

Lessons for future Phase 2 stories¶

Always validate the story's example regexes against the example fixtures before lifting them. The r"npm (start|run)" vs CMD ["npm", "start"] mismatch would have shipped as a false-Pass if the test had assumed the regex did what the prose said. A 5-minute python -c 'import re; re.search(r"npm (start|run)", \'CMD ["npm", "start"]\')' is cheaper than a Stage-3 validator catch.
Mutmut-style mutation testing for parametrized matrices. The 11-case test_pattern_type_outcomes is exactly the shape where a polarity swap (Pass↔Fail on the inverted variant) would slip past a less granular test — every cell asserts isinstance(result, ExpectedClass) AND the match-discriminator branch. A future S6-03+ marker probe could lift the parametrize shape (rule × outcome) as a shared idiom.
Patch at the binding site, not the source module. Python's name resolution caches imports — from X import Y binds Y in the importing module's namespace. Patching X.Y doesn't affect callers that already bound Y directly. Always patch where the call happens (the importing module's namespace), not where the function is defined. Mirrors the same lesson from S6-01's tracemalloc test (where the patch target was the consumer, not linecache).

S6-02 — ConventionsProbe Layer D — attempt log¶