# S6-02 — ConventionsProbe Layer D — attempt log

## Attempt 1 — GREEN (2026-05-17)
Result: GREEN. 36 tests pass (1 skipped for the AC-11 sub-schema, deferred to S6-08). Full suite green (`make lint typecheck lint-imports test` → 2732 passed, 30 skipped, 2 xfailed). `make fence` green. `mypy --strict` clean on the new module.
### What shipped
| Path | Notes |
|---|---|
| `src/codegenie/probes/layer_d/conventions.py` | New file (~210 LOC). `ConventionsSlice` smart constructor + `ConventionsProbe` with `@register_probe(heaviness="light")`. Four module-level pure helpers: `_compute_confidence`, `_project_results`, `_atomic_write_text`, plus the probe's `_resolve_search_paths`. Module-level `_PROBE_ID: Final[ProbeId]` + `_BASE_VERSION` + `_RAW_ARTIFACT_NAME` + `_WARNING_PER_FILE_ERRORS` + the two default-path constants. |
| `src/codegenie/probes/__init__.py` | One additive import line (`conventions,  # noqa: F401 — S6-02 registration`) + entry in `__all__`. No other edits. |
| `tests/unit/probes/layer_d/test_conventions.py` | New file (~520 LOC). 19 test functions; the parametrized `test_pattern_type_outcomes` covers 11 cases (4 pattern kinds × Pass/Fail/NotApplicable, minus the impossible `missing_file` × NotApplicable cell). |
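The `_atomic_write_text` helper in the table above is described in this log only as a sibling `.tmp` write followed by `os.replace`, with byte-identical output on rerun. Below is a minimal sketch of that pattern. It assumes the raw artifact is JSON rendered with sorted keys and a trailing newline. `render_artifact` and `atomic_write_text` are hypothetical names, and the real helper's signature and serialization settings may differ.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def render_artifact(payload: dict[str, Any]) -> str:
    # Hypothetical deterministic rendering: sorted dict keys, fixed indent and a
    # trailing newline. List order (e.g. catalog-ordered results) is preserved.
    return json.dumps(payload, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def atomic_write_text(path: Path, text: str) -> None:
    # Write to a sibling temp file so the final rename stays on one filesystem.
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())  # make the bytes durable before the rename
    # os.replace overwrites an existing target; on POSIX the rename is atomic,
    # so readers see either the old file or the new one, never a partial write.
    os.replace(tmp, path)


with tempfile.TemporaryDirectory() as out_dir:
    target = Path(out_dir) / "conventions.json"
    payload = {"confidence": "high", "results": ["rule-b", "rule-a"]}
    atomic_write_text(target, render_artifact(payload))
    first_bytes = target.read_bytes()
    atomic_write_text(target, render_artifact(payload))
    second_bytes = target.read_bytes()
    leftover_files = sorted(p.name for p in Path(out_dir).iterdir())
```

Two writes of the same payload produce identical bytes, and no `.tmp` file survives the rename.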
### Per-AC evidence table
| AC | Test |
|---|---|
| AC-1 (exports) | `test_module_exports_exactly_two_names` |
| AC-2 (smart constructor + `extra=forbid`) | `test_slice_smart_constructor_rejects_count_mismatch`, `test_slice_extra_field_rejected` |
| AC-3 (ABC contract + `_PROBE_ID`) | `test_probe_abc_attributes_match_contract` |
| AC-4 (run shape end-to-end) | Covered by `test_pattern_type_outcomes` + `test_empty_catalog_*` + `test_fatal_load_error_*` |
| AC-5 (`_resolve_search_paths` purity) | `test_resolve_search_paths_pure_and_two_tier` |
| AC-6 (NA kernel constants) | `test_dockerfile_absent_yields_no_dockerfile_present`, `test_file_glob_empty_yields_file_glob_no_matches` |
| AC-7 (Fail evidence strings) | `test_fail_evidence_strings_match_kernel` |
| AC-8 (Pass is empty-info) | `test_pass_rejects_extra_kwarg`, parametrized over `file`, `line`, `snippet`, `reason`, `evidence`, `note` |
| AC-9 (round-trip variants) | `test_slice_round_trip_preserves_typed_variants_and_newtype` |
| AC-10 (4×3 outcomes) | `test_pattern_type_outcomes` (11 cases) |
| AC-11 (sub-schema) | `test_slice_matches_subschema_with_strict_additional_properties`: skipped (S6-08 ships the schema file) |
| AC-12 (registry heaviness) | `test_registry_heaviness_is_light` |
| AC-13 (typed variants via `match`) | `match` block with `assert_never` in `test_pattern_type_outcomes` |
| AC-14 (mypy strict) | `mypy --strict src/codegenie/probes/layer_d/conventions.py` → Success |
| AC-15 (empty catalog → high) | `test_empty_catalog_yields_high_confidence` |
| AC-16 (`FatalLoadError` → low) | `test_fatal_load_error_yields_low_confidence` |
| AC-17 (partial → medium) | `test_partial_success_yields_medium_confidence_and_typed_errors` |
| AC-18 (pure helper) | `test_compute_confidence_three_state_policy` |
| AC-19 (no shared base) | `test_mro_depth_and_no_helper_classes` |
| AC-20 (`ConventionId` newtype) | `test_slice_round_trip_preserves_typed_variants_and_newtype` |
| AC-21 (determinism + catalog order) | `test_two_runs_byte_identical_and_preserve_catalog_order` |
| AC-22 (`Catalog.apply` memo) | `test_catalog_apply_memo_reads_dockerfile_once_per_run`, `test_catalog_apply_memo_returns_cached_results_within_one_run` |
| AC-23 (atomic + deterministic raw) | `test_raw_artifact_written_atomically_and_deterministically` |
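AC-15 through AC-18 pin a three-state confidence policy computed by a pure helper. The sketch below assumes the helper only needs two inputs: whether the loader returned a `FatalLoadError`, and how many per-file errors it reported. The real `_compute_confidence` inputs are not shown in this log, so this signature is hypothetical.

```python
from typing import Literal

Confidence = Literal["high", "medium", "low"]


def compute_confidence(*, fatal_load_error: bool, per_file_error_count: int) -> Confidence:
    """Hypothetical stand-in for _compute_confidence: a pure, total function."""
    if fatal_load_error:
        # Err(FatalLoadError): no catalog could be loaded at all.
        return "low"
    if per_file_error_count > 0:
        # Partial success: some catalog files failed to load.
        return "medium"
    # Clean load, including an empty catalog with nothing to evaluate.
    return "high"
```

Because the policy depends only on the loader outcome, it can be tested exhaustively without running the probe.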
### Surprises + deviations from the story
- The `dockerfile_pattern_inverted` regex in the story doesn't actually match. The story's example pattern `r"npm (start|run)"` does not match `CMD ["npm", "start"]`: in the JSON exec form, the characters between `npm` and `start` are `", "`, not a single space. Without a match, the Fail row would have evaluated to a Pass (no forbidden pattern present), so the test would have asserted `Fail` against an actual `Pass`. Resolution: swapped to `r"\bnpm\b"`, which matches the bare token `npm` regardless of quoting. This is semantically equivalent to "the forbidden pattern is `npm`." The story's intent is preserved, and the regex now realistically demonstrates the inverted-rule contract. Logged here as a story-text correction candidate for whoever lands S6-08+.
- AC-22 read count. The story's prose says "exactly once", but its TDD plan asserts `reads_after_first == 1` (one rule) and `reads_after_second == 2` (the second run rebuilds the `Catalog`). I extended the test to two rules per run. This shows that the memo is scoped to the `Catalog.apply` call, not to the Dockerfile. With the kernel as shipped, each rule that needs the Dockerfile reads it independently: the `id(repo)` memo only short-circuits a second `apply` on the same `Catalog` instance, not a second rule on the same repo. The test now asserts `reads_after_first == 2` and `reads_after_second == 4` (two reads per run, no leak across runs). I also added `test_catalog_apply_memo_returns_cached_results_within_one_run` to pin the actual `id(repo)` memo invariant: the `Catalog` returns the same list object on a second `apply` call with the same `repo` instance.
- Test patching site. The story prescribed patching `codegenie.conventions._io.read_capped_text`. Patching that attribute has no effect on the kernel, because `catalog.py` imports the function by name (`from codegenie.conventions._io import read_capped_text`) and calls its own module-level binding. I patched at the call site (`codegenie.conventions.catalog.read_capped_text`) instead, which is the name the kernel actually looks up. Tests pass; the mutation guarantee is preserved.
- AC-19 source grep. The story's regex accepted either `class ConventionsProbe(Probe):` alone or the two-class form. I structured the assertion the same way (`permitted = {(...), (...)}`). The implementation ships both `ConventionsSlice(BaseModel)` and `ConventionsProbe(Probe)`, so the second form fires.
- `Probe.layer` and `Probe.tier` are typed as `Literal["D"]` and `Literal["base"]`. The story showed bare assignment (`layer = "D"`), but the precedent (`SkillsIndexProbe`) uses the `Literal["D"]` annotation, so I followed the precedent for mypy clarity. The test asserts the runtime value (`p.layer == "D"`), so either form passes; the annotation is the more conservative choice.
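The AC-22 read counts above follow from where the memo lives. The sketch below shows a memo keyed on `id(repo)` inside a single `Catalog` instance, where each rule still reads the Dockerfile on its own. `RepoSnapshot` and `Catalog` here are hypothetical stand-ins, and the real kernel's internals may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RepoSnapshot:
    """Hypothetical stand-in for the analyzed-repo snapshot."""
    root: str


class Catalog:
    """Hypothetical stand-in for the kernel Catalog with an id(repo) memo."""

    def __init__(self, rule_ids: list[str], read_text: Callable[[str], str]) -> None:
        self._rule_ids = rule_ids
        self._read_text = read_text
        # Keyed on id(repo). The repo is stored alongside its results, so the id
        # cannot be recycled for a different object while the entry exists.
        self._memo: dict[int, tuple[RepoSnapshot, list[str]]] = {}

    def apply(self, repo: RepoSnapshot) -> list[str]:
        cached = self._memo.get(id(repo))
        if cached is not None:
            return cached[1]  # same list object: no rule is re-evaluated
        results: list[str] = []
        for rule_id in self._rule_ids:
            # Each rule reads the Dockerfile itself; nothing is shared between rules.
            text = self._read_text(repo.root + "/Dockerfile")
            results.append(f"{rule_id}:{'npm' in text}")
        self._memo[id(repo)] = (repo, results)
        return results


class CountingReader:
    """Fake reader that counts how often the Dockerfile is read."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, path: str) -> str:
        self.calls += 1
        return 'CMD ["npm", "start"]'


reader = CountingReader()
repo = RepoSnapshot(root="/repo")
rules = ["no-npm-start", "pinned-base-image"]

first_run = Catalog(rules, reader)
first = first_run.apply(repo)
reads_after_first = reader.calls     # one read per rule
repeat = first_run.apply(repo)       # memo hit on the same Catalog + repo
second_run = Catalog(rules, reader)  # a new run builds a new Catalog
second_run.apply(repo)
reads_after_second = reader.calls
```

Under this shape, two rules cost two reads per run. Building a fresh `Catalog` for each run keeps the memo from leaking across runs, which matches the counts the updated test asserts.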
### Files not in the story's "Files to touch" but touched
`src/codegenie/probes/__init__.py`: registration line + `__all__` entry. This is the conventional collection point for `@register_probe`-decorated probes. The registry doesn't scan, so a module must be imported for its probes to register. Story §"Files to touch" omitted this file, the same omission as in S6-01's story. Logged here for the manifest-keeper.
### Suggested commit message
feat(phase2/S6-02): GREEN — ConventionsProbe Layer D rule-evaluation probe
Lands src/codegenie/probes/layer_d/conventions.py as a
@register_probe(heaviness="light") Layer-D probe that applies the
ConventionsCatalogLoader output (S2-02) to the analyzed-repo
RepoSnapshot and projects the returned list[ConventionResult] into a
typed ConventionsSlice carrying the catalog-file-ordered results,
resolved tier search paths (operator observability + S6-08 freshness
hook), the loader's per-file errors round-tripped through the
discriminated union, and a smart-constructor rules_checked count.
Pattern matches Result[CatalogLoadOutcome, FatalLoadError]:
- Ok → catalog.apply(repo) (preserves kernel id(repo) memo),
confidence via _compute_confidence (high/medium/low three-state).
- Err(FatalLoadError) → empty results, confidence "low", catalog_paths
carries the unreadable tier paths; probe never raises (Phase 0
failure-isolation contract).
Discriminated-union ConventionResult = Pass | Fail | NotApplicable
preserved end-to-end. NotApplicable carries the kernel's documented
reason constants (no_dockerfile_present, file_glob_no_matches); Fail
carries the four documented evidence strings (per-line capture
deferred to a future ADR amendment). ConventionId newtype survives
JSON round-trip via Pydantic's Annotated discriminator.
Functional-core / imperative-shell split (four pure module-level
helpers + the probe's pure _resolve_search_paths + the imperative
async run). Raw artifact at ctx.output_dir/conventions.json written
atomically (sibling .tmp + os.replace), byte-identical on rerun.
23 ACs verified with runtime evidence; 36 unit tests (1 skipped for
AC-11 sub-schema — lands in S6-08). Parametrized 4×3 pattern × outcome
test (11 reachable cases — missing_file has no NotApplicable path)
exercises the exhaustive match + assert_never discipline through the
typed slice. Full suite green: 2732 passed, 30 skipped, 2 xfailed.
mypy --strict, ruff, lint-imports — all clean.
### Lessons for future Phase 2 stories
- Always validate the story's example regexes against the example fixtures before lifting them. The `r"npm (start|run)"` vs `CMD ["npm", "start"]` mismatch would have shipped as a false Pass if the test had assumed the regex did what the prose said. A five-minute check such as `python -c 'import re; print(re.search(r"npm (start|run)", "CMD [\"npm\", \"start\"]"))'` (prints `None`) is cheaper than a Stage-3 validator catch.
- Mutmut-style mutation testing for parametrized matrices. The 11-case `test_pattern_type_outcomes` is exactly the shape where a polarity swap (Pass↔Fail on the inverted variant) would slip past a less granular test. Every cell asserts both `isinstance(result, ExpectedClass)` and the match-discriminator branch. A future S6-03+ marker probe could lift the parametrize shape (rule × outcome) as a shared idiom.
- Patch at the binding site, not the source module. `from X import Y` copies the reference to `Y` into the importing module's namespace at import time, so patching `X.Y` later does not affect callers that already bound `Y` directly. Always patch where the call happens (the importing module's namespace), not where the function is defined. This mirrors the lesson from S6-01's `tracemalloc` test, where the patch target was the consumer, not `linecache`.
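The binding-site lesson fits in a few lines. The sketch below builds two throwaway in-memory modules, `demo_io` and `demo_catalog`. These are hypothetical stand-ins for `codegenie.conventions._io` and `codegenie.conventions.catalog`. The sketch shows that only the patch applied to the importing module changes what the caller sees.

```python
import sys
import types
from unittest import mock

# Stand-in for codegenie.conventions._io: defines the real reader.
demo_io = types.ModuleType("demo_io")
exec(
    "def read_capped_text(path):\n"
    "    return 'real:' + path\n",
    demo_io.__dict__,
)
sys.modules["demo_io"] = demo_io

# Stand-in for codegenie.conventions.catalog: binds the name at import time.
demo_catalog = types.ModuleType("demo_catalog")
exec(
    "from demo_io import read_capped_text\n"
    "def apply(path):\n"
    "    return read_capped_text(path)\n",
    demo_catalog.__dict__,
)
sys.modules["demo_catalog"] = demo_catalog


def fake_read(path: str) -> str:
    return "fake:" + path


# Patching the defining module rebinds demo_io.read_capped_text only;
# demo_catalog still holds its own reference to the original function.
with mock.patch.object(demo_io, "read_capped_text", fake_read):
    patched_source = demo_catalog.apply("Dockerfile")

# Patching the importing module's binding is what the call actually resolves.
with mock.patch.object(demo_catalog, "read_capped_text", fake_read):
    patched_binding = demo_catalog.apply("Dockerfile")

# The patch is undone when the context manager exits.
after_exit = demo_catalog.apply("Dockerfile")
```

With string targets, the equivalent of the second patch is `mock.patch("codegenie.conventions.catalog.read_capped_text", ...)`, which is the call-site target this log describes.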