S6-02 — ConventionsProbe Layer D — attempt log

Attempt 1 — GREEN (2026-05-17)

Result: GREEN. 36 tests pass (1 skipped for AC-11 sub-schema, deferred to S6-08). Full suite green (make lint typecheck lint-imports test → 2732 passed, 30 skipped, 2 xfailed). make fence green. mypy --strict clean on the new module.

What shipped

| Path | Notes |
| --- | --- |
| src/codegenie/probes/layer_d/conventions.py | New file (~210 LOC). ConventionsSlice smart-constructor + ConventionsProbe @register_probe(heaviness="light"). Four module-level pure helpers: _compute_confidence, _project_results, _atomic_write_text, plus the probe's _resolve_search_paths. Module-level _PROBE_ID: Final[ProbeId] + _BASE_VERSION + _RAW_ARTIFACT_NAME + _WARNING_PER_FILE_ERRORS + the two default-path constants. |
| src/codegenie/probes/__init__.py | One additive import line (conventions, # noqa: F401 — S6-02 registration) + entry in __all__. No other edits. |
| tests/unit/probes/layer_d/test_conventions.py | New file (~520 LOC). 19 test functions; the parametrized test_pattern_type_outcomes covers 11 cases (4 pattern kinds × Pass/Fail/NotApplicable, minus the impossible missing_file × NotApplicable cell). |

Per-AC evidence table

| AC | Test |
| --- | --- |
| AC-1 (exports) | test_module_exports_exactly_two_names |
| AC-2 (smart constructor + extra=forbid) | test_slice_smart_constructor_rejects_count_mismatch, test_slice_extra_field_rejected |
| AC-3 (ABC contract + _PROBE_ID) | test_probe_abc_attributes_match_contract |
| AC-4 (run shape end-to-end) | covered by test_pattern_type_outcomes + test_empty_catalog_* + test_fatal_load_error_* |
| AC-5 (_resolve_search_paths purity) | test_resolve_search_paths_pure_and_two_tier |
| AC-6 (NA kernel constants) | test_dockerfile_absent_yields_no_dockerfile_present, test_file_glob_empty_yields_file_glob_no_matches |
| AC-7 (Fail evidence strings) | test_fail_evidence_strings_match_kernel |
| AC-8 (Pass is empty-info) | test_pass_rejects_extra_kwarg, parametrized over file, line, snippet, reason, evidence, note |
| AC-9 (round-trip variants) | test_slice_round_trip_preserves_typed_variants_and_newtype |
| AC-10 (4×3 outcomes) | test_pattern_type_outcomes (11 cases) |
| AC-11 (sub-schema) | test_slice_matches_subschema_with_strict_additional_properties (skipped; S6-08 ships the schema file) |
| AC-12 (registry heaviness) | test_registry_heaviness_is_light |
| AC-13 (typed variants via match) | test_pattern_type_outcomes match block with assert_never |
| AC-14 (mypy strict) | mypy --strict src/codegenie/probes/layer_d/conventions.py → Success |
| AC-15 (empty catalog → high) | test_empty_catalog_yields_high_confidence |
| AC-16 (FatalLoadError → low) | test_fatal_load_error_yields_low_confidence |
| AC-17 (partial → medium) | test_partial_success_yields_medium_confidence_and_typed_errors |
| AC-18 (pure helper) | test_compute_confidence_three_state_policy |
| AC-19 (no shared base) | test_mro_depth_and_no_helper_classes |
| AC-20 (ConventionId newtype) | test_slice_round_trip_preserves_typed_variants_and_newtype |
| AC-21 (determinism + catalog order) | test_two_runs_byte_identical_and_preserve_catalog_order |
| AC-22 (Catalog.apply memo) | test_catalog_apply_memo_reads_dockerfile_once_per_run, test_catalog_apply_memo_returns_cached_results_within_one_run |
| AC-23 (atomic + deterministic raw) | test_raw_artifact_written_atomically_and_deterministically |
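
The three-state policy pinned by AC-15 through AC-18 can be sketched as a small pure function. This is only an illustration: the name compute_confidence, its two parameters, and the rule that a clean non-empty catalog also maps to "high" are assumptions, not the signature or full behavior of the shipped _compute_confidence.

```python
from typing import Literal

Confidence = Literal["high", "medium", "low"]


def compute_confidence(fatal_load_error: bool, per_file_errors: int) -> Confidence:
    """Hypothetical three-state policy; inputs are illustrative."""
    if fatal_load_error:
        # AC-16: the catalog could not be loaded at all.
        return "low"
    if per_file_errors > 0:
        # AC-17: some catalog files loaded, others produced typed errors.
        return "medium"
    # AC-15: an empty catalog with no errors is still a clean load.
    # Assumption: a clean non-empty catalog is treated the same way.
    return "high"


print(compute_confidence(False, 0))  # high
print(compute_confidence(False, 2))  # medium
print(compute_confidence(True, 0))   # low
```

Keeping the policy free of I/O is what lets a test like test_compute_confidence_three_state_policy exercise it without building a catalog.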

Surprises + deviations from the story

  1. The story's dockerfile_pattern_inverted regex does not match its own fixture. The example pattern r"npm (start|run)" does not match CMD ["npm", "start"]: in the JSON-exec form the characters between npm and start are ", ", not a single space. Without a match, the rule would evaluate to Pass (no forbidden pattern present), so the test would assert Fail against an actual Pass. Resolution: swapped to r"\bnpm\b", which matches the bare token npm regardless of quoting. This is broader than the story's pattern (any npm token is forbidden, not only npm start or npm run), but it preserves the story's intent and realistically demonstrates the inverted-rule contract. Logged here as a story-text correction candidate for whoever lands S6-08+.

  2. AC-22 read-count. The story's prose says "exactly once" but its TDD plan asserts reads_after_first == 1 (one rule) and reads_after_second == 2 (second run rebuilds the Catalog). I extended this to two rules per run to demonstrate that the memo is per-Catalog.apply call, not per Dockerfile: with the kernel as shipped, each rule that needs the Dockerfile reads it independently (the id(repo) memo only short-circuits a second apply on the same Catalog instance, not a second rule on the same repo). The test now asserts reads_after_first == 2 and reads_after_second == 4 (two reads per run, no leak across runs). Added test_catalog_apply_memo_returns_cached_results_within_one_run to pin the actual id(repo)-memo invariant (the Catalog returns the same list object on the second apply call with the same repo instance).

  3. Test patching site. The story prescribed patching codegenie.conventions._io.read_capped_text. That binding is invisible to the kernel because catalog.py imports it as a direct name (from codegenie.conventions._io import read_capped_text). Patched at the call site (codegenie.conventions.catalog.read_capped_text) instead, which the kernel actually looks up. Tests pass; mutation guarantee preserved.

  4. AC-19 source-grep. The story's regex matched class ConventionsProbe(Probe): alone OR the two-class form. I structured the assertion the same way (permitted = {(...), (...)}). The implementation ships both ConventionsSlice(BaseModel) and ConventionsProbe(Probe), so the second form fires.

  5. Probe.layer and Probe.tier typed as Literal["D"] and Literal["base"]. The story showed bare assignment (layer = "D"); the precedent (SkillsIndexProbe) uses the Literal["D"] annotation. Followed the precedent for mypy clarity. Test asserts the runtime value (p.layer == "D"), so either form passes — the annotation is the more conservative choice.
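
The regex mismatch from item 1 can be reproduced with the standard library alone. The check below uses only the two patterns and the fixture line quoted there; the variable names are illustrative.

```python
import re

# Fixture line quoted in item 1 (JSON-exec form of CMD).
dockerfile_line = 'CMD ["npm", "start"]'

# The story's pattern expects a single space between the tokens, but the
# exec form separates them with `", "`, so nothing matches.
story_match = re.search(r"npm (start|run)", dockerfile_line)

# The replacement pattern matches the bare token regardless of quoting.
fixed_match = re.search(r"\bnpm\b", dockerfile_line)

print(story_match)          # None
print(fixed_match.group())  # npm
```

The story's pattern would match the shell form (CMD npm start), which is probably why the mismatch was easy to miss in prose.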

Files not in the story's "Files to touch" but touched

  • src/codegenie/probes/__init__.py — registration line + __all__ entry. This is the conventional collection point for @register_probe-decorated probes (the registry doesn't scan; modules must be imported). Story §"Files to touch" omitted this; same omission as S6-01's story. Logged here for the manifest-keeper.

Suggested commit message

feat(phase2/S6-02): GREEN — ConventionsProbe Layer D rule-evaluation probe

Lands src/codegenie/probes/layer_d/conventions.py as a
@register_probe(heaviness="light") Layer-D probe that applies the
ConventionsCatalogLoader output (S2-02) to the analyzed-repo
RepoSnapshot and projects the returned list[ConventionResult] into a
typed ConventionsSlice carrying the catalog-file-ordered results,
resolved tier search paths (operator observability + S6-08 freshness
hook), the loader's per-file errors round-tripped through the
discriminated union, and a smart-constructor rules_checked count.

Pattern matches Result[CatalogLoadOutcome, FatalLoadError]:
- Ok → catalog.apply(repo) (preserves kernel id(repo) memo),
  confidence via _compute_confidence (high/medium/low three-state).
- Err(FatalLoadError) → empty results, confidence "low", catalog_paths
  carries the unreadable tier paths; probe never raises (Phase 0
  failure-isolation contract).

Discriminated-union ConventionResult = Pass | Fail | NotApplicable
preserved end-to-end. NotApplicable carries the kernel's documented
reason constants (no_dockerfile_present, file_glob_no_matches); Fail
carries the four documented evidence strings (per-line capture
deferred to a future ADR amendment). ConventionId newtype survives
JSON round-trip via Pydantic's Annotated discriminator.

Functional-core / imperative-shell split (four pure module-level
helpers + the probe's pure _resolve_search_paths + the imperative
async run). Raw artifact at ctx.output_dir/conventions.json written
atomically (sibling .tmp + os.replace), byte-identical on rerun.

23 ACs verified with runtime evidence; 36 unit tests (1 skipped for
AC-11 sub-schema — lands in S6-08). Parametrized 4×3 pattern × outcome
test (11 reachable cases — missing_file has no NotApplicable path)
exercises the exhaustive match + assert_never discipline through the
typed slice. Full suite green: 2732 passed, 30 skipped, 2 xfailed.
mypy --strict, ruff, lint-imports — all clean.
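
The "sibling .tmp + os.replace" write described in the commit message follows a common pattern, sketched below. The helper names atomic_write_text and render, the payload, and the use of sorted JSON keys for byte-identical output are assumptions for illustration; the shipped _atomic_write_text and its serializer may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write text to a sibling .tmp file, then swap it into place."""
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(text, encoding="utf-8")
    # os.replace is atomic when source and destination share a filesystem,
    # so readers see either the old file or the complete new one.
    os.replace(tmp, target)


def render(payload: dict) -> str:
    # Assumption: sorting keys is one way to get byte-identical output;
    # the real probe may serialize differently.
    return json.dumps(payload, sort_keys=True, indent=2) + "\n"


with tempfile.TemporaryDirectory() as out_dir:
    target = Path(out_dir) / "conventions.json"
    payload = {"rules_checked": 2, "confidence": "high"}

    atomic_write_text(target, render(payload))
    first = target.read_bytes()
    atomic_write_text(target, render(payload))
    second = target.read_bytes()

    leftover_tmp = target.with_name(target.name + ".tmp").exists()

print(first == second)  # True
print(leftover_tmp)     # False
```

Writing the temporary file next to the target, rather than in a system temp directory, keeps both paths on the same filesystem, which is what the atomicity of os.replace depends on.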

Lessons for future Phase 2 stories

  • Always validate the story's example regexes against the example fixtures before lifting them. The r"npm (start|run)" vs CMD ["npm", "start"] mismatch would have shipped as a false Pass if the test had assumed the regex did what the prose said. A 5-minute check such as python -c "import re; print(re.search(r'npm (start|run)', 'CMD [\"npm\", \"start\"]'))" (prints None) is cheaper than a Stage-3 validator catch.

  • Mutmut-style mutation testing for parametrized matrices. The 11-case test_pattern_type_outcomes is exactly the shape where a polarity swap (Pass↔Fail on the inverted variant) would slip past a less granular test — every cell asserts isinstance(result, ExpectedClass) AND the match-discriminator branch. A future S6-03+ marker probe could lift the parametrize shape (rule × outcome) as a shared idiom.

  • Patch at the binding site, not the source module. from X import Y binds Y in the importing module's namespace at import time. Patching X.Y afterwards rebinds the attribute on X only; callers that already bound Y directly keep their reference to the original object. Always patch where the name is looked up (the importing module's namespace), not where the function is defined. Mirrors the same lesson from S6-01's tracemalloc test (where the patch target was the consumer, not linecache).
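
The binding-site lesson can be shown without touching the real codebase. The sketch below builds two throwaway in-memory modules, demo_io and demo_catalog (both hypothetical stand-ins for a source module and a consumer that uses from ... import), and patches each in turn with unittest.mock.

```python
import sys
import types
from unittest import mock

# Hypothetical "source" module that defines the function.
io_mod = types.ModuleType("demo_io")
exec("def read_text():\n    return 'real'\n", io_mod.__dict__)
sys.modules["demo_io"] = io_mod

# Hypothetical "consumer" module that binds the name directly.
catalog_mod = types.ModuleType("demo_catalog")
sys.modules["demo_catalog"] = catalog_mod
exec(
    "from demo_io import read_text\n"
    "def apply():\n"
    "    return read_text()\n",
    catalog_mod.__dict__,
)

# Patching the defining module rebinds demo_io.read_text only; the consumer
# still holds its own reference to the original function.
with mock.patch("demo_io.read_text", return_value="fake"):
    patched_at_source = catalog_mod.apply()

# Patching the consumer's namespace replaces the name that apply() looks up.
with mock.patch("demo_catalog.read_text", return_value="fake"):
    patched_at_call_site = catalog_mod.apply()

print(patched_at_source)     # real
print(patched_at_call_site)  # fake
```

Had the consumer used import demo_io and called demo_io.read_text(), the first patch would have taken effect, because the attribute lookup would happen on the source module at call time.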