Skip to content

Attempt log — S2-02 ConventionsCatalogLoader

Attempt 1 — 2026-05-16 — GREEN

Outcome

All 16 ACs satisfied; 55 new tests; full suite green (1946 passed, 5 skipped, 2 xfail). Coverage on codegenie.conventions.*: 89–100% per file; well above the 85% phase floor.

Files created

  • src/codegenie/conventions/__init__.py__all__ re-exports + package docstring
  • src/codegenie/conventions/_io.pyread_capped_text(path, *, max_bytes)
  • src/codegenie/conventions/model.py — four ConventionRule* Pydantic variants with model_validator(mode="after") regex compile, ConventionRule discriminated union, Pass / Fail / NotApplicable minimal-field models, ConventionResult discriminated union
  • src/codegenie/conventions/loader.py — seven ConventionsError variants, CatalogLoadOutcome, FatalLoadError, ConventionsCatalogLoader with multi-file partial-success driver, _classify_validation_error / _classify_yaml_failure pure helpers
  • src/codegenie/conventions/catalog.pyCatalog model with _memo (PrivateAttr-backed) for AC-12, module-level _apply_one exhaustive match + assert_never, four independent _apply_* helpers
  • tests/unit/conventions/__init__.py
  • tests/unit/conventions/test_catalog.py — 47 behavioural tests (AC-1..AC-13d, AC-15, AC-16)
  • tests/unit/conventions/test_no_direct_yaml_import.py — AST scan (AC-10)
  • tests/unit/conventions/test_no_model_construct.py — AST scan (AC-14)
  • tests/unit/conventions/test_inverted_helper_is_independent.py — AST scan (AC-5a)
  • tests/unit/conventions/test_no_local_convention_id.py — AST scan (AC-1a)
  • tests/unit/conventions/test_apply_match_is_exhaustive_compile_time.py — mypy invocation (AC-9 compile-time half)
  • tests/fixtures/mypy_warn_unreachable/incomplete_match_conventions.py — deliberately incomplete match fixture

Files modified

  • src/codegenie/types/identifiers.py — additive only: ConventionId = NewType("ConventionId", str) + __all__ extension. Zero edits to existing newtypes (B2 closure).

Acceptance-criteria evidence

AC Evidence
AC-1 test_ac1_module_all_is_exact_set (21-symbol set pinned) + parametrized test_ac1_models_are_frozen_and_extra_forbid (9 models)
AC-1a test_no_local_convention_id_newtype (AST scan) + test_convention_id_resolves_from_identifiers_module
AC-2 test_ac2_constructor_does_no_io (monkeypatch 8 I/O primitives including Path.glob and Path.iterdir)
AC-3 test_ac3_happy_path_per_kind parametrized over all four kind literals; per-kind field assertion-strict
AC-3a test_ac3a_multi_rule_single_file_in_order — two-rule catalog, order pinned
AC-3b test_ac3b_multi_file_merge_is_sorted_relpathb.yaml written first; loader returns [from-a, from-b]
AC-4 test_ac4_dockerfile_pattern_three_outcomes — Pass/Fail/NA + exact rule_id checks + reason == "no_dockerfile_present"
AC-4d test_ac4d_dockerfile_pattern_uses_re_multiline^FROM cgr\.dev/chainguard/ against # comment\nFROM cgr.dev/...\n
AC-5 test_ac5_dockerfile_pattern_inverted_three_outcomes^USER root match→Fail, no-match→Pass, missing→NA
AC-5a test_inverted_helper_does_not_call_main_helper (AST walk over _apply_dockerfile_pattern_inverted body)
AC-6 test_ac6_file_pattern_zero_matches_is_not_applicablereason="file_glob_no_matches" pinned
AC-6a test_ac6a_file_pattern_pass_when_all_match — two passing tsconfigs
AC-6b test_ac6b_file_pattern_fail_names_lex_first_failingb/tsconfig.json named, z/tsconfig.json not
AC-6c test_ac6c_file_glob_pathlib_recursive_and_dot_exclusion.hidden/foo.json excluded by pathlib default
AC-7 test_ac7_missing_file_passes_when_absent_fails_when_present — Pass on clean repo, Fail on dirty (evidence contains "Dockerfile")
AC-8 test_ac8_unknown_pattern_type_isolated_other_rules_unaffected — partial-success contract verified
AC-8a test_ac8a_unsafe_yaml_python_object_does_not_execute (sentinel uncreated; AC-8a syntax-error sibling for umbrella honesty)
AC-8b test_ac8b_size_cap_exceeded — 1.1 MiB padded file
AC-8c test_ac8c_depth_cap_exceeded — > 64 nesting levels
AC-9 runtime: test_ac9_apply_match_smoke_asserts_assert_never_only (pytest.raises(AssertionError) only — B4); compile-time: test_incomplete_conventions_match_fails_mypy_strict
AC-9a test_ac9a_result_model_field_sets_are_minimalPass.model_dump() exact dict + rejection of stray evidence
AC-10 AST: test_no_conventions_module_imports_yaml_directly; runtime: test_ac10_safe_yaml_chokepoint_is_only_yaml_call_site (spy counts == 2 for two catalog files)
AC-11 test_ac11_extra_field_yields_typed_schema_errorunexpected_key lands in details
AC-11a test_ac11a_uncompilable_regex_pattern_lands_at_load[unterminatedSchemaError with details[*].loc containing "pattern"
AC-12 test_ac12_catalog_apply_is_idempotent_without_repeated_io — second apply() call yields zero Path.open over the repo (memoization on id(repo))
AC-13 parametrized test_ac13_conventions_error_discriminated_union_seven_reasons (7 reasons round-trip via TypeAdapter) + test_ac13_symlink_refused_json_shape + test_ac13_unknown_reason_raises
AC-13a test_ac13a_toctou_disappearance_yields_catalog_file_unreadablemonkeypatch raises FileNotFoundError for one file; other file's rule persists
AC-13b test_ac13b_partial_success_under_mixed_quality — three-file fixture (ok.yaml, unknown.yaml, broken.yaml) → one rule + two errors (unknown_pattern_type, unsafe_yaml)
AC-13c test_ac13c_empty_search_paths_returns_ok_empty
AC-13d test_ac13d_fatal_when_every_search_path_unreadableos.access patched to always-FalseResult.Err(FatalLoadError(...))
AC-14 test_no_model_construct_call_or_assignment (AST scan) + pre-commit forbidden-patterns already includes conventions in the Phase-2 packages set (existing rule in scripts/check_forbidden_patterns.py)
AC-15 ruff check, ruff format --check, mypy --strict --warn-unreachable all clean; pytest tests/unit/conventions/ passes (55 tests in 0.33s)
AC-16 RED-GREEN: confirmed first that the test file fails to import (ModuleNotFoundError: No module named 'codegenie.conventions'), then landed the implementation in the order prescribed by the story (identifiers_iomodelloadercatalog__init__); refactor was a no-op behavior

Design-pattern decisions recorded

  • Strategy + Open/Closed at the file boundary. Four _apply_* helpers live as independent module-level pure functions, dispatched via an exhaustive match on the ConventionRule discriminated union. Adding a fifth variant in Phase 3+ is a new model class + new helper + new case arm — zero edits to existing helpers, Catalog, or ConventionsCatalogLoader. The mypy --warn-unreachable ratchet (repo-wide, Phase 0) makes a missing case arm a build failure.
  • Functional core / imperative shell. _apply_* helpers are pure given a RepoSnapshot; reading repo files via _io.read_capped_text keeps the seam at pathlib.Path rather than amending the frozen RepoSnapshot contract (Phase 0 ADR-0007 — B1 closure). ConventionsCatalogLoader.load_all() is the I/O shell.
  • Smart constructor on regex patterns. Each pattern-carrying variant compiles its regex at load time via model_validator(mode="after") and stashes the compiled re.Pattern on _compiled_pattern. Compilation failure surfaces as a SchemaError at the gather phase, not a RuntimeError mid-Catalog.apply. re.MULTILINE is the compile-time flag.
  • Newtype ConventionId at the canonical home. codegenie.types.identifiers gains exactly one new symbol; AST source-scan ratchet forbids local redefinition (B2).
  • Tagged-union ConventionsError mirrors SkillsLoadError (S2-01). Same field names (reason, path, details), seven reason Literals. Phase 3 plugin code can pattern-match uniformly across the two.
  • Make-illegal-states-unrepresentable on ConventionResult. Pass has no evidence, NotApplicable has no evidence, Fail has no reason. extra="forbid" rejects defensive leakage at construction time.
  • Catalog.apply memoization on id(repo). _memo: dict[int, list[ConventionResult]] via Pydantic PrivateAttr so two consecutive applies against the same snapshot reuse results — the snapshot is the I/O boundary (AC-12). A fresh snapshot wires a fresh evaluation.
  • unsafe_yaml umbrella keeps operator response coherent. MalformedYAMLError covers parser, scanner, constructor, and top-level-non-mapping; operator response is the same (inspect before re-running) so the umbrella names the response. AC-8a verifies both the !!python/object constructor case AND a pure syntax-error case land in the umbrella, and that no code executes for either.

Lessons (carry forward)

  • L1 — Pydantic v2 discriminator-tag failures locate at the parent list slot, not at kind. The error row carries loc=['rules', 0] (or [0, '<variant_kind>']) and type='union_tag_invalid'; the offending tag is in ctx.tag (string) or input['kind'] (fallback). The original "loc ends in 'kind'" filter never fires — match on type ∈ {union_tag_invalid, union_tag_not_found} and pull from ctx.tag.
  • L2 — ValidationError.errors() may include non-JSON-serialisable values. When a model_validator(mode="after") raises a Python exception (e.g., ValueError from a regex re.error re-raise), Pydantic surfaces the exception object inside errors()[i].ctx.error. Storing the rows directly in SchemaError.details: list[dict[str, object]] lets construction succeed but blows up at model_dump(mode="json"). Fix: round-trip through exc.json() (JSON-safe) and json.loads back to Python — single line, alias-resistant.
  • L3 — frozen=True Pydantic models can still hold a mutable _memo via PrivateAttr. PrivateAttr(default_factory=dict) is excluded from serialization, equality, and the frozen write-block — exactly the shape needed for per-instance memoization of pure-functional outputs. Stash sites must mutate the dict in place; reassigning self._memo = ... would still violate frozen.
  • L4 — Compile-time pattern compilation at the discriminator boundary needs explicit MULTILINE. re.search(pattern, text) with ^ anchors to string start only; users authoring Dockerfile patterns expect line anchors. Compile with re.MULTILINE once at load time so every _apply_* helper inherits the semantics without re-passing flags at call sites.
  • L5 — The Pydantic v2 frozen=True + private-attribute write needs object.__setattr__. The model_validator(mode="after") runs after the model has been frozen — direct self._compiled_pattern = ... raises. object.__setattr__(self, "_compiled_pattern", compiled) is the documented escape hatch.