Story S2-02: ConventionsCatalogLoader with discriminated-union pattern types
Step: Step 2 — Plant kernel-side loaders (SkillsLoader, ConventionsCatalogLoader) and reference TCCM
Status: Done (GREEN 2026-05-16, all ACs satisfied — see _attempts/S2-02.md)
Effort: M
Depends on: S1-04 (TCCM model + loader establishes the Result[T, E] + safe_yaml-chokepoint + Pydantic discriminated-union pattern this story repeats). Sibling-and-precedent: S2-01 (SkillsLoader) — same multi-file partial-success loader shape; this story reuses the SkillsLoadError-style reason: Literal[…] discriminator + LoadOutcome + per_file_errors convention.
ADRs honored: 02-ADR-0007 (kernel-side scaffolding only — no plugin loader), Phase 1 ADR-0006 (safe_yaml chokepoint preserved), Phase 1 ADR-0008 (in-process parse caps + O_NOFOLLOW), Phase 0 ADR-0007 (probe-contract surface frozen — RepoSnapshot extension policy, see Validation notes B1), production ADR-0033 §1, §3–4 (newtypes for every domain primitive — ConventionId, RegexPatternSource; make-illegal-states-unrepresentable — every pattern type a discriminated-union variant; one match per type with assert_never on unreachable)
Validation notes (added 2026-05-15 by phase-story-validator)
This story was hardened against four critic lenses (coverage, test-quality, consistency, design-patterns). Verdict: HARDENED. The core shape (Pydantic discriminated union over four pattern variants + ConventionResult = Pass | Fail | NotApplicable + Catalog.apply as a match with assert_never + safe_yaml.load chokepoint reuse) is correct and traces cleanly to arch §"Component design" #10, arch §"Design patterns applied" rows 5/8, 02-ADR-0007 §Decision, and production ADR-0033. Block-tier and harden-tier closures applied in place:
- B1: RepoSnapshot API mismatch (block). Original story prescribed RepoSnapshot.build(tmp_path) (a factory) and repo.read_text(relpath) (a method). Neither exists. src/codegenie/probes/base.py ships RepoSnapshot as a plain @dataclass with root: Path, git_commit: str | None, detected_languages: dict[str, int], config: dict[str, Any]; there is no factory and no read_text. Phase 0 ADR-0007 freezes the probe-contract surface: adding either method to RepoSnapshot would require a Phase 2 ADR amendment with sentinel-test wiring (tests/unit/test_probe_contract.py). The hardened story takes the smaller blast radius: _apply_* helpers read repo files via repo.root / relpath (pathlib.Path operations only, no new method on RepoSnapshot), capped at 1 MiB per file via codegenie.parsers._io.open_capped-style discipline (a read_capped_text(path: Path, *, max_bytes: int) -> str | None-style helper local to codegenie.conventions._io, not a new public method on RepoSnapshot). Test fixtures construct RepoSnapshot directly via RepoSnapshot(root=tmp_path / "repo", git_commit=None, detected_languages={}, config={}) (no factory). AC-12 is the contract pin; the Implementation outline §3–6 was rewritten to reflect this. (See also DP-Notes "Functional core, RepoSnapshot at the boundary".)
- B2: ConventionId newtype location (block). Original story said "lift from Step 1's ADR-0033 newtype roster if codegenie.adapters.ids exposes it; otherwise add here." codegenie.adapters.ids does not exist; the canonical home for domain identifiers is codegenie/types/identifiers.py (which already exports SkillId, TaskClassId, ProbeId). The story now commits to extending codegenie.types.identifiers with ConventionId = NewType("ConventionId", str) and the __all__ line: Open/Closed at the file boundary, mirroring the lift S1-05 established for the existing newtypes. The Language newtype is not introduced by this story (no language-applicability field on ConventionRule*); ConventionId is the only new newtype. AC-1 and AC-25 (new) pin the import location.
- B3: rule_id field never asserted on ConventionResult (block). Every AC checked isinstance(result, Pass|Fail|NotApplicable) but never result.rule_id == rule.id. A mutation that fuses rule_id to a constant (e.g., always ConventionId("")) or swaps it with a sibling rule's id would pass every existing test. AC-4 / AC-5 / AC-6 / AC-7 now assert result.rule_id == ConventionId("<expected id>") exactly.
- B4: assert_never exception type was too lax (block, TQ1). AC-9's runtime smoke test allowed pytest.raises((AssertionError, TypeError, ValueError)). typing.assert_never raises AssertionError in Python 3.11+ (assert_never is raise AssertionError(...)). Allowing TypeError / ValueError lets a defensive implementation that writes if not isinstance(rule, _KNOWN_TYPES): raise TypeError("unknown kind") pass, which is exactly the anti-pattern ADR-0033 §4 forbids (the load-bearing signal is the compile-time exhaustiveness check, not a runtime isinstance whitelist). Tightened to pytest.raises(AssertionError) only. (The runtime test exists because compile-time mypy --warn-unreachable only fires on a missing arm, not on someone writing the isinstance-chain anti-pattern.)
- B5: _apply_one import path pinned (block). The TDD plan imports from codegenie.conventions.catalog import _apply_one, but the Implementation outline shows _apply_one as a method on Catalog. The hardened story pins _apply_one(rule: ConventionRule, repo: RepoSnapshot) -> ConventionResult as a module-level pure function in catalog.py; Catalog.apply is a thin wrapper that iterates self.rules and calls the module-level function. This composes with the "four module-level _apply_* helpers" prescription in §"Refactor" and makes the assert_never smoke test (AC-9) callable without instantiating Catalog.
- B6: ConventionsError reasons under-enumerated vs safe_yaml.load raise set (block, CN3). The story enumerated four reasons (unknown_pattern_type, schema, symlink_refused, catalog_file_unreadable) but safe_yaml.load raises MalformedYAMLError (any yaml.YAMLError subclass, including ConstructorError for !!python/object), SizeCapExceeded, DepthCapExceeded, and SymlinkRefusedError. The S2-01 hardening surfaced this exact gap; this story now adopts the same convention: unsafe_yaml is the umbrella for all MalformedYAMLError causes (parser, scanner, constructor; an operationally-prudent name), size_cap_exceeded is its own reason (the operator distinguishes a 200 MB hostile catalog from a parser typo), and depth_cap_exceeded is its own reason. The final ConventionsError is a seven-reason discriminated union: unknown_pattern_type | schema | symlink_refused | unsafe_yaml | size_cap_exceeded | depth_cap_exceeded | catalog_file_unreadable. AC-13 (parameterized over the seven reasons) replaces the previous four-reason form; AC-8a (umbrella honesty), AC-8b (size-cap), AC-8c (depth-cap) are new.
- H1: Multi-rule single-file catalog absent. Every red test loaded a one-rule catalog. AC-3a (new) loads a two-rule catalog with distinct kinds in the same YAML file; asserts both rules round-trip and Catalog.apply(repo) returns len(...) == 2 in the same order.
- H2: Multi-file catalog merge order never pinned. The story says "merged into one Catalog.rules list in iteration order" but no AC verifies it. AC-3b (new) writes two YAML files (a.yaml first-rule, b.yaml second-rule) in the same search_paths[0] directory; asserts ordering by sorted relative path (lexicographic; deterministic across xfs/ext4/APFS; same convention as S2-01 AC-19).
- H3: Regex compilation at load not pinned by an AC. Notes-for-implementer say "compilation failure is a schema-time concern caught at load (a rule with an uncompilable pattern → Result.Err(ConventionsError(reason="schema", ...)) via a Pydantic model_validator)." AC-11a (new) pins this: a pattern: "[unterminated" rule → Result.Err(SchemaError(...)) with details non-empty and at least one row whose loc references the pattern field. Tied to DP1 (regex as a smart-constructor / validated newtype).
- H4: re.search MULTILINE semantics never pinned. The example pattern ^FROM cgr\.dev/chainguard/ uses the anchor ^; re.search defaults to single-line mode, where ^ only matches the start of the string. Pin: re.search(pattern, contents, flags=re.MULTILINE) (so ^ matches line starts inside the Dockerfile contents). AC-4d (new) is the mutation killer: a FROM cgr.dev/chainguard/ line not at the top of the Dockerfile (e.g., preceded by a comment block) must Pass; it would Fail without re.MULTILINE.
- H5: file_glob library / semantics never pinned. AC-6 used file_glob: "**/tsconfig.json". With pathlib, recursion can be spelled Path.rglob("tsconfig.json"), and Path.glob("**/tsconfig.json") also recurses; neither skips dot-prefixed names by default (unlike glob.glob). AC-6c (new) pins library and recursive semantics: Path(repo.root).glob(rule.file_glob) (NOT rglob), and asserts file_glob: "**/foo.json" finds repo/x/y/foo.json but does NOT match repo/.hidden/foo.json (dot-leading components excluded; because pathlib.Path.glob matches them by default, the helper has to filter them explicitly).
- H6: _apply_file_pattern first-offending file is non-deterministic without sort. The story says "the first offending file is named in evidence" but Path.glob order is filesystem-dependent. AC-6d (new) pins evidence to reference the lexicographically-first failing path (deterministic across filesystems); the implementation sorted(...)s the glob result before iterating.
- H7: dockerfile_pattern_inverted independent-helper invariant unenforced. Notes say "don't share machinery with _apply_dockerfile_pattern." AC-5a (new) is an AST source-scan ratchet: the _apply_dockerfile_pattern_inverted body must not contain a call to _apply_dockerfile_pattern. Mutation killer for the "just invert the Pass/Fail" anti-pattern.
- H8: Pass evidence-emptiness invariant absent. A defensive implementer might add evidence: str = "" to Pass. AC-9a (new): Pass has exactly the fields {kind, rule_id} and model_dump() returns exactly {"kind": "pass", "rule_id": "<id>"}. Same for NotApplicable ({kind, rule_id, reason}) and Fail ({kind, rule_id, evidence}). Pins the "make illegal states unrepresentable" discipline at the field-set level.
- H9: Toolchain extension to forbid direct yaml.* import via AST (not ripgrep). AC-10 originally said ripgrep "yaml\\." src/codegenie/conventions/. S2-01 hardening surfaced that ripgrep misses aliases (from yaml import safe_load as _y). AC-10 now uses an ast.parse + ast.walk source-scan in tests/unit/conventions/test_no_direct_yaml_import.py, the same shape as S2-01 AC-24.
- H10: model_construct ban via AST source-scan. AC-14 was a "pre-commit hook scans … finds zero", with the same alias-resistance concern as H9. AC-14 is now an AST source-scan test colocated with the test suite.
- H11: TOCTOU on catalog-file disappearance between glob and safe_yaml.load. S2-01 hardened with IoFailure(path, errno_name). AC-13a (new) wires the same: between Path.glob("*.yaml") enumeration and safe_yaml.load(catalog_path), the file may be removed (FileNotFoundError). The hardened story adds catalog_file_unreadable semantics covering any non-symlink OSError (FileNotFoundError, PermissionError, IsADirectoryError) with an errno_name: str field.
- H12: LoadOutcome partial-success shape consistent with S2-01. The original load_all returns Result[Catalog, ConventionsError]: single-error semantics, fail-fast at the first bad catalog. But the loader walks multiple catalog files, and a per-file partial-success shape matches S2-01's convention (loaded skills + per-file errors), letting operators inspect a portfolio gather where some catalog files are good and others malformed. Hardened: load_all(self) -> Result[CatalogLoadOutcome, FatalLoadError] where CatalogLoadOutcome(catalog: Catalog, per_file_errors: list[ConventionsError]) has the same shape as S2-01's LoadOutcome(skills, per_file_errors). A fatal FatalLoadError is reserved for "no search path is readable" / "every catalog file failed and the operator asked for fail-fast" (Phase 2 ships partial-success only; fail-fast deferred to a follow-up). AC-3, AC-3a, AC-3b, AC-13 are written against the partial-success shape; AC-13b pins that one malformed catalog does not erase other catalogs' rules from outcome.catalog.rules.
- NEEDS RESEARCH: none. All findings closeable from arch + ADRs + verified repo state (src/codegenie/probes/base.py, src/codegenie/parsers/safe_yaml.py, src/codegenie/result.py, src/codegenie/types/identifiers.py) + S2-01 hardening precedent. Stage 3 skipped.
A full audit log is at _validation/S2-02-conventions-catalog-loader.md.
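The partial-success shape described in H11 and H12 can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the story's loader: Outcome and load_all_sketch are hypothetical names, plain dataclasses stand in for the Pydantic models, and Path.read_text stands in for safe_yaml.load so the example stays self-contained. An unreadable entry (here a directory named like a catalog file) becomes a per-file error while its siblings still load in sorted order.

```python
import errno
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class CatalogFileUnreadable:
    path: Path
    errno_name: str  # e.g. "ENOENT", "EACCES", "EISDIR"


@dataclass
class Outcome:  # hypothetical stand-in for CatalogLoadOutcome
    rules: list = field(default_factory=list)
    per_file_errors: list = field(default_factory=list)


def load_all_sketch(search_root: Path) -> Outcome:
    outcome = Outcome()
    # Sort by relative path so merge order does not depend on the filesystem.
    paths = sorted(
        search_root.glob("*.yaml"),
        key=lambda p: p.relative_to(search_root).as_posix(),
    )
    for path in paths:
        try:
            # Stand-in for safe_yaml.load(path, ...): one non-empty line = one "rule".
            text = path.read_text(encoding="utf-8")
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, "UNKNOWN")
            outcome.per_file_errors.append(CatalogFileUnreadable(path, name))
            continue  # keep loading the remaining catalog files
        outcome.rules.extend(ln.strip() for ln in text.splitlines() if ln.strip())
    return outcome


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "b.yaml").write_text("from-b\n", encoding="utf-8")
    (root / "a.yaml").write_text("from-a\n", encoding="utf-8")
    (root / "c.yaml").mkdir()  # a directory: reading it raises an OSError
    outcome = load_all_sketch(root)

print(outcome.rules)
print([e.errno_name for e in outcome.per_file_errors])
```

The errno name reported for the directory case depends on the platform, which is one reason to record it as a string field rather than branch on the exception class.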
Context
ConventionsCatalogLoader is the kernel-side loader for the org conventions catalog — YAML files at ~/.codegenie/conventions/*.yaml whose entries declare structural rules to check against a repo (e.g., "Dockerfile must use a Chainguard distroless base", "tsconfig.json must exist and set strict: true"). Conventions are organizational uniqueness expressed as data, not as prompts — the Planner queries a typed result list rather than re-discovering the org's policy at decision time (commitment "Organizational uniqueness as data, not prompts" in repo CLAUDE.md). Phase 2 ships the loader skeleton + four pattern-type variants; OPA/Rego is a Phase 16 concern (ADR-0021), and policy authoring tooling is out of scope.
The two load-bearing commitments are:
- Pattern types as a Pydantic discriminated union, exhaustively matched. The four variants (dockerfile_pattern, dockerfile_pattern_inverted, file_pattern, missing_file) are a closed match/case switch with assert_never on the unreachable branch. mypy --warn-unreachable is a per-module ratchet (codegenie.conventions/**) that turns a fifth-pattern-without-a-match-arm into a build error. Adding a dockerfile_pattern_glob variant (Phase 3+) is a new file + ADR-amend; never a string-keyed dispatch dict.
- ConventionResult = Pass | Fail | NotApplicable as a Pydantic discriminated union. NotApplicable is the load-bearing third value: without it, "rule did not run because the file isn't present" gets fused with "rule passed", and the Confidence section (Phase 2 Step 8) would silently green-flag an absent input. ADR-0033 §3: make-illegal-states-unrepresentable; "rule didn't apply" is a legal state, distinct from pass and fail.
YAML access routes exclusively through Phase 1's codegenie.parsers.safe_yaml.load chokepoint (final-design §"Conflict-resolution" row 9 — same Rule 7 discipline that S2-01 enforces). The catalog file is a single YAML document; multi-doc load_all is not required (rules within one catalog file are a list under a top-level key, not separate documents).
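One way to keep that chokepoint honest is an AST scan for direct yaml imports, which also catches aliased imports that a text search would miss. The helper below is a hedged sketch: find_yaml_imports is a hypothetical name, and it inspects a source string rather than walking src/codegenie/conventions/ as the real test would.

```python
import ast


def find_yaml_imports(source: str) -> list[str]:
    """Describe every absolute import of the yaml package found in source."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "yaml" or alias.name.startswith("yaml."):
                    hits.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            # level == 0 means an absolute import, not "from . import x".
            if node.level == 0 and (module == "yaml" or module.startswith("yaml.")):
                hits.append(f"line {node.lineno}: from {module} import ...")
    return hits


ALLOWED = "from codegenie.parsers import safe_yaml\n"
ALIASED = "from yaml import safe_load as _y\n"

print(find_yaml_imports(ALLOWED))
print(find_yaml_imports(ALIASED))
```

Because the scan matches on the imported module name, renaming the symbol with `as` does not hide it, which is the alias-resistance property a grep for `yaml.` lacks.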
References: where to look
- Architecture:
  - ../phase-arch-design.md §"Component design" #10: interface, four pattern types, one match per type with assert_never, Result.Err(ConventionsError(reason="unknown_pattern_type")) failure mode.
  - ../phase-arch-design.md §"Data model": ConventionResult = Pass | Fail | NotApplicable; extra="forbid" Pydantic discipline; sub-schemas with additionalProperties: false.
  - ../phase-arch-design.md §"Design patterns applied" row 5 (sum type / make-illegal-states-unrepresentable) and row 8 (one file per Layer G scanner, Rule of Three): informs Catalog vs. ScannerRunner shape.
  - ../phase-arch-design.md §"Anti-patterns avoided": "Side effects in constructors" applies to ConventionsCatalogLoader.__init__.
- Phase ADRs:
  - ../ADRs/0007-no-plugin-loader-in-phase-2.md: ConventionsCatalogLoader is kernel-side; no plugin.yaml.
- Production ADRs:
  - ../../production/adrs/0033-domain-modeling-discipline.md §3–4: discriminated unions for pattern types + result types; mypy --warn-unreachable enforces exhaustiveness.
- Source design:
  - ../final-design.md §"Components" #10: ConventionsCatalogLoader interface; OPA/Rego deferred to Phase 16; Catalog.apply pure-function shape.
  - ../final-design.md §"Departures from all three inputs" §8: Rule 7 (safe_yaml reused, no parallel loader).
  - ../final-design.md §"Conflict-resolution table" row 9: safe_yaml chokepoint discipline.
- Existing code:
  - src/codegenie/parsers/safe_yaml.py (Phase 1 S1-03): load(path, *, max_bytes, max_depth=64) -> Mapping[str, JSONValue]. Only YAML loader; do not add a parallel.
  - src/codegenie/result.py (lifted in S1-04): Result[T, E] sum type.
  - src/codegenie/tccm/loader.py (S1-04): established Result-returning, safe_yaml-routing loader shape; this story mirrors it.
  - src/codegenie/coordinator/validator.py (Phase 0): JSONValue recursive alias (single source of truth).
- External docs:
  - localv2.md §5.4 (Layer D probes): ConventionsProbe (S6-02) is the Phase 2 consumer; it applies the catalog to the repo snapshot and emits per-convention Pass | Fail | NotApplicable slices.
  - localv2.md §5.5 (Layer E probes): Ownership + ServiceTopologyStub + SloStub (S6-05) also depend on this loader for marker-driven evidence.
Goal
Ship src/codegenie/conventions/ (four files: __init__.py, model.py, loader.py, catalog.py) plus a single-symbol additive extension to src/codegenie/types/identifiers.py (adds ConventionId).
# codegenie/types/identifiers.py: additive extension (B2)
from typing import NewType

ConventionId = NewType("ConventionId", str)
# __all__ extended with "ConventionId"; existing newtypes unchanged.
# codegenie/conventions/model.py
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field

from codegenie.types.identifiers import ConventionId

class ConventionRuleDockerfilePattern(BaseModel):  # frozen=True, extra="forbid"
    kind: Literal["dockerfile_pattern"] = "dockerfile_pattern"
    id: ConventionId
    description: str
    pattern: str  # regex source; compiled by model_validator (H3 / DP1)

class ConventionRuleDockerfilePatternInverted(BaseModel):
    kind: Literal["dockerfile_pattern_inverted"] = "dockerfile_pattern_inverted"
    id: ConventionId
    description: str
    pattern: str  # must NOT match; compiled at load

class ConventionRuleFilePattern(BaseModel):
    kind: Literal["file_pattern"] = "file_pattern"
    id: ConventionId
    description: str
    file_glob: str
    pattern: str

class ConventionRuleMissingFile(BaseModel):
    kind: Literal["missing_file"] = "missing_file"
    id: ConventionId
    description: str
    file_glob: str  # presence-as-assertion

ConventionRule = Annotated[
    Union[ConventionRuleDockerfilePattern, ConventionRuleDockerfilePatternInverted,
          ConventionRuleFilePattern, ConventionRuleMissingFile],
    Field(discriminator="kind"),
]

class Pass(BaseModel):  # frozen=True, extra="forbid"
    kind: Literal["pass"] = "pass"
    rule_id: ConventionId

class Fail(BaseModel):
    kind: Literal["fail"] = "fail"
    rule_id: ConventionId
    evidence: str

class NotApplicable(BaseModel):
    kind: Literal["not_applicable"] = "not_applicable"
    rule_id: ConventionId
    reason: str

ConventionResult = Annotated[Union[Pass, Fail, NotApplicable], Field(discriminator="kind")]
# codegenie/conventions/loader.py: ConventionsError discriminated union (B6, seven reasons)
class UnknownPatternType(BaseModel):  # frozen=True, extra="forbid"
    reason: Literal["unknown_pattern_type"] = "unknown_pattern_type"
    path: Path
    offending_kind: str

class SchemaError(BaseModel):
    reason: Literal["schema"] = "schema"
    path: Path
    details: list[dict[str, JSONValue]]  # Pydantic ValidationError.errors() shape

class SymlinkRefused(BaseModel):
    reason: Literal["symlink_refused"] = "symlink_refused"
    path: Path

class UnsafeYaml(BaseModel):  # umbrella for every MalformedYAMLError cause (B6, S2-01 convention)
    reason: Literal["unsafe_yaml"] = "unsafe_yaml"
    path: Path

class SizeCapExceeded(BaseModel):
    reason: Literal["size_cap_exceeded"] = "size_cap_exceeded"
    path: Path

class DepthCapExceeded(BaseModel):
    reason: Literal["depth_cap_exceeded"] = "depth_cap_exceeded"
    path: Path

class CatalogFileUnreadable(BaseModel):
    reason: Literal["catalog_file_unreadable"] = "catalog_file_unreadable"
    path: Path
    errno_name: str  # e.g., "ENOENT", "EACCES", "EISDIR"

ConventionsError = Annotated[
    Union[UnknownPatternType, SchemaError, SymlinkRefused, UnsafeYaml,
          SizeCapExceeded, DepthCapExceeded, CatalogFileUnreadable],
    Field(discriminator="reason"),
]

class CatalogLoadOutcome(BaseModel):  # frozen=True, extra="forbid"
    catalog: Catalog
    per_file_errors: list[ConventionsError]

class FatalLoadError(BaseModel):  # frozen=True; reserved for "no search path readable"
    reason: Literal["no_search_path_readable"] = "no_search_path_readable"
    paths: list[Path]
# codegenie/conventions/catalog.py
class Catalog(BaseModel):  # frozen=True, extra="forbid"
    rules: list[ConventionRule]

    def apply(self, repo: RepoSnapshot) -> list[ConventionResult]: ...

# Module-level (NOT methods on Catalog; see B5):
def _apply_one(rule: ConventionRule, repo: RepoSnapshot) -> ConventionResult: ...
def _apply_dockerfile_pattern(rule, repo) -> ConventionResult: ...
def _apply_dockerfile_pattern_inverted(rule, repo) -> ConventionResult: ...
def _apply_file_pattern(rule, repo) -> ConventionResult: ...
def _apply_missing_file(rule, repo) -> ConventionResult: ...

# codegenie/conventions/loader.py: the loader class itself
class ConventionsCatalogLoader:
    def __init__(self, search_paths: list[Path]) -> None: ...  # pure data; no I/O
    def load_all(self) -> Result[CatalogLoadOutcome, FatalLoadError]: ...
With invariants:
- Constructor is pure data. __init__ stores self._search_paths = list(search_paths); no I/O. First I/O is load_all().
- load_all() uses safe_yaml.load exclusively. No yaml.* imports anywhere in src/codegenie/conventions/, enforced by an AST source-scan in tests/unit/conventions/test_no_direct_yaml_import.py (H9), not a ripgrep (alias-resistant).
- Unknown kind → per_file_errors entry UnknownPatternType(path=path, offending_kind=...). Caught by Pydantic's Field(discriminator="kind") ValidationError; the loader's _classify_validation_error helper distinguishes discriminator-tag failures from schema-shape failures by inspecting errors()[i]["type"] ("union_tag_invalid" / "literal_error" for tag failures; everything else is SchemaError).
- Catalog.apply(repo) is a single match rule switch with assert_never on the unreachable branch. _apply_one is a module-level function, not a method on Catalog. Each variant has a dedicated _apply_<kind>(rule, repo) helper returning a ConventionResult. The discriminated-union match is the only branch on rule type; no isinstance chains; no string lookup tables; the case _ as unreachable: assert_never(unreachable) arm is the load-bearing exhaustiveness pin. mypy --warn-unreachable per-module on codegenie.conventions.* makes a missing arm a build failure.
- NotApplicable is the load-bearing third value. A dockerfile_pattern rule against a repo with no Dockerfile → NotApplicable(rule_id=rule.id, reason="no_dockerfile_present"), not Pass. A file_pattern rule whose file_glob matches zero files → NotApplicable(rule_id=rule.id, reason="file_glob_no_matches"). Every Pass / Fail / NotApplicable carries rule_id == rule.id exactly, asserted by AC-4 / AC-5 / AC-6 / AC-7 (B3).
- ConventionsError is a Pydantic discriminated union over seven reason literals (unknown_pattern_type, schema, symlink_refused, unsafe_yaml, size_cap_exceeded, depth_cap_exceeded, catalog_file_unreadable), the same shape as SkillsLoadError (S2-01 hardening B6). The unsafe_yaml umbrella covers every MalformedYAMLError cause (!!python/object constructor, syntax ParserError, ScannerError, top-level-non-mapping); it is an operationally-prudent name, and the convention is documented in Notes-for-implementer.
- Catalog.apply is pure given a fixed RepoSnapshot. Two consecutive apply() calls return equal list[ConventionResult]; the second call performs zero os.open / Path.read_text calls on the repo files (the first call may; the snapshot is the I/O boundary). AC-12 pins this with a counter-monkeypatch.
- Pattern regexes compile at load time, not at apply time. A Pydantic model_validator(mode="after") on ConventionRuleDockerfilePattern / ConventionRuleDockerfilePatternInverted / ConventionRuleFilePattern calls re.compile(self.pattern); failure → ValidationError → loader wraps as SchemaError. The compiled regex is stashed on the model via _compiled_pattern: re.Pattern[str] (private; model_config = ConfigDict(arbitrary_types_allowed=True)) so _apply_* helpers do not re-compile per call. AC-11a pins the load-time failure; the compiled-once invariant is documented in Notes (not promoted to AC because the per-apply timing is an implementation detail).
- Dockerfile pattern matching uses re.MULTILINE. re.search(rule.pattern, contents, flags=re.MULTILINE) so ^ anchors match line starts inside multi-line Dockerfiles. AC-4d is the mutation killer.
- file_glob uses pathlib.Path(repo.root).glob(rule.file_glob). Recursive ** is honored by pathlib.Path.glob natively. Dot-leading path components are excluded; pathlib.Path.glob matches them by default (unlike glob.glob), so the helper filters them explicitly. AC-6c pins library + recursion + dot-exclusion semantics. AC-6d pins sorted(...) ordering so multi-file Fail evidence is deterministic across xfs/ext4/APFS.
- RepoSnapshot is read at repo.root, not through a new method. Phase 0 ADR-0007 freezes RepoSnapshot; the four _apply_* helpers compute repo.root / relpath and call pathlib.Path operations directly (is_file(), read_text(encoding="utf-8", errors="replace")). A capped-text helper local to codegenie.conventions._io (read_capped_text(path, *, max_bytes=1 << 20) -> str | None) is the safe reader; it returns None when the file is absent. This avoids any Phase-0 ADR-0007 amendment. (B1.)
- load_all() returns Result[CatalogLoadOutcome, FatalLoadError]: partial success. One malformed catalog file yields a per_file_errors entry; other catalog files still load (B6, H12). FatalLoadError(reason="no_search_path_readable", paths=...) is reserved for the catastrophic case where every entry in search_paths is unreadable (os.access(path, os.R_OK) == False for all). Empty search_paths → Result.Ok(CatalogLoadOutcome(catalog=Catalog(rules=[]), per_file_errors=[])).
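The file_glob invariant combines three things: pathlib recursion, dot-component exclusion, and sorted order. The sketch below shows one way they could fit together; visible_matches is a hypothetical helper, not the story's _apply_file_pattern. It relies on the fact that pathlib.Path.glob, unlike glob.glob, does not skip dot-prefixed names, so the exclusion is written as an explicit filter.

```python
import tempfile
from pathlib import Path


def visible_matches(root: Path, file_glob: str) -> list[Path]:
    """Sorted, root-relative glob hits with no dot-leading path component."""
    hits = []
    for path in root.glob(file_glob):
        rel = path.relative_to(root)
        if any(part.startswith(".") for part in rel.parts):
            continue  # skip e.g. .hidden/foo.json
        hits.append(rel)
    # Sort on the POSIX form so the order is the same on every filesystem.
    return sorted(hits, key=lambda p: p.as_posix())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("x/y/foo.json", ".hidden/foo.json", "a/foo.json"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("{}", encoding="utf-8")

    # Unfiltered pathlib result, for comparison with the filtered one.
    raw = sorted(p.relative_to(root).as_posix() for p in root.glob("**/foo.json"))
    filtered = [p.as_posix() for p in visible_matches(root, "**/foo.json")]

print(raw)
print(filtered)
```

Iterating the filtered list in order is what makes "the first failing file" a stable notion for Fail evidence.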
Acceptance criteria¶
- [ ] AC-1 — module surface.
src/codegenie/conventions/__init__.pyexports — via exact-set__all__—Catalog,ConventionsCatalogLoader,ConventionRule,ConventionResult,Pass,Fail,NotApplicable,ConventionsError,CatalogLoadOutcome,FatalLoadError, and each of the fourConventionRule*Pydantic classes and each of the sevenConventionsErrorvariant classes.Pass/Fail/NotApplicableand the fourConventionRule*variants andCatalogandCatalogLoadOutcomeare allfrozen=True, extra="forbid". The two domain discriminated unions (ConventionRule,ConventionResult) useField(discriminator="kind");ConventionsErrorusesField(discriminator="reason"). - [ ] AC-1a —
ConventionIdlives incodegenie.types.identifiers.from codegenie.types.identifiers import ConventionIdresolves;ConventionId.__module__ == "codegenie.types.identifiers". AST source-scan intests/unit/conventions/test_no_local_convention_id.pyfinds zeroNewType("ConventionId", ...)calls undersrc/codegenie/conventions/(B2 — single canonical newtype home). - [ ] AC-2 — pure-data constructor (no I/O).
ConventionsCatalogLoader(search_paths=[non_existent_path, another_missing_path])does not raise and does not call any I/O primitive. Asserted bymonkeypatch.setattrover all ofos.listdir,os.scandir,os.open,os.stat,pathlib.Path.exists,pathlib.Path.is_dir,pathlib.Path.glob,pathlib.Path.iterdir, each replaced bylambda *a, **kw: pytest.fail("constructor performed I/O")(strengthens againstPath.glob/Path.iterdirsmuggling; matches S2-01 AC-2). - [ ] AC-3 — happy path per pattern type. Parametrized over the four
kindliterals (dockerfile_pattern,dockerfile_pattern_inverted,file_pattern,missing_file): each test loads a one-rule catalog YAML and asserts the loadedCatalog.rules[0]is the matchingConventionRule*class with every field fully populated —kindliteral,id == ConventionId("<expected>"),description == "<expected literal>",pattern/file_globexactly as authored. Replaces the kind-only assertion (H1 — catches a partial-deserialization mutation that losesdescriptionorid). - [ ] AC-3a — multi-rule single-file catalog round-trips in order. A two-rule catalog YAML with
kind: dockerfile_patternfirst andkind: missing_filesecond yieldslen(outcome.catalog.rules) == 2,outcome.catalog.rules[0].kind == "dockerfile_pattern",outcome.catalog.rules[1].kind == "missing_file". Pins YAML-list order preservation. - [ ] AC-3b — multi-file catalog merge is sorted-relative-path order. Two YAML files in the same
search_paths[0]directory —b.yamlcontaining one rule with idfrom-b,a.yamlcontaining one rule with idfrom-a. Afterload_all(),outcome.catalog.rules[0].id == ConventionId("from-a")andoutcome.catalog.rules[1].id == ConventionId("from-b")(lexicographic sort bypath.relative_to(search_root)— deterministic across xfs/ext4/APFS). - [ ] AC-4 —
dockerfile_patternPass/Fail/NotApplicable(rule_id assertion-strict). Three sub-tests with one rule ofid: distroless-base: - Repo with
Dockerfilecontaining matching pattern →Pass(rule_id=ConventionId("distroless-base"))—result.rule_idassertion is exact. - Repo with
Dockerfilenot matching →Fail(rule_id=ConventionId("distroless-base"), evidence=...). Theevidencestring contains either the offending line or the literal"pattern not found in Dockerfile";evidence != "". - Repo with no
Dockerfile→NotApplicable(rule_id=ConventionId("distroless-base"), reason="no_dockerfile_present"). Reason literal pinned exactly. - [ ] AC-4d —
re.MULTILINEsemantics (^matches line starts). Fixture: rulepattern: '^FROM cgr\.dev/chainguard/'against aDockerfilewhose first line is a# commentand whose second line isFROM cgr.dev/chainguard/node:latest— must returnPass. An implementation that omitsre.MULTILINEfromre.searchreturnsFail(H4 mutation killer). - [ ] AC-5 —
dockerfile_pattern_invertedthree outcomes (rule_id assertion-strict). Symmetric to AC-4 with one rule ofid: no-root-user: matching pattern →Fail(rule_idexact); absent pattern →Pass(rule_idexact); noDockerfile→NotApplicable(rule_id=..., reason="no_dockerfile_present"). Mutation killer for the off-by-negationPass↔Failflip. - [ ] AC-5a —
_apply_dockerfile_pattern_inverteddoes not delegate to_apply_dockerfile_pattern. AST source-scan intests/unit/conventions/test_inverted_helper_is_independent.pyparsessrc/codegenie/conventions/catalog.py, locates the_apply_dockerfile_pattern_invertedfunction node, and asserts noCallwhosefunc.id == "_apply_dockerfile_pattern"exists in its body. Mutation killer for the "just invert the Pass" anti-pattern (H7; Notes-for-implementer §"Don't share machinery"). - [ ] AC-6 —
file_patternover afile_globof zero matches →NotApplicable. Ruleid: tsconfig-strict,file_glob: "**/tsconfig.json", against a repo with notsconfig.jsonanywhere →NotApplicable(rule_id=ConventionId("tsconfig-strict"), reason="file_glob_no_matches")(exactreasonliteral; exactrule_id). Distinct fromPass. Pins the "empty match-set conflated with passing" bug. - [ ] AC-6a —
file_patternPasswhen every matched file matches the pattern. Rulepattern: '"strict"\s*:\s*true',file_glob: "**/tsconfig.json", against a repo with twotsconfig.jsonfiles both containing"strict": true→Pass(rule_id=...). - [ ] AC-6b —
file_patternFailnames the lexicographically-first failing file (deterministic order). Repo witha/tsconfig.json(passes) +b/tsconfig.json(fails) +z/tsconfig.json(fails). TheFail.evidencestring contains"b/tsconfig.json"(the lexicographically-first failing path relative torepo.root), not"z/tsconfig.json"— pinssorted(...)overPath.globresults before iterating (H6). - [ ] AC-6c —
pathlib.Path.globlibrary + recursive**+ dot-component exclusion. Rulefile_glob: "**/foo.json"against a repo containingrepo/x/y/foo.jsonandrepo/.hidden/foo.json(a dot-leading subdirectory). The hidden file is excluded (pathlibglobdefault behavior); the visible file participates. Pins library choice (pathlib.Path.glob, notglob.glob, notPath.rglob). - [ ] AC-7 —
missing_filesemantics (rule_id assertion-strict). Ruleid: no-rogue-dockerfile,file_glob: "Dockerfile": - Against a repo without
Dockerfile→Pass(rule_id=ConventionId("no-rogue-dockerfile")). The rule succeeds when the file is absent — the kind is named for the assertion, not the observed outcome. - Against a repo with
Dockerfile→Fail(rule_id=..., evidence=...).evidencecontains the literal substring"Dockerfile"(the offending file's relative path). - [ ] AC-8 — unknown pattern type →
`per_file_errors` entry, other rules unaffected. Catalog YAML with one rule `kind: dockerfile_pattern_glob` (not in the enumerated four) + one well-formed `kind: missing_file` rule in a sibling YAML file. After `load_all()`: `outcome.per_file_errors` contains exactly one entry, an `UnknownPatternType(path=<bad-catalog-path>, offending_kind="dockerfile_pattern_glob")`; `outcome.catalog.rules` contains the well-formed `missing_file` rule (`len(...) == 1`). Asserts the partial-success contract (B6/H12).
- [ ] AC-8a: `unsafe_yaml` umbrella covers the `MalformedYAMLError` family (operationally-prudent name). Two sub-tests, both produce `per_file_errors=[UnsafeYaml(path=...)]`:
    - Catalog YAML containing `!!python/object/apply:os.system ['touch {sentinel}']`: the sentinel file does not exist after `load_all()`, and `not isinstance(per_file_errors[0], (SchemaError, SizeCapExceeded))`.
    - Catalog YAML with a syntactic typo (`rules: [`, an unterminated sequence): same `UnsafeYaml` bucket. Operator reads `unsafe_yaml` and inspects (B6, S2-01 convention).
- [ ] AC-8b: `size_cap_exceeded` on > 1 MiB catalog. Catalog YAML padded to 1.1 MiB → `per_file_errors=[SizeCapExceeded(path=...)]`. The `safe_yaml.load(catalog_path, max_bytes=1 << 20)` call boundary is what fires. Other catalog files in the same `search_paths[0]` still load.
- [ ] AC-8c: `depth_cap_exceeded` on deeply-nested catalog. Catalog YAML with `rules: [{x: {y: {z: ...}}}]` nesting > 64 levels → `per_file_errors=[DepthCapExceeded(path=...)]`. Inherits `safe_yaml.load`'s `max_depth=64` default.
- [ ] AC-9: exhaustive
`match` with `assert_never` (compile-time + runtime). Two halves:
    - Compile-time half (`tests/unit/conventions/test_apply_match_is_exhaustive_compile_time.py`): a fixture script imports `codegenie.conventions.catalog._apply_one` and pattern-matches; `mypy --warn-unreachable` over that script must be clean. A complementary `tests/fixtures/mypy_unreachable_negative_should_fail/` (run under a separate `mypy` invocation that's expected to fail) proves removing a `case` arm causes a build failure.
    - Runtime half (`test_apply_match_smoke_asserts_assert_never_only`): a hand-constructed `_Imposter` object with `kind="not_a_real_kind"` is passed to module-level `_apply_one`; `pytest.raises(AssertionError)` only (NOT `TypeError`/`ValueError`; B4 / TQ1). The `assert_never` is the load-bearing signal; an `isinstance`-whitelist `raise TypeError("unknown kind")` is the anti-pattern this test catches.
- [ ] AC-9a: `Pass`/`Fail`/`NotApplicable` field sets are exactly minimal (illegal-states-unrepresentable). `Pass(rule_id=ConventionId("x")).model_dump() == {"kind": "pass", "rule_id": "x"}` (no `evidence`, no `reason`). `Fail(rule_id=ConventionId("x"), evidence="y").model_dump() == {"kind": "fail", "rule_id": "x", "evidence": "y"}`. `NotApplicable(rule_id=ConventionId("x"), reason="y").model_dump() == {"kind": "not_applicable", "rule_id": "x", "reason": "y"}`. Adding an extra field on construction (e.g., `Pass(rule_id=..., evidence="leak")`) raises `ValidationError` (`frozen` + `extra=forbid`). Pins ADR-0033 §4.
- [ ] AC-10:
`safe_yaml.load` chokepoint via AST source-scan (alias-resistant). `tests/unit/conventions/test_no_direct_yaml_import.py` parses every `.py` file under `src/codegenie/conventions/` with `ast.parse`, walks `Import`/`ImportFrom` nodes, and asserts:
    - No `import yaml` or `from yaml import ...` statement (alias-resistant: catches `from yaml import safe_load as _y`).
    - No identifier whose `Attribute.value.id == "yaml"` (e.g., `yaml.safe_load(...)`; catches `import yaml` followed by usage).
    - Companion runtime test `test_catalog_loader_routes_yaml_through_safe_yaml_chokepoint` uses `monkeypatch.setattr(codegenie.parsers.safe_yaml, "load", spy)` and asserts `spy` was called once per catalog file. Replaces the original `ripgrep` AC (H9; S2-01 AC-24 precedent).
- [ ] AC-11: sub-schemas with `additionalProperties: false` (`extra="forbid"`). A catalog YAML with an unknown field (`unexpected_key: value`) on a rule entry produces `per_file_errors=[SchemaError(path=..., details=[...])]` with `details` a non-empty `list[dict]` containing at least one row whose `loc` references the offending field. Pins against the "silently ignore unknown keys" anti-pattern. Other rules in other catalog files still load.
- [ ] AC-11a: uncompilable regex `pattern` → `SchemaError` at load (not at apply). Catalog YAML with one rule `kind: dockerfile_pattern`, `pattern: "[unterminated"` (`re.error: unterminated character set`) → `per_file_errors=[SchemaError(path=..., details=[...])]` with at least one details row whose `loc` ends in `"pattern"`. Compilation happens via a Pydantic `model_validator(mode="after")`; `Catalog.apply` never sees an uncompilable regex (DP1; H3).
- [ ] AC-12: `Catalog.apply(repo)` is pure (idempotent + no repo I/O on repeated calls). Given the same `RepoSnapshot`, two consecutive `apply()` calls return equal `list[ConventionResult]` (`first == second`). The second call performs zero `pathlib.Path.read_text` and zero `pathlib.Path.open` calls over the repo files (asserted via `monkeypatch` counters wrapped around `pathlib.Path.read_text` / `pathlib.Path.open`). The first call may read repo files (e.g., the `Dockerfile`); the snapshot is the I/O boundary. The hardened story does NOT add a `read_text` method to `RepoSnapshot` (B1); counters wrap the underlying `pathlib.Path` methods.
- [ ] AC-13:
`ConventionsError` discriminated union, seven reasons enumerated (eighth-and-after raises). Test parametrizes over `{"unknown_pattern_type", "schema", "symlink_refused", "unsafe_yaml", "size_cap_exceeded", "depth_cap_exceeded", "catalog_file_unreadable"}` and asserts each constructs successfully via the discriminator with the documented field set; an eighth reason (`reason="bogus"`) raises `ValidationError`. JSON-shape pin: `SymlinkRefused(path=Path("/x")).model_dump() == {"reason": "symlink_refused", "path": "/x"}` (cross-version mutation catcher; S2-01 AC-10 precedent).
- [ ] AC-13a: TOCTOU on catalog disappearance → `CatalogFileUnreadable` (other rules unaffected). Fixture: two catalog files; between `Path.glob("*.yaml")` enumeration and `safe_yaml.load`, the first file is deleted (`monkeypatch.setattr(safe_yaml, "load", _raise_filenotfound_for_first_then_real)`). `outcome.per_file_errors == [CatalogFileUnreadable(path=<missing>, errno_name="ENOENT")]`; the second catalog's rules are present in `outcome.catalog.rules`.
- [ ] AC-13b: partial-success contract under mixed-quality catalogs. Three catalog files in `search_paths[0]`: one well-formed, one with `unknown_pattern_type`, one with `unsafe_yaml`. `len(outcome.catalog.rules) >= 1` (the well-formed catalog's rules persist); `len(outcome.per_file_errors) == 2` with one `UnknownPatternType` and one `UnsafeYaml`. Erasure of well-formed rules due to a sibling-catalog failure would be a regression (H12).
- [ ] AC-13c: empty `search_paths` returns `Result.Ok(empty)`. `ConventionsCatalogLoader(search_paths=[]).load_all() == Result.Ok(CatalogLoadOutcome(catalog=Catalog(rules=[]), per_file_errors=[]))`. The constructor with an empty list does not crash and `load_all` produces a valid empty outcome.
- [ ] AC-13d: fatal `no_search_path_readable` when every search path is unreadable. `monkeypatch.setattr(os, "access", lambda *a, **kw: False)` + non-empty `search_paths` → `Result.Err(FatalLoadError(reason="no_search_path_readable", paths=<input search_paths>))`. This is the single fatal shape; everything else is partial-success.
- [ ] AC-14: `model_construct` AST source-scan ban. `tests/unit/conventions/test_no_model_construct.py` parses every `.py` under `src/codegenie/conventions/`, walks `Attribute`/`Call` nodes, and asserts no expression of the form `<X>.model_construct(...)`. Complementary `forbidden-patterns` pre-commit hook extension scans the same paths (defense in depth; matches H10 alias-resistance).
- [ ] AC-15: toolchain. `ruff check`, `ruff format --check`, `mypy --strict`, `mypy --warn-unreachable` (per-module override on `codegenie.conventions.*`) all clean. `pytest tests/unit/conventions/` passes.
- [ ] AC-16: TDD discipline. Red tests committed failing; green commit makes them pass; refactor commit is no-op behavior.
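AC-6b and AC-6c together describe a match set that is deterministic and skips dot-leading components. The enumeration could be sketched with the standard library alone as follows. The helper name `visible_matches` is hypothetical and not part of the story's API, and the explicit component filter is an assumption of this sketch: `pathlib`'s glob, unlike `glob.glob`, does not skip dot-leading directories on its own.

```python
import tempfile
from pathlib import Path


def visible_matches(root: Path, file_glob: str) -> list[str]:
    """Return sorted POSIX relpaths matching file_glob, skipping dot-leading components.

    Hypothetical helper, for illustration only.
    """
    relpaths = []
    for path in root.glob(file_glob):
        rel = path.relative_to(root)
        # pathlib's glob also descends into dot-directories, so filter explicitly.
        if any(part.startswith(".") for part in rel.parts):
            continue
        relpaths.append(rel.as_posix())
    # Sort on the POSIX relpath so the order does not depend on the filesystem.
    return sorted(relpaths)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("z/tsconfig.json", "a/tsconfig.json", ".hidden/tsconfig.json", "b/tsconfig.json"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("{}")
    found = visible_matches(root, "**/tsconfig.json")
```

Iterating `found` in order and stopping at the first failing file is what would make `"b/tsconfig.json"` rather than `"z/tsconfig.json"` appear in `Fail.evidence`.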
Implementation outline
1. `src/codegenie/types/identifiers.py` (additive): append `ConventionId = NewType("ConventionId", str)` and add `"ConventionId"` to `__all__`. No edits to existing newtypes. Open/Closed at the file boundary; ADR-0033 §1; B2 closure.
2. `src/codegenie/conventions/model.py`: `from codegenie.types.identifiers import ConventionId`. Define the four `ConventionRule*` Pydantic models. Each `pattern`-carrying variant gets a `model_validator(mode="after")` that calls `re.compile(self.pattern)` and stashes the result on `_compiled_pattern` (private, `model_config = ConfigDict(arbitrary_types_allowed=True, frozen=True, extra="forbid")`). `re.error` is not a `ValueError` subclass, so the validator catches it and re-raises it as `ValueError` for Pydantic to bundle into `ValidationError`. Define the `ConventionRule` discriminated union with `Field(discriminator="kind")`. Define `Pass` / `Fail` / `NotApplicable` (each `frozen=True, extra="forbid"`, field sets exactly as AC-9a pins) and the `ConventionResult` discriminated union.
3. `src/codegenie/conventions/loader.py`: define the seven `ConventionsError` variants (`UnknownPatternType`, `SchemaError`, `SymlinkRefused`, `UnsafeYaml`, `SizeCapExceeded`, `DepthCapExceeded`, `CatalogFileUnreadable`) as `frozen=True, extra="forbid"` Pydantic models, each with its `Literal[…]` reason discriminator + `path: Path` + per-variant fields. Define the `ConventionsError` `Annotated[Union[...], Field(discriminator="reason")]`. Define `CatalogLoadOutcome(catalog: Catalog, per_file_errors: list[ConventionsError])` and `FatalLoadError(reason: Literal["no_search_path_readable"], paths: list[Path])`. Define `_classify_validation_error(exc: ValidationError, path: Path) -> ConventionsError`, which inspects `exc.errors()` rows:
    - a row whose `type` is `"union_tag_invalid"` or `"literal_error"` with `loc[-1] == "kind"` → `UnknownPatternType(offending_kind=<row.input>)`;
    - a row whose `loc` ends in `"pattern"` with `type` indicating the `model_validator` re-raise → `SchemaError` (regex compile failure surfaces here via Pydantic's wrapping of `ValueError`);
    - everything else → `SchemaError(details=exc.errors())`.

    Define `ConventionsCatalogLoader.__init__(self, search_paths: list[Path]) -> None` storing `self._search_paths = list(search_paths)` (no I/O). Define `ConventionsCatalogLoader.load_all(self) -> Result[CatalogLoadOutcome, FatalLoadError]` as the multi-file partial-success driver (see the `load_all` item below).
4. `_apply_dockerfile_pattern(rule, repo)` (module-level in `catalog.py`): `dockerfile_path = repo.root / "Dockerfile"`. If `not dockerfile_path.is_file()` → `NotApplicable(rule_id=rule.id, reason="no_dockerfile_present")`. Read via `_io.read_capped_text(dockerfile_path, max_bytes=1 << 20)` (returns `None` if absent; defense in depth against TOCTOU between `is_file` and `read_text`). `match = re.search(rule.pattern, contents, flags=re.MULTILINE)`: match → `Pass(rule_id=rule.id)`; no match → `Fail(rule_id=rule.id, evidence="pattern not found in Dockerfile")`. The regex is already compiled by the model_validator; this call uses the compiled regex (`rule._compiled_pattern.search(contents)`), and the `re.search(rule.pattern, ...)` form above is shorthand. Because `Pattern.search` takes no `flags` argument, `re.MULTILINE` has to be supplied to `re.compile` in the validator. AC-4d pins `re.MULTILINE`.
5. `_apply_dockerfile_pattern_inverted(rule, repo)`: independent body (NOT a wrapper over `_apply_dockerfile_pattern`; the AC-5a AST source-scan enforces this). Same locate (`repo.root / "Dockerfile"` + `is_file` + capped read); `re.search(..., flags=re.MULTILINE)`: match → `Fail(rule_id=..., evidence="forbidden pattern present")`; no match → `Pass(rule_id=...)`; absent file → `NotApplicable(rule_id=..., reason="no_dockerfile_present")`. The negation lives in this helper; each helper reads `repo` independently (Rule of Three; arch §"Design patterns applied" row 8).
6. `_apply_file_pattern(rule, repo)`: `matches = sorted(repo.root.glob(rule.file_glob), key=lambda p: p.relative_to(repo.root).as_posix())` (deterministic ordering across filesystems; AC-6b/AC-6c). Zero matches → `NotApplicable(rule_id=rule.id, reason="file_glob_no_matches")`. Iterate matches in order; for each, `contents = _io.read_capped_text(path, max_bytes=1 << 20)` then `re.search(rule.pattern, contents, flags=re.MULTILINE)`. If any fails → return `Fail(rule_id=rule.id, evidence=f"{matches_relpath}: pattern not found")` for the first failing path (`matches_relpath = path.relative_to(repo.root).as_posix()`); if all pass → `Pass(rule_id=rule.id)`. Per-rule emission only (one `ConventionResult` per rule); per-file emission is out of scope.
7. `_apply_missing_file(rule, repo)`: `matches = sorted(repo.root.glob(rule.file_glob), key=lambda p: p.relative_to(repo.root).as_posix())`. Zero matches → `Pass(rule_id=rule.id)` (the assertion is "this file is absent"; the rule succeeds when no file matches). Any match → `Fail(rule_id=rule.id, evidence=f"unexpected file present: {matches[0].relative_to(repo.root).as_posix()}")`. A code comment in the function body documents the inverted naming convention for the next reader.
8. `_apply_one(rule: ConventionRule, repo: RepoSnapshot) -> ConventionResult` (module-level in `catalog.py`): the exhaustive `match`:

    ```python
    def _apply_one(rule: ConventionRule, repo: RepoSnapshot) -> ConventionResult:
        match rule:
            case ConventionRuleDockerfilePattern():
                return _apply_dockerfile_pattern(rule, repo)
            case ConventionRuleDockerfilePatternInverted():
                return _apply_dockerfile_pattern_inverted(rule, repo)
            case ConventionRuleFilePattern():
                return _apply_file_pattern(rule, repo)
            case ConventionRuleMissingFile():
                return _apply_missing_file(rule, repo)
            case _ as unreachable:
                assert_never(unreachable)
    ```

    `Catalog.apply(self, repo)` is a thin wrapper: `return [_apply_one(rule, repo) for rule in self.rules]`. `_apply_one` is a module-level function so tests can call it without instantiating `Catalog` (B5; AC-9 runtime smoke).
9. `ConventionsCatalogLoader.load_all(self)`: multi-file partial-success driver. Pseudocode:

    ```python
    if self._search_paths:
        readable = [p for p in self._search_paths if os.access(p, os.R_OK)]
        if not readable and any(self._search_paths):
            return Result.Err(FatalLoadError(
                reason="no_search_path_readable",
                paths=list(self._search_paths),
            ))
    merged_rules: list[ConventionRule] = []
    per_file_errors: list[ConventionsError] = []
    for search_path in self._search_paths:
        if not search_path.is_dir():
            continue  # missing search path → silent skip; matches S2-01 AC-3a
        catalog_files = sorted(search_path.glob("*.yaml")) + sorted(search_path.glob("*.yml"))
        for catalog_path in catalog_files:
            # The exceptions caught below are the ones safe_yaml raises. In real
            # code they need import names that do not shadow the same-named
            # ConventionsError models (SizeCapExceeded, DepthCapExceeded).
            try:
                data = safe_yaml.load(catalog_path, max_bytes=1 << 20, max_depth=64)
            except SymlinkRefusedError:
                per_file_errors.append(SymlinkRefused(path=catalog_path))
                continue
            except MalformedYAMLError:
                per_file_errors.append(UnsafeYaml(path=catalog_path))
                continue
            except SizeCapExceeded:
                per_file_errors.append(SizeCapExceeded(path=catalog_path))
                continue
            except DepthCapExceeded:
                per_file_errors.append(DepthCapExceeded(path=catalog_path))
                continue
            except OSError as exc:
                per_file_errors.append(CatalogFileUnreadable(
                    path=catalog_path,
                    errno_name=errno.errorcode.get(exc.errno, str(exc.errno)),
                ))
                continue
            try:
                sub_catalog = Catalog.model_validate(data)
            except ValidationError as exc:
                per_file_errors.append(_classify_validation_error(exc, catalog_path))
                continue
            merged_rules.extend(sub_catalog.rules)
    return Result.Ok(CatalogLoadOutcome(
        catalog=Catalog(rules=merged_rules),
        per_file_errors=per_file_errors,
    ))
    ```

    Phase 2 ships without rule-ID deduplication across catalog files (single-file fixtures are the norm); a follow-up ADR can decide whether duplicate IDs across files are an error or a last-wins merge. Within-tier files are processed in `sorted(...)` order (AC-3b).
10. `src/codegenie/conventions/_io.py`: single small helper:

    ```python
    from pathlib import Path


    def read_capped_text(path: Path, *, max_bytes: int) -> str | None:
        """Return decoded text up to ``max_bytes``; None if the file does not exist.

        TOCTOU-safe: handles FileNotFoundError between caller's existence check
        and this read. Files larger than max_bytes are truncated (the offending
        region beyond the cap is not part of any pattern check, which is the
        Phase 2 documented behavior; a 100 MB Dockerfile is non-idiomatic and
        the truncation is observable via byte-counting for an operator).
        """
        try:
            with path.open("rb") as fh:
                return fh.read(max_bytes).decode("utf-8", errors="replace")
        except FileNotFoundError:
            return None
    ```

    This is the only file-read entry point from `_apply_*` helpers. Phase 2 deliberately does NOT add a `read_text` method to `RepoSnapshot` (Phase 0 ADR-0007 contract freeze; B1).
11. `src/codegenie/conventions/__init__.py`: `__all__` re-exports per AC-1. Default factory `ConventionsCatalogLoader.default()` (classmethod) pins `[Path("~/.codegenie/conventions/").expanduser(), Path(".codegenie/conventions/")]` for production callers; tests pass explicit paths.
12. `assert_never` import: `from typing import assert_never` (Python 3.11+). The `mypy --warn-unreachable` per-module override in `pyproject.toml` (added in Step 1) ensures a missing variant in the `match` is a build error.
TDD plan: red / green / refactor
Red: write the failing test first
Test file: tests/unit/conventions/test_catalog.py plus colocated AST-source-scan test files (test_no_direct_yaml_import.py, test_no_model_construct.py, test_inverted_helper_is_independent.py, test_no_local_convention_id.py, test_apply_match_is_exhaustive_compile_time.py). 30+ named tests covering AC-1..AC-16.
```python
# tests/unit/conventions/test_catalog.py: red tests pinning the load-bearing ACs
import os
import textwrap
from pathlib import Path

import pytest

from codegenie.conventions import (
    Catalog, ConventionsCatalogLoader, Fail, NotApplicable, Pass,
)
from codegenie.conventions.model import (
    ConventionRuleDockerfilePattern, ConventionRuleMissingFile,
)
from codegenie.conventions.loader import (
    UnknownPatternType, SchemaError, UnsafeYaml, SymlinkRefused,
    SizeCapExceeded, DepthCapExceeded, CatalogFileUnreadable,
    CatalogLoadOutcome, FatalLoadError,
)
from codegenie.probes.base import RepoSnapshot
from codegenie.types.identifiers import ConventionId


def _write_catalog(p: Path, body: str) -> Path:
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(body)
    return p


def _repo_snapshot_with(tmp_path: Path, files: dict[str, str]) -> RepoSnapshot:
    """Build a Phase-0 RepoSnapshot rooted at tmp_path with the given files.

    Uses the dataclass constructor directly: `RepoSnapshot` is Phase-0
    contract-frozen (ADR-0007). No `build()` factory; no `read_text()` method
    on the snapshot. `_apply_*` helpers compute `repo.root / relpath`.
    """
    for relpath, contents in files.items():
        f = tmp_path / relpath
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_text(contents)
    return RepoSnapshot(
        root=tmp_path, git_commit=None, detected_languages={}, config={}
    )


def test_constructor_is_pure_data(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
    """AC-2: __init__ must perform no I/O across the full primitive set."""
    def _fail(*a, **kw):
        pytest.fail(f"constructor performed I/O: args={a} kwargs={kw}")

    monkeypatch.setattr(os, "listdir", _fail)
    monkeypatch.setattr(os, "scandir", _fail)
    monkeypatch.setattr(os, "open", _fail)
    monkeypatch.setattr(os, "stat", _fail)
    monkeypatch.setattr(Path, "exists", _fail, raising=False)
    monkeypatch.setattr(Path, "is_dir", _fail, raising=False)
    monkeypatch.setattr(Path, "glob", _fail, raising=False)
    monkeypatch.setattr(Path, "iterdir", _fail, raising=False)
    ConventionsCatalogLoader(search_paths=[
        tmp_path / "does-not-exist", tmp_path / "also-missing",
    ])


def test_dockerfile_pattern_pass_fail_not_applicable(tmp_path: Path) -> None:
    """AC-4: three outcomes pinned for dockerfile_pattern."""
    _write_catalog(tmp_path / "conventions" / "c.yaml", textwrap.dedent("""\
        rules:
          - kind: dockerfile_pattern
            id: distroless-base
            description: must use a chainguard distroless base
            pattern: '^FROM cgr\\.dev/chainguard/'
        """))
    loader = ConventionsCatalogLoader(search_paths=[tmp_path / "conventions"])
    outcome = loader.load_all().unwrap()
    catalog = outcome.catalog
    assert outcome.per_file_errors == []
    expected_id = ConventionId("distroless-base")

    # Pass: rule_id assertion-strict (B3)
    repo_pass = _repo_snapshot_with(tmp_path / "pass-repo", {"Dockerfile": "FROM cgr.dev/chainguard/node:latest\n"})
    result_pass = catalog.apply(repo_pass)[0]
    assert isinstance(result_pass, Pass)
    assert result_pass.rule_id == expected_id

    # Fail: rule_id assertion-strict, evidence non-empty
    repo_fail = _repo_snapshot_with(tmp_path / "fail-repo", {"Dockerfile": "FROM node:20-alpine\n"})
    result_fail = catalog.apply(repo_fail)[0]
    assert isinstance(result_fail, Fail)
    assert result_fail.rule_id == expected_id
    assert result_fail.evidence != ""
    assert "pattern" in result_fail.evidence.lower() or "Dockerfile" in result_fail.evidence

    # NotApplicable: no Dockerfile; rule_id + reason exact
    repo_na = _repo_snapshot_with(tmp_path / "na-repo", {"package.json": "{}"})
    result_na = catalog.apply(repo_na)[0]
    assert isinstance(result_na, NotApplicable)
    assert result_na.rule_id == expected_id
    assert result_na.reason == "no_dockerfile_present"


def test_dockerfile_pattern_uses_re_multiline(tmp_path: Path) -> None:
    """AC-4d: ^ anchors must match line starts inside multi-line Dockerfile contents."""
    _write_catalog(tmp_path / "conventions" / "c.yaml", textwrap.dedent("""\
        rules:
          - kind: dockerfile_pattern
            id: distroless-base
            description: chainguard base required
            pattern: '^FROM cgr\\.dev/chainguard/'
        """))
    outcome = ConventionsCatalogLoader(
        search_paths=[tmp_path / "conventions"]
    ).load_all().unwrap()
    # FROM is on the second line; only re.MULTILINE makes ^ match here
    repo = _repo_snapshot_with(tmp_path / "r", {
        "Dockerfile": "# build args first\nFROM cgr.dev/chainguard/node:latest\n",
    })
    assert isinstance(outcome.catalog.apply(repo)[0], Pass)


def test_missing_file_kind_succeeds_when_file_absent(tmp_path: Path) -> None:
    """AC-7: the kind is named for the assertion; rule passes when file is absent."""
    _write_catalog(tmp_path / "conventions" / "c.yaml", textwrap.dedent("""\
        rules:
          - kind: missing_file
            id: no-rogue-dockerfile
            description: this repo must not ship its own Dockerfile
            file_glob: Dockerfile
        """))
    outcome = ConventionsCatalogLoader(
        search_paths=[tmp_path / "conventions"]
    ).load_all().unwrap()
    catalog = outcome.catalog
    expected_id = ConventionId("no-rogue-dockerfile")

    repo_clean = _repo_snapshot_with(tmp_path / "clean", {"package.json": "{}"})
    result_clean = catalog.apply(repo_clean)[0]
    assert isinstance(result_clean, Pass)
    assert result_clean.rule_id == expected_id

    repo_dirty = _repo_snapshot_with(tmp_path / "dirty", {"Dockerfile": "FROM scratch\n"})
    result_dirty = catalog.apply(repo_dirty)[0]
    assert isinstance(result_dirty, Fail)
    assert result_dirty.rule_id == expected_id
    assert "Dockerfile" in result_dirty.evidence


def test_file_pattern_zero_matches_is_not_applicable(tmp_path: Path) -> None:
    """AC-6: empty glob match-set MUST NOT be conflated with Pass."""
    _write_catalog(tmp_path / "conventions" / "c.yaml", textwrap.dedent("""\
        rules:
          - kind: file_pattern
            id: tsconfig-strict
            description: all tsconfig files must enable strict mode
            file_glob: "**/tsconfig.json"
            pattern: '"strict"\\s*:\\s*true'
        """))
    outcome = ConventionsCatalogLoader(
        search_paths=[tmp_path / "conventions"]
    ).load_all().unwrap()
    expected_id = ConventionId("tsconfig-strict")
    repo_no_ts = _repo_snapshot_with(tmp_path / "no-ts", {"package.json": "{}"})
    result = outcome.catalog.apply(repo_no_ts)[0]
    assert isinstance(result, NotApplicable)
    assert result.rule_id == expected_id
    assert result.reason == "file_glob_no_matches"


def test_unknown_pattern_kind_returns_typed_result_err(tmp_path: Path) -> None:
    """AC-8: an unknown discriminator kind yields a typed UnknownPatternType in per_file_errors."""
    catalog_path = _write_catalog(tmp_path / "conventions" / "c.yaml", textwrap.dedent("""\
        rules:
          - kind: dockerfile_pattern_glob  # not in the enumerated four
            id: x
            description: y
            pattern: ".*"
        """))
    outcome = ConventionsCatalogLoader(
        search_paths=[tmp_path / "conventions"]
    ).load_all().unwrap()
    # Partial-success: per_file_errors carries the typed UnknownPatternType,
    # the well-formed-rules section is empty (only one bad rule was in this file).
    assert outcome.catalog.rules == []
    assert len(outcome.per_file_errors) == 1
    err = outcome.per_file_errors[0]
    assert isinstance(err, UnknownPatternType)
    assert err.offending_kind == "dockerfile_pattern_glob"
    assert err.path == catalog_path


def test_schema_violation_extra_field_returns_typed_result_err(tmp_path: Path) -> None:
    """AC-11: extra='forbid' is the load-bearing discipline; unknown keys MUST surface as SchemaError."""
    _write_catalog(tmp_path / "conventions" / "c.yaml", textwrap.dedent("""\
        rules:
          - kind: dockerfile_pattern
            id: x
            description: y
            pattern: ".*"
            unexpected_key: value
        """))
    outcome = ConventionsCatalogLoader(
        search_paths=[tmp_path / "conventions"]
    ).load_all().unwrap()
    assert outcome.catalog.rules == []
    assert len(outcome.per_file_errors) == 1
    err = outcome.per_file_errors[0]
    assert isinstance(err, SchemaError)
    # Pydantic ValidationError.errors() shape: details non-empty
    assert err.details
    assert any("unexpected_key" in str(row) for row in err.details)


def test_safe_yaml_chokepoint_is_the_only_yaml_call_site(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
    """AC-10: every YAML access routes through codegenie.parsers.safe_yaml.load."""
    from codegenie.parsers import safe_yaml as sy

    real_load = sy.load
    calls: list[Path] = []

    def spy(path, *, max_bytes, max_depth=64):
        calls.append(path)
        return real_load(path, max_bytes=max_bytes, max_depth=max_depth)

    monkeypatch.setattr(sy, "load", spy)
    _write_catalog(tmp_path / "conventions" / "c.yaml", "rules: []\n")
    ConventionsCatalogLoader(search_paths=[tmp_path / "conventions"]).load_all().unwrap()
    assert calls, "safe_yaml.load was not called; chokepoint bypassed"


def test_apply_match_smoke_asserts_assert_never_only() -> None:
    """AC-9 runtime half: the match must end in assert_never (NOT a defensive
    `raise TypeError("unknown kind")`). assert_never raises AssertionError.

    Allowing TypeError/ValueError would let the isinstance-whitelist anti-pattern
    pass. The load-bearing signal is compile-time exhaustiveness, not a
    runtime defensive check (ADR-0033 §4; B4 / TQ1).
    """
    from codegenie.conventions.catalog import _apply_one

    class _Imposter:
        kind = "not_a_real_kind"

    with pytest.raises(AssertionError):
        _apply_one(_Imposter(), repo=None)  # type: ignore[arg-type]


def test_apply_is_idempotent_without_repeated_io(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
    """AC-12: second apply() call reads no files from disk (snapshot is the I/O boundary)."""
    _write_catalog(tmp_path / "conventions" / "c.yaml", textwrap.dedent("""\
        rules:
          - kind: dockerfile_pattern
            id: x
            description: y
            pattern: "FROM"
        """))
    outcome = ConventionsCatalogLoader(
        search_paths=[tmp_path / "conventions"]
    ).load_all().unwrap()
    catalog = outcome.catalog
    repo = _repo_snapshot_with(tmp_path / "r", {"Dockerfile": "FROM node\n"})
    first = catalog.apply(repo)

    # Wrap the underlying pathlib I/O; RepoSnapshot has no read_text method
    # (B1; Phase 0 ADR-0007 contract freeze).
    opens: list[Path] = []
    real_open = Path.open

    def _spy_open(self, *a, **kw):
        # Only count reads against repo files, not catalog files (which would
        # not be re-read here anyway since load_all is done).
        if self.is_relative_to(repo.root):
            opens.append(self)
        return real_open(self, *a, **kw)

    monkeypatch.setattr(Path, "open", _spy_open, raising=False)
    second = catalog.apply(repo)
    assert first == second
    assert opens == [], f"second apply() performed disk I/O on repo: {opens}"
```
Run; confirm every test fails because src/codegenie/conventions/ does not exist. Commit as red.
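The colocated AST source-scan files are named in this plan but not shown. A minimal sketch of the scan that `test_no_direct_yaml_import.py` describes (AC-10) might look like the following; the function name `find_yaml_usages` and the inline source strings are illustrative assumptions, not the shipped test.

```python
import ast


def find_yaml_usages(source: str) -> list[str]:
    """Return descriptions of direct yaml imports or yaml.<attr> accesses in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "yaml" or alias.name.startswith("yaml."):
                    findings.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            # Checking the module name catches aliased imports such as
            # `from yaml import safe_load as _y`.
            if module == "yaml" or module.startswith("yaml."):
                findings.append(f"line {node.lineno}: from {module} import ...")
        elif isinstance(node, ast.Attribute):
            if isinstance(node.value, ast.Name) and node.value.id == "yaml":
                findings.append(f"line {node.lineno}: yaml.{node.attr}")
    return findings


clean = "from codegenie.parsers import safe_yaml\ndata = safe_yaml.load(p, max_bytes=1)\n"
aliased = "from yaml import safe_load as _y\n"
direct = "import yaml\nyaml.safe_load('a: 1')\n"

clean_findings = find_yaml_usages(clean)
aliased_findings = find_yaml_usages(aliased)
direct_findings = find_yaml_usages(direct)
```

In a real test the function would be applied to the text of every `.py` file under `src/codegenie/conventions/` and the combined findings asserted empty.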
Green: make it pass
Land in this order:
1. `src/codegenie/types/identifiers.py` (additive): append `ConventionId = NewType("ConventionId", str)` + extend `__all__`. Zero edits to existing newtypes (B2).
2. `src/codegenie/conventions/_io.py`: `read_capped_text(path, *, max_bytes) -> str | None` helper. Single small function; no public re-export.
3. `src/codegenie/conventions/model.py`: four `ConventionRule*` variants with `model_validator(mode="after")` for regex compilation (`_compiled_pattern` stash), `ConventionRule` discriminator union, `Pass` / `Fail` / `NotApplicable` (exact minimal field sets per AC-9a), `ConventionResult` discriminator union.
4. `src/codegenie/conventions/loader.py`: seven `ConventionsError` variants (`UnknownPatternType`, `SchemaError`, `SymlinkRefused`, `UnsafeYaml`, `SizeCapExceeded`, `DepthCapExceeded`, `CatalogFileUnreadable`), `ConventionsError` discriminator union, `CatalogLoadOutcome`, `FatalLoadError`, `ConventionsCatalogLoader` class with multi-file partial-success driver, `_classify_validation_error` helper over Pydantic `ValidationError.errors()`.
5. `src/codegenie/conventions/catalog.py`: `Catalog` Pydantic model (frozen, extra=forbid), four module-level `_apply_*` helpers, module-level `_apply_one` with `match` + `assert_never`. `Catalog.apply` is the thin list-comprehension wrapper.
6. `src/codegenie/conventions/__init__.py`: `__all__` re-exports per AC-1; `ConventionsCatalogLoader.default()` classmethod factory.
7. `tests/unit/conventions/__init__.py`, `test_catalog.py` (main suite from above), `test_no_direct_yaml_import.py` (AC-10 AST), `test_no_model_construct.py` (AC-14 AST), `test_inverted_helper_is_independent.py` (AC-5a AST), `test_no_local_convention_id.py` (AC-1a AST), `test_apply_match_is_exhaustive_compile_time.py` (AC-9 compile-time half).
8. `pyproject.toml`: if not already added in Step 1, the per-module `mypy --warn-unreachable` override on `codegenie.conventions.*`.
Refactor: clean up
- Module docstring on `catalog.py` cites `phase-arch-design.md §"Component design" #10`, ADR-0033 §3–4, and the four pattern types.
- The four `_apply_*` helpers live as module-level pure functions (not methods on `Catalog`) so they're independently testable and so the `_apply_one` `match` is a thin dispatcher: easier to read, easier to extend.
- `_io.read_capped_text(path, max_bytes=...)` is the abstraction; `_apply_*` helpers never touch `os.open` directly. `RepoSnapshot` (Phase 0) is contract-frozen and gains no `read_text(relpath)` method (B1). If a helper needs a read shape that `_io` doesn't expose, file a follow-up; do not bypass.
- Do not introduce a shared `ScannerRunner` / pattern-engine class. Four small helpers, four distinct shapes, ~30 LOC each. Phase-2 final design row 7 forbids the abstraction.
Files to touch
| Path | Why |
|---|---|
| `src/codegenie/types/identifiers.py` | Modify (additive): append `ConventionId = NewType("ConventionId", str)` + `__all__` extension. Zero edits to existing newtypes. (B2) |
| `src/codegenie/conventions/__init__.py` | New: public surface (`__all__` exact-set) + `ConventionsCatalogLoader.default()` factory |
| `src/codegenie/conventions/model.py` | New: four `ConventionRule*` variants with regex-compile `model_validator` + `ConventionRule` union + `Pass` / `Fail` / `NotApplicable` (exact minimal field sets) + `ConventionResult` union |
| `src/codegenie/conventions/loader.py` | New: `ConventionsCatalogLoader`, seven-variant `ConventionsError` discriminated union, `CatalogLoadOutcome`, `FatalLoadError`, `_classify_validation_error` Pydantic-error introspection |
| `src/codegenie/conventions/catalog.py` | New: `Catalog` Pydantic model, module-level `_apply_*` helpers, module-level `_apply_one` `match` with `assert_never` |
| `src/codegenie/conventions/_io.py` | New: `read_capped_text(path, *, max_bytes) -> str \| None`; the only file-read entry point from `_apply_*` helpers (B1) |
| `tests/unit/conventions/__init__.py` | New: package marker |
| `tests/unit/conventions/test_catalog.py` | New: main behavioral suite (~22 tests covering AC-2..AC-13d) |
| `tests/unit/conventions/test_no_direct_yaml_import.py` | New: AC-10 AST source-scan (alias-resistant; replaces ripgrep) |
| `tests/unit/conventions/test_no_model_construct.py` | New: AC-14 AST source-scan |
| `tests/unit/conventions/test_inverted_helper_is_independent.py` | New: AC-5a AST source-scan over `catalog.py` (`_apply_dockerfile_pattern_inverted` body must not call `_apply_dockerfile_pattern`) |
| `tests/unit/conventions/test_no_local_convention_id.py` | New: AC-1a AST source-scan (no local `NewType("ConventionId", ...)` outside `types/identifiers.py`) |
| `tests/unit/conventions/test_apply_match_is_exhaustive_compile_time.py` | New: AC-9 compile-time half: `mypy --warn-unreachable` over a fixture script |
| `tests/fixtures/mypy_unreachable_negative/` | New: fixture script that removes a `case` arm; companion `mypy` invocation expected to fail (AC-9 compile-time) |
| `pyproject.toml` | If not already in Step 1: `mypy --warn-unreachable` per-module override on `codegenie.conventions.*` |
Out of scope
- Layer D `ConventionsProbe`: S6-02; consumes this loader, ships next.
- Layer E probes (`Ownership`, `ServiceTopologyStub`, `SloStub`): S6-05; also consume this loader.
- OPA/Rego policy backends: Phase 16 (ADR-0021).
- Per-file `Fail` results from a `file_pattern` match-set: Phase 2 emits one `ConventionResult` per rule; a future ADR can decide whether to fan out per-file results. The first offending file is named in `evidence`.
- Cross-catalog rule-ID deduplication: left as a follow-up; single-file fixtures are the norm.
- Catalog-version-as-`IndexFreshness` signal: registered by Step 6 (`@register_index_freshness_check` for `conventions` with the catalog version). This story plants the loader; the freshness registration lands at probe time.
- Hostile YAML adversarial tests: Phase 1 S5-01 already pins `safe_yaml`'s `!!python/object` + alias-amplification defenses.
- `SkillsLoader`: S2-01 (parallel-after-S1-04 with this story).
- Reference TCCM Protocol-mock dispatcher: S2-03 (depends on S2-01; uses S1-04's `TCCMLoader`).
Notes for the implementer
- `match` exhaustiveness is type-enforced. `mypy --warn-unreachable` per-module on `codegenie.conventions.*` makes a missing variant in `_apply_one` a build failure. Do NOT fall back to `else: raise ValueError(...)`; the `case _ as unreachable: assert_never(unreachable)` shape is the load-bearing one. The runtime smoke test (AC-9) catches anyone who replaces it.
- `missing_file` semantics are subtle. The rule's name describes what it asserts, not what it observes. A `missing_file` rule with `file_glob: Dockerfile` says "the Dockerfile MUST be missing"; the rule passes when no Dockerfile exists. Reviewers will misread this on first encounter, so leave a one-line code comment in `_apply_missing_file` explaining the inversion.
- `NotApplicable` is the load-bearing third value. A future contributor will eventually be tempted to return `Pass` when a rule's file-glob matches zero files ("nothing to check, so it passes"). That fuses two distinct states and makes the Confidence section green-flag absent inputs. AC-6 + AC-4's third sub-test pin this; don't relax them.
- Regex compilation at load, not at apply. Each `pattern`-carrying variant has a Pydantic `model_validator(mode="after")` that calls `re.compile(self.pattern)` and stashes the compiled object (with `model_config = ConfigDict(arbitrary_types_allowed=True)` as in the Implementation outline). Do not defer compilation to the first `apply` call: AC-11a requires an uncompilable pattern to surface as `SchemaError` at load, and per-apply compile is a perf bug if `apply` is called per repo in a batch.
- No `yaml.*` imports. AC-10 + the Phase 1 chokepoint. If you need a YAML capability `safe_yaml` lacks, file an issue.
- `ConventionsError` mirrors `SkillsLoadError` (S2-01). Both are Pydantic discriminated unions over a `reason` literal with `path: Path` + optional details. The shape is intentional: Phase 3 plugins will import both and pattern-match uniformly. Don't drift the field names (`reason`, `path`, `details`) between the two.
- Discriminator-error introspection. Pydantic's `ValidationError.errors()` for a discriminated-union failure includes a `loc` tuple ending in `kind` and a `type` of `union_tag_invalid` or `literal_error`. `_is_unknown_kind_error` checks this shape; `_extract_kind` pulls the offending value from `errors()[i]["input"]`. Stash a snippet of `ValidationError.errors()` output as a code comment for the next implementer. Pydantic v2 has revised this shape twice; future-proof the introspection.
- `Catalog.apply` is consumed by Layer D `ConventionsProbe` (S6-02) and Layer E `Ownership` / `Topology` / `SLO` stubs (S6-05). Keep `apply` pure and snapshot-driven so the probes can call it inside a `@register_probe(heaviness="light")` slot without I/O surprises. If a future rule type needs network access, that's a new variant + a new probe layer; don't smuggle I/O into `_apply_*` helpers.
- Do NOT lift a shared `ScannerRunner` / pattern-engine. Final-design row 7 explicitly rejects this (Rule of Three + SRP); four ~30-LOC helpers are cheaper than the abstraction they'd share. If a fifth variant lands in Phase 3+, that's the trigger to re-evaluate, not before.
Design-pattern notes (added by validator 2026-05-15)¶
These are contextual observations that make the story easy to maintain and extend by addition. They are not ACs because the observable behavior is already constrained — but a contributor reaching for a tempting shortcut should see this section before they decide.
- `ConventionId` lives in `codegenie.types.identifiers`, not a sibling location. The newtype roster is the canonical home for every domain identifier that crosses ≥ 2 module boundaries (ADR-0033 §1). `ConventionId` will be imported by `loader.py`, `model.py`, `catalog.py`, the future `ConventionsProbe` (S6-02), and the Layer E `Ownership`/`Topology`/`SLO` stubs (S6-05): five consumers and counting, so the lift is overdetermined. Story line 0 of the Implementation outline is the one-line addition; AC-1a is the AST source-scan ratchet. (B2 closure.)
- `RepoSnapshot` is read at the boundary; no new method on the frozen contract. Adding `RepoSnapshot.read_text(relpath)` or a `build()` factory would amend Phase 0 ADR-0007. The cheaper path is `repo.root / relpath` + a local `_io.read_capped_text` helper. This composes with the broader pattern: `RepoSnapshot` is the input boundary (where the I/O is named); `_apply_*` helpers are the functional core that consumes it via `pathlib.Path` operations. Functional core / imperative shell, with `pathlib.Path` itself as the seam. If a future probe needs a `read_text` method (e.g., for caching policy reasons), that's a Phase ADR amendment with the contract-freeze sentinel test as the gate, not a quiet attribute addition. (B1 closure.)
- Regex as smart-constructor (Pydantic `model_validator`, not deferred to apply-time). Each `pattern`-carrying variant compiles its regex at load time. Compilation failure is a `SchemaError` event the operator sees during the gather phase, not a `RuntimeError` mid-`Catalog.apply` after dozens of repos have already been scanned. AC-11a is the load-time pin; the compiled-once cache (`_compiled_pattern` stash) is an implementation detail. Mirrors S1-03's `safe_yaml`-level discipline: bad input fails at the parse boundary, not the consumption boundary.
- `unsafe_yaml` umbrella naming is operationally prudent. A YAML parse failure can be (a) a hostile `!!python/object/apply:os.system` constructor exploit, (b) a syntactic `ParserError`, (c) a `ScannerError` (e.g., illegal Unicode), or (d) a top-level non-mapping. `safe_yaml.load` fuses all four into `MalformedYAMLError`. Operators reading `unsafe_yaml` in a CLI log are expected to inspect the file before re-running; the posture is the same for any of the four causes. Splitting the umbrella into `parse_error`/`constructor_exploit` would require inferring intent from `__cause__` chains, which Pydantic v2 has already revised. Convention: the umbrella name names the operational response, not the parser-flavored root cause. (B6; S2-01 convention.)
- Partial-success multi-file pattern matches S2-01. `CatalogLoadOutcome(catalog, per_file_errors)` mirrors S2-01's `LoadOutcome(skills, per_file_errors)`. The convention going forward: single-file loaders (`TCCMLoader`, S1-04) use `CodegenieError`-marker exceptions with string-prefixed `args[0]`; multi-file partial-success loaders (`SkillsLoader`, this story, future Phase 4+ loaders of the same shape) use Pydantic discriminated unions + a `LoadOutcome`-shaped envelope. Phase 3 plugin authors will import both and pattern-match uniformly across `SkillsLoadError` and `ConventionsError`, so keep the field names (`reason`, `path`, `details`) identically shaped. (H12; CN3.)
- Four independent `_apply_*` helpers (rule of three NOT reached). Adding a fifth variant (e.g., `dockerfile_pattern_glob` in Phase 3+) is: (1) a new `ConventionRule*` class with its own `model_validator`, (2) extension of the `ConventionRule` `Annotated[Union[...]]`, (3) a new `_apply_<kind>` helper, (4) a new `case` arm in `_apply_one` (the `mypy --warn-unreachable` ratchet on `_apply_one` makes a missing arm a build failure). Zero edits to existing helpers; zero edits to `Catalog` or `ConventionsCatalogLoader`. Open/Closed at the file boundary. If/when a sixth variant arrives, that's the rule-of-three trigger to re-evaluate whether a shared `_PatternEngine` reads-and-matches helper is justified, but not before. Arch §"Design patterns applied" row 8 is the load-bearing observation here. (DP framing.)
- `Pass`/`Fail`/`NotApplicable` field sets are exactly minimal (illegal-states-unrepresentable). `Pass` has no `evidence` field; `NotApplicable` has no `evidence`; `Fail` has no `reason`. A `Pass(rule_id=..., evidence="leaky data")` is rejected by `extra="forbid"`. The Confidence section (Phase 2 Step 8) pattern-matches on `kind` and reads only the fields documented for that kind, with no defensive "if hasattr(result, 'evidence')" reader. ADR-0033 §4. (AC-9a.)
- Module-level `_apply_one`, not a `Catalog` method. Tests need to smuggle an `_Imposter` into the `match` arm to verify `assert_never`. Calling it as `Catalog(rules=[]).apply(...)` would require constructing a synthetic rule that Pydantic would reject. The module-level form sidesteps this and matches the four `_apply_*` helpers' visibility. `Catalog.apply` is a thin list-comprehension wrapper with almost no behavior of its own. (B5.)
- Three-and-counting newtype lift moments to watch. This story lifts `ConventionId` to `codegenie.types.identifiers`. S1-05's existing `SkillId`/`TaskClassId`/`ProbeId` live there too. Future Phase 2 stories may want `Language`, `EvidenceKey`, `ConventionRuleKind`-as-newtype; they should follow the same pattern: extend the module additively, with an AST-source-scan ratchet in the consumer module to forbid local redefinition.
- No env-var auto-discovery; explicit `search_paths`. Mirrors S2-01 / arch §"Anti-patterns avoided" row 11. The `ConventionsCatalogLoader.default()` classmethod factory is the only place that resolves `~/.codegenie/conventions/` + `.codegenie/conventions/` from disk; tests pass explicit paths. A future contributor who wants `CODEGENIE_CONVENTIONS_PATH` env-var resolution should file an ADR, not silently extend the loader.
- `Catalog.apply` is consumed by Layer D `ConventionsProbe` (S6-02) and Layer E stubs (S6-05). Keep `apply` pure and snapshot-driven so the probes can call it inside a `@register_probe(heaviness="light")` slot without I/O surprises (AC-12). If a future rule type needs network access (e.g., remote-policy fetch), that's a new variant + a new probe layer; don't smuggle I/O into `_apply_*` helpers.
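
Two of the notes above, the regex smart-constructor and the boundary read through a local capped helper, lend themselves to a short sketch. It is illustrative only and makes several assumptions: a frozen dataclass with `__post_init__` stands in for the Pydantic `model_validator(mode="after")`, a plain `ValueError` stands in for the story's `SchemaError`, the field name `_compiled` is hypothetical, and returning `None` for an oversized or unreadable file is a guess at what `read_capped_text` does. The sketch also omits the `O_NOFOLLOW` handling that Phase 1 ADR-0008 requires.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class FilePatternRule:
    """Hypothetical stdlib stand-in for a pattern-carrying Pydantic variant."""

    rule_id: str
    pattern: str
    # Compiled once at construction; excluded from __init__, repr and equality.
    _compiled: re.Pattern[str] = field(init=False, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Smart constructor: a bad pattern fails here, at "load" time,
        # instead of in the middle of applying the catalog to a repo.
        try:
            compiled = re.compile(self.pattern)
        except re.error as exc:
            raise ValueError(f"rule {self.rule_id!r}: invalid regex: {exc}") from exc
        # Frozen dataclasses block normal assignment, hence object.__setattr__.
        object.__setattr__(self, "_compiled", compiled)

    def matches(self, text: str) -> bool:
        # Reuses the stashed pattern; no per-apply compilation.
        return self._compiled.search(text) is not None


def read_capped_text(path: Path, *, max_bytes: int) -> Optional[str]:
    """Return the file's text, or None if it is unreadable or exceeds the cap.

    Assumed behavior for illustration; no symlink protection is attempted here.
    """
    try:
        with path.open("rb") as fh:
            # Read one byte past the cap so an oversized file is detectable
            # without loading the whole thing into memory.
            data = fh.read(max_bytes + 1)
    except OSError:
        return None
    if len(data) > max_bytes:
        return None
    return data.decode("utf-8", errors="replace")
```

A helper in this style would be called as `read_capped_text(repo.root / relpath, max_bytes=1024 * 1024)`, which keeps the file access on `pathlib.Path` and leaves the `RepoSnapshot` dataclass untouched.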