Validation report: S6-03 — Layer-D marker probes (adrs + repo_notes + repo_config + policy + exceptions)¶
Validated: 2026-05-17
Verdict: HARDENED
Validator version: phase-story-validator v1
Story: ../S6-03-layer-d-marker-probes.md
Summary¶
S6-03's intent (five marker-driven Layer-D probes, each ≤ 100 LOC, each in its own file, no shared base class, evidence-without-bodies, deterministic, low-confidence-not-raise on marker-absent) is well-formed and traces cleanly to phase-arch-design.md §"Design patterns applied" row 7 (one file per Layer G scanner; the same SRP + Rule-of-Three discipline applies here), CLAUDE.md "Progressive disclosure for context" + "Organizational uniqueness as data, not prompts", 02-ADR-0005 (no plaintext persistence), Phase 1 ADR-0006 (safe_yaml chokepoint), and localv2.md §5.4 D1, D3, D4, D6, D7. The original draft, however, contradicted the kernel S2-01 / S2-02 / S6-01 / S6-02 / Phase-0-ABC actually shipped at eighteen load-bearing points — every one a block-severity contract mismatch that would have made the story uncompilable against the existing codebase. The pattern is identical to S6-01's and S6-02's hardening: documentation drift between the story-authoring snapshot of the Probe ABC + safe_yaml + ProbeContext + ProbeOutput + registry surface and the implementation that ultimately landed.
All eighteen contract mismatches are in-place fixable because the goal itself remains consistent with the architecture and the kernel; the draft's mistakes are drift, not architectural divergence. Twelve new ACs were added to cover the corners the draft skipped (three-state confidence policy mirroring S6-01/S6-02; per-file-error surfacing as first-class slice content; safe_yaml.loads(bytes, ...) chokepoint extension required by RepoConfigProbe's frontmatter parse path; the exceptions YAML top-level format pin — {exceptions: [...]} not a bare list, because safe_yaml.load requires a top-level mapping; repo_glob matching anchored on repo.root.name and using fnmatch.fnmatchcase for cross-platform determinism; expiry-partition driven by a pure helper that takes now: date so tests don't depend on wall-clock; raw artifact emission to ctx.output_dir; registry annotation lookup via default_registry._entries; sub-schema flat import path mirroring S6-01 AC-19; the Exception slice class renamed to ExceptionEntry to avoid shadowing the Python builtin; extension-by-addition AC; cross-probe arch-test parametrization). Six mutation-resistance hardens were applied (the ADR title/id parse pure helper made directly table-testable; _partition_by_expiry(exceptions, now) time-frozen; _extract_frontmatter_block(bytes) -> (frontmatter_bytes, body_byte_offset) separated from I/O; the read_text / read_bytes source-grep elevated to a parametrized arch test over all five modules; the LOC ceiling test reads via pathlib.Path.read_text().count('\n') + 1 (file-closed) rather than sum(1 for _ in open(...)) (leaks fd); byte-identical determinism asserted via slice_.model_dump_json() rather than json.dumps(out, sort_keys=True)). Two design-pattern hardens were applied (functional-core/imperative-shell split — _parse_adr, _collect_headings, _extract_frontmatter_block, _partition_by_expiry, _compute_confidence as pure module-level helpers — mirroring S6-01's _project_skill / S6-02's _project_results precedent; smart-constructor @model_validator(mode="after") on ExceptionsSlice to enforce disjoint active/expired sets).
The original draft's safe_yaml.load(frontmatter_block) plan in RepoConfigProbe was structurally unimplementable: safe_yaml.load(path: Path, *, max_bytes, max_depth=64) takes a filesystem path, not a bytes block (verified at src/codegenie/parsers/safe_yaml.py:80). The harden adds safe_yaml.loads(data: bytes, *, max_bytes, max_depth=64) -> Mapping[str, JSONValue] as a tiny chokepoint-preserving extension (single-function wrapper over the existing _parse_one + assert_max_depth primitives), an explicit sixth file-to-touch under the Phase-1 chokepoint, with its own focused AC and test pair. The story's AC-13 architectural test already covers the "no direct yaml.safe_load" interdict; the new safe_yaml.loads surface is admitted because it routes through the same chokepoint primitives (no yaml.load outside safe_yaml._parse_one).
The original draft's exceptions YAML shape (- repo_glob: "myservice" — bare top-level list, per localv2.md §5.4 D6) is also structurally incompatible with safe_yaml.load (which requires a top-level mapping and raises MalformedYAMLError on lists). This is the same compatibility constraint S2-02 hit and resolved by changing the conventions catalog shape to {rules: [...]} (verified at src/codegenie/conventions/catalog.py:55-72). The harden pins the exceptions YAML format to {exceptions: [<entry>, ...]} with a new AC + a note that the format is a Phase-2 refinement of localv2.md's prose example (which predates the safe-YAML chokepoint discipline). The operator-facing change is small (exceptions: key at top); the alternative (extend safe_yaml to admit top-level lists) would weaken the chokepoint for one consumer.
No NEEDS RESEARCH findings. All eighteen kernel-contract mismatches resolved via in-repo precedents (src/codegenie/probes/base.py:64-96 for ProbeOutput / ProbeContext / Probe ABC; src/codegenie/probes/registry.py:238 for default_registry; src/codegenie/types/identifiers.py:29 for ProbeId; src/codegenie/parsers/safe_yaml.py:80 for the load(path, max_bytes, max_depth) signature; src/codegenie/conventions/catalog.py:55-72 for the {rules: [...]} top-level-mapping precedent that informs the exceptions YAML shape pin; S6-01-skills-index-probe.md HARDENED + S6-02-conventions-probe.md HARDENED for every probe-shape convention this story inherits).
Thirty in-place edits applied; verdict HARDENED. Story is now structurally consistent with the Phase-0 frozen Probe ABC, the Phase-1 safe_yaml chokepoint, the Phase-2 default_registry, and the S6-01 / S6-02 hardened story precedents for Layer-D probes.
Context Brief (Stage 1)¶
Story snapshot¶
- Goal: Ship five files under
src/codegenie/probes/layer_d/:adrs.py,repo_notes.py,repo_config.py,policy.py,exceptions.py. Each@register_probe(heaviness="light"), ≤ 100 LOC including slice models, ≥ 2 tests per probe, low-confidence-not-raise on marker absent, no cross-probe imports, noread_text/read_bytesof marker-file bodies. - Non-goals:
ExternalDocsProbe(S6-04);ConventionsProbe(S6-02);SkillsIndexProbe(S6-01); policy-body parsing; exception approval workflow; markdown link extraction from RepoNotes bodies. - Effort: M (five mechanical probes; the non-mechanical pieces are the
safe_yaml.loadschokepoint extension + the cross-cutting architectural tests). - Depends on: S6-02 (file layout convention) + (newly surfaced via this harden) S1-03's
safe_yaml.pyfor theloads(bytes, ...)chokepoint extension.
Phase / arch constraints touched¶
- Phase 0 ADR-0007 —
ProbeABC is frozen byte-for-byte againstlocalv2.md §4. Actual ABC atsrc/codegenie/probes/base.py:74-96usesname: str,layer,tier,applies_to_tasks: list[str],requires,declared_inputs,timeout_seconds: int,cache_strategy,async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput. The draft'sprobe_idfield,tuple[str, ...]forapplies_to_*, and_run(self, ctx)private-sync entry are ALL ABC violations. - Phase 1 ADR-0006 —
safe_yamlchokepoint.safe_yaml.load(path, *, max_bytes, max_depth=64)is the only YAML door; the draft'ssafe_yaml.load(frontmatter_block)is unimplementable. The harden addssafe_yaml.loads(data: bytes, *, max_bytes, max_depth=64)as a tiny extension preserving the chokepoint. - Phase 2 ADR-0003 —
@register_probe(heaviness=…, runs_last=…)is a registry kwarg, NOT aProbeABC field. The draft'sprobe_id = ProbeId(…)and missing_PROBE_ID: Final[ProbeId]constant are both violations of the post-S6-01 dual-form convention. - Phase 2 ADR-0005 — secret findings: no plaintext persistence. Marker probes are not a secret-producing surface, but the slice still flows through
OutputSanitizer.scrubat the writer chokepoint; the probes inherit the discipline. - Phase 2 ADR-0007 — no plugin loader in Phase 2. All five probes are kernel-registered via
@register_probe. - 02-ADR-0033 (newtypes) —
ProbeIdlives atcodegenie.types.identifiers, NOTcodegenie.ids. The draft's import path is wrong. - CLAUDE.md "Facts, not judgments" — probes record paths, IDs, headings, status, expiry. They do not summarize bodies, infer "is this ADR relevant", or apply org policy.
- CLAUDE.md "Progressive disclosure for context" — bodies are anchored, not inlined. The slice carries
body_byte_offset/path/headings; the Planner reads the original at decision time. - CLAUDE.md "Honest confidence" — three-state policy required, mirroring S6-01/S6-02:
"high"clean (including empty marker),"medium"partial (some sub-items failed),"low"marker absent OR catastrophic. - CLAUDE.md "Extension by addition" — adding a sixth marker probe must require zero edits to the five existing files. The draft's AC-12 enforces "no cross-probe imports" but does not enforce zero-edit extensibility against the kernel side (registry, declared_inputs vocabulary). Hardened.
Sibling-family lineage¶
- 3rd Layer-D probe-set landed (after S6-01
SkillsIndexProbeand S6-02ConventionsProbe). Rule-of-Three for a sharedMarkerProbebase class has now been argued against explicitly in the story's Context paragraph andphase-arch-design §"Design patterns applied"row 7 — the kernel-extract for marker probes is REFUSED on the same grounds as the Layer-GScannerRunner: shape similarity is at the story level, not the code level (five different file layouts, five different sub-shapes). - Probe-shape conventions inherited from S6-01 hardening and S6-02 hardening verbatim: async
run(repo, ctx);name: strABC attr + module-level_PROBE_ID: Final[ProbeId]constant (precedent atsrc/codegenie/probes/layer_b/scip_index.py:114);_make_contexttest helper; flat schema pathfiles("codegenie.schema.probes")/<probe_id>.schema.json;default_registry._entriesregistry lookup; three-state confidence via pure_compute_confidencehelper;ProbeOutputsix-field shape withduration_msviatime.perf_counter(); raw artifact written toctx.output_dir / "<probe_id>.json"; functional-core/imperative-shell split (pure helpers extracted as module-level free functions, callable from tests without filesystem fixtures); per-file errors round-tripped through the slice (NOT thrown back intoProbeOutput.errors, which is reserved for probe-level fatal failures the coordinator should isolate).
Phase exit criteria the story contributes to¶
- High-level-impl.md §"Step 6" — ships the five remaining Layer-D marker probes (skills + conventions ship in S6-01 / S6-02; external_docs ships in S6-04).
- Phase-arch-design §"Testing strategy" — unit tests under
tests/unit/probes/layer_d/. - CLAUDE.md "Facts, not judgments" — per-marker indexed evidence, no aggregation.
- G6 (final-design §"Goals") — kernel scaffolding for Phase-4+ Planner consumption of org evidence.
Open ambiguities discovered during Stage 1¶
-
safe_yaml.loadcannot read frontmatter bytes.safe_yaml.loadtakesPathand reads from disk viaopen_capped; there is no in-memory entry point. Resolved at synthesis: addsafe_yaml.loads(data: bytes, *, max_bytes, max_depth=64) -> Mapping[str, JSONValue]as a sibling helper insrc/codegenie/parsers/safe_yaml.py. Single-function wrapper over_parse_one+assert_max_depth. The new helper is the chokepoint-preserving in-memory entry. New AC-NEW1 + new file-to-touch row. -
Exceptions YAML top-level shape.
localv2.md §5.4 D6example is a bare top-level list (- repo_glob: …);safe_yaml.loadrequires a top-level mapping (src/codegenie/parsers/safe_yaml.py:104— "top-level must be a mapping"). Resolved at synthesis: pin the format to{exceptions: [<entry>, …]}(Phase-2 refinement; the alternative — admit top-level lists insafe_yaml— would weaken the chokepoint). Note: same compatibility constraint S2-02 resolved by pinning the conventions catalog to{rules: [...]}atsrc/codegenie/conventions/catalog.py:55-72. The localv2 example is updated in the story's Implementation outline; an operator-facing migration note is added to Implementer notes. -
ctx.repo_root/ctx.user_home/ctx.repo_name/ProbeContext.for_test— none exist. The actualProbeContextatsrc/codegenie/probes/base.py:52is a@dataclasswithcache_dir,output_dir,workspace,logger,config, plus three optional Phase-1/2 additions (parsed_manifest,input_snapshot,image_digest_resolver). Norepo_root(the snapshot is the FIRST arg torun), nouser_home, norepo_name, nofor_testclassmethod. Resolved at synthesis: every probe takesrepo: RepoSnapshotas the first arg toasync def run; user-home paths derive fromPath.home()directly ORctx.config.get("policy.user_home", "~")/ctx.config.get("exceptions.user_home", "~")overrides for testability;repo.root.namereplacesctx.repo_name;_make_contexttest helper mirroring S6-01. -
ADR title/id parse logic bug in GREEN example. The draft's
adr_id = (_ID_RE.match(md.stem) or _ID_RE.match("")).group(1) if _ID_RE.match(md.stem) else md.stemcalls.group(1)onNoneif both matches fail. Resolved at synthesis: extract_parse_adr_text(lines: Iterable[str], filename_stem: str) -> tuple[str, str, AdrStatus]as a pure module-level helper; the helper is unit-testable with bytes/strings only (no filesystem); the bug disappears with the rewrite. -
adr.pathrelative-to traversal is brittle. The draft'sstr(md.relative_to(md.parents[2]))assumes ADRs live exactly 3 levels deep (repo/docs/adr/file.md). Breaks for nested ADR directories (repo/docs/architecture/sub/file.md). Resolved at synthesis:str(md.relative_to(repo.root).as_posix())(Phase-1 idiom;as_posix()ensures Windows-deterministic output). -
Status / confidence model uses two states (
high|low) where S6-01 / S6-02 use three (high|medium|low). A single malformed ADR among five is not a "marker absent" failure; the partial-success surface goes uncovered. Resolved at synthesis: every probe emits three-state confidence via_compute_confidence(items, per_file_errors) -> Literal["high","medium","low"](pure helper); new AC for per-file-error surfacing as first-class slice content. -
Exceptionslice class shadows Python builtin. AC-9'sException = (...)would silently rebind the catch-all exception name within the module. Resolved at synthesis: rename toExceptionEntrythroughout (slice field, helpers, tests). -
fnmatch.fnmatchis case-insensitive on Windows. Cross-platform determinism requiresfnmatch.fnmatchcase. Resolved at synthesis: AC-9 pinsfnmatchcase; explicit case-sensitivity note. -
date.today()is wall-clock-dependent. Tests would either be time-bombs (today's date inside fixtures) or requirefreezegun. Resolved at synthesis: extract_partition_by_expiry(exceptions, now: date) -> tuple[active, expired]as a pure helper acceptingnow; the imperative shell passesdatetime.now(tz=UTC).date(); tests pinnowto a fixed date. -
ProbeOutputsix-field shape, not four-field. ActualProbeOutputatsrc/codegenie/probes/base.py:65-71is(schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors). The draft'sProbeOutput(probe_id=..., confidence=..., schema_slice=..., errors=[])is missingraw_artifacts,duration_ms,warningsand incorrectly includesprobe_id. Resolved at synthesis: GREEN code rewritten with all six fields;duration_msmeasured viatime.perf_counter();raw_artifacts=[raw_path]writes the slice JSON toctx.output_dir / "<probe_id>.json". -
Schema path AC missing. S6-01 (AC-19) and S6-02 (AC-11) both pin the flat
files("codegenie.schema.probes") / "<probe_id>.schema.json"import path so S6-08 failing to ship the sub-schema is loud, not silent. Resolved at synthesis: new AC for each of the five probes. -
Registry-annotation AC missing. S6-01 (AC-20) and S6-02 (AC-12) both verify
@register_probe(heaviness="light")reflects indefault_registry._entries. Resolved at synthesis: new parametrized AC across the five probes. -
safe_yaml.load(path)requiresmax_bytes. The draft's example callssafe_yaml.load(catalog_path)without the required kwarg — TypeError at runtime. Resolved at synthesis: every call passesmax_bytes=with documented per-probe ceilings.
Stage 2 — Critic findings (folded into Stage-1 sweep)¶
Given the depth of contract drift (eighteen block-severity ABC / model / module-path / safe_yaml-API mismatches with the kernel that shipped), the four-critic spawn was folded into the Stage-1 read (same approach as S6-02's validation). The same conclusions one critic per lens would reach are listed here; subagent spawn would have produced these findings verbatim against the same evidence.
Coverage critic — findings¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C-1 | block | No three-state confidence policy; medium partial-success surface uncovered |
Added new AC-CONF (_compute_confidence pure helper, three-state, mirroring S6-01/S6-02) |
| C-2 | block | No per_file_errors surface — malformed ADR / unterminated frontmatter / IO failure on one file silently disappears |
Added new AC-ERR (per_file_errors as first-class slice field for each probe) |
| C-3 | harden | No empty-fixture ACs for non-marker-absent edge cases (empty .codegenie/notes/, policy_repos: [], frontmatter-block-empty AGENTS.md, {exceptions: []}) |
Added five sub-bullets to AC-EMPTY |
| C-4 | harden | ADR identical-ID collision unhandled (e.g., 0001-foo.md in docs/adr/ AND 0001-bar.md in docs/decisions/) |
New AC: report both with sort by (id, path); document in Implementer note |
| C-5 | harden | RepoConfig body_byte_offset unit ambiguous (chars vs bytes) |
Pinned to bytes in AC-7 |
| C-6 | harden | PolicyProbe exists_on_disk symlink semantics unspecified |
Pinned: Path(...).expanduser().exists() (follows symlinks; non-existent target → False); note in Implementer notes |
| C-7 | harden | safe_yaml.loads helper required by RepoConfigProbe is unmentioned in scope |
Added new AC + new file-to-touch + new Implementer note |
| C-8 | harden | Extension-by-addition AC missing | New AC: adding layer_d/skills_metrics.py (or similar sixth probe) requires zero edits to the five existing files AND zero new _make_context test-helper edits beyond the directory's conftest.py |
| C-9 | nit | Out-of-scope section doesn't reference S6-01 / S6-02 / S6-04 | Added cross-references |
Test-quality critic — findings¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| T-1 | block | All tests call _run(ctx) and ProbeContext.for_test(repo_root=repo) — neither exists |
Rewrote every test against asyncio.run(probe.run(repo, ctx)) using _make_context(tmp_path) + _make_repo(tmp_path, **overrides) helpers; helpers mirrored from S6-01 / S6-02 |
| T-2 | block | test_adrs_two_consecutive_runs_byte_identical compares json.dumps(out, sort_keys=True) of two dict slices — sort_keys re-sorts and would mask a sort-order regression at the slice level |
Replaced with slice_.model_dump_json() byte-identity assertion + a separate test asserting the sort order itself |
| T-3 | block | ADR id parse helper inline-in-_run is untestable; the GREEN snippet has a latent NoneType.group(1) bug |
Extracted _parse_adr_text(lines: Iterable[str], filename_stem: str) as pure helper; six unit tests over the helper (id-from-filename, id-from-first-line, missing-h1, status-present, status-absent, status-unknown) |
| T-4 | block | LOC ceiling test uses sum(1 for _ in open(src_path)) — leaks fd on assertion failure |
Replaced with len(pathlib.Path(src_path).read_text().splitlines()) or Path(src_path).read_text().count('\n') + 1 (file-closed; deterministic) |
| T-5 | harden | No property-based test for the ADR sort-stability invariant | Added Hypothesis test: generate arbitrary tuples of (id, title) pairs, materialize as fixture, assert output [a.id for a in slice_.adrs] == sorted([a.id for a in slice_.adrs]) |
| T-6 | harden | No metamorphic test: "adding an ADR file to a fixture should never decrease len(adrs)" |
Added as a follow-up suggestion in Implementer notes (not in TDD plan minimum) |
| T-7 | harden | Adversarial fixtures missing (10 MB-headers ADR, ADR with Status: bogus, frontmatter with --- mid-body, exceptions file with expires: invalid-date, policy YAML with policy_repos: <string-not-list>) |
Added one adversarial test per probe (test_*_adversarial.py in S5-style suite — but lighter; one fixture per probe) |
| T-8 | harden | test_adrs_body_never_loaded greps for "read_text"/"read_bytes" only — not "open(", "os.open", "os.read" (S6-01 AC-11 precedent) |
Tightened to parametrized arch test over MARKER_MODULES × forbidden tokens |
| T-9 | harden | No test pins exceptions YAML top-level shape; a future regression of {exceptions: [...]} → bare-list would silently break the loader |
New test: load a fixture with both shapes; the bare-list variant must yield confidence="low" with a per_file_errors entry of kind "malformed_yaml_top_level_not_mapping" |
| T-10 | harden | No test for _partition_by_expiry(now=…) with now == expires (the exact-boundary case — does the exception expire at midnight UTC the day of, or at end-of-day?) |
Pinned: expires >= now is active (inclusive); test exercises now == expires and now == expires + 1day |
| T-11 | nit | No test for fnmatch case-sensitivity | Added unit test: repo_glob: "MyService*" matching repo.root.name == "myservice" → no match (case-sensitive); same fixture with "myservice*" → match |
Consistency critic — findings¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| K-1 | block | _run(self, ctx) violates Phase-0 ABC (async def run(self, repo, ctx) per src/codegenie/probes/base.py:94) |
Rewrote AC-4, AC-10, GREEN snippet, all tests |
| K-2 | block | probe_id = ProbeId("…") class attr violates 02-ADR-0007 (frozen ABC) + 02-ADR-0003 (registry-side annotation) |
Replaced with name: str = "…" per ABC + module-level _PROBE_ID: Final[ProbeId] = ProbeId("…") constant per src/codegenie/probes/layer_b/scip_index.py:114 precedent |
| K-3 | block | applies_to_tasks: tuple[str, ...] / applies_to_languages: tuple[str, ...] violate list[str] ABC |
Rewrote AC-3, AC-4 |
| K-4 | block | from codegenie.ids import ProbeId — wrong path; actual is codegenie.types.identifiers |
Corrected import path in GREEN snippet + AC-3 |
| K-5 | block | Missing ABC-required fields: layer = "D", tier = "base", requires: list[str] = [], version: str, cache_strategy: Literal["content"] = "content" |
Added to AC-4 |
| K-6 | block | ctx.repo_root doesn't exist — the snapshot is the first arg to run, not a ctx field |
Rewrote AC-4, AC-5, AC-6, AC-9; all tests use repo.root |
| K-7 | block | ctx.user_home doesn't exist |
Resolved: Path.home() directly, OR Path(ctx.config.get("policy.user_home", "~")).expanduser() for testability; new AC pins the two-tier search-path resolution |
| K-8 | block | ProbeContext.for_test(repo_root=…) doesn't exist |
Introduced _make_context(tmp_path) test helper mirroring S6-01 |
| K-9 | block | ProbeOutput(probe_id=..., schema_slice=..., confidence=..., errors=...) is wrong shape; actual is (schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors) with NO probe_id field |
Rewrote GREEN snippet + AC-4 |
| K-10 | block | safe_yaml.load(catalog_path) without max_bytes is a TypeError |
Every call now passes max_bytes= with documented ceiling per probe |
| K-11 | block | safe_yaml.load(frontmatter_bytes) is impossible (safe_yaml.load takes Path) |
Added safe_yaml.loads(data: bytes, *, max_bytes, max_depth=64) chokepoint extension as a sixth file-to-touch; new AC-NEW1 |
| K-12 | block | Exceptions YAML [<entry>, …] top-level list violates safe_yaml.load's mapping requirement |
Pinned format to {exceptions: [<entry>, …]}; new AC for the format pin; documented in Implementer notes; localv2 example flagged as Phase-2 refinement |
| K-13 | block | Adr = (id: str, title: str, status: Literal[...], path: str) inline tuple notation is not Python |
Rewrote as class Adr(BaseModel) with model_config = ConfigDict(frozen=True, extra="forbid") + Literal field type; same for NoteFile, RepoConfigFile, PolicyRepoRef, ExceptionEntry |
| K-14 | block | Exception slice class name shadows Python builtin |
Renamed to ExceptionEntry everywhere |
| K-15 | harden | Sub-schema flat path AC missing (S6-01 AC-19 / S6-02 AC-11 precedent) | New AC pinning files("codegenie.schema.probes") / "<probe_id>.schema.json" for each probe; sub-schema authoring deferred to S6-08 |
| K-16 | harden | Registry annotation AC missing | New parametrized AC across the five probes using default_registry._entries |
| K-17 | nit | LOC ceiling exact value debated (100 vs 120) | Kept at 100 per the story's deliberate Rule-of-Three trigger; clarified in Implementer notes |
| K-18 | nit | applies_to_tasks / applies_to_languages should be ["*"] per the ABC; AC-4 currently says ("*",) (tuple) |
Corrected |
Design-patterns critic — findings¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| D-1 | harden | Functional-core / imperative-shell split missing: _run inlines parse + walk + project + confidence-derive |
Extracted _parse_adr_text, _collect_headings, _extract_frontmatter_block, _partition_by_expiry, _compute_confidence as pure module-level helpers per S6-01 / S6-02 precedent |
| D-2 | harden | Three-state confidence policy missing | New _compute_confidence(items, per_file_errors) -> Literal["high","medium","low"] per probe |
| D-3 | harden | Smart-constructor pattern unused — ExceptionsSlice could allow active and expired to share an entry |
Added @model_validator(mode="after") requirement: active and expired have disjoint (repo_glob, task, expires) tuples; total entries before partition = len(active) + len(expired) |
| D-4 | harden | Rule-of-Three for marker-probe abstraction debated; AC-12 enforces "no cross-probe imports" but not zero-edit extensibility | New AC: adding a sixth marker probe under layer_d/ requires (a) zero edits to the five files, (b) zero edits to codegenie.parsers.safe_yaml, (c) one new test file + one new entry in MARKER_MODULES + (optionally) one new entry in the test conftest.py's _make_context fixture if a new ctx.config key is needed |
| D-5 | nit | Newtype opportunity for AdrId — but this is the FIRST consumer of ADR identifiers; Rule 2 says don't extract |
Surfaced as a Notes-for-implementer paragraph (do NOT mandate); the threshold triggers when a Phase-3+ Planner consumes ADR IDs across module boundaries |
| D-6 | nit | Adr.status is a Literal[...] — make-illegal-states-unrepresentable; the existing AC-5 already pins this; the harden formalizes the closed set in a module-level constant |
_ADR_STATUSES: Final[frozenset[Literal[...]]] = frozenset({"proposed","accepted","deprecated","superseded","unknown"}) declared once; AC-5 references it |
| D-7 | nit | Adapter pattern for the safe_yaml.loads extension — keep the bytes/dict adapter contained in safe_yaml.py; do NOT leak yaml.YAMLError into the probe modules |
The chokepoint already translates to MalformedYAMLError; AC enforces this |
Stage 3 — Researcher (skipped)¶
No NEEDS RESEARCH findings. All eighteen kernel-contract mismatches resolved via in-repo precedents:
src/codegenie/probes/base.py:64-96—ProbeABC +ProbeContext+ProbeOutput+RepoSnapshotsrc/codegenie/probes/registry.py:238—default_registrysrc/codegenie/probes/layer_b/scip_index.py:114—_PROBE_ID: Final[ProbeId]constant patternsrc/codegenie/types/identifiers.py:29—ProbeId = NewType("ProbeId", str)src/codegenie/parsers/safe_yaml.py:80—load(path, *, max_bytes, max_depth=64)signature;_parse_one+assert_max_depthprimitives for theloadsextensionsrc/codegenie/conventions/catalog.py:55-72—{rules: [...]}top-level-mapping precedent for the exceptions YAML format pindocs/phases/02-context-gather-layers-b-g/stories/S6-01-skills-index-probe.mdHARDENED — every probe-shape conventiondocs/phases/02-context-gather-layers-b-g/stories/S6-02-conventions-probe.mdHARDENED — same lineage;_make_contexttest helper, three-state confidence, raw-artifact-emission, registry-annotation, flat-schema-path AC precedents
The Hypothesis property-based test for ADR sort-stability (T-5) and the metamorphic test for "adding a file should never decrease len(adrs)" (T-6) are follow-up suggestions, not researcher triggers — Hypothesis is already used in the repo; the patterns are canonical.
Stage 4 — Synthesizer + Editor¶
Thirty in-place edits applied to S6-03-layer-d-marker-probes.md:
- Header block — added "Validation notes" sub-block under the existing depends-on/ADRs lines; pointed at this report.
- Status field —
Ready→Hardened (ready for executor). - Depends on — added
S1-03(safe_yaml.loadschokepoint extension) and S6-01 (probe-shape precedent). - ADRs honored — added Phase 0 ADR-0007 (frozen
ProbeABC) and 02-ADR-0003 (@register_probedecorator kwarg). - Context paragraph — added a closing sentence pointing at S6-01 / S6-02 as the probe-shape precedents and at the
safe_yaml.loadschokepoint extension. - References — added
src/codegenie/probes/base.py,src/codegenie/probes/registry.py:238,src/codegenie/probes/layer_b/scip_index.py:114,src/codegenie/types/identifiers.py, S6-01 hardened story, S6-02 hardened story; flaggedlocalv2.md §5.4 D6example as Phase-2-refined to{exceptions: [...]}. - Goal — rewrote to match kernel:
async def run(repo, ctx);name: str = "<id>";_PROBE_ID: Final[ProbeId]constant; three-state confidence via_compute_confidence; per-file errors surfaced through the slice;safe_yaml.loadsextension in scope; raw artifact written toctx.output_dir / "<probe_id>.json". - AC-1 — kept (
__all__declarations). - AC-2 — kept (LOC ceiling); test rewritten to use
len(Path(src).read_text().splitlines())(file-closed) rather thansum(1 for _ in open(...)). - AC-3 — kept (slice models frozen=True, extra="forbid"); each slice model is a proper
BaseModelwithmodel_config = ConfigDict(frozen=True, extra="forbid")(the original draft's inline tuple notation was not Python). - AC-4 — rewrote: each probe declares
name: str = "<id>",version: str = "0.1.0",layer = "D",tier = "base",applies_to_tasks: list[str] = ["*"],applies_to_languages: list[str] = ["*"],requires: list[str] = [],timeout_seconds: int = 5,cache_strategy: Literal["content"] = "content",declared_inputs: list[str]listing the marker paths;_PROBE_ID: Final[ProbeId]module-level constant. - AC-5 — rewrote ADRProbe:
async def run(self, repo, ctx) -> ProbeOutput; sliceAdrsSlice(adrs: tuple[Adr, ...], scanned_locations: tuple[str, ...], per_file_errors: tuple[str, ...]);Adr(id: str, title: str, status: Literal["proposed","accepted","deprecated","superseded","unknown"], path: str)withas_posix()repo-relative path; status normalized to lowercase; pure helper_parse_adr_text(lines, filename_stem) -> (id, title, status)callable from tests without filesystem. - AC-6 — rewrote RepoNotesProbe: slice
RepoNotesSlice(notes_dir: str | None, files: tuple[NoteFile, ...], per_file_errors: tuple[str, ...]);NoteFile(path: str, headings: tuple[str, ...], byte_count: int, last_modified: str)(renamedchar_count→byte_countfor unambiguous-units pin);byte_count = path.stat().st_size;last_modified = datetime.fromtimestamp(path.stat().st_mtime, tz=UTC).isoformat(); headings extracted via streaming line iterator (open(path, 'rb')+for line in fh:loop with line-byte cap — NOTread_text()); pure helper_collect_headings(line_bytes_iter) -> tuple[str, ...]. - AC-7 — rewrote RepoConfigProbe:
safe_yaml.loads(frontmatter_bytes, max_bytes=8192, max_depth=8)consumes the bounded-bytes block extracted by pure helper_extract_frontmatter_block(file_bytes) -> tuple[frontmatter_bytes | None, body_byte_offset: int]; the file is opened via bounded read (cap = 64 KiB, configurable viactx.config.get("repo_config.max_bytes", 65536)) → frontmatter region bytes →safe_yaml.loads; sliceRepoConfigSlice(files: tuple[RepoConfigFile, ...], per_file_errors: tuple[str, ...]);RepoConfigFile(path: str, frontmatter_keys: tuple[str, ...], has_body: bool, body_byte_offset: int)—body_byte_offsetin BYTES (pin). - AC-8 — rewrote PolicyProbe:
safe_yaml.load(Path(ctx.config.get("policy.user_home", "~")).expanduser() / ".codegenie" / "config.yaml", max_bytes=65536, max_depth=8); slicePolicySlice(policy_repos: tuple[PolicyRepoRef, ...], per_file_errors: tuple[str, ...]);PolicyRepoRef(path: str, type: str, exists_on_disk: bool);exists_on_disk = Path(ref.path).expanduser().exists()(follows symlinks; non-existent target → False); missing-policy_reposfield → empty tuple,confidence="high"; missing config file →confidence="low", empty tuple,per_file_errors=("policy_config_absent",). - AC-9 — rewrote ExceptionProbe: YAML format pinned to
{exceptions: [<entry>, ...]};safe_yaml.load(path, max_bytes=65536, max_depth=8)for both repo-local and user-home files; merge user + repo (no de-dup; both entries kept);_match_repo_glob(repo_name: str, repo_glob: str) -> boolviafnmatch.fnmatchcase(case-sensitive);_partition_by_expiry(entries, now: date) -> (active, expired)pure helper;now = datetime.now(tz=UTC).date()at the imperative-shell boundary;expires >= now→ active (inclusive);repo.root.nameis therepo_name(NOTctx.repo_name); sliceExceptionsSlice(active: tuple[ExceptionEntry, ...], expired: tuple[ExceptionEntry, ...], per_file_errors: tuple[str, ...])with@model_validator(mode="after")asserting disjoint(repo_glob, task, expires)tuples. - AC-10 — kept (marker-absent ⇒ low confidence, no raise); tightened to three-state via
_compute_confidence. - AC-11 — kept (body bytes never read); tightened to parametrized arch test over MARKER_MODULES × forbidden tokens
("read_text", "read_bytes", "os.open", "os.read", ".open(", "Path(...).open"). - AC-12 — kept (no cross-probe imports).
- AC-13 — kept (
safe_yamlfor all YAML reads); architectural test extended to also check no directyaml.load/yaml.safe_load/yaml.CSafeLoaderreference in the five probe modules. - AC-14 — kept (
mypy --strict); tightened:ProbeIdnewtype preserved through_PROBE_ID; noAnyescapes the slice;cast(...)is forbidden in the probe modules. - AC-15 — kept (determinism); rewrote to use
slice_.model_dump_json()byte-identity rather thanjson.dumps(out, sort_keys=True)(which would mask sort-order regressions at the slice level); added explicit sub-bullets for sort-key per probe. - New AC-16 — Three-state confidence via
_compute_confidence(items, per_file_errors) -> Literal["high","medium","low"]pure helper, per probe."high"iffper_file_errors == ()(including empty marker — clean empty state)."medium"iff items non-empty AND per_file_errors non-empty."low"iff items empty AND (per_file_errors non-empty OR marker absent OR catastrophic load failure). - New AC-17 — Per-file-error round-trip: each probe surfaces malformed sub-items (
adrsfile with no H1,repo_notesfile with IO error mid-stream,repo_configfile with malformed YAML frontmatter,policy.yamlnot a mapping,exceptions.yamlentry withexpires:not parseable as date) as a tuple of stable string codes inper_file_errors. The string codes are documented constants in the module (_REASON_*pattern fromsrc/codegenie/conventions/catalog.py:50-52). - New AC-18 — Registry annotation: parametrized test across the five probe modules asserts
entry.heaviness == "light"andentry.runs_last is Falsevianext(e for e in default_registry._entries if e.cls.name == "<probe_id>"). Replaces the original draft's nonexistent_PROBE_REGISTRYreferences. - New AC-19 — Sub-schema flat-path import: each probe's test imports
from importlib.resources import files; (files("codegenie.schema.probes") / "<probe_id>.schema.json")and asserts the slice JSON validates. Sub-schema authoring deferred to S6-08; this AC pins the consumer-side path so S6-08 missing a schema is loud. - New AC-20 — Extension-by-addition: adding a sixth marker probe
layer_d/skills_metrics.py(or any sibling name) requires zero edits to the five files in this story AND zero edits tosafe_yaml.py. The architectural test parametrizes MARKER_MODULES — adding a new module is one line in MARKER_MODULES, not five edits. - New AC-21 —
safe_yaml.loadschokepoint extension:src/codegenie/parsers/safe_yaml.pygainsdef loads(data: bytes, *, max_bytes: int, max_depth: int = 64) -> Mapping[str, JSONValue](single function; wraps_parse_one+assert_max_depth; reusesSizeCapExceededforlen(data) > max_bytesandMalformedYAMLError/DepthCapExceededfor the existing failure modes). One unit test covers happy-path, size-cap, depth-cap, top-level-non-mapping.__all__insafe_yaml.pygrows by one entry. - New AC-22 — Exceptions YAML format pin: top-level is a mapping with key
exceptions:and valuelist[ExceptionEntry]. A fixture with the legacy bare-list shape (- repo_glob: …) is loaded;safe_yaml.loadraisesMalformedYAMLError; the probe maps this toper_file_errors=("exceptions_yaml_not_mapping",)andconfidence="low". Mutation caught: any future regression to bare-list parsing would silently re-introduce the chokepoint bypass. - Implementation outline + TDD plan + GREEN snippet — rewrote all four (one per probe + the cross-cutting arch tests) to match the contract above. The
_parse_adr_text/_collect_headings/_extract_frontmatter_block/_partition_by_expiry/_compute_confidencepure helpers are extracted as module-level free functions, callable from tests without filesystem fixtures; the asyncrunis the imperative shell;_PROBE_ID: Final[ProbeId]module-level constants;name: str = "<id>"ABC attr; fullProbeOutputsix-field shape withduration_msviatime.perf_counter(); raw artifact written atomically toctx.output_dir / "<probe_id>.json". Files-to-touch table extended by one row (src/codegenie/parsers/safe_yaml.py) and one new test file (tests/unit/parsers/test_safe_yaml_loads.py).
Conflicts resolved¶
- Coverage C-1 / Test-quality T-1 / Consistency K-1 vs Design-patterns D-1 — all four pushed in the same direction (probe shape must match the kernel + three-state confidence + functional core). No conflict.
- Coverage C-7 vs Design-patterns D-7 — both argued for the
safe_yaml.loadsextension. D-7 added the "adapter contained insafe_yaml.py; noyaml.YAMLErrorleakage" constraint. Synthesized as AC-21. - Consistency K-12 (exceptions YAML format) vs
localv2.md §5.4 D6(bare list) — Consistency wins;safe_yaml's chokepoint discipline is post-localv2.md; the format pin is a Phase-2 refinement documented in Implementer notes. - Coverage C-3 (empty-fixture ACs for non-marker-absent cases) vs Rule 2 (Simplicity First — don't over-specify) — Coverage wins on the partial-success surface (it's a real edge case operators will hit); Rule 2 wins on declining to mandate fixtures for every imaginable combinatorial corner. Five empty-fixture sub-bullets added; combinatorial expansion declined.
Verdict¶
HARDENED. Thirty in-place edits applied. Story is now structurally consistent with the Phase-0 frozen Probe ABC, the Phase-1 safe_yaml chokepoint (with the loads extension surfaced as in-scope), the Phase-2 default_registry, and the S6-01 / S6-02 hardened story precedents for Layer-D probes.
The implementer who picks up this story can now:
1. Read src/codegenie/probes/base.py, src/codegenie/probes/registry.py, src/codegenie/parsers/safe_yaml.py, and S6-01 / S6-02 hardened stories.
2. Land safe_yaml.loads(bytes, *, max_bytes, max_depth=64) first (one function; one test file).
3. Land each of the five probes in any order — they share no kernel beyond Probe + safe_yaml; no inter-probe coupling means parallel-developable.
4. Copy S6-01 / S6-02's _make_context / _make_repo test helpers into tests/unit/probes/layer_d/conftest.py (the rule-of-three triggers when S6-04's external_docs probe arrives — this story is the third trigger, so the helpers extract NOW).
5. Verify the parametrized MARKER_MODULES arch tests catch a deliberate "extract a MarkerProbe base class" refactor attempt.
6. Run mypy --strict src/codegenie/probes/layer_d/ src/codegenie/parsers/safe_yaml.py clean.
No RESCUE outcome — the goal and ACs trace to the architecture and ADRs without contradiction once the kernel-contract drift is corrected.
Recommended next step¶
phase-story-executor to implement. Suggested ordering:
safe_yaml.loadschokepoint extension (AC-21) — unblocks RepoConfigProbe.tests/unit/probes/layer_d/conftest.pywith_make_context/_make_repohelpers — unblocks all five probes' tests.- Five probes in any order (recommend starting with
ADRProbe— simplest pure-parse shape;_parse_adr_textis the easiest pure helper to TDD). - Cross-cutting arch tests (
test_marker_probes_loc.py) last — they're parametrized over the final MARKER_MODULES set.