Validation report — S2-03 Reference TCCM + roundtrip integration exercising every Protocol method¶
Story: ../S2-03-reference-tccm-roundtrip.md
Validated: 2026-05-15
Validator: phase-story-validator (scheduled task: story-validation-corrector)
Verdict: HARDENED
Summary¶
The story ships a reference TCCM under docs/phases/02-context-gather-layers-b-g/_reference-tccm/tccm.yaml plus an integration-test file tests/integration/tccm/test_reference_tccm_roundtrips.py that closes Gap 1 from the phase arch (Adapter Protocols defined in S1-03, never invoked in Phase 2). The conceptual shape — exercise every DerivedQuery variant, dispatch each to a structurally-conformant mock implementing all four Phase 2 Protocols, assert every Protocol method is invoked — is correct and traces cleanly to arch §"Gap analysis" Gap 1, 02-ADR-0007 §Consequences, production ADR-0029, ADR-0030, and ADR-0032.
But the original draft had four block-tier defects caused by drift between the story's prescribed test code and the actually-shipped S1-04 modules:
- The fixture YAML used a function-call DSL form (
compute: dep_graph.consumers_of("@codegenie/scip")) that S1-04'sTCCMLoadernever parses — the live loader is a thinsafe_yaml.load → TCCM.model_validateshim with discriminator-form Pydantic variants only. _expected_tccm()constructedConsumersOf(name=..., max_files=...)etc. — neither field exists on any variant, andextra="forbid"would raise on every constructor call.- AC-7 asserted
err.reason == "unknown_query_primitive"butTCCMLoadErroris a marker with no.reasonattribute; the reason lives as a positionalargs[0]prefix. - References mixed ADR-0030's author-facing alias
scip.refswith the literal Pydantic discriminatorrefs_to.
Every test would have failed red on phase-story-executor's first attempt for non-design reasons. The executor's three-attempt loop would have burned attempts hunting parser bugs that don't exist.
A further dozen harden-tier gaps were closed: missing confidence_floor round-trip coverage for Trusted and Unavailable, missing frozen=True / extra="forbid" enforcement, missing per-variant JSON round-trip identity, over-broad pytest.raises((AssertionError, TypeError, ValueError)), multi-defect invalid fixture, missing mypy --strict scope on the test directory, missing Protocol-typed _dispatch parameters (the actual signature-drift closer for Gap 1), unused MagicMock import, redundant kind= kwargs on Trusted / Degraded / Unavailable, set-equality where multiset-equality belongs, and the missing inverse coverage ratchet (mypy --warn-unreachable on a synthetic deleted-arm copy of _dispatch).
Three design-pattern strengthenings: introduced _ProtocolMethod(StrEnum) as the typed nine-method surface (recorder + assertions both consume it — typo discipline); preserved mock-class duplication explicitly (deleted in Phase 8, not worth a mixin); flagged Protocol-side primitive obsession (pkg: str, module: str, symbol: str) for a future Phase-3-entry amendment ADR (out of S2-03's scope).
Stage 3 research skipped — no NEEDS RESEARCH findings. Every gap was answerable from arch + ADRs (02-ADR-0007, ADR-0029, ADR-0030, ADR-0032, ADR-0033) + verified live source (src/codegenie/tccm/{queries,loader,model}.py, src/codegenie/result.py, src/codegenie/errors.py, src/codegenie/adapters/{protocols,confidence}.py, src/codegenie/types/identifiers.py).
Twelve ACs original → fifteen ACs after hardening (AC-3b, AC-3c, AC-3d, AC-4b, AC-7b, AC-11b, AC-13 added; AC-1, AC-2, AC-3, AC-4, AC-6, AC-7, AC-9 reworded). Implementation outline §1 / §2 / §4 / §5 rewritten. Files-to-touch table grew from 6 entries to 13. Notes-for-implementer gained a "Design patterns" subsection. Story is now ready for phase-story-executor.
Context Brief (Stage 1)¶
Story snapshot¶
- Goal as written: Ship a reference TCCM at
docs/phases/02-context-gather-layers-b-g/_reference-tccm/tccm.yaml(one illustrative manifest exercising every TCCM field + everyDerivedQueryvariant) +tests/integration/tccm/test_reference_tccm_roundtrips.py(loads viaTCCMLoader, asserts equality with hand-constructed Pydantic, dispatches each variant to a mock implementing all four Phase 2Protocols, asserts every Protocol method is invoked at least once — the in-Phase-2 anchor for Gap 1 / signature-drift discovery). - Non-goals: Real adapter implementations (Phase 3), Bundle Builder (Phase 8), per-task-class second TCCM (Phase 3 ships first plugin TCCM),
SkillsLoader.find_applicable(...)(S2-01 owns; this story names skills as data only),tests/integration/adapters/test_phase3_handoff_smoke.py(S7-04 cross-phase trip-wire).
Phase 2 exit criteria touched¶
- Gap 1 closed in-phase (arch §"Gap analysis" — Adapter Protocol drift). ✓
- Kernel scaffolding only, no plugin loader (02-ADR-0007). ✓ (fixture lives under
docs/, notplugins/). safe_yaml.loadchokepoint preserved (Phase 1 ADR-0006). ✓ (TCCMLoader routes through it; AC-9 smokes it independently).- Five-primitive
DerivedQuerytaxonomy exhausted (production ADR-0030). ✓ - Discriminated unions +
assert_neverratchet (ADR-0033 §3–4, arch §"Agentic best practices"). ✓ (AC-6 + AC-13 inverse ratchet). - Open/Closed at the Protocol surface for adding a fifth Protocol or sixth method. ✓ (
_ProtocolMethodStrEnum is the single-place edit point).
Load-bearing commitments touched¶
- CLAUDE.md §"No LLM anywhere in the gather pipeline" — all loaders are deterministic. ✓
- CLAUDE.md §"Honest confidence" —
AdapterConfidencethree-variant union with reason strings; AC-3b verifies all three round-trip. ✓ - CLAUDE.md §"Extension by addition" — sixth
DerivedQueryvariant adds acasearm + new enum value + new YAML entry; mypy--warn-unreachableand the AC-4 enum-iteration both fire on omission. ✓ - CLAUDE.md §"Conventions to follow" —
.codegenie/namespace N/A here (test artifact only). - 02-ADR-0007 §Decision — fixture under
docs/, notplugins/; AC-8 walk asserts. ✓ - ADR-0029 — TCCM schema; AC-3 + AC-3b/c/d round-trip every field. ✓
- ADR-0030 — five derived-query primitives; AC-2 multiset assertion. ✓ (Author-facing alias
scip.refsreconciled with literalrefs_totoken in story prose.) - ADR-0032 — four
Protocols; AC-4 + AC-4b enforce invocation + Protocol-typed dispatch signatures. ✓ - ADR-0033 — newtypes for identifiers (
SkillId,ProbeId,TaskClassId);_expected_tccm()constructors now use them. ✓
Sibling-family lineage¶
- Third Phase-2 integration test that round-trips a typed Pydantic model under
docs/(after S1-04's TCCMLoader unit tests and S2-02's catalog-loader integration tests). Convention codified: the in-phase reference fixture lives underdocs/phases/{phase}/_reference-{thing}/, single-defect siblings under_invalid/, variant-coverage siblings under_floors//_variants/. Future fixtures should match this layout. - Fourth consumer of the sum-type + exhaustive
match+assert_neverratchet — after S1-01 (StaleReason), S1-04 (_classify/LoaderReason), andAdapterConfidenceconsumers. Pattern is now load-bearing. - First test in the codebase to use
mypy --warn-unreachableas an inverse coverage ratchet via subprocess (AC-13). Extension-by-addition forcing function — sixthDerivedQueryvariant cannot land green without acasearm. - Rule-of-three threshold for shared dispatcher kernel: NOT REACHED. One dispatcher in this phase; Phase 8's Bundle Builder is the production dispatcher (different shape, different concerns). Mocks-as-duplicate-classes is the right call (Rule 3 + intentional-deletion lifecycle).
Goal-to-AC trace¶
- AC-1 → goal: STRENGTHENED (allow
README.md, codify_invalid/+_floors/siblings). - AC-2 → goal: STRENGTHENED (
Counter-style multiset, explicitlen == 5— was: set-equality only). - AC-3 → goal: BLOCK-FIXED (corrected constructors against live
queries.py; correctedResult.Oknamespacing to top-levelOk; addedisinstance(result, Ok)assertion). - AC-3b → goal: ADDED (every
AdapterConfidencevariant round-trips forconfidence_floor). - AC-3c → goal: ADDED (
frozen=True+extra="forbid"enforced). - AC-3d → goal: ADDED (per-variant JSON round-trip identity).
- AC-4 → goal: STRENGTHENED (string-literal-fan replaced by typed
_ProtocolMethodenum iteration). - AC-4b → goal: ADDED (Protocol-typed
_dispatchparameters — the actual Gap-1 signature-drift closer). - AC-5 → goal: unchanged (already strong).
- AC-6 → goal: STRENGTHENED (
pytest.raises(AssertionError)only; previous(AssertionError, TypeError, ValueError)umbrella hid bugs). - AC-7 → goal: BLOCK-FIXED (
err.args[0].startswith("unknown_query_primitive:")—TCCMLoadErrorhas no.reason). - AC-7b → goal: ADDED (parametrized over
parse:/schema:/unknown_query_primitive:taxonomy). - AC-8 → goal: unchanged (already strong).
- AC-9 → goal: REWORDED (smoke, not parser test).
- AC-10 → goal: unchanged (already strong).
- AC-11 → goal: unchanged.
- AC-11b → goal: ADDED (
mypy --strictscoped to test directory). - AC-12 → goal: unchanged.
- AC-13 → goal: ADDED (inverse coverage ratchet via subprocess
mypy --warn-unreachable).
Stage 2 — Critic findings¶
Critic A — Coverage¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| C1 | harden | AC-4 only asserts "every method called at least once" — traces to Gap 1's letter, not its spirit (Gap 1 worries about signature drift). A test calling dep.consumers("@x") passes against either signature. |
Applied. Added AC-4b: _dispatch parameters typed against the Protocols (not the mocks); mypy validates call sites against Protocol signatures. AC-11b scopes mypy --strict to tests/integration/tccm/. |
| C2 | block | confidence_floor only exercises Degraded — Trusted and Unavailable discriminator paths could ship broken. |
Applied. Added AC-3b: three _floors/ fixtures + parametrized round-trip test. |
| C3 | harden | No AC verifies frozen=True / extra="forbid" invariants on the loaded TCCM. Arch §"Data model" requires both. |
Applied. Added AC-3c: mutation raises ValidationError; extra top-level key returns Err with schema: prefix. |
| C4 | harden | _invalid/ coverage thin: only unknown_compute.yaml. Realistic on-disk failures include malformed YAML, missing required field, unknown top-level key. |
Applied. Added three sibling fixtures + AC-7b parametrization across the full LoaderReason taxonomy. |
| C5 | nit | AC-9 (safe_yaml.load) duplicates S1-04 cheaply — keep but reframe as fixture-validity smoke. |
Applied. AC-9 reworded. |
| C6 | block | No AC for mypy --strict on the test file itself. Without it, AC-4b's Protocol-typed signatures are just comments. |
Applied. Added AC-11b. |
| C7 | harden | AC-6's pytest.raises((AssertionError, TypeError, ValueError)) is over-defensive — typing.assert_never raises AssertionError deterministically (3.11+). |
Applied. Narrowed to pytest.raises(AssertionError). |
| C8 | nit | AC-1 says directory holds "only the reference TCCM and _invalid/" — contradicts AC-10 (mandates README.md). |
Applied. AC-1 reworded to allow README.md, _invalid/, _floors/. |
| C9 | harden | No JSON round-trip identity per variant — discriminator-tag-name mismatches could ship uncaught before Phase 3. | Applied. Added AC-3d. |
| C10 | nit | No escape-hatch hedges in ACs — clean. | No action. |
Critic B — Test Quality¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| T1 | block | _expected_tccm() uses constructor args (name, max_files) that don't exist on any variant in queries.py; extra="forbid" raises. |
Applied. Dropped from _expected_tccm() and YAML; constructors now match live model. |
| T2 | block | YAML compute: dep_graph.consumers_of("@x") form does not match the actual loader (no DSL parser; compute: Literal[...] discriminator + flat payload field only). |
Applied. YAML rewritten to discriminator-form compute: consumers_of + pkg: "@x". |
| T3 | block | TCCMLoadError(reason=...) constructor and err.reason access don't work — TCCMLoadError is a marker; reason is positional args[0] prefix. |
Applied. AC-7 asserts err.args[0].startswith("unknown_query_primitive:"). |
| T4 | harden | Result.Ok / Result.Err not callable as namespaced constructors — Ok / Err are top-level exports. |
Applied. Story prose mentions corrected; test imports from codegenie.result import Ok. |
| T5 | harden | AC-6 pytest.raises((AssertionError, TypeError, ValueError)) over-broad. |
Applied. Narrowed (cf. C7). |
| T6 | harden | AC-3 equality could pass under near-wrong implementations — add property-style round-trip TCCM.model_validate(tccm.model_dump()) == tccm. |
Partially applied via AC-3d (per-variant JSON round-trip — stronger than dict round-trip for discriminator coverage). |
| T7 | harden | MagicMock import is dead; ruff would fail AC-11. |
Applied. Dropped. |
| T8 | harden | Mock confidence() returns Trusted(kind="trusted") — kind is defaulted; explicit pass is misleading. |
Applied. All Trusted() / Degraded(reason=...) / Unavailable(reason=...) calls. |
| T9 | nit | _dispatch itself doesn't invoke confidence() — production invariant Phase 8 will need. |
Deferred — Phase 8's Bundle Builder owns that. Notes-for-implementer flags _dispatch is test-only. |
| T10 | nit | _Imposter __match_args__ distraction — Pydantic-model case arms match by class identity, not __match_args__. |
Applied. Removed __match_args__ = (); added comment explaining why an arbitrary object falls through. |
| T11 | nit | AC-2 set equality misses multiset semantics — [Consumers, Producers, Reverse, Refs, Tests, Tests] would fail set check by length only. |
Applied. Counter-style multiset + explicit len == 5. |
Critic C — Consistency¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| CO1 | block | YAML compute: strings do not parse under S1-04's loader. |
Applied (cf. T2). |
| CO2 | block | Story line ~35 lists scip.refs (ADR-0030 alias) instead of refs_to (literal token shipped by S1-04). |
Applied. References section reworded; alias documented as alias only. |
| CO3 | block | _expected_tccm() constructors include name= / max_files= fields that don't exist. |
Applied (cf. T1). |
| CO4 | confirmed | confidence_floor: AdapterConfidence correct in story prose; mock Trusted(kind="trusted") redundancy harmless. |
No action (T8 fixed the redundancy anyway). |
| CO5 | harden | AC-7 invalid fixture has extras — multi-defect; couples to Pydantic v2 error-emission order. | Applied. Single-defect: dropped name/max_files from the invalid fixture. |
| CO6 | harden | Result.Ok(value=...) namespacing in story prose — Ok / Err are top-level exports. |
Applied (cf. T4). |
| CO7 | block | TCCMLoadError(reason="unknown_query_primitive") and err.reason don't work. |
Applied (cf. T3). |
| CO8 | harden | Depends on: S2-01 is misleading — story does not call SkillsLoader; required_skills is data only. |
Applied. Header now lists S1-04 + S1-05 (SkillId newtype). Out-of-scope §5 already documented this. |
| CO9 | nit | Over-broad pytest.raises umbrella. |
Applied (cf. C7). |
| CO10 | confirmed | Reference path matches 02-ADR-0007 §Consequences ¶5. | No action. |
Critic D — Design Patterns¶
| ID | Severity | Finding | Resolution |
|---|---|---|---|
| DP1 | harden | The mypy --warn-unreachable claim is documentation, not enforcement — no AC asserts that removing a case arm makes mypy fail. |
Applied. Added AC-13: subprocess-driven inverse coverage ratchet via synthetic deleted-arm fixture. |
| DP2 | harden | Primitive obsession on Protocol method names — nine string literals in the recorder, nine in the assertions; typos silent. | Applied. Introduced _ProtocolMethod(StrEnum); recorder + assertions both consume it. |
| DP3 | harden | The fan of nine assert called(...) lines IS the "Protocol coverage map" the original Refactor §6 forbid — but uncentralized. The right form is the StrEnum (declarative typed surface), not nine independent asserts. |
Applied. Test body now iterates the StrEnum; single missing-method failure with named method. Refactor §6 reframed in Notes-for-implementer. |
| DP4 | nit | Four sibling mocks share __init__ + confidence() shape — three-strikes fires on its face. |
Resisted (Rule 3 + Phase-8 deletion lifecycle). Notes-for-implementer flags it explicitly. |
| DP5 | nit | Newtype opportunity (PackageName, ModulePath, SymbolName) on Protocols — primitive obsession at the Protocol boundary. |
Deferred to Phase-3-entry amendment ADR. Notes-for-implementer flags it. |
| DP6 | nit | Registry pattern (@register_dispatcher_for(...)) premature — Phase 8 owns production dispatcher. |
No action. Notes-for-implementer reinforces "do not export". |
| DP7 | nit | _dispatch mutates mocks — pure version would obscure intent (side effect IS the assertion target). |
No action. Notes-for-implementer documents. |
Stage 3 — Research¶
Skipped. No NEEDS RESEARCH findings. Every gap was answerable from arch + ADRs + verified live source code (src/codegenie/tccm/, src/codegenie/result.py, src/codegenie/errors.py, src/codegenie/adapters/, src/codegenie/types/identifiers.py).
Stage 4 — Edits applied¶
The story file was edited in place. All edits are surgical and preserve the original structure:
- Header —
Status: Ready (HARDENED 2026-05-15);Depends on:corrected to S1-04 + S1-05 (was: S1-04 + S2-01 misleading). - Validation notes block appended after the header — full edit log with critic IDs.
- References → Production ADRs → ADR-0030 line — reworded to use the literal
computetokens (refs_to) withscip.refsdocumented as ADR-0030's author-facing alias. - Goal §1 last bullet — explicit "no
name, nomax_files" callout. - Goal §2 first sub-bullet (
test_reference_tccm_loads_and_equals_expected_pydantic_instance) —Result.Ok→Ok(top-level export) clarified. - Goal §2 fifth sub-bullet (
test_unknown_compute_primitive_returns_typed_err_prefix) — corrected toargs[0]prefix; renamed test for accuracy. - Acceptance criteria — fifteen ACs replacing twelve. Block-tier rewrites on AC-3, AC-7. New ACs: AC-3b, AC-3c, AC-3d, AC-4b, AC-7b, AC-11b, AC-13. Reword: AC-1, AC-2, AC-4, AC-6, AC-9.
- Implementation outline §1 — YAML rewritten to discriminator-form (no DSL strings, no
name/max_files). - Implementation outline §2 — invalid fixtures restructured: single-defect
unknown_compute.yaml+ three new siblings (malformed.yaml,missing_required_probes.yaml,extra_top_level_key.yaml) + three_floors/siblings. - Implementation outline §4 — full test-file rewrite:
_ProtocolMethod(StrEnum)introduction, Protocol-typed_dispatchsignature,Trusted()/Degraded(reason=...)/Unavailable(reason=...)constructors,Counter-style multiset, parametrized fixtures,args[0]prefix assertions, narrowpytest.raises(AssertionError),MagicMockimport dropped. Sibling AC-13 ratchet test referenced. - Implementation outline §5 — sanity-check command rewritten to verify the live two-field shape.
- Files to touch — grew from 6 entries to 13 (incl. seven new fixtures + AC-13 ratchet test).
- Notes for the implementer — appended "Design patterns" subsection covering the StrEnum, the inverse-ratchet, the deliberate mock duplication, the deferred newtype amendment, and the functional-core-vs-imperative-shell rationale.
Verdict — HARDENED¶
The story is now ready for phase-story-executor. Every block-tier defect has been corrected against live source code. Every harden-tier gap has been closed with an AC + a corresponding test in the prescribed test file. Three design-pattern strengthenings preserve the original "extension by addition" intent without introducing premature abstraction.
A phase-story-executor run against this hardened story should:
- Land 11 named tests (8 standalone + 3 parametrized expansions = 17 actual
pytestIDs oncepytest --collect-onlyruns) red on the first commit because the YAML fixtures don't exist. - Author the seven YAML fixtures + README.md.
- Watch all tests turn green without YAML hand-tuning, because the
_expected_tccm()constructors and YAML now agree withqueries.py. - Land the AC-13 ratchet test in a sibling file.
- Pass
ruff check,ruff format --check,mypy --strict,mypy --strict tests/integration/tccm/,mypy --warn-unreachable.
Estimated executor attempts: 1 (single attempt happy-path).