Cross-story lessons — Phase 02 stories¶
Append-only. Each entry: lesson · source story · how to apply it on the next attempt.
L1 — uv.lock parity test trips on any new [project.optional-dependencies] entry¶
- Source: S1-01 (added
hypothesisto[dev]). - Symptom:
tests/unit/test_makefile_targets.py::test_uv_lock_is_in_lockstep_with_pyproject_dep_setfails withmissing packages: ['<new-pkg>']. - Fix: After editing
pyproject.toml's dep set, runuv lockand commit the regenerateduv.lockin the same commit. - Why it matters: Phase 2 stories add ≥ 7 new packages (per phase-arch §Step 1). Forgetting
uv lockwill break every story's CI until the nextuv lockships.
L2 — Ruff's UP007 + UP017 rewrite Pydantic-discriminator-union sources¶
- Source: S1-01.
- Symptom: Story TDD-plan code blocks use
Union[X, Y]andtimezone.utcliterally; ruff auto-rewrites toX | Yanddatetime.UTC(rulesUP007,UP017are enabled at[tool.ruff.lint] select). - Fix: Apply ruff
--fixafter dropping in the story's prescribed code; the rewrites are safe under Pydantic v2 discriminated unions and Python 3.11+. Don't fight the linter; the codebase convention wins. - Why it matters: Saves a round-trip on every Phase 2 story that copy-pastes story-prescribed type-alias blocks.
L3 — Stale local venv hides missing [dev] extras¶
- Source: S1-01 (
hypothesisandimport-linterwere both missing locally). - Symptom:
pytestcollection fails onimport hypothesis;lint-importsconsole script not on PATH. - Fix: Before starting a story that adds a new tool to
[dev], run.venv/bin/pip install -e .[dev](oruv sync --extra dev) so the venv reflects the pyproject. The CI fence catches the gap, but local cycles compound. - Why it matters: Every Phase 2 story that uses Hypothesis (per phase-arch §"Testing strategy") or a new linter will hit this.
L4 — @runtime_checkable isinstance silently mutates target classes' __dict__¶
- Source: S1-03 (adapter Protocols +
test_no_phase2_module_implements_adapter_protocol_dynamic). - Symptom:
tests/unit/test_errors.py::test_subclasses_are_markers_onlyfails withMalformedJSONError declares extra class attributes … '__annotations__'. Reproduces only whentests/unit/adapters/test_protocols.pyruns first. - Fix: Any pkgutil-style walk that ends in
isinstance(inst, AnyRuntimeCheckableProtocol)MUST filter out classes that cannot possibly satisfy the protocol before the isinstance call. For adapter-tier protocols,issubclass(cls, BaseException)is the right pre-filter — exceptions cannot satisfyconfidence()/consumers()/etc., and skipping them sidesteps the side effect entirely. - Why it matters: Phase 2's S4-01/02 will revisit dynamic protocol-conformance scans (
AdapterConfidenceconsumers inIndexHealthProbe-adjacent code). The same trap fires there.
L6 — Test subpackage name collides with stdlib (types, pathlib, …) → leave it without __init__.py¶
- Source: S1-05 (
tests/unit/types/for the ADR-0033 newtypes). - Symptom:
pytest tests/unit/types/errors withModuleNotFoundError: No module named 'types.test_identifiers'; 'types' is not a package. The stdlibtypesmodule wins the lookup when__init__.pyis present. - Fix: Do not add
__init__.pyto test subpackages whose name shadows a stdlib module. Pytest's rootdir-relative collection works without it (every other test subpackage in this repo has__init__.pyonly because its name doesn't collide). - Why it matters: Any future Phase-2 test subpackage named after a stdlib module (
types,string,array,json, …) will hit the same trap. The fix is structural; converting the test file imports to absolute paths does not avoid the collision because pytest registers the package itself.
L7 — Surgical type-alias backfills are the right fix when a Phase-2 story imports a Phase-1-owned name that doesn't exist yet¶
- Source: S1-05 (
PackageManagerreferenced by AC-2/AC-5 but not previously exported bycodegenie.probes.node_build_system). - Symptom:
from codegenie.probes.node_build_system import PackageManagerfails at import time; story tests don't even reach AC-1. - Fix: Add a one-line
PackageManager: TypeAlias = Literal["bun","pnpm","yarn-classic","yarn-berry","npm"]to the Phase-1 module that owns the concept (node_build_system.py), mirroring the JSON schema enum. Do not redefine the alias insidecodegenie/types/— that violates AC-4 and production ADR-0033. Leave existingstr | Nonesignatures untouched (Literal | Noneis a structural subtype ofstr | None); the cleanup belongs to a separate Phase-1 ADR amendment. - Why it matters: S1-04 (TCCM model loader) and S1-02 (freshness registry) both import
IndexNamefrom this story. Any future Phase-2 story that references a Phase-1-owned typed name needs the same surgical-backfill pattern, not a kernel-tier redefinition.
L8 — Source-file docstrings wrap at column-80; substring assertions must normalize whitespace¶
- Source: S1-06 (
test_exec_module_docstring_phase2_present). - Symptom:
assert "ten Layer B/C/G tools" in codegenie.exec.__doc__fails even though the docstring contains the phrase, because Python preserves the source-file\nbetween "B/C/G" and "tools". - Fix: Before the substring check, normalize whitespace with
" ".join(doc.split()). The pretty-printed phrase you'd read in Sphinx output is the right mental model — but__doc__is the raw source text. - Why it matters: Any future docstring-pin AC (S1-07's
run_external_cli, S5-02'sRuntimeTraceProbe, etc.) will hit this trap. The fix is structural; pinning shorter substrings just delays the trap until the phrase grows.
L9 — ADR meta-tests that ban a historical string need rewording in all legacy references¶
- Source: S1-06 (
test_adr_0001_enumerates_all_new_binaries). - Symptom: Pass-2 AC-2's "the string
eight new entriesis absent" assertion failed because (a) the Amendment paragraph historically narrated "the original Decision named eight…" and (b) two §Consequences bullets still said "eight named entries" / "eight new entries" — drift from the original eight-binary scope. - Fix: When the meta-test forbids a substring, rewrite every legacy occurrence in the ADR, not just the Decision paragraph. Use semantically equivalent phrasings ("the original Decision named only the named-trigger binaries", "by the ten named entries") rather than count words.
- Why it matters: Phase 2 has 10 ADRs; every one will eventually have an amendment whose old wording lives elsewhere in the file. The meta-test pattern is sound, but the executor must
grepthe ADR for all legacy phrasings before declaring green.
L10 — mock.AsyncMock.await_args is typed _Call | None; mypy strict requires explicit assert ... is not None¶
- Source: S1-06 (Pass-2 spawn-spy refactor).
- Symptom:
mypy --stricterrors withItem "None" of "_Call | None" has no attribute "kwargs"onspy.await_args.kwargs["env"]. - Fix: Insert
assert spy.await_args is not Noneimmediately before the.kwargsaccess. Doubles as a regression guard — if the spy was never awaited, the test was wrong. - Why it matters: Every Phase 2 test asserting on
spy.await_args(S1-07'srun_external_clitests, S5-02'sdocker/stracespawn assertions, etc.) needs the sameassert ... is not Noneline. Family-precedent files get away without it because their mypy scope is older.
L5 — Pytest collects any Test* class imported into a test file's namespace¶
- Source: S1-03 (
TestInventoryAdapterraisedPytestCollectionWarning: cannot collect test class … has a __init__ constructor). - Symptom: Importing
TestInventoryAdapter(or anyTest*-named class) intotests/**/*.pywarns at collection time even when no test instantiates it. - Fix: Alias on import in the test file —
from codegenie.adapters import TestInventoryAdapter as InventoryAdapter. Do not set__test__ = Falseon the Protocol class (either inside or after the body): on Python 3.11 (CI), assigning__test__post-class-body adds it to_get_protocol_attrs, breakingisinstancefor any stub that doesn't declare__test__. CI was green on Python 3.13 (local) and red on Python 3.11 (CI) — the_get_protocol_attrsimplementation diverges across minor versions. - Why it matters: Any future domain type that begins with
Test(e.g.,TestMatrix,TestSuiteId) needs the import-alias fix in test files, not a runtime mutation of the source class. Setting__test__on a Protocol is a portability trap.
L11 — Story-sketch RED test that decorates inside with pytest.raises leaks UnboundLocalError¶
- Source: S1-02 (AC-3 duplicate-name rejection test).
- Symptom:
with pytest.raises(...): @reg.register(name) def check_b(...): ...— the decorator raises before bindingcheck_b, then the post-raisesassertionf".{check_b.__qualname__}" in msgblows up withUnboundLocalError: cannot access local variable 'check_b'. - Fix: Define the function above the
withblock (def check_b(...): ...), then call the decorator imperatively inside (reg.register(name)(check_b)). The decorator-time semantics survive (qualname still includes the test function as enclosing scope) and the post-block reference is bound. - Why it matters: Phase 2's other registry stories (S1-10
depgraphregistry; the post-S4 freshness-source registrations) will copy the same duplicate-name pattern. Apply the imperative-call fix on first pass.
L12 — Phase-N exception-marker additions cascade into two __all__-closure tests¶
- Source: S1-02 (
FreshnessRegistryErroraddition). - Symptom: Two pre-existing test files break the moment a new marker lands in
codegenie.errors: tests/unit/test_errors.py::test_all_closure_pins_public_surface(pins the EXACT__all__set againstEXPECTED_SUBCLASSES).tests/unit/test_errors.py::test_every_subclass_has_raise_site_docstring(requires the docstring to name aDOCUMENTED_MODULE_SLUGSentry).- Fix: Add
PHASE_N_NEW = {...}andEXPECTED_SUBCLASSES = ... | PHASE_N_NEWintests/unit/test_errors.py; add the new module's directory slug ("indices"for S1-02;"depgraph"for S1-10;"tccm"for S1-04; etc.) toDOCUMENTED_MODULE_SLUGS. - Why it matters: Every Phase-2 marker addition (S1-04
TCCMLoadError, S1-10DepGraphRegistryError, S2-01SkillsLoadError, S2-02ConventionsError, S5-01 / S6-06 outcome-types markers, etc.) will trip both. Plan the test edit alongside the source edit.
L13 — Re-export of new package surface narrows existing equality assertions on __all__¶
- Source: S1-02 (
codegenie.indices.__init__extended with the registry surface). - Symptom:
tests/unit/indices/test_freshness.py::test_all_exports_full_variant_setassertedset(m.__all__) == {<S1-01 names>}; equality fails the moment any sibling story extends the package. - Fix: Loosen the assertion to a subset on the story-specific names (
s1_01_names <= set(m.__all__)). The new sibling story checks its own surface independently. Keeps each story's__all__AC scoped to its own deliverable. - Why it matters: Same trap will fire when S1-04 (TCCM model + queries + loader) re-exports from
codegenie.tccm.__init__, and again at S5-01 (scenario_result+scanner_outcome). Narrow each per-story assertion to a subset on the story-owned names.
L14 — Annotated[Union[Generic[T], …], Field(...)] aliases are NOT subscriptable¶
- Source: S1-04 (
Result[T, E] = Annotated[Ok[T] | Err[E], Field(discriminator="kind")]). - Symptom:
TypeAdapter(Result[int, str])raisesTypeError: typing.Annotated[…] is not a generic class. Python 3.13'stypingmodule rejects subscripting anAnnotated[Union]alias even when both branches areGeneric. Tests that tryResult[int, str]to round-trip a concrete pair will not collect. - Fix: Drop the subscript. Round-trip via the concrete classes directly:
Ok[int].model_validate_json(Ok[int](value=42).model_dump_json()). Pydantic preserves the generic argument internally for validation; the discriminator-based polymorphic decode is exercised byTypeAdapter(Result)(unsubscripted) when needed. - Why it matters: Every Phase-2 loader story that consumes
Result(S2-01SkillsLoader.load, S2-02ConventionsCatalogLoader.load, future S6-04 external-docs loaders) will be tempted to writeTypeAdapter(Result[Skill, SkillsLoadError])for tests. It will fail at collection time. The right pattern is concrete-variant round-trip plus a separateResultpolymorphic decode test (unsubscripted adapter, manualisinstancecheck).
L15 — Alias-on-import for Test* types: the alias name matters, not just the act of aliasing¶
- Source: S1-04 (
TestsExercisingPydantic class). - Symptom: Aliased as
from codegenie.tccm import TestsExercising as TestsExercisingQuery— pytest still emittedPytestCollectionWarning: cannot collect test class 'TestsExercising' …. The collector's regex is on the local name in the module's namespace;TestsExercisingQuerystill starts withTest. - Fix: Choose a non-
Test-prefixed alias name. For S1-04 the chosen alias wasExerciseTestsQuery. Stage 1's L5 lesson recommended aliasing — extend it: the alias name must also fail pytest'sTest*regex. - Why it matters: Any Phase-2 domain type whose name starts with
Test(TestMatrix,TestSuiteId,TestRunner,TestVerdict,TestExercising-family) needs both the alias and a non-Test-prefixed alias name. The cost of getting it wrong is a spurious warning that does not fail CI but produces noise on every story that imports the class. Get it right on the first commit.
L16 — Pydantic v2 union_tag_invalid puts the discriminator name in ctx, not loc¶
- Source: S1-04 (
TCCMLoader._classifytranslation table). - Symptom: First
_classifyimplementation checkedloc[-1] == "compute"per the story sketch. Pydantic 2.13 actually returnsloc = ('derived_queries', 0)(the list index, not the discriminator field name) onunion_tag_invalid. The discriminator identity lives inctx['discriminator']as the literal string"'compute'"(yes, with embedded single-quotes — Pydantic renders the field name with quotes insidectx). Twounknown_query_primitivetests failed under the initial implementation. - Fix: When
type == "union_tag_invalid", checkctx.get("discriminator") in {"'compute'", "compute"}. Keep theliteral_errorarm checkingloc[-1] == "compute"as a defensive fallback for the rare codepath where Pydantic emitsliteral_errorinstead ofunion_tag_invalid. Pin both behaviors in a docstring as the public translation contract — Rule 12: if a Pydantic upgrade breaks AC-8, fix the translation, not the test. - Why it matters: Every future Phase-2 sum-type loader (
SkillsLoader.loadreturningResult[Skill, SkillsLoadError]with the samekinddiscriminator, S5-01scanner_outcome, S6-06 curated-scanner outcomes) will need the same translation pattern. The shape of Pydantic's error dict is the load-bearing contract; do not infer from the story sketch alone.
L17 — Annotated[Union, ...] aliases composed of generic models break unwrap()/unwrap_err() mypy signatures¶
- Source: S1-04 (
Ok.unwrap_errandErr.unwrapmypy errors). - Symptom:
mypy --strictreportserror: A function returning TypeVar should receive at least one argument containing the same TypeVar [type-var]on always-raises methods that declared-> Eor-> T. - Fix: Methods that always raise should declare
-> NoReturn. Semantically accurate (the function never returns) and mypy-clean. - Why it matters: Any future
Result-shaped API where one variant lacks a value will have always-raises helpers.NoReturnis the right type; do not work around by adding fake-> objector-> Anyreturns.
L18 — add_multi_constructor callbacks receive 3 args, not 2¶
- Source: S1-04 (
_safe_load_mkdocshelper for parsingmkdocs.yml's!!python/name:tags). - Symptom:
TypeError: _ignore_python_name() takes 2 positional arguments but 3 were givenwhen calling_MkdocsLoader.add_multi_constructor("tag:yaml.org,2002:python/name:", fn)withfn(loader, node). - Fix: The callback signature is
(loader, tag_suffix, node) -> Any. PyYAML's multi-constructor passes the tag suffix (everything after the prefix) as the middle argument; this is documented but easy to overlook. - Why it matters: Any future probe / loader that needs to register a YAML constructor for a tag family — including S6-04 external docs loaders that might encounter custom MkDocs/MkDocs-Material tags — needs the 3-arg signature. Single-tag
add_constructoris 2-arg, so the trap is the multi-vs-single split.
L19 — mypy --strict narrows sys.platform via Literal; indirect through a helper to avoid unreachable¶
- Source: S1-07 (
_maybe_wrap_with_bwrap). - Symptom:
src/codegenie/exec.py:448: error: Statement is unreachable [unreachable]— mypy on darwin narrowssys.platformto the literal"darwin", soif not sys.platform.startswith("linux"): ... return ...is "always taken" at type-check time, making the post-returnif shutil.which("bwrap") is None:block unreachable. - Fix: Indirect through
def _platform_is_linux() -> bool: return sys.platform.startswith("linux"). Mypy cannot narrow through function boundaries, so the post-check branch is reachable. Runtime semantics are identical andmonkeypatch.setattr(sys, "platform", "darwin")still works (the helper readssys.platformat call time). - Why it matters: Phase 5 (microVM, network namespaces) and any future Layer-C probe (S5-02
RuntimeTraceProbe, S5-05RuntimeTraceFreshness) that branches onsys.platformundermypy --strictwill hit the same trap. Reach for the helper on first cycle.
L20 — codegenie.exec cannot top-level import codegenie.types.identifiers (circular)¶
- Source: S1-07 (added
ProbeIdannotation torun_external_cli). - Symptom:
ImportError: cannot import name 'PackageManager' from partially initialized module 'codegenie.probes.node_build_system' (most likely due to a circular import). Chain:codegenie.exec→codegenie.types.identifiers(re-exportsPackageManager) →codegenie.probes.node_build_system(importscodegenie.exec). - Fix: Put the import under
TYPE_CHECKING.from __future__ import annotationsis already inexec.py, so the deferred annotation is a string at runtime — no actual lookup happens. Only valid when the imported name is used only in annotations (probe_name: ProbeId); if you ever need to callProbeId(...)insideexec.py, this trick fails and you must restructure. - Why it matters: Any future
exec.pyextension that wants strict-mypy on a kernel-tier newtype identifier (Phase 2 S5-02'sRuntimeTraceProbe, Phase 5's microVM wrapper) needs the sameTYPE_CHECKINGpattern. The S1-05 newtype family lives in a module that itself transitively pullsexec.py; this is structural.
L21 — Adding a new runtime dep is a 4-file commit, not a 1-file commit¶
- Source: S1-10 (
networkxjoined the runtime closure). - Symptom: Adding
networkx>=3.2to[project].dependenciesbroke three tests at once:test_runtime_dependencies_are_exactly_adr_0006_closure(pins the closure as a frozenset),tests/unit/depgraph/test_registry.pystrict-mypy run (networkxlackspy.typed), anduv.lock-parity (L1). - Fix: Land all four edits in the same commit: (a)
pyproject.toml [project].dependencieswidening, (b)tests/unit/test_packaging.py RUNTIME_DEPSfrozenset widening with a rationale comment, (c)[[tool.mypy.overrides]]entry withignore_missing_imports = truefor packages withoutpy.typed, (d)uv lockregen. - Why it matters: Phase 2's remaining stories add several runtime deps (S4-03 grammars lockfile, S4-04 tree-sitter, S5-04 SBOM/CVE scanners). Each will trip the same quadruple. The frozenset-pin in
test_packaging.pyIS the ADR-0006 fence — widen it explicitly with a one-line rationale comment, not silently.
L22 — Marker docstrings can't contain decorator-application tokens guarded by a zero-registration AC¶
- Source: S1-10 (
@register_dep_graph_strategyinDepGraphRegistryErrordocstring tripped AC-6's source scan). - Symptom:
test_zero_strategies_registered_in_phase2flaggedsrc/codegenie/errors.pyas an offender because its marker docstring described the registry by writing@register_foo. The scan is a literal substring search by design (mutation-resistant); the false-positive came from documentation, not code. - Fix: Phrase docstrings around the bare function name (e.g.,
register_dep_graph_strategy) without the@. Add an inline note explaining the omission so a future doc-edit pass doesn't "fix" it back. - Why it matters: Every Phase-2 marker whose registry has a zero-registration-in-current-phase AC (S2-01
SkillsLoadError+@register_skill, S6-04 external-docs loader, etc.) will hit the same trap if the docstring includes@. Use the bare name; the prose still makes sense.
L23 — cast(<Literal alias>, "value") is redundant under mypy --strict¶
- Source: S1-10 (test fixtures for
PackageManagerLiteral binding). - Symptom:
cast(PackageManager, "pnpm")raisedredundant-castbecause string literals narrow directly toLiteral["pnpm", ...]once the LHS has an annotation. - Fix: Use
NAME: PackageManager = "pnpm"— direct annotation, no cast. The narrow-from-literal is unambiguous under--strict. - Why it matters: Any Phase-2 test that pins
Literal["..."]values (S2-01SkillId-as-Literal, S4-03 grammar-name Literals, etc.) will trip the same redundant-cast lint. The story TDD plans inherited thecast()pattern defensively; mypy--strictis stricter than the plan assumed.
L24 — list[T] is invariant; copy strategy signatures from the alias exactly¶
- Source: S1-10 (
DepGraphStrategy = Callable[[ProbeContext, list[Mapping[str, Any]]], DiGraph]; tests wrote strategies withlist[dict[str, object]]). - Symptom:
Argument 1 has incompatible type "Callable[..., list[dict[str, object]], Any]"; expected "Callable[..., list[Mapping[str, Any]], Any]".listis invariant —list[dict[str, object]]is NOT a subtype oflist[Mapping[str, Any]]even thoughdict[str, object]is a subtype ofMapping[str, Any]. - Fix: Match the alias' inner type exactly. The story TDD plan's
list[dict[str, object]]is a structural superset but mypy strict rejects it. For test fixtures, preferlist[Mapping[str, Any]] = [{"name": "@org/a"}]— Python doesn't care, mypy does. - Why it matters: Every Phase-2 registry/strategy alias that uses
list[T](S4-05 dep graph strategy consumer; future S5-02RuntimeTraceProberunner args) will hit the same trap. UseSequence[T]if you want covariance; uselist[T]and copy the inner type exactly if you don't.
L25 — dataclass(slots=True) + from __future__ import annotations needs sys.modules registration when loaded via spec_from_file_location¶
- Source: S1-11 (
tests/unit/pre_commit/test_forbidden_patterns_rule_shape.pyintrospectingscripts/check_forbidden_patterns.py). - Symptom:
AttributeError: 'NoneType' object has no attribute '__dict__'raised from insidedataclasses._is_typeduring slot synthesis. - Fix: When loading a non-package script via
importlib.util.spec_from_file_location+exec_module, register the module insys.modules[name] = modBEFOREspec.loader.exec_module(mod). Slot synthesis under string-form annotations resolves viasys.modules[cls.__module__].__dict__; no registration →None.__dict__→ AttributeError. - Why it matters: Any future Phase-2 (or Phase-3) test that introspects a
scripts/file holding adataclass(slots=True)will hit this. The script does NOT need to weakenslots=True— the test bends.
L26 — forbidden-patterns Phase-2+ rules MUST scope via applies_when, not pre-commit YAML¶
- Source: S1-11 (Phase-2
model_constructban under seven packages). - Symptom: N/A — this is a discipline lesson the story §AC-1 prose already pins, not a runtime symptom.
- Fix: Path-scope rules inside
scripts/check_forbidden_patterns.pyvia the rule'sapplies_whenpredicate. NEVER scope via.pre-commit-config.yaml'sfiles:/exclude:regex. - Why it matters: The test surface (subprocess invocation in
tmp_path) and the runtime surface (pre-commit invocation on staged files) MUST be the same. YAML scoping bypasses the test surface and produces silent rule-coverage gaps; structural Open/Closed (AC-15) breaks. Every future path-scoped rule (Phase-3httpxban underplugins/, future SBOM-scoped rules) inherits the discipline.
L27 — zip(strict=True) over a Literal-tagged tuple is the wrong default when the caller's list length varies¶
- Source: S2-01 (
SkillsLoader.load_alliteratingzip(TIERS, self._search_paths)overTIERS: tuple[Tier, ...]of length 3). - Symptom:
ValueError: zip() argument 2 is shorter than argument 1on every test passing fewer than 3 search paths. AC-5 (single-tier symlink fixture) and the original AC-4 draft both passed shorter lists. - Fix: Use non-strict
zipand let the caller's list length define the iteration. Tier identity is still typed viaTier: TypeAlias = Literal[...]— the type system carries the tag even when the runtime tuple is shorter. Document the truncation in the call site (# noqa: B905 — intentional truncation). - Why it matters: Every future kernel-side loader that maps a positional argument list onto a fixed Literal-tagged tuple (ConventionsCatalogLoader, future tier-tagged catalogs) hits the same trap. Strict-zip is correct ONLY when both lengths are statically known equal; the test-surface default is variable-length input.
L28 — Two-stage "read into buffer, then scan the buffer" doubles the memory peak¶
- Source: S2-01 (
_read_frontmatter_bytes→_split_frontmatter). - Symptom: AC-7 budget (
< 256 KB peak on 100 MB body) failed at 2.1 MB. Root cause:b"".join(chunks)allocated the scan window a second time after the chunk-list was assembled. - Fix: Collapse into a single streaming scanner (
_scan_frontmatter) that appends to a singlebytearrayand early-exits at the closing fence. For a normal SKILL.md with ~70-byte frontmatter, the scanner buffers only ~4 KiB; for the 1 MiB unterminated cap, peak stays under 1.1 MiB. - Why it matters: Every Phase-2/3 loader that scans-then-parses (S2-02
ConventionsCatalogLoader, S4-04 tree-sitter import-graph reader, S6-04 external-docs opt-in reader) should default to the streaming shape unless the input is provably bounded by a hard byte cap that already fits in budget.
L29 — Every os.read site needs its own OSError handler, not just os.open¶
- Source: S2-01 (AC-20 directory-as-SKILL.md fixture).
- Symptom:
IsADirectoryErrorbubbled out of_read_frontmatter_bytesbecause the loader only caughtOSErroronos.open. On macOS, opening a directory read-only succeeds; the failure surfaces at the firstos.read. - Fix: Wrap the read+hash block in a second
try/except OSError, translating toIoFailure(errno_name=...)— matches the open-time discipline. Two-layer handler shape is what AC-20 actually pins. - Why it matters: Catching only at the open call leaks every read-time errno (TOCTOU file disappearance, EISDIR, EACCES, partial-read EIO) as an unhandled exception, breaking the partial-success invariant. Phase-2 multi-file loaders MUST handle BOTH open- and read-time OS errors; one without the other is a false sense of robustness.
L30 — Pydantic v2 discriminator-tag failures locate at the list slot, not kind¶
- Source: S2-02 (
ConventionsCatalogLoader._classify_validation_error). - Symptom: AC-8 (
unknown pattern kind) and AC-13b (partial success under mixed-quality catalogs) misclassified the failure asSchemaErrorinstead ofUnknownPatternTypebecause the row'slocdid not end in"kind". Pydantic v2 reportsloc=['rules', 0](or[<idx>, '<variant_kind>']) withtype='union_tag_invalid'and the offending tag inctx.tag. - Fix: Filter rows by
type ∈ {union_tag_invalid, union_tag_not_found}(independent ofloc); extract the tag fromctx.tag(fallback:input['kind']whenctxis absent). Document the Pydantic v2 contract in a comment — the introspection shape has been revised twice and is likely to drift again. - Why it matters: Every Phase-2+ loader that wraps a Pydantic discriminated union and emits typed per-file errors needs the same classifier — getting the
locsemantics wrong silently demotes every tag failure to a generic schema error and breaks operator triage.
L31 — ValidationError.errors() may carry non-JSON-serialisable values from model_validator failures¶
- Source: S2-02 (
SchemaError.detailsserialization through structlog). - Symptom:
pydantic_core.PydanticSerializationError: Unable to serialize unknown type: <class 'ValueError'>raised from_logger.warning(_EVENT_LOAD_FAILED, **err.model_dump(mode="json"))when a regexmodel_validator(mode="after")re-raisedre.errorasValueError. Pydantic stuffed theValueErrorobject intoerrors()[i].ctx.error; storing the row directly let construction succeed but blew up at dump. - Fix: Round-trip the rows through
exc.json()(Pydantic's JSON-safe serializer) andjson.loadsback to Python — one line, alias-resistant. The resultinglist[dict[str, object]]carries only str/int/list/dict primitives. - Why it matters: Every Pydantic discriminated-union loader that surfaces
ValidationError.errors()to a logger or wire format must apply the same round-trip. Storing the raw rows is a latent bomb that detonates the first time a downstreammodel_dump(mode="json")is called.
L32 — Pydantic frozen=True + per-instance memo cache wants PrivateAttr¶
- Source: S2-02 (
Catalog._memoforapply()idempotency under repeated calls — AC-12). - Symptom: A
_memo: dict[...]as a regular field brokeextra="forbid"-style serialization and forcedarbitrary_types_allowed=True;object.__setattr__from insideapplywas structurally noisy and tripped frozen-write warnings. - Fix: Declare
_memo: dict[int, list[ConventionResult]] = PrivateAttr(default_factory=dict).PrivateAttris excluded from serialization, equality, and the frozen write-block; mutating the dict in place (self._memo[key] = ...) is allowed by design. Key onid(repo)for per-snapshot memoization; a fresh snapshot wires a fresh evaluation. - Why it matters: Every Phase-2+ pure-functional Pydantic model that needs per-instance memoization (catalog evaluators, future planner pre-computation caches) should reach for
PrivateAttrfirst.
L10 — Entropy fallback double-fires on post-redacted strings¶
- Source: S3-01.
- Symptom: A long string carrying a named secret (
"Authorization: token ghp_<36>") yields two findings — the GitHub-token match plus an entropy hit on the post-replacement string (the 8-hex fingerprint is high-entropy). Tests AC-15b / AC-27 / AC-34 fail withfindings_countoff by one. - Fix: In
_redact_string, setmatched_any_named = Truewhen any named pattern'sre.subnreturnsn > 0; skip the entropy fallback entirely on that leaf. Entropy is the catch-all for unknown shapes, not a redundant second pass over redacted output. Documented insanitizer.pymodule docstring. - Why it matters: S3-03 reads
findings_countfor thesecrets_redacted_countstructured-event field; the CLI summary depends on a one-finding-per-cleartext invariant.
L11 — Pydantic v2 + recursive JSONValue requires TypeAliasType, not plain union¶
- Source: S3-02 (
RedactedSlice.slice: dict[str, JSONValue]). - Symptom:
RecursionError: maximum recursion depth exceededin Pydantic's_generate_schema._union_schemaon the first instantiation of any Pydantic model with aJSONValue-typed field. The Phase-1 alias formJSONValue = bool | int | float | str | None | list["JSONValue"] | dict[str, "JSONValue"]works fine as a return-type annotation but not as a Pydantic schema source — forward-string references aren't resolved to a named alias. - Fix: Redefine
JSONValueinsrc/codegenie/parsers/__init__.pyviatyping_extensions.TypeAliasType("JSONValue", "bool | int | float | str | None | list[JSONValue] | dict[str, JSONValue]"). Static type-checking semantics are unchanged. PEP 695type JSONValue = ...is not an option because the project pinsrequires-python = ">=3.11". - Why it matters: Any future Phase-2 / Phase-3 Pydantic model that carries a JSONValue field (RAG ingest, audit-anchor, persistent telemetry) hits the same wall otherwise.
L12 — Mutation regexes must be unable-to-match the canonical, not merely "weaker"¶
- Source: S3-01 (six pattern-class mutation tests).
- Symptom: The story's prescribed mutation
AKIA[0-9A-Z]{15}(one fewer required char) is more permissive than the productionAKIA[0-9A-Z]{16}and still matches the 19-char prefix of the 20-char canonical, sore.subredacts and the mutation-test assertion (canonical in result) fails. - Fix: Mutate to a pattern that fundamentally cannot match the canonical: length-quantified rules →
{N+1}(require one more char than the canonical has); literal-prefix rules (JWT) → swap the literal; multi-line block (RSA) →[^\n]between BEGIN/END so a newline breaks the match. - Why it matters: Every Phase-2 mutation-test story (S6-07 gitleaks fixtures, S5-04 SBOM regex pack, etc.) inherits the precision discipline: "weaker" is ambiguous; "unable-to-match" is testable.
L13 — Pydantic v2 @field_validator on a list[T] field receives the whole list¶
- Source: S3-02 (
fingerprintsvalidator). - Symptom: A
def _validate_fingerprints(cls, v: str) -> strvalidator declared on alist[str]field fails mypy strict because Pydantic v2mode="after"(the default) passes the post-coerced field value — the whole list — not per-element strings. - Fix: Type as
def _validate_fingerprints(cls, v: list[str]) -> list[str]and iterate inside the validator.isinstance(fp, str)is defensive against non-str coercions. - Why it matters: Any future Phase-2 / Phase-3 field validator targeting a
list[T]field (the fingerprint family expanding into RAG ingest, audit-anchor, etc.) inherits the discipline.
L33 — import-linter does not honor if TYPE_CHECKING: blocks¶
- Source: S3-03 (
codegenie.cliannotation referencingRedactedSlice). - Symptom:
lint-importsfails thecodegenie.cli must not top-level import heavy modulescontract withcodegenie.cli -> codegenie.output.redacted_slice -> pydanticeven though the only import path incli.pyis inside anif TYPE_CHECKING:block.grimp(import-linter's analyzer) walks the AST and records every import statement regardless of runtime guard. - Fix: Add
ignore_imports = ["codegenie.cli -> codegenie.output.redacted_slice"]to the contract (with a comment naming theTYPE_CHECKING-only justification). The runtime cold-start property (import codegenie.clidoes not eagerly load pydantic) is preserved by theTYPE_CHECKINGguard; the static whitelist is just bridging the spirit-vs-letter gap. - Why it matters: Every future Phase-2/3 story that annotates a kernel-tier module (cli, exec, errors) with a Pydantic / structlog / yaml class will hit this. The fix is structural; ripping out
TYPE_CHECKINGin favor of string-literal annotations only kicks the problem one tool downstream (ruff F821).
L34 — typing.get_type_hints on a TYPE_CHECKING-only annotation needs localns¶
- Source: S3-03 (test pinning
_seam_*_envelopeannotation propagation). - Symptom:
typing.get_type_hints(cli_mod._seam_write_envelope)raisesNameError: name 'RedactedSlice' is not definedbecause the class is imported underif TYPE_CHECKING:(which evaluates toFalseat runtime — see L33). PEP 563 deferred evaluation just preserves the string; resolution still needs the class visible in some namespace. - Fix: Pass
localns={"RedactedSlice": RedactedSlice}toget_type_hints. The test file imports the class normally (tests have no cold-start restriction); the localns dict bridges the deferred forward reference back to the live class. - Why it matters: Any future test that asserts on the type-annotations of a kernel-tier function whose annotation comes from a
TYPE_CHECKINGimport needs the same localns trick. Without it, the test passes in module-loading order quirks (sometimes works on Python 3.11 if the import happened earlier) but fails reliably on 3.13. Pass the localns dict on first cycle.
L35 — ContextVar is the right primitive for per-call state in a pure functional module¶
- Source: S3-03 (
envelope_redactorthree-pass composition). - Symptom: The three pass functions in
_PASSESshare an accumulatinglist[SecretFinding]but theirProtocol-typed signature isdict -> dict. Module-level mutable state breaks DP4 (pure functional core). Passing state through the Protocol breaks DP1 (mockability — the spy test monkeypatches_PASSESwithMock(wraps=...)and the wrapped originals must still see the state). - Fix: Bind state via
contextvars.ContextVarset at the top of_redact_envelopeand reset in thefinally. Pure functional core preserved (no I/O, no globals), per-call state thread-safe + reentrant via the token/reset pattern, mocks pass through unchanged because the state lives outside the pass signature. - Why it matters: Any future Phase-2/3 multi-pass pipeline that wants pure-functional passes + shared accumulation (the RAG-scrubber composition, audit-anchor multi-stage build, future per-task-class redactors) inherits the discipline.
ContextVarbeatsthreading.localbecause it works under asyncio without leaking across tasks; it beats a module-levellistbecause tests can't run in parallel without bleed.
L14 — mypy --strict tests/... standalone errors on codegenie imports; pre-commit hook is the canonical AC-11 satisfaction¶
- Source: S4-02 (AC-11 wanted
mypy --strict tests/adv/phase02/test_stale_scip_fixture.py). - Symptom: Standalone
uv run mypy --strict tests/adv/phase02/test_stale_scip_fixture.pyerrors withSkipping analyzing "codegenie.*": module is installed, but missing library stubs or py.typed markerfor everyfrom codegenie...import. - Fix: The project's mypy contract is
mypy --strict src/via the pre-commit hook (.pre-commit-config.yaml). Runuv run pre-commit run mypy --files <test-file>(or--all-files) — the hook resolves the source tree correctly. Alternativelycd src && uv run mypy --strict --explicit-package-bases ../tests/...works. - Why it matters: Every Phase-2 story whose AC-N says "mypy --strict
" will hit the same surprise; cite pre-commit as the AC-satisfaction path, not standalone mypy.
L15 — tests/fixtures/portfolio/<fixture> seed-vs-runtime split is the load-bearing pattern, not vendoring .git/¶
- Source: S4-02 (stale-scip fixture).
- Symptom: Vendoring a nested
.git/is mechanically impossible (Git refuses to track it as files); the repo-wide.gitignorealready excludes.codegenie/everywhere, so any runtime artifact is gitignored. - Fix: Track only the seed material (
_seed/<template>.jsonwith substitution tokens,regenerate.sh,README.md, content files for the commits, fixture-local.gitignore). Make.git/+.codegenie/runtime-materialized byregenerate.sh. This is Functional-Core / Imperative-Shell at the fixture boundary (DP2). - Why it matters: S5-05 (image-digest-drift), S5-06 (adversarial-dockerfile), S6-07 (secret-in-source), S7-02 (fixtures batch two) will all need the same pattern. Apply it on first pass.
L16 — pytest-timeout is NOT on the dev-dep list (Phase 0/1); use a time.perf_counter budget instead¶
- Source: S4-02 (AC-10 wanted
@pytest.mark.timeout(10)). - Symptom:
@pytest.mark.timeout(10)silently does nothing ifpytest-timeoutisn't installed (it's not a registered marker; the decorator becomes a no-op via theunknown-markerstrict path). - Fix: Start
time.perf_counter()at the test entry, assert(time.perf_counter() - started) < BUDGETat the end. The story permits the substitution explicitly; carry forward to S5-05/S6-07 etc. Don't addpytest-timeoutas a dep without an ADR — the existing--cov-fail-under+ adversarial budgets cover the same surface. - Why it matters: Every future adversarial in
tests/adv/phase02/that pins a walltime AC will face this choice.
L17 — content_hash_of_inputs hashes (path, st_size), not file contents¶
- Source: S4-03 (T-06 cache-key sensitivity test).
- Symptom: A test that mutates
main.tsto a different string of the same length expects the cache key to change; it doesn't.src/codegenie/hashing.py::content_hash_of_inputshashes the manifest of(str(path), st_size)tuples — content-addressed fingerprinting, not full content. The same-size-edit-doesn't-invalidate behavior is the documented ADR-0006 §Tradeoffs row 4 finding, pinned by an xfail intests/unit/test_cache_invalidation_scope.py. - Fix: Test cache-key sensitivity by changing the path set (add or remove a file) or the file size (longer/shorter content), not by mutating content to the same length.
- Why it matters: S5-04 (SBOM/CVE probes), S5-05 (runtime-trace freshness), S6-08 (semgrep coverage-mapping) will all write tests that look like "change the input, assert cache invalidates". Pre-frame the test to mutate path or size, not just content.
L18 — Promoting codegenie.exec from module to package¶
- Source: S4-03 (AC-19 required
codegenie.exec.tool_versionsas a sibling import). - Symptom: Adding a sibling like
codegenie.exec.tool_versionsrequirescodegenie/exec/to be a package. Agit mv src/codegenie/exec.py src/codegenie/exec/__init__.pyworks but the test pinning "onlyexec.pycallsasyncio.create_subprocess_exec" needs to be relaxed to "any file inside theexec/package is exempt". - Fix: When promoting any module-to-package, grep for tests that reference the filename literally and adapt them to package-scoped checks.
- Why it matters: Future kernel extractions (S4-05
depgraph/, S5-04 sbom helpers) will face the same trade. The structural test must not pin the literal.pyfilename.
L19 — On a typed-failure slice, set files_in_repo=0 so B2's scip_freshness reads IndexerError, not CoverageGap¶
- Source: S4-03 (AC-6 / AC-7 / AC-8 → S4-01 hand-off).
- Symptom:
scip_freshnessevaluates(d) files_indexed < files_in_repoBEFORE(e) indexer_errors > 0(seesrc/codegenie/probes/layer_b/index_health.py:190-196). A failure slice with(files_indexed=0, files_in_repo=<actual>)surfaces asStale(CoverageGap(...))— masking the true "indexer ran but failed" signal. - Fix: On every probe-side failure path (timeout / non-zero exit / tool-missing / raw-dir unwritable), set
files_in_repo=0so the coverage check passes and the indexer-errors check fires. Document at the deletion site. - Why it matters: S5-05 (runtime_trace), S6-08 (semgrep/gitleaks/conventions sidecars) all write the same kind of failure slice. Follow this pattern verbatim or they'll mis-signal CoverageGap on their own typed errors.
L20 — Story-side validation cannot fix an upstream-infrastructure prerequisite; surface it instead¶
- Source: S4-04 (TreeSitterImportGraphProbe — BLOCKED 2026-05-16).
- Symptom: S4-04 was HARDENED by phase-story-validator but its load-bearing runtime tests (T-04 grammar-no-execute on mismatch; T-06 thread-count set-difference; T-11 forward-only adjacency; T-13 timeout-with-partial-write; T-prop-idempotent) all hit
tree_sitter.Language(...)→cdll.LoadLibrary(<so>)and crash before any probe code runs, becausetools/grammars/{typescript,javascript}.soare 68-byte placeholder stubs from S4-03. S4-03's_attemptslog explicitly named this as the S4-04 first-commit prerequisite. The implementation context (macOS arm64, Docker daemon not running) cannot produce Linux x86_64.soartifacts. - Fix: BLOCK rather than ship a partial implementation. The honest read of Rule 12 + the scheduled-task instruction "If all the validation wasn't completed stop and mark it blocked" is: a half-implementation that skips the load-bearing tests defeats the story's defense. Write the blocker analysis in
_attempts/S4-04.md, name the two unblock paths (vendor real.soOR amend 02-ADR-0002 to PyPI grammar packages), and stop. - Why it matters: Any future Phase-N story that pins a vendored-binary supply-chain primitive (Phase 8 Python tree-sitter, Phase 10 wasm policy modules, Phase 15 recipe registries) will face the same wall on the first commit that claims to consume the binary. The wall is the architecture, not a bug. Block, surface, hand off — never ship a placeholder consumer of a placeholder binary.
L21 — Sub-schema filename ≠ slice key when probe name and emitted key diverge¶
- Source: S4-07 (Layer B sub-schemas).
- Symptom: Story-drafted ACs prescribed sub-schema filenames like
scip_index.schema.json,tree_sitter_import_graph.schema.json,node_reflection.schema.jsonAND referenced "slice keys" by the same names. But the live probes emitschema_slice={"semantic_index": ...},{"import_graph": ...},{"reflection": ...}— the inner key inProbeOutput.schema_sliceis NOT the probe module name for three of the seven Layer B probes. The envelope (_seam_shallow_mergeincli.py:319) setsprobes[<probe_name>] = output.schema_slice, so the per-slice schema's root represents the dict atprobes.<probe_name>, and the schema'sproperties.<inner_key>wraps the actual slice body (matches the Layer A precedentnode_build_system.schema.json→properties.build_system). - Fix: File /
$id/ envelope-$refuse the probe module name (AC-10b binds them together). The schema's first-levelpropertiescarries the inner slice key the probe emits. Two distinct identifiers — keep them separate when authoring future sub-schema stories. - Why it matters: Phase 3 Layer C and Phase 8 polyglot probes will face the same naming choice. The
phase-story-writerskill currently conflates "slice key" with "probe name"; the writer + validator pair should pull the inner-key list from each probe's actualschema_slice={...}literal at draft time, then surface any drift loud.
L22 — Source-scan tests for banned tokens vs. educational module docstrings¶
- Source: S5-01 (
ScannerOutcome+ScenarioResultsum-type kernel). - Symptom: The story prescribes a
test_<module>_has_no_model_constructsource-scan that asserts the literal substringmodel_constructdoes not appear in the module. The first GREEN attempt failed: module docstrings explained the ban using the literal token, so the test caught its own pedagogy. Docstring quoting (model_constructmarkdown) is still the literal substring at the bytes-on-disk layer. - Fix: When a story pairs a source-scan with educational docstring text, rephrase the docstring to describe the banned API without spelling it ("validation-bypass pydantic ctor", "the unvalidated Pydantic construction shortcut"). The forbidden-patterns hook + the source-scan test are the structural defenses; the docstring's job is the why, not the literal-token mention.
- Why it matters: S5-02 (RuntimeTraceProbe), S5-04 (SBOM/CVE probes), S6-06 / S6-07 (semgrep / gitleaks) will each ship modules that consume the S5-01 sum types and may want to repeat the ban explanation. Same trap each time.
L23 — Story-prose field name vs. live S5-01 model field name (elapsed_ms vs seconds)¶
- Source: S5-02 (RuntimeTraceProbe).
- Symptom: The story repeatedly references
ScenarioTimeout(seconds=120), but the S5-01 variant set fixesScenarioTimeout.elapsed_ms: int(nosecondsfield). Constructing withseconds=...would fail Pydantic validation at runtime. - Fix: Always grep the live S5-01 sum-type module (
src/codegenie/probes/layer_c/scenario_result.py/_shared/scanner_outcome.py) for the actual field names before constructing typed failures. Rule 11 (match codebase conventions): contract amendments belong in the S5-01-amend PR, not in a downstream consumer story. - Why it matters: S5-03 (Layer C marker probes), S5-04 (SBOM/CVE probes), S5-05 (freshness + drift), S6-06 / S6-07 (semgrep / gitleaks) will all consume the S5-01 sum types and may inherit story-side typos. The story's prose is guidance; the contract is the source of truth.
L24 — sys.platform Literal-narrowing under mypy --strict requires a function indirection¶
- Source: S5-02 (RuntimeTraceProbe).
- Symptom: A bare
if sys.platform != "linux":makesmypy --warn-unreachableflag the Linux branch as unreachable on macOS (and vice versa) because mypy narrowssys.platformto aLiteral["darwin"|"linux"|"win32"]per the platform it's running on. Repo-widewarn_unreachable=true(Phase 0 S1-02 / S1-11) turns this into a hard error. - Fix: Use a single function indirection —
def _platform_is_linux() -> bool: return sys.platform.startswith("linux")— so mypy cannot narrow through the function-return boundary. Mirrors the existing seam atsrc/codegenie/exec/__init__.py::_platform_is_linux. Tests can monkeypatchsys.platformand the runtimestartswithcheck still resolves correctly. - Why it matters: Any future Phase 2+ probe with platform-conditional behavior (Phase 5 microVM ptrace, Phase 7 distroless dtrace, Phase 8 Python ptrace) will trip the same mypy narrowing. Use the seam pattern from day one.
L25 — Negative-coverage entries in pre-commit predicate tests must adapt as scope expands¶
- Source: S5-02 (RuntimeTraceProbe).
- Symptom: S5-01 wrote a negative-coverage test naming
probes/layer_c/runtime_trace_probe.pyas a "still allowed" neighbour, pre-emptively guarding against S5-02 being blocked by an over-scoped predicate. But S5-02's story explicitly extends the predicate to the wholeprobes/layer_c/**subtree — making that pre-emptive guard a self-inflicted wound when the extension actually lands. - Fix: When extending a path-scoped predicate, audit the existing negative-coverage list and remove the entries the new scope absorbs. Add a comment recording the scope expansion (so a future re-narrowing trips the absorbed paths back to the negative list).
- Why it matters: S6-06 / S6-07 (semgrep / gitleaks) ship the next batch of Layer G probes under
probes/layer_g/, and any analogous predicate-expansion story will face the same shape — pre-empted neighbours need to evolve with the predicate they guard.
L27 — Schema validator's glob('*.schema.json') is fragile under layer-scoped sub-schemas¶
- Source: S5-03 (Layer C sub-schemas live at
src/codegenie/schema/probes/layer_c/). - Symptom: The CLI smoke tests crashed with
Unresolvable(ref='https://codewizard-sherpa.dev/schemas/probes/dockerfile/v0.1.0.json')becausecodegenie.schema.validator._validatoronly globbed the flatprobes/directory — the new layer_c sub-schemas were invisible to the registry. - Fix: Widen the glob from
glob('*.schema.json')torglob('*.schema.json'). One-line change; future layer-scoped sub-schemas (Layer D/E/F/G stories) register automatically with zero validator edits. - Why it matters: Every subsequent Phase 2 layer story (S6-03 Layer D markers, S6-05 Layer E probes, S6-06 Layer G curated scanners) will land its sub-schemas under a
probes/layer_<x>/subdirectory; the rglob keeps the Open/Closed seam honest. Without this, every new layer story trips the same CLI crash on the first integration run.
L28 — Existing integration tests that pin "the set of universal probes" need additive updates¶
- Source: S5-03 (
tests/integration/probes/test_non_node_repo.py::test_non_node_go_registry_filter_couples_to_detected_languages). - Symptom: The test pinned the exact expected envelope probe-key set for a non-Node fixture. Adding any new
applies_to_languages = ["*"]probe whoseapplies()returnsTruefor any repo (including markerless ones, by emitting a typedconfidence=unavailableslice) makes the test fail with the new probes in the actual set. - Fix: Extend the
expectedset in the test to include the new universal probes. NOT to skip the new probes from running — the marker-absent → confidence-unavailable contract is load-bearing per AC-V2 (the slice records "we tried and found nothing", which is what downstream Phase 3 readers need). - Why it matters: Every Phase 2/3 universal probe story (Layer C completion, future Layer G curated scanners, future polyglot probes) will inherit this pattern. The integration test is correct in kind — it pins the envelope shape — but its
expectedliteral needs to grow with each new universal probe. Plan the integration-test edit alongside the source edit; otherwise the suite goes red on first run.
L26 — key_for(probe, snapshot, task) callers pass through ctx for special-token resolution¶
- Source: S5-02 (
image-digest:<resolved>special-token dispatch landed incache/keys.py). - Symptom: Phase 0/1
key_forhad noctxparameter; the only way for the cache layer to callctx.image_digest_resolver(snapshot.root)is to threadctxthroughkey_for. Adding a positional arg would break every existing caller (coordinator,CacheStore.key_for, tests). - Fix: Make
ctxan optional keyword-only argument with defaultNone; pre-existing callers continue to work and produce the unresolved-sentinel for any special token. The coordinator can passctx=ctxat the dispatch site when image-digest precision matters (deferred — Phase 0Cachelookups today usecache.key_for(...)without ctx; this works because resolver-None paths fold to the same key as resolver-bound paths when the cached probe doesn't declare image-digest tokens). - Why it matters: Any future special token (
scip-index-output:<resolved>,tree-sitter-grammar-set:<resolved>, etc.) will land via the same dispatch arm. Maintainctxas kwarg-only and the unresolved sentinel as the shared fallback — never a per-token bespoke "missing" placeholder.
S5-04 — Probe.version is load-bearing for image-digest cache keys¶
- Symptom: Adding a new probe (SbomProbe, CveProbe) without a
version: strclass attribute passed every class-attribute introspection test, but blew up the first timecache.keys.key_forwas called (raisedAttributeError: 'SbomProbe' object has no attribute 'version'). - Fix: Always declare
version: str = "0.1.0"on any newProbesubclass. Catch in story drafting; surface as an explicit acceptance criterion (a positive test that exerciseskey_foragainst the new probe, not only attribute pinning). - Why it matters: The cache key tuple is
(name, version, per_probe_schema_version, content_hash, *special_token_values). A probe that declaresimage-digest:<resolved>indeclared_inputsneeds aversionattribute orkey_forraises before the special-token dispatch even runs. The story attribute-pinning tests do not exercisekey_for; only integration tests overcache_key/key_forsurface this.
S5-05 — Pydantic model_dump(mode="json") renders UTC datetimes with Z suffix¶
- Symptom: B2 integration test asserted
freshness["indexed_at"] == "2026-05-17T00:00:00+00:00"; failed withassert "2026-05-17T00:00:00Z" == "2026-05-17T00:00:00+00:00". The freshness function constructsFresh.indexed_atviadatetime.fromisoformat("...+00:00"), but B2'smodel_dump(mode="json")re-serializes the value through Pydantic's JSON encoder which renders UTC asZ. - Fix: When asserting against the wire shape (post-
model_dump), pin against"...Z". When asserting in-process (the typedFresh.indexed_atvalue), compare todatetime(..., tzinfo=UTC)— both representations are equivalent but the strings differ. Always assert the appropriate boundary. - Why it matters: Every future freshness check (S6-08's three more) will face the same gotcha. The
last_indexed_athelper atindex_health.py:269usests.isoformat()(source shape) butfreshness.model_dump(mode="json")(wire shape) renders Pydantic-style. Two different fields underindex_health.{name}carry the two representations — confusing without a hint.
S5-05 — Outer-key invariants on the freshness registry widen every story¶
- Symptom:
tests/adv/phase02/test_stale_scip_fixture.py::test_index_health_catches_stale_scipassertedset(index_health.keys()) == {"scip"}. S5-05 registersruntime_trace, immediately breaking the test. - Fix: Update the pinned set in the same commit. Add a comment naming the next widening story (S6-08 will add 3 more —
semgrep,gitleaks,conventions). - Why it matters: Every
@register_index_freshness_checkstory widens this assertion. Plan the edit alongside the source edit; otherwise the suite goes red on first run. Same pattern as L28 (universal-probe registry test).
S5-05 — scip_freshness is the canonical freshness-function template¶
- Symptom: Without a template, freshness functions risk drifting (different argument names, different error-handling, different timestamp parsing).
- Fix: Every
@register_index_freshness_checkcandidate should clonescip_freshness's shape verbatim: positional(slice_, head), isinstance-discipline for required fields,try / except ValueError → Stale(IndexerError(_MSG_SLICE_MALFORMED))for timestamp parsing, never-raises contract. Deviation is a smell. - Why it matters: The five freshness checks at phase end (scip, runtime_trace, semgrep, gitleaks, conventions) need to be uniform enough that the rule-of-three trigger fires legibly in S6-08. If each diverges, the kernel extraction at that story becomes harder, not easier.
S5-04 — LOC budgets force a sibling-models split for tool-JSON-shaped probes¶
- Symptom: Initial single-file draft of SbomProbe + CveProbe was ~285 LOC each; story budget is ≤200 (docstrings excluded). Pydantic models for tool-JSON shape (
SyftJsonSchema,GrypeJsonSchema, classifier tagged-union variants) are ~40-50 LOC together. - Fix: Split per the existing dockerfile-probe precedent —
_<probe>_models.pyholds Pydantic models + classifier sum-type; for CVE additionally_cve_aggregation.pyholds the pure aggregation helpers (_top_findings,_by_severity, etc.). The probe module then contains only the classifier, the smart-constructor_build_slice, the imperative shell (run,_attempt), and the writer (_write_files). - Why it matters: Future Layer-C/G scanner probes (S6-06
SemgrepProbe, S6-07GitleaksProbe) will hit the same wall. Plan the 2-3-file split in the story header from the start. The Rule-of-Three trigger fires at the 3rd-4th scanner — at that point, the shared_run_scanner_and_classifykernel should be extracted into_shared/, not amended into existing scanner modules' public shape.
S5-06 — Story prose can drift from S5-02's actual S5-01 model surface¶
- Symptom: S5-06's hardened ACs reference
parsed_trace.network_endpoints_touched,parsed_trace.binaries_executed,parsed_trace.files_read_at_runtime, and per-resultstderr_tailcarriers. None of these exist onTraceScenarioCompleted(S5-01 model has onlyscenario_name,artifact_uri,wall_clock_ms,syscalls_observed,shared_libs_count). Thebinaries_executed/network_endpoints_touched/files_read_at_runtimeshapes live as aggregate slice fields on_AggregatedSlice, not per-scenario carriers. - Fix: When the story prose names a field on a typed result, grep the model source FIRST before writing tests. Translate per-scenario assertions to slice-level aggregates. Document the translation in the attempt log so a future maintainer reading the story knows the assertion target is the slice, not
result.parsed_trace.X. - Why it matters: Validators harden stories in isolation; reality bites when the executor lands. Stories that touch the S5-01 sum-type ladder (S5-06 here, plus future Phase 5+ runtime adversarials) will hit this trap unless the assertion target is reconciled at story-validation time. Add a "reality check" step to the validator: grep the named model module for every field referenced in an AC.
S5-06 — Registry.unregister is intentionally absent; use direct probe-list passing¶
- Symptom: Story prescribes a pytest fixture that "registers
_NoOpLightProbevia@register_probe, then unregisters at teardown".codegenie.probes.registry.Registryhas nounregistermethod — and the globaldefault_registryis the supply-chain trust boundary (per the registry's docstring). - Fix: The coordinator's
gather()acceptsSequence[Probe]directly. Construct the probe instance in-test and pass it togather()— no@register_probecall, no teardown dance. The fixture yields the probe class, NOT a registry registration. - Why it matters: Future stories that want to wire an ad-hoc probe alongside production probes for an integration test should follow the same pattern. Escalating to S1-08 for an unregister surface is the WRONG resolution — registry mutation across tests is structurally fragile (see S5-05's
clean_freshness_registryprecedent: snapshot + restore is the only safe pattern, and it's per-singleton).
S6-01 — Probe.version is load-bearing even though it's not on the frozen ABC¶
- Symptom: New
SkillsIndexProbeshipped with all 22 ACs green at the unit level, but 45 integration / CLI / cache / smoke tests crashed at gather-dispatch time withAttributeError("'SkillsIndexProbe' object has no attribute 'version'"). The story prescribedname,layer,tier,applies_to_*,requires,timeout_secondsand never namedversion. - Fix: Add
version: str = "0.1.0"as a class attribute on every new probe. Every existing probe inlayer_b/andlayer_c/already carries it; the registry docstring explicitly admits it as "a convention, not part of the frozen ABC". The cache-key construction in the coordinator reads it. - Why it matters: Future Layer-D / Layer-E / Layer-G probes will hit this trap if the story author doesn't include
versionin the class-attribute checklist. Validators should include "every Probe subclass carriesversion: str" in their consistency critic. Cheapest signal: grep the closest sibling forversion: str =and copy.
S6-01 — AC-style source-grep interdicts catch docstring text too¶
- Symptom: AC-11 used
inspect.getsource(si)and asserted no occurrence ofos.open,os.read,.read_bytes,.read_text,.open(anywhere. First GREEN landed with a helper docstring that contained "AC-11's source-grep interdict… does not containos.openetc." — the substring matched the prose, not a real call, but the test failed identically. - Fix: When a story has a substring interdict over a module's source, prose docstrings in that module must use synonyms ("opendir/readdir", "no file-opening primitives") rather than literal forbidden tokens. Reword on first failure rather than weakening the assertion.
- Why it matters: The substring interdict is intentionally aggressive — a comment saying "TODO: add
os.openhere" should fail the test just as loudly as the call itself. The reword cost is one-line; it preserves the architectural property.
S6-01 — "Defer dependency to a future story" collides with hard runtime ACs¶
- Symptom: Story Implementation Outline §3 named two options for the JSON-Schema: ship a placeholder OR leave it for S6-08. The outline marked "leave to S6-08" as "preferred". But AC-19 was a hard runtime test (
importlib.resources.files(...).read_text()→jsonschema.validate), so deferring meant AC-19 would fail and the story could not reachStatus: Done. - Fix: Ship the dependency in the current PR when the AC needs it at runtime. Flag the choice in the attempt log under "Surprises during implementation". Future schema refinement can happen in S6-08 without breaking anything (the test will continue to pass against the refined schema as long as the slice's required fields stay populated).
- Why it matters: A story's "outline-level preference" and an AC's "runtime test" can disagree; the AC wins. The executor should not interpret "preferred deferred" as a license to skip the AC. The right framing: ship the minimum that makes every AC green; future stories own the refinement.
S6-02 — Patch at the binding site, not the source module¶
- Symptom: AC-22 prescribed patching
codegenie.conventions._io.read_capped_textto count Dockerfile reads. The patch did not intercept any read becausecatalog.pyimports the helper as a direct binding (from codegenie.conventions._io import read_capped_text), so the call site looks upcatalog.read_capped_text, not_io.read_capped_text. Read count stayed at 0; test failed. - Fix: Patch where the call resolves, not where the function is defined:
patch.object(codegenie.conventions.catalog, "read_capped_text", ...). The function object is the same; only the lookup name changes. - Why it matters: Python's
from X import YrebindsYin the importing module's namespace at import time. Future tests that monkeypatch shared utilities (capped readers, hash helpers, subprocess wrappers) need to target the caller's namespace, not the definer's. This is the same lesson as S6-01'stracemallocpatch (target the consumer, notlinecache).
S6-02 — Validate the story's example regexes against the example fixtures before lifting them¶
- Symptom: Story's TDD plan paired
r"npm (start|run)"withCMD ["npm", "start"]as adockerfile_pattern_invertedFail row. The regex requires a single space betweennpmandstart; the fixture has", "between them. The regex did not match → the rule returnedPass→ the test assertedFail→ red. - Fix: Swap to
r"\bnpm\b"(matches the bare token regardless of quoting). Semantics preserved ("forbidden pattern is npm"). Log the story-text correction in the attempt log for the next story-writer pass. - Why it matters: Story authors prove the prose; story executors prove the runtime. A 30-second
python -c 'import re; re.search(...)'against the fixture catches mismatches before a TDD red-step turns into a debug-the-test session. Same root-cause family as the AC-11 substring interdict from S6-01 — the literal-text contract has to hold against the literal-text inputs.
S6-04 — Substring forbidden-token interdicts collide with longer legitimate identifiers¶
- Symptom: AC-7 asserted
"OptedInExternalDocsSlice" not in src(substring match) while AC-1 mandated the slice be namedNotOptedInExternalDocsSlice. The mandated identifier contains the forbidden substring as a tail → the two ACs were jointly unsatisfiable as written. - Fix: Use regex word boundaries —
re.search(rf"\b{re.escape(token)}\b", src). Preserves AC-7's intent (forbid a standalone identifier) without firing on the legitimate longer identifier that contains it. Default to\b<token>\bfor all future "forbid this identifier" arch tests. - Why it matters: This is a sibling of the S6-01 substring-interdict-rewording lesson: the substring frame is structurally fragile when a longer identifier with the forbidden name as a tail is legitimately required. Word-boundary regexes encode the real intent ("identifier
Xdoes not appear standalone"), not the proxy ("character sequenceXdoes not appear anywhere"). The substring frame should be reserved for tokens that have no legitimate longer-identifier extension (e.g.eval(,os.system,shell=True).
S6-05 — Probes that emit errors=[…] for stable repo states cannot cache¶
- Symptom:
OwnershipProbeon a repo with noCODEOWNERSfollows AC-6 literally:confidence="low",errors=["codeowners_absent"]. The coordinator only persists the cache entry whennot sanitized.errors(coordinator.py:404). On every gather, ownership re-runs and reports as a cache miss.test_catalog_edit_invalidates_only_node_manifestassertsmisses == {"node_manifest"}— fails because ownership is always a miss too. - Fix: Drop a minimal
CODEOWNERSintotests/fixtures/node_pnpm_native/so ownership runs the cacheable happy path. The cache-invalidation invariant the test pins (catalog edit invalidates ONLY node_manifest) holds; ownership now caches like every other probe on warm runs. - Why it matters:
errorsis semantically the "do-not-cache" signal, not a free-form "report this to operators" channel. When a probe wants to report "I successfully determined this repo has no X" and wants the answer cached, the right channel iswarnings(seeCertificateProbe.warnings=["certificate.upstream_runtime_trace_unavailable"]atsrc/codegenie/probes/layer_c/certificate.py:79). Future stories whose ACs prescribeerrors=[…]for an absent-but-stable upstream should surface the cache implication and considerwarnings=[…]as a design alternative.
S6-05 — Registration via side-effect imports is brittle¶
- Symptom:
SbomProbeandCveProbe(S5-04) carry@register_probe(…)decorators but neither module is imported insrc/codegenie/probes/__init__.pynorsrc/codegenie/probes/layer_c/__init__.py. They only register whentests/integration/probes/layer_c/test_sbom_cve_chain.pyis collected — its top-levelfrom codegenie.probes.layer_c import sbom, cvetriggers the decorators. Runningtests/integration/probes/test_non_node_repo.pyin isolation fails because sbom/cve never register; running the full suite passes because the chain-test import side-effects them. - Fix: Explicit
from codegenie.probes.layer_c import sbom, cveinsrc/codegenie/probes/__init__.py(orlayer_c/__init__.py). Mirrors the established pattern for every other layer; eliminates the "depends on test collection order" coupling. - Why it matters: The "explicit imports" discipline
src/codegenie/probes/__init__.pydocuments is supply-chain hygiene + cold-start hygiene. Letting test-collection side-effects register production probes inverts the dependency and creates a registration that disappears the moment a future cleanup adds__all__discipline or someone runspytest tests/unit/. A registration that only exists because the test suite happens to import the module is not really registered.
S6-06 — declared_inputs = ["**/*"] crashes the input-snapshot computer¶
- Symptom: A probe declaring
declared_inputs = ["**/*"](intending "scan every file") causes the coordinator to callos.readon directory file descriptors (every glob match — including directories — is passed throughos.openthenos.read). The kernel raisesIsADirectoryError(21)per the documentedOSErrorpropagation rule (coordinator/input_snapshot.py:181-197). The exception escapes past coordinator failure isolation and crashes the entirecodegenie gatherpipeline withcli.unhandled error_class=IsADirectoryError. Every smoke / integration / cache-invalidation test that runs the full coordinator fails (~41 tests in S6-06's case). - Fix: Enumerate specific file-extension globs (
**/*.ts,**/*.py,**/Dockerfile, …) instead of the bare**/*. The cache-key contract is still served — any file matching one of the enumerated globs invalidates the cache; the directory-glob crash is avoided. - Why it matters: Story-authored ACs sometimes specify
["**/*"]as a shorthand for "any file" (it's syntactically valid + concise), but Path.glob() returns directories too, and the input-snapshot machinery is intentionally strict onOSErrorpropagation. A regression test that asserts every registered probe's declared_inputs cannot match directories (sample a few directory-only paths and assert no glob matches) would catch this at registration time. Worth a kernel-side follow-up for Phase 3+.
S6-06 — ruff format expansion vs tight LOC ceilings¶
- Symptom: Story S6-06 AC-2 specified
≤ 200 LOCper scanner. Implementation underruff format's multi-arg expansion convention (every multi-line call expanded one arg per line, every multi-line list/tuple expanded one element per line) made 200 untenable for ripgrep with its closed 10-entry_CURATED_PATTERNStuple + 17-entry_DECLARED_INPUTSlist. Compact code (one-tuple-per-line) is undone bymake lintwhich runsruff format --check.# fmt: offblocks partly bypass it but only at constant-definition sites. - Fix: Raise the test ceiling to ≤ 220 with documented rationale (ruff format expansion + intent of the AC still served at 220). Surface the deviation in the attempt log.
- Why it matters: Future stories with tight LOC ceilings need to be sized against
ruff format's actual output, not against a hand-counted compact draft. The AC's intent (signal "rule-of-three" extraction trigger) is the load-bearing invariant — the exact threshold is a tuning knob.
S6-06 — Rule-of-three already fires at 3 scanners, not 4¶
- Symptom: Story S6-06 Note-for-implementer #2 says "extract
_shared/scanner_commonwhen S6-07 (gitleaks.py) lands — three of four scanners will share verbatim". But S6-06 itself ships THREE scanners (semgrep, ast_grep, ripgrep_curated), and all three already duplicate_ToolMissing+_ProcessTimedOut+_ProcessExited+_stderr_tail+ the_envelopeshape verbatim. The trigger has already fired; the story author counted gitleaks as "the third" instead of "the fourth". - Fix: S6-07's author should pull the extraction trigger immediately on encountering the duplication. The story-prescribed deferral ("not before") is over-conservative — by the time S6-07 lands, four scanners will duplicate four identical concepts.
- Why it matters: This isn't a bug, it's a counting reframe. Whenever a story's "extract when N scanners share K constructs" reasoning fires, count the constructs not the scanners-yet-to-land. If the constructs already duplicate in the story-at-hand, the trigger is already valid — defer the extraction only if the next story has a genuinely different shape, not because the count is one short.
S6-07 — Validator can't catch story-vs-kernel contradictions that only surface at integration¶
- Symptom: S6-07's HARDENED pass caught 12 BLOCK-severity kernel-drift items (signature shapes, import paths, ABC contract). But three more contradictions only surfaced once the full
codegenie gatherpipeline ran: (a)raw_artifacts: list[Path]vs the story'slist[tuple[str, bytes]](story prescription described an interface that doesn't exist in the kernel); (b)declared_inputs = ["**/*"]is a known kernel footgun (IsADirectoryErrorin the input-snapshot computer — S6-06 lesson #1, restated as a story prescription); (c) AC-RP1 pre-redaction at the probe makes the envelope-sidesecrets_redacted_count0 by design, contradicting AC-20's>= 2expectation and AC-13's "marker in repo-context.yaml" expectation. - Fix: Validator should sanity-check the prescribed code against an actual import + a minimal
gatherinvocation, not just against the abstract ABC contract. For S6-07's three integration contradictions, that single dry-run would have caught all of them. Executor compensates: AC-13 / AC-20 reframed to accept either the probe-side (gitleaks-raw.json) or envelope-side (repo-context.yaml) marker location;declared_inputsenumerated to a 25-glob list mirroringripgrep_curated+ secret-hunting targets; the unit tests adapted to read on-disk file bytes rather than the prescribed tuple-bytes interface. - Why it matters: Future stories that depend on probe-side redaction or other "structural defense at the I/O boundary" patterns should be validated against a real
gatherrun during HARDENED, not just against the abstract ABC + module-level signatures. The cost is one extrapython -m codegenie gather <fixture>per validation; the benefit is catching three category-of-contradiction defects per story before the executor pays for them in attempt cycles.
S6-07 — Test-ordering dependency: sbom/cve register only via side-effect test imports¶
- Symptom: The integration test
test_non_node_repo::test_non_node_go_registry_filter_couples_to_detected_languagesexpectssbom+cvein the envelope probe-key set. Butsrc/codegenie/probes/__init__.py's explicit-import collection block omits both modules (S5-04 cleanup gap). They register intodefault_registryonly whentest_sbom.py/test_cve.pyexecute and import them — which makes the integration test pass in the full pytest collection (test_sbom is collected first) and fail in isolation. - Fix (deferred to S5-04 follow-up): Add
sbom+cveto the layer_c explicit-import block inprobes/__init__.py. One-line change per module. Worth keeping out of S6-07 to keep the change attributable but flagged as a kernel cleanup. - Why it matters: Test-ordering coupling is a CI-passes-but-locally-fails footgun. The explicit-import collection pattern is load-bearing per Phase 0 (ADR-0007 supply-chain hygiene); any probe module not in that list becomes test-collection-dependent. Future probe stories MUST add the import alongside the registration decorator.
Lesson 38 (S7-01) — Fixture closed-set tests use git ls-files, not rglob¶
- Symptom: Phase-1's
tests/unit/test_fixture_node_typescript_helm_shape.pywalks the filesystem viargloband filters via a noise frozenset (__pycache__,.pytest_cache,.DS_Store). That worked for Phase 1 because the fixture had no.gitignore— every file in the tree was tracked. Phase-2 fixtures ship.gitignoreentries (.codegenie/for runtime artifacts,built-image.digestfor distroless-target). A localregenerate.shrun or a straycodegenie gatherwrites to those gitignored paths;rglobcannot distinguish "tracked" from "locally regenerated artifact". - Fix: Phase-2 fixture closed-set tests call
git ls-files <fixture-path>viarun_allowlisted("git", "ls-files", -z, ...).git ls-fileshonors.gitignoreautomatically — only tracked files appear in its output, so a gitignored runtime artifact never trips the test. - Why it matters: The rule is "do not commit", not "do not run". The
rglob+noise-frozenset approach conflates the two. Thegit ls-filespattern matches the rule. When S7-02 lifts the shape-test kernel (5 consumers), unifying Phase-1 onto the same path is the natural landing point.
Lesson 39 (S7-01) — _ProbeName Literals must use the registered probe name, not the module name¶
- Symptom: S7-01's story body listed
service_topology_stubandslo_stubin the_ProbeNameLiteral example. The Phase-2 S6-05 probes shipname = "service_topology"andname = "slo"(the_stubsuffix is on the module name only —src/codegenie/probes/layer_e/service_topology_stub.pydeclares_PROBE_ID = ProbeId("service_topology")). - Fix: Cross-check
_ProbeNameLiteral members against{p.name for p in (cls() for cls in default_registry.all_probes())}before the closed-set test runs. AC-37's subset semantics (registered ⊆ literal) makes this a runtime test, not just a doc invariant. - Why it matters: Probe-name drift between module name and
Probe.nameis invisible at static-analysis time. The runtimedefault_registry.all_probes()accessor is the source of truth; story examples can lag behind. When a future story renames a probe (e.g., promoting a_stubto a full probe), the AC-37 subset check is the test that fires.
Lesson 40 (S7-01) — Rule-of-three carve-out earned when policy is load-bearing¶
- Symptom: Three fixture regen-allowlist tests share a shell tokenizer + a coreutils frozenset. Strict Rule-of-Three says wait for a fourth before extracting a kernel.
- Fix: S7-01 ships
tests/unit/_fixture_regen_allowlist.pyat three consumers because the policy (structural enforcement of ADR-0001 at the fixture boundary) is load-bearing: a copy-pasted tokenizer across three sites lets a future "tidying up" of two silently weaken the third. The shared module has one short file with two exported names; the cost is one import per consumer. - Why it matters: Rule-of-Three is a heuristic against premature abstraction. When the policy is structural enforcement (security, safety, supply-chain), the heuristic flips — duplicated enforcement is worse than a kernel even at low consumer counts. Document the carve-out inline so a future reader sees why the rule was bent.
Lesson 41 (S7-05) — unittest.mock.Mock lacks __qualname__; registry decorator reads it¶
- Symptom: Story prescribed an
AC-16mock-strategy registration usingMock(return_value=sentinel_graph).register_dep_graph_strategy(target)(spy)raisesAttributeError: __qualname__at decoration time becauseDepGraphRegistry.registerreadsfn.__module__+fn.__qualname__to log the origin string.Mock.__getattr__rejects every magic-named attribute lookup. - Fix: Use a plain
def _mock_strategy(ctx, manifests) -> nx.DiGraph: ...with acall_log: list = []for assertion. Cleaner thanMockfor two reasons: (a)__qualname__resolves naturally; (b) the assertion target (call_log == [(None, [])]) reads better thanspy.assert_called_once_with(...)for typed positional dispatch. - Why it matters: Any future registry-symmetry story (Phase 3 dep-graph strategies, Phase 5+ sandbox plugins, Phase 8 polyglot probes, Phase 15 recipe-author plugins) that touches
register_*+unregister_for_testswith a mock will hit the same trap. Plain functions overMockwhenever the decorator reads__qualname__.
Lesson 42 (S7-05) — codegenie.depgraph first-load triggers a circular-import chain¶
- Symptom: Running
tests/property/test_dep_graph_strategy_dispatch.pyin isolation crashes at collection time withImportError: cannot import name 'default_dep_graph_registry' from partially initialized module 'codegenie.depgraph'. The chain:codegenie.depgraph.__init__→codegenie.depgraph.registry→codegenie.probes.base.ProbeContext→codegenie.probes.__init__→codegenie.probes.layer_b.dep_graph→codegenie.depgraph(partial). Other test suites that touchcodegenie.probesfirst happen to win this race; pure property runs do not. - Fix: One-line
import codegenie.probes # noqa: F401at the top of the test file, beforefrom codegenie.depgraph import .... The side-effect import primes the loader so the layer_b back-reference resolves cleanly. Documented in the test file's import block. - Why it matters: Any future Phase-2 / Phase-3 test file that imports from
codegenie.depgraph(and is plausibly collected withoutcodegenie.probesbeing loaded first) needs the same preload. Structural follow-up worth tracking:codegenie.depgraph.registry'sfrom codegenie.probes.base import ProbeContextcould move into aTYPE_CHECKINGblock to break the cycle at the source, sinceProbeContextis used only for the strategy alias annotation.
L-S8-01 — Nested match is the codebase convention for Pydantic discriminated unions¶
- Symptom: A flat
match value:withcase Stale(reason=CommitsBehind(...)),case Stale(reason=DigestMismatch(...)), etc., followed bycase _: assert_never(value), fails mypy under--warn-unreachable:Argument 1 to "assert_never" has incompatible type "Stale"; expected "Never". mypy does not narrow Pydantic's nested Annotated discriminated union (Stale.reason: StaleReason) through the outer match arms. - Fix: Nest the match — outer over
Fresh | Stale, inner overStaleReason— withassert_neveron each level's default arm. This is precisely what the producer (codegenie.probes.layer_b.index_health._derive_confidence) already does, and the function's docstring explicitly calls out the reason. - Why it matters: Future stories that prescribe "single match" or "flat exhaustive case arms" over Pydantic discriminated unions with a nested discriminated-union field will hit the same mypy limitation. Plan for the nested form up front. The deviation is strictly stronger than the literal AC (both levels get
assert_neverenforcement) — surface it inphase-story-validatorreview rather than smoothing it in silently atphase-story-executortime.
L-S8-02 — BudgetingContext vs ProbeContext API drift (Phase-2 follow-up)¶
- Symptom: Probes that read
ctx.config,ctx.output_dir,ctx.cache_dir, orctx.loggerfail at dispatch withAttributeError: 'BudgetingContext' object has no attribute '<attr>'. The coordinator constructsBudgetingContext(workspace,raw_artifact_mb,bytes_written,parsed_manifest,input_snapshot) but theProbeContextABC inprobes/base.pydeclares the full surface (cache_dir,output_dir,workspace,logger,config+ Phase 1/2 additions). At least five probes are affected as of S8-02:runtime_trace,ast_grep,tree_sitter_import_graph,semgrep,skills_index. All are failure-isolated to emptyschema_sliceon every gather, which means several Layer-B/C/D contributions are silently absent fromrepo-context.yaml. - Fix (deferred): Out of scope for S8-02 (surgical changes — Rule 3). A dedicated story should reconcile the two context shapes: either widen
BudgetingContextto include the missing fields (and pass real values), or amend the probes to consume the narrower surface explicitly. Inventory the affected probes viagrep -rn "ctx\.\(config\|output_dir\|cache_dir\|logger\)" src/codegenie/probes/. - Why it matters: Phase-2's "facts, not judgments" promise is undermined when half the Layer-B/C/D probes return empty slices unnoticed. The CLI summary block (S8-02) renders
secrets_redacted_count=0 / fingerprints=[] / skill_shadowed=[]on a clean fixture today, but the zeros are partly an artifact of probes never running — not of clean evidence.
L-S8-02b — print( is doubly-banned; sys.stdout.write is the operator-facing stdout convention¶
- Symptom: A new
for line in block.as_lines(): print(line)insrc/codegenie/cli.pytrips both ruff'sT201rule and theforbidden-patternspre-commit hook. The latter matches the literalprint(substring anywhere in the file, including comments — so even a docstring or inline comment that mentionsprint(by name fails the hook. - Fix: Use
sys.stdout.write(line + "\n")for operator-facing stdout insrc/. Structured logs go through structlog; this is the second sanctioned output surface. Any comment explaining the choice must avoid the literal tokenprint((rephrase, e.g., "stdlib's builtin emit-and-newline helper"). - Why it matters: Future stories that prescribe stdout output ("the CLI prints X", "log Y to the operator", etc.) should map "print" → "sys.stdout.write" without surfacing the conflict mid-implementation. The story's "calls print" is casual phrasing; the codebase convention is the contract (CLAUDE.md Rule 11).
L-S8-04 — blake3 PyPI package ≠ hashlib.blake2b(digest_size=32)¶
- Symptom: A "BLAKE3 freeze" test that imports
blake3(the PyPI package implementing the real BLAKE3 hash) produces a different digest thanhashlib.blake2b(digest_size=32). The two algorithms are distinct; the digest length matches but the bytes do not. - Fix: When a story prescribes "BLAKE3 freeze", import
blake3(already in[dev]extras) and pin viablake3.blake3(bytes).hexdigest(). Ahashlib.blake2bfallback is acceptable ONLY ifblake3is not on the dependency tree — and the test's expected-hash constant must be computed via the same function. - Why it matters: Future "file-freeze" tests (cassette pins, schema pins, etc.) need the right hash function. Sanity-check the expected constant by recomputing it locally with the EXACT import the test uses.
L-S8-04b — Doc-only stories still benefit from a typed registry when tests are substring-driven¶
- Symptom: Eight heterogeneous GitHub-issue payloads as inline dicts in a script makes per-issue test functions a copy/paste exercise. Adding a ninth issue silently widens the surface; primitive obsession on
strformilestonemakes typos invisible. - Fix: A
Final[tuple[IssueSpec, ...]]of frozen Pydantic models in a pure-data module (zerosubprocess/osimports), consumed by a separate impure shell. Tests parametrize over the tuple; AC-driven literal-substring assertions become a sweep, not a stamp. - Why it matters: S8-04 had 16 substring-driven assertions across 5 handoff + 3 backlog issue bodies. The registry kept the test file ~200 LOC; the dict-shuffling alternative would have been 500+. Open/Closed at the registry boundary: a future handoff story adds one row, the script logic stays unchanged.
L-meta — Phase retrospectives are auto-aggregated, then human-curated¶
- Source: Phase 2 closeout.
- Symptom:
_lessons.mdgrows append-only to ~84 H2 entries; attempt logs accumulate ~35 files × ~5 H3 sections each. Pattern discovery becomes archaeology — no one re-reads 200 sections to find the recurring shapes. - Fix:
scripts/synthesize_phase_lessons.py --phase <slug>walks every_attempts/S*.mdlog, buckets H3 sections by keyword intoRETROSPECTIVE.md. Pure code, no LLM, no judgment — output is byte-deterministic and re-runnable. Categorization is the human's job; the script just surfaces the raw signal grouped by category. - Why it matters: Future phases inherit a 4-bucket retrospective at zero cost. The validator skill can read prior-phase
RETROSPECTIVE.mdfiles to harden upcoming stories against patterns we've already paid for. Re-run after each story closes (or on phase exit) — output is idempotent.