S2-04 — attempt log

Append-only journal. Newest attempt at the bottom.

Attempt 1 — 2026-05-19 (phase-story-executor)

Outcome: GREEN. All 19 ACs covered with runtime evidence; full plugin + fence + static test suite green; ruff, mypy --strict, lint-imports, mkdocs build --strict clean.

What shipped

  • src/codegenie/plugins/resolver.py (NEW) — UNIVERSAL_FALLBACK_ID, _MAX_EXTENDS_DEPTH, ScopedCandidate, ComposedTccm, ConcreteResolution, UniversalFallbackResolution, PluginResolution discriminated-union alias, lift_manifest_scope, _lift_dim, _lift_candidates, _unpack, _filter_matches, _sort_key, _candidates_considered, compose_extends_chain, _plugin_tccm, _merge_tccm, _merge_adapters, resolve, _universal_registered. Functional-core/imperative-shell split per the hardening note.
  • src/codegenie/plugins/registry.py — replaced the S2-01 NotImplementedError("S2-04 …") stub with delegation to resolver.resolve. The resolver import is module-level (not lazy) — see Refactor decisions §"Module-level resolver import" for why.
  • src/codegenie/plugins/resolution.py — collapsed to a one-line re-export shim per the story's "Module placement" choice (a).
  • src/codegenie/plugins/errors.py — added chain payload + exit code to PluginExtendsCycle; new PluginExtendsDepthExceeded exception (with chain + reason ClassVar); new PluginRegistryCorrupted(reason: Literal["missing_universal", "empty_registry"]).
  • src/codegenie/plugins/protocols.py — replaced the stub PluginManifest forward-ref with a real TYPE_CHECKING import of the S2-02 model. Required to satisfy mypy --strict on the resolver's plugin.manifest.{scope,extends,precedence} reads.
  • tests/fixtures/plugins/fake_plugin.py — extended with extends, precedence, manifest_scope_kwargs, adapters_map, kwargs. Default manifest scope changed to (vuln, node, npm) so resolver tests can call make_fake_plugin(name="...") without repeating the kwargs. Uses PluginManifest.model_construct to skip the production parse_plugin_id regex (test fakes use names like a-plugin that don't match the three-segment production format).
  • tests/fixtures/plugins/universal_fallback_fixture.py (NEW) — make_universal_fallback(). Uses PluginManifest.model_construct to skip the regex that rejects * in plugin ids; literal "universal--*--*" lives only in this fixture and in resolver.py's UNIVERSAL_FALLBACK_ID constant.
  • tests/unit/plugins/test_resolver.py — 20 tests covering AC-15 enumeration (13 named cases) + AC-1 __all__ + AC-3 frozen dataclass + AC-12 registry delegation + empty-registry defence.
  • tests/unit/plugins/test_resolver_property.py — Hypothesis property test_resolve_is_total (200 examples, deadline=None) + meta-property test_property_strategy_never_generates_universal_id.
  • tests/unit/plugins/test_resolver_purity.py — AC-13 module-purity AST scan + AC-17 single-source-of-truth scan for integer literal 4 (mutation M18).
  • tests/unit/plugins/test_resolver_exhaustiveness.py — AC-14 AST scan for case _: assert_never(...) arm + mypy-checked _dispatch_example(resolution: PluginResolution) -> str helper.
  • tests/static/test_universal_fallback_id_single_source.py — AC-2 AST/grep scan for the "universal--*--*" literal across src/codegenie/ and tests/fixtures/plugins/.
  • tests/static/test_no_notimplemented_in_registry.py — AC-12 static scan that the S2-01 stub message is gone.
  • tests/unit/plugins/test_registry.py — flipped test_resolve_stub_names_s2_04 (which forward-referenced the S2-01 stub) into test_resolve_delegates_to_resolver proving the delegation now surfaces PluginRegistryCorrupted("empty_registry") on an empty registry.

Gate log

| Gate | Outcome | Notes |
| --- | --- | --- |
| ruff check . | ✅ pass | All checks passed! |
| ruff format --check . | ✅ pass | 1643 files already formatted |
| mypy --strict src/ | ✅ pass | Success: no issues found in 154 source files (+1 over prior: resolver.py; plus resolution.py collapsed to a shim, protocols.py import widened). |
| lint-imports | ✅ pass | Contracts: 4 kept, 0 broken. |
| make fence equivalent | ✅ pass (modulo pre-existing xfails) | 191 passed, 28 skipped, 2 xfailed. test_no_any_in_plugin_surface initially failed because the resolver used dict[PrimitiveName, Any]; fixed by switching to the real Adapter Protocol (now part of the Phase-3 surface contract). |
| mkdocs build --strict | ✅ pass | Documentation built in 22.36 seconds. |
| Plugin tests | ✅ pass | tests/unit/plugins/ + tests/static/test_*plugin*: 155 passed. |
| Full pytest (no-cov) | partial | 4427 passed, 2 failed, 62 skipped, 5 xfailed. The 2 failures are the pre-existing test_lint_imports_canary tests: they look for lint-imports on the global PATH (not .venv/bin). CI runs make lint-imports via the venv binary, so they are clean in CI. Documented as unrelated in the S2-03 attempt log. |

Ralph-Wiggum naive-verification pass

Walked every AC verbatim against runtime behaviour:

  • AC-2 single source: "If somebody writes the literal anywhere else in src or fixtures, does the scan catch them?" — Planted a stray # "universal--*--*" in errors.py initially (in a docstring); the scan flagged it. Rewrote the docstring to reference UNIVERSAL_FALLBACK_ID. PASS.
  • AC-7 fan-out: "If I give you languages=['node', 'python'] and build_systems='*', do you give me exactly 2 PluginScopes with the right shapes?" — Parametrized table with 4 cases including the universal (*, *, *) → 1 corner. PASS.
  • AC-9 step 4 missing-universal: "If no plugin matches and the universal isn't registered, do you fail loud?" — PluginRegistryCorrupted(reason="missing_universal") raises with the typed reason. PASS.
  • AC-9 step 6 head==universal: "If the universal is the only thing in the registry, do you correctly return the fallback (not raise corrupted)?" — test_only_universal_registered_returns_universal_fallback exercises step 6 specifically (mutation M8 defence). PASS.
  • AC-11 cycle: "If A extends B extends A, do you tell me the cycle path with A repeated at the tail?" — chain == (A, B, A) verified by full-tuple equality. PASS.
  • AC-11 depth-cap: "If A → B → C → D → E, do you refuse at the point of the 5th level?" — chain == (A, B, C, D, E); depth-4 variant passes. PASS. Mutation M7 (> vs >=) would let depth-5 through; the test catches it.
  • AC-14 exhaustiveness: "If somebody adds a third PluginResolution variant tomorrow, will mypy yell?" — Yes: the _dispatch_example helper's case _: assert_never(resolution) arm forces a type-check failure on any new variant that isn't added to the dispatch table. PASS.
  • AC-16 totality: "For 200 random registries + random incoming scopes, does the resolver always return one of the two real variants?" — Exhaustive match over PluginResolution with assert_never proves it at the type AND the runtime layer. PASS.
  • AC-17 magic-number removal: "If somebody bumps the cap to 10 in one place but leaves a stray 4 in another, does the scan catch it?" — Planted if x == 4: in resolver.py temporarily; the scan flagged 2 occurrences (the constant + the planted use). PASS after removing the planted code.
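
The AC-11 cycle and depth-cap behaviour walked through above can be sketched as follows. This is a minimal illustration, not the shipped compose_extends_chain: the exception names ExtendsCycle and ExtendsDepthExceeded, the plain extends mapping, and the exact comparison used for the cap are assumptions made for the example.

```python
from typing import Mapping, Optional, Tuple

MAX_EXTENDS_DEPTH = 4  # assumed cap; the only place the number appears


class ExtendsCycle(Exception):
    """Hypothetical stand-in carrying the offending chain."""

    def __init__(self, chain: Tuple[str, ...]) -> None:
        super().__init__(" -> ".join(chain))
        self.chain = chain


class ExtendsDepthExceeded(Exception):
    """Hypothetical stand-in carrying the over-long chain."""

    def __init__(self, chain: Tuple[str, ...]) -> None:
        super().__init__(" -> ".join(chain))
        self.chain = chain


def compose_extends_chain(
    start: str, extends: Mapping[str, Optional[str]]
) -> Tuple[str, ...]:
    """Follow `extends` links from `start`, failing loudly on cycle or depth."""
    chain = [start]
    current = start
    while True:
        parent = extends.get(current)
        if parent is None:
            return tuple(chain)
        seen_before = parent in chain
        chain.append(parent)
        if seen_before:
            # The repeated id sits at the tail, e.g. (A, B, A).
            raise ExtendsCycle(tuple(chain))
        if len(chain) > MAX_EXTENDS_DEPTH:
            # Refuse at the point the fifth level is reached.
            raise ExtendsDepthExceeded(tuple(chain))
        current = parent
```

In this sketch a four-plugin chain is accepted and a five-plugin chain is rejected, so an off-by-one change to the comparison would flip one of those two cases, which is the kind of slip a full-tuple equality assertion on chain catches.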

Refactor decisions (Rule 3 — surgical)

  • PluginExtendsDepthExceeded is a CodegenieError, NOT a PluginRejected BaseModel variant. Story AC-11 wording says "raise PluginRejected(reason="extends_depth_exceeded", ...)", but PluginRejected is a TypeAlias for a discriminated Pydantic union — BaseModels cannot be raised. I read the contradiction as: the resolver needs a distinct exception class whose payload (reason, chain) matches the proposed variant shape. A future loader-time pre-check (S2-03 successor) can add the equivalent BaseModel variant to PluginRejected for its own Result[X, PluginRejected] return type. Documented in the PluginExtendsDepthExceeded class docstring; tests assert raises(PluginExtendsDepthExceeded) + payload.
  • Module-level resolver import in registry.py. The original cycle (registry ↔ resolver at module load) is already broken by the TYPE_CHECKING import of PluginRegistry inside resolver.py. At runtime resolver.py imports none of registry.py's members. Resolution order: registry.py → resolution.py → resolver.py → done. The from codegenie.plugins.resolver import resolve as _resolver_resolve lives at module level and is bound at registry-load time. Why this matters: the fence test tests/fence/test_no_llm_in_transforms.py pops codegenie.plugins.* from sys.modules and walks the package to re-import every submodule. A lazy from codegenie.plugins import resolver as _resolver inside resolve() would fetch the new (C2) module after the fence pop while the test's Concrete() bindings still hold the old (C1) class, so the resolver would hit assert_never(dim) on the C1/C2 mismatch. The module-level bind freezes the lookup at registry.py load time, so _resolver_resolve and the test's Concrete are class-identity consistent. Documented in the PluginRegistry.resolve docstring.
  • tests/fixtures/plugins/fake_plugin.py default scope changed to (vulnerability-remediation, node, npm). Previously (vulnerability-remediation, javascript, npm). The resolver tests use (vuln, node, npm) as the canonical incoming scope; matching the default avoids forcing every test that wants a matching plugin to repeat the kwargs. Documented inline.
  • make_fake_plugin uses PluginManifest.model_construct. The S2-02 parse_plugin_id validator requires three segments separated by -- (e.g., vulnerability-remediation--node--npm); resolver tests use semantic names like a-plugin for sort tie-breakers. model_construct skips the validator. The manifest loader's own tests (test_manifest.py) exercise the production regex; resolver tests should not be coupled to it.
  • tests/unit/plugins/test_registry.py imports moved to module-level. The S2-04 update of test_resolve_stub_names_s2_04 → test_resolve_delegates_to_resolver originally used function-body local imports. Local imports re-fetch from sys.modules at the time of the call; after the fence test pops modules, local imports return the C2 classes while the test's existing module globals hold C1. Moving Concrete, PluginScope, and PluginRegistryCorrupted to module-level bindings makes them consistent with the rest of the file.
  • Adapter Protocol used as the runtime adapter-value type, not Any. The fence test_no_any_in_plugin_surface forbids Any in Phase-3 surface. The _FakeAdapter test-local dataclass satisfies the Protocol's one attribute (primitive: PrimitiveName); the resolver's composed map is now dict[PrimitiveName, Adapter].
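
The C1/C2 class-identity mismatch behind the two import decisions above can be reproduced with the standard library alone. The sketch below uses a throwaway module named demo_scope_mod (a made-up name, unrelated to codegenie) to show that popping a module from sys.modules and importing it again yields a new class object with the same name.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple

MODULE_NAME = "demo_scope_mod"  # hypothetical throwaway module


def reimport_after_pop() -> Tuple[type, type]:
    """Import a module, pop it from sys.modules, import it again."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{MODULE_NAME}.py").write_text("class Concrete:\n    pass\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            first = importlib.import_module(MODULE_NAME)
            c1 = first.Concrete  # binding captured before the pop
            sys.modules.pop(MODULE_NAME)
            second = importlib.import_module(MODULE_NAME)  # re-executes the file
            c2 = second.Concrete  # what a lazy import would see after the pop
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
    return c1, c2


c1, c2 = reimport_after_pop()
print(c1 is c2)              # False: two distinct class objects
print(isinstance(c1(), c2))  # False: a C1 instance fails a C2 check
```

Any isinstance check or class pattern that compares an instance built from the old binding against the re-imported class therefore falls through, which is why binding names once at module load keeps both sides on the same class object.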

Follow-ups surfaced (not folded in — Rule 3)

  • ComposedTccm real shape — S3-01 lands the real TCCM Pydantic per Phase-3 ADR-0004. The substitution point is documented on ComposedTccm's docstring and the resolver's _plugin_tccm hook (reads plugin._composed_tccm if present, else empty placeholder).
  • PluginRegistryCorrupted event-log emission — S6-01 wires the spanning-event. This story raises the typed exception only.
  • Loader-time extends cycle / depth / target-missing pre-check — additive to S2-03; the resolver's per-resolve checks are the contract. The pre-check is a fail-fast optimization.
  • Universal-fallback parse_plugin_id allowance — the regex rejects universal--*--*. The S7-03 real fallback plugin will need either (a) a regex amendment + ADR, or (b) a separate loader path. Out of scope here; tests use model_construct.
  • tests/unit/test_lint_imports_canary env drift — these tests look for lint-imports on the global PATH. They were pre-existing local failures named in the S2-03 attempt log; CI runs the venv-binary'd make lint-imports which is clean. The canary tests could be hardened to look in .venv/bin first.
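
One way the canary hardening in the last follow-up could work is a lookup that prefers the project's virtualenv over the global PATH. The helper name find_tool and the project_root parameter are hypothetical, and the .venv/bin layout assumes a POSIX virtualenv (Windows uses Scripts instead).

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_tool(name: str, project_root: Path) -> Optional[str]:
    """Return the path to `name`, preferring <project_root>/.venv/bin."""
    venv_candidate = project_root / ".venv" / "bin" / name
    if venv_candidate.is_file() and os.access(venv_candidate, os.X_OK):
        return str(venv_candidate)
    # Fall back to the global PATH, which is what the canary tests use today.
    return shutil.which(name)
```

A test could then skip, rather than fail, when find_tool("lint-imports", repo_root) returns None; whether skipping is acceptable for a canary is a separate decision.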