ADR-0002: fence CI job enforcing no-LLM-in-gather¶
Status: Accepted Date: 2026-05-11 Tags: ci · determinism · supply-chain · invariant Related: ADR-0006, production ADR-0005
Context¶
production/design.md §2.1 and production ADR-0005 make "No LLM in the gather pipeline" a load-bearing architectural commitment. Reversibility on that ADR is High — adding an LLM call to a probe would break the cache contract, the replay guarantee, the cost-prediction model, and the audit story simultaneously.
../critique.md §7.2 and ../critique.md §6.1 both observe that the commitment is enforced nowhere in the lens designs. Performance and best-practices ship [project.optional-dependencies.dev] as the only extra; security mentions a gather extra but doesn't design it. Phase 4 will introduce anthropic / langgraph into the same pyproject.toml unless the structural separation is designed now. The roadmap explicitly defers the fence to "Phase 4 will route LLM SDKs through a service extra" — by which point the wheel may already pull them transitively.
A load-bearing commitment that isn't a test fails open under contributor pressure.
Options considered¶
- Documentation-only (status quo across all three lens designs).
production/design.md §2.1says "no LLM in gather" and the team commits to honoring it. Trust contributors and PR reviewers. - Lint rule blocking
from anthropic import/from langgraph importinsrc/codegenie/. Catches direct imports; misses transitive deps andimportlib-style indirection. - Runtime check at CLI startup. Inspect
sys.modulesafter import; refuse to run if an LLM SDK is loaded. Brittle (depends on probe import order); detection lags introduction. - CI test asserting the wheel's runtime dependency closure contains no LLM SDK (synth, load-bearing). Walks
importlib.metadata.distribution("codewizard-sherpa").requires; intersects with{"anthropic", "langgraph", "openai", "langchain", "transformers"}; asserts empty. Catches direct and transitive contamination before merge.
Decision¶
A dedicated fence CI job runs tests/unit/test_pyproject_fence.py on every PR. The test computes set(distribution("codewizard-sherpa").requires) ∩ {"anthropic", "langgraph", "openai", "langchain", "transformers"} over the installed wheel and asserts the intersection is empty. The test ships with a deliberate-negative test (test_fence_catches_planted_anthropic_dep) that plants anthropic in a synthetic pyproject.toml and asserts the check fails — guarding against the check itself silently breaking. The fence job is in the six-job CI matrix and PR-blocking.
Tradeoffs¶
| Gain | Cost |
|---|---|
| Production ADR-0005 stops being aspirational; it becomes an executable test that fails red on regression | One more CI job (~5–10s); one more test to maintain |
| The "no LLM in gather" invariant is enforced from Phase 0, not retroactively in Phase 4 | The fence forces the pyproject.toml shape (see ADR-0006) to have a real agents extra ready before LLM SDKs land |
The deliberate-negative test inoculates against silent rot of the check itself — the named risk in ../final-design.md §10 risk #5 |
Two tests instead of one; ~5 lines extra |
| The intersection list is a contract: future LLM SDKs (Bedrock, Vertex) are added by a one-line PR with mandatory review | The list must be maintained; an SDK not on the list slips through (mitigated: the list is reviewed at each phase boundary) |
Transitive contamination (e.g., a dev plugin pulling openai) surfaces immediately |
The test is scoped to dependencies (not optional-dependencies); the dev closure is allowed to contain LLM SDKs |
Consequences¶
- The wheel's
dependencieslist is the canonical gather-pipeline closure. Thegatherextra in[project.optional-dependencies]is intentionally empty — its existence marks the slot, but the runtime closure isdependenciesitself. - Any future PR adding an LLM SDK must route it through
[project.optional-dependencies.agents]and import it from a code path that is not loaded bycodegenie gather. The fence test fails fast if this discipline slips. - The fence is the structural enforcement of production ADR-0005. It sharpens that ADR from aspirational to tested.
- The
fencejob's signature (~5–10s) keeps the CI walltime budget intact. - The test surface establishes a pattern (deliberate-negative test alongside the real check) reusable for other load-bearing invariants (e.g., the probe-contract snapshot in ADR-0007).
Reversibility¶
Low. Adding or removing entries from the intersection set is a one-line PR. Disabling the job entirely is a deliberate ci.yml edit. The cost of reversing is small mechanically but High politically — the fence is the structural backbone of production ADR-0005; removing it should require an ADR amendment, not a configuration tweak.
Evidence / sources¶
../final-design.md §2.2(fence CI test specification)../final-design.md §3.2(Six CI jobs —fenceis job #6, "the load-bearing job")../final-design.md §10 risk #5(Deliberate-negative test rationale)../phase-arch-design.md §Testing strategy / CI gates(Phase 0 jobs)../critique.md §7.2(Critic flags absence of enforcement)../critique.md §6.1(Shared blind spot: nopyproject.tomlshape for the split)- production ADR-0005 — the commitment this job enforces