ADR-0014: Cassette discipline as a security control (`CassetteSanitizer` + `cassettes.lock` + nightly drift job)

- Status: Accepted
- Date: 2026-05-18
- Tags: ci-enforcement · supply-chain · test-determinism · nightly-canary · content-addressed-manifest
- Related: ADR-0005 (this phase) · ADR-0007 (this phase)

Context

Phase 4's CI runs LLM-touching tests against pytest-recording cassettes (`pytest --record-mode=none`). The cassette layer has two correctness dimensions:

- Secret hygiene. pytest-recording (built on VCR.py) records request headers verbatim by default, including `Authorization` and `x-api-key`. A contributor recording cassettes locally can therefore leak their Anthropic API key into `tests/cassettes/`. The security-lens design ships a sanitizer; the performance-lens design missed it entirely (critique.md §"Things this design missed").
- Cassette-vs-reality drift. Cassettes solve CI determinism but mask SDK upgrades, API shape changes, and prompt-vs-response semantic drift. All three lenses treated cassettes as if they fully solved the determinism problem; the critic correctly flagged this as a shared blind spot (critique.md §"Where do all three quietly agree" item 3).

Phase 6.5's bench harness will read per-case cassette hashes for replay-quality verification, yet only one design (performance) committed to shipping a `cassettes.lock` BLAKE3 manifest.

The honest framing: cassettes are checked-in source code and deserve the same review discipline as any other source. Sanitization is the secret-hygiene control; the manifest is the integrity control; the nightly drift job is the cassette-vs-reality canary.

Options considered

- `record_mode="none"` only, no sanitizer (performance lens, default). Cassettes record raw headers, secrets included. Pattern: Trust the contributor. Each careless `pytest --record-mode=all` run leaks a key.
- Header scrubbing on record (best-practices lens, partial). Strip the `Authorization`, `x-api-key`, and `anthropic-version` headers. Pattern: Sanitize at record. Closes the header-leak hole but catches neither secrets embedded in bodies nor cassette-vs-reality drift.
- Sanitize on record + CI scanner + content-addressed manifest + nightly drift job (synthesis composite). VCR.py `before_record_request`/`before_record_response` hooks, configured through pytest-recording, strip headers and scan bodies for `sk-ant-*`/`claude_*` tokens and 40+-character base64 strings. The CI test `tests/security/test_cassettes_clean.py` rejects any leakage, and `cassettes.lock` records a BLAKE3 hash per cassette. A nightly real-API CI job flags drift. Pattern: Layered control (sanitize + manifest + canary).
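The following is a minimal sketch of the CI scanner named in the third option. It assumes cassettes are stored as VCR.py's default `.yaml` files. The regexes are illustrative stand-ins for the denylist, and the exact token grammars are assumptions. `find_leaks` and `scan_cassette_dir` are hypothetical helper names, not the contents of the real `tests/security/test_cassettes_clean.py`.

```python
import re
from pathlib import Path

# Illustrative denylist; the exact token grammars are assumptions, not the real list.
SECRET_PATTERNS = {
    "anthropic_key": re.compile(r"sk-ant-[A-Za-z0-9_\-]{8,}"),
    "claude_token": re.compile(r"claude_[A-Za-z0-9_\-]{8,}"),
    "base64_blob": re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),
}


def find_leaks(text: str) -> list[str]:
    """Return one finding per secret-shaped match, truncated so logs don't re-leak it."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append(f"{name}: {match.group(0)[:10]}...")
    return hits


def scan_cassette_dir(root: Path) -> dict[str, list[str]]:
    """Map each leaking cassette file under root to its findings."""
    leaks = {}
    for path in sorted(root.rglob("*.yaml")):
        hits = find_leaks(path.read_text(encoding="utf-8"))
        if hits:
            leaks[str(path)] = hits
    return leaks


def test_cassettes_clean() -> None:
    leaks = scan_cassette_dir(Path("tests/cassettes"))
    assert not leaks, f"secret-shaped tokens found in cassettes: {leaks}"
```

Findings are truncated before being reported, so a failing CI log does not itself re-leak the secret. A bare 40+-character base64 rule will flag some long opaque IDs as false positives. An allowlist of known-safe fields is one way to manage that.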

Decision

Phase 4 ships the full layered control:

- Sanitize at record: VCR.py `before_record_request`/`before_record_response` hooks, configured through pytest-recording, strip the `Authorization`, `X-API-Key`, `Cookie`, `Set-Cookie`, and `anthropic-version` headers. They also scan bodies for `sk-ant-*` / `claude_*` patterns and catch 40+-character base64-shaped header values.
- CI security scanner: `tests/security/test_cassettes_clean.py` walks `tests/cassettes/` and fails CI on any leaked pattern (header, body, or shaped token).
- CODEOWNERS gate: any cassette diff requires approval from the cassette-review CODEOWNERS.
- Content-addressed manifest: `tests/cassettes/anthropic/cassettes.lock` records a BLAKE3 hash per cassette. CI compares on-disk hashes against the lock and rejects cassettes that were re-recorded without a matching lock update.
- Nightly drift job: a budget-capped CI job makes real Anthropic API calls against a representative bench fixture and annotates drift. It does not block the workflow; the recovery is to refresh the cassettes and commit them.
- Operator refresh path: `make refresh-cassettes` requires the explicit `--i-understand-this-spends-tokens` flag, and the resulting cassettes require CODEOWNERS approval.
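The following is a minimal sketch of the record-time hooks. It assumes the argument shapes VCR.py passes to each hook: `before_record_request` receives a request object with mutable `headers` and `body`, and `before_record_response` receives a dict with `headers` and `body["string"]`. The regexes, the `REDACTED` placeholder, and the helper names are illustrative assumptions. In a `conftest.py`, the `VCR_CONFIG` dict would be returned from pytest-recording's `vcr_config` fixture.

```python
import re
from typing import Any

# Header names are compared case-insensitively; the set mirrors the ADR's Decision list.
SCRUBBED_HEADERS = {"authorization", "x-api-key", "cookie", "set-cookie", "anthropic-version"}
REDACTED = "REDACTED"
# Illustrative token grammars (assumptions): secret prefixes in bodies, long base64 in headers.
_BODY_PATTERN = r"(?:sk-ant-|claude_)[A-Za-z0-9_\-]+"
_BODY_STR = re.compile(_BODY_PATTERN)
_BODY_BYTES = re.compile(_BODY_PATTERN.encode())
_BASE64_SHAPED = re.compile(r"[A-Za-z0-9+/_\-]{40,}={0,2}")


def _looks_secret(value: Any) -> bool:
    # VCR.py request headers hold strings; recorded response headers hold lists of strings.
    values = value if isinstance(value, list) else [value]
    return any(isinstance(v, str) and _BASE64_SHAPED.fullmatch(v) for v in values)


def _scrub_headers(headers: Any) -> dict:
    clean = {}
    for name, value in dict(headers).items():
        if name.lower() in SCRUBBED_HEADERS:
            continue  # drop denylisted headers outright
        if _looks_secret(value):
            value = [REDACTED] if isinstance(value, list) else REDACTED
        clean[name] = value
    return clean


def _scrub_body(body: Any) -> Any:
    if isinstance(body, bytes):
        return _BODY_BYTES.sub(REDACTED.encode(), body)
    if isinstance(body, str):
        return _BODY_STR.sub(REDACTED, body)
    return body  # None (no body) is left untouched


def scrub_request(request: Any) -> Any:
    """before_record_request hook: must return the (possibly modified) request."""
    request.headers = _scrub_headers(request.headers)
    request.body = _scrub_body(request.body)
    return request


def scrub_response(response: dict) -> dict:
    """before_record_response hook: VCR.py passes a dict with 'headers' and 'body'."""
    response["headers"] = _scrub_headers(response.get("headers", {}))
    body = response.get("body") or {}
    if "string" in body:
        body["string"] = _scrub_body(body["string"])
    return response


# In conftest.py, pytest-recording's `vcr_config` fixture would return this dict.
VCR_CONFIG = {
    "before_record_request": scrub_request,
    "before_record_response": scrub_response,
}
```

VCR.py's built-in `filter_headers` option can handle the fixed-name part of this on its own. The body scan and the base64-shape check are what require custom hooks.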

Pattern: Layered control (sanitize at record + CI scanner + content-addressed manifest + nightly real-API canary). Together the four form the cassette-discipline contract; none is sufficient alone.

Tradeoffs

| Gain | Cost |
|---|---|
| API key exfiltration through committed cassettes is blocked at two layers: the sanitizer strips secrets before write, and the CI scanner is the backstop | A real cassette refresh now takes three discrete steps: regenerate locally (`--i-understand-this-spends-tokens`), pass the CI scanner, and get CODEOWNERS approval. This is slower than a bare `pytest --record-mode=all` |
| Cassette-vs-reality drift is caught by the nightly job as a CI annotation rather than in production | The nightly job spends real tokens (budget-capped CI key); the operator manages the budget cap and reviews annotations |
| The per-case BLAKE3 hashes in `cassettes.lock` are Phase 6.5's contract: bench replay knows which cassette each bench result was replayed from | The lock file must be updated in lockstep with cassette regeneration. A mismatch fails CI, so engineers must understand the regeneration workflow |
| The two correctness controls (cassette determinism and nightly drift) are honestly separated; neither claims to do the other's job | Two failure paths to triage when something breaks; the runbook (`docs/operations/cassettes.md`) documents which signal means what |
| The sanitizer + scanner pattern is reusable for Phase 6.5+: every future LLM-touching cassette inherits the same hygiene | The denylist of secret-shaped patterns shares the canary corpus's incompleteness (ADR-0013) and must grow over time |
| Requiring CODEOWNERS approval for cassette diffs keeps accidental "I just regenerated and pushed" PRs from landing | Contributor friction on legitimate cassette updates, mitigated by the `make refresh-cassettes` ergonomics |
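The following sketch shows one way the `make refresh-cassettes` guard could work. It assumes the Makefile target delegates to a small Python script; the script, `build_command`, and `main` are hypothetical. The only real CLI surface used is pytest-recording's `--record-mode=all` option. Regenerating `cassettes.lock` afterwards is a separate step and is not shown.

```python
import argparse
import subprocess
import sys

CONFIRM_FLAG = "--i-understand-this-spends-tokens"


def build_command(argv: list[str]) -> list[str]:
    """Return the pytest command to run, or exit with a usage error if unconfirmed."""
    parser = argparse.ArgumentParser(
        prog="refresh-cassettes",
        description="Re-record LLM cassettes against the real API (spends tokens).",
    )
    parser.add_argument(CONFIRM_FLAG, dest="confirmed", action="store_true",
                        help="acknowledge that recording makes billable API calls")
    parser.add_argument("paths", nargs="*", default=["tests/"],
                        help="test paths whose cassettes should be re-recorded")
    args = parser.parse_args(argv)
    if not args.confirmed:
        parser.error(f"refusing to record without {CONFIRM_FLAG}")  # exits with status 2
    # pytest-recording's --record-mode=all re-records every cassette the selected tests use;
    # the sanitizer hooks configured in conftest.py still run, so secrets are scrubbed on write.
    return [sys.executable, "-m", "pytest", "--record-mode=all", *args.paths]


def main(argv: list[str]) -> int:
    """Entry point a Makefile target could call, e.g. with sys.argv[1:]."""
    return subprocess.call(build_command(argv))
```

Failing inside argument parsing means an unconfirmed invocation exits before pytest starts, so no token is spent.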

Pattern fit

This is not a textbook design pattern; it is a layered control composition. The closest toolkit fit is the "Functional core / Imperative shell" idea applied to test infrastructure. Cassettes are the pure-replay core (deterministic, reviewable, content-addressed). The nightly drift job and the operator refresh workflow are the imperative shell that keeps the core true to reality.
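The ADR does not specify how the nightly job decides that a response has drifted. One plausible signal, sketched here as an assumption, is a structural diff. It reduces the recorded and live response bodies to their JSON shape (keys and value types, no content) and reports added, removed, or retyped fields. This catches API shape changes but, by design, not semantic drift in the generated text. All function names are hypothetical.

```python
from typing import Any


def shape(value: Any) -> Any:
    """Reduce a JSON value to its structure: object keys and value type names, no content."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in value.items()}
    if isinstance(value, list):
        # Assumes homogeneous arrays: the first element stands in for the rest.
        return [shape(value[0])] if value else []
    return type(value).__name__


def _kind(s: Any) -> str:
    if isinstance(s, dict):
        return "object"
    if isinstance(s, list):
        return "array"
    return s


def diff_shapes(old: Any, new: Any, path: str = "$") -> list[str]:
    """List added, removed, and retyped fields between two shapes."""
    if isinstance(old, dict) and isinstance(new, dict):
        findings = []
        for key in sorted(old.keys() | new.keys()):
            if key not in new:
                findings.append(f"{path}.{key}: removed")
            elif key not in old:
                findings.append(f"{path}.{key}: added")
            else:
                findings.extend(diff_shapes(old[key], new[key], f"{path}.{key}"))
        return findings
    if isinstance(old, list) and isinstance(new, list):
        if old and new:
            return diff_shapes(old[0], new[0], f"{path}[0]")
        return []  # an empty array carries no element shape to compare
    if old != new:
        return [f"{path}: {_kind(old)} -> {_kind(new)}"]
    return []


def shape_drift(recorded: Any, live: Any) -> list[str]:
    return diff_shapes(shape(recorded), shape(live))
```

Under this assumption, the nightly job would run `shape_drift` per fixture case and post any non-empty result as the drift annotation.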

`cassettes.lock` is a content-addressed manifest, the same shape as `embeddings_model.lock` (ADR-0007) and `.codegenie/rag/manifest.yaml`. The pattern recurs because content addressing is how this codebase says "this artifact's identity is its bytes."
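The following is a minimal sketch of the lock check. It assumes a `sha256sum`-style lock format of one `<digest>  <relative path>` line per cassette; the real format is not specified here. It uses the third-party `blake3` package when it is installed. It falls back to the standard library's BLAKE2b only so the sketch runs without that dependency; the real manifest is BLAKE3. Function names are hypothetical.

```python
from pathlib import Path

try:
    from blake3 import blake3 as _hasher  # third-party `blake3` package: the ADR's hash
except ImportError:
    # Sketch-only fallback so the example runs without the dependency; this is not BLAKE3.
    from hashlib import blake2b as _hasher

LOCK_NAME = "cassettes.lock"


def compute_hashes(cassette_dir: Path) -> dict[str, str]:
    """Digest every cassette, keyed by its POSIX path relative to the cassette dir."""
    return {
        path.relative_to(cassette_dir).as_posix(): _hasher(path.read_bytes()).hexdigest()
        for path in sorted(cassette_dir.rglob("*.yaml"))
    }


def write_lock(cassette_dir: Path) -> None:
    """Regenerate the lock; run after re-recording and commit it alongside the cassettes."""
    lines = [f"{digest}  {name}" for name, digest in compute_hashes(cassette_dir).items()]
    (cassette_dir / LOCK_NAME).write_text("".join(line + "\n" for line in lines), encoding="utf-8")


def read_lock(cassette_dir: Path) -> dict[str, str]:
    entries = {}
    for line in (cassette_dir / LOCK_NAME).read_text(encoding="utf-8").splitlines():
        if line.strip():
            digest, name = line.split(maxsplit=1)
            entries[name] = digest
    return entries


def lock_drift(cassette_dir: Path) -> list[str]:
    """Compare on-disk digests with the committed lock; any finding should fail CI."""
    on_disk, locked = compute_hashes(cassette_dir), read_lock(cassette_dir)
    problems = []
    for name in sorted(on_disk.keys() | locked.keys()):
        if name not in locked:
            problems.append(f"{name}: not in {LOCK_NAME}")
        elif name not in on_disk:
            problems.append(f"{name}: in {LOCK_NAME} but missing on disk")
        elif on_disk[name] != locked[name]:
            problems.append(f"{name}: hash mismatch (re-recorded without updating the lock?)")
    return problems
```

The same `lock_drift` output could serve both CI and the Phase 6.5 bench harness: an empty list means every on-disk cassette matches its locked identity.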

Consequences

- `tests/cassettes/anthropic/` is the canonical cassette directory; `cassettes.lock` lives inside it.
- `tests/security/test_cassettes_clean.py` runs in every CI build; a failure is a hard CI block.
- `tests/fence/test_cassette_discipline.py` asserts that `CODEGENIE_LIVE_LLM` is unset in CI.
- `make refresh-cassettes` runs `pytest --record-mode=all` with the sanitizer enabled; its output requires CODEOWNERS approval before merge.
- The nightly CI job runs against a representative bench fixture (`fixtures/vuln-major-bump/express-cve-2026-1234/`) with a budget-capped key; annotations land as comments on the open drift-flag PR.
- Phase 6.5's bench harness reads `cassettes.lock` to verify cassette identity per case. The contract is "the lock matches the on-disk cassette, or Phase 6.5 reports identity drift."
- pytest-recording `record_mode="none"` is the CI default; a cassette miss is a hard failure with a diagnostic pointing to `make refresh-cassettes`.
- The sanitizer also protects contributor laptops, because the same hooks fire during local recording.
- Future Phase 6+ tests inherit the cassette discipline by writing under `tests/cassettes/<vendor>/`; the sanitizer and scanner extend to them transparently.
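The following is a minimal sketch of the fence check in `tests/fence/test_cassette_discipline.py`. It assumes CI is detected through the `CI=true` environment variable, which GitHub Actions and most CI providers set. Helper names are hypothetical. The environment is passed in explicitly so the logic can be tested without mutating `os.environ`.

```python
import os
from typing import Mapping, Optional

LIVE_FLAG = "CODEGENIE_LIVE_LLM"


def running_in_ci(env: Mapping[str, str]) -> bool:
    # Assumption: CI is signalled by CI=true, as GitHub Actions and most providers set it.
    return env.get("CI", "").strip().lower() in {"1", "true", "yes"}


def live_llm_violation(env: Mapping[str, str]) -> Optional[str]:
    """Return an error message if live LLM calls are enabled inside CI, else None."""
    if running_in_ci(env) and LIVE_FLAG in env:
        return f"{LIVE_FLAG} must be unset in CI; cassettes replay with --record-mode=none"
    return None


def test_live_llm_unset_in_ci() -> None:
    violation = live_llm_violation(os.environ)
    assert violation is None, violation
```

The check tests for key presence rather than truthiness. An empty `CODEGENIE_LIVE_LLM=` therefore still counts as a violation, which errs on the strict side of the fence.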

Reversibility

Medium. Disabling an individual control (e.g., removing the nightly job) is a config-level change, but each removal loses one layer of defense. Removing the entire cassette discipline (no sanitizer, no manifest, no nightly job) reverts to the "trust the contributor" state. It would require a Phase 4 ADR amendment with a clear justification (e.g., a migration to a different replay mechanism). Replacing pytest-recording with a different cassette library is an adapter swap behind the same sanitizer/scanner/manifest contract.

Evidence / sources

- ../final-design.md §Component 13 — CassetteSanitizer
- ../final-design.md §Goal "Cassette security scan in CI"
- ../phase-arch-design.md §Component 12 — CassetteSanitizer
- ../phase-arch-design.md §Goals — G11
- ../critique.md §"Things this design missed" (performance missed cassette sanitization)
- ../critique.md §"Where do all three quietly agree on something questionable" item 3 (cassette layer "solves CI determinism")
- roadmap.md §Phase 6.5 (bench harness reads cassette manifests)
