ADR-0004: Image digest as a declared_inputs special token, not a cache_key() override
- Status: Accepted
- Date: 2026-05-14
- Tags: cache · declared-inputs · chokepoint-preservation · contract-fidelity · probe-context · additive-extension
- Related: 02-ADR-0003, Phase 1 ADR-0002, Phase 0 ADR (declared_inputs cache key), production ADR-0006
Context
RuntimeTraceProbe (Phase 2 Layer C4) captures the syscalls, loaded libraries, and shell invocations of the analyzed repo's container under five scenarios. The probe's cache-correctness story is load-bearing: a package.json-only change (no Dockerfile change, image rebuilt with the same digest) must cache-HIT, while a FROM-line bump or a base-image rebuild (new digest) must cache-MISS. The signal that distinguishes the two cases is not in the Dockerfile bytes; it is in the resolved image digest that docker build produces. localv2.md §4 makes declared_inputs the single, universal cache-key derivation primitive: Phase 0's Cache reads files by glob from declared_inputs, hashes their contents, and folds in non-filesystem inputs through localv2.md §4's "special token" mechanism.
The performance lens proposed letting RuntimeTraceProbe override cache_key() directly, deviating from declared_inputs and introducing a parallel cache-key derivation pathway. The critic ([P] finding #6) flagged this as a structural deviation that future probes would copy: once one probe bypasses declared_inputs, the discipline becomes opt-out by convention rather than opt-in by chokepoint. Worse, cache_key() overrides hide the actual inputs from tests/unit/test_cache_key_stability.py's structural checks.
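One way a structural check could catch the bypass the critic describes is sketched below. Probe is a hypothetical stand-in for the Phase 0 ABC, and probes_overriding_cache_key is an illustrative helper, not the contents of tests/unit/test_cache_key_stability.py. typing.final lets a type checker reject overrides statically; the runtime identity check catches any that slip through.

```python
from abc import ABC
from typing import ClassVar, final


class Probe(ABC):
    """Hypothetical stand-in for the Phase 0 Probe ABC."""

    declared_inputs: ClassVar[tuple[str, ...]] = ()

    @final  # a type checker such as mypy flags subclasses that override this
    def cache_key(self) -> str:
        # In this sketch the Cache layer, not the probe, derives the key from
        # declared_inputs; the method exists only so overrides are detectable.
        raise NotImplementedError("derived by the Cache layer from declared_inputs")


def probes_overriding_cache_key(probe_classes: list[type[Probe]]) -> list[str]:
    """Return the names of probe classes that replace Probe.cache_key.

    Looking a method up on a class returns the plain function, so identity
    with Probe.cache_key holds exactly when nothing in the MRO overrides it.
    """
    return [cls.__name__ for cls in probe_classes if cls.cache_key is not Probe.cache_key]
```

In a real suite the class list would more likely come from walking Probe.__subclasses__() recursively; that discovery step is left out here.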
The synthesis (final-design.md §"Conflict-resolution table", rows 9 and 16) picked the alternative: extend declared_inputs with a special token image-digest:<resolved>, which is exactly the special-token mechanism localv2.md §4 already permits. The resolved digest is supplied via a new optional callable on ProbeContext (image_digest_resolver: Callable[[Path], str | None] | None = None), mirroring Phase 1 ADR-0002's parsed_manifest precedent: one optional callable, defaulting to None, defensively checked at the call site, and ADR-gated. This is the only ProbeContext field Phase 2 adds; the Probe ABC itself stays untouched.
Options considered
- Option A: Override cache_key() on RuntimeTraceProbe; bypass declared_inputs for the image-digest signal. Pattern: none (chokepoint bypass). Performance lens's pick. Two cache-key derivation pathways exist; future probes copy the override; localv2.md §4 discipline survives only by convention.
- Option B: Compute the image digest inside the probe, then add it as an in-memory amendment to a probe-private cache key. Pattern: Strategy at the cache layer. Same structural problem as Option A: the chokepoint sees one signal; the probe maintains a second.
- Option C: Make RuntimeTraceProbe.declared_inputs include Dockerfile and .codegenie/scenarios.yaml only; accept that a base-image rebuild silently cache-HITs until the Dockerfile is edited. Best-practices lens's pick (silent). Wrong: the probe-quality regression is invisible to the operator; B2 might not catch it.
- Option D: Extend declared_inputs with a special token image-digest:<resolved>; resolve via an optional ProbeContext.image_digest_resolver callable mirroring Phase 1 ADR-0002. Pattern: Additive extension at the right layer + Open/Closed for declared_inputs. Synthesis pick. Same cache pathway for all probes; one auditable chokepoint; the special-token mechanism localv2.md §4 already permits.
Decision
Adopt Option D. RuntimeTraceProbe.declared_inputs lists ["Dockerfile", ".codegenie/scenarios.yaml", "image-digest:<resolved>"]. The image-digest: token is a localv2.md §4 special-token form; the Phase 0 Cache layer recognizes the token prefix and resolves it via ProbeContext.image_digest_resolver(repo_root) -> str | None, a new optional callable on ProbeContext defaulting to None. Probes that don't need it ignore it (the field is Optional); the cache layer falls back to declared-input file globs when the resolver is None or returns None. cache_key() is NOT overridden on any probe. The contract surface localv2.md §4 froze is preserved by addition, not by bypass. Pattern: Additive extension at the right layer — special-token mechanism already permitted; one optional callable mirroring Phase 1 ADR-0002.
Tradeoffs
| Gain | Cost |
|---|---|
| declared_inputs remains the single cache-key derivation primitive across all probes; the Phase 0 I1 contract is preserved verbatim (production ADR-0006's deterministic-gather commitment) | One new optional callable on ProbeContext: ProbeContext grows by one field (now parsed_manifest + image_digest_resolver); the precedent for "Phase N adds one optional callable per phase" is now twice-set, and a Phase 3+ addition follows the same shape |
| Mirrors the Phase 1 ADR-0002 precedent exactly: the parsed_manifest callable set the additive-optional shape, and this ADR uses the same shape for image_digest_resolver. The "what's allowed on ProbeContext" pattern is now load-bearing precedent | A reader scanning ProbeContext sees two Callable \| None = None fields; the discipline is "every such field is ADR-gated, additive, and defaults to None", and it must be enforced in code review because the type system cannot refuse a third arbitrary callable |
| image-digest: is a localv2.md §4-permitted special token; the Phase 0 Cache layer's special-token resolution path is the natural extension point | The Phase 0 Cache layer gains a token-recognizer dispatch. Today there is one token type (image-digest:), so the dispatch is a one-arm match, but the shape ratchets if more special tokens are added later (Phase 7's distroless target manifest? Phase 14's cross-repo SCIP?) |
| A package.json-only change whose rebuilt image keeps the same digest cache-HITs correctly; an upstream change behind an unchanged FROM line (same Dockerfile bytes, different resolved digest) cache-MISSes correctly. The user's mental model ("changing the image invalidates the trace cache") is faithfully encoded | The resolver's failure mode is silent if the implementer returns None carelessly; tests/adv/phase02/test_image_digest_drift.py (load-bearing adversarial) is the structural check that mutating the built image between gathers invalidates tier-C caches |
| C-tier probes (syft, grype, runtime_trace) all benefit: SyftProbe and GrypeProbe declare the same token, so they share cache invalidation with RuntimeTraceProbe when the image digest changes | Three probes now depend on the same optional callable being supplied; if the coordinator forgets to bind image_digest_resolver, all three silently fall back to declared-input files only. Mitigation: the resolver is bound once at coordinator setup, and tests cover the absence path explicitly |
| Phase 3+ probes that need cache invalidation against an opaque external signal (e.g., a Phase 7 distroless-target manifest fingerprint, a Phase 14 cross-repo SCIP head) can extend declared_inputs with their own special token plus a new optional ProbeContext callable; the precedent is now twice-set and the shape is bounded | Future special tokens require their own ADR amendment to this one (or to Phase 1 ADR-0002); the friction is the point, and ad-hoc ProbeContext growth is refused |
Pattern fit
Pattern: Additive extension at the right layer + Open/Closed for declared_inputs (design-patterns-toolkit.md §"Open/Closed Principle"). The toolkit's prescription, "open for extension, closed for modification … adding a new feature should not require editing existing code", is honored exactly: declared_inputs is unchanged in shape (still a list[str]); the special-token semantics are unchanged in mechanism (localv2.md §4 already names the form); and ProbeContext's additive-optional shape was set by Phase 1 ADR-0002. The failure mode the toolkit warns against for this pattern ("the central dispatch_task_class(name) function has a match name block that grows every time") is bounded here: the Phase 0 Cache's token recognizer is a one-arm match today, and new tokens add arms via ADR amendment, not silent edits. The design also composes with Tagged union for state discipline: image_digest_resolver: Callable[[Path], str | None] | None is honest about both "I might not be supplied" and "I might not resolve."
Consequences
- src/codegenie/probes/base.py (ProbeContext dataclass) gains one field: image_digest_resolver: Callable[[Path], str | None] | None = None. The Probe ABC itself is not edited (the Phase 0 contract-freeze snapshot still passes).
- src/codegenie/cache.py (Phase 0) gains a token-recognizer dispatch in _resolve_declared_inputs. Tokens are recognized by the <name>:<value> syntax; today's one token (image-digest:) is resolved by calling ctx.image_digest_resolver(repo_root). Unknown tokens raise CacheKeyError(reason="unknown_special_token", token=…), failing loud per Rule 12.
- RuntimeTraceProbe.declared_inputs lists the token; if the resolver is None or returns None (e.g., no image built yet), the probe emits confidence="unavailable" and the cache key falls back to file globs only.
- SyftProbe and GrypeProbe declare the same image-digest: token in declared_inputs, so they correctly share cache invalidation with RuntimeTraceProbe.
- tests/adv/phase02/test_image_digest_drift.py (load-bearing adversarial) asserts that mutating the built image between gathers invalidates tier-C caches, and that the same Dockerfile bytes with a new digest produce a different cache key.
- tests/unit/test_cache_key_stability.py (Phase 0) is extended, not edited, with the special-token round-trip cases.
- The performance-lens-proposed cache_key() override hook stays rejected. Phase 0's chokepoint is preserved; a Phase 3+ probe that wants cache invalidation against an opaque signal extends declared_inputs with its own special token (and adds an optional ProbeContext callable via a new ADR amendment to this one).
- The pattern is now twice-precedented (Phase 1's parsed_manifest + Phase 2's image_digest_resolver); a Phase 3 addition of a third optional callable carries the burden of ADR-gating and of naming the probe that triggers it. Ad-hoc growth is refused; the precedent is auditable.
Reversibility
Medium-high. Removing the image-digest: token means editing declared_inputs on RuntimeTraceProbe, SyftProbe, and GrypeProbe (all three declare it, and any declaration left behind would raise the unknown_special_token error once the dispatch arm is gone), deleting the Cache dispatch arm, and removing the ProbeContext.image_digest_resolver field (or leaving it defaulted to None and never set). The probes degrade to file-only cache keys, so silent cache-HITs after base-image rebuilds would return, but the structural rollback is small. The harder reversal is changing the special-token syntax (e.g., to ${image-digest} or some YAML-ish escape): that would require coordinated edits to the declared_inputs literals across probe modules. No such reshape is contemplated, and localv2.md §4 already pinned the <name>:<value> form.
Evidence / sources
- ../final-design.md §"Conflict-resolution table" row 9, row 16: cache_key strategy + RuntimeTraceProbe cache-key shape
- ../final-design.md §"Components" #6 RuntimeTraceProbe: cache-key special-token rationale
- ../final-design.md §"Departures from all three inputs" #2: image digest as declared-input special token (not as cache-key override)
- ../phase-arch-design.md §"Component design" #6: ProbeContext.image_digest_resolver as the one Phase-2 ProbeContext addition
- ../phase-arch-design.md §"Data model": explicit single-field ProbeContext extension
- ../phase-arch-design.md §"Edge cases" row 14: image-digest resolver returns None path
- ../critique.md §"Attacks on the performance-first design" #6: cache_key() bypass framing
- Phase 1 ADR-0002: additive-optional ProbeContext precedent
- localv2.md §4: special-token mechanism in declared_inputs
- Production ADR-0006: deterministic-gather commitment that declared_inputs operationalizes