ADR-0010: LlmInvocationGuard + BudgetToken — per-workflow budget cap as a function-signature capability
Status: Accepted
Date: 2026-05-18
Tags: capability-pattern · circuit-breaker · type-driven-safety · cost-discipline
Related: production ADR-0024 · production ADR-0025 · ADR-0001 (this phase)
Context
Phase 4 introduces the first place codewizard-sherpa spends money per-request (LLM tokens). Production ADR-0025 commits the project to "hard per-workflow cost cap with explicit override" as a Temporal-middleware-enforced pattern in the production service. Phase 4 (the local POC) needs the same control at its scale: a bug in LeafLlm.invoke that retries a never-resolving prompt should not silently burn tokens until a human notices.
The three design lenses split: performance emitted cost events and deferred enforcement to Phase 13 (the production ledger); best-practices shipped no per-workflow cap at all; security shipped LlmInvocationGuard as a hard cap exposed as a required function-signature argument — calling LeafLlm.invoke(...) without a BudgetToken is a type error.
The critic acknowledged the security shape as load-bearing-correct, with one caveat: the risk of the "capability passed through ten frames" anti-pattern (critique.md §"Anti-patterns from the toolkit's flag on sight list"). The mitigation is that the token must flow through few frames, not many.
Options considered
- No cap; events only (performance lens). LlmInvocationGuard emits cost.llm.call events; no enforcement. Phase 13's production ledger eventually enforces. Pattern: Observability without control. A bug that retries forever spends tokens until alerts fire.
- No cap at all (best-practices lens). Defer entirely to Phase 13. Pattern: Defer-to-platform. Phase 4 ships a system whose spend can silently run away.
- Hard cap via a global counter the adapter checks (alternative). LlmInvocationGuard.check_and_decrement() called at top of invoke. Pattern: Procedural check. A missed-check bug (or a new code path that forgets) silently bypasses the cap.
- Hard cap as a function-signature capability (security lens). LeafLlm.invoke(..., token: BudgetToken); calling without one is a TypeError at call construction. Pattern: Capability pattern (financial) + Circuit breaker.
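The difference between the last two options can be seen in a few lines. The sketch below is illustrative only: `invoke_procedural`, `invoke_capability`, `call_without_token`, and the one-field `BudgetToken` stand-in are hypothetical names, not the project's code. It shows that a required keyword-only parameter makes an omitted token fail at the call itself, whereas the procedural variant still runs when the check is forgotten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetToken:
    """Hypothetical one-field stand-in for the real BudgetToken model."""
    precharged_tokens: int


def invoke_procedural(prompt: str) -> str:
    # Procedural option: nothing in the signature forces a budget check,
    # so a code path that forgets to call it still runs.
    return f"response to {prompt!r}"


def invoke_capability(prompt: str, *, token: BudgetToken) -> str:
    # Capability option: `token` is required and keyword-only, so a call
    # that omits it never reaches this body.
    return f"response to {prompt!r} ({token.precharged_tokens} tokens precharged)"


def call_without_token() -> str:
    """Return the name of the exception raised when the token is omitted."""
    try:
        invoke_capability("hello")  # type: ignore[call-arg]
    except TypeError as exc:
        return type(exc).__name__
    return "no error"
```

At runtime the omission surfaces as a `TypeError`; a static checker run in strict mode would report the same call before the code executes.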
Decision
LlmInvocationGuard is a per-workflow object initialized with max_tokens, max_dollars, per_call_max_tokens, and an event_log. LeafLlm.invoke(...) declares token: BudgetToken as a required keyword arg; the adapter cannot be called without one. The BudgetToken is a frozen=True, extra="forbid" Pydantic model with precharged_tokens, precharged_dollars, issued_at, and a private _marker: Literal["budget_token"] discriminator. The token flows through exactly two frames: FallbackTier → LeafLlm.invoke. Defaults: max_tokens_per_workflow=250_000, max_dollars_per_workflow=$1.50, per_call_max_tokens=32_000. Pattern: Capability pattern (financial) + Circuit breaker — token is a function-signature property, not a runtime check.
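A minimal sketch of the token shape and the two-frame flow, under stated assumptions: a frozen standard-library dataclass stands in for the frozen Pydantic model, `build_prompt` is a hypothetical placeholder for PromptBuilder / FenceWrapper, and `demo_mint` is a hypothetical mint that omits the cap checks a real `precharge` would perform. The method bodies are illustrative, not the project's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from typing import Callable, Literal


@dataclass(frozen=True)
class BudgetToken:
    # Assumption: a frozen dataclass stands in for the frozen Pydantic model.
    precharged_tokens: int
    precharged_dollars: Decimal
    issued_at: datetime
    _marker: Literal["budget_token"] = field(
        default="budget_token", init=False, repr=False
    )


def build_prompt(raw: str) -> str:
    # Hypothetical stand-in for PromptBuilder / FenceWrapper: takes no token.
    return f"<fence>{raw}</fence>"


class LeafLlm:
    def invoke(self, prompt: str, *, token: BudgetToken) -> str:
        # Frame 2. A real adapter would call the model provider here.
        return f"[{token.precharged_tokens} tokens reserved] {prompt}"


class FallbackTier:
    def __init__(self, leaf: LeafLlm, mint: Callable[[int], BudgetToken]) -> None:
        self._leaf = leaf
        self._mint = mint  # stand-in for LlmInvocationGuard.precharge

    def run(self, prompt: str, requested_tokens: int) -> str:
        fenced = build_prompt(prompt)         # no token needed here
        token = self._mint(requested_tokens)  # frame 1 obtains the token
        return self._leaf.invoke(fenced, token=token)  # frame 2 consumes it


def demo_mint(requested_tokens: int) -> BudgetToken:
    # Hypothetical mint with cap checks omitted; the dollar figure is a placeholder.
    return BudgetToken(
        precharged_tokens=requested_tokens,
        precharged_dollars=Decimal("0.01"),
        issued_at=datetime.now(timezone.utc),
    )
```

For example, `FallbackTier(LeafLlm(), demo_mint).run("fix the import", 1000)` passes the token to exactly one callee, and assigning to a field of an issued token raises because the dataclass is frozen.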
Tradeoffs
| Gain | Cost |
|---|---|
| A missed-check bug is structurally impossible: invoke() without a token is a TypeError at call construction, caught by mypy --strict | The cap defaults (250K / $1.50) are uncalibrated Q1-2026 estimates; Phase 13's cost ledger will calibrate; until then conservative numbers |
| Blast radius on runaway spend is bounded per workflow (not per process); concurrent workflows have independent guards | The cap is per-workflow; a portfolio of N workflows can still burn N × cap; Phase 13's per-portfolio cap is the next layer |
| running_total() is a typed surface Phase 5's GateRunner consumes across retries; the budget composes naturally with Phase 5's retry envelope | Reconcile is idempotent on BudgetTokenId; duplicate reconcile calls must be safe; tested in tests/unit/fallback/test_budget_guard.py |
| BudgetToken flows through exactly two frames (FallbackTier → LeafLlm.invoke); the toolkit's "capability passed through ten frames" anti-pattern is avoided by design (token does NOT flow through PromptBuilder or FenceWrapper) | Three frames would already feel like a context object trying to escape; discipline must hold as Phase 6's LangGraph migration lifts FallbackTier into a node |
| cost.llm.call event entries compose with Phase 5's cost.sandbox.run entries for Phase 13's eventual unified ledger | Decimal arithmetic for dollars must be exact (Decimal, not float); tested via Hypothesis property tests in tests/property/test_budget_decimal_exactness.py |
| Override paths for known-expensive cases (per roadmap.md's "operator-mode batch with elevated cap") arrive in Phase 4 as a configuration knob in plugin.yaml, not as a runtime flag | Operator override audit trail is load-bearing; every override emits an event and the override is operator-acknowledged |
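The Decimal-exactness requirement in the table can be illustrated with standard-library arithmetic. This is a small sketch, not the project's property test; `exceeds_cap` is a hypothetical helper and the charge values are made up for the example.

```python
from decimal import Decimal


def exceeds_cap(charges: list[Decimal], cap: Decimal) -> bool:
    """True if the exact sum of the charges is strictly above the cap."""
    return sum(charges, Decimal("0")) > cap


# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts
# just above 0.3 and a strict comparison against a 0.30 cap trips spuriously.
float_total = 0.1 + 0.2
float_trips_cap = float_total > 0.3

# Decimal built from strings keeps the cents exact.
decimal_total = Decimal("0.10") + Decimal("0.20")
decimal_trips_cap = decimal_total > Decimal("0.30")
```

With floats the comparison reports a cap breach that did not happen; with `Decimal` the total equals the cap exactly and the strict comparison stays false.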
Pattern fit
The toolkit's Capability pattern fit is exact: "A token (object) that grants permission to perform an action. Holding the token = having the capability. No capability = no operation." BudgetToken is the financial capability; LeafLlm.invoke is the action. The toolkit names "cost ledger access (SpendCapability(budget_usd=…))" as the canonical example.
The toolkit's anti-pattern flag "Capability passed through ten frames" is the load-bearing constraint: the token flows only through FallbackTier → LeafLlm.invoke. Two frames. It does not flow through PromptBuilder, FenceWrapper, EgressGuard, or SolvedExampleRetriever. The capability is local to the leaf-call site, not a global ambient permission.
Circuit Breaker is the second pattern at play: when a precharge would exceed the cap, BudgetExceeded is raised before any leaf call, halting the spend (the "open" state of the breaker). The breaker resets per workflow; there is no half-open or probing state, because Phase 4 doesn't need that complexity.
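The breaker behaviour described above can be sketched as follows. This is a simplified illustration that tracks tokens only; the dollar cap, per-call cap, and event log are left out, `precharge` returns a plain integer instead of minting a token, and `run_workflow` is a hypothetical driver standing in for the retry loop.

```python
class BudgetExceeded(Exception):
    """Raised when a precharge would push the workflow past its cap."""


class LlmInvocationGuard:
    def __init__(self, max_tokens: int) -> None:
        self._max_tokens = max_tokens
        self._running_total = 0

    def running_total(self) -> int:
        return self._running_total

    def precharge(self, requested_tokens: int) -> int:
        # The check runs before any leaf call is made.
        if self._running_total + requested_tokens > self._max_tokens:
            raise BudgetExceeded(
                f"{self._running_total} + {requested_tokens} > {self._max_tokens}"
            )
        self._running_total += requested_tokens
        return requested_tokens  # stand-in for minting a BudgetToken


def run_workflow(guard: LlmInvocationGuard, call_sizes: list[int]) -> int:
    """Return how many leaf calls were made before the breaker opened."""
    calls_made = 0
    for size in call_sizes:
        try:
            guard.precharge(size)
        except BudgetExceeded:
            break  # breaker open: no further calls in this workflow
        calls_made += 1  # the leaf call would happen here
    return calls_made
```

With a 250_000-token cap and repeated 30_000-token requests, the ninth precharge is refused because 270_000 exceeds the cap. A fresh guard for the next workflow starts from zero, which is the only reset in this sketch.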
Consequences
- LeafLlm.invoke cannot be called without a BudgetToken; mypy --strict enforces; test tests/unit/fallback/test_budget_guard.py::test_invoke_requires_token_typecheck asserts at type-check time.
- LlmInvocationGuard.precharge(requested_tokens) is the only mint path; precharge raises BudgetExceeded if running_total + requested > max.
- LlmInvocationGuard.reconcile(token, actual_in, actual_out, actual_dollars) updates running totals; idempotent on BudgetTokenId.
- BudgetExceeded raises return as RecipeApplication.Refused(reason=BUDGET_EXCEEDED) from FallbackTier.run; typed; consumed by Phase 5's HITL escalation.
- Phase 5's GateRunner consumes running_total() across retries; BudgetSnapshot is the projection shape.
- Phase 13's eventual cost ledger composes cost.llm.call (Phase 4) + cost.sandbox.run (Phase 5) + others; the event vocabulary is forward-compatible.
- Audit events: LeafKeyLoaded, LeafInvoked(prompt_digest_blake3), LeafReturned(response_digest_blake3, tokens_in, tokens_out, cache_read, cache_creation), BudgetReconciled, BudgetExceeded.
- The 250K / $1.50 cap is recorded as Phase 4's initial calibration in plugin.yaml; Phase 13 cost-ledger evidence drives subsequent tuning.
- No environment-variable escape (CODEGENIE_ANTHROPIC_KEY_CI rejected per critic [S] hidden assumption #2); the API key flows via keyring.get_password("codegenie", "anthropic_api_key") → SecretStr only.
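One way the precharge / reconcile pair could keep running totals while staying idempotent is sketched below. Several details are assumptions made for illustration: the `token_id` field (standing in for BudgetTokenId), the rule that reconcile swaps the precharged estimate for the actual usage, and the omission of cap checks and audit events. The real guard may account differently.

```python
from dataclasses import dataclass
from decimal import Decimal
from uuid import UUID, uuid4


@dataclass(frozen=True)
class BudgetToken:
    token_id: UUID  # assumption: stands in for BudgetTokenId
    precharged_tokens: int
    precharged_dollars: Decimal


class LlmInvocationGuard:
    def __init__(self) -> None:
        self.tokens_total = 0
        self.dollars_total = Decimal("0")
        self._reconciled: set[UUID] = set()

    def precharge(self, requested_tokens: int, estimated_dollars: Decimal) -> BudgetToken:
        # Cap checks omitted in this sketch; the estimate is reserved up front.
        self.tokens_total += requested_tokens
        self.dollars_total += estimated_dollars
        return BudgetToken(uuid4(), requested_tokens, estimated_dollars)

    def reconcile(
        self, token: BudgetToken, actual_in: int, actual_out: int, actual_dollars: Decimal
    ) -> None:
        if token.token_id in self._reconciled:
            return  # duplicate call: totals were already adjusted once
        self._reconciled.add(token.token_id)
        # Replace the reserved estimate with what the call actually used.
        self.tokens_total += (actual_in + actual_out) - token.precharged_tokens
        self.dollars_total += actual_dollars - token.precharged_dollars
```

Calling `reconcile` twice with the same token leaves the totals unchanged after the first call, which is the property the unit test named in the list is described as covering.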
Reversibility
Low. Removing the cap (defaulting max_tokens=∞) is a one-line config change but loses the runaway-spend guarantee. Changing LeafLlm.invoke to not require BudgetToken is a Protocol signature change consumed by every leaf adapter and Phase 5's retry envelope — high friction, requires phase amendment. Adjusting cap values is config (plugin.yaml); high reversibility there.
Evidence / sources
- ../final-design.md §Component 5 — LlmInvocationGuard
- ../final-design.md §Goal "Per-workflow hard budget cap"
- ../phase-arch-design.md §Component 5 — LlmInvocationGuard + BudgetToken
- ../phase-arch-design.md §Design patterns applied row 4 (Capability + Circuit Breaker)
- ../phase-arch-design.md §Anti-patterns avoided ("Capability passed through ten frames")
- ../critique.md §"Anti-patterns from the toolkit's flag on sight list"
- production ADR-0024 (cost observability)
- production ADR-0025 (per-workflow cost cap pattern; this ADR is the Phase 4 instance)