ADR-0025: Per-workflow cost cap as a hard guard¶
Status: Accepted Date: 2026-05-11 Tags: cost · safety Related: ADR-0024, ADR-0014
Context¶
Even with cost-favorable architecture (deterministic gather, recipe-first planning, knowledge-graph reuse), individual workflows can spiral. A worker that hits a hard problem may retry, fall through to LLM-from-scratch, retry again, escalate to error-triage, retry the new plan — token spend can grow non-linearly even with the 3-retry cap (ADR-0014) on individual nodes.
The retry cap bounds spend per node. Per-workflow caps bound spend across all nodes of one workflow. Both are needed.
Options considered¶
- No per-workflow cap. Trust the per-node retry cap. Acceptable if workflows are short; risky if they have many nodes.
- Soft cap (warning at 100%, no enforcement). Operators see overruns but can't stop them automatically. Better than nothing.
- Hard cap with override. 80% triggers a soft warning to operators; 100% triggers a hard halt unless an explicit
--allow-overrunannotation was set at workflow start. - Hard cap without override. No escape valve. Simpler; brittle on rare-but-legitimate cases.
Decision¶
Hard per-workflow cost cap with explicit override:
- Each workflow declares a token + compute budget at start. Defaults differ by task class (vulnerability patches get a higher cap; convenience migrations get a lower cap).
- Temporal workflow state tracks cumulative spend deterministically across all Activities.
- At 80% of cap: soft warning emitted to operators; workflow continues.
- At 100% of cap: Supervisor short-circuits via
interrupt(), logs the budget overrun, escalates to human review. - The
--allow-overrunannotation can be set on the workflow start signal for known-expensive cases (e.g., a multi-repo monorepo migration). Use is logged for audit.
Tradeoffs¶
| Gain | Cost |
|---|---|
| Hard upper bound on per-workflow cost — no silent runaway | Cap calibration is initially guesswork; needs tuning |
| Operators see warning at 80% — can intervene before the cap hits | False positives (legitimate work that hits the cap) escalate to humans, adding review load |
| Per-task-class defaults capture "vulnerability patches are worth more than convenience migrations" intuition | Cap-by-task-class adds configuration surface area |
| Override exists for rare-but-real expensive cases — escape valve prevents brittle behavior | Override usage must be policed; could become the lazy escape for poor estimates |
Consequences¶
- The Budget Enforcer is implemented as Temporal middleware that wraps Activity execution. After each Activity completes, it reads the ledger and either advances or short-circuits.
- The Supervisor is the agent of the short-circuit — it issues
interrupt()with a state annotation explaining the budget cause. - Cap values are deferred decisions per task class — initial defaults are conservative; ADR amendment paths cover retuning after production data.
- The 80% warning triggers an automated message to a
#codewizard-budget-warningschannel; the 100% halt opens an audit ticket. - This cap is independent of the per-node retry cap (ADR-0014); both can fire.
Reversibility¶
Low cost. Cap thresholds and defaults are configuration. The Budget Enforcer middleware can be disabled (in test environments) without code changes. The enforcement behavior (halt on overrun) is the load-bearing piece; the threshold is not.
Evidence / sources¶
../design.md §3.3(Per-workflow budget enforcement subsection)../design.md §5(Cost controls — Per-workflow budget cap)- ADR-0024 (cost observability commitment)
- ADR-0014 (per-node retry cap — complementary)