ADR-0027: Cost attribution model — mapping costs to workflows, repos, and task classes¶
Status: Accepted Date: 2026-05-11 Tags: cost · attribution Related: ADR-0024, ADR-0025, ADR-0026
Context¶
Cost observability (ADR-0024) is only useful if costs can be attributed to specific workflows, repos, and task classes. The attribution model determines which questions are answerable from the cost ledger:
- "How much did this migration cost?"
- "Which repos are most expensive to migrate?"
- "What task class has the worst cost-per-PR ratio?"
- "Did the knowledge-graph reuse reduce per-migration cost over time?"
Attribution gets thorny when stages share resources. The Probe Coordinator runs continuously across all watched repos (ADR-0006); how is its cost attributed? A microVM warmed for one workflow may be reused for another; how is the warm-up amortized? Shared infrastructure (Temporal cluster, MCP servers) has fixed cost that has to be allocated.
Options considered¶
- Workflow-local attribution only. Each workflow pays for what it directly causes. Shared / continuous work (gather, infra) is unattributed overhead. Simple; underestimates true cost per workflow.
- Full marginal attribution. Continuous gather costs are amortized across the workflows that consume them in a window; shared infra is allocated proportionally. Most accurate; complex.
- Tiered: direct + amortized + overhead. Direct costs (LLM calls, sandbox runs uniquely for one workflow) attributed exactly. Continuous costs (gather, pre-rendering) amortized across consuming workflows. Fixed overhead (Temporal cluster, MCP infra) tracked at portfolio level, not per workflow.
Decision¶
Tiered attribution model:
- Direct costs — attributed exactly to the workflow that triggered them:
- LLM API calls (tokens × per-token cost by model)
- Sandbox microVM runs (microVM-seconds × rate)
- Workflow-specific MCP reads beyond the pre-rendered hot views
-
Reviewer time on the workflow's PR (Stage 6)
-
Continuous costs — amortized across consuming workflows:
- Continuous gather (ADR-0006) cost is attributed per-gather, then divided across the workflows that read that gather's
RepoContextwithin its freshness window - Pre-rendered Redis hot views are similarly amortized
-
Knowledge graph indexing cost (write side) is amortized across reads in a window
-
Portfolio overhead — tracked at the platform level, not per workflow:
- Temporal cluster running cost
- Postgres / Redis / vector DB baseline cost
- MCP server pod baseline cost (compute, not query work)
- Observability stack
Portfolio overhead is reported separately on the dashboard as a fixed-cost line item. The two headline ratios (ADR-0026) use direct + amortized; overhead is shown alongside but not divided into them.
Tradeoffs¶
| Gain | Cost |
|---|---|
| Per-workflow cost reflects true marginal cost — comparing workflows is fair | Amortization windows need definition (default: rolling 24h for continuous gather; revisable) |
| Portfolio overhead is visible as fixed cost — leadership can see the "table stakes" line | Some costs (e.g., a probe that fired for one workflow but cached for 50) are messy to attribute |
| Knowledge-graph compounding savings (ADR-0011) show up as decreasing amortized cost per workflow over time | Cross-workflow attribution math means a single workflow's reported cost can change retroactively as its gather is reused |
| Three tiers map cleanly to the cost ledger schema | Edge cases (e.g., a sandbox warmed for workflow A reused by workflow B before cooling) need decision rules |
Consequences¶
- The cost ledger schema has three classes of entries:
direct,amortized,overhead. - Amortized entries are written initially with a "pending" attribution; finalized once the amortization window closes.
- The dashboard surfaces all three: per-workflow direct + amortized for the "cost per merged PR" ratio; overhead as a separate fixed-cost line.
- Compounding savings via the knowledge graph (ADR-0011) become measurable: as more solved examples accumulate, per-workflow amortized RAG retrieval cost goes up (more storage) but per-workflow LLM cost goes down (more few-shot hits). Net per-workflow cost should trend down over time — this is a key ROI signal.
- Per-repo cost rolls up from per-workflow cost: same workflow-id keys, grouped by repo.
- Per-task-class cost rolls up similarly: useful for tuning per-task-class caps (ADR-0025).
Reversibility¶
Medium. Adjusting the amortization rules (e.g., changing the gather amortization window) is config; the cost ledger's reported numbers shift accordingly. Adding a new tier (e.g., "shared between specific workflow pairs") is a schema change with backfill implications.
Evidence / sources¶
../design.md §3.3(Cost telemetry instrumentation table — aggregation keys per source)../design.md §8.10(Cost view — ledger as central aggregation)- ADR-0024 (cost observability commitment)
- ADR-0026 (KPI model — depends on attribution being well-defined)