ADR-0034: Event sourcing as canonical primitive for agent runs¶

Status: Accepted Date: 2026-05-13 Tags: event-sourcing · audit · replay · observability · learning Related: ADR-0003, ADR-0024, ADR-0026, ADR-0027, ADR-0029, ADR-0033

Context¶

Multiple production concerns will each need an append-only log of "what happened during this workflow":

Cost ledger (ADR-0024) — already append-only by design
Stage 7 Learning — needs an event stream to project into the knowledge graph
Workflow audit trail — needs replay capability for security and compliance
Trust gate decisions — need full provenance (which evidence, which adapters, which outcomes)
Plugin resolution (ADR-0031) — needs an audit trail (which plugin matched, which extends chain, was the fallback used)
Adapter degradation (ADR-0032) — needs a visible history (when did SCIP go stale, when did we fall back to tree-sitter)
ROI dashboard (ADR-0026) — aggregates outcomes, and outcomes are events

Each of these concerns could implement its own append-only log. That would mean 6+ distinct storage layers, 6+ schemas, 6+ query patterns, and 6+ chances to get the audit story wrong. Worse, cross-cutting queries (e.g., "show me every workflow that hit the universal fallback plugin in the last 30 days") become 6-way joins across heterogeneous stores.

Two layers in the architecture already provide event sourcing without us choosing it explicitly:

Temporal workflow history (ADR-0003) — every workflow has a complete event log natively; replay is a first-class feature
LangGraph checkpointer (ADR-0016) — every state machine transition is checkpointed; state reconstruction is replay

The architecture is already half-event-sourced. The choice in this ADR is whether to make event sourcing the canonical primitive — a single typed event log that every concern projects from — or to let each concern grow its own structure.

This ADR depends on ADR-0033 (domain modeling discipline). Typed events are dramatically more valuable than untyped events; without that discipline, the event stream is a soup of unstructured payloads that consumers re-parse defensively at every boundary.

Options considered¶

Option A — each concern implements its own append-only log. Cost ledger has its own table; Stage 7 Learning has its own stream; audit trail has its own. Each storage optimized for its access pattern. Most flexible; least cohesive; cross-cutting queries painful.
Option B — single canonical event log; every concern is a projection. One typed event stream, multiple projections (cost ledger, KG, ROI dashboard, audit trail). High cohesion; some access-pattern compromises; one storage layer to operate.
Option C — hybrid. Temporal handles workflow-internal events natively (state transitions, retries, gate decisions); a typed side-channel event log in Postgres handles workflow-spanning events (cost rollups, portfolio-level signals, KG writes). Projections materialize from both sources.

Decision¶

Adopt Option C — hybrid event sourcing with Temporal workflow history as the workflow-internal event store, plus a typed side-channel Postgres event log for workflow-spanning concerns.

Event types¶

All events are well-typed Pydantic models, respecting ADR-0033 discipline. Every event has a common envelope:

event_id: EventId (newtype on UUID)
event_type: EventType (sum-type discriminator)
workflow_id: WorkflowId | None (workflow-scoped events have one; portfolio events don't)
timestamp: datetime (UTC, monotonic where Temporal supplies it)
payload: EventPayload (tagged-union variant typed by event_type)
correlation_id: CorrelationId | None (for tracing chains across workflows)

Illustrative event variants (the full catalog grows phase-by-phase):

class PluginResolved(BaseModel):
    kind: Literal["plugin_resolved"] = "plugin_resolved"
    plugin_id: PluginId
    extends_chain: list[PluginId]
    matched_scope: PluginScope
    fallback_used: bool

class AdapterDegraded(BaseModel):
    kind: Literal["adapter_degraded"] = "adapter_degraded"
    primitive: PrimitiveName
    primary_adapter: AdapterId
    primary_confidence: float
    fallback_adapter: AdapterId | None

class TrustGatePassed(BaseModel):
    kind: Literal["trust_gate_passed"] = "trust_gate_passed"
    gate: GateId
    signals: dict[SignalKind, SignalValue]
    score: TrustScore

class TrustGateFailed(BaseModel):
    kind: Literal["trust_gate_failed"] = "trust_gate_failed"
    gate: GateId
    failing_signals: list[SignalKind]
    score: TrustScore
    retry_count: int

class CostIncurred(BaseModel):
    kind: Literal["cost_incurred"] = "cost_incurred"
    tier: CostTier               # direct | amortized | overhead
    amount_usd: Decimal
    source: CostSource

class MergeOutcome(BaseModel):
    kind: Literal["merge_outcome"] = "merge_outcome"
    pr_url: str
    decision: Literal["merged", "closed", "modified"]
    reviewer: str | None

Workflow state at any time = fold(events.filter(workflow_id=X)). The fold function is pure and exhaustively handles every event variant (enforced by mypy --strict + assert_never per ADR-0033).

Projections (consumers)¶

Each consumer of the event stream is a projection. The same event can feed multiple projections; projections are idempotent — running one twice produces the same materialized state.

Projection	Reads	Materializes
Cost ledger (ADR-0024, ADR-0027)	`CostIncurred` events	ledger rows by `(workflow, tier, source)`
ROI dashboard (ADR-0026)	`CostIncurred` + `MergeOutcome` events	headline ratios + diagnostics
Stage 7 Learning	`SolutionFound` + `AttemptCompleted` events	KG write-back
Audit trail	all events filtered by `workflow_id`	chronological event log per workflow
Plugin telemetry	`PluginResolved` + `MergeOutcome` events	per-plugin merge rate, cost/PR, fallback rate
Trust gate observability	`TrustGate*` events	retry-cause histograms, score distributions

Each projection is independently testable: given a fixture event stream, assert the projection's output. This is the test pattern for the entire observability surface — no need for end-to-end workflow runs to test that the cost ledger or ROI dashboard works.

Storage¶

Scope	Storage	Source of truth
Workflow-internal (state transitions, retries, gate decisions)	Temporal workflow history	Temporal cluster (ADR-0003)
Workflow-spanning (cost rollups, KG writes, portfolio-level signals)	Postgres event log (`events` table)	App-managed, retention policy mirroring Temporal's
Materialized views (cost ledger, ROI, KG, …)	Postgres / Redis	Derived from the above two — not source of truth

Projections subscribe to both sources. For workflow-scoped projections, Temporal's history-stream API is the input. For workflow-spanning projections, the Postgres event log is the input. For projections needing both (e.g., the audit trail rendering everything chronologically per workflow), the two streams are merged on timestamp.

Replay¶

Given a workflow_id:

Temporal replay reconstructs workflow-internal state via Temporal's native replay
Side-channel events filtered by workflow_id reconstruct workflow-spanning artifacts (cost ledger entries, KG writes, plugin-resolution decisions)
Merging the two on timestamp gives a complete chronological audit view

Replay is also a test primitive: every workflow's stored event history can be replayed in CI to verify the system reaches the same final state. Nondeterminism bugs that would otherwise reach production are caught at this layer.

Tradeoffs¶

Gain	Cost
6+ ADRs (cost, ROI, learning, audit, replay, plugin telemetry) share one storage primitive	Event schema discipline is required; depends on the ADR-0033 typed-events foundation
Replay-driven debugging — point at any workflow, get the full history, project any state	Storage cost grows with retention window; eventually need snapshots for long-running workflows (out of scope for v1)
Projections independently testable from fixture event streams — no end-to-end workflow runs needed for observability tests	Read patterns can be slower for ad-hoc queries that don't match any projection — design pressure to anticipate access patterns and materialize them
New observability features become projections — no new storage layer per feature	More design effort upfront defining events and discriminators
Cross-workflow analytics work naturally — same query language across the event log	Workflow-internal Temporal events and workflow-spanning Postgres events live in two stores — projection logic spans both
Trust gate "why did it decide that?" is a query, not a recovery exercise — full evidence is in the event payload	The temptation to put everything in events must be resisted; the event log is for decisions and outcomes, not for general state mutation

Consequences¶

Phases 0–8 use ad-hoc append-only structures where they need them (attempt logs from phase-story-executor, draft cost ledgers, etc.). No retroactive disruption.
Phase 9 (Temporal) formalizes the canonical event log. Temporal workflow history is the workflow-scoped substrate; the Postgres side-channel event log is added in Phase 9 (or Phase 13 alongside the cost-ledger formalization — whichever comes first).
ADR-0024 (cost observability) projection. Cost ledger becomes fold(CostIncurred events). Migration is straightforward because the existing draft format is already append-only.
ADR-0026 (ROI KPIs) projection. Headline ratios + supporting metrics derive from event-stream folds; the dashboard reads materialized projections.
Stage 7 Learning projection. KG writes derive from SolutionFound + AttemptCompleted event streams. The KG itself doesn't need to be append-only — only the events feeding it do.
Plugin telemetry projection. Per-plugin merge rate, fallback rate, and ROI all derive from PluginResolved + MergeOutcome events.
Trust gate audit becomes free. Every TrustGatePassed / TrustGateFailed event captures the full signal set. "Why did the gate decide what it decided?" is a SELECT * FROM events WHERE event_type IN (...) AND workflow_id = ? query.
Domain modeling discipline (ADR-0033) becomes load-bearing. Without typed events, the event stream is a soup. With typed events, every consumer pattern-matches exhaustively and the type checker enforces handling of every event variant.
Schema evolution discipline required. Adding a new event variant is non-breaking; renaming or removing fields requires a migration window and a record in this ADR's evidence section.

Reversibility¶

Medium. Removing event sourcing as the canonical primitive would mean each projection migrating to its own storage with its own schema — feasible but loses cohesion, replay capability, and cross-cutting analytics. Reverse migration (re-introducing event sourcing after removal) would need to backfill events from the per-concern storage layers — possible but lossy. The compounding benefits (cheap-to-add projections, replay debugging) accrue over time; removing event sourcing after Phase 11 (when Stage 7 Learning is live) would be expensive.

Evidence / sources¶

Greg Young, "Event Sourcing" — https://eventstore.com/blog/what-is-event-sourcing
Martin Fowler, "Event Sourcing" — https://martinfowler.com/eaaDev/EventSourcing.html
Pat Helland, "Immutability Changes Everything" — CIDR 2015
Temporal docs, workflow history and replay — https://docs.temporal.io/encyclopedia/event-history
ADR-0003 — Temporal as workflow substrate (Phase 9 anchor)
ADR-0024 — Cost observability end-to-end
ADR-0033 — Domain modeling discipline (typed-events foundation; this ADR depends on it)
../../reviews/2026-05-18-research-committee-search-paper.md — external evidence: an append-only event log of (proposal, evidence, validator_outcome, trust_outcome) is the audit-anchor schema future critic training would consume; this ADR's append-only discipline makes that option cheap to preserve