Skip to content

ADR-0037: Layered analysis funnel — SCIP for gather, type-checkers for verification, LSP reserved for interactive loops

Status: Accepted Date: 2026-05-18 Tags: gather · verification · scip · type-checker · lsp · cost · phase-boundaries Related: ADR-0005, ADR-0006, ADR-0008, ADR-0030, ADR-0031, ADR-0032

Context

The architecture answers two recurring "what depends on X" / "is X correct" questions across the pipeline using three distinct substrates: tree-sitter (fast, syntactic, language-agnostic), SCIP indexers (precise, semantic, offline-serialized), and — speculatively — Language Server Protocol clients (precise, semantic, interactive). ADR-0030 and ADR-0032 committed to tree-sitter + SCIP as the gather-pipeline substrates and named the four adapter primitives (dep_graph.consumers, import_graph.reverse_lookup, scip.refs, test_inventory.tests_exercising). Neither ADR explicitly addressed where LSP fits, leaving an attractive but architecturally hazardous slope open: "pyright as a probe," "tsserver as a context provider," "rust-analyzer wired into the gather coordinator."

The forces pushing toward LSP are real: it answers semantic questions with editor-grade precision, it has the broadest language coverage of any single substrate, and the agent-coding-tool ecosystem has converged on tree-sitter + LSP as the dominant hybrid. The forces pushing against LSP-in-gather are also real and load-bearing for this architecture specifically: LSP is a stateful long-lived stdio server with a 10s–5min per-language cold-start tax, GB-scale memory footprints (jdtls 4–16 GB; rust-analyzer 1–4 GB), no content-hash identity for cache invalidation (so B2 IndexHealthProbe can't fingerprint its state), and a typical dependency on network egress to resolve packages — which fights Phase 3 ADR-0006's NetworkPolicy = DenyAll default. LSP is the right shape for an interactive editor; it is the wrong shape for the batch, content-addressed, replayable gather pipeline this architecture commits to (ADR-0005, ADR-0006, ADR-0034).

The same question — "did my edit introduce a type error?" — also surfaces in the validation loop after a transform is applied (Phase 5+ Trust-Aware gates per ADR-0008). Here the temptation to reach for LSP is strongest, but a cheaper, structurally simpler substrate exists for the same signal: one-shot type-checker subprocess invocations (tsc --noEmit, mypy --strict, cargo check, mvn compile -DskipTests, go build ./...). One-shot type-checkers produce the identical "zero-new-type-errors" TrustSignal as an LSP would, at a fraction of the architectural cost — no stateful server, no warmup tax, no new component type, fits the existing SubprocessJail + ALLOWED_BINARIES pattern with a one-line amendment per language.

This ADR pins the three-substrate funnel explicitly so future contributors do not silently introduce LSP into the gather pipeline or duplicate the type-checker signal across phases.

Options considered

  • Option A — leave the decision implicit. ADR-0030 + ADR-0032 already commit to SCIP for gather; the absence of an LSP commitment is itself a commitment. Risk: future contributors read "we use tree-sitter + a semantic layer" as "tree-sitter + LSP" because that is the dominant idiom outside this project. The implicit decision rots quietly until someone lands pyright as a probe and the cold-start tax + memory footprint surface as Phase 14 production incidents.
  • Option B — admit LSP into the gather pipeline as a third probe substrate. Treat LSP as a long-lived per-repo process backing a new lsp.* query family. Reaches editor-grade precision for "find references" / "go to definition" / "type at expression." Cost: a 10s–5min cold-start tax on every workflow that touches the probe; GB-scale memory per active workflow; new MCP-style infrastructure to manage server lifecycle; cache-invalidation story that doesn't compose with the content-addressed model. Effectively re-introduces the interactive-editor failure mode the gather pipeline was designed to avoid.
  • Option C — explicit three-substrate funnel. Tree-sitter for the syntactic 70%; SCIP for offline semantic precision in gather; one-shot type-checkers as a verification TrustSignal in Phase 5+. LSP is deferred as a per-workflow interactive-loop substrate, available only in two phase-bounded slots: Phase 15 (agentic recipe authoring's tight edit-verify inner loop) and a possible Phase 14 MCP Language MCP server if and only if Phases 4/15 evidence the warm-pool economics. Adopting this option requires no source changes — it commits the boundary as a written rule, gives TrustScorer a typed signal slot it can opportunistically register, and gives the roadmap two explicit "consider LSP here" markers.

Decision

We adopt Option C. The codewizard-sherpa pipeline uses three analysis substrates with crisp role boundaries:

  1. Tree-sitter is the fast structural layer for the gather pipeline — used for repo mapping, conventions, marker scanning, Layer D/F/G probes, and any "looks like X" query across mixed-language or partially-broken files.
  2. SCIP (via scip-typescript, scip-java, scip-python, scip-go, …) is the canonical semantic-precision substrate for the gather pipeline. The four ADR-0032 language search adapters (dep_graph.consumers, import_graph.reverse_lookup, scip.refs, test_inventory.tests_exercising) wrap SCIP indexer outputs. LSP is forbidden in the gather pipeline.
  3. One-shot type-checker subprocesses are the canonical "did my edit introduce a new type error?" TrustSignal in Phase 5+ validation gates. They are registered through the @register_signal_kind open registry committed in Phase 3, surface as one SignalKind per language (typecheck.typescript, typecheck.python, typecheck.rust, typecheck.java, typecheck.go), and reach the TrustScorer through the same strict-AND fold as build / install / tests / lockfile_policy / cve_delta. They run inside SubprocessJail with the same ALLOWED_BINARIES discipline; each language binary added requires a one-line amendment ADR per Phase 3 ADR-0012's pattern.

LSP is deferred to two phase-bounded slots, and only those two:

  • Phase 15 (agentic recipe authoring) as a per-workflow interactive substrate for the tight edit-verify inner loop where the leaf agent generates and revises recipe code; the warmup tax amortizes across many per-workflow edits.
  • Phase 14 (continuous gather + MCP servers operationalized), conditionally — a centralized Language MCP server joining the Context/Skills/KG/Policy quartet may be admitted only if Phases 4 and 15 produce evidence that warm-pool LSP economics beat the one-shot type-checker baseline at portfolio scale. Until that evidence exists, the MCP topology (ADR-0023) does not include a Language server.

Phase 3's design and ADRs are not modified by this ADR — Phase 3 ships vuln remediation (≥90% lockfile-only edits) where the type-checker signal adds little value to the existing build + tests + lockfile-policy gate set. The signal becomes load-bearing first in Phase 4 (LLM fallback for major-version bumps with call-site rewrites), and that's where the first typecheck.* SignalKind registration lands.

Tradeoffs

Gain Cost
Prevents future contributors from silently introducing LSP into the gather pipeline (the slope is real — every coding-agent essay published in 2025–26 recommends LSP). Documenting restraint adds an ADR that future readers may treat as "obvious" and skim. The restraint is non-obvious precisely because the broader industry has converged elsewhere.
One-shot type-checkers reuse the existing SubprocessJail + ALLOWED_BINARIES + @register_signal_kind machinery. Zero new component types; one-line ADR amendments per language. One-shot type-checkers re-pay the type-checker startup cost on every workflow that needs the signal. For Python (mypy) and TypeScript (tsc --noEmit) this is 1–10s — acceptable. For Java (javac against a Maven multi-module project) and Rust (cargo check against a fresh target dir) this is 30s–5min — borderline; phase-design decides whether to admit it as a per-workflow signal or only as a nightly bench signal.
Pinning SCIP as the canonical gather-pipeline semantic substrate makes the deferred-LSP commitment concrete rather than aspirational. The scip.* adapter slot (ADR-0032) is already wired; this ADR formalizes that no parallel lsp.* slot will compete. SCIP indexer coverage is uneven (Go OK, Ruby/Kotlin/Swift weaker, scip-clang recent and rough). The first task class that needs precision in a poorly-covered SCIP language will hit the gap. The mitigation is AdapterConfidence.Degraded (Phase 3 ADR-0010), not LSP escalation.
Phase 4's design has a clear input — "the typecheck SignalKind lands here" — without forcing this ADR to pre-design Phase 4. The two-phase deferral (LSP in Phases 14/15) reads as a strong commitment but is contingent on Phase-4/15 evidence. Future readers must treat the deferral as conditional, not a promise.
The cost story scales correctly: tree-sitter (cents per portfolio), SCIP (dollars per portfolio), one-shot type-check (tens of dollars per portfolio per day at 1000 workflows), LSP (deferred until warm-pool amortization is proven). Each layer's spend is bounded before the next layer is admitted. LSP-in-Phase-15 still requires the ALLOWED_BINARIES amendments and a NetworkPolicy = RegistryAllowlist carve-out for dep resolution. The deferred decision does not eliminate that work; it just delays it until the value is evidenced.
The funnel composes with the existing TrustScorer strict-AND model (ADR-0008) — a failing type-check is a single objective signal among many, not a new control-flow primitive. The Phase-3-frozen TrustScorer(event_log) constructor injection (Phase 3 ADR-0005) needs no signature change. A failing type-check is binary (zero new errors / one or more new errors); fine-grained per-file or per-symbol scoring isn't captured. If Phase 12's deep-validation work wants per-file granularity, a follow-up ADR widens the typecheck.* payload — not the contract.

Consequences

  • Gather pipeline closes its substrate set. Tree-sitter and SCIP are the only semantic substrates admitted to Phases 0–3 and to ongoing gather work in Phases 14+. Any PR that imports an LSP client library into src/codegenie/probes/ or src/codegenie/plugins/ is rejected at review; a parallel import-linter contract may be added in Phase 14 if drift is observed.
  • Phase 4 design gets a concrete input. When roadmap-phase-designer runs against Phase 4, it inherits "first language typecheck.* SignalKind registers here" as a goal-criterion candidate. The signal's port shape (SubprocessJail invocation + parse stderr/stdout + emit TrustSignal) is fully specified by Phase 3 — Phase 4 adds the per-language adapter, not the framework.
  • Phase 14 MCP topology decision sharpens. ADR-0023 (MCP server topology) currently lists Context / Skills / KG / Policy as the quartet. This ADR adds a conditional fifth — Language MCP server — admissible only if Phases 4 and 15 produce evidence that warm-pool LSP economics beat per-workflow one-shot type-checker invocations at portfolio scale. The conditionality is the load-bearing part; admission is not automatic.
  • Phase 15 design gets a concrete input. When roadmap-phase-designer runs against Phase 15, "introduce per-workflow LSP-in-jail for the recipe-authoring edit-verify inner loop" is on the menu as a default rather than a research question.
  • ALLOWED_BINARIES amendments become routine and bounded. Each new task-class plugin that wants a typecheck.<language> signal lands one ADR amendment adding that language's type-checker binary. This is the same shape as Phase 3's npm/bwrap/sandbox-exec/jq amendment — auditable, narrow, ADR-anchored.
  • The AdapterConfidence.Degraded escape valve is now the canonical answer to SCIP coverage gaps, not LSP escalation. If scip-clang produces low-quality output for a C++ task class in some future phase, the adapter reports Degraded and the TCCM either falls back to tree-sitter-only or refuses; introducing clangd-as-LSP-probe is explicitly out of the playbook.
  • The "score, don't enumerate" framing for blast-radius analysis is left open as a future TCCM query type per task class (vuln-remediation weights call-sites + test coverage; distroless weights egress + shell deltas). This ADR does not pre-introduce a risk_score adapter — the premature-pluggability anti-pattern from Phase 3's pattern catalog applies.
  • One unflattering invariant becomes explicit. A failing type-check is not a recovery signal — it's a refuse-and-escalate signal under the three-retry default (ADR-0014). LSP would not change that recovery shape; it would just produce the same signal faster. This is part of why LSP doesn't earn its keep until the leaf-agent inner loop becomes the dominant cost (Phase 15).

Reversibility

Medium. Reversing this ADR — admitting LSP into the gather pipeline — requires three concrete changes: (a) amending or superseding this ADR with documented evidence that the cold-start tax / memory footprint / cache-invalidation gap is acceptable at portfolio scale; (b) extending ADR-0032 with an lsp.* adapter slot parallel to scip.*; (c) extending Phase 5's sandbox port and ALLOWED_BINARIES to admit LSP-server binaries with appropriate NetworkPolicy carve-outs. None of this is technically impossible, but the architectural ripple is large enough that it is unlikely to happen accidentally — which is exactly the bar this ADR is trying to set. The narrower reverse (admitting LSP to Phase 14's MCP topology) is much cheaper: it's a single new ADR consuming Phases 4/15 evidence, written when that evidence exists.

Evidence / sources

  • ADR-0005 — the deterministic-only gather invariant this ADR composes with.
  • ADR-0008 — the strict-AND objective-signal model that absorbs typecheck.* as one signal among many.
  • ADR-0030 — the query-primitive interfaces SCIP backs in gather.
  • ADR-0031 — the plugin bundle shape that owns per-language typecheck.* adapter registration.
  • ADR-0032 — the scip.* adapter slot this ADR pins as the canonical gather-pipeline semantic substrate (no parallel lsp.* slot).
  • ADR-0034 — the event-sourcing model that the typecheck.* signal emits into.
  • Phase 3 ADR-0006 — the SubprocessJail port one-shot type-checkers reuse without modification.
  • Phase 3 ADR-0012 — the ALLOWED_BINARIES amendment pattern future typecheck.<language> adoptions follow.
  • Phase 3 story S6-02 — the @register_signal_kind open registry where Phase-4 typecheck.* registrations land.
  • SCIP coverage matrix — https://github.com/sourcegraph/scip (consulted 2026-05).
  • External essay framing (provided to architect, paraphrased): the tree-sitter-fast / LSP-precise hybrid is the dominant idiom in editor-grade coding agents. This ADR diverges deliberately because the gather pipeline is not an editor; SCIP fills the precision role with the right time-shape for batch work.