ADR-0011: Tree-sitter grammars via PyPI wheels (supersedes vendored .so model)
Status: Accepted
Date: 2026-05-17
Supersedes: 02-ADR-0002
Tags: dependency-policy · supply-chain · parser · cross-platform · amendment
Related: Phase 1 ADR-0009, 02-ADR-0001
Context
02-ADR-0002 adopted vendored grammar binaries under tools/grammars/{language}.so with BLAKE3 content pins in tools/grammars.lock. The model was "reviewed-as-code": each grammar bump was a binary diff in the PR, reviewers compared the BLAKE3 digest to upstream, and the loader recomputed BLAKE3 at runtime and refused to load on mismatch.
Empirical findings on the road to S4-04 and S4-06 forced a re-evaluation:
- Cross-platform tax was never paid. The repo shipped only Linux x86_64 .so stubs: no macOS .dylib, no Linux aarch64, no Windows. Contributors on macOS could not run S4-04 or NodeReflectionProbe; CI on multiple platforms is a future requirement that the current shape cannot satisfy.
- The vendored binaries on master are 68-byte placeholder stubs. Real grammars are 250–500 KiB. S4-04 hit BLOCKED (2026-05-16) and S4-06's NodeReflectionProbe hit the same blocker the next day. Neither story can ship without real binaries. Producing real binaries requires a build chain (Node.js + tree-sitter CLI + C compiler) that is contributor-specific and per-platform.
- "Reviewed-as-code" is theoretical for binary diffs. A human reviewer cannot meaningfully diff a .so file. The actual review is "does the BLAKE3 match what upstream signed?", which is exactly the trust-the-maintainer model PyPI provides, routed through manual hash bumps.
- The Python tree_sitter API has moved. The Language(path, name) shape 02-ADR-0002 designed around is deprecated upstream. Modern usage is Language(<PyCapsule>), where the capsule comes from a sibling PyPI package (tree_sitter_typescript.language_typescript()).
- Maintenance cost grows linearly. Phase 8+ adds Python and Java grammars; each new language × each supported platform compounds the binary-vendoring matrix. PyPI scales by pip install <new-grammar>; the wheel matrix is the maintainer's problem, not ours.
- tools/grammars.lock is a re-implementation of pip --require-hashes. Both pin specific bytes by hash; only one is a one-off reinvention. Aligning with the ecosystem standard is a long-term simplification.
The named-trigger amendment 02-ADR-0002 made to Phase 1 ADR-0009 (the C-extension dep policy) is not in dispute — tree-sitter is still the single named-trigger exception. This ADR amends only how grammars are delivered, not whether tree-sitter itself is allowed.
Options considered
- Option A: stay with vendored .so files and refill the stubs with real binaries. Honors 02-ADR-0002 literally. Defers but does not solve: cross-platform matrix, ABI churn, per-grammar-bump build-chain dependency, deprecated Python API. Each new language compounds the work.
- Option B: adopt PyPI grammar wheels (tree-sitter-typescript, tree-sitter-javascript, future tree-sitter-python, tree-sitter-java). Wheels ship per-platform, maintained by the upstream tree-sitter org. pip --require-hashes in CI preserves the supply-chain-pin property at the wheel level. ABI compatibility is the wheel maintainer's responsibility. New language = new dep line in pyproject.toml.
- Option C: hybrid, with PyPI in dev and vendored binaries in prod / service. Doubles the cache-key strategies, doubles the test matrix, doubles the failure modes. Offers no clear benefit that either pure path lacks.
- Option D: descope NodeReflectionProbe + B3 import-graph to a pure-Python regex tokenizer. Avoids tree-sitter entirely. Rejected: 02-ADR-0002's Option A analysis already documented why regex tokenization is grammar-inaccurate; that finding still holds.
Decision
Adopt Option B. Tree-sitter grammars are sourced from PyPI as tree-sitter-typescript, tree-sitter-javascript, and (Phase 8+) tree-sitter-python, tree-sitter-java. The tree-sitter runtime is upgraded to >=0.23,<0.26 to match the modern Language(<PyCapsule>) API.
tools/grammars/, tools/grammars.lock, and tools/regenerate_grammars_lock.sh are deleted. The codegenie.grammars kernel's surface narrows to a single function language_for(name) -> tree_sitter.Language that imports the matching PyPI package and returns the Language object. GrammarLoadRefused is retained as the exception type — consumers catch it on ImportError (the package is missing from the closure) or on an unknown language name.
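The narrowed kernel surface described above could look roughly like the following. This is a minimal sketch, not the actual codegenie.grammars module: the dispatch table name _GRAMMARS and the import_module parameter (included only so the function can be exercised without the wheels installed) are assumptions. The factory names language_typescript() and language() are the ones the upstream tree_sitter_typescript and tree_sitter_javascript packages expose.

```python
import importlib
from typing import Any, Callable


class GrammarLoadRefused(Exception):
    """Raised when a grammar cannot be provided for the requested language."""


# Hypothetical dispatch table: language name -> (PyPI module, capsule factory).
_GRAMMARS: dict[str, tuple[str, str]] = {
    "typescript": ("tree_sitter_typescript", "language_typescript"),
    "javascript": ("tree_sitter_javascript", "language"),
}


def language_for(
    name: str,
    *,
    import_module: Callable[[str], Any] = importlib.import_module,
) -> Any:
    """Return a tree_sitter.Language for `name`, or raise GrammarLoadRefused."""
    if name not in _GRAMMARS:
        raise GrammarLoadRefused(f"unknown language: {name!r}")
    module_name, factory_name = _GRAMMARS[name]
    try:
        # Imported lazily so a missing wheel surfaces as GrammarLoadRefused.
        runtime = import_module("tree_sitter")
        grammar = import_module(module_name)
    except ImportError as exc:
        raise GrammarLoadRefused(f"cannot import grammar for {name!r}: {exc}") from exc
    # The grammar package hands back a PyCapsule; the runtime wraps it.
    capsule = getattr(grammar, factory_name)()
    return runtime.Language(capsule)
```

Under this shape, both failure modes named in the decision (package missing from the closure, unknown language name) reach consumers as the same exception type.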
Supply-chain pinning moves from BLAKE3-of-.so to pip --require-hashes against the wheel SHA256 (Phase 0 ADR-0006's pinned-dep discipline applies uniformly). The pin is expressed in the pyproject.toml lower bound + the lockfile pip ultimately consumes (Phase 0 + Phase 1 already encode pip-lockfile discipline; this ADR does not extend it).
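To make the equivalence with the old BLAKE3 pin concrete, the check that hash-pinning performs amounts to comparing a file's SHA256 digest against a set of allowed sha256:<hex> values. The sketch below illustrates that idea with the standard library only; sha256_of and matches_pin are hypothetical helper names, and this is not pip's own code.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file in 1 MiB chunks and return its SHA256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: Path, pinned: set[str]) -> bool:
    """True if the file's digest is among the pinned 'sha256:<hex>' values."""
    return f"sha256:{sha256_of(path)}" in pinned
```

A wheel whose bytes differ from every pinned digest fails the check, which is the same "refuse on mismatch" property the runtime BLAKE3 verifier provided, enforced at install time instead of load time.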
pip-audit and osv-scanner already watch the entire installed runtime closure, so the new wheels gain CVE-feed coverage automatically — no new tooling.
Tradeoffs
| Gain | Cost |
|---|---|
| Cross-platform out of the box — macOS Intel/Apple Silicon, Linux x86_64/aarch64, Windows wheels all ship from upstream | Trust shifts from "we built it" to "tree-sitter org built it and signed the wheel" — generally net-better for the threat model, but a deliberate shift |
| Two stories (S4-04 + S4-06 NodeReflection) unblock with a single ADR amendment + one kernel refactor | One-time migration cost: ADR amendment, kernel refactor, story status updates, deletion of vendoring infrastructure |
| Modern Language(<PyCapsule>) API: the deprecated Language(path, name) shape becomes irrelevant; future tree-sitter upgrades don't break us | The new API needs tree-sitter>=0.23; older grammar wheels that bundle their own C extension may pin lower. Verified: upstream tree-sitter-typescript@0.23.2 + tree-sitter-javascript@0.25.0 co-install with tree-sitter@0.23.x cleanly |
| Maintenance burden scales by pip install <new-grammar>; the Phase 8+ Python grammar adds one dep line | Wheels are larger than .so files (~1–3 MB each) due to the bundled C extension; runtime closure size grows ~5 MB for ts+js |
| tools/grammars.lock BLAKE3 model collapses into pip's existing hash-pinning: same supply-chain property, one mechanism instead of two | Pre-existing tests that exercised the BLAKE3 verifier (~9 tests across tests/unit/grammars/ and tests/unit/tools/) are deleted; replaced by a smaller "does the kernel return a Language" surface |
| Reviewed-as-code property preserved at a higher level; pip --require-hashes is the ecosystem-standard expression of the same pin | A malicious upstream maintainer of tree-sitter-typescript could publish a compromised wheel; same threat model as every other pinned dep, mitigated by the same pip hash machinery |
Pattern fit
Pattern: Ports & Adapters / Dependency Inversion (composes with design-patterns-toolkit.md §"Hexagonal architecture"). The kernel (codegenie.grammars.language_for) is the port; the PyPI grammar packages are the adapters. The probe code never imports tree_sitter_typescript directly: it calls language_for("typescript") and the kernel does the dispatch. Adding Python in Phase 8+ is one new branch in the kernel's dispatch table, with zero edits to NodeReflectionProbe.
This is also a strict simplification of 02-ADR-0002's pattern. The old shape encoded supply-chain trust at TWO layers (a hand-rolled BLAKE3 verifier + the PyPI installer for py-tree-sitter itself). The new shape encodes it at ONE layer (pip hash-pinning). Functional core / imperative shell holds: language_for is pure-ish (it imports a module and constructs an object); consumers compose it.
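The consumer side of the port can be illustrated as follows. This is a sketch under stated assumptions: ProbeSketch stands in for a probe such as NodeReflectionProbe, and LanguageProvider and make_kernel are hypothetical names, not identifiers from the codebase. The point is only that the probe depends on a callable, so extending the dispatch table leaves the probe untouched.

```python
from dataclasses import dataclass
from typing import Any, Callable

# The port: anything that maps a language name to a Language-like object.
LanguageProvider = Callable[[str], Any]


def make_kernel(adapters: dict[str, Callable[[], Any]]) -> LanguageProvider:
    """Build a provider from a dispatch table of adapter factories."""
    def language_for(name: str) -> Any:
        if name not in adapters:
            raise LookupError(f"unknown language: {name!r}")
        return adapters[name]()
    return language_for


@dataclass
class ProbeSketch:
    """Hypothetical probe: it never imports a grammar package directly."""
    language_for: LanguageProvider
    language: str = "typescript"

    def grammar(self) -> Any:
        return self.language_for(self.language)


# Adding a language is one new table entry; ProbeSketch is unchanged.
adapters = {"typescript": lambda: "ts-language"}
adapters["python"] = lambda: "py-language"
kernel = make_kernel(adapters)
```

The same injection point is what lets tests substitute a stub provider instead of installing real grammar wheels.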
Consequences
- pyproject.toml's [project].dependencies gains tree-sitter>=0.23,<0.26, tree-sitter-typescript>=0.23,<1, tree-sitter-javascript>=0.23,<1. The fence (Phase 0 ADR-0002) continues to enforce the LLM-SDK closure; tree-sitter wheels are not LLM SDKs.
- tools/grammars/, tools/grammars.lock, and tools/regenerate_grammars_lock.sh are deleted. The .gitattributes entries for tools/grammars/*.so and tools/grammars/*.dylib are removed.
- src/codegenie/grammars/lock.py is replaced by a smaller module exposing language_for(name) -> tree_sitter.Language and GrammarLoadRefused. The GrammarLockFile / GrammarPin dataclasses are removed (no callers outside the deleted tests).
- S4-04 (TreeSitterImportGraphProbe) and S4-06 NodeReflectionProbe both unblock. S4-04's story file is updated to reference language_for instead of load_and_verify. S4-06's story file removes the AC-R2 / T-R3 assertions about "no class GrammarLoadRefused redeclaration" (the import surface changes; the no-redeclaration property is preserved).
- Phase 8+ language additions (Python, Java, Go) are one-line pyproject.toml additions plus one entry in the kernel's dispatch table, with no ADR amendment unless the runtime closure changes shape (e.g., a non-PyPI grammar source).
- A grammar wheel CVE alert via pip-audit is a "review the upstream advisory, bump or veto" PR, the same workflow as every other dep CVE.
Reversibility
High. Reverting to vendored grammars means re-adding the tools/grammars/ directory, lockfile, and regen script (the deleted 02-ADR-0002 shape), rewriting language_for to call load_and_verify, and dropping the PyPI deps. Consumers (NodeReflectionProbe, TreeSitterImportGraphProbe) call language_for(...) either way, so the kernel boundary makes the swap mechanical.
Evidence / sources
- 02-ADR-0002: the parent ADR this supersedes
- Phase 1 ADR-0009: the C-extension policy 02-ADR-0002 amended is still honored; this ADR only changes grammar delivery
- docs/phases/02-context-gather-layers-b-g/stories/_attempts/S4-04.md: empirical blocker analysis; stub binaries cannot be loaded and the build chain is per-contributor
- docs/phases/02-context-gather-layers-b-g/stories/_attempts/S4-06.md: same blocker hit by NodeReflectionProbe one day later
- tree-sitter-typescript on PyPI: https://pypi.org/project/tree-sitter-typescript/ (wheels for Linux x86_64/aarch64, macOS Intel/Apple Silicon, Windows)
- tree-sitter-javascript on PyPI: https://pypi.org/project/tree-sitter-javascript/ (same wheel matrix)
- Modern tree_sitter.Language(<PyCapsule>) API: https://github.com/tree-sitter/py-tree-sitter