Story S4-04 — TreeSitterImportGraphProbe — py-tree-sitter no-internal-threads + grammar pin
Step: Step 4 — Ship IndexHealthProbe (B2) + Layer B structural probes
Status: GREEN (2026-05-16) — every AC has runtime evidence; full unit suite (2297 passed) + ruff + mypy --strict clean. See _attempts/S4-04.md attempt 2 for the per-AC evidence table and the 02-ADR-0011 surface translations. Implementation at src/codegenie/probes/layer_b/tree_sitter_import_graph.py; tests at tests/unit/probes/layer_b/test_tree_sitter_import_graph.py. Grammar load flows through codegenie.grammars.lock.language_for(name) -> tree_sitter.Language (PyPI wheels per 02-ADR-0011); ACs mentioning BLAKE3 / tools/grammars.lock / load_and_verify are superseded by pip --require-hashes at the wheel boundary.
Effort: M
Depends on: S4-03 (originally landed src/codegenie/grammars/lock.py as the BLAKE3 verifier kernel; the file persists but its public surface is now language_for(name) -> tree_sitter.Language + GrammarLoadRefused per 02-ADR-0011). S4-04 imports the kernel; it does NOT redeclare GrammarLoadRefused, does NOT import per-grammar PyPI packages directly (no from tree_sitter_typescript import ...), and does NOT re-implement any grammar load step.
ADRs honored: 02-ADR-0011 (PyPI grammar wheels behind language_for; supersedes 02-ADR-0002 with the named-trigger C-extension discipline preserved), 02-ADR-0003 (heaviness="medium" is a registry annotation; no internal ThreadPoolExecutor), Phase 0 ADR-0006 (the runtime closure is [project.dependencies]; gather extras is intentionally empty), Phase 1 ADR-0008 (in-process parse caps, not per-probe sandbox; grammar code runs in the gather process), Phase 1 ADR-0007 (warning ID pattern). The body below still cites 02-ADR-0002 verbatim — every such reference now points through 02-ADR-0011's supersession (the named-trigger discipline carries forward unchanged).
02-ADR-0011 translation table (read this BEFORE the body)
The story was written against 02-ADR-0002's vendored-.so model. The 2026-05-17 supersession changed the kernel surface but left every other AC valid. Apply these mechanical translations as you read:
| Legacy surface (S4-03 first edition / 02-ADR-0002) | Current surface (02-ADR-0011) |
|---|---|
| `from codegenie.grammars.lock import load_and_verify, GrammarLockFile, GrammarPin, GrammarLoadRefused` | `from codegenie.grammars.lock import language_for, SupportedLanguage, GrammarLoadRefused` |
| `lock = load_and_verify(_REPO_ROOT)` | `language = language_for("typescript")` (also `"tsx"`, `"javascript"`) |
| `tree_sitter.Language(pin.file, pin.language) for pin in lock.grammars` | `language_for(name)` returns a constructed `tree_sitter.Language` directly |
| `_get_language(lock_file_id, language)` `lru_cache` helper | not needed — the kernel memoizes via `functools.lru_cache` already |
| AC asserting "no `Path('tools/grammars.lock')` literal" | becomes "no `from tree_sitter_typescript import ...` / `from tree_sitter_javascript import ...`" |
| AC asserting "no `import blake3`" | preserved — still applies (the kernel handles supply-chain pinning at the wheel boundary, not the probe) |
| AC asserting "no `class GrammarLoadRefused` redeclaration" | preserved verbatim |
| AC referencing `tools/grammars.lock` as `declared_inputs` cache-key token | replaced by `pyproject.toml` + `uv.lock` (the wheel SHA256 pin) — a grammar bump invalidates because the wheel SHA256 changes. The legacy `tools/grammars.lock` token is no longer a valid declared input. |
| Grammar pin mismatch → `GrammarLoadRefused` slice (`confidence="low"`) | preserved — the kernel still raises `GrammarLoadRefused` on every failure surface (missing wheel, unknown language, capsule factory drift, ABI mismatch). The probe-side honest-absence slice (Phase 2 `NodeReflectionProbe` ships the same shape — see `src/codegenie/probes/layer_b/node_reflection.py:_emit_grammar_unavailable`) is the reference implementation. |
Mirror the NodeReflectionProbe GREEN implementation (2026-05-17, S4-06 attempt 2 — src/codegenie/probes/layer_b/node_reflection.py + tests/unit/probes/layer_b/test_node_reflection.py) for every mechanical detail: kernel import line, Parser(language) construction, per-(language, query) Query caching, AST-walk discipline that forbids per-grammar PyPI imports.
Validation notes (2026-05-16 — phase-story-validator)
Verdict: HARDENED. Edits applied in place:
- BLOCK / consistency. `GrammarLoadRefused` and grammar-load logic moved to import the kernel from `codegenie.grammars.lock` (the chokepoint S4-03 AC-20 explicitly built for this probe). The probe no longer redefines `GrammarLoadRefused`, no longer reads `tools/grammars.lock` directly, no longer recomputes BLAKE3. AC-2 / AC-3 rewritten; impl outline step 2 deleted (kernel owns it); the dedicated `_errors.py` was eliminated.
- BLOCK / consistency. `Probe.run` signature corrected to two-arg `async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput` per the frozen ABC at `src/codegenie/probes/base.py:94`. Impl outline step 6 + AC-12 + every TDD-plan invocation updated; `self._parse_all(repo, ctx, ...)` everywhere.
- BLOCK / consistency. `py-tree-sitter` moved to `[project.dependencies]` per Phase 0 ADR-0006 (`gather` extras is intentionally empty — the runtime closure IS `[project.dependencies]`; the fence ADR-0002 reads that list). AC-14 rewritten; T-16 follows.
- BLOCK / coverage. Grammar-file path resolution promoted from implementer-note to AC-Resolution: `_REPO_ROOT` is the codewizard-sherpa repo root (computed at module import via `Path(__file__).resolve().parents[N]` constant), NEVER `ctx.workspace` / `repo.root`. T-resolution exercises a fixture-mode analyzed repo to confirm.
- HARDEN / coverage. Determinism: edges sorted lexicographically by `(from, to)`; JSON written with `sort_keys=True, separators=(",", ":")`; atomic write via tempfile + `os.replace`. New AC-DET; new property test T-prop-idempotent.
- HARDEN / test-quality. T-06 (thread count) scoped to set-difference by `Thread.ident`, filtered to threads NOT present before `asyncio.run(probe.run(...))`. T-05 / T-07 specified as AST-precise walks with explicit forbidden-symbol lists (`asyncio.gather`, `asyncio.wait`, `asyncio.to_thread`, `loop.run_in_executor`, `loop.create_task`, etc.); the single admissible coordination primitive is `asyncio.wait_for` exactly once.
- HARDEN / design-patterns. `_extract_imports_from_file` split into pure `_extract_imports(language, source_bytes, relative_path) -> list[Edge]` (functional core) + thin I/O shell. New AC-PURE; new T-pure-isolation.
- HARDEN / design-patterns. `Edge` made a typed model (Pydantic frozen `extra="forbid"`, `Field(alias="from")` for `from_path`). Newtype discipline for the import-graph payload.
- HARDEN / coverage. Very-large-file guard (skip files > 4 MiB; `tree_sitter.file_too_large` warning; counted in `failed_files`). New AC-LARGE.
- HARDEN / coverage. Indexable-files enumeration uses the Phase 1 shared helper (`_enumerate_indexable_files`); explicit AC pins symlink + `node_modules/` + `.codegenie/` exclusion.
- HARDEN / consistency. `has_error` API spelled precisely: `tree.root_node.has_error`. `tree` itself has no `has_error` attribute in modern `py-tree-sitter`.
- HARDEN / consistency. Import-time validation uses `raise AssertionError(...)` (S4-01 precedent at `src/codegenie/probes/layer_b/index_health.py:121-123`), NOT bare `assert`.
- HARDEN / coverage. Confidence rubric pinned to a discrete unambiguous rule (not "≥ 50% succeeded"): `high` iff `failed_files == 0 AND parsed_files > 0`; `medium` iff `failed_files > 0 AND parsed_files >= failed_files`; `low` otherwise. AC-7 rewritten.
- HARDEN / mypy. `[[tool.mypy.overrides]]` entry for `tree_sitter.*` (`ignore_missing_imports = true`) added to AC-MYPY; cleaner than per-line `# type: ignore` and matches Phase 1's convention.
- CLARIFICATION. Explicit non-AC: tree-sitter is NOT a B2 freshness index source (S4-01's `IndexName` registry covers `scip`, `runtime_trace`, `semgrep`, `gitleaks`, `conventions` only). The probe writes `raw/import-graph.json` and emits an `import_graph` slice; it does NOT write `<output_dir>/raw/tree_sitter.json` and B2 does NOT read it.
Full audit log: _validation/S4-04-tree-sitter-import-graph.md.
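The AST-precise walks that the T-05 / T-07 note describes can be sketched with the standard `ast` module. This is a minimal illustration, not the project's test code: the function names `find_violations` and `_dotted` are hypothetical, and the forbidden sets are copied from the lists in this story.

```python
import ast

# Forbidden import targets and coordination calls, as listed in this story.
FORBIDDEN_MODULES = {
    "threading", "concurrent", "concurrent.futures",
    "multiprocessing", "multiprocessing.pool", "asyncio.subprocess",
}
FORBIDDEN_CALLS = {
    "asyncio.gather", "asyncio.wait", "asyncio.as_completed",
    "asyncio.create_task", "asyncio.to_thread",
    "loop.run_in_executor", "loop.create_task",
}


def _dotted(node: ast.AST) -> str:
    """Render a Name/Attribute chain as 'a.b.c'; return '' for other nodes."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted(node.value)
        return f"{base}.{node.attr}" if base else ""
    return ""


def find_violations(source: str) -> list[str]:
    """Return the sorted forbidden imports and calls found in `source`."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # alias.name is the real module name, so `import threading as _t` is caught.
            found += [a.name for a in node.names if a.name in FORBIDDEN_MODULES]
        elif isinstance(node, ast.ImportFrom):
            if node.module in FORBIDDEN_MODULES:
                found.append(node.module)
        elif isinstance(node, ast.Call):
            name = _dotted(node.func)
            if name in FORBIDDEN_CALLS:
                found.append(name)
    return sorted(found)
```

The sketch matches calls by their literal dotted spelling, so it would miss `import asyncio as aio; aio.gather(...)` and the `functools.partial(asyncio.gather, ...)` form; a real test would need to resolve aliases and inspect call arguments as well.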
Context
TreeSitterImportGraphProbe extracts file-level import edges from the source tree using tree-sitter grammars and emits forward-only adjacency to raw/import-graph.json. Phase 3's adapters (ImportGraphAdapter Protocol shipped in S1-08) decide reverse projection; Phase 2 emits only.
Three disciplines from ADR-0002, phase-arch-design.md §"Component design" #12, and S4-03's chokepoint are load-bearing:
- Grammar load goes through the shared `codegenie.grammars.lock` kernel (S4-03 AC-20). The probe imports `load_and_verify(repo_root) -> GrammarLockFile` and `GrammarLoadRefused`; the kernel reads `tools/grammars.lock`, validates it via Pydantic, recomputes BLAKE3 over every vendored `.so` / `.dylib`, and raises `GrammarLoadRefused` on mismatch — before any grammar code executes. The probe never re-reads the lock file, never recomputes BLAKE3, never re-declares the exception. Duplicating the kernel surface would silently fork the supply-chain defense; the chokepoint is what makes it auditable.
- In-process load. Grammar binaries are loaded in-process via `tree_sitter.Language(path, language_name)` after the kernel's BLAKE3 check passes. No `_grammar_runner` subprocess — ADR-0002 §Consequences rejects subprocess wrap as over-engineering for a threat the pin already addresses. A crashed grammar crashes the gather process; Phase 0 failure isolation contains it to one probe via `asyncio.wait_for`. Loudness is a feature (Rule 12).
- No internal `ThreadPoolExecutor`, no `asyncio.gather`, no `loop.run_in_executor`, no `asyncio.to_thread` — the probe is one slot under the Phase 0 single `Semaphore(min(cpu_count(), 8))`. Hidden parallelism inside a probe lies to the coordinator's budget (ADR-0003 §Decision reinforces this). Per-file extraction is sequential under the probe; the coordinator owns concurrency across probes. Verified by thread-count set-difference assertion at test time (the manifest risk callout specifically warned against "absence-of-threading-import" being a sufficient test), AND by an AST-precise grep for forbidden coordination primitives.
A note on the slice: per localv2.md §5.2 B3, the NodeReflectionProbe slice holds reflection data, NOT import edges; this probe lands forward-only adjacency under an import_graph slice that the architecture treats separately from reflection. The arch §"Component design" #12 names raw/import-graph.json as the artifact; the slice itself only summarizes (files_with_imports, total_edges, cyclic_components_count, confidence). Production consumers (Phase 3's ImportGraphAdapter) read the raw JSON, not the slice.
References — where to look
- Architecture:
  - `../phase-arch-design.md §"Component design" #12` — full internal structure; load-bearing properties: in-process, no thread pool, grammar pin verified.
  - `../phase-arch-design.md §"Edge cases" row 10` — tree-sitter grammar BLAKE3 mismatch.
  - `../phase-arch-design.md §"Design patterns applied"` row "Anti-patterns avoided" — "hidden parallelism inside a probe lies to the coordinator's budget."
- Phase 2 ADRs:
  - `../ADRs/0002-tree-sitter-grammars-phase-2-amendment.md` — full rationale; in-process; pin-at-load; `_grammar_runner` rejected; one named-trigger exception to Phase 1 ADR-0009.
  - `../ADRs/0003-coordinator-heaviness-sort-annotation.md` — `heaviness="medium"`; no internal pools.
- Phase 1 ADRs:
  - `docs/phases/01-context-gather-layer-a-node/ADRs/0009-no-new-c-extension-parser-dependencies.md` — the policy this ADR amends.
- Source design:
  - `docs/localv2.md §5.2 B3` — `NodeReflectionProbe` uses tree-sitter as a sibling pattern; this probe is the import-graph one.
- Existing code:
  - `tools/grammars.lock` + `tools/grammars/{typescript,javascript}.so` (from S4-03).
  - `src/codegenie/probes/base.py` (frozen).
  - `src/codegenie/probes/registry.py` (extended in S1-08).
Goal
Running codegenie gather against a TypeScript/JavaScript repo populates .codegenie/context/raw/import-graph.json (a JSON file containing forward-only adjacency, NetworkX-serializable shape, edges sorted lexicographically by (from, to), byte-identical across two consecutive runs on the same inputs) AND emits an import_graph slice summarizing edge counts. The probe delegates grammar load + BLAKE3 verification to the shared codegenie.grammars.lock kernel (S4-03 AC-20) — no duplicated reader, no duplicated GrammarLoadRefused, no per-file re-verification; mismatch surfaces as the kernel's typed exception and the probe slice records confidence="low", errors=["tree_sitter.grammar_pin_mismatch"] with no grammar code executed. The probe contains zero ThreadPoolExecutor, zero multiprocessing.Pool, zero asyncio.gather, zero asyncio.to_thread, zero loop.run_in_executor — per-file extraction is sequential under the probe's single coordinator slot, verified by thread-count set-difference assertion (not just import absence) AND AST-precise forbidden-symbol grep. Tree-sitter is not a B2 freshness-index source (B2's IndexName registry from S4-01 covers scip, runtime_trace, semgrep, gitleaks, conventions only); the probe writes raw/import-graph.json and emits an import_graph slice, and that is the totality of its contract.
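The "byte-identical across two consecutive runs" property rests on three mechanical steps: sort the edges, serialize canonically, and replace the file atomically. A minimal sketch follows, assuming plain dicts in place of the Pydantic `Edge` model; `write_import_graph` is a hypothetical name used only for illustration.

```python
import json
import os
from pathlib import Path


def write_import_graph(edges: list[dict[str, str]], out_path: Path) -> bytes:
    """Sort edges by (from, to), serialize canonically, and atomically replace out_path."""
    ordered = sorted(edges, key=lambda e: (e["from"], e["to"]))
    payload = {"schema_version": 1, "edges": ordered}
    data = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    # Write to a sibling temp file first, so readers never see a half-written artifact.
    tmp_path = out_path.with_name(out_path.name + ".tmp")
    tmp_path.write_bytes(data)
    os.replace(tmp_path, out_path)  # atomic when both paths are on the same filesystem
    return data
```

Because the input order is normalized before serialization, two runs that discover the same edges in a different order should still produce the same bytes on disk.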
Acceptance criteria
- [ ] AC-1 — Probe contract attributes. `src/codegenie/probes/layer_b/tree_sitter_import_graph.py` defines `class TreeSitterImportGraphProbe(Probe)` with class attributes: `name="tree_sitter_import_graph"`, `version="0.1.0"`, `layer="B"`, `tier="base"`, `applies_to_languages=["javascript","typescript"]`, `applies_to_tasks=["*"]`, `requires=["language_detection"]`, `timeout_seconds=120`, `cache_strategy: Literal["content"] = "content"`. `declared_inputs` includes `["**/*.ts", "**/*.tsx", "**/*.js", "**/*.jsx", "tools/grammars.lock"]` (the lock-file is part of the cache key — a grammar version bump invalidates because the lock file content changes). The decorator is `@register_probe(heaviness="medium")`. The class implements `async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput` — two-arg signature per the frozen ABC at `src/codegenie/probes/base.py:94`. One-arg `run(self, ctx)` is a `TypeError` at dispatch.
- [ ] AC-2 — Grammar load delegates to the shared kernel; no duplicated reader, no duplicated `GrammarLoadRefused`. The probe imports `GrammarLockFile`, `GrammarLoadRefused`, and `load_and_verify` from `codegenie.grammars.lock`. At grammar-load time the probe calls `lock = load_and_verify(_REPO_ROOT)` (the kernel reads `tools/grammars.lock`, validates via Pydantic, recomputes BLAKE3 over every vendored `.so` / `.dylib`, raises `GrammarLoadRefused` on mismatch — before any grammar code executes). The probe then constructs `tree_sitter.Language(pin.file, pin.language)` for `language ∈ {"typescript","javascript"}`. Per-`Language` construction is process-memoized via a module-level `@functools.lru_cache(maxsize=4)`-decorated helper `_get_language(lock_file_id: int, language: Literal["typescript","javascript"]) -> tree_sitter.Language`, keyed on `(id(lock_file), language)` so the kernel's `GrammarLockFile` identity preserves cache correctness across consecutive `run()` calls within one process. The probe does not read `tools/grammars.lock` directly, does not call `blake3.blake3(...)`, does not declare `GrammarLoadRefused`. T-no-direct-lockfile-IO AST-walks the probe module and asserts no `open(`, `Path("tools/grammars.lock")`, `blake3` import — those belong to the kernel.
- [ ] AC-3 — `GrammarLoadRefused` is the kernel's exception; the probe catches and translates it to a slice. The probe's `run` catches `GrammarLoadRefused` (imported from `codegenie.grammars.lock` — NOT re-declared), and emits a slice with `confidence="low"`, `errors=["tree_sitter.grammar_pin_mismatch"]`, `warnings=[]` (kernel-side detail does not surface as a warning; the structured-log record on the kernel side has the language + expected/actual BLAKE3), `total_edges=0`, `files_with_imports=0`, `parsed_files=0`, `failed_files=0`. The probe writes no `import-graph.json` — the file is absent on disk after a mismatch run (atomic-write discipline of AC-DET means a half-populated file is never observable; the file simply does not exist). The structured log record from the probe includes the grammar language whose pin failed (pulled from the caught exception's attributes) AND the canonical error ID `tree_sitter.grammar_pin_mismatch`.
- [ ] AC-Resolution — `_REPO_ROOT` resolves to the codewizard-sherpa repo, never the analyzed repo. `_REPO_ROOT: Final[Path]` is a module-level constant computed at import via `Path(__file__).resolve().parents[N]` (implementer chooses `N` to land on `src/codegenie/probes/layer_b/` → repo root). The probe NEVER consults `ctx.workspace`, `ctx.output_dir`, `repo.root`, or any analyzed-repo path to locate grammar binaries — the grammars belong to codewizard-sherpa itself, not the analyzed repo. A test (`test_grammars_resolved_from_codegenie_repo_root`) uses a fixture-mode analyzed repo at `tests/fixtures/portfolio/minimal-ts/` and asserts the probe's resolved `_REPO_ROOT / "tools/grammars.lock"` is the codewizard-sherpa repo's lock file, NOT `<fixture>/tools/grammars.lock` (which doesn't exist).
- [ ] AC-4 — No internal `ThreadPoolExecutor`, no parallel-coordination primitives; verified by thread-count set-difference AND AST-precise forbidden-symbol grep.
  - T-05 (import-name AST walk): AST-walks the probe module. For every `ast.Import` and `ast.ImportFrom` node, asserts no module name in the set `{"threading", "concurrent", "concurrent.futures", "multiprocessing", "multiprocessing.pool", "asyncio.subprocess"}` appears as an import target. Aliased imports (`import threading as _t`) are caught because the AST walk inspects `node.name`, not the alias.
  - T-06 (runtime thread-count set-difference): Captures `threads_before = {t.ident for t in threading.enumerate()}` before `asyncio.run(probe.run(repo, ctx))`; captures `threads_after` after; asserts `(threads_after - threads_before) == set()`. Set-difference avoids brittleness from pytest-xdist / hypothesis / structlog-async threads pre-existing in the process. If tree-sitter's C library spawns a thread the test does not own, the assertion is scoped via `Thread.name`: any new thread whose `name.lower()` contains `"tree_sitter"` is considered upstream-library-owned and excluded — this scoping is explicit in the test code with a TODO comment referencing the upstream issue, NOT silent.
  - T-07 (forbidden coordination primitives AST walk): AST-walks every `ast.Call` in the module. Asserts no call whose `func` resolves to any of `asyncio.gather`, `asyncio.wait`, `asyncio.as_completed`, `asyncio.create_task`, `asyncio.to_thread`, `loop.run_in_executor`, `loop.create_task`, `functools.partial(asyncio.gather, ...)`. The single admissible asyncio-coordination primitive is `asyncio.wait_for(coro, timeout=...)`, exactly once, at the `run()` boundary (AC-12).
  - The probe processes files in a synchronous `for file in indexable_files: ...` loop inside an `async def` shell. The loop body calls only synchronous helpers; there is no `await` inside the loop. The single `await` in the probe is the kernel boundary at `asyncio.wait_for(_parse_all(repo, ctx), timeout=...)` in `run()`.
- [ ] AC-PURE — Functional core / imperative shell separation. Per-file extraction is split:
  - `_extract_imports(language: tree_sitter.Language, source_bytes: bytes, relative_path: str) -> list[Edge]` is pure — no `Path` access, no `open(...)`, no `read_bytes`, no logging side-effects. Inputs: the Language object, the source bytes, the file's repo-relative path string. Output: a list of `Edge`. Tested in isolation against in-memory byte strings (T-pure-isolation) — no temp directories, no file fixtures.
  - `_read_and_extract(path: Path, language: tree_sitter.Language, relative_path: str) -> list[Edge]` is the thin shell that does `path.read_bytes()` then calls `_extract_imports(...)`. This is the only function in the per-file path that touches the filesystem; it is also the function that handles parse errors / `tree.root_node.has_error` and raises `_PerFileParseFailed`.
- [ ] AC-5 — Per-file extraction emits forward-only adjacency; deterministic shape. For each TypeScript / JavaScript file, `_extract_imports` parses the source via `tree_sitter.Parser()` configured with the language (`parser.language = language` for `py-tree-sitter ≥ 0.21`), walks the AST with a tree-sitter Query, and extracts every `import X from "..."` (ES module), `import "..."` (side-effect), `export ... from "..."` (re-export), and `require("...")` (CommonJS literal-string call). Output `Edge`:

  ```python
  class Edge(BaseModel):
      model_config = ConfigDict(frozen=True, extra="forbid", populate_by_name=True)
      from_path: str = Field(alias="from")  # relative-to-repo POSIX path
      to: str                               # specifier as it appears in source
  ```

  Forward-only — no reverse adjacency in Phase 2 (Phase 3's `ImportGraphAdapter` builds it). `to` values are emitted verbatim from the source (string literal as it appears in the import statement — `"./utils"`, `"lodash"`, `"@scope/pkg"`, etc.). No resolution to filesystem paths — that's Phase 3 adapter territory. Dynamic `import(specifier)` where `specifier` is a non-literal (variable, expression, template-literal-with-interpolation) is omitted (not emitted as `"<dynamic>"`) — Phase 3's reflection adapter is the right layer for dynamic resolution.
- [ ] AC-DET — Deterministic, byte-identical artifact across reruns. Before writing `import-graph.json`, edges are sorted lexicographically by `(from_path, to)`. The artifact is serialized via `json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)` and written atomically: write to a sibling `import-graph.json.tmp` then `os.replace(...)`. T-prop-idempotent (Hypothesis property test) generates two runs against the same fixture set and asserts the on-disk artifact bytes are identical. A partial-timeout run (AC-12) also writes atomically — Phase 3 readers never observe a half-written file.
- [ ] AC-6 — `import-graph.json` schema_version + Pydantic well-formedness. The artifact has top-level `{"schema_version": 1, "edges": [...]}`. A small `ImportGraphArtifact` Pydantic model lives in the probe module (`frozen=True`, `extra="forbid"`) and is the (de)serialization boundary. A unit test (`test_import_graph_json_well_formed`) loads the artifact via `ImportGraphArtifact.model_validate_json(...)`. `schema_version: 1` is the current value; future shape changes bump and require a Phase-N ADR.
- [ ] AC-7 — Slice summary fields; confidence rubric is discrete and unambiguous. The `import_graph` slice contains:
  - `files_with_imports: int` — count of source files where ≥ 1 edge was emitted.
  - `total_edges: int`.
  - `parsed_files: int`, `failed_files: int`.
  - `confidence: Literal["high","medium","low"]` per exactly this rubric (no thresholds, no arithmetic ratios):
    - `"high"` iff `failed_files == 0 AND parsed_files > 0`.
    - `"medium"` iff `failed_files > 0 AND parsed_files >= failed_files`.
    - `"low"` iff `parsed_files < failed_files` OR `parsed_files == 0` (empty repo — see AC-9) OR `GrammarLoadRefused` fired (AC-3) OR timeout fired (AC-12).
  - `import_graph_uri: str` (`".codegenie/context/raw/import-graph.json"` — relative path). Omitted when no artifact is written (mismatch/timeout-with-zero-edges).
  - `grammar_versions: dict[str, str]` — `{"typescript": "0.20.6", "javascript": "0.20.4"}` from the kernel's `GrammarLockFile`; provenance. Omitted on `GrammarLoadRefused` (the lock did not load).
- [ ] AC-8 — Per-file parse failure is contained. `_read_and_extract` checks `tree.root_node.has_error` (the precise modern `py-tree-sitter ≥ 0.21` API) after `parser.parse(source_bytes)`. If `tree.root_node.has_error` is `True`, OR if `parser.parse` raises any exception, the function raises the internal `_PerFileParseFailed` and the caller increments `failed_files`; no edges are emitted from that file; `warnings.append("tree_sitter.file_parse_failed")` with a count cap (≤ 5 distinct entries — past 5, increment an internal counter and emit one summary `tree_sitter.parse_failed_count_exceeded` warning). The probe does NOT raise from `run()`. A future refactor that adds `pytest.raises(SyntaxError)` here would defeat the discipline — failure containment is the contract.
- [ ] AC-LARGE — Very-large-file guard. Files whose size exceeds 4 MiB (`4 * 1024 * 1024` bytes) are skipped before `read_bytes` (a `Path.stat().st_size` check), counted in `failed_files`, and emit warning `tree_sitter.file_too_large` (subject to the same ≤ 5 cap and summary semantics as AC-8). Tree-sitter is robust but parsing a 50-MB bundled `.js` from a `dist/` directory can OOM the process and would defeat AC-12's timeout containment.
- [ ] AC-INDEXABLE — Indexable files come from the Phase 1 shared enumerator. The probe imports `_enumerate_indexable_files` (or its successor name) from the Phase 1 shared helper module (Rule 11 — match codebase convention; the helper is referenced by S4-03's SCIP probe and is the source of truth for symlink / `node_modules/` / `.codegenie/` exclusion). The probe does NOT re-implement file enumeration. A structural test asserts the helper is imported, not redefined; if the helper does not yet expose a JavaScript/TypeScript filter, the probe filters by extension after enumeration.
- [ ] AC-9 — Empty-repo guard. If `parsed_files == 0 AND failed_files == 0` (an empty repo or one with zero `.ts` / `.tsx` / `.js` / `.jsx` files), `confidence="low"`, `warnings.append("tree_sitter.no_files_to_parse")`, and no `import-graph.json` is written (no artifact, no `import_graph_uri` in the slice). Without this guard, an empty repo would pass through with `confidence="high"` — the silent-confidence failure mode B2 exists to prevent. T-09 exercises this.
- [ ] AC-10 — `confidence="low"` slice on `GrammarLoadRefused`. Per AC-3. No `import-graph.json` is written. The slice contains `files_with_imports=0`, `total_edges=0`, `parsed_files=0`, `failed_files=0`, `confidence="low"`, `errors=["tree_sitter.grammar_pin_mismatch"]`. T-10 monkeypatches `codegenie.grammars.lock.load_and_verify` to raise `GrammarLoadRefused(language="typescript", expected_blake3=..., actual_blake3=...)` and asserts the slice shape end-to-end (including absence of the artifact on disk).
- [ ] AC-11 — Warning + error ID frozenset; module-level `raise AssertionError` validation. All IDs (`tree_sitter.grammar_pin_mismatch`, `tree_sitter.file_parse_failed`, `tree_sitter.parse_failed_count_exceeded`, `tree_sitter.no_files_to_parse`, `tree_sitter.file_too_large`, `tree_sitter.timeout`) are declared in module-level `_WARNING_IDS: Final[frozenset[str]]` and `_ERROR_IDS: Final[frozenset[str]]`. Import-time validation matches the S4-01 precedent at `src/codegenie/probes/layer_b/index_health.py:121-123`: `for _id in _WARNING_IDS | _ERROR_IDS: if not _ID_PATTERN.match(_id): raise AssertionError(f"ADR-0007 violation: {_id!r}")`. Bare `assert` is not used (Rule 11 — match convention). A unit test (`test_warning_error_ids_match_adr_0007`) also exercises the regex against the frozenset contents.
- [ ] AC-12 — Timeout containment via `asyncio.wait_for`; atomic partial-graph write. The probe's `run` is `await asyncio.wait_for(self._parse_all(repo, ctx), timeout=self.timeout_seconds)`. The `wait_for` is the ONLY admissible asyncio-coordination primitive — `asyncio.gather`, `asyncio.to_thread`, `loop.run_in_executor`, `loop.create_task` are all forbidden (AC-4). On `asyncio.TimeoutError`, the slice contains whatever partial state was accumulated up to the timeout point AND `confidence="low"`, `warnings=["tree_sitter.timeout"]`. The artifact `import-graph.json` is written atomically (per AC-DET — sorted edges, tempfile + `os.replace`) with the partial edges if `total_edges > 0`; otherwise the artifact is omitted. A partial graph is better than no graph for Phase 3 fallback — UNLIKE `ScipIndexProbe` (S4-03 AC-6 where partial blobs are deleted) — because the sorted-then-atomically-written JSON is a complete document of partial content, never a truncated stream. T-13 asserts atomicity by failing the test if `import-graph.json.tmp` exists after `run()` returns.
- [ ] AC-13 — Registry membership + `for_task` filter. `src/codegenie/probes/__init__.py` imports `TreeSitterImportGraphProbe` via an explicit additive line (the side-effect import triggers `@register_probe`). `default_registry.all_probes()` includes it with `heaviness="medium"`. `for_task("*", frozenset({"typescript"}))` and `for_task("*", frozenset({"javascript"}))` include it; languages outside `applies_to_languages` (e.g., `frozenset({"python"})`) skip it.
- [ ] AC-14 — `py-tree-sitter` lands in `[project.dependencies]`, NOT in `[project.optional-dependencies] gather`. Per Phase 0 ADR-0006 §Decision, the `gather` extras is intentionally empty — the runtime closure IS `[project.dependencies]`; the fence (Phase 0 ADR-0002) scans `[project.dependencies]` for the LLM SDK ban. The new entry is `tree-sitter ~= 0.21` (the modern PyPI package name; older `py-tree-sitter` aliases the same project — pin to the name the project ships at the chosen version). `pip-audit` and `osv-scanner` continue to scan it via the standard `[project.dependencies]` reading path. A unit test (`test_pyproject_lists_tree_sitter_in_project_dependencies`) parses `pyproject.toml` via `tomllib`, reads `project.dependencies`, and asserts the entry appears exactly once.
- [ ] AC-MYPY — Tree-sitter typing override in `pyproject.toml`. `tree-sitter`'s package is not `py.typed`. Add a `[[tool.mypy.overrides]]` entry in `pyproject.toml` for `tree_sitter.*` with `ignore_missing_imports = true`. This is cleaner than scattering `# type: ignore[import-untyped]` at every `import tree_sitter` site (Phase 1 precedent: other untyped-third-party packages use the override block). A unit test parses `pyproject.toml` and asserts the override block exists.
- [ ] AC-15 — Tooling green. `ruff check`, `ruff format --check`, `mypy --strict src/codegenie/probes/layer_b/tree_sitter_import_graph.py`, `pytest tests/unit/probes/layer_b/test_tree_sitter_import_graph.py`. All green.
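The AC-7 confidence rubric is small enough to express as a single pure function. The sketch below is illustrative only: `compute_confidence` and its keyword flags are hypothetical names, and the branch order is one way of encoding the three rules as written.

```python
from typing import Literal

Confidence = Literal["high", "medium", "low"]


def compute_confidence(
    parsed_files: int,
    failed_files: int,
    *,
    grammar_refused: bool = False,
    timed_out: bool = False,
) -> Confidence:
    """Apply the AC-7 rubric; every 'low' trigger is checked before the other tiers."""
    if grammar_refused or timed_out:
        return "low"
    if parsed_files == 0 or parsed_files < failed_files:
        return "low"  # covers the empty-repo case from AC-9
    if failed_files == 0:
        return "high"  # parsed_files > 0 is guaranteed by the previous check
    return "medium"  # failed_files > 0 and parsed_files >= failed_files
```

For example, `compute_confidence(5, 5)` falls into the `"medium"` tier because `parsed_files >= failed_files` still holds, while `compute_confidence(2, 3)` is `"low"`.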
Implementation outline
-
Create
src/codegenie/probes/layer_b/tree_sitter_import_graph.py. Class per AC-1 with two-argrun(self, repo, ctx)signature. -
Import the kernel (AC-2):
from codegenie.grammars.lock import GrammarLockFile, GrammarLoadRefused, load_and_verify. Do not redefineGrammarLoadRefused; do not readtools/grammars.lock; do not callblake3. -
Module-level
_REPO_ROOT: Final[Path](AC-Resolution):Path(__file__).resolve().parents[N]whereNlands on the codewizard-sherpa repo root (src/codegenie/probes/layer_b/foo.py→parents[4]is<repo-root>; verifyNempirically). -
_get_language(lock: GrammarLockFile, language: Literal["typescript","javascript"]) -> tree_sitter.Language (AC-2 process-memo): module-level helper, @functools.lru_cache(maxsize=4) keyed on (id(lock), language). Looks up the pin by language in the typed GrammarLockFile, constructs tree_sitter.Language(pin.file, language), and returns it. The kernel's load_and_verify is the BLAKE3 chokepoint; this helper only constructs the Language after the lock is verified.
- Pure helper _extract_imports(language: tree_sitter.Language, source_bytes: bytes, relative_path: str) -> list[Edge] (AC-PURE): parses source_bytes, walks the AST via tree-sitter Queries (_TS_IMPORT_QUERY, _JS_IMPORT_QUERY module constants), and emits Edge(from_path=relative_path, to=specifier) for each hit. No I/O.
- Shell helper _read_and_extract(path: Path, language: tree_sitter.Language, relative_path: str) -> list[Edge] (AC-PURE): does a Path.stat().st_size check (AC-LARGE); on too-large raises _PerFileTooLarge. Reads bytes via path.read_bytes(). Parses; on parser exception OR tree.root_node.has_error == True (AC-8), raises _PerFileParseFailed. Otherwise calls _extract_imports(...) and returns the result.
- Sequential loop _parse_all(self, repo: RepoSnapshot, ctx: ProbeContext, language_objs: dict[str, tree_sitter.Language]) -> tuple[list[Edge], int, int, list[str]], synchronous-inside-async:

    ```python
    edges: list[Edge] = []
    parsed = 0
    failed = 0
    warnings: list[str] = []
    for file in _enumerate_indexable_files(repo.root):
        if file.suffix not in (".ts", ".tsx", ".js", ".jsx"):
            continue
        language = language_objs["typescript"] if file.suffix in (".ts", ".tsx") else language_objs["javascript"]
        relative_path = file.relative_to(repo.root).as_posix()
        try:
            file_edges = _read_and_extract(file, language, relative_path)
            edges.extend(file_edges)
            parsed += 1
        except _PerFileTooLarge:
            failed += 1
            _accumulate_warning(warnings, "tree_sitter.file_too_large")
        except _PerFileParseFailed:
            failed += 1
            _accumulate_warning(warnings, "tree_sitter.file_parse_failed")
    return edges, parsed, failed, warnings
    ```

    No await inside the loop. _accumulate_warning enforces the ≤ 5 cap + summary-emit semantics of AC-8 / AC-LARGE.
- async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput (AC-1 / AC-12):
    - Try lock = load_and_verify(_REPO_ROOT); on GrammarLoadRefused build the mismatch slice (AC-3/AC-10) and return immediately: no artifact write, no further work.
    - Construct language_objs = {"typescript": _get_language(lock, "typescript"), "javascript": _get_language(lock, "javascript")}.
    - await asyncio.wait_for(_parse_all(self, repo, ctx, language_objs), timeout=self.timeout_seconds). Note that _parse_all itself is async def only so it can be cancelled by wait_for; its body is synchronous (no await inside).
    - Catch asyncio.TimeoutError; recover partial state via a nonlocal-style accumulator (or split _parse_all to push state into self's instance variables guarded by cancellation; the implementer chooses the shape, since the test (T-13) pins behavior, not mechanism).
    - Sort edges lexicographically by (from_path, to) (AC-DET).
    - If edges non-empty: atomic-write import-graph.json to ctx.output_dir / "raw" / "import-graph.json" via tempfile + os.replace. Else: do not write.
    - Compute confidence per AC-7 rubric.
    - Build slice with all AC-7 fields (omitting import_graph_uri and grammar_versions when applicable).
    - Return ProbeOutput(schema_slice=..., raw_artifacts=[artifact_path] if written else [], confidence=..., duration_ms=..., warnings=..., errors=...).
- Register the probe via src/codegenie/probes/__init__.py additive import (matches S4-01 precedent).
- Add tree-sitter ~= 0.21 to [project.dependencies] in pyproject.toml (AC-14). Add the [[tool.mypy.overrides]] block for tree_sitter.* (AC-MYPY).
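The Sort and atomic-write steps of run can be sketched as follows. This is a minimal illustration under stated assumptions, not the probe's actual code: the Edge tuple, the write_edges_atomically name, and the {"from", "to"} JSON shape (borrowed from T-11) are hypothetical stand-ins.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import NamedTuple


class Edge(NamedTuple):
    # Hypothetical stand-in for the probe's Edge type.
    from_path: str
    to: str


def write_edges_atomically(edges: list[Edge], target: Path) -> None:
    """Sort edges by (from_path, to) and publish them in one os.replace."""
    # NamedTuple instances compare field by field, so sorted() orders
    # lexicographically by (from_path, to).
    ordered = sorted(edges)
    payload = json.dumps(
        [{"from": e.from_path, "to": e.to} for e in ordered],
        sort_keys=True,
    ).encode("utf-8")

    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so os.replace stays on
    # one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        os.replace(tmp_name, target)
    except BaseException:
        # Never leave a half-written temp file next to the artifact.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Sorting at the write boundary means callers can append edges in any order; readers see either the previous file or the complete new one, never a partial write.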
TDD plan — red / green / refactor¶
Test helpers preamble¶
```python
# tests/unit/probes/layer_b/test_tree_sitter_import_graph.py
from __future__ import annotations

import ast, asyncio, json, os, threading
from pathlib import Path

import pytest

from codegenie.grammars.lock import GrammarLoadRefused  # AC-2/AC-3: kernel exception, NOT a probe-local class
from codegenie.probes.base import RepoSnapshot, ProbeContext
from codegenie.probes.layer_b.tree_sitter_import_graph import (
    TreeSitterImportGraphProbe,
    Edge,
    ImportGraphArtifact,
    _extract_imports,  # pure helper (AC-PURE)
    _get_language,  # process-memo helper (AC-2)
)


@pytest.fixture(autouse=True)
def _clear_language_cache():
    """``_get_language`` is ``lru_cache``-decorated. Tests that mutate the
    grammar lock (T-03/T-04/T-10) must run against a cold cache."""
    _get_language.cache_clear()
    yield
    _get_language.cache_clear()
```
RED¶
- T-01 test_probe_contract_attributes (AC-1): asserts class attributes and the two-arg run signature (inspect.signature(TreeSitterImportGraphProbe.run).parameters keys are {"self", "repo", "ctx"}).
- T-02 test_grammar_kernel_load_happy_path (AC-2): calls load_and_verify(_REPO_ROOT) directly (NOT through the probe) and asserts the returned GrammarLockFile has both TypeScript and JavaScript entries with vendored binaries that pass BLAKE3.
- T-03 test_grammar_kernel_load_mismatch_propagates (AC-3): monkeypatches tools/grammars/typescript.so content to a tampered byte string in a tempdir copy of tools/; calls load_and_verify(tempdir) (kernel surface, NOT a re-implemented probe helper) and asserts GrammarLoadRefused is raised with the language name embedded in the message.
- T-04 test_grammar_pin_mismatch_grammar_code_does_not_execute (AC-3): the stronger variant. Spy via monkeypatch.setattr("tree_sitter.Language", Mock(side_effect=AssertionError("must not call"))); tamper the lock content; run the probe end-to-end via asyncio.run(probe.run(repo, ctx)); assert no AssertionError, slice has confidence="low", errors==["tree_sitter.grammar_pin_mismatch"].
- T-05 test_no_parallelism_imports (AC-4): parses the probe module via ast.parse(Path(...).read_text()); for every ast.Import / ast.ImportFrom, asserts no module name in {"threading", "concurrent", "concurrent.futures", "multiprocessing", "multiprocessing.pool", "asyncio.subprocess"}. Aliased imports are caught by inspecting node.name, not the alias.
- T-06 test_no_threads_created_during_run (AC-4; load-bearing): captures threads_before = {t.ident for t in threading.enumerate()} before asyncio.run(probe.run(repo, ctx)); captures threads_after after; asserts (threads_after - threads_before) - {t.ident for t in threading.enumerate() if "tree_sitter" in t.name.lower()} == set(). The tree_sitter-name filter is explicit in the test source with a TODO comment; it documents the upstream-library exemption rather than masking it silently.
- T-07 test_no_forbidden_coordination_primitives (AC-4): AST-walks every ast.Call in the probe module; asserts no call resolves to asyncio.gather, asyncio.wait, asyncio.as_completed, asyncio.create_task, asyncio.to_thread, loop.run_in_executor, loop.create_task, or functools.partial(asyncio.gather, ...). The only admissible asyncio.* call is asyncio.wait_for (also asserted positively: exactly one call site, inside run).
- T-no-direct-lockfile-IO (AC-2): AST-walks the probe module; asserts no Path("tools/grammars.lock")-shaped string literal, no open(...) with "grammars.lock" substring, no import blake3, no from blake3 import .... The kernel owns these.
- T-resolution test_grammars_resolved_from_codegenie_repo_root (AC-Resolution): builds a fixture-mode analyzed repo at a tempdir, points ctx.workspace/repo.root there; runs the probe; asserts the probe loaded the codewizard-sherpa repo's tools/grammars.lock (verified by inspecting the kernel's structured-log record OR by checking that the probe succeeded; the fixture repo has no tools/ of its own, so any resolution to it would fail loudly).
- T-pure-isolation test_extract_imports_is_pure (AC-PURE): constructs an in-memory source_bytes = b"import x from 'lodash';\n", calls _extract_imports(language, source_bytes, "src/index.ts") directly (no fixture filesystem); asserts [Edge(from_path="src/index.ts", to="lodash")]. A monkeypatch on pathlib.Path.read_bytes raising AssertionError("filesystem touched") asserts no I/O occurred during the pure-helper call.
- T-08 test_per_file_parse_failure_contained (AC-8): fixture with one valid .ts file and one with a deliberate syntax error (function (); asserts parsed_files==1, failed_files==1, slice warnings contains "tree_sitter.file_parse_failed", probe returns without raising. Uses tree.root_node.has_error semantics, verified by ensuring the helper does NOT catch a Python exception, only checks the boolean.
- T-LARGE test_file_too_large_skipped (AC-LARGE): fixture with a 5-MiB .ts file containing real syntax; asserts failed_files==1, slice warnings contains "tree_sitter.file_too_large", AND tree-sitter is not called on the file (spy via monkeypatching tree_sitter.Parser.parse).
- T-09 test_no_files_to_parse_is_low_confidence (AC-9): empty repo (no .ts/.tsx/.js/.jsx); assert confidence="low", warnings == ["tree_sitter.no_files_to_parse"], AND import-graph.json is not on disk, AND import_graph_uri is absent from the slice.
- T-10 test_grammar_load_refused_full_slice (AC-10): monkeypatch.setattr("codegenie.probes.layer_b.tree_sitter_import_graph.load_and_verify", Mock(side_effect=GrammarLoadRefused(language="typescript", expected_blake3="abc", actual_blake3="def"))); assert slice fields per AC-10; assert import-graph.json does NOT exist on disk; assert grammar_versions and import_graph_uri are omitted from the slice.
- T-11 test_forward_only_adjacency_shape (AC-5/AC-6): fixture with src/a.ts importing lodash and ./utils, src/b.ts importing react; run probe; load import-graph.json; assert exact sorted shape [{"from":"src/a.ts","to":"./utils"},{"from":"src/a.ts","to":"lodash"},{"from":"src/b.ts","to":"react"}] (lex-sorted by (from, to)).
- T-12 test_slice_summary_fields (AC-7): exercise three runs (clean, partial-failure, mismatch); for each, assert every field in the slice matches the AC-7 rubric, including the confidence discrete-rule mapping.
- T-13 test_timeout_contained_partial_graph_written_atomically (AC-12): monkeypatch _read_and_extract so that the third file's call awaits an unsignalled future (the implementer chooses how, likely via a fake Language that blocks); timeout_seconds=1; assert asyncio.TimeoutError does NOT propagate; confidence="low", warnings contains "tree_sitter.timeout"; assert import-graph.json exists with the first two files' sorted edges; assert import-graph.json.tmp does NOT exist (atomic-write discipline).
- T-prop-idempotent test_two_runs_produce_byte_identical_artifact (AC-DET; Hypothesis): generate a list of synthetic TypeScript files (hypothesis.strategies.lists(...) of import statements); run the probe twice (cold cache between runs via _get_language.cache_clear()); assert Path("...import-graph.json").read_bytes() is byte-identical between runs.
- T-14 test_warning_error_ids_match_adr_0007 (AC-11): imports _WARNING_IDS, _ERROR_IDS, _ID_PATTERN from the module; asserts every ID matches the regex.
- T-15 test_registry_membership_heaviness_medium (AC-13): asserts the probe is in default_registry.all_probes() with heaviness="medium", runs_last=False; asserts for_task("*", frozenset({"typescript"})) and for_task("*", frozenset({"javascript"})) include it; asserts for_task("*", frozenset({"python"})) excludes it.
- T-16 test_pyproject_lists_tree_sitter_in_project_dependencies (AC-14): parses pyproject.toml via tomllib; asserts tree-sitter (or the pinned name) appears exactly once in project.dependencies; asserts it does NOT appear in project.optional-dependencies.gather (which must remain empty per Phase 0 ADR-0006).
- T-MYPY test_pyproject_has_tree_sitter_mypy_override (AC-MYPY): parses pyproject.toml; asserts a tool.mypy.overrides entry exists with module including "tree_sitter" and ignore_missing_imports = true.
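The import check described for T-05 could look roughly like the sketch below. The function names are hypothetical, and the prefix matching (so that a submodule such as concurrent.futures.thread is also flagged) is an assumption that goes slightly beyond the exact-name set listed in T-05.

```python
import ast

FORBIDDEN_MODULES = frozenset({
    "threading", "concurrent", "concurrent.futures",
    "multiprocessing", "multiprocessing.pool", "asyncio.subprocess",
})


def _is_forbidden(module: str) -> bool:
    """True if the module, or any parent package of it, is forbidden."""
    parts = module.split(".")
    return any(
        ".".join(parts[:i]) in FORBIDDEN_MODULES
        for i in range(1, len(parts) + 1)
    )


def forbidden_imports(source: str) -> set[str]:
    """Return the forbidden dotted names imported anywhere in `source`."""
    hits: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # alias.name is the real module; alias.asname is ignored, so
            # "import threading as t" is still caught.
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            base = node.module or ""
            # "from asyncio import subprocess" is checked both as "asyncio"
            # and as "asyncio.subprocess".
            names = [base] + [f"{base}.{a.name}" for a in node.names if base]
        else:
            continue
        hits.update(name for name in names if _is_forbidden(name))
    return hits
```

A test in the spirit of T-05 would then read the probe module's source and assert that forbidden_imports(source) is empty.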
GREEN¶
Implement the module per outline. Source the tree-sitter Query strings from the tree-sitter-typescript and tree-sitter-javascript README query examples (the import-extraction queries are stable across recent grammar versions; bundle them in _TS_IMPORT_QUERY and _JS_IMPORT_QUERY module constants).
REFACTOR¶
- Confirm _enumerate_indexable_files is the Phase 1 shared helper (Rule 11). If S4-03 has not yet extracted a JS/TS variant, this probe's .ts/.tsx/.js/.jsx extension filter lives at the call site; do not duplicate enumeration policy.
- Verify the structured-log event for run() emits parsed_files, failed_files, total_edges, grammar_versions for ops observability, and that on GrammarLoadRefused it logs the language + expected/actual BLAKE3 (sourced from the kernel's exception attributes).
- Confirm mypy --strict passes via the AC-MYPY override block.
Files to touch¶
Create:
- src/codegenie/probes/layer_b/tree_sitter_import_graph.py
- tests/unit/probes/layer_b/test_tree_sitter_import_graph.py
- tests/fixtures/portfolio/minimal-ts/ — small fixture for T-resolution, T-11, T-prop-idempotent. (If a sibling fixture already exists in S4-03's fixture set, prefer reusing it — Rule 11.)
Edit (additive):
- src/codegenie/probes/__init__.py — additive import (side-effect-registers the probe).
- pyproject.toml:
    - [project.dependencies] adds tree-sitter ~= 0.21 (NOT [project.optional-dependencies] gather).
    - [[tool.mypy.overrides]] block for tree_sitter.* (AC-MYPY).
Pre-existing (from S4-03 — imported, not re-implemented):
- src/codegenie/grammars/lock.py — load_and_verify, GrammarLockFile, GrammarLoadRefused. The probe IMPORTS this kernel.
- tools/grammars.lock, tools/grammars/typescript.so, tools/grammars/javascript.so — read by the kernel, NOT by the probe.
Out of scope¶
- Reverse adjacency / ImportGraphAdapter. Phase 3 plugin owns reverse lookups. This story emits forward-only.
- Symbol-level resolution. SCIP (S4-03) is the symbol-level layer; tree-sitter is statement-level.
- Dynamic import("./" + name) resolution. Forward-only emission records the literal specifier as-emitted in source; if the source has import(specifier) where specifier is a variable, the probe emits a to: "<dynamic>" placeholder OR omits the edge (implementer choice; recommend omit, and let NodeReflectionProbe (out of scope for this probe; it's S5-/B3 territory if a Phase-3 reflection probe is needed) handle dynamic patterns).
- Other languages. TypeScript + JavaScript are Phase-2-required. Python / Go / Java grammars are Phase-8+ ADR-amendments to ADR-0002.
- Out-of-process _grammar_runner: explicitly rejected by ADR-0002 §Decision. Do NOT propose subprocess isolation; the grammar pin is the supply-chain defense.
- Re-verification per file. AC-2 explicitly says the BLAKE3 is verified at process startup, not per file. A future contributor proposing "re-verify per file for safety" is to be redirected here: the trust boundary is the process.
Notes for the implementer¶
- The kernel is the source of truth (AC-2). S4-03's AC-20 deliberately built codegenie.grammars.lock as the chokepoint for both stories. A future contributor proposing "let's read tools/grammars.lock directly because it's a small file" is to be redirected here: the kernel owns BLAKE3 verification, the typed GrammarLockFile, and GrammarLoadRefused. Duplicating any of these silently forks the supply-chain defense. T-no-direct-lockfile-IO is the structural test that catches the regression.
- The thread-count test (T-06) is the load-bearing one. The manifest's risk callout for this story: "verify by enumerating thread count, not just by absence of threading import." A future contributor could (a) import asyncio (admissible), (b) use loop.run_in_executor(None, ...) or asyncio.to_thread(...) to spawn threads from the default executor, (c) violate the discipline without importing threading directly. T-06 catches (b): threading.enumerate() sees the executor's threads. T-07 catches the AST form of the same violation. The combination of T-05 (forbidden imports) + T-06 (runtime thread-count) + T-07 (forbidden call symbols) is the load-bearing triple.
- Process-memoization is on _get_language, not on load_and_verify. The kernel may itself memoize, but the probe's side is a functools.lru_cache(maxsize=4) on _get_language(lock, language). Test code MUST call _get_language.cache_clear() between mutation tests of the lock file (see the test preamble's autouse=True fixture). Per-file re-verification would be the kind of "defensive over-engineering" Rule 2 forbids; the trust boundary is process startup AND the kernel call.
- _REPO_ROOT resolution (AC-Resolution). Path(__file__).resolve().parents[N]: the N depends on file location. From src/codegenie/probes/layer_b/tree_sitter_import_graph.py: parents[0] is layer_b/, parents[1] is probes/, parents[2] is codegenie/, parents[3] is src/, parents[4] is the repo root. Verify empirically; pin in a module constant with a doc-comment explaining the count.
- The tree-sitter API. Modern tree-sitter (≥ 0.21 on PyPI) uses tree_sitter.Language(path, name) for loading and parser.language = language then parser.parse(source_bytes). The query API: language.query(query_string).captures(tree.root_node) returns a list of (node, capture_name) pairs. Pin to a modern minor version (~= 0.21) so the import idiom is stable.
- Tree-sitter Query language. The probe uses tree-sitter Queries (S-expression syntax) to match import patterns. For TypeScript: (import_statement source: (string) @specifier), (export_statement source: (string) @specifier), side-effect imports (import_statement source: (string) @specifier). For CommonJS require: (call_expression function: (identifier) @func arguments: (arguments (string) @specifier) (#eq? @func "require")). Bundle the queries as module constants and document them inline; Phase 2 only needs ~6 query patterns, so a vendored .scm query file is premature.
- Edges sort + atomic write (AC-DET) are not optional. Phase 3's ImportGraphAdapter will be tested by checksumming import-graph.json; if the order varies across runs, every Phase 3 cache invalidates spuriously. Sort + sort_keys=True + atomic-replace is the minimal cost; do it once at the write boundary.
- The [project.dependencies] placement (AC-14) is non-negotiable. Phase 0 ADR-0006 §Decision: gather = [] is intentionally empty. The fence ADR-0002 reads [project.dependencies]. Adding tree-sitter to the gather extras would silently exclude it from the fence's LLM-SDK check (the check uses set difference; the SDK list is the blocklist, not the allowlist, so the omission would be silent but the dependency would still install via pip install -e .[gather]). Match the repo convention; the slot exists for documentation, not for runtime separation.
- The "loudness is a feature" framing. ADR-0002 §Tradeoffs: "A crashed grammar crashes the gather process; Phase 0 failure isolation contains it to one probe via asyncio.wait_for, and the loudness is a feature." A grammar binary CVE or corruption is a real risk; the response is a CI failure (loud), not a silent skip. The asyncio.wait_for containment is what makes this safe: the worst case is the gather drops this one probe's output and continues.
- Functional core / imperative shell (AC-PURE) earns its keep here. _extract_imports will eventually need to handle: dynamic imports, TSX/JSX-specific syntax, type-only imports, re-exports. Every one of those is a parser-side concern that can be unit-tested against in-memory byte strings. Keeping the I/O shell thin (one function, one filesystem touch) means the parser tests need zero fixtures and zero monkeypatching; they're literally assert _extract_imports(lang, source, "x.ts") == [...].
- Rule 9: tests verify intent. T-04 (grammar code does not execute on pin mismatch) encodes the WHY of the pin (supply-chain defense). T-06 (no threads created) encodes the WHY of the no-internal-pool rule (honesty to coordinator). T-pure-isolation encodes the WHY of functional core (testability + extensibility). T-prop-idempotent encodes the WHY of deterministic JSON (Phase 3 cache stability). T-13 (timeout writes partial graph atomically) encodes the WHY of "partial-graph-is-better-than-no-graph", distinct from S4-03 where partial blobs are deleted, because sorted-then-atomically-written JSON degrades gracefully and .scip doesn't. Every test name and assertion message must point at WHICH discipline is being defended.
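The runtime thread check behind T-06 can be illustrated with a small, self-contained sketch. All names here are hypothetical stand-ins rather than the probe's code. One detail worth verifying against the real test: on Python 3.9+, asyncio.run shuts down the loop's default executor before it returns, so this sketch samples threading.enumerate() while the loop is still running instead of after asyncio.run has returned.

```python
import asyncio
import threading
from collections.abc import Awaitable, Callable


def _blocking_work() -> None:
    """Stand-in for any synchronous call handed to a worker thread."""


async def offloads_to_thread() -> None:
    # asyncio.to_thread submits the call to the loop's default
    # ThreadPoolExecutor, which starts a worker thread.
    await asyncio.to_thread(_blocking_work)


async def stays_on_loop() -> None:
    # Yields to the event loop once; no executor is involved.
    await asyncio.sleep(0)


def threads_spawned_by(coro_factory: Callable[[], Awaitable[None]]) -> set[int]:
    """Return idents of threads that appeared while the coroutine ran."""
    before = {t.ident for t in threading.enumerate() if t.ident is not None}
    during: set[int] = set()

    async def _runner() -> None:
        await coro_factory()
        # Sample inside the running loop: idle executor workers are still
        # alive here, but are joined once asyncio.run tears the loop down.
        during.update(
            t.ident for t in threading.enumerate() if t.ident is not None
        )

    asyncio.run(_runner())
    return during - before
```

In this sketch, threads_spawned_by(offloads_to_thread) is non-empty while threads_spawned_by(stays_on_loop) is empty, which is the distinction the load-bearing triple relies on at runtime.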