S4-03 — Attempt log¶
Attempt 1 — 2026-05-16 — SUCCESS¶
Outcome¶
All 20 ACs pass with runtime evidence. 40 tests added across 4 files; full
unit suite 2097 passed / 5 skipped / 1 xfail. Pre-existing
lint-imports env failure is not a regression (canary tool not installed
in local .venv; CI has it). ruff check, ruff format, mypy --strict,
pre-commit run, shellcheck all green.
What landed¶
Source (new):
- src/codegenie/exec/__init__.py — promoted from the monolithic
src/codegenie/exec.py module so codegenie.exec.tool_versions is a
valid import path. All existing imports
(from codegenie.exec import …, from codegenie import exec as _exec,
import codegenie.exec) continue to work — package import resolution is
identical from the caller's view.
- src/codegenie/exec/tool_versions.py (AC-19) — process-wide memo of
resolved CLI versions, keyed by (binary, argv-tuple). Exports
resolve_tool_version (async), resolve_tool_version_sync (sync
wrapper for @property use, degrades to "unknown" when called from
inside a running loop with cold cache), clear_for_tests. Tool-missing
is memoized as "unknown" rather than re-raised.
- src/codegenie/grammars/{__init__,lock}.py (AC-20) — typed loader for
tools/grammars.lock. Pydantic frozen=True, extra="forbid";
load_and_verify recomputes BLAKE3 over every vendored binary and
raises GrammarLoadRefused on mismatch with a structured message
naming the failing language.
- src/codegenie/probes/layer_b/scip_slice.py (AC-18) — SemanticIndexSlice
Pydantic smart constructor. Lifted into its own module so S4-07's
sub-schema generator and Phase-3's ScipAdapter can both import
without a circular dep on the probe.
- src/codegenie/probes/layer_b/scip_index.py (AC-1..AC-17) — the probe
itself. @register_probe(heaviness="heavy"); version is a
@property rolling in resolve_tool_version_sync("scip-typescript")
so the cache-key tuple at cache/keys.py:146 reflects tool upgrades
with zero new mechanism (AC-2). Single run_external_cli call site
(AC-3); pure helpers _build_scip_argv, _walk_indexable_files,
_count_indexable_files, _compute_indexable_merkle, _parse_summary_json.
Every failure path (timeout / non-zero exit / tool-missing /
raw-dir unwritable) writes the scip.json sidecar so B2 reads the
correct typed Stale(IndexerError(...)) and not the wrong
upstream_scip_unavailable signal.
Source (edited additive):
- src/codegenie/probes/__init__.py — explicit import of
codegenie.probes.layer_b.scip_index so registration fires at startup.
Vendored data:
- tools/grammars.lock — schema version 1, two pins
(typescript / javascript).
- tools/grammars/{typescript,javascript}.so — placeholder binaries
(see tools/grammars/README.md; S4-04's TreeSitterImportGraphProbe
is the first runtime consumer and must vendor real grammar binaries
before it lands green).
- tools/regenerate_grammars_lock.sh — idempotent BLAKE3 recompute;
refuses (exit 1) if any pinned binary is missing.
- .gitattributes — marks tools/grammars/*.so and *.dylib as
binary so git does not corrupt them.
Tests (40 added):
- tests/unit/probes/layer_b/test_scip_index.py (20) — covers AC-1..AC-18
including the cross-story B2 hand-off (T-07/T-10/T-19 feed scip.json
through S4-01's published scip_freshness via the freshness registry).
- tests/unit/exec/test_tool_versions.py (6) — T-20: single-subprocess
invariant, tool-missing memoization, sync wrapper paths.
- tests/unit/grammars/test_lock.py (9) — T-21: happy + mismatch +
schema-violation + missing-file + bad-blake3-shape.
- tests/unit/tools/test_grammars_lock.py (5) — T-14/15/16/17/18:
on-disk lock + regen-script idempotency + missing-binary refusal +
.gitattributes policy.
Tests (edited):
- tests/unit/exec/test_run_external_cli.py::test_only_exec_module_calls_create_subprocess_exec
— adjusted the exemption check from "filename equals exec.py" to
"path is inside the codegenie/exec/ package" since S4-03 promoted
the module to a package.
Deviations from the story spec¶
-
T-06's
.tscontent-mutation assertion swapped for a.tsset-mutation assertion.content_hash_of_inputsinsrc/codegenie/hashing.pyhashes(path, st_size)tuples — NOT file contents (it's a fingerprint, not a content hash; the same-size-edit-doesn't-invalidate behavior is the documented ADR-0006 §Tradeoffs row 4 finding, pinned by an xfail intests/unit/test_cache_invalidation_scope.py). So the test now adds a new.tsfile (changing the path-set) rather than mutating an existing one to the same size. The tool-version sensitivity arm of T-06 (probe.versionchange → key change) is intact. -
files_in_reposet to0on failure paths. AC-6/AC-7/AC-8 say the failure slice MUST flow through S4-01'sscip_freshnessto produceStale(IndexerError(message="indexer_reported_1_errors")). The check evaluates coverage BEFORE indexer errors (lines 190-196 ofindex_health.py), so a failure slice with(files_indexed=0, files_in_repo=<actual>)surfaces asStale(CoverageGap(...))— masking the true "indexer ran but failed" signal. Settingfiles_in_repo=0on failure paths makes the coverage check pass (0 < 0is false) so the indexer-errors check fires. Documented in code at the deletion site. -
Placeholder grammar binaries. Real tree-sitter grammar binaries require either downloading from upstream releases or local
tree-sitter generate && build. Both depend on network/toolchain not available in this autonomous scheduled-task environment. Thetools/grammars/README.mddocuments the vendoring protocol and names S4-04 as the first runtime consumer that MUST replace these placeholders. The BLAKE3 verifier works against the placeholders today AND against real grammars when they land — structural contract unchanged. AC-12 is satisfied (binaries exist on disk with matching BLAKE3); runtime grammar loading is S4-04's concern. -
codegenie.execpromoted from module → package. The story's "Files to touch" line explicitly left module-vs-package to implementer choice; the package form is the only waycodegenie.exec.tool_versionscan be a sibling import without a separate top-level module. All existing imports continue to work; the regression-tracked "only exec.py spawns subprocesses" test was adapted to "inside the exec package is exempt".
Refactor decisions (DP1-DP4 lens)¶
- DP1 — Open/Closed at the registry boundary. The probe registers
via
@register_probe(heaviness="heavy"); adding the next Layer-B probe (S4-04'sTreeSitterImportGraphProbe) requires zero edits toscip_index.py. Thetool_versionskernel andgrammars.lockloader are positioned for that consumer. - DP2 — Functional core / imperative shell. Pure helpers
(
_build_scip_argv,_walk_indexable_files,_count_indexable_files,_compute_indexable_merkle,_parse_summary_json) carry the deterministic logic; therun()method is the imperative shell that composes them with therun_external_cliport + filesystem writes. - DP3 — Smart constructor at the writer boundary.
SemanticIndexSliceis the single source of truth for the slice shape — both the envelope andscip.jsonderive fromslice.model_dump(mode="json", exclude_none=True). A renamed required field would fail Pydantic validation, not produce a silent mis-key. - DP4 — Kernel extraction at the rule-of-three.
tool_versionsis its own module so S4-04 (tree-sitter) and the Layer-G family (grype/syft/semgrep/gitleaks) can route through the same memo without copy-pasting.grammars.lockis its own module so both this story's tests and S4-04's pre-load BLAKE3 check share the typed surface.tool_versions.resolve_tool_versionreturning"unknown"on tool-missing closes the "probe.version must be safe to read on a machine without the tool" invariant.
Follow-ups surfaced this attempt¶
- S4-04 vendoring. The first commit that lands the real
TreeSitterImportGraphProbeMUST: (a) replace the placeholder.sofiles intools/grammars/with grammars compiled from the upstream tree-sitter-typescript / tree-sitter-javascript releases at the tag pinned intools/grammars.lock, (b) re-runtools/regenerate_grammars_lock.shto update the BLAKE3 fields, (c) include the upstream release URL + locally-computed BLAKE3 in the PR description. - CLI cold-start consideration.
ScipIndexProbe.versionreadsresolve_tool_version_sync("scip-typescript"). Ifcache_key-derivation paths read this synchronously from a cold cache, the first read fires a 5-second-budget subprocess. Today's call sites readprobe.versionfrom a sync path (cache-key derivation) — works. If a future call site reads it from inside a running asyncio loop with cold cache, the sync wrapper degrades to"unknown"(documented in its docstring); priming viaawait resolve_tool_version(...)at startup is the fix when that happens.