Story S7-02 — Fixtures batch 2: monorepo-pnpm + load-bearing stale-scip full materialization

Step: Step 7 — Plant five-repo fixture portfolio + per-probe golden files + remaining adversarial corpus

Status: Done — GREEN 2026-05-18 (phase-story-executor; see _attempts/S7-02.md for the per-AC evidence table + gate log)

Effort: M

Depends on: S7-01 HARDENED (fixtures batch 1 — patterns + shape-test conventions + the shared tests/unit/_fixture_regen_allowlist.py module transfer wholesale; the "hand-author the lockfile; do NOT run pnpm install at regen" pattern is the explicit S7-01 precedent for monorepo-pnpm), S4-02 (stale-scip STUB at tests/fixtures/portfolio/stale-scip/ + test_stale_scip_fixture.py CI-gating adversarial — this story is the FULL materialization of the _seed/scip-index.scip.placeholder binary, NOT a wholesale stub replacement).

ADRs honored: ADR-0001 (allowlisted binaries — regenerate.sh for both fixtures invokes only binaries in ALLOWED_BINARIES ∪ _SHELL_COREUTILS_ALLOWLIST; scip-typescript is in the allowlist but is invoked ONLY out-of-band by the contributor producing the seed binary, NOT inside regenerate.sh; pnpm/npm/node-gyp are NOT allowlisted), ADR-0006 (IndexFreshness location — CommitsBehind is the structural assertion the fixture's adversarial test reads), ADR-0007 (no plugin loader — neither fixture seeds plugins/), ADR-0009 (pytest-xdist veto — closed-set fixture trees, regen-script-only mutation surface).

Validation notes (2026-05-18)

Hardened by phase-story-validator (scheduled task: story-validation-corrector). Verdict: HARDENED.

Summary of changes (full audit log in _validation/S7-02-fixtures-batch-two.md):

  • Block-tier — pnpm is NOT in ALLOWED_BINARIES (verified at src/codegenie/exec/__init__.py:96-111; the closed Phase-2 set is {git, node, semgrep, syft, grype, gitleaks, scip-typescript, ast-grep, ripgrep, tree-sitter, docker, strace}). Original AC-10 + Implementation Outline §1 said monorepo-pnpm's regenerate.sh runs pnpm install --frozen-lockfile; that would either fail S7-01's AC-31 static check or force a silent ADR-0001 expansion. Fix: AC-10 rewritten to mirror S7-01 native-modules' HARDENED precedent — hand-authored pnpm-lock.yaml bytes committed to the fixture; regenerate.sh does NOT invoke pnpm install (regen is mkdir/coreutils-only). Implementation Outline §1 rewritten accordingly.
  • Block-tier — bash can't call run_allowlisted. Original AC-21 step (b) said regenerate.sh "runs scip-typescript (via run_allowlisted)". run_allowlisted is a Python function in src/codegenie/exec/__init__.py; bash cannot call it. Same architectural mismatch S7-01 AC-22 had. Fix: AC-21 split into AC-21a ("seed-build ritual" — one-time, contributor's local box, invokes scip-typescript directly to produce _seed/scip-index.scip; NOT inside regenerate.sh) and AC-21b ("regen-runtime" — deterministic, just commits + template-materialize + seed-copy; no scip-typescript invocation at regen time).
  • Block-tier — story prescribed last-indexed-commit.txt mechanism that contradicts the actual S4-02 stub. Original AC-19, AC-20, and content predicates (_last_indexed_not_equal_to_current_head, _regen_refuses_current_head) all referenced a last-indexed-commit.txt file. The actual S4-02 stub at tests/fixtures/portfolio/stale-scip/ uses a _seed/scip-slice.template.json template with a PARENT_COMMIT placeholder substituted at regen time into .codegenie/context/raw/scip.json. AC-18's "S4-02's stub already chose one path, this story honors it" made the contradiction internal. Fix: AC-18 rewritten to explicitly name the actual mechanism; AC-19 rewritten to describe the seed-template; AC-20 rewritten to point at the existing regenerate.sh guard (LAST_INDEXED="${LAST_INDEXED:-$(git rev-parse HEAD~1)}" + the [[ "$LAST_INDEXED" == "$(git rev-parse HEAD)" ]]; exit 1 block); content predicates rewritten accordingly.
  • Block-tier — AC-17 committed .codegenie/context/raw/scip-index.scip conflicts with the existing stub. The actual S4-02 stub gitignores .codegenie/ (defense-in-depth) AND S7-01's central no-committed-cache guard asserts no .codegenie/cache/ under any portfolio fixture. The story's .gitignore carve-out would either weaken the central guard or contradict the stub's pattern. Fix: AC-17 rewritten — the real binary SCIP lives in _seed/scip-index.scip (committed; replaces _seed/scip-index.scip.placeholder); .codegenie/ stays gitignored; regenerate.sh copies the seed blob to .codegenie/context/raw/scip-index.scip at runtime. AC-29 + Implementation Outline §4 amended — the existing S7-01 central no-cache-committed guard passes unchanged; no edit needed.
  • Block-tier — AC-15 "wholesale replacement of the S4-02 stub" destroys the working seed-template mechanism. The adversarial test reads .codegenie/context/raw/scip.json (materialized from _seed/scip-slice.template.json), NOT the binary .scip blob. Wholesale replacement would break this. Fix: AC-15 rewritten — replacement is restricted to the _seed/scip-index.scip.placeholder empty blob (substituted with a real scip-typescript-built blob) plus an expanded src/ tree; the seed-template + regenerate.sh mechanism is preserved.
  • Harden-tier — adversarial-still-passes framing. Original AC-32/AC-33 implied "full materialization makes the adversarial assertion non-trivially true" — but the adversarial has been passing since S4-02 against the seed-templated scip.json (the template carries PARENT_COMMIT, NOT current HEAD, by construction). S7-02's actual contribution is the real binary SCIP for S4-03's future ScipIndexProbe, not changing what S4-02's adversarial asserts. Fix: AC-32/AC-33 reworded; footnote acknowledges the widened outer-key set ({"scip", "runtime_trace"} after S5-05; future S6-08 registrations may widen it further) so a future contributor doesn't think a materialization PR broke an unrelated invariant.
  • Harden-tier — _ProbeName Literal subset semantics inherited from S7-01. Original AC-26 said "runtime-equals the documented one" — runtime-equality. S7-01's HARDENED AC-37 uses subset semantics (set(registered_phase_2_names) ⊆ set(get_args(_ProbeName))) so Phase-3+ probes added later don't retroactively break Phase-2 fixtures. Fix: AC-26 rewritten with subset semantics, matching S7-01.
  • Harden-tier — kernel location across the fixture-namespace boundary. Original AC-23 placed the kernel at tests/fixtures/portfolio/_shape_test_kernel.py. AC-25 says Phase 1's test_fixture_node_typescript_helm_shape.py migrates to consume the kernel; Phase 1's fixture lives at tests/fixtures/node_typescript_helm/ (NOT under portfolio/). Importing a portfolio/-namespaced kernel from outside the portfolio subdirectory is awkward. Fix: kernel relocated to tests/fixtures/_shape_test_kernel.py (above the portfolio/ subdirectory) so all six consumers import cleanly.
  • Harden-tier — kernel __all__ runtime check. AC-23 required mypy --strict, but a silent removal of enumerate_tracked or one of the make_* factories would still pass mypy if consumers were updated concurrently. Fix: AC-23 amended — kernel exposes a documented __all__ set; a test at tests/unit/test_shape_test_kernel.py asserts the export set matches the documented contract.
  • Harden-tier — _fixture_regen_allowlist.py consumer tests for the two new fixtures. S7-01 lifted the shared module to tests/unit/_fixture_regen_allowlist.py; the two new fixtures need their own consumer tests (tests/unit/test_fixture_monorepo_pnpm_regenerate_allowlist.py + tests/unit/test_fixture_stale_scip_regenerate_allowlist.py). Fix: AC-31 amended to reference the shared module; "Files to touch" extended.
  • Harden-tier — TDD-plan _FILE_SPECS for stale-scip matched the wrong mechanism. Original _FILE_SPECS listed last-indexed-commit.txt and .codegenie/context/raw/scip-index.scip as committed entries. Fix: _FILE_SPECS rewritten to list the actual committed files (_seed/scip-slice.template.json, _seed/scip-index.scip (committed real binary), regenerate.sh, README.md, the source tree); content predicates rewritten for the corrected mechanism (_last_indexed_not_equal_to_current_head reads regenerate.sh for the LAST_INDEXED="${LAST_INDEXED:-$(git rev-parse HEAD~1)}" line; _regen_refuses_current_head greps the guard; _scip_blob_metadata_records_prior_commit reads _seed/scip-index.scip non-emptiness).
  • Harden-tier — seed-template counters drift. Story expands the stale-scip source tree to ≤ 50 .ts files; the existing _seed/scip-slice.template.json has "files_indexed": 1, "files_in_repo": 1. If the source tree grows without updating these counters, B2 may surface CoverageGap instead of CommitsBehind. Fix: new predicate _seed_template_counters_match_source_tree asserts the seed template's counts equal the count of *.ts files under src/.
  • Design-pattern — kernel factory pattern. make_*_test returning pytest functions for module-level assignment is awkward for pytest's natural module-level @pytest.mark.parametrize discovery. Flatter alternative: kernel exposes pure helper functions (assert_file_exists(fixture, spec), assert_file_parses(fixture, spec), etc.); consumers write minimal @pytest.mark.parametrize test bodies. Documented as Notes-for-implementer (not promoted to AC — pattern advice is contextual; the consumer's choice). The functional-core / imperative-shell shape of the kernel makes this the more natural fit.
  • Design-pattern — enumerate_tracked as the kernel's port for git ls-files. The kernel's port-and-adapter discipline: git ls-files <fixture-path> is invoked from exactly one place in the kernel; consumers receive a tuple[str, ...] of relpaths. Documented as Notes-for-implementer.
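
The subset-versus-equality distinction in the `_ProbeName` item above can be illustrated with a minimal sketch. The probe names in the Literal and the `phase_2_registered` set are hypothetical stand-ins chosen for illustration; the real Literal lives in the kernel and the real names come from the probe registry.

```python
from typing import Literal, get_args

# Hypothetical probe names, for illustration only; the real Literal lives in the kernel.
_ProbeName = Literal[
    "dep_graph",
    "index_health",
    "tree_sitter_import_graph",
    "later_phase_probe",
]


def registered_names_are_covered(registered: set[str]) -> bool:
    """Subset semantics: every registered probe name must appear in the Literal."""
    return registered <= set(get_args(_ProbeName))


# Stand-in for the names a Phase-2 registry would report (assumed values).
phase_2_registered = {"dep_graph", "index_health", "tree_sitter_import_graph"}

covered = registered_names_are_covered(phase_2_registered)
# Equality would fail here, because the Literal already lists a later-phase name.
equal = phase_2_registered == set(get_args(_ProbeName))
# A renamed probe that the Literal does not list is still caught.
renamed_caught = not registered_names_are_covered({"dep_graph", "index_health_v2"})
```

Under these assumptions the subset check passes while an equality check would not, and a renamed probe still fails the subset check, which is the behavior the hardened AC-26 asks for.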

Full audit log: _validation/S7-02-fixtures-batch-two.md.

Context

This story lands the remaining two of the five fixture repos:

  1. monorepo-pnpm/ — exercises DepGraphProbe cross-package edges via a real pnpm workspace. Three packages (packages/lib-a/, packages/lib-b/, packages/app/) with app depending on both libs, lib-b depending on lib-a. The dep_graph slice for this fixture contains real inter-package edges; tree_sitter_import_graph records the import adjacency between the workspace packages.
  2. stale-scip/ is the load-bearing roadmap exit-criterion fixture. Pre-populated SCIP index from a prior commit; HEAD has moved since; IndexHealthProbe (S4-01) must catch the staleness in CI (test_stale_scip_fixture.py from S4-02). S4-02 landed a STUB directory + minimal SCIP blob + README.md policy so the adversarial test could run during Step 4; this story produces the full materialization: populated .ts files, a real SCIP index built from a prior commit, two committed commits documented in the fixture so the staleness path is real.

The synthesis ledger pins three Step-7 implementation risks to this story:

  • Risk #3 (stale-scip regeneration silently breaks the load-bearing exit). A future contributor regenerates the SCIP fixture against current HEAD; the test still passes (because CommitsBehind.n >= 0 is trivially satisfied) but no longer exercises staleness. Defense: regenerate.sh for stale-scip MUST error out if invoked against current HEAD; README.md documents the structural assertion (CommitsBehind.n >= 1 and last_indexed != current_HEAD); the S4-02 adversarial asserts both inequalities — but the fixture's regenerate.sh is the front-line guard.
  • Risk #5 (golden-file non-determinism). Inherited from S7-01; this story compounds it because monorepo-pnpm's pnpm install against the public registry may produce slightly different lockfile bytes across runs. The discipline: pin the lockfile bytes at fixture creation time, never re-run pnpm install in regenerate.sh (the lockfile is committed; the regen script asserts it has not drifted).
  • Risk #8 (Phase 3 protocol drift). monorepo-pnpm is one of the two fixtures Phase 3's first plugin author will use as a target (per "Next-phase integration" table in phase-arch-design.md). The dep-graph evidence this fixture produces is what Phase 3's DepGraphAdapter will consume; the fixture's shape is part of the Protocol contract. Document this in the fixture's README.md so Phase 3's author sees the explicit handoff.

This story is also the natural landing point for the shared _shape_test_kernel.py the Rule-of-Three guard in S7-01 deferred. With six fixtures (Phase 1's node_typescript_helm/ + S7-01's three + this story's two), the kernel earns its keep.

References — where to look

  • Architecture:
      • ../phase-arch-design.md §"Testing strategy" → "Fixture portfolio": monorepo-pnpm + stale-scip rows.
      • ../phase-arch-design.md §"Component design" #1 (IndexHealthProbe — the stale-scip adversarial consumer).
      • ../phase-arch-design.md §"Component design" #11 (DepGraphProbe, monorepo-pnpm's primary exerciser).
      • ../phase-arch-design.md §"Edge cases" row 11 (stale-scip fixture in CI — deliberate seed; the table row this story implements).
      • ../phase-arch-design.md §"Implementation risks" #3, #5, #8.
  • Phase ADRs: ADR-0006 (IndexFreshness sum type — CommitsBehind variant is the structural assertion), ADR-0007 (no plugin loader — monorepo-pnpm ships zero plugins/).
  • Implementation plan: ../High-level-impl.md §"Step 7": monorepo-pnpm + stale-scip bullets.
  • Source design: ../final-design.md §"Open questions" #7 (stale-scip regeneration policy — this story implements the named documentation discipline).
  • Existing code:
      • tests/adv/phase02/test_stale_scip_fixture.py (S4-02 — the adversarial this story's fixture must satisfy).
      • tests/fixtures/portfolio/stale-scip/README.md (S4-02 stub — this story extends it).
      • tests/fixtures/portfolio/minimal-ts/ + native-modules/ + distroless-target/ (S7-01 — shape conventions transfer).

Goal

Two fixtures exist under tests/fixtures/portfolio/:

  1. monorepo-pnpm/ — pnpm workspace with three packages; root pnpm-workspace.yaml; packages/lib-a/{package.json,src/index.ts}, packages/lib-b/{package.json,src/index.ts} (imports lib-a), packages/app/{package.json,src/index.ts} (imports both); a single root pnpm-lock.yaml resolving all internal + minimal external deps; root Dockerfile, .github/workflows/ci.yml, tsconfig.json at each package level; shape test (tests/unit/test_fixture_monorepo_pnpm_shape.py).
  2. stale-scip/ — full materialization, additive over the existing S4-02 stub at tests/fixtures/portfolio/stale-scip/. The existing stub mechanism is preserved: gitignored .git/ (regenerated by regenerate.sh); gitignored .codegenie/ (regenerated by regenerate.sh); committed seeds under _seed/; regenerate.sh initializes .git/, commits v0 (parent / last_indexed_commit) then v1 (HEAD), materializes _seed/scip-slice.template.json → .codegenie/context/raw/scip.json (substituting PARENT_COMMIT), copies _seed/scip-index.scip → .codegenie/context/raw/scip-index.scip, and refuses to set LAST_INDEXED == HEAD. S7-02's contributions are additive: (a) replace the empty _seed/scip-index.scip.placeholder with a real scip-typescript-built binary blob (produced OUT-OF-BAND by the contributor on their local box; the seed binary is committed); (b) expand the source tree to ≤ 50 .ts files; (c) update _seed/scip-slice.template.json's files_indexed/files_in_repo counters to match the seeded source-tree footprint; (d) extend README.md with the Phase-3 entry-gate handoff note + the seed-build ritual section. .codegenie/ stays gitignored; no .gitignore carve-out for .codegenie/context/raw/scip-index.scip is needed. The S4-02 adversarial test at tests/adv/phase02/test_stale_scip_fixture.py continues to read .codegenie/context/raw/scip.json (materialized from the seed template), and the binary _seed/scip-index.scip is the forward-looking contract surface for S4-03's ScipIndexProbe.
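
The materialization step described for stale-scip/ (placeholder substitution plus the refusal to set LAST_INDEXED == HEAD) can be sketched in Python as follows. This is only an illustration of the logic: the real mechanism is the bash regenerate.sh using sed, and the template body, the `last_indexed_commit` key, and the `materialize_slice` helper are assumptions made for the example.

```python
import json


def materialize_slice(template_text: str, last_indexed: str, head: str) -> str:
    """Substitute the PARENT_COMMIT placeholder, refusing last_indexed == HEAD."""
    if last_indexed == head:
        raise ValueError("refuses to set last_indexed_commit == HEAD")
    return template_text.replace("PARENT_COMMIT", last_indexed)


# Hypothetical template body; the real _seed/scip-slice.template.json may differ.
template = json.dumps(
    {"last_indexed_commit": "PARENT_COMMIT", "files_indexed": 1, "files_in_repo": 1}
)

v0 = "a" * 40  # stand-in SHA for the v0 commit (the indexed parent)
v1 = "b" * 40  # stand-in SHA for the v1 commit (HEAD)

slice_doc = json.loads(materialize_slice(template, last_indexed=v0, head=v1))
```

The committed template only ever holds the placeholder string; the concrete prior-commit SHA exists solely in the materialized, gitignored output.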

The shared _shape_test_kernel.py is extracted to tests/fixtures/_shape_test_kernel.py (above the portfolio/ subdirectory so Phase 1's tests/fixtures/node_typescript_helm/ shape test can import it cleanly) and consumed by all five S7-01/S7-02 portfolio fixtures' shape tests + Phase 1's node_typescript_helm/ shape test (sixth consumer; conclusively past Rule of Three).

Acceptance criteria

monorepo-pnpm/ fixture tree shape

  • [ ] AC-1. tests/fixtures/portfolio/monorepo-pnpm/ directory exists.
  • [ ] AC-2 — pnpm-workspace.yaml declares packages: ["packages/*"]; parses via safe_yaml.load.
  • [ ] AC-3 — package.json at root declares "name": "monorepo-pnpm-fixture", "private": true, "workspaces": ["packages/*"] (redundant with pnpm-workspace.yaml, but pnpm reads either); "devDependencies": {"typescript": "^5.3.0"}; no dependencies. Parses via safe_json.load.
  • [ ] AC-4 — packages/lib-a/package.json declares "name": "@monorepo-pnpm/lib-a", "version": "0.0.1", "main": "src/index.ts", no dependencies. Parses.
  • [ ] AC-5 — packages/lib-a/src/index.ts exports a single function add(a: number, b: number): number.
  • [ ] AC-6 — packages/lib-b/package.json declares "name": "@monorepo-pnpm/lib-b", "version": "0.0.1", "main": "src/index.ts", "dependencies": {"@monorepo-pnpm/lib-a": "workspace:*"} (the load-bearing pnpm workspace-protocol marker DepGraphProbe exercises). Parses.
  • [ ] AC-7 — packages/lib-b/src/index.ts imports from @monorepo-pnpm/lib-a and exports a derived function. The import statement is the load-bearing edge tree_sitter_import_graph records.
  • [ ] AC-8 — packages/app/package.json declares "name": "@monorepo-pnpm/app", "version": "0.0.1", "main": "src/index.ts", "dependencies": {"@monorepo-pnpm/lib-a": "workspace:*", "@monorepo-pnpm/lib-b": "workspace:*", "express": "^4.18.2"}. Parses.
  • [ ] AC-9 — packages/app/src/index.ts imports from both internal packages + express; declares a trivial Express handler. The two internal imports are what dep_graph slice records as cross-package edges.
  • [ ] AC-10 — root pnpm-lock.yaml is committed as hand-authored bytes (S7-01 native-modules HARDENED precedent — pnpm is NOT in ALLOWED_BINARIES per ADR-0001 + S1-06 AC-10; regenerate.sh MUST NOT invoke pnpm install / pnpm install --frozen-lockfile / any pnpm subcommand). Body: lockfileVersion: '6.0' header; resolves all three internal packages via the workspace:* protocol; resolves express and its transitive deps to pinned versions. Parses via safe_yaml.load. Generation path (out-of-band, contributor's local box, one-time per dep-version bump): run pnpm install once in a scratch directory exactly matching the fixture manifest; copy the resulting pnpm-lock.yaml into the fixture; commit. regenerate.sh is mkdir/coreutils-only (per AC-31's static check + tests/unit/_fixture_regen_allowlist.py shared module) — no install commands. Defense-in-depth: a fixture-local .npmrc with ignore-scripts=true ships alongside (mirrors S7-01 native-modules AC-16) so any operator who later runs pnpm install locally doesn't trigger lifecycle scripts.
  • [ ] AC-11 — tsconfig.json at each package level; root tsconfig.json with "references" declaring all three packages (TS project-references shape; exercises tsconfig-walk paths).
  • [ ] AC-12 — root Dockerfile is multi-stage; FROM node:20-slim AS build builds the app; final stage FROM node:20-slim; USER node; CMD ["node", "packages/app/dist/index.js"]. Parses by the Phase-2 Dockerfile probe.
  • [ ] AC-13 — root .github/workflows/ci.yml declares one job build with run: pnpm install --frozen-lockfile && pnpm -r build && pnpm -r test. Parses via safe_yaml.load.
  • [ ] AC-14 — README.md lists every file by relpath, names every probe in consumers, AND explicitly documents (in prose) "Phase 3 entry-gate target — DepGraphAdapter's first plugin will produce cross-package edges from this fixture." This is the Risk-#8 named handoff.
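
To make the expected cross-package edges concrete, the sketch below derives them from the manifests in AC-4, AC-6, and AC-8. It is not how DepGraphProbe works internally (the source does not describe that); `workspace_edges` is a hypothetical helper that only shows which edges the workspace:* markers imply.

```python
from typing import Any

# Manifests abridged from AC-4, AC-6, and AC-8 (only the fields used here).
MANIFESTS: list[dict[str, Any]] = [
    {"name": "@monorepo-pnpm/lib-a"},
    {
        "name": "@monorepo-pnpm/lib-b",
        "dependencies": {"@monorepo-pnpm/lib-a": "workspace:*"},
    },
    {
        "name": "@monorepo-pnpm/app",
        "dependencies": {
            "@monorepo-pnpm/lib-a": "workspace:*",
            "@monorepo-pnpm/lib-b": "workspace:*",
            "express": "^4.18.2",
        },
    },
]


def workspace_edges(manifests: list[dict[str, Any]]) -> set[tuple[str, str]]:
    """Return (dependent, dependency) pairs that use the workspace: protocol."""
    internal = {m["name"] for m in manifests}
    edges: set[tuple[str, str]] = set()
    for manifest in manifests:
        for dep, spec in manifest.get("dependencies", {}).items():
            # External packages such as express are not workspace edges.
            if dep in internal and str(spec).startswith("workspace:"):
                edges.add((manifest["name"], dep))
    return edges


edges = workspace_edges(MANIFESTS)
```

With these manifests the result contains three edges: app to lib-a, app to lib-b, and lib-b to lib-a. express is excluded because it is an external dependency.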

stale-scip/ fixture full materialization

  • [ ] AC-15 — additive materialization, NOT wholesale replacement. The existing S4-02 stub at tests/fixtures/portfolio/stale-scip/ ships: gitignored .git/ (regenerated by regenerate.sh); gitignored .codegenie/ (regenerated by regenerate.sh); committed seeds under _seed/; committed package.json + main.ts + regenerate.sh + README.md + .gitignore + .gitattributes. The seed-template + regenerate-script mechanism is PRESERVED. S7-02's contribution is restricted to: (a) replace the empty _seed/scip-index.scip.placeholder with a real (binary) SCIP blob produced OUT-OF-BAND by scip-typescript against the v0 commit tree (seed-build ritual per AC-21a; the resulting binary is committed at _seed/scip-index.scip); (b) expand the source tree to ≤ 50 .ts files; (c) update _seed/scip-slice.template.json's files_indexed / files_in_repo counters to match the seeded source-tree footprint; (d) extend README.md per AC-22. No wholesale stub replacement. A future contributor must NOT rm -rf the stub directory before applying S7-02's changes.
  • [ ] AC-16 — expanded source tree. src/ contains at least 5 .ts files with real export / import statements (e.g., src/a.ts exports a function; src/b.ts imports a's function + exports its own; chained through src/e.ts). Each file ≤ 30 lines. package.json declares the typescript devDependency at a version compatible with the scip-typescript version pinned in README.md (per AC-22). tsconfig.json is valid JSONC; emits to dist/ (which is gitignored — never built at regen time).
  • [ ] AC-17 — real binary SCIP committed at _seed/scip-index.scip (replaces _seed/scip-index.scip.placeholder). The binary is the output of scip-typescript invoked against the v0 commit tree per AC-21a's seed-build ritual. .codegenie/ STAYS gitignored (fixture-local .gitignore continues to list .codegenie/); the regenerate.sh script copies _seed/scip-index.scip to .codegenie/context/raw/scip-index.scip at runtime. No .gitignore carve-out for .codegenie/ is needed; S7-01's central no-committed-cache guard (tests/unit/test_no_committed_codegenie_cache_under_portfolio_fixtures.py) passes unchanged.
  • [ ] AC-18 — fixture mechanism (preserved from S4-02 stub). The fixture's "two-commits" history lives inside the fixture's OWN micro-git-repo at tests/fixtures/portfolio/stale-scip/.git/, which is gitignored (regenerated by regenerate.sh). Mechanism (codified in the existing regenerate.sh): rm -rf .git .codegenie; git init -q -b main; git add package.json && git commit -m "v0 — seeded last_indexed_commit" (the last_indexed_commit target); capture PARENT_COMMIT=$(git rev-parse HEAD); git add main.ts <other src files> && git commit -m "v1 — HEAD moves forward" (HEAD is now ahead by ≥ 1); materialize .codegenie/context/raw/scip.json from _seed/scip-slice.template.json by substituting PARENT_COMMIT; copy _seed/scip-index.scip to .codegenie/context/raw/scip-index.scip. The LAST_INDEXED and HEAD SHAs are genuinely different by construction. The story honors this mechanism wholesale; it is NOT to be replaced with a last-indexed-commit.txt-based scheme.
  • [ ] AC-19 — last_indexed_commit lives in _seed/scip-slice.template.json as the placeholder string "PARENT_COMMIT", substituted by regenerate.sh at runtime into .codegenie/context/raw/scip.json (gitignored). The S4-02 adversarial reads the materialized scip.json to assert freshness.reason.last_indexed != current_HEAD. There is NO last-indexed-commit.txt file; the prior-commit SHA is not separately persisted on disk because regenerate.sh captures it locally as the PARENT_COMMIT shell variable and substitutes it into the materialized slice. (The committed seed template stores the placeholder; the runtime materialized slice stores the actual prior-commit SHA.)
  • [ ] AC-20 — regenerate.sh's "refuse-against-current-HEAD" guard. The existing stub's regenerate.sh already implements the guard via LAST_INDEXED="${LAST_INDEXED:-$(git rev-parse HEAD~1)}" (defaults last_indexed to HEAD~1; never HEAD) followed by if [[ "$LAST_INDEXED" == "$(git rev-parse HEAD)" ]]; then echo "ERROR: regenerate.sh refuses to set last_indexed_commit == HEAD" >&2; exit 1; fi. The story PRESERVES this guard; it is NOT to be replaced. Verification of the guard's correctness lives in the existing tests/unit/test_stale_scip_regenerate_sh_guard.py (or whatever the S4-02 story landed for it — confirm at land-time; if missing, this story adds it). The test runs regenerate.sh with LAST_INDEXED=$(git rev-parse HEAD) forced via env override and asserts exit code 1 + a stderr message containing "refuses to set last_indexed_commit". Skipped unless CODEGENIE_REGEN_FIXTURES=1.
  • [ ] AC-21a — seed-build ritual (one-time per scip-typescript version bump; OUT-OF-BAND). The contributor producing _seed/scip-index.scip does so on their local box, NOT inside regenerate.sh. Sequence: (1) check out a clean copy of the v0 tree (just package.json) into a scratch directory; (2) invoke scip-typescript against the scratch dir (scip-typescript is in ALLOWED_BINARIES; bash can call it directly — but the seed-build is contributor-side, not regen-side); (3) copy the resulting .scip blob to _seed/scip-index.scip in the fixture; (4) commit. Pin the scip-typescript version in README.md so future contributors regenerate against the same tool version when the binary is updated. regenerate.sh does NOT invoke scip-typescript; the seed-binary is committed bytes (same discipline as pnpm-lock.yaml for monorepo-pnpm — generated once out-of-band, committed, treated as fixture bytes thereafter).
  • [ ] AC-21b — regenerate.sh runtime behavior (preserved from S4-02 stub). The script's full behavior is: rm -rf .git .codegenie; git init -q -b main with the fixture-local user.email/user.name; commit v0 (package.json) then v1 (main.ts + any expanded source files); mkdir -p .codegenie/context/raw; sed "s|PARENT_COMMIT|${PARENT_COMMIT}|g" _seed/scip-slice.template.json > .codegenie/context/raw/scip.json; cp _seed/scip-index.scip .codegenie/context/raw/scip-index.scip; AC-20's guard. No scip-typescript invocation at regen time; no pnpm/npm/node-gyp invocation. The script invokes only git, mkdir, rm, cp, sed, echo (all in ALLOWED_BINARIES ∪ _SHELL_COREUTILS_ALLOWLIST per S7-01's tokenizer spec). AC-31's static check passes.
  • [ ] AC-22 — README.md documents the regeneration ritual explicitly, additive over the existing stub's prose. Required sections: "Why this fixture exists" (preserved); "Structural assertion (CommitsBehind.n >= 1 AND last_indexed != current_HEAD — tool-version-agnostic)" (preserved + extended with the rationale of both inequalities); "Regeneration policy — DO NOT retarget against current HEAD" (preserved + extended); "Seed-build ritual (one-time per scip-typescript version bump)" (NEW — the AC-21a out-of-band ritual); "How to add a new commit (and the SCIP-vs-HEAD invariant that survives)" (NEW); "Pinned scip-typescript version" (NEW — records the tool version used to build _seed/scip-index.scip). The README is the Risk-#3 front-line guard.
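
AC-15 (c) requires the seed template's counters to track the expanded source tree. A minimal sketch of such a check follows, assuming `files_indexed` and `files_in_repo` are top-level keys of the template (the real template's nesting is not shown in this story) and using a throwaway directory in place of the real fixture. `_write_demo_fixture` is a hypothetical helper for the demonstration only.

```python
import json
import tempfile
from pathlib import Path


def seed_template_counters_match_source_tree(fixture: Path) -> bool:
    """Compare the template's counters with the number of *.ts files under src/."""
    template_path = fixture / "_seed" / "scip-slice.template.json"
    template = json.loads(template_path.read_text(encoding="utf-8"))
    # Note: the *.ts glob also matches *.d.ts files, should any exist.
    ts_count = sum(1 for _ in (fixture / "src").rglob("*.ts"))
    return template["files_indexed"] == ts_count and template["files_in_repo"] == ts_count


def _write_demo_fixture(root: Path, ts_files: int, counter: int) -> Path:
    """Build a throwaway fixture-shaped directory (demo only)."""
    (root / "_seed").mkdir(parents=True)
    (root / "src").mkdir()
    for i in range(ts_files):
        (root / "src" / f"m{i}.ts").write_text(f"export const v{i} = {i};\n", encoding="utf-8")
    template = {"files_indexed": counter, "files_in_repo": counter}
    (root / "_seed" / "scip-slice.template.json").write_text(
        json.dumps(template), encoding="utf-8"
    )
    return root


with tempfile.TemporaryDirectory() as tmp:
    in_sync = seed_template_counters_match_source_tree(
        _write_demo_fixture(Path(tmp) / "ok", ts_files=5, counter=5)
    )
    drifted = seed_template_counters_match_source_tree(
        _write_demo_fixture(Path(tmp) / "stale", ts_files=5, counter=1)
    )
```

The second case mirrors the drift scenario: the source tree grew to five files while the template still records one.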

Shared _shape_test_kernel.py extraction

  • [ ] AC-23 — tests/fixtures/_shape_test_kernel.py (above the portfolio/ subdirectory; chosen so Phase 1's tests/fixtures/node_typescript_helm/-targeted shape test can import the kernel without crossing the portfolio/-namespace boundary) is extracted with: the _FileSpec (frozen NamedTuple) + _ProbeName (Literal) + _ParserKind (Literal) types; the enumerate_tracked(fixture_path) -> tuple[str, ...] port (the only call site for git ls-files <fixture-path> — invoked through run_allowlisted("git", "ls-files", str(fixture_path)); consumers receive a tuple of relpaths and never shell out themselves); the _FIXTURE_NOISE_NAMES defense-in-depth frozenset; the parametrized-test machinery (see AC-24 for the choice of test-factory vs. flat-helper shape — both are acceptable as long as mypy --strict passes and consumers don't duplicate the structural logic). The kernel passes mypy --strict. The kernel declares a module-level __all__: Final[tuple[str, ...]] = (...); the test at tests/unit/test_shape_test_kernel.py asserts the runtime export set equals the documented contract (so a silent removal of enumerate_tracked or any factory becomes a build error).
  • [ ] AC-24 — every fixture's shape test consumes the kernel. tests/unit/test_fixture_{minimal_ts,native_modules,distroless_target,monorepo_pnpm,stale_scip}_shape.py import the kernel; each declares only its _FIXTURE path + its _FILE_SPECS tuple + its content-check predicate functions. The structural parametrized-test logic lives in the kernel. Implementer's choice — two acceptable shapes for the kernel's parametrized-test surface: (a) test-factory pattern (make_existence_test, make_parses_test, … returning pytest-decorated test functions for module-level assignment); (b) flat-helper pattern (assert_file_exists(fixture, spec), assert_file_parses(fixture, spec), … as pure helpers; each consumer writes minimal @pytest.mark.parametrize("spec", _FILE_SPECS, ids=lambda s: s.relpath) def test_fixture_file_exists(spec): assert_file_exists(_FIXTURE, spec)). The validator recommends (b) — it's more pytest-natural for module-level discovery, mypy --strict-clean without ergonomic dance, and keeps the kernel as a functional core. Pick one and apply consistently; the AC's requirement is "structural logic lives in the kernel; consumers declare only data", not the specific implementation shape.
  • [ ] AC-25 — Phase 1's test_fixture_node_typescript_helm_shape.py also migrates to the kernel. This is the sixth consumer and is the final demonstration that the kernel pays off (Rule of Three conclusively past). The migration preserves every existing AC from Phase 1 S2-03 (the original story at docs/phases/01-context-gather-layer-a-node/stories/S2-03-fixture-node-typescript-helm.md, ACs 1–23 and the hardened AC-37 + AC-38) — all S2-03 tests still pass after the kernel migration. Verification ritual: run the full Phase 1 test suite before and after the migration; the diff is non-test-file (just the import-rewrite of the existing test); test counts and pass/fail results unchanged.
  • [ ] AC-26 — kernel exposes _ProbeName as the live Phase-1 + Phase-2 probe-name superset; runtime check uses subset semantics (matching S7-01 AC-37). The Literal lists the full Phase-1 + Phase-2 probe names. A test at tests/unit/test_shape_test_kernel.py (alongside the __all__ test from AC-23) asserts set(p.name for p in default_registry.all()) ⊆ set(get_args(_ProbeName)) (subset, NOT equality) — Phase-3+ probes added later don't retroactively break Phase-2 fixtures, but a renamed/added Phase-2 probe whose name isn't reflected in the Literal IS a test failure. Equality semantics are explicitly REJECTED here — they would force every Phase-3+ probe addition to also edit the fixture kernel, which is the wrong direction.

Closed-set + forbidden-subpath + line-ending invariants per new fixture

  • [ ] AC-27 — monorepo-pnpm/ closed-set complement. test_fixture_monorepo_pnpm_tree_is_closed_set enumerates tracked files via enumerate_tracked (kernel port → git ls-files) and asserts the set equals {spec.relpath for spec in _FILE_SPECS}. node_modules/ MUST NOT be present in tracked files (gitignored; the install never happens in regen since pnpm is not allowlisted — per AC-10 — so node_modules/ doesn't exist in working trees either, but the gitignore defense covers operator-side pnpm install invocations).
  • [ ] AC-28 — stale-scip/ closed-set complement. test_fixture_stale_scip_tree_is_closed_set enumerates tracked files via enumerate_tracked (kernel port → git ls-files <fixture-path> from the parent codewizard-sherpa repo). Gitignored .git/ and .codegenie/ do NOT appear in the enumeration; the closed set is exactly _FILE_SPECS (which includes _seed/scip-slice.template.json, _seed/scip-index.scip, regenerate.sh, README.md, package.json, tsconfig.json, the src/*.ts files, .gitignore, .gitattributes). No include_paths carve-out for .codegenie/ — the real binary lives in _seed/, not under .codegenie/; the kernel's default exclusion of .codegenie/ continues to apply.
  • [ ] AC-29 — S7-01's central no-committed-cache guard passes unchanged. tests/unit/test_no_committed_codegenie_cache_under_portfolio_fixtures.py walks tests/fixtures/portfolio/ and asserts no .codegenie/cache/ directory or .codegenie/ content exists in committed (tracked) files. Both new fixtures honor the invariant: monorepo-pnpm/ does not produce any .codegenie/ content at all (no cache, no committed slice); stale-scip/ produces .codegenie/ content only at runtime (gitignored). No edit to the central guard test is needed; if the test currently has an explicit allowlist of zero entries, it stays at zero.
  • [ ] AC-30 — line endings per file for every file in both new fixtures (the kernel-provided test). Binary files (the seed _seed/scip-index.scip blob) are explicitly excluded from the LF check via _FILE_SPECS carrying a parser=None marker that the kernel treats as "skip line-ending check" — the same convention S7-01 used for the placeholder.
  • [ ] AC-31 — regenerate.sh invokes only allowlisted binaries per fixture, verified by tests/unit/test_fixture_monorepo_pnpm_regenerate_allowlist.py + tests/unit/test_fixture_stale_scip_regenerate_allowlist.py (one per new fixture, both consuming tests/unit/_fixture_regen_allowlist.py — the shared module S7-01 lifted; reused unchanged here). The tokenizer per S7-01's AC-31: each non-blank, non-comment line's first non-builtin/non-control-flow token must be in ALLOWED_BINARIES ∪ _SHELL_COREUTILS_ALLOWLIST. For monorepo-pnpm: the non-builtin / non-coreutil set must contain only git (if at all). For stale-scip: the set contains only git + sed (sed is in _SHELL_COREUTILS_ALLOWLIST). Explicit fails: pnpm, npm, node-gyp, scip-typescript (at runtime — the seed-build ritual is OUT-OF-BAND), curl, wget, eval. The story explicitly asserts pnpm ∉ invoked set for monorepo-pnpm and scip-typescript ∉ invoked set for stale-scip's regenerate.sh — the seed-build ritual invokes scip-typescript on the contributor's local box, not inside regenerate.sh.
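
A simplified sketch of the first-token allowlist check from AC-31 follows. The three sets are abbreviated, hypothetical stand-ins for `ALLOWED_BINARIES`, `_SHELL_COREUTILS_ALLOWLIST`, and the builtin/control-flow words; the function names are illustrative and are not the API of `tests/unit/_fixture_regen_allowlist.py`. The sketch only inspects the first token of each line, so commands nested after `then` or inside `$(...)` are outside what it covers.

```python
import shlex

# Abbreviated, hypothetical stand-ins for the real allowlists.
ALLOWED_BINARIES = frozenset({"git", "node", "scip-typescript"})
SHELL_COREUTILS_ALLOWLIST = frozenset({"mkdir", "cp", "sed", "cat"})
SHELL_WORDS = frozenset({"set", "cd", "echo", "exit", "export", "if", "then", "else", "fi", "[["})


def _is_assignment(token: str) -> bool:
    """True for NAME=value tokens such as LAST_INDEXED=..."""
    name, sep, _ = token.partition("=")
    return bool(sep) and name.isidentifier()


def invoked_binaries(script: str) -> set[str]:
    """Collect the first non-builtin token of each non-blank, non-comment line."""
    found: set[str] = set()
    for raw in script.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)
        while tokens and _is_assignment(tokens[0]):
            tokens.pop(0)  # skip leading environment assignments
        if tokens and tokens[0] not in SHELL_WORDS:
            found.add(tokens[0])
    return found


def disallowed_binaries(script: str, also_forbidden: frozenset[str] = frozenset()) -> set[str]:
    """Binaries outside the allowlists, plus any per-fixture forbidden ones."""
    found = invoked_binaries(script)
    permitted = ALLOWED_BINARIES | SHELL_COREUTILS_ALLOWLIST
    return (found - permitted) | (found & also_forbidden)
```

The `also_forbidden` parameter models the per-fixture assertion: `scip-typescript` is allowlisted in general, yet the stale-scip test can still require it to be absent from `regenerate.sh`.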

stale-scip structural assertion survives regeneration (Risk #3 defense)

  • [ ] AC-32 — adversarial test from S4-02 continues to pass against the materialized fixture. tests/adv/phase02/test_stale_scip_fixture.py (landed in S4-02; this story does NOT edit it) reads .codegenie/context/raw/scip.json (materialized at regen time from _seed/scip-slice.template.json with the PARENT_COMMIT substitution) and asserts: (1) set(index_health.keys()) == {"scip", "runtime_trace"} (the widened outer-key set; was {"scip"} at S4-02 land time, widened by S5-05 — future S6-08 registrations may widen further; this story does NOT cause the set to change); (2) isinstance(slice.freshness, Stale); (3) isinstance(slice.freshness.reason, CommitsBehind); (4) slice.freshness.reason.n >= 1; (5) slice.freshness.reason.last_indexed != current_HEAD; (6) index_health["scip"]["confidence"] == "medium". The assertion has been passing since S4-02 against the stub-templated scip.json (the template carries PARENT_COMMIT, NOT current HEAD, by construction). S7-02's contribution to the assertion-passing claim is: the assertion CONTINUES to pass after the source tree expansion + seed-counter updates (AC-15 + AC-16). The real binary SCIP at _seed/scip-index.scip is forward-looking for S4-03's ScipIndexProbe consumer, not a determinant of S4-02's current assertion. Pre-flight check the implementer runs: pytest tests/adv/phase02/test_stale_scip_fixture.py after the source-tree expansion — observe green.
  • [ ] AC-33 — the adversarial's structural assertion is both inequalities together, n >= 1 AND last_indexed != current_HEAD, not n >= 1 alone (and not a weaker n >= 0, which any fixture would trivially satisfy). The S4-02 file already encodes this; the existing regenerate.sh already enforces LAST_INDEXED != HEAD by construction (LAST_INDEXED defaults to HEAD~1, plus the guard against an operator override to HEAD). This story's contribution is preserving the invariant after the source-tree expansion: ensure the v0/v1 split + the seed-template's last_indexed_commit=PARENT_COMMIT substitution mechanics survive.
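
The six checks listed in AC-32 can be sketched as one helper. The dataclasses below are hypothetical stand-ins for the real `Stale` and `CommitsBehind` types (their actual field layout is defined elsewhere, per ADR-0006), and `assert_stale_by_commits` is an illustrative name rather than code from the S4-02 test.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class CommitsBehind:  # hypothetical shape of the real reason type
    n: int
    last_indexed: str


@dataclass(frozen=True)
class Stale:  # hypothetical shape of the real freshness variant
    reason: CommitsBehind


def assert_stale_by_commits(
    index_health: Mapping[str, Any], freshness: object, current_head: str
) -> None:
    """Mirror assertions (1) through (6) from AC-32 against already-loaded data."""
    assert set(index_health.keys()) == {"scip", "runtime_trace"}
    assert isinstance(freshness, Stale)
    assert isinstance(freshness.reason, CommitsBehind)
    assert freshness.reason.n >= 1
    # The second inequality is what makes the assertion structural.
    assert freshness.reason.last_indexed != current_head
    assert index_health["scip"]["confidence"] == "medium"
```

Keeping both inequalities in one helper makes it harder to weaken one of them without a reviewer noticing.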

Determinism, audit hygiene, type cleanliness

  • [ ] AC-34 — regenerate.sh byte-identical-twice scope is the tracked-files scope (matching S7-01 AC-30's hardened convention; gitignored artifacts — .git/, .codegenie/, dist/, node_modules/ — are out of scope by design). For monorepo-pnpm/: tracked-files SHA equality across two consecutive invocations (manual local verification; documented in PR). stale-scip/'s scope is narrower still: only the committed _seed/ blobs + manifest files + regenerate.sh + README.md + .gitignore + .gitattributes are in scope — the regenerated .git/ and .codegenie/ legitimately re-derive distinct ephemeral SHAs across invocations (each git init produces fresh object SHAs because the commit timestamps and committer identity may differ across runs even with the fixture-local user.email pin), and that's intentional — only the COMMITTED bytes are part of the fixture contract.
  • [ ] AC-35 — every new shape-test + kernel + the _seed/scip-index.scip binary's existence assertion passes mypy --strict. No Any outside the explicit payload: Any parser-dispatch lines (Phase 1 convention).
  • [ ] AC-36 — Phase 1's test_fixture_node_typescript_helm_shape.py still passes after the kernel migration (AC-25). Mandatory: run the existing test suite, observe green; the migration is refactor-by-extraction, not behavior change. Concretely: the diff of the test file is just the import change (from tests.fixtures._shape_test_kernel import ...) + the removal of the duplicated structural-test code (now imported from the kernel) — no logic edit, no behavior change.
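
For the manual byte-identical-twice verification in AC-34, one possible approach is to hash only the tracked files before and after a second regeneration. The sketch below assumes the caller already has the tracked relpaths (for example from `git ls-files`); `tracked_files_digest` is a hypothetical helper, not part of the repository.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def tracked_files_digest(root: Path, relpaths: Iterable[str]) -> str:
    """SHA-256 over (relpath, content hash) pairs, in sorted relpath order.

    Only the listed files contribute, so gitignored artifacts such as
    .git/, .codegenie/, dist/ or node_modules/ never affect the result.
    """
    digest = hashlib.sha256()
    for rel in sorted(relpaths):
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256((root / rel).read_bytes()).digest())
    return digest.hexdigest()
```

Comparing the digest from two consecutive `regenerate.sh` runs gives a single value to paste into the PR description.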

Implementation outline

  1. Plant monorepo-pnpm/ first (no risky surface).
  2. mkdir -p tests/fixtures/portfolio/monorepo-pnpm/{packages/lib-a/src,packages/lib-b/src,packages/app/src,.github/workflows}.
  3. Write the shape test (tests/unit/test_fixture_monorepo_pnpm_shape.py) — TDD red, modeled on S7-01's three fixtures (still using inlined parametrized-test bodies; the kernel extraction comes later, in the "Extract the shared kernel" step below).
  4. Plant each file per AC-2..AC-14.
  5. Generate the pnpm-lock.yaml ONCE, OUT-OF-BAND, on the contributor's local box: run pnpm install in a scratch directory that exactly matches the fixture manifest; copy the resulting pnpm-lock.yaml into the fixture; commit. regenerate.sh does NOT invoke pnpm (per AC-10 + ADR-0001 — pnpm ∉ ALLOWED_BINARIES); the regen script is mkdir/coreutils-only (it materializes any tree skeleton that is regenerated and asserts invariants — but the lockfile is committed bytes treated as fixture contract).
  6. Plant .npmrc with ignore-scripts=true (defense-in-depth; mirrors S7-01 native-modules AC-16).
  7. Write tests/unit/test_fixture_monorepo_pnpm_regenerate_allowlist.py consuming tests/unit/_fixture_regen_allowlist.py (S7-01's shared module — reused unchanged); the test explicitly asserts pnpm ∉ invoked-binary set per AC-31.
  8. Run shape test + allowlist test. Green.
  9. Materialize stale-scip/ additively over the existing S4-02 stub.
  10. READ the existing stub first. tests/fixtures/portfolio/stale-scip/{README.md, regenerate.sh, .gitignore, package.json, main.ts, _seed/scip-slice.template.json, _seed/scip-index.scip.placeholder, .gitattributes} codify the seed-template + gitignored-.git/ + gitignored-.codegenie/ mechanism. Do NOT rm -rf the stub directory. Read the existing regenerate.sh end-to-end so you understand the v0/v1 commit sequence, the PARENT_COMMIT substitution, and the LAST_INDEXED guard.
  11. Write the shape test (tests/unit/test_fixture_stale_scip_shape.py) — TDD red, with _FILE_SPECS declaring the committed files only (_seed/scip-slice.template.json, _seed/scip-index.scip, package.json, tsconfig.json, src/*.ts, regenerate.sh, README.md, .gitignore, .gitattributes).
  12. Expand the source tree to ≤ 50 .ts files: add src/a.ts through src/e.ts (or more) with chained export/import statements. The expanded tree gives scip-typescript more to index than the stub's single main.ts.
  13. Update _seed/scip-slice.template.json's files_indexed / files_in_repo to match the count of .ts files in the new src/ tree (or whatever subset the seeded SCIP actually covers — pick deliberately, document in README.md's "Seed-build ritual" section).
  14. Seed-build ritual (one-time; OUT-OF-BAND on contributor's local box): on a scratch directory, check out only package.json (the v0 tree from the existing regenerate.sh's perspective); run scip-typescript . against the scratch directory; copy the resulting .scip blob to _seed/scip-index.scip (replacing _seed/scip-index.scip.placeholder). Pin the scip-typescript version used in README.md. Do NOT edit regenerate.sh to invoke scip-typescript; the script keeps copying the committed seed bytes.
  15. Extend regenerate.sh ONLY for the source-tree expansion: the existing script commits main.ts for v1; widen this to commit all expanded src/*.ts files for v1. All other lines of the existing regenerate.sh are preserved, including the LAST_INDEXED="${LAST_INDEXED:-$(git rev-parse HEAD~1)}" + if [[ "$LAST_INDEXED" == "$(git rev-parse HEAD)" ]]; then exit 1; fi guard (AC-20). The cp _seed/scip-index.scip.placeholder ... line in the existing script becomes cp _seed/scip-index.scip ... (the seed file now has real content).
  16. Extend README.md per AC-22 — preserve existing sections (existing structural-assertion and regeneration-policy sections); add: "Seed-build ritual (one-time per scip-typescript version bump)" + "How to add a new commit" + "Pinned scip-typescript version".
  17. Verify: run bash tests/fixtures/portfolio/stale-scip/regenerate.sh; observe the v0/v1 commits + materialized .codegenie/context/raw/scip.json + copied .codegenie/context/raw/scip-index.scip (real binary now). Run pytest tests/adv/phase02/test_stale_scip_fixture.py — green (AC-32). Run LAST_INDEXED=$(cd tests/fixtures/portfolio/stale-scip && git rev-parse HEAD) bash tests/fixtures/portfolio/stale-scip/regenerate.sh; observe exit code 1 + stderr "refuses to set last_indexed_commit == HEAD" (AC-20).
  18. Write tests/unit/test_fixture_stale_scip_regenerate_allowlist.py consuming tests/unit/_fixture_regen_allowlist.py; explicitly assert scip-typescript ∉ invoked-binary set (the seed-build ritual is out-of-band).
  19. Run shape test + allowlist test + adversarial. All green.
  20. Extract the shared kernel at tests/fixtures/_shape_test_kernel.py.
  21. Compare the three S7-01 shape-test files + the two new shape-test files + Phase 1's tests/unit/test_fixture_node_typescript_helm_shape.py. The duplicated machinery is the parametrized-test bodies + enumerate_tracked + _FIXTURE_NOISE_NAMES. The variable parts are _FIXTURE, _FILE_SPECS, the content-check predicates.
  22. Write tests/fixtures/_shape_test_kernel.py (above the portfolio/ subdirectory) with: the _FileSpec frozen NamedTuple + _ProbeName Literal + _ParserKind Literal types; the enumerate_tracked(fixture_path) -> tuple[str, ...] port (only call site for git ls-files); the _FIXTURE_NOISE_NAMES frozenset; the parametrized-test helpers per AC-24 (validator-recommended: flat helper functions like assert_file_exists, assert_file_parses, etc. — but factory-based pattern is also acceptable if mypy --strict-clean).
  23. Add __all__: Final[tuple[str, ...]] = (...) to the kernel; write tests/unit/test_shape_test_kernel.py asserting the runtime export set + the _ProbeName subset semantics check per AC-26.
  24. One at a time: migrate tests/unit/test_fixture_minimal_ts_shape.py → kernel-consumer; run; observe green. Same for native_modules, distroless_target, the two new fixtures, AND Phase 1's tests/unit/test_fixture_node_typescript_helm_shape.py (sixth consumer).
  25. Verify all six shape tests still pass; AC-25 + AC-36 require Phase 1's existing tests pass identically post-migration.
  26. Verify S7-01's central no-committed-cache guard passes unchanged. Run tests/unit/test_no_committed_codegenie_cache_under_portfolio_fixtures.py (S7-01's guard). No edit to this test is needed: stale-scip's real binary SCIP lives at _seed/scip-index.scip, NOT under .codegenie/; the test's invariant ("no .codegenie/ content under tests/fixtures/portfolio/ in committed tracked files") passes unchanged. If the test would fail because some other adjacent change leaked a .codegenie/ path, that's a separate bug — fix it in place, not by allowlisting.
  27. Final pass: mypy --strict, ruff, ruff format --check. Run the full Phase 2 test suite (pytest -q minus advisory benches). Green.
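
The two mechanics that the stale-scip steps above preserve in `regenerate.sh` (the `PARENT_COMMIT` substitution and the refusal when `LAST_INDEXED` equals HEAD) are shell logic in the fixture. Purely for illustration, the same logic can be mirrored in Python; `materialize_scip_slice` is a hypothetical name and the template in the usage note is abbreviated.

```python
import json


def materialize_scip_slice(template_text: str, parent_commit: str, head: str) -> str:
    """Substitute the PARENT_COMMIT placeholder, refusing a last_indexed == HEAD result.

    Illustrative mirror of the shell steps; the fixture itself uses sed plus
    an if/exit guard inside regenerate.sh.
    """
    if parent_commit == head:
        raise ValueError("refuses to set last_indexed_commit == HEAD")
    return template_text.replace("PARENT_COMMIT", parent_commit)


def last_indexed_commit(materialized_text: str) -> str:
    """Read back the substituted commit from the materialized slice."""
    return str(json.loads(materialized_text)["last_indexed_commit"])
```

Given a template such as `{"last_indexed_commit": "PARENT_COMMIT"}`, the materialized slice carries the parent SHA, and passing the HEAD SHA as `parent_commit` raises instead of producing a slice that would no longer exercise staleness.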

TDD plan — red / green / refactor

Red — failing shape tests first

For monorepo-pnpm, the shape test mirrors S7-01:

# tests/unit/test_fixture_monorepo_pnpm_shape.py (excerpt)
_FILE_SPECS: tuple[_FileSpec, ...] = (
    _FileSpec("pnpm-workspace.yaml", ("node_build_system", "dep_graph"), "safe_yaml", (_workspace_declares_packages,)),
    _FileSpec("package.json", ("node_build_system", "node_manifest"), "safe_json", (_root_pkg_shape,)),
    _FileSpec("packages/lib-a/package.json", ("node_manifest", "dep_graph"), "safe_json", (_lib_a_pkg_shape,)),
    _FileSpec("packages/lib-a/src/index.ts", ("language_detection", "tree_sitter_import_graph"), "text", (_lib_a_exports_add,)),
    _FileSpec("packages/lib-b/package.json", ("node_manifest", "dep_graph"), "safe_json",
              (_lib_b_pkg_shape, _lib_b_declares_workspace_dep_on_lib_a)),
    _FileSpec("packages/lib-b/src/index.ts", ("language_detection", "tree_sitter_import_graph"),
              "text", (_lib_b_imports_from_lib_a,)),
    _FileSpec("packages/app/package.json", ("node_manifest", "dep_graph"), "safe_json",
              (_app_pkg_shape, _app_declares_workspace_deps_on_both_libs)),
    _FileSpec("packages/app/src/index.ts", ("language_detection", "tree_sitter_import_graph"),
              "text", (_app_imports_from_both_libs,)),
    _FileSpec("pnpm-lock.yaml", ("node_build_system", "node_manifest", "dep_graph"),
              "safe_yaml", (_lock_v6_header,)),
    _FileSpec("tsconfig.json", ("node_build_system",), "jsonc", (_tsconfig_root_references_all_three,)),
    _FileSpec("Dockerfile", ("dockerfile", "runtime_trace", "entrypoint"), "text",
              (_dockerfile_multistage, _dockerfile_uses_node_slim, _dockerfile_runs_as_node_user)),
    _FileSpec(".github/workflows/ci.yml", ("ci",), "safe_yaml", (_ci_runs_recursive_build,)),
    _FileSpec("README.md", (), "text", (_readme_documents_phase3_entry_gate_target,)),
)

The load-bearing content predicates for monorepo-pnpm:

  • _lib_b_declares_workspace_dep_on_lib_a(pkg) — asserts pkg["dependencies"]["@monorepo-pnpm/lib-a"] == "workspace:*". Mutation: drop the dep → fails.
  • _lib_b_imports_from_lib_a(raw_bytes) — asserts 'from "@monorepo-pnpm/lib-a"' is in the source. Mutation: remove the import → fails.
  • _app_declares_workspace_deps_on_both_libs(pkg) — asserts both workspace:* deps. Mutation: drop either → fails.
  • _app_imports_from_both_libs(raw_bytes) — asserts both from "@monorepo-pnpm/lib-a" AND from "@monorepo-pnpm/lib-b". Mutation: drop either → fails. (This is the load-bearing pair the tree_sitter_import_graph golden depends on.)
  • _readme_documents_phase3_entry_gate_target(raw_bytes) — asserts the literal phrase "Phase 3 entry-gate target" appears in README.md. The phrase is the Risk-#8 named handoff.

For stale-scip (the committed fixture surface; runtime-materialized .codegenie/ content is NOT in _FILE_SPECS because it's gitignored):

_FILE_SPECS: tuple[_FileSpec, ...] = (
    _FileSpec("package.json", ("node_build_system", "node_manifest"), "safe_json", (_pkg_declares_typescript,)),
    _FileSpec("tsconfig.json", ("node_build_system",), "jsonc", (_tsconfig_shape,)),
    _FileSpec("src/a.ts", ("language_detection",), "text", (_a_ts_exports,)),
    _FileSpec("src/b.ts", ("language_detection",), "text", (_b_ts_imports_a,)),
    _FileSpec("src/c.ts", ("language_detection",), "text", (_c_ts_imports_b,)),
    _FileSpec("src/d.ts", ("language_detection",), "text", (_d_ts_imports_c,)),
    _FileSpec("src/e.ts", ("language_detection",), "text", (_e_ts_imports_d,)),
    _FileSpec("main.ts", ("language_detection",), "text", (_main_ts_imports_e,)),  # preserved from S4-02 stub
    _FileSpec("_seed/scip-slice.template.json", ("scip_index", "index_health"), "safe_json",
              (_template_carries_parent_commit_placeholder,
               _seed_template_counters_match_source_tree,)),
    _FileSpec("_seed/scip-index.scip", ("scip_index",), None,
              (_scip_blob_non_empty, _scip_blob_smoke_shape,)),
    _FileSpec("regenerate.sh", (), "text",
              (_regen_initializes_git_and_commits_two_commits,
               _regen_substitutes_parent_commit_into_template,
               _regen_copies_seed_scip_to_runtime_path,
               _last_indexed_defaults_to_head_tilde_one,
               _regen_refuses_current_head,
               _regen_invokes_only_allowlisted_binaries,)),
    _FileSpec("README.md", (), "text",
              (_readme_documents_structural_assertion,
               _readme_documents_regen_ritual,
               _readme_documents_seed_build_ritual,
               _readme_pins_scip_typescript_version,)),
    _FileSpec(".gitignore", (), "text", (_gitignore_excludes_git_and_codegenie,)),
    _FileSpec(".gitattributes", (), "text", ()),
)

The load-bearing content predicates for stale-scip (all reading committed bytes — never runtime-materialized state):

  • _last_indexed_defaults_to_head_tilde_one(raw_bytes) — greps regenerate.sh for the line LAST_INDEXED="${LAST_INDEXED:-$(git rev-parse HEAD~1)}" (or its semantic equivalent — HEAD~1 is the structural guarantee that last_indexed is the PARENT of HEAD, never HEAD itself). This is the Risk-#3 front-line invariant. Mutation: a contributor "fixes" regenerate.sh to default LAST_INDEXED to HEAD → this predicate fails. Pure-string grep; no subprocess invocation (the predicate is called against the static script bytes by the kernel's content-invariants test, which is itself pure).
  • _regen_refuses_current_head(raw_bytes) — greps regenerate.sh for the explicit check if [[ "$LAST_INDEXED" == "$(git rev-parse HEAD)" ]] (or its semantic equivalent) + the exit 1 branch. Pins the load-bearing guard at the script-text level.
  • _regen_substitutes_parent_commit_into_template(raw_bytes) — greps regenerate.sh for the sed "s|PARENT_COMMIT|...|g" _seed/scip-slice.template.json > .codegenie/context/raw/scip.json line (or its semantic equivalent). Pins the template-substitution step; mutation: a contributor "tidies up" the regen script by hardcoding the materialized scip.json → predicate fails.
  • _regen_copies_seed_scip_to_runtime_path(raw_bytes) — greps regenerate.sh for cp _seed/scip-index.scip .codegenie/context/raw/scip-index.scip (or equivalent). Pins the seed-binary-copy step; mutation: a contributor adds scip-typescript invocation inside regen.sh instead of the cp-from-seed → predicate fails AND the AC-31 allowlist test also fails.
  • _template_carries_parent_commit_placeholder(parsed_json) — asserts parsed_json["last_indexed_commit"] == "PARENT_COMMIT" (the placeholder string, NOT a real SHA — the substitution happens at regen runtime). Mutation: a contributor "tidies up" the template by replacing the placeholder with the actual prior commit SHA at fixture creation → predicate fails (and the regenerate.sh substitution would no-op silently).
  • _seed_template_counters_match_source_tree(parsed_json) — counts *.ts files under src/ (and main.ts at root if present); asserts parsed_json["files_indexed"] == parsed_json["files_in_repo"] == <count> (or the deliberately-pinned subset count, per AC-15 + AC-16 + the README's "Seed-build ritual" section). Mutation: a contributor grows the source tree without updating the seed template → IndexHealthProbe surfaces CoverageGap instead of CommitsBehind and the adversarial fails for the wrong reason.
  • _scip_blob_non_empty(raw_bytes) — asserts len(raw_bytes) > 0 (the placeholder was 0 bytes; the real binary is non-empty). The first sanity check that the seed-build ritual actually ran.
  • _scip_blob_smoke_shape(raw_bytes) — asserts the blob is parseable as a SCIP index (the wire-format is a protobuf-serialized Index message; smoke check: the first bytes are a valid SCIP magic / varint prefix per the SCIP spec, OR a minimum-size check of ≥ 200 bytes which any real scip-typescript output exceeds). NOT a deep structural assertion; the placeholder is 0 bytes so the minimum-size check alone catches "seed-build ritual didn't run".
  • _readme_documents_structural_assertion(raw_bytes) — asserts the README contains both "CommitsBehind.n >= 1" AND "last_indexed != current_HEAD" phrases verbatim.
  • _readme_documents_seed_build_ritual(raw_bytes) — asserts the README has a "Seed-build ritual" section.
  • _readme_pins_scip_typescript_version(raw_bytes) — asserts the README pins the scip-typescript version used to produce _seed/scip-index.scip (regex scip-typescript\s+v?\d+\.\d+\.\d+ or equivalent).
  • _gitignore_excludes_git_and_codegenie(raw_bytes) — asserts .gitignore's effective (non-blank, non-comment) lines are exactly .git/ AND .codegenie/ (strict equality, matching the mutation-resistance witness table). Mutation: a contributor adds an !.codegenie/... allowlist carve-out → predicate fails (and S7-01's central no-committed-cache guard would also catch a leaked path).

Green — make it pass

Plant the trees. Run the shape tests. Green. Then extract the kernel.

Mutation-resistance witness table

| Mutation | Test that catches it |
| --- | --- |
| Drop "@monorepo-pnpm/lib-a": "workspace:*" from lib-b/package.json | test_fixture_monorepo_pnpm_content_invariants[packages/lib-b/package.json] via _lib_b_declares_workspace_dep_on_lib_a |
| Remove the import from app/src/index.ts (silently breaks the tree_sitter_import_graph golden) | _app_imports_from_both_libs |
| monorepo-pnpm/regenerate.sh invokes pnpm install --frozen-lockfile (or any pnpm subcommand) | tests/unit/test_fixture_monorepo_pnpm_regenerate_allowlist.py (consuming _fixture_regen_allowlist.py) — pnpm ∉ ALLOWED_BINARIES ∪ _SHELL_COREUTILS_ALLOWLIST |
| monorepo-pnpm/regenerate.sh invokes npm install or node-gyp rebuild | Same allowlist test — neither binary is in ALLOWED_BINARIES |
| Contributor "tidies up" stale-scip/regenerate.sh by defaulting LAST_INDEXED to HEAD instead of HEAD~1 | _last_indexed_defaults_to_head_tilde_one grep predicate AND tests/adv/phase02/test_stale_scip_fixture.py (from S4-02) BOTH fail (the materialized scip.json carries last_indexed == HEAD; the adversarial's last_indexed != current_HEAD assertion fails) |
| Contributor "fixes" regenerate.sh to allow regen with operator-forced LAST_INDEXED=$(git rev-parse HEAD) | _regen_refuses_current_head grep predicate fails (the guard's if block is gone) |
| Contributor adds scip-typescript . invocation inside regenerate.sh (instead of the cp-from-seed pattern) | tests/unit/test_fixture_stale_scip_regenerate_allowlist.py: scip-typescript shows up in the invoked-binary set, which is asserted-absent at regen-time (seed-build is OUT-OF-BAND) |
| Contributor "tidies up" the seed template by replacing "PARENT_COMMIT" placeholder with a real SHA | _template_carries_parent_commit_placeholder predicate fails (the placeholder string is gone) |
| Contributor grows the source tree to 12 .ts files but forgets to update _seed/scip-slice.template.json counters | _seed_template_counters_match_source_tree predicate fails (counts mismatch) — pre-empts the worse failure mode of IndexHealthProbe surfacing CoverageGap instead of CommitsBehind and the adversarial failing for the wrong reason |
| Contributor leaves _seed/scip-index.scip.placeholder (0 bytes) in place instead of replacing with a real binary | _scip_blob_non_empty + _scip_blob_smoke_shape predicates fail (placeholder is 0 bytes; real binary is ≥ 200 bytes) |
| Contributor adds !.codegenie/context/raw/scip-index.scip carve-out to stale-scip/.gitignore | _gitignore_excludes_git_and_codegenie predicate fails (the carve-out introduces extra non-.git/-non-.codegenie/ lines that flunk the strict-equality check) AND S7-01's central no-committed-cache guard catches the leaked .codegenie/ content |
| Stray node_modules/ force-added to monorepo-pnpm | test_fixture_monorepo_pnpm_tree_is_closed_set (extra tracked file outside _FILE_SPECS) |
| Stray .codegenie/cache/blobs/x committed under any portfolio fixture | tests/unit/test_no_committed_codegenie_cache_under_portfolio_fixtures.py (S7-01) |
| README drops the "Phase 3 entry-gate target" phrase from monorepo-pnpm/README.md | _readme_documents_phase3_entry_gate_target |
| README drops the structural-assertion phrasing from stale-scip/README.md | _readme_documents_structural_assertion |
| README forgets to pin the scip-typescript version | _readme_pins_scip_typescript_version predicate fails |
| Kernel extraction silently changes behavior (e.g., enumerate_tracked excludes a different default) | Phase 1's test_fixture_node_typescript_helm_shape.py regression (still passing is the proof) + tests/unit/test_shape_test_kernel.py __all__ runtime check |
| _ProbeName Literal in the kernel falls out of sync with the live probe registry (e.g., Phase 2 probe renamed) | tests/unit/test_shape_test_kernel.py subset-semantics check (set(p.name for p in default_registry.all()) ⊆ set(get_args(_ProbeName)) — the renamed probe's new name is not in the Literal) |

Refactor — clean up

  • The kernel extraction is the refactor. The five portfolio shape tests (three from S7-01 + the two written in this story) + Phase 1's node_typescript_helm shape test all migrate to consume the kernel; the kernel itself is mypy-strict, no Any outside payload: Any, no untyped helpers.
  • _ProbeName in the kernel is the Phase-1 + Phase-2 probe-name superset. The kernel-side test (tests/unit/test_shape_test_kernel.py) asserts subset semantics (set(p.name for p in default_registry.all()) ⊆ set(get_args(_ProbeName))) per AC-26 — matching S7-01's hardened AC-37. Phase-3+ probes added later do NOT retroactively break Phase-2 fixtures.
  • The kernel's __all__ is a separate runtime check (also in tests/unit/test_shape_test_kernel.py) — silent export removal becomes a build error.
  • regenerate.sh byte-identical-twice scope per AC-34: monorepo-pnpm: git ls-files-tracked files only; gitignored artifacts out of scope. stale-scip: the committed bytes only (the _seed/ directory + manifest files + regenerate.sh + README.md + .gitignore + .gitattributes); the regenerated .git/ and .codegenie/ legitimately differ across runs (fresh git init produces fresh object SHAs).
  • No edit to tests/unit/test_no_committed_codegenie_cache_under_portfolio_fixtures.py — the existing S7-01 guard passes unchanged because stale-scip's real binary SCIP lives at _seed/scip-index.scip (NOT under .codegenie/).
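
The `__all__` runtime check mentioned above can be sketched as a set comparison. The documented export names below are an illustrative subset drawn from names this story mentions, and `export_drift` is a hypothetical helper; the real contract list belongs to the kernel.

```python
import types

# Illustrative subset of the documented kernel contract (assumption).
DOCUMENTED_EXPORTS = frozenset({"_FileSpec", "enumerate_tracked", "assert_file_exists"})


def export_drift(module: types.ModuleType, documented: frozenset[str]) -> dict[str, set[str]]:
    """Compare a module's __all__ with the documented contract.

    Both sets empty means the runtime export set matches the contract.
    """
    actual = set(getattr(module, "__all__", ()))
    return {
        "missing": set(documented) - actual,
        "undocumented": actual - set(documented),
    }
```

A test would assert that both entries are empty, so a silently removed export is reported by name.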

Files to touch

| Path | Why |
| --- | --- |
| tests/fixtures/portfolio/monorepo-pnpm/ (tree per AC-2..AC-14; lockfile hand-authored OUT-OF-BAND; .npmrc with ignore-scripts=true) | pnpm workspace; DepGraphProbe cross-package edges; Phase-3 entry-gate target |
| tests/fixtures/portfolio/stale-scip/ (additive materialization: expand src/ + replace _seed/scip-index.scip.placeholder with real binary + extend regenerate.sh for the wider commit-set + extend README.md) | Load-bearing. The roadmap exit-criterion fixture; existing stub mechanism preserved |
| tests/fixtures/_shape_test_kernel.py (NOTE: above portfolio/ subdirectory so Phase 1's tests/fixtures/node_typescript_helm/ test can consume it cleanly) | Shared _FileSpec + parametrized-test surface (Rule of Three conclusively past — 6 consumers including Phase 1) |
| tests/unit/test_fixture_monorepo_pnpm_shape.py | Shape test |
| tests/unit/test_fixture_stale_scip_shape.py | Shape test (regen-script grep predicates + seed-template counter invariant + seed-binary smoke check) |
| tests/unit/test_fixture_monorepo_pnpm_regenerate_allowlist.py | AC-31 — explicit assertion pnpm ∉ invoked binaries; consumes _fixture_regen_allowlist.py from S7-01 |
| tests/unit/test_fixture_stale_scip_regenerate_allowlist.py | AC-31 — explicit assertion scip-typescript ∉ invoked binaries at regen-time (seed-build is OUT-OF-BAND); consumes _fixture_regen_allowlist.py from S7-01 |
| tests/unit/test_fixture_minimal_ts_shape.py (migrate to kernel) | Was direct-pattern in S7-01; now consumes kernel |
| tests/unit/test_fixture_native_modules_shape.py (migrate to kernel) | Same |
| tests/unit/test_fixture_distroless_target_shape.py (migrate to kernel) | Same |
| tests/unit/test_fixture_node_typescript_helm_shape.py (Phase 1; migrate to kernel) | Phase-1 fixture consumes the kernel — the sixth consumer demonstrates the kernel pays off |
| tests/unit/test_shape_test_kernel.py | Asserts kernel's __all__ matches documented contract + subset-semantics check for _ProbeName Literal vs. live probe registry |
| NOT TOUCHED: tests/unit/test_no_committed_codegenie_cache_under_portfolio_fixtures.py | S7-01's central guard passes unchanged because stale-scip's real binary SCIP lives at _seed/scip-index.scip, NOT under .codegenie/ |
| NOT TOUCHED: tests/unit/_fixture_regen_allowlist.py | Reused unchanged from S7-01 |

Out of scope

  • Golden file regeneration + ~70 goldens — S7-03.
  • Adversarial corpus (hostile_skills_yaml, concurrent_gather_race, no_inmemory_secret_leak, phase3_handoff_smoke) — S7-04.
  • Property tests + portfolio sweep integration — S7-05.
  • CI wiring (portfolio job, adv-phase02 job) — S8-03.
  • stale-scip adversarial test itself (tests/adv/phase02/test_stale_scip_fixture.py) — already lives in S4-02; this story only ensures it passes against the full materialization (AC-32 + AC-33), does not edit it.
  • Pre-built monorepo-pnpm/node_modules/ cache for CI speedup — explicitly out. The regen-each-run policy is what Phase 2 ships; the escape valve lives in final-design.md §"Open questions" #6 and triggers only on hosted-runner bench failure.

Notes for the implementer

  • Risk #3 is the load-bearing risk this story defends. If a future contributor regenerates the stale-scip SCIP against current HEAD, the load-bearing exit criterion silently stops exercising staleness. Three layers of defense, all in this story (or inherited from S4-02):
      1. regenerate.sh LAST_INDEXED defaults to HEAD~1 (the parent of HEAD; NEVER HEAD). The shape test's _last_indexed_defaults_to_head_tilde_one predicate pins this at the script-text level (AC-20).
      2. regenerate.sh has an explicit guard against operator-forced LAST_INDEXED=$(git rev-parse HEAD) — the script exits 1 with a clear error. The shape test's _regen_refuses_current_head predicate pins this (AC-20).
      3. The S4-02 adversarial asserts both n >= 1 AND last_indexed != current_HEAD (already coded; this story's source-tree expansion preserves the non-trivial truth of both inequalities by leaving the v0/v1 commit-split mechanism intact).

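Document all three layers in stale-scip/README.md (AC-22). Test the regen-script refusal before opening the PR: regenerate once so the fixture-local .git/ exists, then run LAST_INDEXED=$(cd tests/fixtures/portfolio/stale-scip && git rev-parse HEAD) bash tests/fixtures/portfolio/stale-scip/regenerate.sh (the same command as the outline's verification step) and observe exit code 1. Keeping regenerate.sh itself out of the command substitution prevents any stdout it prints from leaking into LAST_INDEXED.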

  • monorepo-pnpm's pnpm-lock.yaml byte-stability matters for golden determinism. Pin the lockfile bytes at fixture creation: run pnpm install once in a scratch directory matching the manifest exactly, copy the lockfile in, commit it, and never invoke pnpm in regenerate.sh (per AC-10 + ADR-0001 — pnpm is NOT in ALLOWED_BINARIES; S7-01's native-modules HARDENED precedent is the model). If the public registry repushes any of monorepo-pnpm's deps, the OUT-OF-BAND pnpm install (run in a deliberate fixture-update PR) would observe a mismatch and the contributor would re-pin the lockfile then — never silently.

  • The kernel extraction in this story has been deferred from S7-01 deliberately. S7-01 had three consumers (Rule of Three boundary, not past); this story brings the count to five portfolio shape tests (three from S7-01 + two new) + one Phase-1 = six. Six is conclusively past the rule. The kernel is the natural landing point — extract once, migrate all six consumers in one PR, observe Phase-1 regressions stay green (AC-25 + AC-36).

  • Kernel location at tests/fixtures/_shape_test_kernel.py (above the portfolio/ subdirectory). Phase 1's tests/fixtures/node_typescript_helm/ fixture is OUTSIDE portfolio/; placing the kernel at tests/fixtures/portfolio/_shape_test_kernel.py would force Phase 1's shape test to import from a "portfolio" namespace it isn't part of, which is structurally awkward. The above-portfolio/ location lets all six consumers import from tests.fixtures._shape_test_kernel import ... symmetrically.

  • Kernel pattern choice — flat helpers vs. test factories. Two acceptable shapes:

      • Test factories (make_existence_test, make_parses_test, …): the kernel returns pytest-decorated test functions for module-level assignment in each consumer. Compact but unusual; pytest's natural module-level @pytest.mark.parametrize discovery is inverted.
      • Flat helpers (assert_file_exists(fixture, spec), assert_file_parses(fixture, spec), …): the kernel exposes pure helper functions; each consumer writes minimal @pytest.mark.parametrize("spec", _FILE_SPECS, ids=lambda s: s.relpath) def test_fixture_file_exists(spec): assert_file_exists(_FIXTURE, spec). More pytest-natural; mypy --strict-clean without ergonomic dance; the kernel is a clean functional core.

Validator recommends flat helpers — but factory-based is acceptable if cleaner per the implementer's read. Pick one and apply consistently across all six consumers; the AC's requirement is "structural logic lives in the kernel; consumers declare only data".

  • enumerate_tracked is the kernel's port for git ls-files. Hexagonal discipline: subprocess invocation is encapsulated; consumers receive tuple[str, ...] of relpaths. The kernel is the ONLY call site for run_allowlisted("git", "ls-files", str(fixture_path)) — no consumer shells out itself. This makes the kernel's I/O surface auditable in one place.
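
A minimal sketch of enumerate_tracked as a port, under stated assumptions. The real run_allowlisted signature and return type are not shown in this story, so the sketch substitutes a subprocess-based stand-in; the injectable run parameter and the sorting are illustration choices, not confirmed kernel behavior.

```python
import subprocess
from pathlib import Path
from typing import Callable

# The command-runner port: receives argv pieces, returns captured stdout.
Runner = Callable[..., str]


def _subprocess_runner(*argv: str) -> str:
    """Stand-in for the project's run_allowlisted (assumed, not the real API)."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def enumerate_tracked(
    fixture_path: Path, run: Runner = _subprocess_runner
) -> tuple[str, ...]:
    """Return the paths git reports as tracked under fixture_path.

    This is the single place that shells out; consumers only ever see the
    returned tuple. Output is sorted so parametrized test ids are stable.
    """
    stdout = run("git", "ls-files", str(fixture_path))
    return tuple(sorted(line for line in stdout.splitlines() if line))
```

Passing the runner as a parameter lets the kernel's own unit tests substitute a fake and avoid touching git. What the returned paths are relative to depends on the working directory the real runner uses, which this sketch does not settle.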

  • _FileSpec is a frozen NamedTuple. Immutability by construction (S2-03 precedent). Don't switch to dataclass(frozen=True) — the existing S7-01 shape tests are NamedTuple and the migration should be import-rewrite, not constructor-rewrite.

  • _fixture_regen_allowlist.py (S7-01) and _shape_test_kernel.py (this story) are SEPARATE flat modules. Different responsibilities — closed-set discovery + parametrized-test structure (kernel) vs. allowlist policy ownership for regenerate.sh invocations (regen-allowlist). Subsuming one into the other would conflate two cohesive responsibilities; keep them flat.

  • Why no node_modules/ under monorepo-pnpm/. The node_build_system probe (a Phase 1 probe that Phase 2 runs against this fixture) reads pnpm-lock.yaml; it does NOT read node_modules/. Committing node_modules/ would bloat the fixture by an order of magnitude AND introduce non-determinism (transitive-dep version-resolution drift). Probes that need the resolved tree obtain it through their Phase 3+ adapters, not through the file system.

  • scip-typescript version pin matters for _seed/scip-index.scip reproducibility. Pin the tool version used to build the seed binary (record in stale-scip/README.md per AC-22 + _readme_pins_scip_typescript_version predicate). When the production tool version updates (S4-03 records the production scip-typescript version pin), the seed binary may need a deliberate regen via the AC-21a seed-build ritual. The structural assertion (CommitsBehind.n >= 1) survives tool-version drift; the seed binary's bytes do not.
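
A minimal sketch of why the structural assertion is independent of the seed binary's bytes. Only CommitsBehind and its n attribute come from this story; the Fresh variant, the classify_freshness function, and the list-of-SHAs input are hypothetical simplifications of whatever ADR-0006's IndexFreshness actually defines.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Fresh:
    """Hypothetical variant: the index was built at HEAD."""


@dataclass(frozen=True)
class CommitsBehind:
    """The index lags HEAD by n commits."""

    n: int


def classify_freshness(
    history_newest_first: Sequence[str], last_indexed: str
) -> Union[Fresh, CommitsBehind]:
    """Classify freshness from commit ancestry alone.

    The position of last_indexed in the newest-first history is the number
    of commits HEAD is ahead of it. Raises ValueError if the indexed commit
    is not in the history at all.
    """
    n = list(history_newest_first).index(last_indexed)
    return Fresh() if n == 0 else CommitsBehind(n)
```

The classification takes only commit identifiers as input, so regenerating the seed binary with a newer scip-typescript changes the committed bytes without changing the result, as long as the recorded last-indexed commit stays behind HEAD.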

  • Why the binary SCIP lives in _seed/, not .codegenie/. The existing S4-02 stub treats .codegenie/ as a runtime-only directory (gitignored; regenerated). The seed bytes (template + binary) live in _seed/ (committed). This split keeps the "committed contract surface" cleanly separated from "runtime materialization output". S7-01's central no-committed-cache guard rests on this split.
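
A minimal sketch of a guard that depends on the _seed/ versus .codegenie/ split. The function name and the constant are hypothetical; the real S7-01 no-committed-cache guard is not shown in this story and may check more than this.

```python
from pathlib import PurePosixPath

# Runtime-only directory per the split described above (gitignored, regenerated).
RUNTIME_DIR = ".codegenie"


def committed_runtime_paths(tracked: tuple[str, ...]) -> tuple[str, ...]:
    """Return every tracked relpath that sits inside a runtime directory.

    An empty result means the committed surface contains seed material only.
    Matching on path components avoids false hits on names that merely
    contain the string, such as "docs/.codegenie-notes.md".
    """
    return tuple(
        relpath
        for relpath in tracked
        if RUNTIME_DIR in PurePosixPath(relpath).parts
    )
```

Fed the tuple of tracked relpaths for the stale-scip fixture, a shape test could assert that the result is empty: files under _seed/ pass, and anything materialized into .codegenie/ and then committed by accident is reported by path.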

  • Why the adversarial test does NOT consume the binary SCIP today. tests/adv/phase02/test_stale_scip_fixture.py reads .codegenie/context/raw/scip.json (materialized from _seed/scip-slice.template.json). The binary _seed/scip-index.scip is forward-looking for S4-03's ScipIndexProbe consumer. S7-02's binary-SCIP contribution is therefore NOT load-bearing for the current adversarial — it's load-bearing for the next-phase consumer. Document this carefully in README.md so a future maintainer doesn't conclude "the placeholder is fine because the adversarial passes against it."

  • Phase-3 handoff note (Risk #8). monorepo-pnpm/README.md explicitly names this as the Phase-3 entry-gate target. When Phase 3's author lands the first DepGraphAdapter implementation, they will smoke against this fixture's dep_graph slice. Any Protocol drift between Phase 2's Protocol shape and Phase 3's first implementation surfaces here (in addition to S7-04's test_phase3_handoff_smoke.py skip-and-unskip ritual).

Patterns DELIBERATELY deferred

  • Pre-built fixture caches under tests/fixtures/portfolio/_cache/ — out of scope; regen-each-run policy is what Phase 2 ships.
  • A YAML-based MANIFEST.yaml SSoT inside each fixture — Python-as-SSoT continues to work; lift only if a fourth consumer of the manifest appears (e.g., a build-system probe needing it at runtime).
  • A second SCIP indexer (e.g., scip-go) for the stale-scip fixture — out; Phase 2 fixtures are TypeScript-only. Phase 6+ may introduce a polyglot variant.
  • A git history visualization committed alongside the fixture — out; the README's prose is enough.