
Story S8-04 — Phase-3 handoff issues (project-board mirrors of existing Phase 3 stories) + docs/contributing.md Layer B–G addendum + Phase 2 README exit-criteria pointer

Step: Step 8 — Confidence section renderer + CI ratchet + advisory benches + Phase-3 handoff
Status: Done, GREEN 2026-05-18 (phase-story-executor; see _attempts/S8-04.md for the per-AC evidence table and the AC-6c manual mkdocs build --strict capture).

Zero src/codegenie/** edits, per the story invariant. The IssueSpec frozen Pydantic registry and the milestones_needed() pure helper live in scripts/_phase3_handoff_issues.py; scripts/file_phase3_handoff_issues.py is the impure shell (idempotent via title-dedupe plus body-diff gh issue edit; --project is optional, with a loud no-board warning per Rule 12). tests/unit/docs/ adds 23 tests covering AC-1..AC-11.

docs/contributing.md gains a new H3, ### Adding a Layer B/C/D/E/G probe (Phase 2 additions), UNDER the existing ## Adding a probe H2 (Rule 11: preserve the Phase-0 recipe). The Phase 2 README gains a ## Phase 2 exit-criteria — closed section that points at the canonical stories/README.md §"Exit-criteria coverage" table (no duplication), plus a G1–G10 [x] sign-off. In High-level-impl.md Step 8, the three boxes S8-04 owns are ticked [x]; AC-10b's intentionally soft warning flags that the other five Step-8 boxes (owned by S8-01/02/03, which shipped but did not tick these specific lines) remain [ ]. tests/adv/phase02/test_phase3_handoff_smoke.py is BLAKE3-frozen at 613f7f4e8102e2aa5f5ec0128c4da295191ac3ad5ca7ea8236a877979b886fc6; Phase 3's entry-gate review owns the unskip.

Gates: 23/23 unit/docs tests pass; mypy --strict, ruff check + format --check, fence, test_doc_consistency, lint-imports (2 kept, 0 broken), and mkdocs build --strict are all green; the full unit suite reports 3457 passed, 17 skipped, 1 xfailed.

One Rule-7 conflict surfaced in the attempt log: the spec asked issue #5's body to mention the wrong src/codegenie/exec.py path; the executor used only the correct src/codegenie/exec/__init__.py:96 and kept the negative-assertion test (same defensive coverage, no contradiction).

Effort: S
Depends on: S8-03 (eight CI jobs green on master).
ADRs honored: 02-ADR-0007 (no Plugin Loader in Phase 2 — Phase 3 ships loader + first plugin + adapters together); 02-ADR-0006 (IndexFreshness sum-type location — any drift by Phase 3 requires an ADR amendment to ADR-0006); 02-ADR-0001 (ALLOWED_BINARIES — Phase 3 extends with npm, jq via a fresh ADR or amendment); production ADR-0031 (plugin architecture — Phase 3 owns the loader); production ADR-0032 (language search adapters — Phase 3 owns the four adapter implementations); ADR-0008 (no event-stream — this story adds zero new structlog events).

Validation notes (2026-05-18 — phase-story-validator)

This story was hardened by phase-story-validator. The draft's Goal — Phase 2 close-out: project-board issues for Phase 3 work, a contributor cheat-sheet, and an exit-criteria sign-off — is sound. The prescriptions, however, contradicted master in nine places and prescribed mechanisms that would silently duplicate existing artifacts. Sixteen findings closed. Verdict: HARDENED.

  1. Redundancy with shipped Phase 3 stories. Phase 3 already has 47 designed stories under docs/phases/03-vuln-deterministic-recipe/stories/ — including S2-01-plugin-registry-kernel.md, S2-02-plugin-manifest-pydantic.md, S2-03-plugin-loader-integrity.md, S2-04-plugin-resolver-extends.md (the four-part Plugin Loader decomposition), S7-01-vuln-node-npm-plugin-scaffold.md (the npm plugin), S7-03-universal-hitl-fallback-plugin.md (the universal fallback). The original AC-1..AC-5 re-prescribed the work from scratch, creating a parallel canonical surface that would drift from the story files. Resolution: each handoff issue is now an explicit project-board mirror that links to the canonical Phase 3 story file(s); the issue body cites the story IDs and adds the Phase-2-handoff context (e.g., "the four Protocols Phase 2 froze are at src/codegenie/adapters/protocols.py"). No re-prescription.

  2. docs/contributing.md already has ## Adding a probe. Line 69 contains a 7-step recipe citing LanguageDetectionProbe (Phase 0). The draft prescribed a parallel H2 ## Adding a Layer B/C/D/E/G probe. Resolution (Rule 7 — surface the conflict, don't blend): add a subsection ### Adding a Layer B/C/D/E/G probe (Phase 2 additions) under the existing ## Adding a probe H2 (not a parallel H2). The subsection covers what Phase 2 added (@register_probe(heaviness=, runs_last=), run_external_cli for B/G vs run_allowlisted for C, @register_index_freshness_check, model_construct forbidden under output/) with Phase 2 probes as canonical examples. The Phase 0 recipe stays untouched.

  3. Exit-criteria duplication. stories/README.md §"Exit-criteria coverage" already has the complete mapping table. The draft prescribed a second copy in the phase README. Resolution: the phase README gets a small ## Phase 2 exit-criteria — closed section with a pointer to stories/README.md §"Exit-criteria coverage" (the canonical table) + a top-level [x] checklist of the high-level Phase-2 goals (G1–G10 from arch-design.md §"Goals"). No table duplication; one source of truth.

  4. Wrong file path. src/codegenie/exec.py does not exist; the correct path is src/codegenie/exec/__init__.py (exec is a package, not a module). The References section and the AC-5 body are corrected.

  5. GH Project board not verified. No board was ever verified to exist. The draft assumed gh issue create --project <board> would just work. Resolution: AC-1's pre-flight asserts gh api repos/:owner/:repo/milestones lists the Phase 3 milestone (creates it if missing — milestone is repo-scoped, not project-scoped, so it does not require a Project board). The --project flag is optional: the script accepts --project <name> for organizations that maintain a Project board; if omitted, the script files issues without a project association and prints a loud WARNING: no project board provided; issues filed without board association (Rule 12 — fail loud). No silent downgrade.

  6. Issue idempotency on re-run. The draft script would create 8 duplicate issues on a second run. Resolution: the script's first step is gh issue list --json title,body,number --search "[Phase 3]" --state all --limit 100, followed by dedupe-by-title; if a matching title already exists with the same milestone, the script updates the body via gh issue edit (only when the body differs) rather than creating a duplicate. AC-1b asserts that a second invocation of the script produces zero new issues and zero body changes (no-op idempotence).

  7. Mutation-weakness on AC-1 test. The draft's check ("body mentions ADR-0007") would pass against a near-empty body containing only that literal string. Resolution: AC-1's assertion is now structured: the body MUST contain all of {ADR-0007, ADR-0031, src/codegenie/adapters/protocols.py, S2-01-plugin-registry-kernel, S2-02-plugin-manifest-pydantic, S2-03-plugin-loader-integrity, S2-04-plugin-resolver-extends}, AND be at least 200 characters long, AND have structured "Phase 2 context", "Phase 3 stories", and "Acceptance" subsections. Mutation-resistant.

  8. mkdocs build --strict in unit test. The draft prescribed invoking it as a subprocess in test_contributing_cheatsheet.py::test_mkdocs_build_strict. That is slow + side-effectful + duplicates the existing make docs CI job (per CLAUDE.md make check chain) + creates a tmp build dir that hits disk. Resolution: unit test does grep-only — asserts the section heading exists, the seven steps are enumerated, and each canonical-example probe name appears. The mkdocs build --strict invocation stays in the existing docs CI workflow (make docs); a new unit test (tests/unit/docs/test_mkdocs_nav_includes_contributing.py) parses mkdocs.yml YAML and asserts contributing.md is in the nav tree (no subprocess). Faster, hermetic, equivalent coverage.

  9. AC-9 fragility. Draft prescribed updating the test's @pytest.mark.skip reason string to include the filed GitHub issue number. GH issue numbers can move on repo migrations; the existing reason already cites ADR-0007 + High-level-impl.md §Step 7 (more durable). Resolution: AC-9 dropped (no edit to test_phase3_handoff_smoke.py); the test's skip-reason stays as-is. The filed issue #4 body cites the test path; Phase 3's executor unskips at the entry-gate review and updates the skip-reason then.

  10. AC-8 cherry-picked subset. Draft filed backlog issues for "Open implementation questions" #2/#4/#5 with no justification for excluding #1/#3/#6/#7/#8. Reading stories/README.md §"Open implementation questions" shows #1, #3, #6, #7, #8 are already resolved (encoded in shipped stories — S1-02, S4-02/S7-02, S3-01, S7-04, S1-11 respectively); #2/#4/#5 are the actual open items needing future work. AC-8 now justifies the selection inline: "#2/#4/#5 are the three items still open per stories/README.md; #1/#3/#6/#7/#8 are resolved (citation inline)."

  11. AC-10 cross-story coupling. Draft AC-10 asserted every Step 8 done-criterion box is [x], but five of the eight boxes are closed by S8-01/S8-02/S8-03, so a single slip there would turn S8-04 red. Resolution: AC-10 is split into AC-10a (this story marks the three Step 8 boxes it owns: "All five Phase-3 handoff issues exist"; "docs/contributing.md builds in mkdocs build --strict"; "docs/phases/02-context-gather-layers-b-g/README.md checklist marked complete and committed") and AC-10b (a status check that verifies the S8-01/02/03 boxes are [x] as a read-only check, not a write). If S8-01/02/03 ship before S8-04 lands, AC-10b passes trivially. If they don't, AC-10b surfaces the dependency as data without blocking S8-04's own boxes.

  12. Off-by-one in Goal §2. Draft said "the four 'Decisions noted but not yet documented'" but listed three (#2/#4/#5). Resolution: corrected to "three" and added the inline justification per finding #10.

  13. Function name citation. Test function is test_phase3_adapter_handoff_smoke (file test_phase3_handoff_smoke.py). Multiple references corrected.

  14. Design-patterns: IssueSpec Pydantic model + typed registry. Encoding eight heterogeneous issue payloads as inline dicts in the script is primitive obsession (the anaemic-dict smell). Resolution: scripts/_phase3_handoff_issues.py exposes a Final tuple of frozen IssueSpec Pydantic models (title: str, milestone: MilestoneName, body: str, labels: frozenset[Label], phase3_stories: tuple[str, ...]). scripts/file_phase3_handoff_issues.py is the impure shell consuming the tuple. Open/Closed: a future handoff story adds a row to the tuple; the script logic is unchanged. Pure spec / impure I/O, per the CLAUDE.md convention.

  15. MilestoneName newtype. Draft uses raw str for milestone everywhere. Resolution: MilestoneName = NewType("MilestoneName", str) co-located in _phase3_handoff_issues.py (single-use; no need to land in codegenie.types.identifiers).

  16. AC-6 layer convention check. Draft prescribed "route external CLIs through run_external_cli (B/G) or run_allowlisted (C only)". A spot-check of src/codegenie/exec/__init__.py and the Layer C probes confirms that Layer C calls run_allowlisted directly (e.g., for git rev-parse), while run_external_cli is the wrapper for Layer B/G. The cheat-sheet text was correct; it is now codified as a verifiable assertion in test_contributing_cheatsheet.py that the section names both functions.
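A minimal sketch of the registry described in findings 14 and 15. It swaps the frozen Pydantic model for a stdlib frozen dataclass so it runs without Pydantic installed; the label strings, the abbreviated bodies, and the milestones_needed() signature are assumptions, and only the field names come from the story.

```python
from dataclasses import dataclass
from typing import Final, NewType

# Single-use NewType, co-located with the registry (finding 15).
MilestoneName = NewType("MilestoneName", str)

PHASE3: Final = MilestoneName("Phase 3 — Vuln remediation: deterministic recipe path")
BACKLOG: Final = MilestoneName("Backlog")


@dataclass(frozen=True)
class IssueSpec:
    """Stdlib stand-in for the frozen Pydantic IssueSpec model."""

    title: str
    milestone: MilestoneName
    body: str
    labels: frozenset[str]  # the real model types these as Label
    phase3_stories: tuple[str, ...] = ()


# Open/Closed: a future handoff story adds a row here; the shell is unchanged.
REGISTRY: Final[tuple[IssueSpec, ...]] = (
    IssueSpec(
        title="[Phase 3] Implement Plugin Loader: kernel + manifest parser"
        " + integrity loader + resolver",
        milestone=PHASE3,
        body="### Phase 2 context\n...",  # body abbreviated in this sketch
        labels=frozenset({"phase-3"}),  # hypothetical label name
        phase3_stories=("S2-01", "S2-02", "S2-03", "S2-04"),
    ),
    IssueSpec(
        title="[Backlog] SkillsLoader per-tier signing (Sigstore-style)",
        milestone=BACKLOG,
        body="...",
        labels=frozenset({"backlog"}),  # hypothetical label name
    ),
)


def milestones_needed(specs: tuple[IssueSpec, ...]) -> frozenset[MilestoneName]:
    """Pure helper: every milestone the impure shell must ensure exists."""
    return frozenset(spec.milestone for spec in specs)
```

Because the registry is pure data, unit tests can import it directly, and the impure shell only iterates it.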

Full critic findings + decision rationale archived at _validation/S8-04-phase3-handoff-and-docs.md.

Context

Phase 2 ships kernel-side scaffolding only: adapter Protocols, TCCMLoader, SkillsLoader, IndexFreshness, registration plumbing. The Plugin Loader itself, the universal (*, *, *) fallback plugin, and the first concrete plugin (plugins/vulnerability-remediation--node--npm/) are deliberately deferred to Phase 3 per ADR-0007 + ADR-0031 §Consequences §1. Phase 3 has already been fully designed: 47 stories under docs/phases/03-vuln-deterministic-recipe/stories/ carry the implementation prescription. This story is the project-board mirror: it files five GitHub handoff issues, each linking to the relevant Phase 3 story files and filed under the repo-level milestone Phase 3 — Vuln remediation: deterministic recipe path, so that Phase 3 starts with a fully loaded project-board view. No re-prescription.
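A hedged sketch of the dedupe-by-title, body-diff planning that keeps re-runs idempotent. It assumes the shell has already run gh issue list --json title,body,number (the flags named in AC-1b) and passes the JSON text in; Plan and plan_actions are hypothetical names, and milestone matching is omitted for brevity.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    create: tuple[str, ...]  # titles with no existing issue
    edit: tuple[int, ...]  # issue numbers whose body has drifted


def plan_actions(rendered: dict[str, str], gh_issue_list_json: str) -> Plan:
    """Decide creates/edits from rendered bodies (keyed by title) and gh's JSON."""
    existing = {item["title"]: item for item in json.loads(gh_issue_list_json)}
    create: list[str] = []
    edit: list[int] = []
    for title, body in rendered.items():
        match = existing.get(title)
        if match is None:
            create.append(title)
        elif match["body"] != body:
            edit.append(int(match["number"]))
        # Same title and same body: no gh call at all, so a second run is a no-op.
    return Plan(create=tuple(create), edit=tuple(edit))
```

Keeping the decision pure means the idempotency test can feed recorded JSON instead of monkeypatching every subprocess call.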

The handoff is also the moment to close the Phase 2 README's exit-criteria sign-off (a high-level [x] checklist pointing at stories/README.md §"Exit-criteria coverage" as the canonical mapping table) and extend docs/contributing.md's existing ## Adding a probe section with a Phase 2 subsection (Layer B/C/D/E/G additions: @register_probe(heaviness=, runs_last=), run_external_cli, @register_index_freshness_check, the model_construct ban under output/). The subsection uses Phase 2's now-shipped probes as canonical examples — IndexHealthProbe (B2), RuntimeTraceProbe (C), SemgrepProbe (G), SkillsIndexProbe (D), ConventionsProbe (D) — so a new probe author can copy a real probe and only edit what's task-specific.

The most load-bearing of the five issues is #4: unskip tests/adv/phase02/test_phase3_handoff_smoke.py at the Phase 3 entry-gate review. The test (function name test_phase3_adapter_handoff_smoke) currently carries @pytest.mark.skip(reason="enabled when Phase 3 plugin lands — see ... ADR-0007 ... High-level-impl.md §Step 7"). Unskipping forces re-verification that Phase 2's four adapter Protocols are imported unchanged. Any drift (e.g., Phase 3 discovers that consumers(self, pkg: str) should be consumers(self, pkg: PackageId, *, transitively: bool = False)) requires an explicit ADR amendment to 02-ADR-0006 or 02-ADR-0007, not a silent Protocol edit. This is the contract trip-wire that phase-arch-design.md §"Gap 1" identified; issue #4 is what makes Phase 3 honor it.
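To make the trip-wire concrete, here is a minimal sketch of how a signature snapshot can catch Protocol drift such as the consumers() change described above. DepGraphAdapter and the snapshot values are hypothetical stand-ins, not the real contents of src/codegenie/adapters/protocols.py, and the real smoke test may verify different things (this sketch ignores return annotations).

```python
import inspect
from typing import Any, Callable, Protocol


class DepGraphAdapter(Protocol):
    """Hypothetical stand-in for one of the four frozen adapter Protocols."""

    def consumers(self, pkg: str) -> list[str]: ...


Shape = tuple[tuple[str, str, Any], ...]


def signature_shape(func: Callable[..., Any]) -> Shape:
    """Reduce a method signature to (name, kind, annotation) triples."""
    params = inspect.signature(func).parameters.values()
    return tuple((p.name, p.kind.name, p.annotation) for p in params)


# Snapshot captured when Phase 2 froze the Protocol (hypothetical values).
FROZEN_CONSUMERS: Shape = (
    ("self", "POSITIONAL_OR_KEYWORD", inspect.Parameter.empty),
    ("pkg", "POSITIONAL_OR_KEYWORD", str),
)


def check_unchanged() -> None:
    # A mismatch means drift: file an ADR amendment instead of editing the snapshot.
    assert signature_shape(DepGraphAdapter.consumers) == FROZEN_CONSUMERS
```

Adding a keyword-only transitively parameter changes the shape, so the check fails loudly rather than letting the Protocol widen silently.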

This is the smallest story in Step 8 in code terms (zero new src/ code) and the largest in coordination terms (cross-phase contract handoff, GH issue automation, contributor docs).

References — where to look

Goal

Three deliverables, no production-code changes:

  1. File eight GitHub issues on the repo (and optionally on a Project board if one exists) — five handoff issues (Phase 3 work) and three backlog issues (post-Phase-3 open questions). Each handoff issue is a project-board mirror that links to the canonical Phase 3 story file(s); each backlog issue links to the relevant Open implementation questions row in stories/README.md. Use a typed IssueSpec Pydantic model in scripts/_phase3_handoff_issues.py (data registry) consumed by scripts/file_phase3_handoff_issues.py (impure shell). Idempotent on re-run (dedupe-by-title; gh issue edit for body updates).

  2. Extend docs/contributing.md's existing ## Adding a probe section (Phase 0/1 content, line 69) with a new subsection ### Adding a Layer B/C/D/E/G probe (Phase 2 additions). The subsection covers seven Phase-2-specific topics: (a) heaviness annotation via @register_probe(heaviness=, runs_last=); (b) run_external_cli for Layer B/G external CLIs vs run_allowlisted called directly for Layer C; (c) @register_index_freshness_check Open/Closed registration; (d) typed ProbeOutput.schema_slice via Pydantic, with model_construct banned under output/; (e) declared_inputs for cache keys (including special tokens like image-digest: per ADR-0004); (f) confidence reporting discipline ("high"|"medium"|"low" — facts, not judgments); (g) canonical Phase 2 examples (IndexHealthProbe, RuntimeTraceProbe, SemgrepProbe, SkillsIndexProbe, ConventionsProbe). The doc must pass mkdocs build --strict via the existing make docs CI job (this story does NOT invoke it from a unit test).

  3. Mark docs/phases/02-context-gather-layers-b-g/README.md's exit-criteria closed. Append a ## Phase 2 exit-criteria — closed section that (a) points at the canonical table in stories/README.md §"Exit-criteria coverage" (no duplication); (b) provides a top-level [x] checklist over the high-level Phase 2 goals (G1–G10 from arch-design.md §"Goals"), each line citing the story IDs that closed it (cross-checked against the canonical table).
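A minimal sketch of the shape check behind deliverable 3 (and AC-7): find the sign-off H2, confirm the pointer to the canonical table, and count exactly ten checked boxes. It assumes GitHub-style "- [x]" checkboxes and that the section ends at the next H2; h2_section and check_signoff are hypothetical names.

```python
import re

CHECKBOX = re.compile(r"^\s*[-*] \[( |x)\] ", re.MULTILINE)


def h2_section(markdown: str, title: str) -> str:
    """Return the text between '## <title>' and the next H2 (or end of file)."""
    marker = f"## {title}\n"
    start = markdown.index(marker) + len(marker)  # ValueError if the H2 is missing
    rest = markdown[start:]
    nxt = re.search(r"^## ", rest, re.MULTILINE)
    return rest[: nxt.start()] if nxt else rest


def check_signoff(markdown: str) -> None:
    body = h2_section(markdown, "Phase 2 exit-criteria — closed")
    marks = CHECKBOX.findall(body)
    assert "stories/README.md" in body, "canonical-table pointer missing"
    assert len(marks) == 10, f"expected 10 goal boxes (G1-G10), found {len(marks)}"
    assert all(m == "x" for m in marks), "a goal box is still unchecked"
```

Counting boxes inside the section only is what keeps the check from passing on a pasted copy of the full coverage table.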

Acceptance criteria

  • [ ] AC-1 (Handoff issue #1 — Plugin Loader + manifest parser + resolver — linked to Phase 3 stories S2-01..S2-04). A GitHub issue exists with title [Phase 3] Implement Plugin Loader: kernel + manifest parser + integrity loader + resolver and milestone Phase 3 — Vuln remediation: deterministic recipe path. Body MUST contain all of: (a) the literal substrings ADR-0007, ADR-0031, src/codegenie/adapters/protocols.py; (b) markdown links to all four Phase 3 story files: S2-01-plugin-registry-kernel.md, S2-02-plugin-manifest-pydantic.md, S2-03-plugin-loader-integrity.md, S2-04-plugin-resolver-extends.md; (c) a "Phase 2 context" H3 (≥ 50 chars), a "Phase 3 stories" H3 (the four links), and an "Acceptance" H3; (d) total body length >= 200 chars. tests/unit/docs/test_phase3_handoff_issues.py::test_issue_1_body_structured reads tests/unit/docs/_fixtures/issues.json (committed dry-run output) and asserts each of (a)–(d).
  • [ ] AC-1b (Idempotency — re-running the script is a no-op). The script's first step: gh issue list --json title,body,number --search "[Phase 3]" --state all --limit 100. For each IssueSpec in the registry, if a matching title exists, the script calls gh issue edit <num> --body-file ... ONLY if the existing body differs from the rendered body (string compare). tests/unit/docs/test_phase3_handoff_issues.py::test_idempotent_second_run simulates a second invocation against the fixture and asserts zero gh issue create calls + zero gh issue edit calls (when bodies match).
  • [ ] AC-1c (No-board graceful degradation + loud warning). The script accepts --project <board-name> as OPTIONAL. If absent, issues file without project association, and the script prints WARNING: no project board provided; issues filed without board association to stderr (Rule 12 — fail loud). If --project is provided but the name matches no board listed by gh project list, the script EXITS with code 2 and a loud error (an explicit --project flag is never silently downgraded). tests/unit/docs/test_phase3_handoff_issues.py::test_no_project_warning asserts both paths via subprocess monkeypatching.
  • [ ] AC-1d (Milestone pre-flight). The script's pre-flight asserts the Phase 3 — Vuln remediation: deterministic recipe path milestone exists via gh api repos/:owner/:repo/milestones. If missing, the script creates it (idempotent: a second creation attempt is a no-op via gh api ... --silent || true). test_milestone_preflight_creates_idempotently asserts the milestone API call sequence.
  • [ ] AC-2 (Handoff issue #2 — first plugin plugins/vulnerability-remediation--node--npm/ + four ADR-0032 adapter implementations — linked to S7-01 + S7-02). Body contains all of: ADR-0032, src/codegenie/adapters/protocols.py, markdown links to S7-01-vuln-node-npm-plugin-scaffold.md and S7-02-npm-recipes-and-adapters.md, an enumeration of the four implementations (dep_graph_npm.py, import_graph_node.py, scip_node.py, test_inventory_node.py), and citations to the Phase 2 fixtures (monorepo-pnpm, minimal-ts). AC-1's structured-payload pattern (>= 200 chars, three H3 sections) repeats. test_issue_2_body_structured.
  • [ ] AC-3 (Handoff issue #3 — universal (*, *, *) fallback plugin / HITL escalation — linked to S7-03). Body contains production/design.md §"Humans always merge", ADR-0031, link to S7-03-universal-hitl-fallback-plugin.md, and an explanation of when the fallback fires (no concrete plugin matches the (task-class, language, package-manager) triple). test_issue_3_body_structured.
  • [ ] AC-4 (Handoff issue #4 — LOAD-BEARING — unskip test_phase3_handoff_smoke.py at Phase 3 entry-gate review; explicit ADR-amendment requirement). Body contains the literal phrases: (a) Any Protocol drift requires an explicit ADR amendment to 02-ADR-0006 / 02-ADR-0007; (b) tests/adv/phase02/test_phase3_handoff_smoke.py (file path); (c) test_phase3_adapter_handoff_smoke (the actual function name); (d) phase-arch-design.md §"Gap 1"; (e) a numbered "Acceptance at Phase 3 entry-gate" list with at least 3 items (run the test, verify Protocols imported unchanged, file ADR amendment if drift). test_issue_4_body_load_bearing asserts all five literals.
  • [ ] AC-5 (Handoff issue #5 — extend ALLOWED_BINARIES for npm, jq via amendment ADR). Body contains the literal string src/codegenie/exec/__init__.py (NOT the wrong src/codegenie/exec.py), references 02-ADR-0001 as the precedent, names npm and jq as the only two additions, and explicitly forbids "while we're at it" binaries (Implementation risk #2). The body acknowledges the structural enforcement: the ALLOWED_BINARIES: frozenset[str] at exec/__init__.py:96 is the real guard; the issue body restating the discipline is documentation, not a substitute for the frozenset edit. test_issue_5_body_correct_path.
  • [ ] AC-6 (docs/contributing.md: ### Adding a Layer B/C/D/E/G probe (Phase 2 additions) subsection added UNDER the existing ## Adding a probe H2). The new content is an H3 subsection, NOT a parallel H2 (Rule 7 — surface the conflict, don't blend; one source-of-truth recipe with a Phase 2 addendum). The subsection covers the seven topics from Goal §2. Each topic names at least one canonical Phase 2 probe example. The existing 7-step recipe (Phase 0 LanguageDetectionProbe) is unedited. tests/unit/docs/test_contributing_cheatsheet.py::test_subsection_under_existing_h2 parses docs/contributing.md, asserts: (a) the existing ## Adding a probe H2 at line ~69 is untouched (byte-identical first 50 lines of the section); (b) the new ### Adding a Layer B/C/D/E/G probe (Phase 2 additions) H3 exists within that H2 (no parallel H2 introduced); (c) the H3 names all seven topics; (d) the H3 cites all five canonical probe examples (IndexHealthProbe, RuntimeTraceProbe, SemgrepProbe, SkillsIndexProbe, ConventionsProbe).
  • [ ] AC-6b (mkdocs nav unchanged + contributing.md reachable — no subprocess invocation in unit tests). A new unit test tests/unit/docs/test_mkdocs_nav_includes_contributing.py parses mkdocs.yml as YAML and asserts contributing.md appears in the nav tree (recursive search through nested lists). The mkdocs build --strict invocation stays in the existing make docs CI job (per CLAUDE.md make check chain); this story does NOT shell out to mkdocs from a unit test. AC-6c (manual ritual, captured in _attempts/S8-04.md): run make docs locally before opening the closing PR; capture exit 0 in the attempt log.
  • [ ] AC-7 (docs/phases/02-context-gather-layers-b-g/README.md: ## Phase 2 exit-criteria — closed section appended; POINTS at canonical table; high-level [x] checklist over G1–G10). The new section: (a) starts with a single paragraph pointing at stories/README.md §"Exit-criteria coverage" as the canonical mapping ("Canonical mapping table: see stories/README.md §Exit-criteria coverage."); (b) follows with a markdown checklist of the ten Phase 2 high-level goals (G1–G10 from phase-arch-design.md §"Goals"), each line [x] + one-sentence summary + the story IDs that closed it (cross-referenced against the canonical table); (c) ends with a sign-off line crediting the story IDs that close Step 8 (S8-01..S8-04). tests/unit/docs/test_phase2_readme_signoff.py parses the Phase 2 README, asserts: (i) the new H2 section exists; (ii) the canonical-table pointer line exists (literal substring match for the link); (iii) every checkbox in the section is [x], none [ ]; (iv) the checkbox count is exactly 10 (one per G1–G10) — NOT a duplicate of the full ~22-row table. No table duplication.
  • [ ] AC-8 (Backlog issues for the three OPEN open-questions — #2, #4, #5 — with inline justification for why the other five are excluded). Three backlog issues on the milestone Backlog (or Post-Phase-3):
      • [Backlog] Full-repo mypy --warn-unreachable rollout (per stories/README.md §"Open implementation questions" #2 — backlog item; the global flag in pyproject.toml:172 already covers the repo, but per-module override file-list audit is outstanding).
      • [Backlog] ExternalDocsProbe host-allowlist config schema (per #4 — first arises when a real user opts in; Phase-4-or-later).
      • [Backlog] SkillsLoader per-tier signing (Sigstore-style) (per #5 — Phase 14 multi-tenant concern).
    The _phase3_handoff_issues.py registry's docstring explicitly justifies the selection: # #1, #3, #6, #7, #8 are resolved by shipped stories (S1-02, S4-02/S7-02, S3-01, S7-04, S1-11); see stories/README.md §"Open implementation questions" inline citations. test_backlog_issues_justified asserts the docstring contains the justification literal AND the three backlog IssueSpecs are present.
  • [ ] AC-9 (No edit to test_phase3_handoff_smoke.py's skip reason — Phase 3 owns the update). This story explicitly does NOT modify tests/adv/phase02/test_phase3_handoff_smoke.py. The existing @pytest.mark.skip(reason=...) already cites ADR-0007 + High-level-impl.md §Step 7 (more durable than a GitHub issue number that may move on repo migration). Issue #4's body cites the file path; Phase 3's executor updates the skip-reason at the entry-gate review. tests/unit/docs/test_skip_reason_unchanged.py reads tests/adv/phase02/test_phase3_handoff_smoke.py, computes BLAKE3 of the file, and asserts it matches a frozen _EXPECTED_BLAKE3 constant captured at the time S8-04 lands. A future edit to the file triggers a loud test failure prompting an ADR review.
  • [ ] AC-10a (Step 8 done-criteria boxes owned by this story closed). docs/phases/02-context-gather-layers-b-g/High-level-impl.md §"Step 8 — Done criteria" — the THREE boxes this story owns are marked [x] and reference S8-04: "All five Phase-3 handoff issues exist on the GitHub Project board with milestones aligned to roadmap.md §Phase 3"; "docs/contributing.md builds in mkdocs build --strict and remains in curated nav"; "docs/phases/02-context-gather-layers-b-g/README.md checklist marked complete and committed". tests/unit/docs/test_step8_s8_04_boxes_closed.py::test_three_owned_boxes_checked asserts exactly these three boxes are [x] with the literal (S8-04) annotation alongside each.
  • [ ] AC-10b (Read-only verification — other Step 8 boxes closed by S8-01/02/03). A read-only assertion: the other five Step 8 done-criteria boxes are also [x] (closed by S8-01/02/03). tests/unit/docs/test_step8_other_boxes_closed.py is a SOFT assertion: if any [ ] remains in Step 8 it emits a UserWarning (via warnings.warn) instead of failing, and it carries an xfail annotation for isolated runs so that S8-01/02/03 slips cannot fail this story. The closing PR's manual checklist verifies the hard zero-[ ] state before merge.
  • [ ] AC-11 (mypy --strict + ruff + Phase 0 fence green; zero new src/ imports). This story changes zero src/codegenie/** files. mypy --strict scripts/file_phase3_handoff_issues.py scripts/_phase3_handoff_issues.py passes. ruff check + format --check green on all new/touched files. Phase 0 fence job stays green trivially (no LLM/network imports introduced; the script's gh invocation is via subprocess — a stdlib import, not a network library import).
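One way to make the issue-body checks in AC-1..AC-5 mutation-resistant, sketched under the assumption that bodies are plain GitHub markdown: require each literal, require each story file to appear as a markdown link target (not just as text), require the three H3 headings, and enforce the 200-character floor. structure_problems is a hypothetical helper, and the per-section 50-character check from AC-1(c) is omitted for brevity.

```python
import re

REQUIRED_H3 = ("Phase 2 context", "Phase 3 stories", "Acceptance")
LINK_TARGET = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
H3 = re.compile(r"^### (.+?)\s*$", re.MULTILINE)


def structure_problems(
    body: str, literals: tuple[str, ...], story_files: tuple[str, ...]
) -> list[str]:
    """Return every structural defect; an empty list means the body passes."""
    problems = [f"missing literal {lit!r}" for lit in literals if lit not in body]
    targets = LINK_TARGET.findall(body)
    problems += [
        f"no markdown link to {name!r}"
        for name in story_files
        if not any(t.endswith(name) for t in targets)
    ]
    headings = set(H3.findall(body))
    problems += [f"missing H3 {h!r}" for h in REQUIRED_H3 if h not in headings]
    if len(body) < 200:
        problems.append(f"body is {len(body)} chars; need >= 200")
    return problems
```

Returning a list of problems rather than asserting on the first one makes a red test report every gap in the fixture at once.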

Out of scope

  • Implementing any Phase 3 code. Plugin Loader, first plugin, adapters, npm/jq allowlist edits are all Phase 3 (covered by Phase 3 stories S2-01..S2-04, S7-01..S7-03).
  • Unskipping test_phase3_handoff_smoke.py. That action belongs to Phase 3's entry-gate review (covered by issue #4). AC-9 enforces this by BLAKE3-freezing the file.
  • Editing the four adapter Protocols at src/codegenie/adapters/protocols.py. Any drift is an ADR amendment, not silent code change (Implementation risk #8).
  • Duplicating the stories/README.md §"Exit-criteria coverage" table into the phase README. The phase README POINTS at the canonical table + provides a small G1–G10 sign-off checklist.
  • Adding a parallel ## Adding a Layer B/C/D/E/G probe H2 in docs/contributing.md. The new content is a SUBSECTION (H3) under the existing ## Adding a probe H2.
  • Filing GitHub issues for already-resolved open questions (#1, #3, #6, #7, #8). The script's docstring justifies the selection inline.
  • Updating test_phase3_handoff_smoke.py's skip-reason text. Phase 3's entry-gate review owns that edit.
  • Invoking mkdocs build --strict from a unit test. The existing make docs CI job covers this; a unit subprocess is slow + duplicative.
  • Creating a GitHub Project board. If one exists, the script uses it via --project; if not, issues file without board association + a loud warning.
  • Adding a "Phase 2 retrospective" document. Useful, but not required by the roadmap; if the team wants one, a separate ticket.
  • Migrating docs/contributing.md to a new doc system. Stay in mkdocs.
  • Editing roadmap.md to mark Phase 2 done. Mechanical, separate commit on the closing PR.
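For reference, the freeze that AC-9 relies on can be sketched as a digest comparison against a constant captured when the story lands. The story specifies BLAKE3, which hashlib does not provide (the third-party blake3 package supplies it, e.g. blake3.blake3(data).hexdigest()); to stay stdlib-only, this sketch substitutes hashlib.blake2b, and file_digest/assert_frozen are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    # Stand-in: the real test computes BLAKE3, not BLAKE2b.
    return hashlib.blake2b(path.read_bytes()).hexdigest()


def assert_frozen(path: Path, expected: str) -> None:
    actual = file_digest(path)
    assert actual == expected, (
        f"{path} changed (digest {actual}). Unskipping or editing it belongs to "
        "Phase 3's entry-gate review; update the frozen constant only after that review."
    )
```

Hashing raw bytes means even a whitespace-only edit trips the test, which is the intended level of strictness for a contract trip-wire.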

Files to touch

New:

  • tests/unit/docs/__init__.py — empty.
  • tests/unit/docs/test_phase3_handoff_issues.py — AC-1, AC-1b, AC-1c, AC-1d, AC-2, AC-3, AC-4, AC-5, AC-8. Reads from a generated tests/unit/docs/_fixtures/issues.json (committed; produced by the script's --dry-run mode).
  • tests/unit/docs/test_contributing_cheatsheet.py — AC-6 (grep-only; no subprocess).
  • tests/unit/docs/test_mkdocs_nav_includes_contributing.py — AC-6b.
  • tests/unit/docs/test_phase2_readme_signoff.py — AC-7.
  • tests/unit/docs/test_skip_reason_unchanged.py — AC-9 (BLAKE3 freeze).
  • tests/unit/docs/test_step8_s8_04_boxes_closed.py — AC-10a.
  • tests/unit/docs/test_step8_other_boxes_closed.py — AC-10b (soft assertion).
  • scripts/_phase3_handoff_issues.py — typed IssueSpec Pydantic frozen model + MilestoneName newtype + Final tuple of 8 IssueSpec instances (5 handoff + 3 backlog) + docstring justifying the open-question selection.
  • scripts/file_phase3_handoff_issues.py — impure shell consuming the registry. Flags: --project <name> (optional), --dry-run (writes fixture to tests/unit/docs/_fixtures/issues.json), default = live. Idempotent via title-dedupe + body-diff gh issue edit.
  • tests/unit/docs/_fixtures/issues.json — committed dry-run output; the unit tests read this rather than hitting GH live.
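The flag contract for the impure shell (optional --project, --dry-run, live by default) and the AC-1c behaviour can be sketched with argparse. How the real script discovers known boards (presumably from gh project list output) is not shown; here the board set is passed in, and parse_args/resolve_board are hypothetical names.

```python
import argparse
import sys
from typing import Optional

NO_BOARD_WARNING = (
    "WARNING: no project board provided; issues filed without board association"
)


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="file_phase3_handoff_issues.py")
    parser.add_argument("--project", default=None, help="optional Project board name")
    parser.add_argument(
        "--dry-run", action="store_true", help="write the fixture instead of calling gh"
    )
    return parser.parse_args(argv)


def resolve_board(project: Optional[str], known_boards: frozenset[str]) -> Optional[str]:
    if project is None:
        # Rule 12: degrade loudly, never silently.
        print(NO_BOARD_WARNING, file=sys.stderr)
        return None
    if project not in known_boards:
        # An explicit --project that matches nothing is an error, not a downgrade.
        print(f"ERROR: project board {project!r} not found", file=sys.stderr)
        raise SystemExit(2)
    return project
```

The asymmetry is deliberate: omitting the flag is a supported mode, while a typo in an explicit flag should stop the run.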

Modified:

  • docs/contributing.md — append H3 ### Adding a Layer B/C/D/E/G probe (Phase 2 additions) UNDER the existing ## Adding a probe H2. The existing 7-step recipe is unedited.
  • docs/phases/02-context-gather-layers-b-g/README.md — append ## Phase 2 exit-criteria — closed section with canonical-table pointer + G1–G10 [x] checklist + Step 8 sign-off line.
  • docs/phases/02-context-gather-layers-b-g/High-level-impl.md — mark the THREE Step 8 done-criterion boxes this story owns as [x] (S8-04). (Other steps' done-criteria are closed by their own stories; this story closes only its three.)
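A hedged sketch of the nesting check behind AC-6(b) for the contributing.md change: walk the headings in order, skip fenced code (where "# comment" lines would otherwise look like H1s), and report which H2 encloses the new H3. parent_h2 is a hypothetical helper; the real test may parse the document differently.

```python
from typing import Optional

FENCE = "`" * 3  # a Markdown code-fence opener/closer


def parent_h2(markdown: str, h3_title: str) -> Optional[str]:
    """Return the title of the H2 enclosing the given H3, ignoring fenced code."""
    current_h2: Optional[str] = None
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '# comment' lines in code blocks are not headings
        if line.startswith("## "):
            current_h2 = line[3:].strip()
        elif line.startswith("# "):
            current_h2 = None  # a new H1 closes any open H2
        elif line.startswith("### ") and line[4:].strip() == h3_title:
            return current_h2
    return None
```

If the new content were added as a parallel H2 instead, the H3 lookup would return None, which is the failure AC-6(b) is meant to catch.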

Untouched (DO NOT EDIT):

  • src/codegenie/adapters/protocols.py (Implementation risk #8 — Protocol shape is Phase 3's discovery; any drift is ADR amendment).
  • src/codegenie/exec/__init__.py (Phase 3 owns the npm/jq extension).
  • Any Phase 2 production src/ code under src/codegenie/.
  • tests/adv/phase02/test_phase3_handoff_smoke.py (AC-9 enforces — BLAKE3 frozen).
  • docs/phases/02-context-gather-layers-b-g/stories/README.md (canonical exit-criteria coverage table; the phase README POINTS at it).
  • roadmap.md §"Phase 3" itself (this story files issues against the milestone; the roadmap text is unchanged).
  • mkdocs.yml (nav already includes contributing; AC-6b verifies, does NOT edit).
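The recursive nav search that AC-6b describes might look like this sketch, which walks the already-parsed value of the nav key (lists of page paths or single-key title-to-page/section maps). Note that yaml.safe_load rejects !!python/name: tags, which some mkdocs.yml files use for Markdown extensions, so the real test may need a more tolerant loader; nav_contains is a hypothetical name.

```python
from typing import Any


def nav_contains(node: Any, target: str) -> bool:
    """Depth-first search through a parsed mkdocs `nav` value."""
    if isinstance(node, str):
        return node == target
    if isinstance(node, list):
        return any(nav_contains(item, target) for item in node)
    if isinstance(node, dict):
        # Each entry maps a title to a page path or to a nested section list.
        return any(nav_contains(value, target) for value in node.values())
    return False
```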

TDD plan — red / green / refactor

RED (failing tests committed first):

  1. test_phase3_handoff_issues.py::test_issue_1_body_structured — reads _fixtures/issues.json, asserts the three literal substrings + four story-link substrings + three H3 sections + body length >= 200. Fails red.
  2. test_phase3_handoff_issues.py::test_issue_4_body_load_bearing — asserts the five literal phrases for issue #4 (most load-bearing). Fails red.
  3. test_phase3_handoff_issues.py::test_issue_5_body_correct_path — asserts the literal src/codegenie/exec/__init__.py (NOT exec.py). Fails red. Guards against the original draft's wrong path.
  4. test_phase3_handoff_issues.py::test_idempotent_second_run — simulates two consecutive invocations via subprocess monkeypatching against the fixture; asserts zero create + zero edit calls on the second run. Fails red.
  5. test_phase3_handoff_issues.py::test_no_project_warning — invokes the script without --project; asserts stderr contains the literal WARNING: no project board provided. With an unknown --project bogus, asserts exit code 2 + loud error. Fails red.
  6. test_phase3_handoff_issues.py::test_milestone_preflight_creates_idempotently — asserts the gh api repos/:owner/:repo/milestones call is issued, the milestone is created if missing, and it is NOT created again on a second run. Fails red.
  7. test_phase3_handoff_issues.py::test_backlog_issues_justified — asserts the script's docstring contains the inline justification literal AND the three backlog IssueSpecs exist with [Backlog] title prefix. Fails red.
  8. test_contributing_cheatsheet.py::test_subsection_under_existing_h2 — parses docs/contributing.md, asserts: existing ## Adding a probe H2 byte-identical first 50 lines; new H3 nested within; H3 names seven topics + five canonical probes. Fails red.
  9. test_mkdocs_nav_includes_contributing.py::test_contributing_in_nav_tree — parses mkdocs.yml, recursive nav search; asserts contributing.md present. Fails red if removed.
  10. test_phase2_readme_signoff.py::test_signoff_section_well_formed — asserts new H2 exists + canonical-table pointer substring + exactly 10 checkboxes all [x] + Step 8 sign-off line citing S8-01..S8-04. Fails red.
  11. test_skip_reason_unchanged.py::test_blake3_frozen — asserts BLAKE3 of tests/adv/phase02/test_phase3_handoff_smoke.py matches _EXPECTED_BLAKE3. Fails red if file changes.
  12. test_step8_s8_04_boxes_closed.py::test_three_owned_boxes_checked — asserts the three S8-04-owned boxes in High-level-impl.md §Step 8 are [x] (S8-04). Fails red.
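  13. test_step8_other_boxes_closed.py::test_other_boxes_warn_if_unchecked — soft assertion: emits warnings.warn(...) for each unchecked Step 8 box, which surfaces in pytest's warnings summary (not pytest.warns, which fails the test when no warning is raised); never hard-fails. Catches S8-01/02/03 slips as a visible signal.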
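Here is a sketch of the soft check behind test 13. It assumes the Step 8 done-criteria are markdown checkbox lines and that the test has already sliced out the §Step 8 text. Both helper names are hypothetical. The sketch calls warnings.warn, which reports without failing, rather than pytest.warns, which turns a missing warning into a failure.

```python
import re
import warnings

# A markdown checkbox line such as "- [ ] CI ratchet green" or "- [x] Issues filed (S8-04)".
_BOX = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*)$")


def unchecked_boxes(section_text: str) -> list[str]:
    """Return the label of every unchecked checkbox in a section."""
    pending: list[str] = []
    for line in section_text.splitlines():
        match = _BOX.match(line)
        if match is not None and match.group(1) == " ":
            pending.append(match.group(2))
    return pending


def warn_on_unchecked(section_text: str) -> int:
    """Soft check: one UserWarning per unchecked box, never an assertion failure."""
    pending = unchecked_boxes(section_text)
    for label in pending:
        warnings.warn(f"Step 8 box still unchecked: {label}", UserWarning, stacklevel=2)
    return len(pending)
```

Inside the pytest function, calling warn_on_unchecked(section) is enough. pytest records the warnings in its summary, and the test still passes.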

GREEN (minimum code to pass):

  1. Write scripts/_phase3_handoff_issues.py with the IssueSpec Pydantic frozen model, MilestoneName = NewType("MilestoneName", str), and a Final[tuple[IssueSpec, ...]] of 8 entries (5 handoff + 3 backlog). Module docstring includes the open-question justification literal.
  2. Write scripts/file_phase3_handoff_issues.py as the impure shell: --dry-run writes fixture, default = live with title-dedupe + body-diff gh issue edit, --project <name> optional with loud no-board warning.
  3. Run python scripts/file_phase3_handoff_issues.py --dry-run and commit the generated tests/unit/docs/_fixtures/issues.json.
  4. Append the ### Adding a Layer B/C/D/E/G probe (Phase 2 additions) H3 under the existing ## Adding a probe H2 in docs/contributing.md.
  5. Append the ## Phase 2 exit-criteria — closed section to the phase README (canonical-table pointer + G1–G10 checklist + Step 8 sign-off).
  6. Mark the three S8-04-owned boxes in High-level-impl.md §Step 8 as [x] (S8-04).
  7. Compute BLAKE3 of tests/adv/phase02/test_phase3_handoff_smoke.py and pin it into _EXPECTED_BLAKE3 in test_skip_reason_unchanged.py.
  8. Run make docs locally; capture mkdocs build --strict exit 0 in _attempts/S8-04.md (AC-6c manual ritual).
  9. Run the file-issue script live (--no-dry-run, with --project if a board exists) and capture issue numbers in _attempts/S8-04.md for the PR description. (This step is run by the human merging the closing PR, NOT by the executor — labelled OPERATOR-RUN.)
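This sketch shows how the impure shell could handle the flags described in GREEN step 2. It assumes argparse. BooleanOptionalAction (Python 3.9+) provides both --dry-run and --no-dry-run, with a live run as the default. The known_projects set is a hypothetical input; the real script would presumably get the board list from gh.

```python
import argparse
import sys
from collections.abc import Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="File Phase 3 handoff issues.")
    # Provides --dry-run and --no-dry-run; default False means a live run.
    parser.add_argument("--dry-run", action=argparse.BooleanOptionalAction, default=False)
    parser.add_argument("--project", default=None, help="GitHub Project board name (optional)")
    return parser


def resolve_project(argv: Sequence[str], known_projects: frozenset[str]) -> argparse.Namespace:
    """Parse argv; warn loudly without --project, exit 2 on an unknown board."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.project is None:
        print(
            "WARNING: no project board provided; filing issues without project association.",
            file=sys.stderr,
        )
    elif args.project not in known_projects:
        # ArgumentParser.error prints usage + message to stderr and exits with status 2.
        parser.error(f"unknown --project {args.project!r}")
    return args
```

The unknown-board case uses parser.error because argparse already prints the usage line to stderr and exits with status 2, which is the behavior test 5 asserts.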

REFACTOR:

  • Confirm _phase3_handoff_issues.py is pure data (the Final tuple + the IssueSpec model); no gh import, no subprocess import, no os import.
  • Confirm file_phase3_handoff_issues.py is the only file invoking subprocess.run(["gh", ...]).
  • Validate the JSON fixture round-trips via json.loads(Path(...).read_text()).
  • Double-check no PII / no internal hostnames leaked into issue bodies (Rule 12).
  • mypy --strict scripts/, ruff format, ruff check clean.
  • _attempts/S8-04.md captures the make docs exit 0 and, once the operator has run the script live, a gh issue list output snapshot showing the 8 issues with correct milestones.
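The first two REFACTOR checks can be mechanized with a static scan. This sketch parses a module with ast and reports any forbidden top-level imports. gh is reachable only through subprocess, so banning subprocess also keeps gh calls out of the pure module. The helper name is hypothetical, and dynamic imports (importlib, __import__) are not detected.

```python
import ast

FORBIDDEN_MODULES = frozenset({"subprocess", "os"})


def forbidden_imports(source: str, forbidden: frozenset[str] = FORBIDDEN_MODULES) -> set[str]:
    """Return the names from forbidden that source imports (compared by top-level package)."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module is not None:
            names = [node.module]
        else:
            continue
        # "os.path" counts as "os": compare only the top-level package name.
        found.update(name.split(".")[0] for name in names if name.split(".")[0] in forbidden)
    return found
```

A unit test could then assert forbidden_imports(Path("scripts/_phase3_handoff_issues.py").read_text()) == set().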

Notes for the implementer

  • Issue #4 is the load-bearing one. The other four issues are operational handoff; #4 is the contract trip-wire — without it, Phase 3 can silently drift the four Protocols and Phase 2's typing guarantee evaporates. Treat the wording with care: name the ADR amendment requirement explicitly. The Step 8 PR review must verify this issue's body before merge.
  • Issues mirror stories, not replace them. Each handoff issue body links to the canonical Phase 3 story file(s). The story file is the implementation prescription; the GitHub issue is the project-board notification surface. Resist the urge to inline-copy the story's ACs into the issue body — that creates two sources of truth.
  • scripts/_phase3_handoff_issues.py is pure data; scripts/file_phase3_handoff_issues.py is the impure shell. This split is the functional-core / imperative-shell convention per CLAUDE.md. Tests load the pure registry directly; the impure script is integration-tested via the --dry-run → fixture path.
  • Idempotency matters. The script will be re-run by a future contributor (e.g., to refresh issue bodies after a Phase 3 story is reorganized). Title-dedupe + body-diff gh issue edit is the convention. Never gh issue create without checking for an existing match.
  • GH Project board reuse. If a codewizard-sherpa Project board exists, the closing-PR operator passes --project codewizard-sherpa. If not, the script files the issues without project association and prints a loud warning. Either way, AC-1d's idempotent pre-flight creates the milestone "Phase 3 — Vuln remediation: deterministic recipe path" if it does not already exist.
  • Issue labels. Apply phase:3, handoff:from-phase-2, plus one of loader/plugin/fallback/smoke/allowlist for the five primary issues. Backlog issues get backlog + the relevant area label (mypy, external-docs, skills).
  • docs/contributing.md already has ## Adding a probe. Phase 0's 7-step recipe with LanguageDetectionProbe is the Phase 0/1 generic guidance; the new H3 is the Phase 2 addendum. Do NOT edit the existing H2 — append the H3 below it. Match existing heading depth + style (Rule 11 — match codebase conventions).
  • Phase 2 README's exit-criteria section POINTS at the canonical table. Do NOT duplicate stories/README.md §"Exit-criteria coverage" into the phase README. Duplication will drift. The phase README's section is a high-level G1–G10 sign-off + a link to the canonical mapping.
  • Don't unskip test_phase3_handoff_smoke.py. The unskip is Phase 3's first commit to that test and belongs to Phase 3's entry-gate review. If a reviewer asks "why don't we just unskip it now?", the answer is ADR-0007 / Implementation risk #8: Phase 2 has zero implementations of the Protocols, so unskipping in Phase 2 verifies nothing because there is no concrete adapter to verify against. AC-9 enforces this with a BLAKE3 freeze.
  • AC-10b is intentionally soft. S8-04 lands after S8-01..S8-03; if one of those slips, AC-10b warns but does not hard-fail. The closing PR's manual checklist is the hard gate. This avoids coupling S8-04's executor pass to the other stories' completion.
  • The OPERATOR-RUN GREEN step (live gh issue creation). The executor runs --dry-run and commits the fixture. The actual live-run against GitHub is a human operator step at PR merge time. Document this clearly in _attempts/S8-04.md; do NOT have the executor authenticate to GitHub.
  • No PII / no internal hostnames in issue bodies. The bodies are public on GitHub; review carefully before live-run.
  • Mark roadmap.md §"Phase 2" complete in a separate commit on the closing PR, not in this story. Mechanical, no test coverage.
  • Phase 0 fence stays green trivially: this story introduces zero new src/ imports.
  • Open-questions selection rationale. AC-8 files backlog issues for the three still-OPEN items in the open-questions list (#2, #4, #5; these are question numbers, not handoff issues #4 and #5). The other five (#1, #3, #6, #7, #8) are resolved by shipped stories, per the inline citations in stories/README.md §"Open implementation questions". The script's docstring carries this justification so a future reader of _phase3_handoff_issues.py understands the selection without re-reading the README.
  • Rule 2 vs IssueSpec registry. Eight issue payloads is past the rule-of-three threshold and heterogeneous enough that a typed registry is justified (open/closed: future handoff stories add rows to the Final tuple; the script logic is unchanged). The IssueSpec model carries title, milestone, body, labels, phase3_stories — the data shape eliminates dict-shuffling drift.
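The notes above describe a functional core (a pure registry plus helpers) and an impure shell. The sketch below illustrates that core. A frozen stdlib dataclass stands in for the frozen Pydantic IssueSpec model, with the same fields. The registry rows, titles, bodies and story paths are hypothetical placeholders, not the real eight entries. milestones_needed mirrors the pure helper named in the story, but its signature here is assumed. plan is a hypothetical name for the title-dedupe + body-diff decision. The shell would carry out that decision with gh issue create or gh issue edit.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Final, Literal, NewType

MilestoneName = NewType("MilestoneName", str)
Action = Literal["create", "edit", "skip"]


@dataclass(frozen=True)
class IssueSpec:
    """Stdlib stand-in for the frozen Pydantic IssueSpec model (same fields)."""

    title: str
    milestone: MilestoneName
    body: str
    labels: tuple[str, ...]
    phase3_stories: tuple[str, ...]


PHASE3: Final = MilestoneName("Phase 3 — Vuln remediation: deterministic recipe path")

# Illustrative rows only: titles, bodies and story paths are hypothetical placeholders.
REGISTRY: Final[tuple[IssueSpec, ...]] = (
    IssueSpec(
        title="Adapter Protocol drift requires an ADR amendment",
        milestone=PHASE3,
        body="Links: hypothetical/phase3-story-a.md",
        labels=("phase:3", "handoff:from-phase-2", "plugin"),
        phase3_stories=("hypothetical/phase3-story-a.md",),
    ),
    IssueSpec(
        title="[Backlog] Hypothetical open question",
        milestone=PHASE3,
        body="Links: hypothetical/phase3-story-b.md",
        labels=("backlog", "mypy"),
        phase3_stories=("hypothetical/phase3-story-b.md",),
    ),
)


def milestones_needed(specs: Iterable[IssueSpec]) -> tuple[MilestoneName, ...]:
    """Distinct milestones in first-seen order, so the pre-flight creates each at most once."""
    return tuple(dict.fromkeys(spec.milestone for spec in specs))


def plan(specs: Iterable[IssueSpec], existing_bodies: dict[str, str]) -> list[tuple[Action, IssueSpec]]:
    """Title-dedupe + body-diff: create unknown titles, edit changed bodies, skip the rest."""
    actions: list[tuple[Action, IssueSpec]] = []
    for spec in specs:
        current = existing_bodies.get(spec.title)
        if current is None:
            actions.append(("create", spec))
        elif current != spec.body:
            actions.append(("edit", spec))
        else:
            actions.append(("skip", spec))
    return actions
```

Because plan is pure, the idempotency requirement (a second run makes zero create and zero edit calls) reduces to asserting that every planned action is "skip" once the existing bodies match the registry.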