Story S8-04 — Phase-3 handoff issues (project-board mirrors of existing Phase 3 stories) + docs/contributing.md Layer B–G addendum + Phase 2 README exit-criteria pointer
Step: Step 8 — Confidence section renderer + CI ratchet + advisory benches + Phase-3 handoff
Status: Done — GREEN 2026-05-18 (phase-story-executor; see _attempts/S8-04.md for the per-AC evidence table + AC-6c manual mkdocs build --strict capture). Zero src/codegenie/** edits per the story invariant. The IssueSpec frozen Pydantic registry + milestones_needed() pure helper ride in scripts/_phase3_handoff_issues.py; scripts/file_phase3_handoff_issues.py is the impure shell (idempotent via title-dedupe + body-diff gh issue edit; --project optional with loud no-board warning per Rule 12). tests/unit/docs/ adds 23 tests covering AC-1..AC-11. docs/contributing.md gains a new H3 ### Adding a Layer B/C/D/E/G probe (Phase 2 additions) UNDER the existing ## Adding a probe H2 (Rule 11 — preserve Phase-0 recipe). Phase 2 README gains a ## Phase 2 exit-criteria — closed section pointing at the canonical stories/README.md §"Exit-criteria coverage" table (no duplication) + a G1–G10 [x] sign-off. High-level-impl.md Step 8: three boxes ticked [x] (S8-04); AC-10b's intentionally-soft warning catches that the other five Step-8 boxes (owned by S8-01/02/03 — those stories shipped but did not tick these specific lines) remain [ ]. tests/adv/phase02/test_phase3_handoff_smoke.py BLAKE3-frozen at 613f7f4e8102e2aa5f5ec0128c4da295191ac3ad5ca7ea8236a877979b886fc6; Phase 3's entry-gate review owns the unskip. Gates: 23/23 unit/docs tests pass; mypy --strict, ruff check+format --check, fence, test_doc_consistency, lint-imports (2 kept, 0 broken), mkdocs build --strict all green; full unit suite 3457 passed, 17 skipped, 1 xfailed. One Rule-7 surface in the attempt log: the spec asked issue #5's body to mention the wrong src/codegenie/exec.py path; the executor used only the correct src/codegenie/exec/__init__.py:96 and kept the negative-assertion test (same defensive coverage, no contradiction).
Effort: S
Depends on: S8-03 (eight CI jobs green on master).
ADRs honored: 02-ADR-0007 (no Plugin Loader in Phase 2 — Phase 3 ships loader + first plugin + adapters together); 02-ADR-0006 (IndexFreshness sum-type location — any drift by Phase 3 requires an ADR amendment to ADR-0006); 02-ADR-0001 (ALLOWED_BINARIES — Phase 3 extends with npm, jq via a fresh ADR or amendment); production ADR-0031 (plugin architecture — Phase 3 owns the loader); production ADR-0032 (language search adapters — Phase 3 owns the four adapter implementations); ADR-0008 (no event-stream — this story adds zero new structlog events).
Validation notes (2026-05-18 — phase-story-validator)
This story was hardened by phase-story-validator. The draft's Goal — Phase 2 close-out: project-board issues for Phase 3 work, a contributor cheat-sheet, and an exit-criteria sign-off — is sound. The prescriptions, however, contradicted master in nine places and prescribed mechanisms that would silently duplicate existing artifacts. Sixteen findings closed. Verdict: HARDENED.
1. Redundancy with shipped Phase 3 stories. Phase 3 already has 47 designed stories under docs/phases/03-vuln-deterministic-recipe/stories/, including S2-01-plugin-registry-kernel.md, S2-02-plugin-manifest-pydantic.md, S2-03-plugin-loader-integrity.md, S2-04-plugin-resolver-extends.md (the four-part Plugin Loader decomposition), S7-01-vuln-node-npm-plugin-scaffold.md (the npm plugin), and S7-03-universal-hitl-fallback-plugin.md (the universal fallback). The original AC-1..AC-5 re-prescribed the work from scratch, creating a parallel canonical surface that would drift from the story files. Resolution: each handoff issue is now an explicit project-board mirror that links to the canonical Phase 3 story file(s); the issue body cites the story IDs and adds the Phase-2-handoff context (e.g., "the four Protocols Phase 2 froze are at src/codegenie/adapters/protocols.py"). No re-prescription.
2. docs/contributing.md already has ## Adding a probe. Line 69 contains a 7-step recipe citing LanguageDetectionProbe (Phase 0). The draft prescribed a parallel H2 ## Adding a Layer B/C/D/E/G probe. Resolution (Rule 7: surface the conflict, don't blend): add a subsection ### Adding a Layer B/C/D/E/G probe (Phase 2 additions) under the existing ## Adding a probe H2 (not a parallel H2). The subsection covers what Phase 2 added (@register_probe(heaviness=, runs_last=), run_external_cli for B/G vs run_allowlisted for C, @register_index_freshness_check, model_construct forbidden under output/) with Phase 2 probes as canonical examples. The Phase 0 recipe stays untouched.
3. Exit-criteria duplication. stories/README.md §"Exit-criteria coverage" already has the complete mapping table. The draft prescribed a second copy in the phase README. Resolution: the phase README gets a small ## Phase 2 exit-criteria — closed section with a pointer to stories/README.md §"Exit-criteria coverage" (the canonical table) + a top-level [x] checklist of the high-level Phase-2 goals (G1–G10 from arch-design.md §"Goals"). No table duplication; one source of truth.
4. Wrong file path. src/codegenie/exec.py does not exist; the path is src/codegenie/exec/__init__.py (exec is a package, not a module). References line + AC-5 body corrected.
5. GH Project board not verified. No board was ever verified to exist. The draft assumed gh issue create --project <board> would just work. Resolution: AC-1's pre-flight asserts gh api repos/:owner/:repo/milestones lists the Phase 3 milestone (and creates it if missing; a milestone is repo-scoped, not project-scoped, so it does not require a Project board). The --project flag is optional: the script accepts --project <name> for organizations that maintain a Project board; if omitted, the script files issues without a project association and prints a loud WARNING: no project board provided; issues filed without board association (Rule 12: fail loud). No silent downgrade.
6. Issue idempotency on re-run. The draft script would create 8 duplicate issues on a second run. Resolution: the script's first step is gh issue list --json title --search "[Phase 3]" --state all --limit 100 → dedupe-by-title; if a matching title already exists with the same milestone, the script updates the body via gh issue edit (idempotent) rather than creating a duplicate. AC-1b asserts: a second invocation of the script produces zero new issues and zero body changes (no-op idempotence).
7. Mutation-weakness on AC-1 test. The draft's check ("body mentions ADR-0007") would pass against an otherwise empty body containing only the literal string. Resolution: AC-1's assertion is now structured: the body MUST contain all of {ADR-0007, ADR-0031, src/codegenie/adapters/protocols.py, S2-01-plugin-registry-kernel, S2-02-plugin-manifest-pydantic, S2-03-plugin-loader-integrity, S2-04-plugin-resolver-extends} AND have a body length >= 200 characters AND have structured "Phase 2 context", "Phase 3 stories", and "Acceptance" subsections. Mutation-resistant.
8. mkdocs build --strict in unit test. The draft prescribed invoking it as a subprocess in test_contributing_cheatsheet.py::test_mkdocs_build_strict. That is slow + side-effectful + duplicates the existing make docs CI job (per CLAUDE.md make check chain) + creates a tmp build dir that hits disk. Resolution: the unit test is grep-only: it asserts the section heading exists, the seven steps are enumerated, and each canonical-example probe name appears. The mkdocs build --strict invocation stays in the existing docs CI workflow (make docs); a new unit test (tests/unit/docs/test_mkdocs_nav_includes_contributing.py) parses mkdocs.yml YAML and asserts contributing.md is in the nav tree (no subprocess). Faster, hermetic, equivalent coverage.
9. AC-9 fragility. Draft prescribed updating the test's @pytest.mark.skip reason string to include the filed GitHub issue number. GH issue numbers can move on repo migrations; the existing reason already cites ADR-0007 + High-level-impl.md §Step 7 (more durable). Resolution: the draft's AC-9 edit is dropped (no edit to test_phase3_handoff_smoke.py); the test's skip-reason stays as-is. The filed issue #4 body cites the test path; Phase 3's executor unskips at the entry-gate review and updates the skip-reason then.
10. AC-8 cherry-picked subset. Draft filed backlog issues for "Open implementation questions" #2/#4/#5 with no justification for excluding #1/#3/#6/#7/#8. Reading stories/README.md §"Open implementation questions" shows #1, #3, #6, #7, #8 are already resolved (encoded in shipped stories: S1-02, S4-02/S7-02, S3-01, S7-04, S1-11 respectively); #2/#4/#5 are the actual open items needing future work. AC-8 now justifies the selection inline: "#2/#4/#5 are the three items still open per stories/README.md; #1/#3/#6/#7/#8 are resolved (citation inline)."
11. AC-10 cross-story coupling. Draft AC-10 asserted every Step 8 done-criterion box is [x], but 5 of 8 boxes are closed by S8-01/S8-02/S8-03; one slipping turns S8-04 red. Resolution: AC-10 split into AC-10a (this story marks the three Step 8 boxes it owns: "All five Phase-3 handoff issues exist"; "docs/contributing.md builds in mkdocs build --strict"; "docs/phases/02-context-gather-layers-b-g/README.md checklist marked complete and committed") and AC-10b (status check: asserts S8-01/02/03 boxes are [x] as a read-only verification, not a write). If S8-01/02/03 ship before S8-04 lands, AC-10b passes trivially. If they don't, AC-10b surfaces the dependency as data without blocking S8-04's own boxes.
12. Off-by-one in Goal §2. Draft said "the four 'Decisions noted but not yet documented'" but listed three (#2/#4/#5). Resolution: corrected to "three" and added the inline justification per finding #10.
13. Function name citation. Test function is test_phase3_adapter_handoff_smoke (file test_phase3_handoff_smoke.py). Multiple references corrected.
14. Design-patterns: IssueSpec Pydantic model + typed registry. Eight heterogeneous issue payloads as inline dicts in the script is primitive obsession + anaemic dict. Resolution: scripts/_phase3_handoff_issues.py exposes a Final tuple of IssueSpec frozen Pydantic models (title: str, milestone: MilestoneName, body: str, labels: frozenset[Label], phase3_stories: tuple[str, ...]). scripts/file_phase3_handoff_issues.py is the impure shell consuming the tuple. Open/Closed: a future handoff story adds a row to the tuple; the script logic is unchanged. Pure spec / impure I/O, per the CLAUDE.md convention.
15. MilestoneName newtype. Draft uses raw str for milestone everywhere. Resolution: MilestoneName = NewType("MilestoneName", str) co-located in _phase3_handoff_issues.py (single-use; no need to land in codegenie.types.identifiers).
16. AC-6 layer convention check. Draft prescribed "route external CLIs through run_external_cli (B/G) or run_allowlisted (C only)". Spot-check against src/codegenie/exec/__init__.py + Layer C probes confirms run_allowlisted is used directly by Layer C (e.g., git rev-parse), and run_external_cli is the wrapper for Layer B/G. The cheat-sheet text was correct; codified as a verifiable assertion in test_contributing_cheatsheet.py that the section names both functions.
Full critic findings + decision rationale archived at _validation/S8-04-phase3-handoff-and-docs.md.
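As a rough illustration of findings 14 and 15, the registry shape could look like the sketch below. It is not the shipped code: a stdlib frozen dataclass stands in for the frozen Pydantic model so the example runs without third-party packages, the label values and bodies are placeholders, and the signature of milestones_needed() is an assumption (only its name and purity are stated in this story).

```python
from dataclasses import dataclass
from typing import Final, NewType

# Single-use newtype, co-located with the registry as finding 15 describes.
MilestoneName = NewType("MilestoneName", str)

PHASE3 = MilestoneName("Phase 3 — Vuln remediation: deterministic recipe path")
BACKLOG = MilestoneName("Backlog")


@dataclass(frozen=True)
class IssueSpec:
    """Stdlib stand-in for the frozen Pydantic model (same five field names)."""

    title: str
    milestone: MilestoneName
    body: str
    labels: frozenset[str]  # the story types this as frozenset[Label]; Label is not defined here
    phase3_stories: tuple[str, ...] = ()


# Two illustrative rows only; bodies and label values are placeholders.
ISSUES: Final[tuple[IssueSpec, ...]] = (
    IssueSpec(
        title="[Phase 3] Implement Plugin Loader: kernel + manifest parser + integrity loader + resolver",
        milestone=PHASE3,
        body="placeholder",
        labels=frozenset({"phase-3"}),
        phase3_stories=(
            "S2-01-plugin-registry-kernel.md",
            "S2-02-plugin-manifest-pydantic.md",
        ),
    ),
    IssueSpec(
        title="[Backlog] Full-repo mypy --warn-unreachable rollout",
        milestone=BACKLOG,
        body="placeholder",
        labels=frozenset({"backlog"}),
    ),
)


def milestones_needed(specs: tuple[IssueSpec, ...]) -> frozenset[MilestoneName]:
    """Pure helper: the distinct milestones the impure shell must ensure exist."""
    return frozenset(spec.milestone for spec in specs)
```

Under this shape, a future handoff story would add a row to the tuple and leave the shell script untouched, which is the Open/Closed property finding 14 is after.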
Context
Phase 2 ships kernel-side scaffolding only: adapter Protocols, TCCMLoader, SkillsLoader, IndexFreshness, registration plumbing. The Plugin Loader itself, the universal (*, *, *) fallback plugin, and the first concrete plugin (plugins/vulnerability-remediation--node--npm/) are deliberately deferred to Phase 3 per ADR-0007 + ADR-0031 §Consequences §1. Phase 3 has already been fully designed — 47 stories under docs/phases/03-vuln-deterministic-recipe/stories/ carry the implementation prescription. This story is the project-board mirror: file five GitHub issues that each link to the Phase 3 story files + a Phase 3 — Vuln remediation: deterministic recipe path repo-level milestone, so Phase 3 has a fully-loaded project-board view at start of work. No re-prescription.
The handoff is also the moment to close the Phase 2 README's exit-criteria sign-off (a high-level [x] checklist pointing at stories/README.md §"Exit-criteria coverage" as the canonical mapping table) and extend docs/contributing.md's existing ## Adding a probe section with a Phase 2 subsection (Layer B/C/D/E/G additions: @register_probe(heaviness=, runs_last=), run_external_cli, @register_index_freshness_check, the model_construct ban under output/). The subsection uses Phase 2's now-shipped probes as canonical examples — IndexHealthProbe (B2), RuntimeTraceProbe (C), SemgrepProbe (G), SkillsIndexProbe (D), ConventionsProbe (D) — so a new probe author can copy a real probe and only edit what's task-specific.
The most load-bearing of the five issues is #4 — unskip tests/adv/phase02/test_phase3_handoff_smoke.py at Phase 3 entry-gate review. The test (function name test_phase3_adapter_handoff_smoke) currently has @pytest.mark.skip(reason="enabled when Phase 3 plugin lands — see ... ADR-0007 ... High-level-impl.md §Step 7"). Unskipping forces re-verification that Phase 2's four adapter Protocols are imported unchanged. Any drift (e.g., Phase 3 discovers consumers(self, pkg: str) should be consumers(self, pkg: PackageId, *, transitively: bool = False)) requires an explicit ADR amendment to 02-ADR-0006 or 02-ADR-0007 — not a silent Protocol edit. This is the contract trip-wire phase-arch-design.md §"Gap 1" identified; issue #4 is what makes Phase 3 honor it.
This is the smallest story in Step 8 in code terms (zero new src/ code) and the largest in coordination terms (cross-phase contract handoff, GH issue automation, contributor docs).
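To make the Protocol-drift trip-wire concrete, here is a hypothetical sketch of how a signature snapshot could detect the kind of change described above. It is not the content of test_phase3_handoff_smoke.py; the Protocol body, the signature_snapshot helper, and the FROZEN constant are all assumptions made for illustration, using the consumers(self, pkg: str) example from the text.

```python
import inspect
from typing import Protocol


class DepGraphAdapter(Protocol):
    # Hypothetical method shape, copied from the drift example in the text.
    def consumers(self, pkg: str): ...


def signature_snapshot(proto: type) -> dict[str, str]:
    """Map each public method defined on the Protocol to its signature string."""
    return {
        name: str(inspect.signature(member))
        for name, member in vars(proto).items()
        if callable(member) and not name.startswith("_")
    }


# A snapshot captured when the Protocol was frozen (hypothetical value).
FROZEN = {"consumers": "(self, pkg: str)"}

# Any edit to the method signature changes the snapshot and trips the check.
assert signature_snapshot(DepGraphAdapter) == FROZEN
```

A failing comparison would be the prompt to file the ADR amendment rather than to update the frozen value in place.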
References — where to look
- Architecture:
    - ../phase-arch-design.md §"Integration with Phase 3": the canonical source for issue body content (table of inheritance + implicit guarantees + new artifacts).
    - ../phase-arch-design.md §"Gap analysis" Gap 1: Adapter Protocol drift; the test_phase3_handoff_smoke.py trip-wire.
    - ../phase-arch-design.md §"Adversarial tests": test_phase3_handoff_smoke.py row.
- Phase ADRs:
    - ../ADRs/0007-no-plugin-loader-in-phase-2.md: rationale; Phase 3 ships loader + first plugin + adapters together.
    - ../ADRs/0006-index-freshness-sum-type-location.md: variant set stable; Phase 3 extension requires amendment.
    - ../ADRs/0001-add-docker-and-security-cli-tools-to-allowed-binaries.md: Phase 3 amendment for npm, jq is the cleanest precedent.
    - ../ADRs/0008-no-event-stream-in-phase-2.md: this story adds zero new structlog events.
- Production ADRs:
    - ../../../production/adrs/0031-plugin-architecture.md §Consequences §1: first plugin doubles as the proof the loader works.
    - ../../../production/adrs/0032-language-search-adapters.md: the four adapter Protocols Phase 3 implements.
- Source design:
    - ../final-design.md §"What's next — handoff to Phase 3": full prose of the four handoff bullets.
    - ../final-design.md §"Open questions deferred to implementation": backlog items this story files.
- Roadmap:
    - ../../../roadmap.md §"Phase 3 — Vuln remediation: deterministic recipe path": milestone name; issue milestones align to this.
- Phase 3 stories (the issues mirror these; link, do not re-prescribe):
    - ../../03-vuln-deterministic-recipe/stories/S2-01-plugin-registry-kernel.md
    - ../../03-vuln-deterministic-recipe/stories/S2-02-plugin-manifest-pydantic.md
    - ../../03-vuln-deterministic-recipe/stories/S2-03-plugin-loader-integrity.md
    - ../../03-vuln-deterministic-recipe/stories/S2-04-plugin-resolver-extends.md
    - ../../03-vuln-deterministic-recipe/stories/S7-01-vuln-node-npm-plugin-scaffold.md
    - ../../03-vuln-deterministic-recipe/stories/S7-02-npm-recipes-and-adapters.md
    - ../../03-vuln-deterministic-recipe/stories/S7-03-universal-hitl-fallback-plugin.md
- Existing code (Phase 2 artifacts the handoff issues reference):
    - src/codegenie/adapters/protocols.py (S1-03): four Protocols (DepGraphAdapter, ImportGraphAdapter, ScipAdapter, TestInventoryAdapter).
    - src/codegenie/indices/freshness.py (S1-01): IndexFreshness variant set.
    - src/codegenie/tccm/ (S1-04, S2-03): TCCMLoader.
    - src/codegenie/skills/loader.py (S2-01): three-tier merge.
    - src/codegenie/exec/__init__.py (S1-07; note: package, not module): run_external_cli + ALLOWED_BINARIES. Currently lacks npm, jq.
    - tests/adv/phase02/test_phase3_handoff_smoke.py (S7-04): function test_phase3_adapter_handoff_smoke; landed skipped with reason citing ADR-0007 + High-level-impl.md §Step 7. Do NOT edit the skip reason from this story; Phase 3 owns the unskip.
    - docs/contributing.md: already has ## Adding a probe H2 at line 69 (Phase 0/1 content); this story APPENDS a subsection.
    - docs/phases/02-context-gather-layers-b-g/README.md: Phase 2 README; gets a new ## Phase 2 exit-criteria — closed section that POINTS at stories/README.md §"Exit-criteria coverage" (does NOT duplicate the table).
    - docs/phases/02-context-gather-layers-b-g/stories/README.md §"Exit-criteria coverage": canonical mapping table; do NOT edit.
    - docs/phases/02-context-gather-layers-b-g/stories/README.md §"Open implementation questions": 8 items; #2, #4, #5 are the three currently open (the other 5 are resolved by shipped stories per the inline citation).
Goal
Three deliverables, no production-code changes:
1. File eight GitHub issues on the repo (and optionally on a Project board if one exists): five handoff issues (Phase 3 work) and three backlog issues (post-Phase-3 open questions). Each handoff issue is a project-board mirror that links to the canonical Phase 3 story file(s); each backlog issue links to the relevant Open implementation questions row in stories/README.md. Use a typed IssueSpec Pydantic model in scripts/_phase3_handoff_issues.py (data registry) consumed by scripts/file_phase3_handoff_issues.py (impure shell). Idempotent on re-run (dedupe-by-title; gh issue edit for body updates).
2. Extend docs/contributing.md's existing ## Adding a probe section (Phase 0/1 content, line 69) with a new subsection ### Adding a Layer B/C/D/E/G probe (Phase 2 additions). The subsection covers seven Phase-2-specific topics: (a) heaviness annotation via @register_probe(heaviness=, runs_last=); (b) run_external_cli for Layer B/G external CLIs vs run_allowlisted direct for Layer C; (c) @register_index_freshness_check Open/Closed registration; (d) typed ProbeOutput.schema_slice via Pydantic with model_construct banned under output/; (e) declared_inputs for cache keys (including special tokens like image-digest: per ADR-0004); (f) confidence reporting discipline ("high"|"medium"|"low"; facts, not judgments); (g) canonical Phase 2 examples (IndexHealthProbe, RuntimeTraceProbe, SemgrepProbe, SkillsIndexProbe, ConventionsProbe). The doc passes mkdocs build --strict via the existing make docs CI job (this story does NOT invoke it from a unit test).
3. Mark docs/phases/02-context-gather-layers-b-g/README.md's exit-criteria closed. Append a ## Phase 2 exit-criteria — closed section that (a) points at the canonical table in stories/README.md §"Exit-criteria coverage" (no duplication); (b) provides a top-level [x] checklist over the high-level Phase 2 goals (G1–G10 from arch-design.md §"Goals"), each line citing the story IDs that closed it (cross-checked against the canonical table).
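The dedupe-by-title and body-diff behaviour in deliverable 1 can be pictured as a pure planning step that the impure shell then executes with gh. The sketch below is one possible shape under stated assumptions: the Existing and Action types and the plan_actions name are hypothetical, and the milestone comparison mentioned in the validation notes is left out for brevity.

```python
from typing import NamedTuple, Optional


class Existing(NamedTuple):
    """One row of already-filed issues (number, title, body)."""

    number: int
    title: str
    body: str


class Action(NamedTuple):
    kind: str  # "create", "edit", or "noop"
    title: str
    number: Optional[int] = None


def plan_actions(rendered: dict[str, str], existing: list[Existing]) -> list[Action]:
    """Decide per title whether to create, edit, or skip (pure, no I/O)."""
    by_title = {issue.title: issue for issue in existing}
    actions: list[Action] = []
    for title, body in rendered.items():
        found = by_title.get(title)
        if found is None:
            actions.append(Action("create", title))
        elif found.body != body:  # plain string compare, as AC-1b describes
            actions.append(Action("edit", title, found.number))
        else:
            actions.append(Action("noop", title, found.number))
    return actions
```

Keeping the decision pure means the idempotency property (a second run yields only noop actions) can be asserted without touching GitHub.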
Acceptance criteria
- [ ] AC-1 (Handoff issue #1: Plugin Loader + manifest parser + resolver, linked to Phase 3 stories S2-01..S2-04). A GitHub issue exists with title [Phase 3] Implement Plugin Loader: kernel + manifest parser + integrity loader + resolver and milestone Phase 3 — Vuln remediation: deterministic recipe path. Body MUST contain all of: (a) the literal substrings ADR-0007, ADR-0031, src/codegenie/adapters/protocols.py; (b) markdown links to all four Phase 3 story files: S2-01-plugin-registry-kernel.md, S2-02-plugin-manifest-pydantic.md, S2-03-plugin-loader-integrity.md, S2-04-plugin-resolver-extends.md; (c) a "Phase 2 context" H3 (≥ 50 chars), a "Phase 3 stories" H3 (the four links), and an "Acceptance" H3; (d) total body length >= 200 chars. tests/unit/docs/test_phase3_handoff_issues.py::test_issue_1_body_structured reads tests/unit/docs/_fixtures/issues.json (committed dry-run output) and asserts each of (a)–(d).
- [ ] AC-1b (Idempotency: re-running the script is a no-op). The script's first step: gh issue list --json title,body,number --search "[Phase 3]" --state all --limit 100. For each IssueSpec in the registry, if a matching title exists, the script calls gh issue edit <num> --body-file ... ONLY if the existing body differs from the rendered body (string compare). tests/unit/docs/test_phase3_handoff_issues.py::test_idempotent_second_run simulates a second invocation against the fixture and asserts zero gh issue create calls + zero gh issue edit calls (when bodies match).
- [ ] AC-1c (No-board graceful degradation + loud warning). The script accepts --project <board-name> as OPTIONAL. If absent, issues file without project association; the script prints WARNING: no project board provided; issues filed without board association to stderr (Rule 12: fail loud). If the --project value is provided but gh project list does not match, the script EXITS with code 2 and a loud error (do not silently downgrade an explicit --project flag). tests/unit/docs/test_phase3_handoff_issues.py::test_no_project_warning asserts both paths via subprocess monkeypatching.
- [ ] AC-1d (Milestone pre-flight). The script's pre-flight asserts the Phase 3 — Vuln remediation: deterministic recipe path milestone exists via gh api repos/:owner/:repo/milestones. If missing, the script creates it (idempotent: a second creation attempt is a no-op via gh api ... --silent || true). test_milestone_preflight_creates_idempotently asserts the milestone API call sequence.
- [ ] AC-2 (Handoff issue #2: first plugin plugins/vulnerability-remediation--node--npm/ + four ADR-0032 adapter implementations, linked to S7-01 + S7-02). Body contains all of: ADR-0032, src/codegenie/adapters/protocols.py, markdown links to S7-01-vuln-node-npm-plugin-scaffold.md and S7-02-npm-recipes-and-adapters.md, an enumeration of the four implementations (dep_graph_npm.py, import_graph_node.py, scip_node.py, test_inventory_node.py), and citations to the Phase 2 fixtures (monorepo-pnpm, minimal-ts). AC-1's structured-payload pattern (>= 200 chars, three H3 sections) repeats. test_issue_2_body_structured.
- [ ] AC-3 (Handoff issue #3: universal (*, *, *) fallback plugin / HITL escalation, linked to S7-03). Body contains production/design.md §"Humans always merge", ADR-0031, link to S7-03-universal-hitl-fallback-plugin.md, and an explanation of when the fallback fires (no concrete plugin matches the (task-class, language, package-manager) triple). test_issue_3_body_structured.
- [ ] AC-4 (Handoff issue #4, LOAD-BEARING: unskip test_phase3_handoff_smoke.py at Phase 3 entry-gate review; explicit ADR-amendment requirement). Body contains the literal phrases: (a) Any Protocol drift requires an explicit ADR amendment to 02-ADR-0006 / 02-ADR-0007; (b) tests/adv/phase02/test_phase3_handoff_smoke.py (file path); (c) test_phase3_adapter_handoff_smoke (the actual function name); (d) phase-arch-design.md §"Gap 1"; (e) a numbered "Acceptance at Phase 3 entry-gate" list with at least 3 items (run the test, verify Protocols imported unchanged, file ADR amendment if drift). test_issue_4_body_load_bearing asserts all five literals.
- [ ] AC-5 (Handoff issue #5: extend ALLOWED_BINARIES for npm, jq via amendment ADR). Body contains the literal string src/codegenie/exec/__init__.py (NOT the wrong src/codegenie/exec.py), references 02-ADR-0001 as the precedent, names npm and jq as the only two additions, and explicitly forbids "while we're at it" binaries (Implementation risk #2). The body acknowledges the structural enforcement: the ALLOWED_BINARIES: frozenset[str] at exec/__init__.py:96 is the real guard; the issue body restating the discipline is documentation, not a substitute for the frozenset edit. test_issue_5_body_correct_path.
- [ ] AC-6 (docs/contributing.md: ### Adding a Layer B/C/D/E/G probe (Phase 2 additions) subsection added UNDER the existing ## Adding a probe H2). The new content is an H3 subsection, NOT a parallel H2 (Rule 7: surface the conflict, don't blend; one source-of-truth recipe with a Phase 2 addendum). The subsection covers the seven topics from Goal §2. Each topic names at least one canonical Phase 2 probe example. The existing 7-step recipe (Phase 0 LanguageDetectionProbe) is unedited. tests/unit/docs/test_contributing_cheatsheet.py::test_subsection_under_existing_h2 parses docs/contributing.md, asserts: (a) the existing ## Adding a probe H2 at line ~69 is untouched (byte-identical first 50 lines of the section); (b) the new ### Adding a Layer B/C/D/E/G probe (Phase 2 additions) H3 exists within that H2 (no parallel H2 introduced); (c) the H3 names all seven topics; (d) the H3 cites all five canonical probe examples (IndexHealthProbe, RuntimeTraceProbe, SemgrepProbe, SkillsIndexProbe, ConventionsProbe).
- [ ] AC-6b (mkdocs nav unchanged + contributing.md reachable; no subprocess invocation in unit tests). A new unit test tests/unit/docs/test_mkdocs_nav_includes_contributing.py parses mkdocs.yml as YAML and asserts contributing.md appears in the nav tree (recursive search through nested lists). The mkdocs build --strict invocation stays in the existing make docs CI job (per CLAUDE.md make check chain); this story does NOT shell out to mkdocs from a unit test. AC-6c (manual ritual, captured in _attempts/S8-04.md): run make docs locally before opening the closing PR; capture exit 0 in the attempt log.
- [ ] AC-7 (docs/phases/02-context-gather-layers-b-g/README.md: ## Phase 2 exit-criteria — closed section appended; POINTS at canonical table; high-level [x] checklist over G1–G10). The new section: (a) starts with a single paragraph pointing at stories/README.md §"Exit-criteria coverage" as the canonical mapping ("Canonical mapping table: see stories/README.md §Exit-criteria coverage."); (b) follows with a markdown checklist of the ten Phase 2 high-level goals (G1–G10 from phase-arch-design.md §"Goals"), each line [x] + one-sentence summary + the story IDs that closed it (cross-referenced against the canonical table); (c) ends with a sign-off line crediting the story IDs that close Step 8 (S8-01..S8-04). tests/unit/docs/test_phase2_readme_signoff.py parses the Phase 2 README, asserts: (i) the new H2 section exists; (ii) the canonical-table pointer line exists (literal substring match for the link); (iii) every checkbox in the section is [x], none [ ]; (iv) the checkbox count is exactly 10 (one per G1–G10), NOT a duplicate of the full ~22-row table. No table duplication.
- [ ] AC-8 (Backlog issues for the three OPEN open-questions, #2, #4, #5, with inline justification for why the other five are excluded). Three backlog issues on the milestone Backlog (or Post-Phase-3):
    - [Backlog] Full-repo mypy --warn-unreachable rollout (per stories/README.md §"Open implementation questions" #2; backlog item: the global flag in pyproject.toml:172 already covers the repo, but per-module override file-list audit is outstanding).
    - [Backlog] ExternalDocsProbe host-allowlist config schema (per #4; first arises when a real user opts in; Phase-4-or-later).
    - [Backlog] SkillsLoader per-tier signing (Sigstore-style) (per #5; Phase 14 multi-tenant concern).

    The _phase3_handoff_issues.py registry's docstring explicitly justifies: # #1, #3, #6, #7, #8 are resolved by shipped stories (S1-02, S4-02/S7-02, S3-01, S7-04, S1-11); see stories/README.md §"Open implementation questions" inline citations. test_backlog_issues_justified asserts the docstring contains the justification literal AND the three backlog IssueSpecs are present.
- [ ] AC-9 (No edit to test_phase3_handoff_smoke.py's skip reason; Phase 3 owns the update). This story explicitly does NOT modify tests/adv/phase02/test_phase3_handoff_smoke.py. The existing @pytest.mark.skip(reason=...) already cites ADR-0007 + High-level-impl.md §Step 7 (more durable than a GitHub issue number that may move on repo migration). Issue #4's body cites the file path; Phase 3's executor updates the skip-reason at the entry-gate review. tests/unit/docs/test_skip_reason_unchanged.py reads tests/adv/phase02/test_phase3_handoff_smoke.py, computes BLAKE3 of the file, and asserts it matches a frozen _EXPECTED_BLAKE3 constant captured at the time S8-04 lands. A future edit to the file triggers a loud test failure prompting an ADR review.
- [ ] AC-10a (Step 8 done-criteria boxes owned by this story closed). docs/phases/02-context-gather-layers-b-g/High-level-impl.md §"Step 8 — Done criteria": the THREE boxes this story owns are marked [x] and reference S8-04: "All five Phase-3 handoff issues exist on the GitHub Project board with milestones aligned to roadmap.md §Phase 3"; "docs/contributing.md builds in mkdocs build --strict and remains in curated nav"; "docs/phases/02-context-gather-layers-b-g/README.md checklist marked complete and committed". tests/unit/docs/test_step8_s8_04_boxes_closed.py::test_three_owned_boxes_checked asserts exactly these three boxes are [x] with the literal (S8-04) annotation alongside each.
- [ ] AC-10b (Read-only verification: other Step 8 boxes closed by S8-01/02/03). A read-only assertion: the other five Step 8 done-criteria boxes are also [x] (closed by S8-01/02/03). tests/unit/docs/test_step8_other_boxes_closed.py is a SOFT assertion via pytest.warns(UserWarning) if any [ ] remains in Step 8; a hard xfail annotation when run in isolation (avoid this story failing due to S8-01/02/03 slips). The closing-PR's manual checklist verifies the hard zero-[ ] state before merge.
- [ ] AC-11 (mypy --strict + ruff + Phase 0 fence green; zero new src/ imports). This story changes zero src/codegenie/** files. mypy --strict scripts/file_phase3_handoff_issues.py scripts/_phase3_handoff_issues.py passes. ruff check + format --check green on all new/touched files. Phase 0 fence job stays green trivially (no LLM/network imports introduced; the script's gh invocation is via subprocess, a stdlib import, not a network library import).
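The structured, mutation-resistant body check that AC-1 describes could be sketched as a small pure function. The name body_problems and its parameters are hypothetical; the sketch covers the literal substrings, H3 heading presence, and minimum length from AC-1 (a), (c), and (d), and does not check the per-section length or the story links.

```python
import re


def body_problems(
    body: str,
    literals: list[str],
    sections: list[str],
    min_len: int = 200,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means the body passes."""
    problems = []
    for literal in literals:
        if literal not in body:
            problems.append(f"missing literal: {literal}")
    # Collect the text of every markdown H3 heading in the body.
    headings = set(re.findall(r"^### (.+?)\s*$", body, flags=re.MULTILINE))
    for section in sections:
        if section not in headings:
            problems.append(f"missing H3 section: {section}")
    if len(body) < min_len:
        problems.append(f"body shorter than {min_len} chars")
    return problems


LITERALS = ["ADR-0007", "ADR-0031", "src/codegenie/adapters/protocols.py"]
SECTIONS = ["Phase 2 context", "Phase 3 stories", "Acceptance"]

# A body holding only one literal fails on several counts, unlike the draft's check.
print(body_problems("ADR-0007", LITERALS, SECTIONS))
```

Returning the full problem list rather than a boolean keeps failure messages specific when a fixture body regresses.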
Out of scope
- Implementing any Phase 3 code. Plugin Loader, first plugin, adapters, npm/jq allowlist edits are all Phase 3 (covered by Phase 3 stories S2-01..S2-04, S7-01..S7-03).
- Unskipping test_phase3_handoff_smoke.py. That action belongs to Phase 3's entry-gate review (covered by issue #4). AC-9 enforces this by BLAKE3-freezing the file.
- Editing the four adapter Protocols at src/codegenie/adapters/protocols.py. Any drift is an ADR amendment, not a silent code change (Implementation risk #8).
- Duplicating the stories/README.md §"Exit-criteria coverage" table into the phase README. The phase README POINTS at the canonical table + provides a small G1–G10 sign-off checklist.
- Adding a parallel ## Adding a Layer B/C/D/E/G probe H2 in docs/contributing.md. The new content is a SUBSECTION (H3) under the existing ## Adding a probe H2.
- Filing GitHub issues for already-resolved open questions (#1, #3, #6, #7, #8). The script's docstring justifies the selection inline.
- Updating test_phase3_handoff_smoke.py's skip-reason text. Phase 3's entry-gate review owns that edit.
- Invoking mkdocs build --strict from a unit test. The existing make docs CI job covers this; a unit subprocess is slow + duplicative.
- Creating a GitHub Project board. If one exists, the script uses it via --project; if not, issues file without board association + a loud warning.
- Adding a "Phase 2 retrospective" document. Useful, but not required by the roadmap; if the team wants one, a separate ticket.
- Migrating docs/contributing.md to a new doc system. Stay in mkdocs.
- Editing roadmap.md to mark Phase 2 done. Mechanical, separate commit on the closing PR.
Files to touch
New:
tests/unit/docs/__init__.py— empty.tests/unit/docs/test_phase3_handoff_issues.py— AC-1, AC-1b, AC-1c, AC-1d, AC-2, AC-3, AC-4, AC-5, AC-8. Reads from a generatedtests/unit/docs/_fixtures/issues.json(committed; produced by the script's--dry-runmode).tests/unit/docs/test_contributing_cheatsheet.py— AC-6 (grep-only; no subprocess).tests/unit/docs/test_mkdocs_nav_includes_contributing.py— AC-6b.tests/unit/docs/test_phase2_readme_signoff.py— AC-7.tests/unit/docs/test_skip_reason_unchanged.py— AC-9 (BLAKE3 freeze).tests/unit/docs/test_step8_s8_04_boxes_closed.py— AC-10a.tests/unit/docs/test_step8_other_boxes_closed.py— AC-10b (soft assertion).scripts/_phase3_handoff_issues.py— typedIssueSpecPydantic frozen model +MilestoneNamenewtype +Finaltuple of 8IssueSpecinstances (5 handoff + 3 backlog) + docstring justifying the open-question selection.scripts/file_phase3_handoff_issues.py— impure shell consuming the registry. Flags:--project <name>(optional),--dry-run(writes fixture totests/unit/docs/_fixtures/issues.json), default = live. Idempotent via title-dedupe + body-diffgh issue edit.tests/unit/docs/_fixtures/issues.json— committed dry-run output; the unit tests read this rather than hitting GH live.
Modified:

- `docs/contributing.md`: append H3 `### Adding a Layer B/C/D/E/G probe (Phase 2 additions)` UNDER the existing `## Adding a probe` H2. The existing 7-step recipe is unedited.
- `docs/phases/02-context-gather-layers-b-g/README.md`: append `## Phase 2 exit-criteria — closed` section with canonical-table pointer + G1–G10 `[x]` checklist + Step 8 sign-off line.
- `docs/phases/02-context-gather-layers-b-g/High-level-impl.md`: mark the THREE Step 8 done-criterion boxes this story owns as `[x] (S8-04)`. (Other steps' done-criteria are closed by their own stories; this story closes only its three.)
Untouched (DO NOT EDIT):

- `src/codegenie/adapters/protocols.py` (Implementation risk #8: Protocol shape is Phase 3's discovery; any drift is ADR amendment).
- `src/codegenie/exec/__init__.py` (Phase 3 owns the `npm`/`jq` extension).
- Any Phase 2 production `src/` code under `src/codegenie/`.
- `tests/adv/phase02/test_phase3_handoff_smoke.py` (AC-9 enforces; BLAKE3 frozen).
- `docs/phases/02-context-gather-layers-b-g/stories/README.md` (canonical exit-criteria coverage table; the phase README POINTS at it).
- `roadmap.md` §"Phase 3" itself (story files issues against the milestone; the roadmap text is unchanged).
- `mkdocs.yml` (nav already includes contributing; AC-6b verifies, does NOT edit).
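The content-hash freeze that AC-9 relies on can be sketched roughly as follows. The story pins BLAKE3, which needs a third-party package; this sketch swaps in the stdlib's BLAKE2b so it runs without extra dependencies. The helper names `file_digest` and `assert_frozen` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hex digest of the file's bytes (BLAKE2b here; the story itself pins BLAKE3)."""
    return hashlib.blake2b(path.read_bytes(), digest_size=32).hexdigest()


def assert_frozen(path: Path, expected: str) -> None:
    """Raise AssertionError if the file's digest no longer matches the pinned value."""
    actual = file_digest(path)
    if actual != expected:
        raise AssertionError(f"{path} changed: expected {expected}, got {actual}")
```

A freeze test of this shape fails on any byte-level change, including whitespace, which is what makes it suitable for guarding a file that must stay untouched until a later phase.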
## TDD plan: red / green / refactor
RED (failing tests committed first):

- `test_phase3_handoff_issues.py::test_issue_1_body_structured`: reads `_fixtures/issues.json`, asserts the four literal substrings + four story-link substrings + three H3 sections + body length `>= 200`. Fails red.
- `test_phase3_handoff_issues.py::test_issue_4_body_load_bearing`: asserts the five literal phrases for issue #4 (most load-bearing). Fails red.
- `test_phase3_handoff_issues.py::test_issue_5_body_correct_path`: asserts the literal `src/codegenie/exec/__init__.py` (NOT `exec.py`). Fails red. Guards against the original draft's wrong path.
- `test_phase3_handoff_issues.py::test_idempotent_second_run`: simulates two consecutive invocations via subprocess monkeypatching against the fixture; asserts zero create + zero edit calls on the second run. Fails red.
- `test_phase3_handoff_issues.py::test_no_project_warning`: invokes the script without `--project`; asserts stderr contains the literal `WARNING: no project board provided`. With an unknown `--project bogus`, asserts exit code 2 + loud error. Fails red.
- `test_phase3_handoff_issues.py::test_milestone_preflight_creates_idempotently`: asserts the `gh api repos/:owner/:repo/milestones` call appears + creates if missing + does NOT create on a second run. Fails red.
- `test_phase3_handoff_issues.py::test_backlog_issues_justified`: asserts the script's docstring contains the inline justification literal AND the three backlog `IssueSpec`s exist with `[Backlog]` title prefix. Fails red.
- `test_contributing_cheatsheet.py::test_subsection_under_existing_h2`: parses `docs/contributing.md`, asserts: existing `## Adding a probe` H2 byte-identical first 50 lines; new H3 nested within; H3 names seven topics + five canonical probes. Fails red.
- `test_mkdocs_nav_includes_contributing.py::test_contributing_in_nav_tree`: parses `mkdocs.yml`, recursive nav search; asserts `contributing.md` present. Fails red if removed.
- `test_phase2_readme_signoff.py::test_signoff_section_well_formed`: asserts new H2 exists + canonical-table pointer substring + exactly 10 checkboxes all `[x]` + Step 8 sign-off line citing S8-01..S8-04. Fails red.
- `test_skip_reason_unchanged.py::test_blake3_frozen`: asserts BLAKE3 of `tests/adv/phase02/test_phase3_handoff_smoke.py` matches `_EXPECTED_BLAKE3`. Fails red if file changes.
- `test_step8_s8_04_boxes_closed.py::test_three_owned_boxes_checked`: asserts the three S8-04-owned boxes in `High-level-impl.md §Step 8` are `[x] (S8-04)`. Fails red.
- `test_step8_other_boxes_closed.py::test_other_boxes_warn_if_unchecked`: soft assertion (`pytest.warns`); never hard-fails. Catches S8-01/02/03 slips as visible signal.
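The "new H3 nested within the existing H2" check from the cheatsheet test could look roughly like the sketch below. It is a grep-style line scan, not a Markdown parser, and it assumes ATX headings and no `##` lines inside fenced code blocks; the function name `h3_nested_under` is hypothetical.

```python
def h3_nested_under(markdown: str, h2_title: str, h3_title: str) -> bool:
    """True if an H3 titled h3_title sits between the named H2 and the next H2."""
    inside = False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Entering a new H2 section; track whether it is the one we want.
            inside = line[3:].strip() == h2_title
        elif inside and line.startswith("### ") and line[4:].strip() == h3_title:
            return True
    return False
```

For example, `h3_nested_under(text, "Adding a probe", "Adding a Layer B/C/D/E/G probe (Phase 2 additions)")` would be the positive assertion, while the same H3 title checked against any other H2 should return `False`.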
GREEN (minimum code to pass):

- Write `scripts/_phase3_handoff_issues.py` with the `IssueSpec` Pydantic frozen model, `MilestoneName = NewType("MilestoneName", str)`, and a `Final[tuple[IssueSpec, ...]]` of 8 entries (5 handoff + 3 backlog). Module docstring includes the open-question justification literal.
- Write `scripts/file_phase3_handoff_issues.py` as the impure shell: `--dry-run` writes fixture, default = live with title-dedupe + body-diff `gh issue edit`, `--project <name>` optional with loud no-board warning.
- Run `python scripts/file_phase3_handoff_issues.py --dry-run` and commit the generated `tests/unit/docs/_fixtures/issues.json`.
- Append the `### Adding a Layer B/C/D/E/G probe (Phase 2 additions)` H3 under the existing `## Adding a probe` H2 in `docs/contributing.md`.
- Append the `## Phase 2 exit-criteria — closed` section to the phase README (canonical-table pointer + G1–G10 checklist + Step 8 sign-off).
- Mark the three S8-04-owned boxes in `High-level-impl.md §Step 8` as `[x] (S8-04)`.
- Compute `BLAKE3` of `tests/adv/phase02/test_phase3_handoff_smoke.py` and pin it into `_EXPECTED_BLAKE3` in `test_skip_reason_unchanged.py`.
- Run `make docs` locally; capture `mkdocs build --strict` exit 0 in `_attempts/S8-04.md` (AC-6c manual ritual).
- Run the file-issue script live (`--no-dry-run`, with `--project` if a board exists) and capture issue numbers in `_attempts/S8-04.md` for the PR description. (This step is run by the human merging the closing PR, NOT by the executor; labelled `OPERATOR-RUN`.)
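The title-dedupe + body-diff idempotency rule used by the impure shell can be captured as a small pure planning step, sketched below. The `Action` type and `plan_actions` name are hypothetical, and the sketch assumes the shell has already fetched existing issues into a title-to-body mapping; it does not call `gh` itself.

```python
from typing import Literal, NamedTuple


class Action(NamedTuple):
    kind: Literal["create", "edit", "skip"]
    title: str


def plan_actions(desired: dict[str, str], existing: dict[str, str]) -> list[Action]:
    """Decide per issue whether to create it, edit its body, or leave it alone.

    Both mappings are title -> body. An issue is matched by exact title
    (dedupe) and edited only if its body differs (body-diff).
    """
    actions: list[Action] = []
    for title, body in desired.items():
        if title not in existing:
            actions.append(Action("create", title))
        elif existing[title] != body:
            actions.append(Action("edit", title))
        else:
            actions.append(Action("skip", title))
    return actions
```

Under this shape, a second run against an unchanged repository yields only `skip` actions, which is the property `test_idempotent_second_run` asserts as zero create + zero edit calls.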
REFACTOR:

- Confirm `_phase3_handoff_issues.py` is pure data (the `Final` tuple + the `IssueSpec` model); no `gh` import, no `subprocess` import, no `os` import.
- Confirm `file_phase3_handoff_issues.py` is the only file invoking `subprocess.run(["gh", ...])`.
- Validate the JSON fixture round-trips via `json.loads(Path(...).read_text())`.
- Double-check no PII / no internal hostnames leaked into issue bodies (Rule 12).
- `mypy --strict scripts/`, `ruff format`, `ruff check` clean.
- `_attempts/S8-04.md` captures the `make docs` exit-0 and the operator-run-live `gh issue list` output snapshot showing 8 issues with correct milestones.
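The purity check on the registry module (no `gh`, `subprocess`, or `os` imports) could be mechanized with a short AST scan along these lines. This is a sketch, not part of the story's test suite; `forbidden_imports` is a hypothetical name, and it only catches static `import` / `from ... import` statements, not dynamic `__import__` calls.

```python
import ast

FORBIDDEN = frozenset({"gh", "subprocess", "os"})


def forbidden_imports(source: str, forbidden: frozenset[str] = FORBIDDEN) -> set[str]:
    """Return the forbidden top-level packages that `source` imports statically."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for `from . import x`
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in forbidden:
                found.add(root)
    return found
```

A check like `assert not forbidden_imports(Path("scripts/_phase3_handoff_issues.py").read_text())` would then turn the manual "confirm" step into an assertion.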
## Notes for the implementer
- Issue #4 is the load-bearing one. The other four issues are operational handoff; #4 is the contract trip-wire — without it, Phase 3 can silently drift the four Protocols and Phase 2's typing guarantee evaporates. Treat the wording with care: name the ADR amendment requirement explicitly. The Step 8 PR review must verify this issue's body before merge.
- Issues mirror stories, not replace them. Each handoff issue body links to the canonical Phase 3 story file(s). The story file is the implementation prescription; the GitHub issue is the project-board notification surface. Resist the urge to inline-copy the story's ACs into the issue body — that creates two sources of truth.
- `scripts/_phase3_handoff_issues.py` is pure data; `scripts/file_phase3_handoff_issues.py` is the impure shell. This split is the functional-core / imperative-shell convention per CLAUDE.md. Tests load the pure registry directly; the impure script is integration-tested via the `--dry-run` → fixture path.
- Idempotency matters. The script will be re-run by a future contributor (e.g., to refresh issue bodies after a Phase 3 story is reorganized). Title-dedupe + body-diff `gh issue edit` is the convention. Never `gh issue create` without checking for an existing match.
- GH Project board reuse. If a `codewizard-sherpa` Project board exists, the closing-PR operator passes `--project codewizard-sherpa`. If not, the script files without project association + a loud warning. Either way, the milestone `Phase 3 — Vuln remediation: deterministic recipe path` is created by AC-1d's pre-flight (idempotent).
- Issue labels. Apply `phase:3`, `handoff:from-phase-2`, plus one of `loader`/`plugin`/`fallback`/`smoke`/`allowlist` for the five primary issues. Backlog issues get `backlog` + the relevant area label (`mypy`, `external-docs`, `skills`).
- `docs/contributing.md` already has `## Adding a probe`. Phase 0's 7-step recipe with `LanguageDetectionProbe` is the Phase 0/1 generic guidance; the new H3 is the Phase 2 addendum. Do NOT edit the existing H2; append the H3 below it. Match existing heading depth + style (Rule 11: match codebase conventions).
- Phase 2 README's exit-criteria section POINTS at the canonical table. Do NOT duplicate `stories/README.md §"Exit-criteria coverage"` into the phase README. Duplication will drift. The phase README's section is a high-level G1–G10 sign-off + a link to the canonical mapping.
- Don't unskip `test_phase3_handoff_smoke.py`. The unskip is Phase 3's first commit on that test; the action is the entry-gate review. If a reviewer asks "why don't we just unskip it now?", the answer is ADR-0007 / Implementation risk #8: Phase 2 has zero implementations of the Protocols; unskipping in Phase 2 verifies nothing because there's no concrete adapter to verify against. AC-9 enforces this with a BLAKE3 freeze.
- AC-10b is intentionally soft. S8-04 lands after S8-01..S8-03; if one of those slips, AC-10b warns but does not hard-fail. The closing-PR's manual checklist is the hard gate. This is to avoid coupling S8-04's executor pass to other stories' completion.
- The `OPERATOR-RUN` GREEN step (live `gh issue` creation). The executor runs `--dry-run` and commits the fixture. The actual live-run against GitHub is a human operator step at PR merge time. Document this clearly in `_attempts/S8-04.md`; do NOT have the executor authenticate to GitHub.
- No PII / no internal hostnames in issue bodies. The bodies are public on GitHub; review carefully before live-run.
- Mark `roadmap.md §"Phase 2"` complete in a separate commit on the closing PR, not in this story. Mechanical, no test coverage.
- Phase 0 fence stays green: zero new `src/` imports introduced. Trivially.
- Open-questions selection rationale. AC-8 files backlog issues for the three OPEN items (#2, #4, #5); the other five (#1, #3, #6, #7, #8) are resolved by shipped stories per the inline citations in `stories/README.md §"Open implementation questions"`. The script's docstring carries the justification so a future reader of `_phase3_handoff_issues.py` understands the selection without re-reading the README.
- Rule 2 vs `IssueSpec` registry. Eight issue payloads is past the rule-of-three threshold and heterogeneous enough that a typed registry is justified (open/closed: future handoff stories add rows to the `Final` tuple; the script logic is unchanged). The `IssueSpec` model carries `title, milestone, body, labels, phase3_stories`; the data shape eliminates dict-shuffling drift.
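To make the "intentionally soft" behaviour of AC-10b concrete, here is a minimal sketch of a check that reports unchecked boxes as a warning and never raises. It uses the stdlib `warnings` module; the helper names and the sample checkbox text are hypothetical, and the regex assumes GitHub-style `- [ ]` / `- [x]` task items.

```python
import re
import warnings

# Matches "- [ ] text" or "- [x] text" task-list items, one per line.
_BOX = re.compile(r"^\s*[-*] \[([ xX])\] (.+)$", re.MULTILINE)


def unchecked_boxes(section: str) -> list[str]:
    """Return the text of every task item still marked `[ ]`."""
    return [text.strip() for mark, text in _BOX.findall(section) if mark == " "]


def warn_if_unchecked(section: str) -> list[str]:
    """Emit a UserWarning listing open boxes; never raises, by design."""
    open_items = unchecked_boxes(section)
    if open_items:
        warnings.warn(
            f"{len(open_items)} box(es) still unchecked: {open_items}",
            UserWarning,
            stacklevel=2,
        )
    return open_items
```

In a pytest suite the call would typically be wrapped so the warning shows up in the run summary as visible signal, while the hard gate stays with the closing PR's manual checklist.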