Phase 6 — SHERPA-style state machine for the vuln loop: Final design
Status: Design of record
Date: 2026-05-18
Roadmap source: ../../roadmap.md §"Phase 6"
Executive summary
Phase 6 turns the Phase 3 deterministic recipe path, the Phase 4 RAG-shaped LLM fallback, and the Phase 5 sandbox gates into one restartable SHERPA-style workflow. It does not redesign those capabilities. It composes them inside a plugin-local LangGraph subgraph, persists a typed ledger at semantic boundaries, and exposes one stable harness-facing contract, VulnRemediationSut, that Phase 6.5 may consume without knowing the graph's internal topology.
Decisions of record
1. Plugin-local graph topology. The graph lives under plugins/vulnerability-remediation--node--npm/subgraph/; reusable ports and typed contracts live under src/codegenie/.
2. Stable harness-facing SUT contract. Phase 6 owns VulnRemediationSut:

       from typing import Protocol

       class VulnRemediationSut(Protocol):
           async def run_case(self, request: VulnRemediationCase) -> VulnRemediationResult: ...
           def digest(self) -> SutDigest: ...

   VulnRemediationCase names the repo fixture, CVE, cassette pin, and requested execution mode. VulnRemediationResult exposes the sanitized outputs the harness needs: terminal state, patch digest, gate summary, failure modes, cost summary, and evidence references. The concrete LangGraph builder stays behind the adapter.
3. Checkpoint on semantic boundaries. The ledger persists after plan acceptance, patch application, gate result, escalation, and terminal completion. Resume verifies the prior chain head before replay.
4. Edges own control flow. Nodes compute; conditional edges decide. No node directly calls another node.
5. Typed interruption. Human-in-the-loop (HITL) review is a discriminated-union outcome carrying reason, evidence, and resumption contract. "Paused" is not a boolean side channel.
6. No new trust bypass. Patch application, LLM invocation, and sandbox execution continue through Phase 3/4/5 ports and policies.
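The typed-interruption decision can be illustrated with a minimal sketch. It is not the design's actual type definitions: the class names Continue and Interrupted, the field names, and the resume_token stand-in for the "resumption contract" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class Continue:
    """Hypothetical normal outcome: the workflow may proceed."""
    kind: Literal["continue"] = "continue"


@dataclass(frozen=True)
class Interrupted:
    """Hypothetical HITL outcome: carries what a reviewer and a resume need."""
    reason: str                      # why the run paused, e.g. a policy block
    evidence_refs: tuple[str, ...]   # references to evidence, not the evidence itself
    resume_token: str                # assumed stand-in for the resumption contract
    kind: Literal["interrupted"] = "interrupted"


# The discriminated union: callers must handle each variant explicitly.
StepOutcome = Union[Continue, Interrupted]


def describe(outcome: StepOutcome) -> str:
    # Dispatch on the concrete variant instead of reading a `paused` boolean.
    if isinstance(outcome, Interrupted):
        return f"paused: {outcome.reason} ({len(outcome.evidence_refs)} evidence refs)"
    return "running"


def can_resume(outcome: Interrupted, presented_token: str) -> bool:
    # Resume validation: the caller must present what the interruption asked for.
    return presented_token == outcome.resume_token
```

Because the pause is a value rather than a flag, the reason and evidence travel with it, and a resume can be checked against the interruption that produced it.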
State model
The ledger uses a closed sum type:
- NeedsPlan
- PlanReady
- PatchApplied
- GateFailedRetryable
- AwaitingHumanReview
- Completed
- FailedUnrecoverable
Every transition records: prior state id, next state id, triggering outcome, evidence digest, and checkpoint chain head.
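A minimal sketch of such a transition record follows, assuming a SHA-256 hash chain for the checkpoint chain head. The enum stands in for the closed sum type (real variants would likely carry data), GateFailedRetryable is read as a single state name, and the helper names append and verify, the GENESIS value, and the JSON serialization are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum


class LedgerState(str, Enum):
    NEEDS_PLAN = "NeedsPlan"
    PLAN_READY = "PlanReady"
    PATCH_APPLIED = "PatchApplied"
    GATE_FAILED_RETRYABLE = "GateFailedRetryable"
    AWAITING_HUMAN_REVIEW = "AwaitingHumanReview"
    COMPLETED = "Completed"
    FAILED_UNRECOVERABLE = "FailedUnrecoverable"


@dataclass(frozen=True)
class Transition:
    prior_state: str
    next_state: str
    outcome: str          # the triggering outcome
    evidence_digest: str
    chain_head: str       # hash over this record and the previous head


GENESIS = "0" * 64  # assumed head value for an empty ledger


def _link(prev_head: str, prior: str, nxt: str, outcome: str, evidence: str) -> str:
    payload = json.dumps([prev_head, prior, nxt, outcome, evidence]).encode()
    return hashlib.sha256(payload).hexdigest()


def append(ledger: list[Transition], nxt: LedgerState, outcome: str,
           evidence_digest: str) -> list[Transition]:
    # Derive the prior state and previous head from the ledger tail.
    prior = ledger[-1].next_state if ledger else LedgerState.NEEDS_PLAN.value
    prev_head = ledger[-1].chain_head if ledger else GENESIS
    head = _link(prev_head, prior, nxt.value, outcome, evidence_digest)
    return ledger + [Transition(prior, nxt.value, outcome, evidence_digest, head)]


def verify(ledger: list[Transition]) -> bool:
    # Recompute every link; any edited record breaks the chain from that point on.
    prev_head = GENESIS
    for t in ledger:
        expected = _link(prev_head, t.prior_state, t.next_state, t.outcome, t.evidence_digest)
        if expected != t.chain_head:
            return False
        prev_head = t.chain_head
    return True
```

Under these assumptions, a resume would call verify on the persisted ledger and refuse to replay if it returns False.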
Main workflow
1. Load VulnRemediationCase.
2. Build or resume VulnLedger.
3. Plan through recipe-first → RAG-shaped LLM fallback.
4. Apply patch through the existing transform port.
5. Validate through the Phase 5 gate runner.
6. Route:
   - pass → Completed
   - retryable failure → replan
   - repeated failure or policy block → AwaitingHumanReview
   - impossible state / integrity failure → FailedUnrecoverable
7. Return VulnRemediationResult through VulnRemediationSut.
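The routing step can be sketched as a pure function that a conditional edge would call, in keeping with "edges own control flow". This is an illustration only: GateOutcome, RouteInput, MAX_ATTEMPTS, and the choice of NeedsPlan as the replan target are assumptions, not names taken from the design.

```python
from dataclasses import dataclass
from enum import Enum


class GateOutcome(Enum):
    PASS = "pass"
    RETRYABLE_FAILURE = "retryable_failure"
    POLICY_BLOCK = "policy_block"
    INTEGRITY_FAILURE = "integrity_failure"


@dataclass(frozen=True)
class RouteInput:
    outcome: GateOutcome
    attempts: int  # plan/apply/validate cycles completed so far


MAX_ATTEMPTS = 3  # assumed threshold for "repeated failure"


def route_after_gate(inp: RouteInput) -> str:
    """Return the name of the next ledger state; performs no side effects."""
    if inp.outcome is GateOutcome.INTEGRITY_FAILURE:
        return "FailedUnrecoverable"
    if inp.outcome is GateOutcome.PASS:
        return "Completed"
    if inp.outcome is GateOutcome.POLICY_BLOCK:
        return "AwaitingHumanReview"
    # Retryable failure: replan until the attempt budget is spent, then escalate.
    if inp.attempts >= MAX_ATTEMPTS:
        return "AwaitingHumanReview"
    return "NeedsPlan"
```

Keeping the decision in one side-effect-free function means the routing table can be unit-tested without building the graph. In LangGraph, a callable of this shape is the kind of routing function that StateGraph.add_conditional_edges accepts.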
Relationship to Phase 6.5
Phase 6.5 may depend on:
- the VulnRemediationSut protocol
- VulnRemediationCase
- VulnRemediationResult
- SutDigest
Phase 6.5 may not depend on:
- the concrete graph builder
- node names
- checkpoint backend internals
- plugin-local file layout
That contract boundary is the main redesign outcome. It lets the harness measure behavior while Phase 6 remains free to refactor graph internals.
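As a rough sketch of what this boundary permits, a harness can be written against the protocol alone and exercised with a fake. The field names on the three data types are assumptions derived from the prose description, only two of the result fields are modelled, and FakeSut, evaluate, and the placeholder values are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class VulnRemediationCase:
    repo_fixture: str
    cve_id: str
    cassette_pin: str
    execution_mode: str


@dataclass(frozen=True)
class VulnRemediationResult:
    terminal_state: str
    patch_digest: str


@dataclass(frozen=True)
class SutDigest:
    value: str


class VulnRemediationSut(Protocol):
    async def run_case(self, request: VulnRemediationCase) -> VulnRemediationResult: ...
    def digest(self) -> SutDigest: ...


class FakeSut:
    """Hypothetical stand-in; satisfies the protocol structurally, with no graph behind it."""

    async def run_case(self, request: VulnRemediationCase) -> VulnRemediationResult:
        return VulnRemediationResult(terminal_state="Completed", patch_digest="sha256:fake")

    def digest(self) -> SutDigest:
        return SutDigest("fake-sut-v0")


async def evaluate(sut: VulnRemediationSut,
                   cases: list[VulnRemediationCase]) -> dict[str, int]:
    # The harness sees only terminal states; node names and checkpoints stay hidden.
    counts: dict[str, int] = {}
    for case in cases:
        result = await sut.run_case(case)
        counts[result.terminal_state] = counts.get(result.terminal_state, 0) + 1
    return counts
```

A call such as `asyncio.run(evaluate(FakeSut(), cases))` would tally terminal states per run; swapping in the real adapter should require no change to evaluate.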
Exit criteria mapping
| Roadmap exit criterion | Phase 6 commitment |
|---|---|
| LangGraph state machine runs the vuln loop | Plugin-local subgraph with typed state ledger |
| Mid-run kill + resume works | Replay-verified semantic checkpoints |
| HITL interrupt resumes correctly | Typed interrupt outcome + resume validation |
| Phase 6.5 can evaluate the loop | Stable VulnRemediationSut contract |
Non-goals
- No second task class.
- No Temporal durability yet; Phase 9 owns that.
- No replacement of Phase 3/4/5 domain logic.
- No generalized graph framework for every future plugin before Phase 7 proves the second plugin shape.
Deferred questions
- Whether SQLite remains the local checkpoint backend after Phase 9 introduces Postgres.
- Whether future task classes share any workflow node implementations or only ports.
- Whether SutDigest needs to include the prompt-template version once the model release policy lands.