codewizard-sherpa
An autonomous agentic system that opens pull requests to modify code across an organization's repositories at portfolio scale.
Deterministic where it can be. Probabilistic only where it must. Humans always merge.
- Get started: Clone the repo, run `make bootstrap`, and gather your first `RepoContext` in five minutes.
- Architecture: The Temporal-orchestrated, layered-hybrid design. Read it top-down or drill into ADRs and phase designs.
- Roadmap: 17 phases from local bullet tracer to multi-tenant production. Phases 0–5 and 6.5 have complete design packages; Phase 6 has been redesigned against the plugin architecture.
What is this?
Most refactoring and vulnerability-remediation work in an organization is mechanical: upgrade this package, replace this base image, rewrite these call sites for an API change. The mechanical work is also where human engineers waste the most time — too repetitive to enjoy, too risky to delegate to a script that doesn't understand context.
codewizard-sherpa is built to do the mechanical work itself, end-to-end, across many repos at once, and stop at the point a human reviewer would actually add value: the pull-request review. It scans repos, gathers structured context, plans changes recipe-first and LLM-last, applies them in microVM sandboxes, validates them against objective signals, and opens PRs with full evidence bundles. It never merges.
The system is designed around the empirical finding that AI agents are "safer builders, risky maintainers" — they break things during refactors at higher rates than during net-new code. So structural changes go through recipes (OpenRewrite, AST manipulation), and the LLM is reserved for judgment calls only.
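The recipe-first, LLM-last ordering described above can be sketched roughly as follows. This is a minimal illustration, not the project's planner: the `Plan` type, the `RECIPES` registry, the task kinds, and the `plan_change` function are all hypothetical names invented for this example, and the `llm` argument is just a callable standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Plan:
    task: str
    strategy: str  # "recipe" or "llm"
    detail: str


# Hypothetical recipe registry: task kind -> deterministic planner.
RECIPES: dict[str, Callable[[str], str]] = {
    "upgrade-package": lambda target: f"bump {target} in the manifest and lockfile",
    "replace-base-image": lambda target: f"rewrite FROM lines to {target}",
}


def plan_change(
    kind: str,
    target: str,
    llm: Optional[Callable[[str], str]] = None,
) -> Plan:
    """Prefer a deterministic recipe; fall back to the LLM only if none matches."""
    recipe = RECIPES.get(kind)
    if recipe is not None:
        return Plan(task=kind, strategy="recipe", detail=recipe(target))
    if llm is None:
        raise LookupError(f"no recipe for {kind!r} and no LLM fallback configured")
    # Assumption: the LLM is consulted last, and only for unmatched task kinds.
    return Plan(task=kind, strategy="llm", detail=llm(f"{kind}: {target}"))
```

In this sketch a known task kind never reaches the model at all, which is one simple way to keep structural changes on the deterministic path.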
Headline shape
The production system is a Temporal-durable workflow envelope wrapping a three-layer orchestrator:
- Hierarchical Planner (LangGraph Supervisor) — reads intent, dispatches to a subgraph
- SHERPA-style State Machine (the worker subgraph) — Pydantic state ledger; nodes never call nodes
- Trust-Aware Verification (conditional edges) — microVM sandbox + objective signals decide every transition
LLMs appear only at the leaves, called via the Agents SDK for narrow judgment calls. Everything else — routing, gating, control flow, cost accounting — is deterministic.
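To make "nodes never call nodes" concrete, here is a small sketch of a state machine in which nodes only mutate a shared ledger and a separate deterministic function picks the next node. It is an illustration under assumptions: a stdlib dataclass stands in for the Pydantic ledger, LangGraph and Temporal are not used, and the node names, the `max_attempts` limit, and the fake validation signal are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Ledger:
    """Stand-in for the Pydantic state ledger (stdlib dataclass here)."""
    attempts: int = 0
    tests_passed: bool = False
    history: list[str] = field(default_factory=list)


def apply_change(ledger: Ledger) -> None:
    # A node records its work in the ledger and returns; it never calls another node.
    ledger.attempts += 1
    ledger.history.append("apply")


def validate(ledger: Ledger) -> None:
    # Hypothetical objective signal: pretend the second attempt passes.
    ledger.tests_passed = ledger.attempts >= 2
    ledger.history.append("validate")


NODES: dict[str, Callable[[Ledger], None]] = {
    "apply": apply_change,
    "validate": validate,
}


def next_node(current: str, ledger: Ledger, max_attempts: int = 3) -> str:
    """Deterministic conditional edge: routes on ledger fields only."""
    if current == "apply":
        return "validate"
    if ledger.tests_passed:
        return "open_pr"
    return "apply" if ledger.attempts < max_attempts else "escalate"


def run(ledger: Ledger) -> str:
    """Drive the machine until it reaches a terminal state not present in NODES."""
    current = "apply"
    while current in NODES:
        NODES[current](ledger)
        current = next_node(current, ledger)
    return current
```

Calling `run(Ledger())` walks apply, validate, apply, validate and ends in the terminal state `open_pr`. A ledger that exhausts `max_attempts` without passing ends in `escalate` instead. All routing lives in `next_node`, so control flow can be tested without running any node.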
→ Read the architecture overview
Status
As of 2026-05:
| What | Status |
|---|---|
| Phase 0 — Bullet tracer foundations | ✅ Shipped |
| Phase 1 — Layer A (Node) context gathering | ✅ Shipped |
| Phase 2 — Layers B–G context gathering | 🚧 Most stories shipped; closeout in flight |
| Phases 3–5, 6.5 — Designed | 📐 Designs complete |
| Phase 6 — Plugin-aware redesign | 📐 Designs complete |
| Phase 7 — Migration task-class (distroless containers) | 📐 Designs complete |
| Phases 8–16 — Roadmap stubs awaiting design | 📋 Planned |
The implementation focus today is the local CLI POC (`codegenie gather`). It implements the same probe contract the production service will use (ADR-0007), so any drift here would propagate everywhere.
The architectural commitments
Every subsystem in codewizard-sherpa honors nine load-bearing constraints. The two most important:
Commitment §1 — No LLM in the gather pipeline
Probes are deterministic; same inputs always produce same outputs. This is what makes the RepoContext artifact reproducible, cacheable, and auditable. Enforced in CI by import-linter (ADR-0005).
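One way to see why determinism makes probe output cacheable: if a probe's result depends only on its identity and the bytes it reads, a hash of those can serve as a cache key. The sketch below assumes this and is not the project's probe contract. `cache_key` and `run_probe` are hypothetical names, as is the choice of `package.json` as the input.

```python
import hashlib
import json


def cache_key(probe_name: str, probe_version: str, inputs: dict[str, bytes]) -> str:
    """Derive a stable key from the probe identity and the bytes it reads."""
    digest = hashlib.sha256()
    digest.update(f"{probe_name}@{probe_version}\n".encode())
    for path in sorted(inputs):  # sorted so dict insertion order cannot matter
        digest.update(path.encode() + b"\0")
        digest.update(hashlib.sha256(inputs[path]).digest())
    return digest.hexdigest()


def run_probe(inputs: dict[str, bytes]) -> str:
    """Hypothetical probe: lists dependency names from package.json, sorted."""
    manifest = json.loads(inputs["package.json"])
    # Sorting keeps the output byte-identical across runs on the same input.
    return json.dumps(sorted(manifest.get("dependencies", {})))
```

With a key like this, a cached result can be reused whenever the key matches, and any change to the probe version or to an input file produces a different key.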
Commitment §8 — Humans always merge
Autonomy ends at PR creation. This is the consistent finding from every published autonomous-migration study. The system can spend hours of LLM time and days of sandbox compute building a PR; merging is always a human decision (ADR-0009).
→ All nine commitments + their ADRs
How to read further
This site is structured for progressive disclosure:
| If you have… | Read |
|---|---|
| 5 minutes | This page |
| 30 minutes | This page + Architecture overview |
| 2 hours | Add the Production design |
| A weekend | Add the ADR index (36 numbered decisions) and one or two phase final-designs |
| Want to contribute | Contributing guide |