Contributing¶
Welcome. This page is the onboarding guide for contributors picking up the
codewizard-sherpa repo cold. It is intentionally short: the why lives in
docs/production/design.md and the per-phase
architecture under docs/phases/<phase>/phase-arch-design.md. This page
tells you what to type.
Bootstrap¶
The repo standardizes on Python 3.11+ and uses uv when available,
falling back to python -m pip. From a clean clone, run make bootstrap.
This is the only command you should ever run by hand to get a usable
environment. It resolves all [project.optional-dependencies] extras
required for the dev loop. The Makefile shells out to either uv pip install
-e ".[dev]" or python -m pip install -e ".[dev]"; never mutate the
environment by editing setup.py (there isn't one) or pip-installing into
site-packages directly.
Phase 0 ships four extras (ADR-0006):
- gather: the deterministic probe pipeline (runtime). No LLM SDKs here.
- dev: the local quality loop (ruff, mypy, pytest, pip-audit, import-linter, mkdocs).
- service: Temporal worker + service-facing tooling. Empty in Phase 0; populated by Phases 3+.
- agents: the LLM-SDK landing zone. When Phase 4 introduces the LLM fallback, packages like anthropic (and any other LLM SDK) land in [agents] and NEVER in [project.dependencies]. The fence CI job enforces this: anything LLM-shaped in dependencies fails the build.
Run make check once after bootstrap to confirm the loop.
It runs lint → typecheck → test → fence, in that order.
Running the harness¶
The probe pipeline is exposed as a single CLI entry point, codegenie gather PATH.
This writes .codegenie/context/repo-context.yaml (human-facing) and
.codegenie/context/raw/*.json (raw probe outputs) inside the analyzed
repo. Cache lives under .codegenie/cache/. The CLI offers to add
.codegenie/ to the analyzed repo's .gitignore on first run.
Useful sub-commands:
- codegenie gather --no-cache PATH: force every probe to re-run
- codegenie audit verify PATH: re-validate the audit chain (S3-06)
- codegenie gather --help: global flag inventory
If any of the external tools listed in docs/localv2.md §6 are missing
from $PATH, the CLI prints an actionable error and exits non-zero. Do not
try to monkey-patch the probe to "tolerate" a missing tool — fail loudly
(global rule 12).
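The fail-loudly rule can be sketched as a preflight check. This is an illustration only: the function names, the message wording, and the exit code 2 are assumptions, not the CLI's actual behaviour (the guide only promises an actionable error and a non-zero exit).

```python
import shutil
import sys


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools that cannot be found on $PATH, in the order given."""
    return [tool for tool in tools if shutil.which(tool) is None]


def require_tools(tools: list[str]) -> None:
    """Exit non-zero with an actionable message if any tool is missing."""
    missing = missing_tools(tools)
    if missing:
        print(
            "error: required tools not found on $PATH: " + ", ".join(missing),
            file=sys.stderr,
        )
        # Hypothetical exit code; any non-zero value satisfies the rule.
        raise SystemExit(2)
```

The point of the shape is that the check runs before any probe does, so a probe never has to decide how to behave without its tool.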
Adding a probe¶
The "extension by addition" rule is load-bearing: adding a probe is never
"edit existing code." It is "register one new class." The
LanguageDetectionProbe shipped in S4-01 (Phase 0) is the worked example
to copy.
Numbered recipe:
1. File the issue first. Use the New probe template at .github/ISSUE_TEMPLATE/new-probe.md. Name the Planner decision your evidence supports; if you can't, the probe is premature.
2. Write the failing tests first. Create tests/unit/test_<probe>.py with a happy path, a failure mode, and a confidence-reporting assertion. The probe contract demands honest confidence.
3. Declare the output schema under src/codegenie/schema/probes/<probe_name>.py. Probe schemas use Pydantic v2 models. The output is facts, not judgments: no safe_to_* booleans, no recommended_action strings.
4. Implement the probe class under src/codegenie/probes/<probe_name>.py. Inherit from the ABC in src/codegenie/probes/base.py. Populate:
   - declared_inputs: file globs the probe reads (drives cache keys)
   - applies_to_languages: list, or ["*"]
   - applies_to_tasks: list, or ["*"]
5. Register the probe. Add @register_probe above the class. src/codegenie/probes/__init__.py imports register-side-effects; adding a probe never requires editing a central list.
6. Validate the snapshot. Run pytest tests/unit/test_probe_contract.py. If the probe widens the contract surface, that test fails: STOP. File an ADR amendment (template: .github/ISSUE_TEMPLATE/adr-amendment.md) before regenerating the snapshot. Per ADR-0007, drift is resolved by changing code, never by editing the spec.
7. Round-trip a fixture. Add a synthetic repo under tests/fixtures/ that exercises the probe end-to-end via codegenie gather.
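The shape of steps 3 to 5 in the recipe above can be sketched as follows. Everything here is a stand-in so the example runs on its own: the Probe ABC, ProbeOutput, PROBE_REGISTRY, the run method, and LicenseFileProbe are hypothetical, and a dataclass replaces the Pydantic v2 schema the real recipe calls for. Copy LanguageDetectionProbe, not this sketch, for the real contract.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProbeOutput:
    """Stand-in output: facts plus an honest confidence level."""
    facts: dict
    confidence: str  # "high" | "medium" | "low"


class Probe(ABC):
    """Stand-in for the ABC in src/codegenie/probes/base.py (real surface may differ)."""
    name: str
    version: str = "1.0.0"
    declared_inputs: list[str] = []
    applies_to_languages: list[str] = ["*"]
    applies_to_tasks: list[str] = ["*"]

    @abstractmethod
    def run(self, repo: Path) -> ProbeOutput: ...


PROBE_REGISTRY: dict[str, type[Probe]] = {}  # hypothetical registry


def register_probe(cls: type[Probe]) -> type[Probe]:
    """Registration by decoration: no central list is edited."""
    PROBE_REGISTRY[cls.name] = cls
    return cls


@register_probe
class LicenseFileProbe(Probe):
    """Hypothetical probe: reports which LICENSE* files exist at the repo root."""
    name = "license_file"
    declared_inputs = ["LICENSE*"]

    def run(self, repo: Path) -> ProbeOutput:
        matches = sorted(p.name for p in repo.glob("LICENSE*"))
        # Confidence reflects the evidence: one match is unambiguous,
        # several are competing candidates, none means nothing was observed.
        confidence = {0: "low", 1: "high"}.get(len(matches), "medium")
        return ProbeOutput(facts={"license_files": matches}, confidence=confidence)
```

Note that the output carries only observed facts (a list of file names) and no verdict about what to do with them.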
Probe version bumps¶
(Resolves open question Q2 from
phases/00-bullet-tracer-foundations/phase-arch-design.md.)
Each probe class carries a version: str class attribute. The convention:
- Patch bump (1.0.0 → 1.0.1): internal refactor, no output schema change, no cache invalidation needed.
- Minor bump (1.0.x → 1.1.0): output schema gains an OPTIONAL field. Old cache entries are still readable.
- Major bump (1.x.y → 2.0.0): output schema breaks (renamed field, removed field, type change). Cache entries from the prior major version are invalidated at read time; an ADR is required.
Cache keys include the probe version. Never silently re-use the same
version after changing the output shape — stale repo-context.yaml files
in the wild will mis-merge.
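The bump convention can be expressed as two small functions. This is a sketch under assumptions: the function names are hypothetical, and how the real cache encodes the probe version in its keys is not specified here. The readability check simply restates the rule that only a major bump invalidates prior entries.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Classify a version change as 'patch', 'minor', or 'major'."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError(f"{new} is not newer than {old}")
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


def cache_entry_readable(entry_version: str, probe_version: str) -> bool:
    """Entries written under the same major version stay readable."""
    return parse_version(entry_version)[0] == parse_version(probe_version)[0]
```

For example, bump_kind("1.0.3", "1.1.0") classifies as a minor bump, and an entry written at 1.0.3 remains readable by a 1.1.0 probe but not by a 2.0.0 probe.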
Adding a Layer B/C/D/E/G probe (Phase 2 additions)¶
Phase 2 (Layers B–G) introduces five additions on top of the Phase 0/1 recipe above; the seven-step recipe still applies — these are additions, not replacements. Use the named Phase 2 probes as canonical examples to copy.
- Heaviness annotation. Pass heaviness= and runs_last= to @register_probe(...) (see Phase 2 ADR-0003). The coordinator sorts the non-prelude wave by heaviness; runs_last=True is for probes that must observe a fully-prepared workspace. Example: IndexHealthProbe (Layer B2, src/codegenie/probes/layer_b/index_health.py).
- run_external_cli vs run_allowlisted. Layer B/G probes that shell out to an external CLI route through codegenie.exec.run_external_cli, the Layer B/G wrapper that adds timing + structured-event emission on top of Phase 0's run_allowlisted. Layer C probes (e.g. RuntimeTraceProbe, src/codegenie/probes/layer_c/runtime_trace.py) call run_allowlisted directly because they pass explicit hardening flags (capability drops, read-only roots) the wrapper does not model. Adding a binary to either path requires an ADR amendment to 02-ADR-0001. Example wrapped: SemgrepProbe (Layer G, src/codegenie/probes/layer_g/semgrep.py).
- @register_index_freshness_check Open/Closed seam. If your probe answers "is this index fresh?", register a check at codegenie.indices.freshness via the decorator (02-ADR-0006). The IndexHealthProbe enumerates every registered check via the registry; no central edit is needed. Example: SkillsIndexProbe registers a freshness check for SkillsIndex (Layer D, src/codegenie/probes/layer_d/skills_index.py).
- Typed ProbeOutput.schema_slice via Pydantic. Output schemas under src/codegenie/schema/probes/<probe>.py are Pydantic v2 models with model_config = ConfigDict(frozen=True, extra="forbid"). Do not call model_construct anywhere under src/codegenie/output/. The forbidden-patterns pre-commit hook bans it; validation must always run at the writer chokepoint (02-ADR-0010). Example: ConventionsProbe (Layer D, src/codegenie/probes/layer_d/conventions.py).
- declared_inputs cache keys. Globs cover most cases; special tokens (e.g. image-digest: per 02-ADR-0004) ride alongside file globs and are resolved by the coordinator's snapshot system. Cache keys derive deterministically from declared_inputs + probe version. Example with special token: RuntimeTraceProbe.
- Confidence is a fact, not a judgment. Every ProbeOutput.confidence is "high" | "medium" | "low" based on observed evidence, never an editorialized recommendation. The Planner consumes confidence; probes never editorialize. IndexHealthProbe derives confidence from the IndexFreshness sum-type variant (Fresh → "high", Stale → "low").
- Canonical probe examples (copy these). IndexHealthProbe (Layer B2, the load-bearing probe), RuntimeTraceProbe (Layer C, sandboxed subprocess + image_digest_resolver), SemgrepProbe (Layer G, external CLI via run_external_cli), SkillsIndexProbe (Layer D, registers a freshness check), ConventionsProbe (Layer D, typed slice + Open/Closed loader). Each lives under src/codegenie/probes/layer_<letter>/.
Project conventions¶
Coverage ratchet (resolves open question Q5)¶
The repo enforces line / branch coverage thresholds via
--cov-fail-under in pyproject.toml. The ratchet schedule is:
| Phase | Line | Branch | Notes |
|---|---|---|---|
| Phase 0 | 85 | 75 | 85/75 — current gate. |
| Phase 1 | 87 | 77 | Bumps to 87/77 when Phase 1's first probe lands. |
| Phase 2 | 90 | 80 | Bumps to 90/80. Frozen thereafter until Phase 5. |
The --cov-fail-under=85 line in pyproject.toml carries a comment
mirroring this schedule so a contributor editing the gate sees the table.
Do not raise the gate ahead of the schedule — coverage is a floor, not
a goal, and ad-hoc bumps create one-PR pain that gets reverted.
Probe contract is frozen¶
The probe ABC in src/codegenie/probes/base.py and the snapshot
tests/snapshots/probe_contract.v1.json are governed by ADR-0007.
Drift between the runtime ABC and the snapshot is resolved by changing
code, never by editing the spec. If you must widen the contract:
- File an issue with the ADR amendment template.
- Wait for the amendment text to be approved.
- Open a PR using the repo's PR template at templates/adr-amendment.md.
- Regenerate the snapshot using scripts/regen_probe_contract_snapshot.py.
Structural defense tests (tests/fence/)¶
Three fences pin invariants of the composition, not behaviours of a single module. They catch classes of bugs that unit tests routinely miss because unit tests verify one module at a time against its declared interface, not against what the runtime composition actually wires up.
- Adding a new submodule under src/codegenie/? tests/fence/test_per_submodule_cold_start.py spawns a fresh subprocess for every importable submodule. A new circular import fires this fence, even one that pytest never trips because its shared interpreter has primed sys.modules. If your new module is in the _KNOWN_BROKEN_PRE_FIX skip set, it's blocked on a tracked fix; do not add to that set without an explicit reason.
- Adding a new attribute to ProbeContext? tests/fence/test_probe_context_conformance.py asserts the coordinator-built ctx (BudgetingContext) carries every attribute on the frozen ProbeContext surface. Forget to thread your new attribute through _make_probe_context and the fence fires before a probe silently raises AttributeError at runtime.
- Modifying a probe to read a new ctx attribute or use a new code path? The smoke test test_no_probe_errors_in_smoke_run_record runs a real gather against the polyglot fixture and asserts no probe reports exit_status="error". Coordinator failure-isolation otherwise hides AttributeError-class drift; this assertion surfaces it.
These three were added 2026-05-19 after a probe-context drift and a plugins.manifest circular-import surfaced in end-to-end testing. The discipline: structural defenses are cheap to add and cheap to run; the moment a class of bug shows up in production, write the fence that would have caught it.
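The cold-start idea behind the first fence can be sketched with the standard library alone. This is not the contents of tests/fence/test_per_submodule_cold_start.py; the function names are hypothetical, and the real fence also handles a skip set that this sketch omits.

```python
import importlib
import pkgutil
import subprocess
import sys


def submodule_names(package_name: str) -> list[str]:
    """List the package itself plus every importable submodule beneath it."""
    package = importlib.import_module(package_name)
    names = [package_name]
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        names.append(info.name)
    return names


def cold_import_failures(package_name: str) -> dict[str, str]:
    """Import each submodule in a fresh interpreter; map failures to stderr's last line."""
    failures: dict[str, str] = {}
    for name in submodule_names(package_name):
        # A new process means an empty sys.modules, so import order cannot
        # mask a circular import the way a shared pytest interpreter can.
        result = subprocess.run(
            [sys.executable, "-c", f"import {name}"],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            lines = result.stderr.strip().splitlines()
            failures[name] = lines[-1] if lines else ""
    return failures
```

A fence built this way asserts that cold_import_failures returns an empty dict for the package under test.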
ADR lifecycle¶
Production ADR statuses are Proposed, Accepted, Provisional Accepted, Deferred, and Superseded.
- Use Provisional Accepted only when the direction is binding now but a named future evidence point remains. The ADR must include **Review trigger:** and say what evidence will promote or retire it.
- When a decision replaces an older one, use reciprocal links: the older ADR becomes Superseded by ADR-NNNN, and the successor includes **Supersedes:** ADR-MMMM.
- In the issue and PR, state what changed, why the older claim no longer holds, and what evidence supports the new posture.
Pre-commit hooks¶
.pre-commit-config.yaml runs ruff, ruff format, and mypy on
staged files. Hook revisions are SHA-pinned for reproducibility. Run
pre-commit install once after make bootstrap; CI re-runs the same checks
via make check, so the hook is a convenience, not the gate.
CI matrix¶
The six required jobs (all must pass on Python 3.11 AND 3.12 before merge):
- lint: ruff check + ruff format --check + lint-imports
- typecheck: mypy --strict src/
- test: pytest + coverage gate (see ratchet above)
- security: pip-audit
- docs: mkdocs build --strict over the curated nav
- fence: the LLM-in-gather fence (ADR-0002)
See also¶
- docs/roadmap.md: phased plan from local POC to production
- docs/localv2.md: canonical local POC spec
- docs/production/README.md: canonical production-target reference
- docs/phases/00-bullet-tracer-foundations/README.md: Phase 0 exit criteria and handoff record