Phase 02 — Context gathering — Layers B–G: Final design
Status: Design of record (synthesized from three competing designs + critique).
Synthesized by: Graph-of-Thought synthesizer subagent
Date: 2026-05-14
Sources: design-performance.md · design-security.md · design-best-practices.md · critique.md
Amendment note (2026-05-17 — 02-ADR-0011): the "vendored .so + tools/grammars.lock BLAKE3 pin" model described in §"Component design #12 — TreeSitterImportGraphProbe" and §"Resource & cost profile" was superseded after S4-04 and S4-06 hit a per-platform build-chain blocker. Grammars now ship as PyPI wheels (tree-sitter-typescript, tree-sitter-javascript, future tree-sitter-python/tree-sitter-java) behind a codegenie.grammars.lock.language_for(name) -> tree_sitter.Language kernel; supply-chain pinning is pip --require-hashes at the wheel boundary. The named-trigger C-extension discipline (Phase 1 ADR-0009 admits py-tree-sitter as the one exception) carries forward unchanged; only grammar delivery changed. The body below records the original design-time decision and is preserved verbatim for historical context; current truth is 02-ADR-0011.
Lens summary
Phase 2 lands the remaining probe layers (B–G) on top of Phase 0/1's frozen contract surface, with IndexHealthProbe (B2) as the load-bearing citizen the roadmap exit criterion names explicitly. The synthesis takes the best-practices skeleton (kernel-only probes, ADR-0033 sum types from line 1, no plugin loader yet — Phase 3 owns that), bolts on the security lens's writer-chokepoint redaction and _run_external_cli sandbox port (with cost-pruned isolation), and pulls from the performance lens only the cache-correctness primitives (image-digest as a declared input, not as a cache-key bypass) and the deliberate stale-fixture test that proves B2 catches what it's there to catch. The synthesis rejects the performance design's Plugin Loader (a Phase-3 deliverable per the roadmap + ADR-0031 §Consequences §1). It rejects every Phase 0 ABC edit proposed by [P] and [S] (cost_tier, capabilities: ProbeCapabilities), finding the same dispatch leverage in registry-side annotations. It rejects the security design's cryptographic-anchor freshness ceremony (which defends against a non-threat: an attacker who already owns $HOME) and the per-repo encryption-key theatre (key + ciphertext in the same trust tier). And it rejects the performance design's unilateral pytest-xdist reversal of the Phase 0 veto. The result is a smaller Phase 2 than any of the three inputs proposed: ~kernel probes + one tagged-union IndexFreshness + four adapter Protocols (documentation as code) + the TCCMLoader skeleton + writer-chokepoint secret redaction + a one-function external-CLI port (_run_external_cli) wrapping run_allowlisted. Phase 3 owns the loader, the first plugin, the four adapter implementations, and the OpenRewrite recipes, as the roadmap says.
Goals (concrete, measurable)
These are the load-bearing exits Phase 2 must hit. Each goal is annotated with provenance.
- Every Layer B–G probe in localv2.md §5.2–5.6 (kernel-only — language-agnostic) ships with golden-file coverage against the 5-repo fixture portfolio. [synth — adopts [B]'s kernel-only framing + [P]'s portfolio sizing]
- IndexHealthProbe (B2) surfaces a real staleness case in CI against a deliberately seeded stale-scip fixture; the build FAILS if the probe does not catch it. [roadmap exit criterion — operationalized as tests/adv/phase02/test_stale_scip_fixture.py] [synth]
- IndexFreshness = Fresh | Stale(reason: StaleReason) is the sum type B2 returns. One name, one module path (src/codegenie/indices/freshness.py), four StaleReason variants (CommitsBehind, DigestMismatch, CoverageGap, IndexerError). Every consumer pattern-matches with assert_never. The competing AdapterConfidence/IndexConfidence names are not shipped in Phase 2 (rationale below). [B] [synth — resolves critic finding #3]
- Zero edits to Phase 0/1 frozen surfaces: the Probe ABC, ProbeContext (except an additive, Phase-2-ADR-gated optional image_digest_resolver callable mirroring Phase 1's parsed_manifest precedent), @register_probe, the Coordinator core, Cache, OutputSanitizer, run_allowlisted, and ALLOWED_BINARIES (extended additively per Phase 0 §6.4). No new ABC fields for cost tiers, capability bundles, or any other coordinator-internal concern. Per-probe scheduling data lives as registry annotations on the decorator, not on the ABC. [synth — rejects [P]'s cost_tier and [S]'s capabilities: ProbeCapabilities; resolves critic finding #2]
- Secret findings (gitleaks + semgrep p/secrets + entropy catch-all) are redacted at the writer chokepoint in repo-context.yaml, raw artifacts, cache, and audit log. Plaintext is not persisted in Phase 2 — see Component §Secret redactor. The cleartext access path is deferred to Phase 5 (microVM-sandboxed planner consumption). [synth — resolves critic finding #7]
- One subprocess port for Layer B/G external CLIs: codegenie.exec.run_external_cli(probe_name, argv, *, cwd, allowlisted_egress, timeout_s) -> ProcessResult, a wrapper around Phase 0 run_allowlisted that adds env strip, working-directory restriction, and (on Linux only, optional) bubblewrap --unshare-net when available. Phase 0's chokepoint remains the single subprocess path. [synth — adopts [S]'s Command pattern but uses the existing Phase 0 chokepoint rather than introducing a parallel one]
- No new C-extension parser dependencies. Phase 1 ADR-0009 carries forward. The performance design's msgpack/scip-python/tantivy/tree-sitter-python/gitleaks-python ship-list is rejected. Phase 2 adds only networkx (pure-Python depgraph) and tree-sitter + grammars (a Phase 2 ADR amends Phase 1 ADR-0009 with the named trigger; see ADRs below). gitpython is rejected in favor of shelling out to git via run_allowlisted (git is already in ALLOWED_BINARIES per Phase 0 §6.4). [synth — resolves critic shared blind spot #2 + best-practices open Q §4]
- tantivy ships only as opt-in for ExternalDocsIndexProbe (D9), with a ripgrep-via-run_allowlisted fallback that is the default. Phase 2's tested path is the fallback; tantivy lights up only when the user opts in via config. [synth — adopts [B]'s pure-Python ratio]
- Cost target: $0/run. Tokens per gather: 0. The Phase 0 fence job continues to assert this. [P+S+B agree, load-bearing commitment §2.1]
- Wall-clock (1k-file fixture): cold p50 ≤ 90 s; warm p50 ≤ 1.5 s; incremental (single .ts change) p50 ≤ 10 s. No pytest-xdist — the Phase 0 veto holds (synthesizer-recorded 10/4 in Phase 0). The performance design's portfolio-lane xdist exception is rejected; the portfolio tests fit within a serial CI lane of ≤ 6 minutes, validated by tests/bench/bench_portfolio_walltime.py (advisory). [synth — resolves critic finding #8]
- Plugin scaffolding shipped in Phase 2 is kernel-only: the four Protocol classes from ADR-0032 in codegenie/adapters/protocols.py (documentation as code), the TCCMLoader + TCCM Pydantic model + DerivedQuery discriminated union in codegenie/tccm/, and the SkillsLoader + Skill Pydantic model in codegenie/skills/. No plugin loader. No plugin.yaml parser. No plugins/universal--*--*/ directory. No adapter implementations. Phase 3 ships all of those together, as ADR-0031 §Consequences §1 prescribes. [B + roadmap — resolves critic finding #1]
- Phase 2 ships no event stream. The audit anchor (runs/<utc-iso>-<short>.json) from Phase 0 is unchanged. Per ADR-0034 §Consequences §1, the canonical event log lands in Phase 9 (or 13). Phase 2 emits structured slice metadata (probes report their own gathered_at, last_indexed_commit, etc. in their slices), which Phase 9 will project, but it does NOT pre-ship a .codegenie/events/ JSONL stream. [synth — resolves critic [S] §"missed" + [P] §8 + ADR-0034]
Architecture
codegenie gather <path>
│
▼
┌────────────────────────────┐
│ Phase 0 CLI entry (click) │ ← unchanged
│ + tool readiness extended │
│ for B-G external CLIs │
│ + ALLOWED_BINARIES adds │
│ semgrep, syft, grype, │
│ gitleaks, scip-typescript│
└──────────────┬─────────────┘
│
▼
┌────────────────────────────┐
│ Phase 0 Coordinator │ ← unchanged ABC;
│ asyncio.Semaphore( │ registry now
│ min(cpu_count(), 8)) │ carries optional
│ per-probe Task + timeout │ annotations
│ failure isolation │ (heaviness=heavy
│ Phase 1 parsed_manifest │ used as a sort
│ memo │ key, not a sem-
└──────────────┬─────────────┘ aphore selector)
│
┌──────────────────────────────────┼──────────────────────────────────┐
│ Phase 1 Layer A probes (unchanged) │
│ ┌────────────────────── Phase 2 additions ──────────────────────┐ │
│ │ Layer B semantic_index_meta, index_health, dep_graph, │ │
│ │ tree_sitter_import_graph, generated_code, │ │
│ │ node_reflection (kernel surface; npm-specific │ │
│ │ refinements ship in Phase 3 plugin) │ │
│ │ Layer C dockerfile, sbom (syft), cve (grype), runtime_trace │ │
│ │ (5-scenario harness, sequential), certificate, │ │
│ │ entrypoint, shell_usage │ │
│ │ Layer D skills_index, conventions, adrs, policy, exceptions,│ │
│ │ repo_notes, repo_config, external_docs (opt-in) │ │
│ │ Layer E ownership, service_topology stub, slo stub │ │
│ │ Layer F empty (Phase 4+ task-specific evidence) │ │
│ │ Layer G semgrep, ast_grep, ripgrep_curated, gitleaks, │ │
│ │ test_coverage_mapping │ │
│ │ │ │
│ │ Kernel scaffolding (no implementations): │ │
│ │ codegenie.adapters.protocols (ADR-0032 Protocols) │ │
│ │ codegenie.tccm.{loader,model,queries} (ADR-0029) │ │
│ │ codegenie.skills.{loader,model} │ │
│ │ codegenie.conventions.{catalog,model} │ │
│ │ codegenie.indices.freshness (IndexFreshness sum type) │ │
│ │ codegenie.depgraph.{builder,model} (networkx graph; │ │
│ │ queries live in plugin adapters, not here) │ │
│ └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────┬──────────────────────────────────┘
│
▼
┌────────────────────────────┐
│ Phase 0 OutputSanitizer │ ← extended with
│ (extended): │ SecretRedactor
│ - field-name regex (P0) │ pass (chokepoint;
│ - JSONValue tree (P0) │ refuses to write
│ - SecretRedactor (P2) │ plaintext on the
│ - PromptInjectionMarker │ persisted path)
│ (P2; tag, not redact) │
└──────────────┬─────────────┘
│
▼
┌────────────────────────────┐
│ Phase 0 Writer + Cache │ ← unchanged
│ atomic .tmp → os.replace │
│ content-addressed cache │
│ declared_inputs governs │
│ cache keys (unchanged) │
└──────────────┬─────────────┘
│
▼
.codegenie/context/
├── repo-context.yaml (envelope; redacted)
├── schema-version.txt
├── raw/ (per-probe JSON; redacted)
│ ├── scip-index.scip (binary; consumed by
│ │ Phase 3 adapter)
│ ├── runtime-trace-{scenario}.{strace,json}
│ ├── syft-sbom.json, grype-cves.json
│ ├── semgrep-findings.json (redacted),
│ ├── gitleaks-findings.json (redacted)
│ ├── dep-graph.json
│ └── import-graph.json
└── runs/<utc-iso>-<short>.json (Phase 0 audit anchor;
unchanged)
Three structural lines from this diagram, each load-bearing:
- No new chokepoints. The Phase 0 coordinator/cache/sanitizer/writer are not extended in structure; only the sanitizer grows a new pass (SecretRedactor) added by composition, and the writer's chokepoint property survives. The performance design's parallel cost-tier coordinator and the security design's parallel _run_in_container chokepoint are both rejected in favor of registry annotations + run_allowlisted-via-run_external_cli wrapping.
- Kernel-only probes; no language plugin code. Probes in Phase 2 declare applies_to_languages=["*"] (or, for Node-specific Phase-1 follow-ons like NodeReflectionProbe already targeted at the existing Phase 1 surface, ["javascript", "typescript"]). npm-specific behaviors (npm audit, npm outdated, peer-dep resolution) ship inside the Phase 3 plugin. Maven probes ship in Phase 8+. This is the extension-by-addition fence (commitment §2.5).
- Adapter Protocols ship; adapter implementations don't. ADR-0032's four Protocol classes are pure types (~80 LOC total) shipped under codegenie.adapters.protocols. They are documentation as code. The performance design's projection-as-adapter-interface and scip-python reader are rejected for Phase 2: the projection shape is a Phase 3 concern owned by the first plugin's adapter implementation, which can decide whether to project, mmap, or re-parse at query time. Phase 2 ships the .scip binary blob; Phase 3 picks the consumption shape.
Components
1. IndexHealthProbe (B2 — the load-bearing one)
- Provenance: [B] structure + [S] threat-aware degradation reasons − [S]'s cryptographic-anchor ceremony.
- Purpose: Detect and surface index staleness for every index Phase 2 produces (SCIP, runtime trace, SBOM, semgrep, conventions, skills). Silent staleness is the worst failure mode of the entire system (production/design.md §2.3, CLAUDE.md load-bearing). This probe is what makes the load-bearing commitment §2.3 real in Phase 2.
- Interface:
```python
@register_probe
class IndexHealthProbe(Probe):
    name: ProbeId = ProbeId("index_health")
    layer: Literal["B"] = "B"
    tier: Literal["base"] = "base"
    applies_to_tasks: list[str] = ["*"]
    applies_to_languages: list[str] = ["*"]
    requires: list[ProbeId] = []   # reads other probes' OUTPUTS; the
                                   # coordinator's topological order
                                   # places B2 last by registry annotation
                                   # (`runs_last=True`)
    declared_inputs: list[str] = [".codegenie/context/raw/*.json", ".git/HEAD",
                                  "<scip-index-output>", "<image-digest-token>"]
    cache_strategy: Literal["none"] = "none"   # MUST run every gather;
                                               # caching this probe is
                                               # the same bug as caching
                                               # Date.now()
    timeout_seconds: int = 10
```
- Internal design: B2 reads the freshness metadata each upstream probe already wrote into its own slice (last_indexed_commit, files_indexed, files_in_repo, indexer_errors, last_traced_image_digest, built_image_digest, rule_pack_version, etc.) and the current git HEAD (via run_allowlisted("git", "rev-parse", "HEAD", ...); no gitpython dependency, per critic best-practices Q §4). For each index it constructs a typed IndexFreshness value via a smart constructor:
```python
# codegenie/indices/freshness.py
from typing import Annotated, Literal, Union
from datetime import datetime
from pydantic import BaseModel, ConfigDict, Field

class CommitsBehind(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)
    kind: Literal["commits_behind"] = "commits_behind"
    n: int
    last_indexed: str  # commit sha; raw str at the IO boundary

class DigestMismatch(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)
    kind: Literal["digest_mismatch"] = "digest_mismatch"
    expected: str
    actual: str

class CoverageGap(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)
    kind: Literal["coverage_gap"] = "coverage_gap"
    files_indexed: int
    files_in_repo: int

class IndexerError(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)
    kind: Literal["indexer_error"] = "indexer_error"
    message: str

StaleReason = Annotated[
    Union[CommitsBehind, DigestMismatch, CoverageGap, IndexerError],
    Field(discriminator="kind"),
]

class Fresh(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)
    kind: Literal["fresh"] = "fresh"
    indexed_at: datetime

class Stale(BaseModel):
    model_config = ConfigDict(extra="forbid", frozen=True)
    kind: Literal["stale"] = "stale"
    reason: StaleReason

IndexFreshness = Annotated[Union[Fresh, Stale], Field(discriminator="kind")]
```
The slice shape follows localv2.md §5.2 B2 verbatim for backward compatibility (every key the spec named), but each confidence: high|medium|low field is derived from the typed IndexFreshness value rather than written directly by the probe. The flat string is what repo-context.yaml carries (the human-readable rendering); the typed codegenie.indices.freshness.IndexFreshness value is what round-trips through Pydantic for in-process consumers (Phase 3 adapters, Phase 8 Bundle Builder, CONTEXT_REPORT.md).
- Phase-2-internal consumer. To prevent the "sum type without a consumer" anti-pattern the critic surfaced (shared blind spot #1, [B]'s own §Risks §1), Phase 2 itself ships one consumer: a CONTEXT_REPORT.md "Confidence" section renderer (src/codegenie/report/confidence_section.py) that pattern-matches on IndexFreshness for every index slice, prints the reason for Stale variants, and is exercised by every golden-file test. Under mypy --warn-unreachable, a missing case arm is a build error from Phase 2 onward.
- Why this instead of [S]'s cryptographic anchoring. The security design's BLAKE3-chained audit-log lookup defends against an attacker who can write to .codegenie/cache/, but per Phase 0 ADR-0011, .codegenie/ is 0700/0600 in the user's own directory. An attacker with that write capability already owns the host (the commitment §2 threat model excludes host compromise). The critic [S] §"hidden assumption" #3 named this directly: the chain detects bit-rot, not adversaries. We keep the structural freshness signals (commit equality, image-digest equality, coverage ratio, indexer-error count) and reject the cryptographic layer.
- What this does buy vs. mtime. B2 deliberately does NOT consult filesystem mtime. The freshness signal is (scip_indexed_commit == repo.HEAD), an O(1) string compare against an indexer-emitted header, plus the coverage ratio and the image-digest comparison. mtime-based freshness is forbidden via a Phase-0 forbidden-patterns pre-commit addition (os.path.getmtime / Path.stat().st_mtime banned inside src/codegenie/probes/index_health.py). [S]
- Why this choice over the alternatives: The three input designs proposed three different sum-type names (AdapterConfidence / IndexConfidence / IndexFreshness) for the same concept. We ship one name (IndexFreshness), in one module (codegenie.indices.freshness), and document why the other two are not needed yet. AdapterConfidence is ADR-0033's prescription for ADR-0032 adapter outputs; those ship with Phase 3, and the Phase 3 plugin author decides whether AdapterConfidence = Trusted | Degraded | Unavailable is the same shape or a layered shape over IndexFreshness. IndexConfidence collides with the human-readable confidence: high|medium|low string in repo-context.yaml. One name, one module, one consumer in Phase 2. [synth — resolves critic finding #3]
- Tradeoffs accepted: B2 must run last (runs_last=True registry annotation). Coupling to every other probe's slice shape is real; we accept it because the coupling is inverted relative to what [P] proposed. B2 reads slice metadata that is already part of the schema; it does not require sibling probes to expose a health_check(slice) -> AdapterConfidence Protocol (which [P] proposed, and which would have been a contract change to every probe). [S]'s alternative, in which B2 reads the audit-log event stream, is rejected; Phase 2 ships no event stream.
2. IndexFreshness sum type module
- Provenance: [B] verbatim; [synth] adds the Phase-2 consumer requirement.
- Purpose: One name, one module, one variant set for index freshness in Phase 2.
- Location: src/codegenie/indices/freshness.py (the only module in the codegenie.indices package for Phase 2; the package's __init__.py declares __all__ = ["IndexFreshness", "Fresh", "Stale", "StaleReason", "CommitsBehind", "DigestMismatch", "CoverageGap", "IndexerError"]).
- Tradeoffs accepted: Lives outside codegenie.probes.index_health so Phase 3 adapter implementations and Phase 8 Bundle Builder can import it without a circular dependency on the probe module. [B]
- Why this choice over the alternatives: Co-locating in probes/index_health.py (the critic's preferred boring default) is rejected only because the CONTEXT_REPORT.md confidence-section renderer needs to import it without pulling in the probe registry, and the renderer is the Phase-2 consumer that closes the "schema without a consumer" gap. One additional package is the smallest separation that makes the consumer real.
3. _run_external_cli — single subprocess port for Layer B/G external CLIs
- Provenance: [S] Command pattern adapted to [B+all]'s "use the existing Phase 0 chokepoint" discipline. Refused: [S]'s version of _run_external_cli as a parallel chokepoint, which would have created a second subprocess pathway.
- Purpose: Invoke scip-typescript, syft, grype, semgrep, ast-grep, ripgrep, and gitleaks under a uniform, single-chokepoint pattern that wraps Phase 0's run_allowlisted.
- Interface:
```python
# codegenie/exec.py (extends Phase 0 module)
async def run_external_cli(
    probe_name: ProbeId,
    argv: list[str],
    *,
    cwd: Path,
    timeout_s: float,
    allowlisted_egress: frozenset[str] = frozenset(),  # only for tools that
                                                       # legitimately fetch
                                                       # (grype DB)
    max_stdout_bytes: int = 64 * 1024 * 1024,  # 64 MB
) -> ProcessResult: ...
```
- Internal design: Delegates to the existing run_allowlisted(argv, ...) with three additions on top: (a) env strip enforced to the Phase 0 allowlist (PATH, HOME, LANG, LC_ALL, TERM, CODEGENIE_*); (b) on Linux only, an optional bubblewrap --unshare-net --ro-bind <repo> /work --bind <tmpdir> /tmp/probe wrap when bwrap is on PATH (graceful no-op when missing: bubblewrap is hardening, not a hard requirement, and the structural defenses ride on run_allowlisted); (c) stdout/stderr capped at max_stdout_bytes, with the tail included in any failure. The bubblewrap path is documented as best-effort hardening on Linux CI; macOS dev hosts fall back to env strip + cwd restriction with a single startup warning. [S, scaled down per critic [S] finding #1]
- Why this choice over the alternatives: [S] proposed bubblewrap as a mandatory boundary on Linux with an admitted macOS gap; we keep it as opt-in-on-availability and instead lean on Phase 0's ALLOWED_BINARIES (the binary itself is checksum-allowlisted at the OS package-manager layer, not at our process layer — a real-world host-hygiene concern, not a Phase 2 design concern). The critic correctly flagged that mandatory bwrap creates a developer/CI parity problem while delivering very little additional defense over run_allowlisted for the actual Phase 2 threat model (the repo author, not a malicious CLI binary). [synth — resolves critic [S] findings #1, #6]
- Tradeoffs accepted: Layer C (docker build for SBOM/CVE/runtime-trace) is NOT routed through _run_external_cli; it stays on run_allowlisted("docker", ...) directly, with explicit --network=none --cap-drop=ALL --security-opt=no-new-privileges flags constructed in the RuntimeTraceProbe module. docker is admitted to ALLOWED_BINARIES by Phase 2 ADR 0001-add-docker-to-allowed-binaries.md; there is no separate _run_in_container chokepoint. The microVM migration path (ADR-0012, Phase 5+) replaces this call site by amending the probe module, not by swapping a hexagonal port. [synth — resolves critic [S] finding §"Hexagonal sandbox claims that smuggle subprocess into the core"]
- Pattern decisions: Command pattern at the value-typed-argv level. Refused: hexagonal port-and-adapter framing (one adapter today is one function; "Port" labeling is ceremony per critic [S] §"Hexagonal applied to _run_external_cli"). Refused: a parallel _run_in_container for Layer C; Layer C calls run_allowlisted("docker", ...) directly.
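The three additions in the internal design can be kept as small pure functions, which makes them testable without spawning a process. This is a sketch under assumptions: strip_env, sandbox_prefix, and cap_output are hypothetical helper names, the --chdir flag is added here to express the cwd restriction, and the call into run_allowlisted is omitted because its signature is not given in this document. The bubblewrap options used (--unshare-net, --ro-bind, --bind, --chdir) are real bwrap flags, but the bind set mirrors the abbreviated form above; a working sandbox would also need read-only binds for the tool's own runtime (for example /usr).

```python
from __future__ import annotations

import shutil
import sys
from pathlib import Path

# Phase 0 env allowlist named above; CODEGENIE_* is matched by prefix.
_ALLOWED_ENV = frozenset({"PATH", "HOME", "LANG", "LC_ALL", "TERM"})


def strip_env(env: dict[str, str]) -> dict[str, str]:
    """(a) Drop every variable not on the allowlist, e.g. cloud credentials and tokens."""
    return {k: v for k, v in env.items() if k in _ALLOWED_ENV or k.startswith("CODEGENIE_")}


def sandbox_prefix(repo: Path, tmpdir: Path, *, platform: str = sys.platform,
                   bwrap: str | None = None) -> list[str]:
    """(b) Best-effort bubblewrap prefix: Linux only, empty (a no-op) when bwrap is absent."""
    if not platform.startswith("linux"):
        return []  # macOS and others: env strip + cwd restriction only
    bwrap = bwrap or shutil.which("bwrap")
    if bwrap is None:
        return []
    return [bwrap, "--unshare-net",
            "--ro-bind", str(repo), "/work",
            "--bind", str(tmpdir), "/tmp/probe",
            "--chdir", "/work", "--"]


def cap_output(data: bytes, max_bytes: int = 64 * 1024 * 1024) -> tuple[bytes, bool]:
    """(c) Keep at most max_bytes, preferring the tail; the flag reports truncation."""
    if len(data) <= max_bytes:
        return data, False
    return data[-max_bytes:], True
```

The final argv would be sandbox_prefix(...) + argv. Once the prefix is applied, argv[0] is the bwrap binary, so an ALLOWED_BINARIES check inside run_allowlisted would need to admit bwrap as well as the wrapped tool.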
4. SecretRedactor (Writer-chokepoint extension)
- Provenance: [S] structure − the encryption-key theatre + [synth] defer-storage policy.
- Purpose: Intercept every string in every ProbeOutput.schema_slice before it lands in repo-context.yaml, raw artifacts, cache, or audit; replace anything that matches a secret pattern with <REDACTED:fingerprint=BLAKE3_8>. Phase 2 does not persist plaintext at all.
- Interface:
```python
# codegenie/output_sanitizer.py (extends Phase 0 module)
def redact_secrets(
    slice_: dict[str, JSONValue], probe_name: ProbeId
) -> tuple[dict[str, JSONValue], list[SecretFinding]]:
    """Returns (redacted_slice, in-memory findings list).

    The findings list is discarded after the gather; Phase 2 persists NO plaintext.
    """
```
- Internal design: Runs after Phase 0's field-name regex and the JSONValue tree walk. Patterns come from gitleaks-equivalent defaults (AWS AKIA[0-9A-Z]{16}, GitHub ghp_[A-Za-z0-9]{36}, JWT, RSA -----BEGIN…PRIVATE KEY-----, NPM npm_[A-Za-z0-9]{36}, Anthropic sk-ant-…) plus a Shannon-entropy catch-all (≥ 4.5 bits/char) for unknown tokens of length ≥ 32. Each match is replaced with <REDACTED:fingerprint=<first-8-hex-of-blake3>>; an in-memory SecretFinding is collected (probe_name, fingerprint, pattern_class, file:line if available) and printed to the CLI summary at gather end, but not persisted.
- Why this choice over [S]'s encryption ceremony. [S]'s per-repo key in ~/.codegenie/keys/<repo>.key plus ciphertext in .codegenie/findings/secrets/<fp>.enc was, by [S]'s own §Risks §5 admission, encrypted with a key in the same trust tier. The critic correctly flagged this as obfuscation, not security ([S] critic finding #5). We pick a structurally simpler answer: don't persist the plaintext at all in Phase 2. The Planner (Phase 3+) does not need cleartext access to the secret to remediate it; it needs the fact that a secret exists at file:line, plus the pattern class. If a later phase needs cleartext for a specific judgment (e.g., a microVM-sandboxed CVE adjudicator), it can be re-derived at that point inside the microVM (ADR-0012) with the secret pattern as input. Phase-5 microVM escalation path: if/when cleartext access is genuinely required for an automated remediation, Phase 5's microVM reads the cleartext directly from the analyzed repo at that point in time, processes it inside the sandbox, and never persists it. The Phase 2 design names this as the explicit escalation door (see Open questions §1). [synth — resolves critic finding #7]
- Tradeoffs accepted: A human reviewer who wants to inspect the actual secret string must run gitleaks themselves against the repo at PR-review time; the PR evidence bundle carries only the fingerprint + file:line. The team's existing secret-hunting workflow is unchanged. We accept one regression: Phase 2 cannot make inline "secret rotation suggestions"; that is deferred to a Phase 4+ task class.
- Pattern decisions: Chain-of-responsibility composition at the sanitizer level (the existing Phase 0 sanitizer is unchanged; redact_secrets is a new pass that the sanitizer pipeline orchestrates). Refused: Capability pattern across the LLM boundary (the critic flagged [S]'s SecretFindingCapability as authorization-with-a-fancier-name: the LLM never holds the token; the helper that reads and renders is the actual access surface). [synth]
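The redaction pass can be sketched as a recursive walk over the JSONValue tree. Assumptions, stated plainly: hashlib.blake2b stands in for BLAKE3 so the example stays stdlib-only; only three of the listed pattern classes are included (the JWT, PEM-key, and sk-ant-… patterns are omitted); the candidate-token regex feeding the entropy check is hypothetical; and SecretFinding drops the optional file:line field.

```python
from __future__ import annotations

import hashlib
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Union

JSONValue = Union[None, bool, int, float, str, list["JSONValue"], dict[str, "JSONValue"]]

# Three of the pattern classes listed above (subset only).
_PATTERNS: dict[str, re.Pattern[str]] = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "npm_token": re.compile(r"npm_[A-Za-z0-9]{36}"),
}
# Hypothetical candidate-token shape for the entropy catch-all (length >= 32).
_CANDIDATE = re.compile(r"[A-Za-z0-9+/=_-]{32,}")
_ENTROPY_BITS = 4.5


@dataclass(frozen=True)
class SecretFinding:
    probe_name: str
    fingerprint: str
    pattern_class: str


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def fingerprint(secret: str) -> str:
    # blake2b stands in for BLAKE3 here; only the first 8 hex chars are kept.
    return hashlib.blake2b(secret.encode()).hexdigest()[:8]


def redact_secrets(slice_: dict[str, JSONValue], probe_name: str
                   ) -> tuple[dict[str, JSONValue], list[SecretFinding]]:
    findings: list[SecretFinding] = []  # in memory only; never written to disk

    def redact(match: re.Match[str], pattern_class: str) -> str:
        fp = fingerprint(match.group(0))
        findings.append(SecretFinding(probe_name, fp, pattern_class))
        return f"<REDACTED:fingerprint={fp}>"

    def scrub(text: str) -> str:
        for cls, pattern in _PATTERNS.items():  # named patterns first
            text = pattern.sub(lambda m, cls=cls: redact(m, cls), text)
        return _CANDIDATE.sub(  # then the entropy catch-all on what remains
            lambda m: (redact(m, "high_entropy")
                       if shannon_entropy(m.group(0)) >= _ENTROPY_BITS else m.group(0)),
            text,
        )

    def walk(value: JSONValue) -> JSONValue:
        if isinstance(value, str):
            return scrub(value)
        if isinstance(value, list):
            return [walk(item) for item in value]
        if isinstance(value, dict):
            return {key: walk(item) for key, item in value.items()}
        return value  # None, bool, int, float pass through unchanged

    return {key: walk(value) for key, value in slice_.items()}, findings
```

Named patterns run before the entropy catch-all, so a known secret is labeled with its class rather than as generic high-entropy text; a long but repetitive string (low entropy) passes through untouched.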
5. Layer G security-CLI wrappers — SemgrepProbe, SyftProbe, GrypeProbe, GitleaksProbe
- Provenance: [B] structure verbatim (one file per scanner, ≤ 200 LOC each, single-responsibility, no shared ScannerRunner abstraction).
- Purpose: Run third-party security/SBOM scanners; parse JSON output into typed schema slices via Pydantic smart constructors.
- Public interface: Each registers via @register_probe. Internal types use the ScannerOutcome = ScannerRan | ScannerSkipped | ScannerFailed discriminated union from [B] verbatim.
- Internal design: (a) check tool availability via Phase 0 tool_cache; (b) invoke via codegenie.exec.run_external_cli with explicit argv (no shell, no string interpolation, --metrics=off for semgrep to refuse phone-home); (c) parse JSON via a Pydantic smart constructor; (d) return ProbeOutput. Each scanner's findings flow through the writer-chokepoint redact_secrets pass before persistence: gitleaks finds the secret, the sanitizer redacts it, and the slice that lands in repo-context.yaml is fingerprint-only.
- Why this choice over the alternatives: This is [B]'s "four files instead of one abstraction" line. The [S] lens proposed a unified _run_external_cli(... capability_token) chokepoint specifically as the security primitive. We accept the chokepoint at the subprocess layer (run_external_cli) but not at the scanner-parser layer: semgrep's rule-pack flags, syft's image-vs-SBOM input mode, grype's SBOM input, and gitleaks's repo-root scan are genuinely different shapes, not interchangeable strategies. Critic-survivability: the security lens did not flag [B]'s four-file decomposition as wrong; it flagged the missing chokepoint. We give it the chokepoint at the right layer. [synth — resolves critic-noted tension between [B] §"Layer G security wrappers" and [S] §"Layer B/G external-CLI sandbox runner"]
- Tradeoffs accepted: ~200 LOC of probe-level scaffolding duplicated four times. The duplication is the point per [B] critic §"Composition + clear naming over generic frameworks": each probe is reviewable in one sitting; a shared abstraction would force four genuinely different invocations through one type signature.
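A sketch of the per-scanner shape for semgrep, using stdlib dataclasses in place of the Pydantic smart constructor. semgrep_argv, skip_if_missing, parse_semgrep, and the Finding fields are hypothetical names; shutil.which stands in for the Phase 0 tool_cache. The results[].check_id, path, and start.line fields follow semgrep's --json output; treating exit codes 0 and 1 as a completed scan is an assumption to verify against the pinned semgrep version.

```python
from __future__ import annotations

import json
import shutil
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Finding:
    rule_id: str
    path: str
    line: int


@dataclass(frozen=True)
class ScannerRan:
    findings: tuple[Finding, ...]


@dataclass(frozen=True)
class ScannerSkipped:
    reason: str


@dataclass(frozen=True)
class ScannerFailed:
    message: str


ScannerOutcome = Union[ScannerRan, ScannerSkipped, ScannerFailed]


def semgrep_argv(target: str) -> list[str]:
    # Explicit argv, no shell, no string interpolation; --metrics=off refuses phone-home.
    return ["semgrep", "scan", "--config", "p/secrets", "--json", "--metrics=off", target]


def skip_if_missing() -> ScannerSkipped | None:
    # shutil.which stands in for the Phase 0 tool_cache lookup.
    return None if shutil.which("semgrep") else ScannerSkipped("semgrep not on PATH")


def parse_semgrep(returncode: int, stdout: str) -> ScannerOutcome:
    """Smart constructor: raw JSON at the boundary, a typed outcome inside."""
    if returncode not in (0, 1):  # assumption: 0/1 mean the scan completed
        return ScannerFailed(f"semgrep exited with code {returncode}")
    try:
        doc = json.loads(stdout)
        findings = tuple(
            Finding(r["check_id"], r["path"], int(r["start"]["line"]))
            for r in doc.get("results", [])
        )
    except (ValueError, KeyError, TypeError, AttributeError) as exc:
        return ScannerFailed(f"unparseable semgrep JSON: {exc!r}")
    return ScannerRan(findings)
```

Malformed output never escapes as an exception; it becomes a ScannerFailed value the probe can report, which keeps each of the four scanner files self-contained.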
6. RuntimeTraceProbe (C4 — multi-scenario harness)
- Provenance: [B] structure, [P] cache-key against image digest.
- Purpose: Capture runtime behavior (syscalls, loaded libraries, network endpoints, shell invocations) of the analyzed repo's container under 5 scenarios (startup, smoke_test, healthcheck, shutdown, error_path).
- Interface: Standard probe; reads .codegenie/scenarios.yaml (Pydantic-validated; smart-constructor parsed; falls back to 5 defaults). The per-scenario result is a ScenarioResult = TraceScenarioCompleted | TraceScenarioFailed | TraceScenarioSkipped discriminated union per [B] verbatim.
- Internal design: Sequential per-scenario execution (no concurrency: multiple docker run instances of the same image contend for resources and confuse trace attribution). Each scenario: docker build → docker run --network=none --cap-drop=ALL --security-opt=no-new-privileges, with strace -f attaching from the host on Linux. On macOS (where dtruss would require a sudo prompt) and on any other platform, the scenario fails typed with StraceUnavailable. All docker/strace calls go through Phase 0 run_allowlisted (Phase 2 ADR 0001-add-docker-to-allowed-binaries.md extends ALLOWED_BINARIES). Per-scenario timeout 120 s; aggregate timeout 600 s.
- Cache key: The probe's declared_inputs include Dockerfile, .codegenie/scenarios.yaml, AND a special declared-input token image-digest:<resolved-from-Dockerfile-FROM-and-build-context>; the resolved local image digest is treated as a declared input, not as a cache-key bypass. This satisfies Phase 0 I1 (declared_inputs is the universal cache key) and resolves critic [P] finding #6. The image-digest resolver is provided via an optional ProbeContext.image_digest_resolver: Callable[[Path], str | None] | None = None field, a Phase-2-ADR-gated ProbeContext addition mirroring Phase 1 ADR-0002's parsed_manifest precedent (one optional callable, default None, defensive check at the call site). This is the only ProbeContext field Phase 2 adds; it does NOT touch the Probe ABC. [synth — resolves critic [P] finding #6]
- Why this choice over [P]'s image-digest-keyed cache bypass: [P] proposed letting C-layer probes override cache_key() directly, deviating from the Phase 0 declared_inputs model. The critic flagged this as a structural deviation that future probes would copy. Image digest as a declared-input token gives the same cache-hit behavior with no contract deviation; Phase 0's declared_inputs spec already permits special tokens (per localv2.md §4).
- Tradeoffs accepted: macOS dtruss requires sudo; we deterministically emit TraceScenarioFailed(reason=StraceUnavailable()) on macOS rather than prompting, so the macOS path is behaviorally distinct and surfaces in tests/property/test_trace_portability.py. Cold p50 ~90 s (5 scenarios × ~15 s); the image-digest cache key means a package.json-only change hits cache.
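How an image digest can sit inside declared_inputs instead of bypassing it, as a minimal sketch. The image-digest: token spelling, the length-prefixed SHA-256 framing, and the policy of returning None (do not cache) when no digest can be resolved are assumptions; the real Phase 0 cache-key algorithm is not reproduced here. The resolver parameter mirrors the optional ProbeContext.image_digest_resolver callable described above, including the defensive None check.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Callable, Optional

ImageDigestResolver = Callable[[Path], Optional[str]]
IMAGE_DIGEST_TOKEN = "image-digest:"  # hypothetical spelling of the special token


def _frame(h, data: bytes) -> None:
    # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def cache_key(repo: Path, declared_inputs: list[str],
              image_digest_resolver: Optional[ImageDigestResolver] = None) -> Optional[str]:
    """Hash every declared input; the image-digest token is resolved, not read from disk.
    Returns None (treat as uncacheable) when the digest cannot be resolved."""
    h = hashlib.sha256()
    for item in sorted(declared_inputs):
        if item.startswith(IMAGE_DIGEST_TOKEN):
            if image_digest_resolver is None:  # ProbeContext field defaults to None
                return None
            digest = image_digest_resolver(repo)
            if digest is None:  # no resolvable image: do not cache
                return None
            material = digest.encode()
        else:
            path = repo / item
            material = path.read_bytes() if path.is_file() else b"<absent>"
        _frame(h, item.encode())
        _frame(h, material)
    return h.hexdigest()
```

Files outside declared_inputs (package.json, in the test) never reach the hash, so editing them alone leaves the key unchanged as long as the resolver reports the same digest; whether a real package.json edit changes the built image's digest depends on what the Dockerfile copies into the image.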
7. Adapter Protocol definitions (kernel side of ADR-0032 — documentation as code)
- Provenance: [B] verbatim. Critic-acknowledged risk: "Strategy via Protocol with zero implementations." We accept the risk because the Protocol is the spec Phase 3's first adapter must implement against; Phase 3's exit criterion includes "the first adapter implements the Phase 2 Protocols unchanged" — any drift is a Phase 2 amendment, not a Phase 3 quiet edit.
- Purpose: Define the four Protocol interfaces (DepGraphAdapter, ImportGraphAdapter, ScipAdapter, TestInventoryAdapter) plus an AdapterConfidence placeholder. No implementations in Phase 2.
- Where it lives: src/codegenie/adapters/protocols.py (~80 LOC, pure types, stdlib + typing). AdapterConfidence lives separately in codegenie.adapters.confidence because, per critic finding #3, its variant set is owned by Phase 3 when the first adapter ships. Phase 2 declares a placeholder sum Trusted | Degraded(reason: str) | Unavailable(reason: str) to give Phase 3 a typed target, marked with the comment # Phase 3 plugin may extend; revise at first adapter.
- Tradeoffs accepted: No NullAdapter ships in Phase 2 (critic [B] finding #2 flagged the NullAdapter fixture as schema-validating-itself). The Phase 3 plugin vulnerability-remediation--node--npm is the first real implementation; Phase 2's exit does not require an implementation, only the Protocols + the Phase 3 contract to consume them unchanged.
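Structural typing is what lets a Protocol serve as documentation as code: a Phase 3 adapter satisfies it without importing a base class. The sketch shows the placeholder AdapterConfidence sum with the variant names given above, plus one Protocol whose method name and return shape (direct_dependencies returning a list plus a confidence) are hypothetical, since ADR-0032's signatures are not reproduced here. InMemoryDepGraph exists only to exercise the Protocol in this example; it is not a shipped NullAdapter.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Union, runtime_checkable


@dataclass(frozen=True)
class Trusted:
    pass


@dataclass(frozen=True)
class Degraded:
    reason: str


@dataclass(frozen=True)
class Unavailable:
    reason: str


# Phase 3 plugin may extend; revise at first adapter
AdapterConfidence = Union[Trusted, Degraded, Unavailable]


@runtime_checkable
class DepGraphAdapter(Protocol):
    """Hypothetical method set; ADR-0032 owns the real signatures."""

    def direct_dependencies(self, package: str) -> tuple[list[str], AdapterConfidence]: ...


class InMemoryDepGraph:
    """Example-only implementation; note that it does not inherit from DepGraphAdapter."""

    def __init__(self, edges: dict[str, list[str]]) -> None:
        self._edges = edges

    def direct_dependencies(self, package: str) -> tuple[list[str], AdapterConfidence]:
        if package not in self._edges:
            return [], Unavailable(f"{package!r} is not in the graph")
        return list(self._edges[package]), Trusted()
```

At runtime, isinstance against a runtime_checkable Protocol only checks that the methods exist; signature conformance is the type checker's job, which is why "implements the Phase 2 Protocols unchanged" is best enforced by mypy in Phase 3's CI.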
8. TCCMLoader (kernel side of ADR-0029)¶
- Provenance: [B] verbatim, scoped down by [synth] response to critic finding [B] #3.
- Purpose: Load and Pydantic-validate Task-Class Context Manifests. No Bundle building (Phase 8). No TCCM fixture in `tests/fixtures/plugins/` in Phase 2 (the critic flagged this as schema-validating-itself).
- Public interface: `TCCMLoader.load(path: Path) -> Result[TCCM, TCCMLoadError]`. `DerivedQuery` is a Pydantic discriminated union over the five ADR-0030 primitives, with no `Unknown` variant (per [B]'s open Q §3 recommendation: a sixth primitive requires an ADR amendment).
- Where it lives: `src/codegenie/tccm/{loader.py, model.py, queries.py}`.
- Phase 2-internal consumer: To avoid the schema-without-consumer trap the critic flagged, Phase 2 ships one TCCM in-tree, `docs/phases/02-context-gather-layers-b-g/_reference-tccm/tccm.yaml`: an illustrative manifest for the `index-health-self-check` task class. This is not a plugin (no `plugin.yaml`, no probes, no subgraph); it is a deliberately minimal reference fixture that exercises every field of the `TCCM` Pydantic model. The integration test (`tests/integration/tccm/test_reference_tccm_roundtrips.py`) loads it, asserts the schema round-trips, and consumes one `DerivedQuery` variant per primitive via a mock dispatcher. The reference TCCM is documentation, not infrastructure: it ships in `docs/`, not in `plugins/`.
- Tradeoffs accepted: A small DSL surface (the five `DerivedQuery` variants) ships before any plugin needs it. The alternative, `DerivedQuery: dict[str, Any]` "for now", directly violates ADR-0033 §1. Critic [B] finding #3 surfaced that no real consumer exists; we close that gap by shipping the reference TCCM under `docs/` (one consumer, in the integration-test path), not under `plugins/`, where it would imply pluggability that Phase 3 owns.
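The "no `Unknown` variant" rule can be sketched with a tag-to-class table and a `Result`-style return. Stdlib dataclasses stand in for the Pydantic discriminated union so the sketch runs without dependencies; `ImportsOf` and `CoveringTests` are placeholder variants, not the five ADR-0030 primitives, and `Ok`/`Err` are a hypothetical minimal `Result`.

```python
from dataclasses import dataclass
from typing import Any, Generic, Mapping, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


@dataclass(frozen=True)
class TCCMLoadError:
    message: str


# Placeholder variants: the real five primitives are defined by ADR-0030.
@dataclass(frozen=True)
class ImportsOf:
    path: str


@dataclass(frozen=True)
class CoveringTests:
    symbol: str


DerivedQuery = Union[ImportsOf, CoveringTests]
_BY_TAG: dict[str, type] = {"imports_of": ImportsOf, "covering_tests": CoveringTests}


def parse_derived_query(raw: Mapping[str, Any]) -> Union[Ok[DerivedQuery], Err[TCCMLoadError]]:
    tag = raw.get("compute")
    variant = _BY_TAG.get(tag) if isinstance(tag, str) else None
    if variant is None:
        # No Unknown variant: a sixth primitive requires an ADR amendment.
        return Err(TCCMLoadError(f"unknown compute variant: {tag!r}"))
    fields = {k: v for k, v in raw.items() if k != "compute"}
    try:
        return Ok(variant(**fields))
    except TypeError as exc:  # missing or unexpected fields for this variant
        return Err(TCCMLoadError(f"invalid fields for {tag}: {exc}"))
```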
9. SkillsLoader (D2)¶
- Provenance: [B] verbatim + [S]'s `_safe_yaml_load_skill` discipline, collapsed into the existing Phase 1 `safe_yaml.load` chokepoint.
- Purpose: Load and index YAML-frontmatter `SKILL.md` files from `~/.codegenie/skills/`, `.codegenie/skills/`, and optionally `~/.codegenie/skills-org/`. Validate frontmatter against a Pydantic schema. The body is recorded by byte offset only (progressive disclosure, commitment §2.7).
- Internal design: YAML is parsed via the Phase 1 `codegenie.parsers.safe_yaml.load` chokepoint, not a parallel `_safe_yaml_load_skill` helper. The critic correctly flagged [S]'s parallel loader as Rule 7's anti-pattern (two existing conventions blended). Phase 1's `safe_yaml.load` already wraps `yaml.CSafeLoader` with size and depth caps; we add one extra discipline at the Skills call site (and only there): `os.open(path, O_NOFOLLOW | O_NOCTTY)` followed by `os.fdopen` before passing to `safe_yaml.load`. `O_NOFOLLOW` makes the OS refuse to open a symlink at the final path component, which is the genuine Phase 2 attack surface: Skills are user-writable across three trust tiers, and `~/.codegenie/skills/x/SKILL.md → /etc/passwd` is in [S]'s adversarial-fixture corpus. (Symlinked parent directories are still followed; `O_NOFOLLOW` does not cover them.)
- Tradeoffs accepted: Three-tier merge (user > repo-local > org-shared) with first-tier-wins and a loud `skill_shadowed` warning in the CLI summary on every collision, per [S]'s open Q §6. Bodies are BLAKE3-hashed in a streaming pass but not retained in memory (progressive disclosure). One golden test asserts that a hostile `SKILL.md` with `!!python/object` in its frontmatter raises `SkillsLoadError` and executes no code.
- Why this over [S]'s parallel loader: Phase 1's existing chokepoint is sufficient; the `O_NOFOLLOW` discipline lives at the Skills-specific call site, not in a parallel YAML loader. [synth — resolves critic [S] finding #3]
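The Skills-specific open discipline might look like the sketch below. `read_skill_file`, the `max_bytes` default, and the reason strings are hypothetical; the returned bytes would then go to Phase 1's `safe_yaml.load`, which is not shown. On Linux and macOS a refused final-component symlink surfaces as `OSError` with `errno.ELOOP`.

```python
import errno
import os
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SkillsLoadError:
    path: str
    reason: str


def read_skill_file(path: str, max_bytes: int = 256 * 1024) -> Union[bytes, SkillsLoadError]:
    """Read SKILL.md bytes without following a symlink at the final path component."""
    flags = os.O_RDONLY | os.O_NOFOLLOW | os.O_NOCTTY
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        if exc.errno == errno.ELOOP:  # refused symlink on Linux and macOS
            return SkillsLoadError(path, "symlink_refused")
        return SkillsLoadError(path, f"open_failed:{errno.errorcode.get(exc.errno, exc.errno)}")
    with os.fdopen(fd, "rb") as fh:  # fdopen takes ownership of fd and closes it
        data = fh.read(max_bytes + 1)
    if len(data) > max_bytes:
        return SkillsLoadError(path, "too_large")
    return data  # next step (not shown): safe_yaml.load on the frontmatter
```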
10. ConventionsCatalogLoader (D5)¶
- Provenance: [B] verbatim.
- Purpose: Load and apply the org convention catalog (`~/.codegenie/conventions/*.yaml`); emit a typed `ConventionResult = Pass | Fail | NotApplicable` discriminated union per rule.
- Internal design: Pure functions over Pydantic-modeled rules; one `match` with a case per pattern type and `assert_never` on the unreachable branch. The pattern types (`dockerfile_pattern`, `dockerfile_pattern_inverted`, `file_pattern`, `missing_file`) are themselves a Pydantic discriminated union.
- Where it lives: `src/codegenie/conventions/{catalog.py, model.py}`.
- Tradeoffs accepted: No rule engine. Conventions in Phase 2 are simple file/regex/Dockerfile checks; OPA/Rego ships in Phase 16 (ADR-0021) when policy engines become real.
11. DepGraphProbe (B5 — kernel skeleton with sum-typed ecosystem discriminator)¶
- Provenance: [B]'s structure, minus [B]'s deferred string-keyed-dict sum type.
- Purpose: Build a `networkx.DiGraph` of the repo's internal package dependencies (monorepo modules and cross-references). Ecosystem-specific resolution lives in plugin-side adapters (Phase 3+); Phase 2 only stitches Phase 1's already-parsed manifests into the graph.
- Internal design: Reads the Layer A `manifests` and `build_system` slices that Phase 1 wrote. For each manifest path, dispatches to a per-ecosystem builder via a `@register_dep_graph_strategy(ecosystem: PackageManager)` decorator mirroring Phase 0's `@register_probe`. [B]'s open-coded dict (`ecosystem-detector: dict[str, Callable]`) is replaced with a decorator registry, satisfying ADR-0033's Open/Closed discipline at the file boundary. The `PackageManager` sum type already exists in the schema (`["bun", "pnpm", "yarn-classic", "yarn-berry", "npm"]` per Phase 1 ADR-0013); Phase 2 imports it and uses it as the decorator key.
- Why this choice over [B]'s deferred sum type: Critic [B] finding #5 directly attacked the "TODO sum-type after first plugin ships" comment. The decorator-registry pattern is the same Open/Closed primitive Phase 0 already ships for probes; using it here costs one decorator plus a typed registry, not a new abstraction. The fix is ~30 LOC; the deferral was speculation. [synth — resolves critic finding #5 against [B]]
- Where it lives: `src/codegenie/depgraph/{builder.py, model.py, registry.py}`.
12. TreeSitterImportGraphProbe (B3 — kernel skeleton)¶
- Provenance: [P]'s probe, scoped down (no internal thread pool); [B]'s deferral is overridden because B3 is in `localv2.md` §5.2 as Phase 2 scope.
- Purpose: Extract file-level import edges from the source tree using tree-sitter grammars. Emits `networkx.DiGraph`-serializable JSON to `raw/import-graph.json`. Forward and reverse adjacency are not precomputed in Phase 2; Phase 3's first `ImportGraphAdapter` decides whether to project, mmap, or walk at query time.
- Internal design: `py-tree-sitter` bindings (the one new C-extension dependency Phase 2 accepts, via an amendment to Phase 1 ADR-0009; see the new ADR `0002-tree-sitter-grammars-phase-2-amendment.md`). Per-file extraction takes ~5 ms, so a 50k-LOC repo with 2k source files takes ~10 s cold, serially. There is no internal `ThreadPoolExecutor`: critic [P] finding §"second concurrency layer" was correct that hidden parallelism inside a probe lies to the coordinator's semaphore budget. The probe occupies one CPU slot under the Phase 0 single semaphore; sequential extraction is the boring shape.
- Grammar pinning: Grammar `.so`/`.dylib` BLAKE3 pins are recorded in `tools/grammars.lock` (vendored; reviewed as data). A load-time mismatch is a typed `GrammarLoadRefused` failure mode. Grammars are loaded in-process in Phase 2. The [S] design's out-of-process `_grammar_runner` subprocess is rejected as over-engineering for the actual Phase 2 attack model: a malicious grammar would be a deliberate supply-chain compromise, which the pin already guards against. [synth — accepts critic [S] §"hidden assumption" #1 on bubblewrap as analogous]
- Why we accept this C-extension dependency: Phase 1 ADR-0009's "named-trigger threshold" applies. The trigger: `localv2.md` §5.2 B3 names `tree-sitter` as a required tool for B3, so Phase 2 cannot ship B3 without it. The ADR amendment (`docs/phases/02-context-gather-layers-b-g/ADRs/0002-tree-sitter-grammars-phase-2-amendment.md`) records that the trigger fired, the CVE-feed surface accepted, and the wheel-matrix cost. The performance design's `msgpack`/`scip-python`/`tantivy`/`gitleaks-python` all remain rejected; only `py-tree-sitter` plus grammar packs are added, because only `py-tree-sitter` has a named Phase 2 consumer.
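The original (since superseded by 02-ADR-0011) pre-load check could be sketched as below. BLAKE2b stands in for BLAKE3, a mapping of grammar name to hex digest is an assumed format for `tools/grammars.lock`, and `verify_grammar_pin` is a hypothetical name. A real loader must load exactly the bytes it hashed (for example from a private copy); otherwise the file can change between the check and the load.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Union


@dataclass(frozen=True)
class GrammarLoadRefused:
    grammar: str
    reason: str


def verify_grammar_pin(
    name: str, path: Path, lock: Mapping[str, str]
) -> Union[Path, GrammarLoadRefused]:
    """Check a grammar library against its pinned digest before anything loads it."""
    expected = lock.get(name)
    if expected is None:
        return GrammarLoadRefused(name, "not_pinned")
    actual = hashlib.blake2b(path.read_bytes(), digest_size=32).hexdigest()
    if actual != expected:
        return GrammarLoadRefused(name, "hash_mismatch")  # no grammar code has run
    return path  # caller may now load it (see the check-then-use caveat above)
```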
13. Probe registry annotations (the "cost-tier" discussion, resolved)¶
- Provenance: [synth], overruling both [P] (`cost_tier` ABC field) and [B] (no scheduling hint at all).
- Purpose: Give the Phase 0 coordinator enough information to schedule expensive probes intelligently without editing the `Probe` ABC.
- Interface: The `@register_probe` decorator (Phase 0, frozen) is extended to accept optional keyword arguments that ride alongside the probe in the registry dict. They are annotations on the registry entry, not fields on the `Probe` class. The signature becomes:

  ```python
  def register_probe(
      *,
      heaviness: Literal["light", "medium", "heavy"] = "light",
      runs_last: bool = False,
  ) -> Callable[[type[Probe]], type[Probe]]: ...
  ```

  Probes opt in by decoration: `@register_probe(heaviness="heavy")` on `RuntimeTraceProbe` and `SCIPIndexProbe`. The coordinator reads these annotations from the registry when ordering the topological chain (heavy probes start first under the single Phase 0 `Semaphore(min(cpu_count(), 8))`); `runs_last=True` reserves the final slot for `IndexHealthProbe`.
- Why this over [P]'s `cost_tier`: [P] proposed `cost_tier: Literal[0,1,2,3]` as a new ABC field, defended as analogous to Phase 1's `parsed_manifest` addition. The critic correctly noted (finding #2) that `parsed_manifest` was added to `ProbeContext`, not to `Probe` itself; the ABC was untouched. `cost_tier` is data the coordinator needs to dispatch, not data the probe needs to declare. Registry-side annotations capture exactly this scheduling concern at the right layer and require zero ABC edits. The Phase 0 single semaphore is preserved, with no per-tier semaphore explosion (critic [P] finding §"hidden assumption" #2 noted that GitHub-hosted runners have `cpu_count()=2`, where per-tier sizing degenerates to 2-vs-2 starvation).
- Why this over [B]'s "no scheduling hint at all": Without `heaviness`, the coordinator runs the 84-second `RuntimeTraceProbe` last by topological accident, blocking cold gather. The annotation is a soft sort key, not a separate semaphore. It does not change the contract surface; it changes only which task starts in which order under the existing single-semaphore budget. [synth — resolves critic finding #2]
- Tradeoffs accepted: The coordinator's `_dispatch` grows by ~15 LOC to read the annotation and sort the ready queue. This is a non-trivial coordinator edit, so we ADR-gate it (`docs/phases/02-context-gather-layers-b-g/ADRs/0003-coordinator-heaviness-sort-annotation.md`). The edit touches the coordinator's scheduling order, not the chokepoint surface Phase 0 froze (`Semaphore`, `wait_for`, failure-isolation `try/except`, `ProbeOutput` flow); the chokepoint is preserved.
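A minimal sketch of registry-side annotations plus the soft ready-queue sort. `RegistryEntry`, `dispatch_order`, and the empty probe classes are hypothetical stand-ins (the classes do not subclass the Phase 0 ABC). The sort only changes start order; holding a `runs_last` probe until its siblings finish would need extra coordinator logic that is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Literal, TypeVar

Heaviness = Literal["light", "medium", "heavy"]
_RANK: dict[str, int] = {"heavy": 0, "medium": 1, "light": 2}


@dataclass(frozen=True)
class RegistryEntry:
    probe_cls: type
    heaviness: Heaviness = "light"
    runs_last: bool = False


REGISTRY: dict[str, RegistryEntry] = {}
C = TypeVar("C", bound=type)


def register_probe(*, heaviness: Heaviness = "light", runs_last: bool = False) -> Callable[[C], C]:
    def decorator(cls: C) -> C:
        # The annotation lives on the registry entry; the class is returned untouched.
        REGISTRY[cls.__name__] = RegistryEntry(cls, heaviness, runs_last)
        return cls

    return decorator


def dispatch_order(ready: list[str]) -> list[str]:
    """Soft sort: heavy first, runs_last at the end, name as a deterministic tie-break."""

    def key(name: str) -> tuple[bool, int, str]:
        entry = REGISTRY[name]
        return (entry.runs_last, _RANK[entry.heaviness], name)

    return sorted(ready, key=key)


@register_probe(heaviness="heavy")
class SCIPIndexProbe: ...


@register_probe(heaviness="medium")
class SemgrepProbe: ...


@register_probe()
class ConventionProbe: ...


@register_probe(runs_last=True)
class IndexHealthProbe: ...
```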
14. Multi-repo fixture portfolio + the stale-SCIP fixture (roadmap exit criterion)¶
- Provenance: [P]'s portfolio sizing + [synth] explicit exit-criterion wiring.
- Purpose: Five fixture repos under `tests/fixtures/portfolio/` exercise different probe surfaces. The load-bearing one is `tests/fixtures/portfolio/stale-scip/`: its `.codegenie/cache/` is pre-populated with a SCIP index from a known prior commit, and the repo HEAD has moved on. `IndexHealthProbe` MUST detect this and return `IndexFreshness.Stale(reason=CommitsBehind(n>=1, last_indexed=<prior>))`. `tests/adv/phase02/test_stale_scip_fixture.py` asserts the typed outcome; the build FAILS if the probe doesn't catch it.
- Why this matters: This is exactly the roadmap exit criterion ("IndexHealthProbe surfaces at least one real staleness case in CI (deliberately seeded fixture) — proving the probe actually catches what it's there to catch"), encoded as a CI gate. Phase 2 cannot exit without it.
- CI lane: Serial (no `pytest-xdist`; the Phase 0 veto holds). Estimated CI walltime growth is ≤ 6 minutes; the bench canary `tests/bench/bench_portfolio_walltime.py` is advisory.
Data flow¶
A representative warm-path run on a real Node.js repo (~5k files, TypeScript, pnpm, GitHub Actions, Helm, image present in the local registry) where `src/payments/processor.ts` changed since the last gather:

1. Phase 0 CLI + tool-readiness. Extended for `semgrep`, `syft`, `grype`, `gitleaks`, `scip-typescript`, `tree-sitter`, `docker`, `strace`. A missing mandatory tool → typed `MissingToolError`; a missing optional tool → the probe ships a `confidence: low` slice.
2. Phase 0/1 prelude. `RepoSnapshot` via `run_allowlisted("git", "rev-parse", "HEAD")`. `PathIndex` built. Layer A probes run; most cache-hit. ~150 ms.
3. The Phase 0 coordinator dispatches Phase 2 probes sorted by the registry `heaviness` annotation (heavy first): `SCIPIndexProbe` (heavy) starts; `RuntimeTraceProbe` (heavy) starts; `SemgrepProbe`, `GitleaksProbe`, `SyftProbe`, `TreeSitterImportGraphProbe`, `DockerfileProbe` (medium) all dispatch under the single `Semaphore(min(cpu_count(), 8))`. Light probes (`ConventionProbe`, `SkillsIndexProbe`, etc.) wait for slot availability but finish during the long tail of the heavy probes. No per-tier semaphores; one budget.
4. `SCIPIndexProbe` MISSES (`.ts` source changed) and re-indexes; ~8 s.
5. `SyftProbe`/`GrypeProbe` cache-HIT (the image digest is unchanged, so the image-digest token in `declared_inputs` matches).
6. `SemgrepProbe` MISSES on the affected files; ~3 s incremental.
7. `GitleaksProbe` MISSES (`.git` changed); ~2 s. Findings flow through `redact_secrets` at the writer chokepoint: any AWS key in `.git/` history is replaced with `<REDACTED:fingerprint=…>` in the persisted slice.
8. `IndexHealthProbe` runs last (registry `runs_last=True`). It reads sibling slices and constructs `IndexFreshness.Fresh(indexed_at=…)` for SCIP (just re-indexed), `IndexFreshness.Fresh` for the runtime trace (image digest match), and so on. The `index_health.{scip,runtime_trace,sbom,semgrep}.confidence` strings in the persisted slice are derived from these typed values.
9. Output merge + sanitizer + writer. The two-pass sanitizer (Phase 0 + `redact_secrets`) runs once over the merged envelope, validates it against the schema, and writes `.codegenie/context/repo-context.yaml` atomically.
10. The `CONTEXT_REPORT.md` Confidence section is generated alongside; it pattern-matches every `IndexFreshness` value via an `assert_never`-checked `match`. Any `Stale` variant prints its reason.
11. Audit anchor (Phase 0, unchanged). Records the per-probe execution path (`Ran`/`CacheHit`/`Skipped`).
12. Exit 0. Total wall-clock: ~10 s, dominated by the SCIP re-index. Without a SCIP re-index (whitespace-only edit, SCIP cache-hits): ~1.5 s.
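Cold gather (first time on a 50k-LOC service with no built image): SCIP re-index (~10 s) + docker build (~47 s) + 5 trace scenarios (~75 s, sequential) + the other probes, all in parallel under the single semaphore. The critical path is the docker build followed by the trace scenarios (~47 s + ~75 s ≈ 122 s); SCIP and the remaining probes overlap it. Total: ~110–140 s, which meets the ≤ 180 s p95 cold-gather target.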
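A minimal sketch of a `redact_secrets`-style pass over a JSON-like envelope. It covers one pattern class only, AWS access key IDs (`AKIA` or `ASIA` followed by 16 uppercase alphanumerics); the real `SecretRedactor` also has other pattern classes and an entropy threshold. The 12-hex-character BLAKE2b fingerprint is a hypothetical stand-in for the elided `fingerprint=…` format, and since an unkeyed hash lets anyone confirm a guessed value, a keyed hash may be preferable.

```python
import hashlib
import re
from typing import Any

# AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 characters.
_AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def _fingerprint(secret: str) -> str:
    # Hypothetical scheme: stable across runs, so the same leaked key yields the
    # same marker in repo-context.yaml, raw artifacts, and the cache.
    return hashlib.blake2b(secret.encode(), digest_size=6).hexdigest()


def _redact_str(text: str) -> str:
    return _AWS_ACCESS_KEY_ID.sub(
        lambda m: f"<REDACTED:fingerprint={_fingerprint(m.group(0))}>", text
    )


def redact_secrets(value: Any) -> Any:
    """Return a redacted copy of a JSON-like envelope; the input is not mutated."""
    if isinstance(value, str):
        return _redact_str(value)
    if isinstance(value, dict):
        return {k: redact_secrets(v) for k, v in value.items()}  # keys left as-is
    if isinstance(value, list):
        return [redact_secrets(v) for v in value]
    return value
```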
Failure modes & recovery¶
| Failure | Detected by | Containment | Recovery | Source |
|---|---|---|---|---|
| External CLI missing (e.g., `semgrep` not on `$PATH`) | Phase 0 tool-readiness check at startup | Typed `MissingToolError`; if the tool is mandatory, the CLI exits with the install command from `localv2.md` §6 | Operator installs the tool | [B] |
| External CLI exits non-zero | `_run_external_cli` returns a non-zero `ProcessResult` | `ScannerOutcome.ScannerFailed(exit_code, stderr_tail)`; `ProbeOutput.confidence="low"` | Coordinator continues (Phase 0 failure isolation) | [B+S] |
| External CLI emits invalid JSON | Pydantic smart constructor returns `Result.Err(ParseError(...))` | Typed error; stdout/stderr tail in audit | Operator inspects the audit log | [B] |
| `scip-typescript` timeout on a huge monorepo | `asyncio.wait_for` at `timeout_seconds=300` | `IndexFreshness.Stale(reason=IndexerError(message="timeout"))`; Phase 3 adapter falls back to tree-sitter (per ADR-0032's declared fallback) | Operator re-runs with `--force-refresh` after fixing the underlying issue | [P+B] |
| `docker build` fails | Subprocess exit code | C-tier probes emit `confidence="unavailable"`; gather completes with a degraded tier C | Operator fixes the Dockerfile and re-runs | [P] |
| `strace` exec fails (macOS) | `_run_external_cli` raises the typed `StraceUnavailable` exception | `TraceScenarioFailed(reason=StraceUnavailable())` per scenario; `IndexHealthProbe` reads the aggregate `TraceCoverage` and emits `IndexFreshness.Stale(reason=IndexerError(message="strace_unavailable"))` for `runtime_trace`; gather still succeeds | The macOS path is permanent; CI is Linux-canonical | [P+B+S] |
| `gitleaks` finds a real AWS key in `.git/` history | `gitleaks` parses; `SecretRedactor` matches `AKIA...` | Plaintext replaced with `<REDACTED:fingerprint=…>` in `repo-context.yaml`, the raw artifact, and the cache. Plaintext is not persisted in Phase 2. | Human inspects the fingerprint + `file:line`; runs `gitleaks` manually for cleartext at PR review time | [S, scaled] |
| Hostile YAML in a Skill uses a `!!python/object` tag | `safe_yaml.load` (Phase 1 chokepoint) refuses; `yaml.YAMLError` raised | `SkillsLoader` wraps it as `Result.Err(SkillsLoadError)`; the offending file is skipped with an explicit error in the gather summary; other Skills load | Operator inspects the named file and investigates the supply chain | [B+S] |
| Symlink `~/.codegenie/skills/x/SKILL.md → /etc/passwd` | `os.open(..., O_NOFOLLOW)` raises `OSError` with `errno.ELOOP` at the Skills call site | Skill skipped with a typed `SkillsLoadError(reason="symlink_refused")`; loud CLI warning | Operator investigates planted symlinks | [S] |
| tree-sitter grammar BLAKE3 mismatch against `tools/grammars.lock` | Pre-load hash check | `GrammarLoadRefused` typed failure; probe slice marked `confidence: low`; no grammar code executes | Operator deliberately updates the pin (PR-reviewable) or investigates the supply chain | [S] |
| Stale-SCIP fixture in CI (deliberately seeded staleness) | `IndexHealthProbe` reads a `last_indexed_commit` mismatch | Returns `IndexFreshness.Stale(reason=CommitsBehind(n>=1, last_indexed=<prior>))`; the CI test asserts this exact typed outcome; the build passes only if the probe caught it | This is the roadmap exit criterion | [synth] |
| Hostile semgrep/grype/gitleaks JSON (truncated, oversized, deeply nested) | Pydantic smart constructor + `JSONValue` tree depth cap | Probe emits `ScannerOutcome.ScannerFailed(reason="invalid_json", stderr_tail=stdout[-2048:])`; the sanitizer rejects oversized payloads upstream | Operator inspects the audit | [B+S] |
| Adversarial Dockerfile (forkbomb, infinite loop in build) | Phase 0 timeout + container `--network=none --cap-drop=ALL --security-opt=no-new-privileges` | Probe times out; `TraceScenarioFailed(reason=Timeout(seconds=120))`; coordinator continues per Phase 0 isolation | Operator inspects the audit and investigates the adversarial repo | [B+S] |
| Concurrent gather race against the same repo | Phase 0 advisory lock at `.codegenie/cache/.lock` | Second invocation waits or fails fast (configurable) | — | [P] |
| Plain Stage 7 telemetry hooks needed | — | Not in Phase 2 scope per [B]'s acknowledged blind spot; Phase 9/11 ships them | — | [B] |
Pattern across all rows: every failure produces a typed value, not a thrown exception (Rule 12 — fail loud, structural). Exceptions are reserved for genuinely-exceptional cases (bugs, OOM, signals).
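As an illustration of "every failure is a typed value", a scanner-output smart constructor can be sketched with stdlib types standing in for Pydantic. The caps, reason strings, and `ok_exit_codes` parameter are assumptions. A per-scanner exit-code set is shown because not every non-zero exit is a failure: `gitleaks`, for example, exits 1 by default when it finds leaks.

```python
import json
from dataclasses import dataclass
from typing import Any, Union

MAX_STDOUT_CHARS = 8 * 1024 * 1024  # hypothetical cap
MAX_JSON_DEPTH = 64                 # hypothetical cap
TAIL = 2048


@dataclass(frozen=True)
class ScannerSucceeded:
    payload: Any


@dataclass(frozen=True)
class ScannerFailed:
    reason: str
    exit_code: int
    stderr_tail: str


ScannerOutcome = Union[ScannerSucceeded, ScannerFailed]


def _too_deep(value: Any, limit: int) -> bool:
    stack = [(value, 1)]  # iterative, so hostile nesting cannot exhaust the Python stack
    while stack:
        node, depth = stack.pop()
        if depth > limit:
            return True
        if isinstance(node, dict):
            stack.extend((v, depth + 1) for v in node.values())
        elif isinstance(node, list):
            stack.extend((v, depth + 1) for v in node)
    return False


def parse_scanner_output(
    exit_code: int, stdout: str, stderr: str, ok_exit_codes: frozenset[int]
) -> ScannerOutcome:
    if exit_code not in ok_exit_codes:
        return ScannerFailed("unexpected_exit", exit_code, stderr[-TAIL:])
    if len(stdout) > MAX_STDOUT_CHARS:
        return ScannerFailed("oversized", exit_code, stdout[-TAIL:])
    try:
        payload = json.loads(stdout)
    except (ValueError, RecursionError):  # JSONDecodeError subclasses ValueError
        return ScannerFailed("invalid_json", exit_code, stdout[-TAIL:])
    if _too_deep(payload, MAX_JSON_DEPTH):
        return ScannerFailed("too_deep", exit_code, stdout[-TAIL:])
    return ScannerSucceeded(payload)
```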
Resource & cost profile¶
- Tokens per run: 0. The Phase 0 `fence` job continues to assert this. Additions to Phase 2's `gather` extras: `networkx` (pure Python); `py-tree-sitter` + grammars (the one C-extension exception, per the amendment to Phase 1 ADR-0009). No `msgpack`, `scip-python`, `tantivy` (opt-in only; falls back to `ripgrep` via `run_allowlisted`), `gitleaks-python` (we shell out to the `gitleaks` binary), or `httpx`/`requests`/`socket`. No `gitpython`: `git` is already in `ALLOWED_BINARIES`, and we shell out for HEAD + rev-list-count.
- External CLI runtime additions to `ALLOWED_BINARIES`: `semgrep`, `syft`, `grype`, `gitleaks`, `scip-typescript`, `tree-sitter` (binary, optional; falls back to the Python bindings), `docker`, `strace` (Linux). Each entry is added to Phase 0's `ALLOWED_BINARIES` via ADR `0001-add-docker-and-security-cli-tools.md`.
- Wall-clock (1k-file fixture): cold p50 ≤ 90 s; p95 ≤ 180 s. Warm-cache p50 ≤ 1.5 s. Incremental (single `.ts` change) p50 ≤ 10 s.
- Memory peak: ≤ 600 MB during cold gather, dominated by the `scip-typescript` subprocess (~400 MB) and `semgrep` (~200 MB), with the codegenie process at ~150 MB. Those figures sum to ~750 MB, so the ceiling holds only if the `scip-typescript` and `semgrep` peaks do not coincide. Warm: ≤ 200 MB.
- Disk per gather: `repo-context.yaml` ~60 KB; `raw/` ~8 MB (SCIP binary ~2 MB; SBOM ~1 MB; traces ~4 MB); audit anchor ~500 bytes per gather.
- CI walltime delta vs. Phase 1: +5–6 minutes serial on the portfolio + adversarial lanes. No `pytest-xdist`: the Phase 0 veto holds, and the Phase 2 portfolio is small enough that serial CI is acceptable. The performance design's xdist exception is rejected (resolving critic finding #8 by not reversing the veto).
- Where security/best-practices traded off perf: (a) sequential runtime trace scenarios (~75 s wall-clock floor vs. a theoretical 15 s if parallel), accepted because parallel traces against the same image race for resources and confuse attribution; (b) no in-process `ThreadPoolExecutor` inside `TreeSitterImportGraphProbe` (~10 s sequential vs. a theoretical ~3 s threaded), accepted because hidden parallelism lies to the coordinator's semaphore budget; (c) plaintext-not-persisted secret redaction (the operator must manually re-derive cleartext at PR review time), accepted because in-tier encryption is theatre.
Test plan¶
The Phase 0 + Phase 1 test stack carries forward unchanged. Phase 2 adds:
Unit tests (`tests/unit/probes/`, `tests/unit/{indices,runtime,security,conventions,skills,tccm,adapters,depgraph}/`):

| Test module | Asserts |
|---|---|
| `test_index_health_probe.py` | Per-source freshness assertions; every `IndexFreshness` variant constructible; `cache_strategy = "none"` enforced; `runs_last` annotation respected by the coordinator |
| `test_indices_freshness.py` | `IndexFreshness` round-trip (`model_dump_json` → `model_validate_json` = identity); exhaustive `match` test that uses `assert_never` (a missing case is a `mypy --warn-unreachable` build error in CI) |
| `test_scip_index_probe.py` | `scip-typescript` invocation; output binary present; cache-key sensitivity to the tool-version stamp + Merkle of `.ts` files; timeout → `IndexerError` |
| `test_runtime_trace_probe.py` | Per-scenario sequential execution; per-scenario timeout; deterministic macOS `StraceUnavailable` path |
| `test_dep_graph_probe.py` | `@register_dep_graph_strategy` registry works; one strategy per `PackageManager` variant; monorepo graph correct |
| `test_tree_sitter_import_graph.py` | Per-file extraction; no internal thread pool; grammar pin verified at load |
| `test_security_wrappers.py` (one per scanner) | Pydantic smart constructor; subprocess mocked via `pytest-subprocess`; `ScannerOutcome` variants |
| `test_skills_loader.py` | Frontmatter parsing; `O_NOFOLLOW` symlink refusal; three-tier merge + shadowing warning; body recorded by byte offset, not loaded |
| `test_conventions_catalog.py` | One test per pattern type; `NotApplicable` path |
| `test_tccm_loader.py` | Loads the reference TCCM (`docs/phases/02-context-gather-layers-b-g/_reference-tccm/`); unknown `compute:` variant fails fast; all five `DerivedQuery` variants round-trip |
| `test_adapter_protocols.py` | Protocol structural typing (a no-op stub passes `isinstance` via `runtime_checkable`); `AdapterConfidence` variants construct |
| `test_secret_redactor.py` | Each pattern class matches; entropy threshold catches generic high-entropy strings; mutation test: a weakened regex causes at least one test to fail |
| `test_run_external_cli.py` | Env strip; allowlisted egress respected; stdout cap; bubblewrap graceful no-op on macOS |
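For orientation, the allowlist, env-strip, and stdout-cap properties that `test_run_external_cli.py` checks can be sketched standalone. This calls `subprocess.run` directly instead of Phase 0's `run_allowlisted`; `run_external_cli_sketch`, `ENV_PASSTHROUGH`, and the defaults are hypothetical, and sandboxing (bubblewrap) and egress control are not shown.

```python
import os
import shutil
import subprocess
from dataclasses import dataclass

# Illustrative subset; the real list is Phase 0's ALLOWED_BINARIES.
DEFAULT_ALLOWED = frozenset({"git", "semgrep", "syft", "grype", "gitleaks"})
ENV_PASSTHROUGH = ("PATH", "LANG", "LC_ALL")  # hypothetical passthrough set


@dataclass(frozen=True)
class ProcessResult:
    exit_code: int
    stdout: bytes
    stderr_tail: bytes
    stdout_truncated: bool


def run_external_cli_sketch(
    binary: str,
    *args: str,
    allowed: frozenset[str] = DEFAULT_ALLOWED,
    timeout: float = 300.0,
    stdout_cap: int = 16 * 1024 * 1024,
) -> ProcessResult:
    if binary not in allowed:
        raise PermissionError(f"{binary} is not allowlisted")
    exe = shutil.which(binary)
    if exe is None:
        raise FileNotFoundError(binary)
    # Fresh environment: only the named variables survive, so credentials in the
    # parent environment never reach the tool.
    env = {k: os.environ[k] for k in ENV_PASSTHROUGH if k in os.environ}
    proc = subprocess.run([exe, *args], env=env, capture_output=True, timeout=timeout, check=False)
    # capture_output buffers everything and this truncates afterwards; bounding
    # memory (not just the persisted size) would need a streaming reader.
    return ProcessResult(
        exit_code=proc.returncode,
        stdout=proc.stdout[:stdout_cap],
        stderr_tail=proc.stderr[-2048:],
        stdout_truncated=len(proc.stdout) > stdout_cap,
    )
```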
Integration tests (`tests/integration/`):
- One per scanner against a real-tool invocation (tiny vulnerable JS fixture for `semgrep`; tiny built image for `syft` → `grype`; planted dummy AWS key for `gitleaks`). CI-gated on the tool being present; skip-with-warning if missing.
- `RuntimeTraceProbe` end-to-end against a hello-world Node container; `shared_libs_loaded` contains the expected entries; `TraceCoverage = Complete` when all 5 scenarios succeed.
- `tests/integration/tccm/test_reference_tccm_roundtrips.py`: loads the reference TCCM, asserts each `DerivedQuery` primitive variant round-trips, and checks that the dispatcher mock returns typed values.
Golden-file tests (`tests/golden/probes/`):
- One golden file per probe per portfolio fixture. CI diffs live output against the committed expected output; `pytest --update-golden` regenerates them.
- Five-repo portfolio under `tests/fixtures/portfolio/`: `minimal-ts`, `native-modules`, `monorepo-pnpm`, `distroless-target`, `stale-scip` (the load-bearing fixture).
Property tests (`tests/property/`):
- `IndexFreshness` round-trip identity (Hypothesis).
- `SkillsLoader.find_applicable(...)` is monotone in `evidence_keys`.
- `TraceCoverage` is well-formed for any combination of scenario outcomes.
Adversarial tests (`tests/adv/phase02/`), the load-bearing exit:
- `test_stale_scip_fixture.py`: the roadmap exit criterion. The build expects `IndexFreshness.Stale(reason=CommitsBehind(n>=1, last_indexed=<known prior commit>))` and FAILS otherwise.
- `test_hostile_skills_yaml.py`: `!!python/object`, billion-laughs, deep nesting, symlink-escape filenames. ≥ 8 cases.
- `test_secret_in_source.py`: gitleaks finds a seeded secret; `SecretRedactor` replaces it in `repo-context.yaml`, the raw artifact, the cache, and the audit anchor. Plaintext is present in zero persisted files.
- `test_image_digest_drift.py`: mutating the built image between gathers correctly invalidates tier-C caches via the image-digest declared-input token.
- `test_concurrent_gather_race.py`: two concurrent gathers don't corrupt the cache; the Phase 0 advisory lock works.
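Phase 0 already ships the advisory lock; purely as an illustration, a wait-or-fail-fast lock can be built on POSIX `flock(2)` through Python's `fcntl` module. `gather_lock` and `GatherLocked` are hypothetical names, and the sketch only works on POSIX systems.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


class GatherLocked(Exception):
    """Another gather holds the lock and the caller asked to fail fast."""


@contextmanager
def gather_lock(lock_path: str, *, wait: bool) -> Iterator[None]:
    os.makedirs(os.path.dirname(lock_path), exist_ok=True)
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        flags = fcntl.LOCK_EX if wait else fcntl.LOCK_EX | fcntl.LOCK_NB
        try:
            fcntl.flock(fd, flags)  # blocks when wait=True
        except BlockingIOError:
            raise GatherLocked(lock_path) from None
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the flock
```

Because `flock` locks belong to the open file description, a second `os.open` of the same path, even from the same process, contends for the lock, which is what lets a race test run in one process.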
End-to-end tests (`tests/e2e/`):
- One end-to-end gather against a pinned open-source Node.js fixture, producing a full `repo-context.yaml` in which every `IndexFreshness` value is `Fresh`.
Bench (advisory, not gating; Phase 0 §3.2 discipline):
- `bench_portfolio_walltime.py` flags > 50% regressions on warm/cold p50.
Design patterns applied¶
| Decision | Pattern applied | Why here | Source | Pattern NOT applied (and why) |
|---|---|---|---|---|
| `IndexFreshness = Fresh \| Stale(reason: StaleReason)` instead of `freshness: Optional[str]` | Sum type / tagged union + make illegal states unrepresentable (ADR-0033 §3–4) | "Stale without a reason" is the silent failure mode B2 exists to prevent; `mypy --warn-unreachable` makes a missed case a build error | [B] | Null Object pattern: loses the reason a stale index is stale |
| `_run_external_cli` (Layer B/G) and direct `run_allowlisted("docker", ...)` (Layer C) | Command pattern at the value-typed-argv level | Auditing "every external CLI invocation" is `grep _run_external_cli` for B/G and `grep "docker"` for C. One chokepoint per family; one ADR per `ALLOWED_BINARIES` addition | [S, scaled] | Hexagonal Port/Adapter: with one adapter today, a function suffices and "Port" labeling is ceremony (critic [S]) |
| Adapter Protocol definitions in `codegenie.adapters.protocols` | Structural subtyping (PEP 544) | Plugins are external (ADR-0031); inheriting from our ABC would couple plugin authors to our class hierarchy | [B] | Abstract Factory: too heavyweight for "instantiate the class named in `plugin.yaml`" |
| `@register_probe(heaviness="heavy", runs_last=True)` registry annotations | Registry pattern + decorator data over ABC fields | Scheduling data belongs to the coordinator's view, not the probe's contract; matches Phase 0's existing decorator-registry primitive | [synth] | `cost_tier: Literal[0,1,2,3]` ABC field ([P]): ABC churn for a scheduling optimization (critic finding #2) |
| `@register_dep_graph_strategy(ecosystem: PackageManager)` decorator | Open/Closed at the file boundary | Adding a new ecosystem (Maven, Poetry) is a new file + decorator, never an edit to `DepGraphProbe` | [synth — overrules [B]'s deferred string-dict] | String-keyed dict ([B]): the Phase-3-deferred sum type was the exact ADR-0033 violation the critic flagged |
| `SecretRedactor` as a chokepoint pass in the existing Phase 0 sanitizer | Chain of responsibility / pipeline composition | Single-chokepoint discipline survives; one pass is added by composition, not a parallel sanitizer | [synth — scales [S]] | Capability pattern across the LLM boundary ([S]): the LLM never holds the token, so this is authorization with a fancier name (critic [S] finding §"Capability pattern") |
| `ScannerOutcome`, `ScenarioResult`, `ConventionResult`, `IndexFreshness` all Pydantic discriminated unions | Make illegal states unrepresentable (ADR-0033 §4) | Every state machine in Phase 2 surfaces as a typed sum; pattern-matching exhaustiveness via `mypy --warn-unreachable` | [B] | `Optional[T]` for parse results: loses the reason (same argument as `IndexFreshness`) |
| One file per Layer G scanner; no shared `ScannerRunner` abstraction | SRP + Rule of Three | Four scanners with four genuinely different I/O shapes don't share an abstraction worth ~60 LOC; the chokepoint is at `_run_external_cli`, not at the scanner parser | [B] | Template Method / generic `ScannerRunner`: speculative abstraction |
| Reference TCCM under `docs/`, not `plugins/` | Documentation as code, kept out of the plugin namespace | Phase 3 owns the plugin namespace; Phase 2 ships the schema with one consumer (the integration test) | [synth — overrules [B]'s `tests/fixtures/plugins/synthetic--syn--syn/`] | Synthetic plugin fixture under `plugins/` ([B]): implies pluggability that Phase 3 owns |
| `mypy --strict` (Phase 0 baseline) preserved; `--warn-unreachable` adopted incrementally | Strict-typing discipline (ADR-0033 §1, §4) | The `IndexFreshness` consumer in `CONTEXT_REPORT.md` requires `--warn-unreachable` to catch missed cases | [synth — scales [B]] | `--warn-unreachable` + `--enable-error-code=truthy-bool` repo-wide, retroactively ([B]): Phase 0/1 retrofit blast radius (critic [B] finding #4). Phase 2 enables them only on `src/codegenie/{indices,probes/index_health.py,report,adapters,tccm}/**` via per-module mypy config; full-repo enablement is a tracked backlog item |
Patterns considered and deliberately rejected¶
-
Plugin Loader in Phase 2 ([P]). Roadmap and ADR-0031 §Consequences §1 assign the loader to Phase 3 alongside the first plugin. Pulling it forward hollows out Phase 3's exit criterion ("first plugin doubles as proof the loader works") because the loader would already exist without anything to test it. We ship
Protocolclasses (documentation) and theTCCMLoaderskeleton (kernel scaffolding) — noplugin.yamlparser, noplugins/universal--*--*/directory. -
cost_tier: Literal[0,1,2,3]on the Probe ABC ([P]). Coordinator scheduling data does not belong on the probe contract; it belongs on the registry annotation alongside the@register_probedecorator. Critic finding #2 was correct that this is contract churn for a scheduling optimization. -
- `ProbeContext.capabilities: ProbeCapabilities` discriminated union ([S]). Every Phase 0/1 probe would need to `match` exhaustively on the discriminator to stay typecheck-clean: a coordinated every-file edit dressed up as "additive." Phase 2 instead keeps capabilities implicit (the registry already declares heavy/light; the subprocess port `_run_external_cli` already gates network egress).
- Cryptographic anchoring on B2 + audit-log hash chain ([S]). This defends against an attacker who can write to `.codegenie/cache/`, which per Phase 0 ADR-0011 requires having already compromised the host. Critic [S] findings #4 and #3 (hidden assumption) both correctly named this as ceremony against a non-threat.
- Per-repo encryption key for secret findings + `~/.codegenie/keys/<repo>.key` ([S]). Key and ciphertext live in the same trust tier (`$HOME`). Critic [S] finding #5 named this as obfuscation, not security. Phase 2's structural fix: don't persist plaintext.
- `pytest-xdist` for the portfolio test lane ([P]). Phase 0 vetoed xdist 10/4 with a recorded rationale. The performance design reversed the veto unilaterally; Phase 2's portfolio is small enough that serial CI walltime stays within budget. Critic finding #8 was correct.
- `AdapterConfidence` as the type of every probe's freshness output ([P]). That conflates ADR-0033's prescription for ADR-0032 adapter outputs (Phase 3) with Phase 2's probe output. We keep `IndexFreshness` localized to probes; `AdapterConfidence` is a Phase 3 concern.
- Event stream (`.codegenie/events/`) with hash-chained JSONL ([S]) / shape-compatible events writer ([P]). ADR-0034 §Consequences §1 explicitly defers the canonical event log to Phase 9 (or 13). Pre-shaping events in Phase 2 risks shape drift; Phase 9 owns the schema.
- `scip-python` parser + msgpack-on-disk projection ([P]). Adapter consumption shape is a Phase 3 concern. Adding a binary on-disk format that future adapters must agree on creates the very format coupling [P] claimed projections eliminate.
- Out-of-process `_grammar_runner` subprocess for tree-sitter ([S]). The grammar pin already guards the supply-chain surface; the subprocess wrap is over-engineering for the Phase 2 threat model (a malicious grammar would be a deliberate supply-chain compromise that the pin catches at load).
- `SkillsLoader.__init__` with auto-discovery via env vars (any design's temptation). Explicit search paths are passed at construction; the loader doesn't peek at env vars or import paths.
- `gitpython` as a new Phase 2 dep ([B]'s open Q §4). `git` is already in `ALLOWED_BINARIES`; we shell out via `run_allowlisted` for HEAD + rev-list count. Fewer deps; one fewer subprocess pattern.
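A minimal sketch of the HEAD + rev-list shell-out, assuming a hypothetical `run` callable that stands in for `run_allowlisted` (whose real signature this section does not give) and returns a command's stdout as text. Injecting the runner keeps the sketch testable without a git checkout; the hex-only check on the indexed commit is an added assumption, not something the design specifies.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for run_allowlisted: argv in, stdout text out.
Runner = Callable[[Sequence[str]], str]

_HEX = set("0123456789abcdef")


def head_commit(run: Runner) -> str:
    """Full SHA of HEAD, via `git rev-parse HEAD`."""
    return run(["git", "rev-parse", "HEAD"]).strip()


def commits_behind(run: Runner, last_indexed: str) -> int:
    """Commits reachable from HEAD but not from last_indexed.

    `git rev-list --count A..HEAD` prints how many commits HEAD has that A lacks,
    so 0 means the indexed commit already contains all of HEAD's history.
    """
    # The value comes from an index file on disk: refuse anything that could
    # be parsed by git as an option rather than a revision.
    if not last_indexed or not set(last_indexed.lower()) <= _HEX:
        raise ValueError(f"not a hex commit id: {last_indexed!r}")
    out = run(["git", "rev-list", "--count", f"{last_indexed}..HEAD"])
    return int(out.strip())
```

A probe could then report `commits_behind(...) >= 1` as a commits-behind staleness reason rather than as a judgment about whether the index is usable.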
## Anti-patterns avoided

- Premature pluggability. No Plugin Loader; no universal-fallback plugin; no `NullAdapter` fixture set. The `Protocol` classes ship as documentation; their first real consumer (Phase 3) is the proof that the contract works.
- Untyped `dict[str, Any]` interfaces. Every Phase 2 module exchange goes through Pydantic models. The one inherited untyped surface (`ProbeOutput.schema_slice: dict[str, JSONValue]`) is bounded by Phase 0's recursive `JSONValue` type.
- Side effects in constructors. `SkillsLoader.__init__(self, search_paths)` is pure data; the first call to `load_all()` is the first I/O.
- Tag-and-dispatch without a tagged union. The Phase 0 `cache_strategy: Literal["content", "none"]` field is preserved; Phase 2 does not introduce a third behavior via a `cache_key()` override ([P]'s image-digest keying is expressed as a `declared_inputs` special token, preserving the existing discriminator).
- "Hexagonal" sandbox that smuggles subprocess into the core. `_run_external_cli` is honestly a Command-pattern wrapper, not a hexagonal Port. We don't claim what we didn't build.
- Schema before consumer. Every typed sum type in Phase 2 has at least one Phase 2 consumer: `IndexFreshness` is consumed by `CONTEXT_REPORT.md`'s confidence-section renderer; `TCCM` is consumed by the reference-TCCM integration test; `ScannerOutcome` is consumed by every Layer G probe's caller. Adapter `Protocol`s are the one exception (documented as documentation-as-code, with Phase 3's exit criterion gating their first real consumer).
- Bypass-by-omission for `Result[T, E]`. Phase 0/1 don't ship `Result`; Phase 2 introduces it in new code only, with a `forbidden-patterns` pre-commit rule (bare `except: pass` is banned in `src/codegenie/{indices,tccm,skills,conventions,adapters,depgraph}/**`).
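The `forbidden-patterns` rule could be enforced with a small AST walk. This is a sketch under assumptions: the function and CLI shape are illustrative rather than the actual hook, it flags only a bare `except:` whose entire body is `pass`, and directory scoping is assumed to come from the pre-commit hook's `files` pattern, since pre-commit passes the matching filenames as arguments.

```python
import ast
import sys
from pathlib import Path


def find_bare_except_pass(source: str, filename: str = "<string>") -> list[int]:
    """Line numbers of `except:` handlers (no exception type) whose body is only `pass`."""
    tree = ast.parse(source, filename=filename)
    hits: list[int] = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.ExceptHandler)
            and node.type is None  # bare `except:` also swallows KeyboardInterrupt
            and len(node.body) == 1
            and isinstance(node.body[0], ast.Pass)
        ):
            hits.append(node.lineno)
    return sorted(hits)


def main(paths: list[str]) -> int:
    """Report every offending handler; non-zero exit fails the commit."""
    failed = False
    for p in paths:
        for line in find_bare_except_pass(Path(p).read_text(encoding="utf-8"), p):
            print(f"{p}:{line}: bare `except: pass` is forbidden here")
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```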
## Risks (top 5)

- Adapter `Protocol` drift between Phase 2 and Phase 3. We ship four `Protocol` classes with no implementations; Phase 3's first adapter may discover that a Protocol is wrong. Mitigation: Phase 3's exit criterion explicitly requires "the first adapter implements the Phase 2 Protocols unchanged"; any drift is a Phase 2 amendment ADR, not a quiet Phase 3 edit. The Phase 2 reference TCCM exercises the `DerivedQuery` discriminator across all five primitives, giving Phase 3 a typed target for the adapter dispatch shape.
- `IndexFreshness` consumer-coverage gap. The Phase 2 consumer (`CONTEXT_REPORT.md`'s confidence section) is the only code exercising the sum type's variants until Phase 3 ships. If the variant set is wrong, we discover it late. Mitigation: the consumer is real code (`src/codegenie/report/confidence_section.py`), not test scaffolding; every golden file exercises it; `mypy --warn-unreachable` on that module enforces exhaustiveness from day 1.
- The deliberately seeded `stale-scip` fixture goes stale. If `scip-typescript` upstream changes its header format, the fixture may stop catching the regression we built it for. Mitigation: the fixture is content-hash-pinned, and the assertion checks the structural property (`CommitsBehind.n >= 1`), not a specific tool-version artifact. A fixture-regeneration runbook lives in `tests/fixtures/portfolio/stale-scip/README.md`.
- `tree-sitter` is Phase 2's one C-extension exception, and its CVE surface compounds. Phase 1 ADR-0009's named-trigger amendment accepts this; if `tree-sitter` ships a memory-corruption CVE, Phase 2's import-graph probe is the affected surface. Mitigation: grammar BLAKE3 pins in `tools/grammars.lock`; CVE feeds watched via `pip-audit` + `osv-scanner` per Phase 0 §2.5; in-process load (not subprocess) means a crashed grammar crashes the gather, which is loud (Phase 0 isolation contains it to one probe).
- `docker` added to `ALLOWED_BINARIES` is a new attack surface. Phase 2 ADR-0001 (`0001-add-docker-and-security-cli-tools-to-allowed-binaries.md`) accepts this. `docker build` runs adversarial-Dockerfile `RUN` instructions inside a container with `--network=none --cap-drop=ALL --security-opt=no-new-privileges`. Mitigation: the Layer C probes use `docker` with explicit hardening flags constructed in the probe module (not via a Hexagonal Port; direct usage is the honest shape); ADR-0012 microVM substitution is the Phase 5+ upgrade path; the Phase 2 risk is documented as the "good enough until microVM" tier, per the critic-noted security trade-off.
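A sketch of building the hardening flags in one place inside the probe module, as the last mitigation describes. It assumes the trace step launches the already-built image with `docker run`, where all three flags are documented `docker run` options; the function name, the `--rm` choice, and the image-reference check are illustrative assumptions, not the probe's actual code.

```python
from typing import Sequence

# The three flags named in the docker risk above.
HARDENING_FLAGS: tuple[str, ...] = (
    "--network=none",                    # container gets only a loopback interface
    "--cap-drop=ALL",                    # drop every Linux capability
    "--security-opt=no-new-privileges",  # setuid binaries cannot gain privileges
)


def hardened_run_argv(image: str, command: Sequence[str]) -> list[str]:
    """Build a `docker run` argv with the hardening flags always present.

    Building the list in one function means no call site can forget a flag,
    and `image` / `command` stay discrete argv items (no shell involved).
    """
    if not image or image.startswith("-"):
        # An image reference starting with "-" would be parsed as a docker option.
        raise ValueError(f"refusing suspicious image reference: {image!r}")
    return ["docker", "run", "--rm", *HARDENING_FLAGS, image, *command]
```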
## Synthesis ledger

### Vertex count
- Performance design ([P]): ~32 decision vertices.
- Security design ([S]): ~38 decision vertices.
- Best-practices design ([B]): ~30 decision vertices.
- Total: ~100 atomic decision vertices.
### Edges

- AGREE: ~24 (all three on: no LLM in gather, `IndexHealthProbe` is load-bearing, secret findings need redaction at the writer chokepoint, sequential runtime-trace scenarios, Pydantic discriminated unions for state machines, one file per Layer G scanner, no `gitleaks-python` library dep, no Bundle Builder in Phase 2)
- CONFLICT: ~18 (resolved below)
- COMPLEMENT: ~12 (e.g., [B]'s `SkillsLoader` + [S]'s `O_NOFOLLOW` discipline compose)
- SUBSUME: ~6 (e.g., [P]'s tree-sitter parallelism inside the probe is subsumed by [B]'s no-internal-pool default)
### Conflict-resolution table

| Dimension | [P] picks | [S] picks | [B] picks | Winner | Exit | Roadmap | Commitments | Critic | Pattern | Sum |
|---|---|---|---|---|---|---|---|---|---|---|
| Plugin loader in Phase 2 | YES (loader + universal fallback) | (implicit, assumes loader) | NO (Protocols + `TCCMLoader` only) | [B] | 3 (no exit dependency) | 3 (Phase 3 owns it per ADR-0031) | 3 (extension-by-addition) | 3 (critic finding #1) | 2 (premature pluggability) | 14 |
| Probe ABC contract change | YES (`cost_tier`) | YES (`capabilities`) | NO | [B+synth] (kernel registry annotations instead) | 2 | 2 (preserves Phase 0/1 frozen surface) | 3 (commitment §2.5) | 3 (critic finding #2) | 3 (Open/Closed) | 13 |
| `IndexFreshness` / `AdapterConfidence` / `IndexConfidence` | `AdapterConfidence` (in probes) | `IndexConfidence` (B2-only) | `IndexFreshness` (B2; sum type at `codegenie.indices.freshness`) | [B] | 3 (variant set fits exit criterion) | 2 (Phase 3 picks `AdapterConfidence` for adapters) | 3 (commitment §2.3) | 3 (critic finding #3) | 3 (illegal states) | 14 |
| Secret findings handling | not addressed | redact + encrypted-on-disk under `~/.codegenie/keys` | inline JSON in `gitleaks-findings.json` | [synth] (redact at writer chokepoint; do NOT persist plaintext; Phase 5 microVM is the escalation door) | 2 | 3 | 3 (commitment §2 host hygiene) | 3 (critic finding #7) | 2 (no theatre) | 13 |
| `pytest-xdist` reversal | YES (portfolio lane) | silent | silent | Phase 0 veto holds | 1 | 2 (preserves Phase 0 decision) | 2 (no flake budget) | 3 (critic finding #8) | 1 | 9 |
| External-CLI sandbox | none (cost-tier only) | mandatory bubblewrap + macOS gap | none (`run_allowlisted` only) | [synth] (`_run_external_cli` wraps `run_allowlisted`; bubblewrap on Linux when available, no hard requirement) | 1 | 1 | 2 | 3 (critic [S] findings #1, #6) | 2 (Command, not Hexagonal) | 9 |
| `ExternalDocsProbe` network capability | added via `httpx` | sidecar binary under bwrap | opt-in with skip-cleanly | [B] (opt-in; disabled by default; if enabled, uses `_run_external_cli` against an allowlisted host catalog) | 2 | 2 | 3 (commitment §2: no `httpx`/`requests`/`socket` import in `src/codegenie/`) | 2 | 2 | 11 |
| Tree-sitter dep amendment to ADR-0009 | YES (also `msgpack`, `scip-python`, `tantivy`, `gitleaks-python`) | YES (also seccomp/bubblewrap libs) | YES (just `tree-sitter`) | [B] (`tree-sitter` only; named trigger fired) | 2 (B3 needs it) | 2 | 3 (commitment §2.5 dep creep) | 3 (critic shared blind spot #2) | 2 | 12 |
| Cache-key strategy for image-built probes | new `cache_key()` override hook bypassing `declared_inputs` | cryptographic anchor via audit log | stays on `declared_inputs` | [synth] ([B]'s discipline + image digest as a declared-input token, the special-token pathway already in `localv2.md §4`) | 2 | 2 | 3 (commitment §2: `declared_inputs` is the universal cache key) | 3 (critic [P] finding #6) | 2 | 12 |
| Audit-log event stream | shipped (3 variants) | shipped (10+ variants, hash-chained) | not shipped | [B] (Phase 0 audit anchor unchanged; ADR-0034 says Phase 9 ships the event log) | 1 | 3 (ADR-0034 §Consequences §1) | 2 | 3 (critic [S] §"missed") | 2 (event sourcing before its consumer = anti-pattern) | 11 |
| `IndexFreshness` module location | (`AdapterConfidence` in `confidence.py`) | (`IndexConfidence` in `index_health.py`) | `codegenie.indices.freshness` | [B+synth] (separate module; Phase 2 consumer in `report/confidence_section.py` closes the schema-without-consumer gap) | 2 | 2 | 2 | 2 | 3 | 11 |
| `DepGraphProbe` ecosystem dispatch | (n/a) | (n/a) | string-keyed dict with TODO | [synth] (`@register_dep_graph_strategy(ecosystem: PackageManager)` decorator) | 1 | 2 | 3 (ADR-0033) | 3 (critic [B] finding #5) | 3 (Open/Closed) | 12 |
| `gitleaks` shipping shape | binary CLI (`gitleaks-python` lib) | binary CLI under bwrap | binary CLI via `run_allowlisted` | [B] (binary CLI via `_run_external_cli`) | 1 | 2 | 3 (no new C-extension lib) | 2 | 2 | 10 |
| `gitpython` dep | added | (silent) | open Q | shell out via `run_allowlisted("git", ...)` | 1 | 2 | 3 (one less dep; `git` already allowlisted) | 2 (critic [B] §"hidden assumption" #2) | 2 | 10 |
| `mypy --warn-unreachable` rollout | (n/a) | (implied via `assert_never`) | repo-wide retroactive | [synth] (per-module config: only Phase 2 modules) | 2 | 2 | 2 (commitment §3: surgical changes) | 3 (critic [B] finding #4) | 2 | 11 |
| `RuntimeTraceProbe` cache-key shape | image-digest override hook | (silent) | `declared_inputs = ["Dockerfile", ".codegenie/scenarios.yaml"]` | [synth] (image digest as declared-input special token via the Phase 2 ADR-gated optional `ProbeContext.image_digest_resolver`) | 2 | 2 | 3 (`declared_inputs` discipline) | 3 | 2 | 12 |
| `SkillsLoader` YAML safety | (silent) | parallel `_safe_yaml_load_skill` chokepoint | reuses Phase 1 `safe_yaml.load` | [B+synth] (Phase 1 chokepoint + `O_NOFOLLOW` at the Skills call site) | 2 | 2 | 3 (Rule 7: don't fork conventions) | 3 (critic [S] finding #3) | 2 | 12 |
| `TreeSitterImportGraphProbe` parallelism | internal `ThreadPoolExecutor` | (silent) | (silent) | [synth] (no internal pool; sequential under single semaphore) | 1 | 1 | 2 (honesty to coordinator's budget) | 3 (critic [P] §"hidden assumption" #3) | 2 | 9 |
### Shared blind spots considered

All three designs quietly agreed on patterns that the synthesis re-examined and resolved:

- Sum type pre-shipped without a real consumer. Fixed by a Phase 2-internal consumer (`CONTEXT_REPORT.md`'s confidence section) for `IndexFreshness`; the reference TCCM under `docs/` for `TCCM`; and a Phase 3 contract for the adapter `Protocol`s.
- `tree-sitter` added without engaging Phase 1 ADR-0009. Fixed by an explicit Phase 2 ADR amendment, `0002-tree-sitter-grammars-phase-2-amendment.md`; `msgpack`/`scip-python`/`tantivy`/`gitleaks-python` remain rejected.
- `RuntimeTraceProbe` 5-scenario configuration shape. The three designs proposed three shapes (config flag, typed enum, `scenarios.yaml`). Phase 2 picks a Pydantic-validated `scenarios.yaml` ([B]) and falls back to 5 default scenarios if the file is absent.
### Pattern reconciliation

| Pattern | Where it appeared | Synthesis disposition | Rationale |
|---|---|---|---|
| Plugin architecture / Plugin loader | [P] §2 (loader + universal fallback); [B] (Protocols only) | Adopt [B]: Protocols + `TCCMLoader` only. No loader in Phase 2. | Roadmap + ADR-0031 §Consequences §1 explicitly assign the loader to Phase 3 |
| Hexagonal / Ports & Adapters | [P] §2 (loader-as-Port); [S] (`_run_external_cli`-as-Port) | Reject both as Hexagonal; accept the Command-pattern shape for `_run_external_cli`. | Critic correctly flagged "one Adapter = no Port"; we don't claim what we don't build |
| Capability pattern | [S] (`ProbeCapabilities` + `SecretFindingCapability`) | Reject. Authorization across an LLM boundary is not a capability: the LLM never holds the token. | Critic [S] §"Capability pattern applied to SecretFindingCapability" |
| Event sourcing | [P] §8 (3 events); [S] §"EventStream" (10+ events with hash chain) | Defer. ADR-0034 §Consequences §1 says Phase 9 anchors the event log. Phase 2's audit anchor (Phase 0) is unchanged. | Pre-shaping events before their consumer = schema-before-consumer anti-pattern |
| Decorator registry | [P] (deprecated in favor of a `cost_tier` ABC field); [B] (preserved); [synth] extends with kwargs | Adopt and extend (`heaviness`, `runs_last`, `@register_dep_graph_strategy`) | Open/Closed at the file boundary; mirrors the Phase 0 primitive |
| Smart constructor + `Result[T, E]` | [B] | Adopt selectively for new Phase 2 module boundaries; no retrofit to Phase 0/1 | Surgical-changes discipline (Rule 3); Phase 2 surfaces are isolated |
| Make-illegal-states-unrepresentable | [B] (every sum type); [P] (`AdapterConfidence`); [S] (`IndexConfidence`) | Adopt with one name (`IndexFreshness`), one module, one Phase 2 consumer | Resolves three competing names; closes critic finding #3 |
| Functional core / imperative shell | [P] (SCIP projector); [B] (`CatalogLoader.apply`) | Adopt where it earns its name; reject ceremony | Critic [P] §"FCIS on SCIP projector": labeling pure functions as "core" doesn't earn the pattern |
| Strategy via Protocol | [B] (adapter Protocols, zero implementations) | Accept the ceremony cost | Protocols are the Phase 3 contract; risk is bounded by Phase 3's exit criterion gating drift |
| Open/Closed | [synth] (`DepGraphProbe` decorator) | Apply where the input designs flagged a TODO | Critic finding #5 against [B] |
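A sketch of the smart-constructor + `Result[T, E]` shape adopted for new Phase 2 boundaries. `Ok`, `Err`, `CommitSha`, and `parse_commit_sha` are hypothetical names chosen for illustration, stdlib dataclasses stand in for whatever the real module uses, and the 40-hex-digit check assumes SHA-1 object IDs.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


# Callers must inspect which variant they got; failure is a value, not an exception.
Result = Union[Ok[T], Err[E]]

_SHA1_RE = re.compile(r"[0-9a-f]{40}")  # assumes SHA-1 repos, not SHA-256 object format


@dataclass(frozen=True)
class CommitSha:
    value: str


def parse_commit_sha(raw: str) -> Result[CommitSha, str]:
    """Smart constructor: the intended entry point for building a CommitSha."""
    candidate = raw.strip().lower()
    if _SHA1_RE.fullmatch(candidate):
        return Ok(CommitSha(candidate))
    return Err(f"not a 40-hex-digit commit SHA: {raw!r}")
```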
### Departures from all three inputs

- No event stream in Phase 2. [P] ships 3 events; [S] ships 10+ hash-chained events. The final design ships zero. Justification: ADR-0034 §Consequences §1 is unambiguous that Phase 9 anchors the event log; pre-shaping risks schema drift; and the Phase 0 audit anchor already records `Ran`/`CacheHit`/`Skipped` per probe, which is the only signal Phase 2 actually needs to satisfy the exit criterion.
- Image digest as a declared-input special token, not as a cache-key override. [P] proposed letting probes override `cache_key()`; [S] is silent; [B] stays on `declared_inputs`. The final design extends `declared_inputs` with a special-token form (the special-token pathway already permitted by `localv2.md §4`), via a Phase 2 ADR-gated optional `ProbeContext.image_digest_resolver` callable. The discipline survives without a bypass.
- `heaviness` registry annotation instead of a `cost_tier` ABC field. [P] proposed the ABC field; [B] proposed nothing. The final design picks a registry annotation as a decorator kwarg, preserving the Phase 0 ABC and giving the coordinator a soft sort key under the single semaphore.
- No plaintext secret persistence. [S] proposed encrypted-on-disk; [P] and [B] are silent. The final design persists no plaintext at all. The Phase 5 microVM is named explicitly as the escalation door for any future cleartext-required judgment.
- `heaviness` + `runs_last` together replace [P]'s 4-tier semaphore + `IndexHealthProbe.requires=[every-other-probe]` topological hack. The coordinator stays single-semaphore; ordering is a registry-side concern.
- The reference TCCM ships under `docs/`, not `tests/fixtures/plugins/`. [B] proposed a synthetic plugin fixture; the final design ships the reference TCCM as documentation so it doesn't imply pluggability that Phase 3 owns.
- `@register_dep_graph_strategy` decorator instead of [B]'s deferred string-keyed dict. The fix is ~30 LOC and applies ADR-0033 immediately rather than deferring to Phase 3.
- `SkillsLoader` reuses the Phase 1 `safe_yaml.load` chokepoint (with one extra `O_NOFOLLOW` discipline at the call site) instead of [S]'s parallel `_safe_yaml_load_skill` helper. Rule 7: don't fork conventions.
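A sketch of how `heaviness` and `runs_last` might order dispatch under one semaphore without touching the Probe ABC. The `ProbeEntry` record, the integer heaviness scale, the heaviest-first policy, and the name-based tiebreak are assumptions for illustration; ADR-0003 owns the real sort rule.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class ProbeEntry:
    name: str
    run: Callable[[], Awaitable[str]]
    heaviness: int = 0       # registry-side annotation (assumed scale: higher starts earlier)
    runs_last: bool = False  # e.g. IndexHealthProbe, instead of requires=[every-other-probe]


def dispatch_order(entries: list[ProbeEntry]) -> list[ProbeEntry]:
    """runs_last probes go after everything else; otherwise heaviest first, name as tiebreak."""
    return sorted(entries, key=lambda e: (e.runs_last, -e.heaviness, e.name))


async def run_probes(entries: list[ProbeEntry], limit: int) -> list[str]:
    sem = asyncio.Semaphore(limit)  # the single coordinator semaphore

    async def one(entry: ProbeEntry) -> str:
        async with sem:
            return await entry.run()

    ordered = dispatch_order(entries)
    regular = [e for e in ordered if not e.runs_last]
    last = [e for e in ordered if e.runs_last]
    # Tasks are created in sorted order, so heavier probes tend to acquire the semaphore first
    # (a soft ordering, not a guarantee about completion order).
    results = await asyncio.gather(*(one(e) for e in regular))
    # runs_last probes start only after every regular probe has finished.
    results += await asyncio.gather(*(one(e) for e in last))
    return results
```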
## Exit-criteria checklist

- [x] "Every probe layer runs against real repos" → the 5-repo `tests/fixtures/portfolio/` (`minimal-ts`, `native-modules`, `monorepo-pnpm`, `distroless-target`, `stale-scip`) exercises every probe layer. The integration tests under `tests/integration/portfolio/` are CI-gated; the bench canary `tests/bench/bench_portfolio_walltime.py` is advisory.
- [x] "IndexHealthProbe surfaces at least one real staleness case in CI (deliberately seeded fixture) — proving the probe actually catches what it's there to catch" → `tests/adv/phase02/test_stale_scip_fixture.py` asserts that `IndexHealthProbe` returns `IndexFreshness.Stale(reason=CommitsBehind(n>=1, last_indexed=<known prior commit>))` on the `tests/fixtures/portfolio/stale-scip/` fixture. The build FAILS if the probe doesn't catch it. This is the load-bearing test.
## Load-bearing commitments check

- §2.1 No LLM in gather pipeline. The Phase 0 `fence` job continues to assert this; Phase 2 `gather` extras add only `networkx` (pure Python), `py-tree-sitter` (the one C-extension exception, ADR-amended), and `pydantic` extensions (already in Phase 0). No `anthropic`/`openai`/`langgraph` SDKs. ✅
- §2.2 Facts, not judgments. Every probe reports evidence; `IndexHealthProbe` reports `IndexFreshness.Stale(reason=CommitsBehind(n=17, …))`, not "unsafe to use." ✅
- §2.3 Honest confidence. `IndexHealthProbe` is the canonical example, with `IndexFreshness` as the typed return; the Phase 2 consumer (`CONTEXT_REPORT.md`'s confidence section) exercises every variant. ✅
- §2.4 Determinism over probabilism for structural changes. Phase 2 ships no transforms; the gather pipeline is deterministic end-to-end. ✅
- §2.5 Extension by addition. Adding a probe is a new file + `@register_probe`; adding an ecosystem to `DepGraphProbe` is a new file + `@register_dep_graph_strategy`; no edits to existing probes or coordinator chokepoints (`Probe` ABC, `OutputSanitizer.scrub`, `run_allowlisted`, cache API). The one ABC-adjacent edit (`ProbeContext.image_digest_resolver`) follows Phase 1 ADR-0002's `parsed_manifest` precedent: additive, optional, ADR-gated. ✅
- §2.6 Organizational uniqueness as data, not prompts. Skills (YAML frontmatter), conventions (YAML), and TCCMs (YAML) are all Pydantic-validated. ✅
- §2.7 Progressive disclosure. Skill bodies are byte-offset-recorded, not loaded into memory; conventions, ADRs, and repo notes are referenced by path only. ✅
- §2.8 Humans always merge. Phase 2 is gather-only; no autonomy gates are touched. ✅
- §2.9 Cost is observable end-to-end. Phase 2 emits no LLM cost; the Phase 0 audit anchor records per-probe `Ran`/`CacheHit`/`Skipped`, which Phase 9 will project into the cost ledger when the event log lands. ✅
## Roadmap coherence check

What prior phases established that this design depends on:

- Phase 0: `Probe` ABC + `@register_probe` decorator + `Coordinator(Semaphore(min(cpu_count(), 8)))` + `Cache(declared_inputs)` + `OutputSanitizer.scrub` + `run_allowlisted` + `ALLOWED_BINARIES` + `fence` CI test + audit anchor `runs/<utc-iso>-<short>.json`. All preserved; Phase 2 extends `ALLOWED_BINARIES` and composes a new `redact_secrets` pass into the sanitizer.
- Phase 1: Layer A probes (`LanguageDetection`, `NodeBuildSystem`, `NodeManifest`, `CI`, `Deployment`, `TestInventory`); `parsed_manifest` memo on `ProbeContext` (precedent for the Phase 2 `image_digest_resolver` addition); `safe_yaml.load` chokepoint; ADR-0009 (no new C-extension parser deps), amended in Phase 2 with one named-trigger exception (`py-tree-sitter`); `PackageManager` schema enum (`["bun", "pnpm", "yarn-classic", "yarn-berry", "npm"]`), imported and reused as the `DepGraphProbe` discriminator.
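A sketch of composing a `redact_secrets` pass into the writer-chokepoint sanitizer so that plaintext never reaches disk. The document names `OutputSanitizer.scrub` and `redact_secrets` but not their signatures, so the pass-list composition, the `make_redact_secrets` factory, and the fingerprint scheme (truncated SHA-256 of the secret value) are all assumptions.

```python
import hashlib
import re
from typing import Any, Callable, Iterable

JSON = Any  # stand-in for Phase 0's recursive JSONValue type
SanitizerPass = Callable[[JSON], JSON]


def fingerprint(secret: str) -> str:
    """Stable, non-reversible handle for correlating findings (sketch: truncated SHA-256)."""
    return "sha256:" + hashlib.sha256(secret.encode("utf-8")).hexdigest()[:16]


def make_redact_secrets(secrets: Iterable[str]) -> SanitizerPass:
    """Build a pass that replaces every known secret value, in keys and values, with a fingerprint."""
    # Drop empty strings; longest first so that, at a given position, the regex
    # alternation prefers the longer of two overlapping secrets.
    ordered = sorted({s for s in secrets if s}, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, ordered))) if ordered else None

    def redact(value: JSON) -> JSON:
        if isinstance(value, str):
            if pattern is None:
                return value
            # One regex pass: replacement text is never re-scanned.
            return pattern.sub(lambda m: f"[REDACTED]({fingerprint(m.group(0))})", value)
        if isinstance(value, dict):
            return {redact(k): redact(v) for k, v in value.items()}
        if isinstance(value, list):
            return [redact(v) for v in value]
        return value  # numbers, booleans, None

    return redact


def scrub(payload: JSON, passes: Iterable[SanitizerPass]) -> JSON:
    """Writer chokepoint: every pass runs, in order, before anything is serialized."""
    for apply_pass in passes:
        payload = apply_pass(payload)
    return payload
```

Because redaction happens at the single write path, no probe can persist a raw `gitleaks` match by forgetting to call a helper.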
What this design establishes that later phases will need:
- Phase 3 (first plugin): consumes the `codegenie.adapters.protocols` Protocols, the `IndexFreshness` sum type, the `TCCMLoader`, the `_run_external_cli` chokepoint, and the per-probe slice shapes Phase 2 wrote. Phase 3 ships the Plugin Loader + first plugin + four ADR-0032 adapter implementations + universal fallback plugin together (as ADR-0031 §Consequences §1 prescribes; these are all Phase 3, not Phase 2).
- Phase 4 (LLM fallback): consumes the `redact_secrets` chokepoint to ensure no secret reaches an LLM prompt.
- Phase 5 (microVM sandbox): the escalation door for any future cleartext-required judgment on secret findings; replaces direct `docker` invocations in `RuntimeTraceProbe` per ADR-0012.
- Phase 8 (Supervisor + Bundle Builder): consumes `TCCMLoader`, every adapter from Phase 3+, and the `IndexFreshness` confidence signal.
- Phase 9 (canonical event log): projects the Phase 0 audit anchor into the typed Postgres event log; the Phase 2 slice metadata (`gathered_at`, `last_indexed_commit`, etc.) becomes input to that projection.
Any new ADRs implied by this design that should be drafted:
- `docs/phases/02-context-gather-layers-b-g/ADRs/0001-add-docker-and-security-cli-tools-to-allowed-binaries.md`: adds `docker`, `strace`, `semgrep`, `syft`, `grype`, `gitleaks`, `scip-typescript`, and `tree-sitter` to `ALLOWED_BINARIES` (mirroring Phase 1 ADR-0001's `node` addition).
- `docs/phases/02-context-gather-layers-b-g/ADRs/0002-tree-sitter-grammars-phase-2-amendment.md`: amends Phase 1 ADR-0009 (no new C-extension parser deps); the named trigger fired for `py-tree-sitter` because `localv2.md §5.2 B3` requires it.
- `docs/phases/02-context-gather-layers-b-g/ADRs/0003-coordinator-heaviness-sort-annotation.md`: `@register_probe(heaviness=…, runs_last=…)` registry annotations; coordinator sort-order edit; preserves the single Semaphore + ABC contract.
- `docs/phases/02-context-gather-layers-b-g/ADRs/0004-image-digest-as-declared-input-token.md`: extends the `localv2.md §4` `declared_inputs` special-token mechanism with the `image-digest:<resolver>` token; introduces an optional `ProbeContext.image_digest_resolver` callable mirroring Phase 1 ADR-0002.
- `docs/phases/02-context-gather-layers-b-g/ADRs/0005-secret-findings-no-plaintext-persistence.md`: Phase 2 does NOT persist plaintext secrets; `SecretRedactor` runs at the writer chokepoint; the Phase 5 microVM is the named escalation door for any future cleartext-required judgment.
- `docs/phases/02-context-gather-layers-b-g/ADRs/0006-index-freshness-sum-type-location.md`: `IndexFreshness` lives at `codegenie.indices.freshness`; documents why `AdapterConfidence` and `IndexConfidence` are NOT shipped in Phase 2.
- `docs/phases/02-context-gather-layers-b-g/ADRs/0007-no-plugin-loader-in-phase-2.md`: explicit deferral; Phase 3 ships loader + first plugin + adapters + universal fallback together per ADR-0031 §Consequences §1.
- `docs/phases/02-context-gather-layers-b-g/ADRs/0008-no-event-stream-in-phase-2.md`: defers to ADR-0034 §Consequences §1 (Phase 9 anchors the event log).
- `docs/phases/02-context-gather-layers-b-g/ADRs/0009-pytest-xdist-veto-preserved.md`: explicit reaffirmation of Phase 0's veto; the Phase 2 portfolio fits serial CI.
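A sketch of how a cache key might treat the `image-digest:<resolver>` token from ADR-0004: ordinary `declared_inputs` entries contribute file bytes, while the token contributes whatever digest the optional `image_digest_resolver` returns, so no probe needs a `cache_key()` override. SHA-256 (rather than the project's real hash), the resolver signature, the tag-and-length encoding, and the error on a missing resolver are all assumptions.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional, Sequence

TOKEN_PREFIX = "image-digest:"

# Hypothetical resolver shape: resolver name -> image digest string such as "sha256:...".
ImageDigestResolver = Callable[[str], str]


def cache_key(
    repo_root: Path,
    declared_inputs: Sequence[str],
    image_digest_resolver: Optional[ImageDigestResolver] = None,
) -> str:
    """Hash declared inputs; image-digest tokens contribute the resolved digest, not file bytes."""
    h = hashlib.sha256()

    def feed(tag: bytes, data: bytes) -> None:
        # Tag + length prefix keep "file bytes", "absent", and "digest" from colliding.
        h.update(tag + len(data).to_bytes(8, "big") + data)

    for entry in sorted(declared_inputs):  # declaration order must not change the key
        feed(b"N", entry.encode("utf-8"))
        if entry.startswith(TOKEN_PREFIX):
            if image_digest_resolver is None:
                raise ValueError(f"{entry!r} declared but ProbeContext has no image_digest_resolver")
            digest = image_digest_resolver(entry[len(TOKEN_PREFIX):])
            feed(b"D", digest.encode("utf-8"))
        else:
            path = repo_root / entry
            if path.is_file():
                feed(b"F", path.read_bytes())
            else:
                feed(b"A", b"")  # absent: creating the file later changes the key
    return h.hexdigest()
```

A rebuilt image with a new digest invalidates the cache even when the `Dockerfile` bytes are unchanged, which is the case a file-only key misses.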
## Open questions deferred to implementation

- Phase 5 microVM cleartext-access protocol. The `SecretRedactor` does not persist cleartext; if a Phase 4+ task class needs cleartext access for a remediation judgment, the Phase 5 microVM re-derives the secret from the analyzed repo at that point in time, inside the sandbox. The exact handoff (does the microVM receive `(file:line, pattern_class, fingerprint)` and re-scan? does it receive the redacted slice plus a one-time decryption capability tied to the workflow ID?) is a Phase 5 design concern. Phase 2's commitment is only that we do NOT persist plaintext anywhere Phase 4 can reach it.
- `TreeSitterImportGraphProbe` projection shape. Phase 2 emits `raw/import-graph.json` as a forward-only adjacency list; Phase 3's first `ImportGraphAdapter` decides whether to pre-compute the reverse graph, mmap a binary, or walk at query time. Phase 2 does not pre-decide this on Phase 3's behalf.
- `SkillsLoader` org-shared tier signing. Per-tier signing (Sigstore-style) for `~/.codegenie/skills-org/` is a Phase 14 multi-tenant concern; Phase 2 ships a three-tier merge with first-tier-wins + a loud `skill_shadowed` warning.
- `ExternalDocsProbe` enablement and host allowlist shape. Phase 2 ships it opt-in and skip-cleanly; the allowlist config schema (`external_docs:` in `.codegenie/config.yaml`) lands when the first real user opts in.
- `mypy --warn-unreachable` rollout beyond the Phase 2 modules. Phase 2 enables it via per-module config on `codegenie.{indices,probes/index_health.py,report,adapters,tccm}/**`; full-repo rollout is a tracked backlog item.
- Per-fixture cache pre-warming for CI walltime. The question is whether to commit `.codegenie/cache/` blobs to the fixture portfolio (faster CI; opaque diff) or regenerate them on every CI run (slower CI; transparent diff). Phase 2 picks regenerate-on-every-run; if CI walltime regresses past 8 minutes, this flips.
- `stale-scip` fixture regeneration policy. Documented in `tests/fixtures/portfolio/stale-scip/README.md`. The structural assertion (`CommitsBehind.n >= 1`) is tool-version-agnostic: if `scip-typescript` changes its header format, the fixture's pre-populated index is regenerated against the new format and the assertion still holds.
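A sketch of the `SkillsLoader` behavior described in this section: a pure-data constructor, first I/O in `load_all()`, files opened with `O_NOFOLLOW` so a symlinked skill is refused, only frontmatter read (the body is recorded by byte offset), and a first-tier-wins merge with a loud `skill_shadowed` warning. Keying skills by file stem, the `*.md` glob, `SkillRef`, and the use of `warnings.warn` are assumptions; the real loader hands frontmatter to Phase 1's `safe_yaml.load`, which this sketch omits. `O_NOFOLLOW` is POSIX-only.

```python
import os
import warnings
from dataclasses import dataclass
from pathlib import Path
from typing import Sequence


class SkillShadowedWarning(UserWarning):
    """Signals that a lower-priority tier redefines an already-loaded skill."""


@dataclass(frozen=True)
class SkillRef:
    path: Path
    frontmatter: bytes  # the real loader would pass this to safe_yaml.load
    body_offset: int    # where the body starts; the body itself is never read here


def read_skill_ref(path: Path) -> SkillRef:
    # O_NOFOLLOW: os.open raises OSError if the final path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as fh:
        if fh.readline().strip() != b"---":
            return SkillRef(path, b"", 0)  # no frontmatter: the whole file is body
        lines: list[bytes] = []
        while True:
            line = fh.readline()
            if not line:
                raise ValueError(f"unterminated frontmatter in {path}")
            if line.strip() == b"---":
                return SkillRef(path, b"".join(lines), fh.tell())
            lines.append(line)


class SkillsLoader:
    def __init__(self, search_paths: Sequence[Path]) -> None:
        self._search_paths = tuple(search_paths)  # pure data: no I/O, no env lookups

    def load_all(self) -> dict[str, SkillRef]:
        found: dict[str, SkillRef] = {}
        for tier in self._search_paths:  # highest-priority tier first
            if not tier.is_dir():
                continue
            for path in sorted(tier.glob("*.md")):
                if path.stem in found:
                    warnings.warn(
                        f"skill_shadowed: {path} ignored; {found[path.stem].path} wins",
                        SkillShadowedWarning,
                        stacklevel=2,
                    )
                    continue
                found[path.stem] = read_skill_ref(path)
        return found
```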