Phase 01 - Context gathering: Layer A (Node.js): Final design
Status: Design of record (synthesized from three competing designs + critique).
Synthesized by: Graph-of-Thought synthesizer subagent
Date: 2026-05-12
Sources: design-performance.md [P] · design-security.md [S] · design-best-practices.md [B] · critique.md
Provenance tags below: [P], [S], [B] for single-lens; [P+S], [P+B], [S+B] for two-lens agreement; [all] for unanimous; [synth] for synthesizer departure from all three.
Lens summary
Phase 0 built the spine — probe ABC, async coordinator, content-addressed cache, layered JSON Schema, two-pass sanitizer, subprocess allowlist, audit anchor. Phase 1's job is to populate that spine with five real Node.js probes, not to extend it.
The three lens designs split cleanly:
- [B] gets the shape: five new probe files, one new sub-schema per probe, two catalog YAML files, explicit imports, no edits to Phase 0 chokepoints. This matches Phase 0's §12 handoff verbatim. The blind spot is that [B] ignores adversarial input handling and accepts probe-isolation duplication that the perf and security lenses both flag.
- [S] gets the threat model: Phase 1 is the first phase parsing adversarial bytes from untrusted repos at scale. In-process parse caps, hard size/depth limits, `O_NOFOLLOW` opens, no `node --version` invocation, no `node_modules/*` parsing by default, no Helm/Terraform rendering. The blind spot is the per-probe fork+exec sandbox: a brand-new architectural layer Phase 0 never sanctioned, which the critic correctly attacks as an ABC-bypassing edit-not-add and as a platform-conditional security claim.
- [P] gets the seam: Phase 0's `ProbeExecution ∈ {Ran, CacheHit, Skipped}` (final-design §2.6) is the right place to share parsed `package.json` across probes, rather than a msgpack side-channel or re-parsing three times. The blind spots are `views.json` forward-compat with an unspecified Phase 8, mmap reopening, a hand-rolled `yarn.lock` parser justified by ~16 ms of average warm-path latency, and a push for a `PathIndex` mixin that drifts the frozen ABC.
The synthesis picks [B]'s shape, [S]'s in-process caps and parse hygiene, [P]'s coordinator-level shared-parse seam — and refuses the per-probe sandbox, the views.json forward dependency, the mmap reopening, the msgpack side-channel, and the hand-rolled yarn parser. Where localv2.md and a lens disagree, localv2.md wins (Phase 0 §2.3 conformance rule).
Goals (concrete, measurable)
- Functional (roadmap exit): `codegenie gather` produces a useful `repo-context.yaml` on a real Node.js repo. All six Layer A probes populate their slices. Schema validation passes. Cache hits on second run (no probe re-executes). [B+roadmap]
- Probe contract conformance: Zero edits to `codegenie/probes/base.py`; the `tests/unit/test_probe_contract.py` snapshot test passes. [B+all+critic]
- Coverage ratchet: 90% line / 80% branch on `src/codegenie/`, excluding `cli.py` and the new `codegenie/probes/*` modules where structurally-narrow branches make 90% gameable; per-module floor 85% line / 75% branch for `deployment.py` and `ci.py`, declared explicitly in `pyproject.toml` with the ADR-amendment trigger. [synth: softer than [B]'s blanket 90/80, harder than [P]'s silence]
- Adversarial robustness: Zero successful parse-driven RCE or OOM against an adversarial fixture corpus (≥ 20 hostile inputs covering YAML bombs, JSON bombs, symlink escape, regex DoS, deep nesting, oversized inputs, hostile filenames). Caps enforced in-process, not via per-probe subprocess. [synth: [S]'s threat coverage, [P]'s no-fork cost shape]
- Wall-clock targets (advisory, surfaced via Phase 0 bench infrastructure, not PR-blocking):
  - Cold (1k-file fixture, all probes miss cache): p50 ≤ 4 s, p95 ≤ 8 s. [B]
  - Warm (cache full, all hits): p50 ≤ 0.4 s, p95 ≤ 1 s. [B+P]
  - Incremental (`package.json` changed, four hits, two misses): p50 ≤ 1 s, p95 ≤ 2 s. [synth: softer than [P]'s 250 ms]
- Hard caps in every parser (in-process, fail-loud): `package.json` ≤ 5 MB; lockfile ≤ 50 MB; YAML depth ≤ 64; JSON depth ≤ 64; per-probe parse wall-clock ≤ probe's `timeout_seconds` (Phase 0 coordinator enforces). Exceeding any cap raises a typed exception → `ProbeOutput(confidence="low", errors=[...])`. [S, without the subprocess fork]
- Tokens per run: 0. [all] The Phase 0 `fence` CI job continues to assert this.
- Extension by addition: Phase 1 adds only new files under `src/codegenie/probes/`, `src/codegenie/schema/probes/`, `src/codegenie/catalogs/`, `tests/unit/probes/`, `tests/adv/`, `tests/integration/probes/`, `tests/fixtures/`. The only edits to existing Phase 0 files are:
  - `src/codegenie/probes/__init__.py`: five new `from . import ...` lines (the documented extension seam). [B+all]
  - `src/codegenie/probes/language_detection.py`: Phase 0 deliberately deferred framework hints + monorepo detection to Phase 1; the deferral is documented in Phase 0 final-design §2.10. This is an in-place extension of a Phase-0 probe that Phase 0 explicitly scoped to Phase 1, not an extension-by-addition violation. [synth: addresses critic §3]
  - `src/codegenie/exec.py`: one entry added to `ALLOWED_BINARIES` (`"node"`), gated by a new Phase-1 ADR (`docs/phases/01-context-gather-layer-a-node/ADRs/0001-add-node-to-allowed-binaries.md`). [B]
- No new MCP/sandbox/views/streaming-writer infrastructure. No `_sandbox.py`, no `views.json`, no `PathIndex` mixin, no msgpack side-channel. [synth: vetoes [P]'s and [S]'s scope creep]
Architecture
codegenie gather <path>
│
▼
┌────────────────────────────┐
│ Phase 0 CLI entry (click) │ ← unchanged
│ - tool-readiness now │
│ includes optional node │
└──────────────┬─────────────┘
│
▼
┌────────────────────────────┐
│ Phase 0 Coordinator │ ← unchanged
│ - asyncio.Semaphore │
│ - per-probe asyncio.Task │
│ - cache lookup │
│ - ProbeExecution ∈ │
│ {Ran, CacheHit, Skipped}│ ← seam [P+synth]
│ - _ProbeOutputValidator │
│ - OutputSanitizer.scrub │
│ (Phase 0 two passes, │
│ unchanged) │
└──────────────┬─────────────┘
│
┌─────────────────────────┴───────────────────────────────┐
│ Phase 0 Probe Registry (explicit import — no entry pts)│
│ │
│ language_detection (extended in place; Phase 0 scoped │
│ framework + monorepo to Phase 1) │
│ │
│ ┌──────────────── Phase 1 additions ──────────────┐ │
│ │ node_build_system [synth + S caps] │ │
│ │ node_manifest [synth + S caps + B catalog]│ │
│ │ ci [synth + S caps] │ │
│ │ deployment [synth + S caps] │ │
│ │ test_inventory [synth + B file-count] │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
▼
Phase 0 cache + audit + sanitizer + writer
│
▼
.codegenie/context/
├── repo-context.yaml (Phase 0 envelope + 6 slices)
├── schema-version.txt
├── raw/
│ ├── language_detection.json
│ ├── node_build_system.json
│ ├── node_manifest.json
│ ├── ci.json
│ ├── deployment.json
│ └── test_inventory.json
└── runs/<utc-iso>-<short>.json
src/codegenie/catalogs/ ← NEW (data, not code) [B]
native_modules.yaml ← NodeManifest reads
ci_providers.yaml ← CI reads
_schema.json ← catalog self-validation
src/codegenie/schema/probes/ ← NEW per-probe sub-schemas [B+S+synth]
language_detection.schema.json (Phase 0, extended)
node_build_system.schema.json
node_manifest.schema.json
ci.schema.json
deployment.schema.json
test_inventory.schema.json
# Each: additionalProperties: false at the per-probe sub-schema root
Three things to read from the diagram:
- Every Phase-0 box says "unchanged." This is the test of extension-by-addition. [B+all]
- Catalogs are a new sibling directory. They are data, loaded by the probes that consume them. Adding a native module is a YAML PR. [B, production §2.6]
- Each probe owns exactly one slice and one sub-schema. No probe writes outside its slice. The envelope `additionalProperties: true` under `probes.*` (Phase 0 §2.9) is preserved; the strictness lives in each probe's sub-schema root, not at the envelope level. This is the per-probe sub-schema policy the critic identified as the agreed-upon-but-undocumented position across all three lenses. [synth: explicitly resolves cross-design observation #1]
Components
1. LanguageDetectionProbe (extended in place)
- Provenance: [B] (shape, framework + monorepo detection) + [synth] (Phase 0 explicitly deferred these to Phase 1).
- Purpose: Extend Phase 0's `LanguageDetectionProbe` to populate Node-specific fields from `localv2.md §5.1 A1`: framework hints (NestJS, Express, Fastify, Next.js, Koa, Hapi), monorepo markers (`pnpm-workspace.yaml`, `lerna.json`, `nx.json`, `turbo.json`, `package.json#workspaces`).
- Interface: Phase 0 `Probe` ABC unchanged. `declared_inputs` extended to include `package.json` (for framework dep lookup) and the monorepo marker files.
- Internal design:
  - Phase 0's `os.scandir` walk + extension counts unchanged.
  - New post-walk pass: read `package.json` via stdlib `json.loads` with a 5 MB size cap and a 64-level depth cap (implemented as a small `parsers.safe_json.load(path, *, max_bytes, max_depth)` helper that lives in `src/codegenie/parsers/safe_json.py`; see Component 8).
  - Framework detection: flat dict lookup of `dependencies + devDependencies` against a small constant set (`{"@nestjs/core": "nestjs", "express": "express", "fastify": "fastify", "next": "next", "koa": "koa", "@hapi/hapi": "hapi"}`).
  - Monorepo detection: `Path.exists()` for the marker files + the `package.json#workspaces` field if present.
- Why this choice over the alternatives:
  - Refuses [P]'s `PathIndex` mixin (critic §1.1.1: drifts the frozen ABC).
  - Refuses [P]'s msgpack inter-probe cache (critic §1.1.2: bypasses sanitizer + validator).
  - Refuses [S]'s per-probe sandbox subprocess (critic §2.1.1: brand-new layer Phase 0 never sanctioned).
  - Adopts [B]'s "read `package.json` directly" with [S]'s safe-parse caps (critic §3.6: each probe re-parsing is fine if the parse is bounded and cheap; the seam to actually share parsed state lives in Component 2 below).
  - Extending Phase 0's file in place is licensed by Phase 0 final-design §2.10, which explicitly defers framework + monorepo detection to Phase 1. The critic flags this as "all three lenses ducked the question"; the synthesizer answers explicitly: this is the one extension-in-place that Phase 0 sanctioned.
- Tradeoffs accepted:
  - Two Layer A probes (`language_detection` and `node_manifest`) parse `package.json` independently. Mitigated by the parsed-`package.json` in-coordinator memo (Component 2), so the second parse on the same gather is free.
  - Framework detection is shallow (decorator-level AST analysis is Phase 2's `NodeReflectionProbe`'s problem).
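The framework and monorepo detection described in this component can be sketched as two small pure functions. This is a minimal illustration, not the probe's implementation: the dependency-to-hint mapping and the marker file names come from the text above, while the function names, the return shapes, and the `package.json#workspaces` label in the output are assumptions made for the example.

```python
from pathlib import Path
from typing import Any, Mapping

# Dependency name -> framework hint, as listed in the design text.
FRAMEWORK_HINTS: dict[str, str] = {
    "@nestjs/core": "nestjs",
    "express": "express",
    "fastify": "fastify",
    "next": "next",
    "koa": "koa",
    "@hapi/hapi": "hapi",
}

# Marker files named in the design text.
MONOREPO_MARKERS = ("pnpm-workspace.yaml", "lerna.json", "nx.json", "turbo.json")


def detect_frameworks(manifest: Mapping[str, Any]) -> list[str]:
    """Flat lookup over dependencies + devDependencies (hypothetical helper)."""
    deps: dict[str, Any] = {}
    for key in ("dependencies", "devDependencies"):
        section = manifest.get(key)
        if isinstance(section, Mapping):  # tolerate missing or malformed sections
            deps.update(section)
    return sorted({hint for name, hint in FRAMEWORK_HINTS.items() if name in deps})


def detect_monorepo_markers(repo_root: Path, manifest: Mapping[str, Any]) -> list[str]:
    """Existence checks for marker files plus the workspaces field (hypothetical helper)."""
    found = [marker for marker in MONOREPO_MARKERS if (repo_root / marker).exists()]
    if manifest.get("workspaces"):
        found.append("package.json#workspaces")
    return found
```

Both functions take an already-parsed manifest, so they work the same whether the dict came from the coordinator memo or from a direct bounded parse.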
2. ParsedManifestMemo: in-coordinator parse memo
- Provenance: [synth], addressing the critic's cross-design observation #3 ("All three accept reading `package.json` more than once per gather; none uses the cheapest, cleanest seam").
- Purpose: Avoid the warm-path cost of two or three Layer A probes parsing the same `package.json` on the same gather, without (a) [P]'s msgpack side-channel that bypasses the sanitizer or (b) [B]'s "violates DRY by a small margin" double-parse.
- Interface: A typed read-through cache inside the coordinator's per-gather context. Probes access it via a single function `ctx.parsed_manifest(path: Path) -> dict[str, JSONValue] | None` exposed on `ProbeContext`. The function is provided by the coordinator at probe-dispatch time; the first call parses (with caps), subsequent calls return the same in-memory dict.
- Internal design:
  - The memo is keyed by `(absolute_path, mtime_ns, size)` on the coordinator side. If the file changed between probe dispatches (shouldn't happen during a single gather, but TOCTOU-safe), the memo re-parses.
  - The memo only memoizes files inside `repo_root` and only those matching a small allowlist (`{"package.json"}` in Phase 1; extendable in Phase 2).
  - Parsed dicts are read-only views (`types.MappingProxyType` at the top level; nested dicts/lists are returned by reference; the contract is "don't mutate," enforced by mypy via `Mapping` typing). Probes mutating returned data is a typed error at lint time.
  - The memo is per-gather, not per-process. It lives on the coordinator's per-run state and is discarded after the gather. Phase 9's Temporal lift will not see it; each Activity re-parses (which is correct, since Activities are independent units of work).
- Why this choice over the alternatives:
  - Refuses [P]'s msgpack-of-parsed-`package.json` (critic §1.1.2: bypasses validator and sanitizer, treated as a "contract violation by side channel"). The memo here is inside the coordinator, never written to disk, never persisted across gathers.
  - Refuses [B]'s "each probe parses independently" (critic §3.6: 3× parse cost on the cache-miss path for zero isolation benefit, since each probe's `declared_inputs` already includes `package.json`).
  - Refuses [S]'s implicit "each sandbox subprocess re-parses" (critic §2.1.2: 1.5 s of pure fork overhead the design admits to).
- Tradeoffs accepted:
  - The memo is a small extension to `ProbeContext` (Phase 0 dataclass), specifically the addition of an optional `parsed_manifest: Callable[[Path], dict | None]` field. This does touch a Phase 0 dataclass, but the contract addition is optional (`= None` default), the function is only present when the coordinator provides it, and probes that don't use it are unaffected. This is the one Phase 0 dataclass extension Phase 1 makes; it requires a Phase-1 ADR (`docs/phases/01-context-gather-layer-a-node/ADRs/0002-parsed-manifest-memo-on-probe-context.md`).
  - Probes must defensively check `ctx.parsed_manifest is not None` and fall back to direct parsing. This is one if-statement per probe; mypy enforces it.
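A minimal sketch of the memo's behaviour, assuming a callable class held on the coordinator's per-gather state. The class name, the `parse_count` counter, and the use of plain `json.loads` in place of the capped `safe_json.load` helper are simplifications for illustration; the key shape, the allowlist, and the `MappingProxyType` wrapper follow the description above.

```python
import json
from pathlib import Path
from types import MappingProxyType
from typing import Any, Mapping, Optional

MEMO_ALLOWLIST = frozenset({"package.json"})  # Phase 1 allowlist from the text


class ParsedManifestMemo:
    """Per-gather read-through cache (hypothetical class name)."""

    def __init__(self, repo_root: Path, max_bytes: int = 5 * 1024 * 1024) -> None:
        self._root = repo_root.resolve()
        self._max_bytes = max_bytes
        self._cache: dict[tuple[str, int, int], Mapping[str, Any]] = {}
        self.parse_count = 0  # illustration only: lets callers observe memo hits

    def __call__(self, path: Path) -> Optional[Mapping[str, Any]]:
        resolved = path.resolve()
        # Only allowlisted file names inside repo_root are memoized.
        if resolved.name not in MEMO_ALLOWLIST or not resolved.is_relative_to(self._root):
            return None
        try:
            st = resolved.stat()
        except OSError:
            return None
        if st.st_size > self._max_bytes:
            return None
        # Key includes mtime_ns and size, so a changed file is re-parsed.
        key = (str(resolved), st.st_mtime_ns, st.st_size)
        if key not in self._cache:
            data = json.loads(resolved.read_text(encoding="utf-8"))
            if not isinstance(data, dict):
                return None
            self.parse_count += 1
            self._cache[key] = MappingProxyType(data)  # read-only at the top level
        return self._cache[key]
```

A probe would then use it with the fallback the tradeoff list calls for: if `ctx.parsed_manifest` is `None`, or the call returns `None`, parse the file directly with the bounded helper.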
3. NodeBuildSystemProbe
- Provenance: [B] (shape, lockfile precedence, no script evaluation) + [S] (in-process caps, no Helm rendering, no node_modules parsing) + [synth] (`node --version` cross-check is on by default but optional, resolving the conflict).
- Purpose: Populate the `build_system` slice from `localv2.md §5.1 A2`. Determines package manager, engine constraints, npm scripts, bundler, TypeScript compilation setup.
- Interface: Standard probe ABC. `name = "node_build_system"`, `layer = "A"`, `applies_to_languages = ["javascript", "typescript"]`, `applies_to_tasks = ["*"]`, `requires = ["language_detection"]`, `timeout_seconds = 30`, `declared_inputs = ["package.json", "pnpm-workspace.yaml", "lerna.json", "nx.json", "turbo.json", ".nvmrc", ".node-version", ".tool-versions", "tsconfig.json", "tsconfig.*.json", "package-lock.json", "pnpm-lock.yaml", "yarn.lock", "bun.lockb"]`.
- Internal design:
  - Package-manager resolution by lockfile precedence: `bun.lockb` > `pnpm-lock.yaml` > `yarn.lock` > `package-lock.json` (existence check, no parse). Multiple lockfiles drop `confidence` to `low` and emit a typed warning.
  - Yarn variant detection (post-precedence), added per ADR-0013: when the resolved lockfile is `yarn.lock`, run a priority-ordered detection step to emit `yarn-classic` or `yarn-berry` (never the collapsed `yarn`). Priority chain: (1) `package.json#packageManager` matches `^yarn@1\.` → `yarn-classic`; (2) `^yarn@(2|3|4|\d+)\.` → `yarn-berry`; (3) `.yarnrc.yml` exists → `yarn-berry` (Berry-only file extension); (4) `.yarn/` directory exists → `yarn-berry`; (5) `.pnp.cjs` or `.pnp.loader.mjs` exists → `yarn-berry`; (6) default → `yarn-classic` with a `node_build_system.yarn_variant_inferred` warning. The distinction is load-bearing: production ADR-0031 treats `vulnerability-remediation--node--yarn-classic` and `vulnerability-remediation--node--yarn-berry` as distinct plugin scopes because their dependency-resolution models (`node_modules` vs. PnP) are architecturally different. Implemented by story `S2-02a-yarn-variant-detection.md` on top of the shipped S2-02 base.
  - `package.json` parsing via `ctx.parsed_manifest(repo_root / "package.json")`: uses the memo from Component 2; falls back to direct `safe_json.load(...)` if the memo is absent.
  - `tsconfig.json` parsed via `src/codegenie/parsers/jsonc.py`, a stdlib-only line-comment + block-comment stripper followed by `safe_json.load`. `json5` is rejected ([synth], critic §1.1.6: each new C-extension dep is a CVE feed; we have stdlib `json` + ~30 lines of comment-strip).
  - `tsconfig.json#extends` chain followed at most 4 levels deep, only to paths under `repo_root`; cycles or escapes raise typed warnings and downgrade confidence. [S]
  - Node version: read `package.json#engines.node`, `.nvmrc`, `.node-version`, `.tool-versions` in declared precedence. Each is a small string read.
  - `node --version` cross-check: optional, on by default. If `node` is in `ALLOWED_BINARIES` and on `$PATH`, call `node --version` via `exec.run_allowlisted(["node", "--version"], cwd=repo_root, timeout_s=5)`. The probe records both the declared constraint and the locally-resolved version; disagreement is a warning, not an error, and confidence stays `high` because the constraint is the load-bearing fact. The `node` binary addition to `ALLOWED_BINARIES` is gated by `docs/phases/01-context-gather-layer-a-node/ADRs/0001-add-node-to-allowed-binaries.md`. [B + localv2.md §5.1 A2: overrides [S]'s veto per Phase 0 conformance rule]
  - Bundler detection: flat data lookup against `dependencies + devDependencies` (`webpack`, `rollup`, `esbuild`, `vite`, `parcel`, `turbopack`) plus config-file presence (`webpack.config.{js,ts,mjs,cjs}`, `vite.config.{js,ts,mjs,cjs}`, etc.).
  - Scripts: read `package.json#scripts` and record verbatim, with no evaluation and no interpretation.
- Why this choice over the alternatives:
  - The `node --version` decision is the security/best-practices conflict (table row 2). Resolution: `localv2.md §5.1 A2` explicitly specifies the cross-check; Phase 0 §2.3 makes `localv2.md` the source of truth. [S]'s threat (a hostile `$PATH` shim) is mitigated by the existing `exec.run_allowlisted` env-strip (Phase 0 §2.5), the short timeout (5 s), no-shell execution, and the fact that the value is only used as a display field, never as a control-flow input. The decision is recorded in a Phase-1 ADR; the security concern is documented but the [S] veto is overridden per the Rule 11 conformance principle the critic himself invokes.
- Tradeoffs accepted:
  - Adds one binary to `ALLOWED_BINARIES`. ADR-gated; future entries follow the same workflow.
  - `tsconfig.json` is parsed by a hand-rolled comment stripper (~30 lines, fuzz-tested in `tests/adv/test_tsconfig_pathological.py`). Far cheaper than adding `json5` as a C-extension dep.
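The lockfile precedence and the six-step Yarn variant chain from this component can be sketched as follows. The precedence order, the two regular expressions, the marker files, and the `node_build_system.yarn_variant_inferred` warning name are taken from the text; the function names, the result dictionary, the `multiple_lockfiles` warning name, and the behaviour when no lockfile exists are assumptions for the example.

```python
import re
from pathlib import Path
from typing import Any, Mapping

# (lockfile, package manager) in the precedence order given in the design.
LOCKFILE_PRECEDENCE = (
    ("bun.lockb", "bun"),
    ("pnpm-lock.yaml", "pnpm"),
    ("yarn.lock", "yarn"),
    ("package-lock.json", "npm"),
)


def detect_yarn_variant(repo_root: Path, manifest: Mapping[str, Any]) -> tuple[str, bool]:
    """Return (variant, inferred); inferred is True only for the default branch."""
    pm_field = manifest.get("packageManager")
    if isinstance(pm_field, str):
        if re.match(r"^yarn@1\.", pm_field):  # step 1
            return "yarn-classic", False
        if re.match(r"^yarn@(2|3|4|\d+)\.", pm_field):  # step 2
            return "yarn-berry", False
    if (repo_root / ".yarnrc.yml").exists():  # step 3
        return "yarn-berry", False
    if (repo_root / ".yarn").is_dir():  # step 4
        return "yarn-berry", False
    if (repo_root / ".pnp.cjs").exists() or (repo_root / ".pnp.loader.mjs").exists():  # step 5
        return "yarn-berry", False
    return "yarn-classic", True  # step 6: default, flagged as inferred


def resolve_package_manager(repo_root: Path, manifest: Mapping[str, Any]) -> dict[str, Any]:
    """Existence checks only; no lockfile is parsed here."""
    present = [pm for name, pm in LOCKFILE_PRECEDENCE if (repo_root / name).exists()]
    warnings: list[str] = []
    if not present:
        # Assumption: the design text does not specify the no-lockfile case.
        return {"package_manager": None, "confidence": "low", "warnings": warnings}
    confidence = "high"
    if len(present) > 1:
        confidence = "low"
        warnings.append("multiple_lockfiles")  # hypothetical warning name
    manager = present[0]
    if manager == "yarn":
        manager, inferred = detect_yarn_variant(repo_root, manifest)
        if inferred:
            warnings.append("node_build_system.yarn_variant_inferred")
    return {"package_manager": manager, "confidence": confidence, "warnings": warnings}
```

Because step 1 runs before step 2, the broad `\d+` alternative in the second pattern never reclassifies a `yarn@1.x` value as Berry.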
4. NodeManifestProbe
- Provenance: [B] (shape, lockfile-not-node_modules, hand-curated catalog) + [S] (in-process size+depth caps, no `npm ls`, no `node_modules/*/package.json` parsing by default) + [synth] (yarn-lock library choice).
- Purpose: Populate `manifests` from `localv2.md §5.1 A3`. The single most distroless-relevant Layer A probe: native module enumeration is the largest source of distroless migration failures.
- Interface: Standard probe ABC. `name = "node_manifest"`, `layer = "A"`, `applies_to_languages = ["javascript", "typescript"]`, `applies_to_tasks = ["*"]`, `requires = ["language_detection"]`, `timeout_seconds = 30`, `declared_inputs = ["package.json", "pnpm-lock.yaml", "package-lock.json", "yarn.lock", "src/codegenie/catalogs/native_modules.yaml"]`. `node_modules/*/package.json` is NOT in `declared_inputs`. [S]
- Internal design:
  - `package.json` parse: via `ctx.parsed_manifest` (memo) with 5 MB size cap + 64 depth cap.
  - Lockfile parsers: three small helpers, each in `src/codegenie/probes/_lockfiles/`:
    - `_pnpm.py`: `yaml.CSafeLoader` (Phase 0 forbidden-patterns bans any other loader), 50 MB size cap, 64 depth cap.
    - `_npm.py`: `safe_json.load`, 50 MB size cap, 64 depth cap.
    - `_yarn.py`: the `pyarn` PyPI library if it's maintained (< 18 months since last release) at Phase-1 implementation time, otherwise a ~100-line hand-rolled line-scanner with no regex backtracking. [B + S agreement: refuses [P]'s "ship hand-rolled by default" for ~16 ms of average latency that critic §1.1.4 dismantles]. The decision is recorded in a Phase-1 ADR (`docs/phases/01-context-gather-layer-a-node/ADRs/0003-yarn-lock-parser-choice.md`) at land-time.
  - Native module catalog: `src/codegenie/catalogs/native_modules.yaml`, hand-curated, ships with `bcrypt`, `sharp`, `better-sqlite3`, `node-canvas`, `node-rdkafka`, `node-pty`, `bufferutil`, `utf-8-validate`, `argon2`, `keytar`. Each entry: `{name, requires_node_gyp, system_deps_required, binary_artifacts_glob, notes, catalog_entry_version: int}`. The catalog itself is versioned via a `catalog_version: int` field at file top, which is included in the cache key so catalog updates invalidate cached `NodeManifest` outputs. [B + synth: resolves [B] Open Question #7]
  - No `npm ls` / `pnpm list` / Helm-style template rendering. Lockfile is the deterministic source. [B+S]
  - `engines`, `optionalDependencies`, `bundledDependencies` read from the parsed `package.json`.
- Why this choice over the alternatives:
  - Refuses [P]'s hand-rolled `yarn.lock` parser by default: critic §1.1.4 establishes that the 16 ms average latency win does not justify a 1k-LOC maintenance liability. Hand-rolled is the fallback if `pyarn` is unmaintained, not the default.
  - Refuses [P]'s `ruamel.yaml` C-extension dep (critic §1.1.6); `yaml.CSafeLoader` is the Phase 0 ratified parser and is sufficient.
  - Refuses [S]'s `node_modules/*/package.json` parsing as an opt-in feature in Phase 1: the threat model (attacker-controlled bytes at scale) is real and Phase 1 has no consumer that requires it. Deferred to Phase 2.
- Tradeoffs accepted:
  - Native module catalog gaps surface in Phase 7. Mitigation: an `import-linter`-style test ensures each new probe in Phase 2+ ships its own catalog additions if applicable; catalog versioning means a Phase-7 catalog update invalidates Phase-1 cached outputs cleanly. The critic's concern (silent staleness, blast-radius five phases out) is acknowledged as a real risk and explicitly logged as a Phase 7 testing precondition.
  - Three lockfile parser files (~150 LOC each) is more code than [P]'s msgpack short-circuit. Acceptable trade for the cache contract.
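To make the catalog-versioning point concrete, here is a hedged sketch of how `catalog_version` could participate in a cache key, alongside the catalog lookup itself. The Phase 0 cache-key scheme is not described in this section, so `cache_key`, its inputs, and the in-memory `CATALOG` layout (including the placeholder field values) are all assumptions; only the idea that a version bump must change the key comes from the text.

```python
import hashlib
import json
from typing import Any, Iterable, Mapping

# Illustrative in-memory form of native_modules.yaml; entries are placeholders.
CATALOG: dict[str, Any] = {
    "catalog_version": 1,
    "modules": {
        "bcrypt": {"requires_node_gyp": True, "catalog_entry_version": 1},
        "sharp": {"requires_node_gyp": False, "catalog_entry_version": 1},
    },
}


def find_native_modules(locked_names: Iterable[str], catalog: Mapping[str, Any]) -> list[str]:
    """Names from the lockfile that appear in the catalog (hypothetical helper)."""
    known = catalog["modules"]
    return sorted(name for name in set(locked_names) if name in known)


def cache_key(input_digests: Mapping[str, str], catalog_version: int) -> str:
    """Hypothetical key derivation: input content digests plus the catalog version."""
    payload = json.dumps(
        {"inputs": dict(input_digests), "catalog_version": catalog_version},
        sort_keys=True,  # stable serialization so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this shape, two gathers over byte-identical inputs produce different keys once the catalog version changes, which is the invalidation behaviour the design asks for.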
5. CIProbe
- Provenance: [B] (shape, multi-provider catalog, GitHub Actions parser depth) + [S] (`yaml.CSafeLoader` mandatory, Jenkinsfile by regex only, secrets references recorded literally not resolved) + [synth] (singleton vs list: singleton with `additional_providers` list).
- Purpose: Populate `ci` from `localv2.md §5.1 A4`. Records CI provider, image-build presence, test/lint commands.
- Interface: Standard probe ABC. `name = "ci"`, `layer = "A"`, `applies_to_languages = ["*"]`, `applies_to_tasks = ["*"]`, `requires = []`, `timeout_seconds = 10`, `declared_inputs = [".github/workflows/*.yml", ".github/workflows/*.yaml", ".gitlab-ci.yml", ".circleci/config.yml", "Jenkinsfile", "azure-pipelines.yml", "src/codegenie/catalogs/ci_providers.yaml"]`.
- Internal design:
  - Provider catalog (`ci_providers.yaml`): each entry `{name, marker_paths: [...], parser: <name>}`. The first matching entry wins as the `provider`; any other matching providers go into a new `additional_providers: list[str]` field. This resolves [B]'s "emit a list" vs `localv2.md §5.1 A4`'s "singleton" tension without violating Phase 0 conformance: `provider` stays a singleton string, `additional_providers` is purely additive. The Phase 1 sub-schema declares both. [synth: addresses critic §3.4 directly]
  - GitHub Actions parser: `yaml.CSafeLoader` per workflow file (all workflows, not just the primary; [P]'s "skip non-primary" is rejected per critic §1.1.4 since it leaves us blind to multi-workflow setups). 10 MB cap per file, 64 depth cap. Extracts job names and step `run:` commands, and looks for `docker build`, `docker buildx`, `docker/build-push-action` → `builds_image: true`. Test/lint commands extracted by simple substring match against `{"npm run test", "pnpm test", "yarn test", "npm test", "pnpm run test", "yarn run test", "vitest", "jest", "playwright test"}`.
  - GitHub Actions secret references (`${{ secrets.* }}`) recorded as literal strings with a `references_secrets: list[str]` field listing referenced secret names. Secret values are never accessible, so this is record-not-resolve. [S]
  - GitLab CI parser: same `yaml.CSafeLoader`.
  - Jenkinsfile: presence + size + a bounded regex extraction for `sh '...'` and `sh "..."` patterns (single capture group, line-bounded). `confidence: low`, explicit warning. [B+S]
  - CircleCI / Azure Pipelines: stub recognizers (presence only, `confidence: low`); fuller parsers land when a consumer demands them.
- Why this choice over the alternatives:
  - Refuses [P]'s "skip non-primary workflow" optimization: critic §1.1.4 establishes the gain (~50 ms per gather) is not worth the coverage loss.
  - Refuses [S]'s implicit "Jenkinsfile parsed by path only": the bounded regex for `sh '...'` is a small win at zero risk surface (no eval, no backtracking).
  - The singleton-vs-list disagreement is resolved by adding an additive field rather than mutating the existing `provider` shape, which honors both `localv2.md §5.1 A4` and the multi-provider reality.
- Tradeoffs accepted:
  - Multi-provider repos report `confidence: low` plus an `additional_providers` list. The schema enforces shape; the planner's interpretation is downstream.
  - We do not execute or simulate CI.
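The "first match wins, the rest are additive" rule and the record-not-resolve handling of secrets can be illustrated with a short sketch. The provider names, the list-of-dicts catalog shape, and both function names are hypothetical; the marker paths and the `provider` / `additional_providers` / `references_secrets` field names come from the text. The regular expression only covers the plain `${{ secrets.NAME }}` form, which is an assumption about what a first version would need.

```python
import re
from pathlib import Path
from typing import Any

# Illustrative in-memory form of ci_providers.yaml; order encodes precedence.
CI_PROVIDERS: list[dict[str, Any]] = [
    {"name": "github_actions", "marker_paths": [".github/workflows/*.yml", ".github/workflows/*.yaml"]},
    {"name": "gitlab_ci", "marker_paths": [".gitlab-ci.yml"]},
    {"name": "circleci", "marker_paths": [".circleci/config.yml"]},
    {"name": "jenkins", "marker_paths": ["Jenkinsfile"]},
    {"name": "azure_pipelines", "marker_paths": ["azure-pipelines.yml"]},
]

# Name characters only inside the braces, so there is nothing to backtrack over.
_SECRET_REF = re.compile(r"\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def resolve_providers(repo_root: Path, catalog: list[dict[str, Any]]) -> dict[str, Any]:
    """First matching catalog entry is the provider; others are listed additively."""
    matches = [
        entry["name"]
        for entry in catalog
        if any(any(repo_root.glob(pattern)) for pattern in entry["marker_paths"])
    ]
    return {
        "provider": matches[0] if matches else None,
        "additional_providers": matches[1:],
    }


def extract_secret_names(workflow_text: str) -> list[str]:
    """Record referenced secret names; values are never read or resolved."""
    return sorted(set(_SECRET_REF.findall(workflow_text)))
```

A repo with both a GitHub Actions workflow and a `Jenkinsfile` would report `github_actions` as `provider` and `["jenkins"]` as `additional_providers`, leaving the singleton field's shape untouched.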
6. DeploymentProbe
- Provenance: [B] (shape, no Helm/Kustomize rendering, multi-env Helm as list) + [S] (in-process caps, no Helm/Terraform binary invocation, kustomize-traversal cap, path-traversal refusal on `kustomization.yaml#resources:` outside repo root).
- Purpose: Populate `deployment` from `localv2.md §5.1 A5`. Records deployment type, image reference path, security context, ports, env vars.
- Interface: Standard probe ABC. `name = "deployment"`, `layer = "A"`, `applies_to_languages = ["*"]`, `applies_to_tasks = ["*"]`, `requires = []`, `timeout_seconds = 15`, `declared_inputs = ["deploy/**/*.yaml", "deploy/**/*.yml", "k8s/**/*.yaml", "k8s/**/*.yml", "kubernetes/**/*.yaml", "Chart.yaml", "values.yaml", "values-*.yaml", "kustomization.yaml", "kustomization.yml", "helm/**/*", "charts/**/*", "*.tf"]`.
- Internal design:
  - Type detection by file marker: `Chart.yaml` → Helm; `kustomization.yaml` → Kustomize; raw `kind: Deployment` YAML → raw; `*.tf` → Terraform.
  - Helm: parse `Chart.yaml` + `values*.yaml` with `yaml.CSafeLoader` (10 MB cap each, depth 64). Record the image reference path (e.g., `image.repository`) and the value at that path. Multi-environment Helm (`values-prod.yaml`, `values-staging.yaml`) recorded as an `environments: list[{name, image_reference, ...}]` field; the primary `image_reference` field stays nullable for the single-env case. No Helm template rendering; that's a Planner-time decision in Phase 3+. [B+S]
  - Kustomize: parse `kustomization.yaml`. Resources list followed one level deep. Paths in `resources:` that resolve outside `repo_root` are rejected with a `kustomization_resource_path_outside_repo: true` warning (zip-slip mitigation, [S]). Overlay traversal capped at depth 5 and 50 total files.
  - Raw manifests: multi-document load in the style of `safe_load_all`, using `yaml.CSafeLoader`. Filter to `kind ∈ {Deployment, StatefulSet, DaemonSet, Pod}`. Extract `spec.template.spec.containers[].image`, `securityContext`, `ports`, `env`, `envFrom`.
  - Terraform: `*.tf` files enumerated by path only; no parsing in Phase 1. `python-hcl2` is NOT added to deps ([S]: historic CVEs; defer to Phase 2 with a richer parser). Slice records `terraform_present: true, terraform_files: list[relative_path]`. `confidence: low` if Terraform is detected and no other deployment type is.
- Why this choice over the alternatives:
  - Refuses [P]'s "streaming YAML parse for one key": fragile (key ordering, anchors), zero meaningful win on 10 MB caps.
  - Adopts [S]'s no-`hcl2` stance; defers to Phase 2.
  - Multi-env-as-list resolves [B] Risk #2 directly: the sub-schema declares the shape, the envelope validation catches drift.
- Tradeoffs accepted:
  - Terraform-heavy repos get a `confidence: low` slice with paths-only enumeration. Phase 2 closes the gap when an actual consumer demands it.
  - We do not render Helm or Kustomize, by design.
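The Kustomize path-traversal refusal can be sketched with `pathlib` alone. This is an illustration under assumptions: the function name and return shape are hypothetical, remote resource URLs are out of scope for the sketch, and only the `kustomization_resource_path_outside_repo` flag name comes from the text.

```python
from pathlib import Path
from typing import Any


def classify_kustomize_resources(
    repo_root: Path, kustomization_dir: Path, resources: list[str]
) -> dict[str, Any]:
    """Split resources into in-repo paths and rejected entries (hypothetical helper)."""
    root = repo_root.resolve()
    accepted: list[Path] = []
    rejected: list[str] = []
    for entry in resources:
        # resolve() normalizes ".." segments (and symlinks, where they exist)
        # before the containment check, so "../../etc" cannot slip through.
        candidate = (kustomization_dir / entry).resolve()
        if candidate.is_relative_to(root):
            accepted.append(candidate)
        else:
            rejected.append(entry)
    return {
        "resources": accepted,
        "rejected": rejected,
        "kustomization_resource_path_outside_repo": bool(rejected),
    }
```

Checking containment on the resolved path, rather than on the raw string, is what makes the refusal robust to entries that climb out of the repository and back in.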
7. TestInventoryProbe
- Provenance: [B] (shape, file count not test count, framework detection from deps) + [S] (lcov.info parse caps, test files not parsed, only enumerated).
- Purpose: Populate `test_inventory` from `localv2.md §5.1 A6`. Records test framework, test-file count, command paths, coverage data presence.
- Interface: Standard probe ABC. `name = "test_inventory"`, `layer = "A"`, `applies_to_languages = ["javascript", "typescript"]`, `applies_to_tasks = ["*"]`, `requires = ["language_detection", "node_build_system"]`, `timeout_seconds = 10`, `declared_inputs = ["package.json", "vitest.config.*", "jest.config.*", "playwright.config.*", ".mocharc.*", "test/**/*.test.*", "tests/**/*.test.*", "src/**/*.test.*", "**/*.spec.*", "coverage/lcov.info", "scripts/smoke.*", "tests/smoke/**/*"]`.
- Internal design:
  - Framework detection: dict lookup against `dependencies + devDependencies` (via `ctx.parsed_manifest`) for `vitest`, `jest`, `mocha`, `tap`, `@playwright/test`, `cypress`. `node:test` reported if `package.json#engines.node >= 18` and no other framework is declared.
  - Test-file count: single `os.walk` with Phase 0 noise-dir exclusions (`node_modules`, `dist`, `build`, `coverage`, `.next`, `.turbo`, `.git`). Match against `*.test.{js,ts,jsx,tsx,mjs,cjs}` and `*.spec.{js,ts,jsx,tsx,mjs,cjs}`. Field: `unit_test_file_count: int`, with a `unit_test_count_is_file_count: true` boolean to signal the limitation. [B+P agreement on counting files; [S] also agrees]
  - Command extraction: read `package.json#scripts` for entries named `test`, `test:unit`, `test:integration`, `test:smoke`, `test:e2e`, `test:coverage`. Record verbatim.
  - Smoke script presence: `Path.exists()` for `scripts/smoke.{sh,js,ts}` and `tests/smoke/`.
  - Coverage data: if `coverage/lcov.info` exists, parse summary only (totals: lines, functions, branches hit/found) via a small line-scanner (~40 LOC, no regex backtracking, 50 MB cap). The file format is unambiguous; no external lib needed.
- Why this choice over the alternatives:
  - Refuses [B]'s `lcov-parser` PyPI dep: the lcov format is simple enough that 40 lines of stdlib is the right size.
  - Files-not-cases is the [P+B+S] convergence the critic flagged as "accidental agreement"; making it explicit closes that gap.
- Tradeoffs accepted:
  - File count, not test-case count. Dynamic test generation (`describe.each`, `test.each`) is invisible. Signal-not-truth; the planner consumes this as a coarse indicator.
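The lcov summary scan is small enough to sketch in full. The sketch below sums the per-record `LF`/`LH` (lines found/hit), `FNF`/`FNH` (functions found/hit), and `BRF`/`BRH` (branches found/hit) counters of the lcov tracefile format; the function name, the result layout, and the use of `ValueError` for the size cap (instead of the project's typed exceptions) are assumptions for the example.

```python
from pathlib import Path

# lcov tracefile summary keys -> (category, counter).
_LCOV_KEYS = {
    "LF": ("lines", "found"),
    "LH": ("lines", "hit"),
    "FNF": ("functions", "found"),
    "FNH": ("functions", "hit"),
    "BRF": ("branches", "found"),
    "BRH": ("branches", "hit"),
}


def summarize_lcov(path: Path, *, max_bytes: int = 50 * 1024 * 1024) -> dict[str, dict[str, int]]:
    """Sum per-record totals; no regex, one pass, size-capped (hypothetical helper)."""
    if path.stat().st_size > max_bytes:
        raise ValueError(f"{path} exceeds {max_bytes} bytes")
    totals = {cat: {"found": 0, "hit": 0} for cat in ("lines", "functions", "branches")}
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            key, sep, value = line.strip().partition(":")
            target = _LCOV_KEYS.get(key)
            # Ignore every other record type (SF:, DA:, FN:, end_of_record, ...).
            if target and sep and value.isascii() and value.isdigit():
                category, counter = target
                totals[category][counter] += int(value)
    return totals
```

Per-line `DA:` and per-branch `BRDA:` entries are skipped on purpose, since the design asks for the summary totals only.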
8. Safe-parse helpers: codegenie/parsers/
- Provenance: [S] (caps); [synth] (placement: one shared helper module, no per-probe duplication).
- Purpose: Centralize the in-process parse-with-caps idiom so every probe uses the same bounded parsers. Without this, each probe re-implements size+depth checks slightly differently, and the security goal degrades to "mostly enforced."
- Interface:
  - `src/codegenie/parsers/safe_json.py`: `def load(path: Path, *, max_bytes: int, max_depth: int = 64) -> dict[str, JSONValue]`. Reads the file once with `O_NOFOLLOW`, size-checked before parse. Uses stdlib `json.loads` with a small post-parse depth-walker (the C scanner has no native depth cap, so the walk is a stdlib-only second pass, bounded by `max_depth`). Raises typed `SizeCapExceeded` / `DepthCapExceeded` / `MalformedJSONError`.
  - `src/codegenie/parsers/safe_yaml.py`: same shape, wrapping `yaml.CSafeLoader` (the Phase 0 ratified loader; Phase 0 `forbidden-patterns` continues to ban `yaml.load`). Depth check after parse; size cap before.
  - `src/codegenie/parsers/jsonc.py`: stdlib-only line-comment + block-comment stripper, then `safe_json.load`. Pathological inputs (unterminated block comments, deeply nested comment levels) fuzz-tested in `tests/adv/test_tsconfig_pathological.py`.
- Why this choice over the alternatives:
  - Refuses [S]'s per-probe fork+exec sandbox (critic §2.1.1: ABC bypass, not extension-by-addition; ~1.5 s overhead per cold gather). In-process caps catch ~95% of the threat model (YAML bombs, JSON bombs, depth-DoS, oversized inputs) at ~0 ms of overhead per parse.
  - Refuses [P]'s `orjson` / `pyjson5` / `ruamel.yaml` C-extension drift (critic §1.1.6).
- Tradeoffs accepted:
  - In-process caps do not protect against parser-CVE exploits that bypass the depth check inside the C extension. Mitigation: Phase 0's `pip-audit` + `osv-scanner` + Dependabot on `pyyaml`/cpython watch this; rlimits at the OS level (Phase 14's production worker) are the future defense. The [S] design's claim that "sandbox catches the remaining 20% (parser CVEs)" is acknowledged; in Phase 1 we accept the 20% in exchange for not violating the Phase 0 ABC. This is the explicit risk; recorded under "Risks" below.
  - Two passes (size→parse→depth-walk) is ~5% slower than a hypothetical depth-aware parser. Acceptable.
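A minimal sketch of what `safe_json.load` could look like, following the interface above: open with `O_NOFOLLOW`, check size before parsing, then walk the parsed value to enforce depth. The exception names and the signature come from the text; the iterative walker, the depth convention (the top-level container counts as depth 1), the mapping of `RecursionError` to `DepthCapExceeded`, and the return annotation using `Any` in place of `JSONValue` are assumptions.

```python
import json
import os
from pathlib import Path
from typing import Any


class SizeCapExceeded(Exception): ...
class DepthCapExceeded(Exception): ...
class MalformedJSONError(Exception): ...


def _check_depth(value: Any, max_depth: int) -> None:
    """Iterative walk, so the check itself cannot blow the Python stack."""
    stack = [(value, 1)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, (dict, list)):
            if depth > max_depth:
                raise DepthCapExceeded(f"nesting deeper than {max_depth}")
            children = node.values() if isinstance(node, dict) else node
            stack.extend((child, depth + 1) for child in children)


def load(path: Path, *, max_bytes: int, max_depth: int = 64) -> dict[str, Any]:
    # O_NOFOLLOW is POSIX-only; fall back to a plain open where it is missing.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    with os.fdopen(os.open(path, flags), "rb") as fh:
        if os.fstat(fh.fileno()).st_size > max_bytes:
            raise SizeCapExceeded(f"{path} exceeds {max_bytes} bytes")
        raw = fh.read(max_bytes + 1)  # guards against growth after the fstat
    if len(raw) > max_bytes:
        raise SizeCapExceeded(f"{path} exceeds {max_bytes} bytes")
    try:
        data = json.loads(raw)
    except RecursionError as exc:  # extreme nesting can trip the parser first
        raise DepthCapExceeded("parser recursion limit reached") from exc
    except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError
        raise MalformedJSONError(str(exc)) from exc
    _check_depth(data, max_depth)
    if not isinstance(data, dict):
        raise MalformedJSONError("top-level value must be an object")
    return data
```

On platforms with `O_NOFOLLOW`, opening a path whose final component is a symlink fails with `OSError`; how that is mapped onto the probe's typed errors is left open here.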
9. Per-probe sub-schemas: strictness at the boundary
- Provenance: `[B]` (each probe owns one sub-schema file) + `[S]` (`additionalProperties: false` at sub-schema root) + `[synth]` (resolution of Phase 0 §2.9 layered policy).
- Purpose: The schema chokepoint where a typo in a Phase-1 probe's output is rejected at land-time, not at downstream-consumer time.
- Interface: Five new JSON files at `src/codegenie/schema/probes/`:
  - `node_build_system.schema.json`
  - `node_manifest.schema.json`
  - `ci.schema.json`
  - `deployment.schema.json`
  - `test_inventory.schema.json`
  - Plus an extension to the existing `language_detection.schema.json`.
- Internal design:
  - Each sub-schema has `additionalProperties: false` at its own root. The Phase 0 envelope keeps `additionalProperties: true` under `probes.*` (Phase 0 §2.9, conflict-table row 4 winner). The strictness lives per-probe, not globally. This is exactly the position the critic identifies as agreed across all three lenses but un-documented; the synthesizer documents it here.
  - Sub-schemas are referenced from the envelope by `$ref` to relative path.
  - Each sub-schema declares required + optional fields with types. Optional fields use `null` for not-present, not field-absence (this lets the schema's `additionalProperties: false` mean what it says).
  - Adding a field is a code change + schema change in the same PR; the friction is the point.
- Why this choice over the alternatives:
  - Refuses [S]'s sanitizer third-pass (critic §2.2.5: edits a frozen Phase-0 chokepoint without ADR amendment). The per-sub-schema policy gets the same effect (a schema slice that doesn't conform is rejected) at the validator boundary, which Phase 0 §2.9 already shaped for layering.
  - Refuses [P]'s implicit "size cap as system invariant": Phase 0's `OutputSanitizer` has two passes; we don't add a third.
- Tradeoffs accepted:
  - A future probe wanting to emit a forward-compat field (e.g., `prompt_injection_marker_count` from [S] Goal #6) must amend its sub-schema in the same PR. This is correct: per-probe sub-schemas are the extension hook.
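To make the "strict root, `null` for not-present" policy concrete, here is a small sketch. The two fields are borrowed from names that appear elsewhere in this design, but this particular schema shape is hypothetical, and `check_slice` is a hand-written stand-in that understands only the three keywords shown; the real boundary is the `Draft202012Validator` run at envelope merge time.

```python
from typing import Any

# Hypothetical sub-schema: every field is declared, optional ones admit null.
SUB_SCHEMA: dict[str, Any] = {
    "type": "object",
    "additionalProperties": False,
    "required": ["package_manager", "node_version_resolved_locally"],
    "properties": {
        "package_manager": {"type": ["string"]},
        "node_version_resolved_locally": {"type": ["string", "null"]},
    },
}

_PY_TYPES = {"string": str, "null": type(None)}


def check_slice(slice_: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return human-readable violations; an empty list means the slice conforms."""
    errors: list[str] = []
    props = schema["properties"]
    for key in schema["required"]:
        if key not in slice_:
            errors.append(f"missing required field: {key}")
    for key, value in slice_.items():
        if key not in props:
            # additionalProperties: false turns a typo into a hard rejection.
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {key}")
            continue
        allowed = tuple(_PY_TYPES[name] for name in props[key]["type"])
        if not isinstance(value, allowed):
            errors.append(f"wrong type for field: {key}")
    return errors
```

With this shape, a slice that misspells `package_manager` produces both a "missing required field" and an "unexpected field" violation, which is the land-time rejection the component is after.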
10. Catalog loader
- Provenance: `[B]` verbatim.
- Purpose: Load `native_modules.yaml` and `ci_providers.yaml` once at module import, expose as immutable mappings, self-validate.
- Interface: `src/codegenie/catalogs/__init__.py` exports `NATIVE_MODULES: Mapping[str, NativeModuleEntry]` and `CI_PROVIDERS: Mapping[str, CIProviderEntry]`. Entry types are `NamedTuple`s. The catalogs themselves are self-validated against `_schema.json` at import.
- Internal design: `yaml.safe_load` semantics, implemented with `yaml.CSafeLoader` via `safe_yaml.load` (Phase 0 forbidden-patterns bans loading without an explicit Loader). `types.MappingProxyType` for freezing. Fail-loud at CLI startup if the catalog YAML is malformed or fails self-schema. `[B+all]`
- Tradeoffs accepted: ~5 ms import-time cost. Worth it for the conventional shape.
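The freeze-and-fail-loud step can be sketched like this. It is a simplified illustration that starts from already-parsed entries (the YAML load and self-schema validation are omitted), and the entry fields beyond `system_deps_required`, the `freeze_catalog` helper, and the sample data are hypothetical.

```python
from types import MappingProxyType
from typing import Any, Mapping, NamedTuple


class CatalogLoadError(Exception):
    """Raised at import time when a catalog is malformed."""


class NativeModuleEntry(NamedTuple):
    name: str
    system_deps_required: tuple[str, ...]


def freeze_catalog(raw_entries: list[dict[str, Any]]) -> Mapping[str, NativeModuleEntry]:
    """Build a read-only name -> entry mapping, rejecting bad or duplicate rows."""
    entries: dict[str, NativeModuleEntry] = {}
    for raw in raw_entries:
        try:
            entry = NativeModuleEntry(
                name=raw["name"],
                system_deps_required=tuple(raw["system_deps_required"]),
            )
        except (KeyError, TypeError) as exc:
            raise CatalogLoadError(f"malformed catalog entry: {raw!r}") from exc
        if entry.name in entries:
            raise CatalogLoadError(f"duplicate catalog entry: {entry.name}")
        entries[entry.name] = entry
    # MappingProxyType is a read-only view; item assignment raises TypeError.
    return MappingProxyType(entries)


# Illustrative data only; not the real catalog contents.
NATIVE_MODULES = freeze_catalog(
    [{"name": "sharp", "system_deps_required": ["libvips"]}]
)
```

Because the module-level constant is built at import, a malformed catalog surfaces as `CatalogLoadError` before any probe runs.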
11. Cache key: Phase 0 unchanged
- Provenance: `[B]` (no change) + `[synth]` (explicit refusal of [S]'s byte-content rewrite).
- Decision: The Phase 0 cache key derivation (`SHA-256(probe_name | probe_version | schema_version | inputs_hash_hex)` where `inputs_hash = BLAKE3` over sorted `(path, size)` tuples of files matching `declared_inputs` after exclusion) is preserved verbatim. The native module catalog version (`catalog_version` field) participates by being listed in `NodeManifestProbe.declared_inputs` (the catalog YAML is hashed like any other declared input).
- Why not [S]'s byte-content cache key: Critic §2.2.3 establishes that this reverses an explicit Phase 0 decision (conflict-table row 14 in Phase 0 final-design) without writing an ADR amendment. The [S] motivation (cache poisoning resistance via attacker-controlled lockfile of same size) is real but is exactly the threat Phase 14's webhook-driven gather creates; Phase 14 redesigns the multi-actor key story (Phase 0 §2.7 commits to revisiting then). Phase 1 inherits Phase 0's choice; the future change is scoped where it belongs.
- Why not [P]'s PathIndex fingerprint: Critic §1.1.3 establishes the same governance issue. The PathIndex is also a new class hierarchy the Phase 0 ABC doesn't anticipate.
- Tradeoffs accepted: An attacker who can write to the analyzed repo's `package.json` while preserving its byte length leaves the `(path, size)` tuple, and therefore the cache key, unchanged, so a stale slice would be served from cache. Phase 1 accepts this because the threat model assumes the attacker doesn't have write access to the analyzed repo at gather time. Recorded as risk #4 below.
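A short sketch of the key derivation shows why `(path, size)` hashing is cheap and what it cannot see. Two things here are assumptions: `hashlib.blake2b` is swapped in for BLAKE3 so the example stays stdlib-only, and the byte-level serialization of the tuples is invented for illustration.

```python
import hashlib


def inputs_hash(files: list[tuple[str, int]]) -> str:
    """Hash sorted (path, size) tuples. blake2b stands in for BLAKE3 here."""
    digest = hashlib.blake2b(digest_size=32)
    for path, size in sorted(files):
        # Assumed serialization: NUL between fields, newline between records.
        digest.update(f"{path}\0{size}\n".encode("utf-8"))
    return digest.hexdigest()


def cache_key(
    probe_name: str,
    probe_version: str,
    schema_version: str,
    files: list[tuple[str, int]],
) -> str:
    """SHA-256 over probe identity plus the inputs hash, joined with '|'."""
    material = "|".join([probe_name, probe_version, schema_version, inputs_hash(files)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

The key changes when a declared input's size changes and is independent of enumeration order, but a same-size rewrite of a file produces the identical key. That is the property the byte-content proposal targets and that this phase defers.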
Data flow
A representative warm-path run on a real Node.js repo (~1k files, TypeScript, pnpm, GitHub Actions, Helm) where `package.json` changed since last gather:
- CLI entry (Phase 0, unchanged). Path validated, tool-readiness check includes optional `node`.
- `RepoSnapshot` construction (Phase 0, unchanged). `git rev-parse HEAD`. After `LanguageDetectionProbe` finishes, `detected_languages` is populated and the snapshot is frozen for the rest of the gather.
- Probe registry filter (Phase 0, unchanged). Returns six probes for a TypeScript repo: `language_detection`, `node_build_system`, `node_manifest`, `ci`, `deployment`, `test_inventory`. Topological order via `requires`: `language_detection` → wave 2 = `{node_build_system, node_manifest, ci, deployment, test_inventory}` (last two have `requires=[]` but the planner emits them in the same wave).
- Coordinator dispatch (Phase 0, unchanged + `ParsedManifestMemo` provided on `ProbeContext`). One `asyncio.Task` per probe, bounded by `Semaphore(min(cpu_count(), 8))`.
- Per-probe cache lookup (Phase 0, unchanged). For each probe, compute `cache_key = identity_hash(name, version, schema_version, content_hash(declared_inputs))`. `package.json` changed → `language_detection`, `node_build_system`, `node_manifest`, `test_inventory` miss; `ci`, `deployment` hit (their `declared_inputs` did not change).
- `language_detection` runs first (Phase 0 + Phase 1 framework + monorepo extension). ~80 ms p50: scandir walk (50 ms) + `safe_json.load(package.json)` (5 ms, with 5 MB + 64-depth cap) + framework + monorepo classification (5 ms). Parsed `package.json` is now in the memo for the rest of the gather.
- Wave 2 dispatches (Phase 0). `node_build_system`, `node_manifest`, `test_inventory` each call `ctx.parsed_manifest(repo_root / "package.json")`; the second call returns the memoized dict, no re-parse. Lockfile parse (`pnpm-lock.yaml`, ~250 ms p50) dominates `node_manifest`. `tsconfig.json` parsed via `jsonc.py` (5 ms). `ci` and `deployment` are LRU/cache hits and skip execution entirely.
- Per-probe ProbeOutput flows through `_ProbeOutputValidator` (Pydantic, JSONValue recursive type, field-name regex), the Phase 0 boundary. Field names like `package_manager`, `native_modules`, `image_reference`, `framework`, `references_secrets`, `additional_providers`: none trip the secret-name regex.
- OutputSanitizer.scrub (Phase 0, two passes unchanged). Absolute paths in lockfile entries (e.g., `binary_artifacts: ["/Users/me/work/.../node_modules/sharp/build/..."]`) get scrubbed to relative. Load-bearing for Phase 11.
- Cache write (Phase 0). Each `ProbeOutput` blob written. Index appended.
- Output merge + schema validation (Phase 0 + Phase 1 sub-schemas). Each probe's slice merged; envelope `additionalProperties: false` at root, `true` under `probes.*`, per-probe sub-schema `additionalProperties: false` at its own root catches any typo'd field. `[synth]`
- Raw artifacts written to `.codegenie/context/raw/<probe>.json` (Phase 0). Lockfile dump, parsed Helm values, parsed CI workflow.
- YAML write (Phase 0). Atomic `.tmp` → `os.replace`. `CSafeDumper`, 0600.
- Audit record (Phase 0). Per-probe execution path (`Ran`/`CacheHit`/`Skipped`), the seam Phase 14 needs.
- Exit 0.

Second run, no changes: All six probes hit cache. Coordinator records `CacheHit` for each. No `package.json` parse (memo never populated; nothing to parse). YAML re-written from cached slices. Wall-clock target: p50 ≤ 0.4 s.
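The memo that lets wave-2 probes reuse the parse from `language_detection` might look like the sketch below. The class body, the `parse_count` counter, and the `read_manifest` helper are assumptions made for illustration; plain `json.loads` stands in for `safe_json.load`, and mtime-based invalidation follows the behaviour listed for `test_parsed_manifest_memo.py`.

```python
import json
import os
from pathlib import Path
from typing import Any, Callable, Optional


class ParsedManifestMemo:
    """In-memory, per-gather memo of parsed manifests; never written to disk."""

    def __init__(self, parse: Callable[[Path], dict[str, Any]]) -> None:
        self._parse = parse
        self._entries: dict[Path, tuple[int, dict[str, Any]]] = {}
        self.parse_count = 0  # exposed only so the sketch can be checked

    def __call__(self, path: Path) -> dict[str, Any]:
        mtime_ns = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime_ns:
            return cached[1]  # same file state: hand back the memoized dict
        doc = self._parse(path)
        self.parse_count += 1
        self._entries[path] = (mtime_ns, doc)
        return doc


def read_manifest(memo: Optional[ParsedManifestMemo], path: Path) -> dict[str, Any]:
    """Probe-side defensive check: parse directly when no memo is wired in."""
    if not memo:
        return json.loads(path.read_text(encoding="utf-8"))
    return memo(path)
```

Since the lookup is synchronous and contains no `await`, concurrent probe tasks on one event loop cannot interleave inside it, so no lock is needed in this shape.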
Failure modes & recovery
| Failure | Detected by | Containment | Recovery | Source |
|---|---|---|---|---|
| `NodeManifest` lockfile parse error | `_pnpm`/`_npm`/`_yarn` parser raises typed error | Caught by probe `run()`; `ProbeOutput(confidence="low", errors=[...])` | Coordinator continues with degraded slice; planner sees `confidence: low` | [B] |
| Lockfile exceeds 50 MB cap | `safe_yaml.load` / `safe_json.load` raises `SizeCapExceeded` | Same as above | Probe failed; gather continues | [S] |
| YAML billion-laughs in `pnpm-lock.yaml` | `safe_yaml.load` depth-walker raises `DepthCapExceeded` | Probe-level catch | `confidence: low`, gather continues. Test fixture exists. | [S] |
| JSON bomb in `package.json` (1 GB string, deep nesting) | `safe_json.load` size cap or `DepthCapExceeded` | Probe-level catch | Same | [S] |
| Native module catalog malformed | `CatalogLoadError` at module import | Hard fail at CLI startup with path | Operator fixes catalog | [B] |
| Path traversal in `kustomization.yaml#resources` | `DeploymentProbe` resolves relative to `repo_root` and refuses if escapes | Path skipped, warning emitted | Probe continues with the rest; slice has `kustomization_resource_path_outside_repo: true` | [S] |
| Symlink in `declared_inputs` outside repo | Phase 0 walker (unchanged) | Skipped + logged | Probe sees no file; cache key derivation skips | [S+P0] |
| `tsconfig.json#extends` chain exceeds 4 levels | `NodeBuildSystem` internal counter | Probe-level catch | `confidence: medium`, `warnings: ["tsconfig.extends_depth_exceeded"]` | [S] |
| `node --version` subprocess fails (binary absent, exec error, timeout) | `exec.run_allowlisted` raises | Probe catches | `node_version_resolved_locally: null`, confidence unaffected (constraint is load-bearing) | [B] |
| `pyarn` not installed at runtime (decided-to-fall-back path) | `ImportError` at probe module import | `NodeManifest` falls back to hand-rolled yarn-lock parser | Gather continues with `confidence: medium` if hand-rolled returns empty, else `high` | [synth] |
| Probe exceeds `timeout_seconds` | Phase 0 coordinator (`asyncio.wait_for`) | Cancel + SIGKILL at 1.5 × `timeout_s` | `ProbeOutput(errors=["timeout"], confidence="low")` | [P0] |
| Probe slice fails its per-probe sub-schema | `Draft202012Validator` at envelope merge time | YAML written with `.invalid` suffix; CLI exits 3 | Operator inspects diff vs sub-schema | [B+P0] |
| Multi-environment Helm: probe emits list when consumer expects scalar | Per-probe sub-schema declares both `image_reference` and `environments: list` shapes | Schema accepts; downstream contract test verifies consumer handles list | Documentation + planner-side handling | [B+synth] |
| `LanguageDetection` reports a non-Node repo | Phase 0 `for_task` filter | Five Node-only probes are filtered out; their sub-schemas declare slices as optional (not required) at the envelope | YAML envelope omits the Node slices; schema validates | [synth, addresses [B] blind spot] |
| `ParsedManifestMemo` function not provided by coordinator (test path, old wiring) | Each probe defensive-checks | Falls back to direct `safe_json.load` | Same correctness, 3× parse cost on warm path; surfaced in CI canary | [synth] |
| `package.json` is a symlink pointing outside repo | `O_NOFOLLOW` open in `safe_json.load` | Open fails with `ELOOP` | Probe records `confidence: low`, `errors: ["symlink_skipped: package.json"]` | [S] |
The pattern: deterministic facts about messy reality, explicit confidence, never silent degradation. Hard-fails are reserved for load-bearing-invariant violations (catalog corruption, schema sub-schema malformed) per Rule 12.
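As one concrete instance of "contain and continue", the `kustomization.yaml#resources` traversal check can be sketched as follows. The helper name and signature are hypothetical; the technique is ordinary resolve-then-containment checking with `pathlib`.

```python
from pathlib import Path
from typing import Optional


def resolve_inside_repo(repo_root: Path, base_dir: Path, resource: str) -> Optional[Path]:
    """Resolve `resource` relative to `base_dir`; return None if it escapes the repo."""
    root = repo_root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so both
    # "../../etc/passwd" and a symlink pointing outside the repo are caught.
    candidate = (base_dir / resource).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        return None
    return candidate
```

A caller that receives `None` would skip the path, emit the warning, and set `kustomization_resource_path_outside_repo: true` while processing the remaining resources.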
Resource & cost profile
- Tokens per run: 0. Phase 0 `fence` job continues to assert. `[all]`
- Wall-clock per `codegenie gather` on the 1k-file fixture (M-series Mac, p50 / p95):
  - Cold (all cache miss): 4 s / 8 s. Dominated by `node_manifest` lockfile parse (~250 ms p50) and `test_inventory` walk (~120 ms p50). `[B]`
  - Warm (all cache hits): 0.4 s / 1 s. Dominated by sanitizer + schema validation + YAML write. `[B+P]`
  - Incremental (`package.json` changed, 4 misses / 2 hits): 1 s / 2 s. `[synth]`
- Memory (RSS): ~90 MB peak on a 1k-file repo. ~70 MB idle (Phase 0 baseline). `[B+P0]`
- Storage per gather: `repo-context.yaml` ~30 KB; `raw/` ~300 KB (lockfile dump dominates); cache blobs ~50 KB; audit ~5 KB. ~0.4 MB per gather. `[B]`
- CI walltime delta vs Phase 0: +25 s p50, +45 s p95. Phase 0 90 s p95 advisory target now slips to ~120 s p95, surfaced as a dashboard metric, not a gate (Phase 0 §3.2 set the budget as advisory). The slip is documented in the Phase 1 risk register. `[B+synth, softer than [S]'s implicit overrun]`
- External-dep additions: `pyarn` (yarn-lock parser, conditional on maintenance status at land-time; otherwise hand-rolled fallback) and nothing else. No `orjson`, `pyjson5`, `ruamel.yaml`, `msgpack`, `hcl2`, `python-hcl2`, `lcov-parser`. `[synth, strict refusal of [P]'s extras and [B]'s lcov-parser]`
Test plan
The test pyramid is wider at the unit base than the integration top. Each probe is unit-tested exhaustively against fixture inputs before the integration test cares whether they compose correctly. Adversarial fixtures land in tests/adv/ and are CI-gating per Phase 0's adv-tests convention.
Unit tests (`tests/unit/probes/`)
| Test module | Asserts | Source |
|---|---|---|
| `test_language_detection.py` (extends Phase 0) | New: framework detection from deps; monorepo markers; confidence reporting on weak signals. | [B] |
| `test_node_build_system.py` | Lockfile-precedence selection; multi-lockfile drops confidence; `engines.node` precedence; `tsconfig.json` with comments; bundler detection; malformed `package.json` → `confidence: low`; `node --version` cross-check happy + disagreement + absent paths. | [B+synth] |
| `test_node_manifest.py` | Each lockfile parser on a format-specific fixture; native-module detection (one fixture per cataloged module); `optionalDependencies` + `bundledDependencies` counting; cache-key stability across runs; catalog-version invalidates cache. | [B+synth] |
| `test_ci.py` | GitHub Actions parser (build, no-build, matrix); GitLab CI; Jenkinsfile regex extraction; multi-provider repo (`provider` + `additional_providers`); absent CI directory. | [B+S] |
| `test_deployment.py` | Helm `Chart.yaml` + `values.yaml`; multi-env Helm (`values-prod.yaml` + `values-staging.yaml`); Kustomize; raw Deployment; raw Pod skipped; multi-env list emission; Terraform paths-only. | [B+S] |
| `test_test_inventory.py` | Vitest, Jest, Mocha, Tap, `node:test`, Playwright, Cypress detection; file count walk honors exclusions; `package.json#scripts` extraction; smoke-script presence; coverage `lcov.info` parsing. | [B] |
| `test_catalogs.py` | Catalog YAML parses; catalog schema validates; duplicate names rejected; `catalog_version` is present. | [B+synth] |
| `test_probe_registration.py` (extends Phase 0) | Each Phase 1 probe registers once; `requires` graph acyclic; `applies_to_languages` correct; `applies_to_tasks=["*"]`. | [B] |
| `test_probe_slice_disjoint.py` (extends Phase 0) | No two probes write to the same top-level slice key. | [B] |
| `test_sub_schemas.py` | Each per-probe sub-schema is valid Draft 2020-12; each `$ref` resolves; each sub-schema has `additionalProperties: false` at root. | [B+synth] |
| `test_cache_keys.py` (extends Phase 0) | For each Phase 1 probe, modifying a `declared_inputs` file changes the key; modifying a non-declared file does not. | [B] |
| `test_parsers_safe_json.py` | Size cap fires; depth cap fires; valid JSON parses; symlink target refused (`O_NOFOLLOW`). | [S+synth] |
| `test_parsers_safe_yaml.py` | Same shape; `yaml.CSafeLoader` used; depth cap fires on billion-laughs fixture. | [S+synth] |
| `test_parsers_jsonc.py` | Strips line comments; strips block comments; handles trailing commas (per real-world tsconfig); pathological fuzz fixtures complete in < 1 s. | [synth] |
| `test_parsed_manifest_memo.py` | First call parses; second call returns memoized; mtime change re-parses; falsy memo → fallback. | [synth] |
Adversarial tests (`tests/adv/`), CI-gating
These are the load-bearing security tests. A regression here is a P0 defect.
- `test_yaml_billion_laughs.py`: fixture `pnpm-lock.yaml` with billion-laughs; assert `DepthCapExceeded` fires; probe marked failed; gather exits 0; coordinator never OOMs. `[S]`
- `test_json_bomb_deep_nesting.py`: `package.json` with 10,000 nested objects; depth cap fires. `[S]`
- `test_json_bomb_huge_string.py`: `package.json` with a single 600 MB string; size cap fires (5 MB limit). `[S]`
- `test_yaml_unsafe_tag.py`: `pnpm-lock.yaml` with `!!python/object`; `CSafeLoader` refuses; if a future bug uses unsafe loader, test detects (no sentinel side effect). `[S]`
- `test_symlink_escape_in_declared_inputs.py`: `package.json` symlinks to `/etc/passwd`; `O_NOFOLLOW` open fails; probe records `confidence: low`; sensitive contents never appear in YAML. `[S]`
- `test_zip_slip_kustomize.py`: `kustomization.yaml` with `resources: ["../../etc/passwd"]`; resolution refuses; warning emitted. `[S]`
- `test_planted_node_on_path_ignored.py`: `$PATH` includes a malicious `node` shim. The shim attempts to write a sentinel. `exec.run_allowlisted` env-strip + timeout + the fact that Phase 0 allowlist resolves `node` via `shutil.which` after env strip mean the shim runs in a stripped env without secret access; the sentinel write is not prevented by Phase 1 (the binary is allowlisted by name), but the secrets it might try to steal are not in scope. Test asserts the env strip and that no secret env var leaks. `[synth, explicit acknowledgement that node on $PATH is not RCE-proof; the env-strip carries the load-bearing weight]`
- `test_tsconfig_pathological.py`: fixture `tsconfig.json` with deeply nested block comments, unterminated string, circular `extends` chain. `jsonc.py` either parses successfully or raises a typed error; never hangs. `[synth]`
- `test_regex_dos_yarn_lock.py`: pathological `yarn.lock` (if hand-rolled fallback path is active); assert parser completes in < 1 s. `[S]`
- `test_oversized_lockfile.py`: 60 MB lockfile; size cap fires; probe failed; gather continues. `[S]`
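The "parses or raises a typed error, never hangs" property exercised by `test_tsconfig_pathological.py` follows naturally from a single-pass comment stripper. The sketch below is an assumption about how `jsonc.py` could be shaped, not its actual contents: it works on a string, not a path, reuses the `MalformedJSONError` name, and deliberately leaves out trailing-comma handling.

```python
import json
from typing import Any


class MalformedJSONError(Exception):
    """Typed error for input that cannot be reduced to valid JSON."""


def strip_comments(text: str) -> str:
    """Remove // and /* */ comments outside string literals in one linear pass."""
    out: list[str] = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character verbatim
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end  # the newline itself is preserved
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise MalformedJSONError("unterminated block comment")
            out.append(" ")  # keep tokens on either side separated
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def loads(text: str) -> Any:
    try:
        return json.loads(strip_comments(text))
    except ValueError as exc:
        raise MalformedJSONError(str(exc)) from exc
```

Every branch advances `i`, so the loop terminates on any input. Nested block comments are not recognised as such, so the leftover `*/` makes `json.loads` fail and surfaces as `MalformedJSONError`.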
Integration tests (`tests/integration/probes/`)
- `test_layer_a_end_to_end.py`: full `codegenie gather` against `tests/fixtures/node_typescript_helm/`; every Phase 1 probe produces a non-empty slice; envelope schema validates. `[B]`
- `test_cache_hit_on_real_repo.py`: gather twice; second run reports `ProbeExecution.CacheHit` for all six probes; no `os.scandir` called on second run (monkey-patched). This is the roadmap's exit-criterion test. `[B+P0]`
- `test_cache_invalidation.py`: gather; modify `package.json`; gather again; assert `language_detection`, `node_build_system`, `node_manifest`, `test_inventory` re-ran; `ci`, `deployment` cache-hit. `[B]`
- `test_real_oss_fixture.py`: clone `expressjs/express` at a pinned SHA; gather; assert schema validity + no native modules + GitHub Actions detected. Cached by `actions/cache` on the SHA. `[B]`
- `test_non_node_repo.py`: gather on a Go-only fixture; Phase 1 probes are filtered out; envelope still validates (Layer A slices are optional). `[synth, addresses [B] blind spot]`
Golden files (`tests/golden/`)
The full golden-file convention arrives in Phase 2. Phase 1 lands one golden, the expected `repo-context.yaml` for `tests/fixtures/node_typescript_helm/`, to seed the convention.
Benchmarks (`tests/bench/`), advisory only
- `test_warm_path_latency.py`: gather a fixture twice; assert second-run wall-clock ratio ≤ 0.25 of first-run (advisory metric per Phase 0 §7.4).
- `test_per_probe_rss.py`: `tracemalloc` per probe; advisory tracking against per-probe budgets.
Tests explicitly not in Phase 1
- No tests against live CI providers (`gh actions` API calls).
- No tests requiring Docker / `node_modules` to be installed.
- No tests of `IndexHealthProbe` (Phase 2).
- No property tests on lockfile parsers: they're small and well-understood; adversarial coverage carries the weight.
- No fork+exec sandbox tests (no per-probe sandbox exists in this design).
- No `views.json` projection tests (no `views.json` artifact exists).
Risks (top 5)
1. Native module catalog gap surfaces in Phase 7, five phases out. The catalog is hand-curated and seeded with ~10 well-known entries. A missed entry → a Phase 7 distroless migration that builds, passes tests, and crashes at runtime because of a missing `system_deps_required`. Mitigation: catalog versioning means a Phase-7 catalog update cleanly invalidates cached `NodeManifest` outputs; the `catalog_entry_version` per entry lets us track when each native module was reviewed; Phase 7's integration tests are explicitly tasked with exercising the catalog and surfacing gaps. The risk is acknowledged as real silent-staleness in the spirit of `production/design.md §2.3`, but the alternative (auto-derive from npm metadata) is materially worse (npm metadata is itself adversarial input).
2. In-process parse caps do not protect against parser-CVE exploits in `pyyaml`/cpython. A future CVE in `yaml.CSafeLoader` or `json.loads` that bypasses our post-parse depth-walker would land RCE in the gather process. Mitigation: Phase 0 `pip-audit` + `osv-scanner` + Dependabot watch the parsers; the field-name regex + recursive `JSONValue` typing + path scrubber are belt-and-suspenders structural defenses. Phase 14's production gather worker adds OS-level rlimits + bwrap. Phase 1 explicitly chooses not to add a per-probe fork sandbox because (a) it's an ABC violation per critic §2.1.1 and (b) the marginal threat closure is ~20% for ~1.5 s of cold-gather overhead per Phase 0 CI cycle.
3. The `node --version` invocation widens the host-`$PATH` attack surface. A hostile `node` shim on `$PATH` can write side-effect files. Mitigation: `exec.run_allowlisted` strips secrets from env; timeout is 5 s; output is parsed as a version string only (never as code); the value is a display field, never a control-flow input. The ADR documenting the `ALLOWED_BINARIES` addition records this risk explicitly. The decision overrides [S]'s veto per Rule 11 conformance with `localv2.md §5.1 A2`.
4. Cache-key derivation by `(path, size)` is vulnerable to a same-size lockfile poisoning attack. An attacker with write access to the analyzed repo can substitute a lockfile of the same byte length and cause the cache to return a stale slice. Mitigation: Phase 1's threat model assumes the attacker does not have write access to the analyzed repo at gather time (Phase 14's production worker operates on a freshly-cloned worktree); the structural cache-key change ([S]'s byte-content hash) is deferred to Phase 14 when the multi-actor threat model arrives. Recorded as a Phase 14 precondition.
5. The non-Node repo path emits an envelope with most Layer A slices absent. Downstream consumers expecting all Layer A slices to be present will break on a Go-only or Python-only repo. Mitigation: the envelope sub-schemas declare each Layer A slice as optional at the `probes.*` level. The per-probe `applies_to_languages` filter (Phase 0 registry) correctly skips the Node probes on non-Node repos. Phase 2 (Layers B–G) will introduce language-agnostic probes that fill the gap; the envelope shape is forward-compatible. This is the cleanest of the three options the [B] design surfaced ("nullable variants" / "conditional branches" / "separate envelope"): optional slices is the minimum-friction path.
Synthesis ledger
Vertex count
- Performance `[P]`: ~32 atomic decision vertices (SnapshotBuilder, PathIndex mixin, coordinator extensions, msgpack inter-probe cache, in-process LRU, mmap on Linux, streaming writer, views.json, hand-rolled yarn parser, BLAKE3 fingerprint, hash-validation-skip-on-LRU, per-probe RSS enforcement, ruamel.yaml C-mode, msgpack package.json, pyjson5/orjson, Tier-0/Tier-1 split, explicit DAG, cooperative cancellation, per-probe `parsed.msgpack`, …).
- Security `[S]`: ~38 atomic decision vertices (parser sandbox subprocess, rlimits PRE-exec, bwrap on Linux, sandbox-exec on macOS, env strip, stdin DEVNULL, byte-content cache key, O_NOFOLLOW, third sanitizer pass, additionalProperties false per probe, no node invocation, no node_modules parsing, no hcl2, no Helm template render, yarn-lock hand-rolled no-backtracking, JSON depth cap, YAML depth cap, parse-time cap, stdout cap, prompt-injection markers, audit input byte hashes, adversarial fixture corpus ≥ 50, …).
- Best-practices `[B]`: ~29 atomic decision vertices (extension by addition, ≤12 new files, ≤1100 LOC, 90/80 coverage, catalog YAML, explicit imports, lockfile-precedence ordering, no script eval, multi-lockfile = low confidence, multi-env Helm as list, Jenkinsfile presence-only, file count not test count, pyarn-if-maintained-else-hand-rolled, optional hcl2, node --version cross-check, packageManager-vs-lockfile preference, …).
- Critic-flagged shared blind spots: 4 (warnings shape; `LanguageDetection` extension; reading `package.json` 3× per gather; per-probe sub-schema `additionalProperties: false` undocumented).
- Total: ~99 atomic vertices, 35 cross-design edges resolved below.
Edges
| Class | Count | Examples |
|---|---|---|
| AGREE | 14 | Lockfile-precedence ordering for package-manager; no script evaluation; yaml.CSafeLoader mandatory; file count for test inventory; catalog YAML for native modules + CI providers; explicit imports for probe registry; no Helm template rendering; multi-lockfile = confidence drop; no npm ls. |
| COMPLEMENT | 9 | [S]'s in-process caps + [B]'s per-probe sub-schemas; [P]'s ProbeExecution seam + [B]'s catalog versioning; [S]'s O_NOFOLLOW + [B]'s declared_inputs precision; [B]'s multi-env-Helm-as-list + [S]'s path-traversal refusal. |
| SUBSUME | 4 | [B]'s per-probe sub-schema additionalProperties: false subsumes [S]'s third sanitizer pass; [synth]'s ParsedManifestMemo subsumes [P]'s msgpack side-channel; [synth]'s in-process safe_parse subsumes [S]'s fork+exec sandbox for caps purposes; Phase 0's existing _ProbeOutputValidator subsumes [S]'s coordinator-side prompt-injection filter. |
| CONFLICT | 8 | (See conflict table below.) |
| Total | 35 | cross-design edges |
Conflict-resolution table
| # | Dimension | [P] picks | [S] picks | [B] picks | Winner | Exit-fit | Roadmap-fit | Commitments-fit | Critic-fit | Sum | Rationale |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Probe execution model | Async-task per probe in coordinator + PathIndex mixin | Per-probe fork+exec subprocess with bwrap/sandbox-exec | Phase 0 unchanged (one asyncio task per probe) | `[B] Phase 0 unchanged` | 3 | 3 | 3 (extension by addition §2.5) | 3 | 12 | [P]'s mixin drifts the ABC (critic §1.1.1). [S]'s sandbox is a new architectural layer Phase 0 never sanctioned (critic §2.1.1) and adds ~1.5 s overhead per cold gather. [B]'s "Phase 0 unchanged" is the only option that preserves the load-bearing extension-by-addition commitment. |
| 2 | `node --version` invocation | (silent) | No: RCE surface via `$PATH` | Yes: read both `engines.node` and `node --version` | `[B] Yes` | 2 (localv2 conformance) | 2 | 2 (Rule 11 conformance > taste) | 2 (resolves the cross-lens disagreement) | 8 | `localv2.md §5.1 A2` explicitly specifies the cross-check; Phase 0 §2.3 makes localv2 the source of truth. [S]'s threat is mitigated by env-strip + timeout + display-only-not-control-flow. Risk recorded; ADR-gated. |
| 3 | Inter-probe parsed-state sharing | Yes: msgpack side-channel | (sandbox would block) | No: each probe re-parses for isolation | `[synth] ParsedManifestMemo in coordinator` | 3 | 3 (preserves cache-hit pass-through seam Phase 14 needs) | 3 (no side-channel; sanitizer + validator preserved) | 3 (resolves critic cross-design obs #3) | 12 | [P]'s msgpack bypasses the sanitizer (critic §1.1.2). [B]'s 3× parse cost is the wrong tradeoff (critic §3.6). The memo lives inside the coordinator, never written to disk; one Phase-1 ADR documents the `ProbeContext` extension. |
| 4 | `additionalProperties` per probe sub-schema | (not specified) | Yes: load-bearing | Yes: friction is the point | `[B+S] Yes` | 3 | 3 | 3 (extension by addition) | 3 (resolves critic cross-design obs #1) | 12 | Both lenses agree; the synthesizer documents it at the per-probe sub-schema root (Phase 0 §2.9 layered policy preserved at envelope and `probes.*`). |
| 5 | yarn.lock parser | Hand-rolled by default (~200 ms warm win) | Hand-rolled no-backtracking | `pyarn` if maintained, else hand-rolled fallback | `[B] pyarn-if-maintained` | 3 | 2 (defer maintenance burden until forced) | 3 | 3 (critic §1.1.4 demolishes [P]'s avg-latency math) | 11 | Decision rule encoded in a Phase-1 ADR at land-time; fallback parser exists as backup. |
| 6 | Cache-key derivation | PathIndex packed fingerprint | File-byte content hash | Phase 0 unchanged | `[B] Phase 0 unchanged` | 3 | 3 (Phase 14 redesigns multi-actor key; this is the right scope) | 3 (Phase 0 chokepoint) | 3 (critic §2.2.3 demolishes [S]'s ADR-bypass) | 12 | Both [P] and [S] rewrite a Phase 0 chokepoint without ADR amendment. Phase 0 final-design §2.7 commits to revisiting at Phase 14. Phase 1 inherits. |
| 7 | Streaming writer + views.json | Yes: streaming + Phase-8 projection | (not addressed) | Phase 0 batch write | `[B] Phase 0 batch write` | 3 | 2 (no Phase 8 design exists) | 3 (no forward-edit dependency) | 3 (critic §1.1.5: Phase 8 → Phase 1 edit is the inverse of extension by addition) | 11 | Ship Phase 8 hot views in Phase 8. |
| 8 | New parser C-extension deps | Yes: msgpack, pyjson5, ruamel.yaml | (sandbox-only, implicitly stdlib only) | No: stdlib + PyYAML (Phase 0) + optional pyarn | `[B] stdlib + pyarn` | 3 | 3 | 3 (no CVE-surface growth) | 3 (critic §1.1.6) | 12 | Phase 0 ratified PyYAML's `yaml.CSafeLoader` + stdlib `json` + `blake3`. Phase 1 adds only `pyarn` (conditional). |
Tie-breaks (used for #2): localv2.md conformance wins per Phase 0 §2.3.
Shared blind spots considered
Critic's cross-design observations (carry-forward vs depart):
1. All three want per-probe `additionalProperties: false` at sub-schema root, none cites Phase 0 §2.9. Carried forward and documented in Component 9. The strictness lives per-probe; the envelope's `probes.*` policy from Phase 0 stays `true`. No release-versioning policy for sub-schemas is added in Phase 1; declared an open question.
2. All three duck `LanguageDetectionProbe`'s extension; none argues why in-place modification is allowed. Carried forward with explicit rationale: Phase 0 final-design §2.10 explicitly defers framework + monorepo detection to Phase 1. This is the one in-place extension the synthesizer documents as licensed.
3. All three accept reading `package.json` more than once per gather; none uses Phase 0's `ProbeExecution` seam. Departed: introduces `ParsedManifestMemo` (Component 2), a Phase-1 ADR-gated extension to `ProbeContext`. This addresses the critic's exact phrasing: "the cheapest, cleanest seam goes untouched."
4. All three under-specify the `warnings: list[str]` and `errors: list[str]` field shapes; `production/design.md §2.2` makes the structural defense load-bearing. Acknowledged as deferred: Phase 1 does not introduce a typed warning enum, but each per-probe sub-schema's `warnings` field is constrained to a `pattern` that requires structured warning IDs (`^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$`, e.g., `tsconfig.extends_depth_exceeded`). This is the minimum structural constraint that prevents prose-judgment smuggling; full typed warning enum is a Phase 2 task tied to `IndexHealthProbe`.
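The warning-ID constraint is easy to exercise outside the schema. The sketch below applies the pattern quoted above with Python's `re`; the helper name is hypothetical, and `fullmatch` is used because Python's `$` would otherwise tolerate a trailing newline, which JSON Schema's `pattern` semantics do not.

```python
import re

# Pattern copied from the per-probe sub-schema constraint: "<area>.<snake_case_id>".
WARNING_ID = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")


def invalid_warnings(warnings: list[str]) -> list[str]:
    """Return the entries that are not structured warning IDs."""
    return [w for w in warnings if WARNING_ID.fullmatch(w) is None]
```

A structured ID such as `tsconfig.extends_depth_exceeded` passes, while free prose such as "this repo looks unsafe for distroless" is rejected, which is the prose-judgment smuggling the constraint is meant to block.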
Departures from all three inputs
- `ParsedManifestMemo` on `ProbeContext`: none of the three lenses proposed this seam. It addresses critic cross-design observation #3 directly and avoids both the msgpack side-channel and the triple-parse waste.
- Per-probe sub-schema strictness with envelope/`probes.*` layering preserved: all three lenses wanted strictness; none documented where. The synthesizer reads Phase 0 §2.9's layered policy as the load-bearing seam and places strictness at the sub-schema root only.
- Warnings field pattern constraint: the three-design-wide blind spot the critic flagged; the synthesizer adds the minimum structural constraint without introducing a typed enum.
- `safe_parse` helpers as a shared module: [S] specified caps per-probe; the synthesizer centralizes them in `src/codegenie/parsers/` so each probe uses identical caps. Reduces "security goal degrades to mostly enforced" risk.
- Per-probe coverage floor carve-out: [B] proposed lowering `deployment.py` to 85%. The synthesizer codifies it for `deployment.py` AND `ci.py` (similar structural-narrow-branch shape) and gates further carve-outs by ADR amendment. This is the explicit "not theater" version of [B]'s mitigation that the critic flagged as gameable (Rule 9).
Exit-criteria checklist¶
- [ ] `codegenie gather` runs on a real Node.js repo → integration test `test_layer_a_end_to_end.py` against `tests/fixtures/node_typescript_helm/`.
- [ ] A useful `repo-context.yaml` is produced → all six Layer A slices populated; schema validates against envelope + per-probe sub-schemas.
- [ ] Cache hits on second run → integration test `test_cache_hit_on_real_repo.py`: all six probes return `ProbeExecution.CacheHit`; `os.scandir` monkey-patched to confirm no re-execution.
- [ ] All probes pass schema validation → CI gate: produced YAML must validate against envelope + per-probe sub-schemas, or build fails (Phase 0 exit-code 3 convention).
- [ ] Probe ABC unchanged → `tests/unit/test_probe_contract.py` snapshot test passes (Phase 0).
- [ ] CI green on `main` including the new adversarial fixture suite.
- [ ] Coverage floor 90/80 on `src/codegenie/` (carve-outs for `deployment.py`, `ci.py` at 85/75 declared in `pyproject.toml`).
- [ ] `fence` CI job continues to assert dependency closure has no LLM SDK.
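The cache-hit criterion can be illustrated with a toy model of the three-state seam. This is a sketch only: the real `ProbeExecution` type, the coordinator, and the content-addressed cache are defined in Phase 0 and are not shown in this section, so `run_probe`, the plain `dict` cache, and the enum values here are hypothetical stand-ins.

```python
from enum import Enum
from typing import Callable, Optional


class ProbeExecution(Enum):
    # Member names follow the design's {Ran, CacheHit, Skipped}; values are illustrative.
    Ran = "ran"
    CacheHit = "cache_hit"
    Skipped = "skipped"


def run_probe(
    cache: dict,
    key: str,
    probe: Callable[[], dict],
    applies: bool = True,
) -> tuple[ProbeExecution, Optional[dict]]:
    """Return (execution path, output), consulting the cache before running."""
    if not applies:
        return ProbeExecution.Skipped, None
    if key in cache:
        return ProbeExecution.CacheHit, cache[key]
    output = probe()
    cache[key] = output
    return ProbeExecution.Ran, output
```

A test in the spirit of `test_cache_hit_on_real_repo.py` would run the gather twice against the same inputs and assert that the second pass reports `CacheHit` for every probe while the probe body is never re-entered.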
Load-bearing commitments check
| Commitment | How honored |
|---|---|
| §2.1 No LLM in gather | Phase 0 fence CI job extended to cover Phase 1 deps; no LLM SDK added; pyarn is a yarn.lock parser, not an LLM SDK. |
| §2.2 Facts, not judgments | Probes emit counts, paths, presence flags, version strings, dependency lists. No safe_for_distroless, no production_ready, no ci_appropriate. The _ProbeOutputValidator's recursive JSONValue makes judgment-shaped types structurally unrepresentable. Constraining warning IDs to a pattern prevents prose-judgment smuggling. |
| §2.3 Honest confidence | Every probe emits confidence ∈ {high, medium, low} and an explicit warnings list with structured IDs. Multi-lockfile, missing CI, partial Jenkinsfile, and Terraform-paths-only cases all downgrade confidence and emit a structured warning ID. |
| §2.4 Determinism over probabilism | Parsers are stdlib json / yaml.CSafeLoader / hand-rolled deterministic line-scanners. No probabilistic classifier. No LLM. |
| §2.5 Extension by addition | Phase 1 adds new files under src/codegenie/probes/, src/codegenie/schema/probes/, src/codegenie/catalogs/, src/codegenie/parsers/. The three in-place edits (registry imports, LanguageDetection extension Phase 0 explicitly deferred, one ALLOWED_BINARIES entry) are each ADR-gated. |
| §2.6 Org uniqueness as data | native_modules.yaml and ci_providers.yaml are YAML data; adding entries is a YAML PR. Catalog self-schema validates structure. |
| §2.7 Progressive disclosure | repo-context.yaml references raw artifacts at .codegenie/context/raw/<probe>.json; doesn't inline lockfile content. |
| §2.8 Humans always merge | N/A in Phase 1 (no PRs opened). |
| §2.9 Cost observable + bounded | Phase 0 audit anchor extended with per-probe wall-clock for Phase 1 probes. Phase 13's cost ledger reads this same record. |
Roadmap coherence check
- Prior phases this depends on:
    - Phase 0 (Bullet tracer + project foundations): probe contract, async coordinator, content-addressed cache, layered JSON Schema, two-pass sanitizer, subprocess allowlist, audit anchor, `ProbeExecution ∈ {Ran, CacheHit, Skipped}` seam, BLAKE3 + SHA-256 hashing, structlog, mypy --strict, ruff, 85/75 coverage floor.
- What this establishes for later phases:
    - Phase 2 (Layers B–G): `IndexHealthProbe` reads Phase 1's `confidence` outputs and surfaces silent-staleness vectors (the catalog-gap risk especially). The `safe_parse` module is reusable for Phase 2's `semgrep` output parsing and the SCIP index health probe.
    - Phase 3 (deterministic recipe path): consumes Phase 1's `manifests` slice + native module catalog directly. The catalog-versioning story unblocks Phase 7.
    - Phase 7 (Chainguard distroless migration): the native module catalog from Phase 1 is the primary input. Phase 7's regression precondition: every Phase 1 native module appears in the distroless system-deps lookup; Phase 7's integration tests exercise the catalog.
    - Phase 8 (Hot views): the four Phase 8 slices (`available_skills`, `entrypoint`, `risk_flags`, `confidence_summary`) project from `repo-context.yaml` at Phase 8 time, not at Phase 1 time. No `views.json` shipped here.
    - Phase 14 (Continuous gather): Phase 0's `ProbeExecution` seam (preserved in Phase 1) + Phase 1's per-probe sub-schemas + the audit record's per-probe execution path are exactly what Phase 14's incremental gather needs.
- New ADRs implied (to be written under `docs/phases/01-context-gather-layer-a-node/ADRs/`):
    - `0001-add-node-to-allowed-binaries.md`: adds `"node"` to `exec.ALLOWED_BINARIES`; documents threat (`$PATH` shim) and mitigation (env strip + timeout + display-only).
    - `0002-parsed-manifest-memo-on-probe-context.md`: extends `ProbeContext` with optional `parsed_manifest` callable; documents non-persistence and per-gather scope; commits to discarding at gather end.
    - `0003-yarn-lock-parser-choice.md`: decision rule for `pyarn` adoption at land-time; documents the < 18-month-maintenance heuristic + the hand-rolled fallback shape.
    - `0004-per-probe-subschema-additional-properties-false.md`: extends Phase 0's layered `additionalProperties` policy with the per-probe sub-schema strictness rule.
    - `0005-coverage-carve-outs-deployment-ci.md`: codifies 85/75 floor for `deployment.py` and `ci.py`; ADR-amendment required for further carve-outs.
    - `0006-native-module-catalog-versioning.md`: `catalog_version` field at file top participates in cache key; `catalog_entry_version` per entry tracks last review.
    - `0007-warnings-id-pattern.md`: `warnings[]` entries must match `^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$`; structural defense against prose-judgment smuggling.
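To illustrate what "participates in cache key" means for ADR-0006, here is a minimal sketch of a content-addressed key that folds in the catalog version. The key layout and the `cache_key` function are assumptions made for illustration. Phase 0 names BLAKE3 + SHA-256 for hashing, and this sketch uses only the standard library's SHA-256.

```python
import hashlib
import json


def cache_key(probe_name: str, input_digests: dict[str, str], catalog_version: str) -> str:
    """Derive a key that changes whenever the probe, its inputs, or the catalog changes."""
    payload = json.dumps(
        {
            "probe": probe_name,
            "inputs": input_digests,  # path -> content digest (hypothetical shape)
            "catalog_version": catalog_version,
        },
        sort_keys=True,  # canonical ordering keeps the key stable across dict orderings
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this scheme, bumping `catalog_version` in the catalog YAML invalidates cached output for the probes that read it, even when the repository's own files are unchanged.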
Open questions deferred to implementation
- `pyarn` adoption rule at land-time. Per ADR-0003, the implementer confirms `pyarn`'s maintenance status and test-fixture conformance. If unmaintained, ship the ~100-line hand-rolled `yarn.lock` parser as the default.
- Per-probe sub-schema versioning policy. Phase 1 lands v1 sub-schemas. The release-versioning policy for sub-schemas (how a forward-compatible field lands without breaking cached output) is deferred to Phase 2 when the first cross-phase sub-schema change is anticipated.
- `packageManager` field handling. `package.json#packageManager` (e.g., `"pnpm@8.15.0"`) sometimes disagrees with the lockfile. Implementation: prefer the lockfile; emit `warnings: ["package_manager.declaration_lockfile_disagree"]` on mismatch.
- GitHub Actions parser depth: reusable workflows. Phase 1 parses top-level workflows; `uses:` references to reusable workflows are recorded as paths only. Phase 2 may deepen if a consumer demands.
- Helm template rendering stays a Planner-time decision in Phase 3+ (no rendering in Phase 1). Documented in deployment sub-schema.
- Multi-environment Helm `image_reference` consumer contract. The sub-schema declares both `image_reference: nullable` and `environments: list`. Phase 3+ consumers must handle the list shape; documented as an open consumer-contract concern.
- Typed warning enum. Deferred to Phase 2 (`IndexHealthProbe`-driven). Phase 1 ships the `warnings[]` pattern constraint as the minimum structural defense.
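The `packageManager` rule above (prefer the lockfile, warn on mismatch) might look like the following sketch. The function name, its return shape, and the fallback to the declared manager when no lockfile exists are assumptions; only the warning ID and the lockfile-wins rule come from the text. Multi-lockfile repositories are out of scope here.

```python
from typing import Optional

# Conventional lockfile names for the three common Node.js package managers.
LOCKFILE_TO_MANAGER = {
    "package-lock.json": "npm",
    "yarn.lock": "yarn",
    "pnpm-lock.yaml": "pnpm",
}

DISAGREE = "package_manager.declaration_lockfile_disagree"


def resolve_package_manager(
    declared: Optional[str], lockfile_name: Optional[str]
) -> tuple[Optional[str], list[str]]:
    """Return (manager, warnings), preferring the lockfile over the declaration."""
    # "pnpm@8.15.0" -> "pnpm"; a missing field stays None.
    declared_name = declared.split("@", 1)[0] if declared else None
    from_lockfile = LOCKFILE_TO_MANAGER.get(lockfile_name) if lockfile_name else None

    if from_lockfile is None:
        # Assumption: with no recognized lockfile, fall back to the declaration.
        return declared_name, []
    if declared_name is not None and declared_name != from_lockfile:
        return from_lockfile, [DISAGREE]
    return from_lockfile, []
```

For example, a repository declaring `"pnpm@8.15.0"` but shipping `yarn.lock` resolves to `yarn` and carries the disagreement warning, leaving the decision about what to do with that fact to downstream consumers.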