Phase 00 - Bullet tracer + project foundations: Performance-first design
Lens: Performance (throughput, latency, token economy, footprint)
Designed by: Performance-first design subagent
Date: 2026-05-11
Lens summary
I optimized Phase 0 for one thing: lock in the cost shape of every later phase. Phase 0 ships almost no user-visible work — one trivial probe, one stub YAML, a CI green light — but every choice here compounds. A 200 ms cold-start CLI is dead weight in a portfolio loop that runs codegenie gather against 1,000 repos. A coordinator that allocates a thread per probe blows the worker memory budget the moment Phase 2 lands 30 probes. A cache layout that does the wrong thing under content addressing makes Phase 1's exit criterion (cache hits on second run) impossible to verify cheaply. I deliberately accept extra implementation complexity in three places — the cache, the coordinator, and the CLI cold path — in exchange for the fastest defensible baseline. I deprioritize: pretty error messages, exhaustive type coverage on internal helpers, and onboarding ergonomics for contributors who aren't writing probes. CI walltime is treated as a first-class production metric, not a chore.
Goals (concrete, measurable)
These are the targets the Phase 0 scaffold must support by construction, not the targets Phase 0 itself reaches (Phase 0 only runs one probe). The point is to make later phases free of refactors.
- CLI cold start (`codegenie --help`) p95: ≤ 80 ms on M-series Mac, ≤ 150 ms on a GitHub Actions Linux runner. Anything north of 200 ms is a Phase 1 tax: a portfolio of 1,000 repos pays it 1,000×.
- Empty-dir `codegenie gather` (LanguageDetection only) p95: ≤ 350 ms wall-clock; ≤ 250 ms steady-state after warm filesystem cache.
- Coordinator overhead (dispatch + merge + write, excluding probe `run()` time): ≤ 25 ms for 1 probe; the math must scale to ≤ 60 ms for 30 probes in Phase 2.
- Cache hit dispatch cost (decide-to-skip path): ≤ 2 ms p95 per probe, locked in now so Phase 1's "second-run zero re-execution" target is trivially met.
- Worker memory ceiling (RSS) for a Phase 0 gather: ≤ 90 MB. This leaves headroom of ~420 MB (512 MB minus 90 MB) for Phase 2's heavy probes inside a 512 MB worker budget; the worker count under a fixed memory budget is the constraint that will eventually cap portfolio throughput.
- CI walltime (lint + type + test on PR) p95: ≤ 90 s end-to-end. Developer iteration loop is itself a throughput input — slow CI is a hidden tax on every later phase.
- $/PR: N/A this phase. Phase 0 ships no LLM call. The token-economy bet is structural: every cache directory, every artifact path, every probe-output schema choice in Phase 0 is a decision the planner will hit in Phase 8 hot views. Get the layout wrong now and Redis denormalization costs more later.
Architecture
                   codegenie CLI (click)
                             │
      ┌──────────────────────┼───────────────────────┐
      │                      │                       │
  fast-path        slow-path probe code       tool-readiness
(--help, -V)          (lazy-imported)         check (cached)
                             │                       │
                             └────────► Coordinator (asyncio) ◄─── ProbeRegistry
                                                  │
                    ┌─────────────────────────────┼─────────────────────────────┐
                    │                             │                             │
               CacheLookup          RunProbe (LanguageDetection)           ResultMerge
              (mmap'd jsonl             (single asyncio.Task,            (in-place dict
               index, xxh3            bounded by Semaphore(N))          build, no copies)
              content key)
                    │                                                           │
                    ▼                                                           ▼
            .codegenie/cache/                                          .codegenie/context/

.codegenie/cache/
├── index.jsonl        (append-only, one record per cache entry, mmap-scanned at startup)
└── blobs/
    └── <2-char shard>/<content-hash>.json

.codegenie/context/
├── repo-context.yaml  (atomic rename)
└── raw/
    ├── language_detection.json
    └── ...
Data flow is strictly forward. Probe → ProbeOutput → coordinator merge → atomic YAML write. No probe-to-probe RPC. No central in-memory model object that grows during the run. The merge target is the YAML document itself, built as a flat dict and dumped once at the end with yaml.CSafeDumper, the same dumper the output writer specifies.
Components
CLI entry (codegenie/cli.py)
- Purpose: Accept user input, dispatch to commands, exit fast.
- Interface: stdin/argv → exit code + stdout/stderr.
- Internal design:
  - No heavyweight imports at module top. `click`, `pyyaml`, `jsonschema`, and `pydantic` are imported lazily inside the command function bodies. The bare `--help` and `--version` paths import nothing past the standard library. This is the single biggest cold-start lever: pyyaml's C extension alone adds ~25 ms; jsonschema's referencing graph adds another ~20 ms; pydantic v2 adds ~40 ms even when unused. Bury them.
  - `click` over `argparse`: stipulated by the roadmap. Configure with `context_settings={"max_content_width": 100}` and `auto_envvar_prefix="CODEGENIE"`; env vars are how CI overrides config without arg-plumbing.
  - Tool readiness check is cached to `~/.codegenie/.tool-cache.json` keyed by `(tool, $PATH-hash, mtime)`. Re-running `which scip-typescript` on every invocation is ~5 ms per tool × ~12 tools = 60 ms of wasted cold-start budget in Phase 2. Cache for 24h with version pinning on the tool binary's mtime.
  - Single `python -m codegenie` and `codegenie` entry point. No subprocess re-exec. `pyproject.toml` declares `[project.scripts] codegenie = "codegenie.cli:main"`.
- Tradeoffs accepted:
  - Lazy imports make stack traces harder to read on import-time failures (the import happens deep in the call stack). Worth it: there are no import-time failures in steady state, and we eat the cost on cold paths most users never hit.
  - The tool-readiness cache can go stale if the user `brew upgrade`s a tool. Mitigated by 24h TTL and a `codegenie gather --refresh-tools` flag.
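The lazy-import rule above can be sketched as follows. This is a minimal illustration, not the actual `cli.py`: it assumes a hand-rolled fast path in front of the real parser, uses `argparse` as a standard-library stand-in for click so the sketch runs without dependencies, and `VERSION`, `USAGE`, and `_gather` are hypothetical names.

```python
from __future__ import annotations

import sys

VERSION = "0.0.0"  # hypothetical placeholder version string
USAGE = "usage: codegenie [--help] [--version] gather PATH"  # hypothetical text


def main(argv: list[str] | None = None) -> int:
    """Answer --help / --version before importing anything heavy."""
    args = sys.argv[1:] if argv is None else argv
    if not args or args[0] in ("-h", "--help"):
        print(USAGE)
        return 0
    if args[0] in ("-V", "--version"):
        print(VERSION)
        return 0
    return _gather(args)


def _gather(args: list[str]) -> int:
    # Heavy imports live inside the command body, so the fast path above
    # never pays for them. argparse is only a stand-in for click here.
    import argparse

    parser = argparse.ArgumentParser(prog="codegenie")
    sub = parser.add_subparsers(dest="command", required=True)
    gather = sub.add_parser("gather")
    gather.add_argument("path")
    ns = parser.parse_args(args)
    print(f"would gather {ns.path}")
    return 0
```

The design point is only the shape: whatever the fast path needs must be importable without touching the slow path's dependencies.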
Probe registry (codegenie/probes/registry.py)
- Purpose: Collect probe classes via `@register_probe` decorator; expose `all_probes()` and `for_task(task, languages)` filters.
- Interface: Module-level mutable list of probe classes; pure-Python decorator.
- Internal design:
  - Decorator is a one-liner; no metaclass, no plugin discovery, no entry-point scan. Entry-point scanning via `importlib.metadata` would add 30–80 ms to every CLI invocation. Probes are imported by an explicit `codegenie.probes` package `__init__.py` that lists them: flat, ordered, debuggable, fast.
  - Filtering (`for_task`) returns a tuple, not a list, and is cached by `functools.lru_cache(maxsize=32)` on `(task, frozenset(languages))`. Phase 2 has 30+ probes; recomputing the filter per invocation is wasteful when 99% of gathers in Phase 8+ will be incremental re-gathers on the same `(task, languages)` shape.
- Tradeoffs accepted:
  - Probes added by external packages later (Phase 14+) won't auto-discover. When that need arrives, add entry-point scan behind a flag and pay the cost only when the flag is set. Not today.
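A minimal sketch of the registry described above, under stated assumptions: the `tasks` and `languages` class attributes and the `"*"` wildcard are hypothetical (the probe contract in `localv2.md §4` is not reproduced here), and the double-registration check follows the test plan rather than the "one-liner" wording.

```python
from __future__ import annotations

from functools import lru_cache

_PROBES: list[type] = []  # module-level registry, kept in registration order


def register_probe(cls: type) -> type:
    """Class decorator: append the probe class to the registry."""
    if cls in _PROBES:
        raise ValueError(f"{cls.__name__} is already registered")
    _PROBES.append(cls)
    for_task.cache_clear()  # a new probe invalidates cached filter results
    return cls


def all_probes() -> tuple[type, ...]:
    return tuple(_PROBES)


def _matches(declared: frozenset[str], wanted: frozenset[str]) -> bool:
    # "*" is a hypothetical wildcard meaning "applies to everything".
    return "*" in declared or bool(declared & wanted)


@lru_cache(maxsize=32)
def for_task(task: str, languages: frozenset[str]) -> tuple[type, ...]:
    """Return the probes that apply; arguments must be hashable for the cache."""
    return tuple(
        p
        for p in _PROBES
        if _matches(p.tasks, frozenset({task})) and _matches(p.languages, languages)
    )


@register_probe
class LanguageDetectionProbe:
    tasks = frozenset({"*"})      # hypothetical attribute
    languages = frozenset({"*"})  # hypothetical attribute
```

Because `lru_cache` hashes its arguments, callers have to pass a `frozenset` of languages; a plain `set` would raise `TypeError`.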
Coordinator (codegenie/coordinator.py)
- Purpose: Dispatch probes concurrently within a memory and timeout budget; isolate failures; emit `ProbeOutput`s in order of completion.
- Interface:
  - Input: `RepoSnapshot`, `Task`, `list[type[Probe]]`, `Config`.
  - Output: `dict[probe_name, ProbeOutput]`, plus a structured run-log JSON.
  - Errors: per-probe; one probe's exception does not poison the dispatch.
- Internal design:
  - `asyncio.Semaphore(N)` where `N = min(os.cpu_count(), config.max_concurrent_probes, 8)` by default. Probes are I/O-bound (filesystem walks, subprocess calls); bounded concurrency beats serial; unbounded concurrency blows the file-descriptor budget on Mac (default `ulimit -n` is 256) the moment Phase 2's runtime trace probes spin up multiple `strace`s. 8 is empirically the elbow for a 1-CPU GitHub runner with `package.json` parsing dominating.
  - `asyncio.wait_for` per-probe timeout from the probe's `timeout_seconds` declaration. Hard kill at 1.5× the declared timeout via `asyncio.create_task` + `cancel()` + `await` with a 100 ms grace.
  - Probe `run()` failures are caught into `ProbeOutput(errors=[...], confidence="low")` and the coordinator advances. No probe ever raises out of the coordinator. CLAUDE.md "Fail loud" is satisfied by writing the error to the output, not by crashing; crashing breaks the portfolio loop.
  - No thread pool. Probes that need to shell out use `asyncio.create_subprocess_exec`, not `subprocess.run`. The coordinator stays in one thread, one event loop, one reactor. Thread pools add ~200 KB RSS per worker and a context-switch cost that adds up across 30 probes.
  - Output merge is incremental and in-place: as each probe completes, its `schema_slice` is shallow-merged into the running result dict. No deep-copy. No "merge function" abstraction. This is one line: `result.update(probe_output.schema_slice)`.
- Tradeoffs accepted:
  - Probes can't share work mid-run (e.g., two probes both reading `package.json`). This is fine for Phase 0 (one probe) and acceptable for Phase 1 because each probe's filesystem reads hit OS page cache for free. If it ever bites (Phase 8), add a `SharedReadCache` then.
  - Single event loop means a CPU-bound probe (parsing huge SCIP indexes in Phase 2) blocks the others. Workaround: that probe's `run()` is the one place we add `asyncio.to_thread()`, but only when measured to matter. Don't prematurely thread.
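The dispatch loop described above might look roughly like this. It is a sketch under assumptions: `ProbeOutput` is reduced to three fields, the probe objects expose hypothetical `name`, `timeout_seconds`, and async `run()` members, and the 1.5× hard-kill stage is omitted (here `asyncio.wait_for` cancels the probe at the declared timeout).

```python
from __future__ import annotations

import asyncio
import os
from dataclasses import dataclass, field


@dataclass
class ProbeOutput:  # simplified stand-in for the real contract
    schema_slice: dict = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)
    confidence: str = "high"


def default_concurrency(configured: int) -> int:
    # os.cpu_count() may return None, so fall back to 1.
    return max(1, min(os.cpu_count() or 1, configured, 8))


async def _run_one(probe, sem: asyncio.Semaphore) -> tuple[str, ProbeOutput]:
    async with sem:  # bounded concurrency
        try:
            out = await asyncio.wait_for(probe.run(), timeout=probe.timeout_seconds)
        except asyncio.TimeoutError:
            out = ProbeOutput(
                errors=[f"timed out after {probe.timeout_seconds}s"], confidence="low"
            )
        except Exception as exc:  # a failing probe never escapes the coordinator
            out = ProbeOutput(errors=[repr(exc)], confidence="low")
    return probe.name, out


async def gather(probes, max_concurrent: int = 8):
    sem = asyncio.Semaphore(max_concurrent)
    tasks = [asyncio.create_task(_run_one(p, sem)) for p in probes]
    outputs: dict[str, ProbeOutput] = {}
    result: dict = {}
    for finished in asyncio.as_completed(tasks):
        name, out = await finished
        outputs[name] = out
        result.update(out.schema_slice)  # shallow, in-place merge
    return outputs, result


class OkProbe:
    name, timeout_seconds = "ok", 1.0

    async def run(self) -> ProbeOutput:
        return ProbeOutput(schema_slice={"language_stack": {"primary": "python"}})


class BoomProbe:
    name, timeout_seconds = "boom", 1.0

    async def run(self) -> ProbeOutput:
        raise RuntimeError("boom")


class SlowProbe:
    name, timeout_seconds = "slow", 0.01

    async def run(self) -> ProbeOutput:
        await asyncio.sleep(1)
        return ProbeOutput()
```

Running `asyncio.run(gather([OkProbe(), BoomProbe(), SlowProbe()]))` yields one successful slice and two low-confidence outputs carrying their errors, without any exception reaching the caller.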
Cache layer (codegenie/cache.py)
- Purpose: Content-addressed, durable, fast-lookup cache of per-probe outputs. Phase 0 stores the LanguageDetection output; everything bigger lifts in Phase 1.
- Interface:
  - `get(cache_key) -> ProbeOutput | None`
  - `put(cache_key, ProbeOutput) -> None`
  - `key_for(probe, snapshot, task) -> str`
- Internal design:
  - Hash function: `xxh3_128` (via `xxhash` package), not SHA-256. Cache keys are not adversarial; they're identity, not security. xxh3 hashes a 1 MB lockfile in ~30 μs vs SHA-256's ~3 ms. Across a Phase 2 gather with 30 probes hashing ~5 MB each of declared inputs (150 MB total), that's ~4.5 ms with xxh3 vs ~450 ms with SHA-256. (The local v2 spec uses `sha256(probe_name | probe_version | inputs_hash)` for the final key. Fine: keep SHA-256 of the inputs_hash hex string if anyone wants a stable identity. But the bulk content hashing must be xxh3.) This contradicts the implicit reading of `localv2.md §8`'s SHA-256 mention; flagging for the synthesizer.
  - Storage layout: `.codegenie/cache/index.jsonl` (append-only) + `.codegenie/cache/blobs/<2-char-shard>/<full-hash>.json`. The JSONL index is mmap'd at startup and scanned linearly; for the cache sizes we'll see (≤ 10k entries per repo), that's faster than any tree-based store. Sharding the blob directory by the first 2 hex chars of the hash spreads blobs over 256 subdirectories, so each directory holds roughly 1/256 of the entries (about 40 at the 10k-entry ceiling), well under the 10k threshold where `getdents()` slows on ext4/APFS.
  - No SQLite for the index. SQLite would add ~3 ms startup cost (open + WAL replay) and offers no benefit at this scale. Phase 6 introduces SQLite as the LangGraph checkpointer; that's the right place for it. Phase 0's cache index is text the user can `cat`.
  - Atomic writes: blob written to `<dest>.tmp`, `fsync`, `rename`. Index appended with `O_APPEND`, one `write()` per record, with records kept ≤ 4096 bytes. Note that the `PIPE_BUF` atomicity guarantee formally covers pipes and FIFOs, not regular files, so the small single-write record is a practical safeguard rather than a POSIX guarantee; discarding a truncated last record on read covers the remaining cases.
  - TTL enforcement is lazy. Don't scan-and-prune on every run; that's O(N) startup cost. Instead, lookups check the entry's `created_at` and treat expired entries as misses. A separate `codegenie cache gc` command compacts.
- Tradeoffs accepted:
  - Append-only JSONL grows forever absent `gc`. Acceptable for Phase 0 (one entry per gather). Document the `gc` command exists.
  - xxh3 is non-cryptographic. If an attacker can write to `.codegenie/cache/blobs/`, they can poison probe outputs. The cache lives inside the analyzed repo; if the attacker has write access there, the game is already lost. Flag for the security-lens designer.
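The storage layout above can be sketched in a few dozen lines. Assumptions, all hypothetical: the class name `ProbeCache`, a plain-dict payload instead of `ProbeOutput`, a 24-hour default TTL (the design does not state the probe-cache TTL), a simplified `key_for(probe_name, probe_version, inputs_hash)` signature, and a plain linear read of the index rather than an mmap scan.

```python
from __future__ import annotations

import hashlib
import json
import os
import time
from pathlib import Path


def key_for(probe_name: str, probe_version: str, inputs_hash: str) -> str:
    # Final key shape from the spec: sha256(probe_name | probe_version | inputs_hash).
    raw = f"{probe_name}|{probe_version}|{inputs_hash}".encode()
    return hashlib.sha256(raw).hexdigest()


class ProbeCache:
    def __init__(self, root: Path, ttl_seconds: float = 86_400.0) -> None:
        self.index = root / "index.jsonl"
        self.blobs = root / "blobs"
        self.ttl = ttl_seconds  # hypothetical default
        self.blobs.mkdir(parents=True, exist_ok=True)

    def _blob_path(self, key: str) -> Path:
        return self.blobs / key[:2] / f"{key}.json"  # 2-char shard directory

    def put(self, key: str, output: dict) -> None:
        dest = self._blob_path(key)
        dest.parent.mkdir(exist_ok=True)
        tmp = dest.with_name(dest.name + ".tmp")
        with open(tmp, "wb") as fh:
            fh.write(json.dumps(output, sort_keys=True).encode())
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, dest)  # atomic publish of the blob
        record = json.dumps({"key": key, "created_at": time.time()}) + "\n"
        fd = os.open(self.index, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
        try:
            os.write(fd, record.encode())  # one small write per record
        finally:
            os.close(fd)

    def get(self, key: str) -> dict | None:
        created_at = None
        if self.index.exists():
            for line in self.index.read_text().splitlines():
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip a damaged record
                if rec.get("key") == key:
                    created_at = rec["created_at"]  # latest entry wins
        if created_at is None or time.time() - created_at > self.ttl:
            return None  # miss, or lazily expired
        try:
            return json.loads(self._blob_path(key).read_bytes())
        except FileNotFoundError:
            return None  # dangling index entry is treated as a miss
```

Bumping the probe version changes the key, so old entries simply stop being looked up; nothing has to be deleted at that point.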
Probe contract (codegenie/probes/base.py)
- Purpose: The ABC. Verbatim from `localv2.md §4` per CLAUDE.md.
- Interface: As specified.
- Internal design (performance additions inside the contract):
  - Probes declare a class-level `version: str` constant. The cache key includes this. Bumping `version = "1.1.0"` invalidates that probe's cache entries with zero filesystem operations.
  - `declared_inputs` is normalized at registration time (not at every cache-key compute) into a sorted tuple of compiled `pathlib.PurePath` patterns. The cost is paid once, on import; every gather benefits.
  - `Probe.cache_key()` default implementation: read `declared_inputs`, hash with xxh3, return hex digest. Critical: each input file is hashed via `mmap` + xxh3 streaming, not `Path.read_bytes()` → `xxh3(...)`. For a 5 MB lockfile, the difference is 30 ms vs 5 ms, and the lockfile gets hashed on every gather.
- Tradeoffs accepted:
  - mmap on Windows behaves differently. Local POC is macOS/Linux per CLAUDE.md; ignore Windows.
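The mmap-based hashing in the default `cache_key()` could be sketched like this. Assumptions: `hashlib.blake2b` stands in for xxh3 so the sketch runs on the standard library alone (swapping in the `xxhash` package is the intended design), and `hash_file` / `inputs_hash` are hypothetical helper names.

```python
from __future__ import annotations

import hashlib
import mmap
import os
from pathlib import Path


def hash_file(path: Path) -> str:
    """Hash a file through a read-only memory map instead of read_bytes()."""
    h = hashlib.blake2b(digest_size=16)  # stand-in for xxh3_128
    with open(path, "rb") as fh:
        if os.fstat(fh.fileno()).st_size == 0:
            return h.hexdigest()  # mmap cannot map an empty file
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            h.update(view)  # hashes straight from the mapped pages
    return h.hexdigest()


def inputs_hash(root: Path, declared_inputs: tuple[str, ...]) -> str:
    """Combine per-file hashes for every file matching the declared globs."""
    h = hashlib.blake2b(digest_size=16)
    for pattern in sorted(declared_inputs):
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                # Include the relative path so renames change the key.
                h.update(path.relative_to(root).as_posix().encode())
                h.update(hash_file(path).encode())
    return h.hexdigest()
```

Sorting both the patterns and the matched paths keeps the digest independent of filesystem enumeration order.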
LanguageDetection probe (codegenie/probes/language_detection.py)
- Purpose: Walk the tree, count extensions, emit the language stack slice.
- Interface: Standard `Probe.run`.
- Internal design:
  - Use `os.scandir` recursively, not `pathlib.Path.rglob`. `scandir` returns `DirEntry` objects that carry the file-type information from the directory read, so `is_dir()` / `is_file()` usually need no extra system call; `rglob` calls `os.lstat` separately for every entry. On a 50k-file repo, the difference is ~400 ms (`rglob`) vs ~80 ms (`scandir`).
  - Exclude well-known noise directories at the directory level, before descending: `node_modules`, `.git`, `dist`, `build`, `coverage`, `.next`, `.turbo`, `target`. List is a `frozenset` in module scope.
  - For Phase 0, no `tree-sitter` invocation. `localv2.md` A1 calls for tree-sitter for ambiguous cases. Phase 0 ships extension-counting only; tree-sitter lifts in Phase 1 when the actual A1 probe replaces this stub. Phase 0's job is to prove the harness, not deliver A1.
- Tradeoffs accepted:
  - Misclassifies `.h` files without C/C++ context, etc. Out of scope.
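The walk described above can be sketched as a short function. It uses an explicit stack rather than recursion (an implementation choice of this sketch, not something the design prescribes), and `count_extensions` is a hypothetical name.

```python
from __future__ import annotations

import os
from collections import Counter

# Noise directories pruned before descending, as listed in the design.
EXCLUDED_DIRS = frozenset(
    {"node_modules", ".git", "dist", "build", "coverage", ".next", ".turbo", "target"}
)


def count_extensions(root: str) -> Counter:
    """Count file extensions under root, skipping excluded directories."""
    counts: Counter = Counter()
    stack = [root]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                # DirEntry answers is_dir/is_file from the directory read
                # where the platform provides the file type.
                if entry.is_dir(follow_symlinks=False):
                    if entry.name not in EXCLUDED_DIRS:
                        stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    ext = os.path.splitext(entry.name)[1].lower()
                    if ext:
                        counts[ext] += 1
    return counts
```

Pruning happens when a directory entry is first seen, so nothing below `node_modules` is ever listed.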
Output writer (codegenie/output.py)
- Purpose: Build `repo-context.yaml` and the raw artifacts directory; atomically replace prior outputs.
- Interface: `write(repo_context: dict, raw_artifacts: list[tuple[str, bytes]], output_dir: Path)`.
- Internal design:
  - `yaml.CSafeDumper` (libyaml C extension), not the pure-Python dumper. Phase 2's `repo-context.yaml` will be a few hundred KB; C dumper is 10× faster.
  - Write `repo-context.yaml.tmp` then `os.replace` for atomic publish. Phase 14 will have webhook readers tailing this file; they must never see a half-written document.
  - Raw artifacts are written first, then the index YAML last. If the writer crashes mid-write, downstream readers see either the prior consistent state or no `repo-context.yaml` (and ignore stale raw files by checking the YAML's manifest).
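The write ordering above (raw artifacts first, YAML published last via `os.replace`) can be sketched as below. One loud assumption: `json.dumps` stands in for `yaml.dump(..., Dumper=yaml.CSafeDumper)` so the sketch has no third-party dependency; YAML 1.2 is designed to accept JSON documents, but the real writer would emit block-style YAML.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


def write(
    repo_context: dict,
    raw_artifacts: list[tuple[str, bytes]],
    output_dir: Path,
) -> None:
    raw_dir = output_dir / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)

    # 1. Raw artifacts first: readers ignore them until the YAML names them.
    for name, payload in raw_artifacts:
        (raw_dir / name).write_bytes(payload)

    # 2. Serialize to a temp file in the same directory (same filesystem).
    final = output_dir / "repo-context.yaml"
    tmp = output_dir / "repo-context.yaml.tmp"
    # json.dumps is a stand-in for the C YAML dumper in this sketch.
    tmp.write_text(json.dumps(repo_context, indent=2, sort_keys=True) + "\n")

    # 3. Atomic publish: readers see the old document or the new one.
    os.replace(tmp, final)
```

Keeping the temp file next to the final path matters because a rename is only atomic within a single filesystem.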
Schema validation (codegenie/schema.py)
- Purpose: Validate the produced `repo-context.yaml` against a JSON Schema before declaring the gather complete.
- Interface: `validate(repo_context: dict) -> None` (raises on invalid).
- Internal design:
  - Validator compiled once, module-level, via `jsonschema.Draft202012Validator` and frozen behind an `lru_cache`. Compiling the validator is ~30 ms; doing it per gather in a portfolio of 1,000 is about 30 seconds of pure waste.
  - Schema lives in `codegenie/schemas/repo-context.v1.json`, committed; loaded at import time as a frozen dict.
  - `fastjsonschema` over stock `jsonschema` for the validation hot path: ~10× faster on large documents. Stock `jsonschema` is fine for development-time schema authoring; `fastjsonschema` runs the validation. Two libraries, one schema source: acceptable complexity for the speed win. One caveat to resolve: `fastjsonschema`'s documentation lists support for drafts 04, 06, and 07, while the validator named above is `Draft202012Validator`, so the shared schema has to stay within the feature set both libraries implement.
Project conventions (pyproject.toml, mypy, ruff, pre-commit, CI)
- Purpose: Ship the scaffolding the roadmap commits to: PEP 621 metadata, strict mypy, ruff lint+format, pytest+cov, pre-commit, GitHub Actions CI.
- Internal design:
  - `ruff` for both lint and format. Single tool replaces `black`, `isort`, `flake8`, `pylint`. ruff format is ~30× faster than black on the same tree. CI `ruff check . && ruff format --check .` is ~200 ms.
  - `mypy --strict` against the `codegenie/` package only, not tests. Tests get `mypy --strict --disable-error-code=union-attr` or simply `--ignore-missing-imports`; type-checking tests yields diminishing returns and inflates CI time.
  - `mypy` daemon (`dmypy`) is not used in CI (cold start dominates anyway) but the dev `pre-commit` config invokes `dmypy run` for ~3 s incremental type-checks instead of `mypy`'s ~15 s cold check.
  - `pytest-xdist` from day one, even with five tests. `pytest -n auto` parallelizes; we want the muscle memory and the CI config in place before Phase 1 multiplies the test count by 30.
  - `pytest --import-mode=importlib` (not the legacy mode). Faster collection, no `sys.path` rewriting, conftest-clean.
  - GitHub Actions:
    - One workflow file, three jobs running in parallel: `lint`, `typecheck`, `test`.
    - `actions/setup-python` with `cache: pip` keyed off `pyproject.toml` hash.
    - `uv` instead of `pip` for install (`pip install uv` then `uv pip install -e ".[dev]"`). 5–20× faster than pip on cold cache; 2× faster on warm cache. CI install drops from ~25 s to ~3 s.
    - Coverage is computed but not enforced as a gate in Phase 0 (the bar would be meaningless with five tests). Phase 1 adds a coverage threshold once there's a real corpus.
  - `mkdocs-material` builds run in a separate CI job, only on changes to `docs/**` and `mkdocs.yml`. Path filtering keeps PR feedback time tight.
- Tradeoffs accepted:
  - `fastjsonschema` + `jsonschema` is two libraries. Acceptable.
  - `uv` is a young tool. Pin to a specific version; it's stable enough.
Data flow
A representative Phase 0 run, `codegenie gather /path/to/repo`:
1. CLI entry (0–5 ms). `codegenie` resolves to the entry point; `click` is imported lazily inside `cli.main`. Argument parsing completes. Subcommand dispatch to `gather`.
2. Tool-readiness check (1–3 ms cached, ~60 ms cold). Reads `~/.codegenie/.tool-cache.json`. For Phase 0, only `python` itself is required; the check is mostly a no-op. The cache infrastructure lands now because Phase 1 lights it up.
3. RepoSnapshot construction (5–15 ms). `git rev-parse HEAD` via subprocess; config loading via lazy-imported pyyaml. No probe-relevant filesystem traversal yet.
4. Probe registry filter (≤ 1 ms cached). `for_task("distroless_migration", frozenset({"unknown"}))` returns the tuple with `LanguageDetectionProbe` in it (the languages argument is a `frozenset` because the `lru_cache` needs hashable arguments).
5. Coordinator dispatch (≤ 5 ms overhead). `asyncio.Semaphore(8)` created; one `asyncio.Task` spawned for `LanguageDetectionProbe.run`.
6. Cache lookup (≤ 2 ms). mmap the index, scan for the probe's content-addressed key. Phase 0 cold-runs: miss. Phase 1 second-runs: hit, skip step 7.
7. Probe execution (50–200 ms for an empty/small dir, 200–1500 ms for a real repo). `os.scandir` walk; extension counting; `ProbeOutput` constructed.
8. Cache write (≤ 5 ms). Blob written; index appended.
9. Output merge (≤ 1 ms). Shallow dict update.
10. Schema validation (≤ 5 ms hot path via `fastjsonschema`).
11. Output write (≤ 10 ms). Raw artifacts directory written first; then `repo-context.yaml.tmp` written and `os.replace`d into place.
12. Exit.
Parallelism extraction: None visible in Phase 0 with one probe — but the coordinator code path is the same as Phase 2's 30-probe fan-out. Every microsecond of unnecessary serialization in the coordinator becomes a millisecond × 30 in Phase 2.
Cache consultations: Two — tool readiness, probe output. Both pre-mmap'd; no SQLite open, no JSON load of a giant manifest.
Serialization points: Three, all justified — the YAML write (the artifact contract), the cache index append (durability), the schema validation (the truth gate). Everything else is in-memory.
Failure modes & recovery
| Failure | Detected by | Recovery |
|---|---|---|
| Probe `run()` raises | Coordinator `try/except` around the awaited task | Probe gets `ProbeOutput(errors=[...], confidence="low")`; coordinator continues; CLI exit code is 0 if any probe succeeded, 2 if all failed |
| Probe exceeds `timeout_seconds` × 1.5 | `asyncio.wait_for` + hard cancel | Same as above; warning logged with elapsed time |
| Cache index corruption (truncated last record) | JSONL parse error on read | Last partial line is discarded; index treated as truth up to the last complete record. Append-only design makes this safe. |
| Cache blob missing for an index entry | `FileNotFoundError` at `get()` | Treat as miss; re-run probe; rewrite blob. Index entry is left dangling, swept by `cache gc`. |
| Schema validation fails on produced YAML | `fastjsonschema` raises | CLI exits with code 3 and a structured diff of the offending paths. The YAML is still written (with a `.invalid` suffix) so the developer can inspect. |
| `repo-context.yaml.tmp` rename interrupted | Filesystem atomicity (POSIX rename is atomic) | Either the new file is in place or the old one is; no half-state visible to readers. |
| `uv pip install` flakes in CI | GitHub Actions exit non-zero | Retry once via `nick-fields/retry@v3` with backoff; if second attempt fails, fail the job (likely a real problem). One retry, not three; three retries hide flakiness we want to see. |
| `mkdocs build` warns | `mkdocs build --strict` | Job fails. Strict mode is the gate the exit criterion implies. |
| `pytest-xdist` worker crash | xdist reports `internalerror` | Job fails; report the crashing worker's stderr. No retry; crashing test workers indicate a real bug. |
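The truncated-index recovery row can be made concrete with a small reader. This is a sketch with a hypothetical function name, `read_index`; it treats any bytes after the final newline as an incomplete record and stops at the first unparseable line, which matches "truth up to the last complete record".

```python
from __future__ import annotations

import json


def read_index(raw: bytes) -> list[dict]:
    """Parse an append-only JSONL index, tolerating a truncated tail."""
    records: list[dict] = []
    # Everything after the last b"\n" is either empty or a partial record,
    # so the final split element is always dropped.
    complete_lines = raw.split(b"\n")[:-1]
    for line in complete_lines:
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            break  # trust the index only up to the last good record
    return records
```

For example, an index whose writer crashed mid-append keeps its first two entries and silently loses only the partial third one.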
Resource & cost profile
Concrete numbers (order-of-magnitude, validated against benchmarks for similar Python CLIs):
- Tokens per run: 0. Phase 0 makes no LLM calls.
- Wall-clock per run (p50 / p95):
  - `codegenie --help`: 30 ms / 80 ms.
  - `codegenie gather` on a 1k-file repo: 200 ms / 350 ms.
  - `codegenie gather` on a 50k-file Node repo: 800 ms / 1500 ms (dominated by `os.scandir`; the rest is < 100 ms total).
- Memory per worker (RSS):
  - Idle (post-import for `gather`): ~60 MB. Pyyaml C + jsonschema referencing + pydantic accounts for ~30 MB; the rest is base interpreter + click + the codegenie package itself.
  - Peak during a 50k-file gather: ~90 MB. The scandir walk holds entry objects in a list briefly; the cache write transient is sub-MB.
  - The 512 MB worker ceiling (a Phase 8+ target) is met with 5× headroom.
- Storage growth rate:
  - Per-gather: `repo-context.yaml` (~2 KB Phase 0; ~300 KB Phase 2) + raw artifacts (~1 KB Phase 0; ~5 MB Phase 2) + one cache blob (~1 KB Phase 0; cumulative cache ~50 MB after 30 probes warm).
  - Cache index grows ~150 bytes per cache entry (one entry per probe per gather). After a year of nightly continuous gather (365 gathers × 30 probes = 11k entries), the index is ~1.6 MB: mmap'd, scanned in single-digit ms.
- Hot vs cold cost ratio:
  - Cold `gather` on a new repo: 100% probe execution.
  - Warm `gather` (second run, no source changes): ≥ 95% cache hit dispatch path. Phase 1's exit criterion (cache hits on second run) is met by Phase 0's cache code, not Phase 1's.
- CI cost per PR:
  - Three parallel jobs × ~70 s p95 = ~70 s wall-clock per PR (parallelism caps the cost at the slowest job). The `test` job dominates.
  - Docs build job: ~25 s, only on docs-touching PRs.
  - GitHub Actions minutes per PR: ~3.5 min on Linux runners. Phase 0 ships with ~5 tests so the test job will be I/O-bound on dependency install, not test execution, which is why `uv` matters.
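The arithmetic behind the figures above is small enough to check directly. The snippet below only restates numbers already given in this section (150 bytes per index entry, 30 probes, nightly gathers, a 512 MB budget against a 90 MB peak, three 70 s CI jobs); the variable names are illustrative.

```python
# Index growth: one ~150-byte record per probe per gather.
GATHERS_PER_YEAR = 365
PROBES_PER_GATHER = 30
BYTES_PER_ENTRY = 150

entries_per_year = GATHERS_PER_YEAR * PROBES_PER_GATHER  # 10_950, i.e. ~11k
index_bytes = entries_per_year * BYTES_PER_ENTRY          # 1_642_500, i.e. ~1.6 MB

# Memory headroom: worker budget divided by the Phase 0 peak RSS.
WORKER_BUDGET_MB = 512
PEAK_RSS_MB = 90
headroom_factor = WORKER_BUDGET_MB / PEAK_RSS_MB          # a bit over 5x

# CI: three parallel jobs at ~70 s each.
JOBS = 3
JOB_SECONDS_P95 = 70
wall_clock_seconds = JOB_SECONDS_P95                      # bounded by slowest job
runner_minutes = JOBS * JOB_SECONDS_P95 / 60              # 3.5 runner-minutes
```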
Test plan
"Passes its tests" for Phase 0 means:
- Unit tests:
  - `Probe` ABC subclassing works; missing methods raise the expected `TypeError`.
  - `@register_probe` adds the class to the registry; double-registration is detected and rejected.
  - `LanguageDetectionProbe.run()` against a fixture tree returns the expected dict; the `ProbeOutput.confidence` is `"high"` when files are unambiguous.
  - `cache.put` → `cache.get` round-trips; mismatched cache key returns `None`; corrupted blob is detected.
  - `coordinator.gather()` runs one probe end-to-end and produces a valid `RepoContext`.
  - Output writer produces a YAML that re-parses to the same dict (round-trip identity).
  - Schema validator accepts the produced YAML and rejects a known-invalid one.
- Smoke tests:
  - `codegenie gather --help` exits 0.
  - `codegenie gather /tmp/empty-dir` (after `mkdir -p /tmp/empty-dir`) exits 0 and produces a `repo-context.yaml` with `language_stack.primary: null` and `confidence: low`.
  - `codegenie gather <fixture-node-repo>` exits 0 and produces the expected language stack.
- CI invariants:
  - All three jobs (lint, typecheck, test) green on every PR.
  - `mkdocs build --strict` exits 0 in the docs job.
- Performance regression tests (the canaries):
  - `bench/test_cli_cold_start.py` asserts `codegenie --help` completes in ≤ 200 ms (Linux CI; the 80 ms macOS target isn't a CI gate). Uses `subprocess.run` × 5, takes the median. Fails the build on regression. This is the canary: the entire performance lens collapses if CLI startup balloons.
  - `bench/test_coordinator_overhead.py` measures coordinator dispatch+merge+write for 1 probe with a no-op `run()`; asserts ≤ 30 ms (with headroom over the 25 ms target). Phase 2 will tighten this.
  - `bench/test_cache_hit_dispatch.py` asserts a cache-hit `gather` (second run, no input changes) is ≥ 5× faster than the cold run on the fixture repo.
  - Benchmark tests run on a single dedicated CI job tagged `[bench]`, on a fixed runner type, so the absolute numbers are comparable PR-to-PR. They post their numbers as a PR comment via a small action.
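The cold-start canary could take roughly the following shape. It is a sketch, not the contents of `bench/test_cli_cold_start.py`: `median_wall_ms` and `assert_cold_start` are hypothetical helpers, and the command to time is passed in so the same helper works for `codegenie --help` or any other executable.

```python
from __future__ import annotations

import statistics
import subprocess
import sys
import time


def median_wall_ms(cmd: list[str], runs: int = 5) -> float:
    """Run cmd `runs` times in fresh processes; return the median wall time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def assert_cold_start(cmd: list[str], budget_ms: float = 200.0) -> float:
    """Fail loudly when the median start-up time exceeds the budget."""
    observed = median_wall_ms(cmd)
    assert observed <= budget_ms, (
        f"cold start regressed: {observed:.0f} ms > {budget_ms:.0f} ms budget"
    )
    return observed


if __name__ == "__main__":
    # Times a bare interpreter start as a baseline; the real canary would
    # pass ["codegenie", "--help"] instead.
    print(f"{median_wall_ms([sys.executable, '-c', 'pass']):.1f} ms")
```

Taking the median of five runs rather than the minimum or mean keeps one slow outlier on a noisy runner from failing the gate.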
Risks (top 3–5)
- `fastjsonschema` divergence from canonical `jsonschema`. The two have small spec deviations in error reporting and a couple of edge-case validations. If a future schema feature (e.g., `unevaluatedProperties`) trips `fastjsonschema` we eat the runtime cost or change libraries. Mitigation: dev-time validation uses `jsonschema` (authoritative), CI hot-path uses `fastjsonschema` with a once-per-CI-run differential test that both agree on a corpus of valid + invalid documents.
- `uv` instability. It's the fastest installer; it's also young. A regression in `uv` could break CI for a day. Mitigation: pin `uv` to an exact version in CI; have a documented fallback to `pip` (one-line change in the workflow). Cost of fallback: ~20 s per CI run.
- Lazy import discipline erodes. A future contributor adds a top-level `import yaml` to `codegenie/cli.py` and the cold start regresses 25 ms. Mitigation: the `test_cli_cold_start.py` canary catches it; an `import-linter` config flagged in `pyproject.toml` blocks imports of heavy modules from the CLI entry path at typecheck time.
- Cache layout assumptions break on Phase 2 scale. Append-only JSONL is fine at 11k entries; at 1M (continuous-gather at portfolio scale, Phase 14) it's not. Mitigation: the cache API is narrow (`get`/`put`/`key_for`); swapping the backend to SQLite or LMDB in Phase 14 is bounded work. Don't pre-build it now.
- xxh3 over SHA-256 introduces a "different hash than the spec" footgun. Reviewers will reach for SHA-256 by reflex. Mitigation: central `codegenie.hashing` module is the single source of truth; document the choice; a unit test guards against accidental swap. Surface to the synthesizer: `localv2.md §8` mentions `sha256(...)` and this design swaps the inner hash. Justified by perf, but worth an explicit decision.
Acknowledged blind spots
What this lens deprioritized (the synthesizer should weight these against the security and best-practices designs):
- Security of the cache layer. xxh3 is non-cryptographic; the cache directory lives inside an analyzed repo; a compromised repo can poison its own cache. Performance lens treats this as "the threat model doesn't include local FS write attackers in Phase 0." Security-lens will likely disagree.
- Error message quality. Click's default errors are fine; we don't invest in pretty error frames, suggestion engines, or `did-you-mean` heuristics. Time spent there is time stolen from CI walltime.
- Contributor onboarding ergonomics. Lazy imports, `dmypy` daemon, `uv` instead of `pip`, `fastjsonschema` alongside `jsonschema`: every one of these is a thing a new contributor has to learn. Best-practices lens will likely prefer the one-tool-per-job approach.
- Cross-platform breadth. No Windows. macOS gets the dtruss fallback already noted in `localv2.md`. We don't test on FreeBSD or Alpine. Phase 0 doesn't need to; this is a non-cost.
- Observability/telemetry. Phase 13 introduces OpenTelemetry. Phase 0 emits a minimal structured-log JSON run record and stops there. No tracing, no metrics export. Performance-lens-aligned: each layer of telemetry has cost; pay for it when there's something to observe.
- Type-checking depth on tests. Tests get `--ignore-missing-imports`. A bug class slips through. Acceptable tradeoff against CI walltime.
Open questions for the synthesizer
- xxh3 vs SHA-256 for cache content hashing. Perf argument is decisive (~100× faster). `localv2.md §8` text mentions SHA-256. Is the spec mention prescriptive or descriptive? If prescriptive, I'd push back; if descriptive, switch and document.
- `fastjsonschema` + `jsonschema` two-library setup, or just `jsonschema`? Synthesizer should weigh the ~10× speedup against the operational complexity of keeping the two agreeing.
- `uv` in CI now, or wait until the dev experience pain is real? Phase 0 doesn't have to use `uv`; CI completes in ~70 s with stock `pip` (vs ~50 s with `uv`). Not a difference that matters yet. Synthesizer's call whether to pay the novelty cost.
- Benchmark canary as a hard CI gate, or advisory? A hard gate is the only way to actually preserve performance; advisory canaries get ignored. The cost is that PRs touching unrelated code can fail on a noisy benchmark machine. My vote: hard gate, with a clearly-marked `[skip-bench]` PR-title escape hatch for code that the author asserts is performance-neutral.
- Coordinator concurrency default of 8. Phase 0 only runs one probe so this doesn't matter; Phase 1 with 6 probes will start exercising it. Should the default be `os.cpu_count()`, `min(cpu, 8)`, or a config-driven number from the start? My pick is `min(cpu, 8)` because portfolio workers in Phase 9+ may run on cheap 2-vCPU instances where unbounded concurrency thrashes.