Attempt log — S6-01 (SkillsIndexProbe Layer D)¶
Attempt 1 — phase-story-executor, 2026-05-17¶
Outcome¶
GREEN. All 22 ACs verified with runtime evidence; 27 unit + integration tests added; full suite green (2696 passed); lint / format / mypy / pre-commit / import-linter / fence clean.
What shipped¶
| Path | Action |
|---|---|
src/codegenie/probes/layer_d/__init__.py |
NEW — package marker with one-paragraph docstring naming Layer D's role. |
src/codegenie/probes/layer_d/skills_index.py |
NEW — IndexedSkill + SkillsIndexSlice Pydantic models, 4 pure helpers (_project_skill, _project_skills_sorted, _count_skills_per_tier, _compute_confidence), SkillsIndexProbe class registered with heaviness="light". |
src/codegenie/probes/__init__.py |
MODIFY — added explicit from codegenie.probes.layer_d import skills_index so @register_probe fires on package import; added "skills_index" to __all__. |
src/codegenie/schema/probes/skills_index.schema.json |
NEW — basic JSON-Schema for SkillsIndexSlice. The story's Implementation Outline §3 offered two options ("placeholder OR leave to S6-08"); I shipped the schema now so AC-19 verifies on this PR (avoids the "loud-until-S6-08" failure mode and lets the executor mark the story Done with all ACs green). S6-08 may refine / replace. |
tests/unit/probes/layer_d/__init__.py |
NEW — empty package marker. |
tests/unit/probes/layer_d/test_skills_index.py |
NEW — 27 tests keyed to ACs (every test docstring names the AC + the mutation it catches). |
AC → evidence map¶
| AC | Evidence (test name or gate output) |
|---|---|
| AC-1 | tests/unit/probes/layer_d/test_skills_index.py::test_layer_d_package_marker_exists; file present at src/codegenie/probes/layer_d/__init__.py. |
| AC-2 | test_skills_index_module_exports_exact_all. |
| AC-3 | test_slice_is_sorted_and_frozen (frozen + tuple-typed) + test_every_indexed_skill_field_populated_from_canonical_fixture[*] (6 parametrized — every field non-default-shaped). |
| AC-4 | test_run_returns_probeoutput_with_all_six_fields + test_tier_counts_match_three_tier_layout + test_symlinked_skill_yields_medium_confidence_no_raise (per_file_errors carries JSON-dumps). |
| AC-5 | test_probe_contract_attributes (every class attribute pinned to its declared value) + test_registry_heaviness_is_light. |
| AC-6 | test_run_returns_probeoutput_with_all_six_fields (all six ProbeOutput fields constructed) + test_fatal_load_error_yields_low_confidence (Err branch) + test_symlinked_skill_yields_medium_confidence_no_raise (Ok branch with per_file_errors). |
| AC-7 | test_declared_inputs_include_three_tier_tokens. |
| AC-9 | test_compute_confidence_high_on_clean_load + test_compute_confidence_medium_on_partial_success + test_compute_confidence_low_when_all_failed + the empty / symlinked / fatal-load integration tests. |
| AC-10 | test_tracemalloc_peak_under_1mb_on_100mb_body — sparse 100 MB body, peak <1 MB. |
| AC-11 | test_probe_module_source_has_no_file_open — inspect.getsource(si) contains none of os.open, os.read, .read_bytes, .read_text, .open(. |
| AC-12 | test_recorded_anchors_match_actual_body_blake3 — content_hash_bytes(body) == indexed.body_blake3, prefix preserved. |
| AC-13 | test_slice_is_sorted_and_frozen (lexical sort) + test_two_consecutive_gathers_byte_identical_json (sort_keys=True JSON equality across two runs). |
| AC-14 | test_tier_counts_match_three_tier_layout — 3 user / 1 repo / 0 org from a fixture; missing org tier counts 0 not raise. |
| AC-15 | test_empty_fixture_yields_high_confidence. |
| AC-16 | test_fatal_load_error_yields_low_confidence — monkeypatch SkillsLoader.load_all to return Err(FatalLoadError); probe emits confidence="low", skills=(), three-key zero tier_counts, error JSON in output.errors, no re-raise. |
| AC-17 | test_symlinked_skill_yields_medium_confidence_no_raise — partial-success fixture (one good + one symlinked) → confidence="medium", one skill, one per-file-error JSON with "reason": "symlink_refused". |
| AC-18 | test_shadowed_skill_propagates_first_tier_wins — user tier wins (one row); tier_counts == {user:1, repo:1, org:0} (both files on-disk surface to operators). |
| AC-19 | test_slice_matches_subschema — slice JSON validates against src/codegenie/schema/probes/skills_index.schema.json via importlib.resources. |
| AC-20 | test_registry_heaviness_is_light — default_registry._entries carries heaviness="light", runs_last=False. |
| AC-21 | .venv/bin/mypy --strict src/codegenie → "Success: no issues found in 111 source files" (verified after the GREEN code landed; 111 includes the new module). |
| AC-22 | test_every_indexed_skill_field_populated_from_canonical_fixture — parametrized over IndexedSkill.model_fields.keys() (6 fields). |
| AC-23 | test_projection_is_cardinality_and_order_preserving — Hypothesis property over _skills() strategy; cardinality + sort invariants. |
Gates¶
.venv/bin/ruff check src tests— All checks passed!.venv/bin/ruff format --check src/codegenie/probes/layer_d tests/unit/probes/layer_d src/codegenie/probes/__init__.py— 5 files already formatted..venv/bin/mypy --strict src/codegenie— 111 files, 0 errors..venv/bin/lint-imports --no-cache— 2 contracts kept (cli + package init)..venv/bin/pytest tests/unit/test_pyproject_fence.py— 9 passed (no LLM SDK leaked into the runtime closure).PATH=$VENV/bin:$PATH .venv/bin/pytest -q— 2696 passed, 29 skipped (macOS Layer-C Linux-only adv suite + a couple of[dev]-extras-gated tests), 3 deselected, 2 xfailed.PATH=$VENV/bin:$PATH pre-commit run --files <changed>— ruff / ruff-format / mypy / detect-secrets / end-of-files / trim-trailing-whitespace / forbidden-patterns — all passed.
Surprises during implementation¶
-
Probe.versionis required by the cache key path even though it's not on the frozen ABC. The registry docstring atsrc/codegenie/probes/registry.py:30-37is explicit thatversionis "a convention, not part of the frozen ABC" — butcoordinator/coordinator.pyreadscls.versionto build cache keys. Forgetting the attribute crashes 45 tests at gather-dispatch time withAttributeError("'SkillsIndexProbe' object has no attribute 'version'"). Every probe inlayer_b/andlayer_c/carriesversion: str = "0.1.0"for this reason. Fixed by adding the attribute; no story-level AC names it (it's a load-bearing convention the story implicitly assumes). Flagged in lessons. -
AC-11's source-grep over the WHOLE module catches docstrings. My initial GREEN included a helper docstring that mentioned
os.openin prose ("AC-11's source-grep interdict… does not containos.openetc."). The substring match in the test treats docstring occurrences identically to code occurrences, which is the point — a comment "I plan to addos.openhere later" would be just as load-bearing as the call itself. Reworded the docstring to use prose (opendir/readdir, "no file-opening primitives") without literal substrings. -
Implementation Outline §3's "preferred" deferred-schema option fails AC-19. The story's outline names the deferred-schema choice as "preferred", but the validator's AC-19 wires the schema in as a hard runtime test. Shipping the schema now is the only path that lets the executor mark the story
Donewith all ACs verified. Documented in the changes-shipped table above and noted as a design-pattern choice (schema-before-consumer beats schema-after-consumer when the consumer is a test in the same PR).
Refactor decisions¶
- Functional core / imperative shell — 4 pure module-level helpers (
_project_skill,_project_skills_sorted,_count_skills_per_tier,_compute_confidence) carry all the business logic; the probe class is orchestration only (search-path resolution + Result pattern-match + dataclass construction). Three of the four helpers have their own unit tests; the fourth (_project_skill) is exercised by every integration test. - Newtype preservation end-to-end —
IndexedSkill.applies_to_*usestuple[TaskClassId, ...]/tuple[Language, ...], nottuple[str, ...]. ADR-0033 §1 primitive-obsession; preservesmypy --strictdiscipline for the Planner consumer. - Smart constructor —
IndexedSkill.body_blake3regex-pinned to^blake3:[0-9a-f]{64}$; a regression in the loader that drops the prefix fails at Pydantic-validation time, not silently downstream. - Open/Closed — Rule-of-three trigger documented in the story Notes-for-implementer §9 (do NOT extract
_count_files_per_tiershared helper until the third tier-aware probe lands)._count_skills_per_tieraccepts aSequence[Path]so the shared extraction is a parameter-rename when the trigger fires. - Sum-type discipline —
_compute_confidencereturnsLiteral["high","medium","low"]exhaustively (noelse: raise); consumes the loader'sResult[LoadOutcome, FatalLoadError]discriminated union viaisinstance(result, Err) / Ok.
Files-to-touch table reconciliation¶
The story's Files-to-touch table named 5 entries; I shipped all 5 + the schema. The schema's status was "S6-08 dependency, AC-19 fails loudly until S6-08 lands it" — I chose to land it now (see surprise #3 above). No story files were dropped or skipped.
Suggested commit message¶
feat(phase2/S6-01): GREEN — SkillsIndexProbe Layer D body-byte-free indexer
Lands src/codegenie/probes/layer_d/skills_index.py as a
@register_probe(heaviness="light") Layer-D probe that projects
SkillsLoader.load_all() output into a typed SkillsIndexSlice carrying
the two indices the Planner queries (applies_to_tasks,
applies_to_languages), body byte-offset/size/BLAKE3 anchors, three-key
tier_counts derived via filesystem enumeration, and per-file-error JSON
round-trips. Bodies are NEVER opened by the probe — the loader (S2-01)
recorded body_offset/body_size/body_blake3 in one streaming pass; this
probe re-uses those anchors. tracemalloc test on a 100 MB sparse-body
fixture verifies peak <1 MB; AC-11 source-grep interdicts os.open /
os.read / .read_bytes / .read_text / .open( anywhere in the module.
Functional-core / imperative-shell split (4 pure helpers + orchestration
shell). Newtypes preserved end-to-end (SkillId/TaskClassId/Language —
no primitive-obsession laundering into raw str). Three-state confidence
policy (high on clean load, medium on partial success, low on total
failure or FatalLoadError). Schema shipped at
src/codegenie/schema/probes/skills_index.schema.json so AC-19 verifies
on this PR (Implementation Outline §3 offered deferring to S6-08; the
schema-before-consumer choice avoids the loud-until-S6-08 failure mode).
22 ACs verified with runtime evidence; 27 unit/integration tests added
(parametrized field-coverage + Hypothesis projection property included).
Full suite green: 2696 passed, 29 skipped, 2 xfailed. mypy --strict,
ruff, lint-imports, pre-commit, fence — all clean.
Lessons for future Phase 2 stories¶
Probe.versionis load-bearing, not optional. Every probe needsversion: str = "0.1.0"as a class attribute even though the frozen ABC doesn't declare it. The cache key reads it; missing it crashes 45 tests at gather time. Future Layer-D / Layer-E / Layer-G probes should copy the attribute from the closest sibling (S5-03 family is the canonical reference).- AC-11-style source-grep interdicts catch docstring strings. When writing a probe whose tests forbid
os.open/.read_bytes/ etc. anywhere in the module source, prose docstrings that mention those tokens by name will fail the test. Use synonyms (opendir/readdir, "no file-opening primitives") rather than literal forbidden substrings. - "Defer to a future story" can collide with hard runtime ACs. A story's Implementation Outline option to defer a dependency (e.g., schema files to S6-08) won't pass an AC that tests the dependency's existence at runtime. The executor should ship the dependency in the current PR when the AC needs it; flag the choice in the attempt log.
- The rule-of-three trigger needs to be a concrete predicate.
_count_skills_per_tieris per-Layer-D-probe today (rule-of-three not triggered). The next tier-aware Layer-D probe (S6-02ConventionsCatalogProbe?) is the second; the third is the trigger. Refactor to_count_files_per_tier(search_paths, glob_pattern)undersrc/codegenie/probes/_shared/tier_counts.pyat that point — not before (Rule 2 / YAGNI).