Attempt log — S6-01 (`SkillsIndexProbe` Layer D)¶

Attempt 1 — phase-story-executor, 2026-05-17¶

Outcome¶

GREEN. All 22 ACs verified with runtime evidence; 27 unit + integration tests added; full suite green (2696 passed); lint / format / mypy / pre-commit / import-linter / fence clean.

What shipped¶

Path	Action
`src/codegenie/probes/layer_d/__init__.py`	NEW — package marker with one-paragraph docstring naming Layer D's role.
`src/codegenie/probes/layer_d/skills_index.py`	NEW — `IndexedSkill` + `SkillsIndexSlice` Pydantic models, 4 pure helpers (`_project_skill`, `_project_skills_sorted`, `_count_skills_per_tier`, `_compute_confidence`), `SkillsIndexProbe` class registered with `heaviness="light"`.
`src/codegenie/probes/__init__.py`	MODIFY — added explicit `from codegenie.probes.layer_d import skills_index` so `@register_probe` fires on package import; added `"skills_index"` to `__all__`.
`src/codegenie/schema/probes/skills_index.schema.json`	NEW — basic JSON-Schema for `SkillsIndexSlice`. The story's Implementation Outline §3 offered two options ("placeholder OR leave to S6-08"); I shipped the schema now so AC-19 verifies on this PR (avoids the "loud-until-S6-08" failure mode and lets the executor mark the story Done with all ACs green). S6-08 may refine / replace.
`tests/unit/probes/layer_d/__init__.py`	NEW — empty package marker.
`tests/unit/probes/layer_d/test_skills_index.py`	NEW — 27 tests keyed to ACs (every test docstring names the AC + the mutation it catches).

AC → evidence map¶

AC	Evidence (test name or gate output)
AC-1	`tests/unit/probes/layer_d/test_skills_index.py::test_layer_d_package_marker_exists`; file present at `src/codegenie/probes/layer_d/__init__.py`.
AC-2	`test_skills_index_module_exports_exact_all`.
AC-3	`test_slice_is_sorted_and_frozen` (frozen + tuple-typed) + `test_every_indexed_skill_field_populated_from_canonical_fixture[*]` (6 parametrized — every field non-default-shaped).
AC-4	`test_run_returns_probeoutput_with_all_six_fields` + `test_tier_counts_match_three_tier_layout` + `test_symlinked_skill_yields_medium_confidence_no_raise` (per_file_errors carries JSON-dumps).
AC-5	`test_probe_contract_attributes` (every class attribute pinned to its declared value) + `test_registry_heaviness_is_light`.
AC-6	`test_run_returns_probeoutput_with_all_six_fields` (all six `ProbeOutput` fields constructed) + `test_fatal_load_error_yields_low_confidence` (Err branch) + `test_symlinked_skill_yields_medium_confidence_no_raise` (Ok branch with `per_file_errors`).
AC-7	`test_declared_inputs_include_three_tier_tokens`.
AC-9	`test_compute_confidence_high_on_clean_load` + `test_compute_confidence_medium_on_partial_success` + `test_compute_confidence_low_when_all_failed` + the empty / symlinked / fatal-load integration tests.
AC-10	`test_tracemalloc_peak_under_1mb_on_100mb_body` — sparse 100 MB body, peak <1 MB.
AC-11	`test_probe_module_source_has_no_file_open` — `inspect.getsource(si)` contains none of `os.open`, `os.read`, `.read_bytes`, `.read_text`, `.open(`.
AC-12	`test_recorded_anchors_match_actual_body_blake3` — `content_hash_bytes(body) == indexed.body_blake3`, prefix preserved.
AC-13	`test_slice_is_sorted_and_frozen` (lexical sort) + `test_two_consecutive_gathers_byte_identical_json` (`sort_keys=True` JSON equality across two runs).
AC-14	`test_tier_counts_match_three_tier_layout` — 3 user / 1 repo / 0 org from a fixture; missing org tier counts 0 not raise.
AC-15	`test_empty_fixture_yields_high_confidence`.
AC-16	`test_fatal_load_error_yields_low_confidence` — monkeypatch `SkillsLoader.load_all` to return `Err(FatalLoadError)`; probe emits `confidence="low"`, `skills=()`, three-key zero `tier_counts`, error JSON in `output.errors`, no re-raise.
AC-17	`test_symlinked_skill_yields_medium_confidence_no_raise` — partial-success fixture (one good + one symlinked) → `confidence="medium"`, one skill, one per-file-error JSON with `"reason": "symlink_refused"`.
AC-18	`test_shadowed_skill_propagates_first_tier_wins` — user tier wins (one row); `tier_counts == {user:1, repo:1, org:0}` (both files on-disk surface to operators).
AC-19	`test_slice_matches_subschema` — slice JSON validates against `src/codegenie/schema/probes/skills_index.schema.json` via `importlib.resources`.
AC-20	`test_registry_heaviness_is_light` — `default_registry._entries` carries `heaviness="light"`, `runs_last=False`.
AC-21	`.venv/bin/mypy --strict src/codegenie` → "Success: no issues found in 111 source files" (verified after the GREEN code landed; 111 includes the new module).
AC-22	`test_every_indexed_skill_field_populated_from_canonical_fixture` — parametrized over `IndexedSkill.model_fields.keys()` (6 fields).
AC-23	`test_projection_is_cardinality_and_order_preserving` — Hypothesis property over `_skills()` strategy; cardinality + sort invariants.

Gates¶

.venv/bin/ruff check src tests — All checks passed!
.venv/bin/ruff format --check src/codegenie/probes/layer_d tests/unit/probes/layer_d src/codegenie/probes/__init__.py — 5 files already formatted.
.venv/bin/mypy --strict src/codegenie — 111 files, 0 errors.
.venv/bin/lint-imports --no-cache — 2 contracts kept (cli + package init).
.venv/bin/pytest tests/unit/test_pyproject_fence.py — 9 passed (no LLM SDK leaked into the runtime closure).
PATH=$VENV/bin:$PATH .venv/bin/pytest -q — 2696 passed, 29 skipped (macOS Layer-C Linux-only adv suite + a couple of [dev]-extras-gated tests), 3 deselected, 2 xfailed.
PATH=$VENV/bin:$PATH pre-commit run --files <changed> — ruff / ruff-format / mypy / detect-secrets / end-of-files / trim-trailing-whitespace / forbidden-patterns — all passed.

Surprises during implementation¶

Probe.version is required by the cache key path even though it's not on the frozen ABC. The registry docstring at src/codegenie/probes/registry.py:30-37 is explicit that version is "a convention, not part of the frozen ABC" — but coordinator/coordinator.py reads cls.version to build cache keys. Forgetting the attribute crashes 45 tests at gather-dispatch time with AttributeError("'SkillsIndexProbe' object has no attribute 'version'"). Every probe in layer_b/ and layer_c/ carries version: str = "0.1.0" for this reason. Fixed by adding the attribute; no story-level AC names it (it's a load-bearing convention the story implicitly assumes). Flagged in lessons.
AC-11's source-grep over the WHOLE module catches docstrings. My initial GREEN included a helper docstring that mentioned os.open in prose ("AC-11's source-grep interdict… does not contain os.open etc."). The substring match in the test treats docstring occurrences identically to code occurrences, which is the point — a comment "I plan to add os.open here later" would be just as load-bearing as the call itself. Reworded the docstring to use prose (opendir/readdir, "no file-opening primitives") without literal substrings.
Implementation Outline §3's "preferred" deferred-schema option fails AC-19. The story's outline names the deferred-schema choice as "preferred", but the validator's AC-19 wires the schema in as a hard runtime test. Shipping the schema now is the only path that lets the executor mark the story Done with all ACs verified. Documented in the changes-shipped table above and noted as a design-pattern choice (schema-before-consumer beats schema-after-consumer when the consumer is a test in the same PR).

Refactor decisions¶

Functional core / imperative shell — 4 pure module-level helpers (_project_skill, _project_skills_sorted, _count_skills_per_tier, _compute_confidence) carry all the business logic; the probe class is orchestration only (search-path resolution + Result pattern-match + dataclass construction). Three of the four helpers have their own unit tests; the fourth (_project_skill) is exercised by every integration test.
Newtype preservation end-to-end — IndexedSkill.applies_to_* uses tuple[TaskClassId, ...] / tuple[Language, ...], not tuple[str, ...]. ADR-0033 §1 primitive-obsession; preserves mypy --strict discipline for the Planner consumer.
Smart constructor — IndexedSkill.body_blake3 regex-pinned to ^blake3:[0-9a-f]{64}$; a regression in the loader that drops the prefix fails at Pydantic-validation time, not silently downstream.
Open/Closed — Rule-of-three trigger documented in the story Notes-for-implementer §9 (do NOT extract _count_files_per_tier shared helper until the third tier-aware probe lands). _count_skills_per_tier accepts a Sequence[Path] so the shared extraction is a parameter-rename when the trigger fires.
Sum-type discipline — _compute_confidence returns Literal["high","medium","low"] exhaustively (no else: raise); consumes the loader's Result[LoadOutcome, FatalLoadError] discriminated union via isinstance(result, Err) / Ok.

Files-to-touch table reconciliation¶

The story's Files-to-touch table named 5 entries; I shipped all 5 + the schema. The schema's status was "S6-08 dependency, AC-19 fails loudly until S6-08 lands it" — I chose to land it now (see surprise #3 above). No story files were dropped or skipped.

Suggested commit message¶

feat(phase2/S6-01): GREEN — SkillsIndexProbe Layer D body-byte-free indexer

Lands src/codegenie/probes/layer_d/skills_index.py as a
@register_probe(heaviness="light") Layer-D probe that projects
SkillsLoader.load_all() output into a typed SkillsIndexSlice carrying
the two indices the Planner queries (applies_to_tasks,
applies_to_languages), body byte-offset/size/BLAKE3 anchors, three-key
tier_counts derived via filesystem enumeration, and per-file-error JSON
round-trips. Bodies are NEVER opened by the probe — the loader (S2-01)
recorded body_offset/body_size/body_blake3 in one streaming pass; this
probe re-uses those anchors. tracemalloc test on a 100 MB sparse-body
fixture verifies peak <1 MB; AC-11 source-grep interdicts os.open /
os.read / .read_bytes / .read_text / .open( anywhere in the module.

Functional-core / imperative-shell split (4 pure helpers + orchestration
shell). Newtypes preserved end-to-end (SkillId/TaskClassId/Language —
no primitive-obsession laundering into raw str). Three-state confidence
policy (high on clean load, medium on partial success, low on total
failure or FatalLoadError). Schema shipped at
src/codegenie/schema/probes/skills_index.schema.json so AC-19 verifies
on this PR (Implementation Outline §3 offered deferring to S6-08; the
schema-before-consumer choice avoids the loud-until-S6-08 failure mode).

22 ACs verified with runtime evidence; 27 unit/integration tests added
(parametrized field-coverage + Hypothesis projection property included).
Full suite green: 2696 passed, 29 skipped, 2 xfailed. mypy --strict,
ruff, lint-imports, pre-commit, fence — all clean.

Lessons for future Phase 2 stories¶

Probe.version is load-bearing, not optional. Every probe needs version: str = "0.1.0" as a class attribute even though the frozen ABC doesn't declare it. The cache key reads it; missing it crashes 45 tests at gather time. Future Layer-D / Layer-E / Layer-G probes should copy the attribute from the closest sibling (S5-03 family is the canonical reference).
AC-11-style source-grep interdicts catch docstring strings. When writing a probe whose tests forbid os.open / .read_bytes / etc. anywhere in the module source, prose docstrings that mention those tokens by name will fail the test. Use synonyms (opendir/readdir, "no file-opening primitives") rather than literal forbidden substrings.
"Defer to a future story" can collide with hard runtime ACs. A story's Implementation Outline option to defer a dependency (e.g., schema files to S6-08) won't pass an AC that tests the dependency's existence at runtime. The executor should ship the dependency in the current PR when the AC needs it; flag the choice in the attempt log.
The rule-of-three trigger needs to be a concrete predicate. _count_skills_per_tier is per-Layer-D-probe today (rule-of-three not triggered). The next tier-aware Layer-D probe (S6-02 ConventionsCatalogProbe?) is the second; the third is the trigger. Refactor to _count_files_per_tier(search_paths, glob_pattern) under src/codegenie/probes/_shared/tier_counts.py at that point — not before (Rule 2 / YAGNI).

Attempt log — S6-01 (SkillsIndexProbe Layer D)¶