
Attempt log — S6-01 (SkillsIndexProbe Layer D)

Attempt 1 — phase-story-executor, 2026-05-17

Outcome

GREEN. All 22 ACs verified with runtime evidence; 27 unit + integration tests added; full suite green (2696 passed); lint / format / mypy / pre-commit / import-linter / fence clean.

What shipped

Path Action
src/codegenie/probes/layer_d/__init__.py NEW — package marker with one-paragraph docstring naming Layer D's role.
src/codegenie/probes/layer_d/skills_index.py NEW — IndexedSkill + SkillsIndexSlice Pydantic models, 4 pure helpers (_project_skill, _project_skills_sorted, _count_skills_per_tier, _compute_confidence), SkillsIndexProbe class registered with heaviness="light".
src/codegenie/probes/__init__.py MODIFY — added explicit from codegenie.probes.layer_d import skills_index so @register_probe fires on package import; added "skills_index" to __all__.
src/codegenie/schema/probes/skills_index.schema.json NEW — basic JSON-Schema for SkillsIndexSlice. The story's Implementation Outline §3 offered two options ("placeholder OR leave to S6-08"); I shipped the schema now so AC-19 verifies on this PR (avoids the "loud-until-S6-08" failure mode and lets the executor mark the story Done with all ACs green). S6-08 may refine / replace.
tests/unit/probes/layer_d/__init__.py NEW — empty package marker.
tests/unit/probes/layer_d/test_skills_index.py NEW — 27 tests keyed to ACs (every test docstring names the AC + the mutation it catches).
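The sketch below shows roughly what an IndexedSkill row carries, using only the standard library. It is not the shipped model. The real IndexedSkill is a Pydantic model, and this stand-in uses a frozen dataclass instead. The six field names (skill_id, applies_to_tasks, applies_to_languages, body_offset, body_size, body_blake3), the non-negativity check, and the project_sorted helper are assumptions inferred from this log. Only the blake3 regex is taken verbatim from the Refactor decisions.

```python
import re
from dataclasses import dataclass
from typing import NewType

# Hypothetical newtypes standing in for the project's SkillId / TaskClassId / Language.
SkillId = NewType("SkillId", str)
TaskClassId = NewType("TaskClassId", str)
Language = NewType("Language", str)

# Same pattern this log states for IndexedSkill.body_blake3.
BLAKE3_RE = re.compile(r"^blake3:[0-9a-f]{64}$")


@dataclass(frozen=True)
class IndexedSkillSketch:
    """Stdlib stand-in for the Pydantic IndexedSkill model; field names are assumed."""

    skill_id: SkillId
    applies_to_tasks: tuple[TaskClassId, ...]
    applies_to_languages: tuple[Language, ...]
    body_offset: int
    body_size: int
    body_blake3: str

    def __post_init__(self) -> None:
        # Smart-constructor behavior: a digest that lost its "blake3:" prefix is
        # rejected when the row is built, not silently passed downstream.
        if BLAKE3_RE.fullmatch(self.body_blake3) is None:
            raise ValueError(f"malformed body_blake3: {self.body_blake3!r}")
        # Assumed extra invariant: byte anchors can never be negative.
        if self.body_offset < 0 or self.body_size < 0:
            raise ValueError("body_offset and body_size must be non-negative")


def project_sorted(skills: list[IndexedSkillSketch]) -> tuple[IndexedSkillSketch, ...]:
    """Lexically sorted, immutable projection (compare AC-3 / AC-13)."""
    return tuple(sorted(skills, key=lambda s: s.skill_id))
```

Keeping the newtypes inside the tuples, rather than widening them to tuple[str, ...], is what lets mypy --strict catch a Language passed where a TaskClassId is expected.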

AC → evidence map

AC Evidence (test name or gate output)
AC-1 tests/unit/probes/layer_d/test_skills_index.py::test_layer_d_package_marker_exists; file present at src/codegenie/probes/layer_d/__init__.py.
AC-2 test_skills_index_module_exports_exact_all.
AC-3 test_slice_is_sorted_and_frozen (frozen + tuple-typed) + test_every_indexed_skill_field_populated_from_canonical_fixture[*] (6 parametrized — every field non-default-shaped).
AC-4 test_run_returns_probeoutput_with_all_six_fields + test_tier_counts_match_three_tier_layout + test_symlinked_skill_yields_medium_confidence_no_raise (per_file_errors carries JSON-serialized error strings).
AC-5 test_probe_contract_attributes (every class attribute pinned to its declared value) + test_registry_heaviness_is_light.
AC-6 test_run_returns_probeoutput_with_all_six_fields (all six ProbeOutput fields constructed) + test_fatal_load_error_yields_low_confidence (Err branch) + test_symlinked_skill_yields_medium_confidence_no_raise (Ok branch with per_file_errors).
AC-7 test_declared_inputs_include_three_tier_tokens.
AC-9 test_compute_confidence_high_on_clean_load + test_compute_confidence_medium_on_partial_success + test_compute_confidence_low_when_all_failed + the empty / symlinked / fatal-load integration tests.
AC-10 test_tracemalloc_peak_under_1mb_on_100mb_body — sparse 100 MB body, peak <1 MB.
AC-11 test_probe_module_source_has_no_file_open: inspect.getsource(si) contains none of os.open, os.read, .read_bytes, .read_text, .open(.
AC-12 test_recorded_anchors_match_actual_body_blake3: content_hash_bytes(body) == indexed.body_blake3, with the blake3: prefix preserved.
AC-13 test_slice_is_sorted_and_frozen (lexical sort) + test_two_consecutive_gathers_byte_identical_json (sort_keys=True JSON equality across two runs).
AC-14 test_tier_counts_match_three_tier_layout: 3 user / 1 repo / 0 org from a fixture; a missing org tier counts as 0 rather than raising.
AC-15 test_empty_fixture_yields_high_confidence.
AC-16 test_fatal_load_error_yields_low_confidence — monkeypatch SkillsLoader.load_all to return Err(FatalLoadError); probe emits confidence="low", skills=(), three-key zero tier_counts, error JSON in output.errors, no re-raise.
AC-17 test_symlinked_skill_yields_medium_confidence_no_raise — partial-success fixture (one good + one symlinked) → confidence="medium", one skill, one per-file-error JSON with "reason": "symlink_refused".
AC-18 test_shadowed_skill_propagates_first_tier_wins: the user tier wins (one row); tier_counts == {user:1, repo:1, org:0}, so both on-disk files are surfaced to operators.
AC-19 test_slice_matches_subschema — slice JSON validates against src/codegenie/schema/probes/skills_index.schema.json via importlib.resources.
AC-20 test_registry_heaviness_is_light: default_registry._entries carries heaviness="light", runs_last=False.
AC-21 .venv/bin/mypy --strict src/codegenie → "Success: no issues found in 111 source files" (verified after the GREEN code landed; 111 includes the new module).
AC-22 test_every_indexed_skill_field_populated_from_canonical_fixture — parametrized over IndexedSkill.model_fields.keys() (6 fields).
AC-23 test_projection_is_cardinality_and_order_preserving — Hypothesis property over _skills() strategy; cardinality + sort invariants.
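The AC-10 memory check can be reproduced with the standard library alone. The sketch below assumes a hypothetical index_without_reading helper that derives body_size from stat metadata only. The real probe reuses the loader's recorded anchors instead. The sketch builds a 100 MB sparse body with truncate() and measures the tracemalloc peak while indexing it. tracemalloc only sees Python-level allocations, so this bounds what the indexer allocates, not what the kernel does with the file.

```python
import tempfile
import tracemalloc
from pathlib import Path

MiB = 1024 * 1024


def index_without_reading(path: Path, body_offset: int) -> dict[str, int]:
    """Hypothetical indexer: derives body_size from stat metadata, never the bytes."""
    size = path.stat().st_size
    return {"body_offset": body_offset, "body_size": size - body_offset}


def peak_while_indexing(path: Path, body_offset: int) -> tuple[dict[str, int], int]:
    """Run the indexer under tracemalloc and return (result, peak_bytes)."""
    tracemalloc.start()
    try:
        result = index_without_reading(path, body_offset)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "SKILL.md"
    header = b"---\nid: big\n---\n"
    # Fixture-side write: the AC-11 interdict applies to the probe module, not tests.
    with skill.open("wb") as fh:
        fh.write(header)
        # Extending with truncate() leaves a hole on filesystems that support
        # sparse files, so no 100 MB buffer is ever built in Python.
        fh.truncate(len(header) + 100 * MiB)
    result, peak = peak_while_indexing(skill, len(header))
```

A test would then assert peak < MiB and result["body_size"] == 100 * MiB. Any code path that read the body into memory would push the peak past 100 MB and fail loudly.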

Gates

  • .venv/bin/ruff check src tests — All checks passed!
  • .venv/bin/ruff format --check src/codegenie/probes/layer_d tests/unit/probes/layer_d src/codegenie/probes/__init__.py — 5 files already formatted.
  • .venv/bin/mypy --strict src/codegenie — 111 files, 0 errors.
  • .venv/bin/lint-imports --no-cache — 2 contracts kept (cli + package init).
  • .venv/bin/pytest tests/unit/test_pyproject_fence.py — 9 passed (no LLM SDK leaked into the runtime closure).
  • PATH=$VENV/bin:$PATH .venv/bin/pytest -q — 2696 passed, 29 skipped (macOS Layer-C Linux-only adv suite + a couple of [dev]-extras-gated tests), 3 deselected, 2 xfailed.
  • PATH=$VENV/bin:$PATH pre-commit run --files <changed> — ruff / ruff-format / mypy / detect-secrets / end-of-files / trim-trailing-whitespace / forbidden-patterns — all passed.

Surprises during implementation

  1. Probe.version is required by the cache key path even though it's not on the frozen ABC. The registry docstring at src/codegenie/probes/registry.py:30-37 is explicit that version is "a convention, not part of the frozen ABC" — but coordinator/coordinator.py reads cls.version to build cache keys. Forgetting the attribute crashes 45 tests at gather-dispatch time with AttributeError("'SkillsIndexProbe' object has no attribute 'version'"). Every probe in layer_b/ and layer_c/ carries version: str = "0.1.0" for this reason. Fixed by adding the attribute; no story-level AC names it (it's a load-bearing convention the story implicitly assumes). Flagged in lessons.

  2. AC-11's source-grep over the WHOLE module catches docstrings. My initial GREEN version included a helper docstring that mentioned os.open in prose ("AC-11's source-grep interdict… does not contain os.open etc."). The test's substring match treats docstring occurrences exactly like code occurrences, and that is intended: a comment such as "I plan to add os.open here later" would be just as load-bearing as the call itself. I reworded the docstring to use synonyms (opendir/readdir, "no file-opening primitives") that contain none of the forbidden substrings.

  3. Implementation Outline §3's "preferred" deferred-schema option fails AC-19. The story's outline names deferring the schema as "preferred", but the validator's AC-19 wires the schema in as a hard runtime test. Shipping the schema now is the only way the executor can mark the story Done with all ACs verified. This is documented in the What shipped table above and recorded as a design-pattern choice: schema-before-consumer beats schema-after-consumer when the consumer is a test in the same PR.
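Surprise 2 follows directly from how the interdict matches. It is a plain substring search over module text, so a docstring mention cannot be told apart from a call. Below is a minimal sketch with a hypothetical forbidden_hits helper applied to source strings. The real test feeds it inspect.getsource of the probe module. The token list is the one this log gives for AC-11.

```python
# Must live in the TEST module: these literals would trip the check in the probe module.
FORBIDDEN: tuple[str, ...] = ("os.open", "os.read", ".read_bytes", ".read_text", ".open(")


def forbidden_hits(source: str) -> list[str]:
    """Return every forbidden token found anywhere in source.

    Plain substring test: code, comments and docstrings are all just text.
    """
    return [token for token in FORBIDDEN if token in source]


# Hypothetical helper source: the docstring that tripped the test...
BEFORE = '''
def helper():
    """AC-11's source-grep interdict: this module does not contain os.open etc."""
'''

# ...and the reworded version that uses synonyms instead of the literal tokens.
AFTER = '''
def helper():
    """Uses opendir/readdir-style metadata only; no file-opening primitives."""
'''
```

Here forbidden_hits(BEFORE) reports "os.open" even though nothing calls it, while AFTER is clean. That is why the lesson below recommends synonyms over quoting the forbidden names.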

Refactor decisions

  • Functional core / imperative shell: 4 pure module-level helpers (_project_skill, _project_skills_sorted, _count_skills_per_tier, _compute_confidence) carry all the business logic; the probe class is orchestration only (search-path resolution + Result pattern-match + dataclass construction). Three of the four helpers have their own unit tests; the fourth (_project_skill) is exercised by every integration test.
  • Newtype preservation end-to-end: IndexedSkill.applies_to_* uses tuple[TaskClassId, ...] / tuple[Language, ...], not tuple[str, ...], per ADR-0033 §1 (primitive obsession). This preserves mypy --strict discipline for the Planner consumer.
  • Smart constructor: IndexedSkill.body_blake3 is regex-pinned to ^blake3:[0-9a-f]{64}$, so a loader regression that drops the prefix fails at Pydantic-validation time, not silently downstream.
  • Open/Closed: the rule-of-three trigger is documented in the story's Notes-for-implementer §9 (do NOT extract a shared _count_files_per_tier helper until the third tier-aware probe lands). _count_skills_per_tier already accepts a Sequence[Path], so the shared extraction is a parameter rename when the trigger fires.
  • Sum-type discipline: _compute_confidence returns Literal["high","medium","low"] exhaustively (no else: raise) and consumes the loader's Result[LoadOutcome, FatalLoadError] discriminated union via isinstance(result, Err) / Ok.
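The three-state confidence policy and the Ok/Err pattern match can be sketched as follows. Ok, Err, LoadOutcome and FatalLoadError here are hypothetical minimal stand-ins for the project's own Result type and loader outcome. Only the mapping is taken from this log: a clean load (including an empty one, AC-15) is high, partial success (AC-17) is medium, and a FatalLoadError (AC-16) or total failure is low.

```python
from dataclasses import dataclass
from typing import Generic, Literal, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


@dataclass(frozen=True)
class FatalLoadError:
    reason: str


@dataclass(frozen=True)
class LoadOutcome:
    skills: tuple[str, ...] = ()
    per_file_errors: tuple[str, ...] = ()


Confidence = Literal["high", "medium", "low"]


def compute_confidence(result: Union[Ok[LoadOutcome], Err[FatalLoadError]]) -> Confidence:
    """Pure policy function: every branch returns, so no trailing else: raise."""
    if isinstance(result, Err):
        return "low"  # fatal loader error: nothing trustworthy to index
    outcome = result.value  # narrowed to Ok[LoadOutcome] here
    if not outcome.per_file_errors:
        return "high"  # clean load, including an empty skills tree
    if outcome.skills:
        return "medium"  # partial success: some skills, some per-file errors
    return "low"  # every file failed
```

Because the function is pure, each row of the policy can be unit-tested without a filesystem. That is the point of the functional-core split described above.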

Files-to-touch table reconciliation

The story's Files-to-touch table named 5 entries; I shipped all 5 + the schema. The schema's status was "S6-08 dependency, AC-19 fails loudly until S6-08 lands it" — I chose to land it now (see surprise #3 above). No story files were dropped or skipped.

Suggested commit message

feat(phase2/S6-01): GREEN — SkillsIndexProbe Layer D body-byte-free indexer

Lands src/codegenie/probes/layer_d/skills_index.py as a
@register_probe(heaviness="light") Layer-D probe that projects
SkillsLoader.load_all() output into a typed SkillsIndexSlice carrying
the two indices the Planner queries (applies_to_tasks,
applies_to_languages), body byte-offset/size/BLAKE3 anchors, three-key
tier_counts derived via filesystem enumeration, and per-file-error JSON
round-trips. Bodies are NEVER opened by the probe — the loader (S2-01)
recorded body_offset/body_size/body_blake3 in one streaming pass; this
probe re-uses those anchors. tracemalloc test on a 100 MB sparse-body
fixture verifies peak <1 MB; AC-11 source-grep interdicts os.open /
os.read / .read_bytes / .read_text / .open( anywhere in the module.

Functional-core / imperative-shell split (4 pure helpers + orchestration
shell). Newtypes preserved end-to-end (SkillId/TaskClassId/Language —
no primitive-obsession laundering into raw str). Three-state confidence
policy (high on clean load, medium on partial success, low on total
failure or FatalLoadError). Schema shipped at
src/codegenie/schema/probes/skills_index.schema.json so AC-19 verifies
on this PR (Implementation Outline §3 offered deferring to S6-08; the
schema-before-consumer choice avoids the loud-until-S6-08 failure mode).

22 ACs verified with runtime evidence; 27 unit/integration tests added
(parametrized field-coverage + Hypothesis projection property included).
Full suite green: 2696 passed, 29 skipped, 2 xfailed. mypy --strict,
ruff, lint-imports, pre-commit, fence — all clean.

Lessons for future Phase 2 stories

  • Probe.version is load-bearing, not optional. Every probe needs version: str = "0.1.0" as a class attribute even though the frozen ABC doesn't declare it. The cache key reads it; missing it crashes 45 tests at gather time. Future Layer-D / Layer-E / Layer-G probes should copy the attribute from the closest sibling (S5-03 family is the canonical reference).
  • AC-11-style source-grep interdicts catch docstring strings. When writing a probe whose tests forbid os.open / .read_bytes / etc. anywhere in the module source, prose docstrings that mention those tokens by name will fail the test. Use synonyms (opendir/readdir, "no file-opening primitives") rather than literal forbidden substrings.
  • "Defer to a future story" can collide with hard runtime ACs. A story's Implementation Outline option to defer a dependency (e.g., schema files to S6-08) won't pass an AC that tests the dependency's existence at runtime. The executor should ship the dependency in the current PR when the AC needs it; flag the choice in the attempt log.
  • The rule-of-three trigger needs to be a concrete predicate. _count_skills_per_tier is per-Layer-D-probe today (rule-of-three not triggered). The next tier-aware Layer-D probe (S6-02 ConventionsCatalogProbe?) is the second; the third is the trigger. Refactor to _count_files_per_tier(search_paths, glob_pattern) under src/codegenie/probes/_shared/tier_counts.py at that point — not before (Rule 2 / YAGNI).
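For when the trigger does fire, here is one possible shape for the shared helper. It is hypothetical and should not land before the third tier-aware probe. Only the (search_paths, glob_pattern) signature idea, the "missing tier counts 0" rule (AC-14) and the "shadowed files still count" rule (AC-18) come from this log. The positional user/repo/org order and the skipping of non-file matches are assumptions.

```python
from collections.abc import Sequence
from pathlib import Path

# Assumption: search paths arrive in a fixed user -> repo -> org order.
TIERS: tuple[str, ...] = ("user", "repo", "org")


def count_files_per_tier(search_paths: Sequence[Path], glob_pattern: str) -> dict[str, int]:
    """Count files matching glob_pattern under each tier root (hypothetical helper).

    Uses directory enumeration and stat only; no file is opened.
    """
    if len(search_paths) != len(TIERS):
        raise ValueError(f"expected {len(TIERS)} search paths, got {len(search_paths)}")
    counts: dict[str, int] = {}
    for tier, root in zip(TIERS, search_paths):
        if not root.is_dir():
            counts[tier] = 0  # missing tier: count 0, do not raise (AC-14)
            continue
        # Every on-disk match counts, shadowed duplicates included (AC-18).
        counts[tier] = sum(1 for p in root.glob(glob_pattern) if p.is_file())
    return counts
```

A Layer-D caller would invoke it as count_files_per_tier([user_dir, repo_dir, org_dir], "*/SKILL.md"), where the */SKILL.md layout is again an assumption about how skills sit on disk.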