Story S6-01 — SkillsIndexProbe Layer D¶
Step: Step 6 — Ship Layer D + E + G probes (skills, conventions, ADRs, ownership, scanners)
Status: Done — GREEN 2026-05-17 (phase-story-executor; see _attempts/S6-01.md for the per-AC evidence table + gate log)
Effort: S
Depends on: S2-01 (SkillsLoader three-tier merge with O_NOFOLLOW + body_offset/body_size/body_blake3 recorded on Skill; per-file errors surfaced via LoadOutcome.per_file_errors)
ADRs honored: 02-ADR-0005 (no plaintext persistence — body bytes are never read into memory; only the offset/size/BLAKE3 anchors persist), Phase 1 ADR-0006 (safe_yaml chokepoint — frontmatter loads exclusively via safe_yaml.load), 02-ADR-0003 (@register_probe(heaviness=…) is decorator-side, not an ABC field)
Phase-2 commitment honored: "Progressive disclosure for context" (CLAUDE.md, final-design.md §"Open Q 3") — the probe records anchors to skill bodies, not the bodies themselves. The Planner reads bodies directly via the recorded body_offset at decision time.
Validation notes (2026-05-17, phase-story-validator v1)¶
This story was hardened in place against twelve block-severity contract mismatches with the actual S2-01 implementation (the loader returns Result[LoadOutcome, FatalLoadError], not Result[list[Skill], SkillsLoadError]; the Probe ABC is async def run(self, repo, ctx) with name: str = … not _run(self, ctx) with probe_id = ProbeId(…); Skill.applies_to_* carry the TaskClassId/Language newtypes, not str; Skill.body_blake3 is blake3:<64hex> not bare hex; ProbeOutput carries raw_artifacts/duration_ms/warnings, not probe_id; schemas live flat at src/codegenie/schema/probes/*.schema.json, not under a layer_d/ subdir; ProbeContext is a stdlib @dataclass with no for_test classmethod; the registry is default_registry: Registry, not _PROBE_REGISTRY; tier identity is not propagated through LoadOutcome so tier_counts is derived from a separate filesystem walk, not from the loader's return). Six new ACs added (empty fixture, FatalLoadError handling, per-file-error surfacing with the three-level confidence policy, shadowed-skill de-duplication propagation, byte-identical raw artifact, helper-driven ProbeContext construction), three mutation-resistance hardens (parametrized field-coverage smoke; property-based projection monotonicity; module-source os.open/os.read interdict), and one design-pattern harden (extract _project_skill and _count_skills_per_tier as pure helpers — functional core / imperative shell). The original draft's tier_counts derivation was unimplementable as written (the loader does not expose tier membership per skill); the harden routes it through a pure filesystem-enumeration helper that does not read body bytes and is therefore consistent with AC-16. Full report: _validation/S6-01-skills-index-probe.md. Verdict: HARDENED.
Context¶
The Planner needs to know which Skills apply to which task classes and languages when it dispatches a remediation. It does not need the skill bodies inlined into repo-context.yaml — that would explode the artifact (a typical skill body is ~3 KB; a typical install has 100+ skills; bodies are markdown the Planner will read fresh anyway). The body byte-offset + size + BLAKE3 are the recorded anchors; the gather pipeline never opens the body bytes for this probe.
SkillsLoader (S2-01) already does the three-tier merge (~/.codegenie/skills/, .codegenie/skills/, ~/.codegenie/skills-org/) and returns Result[LoadOutcome, FatalLoadError]. On Ok, LoadOutcome carries skills: list[Skill] (de-duplicated first-tier-wins) and per_file_errors: list[SkillsLoadError] (the discriminated union over SymlinkRefused | UnsafeYaml | FrontmatterUnterminated | SchemaViolation | IoFailure). Each Skill carries body_offset: int, body_size: int, and body_blake3: str (regex-pinned ^blake3:[0-9a-f]{64}$) from the loader's single streaming pass over the file. applies_to_tasks and applies_to_languages are typed list[TaskClassId] and list[Language] respectively.
The probe's job is purely indexing: project each Skill into a typed slice with the two query keys the Planner uses — applies_to_tasks and applies_to_languages — plus the offset/size/blake3 anchors, plus a tier-counts derived statistic. The body bytes never enter process memory inside this probe (the loader already paid the streaming hash cost).
The load-bearing observable: a 100 MB-body skill fixture (one SKILL.md with a 100 MB body section) gathers without the body ever entering process memory. tracemalloc peak attributable to SkillsIndexProbe.run plus the loader call it triggers must stay under 1 MB total (a generous ceiling that comfortably covers Pydantic-model allocations for the typical install of ≤ 100 skills; the loader's own envelope is ~256 KiB per its docstring §2). The 5,000x amplification factor (1 MB ceiling vs. 100 MB body) is what makes the assertion mutation-resistant: any future .read_bytes() over the body would blow the budget immediately.
References — where to look¶
- Architecture:
../phase-arch-design.md§"Component design" #9SkillsLoader— the loader returnsResult[LoadOutcome, FatalLoadError];LoadOutcome.skillsis the de-duplicated first-tier-wins list;LoadOutcome.per_file_errorscarries the discriminated union ofSymlinkRefused | UnsafeYaml | FrontmatterUnterminated | SchemaViolation | IoFailure; constructor pure, first I/O isload_all().../phase-arch-design.md§"Edge cases" rows 8/9/16 — symlink-refused, hostile-YAML, and three-tier collision are per-file failures (this skill skipped; other skills load; loud CLI warning). They do NOT collapse the whole probe toconfidence="low".../phase-arch-design.md§"Design patterns applied" row "Schema before consumer" — the slice has at least one Phase 2 consumer (tests/integration/probes/test_skills_index_probe.py).- Phase ADRs:
../ADRs/0005-secret-findings-no-plaintext-persistence.md— bodies are never loaded; the probe's commitment to anchors-only is the same chokepoint discipline applied to a different surface.../ADRs/0003-coordinator-heaviness-sort-annotation.md—heavinessis a@register_probe(heaviness=…)kwarg, NOT aProbeABC field; verified viadefault_registry.- Source design:
../High-level-impl.md§"Step 6" first bullet — slice shape and the body-offset-recorded / bodies-not-loaded commitment.../../localv2.md§5.4 D2 SkillsIndexProbe — slice shape; "Adding a new variant is dropping a new SKILL.md. No probe changes."- Existing kernel (S2-01 landed):
src/codegenie/skills/loader.py(S2-01) —SkillsLoader.load_all() -> Result[LoadOutcome, FatalLoadError];SkillsLoader.default()classmethod returns the pinned three-tier ordering.src/codegenie/skills/model.py(S2-01) —Skill,EvidenceQuery,Tier: TypeAlias = Literal["user","repo","org"],TIERS: Final[tuple[Tier,...]] = ("user","repo","org").src/codegenie/probes/base.py—ProbeABC (name: str,layer,tier,applies_to_*: list[str],requires: list[str],timeout_seconds: int,async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput). Frozen contract — story DOES NOT propose edits.src/codegenie/probes/registry.py—default_registry: Registry;@register_probe(heaviness=…, runs_last=…)decorator;ProbeRegEntry(cls, heaviness, runs_last, registration_index).src/codegenie/types/identifiers.py—SkillId,TaskClassId,Language,ProbeIdnewtypes (re-exported by import).src/codegenie/schema/probes/dep_graph.schema.jsonetc. — Phase 2 schemas live FLAT undersrc/codegenie/schema/probes/(no per-layer subdir).- Test precedent:
tests/unit/probes/layer_b/test_dep_graph.py— canonical sibling test usingasyncio.run(probe.run(_make_repo(...), ctx))to invoke anasyncprobe. Mirror this idiom.tests/unit/skills/test_loader.py(S2-01) — pattern for fixture construction,content_hash_bytesfor BLAKE3-prefixed comparison,_write_skill-style helpers.
Goal¶
Implement src/codegenie/probes/layer_d/skills_index.py as a @register_probe(heaviness="light") probe that calls SkillsLoader.default().load_all() (or accepts caller-resolved search paths via ctx.config), projects each loaded Skill into an IndexedSkill typed-record carrying id: SkillId, applies_to_tasks: tuple[TaskClassId, ...], applies_to_languages: tuple[Language, ...], body_offset: int, body_size: int, body_blake3: str (the blake3:<64hex> form the loader produces — preserved verbatim, no re-hash), and emits a ProbeOutput whose schema_slice is a deterministic sorted list of IndexedSkill rows, a tier_counts: dict[Literal["user","repo","org"], int], and a per_file_errors: list[…] summary. The probe MUST NOT open any skill file (os.open, Path.open, read_bytes, read_text are all forbidden in the probe module's source); the loader paid the open/scan/hash cost once. Tier counts derive from a pure enumeration helper that walks SKILL.md filenames under each tier (filename-only — Path.rglob("SKILL.md"), no file reads), not from any field on LoadOutcome.
Acceptance criteria¶
Numbered for traceability to the TDD plan.
Module layout & types¶
- [ ] AC-1.
src/codegenie/probes/layer_d/__init__.pyexists (empty package marker; one-line module docstring naming "Layer D — organizational-knowledge probes (skills, conventions, ADRs, policy, exceptions, repo notes, repo config, external docs)"). - [ ] AC-2.
src/codegenie/probes/layer_d/skills_index.pyexports exactly__all__ = ["IndexedSkill", "SkillsIndexProbe", "SkillsIndexSlice"](alphabetical). - [ ] AC-3.
IndexedSkillis a PydanticBaseModelwithmodel_config = ConfigDict(frozen=True, extra="forbid")carrying exactly:id: SkillId,applies_to_tasks: tuple[TaskClassId, ...],applies_to_languages: tuple[Language, ...],body_offset: Annotated[int, Field(ge=0)],body_size: Annotated[int, Field(ge=0)],body_blake3: Annotated[str, Field(pattern=r"^blake3:[0-9a-f]{64}$")]. Tuples (not lists) because the slice is hash-stable. The three newtypes are preserved end-to-end (NOtuple[str, ...]— primitive-obsession regression flagged byADR-0033 §1andphase-arch-design §"Anti-patterns avoided"). - [ ] AC-4.
SkillsIndexSliceis a PydanticBaseModelwithmodel_config = ConfigDict(frozen=True, extra="forbid")carrying:skills: tuple[IndexedSkill, ...](sorted byidascending; lexicographic stable),tier_counts: dict[Literal["user", "repo", "org"], int](always exactly the three keys, never partial),per_file_errors: tuple[str, ...](each element is the JSON dumpmodel_dump_json()of oneSkillsLoadErrorvariant — strings, not raw models, becausedict[str, Any]slice surface forbids nested Pydantic).
Probe registration¶
- [ ] AC-5.
SkillsIndexProbeis@register_probe(heaviness="light")(kwarg form — 02-ADR-0003); class attributes are exactlyname: str = "skills_index",layer = "D",tier = "base",applies_to_tasks: list[str] = ["*"],applies_to_languages: list[str] = ["*"],requires: list[str] = [],timeout_seconds: int = 10,declared_inputs: list[str] = [...](set in__init__to the three search-path tokens — see AC-7). All list-typed because the frozen Phase-0ProbeABC islist[str], not tuple (the Pydantic slice uses tuples; the ABC contract is unchanged). - [ ] AC-6.
async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput:is the implementation entry point (NOT a private_run— story aligns to theasync-Probe contract proven byDepGraphProbe,IndexHealthProbe, etc.). The method callsSkillsLoader(search_paths=self._resolve_search_paths(repo, ctx)).load_all()and pattern-matches theResult: - On
Ok(LoadOutcome(skills=skills, per_file_errors=errors))it projects skills via_project_skill, sorts byid, counts tiers via_count_skills_per_tier(search_paths), and emitsProbeOutput(schema_slice=slice_.model_dump(mode="json"), raw_artifacts=[raw_path], confidence=<see AC-9>, duration_ms=elapsed_ms, warnings=[], errors=[err.model_dump_json() for err in errors]). - On
Err(FatalLoadError(attempted=…))it emitsProbeOutput(schema_slice={"skills": [], "tier_counts": {"user":0,"repo":0,"org":0}, "per_file_errors": []}, raw_artifacts=[], confidence="low", duration_ms=elapsed_ms, warnings=[], errors=[fatal.model_dump_json()]).
Search-path resolution¶
- [ ] AC-7.
_resolve_search_paths(self, repo, ctx) -> list[Path]is a pure method on the probe (no I/O) that returns[user_tier, repo_tier, org_tier]resolved as follows:user_tierisPath(ctx.config.get("skills.user_path", "~/.codegenie/skills/")).expanduser();repo_tierisrepo.root / Path(ctx.config.get("skills.repo_path", ".codegenie/skills/"));org_tierisPath(ctx.config.get("skills.org_path", "~/.codegenie/skills-org/")).expanduser(). The probe'sdeclared_inputs(set in__init__) names the three string tokens"skills_user_search_path:<expanded>","skills_repo_search_path:.codegenie/skills/","skills_org_search_path:<expanded>"so cache invalidation fires when any tier path config changes (mirrors thedep_graph_strategy_set:<resolved>precedent atsrc/codegenie/probes/layer_b/dep_graph.py:420).
Confidence policy (three-state, mutation-resistant)¶
- [ ] AC-9. Confidence is computed by a pure helper
_compute_confidence(skills, per_file_errors) -> Literal["high","medium","low"]: "high"iffper_file_errors == [](every discovered file loaded cleanly), independent of whetherskills == [](an empty install is still a clean state)."medium"iffper_file_errors != []ANDskills != [](partial success — some skills loaded, some failed)."low"iffper_file_errors != []ANDskills == [](every discovered file failed; the loader found candidates but produced nothing usable).
Bodies-never-loaded — load-bearing structural proof¶
- [ ] AC-10. Bodies never loaded (tracemalloc). A
tracemalloc.start()→asyncio.run(probe.run(...))→tracemalloc.get_traced_memory()test with a fixture whoseSKILL.mdcarries a 100 MB body section assertspeak < 1024 * 1024(1 MB). Mutation caught: any futurepath.read_bytes()over the skill body would blow the budget by ~100x. The 1 MB ceiling comfortably absorbs the loader's own ~256 KiB envelope plus Pydantic-model allocations for a 4-skill fixture; tightening to 20 KB (the original draft's ceiling) is brittle against Pydantic version drift. - [ ] AC-11. No file open in the probe module (source-level interdict).
inspect.getsource(skills_index_module)does NOT contain any of the strings"os.open","os.read",".read_bytes",".read_text",".open("anywhere in the module. (The patternPath(...).openis also forbidden via the substring".open(".) The loader does every open; the probe is a pure consumer ofSkillrecords.
Anchor integrity¶
- [ ] AC-12. Anchors round-trip — BLAKE3 prefix-aware. For each indexed skill,
body_offset + body_size <= os.stat(path).st_size; opening the file in the test (not the probe) atbody_offset, readingbody_sizebytes, and computingcodegenie.hashing.content_hash_bytes(body)(which produces the canonicalblake3:<64hex>form) equalsindexed.body_blake3exactly. The test is allowed to read; the probe is not. Mutation caught: hashing the wrong byte range, dropping theblake3:prefix, or recomputing with a different hash function.
Determinism & sorting¶
- [ ] AC-13. Slice is sorted, sort is stable, JSON is byte-identical.
[s.id for s in slice_.skills] == sorted([s.id for s in slice_.skills]). Two consecutiveasyncio.run(probe.run(...))invocations on the same fixture produce byte-identicaljson.dumps(output.schema_slice, sort_keys=True). Mutation caught: dict-iteration-order leakage, non-stable sort, recomputed timestamps escaping into the slice.
Tier counts (filesystem-enumeration-derived, body-free)¶
- [ ] AC-14. Tier counts via pure enumeration. When the fixture has 3 SKILL.md files under
user_tier, 1 underrepo_tier, 0 underorg_tier, the slice'stier_counts == {"user": 3, "repo": 1, "org": 0}exactly. Counts are derived from_count_skills_per_tier(search_paths: list[Path]) -> dict[Tier, int](pure helper that doeslen(list(path.rglob("SKILL.md")))per tier — filename enumeration only, no file reads, no symlink follow; missing tier paths count as 0). Counts are pre-de-duplication (a shadowed skill counts as 1 in the tier where the file lives) sosum(tier_counts.values()) >= len(slice_.skills).
Empty & fatal corners¶
- [ ] AC-15. Empty fixture. With no
SKILL.mdfiles anywhere under any tier, the slice hasskills == (),tier_counts == {"user":0,"repo":0,"org":0},per_file_errors == (), andconfidence == "high"(clean empty state, not an error). - [ ] AC-16. FatalLoadError handling. If
SkillsLoader.load_all()returnsErr(FatalLoadError(reason="all_tiers_unreadable", attempted=[…])), the probe emitsconfidence="low", schema_slice with empty skills and zeroed tier_counts, anderrors=[<json of FatalLoadError>]. The probe does NOT re-raise the FatalLoadError.
Per-file errors surface (do not swallow)¶
- [ ] AC-17. Per-file errors round-trip. A fixture with one good
SKILL.mdand one symlinkedSKILL.mdproducesconfidence="medium",len(slice_.skills) == 1,len(slice_.per_file_errors) == 1, andjson.loads(slice_.per_file_errors[0])["reason"] == "symlink_refused". Mutation caught: silently droppingper_file_errors, conflating "some failed" with "all failed", erroneously raising on symlink. (Replaces the original draft's AC-11 which incorrectly assumedResult.Erron symlink — per-file failures stay insideOk(LoadOutcome).)
Shadowing propagation¶
- [ ] AC-18. Shadowed-skill de-duplication propagates. A fixture with the same
id: alphaskill under both user and repo tiers produces exactly oneIndexedSkill(id=SkillId("alpha"))in the slice (first-tier-wins; user wins), ANDtier_counts == {"user": 1, "repo": 1, "org": 0}(both files are present on disk; only one survives de-dup; the count reflects on-disk presence — operators reading the report see that a shadow happened). The loader'sskill_shadowedstructlog warning is observable but the probe does NOT need to re-emit it (already-loud at S2-01).
Schema validation¶
- [ ] AC-19. Schema slice validates.
tests/unit/probes/layer_d/test_skills_index.py::test_slice_matches_subschemaasserts the JSON-dumped slice round-trips throughsrc/codegenie/schema/probes/skills_index.schema.json(FLAT layout —schema/probes/, NOTschema/probes/layer_d/; the existingdep_graph.schema.json,index_health.schema.jsonetc. precedents prove the flat convention). The sub-schema ships in S6-08; this AC pins the consumer-side import path so S6-08 failing to ship the schema (or shipping it under the wrong name) is loud, not silent. Test imports viafrom importlib.resources import files; files("codegenie.schema.probes") / "skills_index.schema.json".
Registry annotation¶
- [ ] AC-20. Registry annotation —
heaviness="light". A test importsfrom codegenie.probes.registry import default_registry, finds the entry viaentry = next(e for e in default_registry._entries if e.cls.name == "skills_index")(or via a publicRegistry.for_task(...)if one exists), and assertsentry.heaviness == "light"andentry.runs_last is False. Uses the actual registry surface — there is no_PROBE_REGISTRYdict (the original draft was a documentation-bug; the canonical surface isdefault_registry: Registry).
Static typing¶
- [ ] AC-21.
mypy --strictpasses onsrc/codegenie/probes/layer_d/skills_index.py; noAnyescapes the slice.SkillId/TaskClassId/Languageare preserved through projection (nocast(str, …)to launder the newtypes into rawstr). The pydantic models forbidAnyfield types.
Mutation-resistance / property-based¶
- [ ] AC-22. Field-coverage parametrized smoke. A parametrized test iterates over
IndexedSkill.model_fields.keys()and asserts the projectedIndexedSkillfor a canonical fixture has every field set to a non-default value (e.g.,body_offset > 0,body_size > 0,body_blake3starts with"blake3:",idnon-empty, bothapplies_to_*tuples non-empty). Mutation caught: silently dropping a field in_project_skill(e.g., forgettingapplies_to_languages). - [ ] AC-23. Hypothesis property — projection is order-preserving and cardinality-preserving. A
@given(skill_lists())property test (using a Hypothesis strategy that buildsSkillinstances directly, no filesystem) asserts: len(_project_skills_sorted(skills)) == len({s.id for s in skills})(cardinality matches the input's unique-by-id count — proves projection neither duplicates nor drops).- The projected tuple is sorted:
[s.id for s in _project_skills_sorted(skills)] == sorted({s.id for s in skills}).
Catches subtle bugs the example-based tests miss (e.g., a stable-sort-but-not-actually-sorted regression on edge cases like single-element or identical-prefix IDs).
Implementation outline¶
- Create
src/codegenie/probes/layer_d/__init__.pywith the one-line module docstring (per AC-1). - Create
src/codegenie/probes/layer_d/skills_index.py: - Module docstring naming arch §"Component design" #9 and the progressive-disclosure commitment.
- Functional core (top of file):
_project_skill(skill: Skill) -> IndexedSkill,_project_skills_sorted(skills: Sequence[Skill]) -> tuple[IndexedSkill, ...],_count_skills_per_tier(search_paths: list[Path]) -> dict[Tier, int],_compute_confidence(skills, per_file_errors) -> Literal["high","medium","low"]. All pure (no I/O except_count_skills_per_tier'srglob, which reads directory entries only — not file bodies). Each helper is independently testable without a Probe instance. IndexedSkill(BaseModel)andSkillsIndexSlice(BaseModel)per AC-3/AC-4.- Imperative shell:
@register_probe(heaviness="light")class SkillsIndexProbe(Probe):with the ABC-correct class attributes (AC-5).__init__(self) -> None: setsself.declared_inputsbased on the resolved search-path tokens (AC-7)._resolve_search_paths(self, repo: RepoSnapshot, ctx: ProbeContext) -> list[Path]: pure resolution (AC-7).async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput: dispatches toSkillsLoader(search_paths=...).load_all(), pattern-matches theResult, calls the four pure helpers, writes the raw artifact toctx.output_dir / "skills-index.json", and returns theProbeOutput.
- Add
src/codegenie/schema/probes/skills_index.schema.jsonplaceholder OR — preferred — leave the schema file as an S6-08 dependency and have AC-19 fail loudly until S6-08 lands it. The implementer flags this as an explicit S6-08 dependency in the PR description. - Write
tests/unit/probes/layer_d/__init__.py(empty marker) andtests/unit/probes/layer_d/test_skills_index.pyper the TDD plan. - Write
tests/integration/probes/test_skills_index_probe.pyexercising the 100 MB-body fixture undertracemalloc(AC-10) — sparse-file generation (os.truncate) to keep CI wall-clock under 1 s.
TDD plan — red / green / refactor¶
Red — write the failing tests first¶
Test file path: tests/unit/probes/layer_d/test_skills_index.py. Each test is keyed to one or more ACs and names the mutation it catches.
# tests/unit/probes/layer_d/test_skills_index.py
"""Unit + integration tests for SkillsIndexProbe (S6-01).
Each test is keyed to an AC and names the mutation it catches in its
docstring (Rule 9 — tests verify intent).
"""
from __future__ import annotations
import asyncio
import inspect
import json
import logging
import tracemalloc
from pathlib import Path
from typing import Any
import pytest
from codegenie.hashing import content_hash_bytes
from codegenie.probes.base import ProbeContext, ProbeOutput, RepoSnapshot
from codegenie.probes.layer_d import skills_index as si
from codegenie.probes.registry import default_registry
from codegenie.types.identifiers import SkillId
def _make_context(tmp_path: Path, *, config_overrides: dict[str, Any] | None = None) -> ProbeContext:
"""Construct a ``ProbeContext`` with every required field explicit.
Catches missing-required-field regressions immediately: any future
addition to the dataclass without an ADR amendment fails *this* line
rather than each individual test in a flaky way.
"""
config: dict[str, Any] = {
"skills.user_path": str(tmp_path / "user" / "skills"),
"skills.repo_path": ".codegenie/skills/",
"skills.org_path": str(tmp_path / "user" / "skills-org"),
}
if config_overrides is not None:
config.update(config_overrides)
return ProbeContext(
cache_dir=tmp_path / "cache",
output_dir=tmp_path / "out",
workspace=tmp_path / "work",
logger=logging.getLogger("test"),
config=config,
)
def _make_repo(tmp_path: Path) -> RepoSnapshot:
return RepoSnapshot(root=tmp_path / "repo", git_commit=None, detected_languages={}, config={})
def _write_skill(
path: Path, *, applies_to_tasks: list[str], applies_to_languages: list[str] = ["*"], body: str = "body\n"
) -> None:
"""Write a SKILL.md file. Caller pre-creates the parent dir."""
path.write_text(
"---\n"
f"id: {path.parent.name}\n"
f"applies_to_tasks: {applies_to_tasks!r}\n"
f"applies_to_languages: {applies_to_languages!r}\n"
"---\n"
+ body
)
def _run_probe(repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput:
"""Sync wrapper for ``async def run`` (mirrors test_dep_graph.py:108 idiom)."""
return asyncio.run(si.SkillsIndexProbe().run(repo, ctx))
# --- AC-3, AC-4, AC-13 --------------------------------------------------------
def test_slice_is_sorted_and_frozen(tmp_path: Path) -> None:
"""AC-3, AC-13. Mutation caught: returning skills in load order
(would break two-consecutive-gathers determinism) — assertion pins
lexical sort on ``id``. Also: changing ``tuple`` to ``list`` on
IndexedSkill — hash-stability of the slice depends on tuple."""
user_dir = tmp_path / "user" / "skills"
(user_dir / "zebra").mkdir(parents=True)
(user_dir / "alpha").mkdir()
_write_skill(user_dir / "zebra" / "SKILL.md", applies_to_tasks=["distroless_migration"])
_write_skill(user_dir / "alpha" / "SKILL.md", applies_to_tasks=["vulnerability_remediation"])
output = _run_probe(_make_repo(tmp_path), _make_context(tmp_path))
slice_ = si.SkillsIndexSlice.model_validate(output.schema_slice)
assert [s.id for s in slice_.skills] == ["alpha", "zebra"]
with pytest.raises(Exception): # frozen Pydantic model
slice_.skills[0].id = "mutated" # type: ignore[misc]
def test_two_consecutive_gathers_byte_identical_json(tmp_path: Path) -> None:
"""AC-13 (second clause). Mutation caught: any timestamp leakage,
dict-iteration-order escape, or non-stable sort."""
user_dir = tmp_path / "user" / "skills" / "a"
user_dir.mkdir(parents=True)
_write_skill(user_dir / "SKILL.md", applies_to_tasks=["t1"])
out1 = _run_probe(_make_repo(tmp_path), _make_context(tmp_path))
out2 = _run_probe(_make_repo(tmp_path), _make_context(tmp_path))
assert json.dumps(out1.schema_slice, sort_keys=True) == json.dumps(out2.schema_slice, sort_keys=True)
# --- AC-10, AC-11 — load-bearing progressive-disclosure -----------------------
def test_tracemalloc_peak_under_1mb_on_100mb_body(tmp_path: Path) -> None:
"""AC-10. Mutation caught: any future ``path.read_bytes()`` /
``open(path).read()`` over the body section. The fixture has a 100 MB
body; reading it would blow the budget by ~100x.
Uses a sparse file (os.truncate) so the test wall-clock stays under 1 s
while the file-size invariant the loader walks is real.
"""
import os as _os
user_dir = tmp_path / "user" / "skills" / "big"
user_dir.mkdir(parents=True)
skill_path = user_dir / "SKILL.md"
skill_path.write_text(
"---\nid: big\napplies_to_tasks: ['t1']\napplies_to_languages: ['*']\n---\n"
)
# Extend the file to 100 MB via sparse truncate (O(1) on most filesystems).
with skill_path.open("rb+") as fh:
fh.seek(100 * 1024 * 1024 - 1)
fh.write(b"\0")
_os.utime(skill_path, None)
tracemalloc.start()
try:
output = _run_probe(_make_repo(tmp_path), _make_context(tmp_path))
_current, peak = tracemalloc.get_traced_memory()
finally:
tracemalloc.stop()
assert peak < 1024 * 1024, f"peak {peak} bytes exceeds 1 MB ceiling"
assert output.confidence in ("high", "medium") # not "low"
def test_probe_module_source_has_no_file_open(tmp_path: Path) -> None:
"""AC-11. Architectural test: source-introspect the WHOLE module
(not just the class) and confirm no file-opening primitives appear.
Mutation caught: a "convenience" addition like
``body_preview = path.read_text()[:200]`` would fail this test
immediately, before tracemalloc has to catch it at runtime.
"""
src = inspect.getsource(si)
for forbidden in ("os.open", "os.read", ".read_bytes", ".read_text", ".open("):
assert forbidden not in src, (
f"SkillsIndex module must not open skill body bytes; found {forbidden!r}. "
"The loader (S2-01) records body_offset/body_size/body_blake3 once; "
"the probe re-uses those anchors. Body reads are the Planner's job."
)
# --- AC-12 — anchors round-trip (BLAKE3 prefix-aware) ------------------------
def test_recorded_anchors_match_actual_body_blake3(tmp_path: Path) -> None:
"""AC-12. Mutation caught: hashing the wrong byte range (e.g.,
including the closing ``---`` separator), dropping the ``blake3:``
prefix, or hashing the frontmatter instead of the body."""
user_dir = tmp_path / "user" / "skills" / "small"
user_dir.mkdir(parents=True)
skill_path = user_dir / "SKILL.md"
_write_skill(skill_path, applies_to_tasks=["t1"], body="body bytes\n")
output = _run_probe(_make_repo(tmp_path), _make_context(tmp_path))
slice_ = si.SkillsIndexSlice.model_validate(output.schema_slice)
indexed = slice_.skills[0]
raw = skill_path.read_bytes() # test is allowed to read; the probe is not.
body = raw[indexed.body_offset : indexed.body_offset + indexed.body_size]
assert content_hash_bytes(body) == indexed.body_blake3 # ``blake3:<64hex>``
assert indexed.body_offset + indexed.body_size <= len(raw)
assert indexed.body_blake3.startswith("blake3:")
# --- AC-14 — tier counts (filesystem-enumeration-derived) ---------------------
def test_tier_counts_match_three_tier_layout(tmp_path: Path) -> None:
"""AC-14. Mutation caught: counting *all* skills as "user" tier;
miscounting empty org tier as 1 instead of 0; missing tier dir
raising instead of counting 0."""
user = tmp_path / "user" / "skills"
repo_root = tmp_path / "repo"
repo_dir = repo_root / ".codegenie" / "skills"
# org tier intentionally never created on disk
user.mkdir(parents=True)
repo_dir.mkdir(parents=True)
for name in ("a", "b", "c"):
(user / name).mkdir()
_write_skill(user / name / "SKILL.md", applies_to_tasks=["t1"])
(repo_dir / "x").mkdir()
_write_skill(repo_dir / "x" / "SKILL.md", applies_to_tasks=["t1"])
repo = RepoSnapshot(root=repo_root, git_commit=None, detected_languages={}, config={})
output = _run_probe(repo, _make_context(tmp_path))
slice_ = si.SkillsIndexSlice.model_validate(output.schema_slice)
assert slice_.tier_counts == {"user": 3, "repo": 1, "org": 0}
# --- AC-15 — empty fixture ----------------------------------------------------
def test_empty_fixture_yields_high_confidence(tmp_path: Path) -> None:
"""AC-15. Mutation caught: treating zero skills as an error
(returning ``confidence="low"`` on a clean empty install)."""
output = _run_probe(_make_repo(tmp_path), _make_context(tmp_path))
slice_ = si.SkillsIndexSlice.model_validate(output.schema_slice)
assert slice_.skills == ()
assert slice_.tier_counts == {"user": 0, "repo": 0, "org": 0}
assert slice_.per_file_errors == ()
assert output.confidence == "high"
# --- AC-17 — per-file errors round-trip --------------------------------------
def test_symlinked_skill_yields_medium_confidence_no_raise(tmp_path: Path) -> None:
"""AC-17. Mutation caught: re-raising ``SkillsLoadError`` from the
probe (would break Phase 0 coordinator failure-isolation); swallowing
``per_file_errors`` silently; conflating partial-success with
total-failure."""
user_dir = tmp_path / "user" / "skills"
(user_dir / "good").mkdir(parents=True)
_write_skill(user_dir / "good" / "SKILL.md", applies_to_tasks=["t1"])
(user_dir / "linked").mkdir()
real = tmp_path / "real_skill.md"
real.write_text("---\nid: malicious\napplies_to_tasks: ['*']\napplies_to_languages: ['*']\n---\nx\n")
(user_dir / "linked" / "SKILL.md").symlink_to(real)
output = _run_probe(_make_repo(tmp_path), _make_context(tmp_path))
slice_ = si.SkillsIndexSlice.model_validate(output.schema_slice)
assert output.confidence == "medium"
assert len(slice_.skills) == 1
assert len(slice_.per_file_errors) == 1
assert json.loads(slice_.per_file_errors[0])["reason"] == "symlink_refused"
# --- AC-18 — shadowed-skill de-duplication propagates -----------------------
def test_shadowed_skill_propagates_first_tier_wins(tmp_path: Path) -> None:
"""AC-18. Mutation caught: emitting both copies (de-dup leaked);
tier_counts collapsing to 1+0 instead of 1+1 (on-disk presence
intentionally surfaces the shadow to operators)."""
user = tmp_path / "user" / "skills" / "alpha"
user.mkdir(parents=True)
_write_skill(user / "SKILL.md", applies_to_tasks=["user_wins"])
repo_root = tmp_path / "repo"
repo_dir = repo_root / ".codegenie" / "skills" / "alpha"
repo_dir.mkdir(parents=True)
_write_skill(repo_dir / "SKILL.md", applies_to_tasks=["repo_loses"])
repo = RepoSnapshot(root=repo_root, git_commit=None, detected_languages={}, config={})
output = _run_probe(repo, _make_context(tmp_path))
slice_ = si.SkillsIndexSlice.model_validate(output.schema_slice)
assert [s.id for s in slice_.skills] == ["alpha"]
assert slice_.skills[0].applies_to_tasks == ("user_wins",)
assert slice_.tier_counts == {"user": 1, "repo": 1, "org": 0}
# --- AC-16 — FatalLoadError ---------------------------------------------------
def test_fatal_load_error_yields_low_confidence(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
"""AC-16. Mutation caught: re-raising FatalLoadError, emitting empty
schema_slice (which would fail AC-4's three-key invariant on
tier_counts)."""
from codegenie.result import Err
from codegenie.skills import loader as loader_mod
def _fake_load_all(self: Any) -> Any:
return Err(error=loader_mod.FatalLoadError(attempted=[Path("/nonexistent")]))
monkeypatch.setattr(loader_mod.SkillsLoader, "load_all", _fake_load_all)
output = _run_probe(_make_repo(tmp_path), _make_context(tmp_path))
slice_ = si.SkillsIndexSlice.model_validate(output.schema_slice)
assert output.confidence == "low"
assert slice_.skills == ()
assert slice_.tier_counts == {"user": 0, "repo": 0, "org": 0}
assert any("all_tiers_unreadable" in e for e in output.errors)
# --- AC-20 — registry annotation ---------------------------------------------
def test_registry_heaviness_is_light() -> None:
"""AC-20. Mutation caught: bumping to ``heaviness="medium"`` would
cause the coordinator to reserve the wrong scheduling slot;
forgetting the kwarg form would default to "light" silently. Verifies
against the actual registry surface (``default_registry``), not a
stand-in ``_PROBE_REGISTRY``."""
entry = next(e for e in default_registry._entries if e.cls.name == "skills_index")
assert entry.heaviness == "light"
assert entry.runs_last is False
# --- AC-22 — field-coverage parametrized smoke ------------------------------
@pytest.mark.parametrize("field_name", list(si.IndexedSkill.model_fields.keys()))
def test_every_indexed_skill_field_populated_from_canonical_fixture(
tmp_path: Path, field_name: str
) -> None:
"""AC-22. Mutation caught: silently dropping a field in
``_project_skill`` (e.g., forgetting ``applies_to_languages``).
Parametrized over the model's declared fields so adding a new field
auto-extends the test surface."""
user = tmp_path / "user" / "skills" / "canonical"
user.mkdir(parents=True)
_write_skill(
user / "SKILL.md",
applies_to_tasks=["distroless_migration"],
applies_to_languages=["python"],
body="body body body\n",
)
output = _run_probe(_make_repo(tmp_path), _make_context(tmp_path))
slice_ = si.SkillsIndexSlice.model_validate(output.schema_slice)
indexed = slice_.skills[0]
value = getattr(indexed, field_name)
# Each field must be populated to a non-default-shaped value.
if field_name == "body_blake3":
assert isinstance(value, str) and value.startswith("blake3:") and len(value) == 71
elif field_name in ("body_offset", "body_size"):
assert isinstance(value, int) and value > 0
elif field_name == "id":
assert isinstance(value, str) and value == "canonical"
elif field_name in ("applies_to_tasks", "applies_to_languages"):
assert isinstance(value, tuple) and len(value) >= 1
else:
pytest.fail(f"AC-22 not updated for new field {field_name!r}")
# --- AC-23 — Hypothesis projection property ----------------------------------
def test_projection_is_cardinality_and_order_preserving() -> None:
"""AC-23. Property: for any list of Skill instances with unique ids,
``_project_skills_sorted`` returns a tuple of the same length whose
ids are the sorted set of input ids. Mutation caught: stable-sort
bugs on single-element or identical-prefix IDs; accidental
de-duplication; sort-key drift."""
from hypothesis import given, strategies as st
from codegenie.skills.model import Skill
from codegenie.types.identifiers import Language, SkillId, TaskClassId
@st.composite
def _skills(draw: Any) -> list[Skill]:
ids = draw(st.lists(st.text(min_size=1, max_size=8, alphabet="abcdefg"), min_size=0, max_size=12, unique=True))
return [
Skill(
id=SkillId(i),
applies_to_tasks=[TaskClassId("*")],
applies_to_languages=[Language("*")],
body_offset=10,
body_size=10,
body_blake3="blake3:" + ("0" * 64),
)
for i in ids
]
@given(_skills())
def _prop(skills: list[Skill]) -> None:
projected = si._project_skills_sorted(skills)
assert len(projected) == len({s.id for s in skills})
assert [s.id for s in projected] == sorted({s.id for s in skills})
_prop() # type: ignore[call-arg]
# --- AC-19 — sub-schema validation (consumer side) ---------------------------
def test_slice_matches_subschema(tmp_path: Path) -> None:
"""AC-19. Mutation caught: schema drift — a future change to
``IndexedSkill`` (e.g., renaming ``body_offset`` → ``offset``) would
fail the JSON-Schema round-trip. Schema file ships in S6-08; this
test references it by ``importlib.resources`` so the schema is
`Schema before consumer` (arch §"Design patterns applied")."""
import jsonschema
from importlib.resources import files
schema = json.loads(
(files("codegenie.schema.probes") / "skills_index.schema.json").read_text()
)
user = tmp_path / "user" / "skills" / "a"
user.mkdir(parents=True)
_write_skill(user / "SKILL.md", applies_to_tasks=["t1"])
output = _run_probe(_make_repo(tmp_path), _make_context(tmp_path))
jsonschema.validate(output.schema_slice, schema)
Run pytest tests/unit/probes/layer_d/test_skills_index.py — fails with ModuleNotFoundError. Commit the red marker.
Green — make it pass¶
# src/codegenie/probes/layer_d/skills_index.py
"""SkillsIndexProbe — Layer D, light heaviness.
Projects ``SkillsLoader.load_all()`` output into a typed slice carrying
only the indices the Planner queries (``applies_to_tasks``,
``applies_to_languages``) plus the body byte-offset anchors. Bodies are
NEVER opened by this probe — the loader (S2-01) records
``body_offset``/``body_size``/``body_blake3`` in one pass over
frontmatter via ``safe_yaml.load``; this probe re-uses those anchors.
Sources:
- ../phase-arch-design.md §"Component design" #9 — loader contract.
- ../../localv2.md §5.4 D2 — slice shape.
- ../ADRs/0005-secret-findings-no-plaintext-persistence.md — same
"no plaintext" discipline applied to a different surface.
- ../ADRs/0003-coordinator-heaviness-sort-annotation.md — ``heaviness``
is registry-side, not on the Probe ABC.
"""
from __future__ import annotations
import json
import time
from collections.abc import Sequence
from pathlib import Path
from typing import Annotated, Literal
from pydantic import BaseModel, ConfigDict, Field
from codegenie.probes.base import Probe, ProbeContext, ProbeOutput, RepoSnapshot
from codegenie.probes.registry import register_probe
from codegenie.skills.loader import FatalLoadError, LoadOutcome, SkillsLoader, SkillsLoadError
from codegenie.skills.model import TIERS, Skill, Tier
from codegenie.result import Err, Ok
from codegenie.types.identifiers import Language, SkillId, TaskClassId
__all__ = ["IndexedSkill", "SkillsIndexProbe", "SkillsIndexSlice"]
# ---------------------------------------------------------------------------
# Pydantic models (slice surface).
# ---------------------------------------------------------------------------
class IndexedSkill(BaseModel):
model_config = ConfigDict(frozen=True, extra="forbid")
id: SkillId
applies_to_tasks: tuple[TaskClassId, ...]
applies_to_languages: tuple[Language, ...]
body_offset: Annotated[int, Field(ge=0)]
body_size: Annotated[int, Field(ge=0)]
body_blake3: Annotated[str, Field(pattern=r"^blake3:[0-9a-f]{64}$")]
class SkillsIndexSlice(BaseModel):
model_config = ConfigDict(frozen=True, extra="forbid")
skills: tuple[IndexedSkill, ...]
tier_counts: dict[Literal["user", "repo", "org"], int]
per_file_errors: tuple[str, ...]
# ---------------------------------------------------------------------------
# Functional core — pure helpers, independently testable.
# ---------------------------------------------------------------------------
def _project_skill(skill: Skill) -> IndexedSkill:
"""Project a loaded ``Skill`` into the indexable slice shape.
Newtypes preserved end-to-end (ADR-0033 §1 primitive-obsession).
"""
return IndexedSkill(
id=skill.id,
applies_to_tasks=tuple(skill.applies_to_tasks),
applies_to_languages=tuple(skill.applies_to_languages),
body_offset=skill.body_offset,
body_size=skill.body_size,
body_blake3=skill.body_blake3,
)
def _project_skills_sorted(skills: Sequence[Skill]) -> tuple[IndexedSkill, ...]:
"""Project + lexicographic-stable sort by ``id``."""
return tuple(_project_skill(s) for s in sorted(skills, key=lambda s: s.id))
def _count_skills_per_tier(search_paths: Sequence[Path]) -> dict[Tier, int]:
"""Count SKILL.md filenames per tier — enumeration only, no body reads.
Missing tier paths count as 0. Symlinked SKILL.md still counts as a
discovery (the per-file load is the loader's concern; this is a
presence-on-disk statistic).
"""
counts: dict[Tier, int] = {t: 0 for t in TIERS}
for tier, path in zip(TIERS, search_paths):
if path.exists():
counts[tier] = sum(1 for _ in path.rglob("SKILL.md"))
return counts
def _compute_confidence(
skills: Sequence[Skill], per_file_errors: Sequence[SkillsLoadError]
) -> Literal["high", "medium", "low"]:
"""Three-state confidence (see story AC-9)."""
if not per_file_errors:
return "high"
if skills:
return "medium"
return "low"
# ---------------------------------------------------------------------------
# Imperative shell — the probe class.
# ---------------------------------------------------------------------------
@register_probe(heaviness="light")
class SkillsIndexProbe(Probe):
name: str = "skills_index"
layer = "D"
tier = "base"
applies_to_tasks: list[str] = ["*"]
applies_to_languages: list[str] = ["*"]
requires: list[str] = []
timeout_seconds: int = 10
def __init__(self) -> None:
super().__init__()
# Search-path tokens land in declared_inputs so cache invalidates
# when any tier path config changes (mirrors dep_graph.py:420
# ``dep_graph_strategy_set:<resolved>`` precedent).
self.declared_inputs = [
"skills_user_search_path:~/.codegenie/skills/",
"skills_repo_search_path:.codegenie/skills/",
"skills_org_search_path:~/.codegenie/skills-org/",
]
def _resolve_search_paths(self, repo: RepoSnapshot, ctx: ProbeContext) -> list[Path]:
user = Path(ctx.config.get("skills.user_path", "~/.codegenie/skills/")).expanduser()
repo_tier = repo.root / Path(ctx.config.get("skills.repo_path", ".codegenie/skills/"))
org = Path(ctx.config.get("skills.org_path", "~/.codegenie/skills-org/")).expanduser()
return [user, repo_tier, org]
async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput:
t0 = time.perf_counter()
search_paths = self._resolve_search_paths(repo, ctx)
result = SkillsLoader(search_paths=search_paths).load_all()
if isinstance(result, Err):
slice_ = SkillsIndexSlice(
skills=(),
tier_counts={t: 0 for t in TIERS},
per_file_errors=(),
)
return ProbeOutput(
schema_slice=slice_.model_dump(mode="json"),
raw_artifacts=[],
confidence="low",
duration_ms=int((time.perf_counter() - t0) * 1000),
warnings=[],
errors=[result.error.model_dump_json()],
)
assert isinstance(result, Ok)
outcome: LoadOutcome = result.value
slice_ = SkillsIndexSlice(
skills=_project_skills_sorted(outcome.skills),
tier_counts=_count_skills_per_tier(search_paths),
per_file_errors=tuple(err.model_dump_json() for err in outcome.per_file_errors),
)
confidence = _compute_confidence(outcome.skills, outcome.per_file_errors)
ctx.output_dir.mkdir(parents=True, exist_ok=True)
raw_path = ctx.output_dir / "skills-index.json"
raw_path.write_text(json.dumps(slice_.model_dump(mode="json"), sort_keys=True, indent=2))
return ProbeOutput(
schema_slice=slice_.model_dump(mode="json"),
raw_artifacts=[raw_path],
confidence=confidence,
duration_ms=int((time.perf_counter() - t0) * 1000),
warnings=[],
errors=[e for e in slice_.per_file_errors],
)
Refactor¶
_project_skill,_project_skills_sorted,_count_skills_per_tier,_compute_confidenceare pure module-level functions (functional core). The probe class is the imperative shell — opens no files, reads no bodies, performs no business logic beyond orchestration. This makes every business invariant unit-testable withoutasyncio.runor filesystem fixtures.- No shared "tier classifier" abstraction yet — Layer D has exactly one tier-aware probe (this one); rule-of-three has not triggered. Open/Closed seam documented: when the third tier-aware Layer-D probe appears (S6-02
ConventionsCatalogProbeis tier-aware but with a different tier set; a third would push past rule-of-three), extract_count_files_per_tier(search_paths, glob_pattern)intosrc/codegenie/probes/_shared/tier_counts.py. Tracker: file an issue when S6-02 lands. - The Pydantic models stay co-located with the probe (Layer D's sub-schemas are probe-owned; cross-probe sharing within Layer D is not anticipated).
Files to touch¶
| Path | Why |
|---|---|
src/codegenie/probes/layer_d/__init__.py |
New file — package marker; one-line module docstring naming Layer D's role. |
src/codegenie/probes/layer_d/skills_index.py |
New file — pure helpers + IndexedSkill + SkillsIndexSlice + SkillsIndexProbe. |
tests/unit/probes/layer_d/__init__.py |
New file — empty package marker. |
tests/unit/probes/layer_d/test_skills_index.py |
New file — 14 tests keyed to ACs (including the parametrized field-coverage and Hypothesis property). |
(S6-08 dependency) src/codegenie/schema/probes/skills_index.schema.json |
Forward dependency. AC-19 fails loudly until S6-08 ships the schema; implementer flags in PR description. |
Out of scope¶
- Skill body reading. The probe records anchors only. The Planner reads bodies at decision time via the recorded
body_offset. - Conventions, ADRs, policy, exceptions, repo notes, repo config, external docs. Separate Layer D probes (S6-02, S6-03, S6-04).
- Layer D sub-schema authoring. Ships in S6-08 alongside the freshness registrations; AC-19 surfaces the dependency.
- Three-tier-merge collision handling. S2-01's
SkillsLoaderemitsskill_shadowedwarnings; the probe consumes the de-duplicated list. Surfacing the warning in the slice (separate fromper_file_errors) is a Phase-3 concern. - Hostile-skills-yaml adversarial. Ships in S7-04 (
tests/adv/phase02/test_hostile_skills_yaml.py). - Adding
tiermembership toSkill/LoadOutcome. That is an S2-01 amendment and out of scope for S6-01;tier_countsderives from a separate filesystem walk instead. Documented in "Notes for the implementer" §9 as the rule-of-three trigger. - Editing the Phase 0
ProbeABC orProbeContextdataclass. Frozen contract per ADR-0007.
Notes for the implementer¶
- The 100 MB-body fixture is load-bearing. Use
os.truncate(sparse file) so disk write is O(1); the test still reads the offset/size from a real on-disk path. The 1 MB ceiling ontracemallocpeak is loose-but-load-bearing — it absorbs Pydantic-model allocations comfortably for ≤ 100 skills, and is still 100× tighter than what aread_bytes()regression would consume. Do not tighten to 20 KB; the original draft's 20 KB ceiling was brittle against Pydantic version drift, and the 1 MB margin keeps the test green across Pydantic 2.x point releases. SkillsLoaderalready paid the open cost. S2-01's loader doesos.open(path, O_NOFOLLOW | O_NOCTTY)→ streaming-scan frontmatter →safe_yaml.loadon a tempfile →content_hash_fd(fd, offset=body_offset, size=body_size)(which produces theblake3:<64hex>form). This probe re-uses those values verbatim. Do not re-open the file. Do not re-hash. Do not strip theblake3:prefix (the regex onIndexedSkill.body_blake3enforces the prefix).- No body preview field, ever. A "convenience"
body_preview: strfield onIndexedSkillwould force the probe to read bytes, breaking AC-10 and AC-11. If the Planner needs a preview, it readsbody[:200]itself at decision time (the operation is O(1) and per-decision, not per-gather). - Newtypes preserved.
SkillId,TaskClassId,Languageare the Phase-2NewTypes incodegenie.types.identifiers. Do notcast(str, …)to launder them away — that re-introduces the primitive-obsession ADR-0033 §1 explicitly forbids. The runtime cost of preserving aNewTypeis zero (identity-to-str). tier_countsisdict[Literal[...], int], notdict[str, int]. The three tiers are a closed set; a string-keyed dict invites primitive obsession ("org" vs. "Org" typos at consumer side). TheLiteralmakes the consumer'smatchexhaustive.- Confidence is three-state, not two.
"high"on a clean load (including the empty-install case),"medium"on partial success,"low"only on total failure orFatalLoadError. The original draft's "no medium" rule was based on the (incorrect) belief that the loader is all-or-error per call; it is in fact partial-success per file (the loader'sLoadOutcome.per_file_errorsexists precisely for this case). tracemallocis the right instrument. Phase 2 has nomemory_profilerdep;tracemallocis stdlib and deterministic.- Sub-schema location. S6-08 lands
src/codegenie/schema/probes/skills_index.schema.json(FLAT layout — nolayer_d/subdir; matches the existingdep_graph.schema.json,index_health.schema.json, etc. precedents). AC-19 references it viaimportlib.resourcesso the test fails loudly if S6-08 forgets to ship it (or ships it under the wrong name). Don't inline the schema into the probe file — Phase 2 keeps schemas as JSON resources. - Rule-of-three threshold (design-pattern open question).
SkillsLoaderdoes not currently expose tier membership per loadedSkill. This story derivestier_countsvia a separatePath.rglob("SKILL.md")walk. When the second tier-aware Layer-D probe needs the same derivation (ConventionsCatalogProbein S6-02 has tier-like layout but a different tier set; the third such consumer is the trigger), extract_count_files_per_tier(search_paths, glob_pattern)into a shared module. Until then, three similar lines (the per-tierrglobcount) is better than premature abstraction (Rule 2). Do not extendSkillorLoadOutcomewith atierfield as part of S6-01 — that is an S2-01 amendment and would be the wrong layer to fix it (the loader's discriminated-union ontieris internal to its_load_allmethod). async def runis the contract. The Probe ABC isasync def run(self, repo, ctx); the test usesasyncio.run(probe.run(repo, ctx))(mirroringtests/unit/probes/layer_b/test_dep_graph.py:108). Do not introduce a sync wrapper — the original draft's_runprivate sync method does not exist in the Phase-0 contract and was a documentation bug.ProbeContextis a stdlib@dataclass. Construct it with every required field set; there is noProbeContext.for_testclassmethod (the original draft assumed one). The test helper_make_context(tmp_path)exists in the test module only.- Registry surface is
default_registry. Not_PROBE_REGISTRY(no such symbol). The entry is found vianext(e for e in default_registry._entries if e.cls.name == "skills_index"); if a publicRegistryaccessor lands later, swap to it then.