Story S2-01 — SkillsLoader three-tier merge with O_NOFOLLOW + body byte-offset¶
Step: Step 2 — Plant kernel-side loaders (SkillsLoader, ConventionsCatalogLoader) and reference TCCM
Status: Done (GREEN 2026-05-15, all ACs satisfied — see _attempts/S2-01.md)
Effort: M
Depends on: S1-04 (TCCM model + loader establishes the Result[T, E] + safe_yaml-chokepoint + ADR-0033 newtype pattern this story repeats)
ADRs honored: 02-ADR-0007 (kernel-side scaffolding only — no plugin loader), Phase 1 ADR-0006 (safe_yaml chokepoint preserved), Phase 1 ADR-0008 (in-process parse caps + O_NOFOLLOW), Phase 0 ADR-0001 (hashing chokepoint — extension required, see Validation notes), production ADR-0033 §1, §3–4 (newtypes for every domain primitive; make-illegal-states-unrepresentable)
Validation notes (added 2026-05-15 by phase-story-validator)¶
This story was hardened against four critic lenses (coverage, test-quality, consistency, design-patterns). Verdict: HARDENED. The core shape (three-tier merge, progressive disclosure, safe_yaml chokepoint, Pydantic discriminated SkillsLoadError) is correct and traces cleanly to arch §"Component design" #9, 02-ADR-0007 §Decision, and production ADR-0033. Block-tier and harden-tier closures applied in place:
- B1 — Hashing chokepoint extension (block). Original story prescribed "streaming
os.read(fd, 64 * 1024)chunks throughcodegenie.hashing.content_hash", butcontent_hash(path: Path) -> strdoes not accept fd or chunks, andsrc/codegenie/hashing.py's module docstring forbids any directblake3/hashlib.sha256import outside that file (Phase 0 ADR-0001). The story now commits to extendingcodegenie.hashingwithcontent_hash_fd(fd, *, offset, size) -> stras a sibling chokepoint (Open/Closed at the file boundary — one new function, no edits to existing call sites). AC-25 enforces the chokepoint discipline. - B2 —
body_blake3format pinned (block). AC-8 originally allowed "[:32](or project's documented BLAKE3-fingerprint width)" — ambiguous. The hashing module's contract returnsf"blake3:{64-hex}"(tagged).body_blake3is now pinned to the same tagged shape (AC-8 regex^blake3:[0-9a-f]{64}$). - B3 —
unsafe_yamlumbrella documented (block).safe_yaml.loadtranslates everyyaml.YAMLErrorsubclass (includingConstructorErrorfor!!python/object, plus pure-syntaxParserError) to a singleMalformedYAMLError. The story'sreason="unsafe_yaml"was inheriting that umbrella implicitly; an operator reading the log would think everyunsafe_yamlevent was a supply-chain attack when it might be a syntax typo. Notes for implementer + AC-6b document this honestly: the reason name is the operationally-prudent umbrella ("treat as hostile until investigated"); the AC suite proves both the!!python/objectcase AND a pure-syntax-error case land in the same bucket, and that no code executes for either. Arch §"Edge cases" row 8 retained as authoritative. - H1 — Mutation-table closures (harden). The TDD plan now catches: same-tier collision (AC-16), exact
body_offset/body_sizefor a hand-constructed fixture (AC-17, off-by-one catcher), exactos.openflag set (AC-18,O_NOFOLLOWdrop catcher), within-tier deterministic ordering (AC-19, cross-OS), TOCTOU file-disappearance duringrglob→os.openwindow (AC-20),find_applicablecorrectness (AC-21 — pure-monotonicity tolerates thereturn []mutation),["*"]wildcard semantics forapplies_to_*(AC-22, per CLAUDE.md "['*']meaning 'all'"), defensive-copy offind_applicablereturn (AC-23), AST source-scan replacing ripgrep AC-12 (AC-24, per S1-04 AC-23 precedent), tempfile cleanup (AC-26). - H2 — Primitive obsession on domain fields (harden, ADR-0033 §1).
applies_to_tasksandapplies_to_languagesretyped fromlist[str]tolist[TaskClassId]andlist[Language](newtypes already exist incodegenie.types.identifiers).evidence_keysargument tofind_applicableretyped fromset[str]to a typedEvidenceQuery(task: TaskClassId | None, languages: set[Language])—set[str]was a degenerate flat bag that conflated task and language semantics. AC-21 / AC-22 are written against the typed surface. - H3 —
_TIER_NAMEis anti-pattern-adjacent (harden, ADR-0033 §3). Replaceddict[int, str]withTier: TypeAlias = Literal["user", "repo", "org"]+ a frozen_TIERS: Final[tuple[Tier, ...]] = ("user", "repo", "org")ordering constant. Tier identity is now a typed claim, not anint-keyed lookup the type checker can't see. Same observable behavior; same security commitment; ergonomically and structurally typed. - H4 — Functional-core / imperative-shell split (harden, Notes-only).
_load_one_skilldoes seven concerns (open, scan, tempfile, parse, hash, validate, classify). Recommended split: pure_split_frontmatter(data: bytes) -> Result[FrontmatterSplit, FrontmatterUnterminated]+ pure_build_skill(parsed_frontmatter, body_offset, body_size, body_blake3) -> Result[Skill, SchemaViolation]+ impure_open_skill_path,_read_with_streaming_hash,_load_one_skillcomposer. Not promoted to an AC (Rule 3 / surgical changes — the AC suite already constrains observable behavior); recorded in Notes-for-implementer so a contributor reaching for the obvious-but-monolithic shape sees the alternative. - H5 — Consistency w/ sibling
TCCMLoader(harden, Notes-only). TCCMLoader (S1-04) uses a markerCodegenieErrorsubclass with prefixedargs[0]strings ("parse: …","schema: …"). This story diverges deliberately to a Pydantic discriminated union because (a) partial-success multi-file semantics require a typed per-file error list, not a thrown exception, and (b) Phase 1's markers-only invariant binds onlycodegenie.errors.CodegenieErrorsubclasses, notResult.Errpayloads. The convention going forward: single-file loaders use markers + string prefix (S1-04); multi-file partial-success loaders use Pydantic discriminated unions (this story + S2-02). Documented in Notes-for-implementer. - NEEDS RESEARCH: none. All findings closeable from arch + ADRs + verified repo state (Stage 3 skipped).
A full audit log is at _validation/S2-01-skills-loader.md.
Context¶
SkillsLoader is the kernel-side loader for SKILL.md files — YAML-frontmatter + markdown-body documents that encode organizational uniqueness as data (Skills are Phase 4+'s structured input to the Planner; Phase 2 ships the loader so Phase 3's first plugin has a typed surface from day one — 02-ADR-0007 §"Decision"). Three load-bearing commitments make this story non-trivial: (a) three-tier merge across ~/.codegenie/skills/ (user) → .codegenie/skills/ (repo-local) → optional ~/.codegenie/skills-org/ (org-shared) with first-tier-wins + a loud skill_shadowed warning on collision (final-design.md §"Components" #9, [S]'s open Q §6 resolved); (b) progressive disclosure — the markdown body is byte-offset-recorded but never read into memory (commitment §2.7, verified by tracemalloc peak < 20 KB on a 100 MB-body fixture; a future contributor adding body: str to Skill would silently break the commitment without that test, per High-level-impl.md §"Risks specific to this step"); (c) per-file os.open(path, O_NOFOLLOW | O_NOCTTY) defeats symlink escape, and YAML parsing routes through Phase 1's codegenie.parsers.safe_yaml.load chokepoint — there is no parallel _safe_yaml_load_skill helper (final-design §"Conflict-resolution" row 9; Rule 7 — don't fork conventions).
The three-tier order is a security regression on inversion (High-level-impl.md §"Risks"): user-tier first means an attacker who plants a hostile ~/.codegenie/skills-org/ cannot override a user-trusted skill. The order is pinned in SkillsLoader.__init__ argument order and asserted in tests; future contributors get a failing test, not a silent flip.
References — where to look¶
- Architecture:
../phase-arch-design.md §"Component design" #9— interface, three-tier merge,O_NOFOLLOW, body byte-offset commitment,safe_yaml.loadchokepoint reuse.../phase-arch-design.md §"Scenarios" → Scenario 3— hostile YAML inSKILL.md(!!python/object) flow;SkillsLoadError(reason="unsafe_yaml")outcome.../phase-arch-design.md §"Edge cases"rows 8, 9, 16 — hostile YAML; planted symlink; three-tier collision.../phase-arch-design.md §"Data model"—SkillPydantic model (frozen=True, extra="forbid") withid: SkillId,applies_to_tasks: list[str],applies_to_languages: list[str],body_offset: int,body_size: int,body_blake3: str.../phase-arch-design.md §"Anti-patterns avoided"— "Side effects in constructors. Every loader … is pure data at__init__; first I/O isload_all()."- Phase ADRs:
../ADRs/0007-no-plugin-loader-in-phase-2.md—SkillsLoaderis kernel-side; noplugin.yaml, noplugins/directory.- Production ADRs:
../../production/adrs/0033-domain-modeling-discipline.md§3–4 — newtypeSkillId; make-illegal-states-unrepresentable forSkillsLoadErrorreason union.- Source design:
../final-design.md §"Components" #9— three-tier merge with first-tier-wins + loudskill_shadowedwarning;O_NOFOLLOWdiscipline;safe_yaml.loadreused.../final-design.md §"Conflict-resolution table"row 9 — Phase 1 chokepoint reused, no parallel helper.../final-design.md §"Departures from all three inputs"§8 — Rule 7 framing.- Existing code:
src/codegenie/parsers/safe_yaml.py(Phase 1 S1-03) —load(path, *, max_bytes, max_depth=64) -> Mapping[str, JSONValue]. This is the only YAML loader call site; do not add a parallel one.src/codegenie/parsers/_io.py(Phase 1 S1-03) —open_capped(path, *, max_bytes, parser_kind)already implementsO_NOFOLLOW. SkillsLoader does not call this directly (it routes throughsafe_yaml.load); theO_NOFOLLOWdiscipline at the SkillsLoader call site is a defense-in-depth on the frontmatter-extraction open (pre-safe_yaml.load), since SkillsLoader must locate the---frontmatter terminator before delegating the YAML bytes.src/codegenie/result.py(Phase 1, lifted in S1-04) —Result[T, E]sum type;Result.Ok/Result.Errconstructors.src/codegenie/tccm/loader.py(S1-04) — the establishedResult-returning,safe_yaml-routing loader shape this story mirrors.- External docs:
localv2.md§5.4 (Layer D probes) —skills_indexprobe is the Phase 2 consumer (S6-01); the slice shape includes skill IDs only, never bodies.- PyYAML
CSafeLoadersemantics (S1-03 module docstring) — already pinned upstream.
Goal¶
Ship src/codegenie/skills/ (three files: __init__.py, model.py, loader.py) with:
# model.py
Tier: TypeAlias = Literal["user", "repo", "org"]
class Skill(BaseModel): # frozen=True, extra="forbid"
id: SkillId
applies_to_tasks: list[TaskClassId] # ADR-0033 §1 — newtype list, not list[str]
applies_to_languages: list[Language] # ADR-0033 §1 — newtype list, not list[str]
body_offset: Annotated[int, Field(ge=0)] # progressive disclosure: body NOT loaded
body_size: Annotated[int, Field(ge=0)]
body_blake3: str # pinned format: r"^blake3:[0-9a-f]{64}$"
class EvidenceQuery(BaseModel): # frozen=True, extra="forbid"
task: TaskClassId | None # None = no task constraint in this query
languages: set[Language]
# loader.py
class SkillsLoader:
def __init__(self, search_paths: list[Path]) -> None: ... # pure data; no I/O
def load_all(self) -> Result[LoadOutcome, FatalLoadError]: ...
def find_applicable(self, evidence: EvidenceQuery) -> list[Skill]: ...
With the following invariants:
- Constructor is pure data. No
os.listdir, no file reads, no path normalization beyondPath(p)coercion in__init__. First I/O isload_all(). - Per-file open uses
os.open(path, O_RDONLY | O_NOFOLLOW | O_NOCTTY). The fd is read to locate the second---frontmatter delimiter (size-capped atmax_bytes=1 MiBtotal bytes scanned before locating the terminator; cap exceeded →Result.Err(SkillsLoadError(reason="frontmatter_unterminated"))). The frontmatter bytes are then passed tocodegenie.parsers.safe_yaml.loadvia an in-memory file (thesafe_yaml.loadsignature requires aPath; loader writes the frontmatter to atempfile.NamedTemporaryFileopeneddelete=Falseinside awithblock and passes that path — alternative implementations may extendsafe_yamlwith aloads_bytesshim in a follow-up, but Phase 2 ships the path-based form unchanged). - Body is byte-offset-recorded only.
body_offsetis the file offset (post-frontmatter---\nline) where the markdown body starts;body_size = fstat.st_size - body_offset.body_blake3is computed by streaming 64 KiB chunks through a new public API on the hashing chokepoint —codegenie.hashing.content_hash_fd(fd, *, offset: int, size: int) -> str(extension by addition, Open/Closed at the chokepoint; ADR-0001 preserved). The implementation reads in 64 KiB chunks and never materializes the body as a singlebytes/strvalue. Return value format isr"^blake3:[0-9a-f]{64}$"(identical tocontent_hash(path)).tracemallocpeak attributable to the SkillsLoader call frame is < 20 KB on a 100 MB-body fixture. No directblake3import outsidesrc/codegenie/hashing.py— AC-25 enforces this via AST source scan. - Three-tier merge: first-tier-wins. Iteration order is
search_pathsargument order. ConcreteSkillsLoader.default()classmethod factory pins[Path("~/.codegenie/skills/").expanduser(), Path(".codegenie/skills/"), Path("~/.codegenie/skills-org/").expanduser()]— the user tier wins over repo-local wins over org-shared (defensive against a hostile org-shared tier). Tier identity isTier: TypeAlias = Literal["user", "repo", "org"]with a frozen_TIERS: Final[tuple[Tier, ...]] = ("user", "repo", "org")ordering constant — not adict[int, str](ADR-0033 §3, typed-claim-not-int-keyed-lookup). A cross-tier collision (sameSkillIdresolved earlier, seen again later) emits exactly oneskill_shadowedstructlog event per collision with{skill_id, winning_tier, shadowed_tier, winning_path, shadowed_path}structured fields and the later occurrence is discarded. A same-tier collision (twoSKILL.mdwith the sameidunder the same tier root) is operationally indistinguishable from a cross-tier one — sameskill_shadowedevent withwinning_tier == shadowed_tierand the lexicographically-first path winning (so collision behavior is deterministic; AC-16). Missing optional tier (e.g., no~/.codegenie/skills-org/) → skipped silently (AC-3a). Within a single tier, results are emitted in lexicographic order ofpath.relative_to(tier_root)so the load order is deterministic across filesystems (xfs/ext4/APFSrgloborder is not guaranteed; AC-19). O_NOFOLLOWELOOP →Result.Err(SymlinkRefused(path=path))for that file; other skills load.safe_yaml.loadraisingMalformedYAMLError(covers!!python/objectConstructorError per S1-03 AC-9 AND any otheryaml.YAMLErrorsubclass —ParserError,ScannerError, top-level non-mapping) →Result.Err(UnsafeYaml(path=path))for that file; other skills load. The reason nameunsafe_yamlis deliberately the operationally-prudent umbrella ("treat as hostile until investigated"); the AC suite proves both the!!python/objectcase AND a pure-syntax-error case land in the same bucket and that no code executes for either (AC-6, AC-6b). Any otherOSErrorraised during open/read (e.g.,FileNotFoundErrorfrom a TOCTOU race betweenrglobyielding a path andos.openseeing it deleted;PermissionError;IsADirectoryError) →Result.Err(IoFailure(path=path, errno_name=errno.errorcode[exc.errno]))for that file; other skills load (AC-20). The onlyResult.Err(FatalLoadError(...))path onload_all()is catastrophic — e.g., the search path itself failsos.accessin a way that says "no tier is readable"; per-file failures are non-fatal.SkillsLoadErroris a Pydantic discriminated union, not aCodegenieErrormarker subclass. Phase 1's markers-only invariant binds onlycodegenie.errors.CodegenieErrorsubclasses; Phase 2's typedResult.Errvalues are Pydantic models with kind discriminators (reason: Literal["symlink_refused", "unsafe_yaml", "frontmatter_unterminated", "schema", "io_failure"]) + apath: Pathfield. This matches ADR-0033 §3 and the arch'sSkillsLoadError(reason=...)shape (§"Component design" #9). Convention note (carried inNotes for the implementer): single-file loaders (TCCMLoader, S1-04) useCodegenieErrormarker subclasses with string-prefixedargs[0]; multi-file partial-success loaders (this story, S2-02) use Pydantic discriminated unions because per-file errors must round-trip as values inside aLoadOutcomelist.find_applicable(evidence)is correct AND monotone. The matching rule, with["*"]wildcard per CLAUDE.md:- Tasks-component matches iff
"*"(cast toTaskClassId) is inskill.applies_to_tasksORevidence.task is not None and evidence.task in skill.applies_to_tasks. - Languages-component matches iff
"*"(cast toLanguage) is inskill.applies_to_languagesORset(skill.applies_to_languages) & evidence.languages != ∅. - A skill is applicable iff both components match (AND).
Two test obligations: correctness (AC-21 — a hand-constructed fixture with predictable membership: a vulnerability-remediation / typescript skill matches an EvidenceQuery(task="vulnerability-remediation", languages={"typescript"}) but not EvidenceQuery(task="distroless-migration", languages={"typescript"}); a wildcard skill matches both) and monotonicity (AC-11 — adding a language to evidence.languages never removes a match; setting evidence.task from None to a value never adds a match unless the skill was wildcard-or-matching anyway). Pure monotonicity is insufficient on its own — a return [] implementation is vacuously monotone — which is why AC-21 carries the correctness obligation. The function returns a fresh list per call (defensive copy; never the cached internal _skills reference; AC-23) so a caller mutating the returned list cannot corrupt loader state.
Acceptance criteria¶
- [x] AC-1 — module surface.
src/codegenie/skills/__init__.pyexportsSkill,SkillsLoader,SkillsLoadError,LoadOutcome,FatalLoadError,EvidenceQuery,Tieronly via__all__(exact-set test).Skillisfrozen=True, extra="forbid".SkillsLoadErroris aAnnotated[Union[...], Field(discriminator="reason")]discriminated union over fivereasonliterals (symlink_refused,unsafe_yaml,frontmatter_unterminated,schema,io_failure).Tier = Literal["user", "repo", "org"]. - [x] AC-2 — pure-data constructor (no I/O).
SkillsLoader(search_paths=[non_existent_path])does not raise and does not call any I/O primitive. Asserted bymonkeypatch.setattron all ofos.listdir,os.scandir,os.open,os.stat,pathlib.Path.exists,pathlib.Path.is_dir, each replaced by alambda *a, **kw: pytest.fail("constructor performed I/O"). (Strengthens againstPath.iterdir/Path.globsmuggling.) - [x] AC-3 — happy path three-tier merge. Given a fixture with
user/foo/SKILL.md,repo/bar/SKILL.md,org/baz/SKILL.md(distinct IDs),load_all()returnsResult.Ok(LoadOutcome(skills=[foo, bar, baz], per_file_errors=[]))in iteration order. Each loadedSkillis fully populated —id,applies_to_tasks,applies_to_languages,body_offset,body_size,body_blake3— every field asserted, not justid. - [x] AC-3a — missing optional tier silently skipped. With
search_paths=[user_dir, repo_dir, /nonexistent/org],load_all()succeeds withLoadOutcome.skillscontaining only the user+repo skills andper_file_errors == []. The missing-tier branch emits no log event (silent skip per arch §"Failure behavior"). - [x] AC-4 — first-tier-wins on cross-tier collision +
skill_shadowedwarning. Given a fixture withuser/dup/SKILL.md(iddup) andorg/dup/SKILL.md(iddup),load_all()returnsLoadOutcome.skills == [user_dup](one entry, the user-tier copy) and emits exactly oneskill_shadowedstructlog event with structured fields{event: "skill_shadowed", skill_id: "dup", winning_tier: "user", shadowed_tier: "org", winning_path: <user path>, shadowed_path: <org path>}. Asserted viastructlog.testing.capture_logs. Inversion (org-wins) is a test-asserted security regression. - [x] AC-4a — middle-tier wins when first tier absent. Fixture: no user
dup, butrepo/dupandorg/dupboth present. Repo wins;skill_shadowedevent haswinning_tier="repo",shadowed_tier="org". (Catches an off-by-one_TIERSindexing mutation that AC-4 alone misses.) - [x] AC-5 —
O_NOFOLLOWsymlink refusal yields typed per-file error, other skills load. Fixture:user/good/SKILL.md(legitimate) +user/evil/SKILL.md→ symlink totmp_path/outside.load_all()returnsResult.Ok(LoadOutcome(skills=[good], per_file_errors=[err]))whereisinstance(err, SymlinkRefused)anderr.path == user/evil/SKILL.md. The sentinel target is not read (nob"leaked\n"bytes appear in the in-memorySkillbody fields or any log event). - [x] AC-6 —
!!python/objectpayload yieldsunsafe_yamland executes no code. Fixture:user/evil/SKILL.mdwhose frontmatter contains!!python/object/apply:os.system ['touch {sentinel}']. Afterload_all():not sentinel.exists(); the per-file error isUnsafeYaml(path=user/evil/SKILL.md); other skills load. Inherits S1-03 AC-9 viasafe_yaml.load→MalformedYAMLError. - [x] AC-6b — pure-syntax-error YAML also lands as
unsafe_yaml(umbrella honesty). Fixture:user/typo/SKILL.mdwhose frontmatter isid: ok\napplies_to_tasks: [unterminated(legal trigger ofyaml.ScannerError, not a constructor exploit). The per-file error isUnsafeYaml(path=...)— same bucket as AC-6. Documented in Notes-for-implementer: the reason name is the operationally-prudent umbrella ("treat as hostile until investigated"); operators readunsafe_yamland inspect; no code path discriminates parse-failure-flavor. - [x] AC-6c — frontmatter unterminated → typed
FrontmatterUnterminated, no OOM. Fixture:user/eternal/SKILL.mdis exactlyb"---\n" + b"x: y\n" * (1 << 20)— opens correctly with---but the second---never appears within the 1 MiB scan cap.load_all()returnsper_file_errors=[FrontmatterUnterminated(path=...)]; the 1 MiB-scan-cap bound is asserted bytracemallocpeak < 2 MiB. - [x] AC-6d —
SchemaViolationon invalid frontmatter. Fixture:user/bad-schema/SKILL.mdwhose frontmatter isid: ok\napplies_to_tasks: "not-a-list"\napplies_to_languages: [ts]\n. Per-file error isSchemaViolation(path=..., details=[{...Pydantic ValidationError shape...}]);detailsis alist[dict], non-empty, includes a row withloc=("applies_to_tasks",). - [x] AC-7 — progressive disclosure: 100 MB body, peak < 20 KB. Fixture:
user/big/SKILL.mdwith a 32-byte frontmatter and a 100 MB markdown body of fixed bytes (b"\xab" * (1 << 20)repeated 100×, deterministic).tracemalloc.start()beforeload_all();take_snapshot()after. The peak delta attributable to the loader frame is < 20 KB. Pinned bytest_body_not_loaded_into_memory_under_100mb_fixture. A future contributor who replaces the streaming hash withbody = f.read(); content_hash_bytes(body)makes this test fail. - [x] AC-7a — exact
body_offset/body_sizefor a hand-constructed fixture (off-by-one catcher). Fixture: frontmatter is exactlyb"---\nid: x\napplies_to_tasks: [\"*\"]\napplies_to_languages: [\"*\"]\n---\n"(length 73 bytes, asserted constant), body isb"hello world\n"(12 bytes).load_all()returns aSkillwithbody_offset == 73andbody_size == 12. An implementation that includes the closing---\nin the body (setsbody_offset = 69) or excludes one extra leading byte fails this test. - [x] AC-8 —
body_blake3matches reference AND format pinned. For the AC-7a fixture,body_blake3 == codegenie.hashing.content_hash_bytes(b"hello world\n")(which isf"blake3:{hexdigest}"). The streaming hash and the one-shot hash agree byte-for-byte. The string matchesre.fullmatch(r"blake3:[0-9a-f]{64}", body_blake3). An implementation that drops theblake3:prefix, truncates to 32 hex, or silently switches to SHA-256 fails the regex. - [x] AC-9 —
Skillmodel isfrozen=True, extra="forbid". ConstructingSkill(id=SkillId("x"), applies_to_tasks=[], applies_to_languages=[], body_offset=0, body_size=0, body_blake3="blake3:" + "0"*64, extra_field="x")raises PydanticValidationError. Assigningskill.id = SkillId("other")on a constructed instance raisesValidationError(frozen). - [x] AC-10 —
SkillsLoadErroris a discriminated union, five reasons enumerated. Test parametrizes over{"symlink_refused", "unsafe_yaml", "frontmatter_unterminated", "schema", "io_failure"}and asserts each constructs successfully via the discriminator +pathfield; a sixth reason (reason="bogus") raisesValidationError. JSON-shape pin:SymlinkRefused(path=Path("/x")).model_dump() == {"reason": "symlink_refused", "path": "/x"}for stability across Pydantic minor versions (cross-version mutation catcher; per S1-04 precedent). - [x] AC-11 —
find_applicablemonotonicity (Hypothesis property).tests/property/test_skills_loader_monotone.py— for any twoEvidenceQueryvalues whereq1.languages ⊆ q2.languagesand(q1.task is None) or (q1.task == q2.task),set(s.id for s in loader.find_applicable(q1)) ⊆ set(s.id for s in loader.find_applicable(q2)). (Strict subset on languages; task None→Some is also monotone-increasing.) Adding evidence never removes a match. - [x] AC-11a —
find_applicablecorrectness (paired positive test, not just monotonicity). Pure monotonicity toleratesreturn []. With a fixture containing: vuln-ts(applies_to_tasks=[vulnerability-remediation],applies_to_languages=[typescript]),vuln-any(applies_to_tasks=[vulnerability-remediation],applies_to_languages=["*"]),any-ts(applies_to_tasks=["*"],applies_to_languages=[typescript]),noop(applies_to_tasks=[distroless],applies_to_languages=[go]):
find_applicable(EvidenceQuery(task="vulnerability-remediation", languages={"typescript"})) returns exactly {vuln-ts, vuln-any, any-ts} (order: cached _skills first-encountered); noop is not present. Each membership assertion is individual (the union of "what's in" and "what's out" is the mutation-resistant shape).
- [x] AC-11b — find_applicable([]) returns [] before load_all(). Constructing the loader and calling find_applicable(EvidenceQuery(task=None, languages=set())) without first calling load_all() returns [] and does not implicitly trigger I/O (verified by the same monkeypatch set as AC-2). Notes: explicit > implicit.
- [x] AC-12 — safe_yaml.load chokepoint asserted by runtime spy. Monkeypatch codegenie.parsers.safe_yaml.load with a wrapping spy; load_all() over a 3-skill fixture invokes the spy 3 times with max_bytes >= 1 << 20. No direct yaml.load / yaml.CSafeLoader / yaml.safe_load call can produce skill values (verified by replacing the spy with a pytest.fail and confirming no skills are loaded).
- [x] AC-13 — forbidden-patterns continues to ban model_construct. Pre-commit hook (Phase 0) scans src/codegenie/skills/ and finds zero model_construct calls. Reinforces ADR-0033 § "no smuggling around validation".
- [x] AC-14 — toolchain. ruff check, ruff format --check, mypy --strict, and mypy --warn-unreachable (per-module on codegenie.skills/**) are clean. pytest tests/unit/skills/ tests/property/test_skills_loader_monotone.py passes.
- [x] AC-15 — TDD discipline. Red tests committed failing; green commit makes them pass; refactor commit is no-op behavior. Validator can reproduce.
- [x] AC-16 — same-tier collision is also skill_shadowed, lexicographic-first wins. Fixture: user/a/SKILL.md (id dup) and user/b/SKILL.md (id dup) — both under the same user tier root. load_all() returns LoadOutcome.skills == [user_a] (lexicographic first) and emits exactly one skill_shadowed event with winning_tier == shadowed_tier == "user" and shadowed_path == user/b/SKILL.md. Same-tier collisions are operationally indistinguishable from cross-tier ones; the deterministic-load commitment (§2.6) requires both cases be handled.
- [x] AC-17 — exact body_offset / body_size (subsumed into AC-7a; retained as alias). See AC-7a.
- [x] AC-18 — os.open flag set exactly enforced. Spy monkeypatch.setattr(os, "open", spying_open) on the SkillsLoader-side open (NOT the safe_yaml-side; capture only the SKILL.md path opens). For each SKILL.md opened, flags == os.O_RDONLY | os.O_NOFOLLOW | os.O_NOCTTY exactly. An implementation that drops O_NOFOLLOW (silent security regression — AC-5 catches the behavioral consequence, AC-18 catches the code-level drop before any symlink fixture even runs) fails this test.
- [x] AC-19 — within-tier deterministic ordering (cross-filesystem reproducibility). Fixture: user/zzz/SKILL.md (id zzz) and user/aaa/SKILL.md (id aaa). load_all() returns skills in lexicographic order of path.relative_to(tier_root) — i.e., [aaa, zzz], irrespective of rglob's filesystem-dependent yield order. Assertion uses two-skill order, not insertion order.
- [x] AC-20 — TOCTOU file disappearance → typed IoFailure, no unhandled exception. Fixture: directly invoke _load_one_skill(Path("/path/that/does/not/exist/SKILL.md")). Returns Result.Err(IoFailure(path=..., errno_name="ENOENT")). (Catches the race window between rglob yielding a path and os.open opening it.) Other observed errnos catch as well — EACCES → "EACCES", EISDIR → "EISDIR" — via errno.errorcode[exc.errno].
- [x] AC-21 — find_applicable correctness (subsumed into AC-11a; retained as alias). See AC-11a.
- [x] AC-22 — ["*"] wildcard semantics. A skill with applies_to_tasks=[cast(TaskClassId, "*")] matches any EvidenceQuery.task (including None); a skill with applies_to_languages=[cast(Language, "*")] matches an EvidenceQuery.languages that is empty or any value. Documented in the module docstring; pinned by parametrized test. Note: "*" is the documented wildcard sentinel per CLAUDE.md §"Conventions to follow" ("['*'] meaning 'all'").
- [x] AC-23 — find_applicable returns a fresh list per call (defensive copy). loader.find_applicable(q) is not loader.find_applicable(q) (identity inequality). Mutating the returned list (result.clear()) does not affect a subsequent call's return. Internal _skills reference is not exposed.
- [x] AC-24 — AST source-scan: no direct YAML import outside safe_yaml. tests/unit/skills/test_no_direct_yaml_import.py uses ast.parse on every .py under src/codegenie/skills/ and asserts no Import / ImportFrom node names yaml. Replaces the ripgrep yaml\. check — durable against alias smuggling (from yaml import safe_load as _y). Per S1-04 AC-23 precedent.
- [x] AC-25 — hashing chokepoint (Open/Closed extension). tests/unit/skills/test_no_direct_blake3_import.py uses ast.parse on every .py under src/codegenie/skills/ and asserts no Import / ImportFrom node names blake3 or hashlib. All hash computation routes through codegenie.hashing.content_hash_fd. A new public symbol content_hash_fd(fd: int, *, offset: int, size: int) -> str is added to src/codegenie/hashing.py's __all__; the existing __all__ list grows by exactly one entry; no edits to existing content_hash / content_hash_bytes / etc.
- [x] AC-26 — tempfile cleanup verified, even on parse failure. After load_all() over a fixture that triggers UnsafeYaml (AC-6), len(list(Path(tempfile.gettempdir()).glob("*.yaml"))) is identical before and after the call (no orphans). The try/finally cleanup in _load_one_skill is exercised. Uses tempfile.tempdir monkeypatched to a per-test tmp_path / "tmp" for hermetic measurement.
Implementation outline¶
src/codegenie/hashing.py— extend by addition (Open/Closed): add publiccontent_hash_fd(fd: int, *, offset: int, size: int) -> strreturningf"blake3:{hex64}"identical in format tocontent_hash(path). Lazy-importsblake3per existing convention; reads in 64 KiB chunks viaos.read(fd, 1 << 16)afteros.lseek(fd, offset, os.SEEK_SET). Add to__all__. No edits to existing functions; new tests intests/unit/test_hashing.pyfor the new symbol parity withcontent_hash_bytes(round-trip identity).src/codegenie/skills/model.py—Tier: TypeAlias = Literal["user", "repo", "org"];_TIERS: Final[tuple[Tier, ...]] = ("user", "repo", "org"). ImportSkillId,TaskClassId,Languagefromcodegenie.types.identifiers(do not redefine — newtypes are kernel-side per S1-05). DefineSkillPydantic model (frozen=True, extra="forbid") with fields:id: SkillId,applies_to_tasks: list[TaskClassId],applies_to_languages: list[Language],body_offset: Annotated[int, Field(ge=0)],body_size: Annotated[int, Field(ge=0)],body_blake3: Annotated[str, Field(pattern=r"^blake3:[0-9a-f]{64}$")]. DefineEvidenceQuerymodel (frozen=True, extra="forbid") withtask: TaskClassId | None,languages: set[Language].src/codegenie/skills/loader.py::SkillsLoadError— Pydantic discriminated union over five reasons:class SymlinkRefused(BaseModel): # frozen=True, extra="forbid" reason: Literal["symlink_refused"] = "symlink_refused" path: Path class UnsafeYaml(BaseModel): reason: Literal["unsafe_yaml"] = "unsafe_yaml" path: Path class FrontmatterUnterminated(BaseModel): reason: Literal["frontmatter_unterminated"] = "frontmatter_unterminated" path: Path class SchemaViolation(BaseModel): reason: Literal["schema"] = "schema" path: Path details: list[dict] # Pydantic ValidationError.errors() shape class IoFailure(BaseModel): reason: Literal["io_failure"] = "io_failure" path: Path errno_name: str # errno.errorcode[exc.errno] — "ENOENT", "EACCES", "EISDIR", ... SkillsLoadError = Annotated[ Union[SymlinkRefused, UnsafeYaml, FrontmatterUnterminated, SchemaViolation, IoFailure], Field(discriminator="reason"), ] class LoadOutcome(BaseModel): # frozen=True, extra="forbid" skills: list[Skill] per_file_errors: list[SkillsLoadError] class FatalLoadError(BaseModel): # only path: catastrophic (no tier readable, etc.) reason: Literal["all_tiers_unreadable"] = "all_tiers_unreadable" attempted: list[Path]SkillsLoader.__init__(self, search_paths: list[Path]) -> None— storeself._search_paths = list(search_paths), initializeself._skills: list[Skill] = []. Nooscalls — noPath.exists, noPath.iterdir, noPath.glob, noos.scandir. The default factorySkillsLoader.default()(classmethod) builds the pinned three-tier order (user, repo-local, org-shared) zipped with_TIERS. Tier addition is ADR-amend-gated (intentional friction matches ADR-0030's "noUnknownvariant" precedent).SkillsLoader.load_all(self) -> Result[LoadOutcome, FatalLoadError]. A single bad file is not fatal (arch §"Failure behavior"). Outline:from codegenie.skills.model import _TIERS, Tier skills_by_id: dict[SkillId, tuple[Tier, Path, Skill]] = {} # tier → (path, skill) errors: list[SkillsLoadError] = [] for tier, search_path in zip(_TIERS, self._search_paths, strict=True): if not search_path.exists(): continue # missing optional tier silently skipped (AC-3a) # Sort within-tier for cross-FS determinism (AC-19). skill_mds = sorted(search_path.rglob("SKILL.md"), key=lambda p: p.relative_to(search_path).as_posix()) for skill_md in skill_mds: outcome = _load_one_skill(skill_md) match outcome: case Ok(value=skill): if skill.id in skills_by_id: prior_tier, prior_path, _ = skills_by_id[skill.id] log.warning("skill_shadowed", skill_id=skill.id, winning_tier=prior_tier, shadowed_tier=tier, winning_path=str(prior_path), shadowed_path=str(skill_md)) continue # first-tier-wins; lexicographic-first within tier (AC-16) skills_by_id[skill.id] = (tier, skill_md, skill) case Err(error=err): errors.append(err) log.warning("skill_load_failed", **err.model_dump(mode="json")) self._skills = [s for (_, _, s) in skills_by_id.values()] return Ok(value=LoadOutcome(skills=list(self._skills), per_file_errors=errors))_load_one_skill(path: Path) -> Result[Skill, SkillsLoadError]— the per-file routine. Recommended functional-core split (see Notes-for-implementer §"Functional-core split"); the monolithic shape below is acceptable but split is preferred:try: fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW | os.O_NOCTTY) # AC-18 asserts exact flags except OSError as exc: if exc.errno == errno.ELOOP: return Err(error=SymlinkRefused(path=path)) # AC-20: every other OSError → typed IoFailure (NOT raise); covers TOCTOU file-disappearance. return Err(error=IoFailure(path=path, errno_name=errno.errorcode.get(exc.errno or 0, "EUNKNOWN"))) try: frontmatter_bytes, body_offset = _read_frontmatter(fd, max_scan_bytes=1 << 20) if frontmatter_bytes is None: return Err(error=FrontmatterUnterminated(path=path)) body_size = os.fstat(fd).st_size - body_offset body_blake3 = content_hash_fd(fd, offset=body_offset, size=body_size) # AC-25 — chokepoint finally: os.close(fd) # Parse frontmatter via the safe_yaml chokepoint (no parallel loader). tmp_fd, tmp_name = tempfile.mkstemp(suffix=".yaml") tmp_path = Path(tmp_name) try: with os.fdopen(tmp_fd, "wb") as tf: tf.write(frontmatter_bytes) try: data = safe_yaml.load(tmp_path, max_bytes=1 << 20) except MalformedYAMLError: return Err(error=UnsafeYaml(path=path)) # AC-6 + AC-6b — umbrella for ALL yaml.YAMLError finally: tmp_path.unlink(missing_ok=True) # AC-26 — cleanup on every path try: return Ok(value=Skill(**data, body_offset=body_offset, body_size=body_size, body_blake3=body_blake3)) except ValidationError as exc: return Err(error=SchemaViolation(path=path, details=exc.errors()))_read_frontmatter(fd, *, max_scan_bytes)— read in 4 KiB chunks; locate the opening---\n(the file's first 4 bytes MUST be---\nor---\r\n, otherwise return(None, _)immediately — frontmatter is mandatory); locate the closing---\n; return(frontmatter_bytes, body_offset)wherebody_offset = bytes_consumed_through_closing_delimiter(the offset of the FIRST byte AFTER the closing---\n). Cap atmax_scan_bytestotal scanned before terminator →(None, body_offset_default).content_hash_fd(fd, *, offset, size)(from Step 0 above; lives incodegenie.hashing, not inskills/) —os.lseek(fd, offset, os.SEEK_SET); loopos.read(fd, 1 << 16)into the running BLAKE3 hasher; count bytes (assert ==size); returnf"blake3:{hasher.hexdigest()}". No intermediatebytesaccumulator. This is the path AC-7 protects.find_applicable(self, evidence: EvidenceQuery) -> list[Skill]— thin instance method that delegates to a pure module-level function_matches(skills: Sequence[Skill], evidence: EvidenceQuery) -> list[Skill](functional-core; testable without a loader). Matching rule per Invariant §7. Returnslist(matches)— a freshlist, never the cachedself._skillsreference (AC-23). Pre-load_allstate returns[]without I/O (AC-11b).- Pure matching function
_matches(skills, evidence):def _matches(skills: Sequence[Skill], evidence: EvidenceQuery) -> list[Skill]: _WILDCARD_TASK = cast(TaskClassId, "*") _WILDCARD_LANG = cast(Language, "*") out: list[Skill] = [] for skill in skills: tasks_ok = (_WILDCARD_TASK in skill.applies_to_tasks) or \ (evidence.task is not None and evidence.task in skill.applies_to_tasks) langs_ok = (_WILDCARD_LANG in skill.applies_to_languages) or \ (bool(set(skill.applies_to_languages) & evidence.languages)) if tasks_ok and langs_ok: out.append(skill) return out - structlog event names are literal strings for now (
"skill_shadowed","skill_load_failed"); S1-10'sFinal[str]constant promotion convention applies to Phase 1'sprobe.parser.cap_exceededonly — Phase 2 follow-up can promote these later, but YAGNI in Step 2.
TDD plan — red / green / refactor¶
Red — write the failing test first¶
Test files: tests/unit/skills/test_loader.py (15 named tests covering AC-1..AC-14) + tests/property/test_skills_loader_monotone.py (Hypothesis).
# tests/unit/skills/test_loader.py — red test pinning the load-bearing AC-7
import os
import textwrap
import tracemalloc
from pathlib import Path
import pytest
import structlog.testing
from codegenie.skills import Skill, SkillsLoader
from codegenie.skills.loader import SymlinkRefused, UnsafeYaml
def _write_skill(p: Path, sid: str, body: bytes = b"# body\n") -> Path:
p.mkdir(parents=True, exist_ok=True)
skill_md = p / "SKILL.md"
frontmatter = textwrap.dedent(f"""\
---
id: {sid}
applies_to_tasks: [vulnerability-remediation]
applies_to_languages: [typescript]
---
""").encode()
skill_md.write_bytes(frontmatter + body)
return skill_md
def test_constructor_is_pure_data(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None:
"""AC-2: __init__ must not perform I/O."""
calls: list[str] = []
monkeypatch.setattr(os, "listdir", lambda p: calls.append(("listdir", p)) or [])
monkeypatch.setattr(os, "open", lambda *a, **kw: calls.append(("open", a)) or (_ for _ in ()).throw(OSError(2, "stub")))
SkillsLoader(search_paths=[tmp_path / "does-not-exist"]) # must not raise, must not call os
assert calls == [], f"constructor performed I/O: {calls}"
def test_first_tier_wins_emits_skill_shadowed(tmp_path: Path) -> None:
"""AC-4: user-tier wins over org-tier; collision logs skill_shadowed."""
user = tmp_path / "user" / "dup"
org = tmp_path / "org" / "dup"
_write_skill(user, "dup", body=b"# user wins\n")
_write_skill(org, "dup", body=b"# org loses\n")
loader = SkillsLoader(search_paths=[tmp_path / "user", tmp_path / "org"])
with structlog.testing.capture_logs() as logs:
result = loader.load_all()
assert result.is_ok()
assert [s.id for s in result.unwrap().skills] == ["dup"]
shadow_events = [l for l in logs if l.get("event") == "skill_shadowed"]
assert len(shadow_events) == 1
ev = shadow_events[0]
assert ev["skill_id"] == "dup"
assert ev["winning_tier"] == "user"
assert ev["shadowed_tier"] == "org"
def test_symlink_refused_does_not_dereference(tmp_path: Path) -> None:
"""AC-5: O_NOFOLLOW must refuse a planted symlink; other skills must load."""
good = tmp_path / "user" / "good"
_write_skill(good, "good")
evil_dir = tmp_path / "user" / "evil"
evil_dir.mkdir(parents=True)
sentinel = tmp_path / "outside"
sentinel.write_bytes(b"leaked\n")
(evil_dir / "SKILL.md").symlink_to(sentinel)
result = SkillsLoader(search_paths=[tmp_path / "user"]).load_all()
assert result.is_ok()
outcome = result.unwrap()
assert [s.id for s in outcome.skills] == ["good"]
assert len(outcome.per_file_errors) == 1
err = outcome.per_file_errors[0]
assert isinstance(err, SymlinkRefused)
assert err.path == evil_dir / "SKILL.md"
def test_unsafe_yaml_payload_executes_no_code(tmp_path: Path) -> None:
"""AC-6: !!python/object payload must yield reason='unsafe_yaml' and run nothing."""
sentinel = tmp_path / "pwned"
evil = tmp_path / "user" / "evil"
evil.mkdir(parents=True)
(evil / "SKILL.md").write_bytes(
textwrap.dedent(f"""\
---
!!python/object/apply:os.system ['touch {sentinel}']
---
""").encode()
)
result = SkillsLoader(search_paths=[tmp_path / "user"]).load_all()
assert result.is_ok()
outcome = result.unwrap()
assert not sentinel.exists(), "!!python/object MUST NOT execute"
assert len(outcome.per_file_errors) == 1
assert isinstance(outcome.per_file_errors[0], UnsafeYaml)
def test_body_not_loaded_into_memory_under_100mb_fixture(tmp_path: Path) -> None:
"""AC-7: progressive disclosure — peak loader allocation < 20 KB on a 100 MB body."""
big = tmp_path / "user" / "big"
big.mkdir(parents=True)
skill_md = big / "SKILL.md"
frontmatter = (b"---\nid: big\napplies_to_tasks: [t]\napplies_to_languages: [l]\n---\n")
body_size = 100 * 1024 * 1024
with skill_md.open("wb") as f:
f.write(frontmatter)
# 1 MB chunks of os.urandom → 100 MB body, deterministically seeded for cross-run stability.
chunk = (b"\xab" * (1 << 20))
for _ in range(100):
f.write(chunk)
loader = SkillsLoader(search_paths=[tmp_path / "user"])
tracemalloc.start()
result = loader.load_all()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
assert result.is_ok()
assert result.unwrap().skills[0].body_size == body_size
# 20 KB headroom for Pydantic / hash state / dict overhead; load-bearing budget.
assert peak < 20 * 1024, f"progressive disclosure breached: peak={peak} bytes"
Property test:
# tests/property/test_skills_loader_monotone.py — AC-11
from hypothesis import given, strategies as st
from codegenie.skills import SkillsLoader # built from in-memory fixtures via a helper
@given(st.sets(st.text(min_size=1, max_size=8)),
st.sets(st.text(min_size=1, max_size=8)))
def test_find_applicable_is_monotone_in_evidence_keys(a: set[str], b: set[str]) -> None:
loader = _loader_with_skills_from(...)
matches_a = {s.id for s in loader.find_applicable(a)}
matches_ab = {s.id for s in loader.find_applicable(a | b)}
assert matches_a <= matches_ab, "adding evidence removed a match"
Run; confirm every test fails because src/codegenie/skills/ does not exist. Commit as red.
Green — make it pass¶
Land the four edits in order:
src/codegenie/skills/model.py—SkillIdnewtype (or import fromcodegenie.adapters.idsif Step 1 already lifted it),SkillPydantic model.src/codegenie/skills/loader.py—SkillsLoadErrordiscriminated union,_read_frontmatter,_stream_hash_body,_load_one_skill,SkillsLoaderclass.src/codegenie/skills/__init__.py—__all__ = ["Skill", "SkillsLoader", "SkillsLoadError"]; re-export frommodelandloader.tests/unit/skills/andtests/property/directory scaffolding (__init__.pyfiles only; the tests above land here).
Refactor — clean up¶
- Module docstring on
skills/loader.pycitesphase-arch-design.md §"Component design" #9, 02-ADR-0007, and the progressive-disclosure commitment §2.7. _TIER_NAME = {0: "user", 1: "repo", 2: "org"}is a module-levelFinal[dict[int, str]]; the structlog event uses these strings, not raw indices.- The structlog event name literals (
"skill_shadowed","skill_load_failed") live near their emit sites; do not pre-emptively centralize. - No
# type: ignorecomments in this module; add Pydantic stubs todevextras only ifmypy --strictflags a concrete line.
Files to touch¶
| Path | Why |
|---|---|
src/codegenie/hashing.py |
Modify (additive) — add content_hash_fd(fd, *, offset, size) -> str to __all__; no edits to existing content_hash / content_hash_bytes / identity_hash (AC-25) |
src/codegenie/skills/__init__.py |
New — public surface (Skill, SkillsLoader, SkillsLoadError, LoadOutcome, FatalLoadError, EvidenceQuery, Tier) |
src/codegenie/skills/model.py |
New — Tier Literal alias, _TIERS ordering constant, Skill Pydantic model (typed with SkillId / TaskClassId / Language imported from codegenie.types.identifiers), EvidenceQuery model |
src/codegenie/skills/loader.py |
New — SkillsLoadError discriminated union (five reasons), LoadOutcome, FatalLoadError, _load_one_skill, _read_frontmatter, _matches pure function, SkillsLoader class |
tests/unit/test_hashing.py |
Modify (additive) — round-trip parity tests for content_hash_fd vs content_hash_bytes on identical bodies (AC-25 parity) |
tests/unit/skills/__init__.py |
New — package marker |
tests/unit/skills/test_loader.py |
New — named tests covering AC-1..AC-23, AC-26 |
tests/unit/skills/test_no_direct_yaml_import.py |
New — AST source-scan for AC-24 |
tests/unit/skills/test_no_direct_blake3_import.py |
New — AST source-scan for AC-25 |
tests/property/__init__.py |
New (if absent) — package marker |
tests/property/test_skills_loader_monotone.py |
New — Hypothesis property test for AC-11 (monotonicity); AC-11a positive correctness lives in tests/unit/skills/test_loader.py (deterministic fixture, not Hypothesis) |
Out of scope¶
- Layer D
SkillsIndexProbe(src/codegenie/probes/layer_d/skills_index.py) — S6-01; consumes this loader but does not ship here. - Per-tier signing (Sigstore-style) for
~/.codegenie/skills-org/— Phase 14 multi-tenant concern;ADRs/README.md "Decisions noted" #5records the deferral; this story ships first-tier-wins + loudskill_shadowedonly. ConventionsCatalogLoader— S2-02 (parallel-after-S1-04 with this story).- Reference TCCM roundtrip + Protocol-mock dispatcher — S2-03 (depends on this story landing because S2-03's mock dispatcher closes Gap 1 by exercising every Protocol method, but its setup borrows the typed
Result-returning loader shape). - Extending
safe_yamlwith aloads_bytes(data, *, ...)shim — out of scope; this story uses the documentedsafe_yaml.load(path, ...)interface via a temp file. A follow-up may addloads_bytesif a second caller (e.g.,ConventionsCatalogLoader's inline-doc path) materializes; Rule of Three until then. - Hostile YAML adversarial corpus tests — Phase 1 S5-01 already pins
safe_yaml's!!python/object+ alias-amplification defenses; this story exercises the SkillsLoader-side outcome only. - Skills body content rendering / Markdown parsing — never in scope; Phase 2 records
(body_offset, body_size, body_blake3)only. The Planner (Phase 4+) reads bodies lazily through a different code path.
Notes for the implementer¶
- Three-tier order is load-bearing. The arg order
[user, repo, org]is the security commitment. If__init__ever flips the order silently, thetest_first_tier_wins_emits_skill_shadowedtest fails — that's intentional. Do not "fix" the test; fix the order. O_NOFOLLOW | O_NOCTTYis the literal flag set. AddingO_CLOEXECis fine (defensive); omittingO_NOFOLLOWis a security regression. TheO_NOCTTYis paranoia for terminal-device paths and matches Phase 1'sparsers/_io.pyconvention.- Streaming-hash discipline. AC-7 is enforced by a
tracemallocbudget — if you replace the chunkedos.readloop withbody = f.read()the test fails on a 100 MB fixture. Do not "optimize" by reading the whole body even for small files; the implementation must be uniformly streaming. safe_yaml.loadchokepoint is non-negotiable. Phase 1 ratified one YAML loader (S1-03); do not importyamldirectly anywhere insrc/codegenie/skills/. AC-12 ripgreps foryaml\.and asserts zero hits. If you need a YAML capabilitysafe_yamldoesn't expose, file an issue extendingsafe_yaml— don't fork it.SkillsLoadErroris Pydantic, not aCodegenieErrorsubclass. Phase 1's markers-only invariant (tests/unit/test_errors.py::test_subclasses_are_markers_only) binds onlycodegenie.errors.CodegenieErrorsubclasses; Phase 2'sResult.Errvalues are typed Pydantic models with reason discriminators. Do not subclassCodegenieError; do not try to fitSkillsLoadErrorinto the markers-only contract — that is a different contract for a different purpose.- Per-file errors are NOT fatal. A symlink, an unsafe-YAML payload, or a schema violation on ONE skill must NOT cause
load_all()to returnResult.Err. The signature returnsResult.Ok(LoadOutcome(...))with a per-file-error list; the onlyResult.Errpath is catastrophic (e.g., all search paths unreadable). Arch §"Failure behavior" is explicit: "this skill skipped; loud CLI warning; other skills load." - Tempfile dance for
safe_yaml.load.safe_yaml.loadtakes aPath. The frontmatter-bytes-to-temp-file step is ugly but preserves the Phase 1 chokepoint without aloads_bytesshim. Thetempfile.NamedTemporaryFile(delete=False)+finally: tmp_path.unlink(missing_ok=True)shape leaks no fd; usedelete=Falsebecause Windows would otherwise lock the file (we run on macOS / Linux but the discipline is portable). If aloads_bytesshim lands later, replace this dance — one place to update. find_applicablemust remain a pure function over the cached_skillslist. Do not re-load on every call; the loader's state is the cached list from the lastload_all(). Ifload_all()was never called,find_applicablereturns[](do not auto-callload_all()— explicit > implicit, per Rule 3). The matching logic lives in a top-level pure function_matches(skills, evidence)(functional-core); the instance method is a thin wrapper. This lets the matching contract be unit-tested without a filesystem fixture.
Design-pattern notes (added by validator 2026-05-15)¶
- Hashing chokepoint extension (Open/Closed at the file boundary). The streaming-hash requirement forces a new public symbol on
codegenie.hashing. Do not importblake3directly undersrc/codegenie/skills/— that violates ADR-0001 §"single source of truth for hashing" (Phase 0). The Open/Closed-compliant move is to extendcodegenie.hashingwith a new function, leaving every existing call site unchanged. This is the same pattern asparsers/_io.py::open_capped— one kernel, many parser-kind callers — applied to hashing. - Functional-core / imperative-shell split.
_load_one_skilldoes seven concerns. Pure-function shape preferred: - Pure:
_split_frontmatter(data: bytes) -> Result[FrontmatterSplit, FrontmatterUnterminated](whereFrontmatterSplit = NamedTuple(frontmatter_bytes: bytes, body_offset: int)). - Pure:
_build_skill(parsed: dict, body_offset: int, body_size: int, body_blake3: str) -> Result[Skill, SchemaViolation](wraps thetry/except ValidationError). - Pure:
_matches(skills, evidence) -> list[Skill](thefind_applicableengine). - Impure:
_open_skill_path(path) -> Result[int, SymlinkRefused | IoFailure](theos.opencall). - Impure:
_read_with_streaming_hash(fd, body_offset) -> tuple[int, str](the fd-based loop). - Impure:
_load_one_skillis the imperative shell composing the above.
Each pure helper is unit-testable with bytes inputs — no filesystem fixture, no tempfile, no monkeypatch. The mutation-table closures (AC-7a, AC-11a) become trivial against pure functions. Not promoted to AC (Rule 3 / surgical changes — observable behavior is constrained by the AC suite; the shape is the implementer's call). But if the implementer writes a 200-LOC monolithic _load_one_skill, expect the executor's Validator pass to push back on test-isolation grounds.
- Tier typing — Literal, not int indices. _TIER_NAME = {0: "user", 1: "repo", 2: "org"} is a tag-and-dispatch over int (ADR-0033 §3 flags this as anti-pattern adjacent). Use Tier: TypeAlias = Literal["user", "repo", "org"] + _TIERS: Final[tuple[Tier, ...]] = ("user", "repo", "org"). zip(_TIERS, self._search_paths, strict=True) iterates with typed tier identity; log events emit winning_tier: Tier directly. The type checker now enforces "no fourth tier without an ADR amendment to _TIERS + Tier."
- SkillsLoadError is a Pydantic discriminated union, but TCCMLoadError (S1-04) is a marker CodegenieError subclass — both are correct. Convention forward: single-file loaders that raise on parse failure (safe_yaml.load, TCCMLoader) use marker exceptions with prefixed args[0]; multi-file partial-success loaders that return per-file errors as values (this story, S2-02 ConventionsCatalogLoader) use Pydantic discriminated unions. Phase 1's markers-only invariant (tests/unit/test_errors.py::test_subclasses_are_markers_only) binds only codegenie.errors.CodegenieError subclasses; Result.Err payloads are out-of-scope for that test and require their own typed-value discipline (ADR-0033 §3).
- Primitive obsession on applies_to_* (ADR-0033 §1). list[str] is syntactically correct but semantically wrong — applies_to_tasks is a list of task-class identifiers, not arbitrary strings; same for languages. Use list[TaskClassId] and list[Language] so a mistake like passing a Language where a TaskClassId is expected fails at type-check time. The newtypes already live in codegenie.types.identifiers; import them.
- EvidenceQuery over set[str] (ADR-0033 §3, illegal-states-unrepresentable). A flat set[str] of mixed task/language identifiers cannot distinguish "the empty set of languages, no task" from "a query with no task and any language" — and silently conflates applies_to_tasks matching with applies_to_languages matching. The typed EvidenceQuery(task: TaskClassId | None, languages: set[Language]) makes the two axes structurally separate; the wildcard semantics ("*") live in the Skill, not the query. Phase 4+ planners that produce queries get typed call sites.
- ["*"] wildcard sentinel — CLAUDE.md convention. Per CLAUDE.md §"Conventions to follow when writing the POC": "Each probe declares applies_to_tasks and applies_to_languages with ['*'] meaning 'all'." This is the documented wildcard. Do not invent a applies_to_tasks: list[TaskClassId] | None = None "any" sentinel — that's primitive-obsession resurfacing.
- Three-tier order is ADR-amend-gated (intentional friction). Adding a fourth tier (e.g., "container-mounted" or "monorepo-shared") requires editing Tier Literal + _TIERS constant + default() factory + this story's ACs. That friction is intentional — the security model depends on tier count and order, not just on the merge semantics. Per the precedent of ADR-0030's "no Unknown DerivedQuery variant" trapdoor.
- unsafe_yaml is operationally umbrella, not strictly accurate. safe_yaml.load translates every yaml.YAMLError subclass to MalformedYAMLError — ConstructorError (the !!python/... case), ParserError (the syntax-typo case), ScannerError, ComposerError. The unsafe_yaml reason name is the operationally-prudent umbrella ("treat as hostile until investigated"). Operators reading unsafe_yaml=true in a log inspect the file; the inspection itself distinguishes attack-attempt from typo. Do not split into unsafe_yaml / malformed_yaml — the cost-of-distinguishing exceeds the value (we'd be inferring intent from a __cause__ chain). AC-6 + AC-6b prove both arms land in the same bucket and that no code executes for either. Arch §"Edge cases" row 8 is authoritative.
- body_blake3 format pin. The hashing module returns f"blake3:{64-hex}". body_blake3 is pinned to that format via Field(pattern=...); AC-8 asserts the regex. An implementation that strips the prefix, truncates, or silently switches to SHA-256 fails the model construction itself — fail-loud, ADR-0033-style.
- No O_NOFOLLOW defense-in-depth via safe_yaml.load. safe_yaml.load itself uses O_NOFOLLOW on the path it's given (the tempfile). The SkillsLoader-side O_NOFOLLOW on path (the real SKILL.md) is the load-bearing defense — the tempfile-side check is incidental (tempfiles aren't symlinks). AC-18 pins the SkillsLoader-side flag set explicitly.
- TOCTOU: rglob does NOT snapshot the filesystem. Between search_path.rglob("SKILL.md") yielding a path and os.open(path) opening it, an attacker (or a concurrent process) can delete the file. The original story claimed "load_all skips by rglob's snapshot semantics" — that's incorrect; Python's pathlib.Path.rglob is a generator over os.scandir results, not a snapshot. AC-20 mandates typed IoFailure instead of an unhandled FileNotFoundError. Catch all OSError (except ELOOP which is SymlinkRefused) and surface as IoFailure(errno_name=errno.errorcode[exc.errno]). The errno-name is a structural fingerprint operators can grep; the per-file error does not let the gather fail.
- Same-tier collisions exist too. Two SKILL.md files under ~/.codegenie/skills/ with the same id: foo is a real misconfiguration. The story treats this the same as cross-tier — same skill_shadowed event, with winning_tier == shadowed_tier. Lexicographic-first wins (deterministic). AC-16 closes this case.