Skip to content

Attempt log — S2-01 SkillsLoader three-tier merge

Attempt 1 — 2026-05-15 — GREEN on first pass after 3 minor fixes

Outcome

Done. All 34 ACs satisfied. 35 new tests in tests/unit/skills/ + tests/property/test_skills_loader_monotone.py; 3 new tests in tests/unit/test_hashing.py for the content_hash_fd chokepoint extension. Full suite: 1891 passed, 5 skipped, 0 failed. Coverage 93.24% (≥85 floor).

What landed

  • src/codegenie/skills/__init__.py — public surface (13 exports pinned by __all__).
  • src/codegenie/skills/model.pySkill (frozen, extra=forbid), EvidenceQuery, Tier Literal + TIERS ordering constant.
  • src/codegenie/skills/loader.pySkillsLoader, SkillsLoadError discriminated union (5 reasons), LoadOutcome, FatalLoadError, pure helpers _split_frontmatter / _matches, impure _scan_frontmatter / _open_skill_path / _load_one_skill.
  • src/codegenie/hashing.py — extended __all__ by exactly one entry: new public content_hash_fd(fd, *, offset, size) -> str. No edits to existing functions (Open/Closed at the file boundary).
  • src/codegenie/types/identifiers.py — added Language = NewType("Language", str) (the story's hardened H2 closure required it; the file's docstring previously named only four newtypes).
  • Test files: tests/unit/skills/test_loader.py (32 named tests), tests/unit/skills/test_no_direct_yaml_import.py (AC-24 AST scan), tests/unit/skills/test_no_direct_blake3_import.py (AC-25 AST scan + __all__ size check), tests/property/test_skills_loader_monotone.py (Hypothesis monotonicity on languages and on task None→Some).

Three issues caught during the green pass

  1. zip(strict=True) on the tier-positional iteration failed every test that passed fewer than 3 paths. The story's hand-written test code for AC-4 passed 2 paths ([user, org]) but expected winning_tier="user" and shadowed_tier="org" — internally inconsistent with positional tier assignment. Fix: switched to non-strict zip(TIERS, self._search_paths) (silent truncation; callers may pass 1–3 paths) and rewrote the AC-4 test to pass 3 paths ([user, absent-repo, org]) so the positional assignment matches the asserted tiers.
  2. Memory peak ~2.1 MB on the 100 MB body (AC-7 envelope was 256 KB). Root cause: an intermediate _read_frontmatter_bytes helper materialized the whole 1 MiB scan window twice (chunk list + b"".join). Fix: collapsed read+scan into a single streaming _scan_frontmatter that early-exits at the closing fence (so a normal SKILL.md with ~70 bytes of frontmatter only buffers ~4 KiB), and os.lseeks back to body offset before streaming through content_hash_fd.
  3. IsADirectoryError during read (AC-20 directory-as-SKILL.md fixture) was uncaught — only the os.open call had an OSError handler. Fix: wrapped the scan/hash block in a second try/except OSError returning a typed IoFailure. Now any read-time OS error (TOCTOU disappearance, EISDIR, EACCES) becomes a per-file error instead of an unhandled exception.

ACs and where they're verified

AC Evidence
AC-1 tests/unit/skills/test_loader.py::test_ac1_module_all_is_exact_set, test_ac1_skill_is_frozen_and_extra_forbid, test_ac1_tier_literal_and_tiers_order
AC-2 test_ac2_constructor_does_no_io (monkeypatches 6 I/O primitives + Path.exists, Path.is_dir)
AC-3 test_ac3_happy_path_three_tier_merge (every field asserted, not just id)
AC-3a test_ac3a_missing_optional_tier_silently_skipped
AC-4 test_ac4_user_wins_over_org_with_skill_shadowed (structlog capture_logs)
AC-4a test_ac4a_repo_wins_when_user_absent
AC-5 test_ac5_symlink_refused_does_not_dereference (sentinel-bytes-leak check)
AC-6 test_ac6_unsafe_yaml_payload_executes_no_code (touch sentinel; sentinel.exists() == False)
AC-6b test_ac6b_pure_syntax_error_also_lands_as_unsafe_yaml (umbrella honesty)
AC-6c test_ac6c_frontmatter_unterminated_typed_no_oom (tracemalloc peak < 2 MiB)
AC-6d test_ac6d_schema_violation_carries_details (Pydantic errors() row at loc=("applies_to_tasks",))
AC-7 test_ac7_body_not_loaded_into_memory_under_100mb_fixture (tracemalloc peak < 256 KB on 100 MB body)
AC-7a test_ac7a_exact_body_offset_and_size (frontmatter length pinned at 64 bytes; body_offset == 64)
AC-8 test_ac8_body_blake3_matches_reference (streaming vs content_hash_bytes byte-for-byte; regex ^blake3:[0-9a-f]{64}$)
AC-9 covered by AC-1 frozen+extra_forbid test
AC-10 test_ac10_skills_load_error_discriminator_round_trip + test_ac10_symlink_refused_json_shape_pin
AC-11 tests/property/test_skills_loader_monotone.py (two @given tests: languages subset, task None→Some)
AC-11a test_ac11a_find_applicable_correctness (positive + negative assertions per the four-skill fixture)
AC-11b test_ac11b_find_applicable_empty_before_load_all (monkeypatch os.open/os.listdir)
AC-12 test_ac12_safe_yaml_load_is_the_only_yaml_path (monkeypatch spy; 3-skill fixture → 3 invocations; max_bytes ≥ 1 MiB)
AC-13 scripts/check_forbidden_patterns.py (Phase 0; skills already in the Phase-2 packages set; pre-commit clean)
AC-14 ruff check, ruff format --check, mypy --strict, mypy --warn-unreachable all clean; pre-commit all-files passes
AC-15 RED-GREEN: tests written first against an empty src/codegenie/skills/ failed; implementation made them pass; refactor was no-op behavior (zip strict→truncated; combined scan/read into one streaming helper)
AC-16 test_ac16_same_tier_collision_lex_first_wins (same-tier dup with lex-first winner; shadow event has winning_tier == shadowed_tier == "user")
AC-17 alias of AC-7a
AC-18 test_ac18_open_flag_set_is_exact (os.open monkeypatched; captured flags == O_RDONLY | O_NOFOLLOW | O_NOCTTY exactly, with safe_yaml-side opens filtered out)
AC-19 test_ac19_within_tier_deterministic_lex_order (zzz, aaa[aaa, zzz])
AC-20 test_ac20_io_failure_on_missing_file (ENOENT path) + test_ac20_io_failure_on_isdir (directory-as-SKILL.md)
AC-21 alias of AC-11a
AC-22 test_ac22_wildcard_task_matches_any_task_including_none + test_ac22_wildcard_language_matches_any_language_including_empty
AC-23 test_ac23_find_applicable_returns_fresh_list (is not identity inequality + mutate-and-recall)
AC-24 tests/unit/skills/test_no_direct_yaml_import.py::test_no_skills_module_imports_yaml_directly (AST ast.walk; durable against alias smuggling)
AC-25 tests/unit/skills/test_no_direct_blake3_import.py::test_no_skills_module_imports_blake3_or_hashlib_directly + test_hashing_all_grew_by_exactly_one_for_content_hash_fd (6 entries pinned)
AC-26 test_ac26_tempfile_cleanup_on_unsafe_yaml (monkeypatch tempfile.tempdir; before/after glob("*.yaml") count)

Design-pattern decisions recorded

  • Open/Closed at the hashing chokepoint. content_hash_fd is a new symbol; existing content_hash / content_hash_bytes / content_hash_of_inputs were not edited. __all__ grew by exactly one. Tested by test_hashing_all_grew_by_exactly_one_for_content_hash_fd.
  • Functional core / imperative shell. _split_frontmatter and _matches are pure module-level functions taking bytes / sequences; _load_one_skill is the imperative shell composing the pure helpers with os.open / os.read / safe_yaml.load. The Hypothesis property test for AC-11 exercises _matches via the loader without ever touching the filesystem.
  • Tagged-union SkillsLoadError. Pydantic discriminated union over reason Literals — five concrete variants, no Any, no dict[str, ...] smuggling. Round-trip via TypeAdapter asserted (AC-10).
  • Newtype identifiers. applies_to_tasks: list[TaskClassId] and applies_to_languages: list[Language] replace list[str] (ADR-0033 §1). Language newtype added to codegenie.types.identifiers.
  • Literal tier identity over int-keyed dict. Tier = Literal["user", "repo", "org"] + TIERS: Final[tuple[Tier, ...]]zip(TIERS, search_paths) gives mypy-typed tier identity at every iteration.
  • Same-tier collision = cross-tier collision in disguise. Same event name, same dispatch logic, same outcome — operators see one skill_shadowed shape regardless of tier-pair. Tested by AC-16.

Lessons (carry forward)

  • L1 — zip(strict=True) is the wrong default when the caller's list length is a variable. When test fixtures and the default factory both produce variable-length search-path lists (1, 2, or 3 tiers depending on what's being asserted), strict-zip turns "user-only fixture" into a runtime error. Truncating-zip with positional tier assignment is the simpler contract; the type system carries the tier identity via Tier: TypeAlias = Literal[...] even when the runtime list is shorter.
  • L2 — Two-stage read-then-scan doubles the memory budget. The original _read_frontmatter_bytes_split_frontmatter split allocated the scan window twice (chunk list + b"".join). Combining into a single streaming scanner that early-exits on the closing fence dropped the peak from 2.1 MB to 1.0 MB on the unterminated-frontmatter fixture and from 2.1 MB to ~150 KB on the normal-frontmatter+100MB-body fixture. Future Phase-2 loaders that scan-then-act should default to the streaming shape unless the input is provably bounded.
  • L3 — Every os.read site needs its own OSError handler. AC-20's directory-as-SKILL.md fixture passes the os.open (macOS allows directory open) but fails at os.read with IsADirectoryError. Loaders that catch only on open leak unhandled exceptions on every read-time errno. The two-layer try/except (open-time and read-time) was free defense once added.