Story S6-05 — Ownership + ServiceTopologyStub + SloStub Layer E¶
Step: Step 6 — Ship Layer D + E + G probes (skills, conventions, ADRs, ownership, scanners)
Status: Done — GREEN 2026-05-18 (phase-story-executor; see _attempts/S6-05.md for the per-AC evidence table + gate log)
Effort: S
Depends on: S6-04 (Layer-D deferred-stub probe-shape — Null Object + Tagged union via discriminator + Open/Closed at file boundary; _PROBE_ID: Final[ProbeId] constant alongside name: str ABC attr; async def run(repo, ctx) signature; six-field ProbeOutput with duration_ms via time.perf_counter(); confidence="high" for "absence is the data" deferred state; discriminator="opted_in" literal in source as grep-trip-wire); S5-03 (Layer-C marker-probe shape established — _make_repo/_make_ctx test helpers; pure-helper functional-core extracted from imperative-shell run); S2-02 (ConventionsCatalogLoader pattern-types — Layer-E OwnershipProbe re-uses the marker-driven probe discipline).
ADRs honored: Phase 0 ADR-0007 (Probe ABC frozen byte-for-byte against localv2.md §4 — name: str, async def run(self, repo, ctx), ProbeOutput six-field shape, confidence: Literal["high","medium","low"]), 02-ADR-0003 (@register_probe(heaviness=…) is a registry kwarg — NOT a Probe ABC field), 02-ADR-0005 (no plaintext persistence — emails / Slack handles in CODEOWNERS and service.yaml flow through the writer redactor; the probe captures evidence honestly), 02-ADR-0007 (no plugin loader / no Confluence / Notion / service-catalog HTTP clients — the same "ship the boundary, defer the implementation" discipline S6-04 applied to ExternalDocsProbe), 02-ADR-0008 (no event stream in Phase 2 — ServiceTopologyStub's and SloStub's real data sources are Phase-9-or-later event consumers)
Phase-2 deferred decision honored: localv2.md §5.5 — "These probes are mostly stubbed for local-dev. They emit 'data unavailable' markers when their data sources aren't configured." Phase 2 ships OwnershipProbe for real (it's a simple CODEOWNERS parser) and ships ServiceTopologyStub + SloStub as deferred stubs (mirrors S6-04 ExternalDocsProbe discipline; semantics re-routed through confidence="high" per the kernel ProbeOutput.confidence: Literal["high","medium","low"]).
Validation notes¶
Hardened 2026-05-17 via phase-story-validator (see _validation/S6-05-layer-e-probes.md for the full audit log). Twenty-one in-place edits resolved nine block-severity contract mismatches identical in shape to the S6-04 hardening — every one a drift between this story's draft and the kernel actually shipped (src/codegenie/probes/base.py:52-96, src/codegenie/probes/registry.py:131-238, src/codegenie/types/identifiers.py:29). The biggest structural fix re-routes the stubs' confidence semantics: the original AC-9 / AC-10 built every stub-test on confidence="unavailable", a value that does not exist in the frozen ProbeOutput.confidence: Literal["high","medium","low"] ABC field (Phase 0 ADR-0007, src/codegenie/probes/base.py:68). The harden re-routes the semantics through S6-04's "absence is the data → confidence='high'" precedent (the deferred-stub state is one the probe successfully determined — not a failure-to-determine), with typed opted_in: Literal[False] + reason: Literal["phase_9_or_later"] slice fields carrying the state and the confidence reporting the quality of the determination. OwnershipProbe.confidence for absent CODEOWNERS stays "low" — that case is a CertificateProbe-style upstream-absent observation (the file might exist but doesn't; the answer "no CODEOWNERS" is operationally a low-information signal for the Planner), not an opted-out deferral.
Eight further block-severity contract fixes mirror S6-04 / S6-01 / S6-02 / S6-03 hardenings: _run(self, ctx) → async def run(self, repo, ctx); probe_id class attrs → name: str = "ownership" (etc.) ABC attrs + module-level _PROBE_ID: Final[ProbeId] = ProbeId("ownership") (etc.) constants; full frozen Phase-0 ABC field set declared verbatim (version, layer, tier, requires, declared_inputs, timeout_seconds, cache_strategy); tuple[str, ...] → list[str] for applies_to_* (ABC requires list); from codegenie.ids → from codegenie.types.identifiers (codegenie.ids doesn't exist); _PROBE_REGISTRY["..."] → next(e for e in default_registry._entries if e.cls.name == "..."); ProbeContext.for_test(repo_root=...) (doesn't exist) → _make_repo(tmp_path) + _make_ctx(tmp_path) helpers (precedent: tests/unit/probes/layer_c/test_dockerfile.py:61-74); ProbeOutput(probe_id=..., …) → six-field shape (schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors) with duration_ms via time.perf_counter().
Seven new ACs strengthen the AC set with mutation-resistance trip-wires and extension-by-addition discipline:
- AC-NEW-1 —
_PROBE_ID: Final[ProbeId]module constants for all three probes (dual-form identity discipline from S6-01/02/03/04 hardenings; mirrorssrc/codegenie/probes/layer_b/scip_index.py:114). - AC-NEW-2 — registry-membership smoke for all three probes (a future contributor removing the
@register_probedecorator would silently stop the probe from running in every gather; this test is the trip-wire — mirrors S6-04 AC-NEW-3). - AC-NEW-3 — tagged-union discriminator key
discriminator="opted_in"literal in stub-module sources (a future contributor changing the discriminator key fromopted_into e.g.kindwould silently fragment the Phase-9 widening — mirrors S6-04 AC-NEW-4). - AC-NEW-4 — no subclass-based extension path on any of the three probes (forbids
class OptedInServiceTopologyProbe(ServiceTopologyStubProbe)and friends; the eventual opted-in branch must bematch-dispatched conditional logic, not inheritance — mirrors S6-04 AC-NEW-5). - AC-NEW-5 —
OwnershipProbeinline-comment edge case (*.py @user # owners are the python teamparses with the inline-#and tail tokens stripped; otherwise a contributor "supporting" inline comments by stripping at the first#would also strip#characters legitimately appearing in patterns). - AC-NEW-6 —
OwnershipProbeline-parser extracted as a pure module-level free function_parse_codeowners_lines(text: str) -> tuple[tuple[OwnershipEntry, ...], tuple[str, ...]](functional-core / imperative-shell discipline —runis the only impure code; the parser is testable without anfdor aRepoSnapshot). - AC-NEW-7 —
OwnershipProbedocuments the intentional divergence from GitHub's documentedCODEOWNERSsearch order (CODEOWNERS>.github/CODEOWNERS>docs/CODEOWNERSin this implementation; GitHub uses.github/CODEOWNERS>CODEOWNERS>docs/CODEOWNERS) in the module docstring with the exact grep-discoverable phrase"Phase 2 search order intentionally diverges from GitHub"so a future contributor sees the choice was deliberate, not accidental.
Three design-pattern hardens were applied:
- The Phase-2 stub slices were renamed
ServiceTopologyStubSlice→NotOptedInServiceTopologySliceandSloStubSlice→NotOptedInSloSliceso the eventual Phase-9+ opted-in variants land as new sibling models under a tagged union (Annotated[NotOptedInServiceTopologySlice | OptedInServiceTopologySlice, Field(discriminator="opted_in")]) — not as backward-compatible additive fields on the existing models. This is the Open/Closed at the file boundary discipline S6-04 settled (NotOptedInExternalDocsSliceprecedent) and the rest of Phase 2 follows (Fresh+Stale,Pass+Fail+NotApplicable). - The Notes-for-implementer section explicitly names the design patterns each probe deploys: Null Object (stubs satisfy the ABC so consumers don't special-case), Tagged union via discriminator (
opted_inis the key choice; Phase 2 makes the discriminator-key commitment now), Open/Closed at file boundary (future opted-in branches land viamatchdispatch + new sibling slice models, never via subclass), and Functional core / imperative shell (the CODEOWNERS line-parser is pure;OwnershipProbe.runis the imperative shell). - The Refactor section now shows the exact
matchpattern the Phase-9 extension follows for each stub (match repo.config.get("service_topology"): case None | {}: …; case {"sources": _}: …; case other: assert_never(other)) so a Phase-9 contributor reading this story has a typed blueprint, not just a prohibition.
Schema-naming consistency note (deferred to S6-08): S6-05 ships Pydantic slice models; the JSON-Schema sub-schema files land in S6-08 under src/codegenie/schema/probes/layer_e/ (per S6-08 AC-9). S6-08 currently names them ownership.schema.json, service_topology_stub.schema.json, slo_stub.schema.json — the _stub suffix on the two stubs is inconsistent with the probe name ("service_topology", "slo" — no suffix). The convention "schema filename matches probe name" should win; this story pins probe names and slice models without the _stub suffix and flags the S6-08 naming for amendment in _validation/S6-05-layer-e-probes.md. AC-13 in this story is soft — it asserts the Pydantic slice contracts (extra="forbid", Literal enforcement) here; the JSON-Schema round-trip is an S6-08 concern.
No NEEDS RESEARCH findings — every gap traced to in-repo precedent: src/codegenie/probes/base.py:64-96 for the frozen ABC contract; src/codegenie/probes/layer_b/index_health.py:298-326 for the canonical full-field ABC declaration; src/codegenie/probes/layer_b/scip_index.py:114 for the _PROBE_ID: Final[ProbeId] dual-form discipline; src/codegenie/probes/layer_c/certificate.py and src/codegenie/probes/layer_c/dockerfile.py for the marker-probe shape OwnershipProbe mirrors; tests/unit/probes/layer_c/test_dockerfile.py:61-78 for the _make_repo/_make_ctx helper pattern; S6-04 validation report for the deferred-stub lineage.
Context¶
Layer E (Cross-repo / Operational) is mostly forward-looking — most of its data sources are production service catalogs, service meshes, SLO definitions, on-call schedules. None of that is meaningfully available in the local POC. One probe in Layer E has a real local-dev consumer today: OwnershipProbe, which reads CODEOWNERS. The other two — service topology and SLOs — exist as registered stubs so Phase 9+ can extend without contract churn.
CODEOWNERS is a stable, line-oriented format (GitHub / GitLab convention): one line per pattern, pattern @owner1 @team2. The probe parses the file (if present), emits a tuple[OwnershipEntry, ...] keyed by pattern, and notes absent / malformed cases without raising. The repo-root file is the canonical location; .github/CODEOWNERS, docs/CODEOWNERS, CODEOWNERS are the three conventional paths (GitHub's documented search order). Phase 2's OwnershipProbe intentionally diverges from GitHub's order — it searches CODEOWNERS > .github/CODEOWNERS > docs/CODEOWNERS because the repo-root file is the most visible to operators; an operator who wants .github/CODEOWNERS to win simply deletes the root file. Notes for the implementer §4 documents this with the exact grep-discoverable phrase the docstring must carry.
ServiceTopologyStub and SloStub follow the S6-04 ExternalDocsProbe discipline: register, run, emit confidence="high" (the probe successfully determined the feature is not opted in / not configured — absence is the data) with a typed slice carrying the state (opted_in: Literal[False], reason: Literal["phase_9_or_later"]). The discriminator key opted_in IS the schema commitment Phase 2 makes; the eventual Phase-9+ opted-in variants land as new sibling models under Annotated[NotOptedInServiceTopologySlice | OptedInServiceTopologySlice, Field(discriminator="opted_in")]. The slice schemas are deliberately minimal — Phase 9 extends with an ADR-amend rather than a backward-incompatible break.
Design-pattern lineage the implementer inherits (and must not break):
- Null Object pattern (stubs) — non-functional implementations that satisfy the
ProbeABC so the coordinator, theconfidencesection renderer, and the Planner consumeServiceTopologyStubProbe/SloStubProbeexactly as they consume any other Layer-E probe — no null-checks, noif probe.name in {"service_topology", "slo"}: skip, no special-casing. - Tagged union via discriminator (stubs) —
opted_inis the discriminator key. The Phase 2 closed shape isLiteral[False]; the eventual Phase-9+ tagged union isAnnotated[NotOptedInServiceTopologySlice | OptedInServiceTopologySlice, Field(discriminator="opted_in")]. Picking the discriminator key in Phase 2 is the only schema commitment Phase 2 makes for the stubs; everything else is deferred. - Open/Closed at the file boundary (stubs) — when the opted-in branches land (Phase 9+), the dispatch is
match repo.config.get("service_topology"): case None | {}: _emit_not_opted_in_topology(ctx); case {"sources": _}: _emit_opted_in_topology(repo, ctx); case other: assert_never(other)with an exhaustivematch+assert_neveron the discriminator. Not a subclass; not anif/elseladder; not edits to theNotOptedInServiceTopologySlicemodel. - Functional core / imperative shell (ownership) — the CODEOWNERS line-parser
_parse_codeowners_lines(text) -> tuple[tuple[OwnershipEntry, ...], tuple[str, ...]]is a pure module-level free function: bytes-in, parsed-tuples-out, nofd, noRepoSnapshot, noPath.OwnershipProbe.runis the only impure code (filesystem search + size check + atomic write of the raw artifact). This is the precedent S6-01/02/03/04 settled and Layer-C marker probes (dockerfile.py,entrypoint.py,shell_usage.py,certificate.py) already follow.
Phase 0's secret redactor handles email leakage at the writer chokepoint. The Phase 2 commitment is that the probe doesn't pre-filter — it captures evidence honestly, and the writer chokepoint redacts. (CODEOWNERS emails aren't AWS keys, but the same chokepoint discipline applies: the probe is honest about what's in the file; the redactor decides what reaches disk.)
References — where to look¶
- Architecture:
../phase-arch-design.md§"Anti-patterns avoided" — "Schema before consumer" — the stubs ship a minimal schema with one consumer (the unit test); the real opted-in schema lands with Phase 9.../phase-arch-design.md§"Edge cases" — typed-reason discriminated unions across every state machine;OwnershipProbe's "no file present" maps to a typedResult-shaped slice, not an exception.- Phase ADRs:
../ADRs/0005-secret-findings-no-plaintext-persistence.md— emails are not secrets per se, but the redactor handles handles/emails at the writer chokepoint.../ADRs/0007-no-plugin-loader-in-phase-2.md— the canonical "ship the boundary, defer the implementation" precedent S6-04 applied toExternalDocsProbe; same discipline applied here toServiceTopologyStub+SloStub.../ADRs/0008-no-event-stream-in-phase-2.md—SloStub's real data source (production SLO catalog) is a Phase-9-or-later event consumer.../ADRs/0003-coordinator-heaviness-sort-annotation.md—heavinessis a@register_probe(heaviness=...)kwarg, NOT aProbeABC field; registry membership verified viadefault_registry._entries.- Source design:
../High-level-impl.md§"Step 6" — marker-driven; stubs for Phase 9+.../../localv2.md§5.5 — Layer E description; stubs by default.- Probe-shape precedents (post-hardening):
./S6-04-external-docs-opt-in.md(HARDENED) — the canonical Phase-2 deferred-stub probe pattern:confidence="high"with typed slice carrying state;opted_in: Literal[False]discriminator key; Open/Closed at file boundary; no subclass extension;discriminator="opted_in"literal in source../S5-03-layer-c-marker-probes.md(HARDENED) — the canonical Layer-C marker-probe pattern:_make_repo/_make_ctxhelpers; pure functional-core parsers extracted fromrun; module-level_PROBE_ID: Final[ProbeId]constant.- Existing kernel (the authoritative contract for this probe):
src/codegenie/probes/base.py(Phase 0) —ProbeABC frozen byte-for-byte againstlocalv2.md §4.name: str(NOTprobe_id);layer,tier;applies_to_tasks: list[str]/applies_to_languages: list[str](NOTtuple);requires: list[str],declared_inputs: list[str],timeout_seconds: int,cache_strategy: Literal["content","none"].async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput—RepoSnapshotis the FIRST arg (NOT actxfield).ProbeOutput.confidence: Literal["high","medium","low"]—"unavailable"is NOT a permitted value.ProbeOutput(schema_slice, raw_artifacts, confidence, duration_ms, warnings, errors)— six fields, noprobe_id.ProbeContextis a stdlib@dataclasswithcache_dir, output_dir, workspace, logger, configplus three optionals; NOrepo_root, NOfor_testclassmethod.src/codegenie/probes/registry.py:238—default_registry: Registry. The "look up heaviness" pattern:next(e for e in default_registry._entries if e.cls.name == "ownership").heaviness == "light". NO_PROBE_REGISTRYdict.src/codegenie/probes/layer_b/scip_index.py:114(precedent) —_PROBE_ID: Final[ProbeId] = ProbeId("scip_index")module-level constant alongsidename: str = "scip_index". Dual-form probe identity (str ABC attr + typed Final constant).src/codegenie/probes/layer_b/index_health.py:298-326— canonical full-field ABC declaration includingversion,layer,tier,requires,timeout_seconds,cache_strategy,declared_inputs.src/codegenie/probes/layer_c/certificate.py— Layer-C marker probe; the canonical full ABC field declaration mirror for marker-style probes.src/codegenie/probes/layer_c/dockerfile.py(with_dockerfile_parse.pysibling) — functional-core extraction precedent; line-by-line parser extracted as a pure free function.tests/unit/probes/layer_c/test_dockerfile.py:61-78— canonical_make_repo+_make_ctxhelper pattern for Layer-C/D/E tests.src/codegenie/types/identifiers.py:29—ProbeId = NewType("ProbeId", str). NOTcodegenie.ids— that module does not exist.
Goal¶
Ship three files under src/codegenie/probes/layer_e/:
ownership.py— realCODEOWNERSparser.@register_probe(heaviness="light"). Pure line-parser extracted as_parse_codeowners_lines(functional-core / imperative-shell). Parses any of the three GitHub-convention locations (Phase 2 order:CODEOWNERS>.github/CODEOWNERS>docs/CODEOWNERS); emits a typedOwnershipSlice;confidence="high"for found-file happy path,confidence="low"for absent file (Planner-actionable observation — mirrorsCertificateProbeupstream-absent pattern).service_topology_stub.py— deferred stub. Mirrors S6-04external_docs.pyprecedent:confidence="high", typedNotOptedInServiceTopologySlice(opted_in: Literal[False], reason: Literal["phase_9_or_later"]), no I/O beyond a single atomic raw-artifact write.opted_inis the eventual tagged-union discriminator key.slo_stub.py— deferred stub. Same shape asservice_topology_stub.py:confidence="high", typedNotOptedInSloSlice(opted_in: Literal[False], reason: Literal["phase_9_or_later"]), no I/O beyond a single atomic raw-artifact write.
The module docstrings explicitly name the deferral (stubs) and the intentional GitHub-search-order divergence (ownership) as grep-discoverable trip-wires.
Acceptance criteria¶
Numbered for traceability to the TDD plan.
Module layout & types¶
- [ ] AC-1. Three new files exist under
src/codegenie/probes/layer_e/plus__init__.py. Each file's__all__declares exactly the slice model(s) + probe class (alphabetically sorted). Specifically: ownership.py→__all__ = ["OwnershipEntry", "OwnershipProbe", "OwnershipSlice"]service_topology_stub.py→__all__ = ["NotOptedInServiceTopologySlice", "ServiceTopologyStubProbe"]-
slo_stub.py→__all__ = ["NotOptedInSloSlice", "SloStubProbe"] -
[ ] AC-2. Each Pydantic slice model has
model_config = ConfigDict(frozen=True, extra="forbid").OwnershipEntry,OwnershipSlice,NotOptedInServiceTopologySlice, andNotOptedInSloSlice— all four models. Mutation caught: a future contributor relaxingextra="forbid"toextra="allow"would silently admit unknown fields, breaking the discriminator-key invariants in AC-9 / AC-10. -
[ ] AC-3. Stub slice types are tagged-union-ready.
NotOptedInServiceTopologySlicehas exactlyopted_in: Literal[False]andreason: Literal["phase_9_or_later"]. TheLiteral[False]is the discriminator value; relaxing toboolwould silently admitopted_in=Trueslices before the Phase-9 opted-in branch exists.NotOptedInSloSlicehas the same two fields with identical types and identicalLiteralvalues.- The eventual tagged unions are
Annotated[NotOptedInServiceTopologySlice | OptedInServiceTopologySlice, Field(discriminator="opted_in")]andAnnotated[NotOptedInSloSlice | OptedInSloSlice, Field(discriminator="opted_in")]; Phase 2 ships only the not-opted-in variants.
Probe registration & ABC compliance¶
- [ ] AC-4. All three probes are decorated
@register_probe(heaviness="light")(kwarg form — 02-ADR-0003); class attributes declare the full frozen Phase-0ProbeABC field set verbatim:
name: str = "ownership" # or "service_topology" or "slo"
version: str = "0.1.0"
layer = "E"
tier = "base"
applies_to_tasks: list[str] = ["*"] # list[str], NOT tuple[str, ...]
applies_to_languages: list[str] = ["*"]
requires: list[str] = []
declared_inputs: list[str] = [...] # per-probe — see Implementation outline
timeout_seconds: int = 5
cache_strategy: Literal["content"] = "content"
Probe name values are exactly "ownership", "service_topology", "slo" (no _stub suffix on the stub probes — the probe identity is the long-term concern; the _stub suffix appears only on the Phase-2 implementation class names ServiceTopologyStubProbe / SloStubProbe).
-
[ ] AC-NEW-1.
_PROBE_ID: Final[ProbeId]constants exist. Each module declares a module-level_PROBE_ID: Final[ProbeId] = ProbeId("<probe-name>")constant alongside the class'sname: strABC attr. The dual-form discipline (str ABC attr + typedFinal[ProbeId]constant) mirrorsscip_index.py:114and S6-01/02/03/04 hardenings. TheProbeIdnewtype is imported fromcodegenie.types.identifiers— NOTcodegenie.ids(which does not exist). -
[ ] AC-5.
OwnershipProbe— happy path.async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput. ReadsCODEOWNERSfrom the first present of:<repo>/CODEOWNERS,<repo>/.github/CODEOWNERS,<repo>/docs/CODEOWNERS. EmitsOwnershipSlice(source_path: str, entries: tuple[OwnershipEntry, ...])whereOwnershipEntry = (pattern: str, owners: tuple[str, ...], line_number: int). Six-fieldProbeOutputwithconfidence="high",duration_ms=int((time.perf_counter() - t0) * 1000),warnings=[],errors=[](errors empty on the pure happy path). -
[ ] AC-6.
OwnershipProbe— file absent.async def runreturnsconfidence="low"(NOT"unavailable"— that value does not exist in the kernelProbeOutput.confidence: Literal["high","medium","low"]per Phase 0 ADR-0007src/codegenie/probes/base.py:68; NOT"high"— absent CODEOWNERS is a Planner-actionable low-information observation, not a "we determined nothing is opted in" stub state). Slice carriessource_path=None, entries=().errors=["codeowners_absent"]. Mirrors theCertificateProbeupstream-absent precedent (src/codegenie/probes/layer_c/certificate.py:67-81). -
[ ] AC-7.
OwnershipProbe— malformed line. A line like*.py @valid-userparses to one entry; a line like*.py(no owners) parses toOwnershipEntry(pattern="*.py", owners=(), line_number=N)and adds"empty_owners_at_line_N"toerrors. Mutation caught: any "silently drop empty-owner lines" — operators need to know about misconfigured patterns. -
[ ] AC-8.
OwnershipProbe— comment lines + blank lines. Lines starting with#(after.lstrip()) and lines that strip to empty are skipped — they do NOT emit entries. Line numbers continue to increment over them (per Notes §3 — operators expectvim +N-compatible line numbers). Mutation caught: any parser that emits comments/blanks as entries with empty owners (would conflate with AC-7's empty-owners case). -
[ ] AC-NEW-5.
OwnershipProbe— inline-comment edge case. A line like*.py @user # owners are the python teamparses asOwnershipEntry(pattern="*.py", owners=("@user",), line_number=N)— the inline#…and all subsequent tokens on the line are dropped. The dropping is performed by the pure parser: split on whitespace; if any token starts with#, truncate the token list at that token (the#token and everything after it is comment-text). Mutation caught: a future contributor "supporting" inline comments by a naiveline.split("#", 1)[0]would also strip#characters legitimately appearing in patterns (e.g., a hypothetical*#testpattern); the per-token truncation discipline preserves pattern integrity. -
[ ] AC-9.
OwnershipProbe— three-location search order. When multipleCODEOWNERSfiles exist (which is operator misconfiguration but allowed), only the first found in Phase-2 order (CODEOWNERS>.github/CODEOWNERS>docs/CODEOWNERS) is parsed; the others are listed inerrors=["additional_codeowners_present_at:<path>", ...]. Mutation caught: any merge-the-files behavior; any precedence change. -
[ ] AC-NEW-7.
OwnershipProbe— intentional GitHub-divergence documented. The module docstring contains the exact grep-discoverable phrase"Phase 2 search order intentionally diverges from GitHub". An architectural test asserts this. Mutation caught: a future contributor "fixing" the divergence to match GitHub's order without an ADR-amend would silently change operator-observable behavior; the test fires on the docstring change first. -
[ ] AC-10.
ServiceTopologyStub— always not-opted-in.async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutputreturns the six-fieldProbeOutput:ProbeOutput( schema_slice=NotOptedInServiceTopologySlice( opted_in=False, reason="phase_9_or_later" ).model_dump(mode="json"), raw_artifacts=[ctx.output_dir / "service_topology.json"], confidence="high", # absence is the data; mirrors S6-04 NotOptedIn precedent duration_ms=int((time.perf_counter() - t0) * 1000), warnings=[], errors=[], )confidence="high"(not"low", not"unavailable"— the latter is not a permittedProbeOutput.confidencevalue): the probe successfully determined the feature is not opted in; the state lives in the slice, not in the confidence (mirrors S6-04NotOptedInExternalDocsSlicehardening). Norepo.configreads. Noctx.configreads. No file I/O beyond the single raw-artifact write. No network I/O. -
[ ] AC-11.
SloStub— always not-opted-in. Same shape asServiceTopologyStub:confidence="high", sliceNotOptedInSloSlice(opted_in=False, reason="phase_9_or_later"), single raw artifact atctx.output_dir / "slo.json", norepo.config/ctx.configreads, no network I/O.
Stub raw-artifact discipline¶
-
[ ] AC-NEW-6 (part 1: stubs). Each stub writes a single raw artifact atomically. Each stub probe writes the JSON-serialized slice (
json.dumps(slice.model_dump(mode="json"), sort_keys=True, indent=2)) toctx.output_dir / "<probe-name>.json"via the.tmp→os.replaceatomic-write pattern (Phase 0 writer chokepoint precedent).raw_artifactsis exactly[ctx.output_dir / "<probe-name>.json"]— a one-element list, no other files. Mirrors S6-04 AC-NEW-1. Mutation caught: a future contributor adding a body-bytes write would changeraw_artifactslength and fire the test. -
OwnershipProbe raw-artifact policy:
OwnershipProbedoes NOT write a raw artifact (raw_artifacts=[]). The slice contents are the complete evidence; there is no separate body to persist. This mirrors the Layer-C marker probes (certificate.py:76,dockerfile.py's post-parse path) — when the schema-slice contains 100% of the evidence, no raw artifact is written. The two stubs DO write raw artifacts because S6-04 establishes the deferred-stub raw-artifact discipline (the artifact carries the determination at the persistence boundary, making downstream cache-key derivation explicit).
Functional core (ownership)¶
- [ ] AC-NEW-6 (part 2: ownership pure parser).
OwnershipProbe's line-parser is a pure module-level free function with the exact signaturedef _parse_codeowners_lines(text: str) -> tuple[tuple[OwnershipEntry, ...], tuple[str, ...]]— input is the file body as text; output is(entries, parse_errors). The function performs NOPath/open/fdaccess, takes NORepoSnapshot/ProbeContext/Pathargument, raises NO exception (every malformed input is captured in theparse_errorstuple). Architectural test: AST-walk asserts_parse_codeowners_linesis aFunctionDefat module scope and its parameter annotation isstr. Mirrors the functional-core extraction disciplinedockerfile.py+_dockerfile_parse.pysettled.
Static integrity & negative invariants¶
-
[ ] AC-12. No HTTP / service-catalog client imports. The two stub files (
service_topology_stub.py,slo_stub.py) MUST NOT importhttpx,requests,aiohttp,urllib.request,socket,http.client,httplib, or any service-mesh / service-catalog client library. Architectural test (parametrized across the two stubs; mirrors S6-04 AC-6's seven-member set). -
[ ] AC-13. No cross-probe imports.
ownership.py,service_topology_stub.py,slo_stub.pydo not import each other. Also: none imports fromlayer_d/probes. Architectural test parametrized across the three files. Mutation caught: a future contributor extracting a shared "deferred stub" base in one file and importing it from the other (premature abstraction; Rule of Three forbids). -
[ ] AC-NEW-3. Tagged-union discriminator key documented in stub-module source. Each stub module's docstring (or a code comment / type annotation) contains the literal string
discriminator="opted_in"as the grep-discoverable trip-wire for the eventual Phase-9+ tagged-union widening. Mutation caught: a future contributor changing the discriminator key fromopted_into e.g.kindwould silently fragment the Phase-9 widening strategy. -
[ ] AC-NEW-4. No subclass-based extension path on any of the three probes. Architectural test: AST-walk asserts no module under
layer_e/containsclass X(OwnershipProbe),class X(ServiceTopologyStubProbe), orclass X(SloStubProbe). The eventual opted-in branches for the stubs land as conditionalmatchdispatch insiderun, not as subclasses. Composition over inheritance; mirrors S6-04 AC-NEW-5.
Body-size cap (ownership)¶
- [ ] AC-14.
OwnershipProbe— body size cap. ACODEOWNERSlarger thanOWNERSHIP_MAX_BYTES = 1 * 1024 * 1024(1 MB) is rejected withconfidence="low"anderrors=["codeowners_size_cap_exceeded:<n_bytes>"]. Mutation caught: any unbounded read would let a hostile repo OOM the gather. Implementation:Path.stat().st_size(oros.path.getsize(...)) check before opening the file; on cap-exceeded, return immediately without reading any bytes.
Registry & determinism¶
-
[ ] AC-NEW-2. Registry-membership smoke for all three probes.
next((e for e in default_registry._entries if e.cls.name == "<probe-name>"), None)is notNonefor each of"ownership","service_topology","slo". Mutation caught: a future contributor removing the@register_probedecorator would silently stop the probe from running in every gather. Mirrors S6-04 AC-NEW-3. -
[ ] AC-15. Determinism. Two consecutive
await Probe().run(repo, ctx)calls on the same fixture produce byte-identicalschema_sliceJSON for all three probes. For the two stubs, theraw_artifacts[0]file contents are also byte-identical.OwnershipProbepreserves source-file line order (NOT sorted — preserving authoring order is operationally useful when an entry is per-section).duration_msis excluded from the byte-identity comparison (it varies by clock).
Sub-schema deferred to S6-08¶
- [ ] AC-16. Pydantic slice contracts enforce
additionalProperties: false-equivalent at type level.extra="forbid"on every slice model is asserted bypytest.raises(pydantic.ValidationError)on each slice for an unknown-field payload. The JSON-Schema sub-schema files undersrc/codegenie/schema/probes/layer_e/land in S6-08; the JSON-Schema round-trip is not asserted in this story (forward-dependency on S6-08 — a Phase-2 story can't precondition on a downstream artifact). S6-08's schema-naming convention should match the probename(no_stubsuffix onservice_topology/slo); the validation report flags the S6-08 manifest for amendment.
Quality gates¶
-
[ ] AC-17.
mypy --strictpasses on all three probe files and the test module. -
[ ] AC-18. The stub deferrals are grep-able. A future Phase-9 contributor running
grep -rn "phase_9_or_later" src/codegenie/MUST find both stub modules. The slicereason: Literal["phase_9_or_later"]is the deliberate trip-wire (the literal value is the grep-token).
Implementation outline¶
- Create
src/codegenie/probes/layer_e/__init__.py(empty package marker; one-line docstring). ownership.py:- Module docstring with the exact AC-NEW-7 phrase
"Phase 2 search order intentionally diverges from GitHub", plus pointers tolocalv2.md§5.5 and theCertificateProbeupstream-absent precedent. _LOCATIONS: Final[tuple[str, ...]] = ("CODEOWNERS", ".github/CODEOWNERS", "docs/CODEOWNERS").OWNERSHIP_MAX_BYTES: Final[int] = 1 * 1024 * 1024._PROBE_ID: Final[ProbeId] = ProbeId("ownership")(imported fromcodegenie.types.identifiers).OwnershipEntry(BaseModel)withmodel_config = ConfigDict(frozen=True, extra="forbid")andpattern, owners, line_number.OwnershipSlice(BaseModel)withmodel_config = ConfigDict(frozen=True, extra="forbid"),source_path: str | None,entries: tuple[OwnershipEntry, ...].- Pure module-level function
_parse_codeowners_lines(text: str) -> tuple[tuple[OwnershipEntry, ...], tuple[str, ...]](functional core; AC-NEW-6 part 2) — splits on newlines, iterates with 1-indexed line numbers including blanks/comments in the count, skips blank/comment lines, splits each non-skipped line on whitespace, truncates at the first#-prefixed token (AC-NEW-5 inline-comment handling), and produces(entries, parse_errors). @register_probe(heaviness="light")class OwnershipProbe(Probe):with the full ABC field set (per AC-4 —declared_inputs: list[str] = ["CODEOWNERS", ".github/CODEOWNERS", "docs/CODEOWNERS"]) andasync def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutputthat:- finds the first existing location; if none, returns
confidence="low"with empty slice +errors=["codeowners_absent"](AC-6). Path.stat().st_size > OWNERSHIP_MAX_BYTEScheck before opening the file → returnsconfidence="low"per AC-14.- reads the file text; calls
_parse_codeowners_lines(text); emits the slice + errors;raw_artifacts=[](per the marker-probe convention — the slice carries all evidence).
- finds the first existing location; if none, returns
service_topology_stub.py:- Module docstring with
"deferred to Phase 9 or later"and'discriminator="opted_in"'literal phrases (AC-NEW-3 grep-trip-wires). _PROBE_ID: Final[ProbeId] = ProbeId("service_topology").NotOptedInServiceTopologySlice(BaseModel)per AC-3.@register_probe(heaviness="light")class ServiceTopologyStubProbe(Probe):with the full ABC field set (per AC-4;name = "service_topology";declared_inputs: list[str] = []) andasync def run(self, repo, ctx)that emits the not-opted-in slice unconditionally withconfidence="high"and writes a single raw artifact toctx.output_dir / "service_topology.json"atomically (.tmp→os.replace).slo_stub.py:- Identical shape to
service_topology_stub.pywith_PROBE_ID = ProbeId("slo"),NotOptedInSloSlice,SloStubProbe,name = "slo", single raw artifact atctx.output_dir / "slo.json". - Add
tests/unit/probes/layer_e/__init__.py(empty package marker) andtests/unit/probes/layer_e/conftest.pywith_make_repo(tmp_path) + _make_ctx(tmp_path)helpers (precedent:tests/unit/probes/layer_c/test_dockerfile.py:61-74). The helpers are local to this test directory; do NOT cross-import fromtests/unit/probes/layer_c/. - Tests per the TDD plan.
Files to touch¶
| Path | Why |
|---|---|
src/codegenie/probes/layer_e/__init__.py |
New file — package marker. |
src/codegenie/probes/layer_e/ownership.py |
New file — real CODEOWNERS parser (≤ 150 LOC including pure parser). |
src/codegenie/probes/layer_e/service_topology_stub.py |
New file — deferred stub (≤ 50 LOC). |
src/codegenie/probes/layer_e/slo_stub.py |
New file — deferred stub (≤ 50 LOC). |
tests/unit/probes/layer_e/__init__.py |
New file — empty package marker. |
tests/unit/probes/layer_e/conftest.py |
New file — _make_repo and _make_ctx helpers (mirrors tests/unit/probes/layer_c/test_dockerfile.py:61-74). |
tests/unit/probes/layer_e/test_ownership.py |
New file — happy-path + size-cap + absent + malformed + comment + inline-comment + search-order + determinism + registry + pure-parser AST tests. |
tests/unit/probes/layer_e/test_stubs.py |
New file — parametrized across the two stubs (Null Object behavior, no forbidden imports, no subclass extension, discriminator-key grep, determinism, raw-artifact byte-identity, registry membership, _PROBE_ID constant). |
TDD plan — red / green / refactor¶
Red — write the failing tests first¶
# tests/unit/probes/layer_e/conftest.py
"""Shared helpers for Layer-E tests (S6-05).
`ProbeContext` is a stdlib @dataclass with no `for_test` classmethod;
constructing it in every test is verbose, so these helpers (mirroring
`tests/unit/probes/layer_c/test_dockerfile.py:61-74`) are the canonical
construction point for Layer-E unit tests.
"""
from __future__ import annotations
import logging
from pathlib import Path
import pytest
from codegenie.probes.base import ProbeContext, RepoSnapshot
@pytest.fixture
def _make_repo():
def _factory(tmp_path: Path, *, config: dict | None = None) -> RepoSnapshot:
return RepoSnapshot(
root=tmp_path,
git_commit=None,
detected_languages={},
config=config or {},
)
return _factory
@pytest.fixture
def _make_ctx():
def _factory(tmp_path: Path) -> ProbeContext:
workspace = tmp_path / "_ws"
workspace.mkdir(parents=True, exist_ok=True)
out_dir = tmp_path / "_out"
out_dir.mkdir(parents=True, exist_ok=True)
return ProbeContext(
cache_dir=tmp_path / "_cache",
output_dir=out_dir,
workspace=workspace,
logger=logging.getLogger("test"),
config={},
)
return _factory
# tests/unit/probes/layer_e/test_ownership.py
"""Unit tests for OwnershipProbe (S6-05)."""
from __future__ import annotations
import ast
import asyncio
import inspect
import json
from pathlib import Path
import pydantic
import pytest
from codegenie.probes.layer_e import ownership as op
from codegenie.probes.registry import default_registry
from codegenie.types.identifiers import ProbeId
# ----------------------------------------------------------------------
# Pure parser — functional core (AC-NEW-6 part 2)
# ----------------------------------------------------------------------
def test_parse_codeowners_lines_is_pure_module_function() -> None:
"""AC-NEW-6 (part 2). Mutation caught: a future contributor moving
the parser into the OwnershipProbe class (re-tangling pure / impure).
"""
src = inspect.getsource(op)
tree = ast.parse(src)
found = None
for node in tree.body:
if isinstance(node, ast.FunctionDef) and node.name == "_parse_codeowners_lines":
found = node
break
assert found is not None, "_parse_codeowners_lines must be a module-level function"
# Single positional `text: str` parameter (no `self`, no `Path`, no `RepoSnapshot`).
assert [a.arg for a in found.args.args] == ["text"]
def test_parse_codeowners_lines_happy_path() -> None:
"""AC-5. Mutation caught: any parser that emits owners as a single
string instead of splitting on whitespace would fail the tuple
length check."""
text = (
"# Default owners\n"
"* @platform-team\n"
"/api/ @api-team @platform-team\n"
"*.md @docs-team\n"
)
entries, errors = op._parse_codeowners_lines(text)
assert errors == ()
assert len(entries) == 3
assert entries[1].pattern == "/api/"
assert entries[1].owners == ("@api-team", "@platform-team")
assert entries[1].line_number == 3 # 1-indexed; comment line counts toward number
def test_parse_codeowners_lines_skips_comments_and_blanks() -> None:
"""AC-8. Mutation caught: emitting comments as entries with empty
owners would conflate with AC-7's empty-owners case."""
text = "# header\n\n* @team\n# trailing\n\n"
entries, errors = op._parse_codeowners_lines(text)
assert len(entries) == 1
assert entries[0].pattern == "*"
assert errors == ()
def test_parse_codeowners_lines_records_empty_owners_with_error() -> None:
"""AC-7. Mutation caught: silently dropping `*.py` (pattern with no
owners) — operators need to know this is misconfigured."""
text = "*.py\n"
entries, errors = op._parse_codeowners_lines(text)
assert entries[0].pattern == "*.py"
assert entries[0].owners == ()
assert "empty_owners_at_line_1" in errors
def test_parse_codeowners_lines_inline_comment_truncates_at_hash_token() -> None:
"""AC-NEW-5. Mutation caught: a `line.split('#', 1)[0]` naive
implementation would also strip `#` characters legitimately
appearing inside a pattern token; per-token truncation preserves
pattern integrity."""
text = "*.py @user # owners are the python team\n"
entries, errors = op._parse_codeowners_lines(text)
assert entries[0].pattern == "*.py"
assert entries[0].owners == ("@user",)
assert errors == ()
def test_parse_codeowners_lines_never_raises_on_garbage() -> None:
"""AC-NEW-6 (part 2). Pure parser must capture every malformed input
in the `errors` tuple, never raise."""
text = "\x00\x01\x02 weird\n@no-pattern-prefix\n leading-ws @t\n"
entries, errors = op._parse_codeowners_lines(text)
# No exception; either some entries parsed or all rejected — the
# parser's contract is "never raise", and per AC-7 empty-owner lines
# produce a parse_error rather than silently dropping.
assert isinstance(entries, tuple)
assert isinstance(errors, tuple)
# ----------------------------------------------------------------------
# Imperative shell — async probe.run
# ----------------------------------------------------------------------
def test_ownership_happy_path_parses_repo_root_file(
tmp_path: Path, _make_repo, _make_ctx,
) -> None:
"""AC-5. End-to-end imperative shell."""
repo = tmp_path / "repo"
repo.mkdir()
(repo / "CODEOWNERS").write_text(
"# Default owners\n* @platform-team\n/api/ @api-team @platform-team\n*.md @docs-team\n"
)
output = asyncio.run(op.OwnershipProbe().run(_make_repo(repo), _make_ctx(tmp_path)))
assert output.confidence == "high"
assert output.raw_artifacts == [] # AC: marker-probe convention; no raw artifact
slice_ = op.OwnershipSlice.model_validate(output.schema_slice)
assert slice_.source_path == "CODEOWNERS"
assert len(slice_.entries) == 3
assert slice_.entries[1].pattern == "/api/"
assert slice_.entries[1].owners == ("@api-team", "@platform-team")
assert output.duration_ms >= 0
def test_ownership_searches_three_locations_in_order(
tmp_path: Path, _make_repo, _make_ctx,
) -> None:
"""AC-9. Mutation caught: any precedence change — operators expect
the Phase-2 documented order (CODEOWNERS > .github/CODEOWNERS >
docs/CODEOWNERS), which intentionally diverges from GitHub."""
repo = tmp_path / "repo"
(repo / ".github").mkdir(parents=True)
(repo / "docs").mkdir()
(repo / "CODEOWNERS").write_text("* @root\n")
(repo / ".github" / "CODEOWNERS").write_text("* @github_dir\n")
(repo / "docs" / "CODEOWNERS").write_text("* @docs_dir\n")
output = asyncio.run(op.OwnershipProbe().run(_make_repo(repo), _make_ctx(tmp_path)))
slice_ = op.OwnershipSlice.model_validate(output.schema_slice)
assert slice_.source_path == "CODEOWNERS" # root wins
assert any("additional_codeowners_present_at" in e for e in output.errors)
def test_ownership_absent_yields_low_confidence_no_raise(
tmp_path: Path, _make_repo, _make_ctx,
) -> None:
"""AC-6. Mutation caught: re-raising on a no-CODEOWNERS repo; or
misframing absent-file as `confidence='high'` (the latter is the
deferred-stub framing — wrong for this probe; the right precedent
is the CertificateProbe upstream-absent pattern: confidence='low')."""
repo = tmp_path / "repo"
repo.mkdir()
output = asyncio.run(op.OwnershipProbe().run(_make_repo(repo), _make_ctx(tmp_path)))
assert output.confidence == "low"
slice_ = op.OwnershipSlice.model_validate(output.schema_slice)
assert slice_.entries == ()
assert slice_.source_path is None
assert "codeowners_absent" in output.errors
def test_ownership_size_cap_enforced(
tmp_path: Path, monkeypatch: pytest.MonkeyPatch, _make_repo, _make_ctx,
) -> None:
"""AC-14. Mutation caught: any unbounded read. The cap fires
*before* any read — verified by monkey-patching the size check; if
the implementation reads bytes before checking size, the test still
passes (the file is small) but the static design intent is
documented by Notes §2."""
import os
repo = tmp_path / "repo"
repo.mkdir()
(repo / "CODEOWNERS").write_text("* @team\n")
monkeypatch.setattr(os.path, "getsize", lambda p: op.OWNERSHIP_MAX_BYTES + 1)
output = asyncio.run(op.OwnershipProbe().run(_make_repo(repo), _make_ctx(tmp_path)))
assert output.confidence == "low"
assert any("codeowners_size_cap_exceeded" in e for e in output.errors)
def test_ownership_two_runs_byte_identical(
tmp_path: Path, _make_repo, _make_ctx,
) -> None:
"""AC-15. Mutation caught: any sort/reorder — line order preserves
operator's intent (early lines often override later)."""
repo = tmp_path / "repo"
repo.mkdir()
(repo / "CODEOWNERS").write_text("/b/ @b\n/a/ @a\n/c/ @c\n")
probe = op.OwnershipProbe()
out1 = asyncio.run(probe.run(_make_repo(repo), _make_ctx(tmp_path))).schema_slice
out2 = asyncio.run(probe.run(_make_repo(repo), _make_ctx(tmp_path))).schema_slice
assert json.dumps(out1, sort_keys=True) == json.dumps(out2, sort_keys=True)
slice_ = op.OwnershipSlice.model_validate(out1)
assert [e.pattern for e in slice_.entries] == ["/b/", "/a/", "/c/"]
# ----------------------------------------------------------------------
# Registration & static integrity
# ----------------------------------------------------------------------
def test_ownership_probe_registered_light() -> None:
"""AC-4, AC-NEW-2."""
entry = next(
(e for e in default_registry._entries if e.cls.name == "ownership"),
None,
)
assert entry is not None, "OwnershipProbe must be in default_registry._entries"
assert entry.heaviness == "light"
def test_ownership_probe_id_constant_exists() -> None:
"""AC-NEW-1. Dual-form identity discipline."""
assert hasattr(op, "_PROBE_ID")
assert op._PROBE_ID == ProbeId("ownership")
assert op.OwnershipProbe.name == "ownership"
def test_ownership_no_subclass_extension_path() -> None:
"""AC-NEW-4. Mutation caught: a `class FancyOwnershipProbe(OwnershipProbe)`
fragments dispatch into a class hierarchy; composition wins."""
tree = ast.parse(inspect.getsource(op))
for node in ast.walk(tree):
if isinstance(node, ast.ClassDef):
bases = [b.id for b in node.bases if isinstance(b, ast.Name)]
assert "OwnershipProbe" not in bases, f"Subclass {node.name!r} violates AC-NEW-4"
def test_ownership_docstring_documents_github_divergence() -> None:
"""AC-NEW-7. The intentional divergence from GitHub's documented
search order is grep-discoverable."""
assert op.__doc__ is not None
assert "Phase 2 search order intentionally diverges from GitHub" in op.__doc__
def test_ownership_slice_rejects_extra_fields() -> None:
"""AC-2, AC-16. Mutation caught: relaxing extra='forbid' to 'allow'
would silently admit unknown fields."""
with pytest.raises(pydantic.ValidationError):
op.OwnershipSlice(
source_path="CODEOWNERS",
entries=(),
extra_field="x", # type: ignore[call-arg]
)
# tests/unit/probes/layer_e/test_stubs.py
"""Unit tests for ServiceTopologyStub + SloStub (S6-05).
Parametrized across the two stub probes — they are deliberately
near-identical (S6-04 deferred-stub precedent); the parametrization
makes the duplication visible without hiding it behind a base class
(Rule of Three forbids — three deferred stubs is the trigger; we have
two stubs in this story plus S6-04's external_docs as the third, so
the rule-of-three discussion can land in a Phase-4 cleanup story
*if* a contributor wants to extract — until then, three similar
modules is cheaper than a premature abstraction).
"""
from __future__ import annotations
import ast
import asyncio
import inspect
import json
from pathlib import Path
import pydantic
import pytest
from codegenie.probes.layer_e import service_topology_stub as sts
from codegenie.probes.layer_e import slo_stub as slo
from codegenie.probes.registry import default_registry
from codegenie.types.identifiers import ProbeId
_STUB_PARAMS = [
pytest.param(
sts, sts.ServiceTopologyStubProbe, sts.NotOptedInServiceTopologySlice,
"service_topology", id="service_topology",
),
pytest.param(
slo, slo.SloStubProbe, slo.NotOptedInSloSlice,
"slo", id="slo",
),
]
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_always_high_confidence_not_opted_in(
module, probe_cls, slice_cls, probe_name, tmp_path: Path,
_make_repo, _make_ctx,
) -> None:
"""AC-10, AC-11. Mutation caught: any future code that flips to
`confidence='low'` (which would signal "we tried and failed")
without an ADR-amend; or any code that reads `repo.config[...]`
and bifurcates the response (the slippery slope to a real
fetcher)."""
output = asyncio.run(probe_cls().run(_make_repo(tmp_path), _make_ctx(tmp_path)))
assert output.confidence == "high"
slice_ = slice_cls.model_validate(output.schema_slice)
assert slice_.opted_in is False
assert slice_.reason == "phase_9_or_later"
assert output.warnings == []
assert output.errors == []
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_writes_single_raw_artifact_atomically(
module, probe_cls, slice_cls, probe_name, tmp_path: Path,
_make_repo, _make_ctx,
) -> None:
"""AC-NEW-6 (part 1: stubs). Single canonical raw artifact named
after the probe."""
ctx = _make_ctx(tmp_path)
output = asyncio.run(probe_cls().run(_make_repo(tmp_path), ctx))
expected = ctx.output_dir / f"{probe_name}.json"
assert output.raw_artifacts == [expected]
on_disk = json.loads(expected.read_bytes())
assert on_disk == output.schema_slice
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_two_runs_byte_identical(
module, probe_cls, slice_cls, probe_name, tmp_path: Path,
_make_repo, _make_ctx,
) -> None:
"""AC-15. Mutation caught: any timestamp / nonce / per-run ID in
the not-opted-in slice or the raw artifact."""
ctx = _make_ctx(tmp_path)
probe = probe_cls()
out1 = asyncio.run(probe.run(_make_repo(tmp_path), ctx))
out2 = asyncio.run(probe.run(_make_repo(tmp_path), ctx))
assert json.dumps(out1.schema_slice, sort_keys=True) == json.dumps(
out2.schema_slice, sort_keys=True,
)
# Raw artifact was overwritten by run #2; re-read.
raw_bytes = (ctx.output_dir / f"{probe_name}.json").read_bytes()
assert json.loads(raw_bytes) == out2.schema_slice
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_registered_light_and_in_registry(
module, probe_cls, slice_cls, probe_name,
) -> None:
"""AC-4, AC-NEW-2."""
entry = next(
(e for e in default_registry._entries if e.cls.name == probe_name),
None,
)
assert entry is not None, f"{probe_name} probe must be in default_registry._entries"
assert entry.heaviness == "light"
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_probe_id_constant_exists(
module, probe_cls, slice_cls, probe_name,
) -> None:
"""AC-NEW-1. Dual-form identity."""
assert hasattr(module, "_PROBE_ID")
assert module._PROBE_ID == ProbeId(probe_name)
assert probe_cls.name == probe_name
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_no_forbidden_imports(module, probe_cls, slice_cls, probe_name) -> None:
"""AC-12. Mutation caught: any HTTP/socket client import would
break the Phase-0 fence — and break the determinism guarantee."""
forbidden = {
"httpx", "requests", "urllib.request", "aiohttp",
"socket", "http.client", "httplib",
}
tree = ast.parse(inspect.getsource(module))
names: set[str] = set()
for node in ast.walk(tree):
if isinstance(node, ast.Import):
names.update(a.name for a in node.names)
elif isinstance(node, ast.ImportFrom) and node.module:
names.add(node.module)
assert not (forbidden & names), f"Forbidden imports in {probe_name}: {forbidden & names}"
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_no_subclass_extension_path(
module, probe_cls, slice_cls, probe_name,
) -> None:
"""AC-NEW-4. The eventual opted-in branch is `match` dispatch
inside `run`, not a subclass."""
tree = ast.parse(inspect.getsource(module))
target_name = probe_cls.__name__
for node in ast.walk(tree):
if isinstance(node, ast.ClassDef):
bases = [b.id for b in node.bases if isinstance(b, ast.Name)]
assert target_name not in bases, (
f"Subclass {node.name!r} of {target_name} violates AC-NEW-4"
)
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_docstring_names_opted_in_discriminator(
module, probe_cls, slice_cls, probe_name,
) -> None:
"""AC-NEW-3. Mutation caught: a future contributor changing the
discriminator key from `opted_in` to e.g. `kind` would silently
fragment the Phase-9+ tagged-union strategy. The grep token
`discriminator="opted_in"` is the load-bearing trip-wire."""
src = inspect.getsource(module)
assert 'discriminator="opted_in"' in src, (
f"{probe_name} module must explicitly name `opted_in` as the eventual "
"tagged-union discriminator key (in a comment, docstring, or code) per AC-NEW-3."
)
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_docstring_documents_deferral(
module, probe_cls, slice_cls, probe_name,
) -> None:
"""AC-18. The deferral is grep-able via 'phase_9_or_later'."""
assert module.__doc__ is not None
assert "deferred to Phase 9 or later" in module.__doc__
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_slice_rejects_opted_in_true(
module, probe_cls, slice_cls, probe_name,
) -> None:
"""AC-3. Mutation caught: relaxing `opted_in: Literal[False]` to
`bool` would silently accept a True value before the opted-in
branch exists in `run`."""
with pytest.raises(pydantic.ValidationError):
slice_cls(opted_in=True, reason="phase_9_or_later") # type: ignore[arg-type]
@pytest.mark.parametrize(("module", "probe_cls", "slice_cls", "probe_name"), _STUB_PARAMS)
def test_stub_slice_rejects_extra_fields(
module, probe_cls, slice_cls, probe_name,
) -> None:
"""AC-2, AC-16. extra='forbid' enforced at Pydantic level."""
with pytest.raises(pydantic.ValidationError):
slice_cls(opted_in=False, reason="phase_9_or_later", extra="x") # type: ignore[call-arg]
def test_no_cross_probe_imports_among_layer_e_files() -> None:
"""AC-13. The three layer_e files do not import each other; none
imports from layer_d. Mutation caught: a future contributor
extracting a shared base in one file and importing it in the
others (premature abstraction; Rule of Three forbids)."""
import importlib
own = importlib.import_module("codegenie.probes.layer_e.ownership")
files_to_check = [own, sts, slo]
forbidden_substrings = (
"codegenie.probes.layer_e.ownership",
"codegenie.probes.layer_e.service_topology_stub",
"codegenie.probes.layer_e.slo_stub",
"codegenie.probes.layer_d",
)
for mod in files_to_check:
src = inspect.getsource(mod)
tree = ast.parse(src)
for node in ast.walk(tree):
if isinstance(node, ast.ImportFrom) and node.module:
# Allow self-imports (a module can re-export from itself in __all__)
for forbidden in forbidden_substrings:
if forbidden == mod.__name__:
continue
assert forbidden not in node.module, (
f"{mod.__name__} forbidden-imports from {node.module}"
)
Green — make it pass¶
Skeleton for ownership.py (the impure shell is small; the parser is the pure core):
# src/codegenie/probes/layer_e/ownership.py
"""OwnershipProbe — Layer E, light heaviness.
Parses CODEOWNERS from three GitHub-convention locations.
Phase 2 search order intentionally diverges from GitHub:
this implementation prefers <repo>/CODEOWNERS over <repo>/.github/CODEOWNERS
because the repo-root file is the most visible to operators. An operator
who wants .github/CODEOWNERS to win simply deletes the root file.
Functional core / imperative shell:
`_parse_codeowners_lines(text)` is the pure core; `OwnershipProbe.run`
is the only impure code (filesystem search + size cap + file read).
Sources:
- ../../localv2.md §5.5 E1.
- src/codegenie/probes/layer_c/certificate.py:67-81 (CertificateProbe
upstream-absent precedent: absent expected file → confidence='low').
"""
from __future__ import annotations
import json
import os
import time
from pathlib import Path
from typing import Final, Literal
from pydantic import BaseModel, ConfigDict
from codegenie.probes.base import Probe, ProbeContext, ProbeOutput, RepoSnapshot
from codegenie.probes.registry import register_probe
from codegenie.types.identifiers import ProbeId
__all__ = ["OwnershipEntry", "OwnershipProbe", "OwnershipSlice"]
_LOCATIONS: Final[tuple[str, ...]] = ("CODEOWNERS", ".github/CODEOWNERS", "docs/CODEOWNERS")
OWNERSHIP_MAX_BYTES: Final[int] = 1 * 1024 * 1024 # 1 MB
_PROBE_ID: Final[ProbeId] = ProbeId("ownership")
class OwnershipEntry(BaseModel):
model_config = ConfigDict(frozen=True, extra="forbid")
pattern: str
owners: tuple[str, ...]
line_number: int
class OwnershipSlice(BaseModel):
model_config = ConfigDict(frozen=True, extra="forbid")
source_path: str | None
entries: tuple[OwnershipEntry, ...]
def _parse_codeowners_lines(text: str) -> tuple[tuple[OwnershipEntry, ...], tuple[str, ...]]:
"""Pure functional-core CODEOWNERS line parser.
1-indexed line numbers include blank/comment lines in the count
(operators expect `vim +N`-compatible line numbers).
"""
entries: list[OwnershipEntry] = []
errors: list[str] = []
for idx, raw in enumerate(text.splitlines(), start=1):
stripped = raw.strip()
if not stripped or stripped.startswith("#"):
continue
tokens = stripped.split()
# AC-NEW-5: truncate at the first `#`-prefixed token (inline comment).
for i, tok in enumerate(tokens):
if tok.startswith("#"):
tokens = tokens[:i]
break
if not tokens:
continue
pattern = tokens[0]
owners = tuple(tokens[1:])
if not owners:
errors.append(f"empty_owners_at_line_{idx}")
entries.append(OwnershipEntry(pattern=pattern, owners=owners, line_number=idx))
return tuple(entries), tuple(errors)
@register_probe(heaviness="light")
class OwnershipProbe(Probe):
name: str = "ownership"
version: str = "0.1.0"
layer = "E"
tier = "base"
applies_to_tasks: list[str] = ["*"]
applies_to_languages: list[str] = ["*"]
requires: list[str] = []
declared_inputs: list[str] = list(_LOCATIONS)
timeout_seconds: int = 5
cache_strategy: Literal["content"] = "content"
async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput:
t0 = time.perf_counter()
found: list[Path] = [repo.root / loc for loc in _LOCATIONS if (repo.root / loc).exists()]
if not found:
return ProbeOutput(
schema_slice=OwnershipSlice(source_path=None, entries=()).model_dump(mode="json"),
raw_artifacts=[],
confidence="low",
duration_ms=int((time.perf_counter() - t0) * 1000),
warnings=[],
errors=["codeowners_absent"],
)
primary = found[0]
extra_errors: list[str] = [
f"additional_codeowners_present_at:{p.relative_to(repo.root)}" for p in found[1:]
]
size = os.path.getsize(primary)
if size > OWNERSHIP_MAX_BYTES:
return ProbeOutput(
schema_slice=OwnershipSlice(
source_path=str(primary.relative_to(repo.root)), entries=(),
).model_dump(mode="json"),
raw_artifacts=[],
confidence="low",
duration_ms=int((time.perf_counter() - t0) * 1000),
warnings=[],
errors=[f"codeowners_size_cap_exceeded:{size}"] + extra_errors,
)
text = primary.read_text()
entries, parse_errors = _parse_codeowners_lines(text)
return ProbeOutput(
schema_slice=OwnershipSlice(
source_path=str(primary.relative_to(repo.root)),
entries=entries,
).model_dump(mode="json"),
raw_artifacts=[],
confidence="high",
duration_ms=int((time.perf_counter() - t0) * 1000),
warnings=[],
errors=list(parse_errors) + extra_errors,
)
Skeleton for service_topology_stub.py (mirrors S6-04 external_docs.py shape):
# src/codegenie/probes/layer_e/service_topology_stub.py
"""ServiceTopologyStubProbe — Layer E, light heaviness, deferred stub.
The real service-topology data source (service mesh / Backstage / OpsLevel /
Cortex) is deferred to Phase 9 or later. Phase 2 ships a registered stub
that emits a typed NotOptedInServiceTopologySlice with `opted_in=False,
reason="phase_9_or_later"` and `confidence="high"` (absence is the data;
S6-04 NotOptedInExternalDocsSlice precedent).
Discriminator key for the eventual tagged union: `discriminator="opted_in"`.
The Phase-9+ opted-in variant lands as a *new* sibling Pydantic model
(`OptedInServiceTopologySlice`) joined under
`Annotated[NotOptedInServiceTopologySlice | OptedInServiceTopologySlice,
Field(discriminator="opted_in")]`, dispatched via `match` on
`repo.config.get("service_topology")` inside `run` — never via subclass.
NONE of the Phase-9 opted-in logic ships in Phase 2.
"""
from __future__ import annotations
import json
import os
import time
from typing import Final, Literal
from pydantic import BaseModel, ConfigDict
from codegenie.probes.base import Probe, ProbeContext, ProbeOutput, RepoSnapshot
from codegenie.probes.registry import register_probe
from codegenie.types.identifiers import ProbeId
__all__ = ["NotOptedInServiceTopologySlice", "ServiceTopologyStubProbe"]
_PROBE_ID: Final[ProbeId] = ProbeId("service_topology")
class NotOptedInServiceTopologySlice(BaseModel):
"""Phase-2 closed shape — not-opted-in variant of the eventual
`Annotated[NotOptedInServiceTopologySlice | OptedInServiceTopologySlice,
Field(discriminator="opted_in")]` tagged union."""
model_config = ConfigDict(frozen=True, extra="forbid")
opted_in: Literal[False]
reason: Literal["phase_9_or_later"]
@register_probe(heaviness="light")
class ServiceTopologyStubProbe(Probe):
name: str = "service_topology"
version: str = "0.1.0"
layer = "E"
tier = "base"
applies_to_tasks: list[str] = ["*"]
applies_to_languages: list[str] = ["*"]
requires: list[str] = []
declared_inputs: list[str] = []
timeout_seconds: int = 5
cache_strategy: Literal["content"] = "content"
async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput:
t0 = time.perf_counter()
slice_ = NotOptedInServiceTopologySlice(opted_in=False, reason="phase_9_or_later")
payload = slice_.model_dump(mode="json")
out_path = ctx.output_dir / "service_topology.json"
tmp_path = out_path.with_suffix(".json.tmp")
tmp_path.write_text(json.dumps(payload, sort_keys=True, indent=2))
os.replace(tmp_path, out_path)
return ProbeOutput(
schema_slice=payload,
raw_artifacts=[out_path],
confidence="high",
duration_ms=int((time.perf_counter() - t0) * 1000),
warnings=[],
errors=[],
)
slo_stub.py mirrors service_topology_stub.py exactly with _PROBE_ID = ProbeId("slo"), NotOptedInSloSlice, SloStubProbe, name = "slo", and out_path = ctx.output_dir / "slo.json". The duplication is deliberate (Rule of Three not yet triggered — three deferred stubs across S6-04 + this story, but each one's eventual divergence will be different; see Notes §7).
Refactor¶
- Do not extract a shared stub base between
service_topology_stub,slo_stub, and S6-04'sexternal_docs. Three deferred-stub probes with identical inert shape is the Rule-of-Three trigger threshold, but each one's eventual Phase-9+ divergence will be different:external_docswidens to a Confluence/Notion/URL-list tagged union;service_topologywidens to a service-mesh/Backstage/OpsLevel union;slowidens to per-provider SLO catalogs. Extracting a base now would force a generic /Any-typedreasonand erase the per-stubLiteralinvariants — both worse than duplication. - When the Phase-9+ opted-in branches land (one stub at a time), each one's
rundispatches via exhaustivematchinside the same file (not via subclass; not via anif/elseladder; not via edits to the existingNotOptedIn…Slicemodel):
async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput:
match repo.config.get("service_topology"):
case None | {}:
return await _emit_not_opted_in_topology(ctx)
case {"sources": _}:
return await _emit_opted_in_topology(repo, ctx)
case other:
assert_never(other)
_emit_not_opted_in_topology and _emit_opted_in_topology are module-level free functions (functional-core / imperative-shell discipline). The opted-in branch's mesh-fetcher lands as a separate sibling module _service_topology_fetcher.py; the probe class stays a thin dispatcher.
OwnershipProbe's_parse_codeowners_linesis the functional-core extract — keep it pure (text: str -> (entries, errors)). Any temptation to "let the parser open the file directly" or "let the parser take aPath" re-tangles pure/impure and breaks the AST test for AC-NEW-6 part 2.
Out of scope¶
- Service-catalog HTTP clients (Backstage, OpsLevel, Cortex). Phase 9+.
- Service mesh API integration (Istio, Linkerd, Consul Connect). Phase 9+.
- OpenAPI / gRPC / GraphQL parsing (
ServiceContractProbe— E3 inlocalv2.md). Phase-3-or-later; not Phase 2's scope per the manifest. - Production config probe (E5). Phase 9+ deferred stub.
- JSON-Schema sub-schema round-trip tests. Sub-schemas under
src/codegenie/schema/probes/layer_e/land in S6-08; this story asserts Pydantic-levelextra="forbid"+Literalenforcement, not JSON Schema. - Email / handle redaction. The writer chokepoint (
SecretRedactor) handles it via Phase 0's field-name regex + Phase 2's pattern set. The probe captures@team/@userhonestly. CODEOWNERSescape sequences (\#, escaped spaces in patterns). Phase 2's parser is whitespace-split + token-truncate; escape handling is a Phase 3+ refinement if real CODEOWNERS files demand it.
Notes for the implementer¶
OwnershipProbe.confidenceis"low"on absent file,"high"on found-file. Distinction matters: absent CODEOWNERS is a Planner-actionable observation ("this repo has no owners declared, route to default"); the CertificateProbe upstream-absent precedent applies (src/codegenie/probes/layer_c/certificate.py:67-81). This is different from the deferred-stub framing used byServiceTopologyStub/SloStub(whereconfidence="high"because the probe successfully determined the feature is not opted in — S6-04NotOptedInExternalDocsSliceprecedent). Both framings coexist in the codebase; pick the right one per probe.Path.stat().st_size(oros.path.getsize(...)) before opening the file. AC-14 is the first defense. If the file passes the cap,Path.read_text()is fine for Phase 2 (CODEOWNERS files are small in practice; bounded line iteration viaitertools.islicewould be belt-and-suspenders buttext.splitlines()after a size-cap-bounded read is sufficient for the 1 MB cap).OwnershipEntry.line_numberis 1-indexed and includes blank/comment lines in the count. Operators expect a line number that matches their editor (vim +N). Counting only emitted entries would diverge from the actual file.- GitHub's documented search order is
.github/CODEOWNERS>CODEOWNERS>docs/CODEOWNERS. The implementation here usesCODEOWNERS>.github/CODEOWNERS>docs/CODEOWNERS. This is intentional — Phase 2's discipline is "the root location wins" because it's the most visible. If an operator wants.github/CODEOWNERSto win, they delete the root file. AC-NEW-7 requires the exact phrase"Phase 2 search order intentionally diverges from GitHub"in the module docstring so a future contributor sees the choice was deliberate. OwnershipSlice.source_path: str | Noneis honest about the absent case. A sentinel""would be primitive obsession;Noneis the right "no file" representation.OwnershipEntry.ownersistuple[str, ...], notset[str]. ACODEOWNERSline*.py @a @b @ais operator misconfiguration but is parsed as-given; deduplication is the Planner's responsibility (or a later linter probe).ServiceTopologyStubandSloStubare deliberately near-identical. Resist refactoring them into a single file or a base class — the eventual Phase-9+ divergence (per-source error unions, per-source config schemas) means the closedreason: Literal[...]discriminator on each will diverge. Their identicality now is a coincidence of "both empty." Rule-of-three with S6-04 ExternalDocsProbe is the third instance; the cleanup conversation lands at Phase 4 only if a contributor wants it; otherwise three similar deferred-stub modules (each ≤ 50 LOC) is cheaper than one shared abstraction.- Design patterns to name explicitly (mirrors S6-04 Notes §6):
- Null Object — the stubs satisfy the
ProbeABC so the coordinator, renderer, and Planner consumeServiceTopologyStubProbe/SloStubProbeexactly as they consume any other Layer-E probe — no null-checks, noif probe.name in {"service_topology", "slo"}: skip, no special-casing. - Tagged union via discriminator —
opted_inis the discriminator key for the eventual Phase-9+ widening. The Phase-2 commitment is the key choice (opted_in, notkindortype); the variants land later. - Open/Closed at file boundary — the eventual opted-in branches land via
matchdispatch + new sibling slice models in the same file, never via subclass and never via edits to the existingNotOptedIn…Slicemodels. - Functional core / imperative shell —
OwnershipProbe._parse_codeowners_linesis pure;runis the imperative shell. Mirrorsdockerfile.py+_dockerfile_parse.py. async def run(self, repo, ctx)— NOT_run(self, ctx). Phase 0 ADR-0007 freezes the ABC byte-for-byte againstlocalv2.md §4:async def run(self, repo: RepoSnapshot, ctx: ProbeContext) -> ProbeOutput. Therepois the first positional arg (not actxfield); the method is public (run), not_run. Tests useasyncio.run(probe.run(repo, ctx))(precedent:tests/unit/probes/layer_c/test_dockerfile.py:77-78). The_make_repo/_make_ctxhelpers intests/unit/probes/layer_e/conftest.pyare the canonical construction points —ProbeContextis a stdlib@dataclasswith nofor_testclassmethod.name: str = "ownership"ABC attr +_PROBE_ID: Final[ProbeId]module constant — the dual-form identity. S6-01..S6-04 settled this convention: the ABC field isname: str(frozen, stringly-typed for ABC compatibility); the module-level_PROBE_ID: Final[ProbeId]constant carries the typedNewType-wrapped identifier for any in-module use.ProbeIdis imported fromcodegenie.types.identifiers—codegenie.idsdoes not exist (recurring documentation drift).tuple[str, ...]forapplies_to_*is wrong — uselist[str]. The ABC declaresapplies_to_tasks: list[str]andapplies_to_languages: list[str]. Atupleannotation contradicts the contract and will fail thetests/unit/test_probe_contract.pysnapshot.- Atomic write via
.tmp→os.replacefor the stub raw artifacts. Mirrors S6-04 GREEN code path. The byte-identity determinism test (AC-15) depends onjson.dumps(..., sort_keys=True, indent=2)+ atomic replace. - Phase 0 ADR-0007 freezes the
ProbeABC. The full field set is mandatory. Every probe must declare the full field set (version,layer,tier,applies_to_*,requires,declared_inputs,timeout_seconds,cache_strategy). The canonical full-field reference issrc/codegenie/probes/layer_b/index_health.py:298-326; mirror it exactly.