S8-01 — Confidence section renderer (attempt log)

Append-only journal. Each attempt records what was tried, what worked, what didn't, and the lesson for the next attempt or story.


Attempt 1 — 2026-05-18 (phase-story-executor)

Outcome: GREEN. All 8 ACs verified; all 22 added tests pass; the full pytest suite stays green (3546 passed, 33 skipped, 2 xfailed); mypy --strict --warn-unreachable, ruff check and ruff format --check are clean across src/codegenie/report/ plus the writer integration; the AC-3 deletion ritual breaks the build at both nesting levels (mypy flags the assert_never call with an [arg-type] error).

Code shipped

  • New:
      • src/codegenie/report/__init__.py: closed __all__ re-exporting ConfidenceSectionRenderer and render_confidence_section.
      • src/codegenie/report/confidence_section.py: the renderer (~210 LOC including module docstring and helpers).
      • tests/unit/report/__init__.py: empty package init.
      • tests/unit/report/test_confidence_section.py: 20 unit tests covering AC-1, AC-2, AC-4 (per variant + ordering + ASCII), AC-5 (3 paths), AC-8.
      • tests/integration/test_writer_renders_confidence_section.py: 2 tests for AC-6 (file exists + heading + rows; byte-identical replay).
  • Modified:
      • src/codegenie/output/writer.py: added a _publish_context_report helper, invoked between the YAML os.replace and the recursive mode-fix. The renderer is imported lazily inside the helper, so the writer's static import graph stays minimal and a future re-ordering of modules cannot fail at import time. A renderer failure logs report.confidence_section.render_failed and is swallowed (defense in depth: the renderer is documented as never-raising, but repo-context.yaml integrity is sacrosanct).
      • tests/unit/test_output_writer.py::test_writer_replaces_raws_before_yaml: invariant strengthened from "yaml is the last os.replace" to "every raw appears before yaml AND yaml appears before CONTEXT_REPORT.md". Original intent preserved; new ordering invariant locked.
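The writer-side contract described above (lazy import, atomic publish, swallowed renderer failure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the shipped _publish_context_report: the name publish_context_report_sketch, its signature, the injectable renderer parameter, the logger name and the temp-file naming are all hypothetical. Only the import path codegenie.report, the log event name and the file names come from this log. Write failures are deliberately left to propagate, since the log only says renderer failures are swallowed.

```python
import logging
import os
import tempfile
from pathlib import Path
from typing import Any, Callable, Mapping, Optional

log = logging.getLogger("codegenie.output.writer")  # logger name is an assumption

Renderer = Callable[[Mapping[str, Any]], str]


def publish_context_report_sketch(
    out_dir: Path,
    slice_data: Mapping[str, Any],
    renderer: Optional[Renderer] = None,
) -> bool:
    """Hypothetical sketch: render CONTEXT_REPORT.md and publish it atomically.

    Returns False (after logging) if rendering fails; repo-context.yaml,
    already in place by this point, is never touched.
    """
    try:
        if renderer is None:
            # Lazy import: codegenie.report is loaded only when a report is
            # published, so the writer's module-level imports stay minimal.
            from codegenie.report import render_confidence_section

            renderer = render_confidence_section
        text = renderer(slice_data)
    except Exception:
        # Defense in depth: the renderer should never raise, but if it does
        # the failure is logged and swallowed.
        log.exception("report.confidence_section.render_failed")
        return False

    # Atomic publish: write a temp file in the same directory, then os.replace.
    fd, tmp_name = tempfile.mkstemp(dir=out_dir, prefix=".CONTEXT_REPORT.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as fh:
            fh.write(text)
        os.replace(tmp_name, out_dir / "CONTEXT_REPORT.md")
    except BaseException:
        # Never leave a stray temp file behind on a failed write.
        Path(tmp_name).unlink(missing_ok=True)
        raise
    return True
```

Because the previous CONTEXT_REPORT.md is only ever replaced whole, a renderer failure leaves whatever was last published intact rather than a half-written file.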

Per-AC evidence

| AC | Where verified | Status |
|---|---|---|
| AC-1: module surface, closed __all__, no probe imports | tests/unit/report/test_confidence_section.py::test_module_surface_is_closed, test_renderer_class_wraps_function, test_no_probe_registry_import | PASS |
| AC-2: exhaustive variant coverage | test_exhaustive_match_every_variant (programmatic enumeration of Fresh, CommitsBehind, DigestMismatch, CoverageGap, IndexerError) | PASS |
| AC-3: mypy --warn-unreachable enforces removed-case → build break | Ritual captured below (both nesting levels) | PASS |
| AC-4: deterministic order + per-variant row format | test_row_format_per_variant_{fresh,commits_behind,digest_mismatch,coverage_gap,indexer_error}, test_row_order_deterministic, test_ascii_only_no_emoji | PASS |
| AC-5: malformed slice → stable slice_malformed row; never raises | test_malformed_slice_does_not_crash, test_malformed_freshness_missing_entirely, test_empty_envelope_returns_heading_only, test_renderer_never_raises_on_random_garbage (6 parametrize cases) | PASS |
| AC-6: writer integration (atomic write, contains heading + rows, byte-identical) | tests/integration/test_writer_renders_confidence_section.py::test_context_report_md_written_with_confidence_section, ::test_context_report_md_byte_identical_across_runs | PASS |
| AC-7: mypy --strict + ruff clean | mypy --strict src/codegenie/report/ → Success: no issues found in 2 source files; ruff check ... → All checks passed!; ruff format --check ... → 5 files already formatted | PASS |
| AC-8: no probe-registry import side-effect | test_no_probe_registry_import (subprocess clean-state import; asserts no codegenie.probes.* modules loaded) | PASS |
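AC-8's "subprocess clean-state import" check can be approximated as below. The helper name modules_loaded_by is hypothetical and the real test_no_probe_registry_import may be structured differently; the idea is that a fresh interpreter starts with a clean sys.modules, so any module under the probe prefix after the import must have been pulled in by that import.

```python
import subprocess
import sys


def modules_loaded_by(module: str, prefix: str) -> list[str]:
    """Import `module` in a fresh interpreter; return loaded modules under `prefix`.

    A subprocess is used because the test process's own sys.modules is already
    polluted by whatever earlier tests imported.
    """
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"hits = sorted(m for m in sys.modules if m == {prefix!r} or m.startswith({prefix!r} + '.'))\n"
        "print('\\n'.join(hits))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return [line for line in result.stdout.splitlines() if line]
```

Applied to this story, the assertion would take a form like modules_loaded_by("codegenie.report", "codegenie.probes") == [].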

AC-3 ritual — mypy --warn-unreachable enforcement

Both nesting levels were tested. Deleting a case at either level breaks the build: the now-unhandled variant flows into assert_never, and mypy reports an [arg-type] error against its Never parameter.

(a) outer case Fresh(...) deleted:

src/codegenie/report/confidence_section.py:202: error: Argument 1 to "assert_never" has incompatible type "Fresh"; expected "Never"  [arg-type]
Found 1 error in 1 file (checked 2 source files)

(b) inner case DigestMismatch(...) deleted:

src/codegenie/report/confidence_section.py:196: error: Argument 1 to "assert_never" has incompatible type "DigestMismatch"; expected "Never"  [arg-type]
Found 1 error in 1 file (checked 2 source files)

Restored tree: Success: no issues found in 2 source files.

warn_unreachable is enabled globally in pyproject.toml (S1-11 intended a per-module override, but the project-wide setting is already in force, and the deletion ritual confirms the gate fires). No edits to pyproject.toml were needed for this story.

Conflict surfaced + resolution (CLAUDE.md Rule 7)

Story AC-2 literal text: "single match value: statement … with arms for Fresh(...) and Stale(reason=CommitsBehind(...)), …".

Reality: mypy's match-narrowing on Pydantic's nested discriminated union (Stale.reason: StaleReason = Annotated[CommitsBehind | … | IndexerError, Field(discriminator="kind")]) does NOT fully exhaust the inner type — after all four case Stale(reason=X) arms, mypy still thinks Stale could remain, so assert_never(value) errors out at build time. This was reproduced under the current pinned mypy + Pydantic versions.

Resolution chosen: nested match (outer over Fresh|Stale, inner over StaleReason). This is the convention the producer already uses (codegenie.probes.layer_b.index_health._derive_confidence — see that function's docstring "Nested exhaustive match with assert_never on each default arm so mypy --warn-unreachable enforces handling of every variant at BOTH levels"). The deviation is strictly stronger than the literal AC (both levels assert_never-guarded) and matches CLAUDE.md Rule 11 ("match the codebase's conventions"). Logged here per Rule 7 "surface conflicts, don't average them"; not silently smoothed in.

Refactor decisions (design-patterns lens)

| Pattern | Decision |
|---|---|
| Tagged union / sum type | Reinforced: the renderer is the second consumer of IndexFreshness's sum-type discipline (after _derive_confidence). |
| assert_never exhaustiveness | Reinforced at both nesting levels. |
| Functional core / imperative shell | Renderer is pure: no I/O, no logger, no clock. The imperative shell lives in the writer (the _publish_context_report helper). |
| Capability / chokepoint | Writer remains the only path from RedactedSlice to disk; the renderer is invoked behind that chokepoint, never independently. |
| Open/Closed at the file boundary | Renderer adds no if index_name == … branches; per-variant logic lives in the case arms. |
| Smart constructor | Not applied (renderer is a pure function); RedactedSlice upstream is the relevant smart constructor and is unchanged. |
| Strategy pattern | Considered for per-variant row formatting; rejected: the inline match arms are clearer than a dict-of-strategies, and the AC requires assert_never exhaustiveness in one place. |
| Lazy import | Applied on the writer side (_publish_context_report imports codegenie.report inside the function) so the writer's static import graph stays minimal and a future re-ordering of phase modules doesn't fail at import time. |

No anti-patterns introduced (no primitive obsession, no anaemic types, no hidden state, no shotgun branching).
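The functional-core decision, together with AC-4 and AC-5, implies a renderer shape roughly like the one below. This is a stdlib-only sketch: it decodes hypothetical plain-dict payloads instead of running the real TypeAdapter(IndexFreshness) validation, and the function name, heading text, row format and the exact handling of a missing versus malformed freshness block are assumptions. What it does show is the contract: pure, deterministic ordering, ASCII-only output, a stable slice_malformed row instead of an exception.

```python
from collections.abc import Mapping
from typing import Any

HEADING = "## Confidence"  # hypothetical heading text
MALFORMED = "slice_malformed"


def _row(name: str, payload: Any) -> str:
    """Format one index's row; raise ValueError for anything unrecognised."""
    # Hypothetical payload shapes: {"status": "fresh"} or
    # {"status": "stale", "reason": {"kind": "commits_behind"}}.
    if isinstance(payload, Mapping):
        status = payload.get("status")
        reason = payload.get("reason")
        if status == "fresh":
            return f"- {name}: fresh"
        if status == "stale" and isinstance(reason, Mapping) and isinstance(reason.get("kind"), str):
            return f"- {name}: stale ({reason['kind']})"
    raise ValueError(f"unrecognised freshness payload for {name!r}")


def _ascii(line: str) -> str:
    # ASCII-only output: anything outside ASCII becomes "?".
    return line.encode("ascii", "replace").decode("ascii")


def render_confidence_section_sketch(slice_data: Any) -> str:
    """Pure function: no I/O, no logger, no clock, and it never raises."""
    lines = [HEADING]
    try:
        freshness = slice_data.get("freshness", {}) if isinstance(slice_data, Mapping) else None
        if not isinstance(freshness, Mapping):
            lines.append(f"- {MALFORMED}")
        else:
            # Deterministic order: sort on the string form of each index name.
            for name in sorted(freshness, key=str):
                try:
                    lines.append(_ascii(_row(str(name), freshness[name])))
                except Exception:
                    lines.append(_ascii(f"- {name}: {MALFORMED}"))
    except Exception:
        # Last-resort guard: heading plus one stable malformed row.
        lines = [HEADING, f"- {MALFORMED}"]
    return "\n".join(lines) + "\n"
```

Keeping all failure handling inside the pure function means the writer's try/except is only a second line of defense, which is the layering the log describes.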

Gates

  • make lint → clean.
  • lint-imports --config pyproject.toml --no-cache → Contracts: 2 kept, 0 broken.
  • mypy --strict → Success: no issues found in 132 source files.
  • make test → 3546 passed, 33 skipped, 3 deselected, 2 xfailed.
  • make fence → 9 fence tests pass (the local Makefile coverage gate reports a documented false positive on narrow subsets, per CLAUDE.md; CI runs fence with -o "addopts=", so this is not a CI blocker).

Lessons for follow-on stories

  • L1. mypy's match-narrowing on nested Pydantic discriminated unions is incomplete; nested match is the codebase convention precisely because of this. When a future story claims "single match" over a nested sum type, plan for the nested form up front and surface the AC-text deviation in the validation phase, not in the executor's refactor step.
  • L2. _INDEX_FRESHNESS_ADAPTER = TypeAdapter(IndexFreshness) is built once as a module-level constant. Every renderer that validates a Pydantic discriminated-union envelope should follow the same pattern so that per-call adapter construction cost doesn't dominate the hot path.
  • L3. Defense-in-depth on pathological envelopes paid off — the parametrized "random garbage" test caught a None slice case that the AC-5 minimal contract would have missed.
  • L4. Tests that assert ordering of os.replace calls (S0 / S3-03 era) are inherently brittle when a new artifact is added to the pipeline. Strengthen them to assert the relations between artifacts (a appears before b) rather than absolute positions.
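Lesson L4 can be made concrete with a small relational-ordering helper. In this sketch, record_replace_order, assert_precedes and toy_writer are hypothetical (the raw file names especially), while repo-context.yaml and CONTEXT_REPORT.md are the artifacts named in this log. It spies on os.replace with unittest.mock and asserts "a before b" relations instead of absolute positions.

```python
import os
from pathlib import Path
from typing import Any, Callable
from unittest import mock


def record_replace_order(action: Callable[[], None]) -> list[str]:
    """Run `action` while recording the destination name of every os.replace call."""
    recorded: list[str] = []
    real_replace = os.replace

    def spy(src: Any, dst: Any, *args: Any, **kwargs: Any) -> None:
        recorded.append(Path(dst).name)
        real_replace(src, dst, *args, **kwargs)

    # Patches the attribute on the os module, so only callers that look up
    # os.replace at call time (not `from os import replace`) are observed.
    with mock.patch("os.replace", side_effect=spy):
        action()
    return recorded


def assert_precedes(order: list[str], before: str, after: str) -> None:
    """Relational check: `before` was published earlier than `after`."""
    assert before in order and after in order, (before, after, order)
    assert order.index(before) < order.index(after), (before, after, order)


def toy_writer(out: Path) -> None:
    # Hypothetical stand-in for the real writer: raws, then YAML, then report.
    for name in ("raw_a.json", "raw_b.json", "repo-context.yaml", "CONTEXT_REPORT.md"):
        tmp = out / f".{name}.tmp"
        tmp.write_text("x", encoding="utf-8")
        os.replace(tmp, out / name)
```

When a further artifact joins the pipeline, the suite gains one more assert_precedes line and none of the existing assertions change, which is exactly the brittleness this lesson is about avoiding.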