S8-01 — Confidence section renderer (attempt log)
Append-only journal. Each attempt records what was tried, what worked, what didn't, and the lesson for the next attempt or story.
Attempt 1 — 2026-05-18 (phase-story-executor)
Outcome: GREEN. All 8 ACs verified; all 22 added tests pass; full pytest
suite stays green (3546 passed, 33 skipped, 2 xfailed); mypy --strict
--warn-unreachable and ruff check/ruff format clean across
src/codegenie/report/ + the writer integration; the AC-3 ritual produces a
build-breaking mypy error at both nesting levels.
Code shipped
- New:
  - `src/codegenie/report/__init__.py`: closed `__all__` re-exporting `ConfidenceSectionRenderer` and `render_confidence_section`.
  - `src/codegenie/report/confidence_section.py`: the renderer (~210 LOC including module docstring and helpers).
  - `tests/unit/report/__init__.py`: empty package init.
  - `tests/unit/report/test_confidence_section.py`: 20 unit tests covering AC-1, AC-2, AC-4 (per variant + ordering + ASCII), AC-5 (3 paths), AC-8.
  - `tests/integration/test_writer_renders_confidence_section.py`: 2 tests for AC-6 (file exists + heading + rows; byte-identical replay).
- Modified:
  - `src/codegenie/output/writer.py`: added `_publish_context_report` helper invoked between the YAML `os.replace` and the recursive mode-fix; the renderer is imported lazily so a future writer can pre-date its consumer without an import-graph surprise. A renderer failure logs `report.confidence_section.render_failed` and is swallowed (defense in depth: the renderer is documented as never-raising; `repo-context.yaml` integrity is sacrosanct).
  - `tests/unit/test_output_writer.py::test_writer_replaces_raws_before_yaml`: invariant strengthened from "yaml is the last `os.replace`" to "every raw appears before yaml AND yaml appears before CONTEXT_REPORT.md". Original intent preserved; new ordering invariant locked.
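
The writer-side behaviour described above (render, publish atomically, log and swallow any failure) can be sketched roughly as follows. This is a minimal illustration, not the shipped helper: the name `publish_context_report`, its signature, and the `render` callable are assumptions made for the example; only the log event name and the `CONTEXT_REPORT.md` file name come from this log.

```python
import logging
import os
import tempfile
from pathlib import Path
from typing import Callable

log = logging.getLogger("writer_sketch")


def publish_context_report(out_dir: Path, render: Callable[[], str]) -> bool:
    """Best-effort publish of CONTEXT_REPORT.md; never raises (hypothetical helper)."""
    tmp_path = None
    try:
        # Per the log, the shipped helper imports the renderer lazily at this
        # point, inside the function; the sketch takes a callable instead.
        text = render()
        # Write to a temp file in the same directory, then rename atomically.
        fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_path, out_dir / "CONTEXT_REPORT.md")
        return True
    except Exception:
        # A report failure must not affect artifacts that were already published.
        log.warning("report.confidence_section.render_failed", exc_info=True)
        if tmp_path is not None and os.path.exists(tmp_path):
            os.unlink(tmp_path)
        return False
```

Because the report is published only after the YAML `os.replace`, a failure in this step leaves `repo-context.yaml` exactly as it was written.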
Per-AC evidence
| AC | Where verified | Status |
|---|---|---|
| AC-1: module surface, closed `__all__`, no probe imports | `tests/unit/report/test_confidence_section.py::test_module_surface_is_closed`, `test_renderer_class_wraps_function`, `test_no_probe_registry_import` | PASS |
| AC-2: exhaustive variant coverage | `test_exhaustive_match_every_variant` (programmatic enumeration of `Fresh`, `CommitsBehind`, `DigestMismatch`, `CoverageGap`, `IndexerError`) | PASS |
| AC-3: `mypy --warn-unreachable` enforces removed-case → build break | Ritual captured below (both nesting levels) | PASS |
| AC-4: deterministic order + per-variant row format | `test_row_format_per_variant_{fresh,commits_behind,digest_mismatch,coverage_gap,indexer_error}`, `test_row_order_deterministic`, `test_ascii_only_no_emoji` | PASS |
| AC-5: malformed slice → stable `slice_malformed` row; never raises | `test_malformed_slice_does_not_crash`, `test_malformed_freshness_missing_entirely`, `test_empty_envelope_returns_heading_only`, `test_renderer_never_raises_on_random_garbage` (6 parametrize cases) | PASS |
| AC-6: writer integration: atomic write, contains heading + rows, byte-identical | `tests/integration/test_writer_renders_confidence_section.py::test_context_report_md_written_with_confidence_section`, `::test_context_report_md_byte_identical_across_runs` | PASS |
| AC-7: `mypy --strict` + `ruff` clean | `mypy --strict src/codegenie/report/` → `Success: no issues found in 2 source files`; `ruff check ...` → `All checks passed!`; `ruff format --check ...` → `5 files already formatted` | PASS |
| AC-8: no probe-registry import side-effect | `test_no_probe_registry_import` (subprocess clean-state import; asserts no `codegenie.probes.*` modules loaded) | PASS |
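
For readers unfamiliar with the AC-8 technique, a clean-state import check can be sketched as below. It is an illustration under assumptions: the helper name `modules_loaded_by` is made up, and the standard-library modules `json` and `csv` stand in for `codegenie.report` and `codegenie.probes`, which are not importable outside the repo. The real test may be structured differently.

```python
import json
import subprocess
import sys


def modules_loaded_by(import_target: str, forbidden_prefix: str) -> list[str]:
    """Import `import_target` in a fresh interpreter and list loaded modules
    that are `forbidden_prefix` itself or live under it (hypothetical helper)."""
    probe = (
        "import importlib, json, sys\n"
        f"importlib.import_module({import_target!r})\n"
        f"prefix = {forbidden_prefix!r}\n"
        "hits = sorted(m for m in sys.modules\n"
        "              if m == prefix or m.startswith(prefix + '.'))\n"
        "print(json.dumps(hits))\n"
    )
    # A subprocess guarantees that nothing imported by the test session leaks in.
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    # The probe prints exactly one JSON line; read the last line defensively.
    return json.loads(result.stdout.strip().splitlines()[-1])
```

Running the check in a subprocess matters because an in-process `sys.modules` inspection would be polluted by whatever earlier tests had already imported.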
AC-3 ritual — mypy --warn-unreachable enforcement
Both nesting levels were tested. Deleting a case at either level breaks the
build: mypy rejects the corresponding assert_never call with an [arg-type]
error, as captured below.
(a) outer case Fresh(...) deleted:
src/codegenie/report/confidence_section.py:202: error: Argument 1 to "assert_never" has incompatible type "Fresh"; expected "Never" [arg-type]
Found 1 error in 1 file (checked 2 source files)
(b) inner case DigestMismatch(...) deleted:
src/codegenie/report/confidence_section.py:196: error: Argument 1 to "assert_never" has incompatible type "DigestMismatch"; expected "Never" [arg-type]
Found 1 error in 1 file (checked 2 source files)
Restored tree: Success: no issues found in 2 source files.
warn_unreachable is enabled globally in pyproject.toml (S1-11 had the
intent of a per-module override, but the project-wide setting is already
in force — the deletion ritual proves the gate fires). No edits to
pyproject.toml were needed for this story.
Conflict surfaced + resolution (CLAUDE.md Rule 7)
Story AC-2 literal text: "single match value: statement … with arms
for Fresh(...) and Stale(reason=CommitsBehind(...)), …".
Reality: mypy's match-narrowing on Pydantic's nested discriminated
union (Stale.reason: StaleReason = Annotated[CommitsBehind | … |
IndexerError, Field(discriminator="kind")]) does NOT fully exhaust the
inner type — after all four case Stale(reason=X) arms, mypy still
thinks Stale could remain, so assert_never(value) errors out at
build time. This was reproduced under the current pinned mypy + Pydantic
versions.
Resolution chosen: nested match (outer over Fresh|Stale, inner
over StaleReason). This is the convention the producer already uses
(codegenie.probes.layer_b.index_health._derive_confidence — see that
function's docstring "Nested exhaustive match with assert_never on each
default arm so mypy --warn-unreachable enforces handling of every
variant at BOTH levels"). The deviation is strictly stronger than the
literal AC (both levels assert_never-guarded) and matches CLAUDE.md
Rule 11 ("match the codebase's conventions"). Logged here per Rule 7
"surface conflicts, don't average them"; not silently smoothed in.
Refactor decisions (design-patterns lens)
| Pattern | Decision |
|---|---|
| Tagged union / sum type | Reinforced — the renderer is the second consumer of IndexFreshness's sum-type discipline (after _derive_confidence). |
| assert_never exhaustiveness | Reinforced at both nesting levels. |
| Functional core / imperative shell | Renderer is pure — no I/O, no logger, no clock. Imperative shell lives in the writer (the _publish_context_report helper). |
| Capability / chokepoint | Writer remains the only path from RedactedSlice to disk; the renderer is invoked behind that chokepoint, never independently. |
| Open/Closed at the file boundary | Renderer adds no if index_name == … branches; per-variant logic is the case arms. |
| Smart constructor | Not applied (renderer is a pure function); RedactedSlice upstream is the relevant smart constructor and is unchanged. |
| Strategy pattern | Considered for per-variant row formatting; rejected — the inline match arms are clearer than a dict-of-strategies and the AC requires assert_never exhaustiveness in one place. |
| Lazy import | Applied at writer side (_publish_context_report imports codegenie.report inside the function) so the writer's static import graph stays minimal and a future re-ordering of phase modules doesn't fail at import time. |
No anti-patterns introduced (no primitive obsession, no anaemic types, no hidden state, no shotgun branching).
Gates
- `make lint` → clean.
- `lint-imports --config pyproject.toml --no-cache` → `Contracts: 2 kept, 0 broken.`
- `mypy --strict` → `Success: no issues found in 132 source files.`
- `make test` → `3546 passed, 33 skipped, 3 deselected, 2 xfailed.`
- `make fence` → 9 fence tests pass (local Makefile coverage gate is a documented false-positive on narrow subsets per `CLAUDE.md`; CI runs fence with `-o "addopts="`, so this is not a CI blocker).
Lessons for follow-on stories
- L1. mypy's match-narrowing on nested Pydantic discriminated unions is incomplete; nested `match` is the codebase convention precisely because of this. When a future story claims "single match" over a nested sum type, plan for the nested form up front and surface the AC-text deviation in the validation phase, not in the executor's refactor step.
- L2. The `_INDEX_FRESHNESS_ADAPTER = TypeAdapter(IndexFreshness)` construction is a module-level constant; every renderer that validates a Pydantic discriminated-union envelope should follow the same pattern so the per-call construction cost doesn't dominate the hot path.
- L3. Defense-in-depth on pathological envelopes paid off: the parametrized "random garbage" test caught a `None` slice case that the AC-5 minimal contract would have missed.
- L4. Tests that assert ordering of `os.replace` calls (S0 / S3-03 era) are inherently brittle when a new artifact is added to the pipeline. Strengthen them to assert the relations between artifacts (a appears before b) rather than absolute positions.
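
As an illustration of the L4 lesson, a relational ordering check might look like the sketch below. The helper names and the raw artifact names in the usage note are hypothetical; only `repo-context.yaml` and `CONTEXT_REPORT.md` come from this log, and the real test records its `os.replace` targets in whatever way the suite already does.

```python
from typing import Sequence


def assert_before(calls: Sequence[str], earlier: str, later: str) -> None:
    """Fail unless `earlier` was replaced before `later` (hypothetical helper)."""
    if calls.index(earlier) >= calls.index(later):
        raise AssertionError(f"{earlier!r} must be replaced before {later!r}")


def check_publish_order(replaced: Sequence[str]) -> None:
    """Every raw precedes the YAML, and the YAML precedes the report."""
    yaml_name = "repo-context.yaml"
    report_name = "CONTEXT_REPORT.md"
    raws = [name for name in replaced if name not in (yaml_name, report_name)]
    for raw in raws:
        assert_before(replaced, raw, yaml_name)
    assert_before(replaced, yaml_name, report_name)
```

A recorded sequence such as `["raw/a.json", "raw/b.json", "repo-context.yaml", "CONTEXT_REPORT.md"]` passes, and it keeps passing if a further artifact is later published after the report, which an absolute "yaml is last" assertion would not.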