Skip to content

Attempt log: S8-02 — CLI summary block on stdout

Append-only journal. Each attempt records what was tried, what worked, what didn't, and the lesson for the next attempt or story.


Attempt 1 — 2026-05-18 (phase-story-executor)

Outcome: GREEN. All 10 ACs verified; 39 new tests added (29 unit + 8 parametrized integration + 2 ordering); full pytest suite stays green (3588 passed, 33 skipped, 2 xfailed); mypy --strict, ruff check/ruff format, lint-imports, make fence, and the full pre-commit run --all-files all clean.

Code shipped

  • New:
  • src/codegenie/cli_summary.py — pure SummaryBlock frozen dataclass + summary_block(...) factory + as_lines() formatter. ~95 LOC. No I/O — AST gate enforces it.
  • tests/unit/cli/__init__.py + tests/integration/cli/__init__.py
    • tests/integration/cli/conftest.py (autouse no-op of _seam_configure_logging, mirroring smoke conftest).
  • tests/unit/cli/test_summary_block_pure.py — 9 tests covering AC-8 (purity, regex, idempotence, sort+dedup, AST gate, hypothesis property).
  • tests/unit/cli/test_summary_fingerprints_format.py — 4 tests for AC-3 format regex + hypothesis property.
  • tests/unit/cli/test_summary_no_new_events.py — 2 tests for AC-6 (runtime event-name guard + source-grep interdict on the new helper body).
  • tests/unit/skills/test_loader_shadowed_skills.py — 2 tests for the loader's additive shadowed_skills field (zero-collision / cross-tier collision).
  • tests/integration/cli/test_summary_stdout_shape.py — AC-1.
  • tests/integration/cli/test_summary_count_matches_event.py — AC-2 (stdout count == envelope.written field; both clean and secret-seeded cases).
  • tests/integration/cli/test_summary_plaintext_boundary.py — AC-3 boundary, parametrized over sanitizer._PATTERNS + entropy (7 parameters; auto-extends when a 7th class lands).
  • tests/integration/cli/test_summary_skill_shadowed_data_path.py — AC-4 wiring (calls _emit_phase2_summary directly with synthetic schema_slice, plus a live zero-collision smoke).
  • tests/integration/cli/test_summary_order_after_audit.py — AC-5.
  • tests/integration/cli/test_summary_zero_state.py — AC-7.
  • tests/integration/cli/test_summary_determinism.py — AC-9 (two consecutive gathers, byte-identical stdout; zero-state + three seeded secrets).
  • Modified:
  • src/codegenie/skills/model.py — added ShadowedSkill frozen Pydantic model (skill_id, shadowed_tier, winning_tier, shadowed_path, winning_path — the same five fields the loader's existing structlog event already populates).
  • src/codegenie/skills/loader.py — extended LoadOutcome with shadowed_skills: tuple[ShadowedSkill, ...] (default empty); collision branch appends a ShadowedSkill row in addition to the existing _logger.warning(_EVENT_SHADOWED, ...) call.
  • src/codegenie/probes/layer_d/skills_index.py — extended SkillsIndexSlice with shadowed_skills; passed outcome.shadowed_skills into the constructor.
  • src/codegenie/schema/probes/skills_index.schema.json — added the shadowed_skills array property + ShadowedSkill $defs entry (additionalProperties: false).
  • src/codegenie/cli.py — added _emit_phase2_summary(findings_count, fingerprints, skills_slice) helper; invoked at the end of _run_gather_pipeline after _seam_audit_record returns.

Per-AC evidence

AC Where verified Status
AC-1 — exactly three lines, in order, no extras test_only_three_lines PASS
AC-2 — stdout count == envelope.written field (both surfaces) test_count_zero_on_clean_minimal_ts, test_count_one_when_secret_seeded (asserts equality, plus zero secrets.summary emission) PASS
AC-3 — fingerprints format + plaintext boundary over _PATTERNS test_summary_fingerprints_format (regex + hypothesis), test_summary_plaintext_boundary (parametrized over _PATTERNS + entropy = 7 cases) PASS
AC-4 — LoadOutcome.shadowed_skills data path, not events test_loader_shadowed_skills::test_cross_tier_collision_yields_one_shadowed_skill_row, test_emit_phase2_summary_reads_shadowed_skills_from_slice, test_shadowed_skill_model_uses_real_loader_field_names PASS
AC-5 — summary emits after audit record, exit 0 irrespective of count test_audit_record_exists_before_stdout, test_exit_code_zero_irrespective_of_count PASS
AC-6 — no new Phase-2 structlog events test_no_forbidden_event_names_in_clean_gather, test_no_new_event_emission_in_cli_summary_source PASS
AC-7 — zero-state grep-ability test_three_zero_lines_present PASS
AC-8 — pure formatter / AST gate test_summary_block_pure.py (9 tests including test_pure_no_io_imports AST walk) PASS
AC-9 — determinism (byte-identical across two runs) test_byte_identical_across_two_runs_zero_state, test_byte_identical_with_three_seeded_secrets PASS
AC-10 — mypy + ruff + lint-imports + fence green mypy --strict src/Success: no issues found in 133 source files; ruff checkAll checks passed!; lint-importsContracts: 2 kept, 0 broken.; make fence → 9 tests pass; pre-commit all-files → clean PASS

Conflict surfaced + resolution (CLAUDE.md Rule 7)

Story Goal text: "Impure shell: _emit_summary_block(block: SummaryBlock) calls print three times on stdout."

Reality: the repo bans print( in src/ via two stacked structural defenses: ruff's T201 rule (pyproject.toml [tool.ruff.lint]) and the forbidden-patterns pre-commit hook (scripts/check_forbidden_patterns.py). Both fire on the literal print( substring. Operator-facing stdout in this codebase flows through sys.stdout.write; logging flows through structlog.

Resolution chosen: sys.stdout.write(line + "\n") (one newline per line, three calls). Semantically identical to print for this use case, respects both structural defenses, and matches the codebase's existing convention (Rule 11). The story's "calls print" was casual phrasing for "writes three lines on stdout" — the intent is the operator-visible stdout surface, not the specific stdlib helper.

The comment block explaining the choice deliberately does NOT contain the literal token print( because the forbidden-patterns regex matches anywhere in the file, even inside comments. First attempt's comment ("sys.stdout.write (not print()…") tripped the hook; the shipped comment reframes without the forbidden substring.

Out-of-scope finding (Rule 3 — surgical changes)

SkillsIndexProbe.run (src/codegenie/probes/layer_d/skills_index.py) reads ctx.config and ctx.output_dir, but the coordinator constructs BudgetingContext (which has only workspace, raw_artifact_mb, bytes_written, parsed_manifest, input_snapshot). The probe therefore fails on every gather with AttributeError: 'BudgetingContext' object has no attribute 'config' and is failure-isolated to an empty schema_slice.

This is pre-existing (not introduced by S8-02) and out of scope for this story. My code handles the failure-isolated output gracefully: the .get("shadowed_skills", []) default renders skill_shadowed=[], and AC-4's data-path is validated at the CLI seam (with a synthetic schema_slice) rather than through the broken upstream probe. The zero-collision smoke test_no_collisions_renders_empty_skill_shadowed_on_live_gather exercises the live coordinator output for the case the live pipeline can actually produce today.

Surfacing this as a separate follow-up so a future story owns the BudgetingContextProbeContext API drift — at least four other probes (runtime_trace, ast_grep, tree_sitter_import_graph, semgrep) hit the same wall today.

Refactor decisions (design-patterns lens)

Pattern Decision
Functional core / imperative shell Reinforced — cli_summary.summary_block is pure (AST gate); _emit_phase2_summary is the only impure shell (three sys.stdout.write calls).
Smart constructor Preserved — _emit_phase2_summary takes primitives (findings_count: int, fingerprints: list[str]) instead of RedactedSlice so test code never constructs RedactedSlice outside the documented two-site closed set (02-ADR-0010). Refactored from a RedactedSlice parameter mid-implementation after the structural-defense test caught the leakage.
Newtype discipline ShadowedSkill.skill_id: SkillId — newtype preserved end-to-end; no primitive obsession on the carrier shape.
Tagged union / sum type Not applicable — ShadowedSkill is a flat record, not a sum type.
Open/Closed at the file boundary test_summary_plaintext_boundary iterates sanitizer._PATTERNS so a 7th pattern auto-exercises the boundary check (no per-pattern test file edits).
Strategy pattern Considered + rejected for summary_block formatting — three rendered lines with three trivial formats; a strategy dict would be premature abstraction (Rule 2).
Lazy import Applied at the impure shell — _emit_phase2_summary imports cli_summary and skills.model via importlib.import_module to keep codegenie.cli's static import graph minimal (the import-linter cli contract stays kept).
Capability / chokepoint The redactor remains the only construction site for RedactedSlice; the smart-constructor invariant is preserved by the primitive-passing seam in _emit_phase2_summary.

No anti-patterns introduced (no Any in new signatures, no if/elif ladder over a closed variant set, no broad except Exception, no mutable module-level state, no two-level concrete inheritance).

Gates

  • make lint — clean (ruff check + ruff format).
  • lint-imports --config pyproject.toml --no-cacheContracts: 2 kept, 0 broken.
  • mypy --strict src/Success: no issues found in 133 source files.
  • pre-commit run --all-files → all hooks pass.
  • make fence → 9 fence tests pass.
  • Full pytest suite — 3588 passed, 33 skipped, 3 deselected, 2 xfailed.

Lessons for follow-on stories

  • L1. The repo's print( ban is two-layer structural (ruff T201
  • forbidden-patterns hook). Any new operator-facing stdout in src/ must use sys.stdout.write and comments must avoid the literal substring print( because the forbidden-patterns regex doesn't respect Python comment boundaries.
  • L2. The RedactedSlice smart-constructor invariant is enforced by AST scan of every src/ and tests/ file. A new seam that takes RedactedSlice as a parameter forces test code to construct one, tripping the closed two-site allowlist. Prefer primitive parameters at consumer seams — the wrapper carries the invariant at the producer side, not at every consumer signature.
  • L3. BudgetingContext vs ProbeContext API drift — several Layer-B/C/D probes (runtime_trace, ast_grep, tree_sitter_import_graph, semgrep, skills_index) read attributes that exist on ProbeContext but not on the coordinator-constructed BudgetingContext. They are all failure-isolated to empty slices on every gather. This is a Phase-2 follow-up worth a dedicated story; the affected probes are visible by grepping ctx.config\|ctx.output_dir\|ctx.cache_dir\|ctx.logger under src/codegenie/probes/.
  • L4. structlog.testing.capture_logs() swallows print-to-stdout events — wrapping CliRunner.invoke(...) in capture_logs() intercepts the structlog chain so events don't pollute stdout. This is the pattern any future test that needs a "clean stdout" assertion through a Click runner should follow (the _seam_configure_logging no-op alone isn't enough — structlog's default PrintLogger writes to stdout when unconfigured).