Story S3-03: Writer signature tightening + envelope-level redactor composition + secrets_redacted_count log field
Step: Step 3 — Plant SecretRedactor + RedactedSlice smart constructor at the writer chokepoint
Status: Done — executed 2026-05-16 (see _attempts/S3-03.md)
Effort: S
Depends on: S3-02 (RedactedSlice model; this story imports it at the writer + seam), S3-01 (redact_secrets body that produces the RedactedSlice; composition order pins this story's mock-spy test)
ADRs honored: 02-ADR-0010 (RedactedSlice smart constructor at the writer boundary — type-level "redactor was called"), 02-ADR-0005 (no plaintext persistence — the chokepoint discipline this story finishes), 02-ADR-0008 (no event stream in Phase 2 — secrets_redacted_count is one new structured-log field, not an event-stream subscription)
Validation notes (phase-story-validator, 2026-05-16)
Verdict: HARDENED. The story's intent — tighten the writer signature, pin the composition order, emit a single secrets_redacted_count log field — traces cleanly to 02-ADR-0010, 02-ADR-0005, 02-ADR-0008, and phase-arch-design.md §"Gap 4" / §"Logging strategy". But the draft's prescriptions referenced phantom Phase-0 surfaces — six BLOCK-severity inconsistencies with master would have stalled the executor on the first tool call. The structural-fix shape from S3-01/S3-02 validations applies: keep the goal, correct the call sites. Edits applied:
- B1 (BLOCK): write_envelope does not exist on master. Draft prescribed write_envelope(slice_: dict[str, JSONValue], ...) -> Path (a module-level function returning Path). Phase 0 ships class Writer with Writer.write(envelope: dict[str, Any], raw_artifacts: list[tuple[str, bytes]], output_dir: Path) -> None (verified at src/codegenie/output/writer.py:142). The CLI seam _seam_write_envelope(envelope, raw_artifacts, output_dir) -> bytes (verified at src/codegenie/cli.py:344) is the only call site and returns YAML bytes (for the audit anchor SHA), not Path. Fix: AC-1, AC-2, AC-3, AC-7, the Goal, References, Implementation outline, and Out-of-scope rewritten to name Writer.write (the method whose envelope parameter tightens from dict[str, Any] to RedactedSlice) AND _seam_write_envelope (the seam whose parameter must tighten in lock-step). The "return Path" claim removed throughout (the method returns None; the seam returns bytes).
- B2 (BLOCK): Composition site is wrong (OutputSanitizer.scrub is per-probe, not envelope-level). Draft asserted the composition [field_name_regex_pass, json_value_tree_walk_pass, redact_secrets] is documented in OutputSanitizer.scrub. Master scrub(output: ProbeOutput, repo_root: Path) -> SanitizedProbeOutput (verified at src/codegenie/output/sanitizer.py:158) has two passes: (1) _walk_pass1_keys, secret-field-name rejection (raises SecretLikelyFieldNameError; NOT a regex replacement); (2) _scrub_container, absolute-path scrubbing (NOT a "JSONValue tree walk" depth-cap pass). The story's named passes do not exist. Worse, redact_secrets (S3-01 AC-1) takes dict[str, JSONValue], not a ProbeOutput, so it cannot literally compose inside scrub. The arch doc resolves this at line 768: "Merged envelope flows through OutputSanitizer.scrub → SecretRedactor.redact_secrets → writer." The redactor's natural composition site is the envelope-merge seam in cli.py, between Step 8 (_seam_shallow_merge) and Step 9 (_seam_validate_envelope), not inside per-probe scrub. Fix: AC-4, AC-5, AC-6, AC-7 and Implementation outline rewritten. Composition is a new module-level _PASSES: list[Callable] in a new src/codegenie/output/envelope_redactor.py module (or equivalent; see implementer note) that the new _seam_redact_envelope(envelope) -> tuple[RedactedSlice, int] step in cli.py consumes; the docstring documenting the composition lives at the seam-level module, not sanitizer.py. The pre-existing Phase-0 per-probe OutputSanitizer.scrub is unchanged (preserves Phase 0 contract-freeze). The composition order pinned: _redact_known_patterns_pass (S3-01's named-pattern regex sweep) → _redact_entropy_pass (S3-01's entropy fallback) → _build_redacted_slice_pass (the smart-constructor closure that returns RedactedSlice).
  Note: the original draft's claim that "Phase 0's field-name regex + JSONValue tree walk run before redact_secrets" is preserved as architectural context (per-probe scrub runs first; envelope-merge happens; then envelope-redactor runs), but it is no longer a _PASSES membership claim. Those Phase-0 passes are upstream, not co-located.
- B3 (BLOCK): event="envelope.written" does not exist on master. Draft asserted the new field rides on an existing writer-completion event. grep -rn '_log\.info\|logger\.info' src/codegenie/output/writer.py returns zero hits (verified: the Phase-0 Writer.write is silent; only _log.warning("writer.csafe.unavailable"|"writer.symlink.refused") exist). No envelope.written event is emitted by Phase 0. Fix: AC-11 reworded. This story introduces the writer-completion event event="envelope.written" as the carrier for secrets_redacted_count. AC-11 asserts (a) the event is emitted exactly once per Writer.write call; (b) the event name is the constant EVENT_ENVELOPE_WRITTEN: Final[str] = "envelope.written" from src/codegenie/logging.py (no string literal at the call site, same regression-resistance discipline as SECRETS_REDACTED_COUNT_FIELD); (c) the event is emitted on the success path after _atomic_write_bytes returns (so a failed write does not emit a misleading "written" signal). The event is single-event, single-field, so 02-ADR-0008's "no event stream" is honored (one new structured-log field on a documented success event, not an event-bus subscription).
- B4 (BLOCK): AC-1's reveal_type mechanism is invalid. Draft prescribed reveal_type(write_envelope.__annotations__["slice_"]) corresponds to RedactedSlice. reveal_type is a mypy directive emitted by the type-checker, not a runtime callable that "corresponds to" a value. Implementer note 148 acknowledges "Pick the runtime form for AC-1" but the AC text never updates. Fix: AC-1 rewritten. Runtime introspection via typing.get_type_hints(Writer.write)["envelope"] is RedactedSlice AND typing.get_type_hints(_seam_write_envelope)["envelope"] is RedactedSlice. Both must hold (the seam and the method are the two consumer surfaces; both narrow in lock-step). A regression that tightens only one of the two is caught.
- B5 (BLOCK): AC-13 contract-freeze claim unverifiable at validation time. Draft said "verify against the actual snapshot file at implementation time." A validator cannot tell whether tests/unit/test_probe_contract.py snapshots Writer.write or _seam_write_envelope without grep. Fix: AC-13 rewritten with the explicit verification recipe: result = subprocess.run([sys.executable, "-c", "import tests.unit.test_probe_contract as m; print(repr(getattr(m, '_WRITER_WRITE_SNAPSHOT', None) or getattr(m, '_PROBE_ABC_SNAPSHOT', None)))"], …). The test names the two candidate snapshot constants by programmatic enumeration: a regression that adds Writer.write to the frozen surface (so the signature tightening fails the snapshot) is caught here. The PR description must document the snapshot diff if any.
- B6 (BLOCK): redact_secrets's probe_name parameter has no envelope-level meaning. Draft assumed redact_secrets (S3-01 AC-1: redact_secrets(slice_: dict[str, JSONValue], probe_name: ProbeId)) composes inside OutputSanitizer.scrub where probe_name is in scope per-probe. At the envelope-merge layer the merged envelope has no single probe_name, because findings come from many probes. Fix: AC-12 + Implementation outline + Notes-for-implementer pin the envelope-level convention: the seam calls redact_secrets(envelope, ProbeId("__envelope__")) (a sentinel ProbeId value reserved for the envelope-merge pass). The SecretFinding.probe_name field carries "__envelope__" for any finding the per-probe scrub missed, visible to the CLI summary as "secrets matched at envelope merge". AC-12b added: assert that the per-probe pass also runs (when wired by Phase 2's S6-06/S6-07 scanners) so the dominant attribution path (per-probe) is preserved; envelope-level redaction is the safety net, not the first line of defense.
- F1 (harden): AC-2 mypy invocation under-specified. Draft said "subprocess mypy --strict" without pinning python -m mypy (canonical invocation in CI), the fixture path, or the exact error-substring contract. Mirror S1-11 AC-2 / S3-02 F1: both substrings must appear. Fix: AC-2 hardened: subprocess.run([sys.executable, "-m", "mypy", "--strict", str(fixture_path)], capture_output=True, text=True, cwd=<repo_root>) against tests/unit/output/_fixtures/raw_dict_to_writer.py (per implementer note 147). Assert result.returncode != 0. Assert result.stdout contains BOTH "incompatible type" AND 'expected "RedactedSlice"' (the and contract). AC-2b added: tests/unit/output/_fixtures/redacted_slice_to_writer.py (a clean snippet calling Writer.write(my_redacted_slice, [], output_dir)) must produce result.returncode == 0, the positive control proving the fixture-mypy harness is wired correctly.
- F2 (harden): AC-3 runtime-rejection mechanism ambiguous. Draft offered three candidate mechanisms (TypeError, AttributeError, isinstance); implementer note 149 picks isinstance but AC-3 didn't pin it. A regression that lets a raw dict reach Writer.write and coincidentally fails later (e.g., on slice_.findings_count) would satisfy a permissive AC-3 even though the writer never guarded. Fix: AC-3 hardened. Writer.write body, FIRST executable statement, is if not isinstance(envelope, RedactedSlice): raise TypeError(...); the TypeError message contains the substring RedactedSlice (case-sensitive) AND the substring 02-ADR-0010 (so the failure points the reader at the source-of-truth). Test asserts: pytest.raises(TypeError, match=r"RedactedSlice.*02-ADR-0010"). AC-3b added: programmatic check via inspect.getsource(Writer.write); the isinstance(envelope, RedactedSlice) guard appears in the source (a regression that drops the check is caught by source-level inspection, NOT by Python type-hints at runtime which are stripped). The same guard is duplicated in _seam_write_envelope (defense-in-depth: the seam is the public consumer; the method is the internal consumer; both reject).
- F3 (harden): AC-5 mock-spy mechanism requires explicit _PASSES indirection. Draft said wrap each pass with Mock(wraps=original). But if the seam-level composition inlines three function calls (no _PASSES indirection), Mock(wraps=original) cannot intercept, because module-attribute monkeypatching against function references does not redirect direct calls inside the same module. Fix: Implementation outline + AC-5 pin the structural requirement: _PASSES: tuple[SanitizerPass, ...] = (_redact_known_patterns_pass, _redact_entropy_pass, _build_redacted_slice_pass) is a module-level tuple in src/codegenie/output/envelope_redactor.py; the seam calls _redact_envelope(envelope) which iterates for pass_ in _PASSES. Mock-spy test monkeypatches envelope_redactor._PASSES = (Mock(wraps=_redact_known_patterns_pass), Mock(wraps=_redact_entropy_pass), Mock(wraps=_build_redacted_slice_pass)) and asserts the recorded call sequence in a shared record: list[str] (each spy appends its name before delegating). The record mechanism (recommended in original implementer note 146) is pinned in the AC.
- F4 (harden): AC-6 mutation test asserts the wrong thing. Draft text: "the test asserts the non-mutated order is still verified by the spy chain (i.e., the mutation test is the negative form of AC-5 — under the mutation, the assertion fails)." This is confusing, because it conflates the positive and negative cases. Fix: AC-6 rewritten with two-part structure: (a) test_reorder_mutation_changes_recorded_order: monkeypatch _PASSES = (_redact_entropy_pass, _redact_known_patterns_pass, _build_redacted_slice_pass) (entropy before known-patterns); call _redact_envelope; assert the recorded order is ["entropy", "known_patterns", "build"], NOT the canonical order. This proves the mechanism is sensitive. (b) test_canonical_order_under_no_mutation: assert with the unmodified _PASSES the recorded order is ["known_patterns", "entropy", "build"]. Both must pass.
- F5 (harden): AC-9/AC-10 fixture shape conflates unit vs integration. Draft said "gather a fixture repo with no secrets" / "gather a fixture repo with three seeded secrets", but the story is unit-level (writer + seam + log emission), not end-to-end gather. A full gather pipeline as a unit test is slow and brittle. Fix: AC-9, AC-10 rewritten to construct a RedactedSlice directly via redact_secrets({}, ProbeId("__envelope__")) (zero-secret case) or via a seeded fixture dict (three-secret case), pass to Writer.write via a tmp_path output_dir, capture logs with structlog.testing.capture_logs(), and assert the event + field. The end-to-end-gather form is owned by S6-07 / S7-04. Implementer note 153 added pinning the unit-vs-integration split.
- F6 (harden): AC-12 dataflow assertion missing source-level mechanism. Draft said "no SecretFinding data appears in any persisted artifact" without pinning how the test verifies. Fix: AC-12 hardened. Read output_dir/repo-context.yaml after Writer.write; assert via substring search that NONE of the SecretFinding field names ("pattern_class", "cleartext_len") appear anywhere in the YAML bytes. Assert via substring search that NONE of the canonical seeded plaintexts (e.g., "AKIAIOSFODNN7EXAMPLE") appear. Assert "<REDACTED:fingerprint=" DOES appear (positive control: the redactor ran). A regression that threads the findings list into the envelope is caught by the first negative; a regression that disables the redactor is caught by the third positive.
- F7 (harden): EVENT_ENVELOPE_WRITTEN / SECRETS_REDACTED_COUNT_FIELD import-by-name unenforced. AC-8 said "no string literal at the call site" but pinned no mechanism. Same shape as S3-02 AC-14 (regex-based source check). Fix: AC-8 hardened with a programmatic source check: inspect.getsource(Writer.write) does NOT contain the literal "secrets_redacted_count" (string-literal regex: re.compile(r'["\']secrets_redacted_count["\']')). Same for "envelope.written" at the call site. The constants are imported and referenced by name. AC-8b added: SECRETS_REDACTED_COUNT_FIELD in codegenie.logging.__all__ AND EVENT_ENVELOPE_WRITTEN in codegenie.logging.__all__. A regression that forgets to export is caught.
- F8 (harden): Cross-story integration with S3-01 + S3-02 unasserted. S3-02's analogous F8 closure mandates a parametrized integration test feeding redact_secrets slices of {0, 1, 3-distinct, same-fingerprint-twice} secrets. The S3-03 layer's natural extension: each of those cases, fed end-to-end through _seam_redact_envelope + _seam_write_envelope, produces the expected secrets_redacted_count=N log emission AND a YAML file whose contents satisfy the F6 substring contract. Fix: AC-15 added, parametrized over the four shapes; for each, assert (a) the log event carries the expected count; (b) the YAML satisfies the F6 substring contract; (c) the RedactedSlice round-trips through model_validate(model_dump()) post-write (the writer must not mutate the slice). Pins the structural-defense ladder's end-to-end witness.
- F9 (harden): Module-docstring assertion technique unspecified. AC-4 said the docstring documents composition order but pinned no test mechanism. Same F11 closed in S3-01 / S3-02. Fix: AC-4 strengthened. Programmatic check via inspect.getdoc(codegenie.output.envelope_redactor) substring-matches all four substrings: "02-ADR-0005", "02-ADR-0010", "02-ADR-0008", AND "Three-pass composition" (or equivalent ladder framing). A regression that drops any of the four is caught.
- F10 (harden): Per-probe scrub-still-runs invariant unasserted. B6's fix names the per-probe + envelope two-layer defense, but no AC enforces that S3-01's per-probe attribution path is preserved. Fix: AC-16 added. Given a fixture probe output whose schema_slice already contains a <REDACTED:fingerprint=…> placeholder (per-probe scrub ran upstream), the envelope-level _redact_envelope is idempotent: re-scanning a placeholder string does NOT produce a new finding (the placeholder is not itself a high-entropy match; verify against the entropy threshold). A regression that double-counts placeholders is caught.
- DP1 (Note): _PASSES registry crosses the rule-of-three threshold but stays a tuple. This story is the third known-pass composition in the codebase (Phase 0 per-probe scrub has two passes; S3-03 lands three envelope-level passes). Production ADR-0033 §3 + the design-patterns toolkit's Registry/Plugin pattern names rule-of-three for promotion. Not promoted to AC: the third pass is the closure (_build_redacted_slice_pass finalizes the RedactedSlice), not a content-redaction pass. The fourth content-redaction pass (Phase-4 RAG-scrubber or a future per-task-class redactor) would cross the threshold for promotion to a @register_sanitizer_pass decorator. Fix: Notes-for-implementer §"Design patterns" added. _PASSES stays a literal tuple in Phase 2; Phase 4+ promotes to a decorator registry when the fourth content pass arrives. Cite the open/closed prescription in the note prose (a registry is closed for modification, open for extension; today's literal tuple is closed for both, which is fine while N=3).
- DP2 (Note): SanitizerPass Protocol fits today. The three passes share the shape Callable[[dict[str, JSONValue]], dict[str, JSONValue] | RedactedSlice]. A Protocol-typed alias makes the seam testable and the contract self-documenting. Fix: Notes-for-implementer adds class SanitizerPass(Protocol): def __call__(self, slice_: dict[str, JSONValue]) -> dict[str, JSONValue]: ... as the recommended type for _PASSES members (with the closure pass typed as a sibling Protocol since its return type differs). The Protocol surface keeps the registry promotion (DP1) trivially compatible later.
- DP3 (Note): Fingerprint newtype rule-of-three threshold REACHED at S3-03 (third consumer). S3-01 (Validation note #11) deferred; S3-02 (Validation note #12) deferred; S3-03 is the third consumer (Writer.write reads envelope.fingerprints to embed in the persisted shape per 02-ADR-0010 Tradeoffs row 2). Production ADR-0033 §3 names primitive obsession on cross-module identifiers as a review-blocker. Decision: elevate to a Phase-3-entry cross-cutting story note rather than this story's AC. Three of the four eventual consumers are inside Phase 2 (sanitizer.py, redacted_slice.py, Writer.write); the fourth (CLI summary at S8-02) is the natural concurrent landing site. Fix: Notes-for-implementer §"Design patterns" pins this. S3-03 does not introduce Fingerprint; S8-02 (or a sibling cross-cutting story) lands it concurrently with the CLI consumer. The opportunity is logged in _validation/S3-03.md for follow-up tracking; the deferral is principled (rule-of-three threshold is reached but the fourth consumer is one story away).
- DP4 (Note): Pure-impure split holds. The composition module (envelope_redactor.py) is pure: no I/O, no logging, no filesystem reads. The seam (_seam_redact_envelope in cli.py) is impure (it's a seam; that's its job). The Writer (Writer.write) is impure (it persists). The log emission (secrets_redacted_count) is impure-shell. The functional core / imperative shell discipline is preserved. Fix: Notes-for-implementer §"Pure module" enforces that envelope_redactor.py may not import logging, structlog, os.environ, subprocess, or time, the same constraint S3-02 placed on redacted_slice.py. A regression that adds I/O to the redactor is a review-blocker.
- DP5 (Note): Make-illegal-states-unrepresentable + smart-constructor ladder, closed at this story. S3-01 makes the runtime defense (replace cleartext). S3-02 makes the type-level defense (RedactedSlice smart constructor). S3-03 makes the chokepoint defense (writer + seam both narrow to RedactedSlice; an off-path consumer that constructs raw envelope dict[str, Any] cannot reach disk via the chokepoint). The third structural-defense rung lands here; S7-04 (Gap-5 inspect-based boundary test) closes the source-level rung after Phase 2's probes are all in. Document the four-rung ladder in the module docstring + PR description.
Stage 3 research skipped: no NEEDS RESEARCH findings. Every gap was answerable from arch (phase-arch-design.md §"Gap 4" + §"Logging strategy") + ADRs (02-ADR-0005, 02-ADR-0008, 02-ADR-0010) + verified live source (src/codegenie/output/writer.py:142 for Writer.write; src/codegenie/output/sanitizer.py:158 for scrub; src/codegenie/cli.py:344 for _seam_write_envelope; src/codegenie/cli.py:308 for _seam_shallow_merge) + S3-01 / S3-02 sibling validation precedent.
Coverage critic: HARDEN (ten findings, F1–F10, closed; the unit-vs-integration split, log-event introduction, cross-story integration, and idempotence over placeholders were all gaps). Test-quality critic: HARDEN (mutation table shows four plausibly-wrong implementations would have slipped past the original TDD plan: a Writer.write that accepts both dict and RedactedSlice (no isinstance guard, no mypy fixture), a _PASSES-less inline composition (mock-spy unable to fire), an envelope that threads SecretFinding data into the YAML (no F6 substring negative), and a log event without a single source-of-truth constant (string-literal drift); all closed below). Consistency critic: SIX BLOCK findings (B1–B6), zero unresolvable ADR conflicts (the arch doc line 768, "Merged envelope flows through OutputSanitizer.scrub → SecretRedactor.redact_secrets → writer", was the source-of-truth that resolved B2's "where does composition live" question). Design-pattern critic: five nits surfaced as Notes-for-implementer (_PASSES registry promotion deferred until N=4; SanitizerPass Protocol recommended; Fingerprint newtype rule-of-three reached but deferred to S8-02 concurrent landing; pure-module discipline pinned; four-rung structural-defense ladder documented). Seventeen ACs original → twenty-three ACs after hardening (AC-2b, AC-3b, AC-8b, AC-12b, AC-15, AC-16 added; AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7, AC-8, AC-9, AC-10, AC-11, AC-12, AC-13 reworded).
Ready for phase-story-executor.
Context
S3-01 lands redact_secrets(slice_: dict[str, JSONValue], probe_name: ProbeId) -> tuple[RedactedSlice, list[SecretFinding]] in src/codegenie/output/sanitizer.py. S3-02 lands the RedactedSlice Pydantic model with frozen=True, extra="forbid", fingerprint format validators, and the model_construct ban. Both prior stories are inert until the writer accepts RedactedSlice — without the signature tightening at the chokepoint, the runtime defense (02-ADR-0005) holds but the type-level defense (02-ADR-0010) does not. This story is the closing edge:
- The Writer.write method's envelope parameter narrows from dict[str, Any] to RedactedSlice (the public consumer surface change).
- The CLI seam _seam_write_envelope narrows in lock-step (the call site that consumes Writer.write).
- A new envelope-level composition step lands (_seam_redact_envelope in cli.py, or equivalent module; see implementer note) between Step 8 (_seam_shallow_merge) and Step 9 (_seam_validate_envelope). The composition is the canonical three-pass sequence: known-pattern regex sweep → entropy fallback → RedactedSlice closure. The pass list lives in a module-level _PASSES tuple so the mock-spy test can monkeypatch and the executor's Validator pass can verify the order via call records.
- src/codegenie/logging.py gains two module-level constants, SECRETS_REDACTED_COUNT_FIELD: Final[str] = "secrets_redacted_count" and EVENT_ENVELOPE_WRITTEN: Final[str] = "envelope.written", and Writer.write (on success only) emits exactly one structured-log event whose name is the second constant and whose secrets_redacted_count field carries envelope.findings_count. A zero-count run emits secrets_redacted_count=0 explicitly, which is grep-able for the auditor who needs "did this run find any secrets?".
The composition order matters for a non-obvious reason. The Phase 0 per-probe OutputSanitizer.scrub runs first (secret-field-name rejection + absolute-path scrubbing); the coordinator then merges per-probe schema_slice dicts into a single envelope; the envelope-level _redact_envelope runs as the safety net for any cleartext that survived per-probe scrubbing (e.g., a high-entropy string buried inside a list-of-dict that the per-probe pass missed because no field-name matched the secret-name regex). The per-probe pass attributes findings to a specific probe_name; the envelope-level pass uses the sentinel ProbeId("__envelope__") because the merged envelope has no single probe. The CLI summary distinguishes the two surfaces — per-probe findings show the probe name; envelope-level findings show "envelope" (interpreted: "the redactor caught it at the merge layer; consider tightening the per-probe pattern set"). The two layers compose; the envelope-level pass is the load-bearing chokepoint that 02-ADR-0010's type-level guarantee binds.
The mock-spy test (test_envelope_redactor_composition.py) constructs an _redact_envelope(envelope) invocation where each of the three composed passes (held in _PASSES) is wrapped with a Mock(wraps=original) spy that appends its name to a shared record: list[str] before delegating. The test asserts the recorded sequence is ["known_patterns", "entropy", "build"]. A reorder regression (e.g., a contributor moving _build_redacted_slice_pass to the front for "consistency") flips the recorded order and fails the build.
The writer signature tightening is a contract-surface narrowing. The previous Phase-0 signature accepted dict[str, Any]. The new signature accepts only RedactedSlice. This is a one-way narrowing: _seam_write_envelope is the only call site, and it narrows in lock-step; mypy --strict catches any other call site that passes a raw dict. The mypy --strict test (test_writer_signature.py) is a fixture-file pair: _fixtures/raw_dict_to_writer.py (a snippet that calls Writer().write({}, [], output_dir)) must fail mypy with error: ... incompatible type ... expected "RedactedSlice"; _fixtures/redacted_slice_to_writer.py (a snippet that calls Writer().write(my_redacted_slice, [], output_dir)) must pass mypy clean. Both fixtures run in CI as python -m mypy --strict invocations and assert exit codes + error substrings.
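A minimal harness for the fixture pair might be shaped like this. The helper names (mypy_command, run_mypy_strict, rejects_raw_dict, accepts_redacted_slice) are hypothetical; only the command line and the two required substrings come from the story text. Actually running it requires mypy to be installed in the interpreter's environment.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def mypy_command(fixture: Path) -> list[str]:
    """Build the canonical CI invocation: python -m mypy --strict <fixture>."""
    return [sys.executable, "-m", "mypy", "--strict", str(fixture)]


def run_mypy_strict(fixture: Path, repo_root: Path) -> subprocess.CompletedProcess[str]:
    """Run mypy on one fixture file from the repository root."""
    return subprocess.run(
        mypy_command(fixture), capture_output=True, text=True, cwd=repo_root
    )


def rejects_raw_dict(result: subprocess.CompletedProcess[str]) -> bool:
    """Negative fixture: non-zero exit AND both error substrings present."""
    return (
        result.returncode != 0
        and "incompatible type" in result.stdout
        and 'expected "RedactedSlice"' in result.stdout
    )


def accepts_redacted_slice(result: subprocess.CompletedProcess[str]) -> bool:
    """Positive control: the clean fixture type-checks with exit code 0."""
    return result.returncode == 0
```

Keeping the positive control next to the negative check guards against a harness that fails for unrelated reasons, such as a missing mypy install, and would otherwise make the negative check pass spuriously.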
The secrets_redacted_count log field is the audit grep-ability invariant. Phase 2 final-design Open Q 3 is closed by 02-ADR-0008 ("no event stream") — Phase 2 adds one structured-log event with one new field at one call site, not an event-bus subscription. The field's value is envelope.findings_count from the RedactedSlice produced by _seam_redact_envelope, captured at the writer chokepoint. A 0-count run emits secrets_redacted_count=0 — grep-friendly for the auditor. The CLI summary line (count + file:line list) is touched in S8-02; this story emits the structured-log field at the writer; the CLI summary path consumes it.
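The emit-on-success rule can be sketched without any logging dependency. In the example below the emit and persist callables are hypothetical stand-ins (for the structlog logger and for _atomic_write_bytes respectively), and RedactedSlice is again a simplified dataclass rather than the real model; only the two constants and the findings_count field name come from the story.

```python
from dataclasses import dataclass
from typing import Any, Callable, Final

SECRETS_REDACTED_COUNT_FIELD: Final[str] = "secrets_redacted_count"
EVENT_ENVELOPE_WRITTEN: Final[str] = "envelope.written"


@dataclass(frozen=True)
class RedactedSlice:
    """Hypothetical stand-in for the S3-02 Pydantic model."""
    slice: dict[str, Any]
    findings_count: int


def write(
    envelope: RedactedSlice,
    persist: Callable[[bytes], None],  # stand-in for _atomic_write_bytes
    emit: Callable[..., None],         # stand-in for the structured logger
) -> None:
    # Persist first; if this raises, no "written" event is emitted.
    persist(repr(envelope.slice).encode())
    # Success path: exactly one event, field name taken from the constant.
    emit(EVENT_ENVELOPE_WRITTEN, **{SECRETS_REDACTED_COUNT_FIELD: envelope.findings_count})
```

Passing the field through a keyword-argument dict keyed by the constant keeps both string literals out of the call site, and a zero findings_count still produces an explicit secrets_redacted_count=0 rather than an omitted field.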
References: where to look
- Architecture:
  - ../phase-arch-design.md §"Sequence — secret-redaction flow" (line ~420): the composition order at the envelope-merge layer.
  - ../phase-arch-design.md §"Component design" #4 SecretRedactor: the writer-chokepoint discipline, the in-memory findings list policy.
  - ../phase-arch-design.md §"Harness engineering" → "Logging strategy" (line ~783): Phase 2 adds one log field at the writer: secrets_redacted_count (int), so a 0-count run is grep-able. Phase 0 codegenie/logging.py is otherwise unchanged.
  - ../phase-arch-design.md §"Gap analysis & improvements" Gap 4: the writer signature tightening from dict to RedactedSlice; type-level "redactor was called".
  - ../phase-arch-design.md line 768: "Merged envelope flows through OutputSanitizer.scrub → SecretRedactor.redact_secrets → writer." This pins the envelope-level composition site (after per-probe scrub + coordinator merge, before validate + write).
- Phase 2 ADRs:
  - ../ADRs/0010-redacted-slice-smart-constructor-at-writer-boundary.md: Consequences section names "the writer signature change is a contract surface shift requiring a coordinated edit across all callers (one — the sanitizer pipeline)"; on master, the actual single caller is _seam_write_envelope.
  - ../ADRs/0005-secret-findings-no-plaintext-persistence.md: Consequences section names the composition: per-probe OutputSanitizer.scrub (Phase 0) → envelope-merge (Phase 0) → redact_secrets (Phase 2 envelope-level chokepoint) → writer.
  - ../ADRs/0008-no-event-stream-in-phase-2.md: the structured-log-field-only rationale; secrets_redacted_count is the one field on the one new event this story adds.
- Source design:
  - ../final-design.md §"Anti-patterns avoided" #5: model_construct bypass (verified by S3-02; this story does not regress it).
- Existing code on master (verified via grep at validation time):
  - src/codegenie/output/writer.py:142: Phase 0 class Writer with Writer.write(envelope: dict[str, Any], raw_artifacts: list[tuple[str, bytes]], output_dir: Path) -> None. NOT a module-level write_envelope function returning Path. This story tightens the envelope parameter type to RedactedSlice.
  - src/codegenie/output/sanitizer.py:158: Phase 0 OutputSanitizer.scrub(output: ProbeOutput, repo_root: Path) -> SanitizedProbeOutput. Per-probe two-pass: secret-field-name rejection (raises SecretLikelyFieldNameError) + absolute-path scrubbing. This story does NOT edit scrub; the envelope-level redaction is a new module/seam, not a composition into per-probe scrub.
  - src/codegenie/cli.py:344: Phase 0 _seam_write_envelope(envelope: dict[str, Any], raw_artifacts, output_dir) -> bytes. The seam that calls Writer().write(envelope, raw_artifacts, output_dir) and returns the YAML bytes (for the audit anchor). This story tightens its envelope parameter to RedactedSlice in lock-step.
  - src/codegenie/cli.py:308: Phase 0 _seam_shallow_merge(envelope, outputs) -> dict[str, Any]. Step 8 of the 11-step pipeline. The new _seam_redact_envelope runs between this and _seam_validate_envelope (Step 9).
  - src/codegenie/cli.py:572: the call site that consumes _seam_write_envelope (yaml_bytes = _seam_write_envelope(envelope, raw_artifacts, output_dir)). This story inserts a _seam_redact_envelope call here that produces the RedactedSlice consumed by both the validate seam (which validates the inner .slice) and the write seam.
  - src/codegenie/logging.py: Phase 0 structlog factory; this story adds two field-name constants (SECRETS_REDACTED_COUNT_FIELD, EVENT_ENVELOPE_WRITTEN) and exports both via __all__.
- Phase 0 contract-freeze:
  - tests/unit/test_probe_contract.py snapshots Probe ABC, OutputSanitizer.scrub signature, run_allowlisted signature. The scrub signature is unchanged in this story. The Writer.write signature is not part of Phase 0's frozen surface per the existing snapshot file (verify at implementation time and document the diff in the PR if untrue).
- Phase 1 shape calibration:
  - docs/phases/01-context-gather-layer-a-node/stories/S1-02-safe-json-parser.md §"AC-13/14": structured-event emission via structlog.testing.capture_logs(); the same pattern applies to AC-9–AC-11 below.
Goal
Tighten the writer's public signature, finalize and document the envelope-level redactor composition order, and emit the secrets_redacted_count log field at the writer chokepoint:
- src/codegenie/output/writer.py::Writer.write: envelope parameter narrows from dict[str, Any] to RedactedSlice (the type system rejects raw dict at the writer's public surface; mypy --strict catches the violation; a runtime isinstance guard rejects with TypeError).
- src/codegenie/cli.py::_seam_write_envelope: envelope parameter narrows from dict[str, Any] to RedactedSlice in lock-step (the seam is the only call site of Writer.write).
- A new src/codegenie/output/envelope_redactor.py module hosts _PASSES: tuple[SanitizerPass, ...] (three module-level passes) and _redact_envelope(envelope: dict[str, JSONValue]) -> RedactedSlice. The module docstring documents the composition order: known-pattern regex sweep → entropy fallback → RedactedSlice closure. A new _seam_redact_envelope(envelope) -> tuple[RedactedSlice, int] in cli.py consumes it and runs between Step 8 (_seam_shallow_merge) and Step 9 (_seam_validate_envelope).
- src/codegenie/logging.py declares two module-level constants, SECRETS_REDACTED_COUNT_FIELD: Final[str] = "secrets_redacted_count" AND EVENT_ENVELOPE_WRITTEN: Final[str] = "envelope.written", exported via __all__.
- Writer.write (on success only, after _atomic_write_bytes returns) emits exactly one structured-log event whose name is EVENT_ENVELOPE_WRITTEN and whose SECRETS_REDACTED_COUNT_FIELD value is envelope.findings_count. A zero-count run emits the field explicitly (not omitted).
- A mypy --strict fixture pair test (tests/unit/output/test_writer_signature.py) asserts: (a) _fixtures/raw_dict_to_writer.py (a snippet calling Writer().write({}, [], output_dir)) fails mypy with incompatible type AND expected "RedactedSlice"; (b) _fixtures/redacted_slice_to_writer.py (a clean snippet) passes mypy clean. CI runs both as python -m mypy --strict.
Acceptance criteria¶
Writer signature tightening:
- [ ] AC-1 — src/codegenie/output/writer.py::Writer.write signature is def write(self, envelope: RedactedSlice, raw_artifacts: list[tuple[str, bytes]], output_dir: Path) -> None (other parameters and the None return type preserved from Phase 0). A test asserts typing.get_type_hints(Writer.write)["envelope"] is RedactedSlice AND typing.get_type_hints(_seam_write_envelope)["envelope"] is RedactedSlice (the seam and the method narrow in lock-step). A regression that tightens only one of the two is caught.
- [ ] AC-2 — tests/unit/output/test_writer_signature.py::test_writer_refuses_raw_dict_at_typecheck — runs subprocess.run([sys.executable, "-m", "mypy", "--strict", str(fixture)], capture_output=True, text=True, cwd=<repo_root>) against tests/unit/output/_fixtures/raw_dict_to_writer.py (which contains from codegenie.output.writer import Writer; Writer().write({}, [], Path("/tmp"))). Asserts (a) result.returncode != 0; (b) result.stdout contains BOTH literal substrings "incompatible type" AND 'expected "RedactedSlice"' (the and contract — both must appear, mirroring S1-11 AC-2's hardening shape).
- [ ] AC-2b — Positive control — tests/unit/output/test_writer_signature.py::test_writer_accepts_redacted_slice_at_typecheck — same invocation against tests/unit/output/_fixtures/redacted_slice_to_writer.py (a snippet calling Writer().write(my_redacted_slice, [], Path("/tmp")) where my_redacted_slice comes from redact_secrets). Asserts result.returncode == 0 AND result.stdout is empty (or contains only mypy's "Success" banner). This proves the fixture-mypy harness is wired correctly; without it, AC-2 could pass spuriously on a broken mypy invocation.
- [ ] AC-3 — Runtime: Writer.write's first executable statement is if not isinstance(envelope, RedactedSlice): raise TypeError(...). The TypeError message contains the substring "RedactedSlice" (case-sensitive) AND the substring "02-ADR-0010". Test: with pytest.raises(TypeError, match=r"RedactedSlice.*02-ADR-0010"): Writer().write({}, [], tmp_path). The runtime-rejection layer complements the type-check-time rejection.
- [ ] AC-3b — Source-level verification — inspect.getsource(Writer.write) contains a match for the regex r"isinstance\s*\(\s*envelope\s*,\s*RedactedSlice\s*\)". A regression that drops the guard (relying only on Python's stripped-at-runtime type hints) is caught at source-level inspection. The same source check is run against _seam_write_envelope (defense-in-depth: both surfaces guard).
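The runtime half of AC-1 and AC-3 can be sketched as follows. This is a minimal illustration, not the repository's code: the RedactedSlice dataclass is a hypothetical stand-in for the S3-02 Pydantic model, and the body of Writer.write after the guard is omitted.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, get_type_hints


@dataclass(frozen=True)
class RedactedSlice:
    """Hypothetical stand-in for the S3-02 model (the real one is Pydantic)."""

    slice: dict[str, Any] = field(default_factory=dict)
    findings_count: int = 0


class Writer:
    def write(
        self,
        envelope: RedactedSlice,
        raw_artifacts: list[tuple[str, bytes]],
        output_dir: Path,
    ) -> None:
        # Guard first: annotations are not enforced at runtime.
        if not isinstance(envelope, RedactedSlice):
            raise TypeError(
                "Writer.write requires RedactedSlice (02-ADR-0010); "
                f"got {type(envelope).__name__}"
            )
        # Serialization and the atomic write are omitted from this sketch.


def rejection_message() -> str:
    """Return the TypeError message produced when a raw dict is passed."""
    try:
        Writer().write({}, [], Path("."))  # type: ignore[arg-type]
    except TypeError as exc:
        return str(exc)
    return ""
```

The message carries both "RedactedSlice" and "02-ADR-0010" in that order, so the regex in AC-3 matches it, and typing.get_type_hints sees the narrowed annotation that AC-1 asserts on.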
Envelope-level redactor composition order:
- [ ] AC-4 — src/codegenie/output/envelope_redactor.py module docstring documents the composition order: "Three-pass composition (envelope-level chokepoint, 02-ADR-0010): (1) _redact_known_patterns_pass — regex sweep across the merged envelope (S3-01 named patterns); (2) _redact_entropy_pass — Shannon-entropy fallback for novel credential shapes (S3-01 len ≥ 32, ≥ 4.5 bits/char); (3) _build_redacted_slice_pass — the smart-constructor closure that returns RedactedSlice (02-ADR-0010). The order is load-bearing: known patterns first (cheap regex hits exit early), entropy second (expensive Shannon-entropy walk only on survivors), closure last (immutable model construction). Reordering would not change semantic output but would lose the cheap-first invariant. Per-probe OutputSanitizer.scrub (Phase 0, 02-ADR-0005) is upstream of this module — not a co-located pass. See 02-ADR-0008 for the no-event-stream framing of secrets_redacted_count. Verified by test_envelope_redactor_composition.py." A test asserts inspect.getdoc(codegenie.output.envelope_redactor) substring-matches all four references: "02-ADR-0005", "02-ADR-0010", "02-ADR-0008", and "Three-pass composition". A regression that drops any of the four substrings fails.
- [ ] AC-5 — tests/unit/output/test_envelope_redactor_composition.py::test_redact_envelope_invokes_passes_in_order — monkeypatches envelope_redactor._PASSES with a tuple of Mock(wraps=pass_) spies; each spy first appends its name to a shared record: list[str] (e.g., record.append("known_patterns")) and then delegates to the wrapped original. Calls _redact_envelope(fixture_envelope). Asserts record == ["known_patterns", "entropy", "build"]. Each spy's call_count == 1.
- [ ] AC-6 — Mutation sensitivity — tests/unit/output/test_envelope_redactor_composition.py adds two paired tests:
  - test_reorder_mutation_changes_recorded_order — monkeypatch _PASSES = (_redact_entropy_pass, _redact_known_patterns_pass, _build_redacted_slice_pass) (entropy before known-patterns). Call _redact_envelope. Assert record == ["entropy", "known_patterns", "build"] (NOT the canonical order). This proves the recording mechanism is order-sensitive.
  - test_canonical_order_under_no_mutation — with the live _PASSES, assert record == ["known_patterns", "entropy", "build"].
  - Both pass together; a regression that disables the recording mechanism (or hard-codes the assertion) is caught by the first.
- [ ] AC-7 — _redact_envelope return type annotation is RedactedSlice; the seam's call site (_seam_redact_envelope) propagates the RedactedSlice to both _seam_validate_envelope (which validates RedactedSlice.slice — the inner dict) and _seam_write_envelope (which accepts RedactedSlice per AC-1). Test asserts typing.get_type_hints(envelope_redactor._redact_envelope)["return"] is RedactedSlice. No tuple at this layer (the list[SecretFinding] is owned by S3-01's redact_secrets; the envelope-level closure uses envelope.findings_count for downstream consumers).
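The recording-spy mechanism behind AC-5 and AC-6 can be sketched in isolation. The three pass functions and run_passes below are hypothetical stand-ins (the real passes live in envelope_redactor.py and the real entry point is _redact_envelope); the sketch assumes the entry point iterates the pass tuple in order. Returning DEFAULT from a side_effect lets a Mock fall through to its wraps target, which gives the "record first, then delegate" behaviour.

```python
from typing import Any, Callable
from unittest.mock import DEFAULT, Mock

Pass = Callable[[Any], Any]


# Hypothetical stand-ins for the three envelope-level passes.
def known_patterns(slice_: dict[str, Any]) -> dict[str, Any]:
    return {**slice_, "known_patterns": True}


def entropy(slice_: dict[str, Any]) -> dict[str, Any]:
    return {**slice_, "entropy": True}


def build(slice_: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    return ("RedactedSlice stand-in", slice_)


PASSES: tuple[Pass, ...] = (known_patterns, entropy, build)


def run_passes(passes: tuple[Pass, ...], envelope: dict[str, Any]) -> Any:
    """Assumed shape of the entry point: thread the value through each pass."""
    value: Any = envelope
    for pass_ in passes:
        value = pass_(value)
    return value


def make_spy(original: Pass, record: list[str]) -> Mock:
    def note_call(*args: Any, **kwargs: Any) -> Any:
        record.append(original.__name__)
        return DEFAULT  # fall through to the wrapped original

    return Mock(wraps=original, side_effect=note_call)


def recorded_order(passes: tuple[Pass, ...]) -> list[str]:
    record: list[str] = []
    spies = [make_spy(p, record) for p in passes]
    run_passes(tuple(spies), {})
    assert all(spy.call_count == 1 for spy in spies)
    return record
```

Calling recorded_order with the canonical tuple and again with the first two passes swapped yields two different lists, which is the order-sensitivity property AC-6 is meant to pin.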
SECRETS_REDACTED_COUNT_FIELD + EVENT_ENVELOPE_WRITTEN log surface:
- [ ] AC-8 — src/codegenie/logging.py exports SECRETS_REDACTED_COUNT_FIELD: Final[str] = "secrets_redacted_count" AND EVENT_ENVELOPE_WRITTEN: Final[str] = "envelope.written" as module-level constants. Writer.write imports both by name; no string literals at the call site — a programmatic check via inspect.getsource(Writer.write) asserts neither of the regexes re.compile(r'["\']secrets_redacted_count["\']') nor re.compile(r'["\']envelope\.written["\']') has a hit (the constants are used by name only). A typo in either name is caught at import time.
- [ ] AC-8b — Both constants appear in codegenie.logging.__all__. Test: "SECRETS_REDACTED_COUNT_FIELD" in codegenie.logging.__all__ AND "EVENT_ENVELOPE_WRITTEN" in codegenie.logging.__all__. A regression that forgets to export is caught at the test boundary, not at first downstream consumer.
- [ ] AC-9 — tests/unit/output/test_writer_logs_secrets_redacted_count.py::test_count_field_emitted_on_zero_count — constructs a RedactedSlice with findings_count=0, fingerprints=[] via redact_secrets({}, ProbeId("__envelope__")). Calls Writer().write(empty_redacted, [], tmp_path / "ctx") with structlog.testing.capture_logs() active. Asserts exactly one captured event has event == "envelope.written" AND its fields contain secrets_redacted_count == 0. A 0-count run is not silent.
- [ ] AC-10 — tests/unit/output/test_writer_logs_secrets_redacted_count.py::test_count_field_emitted_on_nonzero_count — constructs an envelope fixture containing three seeded secrets (e.g., two distinct AWS keys + one entropy hit at distinct leaves of the merged envelope), runs it through _redact_envelope, passes the resulting RedactedSlice (with findings_count == 3) to Writer.write under capture_logs(). Asserts the same event records secrets_redacted_count == 3.
- [ ] AC-11 — Event uniqueness + success-path-only emission — the captured events list filtered by event == "envelope.written" has exactly one entry per Writer.write call. A second test (test_no_event_on_write_failure) injects a write failure (e.g., output_dir = tmp_path / "no_such_dir" / "nested" with parents=False — or a monkeypatched _atomic_write_bytes that raises OSError); asserts zero envelope.written events are captured (the event is emitted only after _atomic_write_bytes returns; a failure path is silent on envelope.written).
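The emission rule in AC-9 and AC-11 (one event, only after the write returns, zero reported explicitly) can be sketched without structlog. Everything here is an illustrative assumption: write_then_log stands in for the tail of Writer.write, the injected atomic_write callable stands in for _atomic_write_bytes, and a plain list of dicts stands in for what structlog.testing.capture_logs() would collect.

```python
from pathlib import Path
from typing import Any, Callable

# Constant names mirror the story; the values come from AC-8.
SECRETS_REDACTED_COUNT_FIELD = "secrets_redacted_count"
EVENT_ENVELOPE_WRITTEN = "envelope.written"

Event = dict[str, Any]


def write_then_log(
    body: bytes,
    target: Path,
    findings_count: int,
    atomic_write: Callable[[Path, bytes], None],
    events: list[Event],
) -> None:
    # If the write raises, control never reaches the emission below.
    atomic_write(target, body)
    # The count is always included, even when it is zero.
    events.append(
        {"event": EVENT_ENVELOPE_WRITTEN, SECRETS_REDACTED_COUNT_FIELD: findings_count}
    )


def in_memory_write(store: dict[Path, bytes]) -> Callable[[Path, bytes], None]:
    """Build a write stand-in that records bytes instead of touching disk."""

    def _write(path: Path, data: bytes) -> None:
        store[path] = data

    return _write


def failing_write(path: Path, data: bytes) -> None:
    raise OSError("simulated write failure")
```

Because the emission is the last statement, a failure test only needs to inject a raising writer and check that the captured list stays empty.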
Sanitizer → writer → log dataflow:
- [ ] AC-12 — End-to-end (within-unit) dataflow assertion. Construct a fixture envelope dict containing three seeded plaintext secrets (e.g., {"probes": {"p1": {"value": "AKIAIOSFODNN7EXAMPLE"}}, ...} with two distinct AWS keys + one high-entropy 40-char base64 string). Run through _seam_redact_envelope then _seam_write_envelope. Then read tmp_path / "ctx" / "repo-context.yaml" as bytes. Assert via substring search:
  - Negative (no plaintext): NONE of "AKIAIOSFODNN7EXAMPLE", the second AWS key literal, the entropy plaintext literal appear in the YAML bytes.
  - Negative (no SecretFinding fields): NONE of "pattern_class", "cleartext_len" (S3-01 SecretFinding field names) appear in the YAML bytes.
  - Positive (redactor ran): b"<REDACTED:fingerprint=" DOES appear in the YAML bytes. A regression that disables the redactor is caught by the third assertion; a regression that threads SecretFinding data into the envelope is caught by the second.
- [ ] AC-12b — Per-probe attribution preserved — a fixture where the per-probe OutputSanitizer.scrub upstream already redacted one secret (substituted with <REDACTED:fingerprint=…> via the S6-06/S6-07 scanners that will land later — for this story, a hand-built fixture matching the post-scrub shape) AND the envelope-level pass catches one additional novel-shape secret. Assert the resulting RedactedSlice.findings_count == 1 (only the envelope-level finding; the per-probe placeholder is idempotent under the envelope-level pass — see AC-16). The story's chokepoint is the envelope-level pass; per-probe attribution is the upstream pass.
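The byte-level assertions in AC-12 can be illustrated with a toy redactor. This is a sketch under loud assumptions: the regex covers only AWS-style access key IDs, the fingerprint is taken as the first 8 hex characters of a SHA-256 digest (the story does not say how S3-01 derives it), and JSON stands in for the YAML serialization so the example stays standard-library only.

```python
import hashlib
import json
import re
from typing import Any

# Assumption: a single known pattern, AWS-style access key IDs.
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")


def fingerprint(secret: str) -> str:
    # Assumption: 8 hex chars of SHA-256; S3-01's real scheme may differ.
    return hashlib.sha256(secret.encode()).hexdigest()[:8]


def redact(value: Any, found: list[str]) -> Any:
    """Walk dicts and lists, replacing matches in string leaves."""
    if isinstance(value, dict):
        return {key: redact(item, found) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item, found) for item in value]
    if isinstance(value, str):
        def replace(match: re.Match[str]) -> str:
            found.append(fingerprint(match.group(0)))
            return f"<REDACTED:fingerprint={found[-1]}>"

        return AWS_KEY.sub(replace, value)
    return value


def persist(envelope: dict[str, Any]) -> tuple[bytes, int, list[str]]:
    """Return (serialized bytes, total findings, deduplicated fingerprints)."""
    found: list[str] = []
    clean = redact(envelope, found)
    body = json.dumps(clean, sort_keys=True).encode()
    return body, len(found), sorted(set(found))
```

The substring checks then run over the returned bytes: the seeded literals must be absent and the placeholder prefix must be present. The same toy also shows the count-versus-fingerprint distinction, since a key repeated in two leaves adds two findings but one fingerprint.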
Idempotence and Phase-0/1 invariants preserved:
- [ ] AC-13 — Phase-0 tests/unit/test_probe_contract.py contract-freeze snapshot continues to pass. Verification recipe: at story-execution time, run python -c "import tests.unit.test_probe_contract as m; print([n for n in dir(m) if 'SNAPSHOT' in n.upper()])" to enumerate the snapshot constants. If a _WRITER_WRITE_SNAPSHOT (or equivalent named after the writer) exists, document the diff in the PR description and reference 02-ADR-0010 Consequences. If no writer-frozen snapshot exists, AC-13 is a no-op assertion; document this finding in the PR. The OutputSanitizer.scrub snapshot is unchanged (this story does not touch scrub).
- [ ] AC-14 — No model_construct calls anywhere in src/codegenie/output/** (positive assertion from S3-02 AC-14 continues to pass after this story's edits). The new envelope_redactor.py module constructs RedactedSlice only via the public Pydantic constructor inside _build_redacted_slice_pass (this is one new call site; the S3-02 lint rule does not ban the public constructor, only model_construct). S7-04's inspect-based boundary test will later assert redact_secrets (S3-01) and _build_redacted_slice_pass (S3-03) are the only two construction sites in src/; for this story the constraint is the negative model_construct assertion.
- [ ] AC-15 — Cross-story integration with S3-01 + S3-02 (mirrors S3-02 AC-15b) — parametrized over the four canonical shapes:
  - Zero secrets — envelope contains no plaintext → findings_count == 0, len(fingerprints) == 0, log event records secrets_redacted_count=0, YAML satisfies F6 contract (no pattern_class, no plaintext, no <REDACTED:fingerprint= either since zero matches).
  - One secret — envelope contains one AWS key → findings_count == 1, len(fingerprints) == 1, log event records secrets_redacted_count=1, YAML satisfies F6 contract (positive includes <REDACTED:fingerprint=).
  - Three distinct secrets — envelope contains two distinct AWS keys + one GitHub token in three different leaves → findings_count == 3, len(fingerprints) == 3, log event records secrets_redacted_count=3.
  - Same-fingerprint-twice — same AWS key in two distinct leaves → findings_count == 2, len(fingerprints) == 1, log event records secrets_redacted_count=2 (the 02-ADR-0010 contract: count is total findings, fingerprints are deduplicated).

  For each case, additionally assert the post-write RedactedSlice round-trips through RedactedSlice.model_validate(model.model_dump()) (the writer must not mutate the slice). A regression where the writer accidentally mutates the slice (e.g., re-orders dict keys destructively) is caught.
- [ ] AC-16 — Placeholder-idempotence — a fixture envelope whose slice already contains a <REDACTED:fingerprint=abcdef12> placeholder string (modeling the post-per-probe-scrub shape). Run _redact_envelope. Assert (a) the placeholder is unchanged in the output (entropy ≈ log2(16) = 4.0 bits per hex char, below the 4.5 threshold for an 8-char fingerprint; the full 31-char <REDACTED:fingerprint=abcdef12> string has further-reduced effective entropy from the literal prefix and sits below the S3-01 len ≥ 32 gate); (b) findings_count does NOT increment for the placeholder. A regression that double-counts placeholders (entropy mis-tuned, or known-pattern regex accidentally matching <REDACTED:) is caught. If the implementer's chosen pattern set or entropy threshold would naively match the placeholder, document the carve-out (e.g., the known-pattern table includes r"<REDACTED:fingerprint=[0-9a-f]{8}>" as an explicit non-match exclusion) in the module docstring.
- [ ] AC-17 — Phase 0 safe_yaml.load / safe_json.load chokepoints are unaffected (this story does not touch the parsers layer).
- [ ] AC-18 — The Phase-2 forbidden-patterns glob (S1-11) covering src/codegenie/output/** continues to cover writer.py, sanitizer.py, redacted_slice.py (S3-02), and the new envelope_redactor.py. A regression that scopes the glob narrower is caught by S3-02 AC-13's runtime predicate assertion (_is_under_phase2_banned_package(Path("src/codegenie/output/envelope_redactor.py")) is True).
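The placeholder reasoning in AC-16 is worth measuring rather than assuming. The sketch below uses a character-frequency Shannon entropy over the whole string and treats the two S3-01 thresholds (len ≥ 32, ≥ 4.5 bits/char) as a conjunctive gate; both choices are assumptions, since the story does not show how redact_secrets tokenizes values or combines the thresholds.

```python
import math
from collections import Counter

# Thresholds quoted in AC-4; combining them with "and" is an assumption.
MIN_LEN = 32
MIN_BITS_PER_CHAR = 4.5

PLACEHOLDER = "<REDACTED:fingerprint=abcdef12>"


def shannon_bits_per_char(text: str) -> float:
    """Character-frequency Shannon entropy, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_entropy_hit(text: str) -> bool:
    """Assumed gate: long enough AND dense enough."""
    return len(text) >= MIN_LEN and shannon_bits_per_char(text) >= MIN_BITS_PER_CHAR
```

Under these assumptions the full placeholder is 31 characters long and measures roughly 4.50 bits per character, so it is the length gate, more than the entropy margin, that keeps it from matching. That makes the explicit carve-out mentioned in AC-16 the more robust option if either threshold is ever retuned.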
Toolchain:
- [ ] AC-19 — ruff check, ruff format --check, mypy --strict, pytest pass on touched files. mypy --strict flags the raw-dict-to-writer snippet (AC-2) and clean-passes the RedactedSlice-to-writer snippet (AC-2b).
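The paired mypy checks (AC-2 and AC-2b) share one small harness, which might look like the sketch below. The helper names and the base_cmd parameter are hypothetical; base_cmd exists only so the harness itself can be exercised with a stand-in command when mypy is not installed.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional, Sequence


def run_mypy_strict(
    fixture: Path,
    repo_root: Path,
    base_cmd: Optional[Sequence[str]] = None,
) -> "subprocess.CompletedProcess[str]":
    """Run `python -m mypy --strict <fixture>` from the repo root."""
    cmd = list(base_cmd) if base_cmd is not None else [
        sys.executable, "-m", "mypy", "--strict",
    ]
    return subprocess.run(
        [*cmd, str(fixture)], capture_output=True, text=True, cwd=repo_root
    )


def is_expected_rejection(result: "subprocess.CompletedProcess[str]") -> bool:
    """Negative fixture: non-zero exit AND both substrings present."""
    return (
        result.returncode != 0
        and "incompatible type" in result.stdout
        and 'expected "RedactedSlice"' in result.stdout
    )


def is_clean_pass(result: "subprocess.CompletedProcess[str]") -> bool:
    """Positive control: zero exit and at most a Success banner on stdout."""
    out = result.stdout.strip()
    return result.returncode == 0 and (not out or out.startswith("Success"))
```

Keeping the two predicates separate means a broken invocation (for example, mypy missing from the environment) fails the positive control instead of letting the negative check pass for the wrong reason.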
Implementation outline¶
- Create src/codegenie/output/envelope_redactor.py:

  ```python
  """Envelope-level secret-redaction chokepoint (02-ADR-0005, 02-ADR-0010, 02-ADR-0008).

  Three-pass composition (envelope-level, post-merge):
    1. _redact_known_patterns_pass — regex sweep (S3-01 named patterns).
    2. _redact_entropy_pass — Shannon-entropy fallback (S3-01).
    3. _build_redacted_slice_pass — smart-constructor closure → RedactedSlice.

  Per-probe OutputSanitizer.scrub (Phase 0) is upstream of this module —
  not a co-located pass. See 02-ADR-0008 for the no-event-stream framing
  of `secrets_redacted_count`.

  Verified by tests/unit/output/test_envelope_redactor_composition.py.
  """
  from __future__ import annotations

  from typing import Protocol

  from codegenie.output.sanitizer import redact_secrets  # S3-01
  from codegenie.output.redacted_slice import RedactedSlice  # S3-02
  from codegenie.parsers import JSONValue
  from codegenie.types import ProbeId

  _ENVELOPE_PROBE_ID = ProbeId("__envelope__")


  class SanitizerPass(Protocol):
      def __call__(self, slice_: dict[str, JSONValue]) -> dict[str, JSONValue]: ...


  def _redact_known_patterns_pass(slice_: dict[str, JSONValue]) -> dict[str, JSONValue]: ...
  def _redact_entropy_pass(slice_: dict[str, JSONValue]) -> dict[str, JSONValue]: ...
  def _build_redacted_slice_pass(slice_: dict[str, JSONValue]) -> RedactedSlice: ...


  _PASSES: tuple[object, ...] = (
      _redact_known_patterns_pass,
      _redact_entropy_pass,
      _build_redacted_slice_pass,
  )


  def _redact_envelope(envelope: dict[str, JSONValue]) -> RedactedSlice:
      redacted, _findings = redact_secrets(envelope, _ENVELOPE_PROBE_ID)
      return redacted
  ```

  Note: the three pass functions may delegate to a single redact_secrets call internally for Phase 2 simplicity — the named, mockable _PASSES tuple is what AC-5/AC-6 require. The pass-level decomposition can be a thin shim around redact_secrets (e.g., _redact_known_patterns_pass calls into the known-pattern path inside sanitizer.py; _redact_entropy_pass calls the entropy path; _build_redacted_slice_pass is the closure). The shim's contract is the order; the implementation chooses how literally the three passes decompose redact_secrets's body.
- Edit
src/codegenie/cli.py:
  - Add a new seam _seam_redact_envelope(envelope: dict[str, Any]) -> RedactedSlice between _seam_shallow_merge (Step 8) and _seam_validate_envelope (Step 9). The seam calls envelope_redactor._redact_envelope(envelope) and returns the RedactedSlice.
  - Tighten _seam_write_envelope's envelope parameter from dict[str, Any] to RedactedSlice. Body: writer.write(envelope, raw_artifacts, output_dir) (the writer accepts RedactedSlice per AC-1; the YAML serialization reads envelope.slice inside Writer.write's body).
  - The call-site (around line 572): redacted_envelope = _seam_redact_envelope(envelope); _seam_validate_envelope(redacted_envelope); yaml_bytes = _seam_write_envelope(redacted_envelope, raw_artifacts, output_dir). The _seam_validate_envelope signature also tightens (it now validates redacted_envelope.slice under the hood — implementer choice on whether the seam takes RedactedSlice or dict[str, Any]; the principled answer is RedactedSlice for type-uniformity at the seam layer).
- Edit src/codegenie/output/writer.py:
  - Change Writer.write signature: def write(self, envelope: RedactedSlice, raw_artifacts: list[tuple[str, bytes]], output_dir: Path) -> None.
  - First executable statement: if not isinstance(envelope, RedactedSlice): raise TypeError(f"Writer.write requires RedactedSlice (02-ADR-0010); got {type(envelope).__name__}").
  - Inside the body, read envelope.slice (the redacted dict payload) for YAML serialization. The existing yaml.dump(envelope, ...) call becomes yaml.dump(envelope.slice, ...).
  - On the success path (after _atomic_write_bytes(output_dir / "repo-context.yaml", body) returns successfully), emit _log.info(EVENT_ENVELOPE_WRITTEN, **{SECRETS_REDACTED_COUNT_FIELD: envelope.findings_count}) (with whatever existing context fields the writer already includes — path=str(...), etc.). Imports: from codegenie.logging import EVENT_ENVELOPE_WRITTEN, SECRETS_REDACTED_COUNT_FIELD.
  - Update the writer's docstring to name 02-ADR-0010 (the signature tightening) and 02-ADR-0005 (the persistence-zero-plaintext discipline this signature enforces).
- Edit src/codegenie/logging.py:
  - Add SECRETS_REDACTED_COUNT_FIELD: Final[str] = "secrets_redacted_count".
  - Add EVENT_ENVELOPE_WRITTEN: Final[str] = "envelope.written".
  - Append both names to __all__.
  - Update the module docstring to reference 02-ADR-0008 (the single-log-field discipline) and 02-ADR-0010 (the writer-completion event).
- Write tests/unit/output/test_writer_signature.py (AC-1, AC-2, AC-2b, AC-3, AC-3b):
  - Fixtures: tests/unit/output/_fixtures/raw_dict_to_writer.py and _fixtures/redacted_slice_to_writer.py.
  - Subprocess invocations against both fixtures with python -m mypy --strict; assert exit codes + error substring contracts.
  - Runtime test: pass a raw dict to Writer.write; assert TypeError with the message regex.
  - Source-level test: inspect.getsource(Writer.write) regex contains the isinstance guard.
- Write tests/unit/output/test_envelope_redactor_composition.py (AC-4, AC-5, AC-6, AC-7):
  - test_composition_order_documented_in_docstring: inspect.getdoc substring matches.
  - test_redact_envelope_invokes_passes_in_order: monkeypatch _PASSES with Mock(wraps=...) spies that append to a record: list[str]; assert canonical order.
  - test_reorder_mutation_changes_recorded_order + test_canonical_order_under_no_mutation: paired mutation-sensitivity test.
  - test_return_type_is_redacted_slice: typing.get_type_hints(_redact_envelope)["return"] is RedactedSlice.
- Write tests/unit/output/test_writer_logs_secrets_redacted_count.py (AC-8, AC-8b, AC-9, AC-10, AC-11, AC-12, AC-12b):
  - test_constants_module_level_and_exported: SECRETS_REDACTED_COUNT_FIELD + EVENT_ENVELOPE_WRITTEN are Final[str] constants in codegenie.logging.__all__.
  - test_no_string_literals_at_call_site: source-level regex check that Writer.write does not contain the string literals.
  - test_count_field_emitted_on_zero_count + test_count_field_emitted_on_nonzero_count: capture_logs() assertions over Writer.write with constructed RedactedSlice fixtures.
  - test_event_unique_per_write_call: filter captured events by event == "envelope.written"; assert exactly one per call.
  - test_no_event_on_write_failure: monkeypatch _atomic_write_bytes to raise OSError; assert zero envelope.written events captured.
  - test_dataflow_no_plaintext_no_secret_finding_fields (AC-12): end-to-end through _seam_redact_envelope + _seam_write_envelope; substring assertions over the persisted YAML bytes.
  - test_per_probe_attribution_preserved (AC-12b): pre-scrubbed fixture with a placeholder + one novel-shape secret; assert envelope-level findings count of 1.
- Write tests/unit/output/test_envelope_redactor_integration.py (AC-15, AC-16):
  - Parametrized over the four canonical shapes (zero / one / three-distinct / same-fingerprint-twice).
  - For each: end-to-end dataflow + log assertion + YAML substring assertion + post-write RedactedSlice round-trip.
  - Placeholder-idempotence: assert envelope-level pass leaves an existing <REDACTED:fingerprint=…> placeholder unchanged and does not increment the count.
- Do NOT edit redact_secrets itself (S3-01 owns it). Do NOT edit RedactedSlice (S3-02 owns it). Do NOT edit OutputSanitizer.scrub (Phase 0; this story does not touch per-probe scrub).
Out of scope¶
- The SecretRedactor / redact_secrets body (S3-01).
- The RedactedSlice Pydantic model (S3-02).
- The CLI summary line (secrets_redacted_count: <N> + file:line list) at gather end — this story emits the log event; the CLI summary path consumes it; the summary line itself is touched in S8-02.
- tests/adv/phase02/test_secret_in_source.py (S6-07 — load-bearing adversarial; depends on this story landing the chokepoint).
- tests/adv/phase02/test_no_inmemory_secret_leak.py (S7-04 — inspect-based boundary test; asserts no dict reaches the writer call site in src/ AND _build_redacted_slice_pass + redact_secrets are the only RedactedSlice construction sites).
- The Fingerprint newtype (production ADR-0033 §3 — rule-of-three reached at this story but extraction deferred to S8-02 concurrent landing; see Notes-for-implementer §"Design patterns" DP3).
- Phase 4 RAG ingestion path inheriting the RedactedSlice type guarantee (02-ADR-0010 Consequences) — Phase 4 design concern.
- Any OutputSanitizer.scrub edits or per-probe-pass additions — out of scope. The per-probe pass is upstream of this module.
Notes for the implementer¶
- Where does the _seam_redact_envelope step live? The natural choice is a new function in cli.py adjacent to _seam_shallow_merge and _seam_validate_envelope. It calls envelope_redactor._redact_envelope(envelope). The seam-test pattern from S1-02 (structlog.testing.capture_logs() over a built-up envelope) is the test shape.
- Mock-spy ordering test technique. unittest.mock.Mock(wraps=original) preserves behavior while recording calls. The recommended mechanism: a shared record: list[str] that each spy appends its name to before delegating — simpler than time.monotonic_ns() timestamps and equally robust. Pinned in AC-5.
- The mypy --strict subprocess test. Subprocess invocation of python -m mypy --strict <path> is the canonical mechanism (matches CI). Fixture files (one bad, one good) live under tests/unit/output/_fixtures/. AC-2 + AC-2b cover both directions; without the positive control (AC-2b), the negative could pass spuriously on a broken harness.
- Runtime rejection of raw dict. Python's type hints are stripped at runtime. The simplest defense is the isinstance(envelope, RedactedSlice) guard as the first executable statement of Writer.write. The TypeError message contains both "RedactedSlice" and "02-ADR-0010" so a future reader of the traceback can find the source-of-truth ADR without re-deriving it. AC-3b's source-level regex check (re.compile(r"isinstance\s*\(\s*envelope\s*,\s*RedactedSlice\s*\)")) is the regression net.
- SECRETS_REDACTED_COUNT_FIELD + EVENT_ENVELOPE_WRITTEN placement. Both belong in codegenie/logging.py (the canonical home for cross-module log-field/event constants). Single source of truth; one typo-resistant constant per surface. Per-probe and CLI consumers later import the same constants.
- The "0-count is grep-able" property. Auditors run grep secrets_redacted_count: <log_path> to confirm a clean run. If secrets_redacted_count=0 is silently omitted (the field only appears when nonzero), the auditor cannot distinguish "clean run" from "log corruption" or "missing emission". Always emit the field; AC-9 pins this.
- The writer-completion event name is introduced by this story. Phase 0's Writer.write is silent (verified at validation time — only _log.warning(...) for csafe + symlink-refused). This story adds _log.info(EVENT_ENVELOPE_WRITTEN, ...) on the success path after _atomic_write_bytes returns. The event is single-event, single-field — 02-ADR-0008's "no event stream" is honored (one new structured-log event, not an event-bus subscription). AC-11's failure-path test pins that an OSError during write does NOT emit the event.
- Coordinated edit across callers. 02-ADR-0010 Consequences names "the writer signature change is a contract surface shift requiring a coordinated edit across all callers (one — the sanitizer pipeline)". On master, the actual single caller is _seam_write_envelope; verify by grep -rn "Writer().*write\|Writer.write\|writer\.write" src/ before and after the edit. If a second call site appears (e.g., a test fixture or a debug-write path), it is either (a) a test that must be updated to construct via redact_secrets or (b) a bypass that defeats the chokepoint — flag in the PR description.
- Forbidden-patterns reach. S1-11 covers src/codegenie/output/**. The new envelope_redactor.py is inside that glob; this story does not introduce new banned patterns. S3-02 AC-14 (regex r"\.model_construct\s*\(|\bmodel_construct\s*=" over src/codegenie/output/) continues to pass.
- LOC budget. envelope_redactor.py ≈ 60 LOC (three thin passes + module-level tuple + protocol + redactor entry). cli.py edits ≈ 15 LOC (new seam function + call-site rewire + signature tightening of _seam_write_envelope). writer.py edits ≈ 15 LOC (signature tightening + isinstance guard + log emission). logging.py edits ≈ 5 LOC (two constants + __all__). Tests ≈ 400 LOC across four new files. Total ~495 LOC.
- Unit-vs-integration split. The story is unit-level. End-to-end gather (test_secret_in_source.py) lands in S6-07; the inspect-based source-level boundary test (test_no_inmemory_secret_leak.py) lands in S7-04. This story's tests construct RedactedSlice directly (via redact_secrets({}, ProbeId("__envelope__"))) and call Writer.write / the seam directly with tmp_path output dirs.
- The structural ladder: three of four rungs closed. This story closes the third rung of 02-ADR-0005's structural defense; the fourth rung is the S7-04 source-level boundary test. (1) Runtime — redact_secrets replaces cleartext (S3-01); (2) Type-system — RedactedSlice is a smart-constructor (S3-02); (3) Chokepoint — Writer.write + _seam_write_envelope accept only RedactedSlice (this story); (4) Source-level — inspect-based boundary test that no other path reaches the writer (S7-04). When this PR lands, "redactor was called" is type-checkable AND chokepoint-enforced; only the source-level rung remains.
Design patterns¶
- DP1 — _PASSES registry: rule-of-three reached but stays a literal tuple. This story is the third known-pass composition (Phase 0 per-probe scrub has two passes; S3-03's envelope-level has three). Rule-of-three names this the moment to consider promoting to a @register_sanitizer_pass decorator registry. But: Phase 2's three envelope-level passes are fixed by 02-ADR-0010 (_build_redacted_slice_pass is the closure, not a content pass — adding a fourth would be a content pass like a future Phase-4 RAG-scrubber or a per-task-class redactor). A literal tuple is closed for modification and closed for extension — fine while N=3. When the fourth content pass arrives in Phase 4+, promote _PASSES to a decorator registry. Until then, the literal tuple is correct per Rule 2 ("three similar lines is better than a premature abstraction"). The structure is already extension-friendly (the SanitizerPass Protocol makes promotion mechanical).
- DP2 — SanitizerPass Protocol. Type each pass via a Protocol-typed alias: class SanitizerPass(Protocol): def __call__(self, slice_: dict[str, JSONValue]) -> dict[str, JSONValue]: .... The closure pass has a different return type (RedactedSlice); type it as a sibling Protocol (class SliceClosurePass(Protocol): def __call__(self, slice_: dict[str, JSONValue]) -> RedactedSlice: ...). The Protocol surface makes the registry promotion (DP1) trivially compatible: a @register_sanitizer_pass decorator that takes a SanitizerPass-typed callable lands without rewriting any current pass.
- DP3 — Fingerprint newtype: rule-of-three reached, deferred to S8-02 concurrent landing. This story is the third consumer of the 8-hex fingerprint string (S3-01 produces; S3-02 validates via RedactedSlice.fingerprints; S3-03 reads envelope.fingerprints in Writer.write to embed in the persisted shape). Production ADR-0033 §3 names primitive obsession on cross-module identifiers as a review-blocker. Decision: S3-03 does NOT introduce the newtype — three of the four eventual consumers are inside Phase 2; the fourth (CLI summary at S8-02) is the natural concurrent landing site. Surface the opportunity in the S8-02 story prose so the cross-cutting refactor lands with all four consumers in one PR. The deferral is principled (rule-of-three threshold reached, but the four-consumer cliff is one story away — landing the newtype now plus one rewrite in S8-02 vs. landing it concurrently with S8-02 plus zero in-flight rewrites — Rule 3 "surgical changes" favors the latter).
- DP4 — Pure module discipline for envelope_redactor.py. No I/O, no logging, no filesystem reads, no os.environ, no subprocess, no time. The three passes are pure functions over their arguments; _redact_envelope is a pure delegator. The seam (_seam_redact_envelope in cli.py) is impure (it's a seam — that's its job, per the functional-core / imperative-shell pattern). The Writer (Writer.write) is impure (it persists + logs). The log emission is impure-shell. Future contributors must not add I/O to envelope_redactor.py — if a need arises ("log every fingerprint generation"), the logging belongs at the seam or in S3-01's redact_secrets, not in this module. A regression that imports logging, structlog, Path, os, or subprocess at the top of envelope_redactor.py is a review-blocker per this Note. Mirrors S3-02's DP3 for redacted_slice.py.
- DP5 — Smart-constructor + chokepoint ladder closed at three rungs. S3-01 → runtime ("replace cleartext"); S3-02 → type-system ("RedactedSlice is a smart-constructor; model_construct banned"); S3-03 → chokepoint ("Writer.write and _seam_write_envelope accept only RedactedSlice; isinstance guard rejects raw dict at runtime"). The fourth rung (source-level inspect-based boundary test that no other path reaches the writer) lands in S7-04 after Phase 2's probes are all in. Document the four-rung ladder in the envelope_redactor.py module docstring AND the PR description. This is the canonical implementation of the toolkit's "Smart constructor + Make illegal states unrepresentable" pattern applied at the I/O boundary.
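DP4's purity rule is stated as a review convention, but it could also be checked mechanically. The sketch below is one possible shape, not something the story prescribes: impure_imports is a hypothetical helper that parses a module's source with the standard-library ast module and reports imports of the names DP4 lists.

```python
import ast

# Names taken from DP4; the helper itself is illustrative.
BANNED_MODULES = {"logging", "structlog", "os", "subprocess", "time"}
BANNED_NAMES = {"Path"}


def impure_imports(source: str) -> list[str]:
    """Return banned module or symbol names imported anywhere in `source`."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import os.path` counts as importing `os`.
                if alias.name.split(".")[0] in BANNED_MODULES:
                    hits.append(alias.name)
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if module.split(".")[0] in BANNED_MODULES:
                hits.append(module)
            # Catch `from pathlib import Path` by imported symbol name.
            hits.extend(a.name for a in node.names if a.name in BANNED_NAMES)
    return hits
```

A test could feed it the text of envelope_redactor.py and assert the result is empty. Note that the check keys on the top-level module name, so an import from codegenie.logging would not be flagged by this sketch; whether that should count as impure is a decision for the implementer.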