Attempt log: S1-06 ALLOWED_BINARIES ten additions (02-ADR-0001)
Attempt 1, 2026-05-15: GREEN on first cycle
Strategy. Data-only kernel-tier extension; single red→green→refactor cycle with no inner-loop retries. Validator hardened the story upfront (HARDENED verdict in _validation/S1-06-…md — added AC-9 for the two pre-existing Phase-1 closed-set guards, added AC-10 for the 02-ADR-0001 amendment, narrowed the TDD plan's exception catch). With those fixes baked into the story, the implementer cycle was uneventful.
Sequence.
- RED: wrote `tests/unit/exec/test_allowed_binaries.py` (20 cases: 1 strict equality + 1 "every new binary present" + 1 sensitive-constants pin + 10 parametrized allowlist-acceptance + 7 parametrized env-strip). The first `pytest` run failed on the strict-equality assertion (`frozenset({'git','node'}) != EXPECTED_TOTAL`), which confirms the test fails under the wrong implementation.
- GREEN: extended the `ALLOWED_BINARIES` literal in `src/codegenie/exec.py` from `{"git", "node"}` to the twelve-entry frozenset; updated the comment block above the constant to name 02-ADR-0001 as the governing decision (replaces the Phase-1-only note). The 20 new tests went green.
- AC-9: updated both pre-existing closed-set guard tests forward (`test_node_in_allowed_binaries` in `tests/unit/test_exec.py`; renamed `test_allowed_binaries_invariant_unchanged` → `test_allowed_binaries_invariant_phase2` in `tests/unit/probes/test_deployment.py`). Kept the parametrize list of denied binaries in `test_allowed_binaries_closed_set_regression` (bash, sh, python, curl, wget, ssh) unchanged; those six are still forbidden under the twelve-entry closed set.
- AC-10: amended the Decision section of `docs/phases/02-context-gather-layers-b-g/ADRs/0001-add-docker-and-security-cli-tools-to-allowed-binaries.md`: "eight new entries" → "ten new entries", and added a 2-paragraph amendment block enumerating `ast-grep` (per `localv2.md` §5.6 G2) and `ripgrep` (per `localv2.md` §5.6 G3). The other paragraphs (Pattern fit, Consequences, Reversibility, Evidence/sources) are explicitly noted as applying unchanged.
- Gates: `ruff check` (one E501 in the new comment, fixed by re-wrapping), `ruff format` (idempotent), `mypy --strict` on `src/codegenie` (61 files, no issues), full `pytest` suite (1625 passed, 1 skipped, 2 xfailed; required `PATH="$PWD/.venv/bin:$PATH"` for the import-linter canary; the L3 lesson surfaced and was resolved by `pip install -e .[dev]`).
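The RED and GREEN steps above come down to pinning a frozenset with a strict-equality test. The following is a minimal sketch, not the project's actual code: the ten new names are assumed from the tool list recorded later in this log and may not match the exact allowlist spellings, and `PHASE_1_BINARIES` is a hypothetical helper name.

```python
# Sketch only: the real constant lives in src/codegenie/exec.py.
# The ten names below are ASSUMED from the ADR's tool list and may not match
# the exact allowlist spellings (for example "ripgrep" versus "rg").
PHASE_1_BINARIES = frozenset({"git", "node"})  # hypothetical helper name
EXPECTED_NEW_BINARIES = frozenset({
    "docker", "syft", "grype", "gitleaks", "semgrep",
    "ast-grep", "ripgrep", "scip-typescript", "tree-sitter", "strace",
})
EXPECTED_TOTAL = PHASE_1_BINARIES | EXPECTED_NEW_BINARIES

# Stand-in for codegenie.exec.ALLOWED_BINARIES after the GREEN step.
ALLOWED_BINARIES = frozenset(EXPECTED_TOTAL)


def test_allowed_binaries_is_exact_twelve_entry_set() -> None:
    # Strict equality fails on a missing entry and on an unexpected extra one.
    assert ALLOWED_BINARIES == EXPECTED_TOTAL
    assert len(ALLOWED_BINARIES) == 12


def test_every_new_binary_is_present() -> None:
    # Subset check: every new binary must be allowlisted.
    assert EXPECTED_NEW_BINARIES <= ALLOWED_BINARIES
```

Strict equality is what makes the RED step meaningful: a subset check alone would pass if an unreviewed binary slipped into the closed set.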
Refactor decisions. None substantive. The story is data-only by design (registry pattern preserved). No new abstractions; no kernel surface changes; chokepoint signature untouched. The tests/unit/exec/ test directory follows the existing subdir convention (with __init__.py); exec is a Python builtin name but not a stdlib module, so L6 (the types/ collision) does not apply.
Runtime evidence (Ralph Wiggum pass).
| AC | Evidence |
|---|---|
| 1 | tests/unit/exec/test_allowed_binaries.py::test_allowed_binaries_is_exact_twelve_entry_set |
| 2 | tests/unit/exec/test_allowed_binaries.py::test_every_new_binary_is_present |
| 3 | tests/unit/exec/test_allowed_binaries.py::test_phase_0_sensitive_constants_unchanged |
| 4 | test_new_binary_not_rejected_by_allowlist (10 params) + test_sensitive_env_var_is_dropped_from_child_env (7 params) |
| 5 | Diff scope: only src/codegenie/exec.py (data + comment), tests/, and one ADR — no source file added a second subprocess/create_subprocess_exec call site |
| 6 | tests/unit/test_exec.py::test_run_allowlisted_signature_default_is_none (unchanged, still green) |
| 7 | RED demonstrated at pytest first invocation (failure on strict-equality assertion); green after the exec.py literal change |
| 8 | ruff check / ruff format --check clean across src tests; mypy --strict src/codegenie clean; full pytest 1625 passed |
| 9 | tests/unit/test_exec.py::test_node_in_allowed_binaries + tests/unit/probes/test_deployment.py::test_allowed_binaries_invariant_phase2 — both green against the twelve-entry set; closed-set regression parametrize unchanged |
| 10 | docs/phases/02-context-gather-layers-b-g/ADRs/0001-add-docker-and-security-cli-tools-to-allowed-binaries.md §Decision now states "ten new entries" and includes the ast-grep/ripgrep amendment block |
Final state. GREEN on first cycle; all 10 ACs PASS; all cross-cutting gates green; ready to commit.
Files touched (6).
- `src/codegenie/exec.py`: extended `ALLOWED_BINARIES`; updated comment block.
- `tests/unit/exec/__init__.py`: new (empty package marker).
- `tests/unit/exec/test_allowed_binaries.py`: new (20 cases).
- `tests/unit/test_exec.py`: updated the expected set in `test_node_in_allowed_binaries`.
- `tests/unit/probes/test_deployment.py`: renamed and updated `test_allowed_binaries_invariant_*`.
- `docs/phases/02-context-gather-layers-b-g/ADRs/0001-…md`: Decision amended (eight → ten new entries).
Attempt 2, 2026-05-15: Pass-2 hardening (AC-2 rewrite + AC-11-16 added)
Trigger. The validator ran a Pass-2 four-critic deep audit that surfaced 2 block-tier and 5 harden-tier gaps the Pass-1 light-touch validation missed. The story file gained AC-2-rewrite + AC-11/12/13/14/15/16 + AC-10-extended. Re-entered the executor in mid-cycle to land them.
Sequence (additive on top of Attempt 1).
- AC-11 (module docstring pin): inserted a "Phase 2 (02-ADR-0001) extends `ALLOWED_BINARIES` with the ten Layer B/C/G tools listed in …" paragraph into the top docstring of `src/codegenie/exec.py`. Added a meta-test `test_exec_module_docstring_phase2_present` that whitespace-normalizes `codegenie.exec.__doc__` (source-file docstrings wrap at column 80) and asserts that `"02-ADR-0001"` and `"ten Layer B/C/G tools"` are both present.
- AC-2 rewrite (ADR cross-document gate): replaced the original prose-deferral with a meta-test `test_adr_0001_enumerates_all_new_binaries` that opens `02-ADR-0001.md`, asserts `"ten new entries"` IS present, asserts `"eight new entries"` is NOT present, and asserts every binary in `EXPECTED_NEW_BINARIES` appears as a backticked identifier. The first red run flagged that the original Amendment block AND two Consequences bullets still said `"eight new entries"` / `"eight named entries"` / `"eight new entries"`; reworded all three to remove the literal `"eight new entries"` string while preserving meaning. The ADR is now self-consistent.
- AC-12 (env-strip parametric per new binary): added `test_env_strip_applies_to_each_new_binary`, parametrized over `(binary, sensitive_key)` ∈ `["docker", "semgrep"]` × `["OPENAI_API_KEY", "AWS_SECRET_ACCESS_KEY", "GITHUB_TOKEN"]` (6 cases). Each case captures the spawn-spy env dict AND the `subproc.env_extra.sensitive_key_dropped` structlog event at level `warning`. Catches the `if binary in NEW: env = os.environ.copy()` per-binary special-case mutant.
- AC-13 (`_RUNNING_PROCS` cleanup pin): added `assert len(_RUNNING_PROCS) == 0` to `test_new_binary_not_rejected_by_allowlist` after the call returns or raises. Catches the "skip the `finally:` pop for new binaries" mutant; preserves Phase 7's coordinator-cancel pathway invariant.
- AC-14 (path-traversal regression for ten new binaries): added `test_new_binaries_reject_resolved_paths`, parametrized over 20 cases (`/usr/bin/{b}` + `./{b}` for each of the 10 new binaries). All raise `DisallowedSubprocessError` BEFORE spawn; the spawn-spy `assert_not_awaited()` confirms Phase 0 invariant 1.
- AC-15 (closed-set negative-list extension): extended the parametrize list of `tests/unit/test_exec.py::test_allowed_binaries_closed_set_regression` from the original six (bash, sh, python, curl, wget, ssh) to fifteen by adding `[bwrap, bubblewrap, eval, exec, kill, chmod, chown, dd, nc]`. The `bwrap`/`bubblewrap` entries pin the wrapper-pattern exception (02-ADR-0001 §Consequences) structurally; the other seven are adjacent dangerous binaries Phase 2 calls out as never-allowlisted. Supersedes Pass-1 AC-9's "parametrize list unchanged" instruction.
- AC-16 (`AWS_*` prefix-match coverage on a new binary): added `test_aws_prefix_match_strips_arbitrary_key_for_new_binary`, asserting that `AWS_FOO` is dropped from the `docker`-argv env AND that the drop event fires at level `warning`. Exercises the `_SENSITIVE_PREFIX` tuple path (which `AWS_ACCESS_KEY_ID` etc. do not, because they are also in `_SENSITIVE_EXACT`).
- AC-10 extension (ADR §Tradeoffs row 2 + §Consequences bullet): updated the CVE-feeds row from "Eight new CVE feeds" → "Ten new CVE feeds" with the full parenthetical (`docker, syft, grype, gitleaks, semgrep, ast-grep, ripgrep, scip-typescript, tree-sitter, strace`). Added a new §Consequences bullet recording the `bwrap`/`bubblewrap`-not-in-allowlist policy as a load-bearing decision, with a reference back to AC-15's structural pin.
- Style alignment (Notes-for-implementer follow-through): refactored the env-strip mocks in `test_allowed_binaries.py` from `patch.object(_aio, "create_subprocess_exec", fake_exec)` to the family-precedent `monkeypatch.setattr(asyncio, "create_subprocess_exec", spy)` shape via a module-private `_make_spawn_spy(monkeypatch)` helper. Mocking style now matches the eight precedents in `tests/unit/test_exec.py` per Rule 11.
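The invariants that AC-12, AC-14, and AC-16 pin can be illustrated with a toy model. This is a hedged sketch, not the contents of `codegenie.exec`: the function names `check_allowed` and `strip_sensitive` are hypothetical, and the contents of the allowlist and the sensitive-key constants are assumed for illustration.

```python
# Toy model only. Names follow the log where it gives them; contents are ASSUMED.
ALLOWED_BINARIES = frozenset({"git", "node", "docker", "semgrep"})
_SENSITIVE_EXACT = frozenset(
    {"OPENAI_API_KEY", "AWS_SECRET_ACCESS_KEY", "GITHUB_TOKEN"}
)
_SENSITIVE_PREFIX = ("AWS_",)


class DisallowedSubprocessError(Exception):
    """Raised before any spawn when argv[0] is not a bare allowlisted name."""


def check_allowed(binary: str) -> None:
    # Hypothetical helper. Membership is tested on the literal string, so
    # "/usr/bin/docker" and "./docker" are rejected although "docker" is allowed.
    if binary not in ALLOWED_BINARIES:
        raise DisallowedSubprocessError(binary)


def strip_sensitive(env: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Hypothetical helper: return (child_env, dropped_keys).

    The same rule applies to every binary; there is no per-binary branch.
    """
    dropped = [
        key
        for key in env
        if key in _SENSITIVE_EXACT or key.startswith(_SENSITIVE_PREFIX)
    ]
    child = {key: value for key, value in env.items() if key not in dropped}
    return child, dropped


child_env, dropped_keys = strip_sensitive(
    {"PATH": "/usr/bin", "AWS_FOO": "x", "GITHUB_TOKEN": "t"}
)
# AWS_FOO is dropped through the prefix path only; GITHUB_TOKEN through the
# exact-match path.
```

Testing with an arbitrary key such as `AWS_FOO` matters because a key that sits in both the exact set and the prefix tuple would still be stripped if the prefix path were deleted.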
Lint/type fixes during cycle.
- ruff E501 on the long docstring path: wrapped `docs/phases/02-context-gather-layers-b-g/ADRs/0001-…md` across two source lines (visible as one path in the rendered docstring).
- mypy `_Call | None` attribute access on `spy.await_args.kwargs`: added `assert spy.await_args is not None` before each of the three call sites (matches the upstream `mock.AsyncMock` typing).
- Removed `from typing import Any` after removing the trailing `_ = Any` sentinel; ruff raised no F401 afterwards.
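The `await_args` narrowing from the mypy fix looks roughly like this. It is a minimal, self-contained sketch in which a bare `AsyncMock` stands in for the patched `asyncio.create_subprocess_exec`; the function name `capture_child_env` is hypothetical.

```python
import asyncio
from unittest.mock import AsyncMock


async def capture_child_env() -> dict[str, str]:
    # spy stands in for asyncio.create_subprocess_exec inside a test.
    spy = AsyncMock(return_value=None)
    await spy("docker", "version", env={"PATH": "/usr/bin"})

    # await_args is None until the mock has been awaited, so narrow it
    # before touching .kwargs; this is what satisfies mypy --strict.
    assert spy.await_args is not None
    env: dict[str, str] = spy.await_args.kwargs["env"]
    return env


captured = asyncio.run(capture_child_env())
```

The assert doubles as a runtime guard: if the spawn never happened, the test fails at the narrowing line with a clear cause rather than with an `AttributeError` on `None`.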
Final gates (post-Pass-2).
- `ruff check src tests` → All checks passed!
- `ruff format --check src tests` → 189 files already formatted.
- `mypy --strict src/codegenie tests/unit/exec` → Success: 63 source files clean.
- `pytest` (full suite) → 1663 passed, 1 skipped, 2 xfailed (Pass-1 was 1625 passed; +38 from the new parametrics in AC-12/13/14/16 + AC-15 nine new closed-set cases + 2 ADR/docstring meta-tests + 9 extra AC-15 parametric cases applied across two files).
Runtime evidence (Ralph Wiggum pass — Pass 2).
| AC | Evidence (file::function) |
|---|---|
| 2 (rewrite) | tests/unit/exec/test_allowed_binaries.py::test_adr_0001_enumerates_all_new_binaries |
| 10 (extended) | ADR §Tradeoffs row 2 says "Ten new CVE feeds"; ADR §Consequences has bwrap-policy bullet (verified by AC-2's meta-test ∋ "ten new entries" / ∌ "eight new entries") |
| 11 | tests/unit/exec/test_allowed_binaries.py::test_exec_module_docstring_phase2_present |
| 12 | tests/unit/exec/test_allowed_binaries.py::test_env_strip_applies_to_each_new_binary (6 cases) |
| 13 | tests/unit/exec/test_allowed_binaries.py::test_new_binary_not_rejected_by_allowlist (10 cases — _RUNNING_PROCS empty assertion) |
| 14 | tests/unit/exec/test_allowed_binaries.py::test_new_binaries_reject_resolved_paths (20 cases) |
| 15 | tests/unit/test_exec.py::test_allowed_binaries_closed_set_regression (15 cases) |
| 16 | tests/unit/exec/test_allowed_binaries.py::test_aws_prefix_match_strips_arbitrary_key_for_new_binary |
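The two meta-tests in rows 2 and 11 follow a simple string-gate pattern. The sketch below assumes hypothetical helper names (`normalize_ws`, `check_docstring`, `check_adr`) and a made-up docstring; the real tests read `codegenie.exec.__doc__` and the ADR file from disk.

```python
def normalize_ws(text: str) -> str:
    """Collapse every run of whitespace, newlines included, to one space."""
    return " ".join(text.split())


# Hypothetical docstring, wrapped mid-phrase the way a column-limited
# source file would wrap it.
DOC = """Subprocess chokepoint.

Phase 2 (02-ADR-0001) extends ALLOWED_BINARIES with the ten
Layer B/C/G tools listed in the ADR.
"""


def check_docstring(doc: str) -> None:
    # Normalizing first makes the check independent of where lines wrap.
    flat = normalize_ws(doc)
    assert "02-ADR-0001" in flat
    assert "ten Layer B/C/G tools" in flat


def check_adr(adr_text: str, binaries: frozenset[str]) -> None:
    # Positive and negative gates: the new count must appear and the stale
    # count must be gone.
    assert "ten new entries" in adr_text
    assert "eight new entries" not in adr_text
    # Every binary must appear as a backticked identifier.
    for name in sorted(binaries):
        assert f"`{name}`" in adr_text, name


check_docstring(DOC)
check_adr("ten new entries: `docker`, `syft`", frozenset({"docker", "syft"}))
```

Without normalization the phrase check would fail on `DOC` as written, because the line break falls between "ten" and "Layer".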
Files touched (Pass 2 delta). All Pass 2 changes fall within files already listed for Pass 1, with expanded test coverage. No new files were created in Pass 2.
Final state (story-wide). All 16 ACs PASS; all cross-cutting gates green (1663 tests, 0 failures); ready to commit.