Attempt log — S1-06 ALLOWED_BINARIES ten additions (02-ADR-0001)

Attempt 1 — 2026-05-15 — GREEN on first cycle

Strategy. Data-only kernel-tier extension; single red→green→refactor cycle with no inner-loop retries. Validator hardened the story upfront (HARDENED verdict in _validation/S1-06-…md — added AC-9 for the two pre-existing Phase-1 closed-set guards, added AC-10 for the 02-ADR-0001 amendment, narrowed the TDD plan's exception catch). With those fixes baked into the story, the implementer cycle was uneventful.

Sequence.

  1. RED — wrote tests/unit/exec/test_allowed_binaries.py (20 cases: 1 strict equality + 1 "every new binary present" + 1 sensitive-constants pin + 10 parametrized allowlist-acceptance + 7 parametrized env-strip). First pytest run failed on the strict-equality assertion (frozenset({'git','node'}) != EXPECTED_TOTAL) — confirms the test would fail under the wrong implementation.

  2. GREEN — extended src/codegenie/exec.py's ALLOWED_BINARIES literal from {"git", "node"} to the twelve-entry frozenset; updated the comment block above the constant to name 02-ADR-0001 as the governing decision (replaces the Phase-1-only note). 20 new tests went green.

  3. AC-9 — updated both pre-existing closed-set guard tests forward (test_node_in_allowed_binaries in tests/unit/test_exec.py; renamed test_allowed_binaries_invariant_unchanged → test_allowed_binaries_invariant_phase2 in tests/unit/probes/test_deployment.py). Kept test_allowed_binaries_closed_set_regression's parametrize list of denied binaries (bash, sh, python, curl, wget, ssh) unchanged; those six are still forbidden under the twelve-entry closed set.

  4. AC-10 — amended docs/phases/02-context-gather-layers-b-g/ADRs/0001-add-docker-and-security-cli-tools-to-allowed-binaries.md's Decision section: "eight new entries" → "ten new entries", added a 2-paragraph amendment block enumerating ast-grep (per localv2.md §5.6 G2) and ripgrep (per localv2.md §5.6 G3). Other paragraphs (Pattern fit, Consequences, Reversibility, Evidence/sources) explicitly noted as applying unchanged.

  5. Gates — ruff check (one E501 in the new comment, fixed by re-wrapping), ruff format (idempotent), mypy --strict on src/codegenie (61 files, no issues), full pytest suite (1625 passed, 1 skipped, 2 xfailed; required PATH="$PWD/.venv/bin:$PATH" for the import-linter canary — L3 lesson surfaced and resolved by pip install -e .[dev]).
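
The RED/GREEN steps above hinge on a strict-equality check over a closed set. A minimal sketch of why equality is used instead of a subset check, assuming the ten Phase-2 names match the ADR's CVE-feeds list quoted later in this log (their exact spellings in the real frozenset are an assumption, and check_exact_set / check_every_new_binary_present are hypothetical helpers, not functions from the repository):

```python
# Hypothetical stand-in for the constant in src/codegenie/exec.py.
PHASE_1_BINARIES = frozenset({"git", "node"})

# Assumption: spellings copied from the ADR's CVE-feeds parenthetical.
EXPECTED_NEW_BINARIES = frozenset({
    "docker", "syft", "grype", "gitleaks", "semgrep",
    "ast-grep", "ripgrep", "scip-typescript", "tree-sitter", "strace",
})
EXPECTED_TOTAL = PHASE_1_BINARIES | EXPECTED_NEW_BINARIES

# The GREEN state: the literal holds exactly the twelve expected entries.
ALLOWED_BINARIES = frozenset(EXPECTED_TOTAL)


def check_exact_set(allowed: frozenset) -> bool:
    """Strict equality: fails on missing entries and on unexpected extras."""
    return allowed == EXPECTED_TOTAL


def check_every_new_binary_present(allowed: frozenset) -> bool:
    """Subset check: catches missing entries but not unexpected extras."""
    return EXPECTED_NEW_BINARIES <= allowed
```

The subset check alone would still pass if someone slipped "bash" into the set; only the equality check keeps the set closed, which is why the log treats the equality failure as the meaningful RED signal.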

Refactor decisions. None substantive. The story is data-only by design (registry pattern preserved). No new abstractions; no kernel surface changes; chokepoint signature untouched. The tests/unit/exec/ test directory follows the existing subdir convention (with __init__.py); exec is a Python builtin name but not a stdlib module, so L6 (the types/ collision) does not apply.

Runtime evidence (Ralph Wiggum pass).

| AC | Evidence |
|----|----------|
| 1 | tests/unit/exec/test_allowed_binaries.py::test_allowed_binaries_is_exact_twelve_entry_set |
| 2 | tests/unit/exec/test_allowed_binaries.py::test_every_new_binary_is_present |
| 3 | tests/unit/exec/test_allowed_binaries.py::test_phase_0_sensitive_constants_unchanged |
| 4 | test_new_binary_not_rejected_by_allowlist (10 params) + test_sensitive_env_var_is_dropped_from_child_env (7 params) |
| 5 | Diff scope: only src/codegenie/exec.py (data + comment), tests/, and one ADR; no source file added a second subprocess/create_subprocess_exec call site |
| 6 | tests/unit/test_exec.py::test_run_allowlisted_signature_default_is_none (unchanged, still green) |
| 7 | RED demonstrated at pytest first invocation (failure on strict-equality assertion); green after the exec.py literal change |
| 8 | ruff check / ruff format --check clean across src tests; mypy --strict src/codegenie clean; full pytest 1625 passed |
| 9 | tests/unit/test_exec.py::test_node_in_allowed_binaries + tests/unit/probes/test_deployment.py::test_allowed_binaries_invariant_phase2, both green against the twelve-entry set; closed-set regression parametrize unchanged |
| 10 | docs/phases/02-context-gather-layers-b-g/ADRs/0001-add-docker-and-security-cli-tools-to-allowed-binaries.md §Decision now states "ten new entries" and includes the ast-grep/ripgrep amendment block |

Final state. GREEN on first cycle; all 10 ACs PASS; all cross-cutting gates green; ready to commit.

Files touched (6).

  1. src/codegenie/exec.py — extended ALLOWED_BINARIES; updated comment block.
  2. tests/unit/exec/__init__.py — new (empty package marker).
  3. tests/unit/exec/test_allowed_binaries.py — new (20 cases).
  4. tests/unit/test_exec.py — updated test_node_in_allowed_binaries's expected set.
  5. tests/unit/probes/test_deployment.py — renamed + updated test_allowed_binaries_invariant_*.
  6. docs/phases/02-context-gather-layers-b-g/ADRs/0001-…md — Decision amended (eight → ten new entries).

Attempt 2 — 2026-05-15 — Pass-2 hardening (AC-2 rewrite + AC-11-16 added)

Trigger. The validator ran a Pass-2 four-critic deep audit that surfaced 2 block-tier and 5 harden-tier gaps that the Pass-1 light-touch validation missed. The story file gained AC-2-rewrite + AC-11/12/13/14/15/16 + AC-10-extended. Re-entered the executor mid-cycle to land them.

Sequence (additive on top of Attempt 1).

  1. AC-11 — module docstring pin. Inserted a "Phase 2 (02-ADR-0001) extends ALLOWED_BINARIES with the ten Layer B/C/G tools listed in …" paragraph into the top docstring of src/codegenie/exec.py. Added a meta-test test_exec_module_docstring_phase2_present that whitespace-normalizes codegenie.exec.__doc__ (source-file docstrings wrap at column 80, so a phrase can span a line break) and asserts "02-ADR-0001" AND "ten Layer B/C/G tools" are both present.

  2. AC-2 rewrite — ADR cross-document gate. Replaced the original prose-deferral with a meta-test test_adr_0001_enumerates_all_new_binaries that opens 02-ADR-0001.md, asserts "ten new entries" IS present, asserts "eight new entries" is NOT present, and asserts every binary in EXPECTED_NEW_BINARIES appears as a backticked identifier. First red run flagged that the original Amendment block AND two Consequences bullets still said "eight new entries" / "eight named entries" / "eight new entries"; reworded all three to remove the literal "eight new entries" string while preserving meaning. ADR is now self-consistent.

  3. AC-12 — env-strip parametric per new binary. Added test_env_strip_applies_to_each_new_binary parametrized over (binary, sensitive_key) ∈ ["docker", "semgrep"] × ["OPENAI_API_KEY", "AWS_SECRET_ACCESS_KEY", "GITHUB_TOKEN"] (6 cases). Each case captures the spawn-spy env dict AND the subproc.env_extra.sensitive_key_dropped structlog event at level warning. Catches the per-binary special-case mutant "if binary in NEW: env = os.environ.copy()".

  4. AC-13 — _RUNNING_PROCS cleanup pin. Added assert len(_RUNNING_PROCS) == 0 to test_new_binary_not_rejected_by_allowlist after the call returns or raises. Catches the "skip the finally: pop for new binaries" mutant; preserves Phase 7's coordinator-cancel pathway invariant.

  5. AC-14 — path-traversal regression for ten new binaries. Added test_new_binaries_reject_resolved_paths parametrized over 20 cases (/usr/bin/{b} + ./{b} for each of the 10 new binaries). All raise DisallowedSubprocessError BEFORE spawn; the spawn-spy assert_not_awaited() confirms Phase 0 invariant 1.

  6. AC-15 — closed-set negative-list extension. Extended tests/unit/test_exec.py::test_allowed_binaries_closed_set_regression's parametrize list from the original six (bash, sh, python, curl, wget, ssh) to fifteen — added [bwrap, bubblewrap, eval, exec, kill, chmod, chown, dd, nc]. The bwrap/bubblewrap entries pin the wrapper-pattern exception (02-ADR-0001 §Consequences) structurally; the other seven are adjacent dangerous binaries Phase 2 calls out as never-allowlisted. Supersedes Pass-1 AC-9's "parametrize list unchanged" instruction.

  7. AC-16 — AWS_* prefix-match coverage on a new binary. Added test_aws_prefix_match_strips_arbitrary_key_for_new_binary asserting AWS_FOO is dropped from docker-argv env AND the drop event fires at level warning. Exercises the _SENSITIVE_PREFIX tuple path (which AWS_ACCESS_KEY_ID etc. do not, because they're also in _SENSITIVE_EXACT).

  8. AC-10 extension — ADR §Tradeoffs row 2 + §Consequences bullet. Updated the CVE-feeds row from "Eight new CVE feeds" → "Ten new CVE feeds" with the full parenthetical (docker, syft, grype, gitleaks, semgrep, ast-grep, ripgrep, scip-typescript, tree-sitter, strace). Added a new §Consequences bullet recording the bwrap/bubblewrap-not-in-allowlist policy as a load-bearing decision with reference back to AC-15's structural pin.

  9. Style alignment (Notes-for-implementer follow-through). Refactored the env-strip mocks in test_allowed_binaries.py from patch.object(_aio, "create_subprocess_exec", fake_exec) to the family-precedent monkeypatch.setattr(asyncio, "create_subprocess_exec", spy) shape via a module-private _make_spawn_spy(monkeypatch) helper. Mocking style now matches the eight precedents in tests/unit/test_exec.py per Rule 11.
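
The env-strip behaviour pinned by AC-12 and AC-16 can be sketched as follows. This is an illustration under stated assumptions, not the code in src/codegenie/exec.py: the names _SENSITIVE_EXACT and _SENSITIVE_PREFIX come from this log, but their contents here are guesses, strip_sensitive is a hypothetical helper, and the real chokepoint emits a structlog warning where this sketch just returns the dropped keys.

```python
# Assumption: illustrative contents only; the real constants may differ.
_SENSITIVE_EXACT = frozenset({
    "OPENAI_API_KEY",
    "GITHUB_TOKEN",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
})
_SENSITIVE_PREFIX = ("AWS_",)


def strip_sensitive(env: dict) -> tuple:
    """Return (child_env, dropped_keys) for a parent environment mapping.

    A key is dropped if it matches an exact name or starts with any
    sensitive prefix. The binary being spawned plays no part in the
    decision, which is the property the per-binary parametrization pins.
    """
    kept = {}
    dropped = []
    for key, value in env.items():
        # str.startswith accepts a tuple of prefixes.
        if key in _SENSITIVE_EXACT or key.startswith(_SENSITIVE_PREFIX):
            dropped.append(key)
        else:
            kept[key] = value
    return kept, sorted(dropped)
```

Under these assumptions AWS_ACCESS_KEY_ID is caught by the exact-match branch before the prefix branch is consulted, so only an arbitrary key such as AWS_FOO exercises the prefix path, which matches the reasoning given for AC-16.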

Lint/type fixes during cycle.

  • ruff E501 on long docstring path: wrapped docs/phases/02-context-gather-layers-b-g/ADRs/0001-…md across two source lines (visible as one path in rendered docstring).
  • mypy _Call | None attribute access on spy.await_args.kwargs: added assert spy.await_args is not None before each of the three call sites (matches the upstream mock.AsyncMock typing).
  • Removed the trailing _ = Any sentinel and, with it, the now-unused from typing import Any; ruff F401 stays clean.
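
The mypy fix in the second bullet follows a common narrowing pattern. A minimal sketch using only the standard library; run_child is a hypothetical stand-in for the code under test, and the env value is made up for illustration:

```python
import asyncio
from unittest import mock


async def run_child(spawn, binary: str) -> None:
    """Hypothetical code under test: awaits the injected spawn callable."""
    await spawn(binary, env={"PATH": "/usr/bin"})


spy = mock.AsyncMock()
asyncio.run(run_child(spy, "docker"))

# await_args is None until the mock has been awaited, so type checkers
# see an optional value. The assert narrows it before attribute access.
assert spy.await_args is not None
child_env = spy.await_args.kwargs["env"]
```

The assert doubles as a runtime guard: if the spawn path were never reached, the test fails at the assertion with a clear cause instead of an AttributeError on None.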

Final gates (post-Pass-2).

  • ruff check src tests → All checks passed!
  • ruff format --check src tests → 189 files already formatted.
  • mypy --strict src/codegenie tests/unit/exec → Success: 63 source files clean.
  • pytest (full suite) → 1663 passed, 1 skipped, 2 xfailed (Pass-1 was 1625 passed; +38 = 6 from AC-12 + 20 from AC-14 + 1 from AC-16 + 9 new closed-set cases from AC-15 + 2 ADR/docstring meta-tests; AC-13 adds an assertion to the existing 10 cases and contributes no new tests).

Runtime evidence (Ralph Wiggum pass — Pass 2).

| AC | Evidence (file::function) |
|----|---------------------------|
| 2 (rewrite) | tests/unit/exec/test_allowed_binaries.py::test_adr_0001_enumerates_all_new_binaries |
| 10 (extended) | ADR §Tradeoffs row 2 says "Ten new CVE feeds"; ADR §Consequences has bwrap-policy bullet (verified by AC-2's meta-test ∋ "ten new entries" / ∌ "eight new entries") |
| 11 | tests/unit/exec/test_allowed_binaries.py::test_exec_module_docstring_phase2_present |
| 12 | tests/unit/exec/test_allowed_binaries.py::test_env_strip_applies_to_each_new_binary (6 cases) |
| 13 | tests/unit/exec/test_allowed_binaries.py::test_new_binary_not_rejected_by_allowlist (10 cases; _RUNNING_PROCS empty assertion) |
| 14 | tests/unit/exec/test_allowed_binaries.py::test_new_binaries_reject_resolved_paths (20 cases) |
| 15 | tests/unit/test_exec.py::test_allowed_binaries_closed_set_regression (15 cases) |
| 16 | tests/unit/exec/test_allowed_binaries.py::test_aws_prefix_match_strips_arbitrary_key_for_new_binary |

Files touched (Pass 2 delta). No new files created in Pass 2; all changes landed in files already touched in Pass 1, plus expanded test coverage.

Final state (story-wide). All 16 ACs PASS; all cross-cutting gates green (1663 tests, 0 failures); ready to commit.