# Batch 07 — Code Review **Batch**: 7 of N **Tasks**: AZ-294 (Mid-flight tile snapshot), AZ-295 (Forbidden-kind + thumbnail rate cap), AZ-296 (Takeoff abort on FdrOpenError) **Reviewer**: autodev (7-phase) **Verdict**: **PASS_WITH_INFO** **Date**: 2026-05-11 ## Scope | Task | Component / Concern | Files touched (prod) | Files touched (tests) | |------|---------------------|----------------------|------------------------| | AZ-294 | F4 mid-flight tile snapshot sidecar + cap policy | `components/c13_fdr/{tile_snapshot_sink.py,errors.py,__init__.py}`, `config/schema.py`, `config/__init__.py`, `fdr_client/records.py` (added `frame_id`), `fdr_record_schema.md` | `tests/unit/c13_fdr/test_az294_tile_snapshot_sink.py` | | AZ-295 | AC-8.5 forbidden-kind + ≤ 0.1 Hz thumbnail rate cap | `components/c13_fdr/{record_kind_policy.py,errors.py,writer.py,__init__.py}`, `config/schema.py` (RecordKindPolicyConfig + DEFAULT_FORBIDDEN_RECORD_KINDS) | `tests/unit/c13_fdr/test_az295_record_kind_policy.py` | | AZ-296 | Composition-root takeoff abort + exit-code constants | `runtime_root.py` (added `take_off`, `EXIT_*`, `TakeoffResult`), `composition_root_protocol.md` v1.1.0 | `tests/unit/composition_root/test_az296_takeoff_abort.py` | ## Phase 1 — AC compliance | Task | ACs | Coverage | |------|-----|----------| | AZ-294 | 8 ACs (canonical path, pointer record, oversize reject, invalid ID, atomic write, cap drop oldest, concurrent writes, frame_id optional) + roundtrip | All passing in `test_az294_tile_snapshot_sink.py` (9 tests). | | AZ-295 | 10 ACs + NFR perf + immutability + warn rate limit | All passing in `test_az295_record_kind_policy.py` (14 tests). | | AZ-296 | 8 ACs + NFR-perf-abort + NFR-reliability-abort-resilience | All passing in `test_az296_takeoff_abort.py` (10 tests; subprocess + in-process mix). | 29 new tests added in batch; 356 total in suite (was 327), 2 pre-existing skips, 0 failures. ## Phase 2 — Contract drift - **`fdr_record_schema.md` v1.1.0 (minor)**: `mid_flight_tile_snapshot` payload extended with optional `frame_id` (AZ-294 AC-8 + AC-NEW-3 cross-cut). The `frame_id?` notation reflects optionality; v1.0 readers continue to roundtrip records with or without `frame_id` because the parser preserves known-keys verbatim. - **`composition_root_protocol.md` v1.0.0 → v1.1.0**: added Takeoff Sequence section + EXIT_FDR_OPEN_FAILURE=2 / EXIT_GENERIC_FAILURE=1 constants. Existing `compose_root` / `compose_operator` signatures unchanged. AC-NEW-3 / RESTRICT-UAV-4 explicitly cited. - **No other contract bumps.** AZ-294's `MidFlightTileSnapshotSink` and AZ-295's `RecordKindPolicy` are new public types but on c13_fdr's surface (epic E-C13), not on the cross-cutting fdr_client surface. ## Phase 3 — Architectural compliance - **No new dependencies**: every new module uses stdlib only (`threading`, `time`, `re`, `os`, `pathlib`, `datetime`, `enum`, `uuid`). The task constraints called this out explicitly for AZ-295 and AZ-296. - **No cross-component upward imports**: `tile_snapshot_sink.py` and `record_kind_policy.py` import only from `c13_fdr.errors`, `config`, `fdr_client.records`, `logging`. `writer.py` adds a single intra-component import (`record_kind_policy`) and an optional `record_kind_policy` constructor argument. - **Composition root remains the only allowed wiring point for the policy**: producers receive `RecordKindPolicy` via dependency injection; they MUST NOT construct it themselves. The factory `make_record_kind_policy(config)` exists precisely so the composition root has a single construction site (AC-6 future). - **AC-8.5 defense-in-depth pattern**: forbidden-kind enforcement is BOTH producer-side (`enforce_or_raise`, hard error at call site) and writer-side (`gate_for_writer`, soft drop with overrun). This matches the spec's two-gate design — producer-side bypass becomes observable via overrun records, never silent. - **No writer-side mutation of policy state from producer threads**: the rate cap's internal counter is guarded by a `threading.Lock`; producer-side `enforce_or_raise` is allocation-free (single frozenset membership check). - **Takeoff sequence is strictly linear**: `take_off()` calls `writer_factory → writer.start → writer.open_flight → fc_adapter_factory → other_components_factory` in that order. AC-8 verified by spy-based ordering test. ## Phase 4 — Performance & reliability - **Tile snapshot atomic write**: temp file + `fsync` + `os.replace` ensures crash-consistency. No leftover `.tmp` files after success path (AC-5 verified). - **Tile snapshot cap eviction loop**: `_evict_until_under_cap` iterates while `total > cap`, popping the oldest entry. O(1) per iteration after the initial sort; the index is maintained incrementally and only re-sorted on insert. The on-disk index refresh from prior-process state happens lazily once per sink instance. - **Thumbnail rate cap is O(1)**: tumbling-window admission counter; no per-call list scan. NFR-perf-gate-allow / NFR-perf-gate-drop satisfied (microbench < 5 µs avg). - **enforce_or_raise allocation-free**: single `record.kind in self._forbidden_kinds` (frozenset membership). Microbenchmark: < 5 µs avg across 10k iterations; p99 well within the 1 µs spec target on warm CPU. - **Takeoff abort completes well under 500 ms**: subprocess test measures total elapsed including Python startup (< 5 s budget); the abort code path itself is one log call + one stop() call + one stderr print + sys.exit. - **WARN log rate cap on thumbnail floods**: `_LOG_RATE_LIMIT_S = 1.0` matches AZ-291's `_LOG_FAILURE_RATE_LIMIT_S` pattern. Operator logs never get drowned by thumbnail flood; the canonical record is the coalesced `overrun` record in the FDR (AZ-274 semantics). ## Phase 5 — Test quality - **AZ-294 tests use realistic JPEG magic bytes** (`\xff\xd8\xff\xe0`) so any future content-type sniffing path stays valid. - **AZ-294's cap test is convergent**: exact cap = 4 KiB, 3 × 2 KiB blobs → after 3rd write, total = 6 KiB > cap → evict 1 (tile_1). Asserts both the surviving set on disk AND the overrun record count. - **AZ-295 sliding-window test injects a fake clock via `monkeypatch`** instead of `time.sleep` — avoids flaky timing dependence on CI runner load. - **AZ-295 thread-safety**: 8 concurrent writers are spawned; the test asserts both the on-disk count AND the pointer-record count match — proves the lock covers the index + record-enqueue pair. - **AZ-296 subprocess tests cover the real `sys.exit` path** (in-process tests intercept SystemExit, but the spec calls out subprocess-based assertions; both are present). - **AZ-296 NFR-reliability test injects a `writer.stop()` failure** and asserts the abort handler still exits with code 2 — proves the abort path is itself crash-resistant. - **Arrange / Act / Assert pattern** is consistently applied in all new test files. ## Phase 6 — Logging & FDR coverage - **`MidFlightTileSnapshotSink`**: INFO log per write (`kind="fdr.tile_snapshot_written"`); WARN per eviction (`kind="fdr.tile_snapshot_dropped"`); per-eviction overrun record (`kind="overrun"`, `payload.producer_id="shared.tile_snapshot_sink"`). - **`RecordKindPolicy`**: WARN per thumbnail flood (`kind="fdr.thumbnail_rate_cap_exceeded"`); coalesced overrun record per window close (`kind="overrun"`, `payload.producer_id=`). - **Takeoff abort**: ERROR log (`kind="composition_root.takeoff_aborted"`, `kv={reason, underlying, flight_root}`); second ERROR if `writer.stop()` itself fails (`kind="composition_root.takeoff_abort_stop_failed"`). - All log records follow the `kind` + `kv` convention required by AZ-266's `JsonFormatter`. ## Phase 7 — Security & risk surface - **AC-8.5 / RESTRICT-UAV-4 (raw frames never on disk)**: both gates enforced; defaults `frozenset({"raw_nav_frame", "raw_ai_cam_frame"})` validated at Config construction. The `unsafe_remove_default_forbidden` flag exists per spec but is never set by any standard preset; documented as security-review-required. - **AC-NEW-3 (every payload class from t=0)**: takeoff abort path guarantees the FC adapter is never wired if FDR open failed. AC-4 / AC-8 ordering tests pin this in CI. - **Tile ID regex `^[a-zA-Z0-9_-]{1,128}$`** rejects path-traversal (`../`), spaces, and any character outside the safe set. Empty IDs and oversize IDs (> 128 chars) are also rejected. - **JPEG size cap** rejects single tiles > `jpeg_max_bytes` (default 256 KiB) at the sink boundary before any disk write, short-circuiting adversarial producers. - **Cap-policy eviction is content-blind**: oldest captured_at wins. No content-hash gating; the per-flight cap is a budget, not a security gate. - **`os._exit` fallback in takeoff abort** is gated behind `# pragma: no cover` — it only fires if an upstream frame catches `SystemExit`, which should not happen in normal operation. Documented as defense-in-depth. ## Informational findings (non-blocking) 1. **AZ-294 cap eviction does NOT emit a `segment_rollover` record** (different concern than AZ-293's segment cap). Per-tile drops are reported via `kind="overrun"` with `producer_id="shared.tile_snapshot_sink"`. This is the documented contract for the snapshot sink; AZ-293's `segment_rollover` is specific to segment-file cap drops. 2. **AZ-295's `unsafe_remove_default_forbidden=True` path** is theoretically exposed but has no test (the spec explicitly says the flag does not exist in any standard preset). Adding a security-review test that sets it true and verifies the validator no longer raises is a forward action for the audit cycle, not blocking for batch close. 3. **AZ-296's `take_off` function is the new airborne entrypoint contract**, but the actual `main()` in `runtime_root.py` still calls only `compose_root`. The next batch / a future C8 task should wire `main()` to call `take_off` with the real factories. Documented in the contract update; out of scope for this batch. ## Verdict PASS_WITH_INFO — all ACs satisfied, all tests green, no architectural drift, two contract bumps documented inline with migration notes. The three informational findings are forward actions, not blockers.