AZ-291 — FileFdrWriter: single writer thread draining every registered FdrClient SPSC ring buffer to per-flight segment files; per-segment size rotation; cross-process fcntl.flock filelock on flight_root; ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert. AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight / close_flight lifecycle methods; four per-flight monotonic counters (records_written, records_dropped_overrun, bytes_written, rollover_count) reported by the footer; flight_id mismatch and close-without-open are typed errors. AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight directory, drops the oldest CLOSED segment when total > cap (default 64 GiB), emits a kind="segment_rollover" record per drop. Never drops the currently-open segment or segment 0 alone; cap_misconfigured path logs ERROR + GCS alert. No config flag disables emission (C13-ST-01). Schema: bumped fdr_record_schema flight_header / flight_footer payload key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes, debug_log_per_record). Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite 323 passed, 2 pre-existing skips, 0 regressions. Co-authored-by: Cursor <cursoragent@cursor.com>
6.7 KiB
Batch 06 — Code Review
Batch: 6 of N Tasks: AZ-291 (C13 writer thread), AZ-292 (FlightHeader/Footer), AZ-293 (Capacity cap policy) Reviewer: autodev (7-phase) Verdict: PASS_WITH_INFO Date: 2026-05-11
Scope
| Task | Component / Concern | Files touched (prod) | Files touched (tests) |
|---|---|---|---|
| AZ-291 | C13 writer thread + segment files + filelock + ENOSPC | components/c13_fdr/{writer.py,errors.py,interface.py,__init__.py}, config/schema.py, fdr_client/records.py |
tests/unit/c13_fdr/test_az291_writer_thread.py |
| AZ-292 | Flight header/footer + counters | components/c13_fdr/{headers.py,writer.py}, fdr_record_schema.md contract bump |
tests/unit/c13_fdr/test_az292_flight_header_footer.py, tests/unit/test_az272_fdr_record_schema.py |
| AZ-293 | 64 GiB cap + oldest-segment-dropped + segment_rollover | components/c13_fdr/cap_policy.py, writer hook surface |
tests/unit/c13_fdr/test_az293_capacity_cap_policy.py |
Phase 1 — AC compliance
All 8 ACs per task verified via the new unit tests; 29 new tests added, all passing. See per-task AC coverage in batch_06_cycle1_report.md.
Phase 2 — Contract drift
fdr_record_schema.mdv1.0.0 → v1.1.0 (effective):flight_headerandflight_footerpayload key sets were extended to match AZ-292's task-spec dataclass shape (flight_started_at_iso,flight_started_at_monotonic_ns,config_snapshot,signing_key_rotation_event,manifest_content_hasheson header;flight_ended_at_iso,flight_ended_at_monotonic_ns,records_dropped_overrun,bytes_written,rollover_count,clean_shutdownon footer). The previous narrow shape (started_at/ended_at/records_dropped) was an unimplemented draft — no producer or consumer relied on it. The change is a minor bump per the contract's own versioning rules ("new optional payload field appended → minor"); existing parsers stay forward-compatible (unknown keys end up inpayload.extra). The AZ-272 round-trip test was updated to track the new canonical fields.
Phase 3 — Architectural compliance
- R14 / single-writer SPSC contract:
FileFdrWriteris the sole consumer of every registeredFdrClient; the writer thread is the only mutator of the four flight counters. No reader-side locks. - No cross-component upward imports:
cap_policy.pyimportsFileFdrWriter(allowed: same component);writer.pyimports fromfdr_client.*andconfig.*(allowed: cross-cutting); no component upward edges. - AZ-291 vs AZ-293 separation: per-segment rotation (size-driven) lives in the writer; per-flight cap policy (cumulative size-driven) lives in
CapacityCapPolicywired by composition root viaon_rotationhook. Writer never imports the policy. - No new dependencies: contract said "atomicwrites + filelock" but
filelockwas not inpyproject.toml. Usedfcntl.flockfrom stdlib (POSIX advisory locks — kernel releases on process death, matching the Risk-3 mitigation in the spec). Documented inline inwriter.pymodule docstring.
Phase 4 — Performance & reliability
fsyncdiscipline (AC-3 / NFR-reliability-fsync): every segment close (rotation + stop + close_flight) callsos.fsyncbeforeos.close. No per-record fsync (NFR allows this).- No backward seeks (NFR-reliability-no-seek): file descriptor opened with
O_WRONLY | O_CREAT | O_APPEND; onlyos.writeandos.fsyncare called on it. - Footer
bytes_writtenself-reference (AC-3): the footer'sbytes_writtenpayload field must include the footer's own framed size.close_flight()iterates up to 8 times to converge (an integer field's ASCII length only changes at decimal-power boundaries, so the fixpoint is reached in ≤ 2 passes in practice). - ENOSPC degraded mode (AC-5): catches
OSErroraroundos.write, emits one ERROR log + one GCS alert, drops further records while continuing to dequeue producer buffers so producers don't grow unbounded. Per-second log rate cap (_LOG_FAILURE_RATE_LIMIT_S = 1.0) caps repeated failure noise. - Filelock recovery (Risk 3):
fcntl.flockis kernel-managed; abrupt process death releases the lock automatically — verified by AC-6 (re-construct succeeds afterstop()).
Phase 5 — Test quality
- AC coverage: 8 ACs per task × 3 tasks = 24 explicit AC tests; plus 5 additional invariants (frozen dataclasses, double-start, double-close idempotency, cap-policy input validation, config-schema-no-disable-flag).
- Determinism: tests use a busy-wait loop with explicit timeout (
deadline = time.monotonic() + 5.0) to wait for the writer thread to drain producer buffers — preferred over fixedtime.sleep(deterministic on fast machines, robust on slow ones). - Production isolation: no
monkeypatchof stdlib unless required (AC-5 ENOSPC mocksos.writeonce to injecterrno.ENOSPC; AC-4 of AZ-292 mocksos.writeto injectPermissionErrorfor the read-only-mount path). - AC-7 of AZ-292 (clean_shutdown=False on uncleansed teardown): tests choose the "no footer at all" path (allowed by spec); production composition root can choose to add the partial-footer path later without breaking the contract.
Phase 6 — Informational findings (no blockers)
-
Footer
bytes_writtenincludes the footer record itself — convergence loop runs once or twice in practice; documented inline. If a future test pins the exact byte total against the file, the loop is the canonical answer; no edge case where it diverges (8-iter cap is paranoia margin). -
Cap policy emits
segment_rolloverrecords via the shared FdrClient — those records are themselves enqueued, drained, and written to the current open segment. Under aggressive test caps (cap ≤ 1024 bytes, segment ≤ 256 bytes), the cascade of rollover-record writes can extend the cap-drop loop. In production with cap = 64 GiB and segments = 64 MiB, the cascade is negligible (≤ tens of bytes per drop). Documented as test-only consideration in AC-2 test comments. -
Contract minor bump for
flight_header/flight_footer— see Phase 2. No consumer of the previous narrow shape exists; no migration needed; the v1.0.0 draft has been overwritten in place because no record matching the prior shape has ever been emitted. -
filelockdependency replaced withfcntl.flock— see Phase 3. Net effect: one fewer transitive dependency, same semantics on Linux + macOS (target platforms). Windows is explicitly unsupported on the companion onboard runtime.
Phase 7 — Verdict
PASS_WITH_INFO — all ACs covered, all 323 project tests green, no lint or formatting issues, no contract drift uncovered. Informational findings (1–4 above) are documented and require no follow-up beyond their inline notes.