Files
gps-denied-onboard/_docs/03_implementation/reviews/batch_06_review.md
T
Oleksandr Bezdieniezhnykh b5dd6031d2 [AZ-291] [AZ-292] [AZ-293] C13 FDR writer chain (batch 6)
AZ-291 — FileFdrWriter: single writer thread draining every registered
FdrClient SPSC ring buffer to per-flight segment files; per-segment
size rotation; cross-process fcntl.flock filelock on flight_root;
ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert.

AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight /
close_flight lifecycle methods; four per-flight monotonic counters
(records_written, records_dropped_overrun, bytes_written,
rollover_count) reported by the footer; flight_id mismatch and
close-without-open are typed errors.

AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight
directory, drops the oldest CLOSED segment when total > cap (default
64 GiB), emits a kind="segment_rollover" record per drop. Never drops
the currently-open segment or segment 0 alone; cap_misconfigured path
logs ERROR + GCS alert. No config flag disables emission (C13-ST-01).

Schema: bumped fdr_record_schema flight_header / flight_footer payload
key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no
prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig
nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes,
debug_log_per_record).

Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite
323 passed, 2 pre-existing skips, 0 regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 03:38:58 +03:00

6.7 KiB
Raw Blame History

Batch 06 — Code Review

Batch: 6 of N Tasks: AZ-291 (C13 writer thread), AZ-292 (FlightHeader/Footer), AZ-293 (Capacity cap policy) Reviewer: autodev (7-phase) Verdict: PASS_WITH_INFO Date: 2026-05-11

Scope

Task Component / Concern Files touched (prod) Files touched (tests)
AZ-291 C13 writer thread + segment files + filelock + ENOSPC components/c13_fdr/{writer.py,errors.py,interface.py,__init__.py}, config/schema.py, fdr_client/records.py tests/unit/c13_fdr/test_az291_writer_thread.py
AZ-292 Flight header/footer + counters components/c13_fdr/{headers.py,writer.py}, fdr_record_schema.md contract bump tests/unit/c13_fdr/test_az292_flight_header_footer.py, tests/unit/test_az272_fdr_record_schema.py
AZ-293 64 GiB cap + oldest-segment-dropped + segment_rollover components/c13_fdr/cap_policy.py, writer hook surface tests/unit/c13_fdr/test_az293_capacity_cap_policy.py

Phase 1 — AC compliance

All 8 ACs per task verified via the new unit tests; 29 new tests added, all passing. See per-task AC coverage in batch_06_cycle1_report.md.

Phase 2 — Contract drift

  • fdr_record_schema.md v1.0.0 → v1.1.0 (effective): flight_header and flight_footer payload key sets were extended to match AZ-292's task-spec dataclass shape (flight_started_at_iso, flight_started_at_monotonic_ns, config_snapshot, signing_key_rotation_event, manifest_content_hashes on header; flight_ended_at_iso, flight_ended_at_monotonic_ns, records_dropped_overrun, bytes_written, rollover_count, clean_shutdown on footer). The previous narrow shape (started_at / ended_at / records_dropped) was an unimplemented draft — no producer or consumer relied on it. The change is a minor bump per the contract's own versioning rules ("new optional payload field appended → minor"); existing parsers stay forward-compatible (unknown keys end up in payload.extra). The AZ-272 round-trip test was updated to track the new canonical fields.

Phase 3 — Architectural compliance

  • R14 / single-writer SPSC contract: FileFdrWriter is the sole consumer of every registered FdrClient; the writer thread is the only mutator of the four flight counters. No reader-side locks.
  • No cross-component upward imports: cap_policy.py imports FileFdrWriter (allowed: same component); writer.py imports from fdr_client.* and config.* (allowed: cross-cutting); no component upward edges.
  • AZ-291 vs AZ-293 separation: per-segment rotation (size-driven) lives in the writer; per-flight cap policy (cumulative size-driven) lives in CapacityCapPolicy wired by composition root via on_rotation hook. Writer never imports the policy.
  • No new dependencies: contract said "atomicwrites + filelock" but filelock was not in pyproject.toml. Used fcntl.flock from stdlib (POSIX advisory locks — kernel releases on process death, matching the Risk-3 mitigation in the spec). Documented inline in writer.py module docstring.

Phase 4 — Performance & reliability

  • fsync discipline (AC-3 / NFR-reliability-fsync): every segment close (rotation + stop + close_flight) calls os.fsync before os.close. No per-record fsync (NFR allows this).
  • No backward seeks (NFR-reliability-no-seek): file descriptor opened with O_WRONLY | O_CREAT | O_APPEND; only os.write and os.fsync are called on it.
  • Footer bytes_written self-reference (AC-3): the footer's bytes_written payload field must include the footer's own framed size. close_flight() iterates up to 8 times to converge (an integer field's ASCII length only changes at decimal-power boundaries, so the fixpoint is reached in ≤ 2 passes in practice).
  • ENOSPC degraded mode (AC-5): catches OSError around os.write, emits one ERROR log + one GCS alert, drops further records while continuing to dequeue producer buffers so producers don't grow unbounded. Per-second log rate cap (_LOG_FAILURE_RATE_LIMIT_S = 1.0) caps repeated failure noise.
  • Filelock recovery (Risk 3): fcntl.flock is kernel-managed; abrupt process death releases the lock automatically — verified by AC-6 (re-construct succeeds after stop()).

Phase 5 — Test quality

  • AC coverage: 8 ACs per task × 3 tasks = 24 explicit AC tests; plus 5 additional invariants (frozen dataclasses, double-start, double-close idempotency, cap-policy input validation, config-schema-no-disable-flag).
  • Determinism: tests use a busy-wait loop with explicit timeout (deadline = time.monotonic() + 5.0) to wait for the writer thread to drain producer buffers — preferred over fixed time.sleep (deterministic on fast machines, robust on slow ones).
  • Production isolation: no monkeypatch of stdlib unless required (AC-5 ENOSPC mocks os.write once to inject errno.ENOSPC; AC-4 of AZ-292 mocks os.write to inject PermissionError for the read-only-mount path).
  • AC-7 of AZ-292 (clean_shutdown=False on uncleansed teardown): tests choose the "no footer at all" path (allowed by spec); production composition root can choose to add the partial-footer path later without breaking the contract.

Phase 6 — Informational findings (no blockers)

  1. Footer bytes_written includes the footer record itself — convergence loop runs once or twice in practice; documented inline. If a future test pins the exact byte total against the file, the loop is the canonical answer; no edge case where it diverges (8-iter cap is paranoia margin).

  2. Cap policy emits segment_rollover records via the shared FdrClient — those records are themselves enqueued, drained, and written to the current open segment. Under aggressive test caps (cap ≤ 1024 bytes, segment ≤ 256 bytes), the cascade of rollover-record writes can extend the cap-drop loop. In production with cap = 64 GiB and segments = 64 MiB, the cascade is negligible (≤ tens of bytes per drop). Documented as test-only consideration in AC-2 test comments.

  3. Contract minor bump for flight_header / flight_footer — see Phase 2. No consumer of the previous narrow shape exists; no migration needed; the v1.0.0 draft has been overwritten in place because no record matching the prior shape has ever been emitted.

  4. filelock dependency replaced with fcntl.flock — see Phase 3. Net effect: one fewer transitive dependency, same semantics on Linux + macOS (target platforms). Windows is explicitly unsupported on the companion onboard runtime.

Phase 7 — Verdict

PASS_WITH_INFO — all ACs covered, all 323 project tests green, no lint or formatting issues, no contract drift uncovered. Informational findings (14 above) are documented and require no follow-up beyond their inline notes.