gps-denied-onboard/_docs/03_implementation/batch_06_cycle1_report.md
Oleksandr Bezdieniezhnykh b5dd6031d2 [AZ-291] [AZ-292] [AZ-293] C13 FDR writer chain (batch 6)
AZ-291 — FileFdrWriter: single writer thread draining every registered
FdrClient SPSC ring buffer to per-flight segment files; per-segment
size rotation; cross-process fcntl.flock filelock on flight_root;
ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert.

AZ-292 — FlightHeader/FlightFooter dataclasses + open_flight /
close_flight lifecycle methods; four per-flight monotonic counters
(records_written, records_dropped_overrun, bytes_written,
rollover_count) reported by the footer; flight_id mismatch and
close-without-open are typed errors.

AZ-293 — CapacityCapPolicy (post-rotation hook): walks the flight
directory, drops the oldest CLOSED segment when total > cap (default
64 GiB), emits a kind="segment_rollover" record per drop. Never drops
the currently-open segment or segment 0 alone; cap_misconfigured path
logs ERROR + GCS alert. No config flag disables emission (C13-ST-01).

Schema: bumped fdr_record_schema flight_header / flight_footer payload
key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no
prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig
nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes,
debug_log_per_record).

Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite
323 passed, 2 pre-existing skips, 0 regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 03:38:58 +03:00


Batch 06 — Implementation Report (Cycle 1)

Tasks: AZ-291, AZ-292, AZ-293
Component: C13 FDR Writer (E-C13)
Cycle: 1 (Build → Ship)
Date: 2026-05-11

Summary

Built the C13 FDR writer chain end-to-end. AZ-291 lands the single writer thread + segment file lifecycle + cross-process filelock + ENOSPC degraded mode. AZ-292 lands the FlightHeader / FlightFooter records and the four per-flight counters (records_written, records_dropped_overrun, bytes_written, rollover_count) that make a flight directory self-describing. AZ-293 lands the per-flight 64 GiB cap policy with oldest-segment-dropped + canonical segment_rollover record emission.

The three tasks share a single module (components/c13_fdr/) with these new files:

  • errors.py: five typed exceptions covering construction, open, close, and concurrent-writer failure paths.
  • headers.py: FlightHeader and FlightFooter frozen dataclasses.
  • writer.py: FileFdrWriter (AZ-291 + AZ-292).
  • cap_policy.py: CapacityCapPolicy (AZ-293).
  • __init__.py, interface.py: re-exports.

Features Landed

AZ-291 — Writer thread + segment lifecycle

  • FileFdrWriter(flight_root, flight_id, config, fdr_clients, gcs_alert, *, on_rotation, drain_sleep_s) constructor.
  • start(), stop(), open_flight(header), close_flight() lifecycle methods.
  • Background writer thread that loops over every registered FdrClient.drain(batch_size) and writes serialised records to the current segment with <uint32-LE length prefix> | <serialised body> framing.
  • Per-segment rotation triggered by segment_size_bytes (default 64 MiB).
  • Cross-process filelock via fcntl.flock(LOCK_EX | LOCK_NB) on flight_root/.fdr.lock; held for the entire flight; constructor-time FdrConcurrentWriterError on contention.
  • ENOSPC degraded mode: one ERROR log + one GCS alert; subsequent failures are log-rate-capped at 1/sec; producer buffers keep draining (records discarded) so producer-side memory does not grow unbounded.
  • Public introspection: current_segment_path(), current_segment_bytes(), segments_written(), is_rolling(), is_degraded(), current_size_bytes(), rollover_count, records_dropped_overrun, flight_id, flight_dir.
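The `<uint32-LE length prefix> | <serialised body>` framing described above can be illustrated with a minimal sketch. The helper names `write_frame` and `read_frames` are hypothetical and not taken from the C13 module, and the JSON-looking bodies are placeholders, since the report does not state the body serialisation.

```python
import io
import struct
from typing import BinaryIO, Iterator

_LEN = struct.Struct("<I")  # uint32, little-endian length prefix


def write_frame(out: BinaryIO, body: bytes) -> int:
    """Append one length-prefixed record and return the bytes written."""
    out.write(_LEN.pack(len(body)))
    out.write(body)
    return _LEN.size + len(body)


def read_frames(src: BinaryIO) -> Iterator[bytes]:
    """Yield record bodies in order; stop at EOF or at a truncated tail."""
    while True:
        prefix = src.read(_LEN.size)
        if len(prefix) < _LEN.size:
            return
        (length,) = _LEN.unpack(prefix)
        body = src.read(length)
        if len(body) < length:
            return  # half-written record, e.g. after a crash mid-append
        yield body


# Round trip through an in-memory "segment" (placeholder bodies).
segment = io.BytesIO()
write_frame(segment, b'{"kind": "flight_header"}')
write_frame(segment, b'{"kind": "telemetry"}')
segment.seek(0)
records = list(read_frames(segment))
```

A reader built this way can walk a segment without knowing the record schema, and a truncated final frame is detected by comparing the prefix with the bytes actually available.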

AZ-292 — FlightHeader / FlightFooter + counters

  • FlightHeader dataclass with flight_id, flight_started_at_iso, flight_started_at_monotonic_ns, config_snapshot, signing_key_rotation_event, manifest_content_hashes, build_info.
  • FlightFooter dataclass with flight_id, flight_ended_at_iso, flight_ended_at_monotonic_ns, records_written, records_dropped_overrun, bytes_written, rollover_count, clean_shutdown.
  • open_flight(header) writes the header as the first record of segment 0; rejects flight_id mismatch with FdrOpenError.
  • close_flight() drains pending producer records, builds the footer (iteratively converging bytes_written to include the footer's own size), writes it, releases the filelock, and returns the FlightFooter to the caller. Idempotent (a second call returns the cached footer).
  • Counter integration: _append_record increments _records_written and _bytes_written; _observe_overrun_record aggregates payload.dropped_count into _records_dropped_overrun; _rotate_segment bumps _rollover_count.
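The "iteratively converging bytes_written" step in close_flight() is a small fixed-point problem: the footer reports a total that includes the footer's own serialised size, and that size can change with the number of digits in the total. The sketch below shows one way to converge, assuming a JSON body and the 4-byte length prefix; the function name `converge_footer` and the JSON encoding are assumptions, not the writer's actual code.

```python
import json

PREFIX_BYTES = 4  # uint32-LE length prefix in front of every record


def converge_footer(bytes_before: int, fields: dict, max_iters: int = 8) -> bytes:
    """Serialise a footer whose bytes_written includes the footer itself.

    bytes_before is the byte count on disk before the footer is appended.
    """
    total = bytes_before
    for _ in range(max_iters):
        body = json.dumps({**fields, "bytes_written": total},
                          sort_keys=True).encode("utf-8")
        new_total = bytes_before + PREFIX_BYTES + len(body)
        if new_total == total:
            return body  # the reported total now matches the on-disk total
        total = new_total
    raise RuntimeError("footer size did not converge")


footer_body = converge_footer(1000, {"flight_id": "f1", "clean_shutdown": True})
reported = json.loads(footer_body)["bytes_written"]
```

The loop terminates quickly because the body length only grows when the total gains a digit, so the guess increases monotonically and settles after a few passes.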

AZ-293 — Capacity cap policy

  • CapacityCapPolicy(cap_bytes, fdr_client, gcs_alert) callable; invoked by FileFdrWriter via the on_rotation hook after every per-segment rotation.
  • Walks the flight directory, sums on-disk segment sizes + writer's running current_segment_bytes, and unlinks the oldest CLOSED segment if total > cap. Repeats until under cap.
  • Segment 0 (containing the flight_header) is protected: when it is the only remaining candidate and the directory is still over cap, the policy does not drop it. Instead it logs fdr.cap_misconfigured at ERROR, emits one GCS alert, and lets the flight continue in degraded mode.
  • Each drop enqueues a kind="segment_rollover" FdrRecord (envelope producer_id="shared.fdr_client") carrying old_segment, new_segment, total_bytes_after; bumps writer.rollover_count; logs fdr.cap_drop INFO.
  • Default cap_bytes = 64 * 1024**3 (64 GiB exactly per AC-NEW-3 + AC-7); valid range [1024, 2**40].
  • No config flag disables segment_rollover emission (AC-6 verified by a config-schema scan test).
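The drop loop can be sketched as follows. This is an illustration under stated assumptions, not the CapacityCapPolicy implementation: the `segment_NNNNNN.fdr` naming, the `CapResult` type, and the `enforce_cap` signature are hypothetical, segment 0 is assumed to be excluded from the candidate list entirely, and the segment_rollover record emission and GCS alert are left out.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

SEGMENT_ZERO = "segment_000000.fdr"  # hypothetical name of the header segment


@dataclass
class CapResult:
    dropped: List[Path] = field(default_factory=list)
    total_bytes_after: int = 0
    misconfigured: bool = False


def enforce_cap(flight_dir: Path, cap_bytes: int,
                open_segment: Path, open_segment_bytes: int) -> CapResult:
    """Unlink the oldest closed segments until the directory fits the cap."""
    # Zero-padded names sort oldest-first.
    closed = sorted(p for p in flight_dir.glob("segment_*.fdr")
                    if p != open_segment)
    # On-disk closed sizes plus the writer's running count for the open one.
    total = sum(p.stat().st_size for p in closed) + open_segment_bytes
    # Assumption: segment 0 and the open segment are never candidates.
    candidates = [p for p in closed if p.name != SEGMENT_ZERO]

    result = CapResult()
    while total > cap_bytes and candidates:
        oldest = candidates.pop(0)
        total -= oldest.stat().st_size
        oldest.unlink()
        result.dropped.append(oldest)
    result.total_bytes_after = total
    # Still over cap with nothing left to drop: the cap itself is too small.
    result.misconfigured = total > cap_bytes
    return result
```

In a real hook, each entry in `dropped` would correspond to one segment_rollover record and one increment of rollover_count, and `misconfigured` would map to the fdr.cap_misconfigured path.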

Schema / Contract Changes

  • _docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md: flight_header and flight_footer payload key sets extended to match AZ-292's task-spec dataclasses. Effective minor bump (1.0.0 → 1.1.0); no breaking change since no producer or consumer used the previous narrow shape.
  • src/gps_denied_onboard/fdr_client/records.py: KNOWN_PAYLOAD_KEYS updated for the two kinds.
  • src/gps_denied_onboard/config/schema.py: added FdrWriterConfig nested inside FdrConfig. Fields: segment_size_bytes (default 64 MiB), batch_size (default 64), flight_cap_bytes (default 64 GiB), debug_log_per_record (default False).
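For orientation, the new config shape could look roughly like the sketch below. The field names and defaults come from the list above; everything else is an assumption: the real schema may not use plain dataclasses, the nesting attribute name `writer` is hypothetical, and placing the [1024, 2**40] range check (quoted for cap_bytes under AZ-293) in the config is a guess.

```python
from dataclasses import dataclass, field

MIB = 1024 ** 2
GIB = 1024 ** 3


@dataclass(frozen=True)
class FdrWriterConfig:
    segment_size_bytes: int = 64 * MIB   # per-segment rotation threshold
    batch_size: int = 64                 # records drained per producer per pass
    flight_cap_bytes: int = 64 * GIB     # per-flight directory cap
    debug_log_per_record: bool = False

    def __post_init__(self) -> None:
        # Assumption: the cap range is validated when the config is built.
        if not 1024 <= self.flight_cap_bytes <= 2 ** 40:
            raise ValueError("flight_cap_bytes must be within [1024, 2**40]")


@dataclass(frozen=True)
class FdrConfig:
    # Hypothetical attribute name for the nested writer section.
    writer: FdrWriterConfig = field(default_factory=FdrWriterConfig)


cfg = FdrConfig()
```

Note that the sketch has no field that disables segment_rollover emission, in line with AC-6.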

Dependency Changes

None. The AZ-291 spec calls for the filelock package, but it was not in pyproject.toml, and fcntl.flock from the stdlib provides equivalent POSIX advisory-lock semantics (the kernel releases the lock automatically on process death, which directly matches the Risk-3 mitigation). The choice is documented inline in the writer's module docstring.
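A minimal sketch of the stdlib locking pattern follows, using fcntl.flock with LOCK_EX | LOCK_NB on a `.fdr.lock` file. The function names and the `ConcurrentWriterError` class are illustrative stand-ins (the module's own error is FdrConcurrentWriterError), and the code is Unix-only because fcntl is not available on Windows.

```python
import fcntl
import os


class ConcurrentWriterError(RuntimeError):
    """Stand-in for the module's FdrConcurrentWriterError."""


def acquire_flight_lock(flight_root: str) -> int:
    """Take an exclusive, non-blocking advisory lock; return the held fd."""
    path = os.path.join(flight_root, ".fdr.lock")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError as exc:
        os.close(fd)
        raise ConcurrentWriterError(f"{path} is held by another writer") from exc
    return fd


def release_flight_lock(fd: int) -> None:
    """Drop the lock explicitly; closing the fd alone would also release it."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Because the lock belongs to the open file description, it disappears when the owning process exits or crashes, so no stale lock file has to be cleaned up on the next start.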

Test Results

  • New tests: 29 (9 for AZ-291, 10 for AZ-292, 10 for AZ-293).
  • Full suite: 323 passed, 2 skipped (pre-existing cmake / actionlint skips). 0 regressions.

Acceptance Criteria Coverage

| Task | AC | Test | Status |
|------|----|------|--------|
| AZ-291 | AC-1 drain all producers | test_ac1_drain_all_registered_producers | PASS |
| AZ-291 | AC-2 per-segment rotation | test_ac2_per_segment_rotation_at_size_cap | PASS |
| AZ-291 | AC-3 atomic rotation | test_ac3_atomic_rotation_no_half_segment | PASS |
| AZ-291 | AC-4 filelock prevents concurrent | test_ac4_concurrent_writer_blocked_by_filelock | PASS |
| AZ-291 | AC-5 ENOSPC degrades + alerts | test_ac5_enospc_degrades_and_alerts | PASS |
| AZ-291 | AC-6 stop drains + fsyncs + releases lock | test_ac6_stop_drains_and_releases_lock | PASS |
| AZ-291 | AC-7 segment file layout | test_ac7_segment_layout | PASS |
| AZ-291 | AC-8 steady-state no overrun | test_ac8_steady_state_no_overrun | PASS |
| AZ-292 | AC-1 header is first record | test_ac1_flight_header_is_first_record | PASS |
| AZ-292 | AC-2 footer is last record | test_ac2_flight_footer_is_last_record | PASS |
| AZ-292 | AC-3 counters reflect reality | test_ac3_counters_reflect_on_disk_reality | PASS |
| AZ-292 | AC-4 open_flight FdrOpenError on disk failure | test_ac4_open_flight_fdrerror_on_disk_failure | PASS |
| AZ-292 | AC-5 reject flight_id mismatch | test_ac5_open_flight_rejects_flight_id_mismatch | PASS |
| AZ-292 | AC-6 close without open raises | test_ac6_close_without_open_raises | PASS |
| AZ-292 | AC-7 clean_shutdown=False on teardown | test_ac7_uncleansed_teardown_no_clean_shutdown | PASS |
| AZ-292 | AC-8 records_dropped_overrun aggregates | test_ac8_records_dropped_overrun_aggregates_dropped_counts | PASS |
| AZ-293 | AC-1 drop oldest when over cap | test_ac1_drop_oldest_when_dir_exceeds_cap | PASS |
| AZ-293 | AC-2 loop until under cap | test_ac2_loop_until_under_cap | PASS |
| AZ-293 | AC-3 misconfigured cap path | test_ac3_cap_misconfigured_when_segment_zero_alone | PASS |
| AZ-293 | AC-4 open segment never dropped | test_ac4_currently_open_segment_never_dropped | PASS |
| AZ-293 | AC-5 canonical fields on rollover | test_ac5_segment_rollover_record_has_canonical_fields | PASS |
| AZ-293 | AC-6 no disable flag | test_ac6_no_config_flag_disables_segment_rollover + test_config_full_schema_has_no_rollover_disable_field | PASS |
| AZ-293 | AC-7 default cap is exactly 64 GiB | test_ac7_default_cap_is_exactly_64_gib | PASS |
| AZ-293 | AC-8 rollover_count matches | test_ac8_rollover_count_matches_segment_rollover_records | PASS |

Follow-ups

  • AZ-294 / AZ-295 / AZ-296: mid-flight tile snapshot path, thumbnail rate cap, and takeoff-abort wiring — next sub-tasks in E-C13 (out of scope for Batch 6).
  • Composition root wiring: runtime_root.py will inject the CapacityCapPolicy instance as the writer's on_rotation callback once E-C13's full wiring lands (likely a later batch or an AZ-270 expansion).
  • NFR-perf microbenches: NFR-perf-throughput (≥ 200 Hz on Tier-2), NFR-perf-rotation (p99 ≤ 50 ms), NFR-perf-hook (p99 ≤ 50 ms), NFR-perf-multi-drop (≤ 100 ms) are documented in the specs but require Tier-2 hardware to run; tracked for a future Jetson-harness cycle.
  • AZ-294 mid-flight tile snapshot: depends on the writer being able to record a JSON pointer record without copying the JPEG inline (sidecar_path invariant); the existing _append_record supports this directly. Implementation will live in this same module.