AZ-291: FileFdrWriter, a single writer thread draining every registered FdrClient SPSC ring buffer to per-flight segment files; per-segment size rotation; cross-process fcntl.flock filelock on flight_root; ENOSPC degraded mode with rate-capped ERROR logs and one GCS alert.

AZ-292: FlightHeader/FlightFooter dataclasses + open_flight / close_flight lifecycle methods; four per-flight monotonic counters (records_written, records_dropped_overrun, bytes_written, rollover_count) reported by the footer; flight_id mismatch and close-without-open are typed errors.

AZ-293: CapacityCapPolicy (post-rotation hook) walks the flight directory, drops the oldest CLOSED segment when total > cap (default 64 GiB), emits a kind="segment_rollover" record per drop. Never drops the currently-open segment or segment 0 alone; cap_misconfigured path logs ERROR + GCS alert. No config flag disables emission (C13-ST-01).

Schema: bumped fdr_record_schema flight_header / flight_footer payload key sets to match the AZ-292 task spec (effective 1.0.0 -> 1.1.0; no prior producer); KNOWN_PAYLOAD_KEYS updated. Added FdrWriterConfig nested in FdrConfig (segment_size_bytes, batch_size, flight_cap_bytes, debug_log_per_record).

Tests: 29 new unit tests (8 AC + 1 invariant per task); full suite 323 passed, 2 pre-existing skips, 0 regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 06 — Implementation Report (Cycle 1)
Tasks: AZ-291, AZ-292, AZ-293
Component: C13 FDR Writer (E-C13)
Cycle: 1 (Build → Ship)
Date: 2026-05-11
Summary
Built the C13 FDR writer chain end-to-end. AZ-291 lands the single writer thread + segment file lifecycle + cross-process filelock + ENOSPC degraded mode. AZ-292 lands the FlightHeader / FlightFooter records and the four per-flight counters (records_written, records_dropped_overrun, bytes_written, rollover_count) that make a flight directory self-describing. AZ-293 lands the per-flight 64 GiB cap policy with oldest-segment-dropped + canonical segment_rollover record emission.
The three tasks share a single module (components/c13_fdr/) with these new files:
- `errors.py`: five typed exceptions covering construction, open, close, and concurrent-writer failure paths.
- `headers.py`: `FlightHeader` and `FlightFooter` frozen dataclasses.
- `writer.py`: `FileFdrWriter` (AZ-291 + AZ-292).
- `cap_policy.py`: `CapacityCapPolicy` (AZ-293).
- `__init__.py`, `interface.py`: re-exports.
Features Landed
AZ-291 — Writer thread + segment lifecycle
- `FileFdrWriter(flight_root, flight_id, config, fdr_clients, gcs_alert, *, on_rotation, drain_sleep_s)` constructor.
- `start()`, `stop()`, `open_flight(header)`, `close_flight()` lifecycle methods.
- Background writer thread that loops over every registered `FdrClient.drain(batch_size)` and writes serialised records to the current segment with `<uint32-LE length prefix> | <serialised body>` framing.
- Per-segment rotation triggered by `segment_size_bytes` (default 64 MiB).
- Cross-process filelock via `fcntl.flock(LOCK_EX | LOCK_NB)` on `flight_root/.fdr.lock`; held for the entire flight; constructor-time `FdrConcurrentWriterError` on contention.
- ENOSPC degraded mode: one ERROR log + one GCS alert; subsequent failures are log-rate-capped at 1/sec; producer buffers keep draining (records discarded) so producer-side memory does not grow unbounded.
- Public introspection: `current_segment_path()`, `current_segment_bytes()`, `segments_written()`, `is_rolling()`, `is_degraded()`, `current_size_bytes()`, `rollover_count`, `records_dropped_overrun`, `flight_id`, `flight_dir`.
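The length-prefixed framing and the size-triggered rotation listed above can be sketched as follows. This is an illustration under stated assumptions, not the `FileFdrWriter` code: `frame_record`, `iter_frames`, and `needs_rotation` are hypothetical helper names, and the "rotate before the frame that would cross the limit" rule is an assumed policy that the report does not spell out.

```python
import io
import struct
from typing import BinaryIO, Iterator

_LEN = struct.Struct("<I")  # uint32 little-endian length prefix


def frame_record(body: bytes) -> bytes:
    """Prefix a serialised record body with its uint32-LE length."""
    return _LEN.pack(len(body)) + body


def iter_frames(stream: BinaryIO) -> Iterator[bytes]:
    """Yield record bodies from a segment; stop at EOF or at a truncated tail."""
    while True:
        prefix = stream.read(_LEN.size)
        if len(prefix) < _LEN.size:
            return
        (length,) = _LEN.unpack(prefix)
        body = stream.read(length)
        if len(body) < length:
            return  # half-written trailing record: ignore it
        yield body


def needs_rotation(current_bytes: int, frame_len: int, segment_size_bytes: int) -> bool:
    """Assumed rule: rotate before writing a frame that would cross the limit.

    An empty segment always accepts the frame, so an oversized record
    cannot trigger endless rotation.
    """
    return current_bytes > 0 and current_bytes + frame_len > segment_size_bytes


# Round trip through an in-memory "segment".
segment = io.BytesIO()
for body in (b'{"kind":"a"}', b'{"kind":"bb"}'):
    segment.write(frame_record(body))
segment.seek(0)
decoded = list(iter_frames(segment))
```

A reader built this way tolerates a torn final record, which is one plausible way to make a segment readable after an unclean shutdown.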
AZ-292 — FlightHeader / FlightFooter + counters
- `FlightHeader` dataclass with `flight_id`, `flight_started_at_iso`, `flight_started_at_monotonic_ns`, `config_snapshot`, `signing_key_rotation_event`, `manifest_content_hashes`, `build_info`.
- `FlightFooter` dataclass with `flight_id`, `flight_ended_at_iso`, `flight_ended_at_monotonic_ns`, `records_written`, `records_dropped_overrun`, `bytes_written`, `rollover_count`, `clean_shutdown`.
- `open_flight(header)` writes the header as the first record of segment 0; rejects flight_id mismatch with `FdrOpenError`.
- `close_flight()` drains pending producer records, builds the footer (iteratively converging `bytes_written` to include the footer's own size), writes it, releases the filelock, and returns the `FlightFooter` to the caller. Idempotent (a second call returns the cached footer).
- Counter integration: `_append_record` increments `_records_written` and `_bytes_written`; `_observe_overrun_record` aggregates `payload.dropped_count` into `_records_dropped_overrun`; `_rotate_segment` bumps `_rollover_count`.
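The footer's `bytes_written` has to count the footer frame itself, but the frame's length depends on how many digits that number has, so the value is found by iterating to a fixed point. A minimal sketch of that idea follows, assuming JSON serialisation and a 4-byte length prefix; `build_footer_frame` is a hypothetical helper, not the writer's actual method.

```python
from __future__ import annotations

import json
import struct


def build_footer_frame(
    payload: dict, bytes_before_footer: int, max_iters: int = 8
) -> tuple[bytes, int]:
    """Return (frame, bytes_written), where bytes_written includes the frame."""
    total = bytes_before_footer
    for _ in range(max_iters):
        body = json.dumps({**payload, "bytes_written": total}, sort_keys=True).encode("utf-8")
        frame = struct.pack("<I", len(body)) + body
        new_total = bytes_before_footer + len(frame)
        if new_total == total:
            # The value encoded in the frame already accounts for the frame.
            return frame, total
        total = new_total
    raise RuntimeError("footer size did not converge")


frame, total = build_footer_frame({"flight_id": "f-1", "clean_shutdown": True}, 1000)
```

The loop settles quickly because the candidate total only grows, and the frame length only changes when the total gains a digit.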
AZ-293 — Capacity cap policy
- `CapacityCapPolicy(cap_bytes, fdr_client, gcs_alert)` callable; invoked by `FileFdrWriter` via the `on_rotation` hook after every per-segment rotation.
- Walks the flight directory, sums on-disk segment sizes + writer's running `current_segment_bytes`, and unlinks the oldest CLOSED segment if total > cap. Repeats until under cap.
- Segment 0 (containing the `flight_header`) is never dropped unless it is the only candidate AND the directory is over cap by itself; in that case logs `fdr.cap_misconfigured` ERROR + emits one GCS alert and lets the flight continue in degraded mode.
- Each drop enqueues a `kind="segment_rollover"` `FdrRecord` (envelope `producer_id="shared.fdr_client"`) carrying `old_segment`, `new_segment`, `total_bytes_after`; bumps `writer.rollover_count`; logs `fdr.cap_drop` INFO.
- Default `cap_bytes = 64 * 1024**3` (64 GiB exactly per AC-NEW-3 + AC-7); valid range `[1024, 2**40]`.
- No config flag disables `segment_rollover` emission (AC-6 verified by a config-schema scan test).
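The drop loop described above might look roughly like the sketch below. It is a simplification under assumptions: `enforce_cap` and `CapResult` are hypothetical names, the zero-padded `segment_NNNN.fdr` naming is assumed so that lexical order equals age order, segment 0 is always kept, and the over-cap-with-nothing-to-drop case is only reported through a `misconfigured` flag. Record emission, logging, and the GCS alert are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class CapResult:
    dropped: list[Path] = field(default_factory=list)
    total_bytes_after: int = 0
    misconfigured: bool = False


def enforce_cap(flight_dir: Path, open_segment: Path, open_bytes: int, cap_bytes: int) -> CapResult:
    """Unlink the oldest closed segments until the flight fits under cap_bytes."""
    segments = sorted(flight_dir.glob("segment_*.fdr"))  # oldest first (assumed naming)
    # Closed segments are measured on disk; the open one uses the writer's running count.
    sizes = {p: p.stat().st_size for p in segments if p != open_segment}
    total = sum(sizes.values()) + open_bytes
    # Droppable: closed segments other than segment 0 (first in sort order).
    candidates = [p for p in segments[1:] if p != open_segment]
    result = CapResult()
    while total > cap_bytes and candidates:
        oldest = candidates.pop(0)
        oldest.unlink()
        total -= sizes[oldest]
        result.dropped.append(oldest)
    result.total_bytes_after = total
    result.misconfigured = total > cap_bytes  # nothing left to drop, still over cap
    return result
```

In the real policy each entry in `dropped` would correspond to one `segment_rollover` record; here the caller would have to emit them.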
Schema / Contract Changes
- `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md`: `flight_header` and `flight_footer` payload key sets extended to match AZ-292's task-spec dataclasses. Effective minor bump (1.0.0 → 1.1.0); no breaking change since no producer or consumer used the previous narrow shape.
- `src/gps_denied_onboard/fdr_client/records.py`: `KNOWN_PAYLOAD_KEYS` updated for the two kinds.
- `src/gps_denied_onboard/config/schema.py`: added `FdrWriterConfig` nested inside `FdrConfig`. Fields: `segment_size_bytes` (default 64 MiB), `batch_size` (default 64), `flight_cap_bytes` (default 64 GiB), `debug_log_per_record` (default False).
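For orientation, the new config block could be modelled along these lines. The field names and defaults come from this report; the use of frozen stdlib dataclasses, the `writer` attribute name on `FdrConfig`, and applying the `[1024, 2**40]` cap range as a constructor check are assumptions, since the real `schema.py` may use a different modelling library and validation path.

```python
from dataclasses import dataclass, field

MIB = 1024**2
GIB = 1024**3


@dataclass(frozen=True)
class FdrWriterConfig:
    segment_size_bytes: int = 64 * MIB
    batch_size: int = 64
    flight_cap_bytes: int = 64 * GIB
    debug_log_per_record: bool = False

    def __post_init__(self) -> None:
        # Range taken from the cap policy notes; enforcing it here is an assumption.
        if not 1024 <= self.flight_cap_bytes <= 2**40:
            raise ValueError(f"flight_cap_bytes out of range: {self.flight_cap_bytes}")


@dataclass(frozen=True)
class FdrConfig:
    # Hypothetical attribute name for the nested writer settings.
    writer: FdrWriterConfig = field(default_factory=FdrWriterConfig)


cfg = FdrConfig()
```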
Dependency Changes
None. Although the AZ-291 spec calls for the `filelock` package, it is not listed in `pyproject.toml`, and the stdlib `fcntl.flock` provides equivalent advisory-lock semantics on POSIX platforms: the kernel releases the lock automatically when the process dies, which directly matches the Risk-3 mitigation. The choice is documented inline in the writer's module docstring.
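A minimal sketch of the stdlib locking approach follows, assuming a lock file named `.fdr.lock` directly under the flight root. `acquire_flight_lock` and `release_flight_lock` are hypothetical helpers, and the exception class is a local stand-in for the typed error named in this report.

```python
import fcntl
import os
from pathlib import Path


class FdrConcurrentWriterError(RuntimeError):
    """Stand-in for the typed error raised on lock contention."""


def acquire_flight_lock(flight_root: Path) -> int:
    """Take an exclusive, non-blocking advisory lock and return the held fd."""
    fd = os.open(flight_root / ".fdr.lock", os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError as exc:
        os.close(fd)
        raise FdrConcurrentWriterError(f"{flight_root} is locked by another writer") from exc
    return fd  # keep the fd open for as long as the lock must be held


def release_flight_lock(fd: int) -> None:
    """Release the lock explicitly; closing the fd alone would also drop it."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Because the lock belongs to the open file description, the kernel drops it when the owning process exits, so a crashed writer cannot leave a stale lock behind.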
Test Results
- New tests: 29 (9 for AZ-291, 10 for AZ-292, 10 for AZ-293).
- Full suite: 323 passed, 2 skipped (pre-existing cmake / actionlint skips). 0 regressions.
Acceptance Criteria Coverage
| Task | AC | Test | Status |
|---|---|---|---|
| AZ-291 | AC-1 drain all producers | `test_ac1_drain_all_registered_producers` | PASS |
| AZ-291 | AC-2 per-segment rotation | `test_ac2_per_segment_rotation_at_size_cap` | PASS |
| AZ-291 | AC-3 atomic rotation | `test_ac3_atomic_rotation_no_half_segment` | PASS |
| AZ-291 | AC-4 filelock prevents concurrent | `test_ac4_concurrent_writer_blocked_by_filelock` | PASS |
| AZ-291 | AC-5 ENOSPC degrades + alerts | `test_ac5_enospc_degrades_and_alerts` | PASS |
| AZ-291 | AC-6 stop drains + fsyncs + releases lock | `test_ac6_stop_drains_and_releases_lock` | PASS |
| AZ-291 | AC-7 segment file layout | `test_ac7_segment_layout` | PASS |
| AZ-291 | AC-8 steady-state no overrun | `test_ac8_steady_state_no_overrun` | PASS |
| AZ-292 | AC-1 header is first record | `test_ac1_flight_header_is_first_record` | PASS |
| AZ-292 | AC-2 footer is last record | `test_ac2_flight_footer_is_last_record` | PASS |
| AZ-292 | AC-3 counters reflect reality | `test_ac3_counters_reflect_on_disk_reality` | PASS |
| AZ-292 | AC-4 open_flight FdrOpenError on disk failure | `test_ac4_open_flight_fdrerror_on_disk_failure` | PASS |
| AZ-292 | AC-5 reject flight_id mismatch | `test_ac5_open_flight_rejects_flight_id_mismatch` | PASS |
| AZ-292 | AC-6 close without open raises | `test_ac6_close_without_open_raises` | PASS |
| AZ-292 | AC-7 clean_shutdown=False on teardown | `test_ac7_uncleansed_teardown_no_clean_shutdown` | PASS |
| AZ-292 | AC-8 records_dropped_overrun aggregates | `test_ac8_records_dropped_overrun_aggregates_dropped_counts` | PASS |
| AZ-293 | AC-1 drop oldest when over cap | `test_ac1_drop_oldest_when_dir_exceeds_cap` | PASS |
| AZ-293 | AC-2 loop until under cap | `test_ac2_loop_until_under_cap` | PASS |
| AZ-293 | AC-3 misconfigured cap path | `test_ac3_cap_misconfigured_when_segment_zero_alone` | PASS |
| AZ-293 | AC-4 open segment never dropped | `test_ac4_currently_open_segment_never_dropped` | PASS |
| AZ-293 | AC-5 canonical fields on rollover | `test_ac5_segment_rollover_record_has_canonical_fields` | PASS |
| AZ-293 | AC-6 no disable flag | `test_ac6_no_config_flag_disables_segment_rollover` + `test_config_full_schema_has_no_rollover_disable_field` | PASS |
| AZ-293 | AC-7 default cap is exactly 64 GiB | `test_ac7_default_cap_is_exactly_64_gib` | PASS |
| AZ-293 | AC-8 rollover_count matches | `test_ac8_rollover_count_matches_segment_rollover_records` | PASS |
Follow-ups
- AZ-294 / AZ-295 / AZ-296: mid-flight tile snapshot path, thumbnail rate cap, and takeoff-abort wiring — next sub-tasks in E-C13 (out of scope for Batch 6).
- Composition root wiring: `runtime_root.py` will inject the `CapacityCapPolicy` instance as the writer's `on_rotation` callback when E-C13's full wiring lands (likely a later batch or AZ-270 expansion).
- NFR-perf microbenches: NFR-perf-throughput (≥ 200 Hz on Tier-2), NFR-perf-rotation (p99 ≤ 50 ms), NFR-perf-hook (p99 ≤ 50 ms), NFR-perf-multi-drop (≤ 100 ms) are documented in the specs but require Tier-2 hardware to run; tracked for a future Jetson-harness cycle.
- AZ-294 mid-flight tile snapshot: depends on the writer being able to record a JSON pointer record without copying the JPEG inline (`sidecar_path` invariant); the existing `_append_record` supports this directly. Implementation will live in this same module.