C13 — Flight Data Recorder (FDR)
1. High-Level Overview
Purpose: persist a per-flight ≤ 64 GB record of every payload class onboard (estimates, IMU traces, emitted MAVLink, system health, mid-flight tiles, ≤0.1 Hz failed-tile thumbnails) without silently dropping data (AC-NEW-3). Exclude raw nav/AI-cam frames (AC-8.5; only the failed-tile thumbnail forensic exception is allowed). The FDR is the system's audit log: every safety-critical decision, every emitted frame, every signing key rotation, every spoof-promotion-block lands here.
Architectural Pattern: single concrete FileFdrWriter behind a FdrWriter interface. A single writer thread is fed by lock-free in-process queues from every component. The FDR is lossy only when the writer thread is overrun, and even then the rollover event is logged; data is never dropped silently.
Upstream dependencies: every component publishes to C13 via in-process pub/sub (drop-oldest-with-rollover-log on overrun).
Downstream consumers:
- Post-flight: operator workstation (read via C12 retrieval).
- Real-time: nothing — C13 is write-only at runtime.
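The single-writer pattern described above can be sketched as follows. This is a minimal illustration, not the FileFdrWriter implementation: it uses the standard-library `queue.Queue` (which takes a lock internally, so it is not the lock-free queue the design calls for), and `writer_loop`, `_STOP`, `run_demo` and the producer labels are hypothetical names.

```python
import io
import queue
import threading
from typing import BinaryIO

_STOP = object()  # hypothetical sentinel telling the writer thread to exit


def writer_loop(inbox: "queue.Queue", sink: BinaryIO) -> None:
    """Only this thread ever touches the sink, so no file-level locking is needed."""
    while True:
        item = inbox.get()  # blocks until a producer enqueues something
        if item is _STOP:
            break
        sink.write(item)
    sink.flush()


def run_demo() -> bytes:
    inbox: "queue.Queue" = queue.Queue()
    sink = io.BytesIO()  # stands in for the open segment file
    writer = threading.Thread(target=writer_loop, args=(inbox, sink), name="fdr-writer")
    writer.start()
    # Producers only enqueue; they never write to the sink themselves.
    for producer in ("c5_state", "c7_health"):
        inbox.put(f"{producer}:hello\n".encode())
    inbox.put(_STOP)
    writer.join()
    return sink.getvalue()
```

Because every write goes through one thread, the order of records in the file is the order in which the writer dequeued them, which keeps the on-disk format free of interleaving concerns.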
2. Internal Interfaces
Interface: FdrWriter
| Method | Input | Output | Async | Error Types |
|---|---|---|---|---|
| `open_flight` | `FlightHeader` | `None` | No (called once at takeoff) | FdrOpenError |
| `write_record` | `FdrRecord` | `None` | No (lock-free enqueue) | FdrQueueOverrunError (logged but does not raise) |
| `close_flight` | `()` | `FlightFooter` | No (called once at landing) | — |
| `current_size_bytes` | `()` | `int` | No | — |
| `is_rolling` | `()` | `bool` | No | — |
Input/Output DTOs:
FlightHeader:
flight_id: uuid
flight_started_at: ISO 8601 + monotonic_ns
config_snapshot: JSON
signing_key_rotation_event: record
manifest_content_hashes: dict[Path, sha256]
FdrRecord: see data_model.md (FdrRecord; tagged union over payload classes)
FlightFooter:
flight_ended_at: ISO 8601 + monotonic_ns
records_written: int
records_dropped_overrun: int
bytes_written: int
rollover_count: int
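One way the DTOs and the interface could look in Python is sketched below. It assumes `typing.Protocol` for the interface and frozen dataclasses for the DTOs, neither of which the source confirms. `Timestamp` is a hypothetical helper for the "ISO 8601 + monotonic_ns" pairing, and `FdrRecord` is left as `Any` because its definition lives in data_model.md.

```python
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Protocol


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical pairing of wall-clock (ISO 8601) and monotonic time."""
    iso8601: str
    monotonic_ns: int

    @classmethod
    def now(cls) -> "Timestamp":
        return cls(datetime.now(timezone.utc).isoformat(), time.monotonic_ns())


@dataclass(frozen=True)
class FlightHeader:
    flight_id: uuid.UUID
    flight_started_at: Timestamp
    config_snapshot: dict[str, Any]             # JSON-compatible mapping (assumed shape)
    signing_key_rotation_event: dict[str, Any]  # "record" in the spec; shape assumed
    manifest_content_hashes: dict[Path, str]    # path -> hex-encoded sha256


@dataclass(frozen=True)
class FlightFooter:
    flight_ended_at: Timestamp
    records_written: int
    records_dropped_overrun: int
    bytes_written: int
    rollover_count: int


class FdrWriter(Protocol):
    def open_flight(self, header: FlightHeader) -> None: ...
    def write_record(self, record: Any) -> None: ...  # FdrRecord per data_model.md
    def close_flight(self) -> FlightFooter: ...
    def current_size_bytes(self) -> int: ...
    def is_rolling(self) -> bool: ...
```

Carrying both clocks in one value lets post-flight tooling order records by `monotonic_ns` even if the wall clock jumps during the flight.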
3. External API Specification
Not applicable.
4. Data Access Patterns
| Query | Frequency | Hot Path | Index Needed |
|---|---|---|---|
| `write_record` from every component | up to ~100 Hz aggregate | Yes | n/a |
| Post-flight read (operator retrieval) | once per flight | No | filesystem layout per (flight_id, segment) |
Caching Strategy
| Data | Cache Type | TTL | Invalidation |
|---|---|---|---|
| In-process queue from each producer | bounded ring (drop-oldest with rollover log) | flight lifetime | per-record write |
| Writer-thread buffer | sized for ≥1 s of typical write load | flight lifetime | flush on segment rollover |
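The bounded drop-oldest ring in the table above can be illustrated with a `collections.deque`. This is a sketch under stated assumptions: it guards the counter with a lock for clarity (the design specifies a lock-free queue), and `DropOldestQueue`, `drain` and the shape of the overrun record are hypothetical.

```python
import threading
from collections import deque


class DropOldestQueue:
    """Per-producer bounded ring that counts every record it evicts."""

    def __init__(self, producer_id: str, capacity: int) -> None:
        self.producer_id = producer_id
        self._items: deque = deque(maxlen=capacity)
        self._lock = threading.Lock()
        self._dropped = 0

    def put(self, record) -> None:
        with self._lock:
            if len(self._items) == self._items.maxlen:
                # A full deque with maxlen evicts its oldest item on append.
                self._dropped += 1
            self._items.append(record)

    def drain(self):
        """Return (records, overrun_record_or_None) and reset the drop counter."""
        with self._lock:
            records = list(self._items)
            self._items.clear()
            dropped, self._dropped = self._dropped, 0
        overrun = None
        if dropped:
            # Assumed shape of the separate "overrun" record.
            overrun = {
                "kind": "overrun",
                "producer": self.producer_id,
                "dropped_count": dropped,
            }
        return records, overrun
```

For example, pushing five records into a queue of capacity three leaves the three newest records and yields an overrun record with `dropped_count` equal to 2, so the loss is visible in the FDR rather than silent.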
Storage Estimates
| Table/Collection | Est. Row Count (1yr) | Row Size | Total Size | Growth Rate |
|---|---|---|---|---|
| Per-flight record file (segmented, oldest-segment-dropped policy) | bounded by 64 GB per AC-NEW-3 | varies per payload class | ≤ 64 GB / flight | bounded by AC-NEW-3 |
| Per-flight tile snapshots (mid-flight tiles) | ~few hundred / flight | 50–200 KB each | up to ~50 MB / flight | bounded by F4 mid-flight gen |
| Per-flight failed-tile thumbnails (AC-8.5 forensic exception) | ≤ 0.1 Hz × 8 h = ≤ 2880 thumbnails / flight | small JPEG | <50 MB | bounded by ≤ 0.1 Hz cap |
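The bounds in the table can be checked with a few lines of arithmetic. The sketch below assumes the cap is 64 decimal gigabytes (swap in `64 * 2**30` if GiB is meant) and an 8 h flight, as in the thumbnail row; the variable names are illustrative only.

```python
FLIGHT_HOURS = 8
CAP_BYTES = 64 * 10**9         # assumption: decimal GB
THUMB_RATE_HZ = 0.1            # failed-tile thumbnail cap from the table
THUMB_TOTAL_BYTES = 50 * 10**6 # "<50 MB" row, decimal MB assumed

flight_seconds = FLIGHT_HOURS * 3600                       # 28800 s
max_thumbnails = round(THUMB_RATE_HZ * flight_seconds)     # 2880
sustained_bytes_per_s = CAP_BYTES / flight_seconds         # about 2.2 MB/s
thumb_budget_bytes = THUMB_TOTAL_BYTES / max_thumbnails    # about 17 KB each
```

In other words, filling the cap exactly over 8 h corresponds to a sustained write rate of roughly 2.2 MB/s, and the thumbnail row implies an average budget of roughly 17 KB per thumbnail.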
Data Management
Seed data: none.
Rollback: per-segment file layout makes per-segment deletion safe. The writer never overwrites a closed segment; it only appends to the current open segment, then opens a new segment when the previous reaches a configurable size cap.
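The append-only segment layout and the oldest-segment-dropped policy might be sketched like this. It is an illustration only: the class name, the `segment_NNNNNN.fdr` naming scheme and the `segments_dropped` counter are hypothetical, and the real caps would come from configuration.

```python
from pathlib import Path


class SegmentedFile:
    """Append-only segments; closed segments are never reopened for writing."""

    def __init__(self, root: Path, segment_cap: int, total_cap: int) -> None:
        self.root, self.segment_cap, self.total_cap = root, segment_cap, total_cap
        self.root.mkdir(parents=True, exist_ok=True)
        self.index = 0
        self.segments_dropped = 0
        # Mode "xb" fails if the file exists, so a segment is never overwritten.
        self._fh = open(self._path(0), "xb")

    def _path(self, index: int) -> Path:
        return self.root / f"segment_{index:06d}.fdr"

    def append(self, blob: bytes) -> None:
        # Roll over when the blob would push a non-empty segment past its cap.
        if self._fh.tell() and self._fh.tell() + len(blob) > self.segment_cap:
            self._fh.close()
            self.index += 1
            self._fh = open(self._path(self.index), "xb")
        self._fh.write(blob)
        self._fh.flush()
        self._enforce_total_cap()

    def _enforce_total_cap(self) -> None:
        segments = sorted(self.root.glob("segment_*.fdr"))
        # Delete closed segments oldest-first; the open segment (last) is kept.
        while len(segments) > 1 and sum(p.stat().st_size for p in segments) > self.total_cap:
            segments.pop(0).unlink()
            self.segments_dropped += 1

    def close(self) -> None:
        self._fh.close()
```

Since deletion only ever targets whole closed segments, removing one cannot corrupt the segment currently being written.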
5. Implementation Details
Algorithmic Complexity: per-record cost is O(record_size) for serialisation + write. Aggregate throughput sized for the worst-case AC-NEW-3 cap.
State Management:
- Owns the open per-flight segment file handle.
- Owns the writer thread and the in-process producer queues.
- Owns the rollover policy (oldest-segment-dropped first when total reaches 64 GB).
Key Dependencies:
| Library | Version | Purpose |
|---|---|---|
| orjson / msgpack | per project pin | Record serialisation (serialisation format to be chosen during the decompose phase) |
| atomicwrites | latest | Segment file rotation (atomic open of new segment + close of previous) |
| filelock | per project pin | Cross-process safety for the FDR root (the companion has exclusive access during flight; the operator orchestrator reads only post-landing) |
Error Handling Strategy:
- `FdrOpenError` at takeoff: refuse takeoff (per AC-NEW-3 every payload class must be present from t=0).
- `FdrQueueOverrunError`: per-producer drop-oldest, but the rollover event itself is ALWAYS logged (a separate "overrun" record in the FDR records the dropped count and producer-id). Never silent.
- Filesystem write failure mid-flight: log to stdout/stderr (since we can't log to FDR at this point) + STATUSTEXT to GCS; the system continues to emit external positions because losing the audit log doesn't compromise navigation, but the operator must be alerted.
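The mid-flight write-failure path could look roughly like the sketch below. `safe_write` and the `send_statustext` callback are hypothetical; the callback stands in for whatever component actually emits the MAVLink STATUSTEXT, and the message is truncated to 50 characters because that is the size of the STATUSTEXT text field.

```python
import errno
import json
import sys
from typing import Callable


def safe_write(fh, blob: bytes, send_statustext: Callable[[str], None], stream=None) -> bool:
    """Write one blob; on OSError, report out-of-band and return False instead of raising."""
    if stream is None:
        stream = sys.stderr
    try:
        fh.write(blob)
        return True
    except OSError as exc:
        code = errno.errorcode.get(exc.errno, "UNKNOWN")
        # The FDR itself is the broken thing, so the event goes to stderr.
        event = {"level": "ERROR", "msg": "C13 segment write failure", "errno": code}
        stream.write(json.dumps(event) + "\n")
        # Alert the operator; navigation output is not interrupted.
        send_statustext(f"C13 FDR write failure: {code}"[:50])
        return False
```

Returning a boolean rather than raising keeps the failure local to C13, which matches the requirement that external position output continues while the operator is alerted.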
6. Extensions and Helpers
| Helper | Purpose | Used By |
|---|---|---|
| `RecordSchema` | versioned record schema for cross-version FDR compatibility | C13 only; this is internal |
7. Caveats & Edge Cases
Known limitations:
- 64 GB cap is per AC-NEW-3. If payload-class throughput grows beyond what the cap supports for an 8 h flight, the producers MUST throttle or accept oldest-dropped — the FDR will not silently exceed the cap.
- Failed-tile thumbnail forensic exception is the ONLY raw-imagery-adjacent persistence; AC-8.5 must be re-asserted if any new payload class is added.
Potential race conditions:
- The writer thread is the single writer; producers enqueue lock-free. No filesystem contention from within the companion. Operator-tool reads happen post-landing only.
Performance bottlenecks:
- Writer-thread serialisation throughput must exceed peak producer throughput. NFT-LIM-02 (8 h synthetic AC-NEW-3) validates.
8. Dependency Graph
Must be implemented after: nothing internal — C13 is foundational along with C7.
Can be implemented in parallel with: every other component.
Blocks: every component (every component logs to C13).
9. Logging Strategy
| Log Level | When | Example |
|---|---|---|
| ERROR | FdrOpenError, mid-flight filesystem write failure | C13 segment write failure: errno=ENOSPC; STATUSTEXT to GCS |
| WARN | queue overrun (any producer) | C13 queue overrun: producer=c5_state; dropped_count=23 |
| INFO | open/close flight; segment rollover | C13 flight opened: flight_id=…; segment=0 |
| DEBUG | per-write timing (only in dev tier) | C13 record written: kind=estimate; bytes=412; took=0.1ms |
Log format: structured JSON to stdout/journald.
Log storage: stdout / journald, but not C13 itself for ERROR (we'd be writing to the broken thing). FDR records are the project-level "logs" for everything except C13's own operational status.
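Structured JSON logging of this kind can be produced with the standard `logging` module. The formatter below is a sketch, not the project's logging setup: the logger name `c13.fdr`, the `fields` key passed through `extra`, and the output key names are all assumptions. Note that Python spells the WARN level as `WARNING`.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields arrive via logger.xxx(..., extra={"fields": {...}}).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("c13.fdr")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A queue-overrun warning would then be emitted as `logger.warning("C13 queue overrun", extra={"fields": {"producer": "c5_state", "dropped_count": 23}})`, with stdout (or journald capturing stdout) as the stream.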