C13 — Flight Data Recorder (FDR)
1. High-Level Overview
Purpose: persist a per-flight ≤ 64 GB record of every payload class onboard (estimates, IMU traces, emitted MAVLink, system health, mid-flight tiles, ≤0.1 Hz failed-tile thumbnails) without silently dropping data (AC-NEW-3). Exclude raw nav/AI-cam frames (AC-8.5; only the failed-tile thumbnail forensic exception is allowed). The FDR is the system's audit log: every safety-critical decision, every emitted frame, every signing key rotation, every spoof-promotion-block lands here.
Architectural Pattern: single concrete FileFdrWriter behind an FdrWriter interface. A single writer thread is fed by lock-free in-process queues from every component. Records can be lost only on queue overrun, and every such drop is logged as a rollover event, never silently.
Upstream dependencies: every component publishes to C13 via in-process pub/sub (drop-oldest-with-rollover-log on overrun).
Downstream consumers:
- Post-flight: operator workstation (read via C12 retrieval).
- Real-time: nothing — C13 is write-only at runtime.
2. Internal Interfaces
Interface: FdrWriter
| Method | Input | Output | Async | Error Types |
|---|---|---|---|---|
| open_flight | FlightHeader | None | No (called once at takeoff) | FdrOpenError |
| write_record | FdrRecord | None | No (lock-free enqueue) | FdrQueueOverrunError (logged but does not raise) |
| close_flight | () | FlightFooter | No (called once at landing) | none |
| current_size_bytes | () | int | No | none |
| is_rolling | () | bool | No | none |
Input/Output DTOs:
FlightHeader:
flight_id: uuid
flight_started_at: ISO 8601 + monotonic_ns
config_snapshot: JSON
signing_key_rotation_event: record
manifest_content_hashes: dict[Path, sha256]
FdrRecord: see data_model.md (FdrRecord; tagged union over payload classes)
FlightFooter:
flight_ended_at: ISO 8601 + monotonic_ns
records_written: int
records_dropped_overrun: int
bytes_written: int
rollover_count: int
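As an illustration only, the interface and DTOs above could be expressed in Python roughly as follows. Several details are assumptions not stated in this spec: the "ISO 8601 + monotonic_ns" timestamps are split into two fields, sha256 digests are held as hex strings, records are typed as Any (FdrRecord itself is defined in data_model.md), and typing.Protocol is used for the interface.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Protocol


@dataclass(frozen=True)
class FlightHeader:
    flight_id: uuid.UUID
    flight_started_at: str            # ISO 8601 wall-clock time (assumed split)
    flight_started_monotonic_ns: int  # monotonic clock reading (assumed split)
    config_snapshot: dict[str, Any]
    signing_key_rotation_event: dict[str, Any]
    # path -> sha256 hex digest (hex-string representation is an assumption)
    manifest_content_hashes: dict[Path, str] = field(default_factory=dict)


@dataclass(frozen=True)
class FlightFooter:
    flight_ended_at: str
    flight_ended_monotonic_ns: int
    records_written: int
    records_dropped_overrun: int
    bytes_written: int
    rollover_count: int


class FdrOpenError(Exception):
    """Raised by open_flight when the flight record cannot be created."""


class FdrWriter(Protocol):
    def open_flight(self, header: FlightHeader) -> None: ...
    def write_record(self, record: Any) -> None: ...  # never raises on overrun
    def close_flight(self) -> FlightFooter: ...
    def current_size_bytes(self) -> int: ...
    def is_rolling(self) -> bool: ...
```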
3. External API Specification
Not applicable.
4. Data Access Patterns
| Query | Frequency | Hot Path | Index Needed |
|---|---|---|---|
| write_record from every component | up to ~100 Hz aggregate | Yes | n/a |
| Post-flight read (operator retrieval) | once per flight | No | filesystem layout per (flight_id, segment) |
Caching Strategy
| Data | Cache Type | TTL | Invalidation |
|---|---|---|---|
| In-process queue from each producer | bounded ring (drop-oldest with rollover log) | flight lifetime | per-record write |
| Writer-thread buffer | sized for ≥1 s of typical write load | flight lifetime | flush on segment rollover |
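The drop-oldest-with-rollover-log behaviour of the producer queues can be sketched as below. This is a simplified, single-threaded illustration and not the lock-free structure the spec calls for; the names ProducerQueue and OverrunRecord are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class OverrunRecord:
    """Hypothetical stand-in for the FDR 'overrun' record."""
    producer_id: str
    dropped_count: int


class ProducerQueue:
    """Bounded drop-oldest queue that counts every record it drops."""

    def __init__(self, producer_id: str, capacity: int) -> None:
        self.producer_id = producer_id
        self._items: deque = deque()
        self._capacity = capacity
        self._dropped = 0

    def put(self, record: object) -> None:
        if len(self._items) >= self._capacity:
            self._items.popleft()  # drop the oldest record
            self._dropped += 1     # and remember that a drop happened
        self._items.append(record)

    def drain(self) -> list:
        """Return pending records, preceded by an OverrunRecord if any were dropped."""
        out: list = []
        if self._dropped:
            out.append(OverrunRecord(self.producer_id, self._dropped))
            self._dropped = 0
        while self._items:
            out.append(self._items.popleft())
        return out
```

In this sketch the writer thread would call drain() on each queue, so the drop count always reaches the FDR ahead of the surviving records.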
Storage Estimates
| Table/Collection | Est. Row Count (1yr) | Row Size | Total Size | Growth Rate |
|---|---|---|---|---|
| Per-flight record file (segmented, oldest-segment-dropped policy) | bounded by 64 GB per AC-NEW-3 | varies per payload class | ≤ 64 GB / flight | bounded by AC-NEW-3 |
| Per-flight tile snapshots (mid-flight tiles) | ~few hundred / flight | 50–200 KB each | up to ~50 MB / flight | bounded by F4 mid-flight gen |
| Per-flight failed-tile thumbnails (AC-8.5 forensic exception) | ≤ 0.1 Hz × 8 h = ≤ 2880 thumbnails / flight | small JPEG | <50 MB | bounded by ≤ 0.1 Hz cap |
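The thumbnail bound in the last row can be checked with a few lines of arithmetic. The sketch assumes decimal megabytes for the "<50 MB" figure; the implied average per-thumbnail budget is a derived number, not a requirement stated in this spec.

```python
MIN_PERIOD_S = 10              # <= 0.1 Hz means at most one thumbnail every 10 s
FLIGHT_SECONDS = 8 * 3600      # 8 h flight
BUDGET_BYTES = 50 * 10**6      # "<50 MB", decimal MB assumed

max_thumbnails = FLIGHT_SECONDS // MIN_PERIOD_S
avg_thumbnail_budget_bytes = BUDGET_BYTES / max_thumbnails

print(max_thumbnails)                      # 2880
print(round(avg_thumbnail_budget_bytes))   # about 17 KB per thumbnail
```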
Data Management
Seed data: none.
Rollback: per-segment file layout makes per-segment deletion safe. The writer never overwrites a closed segment; it only appends to the current open segment, then opens a new segment when the previous reaches a configurable size cap.
5. Implementation Details
Algorithmic Complexity: per-record cost is O(record_size) for serialisation + write. Aggregate throughput sized for the worst-case AC-NEW-3 cap.
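For a rough sense of scale, the sustained write rate implied by filling the cap over one flight can be worked out as follows. The sketch assumes decimal gigabytes, the 8 h flight duration used elsewhere in this document, and the ~100 Hz aggregate record rate from section 4; the resulting figures are derived estimates only.

```python
CAP_BYTES = 64 * 10**9        # 64 GB cap (AC-NEW-3), decimal GB assumed
FLIGHT_SECONDS = 8 * 3600     # 8 h flight
AGGREGATE_RATE_HZ = 100       # "up to ~100 Hz aggregate"

sustained_bytes_per_s = CAP_BYTES / FLIGHT_SECONDS
avg_record_bytes = sustained_bytes_per_s / AGGREGATE_RATE_HZ

print(round(sustained_bytes_per_s))  # about 2.2 MB/s sustained
print(round(avg_record_bytes))       # about 22 KB per record on average
```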
State Management:
- Owns the open per-flight segment file handle.
- Owns the writer thread and the in-process producer queues.
- Owns the rollover policy (oldest-segment-dropped first when total reaches 64 GB).
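The oldest-segment-dropped policy can be sketched as a pure bookkeeping function. plan_evictions is a hypothetical helper, and the rule that the currently open segment is never dropped is an assumption drawn from the rollback note in section 4.

```python
from __future__ import annotations


def plan_evictions(segment_sizes: list[int], cap_bytes: int) -> tuple[list[int], int]:
    """Decide which closed segments to drop so the total fits under the cap.

    segment_sizes is ordered oldest to newest; the last entry is the open
    segment and is never dropped (assumption). Returns the indices to drop,
    oldest first, and the total size that remains afterwards.
    """
    total = sum(segment_sizes)
    dropped: list[int] = []
    for idx, size in enumerate(segment_sizes[:-1]):
        if total <= cap_bytes:
            break
        dropped.append(idx)
        total -= size
    return dropped, total
```

For example, plan_evictions([10, 10, 10, 5], 30) drops only the oldest segment and leaves 25 bytes on disk.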
Key Dependencies:
| Library | Version | Purpose |
|---|---|---|
| orjson / msgpack | per project pin | Record serialisation (format to be chosen during the decompose phase) |
| atomicwrites | latest | Segment file rotation (atomic open of new segment + close of previous) |
| filelock | per project pin | Cross-process safety for the FDR root: during flight only the companion accesses it; the lock guards against an operator tool reading while the companion is still writing |
Error Handling Strategy:
- FdrOpenError at takeoff: refuse takeoff (per AC-NEW-3 every payload class must be present from t=0).
- FdrQueueOverrunError: per-producer drop-oldest, but the rollover event itself is ALWAYS logged (a separate "overrun" record in the FDR records the dropped count and producer-id). Never silent.
- Filesystem write failure mid-flight: log to stdout/stderr (since we can't log to FDR at this point) + STATUSTEXT to GCS; the system continues to emit external positions because losing the audit log doesn't compromise navigation, but the operator must be alerted.
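The mid-flight write-failure policy could look roughly like the sketch below. write_or_alert and the send_statustext callback are hypothetical names; how STATUSTEXT actually reaches the GCS is outside this component and is not modelled here.

```python
import sys
from typing import BinaryIO, Callable


def write_or_alert(segment: BinaryIO, payload: bytes,
                   send_statustext: Callable[[str], None]) -> bool:
    """Try to append payload to the open segment; on failure, alert and carry on.

    Returns True on success. On OSError the failure goes to stderr (the FDR
    itself is the broken thing) and to the operator, and the caller keeps flying.
    """
    try:
        segment.write(payload)
        return True
    except OSError as exc:
        msg = f"C13 segment write failure: errno={exc.errno}"
        print(msg, file=sys.stderr)
        send_statustext(msg)
        return False
```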
6. Extensions and Helpers
| Helper | Purpose | Used By |
|---|---|---|
| RecordSchema | versioned record schema for cross-version FDR compatibility | C13 only; this is internal |
7. Caveats & Edge Cases
Known limitations:
- 64 GB cap is per AC-NEW-3. If payload-class throughput grows beyond what the cap supports for an 8 h flight, the producers MUST throttle or accept oldest-dropped — the FDR will not silently exceed the cap.
- Failed-tile thumbnail forensic exception is the ONLY raw-imagery-adjacent persistence; AC-8.5 must be re-asserted if any new payload class is added.
Potential race conditions:
- The writer thread is the single writer; producers enqueue lock-free. No filesystem contention from within the companion. Operator-tool reads happen post-landing only.
Performance bottlenecks:
- Writer-thread serialisation throughput must exceed peak producer throughput. NFT-LIM-02 (8 h synthetic AC-NEW-3) validates.
8. Dependency Graph
Must be implemented after: nothing internal — C13 is foundational along with C7.
Can be implemented in parallel with: every other component.
Blocks: every component (every component logs to C13).
9. Logging Strategy
| Log Level | When | Example |
|---|---|---|
| ERROR | FdrOpenError, mid-flight filesystem write failure | C13 segment write failure: errno=ENOSPC; STATUSTEXT to GCS |
| WARN | queue overrun (any producer) | C13 queue overrun: producer=c5_state; dropped_count=23 |
| INFO | open/close flight; segment rollover | C13 flight opened: flight_id=…; segment=0 |
| DEBUG | per-write timing (only in dev tier) | C13 record written: kind=estimate; bytes=412; took=0.1ms |
Log format: structured JSON to stdout/journald.
Log storage: stdout / journald, but not C13 itself for ERROR (we'd be writing to the broken thing). FDR records are the project-level "logs" for everything except C13's own operational status.
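A minimal sketch of structured JSON logging to stdout with the Python standard library follows. The field names ("level", "component", "message") and the use of a "fields" key in extra are assumptions; the spec only requires structured JSON. Note that Python names the WARN level "WARNING".

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "component": "C13",
            "message": record.getMessage(),
            # Optional structured fields passed via extra={"fields": {...}}
            **getattr(record, "fields", {}),
        })


def make_logger() -> logging.Logger:
    logger = logging.getLogger("c13")
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A queue overrun would then be reported with make_logger().warning("queue overrun", extra={"fields": {"producer": "c5_state", "dropped_count": 23}}).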