mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 12:11:13 +00:00
Update autodev state, architecture documentation, and glossary terms
Transitioned the autodev state to phase 21, reflecting the completion of Step 5 and the drafting of Step 6 epics. Revised the architecture documentation to clarify the roles of the Tile Manager and its components, ensuring accurate representation of the system's operational flow. Updated glossary entries for Flight State and Operator to incorporate recent changes and enhance clarity on component interactions and responsibilities.
@@ -0,0 +1,136 @@

# C13 — Flight Data Recorder (FDR)

## 1. High-Level Overview

**Purpose**: persist a per-flight record of at most 64 GB covering every payload class onboard (estimates, IMU traces, emitted MAVLink, system health, mid-flight tiles, failed-tile thumbnails at ≤ 0.1 Hz) without silently dropping data (AC-NEW-3). Raw nav/AI-cam frames are excluded (AC-8.5); the failed-tile thumbnail is the only permitted forensic exception. The FDR is the system's audit log: every safety-critical decision, every emitted frame, every signing key rotation, and every spoof-promotion-block lands here.

**Architectural Pattern**: a single concrete `FileFdrWriter` behind an `FdrWriter` interface. A single writer thread is fed by lock-free in-process queues from every component. If the writer thread is overrun, records may be dropped, but every drop is logged as a rollover event; nothing is dropped silently.

**Upstream dependencies**: every component publishes to C13 via in-process pub/sub (drop-oldest-with-rollover-log on overrun).

**Downstream consumers**:
- Post-flight: operator workstation (read via C12 retrieval).
- Real-time: none; C13 is write-only at runtime.
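
The drop-oldest-with-rollover-log behaviour described in this section can be sketched as a bounded per-producer queue that counts its drops and hands the count to the writer thread on the next drain. This is a minimal illustration under stated assumptions, not the project's implementation: `DropOldestQueue` and `OverrunRecord` are hypothetical names, and a plain lock stands in for the lock-free queue the design calls for.

```python
import threading
from collections import deque
from dataclasses import dataclass


@dataclass
class OverrunRecord:
    """Hypothetical structured record describing dropped entries."""
    producer_id: str
    dropped_count: int


class DropOldestQueue:
    """Bounded per-producer queue that counts drops instead of hiding them."""

    def __init__(self, producer_id: str, capacity: int) -> None:
        self.producer_id = producer_id
        self._capacity = capacity
        self._items = deque()
        self._dropped = 0
        # Assumption: a lock keeps the sketch short; the design asks for lock-free.
        self._lock = threading.Lock()

    def put(self, record) -> None:
        with self._lock:
            if len(self._items) >= self._capacity:
                self._items.popleft()  # drop the oldest pending record
                self._dropped += 1     # remember it so the drop is reported
            self._items.append(record)

    def drain(self) -> list:
        """Return pending records, preceded by an OverrunRecord if any were dropped."""
        with self._lock:
            out = []
            if self._dropped:
                out.append(OverrunRecord(self.producer_id, self._dropped))
                self._dropped = 0
            out.extend(self._items)
            self._items.clear()
            return out


queue = DropOldestQueue("c5_state", capacity=3)
for value in range(5):
    queue.put(value)
drained = queue.drain()  # overrun record first, then the three newest values
```

Reporting the drop count at drain time keeps the producer's `put` path cheap while still guaranteeing that the writer sees one overrun record per burst of drops.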

## 2. Internal Interfaces

### Interface: `FdrWriter`

| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `open_flight` | `FlightHeader` | `None` | No (called once at takeoff) | `FdrOpenError` |
| `write_record` | `FdrRecord` | `None` | No (lock-free enqueue) | `FdrQueueOverrunError` (logged but does not raise) |
| `close_flight` | `()` | `FlightFooter` | No (called once at landing) | — |
| `current_size_bytes` | `()` | `int` | No | — |
| `is_rolling` | `()` | `bool` | No | — |

**Input/Output DTOs**:
```
FlightHeader:
  flight_id: uuid
  flight_started_at: ISO 8601 + monotonic_ns
  config_snapshot: JSON
  signing_key_rotation_event: record
  manifest_content_hashes: dict[Path, sha256]

FdrRecord: see data_model.md (FdrRecord; tagged union over payload classes)

FlightFooter:
  flight_ended_at: ISO 8601 + monotonic_ns
  records_written: int
  records_dropped_overrun: int
  bytes_written: int
  rollover_count: int
```
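
One way to read the interface table and DTO listing above is as a Python `Protocol` with two dataclasses. The sketch below is an assumption-laden illustration: the split of each timestamp into an ISO 8601 string plus a monotonic nanosecond field, the use of `dict` for the JSON and record fields, and the `InMemoryFdrWriter` test double are all hypothetical and not taken from the project.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional, Protocol
from uuid import UUID, uuid4


@dataclass(frozen=True)
class FlightHeader:
    flight_id: UUID
    flight_started_at: str            # ISO 8601 wall-clock time
    flight_started_monotonic_ns: int  # paired monotonic clock reading
    config_snapshot: dict[str, Any]
    signing_key_rotation_event: dict[str, Any]
    manifest_content_hashes: dict[str, str]  # path -> sha256 hex digest


@dataclass(frozen=True)
class FlightFooter:
    flight_ended_at: str
    flight_ended_monotonic_ns: int
    records_written: int
    records_dropped_overrun: int
    bytes_written: int
    rollover_count: int


class FdrWriter(Protocol):
    def open_flight(self, header: FlightHeader) -> None: ...
    def write_record(self, record: Any) -> None: ...
    def close_flight(self) -> FlightFooter: ...
    def current_size_bytes(self) -> int: ...
    def is_rolling(self) -> bool: ...


class InMemoryFdrWriter:
    """Hypothetical test double that satisfies FdrWriter structurally."""

    def __init__(self) -> None:
        self.header: Optional[FlightHeader] = None
        self.records: list[Any] = []
        self._bytes = 0

    def open_flight(self, header: FlightHeader) -> None:
        self.header = header

    def write_record(self, record: Any) -> None:
        self.records.append(record)
        self._bytes += len(repr(record).encode())  # stand-in for serialised size

    def close_flight(self) -> FlightFooter:
        return FlightFooter(
            flight_ended_at=datetime.now(timezone.utc).isoformat(),
            flight_ended_monotonic_ns=time.monotonic_ns(),
            records_written=len(self.records),
            records_dropped_overrun=0,
            bytes_written=self._bytes,
            rollover_count=0,
        )

    def current_size_bytes(self) -> int:
        return self._bytes

    def is_rolling(self) -> bool:
        return False


writer: FdrWriter = InMemoryFdrWriter()
writer.open_flight(FlightHeader(
    flight_id=uuid4(),
    flight_started_at=datetime.now(timezone.utc).isoformat(),
    flight_started_monotonic_ns=time.monotonic_ns(),
    config_snapshot={},
    signing_key_rotation_event={},
    manifest_content_hashes={},
))
writer.write_record({"kind": "estimate"})
footer = writer.close_flight()
```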

## 3. External API Specification

Not applicable.

## 4. Data Access Patterns

| Query | Frequency | Hot Path | Index Needed |
|-------|-----------|----------|--------------|
| `write_record` from every component | up to ~100 Hz aggregate | Yes | n/a |
| Post-flight read (operator retrieval) | once per flight | No | filesystem layout per `(flight_id, segment)` |

### Caching Strategy

| Data | Cache Type | TTL | Invalidation |
|------|-----------|-----|-------------|
| In-process queue from each producer | bounded ring (drop-oldest with rollover log) | flight lifetime | per-record write |
| Writer-thread buffer | sized for ≥1 s of typical write load | flight lifetime | flush on segment rollover |

### Storage Estimates

| Table/Collection | Est. Row Count (1yr) | Row Size | Total Size | Growth Rate |
|-----------------|---------------------|----------|------------|-------------|
| Per-flight record file (segmented, oldest-segment-dropped policy) | bounded by 64 GB per AC-NEW-3 | varies per payload class | ≤ 64 GB / flight | bounded by AC-NEW-3 |
| Per-flight tile snapshots (mid-flight tiles) | ~few hundred / flight | 50–200 KB each | up to ~50 MB / flight | bounded by F4 mid-flight gen |
| Per-flight failed-tile thumbnails (AC-8.5 forensic exception) | ≤ 0.1 Hz × 8 h = ≤ 2880 thumbnails / flight | small JPEG | <50 MB | bounded by ≤ 0.1 Hz cap |
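
The figures in the table can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes), which the source does not specify; the derived mean write rate and per-thumbnail budget are implications of the table, not numbers stated in it.

```python
FLIGHT_SECONDS = 8 * 3600             # 8 h flight
THUMBNAIL_PERIOD_S = 10               # 0.1 Hz cap means at most one per 10 s
CAP_BYTES = 64 * 10**9                # assumption: decimal gigabytes
THUMBNAIL_BUDGET_BYTES = 50 * 10**6   # assumption: decimal megabytes

max_thumbnails = FLIGHT_SECONDS // THUMBNAIL_PERIOD_S
mean_write_rate = CAP_BYTES / FLIGHT_SECONDS          # bytes per second
max_thumbnail_size = THUMBNAIL_BUDGET_BYTES / max_thumbnails

print(max_thumbnails)                         # 2880
print(f"{mean_write_rate / 1e6:.2f} MB/s")    # 2.22 MB/s
print(f"{max_thumbnail_size / 1e3:.1f} KB")   # 17.4 KB
```

Under these assumptions, a flight that averages more than about 2.22 MB/s across all producers will hit the 64 GB cap before 8 h and trigger oldest-segment drops, and the thumbnail budget holds only if the average thumbnail stays under roughly 17 KB.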

### Data Management

**Seed data**: none.

**Rollback**: the per-segment file layout makes per-segment deletion safe. The writer never overwrites a closed segment; it only appends to the current open segment and opens a new segment once the current one reaches a configurable size cap.
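
The append-only segment layout described above, combined with the oldest-segment-dropped policy listed under State Management in section 5, could look roughly like the following. This is a sketch under assumptions: `SegmentedWriter`, the `segment-NNNNNN.bin` naming, and the byte-counting approach are hypothetical, and a real writer would also emit a rollover record for every dropped segment rather than only incrementing a counter.

```python
from pathlib import Path


class SegmentedWriter:
    """Hypothetical append-only writer with oldest-segment-dropped rollover."""

    def __init__(self, root: Path, segment_cap: int, total_cap: int) -> None:
        self.root = root
        self.segment_cap = segment_cap
        self.total_cap = total_cap
        self.rollover_count = 0
        self._index = 0
        self._closed = {}  # closed segment path -> size; dicts keep insertion order
        self._open_new_segment()

    def _open_new_segment(self) -> None:
        self._path = self.root / f"segment-{self._index:06d}.bin"
        self._fh = open(self._path, "xb")  # "x" refuses to overwrite a segment
        self._size = 0
        self._index += 1

    def total_bytes(self) -> int:
        return sum(self._closed.values()) + self._size

    def write(self, payload: bytes) -> None:
        # Rotate when the open segment would exceed its own cap.
        if self._size and self._size + len(payload) > self.segment_cap:
            self._fh.close()
            self._closed[self._path] = self._size
            self._open_new_segment()
        # Drop the oldest closed segments until the payload fits under the cap.
        while self._closed and self.total_bytes() + len(payload) > self.total_cap:
            oldest = next(iter(self._closed))
            del self._closed[oldest]
            oldest.unlink()
            self.rollover_count += 1  # a real writer would log this drop in the FDR
        self._fh.write(payload)
        self._size += len(payload)

    def close(self) -> None:
        self._fh.close()
```

Only closed segments are ever deleted, so the open segment is never truncated; the sketch does not handle a single payload larger than the total cap.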

## 5. Implementation Details

**Algorithmic Complexity**: per-record cost is `O(record_size)` for serialisation + write. Aggregate throughput is sized for the worst-case AC-NEW-3 cap.

**State Management**:
- Owns the open per-flight segment file handle.
- Owns the writer thread and the in-process producer queues.
- Owns the rollover policy (the oldest segment is dropped first when the total reaches 64 GB).

**Key Dependencies**:

| Library | Version | Purpose |
|---------|---------|---------|
| orjson / msgpack | per project pin | Record serialisation (serialisation format to be chosen during the decompose phase) |
| atomicwrites | latest | Segment file rotation (atomic open of new segment + close of previous) |
| filelock | per project pin | Cross-process safety for the FDR root (prevents operator-tool reads from overlapping companion writes; during flight only the companion accesses the root) |

**Error Handling Strategy**:
- `FdrOpenError` at takeoff: refuse takeoff (per AC-NEW-3 every payload class must be present from t=0).
- `FdrQueueOverrunError`: per-producer drop-oldest, but the rollover event itself is ALWAYS logged (a separate "overrun" record in the FDR records the dropped count and producer-id). Never silent.
- Filesystem write failure mid-flight: log to stdout/stderr (the FDR itself is unavailable at this point) and send a STATUSTEXT to the GCS. The system continues to emit external positions, because losing the audit log does not compromise navigation, but the operator must be alerted.
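
The first and third error-handling rules above can be sketched as a takeoff gate and a write wrapper. Everything beyond the `FdrOpenError` name is an assumption made for illustration: `open_first_segment`, `preflight`, `write_or_alert`, the segment file name, and the `send_statustext` callback are hypothetical, and the callback merely stands in for whatever the project uses to emit a MAVLink STATUSTEXT.

```python
import sys
from pathlib import Path
from typing import BinaryIO, Callable


class FdrOpenError(Exception):
    """Raised when the first segment cannot be created."""


def open_first_segment(flight_root: Path) -> BinaryIO:
    try:
        flight_root.mkdir(parents=True, exist_ok=True)
        return open(flight_root / "segment-000000.bin", "xb")
    except OSError as exc:
        raise FdrOpenError(f"cannot open FDR under {flight_root}: {exc}") from exc


def preflight(flight_root: Path, open_fc_adapter: Callable[[], None]) -> bool:
    """Return True only if the FDR opened; otherwise the FC adapter stays closed."""
    try:
        segment = open_first_segment(flight_root)
    except FdrOpenError as exc:
        print(f"C13 refusing takeoff: {exc}", file=sys.stderr)
        return False
    segment.close()  # a real writer would keep this handle open
    open_fc_adapter()
    return True


def write_or_alert(segment, payload: bytes,
                   send_statustext: Callable[[str], None]) -> bool:
    """Write to the segment; on failure, alert out-of-band and keep running."""
    try:
        segment.write(payload)
        return True
    except OSError as exc:
        # The FDR is the broken component, so report via stderr and the GCS.
        print(f"C13 segment write failure: {exc}", file=sys.stderr)
        send_statustext("C13 FDR write failure")
        return False
```

Returning `False` instead of raising in `write_or_alert` mirrors the stated policy that navigation output continues even when the audit log is lost.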

## 6. Extensions and Helpers

| Helper | Purpose | Used By |
|--------|---------|---------|
| `RecordSchema` | versioned record schema for cross-version FDR compatibility | C13 only — this is internal |

## 7. Caveats & Edge Cases

**Known limitations**:
- The 64 GB cap is set by AC-NEW-3. If payload-class throughput grows beyond what the cap supports for an 8 h flight, the producers MUST throttle or accept that the oldest segments are dropped; the FDR will not silently exceed the cap.
- The failed-tile thumbnail forensic exception is the ONLY raw-imagery-adjacent persistence; AC-8.5 must be re-asserted if any new payload class is added.

**Potential race conditions**:
- The writer thread is the single writer; producers enqueue lock-free. No filesystem contention from within the companion. Operator-tool reads happen post-landing only.

**Performance bottlenecks**:
- Writer-thread serialisation throughput must exceed peak producer throughput. NFT-LIM-02 (the 8 h synthetic AC-NEW-3 run) validates this.
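
The AC-8.5 limitation above, which C13-IT-03 in the test specification exercises, amounts to an admission check in front of the writer. The sketch below is illustrative only: `Ac85Gate` and the `raw_ai_frame` kind are hypothetical names, while `raw_nav_frame`, `failed_tile_thumbnail`, and `RawFrameWriteForbiddenError` come from the test specification.

```python
class RawFrameWriteForbiddenError(Exception):
    """Named in C13-IT-03; the constructor shape here is an assumption."""


# "raw_nav_frame" appears in C13-IT-03; "raw_ai_frame" is a hypothetical sibling.
FORBIDDEN_KINDS = frozenset({"raw_nav_frame", "raw_ai_frame"})
THUMBNAIL_MIN_INTERVAL_NS = 10 * 10**9  # 0.1 Hz cap: at most one per 10 s


class Ac85Gate:
    """Hypothetical admission check applied before a record is enqueued."""

    def __init__(self) -> None:
        self._last_thumbnail_ns = None
        self.rate_limited_count = 0

    def admit(self, kind: str, now_ns: int) -> bool:
        if kind in FORBIDDEN_KINDS:
            raise RawFrameWriteForbiddenError(kind)
        if kind != "failed_tile_thumbnail":
            return True  # other payload classes are not rate-capped here
        last = self._last_thumbnail_ns
        if last is not None and now_ns - last < THUMBNAIL_MIN_INTERVAL_NS:
            self.rate_limited_count += 1  # counted so the drop can be logged
            return False
        self._last_thumbnail_ns = now_ns
        return True


gate = Ac85Gate()
# Thumbnails offered every 2 s (0.5 Hz) for 20 s: only t=0, 10, 20 s pass.
accepted = [gate.admit("failed_tile_thumbnail", t * 10**9) for t in range(0, 21, 2)]
```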

## 8. Dependency Graph

**Must be implemented after**: nothing internal — C13 is foundational along with C7.

**Can be implemented in parallel with**: every other component.

**Blocks**: every component (every component logs to C13).

## 9. Logging Strategy

| Log Level | When | Example |
|-----------|------|---------|
| ERROR | `FdrOpenError`, mid-flight filesystem write failure | `C13 segment write failure: errno=ENOSPC; STATUSTEXT to GCS` |
| WARN | queue overrun (any producer) | `C13 queue overrun: producer=c5_state; dropped_count=23` |
| INFO | open/close flight; segment rollover | `C13 flight opened: flight_id=…; segment=0` |
| DEBUG | per-write timing (only in dev tier) | `C13 record written: kind=estimate; bytes=412; took=0.1ms` |

**Log format**: structured JSON to stdout/journald.
**Log storage**: stdout / journald. ERROR logs never go to C13 itself, since that would mean writing to the component that has just failed. FDR records are the project-level "logs" for everything except C13's own operational status.
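
A structured-JSON operational log of the kind described above can be produced with the standard `logging` module alone. The sketch below is one possible shape, not the project's logger: the `JsonFormatter` class, the `c13` logger name, and the `fields` key passed through `extra` are assumptions. Note that Python names the level `WARNING` where the table says WARN.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "component": "C13",
            "message": record.getMessage(),
        }
        # Assumption: structured fields travel in `extra={"fields": {...}}`.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger("c13")
    logger.handlers.clear()          # avoid duplicate handlers on re-creation
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)    # DEBUG would be enabled only in the dev tier
    logger.propagate = False
    return logger


log = make_logger()
log.warning("queue overrun",
            extra={"fields": {"producer": "c5_state", "dropped_count": 23}})
```

When the process runs under systemd, lines written to stdout are captured by journald without further configuration, which matches the stated storage target.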
@@ -0,0 +1,166 @@

# Test Specification — C13 Flight Data Recorder

Component-scoped. Suite-level coverage in `_docs/02_document/tests/*.md`.

## Acceptance Criteria Traceability

| AC ID | Acceptance Criterion (one-line) | Test IDs | Coverage |
|-------|---------------------------------|----------|----------|
| AC-1.4 | 95% covariance + source label (FDR record class) | FT-P-03, **C13-IT-01** | Covered |
| AC-4.5 (revised) | Internal smoothing past keyframes (FDR-only path) | FT-P-10, **C13-IT-02** | Covered |
| AC-8.5 | No raw nav/AI-cam frame retention except thumbnail log | FT-P-18, **C13-IT-03** | Covered |
| AC-NEW-3 | FDR ≤64 GB / flight, no silent drops | NFT-LIM-02, **C13-IT-04** | Covered |
| RESTRICT-UAV-4 | No raw-photo storage; tile cache + FDR only | FT-P-18, NFT-LIM-03 | Covered |

---

## Component-Internal Tests

### C13-IT-01: every payload class produces an FDR record

**Summary**: every component documented as an FDR producer publishes records of the right `kind`; missing producer = test failure.

**Traces to**: AC-1.4, AC-NEW-3 (every-payload-class-from-t=0 invariant)

**Description**: spin up a minimal in-process test harness wiring all 14 components (mocked where heavy, real where light); drive 10 s of synthetic flight; assert the FDR contains records of every documented `kind` (estimate, vio_health, vpr_health, match_health, pose_estimate, source_label_change, fc_emit, fc_inbound, signing_key_event, spoof_promotion_block, mid_flight_tile, failed_tile_thumbnail, system_health, segment_rollover).

**Input data**: synthetic 10 s flight.

**Expected result**: every kind present with at least one record.

**Max execution time**: 60 s.

---

### C13-IT-02: smoothed past-keyframe records land in FDR (NOT in FC stream)

**Summary**: when C5 publishes a smoothed past-keyframe revision, it lands in FDR but is NOT forwarded to C8's emission path.

**Traces to**: AC-4.5 (revised)

**Description**: per C5-IT-04 — re-run that scenario; assert (a) FDR contains the smoothed-history record class, (b) the C8 emission stream contains the original (unshifted) value.

**Input data**: shared with C5-IT-04.

**Expected result**: per assertion.

**Max execution time**: 60 s.

---

### C13-IT-03: AC-8.5 forensic-thumbnail-only enforcement

**Summary**: C13 refuses to write a raw nav-cam or AI-cam frame; the only allowed exception is the failed-tile thumbnail at ≤0.1 Hz cap.

**Traces to**: AC-8.5

**Description**: attempt to write an `FdrRecord` of `kind = "raw_nav_frame"`; assert C13 raises `RawFrameWriteForbiddenError`. Attempt `kind = "failed_tile_thumbnail"` at 0.05 Hz; assert accepted. Attempt the same kind at 0.5 Hz (above cap); assert rate-limited (drop with cap log).

**Input data**: scripted producer.

**Expected result**: per assertion.

**Max execution time**: 30 s.

---

### C13-IT-04: 64 GB cap holds without silent drop

**Summary**: under synthetic 8 h replay producing > 64 GB worth of records, C13 enforces the cap via oldest-segment-dropped + always logs the rollover event.

**Traces to**: AC-NEW-3

**Description**: scripted producer that emits at peak rates known to cross 64 GB in 8 h; replay; assert (a) total disk usage stays ≤ 64 GB, (b) every segment-rollover-with-drop is recorded with producer-id + dropped count, (c) the FlightFooter shows non-zero `records_dropped_overrun` matching the rollover events.

**Input data**: synthetic high-rate producer.

**Expected result**: cap held; every drop visible.

**Max execution time**: 8 h on a Tier-1 runner (NFT-LIM-02 budget).

---

### C13-IT-05: queue overrun produces a structured drop record (never silent)

**Summary**: when a producer overruns its in-process queue, C13 writes a structured "overrun" record naming the producer and the dropped count.

**Traces to**: AC-NEW-3 (no-silent-drop)

**Description**: artificially throttle the writer thread; flood one producer's queue; assert (a) the first dropped record triggers an "overrun" record, (b) the overrun record includes producer-id + dropped count, (c) when throttling is removed, normal writes resume.

**Input data**: scripted producer + writer-thread throttle harness.

**Expected result**: overrun record present + accurate count.

**Max execution time**: 60 s.

---

### C13-IT-06: refuse takeoff if `open_flight` fails

**Summary**: per AC-NEW-3, every payload class must be present from t=0, so if C13 cannot open the segment file, takeoff is aborted.

**Traces to**: AC-NEW-3

**Description**: configure `flight_root` to a directory the process cannot write to; call `open_flight`; assert `FdrOpenError` is raised; assert the calling code (the composition root) refuses to open the FC adapter.

**Input data**: read-only `flight_root` directory.

**Expected result**: takeoff aborts; error logged.

**Max execution time**: 5 s.

---

## Performance Tests

### C13-PT-01: writer-thread throughput vs peak producer rate

**Traces to**: AC-NEW-3

**Load scenario**: aggregated peak producer rate (~100 Hz combined records).

**Expected results**:

| Metric | Target | Failure Threshold |
|--------|--------|-------------------|
| Writer throughput | ≥ 200 Hz sustained | < 100 Hz |
| Per-record serialise + write p95 | ≤ 5 ms | > 20 ms |

---

## Security Tests

### C13-ST-01: FDR records cannot be silenced via config

**Summary**: there is no config flag that disables the spoof-promotion-block, signing-key-rotation, or rollover-drop record kinds; they are mandatory per AC-NEW-3 + ADR-008.

**Traces to**: defensive (AC-NEW-3, ADR-008)

**Test procedure**:
1. Search the config schema for any flag that could disable any of those record kinds.
2. Assert no such flag exists.
3. Inject events of each kind under every documented config preset; assert all land in FDR.

**Pass criteria**: no disabling flag found; all events land.
**Fail criteria**: any disabling flag exists or any event is suppressed.

---

## Acceptance Tests

Covered transitively via FT-P-03 / FT-P-10 / FT-P-18 / NFT-LIM-02.

---

## Test Data Management

| Data Set | Source | Size |
|----------|--------|------|
| Synthetic 10 s flight harness | scripted | <10 MB |
| 8 h synthetic high-rate producer | scripted | runtime-generated |
| Read-only `flight_root` fixture | scripted | n/a (fs perms) |

**Setup**: per-test `flight_root` under `tests/tmp/c13/<test-id>/`.
**Teardown**: drop tmp directory.
**Data isolation**: per-test `flight_root`.
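
The setup, teardown, and isolation rules above map naturally onto a context manager that creates and removes one directory per test. The sketch below uses only the standard library; `flight_root_for` is a hypothetical helper name, and the default base path simply mirrors the `tests/tmp/c13/<test-id>/` convention stated above.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def flight_root_for(test_id: str, base: Path = Path("tests/tmp/c13")):
    """Yield a fresh per-test flight_root and remove it afterwards."""
    root = base / test_id
    if root.exists():
        shutil.rmtree(root)  # clear leftovers from an aborted earlier run
    root.mkdir(parents=True)
    try:
        yield root
    finally:
        shutil.rmtree(root, ignore_errors=True)  # teardown: drop tmp directory
```

A test would wrap its body in `with flight_root_for("C13-IT-01") as flight_root:` so that no two tests ever share a directory.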