# Roadmap: GPS-Denied Onboard Navigation System — Stage 2

**Stage:** 2 (independent iteration)
**Created:** 2026-05-10
**Branch:** `stage2` (HEAD = stage1; v1.0 archived under `.planning/archive/v1.0/`)
**Granularity:** standard
**Total phases:** 6
**Total requirements mapped:** 52 / 52 (100% coverage)

---

## Overview

Stage 2 is a **self-contained iteration** with its own phase numbering (1–6). It is NOT a continuation of Stage 1's seven phases. Those are archived under `.planning/archive/v1.0/` and treated as MVP starting capital: the working ESKF + cuVSLAM/ORB VO + GPR + MAVLink stack and its 195 passing tests.

The Stage 2 mission has four parts:

- refactor the inherited MVP into a hexagonal (ports-and-adapters) architecture;
- re-implement (not merge) selected concept-level ideas from the parallel `try02` branch;
- formalize acceptance criteria with testable numeric thresholds;
- add the Azaion 10.05.2026 real-flight integration fixture.

None of this may regress any of the 195 stage1 tests.

The phases are derived from the ten Stage 2 requirement categories (ARCH, AC, SAFE, VERIFY, FDR, VPR, MAVOUT, FIXTURE, TEST, OBS). They are ordered so that each phase stabilizes the Protocol surfaces and test infrastructure that the next phase depends on.

## Phase Dependency Order

```
Phase 1 (ARCH — hexagonal refactor + composition root; Protocols stabilized)
    ↓
Phase 2 (AC + TEST taxonomy + structlog spine — measurement scaffolding)
    ↓
Phase 3 (SAFE state machine + VERIFY anchor gates — authoritative source labels)
    ↓
Phase 4 (Conditional Multi-Scale VPR + FDR — uses SAFE triggers, FDR for audit)
    ↓
Phase 5 (MAVOUT — source-aware GPS_INPUT + spoofing + blackout — needs SAFE labels)
    ↓
Phase 6 (FIXTURE — Azaion replay + CLI + per-env Docker — exercises everything e2e)
```

## Phases

- [ ] **Phase 1: Hexagonal Refactor & Composition Root** — Reorganize the stage1 MVP into a `components/` hexagonal layout with a Protocol-typed DI composition root; no regressions.
- [ ] **Phase 2: Acceptance Criteria + Test Taxonomy + Observability Spine** — Formal AC document with numeric thresholds, `tests/{unit,integration,blackbox,sitl,e2e}/` taxonomy, structlog correlation_id spine.
- [ ] **Phase 3: Safety Anchor State Machine & Geometry-Gated Verifier** — Authoritative `source_label` ownership plus accept/reject gates for satellite anchors before they reach the ESKF.
- [ ] **Phase 4: Conditional Multi-Scale VPR + Flight Data Recorder** — Trigger-driven DINOv2 forward pass, multi-scale FAISS chunks, append-only event log with bounded storage.
- [ ] **Phase 5: MAVLink Source-Aware Output & Spoofing/Blackout Handling** — Source labels and anchor age in GPS_INPUT, spoofing promotion <3s, visual-blackout dead reckoning ≤400ms, ODOMETRY scaffold behind a feature flag.
- [ ] **Phase 6: Real-Flight Fixture (Azaion 10.05.2026) + CLI + Per-Env Docker** — End-to-end integration test on real flight data, `gps_denied` typer CLI, split Jetson/x86 Dockerfiles.

## Phase Details

### Phase 1: Hexagonal Refactor & Composition Root

**Goal**: The stage1 MVP is reorganized into a hexagonal (ports-and-adapters) layout with an explicit DI composition root, and all 195 stage1 tests still pass.
**Depends on**: Nothing (first phase; consumes the archived stage1 code as input).
**Requirements**: ARCH-01, ARCH-02, ARCH-03, ARCH-04, ARCH-05, ARCH-06, ARCH-07
**Success Criteria** (what must be TRUE):

1. Every swappable component (vio, satellite_matcher, gpr, anchor_verifier, safety_state, flight_recorder, mavlink_io, coordinate_transforms) lives under `src/gps_denied/components//` with its own `protocol.py`, concrete implementations and, where needed, a `native/` bridge.
2. Hot-path types (`FrameState`, `IMUSample`, `PositionEstimate`, `VOEstimate`, `SatelliteAnchor`) are `@dataclass(slots=True, frozen=True)`, and Pydantic no longer touches the per-frame path.
3. Calling `build_pipeline(env="x86_dev")` / `"jetson"` / `"ci"` / `"sitl"` from `pipeline/composition.py` returns a fully wired `Pipeline` with environment-correct adapters, and no concrete imports leak into pipeline orchestration.
4. Per-environment YAML configs (`config/{jetson,x86_dev,ci,sitl}.yaml`) load via `pydantic-settings` into a typed `RuntimeConfig` that drives composition.
5. `pytest` runs all 195 stage1 tests green (+ 8 SITL tests skipped), and accuracy benchmarks show no regression against the archived stage1 baseline.

**Plans**: TBD

### Phase 2: Acceptance Criteria + Test Taxonomy + Observability Spine

**Goal**: The project gains a formal, testable acceptance-criteria contract, a structured test taxonomy and a structured-logging spine. Together these form the measurement scaffolding that every later phase needs to prove its claims.
**Depends on**: Phase 1 (the Protocol surfaces and the `components/` layout must exist before tests and ACs can reference them).
**Requirements**: AC-01, AC-02, AC-03, AC-04, AC-05, AC-06, TEST-01, TEST-02, TEST-03, OBS-01
**Success Criteria** (what must be TRUE):

1. `_docs/00_problem/acceptance_criteria.md` lists every AC-1.x…AC-NEW-x with a numeric threshold, a validation method and the linked test ID(s); no AC entry is unbound.
2. `tests/` is reorganized into `unit/integration/blackbox/sitl/e2e/` and every existing test is reclassified. `pytest -m unit` (and likewise `integration`, `blackbox`, `sitl`, `e2e`) selects the corresponding subset for CI.
3. Running `scripts/gen_ac_traceability.py` produces `.planning/AC-TRACEABILITY.md`, which links every AC ID → test ID(s) → component(s). CI fails if any AC is orphaned.
4. The position-accuracy, failure-mode and real-time-performance ACs are wired to `tests/integration/accuracy/`, `tests/blackbox/failure_modes/` and a benchmark harness that emits CI-tracked metrics.
5. The pipeline emits structured JSON via `structlog`, with a `correlation_id` (frame_id) on every per-frame log line. Pydantic logging schemas guard the boundary records.
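The per-frame `correlation_id` in criterion 5 can be sketched without importing structlog. A structlog processor is a plain callable that takes `(logger, method_name, event_dict)` and returns the event dict, so the stamping logic can be tested on its own. This is a minimal sketch under that assumption. The names `current_frame_id`, `frame_scope`, `add_correlation_id` and `render` are hypothetical, and `render` only stands in for structlog's processor chain followed by a JSON renderer.

```python
from __future__ import annotations

import contextvars
import json
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical context variable holding the frame_id currently being processed.
current_frame_id: contextvars.ContextVar[Optional[int]] = contextvars.ContextVar(
    "current_frame_id", default=None
)

EventDict = Dict[str, Any]
Processor = Callable[[Any, str, EventDict], EventDict]


@contextmanager
def frame_scope(frame_id: int) -> Iterator[None]:
    """Bind frame_id for everything logged while this frame is processed."""
    token = current_frame_id.set(frame_id)
    try:
        yield
    finally:
        current_frame_id.reset(token)  # restore the previous value (None outside frames)


def add_correlation_id(_logger: Any, _method: str, event_dict: EventDict) -> EventDict:
    """structlog-style processor: copy the bound frame_id into correlation_id."""
    frame_id = current_frame_id.get()
    if frame_id is not None:
        event_dict["correlation_id"] = frame_id
    return event_dict


def render(event: str, processors: List[Processor], **fields: Any) -> str:
    """Stand-in for structlog's processor chain followed by a JSON renderer."""
    event_dict: EventDict = {"event": event, **fields}
    for processor in processors:
        event_dict = processor(None, "info", event_dict)
    return json.dumps(event_dict, sort_keys=True)


with frame_scope(1042):
    print(render("vo_update", [add_correlation_id], inliers=118))
    # prints {"correlation_id": 1042, "event": "vo_update", "inliers": 118}
```

In the real pipeline, the processor would be registered through `structlog.configure(processors=[..., add_correlation_id, structlog.processors.JSONRenderer()])`. structlog's own `structlog.contextvars.bind_contextvars` and `merge_contextvars` pair is an alternative to the hand-rolled context variable shown here.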
**Plans**: TBD

### Phase 3: Safety Anchor State Machine & Geometry-Gated Verifier

**Goal**: A separate safety layer owns the authoritative `source_label`; the ESKF does not. This layer enforces monotonic covariance growth in non-anchored modes and accepts only those satellite anchors that pass formal geometric gates.
**Depends on**: Phase 2 (needs the AC document, test taxonomy and structlog so that state-machine behavior is testable and observable).
**Requirements**: SAFE-01, SAFE-02, SAFE-03, SAFE-04, SAFE-05, SAFE-06, VERIFY-01, VERIFY-02, VERIFY-03, VERIFY-04, VERIFY-05
**Success Criteria** (what must be TRUE):

1. Every emitted `PositionEstimate` carries a label set by `SafetyAnchorStateMachine`, one of `satellite_anchored` / `vo_extrapolated` / `dead_reckoned`. It also carries an `anchor_age_ms` field that increases until the next accepted anchor.
2. Property-based tests prove that covariance never decreases without an accepted anchor. A unit-test matrix exercises all 9 declared state transitions.
3. `GeometryGatedAnchorVerifier` accepts or rejects each candidate using configurable gates: min inliers, max mean reprojection error, max homography condition number and a freshness window. Every rejection carries a machine-readable reason.
4. The state machine exposes tile-write eligibility (`can_persist_tile`), which is `false` whenever the system is in `dead_reckoned`. This keeps the tile cache from being poisoned during blind flight.
5. The state machine never sees raw VPR top-K candidates; `AnchorVerifier` is the only path that can hand it an accepted anchor. A benchmark mode lets matcher profiles be compared offline on a fixed frame.

**Plans**: TBD

### Phase 4: Conditional Multi-Scale VPR + Flight Data Recorder

**Goal**: This phase has three goals:

- DINOv2 retrieval runs only when re-localization is actually needed.
- Chunks are decoupled from storage tiles and give multi-scale coverage.
- Every state transition, anchor decision and MAVLink emission is captured in an append-only flight recorder with bounded storage and explicit health states.
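The third goal above, an append-only recorder with bounded storage and explicit health states, can be illustrated with a minimal sketch. Only three details come from this phase's success criteria: the `segment-NNNN.jsonl` naming, the per-segment and total byte limits, and the `ok`/`degraded`/`critical` states. Everything else is an assumption. The class name `FlightRecorderSketch` is hypothetical, and so is the health policy: `degraded` once 80% of the total budget is used, `critical` once any event has been dropped.

```python
import json
from pathlib import Path
from typing import Any


class FlightRecorderSketch:
    """Hypothetical minimal FDR: append-only JSONL segments under fixed byte budgets."""

    def __init__(
        self,
        root: Path,
        flight_id: str,
        segment_max_bytes: int,
        total_max_bytes: int,
        degraded_ratio: float = 0.8,  # assumed threshold, not taken from the roadmap
    ) -> None:
        self.flight_dir = root / flight_id
        self.flight_dir.mkdir(parents=True, exist_ok=True)
        self.segment_max_bytes = segment_max_bytes
        self.total_max_bytes = total_max_bytes
        self.degraded_ratio = degraded_ratio
        self.segment_index = 0
        self.segment_bytes = 0
        self.total_bytes = 0
        self.dropped = 0

    def _segment_path(self) -> Path:
        return self.flight_dir / f"segment-{self.segment_index:04d}.jsonl"

    def record(self, kind: str, **fields: Any) -> bool:
        """Append one event; return False if the total budget forced a drop."""
        line = (json.dumps({"kind": kind, **fields}, sort_keys=True) + "\n").encode("utf-8")
        if self.total_bytes + len(line) > self.total_max_bytes:
            self.dropped += 1  # never overwrite earlier events; count the loss instead
            return False
        if self.segment_bytes and self.segment_bytes + len(line) > self.segment_max_bytes:
            self.segment_index += 1  # roll over to a fresh segment file
            self.segment_bytes = 0
        with self._segment_path().open("ab") as f:  # binary append mode only
            f.write(line)
        self.segment_bytes += len(line)
        self.total_bytes += len(line)
        return True

    @property
    def health(self) -> str:
        if self.dropped:
            return "critical"  # events have already been lost
        if self.total_bytes >= self.degraded_ratio * self.total_max_bytes:
            return "degraded"  # close to the storage budget
        return "ok"
```

Passing `Path("data/fdr")` as `root` reproduces the `data/fdr/{flight_id}/segment-NNNN.jsonl` layout. Once the budget is exhausted, the sketch drops new events rather than discarding old ones, which keeps the log strictly append-only. Deleting the oldest segment instead would favor recent events, at the cost of losing the start of the flight.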
**Depends on**: Phase 3 (VPR triggers and FDR events ride on SAFE state transitions and VERIFY accept/reject decisions).
**Requirements**: VPR-01, VPR-02, VPR-03, VPR-04, VPR-05, FDR-01, FDR-02, FDR-03, FDR-04, FDR-05, FDR-06
**Success Criteria** (what must be TRUE):

1. In steady state, the pipeline ranks chunks by an IMU+VO geometric prior and skips the DINOv2 forward pass. DINOv2 runs only on declared re-loc triggers: cold start, sharp turn, σ_xy > 50m, VO failure for ≥2 frames, or a disconnected segment.
2. VPR chunks cover the operating area with a 600–800m ground footprint and 40–50% overlap. As a result, any frame whose ground extent is no wider than the overlap band (roughly 240–400m) falls fully inside ≥1 chunk. FAISS holds both fine-scale (z=20) and coarse-scale (z=17/18) descriptor sets.
3. Top-K is dynamic: K=5 when stable, K=20 under active conflict, K=50 for an expanding window. The integration uses the existing `chunk_manager.py` / `gpr.py` API surface without breaking the stage1 GPR contracts.
4. `FlightRecorder` writes append-only JSONL segments to `data/fdr/{flight_id}/segment-NNNN.jsonl`, enforces configurable per-segment and total storage byte limits, and exposes `health ∈ {ok, degraded, critical}`.
5. State transitions, anchor accept/reject decisions, MAVLink sends and pipeline errors are all recorded as FDR events. On tile-generation failures, AC-NEW-3 forensic thumbnails fire at ≤0.1Hz within the FDR size budget.

**Plans**: TBD

### Phase 5: MAVLink Source-Aware Output & Spoofing/Blackout Handling

**Goal**: The MAVLink output that the flight controller actually sees carries source provenance and reacts correctly to GPS spoofing and visual blackout. The dual-channel ODOMETRY path is scaffolded but disabled.
**Depends on**: Phase 4 (needs SAFE source labels, the FDR audit channel and VPR triggers to drive blackout and promotion semantics).
**Requirements**: MAVOUT-01, MAVOUT-02, MAVOUT-03, MAVOUT-04
**Success Criteria** (what must be TRUE):

1. Every `GPS_INPUT` message reflects the `source_label`, `anchor_age_ms` and `covariance_semimajor_m` of the corresponding `PositionEstimate`. `covariance_semimajor_m` is mapped into `horiz_accuracy`; `source_label` and `anchor_age_ms` travel in a companion custom STATUSTEXT.
2. When the rolling average of real-GPS health drops below its threshold, the system promotes its own estimate to FC primary within <3s. It emits a `STATUSTEXT` on every promotion and demotion.
3. When the camera produces no usable signal, the pipeline switches to `dead_reckoned` within ≤1 processed frame OR ≤400ms. It then emits a `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT at 1–2Hz until imagery returns.
4. The `ODOMETRY` emitter exists in code but is disabled in stage 2 by `config.mavlink.odometry_enabled=false`. An integration test asserts that ODOMETRY is intentionally absent on the wire.

**Plans**: TBD

### Phase 6: Real-Flight Fixture (Azaion 10.05.2026) + CLI + Per-Env Docker

**Goal**: The whole stack is exercised end-to-end against real flight data. An operator-facing CLI replays flights and runs AC benchmarks, and per-environment Docker images close the deployment loop.
**Depends on**: Phase 5 (this is the final phase; it exercises ARCH + AC + SAFE + VERIFY + VPR + FDR + MAVOUT against the Azaion fixture).
**Requirements**: FIXTURE-01, FIXTURE-02, FIXTURE-03, FIXTURE-04, FIXTURE-05, FIXTURE-06, FIXTURE-07, OBS-02, OBS-03
**Success Criteria** (what must be TRUE):

1. `tests/integration/azaion_flight/` runs against `Data/Azaion/10.05.2026/` (tlog, cropped EO video and MAVLink CSV). It is documented in `_docs/00_problem/fixtures.md` together with ground-truth references and known limitations.
2. `scripts/prep_azaion_fixture.py` produces three outputs: HUD-stripped EO frames at 0.7 fps, an IMU/GPS/ATTITUDE CSV extracted from the tlog, and a timestamp-aligned manifest.
3. MAVLink replay decodes every `GLOBAL_POSITION_INT` / `RAW_IMU` / `ATTITUDE` message without error. ESKF replay on the real IMU samples produces no NaN/Inf values and shows bounded covariance growth. An ORB-SLAM3 VO smoke test registers ≥30% of the cropped EO frames.
4. The GPS-denial simulation masks `GPS_RAW_INT` for t∈[180s, 280s]. The pipeline correctly switches to `vo_extrapolated`, then back to `satellite_anchored` when GPS returns.
5. The `gps_denied` typer CLI exposes `replay --tlog ... --video ...`, `benchmark --scenario ...` and `bench-ac AC-1.1`. `Dockerfile.x86_dev` and `Dockerfile.jetson` (multi-stage, with a TRT engine prebuild step) both build green and run the replay end-to-end on their respective platforms.

**Plans**: TBD

## Progress

| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Hexagonal Refactor & Composition Root | 0/0 | Not started | - |
| 2. Acceptance Criteria + Test Taxonomy + Observability Spine | 0/0 | Not started | - |
| 3. Safety Anchor State Machine & Geometry-Gated Verifier | 0/0 | Not started | - |
| 4. Conditional Multi-Scale VPR + Flight Data Recorder | 0/0 | Not started | - |
| 5. MAVLink Source-Aware Output & Spoofing/Blackout Handling | 0/0 | Not started | - |
| 6. Real-Flight Fixture (Azaion 10.05.2026) + CLI + Per-Env Docker | 0/0 | Not started | - |

## Coverage Summary

| Category | Count | Phase |
|----------|-------|-------|
| ARCH | 7 | Phase 1 |
| AC | 6 | Phase 2 |
| TEST | 3 | Phase 2 |
| OBS-01 | 1 | Phase 2 |
| SAFE | 6 | Phase 3 |
| VERIFY | 5 | Phase 3 |
| VPR | 5 | Phase 4 |
| FDR | 6 | Phase 4 |
| MAVOUT | 4 | Phase 5 |
| FIXTURE | 7 | Phase 6 |
| OBS-02, OBS-03 | 2 | Phase 6 |
| **Total** | **52** | **6 phases** |

100% of Stage 2 requirements mapped; no orphans; no duplicates.
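The Coverage Summary arithmetic (7 + 6 + 3 + 1 + 6 + 5 + 5 + 6 + 4 + 7 + 2 = 52) and the no-duplicates claim can be checked mechanically. Below is a minimal sketch. It assumes requirement IDs follow the zero-padded `CATEGORY-NN` pattern used on each phase's **Requirements** line; `PHASE_REQUIREMENTS` and `coverage_report` are hypothetical names. An orphan check would also need the master requirement list, which this roadmap does not reproduce.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Requirement IDs per phase, transcribed from each phase's **Requirements** line.
PHASE_REQUIREMENTS: Dict[int, List[str]] = {
    1: [f"ARCH-{i:02d}" for i in range(1, 8)],
    2: [f"AC-{i:02d}" for i in range(1, 7)]
    + [f"TEST-{i:02d}" for i in range(1, 4)]
    + ["OBS-01"],
    3: [f"SAFE-{i:02d}" for i in range(1, 7)] + [f"VERIFY-{i:02d}" for i in range(1, 6)],
    4: [f"VPR-{i:02d}" for i in range(1, 6)] + [f"FDR-{i:02d}" for i in range(1, 7)],
    5: [f"MAVOUT-{i:02d}" for i in range(1, 5)],
    6: [f"FIXTURE-{i:02d}" for i in range(1, 8)] + ["OBS-02", "OBS-03"],
}


def coverage_report(mapping: Dict[int, List[str]]) -> Tuple[int, List[str], Counter]:
    """Return (total IDs, IDs that appear more than once, count per category)."""
    all_ids = [rid for ids in mapping.values() for rid in ids]
    duplicates = sorted(rid for rid, n in Counter(all_ids).items() if n > 1)
    # "OBS-01" -> "OBS": strip the numeric suffix to get the category.
    per_category = Counter(rid.rsplit("-", 1)[0] for rid in all_ids)
    return len(all_ids), duplicates, per_category


total, duplicates, per_category = coverage_report(PHASE_REQUIREMENTS)
print(total, duplicates, len(per_category))  # prints: 52 [] 10
```

The same function would flag a requirement accidentally listed under two phases. For example, appending `"ARCH-01"` to phase 5 makes `coverage_report` return `["ARCH-01"]` as a duplicate.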