- 02-05-SUMMARY.md: full execution record for plan 02-05
- ROADMAP.md: plan progress updated (5 summaries of 7 plans complete)
- REQUIREMENTS.md: AC-06 and TEST-02 marked complete

Per-marker CI jobs and the ac-traceability gate are now the CI contract. All 21 orphan ACs annotated pending-phase-N; --check exits 0 locally.
Roadmap: GPS-Denied Onboard Navigation System — Stage 2
Stage: 2 (independent iteration)
Created: 2026-05-10
Branch: stage2 (HEAD = stage1; v1.0 archived under .planning/archive/v1.0/)
Granularity: standard
Total phases: 6
Total requirements mapped: 52 / 52 (100% coverage)
Overview
Stage 2 is a self-contained iteration with its own phase numbering (1–6). It is NOT a continuation of Stage 1's seven phases — those are archived under .planning/archive/v1.0/ and treated as MVP starting capital (the working ESKF + cuVSLAM/ORB VO + GPR + MAVLink + 195 passing tests).
The Stage 2 mission: refactor the inherited MVP into a hexagonal/ports-and-adapters architecture, re-implement (not merge) selected concept-level ideas from the parallel try02 branch, formalize acceptance criteria with testable numerics, and add the Azaion 10.05.2026 real-flight integration fixture — all without regressing any of the 195 stage1 tests.
Phases are derived from the ten Stage 2 requirement categories (ARCH, AC, SAFE, VERIFY, FDR, VPR, MAVOUT, FIXTURE, TEST, OBS) and ordered so each phase stabilizes the Protocol surfaces and test infrastructure that the next phase depends on.
Phase Dependency Order
Phase 1 (ARCH — hexagonal refactor + composition root; Protocols stabilized)
↓
Phase 2 (AC + TEST taxonomy + structlog spine — measurement scaffolding)
↓
Phase 3 (SAFE state machine + VERIFY anchor gates — authoritative source labels)
↓
Phase 4 (Conditional Multi-Scale VPR + FDR — uses SAFE triggers, FDR for audit)
↓
Phase 5 (MAVOUT — source-aware GPS_INPUT + spoofing + blackout — needs SAFE labels)
↓
Phase 6 (FIXTURE — Azaion replay + CLI + per-env Docker — exercises everything e2e)
Phases
- Phase 1: Hexagonal Refactor & Composition Root. Reorganize stage1 MVP into `components/` hexagonal layout with Protocol-typed DI composition root; no regressions.
- Phase 2: Acceptance Criteria + Test Taxonomy + Observability Spine. Formal AC document with numeric thresholds, `tests/{unit,integration,blackbox,sitl,e2e}/` taxonomy, structlog correlation_id spine.
- Phase 3: Safety Anchor State Machine & Geometry-Gated Verifier. Authoritative `source_label` ownership + accept/reject gates for satellite anchors before they reach ESKF.
- Phase 4: Conditional Multi-Scale VPR + Flight Data Recorder. Trigger-driven DINOv2 forward, multi-scale FAISS chunks, append-only event log with bounded storage.
- Phase 5: MAVLink Source-Aware Output & Spoofing/Blackout Handling. Source labels + anchor age in GPS_INPUT, spoofing-promotion <3s, visual-blackout dead-reckoning ≤400ms, ODOMETRY scaffold behind feature flag.
- Phase 6: Real-Flight Fixture (Azaion 10.05.2026) + CLI + Per-Env Docker. End-to-end integration test on real flight data, `gps_denied` typer CLI, split Jetson/x86 Dockerfiles.
Phase Details
Phase 1: Hexagonal Refactor & Composition Root
Goal: Stage1 MVP reorganized into hexagonal/ports-and-adapters layout with explicit DI composition root; all 195 stage1 tests still pass.
Depends on: Nothing (first phase; consumes stage1 archived code as input).
Requirements: ARCH-01, ARCH-02, ARCH-03, ARCH-04, ARCH-05, ARCH-06, ARCH-07
Success Criteria (what must be TRUE):
- Every swappable component (vio, satellite_matcher, gpr, anchor_verifier, safety_state, flight_recorder, mavlink_io, coordinate_transforms) lives under `src/gps_denied/components/<name>/` with its own `protocol.py` + concrete impls + (where needed) `native/` bridge.
- Hot-path types (`FrameState`, `IMUSample`, `PositionEstimate`, `VOEstimate`, `SatelliteAnchor`) are `@dataclass(slots=True, frozen=True)` and Pydantic no longer touches the per-frame path.
- Calling `build_pipeline(env="x86_dev")` / `"jetson"` / `"ci"` / `"sitl"` from `pipeline/composition.py` returns a fully-wired `Pipeline` with environment-correct adapters and no concrete imports leaking into pipeline orchestration.
- Per-environment YAML configs (`config/{jetson,x86_dev,ci,sitl}.yaml`) load via `pydantic-settings` into a typed `RuntimeConfig` that drives composition.
- `pytest` runs all 195 stage1 tests (+ 8 SITL skipped) green and accuracy benchmarks show no regression vs the archived stage1 baseline.
Plans: TBD
Phase 2: Acceptance Criteria + Test Taxonomy + Observability Spine
Goal: Project gains a formal, testable acceptance-criteria contract, a structured test taxonomy, and a structured-logging spine: the measurement scaffolding every later phase needs to prove its claims.
Depends on: Phase 1 (Protocol surfaces and components/ layout must exist before tests/AC can reference them).
Requirements: AC-01, AC-02, AC-03, AC-04, AC-05, AC-06, TEST-01, TEST-02, TEST-03, OBS-01
Success Criteria (what must be TRUE):
- `_docs/00_problem/acceptance_criteria.md` lists every AC-1.x…AC-NEW-x with numeric threshold + validation method + linked test ID(s); no AC entry is unbound.
- `tests/` is reorganized into `unit/integration/blackbox/sitl/e2e/`, every existing test is reclassified, and `pytest -m unit|integration|blackbox|sitl|e2e` selects the right subset for CI.
- Running `scripts/gen_ac_traceability.py` produces `.planning/AC-TRACEABILITY.md` linking every AC ID → test ID(s) → component(s); CI fails if any AC is orphaned.
- Position-accuracy, failure-mode, and real-time-performance ACs are wired to `tests/integration/accuracy/`, `tests/blackbox/failure_modes/`, and a benchmark harness that emits CI-tracked metrics.
- Pipeline emits structured JSON via `structlog` with `correlation_id` (frame_id) on every per-frame log line, and Pydantic logging schemas guard the boundary records.
Plans: TBD
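The orphan gate in the traceability criterion can be sketched as follows. This is not the contents of `scripts/gen_ac_traceability.py`: the `@pytest.mark.ac("...")` marker convention, the function names, and the shape of the `declared` mapping are assumptions chosen for illustration. The only behaviour taken from this roadmap is that an AC with no linked test fails the check unless it carries a pending annotation.

```python
import re
from typing import Optional

# Hypothetical convention: tests tag the AC they cover with @pytest.mark.ac("AC-1.1").
AC_MARKER = re.compile(r'@pytest\.mark\.ac\("([^"]+)"\)')


def collect_links(test_sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each AC ID to the sorted list of test files that reference it."""
    links: dict[str, set[str]] = {}
    for path, text in test_sources.items():
        for ac_id in AC_MARKER.findall(text):
            links.setdefault(ac_id, set()).add(path)
    return {ac_id: sorted(paths) for ac_id, paths in links.items()}


def find_orphans(declared: dict[str, Optional[str]],
                 links: dict[str, list[str]]) -> list[str]:
    """An AC is orphaned when it has no linked test and no pending annotation.

    `declared` maps AC ID -> annotation such as "pending-phase-3", or None.
    """
    return sorted(ac_id for ac_id, pending in declared.items()
                  if ac_id not in links and pending is None)


def check(declared: dict[str, Optional[str]], test_sources: dict[str, str]) -> int:
    """Exit code for a CI gate: 0 when nothing is orphaned, 1 otherwise."""
    return 1 if find_orphans(declared, collect_links(test_sources)) else 0
```

Taking file contents as a dictionary instead of reading the filesystem keeps the gate logic trivially unit-testable; a real script would presumably walk `tests/` and parse the AC document.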
Phase 3: Safety Anchor State Machine & Geometry-Gated Verifier
Goal: A separate safety layer — not the ESKF — owns the authoritative source_label, enforces monotonic covariance growth in non-anchored modes, and only accepts satellite anchors that pass formal geometric gates.
Depends on: Phase 2 (needs AC document + test taxonomy + structlog so state-machine behavior is testable and observable).
Requirements: SAFE-01, SAFE-02, SAFE-03, SAFE-04, SAFE-05, SAFE-06, VERIFY-01, VERIFY-02, VERIFY-03, VERIFY-04, VERIFY-05
Success Criteria (what must be TRUE):
- Every emitted `PositionEstimate` carries one of `satellite_anchored / vo_extrapolated / dead_reckoned` set by `SafetyAnchorStateMachine`, plus an `anchor_age_ms` field that increases until the next accepted anchor.
- Property-based tests prove covariance never decreases without an accepted anchor, and a unit-test matrix exercises all 9 declared state transitions.
- `GeometryGatedAnchorVerifier` accepts/rejects each candidate using configurable gates (min inliers, max mean reprojection error, max homography condition number, freshness window) and emits a machine-readable rejection reason on every reject.
- Tile-write eligibility (`can_persist_tile`) is exposed by the state machine and is `false` whenever the system is in `dead_reckoned`, so the tile cache cannot be poisoned during blind flight.
- The state machine never sees raw VPR top-K candidates (`AnchorVerifier` is the only path that can hand it an accepted anchor), and benchmark mode lets matcher profiles be compared offline on a fixed frame.
Plans: TBD
Phase 4: Conditional Multi-Scale VPR + Flight Data Recorder
Goal: DINOv2 retrieval runs only when re-localization is actually needed; chunks are decoupled from storage tiles with multi-scale coverage; every state transition / anchor decision / MAVLink emission is captured in an append-only flight recorder with bounded storage and explicit health states.
Depends on: Phase 3 (VPR triggers and FDR events ride on SAFE state-transitions and VERIFY accept/reject decisions).
Requirements: VPR-01, VPR-02, VPR-03, VPR-04, VPR-05, FDR-01, FDR-02, FDR-03, FDR-04, FDR-05, FDR-06
Success Criteria (what must be TRUE):
- In steady state the pipeline ranks chunks by IMU+VO geometric prior and skips the DINOv2 forward; DINOv2 runs only on declared re-loc triggers (cold start, sharp turn, σ_xy > 50m, VO failure ≥2 frames, disconnected segment).
- VPR chunks cover the operating area with 600–800m ground footprint and 40–50% overlap so any frame footprint falls fully inside ≥1 chunk; FAISS holds both fine-scale (z=20) and coarse-scale (z=17/18) descriptor sets.
- Top-K is dynamic (K=5 stable, K=20 active-conflict, K=50 expanding-window) and the integration uses the existing `chunk_manager.py` / `gpr.py` API surface without breaking stage1 GPR contracts.
- `FlightRecorder` writes append-only JSONL segments to `data/fdr/{flight_id}/segment-NNNN.jsonl`, enforces configurable segment + total storage byte limits, and exposes `health ∈ {ok, degraded, critical}`.
- State transitions, anchor accept/reject decisions, MAVLink sends, and pipeline errors are all recorded as FDR events; AC-NEW-3 forensic thumbnails fire at ≤0.1Hz on tile-generation failures within the FDR size budget.
Plans: TBD
Phase 5: MAVLink Source-Aware Output & Spoofing/Blackout Handling
Goal: The MAVLink output the flight controller actually sees carries source provenance and reacts correctly to GPS spoofing and visual blackout, with the dual-channel ODOMETRY path scaffolded but disabled.
Depends on: Phase 4 (needs SAFE source labels, FDR audit channel, and VPR triggers to drive blackout/promotion semantics).
Requirements: MAVOUT-01, MAVOUT-02, MAVOUT-03, MAVOUT-04
Success Criteria (what must be TRUE):
- Every `GPS_INPUT` message carries `source_label`, `anchor_age_ms`, and `covariance_semimajor_m` propagated from the corresponding `PositionEstimate` (mapped into `horiz_accuracy` and a custom STATUSTEXT for label/age).
- When real-GPS health rolling average drops below threshold, the system promotes its own estimate to FC primary within <3s and emits a `STATUSTEXT` on every promotion/demotion.
- When the camera produces no usable signal, the pipeline switches to `dead_reckoned` within ≤1 processed frame OR ≤400ms and emits `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT at 1–2Hz until imagery returns.
- The `ODOMETRY` emitter exists in code but is disabled by `config.mavlink.odometry_enabled=false` in stage 2, and an integration test asserts ODOMETRY is intentionally absent on the wire.
Plans: TBD
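The visual-blackout criterion combines two deadlines (one processed frame, or 400 ms with no usable imagery) with a rate-limited status message. A minimal sketch under stated assumptions: timestamps are injected in milliseconds, `on_tick` is called by some periodic scheduler more often than every 400 ms, and the STATUSTEXT is emitted at 1 Hz (the low end of the 1 to 2 Hz range). The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

BLACKOUT_TIMEOUT_MS = 400.0     # from the success criterion
STATUSTEXT_PERIOD_MS = 1000.0   # assumed 1 Hz, within the stated 1-2 Hz range
BLACKOUT_TEXT = "VISUAL_BLACKOUT_IMU_ONLY"


@dataclass
class BlackoutWatchdog:
    last_usable_ms: float
    in_blackout: bool = False
    _last_text_ms: Optional[float] = None

    def on_frame(self, now_ms: float, usable: bool) -> list[str]:
        """Handle one processed frame; return STATUSTEXT strings to send now."""
        if usable:
            # Imagery is back: leave blackout and reset the rate limiter.
            self.last_usable_ms = now_ms
            self.in_blackout = False
            self._last_text_ms = None
            return []
        self.in_blackout = True  # switch within one processed frame
        return self._maybe_text(now_ms)

    def on_tick(self, now_ms: float) -> list[str]:
        """Periodic check that covers the case where no frames arrive at all."""
        if now_ms - self.last_usable_ms >= BLACKOUT_TIMEOUT_MS:
            self.in_blackout = True
        return self._maybe_text(now_ms) if self.in_blackout else []

    def _maybe_text(self, now_ms: float) -> list[str]:
        """Emit the blackout text at most once per STATUSTEXT_PERIOD_MS."""
        if self._last_text_ms is None or now_ms - self._last_text_ms >= STATUSTEXT_PERIOD_MS:
            self._last_text_ms = now_ms
            return [BLACKOUT_TEXT]
        return []
```

While `in_blackout` is true the caller would label estimates `dead_reckoned`. Injecting the clock instead of reading it inside the class makes the 400 ms boundary exactly testable.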
Phase 6: Real-Flight Fixture (Azaion 10.05.2026) + CLI + Per-Env Docker
Goal: The whole stack is exercised end-to-end against real flight data, an operator-facing CLI replays flights and runs AC benchmarks, and per-environment Docker images close the deployment loop.
Depends on: Phase 5 (final phase; exercises ARCH + AC + SAFE + VERIFY + VPR + FDR + MAVOUT against the Azaion fixture).
Requirements: FIXTURE-01, FIXTURE-02, FIXTURE-03, FIXTURE-04, FIXTURE-05, FIXTURE-06, FIXTURE-07, OBS-02, OBS-03
Success Criteria (what must be TRUE):
- `tests/integration/azaion_flight/` runs against `Data/Azaion/10.05.2026/` (tlog + cropped EO video + MAVLink CSV) and is documented in `_docs/00_problem/fixtures.md` with ground-truth references and known limitations.
- `scripts/prep_azaion_fixture.py` produces HUD-stripped EO frames at 0.7 fps, an IMU/GPS/ATTITUDE CSV from the tlog, and a timestamp-aligned manifest.
- MAVLink replay decodes every `GLOBAL_POSITION_INT` / `RAW_IMU` / `ATTITUDE` message without error; ESKF replay on the real IMU samples produces no NaN/Inf and shows bounded covariance growth; ORB-SLAM3 VO smoke test achieves ≥30% frame registration on the cropped EO frames.
- The GPS-denial simulation masks `GPS_RAW_INT` for t∈[180s, 280s] and the pipeline correctly switches to `vo_extrapolated` and back to `satellite_anchored` when GPS returns.
- `gps_denied` typer CLI exposes `replay --tlog ... --video ...`, `benchmark --scenario ...`, and `bench-ac AC-1.1`; `Dockerfile.x86_dev` and `Dockerfile.jetson` (multi-stage with TRT engine prebuild step) build green and run the replay end-to-end on their respective platforms.
Plans: TBD
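The GPS-denial simulation amounts to dropping `GPS_RAW_INT` messages inside the closed window [180 s, 280 s] while passing everything else through. A minimal sketch, assuming replayed messages are already decoded into a simple record with a time offset in seconds from the start of the replay; the `TlogMessage` type and `mask_gps` name are hypothetical, and a real replay would decode the tlog with a MAVLink library first.

```python
from typing import Iterable, Iterator, NamedTuple


class TlogMessage(NamedTuple):
    """Hypothetical decoded record; t_s is seconds since replay start (assumed)."""
    t_s: float
    msg_type: str
    payload: dict


DENIAL_WINDOW_S = (180.0, 280.0)  # closed interval from the success criterion


def mask_gps(messages: Iterable[TlogMessage],
             window: tuple[float, float] = DENIAL_WINDOW_S,
             masked_type: str = "GPS_RAW_INT") -> Iterator[TlogMessage]:
    """Yield every message except `masked_type` messages inside the window."""
    start_s, end_s = window
    for msg in messages:
        if msg.msg_type == masked_type and start_s <= msg.t_s <= end_s:
            continue  # simulate GPS denial by dropping the fix
        yield msg
```

Writing the mask as a generator lets it sit in front of the replay source without buffering the whole log, and IMU and attitude messages inside the window are untouched so the ESKF keeps propagating.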
Progress
| Phase | Plans Complete | Status | Completed |
|---|---|---|---|
| 1. Hexagonal Refactor & Composition Root | 0/0 | Not started | - |
| 2. Acceptance Criteria + Test Taxonomy + Observability Spine | 5/7 | In Progress | - |
| 3. Safety Anchor State Machine & Geometry-Gated Verifier | 0/0 | Not started | - |
| 4. Conditional Multi-Scale VPR + Flight Data Recorder | 0/0 | Not started | - |
| 5. MAVLink Source-Aware Output & Spoofing/Blackout Handling | 0/0 | Not started | - |
| 6. Real-Flight Fixture (Azaion 10.05.2026) + CLI + Per-Env Docker | 0/0 | Not started | - |
Coverage Summary
| Category | Count | Phase |
|---|---|---|
| ARCH | 7 | Phase 1 |
| AC | 6 | Phase 2 |
| TEST | 3 | Phase 2 |
| OBS-01 | 1 | Phase 2 |
| SAFE | 6 | Phase 3 |
| VERIFY | 5 | Phase 3 |
| VPR | 5 | Phase 4 |
| FDR | 6 | Phase 4 |
| MAVOUT | 4 | Phase 5 |
| FIXTURE | 7 | Phase 6 |
| OBS-02, OBS-03 | 2 | Phase 6 |
| Total | 52 | 6 phases |
100% of Stage 2 requirements mapped; no orphans; no duplicates.
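The coverage claim can be re-derived mechanically from the per-phase requirement lists. The sketch below copies the IDs from the "Requirements:" lines of each phase and the category sizes from the table (with the two OBS rows merged); the helper names are hypothetical and this is not an existing project script.

```python
from collections import Counter


def ids(prefix: str, first: int, last: int) -> list[str]:
    """Expand a contiguous ID range, e.g. ids("ARCH", 1, 2) -> ["ARCH-01", "ARCH-02"]."""
    return [f"{prefix}-{n:02d}" for n in range(first, last + 1)]


# Requirement IDs per phase, as listed under each phase's "Requirements:" line.
PHASE_REQUIREMENTS: dict[int, list[str]] = {
    1: ids("ARCH", 1, 7),
    2: ids("AC", 1, 6) + ids("TEST", 1, 3) + ids("OBS", 1, 1),
    3: ids("SAFE", 1, 6) + ids("VERIFY", 1, 5),
    4: ids("VPR", 1, 5) + ids("FDR", 1, 6),
    5: ids("MAVOUT", 1, 4),
    6: ids("FIXTURE", 1, 7) + ids("OBS", 2, 3),
}

# Category sizes from the coverage table (OBS-01 and OBS-02/03 rows merged).
CATEGORY_COUNTS: dict[str, int] = {
    "ARCH": 7, "AC": 6, "TEST": 3, "OBS": 3, "SAFE": 6,
    "VERIFY": 5, "VPR": 5, "FDR": 6, "MAVOUT": 4, "FIXTURE": 7,
}


def coverage(phase_reqs: dict[int, list[str]],
             category_counts: dict[str, int]) -> tuple[int, list[str], list[str]]:
    """Return (total expected, orphaned IDs, IDs mapped to more than one slot)."""
    expected = {rid for prefix, n in category_counts.items() for rid in ids(prefix, 1, n)}
    mapped = Counter(rid for reqs in phase_reqs.values() for rid in reqs)
    orphans = sorted(expected - set(mapped))
    duplicates = sorted(rid for rid, count in mapped.items() if count > 1)
    return len(expected), orphans, duplicates
```

Calling `coverage(PHASE_REQUIREMENTS, CATEGORY_COUNTS)` returns the expected total together with any orphaned or duplicated IDs, so the same check could run in CI whenever a requirement is added or moved between phases.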