docs: add Phase 1 ADRs and update PROJECT.md with completed decisions

ADR 0002: hexagonal/ports-and-adapters architecture — components/ layout,
  protocol.py per component, composition root, core/ for concentrated math.
ADR 0003: @dataclass(slots=True, frozen=True) on hot path; Pydantic retained
  only at REST/config/DB boundaries. Pose/GPSPoint migration deferred to Phase 2.
ADR 0004: Stage 2 as independent iteration — own phases 1-6, own requirements,
  stage1 code treated as MVP starting capital.

PROJECT.md: Stage 2 Key Decisions updated from Pending → Accepted with Phase 1
  implementation notes, deferred work list, and final architecture summary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Yuzviak
2026-05-11 09:23:09 +03:00
parent 0bb94da3c4
commit a11ed15187
8 changed files with 623 additions and 303 deletions
+120 -82
View File
@@ -1,114 +1,152 @@
# Roadmap: GPS-Denied Onboard Navigation System
# Roadmap: GPS-Denied Onboard Navigation System — Stage 2
**Stage:** 2 (independent iteration)
**Created:** 2026-05-10
**Branch:** `stage2` (HEAD = stage1; v1.0 archived under `.planning/archive/v1.0/`)
**Granularity:** standard
**Total phases:** 6
**Total requirements mapped:** 52 / 52 (100% coverage)
---
## Overview
The scaffold exists (~2800 lines): FastAPI service, all component ABCs, Pydantic schemas, database layer, and SSE streaming are in place. What is missing is every algorithmic kernel. This roadmap implements them in dependency order: the ESKF math core first (everything else feeds into it), then the two sensor inputs (VO and satellite/GPR), then the MAVLink output that closes the loop to the flight controller, then end-to-end pipeline wiring, then a Docker SITL test harness, and finally accuracy validation against real flight data.
Stage 2 is a **self-contained iteration** with its own phase numbering (16). It is NOT a continuation of Stage 1's seven phases — those are archived under `.planning/archive/v1.0/` and treated as MVP starting capital (the working ESKF + cuVSLAM/ORB VO + GPR + MAVLink + 195 passing tests).
The Stage 2 mission: refactor the inherited MVP into a hexagonal/ports-and-adapters architecture, re-implement (not merge) selected concept-level ideas from the parallel `try02` branch, formalize acceptance criteria with testable numerics, and add the Azaion 10.05.2026 real-flight integration fixture — all without regressing any of the 195 stage1 tests.
Phases are derived from the ten Stage 2 requirement categories (ARCH, AC, SAFE, VERIFY, FDR, VPR, MAVOUT, FIXTURE, TEST, OBS) and ordered so each phase stabilizes the Protocol surfaces and test infrastructure that the next phase depends on.
## Phase Dependency Order
```
Phase 1 (ARCH — hexagonal refactor + composition root; Protocols stabilized)
Phase 2 (AC + TEST taxonomy + structlog spine — measurement scaffolding)
Phase 3 (SAFE state machine + VERIFY anchor gates — authoritative source labels)
Phase 4 (Conditional Multi-Scale VPR + FDR — uses SAFE triggers, FDR for audit)
Phase 5 (MAVOUT — source-aware GPS_INPUT + spoofing + blackout — needs SAFE labels)
Phase 6 (FIXTURE — Azaion replay + CLI + per-env Docker — exercises everything e2e)
```
## Phases
- [ ] **Phase 1: ESKF Core** - 15-state error-state Kalman filter, coordinate transforms, confidence scoring
- [ ] **Phase 2: Visual Odometry** - cuVSLAM wrapper (Jetson) + OpenCV ORB stub (dev/CI) + TRT SuperPoint/LightGlue
- [ ] **Phase 3: Satellite Matching + GPR** - XFeat TRT matching, offline tile pipeline, real Faiss GPR index
- [ ] **Phase 4: MAVLink I/O** - pymavlink GPS_INPUT output loop, IMU input listener, telemetry, re-localization request
- [ ] **Phase 5: End-to-End Pipeline Wiring** - processor integration, GTSAM factor graph, recovery coordinator, object localization
- [ ] **Phase 6: Docker SITL Harness + CI** - ArduPilot SITL, camera replay, tile server mock, CI integration
- [ ] **Phase 7: Accuracy Validation** - 60-frame dataset validation, latency profiling, blackbox test suite
- [ ] **Phase 1: Hexagonal Refactor & Composition Root** — Reorganize stage1 MVP into `components/` hexagonal layout with Protocol-typed DI composition root; no regressions.
- [ ] **Phase 2: Acceptance Criteria + Test Taxonomy + Observability Spine** — Formal AC document with numeric thresholds, `tests/{unit,integration,blackbox,sitl,e2e}/` taxonomy, structlog correlation_id spine.
- [ ] **Phase 3: Safety Anchor State Machine & Geometry-Gated Verifier** — Authoritative `source_label` ownership + accept/reject gates for satellite anchors before they reach ESKF.
- [ ] **Phase 4: Conditional Multi-Scale VPR + Flight Data Recorder** — Trigger-driven DINOv2 forward, multi-scale FAISS chunks, append-only event log with bounded storage.
- [ ] **Phase 5: MAVLink Source-Aware Output & Spoofing/Blackout Handling** — Source labels + anchor age in GPS_INPUT, spoofing-promotion <3s, visual-blackout dead-reckoning ≤400ms, ODOMETRY scaffold behind feature flag.
- [ ] **Phase 6: Real-Flight Fixture (Azaion 10.05.2026) + CLI + Per-Env Docker** — End-to-end integration test on real flight data, `gps_denied` typer CLI, split Jetson/x86 Dockerfiles.
## Phase Details
### Phase 1: ESKF Core
**Goal**: A correct, standalone ESKF implementation exists that fuses IMU, VO, and satellite measurements and outputs confidence-tiered position estimates in WGS84
**Depends on**: Nothing (first phase — no other algorithmic component depends on this being absent)
**Requirements**: ESKF-01, ESKF-02, ESKF-03, ESKF-04, ESKF-05, ESKF-06
**Success Criteria** (what must be TRUE):
1. ESKF propagates nominal state (position, velocity, quaternion, biases) from synthetic IMU inputs and covariance grows correctly between measurement updates
2. VO measurement update reduces position uncertainty and innovation is within expected bounds for a simulated relative pose input
3. Satellite measurement update corrects absolute position and covariance tightens to satellite noise level
4. Confidence tier outputs HIGH when last satellite correction is recent and covariance is small, MEDIUM on VO-only, LOW on IMU-only — verified by unit tests
5. Full coordinate chain (pixel → camera ray → body → NED → WGS84) produces correct GPS coordinates for a known geometry test case; all FAKE Math stubs replaced
**Plans**: 3 plans
Plans:
- [x] 01-01-PLAN.md — ESKF core algorithm (schemas, 15-state filter, IMU prediction, VO/satellite updates, confidence tiers)
- [x] 01-02-PLAN.md — Coordinate chain fix (replace fake math with real K matrix projection, ray-ground intersection)
- [x] 01-03-PLAN.md — Unit tests for ESKF and coordinate chain (18+ ESKF tests, 10+ coordinate tests)
### Phase 1: Hexagonal Refactor & Composition Root
### Phase 2: Visual Odometry
**Goal**: VO produces metric relative poses via cuVSLAM on Jetson and via OpenCV ORB on dev/CI, both satisfying the same interface — no more scale-ambiguous unit vectors
**Depends on**: Phase 1 (ESKF provides metric scale reference and coordinate transforms for VO measurement update)
**Requirements**: VO-01, VO-02, VO-03, VO-04, VO-05
**Goal**: Stage1 MVP reorganized into hexagonal/ports-and-adapters layout with explicit DI composition root; all 195 stage1 tests still pass.
**Depends on**: Nothing (first phase; consumes stage1 archived code as input).
**Requirements**: ARCH-01, ARCH-02, ARCH-03, ARCH-04, ARCH-05, ARCH-06, ARCH-07
**Success Criteria** (what must be TRUE):
1. cuVSLAM wrapper initializes in Inertial mode with camera intrinsics and IMU parameters, and returns RelativePose with `scale_ambiguous=False` and metric translation in NED
2. OpenCV ORB stub satisfies the same ISequentialVisualOdometry interface and passes the same interface contract tests as the cuVSLAM wrapper
3. TRT SuperPoint/LightGlue engines load and run inference on Jetson; MockInferenceEngine is selected automatically on dev/x86
4. ImageInputPipeline accepts single-image batches without error; sequence lookup returns the correct frame with no substring collision
1. Every swappable component (vio, satellite_matcher, gpr, anchor_verifier, safety_state, flight_recorder, mavlink_io, coordinate_transforms) lives under `src/gps_denied/components/<name>/` with its own `protocol.py` + concrete impls + (where needed) `native/` bridge.
2. Hot-path types (`FrameState`, `IMUSample`, `PositionEstimate`, `VOEstimate`, `SatelliteAnchor`) are `@dataclass(slots=True, frozen=True)` and Pydantic no longer touches the per-frame path.
3. Calling `build_pipeline(env="x86_dev")` / `"jetson"` / `"ci"` / `"sitl"` from `pipeline/composition.py` returns a fully-wired `Pipeline` with environment-correct adapters and no concrete imports leaking into pipeline orchestration.
4. Per-environment YAML configs (`config/{jetson,x86_dev,ci,sitl}.yaml`) load via `pydantic-settings` into a typed `RuntimeConfig` that drives composition.
5. `pytest` runs all 195 stage1 tests (+ 8 SITL skipped) green and accuracy benchmarks show no regression vs the archived stage1 baseline.
**Plans**: TBD
### Phase 3: Satellite Matching + GPR
**Goal**: The system can correct absolute position from pre-loaded satellite tiles and re-localize after tracking loss using a real Faiss descriptor index
**Depends on**: Phase 1 (ESKF position uncertainty drives tile selection radius and measurement update), Phase 2 (VO provides keyframe selection timing)
**Requirements**: SAT-01, SAT-02, SAT-03, SAT-04, SAT-05, GPR-01, GPR-02, GPR-03
### Phase 2: Acceptance Criteria + Test Taxonomy + Observability Spine
**Goal**: Project gains a formal, testable acceptance-criteria contract, a structured test taxonomy, and a structured-logging spine — the measurement scaffolding every later phase needs to prove its claims.
**Depends on**: Phase 1 (Protocol surfaces and components/ layout must exist before tests/AC can reference them).
**Requirements**: AC-01, AC-02, AC-03, AC-04, AC-05, AC-06, TEST-01, TEST-02, TEST-03, OBS-01
**Success Criteria** (what must be TRUE):
1. Satellite tile selection queries the local GeoHash-indexed directory using ESKF position ± 3σ and returns correct tiles without any HTTP requests
2. Camera frame is GSD-normalized to satellite resolution before matching; XFeat TRT inference runs on Jetson and MockInferenceEngine on dev/CI
3. RANSAC homography produces a WGS84 position estimate with a confidence score derived from inlier ratio, accepted by ESKF satellite measurement update
4. GPR loads a real Faiss index from disk and returns tile candidates ranked by DINOv2 descriptor similarity (not random vectors)
5. After simulated tracking loss, GPR candidate + MetricRefinement produces an ESKF re-localization within expected accuracy bounds
1. `_docs/00_problem/acceptance_criteria.md` lists every AC-1.x…AC-NEW-x with numeric threshold + validation method + linked test ID(s); no AC entry is unbound.
2. `tests/` is reorganized into `unit/integration/blackbox/sitl/e2e/`, every existing test is reclassified, and `pytest -m unit|integration|blackbox|sitl|e2e` selects the right subset for CI.
3. Running `scripts/gen_ac_traceability.py` produces `.planning/AC-TRACEABILITY.md` linking every AC ID → test ID(s) → component(s); CI fails if any AC is orphaned.
4. Position-accuracy, failure-mode, and real-time-performance ACs are wired to `tests/integration/accuracy/`, `tests/blackbox/failure_modes/`, and a benchmark harness that emits CI-tracked metrics.
5. Pipeline emits structured JSON via `structlog` with `correlation_id` (frame_id) on every per-frame log line, and Pydantic logging schemas guard the boundary records.
**Plans**: TBD
### Phase 4: MAVLink I/O
**Goal**: The flight controller receives GPS_INPUT at 5-10Hz and the system receives IMU data from the flight controller — the primary acceptance criterion is met end-to-end for the communication layer
**Depends on**: Phase 1 (ESKF state is the source for GPS_INPUT field population; IMU data drives ESKF prediction)
**Requirements**: MAV-01, MAV-02, MAV-03, MAV-04, MAV-05
### Phase 3: Safety Anchor State Machine & Geometry-Gated Verifier
**Goal**: A separate safety layer — not the ESKF — owns the authoritative `source_label`, enforces monotonic covariance growth in non-anchored modes, and only accepts satellite anchors that pass formal geometric gates.
**Depends on**: Phase 2 (needs AC document + test taxonomy + structlog so state-machine behavior is testable and observable).
**Requirements**: SAFE-01, SAFE-02, SAFE-03, SAFE-04, SAFE-05, SAFE-06, VERIFY-01, VERIFY-02, VERIFY-03, VERIFY-04, VERIFY-05
**Success Criteria** (what must be TRUE):
1. pymavlink sends GPS_INPUT messages to a MAVLink endpoint at 5-10Hz; all required fields populated (lat, lon, alt, velocity, accuracy, fix_type, hdop, vdop, GPS time)
2. fix_type maps correctly from ESKF confidence tier: HIGH → 3 (3D fix), MEDIUM → 2 (2D fix), LOW → 0 (no fix)
3. IMU listener receives ATTITUDE/RAW_IMU from flight controller at 5-10Hz and ESKF prediction step runs at that rate between camera frames
4. After 3 consecutive frames with no position estimate, a MAVLink NAMED_VALUE_FLOAT message with last known position is sent (verifiable in SITL logs)
5. Telemetry at 1Hz emits confidence score and drift estimate to ground station via NAMED_VALUE_FLOAT
1. Every emitted `PositionEstimate` carries one of `satellite_anchored / vo_extrapolated / dead_reckoned` set by `SafetyAnchorStateMachine`, plus an `anchor_age_ms` field that increases until the next accepted anchor.
2. Property-based tests prove covariance never decreases without an accepted anchor, and a unit-test matrix exercises all 9 declared state transitions.
3. `GeometryGatedAnchorVerifier` accepts/rejects each candidate using configurable gates (min inliers, max mean reprojection error, max homography condition number, freshness window) and emits a machine-readable rejection reason on every reject.
4. Tile-write eligibility (`can_persist_tile`) is exposed by the state machine and is `false` whenever the system is in `dead_reckoned`, so the tile cache cannot be poisoned during blind flight.
5. The state machine never sees raw VPR top-K candidates — `AnchorVerifier` is the only path that can hand it an accepted anchor — and benchmark mode lets matcher profiles be compared offline on a fixed frame.
**Plans**: TBD
### Phase 5: End-to-End Pipeline Wiring
**Goal**: A single uploaded camera frame travels through the full pipeline — VO, ESKF update, satellite correction (on keyframes), GPS_INPUT output — with no hardcoded stubs in the path
**Depends on**: Phase 1, Phase 2, Phase 3, Phase 4 (all algorithmic components must exist to be wired)
**Requirements**: PIPE-01, PIPE-02, PIPE-03, PIPE-04, PIPE-05, PIPE-06, PIPE-07, PIPE-08
### Phase 4: Conditional Multi-Scale VPR + Flight Data Recorder
**Goal**: DINOv2 retrieval runs only when re-localization is actually needed; chunks are decoupled from storage tiles with multi-scale coverage; every state transition / anchor decision / MAVLink emission is captured in an append-only flight recorder with bounded storage and explicit health states.
**Depends on**: Phase 3 (VPR triggers and FDR events ride on SAFE state-transitions and VERIFY accept/reject decisions).
**Requirements**: VPR-01, VPR-02, VPR-03, VPR-04, VPR-05, FDR-01, FDR-02, FDR-03, FDR-04, FDR-05, FDR-06
**Success Criteria** (what must be TRUE):
1. process_frame executes the full chain without error: VO relative pose → ESKF VO update → (every 5-10 frames) satellite match → ESKF satellite update → GPS_INPUT sent to flight controller
2. SatelliteDataManager and CoordinateTransformer are instantiated in app.py lifespan and injected into the processor; no component is standalone
3. FactorGraphOptimizer calls real GTSAM ISAM2 update when GTSAM is available; mock path remains for CI
4. Object GPS localization (POST /objects/locate) returns a WGS84 position using the real pixel→ray→ground chain; hardcoded (48.0, 37.0) stub is gone
5. Application starts without TypeError; ImageRotationManager constructor accepts the model manager argument
1. In steady state the pipeline ranks chunks by IMU+VO geometric prior and skips the DINOv2 forward; DINOv2 runs only on declared re-loc triggers (cold start, sharp turn, σ_xy > 50m, VO failure ≥2 frames, disconnected segment).
2. VPR chunks cover the operating area with 600800m ground footprint and 4050% overlap so any frame footprint falls fully inside ≥1 chunk; FAISS holds both fine-scale (z=20) and coarse-scale (z=17/18) descriptor sets.
3. Top-K is dynamic — K=5 stable, K=20 active-conflict, K=50 expanding-window — and the integration uses the existing `chunk_manager.py` / `gpr.py` API surface without breaking stage1 GPR contracts.
4. `FlightRecorder` writes append-only JSONL segments to `data/fdr/{flight_id}/segment-NNNN.jsonl`, enforces configurable segment + total storage byte limits, and exposes `health ∈ {ok, degraded, critical}`.
5. State transitions, anchor accept/reject decisions, MAVLink sends, and pipeline errors are all recorded as FDR events; AC-NEW-3 forensic thumbnails fire at ≤0.1Hz on tile-generation failures within the FDR size budget.
**Plans**: TBD
### Phase 6: Docker SITL Harness + CI
**Goal**: The full pipeline can be tested in a reproducible Docker environment with ArduPilot SITL, camera replay, and a tile server mock — and CI runs this on every commit
**Depends on**: Phase 5 (all components must be wired before integration testing is meaningful)
**Requirements**: TEST-01, TEST-02
### Phase 5: MAVLink Source-Aware Output & Spoofing/Blackout Handling
**Goal**: The MAVLink output the flight controller actually sees carries source provenance and reacts correctly to GPS spoofing and visual blackout, with the dual-channel ODOMETRY path scaffolded but disabled.
**Depends on**: Phase 4 (needs SAFE source labels, FDR audit channel, and VPR triggers to drive blackout/promotion semantics).
**Requirements**: MAVOUT-01, MAVOUT-02, MAVOUT-03, MAVOUT-04
**Success Criteria** (what must be TRUE):
1. `docker compose up` starts ArduPilot SITL, the GPS-denied service, a camera-replay container, and a satellite tile server mock — all communicate over MAVLink and HTTP
2. CI pipeline runs on x86 using OpenCV ORB stub and MockInferenceEngine; all 85+ unit tests pass with no manual steps
3. MAVLink GPS_INPUT messages are captured in SITL logs and show 5-10Hz output rate during camera replay
4. Tracking loss scenario (simulated by replaying frames with no overlap) triggers RECOVERY state and sends re-localization request
1. Every `GPS_INPUT` message carries `source_label`, `anchor_age_ms`, and `covariance_semimajor_m` propagated from the corresponding `PositionEstimate` (mapped into `horiz_accuracy` and a custom STATUSTEXT for label/age).
2. When real-GPS health rolling average drops below threshold, the system promotes its own estimate to FC primary within <3s and emits a `STATUSTEXT` on every promotion/demotion.
3. When the camera produces no usable signal, the pipeline switches to `dead_reckoned` within ≤1 processed frame OR ≤400ms and emits `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT at 12Hz until imagery returns.
4. The `ODOMETRY` emitter exists in code but is disabled by `config.mavlink.odometry_enabled=false` in stage 2, and an integration test asserts ODOMETRY is intentionally absent on the wire.
**Plans**: TBD
### Phase 7: Accuracy Validation
**Goal**: The system demonstrably meets the navigation accuracy acceptance criteria on the 60-frame test dataset, and all 21 blackbox test scenarios are implemented as runnable tests
**Depends on**: Phase 6 (SITL harness is required for the blackbox test scenarios)
**Requirements**: TEST-03, TEST-04, TEST-05
### Phase 6: Real-Flight Fixture (Azaion 10.05.2026) + CLI + Per-Env Docker
**Goal**: The whole stack is exercised end-to-end against real flight data, an operator-facing CLI replays flights and runs AC benchmarks, and per-environment Docker images close the deployment loop.
**Depends on**: Phase 5 (final phase — exercises ARCH + AC + SAFE + VERIFY + VPR + FDR + MAVOUT against the Azaion fixture).
**Requirements**: FIXTURE-01, FIXTURE-02, FIXTURE-03, FIXTURE-04, FIXTURE-05, FIXTURE-06, FIXTURE-07, OBS-02, OBS-03
**Success Criteria** (what must be TRUE):
1. Running against AD000001AD000060.jpg with coordinates.csv ground truth: 80% of frames within 50m error and 60% of frames within 20m error
2. Maximum cumulative VO drift between satellite corrections is less than 100m across any segment in the test dataset
3. End-to-end latency per frame (camera capture to GPS_INPUT) is under 400ms on Jetson, with a breakdown report per pipeline stage
4. All 21 blackbox test scenarios (FT-P-01 to FT-P-14, FT-N-01 to FT-N-07) run as pytest tests against the SITL harness and produce a pass/fail report
1. `tests/integration/azaion_flight/` runs against `Data/Azaion/10.05.2026/` (tlog + cropped EO video + MAVLink CSV) and is documented in `_docs/00_problem/fixtures.md` with ground-truth references and known limitations.
2. `scripts/prep_azaion_fixture.py` produces HUD-stripped EO frames at 0.7 fps, an IMU/GPS/ATTITUDE CSV from the tlog, and a timestamp-aligned manifest.
3. MAVLink replay decodes every `GLOBAL_POSITION_INT` / `RAW_IMU` / `ATTITUDE` message without error; ESKF replay on the real IMU samples produces no NaN/Inf and shows bounded covariance growth; ORB-SLAM3 VO smoke test achieves ≥30% frame registration on the cropped EO frames.
4. The GPS-denial simulation masks `GPS_RAW_INT` for t∈[180s, 280s] and the pipeline correctly switches to `vo_extrapolated` and back to `satellite_anchored` when GPS returns.
5. `gps_denied` typer CLI exposes `replay --tlog ... --video ...`, `benchmark --scenario ...`, and `bench-ac AC-1.1`; `Dockerfile.x86_dev` and `Dockerfile.jetson` (multi-stage with TRT engine prebuild step) build green and run the replay end-to-end on their respective platforms.
**Plans**: TBD
## Progress
| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. ESKF Core | 0/3 | Planned | - |
| 2. Visual Odometry | 0/TBD | Not started | - |
| 3. Satellite Matching + GPR | 0/TBD | Not started | - |
| 4. MAVLink I/O | 0/TBD | Not started | - |
| 5. End-to-End Pipeline Wiring | 0/TBD | Not started | - |
| 6. Docker SITL Harness + CI | 0/TBD | Not started | - |
| 7. Accuracy Validation | 0/TBD | Not started | - |
| 1. Hexagonal Refactor & Composition Root | 0/0 | Not started | - |
| 2. Acceptance Criteria + Test Taxonomy + Observability Spine | 0/0 | Not started | - |
| 3. Safety Anchor State Machine & Geometry-Gated Verifier | 0/0 | Not started | - |
| 4. Conditional Multi-Scale VPR + Flight Data Recorder | 0/0 | Not started | - |
| 5. MAVLink Source-Aware Output & Spoofing/Blackout Handling | 0/0 | Not started | - |
| 6. Real-Flight Fixture (Azaion 10.05.2026) + CLI + Per-Env Docker | 0/0 | Not started | - |
## Coverage Summary
| Category | Count | Phase |
|----------|-------|-------|
| ARCH | 7 | Phase 1 |
| AC | 6 | Phase 2 |
| TEST | 3 | Phase 2 |
| OBS-01 | 1 | Phase 2 |
| SAFE | 6 | Phase 3 |
| VERIFY | 5 | Phase 3 |
| VPR | 5 | Phase 4 |
| FDR | 6 | Phase 4 |
| MAVOUT | 4 | Phase 5 |
| FIXTURE | 7 | Phase 6 |
| OBS-02, OBS-03 | 2 | Phase 6 |
| **Total** | **52** | **6 phases** |
100% of Stage 2 requirements mapped; no orphans; no duplicates.