Mirror of https://github.com/azaion/gps-denied-onboard.git, synced 2026-06-22 22:41:13 +00:00
docs: add Phase 1 ADRs and update PROJECT.md with completed decisions
ADR 0002: hexagonal/ports-and-adapters architecture (components/ layout, protocol.py per component, composition root, core/ for concentrated math).

ADR 0003: @dataclass(slots=True, frozen=True) on hot path; Pydantic retained only at REST/config/DB boundaries. Pose/GPSPoint migration deferred to Phase 2.

ADR 0004: Stage 2 as independent iteration (own phases 1-6, own requirements, stage1 code treated as MVP starting capital).

PROJECT.md: Stage 2 Key Decisions updated from Pending → Accepted with Phase 1 implementation notes, deferred work list, and final architecture summary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
+138
-61
@@ -1,98 +1,175 @@
# GPS-Denied Onboard Navigation System
# GPS-Denied Onboard Navigation System — Stage 2
## What This Is
Real-time GPS-independent position estimation system for a fixed-wing UAV operating in GPS-denied/spoofed environments (flat terrain, Ukraine). Runs onboard a Jetson Orin Nano Super (8GB shared, 67 TOPS). Fuses visual odometry (cuVSLAM), satellite image matching (TensorRT FP16), and IMU via an ESKF to output MAVLink GPS_INPUT to an ArduPilot flight controller at 5-10Hz, while also streaming position and confidence over SSE to a ground station.
## Stage 2 Iteration
**Stage 2 is a self-contained iteration of the project.** It is NOT a continuation of Stage 1's phase numbering — it has its own roadmap (Phases 1–6), its own requirements list, and its own success criteria. Each stage is conceptually a new pass at the system: same problem, same end goal, fresh decisions about HOW.
**Stage 2 starting capital:**
- **From stage1 (own work):** The full v1 pipeline as MVP — ESKF (15-state), cuVSLAM/ORB VO, satellite matching + GPR, MAVLink GPS_INPUT, pipeline orchestration, SITL harness, accuracy benchmarks, 195 passing tests. Treated as **MVP, not production** — refactoring is allowed and expected.
- **From try02 (parallel team):** Concept-level ideas only — Safety Anchor State Machine, Geometry-Gated Anchor Verifier, Flight Data Recorder, Conditional Multi-Scale VPR, dual-channel MAVLink design, formal Acceptance Criteria document with numeric thresholds, structured test taxonomy.
- **From real-flight data:** Azaion 10.05.2026 dataset (tlog + 6min video + 9.5Hz GPS ground truth) as integration fixture.
**Stage 2 is free to:**
- Reorganize the codebase (hexagonal layout) — no production lock-in
- Replace, swap, or rebuild components — only AC-driven test outcomes are sacred
- Change the architecture wholesale if a better path emerges mid-stage
- Diverge from try02's choices where the evidence supports it (e.g., reject BASALT in favor of cuVSLAM, reject Pydantic on hot path)
**Stage 2 archive:** `.planning/archive/v1.0/` preserves stage1's PROJECT.md, REQUIREMENTS.md, ROADMAP.md, and Phase 1 artifacts as historical record.
## Core Value
The flight controller must receive valid MAVLink GPS_INPUT at 5-10Hz with position accuracy ≤50m for 80% of frames — without this, the UAV cannot navigate in GPS-denied airspace.
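As a rough illustration of how the accuracy half of this criterion could be checked against a replay, the sketch below computes the share of frames whose position error stays within a threshold. The helper name `fraction_within` and the sample errors are hypothetical and are not taken from the project's benchmark code; the 60% at 20 m line comes from the Constraints section.

```python
from typing import Sequence


def fraction_within(errors_m: Sequence[float], threshold_m: float) -> float:
    """Return the share of frames whose position error is <= threshold_m."""
    if not errors_m:
        raise ValueError("no frames to evaluate")
    hits = sum(1 for e in errors_m if e <= threshold_m)
    return hits / len(errors_m)


# Hypothetical per-frame horizontal errors in metres (illustration only).
sample_errors_m = [4.0, 12.5, 18.0, 22.0, 35.0, 41.0, 49.9, 50.0, 63.0, 120.0]

# Core Value threshold: at least 80% of frames within 50 m.
meets_50m = fraction_within(sample_errors_m, 50.0) >= 0.80
# Constraints threshold: at least 60% of frames within 20 m.
meets_20m = fraction_within(sample_errors_m, 20.0) >= 0.60
```

With this made-up sample the 50 m criterion is met and the 20 m criterion is not, which shows why the two thresholds need to be evaluated separately.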
## Requirements
## Stage 2 Goal
### Validated
Refactor the inherited stage1 MVP into a hexagonal/ports-and-adapters architecture with explicit DI composition root, integrate selected concept-level ideas from `try02`, formalize acceptance criteria with testable numerics, and add a real-flight integration fixture (Azaion 10.05.2026).
- ✓ FastAPI service scaffold with SSE streaming — existing
- ✓ FlightProcessor orchestrator with NORMAL/LOST/RECOVERY state machine — existing
- ✓ CoordinateTransformer (GPS↔ENU, pixel→camera→body→NED→WGS84) — existing
- ✓ SatelliteDataManager (tile fetch, diskcache, GeoHash lookup) — existing
- ✓ ImageInputPipeline (frame queue, batch validation, storage) — existing
- ✓ SQLAlchemy async DB layer (flights, waypoints, frames, results) — existing
- ✓ Pydantic schema contracts for all inter-component data — existing
- ✓ ABC interfaces for all core components (VO, GPR, metric, graph) — existing
## Stage 2 Target Features
### Active
**Architecture:**
- Hexagonal layout — `src/gps_denied/components/{vio, satellite_matcher, gpr, anchor_verifier, safety_state, flight_recorder, mavlink_io, coordinate_transforms}/` with `protocol.py` + concrete impls per component
- Hot-path types as `@dataclass(slots=True, frozen=True)` for `FrameState`, `IMUSample`, `PositionEstimate`; Pydantic kept only at REST/config/DB boundaries
- Composition root `pipeline/composition.py` with explicit DI for env-specific wiring (jetson/x86_dev/ci/sitl)
- Per-environment config — `config/{jetson,x86_dev,ci,sitl}.yaml` driven by pydantic-settings
- `core/` retained for concentrated math (ESKF, factor graph, RANSAC) — single-file pure functions
- [ ] ESKF implementation (15-state error-state Kalman filter: IMU prediction + VO update + satellite update + covariance propagation)
- [ ] MAVLink GPS_INPUT output (pymavlink, UART/UDP, 5-10Hz loop, ESKF state→GPS_INPUT field mapping)
- [ ] Real VO implementation (cuVSLAM on Jetson / OpenCV ORB stub on dev for CI)
- [ ] Real TensorRT inference (SuperPoint+LightGlue for VO, XFeat for satellite matching — FP16 on Jetson)
- [ ] Satellite feature matching pipeline (tile selection by ESKF uncertainty, RANSAC homography, WGS84 extraction)
- [ ] GlobalPlaceRecognition implementation (AnyLoc/DINOv2 candidate retrieval, FAISS index, tile scoring)
- [ ] FactorGraph implementation (pose graph with VO edges + satellite anchor nodes, optimization loop)
- [ ] FailureRecoveryCoordinator (tracking loss detection, re-init protocol, operator re-localization hint)
- [ ] End-to-end pipeline wiring (processor.process_frame → VO → ESKF → satellite → GPS_INPUT)
- [ ] Docker SITL test harness (ArduPilot SITL, camera replay, tile server mock, CI integration)
- [ ] Confidence scoring and GPS_INPUT fix_type mapping (HIGH/MEDIUM/LOW → fix_type 3/2/0)
- [ ] Object GPS localization endpoint (POST /objects/locate with gimbal angle projection)
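The confidence-to-`fix_type` item in the list above can be sketched as a small lookup. This is a minimal illustration, assuming a three-level confidence enum; the names `Confidence` and `confidence_to_fix_type` are hypothetical, and the project's own helper may have a different signature. The values follow the stated mapping (HIGH/MEDIUM/LOW → 3/2/0), which in MAVLink `GPS_INPUT` terms corresponds to 3D fix, 2D fix, and no fix.

```python
from enum import Enum


class Confidence(Enum):
    """Hypothetical three-level confidence scale."""
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


# Mapping from the checklist item: HIGH -> 3 (3D fix), MEDIUM -> 2 (2D fix), LOW -> 0 (no fix).
_FIX_TYPE = {Confidence.HIGH: 3, Confidence.MEDIUM: 2, Confidence.LOW: 0}


def confidence_to_fix_type(confidence: Confidence) -> int:
    """Translate an estimator confidence level into a GPS_INPUT fix_type value."""
    return _FIX_TYPE[confidence]
```

Keeping the mapping in one table makes it easy to bind to a single unit test when the acceptance criteria are formalized.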
**try02 concept integration:**
- Acceptance Criteria document — formal AC-1.x…AC-NEW-x with numeric thresholds, validation methods, test linkage
- Safety Anchor State Machine — separate layer over ESKF owning `source_label` (`satellite_anchored`/`vo_extrapolated`/`dead_reckoned`), monotonic covariance growth, anchor age, tile write eligibility
- Geometry-gated Anchor Verifier — formal accept/reject gates (inliers, MRE, reprojection error) before anchor enters ESKF
- Flight Data Recorder (FDR) — append-only event log with bounded segment storage and health states
- Conditional VPR invocation — DINOv2 forward only on re-loc triggers; steady-state geometric prior
- Multi-scale VPR chunks — 600-800m ground-footprint chunks at 40-50% overlap, decoupled from storage tiles, fine (z=20) + coarse (z=17) scales
- Source label + anchor_age_ms emitted in every GPS_INPUT estimate
- Visual blackout handling — switch to `dead_reckoned` ≤400ms, monotonic covariance growth, `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT @ 1-2Hz
- Spoofing-promotion latency monitor — promote own estimate to FC primary within <3s of detected real-GPS health drop
- Test taxonomy — `tests/{unit,integration,blackbox,sitl,e2e}/`
- Dual-channel MAVLink design — `GPS_INPUT` primary (v1 only), `ODOMETRY` auxiliary scaffolded behind feature flag for v1.1
- Structured JSON logging with `correlation_id` (frame_id) per-frame
- CLI tool `gps_denied replay --tlog ... --video ...`
- Real-flight integration fixture — Azaion 10.05.2026 as `tests/integration/azaion_flight/`
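For the multi-scale VPR chunk item in the list above, the spacing between chunk origins follows directly from footprint and overlap: stride = footprint × (1 − overlap). The sketch below works this out along one axis. The function names and the 2000 m extent are hypothetical; only the 600-800 m footprint and 40-50% overlap ranges come from the list.

```python
def chunk_stride_m(footprint_m: float, overlap: float) -> float:
    """Distance between consecutive chunk origins for a given overlap fraction."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return footprint_m * (1.0 - overlap)


def chunk_starts_m(extent_m: float, footprint_m: float, overlap: float) -> list[float]:
    """Start offsets of chunks covering [0, extent_m] along one axis."""
    stride = chunk_stride_m(footprint_m, overlap)
    starts: list[float] = []
    pos = 0.0
    while pos + footprint_m < extent_m:
        starts.append(pos)
        pos += stride
    # Final chunk sits flush with the far edge so the whole extent is covered.
    starts.append(max(extent_m - footprint_m, 0.0))
    return starts


# 800 m chunks at 50% overlap advance 400 m; 600 m chunks at 40% advance 360 m.
stride_coarse = chunk_stride_m(800.0, 0.5)
stride_fine = chunk_stride_m(600.0, 0.4)
starts = chunk_starts_m(2000.0, 800.0, 0.5)
```

Because the chunks are decoupled from storage tiles, a computation like this would run over ground coordinates rather than tile indices.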
### Out of Scope
## Stage 2 Explicit Non-Goals
- TensorRT engine building tooling — engines are pre-built offline, system only loads them
- Google Maps tile download tooling — tiles pre-cached before flight, not streamed live
- Full ArduPilot integration testing on hardware — Jetson hardware validation is post-v1
- Mobile/web ground station UI — SSE stream is consumed by external systems
- BASALT VIO backend — cuVSLAM remains primary (aarch64) with ORB-SLAM3 as CI baseline
- Pydantic on the per-frame hot path — dataclasses replace it
- Mandatory PostgreSQL — SQLite remains the embedded default
- Microservice processes / IPC — single-process architecture preserved
- Folder-per-component split for `core/` math files — ESKF/factor graph stay concentrated
- Mid-flight tile generation + write-back to Suite (AC-8.4) — deferred to Stage 3
- Production hardware validation on Jetson — deferred to Stage 3
## Future Stages (parking lot)
- **Stage 3 candidates:** Jetson hardware validation, mid-flight tile generation + Suite write-back, ODOMETRY channel enabled, AC-NEW-1 cold-boot benchmark, BASALT evaluation if cuVSLAM blockers emerge
## Out of Scope (across all stages, unless re-opened)
- TensorRT engine building tooling — engines are pre-built offline
- Google Maps tile download tooling — tiles pre-cached before flight
- Mobile/web ground station UI — SSE consumed by external systems
- Multi-UAV coordination — single UAV instance only
## Context
**Hardware target:** Jetson Orin Nano Super (8GB LPDDR5 shared, JetPack 6.2.2, CUDA 12.6, TRT 10.3.0). All development happens on x86 Linux; cuVSLAM and TRT are Jetson-only — dev machine uses OpenCV ORB stub and MockInferenceEngine.
**Hardware target:** Jetson Orin Nano Super (8GB LPDDR5 shared, JetPack 6.2.2, CUDA 12.6, TRT 10.3.0). Development on x86 Linux; cuVSLAM and TRT are Jetson-only — dev/CI uses OpenCV ORB stub and MockInferenceEngine.
**Camera:** ADTI 20L V1 (5456×3632, APS-C, 16mm lens, nadir fixed, 0.7fps). AI detection camera: Viewpro A40 Pro (separate).
**Camera (target):** ADTI 20L V1 (5456×3632, APS-C, 16mm lens, nadir fixed, 0.7fps). AI detection camera: Viewpro A40 Pro (separate).
**Flight controller:** ArduPilot via MAVLink UART. System sends GPS_INPUT; receives IMU (200Hz) and GLOBAL_POSITION_INT (1Hz) from FC.
**Camera (Azaion fixture):** Multirotor gimbal EO+IR split-screen with HUD overlay, 1280×720 @ 30fps. Used for integration testing only — does not represent target deployment camera.
**Key latency budget:** <400ms end-to-end per frame (camera @ 0.7fps = 1430ms window).
**Flight controller:** ArduPilot via MAVLink UART. System sends GPS_INPUT; receives IMU (200Hz target / 9.7Hz in Azaion fixture) and GLOBAL_POSITION_INT (1Hz) from FC.
**Existing scaffold:** ~2800 lines of Python code exist as a well-structured scaffold. All modules are present with ABC interfaces and schemas, but critical algorithmic kernels (ESKF, real VO, TRT inference, MAVLink) are missing or mocked.
**Key latency budget:** <400ms end-to-end per frame.
**Test data:** 60 UAV frames (AD000001-AD000060.jpg), coordinates.csv with ground-truth GPS, expected_results/position_accuracy.csv. 43 documented test scenarios across 7 categories.
**Stage 1 inheritance:** ~7,800 lines of working Python code with 195 passing tests. All algorithmic kernels (ESKF, VO, GPR, MAVLink, factor graph) implemented. Stage 2 starts from this codebase on branch `stage2` (HEAD = stage1).
**Reference branch:** `try02` is checked out as a worktree at `../gps-denied-onboard-try02/` for concept harvesting. We do NOT merge from try02 — we read it for ideas and re-implement what fits.
## Constraints
- **Performance**: <400ms/frame end-to-end, <8GB RAM+VRAM — non-negotiable for real-time flight
- **Hardware**: cuVSLAM v15.0.0 (aarch64-only wheel) — stub interface required for CI
- **Platform**: JetPack 6.2.2, Python 3.10+, TensorRT 10.3.0, CUDA 12.6
- **Navigation accuracy**: 80% frames ≤50m, 60% frames ≤20m, max drift 100m between satellite corrections
- **Resilience**: Handle sharp turns (disconnected VO segments), 3+ consecutive satellite match failures
- **Performance:** <400ms/frame end-to-end p95, <8GB RAM+VRAM — non-negotiable
- **Hardware:** cuVSLAM v15.0.0 (aarch64-only wheel) — Protocol with stub on x86
- **Platform:** JetPack 6.2.2, Python 3.10+, TensorRT 10.3.0, CUDA 12.6
- **Navigation accuracy:** 80% frames ≤50m, 60% frames ≤20m, max drift 100m between satellite corrections
- **Resilience:** Handle sharp turns (disconnected VO segments), 3+ consecutive satellite match failures, visual blackout, GPS spoofing promotion <3s
- **Regression floor:** All 195 stage1 passing tests must continue to pass after refactor
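The p95 latency constraint in the list above could be checked with a nearest-rank percentile over recorded per-frame latencies. The sketch below is one possible way to do that; the function name `p95_ms` and the sample latencies are hypothetical, and the project may define its percentile differently.

```python
from typing import Sequence


def p95_ms(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of per-frame latencies, in milliseconds."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n gives the 1-based nearest rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical end-to-end latencies for 20 frames (illustration only).
sample_latencies_ms = [210.0] * 18 + [390.0, 450.0]

# Budget from the Performance constraint: p95 below 400 ms per frame.
within_budget = p95_ms(sample_latencies_ms) < 400.0
```

In this made-up sample a single 450 ms outlier does not break the budget, which is the practical difference between a p95 bound and a hard per-frame limit.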
## Key Decisions
## Stage 2 Key Decisions
| Decision | Rationale | Outcome |
|----------|-----------|---------|
| ESKF over EKF/UKF | 15-state error-state formulation avoids quaternion singularities, standard for INS | — Pending |
| XFeat over LiteSAM for satellite matching | LiteSAM may exceed 400ms budget on Jetson; XFeat is faster | — Pending (benchmark required) |
| OpenCV ORB stub for dev/CI | cuVSLAM is aarch64-only; CI must run on x86 | — Pending |
| AnyLoc/DINOv2 for GPR | Validated on UAV-VisLoc benchmark (17.86m RMSE) | — Pending |
| diskcache + GeoHash for tiles | O(1) tile lookup, no DB overhead, LRU eviction | ✓ Good |
| AsyncSQLAlchemy + aiosqlite | Non-blocking DB for async FastAPI service | ✓ Good |
| Hexagonal layout with `components/` folders | Clear ownership per swappable backend, native bridges colocate with adapter | ✓ Phase 1 |
| `@dataclass(slots=True, frozen=True)` on hot path, Pydantic at boundaries only | Avoid try02's per-frame Pydantic latency cost; validate where it catches bugs (REST input, config) | ✓ Phase 1 (hot_types/ scaffolded; full migration Phase 2) |
| Explicit DI composition root | One file wires environment-specific implementations; tests pass mock dependencies | ✓ Phase 1 (`pipeline/composition.py:build_pipeline`) |
| Adopt try02 concept ideas, reject try02 layout details | Take Safety Anchor / Anchor Verifier / FDR / Conditional VPR; reject Pydantic-on-hot-path, BASALT | ✓ Adopted — Phases 3–5 |
| Take try02 acceptance criteria with numeric thresholds | Their AC-1.x…AC-NEW-x is more rigorous than stage1's drafts; bind every AC to ≥1 test | ✓ Adopted — Phase 2 |
| Test taxonomy `unit/integration/blackbox/sitl/e2e` | Clarifies CI-on-push vs PR vs nightly vs hardware-only test runs | ✓ Phase 2 |
| Stage as iteration, not phase continuation | Each stage = own roadmap, own phase numbering, own success criteria | ✓ Adopted |
## Phase 1 Outcome (2026-05-11, completed)
**ARCH-01..07 all satisfied.** 216 tests pass (195 baseline + 21 new), 0 failures; accuracy benchmarks unchanged.
### What was built
**Components scaffold** (`src/gps_denied/components/`):
- `vio/` — `protocol.py` + `orbslam_backend.py` + `cuvslam_backend.py` + `factory.py`; `core/vo.py` is a shim
- `gpr/` — `protocol.py` + `faiss_gpr.py` (inline numpy fallback preserved); `core/gpr.py` is a shim
- `satellite_matcher/` — `protocol.py` + `local_tile_loader.py` + `metric_refinement.py`; `core/satellite.py`, `core/metric.py` are shims
- `mavlink_io/` — `protocol.py` + `pymavlink_bridge.py` + `mock_mavlink.py`; `core/mavlink.py` is a shim (re-exports private helpers `_confidence_to_fix_type`, `_eskf_to_gps_input`, `_unix_to_gps_time`)
- `anchor_verifier/`, `safety_state/`, `flight_recorder/`, `coordinate_transforms/` — Protocol stubs only (Phases 3–5)
**Hot-path types** (`src/gps_denied/hot_types/`): `FrameState`, `IMUSample`, `PositionEstimate`, `VOEstimate`, and `SatelliteAnchor` are defined as `@dataclass(slots=True, frozen=True)`, and the existing schema modules are shimmed to re-export them. `Pose` stays Pydantic because it is mutated in place (mutation sites in `factor_graph.py` lines 182–297); `GPSPoint` also stays Pydantic. Full hot-path migration is deferred to Phase 2.
**Pipeline package** (`src/gps_denied/pipeline/`):
- `orchestrator.py` — `FlightProcessor` (moved from `core/processor.py`)
- `image_input.py`, `result_manager.py`, `sse_streamer.py` (moved from `core/`)
- `composition.py` — `build_pipeline(env: Literal["jetson","x86_dev","ci","sitl"]) -> FlightProcessor`
**Composition root**: wires 10 components; lazy imports inside function body to avoid circular imports; Jetson env → `prefer_cuvslam=True`, `prefer_mono_depth=True`; other envs → mocks.
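The shape of such a composition root can be sketched in a few lines. This is a simplified stand-in that wires a single component, not the project's `composition.py`: the `VIOBackend` Protocol, both backend classes, and the one-field `FlightProcessor` are hypothetical placeholders. Only the `build_pipeline(env)` signature and the `prefer_cuvslam` flag name come from the notes above.

```python
from dataclasses import dataclass
from typing import Literal, Protocol

Env = Literal["jetson", "x86_dev", "ci", "sitl"]


class VIOBackend(Protocol):
    """Hypothetical port that every visual-odometry adapter satisfies."""
    def name(self) -> str: ...


class CuVSLAMBackend:
    """Placeholder for the Jetson-only adapter."""
    def name(self) -> str:
        return "cuvslam"


class ORBBackend:
    """Placeholder for the x86/CI stub adapter."""
    def name(self) -> str:
        return "orb"


@dataclass
class FlightProcessor:
    """Placeholder orchestrator holding its injected dependencies."""
    vio: VIOBackend


def build_pipeline(env: Env) -> FlightProcessor:
    # In a real composition root the adapter imports would happen here,
    # inside the function body, to avoid circular imports at module load.
    prefer_cuvslam = env == "jetson"
    vio: VIOBackend = CuVSLAMBackend() if prefer_cuvslam else ORBBackend()
    return FlightProcessor(vio=vio)


jetson_pipeline = build_pipeline("jetson")
ci_pipeline = build_pipeline("ci")
```

Because the processor only sees the Protocol, tests can construct `FlightProcessor` directly with a mock backend and bypass `build_pipeline` entirely.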
**Config**: `AppSettings.env` Literal field + `RuntimeConfig = AppSettings` alias. `pydantic-settings YamlConfigSettingsSource` loads `config/{env}.yaml`. `pyyaml>=6.0` declared.
**ABC→Protocol sweep**: 6 interfaces converted to `typing.Protocol` with `@runtime_checkable`:
`IFactorGraphOptimizer`, `IRouteChunkManager`, `IFailureRecoveryCoordinator`, `IModelManager`, `IImageMatcher`, + all 8 component Protocols from `components/*/protocol.py`.
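For readers unfamiliar with the pattern, the sketch below shows what a `@runtime_checkable` Protocol buys over an ABC: implementations satisfy the interface structurally, without inheriting from it. The interface name `IImageMatcher` is taken from the list above, but the `match` method and its signature are assumptions made for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class IImageMatcher(Protocol):
    # Hypothetical method; the project's real interface may differ.
    def match(self, query: bytes, reference: bytes) -> float: ...


class DummyMatcher:
    """Satisfies the Protocol structurally; note there is no inheritance."""
    def match(self, query: bytes, reference: bytes) -> float:
        return 1.0 if query == reference else 0.0


class NotAMatcher:
    """Lacks a match() method, so it does not satisfy the Protocol."""


# isinstance() checks only that the named methods exist, not their signatures.
is_matcher = isinstance(DummyMatcher(), IImageMatcher)
is_not_matcher = isinstance(NotAMatcher(), IImageMatcher)
```

Since the runtime check is shallow, signature conformance still has to come from a static type checker or from tests.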
**`core/` retained** for concentrated math: `eskf.py`, `factor_graph.py`, `coordinates.py`, `chunk_manager.py`, `recovery.py`, `rotation.py`, `models.py`.
**Shim policy**: every moved file leaves a re-export shim at its old path. Tests import from old paths — shims keep them green. Shim removal is Phase 2 work.
### Deferred to Phase 2
- Full hot-path type migration (`Pose`, `GPSPoint`, remaining Pydantic models on frame path)
- Test reorganization to `tests/{unit,integration,blackbox,sitl,e2e}/`
- Shim removal from `core/`
- YAML config enrichment with env-specific overrides (MAVLink connection strings, tile dirs)
## Stage 1 Decisions Inherited (validated, kept)
| Decision | Outcome |
|----------|---------|
| ESKF over EKF/UKF | ✓ Stage 1 |
| XFeat over LiteSAM for satellite matching | ✓ Stage 1 |
| OpenCV ORB stub for dev/CI; cuVSLAM on Jetson | ✓ Stage 1 |
| AnyLoc/DINOv2 for GPR | ✓ Stage 1 |
| diskcache + GeoHash for tiles | ✓ Stage 1 |
| AsyncSQLAlchemy + aiosqlite | ✓ Stage 1 |
## Evolution
This document evolves at phase transitions and milestone boundaries.
Each stage is its own iteration with its own PROJECT.md, REQUIREMENTS.md, and ROADMAP.md. At stage completion:
**After each phase transition** (via `/gsd:transition`):
1. Requirements invalidated? → Move to Out of Scope with reason
2. Requirements validated? → Move to Validated with phase reference
3. New requirements emerged? → Add to Active
4. Decisions to log? → Add to Key Decisions
5. "What This Is" still accurate? → Update if drifted
**After each milestone** (via `/gsd:complete-milestone`):
1. Full review of all sections
2. Core Value check — still the right priority?
3. Audit Out of Scope — reasons still valid?
4. Update Context with current state
1. Snapshot current PROJECT.md / REQUIREMENTS.md / ROADMAP.md / phases/ → `.planning/archive/v[X.Y]/`
2. Open new stage with fresh roadmap (Phase 1 of the new stage)
3. Carry forward only validated decisions and unresolved Future-stages items
---
*Last updated: 2026-04-01 after initialization*
*Stage 2 opened: 2026-05-10*