Files
gps-denied-onboard/.planning/PROJECT.md
T
Yuzviak a11ed15187 docs: add Phase 1 ADRs and update PROJECT.md with completed decisions
ADR 0002: hexagonal/ports-and-adapters architecture — components/ layout,
  protocol.py per component, composition root, core/ for concentrated math.
ADR 0003: @dataclass(slots=True, frozen=True) on hot path; Pydantic retained
  only at REST/config/DB boundaries. Pose/GPSPoint migration deferred to Phase 2.
ADR 0004: Stage 2 as independent iteration — own phases 1-6, own requirements,
  stage1 code treated as MVP starting capital.

PROJECT.md: Stage 2 Key Decisions updated from Pending → Accepted with Phase 1
  implementation notes, deferred work list, and final architecture summary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-11 09:23:09 +03:00

176 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# GPS-Denied Onboard Navigation System — Stage 2
## What This Is
Real-time GPS-independent position estimation system for a fixed-wing UAV operating in GPS-denied/spoofed environments (flat terrain, Ukraine). Runs onboard a Jetson Orin Nano Super (8GB shared, 67 TOPS). Fuses visual odometry (cuVSLAM), satellite image matching (TensorRT FP16), and IMU via an ESKF to output MAVLink GPS_INPUT to an ArduPilot flight controller at 5-10Hz, while also streaming position and confidence over SSE to a ground station.
## Stage 2 Iteration
**Stage 2 is a self-contained iteration of the project.** It is NOT a continuation of Stage 1's phase numbering — it has its own roadmap (Phases 16), its own requirements list, and its own success criteria. Each stage is conceptually a new pass at the system: same problem, same end goal, fresh decisions about HOW.
**Stage 2 starting capital:**
- **From stage1 (own work):** The full v1 pipeline as MVP — ESKF (15-state), cuVSLAM/ORB VO, satellite matching + GPR, MAVLink GPS_INPUT, pipeline orchestration, SITL harness, accuracy benchmarks, 195 passing tests. Treated as **MVP, not production** — refactoring is allowed and expected.
- **From try02 (parallel team):** Concept-level ideas only — Safety Anchor State Machine, Geometry-Gated Anchor Verifier, Flight Data Recorder, Conditional Multi-Scale VPR, dual-channel MAVLink design, formal Acceptance Criteria document with numeric thresholds, structured test taxonomy.
- **From real-flight data:** Azaion 10.05.2026 dataset (tlog + 6min video + 9.5Hz GPS ground truth) as integration fixture.
**Stage 2 is free to:**
- Reorganize the codebase (hexagonal layout) — no production lock-in
- Replace, swap, or rebuild components — only AC-driven test outcomes are sacred
- Change the architecture wholesale if a better path emerges mid-stage
- Diverge from try02's choices where the evidence supports it (e.g., reject BASALT in favor of cuVSLAM, reject Pydantic on hot path)
**Stage 2 archive:** `_planning/archive/v1.0/` preserves stage1's PROJECT.md, REQUIREMENTS.md, ROADMAP.md, and Phase 1 artifacts as historical record.
## Core Value
The flight controller must receive valid MAVLink GPS_INPUT at 5-10Hz with position accuracy ≤50m for 80% of frames — without this, the UAV cannot navigate in GPS-denied airspace.
## Stage 2 Goal
Refactor the inherited stage1 MVP into a hexagonal/ports-and-adapters architecture with explicit DI composition root, integrate selected concept-level ideas from `try02`, formalize acceptance criteria with testable numerics, and add a real-flight integration fixture (Azaion 10.05.2026).
## Stage 2 Target Features
**Architecture:**
- Hexagonal layout — `src/gps_denied/components/{vio, satellite_matcher, gpr, anchor_verifier, safety_state, flight_recorder, mavlink_io, coordinate_transforms}/` with `protocol.py` + concrete impls per component
- Hot-path types as `@dataclass(slots=True, frozen=True)` for `FrameState`, `IMUSample`, `PositionEstimate`; Pydantic kept only at REST/config/DB boundaries
- Composition root `pipeline/composition.py` with explicit DI for env-specific wiring (jetson/x86_dev/ci/sitl)
- Per-environment config — `config/{jetson,x86_dev,ci,sitl}.yaml` driven by pydantic-settings
- `core/` retained for concentrated math (ESKF, factor graph, RANSAC) — single-file pure functions
**try02 concept integration:**
- Acceptance Criteria document — formal AC-1.x…AC-NEW-x with numeric thresholds, validation methods, test linkage
- Safety Anchor State Machine — separate layer over ESKF owning `source_label` (`satellite_anchored`/`vo_extrapolated`/`dead_reckoned`), monotonic covariance growth, anchor age, tile write eligibility
- Geometry-gated Anchor Verifier — formal accept/reject gates (inliers, MRE, reprojection error) before anchor enters ESKF
- Flight Data Recorder (FDR) — append-only event log with bounded segment storage and health states
- Conditional VPR invocation — DINOv2 forward only on re-loc triggers; steady-state geometric prior
- Multi-scale VPR chunks — 600-800m ground-footprint chunks at 40-50% overlap, decoupled from storage tiles, fine (z=20) + coarse (z=17) scales
- Source label + anchor_age_ms emitted in every GPS_INPUT estimate
- Visual blackout handling — switch to `dead_reckoned` ≤400ms, monotonic covariance growth, `VISUAL_BLACKOUT_IMU_ONLY` STATUSTEXT @ 1-2Hz
- Spoofing-promotion latency monitor — promote own estimate to FC primary within <3s of detected real-GPS health drop
- Test taxonomy — `tests/{unit,integration,blackbox,sitl,e2e}/`
- Dual-channel MAVLink design — `GPS_INPUT` primary (v1 only), `ODOMETRY` auxiliary scaffolded behind feature flag for v1.1
- Structured JSON logging with `correlation_id` (frame_id) per-frame
- CLI tool `gps_denied replay --tlog ... --video ...`
- Real-flight integration fixture — Azaion 10.05.2026 as `tests/integration/azaion_flight/`
## Stage 2 Explicit Non-Goals
- BASALT VIO backend — cuVSLAM remains primary (aarch64) with ORB-SLAM3 as CI baseline
- Pydantic on the per-frame hot path — dataclasses replace it
- Mandatory PostgreSQL — SQLite remains the embedded default
- Microservice processes / IPC — single-process architecture preserved
- Folder-per-component split for `core/` math files — ESKF/factor graph stay concentrated
- Mid-flight tile generation + write-back to Suite (AC-8.4) — deferred to Stage 3
- Production hardware validation on Jetson — deferred to Stage 3
## Future Stages (parking lot)
- **Stage 3 candidates:** Jetson hardware validation, mid-flight tile generation + Suite write-back, ODOMETRY channel enabled, AC-NEW-1 cold-boot benchmark, BASALT evaluation if cuVSLAM blockers emerge
## Out of Scope (across all stages, unless re-opened)
- TensorRT engine building tooling — engines are pre-built offline
- Google Maps tile download tooling — tiles pre-cached before flight
- Mobile/web ground station UI — SSE consumed by external systems
- Multi-UAV coordination — single UAV instance only
## Context
**Hardware target:** Jetson Orin Nano Super (8GB LPDDR5 shared, JetPack 6.2.2, CUDA 12.6, TRT 10.3.0). Development on x86 Linux; cuVSLAM and TRT are Jetson-only — dev/CI uses OpenCV ORB stub and MockInferenceEngine.
**Camera (target):** ADTI 20L V1 (5456×3632, APS-C, 16mm lens, nadir fixed, 0.7fps). AI detection camera: Viewpro A40 Pro (separate).
**Camera (Azaion fixture):** Multirotor gimbal EO+IR split-screen with HUD overlay, 1280×720 @ 30fps. Used for integration testing only — does not represent target deployment camera.
**Flight controller:** ArduPilot via MAVLink UART. System sends GPS_INPUT; receives IMU (200Hz target / 9.7Hz in Azaion fixture) and GLOBAL_POSITION_INT (1Hz) from FC.
**Key latency budget:** <400ms end-to-end per frame.
**Stage 1 inheritance:** ~7,800 lines of working Python code with 195 passing tests. All algorithmic kernels (ESKF, VO, GPR, MAVLink, factor graph) implemented. Stage 2 starts from this codebase on branch `stage2` (HEAD = stage1).
**Reference branch:** `try02` is checked out as a worktree at `../gps-denied-onboard-try02/` for concept harvesting. We do NOT merge from try02 — we read it for ideas and re-implement what fits.
## Constraints
- **Performance:** <400ms/frame end-to-end p95, <8GB RAM+VRAM — non-negotiable
- **Hardware:** cuVSLAM v15.0.0 (aarch64-only wheel) — Protocol with stub on x86
- **Platform:** JetPack 6.2.2, Python 3.10+, TensorRT 10.3.0, CUDA 12.6
- **Navigation accuracy:** 80% frames ≤50m, 60% frames ≤20m, max drift 100m between satellite corrections
- **Resilience:** Handle sharp turns (disconnected VO segments), 3+ consecutive satellite match failures, visual blackout, GPS spoofing promotion <3s
- **Regression floor:** All 195 stage1 passing tests must continue to pass after refactor
## Stage 2 Key Decisions
| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Hexagonal layout with `components/` folders | Clear ownership per swappable backend, native bridges colocate with adapter | ✓ Phase 1 |
| `@dataclass(slots=True, frozen=True)` on hot path, Pydantic at boundaries only | Avoid try02's per-frame Pydantic latency cost; validate where it catches bugs (REST input, config) | ✓ Phase 1 (hot_types/ scaffolded; full migration Phase 2) |
| Explicit DI composition root | One file wires environment-specific implementations; tests pass mock dependencies | ✓ Phase 1 (`pipeline/composition.py:build_pipeline`) |
| Adopt try02 concept ideas, reject try02 layout details | Take Safety Anchor / Anchor Verifier / FDR / Conditional VPR; reject Pydantic-on-hot-path, BASALT | ✓ Adopted — Phases 35 |
| Take try02 acceptance criteria with numeric thresholds | Their AC-1.x…AC-NEW-x is more rigorous than stage1's drafts; bind every AC to ≥1 test | ✓ Adopted — Phase 2 |
| Test taxonomy `unit/integration/blackbox/sitl/e2e` | Clarifies CI-on-push vs PR vs nightly vs hardware-only test runs | ✓ Phase 2 |
| Stage as iteration, not phase continuation | Each stage = own roadmap, own phase numbering, own success criteria | ✓ Adopted |
## Phase 1 Outcome (2026-05-11, completed)
**ARCH-01..07 all satisfied.** 216 tests pass (baseline 195+21 new = 216), 0 failures, accuracy benchmarks unchanged.
### What was built
**Components scaffold** (`src/gps_denied/components/`):
- `vio/``protocol.py` + `orbslam_backend.py` + `cuvslam_backend.py` + `factory.py`; `core/vo.py` is a shim
- `gpr/``protocol.py` + `faiss_gpr.py` (inline numpy fallback preserved); `core/gpr.py` is a shim
- `satellite_matcher/``protocol.py` + `local_tile_loader.py` + `metric_refinement.py`; `core/satellite.py`, `core/metric.py` are shims
- `mavlink_io/``protocol.py` + `pymavlink_bridge.py` + `mock_mavlink.py`; `core/mavlink.py` is a shim (re-exports private helpers `_confidence_to_fix_type`, `_eskf_to_gps_input`, `_unix_to_gps_time`)
- `anchor_verifier/`, `safety_state/`, `flight_recorder/`, `coordinate_transforms/` — Protocol stubs only (Phases 35)
**Hot-path types** (`src/gps_denied/hot_types/`): `FrameState`, `IMUSample`, `PositionEstimate`, `VOEstimate`, `SatelliteAnchor` as `@dataclass(slots=True, frozen=True)`. Schemas shimmed to re-export. `Pose` stays Pydantic (mutation sites in `factor_graph.py` lines 182297); `GPSPoint` stays Pydantic. Full hot-path migration deferred to Phase 2.
**Pipeline package** (`src/gps_denied/pipeline/`):
- `orchestrator.py``FlightProcessor` (moved from `core/processor.py`)
- `image_input.py`, `result_manager.py`, `sse_streamer.py` (moved from `core/`)
- `composition.py``build_pipeline(env: Literal["jetson","x86_dev","ci","sitl"]) -> FlightProcessor`
**Composition root**: wires 10 components; lazy imports inside function body to avoid circular imports; Jetson env → `prefer_cuvslam=True`, `prefer_mono_depth=True`; other envs → mocks.
**Config**: `AppSettings.env` Literal field + `RuntimeConfig = AppSettings` alias. `pydantic-settings YamlConfigSettingsSource` loads `config/{env}.yaml`. `pyyaml>=6.0` declared.
**ABC→Protocol sweep**: 6 interfaces converted to `typing.Protocol` with `@runtime_checkable`:
`IFactorGraphOptimizer`, `IRouteChunkManager`, `IFailureRecoveryCoordinator`, `IModelManager`, `IImageMatcher`, + all 8 component Protocols from `components/*/protocol.py`.
**`core/` retained** for concentrated math: `eskf.py`, `factor_graph.py`, `coordinates.py`, `chunk_manager.py`, `recovery.py`, `rotation.py`, `models.py`.
**Shim policy**: every moved file leaves a re-export shim at its old path. Tests import from old paths — shims keep them green. Shim removal is Phase 2 work.
### Deferred to Phase 2
- Full hot-path type migration (`Pose`, `GPSPoint`, remaining Pydantic models on frame path)
- Test reorganization to `tests/{unit,integration,blackbox,sitl,e2e}/`
- Shim removal from `core/`
- YAML config enrichment with env-specific overrides (MAVLink connection strings, tile dirs)
## Stage 1 Decisions Inherited (validated, kept)
| Decision | Outcome |
|----------|---------|
| ESKF over EKF/UKF | ✓ Stage 1 |
| XFeat over LiteSAM for satellite matching | ✓ Stage 1 |
| OpenCV ORB stub for dev/CI; cuVSLAM on Jetson | ✓ Stage 1 |
| AnyLoc/DINOv2 for GPR | ✓ Stage 1 |
| diskcache + GeoHash for tiles | ✓ Stage 1 |
| AsyncSQLAlchemy + aiosqlite | ✓ Stage 1 |
## Evolution
Each stage is its own iteration with its own PROJECT.md, REQUIREMENTS.md, ROADMAP.md. At stage completion:
1. Snapshot current PROJECT.md / REQUIREMENTS.md / ROADMAP.md / phases/ → `.planning/archive/v[X.Y]/`
2. Open new stage with fresh roadmap (Phase 1 of the new stage)
3. Carry forward only validated decisions and unresolved Future-stages items
---
*Stage 2 opened: 2026-05-10*