# GPS-Denied Onboard Localization — Architecture
## Architecture Vision
Build a Jetson-hosted onboard localization pipeline for fixed-wing GPS-denied flight. The hot path fuses fixed nadir camera frames and FC telemetry through OpenCV geometry, BASALT VIO, and a project-owned safety/anchor wrapper that emits calibrated `GPS_INPUT` estimates and QGC/FDR status. A triggered satellite-anchor path uses DINOv2-VLAD, CPU FAISS, ALIKED/DISK+LightGlue, and RANSAC against the offline cache; generated tiles are written back only with strict provenance and covariance gates.
### Components / Responsibilities
- Camera ingest/calibration: load frames, apply intrinsics/extrinsics, validate image quality.
- BASALT VIO adapter: produce relative camera+IMU motion from synchronized nav frames and FC IMU.
- Safety/anchor wrapper: own covariance calibration, source labels, degraded modes, anchor fusion, and `GPS_INPUT`.
- Satellite retrieval: retrieve VPR chunks from offline descriptor indexes.
- Anchor verification: run local matching/RANSAC and reject unsafe anchors.
- Cache/tile lifecycle: manage COGs, manifests, freshness, generated tiles, and sync metadata.
- MAVLink/GCS integration: consume FC telemetry and emit `GPS_INPUT`/QGC status.
- FDR/observability: record replayable mission evidence under storage caps.
- Validation harness: run still-image, public dataset, SITL, Jetson, and representative replay tests.
### Principles / Non-Negotiables
- No in-flight satellite-provider calls; runtime uses offline cache only.
- BASALT is a VIO component, not the safety authority.
- Confidence must be honest; covariance must grow in degraded modes.
- Heavy VPR/local matching is trigger-based, not per-frame.
- Raw nav/AI frames are not retained in normal operation.
- GPL VIO libraries remain reference-only unless explicitly approved.
- Plane SITL and Jetson hardware are release gates.
- Public datasets can de-risk individual components, but representative synchronized flight data is required for final acceptance.
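
The principle that covariance must grow in degraded modes can be illustrated with a minimal sketch. It assumes a simple linear drift model combined in root-sum-square with the uncertainty at the last accepted anchor; the function name, the drift-rate parameter, and the model itself are illustrative assumptions, not the project's calibrated covariance logic.

```python
import math


def degraded_horizontal_sigma_m(sigma_at_anchor_m: float,
                                seconds_since_anchor: float,
                                drift_rate_m_per_s: float) -> float:
    """Return a horizontal 1-sigma (metres) that never shrinks without a new anchor.

    Assumption: drift grows linearly with time and is independent of the
    anchor uncertainty, so the two terms combine in root-sum-square.
    """
    if sigma_at_anchor_m < 0 or seconds_since_anchor < 0 or drift_rate_m_per_s < 0:
        raise ValueError("inputs must be non-negative")
    drift_m = drift_rate_m_per_s * seconds_since_anchor
    return math.hypot(sigma_at_anchor_m, drift_m)
```

Because the drift term is non-negative and increases with time, the reported sigma is monotonically non-decreasing until an anchor resets it, which is the property a degraded-mode test would likely assert.
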
## 1. System Context
**Problem being solved**: During fixed-wing flight, GPS may be denied or spoofed. The onboard system must estimate WGS84 coordinates for navigation-camera frame centers and detected objects, stream `GPS_INPUT` to ArduPilot Plane, report confidence honestly, and maintain safety during VO failure, stale imagery, spoofing, and visual blackout.
**System boundaries**:
- In scope: onboard localization runtime, offline cache consumption, BASALT VIO integration, satellite anchor verification, MAVLink output, QGC status, FDR, generated tile metadata, validation harness.
- Out of scope: upstream commercial satellite-provider sourcing, Satellite Service ingest implementation, AI mission-camera detection itself, PX4 support, raw-frame retention as a normal operating mode.
**External systems**:
| System | Integration Type | Direction | Purpose |
|--------|------------------|-----------|---------|
| ArduPilot Plane FC | MAVLink | Inbound/Outbound | FC telemetry in, `GPS_INPUT` and status out |
| QGroundControl | MAVLink telemetry | Outbound | Downsampled operator status and failsafe messages |
| Azaion Suite Satellite Service | Offline file/cache sync | Inbound before flight, outbound after landing | Provides cache and receives generated tiles |
| Public/replay datasets | File/rosbag/fixture | Inbound to validation | De-risk BASALT, VPR, and anchor logic |
## 2. Technology Stack
| Layer | Technology | Version / Mode | Rationale |
|-------|------------|----------------|-----------|
| OS / GPU stack | JetPack Ubuntu + CUDA/TensorRT/ONNX Runtime | Jetson Orin Nano Super target | Required for production hardware profiling |
| Runtime language | Python + C++ | Python orchestration; C++ for BASALT/hot vision paths | Fits MAVLink/test tooling and native VIO dependencies |
| Geometry | OpenCV 4.x | Calibration, undistortion, homography, RANSAC/USAC | Mature utility layer |
| VIO | BASALT | Production candidate | BSD-friendly, strong benchmark evidence |
| VIO reference | OpenVINS | Reference/covariance baseline only | Strong EKF covariance story; GPLv3 risk |
| Backup VIO | Kimera-VIO | Backup candidate | BSD-friendly fallback with mono caveats |
| Local matching | ALIKED/DISK + LightGlue | Anchor verification and optional VO fallback | Strong learned correspondences; profile before hot-path use |
| Retrieval | DINOv2-VLAD + CPU FAISS | Triggered VPR only | Robust candidate retrieval under cache/offline constraints |
| Structured metadata DB | PostgreSQL + PostGIS | Onboard/local deployment | Spatial cache manifests, mission state, generated-tile metadata, and FDR event indexes |
| Cache imagery | COG + PostgreSQL/PostGIS manifest + signed JSON sidecars | Write-new COG objects | Efficient geospatial rasters with queryable spatial metadata and auditable sidecars |
| FDR | PostgreSQL event index + CBOR segment payloads, optional Parquet export | Per-flight rollover | Queryable event metadata with compact bounded payload segments |
| MAVLink | MAVSDK + pymavlink | MAVSDK telemetry, pymavlink `GPS_INPUT` | Exact output control |
**Key constraints from restrictions.md**:
- The Jetson has 8 GB of shared memory and a 25 W thermal envelope, so heavy VPR/local matching cannot run every frame.
- Runtime must be offline with respect to satellite providers, so all imagery and descriptors are preloaded.
- The camera is fixed nadir; all VO choices must be validated against low-parallax/planar terrain.
- ADTi public specs conflict with current assumptions on resolution, continuous FPS, and operating temperature; manufacturer specs must be pinned before implementation.
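
The stack table assigns `GPS_INPUT` output to pymavlink for exact output control. As a hedged illustration, the sketch below prepares the field values in the units used by the MAVLink common message set (latitude/longitude in degrees times 1e7, altitude and accuracies in metres). The function name, the choice to ignore velocity and DOP fields, and the zero placeholders for GPS week time and satellite count are assumptions to be checked against ArduPilot Plane behavior. The resulting values would then be handed to a pymavlink sender such as `gps_input_send`, which is left out so the block stays standard-library only.

```python
# Bit values from the GPS_INPUT_IGNORE_FLAGS enum (MAVLink common message set).
IGNORE_HDOP = 2
IGNORE_VDOP = 4
IGNORE_VEL_HORIZ = 8
IGNORE_VEL_VERT = 16
IGNORE_SPEED_ACCURACY = 32


def gps_input_fields(time_usec: int, lat_deg: float, lon_deg: float, alt_m: float,
                     horiz_sigma_m: float, vert_sigma_m: float, fix_type: int = 3) -> dict:
    """Build GPS_INPUT field values; position and accuracy only (assumption)."""
    if not -90.0 <= lat_deg <= 90.0 or not -180.0 <= lon_deg <= 180.0:
        raise ValueError("latitude/longitude out of range")
    ignore = (IGNORE_HDOP | IGNORE_VDOP | IGNORE_VEL_HORIZ
              | IGNORE_VEL_VERT | IGNORE_SPEED_ACCURACY)
    return {
        "time_usec": int(time_usec),
        "gps_id": 0,
        "ignore_flags": ignore,
        "time_week_ms": 0,              # placeholder, assumption
        "time_week": 0,                 # placeholder, assumption
        "fix_type": fix_type,
        "lat": round(lat_deg * 1e7),    # degE7
        "lon": round(lon_deg * 1e7),    # degE7
        "alt": float(alt_m),            # metres
        "hdop": 0.0, "vdop": 0.0,       # ignored via ignore_flags
        "vn": 0.0, "ve": 0.0, "vd": 0.0,
        "speed_accuracy": 0.0,
        "horiz_accuracy": float(horiz_sigma_m),
        "vert_accuracy": float(vert_sigma_m),
        "satellites_visible": 0,        # placeholder, assumption
    }
```
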
## 3. Deployment Model
**Environments**: Development replay, public-dataset replay, Jetson hardware validation, Plane SITL, representative flight/replay rig.
**Infrastructure**:
- Onboard production runtime runs on the Jetson companion computer, not in the cloud.
- Replay/test infrastructure may use Docker for deterministic fixture tests.
- Release gates require local Jetson hardware and ArduPilot Plane SITL.
**Environment-specific configuration**:
| Config | Development | Production |
|--------|-------------|------------|
| Satellite cache | Small fixture cache | Preloaded operational-area cache |
| Descriptor index | Fixture FAISS index | CPU-first FAISS index with PQ/IVF if needed |
| Secrets/signing | Local test keys | Mission/cache signing keys from Suite process |
| FDR | Local temp output | Per-flight bounded NVMe storage |
| MAVLink | SITL/replay | Physical FC telemetry link |
## 4. Data Model Overview
**Core entities**:
| Entity | Description | Owned By Component |
|--------|-------------|--------------------|
| FrameRecord | Navigation-camera frame metadata, total-occlusion status, and processing status | Camera ingest/calibration |
| TelemetrySample | FC IMU, attitude, airspeed, altitude, GPS health | MAVLink/GCS integration |
| VioState | BASALT-relative pose/velocity/bias output and quality metadata | BASALT VIO adapter |
| PositionEstimate | WGS84 estimate, covariance, source label, fix type, anchor age | Safety/anchor wrapper |
| VprChunk | Retrieval unit over cache imagery and descriptors | Satellite retrieval |
| AnchorCandidate | Retrieved tile/chunk with local-match and RANSAC evidence | Anchor verification |
| CacheTile | COG tile plus manifest and sidecar metadata | Cache/tile lifecycle |
| GeneratedTile | In-flight orthorectified tile with trust/provenance metadata | Cache/tile lifecycle |
| FdrSegment | Bounded replayable log segment | FDR/observability |
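
To make the `PositionEstimate` row concrete, here is a minimal sketch of how such an entity might be typed. All field names, the label values other than `dead_reckoned` and failsafe, and the row-major 3x3 ENU covariance layout are illustrative assumptions rather than the project's schema.

```python
import math
from dataclasses import dataclass
from enum import Enum


class SourceLabel(Enum):
    VIO = "vio"                      # hypothetical label
    ANCHORED = "anchored"            # hypothetical label
    DEAD_RECKONED = "dead_reckoned"
    FAILSAFE = "failsafe"


@dataclass(frozen=True)
class PositionEstimate:
    lat_deg: float
    lon_deg: float
    alt_m: float
    cov_enu_m2: tuple        # 9 floats, row-major 3x3 ENU covariance (assumption)
    source: SourceLabel
    fix_type: int
    anchor_age_s: float

    def __post_init__(self):
        if not -90.0 <= self.lat_deg <= 90.0 or not -180.0 <= self.lon_deg <= 180.0:
            raise ValueError("WGS84 coordinates out of range")
        if len(self.cov_enu_m2) != 9:
            raise ValueError("covariance must have 9 elements")
        if min(self.cov_enu_m2[0], self.cov_enu_m2[4], self.cov_enu_m2[8]) < 0:
            raise ValueError("variances must be non-negative")

    def horizontal_sigma_m(self) -> float:
        """Conservative horizontal 1-sigma: the larger of the east/north axes."""
        return math.sqrt(max(self.cov_enu_m2[0], self.cov_enu_m2[4]))
```
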
**Data flow summary**:
- Frame quality/total-occlusion gate + telemetry -> BASALT VIO when usable, or IMU-only degraded mode when not -> safety/anchor wrapper -> `GPS_INPUT`, QGC, FDR.
- Relocalization trigger -> DINOv2-VLAD/FAISS -> ALIKED/DISK+LightGlue/RANSAC -> accepted/rejected anchor.
- High-confidence pose + frame -> generated tile -> manifest/sidecar -> post-flight Satellite Service sync.
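
The accepted/rejected anchor step in the flow above could be sketched as a small gate over match evidence. The thresholds, field names, and reason strings are hypothetical, and MRE is assumed here to mean mean reprojection error in pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorEvidence:
    inliers: int                        # RANSAC inlier count
    matches: int                        # putative matches before RANSAC
    mean_reprojection_error_px: float   # assumed meaning of "MRE"
    cache_signature_valid: bool         # provenance check result


def anchor_decision(ev: AnchorEvidence,
                    min_inliers: int = 30,
                    min_inlier_ratio: float = 0.3,
                    max_mre_px: float = 3.0):
    """Return (accepted, reason); all thresholds are placeholder values."""
    if not ev.cache_signature_valid:
        return False, "untrusted_cache"
    if ev.inliers < min_inliers:
        return False, "too_few_inliers"
    if ev.matches <= 0 or ev.inliers / ev.matches < min_inlier_ratio:
        return False, "low_inlier_ratio"
    if ev.mean_reprojection_error_px > max_mre_px:
        return False, "high_reprojection_error"
    return True, "accepted"
```

Returning a reason alongside the boolean lets rejected anchors be logged with their cause rather than dropped silently.
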
## 5. Integration Points
### Internal Communication
| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| Camera ingest/calibration | BASALT VIO adapter | In-process queue or shared frame bus | Streaming | Timestamp discipline is critical |
| MAVLink telemetry | BASALT VIO adapter | In-process telemetry buffer | Streaming | IMU/attitude/altitude sync |
| BASALT VIO adapter | Safety/anchor wrapper | Typed state messages | Streaming | Wrapper calibrates confidence |
| Safety/anchor wrapper | Satellite retrieval | Command | Triggered request | Only on relocalization conditions |
| Satellite retrieval | Anchor verification | Candidate list | Request-response | Dynamic top-K |
| Anchor verification | Safety/anchor wrapper | Anchor decision | Request-response | Includes MRE/inliers/provenance |
| Safety/anchor wrapper | MAVLink/GCS integration | Position/status DTO | Streaming | `GPS_INPUT` emitted frame-by-frame |
| Safety/anchor wrapper | FDR/observability | Append-only events | Streaming | Bounded segments |
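
The triggered request from the wrapper to satellite retrieval might be governed by a policy like the one sketched here. It assumes relocalization is requested when horizontal uncertainty or anchor age exceeds a limit, with a cooldown so the heavy path cannot run per frame; the class name and all threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RelocalizationTrigger:
    max_sigma_m: float = 50.0          # placeholder threshold
    max_anchor_age_s: float = 120.0    # placeholder threshold
    min_interval_s: float = 10.0       # cooldown between heavy VPR requests
    _last_trigger_s: float = float("-inf")

    def should_trigger(self, now_s: float, horizontal_sigma_m: float,
                       anchor_age_s: float) -> bool:
        """Return True at most once per cooldown window when a condition holds."""
        if now_s - self._last_trigger_s < self.min_interval_s:
            return False
        needed = (horizontal_sigma_m > self.max_sigma_m
                  or anchor_age_s > self.max_anchor_age_s)
        if needed:
            self._last_trigger_s = now_s
        return needed
```
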
### External Integrations
| External System | Protocol | Auth | Failure Mode |
|-----------------|----------|------|--------------|
| ArduPilot Plane | MAVLink | Source/system ID allowlist | Degrade/failsafe; never trust spoofed GPS blindly |
| QGroundControl | MAVLink | FC telemetry path | Downsampled status may be delayed; the local FDR remains authoritative |
| Satellite Service | Offline cache files | Signed manifests/sidecars | Missing/stale cache causes degraded mode, not network fetch |
| Public datasets | File/rosbag | License constraints | Not final acceptance unless representative and license-compatible |
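
The source/system ID allowlist listed for ArduPilot Plane can be sketched as a simple membership check. The allowlisted pair below is a hypothetical example, and the sketch assumes the IDs have already been read from each incoming message (pymavlink messages expose them through `get_srcSystem()` and `get_srcComponent()`).

```python
# Hypothetical allowlist of (system_id, component_id) pairs for the flight controller.
ALLOWED_SOURCES = frozenset({(1, 1)})


def accept_mavlink_source(system_id: int, component_id: int,
                          allowed=ALLOWED_SOURCES) -> bool:
    """Accept a message only if its source pair is explicitly allowlisted."""
    return (system_id, component_id) in allowed


def filter_messages(messages, allowed=ALLOWED_SOURCES):
    """Split (system_id, component_id, payload) tuples into accepted and rejected lists."""
    accepted, rejected = [], []
    for sys_id, comp_id, payload in messages:
        target = accepted if accept_mavlink_source(sys_id, comp_id, allowed) else rejected
        target.append(payload)
    return accepted, rejected
```

Keeping the rejected list, rather than discarding it, leaves room to record unexpected sources as evidence of possible spoofing.
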
## 6. Non-Functional Requirements
| Requirement | Target | Measurement | Priority |
|-------------|--------|-------------|----------|
| Frame latency | <400 ms p95 | Capture/replay timestamp to emitted estimate | High |
| Memory | <8 GB shared | Jetson monitoring | High |
| First fix | <30 s p95 | 50 cold starts | High |
| Thermal | No throttle at 25 W / +50 °C | 8-hour hot-soak | High |
| FDR storage | <=64 GB/flight | 8-hour synthetic load | High |
| Cache storage | ~10 GB persistent budget | Full mission cache accounting | High |
| False position | P(error >500 m) <0.1%, P(error >1 km) <0.01% | Monte Carlo/replay | High |
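
Two of the targets above, the p95 latency and the false-position probabilities, reduce to simple statistics over replay samples. The helpers below are a minimal sketch assuming a nearest-rank percentile and an empirical exceedance rate; the function names are illustrative and the project may choose different estimators.

```python
import math


def percentile_nearest_rank(samples, pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def exceedance_rate(errors_m, threshold_m: float) -> float:
    """Fraction of position errors strictly above the threshold."""
    if not errors_m:
        raise ValueError("no samples")
    return sum(e > threshold_m for e in errors_m) / len(errors_m)
```

For example, `percentile_nearest_rank(latencies_ms, 95) < 400` would check the frame-latency row. An empirical rate below 0.01% is only meaningful with well over 10,000 samples, which is one reason the measurement column points at Monte Carlo/replay.
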
## 7. Security Architecture
**Authentication / trust boundary**:
- Runtime accepts only local cache files with valid manifest/signature/provenance.
- MAVLink input is filtered by expected source/system IDs and FC health semantics.
**Data protection**:
- At rest: FDR and cache sidecars should be integrity-protected; mission secrets/signing keys are not stored in code.
- In transit: no in-flight satellite-provider network dependency; MAVLink link security depends on FC/GCS deployment.
**Audit logging**:
- FDR records estimates, covariance, anchors, rejected anchors, cache validation failures, spoofing/blackout transitions, emitted `GPS_INPUT`, resource health, and tile-write decisions.
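
The manifest/signature check at the trust boundary could look roughly like the sketch below. The document does not specify a signature scheme, so HMAC-SHA256 over canonical JSON is used purely as a stand-in; a real deployment might prefer asymmetric signatures. The sidecar layout (`payload` plus `signature`) is also an assumption.

```python
import hashlib
import hmac
import json


def canonical_bytes(payload: dict) -> bytes:
    """Serialize with sorted keys and no whitespace so signing is deterministic."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_sidecar(payload: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical payload (stand-in scheme)."""
    return hmac.new(key, canonical_bytes(payload), hashlib.sha256).hexdigest()


def verify_sidecar(sidecar: dict, key: bytes) -> bool:
    """Reject sidecars with a missing, malformed, or mismatching signature."""
    signature = sidecar.get("signature")
    payload = sidecar.get("payload")
    if not isinstance(signature, str) or not isinstance(payload, dict):
        return False
    expected = sign_sidecar(payload, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```
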
## 8. Key Architectural Decisions
### ADR-001: BASALT As Production VIO Candidate
**Context**: A naive OpenCV-only VIO implementation is risky, while OpenVINS has GPLv3 production constraints.
**Decision**: Use BASALT as the production relative VIO candidate and keep OpenVINS as a covariance/reference baseline.
**Alternatives considered**:
1. OpenVINS as production core — rejected by default because of GPLv3 and generic VIO ownership.
2. Kimera-VIO — retained as a backup because of its BSD license, despite mono-inertial caveats.
3. Fully custom OpenCV/ESKF — fallback only because implementation burden is high.
**Consequences**: The safety/anchor wrapper must calibrate confidence around BASALT and prove it on representative data.
### ADR-002: ALIKED-LightGlue Role
**Context**: ALIKED-LightGlue can produce strong local correspondences and can support frame-to-frame homography/pose estimation.
**Decision**: Use ALIKED/DISK+LightGlue for satellite-anchor verification and evaluate it as an optional VO fallback/keyframe-assist path, not as the default BASALT replacement.
**Alternatives considered**:
1. Per-frame ALIKED-LightGlue VO hot path — deferred until Jetson profiling proves latency/memory fit.
2. SIFT/ORB-only matching — retained as a regression baseline, but weaker under cross-domain conditions.
3. SuperPoint+LightGlue — license-gated.
**Consequences**: Implementation tasks must benchmark ALIKED-LightGlue on frame-to-frame VO and cross-domain anchor workloads separately.
### ADR-003: Cache Metadata Format
**Context**: JSON is simple and auditable, but operational cache queries need spatial indexing, freshness filters, update safety, and integration with the project PostgreSQL database.
**Decision**: Use PostgreSQL with PostGIS as the primary cache manifest/index database, with signed JSON sidecars for each tile/generated tile for auditability and interchange.
**Alternatives considered**:
1. JSON-only manifest — simpler, but weak for query/update scale, spatial search, and consistency.
2. Embedded single-file metadata DB — efficient for small deployments, but rejected because the project will use PostgreSQL/PostGIS.
**Consequences**: The cache lifecycle component owns PostgreSQL migrations, PostGIS indexes, signature checks, and sidecar/db consistency.
### ADR-004: FDR Format
**Context**: The FDR must be compact, bounded, replayable, and exportable for analysis.
**Decision**: Use PostgreSQL for FDR event indexes and mission-query metadata, with CBOR-backed segment payloads for bounded append-heavy runtime data and optional Parquet export after flight.
**Alternatives considered**:
1. Plain CSV — rejected for type safety, size, and complex payloads.
2. Parquet as primary onboard format — good for analytics, but less suitable as the runtime append/rollover path.
**Consequences**: FDR implementation must define PostgreSQL tables/indexes, CBOR segment schema, rollover behavior, and export tooling.
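
The rollover behavior named in the consequences could be prototyped along these lines. This is a minimal in-memory sketch with length-prefixed records and a per-segment byte cap; JSON is used as a stand-in encoder so the example needs only the standard library, and a CBOR encoder could be passed through the `encode` parameter instead. The class name and framing are assumptions.

```python
import json


def _json_encode(event: dict) -> bytes:
    # Stand-in for a CBOR encoder (assumption).
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


class SegmentWriter:
    """Append length-prefixed records and start a new segment at the byte cap."""

    def __init__(self, max_segment_bytes: int, encode=_json_encode):
        self.max_segment_bytes = max_segment_bytes
        self.encode = encode
        self.segments = [bytearray()]

    def append(self, event: dict) -> int:
        """Store one event and return the index of the segment that holds it."""
        record = self.encode(event)
        framed = len(record).to_bytes(4, "big") + record
        if len(framed) > self.max_segment_bytes:
            raise ValueError("record larger than segment cap")
        if len(self.segments[-1]) + len(framed) > self.max_segment_bytes:
            self.segments.append(bytearray())   # rollover
        self.segments[-1] += framed
        return len(self.segments) - 1
```

The returned segment index is the kind of pointer a PostgreSQL event-index row could store next to the event metadata.
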
### ADR-006: Total Occlusion Before VIO
**Context**: BASALT should not receive frames that are completely unusable because of lens cover, cloud/whiteout, decode failure, extreme exposure, or other total visual blackout.
**Decision**: Camera ingest performs a pre-VIO total-occlusion/blackout check. Total occlusion bypasses BASALT for that frame, sends a `total_occlusion` or `visual_blackout` degradation signal to the safety wrapper, and continues IMU-only propagation from the last trusted state.
**Alternatives considered**:
1. Let BASALT detect every visual failure — rejected because total occlusion is cheaper and safer to catch before the VIO hot path.
2. Drop frames silently — rejected because the wrapper must grow covariance and emit honest degraded output.
**Consequences**: The camera component must expose `occlusion_status`, and tests must assert mode transition to `dead_reckoned`/failsafe under total blackout.
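
A pre-VIO total-occlusion check might be as simple as the sketch below. It assumes an 8-bit grayscale frame passed as a flat sequence of pixel values and uses only mean brightness and spread; the thresholds and the `clear` status string are hypothetical, and a production version would more likely use OpenCV or NumPy on the decoded image.

```python
from statistics import fmean, pstdev


def occlusion_status(gray_pixels, dark_mean: float = 10.0,
                     bright_mean: float = 245.0, min_std: float = 2.0) -> str:
    """Classify a frame as 'clear' or 'total_occlusion' (placeholder thresholds)."""
    values = list(gray_pixels)
    if not values:
        # Treat an empty or undecodable frame as a total visual blackout.
        return "total_occlusion"
    mean = fmean(values)
    spread = pstdev(values)
    if mean <= dark_mean or mean >= bright_mean or spread < min_std:
        return "total_occlusion"
    return "clear"
```

A frame classified as `total_occlusion` would bypass BASALT and be reported to the safety wrapper, which then keeps propagating from IMU data while growing covariance.
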
### ADR-005: Public Dataset Strategy
**Context**: Current project sample data lacks synchronized IMU and ground-truth trajectory.
**Decision**: Prioritize MUN-FRL for synchronized nadir camera + IMU + GNSS/ground truth; use ALTO for aerial localization/VPR and long nadir trajectories; investigate Kagaru/EPFL for fixed-wing/farmland relevance; use EuRoC/UZH FPV only as VIO proxies if license-compatible.
**Consequences**: Public datasets de-risk components but do not replace representative target flight data for final acceptance.