Solution Draft

Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

Hard constraint: the camera captures at ~3 fps (333-400 ms between frames), so the full pipeline must complete within 400 ms per frame.

Satellite matching strategy: Benchmark LiteSAM on actual Orin Nano Super hardware as a day-one priority. If LiteSAM cannot achieve ≤400ms at 480px resolution, abandon it entirely and use XFeat semi-dense matching as the primary satellite matcher. Speed is non-negotiable.

Core architectural principles:

  1. cuVSLAM handles VO — NVIDIA's CUDA-accelerated library achieves 90fps on Jetson Orin Nano, giving VO essentially "for free" (~11ms/frame).
  2. Keyframe-based satellite matching — satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
  3. Every keyframe independently attempts satellite-based geo-localization — this handles disconnected segments natively.
  4. Pipeline parallelism — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
┌─────────────────────────────────────────────────────────────────┐
│                    OFFLINE (Before Flight)                       │
│  Satellite Tiles → Download & Crop → Store as tile pairs        │
│  (Google Maps)     (per flight plan)   (disk, GeoHash indexed)  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    ONLINE (During Flight)                        │
│                                                                  │
│  EVERY FRAME (400ms budget):                                     │
│  ┌────────────────────────────────┐                              │
│  │ Camera → Downsample (CUDA 2ms)│                               │
│  │       → cuVSLAM VO+IMU (~11ms)│──→ ESKF Update → SSE Emit   │
│  └────────────────────────────────┘         ↑                    │
│                                             │                    │
│  KEYFRAMES ONLY (every 3-10 frames):        │                    │
│  ┌────────────────────────────────────┐     │                    │
│  │ Satellite match (async CUDA stream)│─────┘                    │
│  │ LiteSAM or XFeat (see benchmark)  │                           │
│  │ (does NOT block VO output)         │                           │
│  └────────────────────────────────────┘                          │
│                                                                  │
│  IMU: 100+Hz continuous → ESKF prediction                        │
└─────────────────────────────────────────────────────────────────┘

Speed Optimization Techniques

1. cuVSLAM for Visual Odometry (~11ms/frame)

NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 90fps on Jetson Orin Nano. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs. This eliminates custom VO entirely.

2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching. Strategy:

  • cuVSLAM provides VO at every frame (high-rate, low-latency)
  • Satellite matching triggers on keyframes selected by:
    • Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
    • Confidence drop: when ESKF covariance exceeds threshold
    • VO failure: when cuVSLAM reports tracking loss (sharp turn)
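These three triggers combine into a single predicate. A minimal sketch; the threshold values and argument names here are illustrative assumptions, not fixed by this design:

```python
# Keyframe trigger sketch. Thresholds are illustrative placeholders.
MAX_INTERVAL = 10        # force a keyframe at least every 10 frames
MIN_INTERVAL = 3         # never match more often than every 3 frames
COV_THRESHOLD = 400.0    # ESKF position covariance trace, m^2 (assumed)

def is_keyframe(frames_since_kf: int, cov_trace: float, tracking_lost: bool) -> bool:
    """Decide whether the current frame triggers satellite matching."""
    if tracking_lost:                    # VO failure: re-localize immediately
        return True
    if frames_since_kf < MIN_INTERVAL:   # amortize satellite-matcher cost
        return False
    if frames_since_kf >= MAX_INTERVAL:  # fixed-interval fallback
        return True
    return cov_trace > COV_THRESHOLD     # confidence-drop trigger
```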

3. Satellite Matcher Selection (Benchmark-Driven)

Candidate A: LiteSAM (opt) — Best accuracy for satellite-aerial matching (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne + TAIFormer + MinGRU. Benchmarked at 497ms on Jetson AGX Orin at 1184px. AGX Orin is 3-4x more powerful than Orin Nano Super (275 TOPS vs 67 TOPS, $2000+ vs $249).

Realistic Orin Nano Super estimates:

  • At 1184px: ~1.5-2.0s (unusable)
  • At 640px: ~500-800ms (borderline)
  • At 480px: ~300-500ms (best case)

Candidate B: XFeat semi-dense — ~50-100ms on Orin Nano Super. Proven on Jetson. Not specifically designed for cross-view satellite-aerial, but fast and reliable.

Decision rule: Benchmark LiteSAM TensorRT FP16 at 480px on Orin Nano Super. If ≤400ms → use LiteSAM. If >400ms → abandon LiteSAM, use XFeat as primary. No hybrid compromises — pick one and optimize it.
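The gate itself is a few lines of timing code. A sketch, where `run_inference` stands in for one TensorRT execution of LiteSAM at 480px; the harness is generic and nothing in it is LiteSAM-specific:

```python
import time
from statistics import median

BUDGET_MS = 400.0  # hard per-frame deadline from the design

def benchmark(run_inference, warmup: int = 10, iters: int = 50) -> float:
    """Median wall-clock latency of one inference call, in milliseconds."""
    for _ in range(warmup):              # let clocks/kernels settle
        run_inference()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return median(samples)

def go_no_go(latency_ms: float) -> str:
    """Decision rule: <=400 ms keeps LiteSAM, otherwise switch to XFeat."""
    return "LiteSAM" if latency_ms <= BUDGET_MS else "XFeat"
```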

4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — its multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. INT8 is possible for the MobileOne backbone, but the transformer components may degrade under INT8 quantization, so FP16 is the safer default.

5. CUDA Stream Pipelining

Overlap operations across consecutive frames:

  • Stream A: cuVSLAM VO for current frame (~11ms) + ESKF fusion (~1ms)
  • Stream B: Satellite matching for previous keyframe (async)
  • CPU: SSE emission, tile management, keyframe selection logic
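On the Jetson this overlap runs on CUDA streams; the host-side control flow can be sketched with a worker thread, where `satellite_match` is a placeholder for the real matcher and the returned payload is an assumption:

```python
from concurrent.futures import ThreadPoolExecutor, Future
from typing import Optional

# Thread-pool stand-in for the CUDA-stream overlap: the satellite match for
# a keyframe runs in the background while per-frame VO keeps emitting.
executor = ThreadPoolExecutor(max_workers=1)
pending: Optional[Future] = None

def satellite_match(frame_id: int) -> tuple:
    # Placeholder for LiteSAM/XFeat matching (returns frame id, lat, lon).
    return frame_id, 50.45, 30.52

def process_frame(frame_id: int, keyframe: bool) -> str:
    global pending
    if pending is not None and pending.done():
        matched_id, lat, lon = pending.result()   # delayed ESKF correction
        pending = None
    if keyframe and pending is None:              # never queue a second match
        pending = executor.submit(satellite_match, frame_id)
    return f"frame {frame_id}: VO pose emitted"   # never blocks on the match
```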

6. Pre-cropped Satellite Tiles

Offline: for each satellite tile, store both the raw image and a pre-resized version matching the satellite matcher's input resolution. Runtime avoids resize cost.

Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|---|---|---|---|---|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (90fps) | VO only, no satellite matching |

Key insight: Combine cuVSLAM (best-in-class VO for Jetson) with the fastest viable satellite-aerial matcher via ESKF fusion. LiteSAM is the accuracy leader but unproven on Orin Nano Super — benchmark first, abandon for XFeat if too slow.

Architecture

Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Fit |
|---|---|---|---|---|
| cuVSLAM (mono+IMU) | PyCuVSLAM / C++ API | 90fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ⚠️ Slower |

Selected: cuVSLAM (mono+IMU mode) — purpose-built by NVIDIA for Jetson. ~11ms/frame leaves 389ms for everything else. Auto-fallback to IMU when visual tracking fails.

Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Fit |
|---|---|---|---|---|
| LiteSAM (opt) | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | 497ms on AGX Orin at 1184px; AGX Orin is 3-4x more powerful than Orin Nano Super | If benchmark passes |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven | Not designed for cross-view satellite-aerial | If LiteSAM fails benchmark |
| EfficientLoFTR | TensorRT | Good accuracy, semi-dense | 15.05M params (2.4x LiteSAM), slower | ⚠️ Heavier |
| SuperPoint + LightGlue | TensorRT C++ | Good general matching | Sparse only, worse on satellite-aerial | ⚠️ Not specialized |

Selection: Benchmark-driven. Day-one test on Orin Nano Super:

  1. Export LiteSAM (opt) to TensorRT FP16
  2. Measure at 480px, 640px, 800px
  3. If ≤400ms at 480px → LiteSAM
  4. If >400ms at any viable resolution → XFeat semi-dense (primary, no hybrid)

Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Fit |
|---|---|---|---|---|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | Too heavy |

Selected: ESKF with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.

Measurement sources and rates:

  • IMU prediction: 100+Hz
  • cuVSLAM VO update: ~3Hz (every frame)
  • Satellite update: ~0.3-1Hz (keyframes only, delayed via async pipeline)
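The multi-rate predict/correct pattern can be illustrated with a single-axis position/velocity filter (a deliberate reduction: the real filter carries the full 16-dimensional state and error-state mechanics; the noise values below are placeholders):

```python
import numpy as np

# Reduced 1-axis Kalman illustration of the multi-rate pattern:
# predict at IMU rate, correct at VO / satellite rate.
class MiniESKF:
    def __init__(self):
        self.x = np.zeros(2)              # [position, velocity]
        self.P = np.eye(2) * 100.0        # state covariance (assumed prior)

    def predict(self, accel: float, dt: float):
        """IMU-rate propagation (called at 100+ Hz)."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt**2, dt]) * accel
        self.P = F @ self.P @ F.T + np.eye(2) * 0.01 * dt  # process noise

    def update_position(self, z: float, r: float):
        """VO/satellite-rate correction with measurement variance r."""
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + r          # innovation covariance
        K = (self.P @ H.T) / S            # Kalman gain (2x1)
        self.x = self.x + (K * (z - self.x[0])).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```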

Component: Satellite Tile Preprocessing (Offline)

Selected: GeoHash-indexed tile pairs on disk.

Pipeline:

  1. Define operational area from flight plan
  2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
  3. Pre-resize each tile to matcher input resolution
  4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
  5. Copy to Jetson storage before flight
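The GeoHash key for the directory structure can be computed with a stdlib-only encoder (this is the standard GeoHash algorithm; precision 6 gives roughly 1.2 km cells):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash base32 alphabet

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Standard GeoHash encoding, used here as the tile directory key."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    code, bits, bit_count, even = [], 0, 0, True
    while len(code) < precision:
        if even:                              # even bits refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:                                 # odd bits refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        even = not even
        bit_count += 1
        if bit_count == 5:                    # 5 bits -> one base32 character
            code.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(code)
```

A tile for a given GPS bound then lives under, e.g., `tiles/<geohash>/`, so the runtime can locate candidate tiles by prefix without a database.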

Component: Re-localization (Disconnected Segments)

Selected: Keyframe satellite matching is always active + expanded search on VO failure.

When cuVSLAM reports tracking loss (sharp turn, no features):

  1. Immediately flag next frame as keyframe → trigger satellite matching
  2. Expand tile search radius (from ±200m to ±1km based on IMU dead-reckoning uncertainty)
  3. If match found: position recovered, new segment begins
  4. If 3+ consecutive keyframe failures: request user input via API
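The radius expansion in step 2 can be tied directly to dead-reckoning time; a sketch where the 5%-of-distance drift rate and the default airspeed are illustrative assumptions:

```python
def search_radius_m(seconds_since_fix: float, speed_mps: float = 25.0,
                    base_radius_m: float = 200.0, cap_m: float = 1000.0) -> float:
    """Grow the tile search radius with dead-reckoning uncertainty.

    Assumes drift accumulates at ~5% of distance travelled (placeholder
    rate); clamps between the ±200m default and the ±1km cap above.
    """
    drift = 0.05 * speed_mps * seconds_since_fix
    return min(cap_m, base_radius_m + drift)
```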

Component: Object Center Coordinates

Geometric calculation once frame-center GPS is known:

  1. Pixel offset from center: (dx_px, dy_px)
  2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
  3. Rotate by IMU yaw heading
  4. Convert meter offset to lat/lon and add to frame-center GPS
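The four steps above, as one function. The sign conventions (image +x right, +y down; yaw clockwise from north) and the flat-earth approximation are assumptions that must be checked against the actual camera mounting:

```python
import math

M_PER_DEG = 111_320.0  # approximate metres per degree of latitude

def pixel_to_gps(center_lat: float, center_lon: float,
                 dx_px: float, dy_px: float,
                 gsd_m: float, yaw_rad: float):
    """Pixel offset -> metres -> rotate by yaw -> lat/lon (steps 1-4)."""
    e_cam = dx_px * gsd_m            # steps 1-2: offset in metres, camera frame
    n_cam = -dy_px * gsd_m           # image y grows downward
    east = e_cam * math.cos(yaw_rad) + n_cam * math.sin(yaw_rad)    # step 3
    north = -e_cam * math.sin(yaw_rad) + n_cam * math.cos(yaw_rad)
    lat = center_lat + north / M_PER_DEG                            # step 4
    lon = center_lon + east / (M_PER_DEG * math.cos(math.radians(center_lat)))
    return lat, lon
```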

Component: API & Streaming

Selected: FastAPI + sse-starlette. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
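Independent of the framework, each SSE message is plain text on the wire; a minimal stdlib formatter for the position event, where the JSON field names are an illustrative assumption rather than a fixed schema:

```python
import json

def sse_position_event(lat: float, lon: float, confidence: str,
                       event: str = "position") -> str:
    """Format one Server-Sent Events message (what the SSE stream carries).

    Payload fields (lat/lon/confidence) are illustrative placeholders.
    """
    data = json.dumps({"lat": lat, "lon": lon, "confidence": confidence})
    return f"event: {event}\ndata: {data}\n\n"
```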

Processing Time Budget (per frame, 400ms budget)

Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|---|---|---|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~11ms | NVIDIA CUDA-optimized, 90fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| Total | ~25ms | Well within 400ms |

Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.

Path A — LiteSAM (if benchmark passes):

| Step | Time | Notes |
|---|---|---|
| Downsample to ~480px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~5ms | Pre-resized, from storage |
| LiteSAM (opt) matching | ~300-500ms | TensorRT FP16, 480px, Orin Nano Super estimate |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| Total | ~310-510ms | Async, does not block VO |

Path B — XFeat (if LiteSAM abandoned):

| Step | Time | Notes |
|---|---|---|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| Total | ~50-80ms | Comfortably within budget |

Per-Frame Wall-Clock Latency

Every frame:

  • VO result emitted in ~25ms (cuVSLAM + ESKF + SSE)
  • Satellite correction arrives asynchronously on keyframes
  • Client gets immediate position, then refined position when satellite match completes

Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|---|---|---|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-300MB | NVIDIA CUDA library + internal state |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| Satellite tile (pre-resized) | ~1MB | Single active tile |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| Total | ~1.9-2.4GB | ~25-30% of 8GB — comfortable margin |

Confidence Scoring

| Level | Condition | Expected Accuracy |
|---|---|---|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
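The scoring logic maps directly to a small function; the function and argument names here are illustrative:

```python
def confidence_level(sat_match_ok: bool, vo_ok: bool,
                     metres_since_sat_fix: float, manual: bool = False) -> str:
    """Map the fusion state to the confidence levels in the table above."""
    if manual:
        return "MANUAL"
    if not vo_ok:
        return "VERY_LOW"                # IMU dead-reckoning only
    if sat_match_ok:
        return "HIGH"
    if metres_since_sat_fix < 500.0:     # recent satellite correction
        return "MEDIUM"
    return "LOW"
```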

Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| LiteSAM too slow on Orin Nano Super | HIGH | Misses 400ms deadline | Abandon LiteSAM, use XFeat. Day-one benchmark is the go/no-go gate |
| cuVSLAM handles nadir-only camera poorly | MEDIUM | VO accuracy degrades | Fall back to XFeat frame-to-frame matching |
| Poor Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| XFeat cross-view accuracy insufficient | MEDIUM | Position corrections less accurate than LiteSAM | Increase keyframe frequency; multi-tile consensus voting; strict RANSAC geometric verification |
| cuVSLAM is closed-source | LOW | Hard to debug | Fall back to XFeat VO; cuVSLAM exposes Python+C++ APIs |

Testing Strategy

Integration / Functional Tests

  • End-to-end pipeline test with real flight data (60 images from input_data/)
  • Compare computed positions against ground truth GPS from coordinates.csv
  • Measure: percentage within 50m, percentage within 20m
  • Test sharp-turn handling: introduce 90-degree heading change in sequence
  • Test user-input fallback: simulate 3+ consecutive failures
  • Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
  • Test session management: start/stop/restart flight sessions via REST API
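The accuracy metrics above reduce to two small stdlib helpers: great-circle error against the coordinates.csv ground truth, then the within-threshold percentage:

```python
import math

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two GPS fixes."""
    r = 6_371_000.0                       # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pct_within(errors_m, threshold_m: float) -> float:
    """Share of frames (in %) with position error <= threshold_m."""
    return 100.0 * sum(e <= threshold_m for e in errors_m) / len(errors_m)
```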

Non-Functional Tests

  • Day-one benchmark: LiteSAM TensorRT FP16 at 480/640/800px on Orin Nano Super → go/no-go for LiteSAM
  • cuVSLAM benchmark: verify 90fps monocular+IMU on Orin Nano Super
  • Performance: measure per-frame processing time (must be <400ms)
  • Memory: monitor peak usage during 1000-frame session (must stay <8GB)
  • Stress: process 3000 frames without memory leak
  • Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff

References

  • AC Assessment: _docs/00_research/gps_denied_nav/00_ac_assessment.md
  • Tech stack evaluation: _docs/01_solution/tech_stack.md