Solution Draft
Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.
Hard constraint: The camera captures at ~3fps (333-400ms interval), so the full pipeline must complete within 400ms per frame.
Satellite matching strategy: Benchmark LiteSAM on actual Orin Nano Super hardware as a day-one priority. If LiteSAM cannot achieve ≤400ms at 480px resolution, abandon it entirely and use XFeat semi-dense matching as the primary satellite matcher. Speed is non-negotiable.
Core architectural principles:
- cuVSLAM handles VO — NVIDIA's CUDA-accelerated library achieves 90fps on Jetson Orin Nano, giving VO essentially "for free" (~11ms/frame).
- Keyframe-based satellite matching — satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
- Every keyframe independently attempts satellite-based geo-localization — this handles disconnected segments natively.
- Pipeline parallelism — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
┌─────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ Satellite Tiles → Download & Crop → Store as tile pairs │
│ (Google Maps) (per flight plan) (disk, GeoHash indexed) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ EVERY FRAME (400ms budget): │
│ ┌────────────────────────────────┐ │
│ │ Camera → Downsample (CUDA 2ms)│ │
│ │ → cuVSLAM VO+IMU (~11ms)│──→ ESKF Update → SSE Emit │
│ └────────────────────────────────┘ ↑ │
│ │ │
│ KEYFRAMES ONLY (every 3-10 frames): │ │
│ ┌────────────────────────────────────┐ │ │
│ │ Satellite match (async CUDA stream)│─────┘ │
│ │ LiteSAM or XFeat (see benchmark) │ │
│ │ (does NOT block VO output) │ │
│ └────────────────────────────────────┘ │
│ │
│ IMU: 100+Hz continuous → ESKF prediction │
└─────────────────────────────────────────────────────────────────┘
Speed Optimization Techniques
1. cuVSLAM for Visual Odometry (~11ms/frame)
NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 90fps on Jetson Orin Nano. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs. This eliminates custom VO entirely.
2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on keyframes selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when ESKF covariance exceeds threshold
  - VO failure: when cuVSLAM reports tracking loss (e.g. sharp turn)
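The three triggers above combine into a single keyframe predicate. A minimal sketch — function name, parameters, and thresholds are illustrative, not from an existing codebase:

```python
def is_keyframe(frames_since_keyframe: int,
                position_std_m: float,
                vo_tracking_lost: bool,
                interval: int = 5,
                cov_threshold_m: float = 30.0) -> bool:
    """Decide whether this frame should trigger satellite matching."""
    if vo_tracking_lost:                      # cuVSLAM reported tracking loss
        return True
    if position_std_m > cov_threshold_m:      # ESKF covariance grew too large
        return True
    return frames_since_keyframe >= interval  # fixed fallback interval
```

The ordering matters: tracking loss and covariance growth force a keyframe immediately, while the fixed interval only acts as a floor on correction frequency.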
3. Satellite Matcher Selection (Benchmark-Driven)
Candidate A: LiteSAM (opt) — Best accuracy for satellite-aerial matching (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne + TAIFormer + MinGRU. Benchmarked at 497ms on Jetson AGX Orin at 1184px. AGX Orin is 3-4x more powerful than Orin Nano Super (275 TOPS vs 67 TOPS, $2000+ vs $249).
Realistic Orin Nano Super estimates:
- At 1184px: ~1.5-2.0s (unusable)
- At 640px: ~500-800ms (borderline)
- At 480px: ~300-500ms (best case)
Candidate B: XFeat semi-dense — ~50-100ms on Orin Nano Super. Proven on Jetson. Not specifically designed for cross-view satellite-aerial, but fast and reliable.
Decision rule: Benchmark LiteSAM TensorRT FP16 at 480px on Orin Nano Super. If ≤400ms → use LiteSAM. If >400ms → abandon LiteSAM, use XFeat as primary. No hybrid compromises — pick one and optimize it.
4. TensorRT FP16 Optimization
LiteSAM's MobileOne backbone is reparameterizable — the multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. INT8 is possible for the MobileOne backbone, but the transformer components (TAIFormer) may lose accuracy under INT8 quantization.
5. CUDA Stream Pipelining
Overlap operations across consecutive frames:
- Stream A: cuVSLAM VO for current frame (~11ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
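The actual CUDA stream handles live inside cuVSLAM and the TensorRT runtime; what the host code must guarantee is the scheduling pattern — keyframe jobs handed off so the matcher never blocks the per-frame VO loop. A sketch of that pattern with a plain worker thread standing in for Stream B (all names illustrative):

```python
import queue
import threading

# depth 1: at most one satellite match in flight at a time
match_jobs: "queue.Queue" = queue.Queue(maxsize=1)

def satellite_worker(match_fn, on_correction) -> None:
    """Stream-B stand-in: consumes keyframe jobs off the VO thread."""
    while True:
        job = match_jobs.get()
        if job is None:                  # shutdown sentinel
            break
        on_correction(match_fn(job))     # delayed ESKF update happens here

def submit_keyframe(job: dict) -> bool:
    """Non-blocking submit; skip the keyframe if the matcher is still busy."""
    try:
        match_jobs.put_nowait(job)
        return True
    except queue.Full:
        return False
```

Dropping a keyframe when the matcher is busy is deliberate: a stale satellite correction is still applied as a delayed measurement, but queueing several would only add latency.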
6. Pre-cropped Satellite Tiles
Offline: for each satellite tile, store both the raw image and a pre-resized version matching the satellite matcher's input resolution. Runtime avoids resize cost.
Existing/Competitor Solutions Analysis
| Solution | Approach | Accuracy | Hardware | Limitations |
|---|---|---|---|---|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (90fps) | VO only, no satellite matching |
Key insight: Combine cuVSLAM (best-in-class VO for Jetson) with the fastest viable satellite-aerial matcher via ESKF fusion. LiteSAM is the accuracy leader but unproven on Orin Nano Super — benchmark first, abandon for XFeat if too slow.
Architecture
Component: Visual Odometry
| Solution | Tools | Advantages | Limitations | Fit |
|---|---|---|---|---|
| cuVSLAM (mono+IMU) | PyCuVSLAM / C++ API | 90fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ⚠️ Slower |
Selected: cuVSLAM (mono+IMU mode) — purpose-built by NVIDIA for Jetson. ~11ms/frame leaves 389ms for everything else. Auto-fallback to IMU when visual tracking fails.
Component: Satellite Image Matching
| Solution | Tools | Advantages | Limitations | Fit |
|---|---|---|---|---|
| LiteSAM (opt) | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | 497ms on AGX Orin at 1184px; AGX Orin is 3-4x more powerful than Orin Nano Super | ✅ If benchmark passes |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven | Not designed for cross-view satellite-aerial | ✅ If LiteSAM fails benchmark |
| EfficientLoFTR | TensorRT | Good accuracy, semi-dense | 15.05M params (2.4x LiteSAM), slower | ⚠️ Heavier |
| SuperPoint + LightGlue | TensorRT C++ | Good general matching | Sparse only, worse on satellite-aerial | ⚠️ Not specialized |
Selection: Benchmark-driven. Day-one test on Orin Nano Super:
- Export LiteSAM (opt) to TensorRT FP16
- Measure at 480px, 640px, 800px
- If ≤400ms at 480px → LiteSAM
- If >400ms at any viable resolution → XFeat semi-dense (primary, no hybrid)
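The go/no-go gate reduces to a timing harness plus a one-line decision rule. A sketch, assuming `run_inference` wraps the TensorRT engine call (the wrapper and thresholds are illustrative):

```python
import time

def benchmark_matcher(run_inference, n_warmup: int = 10, n_runs: int = 50) -> float:
    """Median wall-clock latency (ms) of one inference call."""
    for _ in range(n_warmup):            # warm up CUDA context / TensorRT engine
        run_inference()
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        run_inference()
        times.append((time.perf_counter() - t0) * 1e3)
    times.sort()
    return times[len(times) // 2]        # median is robust to scheduler spikes

def select_matcher(litesam_ms_at_480px: float, budget_ms: float = 400.0) -> str:
    """Decision rule: LiteSAM only if it fits the budget at 480 px."""
    return "litesam" if litesam_ms_at_480px <= budget_ms else "xfeat"
```

Using the median rather than the mean avoids letting one DVFS or paging spike decide the architecture.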
Component: Sensor Fusion
| Solution | Tools | Advantages | Limitations | Fit |
|---|---|---|---|---|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | ✅ Best |
| Hybrid ESKF/UKF | Custom | Reported ~49% accuracy improvement over a plain ESKF | More complex | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ❌ Too heavy |
Selected: ESKF with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
Measurement sources and rates:
- IMU prediction: 100+Hz
- cuVSLAM VO update: ~3Hz (every frame)
- Satellite update: ~0.3-1Hz (keyframes only, delayed via async pipeline)
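The multi-rate structure — high-rate prediction, position updates at different rates and noise levels — can be sketched with a position/velocity-only filter. This is a deliberately reduced model: the real 16-state filter adds orientation and IMU biases, and all noise values here are placeholders.

```python
import numpy as np

class MiniESKF:
    """Position/velocity-only sketch of the multi-rate filter."""

    def __init__(self):
        self.x = np.zeros(6)                  # [pos(3), vel(3)]
        self.P = np.eye(6) * 100.0            # large initial uncertainty

    def predict(self, accel: np.ndarray, dt: float, q: float = 0.5) -> None:
        """IMU-rate step (100+Hz): constant-velocity propagation."""
        F = np.eye(6)
        F[:3, 3:] = np.eye(3) * dt            # pos += vel * dt
        self.x = F @ self.x
        self.x[3:] += accel * dt              # vel += accel * dt
        self.P = F @ self.P @ F.T + np.eye(6) * q * dt

    def update_position(self, z: np.ndarray, r: float) -> None:
        """Shared by VO (~3Hz, small r) and satellite fixes (keyframes, larger r)."""
        H = np.hstack([np.eye(3), np.zeros((3, 3))])
        S = H @ self.P @ H.T + np.eye(3) * r
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x += K @ (z - H @ self.x)
        self.P = (np.eye(6) - K @ H) @ self.P
```

The same `update_position` path handles the delayed satellite measurement; only the measurement noise `r` differs between sources.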
Component: Satellite Tile Preprocessing (Offline)
Selected: GeoHash-indexed tile pairs on disk.
Pipeline:
- Define operational area from flight plan
- Download satellite tiles from Google Maps Tile API at max zoom (18-19)
- Pre-resize each tile to matcher input resolution
- Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
- Copy to Jetson storage before flight
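GeoHash indexing means tile lookup is a string-prefix computation rather than a spatial query. A sketch of the standard geohash encoding and an illustrative directory layout (the `tiles/<geohash>/<zoom>` convention is an assumption, not a fixed decision):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # geohash alphabet (no a, i, l, o)

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, nbits, even, out = 0, 0, True, []
    while len(out) < precision:
        if even:                               # even bit index: longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = bits * 2 + 1, mid
            else:
                bits, lon_hi = bits * 2, mid
        else:                                  # odd bit index: latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        even = not even
        nbits += 1
        if nbits == 5:                         # 5 bits -> 1 base32 character
            out.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(out)

def tile_dir(lat: float, lon: float, zoom: int) -> str:
    """Illustrative on-disk layout for a tile covering (lat, lon)."""
    return f"tiles/{geohash(lat, lon)}/{zoom}"
```

At precision 6 a geohash cell is roughly 1.2 km x 0.6 km, which matches the ±200m-to-±1km search radii used at runtime: neighboring cells share prefixes, so an expanded search is a prefix scan over adjacent directories.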
Component: Re-localization (Disconnected Segments)
Selected: Keyframe satellite matching is always active + expanded search on VO failure.
When cuVSLAM reports tracking loss (sharp turn, no features):
- Immediately flag next frame as keyframe → trigger satellite matching
- Expand tile search radius (from ±200m to ±1km based on IMU dead-reckoning uncertainty)
- If match found: position recovered, new segment begins
- If 3+ consecutive keyframe failures: request user input via API
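The search-radius expansion can be tied directly to dead-reckoning uncertainty rather than stepped manually. A sketch — the 3-sigma gain and the clamp bounds come from the ±200m/±1km figures above, but the exact gain is an assumption:

```python
def search_radius_m(dead_reckoning_std_m: float,
                    base_m: float = 200.0,
                    max_m: float = 1000.0,
                    k_sigma: float = 3.0) -> float:
    """Tile search radius grows with IMU dead-reckoning uncertainty,
    clamped between the nominal +/-200 m and the +/-1 km recovery limit."""
    return min(max(base_m, k_sigma * dead_reckoning_std_m), max_m)
```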
Component: Object Center Coordinates
Geometric calculation once frame-center GPS is known:
- Pixel offset from center: (dx_px, dy_px)
- Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
- Rotate by IMU yaw heading
- Convert meter offset to lat/lon and add to frame-center GPS
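The four steps above are a short flat-Earth calculation, valid at these offsets (a few hundred metres at most). A sketch, assuming image +x points right, +y points down, and yaw is measured clockwise from north — the axis conventions must be confirmed against the actual camera mounting:

```python
import math

def pixel_to_latlon(center_lat: float, center_lon: float,
                    dx_px: float, dy_px: float,
                    gsd_m: float, yaw_rad: float) -> tuple:
    """Convert a pixel offset from frame center to absolute lat/lon."""
    dx_m = dx_px * gsd_m
    dy_m = -dy_px * gsd_m                       # flip: image y grows downward
    # rotate the camera-frame offset into east/north using the IMU yaw heading
    east = dx_m * math.cos(yaw_rad) + dy_m * math.sin(yaw_rad)
    north = -dx_m * math.sin(yaw_rad) + dy_m * math.cos(yaw_rad)
    dlat = north / 111_320.0                    # metres per degree of latitude
    dlon = east / (111_320.0 * math.cos(math.radians(center_lat)))
    return center_lat + dlat, center_lon + dlon
```

At zoom 18-19 GSD is well under 1 m/px, so a 500 px offset stays within the flat-Earth approximation's error budget.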
Component: API & Streaming
Selected: FastAPI + sse-starlette. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
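Whatever sse-starlette emits on the endpoint, the wire format is plain `text/event-stream`. A sketch of the event payload a client would receive — field names (`source`, `confidence`) are an assumed schema, not a finalized contract:

```python
import json

def sse_event(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Events frame (text/event-stream wire format)."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

# an early VO-only fix; a refined frame follows once the async
# satellite correction for the keyframe completes
vo_frame = sse_event("position", {
    "lat": 50.4501, "lon": 30.5234,
    "source": "vo",              # vo | satellite | imu | manual
    "confidence": "MEDIUM",
})
```

Clients therefore see two events per keyframe interval: an immediate VO-based position, then a corrected one tagged `"source": "satellite"`.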
Processing Time Budget (per frame, 400ms budget)
Normal Frame (non-keyframe, ~60-80% of frames)
| Step | Time | Notes |
|---|---|---|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~11ms | NVIDIA CUDA-optimized, 90fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| Total | ~25ms | Well within 400ms |
Keyframe Satellite Matching (async, every 3-10 frames)
Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.
Path A — LiteSAM (if benchmark passes):
| Step | Time | Notes |
|---|---|---|
| Downsample to ~480px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~5ms | Pre-resized, from storage |
| LiteSAM (opt) matching | ~300-500ms | TensorRT FP16, 480px, Orin Nano Super estimate |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| Total | ~310-510ms | Async, does not block VO |
Path B — XFeat (if LiteSAM abandoned):
| Step | Time | Notes |
|---|---|---|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| Total | ~50-80ms | Comfortably within budget |
Per-Frame Wall-Clock Latency
Every frame:
- VO result emitted in ~25ms (cuVSLAM + ESKF + SSE)
- Satellite correction arrives asynchronously on keyframes
- Client gets immediate position, then refined position when satellite match completes
Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory | Notes |
|---|---|---|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-300MB | NVIDIA CUDA library + internal state |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| Satellite tile (pre-resized) | ~1MB | Single active tile |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| Total | ~1.9-2.4GB | ~25-30% of 8GB — comfortable margin |
Confidence Scoring
| Level | Condition | Expected Accuracy |
|---|---|---|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
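The ladder above maps directly to fusion state. A sketch — the 500m threshold comes from the MEDIUM row; function and parameter names are illustrative:

```python
def confidence_level(sat_match_ok: bool, vo_ok: bool,
                     metres_since_sat_fix: float,
                     manual_position: bool = False) -> str:
    """Map current fusion state to the confidence ladder in the table above."""
    if manual_position:
        return "MANUAL"                    # user-provided position
    if sat_match_ok and vo_ok:
        return "HIGH"                      # satellite + consistent VO
    if vo_ok and metres_since_sat_fix < 500.0:
        return "MEDIUM"                    # VO with a recent satellite fix
    if vo_ok:
        return "LOW"                       # VO only, correction gone stale
    return "VERY_LOW"                      # IMU dead-reckoning only
```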
Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| LiteSAM too slow on Orin Nano Super | HIGH | Misses 400ms deadline | Abandon LiteSAM, use XFeat. Day-one benchmark is the go/no-go gate |
| cuVSLAM handles a nadir-only camera poorly | MEDIUM | VO accuracy degrades | Fall back to XFeat frame-to-frame matching |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| XFeat cross-view accuracy insufficient | MEDIUM | Position corrections less accurate than LiteSAM | Increase keyframe frequency; multi-tile consensus voting; geometric verification with strict RANSAC |
| cuVSLAM is closed-source | LOW | Hard to debug | Fallback to XFeat VO; cuVSLAM has Python+C++ APIs |
Testing Strategy
Integration / Functional Tests
- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
Non-Functional Tests
- Day-one benchmark: LiteSAM TensorRT FP16 at 480/640/800px on Orin Nano Super → go/no-go for LiteSAM
- cuVSLAM benchmark: verify 90fps monocular+IMU on Orin Nano Super
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 1000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
References
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
Related Artifacts
- AC Assessment: _docs/00_research/gps_denied_nav/00_ac_assessment.md
- Tech stack evaluation: _docs/01_solution/tech_stack.md