mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-22 21:56:38 +00:00
284 lines
17 KiB
Markdown
# Solution Draft

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

**Hard constraint**: The camera captures at roughly 2.5-3 fps (333-400ms between frames). The full pipeline must complete within **400ms per frame**.

**Satellite matching strategy**: Benchmark LiteSAM on actual Orin Nano Super hardware as a day-one priority. If LiteSAM cannot achieve ≤400ms at 480px resolution, **abandon it entirely** and use XFeat semi-dense matching as the primary satellite matcher. Speed is non-negotiable.

**Core architectural principles**:

1. **cuVSLAM handles VO** — NVIDIA's CUDA-accelerated library achieves 90fps on Jetson Orin Nano, giving VO essentially "for free" (~11ms/frame).
2. **Keyframe-based satellite matching** — satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — this handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
```
┌────────────────────────────────────────────────────────────────┐
│                    OFFLINE (Before Flight)                     │
│  Satellite Tiles → Download & Crop → Store as tile pairs       │
│  (Google Maps)     (per flight plan) (disk, GeoHash indexed)   │
└────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌────────────────────────────────────────────────────────────────┐
│                     ONLINE (During Flight)                     │
│                                                                │
│  EVERY FRAME (400ms budget):                                   │
│  ┌────────────────────────────────┐                            │
│  │ Camera → Downsample (CUDA 2ms) │                            │
│  │        → cuVSLAM VO+IMU (~11ms)│──→ ESKF Update → SSE Emit  │
│  └────────────────────────────────┘         ↑                  │
│                                             │                  │
│  KEYFRAMES ONLY (every 3-10 frames):        │                  │
│  ┌────────────────────────────────────┐     │                  │
│  │ Satellite match (async CUDA stream)│─────┘                  │
│  │ LiteSAM or XFeat (see benchmark)   │                        │
│  │ (does NOT block VO output)         │                        │
│  └────────────────────────────────────┘                        │
│                                                                │
│  IMU: 100+Hz continuous → ESKF prediction                      │
└────────────────────────────────────────────────────────────────┘
```

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~11ms/frame)

NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 90fps on Jetson Orin Nano. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs. This eliminates custom VO entirely.
### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching. Strategy:

- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when ESKF covariance exceeds threshold
  - VO failure: when cuVSLAM reports tracking loss (sharp turn)
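The three trigger conditions can be sketched as a single predicate. The threshold values below are illustrative placeholders, not tuned numbers from this design:

```python
# Hypothetical keyframe trigger combining the three conditions above.
MAX_INTERVAL = 10          # force a keyframe at least every 10 frames
COV_THRESHOLD_M2 = 400.0   # trigger when position variance exceeds (20 m)^2

def is_keyframe(frames_since_last: int,
                position_variance_m2: float,
                vo_tracking_lost: bool) -> bool:
    """Return True when the satellite matcher should run on this frame."""
    if vo_tracking_lost:                         # VO failure (e.g. sharp turn)
        return True
    if position_variance_m2 > COV_THRESHOLD_M2:  # ESKF confidence drop
        return True
    return frames_since_last >= MAX_INTERVAL     # fixed-interval fallback
```

A mid-interval frame with healthy tracking and low covariance returns `False`; any of the three conditions flips it to `True`.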

### 3. Satellite Matcher Selection (Benchmark-Driven)

**Candidate A: LiteSAM (opt)** — Best accuracy for satellite-aerial matching (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne + TAIFormer + MinGRU. Benchmarked at 497ms on Jetson AGX Orin at 1184px. AGX Orin is 3-4x more powerful than Orin Nano Super (275 TOPS vs 67 TOPS, $2000+ vs $249).

Realistic Orin Nano Super estimates:

- At 1184px: ~1.5-2.0s (unusable)
- At 640px: ~500-800ms (borderline)
- At 480px: ~300-500ms (best case)
**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. Not specifically designed for cross-view satellite-aerial, but fast and reliable.

**Decision rule**: Benchmark LiteSAM TensorRT FP16 at 480px on Orin Nano Super. If ≤400ms → use LiteSAM. If >400ms → **abandon LiteSAM, use XFeat as primary**. No hybrid compromises — pick one and optimize it.
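A minimal sketch of the go/no-go gate, assuming the TensorRT engine is wrapped in an opaque callable (`run_inference` here is a stand-in, not a real LiteSAM API):

```python
import time
import statistics

BUDGET_MS = 400.0

def median_latency_ms(run_inference, warmup=3, iters=10):
    """Time one matcher invocation; run_inference stands in for the engine."""
    for _ in range(warmup):          # warmup runs excluded from timing
        run_inference()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)

def select_matcher(litesam_ms: float) -> str:
    """Decision rule: LiteSAM iff it meets the 400 ms budget, else XFeat."""
    return "litesam" if litesam_ms <= BUDGET_MS else "xfeat"
```

Median (not mean) latency is used so one slow outlier run does not flip the decision.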

### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. INT8 is possible for MobileOne backbone but ViT/transformer components may degrade with INT8.
### 5. CUDA Stream Pipelining

Overlap operations across consecutive frames:

- Stream A: cuVSLAM VO for current frame (~11ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
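The overlap pattern can be illustrated in plain Python with a single worker thread standing in for the async CUDA stream; the VO and matcher functions are placeholders, not real cuVSLAM or LiteSAM calls:

```python
from concurrent.futures import ThreadPoolExecutor

def run_vo(frame):            # "Stream A": fast per-frame VO (placeholder)
    return {"frame": frame, "pose": "vo_pose"}

def match_satellite(frame):   # "Stream B": slow keyframe matching (placeholder)
    return {"frame": frame, "correction": "sat_fix"}

def process(frames, keyframe_every=3):
    results, pending = [], None
    with ThreadPoolExecutor(max_workers=1) as sat_stream:
        for i, frame in enumerate(frames):
            # Collect a finished satellite match, if any (late correction)
            if pending is not None and pending.done():
                results.append(pending.result())
                pending = None
            results.append(run_vo(frame))   # VO output never blocks
            # Launch satellite matching asynchronously on keyframes
            if i % keyframe_every == 0 and pending is None:
                pending = sat_stream.submit(match_satellite, frame)
        if pending is not None:             # drain the last in-flight match
            results.append(pending.result())
    return results
```

Every frame yields a VO result immediately; satellite corrections arrive out of band, exactly as in the time-budget tables further down.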

### 6. Pre-cropped Satellite Tiles

Offline: for each satellite tile, store both the raw image and a pre-resized version matching the satellite matcher's input resolution. Runtime avoids resize cost.
## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (90fps) | VO only, no satellite matching |

**Key insight**: Combine cuVSLAM (best-in-class VO for Jetson) with the fastest viable satellite-aerial matcher via ESKF fusion. LiteSAM is the accuracy leader but unproven on Orin Nano Super — benchmark first, abandon for XFeat if too slow.
## Architecture

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM / C++ API | 90fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — purpose-built by NVIDIA for Jetson. ~11ms/frame leaves 389ms for everything else. Auto-fallback to IMU when visual tracking fails.
### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| LiteSAM (opt) | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | 497ms on AGX Orin at 1184px; AGX Orin is 3-4x more powerful than Orin Nano Super | ✅ If benchmark passes |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven | Not designed for cross-view satellite-aerial | ✅ If LiteSAM fails benchmark |
| EfficientLoFTR | TensorRT | Good accuracy, semi-dense | 15.05M params (2.4x LiteSAM), slower | ⚠️ Heavier |
| SuperPoint + LightGlue | TensorRT C++ | Good general matching | Sparse only, worse on satellite-aerial | ⚠️ Not specialized |

**Selection**: Benchmark-driven. Day-one test on Orin Nano Super:

1. Export LiteSAM (opt) to TensorRT FP16
2. Measure at 480px, 640px, 800px
3. If ≤400ms at 480px → **LiteSAM**
4. If >400ms at any viable resolution → **XFeat semi-dense** (primary, no hybrid)
### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy (reported, vs plain ESKF) | More complex | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ❌ Too heavy |

**Selected**: **ESKF** with adaptive measurement noise. Nominal state vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states (the error state is 15-dimensional, since the attitude error is parameterized as a 3-vector).

Measurement sources and rates:

- IMU prediction: 100+Hz
- cuVSLAM VO update: ~3Hz (every frame)
- Satellite update: ~0.3-1Hz (keyframes only, delayed via async pipeline)
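To make the multi-rate fusion concrete, here is the idea reduced to a toy one-axis constant-velocity Kalman filter (a deliberate simplification of the full ESKF state): IMU plays the high-rate predict role, while VO and satellite fixes are position updates with different noise levels:

```python
# Toy multi-rate fusion, NOT the real ESKF: one axis, state x = [p, v].
class ToyFilter:
    def __init__(self, p=0.0, v=0.0):
        self.p, self.v = p, v
        # 2x2 covariance stored as [[Ppp, Ppv], [Pvp, Pvv]]
        self.P = [[100.0, 0.0], [0.0, 10.0]]

    def predict(self, dt, accel, q=0.1):
        """High-rate prediction step (IMU role, 100+Hz)."""
        self.p += self.v * dt + 0.5 * accel * dt * dt
        self.v += accel * dt
        Ppp, Ppv, Pvp, Pvv = self.P[0][0], self.P[0][1], self.P[1][0], self.P[1][1]
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]], Q = q*I
        self.P = [[Ppp + dt * (Ppv + Pvp) + dt * dt * Pvv + q, Ppv + dt * Pvv],
                  [Pvp + dt * Pvv, Pvv + q]]

    def update_position(self, z, r):
        """Position fix: VO (small r, every frame) or satellite (larger r, keyframes)."""
        s = self.P[0][0] + r                       # innovation variance, H = [1, 0]
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s
        innov = z - self.p
        self.p += k0 * innov
        self.v += k1 * innov
        Ppp, Ppv = self.P[0][0], self.P[0][1]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * Ppp, (1 - k0) * Ppv],
                  [self.P[1][0] - k1 * Ppp, self.P[1][1] - k1 * Ppv]]
```

The real filter adds attitude, biases, and delayed satellite measurements, but the predict/update asymmetry across rates is the same.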

### Component: Satellite Tile Preprocessing (Offline)

**Selected**: **GeoHash-indexed tile pairs on disk**.

Pipeline:

1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
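For illustration, a self-contained GeoHash encoder of the kind the tile index would key on (this is the standard GeoHash algorithm; its use as a directory name here is an assumption, not project code):

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Standard GeoHash: interleave lon/lat bisection bits, base32-encode."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, even = [], True          # even bits refine longitude first
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            bits.append(1 if lon >= mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
    return "".join(_BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
                   for i in range(0, precision * 5, 5))

def tile_dir(lat: float, lon: float, precision: int = 6) -> str:
    """Hypothetical directory key for a tile covering (lat, lon)."""
    return f"tiles/{geohash_encode(lat, lon, precision)}"
```

At precision 6 a GeoHash cell is roughly 1.2 × 0.6 km, which lines up with the ±200m-±1km search windows used at runtime.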

### Component: Re-localization (Disconnected Segments)

**Selected**: **Keyframe satellite matching is always active + expanded search on VO failure**.

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Immediately flag next frame as keyframe → trigger satellite matching
2. Expand tile search radius (from ±200m to ±1km based on IMU dead-reckoning uncertainty)
3. If match found: position recovered, new segment begins
4. If 3+ consecutive keyframe failures: request user input via API
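One possible schedule for step 2: grow the radius linearly with time since the last absolute fix, clamped to the ±200m to ±1km window above. The drift-rate constant is an illustrative placeholder, not a measured value:

```python
BASE_RADIUS_M = 200.0          # normal search window
MAX_RADIUS_M = 1000.0          # hard cap from the design above
DRIFT_RATE_M_PER_S = 15.0      # assumed IMU-only dead-reckoning drift growth

def search_radius_m(seconds_since_last_fix: float) -> float:
    """Tile search radius as a function of dead-reckoning age."""
    radius = BASE_RADIUS_M + DRIFT_RATE_M_PER_S * seconds_since_last_fix
    return min(radius, MAX_RADIUS_M)
```

A better-grounded variant would derive the radius from the ESKF position covariance directly (e.g. 3-sigma) instead of elapsed time.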

### Component: Object Center Coordinates

Geometric calculation once frame-center GPS is known:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
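Steps 1-4 as a single function. The axis and yaw conventions here are assumptions, not fixed by the source: `dy_px` is positive toward the top of the image, yaw is the heading in radians clockwise from north, and the meters-to-degrees step uses a small-offset equirectangular approximation:

```python
import math

M_PER_DEG_LAT = 111_320.0  # approximate meters per degree of latitude

def object_latlon(center_lat, center_lon, dx_px, dy_px, gsd_m, yaw_rad):
    # 1-2: pixel offset from frame center, scaled by ground sample distance
    dx_m, dy_m = dx_px * gsd_m, dy_px * gsd_m
    # 3: rotate camera-frame offset (right, forward) into east/north
    east = dx_m * math.cos(yaw_rad) + dy_m * math.sin(yaw_rad)
    north = -dx_m * math.sin(yaw_rad) + dy_m * math.cos(yaw_rad)
    # 4: meters -> degrees, added to the frame-center fix
    lat = center_lat + north / M_PER_DEG_LAT
    lon = center_lon + east / (M_PER_DEG_LAT * math.cos(math.radians(center_lat)))
    return lat, lon
```

Sanity check: at yaw 0, an object 100px right of center with 0.5 m/px GSD lands 50m due east of the frame-center fix.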

### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~11ms | NVIDIA CUDA-optimized, 90fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~25ms** | Well within 400ms |
### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.

**Path A — LiteSAM (if benchmark passes)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to ~480px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~5ms | Pre-resized, from storage |
| LiteSAM (opt) matching | ~300-500ms | TensorRT FP16, 480px, Orin Nano Super estimate |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **~310-510ms** | Async, does not block VO |
**Path B — XFeat (if LiteSAM abandoned)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |
### Per-Frame Wall-Clock Latency

Every frame:

- **VO result emitted in ~25ms** (cuVSLAM + ESKF + SSE)
- Satellite correction arrives asynchronously on keyframes
- Client gets immediate position, then refined position when satellite match completes
## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-300MB | NVIDIA CUDA library + internal state |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| Satellite tile (pre-resized) | ~1MB | Single active tile |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~1.9-2.4GB** | ~25-30% of 8GB — comfortable margin |
## Confidence Scoring

| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| LiteSAM too slow on Orin Nano Super | HIGH | Misses 400ms deadline | **Abandon LiteSAM, use XFeat**. Day-one benchmark is the go/no-go gate |
| cuVSLAM not supporting nadir-only camera well | MEDIUM | VO accuracy degrades | Fall back to XFeat frame-to-frame matching |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| XFeat cross-view accuracy insufficient | MEDIUM | Position corrections less accurate than LiteSAM | Increase keyframe frequency; multi-tile consensus voting; geometric verification with strict RANSAC |
| cuVSLAM is closed-source | LOW | Hard to debug | Fallback to XFeat VO; cuVSLAM has Python+C++ APIs |
## Testing Strategy

### Integration / Functional Tests

- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API

### Non-Functional Tests

- **Day-one benchmark**: LiteSAM TensorRT FP16 at 480/640/800px on Orin Nano Super → go/no-go for LiteSAM
- cuVSLAM benchmark: verify 90fps monocular+IMU on Orin Nano Super
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 1000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
## References

- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`