mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-22 21:56:38 +00:00
357 lines
27 KiB
Markdown
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| LiteSAM at 480px as satellite matcher | **Performance**: 497ms on AGX Orin at 1184px. Orin Nano Super is ~3-4x slower. At 480px estimated ~270-360ms — borderline. Paper uses PyTorch AMP, not TensorRT FP16. TensorRT could bring 2-3x improvement. | Add TensorRT FP16 as mandatory optimization step. Revised estimate at 480px with TensorRT: ~90-180ms. Still benchmark-driven: abandon if >400ms. |
| XFeat as LiteSAM fallback for satellite matching | **Functional**: XFeat is a general-purpose feature matcher, NOT designed for the cross-view satellite-aerial gap. May fail on season/lighting differences between UAV and satellite imagery. | **Expand fallback options**: benchmark EfficientLoFTR (designed for weak-texture aerial) alongside XFeat. Consider STHN-style deep homography as a third option. See detailed satellite matcher comparison below. |
| SP+LG considered as "sparse only, worse on satellite-aerial" | **Functional**: LiteSAM paper confirms "SP+LG achieves fastest inference speed but at expense of accuracy." Sparse matcher fails on texture-scarce regions. ~180-360ms on Orin Nano Super. | **Reject SP+LG** for both VO and satellite matching. cuVSLAM is 15-33x faster for VO. |
| cuVSLAM on low-texture terrain | **Functional**: cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade tracking. On uniform agricultural fields/water bodies, features will be sparse → frequent tracking loss. IMU fallback lasts only ~1s. No published benchmarks for nadir agricultural terrain. Does NOT guarantee pose recovery after tracking loss. | **CRITICAL RISK**: cuVSLAM will likely fail frequently over low-texture terrain. Mitigation: (1) increase satellite matching frequency in low-texture areas, (2) use IMU dead-reckoning bridge, (3) accept higher drift in featureless segments, (4) XFeat VO as secondary fallback may also struggle on same terrain. |
| cuVSLAM memory estimate ~200-300MB | **Performance**: Map grows over time. For 3000-frame flights (~16min at 3fps), map could reach 500MB-1GB without pruning. | Configure cuVSLAM map pruning. Set max keyframes. Monitor memory. |
| Tile search on VO failure: "expand to ±1km" | **Functional**: Underspecified. Loading 10-20 tiles from disk at query time is slow. | Preload tiles within ±2km of flight plan into RAM. Ranked search by IMU dead-reckoning position. |
| LiteSAM resolution | **Performance**: Paper benchmarked at 1184px on AGX Orin (497ms AMP). TensorRT FP16 with reparameterized MobileOne expected 2-3x faster. | Benchmark LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM at 1280px. If >200ms → use XFeat. |
| SP+LG proposed for VO by user | **Performance**: ~130-280ms/frame on Orin Nano. cuVSLAM ~8.6ms/frame. No IMU, no loop closure. | **Reject SP+LG for VO.** cuVSLAM 15-33x faster. XFeat frame-to-frame remains fallback. |

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

**Hard constraint**: Camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.

**Satellite matching strategy**: Benchmark LiteSAM TensorRT FP16 at **1280px** on Orin Nano Super as a day-one priority. The paper's AGX Orin benchmark used PyTorch AMP — TensorRT FP16 with reparameterized MobileOne should yield 2-3x additional speedup. **Decision rule: if LiteSAM TRT FP16 at 1280px ≤200ms → use LiteSAM. If >200ms → use XFeat.**

**Core architectural principles**:

1. **cuVSLAM handles VO** — 116fps on Orin Nano 8GB, ~8.6ms/frame. SuperPoint+LightGlue was evaluated and rejected (15-33x slower, no IMU integration).
2. **Keyframe-based satellite matching** — satellite matcher runs on keyframes only (every 3-10 frames), amortizing cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
5. **Proactive tile loading** — preload tiles within ±2km of flight plan into RAM for fast lookup during expanded search.

```
┌─────────────────────────────────────────────────────────────────┐
│                     OFFLINE (Before Flight)                     │
│  Satellite Tiles → Download & Crop → Store as tile pairs        │
│  (Google Maps)     (per flight plan)  (disk, GeoHash indexed)   │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                     ONLINE (During Flight)                      │
│                                                                 │
│  EVERY FRAME (400ms budget):                                    │
│  ┌────────────────────────────────┐                             │
│  │ Camera → Downsample (CUDA 2ms) │                             │
│  │ → cuVSLAM VO+IMU (~9ms)        │──→ ESKF Update → SSE Emit   │
│  └────────────────────────────────┘          ↑                  │
│                                              │                  │
│  KEYFRAMES ONLY (every 3-10 frames):         │                  │
│  ┌────────────────────────────────────┐      │                  │
│  │ Satellite match (async CUDA stream)│──────┘                  │
│  │ LiteSAM TRT FP16 or XFeat          │                         │
│  │ (does NOT block VO output)         │                         │
│  └────────────────────────────────────┘                         │
│                                                                 │
│  IMU: 100+Hz continuous → ESKF prediction                       │
│  TILES: ±2km preloaded in RAM from flight plan                  │
└─────────────────────────────────────────────────────────────────┘
```

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs.

**Why not SuperPoint+LightGlue for VO**: SP+LG is 15-33x slower (~130-280ms vs ~9ms). It lacks IMU integration, loop closure, and auto-fallback.

**CRITICAL: cuVSLAM on difficult/uniform terrain (agricultural fields, water)**:

cuVSLAM uses Shi-Tomasi corner detection + Lucas-Kanade optical flow tracking (classical features, not learned). On uniform agricultural terrain or water bodies:

- Very few corners will be detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1 second) → constant-velocity integrator (~0.5s more)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
- All published benchmarks (KITTI: urban/suburban, EuRoC: indoor) do NOT include nadir agricultural terrain
- Multi-stereo mode helps with featureless surfaces, but we have mono camera only

**Mitigation strategy for low-texture terrain**:

1. **Increase satellite matching frequency**: In low-texture areas (detected by cuVSLAM's keypoint count dropping), switch from every 3-10 frames to every frame
2. **IMU dead-reckoning bridge**: When cuVSLAM reports tracking loss, ESKF continues with IMU prediction. At 3fps with ~1.5s IMU bridge, that covers ~4-5 frames
3. **Accept higher drift**: In featureless segments, position accuracy degrades to IMU-only level (50-100m+ over ~10s). Satellite matching must recover absolute position when texture returns
4. **Keypoint density monitoring**: Track cuVSLAM's number of tracked features per frame. When below threshold (e.g., <50), proactively trigger satellite matching
5. **XFeat frame-to-frame as VO fallback**: XFeat uses learned features that may detect texture invisible to Shi-Tomasi corners. But XFeat may also struggle on truly uniform terrain
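
The keypoint-count trigger from mitigations 1 and 4 can be sketched as a small policy function. The function and constant names are illustrative (not cuVSLAM API); the 50-feature threshold and the 3- and 10-frame intervals come from the text above:

```python
# Illustrative policy: adapt satellite-matching cadence to cuVSLAM's
# tracked-feature count. All names here are hypothetical.
LOW_TEXTURE_THRESHOLD = 50   # tracked features below this => low-texture terrain
NORMAL_INTERVAL = 10         # keyframe every 10 frames on well-textured terrain
DEGRADED_INTERVAL = 3        # every 3 frames when texture weakens
EMERGENCY_INTERVAL = 1       # every frame when tracking is nearly lost

def satellite_match_interval(tracked_features: int, tracking_lost: bool) -> int:
    """Return how many frames to wait between satellite-matching keyframes."""
    if tracking_lost or tracked_features < LOW_TEXTURE_THRESHOLD // 2:
        return EMERGENCY_INTERVAL
    if tracked_features < LOW_TEXTURE_THRESHOLD:
        return DEGRADED_INTERVAL
    return NORMAL_INTERVAL
```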

### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching. Strategy:

- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when ESKF covariance exceeds threshold
  - VO failure: when cuVSLAM reports tracking loss (e.g., a sharp turn)
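
A sketch of how the three triggers combine into a single keyframe predicate. Names and the default covariance threshold are illustrative and would be tuned on real flight data:

```python
def is_keyframe(frame_idx: int,
                last_keyframe_idx: int,
                interval: int,
                pos_cov_trace_m2: float,
                tracking_lost: bool,
                cov_threshold_m2: float = 400.0) -> bool:
    """True if this frame should trigger satellite matching."""
    if tracking_lost:                                   # VO failure
        return True
    if pos_cov_trace_m2 > cov_threshold_m2:             # confidence drop
        return True
    return frame_idx - last_keyframe_idx >= interval    # fixed interval
```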

### 3. Satellite Matcher Selection (Benchmark-Driven)

**Important context**: Our UAV-to-satellite matching is EASIER than typical cross-view geo-localization problems. Both the UAV camera and satellite imagery are approximately nadir (top-down). The main challenges are season/lighting differences, resolution mismatch, and temporal changes — not the extreme viewpoint gap seen in ground-to-satellite matching. This means even general-purpose matchers may perform well.

**Candidate A: LiteSAM (opt) with TensorRT FP16 at 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne reparameterizable for TensorRT. Paper benchmarked at 497ms on AGX Orin using AMP at 1184px. TensorRT FP16 with reparameterized MobileOne expected 2-3x faster than AMP. At 1280px (close to paper's 1184px benchmark resolution), accuracy should match published results.

Orin Nano Super TensorRT FP16 estimate at 1280px:

- AGX Orin AMP @ 1184px: 497ms
- TRT FP16 speedup over AMP: ~2-3x → AGX Orin TRT estimate: ~165-250ms
- Orin Nano Super is ~3-4x slower than AGX Orin → naive scaling of the TRT estimate gives ~500-1000ms
- Working range with TRT FP16: **~165-330ms** (optimistic) to ~500-1000ms (pessimistic); the day-one benchmark, not the estimate, decides
- Go/no-go threshold: **≤200ms**

**Candidate B (fallback): XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. General-purpose, not designed for cross-view gap. FASTEST option. Since our cross-view gap is small (both nadir), XFeat may work adequately for this specific use case.

**Other evaluated options (not selected)**:

- **EfficientLoFTR**: Semi-dense, 15.05M params, handles weak-texture well. ~20% slower than LiteSAM. Strong option if LiteSAM codebase proves difficult to export to TRT, but larger model footprint.
- **Deep Homography (STHN-style)**: End-to-end homography estimation, no feature/RANSAC pipeline. 4.24m at 50m range. Interesting future option but needs RGB retraining — higher implementation risk.
- **PFED and retrieval-based methods**: Image RETRIEVAL only (identifies which tile matches), not pixel-level matching. We already know which tile to use from ESKF position.
- **SuperPoint+LightGlue**: Sparse matcher. LiteSAM paper confirms worse satellite-aerial accuracy. Slower than XFeat.

**Decision rule** (day-one on Orin Nano Super):

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → use LiteSAM at 1280px**
4. **If >200ms → use XFeat**

### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. **Do NOT use INT8 on transformer components** (TAIFormer) — accuracy degrades. INT8 is safe only for the MobileOne backbone CNN layers.

### 5. CUDA Stream Pipelining

Overlap operations across consecutive frames:

- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
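
A hardware-independent sketch of this hand-off, using a worker thread and queues in place of a second CUDA stream (in production, Stream B would be a real CUDA stream, e.g. via torch.cuda.Stream). All timings and return values below are placeholders; the point is that the per-frame VO loop never waits on the matcher:

```python
import queue
import threading
import time

# `satellite_worker` stands in for the TRT matcher on its own stream.
sat_requests: queue.Queue = queue.Queue(maxsize=1)
sat_results: queue.Queue = queue.Queue()

def satellite_worker():
    while True:
        kf = sat_requests.get()
        if kf is None:                       # shutdown sentinel
            break
        time.sleep(0.01)                     # stands in for the ~200ms matcher
        sat_results.put((kf, (55.0, 37.0)))  # placeholder absolute fix

worker = threading.Thread(target=satellite_worker, daemon=True)
worker.start()

emitted = []
for frame in range(10):
    pose = (float(frame), 0.0)               # placeholder ~9ms cuVSLAM VO pose
    if frame % 3 == 0 and not sat_requests.full():
        sat_requests.put(frame)              # keyframe: enqueue and move on
    while not sat_results.empty():           # fold in any finished correction
        _, fix = sat_results.get()
        pose = fix
    emitted.append(pose)                     # SSE emit happens every frame

sat_requests.put(None)
worker.join()
```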

### 6. Proactive Tile Loading

**Change from draft01**: Instead of loading tiles on-demand from disk, preload tiles within ±2km of the flight plan into RAM at session start. This eliminates disk I/O latency during flight. For a 50km flight path, ~2000 tiles at zoom 19 ≈ ~200MB RAM — well within budget.

On VO failure / expanded search:

1. Compute IMU dead-reckoning position
2. Rank preloaded tiles by distance to predicted position
3. Try top 3 tiles (not all tiles in ±1km radius)
4. If no match in top 3, expand to next 3
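
The ranked search in steps 1-4 might look like the following, with haversine distance for ranking and `try_match` standing in for the satellite matcher (a hypothetical callable, returning a fix or `None`):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def ranked_tile_search(tiles, predicted, try_match, batch=3, max_batches=3):
    """Probe preloaded tiles nearest the dead-reckoning estimate, 3 at a time."""
    ranked = sorted(tiles, key=lambda t: haversine_m(t["lat"], t["lon"], *predicted))
    for i in range(0, min(len(ranked), batch * max_batches), batch):
        for tile in ranked[i:i + batch]:
            fix = try_match(tile)
            if fix is not None:
                return fix
    return None  # all batches exhausted -> request user input via API
```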

## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| VRLM (2024) | FocalNet backbone + multi-scale feature fusion | 83.35% MA@20 | Desktop | Not edge-optimized |
| Scale-Aware UAV-to-Satellite (2026) | Semantic geometric + metric scale recovery | N/A | Desktop | Addresses scale ambiguity problem |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection, semi-dense | Competitive with LiteSAM | 2.5x faster than LoFTR, TRT available | 15.05M params, heavier than LiteSAM |
| PFED (2025) | Knowledge distillation + multi-view refinement, retrieval | 97.15% Recall@1 (University-1652) | AGX Orin (251.5 FPS) | Retrieval only, not pixel-level matching |
| STHN (IEEE RA-L 2024) | Deep homography estimation, coarse-to-fine | 4.24m at 50m range | Open-source, lightweight | Trained on thermal, needs RGB retraining |
| Hierarchical AVL (2025) | DINOv2 retrieval + SuperPoint matching | 64.5-95% success rate | ROS, IMU integration | Two-stage complexity |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Designed for Mars/planetary, needs adaptation |

## Architecture

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ~30-50ms/frame | ⚠️ Fallback |
| SuperPoint+LightGlue | LightGlue-ONNX TRT | Good accuracy, adaptive pruning | ~130-280ms, no IMU, no loop closure | ~130-280ms/frame | ❌ Rejected |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ~33ms/frame | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built by NVIDIA for Jetson. Auto-fallback to IMU when visual tracking fails.

**SP+LG rejection rationale**: 15-33x slower than cuVSLAM. No built-in IMU fusion, loop closure, or tracking failure detection. Building these features around SP+LG would take significant development time and still be slower. XFeat at ~30-50ms is a better fallback for VO if cuVSLAM fails on nadir camera.

### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | Untested on Orin Nano Super with TensorRT | Est. ~165-330ms @ 1280px TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven, fastest | General-purpose, not designed for cross-view. Our nadir-nadir gap is small → may work. | ~50-100ms | ✅ Fallback if LiteSAM >200ms |

**Selection**: Day-one benchmark on Orin Nano Super:

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → LiteSAM at 1280px**
4. **If >200ms → XFeat**

### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ~10-50ms/step | ❌ Too heavy |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
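
The full 16-state filter is out of scope for a snippet, but the multi-rate fusion pattern (high-rate IMU-driven predict, low-rate absolute position update from satellite matching) can be illustrated in one dimension. A sketch only; the process/measurement noise values are arbitrary:

```python
class TinyFilter:
    """1-D position/velocity Kalman filter illustrating the multi-rate pattern."""

    def __init__(self):
        self.x = [0.0, 0.0]                 # position (m), velocity (m/s)
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # covariance

    def predict(self, accel, dt, q=0.1):
        """High-rate step driven by an IMU acceleration sample."""
        p, v = self.x
        self.x = [p + v * dt + 0.5 * accel * dt * dt, v + accel * dt]
        F = [[1.0, dt], [0.0, 1.0]]
        P = self.P
        FP = [[F[i][0] * P[0][j] + F[i][1] * P[1][j] for j in range(2)]
              for i in range(2)]
        # P <- F P F^T + Q (Q = q on the diagonal)
        self.P = [[FP[i][0] * F[j][0] + FP[i][1] * F[j][1] + (q if i == j else 0.0)
                   for j in range(2)] for i in range(2)]

    def update_position(self, z, r=4.0):
        """Low-rate absolute position fix (satellite match), H = [1, 0]."""
        y = z - self.x[0]
        s = self.P[0][0] + r
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * self.P[0][0], (1 - k0) * self.P[0][1]],
                  [self.P[1][0] - k1 * self.P[0][0], self.P[1][1] - k1 * self.P[0][1]]]
```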

### Component: Satellite Tile Preprocessing (Offline)

**Selected**: **GeoHash-indexed tile pairs on disk + RAM preloading**.

Pipeline:

1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
6. **At session start**: preload tiles within ±2km of flight plan into RAM (~200MB for 50km route)
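
The GeoHash index in step 4 can use the standard geohash encoding; the encoder below is the textbook algorithm, while the directory layout in `tile_path` is a hypothetical example, not the repo's actual scheme:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Standard geohash: interleave lon/lat bisection bits, 5 bits per char."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    out, bits, ch, even = [], 0, 0, True
    while len(out) < precision:
        if even:  # even bit index: bisect longitude
            mid = (lon_lo + lon_hi) / 2
            ch = (ch << 1) | (lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:     # odd bit index: bisect latitude
            mid = (lat_lo + lat_hi) / 2
            ch = (ch << 1) | (lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even, bits = not even, bits + 1
        if bits == 5:
            out.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(out)

def tile_path(lat: float, lon: float, zoom: int, x: int, y: int) -> str:
    # Hypothetical on-disk layout: tiles/<geohash6>/<zoom>_<x>_<y>.png
    return f"tiles/{geohash(lat, lon)}/{zoom}_{x}_{y}.png"
```

A 6-character geohash cell is roughly 1.2km x 0.6km, so one prefix directory groups the tiles a ±2km preload would pull in with a handful of neighbor cells.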

### Component: Re-localization (Disconnected Segments)

**Selected**: **Keyframe satellite matching is always active + ranked tile search on VO failure**.

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Immediately flag next frame as keyframe → trigger satellite matching
2. Compute IMU dead-reckoning position since last known position
3. Rank preloaded tiles by distance to dead-reckoning position
4. Try top 3 tiles sequentially (not all tiles in radius)
5. If match found: position recovered, new segment begins
6. If 3 consecutive keyframe failures across top tiles: expand to next 3 tiles
7. If still no match after 3+ full attempts: request user input via API

### Component: Object Center Coordinates

Geometric calculation once frame-center GPS is known:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
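
A sketch of steps 1-4. The yaw convention (radians, image +x mapping to east at zero yaw) and the small-offset meters-to-degrees conversion are simplifying assumptions; a production implementation would pin down the camera-to-body rotation and use a geodesy library:

```python
import math

def object_gps(center_lat, center_lon, dx_px, dy_px, gsd_m, yaw_rad):
    """Convert a pixel offset from frame center to absolute lat/lon."""
    # Steps 1-2: pixel offset -> metres in the image frame
    dx_m, dy_m = dx_px * gsd_m, dy_px * gsd_m
    # Step 3: rotate the image-frame offset into east/north by the IMU yaw
    east = dx_m * math.cos(yaw_rad) - dy_m * math.sin(yaw_rad)
    north = dx_m * math.sin(yaw_rad) + dy_m * math.cos(yaw_rad)
    # Step 4: metre offset -> degrees (small-offset equirectangular approximation)
    dlat = north / 111_320.0
    dlon = east / (111_320.0 * math.cos(math.radians(center_lat)))
    return center_lat + dlat, center_lon + dlon
```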

### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
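
The stream itself would go through sse-starlette's `EventSourceResponse`; the snippet below only sketches the SSE wire format of one position event, with illustrative field names rather than a defined schema:

```python
import json

def position_event(frame_idx: int, lat: float, lon: float, confidence: str) -> str:
    """Format one position update as a raw SSE frame (field names illustrative)."""
    payload = json.dumps({
        "frame": frame_idx,
        "lat": lat,
        "lon": lon,
        "confidence": confidence,
    })
    # SSE frames are "event:"/"data:" lines terminated by a blank line
    return f"event: position\ndata: {payload}\n\n"
```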

## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~23ms** | Well within 400ms |

### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.

**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 matching | ≤200ms | TensorRT FP16, 1280px, go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async, within budget |

**Path B — XFeat (if LiteSAM >200ms)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state. **Configure map pruning for 3000-frame flights** |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan, pre-resized |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~2.1-2.4GB** | ~26-30% of 8GB — comfortable margin |

## Confidence Scoring

| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
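
The table collapses naturally into a scoring function. Thresholds mirror the table (500m of travel since the last satellite fix separates MEDIUM from LOW); the parameter names are illustrative:

```python
def confidence_level(sat_match_ok: bool,
                     vo_ok: bool,
                     dist_since_fix_m: float,
                     manual_position: bool = False) -> str:
    """Map the current fusion state to the confidence levels in the table."""
    if manual_position:
        return "MANUAL"
    if sat_match_ok and vo_ok:
        return "HIGH"
    if vo_ok and dist_since_fix_m < 500.0:
        return "MEDIUM"
    if vo_ok:
        return "LOW"
    return "VERY LOW"  # IMU dead-reckoning only
```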

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **cuVSLAM fails on low-texture agricultural terrain** | **HIGH** | Frequent tracking loss, degraded VO | Increase satellite matching frequency when keypoint count drops. IMU dead-reckoning bridge (~1.5s). Accept higher drift in featureless segments. Satellite matching recovers position when texture returns. |
| LiteSAM TRT FP16 >200ms at 1280px on Orin Nano Super | MEDIUM | Must use XFeat instead (less accurate for cross-view) | Day-one TRT FP16 benchmark. If >200ms → XFeat. Since our nadir-nadir gap is small, XFeat may still perform adequately. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Benchmark XFeat on actual operational area satellite-aerial pairs. Increase keyframe frequency; multi-tile consensus; strict RANSAC. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, set max keyframes. Monitor memory. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| cuVSLAM is closed-source, no nadir benchmarks | MEDIUM | Unknown failure modes over farmland | Extensive testing with real nadir UAV imagery before deployment. XFeat VO as fallback (also uses learned features). |
| Tile I/O bottleneck during expanded search | LOW | Delayed re-localization | Preload ±2km tiles in RAM; ranked search instead of exhaustive |

## Testing Strategy

### Integration / Functional Tests

- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
- Test cuVSLAM map memory: run 3000-frame session, monitor memory growth

### Non-Functional Tests

- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM. If >200ms → use XFeat. Also measure accuracy on test satellite-aerial pairs for both.
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- **cuVSLAM terrain stress test**: test with nadir camera over (a) urban/structured terrain, (b) agricultural fields, (c) water/uniform terrain, (d) forest. Measure: keypoint count, tracking success rate, drift per 100 frames, IMU fallback frequency
- cuVSLAM keypoint monitoring: verify that low-keypoint detection triggers increased satellite matching
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 3000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
- Tile preloading: verify RAM usage of preloaded tiles for 50km flight plan
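
The day-one matcher benchmark could use a harness along these lines, with `match_fn` standing in for one TRT engine invocation (in a real GPU run, a CUDA synchronize would be needed around the timed call so the kernel actually finishes inside the measurement):

```python
import statistics
import time

def benchmark(match_fn, n: int = 50, warmup: int = 5) -> dict:
    """Time `match_fn` n times after warmup; report median and p95 in ms."""
    for _ in range(warmup):
        match_fn()
    times_ms = []
    for _ in range(n):
        t0 = time.perf_counter()
        match_fn()
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    times_ms.sort()
    return {
        "median_ms": statistics.median(times_ms),
        "p95_ms": times_ms[int(0.95 * (len(times_ms) - 1))],
    }
```

The ≤200ms go/no-go decision should be taken on the p95, not the median, since the 400ms frame budget has to hold on slow frames too.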

## References

- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR paper: https://zju3dv.github.io/efficientloftr/
- LoFTR TensorRT adaptation: https://github.com/Kolkir/LoFTR_TRT
- PFED (2025): https://github.com/SkyEyeLoc/PFED
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hierarchical AVL (MDPI 2025): https://www.mdpi.com/2072-4292/17/20/3470
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- cuVSLAM paper: https://arxiv.org/abs/2506.04359
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- LightGlue (ICCV 2023): https://github.com/cvg/LightGlue
- LightGlue TensorRT: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- LightGlue TRT Jetson: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- ForestVO / SP+LG VO: https://arxiv.org/html/2504.01261v1
- vo_lightglue (SP+LG VO): https://github.com/himadrir/vo_lightglue
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`