# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| LiteSAM at 480px as satellite matcher | **Performance**: 497ms on AGX Orin at 1184px. Orin Nano Super is ~3-4x slower. At 480px estimated ~270-360ms — borderline. Paper uses PyTorch AMP, not TensorRT FP16. TensorRT could bring 2-3x improvement. | Add TensorRT FP16 as mandatory optimization step. Revised estimate at 480px with TensorRT: ~90-180ms. Still benchmark-driven: abandon if >400ms. |
| XFeat as LiteSAM fallback for satellite matching | **Functional**: XFeat is a general-purpose feature matcher, NOT designed for cross-view satellite-aerial gap. May fail on season/lighting differences between UAV and satellite imagery. | **Expand fallback options**: benchmark EfficientLoFTR (designed for weak-texture aerial) alongside XFeat. Consider STHN-style deep homography as third option. See detailed satellite matcher comparison below. |
| SP+LG considered as "sparse only, worse on satellite-aerial" | **Functional**: LiteSAM paper confirms "SP+LG achieves fastest inference speed but at expense of accuracy." Sparse matcher fails on texture-scarce regions. ~180-360ms on Orin Nano Super. | **Reject SP+LG** for both VO and satellite matching. cuVSLAM is 15-33x faster for VO. |
| cuVSLAM on low-texture terrain | **Functional**: cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade tracking. On uniform agricultural fields/water bodies, features will be sparse → frequent tracking loss. IMU fallback lasts only ~1s. No published benchmarks for nadir agricultural terrain. Does NOT guarantee pose recovery after tracking loss. | **CRITICAL RISK**: cuVSLAM will likely fail frequently over low-texture terrain. Mitigation: (1) increase satellite matching frequency in low-texture areas, (2) use IMU dead-reckoning bridge, (3) accept higher drift in featureless segments, (4) XFeat VO as secondary fallback may also struggle on same terrain. |
| cuVSLAM memory estimate ~200-300MB | **Performance**: Map grows over time. For 3000-frame flights (~16min at 3fps), map could reach 500MB-1GB without pruning. | Configure cuVSLAM map pruning. Set max keyframes. Monitor memory. |
| Tile search on VO failure: "expand to ±1km" | **Functional**: Underspecified. Loading 10-20 tiles from disk adds significant I/O latency. | Preload tiles within ±2km of flight plan into RAM. Ranked search by IMU dead-reckoning position. |
| LiteSAM resolution | **Performance**: Paper benchmarked at 1184px on AGX Orin (497ms AMP). TensorRT FP16 with reparameterized MobileOne expected 2-3x faster. | Benchmark LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM at 1280px. If >200ms → use XFeat. |
| SP+LG proposed for VO by user | **Performance**: ~130-280ms/frame on Orin Nano. cuVSLAM ~8.6ms/frame. No IMU, no loop closure. | **Reject SP+LG for VO.** cuVSLAM 15-33x faster. XFeat frame-to-frame remains fallback. |
## Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.
**Hard constraint**: Camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.
**Satellite matching strategy**: Benchmark LiteSAM TensorRT FP16 at **1280px** on Orin Nano Super as a day-one priority. The paper's AGX Orin benchmark used PyTorch AMP — TensorRT FP16 with reparameterized MobileOne should yield 2-3x additional speedup. **Decision rule: if LiteSAM TRT FP16 at 1280px ≤200ms → use LiteSAM. If >200ms → use XFeat.**
**Core architectural principles**:
1. **cuVSLAM handles VO** — 116fps on Orin Nano 8GB, ~8.6ms/frame. SuperPoint+LightGlue was evaluated and rejected (15-33x slower, no IMU integration).
2. **Keyframe-based satellite matching** — satellite matcher runs on keyframes only (every 3-10 frames), amortizing cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
5. **Proactive tile loading** — preload tiles within ±2km of flight plan into RAM for fast lookup during expanded search.
```
┌─────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ Satellite Tiles → Download & Crop → Store as tile pairs │
│ (Google Maps) (per flight plan) (disk, GeoHash indexed) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ EVERY FRAME (400ms budget): │
│ ┌────────────────────────────────┐ │
│ │ Camera → Downsample (CUDA 2ms)│ │
│ │ → cuVSLAM VO+IMU (~9ms) │──→ ESKF Update → SSE Emit │
│ └────────────────────────────────┘ ↑ │
│ │ │
│ KEYFRAMES ONLY (every 3-10 frames): │ │
│ ┌────────────────────────────────────┐ │ │
│ │ Satellite match (async CUDA stream)│─────┘ │
│ │ LiteSAM TRT FP16 or XFeat │ │
│ │ (does NOT block VO output) │ │
│ └────────────────────────────────────┘ │
│ │
│ IMU: 100+Hz continuous → ESKF prediction │
│ TILES: ±2km preloaded in RAM from flight plan │
└─────────────────────────────────────────────────────────────────┘
```
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~9ms/frame)
NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs.
**Why not SuperPoint+LightGlue for VO**: SP+LG is 15-33x slower (~130-280ms vs ~9ms). Lacks IMU integration, loop closure, auto-fallback.
**CRITICAL: cuVSLAM on difficult, low-texture terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corner detection + Lucas-Kanade optical flow tracking (classical features, not learned). On uniform agricultural terrain or water bodies:
- Very few corners will be detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1 second) → constant-velocity integrator (~0.5s more)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
- All published benchmarks (KITTI: urban/suburban, EuRoC: indoor) do NOT include nadir agricultural terrain
- Multi-stereo mode helps with featureless surfaces, but we have mono camera only
**Mitigation strategy for low-texture terrain**:
1. **Increase satellite matching frequency**: In low-texture areas (detected by cuVSLAM's keypoint count dropping), switch from every 3-10 frames to every frame
2. **IMU dead-reckoning bridge**: When cuVSLAM reports tracking loss, ESKF continues with IMU prediction. At 3fps with ~1.5s IMU bridge, that covers ~4-5 frames
3. **Accept higher drift**: In featureless segments, position accuracy degrades to IMU-only level (50-100m+ over ~10s). Satellite matching must recover absolute position when texture returns
4. **Keypoint density monitoring**: Track cuVSLAM's number of tracked features per frame. When below threshold (e.g., <50), proactively trigger satellite matching
5. **XFeat frame-to-frame as VO fallback**: XFeat uses learned features that may detect texture invisible to Shi-Tomasi corners. But XFeat may also struggle on truly uniform terrain
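Mitigations 1 and 4 above amount to a small piece of control logic. A minimal sketch, assuming the tracked-feature count is readable from the VO layer each frame (PyCuVSLAM's exact accessor is not confirmed here, so the count is taken as an input); the thresholds are illustrative, not tuned:

```python
# Keypoint-density monitor: switch satellite matching to every frame when
# the VO feature count indicates low-texture terrain. Threshold values are
# placeholders to be tuned against the terrain stress tests.

LOW_TEXTURE_THRESHOLD = 50   # tracked features below this => low-texture mode
RECOVERY_THRESHOLD = 80      # hysteresis: require more features to leave the mode

class TextureMonitor:
    def __init__(self, normal_interval: int = 5):
        self.normal_interval = normal_interval
        self.low_texture = False

    def update(self, tracked_features: int) -> None:
        if self.low_texture:
            if tracked_features >= RECOVERY_THRESHOLD:
                self.low_texture = False
        elif tracked_features < LOW_TEXTURE_THRESHOLD:
            self.low_texture = True

    def satellite_match_due(self, frame_index: int) -> bool:
        # Every frame in low-texture mode, every Nth frame otherwise.
        if self.low_texture:
            return True
        return frame_index % self.normal_interval == 0
```

The hysteresis band (50 vs 80) prevents rapid mode flapping at field boundaries where the feature count hovers near the threshold.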
### 2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
- Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
- Confidence drop: when ESKF covariance exceeds threshold
- VO failure: when cuVSLAM reports tracking loss (sharp turn)
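The three triggers above can be combined into one predicate. A sketch with assumed input names (`position_cov_trace` from the ESKF, `tracking_lost` from the VO layer) and placeholder threshold values:

```python
def is_keyframe(frames_since_keyframe: int,
                position_cov_trace: float,
                tracking_lost: bool,
                interval: int = 5,
                cov_threshold: float = 400.0) -> bool:
    """Return True if the satellite matcher should run on this frame."""
    if tracking_lost:                         # trigger 3: VO tracking loss
        return True
    if position_cov_trace > cov_threshold:    # trigger 2: ESKF confidence drop
        return True
    return frames_since_keyframe >= interval  # trigger 1: fixed interval
```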
### 3. Satellite Matcher Selection (Benchmark-Driven)
**Important context**: Our UAV-to-satellite matching is EASIER than typical cross-view geo-localization problems. Both the UAV camera and satellite imagery are approximately nadir (top-down). The main challenges are season/lighting differences, resolution mismatch, and temporal changes — not the extreme viewpoint gap seen in ground-to-satellite matching. This means even general-purpose matchers may perform well.
**Candidate A: LiteSAM (opt) with TensorRT FP16 at 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne reparameterizable for TensorRT. Paper benchmarked at 497ms on AGX Orin using AMP at 1184px. TensorRT FP16 with reparameterized MobileOne expected 2-3x faster than AMP. At 1280px (close to paper's 1184px benchmark resolution), accuracy should match published results.
Orin Nano Super TensorRT FP16 estimate at 1280px:
- AGX Orin AMP @ 1184px: 497ms
- Orin Nano Super is ~3-4x slower than AGX Orin → AMP estimate: ~1500-2000ms without TRT
- TRT FP16 speedup over AMP: ~2-3x → **~500-1000ms** with TRT FP16 (straight-line scaling; consistent with the 480px chain in the assessment table above)
- Go/no-go threshold: **≤200ms** — ambitious under this extrapolation, so the day-one benchmark is decisive
**Candidate B (fallback): XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. General-purpose, not designed for cross-view gap. FASTEST option. Since our cross-view gap is small (both nadir), XFeat may work adequately for this specific use case.
**Other evaluated options (not selected)**:
- **EfficientLoFTR**: Semi-dense, 15.05M params, handles weak-texture well. ~20% slower than LiteSAM. Strong option if LiteSAM codebase proves difficult to export to TRT, but larger model footprint.
- **Deep Homography (STHN-style)**: End-to-end homography estimation, no feature/RANSAC pipeline. 4.24m at 50m range. Interesting future option but needs RGB retraining — higher implementation risk.
- **PFED and retrieval-based methods**: Image RETRIEVAL only (identifies which tile matches), not pixel-level matching. We already know which tile to use from ESKF position.
- **SuperPoint+LightGlue**: Sparse matcher. LiteSAM paper confirms worse satellite-aerial accuracy. Slower than XFeat.
**Decision rule** (day-one on Orin Nano Super):
1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → use LiteSAM at 1280px**
4. **If >200ms → use XFeat**
### 4. TensorRT FP16 Optimization
LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. **Do NOT use INT8 on transformer components** (TAIFormer) — accuracy degrades. INT8 is safe only for the MobileOne backbone CNN layers.
### 5. CUDA Stream Pipelining
Overlap operations across consecutive frames:
- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
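The stream assignment above is a scheduling pattern as much as a CUDA detail. A minimal sketch of the non-blocking keyframe flow, with a single worker thread standing in for Stream B; all of `run_vo`, `run_satellite_match`, and the `eskf` object are placeholders, and real code would use CUDA streams rather than threads:

```python
from concurrent.futures import ThreadPoolExecutor, Future
from typing import Optional

satellite_pool = ThreadPoolExecutor(max_workers=1)  # stands in for "Stream B"
pending_match: Optional[Future] = None

def process_frame(frame, is_keyframe, run_vo, run_satellite_match, eskf):
    global pending_match
    pose = run_vo(frame)             # "Stream A": ~9ms, every frame
    eskf.update_vo(pose)

    # Harvest a finished satellite match from an earlier keyframe, if any.
    if pending_match is not None and pending_match.done():
        eskf.update_satellite(pending_match.result())
        pending_match = None

    # Launch a new async match on keyframes; never wait for it here.
    if is_keyframe and pending_match is None:
        pending_match = satellite_pool.submit(run_satellite_match, frame)

    return eskf.state()
```

The key property is that `process_frame` never blocks on the matcher: a satellite correction lands on whichever later frame finds it finished, which is exactly the delayed-measurement update the ESKF must support.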
### 6. Proactive Tile Loading
**Change from draft01**: Instead of loading tiles on-demand from disk, preload tiles within ±2km of the flight plan into RAM at session start. This eliminates disk I/O latency during flight. For a 50km flight path, ~2000 tiles at zoom 19 ≈ ~200MB RAM — well within budget.
On VO failure / expanded search:
1. Compute IMU dead-reckoning position
2. Rank preloaded tiles by distance to predicted position
3. Try top 3 tiles (not all tiles in ±1km radius)
4. If no match in top 3, expand to next 3
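The ranked search in steps 1-4 can be sketched as follows, assuming tiles carry a center lat/lon and `try_match` wraps the satellite matcher (both names are illustrative):

```python
import math

def ranked_tile_search(tiles, predicted_pos, try_match,
                       batch_size=3, max_batches=2):
    """Try tiles nearest the IMU dead-reckoning position, in batches of 3."""
    def dist_m(tile):
        # Equirectangular approximation: adequate for ranking nearby tiles.
        lat0, lon0 = predicted_pos
        dlat = math.radians(tile["lat"] - lat0)
        dlon = math.radians(tile["lon"] - lon0) * math.cos(math.radians(lat0))
        return 6_371_000.0 * math.hypot(dlat, dlon)

    ranked = sorted(tiles, key=dist_m)
    for start in range(0, batch_size * max_batches, batch_size):
        for tile in ranked[start:start + batch_size]:
            result = try_match(tile)
            if result is not None:
                return result            # position recovered
    return None                          # escalate: wider search / user input
```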
## Existing/Competitor Solutions Analysis
| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| VRLM (2024) | FocalNet backbone + multi-scale feature fusion | 83.35% MA@20 | Desktop | Not edge-optimized |
| Scale-Aware UAV-to-Satellite (2026) | Semantic geometric + metric scale recovery | N/A | Desktop | Addresses scale ambiguity problem |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection, semi-dense | Competitive with LiteSAM | 2.5x faster than LoFTR, TRT available | 15.05M params, heavier than LiteSAM |
| PFED (2025) | Knowledge distillation + multi-view refinement, retrieval | 97.15% Recall@1 (University-1652) | AGX Orin (251.5 FPS) | Retrieval only, not pixel-level matching |
| STHN (IEEE RA-L 2024) | Deep homography estimation, coarse-to-fine | 4.24m at 50m range | Open-source, lightweight | Trained on thermal, needs RGB retraining |
| Hierarchical AVL (2025) | DINOv2 retrieval + SuperPoint matching | 64.5-95% success rate | ROS, IMU integration | Two-stage complexity |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Designed for Mars/planetary, needs adaptation |
## Architecture
### Component: Visual Odometry
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ~30-50ms/frame | ⚠️ Fallback |
| SuperPoint+LightGlue | LightGlue-ONNX TRT | Good accuracy, adaptive pruning | ~130-280ms, no IMU, no loop closure | ~130-280ms/frame | ❌ Rejected |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ~33ms/frame | ⚠️ Slower |
**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built by NVIDIA for Jetson. Auto-fallback to IMU when visual tracking fails.
**SP+LG rejection rationale**: 15-33x slower than cuVSLAM. No built-in IMU fusion, loop closure, or tracking failure detection. Building these features around SP+LG would take significant development time and still be slower. XFeat at ~30-50ms is a better fallback for VO if cuVSLAM fails on nadir camera.
### Component: Satellite Image Matching
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | Untested on Orin Nano Super with TensorRT | Est. ~500-1000ms @ 1280px TRT FP16 (extrapolated) | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven, fastest | General-purpose, not designed for cross-view. Our nadir-nadir gap is small → may work. | ~50-100ms | ✅ Fallback if LiteSAM >200ms |
**Selection**: Day-one benchmark on Orin Nano Super:
1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → LiteSAM at 1280px**
4. **If >200ms → XFeat**
### Component: Sensor Fusion
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ~10-50ms/step | ❌ Too heavy |
**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
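A minimal sketch of the 16-state layout and the multi-rate flow: high-rate IMU prediction plus a low-rate absolute position update from satellite matching. The full error-state covariance algebra, attitude propagation, and gain computation are omitted; the fixed blend gain and noise-free inputs are placeholders, not the real filter:

```python
import numpy as np

# State layout: [pos(3), vel(3), quat(4), accel_bias(3), gyro_bias(3)] = 16
POS, VEL, QUAT = slice(0, 3), slice(3, 6), slice(6, 10)
AB, GB = slice(10, 13), slice(13, 16)

def imu_predict(x, accel_world, dt):
    """Propagate position/velocity with bias-corrected world-frame accel.
    (Gyro-driven attitude propagation is elided in this sketch.)"""
    x = x.copy()
    a = accel_world - x[AB]
    x[POS] += x[VEL] * dt + 0.5 * a * dt * dt
    x[VEL] += a * dt
    return x

def satellite_update(x, measured_pos, gain=0.5):
    """Blend an absolute satellite fix into the position estimate.
    A real ESKF derives this gain from the covariances; 0.5 is a stand-in."""
    x = x.copy()
    x[POS] += gain * (measured_pos - x[POS])
    return x

x0 = np.zeros(16)
x0[QUAT.start] = 1.0   # identity quaternion (w, x, y, z)
```

At 100+Hz, `imu_predict` runs many times between camera frames; `satellite_update` fires only on successful keyframe matches, which is what makes the multi-rate structure cheap.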
### Component: Satellite Tile Preprocessing (Offline)
**Selected**: **GeoHash-indexed tile pairs on disk + RAM preloading**.
Pipeline:
1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
6. **At session start**: preload tiles within ±2km of flight plan into RAM (~200MB for 50km route)
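Step 6's preload filter is a simple geometric selection. A sketch, assuming tile metadata records carry center (lat, lon) and the flight plan is a list of waypoints; reading the tile bytes into RAM is left out:

```python
import math

def tiles_to_preload(tile_index, waypoints, radius_m=2000.0):
    """Select tiles whose centers lie within 2km of any flight-plan waypoint."""
    def dist_m(p, q):
        # Equirectangular approximation, fine at a 2km radius.
        dlat = math.radians(q[0] - p[0])
        dlon = math.radians(q[1] - p[1]) * math.cos(math.radians(p[0]))
        return 6_371_000.0 * math.hypot(dlat, dlon)

    return [t for t in tile_index
            if any(dist_m((t["lat"], t["lon"]), wp) <= radius_m
                   for wp in waypoints)]
```

For long routes, the GeoHash index lets this run as a prefix lookup instead of a linear scan; the linear version above is the correctness reference.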
### Component: Re-localization (Disconnected Segments)
**Selected**: **Keyframe satellite matching is always active + ranked tile search on VO failure**.
When cuVSLAM reports tracking loss (sharp turn, no features):
1. Immediately flag next frame as keyframe → trigger satellite matching
2. Compute IMU dead-reckoning position since last known position
3. Rank preloaded tiles by distance to dead-reckoning position
4. Try top 3 tiles sequentially (not all tiles in radius)
5. If match found: position recovered, new segment begins
6. If 3 consecutive keyframe failures across top tiles: expand to next 3 tiles
7. If still no match after 3+ full attempts: request user input via API
### Component: Object Center Coordinates
Geometric calculation once frame-center GPS is known:
1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
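The four steps above reduce to a few lines. A sketch, with assumed conventions (image +x right, +y down; yaw in radians, clockwise from north, so at yaw = 0 image-up points north) and a small-offset flat-earth approximation for step 4:

```python
import math

def pixel_to_gps(center_lat, center_lon, dx_px, dy_px, gsd_m, yaw_rad):
    """Convert a pixel offset from frame center to absolute lat/lon."""
    # Steps 1-2: pixel offset to meters in the image frame.
    dx_m, dy_m = dx_px * gsd_m, dy_px * gsd_m
    # Step 3: rotate into east/north using the IMU yaw heading.
    east = dx_m * math.cos(yaw_rad) - dy_m * math.sin(yaw_rad)
    north = -dx_m * math.sin(yaw_rad) - dy_m * math.cos(yaw_rad)
    # Step 4: meters to degrees (valid for offsets of a few hundred meters).
    dlat = north / 111_320.0
    dlon = east / (111_320.0 * math.cos(math.radians(center_lat)))
    return center_lat + dlat, center_lon + dlon
```

The sign conventions here are assumptions to be pinned down against the camera mount and IMU frame definitions before integration.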
### Component: API & Streaming
**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
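sse-starlette handles the wire framing in the real service; the sketch below just makes the event shape and SSE framing explicit. The field names are an assumed payload schema, not a fixed contract:

```python
import json

def format_position_event(frame_id, lat, lon, confidence, source):
    """Render one position update in SSE wire format."""
    payload = {
        "frame_id": frame_id,
        "lat": lat,
        "lon": lon,
        "confidence": confidence,   # HIGH / MEDIUM / LOW / VERY_LOW / MANUAL
        "source": source,           # e.g. "vo", "satellite", "imu", "manual"
    }
    # SSE framing: an `event:` line, a `data:` line, then a blank line.
    return f"event: position\ndata: {json.dumps(payload)}\n\n"
```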
## Processing Time Budget (per frame, 400ms budget)
### Normal Frame (non-keyframe, ~60-80% of frames)
| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~23ms** | Well within 400ms |
### Keyframe Satellite Matching (async, every 3-10 frames)
Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.
**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:
| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 matching | ≤200ms | TensorRT FP16, 1280px, go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async, within budget |
**Path B — XFeat (if LiteSAM >200ms)**:
| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state. **Configure map pruning for 3000-frame flights** |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan, pre-resized |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~2.1-2.4GB** | ~26-30% of 8GB — comfortable margin, with headroom for cuVSLAM map growth |
## Confidence Scoring
| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
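The table above maps directly to a small classifier. A sketch with assumed input names; the 500m "recent correction" distance is taken from the table:

```python
def confidence_level(satellite_match_ok: bool,
                     vo_ok: bool,
                     meters_since_last_correction: float,
                     manual_position: bool = False) -> str:
    """Map fusion status to the confidence levels in the table above."""
    if manual_position:
        return "MANUAL"
    if satellite_match_ok and vo_ok:
        return "HIGH"
    if vo_ok and meters_since_last_correction < 500.0:
        return "MEDIUM"
    if vo_ok:
        return "LOW"
    return "VERY_LOW"   # IMU dead-reckoning only
```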
## Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **cuVSLAM fails on low-texture agricultural terrain** | **HIGH** | Frequent tracking loss, degraded VO | Increase satellite matching frequency when keypoint count drops. IMU dead-reckoning bridge (~1.5s). Accept higher drift in featureless segments. Satellite matching recovers position when texture returns. |
| LiteSAM TRT FP16 >200ms at 1280px on Orin Nano Super | MEDIUM | Must use XFeat instead (less accurate for cross-view) | Day-one TRT FP16 benchmark. If >200ms → XFeat. Since our nadir-nadir gap is small, XFeat may still perform adequately. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Benchmark XFeat on actual operational area satellite-aerial pairs. Increase keyframe frequency; multi-tile consensus; strict RANSAC. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, set max keyframes. Monitor memory. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| cuVSLAM is closed-source, no nadir benchmarks | MEDIUM | Unknown failure modes over farmland | Extensive testing with real nadir UAV imagery before deployment. XFeat VO as fallback (also uses learned features). |
| Tile I/O bottleneck during expanded search | LOW | Delayed re-localization | Preload ±2km tiles in RAM; ranked search instead of exhaustive |
## Testing Strategy
### Integration / Functional Tests
- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
- Test cuVSLAM map memory: run 3000-frame session, monitor memory growth
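The ground-truth comparison metric above (percentage within 20m / 50m against coordinates.csv) can be sketched as a small report function; haversine gives the per-frame error in meters:

```python
import math

def error_m(p, q):
    """Haversine distance in meters between two (lat, lon) points."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6_371_000.0 * math.asin(math.sqrt(a))

def accuracy_report(estimates, ground_truth):
    """Share of frames whose position error is within 20m and 50m."""
    errors = [error_m(e, g) for e, g in zip(estimates, ground_truth)]
    n = len(errors)
    return {
        "pct_within_20m": 100.0 * sum(e <= 20.0 for e in errors) / n,
        "pct_within_50m": 100.0 * sum(e <= 50.0 for e in errors) / n,
    }
```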
### Non-Functional Tests
- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM. If >200ms → use XFeat. Also measure accuracy on test satellite-aerial pairs for both.
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- **cuVSLAM terrain stress test**: test with nadir camera over (a) urban/structured terrain, (b) agricultural fields, (c) water/uniform terrain, (d) forest. Measure: keypoint count, tracking success rate, drift per 100 frames, IMU fallback frequency
- cuVSLAM keypoint monitoring: verify that low-keypoint detection triggers increased satellite matching
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 3000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
- Tile preloading: verify RAM usage of preloaded tiles for 50km flight plan
## References
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR paper: https://zju3dv.github.io/efficientloftr/
- LoFTR TensorRT adaptation: https://github.com/Kolkir/LoFTR_TRT
- PFED (2025): https://github.com/SkyEyeLoc/PFED
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hierarchical AVL (MDPI 2025): https://www.mdpi.com/2072-4292/17/20/3470
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- cuVSLAM paper: https://arxiv.org/abs/2506.04359
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- LightGlue (ICCV 2023): https://github.com/cvg/LightGlue
- LightGlue TensorRT: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- LightGlue TRT Jetson: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- ForestVO / SP+LG VO: https://arxiv.org/html/2504.01261v1
- vo_lightglue (SP+LG VO): https://github.com/himadrir/vo_lightglue
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`