mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-22 21:56:38 +00:00
357 lines
27 KiB
Markdown
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| LiteSAM at 480px as satellite matcher | **Performance**: 497ms on AGX Orin at 1184px. Orin Nano Super is ~3-4x slower. At 480px estimated ~270-360ms — borderline. Paper uses PyTorch AMP, not TensorRT FP16. TensorRT could bring 2-3x improvement. | Add TensorRT FP16 as mandatory optimization step. Revised estimate at 480px with TensorRT: ~90-180ms. Still benchmark-driven: abandon if >400ms. |
| XFeat as LiteSAM fallback for satellite matching | **Functional**: XFeat is a general-purpose feature matcher, NOT designed for the cross-view satellite-aerial gap. May fail on season/lighting differences between UAV and satellite imagery. | **Expand fallback options**: benchmark EfficientLoFTR (designed for weak-texture aerial) alongside XFeat. Consider STHN-style deep homography as a third option. See detailed satellite matcher comparison below. |
| SP+LG considered as "sparse only, worse on satellite-aerial" | **Functional**: LiteSAM paper confirms "SP+LG achieves fastest inference speed but at expense of accuracy." Sparse matcher fails on texture-scarce regions. ~180-360ms on Orin Nano Super. | **Reject SP+LG** for both VO and satellite matching. cuVSLAM is 15-33x faster for VO. |
| cuVSLAM on low-texture terrain | **Functional**: cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade tracking. On uniform agricultural fields/water bodies, features will be sparse → frequent tracking loss. IMU fallback lasts only ~1s. No published benchmarks for nadir agricultural terrain. Does NOT guarantee pose recovery after tracking loss. | **CRITICAL RISK**: cuVSLAM will likely fail frequently over low-texture terrain. Mitigation: (1) increase satellite matching frequency in low-texture areas, (2) use IMU dead-reckoning bridge, (3) accept higher drift in featureless segments, (4) XFeat VO as secondary fallback may also struggle on same terrain. |
| cuVSLAM memory estimate ~200-300MB | **Performance**: Map grows over time. For 3000-frame flights (~16min at 3fps), map could reach 500MB-1GB without pruning. | Configure cuVSLAM map pruning. Set max keyframes. Monitor memory. |
| Tile search on VO failure: "expand to ±1km" | **Functional**: Underspecified. Loading 10-20 tiles from disk at query time is slow. | Preload tiles within ±2km of flight plan into RAM. Ranked search by IMU dead-reckoning position. |
| LiteSAM resolution | **Performance**: Paper benchmarked at 1184px on AGX Orin (497ms AMP). TensorRT FP16 with reparameterized MobileOne expected 2-3x faster. | Benchmark LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM at 1280px. If >200ms → use XFeat. |
| SP+LG proposed for VO by user | **Performance**: ~130-280ms/frame on Orin Nano. cuVSLAM ~8.6ms/frame. No IMU, no loop closure. | **Reject SP+LG for VO.** cuVSLAM 15-33x faster. XFeat frame-to-frame remains fallback. |

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

**Hard constraint**: Camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.

**Satellite matching strategy**: Benchmark LiteSAM TensorRT FP16 at **1280px** on Orin Nano Super as a day-one priority. The paper's AGX Orin benchmark used PyTorch AMP — TensorRT FP16 with reparameterized MobileOne should yield 2-3x additional speedup. **Decision rule: if LiteSAM TRT FP16 at 1280px ≤200ms → use LiteSAM. If >200ms → use XFeat.**

**Core architectural principles**:

1. **cuVSLAM handles VO** — 116fps on Orin Nano 8GB, ~8.6ms/frame. SuperPoint+LightGlue was evaluated and rejected (15-33x slower, no IMU integration).
2. **Keyframe-based satellite matching** — satellite matcher runs on keyframes only (every 3-10 frames), amortizing cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
5. **Proactive tile loading** — preload tiles within ±2km of flight plan into RAM for fast lookup during expanded search.

```
┌─────────────────────────────────────────────────────────────────┐
│                     OFFLINE (Before Flight)                     │
│  Satellite Tiles → Download & Crop → Store as tile pairs        │
│  (Google Maps)     (per flight plan)  (disk, GeoHash indexed)   │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                     ONLINE (During Flight)                      │
│                                                                 │
│  EVERY FRAME (400ms budget):                                    │
│  ┌────────────────────────────────┐                             │
│  │ Camera → Downsample (CUDA 2ms) │                             │
│  │ → cuVSLAM VO+IMU (~9ms)        │──→ ESKF Update → SSE Emit   │
│  └────────────────────────────────┘          ↑                  │
│                                              │                  │
│  KEYFRAMES ONLY (every 3-10 frames):         │                  │
│  ┌────────────────────────────────────┐      │                  │
│  │ Satellite match (async CUDA stream)│──────┘                  │
│  │ LiteSAM TRT FP16 or XFeat          │                         │
│  │ (does NOT block VO output)         │                         │
│  └────────────────────────────────────┘                         │
│                                                                 │
│  IMU: 100+Hz continuous → ESKF prediction                       │
│  TILES: ±2km preloaded in RAM from flight plan                  │
└─────────────────────────────────────────────────────────────────┘
```

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs.

**Why not SuperPoint+LightGlue for VO**: SP+LG is 15-33x slower (~130-280ms vs ~9ms). It lacks IMU integration, loop closure, and auto-fallback.

**CRITICAL: cuVSLAM on difficult/uniform terrain (agricultural fields, water)**:

cuVSLAM uses Shi-Tomasi corner detection + Lucas-Kanade optical flow tracking (classical features, not learned). On uniform agricultural terrain or water bodies:

- Very few corners will be detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1 second) → constant-velocity integrator (~0.5s more)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
- All published benchmarks (KITTI: urban/suburban, EuRoC: indoor) do NOT include nadir agricultural terrain
- Multi-stereo mode helps with featureless surfaces, but we have mono camera only

**Mitigation strategy for low-texture terrain**:

1. **Increase satellite matching frequency**: In low-texture areas (detected by cuVSLAM's keypoint count dropping), switch from every 3-10 frames to every frame
2. **IMU dead-reckoning bridge**: When cuVSLAM reports tracking loss, ESKF continues with IMU prediction. At 3fps with ~1.5s IMU bridge, that covers ~4-5 frames
3. **Accept higher drift**: In featureless segments, position accuracy degrades to IMU-only level (50-100m+ over ~10s). Satellite matching must recover absolute position when texture returns
4. **Keypoint density monitoring**: Track cuVSLAM's number of tracked features per frame. When below threshold (e.g., <50), proactively trigger satellite matching
5. **XFeat frame-to-frame as VO fallback**: XFeat uses learned features that may detect texture invisible to Shi-Tomasi corners. But XFeat may also struggle on truly uniform terrain
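
The keypoint-count trigger from mitigations 1 and 4 can be sketched as a small policy function. The function and constant names are illustrative (not cuVSLAM API); the 50-feature threshold and the 3- and 10-frame intervals come from the text above:

```python
# Illustrative policy: adapt satellite-matching cadence to cuVSLAM's
# tracked-feature count. All names here are hypothetical.
LOW_TEXTURE_THRESHOLD = 50   # tracked features below this => low-texture terrain
NORMAL_INTERVAL = 10         # keyframe every 10 frames on well-textured terrain
DEGRADED_INTERVAL = 3        # every 3 frames when texture weakens
EMERGENCY_INTERVAL = 1       # every frame when tracking is nearly lost

def satellite_match_interval(tracked_features: int, tracking_lost: bool) -> int:
    """Return how many frames to wait between satellite-matching keyframes."""
    if tracking_lost or tracked_features < LOW_TEXTURE_THRESHOLD // 2:
        return EMERGENCY_INTERVAL
    if tracked_features < LOW_TEXTURE_THRESHOLD:
        return DEGRADED_INTERVAL
    return NORMAL_INTERVAL
```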

### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching. Strategy:

- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when ESKF covariance exceeds threshold
  - VO failure: when cuVSLAM reports tracking loss (e.g., a sharp turn)
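
A sketch of how the three triggers combine into a single keyframe predicate. Names and the default covariance threshold are illustrative and would be tuned on real flight data:

```python
def is_keyframe(frame_idx: int,
                last_keyframe_idx: int,
                interval: int,
                pos_cov_trace_m2: float,
                tracking_lost: bool,
                cov_threshold_m2: float = 400.0) -> bool:
    """True if this frame should trigger satellite matching."""
    if tracking_lost:                                   # VO failure
        return True
    if pos_cov_trace_m2 > cov_threshold_m2:             # confidence drop
        return True
    return frame_idx - last_keyframe_idx >= interval    # fixed interval
```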

### 3. Satellite Matcher Selection (Benchmark-Driven)

**Important context**: Our UAV-to-satellite matching is EASIER than typical cross-view geo-localization problems. Both the UAV camera and satellite imagery are approximately nadir (top-down). The main challenges are season/lighting differences, resolution mismatch, and temporal changes — not the extreme viewpoint gap seen in ground-to-satellite matching. This means even general-purpose matchers may perform well.

**Candidate A: LiteSAM (opt) with TensorRT FP16 at 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne reparameterizable for TensorRT. Paper benchmarked at 497ms on AGX Orin using AMP at 1184px. TensorRT FP16 with reparameterized MobileOne expected 2-3x faster than AMP. At 1280px (close to paper's 1184px benchmark resolution), accuracy should match published results.

Orin Nano Super TensorRT FP16 estimate at 1280px:

- AGX Orin AMP @ 1184px: 497ms
- TRT FP16 speedup over AMP: ~2-3x → AGX Orin TRT estimate: ~165-250ms
- Orin Nano Super is ~3-4x slower than AGX Orin → naive scaling of the TRT estimate gives ~500-1000ms
- Working range with TRT FP16: **~165-330ms** (optimistic) to ~500-1000ms (pessimistic); the day-one benchmark, not the estimate, decides
- Go/no-go threshold: **≤200ms**

**Candidate B (fallback): XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. General-purpose, not designed for cross-view gap. FASTEST option. Since our cross-view gap is small (both nadir), XFeat may work adequately for this specific use case.

**Other evaluated options (not selected)**:

- **EfficientLoFTR**: Semi-dense, 15.05M params, handles weak-texture well. ~20% slower than LiteSAM. Strong option if LiteSAM codebase proves difficult to export to TRT, but larger model footprint.
- **Deep Homography (STHN-style)**: End-to-end homography estimation, no feature/RANSAC pipeline. 4.24m at 50m range. Interesting future option but needs RGB retraining — higher implementation risk.
- **PFED and retrieval-based methods**: Image RETRIEVAL only (identifies which tile matches), not pixel-level matching. We already know which tile to use from ESKF position.
- **SuperPoint+LightGlue**: Sparse matcher. LiteSAM paper confirms worse satellite-aerial accuracy. Slower than XFeat.

**Decision rule** (day-one on Orin Nano Super):

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → use LiteSAM at 1280px**
4. **If >200ms → use XFeat**

### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. **Do NOT use INT8 on transformer components** (TAIFormer) — accuracy degrades. INT8 is safe only for the MobileOne backbone CNN layers.

### 5. CUDA Stream Pipelining

Overlap operations across consecutive frames:

- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
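
A hardware-independent sketch of this hand-off, using a worker thread and queues in place of a second CUDA stream (in production, Stream B would be a real CUDA stream, e.g. via torch.cuda.Stream). All timings and return values below are placeholders; the point is that the per-frame VO loop never waits on the matcher:

```python
import queue
import threading
import time

# `satellite_worker` stands in for the TRT matcher on its own stream.
sat_requests: queue.Queue = queue.Queue(maxsize=1)
sat_results: queue.Queue = queue.Queue()

def satellite_worker():
    while True:
        kf = sat_requests.get()
        if kf is None:                       # shutdown sentinel
            break
        time.sleep(0.01)                     # stands in for the ~200ms matcher
        sat_results.put((kf, (55.0, 37.0)))  # placeholder absolute fix

worker = threading.Thread(target=satellite_worker, daemon=True)
worker.start()

emitted = []
for frame in range(10):
    pose = (float(frame), 0.0)               # placeholder ~9ms cuVSLAM VO pose
    if frame % 3 == 0 and not sat_requests.full():
        sat_requests.put(frame)              # keyframe: enqueue and move on
    while not sat_results.empty():           # fold in any finished correction
        _, fix = sat_results.get()
        pose = fix
    emitted.append(pose)                     # SSE emit happens every frame

sat_requests.put(None)
worker.join()
```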

### 6. Proactive Tile Loading

**Change from draft01**: Instead of loading tiles on-demand from disk, preload tiles within ±2km of the flight plan into RAM at session start. This eliminates disk I/O latency during flight. For a 50km flight path, ~2000 tiles at zoom 19 ≈ ~200MB RAM — well within budget.

On VO failure / expanded search:

1. Compute IMU dead-reckoning position
2. Rank preloaded tiles by distance to predicted position
3. Try top 3 tiles (not all tiles in ±1km radius)
4. If no match in top 3, expand to next 3
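
The ranked search in steps 1-4 might look like the following, with haversine distance for ranking and `try_match` standing in for the satellite matcher (a hypothetical callable, returning a fix or `None`):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def ranked_tile_search(tiles, predicted, try_match, batch=3, max_batches=3):
    """Probe preloaded tiles nearest the dead-reckoning estimate, 3 at a time."""
    ranked = sorted(tiles, key=lambda t: haversine_m(t["lat"], t["lon"], *predicted))
    for i in range(0, min(len(ranked), batch * max_batches), batch):
        for tile in ranked[i:i + batch]:
            fix = try_match(tile)
            if fix is not None:
                return fix
    return None  # all batches exhausted -> request user input via API
```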

## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| VRLM (2024) | FocalNet backbone + multi-scale feature fusion | 83.35% MA@20 | Desktop | Not edge-optimized |
| Scale-Aware UAV-to-Satellite (2026) | Semantic geometric + metric scale recovery | N/A | Desktop | Addresses scale ambiguity problem |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection, semi-dense | Competitive with LiteSAM | 2.5x faster than LoFTR, TRT available | 15.05M params, heavier than LiteSAM |
| PFED (2025) | Knowledge distillation + multi-view refinement, retrieval | 97.15% Recall@1 (University-1652) | AGX Orin (251.5 FPS) | Retrieval only, not pixel-level matching |
| STHN (IEEE RA-L 2024) | Deep homography estimation, coarse-to-fine | 4.24m at 50m range | Open-source, lightweight | Trained on thermal, needs RGB retraining |
| Hierarchical AVL (2025) | DINOv2 retrieval + SuperPoint matching | 64.5-95% success rate | ROS, IMU integration | Two-stage complexity |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Designed for Mars/planetary, needs adaptation |

## Architecture

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ~30-50ms/frame | ⚠️ Fallback |
| SuperPoint+LightGlue | LightGlue-ONNX TRT | Good accuracy, adaptive pruning | ~130-280ms, no IMU, no loop closure | ~130-280ms/frame | ❌ Rejected |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ~33ms/frame | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built by NVIDIA for Jetson. Auto-fallback to IMU when visual tracking fails.

**SP+LG rejection rationale**: 15-33x slower than cuVSLAM. No built-in IMU fusion, loop closure, or tracking failure detection. Building these features around SP+LG would take significant development time and still be slower. XFeat at ~30-50ms is a better fallback for VO if cuVSLAM fails on nadir camera.

### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | Untested on Orin Nano Super with TensorRT | Est. ~165-330ms @ 1280px TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven, fastest | General-purpose, not designed for cross-view. Our nadir-nadir gap is small → may work. | ~50-100ms | ✅ Fallback if LiteSAM >200ms |

**Selection**: Day-one benchmark on Orin Nano Super:

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → LiteSAM at 1280px**
4. **If >200ms → XFeat**

### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ~10-50ms/step | ❌ Too heavy |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
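
The full 16-state filter is out of scope for a snippet, but the multi-rate fusion pattern (high-rate IMU-driven predict, low-rate absolute position update from satellite matching) can be illustrated in one dimension. A sketch only; the process/measurement noise values are arbitrary:

```python
class TinyFilter:
    """1-D position/velocity Kalman filter illustrating the multi-rate pattern."""

    def __init__(self):
        self.x = [0.0, 0.0]                 # position (m), velocity (m/s)
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # covariance

    def predict(self, accel, dt, q=0.1):
        """High-rate step driven by an IMU acceleration sample."""
        p, v = self.x
        self.x = [p + v * dt + 0.5 * accel * dt * dt, v + accel * dt]
        F = [[1.0, dt], [0.0, 1.0]]
        P = self.P
        FP = [[F[i][0] * P[0][j] + F[i][1] * P[1][j] for j in range(2)]
              for i in range(2)]
        # P <- F P F^T + Q (Q = q on the diagonal)
        self.P = [[FP[i][0] * F[j][0] + FP[i][1] * F[j][1] + (q if i == j else 0.0)
                   for j in range(2)] for i in range(2)]

    def update_position(self, z, r=4.0):
        """Low-rate absolute position fix (satellite match), H = [1, 0]."""
        y = z - self.x[0]
        s = self.P[0][0] + r
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * self.P[0][0], (1 - k0) * self.P[0][1]],
                  [self.P[1][0] - k1 * self.P[0][0], self.P[1][1] - k1 * self.P[0][1]]]
```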

### Component: Satellite Tile Preprocessing (Offline)

**Selected**: **GeoHash-indexed tile pairs on disk + RAM preloading**.

Pipeline:

1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
6. **At session start**: preload tiles within ±2km of flight plan into RAM (~200MB for 50km route)
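
The GeoHash index in step 4 can use the standard geohash encoding; the encoder below is the textbook algorithm, while the directory layout in `tile_path` is a hypothetical example, not the repo's actual scheme:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Standard geohash: interleave lon/lat bisection bits, 5 bits per char."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    out, bits, ch, even = [], 0, 0, True
    while len(out) < precision:
        if even:  # even bit index: bisect longitude
            mid = (lon_lo + lon_hi) / 2
            ch = (ch << 1) | (lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:     # odd bit index: bisect latitude
            mid = (lat_lo + lat_hi) / 2
            ch = (ch << 1) | (lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even, bits = not even, bits + 1
        if bits == 5:
            out.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(out)

def tile_path(lat: float, lon: float, zoom: int, x: int, y: int) -> str:
    # Hypothetical on-disk layout: tiles/<geohash6>/<zoom>_<x>_<y>.png
    return f"tiles/{geohash(lat, lon)}/{zoom}_{x}_{y}.png"
```

A 6-character geohash cell is roughly 1.2km x 0.6km, so one prefix directory groups the tiles a ±2km preload would pull in with a handful of neighbor cells.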

### Component: Re-localization (Disconnected Segments)

**Selected**: **Keyframe satellite matching is always active + ranked tile search on VO failure**.

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Immediately flag next frame as keyframe → trigger satellite matching
2. Compute IMU dead-reckoning position since last known position
3. Rank preloaded tiles by distance to dead-reckoning position
4. Try top 3 tiles sequentially (not all tiles in radius)
5. If match found: position recovered, new segment begins
6. If 3 consecutive keyframe failures across top tiles: expand to next 3 tiles
7. If still no match after 3+ full attempts: request user input via API

### Component: Object Center Coordinates

Geometric calculation once frame-center GPS is known:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
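
A sketch of steps 1-4. The yaw convention (radians, image +x mapping to east at zero yaw) and the small-offset meters-to-degrees conversion are simplifying assumptions; a production implementation would pin down the camera-to-body rotation and use a geodesy library:

```python
import math

def object_gps(center_lat, center_lon, dx_px, dy_px, gsd_m, yaw_rad):
    """Convert a pixel offset from frame center to absolute lat/lon."""
    # Steps 1-2: pixel offset -> metres in the image frame
    dx_m, dy_m = dx_px * gsd_m, dy_px * gsd_m
    # Step 3: rotate the image-frame offset into east/north by the IMU yaw
    east = dx_m * math.cos(yaw_rad) - dy_m * math.sin(yaw_rad)
    north = dx_m * math.sin(yaw_rad) + dy_m * math.cos(yaw_rad)
    # Step 4: metre offset -> degrees (small-offset equirectangular approximation)
    dlat = north / 111_320.0
    dlon = east / (111_320.0 * math.cos(math.radians(center_lat)))
    return center_lat + dlat, center_lon + dlon
```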

### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
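
The stream itself would go through sse-starlette's `EventSourceResponse`; the snippet below only sketches the SSE wire format of one position event, with illustrative field names rather than a defined schema:

```python
import json

def position_event(frame_idx: int, lat: float, lon: float, confidence: str) -> str:
    """Format one position update as a raw SSE frame (field names illustrative)."""
    payload = json.dumps({
        "frame": frame_idx,
        "lat": lat,
        "lon": lon,
        "confidence": confidence,
    })
    # SSE frames are "event:"/"data:" lines terminated by a blank line
    return f"event: position\ndata: {payload}\n\n"
```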

## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~23ms** | Well within 400ms |

### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.

**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 matching | ≤200ms | TensorRT FP16, 1280px, go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async, within budget |

**Path B — XFeat (if LiteSAM >200ms)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state. **Configure map pruning for 3000-frame flights** |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan, pre-resized |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~2.1-2.4GB** | ~26-30% of 8GB — comfortable margin |

## Confidence Scoring

| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
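
The table collapses naturally into a scoring function. Thresholds mirror the table (500m of travel since the last satellite fix separates MEDIUM from LOW); the parameter names are illustrative:

```python
def confidence_level(sat_match_ok: bool,
                     vo_ok: bool,
                     dist_since_fix_m: float,
                     manual_position: bool = False) -> str:
    """Map the current fusion state to the confidence levels in the table."""
    if manual_position:
        return "MANUAL"
    if sat_match_ok and vo_ok:
        return "HIGH"
    if vo_ok and dist_since_fix_m < 500.0:
        return "MEDIUM"
    if vo_ok:
        return "LOW"
    return "VERY LOW"  # IMU dead-reckoning only
```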

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **cuVSLAM fails on low-texture agricultural terrain** | **HIGH** | Frequent tracking loss, degraded VO | Increase satellite matching frequency when keypoint count drops. IMU dead-reckoning bridge (~1.5s). Accept higher drift in featureless segments. Satellite matching recovers position when texture returns. |
| LiteSAM TRT FP16 >200ms at 1280px on Orin Nano Super | MEDIUM | Must use XFeat instead (less accurate for cross-view) | Day-one TRT FP16 benchmark. If >200ms → XFeat. Since our nadir-nadir gap is small, XFeat may still perform adequately. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Benchmark XFeat on actual operational area satellite-aerial pairs. Increase keyframe frequency; multi-tile consensus; strict RANSAC. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, set max keyframes. Monitor memory. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| cuVSLAM is closed-source, no nadir benchmarks | MEDIUM | Unknown failure modes over farmland | Extensive testing with real nadir UAV imagery before deployment. XFeat VO as fallback (also uses learned features). |
| Tile I/O bottleneck during expanded search | LOW | Delayed re-localization | Preload ±2km tiles in RAM; ranked search instead of exhaustive |

## Testing Strategy

### Integration / Functional Tests

- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
- Test cuVSLAM map memory: run 3000-frame session, monitor memory growth

### Non-Functional Tests

- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM. If >200ms → use XFeat. Also measure accuracy on test satellite-aerial pairs for both.
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- **cuVSLAM terrain stress test**: test with nadir camera over (a) urban/structured terrain, (b) agricultural fields, (c) water/uniform terrain, (d) forest. Measure: keypoint count, tracking success rate, drift per 100 frames, IMU fallback frequency
- cuVSLAM keypoint monitoring: verify that low-keypoint detection triggers increased satellite matching
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 3000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
- Tile preloading: verify RAM usage of preloaded tiles for 50km flight plan
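
The day-one matcher benchmark could use a harness along these lines, with `match_fn` standing in for one TRT engine invocation (in a real GPU run, a CUDA synchronize would be needed around the timed call so the kernel actually finishes inside the measurement):

```python
import statistics
import time

def benchmark(match_fn, n: int = 50, warmup: int = 5) -> dict:
    """Time `match_fn` n times after warmup; report median and p95 in ms."""
    for _ in range(warmup):
        match_fn()
    times_ms = []
    for _ in range(n):
        t0 = time.perf_counter()
        match_fn()
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    times_ms.sort()
    return {
        "median_ms": statistics.median(times_ms),
        "p95_ms": times_ms[int(0.95 * (len(times_ms) - 1))],
    }
```

The ≤200ms go/no-go decision should be taken on the p95, not the median, since the 400ms frame budget has to hold on slow frames too.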

## References

- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR paper: https://zju3dv.github.io/efficientloftr/
- LoFTR TensorRT adaptation: https://github.com/Kolkir/LoFTR_TRT
- PFED (2025): https://github.com/SkyEyeLoc/PFED
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hierarchical AVL (MDPI 2025): https://www.mdpi.com/2072-4292/17/20/3470
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- cuVSLAM paper: https://arxiv.org/abs/2506.04359
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- LightGlue (ICCV 2023): https://github.com/cvg/LightGlue
- LightGlue TensorRT: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- LightGlue TRT Jetson: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- ForestVO / SP+LG VO: https://arxiv.org/html/2504.01261v1
- vo_lightglue (SP+LG VO): https://github.com/himadrir/vo_lightglue
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`