# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|--------------|
| LiteSAM at 480px as satellite matcher | **Performance**: 497ms on AGX Orin at 1184px. Orin Nano Super is ~3-4x slower. At 480px, estimated ~270-360ms — borderline. The paper uses PyTorch AMP, not TensorRT FP16; TensorRT could bring a 2-3x improvement. | Add TensorRT FP16 as a mandatory optimization step. Revised estimate at 480px with TensorRT: ~90-180ms. Still benchmark-driven: abandon if >400ms. |
| XFeat as LiteSAM fallback for satellite matching | **Functional**: XFeat is a general-purpose feature matcher, NOT designed for the cross-view satellite-aerial gap. May fail on season/lighting differences between UAV and satellite imagery. | **Expand fallback options**: benchmark EfficientLoFTR (designed for weak-texture aerial) alongside XFeat. Consider STHN-style deep homography as a third option. See the detailed satellite matcher comparison below. |
| SP+LG considered as "sparse only, worse on satellite-aerial" | **Functional**: the LiteSAM paper confirms "SP+LG achieves fastest inference speed but at expense of accuracy." A sparse matcher fails on texture-scarce regions. ~180-360ms on Orin Nano Super. | **Reject SP+LG** for both VO and satellite matching. cuVSLAM is 15-33x faster for VO. |
| cuVSLAM on low-texture terrain | **Functional**: cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade tracking. On uniform agricultural fields/water bodies, features will be sparse → frequent tracking loss. IMU fallback lasts only ~1s. No published benchmarks for nadir agricultural terrain. Does NOT guarantee pose recovery after tracking loss. | **CRITICAL RISK**: cuVSLAM will likely fail frequently over low-texture terrain. Mitigation: (1) increase satellite matching frequency in low-texture areas, (2) use an IMU dead-reckoning bridge, (3) accept higher drift in featureless segments, (4) XFeat VO as a secondary fallback may also struggle on the same terrain. |
| cuVSLAM memory estimate ~200-300MB | **Performance**: the map grows over time. For 3000-frame flights (~16min at 3fps), the map could reach 500MB-1GB without pruning. | Configure cuVSLAM map pruning. Set max keyframes. Monitor memory. |
| Tile search on VO failure: "expand to ±1km" | **Functional**: underspecified. Loading 10-20 tiles from disk is slow (disk I/O). | Preload tiles within ±2km of the flight plan into RAM. Ranked search by IMU dead-reckoning position. |
| LiteSAM resolution | **Performance**: the paper benchmarked 1184px on AGX Orin (497ms, AMP). TensorRT FP16 with a reparameterized MobileOne is expected to be 2-3x faster. | Benchmark LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM at 1280px. If >200ms → use XFeat. |
| SP+LG proposed for VO by user | **Performance**: ~130-280ms/frame on Orin Nano. cuVSLAM is ~8.6ms/frame. No IMU, no loop closure. | **Reject SP+LG for VO.** cuVSLAM is 15-33x faster. XFeat frame-to-frame remains the fallback. |

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

**Hard constraint**: The camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.

**Satellite matching strategy**: Benchmark LiteSAM TensorRT FP16 at **1280px** on Orin Nano Super as a day-one priority.
The paper's AGX Orin benchmark used PyTorch AMP — TensorRT FP16 with a reparameterized MobileOne should yield a 2-3x additional speedup. **Decision rule: if LiteSAM TRT FP16 at 1280px is ≤200ms → use LiteSAM. If >200ms → use XFeat.**

**Core architectural principles**:

1. **cuVSLAM handles VO** — 116fps on Orin Nano 8GB, ~8.6ms/frame. SuperPoint+LightGlue was evaluated and rejected (15-33x slower, no IMU integration).
2. **Keyframe-based satellite matching** — the satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
5. **Proactive tile loading** — preload tiles within ±2km of the flight plan into RAM for fast lookup during expanded search.

```
┌─────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight)                                         │
│ Satellite Tiles → Download & Crop → Store as tile pairs         │
│ (Google Maps)    (per flight plan) (disk, GeoHash indexed)      │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight)                                          │
│                                                                 │
│ EVERY FRAME (400ms budget):                                     │
│   ┌────────────────────────────────┐                            │
│   │ Camera → Downsample (CUDA 2ms) │                            │
│   │ → cuVSLAM VO+IMU (~9ms)        │──→ ESKF Update → SSE Emit  │
│   └────────────────────────────────┘         ↑                  │
│                                              │                  │
│ KEYFRAMES ONLY (every 3-10 frames):          │                  │
│   ┌────────────────────────────────────┐     │                  │
│   │ Satellite match (async CUDA stream)│─────┘                  │
│   │ LiteSAM TRT FP16 or XFeat          │                        │
│   │ (does NOT block VO output)         │                        │
│   └────────────────────────────────────┘                        │
│                                                                 │
│ IMU: 100+Hz continuous → ESKF prediction                        │
│ TILES: ±2km preloaded in RAM from flight plan                   │
└─────────────────────────────────────────────────────────────────┘
```

## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~9ms/frame)

NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. It supports a monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs.

**Why not SuperPoint+LightGlue for VO**: SP+LG is 15-33x slower (~130-280ms vs ~9ms) and lacks IMU integration, loop closure, and auto-fallback.

**CRITICAL: cuVSLAM on difficult/uniform terrain (agricultural fields, water)**: cuVSLAM uses Shi-Tomasi corner detection + Lucas-Kanade optical flow tracking (classical features, not learned). On uniform agricultural terrain or water bodies:

- Very few corners will be detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1 second) → constant-velocity integrator (~0.5s more)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
- All published benchmarks (KITTI: urban/suburban, EuRoC: indoor) do NOT include nadir agricultural terrain
- Multi-stereo mode helps with featureless surfaces, but we have a mono camera only

**Mitigation strategy for low-texture terrain**:

1. **Increase satellite matching frequency**: In low-texture areas (detected by cuVSLAM's keypoint count dropping), switch from every 3-10 frames to every frame
2. **IMU dead-reckoning bridge**: When cuVSLAM reports tracking loss, the ESKF continues with IMU prediction. At 3fps with a ~1.5s IMU bridge, that covers ~4-5 frames
3. **Accept higher drift**: In featureless segments, position accuracy degrades to IMU-only level (50-100m+ over ~10s). Satellite matching must recover absolute position when texture returns
4. **Keypoint density monitoring**: Track cuVSLAM's number of tracked features per frame. When it falls below a threshold (e.g., <50), proactively trigger satellite matching
5. **XFeat frame-to-frame as VO fallback**: XFeat uses learned features that may detect texture invisible to Shi-Tomasi corners.
But XFeat may also struggle on truly uniform terrain.

### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching. Strategy:

- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when the ESKF covariance exceeds a threshold
  - VO failure: when cuVSLAM reports tracking loss (e.g., a sharp turn)

### 3. Satellite Matcher Selection (Benchmark-Driven)

**Important context**: Our UAV-to-satellite matching is EASIER than typical cross-view geo-localization problems. Both the UAV camera and the satellite imagery are approximately nadir (top-down). The main challenges are season/lighting differences, resolution mismatch, and temporal changes — not the extreme viewpoint gap seen in ground-to-satellite matching. This means even general-purpose matchers may perform well.

**Candidate A: LiteSAM (opt) with TensorRT FP16 at 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne reparameterizable for TensorRT. The paper benchmarked 497ms on AGX Orin using AMP at 1184px. TensorRT FP16 with a reparameterized MobileOne is expected to be 2-3x faster than AMP. At 1280px (close to the paper's 1184px benchmark resolution), accuracy should match published results.

Orin Nano Super TensorRT FP16 estimate at 1280px:

- AGX Orin AMP @ 1184px: 497ms
- TRT FP16 speedup over AMP: ~2-3x → AGX Orin TRT estimate: ~165-250ms
- Orin Nano Super is ~3-4x slower → estimate: ~500-1000ms without TRT
- With TRT FP16: **~165-330ms** (realistic range)
- Go/no-go threshold: **≤200ms**

**Candidate B (fallback): XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. General-purpose, not designed for the cross-view gap, and the FASTEST option. Since our cross-view gap is small (both nadir), XFeat may work adequately for this specific use case.
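The ≤200ms go/no-go check reduces to a few lines of timing code. A minimal sketch, assuming the LiteSAM TensorRT engine is wrapped in a zero-argument callable (`litesam_infer` and `pick_matcher` are hypothetical names, not part of any published API):

```python
import time

def pick_matcher(litesam_infer, threshold_ms=200.0, warmup=3, runs=10):
    """Day-one go/no-go: time LiteSAM TRT FP16 at 1280px on the target
    device; fall back to XFeat if it misses the latency threshold."""
    for _ in range(warmup):
        litesam_infer()  # discard warm-up runs (CUDA context init, caches)
    start = time.perf_counter()
    for _ in range(runs):
        litesam_infer()
    mean_ms = (time.perf_counter() - start) / runs * 1000.0
    choice = "litesam@1280px" if mean_ms <= threshold_ms else "xfeat"
    return choice, mean_ms
```

On hardware, `litesam_infer` would run the exported engine on a representative 1280px UAV/satellite pair; the same harness can time the XFeat path for the accuracy/latency comparison.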
**Other evaluated options (not selected)**:

- **EfficientLoFTR**: Semi-dense, 15.05M params, handles weak texture well. ~20% slower than LiteSAM. A strong option if the LiteSAM codebase proves difficult to export to TRT, but a larger model footprint.
- **Deep Homography (STHN-style)**: End-to-end homography estimation, no feature/RANSAC pipeline. 4.24m at 50m range. An interesting future option, but it needs RGB retraining — higher implementation risk.
- **PFED and retrieval-based methods**: Image RETRIEVAL only (identifies which tile matches), not pixel-level matching. We already know which tile to use from the ESKF position.
- **SuperPoint+LightGlue**: Sparse matcher. The LiteSAM paper confirms worse satellite-aerial accuracy. Slower than XFeat.

**Decision rule** (day-one on Orin Nano Super):

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → use LiteSAM at 1280px**
4. **If >200ms → use XFeat**

### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — the multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput.

**Do NOT use INT8 on transformer components** (TAIFormer) — accuracy degrades. INT8 is safe only for the MobileOne backbone CNN layers.

### 5. CUDA Stream Pipelining

Overlap operations across consecutive frames:

- Stream A: cuVSLAM VO for the current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: satellite matching for the previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic

### 6. Proactive Tile Loading

**Change from draft01**: Instead of loading tiles on demand from disk, preload tiles within ±2km of the flight plan into RAM at session start. This eliminates disk I/O latency during flight. For a 50km flight path, ~2000 tiles at zoom 19 ≈ ~200MB RAM — well within budget.

On VO failure / expanded search:

1. Compute the IMU dead-reckoning position
2. Rank preloaded tiles by distance to the predicted position
3. Try the top 3 tiles (not all tiles in a ±1km radius)
4. If no match in the top 3, expand to the next 3

## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms @ 1184px) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| VRLM (2024) | FocalNet backbone + multi-scale feature fusion | 83.35% MA@20 | Desktop | Not edge-optimized |
| Scale-Aware UAV-to-Satellite (2026) | Semantic geometric + metric scale recovery | N/A | Desktop | Addresses the scale ambiguity problem |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection, semi-dense | Competitive with LiteSAM | 2.5x faster than LoFTR, TRT available | 15.05M params, heavier than LiteSAM |
| PFED (2025) | Knowledge distillation + multi-view refinement, retrieval | 97.15% Recall@1 (University-1652) | AGX Orin (251.5 FPS) | Retrieval only, not pixel-level matching |
| STHN (IEEE RA-L 2024) | Deep homography estimation, coarse-to-fine | 4.24m at 50m range | Open-source, lightweight | Trained on thermal, needs RGB retraining |
| Hierarchical AVL (2025) | DINOv2 retrieval + SuperPoint matching | 64.5-95% success rate | ROS, IMU integration | Two-stage complexity |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Designed for Mars/planetary, needs adaptation |

## Architecture

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|------------|-------------|-------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ~30-50ms/frame | ⚠️ Fallback |
| SuperPoint+LightGlue | LightGlue-ONNX TRT | Good accuracy, adaptive pruning | ~130-280ms, no IMU, no loop closure | ~130-280ms/frame | ❌ Rejected |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ~33ms/frame | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built by NVIDIA for Jetson. Auto-fallback to IMU when visual tracking fails.

**SP+LG rejection rationale**: 15-33x slower than cuVSLAM. No built-in IMU fusion, loop closure, or tracking failure detection. Building these features around SP+LG would take significant development time and still be slower. XFeat at ~30-50ms is the better VO fallback if cuVSLAM fails on the nadir camera.

### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|------------|-------------|-------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | Untested on Orin Nano Super with TensorRT | Est. ~165-330ms @ 1280px TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven, fastest | General-purpose, not designed for cross-view. Our nadir-nadir gap is small → may work. | ~50-100ms | ✅ Fallback if LiteSAM >200ms |

**Selection**: Day-one benchmark on Orin Nano Super:

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → LiteSAM at 1280px**
4. **If >200ms → XFeat**

### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|------------|-------------|-------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ~10-50ms/step | ❌ Too heavy |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.

### Component: Satellite Tile Preprocessing (Offline)

**Selected**: **GeoHash-indexed tile pairs on disk + RAM preloading**. Pipeline:

1. Define the operational area from the flight plan
2. Download satellite tiles from the Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to the matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in a GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
6. **At session start**: preload tiles within ±2km of the flight plan into RAM (~200MB for a 50km route)

### Component: Re-localization (Disconnected Segments)

**Selected**: **Keyframe satellite matching is always active + ranked tile search on VO failure**. When cuVSLAM reports tracking loss (sharp turn, no features):

1. Immediately flag the next frame as a keyframe → trigger satellite matching
2. Compute the IMU dead-reckoning position since the last known position
3. Rank preloaded tiles by distance to the dead-reckoning position
4. Try the top 3 tiles sequentially (not all tiles in the radius)
5. If a match is found: position recovered, a new segment begins
6. If 3 consecutive keyframe failures across the top tiles: expand to the next 3 tiles
7. If still no match after 3+ full attempts: request user input via the API

### Component: Object Center Coordinates

Geometric calculation once the frame-center GPS is known:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by the IMU yaw heading
4. Convert the meter offset to lat/lon and add to the frame-center GPS

### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for the real-time position stream. OpenAPI auto-documentation.

## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~23ms** | Well within 400ms |

### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — it does NOT block the per-frame VO output.
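The non-blocking hand-off can be sketched with a single background worker. A real build would pin the matcher to a dedicated CUDA stream; a one-worker thread pool stands in here to show the control flow (the `KeyframeMatcher` class and its `match_fn` callback are illustrative, not an existing API):

```python
from concurrent.futures import ThreadPoolExecutor

class KeyframeMatcher:
    """Runs satellite matching off the VO path. The per-frame loop calls
    poll() every frame; a finished match comes back tagged with its
    keyframe id so the ESKF can apply it as a delayed measurement."""

    def __init__(self, match_fn):
        self._pool = ThreadPoolExecutor(max_workers=1)  # one keyframe in flight
        self._match_fn = match_fn
        self._pending = None

    def submit(self, frame, tile, frame_id):
        """Called on keyframes only; drops the request while a match runs."""
        if self._pending is None:
            self._pending = self._pool.submit(self._match_fn, frame, tile, frame_id)

    def poll(self):
        """Non-blocking check from the per-frame loop; returns the match
        result (e.g., (frame_id, lat, lon)) once, or None while running."""
        if self._pending is not None and self._pending.done():
            result, self._pending = self._pending.result(), None
            return result
        return None
```

`submit()` fires on keyframe selection; `poll()` runs inside the ~23ms per-frame path, so even a 200ms LiteSAM match never stalls the SSE output.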
**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Preloaded in RAM |
| LiteSAM (opt) TRT FP16 matching | ≤200ms | TensorRT FP16, 1280px, go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async, within budget |

**Path B — XFeat (if LiteSAM >200ms)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state. **Configure map pruning for 3000-frame flights** |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan, pre-resized |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~2.1-2.9GB** | ~26-36% of 8GB — comfortable margin |

## Confidence Scoring

| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| **cuVSLAM fails on low-texture agricultural terrain** | **HIGH** | Frequent tracking loss, degraded VO | Increase satellite matching frequency when the keypoint count drops. IMU dead-reckoning bridge (~1.5s). Accept higher drift in featureless segments. Satellite matching recovers position when texture returns. |
| LiteSAM TRT FP16 >200ms at 1280px on Orin Nano Super | MEDIUM | Must use XFeat instead (less accurate for cross-view) | Day-one TRT FP16 benchmark. If >200ms → XFeat. Since our nadir-nadir gap is small, XFeat may still perform adequately. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Benchmark XFeat on satellite-aerial pairs from the actual operational area. Increase keyframe frequency; multi-tile consensus; strict RANSAC. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, set max keyframes. Monitor memory. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| cuVSLAM is closed-source, no nadir benchmarks | MEDIUM | Unknown failure modes over farmland | Extensive testing with real nadir UAV imagery before deployment. XFeat VO as fallback (also uses learned features). |
| Tile I/O bottleneck during expanded search | LOW | Delayed re-localization | Preload ±2km tiles in RAM; ranked search instead of exhaustive |

## Testing Strategy

### Integration / Functional Tests

- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground-truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce a 90-degree heading change in a sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify the client receives the VO result within 50ms and the satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via the REST API
- Test cuVSLAM map memory: run a 3000-frame session, monitor memory growth

### Non-Functional Tests

- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM. If >200ms → use XFeat. Also measure accuracy on test satellite-aerial pairs for both.
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- **cuVSLAM terrain stress test**: test with a nadir camera over (a) urban/structured terrain, (b) agricultural fields, (c) water/uniform terrain, (d) forest.
  Measure: keypoint count, tracking success rate, drift per 100 frames, IMU fallback frequency
- cuVSLAM keypoint monitoring: verify that low-keypoint detection triggers increased satellite matching
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during a 3000-frame session (must stay <8GB)
- Stress: process 3000 frames without a memory leak
- Keyframe strategy: vary the interval (2, 3, 5, 10) and measure the accuracy-vs-latency tradeoff
- Tile preloading: verify the RAM usage of preloaded tiles for a 50km flight plan

## References

- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR paper: https://zju3dv.github.io/efficientloftr/
- LoFTR TensorRT adaptation: https://github.com/Kolkir/LoFTR_TRT
- PFED (2025): https://github.com/SkyEyeLoc/PFED
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hierarchical AVL (MDPI 2025): https://www.mdpi.com/2072-4292/17/20/3470
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- cuVSLAM paper: https://arxiv.org/abs/2506.04359
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- LightGlue (ICCV 2023): https://github.com/cvg/LightGlue
- LightGlue TensorRT: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- LightGlue TRT Jetson: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- ForestVO / SP+LG VO: https://arxiv.org/html/2504.01261v1
- vo_lightglue (SP+LG VO): https://github.com/himadrir/vo_lightglue
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`