Solution Draft
Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|---|---|---|
| LiteSAM at 480px as satellite matcher | Performance: 497ms on AGX Orin at 1184px. Orin Nano Super is ~3-4x slower. At 480px estimated ~270-360ms — borderline. Paper uses PyTorch AMP, not TensorRT FP16. TensorRT could bring 2-3x improvement. | Add TensorRT FP16 as mandatory optimization step. Revised estimate at 480px with TensorRT: ~90-180ms. Still benchmark-driven: abandon if >400ms. |
| XFeat as LiteSAM fallback for satellite matching | Functional: XFeat is a general-purpose feature matcher, NOT designed for cross-view satellite-aerial gap. May fail on season/lighting differences between UAV and satellite imagery. | Expand fallback options: benchmark EfficientLoFTR (designed for weak-texture aerial) alongside XFeat. Consider STHN-style deep homography as third option. See detailed satellite matcher comparison below. |
| SP+LG considered as "sparse only, worse on satellite-aerial" | Functional: LiteSAM paper confirms "SP+LG achieves fastest inference speed but at expense of accuracy." Sparse matcher fails on texture-scarce regions. ~180-360ms on Orin Nano Super. | Reject SP+LG for both VO and satellite matching. cuVSLAM is 15-33x faster for VO. |
| cuVSLAM on low-texture terrain | Functional: cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade tracking. On uniform agricultural fields/water bodies, features will be sparse → frequent tracking loss. IMU fallback lasts only ~1s. No published benchmarks for nadir agricultural terrain. Does NOT guarantee pose recovery after tracking loss. | CRITICAL RISK: cuVSLAM will likely fail frequently over low-texture terrain. Mitigation: (1) increase satellite matching frequency in low-texture areas, (2) use IMU dead-reckoning bridge, (3) accept higher drift in featureless segments, (4) XFeat VO as secondary fallback may also struggle on same terrain. |
| cuVSLAM memory estimate ~200-300MB | Performance: Map grows over time. For 3000-frame flights (~16min at 3fps), map could reach 500MB-1GB without pruning. | Configure cuVSLAM map pruning. Set max keyframes. Monitor memory. |
| Tile search on VO failure: "expand to ±1km" | Functional: Underspecified. Loading 10-20 tiles slow from disk I/O. | Preload tiles within ±2km of flight plan into RAM. Ranked search by IMU dead-reckoning position. |
| LiteSAM resolution | Performance: Paper benchmarked at 1184px on AGX Orin (497ms AMP). TensorRT FP16 with reparameterized MobileOne expected 2-3x faster. | Benchmark LiteSAM TRT FP16 at 1280px on Orin Nano Super. If ≤200ms → use LiteSAM at 1280px. If >200ms → use XFeat. |
| SP+LG proposed for VO by user | Performance: ~130-280ms/frame on Orin Nano. cuVSLAM ~8.6ms/frame. No IMU, no loop closure. | Reject SP+LG for VO. cuVSLAM 15-33x faster. XFeat frame-to-frame remains fallback. |
Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.
Hard constraint: Camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within 400ms per frame.
Satellite matching strategy: Benchmark LiteSAM TensorRT FP16 at 1280px on Orin Nano Super as a day-one priority. The paper's AGX Orin benchmark used PyTorch AMP — TensorRT FP16 with reparameterized MobileOne should yield 2-3x additional speedup. Decision rule: if LiteSAM TRT FP16 at 1280px ≤200ms → use LiteSAM. If >200ms → use XFeat.
Core architectural principles:
- cuVSLAM handles VO — 116fps on Orin Nano 8GB, ~8.6ms/frame. SuperPoint+LightGlue was evaluated and rejected (15-33x slower, no IMU integration).
- Keyframe-based satellite matching — satellite matcher runs on keyframes only (every 3-10 frames), amortizing cost. Non-keyframes rely on cuVSLAM VO + IMU.
- Every keyframe independently attempts satellite-based geo-localization — handles disconnected segments natively.
- Pipeline parallelism — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
- Proactive tile loading — preload tiles within ±2km of flight plan into RAM for fast lookup during expanded search.
┌─────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ Satellite Tiles → Download & Crop → Store as tile pairs │
│ (Google Maps) (per flight plan) (disk, GeoHash indexed) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ EVERY FRAME (400ms budget): │
│ ┌────────────────────────────────┐ │
│ │ Camera → Downsample (CUDA 2ms)│ │
│ │ → cuVSLAM VO+IMU (~9ms) │──→ ESKF Update → SSE Emit │
│ └────────────────────────────────┘ ↑ │
│ │ │
│ KEYFRAMES ONLY (every 3-10 frames): │ │
│ ┌────────────────────────────────────┐ │ │
│ │ Satellite match (async CUDA stream)│─────┘ │
│ │ LiteSAM TRT FP16 or XFeat │ │
│ │ (does NOT block VO output) │ │
│ └────────────────────────────────────┘ │
│ │
│ IMU: 100+Hz continuous → ESKF prediction │
│ TILES: ±2km preloaded in RAM from flight plan │
└─────────────────────────────────────────────────────────────────┘
Speed Optimization Techniques
1. cuVSLAM for Visual Odometry (~9ms/frame)
NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs.
Why not SuperPoint+LightGlue for VO: SP+LG is 15-33x slower (~130-280ms vs ~9ms). Lacks IMU integration, loop closure, auto-fallback.
CRITICAL: cuVSLAM on difficult/uniform terrain (agricultural fields, water): cuVSLAM uses Shi-Tomasi corner detection + Lucas-Kanade optical-flow tracking (classical features, not learned). On uniform agricultural terrain or water bodies:
- Very few corners will be detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1 second) → constant-velocity integrator (~0.5s more)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
- All published benchmarks (KITTI: urban/suburban, EuRoC: indoor) do NOT include nadir agricultural terrain
- Multi-stereo mode helps with featureless surfaces, but we have mono camera only
Mitigation strategy for low-texture terrain:
- Increase satellite matching frequency: In low-texture areas (detected by cuVSLAM's keypoint count dropping), switch from every 3-10 frames to every frame
- IMU dead-reckoning bridge: When cuVSLAM reports tracking loss, ESKF continues with IMU prediction. At 3fps with ~1.5s IMU bridge, that covers ~4-5 frames
- Accept higher drift: In featureless segments, position accuracy degrades to IMU-only level (50-100m+ over ~10s). Satellite matching must recover absolute position when texture returns
- Keypoint density monitoring: Track cuVSLAM's number of tracked features per frame. When below threshold (e.g., <50), proactively trigger satellite matching
- XFeat frame-to-frame as VO fallback: XFeat uses learned features that may detect texture invisible to Shi-Tomasi corners. But XFeat may also struggle on truly uniform terrain
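The keypoint-density trigger described above can be sketched as a small policy function. The threshold (50 tracked features) and the interval range come from the text; the way the feature count is obtained from PyCuVSLAM is left abstract, since the exact accessor depends on its API.

```python
# Keypoint-density monitor for the low-texture mitigation strategy.
# Threshold and intervals are the values stated in the text; how the tracked
# feature count is read out of cuVSLAM is an integration detail.

LOW_TEXTURE_THRESHOLD = 50    # tracked features per frame, per the text
NORMAL_KEYFRAME_INTERVAL = 5  # "every 3-10 frames" in normal terrain
LOW_TEXTURE_INTERVAL = 1      # every frame when texture is poor

def keyframe_interval(tracked_feature_count: int) -> int:
    """Choose the satellite-matching interval from VO feature density."""
    if tracked_feature_count < LOW_TEXTURE_THRESHOLD:
        return LOW_TEXTURE_INTERVAL   # low texture: match every frame
    return NORMAL_KEYFRAME_INTERVAL

def should_trigger_satellite_match(frame_idx: int, last_keyframe_idx: int,
                                   tracked_feature_count: int) -> bool:
    interval = keyframe_interval(tracked_feature_count)
    return frame_idx - last_keyframe_idx >= interval
```
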
2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on keyframes selected by:
- Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
- Confidence drop: when ESKF covariance exceeds threshold
- VO failure: when cuVSLAM reports tracking loss (e.g., after a sharp turn)
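The three triggers above combine naturally into one predicate. This is a minimal sketch; the covariance threshold and the default interval are illustrative values, not tuned parameters.

```python
# Keyframe selection combining the three triggers: fixed interval,
# ESKF-covariance growth, and VO tracking loss. Threshold values are
# placeholders to be tuned against flight data.

def is_keyframe(frame_idx: int, last_keyframe_idx: int,
                eskf_pos_cov_trace: float, tracking_lost: bool,
                interval: int = 5, cov_threshold: float = 100.0) -> bool:
    if tracking_lost:                        # VO failure (e.g., sharp turn)
        return True
    if eskf_pos_cov_trace > cov_threshold:   # confidence drop
        return True
    return frame_idx - last_keyframe_idx >= interval  # fixed interval
```
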
3. Satellite Matcher Selection (Benchmark-Driven)
Important context: Our UAV-to-satellite matching is EASIER than typical cross-view geo-localization problems. Both the UAV camera and satellite imagery are approximately nadir (top-down). The main challenges are season/lighting differences, resolution mismatch, and temporal changes — not the extreme viewpoint gap seen in ground-to-satellite matching. This means even general-purpose matchers may perform well.
Candidate A: LiteSAM (opt) with TensorRT FP16 at 1280px — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne reparameterizable for TensorRT. Paper benchmarked at 497ms on AGX Orin using AMP at 1184px. TensorRT FP16 with reparameterized MobileOne expected 2-3x faster than AMP. At 1280px (close to paper's 1184px benchmark resolution), accuracy should match published results.
Orin Nano Super TensorRT FP16 estimate at 1280px:
- AGX Orin AMP @ 1184px: 497ms
- TRT FP16 speedup over AMP: ~2-3x → AGX Orin TRT estimate: ~165-250ms
- Orin Nano Super is ~3-4x slower than AGX Orin → without TRT: ~1500-2000ms; naive scaling of the AGX TRT estimate: ~500-1000ms
- Working estimate: ~165-330ms (realistic only if the smaller GPU is less of a bottleneck for this 6.31M-param model) — the spread straddles the threshold, which is exactly why the day-one benchmark is mandatory
- Go/no-go threshold: ≤200ms
Candidate B (fallback): XFeat semi-dense — ~50-100ms on Orin Nano Super. Proven on Jetson. General-purpose, not designed for cross-view gap. FASTEST option. Since our cross-view gap is small (both nadir), XFeat may work adequately for this specific use case.
Other evaluated options (not selected):
- EfficientLoFTR: Semi-dense, 15.05M params, handles weak-texture well. ~20% slower than LiteSAM. Strong option if LiteSAM codebase proves difficult to export to TRT, but larger model footprint.
- Deep Homography (STHN-style): End-to-end homography estimation, no feature/RANSAC pipeline. 4.24m at 50m range. Interesting future option but needs RGB retraining — higher implementation risk.
- PFED and retrieval-based methods: Image RETRIEVAL only (identifies which tile matches), not pixel-level matching. We already know which tile to use from ESKF position.
- SuperPoint+LightGlue: Sparse matcher. LiteSAM paper confirms worse satellite-aerial accuracy. Slower than XFeat.
Decision rule (day-one on Orin Nano Super):
- Export LiteSAM (opt) to TensorRT FP16
- Benchmark at 1280px
- If ≤200ms → use LiteSAM at 1280px
- If >200ms → use XFeat
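The decision rule above is deliberately mechanical. A sketch of the benchmark harness, where `run_matcher` stands in for the actual TensorRT inference call (not shown — only the timing and decision logic are real here):

```python
# Day-one benchmark harness sketch: median latency over warmed-up runs,
# then the go/no-go decision. `run_matcher` is a placeholder for a
# TensorRT execution call on a fixed 1280px input pair.
import time

THRESHOLD_MS = 200.0

def median_latency_ms(run_matcher, n_warmup: int = 10, n_runs: int = 50) -> float:
    for _ in range(n_warmup):      # warm-up: CUDA context, TRT caches
        run_matcher()
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        run_matcher()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]  # median is robust to jitter spikes

def select_matcher(litesam_median_ms: float) -> str:
    return "litesam_1280" if litesam_median_ms <= THRESHOLD_MS else "xfeat"
```

The median (rather than mean) is used so that occasional OS or thermal-throttling spikes on the Jetson do not distort the go/no-go call.
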
4. TensorRT FP16 Optimization
LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. Do NOT use INT8 on transformer components (TAIFormer) — accuracy degrades. INT8 is safe only for the MobileOne backbone CNN layers.
5. CUDA Stream Pipelining
Overlap operations across consecutive frames:
- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
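The overlap pattern can be illustrated with a background worker: the per-frame VO loop hands keyframes to the satellite matcher and never waits on it. In the real system the matcher runs on its own CUDA stream; a thread + queue stands in here purely to show the control flow.

```python
# Pipeline-overlap sketch: "Stream B" (satellite matching) runs in a worker
# so "Stream A" (per-frame VO + ESKF) never blocks. Thread/queue are stand-ins
# for CUDA streams; match functions here are trivial placeholders.
import queue
import threading

sat_jobs: queue.Queue = queue.Queue()
sat_results: queue.Queue = queue.Queue()

def satellite_worker():
    while True:
        frame_id, match_fn = sat_jobs.get()
        if frame_id is None:          # shutdown sentinel
            break
        sat_results.put((frame_id, match_fn()))

worker = threading.Thread(target=satellite_worker, daemon=True)
worker.start()

def process_frame(frame_id: int, is_keyframe: bool) -> list:
    # "Stream A": cuVSLAM VO + ESKF update would run inline here (~10ms).
    if is_keyframe:
        # "Stream B": enqueue the keyframe for async matching; do not wait.
        sat_jobs.put((frame_id, lambda: f"match-{frame_id}"))
    # Fold any finished satellite matches back into the ESKF as delayed
    # measurements; here we simply return them.
    drained = []
    while not sat_results.empty():
        drained.append(sat_results.get_nowait())
    return drained
```
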
6. Proactive Tile Loading
Change from draft01: Instead of loading tiles on-demand from disk, preload tiles within ±2km of the flight plan into RAM at session start. This eliminates disk I/O latency during flight. For a 50km flight path, ~2000 tiles at zoom 19 ≈ ~200MB RAM — well within budget.
On VO failure / expanded search:
- Compute IMU dead-reckoning position
- Rank preloaded tiles by distance to predicted position
- Try top 3 tiles (not all tiles in ±1km radius)
- If no match in top 3, expand to next 3
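The ranked search above can be sketched directly. `try_match` is a placeholder for the actual satellite matcher call; the batch size of 3 comes from the text.

```python
# Ranked tile search for VO-failure recovery: sort preloaded tiles by
# distance to the IMU dead-reckoning position, then try them in batches of 3.
import math

def rank_tiles(tiles, predicted_pos):
    """tiles: list of (tile_id, (lat, lon) center). Nearest first."""
    def dist(t):
        lat, lon = t[1]
        # Equirectangular approximation is fine at tile-search scale.
        dlat = lat - predicted_pos[0]
        dlon = (lon - predicted_pos[1]) * math.cos(math.radians(lat))
        return math.hypot(dlat, dlon)
    return sorted(tiles, key=dist)

def search_tiles(tiles, predicted_pos, try_match, batch_size=3):
    ranked = rank_tiles(tiles, predicted_pos)
    for start in range(0, len(ranked), batch_size):
        for tile_id, _center in ranked[start:start + batch_size]:
            result = try_match(tile_id)   # returns None on no match
            if result is not None:
                return tile_id, result
    return None  # exhausted: escalate (e.g., request user input via API)
```
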
Existing/Competitor Solutions Analysis
| Solution | Approach | Accuracy | Hardware | Limitations |
|---|---|---|---|---|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| VRLM (2024) | FocalNet backbone + multi-scale feature fusion | 83.35% MA@20 | Desktop | Not edge-optimized |
| Scale-Aware UAV-to-Satellite (2026) | Semantic geometric + metric scale recovery | N/A | Desktop | Addresses scale ambiguity problem |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection, semi-dense | Competitive with LiteSAM | 2.5x faster than LoFTR, TRT available | 15.05M params, heavier than LiteSAM |
| PFED (2025) | Knowledge distillation + multi-view refinement, retrieval | 97.15% Recall@1 (University-1652) | AGX Orin (251.5 FPS) | Retrieval only, not pixel-level matching |
| STHN (IEEE RA-L 2024) | Deep homography estimation, coarse-to-fine | 4.24m at 50m range | Open-source, lightweight | Trained on thermal, needs RGB retraining |
| Hierarchical AVL (2025) | DINOv2 retrieval + SuperPoint matching | 64.5-95% success rate | ROS, IMU integration | Two-stage complexity |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Designed for Mars/planetary, needs adaptation |
Architecture
Component: Visual Odometry
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|---|---|---|---|---|---|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ~30-50ms/frame | ⚠️ Fallback |
| SuperPoint+LightGlue | LightGlue-ONNX TRT | Good accuracy, adaptive pruning | ~130-280ms, no IMU, no loop closure | ~130-280ms/frame | ❌ Rejected |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ~33ms/frame | ⚠️ Slower |
Selected: cuVSLAM (mono+IMU mode) — 116fps, purpose-built by NVIDIA for Jetson. Auto-fallback to IMU when visual tracking fails.
SP+LG rejection rationale: 15-33x slower than cuVSLAM. No built-in IMU fusion, loop closure, or tracking failure detection. Building these features around SP+LG would take significant development time and still be slower. XFeat at ~30-50ms is a better fallback for VO if cuVSLAM fails on nadir camera.
Component: Satellite Image Matching
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|---|---|---|---|---|---|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | Untested on Orin Nano Super with TensorRT | Est. ~165-330ms @ 1280px TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven, fastest | General-purpose, not designed for cross-view. Our nadir-nadir gap is small → may work. | ~50-100ms | ✅ Fallback if LiteSAM >200ms |
Selection: Day-one benchmark on Orin Nano Super:
- Export LiteSAM (opt) to TensorRT FP16
- Benchmark at 1280px
- If ≤200ms → LiteSAM at 1280px
- If >200ms → XFeat
Component: Sensor Fusion
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|---|---|---|---|---|---|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ~10-50ms/step | ❌ Too heavy |
Selected: ESKF with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
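The 16-element nominal state and its IMU propagation can be sketched as follows. This is a simplified mechanization — the specific-force rotation into the navigation frame and gravity compensation are omitted, and the quaternion update uses the small-angle form; a production ESKF also propagates the 15-dimensional error state and its covariance.

```python
# Nominal-state sketch for the ESKF: [p(3), v(3), q(4), b_a(3), b_g(3)] = 16.
# Simplified IMU predict: no gravity/frame rotation, small-angle quaternion
# increment. Error-state and covariance propagation are not shown.
import numpy as np

def make_state():
    x = np.zeros(16)
    x[6] = 1.0   # orientation quaternion (w, x, y, z) = identity
    return x

def quat_mul(q, r):
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([
        w0*w1 - x0*x1 - y0*y1 - z0*z1,
        w0*x1 + x0*w1 + y0*z1 - z0*y1,
        w0*y1 - x0*z1 + y0*w1 + z0*x1,
        w0*z1 + x0*y1 - y0*x1 + z0*w1])

def predict(x, accel, gyro, dt):
    """Propagate the nominal state by one IMU sample (simplified)."""
    p, v, q = x[0:3], x[3:6], x[6:10]
    ba, bg = x[10:13], x[13:16]
    a = accel - ba                    # bias-corrected specific force
    w = gyro - bg                     # bias-corrected angular rate
    x2 = x.copy()
    x2[0:3] = p + v * dt + 0.5 * a * dt**2
    x2[3:6] = v + a * dt
    dq = np.concatenate(([1.0], 0.5 * w * dt))   # small-angle increment
    x2[6:10] = quat_mul(q, dq)
    x2[6:10] /= np.linalg.norm(x2[6:10])         # keep unit norm
    return x2
```
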
Component: Satellite Tile Preprocessing (Offline)
Selected: GeoHash-indexed tile pairs on disk + RAM preloading.
Pipeline:
- Define operational area from flight plan
- Download satellite tiles from Google Maps Tile API at max zoom (18-19)
- Pre-resize each tile to matcher input resolution
- Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
- Copy to Jetson storage before flight
- At session start: preload tiles within ±2km of flight plan into RAM (~200MB for 50km route)
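The GeoHash index used for the directory structure and the RAM lookup can be sketched without third-party dependencies — the base32 encoding is short enough to inline. The precision level (6 characters ≈ 1.2km × 0.6km cells) is an assumption for illustration.

```python
# GeoHash-indexed tile preload sketch. Standard base32 geohash encoding is
# inlined; precision 6 is an illustrative choice for the cell size.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=6):
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, even = [], True          # geohash interleaves lon (even) and lat
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid
            else:
                bits.append(0); lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(_BASE32[n])
    return "".join(chars)

def preload_index(tile_records, precision=6):
    """Group tile metadata by GeoHash cell for O(1) lookup at runtime."""
    index = {}
    for rec in tile_records:    # rec: dict with 'lat', 'lon', 'path', ...
        key = geohash_encode(rec["lat"], rec["lon"], precision)
        index.setdefault(key, []).append(rec)
    return index
```
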
Component: Re-localization (Disconnected Segments)
Selected: Keyframe satellite matching is always active + ranked tile search on VO failure.
When cuVSLAM reports tracking loss (sharp turn, no features):
- Immediately flag next frame as keyframe → trigger satellite matching
- Compute IMU dead-reckoning position since last known position
- Rank preloaded tiles by distance to dead-reckoning position
- Try top 3 tiles sequentially (not all tiles in radius)
- If match found: position recovered, new segment begins
- If 3 consecutive keyframe failures across top tiles: expand to next 3 tiles
- If still no match after 3+ full attempts: request user input via API
Component: Object Center Coordinates
Geometric calculation once frame-center GPS is known:
- Pixel offset from center: (dx_px, dy_px)
- Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
- Rotate by IMU yaw heading
- Convert meter offset to lat/lon and add to frame-center GPS
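The four steps above translate directly into code. The conventions assumed here — image x right / y down, yaw measured clockwise from north in radians, and a locally flat-Earth degree conversion — are stated assumptions; they are accurate at frame-offset scales (tens of meters).

```python
# Object-center GPS from frame-center GPS. Assumes: GSD in m/px, image x
# right / y down, yaw in radians (0 = north, clockwise positive), and a
# flat-Earth approximation valid for small offsets.
import math

EARTH_RADIUS = 6378137.0  # WGS-84 equatorial radius, meters

def object_gps(center_lat, center_lon, dx_px, dy_px, gsd_m, yaw_rad):
    # Steps 1-2: pixel offset -> meters.
    dx_m = dx_px * gsd_m
    dy_m = dy_px * gsd_m
    # Step 3: rotate into north/east. Image "up" (-y) points along the
    # heading; image "right" (+x) points 90 degrees clockwise of it.
    forward, right = -dy_m, dx_m
    north = forward * math.cos(yaw_rad) - right * math.sin(yaw_rad)
    east = forward * math.sin(yaw_rad) + right * math.cos(yaw_rad)
    # Step 4: meters -> degrees and add to the frame-center GPS.
    lat = center_lat + math.degrees(north / EARTH_RADIUS)
    lon = center_lon + math.degrees(
        east / (EARTH_RADIUS * math.cos(math.radians(center_lat))))
    return lat, lon
```

For example, with yaw = 0 an object 100px above center at 0.5m/px GSD lands 50m north of the frame center.
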
Component: API & Streaming
Selected: FastAPI + sse-starlette. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
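FastAPI and sse-starlette handle the transport, but the wire format clients consume is simple enough to show directly. The event name and JSON schema below are illustrative assumptions, not a fixed contract; sse-starlette produces exactly this `id:`/`event:`/`data:` framing.

```python
# SSE event framing for the position stream. The "position" event name and
# payload fields are an assumed schema; the framing itself is standard SSE.
import json

def sse_position_event(lat: float, lon: float, confidence: str,
                       source: str, event_id: int) -> str:
    payload = json.dumps({
        "lat": lat, "lon": lon,
        "confidence": confidence,   # HIGH / MEDIUM / LOW / VERY LOW / MANUAL
        "source": source,           # e.g., "vo", "satellite", "imu", "manual"
    })
    return f"id: {event_id}\nevent: position\ndata: {payload}\n\n"
```

Including an `id:` field lets reconnecting clients resume via the standard `Last-Event-ID` header rather than missing corrections during a dropout.
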
Processing Time Budget (per frame, 400ms budget)
Normal Frame (non-keyframe, ~60-80% of frames)
| Step | Time | Notes |
|---|---|---|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| Total | ~23ms | Well within 400ms |
Keyframe Satellite Matching (async, every 3-10 frames)
Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.
Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark):
| Step | Time | Notes |
|---|---|---|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 matching | ≤200ms | TensorRT FP16, 1280px, go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| Total | ≤210ms | Async, within budget |
Path B — XFeat (if LiteSAM >200ms):
| Step | Time | Notes |
|---|---|---|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| Total | ~50-80ms | Comfortably within budget |
Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory | Notes |
|---|---|---|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state. Configure map pruning for 3000-frame flights |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan, pre-resized |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| Total | ~2.1-2.9GB | ~26-36% of 8GB — comfortable margin |
Confidence Scoring
| Level | Condition | Expected Accuracy |
|---|---|---|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
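The table above translates into a small classifier. The 500m distance-since-correction threshold comes from the MEDIUM row; the input flags are assumed to come from the fusion layer.

```python
# Confidence level per the scoring table. Inputs: whether the last satellite
# match succeeded, whether cuVSLAM VO is tracking, meters traveled since the
# last satellite correction, and whether the position was user-provided.
def confidence_level(sat_match_ok: bool, vo_ok: bool,
                     dist_since_correction_m: float,
                     manual_position: bool = False) -> str:
    if manual_position:
        return "MANUAL"
    if sat_match_ok and vo_ok:
        return "HIGH"        # expected <20m
    if vo_ok and dist_since_correction_m < 500.0:
        return "MEDIUM"      # expected 20-50m
    if vo_ok:
        return "LOW"         # expected 50-100m+
    return "VERY LOW"        # IMU dead-reckoning only, 100m+
```
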
Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| cuVSLAM fails on low-texture agricultural terrain | HIGH | Frequent tracking loss, degraded VO | Increase satellite matching frequency when keypoint count drops. IMU dead-reckoning bridge (~1.5s). Accept higher drift in featureless segments. Satellite matching recovers position when texture returns. |
| LiteSAM TRT FP16 >200ms at 1280px on Orin Nano Super | MEDIUM | Must use XFeat instead (less accurate for cross-view) | Day-one TRT FP16 benchmark. If >200ms → XFeat. Since our nadir-nadir gap is small, XFeat may still perform adequately. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Benchmark XFeat on actual operational area satellite-aerial pairs. Increase keyframe frequency; multi-tile consensus; strict RANSAC. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, set max keyframes. Monitor memory. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| cuVSLAM is closed-source, no nadir benchmarks | MEDIUM | Unknown failure modes over farmland | Extensive testing with real nadir UAV imagery before deployment. XFeat VO as fallback (also uses learned features). |
| Tile I/O bottleneck during expanded search | LOW | Delayed re-localization | Preload ±2km tiles in RAM; ranked search instead of exhaustive |
Testing Strategy
Integration / Functional Tests
- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
- Test cuVSLAM map memory: run 3000-frame session, monitor memory growth
Non-Functional Tests
- Day-one satellite matcher benchmark: LiteSAM TRT FP16 at 1280px on Orin Nano Super. If ≤200ms → use LiteSAM. If >200ms → use XFeat. Also measure accuracy on test satellite-aerial pairs for both.
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- cuVSLAM terrain stress test: test with nadir camera over (a) urban/structured terrain, (b) agricultural fields, (c) water/uniform terrain, (d) forest. Measure: keypoint count, tracking success rate, drift per 100 frames, IMU fallback frequency
- cuVSLAM keypoint monitoring: verify that low-keypoint detection triggers increased satellite matching
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 3000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
- Tile preloading: verify RAM usage of preloaded tiles for 50km flight plan
References
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR paper: https://zju3dv.github.io/efficientloftr/
- LoFTR TensorRT adaptation: https://github.com/Kolkir/LoFTR_TRT
- PFED (2025): https://github.com/SkyEyeLoc/PFED
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hierarchical AVL (MDPI 2025): https://www.mdpi.com/2072-4292/17/20/3470
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- cuVSLAM paper: https://arxiv.org/abs/2506.04359
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- LightGlue (ICCV 2023): https://github.com/cvg/LightGlue
- LightGlue TensorRT: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- LightGlue TRT Jetson: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- ForestVO / SP+LG VO: https://arxiv.org/html/2504.01261v1
- vo_lightglue (SP+LG VO): https://github.com/himadrir/vo_lightglue
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
Related Artifacts
- AC Assessment: _docs/00_research/gps_denied_nav/00_ac_assessment.md
- Tech stack evaluation: _docs/01_solution/tech_stack.md
- Security analysis: _docs/01_solution/security_analysis.md