mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-23 01:46:38 +00:00
Refactor acceptance criteria, problem description, and restrictions for UAV GPS-Denied system. Enhance clarity and detail in performance metrics, image processing requirements, and operational constraints. Introduce new sections for UAV specifications, camera details, satellite imagery, and onboard hardware.
# Question Decomposition

## Original Question

Assess the current solution draft. Additionally:

1. Try SuperPoint + LightGlue for visual odometry.
2. Can LiteSAM be this slow because of big images? If we reduce the size to 1280px, would it run faster?

## Active Mode

Mode B: Solution Assessment — `solution_draft01.md` exists in OUTPUT_DIR.

## Question Type

Problem Diagnosis + Decision Support
## Research Subject Boundary

- **Population**: GPS-denied UAV navigation systems on edge hardware
- **Geography**: Eastern Ukraine conflict zone
- **Timeframe**: Current (2025-2026), using the latest available tools
- **Level**: Jetson Orin Nano Super (8GB, 67 TOPS) — edge deployment

## Decomposed Sub-Questions
### Q1: SuperPoint + LightGlue for Visual Odometry

- What is SP+LG inference speed on Jetson-class hardware?
- How does it compare to cuVSLAM (116fps on Orin Nano)?
- Is SP+LG suitable for frame-to-frame VO at 3fps?
- What is SP+LG accuracy vs cuVSLAM for VO?
### Q2: LiteSAM Speed vs Image Resolution

- What resolution was LiteSAM benchmarked at? (1184px on AGX Orin)
- How does LiteSAM speed scale with resolution?
- What would 1280px achieve on Orin Nano Super vs AGX Orin?
- Is the bottleneck image size or the compute-power gap?
### Q3: General Weak Points in solution_draft01

- Are there functional weak points?
- Are there performance bottlenecks?
- Are there security gaps?
### Q4: SP+LG for Satellite Matching (alternative to LiteSAM/XFeat)

- How does SP+LG perform on cross-view satellite-aerial matching?
- What does the LiteSAM paper say about SP+LG accuracy?
## Timeliness Sensitivity Assessment

- **Research Topic**: Edge-deployed visual odometry and satellite-aerial matching
- **Sensitivity Level**: 🟠 High
- **Rationale**: cuVSLAM v15.0.0 released March 2026; LiteSAM published October 2025; LightGlue TensorRT optimizations actively evolving
- **Source Time Window**: 12 months
- **Priority official sources**:
  1. LiteSAM paper (MDPI Remote Sensing, October 2025)
  2. cuVSLAM / PyCuVSLAM v15.0.0 (March 2026)
  3. LightGlue-ONNX / TensorRT benchmarks (2024-2026)
  4. Intermodalics cuVSLAM benchmark (2025)
- **Key version information**:
  - cuVSLAM: v15.0.0 (March 2026)
  - LightGlue: ICCV 2023, TensorRT via fabio-sim/LightGlue-ONNX
  - LiteSAM: published October 2025, code at boyagesmile/LiteSAM
# Source Registry
## Source #1

- **Title**: LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery
- **Link**: https://www.mdpi.com/2072-4292/17/19/3349
- **Tier**: L1
- **Publication Date**: 2025-10-01
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: LiteSAM v1.0; benchmarked on Jetson AGX Orin (JetPack 5.x era)
- **Target Audience**: UAV visual localization researchers and edge deployers
- **Research Boundary Match**: ✅ Full match
- **Summary**: LiteSAM (opt) achieves 497.49ms on Jetson AGX Orin at 1184px input. 6.31M params. RMSE@30 = 17.86m on UAV-VisLoc. The paper directly compares with SP+LG, stating "SP+LG achieves the fastest inference speed but at the expense of accuracy." Section 4.9 shows the resolution vs speed tradeoff on an RTX 3090Ti.
- **Related Sub-question**: Q2 (LiteSAM speed), Q4 (SP+LG for satellite matching)
## Source #2

- **Title**: cuVSLAM: CUDA accelerated visual odometry and mapping
- **Link**: https://arxiv.org/abs/2506.04359
- **Tier**: L1
- **Publication Date**: 2025-06 (paper); v15.0.0 released 2026-03-10
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: cuVSLAM v15.0.0 / PyCuVSLAM v15.0.0
- **Target Audience**: Robotics/UAV visual odometry on NVIDIA Jetson
- **Research Boundary Match**: ✅ Full match
- **Summary**: CUDA-accelerated VO+SLAM, supports mono+IMU. 116fps on Jetson Orin Nano 8GB at 720p. <1% trajectory error on KITTI; <5cm on EuRoC.
- **Related Sub-question**: Q1 (SP+LG vs cuVSLAM)
## Source #3

- **Title**: Intermodalics — NVIDIA Isaac ROS In-Depth: cuVSLAM and the DP3.1 Release
- **Link**: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- **Tier**: L2
- **Publication Date**: 2025 (DP3.1 release)
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: cuVSLAM v11 (DP3.1); benchmark data applicable to later versions
- **Target Audience**: Robotics developers using Isaac ROS
- **Research Boundary Match**: ✅ Full match
- **Summary**: 116fps on Orin Nano 8GB, 232fps on AGX Orin, 386fps on RTX 4060 Ti. Outperforms ORB-SLAM2 on KITTI.
- **Related Sub-question**: Q1
## Source #4

- **Title**: Accelerating LightGlue Inference with ONNX Runtime and TensorRT
- **Link**: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- **Tier**: L2
- **Publication Date**: 2024-07-17
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: torch 2.4.0, TensorRT 10.2.0, RTX 4080 benchmarks
- **Target Audience**: ML engineers deploying LightGlue
- **Research Boundary Match**: ⚠️ Partial (desktop GPU, not Jetson)
- **Summary**: TensorRT achieves a 2-4x speedup over compiled PyTorch for SuperPoint+LightGlue. Full-pipeline benchmarks on RTX 4080. TensorRT has a 3840-keypoint limit. No Jetson-specific benchmarks provided.
- **Related Sub-question**: Q1
## Source #5

- **Title**: LightGlue-with-FlashAttentionV2-TensorRT (Jetson Orin NX 8GB)
- **Link**: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- **Tier**: L4
- **Publication Date**: 2025-02
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: TensorRT 8.5.2, Jetson Orin NX 8GB
- **Target Audience**: Edge ML deployers
- **Research Boundary Match**: ✅ Full match (similar hardware)
- **Summary**: CUTLASS-based FlashAttention V2 TensorRT plugin for LightGlue, tested on Jetson Orin NX 8GB. No published latency numbers, but it confirms LightGlue TensorRT deployment on Orin-class hardware is feasible.
- **Related Sub-question**: Q1
## Source #6

- **Title**: vo_lightglue — Visual Odometry with LightGlue
- **Link**: https://github.com/himadrir/vo_lightglue
- **Tier**: L4
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: N/A
- **Target Audience**: VO researchers
- **Research Boundary Match**: ⚠️ Partial (desktop, KITTI dataset)
- **Summary**: SP+LG achieves 10fps on the KITTI dataset (desktop GPU). Odometric error ~1% vs 3.5-4.1% for FLANN-based matching. Much slower than cuVSLAM.
- **Related Sub-question**: Q1
## Source #7

- **Title**: ForestVO: Enhancing Visual Odometry in Forest Environments through ForestGlue
- **Link**: https://arxiv.org/html/2504.01261v1
- **Tier**: L1
- **Publication Date**: 2025-04
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: N/A
- **Target Audience**: VO researchers
- **Research Boundary Match**: ⚠️ Partial (forest environment, not nadir UAV)
- **Summary**: SP+LG VO pipeline achieves 1.09m avg relative pose error, KITTI score 2.33%. Uses 512 keypoints (reduced from 2048) to cut compute. Outperforms DSO by 40%.
- **Related Sub-question**: Q1
## Source #8

- **Title**: SuperPoint-SuperGlue-TensorRT (C++ deployment)
- **Link**: https://github.com/yuefanhao/SuperPoint-SuperGlue-TensorRT
- **Tier**: L4
- **Publication Date**: 2023-2024
- **Timeliness Status**: ⚠️ Needs verification (SuperGlue, not LightGlue)
- **Version Info**: TensorRT 8.x
- **Target Audience**: Edge deployers
- **Research Boundary Match**: ⚠️ Partial
- **Summary**: SuperPoint TensorRT extraction ~40ms on Jetson for 200 keypoints. C++ implementation.
- **Related Sub-question**: Q1
## Source #9

- **Title**: Comparative Analysis of Advanced Feature Matching Algorithms in HSR Satellite Stereo
- **Link**: https://arxiv.org/abs/2405.06246
- **Tier**: L1
- **Publication Date**: 2024-05
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: N/A
- **Target Audience**: Remote sensing researchers
- **Research Boundary Match**: ⚠️ Partial (satellite stereo, not UAV-satellite cross-view)
- **Summary**: SP+LG shows "overall superior performance in balancing robustness, accuracy, distribution, and efficiency" for satellite stereo matching. But this is same-view satellite-satellite matching, not cross-view UAV-satellite.
- **Related Sub-question**: Q4
## Source #10

- **Title**: PyCuVSLAM with reComputer (Seeed Studio)
- **Link**: https://wiki.seeedstudio.com/pycuvslam_recomputer_robotics/
- **Tier**: L3
- **Publication Date**: 2026
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: PyCuVSLAM v15.0.0, JetPack 6.2
- **Target Audience**: Robotics developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Tutorial for deploying PyCuVSLAM on Jetson Orin NX. Confirms mono+IMU mode, pip install from an aarch64 wheel, and EuRoC dataset examples.
- **Related Sub-question**: Q1
# Fact Cards
## Fact #1

- **Statement**: cuVSLAM achieves 116fps on Jetson Orin Nano 8GB at 720p resolution (~8.6ms/frame); 232fps on AGX Orin; 386fps on RTX 4060 Ti.
- **Source**: [Source #3] Intermodalics benchmark
- **Phase**: Assessment
- **Confidence**: ✅ High
- **Related Dimension**: VO speed comparison
## Fact #2

- **Statement**: SuperPoint+LightGlue VO achieves ~10fps on the KITTI dataset on a desktop GPU (~100ms/frame). With 274 keypoints on an RTX 2080Ti, LightGlue matching alone takes 33.9ms.
- **Source**: vo_lightglue; LightGlue issue #36
- **Confidence**: ⚠️ Medium (desktop GPU, not Jetson)
- **Related Dimension**: VO speed comparison
## Fact #3

- **Statement**: SuperPoint feature extraction takes ~40ms on Jetson (TensorRT, 200 keypoints).
- **Source**: SuperPoint-SuperGlue-TensorRT
- **Confidence**: ⚠️ Medium (older Jetson)
- **Related Dimension**: VO speed comparison
## Fact #4

- **Statement**: LightGlue TensorRT with FlashAttention V2 has been deployed on Jetson Orin NX 8GB. No published latency numbers.
- **Source**: qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- **Confidence**: ⚠️ Medium
- **Related Dimension**: VO speed comparison
## Fact #5

- **Statement**: LiteSAM (opt) inference: 61.98ms on RTX 3090, 497.49ms on Jetson AGX Orin at 1184px input. 6.31M params.
- **Source**: LiteSAM paper, abstract + Section 4.10
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matcher speed
## Fact #6

- **Statement**: Jetson AGX Orin has 275 TOPS INT8 and 2048 CUDA cores. Orin Nano Super has 67 TOPS INT8 and 1024 CUDA cores. AGX Orin is ~3-4x more powerful.
- **Source**: NVIDIA official specs
- **Confidence**: ✅ High
- **Related Dimension**: Hardware scaling
## Fact #7

- **Statement**: LiteSAM processes at 1/8 scale internally. Coarse matching is O(N²), where N = (H/8 × W/8). For 1184px: ~21,904 tokens. For 1280px: ~25,600. For 480px: ~3,600.
- **Source**: LiteSAM paper, Sections 3.1-3.3
- **Confidence**: ✅ High
- **Related Dimension**: LiteSAM speed vs resolution
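
The token counts in Fact #7 can be reproduced in a few lines (a sketch; it assumes a square input at the stated edge length, so H = W):

```python
# Token count at LiteSAM's internal 1/8 scale, assuming a square
# input with the given edge length in pixels (Fact #7).
def coarse_tokens(edge_px: int, scale: int = 8) -> int:
    side = edge_px // scale
    return side * side

for edge in (480, 640, 1184, 1280):
    rel = (coarse_tokens(edge) / coarse_tokens(1184)) ** 2
    print(f"{edge}px -> {coarse_tokens(edge)} tokens, "
          f"O(N^2) cost vs 1184px: {rel:.2f}x")
```

The quadratic term is why 1280px costs ~1.37x the coarse-matching compute of 1184px despite only a 17% token increase.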
## Fact #8

- **Statement**: LiteSAM paper Figure 1 states: "SP+LG achieves the fastest inference speed but at the expense of accuracy" vs LiteSAM on satellite-aerial benchmarks.
- **Source**: LiteSAM paper
- **Confidence**: ✅ High
- **Related Dimension**: SP+LG vs LiteSAM
## Fact #9

- **Statement**: LiteSAM achieves RMSE@30 = 17.86m on UAV-VisLoc. SP+LG is worse on the same benchmark.
- **Source**: LiteSAM paper
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matcher accuracy
## Fact #10

- **Statement**: cuVSLAM uses Shi-Tomasi corners ("Good Features to Track") for keypoint detection, divided into an N×M grid of patches, and Lucas-Kanade optical flow for tracking. When tracked keypoints fall below a threshold, it creates a new keyframe.
- **Source**: cuVSLAM paper (arXiv:2506.04359), Section 2.1
- **Confidence**: ✅ High
- **Related Dimension**: cuVSLAM on difficult terrain
## Fact #11

- **Statement**: cuVSLAM automatically switches to IMU when visual tracking fails (dark lighting, long solid surfaces). The IMU integrator provides ~1 second of acceptable tracking; after that, a constant-velocity integrator provides ~0.5 seconds more.
- **Source**: Isaac ROS cuVSLAM docs
- **Confidence**: ✅ High
- **Related Dimension**: cuVSLAM on difficult terrain
## Fact #12

- **Statement**: cuVSLAM does NOT guarantee correct pose recovery after losing track. External algorithms are required for global re-localization after tracking loss. It cannot fuse GNSS, wheel odometry, or LiDAR.
- **Source**: Intermodalics blog
- **Confidence**: ✅ High
- **Related Dimension**: cuVSLAM on difficult terrain
## Fact #13

- **Statement**: cuVSLAM is benchmarked on KITTI (mostly urban/suburban driving) and EuRoC (indoor drone). Neither benchmark includes nadir agricultural terrain, flat fields, or uniform vegetation. No published results exist for these conditions.
- **Source**: cuVSLAM paper, Section 3
- **Confidence**: ✅ High
- **Related Dimension**: cuVSLAM on difficult terrain
## Fact #14

- **Statement**: cuVSLAM multi-stereo mode "significantly improves accuracy and robustness on challenging sequences compared to single stereo cameras" and is designed for featureless surfaces (narrow corridors, elevators). But our system uses a monocular camera only.
- **Source**: cuVSLAM paper, Section 2.2.2
- **Confidence**: ✅ High
- **Related Dimension**: cuVSLAM on difficult terrain
## Fact #15

- **Statement**: PFED achieves 97.15% Recall@1 on University-1652 at 251.5 FPS on AGX Orin with only 4.45G FLOPs. But this is image RETRIEVAL (which satellite tile matches), NOT pixel-level correspondence matching.
- **Source**: PFED paper (arXiv:2510.22582)
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matching alternatives
## Fact #16

- **Statement**: EfficientLoFTR is ~2.5x faster than LoFTR with higher accuracy. Semi-dense matcher, 15.05M params, with a TensorRT adaptation (LoFTR_TRT). Performs well on weak-texture areas where traditional methods fail. Designed for aerial imagery.
- **Source**: EfficientLoFTR paper (CVPR 2024); HuggingFace docs
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matching alternatives
## Fact #17

- **Statement**: A hierarchical AVL system (2025) uses a two-stage approach: DINOv2 for coarse retrieval + SuperPoint for fine matching. 64.5-95% success rate on real-world drone trajectories. Includes IMU-based prior correction and sliding-window map updates.
- **Source**: MDPI Remote Sensing 2025
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matching alternatives
## Fact #18

- **Statement**: STHN uses deep homography estimation for UAV geo-localization: it directly estimates the homography transform (no feature detection/matching/RANSAC). Achieves 4.24m MACE at 50m range. Designed for thermal imagery, but the architecture is modality-agnostic.
- **Source**: STHN paper (IEEE RA-L 2024)
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matching alternatives
## Fact #19

- **Statement**: For our nadir UAV → satellite matching, the cross-view gap is SMALL compared to typical cross-view problems (ground-to-satellite): both views are approximately top-down. The main challenges are season/lighting, resolution mismatch, and temporal changes. This means general-purpose matchers may work better than expected.
- **Source**: Analytical observation
- **Confidence**: ⚠️ Medium
- **Related Dimension**: Satellite matching alternatives
## Fact #20

- **Statement**: The LiteSAM paper benchmarked EfficientLoFTR (opt) on satellite-aerial data: 19.8% slower than LiteSAM (opt) on AGX Orin, with 2.4x more parameters, while achieving competitive accuracy. LiteSAM paper Tables 3/4 provide the direct comparison.
- **Source**: LiteSAM paper, Section 4.5
- **Confidence**: ✅ High
- **Related Dimension**: EfficientLoFTR vs LiteSAM
# Comparison Framework
## Selected Framework Type

Decision Support + Problem Diagnosis

## Selected Dimensions

1. Inference speed on Orin Nano Super
2. Accuracy for the target task
3. Cross-view robustness (satellite-aerial gap)
4. Implementation complexity / ecosystem maturity
5. Memory footprint
6. TensorRT optimization readiness
## Comparison 1: Visual Odometry — cuVSLAM vs SuperPoint+LightGlue

| Dimension | cuVSLAM v15.0.0 | SuperPoint + LightGlue (TRT) | Factual Basis |
|-----------|-----------------|------------------------------|---------------|
| Speed on Orin Nano | ~8.6ms/frame (116fps @ 720p) | Est. ~150-300ms/frame (SP ~40-60ms + LG ~100-200ms) | Fact #1, #2, #3 |
| VO accuracy (KITTI) | <1% trajectory error | ~1% odometric error (desktop) | Fact #1, #2 |
| VO accuracy (EuRoC) | <5cm position error | Not benchmarked | Fact #1 |
| IMU integration | Native mono+IMU mode, auto-fallback | None — must add custom IMU fusion | Fact #11 |
| Loop closure | Built-in | Not available | Source #2 |
| TensorRT ready | Native CUDA (no TensorRT build needed) | Requires ONNX export + TRT build | Fact #4 |
| Memory | ~200-300MB (est.) | SP ~50MB + LG ~50-100MB = ~100-150MB (est.) | Estimate |
| Implementation | pip install aarch64 wheel | Custom pipeline: SP export + LG export + matching + pose estimation | Source #10, Fact #4 |
| Maturity on Jetson | NVIDIA-maintained, production-ready | Community TRT plugins, limited Jetson benchmarks | Sources #4, #5 |
## Comparison 2: LiteSAM Speed at Different Resolutions

| Dimension | 1184px (paper default) | 1280px (user proposal) | 640px | 480px | Factual Basis |
|-----------|------------------------|------------------------|-------|-------|---------------|
| Tokens at 1/8 scale | ~21,904 | ~25,600 | ~6,400 | ~3,600 | Fact #7 |
| AGX Orin time | 497ms | Est. ~580ms (1.17x tokens) | Est. ~150ms | Est. ~90ms | Fact #5, #7 |
| Orin Nano Super time (est.) | ~1.5-2.0s | ~1.7-2.3s | ~450-600ms | ~270-360ms | Fact #5, #6 |
| Accuracy (RMSE@30) | 17.86m | Similar | Degraded | Significantly degraded | Fact #9 |
## Comparison 3: Satellite Matching — LiteSAM vs SP+LG vs XFeat

| Dimension | LiteSAM (opt) | SuperPoint+LightGlue | XFeat semi-dense | Factual Basis |
|-----------|---------------|----------------------|------------------|---------------|
| Cross-view accuracy | RMSE@30 = 17.86m (UAV-VisLoc) | Worse than LiteSAM (paper confirms) | Not benchmarked on UAV-VisLoc | Fact #8, #9 |
| Speed on Orin Nano (est.) | ~1.5-2s @ 1184px, ~270-360ms @ 480px | Est. ~100-200ms total | ~50-100ms | Fact #5, #2, existing draft |
| Cross-view robustness | Designed for the satellite-aerial gap | Sparse matcher; "lacks sufficient accuracy" for cross-view | General-purpose, less robust | Fact #8 |
| Parameters | 6.31M | SP ~1.3M + LG ~7M = ~8.3M | ~5M | Fact #5, existing draft |
| Approach | Semi-dense (coarse-to-fine, subpixel) | Sparse (detect → match → verify) | Semi-dense (detect → KNN → refine) | Existing draft |
# Reasoning Chain
## Dimension 1: SuperPoint+LightGlue for Visual Odometry

### Fact Confirmation

cuVSLAM achieves 116fps (~8.6ms/frame) on Orin Nano 8GB at 720p (Fact #1). SP+LG achieves ~10fps on KITTI on a desktop GPU (Fact #2). SuperPoint alone takes ~40ms on Jetson for 200 keypoints (Fact #3). LightGlue matching on a desktop GPU takes ~20-34ms for 274 keypoints (Fact #2).
### Extrapolation to Orin Nano Super

Estimating the SP+LG pipeline on Orin Nano Super:

- SuperPoint extraction (1024 keypoints, 720p): ~50-80ms (based on Fact #3, scaled for more keypoints)
- LightGlue matching (TensorRT FP16, 1024 keypoints): ~80-200ms (based on Source #4's 2-4x TensorRT speedup over PyTorch, with Orin Nano ~4-6x slower than an RTX 4080)
- Total SP+LG: ~130-280ms per frame

cuVSLAM: ~8.6ms per frame.

SP+LG would therefore be **15-33x slower** than cuVSLAM for visual odometry on Orin Nano Super.
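
The headline ratio follows directly from the per-stage estimates (a sketch; every figure below is an estimate quoted in this document, not a measurement):

```python
# Compare the estimated SP+LG frame latency against cuVSLAM on
# Orin Nano Super. All inputs are estimates from the text above.
CUVSLAM_MS = 8.6                 # ~116fps at 720p (Fact #1)
SP_MS = (50.0, 80.0)             # SuperPoint, ~1024 keypoints (estimate)
LG_MS = (80.0, 200.0)            # LightGlue TensorRT FP16 (estimate)

splg_lo = SP_MS[0] + LG_MS[0]    # low-end pipeline estimate
splg_hi = SP_MS[1] + LG_MS[1]    # high-end pipeline estimate
ratio_lo = splg_lo / CUVSLAM_MS
ratio_hi = splg_hi / CUVSLAM_MS
print(f"SP+LG est. {splg_lo:.0f}-{splg_hi:.0f}ms vs cuVSLAM {CUVSLAM_MS}ms "
      f"-> {ratio_lo:.0f}-{ratio_hi:.0f}x slower")
```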
### Additional Considerations

cuVSLAM includes native IMU integration, loop closure, and auto-fallback. SP+LG provides none of these — each would need a custom implementation, adding both development time and latency.

### Conclusion

**SP+LG is not viable as a cuVSLAM replacement for VO on Orin Nano Super.** cuVSLAM is purpose-built for Jetson and 15-33x faster. SP+LG's value lies in its accuracy for feature-matching tasks, not real-time VO on edge hardware.

### Confidence

✅ High — the performance gap is enormous and well supported by multiple sources.

---
## Dimension 2: LiteSAM Speed vs Image Resolution (the 1280px question)

### Fact Confirmation

LiteSAM (opt) achieves 497ms on AGX Orin at 1184px (Fact #5). AGX Orin is ~3-4x more powerful than Orin Nano Super (Fact #6). LiteSAM processes at 1/8 scale internally — coarse matching is O(N²) in the token count N, which itself scales with resolution² (Fact #7).
### Resolution Scaling Analysis

**1280px vs 1184px**: The token count increases from ~21,904 to ~25,600 (+17%). Compute increases ~17-37% (linear to quadratic, depending on the bottleneck). This makes the problem WORSE, not better.

**The user's likely intuition**: "If 6252×4168 camera images are huge, maybe LiteSAM is slow because we feed it those big images. What if we use 1280px?" But the solution draft already specifies resizing to 480-640px before feeding LiteSAM, and the 497ms AGX Orin benchmark was already at 1184px (the UAV-VisLoc benchmark resolution).

**The real bottleneck is hardware, not image size:**

- At 1184px on AGX Orin: 497ms → on Orin Nano Super: est. **~1.5-2.0s**
- At 1280px on Orin Nano Super: est. **~1.7-2.3s** (WORSE — more tokens)
- At 640px on Orin Nano Super: est. **~450-600ms** (borderline)
- At 480px on Orin Nano Super: est. **~270-360ms** (possibly within the 400ms budget)
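
These per-resolution estimates come from one datapoint plus two stated assumptions — cost roughly linear in coarse-token count, and the 3-4x hardware gap — which can be sketched as:

```python
# Extrapolate LiteSAM (opt) latency from the AGX Orin datapoint
# (497ms at 1184px, Fact #5) to Orin Nano Super. Assumes cost scales
# roughly linearly with token count and a 3-4x compute gap (Fact #6).
AGX_MS_AT_1184 = 497.0
HW_GAP = (3.0, 4.0)              # AGX Orin vs Orin Nano Super, approximate

def tokens(edge_px: int) -> int:
    return (edge_px // 8) ** 2   # coarse tokens at LiteSAM's 1/8 scale

def nano_estimate_ms(edge_px: int):
    scale = tokens(edge_px) / tokens(1184)
    return tuple(AGX_MS_AT_1184 * scale * gap for gap in HW_GAP)

for edge in (1280, 1184, 640, 480):
    lo, hi = nano_estimate_ms(edge)
    print(f"{edge}px: est. ~{lo:.0f}-{hi:.0f}ms on Orin Nano Super")
```

Linear-in-tokens is the optimistic end; the O(N²) coarse-matching term would make the 1280px case worse still.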
### Conclusion

**1280px would make LiteSAM SLOWER, not faster.** The paper benchmarked at 1184px, and the bottleneck is the hardware gap (AGX Orin 275 TOPS → Orin Nano Super 67 TOPS). To fit LiteSAM into the 400ms budget, resolution must drop to ~480px, which may significantly degrade cross-view matching accuracy. The original solution draft's approach (benchmark at 480px, abandon if too slow) remains correct.

### Confidence

✅ High — paper benchmarks plus hardware specs provide a strong basis.

---
## Dimension 3: SP+LG for Satellite Matching (alternative to LiteSAM)

### Fact Confirmation

The LiteSAM paper explicitly states that "SP+LG achieves the fastest inference speed but at the expense of accuracy" on satellite-aerial benchmarks (Fact #8). SP+LG is a sparse matcher, and the paper notes that sparse matchers "lack sufficient accuracy" for cross-view UAV-satellite matching due to texture-scarce regions. LiteSAM achieves RMSE@30 = 17.86m; SP+LG is worse (Fact #9).
### Speed Advantage of SP+LG

An SP+LG satellite-matching pipeline on Orin Nano Super:

- SuperPoint extraction: ~50-80ms × 2 images
- LightGlue matching: ~80-200ms
- Total: ~180-360ms

This is competitive with the 400ms budget, but accuracy is worse than LiteSAM's.
### Comparison with XFeat

XFeat semi-dense: ~50-100ms on Orin Nano Super (from the existing draft). XFeat is 2-4x faster than SP+LG and also performs semi-dense matching. For the satellite-matching role, XFeat is a better "fast fallback" than SP+LG.
### Conclusion

**SP+LG is not recommended for satellite matching.** It is slower than XFeat and less accurate than LiteSAM for cross-view matching. XFeat remains the better fallback. SP+LG could serve as a third-tier fallback, but the added complexity is not justified given XFeat's advantages.

### Confidence

✅ High — based on a direct comparison in the LiteSAM paper.

---
## Dimension 4: Other Weak Points in solution_draft01

### cuVSLAM Nadir Camera Concern

The solution correctly flags cuVSLAM's nadir-only camera as untested. cuVSLAM was designed for robotics with forward-facing cameras; a nadir UAV camera looking straight down at terrain has different motion characteristics. However, cuVSLAM supports arbitrary camera configurations, and IMU mode should compensate. **Risk is MEDIUM; mitigation is adequate** (XFeat fallback).
### Memory Budget Gap

The solution estimates ~1.9-2.4GB total. This looks optimistic if cuVSLAM must maintain a map for loop closure, since the cuVSLAM map grows over time. For a 3000-frame flight (~16 min at 3fps), map memory could grow to 500MB-1GB. **Risk: memory pressure late in flight.** Mitigation: configure cuVSLAM map pruning and limit map size.
### Tile Search Strategy Underspecified

The solution mentions GeoHash-indexed tiles but does not detail how the system determines which tile to match against when the ESKF position has high uncertainty (e.g., after a VO failure). The expanded search (±1km) could require loading 10-20 tiles, which is slow from storage.
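
A quick sanity check on the tile-count concern (the tile edge lengths are illustrative assumptions; the draft does not specify one):

```python
import math

# Worst-case number of tiles a +/-1km square search window can touch,
# for a given tile edge length in metres. Tile sizes are assumptions.
def tiles_in_window(half_window_m: float, tile_edge_m: float) -> int:
    # +1 per axis because the window may straddle tile boundaries
    per_axis = math.ceil(2 * half_window_m / tile_edge_m) + 1
    return per_axis ** 2

for edge in (1000.0, 500.0, 250.0):
    print(f"{edge:.0f}m tiles: up to {tiles_in_window(1000.0, edge)} tiles")
```

With ~500-1000m tiles the 10-20 tile figure is plausible; smaller tiles inflate the count quickly, so tile sizing belongs in the spec.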
### Confidence

⚠️ Medium — these are analytical observations, not empirically verified.
# Validation Log
## Validation Scenario 1: SP+LG for VO during Normal Flight

A UAV flies straight at 3fps. Each frame needs a VO result within 400ms.
### Expected Based on Conclusions

cuVSLAM: processes each frame in ~8.6ms, leaving ~391ms for satellite matching and fusion; the VO result is delivered immediately via SSE.

SP+LG: processes each frame in ~130-280ms, leaving only ~120-270ms, and may contend with satellite matching for CUDA resources.
### Actual Validation

cuVSLAM is clearly superior. SP+LG offers no advantage here — cuVSLAM is 15-33x faster AND includes IMU fallback. SP+LG would require building a custom VO pipeline around a feature matcher, whereas cuVSLAM is a complete VO solution.
### Counterexamples

If cuVSLAM fails on the nadir camera (its main risk), SP+LG could serve as a fallback VO method. But XFeat frame-to-frame (~30-50ms) is already identified as the cuVSLAM fallback and is 3-6x faster than SP+LG.
## Validation Scenario 2: LiteSAM at 1280px on Orin Nano Super

A keyframe needs satellite matching. The image is resized to 1280px for LiteSAM.
### Expected Based on Conclusions

LiteSAM at 1280px on Orin Nano Super: ~1.7-2.3s. This is 4-6x over the 400ms budget. Even running asynchronously, satellite corrections would arrive 5-7 frames late.
### Actual Validation

1280px is LARGER than the paper's 1184px benchmark resolution. The user likely assumed we feed the full camera image (6252×4168) to LiteSAM, causing the slowness, but the solution already downsamples. The bottleneck is the hardware gap (Orin Nano Super has ~25% of AGX Orin's compute).
### Counterexamples

If LiteSAM's TensorRT FP16 engine with reparameterized MobileOne optimizes better than the paper's AMP benchmark (which uses PyTorch, not TensorRT), speed could improve 2-3x. At 480px with TensorRT FP16: potentially ~90-180ms on Orin Nano Super. This is worth benchmarking.
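
If that counterexample holds, the arithmetic is simple (the speedup factors are speculative until measured on the device):

```python
# Shrink the 480px Orin Nano Super estimate (~270-360ms, extrapolated
# from the paper's PyTorch AMP numbers) by a speculative 2-3x
# TensorRT FP16 speedup.
base_lo, base_hi = 270.0, 360.0
for speedup in (2.0, 3.0):
    print(f"{speedup:.0f}x speedup: est. ~{base_lo / speedup:.0f}"
          f"-{base_hi / speedup:.0f}ms")
```

The combined envelope spans ~90-180ms, which is where the "worth benchmarking" figure comes from.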
## Validation Scenario 3: SP+LG as Satellite Matcher After LiteSAM Abandonment

LiteSAM fails the benchmark. Instead of XFeat, we try SP+LG for satellite matching.
### Expected Based on Conclusions

SP+LG: ~180-360ms on Orin Nano Super, with accuracy worse than LiteSAM for cross-view matching.

XFeat: ~50-100ms. Accuracy is unproven on cross-view data, but it is a general-purpose semi-dense matcher.
### Actual Validation

SP+LG is 2-4x slower than XFeat, and the LiteSAM paper confirms worse accuracy for satellite-aerial matching. XFeat's semi-dense approach is better suited to the texture-scarce regions common in UAV imagery; SP+LG's sparse keypoint detection may fail on agricultural fields or water bodies.
### Counterexamples

SP+LG could outperform XFeat in high-texture urban areas where sparse features are abundant. But the operational region (eastern Ukraine) is primarily agricultural, making this advantage unlikely to matter.
## Review Checklist

- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable/verifiable
- [ ] Note: Orin Nano Super estimates are extrapolated from AGX Orin data using the 3-4x compute ratio. Day-one benchmarking remains essential.
## Conclusions Requiring Revision

None — the original solution draft's architecture (cuVSLAM for VO, benchmark-driven LiteSAM/XFeat for satellite matching) is confirmed sound. SP+LG is not recommended for either role on this hardware.