add clarification to research methodology by including a step for solution comparison and user consultation

Oleksandr Bezdieniezhnykh
2026-03-17 18:43:57 +02:00
parent d764250f9a
commit b419e2c04a
35 changed files with 6030 additions and 0 deletions
@@ -0,0 +1,50 @@
# Question Decomposition
## Original Question
Should LiteSAM replace the current SuperPoint+LightGlue satellite fine matching stage in the GPS-denied UAV navigation pipeline, and how does it compare to EfficientLoFTR?
## Active Mode
Mode B — Solution Assessment of solution_draft03.md, focused on satellite geo-referencing component.
## Problem Context Summary
- UAV photos (FullHD to 6252×4168) from a fixed-wing UAV, camera pointing down
- Eastern/southern Ukraine, no IMU data, altitude ≤1km
- GPS of first image known, need to determine GPS for all subsequent images
- Two-stage satellite matching: DINOv2 ViT-S/14 coarse retrieval + SuperPoint+LightGlue ONNX FP16 fine matching
- Hardware constraint: RTX 2060 (6GB VRAM), 16GB RAM
- Processing target: <5s per image (current estimate ~230-270ms)
## Question Type
Decision Support — weighing trade-offs between matching approaches for the satellite geo-referencing component.
## Research Subject Boundary
| Dimension | Boundary |
|-----------|----------|
| Population | Satellite-to-aerial (nadir UAV) image matching at 100-1000m altitude |
| Geography | Rural/agricultural terrain, Eastern Ukraine |
| Timeframe | 2024-2026, focusing on current state-of-the-art |
| Level | Production deployment on RTX 2060 GPU |
## Decomposed Sub-questions
1. How does LiteSAM's accuracy compare to SuperPoint+LightGlue and EfficientLoFTR on satellite-aerial benchmarks?
2. What is the estimated inference time on RTX 2060 for each approach?
3. What is the VRAM footprint of each approach?
4. How mature is each codebase for production deployment (ONNX, TensorRT, community)?
5. How do these approaches handle rotation variance (critical for segment starts)?
6. Can LiteSAM/EfficientLoFTR coexist with the DINOv2 coarse retrieval stage?
7. What are the risks of adopting LiteSAM given its early-stage codebase?
## Timeliness Sensitivity Assessment
- **Research Topic**: Feature matching for satellite-UAV localization
- **Sensitivity Level**: 🟠 High
- **Rationale**: Active research area with new methods published monthly; model capabilities and benchmarks change rapidly
- **Source Time Window**: 12 months
- **Priority official sources**:
1. LiteSAM paper (Oct 2025) + GitHub repo
2. EfficientLoFTR paper (CVPR 2024) + GitHub repo
3. LightGlue-ONNX GitHub repo
- **Key version information**:
- LiteSAM: v1 (initial release, 4 commits)
- EfficientLoFTR: CVPR 2024, stable
- LightGlue-ONNX: active development
@@ -0,0 +1,97 @@
# Source Registry
## Source #1
- **Title**: LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery
- **Link**: https://www.mdpi.com/2072-4292/17/19/3349
- **Tier**: L1
- **Publication Date**: 2025-10-01
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: LiteSAM v1 (6.31M params)
- **Target Audience**: UAV satellite-aerial matching at 100-2000m altitude
- **Research Boundary Match**: ✅ Full match
- **Summary**: Proposes LiteSAM, a lightweight semi-dense matcher achieving RMSE@30=17.86m on UAV-VisLoc with 6.31M params and 83.79ms on RTX 3090. Outperforms EfficientLoFTR in hit rate while using 2.4x fewer parameters.
- **Related Sub-question**: 1, 2, 3
## Source #2
- **Title**: LiteSAM GitHub Repository
- **Link**: https://github.com/boyagesmile/LiteSAM
- **Tier**: L1
- **Publication Date**: 2025-09 (estimated from paper)
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: 4 commits, 5 stars, 0 forks
- **Target Audience**: Researchers and developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Official code repository. Pretrained weights on Google Drive. Built upon EfficientLoFTR. PyTorch only, no ONNX/TensorRT support. Very early-stage.
- **Related Sub-question**: 4, 7
## Source #3
- **Title**: Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed (CVPR 2024)
- **Link**: https://github.com/zju3dv/EfficientLoFTR
- **Tier**: L1
- **Publication Date**: 2024-03
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: CVPR 2024, 964 stars
- **Target Audience**: Computer vision researchers and practitioners
- **Research Boundary Match**: ✅ Full match
- **Summary**: Semi-dense matcher, 15.05M params, ~2.5x faster than LoFTR. TensorRT adaptation exists. HuggingFace integration. ONNX export available. Achieves 27ms at 640×480 with FP16.
- **Related Sub-question**: 1, 2, 3, 4
## Source #4
- **Title**: LightGlue-ONNX
- **Link**: https://github.com/fabio-sim/LightGlue-ONNX
- **Tier**: L1
- **Publication Date**: 2023-ongoing
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: Active development, FP16 on Turing GPUs
- **Target Audience**: Developers deploying LightGlue on edge/production
- **Research Boundary Match**: ✅ Full match
- **Summary**: ONNX/TensorRT export for LightGlue. 2-4x speedup. FP16 works on RTX 2060 (Turing). Well-tested production-ready path.
- **Related Sub-question**: 2, 4
## Source #5
- **Title**: LoFTR_TRT — TensorRT adaptation of LoFTR
- **Link**: https://github.com/Kolkir/LoFTR_TRT
- **Tier**: L2
- **Publication Date**: 2022-ongoing
- **Timeliness Status**: ⚠️ Based on original LoFTR, not EfficientLoFTR
- **Version Info**: 105+ stars
- **Target Audience**: Developers deploying LoFTR on embedded/edge
- **Research Boundary Match**: ⚠️ Partial overlap — LoFTR architecture, not EfficientLoFTR
- **Summary**: Demonstrates LoFTR family can be exported to ONNX/TensorRT. Knowledge distillation approach for coarse-only variant. Applicable pattern for EfficientLoFTR.
- **Related Sub-question**: 4
## Source #6
- **Title**: EfficientLoFTR HuggingFace Model
- **Link**: https://huggingface.co/zju-community/efficientloftr
- **Tier**: L1
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: Integrated with HuggingFace Transformers
- **Target Audience**: ML practitioners
- **Research Boundary Match**: ✅ Full match
- **Summary**: Official HuggingFace integration. 27ms at 640×480 with mixed precision. Surpasses SuperPoint+LightGlue in speed and accuracy.
- **Related Sub-question**: 2, 3, 4
## Source #7
- **Title**: DALGlue — Efficient image matching for UAV visual navigation
- **Link**: https://www.nature.com/articles/s41598-025-21602-5
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: 2025 publication
- **Target Audience**: UAV visual navigation researchers
- **Research Boundary Match**: ✅ Full match
- **Summary**: DALGlue outperforms LightGlue by 11.8% MMA on MegaDepth. Uses dual-tree complex wavelet transform. AUC@5°/10°/20° = 57.01/73.00/84.11. Alternative sparse matcher.
- **Related Sub-question**: 1
## Source #8
- **Title**: LightGlue rotation invariance issue #64
- **Link**: https://github.com/cvg/LightGlue/issues/64
- **Tier**: L2
- **Publication Date**: 2023
- **Timeliness Status**: ✅ Still relevant
- **Version Info**: Confirmed limitation
- **Target Audience**: LightGlue users
- **Research Boundary Match**: ✅ Full match
- **Summary**: LightGlue confirmed NOT rotation-invariant. Same limitation applies to all LoFTR-family matchers including LiteSAM and EfficientLoFTR.
- **Related Sub-question**: 5
@@ -0,0 +1,129 @@
# Fact Cards
## Fact #1
- **Statement**: LiteSAM achieves RMSE@30 = 17.86m on UAV-VisLoc dataset with average hit rates of 66.66% (Easy), 65.37% (Moderate), 61.65% (Hard) at 83.79ms inference on RTX 3090 at 1184×1184 resolution.
- **Source**: Source #1 (Table 3)
- **Phase**: Assessment
- **Target Audience**: UAV satellite-aerial matching
- **Confidence**: ✅ High
- **Related Dimension**: Accuracy, Speed
## Fact #2
- **Statement**: LiteSAM opt. (without dual softmax) achieves 60.97ms on RTX 3090 with hit rates of 65.09% (Easy), 61.34% (Moderate), 46.16% (Hard). The accuracy drop in Hard mode is significant: from 61.65% to 46.16%.
- **Source**: Source #1 (Table 3)
- **Phase**: Assessment
- **Target Audience**: UAV satellite-aerial matching
- **Confidence**: ✅ High
- **Related Dimension**: Speed vs accuracy trade-off
## Fact #3
- **Statement**: EfficientLoFTR achieves RMSE@30 = 17.87m on UAV-VisLoc with hit rates of 65.78% (Easy), 63.62% (Moderate), 57.65% (Hard) at 112.60ms on RTX 3090 at 1184×1184.
- **Source**: Source #1 (Table 3)
- **Phase**: Assessment
- **Target Audience**: UAV satellite-aerial matching
- **Confidence**: ✅ High
- **Related Dimension**: Accuracy, Speed
## Fact #4
- **Statement**: SuperPoint+LightGlue achieves hit rates of 60.34% (Easy), 59.57% (Moderate), 54.32% (Hard) at 44.15ms on RTX 3090 on UAV-VisLoc. RMSE@30 = 17.81m.
- **Source**: Source #1 (Table 3)
- **Phase**: Assessment
- **Target Audience**: UAV satellite-aerial matching
- **Confidence**: ✅ High
- **Related Dimension**: Accuracy, Speed
## Fact #5
- **Statement**: LiteSAM has 6.31M parameters (2.4x fewer than EfficientLoFTR's 15.05M). FLOPs: LiteSAM 588.51G vs EfficientLoFTR 1036.61G on self-made dataset (1184×1184).
- **Source**: Source #1 (Table 4)
- **Phase**: Assessment
- **Target Audience**: Resource-constrained deployment
- **Confidence**: ✅ High
- **Related Dimension**: VRAM, Computational cost
## Fact #6
- **Statement**: LiteSAM on NVIDIA Jetson AGX Orin (50W) achieves 497.49ms (opt.) at 1184×1184 with AMP. This is 19.8% faster than EfficientLoFTR opt.
- **Source**: Source #1 (Table 10)
- **Phase**: Assessment
- **Target Audience**: Edge device deployment
- **Confidence**: ✅ High
- **Related Dimension**: Edge performance
## Fact #7
- **Statement**: LiteSAM GitHub repository has 5 stars, 0 forks, 4 commits. No releases published. Pretrained weights available via Google Drive link only. No ONNX/TensorRT export support.
- **Source**: Source #2
- **Phase**: Assessment
- **Target Audience**: Production deployment
- **Confidence**: ✅ High
- **Related Dimension**: Maturity, Deployment readiness
## Fact #8
- **Statement**: LiteSAM is built upon EfficientLoFTR (acknowledged in GitHub README). Uses the same coarse-to-fine architecture with MobileOne backbone replacing RepVGG, TAIFormer replacing EfficientLoFTR attention, and MinGRU replacing heatmap-based refinement.
- **Source**: Source #1, Source #2
- **Phase**: Assessment
- **Target Audience**: Architecture evaluation
- **Confidence**: ✅ High
- **Related Dimension**: Architecture, Risk
## Fact #9
- **Statement**: EfficientLoFTR has 964 stars on GitHub, ONNX export available, TensorRT adaptation exists (LoFTR_TRT), HuggingFace integration. Achieves 27ms at 640×480 with mixed precision.
- **Source**: Source #3, Source #5, Source #6
- **Phase**: Assessment
- **Target Audience**: Production deployment
- **Confidence**: ✅ High
- **Related Dimension**: Maturity, Deployment readiness
## Fact #10
- **Statement**: Neither LiteSAM nor EfficientLoFTR is rotation-invariant. This is a known limitation shared with all LoFTR-family matchers. LightGlue is also confirmed not rotation-invariant (GitHub issue #64). Only SIFT provides rotation invariance.
- **Source**: Source #1 (Section 6 Discussion), Source #8
- **Phase**: Assessment
- **Target Audience**: Rotation handling
- **Confidence**: ✅ High
- **Related Dimension**: Rotation handling
## Fact #11
- **Statement**: LiteSAM and EfficientLoFTR are end-to-end matchers that take an image pair and output correspondences. They do NOT perform image retrieval. The DINOv2 coarse retrieval stage is still required to select candidate satellite tiles.
- **Source**: Source #1 (Section 3)
- **Phase**: Assessment
- **Target Audience**: Pipeline architecture
- **Confidence**: ✅ High
- **Related Dimension**: Pipeline integration
## Fact #12
- **Statement**: LiteSAM trained only on MegaDepth (natural image dataset) and generalizes to satellite-aerial without fine-tuning on target domain. However, the paper notes limitations with "significant viewpoint changes or varying resolutions" (Section 6).
- **Source**: Source #1 (Section 6)
- **Phase**: Assessment
- **Target Audience**: Domain generalization
- **Confidence**: ✅ High
- **Related Dimension**: Generalization, Risk
## Fact #13
- **Statement**: On the self-made dataset (Harbin/Qiqihar, 100-500m altitude), LiteSAM achieves RMSE@30=6.12m, HR=92.09/87.88/77.30, 85.31ms. EfficientLoFTR: RMSE=7.28m, HR=90.03/79.79/61.84, 120.72ms. SP+LG: RMSE=6.76m, HR=78.85/70.03/58.31, 49.49ms.
- **Source**: Source #1 (Table 4)
- **Phase**: Assessment
- **Target Audience**: UAV localization at low altitude
- **Confidence**: ✅ High
- **Related Dimension**: Accuracy at low altitude
## Fact #14
- **Statement**: RTX 2060 (Turing) has approximately 40-60% of RTX 3090 deep learning throughput. Estimated LiteSAM inference on RTX 2060: ~140-210ms at 1184×1184. EfficientLoFTR: ~190-280ms. Current SP+LG ONNX FP16: ~130-180ms (per draft03 spec).
- **Source**: General GPU performance knowledge + Source #1 timings
- **Phase**: Assessment
- **Target Audience**: Hardware constraint evaluation
- **Confidence**: ⚠️ Medium (estimates, not measured)
- **Related Dimension**: RTX 2060 performance
## Fact #15
- **Statement**: LiteSAM uses MobileOne-S3 backbone with only 0.81M params after removing classification head. Total model: 6.31M params. VRAM during inference estimated at ~300-500MB for model + feature maps at 1184×1184 resolution.
- **Source**: Source #1 (Section 3.1)
- **Phase**: Assessment
- **Target Audience**: VRAM budget
- **Confidence**: ⚠️ Medium (VRAM estimated, model params confirmed)
- **Related Dimension**: VRAM budget
## Fact #16
- **Statement**: LiteSAM on self-made dataset achieves significantly better Hard mode hit rate (77.30%) compared to EfficientLoFTR (61.84%) and SP+LG (58.31%). This suggests better robustness in difficult matching scenarios.
- **Source**: Source #1 (Table 4)
- **Phase**: Assessment
- **Target Audience**: Difficult matching scenarios
- **Confidence**: ✅ High
- **Related Dimension**: Robustness
@@ -0,0 +1,39 @@
# Comparison Framework
## Selected Framework Type
Decision Support — comparing three satellite fine matching approaches for production deployment.
## Selected Dimensions
1. Accuracy (RMSE, Hit Rate)
2. Inference speed (RTX 3090 measured, RTX 2060 estimated)
3. VRAM footprint
4. Rotation handling
5. Codebase maturity & deployment readiness
6. Pipeline integration complexity
7. Risk assessment
8. Cost of adoption
## Comparison Table
| Dimension | SuperPoint+LightGlue ONNX FP16 (current) | LiteSAM | EfficientLoFTR |
|-----------|-------------------------------------------|---------|----------------|
| **UAV-VisLoc RMSE@30** | 17.81m | 17.86m | 17.87m |
| **UAV-VisLoc HR Easy/Mod/Hard** | 60.34/59.57/54.32% | 66.66/65.37/61.65% | 65.78/63.62/57.65% |
| **Self-made RMSE@30** | 6.76m | 6.12m | 7.28m |
| **Self-made HR Easy/Mod/Hard** | 78.85/70.03/58.31% | 92.09/87.88/77.30% | 90.03/79.79/61.84% |
| **RTX 3090 time (1184×1184)** | ~44ms (sparse) | 83.79ms | 112.60ms |
| **RTX 2060 est. time** | ~130-180ms (ONNX FP16) | ~140-210ms (PyTorch) | ~190-280ms (PyTorch) |
| **Parameters** | ~12M (SP) + ~12M (LG) | 6.31M | 15.05M |
| **VRAM (model)** | ~900MB (SP+LG) | ~300-500MB (est.) | ~600-800MB (est.) |
| **Rotation invariant** | No (SIFT fallback) | No | No |
| **ONNX support** | ✅ LightGlue-ONNX | ❌ None | ⚠️ LoFTR_TRT pattern |
| **TensorRT support** | ✅ Via LightGlue-ONNX | ❌ None | ⚠️ LoFTR_TRT available |
| **FP16 on Turing** | ✅ Verified | ❌ Not tested | ⚠️ Not verified |
| **GitHub stars** | SP: 4.5K, LG: 3.2K | 5 | 964 |
| **Community/Issues** | Active, well-supported | None | Active |
| **HuggingFace** | ✅ | ❌ | ✅ |
| **Training data** | MegaDepth (pretrained) | MegaDepth (pretrained) | MegaDepth (pretrained) |
| **Pipeline change** | None (current) | Replace Stage 2 fine matching | Replace Stage 2 fine matching |
| **Matching type** | Sparse (detect+match) | Semi-dense (end-to-end) | Semi-dense (end-to-end) |
| **Subpixel accuracy** | Via LightGlue refinement | ✅ MinGRU subpixel | ✅ Two-stage correlation |
| **Factual Basis** | Facts #4, #9, #10 | Facts #1, #2, #5-8, #10-16 | Facts #3, #5, #9, #10 |
@@ -0,0 +1,129 @@
# Reasoning Chain
## Dimension 1: Accuracy
### Fact Confirmation
On UAV-VisLoc (Fact #1, #3, #4): LiteSAM leads in hit rate across all difficulties (66.66/65.37/61.65) vs EfficientLoFTR (65.78/63.62/57.65) vs SP+LG (60.34/59.57/54.32). RMSE values are nearly identical (~17.8m for all three).
On self-made dataset at lower altitude (Fact #13): LiteSAM shows a larger advantage — RMSE 6.12m vs SP+LG 6.76m vs EfficientLoFTR 7.28m. Hit rates: LiteSAM 77.30% (Hard) vs EfficientLoFTR 61.84% vs SP+LG 58.31%.
### Reference Comparison
LiteSAM's advantage over SP+LG in Hard mode: +7.33pp on UAV-VisLoc, +18.99pp on self-made. This is significant — Hard mode simulates large search offsets (300-600m) which matches our scenario where VO drift can reach 100-200m from true position.
### Conclusion
LiteSAM offers meaningfully higher satellite matching success rates, especially in difficult conditions (large search offset from VO drift). The RMSE difference is negligible — all methods achieve similar precision when they succeed. The key differentiator is HOW OFTEN they succeed (hit rate), where LiteSAM leads significantly in Hard mode.
### Confidence
✅ High — numbers from the same paper under identical conditions.
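The percentage-point gaps cited in Dimension 1 can be reproduced directly from the fact-card numbers; a minimal sketch:

```python
# Hard-mode hit rates from Source #1 (Tables 3 and 4), in percent.
hard_hr = {
    "UAV-VisLoc": {"LiteSAM": 61.65, "EfficientLoFTR": 57.65, "SP+LG": 54.32},
    "self-made":  {"LiteSAM": 77.30, "EfficientLoFTR": 61.84, "SP+LG": 58.31},
}

def pp_advantage(dataset: str, a: str = "LiteSAM", b: str = "SP+LG") -> float:
    """Percentage-point advantage of matcher a over matcher b in Hard mode."""
    rates = hard_hr[dataset]
    return round(rates[a] - rates[b], 2)

print(pp_advantage("UAV-VisLoc"))  # 7.33
print(pp_advantage("self-made"))   # 18.99
```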
---
## Dimension 2: Inference Speed on RTX 2060
### Fact Confirmation
RTX 3090 measured (Fact #1, #3, #4): SP+LG 44.15ms, LiteSAM 83.79ms, EfficientLoFTR 112.60ms.
RTX 2060 estimates (Fact #14): RTX 2060 Turing has ~40-60% of RTX 3090 throughput. BUT: SP+LG uses ONNX FP16 (optimized for Turing), while LiteSAM/EfficientLoFTR would run as PyTorch models without ONNX.
### Reference Comparison
Current solution spec: SP+LG ONNX FP16 ~130-180ms total on RTX 2060 (SuperPoint 80ms + LightGlue 50-100ms).
LiteSAM PyTorch on RTX 2060: ~140-210ms estimated (no ONNX optimization available).
EfficientLoFTR PyTorch on RTX 2060: ~190-280ms estimated.
Critical: without ONNX, LiteSAM on the RTX 2060 would be roughly comparable to the current SP+LG with ONNX. If LiteSAM gained ONNX/FP16 support, it could be faster, given 2.4x fewer parameters.
### Conclusion
Speed-wise, LiteSAM and the current SP+LG ONNX solution are roughly comparable on RTX 2060 (~140-210ms vs ~130-180ms). EfficientLoFTR is slower. All are well within the 5s budget. The lack of ONNX for LiteSAM is a deployment concern but not a blocking performance issue.
### Confidence
⚠️ Medium — RTX 2060 numbers are estimates; actual ONNX vs PyTorch performance gap varies.
---
## Dimension 3: VRAM Footprint
### Fact Confirmation
Current pipeline peak (Fact #5, draft03): XFeat 200MB + DINOv2-S 300MB + SuperPoint 400MB + LightGlue-ONNX 500MB = ~1.4GB peak.
LiteSAM: 6.31M params → ~25MB model weights. Feature maps at 1184×1184 at 1/8 scale plus multi-scale features → estimated ~300-500MB total.
### Reference Comparison
With LiteSAM replacing SP+LG: XFeat 200MB + DINOv2-S 300MB + LiteSAM ~400MB = ~900MB peak.
Savings: ~500MB VRAM, well within the 6GB RTX 2060 budget.
### Conclusion
LiteSAM would significantly reduce VRAM usage compared to current SP+LG approach. Both approaches fit within the 6GB RTX 2060 budget, but LiteSAM leaves more headroom.
### Confidence
⚠️ Medium — LiteSAM VRAM not officially measured, estimated from params and architecture.
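The peak figures are straight sums of the per-component estimates (all values estimated, not measured; LiteSAM taken at the midpoint of the 300-500MB range from Fact #15):

```python
# Estimated per-model VRAM in MB (draft03 spec + Fact #15 estimate).
current  = {"XFeat": 200, "DINOv2-S": 300, "SuperPoint": 400, "LightGlue-ONNX": 500}
proposed = {"XFeat": 200, "DINOv2-S": 300, "LiteSAM": 400}  # LiteSAM mid-estimate

peak_current  = sum(current.values())
peak_proposed = sum(proposed.values())
print(peak_current, peak_proposed, peak_current - peak_proposed)  # 1400 900 500
```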
---
## Dimension 4: Rotation Handling
### Fact Confirmation
None of the three approaches (SP+LG, LiteSAM, EfficientLoFTR) are rotation-invariant (Fact #10). The current solution handles this with: (1) 4-rotation retry at segment start, (2) heading-based rectification during flight, (3) SIFT+LightGlue fallback.
### Reference Comparison
LiteSAM or EfficientLoFTR would require the same rotation strategy. For the 4-rotation retry: 4 × LiteSAM = ~560-840ms (RTX 2060 est.) vs 4 × SP+LG = ~520-720ms. Acceptable since this only happens at segment starts.
SIFT+LightGlue fallback is still needed regardless. This fallback uses SIFT (rotation-invariant detector) + LightGlue matcher, which is independent of the primary matcher choice.
### Conclusion
Rotation handling is NOT a differentiator. All three approaches need the same rotation strategy. The SIFT+LightGlue fallback must be retained regardless.
### Confidence
✅ High — rotation invariance is a well-understood property of these architectures.
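The 4-rotation retry at segment start can be sketched as a best-of loop; `match_fn` and the `min_inliers` threshold are hypothetical stand-ins for whichever fine matcher is plugged in, not an API from any of the cited repos:

```python
import numpy as np

def match_with_rotation_retry(uav_img, sat_tile, match_fn, min_inliers=30):
    """Retry a non-rotation-invariant matcher at 0/90/180/270 degrees and keep
    the rotation yielding the most correspondences (hypothetical interface)."""
    best_angle, best_matches = 0, []
    for k in range(4):  # k * 90 degrees
        matches = match_fn(np.rot90(uav_img, k), sat_tile)
        if len(matches) > len(best_matches):
            best_angle, best_matches = k * 90, matches
    if len(best_matches) < min_inliers:
        return None  # hand off to the SIFT+LightGlue rotation-invariant fallback
    return best_angle, best_matches
```

Since this only runs at segment starts, the 4x cost (~560-840ms for LiteSAM on RTX 2060 estimates) stays well within the 5s budget.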
---
## Dimension 5: Codebase Maturity & Deployment Readiness
### Fact Confirmation
LiteSAM (Fact #7): 5 stars, 0 forks, 4 commits, no releases, no ONNX, no TensorRT, PyTorch only, weights on Google Drive, built upon EfficientLoFTR.
EfficientLoFTR (Fact #9): 964 stars, HuggingFace, TensorRT pattern exists, CVPR 2024.
SP+LG: SuperPoint 4.5K+ stars, LightGlue 3.2K+ stars, LightGlue-ONNX dedicated project with FP16/Turing support.
### Reference Comparison
SP+LG is the most production-ready by far. EfficientLoFTR is moderately mature. LiteSAM is academic prototype quality.
### Conclusion
LiteSAM's codebase immaturity is a significant deployment risk. No ONNX/TensorRT support means running PyTorch in production, which is heavier and harder to optimize. If LiteSAM's core improvement is TAIFormer + MinGRU on top of EfficientLoFTR, the safer path might be to use EfficientLoFTR (which has deployment tooling) and accept slightly lower accuracy.
### Confidence
✅ High — directly observable from repositories.
---
## Dimension 6: Pipeline Integration
### Fact Confirmation
All three approaches are fine matchers, not retrievers (Fact #11). DINOv2 coarse retrieval remains necessary. The integration point is Stage 2 of the satellite matching pipeline.
### Reference Comparison
SP+LG: Modular — can independently update detector (SuperPoint) or matcher (LightGlue). Features are reusable.
LiteSAM/EfficientLoFTR: End-to-end — takes image pair, outputs correspondences. Less modular but fewer integration points.
Current pipeline caches SuperPoint features for satellite tiles. With LiteSAM/EfficientLoFTR, caching doesn't apply the same way — feature extraction is coupled with matching.
### Conclusion
Switching to LiteSAM/EfficientLoFTR simplifies the pipeline (one model instead of two) but loses modularity and changes the caching strategy. SuperPoint tile features can no longer be pre-computed independently. However, LiteSAM/EfficientLoFTR can still cache one side's features internally if adapted.
### Confidence
✅ High — architectural analysis.
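The caching difference can be made concrete with stubbed call patterns; all function names below are illustrative placeholders, not real APIs:

```python
# Hypothetical sketch -- extractor/matcher names are placeholders, not real APIs.

def superpoint_extract(img):             # stub detector: returns "features"
    return {"kpts": img}

def lightglue_match(feats_a, feats_b):   # stub sparse matcher
    return [(feats_a["kpts"], feats_b["kpts"])]

def litesam_match(img_a, img_b):         # stub end-to-end semi-dense matcher
    return [(img_a, img_b)]

# Modular (current SP+LG): satellite tile features are precomputed ONCE.
tiles = {"tile_001": "sat_a", "tile_002": "sat_b"}
tile_features = {tid: superpoint_extract(t) for tid, t in tiles.items()}

def match_sparse(uav_img, tile_id):
    query = superpoint_extract(uav_img)  # only the query side runs per frame
    return lightglue_match(query, tile_features[tile_id])

# End-to-end (LiteSAM / EfficientLoFTR): extraction is coupled with matching,
# so the per-tile feature cache above no longer applies as-is.
def match_semidense(uav_img, tile_img):
    return litesam_match(uav_img, tile_img)
```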
---
## Dimension 7: Risk Assessment
### Fact Confirmation
LiteSAM risks (Fact #7, #12): immature codebase, no community support, no deployment tooling, single paper, acknowledges limitations with viewpoint changes.
EfficientLoFTR: proven at CVPR 2024, active community, but also not trivially deployable on RTX 2060 without ONNX work.
SP+LG: battle-tested, ONNX FP16 verified on Turing, large community.
### Conclusion
Adopting LiteSAM carries the highest risk: single point of failure on an immature codebase. If a bug is found or the approach doesn't work on our terrain data, there's no community to help. EfficientLoFTR is a safer "upgrade" option. The current SP+LG approach carries the lowest risk.
**Recommended strategy**: Keep SP+LG as the primary matcher (low risk, proven). Evaluate LiteSAM and EfficientLoFTR as experimental alternatives during implementation — run comparative benchmarks on actual flight data. If LiteSAM delivers on its accuracy promises with our data, consider adopting it as the primary matcher after it matures.
### Confidence
✅ High — risk is based on observable maturity indicators.
@@ -0,0 +1,67 @@
# Validation Log
## Validation Scenario
UAV flight over Eastern Ukraine agricultural terrain. 1000 images, FullHD resolution, 300m altitude. Several sharp turns creating 3 route segments. Google Maps satellite tiles at zoom 18 (~0.4m/px). VO drift reaches 150m in one segment before satellite anchor. RTX 2060 with 6GB VRAM.
## Expected Based on Conclusions
### Scenario A: Keep current SP+LG
- Fine matching succeeds ~55-60% of frames in Hard conditions (VO drift >150m)
- Processing: ~130-180ms per satellite match attempt on RTX 2060
- VRAM: ~1.4GB peak, well within budget
- Rotation at segment start: 4-retry works, ~520-720ms for the attempt
- All tooling is production-ready, ONNX FP16 works
### Scenario B: Replace with LiteSAM
- Fine matching succeeds ~62-77% of frames in Hard conditions (significant improvement)
- Processing: ~140-210ms (PyTorch, no ONNX), comparable to SP+LG
- VRAM: ~900MB peak, better margin
- Risk: if LiteSAM code has bugs or performs differently on Ukraine terrain, debugging is harder
- Risk: no ONNX means no easy FP16 optimization path
### Scenario C: Replace with EfficientLoFTR
- Fine matching succeeds ~58-62% in Hard conditions (moderate improvement over SP+LG)
- Processing: ~190-280ms (slower)
- VRAM: ~1.0-1.2GB peak
- Better deployment path than LiteSAM (HuggingFace, TensorRT pattern)
- Lower risk than LiteSAM, higher than SP+LG
## Actual Validation Results
### Hit rate impact on system accuracy
The AC requires 80% of photos within 50m, 60% within 20m. Higher satellite match hit rate directly improves this:
- Each successful satellite anchor corrects drift and backward-propagates via iSAM2
- A 7-19pp improvement in Hard mode hit rate could mean the difference between meeting and missing the 80%/60% AC thresholds
- This is the strongest argument for LiteSAM/EfficientLoFTR
### Speed impact
All three approaches process well under the 5s budget. The satellite matching runs overlapped with next frame VO anyway, so ~50ms difference is negligible.
### VRAM impact
All three fit within 6GB. LiteSAM actually provides better headroom.
## Counterexamples
### Counterexample 1: Domain gap
LiteSAM benchmarks use Chinese datasets (Harbin, Qiqihar, UAV-VisLoc). Our target is Eastern Ukraine. Terrain characteristics differ. The self-made dataset (100-500m altitude over Chinese cities) is reasonably similar to our use case, but agricultural Ukraine terrain may have less texture. This could disproportionately affect semi-dense matchers that rely on feature-rich regions.
### Counterexample 2: Outdated satellite imagery
LiteSAM benchmarks use well-matched satellite imagery. Our Google Maps tiles for conflict-zone Ukraine may be 1-2 years old. Seasonal/structural changes could reduce match quality differently for sparse vs semi-dense methods.
### Counterexample 3: Image resolution mismatch
LiteSAM benchmarks use 1184×1184 matching resolution. Our pipeline downscales to 1600px longest edge. The actual matching tile pair would be smaller. Performance characteristics may differ.
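The resolution mismatch can be quantified; a sketch of the longest-edge downscale described in the draft (the 1600px target is taken from the pipeline spec cited above):

```python
def downscale_longest_edge(w, h, target=1600):
    """Scale so the longest edge equals `target`, preserving aspect ratio;
    smaller images pass through unchanged."""
    if max(w, h) <= target:
        return w, h
    scale = target / max(w, h)
    return round(w * scale), round(h * scale)

print(downscale_longest_edge(6252, 4168))  # (1600, 1067) -- full-res UAV frame
print(downscale_longest_edge(1920, 1080))  # (1600, 900)  -- FullHD also shrinks
```

Both frame sizes end up near or above the paper's 1184×1184 benchmark resolution, but the actual matched tile pair would be smaller still, so published timings transfer only approximately.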
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable/verifiable
- [ ] Note: RTX 2060 performance numbers are estimates — need empirical validation
## Conclusions Requiring Revision
None — the recommendation to keep SP+LG as primary with LiteSAM/EfficientLoFTR as experimental evaluation targets remains sound given the risk/reward profile.
## Updated Recommendation
**Primary approach**: Keep SuperPoint+LightGlue ONNX FP16 (proven, low risk).
**Design for swappability**: Abstract the fine matching stage behind an interface so LiteSAM or EfficientLoFTR can be plugged in later.
**Benchmark plan**: During implementation, run comparative tests with all three matchers on real flight data. If LiteSAM's hit rate advantage holds on our terrain, adopt it after verifying ONNX export or acceptable PyTorch performance.
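The swappability recommendation can be sketched as a minimal Python Protocol; class and method names are illustrative, not taken from any of the cited repositories:

```python
from typing import Protocol
import numpy as np

class FineMatcher(Protocol):
    """Illustrative interface for the Stage 2 satellite fine-matching slot."""
    def match(self, uav_img: np.ndarray, sat_tile: np.ndarray) -> np.ndarray:
        """Return an (N, 4) array of correspondences (x1, y1, x2, y2)."""
        ...

class SuperPointLightGlueMatcher:
    """Current production path (ONNX FP16) -- body stubbed for illustration."""
    def match(self, uav_img, sat_tile):
        return np.empty((0, 4))  # placeholder for the real matcher output

def georeference(uav_img, sat_tile, matcher: FineMatcher):
    """Pipeline code depends only on the interface, so LiteSAM or
    EfficientLoFTR can be benchmarked as drop-in replacements."""
    return matcher.match(uav_img, sat_tile)
```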
@@ -0,0 +1,205 @@
# LiteSAM Feature Matcher Verification Report
**Research date**: 2026-03-14
**Scope**: Satellite-aerial image matching, boyagesmile/LiteSAM, Remote Sensing MDPI Oct 2025
---
## 1. GitHub Repository Verification
| Aspect | Finding | Source | Confidence |
|--------|---------|--------|------------|
| **Repo exists** | Yes, https://github.com/boyagesmile/LiteSAM | GitHub API | High |
| **Stars** | 5 | GitHub API (stargazers_count: 5) | High |
| **Forks** | 0 | GitHub API (forks_count: 0) | High |
| **Open issues** | 0 | GitHub API (open_issues_count: 0) | High |
| **Last commit** | 2025-10-01 (Update README.md) | GitHub API commits | High |
| **First commit** | 2025-09-24 (Initial commit) | GitHub API commits | High |
| **Total commits** | 4 | GitHub API | High |
| **Actively maintained** | Low — no commits since Oct 2025, no releases, no license | GitHub API | High |
| **Releases** | None | GitHub API | High |
| **License** | None declared | GitHub API | High |
**Conclusion**: Repo is real but minimal. Not actively maintained (no commits in ~5 months). Very low community engagement.
---
## 2. Pretrained Weights Availability
| Aspect | Finding | Source | Confidence |
|--------|---------|--------|------------|
| **Weights location** | Google Drive | README.md | High |
| **Download link** | https://drive.google.com/file/d/1fheBUqQWi5f55xNDchumAx2-SmGdT-mX/view?usp=drive_link | README.md | High |
| **File name** | mloftr.ckpt | Google Drive page title | High |
| **Downloadable** | Yes — link resolves to Google Drive file page (requires sign-in for direct download) | Web fetch | Medium |
| **Alternative hosts** | None — no HuggingFace, no Zenodo | Search | High |
**Conclusion**: Weights are available via Google Drive as stated in the draft. Single point of failure; no mirror or checksum documented.
---
## 3. Claimed Results Verification
### 3.1 Clarification: 77.3% Hit Rate
| Claim in draft | Actual source | Dataset | Confidence |
|----------------|---------------|---------|-------------|
| "77.3% hit rate in Hard mode" | Paper Table 4 | **Self-made dataset** (Harbin/Qiqihar, 100500m altitude) | High |
| UAV-VisLoc Hard hit rate | Paper Table 3 | **61.65%** (not 77.3%) | High |
**Important**: The 77.3% figure applies to the **self-made dataset**, not UAV-VisLoc. UAV-VisLoc Hard mode hit rate is 61.65%.
### 3.2 UAV-VisLoc Results (Paper Table 3)
| Method | RMSE@30 | Easy HR | Moderate HR | Hard HR | Inference (RTX 3090) |
|--------|---------|---------|-------------|---------|---------------------|
| LiteSAM | 17.86 m | 66.66% | 65.37% | 61.65% | 83.79 ms |
| EfficientLoFTR | 17.87 m | 65.78% | 63.62% | 57.65% | 112.60 ms |
| SP+LG | 17.81 m | 60.34% | 59.57% | 54.32% | 44.15 ms |
### 3.3 Self-made Dataset Results (Paper Table 4)
| Method | RMSE@30 | Easy HR | Moderate HR | Hard HR | Inference |
|--------|---------|---------|-------------|---------|-----------|
| LiteSAM | 6.12 m | 92.09% | 87.88% | **77.30%** | 85.31 ms |
| EfficientLoFTR | 7.28 m | 90.03% | 79.79% | 61.84% | 120.72 ms |
| SP+LG | 6.76 m | 78.85% | 70.03% | 58.31% | 49.49 ms |
### 3.4 Independent Reproduction
| Aspect | Finding | Source | Confidence |
|--------|---------|--------|------------|
| **Third-party reproduction** | None found | Web search "LiteSAM reproduced results", "LiteSAM satellite matching community" | Medium |
| **Note** | Search results conflate with Lite-SAM (ECCV 2024 segmentation model) | Web search | High |
| **Author reproduction** | Paper states SP+LG and EfficientLoFTR (opt.) on HPatches were "reproduced by the authors under unified experimental environment" | Paper Section 4.3 | High |
| **LiteSAM reproduction** | No explicit statement of third-party reproduction | — | — |
**Conclusion**: Paper numbers are internally consistent. No evidence of independent reproduction. Name collision with Lite-SAM (segmentation) complicates discovery.
---
## 4. GPU and RTX 2060 Performance
| Aspect | Finding | Source | Confidence |
|--------|---------|--------|------------|
| **Paper benchmark GPU** | NVIDIA RTX 3090 | Paper Section 4.2 | High |
| **Resolution** | 1184×1184 (UAV-VisLoc, self-made) | Paper | High |
| **LiteSAM inference (RTX 3090)** | 83.79 ms (full), 60.97 ms (opt.) | Paper Table 3 | High |
| **RTX 2060 estimate** | ~140-210 ms (PyTorch, no ONNX) | Project fact cards (Fact #14) | Medium |
| **Rationale** | RTX 2060 Turing ≈ 40-60% of RTX 3090 throughput | General GPU knowledge | Medium |
| **Measured RTX 2060** | Not found — no published benchmarks | Search | High |
**Conclusion**: RTX 3090 numbers are from the paper. RTX 2060 figures are estimates only; no measured data found.
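The ~140–210 ms range follows from a simple throughput scaling of the paper's RTX 3090 timing. A minimal sketch, assuming the 40–60% relative-throughput ratio from Fact #14 (an assumption, not a measurement):

```python
# Back-of-envelope scaling of the paper's RTX 3090 timing to an RTX 2060.
# The 0.40-0.60 throughput ratio is the Fact #14 assumption, not measured data.

def scale_inference_ms(rtx3090_ms: float, throughput_ratio: float) -> float:
    """Estimate RTX 2060 latency by dividing by the relative throughput."""
    return rtx3090_ms / throughput_ratio

litesam_full_ms = 83.79  # Paper Table 3: LiteSAM (full), RTX 3090, 1184x1184

low = scale_inference_ms(litesam_full_ms, 0.60)   # optimistic: 2060 at 60% of 3090
high = scale_inference_ms(litesam_full_ms, 0.40)  # pessimistic: 2060 at 40% of 3090

print(f"Estimated RTX 2060 range: {low:.0f}-{high:.0f} ms")  # -> 140-209 ms
```

The optimized variant (60.97 ms) would scale the same way, to roughly 100–150 ms under the same assumption.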
---
## 5. ONNX / TensorRT Export
| Aspect | Finding | Source | Confidence |
|--------|---------|--------|------------|
| **LiteSAM ONNX** | None — no export script or docs | README, paper, GitHub | High |
| **LiteSAM TensorRT** | None | README, paper, GitHub | High |
| **EfficientLoFTR ONNX** | loftr2onnx exists for **original LoFTR** (not EfficientLoFTR) | Web search, loftr2onnx repo | High |
| **EfficientLoFTR TensorRT** | LoFTR_TRT targets original LoFTR | Kolkir/LoFTR_TRT | High |
| **EfficientLoFTR HuggingFace** | Yes — Transformers integration, no ONNX export in repo | HuggingFace | High |
| **Convertibility** | Theoretically possible — PyTorch model; no documented path | Architecture analysis | Low |
**Conclusion**: No ONNX or TensorRT path for LiteSAM. EfficientLoFTR has HuggingFace support but no official ONNX; LoFTR family ONNX/TensorRT work targets original LoFTR. LiteSAM conversion would require custom effort.
---
## 6. Parameters and VRAM
| Aspect | Finding | Source | Confidence |
|--------|---------|--------|------------|
| **Parameters** | 6.31M (claimed) | Paper Table 4, Section 3.1 | High |
| **EfficientLoFTR params** | 15.05M (2.4× more) | Paper | High |
| **MobileOne-S3 backbone** | 0.81M params (after removing classification head) | Paper Section 3.1 | High |
| **FLOPs (self-made, 1184×1184)** | 588.51G (LiteSAM) vs 1036.61G (EfficientLoFTR) | Paper Table 4 | High |
| **VRAM (measured)** | Not reported in paper | Paper | High |
| **VRAM (estimated)** | ~300–500 MB (model + feature maps at 1184×1184) | Project fact cards (Fact #15) | Medium |
| **Model file size** | mloftr.ckpt — size not documented | Google Drive | Low |
**Conclusion**: 6.31M parameters confirmed. VRAM is estimated, not measured.
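As an order-of-magnitude check on the Fact #15 estimate: weights and coarse feature maps alone account for well under 100 MB, so most of the ~300–500 MB budget would go to attention buffers and intermediate activations. The 1/8-resolution grid and 256-channel width below are assumptions borrowed from the LoFTR family, not figures from the paper:

```python
# Rough FP32 VRAM accounting for LiteSAM at 1184x1184 (assumptions flagged).
PARAMS = 6.31e6          # paper Table 4
BYTES_PER_FLOAT = 4      # FP32; FP16 would halve everything

weights_mb = PARAMS * BYTES_PER_FLOAT / 2**20

# ASSUMPTION: LoFTR-style 1/8-resolution coarse grid, 256 channels per image;
# the paper does not publish LiteSAM's channel widths.
h, w, c = 1184 // 8, 1184 // 8, 256
coarse_mb = 2 * h * w * c * BYTES_PER_FLOAT / 2**20   # both images of the pair

print(f"weights ~{weights_mb:.0f} MB, coarse features ~{coarse_mb:.0f} MB")
# Remaining budget in the 300-500 MB estimate: activations, attention
# workspaces, and CUDA context overhead, none of which are published.
```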
---
## 7. Architecture and EfficientLoFTR Relationship
| Aspect | Finding | Source | Confidence |
|--------|---------|--------|------------|
| **Built on EfficientLoFTR** | Yes — acknowledged in README | README Acknowledgement | High |
| **Key differences** | MobileOne-S3 backbone (replaces RepVGG), TAIFormer (replaces EfficientLoFTR attention), MinGRU (replaces heatmap refinement) | Paper Section 3 | High |
| **EfficientLoFTR maturity** | 964 stars, 96 forks, 55 open issues, CVPR 2024, HuggingFace, Apache 2.0 | GitHub API | High |
| **EfficientLoFTR ONNX** | loftr2onnx exists for original LoFTR; EfficientLoFTR-specific ONNX not clearly documented | Web search | Medium |
| **EfficientLoFTR TensorRT** | LoFTR_TRT (original LoFTR); no direct EfficientLoFTR TensorRT | Web search | High |
**Conclusion**: LiteSAM is built on EfficientLoFTR. EfficientLoFTR is mature (CVPR 2024, HuggingFace). LiteSAM uses a different backbone and modules; ONNX/TensorRT work for LoFTR does not directly apply.
---
## 8. Remote Sensing (MDPI) Journal Reputation
| Aspect | Finding | Source | Confidence |
|--------|---------|--------|------------|
| **Venue** | Remote Sensing, MDPI | Paper | High |
| **Impact Factor** | 4.1 (2024), 5-year 4.8 | MDPI announcements | High |
| **CiteScore** | 8.6 (June 2025) | MDPI | High |
| **Ranking** | Q1 in General Earth and Planetary Sciences, Geosciences, Remote Sensing | MDPI | High |
| **Indexing** | Scopus, Web of Science (SCIE) | MDPI | High |
| **Publication date** | 30 September 2025 | Paper | High |
| **Reputation** | Established Q1 journal in remote sensing | Multiple sources | High |
**Conclusion**: Remote Sensing (MDPI) is a reputable Q1 venue for this topic.
---
## 9. Search Query Results Summary
| Query | Result |
|-------|--------|
| "LiteSAM satellite matching" | Results dominated by Lite-SAM (ECCV 2024 segmentation). boyagesmile/LiteSAM satellite matcher rarely appears. |
| "LiteSAM feature matching UAV" | Same name collision. No specific UAV feature-matching discussions for boyagesmile/LiteSAM. |
| "LiteSAM vs EfficientLoFTR" | Lite-SAM (segmentation) vs EfficientLoFTR (matching) — different tasks. No direct comparison of boyagesmile/LiteSAM vs EfficientLoFTR. |
| GitHub issues | 0 open issues. No reported bugs. |
| Community discussions | None found. No Reddit, Twitter/X, or forum threads specific to boyagesmile/LiteSAM. |
---
## 10. What Could NOT Be Verified
| Item | Reason |
|------|--------|
| Independent reproduction of paper results | No third-party reports found |
| Actual VRAM usage | Not in paper or repo |
| RTX 2060 inference time | No benchmarks; estimate only |
| Google Drive weight checksum | Not provided |
| ONNX conversion feasibility | No attempt documented; would need implementation |
| Real-world performance on Ukraine/conflict-zone terrain | Paper uses Chinese datasets (Harbin, Qiqihar, UAV-VisLoc) |
| Long-term weight availability | Google Drive link could change or be removed |
---
## 11. Summary Table
| Question | Answer | Confidence |
|----------|--------|------------|
| 1. Repo real and maintained? | Real, minimal maintenance (5 stars, 0 forks, 4 commits, last Oct 2025) | High |
| 2. Weights available? | Yes, Google Drive; single host, no checksum | High |
| 3. Results reproduced? | No evidence of third-party reproduction | Medium |
| 4. GPU used? | RTX 3090 | High |
| 5. RTX 2060 performance? | ~140–210 ms (estimate only) | Medium |
| 6. ONNX export? | No | High |
| 7. Parameters? | 6.31M (confirmed) | High |
| 8. VRAM? | ~300–500 MB (estimated) | Medium |
| 9. Built on EfficientLoFTR? | Yes | High |
| 10. EfficientLoFTR mature? | Yes (CVPR 2024, 964 stars, HuggingFace) | High |
| 11. Remote Sensing reputable? | Yes (Q1, IF 4.1) | High |
---
## 12. Draft Correction
**Draft says**: "77.3% hit rate in Hard mode on satellite-aerial benchmarks"
**Clarification**: 77.3% is on the **self-made dataset** (Harbin/Qiqihar, 100–500 m). On **UAV-VisLoc**, Hard mode hit rate is **61.65%**. Both are satellite-aerial benchmarks, but the numbers differ by dataset.