add clarification to research methodology by including a step for solution comparison and user consultation

Oleksandr Bezdieniezhnykh
2026-03-17 18:43:57 +02:00
parent d764250f9a
commit b419e2c04a
35 changed files with 6030 additions and 0 deletions
@@ -0,0 +1,200 @@
# XFeat vs SuperPoint+LightGlue for Visual Odometry in UAV Navigation
**Research Date**: March 2026
**Context**: GPS-denied UAV navigation, frame-to-frame VO from consecutive aerial photos (60-80% overlap, mostly translational motion, ~100 m inter-frame spacing, downward-facing camera)
---
## Executive Summary
**Finding**: The switch from XFeat to SuperPoint+LightGlue for VO in solution draft 04 appears to be an **unintentional regression**. For frame-to-frame VO with high overlap and mostly translational motion, XFeat is likely sufficient in quality while being ~10× faster. SuperPoint+LightGlue's quality advantage is most relevant for wide-baseline and satellite-aerial matching, not for high-overlap consecutive aerial frames.
**Recommendation**: Revert VO to XFeat with built-in matcher. Keep SuperPoint+LightGlue (or LiteSAM) for satellite fine matching only.
---
## 1. XFeat Performance and Quality
### 1.1 Speed
| Setting | Time | Source | Confidence |
|--------|------|--------|------------|
| **CPU (Intel i5-1135G7)** | 27.1 FPS (sparse) / 19.2 FPS (semi-dense) at VGA | XFeat paper (CVPR 2024) | High |
| **CPU** | ~37ms per frame (sparse), ~52ms (semi-dense) | Derived from FPS | High |
| **GPU** | Not explicitly reported in paper | — | — |
| **SatLoc-Fusion (RKNN)** | 30 FPS at 640×480 (~33ms) | SatLoc-Fusion (MDPI 2025) | High |
| **Draft 03 claim** | ~15ms total (extract+match) on GPU | solution_draft03.md | Medium — no direct citation |
**Notes**:
- The paper reports CPU timings; GPU is expected to be faster.
- SatLoc-Fusion achieves 30 FPS with XFeat on a 6 TFLOPS edge device after RKNN acceleration.
- The ~15ms GPU claim in draft 03 is plausible but not directly verified in the XFeat paper.
### 1.2 Quality (Megadepth-1500)
| Method | AUC@5° | AUC@10° | AUC@20° | Acc@10° | MIR | #inliers | FPS (CPU) |
|--------|--------|---------|---------|---------|-----|----------|-----------|
| SuperPoint | 37.3 | 50.1 | 61.5 | 67.4 | 0.35 | 495 | 3.0 |
| XFeat (sparse) | 42.6 | 56.4 | 67.7 | 74.9 | 0.55 | 892 | 27.1 |
| XFeat* (semi-dense) | 50.2 | 65.4 | 77.1 | 85.1 | 0.74 | 1885 | 19.2 |
| DISK* | 55.2 | 66.8 | 75.3 | 81.3 | 0.71 | 1997 | 1.2 |
**Source**: [XFeat paper, Table 1](https://arxiv.org/html/2404.19174v1)
**Conclusion**: XFeat outperforms SuperPoint on Megadepth (sparse and semi-dense) while being ~9× faster (sparse) or ~6× faster (semi-dense). Megadepth includes wide-baseline pairs; high-overlap consecutive frames are typically easier.
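The per-frame latencies and speedups above are simple arithmetic on the Table 1 CPU throughput. A minimal sketch that reproduces them; the dictionary name and layout are illustrative only, and the numbers say nothing about GPU latency:

```python
# CPU throughput (FPS) from XFeat paper Table 1 (Megadepth-1500), as quoted above.
TABLE1_CPU_FPS = {
    "SuperPoint": 3.0,
    "XFeat (sparse)": 27.1,
    "XFeat* (semi-dense)": 19.2,
    "DISK*": 1.2,
}

def fps_to_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a given throughput."""
    return 1000.0 / fps

def speedup(fps: float, baseline_fps: float) -> float:
    """How many times faster a method is than the baseline."""
    return fps / baseline_fps

if __name__ == "__main__":
    base = TABLE1_CPU_FPS["SuperPoint"]
    for name, fps in TABLE1_CPU_FPS.items():
        print(f"{name:20s} {fps_to_ms(fps):6.1f} ms/frame  {speedup(fps, base):4.1f}x vs SuperPoint")
```

This gives about 36.9 ms and 9.0× for sparse XFeat and about 52.1 ms and 6.4× for XFeat*, matching the ~9× and ~6× figures.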
### 1.3 Downward-Facing Camera / Nadir Aerial
- No direct benchmark for nadir aerial imagery.
- SatLoc-Fusion uses XFeat for VO on downward-facing UAV imagery (100-300 m altitude, DJI Mavic, 1920×1080).
- SatLoc-Fusion achieves <15 m absolute localization error and >90% trajectory coverage at >2 Hz on 6 TFLOPS edge hardware.
- XFeat is trained on Megadepth + COCO; homography estimation on HPatches is strong (illumination and viewpoint splits).
**Confidence**: Medium — SatLoc-Fusion validates XFeat for UAV VO, but not on the exact 100 m inter-frame spacing scenario.
---
## 2. SuperPoint+LightGlue Performance and Quality
### 2.1 Speed
| Component | Time | Source |
|-----------|------|--------|
| SuperPoint extraction | ~80ms GPU | solution_draft04.md, LiteSAM assessment |
| LightGlue ONNX FP16 | ~50-100 ms | solution_draft04.md |
| **Total VO** | **~130-180 ms** | solution_draft04.md |
### 2.2 Quality for VO
- LightGlue-based VO on KITTI: ~1% odometry error vs ~3.5-4.1% with FLANN.
- SuperVINS uses SuperPoint+LightGlue for front-end matching in visual-inertial SLAM.
- LightGlue is not rotation-invariant (GitHub issue #64).
### 2.3 High-Overlap Consecutive Frames
- High overlap (60-80%) and mostly translational motion are easier than Megadepth's wide-baseline pairs.
- SuperPoint+LightGlue's main strength is contextual matching and robustness to viewpoint/illumination changes.
- For near-planar, high-overlap aerial pairs, simpler matchers (e.g. MNN) often suffice.
**Conclusion**: SuperPoint+LightGlue is strong for difficult matching, but its advantage is less critical for high-overlap consecutive aerial frames.
---
## 3. SatLoc-Fusion: XFeat in Production UAV VO
**Paper**: [Towards UAV Localization in GNSS-Denied Environments: The SatLoc Dataset and a Hierarchical Adaptive Fusion Framework](https://www.mdpi.com/2072-4292/17/17/3048) (MDPI Remote Sensing, Sept 2025)
### 3.1 Architecture
- **Layer 1**: DINOv2 for aerial-satellite matching (absolute localization).
- **Layer 2**: XFeat for VO (relative pose between consecutive frames).
- **Layer 3**: Lucas-Kanade optical flow for velocity.
### 3.2 XFeat Usage
- XFeat for keypoint detection, descriptor extraction, and matching.
- Homography via DLT + RANSAC.
- Scale from relative altitude.
- Fine-tuned on UAV-VisLoc starting from public XFeat weights.
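SatLoc-Fusion recovers metric scale from relative altitude, but the exact formulation is not reproduced here. The sketch below assumes a nadir pinhole camera over flat ground, a frame-to-frame homography that is close to a pure translation, and a known focal length in pixels; all function names and example camera parameters are hypothetical:

```python
def focal_length_px(focal_mm: float, sensor_width_mm: float, image_width_px: int) -> float:
    """Convert a lens focal length to pixels (pinhole model)."""
    return focal_mm * image_width_px / sensor_width_mm

def ground_sample_distance(altitude_m: float, focal_px: float) -> float:
    """Metres on the ground covered by one pixel, for a nadir camera over flat terrain."""
    return altitude_m / focal_px

def pixel_shift_to_metres(H, altitude_m: float, focal_px: float):
    """Turn the translation part of a frame-to-frame homography into metric camera motion.

    Assumes H (3x3 nested lists) maps pixels of frame k to frame k+1 and is close to
    a pure translation: nadir view, flat ground, negligible rotation and scale change.
    """
    w = H[2][2]                     # normalise so the bottom-right entry is 1
    tx, ty = H[0][2] / w, H[1][2] / w
    gsd = ground_sample_distance(altitude_m, focal_px)
    # The ground appears to move opposite to the camera, hence the sign flip.
    return -tx * gsd, -ty * gsd
```

For example, at 400 m with a hypothetical 1000 px focal length one pixel spans 0.4 m, so a 250 px image shift corresponds to about 100 m of camera motion, roughly the inter-frame spacing in the context above.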
### 3.3 Results
| Metric | Value |
|--------|-------|
| Absolute localization error | <15 m |
| Trajectory coverage | >90% |
| Throughput | >2 Hz on 6 TFLOPS edge |
| XFeat inference | 30 FPS at 640×480 (RKNN) |
### 3.4 Ablation
- Without Layer 1 (satellite): MLE 27.84 m vs 14.05 m with full system.
- Layer 2 (XFeat VO) + Layer 3 (optical flow) provide fallback when satellite matching fails.
**Conclusion**: XFeat is validated for UAV VO in a published 2025 system with similar constraints (downward-facing, low altitude, edge hardware).
---
## 4. Direct Comparison: XFeat vs SuperPoint+LightGlue for VO
| Dimension | XFeat | SuperPoint+LightGlue |
|-----------|-------|----------------------|
| **VO time per frame** | ~15-36 ms (GPU/CPU) | ~150-200 ms |
| **Speed ratio** | ~10× faster | Baseline |
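| **Megadepth AUC@10°** | 65.4 (semi-dense, MNN) | 50.1 with MNN matching; 75.0 with LightGlue (XFeat paper, Appendix F) |
| **Megadepth #inliers** | 892-1885 (MNN) | 495 with MNN matching; 475 with LightGlue (Appendix F) |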
| **Built-in matcher** | Yes (MNN + optional refinement) | No (needs LightGlue) |
| **VRAM (VO only)** | ~200 MB | ~900 MB |
| **UAV VO validation** | SatLoc-Fusion (2025) | No direct UAV VO paper |
| **Rotation invariance** | No | No |
### 4.1 When SuperPoint+LightGlue Helps
- Wide-baseline matching.
- Satellite-aerial matching (different viewpoints, scale, illumination).
- Low-texture or repetitive scenes where contextual matching matters.
### 4.2 When XFeat Is Enough
- High-overlap consecutive frames (60-80%).
- Mostly translational motion.
- Downward-facing nadir imagery.
- Real-time or resource-limited systems.
---
## 5. Answer to the Key Question
**For frame-to-frame VO with 60-80% overlap and mostly translational motion, is XFeat sufficient, or does SuperPoint+LightGlue provide materially better results?**
**Answer**: XFeat is likely sufficient. Evidence:
1. **Megadepth**: With MNN matching on both sides (Table 1), XFeat outperforms SuperPoint on pose estimation (AUC, inliers).
2. **Task difficulty**: High-overlap consecutive frames are easier than Megadepth's wide-baseline pairs.
3. **SatLoc-Fusion**: XFeat delivers <15 m error and >2 Hz on edge hardware for UAV VO.
4. **Cost**: SuperPoint+LightGlue is ~10× slower with no clear VO-specific benefit in this scenario.
5. **Assessment gap**: Draft 04's assessment targeted satellite matching (SuperPoint+LightGlue → LiteSAM), not VO. The VO change from XFeat to SuperPoint+LightGlue was not justified in the findings.
---
## 6. Sources
| # | Source | Tier | Date | Key Content |
|---|--------|------|------|-------------|
| 1 | [XFeat: Accelerated Features (arXiv)](https://arxiv.org/html/2404.19174v1) | L1 | Apr 2024 | Benchmarks, Megadepth, CPU FPS |
| 2 | [SatLoc-Fusion (MDPI Remote Sensing)](https://www.mdpi.com/2072-4292/17/17/3048) | L1 | Sept 2025 | XFeat for UAV VO, <15 m, >2 Hz |
| 3 | [accelerated_features GitHub](https://github.com/verlab/accelerated_features) | L1 | 2024 | XFeat implementation |
| 4 | solution_draft03.md | L2 | Project | XFeat ~15ms VO, SuperPoint for satellite |
| 5 | solution_draft04.md | L2 | Project | SuperPoint+LightGlue for VO ~150-200 ms |
| 6 | [vo_lightglue](https://github.com/himadrir/vo_lightglue) | L3 | — | LightGlue VO, ~1% error on KITTI |
| 7 | [Luxonis XFeat](https://models.luxonis.com/luxonis/xfeat/) | L2 | — | XFeat comparable to SuperPoint, faster |
---
## 7. Confidence Summary
| Statement | Confidence |
|-----------|------------|
| XFeat is ~6-9× faster than SuperPoint on CPU | High |
| XFeat outperforms SuperPoint on Megadepth | High |
| SatLoc-Fusion uses XFeat successfully for UAV VO | High |
| ~15ms GPU claim for XFeat is plausible | Medium |
| XFeat is sufficient for 60-80% overlap VO | Medium-High |
| SuperPoint+LightGlue does not materially improve VO for this use case | Medium |
| VO change in draft 04 was unintentional | High (no assessment finding) |
---
## 8. Recommendation
**Revert VO to XFeat** as in solution draft 03:
- Use XFeat with built-in matcher for frame-to-frame VO.
- Keep LiteSAM (or SuperPoint+LightGlue) for satellite fine matching only.
- Expected VO time: ~15-36 ms vs ~150-200 ms with SuperPoint+LightGlue.
- Total per-frame time should drop from ~350-470 ms to ~230-300 ms.
If VO quality issues appear in testing, consider:
- XFeat semi-dense mode (XFeat*) for more matches.
- XFeat+LightGlue as a middle ground (faster than SuperPoint+LightGlue, potentially better than XFeat alone).
@@ -0,0 +1,60 @@
# Source Registry: XFeat vs SuperPoint+LightGlue Low-Texture Matching
## Source #1
- **Title**: XFeat: Accelerated Features for Lightweight Image Matching
- **Link**: https://arxiv.org/html/2404.19174v1
- **Tier**: L1
- **Publication Date**: 2024-04
- **Summary**: CVPR 2024 paper. Megadepth-1500 Table 1 (XFeat, SuperPoint, DISK with MNN). Appendix F: LightGlue vs XFeat* (61.4 vs 50.2 AUC@5°). XFeat uses dual-softmax + MNN. Textureless demo vs SIFT. ScanNet indoor generalization.
- **Related Sub-question**: 1, 2, 3, 5
## Source #2
- **Title**: LightGlue: Local Feature Matching at Light Speed
- **Link**: https://openaccess.thecvf.com/content/ICCV2023/html/Lindenberger_LightGlue_Local_Feature_Matching_at_Light_Speed_ICCV_2023_paper.html
- **Tier**: L1
- **Publication Date**: 2023
- **Summary**: ICCV 2023. Transformer matcher with self/cross-attention. Adaptive computation. Typically paired with SuperPoint.
- **Related Sub-question**: 3
## Source #3
- **Title**: Nature Sci Rep 2025 - Table 2 Relative pose estimation MegaDepth-1500
- **Link**: https://www.nature.com/articles/s41598-025-21602-5/tables/2
- **Tier**: L1
- **Publication Date**: 2025
- **Summary**: LightGlue: RANSAC 47.83%, AUC@5° 86.8, AUC@10° 96.3. SuperGlue, OmniGlue, DALGlue comparison. Detector not specified.
- **Related Sub-question**: 1
## Source #4
- **Title**: verlab/accelerated_features
- **Link**: https://github.com/verlab/accelerated_features
- **Tier**: L1
- **Summary**: XFeat implementation. Textureless scene demo. LightGlue integration (Issue #67). GlueFactory.
- **Related Sub-question**: 5
## Source #5
- **Title**: cvg/LightGlue
- **Link**: https://github.com/cvg/LightGlue
- **Tier**: L1
- **Summary**: LightGlue implementation. Issue #128: XFeat_with_lightglue. SuperPoint pairing.
- **Related Sub-question**: 5
## Source #6
- **Title**: vismatch/xfeat-lightglue
- **Link**: https://huggingface.co/vismatch/xfeat-lightglue
- **Tier**: L2
- **Summary**: Pre-trained XFeat+LightGlue model. vismatch library.
- **Related Sub-question**: 5
## Source #7
- **Title**: LightGlueStick: Joint Point-Line Matching
- **Link**: https://arxiv.org/html/2510.16438v1
- **Tier**: L1
- **Summary**: Line segments in texture-less regions. LightGlue architecture for low-texture.
- **Related Sub-question**: 3
## Source #8
- **Title**: Novel real-time matching for low-overlap agricultural UAV images with repetitive textures
- **Link**: https://www.sciencedirect.com/science/article/abs/pii/S092427162500190X
- **Tier**: L1
- **Summary**: Agricultural UAV, repetitive textures, low overlap. Global texture information for weak-textured regions.
- **Related Sub-question**: 2, 4
@@ -0,0 +1,243 @@
# XFeat vs SuperPoint+LightGlue: Low-Texture Aerial Matching Assessment
**Research Date**: March 2026
**Context**: GPS-denied UAV navigation over eastern Ukraine — flat agricultural fields, uniform croplands, low-density features, visually repetitive terrain. Downward-facing camera at up to 1 km altitude.
---
## Executive Summary
| Question | Finding | Confidence |
|----------|---------|------------|
| XFeat+XFeat_matcher vs SuperPoint+LightGlue on low-texture | **SuperPoint+LightGlue** has higher AUC and RANSAC success; XFeat+MNN is faster but uses a simple matcher | High (benchmarks) |
| Detector on low-texture | **No direct agricultural benchmark**; XFeat has textureless demo; SuperPoint trained on synthetic shapes | Medium (inferred) |
| LightGlue advantage on difficult scenes | Attention mechanism helps on repetitive/ambiguous patterns; reduces false matches vs NN | High (paper mechanism) |
| Worst-case match rate / VO failures | **No published data** on per-frame failure rate or segment breaks | Low (gap) |
| XFeat+LightGlue | **Available** (GlueFactory, vismatch); ~65-115 ms estimated; best-of-both-worlds option | Medium (implementation exists) |
---
## 1. Detector+Matcher Pairings: Critical Distinction
**All benchmark numbers depend on the exact pairing.** The following table clarifies what each paper measures:
| Pipeline | Detector | Matcher | Source |
|----------|----------|---------|--------|
| XFeat (sparse) | XFeat | MNN (Mutual Nearest Neighbor) | XFeat paper Table 1 |
| XFeat* (semi-dense) | XFeat | MNN + offset refinement | XFeat paper Table 1 |
| SuperPoint | SuperPoint | MNN | XFeat paper Table 1 |
| LightGlue | SuperPoint (typical) | LightGlue (attention-based) | LightGlue paper, Nature 2025 |
| XFeat+LightGlue | XFeat | LightGlue | GlueFactory, vismatch, GitHub #67 |
---
## 2. Megadepth-1500: Actual Numbers by Pipeline
### 2.1 XFeat Paper (CVPR 2024) — All Use MNN
| Method | AUC@5° | AUC@10° | AUC@20° | Acc@10° | MIR | #inliers | FPS (CPU) |
|--------|--------|---------|---------|---------|-----|----------|-----------|
| SuperPoint | 37.3 | 50.1 | 61.5 | 67.4 | 0.35 | 495 | 3.0 |
| XFeat (sparse) | 42.6 | 56.4 | 67.7 | 74.9 | 0.55 | 892 | 27.1 |
| XFeat* (semi-dense) | 50.2 | 65.4 | 77.1 | 85.1 | 0.74 | 1885 | 19.2 |
| DISK* | 55.2 | 66.8 | 75.3 | 81.3 | 0.71 | 1997 | 1.2 |
**Source**: [XFeat paper, Table 1](https://arxiv.org/html/2404.19174v1)
**Matcher**: MNN for all. XFeat uses dual-softmax loss during training but **MNN at inference**.
**Resolution**: Max dimension 1200 px.
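AUC@5°/10°/20° and Acc@10° aggregate per-pair pose errors. A plain-Python sketch of how they are commonly computed in SuperGlue/LightGlue-style evaluation code, assuming the per-pair error is the maximum of the rotation and translation angular errors; whether XFeat's Table 1 uses exactly this recipe is an assumption:

```python
from bisect import bisect_left

def pose_auc(errors, thresholds):
    """Area under the cumulative pose-error curve, normalised by each threshold.

    errors: per-pair pose errors in degrees; failed pairs can be float("inf").
    Returns one AUC in [0, 1] per threshold (multiply by 100 for table-style numbers).
    """
    errs = sorted(errors)
    n = len(errs)
    recall = [(i + 1) / n for i in range(n)]
    errs = [0.0] + errs          # anchor the curve at (0, 0)
    recall = [0.0] + recall
    aucs = []
    for t in thresholds:
        last = bisect_left(errs, t)              # first index with error >= t
        e = errs[:last] + [t]
        r = recall[:last] + [recall[last - 1]]   # hold recall flat up to t
        # Trapezoidal integration of recall over error, normalised by t.
        area = sum((e[i + 1] - e[i]) * (r[i + 1] + r[i]) / 2 for i in range(len(e) - 1))
        aucs.append(area / t)
    return aucs

def accuracy_at(errors, threshold):
    """Fraction of pairs whose pose error is below the threshold (e.g. Acc@10°)."""
    return sum(err < threshold for err in errors) / len(errors)
```

Because AUC integrates the whole error curve below the threshold, a method can raise AUC@5° by tightening already-good poses without changing how many pairs fail outright.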
### 2.2 XFeat Paper Appendix F — Learned Matchers
| Method | Type | AUC@5° | AUC@10° | AUC@20° | Acc@10° | MIR | #inliers | PPS |
|--------|------|--------|---------|---------|---------|-----|----------|-----|
| LightGlue | learned matcher | 61.4 | 75.0 | 84.8 | 91.8 | 0.92 | 475 | 0.31 |
| XFeat* | coarse-fine | 50.2 | 65.4 | 77.1 | 85.1 | 0.74 | 1885 | 1.33 |
| LoFTR | learned matcher | 68.3 | 80.0 | 88.0 | 93.9 | 0.93 | 3009 | 0.06 |
| Patch2Pix | coarse-fine | 47.8 | 61.0 | 71.0 | 77.8 | 0.59 | 536 | 0.05 |
**Source**: [XFeat paper, Appendix F, Table 6](https://arxiv.org/html/2404.19174v1)
**Detector for LightGlue**: Not explicitly stated; standard LightGlue model is trained for **SuperPoint**.
**Setup**: i7-6700K CPU, 1200 px max dimension, pairs per second (PPS).
**Conclusion**: LightGlue (detector presumably SuperPoint, as noted above) reaches 61.4 AUC@5° and 84.8 AUC@20°, **outperforming** XFeat+XFeat_matcher (XFeat*: 50.2 AUC@5°, 77.1 AUC@20°) on Megadepth-1500. LightGlue also has higher MIR (0.92 vs 0.74) and Acc@10° (91.8 vs 85.1).
### 2.3 Nature 2025 (DALGlue Paper) — Different Protocol
| Method | RANSAC % | Precision % | Recall % | AUC@5° | AUC@10° |
|--------|----------|------------|---------|--------|---------|
| SuperGlue | 34.18 | 50.32 | 64.16 | 74.6 | 90.5 |
| LightGlue | 47.83 | 65.48 | 79.04 | **86.8** | **96.3** |
| OmniGlue | 47.4 | 65.0 | 77.8 | 82.1 | 95.3 |
| DALGlue | 57.01 | 73.0 | 84.11 | 87.2 | 97.5 |
**Source**: [Nature Sci Rep 2025, Table 2](https://www.nature.com/articles/s41598-025-21602-5/tables/2)
**Detector**: Not specified; LightGlue is typically paired with SuperPoint.
**Note**: Higher AUC than XFeat Appendix F — likely different resolution (e.g. 1600 px), RANSAC settings, or evaluation protocol.
---
## 3. XFeat Built-in Matcher vs LightGlue
### 3.1 XFeat Matcher
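- **Mechanism**: Similarity matrix S = F1·F2^T between the two descriptor sets; matches are pairs that are mutual nearest neighbors in S. The row-wise and column-wise (dual) softmax over S belongs to the training loss, not to a separate matching step.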
- **Training**: Dual-softmax loss (Eq. 3 in XFeat paper) supervises descriptors.
- **Inference**: MNN search on descriptors. No attention, no contextual refinement.
- **Limitation**: On repetitive/ambiguous patterns, nearest-neighbor can produce many false matches; no geometric reasoning.
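A plain-Python sketch of the two pieces described above: mutual-nearest-neighbor selection on the cosine-similarity matrix S, and the dual-softmax matrix used in the training loss. It is illustrative only, not the `verlab/accelerated_features` code; `min_sim` is a hypothetical similarity threshold, and the toy descriptors are 2-D rather than XFeat's 64-D:

```python
import math

def l2_normalize(v):
    """Scale a descriptor to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def similarity_matrix(desc1, desc2):
    """S = F1 . F2^T on L2-normalised descriptors, i.e. cosine similarity."""
    f1 = [l2_normalize(d) for d in desc1]
    f2 = [l2_normalize(d) for d in desc2]
    return [[sum(a * b for a, b in zip(u, v)) for v in f2] for u in f1]

def mutual_nearest_neighbours(S, min_sim=-1.0):
    """Keep (i, j) only if j is i's best match, i is j's best match, and S[i][j] >= min_sim."""
    best12 = [max(range(len(row)), key=row.__getitem__) for row in S]
    cols = [list(col) for col in zip(*S)]
    best21 = [max(range(len(col)), key=col.__getitem__) for col in cols]
    return [(i, j) for i, j in enumerate(best12) if best21[j] == i and S[i][j] >= min_sim]

def _softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dual_softmax(S, temperature=1.0):
    """P[i][j] = softmax over row i of S/T times softmax over column j of S/T."""
    scaled = [[s / temperature for s in row] for row in S]
    row_sm = [_softmax(row) for row in scaled]
    col_sm = [_softmax(list(col)) for col in zip(*scaled)]  # col_sm[j][i]
    return [[row_sm[i][j] * col_sm[j][i] for j in range(len(S[0]))] for i in range(len(S))]

if __name__ == "__main__":
    desc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    desc_b = [[0.0, 2.0], [3.0, 0.0]]
    S = similarity_matrix(desc_a, desc_b)
    print(mutual_nearest_neighbours(S))  # [(0, 1), (1, 0)]
```

In the example, the third descriptor of the first image is equally similar to both candidates and is not the preferred match of either, so MNN drops it. The same rule will, however, accept a wrong pair whenever two different ground patches happen to be each other's closest descriptors, which is the repetitive-texture failure mode noted above.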
### 3.2 LightGlue Matcher
- **Mechanism**: Transformer with self-attention (within image) and cross-attention (across images). Rotary positional encoding. Matchability-aware pruning.
- **Advantage**: Contextual matching — can disambiguate repetitive structures using neighborhood and global structure.
- **Adaptive**: Early exit on easy pairs; more computation on difficult pairs.
- **Source**: [LightGlue ICCV 2023](https://openaccess.thecvf.com/content/ICCV2023/html/Lindenberger_LightGlue_Local_Feature_Matching_at_Light_Speed_ICCV_2023_paper.html)
### 3.3 Which Performs Better on Sparse, Repetitive Features?
**Measured**: LightGlue (with SuperPoint) achieves higher AUC and MIR than XFeat+MNN on Megadepth-1500. Megadepth includes repetitive structures and viewpoint changes.
**Inferred**: On low-texture, repetitive agricultural terrain:
- **LightGlue** should reduce false matches by using attention over keypoint neighborhoods.
- **XFeat+MNN** may produce more matches (#inliers 1885 vs 475) but with lower precision (MIR 0.74 vs 0.92).
- For VO, **precision matters**: false matches corrupt RANSAC and cause pose drift. LightGlue's higher MIR suggests fewer outliers.
**Confidence**: High for mechanism; Medium for low-texture agricultural extrapolation (no direct benchmark).
---
## 4. SuperPoint vs XFeat Keypoint Detection on Low-Texture
### 4.1 SuperPoint
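- **Training**: Base detector (MagicPoint) pretrained on rendered synthetic shapes; Homographic Adaptation then generates pseudo-ground-truth interest points on real images (MS-COCO), and the full network is trained self-supervised on homographic warps of those images.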
- **Low-texture**: No explicit low-texture training. "Dustbin" channel helps reject non-interest points. Homographic Adaptation improves repeatability across transformations.
- **Agricultural**: Extensions like SuperPoint-E use tracking adaptation for low-texture endoscopy; no agricultural-specific variant found.
### 4.2 XFeat
- **Training**: Megadepth + COCO (6:4 hybrid). Keypoint head distilled from ALIKE-Tiny (low-level features: corners, lines, blobs).
- **Low-texture**: GitHub demo shows "SIFT cannot handle fast camera movements, while XFeat provides robust matches" on a **textureless scene** ([verlab/accelerated_features](https://github.com/verlab/accelerated_features)).
- **Lightweight**: 64-D descriptors; fewer keypoints in uniform areas by design (reliability map R filters low-confidence regions).
### 4.3 Which Extracts More Repeatable Keypoints on Flat Agricultural Terrain?
**Measured**: None. No benchmark on agricultural or flat cropland imagery.
**Inferred**:
- XFeat's textureless demo suggests it handles low-texture better than SIFT.
- XFeat's ScanNet-1500 results (indoor, often texture-poor) show XFeat outperforming DISK and ALIKE; the paper notes that "indoor imagery often lacks distinctiveness at the local level" (XFeat Appendix E).
- SuperPoint's generalization comes from synthetic pretraining plus homography-based self-supervision on generic photos; agricultural uniformity may be out-of-distribution.
- **Conclusion**: XFeat may have an edge on texture-poor scenes based on ScanNet and textureless demo; SuperPoint has no such evidence. Confidence: Medium.
---
## 5. LightGlue Advantage on Difficult Scenes
### 5.1 Attention Mechanism
- Self-attention: aggregates information within each image.
- Cross-attention: matches features across images with context.
- Helps distinguish repetitive patterns by using neighborhood structure.
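A toy version of the self-then-cross attention pattern above, in plain Python. It is an illustration under heavy simplifying assumptions, not LightGlue: it omits learned query/key/value projections, multiple heads, the rotary positional encoding, the MLP update, and matchability-based pruning; the function names are hypothetical:

```python
import math

def _softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, sources):
    """Single-head scaled dot-product attention with a residual update.

    Each query descriptor receives a similarity-weighted average of the source
    descriptors (no learned projections, no positional encoding).
    """
    d = len(queries[0])
    updated = []
    for q in queries:
        weights = _softmax([sum(a * b for a, b in zip(q, s)) / math.sqrt(d) for s in sources])
        message = [sum(w * s[c] for w, s in zip(weights, sources)) for c in range(d)]
        updated.append([qc + mc for qc, mc in zip(q, message)])
    return updated

def toy_layer(desc_a, desc_b):
    """Self-attention within each image, then cross-attention between the images."""
    a = attend(desc_a, desc_a)   # context from the same image
    b = attend(desc_b, desc_b)
    return attend(a, b), attend(b, a)   # context from the other image
```

Feeding it two keypoints with identical descriptors, for example `toy_layer([[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]])`, returns two identical updated descriptors: attention over descriptors alone cannot separate repeated patterns. In LightGlue, keypoint positions enter through the rotary encoding in self-attention, which is what can let geometry break such ties.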
### 5.2 False Match Reduction
- LightGlue predicts **matchability** scores and prunes low-confidence matches.
- MIR 0.92 (LightGlue) vs 0.74 (XFeat*) indicates a much higher fraction of matches that comply with the estimated model after RANSAC.
- **Interpretation**: LightGlue produces fewer but more reliable matches; XFeat* produces more matches but with more outliers.
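MIR and #inliers together imply roughly how many putative matches, and therefore outliers, each method hands to RANSAC. A back-of-the-envelope sketch using the Appendix F numbers quoted above; dividing a mean inlier count by a mean inlier ratio is only an approximation, since both are averaged per pair:

```python
# Mean #inliers and MIR on Megadepth-1500 (XFeat paper, Appendix F, as quoted above).
APPENDIX_F = {
    "LightGlue": {"inliers": 475, "mir": 0.92},
    "XFeat*": {"inliers": 1885, "mir": 0.74},
}

def implied_matches(inliers: float, mir: float) -> float:
    """Approximate putative matches per pair: inliers divided by the inlier ratio."""
    return inliers / mir

def implied_outliers(inliers: float, mir: float) -> float:
    """Approximate outliers per pair: putative matches minus inliers."""
    return implied_matches(inliers, mir) - inliers

if __name__ == "__main__":
    for name, row in APPENDIX_F.items():
        m = implied_matches(row["inliers"], row["mir"])
        o = implied_outliers(row["inliers"], row["mir"])
        print(f"{name:10s} ~{m:.0f} matches, ~{o:.0f} outliers per pair")
```

This yields roughly 516 matches with about 41 outliers for LightGlue versus roughly 2547 matches with about 662 outliers for XFeat*: XFeat* keeps far more inliers in absolute terms, but also passes about 16 times as many outliers to RANSAC.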
### 5.3 Repetitive/Ambiguous Patterns
- LightGlueStick (line+point) explicitly targets "line segments abundant in texture-less regions" ([LightGlueStick arXiv](https://arxiv.org/html/2510.16438v1)).
- For point-only matching, LightGlues attention still helps disambiguate when features look similar.
**Confidence**: High for mechanism; Medium for agricultural extrapolation.
---
## 6. Worst-Case Match Rate / VO Failure
### 6.1 What Matters for VO
- A single frame failure can cause a segment break.
- Metrics like AUC and MIR are averaged over many pairs; they do not directly measure "percentage of frames that fail to produce a valid pose."
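Because averaged metrics hide per-frame failures, a VO evaluation could log the inlier count of every consecutive pair and derive success rate and segment lengths directly. A minimal sketch; the `min_inliers` threshold of 30 is a hypothetical acceptance rule, not a value from any cited source:

```python
def vo_segments(inlier_counts, min_inliers=30):
    """Split a flight into VO segments from per-pair RANSAC inlier counts.

    inlier_counts[k] is the inlier count for the pair (frame k, frame k+1).
    A pair below min_inliers is treated as a failure that breaks the chain.
    Returns (success_rate, segment_lengths_in_pairs).
    """
    if not inlier_counts:
        return 0.0, []
    segments, current = [], 0
    for n in inlier_counts:
        if n >= min_inliers:
            current += 1            # pair accepted: extend the running segment
        else:
            if current:
                segments.append(current)
            current = 0             # pair rejected: segment break
    if current:
        segments.append(current)
    ok = sum(n >= min_inliers for n in inlier_counts)
    return ok / len(inlier_counts), segments
```

For the counts `[120, 95, 12, 80, 150, 0, 60]` it reports a success rate of 5/7 and segments of 2, 2 and 1 pairs: two failed pairs split one flight into three VO segments, which is exactly what AUC and MIR do not expose.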
### 6.2 Data Availability
| Metric | Availability |
|--------|--------------|
| AUC, Acc@10°, MIR | Yes (Megadepth, etc.) |
| Per-frame success rate | **No** |
| Segment break rate | **No** |
| Match failure rate on difficult sequences | **No** |
### 6.3 Inference
- Higher MIR and AUC typically correlate with fewer RANSAC failures.
- SuperPoint+LightGlue's higher MIR (0.92 vs 0.74) suggests fewer frames where RANSAC would fail to find a valid pose.
- **No quantitative evidence** for VO-specific failure rates.
**Confidence**: Low — purely inferred.
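The link between inlier ratio and RANSAC robustness can be made concrete with the standard iteration-count formula N = log(1 - p) / log(1 - w^s), where w is the inlier ratio, s the minimal sample size (4 for a homography) and p the desired confidence. Treating MIR as w is an assumption (MIR is measured against the final model), so this indicates a trend only:

```python
import math

def ransac_iterations(inlier_ratio: float, sample_size: int = 4, confidence: float = 0.99) -> int:
    """Iterations N so that, with probability `confidence`, at least one minimal
    sample is outlier-free: N = log(1 - p) / log(1 - w**s)."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    p_clean = inlier_ratio ** sample_size  # chance a random minimal sample has no outliers
    if p_clean >= 1.0:
        return 1
    if p_clean == 0.0:
        raise ValueError("inlier ratio too small: no outlier-free sample expected")
    # log1p keeps precision when p_clean is tiny.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_clean))

def miss_probability(inlier_ratio: float, iterations: int, sample_size: int = 4) -> float:
    """Probability that a fixed iteration budget never draws an outlier-free sample."""
    return (1.0 - inlier_ratio ** sample_size) ** iterations
```

With p = 0.99, w = 0.92 needs about 4 iterations and w = 0.74 about 13; the gap widens quickly as w drops (w = 0.5 already needs 72).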
---
## 7. XFeat+LightGlue Option
### 7.1 Feasibility
| Source | Finding |
|--------|---------|
| [GitHub Issue #67](https://github.com/verlab/accelerated_features/issues/67) | XFeat+LightGlue via GlueFactory |
| [GitHub Issue #128](https://github.com/cvg/LightGlue/issues/128) | XFeat_with_lightglue discussion |
| [vismatch/xfeat-lightglue](https://huggingface.co/vismatch/xfeat-lightglue) | Pre-trained model on HuggingFace |
| [noahzhy/xfeat_lightglue_onnx](https://github.com/noahzhy/xfeat_lightglue_onnx) | ONNX deployment |
**Conclusion**: XFeat+LightGlue is **implemented and available**.
### 7.2 Speed Estimate
| Component | Time | Source |
|-----------|------|--------|
| XFeat extraction | ~15 ms (GPU) | XFeat ~27 FPS CPU → ~15 ms plausible on GPU |
| LightGlue matching | ~50-100 ms | solution_draft04, LightGlue ONNX |
| **Total** | **~65-115 ms** | Sum |
**Note**: XFeat is faster than SuperPoint (~15 ms vs ~80 ms GPU), so XFeat+LightGlue would be faster than SuperPoint+LightGlue (~130-180 ms total).
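These totals are interval sums of per-component latency estimates. A sketch of that arithmetic using the document's own rough GPU estimates (the ~15 ms XFeat figure is itself unverified); the `PIPELINES` layout is illustrative:

```python
# Per-component (name, low_ms, high_ms) estimates taken from this document.
PIPELINES = {
    "XFeat + MNN": [("XFeat extract + match", 15, 36)],
    "XFeat + LightGlue": [("XFeat extraction", 15, 15), ("LightGlue matching", 50, 100)],
    "SuperPoint + LightGlue": [("SuperPoint extraction", 80, 80), ("LightGlue matching", 50, 100)],
}

def total_range(components):
    """Best case sums the low estimates, worst case sums the high ones."""
    return sum(lo for _, lo, _ in components), sum(hi for _, _, hi in components)

def max_rate_hz(latency_ms: float) -> float:
    """Upper bound on frame rate if this stage ran back-to-back with nothing else."""
    return 1000.0 / latency_ms

if __name__ == "__main__":
    for name, parts in PIPELINES.items():
        lo, hi = total_range(parts)
        print(f"{name:24s} {lo}-{hi} ms  (<= {max_rate_hz(hi):.1f}-{max_rate_hz(lo):.1f} Hz)")
```

Summing lows with lows and highs with highs assumes the component estimates are independent bounds; it reproduces the ~65-115 ms and ~130-180 ms figures above.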
### 7.3 Best of Both Worlds?
- **XFeat**: Fast extraction, lightweight, good on textureless (demo), strong on ScanNet (indoor).
- **LightGlue**: Contextual matching, high MIR, fewer false matches.
- **Combination**: Faster than SuperPoint+LightGlue; potentially better quality than XFeat+MNN on difficult scenes.
**No published benchmark** for XFeat+LightGlue on Megadepth or agricultural data. **Inferred** benefit: Medium confidence.
---
## 8. Summary Table: Measured vs Inferred
| Statement | Type | Confidence |
|-----------|------|------------|
| SuperPoint+LightGlue > XFeat+MNN on Megadepth (AUC, MIR) | Measured | High |
| LightGlue uses attention; XFeat uses MNN | Measured | High |
| LightGlue reduces false matches vs NN (higher MIR) | Measured | High |
| XFeat handles textureless better than SIFT (demo) | Measured | High |
| XFeat generalizes well to indoor (ScanNet) | Measured | High |
| SuperPoint+LightGlue better on low-texture agricultural | Inferred | Medium |
| XFeat detects more repeatable keypoints on flat terrain | Inferred | Medium |
| XFeat+LightGlue gives best of both worlds | Inferred | Medium |
| Worst-case match rate / VO failure data | Gap | — |
---
## 9. Sources
| # | Source | Tier | Date |
|---|--------|------|------|
| 1 | [XFeat arXiv](https://arxiv.org/html/2404.19174v1) | L1 | Apr 2024 |
| 2 | [LightGlue ICCV 2023](https://openaccess.thecvf.com/content/ICCV2023/html/Lindenberger_LightGlue_Local_Feature_Matching_at_Light_Speed_ICCV_2023_paper.html) | L1 | 2023 |
| 3 | [Nature Sci Rep 2025, Table 2](https://www.nature.com/articles/s41598-025-21602-5/tables/2) | L1 | 2025 |
| 4 | [verlab/accelerated_features](https://github.com/verlab/accelerated_features) | L1 | 2024 |
| 5 | [cvg/LightGlue](https://github.com/cvg/LightGlue) | L1 | 2023 |
| 6 | [vismatch/xfeat-lightglue](https://huggingface.co/vismatch/xfeat-lightglue) | L2 | 2025 |
| 7 | [LightGlueStick arXiv](https://arxiv.org/html/2510.16438v1) | L1 | 2025 |
| 8 | [Agricultural UAV repetitive texture](https://www.sciencedirect.com/science/article/abs/pii/S092427162500190X) | L1 | 2025 |