Mirror of https://github.com/azaion/gps-denied-desktop.git (synced 2026-04-23)
Commit: add clarification to research methodology by including a step for solution comparison and user consultation

# XFeat vs SuperPoint+LightGlue for Visual Odometry in UAV Navigation

**Research Date**: March 2025
**Context**: GPS-denied UAV navigation, frame-to-frame VO from consecutive aerial photos (60–80% overlap, mostly translational motion, ~100 m inter-frame spacing, downward-facing camera)

---

## Executive Summary

**Finding**: The switch from XFeat to SuperPoint+LightGlue for VO in solution draft 04 appears to be an **unintentional regression**. For frame-to-frame VO with high overlap and mostly translational motion, XFeat is likely sufficient in quality while being ~10× faster. SuperPoint+LightGlue’s quality advantage is most relevant for wide-baseline and satellite-aerial matching, not for high-overlap consecutive aerial frames.

**Recommendation**: Revert VO to XFeat with built-in matcher. Keep SuperPoint+LightGlue (or LiteSAM) for satellite fine matching only.

---

## 1. XFeat Performance and Quality

### 1.1 Speed

| Setting | Time | Source | Confidence |
|--------|------|--------|------------|
| **CPU (Intel i5-1135G7)** | 27.1 FPS (sparse) / 19.2 FPS (semi-dense) at VGA | XFeat paper (CVPR 2024) | High |
| **CPU** | ~37 ms per frame (sparse), ~52 ms (semi-dense) | Derived from FPS | High |
| **GPU** | Not explicitly reported in paper | — | — |
| **SatLoc-Fusion (RKNN)** | 30 FPS at 640×480 (~33 ms) | SatLoc-Fusion (MDPI 2025) | High |
| **Draft 03 claim** | ~15 ms total (extract+match) on GPU | solution_draft03.md | Medium — no direct citation |

**Notes**:
- The paper reports CPU timings; GPU is expected to be faster.
- SatLoc-Fusion achieves 30 FPS with XFeat on a 6 TFLOPS edge device after RKNN acceleration.
- The ~15 ms GPU claim in draft 03 is plausible but not directly verified in the XFeat paper.

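The "Derived from FPS" row is just the reciprocal of the reported frame rates; a one-line check:

```python
def fps_to_ms(fps):
    """Convert a frames-per-second figure to milliseconds per frame."""
    return 1000.0 / fps

# XFeat paper CPU numbers: 27.1 FPS sparse, 19.2 FPS semi-dense
sparse_ms = fps_to_ms(27.1)      # ~37 ms
semi_dense_ms = fps_to_ms(19.2)  # ~52 ms
```
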
### 1.2 Quality (Megadepth-1500)

| Method | AUC@5° | AUC@10° | AUC@20° | Acc@10° | MIR | #inliers | FPS (CPU) |
|--------|--------|---------|---------|---------|-----|----------|-----------|
| SuperPoint | 37.3 | 50.1 | 61.5 | 67.4 | 0.35 | 495 | 3.0 |
| XFeat (sparse) | 42.6 | 56.4 | 67.7 | 74.9 | 0.55 | 892 | 27.1 |
| XFeat* (semi-dense) | 50.2 | 65.4 | 77.1 | 85.1 | 0.74 | 1885 | 19.2 |
| DISK* | 55.2 | 66.8 | 75.3 | 81.3 | 0.71 | 1997 | 1.2 |

**Source**: [XFeat paper, Table 1](https://arxiv.org/html/2404.19174v1)

**Conclusion**: XFeat outperforms SuperPoint on Megadepth (sparse and semi-dense) while being ~9× faster (sparse) or ~6× faster (semi-dense). Megadepth includes wide-baseline pairs; high-overlap consecutive frames are typically easier.

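For readers less familiar with the metric: AUC@τ is the area under the recall-vs-pose-error curve up to an angular-error threshold τ, normalized by τ. A minimal pure-Python sketch of the standard step-wise computation (the error values below are illustrative, not from the paper):

```python
def pose_auc(errors, threshold):
    """Area under the recall-vs-angular-error curve up to `threshold`,
    normalized so a perfect method scores 1.0 (step-wise integration,
    as commonly used in relative-pose benchmarks)."""
    errors = sorted(errors)
    n = len(errors)
    prev_e, prev_r, area = 0.0, 0.0, 0.0
    for i, e in enumerate(errors):
        if e > threshold:
            break
        area += prev_r * (e - prev_e)       # rectangle up to this error value
        prev_e, prev_r = e, (i + 1) / n     # recall after accepting pair i
    area += prev_r * (threshold - prev_e)   # tail up to the threshold
    return area / threshold

# four image pairs with pose errors of 2°, 4°, 8°, 15°
auc10 = pose_auc([2.0, 4.0, 8.0, 15.0], 10.0)  # = 0.4
```
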
### 1.3 Downward-Facing Camera / Nadir Aerial

- No direct benchmark for nadir aerial imagery.
- SatLoc-Fusion uses XFeat for VO on downward-facing UAV imagery (100–300 m altitude, DJI Mavic, 1920×1080).
- SatLoc-Fusion achieves <15 m absolute localization error and >90% trajectory coverage at >2 Hz on 6 TFLOPS edge hardware.
- XFeat is trained on Megadepth + COCO; homography estimation on HPatches is strong (illumination and viewpoint splits).

**Confidence**: Medium — SatLoc-Fusion validates XFeat for UAV VO, but not on the exact 100 m inter-frame spacing scenario.

---

## 2. SuperPoint+LightGlue Performance and Quality

### 2.1 Speed

| Component | Time | Source |
|-----------|------|--------|
| SuperPoint extraction | ~80 ms GPU | solution_draft04.md, LiteSAM assessment |
| LightGlue ONNX FP16 | ~50–100 ms | solution_draft04.md |
| **Total VO** | **~130–180 ms** | solution_draft04.md |

### 2.2 Quality for VO

- LightGlue-based VO on KITTI: ~1% odometry error vs ~3.5–4.1% with FLANN.
- SuperVINS uses SuperPoint+LightGlue for front-end matching in visual-inertial SLAM.
- LightGlue is not rotation-invariant (GitHub issue #64).

### 2.3 High-Overlap Consecutive Frames

- High overlap (60–80%) and mostly translational motion are easier than Megadepth’s wide-baseline pairs.
- SuperPoint+LightGlue’s main strength is contextual matching and robustness to viewpoint/illumination changes.
- For near-planar, high-overlap aerial pairs, simpler matchers (e.g. MNN) often suffice.

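A quick sanity check on the stated geometry (nadir camera assumed; the numbers are the scenario figures from the context line, not from a cited source): forward overlap and inter-frame spacing together pin down the along-track ground footprint.

```python
def footprint_from_overlap(spacing_m, overlap):
    """Along-track ground footprint implied by inter-frame spacing and
    forward overlap, using overlap = 1 - spacing / footprint (nadir camera)."""
    return spacing_m / (1.0 - overlap)

# ~100 m spacing at 60-80% overlap implies a ~250-500 m along-track footprint,
# i.e. each new frame shares a wide strip of ground with its predecessor
lo = footprint_from_overlap(100.0, 0.60)  # ~250 m
hi = footprint_from_overlap(100.0, 0.80)  # ~500 m
```
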
**Conclusion**: SuperPoint+LightGlue is strong for difficult matching, but its advantage is less critical for high-overlap consecutive aerial frames.

---

## 3. SatLoc-Fusion: XFeat in Production UAV VO

**Paper**: [Towards UAV Localization in GNSS-Denied Environments: The SatLoc Dataset and a Hierarchical Adaptive Fusion Framework](https://www.mdpi.com/2072-4292/17/17/3048) (MDPI Remote Sensing, Sept 2025)

### 3.1 Architecture

- **Layer 1**: DINOv2 for aerial–satellite matching (absolute localization).
- **Layer 2**: XFeat for VO (relative pose between consecutive frames).
- **Layer 3**: Lucas–Kanade optical flow for velocity.

### 3.2 XFeat Usage

- XFeat for keypoint detection, descriptor extraction, and matching.
- Homography via DLT + RANSAC.
- Scale from relative altitude.
- Fine-tuned on UAV-VisLoc starting from public XFeat weights.

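The "scale from relative altitude" step amounts to applying the ground sampling distance of a nadir camera; a minimal sketch with hypothetical numbers (the paper does not spell out its exact formulation):

```python
def pixel_shift_to_meters(shift_px, altitude_m, focal_px):
    """Convert an image-plane shift to metric ground displacement for a
    nadir camera: ground sampling distance ~= altitude / focal length (px)."""
    return shift_px * altitude_m / focal_px

# hypothetical: 200 m altitude, 1000 px focal length, 50 px inter-frame shift
ground_shift = pixel_shift_to_meters(50.0, 200.0, 1000.0)  # 10.0 m
```
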
### 3.3 Results

| Metric | Value |
|--------|-------|
| Absolute localization error | <15 m |
| Trajectory coverage | >90% |
| Throughput | >2 Hz on 6 TFLOPS edge |
| XFeat inference | 30 FPS at 640×480 (RKNN) |

### 3.4 Ablation

- Without Layer 1 (satellite): MLE 27.84 m vs 14.05 m with full system.
- Layer 2 (XFeat VO) + Layer 3 (optical flow) provide fallback when satellite matching fails.

**Conclusion**: XFeat is validated for UAV VO in a published 2025 system with similar constraints (downward-facing, low altitude, edge hardware).

---

## 4. Direct Comparison: XFeat vs SuperPoint+LightGlue for VO

| Dimension | XFeat | SuperPoint+LightGlue |
|-----------|-------|----------------------|
| **VO time per frame** | ~15–36 ms (GPU/CPU) | ~150–200 ms |
| **Speed ratio** | ~10× faster | Baseline |
| **Megadepth AUC@10°** | 65.4 (semi-dense, MNN) | 50.1 for SuperPoint+MNN; 75.0 with LightGlue (XFeat paper App. F) |
| **Megadepth #inliers** | 892–1885 | 495 (MNN) / 475 (LightGlue) |
| **Built-in matcher** | Yes (MNN + optional refinement) | No (needs LightGlue) |
| **VRAM (VO only)** | ~200 MB | ~900 MB |
| **UAV VO validation** | SatLoc-Fusion (2025) | No direct UAV VO paper |
| **Rotation invariance** | No | No |

### 4.1 When SuperPoint+LightGlue Helps

- Wide-baseline matching.
- Satellite–aerial matching (different viewpoints, scale, illumination).
- Low-texture or repetitive scenes where contextual matching matters.

### 4.2 When XFeat Is Enough

- High-overlap consecutive frames (60–80%).
- Mostly translational motion.
- Downward-facing nadir imagery.
- Real-time or resource-limited systems.

---

## 5. Answer to the Key Question

**For frame-to-frame VO with 60–80% overlap and mostly translational motion, is XFeat sufficient, or does SuperPoint+LightGlue provide materially better results?**

**Answer**: XFeat is likely sufficient. Evidence:

1. **Megadepth**: XFeat outperforms SuperPoint on pose estimation (AUC, inliers).
2. **Task difficulty**: High-overlap consecutive frames are easier than Megadepth’s wide-baseline pairs.
3. **SatLoc-Fusion**: XFeat delivers <15 m error and >2 Hz on edge hardware for UAV VO.
4. **Cost**: SuperPoint+LightGlue is ~10× slower with no clear VO-specific benefit in this scenario.
5. **Assessment gap**: Draft 04’s assessment targeted satellite matching (SuperPoint+LightGlue → LiteSAM), not VO. The VO change from XFeat to SuperPoint+LightGlue was not justified in the findings.

---

## 6. Sources

| # | Source | Tier | Date | Key Content |
|---|--------|------|------|-------------|
| 1 | [XFeat: Accelerated Features (arXiv)](https://arxiv.org/html/2404.19174v1) | L1 | Apr 2024 | Benchmarks, Megadepth, CPU FPS |
| 2 | [SatLoc-Fusion (MDPI Remote Sensing)](https://www.mdpi.com/2072-4292/17/17/3048) | L1 | Sept 2025 | XFeat for UAV VO, <15 m, >2 Hz |
| 3 | [accelerated_features GitHub](https://github.com/verlab/accelerated_features) | L1 | 2024 | XFeat implementation |
| 4 | solution_draft03.md | L2 | Project | XFeat ~15 ms VO, SuperPoint for satellite |
| 5 | solution_draft04.md | L2 | Project | SuperPoint+LightGlue for VO ~150–200 ms |
| 6 | [vo_lightglue](https://github.com/himadrir/vo_lightglue) | L3 | — | LightGlue VO, ~1% error on KITTI |
| 7 | [Luxonis XFeat](https://models.luxonis.com/luxonis/xfeat/) | L2 | — | XFeat comparable to SuperPoint, faster |

---

## 7. Confidence Summary

| Statement | Confidence |
|-----------|------------|
| XFeat is 5–9× faster than SuperPoint on CPU | High |
| XFeat outperforms SuperPoint on Megadepth | High |
| SatLoc-Fusion uses XFeat successfully for UAV VO | High |
| ~15 ms GPU claim for XFeat is plausible | Medium |
| XFeat is sufficient for 60–80% overlap VO | Medium–High |
| SuperPoint+LightGlue does not materially improve VO for this use case | Medium |
| VO change in draft 04 was unintentional | High (no assessment finding) |

---

## 8. Recommendation

**Revert VO to XFeat** as in solution draft 03:

- Use XFeat with built-in matcher for frame-to-frame VO.
- Keep LiteSAM (or SuperPoint+LightGlue) for satellite fine matching only.
- Expected VO time: ~15–36 ms vs ~150–200 ms with SuperPoint+LightGlue.
- Total per-frame time should drop from ~350–470 ms to ~230–300 ms.

If VO quality issues appear in testing, consider:

- XFeat semi-dense mode (XFeat*) for more matches.
- XFeat+LightGlue as a middle ground (faster than SuperPoint+LightGlue, potentially better than XFeat alone).

---

# Source Registry: XFeat vs SuperPoint+LightGlue Low-Texture Matching

## Source #1
- **Title**: XFeat: Accelerated Features for Lightweight Image Matching
- **Link**: https://arxiv.org/html/2404.19174v1
- **Tier**: L1
- **Publication Date**: 2024-04
- **Summary**: CVPR 2024 paper. Megadepth-1500 Table 1 (XFeat, SuperPoint, DISK with MNN). Appendix F: LightGlue vs XFeat* (61.4 vs 50.2 AUC@5°). XFeat uses dual-softmax + MNN. Textureless demo vs SIFT. ScanNet indoor generalization.
- **Related Sub-question**: 1, 2, 3, 5

## Source #2
- **Title**: LightGlue: Local Feature Matching at Light Speed
- **Link**: https://openaccess.thecvf.com/content/ICCV2023/html/Lindenberger_LightGlue_Local_Feature_Matching_at_Light_Speed_ICCV_2023_paper.html
- **Tier**: L1
- **Publication Date**: 2023
- **Summary**: ICCV 2023. Transformer matcher with self/cross-attention. Adaptive computation. Typically paired with SuperPoint.
- **Related Sub-question**: 3

## Source #3
- **Title**: Nature Sci Rep 2025 - Table 2 Relative pose estimation MegaDepth-1500
- **Link**: https://www.nature.com/articles/s41598-025-21602-5/tables/2
- **Tier**: L1
- **Publication Date**: 2025
- **Summary**: LightGlue: RANSAC 47.83%, AUC@5° 86.8, AUC@10° 96.3. SuperGlue, OmniGlue, DALGlue comparison. Detector not specified.
- **Related Sub-question**: 1

## Source #4
- **Title**: verlab/accelerated_features
- **Link**: https://github.com/verlab/accelerated_features
- **Tier**: L1
- **Summary**: XFeat implementation. Textureless scene demo. LightGlue integration (Issue #67). GlueFactory.
- **Related Sub-question**: 5

## Source #5
- **Title**: cvg/LightGlue
- **Link**: https://github.com/cvg/LightGlue
- **Tier**: L1
- **Summary**: LightGlue implementation. Issue #128: XFeat_with_lightglue. SuperPoint pairing.
- **Related Sub-question**: 5

## Source #6
- **Title**: vismatch/xfeat-lightglue
- **Link**: https://huggingface.co/vismatch/xfeat-lightglue
- **Tier**: L2
- **Summary**: Pre-trained XFeat+LightGlue model. vismatch library.
- **Related Sub-question**: 5

## Source #7
- **Title**: LightGlueStick: Joint Point-Line Matching
- **Link**: https://arxiv.org/html/2510.16438v1
- **Tier**: L1
- **Summary**: Line segments in texture-less regions. LightGlue architecture for low-texture.
- **Related Sub-question**: 3

## Source #8
- **Title**: Novel real-time matching for low-overlap agricultural UAV images with repetitive textures
- **Link**: https://www.sciencedirect.com/science/article/abs/pii/S092427162500190X
- **Tier**: L1
- **Summary**: Agricultural UAV, repetitive textures, low overlap. Global texture information for weak-textured regions.
- **Related Sub-question**: 2, 4

---

# XFeat vs SuperPoint+LightGlue: Low-Texture Aerial Matching Assessment

**Research Date**: March 2025
**Context**: GPS-denied UAV navigation over eastern Ukraine — flat agricultural fields, uniform croplands, low-density features, visually repetitive terrain. Downward-facing camera at up to 1 km altitude.

---

## Executive Summary

| Question | Finding | Confidence |
|----------|---------|------------|
| XFeat+XFeat_matcher vs SuperPoint+LightGlue on low-texture | **SuperPoint+LightGlue** has higher AUC and RANSAC success; XFeat+MNN is faster but uses a simple matcher | High (benchmarks) |
| Detector on low-texture | **No direct agricultural benchmark**; XFeat has textureless demo; SuperPoint trained on synthetic shapes | Medium (inferred) |
| LightGlue advantage on difficult scenes | Attention mechanism helps on repetitive/ambiguous patterns; reduces false matches vs NN | High (paper mechanism) |
| Worst-case match rate / VO failures | **No published data** on per-frame failure rate or segment breaks | Low (gap) |
| XFeat+LightGlue | **Available** (GlueFactory, vismatch); ~65–115 ms estimated; best-of-both-worlds option | Medium (implementation exists) |

---

## 1. Detector+Matcher Pairings: Critical Distinction

**All benchmark numbers depend on the exact pairing.** The following table clarifies what each paper measures:

| Pipeline | Detector | Matcher | Source |
|----------|----------|---------|--------|
| XFeat (sparse) | XFeat | MNN (Mutual Nearest Neighbor) | XFeat paper Table 1 |
| XFeat* (semi-dense) | XFeat | MNN + offset refinement | XFeat paper Table 1 |
| SuperPoint | SuperPoint | MNN | XFeat paper Table 1 |
| LightGlue | SuperPoint (typical) | LightGlue (attention-based) | LightGlue paper, Nature 2025 |
| XFeat+LightGlue | XFeat | LightGlue | GlueFactory, vismatch, GitHub #67 |

---

## 2. Megadepth-1500: Actual Numbers by Pipeline

### 2.1 XFeat Paper (CVPR 2024) — All Use MNN

| Method | AUC@5° | AUC@10° | AUC@20° | Acc@10° | MIR | #inliers | FPS (CPU) |
|--------|--------|---------|---------|---------|-----|----------|-----------|
| SuperPoint | 37.3 | 50.1 | 61.5 | 67.4 | 0.35 | 495 | 3.0 |
| XFeat (sparse) | 42.6 | 56.4 | 67.7 | 74.9 | 0.55 | 892 | 27.1 |
| XFeat* (semi-dense) | 50.2 | 65.4 | 77.1 | 85.1 | 0.74 | 1885 | 19.2 |
| DISK* | 55.2 | 66.8 | 75.3 | 81.3 | 0.71 | 1997 | 1.2 |

**Source**: [XFeat paper, Table 1](https://arxiv.org/html/2404.19174v1)
**Matcher**: MNN for all. XFeat uses dual-softmax loss during training but **MNN at inference**.
**Resolution**: Max dimension 1200 px.

### 2.2 XFeat Paper Appendix F — Learned Matchers

| Method | Type | AUC@5° | AUC@10° | AUC@20° | Acc@10° | MIR | #inliers | PPS |
|--------|------|--------|---------|---------|---------|-----|----------|-----|
| LightGlue | learned matcher | 61.4 | 75.0 | 84.8 | 91.8 | 0.92 | 475 | 0.31 |
| XFeat* | coarse-fine | 50.2 | 65.4 | 77.1 | 85.1 | 0.74 | 1885 | 1.33 |
| LoFTR | learned matcher | 68.3 | 80.0 | 88.0 | 93.9 | 0.93 | 3009 | 0.06 |
| Patch2Pix | coarse-fine | 47.8 | 61.0 | 71.0 | 77.8 | 0.59 | 536 | 0.05 |

**Source**: [XFeat paper, Appendix F, Table 6](https://arxiv.org/html/2404.19174v1)
**Detector for LightGlue**: Not explicitly stated; standard LightGlue model is trained for **SuperPoint**.
**Setup**: i7-6700K CPU, 1200 px max dimension, pairs per second (PPS).

**Conclusion**: SuperPoint+LightGlue (61.4% AUC@5°, 84.8% AUC@20°) **outperforms** XFeat+XFeat_matcher (50.2% AUC@5°, 77.1% AUC@20°) on Megadepth-1500. LightGlue has higher MIR (0.92 vs 0.74) and Acc@10° (91.8 vs 85.1).

### 2.3 Nature 2025 (DALGlue Paper) — Different Protocol

| Method | RANSAC % | Precision % | Recall % | AUC@5° | AUC@10° |
|--------|----------|------------|---------|--------|---------|
| SuperGlue | 34.18 | 50.32 | 64.16 | 74.6 | 90.5 |
| LightGlue | 47.83 | 65.48 | 79.04 | **86.8** | **96.3** |
| OmniGlue | 47.4 | 65.0 | 77.8 | 82.1 | 95.3 |
| DALGlue | 57.01 | 73.0 | 84.11 | 87.2 | 97.5 |

**Source**: [Nature Sci Rep 2025, Table 2](https://www.nature.com/articles/s41598-025-21602-5/tables/2)
**Detector**: Not specified; LightGlue is typically paired with SuperPoint.
**Note**: Higher AUC than XFeat Appendix F — likely different resolution (e.g. 1600 px), RANSAC settings, or evaluation protocol.

---

## 3. XFeat Built-in Matcher vs LightGlue

### 3.1 XFeat Matcher

- **Mechanism**: Dual-softmax nearest-neighbor. Similarity matrix S = F1·F2^T; softmax row-wise and column-wise; mutual nearest neighbor selection.
- **Training**: Dual-softmax loss (Eq. 3 in XFeat paper) supervises descriptors.
- **Inference**: MNN search on descriptors. No attention, no contextual refinement.
- **Limitation**: On repetitive/ambiguous patterns, nearest-neighbor can produce many false matches; no geometric reasoning.

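The mechanism can be sketched in a few lines; a toy pure-Python version of dual-softmax scoring plus mutual-nearest-neighbor selection (illustrative only, on tiny descriptor lists rather than the tensors the real pipeline uses):

```python
import math

def dual_softmax_mnn(desc_a, desc_b):
    """Toy dual-softmax + mutual-nearest-neighbor matching over plain-Python
    descriptor lists (descriptors assumed L2-normalized)."""
    # similarity matrix S = F1 . F2^T
    sim = [[sum(x * y for x, y in zip(a, b)) for b in desc_b] for a in desc_a]

    def softmax(row):
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        return [e / total for e in exps]

    rows = [softmax(r) for r in sim]            # softmax over B for each a_i
    cols = [softmax(c) for c in zip(*sim)]      # softmax over A for each b_j
    matches = []
    for i, r in enumerate(rows):
        j = r.index(max(r))                     # a_i's nearest neighbor in B
        if cols[j].index(max(cols[j])) == i:    # mutual check: b_j points back
            matches.append((i, j, r[j] * cols[j][i]))  # dual-softmax confidence
    return matches
```

Note that nothing in this loop looks beyond a single descriptor pair, which is exactly the limitation listed above.
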
### 3.2 LightGlue Matcher

- **Mechanism**: Transformer with self-attention (within image) and cross-attention (across images). Rotary positional encoding. Matchability-aware pruning.
- **Advantage**: Contextual matching — can disambiguate repetitive structures using neighborhood and global structure.
- **Adaptive**: Early exit on easy pairs; more computation on difficult pairs.
- **Source**: [LightGlue ICCV 2023](https://openaccess.thecvf.com/content/ICCV2023/html/Lindenberger_LightGlue_Local_Feature_Matching_at_Light_Speed_ICCV_2023_paper.html)

### 3.3 Which Performs Better on Sparse, Repetitive Features?

**Measured**: LightGlue (with SuperPoint) achieves higher AUC and MIR than XFeat+MNN on Megadepth-1500. Megadepth includes repetitive structures and viewpoint changes.

**Inferred**: On low-texture, repetitive agricultural terrain:
- **LightGlue** should reduce false matches by using attention over keypoint neighborhoods.
- **XFeat+MNN** may produce more matches (#inliers 1885 vs 475) but with lower precision (MIR 0.74 vs 0.92).
- For VO, **precision matters** — false matches corrupt RANSAC and cause pose drift. LightGlue’s higher MIR suggests fewer outliers.

**Confidence**: High for mechanism; Medium for low-texture agricultural extrapolation (no direct benchmark).

---

## 4. SuperPoint vs XFeat Keypoint Detection on Low-Texture

### 4.1 SuperPoint

- **Training**: Pretrained on synthetic shapes (MagicPoint), then self-supervised via Homographic Adaptation on real indoor/outdoor imagery (MS-COCO).
- **Low-texture**: No explicit low-texture training. "Dustbin" channel helps reject non-interest points. Homographic Adaptation improves repeatability across transformations.
- **Agricultural**: Extensions like SuperPoint-E use tracking adaptation for low-texture endoscopy; no agricultural-specific variant found.

### 4.2 XFeat

- **Training**: Megadepth + COCO (6:4 hybrid). Keypoint head distilled from ALIKE-Tiny (low-level features: corners, lines, blobs).
- **Low-texture**: GitHub demo shows "SIFT cannot handle fast camera movements, while XFeat provides robust matches" on a **textureless scene** ([verlab/accelerated_features](https://github.com/verlab/accelerated_features)).
- **Lightweight**: 64-D descriptors; fewer keypoints in uniform areas by design (reliability map R filters low-confidence regions).

### 4.3 Which Extracts More Repeatable Keypoints on Flat Agricultural Terrain?

**Measured**: None. No benchmark on agricultural or flat cropland imagery.

**Inferred**:
- XFeat’s textureless demo suggests it handles low-texture better than SIFT.
- XFeat’s ScanNet-1500 results (indoor, often texture-poor) show XFeat outperforming DISK and ALIKE — "indoor imagery often lacks distinctiveness at the local level" (XFeat Appendix E).
- SuperPoint’s generalization comes from synthetic training; agricultural uniformity may be out-of-distribution.
- **Conclusion**: XFeat may have an edge on texture-poor scenes based on ScanNet and the textureless demo; SuperPoint has no such evidence. Confidence: Medium.

---

## 5. LightGlue Advantage on Difficult Scenes

### 5.1 Attention Mechanism

- Self-attention: aggregates information within each image.
- Cross-attention: matches features across images with context.
- Helps distinguish repetitive patterns by using neighborhood structure.

### 5.2 False Match Reduction

- LightGlue predicts **matchability** scores and prunes low-confidence matches.
- MIR 0.92 (LightGlue) vs 0.74 (XFeat*) indicates a much higher fraction of matches that comply with the estimated model after RANSAC.
- **Interpretation**: LightGlue produces fewer but more reliable matches; XFeat* produces more matches but with more outliers.

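Reading the Appendix F figures through that lens makes the trade-off concrete (illustrative arithmetic on the cited numbers, treating MIR as a per-match consistency rate):

```python
def expected_outliers(n_matches, mir):
    """Expected matches inconsistent with the estimated model, taking the
    mean inlier ratio (MIR) as the per-match consistency rate."""
    return n_matches * (1.0 - mir)

# Appendix F: LightGlue 475 matches at MIR 0.92, XFeat* 1885 at MIR 0.74
lg_outliers = expected_outliers(475, 0.92)      # ~38
xfeat_outliers = expected_outliers(1885, 0.74)  # ~490
```

XFeat* still nets more absolute inliers (~1395 vs ~437), but it hands RANSAC roughly an order of magnitude more outliers to reject.
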
### 5.3 Repetitive/Ambiguous Patterns

- LightGlueStick (line+point) explicitly targets "line segments abundant in texture-less regions" ([LightGlueStick arXiv](https://arxiv.org/html/2510.16438v1)).
- For point-only matching, LightGlue’s attention still helps disambiguate when features look similar.

**Confidence**: High for mechanism; Medium for agricultural extrapolation.

---

## 6. Worst-Case Match Rate / VO Failure

### 6.1 What Matters for VO

- A single frame failure can cause a segment break.
- Metrics like AUC and MIR are averaged over many pairs; they do not directly measure "percentage of frames that fail to produce a valid pose."

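To see why per-frame failure rates matter more than averaged metrics, assume (purely for illustration) that per-frame failures are independent; segment survival then decays geometrically with segment length:

```python
def segment_survival(per_frame_success, n_frames):
    """Probability of completing n consecutive frames without a VO break,
    under an assumed independence model for per-frame match failures."""
    return per_frame_success ** n_frames

# even 99% per-frame success leaves under a 5% chance of an
# unbroken 300-frame segment
p300 = segment_survival(0.99, 300)
```

This is exactly why the missing per-frame data is the key gap: a benchmark AUC difference of a few points says little about how often a long segment breaks.
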
### 6.2 Data Availability

| Metric | Availability |
|--------|--------------|
| AUC, Acc@10°, MIR | Yes (Megadepth, etc.) |
| Per-frame success rate | **No** |
| Segment break rate | **No** |
| Match failure rate on difficult sequences | **No** |

### 6.3 Inference

- Higher MIR and AUC typically correlate with fewer RANSAC failures.
- SuperPoint+LightGlue’s higher MIR (0.92 vs 0.74) suggests fewer frames where RANSAC would fail to find a valid pose.
- **No quantitative evidence** for VO-specific failure rates.

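The classic RANSAC iteration bound makes this inference concrete: to draw at least one all-inlier minimal sample (s = 4 correspondences for a homography) with confidence p, given inlier ratio w, one needs N = log(1 - p) / log(1 - w^4) iterations. Plugging in the two MIR values (illustrative; MIR is only a proxy for the true inlier ratio):

```python
import math

def ransac_iterations(inlier_ratio, sample_size=4, confidence=0.99):
    """Classic RANSAC bound: iterations needed to draw at least one
    all-inlier minimal sample with the given confidence."""
    p_good_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))

n_lightglue = ransac_iterations(0.92)  # 4 iterations
n_xfeat = ransac_iterations(0.74)      # 13 iterations
```

Both counts are tiny, so iteration cost is not the issue; the point is that a lower inlier ratio raises the odds that a badly contaminated frame yields no valid pose at all.
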
**Confidence**: Low — purely inferred.

---

## 7. XFeat+LightGlue Option

### 7.1 Feasibility

| Source | Finding |
|--------|---------|
| [GitHub Issue #67](https://github.com/verlab/accelerated_features/issues/67) | XFeat+LightGlue via GlueFactory |
| [GitHub Issue #128](https://github.com/cvg/LightGlue/issues/128) | XFeat_with_lightglue discussion |
| [vismatch/xfeat-lightglue](https://huggingface.co/vismatch/xfeat-lightglue) | Pre-trained model on HuggingFace |
| [noahzhy/xfeat_lightglue_onnx](https://github.com/noahzhy/xfeat_lightglue_onnx) | ONNX deployment |

**Conclusion**: XFeat+LightGlue is **implemented and available**.

### 7.2 Speed Estimate

| Component | Time | Source |
|-----------|------|--------|
| XFeat extraction | ~15 ms (GPU) | XFeat ~27 FPS CPU → ~15 ms plausible on GPU |
| LightGlue matching | ~50–100 ms | solution_draft04, LightGlue ONNX |
| **Total** | **~65–115 ms** | Sum |

**Note**: XFeat is faster than SuperPoint (~15 ms vs ~80 ms GPU), so XFeat+LightGlue would be faster than SuperPoint+LightGlue (~130–180 ms total).

### 7.3 Best of Both Worlds?

- **XFeat**: Fast extraction, lightweight, good on textureless (demo), strong on ScanNet (indoor).
- **LightGlue**: Contextual matching, high MIR, fewer false matches.
- **Combination**: Faster than SuperPoint+LightGlue; potentially better quality than XFeat+MNN on difficult scenes.

**No published benchmark** for XFeat+LightGlue on Megadepth or agricultural data. **Inferred** benefit: Medium confidence.

---

## 8. Summary Table: Measured vs Inferred

| Statement | Type | Confidence |
|-----------|------|------------|
| SuperPoint+LightGlue > XFeat+MNN on Megadepth (AUC, MIR) | Measured | High |
| LightGlue uses attention; XFeat uses MNN | Measured | High |
| LightGlue reduces false matches vs NN (higher MIR) | Measured | High |
| XFeat handles textureless better than SIFT (demo) | Measured | High |
| XFeat generalizes well to indoor (ScanNet) | Measured | High |
| SuperPoint+LightGlue better on low-texture agricultural | Inferred | Medium |
| XFeat detects more repeatable keypoints on flat terrain | Inferred | Medium |
| XFeat+LightGlue gives best of both worlds | Inferred | Medium |
| Worst-case match rate / VO failure data | Gap | — |

---

## 9. Sources

| # | Source | Tier | Date |
|---|--------|------|------|
| 1 | [XFeat arXiv](https://arxiv.org/html/2404.19174v1) | L1 | Apr 2024 |
| 2 | [LightGlue ICCV 2023](https://openaccess.thecvf.com/content/ICCV2023/html/Lindenberger_LightGlue_Local_Feature_Matching_at_Light_Speed_ICCV_2023_paper.html) | L1 | 2023 |
| 3 | [Nature Sci Rep 2025, Table 2](https://www.nature.com/articles/s41598-025-21602-5/tables/2) | L1 | 2025 |
| 4 | [verlab/accelerated_features](https://github.com/verlab/accelerated_features) | L1 | 2024 |
| 5 | [cvg/LightGlue](https://github.com/cvg/LightGlue) | L1 | 2023 |
| 6 | [vismatch/xfeat-lightglue](https://huggingface.co/vismatch/xfeat-lightglue) | L2 | 2025 |
| 7 | [LightGlueStick arXiv](https://arxiv.org/html/2510.16438v1) | L1 | 2025 |
| 8 | [Agricultural UAV repetitive texture](https://www.sciencedirect.com/science/article/abs/pii/S092427162500190X) | L1 | 2025 |