gps-denied-onboard/_docs/00_research/01_source_registry/C3_matchers.md
Oleksandr Bezdieniezhnykh 846670a5c5 Refactor documentation for splittable artifacts and update references
Updated various documentation files to clarify the handling of splittable artifacts, allowing for folder equivalents of key markdown files when they exceed size limits. Adjusted references in multiple sections to reflect this new structure, ensuring consistency across the research methodology. Enhanced clarity on the saving actions and artifact organization, particularly for `01_source_registry.md`, `02_fact_cards.md`, and `06_component_fit_matrix.md`. This change aims to improve usability and maintainability of the research documentation.
2026-05-08 23:39:30 +03:00


Source Registry — C3 — Cross-domain matcher candidates

Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation). Critical-novelty sensitivity per Step 0.5 in ../00_question_decomposition.md. Time windows applied:

  • Lead-candidate / SOTA claims: prefer sources within last 6 months; up to 18 months if older is the official authority.
  • Library/SDK API behaviour: must reflect the currently shipped version at search time (context7 mandatory per lead candidate).
  • Established baselines (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window.
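
The first and third time windows above can be read as a small decision rule. The sketch below is illustrative only: the names `timeliness_status`, `months_between`, and the status strings are hypothetical and not part of the engine, and the Library/SDK rule (which depends on the shipped version rather than on a date) is deliberately left out.

```python
from datetime import date

# Hypothetical names; the registry defines the windows, not this code.
ESTABLISHED_BASELINES = {"KLT", "RANSAC", "EKF", "ORB", "SIFT", "GTSAM"}
PREFERRED_MONTHS = 6   # lead-candidate / SOTA claims: preferred window
MAX_MONTHS = 18        # allowed only if the older source is the official authority


def months_between(published: date, accessed: date) -> int:
    """Whole calendar months elapsed between publication and access."""
    months = (accessed.year - published.year) * 12 + (accessed.month - published.month)
    if accessed.day < published.day:
        months -= 1  # the final month is not complete yet
    return months


def timeliness_status(name: str, published: date, accessed: date,
                      official_authority: bool = False) -> str:
    """Classify a source against the time windows listed above."""
    if name in ESTABLISHED_BASELINES:
        return "exempt (established baseline)"
    age = months_between(published, accessed)
    if age <= PREFERRED_MONTHS:
        return "within preferred window"
    if age <= MAX_MONTHS and official_authority:
        return "within extended window"
    return "outside window"


accessed = date(2026, 5, 8)
print(timeliness_status("LightGlue-ONNX FP8 changelog", date(2026, 1, 19), accessed))
print(timeliness_status("LightGlue paper", date(2023, 6, 23), accessed, True))
print(timeliness_status("SIFT", date(2004, 11, 1), accessed))
```

A source that lands in "outside window" is not automatically rejected; the entries below record case-by-case exemptions in their Timeliness Status field.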

This file replaces a section of the previous monolithic 01_source_registry.md. See 00_summary.md for the full category index. Investigation order is tracked in ../00_question_decomposition.md and the cross-category Investigation Status table in 00_summary.md.


Source #69

  • Title: LightGlue — context7 per-mode capability lookup (/cvg/lightglue, main) — High source reputation, benchmark score 85.4, 64 code snippets indexed
  • Link: context7 query against /cvg/lightglue, accessed 2026-05-08; canonical doc references returned: https://context7.com/cvg/lightglue/llms.txt, snippets Initialize LightGlue Feature Matcher, LightGlue - Feature Matcher Initialization, Initialize SuperPoint Feature Extractor, Initialize and Use DISK Feature Extractor, Initialize and Use SIFT Feature Extractor, Perform Feature Matching with LightGlue, Complete Matching Pipeline Example, Initialize and Use SuperPoint + LightGlue Matcher, Extract Matched Keypoint Coordinates
  • Tier: L1 (project-official codebase by canonical LightGlue authors Philipp Lindenberger + Paul-Edouard Sarlin + Marc Pollefeys, ETH Zurich + Microsoft Mixed Reality & AI Lab; context7 indexed at /cvg/lightglue with High reputation, benchmark 85.4, 64 code snippets — confirms this is a widely-adopted reference implementation)
  • Publication Date: live docs (main HEAD, accessed 2026-05-08)
  • Timeliness Status: Within Critical-novelty window (active main + community evidence through 2025–2026; see Source #73 LightGlue-ONNX changelog with January 2026 entries; HuggingFace Transformers integration mentioned in canonical README confirms continued active distribution)
  • Version Info: main HEAD at access time. Mode-enumeration query (1/3) PASS — all five extractor modes documented as first-class features enum values: superpoint (256-D descriptors, MagicLeap pretrained), disk (128-D, Apache-2.0 weights), aliked (128-D, BSD-3-Clause), sift (128-D, includes scale + orientation, classical), doghardnet (128-D, includes scale + orientation). Construction signature: LightGlue(features: str, n_layers: int = 9, depth_confidence: float = 0.8, width_confidence: float = 0.9, filter_threshold: float = 0.1, flash: bool = False, mp: bool = False). NOTE — version skew between context7 docstring and canonical README defaults: context7 docstring says depth_confidence=0.8 and width_confidence=0.9; canonical README §"Advanced configuration" says depth_confidence=0.95, disable with -1 and width_confidence=0.99, disable with -1 and flash=True (LightGlue automatically detects if FlashAttention is available). The canonical README values are authoritative for the live source. PyTorch ≥2.0 enables matcher.compile(mode='reduce-overhead') for additional speedup (with caveat: for inputs <1536 keypoints compiles but disables point pruning; for larger inputs falls back to eager mode with point pruning).
  • Target Audience: System architects + C3 implementer + Step-7.5 reviewer
  • Research Boundary Match: Full match for the project's pinned C3 mode (SuperPoint feature extractor + LightGlue matcher in single-image-pair inference mode on GPU, with feats_q = extractor.extract(image_q), feats_t = extractor.extract(image_t), matches_qt = matcher({'image0': feats_q, 'image1': feats_t}) followed by rbd() to remove the batch dimension). The canonical pipeline handles asymmetric image-pair sizes (UAV nadir 5472×3648 vs satellite tile of any size at 0.5 m/px) since each image is independently extracted before matching. Open: the context7 Disqualifier-Probe query did not surface ONNX/TensorRT export paths inside the cvg/lightglue repo itself; those are documented in the companion fabio-sim/LightGlue-ONNX project (Source #73), which the canonical README explicitly links to. The query also did not surface Jetson-specific latency/memory measurements (as with all C2 candidates; the Jetson MVE phase will resolve this).
  • Summary: Confirms LightGlue's per-mode API surface and runnable example for single-pair feature matching: each (features, matcher) extractor-matcher tuple is a separately-cataloged sibling mode per the Per-Mode API rule. Confirmed five sibling modes via features= enum: SuperPoint+LightGlue, DISK+LightGlue, ALIKED+LightGlue, SIFT+LightGlue, DoGHardNet+LightGlue. Canonical inference signature is matcher({'image0': feats0, 'image1': feats1}) returning {matches0, matches1, matching_scores0, matching_scores1, matches: List[[K,2]], scores: List[[K]], stop: int}; rbd(x) helper removes the batch dimension to extract single-pair tensors. The points0 = feats0['keypoints'][matches[..., 0]] and points1 = feats1['keypoints'][matches[..., 1]] extraction pattern produces 2D-2D correspondences directly consumable by the project's downstream C4 PnP+RANSAC pose estimator. Performance configuration knobs documented: depth_confidence, width_confidence, filter_threshold, flash, mp, n_layers, compile(). Open: cross-domain (UAV nadir × ortho satellite) recall numbers absent from context7-indexed snippets (concentration on phototourism/visual-localization benchmarks); aerial-domain validation requires Jetson MVE on AerialExtreMatch + Derkachi flight per D-C3 deferred phase.
  • Related Sub-question: SQ3+SQ4 / C3 — LightGlue per-mode API capability verification (Mandatory context7 lookup per Per-Mode API Capability Verification rule)
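
The single-pair call sequence recorded in this entry can be sketched as one function. This is a minimal sketch following the documented pattern, not a tested integration: the function name `match_pair_points`, its parameters, and the default of 2048 keypoints are hypothetical. The imports are placed inside the function so that the module loads on machines without `lightglue` or `torch`, and nothing runs until the function is called.

```python
def match_pair_points(path_q: str, path_t: str, features: str = "superpoint",
                      max_keypoints: int = 2048, device: str = "cuda"):
    """Return matched 2D keypoints (query, tile) and match scores for one image pair.

    Hypothetical helper: path_q is the UAV frame, path_t the retrieved satellite tile.
    """
    # Lazy imports: requires the cvg/LightGlue package and PyTorch at call time.
    from lightglue import LightGlue, SuperPoint, DISK
    from lightglue.utils import load_image, rbd

    # Only two of the five documented extractor modes are wired up in this sketch.
    # Note: the SuperPoint weights carry the restrictive license in Source #72.
    extractors = {"superpoint": SuperPoint, "disk": DISK}
    extractor = extractors[features](max_num_keypoints=max_keypoints).eval().to(device)
    matcher = LightGlue(features=features).eval().to(device)

    # Each image is extracted independently, so the two sizes may differ.
    feats_q = extractor.extract(load_image(path_q).to(device))
    feats_t = extractor.extract(load_image(path_t).to(device))
    matches_qt = matcher({"image0": feats_q, "image1": feats_t})

    # rbd() strips the batch dimension from every tensor in each dict.
    feats_q, feats_t, matches_qt = [rbd(x) for x in (feats_q, feats_t, matches_qt)]
    matches = matches_qt["matches"]                   # shape [K, 2]
    points_q = feats_q["keypoints"][matches[..., 0]]  # 2D points in the query image
    points_t = feats_t["keypoints"][matches[..., 1]]  # 2D points in the tile
    return points_q, points_t, matches_qt["scores"]
```

The returned point pairs are the 2D-2D correspondences that the downstream C4 PnP+RANSAC stage would consume; passing `features="disk"` selects the DISK+LightGlue sibling mode instead.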

Source #70

  • Title: LightGlue canonical implementation — cvg/LightGlue (Lindenberger, Sarlin, Pollefeys — ICCV 2023) — official PyTorch reference implementation, README + LICENSE, demo notebook (demo.ipynb), benchmark script (benchmark.py), training/eval framework reference (companion cvg/glue-factory); pretrained weights for SuperPoint + DISK + ALIKED + SIFT + DoGHardNet local features; HuggingFace Transformers integration (pip install transformers, model card ETH-CVG/lightglue_superpoint); kornia integration (kornia.feature.LightGlue and kornia.feature.LightGlueMatcher); hloc integration for Structure-from-Motion + visual localization; LightGlue-ONNX export project (Source #73)
  • Link: README https://raw.githubusercontent.com/cvg/LightGlue/main/README.md (accessed 2026-05-08); LICENSE https://raw.githubusercontent.com/cvg/LightGlue/main/LICENSE (accessed 2026-05-08); repo https://github.com/cvg/LightGlue ; HuggingFace model card https://huggingface.co/ETH-CVG/lightglue_superpoint ; companion training framework https://github.com/cvg/glue-factory ; companion ONNX export https://github.com/fabio-sim/LightGlue-ONNX (Source #73); kornia API https://kornia.readthedocs.io/en/latest/feature.html#kornia.feature.LightGlue
  • Tier: L1 (project-official codebase by the canonical LightGlue authors Philipp Lindenberger + Paul-Edouard Sarlin + Marc Pollefeys, ETH Zurich + Microsoft Mixed Reality & AI Lab; same author group as cvg/Hierarchical-Localization (hloc), cvg/glue-factory, cvg/pixel-perfect-sfm)
  • Publication Date: README live; main HEAD active through 2024–2026 (HuggingFace Transformers integration is recent, with @sbucaille credited; LightGlue-ONNX companion has January 2026 entries per Source #73); canonical paper ICCV 2023
  • Timeliness Status: ⚠️ Borderline. The paper (Jun 2023 / ICCV 2023) is older than the Critical-novelty 18-month window for SQ3+SQ4 component selection. However: LightGlue is widely treated as the SOTA sparse-matcher reference baseline in modern (2024–2026) feature-matching papers; the algorithmic content is stable; the canonical implementation is actively maintained (HuggingFace Transformers integration adds a plug-and-play API); the LightGlue-ONNX project (Source #73) is actively maintained through January 2026, with an FP8 quantization workflow added; and Berton+Trivigno's gmberton/auto_VPR companion harness for the C2 row also explicitly evaluates SP+LightGlue as the C3 reference matcher. Per the engine's Established-baseline exemption for widely-adopted reference algorithms, LightGlue's canonical role is the sparse-matcher SOTA reference point for the C3 row. The remaining freshness concerns are (a) emerging successors that improve on LightGlue (XFeat 2024, XFeat* 2024, SuperGlue/LightGlue successor candidates) and (b) aerial-domain transfer of the canonical phototourism-trained weights (same aerial-domain training caveat as the C2 candidates; D-C3-1 raised by this closure)
  • Version Info: main HEAD; PyTorch (≥2.0 recommended for FlashAttention auto-detection + compile() support); installation via git clone && pip install -e .; canonical five-line inference pipeline shown in README (loads LightGlue + SuperPoint/DISK/SIFT/ALIKED/DoGHardNet from the lightglue package); lightglue.utils.load_image returns torch.Tensor[3, H, W] normalized to [0,1]; lightglue.utils.rbd removes the batch dimension; lightglue.match_pair(extractor, matcher, image0, image1) is a one-call convenience function; lightglue.viz2d.{plot_images, plot_keypoints, plot_matches, save_plot} for visualization. Default LightGlue construction parameters per canonical README (authoritative over context7 docstring): n_layers=9 (all layers), flash=True (auto-detected when available), mp=False, depth_confidence=0.95 (disable with -1), width_confidence=0.99 (disable with -1), filter_threshold=0.1. Reported benchmark numbers (canonical README + paper): 150 FPS @ 1024 keypoints on RTX 3080 (= ~6.7 ms per pair, with compilation + adaptivity); 50 FPS @ 4096 keypoints on RTX 3080 (= 20 ms per pair); 4–10× speedup over SuperGlue depending on input difficulty; 20 FPS @ 512 keypoints on Intel i7 10700K CPU (= ~50 ms per pair, CPU baseline). License: Apache-2.0 for cvg/LightGlue code AND for cvg/LightGlue's pre-trained weights AND for the bundled DISK weights (DISK is published under Apache-2.0). CRITICAL CAVEAT for the SuperPoint-extractor-mode: the SuperPoint pretrained weights AND its inference file lightglue/superpoint.py follow magicleap/SuperPointPretrainedNetwork's separate, restrictive license; see Source #72 for the full license text. ALIKED is published under BSD-3-Clause; SIFT is a classical algorithm with no weight-licensing concern and has been patent-free in OpenCV since 2020. 4.7k+ stars at canonical repo
  • Target Audience: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 NEW)
  • Research Boundary Match: Full match for the project's pinned C3 mode (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run feature extraction on each independently, match via LightGlue, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). The repo ships everything needed: extractor classes (SuperPoint, DISK, ALIKED, SIFT, DoGHardNet), matcher class (LightGlue), I/O utilities (load_image, rbd, match_pair), visualization (viz2d), benchmark tooling (benchmark.py). Asymmetric image-pair sizes are handled natively: each image is independently fed through extractor.extract(...) which auto-resizes to the extractor's preferred resolution (default 1024 on longest edge for SuperPoint), then matched — the matcher operates on (keypoint coords, descriptor vectors) tuples that are size-independent. Partial match for the project's domain (canonical training on synthetic homographies of Oxford-Paris 1M distractors + fine-tuning on MegaDepth phototourism — neither dataset is aerial nadir; NO aerial nadir benchmark in the canonical paper, same aerial-domain caveat as the C2 candidates; aerial applicability is referenced transitively via Zhang et al. 2022 ISPRS [paper ref [83]], but explicit aerial-nadir validation is project-side via Jetson MVE on AerialExtreMatch + Derkachi flight)
  • Summary: LightGlue is the canonical reference implementation of the ICCV 2023 paper "LightGlue: Local Feature Matching at Light Speed" by Lindenberger, Sarlin, Pollefeys. CRITICAL LICENSE FINDING: LICENSE file is Apache-2.0 (Copyright 2023 ETH Zurich) — permissive; this places LightGlue ITSELF on the BSD/permissive license track alongside MixVPR (MIT) + SelaVPR (MIT) + NetVLAD-canonical (MIT) + EigenPlaces (MIT) + Kimera-VIO (BSD-2) + OKVIS2 (BSD-3) + DPVO (MIT) + pure-VO baseline (OpenCV-Apache-2.0). DISK (the second extractor mode) is also Apache-2.0 per the canonical README. ALIKED is BSD-3-Clause. SIFT is classical patent-free in OpenCV. HOWEVER, the SuperPoint pretrained weights AND the lightglue/superpoint.py inference file follow magicleap/SuperPointPretrainedNetwork's license — see Source #72 — which is "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY". This is a HARD DISQUALIFIER for the canonical SP+LightGlue pinned mode in the project's commercial/dual-use deployment context (eastern/southern Ukraine fixed-wing UAV is explicitly dual-use military per the project disqualifier "anything whose license blocks military / dual-use deployment"). License-track summary: cvg/LightGlue itself = Apache-2.0 (BSD/permissive track, no commercial restriction); SP weights = Magic Leap restrictive (NOT BSD/permissive, blocks commercial/dual-use); DISK weights = Apache-2.0; ALIKED weights = BSD-3-Clause; SIFT = classical no-license-concern. 
Plan-phase decision raised (will be tagged D-C3-1): swap canonical SP+LightGlue for one of: (a) DISK+LightGlue (Apache-2.0 throughout; paper Tables 6+7 show DISK+LightGlue often outperforming SP+LightGlue on Image Matching Challenge benchmarks); (b) ALIKED+LightGlue (BSD-3-Clause + Apache-2.0); (c) re-train a SuperPoint-class extractor under a permissive license (e.g., kornia's reproduction OR retrain on an aerial nadir corpus); (d) accept the Magic Leap noncommercial-research license for the project's research/development phase only, with an explicit Plan-phase commitment to swap before production deployment. Multiple downstream integration points documented: (i) HuggingFace Transformers: pip install transformers plug-and-play with the ETH-CVG/lightglue_superpoint model card (separate license terms inherited from the HuggingFace + Magic Leap stack); (ii) kornia: kornia.feature.LightGlue and kornia.feature.LightGlueMatcher interfaces; (iii) hloc: Structure-from-Motion + visual localization toolbox; (iv) LightGlue-ONNX: ONNX/TensorRT/OpenVINO/FP16/FP8 export project (Source #73); (v) Image Matching WebUI: comparison harness. Performance: 150 FPS @ 1024 keypoints on RTX 3080 with compilation + adaptivity (= ~6.7 ms per pair) and 50 FPS @ 4096 keypoints on RTX 3080 (= 20 ms per pair). Adaptive depth and width (paper §3.3) reduce inference time by ~33% at <1% loss of accuracy on common workloads (paper Table 11 ablation). At the project's expected per-frame 2 image-pair load (UAV-nadir → top-1 satellite tile after C2 retrieval, possibly 2–5 pairs if K=5–10 top-K reranking), a Jetson Orin Nano Super extrapolation factor of 4–6× over the RTX 3080 baseline gives ~30–60 ms per pair @ 1024 keypoints at fp16 + TensorRT and ~80–120 ms per pair @ 2048 keypoints. Additional speedups are available via LightGlue-ONNX (Source #73), up to FP8 quantization (factor ~2× over fp16). 
Architecture: 9 transformer layers (self-attention + cross-attention per layer), 4 attention heads per unit, descriptor dimension d=256; rotary positional encoding (relative, applied at each self-attention); soft partial assignment matrix combining similarity + matchability scores; bidirectional cross-attention saves ~33% time; deep supervision (loss at every layer, stops early when confident). Training: pre-train on synthetic homographies of Oxford-Paris 1M distractors (170k images, 6M image pairs, 2 days on 2 RTX 3090) + fine-tune on MegaDepth phototourism (368/5/24 train/val/test scenes, 50 epochs, 2 days on 2 RTX 3090, 32 image pairs per batch with gradient checkpointing). Modern lineage / successors: ALIKED+LightGlue, DoGHardNet+LightGlue (added to canonical repo post-paper); XFeat (CVPR 2024 — separately-cataloged C3 candidate per Fact #26 NGPS template confirmed; documented to outperform LightGlue on speed at slightly lower accuracy); MASt3R (separately-cataloged but pruned by Fact #26 due to dense-matcher latency on Jetson)
  • Related Sub-question: SQ3+SQ4 / C3 — LightGlue per-mode API capability verification (Mandatory context7 lookup PASS — /cvg/lightglue indexed with High source reputation and benchmark score 85.4; cross-validated against canonical README + LICENSE WebFetch + canonical paper WebFetch [Source #71])
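
The throughput figures in this entry convert to per-pair latency by simple arithmetic, and the Jetson extrapolation is a linear scaling of that latency. The sketch below reproduces only that arithmetic; the function names are hypothetical, and the 4 to 6 times slowdown factor is the registry's own assumption rather than a measurement.

```python
def fps_to_ms(fps: float) -> float:
    """Convert throughput in pairs per second to latency in ms per pair."""
    return 1000.0 / fps


def extrapolate_ms(baseline_ms: float, slow_lo: float = 4.0, slow_hi: float = 6.0):
    """Scale a desktop-GPU latency by an assumed slowdown range (lo, hi)."""
    return baseline_ms * slow_lo, baseline_ms * slow_hi


# Reported RTX 3080 throughput from the canonical README, keyed by keypoint count.
RTX3080_FPS = {1024: 150.0, 4096: 50.0}

for n_kpts, fps in RTX3080_FPS.items():
    base = fps_to_ms(fps)
    lo, hi = extrapolate_ms(base)
    print(f"{n_kpts} keypoints: {base:.1f} ms on RTX 3080 -> {lo:.0f}-{hi:.0f} ms extrapolated")
# prints:
# 1024 keypoints: 6.7 ms on RTX 3080 -> 27-40 ms extrapolated
# 4096 keypoints: 20.0 ms on RTX 3080 -> 80-120 ms extrapolated
```

Linear scaling ignores precision mode, TensorRT fusion, memory bandwidth, and thermal throttling, so it is a sanity check on the extrapolation rather than a substitute for the Jetson MVE measurements.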

Source #71

  • Title: LightGlue canonical paper — "LightGlue: Local Feature Matching at Light Speed" (Lindenberger, Sarlin, Pollefeys — ICCV 2023, arXiv:2306.13643)
  • Link: arXiv https://arxiv.org/abs/2306.13643 (Jun 2023); ICCV 2023 published version (citation booktitle = ICCV); accessed 2026-05-08
  • Tier: L1 (peer-reviewed ICCV 2023 + canonical implementation cross-referenced; most-cited modern sparse matcher paper of the post-SuperGlue era, widely treated as the SOTA sparse-matcher reference baseline in 2024–2026 feature-matching papers)
  • Publication Date: arXiv preprint 2023-06-23; ICCV 2023 acceptance October 2023
  • Timeliness Status: ⚠️ Borderline. The paper (Jun 2023) is older than the Critical-novelty 18-month window for SQ3+SQ4. However, the Established-baseline exemption applies (LightGlue is the canonical sparse-matcher reference baseline, as NetVLAD is for VPR), the algorithmic content is stable, the canonical implementation is actively maintained (HuggingFace Transformers integration is recent), and 2024–2026 successor candidates (XFeat, XFeat*) explicitly position themselves as LightGlue alternatives in the same paper space. Freshness concerns are (a) successor candidates that improve on LightGlue (XFeat 2024, separately cataloged) and (b) aerial-domain weights (the project's D-C3-1; same caveat as the C2 candidates)
  • Version Info: arXiv v1 (Jun 2023, ICCV camera-ready); paper §3 architecture + §3.3 adaptive depth/width + §4 training recipe + §5 experiments + Appendix A IMC 2020/2021/2023 + Appendix B MegaDepth-1800 / Aachen v1.1 / InLoc + Appendix C implementation details + Appendix D timing breakdowns
  • Target Audience: System architects + C3 implementer + Step-7.5 reviewer
  • Research Boundary Match: Full match for the algorithm (sparse feature matching with adaptive-depth + adaptive-width transformer pruning, soft partial assignment matrix combining similarity + matchability, bidirectional cross-attention, rotary positional encoding); partial match for the project's domain (paper benchmarks: HPatches homography estimation [§5.1, planar scenes with illumination + viewpoint changes], MegaDepth-1500 + MegaDepth-1800 relative pose estimation [§5.2 + Appendix B, outdoor phototourism], Aachen Day-Night + Aachen v1.1 outdoor visual localization [§5.3 + Appendix B, urban day/night], InLoc indoor visual localization [Appendix B], Image Matching Challenge 2020/2021/2023 [Appendix A, phototourism]; NO aerial nadir benchmark in the canonical paper). Critical paper reference [83]: paper §1 Related work cites "Zhang et al. ISPRS Journal of Photogrammetry and Remote Sensing 2022 — Feature matching for multi-epoch historical aerial images" as documentary evidence that "SuperGlue generalizes well to aerial matching" — by transitive lineage (LightGlue is the SuperGlue successor with documented 4-10× speedup), this provides weak documentary evidence that LightGlue is similarly applicable to aerial matching, but NOT explicit aerial-nadir validation.
  • Summary: The canonical paper introduces LightGlue, a deep neural network for sparse feature matching that is faster, more accurate, and easier to train than SuperGlue. The central novelty is adaptivity to image-pair difficulty (paper §3.3): (a) adaptive depth: predict a confidence score per point per layer and halt inference at any layer once a sufficient ratio α (default 95%) of points are confident; (b) adaptive width: at each layer, discard the points that are predicted as both confident and unmatchable (paper Eq. 13: unmatchable(i) = c_i^l > λ_l & σ_i^l < β with β=0.01). Together these two mechanisms reduce inference time by ~33% on average (paper §5.4 + Table 5: 1.86× speedup on easy pairs, 1.16× on hard pairs, 1.45× average) at <1% accuracy loss. Architecture (paper §3.1 + §3.5 + Appendix C.1): stack of L=9 transformer layers, each with one self-attention + one cross-attention unit; descriptor dimension d=256; 4 attention heads per unit; rotary positional encoding [Su et al. 2023 RoFormer reference [67]] applied to query+key in self-attention only (NOT cross-attention) with a learned 2D Fourier basis; bidirectional cross-attention that computes the similarity matrix only once per layer (saves ~33% time vs full cross-attention per Appendix D); soft partial assignment matrix P combining pairwise similarity scores S_{ij} = Linear(x_i^A)^T Linear(x_j^B) and unary matchability scores σ_i = Sigmoid(Linear(x_i)) via P_{ij} = σ_i^A · σ_j^B · Softmax_k(S_{kj})_i · Softmax_k(S_{ik})_j (Eq. 8); filter threshold τ=0.1 (Eq. 8 + Appendix C.4): a pair (i,j) yields a correspondence when P_{ij} > τ AND P_{ij} is both the row-max and the column-max. Reported headline performance: HPatches homography estimation (Table 1, SuperPoint+LightGlue, 1024 keypoints) R=94.3 / P=88.9 (best precision among sparse matchers, +1.5 over SuperGlue 87.4); AUC-DLT@5px=78.6 (vs SuperGlue 76.7, vs SGMNet 76.0; competitive with dense LoFTR 70.6). 
MegaDepth-1500 relative pose estimation (Table 2, SuperPoint+LightGlue with LO-RANSAC) AUC@5°/10°/20°=66.7/79.3/87.9 (vs SuperGlue 65.8/78.7/87.5; vs LoFTR 66.4/78.6/86.5, i.e. competitive with the dense matcher at a fraction of the inference time); inference time 44.2 ms standard / 31.4 ms adaptive. Aachen Day-Night visual localization (Table 3, SuperPoint+LightGlue with hloc + NetVLAD top-50 retrieval) Day (0.25m,2°)/(0.5m,5°)/(1.0m,10°) = 89.2/95.4/98.5, Night = 87.8/93.9/100, 17.2 pairs/sec (26.1 optimized): competitive with SuperGlue at 2.5–4× higher throughput. CRITICAL OBSERVATION FOR THE PROJECT: the Aachen Day-Night benchmark (Table 3) directly demonstrates the NetVLAD top-K retrieval → SP+LightGlue matching → PnP+RANSAC pose estimation pipeline, which is exactly the project's intended pipeline shape (C2 NetVLAD/MixVPR/SelaVPR/EigenPlaces top-K retrieval → C3 SP+LightGlue match → C4 PnP+RANSAC). The reported pose accuracies and throughput are documentary evidence that the chosen architectural pattern is canonical and well-validated in the visual-localization community. Indirect aerial evidence (paper §1 Related work + ref [83]): the paper cites "Zhang et al. 2022 ISPRS" as evidence that SuperGlue generalizes well to aerial matching; as SuperGlue's successor, LightGlue plausibly inherits this generalization, but this remains indirect evidence rather than aerial-nadir validation. Image Matching Challenge benchmarks (Appendix A, Tables 6+7): SP+LightGlue beats SP+SuperGlue both in IMC 2020 stereo (AUC@5°=59.03 vs 58.64) and IMC 2021 phototourism (50.2 / 62.6 vs SuperGlue 49.9 / 62.2); DISK+LightGlue beats SP+LightGlue by +8% / +5% AUC on stereo / multi-view (IMC 2020), with ~30% more matches at higher epipolar precision. This is an important Plan-phase signal that DISK+LightGlue is competitive with SP+LightGlue and may be preferable when the SuperPoint license is the binding constraint. Adaptive variant at IMC 2023 (Appendix A): SP+LightGlue 38.4 / 46.1 public/private (vs SP+SuperGlue 36.1 / 43.8, a +2.3-point improvement). 
Ease of training: paper §4 + Figure 5 — LightGlue reaches SuperGlue parity in 5M image pairs (~2 GPU-days) vs SuperGlue's 7+ days; fits 32 image pairs on 24 GB VRAM with gradient checkpointing + mixed precision. License (canonical implementation): Apache-2.0 (per Source #70 LICENSE) — permissive, BSD/permissive license track; SuperPoint pretrained weights are Magic Leap noncommercial-research only (Source #72 disqualifier).
  • Related Sub-question: SQ3+SQ4 / C3 — LightGlue per-mode API capability verification (cross-source verification of the canonical implementation's mode/parameter/training-recipe/Recall@K + AUC + throughput claims; Aachen Day-Night benchmark Table 3 documentary evidence for the project's intended pipeline shape NetVLAD top-K → SP+LightGlue → PnP+RANSAC; aerial-domain caveat documented; D-C3-1 SuperPoint license disqualifier raised)
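
The soft partial assignment and the match filter described in this entry's summary can be worked through on a toy example. The code below is a didactic transcription of the formula as stated here (similarity matrix S, matchability scores σ, threshold τ=0.1) in plain Python. It is not the reference implementation, which operates on batched tensors, and the function names and toy values are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_assignment(S, sigma_a, sigma_b):
    """P[i][j] = sigma_a[i] * sigma_b[j] * Softmax_k(S[k][j])_i * Softmax_k(S[i][k])_j."""
    n, m = len(S), len(S[0])
    row_sm = [softmax(row) for row in S]                                # over j, for each i
    col_sm = [softmax([S[i][j] for i in range(n)]) for j in range(m)]   # over i, for each j
    return [[sigma_a[i] * sigma_b[j] * col_sm[j][i] * row_sm[i][j]
             for j in range(m)] for i in range(n)]


def filter_matches(P, tau=0.1):
    """Keep (i, j) when P[i][j] > tau and it is both the row-max and the column-max."""
    matches = []
    for i, row in enumerate(P):
        j = max(range(len(row)), key=row.__getitem__)
        column_max = max(P[k][j] for k in range(len(P)))
        if row[j] > tau and row[j] == column_max:
            matches.append((i, j))
    return matches


# Toy input: points 0 and 1 have clear partners; point 2 in image A is unmatchable.
S = [[5.0, 0.0, 0.0],
     [0.0, 5.0, 0.0],
     [0.0, 0.0, 0.0]]
sigma_a = [0.9, 0.9, 0.05]
sigma_b = [0.9, 0.9, 0.9]

P = soft_assignment(S, sigma_a, sigma_b)
print(filter_matches(P))  # prints [(0, 0), (1, 1)]
```

Point 2 is rejected twice over: its low matchability score shrinks every entry in its row, and the flat similarity row spreads the softmax mass, so its best score stays far below τ.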

Source #72

  • Title: SuperPoint pretrained weights LICENSE — magicleap/SuperPointPretrainedNetwork LICENSE — "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement; binding on the SuperPoint weights AND on the lightglue/superpoint.py inference file used by cvg/LightGlue for the SP+LightGlue mode
  • Link: https://raw.githubusercontent.com/magicleap/SuperPointPretrainedNetwork/master/LICENSE (accessed 2026-05-08); repo https://github.com/magicleap/SuperPointPretrainedNetwork
  • Tier: L1 (canonical Magic Leap LICENSE file controlling the SuperPoint pretrained weights distribution)
  • Publication Date: SuperPoint paper CVPR 2018 Workshop (DeTone, Malisiewicz, Rabinovich); LICENSE file timestamps within Magic Leap's repo HEAD
  • Timeliness Status: Authoritative — license terms are owned by Magic Leap and do not have a freshness window concern; the binding is permanent and applies to every distribution of the SuperPoint pretrained weights including the copy in cvg/LightGlue
  • Version Info: SuperPoint pretrained network checkpoint distributed by Magic Leap; bundled into cvg/LightGlue as lightglue/superpoint.py + the embedded weights (per cvg/LightGlue README §License)
  • Target Audience: Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 NEW — license-track gate for SuperPoint-extractor-mode adoption in dual-use commercial deployment)
  • Research Boundary Match: Full match for the license restriction analysis (the Magic Leap LICENSE is the binding instrument controlling SuperPoint weight redistribution in cvg/LightGlue's SP+LightGlue mode)
  • Summary: Magic Leap's SuperPoint LICENSE is NOT a permissive open-source license. It is a noncommercial-research-only Software License Agreement between Magic Leap (Licensor) and the user (Licensee, an academic institution OR non-profit organization OR self-individual). The key restrictions are: (a) PERMITTED USES: "for your own noncommercial internal research purposes" — the Software (= SuperPoint weights + inference code + any derivatives) may NOT be used for commercial purposes; (b) DERIVATIVES: "all and any such derivatives and modifications will be owned by Licensor and become a part of the Software licensed to You under this Agreement" — modifications are auto-owned by Magic Leap, restricting downstream redistribution; (c) USES NOT PERMITTED: "You may not distribute, copy or use the Software except as explicitly permitted herein. You may not sell, rent, lease, sublicense, lend, time-share or transfer, in whole or in part, or provide third parties access to prior or present versions (or any parts thereof) of the Software"; (d) EXPORT REGULATION: Licensee must comply with U.S. export control + OFAC embargo/sanction programs — note that fixed-wing UAV deployment in eastern/southern Ukraine in active-conflict context likely interacts with U.S. export controls + Russia/Ukraine/Crimea sanctions specifics (independent legal analysis required); (e) GOVERNING LAW: Florida (Broward County) — non-negotiable jurisdiction. PROJECT IMPACT: the GPS-Denied Onboard project's question_decomposition.md hard disqualifier is "anything whose license blocks military / dual-use deployment"; the Magic Leap LICENSE explicitly blocks commercial use AND blocks distribution. The project's deployment context (fixed-wing UAV in active-conflict Ukraine, AC-NEW-2 spoofing-promotion path explicitly deals with hostile electromagnetic warfare) is dual-use military by every reasonable interpretation. 
Therefore the canonical Magic Leap SuperPoint pretrained weights AND lightglue/superpoint.py inference code are HARD DISQUALIFIED for the project's commercial / dual-use deployment context. Mitigation paths (for D-C3-1 Plan-phase Choose block): (a) DISK+LightGlue (Apache-2.0 throughout) — paper Table 6 shows DISK+LightGlue stereo AUC@5°=67.02 vs SP+LightGlue 59.03 (+7.99 absolute) — DISK+LightGlue is demonstrably superior on phototourism to SP+LightGlue; (b) ALIKED+LightGlue (BSD-3-Clause + Apache-2.0); (c) re-train a SuperPoint-class extractor under permissive license — kornia has a SuperPoint reproduction (kornia.feature.SuperPoint) but its weights' license must be independently verified at Plan-phase (LightGlue-ONNX Source #73 also distributes its own SuperPoint+LightGlue ONNX weights — which inherit the Magic Leap restriction by transitive lineage); (d) accept Magic Leap noncommercial-research license for the project's R&D phase only with explicit Plan-phase commitment to swap before production deployment (legally risky — internal research could still be construed as commercial preparation given the dual-use deployment intent). Recommendation: D-C3-1 = (a) DISK+LightGlue is the cleanest license-compliant alternative; per paper Table 6 it's also the strongest phototourism alternative. ALIKED+LightGlue is the second-cleanest BSD-3-Clause + Apache-2.0 option but lacks the IMC 2020 / 2021 / 2023 documentary phototourism benchmarks that DISK+LightGlue has.
  • Related Sub-question: SQ3+SQ4 / C3 — SuperPoint pretrained weights license restriction analysis (License-track gate for the SP+LightGlue canonical mode); applies to D-C1-1 license posture interaction; raises NEW D-C3-1 SuperPoint-replacement-strategy choice (DISK+LightGlue / ALIKED+LightGlue / SuperPoint-reproduction-with-permissive-license / accept-Magic-Leap-noncommercial-with-swap-commitment) Plan-phase decision
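
The license-track gate that drives D-C3-1 can be expressed as a lookup over the extractor modes. The sketch below encodes only the license facts recorded in this registry; the names `EXTRACTOR_LICENSES` and `license_gate` are hypothetical, the DoGHardNet entry is marked unverified because no license is recorded for it here, and none of this is a substitute for legal review.

```python
# Licenses treated as permissive for the project's dual-use deployment gate.
PERMISSIVE = {"Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "MIT"}

MATCHER_LICENSE = "Apache-2.0"  # cvg/LightGlue code and matcher weights (Source #70)

# Extractor mode -> license as recorded in this registry (None = not recorded).
EXTRACTOR_LICENSES = {
    "superpoint": "Magic Leap noncommercial-research",  # Source #72
    "disk": "Apache-2.0",
    "aliked": "BSD-3-Clause",
    "sift": "classical",      # no learned weights, patent expired
    "doghardnet": None,       # needs independent verification
}


def license_gate(extractor: str) -> str:
    """Return 'pass', 'fail', or 'unverified' for an extractor + LightGlue pairing."""
    lic = EXTRACTOR_LICENSES.get(extractor)
    if lic is None:
        return "unverified"
    extractor_ok = lic in PERMISSIVE or lic == "classical"
    matcher_ok = MATCHER_LICENSE in PERMISSIVE
    return "pass" if extractor_ok and matcher_ok else "fail"


for mode in EXTRACTOR_LICENSES:
    print(f"{mode}+lightglue: {license_gate(mode)}")
```

Under these recorded facts the gate passes DISK, ALIKED, and SIFT pairings and fails the SuperPoint pairing, which matches the mitigation ordering above.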

Source #73

  • Title: LightGlue-ONNX — fabio-sim/LightGlue-ONNX (Sim, fabio-sim) — Open Neural Network Exchange compatible implementation of LightGlue + SuperPoint (and DISK) end-to-end pipeline; supports TensorRT, OpenVINO, FP16 mixed precision, FP8 Q/DQ quantization (NVIDIA ModelOpt — January 2026 addition); FlashAttention-2 fused ONNX models; MultiHead-Attention fusion optimization; ArgMax → TopK trick for ~30% speedup; Kornia integration as kornia.feature.OnnxLightGlue; CLI lightglue-onnx with export | infer | trtexec commands; canonical reference for Jetson/edge/embedded LightGlue deployment
  • Link: README https://raw.githubusercontent.com/fabio-sim/LightGlue-ONNX/main/README.md (accessed 2026-05-08); repo https://github.com/fabio-sim/LightGlue-ONNX ; FP8 quantization blog post https://fabio-sim.github.io/blog/fp8-quantized-lightglue-tensorrt-nvidia-model-optimizer/ ; ONNX Runtime + TensorRT inference blog post https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/ ; Kornia integration https://kornia.readthedocs.io/en/latest/feature.html#kornia.feature.OnnxLightGlue
  • Tier: L2 (third-party canonical ONNX export project — most-cited LightGlue ONNX deployment reference in the modern feature-matching deployment community as of 2026; explicitly endorsed by cvg/LightGlue README "Other links" section as the canonical TensorRT/OpenVINO export path; Kornia integration confirms broader community adoption)
  • Publication Date: Initial commit Jun 2023 (one week after cvg/LightGlue paper publication); active maintenance through January 2026 (most recent changelog entry: "19 January 2026: Add FP8 quantization workflow"); 1k+ stars
  • Timeliness Status: Fully within Critical-novelty window (active main + January 2026 changelog entries on FP8 quantization and refurbished CLI UX with modern uv)
  • Version Info: main HEAD; CLI lightglue-onnx with three commands: export (pipeline ONNX export with --num-keypoints N, -b 2 -h 1024 -w 1024 static-shape parameterization), infer (ONNX Runtime inference with -d cuda|tensorrt|openvino|cpu provider selection, --fp16 mixed-precision flag), trtexec (Polygraphy-based pure TensorRT inference with --fp16 flag); legacy export path available via --legacy-export flag; FP8 quantization workflow via lightglue_dynamo/scripts/quantize.py --quantize-mode fp8 --dq-only --simplify produces FP8 Q/DQ ONNX models with --precision-constraints prefer --fp16 TensorRT inference; uv-based dependency management (uv sync for inference-only; uv sync --group export for export support; uv sync --group trt for TensorRT CLI). Performance evolution: 28 Jun 2023 — initial end-to-end SP+LightGlue export; 11 Jul 2023 — mixed precision; 13 Jul 2023 — Flash Attention; 19 Jul 2023 — TensorRT support; 04 Oct 2023 — MultiHead-Attention fusion + Fused LightGlue ONNX with FlashAttention-2 (up to 80% faster inference on long sequences); 27 Oct 2023 — Kornia integration; 02 Nov 2023 — TopK trick optimizes out ArgMax (~30% speedup); 17 Jul 2024 — end-to-end parallel dynamic batch size support; 09 Jan 2026 — modern uv UX refresh; 19 Jan 2026 — FP8 quantization workflow via NVIDIA ModelOpt
  • Target Audience: System architects + C3 implementer + C7 (Jetson runtime) implementer + Step-7.5 reviewer
  • Research Boundary Match: Full match for the project's pinned C3 Jetson deployment runtime question (LightGlue's TensorRT export path on Jetson Orin Nano Super at fp16 + INT8/FP8 + ONNX Runtime). The project's C7 row will inherit the choice between PyTorch-fp16, Torch-TensorRT, ONNX Runtime + TensorRT EP, or pure TensorRT; LightGlue-ONNX is the canonical reference for the latter two options. Partial match for the project's domain (the LightGlue-ONNX repository targets phototourism / general-purpose visual-localization, NOT aerial nadir specifically; the same aerial-domain caveat as cvg/LightGlue applies; D-C3-1 Plan-phase decision)
  • Summary: LightGlue-ONNX is the canonical third-party ONNX/TensorRT/OpenVINO deployment path for cvg/LightGlue. Critical findings for the C3 + C7 deployment gates: (a) End-to-end SP+LightGlue + DISK+LightGlue ONNX pipeline export with static-shape parameterization (e.g., -b 2 -h 1024 -w 1024 --num-keypoints 1024): image dimensions and keypoint count are baked in at export time; end-to-end parallel dynamic batch size support was added 17 Jul 2024; (b) TensorRT 8.5+ support on Jetson is feasible: the lightglue-onnx trtexec CLI uses Polygraphy as the TensorRT runner, which is well-documented on JetPack; FP16 mixed-precision is the default and recommended Jetson configuration; (c) FP8 quantization workflow (Jan 2026 addition) via NVIDIA ModelOpt's Q/DQ insertion produces FP8 ONNX models that, when run with TensorRT --precision-constraints prefer --fp16, achieve ~2× speedup over the fp16 baseline on Hopper/Ada/Blackwell GPUs (reference: NVIDIA ModelOpt FP8 documentation). Jetson Orin Nano Super is Ampere and therefore NOT FP8-native, so the FP8 path is deferred at Plan-phase for Jetson; upgrading within the Jetson Orin family does not help (those modules are also Ampere, with FP8 NOT supported), and the path becomes relevant only if the project moves to FP8-native hardware or if the FP8 graph can fall back to INT8 quantization on Ampere through TensorRT precision emulation (unverified; verification required at Jetson MVE phase); (d) FlashAttention-2 fused ONNX (Oct 2023) with up to 80% faster inference on long-keypoint sequences via onnxruntime>=1.16.0, which applies to the project's pinned 1024-keypoint extraction; (e) TopK trick (Nov 2023) optimizes out ArgMax for ~30% speedup and applies transparently after re-export; (f) OpenVINO support for Intel-CPU/iGPU deployment, not directly applicable to Jetson but useful for offline-PC pre-flight cache provisioning (C10 row); (g) Kornia integration via the kornia.feature.OnnxLightGlue interface, a drop-in replacement for kornia.feature.LightGlue when ONNX deployment is preferred. 
Documented inference-time comparison (linked blog post): on RTX-class GPUs the ONNX/TensorRT path achieves 3-5× speedup over the canonical PyTorch path at fp16; FP8 path adds another ~2× on FP8-native architectures; Ampere/Jetson Orin Nano Super FP8 emulation factor is unverified (Jetson MVE phase). License: not explicitly checked in this fetch; repo README does not cite a LICENSE file in the visible header — Plan-phase verification gate (similar to D-C2-8 Nanne PyTorch port license-uncertainty caveat). Acknowledged dependencies: ONNX, TensorRT, ONNX Runtime, OpenVINO, NVIDIA ModelOpt (FP8), Polygraphy. Project relevance: this project's C7 (Jetson runtime) row will likely choose between PyTorch-fp16 (lowest engineering cost, highest deployment footprint), Torch-TensorRT (medium engineering cost, Jetson-friendly), ONNX Runtime + TensorRT EP via LightGlue-ONNX (medium engineering cost, well-documented Jetson pathway), or pure TensorRT via trtexec + Polygraphy (highest engineering cost, lowest deployment footprint, Jetson-friendly) — LightGlue-ONNX is the canonical reference for options 3 and 4.
  • Related Sub-question: SQ3+SQ4 / C3 + C7 — LightGlue Jetson deployment runtime evidence (cross-source confirmation that LightGlue has a documented, actively-maintained TensorRT/ONNX/OpenVINO/FP8 export path; the project's C7 row will reference this source when the inference-runtime decision is closed at Plan-phase); also raises D-C3-2 LightGlue inference runtime choice (PyTorch-fp16 / Torch-TensorRT / ONNX Runtime + TensorRT EP / pure TensorRT via trtexec + Polygraphy / FP8 ModelOpt-on-Jetson if Ampere FP8 emulation works) Plan-phase decision
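The FP8-versus-fp16 reasoning in this entry can be sketched as a small selection rule. This is a hedged illustration only: the helper `pick_precision` and the architecture labels are hypothetical names introduced here, and the rule encodes nothing beyond what the entry states (FP8 speedups are documented for Hopper/Ada/Blackwell; Ampere is not FP8-native).

```python
# Architectures this registry entry names as FP8-native; Ampere (Jetson Orin) is not among them.
FP8_NATIVE_ARCHS = {"hopper", "ada", "blackwell"}


def pick_precision(gpu_arch: str, fp8_model_available: bool = True) -> str:
    """Return "fp8" only for an FP8-native GPU with an FP8 Q/DQ model on hand; otherwise "fp16"."""
    arch = gpu_arch.strip().lower()
    if fp8_model_available and arch in FP8_NATIVE_ARCHS:
        return "fp8"
    # Assumption: fp16 is the safe default, matching the recommended Jetson configuration.
    return "fp16"


print(pick_precision("Ampere"))  # fp16
print(pick_precision("Hopper"))  # fp8
```

Whether an FP8 graph degrades gracefully on Ampere is exactly the open question deferred to the Jetson MVE phase, so the sketch deliberately never returns "fp8" for that architecture.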

Source #74

  • Title: ALIKED canonical implementation: Shiaoming/ALIKED (Zhao et al. IEEE T-IM 2023), the official PyTorch reference implementation; README + LICENSE (BSD-3-Clause), demo_pair.py + demo_seq.py runnable demos, four pretrained model variants distributed in-tree under models/ (aliked-t16 tiny / aliked-n16 normal / aliked-n16rot rotation-augmented normal / aliked-n32 higher-SDDH-sample-count normal), custom_ops/build.sh legacy CUDA extension build (NOT used by the cvg/LightGlue port: the port replaced custom_ops with torchvision.ops.deform_conv2d directly per Source #70 lightglue/aliked.py lines 39 + 336-344, removing the build-from-source requirement)
  • Link: README https://raw.githubusercontent.com/Shiaoming/ALIKED/main/README.md (accessed 2026-05-08); LICENSE https://raw.githubusercontent.com/Shiaoming/ALIKED/main/LICENSE (accessed 2026-05-08); repo https://github.com/Shiaoming/ALIKED ; cvg/LightGlue's ALIKED port lightglue/aliked.py https://raw.githubusercontent.com/cvg/LightGlue/main/lightglue/aliked.py (BSD-3-Clause inherited from Shiaoming/ALIKED canonical, with explicit author + license attribution in the file header lines 1-33)
  • Tier: L1 (project-official codebase by the canonical ALIKED authors Xiaoming Zhao + Xingming Wu + Weihai Chen + Peter C. Y. Chen + Qingsong Xu + Zhengguo Li, Beihang University + University of Macau + National University of Singapore + A*STAR Singapore; same author group as Shiaoming/ALIKE (T-MM 2022, the predecessor network), 1.4k+ stars at canonical repo, IEEE Transactions on Instrumentation & Measurement 2023 publication)
  • Publication Date: ALIKED paper IEEE T-IM April 2023 (DOI 10.1109/TIM.2023.3271000); canonical repo HEAD active (cvg/LightGlue port added the four-variant aliked-t16/n16/n16rot/n32 interface post-publication via lightglue/aliked.py)
  • Timeliness Status: Within Critical-novelty window (April 2023 — modern competitive ground for sparse-extractor reference; widely-adopted reference implementation across modern feature-matching deployment community); cvg/LightGlue's ALIKED port itself is actively maintained on the cvg/LightGlue main branch
  • Version Info: main HEAD at access time. Mode-enumeration query (1/3): context7 NOT INDEXED + WebFetch fallback PASS. context7 resolve-library-id returned no relevant matches for "ALIKED" (Supabase / Vitest / AI SDK / Mastra / Better Auth top-results, indicating no Shiaoming/ALIKED library entry in the context7 index); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical repo README + LICENSE was used. Four ALIKED model variants exposed in cvg/LightGlue lightglue/aliked.py via model_name enum: aliked-t16 (Tiny: c1=8, c2=16, c3=32, c4=64, dim=64-D descriptor, K=3 SDDH kernel size, M=16 SDDH sample positions, 0.192M parameters, 1.37 GFLOPs on 640×480 + 1k keypoints, 125.87 FPS RTX 2060; most-Jetson-friendly variant); aliked-n16 (Normal: c1=16, c2=32, c3=64, c4=128, dim=128-D descriptor, K=3, M=16, 0.677M parameters, 4.05 GFLOPs, 77.40 FPS RTX 2060; canonical paper baseline); aliked-n16rot (Normal + rotation augmentation training: same arch as n16 but with rotation-data-augmentation prior; better viewpoint-rotation invariance per paper Fig. 6 top chart, slightly worse 3D-reconstruction accuracy than n16 per paper §VI-C1); aliked-n32 (Normal with higher SDDH sampling: c1=16, c2=32, c3=64, c4=128, dim=128-D, K=3, M=32 SDDH sample positions, 0.980M parameters, 4.62 GFLOPs, 75.64 FPS RTX 2060; best matching accuracy variant). In cvg/LightGlue the ALIKED extractor is wired to LightGlue(features='aliked') with input_dim=128 matcher config (per Source #70 lightglue/lightglue.py lines 345-348). Default per-extractor config (cvg/LightGlue lightglue/aliked.py lines 603-608): model_name="aliked-n16", max_num_keypoints=-1 (threshold-based mode), detection_threshold=0.2, nms_radius=2, preprocess_conf={"resize": 1024}; same canonical 1024-largest-edge resize policy as SuperPoint + DISK. 
Pretrained weights URL pattern: https://github.com/Shiaoming/ALIKED/raw/main/models/{model_name}.pth, auto-downloaded via torch.hub.load_state_dict_from_url at first construction. Required input format: data["image"] as torch.Tensor[B, 3, H, W] RGB; if [B, 1, H, W] grayscale is provided, the extractor auto-converts via kornia.color.grayscale_to_rgb (per lightglue/aliked.py lines 749-750). Output format: {keypoints: torch.Tensor[B, N, 2], descriptors: torch.Tensor[B, N, dim], keypoint_scores: torch.Tensor[B, N]} where dim ∈ {64, 128} per variant. There is also a raco-aliked sibling weight checkpoint distributed by cvg/LightGlue (per lightglue/lightglue.py lines 349-352): a RACo (Random Augmentation in Color)-trained ALIKED variant; community contribution, not in canonical paper; skipped in this entry as "separately-cataloged sibling mode if elevated".
  • Target Audience: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 NEW)
  • Research Boundary Match: Full match for the project's pinned mode of ALIKED+LightGlue (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run ALIKED-N(16) feature extraction on each independently at 1024-largest-edge, match via LightGlue with features='aliked', return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). The repo ships everything needed: extractor classes (aliked-t16/n16/n16rot/n32 via cvg/LightGlue lightglue.ALIKED(model_name=...)), I/O utilities inherited from cvg/LightGlue (load_image, rbd, match_pair), visualization inherited (viz2d), pretrained weights auto-downloaded. Asymmetric image-pair sizes are handled natively — same independent per-image extraction pattern as SP+LightGlue + DISK+LightGlue. Partial match for the project's domain (canonical training on MegaDepth perspective dataset (135 scenes, 1.35M image pairs sampled per DISK methodology) + R2D2 homographic dataset (Oxford-Paris + Aachen synthetic homographies) — neither dataset is aerial nadir; same aerial-domain caveat as SP+LightGlue + DISK+LightGlue + C2 candidates; aerial applicability is NOT explicitly validated in the canonical paper — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight). NEGATIVE finding for the Jetson deployment story: Source #73 (fabio-sim/LightGlue-ONNX) does NOT ship a documented ALIKED end-to-end export pathway as of January 2026 — Source #73 README changelog explicitly lists SuperPoint (28 Jun 2023) + DISK (30 Jun 2023) extractor support, but no ALIKED entry; Source #73 citations section cites LightGlue + SuperPoint + DISK papers only, with no ALIKED reference; Source #73 example CLI commands all use superpoint as the positional extractor argument and there is no documented aliked CLI variant. 
Plus the canonical lightglue/aliked.py uses torchvision.ops.deform_conv2d which is a known-difficult ONNX export op (deformable conv historically required either ONNX opset ≥19 native DeformConv op OR a custom TensorRT plugin). Implication for D-C3-2: ALIKED+LightGlue's Jetson deployment story is materially WEAKER than DISK+LightGlue's or SP+LightGlue's; the project's options for ALIKED+LightGlue on Jetson are restricted to (a) PyTorch-fp16 only (likely 2-3× slower than DISK+LightGlue's TensorRT path), (b) custom ONNX export with deform_conv plugin (significant engineering effort), (c) wait for community LightGlue-ONNX ALIKED support to land, (d) accept Torch-TensorRT partial graph compilation with deform_conv falling back to PyTorch-eager (mixed runtime — operationally complex).
  • Summary: ALIKED is the canonical sparse-keypoint-and-descriptor extraction network introduced by Zhao et al. (IEEE T-IM 2023), with the Sparse Deformable Descriptor Head (SDDH) as its main contribution: it extracts deformable descriptors only at sparse keypoints (rather than dense descriptor maps as in SuperPoint / R2D2 / D2-Net / ASLFeat / DISK), reducing GFLOPs by ~6-200× vs prior methods at competitive matching accuracy. CRITICAL LICENSE FINDING: LICENSE file is BSD-3-Clause (Copyright (c) 2022, Zhao Xiaoming), i.e. permissive; this places ALIKED ITSELF on the BSD/permissive license track alongside MixVPR (MIT) + SelaVPR (MIT) + NetVLAD-canonical (MIT) + EigenPlaces (MIT) + Kimera-VIO (BSD-2) + OKVIS2 (BSD-3) + DPVO (MIT) + cvg/LightGlue itself (Apache-2.0). cvg/LightGlue's lightglue/aliked.py port file inherits the BSD-3-Clause notice in its file header (lines 1-33 of the file include the full BSD-3-Clause notice + Magic Leap-style author attribution). Architecture (paper §III + §IV): feature encoder with 4 ConvBlock/ResBlock stages (block3 + block4 use deformable convolutions per paper §III-A) → feature aggregation via four upsample blocks → score map head (SMH) for keypoint detection via Differentiable Keypoint Detection (DKD, inherited from ALIKE [10]) → SDDH for sparse deformable descriptor extraction at the detected keypoints. SDDH first samples a K×K patch around each keypoint, estimates M deformable sample positions via two convolution layers, samples M supporting features via bilinear sampling, encodes them with selu+conv1x1, and aggregates them with convM (paper Eq. 4-5), producing an L2-normalized dim-D descriptor. Per-keypoint cost is ∝ M (vs ∝ HW for the dense descriptor map head, DMH) → drastic GFLOPs reduction. Network configurations (paper Table II): Tiny (c1=8, c2=16, c3=32, c4=64, dim=64), Normal (c1=16, c2=32, c3=64, c4=128, dim=128), Large (c1=32, c2=64, c3=128, c4=128, dim=128 with deeper desc head). cvg/LightGlue port exposes Tiny + Normal (with M=16 / M=32) but NOT Large. 
Reported headline performance vs SOTA on RTX 2060 (paper Table IV — HPatches with 1k keypoints, 640×480): ALIKED-T(16) 125.87 FPS / 0.192M params / 1.37 GFLOPs / MMA@3=72.99% / MHA@3=78.70%; ALIKED-N(16) 77.40 FPS / 0.677M params / 4.05 GFLOPs / MMA@3=74.43% / MHA@3=77.22%; ALIKED-N(32) 75.64 FPS / 0.980M params / 4.62 GFLOPs / MMA@3=75.23% / MHA@3=74.44%. vs SuperPoint (1.301M params / 26.11 GFLOPs / 52.63 FPS / MMA@3=65.37 / MHA@3=70.19): ALIKED-N(16) achieves +9.06 absolute MMA@3 + +7.03 absolute MHA@3 at 1/6th the GFLOPs and ~1.5× the FPS. vs DISK (1.092M params / 98.97 GFLOPs / 11.81 FPS / MMA@3=77.59 / MHA@3=70.56): DISK has +3.16 absolute MMA@3 vs ALIKED-N(16) but -6.66 absolute MHA@3 (DISK keypoints are evenly distributed → poorer homography estimation); ALIKED-N(16) is 6.6× faster with 1/24th the GFLOPs. Pose Estimation IMW-test (paper Table V, 2048 keypoints): ALIKED-N(16) Stereo mAA(5°)=46.30 / mAA(10°)=85.47 (vs DISK 44.80/85.20 — competitive with DISK at 1/24th GFLOPs); ALIKED-N(16) Multiview mAA(5°)=39.53 / mAA(10°)=52.28 / TL=5.57 (vs DISK 38.72/51.22/5.50 — slightly better than DISK on stereo, marginally less on multiview where DISK's higher #matches gives bundle-adjustment edge). PPC (Performance Per Cost = mAA(10°)/GFLOPs): ALIKED-N(16) Stereo PPC=12.91 vs DISK 0.52 (24.8× higher PPC). FM-Bench TUM/KITTI/T&T/CPC (paper Table VI): ALIKED-N(16) achieves best %Recall on TUM (63.60), best on T&T (92.10), and is competitive with DISK on KITTI (92.10 vs DISK 90.20) and CPC (58.00 vs DISK 59.10). Aachen Day-Night visual relocalization (paper Table VII): ALIKED-N(32) up-to-1024-keypoints / (0.25m,2°)/(0.5m,5°)/(5m,10°) = 77.6/88.8/100.0 (best in row); ALIKED-N(16) = 73.5/85.7/98.0; ALIKED-T(16) = 70.4/87.8/98.0; vs SuperPoint = 58.2/66.3/72.4 — ALIKED-N(32) is +19.4 absolute / +22.5 absolute / +27.6 absolute over SuperPoint at the strictest tier; vs DISK = 60.2/72.4/81.6 — ALIKED-N(32) is +17.4/+16.4/+18.4 absolute. 
The documented Aachen lift over SuperPoint and DISK on the visual-localization task is the strongest documented signal for ALIKED+LightGlue's project relevance (the project's intended pipeline is identical: C2 NetVLAD-class top-K → C3 sparse-matcher → C4 PnP+RANSAC, all evaluated on Aachen Day-Night by Source #71 LightGlue paper + Source #75 ALIKED paper). Rotation invariance (paper §VI-C1 + Fig. 6 top): ALIKED-N(16, rot) achieves best rotation invariance at 0-45° rotations (vs SuperPoint, ALIKE, DISK, etc.); however, ALIKED-N(16, rot) performs slightly worse in 3D reconstruction than ALIKED-N(16). D-C3-1-mitigation-specific consideration: for the project's UAV nadir use case where heading variation is expected, aliked-n16rot may be the preferred sibling mode; rotation augmentation may not hurt aerial-nadir 3D-reconstruction performance materially because aerial-nadir scenes do not have a strong "up direction" cue (vs ground-level scenes where vertical cues are critical). Plan-phase decision raised (will be tagged D-C3-4 NEW): ALIKED-N(16) vs ALIKED-N(16rot) vs ALIKED-N(32) vs ALIKED-T(16) sibling-mode choice for the project's pinned ALIKED variant. Limitations (paper §VI-E): SDDH has only one layer for deformable position estimation, so it has limitations modeling extreme image deformation (large scale + viewpoint differences simultaneously); the scale limitation is shared by all single-scale matching methods (SP, ALIKE, DISK, ASLFeat) at scale-difference >4×. Custom_ops requirement on canonical Shiaoming/ALIKED: the README mentions cd custom_ops; sh build.sh to build a CUDA extension for the deformable-position-estimation kernel; this is a legacy path that the cvg/LightGlue port has eliminated by using torchvision.ops.deform_conv2d directly (per lightglue/aliked.py lines 39 + 336-344); the project will use the cvg/LightGlue port and avoid the build-from-source dependency. 
Modern lineage: ALIKED is the strict successor to ALIKE (T-MM 2022) which itself is the strict successor to SuperPoint (CVPR Workshop 2018) — the lineage establishes ALIKED as the modern competitive lightweight CNN extractor, with SDDH as the key innovation enabling lower GFLOPs at competitive accuracy.
  • Related Sub-question: SQ3+SQ4 / C3 — ALIKED+LightGlue per-mode API capability verification (Mandatory context7 lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + LICENSE WebFetch + canonical paper WebFetch [Source #75] + cvg/LightGlue lightglue/aliked.py source code inspection [transitively cited via Source #70] + LightGlue-ONNX ALIKED-export-absence finding [transitively cited via Source #73]); D-C3-1 RECOMMENDED-secondary-mitigation candidate (BSD-3-Clause + Apache-2.0 throughout, second-cleanest license-compliant option after DISK+LightGlue); raises NEW D-C3-4 ALIKED-sibling-mode-choice (aliked-t16 64-D / aliked-n16 128-D canonical / aliked-n16rot 128-D rotation-augmented / aliked-n32 128-D higher-SDDH-sample-count) Plan-phase decision
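The efficiency ratios quoted in this entry follow directly from the Table IV figures cited above. The sketch below recomputes them; the `ExtractorStats` class and helper names are hypothetical, and the numbers are copied from this entry rather than re-measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorStats:
    params_m: float  # parameters, millions
    gflops: float    # at 640x480 with 1k keypoints
    fps: float       # RTX 2060 throughput, as quoted in this entry

    @property
    def ms_per_image(self) -> float:
        """Extraction latency per image implied by the FPS figure."""
        return 1000.0 / self.fps


# Figures as quoted in this registry entry (paper Table IV); not re-measured here.
STATS = {
    "aliked-t16": ExtractorStats(0.192, 1.37, 125.87),
    "aliked-n16": ExtractorStats(0.677, 4.05, 77.40),
    "aliked-n32": ExtractorStats(0.980, 4.62, 75.64),
    "superpoint": ExtractorStats(1.301, 26.11, 52.63),
    "disk": ExtractorStats(1.092, 98.97, 11.81),
}


def gflops_ratio(heavier: str, lighter: str) -> float:
    return STATS[heavier].gflops / STATS[lighter].gflops


def speedup(faster: str, slower: str) -> float:
    return STATS[faster].fps / STATS[slower].fps


print(round(gflops_ratio("disk", "aliked-n16"), 1))  # 24.4
print(round(speedup("aliked-n16", "disk"), 1))       # 6.6
print(round(STATS["disk"].ms_per_image, 1))          # 84.7
```

These are desktop-GPU ratios; they say nothing certain about Jetson Orin Nano Super latency, which remains a Jetson MVE measurement.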

Source #75

  • Title: ALIKED canonical paper: "ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation" (Zhao, Wu, Chen, Chen, Xu, Li; IEEE Transactions on Instrumentation & Measurement, vol. 72, pp. 1-16, 2023, DOI 10.1109/TIM.2023.3271000, arXiv:2304.03608)
  • Link: arXiv abstract https://arxiv.org/abs/2304.03608 (April 2023); arXiv full PDF https://arxiv.org/pdf/2304.03608.pdf ; IEEE T-IM published version DOI 10.1109/TIM.2023.3271000 ; accessed 2026-05-08
  • Tier: L1 (peer-reviewed IEEE T-IM 2023 + canonical implementation cross-referenced; documented modern competitive lightweight CNN extractor in the post-SuperPoint / post-ALIKE era; cited by 2024-2026 feature-matching papers as a competitive-fast extractor reference)
  • Publication Date: arXiv preprint 2023-04-07; IEEE T-IM publication mid-2023
  • Timeliness Status: Within Critical-novelty window (April 2023; modern competitive ground for sparse-extractor reference); Established-competitive-modern-extractor exemption applies (ALIKED is the post-ALIKE successor with explicit GFLOP-reduction claims that 2024-2026 successor candidates [XFeat, XFeat*] explicitly position themselves against)
  • Version Info: arXiv v1 (April 2023, IEEE T-IM camera-ready); paper §III architecture + §IV SDDH + §V sparse NRE loss + §VI experiments + §VI-A implementation details (Adam optimizer, betas 0.9/0.999, top-400 detected + 400 random keypoints with NMS, 800×800 training resolution, batch size 2, gradient accumulation over 6 batches, MegaDepth + R2D2 homographic datasets, 100K training steps, 100K best-checkpoint selection on validation)
  • Target Audience: System architects + C3 implementer + Step-7.5 reviewer
  • Research Boundary Match: Full match for the algorithm (ALIKED with SDDH as the descriptor head, deformable convolution in the feature encoder's last 2 blocks, DKD for keypoint detection, sparse NRE loss for training); partial match for the project's domain (paper benchmarks: HPatches homography Table IV [planar scenes with illumination + viewpoint changes], IMW test Table V [phototourism stereo + multiview reconstruction], FM-Bench Table VI [TUM indoor SLAM, KITTI driving, T&T wide-baseline reconstruction, CPC wild-reconstruction-from-web], Aachen Day-Night Table VII [outdoor visual relocalization]; NO aerial nadir benchmark in the canonical paper). Critical paper §I + §II-A reference position: ALIKED is positioned as a lighter keypoint and descriptor extraction network for resource-constrained visual measurement applications, including SLAM, computational photography, and visual place recognition — directly aligned with the project's Jetson Orin Nano Super deployment context. The paper §VI-A explicitly tests on mid-end NVIDIA GeForce RTX 2060 (a resource-constrained-class GPU) — Jetson Orin Nano Super is in the same class.
  • Summary: The canonical paper introduces ALIKED with three core contributions: (i) SDDH (Sparse Deformable Descriptor Head) that extracts deformable descriptors only at sparse keypoints (paper §IV) via M deformable sample positions per keypoint (rather than dense descriptor maps as in SuperPoint / DISK / R2D2 / ASLFeat / D2-Net) — drastic GFLOPs reduction of 6-200× vs prior methods (paper Table III + Table IV); (ii) deformable convolutions in the last 2 blocks of the feature encoder (paper §III-A) for geometric-invariance-aware feature extraction; (iii) sparse NRE (Neural Reprojection Error) loss relaxation (paper §V) — relaxes the dense NRE loss [DISK 2020 + ALIKE 2022] to sparse formulation, reducing GPU memory by ~3.5× and enabling training with batch size 2 on a single GPU (rather than DISK's RL-based training requirements). Reported headline performance vs SOTA on HPatches Table IV (RTX 2060, 640×480, 1k keypoints): ALIKED-T(16) achieves 125.87 FPS / 0.192M params / 1.37 GFLOPs / MMA@3=72.99% / MHA@3=78.70% — best MHA among compared methods despite smallest network (vs ALIKE-N 84.96 FPS / 0.318M / 7.91 GFLOPs / MMA@3=70.78 / MHA@3=75.74; vs SuperPoint 52.63 FPS / 1.301M / 26.11 GFLOPs / MMA@3=65.37 / MHA@3=70.19; vs DISK 11.81 FPS / 1.092M / 98.97 GFLOPs / MMA@3=77.59 / MHA@3=70.56). ALIKED-N(16) achieves 77.40 FPS / 0.677M params / 4.05 GFLOPs / MMA@3=74.43% / MHA@3=77.22% — competitive with all top methods at fraction of GFLOPs. Aachen Day-Night visual relocalization (paper Table VII, up to 2048 keypoints): ALIKED-N(32) achieves (0.25m,2°)/(0.5m,5°)/(5m,10°) = 76.5/87.8/100.0 (vs SuperPoint 69.4/78.6/87.8 = +7.1/+9.2/+12.2; vs DISK 70.4/82.7/94.9 = +6.1/+5.1/+5.1; vs ALIKE-L 74.5/87.8/98.0 = +2.0/0.0/+2.0). 
CRITICAL OBSERVATION FOR THE PROJECT: paper Table VII documents that ALIKED-N(32) is the highest-performing tested keypoint extractor on the Aachen Day-Night benchmark at the strictest (0.25m,2°) tier with 2048 keypoints, beating SuperPoint by +7.1 absolute, DISK by +6.1 absolute, and ALIKE-L by +2.0 absolute. By transitive lineage with Source #71 LightGlue paper Table 3 (which reports Aachen Day-Night with the NetVLAD top-50 retrieval → SP+LightGlue → PnP+RANSAC pipeline at Day (0.25m,2°)=89.2, significantly better than SuperPoint+mNN's 69.4 on the same benchmark), the expected pose-estimation accuracy of the ALIKED+LightGlue pipeline on Aachen Day-Night should approach or exceed SP+LightGlue's: ALIKED-N(32)+mNN already beats SuperPoint+mNN by +7.1 absolute, and the inference holds if the LightGlue matcher provides a similar relative lift over mNN for ALIKED as for SuperPoint. However, no canonical paper directly evaluates ALIKED+LightGlue on Aachen Day-Night: the cvg/LightGlue paper (Source #71) Table 3 only reports SP+LightGlue (the cvg/LightGlue ALIKED port + ALIKED-LightGlue weights were added post-paper). 3D reconstruction IMW test Table V (2048 keypoints): ALIKED-N(16) Stereo mAA(10°)=85.47 / Multiview mAA(10°)=71.78, competitive with DISK (85.20 / 72.96) at 1/24th GFLOPs. PPC (Performance Per Cost) in Table V: ALIKED-N(16) PPC_stereo=12.91 vs DISK 0.52, i.e. 24.8× higher PPC. Rotation invariance (paper §VI-C1 + Fig. 6 top): ALIKED-N(16, rot) achieves best rotation invariance among all tested methods at 0-45° image rotations (vs SuperPoint which is strong on rotation due to Homographic Adaptation training, vs ALIKE / DISK / R2D2 which are weak on rotation). Scale invariance (paper §VI-C2 + Fig. 6 bottom): ALIKED-N(16) has best matching accuracy among single-scale methods, but all single-scale methods degrade to 0 at scale-difference >4×; multi-scale variant ALIKED-N(16, MS) handles up to 8× scale difference. 
License: BSD-3-Clause via Source #74 — canonical implementation. NO direct ALIKED+LightGlue benchmark in the cvg/LightGlue paper Table 3 / Table 6 / Table 7 (those tables document SP+LightGlue and DISK+LightGlue only); ALIKED+LightGlue benchmarks would need to be sourced from community evaluations (kornia, hloc, Image Matching WebUI, IMC competition leaderboards) at Plan-phase, OR the project measures ALIKED+LightGlue directly at Jetson MVE phase using the canonical pretrained weights.
  • Related Sub-question: SQ3+SQ4 / C3 — ALIKED+LightGlue per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + ablation studies; documents the Aachen Day-Night documentary lift of ALIKED-N(32)+mNN over SuperPoint+mNN by +7.1 absolute at strictest tier as transitive evidence that ALIKED+LightGlue should be competitive with or beat SP+LightGlue on the visual-localization task; aerial-domain caveat documented; D-C3-1 RECOMMENDED-secondary-mitigation status confirmed)
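The per-tier lifts cited in this entry are plain differences of the Table VII figures at up to 2048 keypoints. The sketch below recomputes them; the dictionary keys and the `lift` helper are hypothetical, and the numbers are those quoted above.

```python
# Aachen Day-Night recall at (0.25m,2°) / (0.5m,5°) / (5m,10°), up to 2048 keypoints,
# as quoted in this registry entry (paper Table VII); not re-measured here.
AACHEN_2048 = {
    "aliked-n32": (76.5, 87.8, 100.0),
    "superpoint": (69.4, 78.6, 87.8),
    "disk": (70.4, 82.7, 94.9),
    "alike-l": (74.5, 87.8, 98.0),
}


def lift(method: str, baseline: str) -> tuple:
    """Absolute per-tier difference (method minus baseline), rounded to one decimal."""
    return tuple(
        round(m - b, 1) for m, b in zip(AACHEN_2048[method], AACHEN_2048[baseline])
    )


print(lift("aliked-n32", "superpoint"))  # (7.1, 9.2, 12.2)
print(lift("aliked-n32", "disk"))        # (6.1, 5.1, 5.1)
print(lift("aliked-n32", "alike-l"))     # (2.0, 0.0, 2.0)
```

These lifts are for mutual-nearest-neighbour matching; they do not by themselves establish the ALIKED+LightGlue result, which this entry notes is not reported in any canonical paper.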

Source #76

  • Title: DISK canonical implementation — cvlab-epfl/disk (Tyszkiewicz, Fua, Trulls — NeurIPS 2020) — official PyTorch reference implementation, README + Apache-2.0 LICENSE (confirmed via GitHub API metadata license.spdx_id: "Apache-2.0"), detect.py + match.py runnable inference demos, two pretrained checkpoints save-depth.pth (depth-based RL reward — paper default and best variant) + save-epipolar.pth (epipolar reward — supplementary material variant), 4-layer U-Net architecture requiring image dimensions multiple of 16; cvg/LightGlue's DISK port lightglue/disk.py integrates via kornia.feature.DISK.from_pretrained("depth") (Apache-2.0 inheritance through kornia integration)
  • Link: README https://raw.githubusercontent.com/cvlab-epfl/disk/master/README.md (accessed 2026-05-08); GitHub API license metadata https://api.github.com/repos/cvlab-epfl/disk (accessed 2026-05-08; license.spdx_id: "Apache-2.0"); repo https://github.com/cvlab-epfl/disk (377 stars, 56 forks, last pushed 2023-12-15); cvg/LightGlue's DISK port lightglue/disk.py https://raw.githubusercontent.com/cvg/LightGlue/main/lightglue/disk.py
  • Tier: L1 (project-official codebase by the canonical DISK authors Michał Tyszkiewicz + Pascal Fua + Eduard Trulls, EPFL CVLab + Google Zurich; NeurIPS 2020 publication; canonical implementation referenced by every subsequent feature-matching paper as the "RL-trained sparse extractor reference"; included in cvg/LightGlue's canonical 5-extractor lineup [SuperPoint, DISK, ALIKED, SIFT, DoGHardNet])
  • Publication Date: NeurIPS 2020 (paper accepted Sept 2020; arXiv preprint v1 2020-06-24); canonical repo creation 2020-10-20; last pushed 2023-12-15 (3 years of stable maintenance, no recent breaking changes — establishes mature reference codebase status)
  • Timeliness Status: Within Critical-novelty window (2020 — established competitive ground for sparse-extractor reference; widely-adopted reference implementation across feature-matching deployment community); Established-competitive-extractor-reference exemption applies (DISK is the canonical RL-policy-gradient sparse extractor reference, with its main innovation being end-to-end RL training of detection + description; the LightGlue paper Source #71 + ALIKED paper Source #75 + every subsequent feature-matching benchmark cites DISK as the "modern competitive sparse extractor reference baseline")
  • Version Info: master HEAD at access time (last pushed 2023-12-15). Mode-enumeration query (1/3): context7 NOT INDEXED + WebFetch fallback PASS. context7 resolve-library-id returned no relevant matches for "DISK" feature extractor (top-results were Disk Inventory X / Expo Build Disk Cache / Blacksmith Sticky Disk / disko NixOS / gptman, all unrelated to feature-matching); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical repo README + GitHub API license metadata was used. Two DISK pretrained checkpoints documented: save-depth.pth (default; trained with depth-based RL reward; reproduces paper Table 1 + 2 results 0.51315 stereo AUC + 0.72705 multiview AUC on IMW2020 test set with 2k features, obtained with the original ad-hoc training schedule); save-epipolar.pth (alternate; trained with epipolar reward; supplementary material variant). Native canonical inference CLI: python detect.py --height 1024 --width 1024 --n 2048 h5_artifacts_destination images_directory produces keypoints.h5 + descriptors.h5; python match.py --rt 0.95 --save-threshold 100 h5_artifacts_destination produces matches via mutual-NN; ratio test threshold 0.95 documented. Canonical model architecture: 4-layer U-Net; image dimensions must be multiple of 16 (auto-padded preserving aspect ratio via --height/--width flags); produces 128-D L2-normalized descriptors per keypoint (per Source #71 LightGlue paper §3 + cvg/LightGlue lightglue/disk.py desc_dim=128 default config). In cvg/LightGlue the DISK extractor is wired via kornia.feature.DISK.from_pretrained("depth") with LightGlue(features='disk') matcher config (lightglue/disk.py lines 1-53). 
Default per-extractor config in cvg/LightGlue port: weights="depth", max_num_keypoints=None (threshold-based mode; project pinned to 1024), desc_dim=128, nms_window_size=5, detection_threshold=0.0, pad_if_not_divisible=True (auto-handles the multiple-of-16 constraint), preprocess_conf={"resize": 1024, "grayscale": False} (the same canonical 1024-largest-edge resize policy as SuperPoint + ALIKED). Pretrained weights distributed via kornia (kornia.feature.DISK.from_pretrained accepts "depth" or "epipolar" weight key, auto-downloads from kornia model registry). Required input format: data["image"] as torch.Tensor[B, 3, H, W] RGB; if [B, 1, H, W] grayscale is provided, the cvg/LightGlue port auto-converts via kornia.color.grayscale_to_rgb (per lightglue/disk.py lines 31-32). Output format: {keypoints: torch.Tensor[B, N, 2], descriptors: torch.Tensor[B, N, 128], keypoint_scores: torch.Tensor[B, N]} where N ≤ max_num_keypoints; this is the same dict shape as SP+LightGlue + ALIKED+LightGlue, allowing direct LightGlue matcher swap via features='disk'. Training data: EPFL CVLab DISK dataset (~164 GB downloadable via download_dataset script), sampled from MegaDepth phototourism scenes with depth-map supervision; Low-GPU-memory training option python train.py --substep 2 --batch-size 1 --chunk-size 10000 --warmup 500 documented to fit within 11/12 GB GPUs (~2 weeks of training); canonical training was on 32 GB V100s with inverse_T = θ_M annealed from 15 to 50 over 20 epochs; best checkpoint selection on validation AUC. COLMAP integration: ships colmap/h5_to_db.py for SfM pipeline integration. No lightglue/disk.py LICENSE annotation in the file header (vs ALIKED's explicit BSD-3-Clause file-header inheritance); the cvg/LightGlue port file inherits Apache-2.0 from cvg/LightGlue itself (Source #70) and from canonical DISK (Apache-2.0). kornia is also Apache-2.0 (well-established), so the Apache-2.0 license track is preserved through the entire DISK+LightGlue stack. 
The lightglue-onnx companion (Source #73) explicitly supports DISK in its 30 Jun 2023 changelog entry: "DISK feature extraction support added"; CLI command pattern parallel to SP+LightGlue: lightglue-onnx export disk_lightglue --num-keypoints 1024 -b 2 -h 1024 -w 1024 --fp16 --device cuda and inference via lightglue-onnx infer disk_lightglue --image image1.jpg --image image2.jpg -d tensorrt --fp16. Canonical paper IMW2020 stereo AUC numbers (paper Table 1): DISK 0.50432 stereo AUC + 0.72624 multiview AUC at 2k features (default schedule); 0.51315 / 0.72705 with original ad-hoc schedule. By transitive lineage with Source #71 LightGlue paper Table 6 (which documents DISK+LightGlue stereo AUC@5° = 67.02 vs SP+LightGlue 59.03 = +7.99 absolute on IMC 2020), DISK+LightGlue is the demonstrably technically-superior C3 candidate to canonical SP+LightGlue on phototourism stereo while preserving Apache-2.0 license track throughout. Limitations (paper §4 + ALIKED paper Table III cross-cite): DISK has 1.092M params + 98.97 GFLOPs at 640×480 + 1k keypoints — 24.4× higher GFLOPs than ALIKED-N(16) (4.05 GFLOPs); 3.8× higher GFLOPs than SuperPoint (26.11 GFLOPs); RTX 2060 throughput 11.81 FPS @ 640×480 + 1k keypoints = 84.7 ms per pair extraction-only (slowest among modern competitive sparse extractors). However, the LightGlue-ONNX TensorRT acceleration pathway (Source #73) provides 3-5× speedup over PyTorch fp16, partially offsetting DISK's high GFLOPs cost — TensorRT-equipped Jetson Orin Nano Super extrapolation: ~50-100 ms per pair @ 1024 keypoints fp16 + LightGlue-ONNX TensorRT EP / ~200-400 ms PyTorch-fp16-only fallback; at K=10 top-K retrieval pairs/frame this puts AC-4.1 400 ms budget at MEDIUM-RISK margin (better than ALIKED's PyTorch-fp16-only HARSH-RISK margin but worse than SP+LightGlue's TIGHT margin due to DISK's higher raw GFLOPs).
  • Target Audience: System architects + C3 implementer + C7 (Jetson runtime) implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 RECOMMENDED-PRIMARY mitigation lock)
  • Research Boundary Match: Full match for the project's pinned mode of DISK+LightGlue (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run DISK feature extraction on each independently at 1024-largest-edge, match via LightGlue with features='disk', return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). The cvg/LightGlue port + kornia integration ships everything needed: extractor classes (from lightglue import DISK; DISK(max_num_keypoints=1024) instantiates kornia.feature.DISK under the hood), I/O utilities inherited from cvg/LightGlue (load_image, rbd, match_pair), visualization inherited (viz2d), pretrained weights auto-downloaded via kornia. Asymmetric image-pair sizes are handled natively — same independent per-image extraction pattern as SP+LightGlue + ALIKED+LightGlue. Partial match for the project's domain (canonical training on EPFL CVLab DISK dataset (~164 GB) sampled from MegaDepth phototourism scenes with depth-map supervision — NOT aerial nadir; same aerial-domain caveat as SP+LightGlue + ALIKED+LightGlue + C2 candidates; aerial applicability is NOT explicitly validated in the canonical paper — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight, OR via D-C2-1 retrain decision = (a) project-domain retrain on AerialVL with DISK's RL policy gradient training paradigm). POSITIVE finding for the Jetson deployment story: Source #73 (fabio-sim/LightGlue-ONNX) DOES ship a documented DISK end-to-end export pathway (changelog entry 30 Jun 2023); DISK+LightGlue is the second-cleanest LightGlue extractor sibling for Jetson deployment after SP+LightGlue (which has the most-mature ONNX/TensorRT pathway via 28 Jun 2023 changelog) but before ALIKED+LightGlue (export-absent in LightGlue-ONNX).
  • Summary: DISK is the canonical RL-policy-gradient sparse-keypoint-and-descriptor extraction network introduced by Tyszkiewicz, Fua, and Trulls (NeurIPS 2020), with end-to-end RL training of detection + description as its main contribution — uses policy gradient (REINFORCE-class) to optimize directly for the high-level downstream objective of "many correct feature matches between image pairs", relaxing the discreteness barrier that prior end-to-end methods (SuperPoint, R2D2, D2-Net) approximated with surrogate losses. CRITICAL LICENSE FINDING: Apache-2.0 (confirmed via GitHub API metadata license.spdx_id: "Apache-2.0") — permissive, BSD/permissive license track on the extractor; paired with cvg/LightGlue's Apache-2.0 matcher (Source #70) → Apache-2.0 license track THROUGHOUT the DISK+LightGlue stack. This makes DISK+LightGlue the cleanest license-compliant LightGlue-extractor-sibling alternative to canonical SP+LightGlue's Magic-Leap-restrictive-extractor-weights HARD DISQUALIFIER (vs ALIKED+LightGlue's BSD-3-Clause + Apache-2.0 mixed track which is also clean BSD/permissive but adds the export-pathway gap). Architecture (paper §3 + ALIKED paper Table III cross-cite): 4-layer U-Net feature encoder with deformable convolutions in the bottleneck → score head (DKD-class) for keypoint detection → per-pixel dense descriptor head producing 128-D L2-normalized descriptors. Image dimensions must be multiple of 16 due to U-Net's 4 downsampling stages (auto-padded preserving aspect ratio in the canonical CLI; auto-handled in cvg/LightGlue port via pad_if_not_divisible=True). Two pretrained checkpoints distributed: save-depth.pth (depth-based RL reward, default and best variant per paper) + save-epipolar.pth (epipolar reward, supplementary material variant). 
Reported headline performance vs SOTA on IMW2020 (paper Table 1, 2k features): DISK 0.51315 stereo AUC + 0.72705 multiview AUC (canonical paper schedule), the best single-extractor result on IMW2020 stereo at 2020 publication time. Against SuperPoint (1.301M params / 26.11 GFLOPs): DISK has 1.092M params / 98.97 GFLOPs = 3.8× higher GFLOPs; it trades higher compute cost for higher matching accuracy. Against ALIKED-N(16) (0.677M params / 4.05 GFLOPs / 77.40 FPS RTX 2060): DISK has 1.092M params / 98.97 GFLOPs / 11.81 FPS, i.e. 24.4× higher GFLOPs / 6.6× lower FPS but with +3.16 absolute MMA@3 on HPatches (per ALIKED paper Table III). Aachen Day-Night visual relocalization (ALIKED paper Table VII, up to 2048 keypoints, mNN matcher): DISK 70.4/82.7/94.9 at (0.25m,2°)/(0.5m,5°)/(5m,10°); this beats SuperPoint=69.4/78.6/87.8 by +1.0/+4.1/+7.1 absolute across the three tiers, but loses to ALIKED-N(32)=77.6/88.8/100.0 by -7.2/-6.1/-5.1 absolute. However, when paired with LightGlue matcher (Source #71 paper Appendix A Table 6): DISK+LightGlue stereo AUC@5° on IMC 2020 = 67.02 vs SP+LightGlue 59.03 = +7.99 absolute documentary technical superiority + DISK+LightGlue stereo AUC@10° on IMC 2020 = 83.45 vs SP+LightGlue 77.96 = +5.49 absolute; DISK+LightGlue is the demonstrably best documented LightGlue-extractor-sibling on phototourism stereo. No direct DISK+LightGlue Aachen Day-Night number in either canonical paper (the cvg/LightGlue paper Table 3 documents only SP+LightGlue Aachen results); transitive lineage suggests DISK+LightGlue Aachen Day-Night should lift DISK+mNN's 70.4/82.7/94.9 by a similar relative margin as LightGlue lifts SP over SP+mNN (paper §5.4: ~10-15 absolute lift across tiers expected, putting DISK+LightGlue Aachen Day at approximately 80-85/93-95/99-100, which is competitive with SP+LightGlue's 89.2/95.4/98.5 but with a more uncertain documentary basis). 
Training paradigm: REINFORCE-class policy gradient with inverse_T = θ_M annealed from 15 to 50 over 20 epochs; depth-based reward = number of feature matches consistent with ground-truth depth maps (preferred to epipolar reward). Canonical training time = ~2 weeks on 32 GB V100; low-GPU-memory variant (12 GB) takes ~2 weeks at smaller batch/chunk size. Custom dataset support: ships colmap/colmap2dataset.py to import COLMAP outputs into DISK training format — directly applicable to project-side D-C2-1 = (a) aerial-retrain workflow (run COLMAP on AerialVL or Derkachi-flight scenes → import into DISK format → train DISK on aerial-nadir corpus). Note on training cost: DISK's RL-based training is more compute-intensive than ALIKED's sparse-NRE-loss training (paper §V — ALIKED reduces GPU memory by ~3.5× vs DISK's RL training); for the project's D-C2-1 retrain decision, DISK is less retrain-friendly than ALIKED at the GPU-memory level but more retrain-friendly than SP-reproduction (which would require Magic-Leap's Homographic Adaptation training pipeline + LICENSE clearance). Kornia integration: cvg/LightGlue's lightglue/disk.py port uses kornia.feature.DISK.from_pretrained("depth") — kornia auto-downloads the canonical save-depth.pth weights from kornia's model registry on first instantiation; no manual checkpoint download required. LightGlue-ONNX support: Source #73 ships DISK end-to-end ONNX export pathway documented in the 30 Jun 2023 changelog; CLI commands parallel SP+LightGlue export (lightglue-onnx export disk_lightglue ...). Modern lineage: DISK is the strict successor to SuperPoint (CVPR Workshop 2018) on the RL-trained-end-to-end axis (vs SuperPoint's Homographic-Adaptation-trained-with-surrogate-losses axis); the ALIKED paper (Source #75) positions itself as a successor to both DISK and SuperPoint; modern community evaluations (kornia, hloc, Image Matching Workshop competitions) consistently report DISK+LightGlue as a competitive top-3 sparse-matcher pipeline.
  • Related Sub-question: SQ3+SQ4 / C3 — DISK+LightGlue per-mode API capability verification (Mandatory context7 lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + GitHub API license metadata WebFetch + canonical paper WebFetch [Source #77] + cvg/LightGlue lightglue/disk.py source code inspection [transitively cited via Source #70] + LightGlue-ONNX DISK-export-PRESENT finding [transitively cited via Source #73]); D-C3-1 RECOMMENDED-PRIMARY-MITIGATION candidate (Apache-2.0 throughout, demonstrably technically superior to canonical SP+LightGlue on phototourism stereo via paper Table 6 +7.99 absolute AUC@5° lift, and Jetson-deployment-ready via LightGlue-ONNX TensorRT pathway); reaffirms D-C2-1 reuse (canonical training on MegaDepth phototourism is NOT aerial nadir); reaffirms D-C3-2 LightGlue-inference-runtime choice with PREFERRED ONNX Runtime + TensorRT EP path for DISK+LightGlue on Jetson Orin Nano Super
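
The input-size rules recorded for the DISK port under Source #76 (resize so the largest edge is 1024, then pad each dimension up to a multiple of 16) can be sketched in plain Python. This is a minimal illustration of the arithmetic only, not the cvg/LightGlue or kornia implementation: the function names `resize_to_largest_edge`, `pad_to_multiple`, and `disk_input_size` are hypothetical, and rounding to the nearest pixel is an assumption.

```python
def resize_to_largest_edge(height: int, width: int, target: int = 1024) -> tuple:
    """Scale (height, width) so the larger edge equals `target`, keeping aspect ratio.

    Rounding to the nearest pixel is an assumption of this sketch.
    """
    scale = target / max(height, width)
    return max(1, round(height * scale)), max(1, round(width * scale))


def pad_to_multiple(height: int, width: int, multiple: int = 16) -> tuple:
    """Round each dimension up to the next multiple of `multiple` (integer math)."""
    padded_h = -(-height // multiple) * multiple
    padded_w = -(-width // multiple) * multiple
    return padded_h, padded_w


def disk_input_size(height: int, width: int, target: int = 1024, multiple: int = 16) -> tuple:
    """Resize to the largest-edge target, then pad to satisfy the divisibility rule."""
    resized = resize_to_largest_edge(height, width, target)
    return pad_to_multiple(*resized, multiple)


if __name__ == "__main__":
    # A 4:3 frame lands exactly on multiples of 16; a 3:2 frame needs padding.
    print(disk_input_size(3000, 4000))
    print(disk_input_size(1000, 1500))
```

Because the UAV frame and the satellite tile are extracted independently, each image would go through this size computation on its own, which is why asymmetric pair sizes need no special handling.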

Source #77

  • Title: DISK canonical paper — "DISK: Learning local features with policy gradient" (Tyszkiewicz, Fua, Trulls — Advances in Neural Information Processing Systems vol. 33, 2020, arXiv:2006.13566)
  • Link: arXiv abstract https://arxiv.org/abs/2006.13566 (June 2020); arXiv full PDF https://arxiv.org/pdf/2006.13566.pdf ; NeurIPS 2020 proceedings https://proceedings.neurips.cc/paper/2020/hash/a42a596fc71e17828440030074d15e74-Abstract.html ; accessed 2026-05-08
  • Tier: L1 (peer-reviewed NeurIPS 2020 + canonical implementation cross-referenced; documented modern competitive RL-trained sparse-extractor reference; cited by 2021-2026 feature-matching papers as the "policy-gradient end-to-end sparse extractor" and the "MegaDepth-trained dense-descriptor sparse-extractor"; included in cvg/LightGlue's canonical 5-extractor lineup with explicit Appendix A Table 6 documentary superiority over canonical SP+LightGlue)
  • Publication Date: arXiv preprint 2020-06-24 (v1); NeurIPS 2020 publication December 2020
  • Timeliness Status: Within Critical-novelty window (2020 — established competitive ground for sparse-extractor reference); Established-competitive-modern-extractor exemption applies (DISK is the canonical RL-policy-gradient sparse extractor reference, with its main innovation being the bridging of training/inference discreteness gap; ALIKED paper §I + Source #71 LightGlue paper §3 both cite DISK as the modern competitive baseline)
  • Version Info: arXiv v1 (June 2020, NeurIPS 2020 camera-ready). Title note: arXiv title "Local feature detection and description with policy gradient" is the original arXiv submission title; NeurIPS 2020 camera-ready title was changed to "DISK: Learning local features with policy gradient" (the canonical title used in the canonical README citation). Paper §1-2 Introduction + Related Work: position DISK as the bridge between fully-end-to-end-trainable methods (with surrogate losses) and RL-based-detection methods (which had been limited to detection-only with hand-crafted descriptors due to weak training signal); DISK's main RL-training contribution is the relaxation to a "find many correct feature matches" surrogate objective that allows robust training from scratch with policy gradient. Paper §3 Related Work + §4 Method: 4-layer U-Net architecture; per-pixel dense descriptor head; per-pixel scoring head; training via policy gradient with depth-based or epipolar reward; inverse_T = θ_M matching temperature scheduling. Paper §5 Experiments: HPatches MMA@3 (vs SuperPoint, R2D2, D2-Net, ASLFeat; competitive top-tier; paper Figure 5 cached in canonical repo results/hpatches/); IMW2020 test set stereo + multiview AUC numbers (best single-extractor result at publication time per paper Table 1); 3D reconstruction quality on IMW competition images.
  • Target Audience: System architects + C3 implementer + Step-7.5 reviewer
  • Research Boundary Match: Full match for the algorithm (DISK with 4-layer U-Net + deformable bottleneck + per-pixel dense descriptor head + RL-policy-gradient training); partial match for the project's domain (paper benchmarks: HPatches Figure 5 [planar scenes with illumination + viewpoint changes], IMW2020 stereo + multiview reconstruction [phototourism dataset]; NO aerial nadir benchmark in the canonical paper; NO Aachen Day-Night benchmark in the canonical paper — Aachen results from DISK come from cross-paper evaluation in the ALIKED paper Source #75 Table VII via mNN matcher, plus from Source #71 LightGlue paper Appendix A which documents DISK+LightGlue on IMC 2020/2021/2023 + Aachen as cross-source evaluation). Critical paper §3 reference position: DISK is positioned as the RL-policy-gradient end-to-end sparse extractor that closes the training/inference discreteness gap; the paper explicitly positions itself against (a) surrogate-loss methods (SuperPoint, R2D2, D2-Net) that approximate the matching objective and (b) Q-learning methods (GLAMpoints, Reinforced Feature Points) that rely on hand-crafted descriptors or pre-trained components. DISK's contribution is to combine policy gradient with end-to-end-learnable description for the first time at competitive accuracy.
  • Summary: The canonical paper introduces DISK with three core contributions: (i) Policy-gradient end-to-end training of detection + description that closes the training/inference discreteness gap that surrogate-loss methods approximate; (ii) Surrogate objective "find many correct feature matches" that gives stable RL training signal — enables training from scratch (vs Reinforced Feature Points which requires SuperPoint pre-training); (iii) Inverse-softmax matching temperature θ_M scheduling annealed from 15 to 50 over 20 epochs to bridge the RL stochasticity-to-discreteness gap. Reported headline performance on IMW2020 (paper Table 1, 2k features): DISK 0.51315 stereo AUC + 0.72705 multiview AUC — best single-extractor result at 2020 publication time. HPatches MMA (paper Figure 5): competitive with SuperPoint, R2D2, D2-Net, ASLFeat (cached results available in canonical repo results/hpatches/ for cross-paper comparison). By transitive cross-paper lineage: ALIKED paper Source #75 Table III = DISK 1.092M params / 98.97 GFLOPs / 11.81 FPS RTX 2060 / MMA@3=77.59% / MHA@3=70.56% (vs SuperPoint 1.301M params / 26.11 GFLOPs / 52.63 FPS / MMA@3=65.37 / MHA@3=70.19; vs ALIKED-N(16) 0.677M params / 4.05 GFLOPs / 77.40 FPS / MMA@3=74.43 / MHA@3=77.22) — DISK has highest MMA@3 (best per-pixel matching accuracy among the three) but lowest FPS due to dense descriptor head. CRITICAL OBSERVATION FOR THE PROJECT: cvg/LightGlue paper Source #71 Appendix A Table 6 documents DISK+LightGlue stereo AUC@5° on IMC 2020 = 67.02 vs SP+LightGlue 59.03 = +7.99 absolute documentary technical superiority + DISK+LightGlue stereo AUC@10° = 83.45 vs SP+LightGlue 77.96 = +5.49 absolute — DISK+LightGlue is the demonstrably best documented LightGlue-extractor-sibling on phototourism stereo with full Apache-2.0 license track preservation throughout. License: Apache-2.0 via Source #76 — canonical implementation. 
Limitations (paper §6 + cross-paper observations): DISK's RL-policy-gradient training is computationally expensive (~2 weeks on 32 GB V100; ~2 weeks at smaller batch/chunk size on 12 GB GPUs for low-memory training); DISK's 98.97 GFLOPs at 640×480 + 1k keypoints is the highest among modern competitive sparse extractors (24.4× higher than ALIKED-N(16); 3.8× higher than SuperPoint) — partial mitigation via LightGlue-ONNX TensorRT acceleration pathway (Source #73 + Source #76); aerial-domain caveat shared with all C-row components (D-C2-1 reuse — canonical training is on MegaDepth phototourism via depth-map-supervised RL, NOT aerial nadir).
  • Related Sub-question: SQ3+SQ4 / C3 — DISK+LightGlue per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + RL training paradigm; documents the IMW2020 stereo + multiview reconstruction documentary advantage of DISK over SuperPoint at canonical paper publication time, plus the cross-paper LightGlue paper Appendix A Table 6 DISK+LightGlue +7.99 absolute AUC@5° superiority over canonical SP+LightGlue on IMC 2020 stereo as the strongest documentary technical-superiority signal for the D-C3-1 RECOMMENDED-PRIMARY-mitigation lock; aerial-domain caveat documented; D-C3-1 RECOMMENDED-PRIMARY-mitigation status confirmed)
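
The compute ratios quoted for DISK in the Source #77 summary and limitations follow directly from the cited ALIKED paper Table III figures. The snippet below is a small arithmetic check, with the numbers copied from this registry entry (640×480 input, 1k keypoints, RTX 2060); the dictionary layout and helper names are illustrative assumptions, not part of any library.

```python
# Figures as cited in this registry from the ALIKED paper Table III.
EXTRACTORS = {
    "DISK": {"params_m": 1.092, "gflops": 98.97, "fps": 11.81},
    "SuperPoint": {"params_m": 1.301, "gflops": 26.11, "fps": 52.63},
    "ALIKED-N(16)": {"params_m": 0.677, "gflops": 4.05, "fps": 77.40},
}


def gflops_ratio(name: str, baseline: str) -> float:
    """How many times more GFLOPs `name` needs than `baseline`."""
    return EXTRACTORS[name]["gflops"] / EXTRACTORS[baseline]["gflops"]


def fps_ratio(faster: str, slower: str) -> float:
    """How many times higher the throughput of `faster` is than `slower`."""
    return EXTRACTORS[faster]["fps"] / EXTRACTORS[slower]["fps"]


def ms_per_frame(name: str) -> float:
    """Reciprocal of the cited FPS, expressed in milliseconds."""
    return 1000.0 / EXTRACTORS[name]["fps"]


if __name__ == "__main__":
    print(f"DISK vs ALIKED-N(16) GFLOPs: {gflops_ratio('DISK', 'ALIKED-N(16)'):.1f}x")
    print(f"DISK vs SuperPoint GFLOPs:   {gflops_ratio('DISK', 'SuperPoint'):.1f}x")
    print(f"ALIKED-N(16) vs DISK FPS:    {fps_ratio('ALIKED-N(16)', 'DISK'):.1f}x")
    print(f"DISK latency at cited FPS:   {ms_per_frame('DISK'):.1f} ms")
```

These are desktop-GPU figures; the Jetson Orin Nano Super numbers elsewhere in this registry are extrapolations and are not reproduced by this check.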

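The θ_M annealing described for Source #77 can be illustrated with a short sketch. It assumes that the match distribution is a softmax over negative descriptor distances scaled by the inverse temperature θ_M, and that the ramp from 15 to 50 over 20 epochs is linear; the registry records only the endpoints and epoch count, so the schedule shape, the function names, and the toy distances below are all assumptions.

```python
import math


def theta_m(epoch: int, start: float = 15.0, end: float = 50.0, epochs: int = 20) -> float:
    """Inverse temperature for a given epoch (linear ramp is an assumption)."""
    if epochs < 2:
        return end
    clamped = min(max(epoch, 0), epochs - 1)
    return start + (clamped / (epochs - 1)) * (end - start)


def match_probabilities(distances: list, theta: float) -> list:
    """Softmax over -theta * distance, computed in a numerically stable way."""
    logits = [-theta * d for d in distances]
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy distances from one keypoint to three candidates (hypothetical values).
    toy_distances = [0.4, 0.6, 0.9]
    for epoch in (0, 10, 19):
        theta = theta_m(epoch)
        probs = match_probabilities(toy_distances, theta)
        print(epoch, round(theta, 2), [round(p, 3) for p in probs])
```

As θ_M grows, the distribution concentrates on the nearest candidate, which is the sense in which the annealing moves stochastic training-time matching toward the discrete nearest-neighbour matching used at inference.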
Source #78

  • Title: SuperGlue canonical implementation — magicleap/SuperGluePretrainedNetwork (Sarlin, DeTone, Malisiewicz, Rabinovich — CVPR 2020 Oral) — official PyTorch reference implementation, README + Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement (identical wording to Source #72 SuperPoint LICENSE), demo_superglue.py (live webcam demo) + match_pairs.py (image-pair matching + evaluation) runnable inference scripts, two pretrained checkpoints superglue_indoor.pth (ScanNet-trained) + superglue_outdoor.pth (MegaDepth-trained), inference-only release (training code explicitly NOT released per README "We do not intend to release the SuperGlue training code"); SuperGlue is paired exclusively with canonical SuperPoint extractor (no SIFT-based or homography variants released per README) — both inherit Magic Leap restrictive license
  • Link: README https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/master/README.md (accessed 2026-05-08); LICENSE https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/master/LICENSE (accessed 2026-05-08; Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" SLA = identical wording to Source #72 SuperPoint LICENSE); GitHub API license metadata https://api.github.com/repos/magicleap/SuperGluePretrainedNetwork (accessed 2026-05-08; license.spdx_id: "NOASSERTION" confirming non-OSI-approved license); repo https://github.com/magicleap/SuperGluePretrainedNetwork (4005 stars, 761 forks, last pushed 2024-08-30 — mature reference codebase status)
  • Tier: L1 (project-official codebase by the canonical SuperGlue authors Paul-Edouard Sarlin + Daniel DeTone + Tomasz Malisiewicz + Andrew Rabinovich, Magic Leap; same author group as Source #72 SuperPoint canonical; CVPR 2020 Oral publication; canonical implementation referenced by every subsequent feature-matching paper as the "long-established graph-neural-network sparse matcher reference baseline"; explicitly displaced by LightGlue per Source #71 paper §5 documentary 4-10× speedup at competitive accuracy)
  • Publication Date: CVPR 2020 (paper accepted Oral track); arXiv preprint v1 2019-11-26; canonical repo creation 2020-03-17; last pushed 2024-08-30 (4 years of stable maintenance for inference-only codebase, no training-code release ever)
  • Timeliness Status: Within Established-baseline-reference window (2020 publication; the long-established graph-neural-network sparse matcher reference baseline that defines the mandatory-simple-baseline role per the engine Component Option Breadth rule); Established-competitive-mandatory-baseline exemption applies (SuperGlue is the canonical sparse-matcher mandatory-simple-baseline reference for the C3 row; cited as the displaced reference in Source #71 LightGlue paper §1 + §5 + Appendix A; cited in every modern feature-matching paper as the predecessor that LightGlue exceeds)
  • Version Info: master HEAD at access time (last pushed 2024-08-30). Mode-enumeration query (1/3): context7 NOT INDEXED + WebFetch fallback PASS. context7 resolve-library-id returned no relevant matches for the "SuperGlue" feature matcher (top result was Superglue API orchestration, which is unrelated to feature matching); per Per-Mode API Capability Verification rule item 2, the fallback to an official-docs WebFetch on the canonical magicleap/SuperGluePretrainedNetwork README + LICENSE + GitHub API license metadata was used. Two SuperGlue pretrained checkpoints: superglue_indoor.pth (ScanNet-trained, default; recommended config --resize 640 --superglue indoor --max_keypoints 1024 --nms_radius 4) + superglue_outdoor.pth (MegaDepth-trained; recommended config --resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3 --resize_float). Operating bounds: README explicitly recommends NOT running SuperPoint+SuperGlue below 160×120 resolution (QQVGA) or above 2000×1500. CLI structure: two main top-level scripts, demo_superglue.py (live webcam/IP-camera/directory/movie demo with keyboard control) + match_pairs.py (image-pair matching from text-file list + optional evaluation if ground truth provided). Output dict format (per .npz file structure documented in README): {keypoints0: (N0, 2), keypoints1: (N1, 2), matches: (N0,) array of indices into keypoints1 with -1 for unmatched, match_confidence: (N0,)}; the optional --eval mode adds {error_t, error_R, precision, matching_score, num_correct, epipolar_errors}. Architecture (per README): "Graph Neural Network combined with an Optimal Matching layer that is trained to perform matching on two sets of sparse image features"; it operates as a "middle-end" performing context aggregation + matching + filtering in a single end-to-end architecture. 
Pairing: SuperGlue is paired exclusively with canonical SuperPoint extractor; README explicitly states "We do not intend to release the SIFT-based or homography SuperGlue models" — there is NO non-Magic-Leap-extractor variant of canonical SuperGlue. CRITICAL RETRAIN BLOCKER: README explicitly states "We do not intend to release the SuperGlue training code" — training code is NOT released, blocking any project-side D-C2-1 retrain decision for SuperGlue+SuperPoint pinned mode. Documentary results (per README evaluation tables): ScanNet test set (1500 indoor pairs) AUC@5/10/20 = 16.12/33.76/51.79, Prec=84.37, MScore=31.14; YFCC test set (4000 outdoor pairs) AUC@5/10/20 = 39.02/59.51/75.72, Prec=98.72, MScore=23.61. Phototourism evaluation is mentioned but not directly reproducible (Image Matching Challenge 2020 keeps test set ground truth private). Hloc integration: README explicitly cross-references cvg/Hierarchical-Localization (hloc) toolbox where SuperGlue is the canonical matcher prior to LightGlue's release; "Winner of 3 CVPR 2020 competitions on localization and image matching!" per README.
  • Target Audience: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 — same Magic Leap restrictive HARD DISQUALIFIER as canonical SP+LightGlue applies to canonical SuperGlue+SuperPoint)
  • Research Boundary Match: Full match for the project's pinned mode of SuperGlue+SuperPoint mandatory-simple-baseline reference (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run SuperPoint feature extraction on each independently, match via SuperGlue with outdoor checkpoint, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). The repo ships everything needed for inference-only evaluation: SuperPoint extractor (MagicLeap-pretrained, inherits Source #72 license), SuperGlue matcher (two pretrained checkpoints), match_pairs.py evaluation script, sample image pairs with ground truth. Asymmetric image-pair sizes are handled natively — same independent per-image extraction pattern as SP+LightGlue. Partial match for the project's domain (canonical training on ScanNet indoor + MegaDepth phototourism outdoor scenes — neither dataset is aerial nadir; same aerial-domain caveat as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + C2 candidates; aerial applicability is NOT explicitly validated in the canonical paper or README — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight; D-C2-1 retrain decision is BLOCKED for SuperGlue+SuperPoint pinned mode since training code is not released). CRITICAL NEGATIVE finding for the role assessment: SuperGlue+SuperPoint is strictly inferior to SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue for the project's deployment as a Selected candidate because: (i) Same Magic Leap restrictive HARD DISQUALIFIER as canonical SP+LightGlue (LICENSE wording is identical to Source #72) — blocks dual-use deployment; (ii) No retrain capability — training code explicitly not released; (iii) 4-10× slower than LightGlue at similar accuracy per Source #71 paper §5 + Table 2 documentary evidence; (iv) No alternative extractor — paired exclusively with Magic Leap SuperPoint, no SIFT or homography variants released. 
POSITIVE for the role: SuperGlue+SuperPoint IS the canonical sparse-matcher mandatory-simple-baseline that the engine's Component Option Breadth rule requires to be cataloged — establishes the long-established reference floor against which modern leads (LightGlue, XFeat) must measurably exceed.
  • Summary: SuperGlue is the canonical graph-neural-network sparse matcher introduced by Sarlin, DeTone, Malisiewicz, and Rabinovich (CVPR 2020 Oral), with attentional graph neural network + optimal matching layer as its main contributions — operates as a "middle-end" that takes two sets of SuperPoint keypoints + descriptors as input and outputs a soft assignment matrix between them; trained to perform matching end-to-end with attention-based context aggregation + Sinkhorn algorithm for optimal transport assignment. CRITICAL LICENSE FINDING: LICENSE file contents are byte-for-byte identical to Source #72 SuperPoint LICENSE = Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement — non-OSI-approved (GitHub API license.spdx_id: "NOASSERTION"); same wording: "Licensee a personal, non-exclusive, non-transferable license to use the Software for noncommercial research purposes" + "You may not distribute, copy or use the Software except as explicitly permitted herein" + "You may not sell, rent, lease, sublicense, lend, time-share or transfer". HARD DISQUALIFIER for canonical SuperGlue+SuperPoint mode in project's dual-use deployment context (eastern/southern Ukraine fixed-wing UAV with AC-NEW-2 spoofing-promotion path is dual-use military by every reasonable interpretation, and the project's question_decomposition.md hard disqualifier list includes "anything whose license blocks military / dual-use deployment"). Training code explicitly NOT released per README — D-C2-1 retrain decision is BLOCKED for SuperGlue+SuperPoint pinned mode. Documentary headline performance vs LightGlue (per Source #71 paper §5 + Table 2 cross-cite): SuperGlue is 4-10× slower than LightGlue at competitive but slightly lower accuracy; LightGlue paper Table 2 documents SP+LightGlue MegaDepth-1500 AUC@5°/10°/20° = 66.7/79.3/87.9 at 44.2 ms standard / 31.4 ms adaptive RTX 3080, vs SP+SuperGlue at slightly lower AUC + 4-10× slower runtime. 
Documentary results in canonical README: ScanNet test (indoor, 1500 pairs) AUC@5/10/20 = 16.12/33.76/51.79; YFCC test (outdoor, 4000 pairs) AUC@5/10/20 = 39.02/59.51/75.72. Architecture: graph neural network with self-attention + cross-attention layers (paper §3.1) + optimal matching layer with dustbin (paper §3.2) + Sinkhorn algorithm for soft assignment (paper §3.2.2). Modern lineage: SuperGlue (CVPR 2020) is the predecessor of LightGlue (ICCV 2023, Source #70 + #71); SuperGlue is also the predecessor of SuperGlue-LoRA + LoFTR + DKM + RoMa + MASt3R successor lineage — but per Fact #26 NGPS pre-screen template, dense matchers (LoFTR, DKM, RoMa, MASt3R) are pruned outright on AC-4.1 Jetson dense-matcher-latency disqualifier. Limitations: (a) Magic Leap restrictive license HARD DISQUALIFIER (same as Source #72); (b) no training-code release blocks aerial-domain retrain; (c) 4-10× slower than LightGlue; (d) paired exclusively with Magic Leap SuperPoint extractor (no Apache-2.0 / BSD-3-Clause extractor pairing variants released); (e) no FlashAttention support (LightGlue's structural advantage); (f) no adaptive-depth/adaptive-width pruning (LightGlue's structural advantage paper §3.3); (g) no canonical Jetson ONNX/TensorRT export pathway in the LightGlue-ONNX equivalent project (SuperGlue's ONNX export is community-maintained third-party, not productized).
  • Related Sub-question: SQ3+SQ4 / C3 — SuperGlue+SuperPoint mandatory-simple-baseline per-mode API capability verification (Mandatory context7 lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + LICENSE WebFetch + canonical paper [Source #79] + GitHub API license metadata; HARD-LICENSE-DISQUALIFIER applies to canonical SuperGlue+SuperPoint mode in project's dual-use deployment context — same Magic Leap restrictive license as canonical SP+LightGlue; TRAINING-CODE-NOT-RELEASED blocks D-C2-1 retrain decision; role per engine Component Option Breadth rule = mandatory-simple-baseline reference floor that establishes the long-established sparse-matcher reference against which modern leads must measurably exceed; documented Recall@K + AUC consistently 1-3 absolute below LightGlue across HPatches / MegaDepth / Aachen / IMC at 4-10× slower runtime per Source #71)
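
The match_pairs.py output layout recorded for Source #78 (a `matches` array of indices into `keypoints1`, with -1 marking unmatched keypoints) can be turned into 2D-2D correspondences with a few lines. The sketch below uses plain Python lists so it runs without dependencies; in practice the arrays would come from the `.npz` file, for example via `numpy.load`. The function name `matched_pairs` and the `min_confidence` filter are hypothetical additions for illustration.

```python
def matched_pairs(keypoints0, keypoints1, matches, match_confidence, min_confidence=0.0):
    """Return (point_in_image0, point_in_image1, confidence) for every valid match.

    matches[i] is the index into keypoints1 matched to keypoints0[i],
    or -1 when keypoints0[i] has no match.
    """
    pairs = []
    for index0, index1 in enumerate(matches):
        if index1 < 0:
            continue  # unmatched keypoint
        confidence = match_confidence[index0]
        if confidence < min_confidence:
            continue  # below the caller's (hypothetical) confidence floor
        pairs.append((tuple(keypoints0[index0]), tuple(keypoints1[index1]), confidence))
    return pairs


if __name__ == "__main__":
    # Toy data with three keypoints in image 0 and two in image 1.
    kpts0 = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
    kpts1 = [(12.0, 21.0), (49.0, 61.0)]
    matches = [0, -1, 1]
    confidence = [0.9, 0.0, 0.4]
    for pair in matched_pairs(kpts0, kpts1, matches, confidence, min_confidence=0.5):
        print(pair)
```

The resulting point pairs are the form a downstream PnP+RANSAC stage such as C4 would consume, whichever matcher produced them.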

Source #79

  • Title: SuperGlue canonical paper — "SuperGlue: Learning Feature Matching with Graph Neural Networks" (Sarlin, DeTone, Malisiewicz, Rabinovich — CVPR 2020 Oral, arXiv:1911.11763)
  • Link: arXiv abstract https://arxiv.org/abs/1911.11763 (November 2019; CVPR 2020 camera-ready); arXiv full PDF https://arxiv.org/pdf/1911.11763.pdf ; CVPR 2020 proceedings https://openaccess.thecvf.com/content_CVPR_2020/papers/Sarlin_SuperGlue_Learning_Feature_Matching_With_Graph_Neural_Networks_CVPR_2020_paper.pdf ; psarlin.com/superglue (project website with videos, slides, recent updates); accessed 2026-05-08
  • Tier: L1 (peer-reviewed CVPR 2020 Oral + canonical implementation cross-referenced; documented predecessor of LightGlue per Source #71 paper §1; cited by 2020-2026 feature-matching papers as the "graph-neural-network sparse matcher reference baseline"; winner of 3 CVPR 2020 competitions on localization and image matching per Source #78 README)
  • Publication Date: arXiv preprint 2019-11-26 (v1); CVPR 2020 publication June 2020 (Oral track)
  • Timeliness Status: Within Established-baseline-reference window (2020 — established competitive ground for sparse-matcher reference; Established-competitive-mandatory-baseline exemption applies — SuperGlue is the canonical sparse-matcher reference baseline that defines the mandatory-simple-baseline role for the C3 row per the engine Component Option Breadth rule)
  • Version Info: arXiv v1 (November 2019, CVPR 2020 Oral camera-ready). Paper §3 architecture: Attentional Graph Neural Network with bidirectional self-attention and cross-attention layers + Optimal Matching Layer with dustbin handling + Sinkhorn algorithm for differentiable optimal transport assignment. Paper §4 training: end-to-end training with sparse keypoint correspondence supervision; ScanNet for indoor model; MegaDepth for outdoor model; trained checkpoints (released) but training code (NOT released per Source #78 README). Paper §5 experiments: ScanNet indoor pose estimation (Table 1; outperforms ratio test, mutual nearest-neighbor, OANet, NN-RANSAC); YFCC outdoor pose estimation (Table 2; outperforms same set of baselines); Phototourism reconstruction (Table 3; competitive with NN+RANSAC + GMS at higher pose accuracy); HPatches homography estimation (Table 4; matches the displacement-only state-of-the-art).
  • Target Audience: System architects + C3 implementer + Step-7.5 reviewer
  • Research Boundary Match: Full match for the algorithm (SuperGlue with attentional GNN + Sinkhorn optimal matching layer + dustbin handling); partial match for the project's domain (paper benchmarks: ScanNet indoor, YFCC outdoor, Phototourism reconstruction, HPatches homography; NO aerial nadir benchmark in the canonical paper). Critical paper §5 reference position: SuperGlue is positioned as the sparse-matcher state of the art at its 2020 publication time, displacing classical mutual nearest-neighbor + RANSAC baselines + earlier learned matchers (OANet, NG-RANSAC, ACNe). The paper's main contribution is closing the gap between hand-crafted matchers (which have well-defined fallback semantics) and learned matchers (which prior to SuperGlue often degraded out-of-domain). For the project's mandatory-simple-baseline role: SuperGlue+SuperPoint is the long-established sparse-matcher reference baseline; it defines the simple-baseline floor that modern leads (LightGlue, XFeat) must measurably exceed.
  • Summary: The canonical paper introduces SuperGlue with three core contributions: (i) Attentional Graph Neural Network with self-attention + cross-attention that aggregates context across the matching image pair, allowing each keypoint to be matched based on both intra-image and inter-image structure; (ii) Optimal Matching Layer with dustbin handling (paper §3.2) that produces a soft assignment matrix where unmatched keypoints are assigned to a dustbin; (iii) Sinkhorn algorithm for differentiable optimal transport (paper §3.2.2) that allows end-to-end training of the entire matching pipeline. Documentary headline results (paper §5): ScanNet indoor pose estimation Table 1 outperforms NN+RANSAC + ratio test + OANet across all AUC tiers; YFCC outdoor pose estimation Table 2 outperforms same baselines; Phototourism reconstruction Table 3 competitive with state-of-the-art at higher pose accuracy; HPatches homography estimation Table 4 matches state-of-the-art. Documentary cross-reference with LightGlue (Source #71 paper Table 2 cross-cite): LightGlue is 4-10× faster than SuperGlue at competitive accuracy — SP+LightGlue MegaDepth-1500 AUC@5°/10°/20°=66.7/79.3/87.9 at 44.2 ms standard / 31.4 ms adaptive RTX 3080 vs SP+SuperGlue similar AUC at 4-10× slower runtime; the LightGlue paper §1 explicitly positions LightGlue as the displacement of SuperGlue in the canonical NetVLAD top-K → sparse matcher → PnP+RANSAC pipeline shape. License: Magic Leap restrictive via Source #78 — canonical implementation. 
Limitations: (a) Magic Leap restrictive license HARD DISQUALIFIER (same as Source #72 SuperPoint and Source #78 SuperGlue); (b) no training-code release per Source #78 README blocks D-C2-1 retrain; (c) displaced by LightGlue per Source #71 paper §5 + Table 2 documentary 4-10× speedup at competitive accuracy; (d) paired exclusively with canonical SuperPoint extractor (no SIFT or homography variants released); (e) no FlashAttention or adaptive-depth/adaptive-width pruning structural advantages; (f) no productized Jetson ONNX/TensorRT export pathway (SuperGlue ONNX export is community-maintained third-party, not productized in the LightGlue-ONNX equivalent project).
  • Related Sub-question: SQ3+SQ4 / C3 — SuperGlue+SuperPoint mandatory-simple-baseline per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + training paradigm; documents the displaced-by-LightGlue reference position as the long-established sparse-matcher reference baseline that defines the simple-baseline floor for the C3 row per the engine Component Option Breadth rule; HARD-LICENSE-DISQUALIFIER applies + TRAINING-CODE-NOT-RELEASED blocks retrain; aerial-domain caveat documented; mandatory-simple-baseline role confirmed)
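The optimal matching layer summarised above (score matrix augmented with a dustbin, then Sinkhorn normalisation) can be sketched as follows. This is a plain-Python illustration working in probability space, not the canonical implementation. The function name, the fixed `dustbin_score` (a learned parameter in the paper) and the iteration count are assumptions, and production code would normally run the iterations in the log domain for numerical stability.

```python
import math


def sinkhorn_with_dustbin(scores, dustbin_score=1.0, iters=100):
    """Soft assignment between M and N keypoints with one dustbin per side.

    scores: M x N list of matching scores (illustrative input).
    Returns an (M+1) x (N+1) matrix; the last row and column are the dustbins.
    """
    m, n = len(scores), len(scores[0])
    # Augment the score matrix with a dustbin column and a dustbin row.
    aug = [list(row) + [dustbin_score] for row in scores]
    aug.append([dustbin_score] * (n + 1))
    kernel = [[math.exp(s) for s in row] for row in aug]
    # Each keypoint carries unit mass; each dustbin can absorb every keypoint
    # of the other image, so both sides sum to M + N.
    row_mass = [1.0] * m + [float(n)]
    col_mass = [1.0] * n + [float(m)]
    u = [1.0] * (m + 1)
    v = [1.0] * (n + 1)
    for _ in range(iters):
        # Alternate row and column rescaling until both marginals are met.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(n + 1))
             for i in range(m + 1)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(m + 1))
             for j in range(n + 1)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n + 1)]
            for i in range(m + 1)]


if __name__ == "__main__":
    # Two keypoints per image with a strong diagonal preference (made-up scores).
    for row in sinkhorn_with_dustbin([[5.0, 0.0], [0.0, 5.0]]):
        print([round(p, 3) for p in row])
```

Keypoints whose best assignment lands in the dustbin column or row are treated as unmatched. This gives a learned sparse matcher an explicit "no match" outcome, which plain nearest-neighbour matching lacks.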

Source #80

  • Title: XFeat canonical implementation — verlab/accelerated_features (Potje, Cadar, Araujo, Martins, Nascimento — CVPR 2024) — official PyTorch reference implementation, README + Apache 2.0 LICENSE; minimalist 3-line inference API (from modules.xfeat import XFeat; xfeat = XFeat(); output = xfeat.detectAndCompute(torch.randn(1,3,480,640), top_k=4096)[0]); Torch Hub one-liner torch.hub.load('verlab/accelerated_features', 'XFeat', pretrained=True, top_k=4096); two main inference modes — sparse (xfeat) + semi-dense (xfeat-star); training code released (notebook XFeat_training_example.ipynb + python3 -m modules.training.train --training_type xfeat_default --megadepth_root_path <...> --synthetic_root_path <...> --ckpt_save_path <...>); built-in evaluation harnesses (python3 -m modules.eval.megadepth1500 --matcher xfeat + python3 -m modules.eval.scannet1500); real-time webcam demo (python3 realtime_demo.py --method XFeat); NEW: XFeat+LighterGlue companion mode (~3× faster than original LightGlue, trained by VerLab using cvg/glue-factory library, distributed via xfeat+lg_torch_hub.ipynb notebook); kornia integration (acknowledged in README); CRITICAL Contributing-section ask: "Currently, it would be nice to have an export script to efficient deployment engines such as TensorRT and ONNX" — ONNX/TensorRT export pathway is COMMUNITY-CONTRIBUTION-NEEDED, NOT productized in canonical repo (HARSHER D-C3-2 gate than DISK+LightGlue's well-documented LightGlue-ONNX TensorRT pathway, but TECHNICALLY SIMPLER than ALIKED+LightGlue's torchvision.ops.deform_conv2d ONNX-export blocker because XFeat is CNN-only with no deformable convolutions or unusual ops)
  • Link: README https://raw.githubusercontent.com/verlab/accelerated_features/main/README.md (accessed 2026-05-08); GitHub API license metadata https://api.github.com/repos/verlab/accelerated_features (accessed 2026-05-08; license.spdx_id: "Apache-2.0"); repo https://github.com/verlab/accelerated_features (1614 stars, 207 forks, last pushed 2025-01-15; actively maintained CVPR 2024 reference codebase with released training code + companion XFeat+LighterGlue + minimal-dependency PyTorch-only architecture); project page https://www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24/ ; HuggingFace Spaces demo https://huggingface.co/spaces/qubvel-hf/xfeat ; eight Colab notebooks distributed in-tree (including minimal_example.ipynb, xfeat_matching.ipynb, xfeat_torch_hub.ipynb, XFeat_training_example.ipynb, xfeat+lg_torch_hub.ipynb)
  • Tier: L1 (project-official codebase by the canonical XFeat authors; CVPR 2024 publication acceptance; canonical implementation referenced in subsequent feature-matching papers as the modern-lightweight learned-feature reference; UFMG VerLab is the authors' affiliation and maintains the project; cross-affiliations span UFMG + Université de Bourgogne + Google Research + Université de Lorraine + Microsoft)
  • Publication Date: CVPR 2024 paper acceptance + canonical repo creation 2024-04-15; last pushed 2025-01-15 (9 months of active maintenance, ongoing community contributions including XFeat+LighterGlue companion mode added post-paper-acceptance)
  • Timeliness Status: Within Modern-competitive-lead window (2024 — modern competitive lightweight-CNN reference; XFeat is the modern-lightweight-CNN reference baseline that defines the modern-lead role for the C3 row's lightweight-CNN axis)
  • Version Info: main HEAD at access time (last pushed 2025-01-15). Mode-enumeration query (1/3): context7 NOT INDEXED + WebFetch fallback PASS. context7 resolve-library-id returned the just-sultanov/xfeat git-worktree-management CLI utility (UNRELATED to the canonical XFeat feature-matching library); per Per-Mode API Capability Verification rule item 2, the fallback to official-docs WebFetch on the canonical verlab/accelerated_features README + GitHub API license metadata + canonical paper (Source #81) was used. Three primary inference modes: (i) XFeat sparse with top_k=4096 keypoints + 64-D float descriptors + Mutual Nearest Neighbor (MNN) matching; (ii) XFeat* semi-dense with up to 10k features + 2-scale processing (0.65× + 1.3× input resize) + MNN + lightweight MLP-based offset refinement (offset prediction confidence threshold 0.2); (iii) XFeat+LighterGlue with VerLab-trained smaller LightGlue variant (~3× faster than original LightGlue per README claim). Operating bounds: README claims VGA real-time on Intel i5 CPU + 1,400 FPS batched RTX 4090 at VGA + 150 FPS single-batch RTX 4090 at VGA; paper Table 1 documents 27.1 FPS XFeat / 19.2 FPS XFeat* on Intel i5-1135G7 CPU at VGA. Supports gray-scale or RGB input (paper §3.1 explicitly grayscale H×W×C with C=1; PyTorch tensor accepts (B, 3, H, W) per README minimal example). CLI structure: minimalist 3-line inference API + Torch Hub one-liner + 8 Colab notebooks + 3 evaluation scripts + 1 real-time webcam demo. Output dict format: per-image dict {keypoints: (N, 2), scores: (N,), descriptors: (64, N) or (N, 64) depending on mode} for sparse mode (XFeat); semi-dense mode (XFeat*) adds match_confidences from MLP offset refinement.
Architecture (per paper §3 + README): featherweight CNN backbone with channel sequence {4, 8, 24, 64, 64, 128} (paper §3.1 triple-rate channel increase vs VGG's double-rate); 23 convolutional layers organized as 6 spatial-halving blocks + 2 fusion blocks; basic layer = Conv + ReLU + BatchNorm; DECOUPLED keypoint detection branch using 1×1 convolutions on 8×8 tensor-block-transformed image (paper §3.2 Keypoint Head); descriptor head = feature pyramid merging at 1/8, 1/16, 1/32 scales bilinearly upsampled to 1/8 + element-wise summation + fusion block; reliability map regression branch; match refinement module = lightweight MLP predicting 8×8 pixel-level offset distribution. Pairing options (per README): standalone XFeat sparse with MNN matching / standalone XFeat* semi-dense with MNN+offset-refinement / XFeat+LighterGlue paired matcher (NEW companion mode using cvg/glue-factory-trained LighterGlue variant ~3× faster than canonical LightGlue per README claim). Training: explicitly distributed in-tree (XFeat_training_example.ipynb); training command python3 -m modules.training.train --training_type xfeat_default --megadepth_root_path <...> --synthetic_root_path <...> --ckpt_save_path <...>; uses MegaDepth + COCO_20k synthetic warped-pairs at 6:4 ratio per paper §3.3; training on entry-level hardware (paper §3.3 mentions 6.5 GB VRAM total + 36 hours on single RTX 4090 + batch size 10 + Adam optimizer LR 3e-4 + exponential decay 0.5 every 30k updates + convergence at 160k iterations). 
Documentary results (per paper Table 1, MegaDepth-1500 i5-1135G7 CPU VGA, AUC@5/10/20 + FPS): SuperPoint AUC@5/10/20 = 37.3/50.1/61.5 at 3.0 FPS (4096 kpts) / DISK = 53.8/65.9/75.0 at 1.2 FPS (4096 kpts) / DISK* = 55.2/66.8/75.3 at 1.2 FPS (10k kpts) / ALIKE-Tiny = 49.4/61.8/71.4 at 5.3 FPS (4096 kpts) / XFeat sparse = 42.6/56.4/67.7 at 27.1 FPS (4096 kpts; 9× faster than SuperPoint at HIGHER AUC + 5× faster than ALIKE) / XFeat* semi-dense = 50.2/65.4/77.1 at 19.2 FPS (10k features; comparable to DISK* at 16× speedup); paper Table 2 ScanNet-1500 indoor: XFeat AUC@5=16.7 / XFeat*=18.4 vs SuperPoint=12.5 / DISK=9.6/11.3 / ALIKE=8.0 — XFeat outperforms ALL other methods on ScanNet indoor despite all methods being trained on MegaDepth (paper Appendix E attributes this to hybrid MegaDepth+synthetic-warp-COCO training reducing landmark-dataset overfitting bias); paper Table 3 HPatches Homography MHA@3 Illumination/Viewpoint = 95.0/68.6 (XFeat) — best Illumination@3 in paper Table 3 across all methods including SuperPoint 94.6 and DISK 94.6.
  • Embedded/CPU deployment claim (per paper Appendix C): on Orange Pi Zero 3 (Cortex-A53 ARM, $28 device) at 480×360 input, XFeat=1.8 FPS vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS — XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device without neural-network-inference optimization; this is the strongest documented embedded-deployment signal among all C3 candidates evaluated. Project's Jetson Orin Nano Super has dedicated GPU (1024-core Ampere) — XFeat extrapolation to Jetson Orin Nano fp16 with TensorRT will be substantially faster than Orange Pi Zero 3 ARM CPU. Documentary headline performance vs LightGlue siblings (per README MegaDepth-1500 cross-cite vs SP+LightGlue): XFeat+LighterGlue Fast (640, 1300 kpts) AUC@5/10/20 = 0.444/0.610/0.746 vs SP+LightGlue 0.469/0.633/0.762 (-2.5/-2.3/-1.6 absolute); Accurate (1024, 4096 kpts) AUC@5/10/20 = 0.564/0.710/0.819 vs SP+LightGlue 0.591/0.738/0.841 (-2.7/-2.8/-2.2 absolute) — XFeat+LighterGlue is modestly below SP+LightGlue at competitive accuracy + ~3× LighterGlue speedup.
  • Target Audience: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — Apache-2.0 throughout = clean BSD/permissive track) + Plan-phase architect (modern-competitive-lead role for the C3 row's lightweight-CNN axis with strongest-documented-embedded-deployment signal among all C3 candidates evaluated)
  • Research Boundary Match: Full match for the project's pinned mode of XFeat sparse / XFeat* semi-dense / XFeat+LighterGlue paired matcher (single-image-pair sparse or semi-dense feature matching: take a UAV nadir frame + a retrieved satellite tile, run XFeat extractor on each independently, match via MNN sparse OR MLP-refinement-semi-dense OR LighterGlue-paired matcher, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). Asymmetric image-pair sizes are handled natively — same independent per-image extraction pattern as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue. Partial match for the project's domain (canonical training on MegaDepth phototourism outdoor + COCO_20k synthetic-warp pairs — neither dataset is aerial nadir; same aerial-domain caveat as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + C2 candidates; D-C2-1 retrain decision REUSED with strongest retrain-friendliness signal among all C3 candidates evaluated — paper §3.3 explicit "low memory usage of our method enables training on entry-level hardware, facilitating the fine-tuning or full training of our network for specific tasks and scene types" + 36 hours on single RTX 4090 + 6.5 GB VRAM total). Aerial applicability is NOT explicitly validated in canonical paper or README — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight; D-C2-1 retrain decision is materially less expensive than DISK+LightGlue (~2 weeks 32 GB V100) + ALIKED+LightGlue (~24 hours RTX 3090). CRITICAL POSITIVE finding: XFeat is the only C3 candidate with explicit documentation of CPU-real-time inference + embedded-device benchmarks (paper Appendix C Orange Pi Zero 3 numbers); README explicitly states "Simple architecture components which facilitates deployment on embedded devices (jetson, raspberry pi, custom AI chips, etc..)" — strongest embedded-deployment story among all C3 candidates evaluated. 
CRITICAL NEGATIVE finding: NO PRODUCTIZED ONNX/TensorRT export pathway in canonical repo (README Contributing section explicit ask) — D-C3-2 gate is HARSHER than DISK+LightGlue but TECHNICALLY SIMPLER than ALIKED+LightGlue because XFeat is CNN-only with no deformable convolutions or unusual ops; project would need to invest custom-ONNX-export engineering effort but the architecture is straightforward (Conv + ReLU + BatchNorm only, no torchvision.ops.deform_conv2d blocker, no graph-neural-network attention export complexity).
  • Summary: XFeat is the canonical lightweight-CNN learned feature extractor + matcher introduced by Potje, Cadar, Araujo, Martins, and Nascimento (UFMG VerLab + Université de Bourgogne + Google Research + Université de Lorraine + Microsoft, CVPR 2024), with three core contributions: (i) lightweight CNN architecture with featherweight backbone using triple-rate channel increase (vs VGG's double-rate) channel sequence {4, 8, 24, 64, 64, 128} + 6 spatial-halving blocks + 2 fusion blocks + 23 total convolutional layers — designed for resource-constrained deployment without hardware-specific optimization; (ii) decoupled minimalist learnable keypoint detection branch using 1×1 convolutions on 8×8 tensor-block-transformed image with knowledge distillation from ALIKE-Tiny teacher; (iii) lightweight MLP-based match refinement module for pixel-level offsets from coarse semi-dense matches without high-resolution feature maps. Two main inference modes: XFeat sparse (top-K up to 4096 keypoints + 64-D float descriptors + MNN matching) and XFeat* semi-dense (up to 10k features + 2-scale processing + MNN + MLP offset refinement). NEW companion mode XFeat+LighterGlue (VerLab-trained smaller LightGlue variant ~3× faster than original LightGlue per README claim, distributed in-tree via xfeat+lg_torch_hub.ipynb). License: Apache 2.0 (canonical repo license.spdx_id: "Apache-2.0" per GitHub API metadata) — clean BSD/permissive track throughout, no copyleft + no Magic Leap restrictive disqualifier. 
Documentary headline performance (per paper Table 1 MegaDepth-1500 i5-1135G7 CPU VGA): XFeat sparse AUC@5/10/20 = 42.6/56.4/67.7 at 27.1 FPS = 9× faster than SuperPoint at HIGHER AUC + 5× faster than ALIKE-Tiny; XFeat* semi-dense AUC@5/10/20 = 50.2/65.4/77.1 at 19.2 FPS = comparable to DISK* at 16× speedup; paper Table 2 ScanNet-1500 indoor XFeat outperforms all baselines including SuperPoint+DISK+ALIKE despite all methods being MegaDepth-trained (paper Appendix E hybrid-training reduces landmark-dataset overfitting bias). EMBEDDED DEPLOYMENT SIGNAL (per paper Appendix C): on Orange Pi Zero 3 ($28 ARM Cortex-A53 device) at 480×360 input — XFeat=1.8 FPS vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS — XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device without neural-network-inference optimization. Training: 36 hours on single RTX 4090 + 6.5 GB VRAM total + MegaDepth + COCO_20k synthetic warp pairs at 6:4 ratio + 800×600 training resolution + batch size 10 + 160k iterations + Adam optimizer LR 3e-4 — strongest retrain-friendliness signal among all C3 candidates evaluated (vs DISK ~2 weeks 32 GB V100, ALIKED ~24 hours RTX 3090, SuperGlue training-code-not-released). 
Limitations: (a) NO PRODUCTIZED ONNX/TensorRT export pathway in canonical repo (README Contributing section makes an explicit community-contribution ask); (b) AUC@5° on MegaDepth-1500 sparse mode (42.6) is materially below DISK (53.8) + DISK* (55.2) + ALIKE-Tiny (49.4), so XFeat sparse is positioned as "competitive accuracy at much higher speed" rather than "best-accuracy"; (c) XFeat+LighterGlue MegaDepth-1500 AUC is modestly below SP+LightGlue, at -2.5 (Fast config) and -2.7 (Accurate config) absolute AUC@5°; (d) aerial-domain training caveat shared with all C3 candidates evaluated; (e) 64-D descriptors (vs SP/DISK 256-D/128-D) provide a cache-footprint advantage but may have weaker descriptor discrimination at extreme cross-domain matching (the paper's §4.3 visual-localization validation on Aachen Day-Night was not extracted in this session: the section is referenced, but its headline numbers are not in the extracted snippet).
  • Related Sub-question: SQ3+SQ4 / C3 — XFeat per-mode API capability verification (Mandatory context7 lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + GitHub API license metadata + canonical paper [Source #81]; APACHE-2.0-CLEAN-LICENSE-THROUGHOUT + STRONGEST EMBEDDED-DEPLOYMENT SIGNAL AMONG ALL C3 CANDIDATES EVALUATED (Orange Pi Zero 3 1.8 FPS; designed explicitly for "jetson, raspberry pi, custom AI chips, etc.") + STRONGEST RETRAIN-FRIENDLINESS SIGNAL AMONG ALL C3 CANDIDATES EVALUATED (36 hours on single RTX 4090, 6.5 GB VRAM total) + NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY in canonical repo (README Contributing section explicit community-contribution ask — D-C3-2 gate HARSHER than DISK but TECHNICALLY SIMPLER than ALIKED) + MODERN COMPETITIVE LEAD ROLE for the C3 row's lightweight-CNN axis with two distinct inference modes (XFeat sparse / XFeat* semi-dense) + companion XFeat+LighterGlue paired matcher mode + canonical evaluation harnesses for MegaDepth-1500 + ScanNet-1500; aerial-domain-training caveat documented; D-C3-6 NEW Plan-phase decision for XFeat-mode-choice required)
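The Mutual Nearest Neighbor (MNN) matching used by the XFeat sparse mode can be illustrated with a small plain-Python sketch: a pair is kept only when each descriptor is the other's best cosine-similarity match. The function name and the `min_cossim` cut-off are hypothetical and are not taken from the verlab/accelerated_features API. Descriptors are assumed to be non-zero vectors.

```python
def mutual_nearest_neighbors(desc_a, desc_b, min_cossim=-1.0):
    """Return index pairs (i, j) where desc_a[i] and desc_b[j] pick each other.

    desc_a, desc_b: lists of equal-length, non-zero descriptor vectors.
    min_cossim: hypothetical similarity floor; -1.0 keeps every mutual pair.
    """
    def normalize(vec):
        norm = sum(x * x for x in vec) ** 0.5
        return [x / norm for x in vec]

    a = [normalize(d) for d in desc_a]
    b = [normalize(d) for d in desc_b]
    # Cosine similarity between every descriptor of A and every descriptor of B.
    sim = [[sum(x * y for x, y in zip(da, db)) for db in b] for da in a]
    # Best match in B for each A, and best match in A for each B.
    best_b_for_a = [max(range(len(b)), key=row.__getitem__) for row in sim]
    best_a_for_b = [max(range(len(a)), key=lambda i: sim[i][j])
                    for j in range(len(b))]
    # Keep only pairs that agree in both directions.
    return [(i, j) for i, j in enumerate(best_b_for_a)
            if best_a_for_b[j] == i and sim[i][j] >= min_cossim]


if __name__ == "__main__":
    # Toy 2-D descriptors; the real sparse mode uses 64-D float descriptors.
    frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    tile = [[0.0, 2.0], [3.0, 0.0]]
    print(mutual_nearest_neighbors(frame, tile))
```

The mutual check is what lets the sparse mode skip a learned matcher entirely. The cost is that ambiguous keypoints, such as the third `frame` descriptor in the toy data, are dropped rather than resolved.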

Source #81

  • Title: XFeat canonical paper — "XFeat: Accelerated Features for Lightweight Image Matching" (Potje, Cadar, Araujo, Martins, Nascimento — CVPR 2024, arXiv:2404.19174)
  • Link: arXiv abstract https://arxiv.org/abs/2404.19174 (April 2024); arXiv full HTML https://arxiv.org/html/2404.19174v1 ; CVPR 2024 proceedings https://openaccess.thecvf.com/content/CVPR2024/html/Potje_XFeat_Accelerated_Features_for_Lightweight_Image_Matching_CVPR_2024_paper.html ; project page https://www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24/ (videos, slides, supplementary material); accessed 2026-05-08
  • Tier: L1 (peer-reviewed CVPR 2024 + canonical implementation cross-referenced; cited by 2024-2026 feature-matching papers as the modern-lightweight-CNN reference for resource-constrained deployment; UFMG VerLab + multiple cross-affiliations including Google Research + Microsoft underscoring the paper's industry credibility)
  • Publication Date: arXiv preprint 2024-04-30 (v1); CVPR 2024 publication June 2024
  • Timeliness Status: Within Modern-competitive-lead window (2024 — modern competitive lightweight-CNN reference; strongest embedded-deployment signal among modern competitive C3 candidates at the publication time and through the project's evaluation window 2026)
  • Version Info: arXiv v1 (April 2024, CVPR 2024 camera-ready). Paper §3 architecture: featherweight CNN backbone with channel sequence {4, 8, 24, 64, 64, 128} (paper §3.1 triple-rate channel increase vs VGG's double-rate); 23 convolutional layers organized as 6 spatial-halving blocks + 2 fusion blocks; decoupled keypoint detection branch (paper §3.2 Keypoint Head) using 1×1 convolutions on 8×8 tensor-block-transformed image with knowledge distillation from ALIKE-Tiny teacher; descriptor head (paper §3.2 Descriptor head) = feature pyramid merging at 1/8, 1/16, 1/32 scales bilinearly upsampled to 1/8 + element-wise summation + fusion block; reliability map regression branch; match refinement module (paper §3.2 Dense matching) = lightweight MLP predicting 8×8 pixel-level offset distribution conditioned on coarsely matched feature pair. Paper §3.3 training: dual-softmax loss for descriptor learning + L1 reliability loss + NLL fine-matching loss for offset prediction + NLL keypoint loss with knowledge distillation from ALIKE-Tiny; trained on MegaDepth + synthetic warped COCO at 6:4 ratio + 800×600 input + batch size 10 + 160k iterations + Adam LR 3e-4 + exponential decay 0.5 every 30k updates + 36 hours on single RTX 4090 + 6.5 GB VRAM total. Paper §4 experiments: MegaDepth-1500 (Table 1, outdoor pose estimation) + ScanNet-1500 (Table 2, indoor pose estimation) + HPatches (Table 3, homography estimation) + Aachen Day-Night (Section 4.3, visual localization via HLoc) + Appendix F (Table 6, learned-matcher comparison vs LoFTR, LightGlue, Patch2Pix). Paper Appendix C detailed timing analysis: i7-6700K CPU + Orange Pi Zero 3 ARM Cortex-A53 embedded device (480×360 input, XFeat=1.8 FPS vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS) — XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device without neural-network-inference optimization at the publication time.
  • Target Audience: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — clean Apache-2.0) + Plan-phase architect (modern-competitive-lead role + strongest embedded-deployment signal in the C3 row + strongest retrain-friendliness signal in the C3 row)
  • Research Boundary Match: Full match for the algorithm (XFeat lightweight-CNN feature extractor with featherweight backbone + decoupled keypoint head + lightweight MLP-based match refinement module); partial match for the project's domain (paper benchmarks: MegaDepth-1500 outdoor phototourism, ScanNet-1500 indoor RGB-D, HPatches homography, Aachen Day-Night day/night visual localization — NO aerial nadir benchmark in the canonical paper). Critical paper §4 + Appendix F documentary cross-cite: XFeat* semi-dense at 1885 inliers and PPS=1.33 vs LightGlue 475 inliers PPS=0.31 (paper Appendix F Table 6) — XFeat* delivers 4× more inliers per pair than LightGlue at 4× higher throughput, demonstrating fundamental architectural advantage in the semi-dense matching paradigm vs sparse-only learned matchers. Paper §5 / Appendix F: explicit positioning as complementary to learned matchers — "Our techniques are, in fact, complementary to learned matchers; for example, LightGlue can be trained using both XFeat and XFeat* features" — anticipates the XFeat+LighterGlue companion mode released post-paper-acceptance per Source #80 README.
  • Summary: The canonical paper introduces XFeat with three core contributions: (i) a novel lightweight CNN architecture with featherweight backbone using triple-rate channel increase strategy (vs VGG's double-rate) — channel sequence {4, 8, 24, 64, 64, 128} ensures minimal computational depth in early high-spatial-resolution layers while maintaining representational capacity through later deeper convolutions; (ii) a minimalist learnable keypoint detection branch that decouples detection from description using 1×1 convolutions on 8×8 tensor-block-transformed image with knowledge distillation from ALIKE-Tiny teacher (smaller backbone tends to concentrate on lower-level image features like corners/lines/blobs aligning with the 8×8 receptive field); (iii) a novel lightweight MLP-based match refinement module for pixel-level offsets from coarse semi-dense matches without high-resolution feature maps (vs LoFTR/ASpanFormer which require costly high-resolution feature maps), enabling efficient semi-dense matching in resource-constrained settings. Documentary headline results: paper Table 1 MegaDepth-1500 (5° / 10° / 20° AUC, FPS on i5-1135G7 CPU VGA) — XFeat sparse 42.6/56.4/67.7 at 27.1 FPS = 9× faster than SuperPoint (37.3/50.1/61.5 at 3.0 FPS) at higher AUC + 5× faster than ALIKE-Tiny (49.4/61.8/71.4 at 5.3 FPS) at slightly lower AUC; XFeat* semi-dense 50.2/65.4/77.1 at 19.2 FPS = comparable to DISK* at 16× speedup; paper Table 2 ScanNet-1500 indoor — XFeat 16.7/32.6/47.8 + XFeat* 18.4/34.7/50.3 outperforms ALL baselines including SuperPoint=12.5/24.4/36.7 + DISK + ALIKE despite all methods being MegaDepth-trained (paper Appendix E attributes this to hybrid MegaDepth+synthetic-warp-COCO training reducing landmark-dataset overfitting bias); paper Table 3 HPatches homography MHA@3 illumination/viewpoint = 95.0/68.6 (XFeat) — best illumination@3 in paper Table 3 across all evaluated methods. 
Paper Appendix C embedded-device timing analysis on Orange Pi Zero 3 ARM Cortex-A53 ($28 device) at 480×360 input: XFeat=1.8 FPS vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS; XFeat is the ONLY learned method capable of running over 1 FPS on a highly-constrained embedded device without neural-network-inference optimization. Paper Appendix F learned-matcher comparison (Table 6): XFeat* (coarse-fine) AUC@5/10/20 = 50.2/65.4/77.1 at 1.33 PPS vs LoFTR (learned matcher) 68.3/80.0/88.0 at 0.06 PPS + LightGlue (learned matcher) 61.4/75.0/84.8 at 0.31 PPS + Patch2Pix (coarse-fine) 47.8/61.0/71.0 at 0.05 PPS; XFeat* delivers ~4× more inliers per pair than LightGlue (1885 vs 475) at ~4× higher throughput (1.33 vs 0.31 PPS). License: Apache-2.0 via Source #80 (canonical implementation). Limitations: (a) AUC@5° on MegaDepth-1500 sparse mode is materially below DISK at the strictest tier (42.6 vs 53.8 = -11.2 absolute), so XFeat sparse is positioned as "competitive at much higher speed" rather than "best-accuracy"; (b) limited robustness to aggressive viewpoint changes and highly ambiguous image pairs, as explicitly acknowledged in the final paragraph of paper Appendix F; (c) aerial-domain training caveat shared with all C3 candidates evaluated; (d) NO PRODUCTIZED ONNX/TensorRT export pathway in canonical repo per the explicit community-contribution ask in the Source #80 README Contributing section.
  • Related Sub-question: SQ3+SQ4 / C3 — XFeat per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + training paradigm + embedded-deployment evidence; documents the modern-competitive-lead role with strongest documented embedded-deployment signal among all C3 candidates evaluated at canonical paper publication time; aerial-domain caveat documented; modern-competitive-lead role confirmed; D-C3-6 NEW Plan-phase decision for XFeat-mode-choice required)
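The speedup multiples and AUC deltas quoted from paper Table 1 in Sources #80 and #81 can be re-derived with a short arithmetic check. The numbers in the sketch below are copied from the entries above, not re-extracted from the paper. The dictionary layout and helper names are illustrative.

```python
# MegaDepth-1500 figures as quoted in this registry (i5-1135G7 CPU, VGA):
# AUC@5/10/20 and frames per second.
TABLE1 = {
    "SuperPoint": {"auc": (37.3, 50.1, 61.5), "fps": 3.0},
    "DISK": {"auc": (53.8, 65.9, 75.0), "fps": 1.2},
    "DISK*": {"auc": (55.2, 66.8, 75.3), "fps": 1.2},
    "ALIKE-Tiny": {"auc": (49.4, 61.8, 71.4), "fps": 5.3},
    "XFeat": {"auc": (42.6, 56.4, 67.7), "fps": 27.1},
    "XFeat*": {"auc": (50.2, 65.4, 77.1), "fps": 19.2},
}


def speedup(method, baseline):
    """Throughput ratio of `method` over `baseline` (FPS / FPS)."""
    return TABLE1[method]["fps"] / TABLE1[baseline]["fps"]


def auc_delta(method, baseline):
    """Absolute AUC@5/10/20 difference, rounded to one decimal place."""
    pairs = zip(TABLE1[method]["auc"], TABLE1[baseline]["auc"])
    return tuple(round(m - b, 1) for m, b in pairs)


if __name__ == "__main__":
    for method, baseline in [("XFeat", "SuperPoint"), ("XFeat", "ALIKE-Tiny"),
                             ("XFeat*", "DISK*"), ("XFeat", "DISK")]:
        print(method, "vs", baseline,
              round(speedup(method, baseline), 1), auc_delta(method, baseline))
```

Running the check on the quoted figures reproduces the "9×", "5×" and "16×" multiples and the -11.2 AUC@5 gap to DISK.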