# Source Registry — C3 — Cross-domain matcher candidates

> Mode A Phase 2 — engine Step 2 (Source Tiering & Exhaustive Web Investigation).
> Critical-novelty sensitivity per Step 0.5 in `../00_question_decomposition.md`. Time windows applied:
> - **Lead-candidate / SOTA claims**: prefer sources within last 6 months; up to 18 months if older is the official authority.
> - **Library/SDK API behaviour**: must reflect the currently shipped version at search time (`context7` mandatory per lead candidate).
> - **Established baselines** (KLT, RANSAC, EKF, ORB, SIFT, GTSAM): no time window.
>
> This file replaces a section of the previous monolithic `01_source_registry.md`. See `00_summary.md` for the full category index. Investigation order is tracked in `../00_question_decomposition.md` and the cross-category Investigation Status table in `00_summary.md`.

---

### Source #69

- **Title**: LightGlue — `context7` per-mode capability lookup (`/cvg/lightglue`, main) — High source reputation, benchmark score 85.4, 64 code snippets indexed
- **Link**: context7 query against `/cvg/lightglue`, accessed 2026-05-08; canonical doc references returned: `https://context7.com/cvg/lightglue/llms.txt`, snippets `Initialize LightGlue Feature Matcher`, `LightGlue - Feature Matcher Initialization`, `Initialize SuperPoint Feature Extractor`, `Initialize and Use DISK Feature Extractor`, `Initialize and Use SIFT Feature Extractor`, `Perform Feature Matching with LightGlue`, `Complete Matching Pipeline Example`, `Initialize and Use SuperPoint + LightGlue Matcher`, `Extract Matched Keypoint Coordinates`
- **Tier**: L1 (project-official codebase by canonical LightGlue authors Philipp Lindenberger + Paul-Edouard Sarlin + Marc Pollefeys, ETH Zurich + Microsoft Mixed Reality & AI Lab; `context7` indexed at `/cvg/lightglue` with High reputation, benchmark 85.4, 64 code snippets — confirms this is a widely-adopted reference implementation)
- **Publication Date**: live docs (main HEAD, accessed
2026-05-08) - **Timeliness Status**: ✅ Within Critical-novelty window (active main + community evidence through 2025–2026 — see Source #73 LightGlue-ONNX changelog with January 2026 entries; HuggingFace Transformers integration mentioned in canonical README confirms continued active distribution) - **Version Info**: main HEAD at access time. **Mode-enumeration query (1/3) PASS** — all five extractor modes documented as first-class `features` enum values: `superpoint` (256-D descriptors, MagicLeap pretrained), `disk` (128-D, Apache-2.0 weights), `aliked` (128-D, BSD-3-Clause), `sift` (128-D, includes scale + orientation, classical), `doghardnet` (128-D, includes scale + orientation). Construction signature: `LightGlue(features: str, n_layers: int = 9, depth_confidence: float = 0.8, width_confidence: float = 0.9, filter_threshold: float = 0.1, flash: bool = False, mp: bool = False)`. **NOTE — version skew between context7 docstring and canonical README defaults**: context7 docstring says `depth_confidence=0.8` and `width_confidence=0.9`; canonical README §"Advanced configuration" says `depth_confidence=0.95, disable with -1` and `width_confidence=0.99, disable with -1` and `flash=True (LightGlue automatically detects if FlashAttention is available)`. The canonical README values are authoritative for the live source. PyTorch ≥2.0 enables `matcher.compile(mode='reduce-overhead')` for additional speedup (with caveat: for inputs <1536 keypoints compiles but disables point pruning; for larger inputs falls back to eager mode with point pruning). 
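
Because the context7 docstring and the canonical README disagree on defaults, one defensive option is to pin every constructor argument explicitly instead of relying on whatever the installed version ships. The sketch below is a minimal illustration under that assumption: `README_DEFAULTS` copies the README values recorded in this entry, and `pinned_lightglue_kwargs` is a hypothetical helper, not part of the `lightglue` package.

```python
# README values as recorded in this registry entry (treated as authoritative
# over the context7 docstring). Names mirror the LightGlue constructor arguments.
README_DEFAULTS = {
    "n_layers": 9,
    "flash": True,
    "mp": False,
    "depth_confidence": 0.95,  # -1 disables early stopping
    "width_confidence": 0.99,  # -1 disables point pruning
    "filter_threshold": 0.1,
}


def pinned_lightglue_kwargs(**overrides):
    """Hypothetical helper: build a fully explicit kwargs dict for LightGlue(...)."""
    unknown = set(overrides) - set(README_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown LightGlue option(s): {sorted(unknown)}")
    return {**README_DEFAULTS, **overrides}


# Example: disable both adaptive mechanisms for a deterministic latency baseline.
kwargs = pinned_lightglue_kwargs(depth_confidence=-1, width_confidence=-1)
print(kwargs)

# Intended use (requires the lightglue package, deliberately not imported here):
#   matcher = LightGlue(features="disk", **kwargs)
```

Passing the full dictionary keeps a run reproducible even if upstream defaults change between the docstring and README versions.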
- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer
- **Research Boundary Match**: **Full match** for the project's pinned C3 mode (SuperPoint feature extractor + LightGlue matcher in single-image-pair inference mode on GPU, with `feats_q = extractor.extract(image_q)`, `feats_t = extractor.extract(image_t)`, `matches_qt = matcher({'image0': feats_q, 'image1': feats_t})` followed by `rbd()` to remove batch dimension). The canonical pipeline handles asymmetric image-pair sizes (UAV nadir 5472×3648 vs satellite tile of any size at 0.5 m/px) because features are extracted from each image independently before matching. **Open**: `context7` Disqualifier-Probe query did not surface ONNX/TensorRT export paths inside the cvg/lightglue repo itself — those are documented in the companion `fabio-sim/LightGlue-ONNX` project (Source #73), which the canonical README explicitly links to. The query also did not surface Jetson-specific latency/memory measurements (as with all C2 candidates — Jetson MVE phase will resolve).
- **Summary**: Confirms LightGlue's per-mode API surface and runnable example for single-pair feature matching: each `(features, matcher)` extractor-matcher tuple is a separately-cataloged sibling mode per the Per-Mode API rule. Confirmed five sibling modes via `features=` enum: SuperPoint+LightGlue, DISK+LightGlue, ALIKED+LightGlue, SIFT+LightGlue, DoGHardNet+LightGlue. Canonical inference signature is `matcher({'image0': feats0, 'image1': feats1})` returning `{matches0, matches1, matching_scores0, matching_scores1, matches: List[[K,2]], scores: List[[K]], stop: int}`; `rbd(x)` helper removes the batch dimension to extract single-pair tensors. The `points0 = feats0['keypoints'][matches[..., 0]]` and `points1 = feats1['keypoints'][matches[..., 1]]` extraction pattern produces 2D-2D correspondences directly consumable by the project's downstream C4 PnP+RANSAC pose estimator.
Performance configuration knobs documented: `depth_confidence`, `width_confidence`, `filter_threshold`, `flash`, `mp`, `n_layers`, `compile()`. **Open**: cross-domain (UAV nadir × ortho satellite) recall numbers absent from context7-indexed snippets (concentration on phototourism/visual-localization benchmarks); aerial-domain validation requires Jetson MVE on AerialExtreMatch + Derkachi flight per D-C3 deferred phase.
- **Related Sub-question**: SQ3+SQ4 / C3 — LightGlue per-mode API capability verification (Mandatory `context7` lookup per Per-Mode API Capability Verification rule)

### Source #70

- **Title**: LightGlue canonical implementation — `cvg/LightGlue` (Lindenberger, Sarlin, Pollefeys — ICCV 2023) — official PyTorch reference implementation, README + LICENSE, demo notebook (`demo.ipynb`), benchmark script (`benchmark.py`), training/eval framework reference (companion `cvg/glue-factory`); pretrained weights for SuperPoint + DISK + ALIKED + SIFT + DoGHardNet local features; HuggingFace Transformers integration (`pip install transformers`, model card `ETH-CVG/lightglue_superpoint`); kornia integration (`kornia.feature.LightGlue` and `kornia.feature.LightGlueMatcher`); hloc integration for Structure-from-Motion + visual localization; LightGlue-ONNX export project (Source #73)
- **Link**: README https://raw.githubusercontent.com/cvg/LightGlue/main/README.md (accessed 2026-05-08); LICENSE https://raw.githubusercontent.com/cvg/LightGlue/main/LICENSE (accessed 2026-05-08); repo https://github.com/cvg/LightGlue ; HuggingFace model card https://huggingface.co/ETH-CVG/lightglue_superpoint ; companion training framework https://github.com/cvg/glue-factory ; companion ONNX export https://github.com/fabio-sim/LightGlue-ONNX (Source #73); kornia API https://kornia.readthedocs.io/en/latest/feature.html#kornia.feature.LightGlue
- **Tier**: L1 (project-official codebase by the canonical LightGlue authors Philipp Lindenberger + Paul-Edouard Sarlin + Marc Pollefeys, ETH Zurich + Microsoft Mixed Reality & AI Lab; same author group as `cvg/Hierarchical-Localization` (hloc), `cvg/glue-factory`, `cvg/pixel-perfect-sfm`)
- **Publication Date**: README live; main HEAD active through 2024–2026 (HuggingFace Transformers integration is recent — `@sbucaille` credited; LightGlue-ONNX companion has January 2026 entries per Source #73); canonical paper ICCV 2023
- **Timeliness Status**: ⚠️ Borderline — paper Jun 2023 / ICCV 2023 is at the edge of the Critical-novelty 18-month window for SQ3+SQ4 component selection; **HOWEVER**, LightGlue is treated as the SOTA sparse matcher reference baseline in every modern (2024–2026) feature-matching paper, the algorithmic content is stable, the canonical implementation is actively maintained (HuggingFace Transformers integration adds plug-and-play API), the LightGlue-ONNX project (Source #73) is actively maintained through January 2026 with FP8 quantization workflow added, and Berton+Trivigno's `gmberton/auto_VPR` companion harness for the C2 row also explicitly evaluates SP+LightGlue as the C3 reference matcher.
Per the engine's Established-baseline exemption applicable to widely-adopted reference algorithms, LightGlue's canonical role is the sparse-matcher SOTA reference point for the C3 row; freshness concerns are on (a) emerging successors that improve on LightGlue (XFeat 2024, XFeat* 2024, SuperGlue/LightGlue successor candidates), (b) aerial-domain transfer of canonical phototourism-trained weights (same caveat as C2 candidates' aerial-domain training caveat — D-C3-1 raised by this closure) - **Version Info**: main HEAD; PyTorch (≥2.0 recommended for FlashAttention auto-detection + `compile()` support); installation via `git clone && pip install -e .`; canonical inference five-line pipeline shown in README (loads `LightGlue + SuperPoint/DISK/SIFT/ALIKED/DoGHardNet` from the `lightglue` package); `lightglue.utils.load_image` returns `torch.Tensor[3, H, W]` normalized to [0,1]; `lightglue.utils.rbd` removes batch dimension; `lightglue.match_pair(extractor, matcher, image0, image1)` is a one-call convenience method; `lightglue.viz2d.{plot_images, plot_keypoints, plot_matches, save_plot}` for visualization. **Default LightGlue construction parameters per canonical README** (authoritative over context7 docstring): `n_layers=9` (all layers), `flash=True` (auto-detected when available), `mp=False`, `depth_confidence=0.95` (disable with -1), `width_confidence=0.99` (disable with -1), `filter_threshold=0.1`. **Reported benchmark numbers (canonical README + paper)**: 150 FPS @ 1024 keypoints on RTX 3080 (= ~6.7 ms per pair, with compilation + adaptivity); 50 FPS @ 4096 keypoints on RTX 3080 (= 20 ms per pair); 4–10× speedup over SuperGlue depending on input difficulty; 20 FPS @ 512 keypoints on Intel i7 10700K CPU (= ~50 ms per pair, CPU baseline). **License**: **Apache-2.0** for cvg/LightGlue code AND for cvg/LightGlue's pre-trained weights AND for the bundled DISK weights (DISK is published under Apache-2.0). 
**CRITICAL CAVEAT** for the SuperPoint-extractor-mode: the SuperPoint pretrained weights AND its inference file `lightglue/superpoint.py` follow `magicleap/SuperPointPretrainedNetwork`'s **separate, restrictive license** — see Source #72 for the full license text. ALIKED is published under **BSD-3-Clause**; SIFT is patent-free since 2020 in OpenCV, classical algorithm with no weight-licensing concern. 4.7k+ stars at canonical repo - **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 NEW) - **Research Boundary Match**: **Full match** for the project's pinned C3 mode (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run feature extraction on each independently, match via LightGlue, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). The repo ships everything needed: extractor classes (`SuperPoint, DISK, ALIKED, SIFT, DoGHardNet`), matcher class (`LightGlue`), I/O utilities (`load_image, rbd, match_pair`), visualization (`viz2d`), benchmark tooling (`benchmark.py`). **Asymmetric image-pair sizes are handled natively**: each image is independently fed through `extractor.extract(...)` which auto-resizes to the extractor's preferred resolution (default 1024 on longest edge for SuperPoint), then matched — the matcher operates on (keypoint coords, descriptor vectors) tuples that are size-independent. **Partial match** for the project's domain (canonical training on synthetic homographies of Oxford-Paris 1M distractors + fine-tuning on MegaDepth phototourism — neither dataset is aerial nadir; **NO aerial nadir benchmark** in the canonical paper, **same aerial-domain caveat as the C2 candidates**; aerial applicability is referenced transitively via Zhang et al. 
2022 ISPRS [paper ref [83]], but explicit aerial-nadir validation is project-side via Jetson MVE on AerialExtreMatch + Derkachi flight) - **Summary**: LightGlue is the canonical reference implementation of the ICCV 2023 paper "LightGlue: Local Feature Matching at Light Speed" by Lindenberger, Sarlin, Pollefeys. **CRITICAL LICENSE FINDING**: LICENSE file is **Apache-2.0** (Copyright 2023 ETH Zurich) — permissive; this places LightGlue ITSELF on the **BSD/permissive license track** alongside MixVPR (MIT) + SelaVPR (MIT) + NetVLAD-canonical (MIT) + EigenPlaces (MIT) + Kimera-VIO (BSD-2) + OKVIS2 (BSD-3) + DPVO (MIT) + pure-VO baseline (OpenCV-Apache-2.0). DISK (the second extractor mode) is also Apache-2.0 per the canonical README. ALIKED is BSD-3-Clause. SIFT is classical patent-free in OpenCV. **HOWEVER, the SuperPoint pretrained weights AND the `lightglue/superpoint.py` inference file follow `magicleap/SuperPointPretrainedNetwork`'s license — see Source #72** — which is "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY". This is a **HARD DISQUALIFIER** for the canonical SP+LightGlue pinned mode in the project's commercial/dual-use deployment context (eastern/southern Ukraine fixed-wing UAV is explicitly dual-use military per the project disqualifier "anything whose license blocks military / dual-use deployment"). **License-track summary**: cvg/LightGlue itself = Apache-2.0 (BSD/permissive track, no commercial restriction); SP weights = Magic Leap restrictive (NOT BSD/permissive, blocks commercial/dual-use); DISK weights = Apache-2.0; ALIKED weights = BSD-3-Clause; SIFT = classical no-license-concern. 
**Plan-phase decision raised** (will be tagged D-C3-1): swap canonical SP+LightGlue for one of: (a) DISK+LightGlue (Apache-2.0 throughout, paper Tables 6+7 demonstrate DISK+LightGlue often outperforms SP+LightGlue on Image Matching Challenge benchmarks); (b) ALIKED+LightGlue (BSD-3-Clause + Apache-2.0); (c) re-train a SuperPoint-class extractor under permissive license (e.g., kornia's reproduction OR retrain on aerial nadir corpus); (d) accept Magic Leap noncommercial-research license for the project's research/development phase only with explicit Plan-phase commitment to swap before production deployment. Multiple downstream integration points documented: (i) **HuggingFace Transformers** — `pip install transformers` plug-and-play with `ETH-CVG/lightglue_superpoint` model card (separate license terms inherited from HuggingFace + Magic Leap stack); (ii) **kornia** — `kornia.feature.LightGlue` and `kornia.feature.LightGlueMatcher` interfaces; (iii) **hloc** — Structure-from-Motion + visual localization toolbox; (iv) **LightGlue-ONNX** — ONNX/TensorRT/OpenVINO/FP16/FP8 export project (Source #73); (v) **Image Matching WebUI** — comparison harness. **Performance**: **150 FPS @ 1024 keypoints on RTX 3080** with compilation + adaptivity (= ~6.7 ms per pair) and **50 FPS @ 4096 keypoints on RTX 3080** (= 20 ms per pair). **Adaptive depth and width** (paper §3.3) reduce inference time by ~33% at <1% loss of accuracy on common workloads (paper Table 11 ablation). At the project's expected per-frame load of 2 image pairs (UAV-nadir → top-1 satellite tile after C2 retrieval, possibly 2-5 pairs if K=5–10 top-K reranking), Jetson Orin Nano Super extrapolation factor 4-6× of RTX 3080 baseline → **~30–60 ms per pair @ 1024 keypoints** at fp16 + TensorRT (note: 4-6× of the 6.7 ms baseline is ~27–40 ms, so the upper end of this range implies a slowdown closer to 9×); **~80–120 ms per pair @ 2048 keypoints** (this entry records no 2048-keypoint RTX 3080 baseline, so this figure is an interpolation). Both figures are unmeasured planning estimates pending the Jetson MVE. Additional speedups available via LightGlue-ONNX (Source #73) up to FP8 quantization (factor ~2× over fp16 on FP8-capable GPUs; Jetson Orin is Ampere, see Source #73).
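
The latency figures above follow from simple arithmetic, sketched here so the planning assumptions are explicit. The 4–6× slowdown factor is this entry's planning assumption rather than a measurement, and running the pairs of one frame sequentially is a further assumption made only for this sketch. Applying the stated factor to the 6.7 ms baseline gives roughly 27–40 ms per pair, which sits below the upper end of the 30–60 ms range quoted above.

```python
def fps_to_ms(fps: float) -> float:
    """Convert throughput (pairs per second) to latency (ms per pair)."""
    return 1000.0 / fps


def scale_latency(base_ms: float, slowdown: float) -> float:
    """Apply a multiplicative slowdown factor to a baseline latency."""
    return base_ms * slowdown


# RTX 3080 baselines recorded in this entry.
rtx3080_ms_1024 = fps_to_ms(150)  # 1024 keypoints
rtx3080_ms_4096 = fps_to_ms(50)   # 4096 keypoints

# Assumed Jetson slowdown range (planning assumption, not measured).
jetson_low_ms = scale_latency(rtx3080_ms_1024, 4.0)
jetson_high_ms = scale_latency(rtx3080_ms_1024, 6.0)

# Slowdown that would be needed to reach the 60 ms upper bound quoted above.
implied_factor_for_60ms = 60.0 / rtx3080_ms_1024

print(f"RTX 3080 @1024 kpts: {rtx3080_ms_1024:.1f} ms/pair")
print(f"Jetson estimate @1024 kpts: {jetson_low_ms:.1f} to {jetson_high_ms:.1f} ms/pair")
print(f"Factor implied by 60 ms: {implied_factor_for_60ms:.1f}x")

# Per-frame matching budget if pairs are processed one after another.
for n_pairs in (1, 2, 5):
    lo_ms, hi_ms = n_pairs * jetson_low_ms, n_pairs * jetson_high_ms
    print(f"{n_pairs} pair(s): {lo_ms:.0f} to {hi_ms:.0f} ms per frame")
```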
**Architecture**: 9 transformer layers (self-attention + cross-attention per layer), 4 attention heads per unit, descriptor dimension d=256; rotary positional encoding (relative, applied at each self-attention); soft partial assignment matrix combining similarity + matchability scores; bidirectional cross-attention saves ~33% time; deep supervision (loss at every layer, stops early when confident). **Training**: pre-train on synthetic homographies of Oxford-Paris 1M distractors (170k images, 6M image pairs, 2 days on 2 RTX 3090) + fine-tune on MegaDepth phototourism (368/5/24 train/val/test scenes, 50 epochs, 2 days on 2 RTX 3090, 32 image pairs per batch with gradient checkpointing). **Modern lineage / successors**: ALIKED+LightGlue, DoGHardNet+LightGlue (added to canonical repo post-paper); XFeat (CVPR 2024 — separately-cataloged C3 candidate per Fact #26 NGPS template confirmed; documented to outperform LightGlue on speed at slightly lower accuracy); MASt3R (separately-cataloged but pruned by Fact #26 due to dense-matcher latency on Jetson)
- **Related Sub-question**: SQ3+SQ4 / C3 — LightGlue per-mode API capability verification (Mandatory `context7` lookup PASS — `/cvg/lightglue` indexed with High source reputation and benchmark score 85.4; cross-validated against canonical README + LICENSE WebFetch + canonical paper WebFetch [Source #71])

### Source #71

- **Title**: LightGlue canonical paper — "LightGlue: Local Feature Matching at Light Speed" (Lindenberger, Sarlin, Pollefeys — ICCV 2023, arXiv:2306.13643)
- **Link**: arXiv https://arxiv.org/abs/2306.13643 (Jun 2023); ICCV 2023 published version (citation booktitle = ICCV); accessed 2026-05-08
- **Tier**: L1 (peer-reviewed ICCV 2023 + canonical implementation cross-referenced; **most-cited modern sparse matcher paper of the post-SuperGlue era**, treated as the SOTA sparse-matcher reference baseline in every 2024–2026 feature-matching paper)
- **Publication Date**: arXiv preprint 2023-06-23; ICCV 2023
acceptance October 2023 - **Timeliness Status**: ⚠️ Borderline — paper Jun 2023 is at the edge of the Critical-novelty 18-month window for SQ3+SQ4; **HOWEVER**, the Established-baseline exemption applies (LightGlue is the canonical sparse-matcher reference baseline, like NetVLAD is for VPR), the algorithmic content is stable, the canonical implementation is actively maintained (HuggingFace Transformers integration recent), and 2024–2026 successor candidates (XFeat, XFeat*) explicitly position themselves as LightGlue alternatives in the same paper space. Freshness concerns are (a) successor candidates that improve on LightGlue (XFeat 2024 separately-cataloged), (b) aerial-domain weights (the project's D-C3-1 + same caveat as C2 candidates) - **Version Info**: arXiv v1 (Jun 2023, ICCV camera-ready); paper §3 architecture + §3.3 adaptive depth/width + §4 training recipe + §5 experiments + Appendix A IMC 2020/2021/2023 + Appendix B MegaDepth-1800 / Aachen v1.1 / InLoc + Appendix C implementation details + Appendix D timing breakdowns - **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer - **Research Boundary Match**: **Full match** for the algorithm (sparse feature matching with adaptive-depth + adaptive-width transformer pruning, soft partial assignment matrix combining similarity + matchability, bidirectional cross-attention, rotary positional encoding); **partial match** for the project's domain (paper benchmarks: HPatches homography estimation [§5.1, planar scenes with illumination + viewpoint changes], MegaDepth-1500 + MegaDepth-1800 relative pose estimation [§5.2 + Appendix B, outdoor phototourism], Aachen Day-Night + Aachen v1.1 outdoor visual localization [§5.3 + Appendix B, urban day/night], InLoc indoor visual localization [Appendix B], Image Matching Challenge 2020/2021/2023 [Appendix A, phototourism]; **NO aerial nadir benchmark** in the canonical paper). **Critical paper reference [83]**: paper §1 Related work cites "Zhang et al. 
ISPRS Journal of Photogrammetry and Remote Sensing 2022 — Feature matching for multi-epoch historical aerial images" as documentary evidence that "SuperGlue generalizes well to aerial matching" — by transitive lineage (LightGlue is the SuperGlue successor with documented 4-10× speedup), this provides weak documentary evidence that LightGlue is similarly applicable to aerial matching, but **NOT explicit aerial-nadir validation**. - **Summary**: The canonical paper introduces **LightGlue = a deep neural network for sparse feature matching that is faster, more accurate, and easier to train than SuperGlue**, with the central novelty being **adaptivity to image-pair difficulty** (paper §3.3): (a) **adaptive depth** — predict a confidence score per point per layer; halt the inference at any layer if a sufficient ratio α (default 95%) of points are confident; (b) **adaptive width** — discard at each layer the points that are confidently predicted as both confident and unmatchable (paper Eq. 13: `unmatchable(i) = c_i^l > λ_l & σ_i^l < β` with β=0.01); these two mechanisms reduce inference time by **~33% on average** (paper §5.4 + Table 5: 1.86× speedup on easy pairs, 1.16× on hard pairs, 1.45× average) at <1% accuracy loss. **Architecture (paper §3.1 + §3.5 + Appendix C.1)**: stack of L=9 transformer layers, each with one self-attention + one cross-attention unit; descriptor dimension d=256; 4 attention heads per unit; **rotary positional encoding** [Su et al. 
2023 RoFormer reference [67]] applied to query+key in self-attention only (NOT cross-attention) with a learned 2D Fourier basis; **bidirectional cross-attention** that computes the similarity matrix only once per layer (saves ~33% time vs full cross-attention per Appendix D); **soft partial assignment matrix** P combining pairwise similarity scores `S_{ij} = Linear(x_i^A)^T Linear(x_j^B)` and unary matchability scores `σ_i = Sigmoid(Linear(x_i))` via `P_{ij} = σ_i^A · σ_j^B · Softmax_k(S_{kj})_i · Softmax_k(S_{ik})_j` (Eq. 8); **filter threshold τ=0.1** (Eq. 8 + Appendix C.4) — pairs (i,j) yield correspondence when `P_{ij} > τ` AND `P_{ij}` is the row-max AND column-max. **Reported headline performance**: **HPatches homography estimation (Table 1, SuperPoint+LightGlue, 1024 keypoints)** R=94.3 / P=88.9 (best precision among sparse matchers, +1.5 over SuperGlue 87.4); AUC-DLT@5px=78.6 (vs SuperGlue 76.7, vs SGMNet 76.0; competitive with dense LoFTR 70.6). **MegaDepth-1500 relative pose estimation (Table 2, SuperPoint+LightGlue with LO-RANSAC)** AUC@5°/10°/20°=66.7/79.3/87.9 (vs SuperGlue 65.8/78.7/87.5; vs LoFTR 66.4/78.6/86.5 — competitive with dense matcher at fraction of inference time); inference time **44.2 ms** standard / **31.4 ms adaptive**. **Aachen Day-Night visual localization (Table 3, SuperPoint+LightGlue with hloc + NetVLAD top-50 retrieval)** Day (0.25m,2°)/(0.5m,5°)/(1.0m,10°) = **89.2/95.4/98.5**, Night = **87.8/93.9/100**, **17.2 pairs/sec** (26.1 optimized) — competitive with SuperGlue at 2.5–4× higher throughput. **CRITICAL OBSERVATION FOR THE PROJECT**: the Aachen Day-Night benchmark (Table 3) directly demonstrates the **NetVLAD top-K retrieval → SP+LightGlue matching → PnP+RANSAC pose estimation** pipeline, which is **exactly the project's intended pipeline shape** (C2 NetVLAD/MixVPR/SelaVPR/EigenPlaces top-K retrieval → C3 SP+LightGlue match → C4 PnP+RANSAC). 
The reported pose accuracies and throughput are documentary evidence that the chosen architectural pattern is canonical and well-validated in the visual-localization community. **Indirect aerial evidence (paper §1 Related work + ref [83])**: paper cites "Zhang et al. 2022 ISPRS" as evidence that SuperGlue generalizes well to aerial matching; because LightGlue is SuperGlue's successor, this is weak, indirect evidence that LightGlue may transfer similarly, not aerial-nadir validation of LightGlue itself. **Image Matching Challenge benchmarks (Appendix A, Tables 6+7)**: SP+LightGlue beats SP+SuperGlue both in IMC 2020 stereo (AUC@5°=59.03 vs 58.64) and IMC 2021 phototourism (50.2 / 62.6 vs SuperGlue 49.9 / 62.2); **DISK+LightGlue beats SP+LightGlue** by roughly +8 / +5 AUC points on stereo / multi-view (IMC 2020), with ~30% more matches at higher epipolar precision — **important Plan-phase signal that DISK+LightGlue is competitive with SP+LightGlue and may be preferable when the SuperPoint license is the binding constraint**. **Adaptive variant** at IMC 2023 (Appendix A): SP+LightGlue 38.4 / 46.1 public/private (vs SP+SuperGlue 36.1 / 43.8 — +2.3 points on both splits). **Ease of training**: paper §4 + Figure 5 — LightGlue reaches SuperGlue parity in **5M image pairs (~2 GPU-days)** vs SuperGlue's 7+ days; **fits 32 image pairs on 24 GB VRAM** with gradient checkpointing + mixed precision. **License (canonical implementation): Apache-2.0** (per Source #70 LICENSE) — permissive, BSD/permissive license track; SuperPoint pretrained weights are Magic Leap noncommercial-research only (Source #72 disqualifier).
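
The soft partial assignment and the filter rule summarized in this entry (threshold τ = 0.1 plus mutual row/column maximum) can be illustrated with a small pure-Python sketch. It is not the `lightglue` implementation: the similarity matrix `S`, the matchability scores, and the keypoint coordinates are toy values invented for the example, and `soft_assignment` / `mutual_matches` are hypothetical helper names.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_assignment(S, sigma_a, sigma_b):
    """P[i][j] = sigma_a[i] * sigma_b[j] * Softmax_k(S[k][j])[i] * Softmax_k(S[i][k])[j]."""
    n, m = len(S), len(S[0])
    over_b = [softmax(row) for row in S]  # normalise each row over image B
    over_a = [softmax([S[i][j] for i in range(n)]) for j in range(m)]  # each column over image A
    return [
        [sigma_a[i] * sigma_b[j] * over_a[j][i] * over_b[i][j] for j in range(m)]
        for i in range(n)
    ]


def mutual_matches(P, tau=0.1):
    """Keep (i, j) when P[i][j] > tau and it is both the row and the column maximum."""
    n, m = len(P), len(P[0])
    out = []
    for i in range(n):
        j = max(range(m), key=lambda c: P[i][c])  # row maximum
        if P[i][j] > tau and all(P[i][j] >= P[k][j] for k in range(n)):
            out.append((i, j, P[i][j]))
    return out


# Toy inputs: 2 keypoints in image A, 3 in image B (third one is unmatchable).
S = [[5.0, 0.0, 0.0],
     [0.0, 4.0, 0.0]]
sigma_a = [0.9, 0.8]
sigma_b = [0.9, 0.9, 0.05]

P = soft_assignment(S, sigma_a, sigma_b)
matches = mutual_matches(P)

# Gather 2D-2D correspondences, the input a PnP+RANSAC stage would consume.
kpts_a = [(10.0, 20.0), (30.0, 40.0)]
kpts_b = [(11.0, 21.0), (29.0, 41.0), (99.0, 99.0)]
pairs = [(kpts_a[i], kpts_b[j]) for i, j, _ in matches]
print(pairs)
```

The mutual-maximum test is what keeps the assignment partial: the low-matchability third keypoint in image B never wins a row, so it stays unmatched.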
- **Related Sub-question**: SQ3+SQ4 / C3 — LightGlue per-mode API capability verification (cross-source verification of the canonical implementation's mode/parameter/training-recipe/Recall@K + AUC + throughput claims; Aachen Day-Night benchmark Table 3 documentary evidence for the project's intended pipeline shape NetVLAD top-K → SP+LightGlue → PnP+RANSAC; aerial-domain caveat documented; D-C3-1 SuperPoint license disqualifier raised)

### Source #72

- **Title**: SuperPoint pretrained weights LICENSE — `magicleap/SuperPointPretrainedNetwork` LICENSE — "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement; binding on the SuperPoint weights AND on the `lightglue/superpoint.py` inference file used by `cvg/LightGlue` for the SP+LightGlue mode
- **Link**: https://raw.githubusercontent.com/magicleap/SuperPointPretrainedNetwork/master/LICENSE (accessed 2026-05-08); repo https://github.com/magicleap/SuperPointPretrainedNetwork
- **Tier**: L1 (canonical Magic Leap LICENSE file controlling the SuperPoint pretrained weights distribution)
- **Publication Date**: SuperPoint paper CVPR 2018 Workshop (DeTone, Malisiewicz, Rabinovich); LICENSE file timestamps within Magic Leap's repo HEAD
- **Timeliness Status**: ✅ Authoritative — license terms are owned by Magic Leap and do not have a freshness window concern; the binding is permanent and applies to every distribution of the SuperPoint pretrained weights including the copy in `cvg/LightGlue`
- **Version Info**: SuperPoint pretrained network checkpoint distributed by Magic Leap; bundled into `cvg/LightGlue` as `lightglue/superpoint.py` + the embedded weights (per cvg/LightGlue README §License)
- **Target Audience**: Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 NEW — license-track gate for SuperPoint-extractor-mode adoption in dual-use commercial deployment)
- **Research Boundary Match**: **Full match** for the license restriction analysis (the Magic Leap LICENSE is
the binding instrument controlling SuperPoint weight redistribution in `cvg/LightGlue`'s SP+LightGlue mode) - **Summary**: Magic Leap's SuperPoint LICENSE is **NOT a permissive open-source license**. It is a noncommercial-research-only Software License Agreement between Magic Leap (Licensor) and the user (Licensee, an academic institution OR non-profit organization OR self-individual). The key restrictions are: (a) **PERMITTED USES**: "for your own noncommercial internal research purposes" — the Software (= SuperPoint weights + inference code + any derivatives) may NOT be used for commercial purposes; (b) **DERIVATIVES**: "all and any such derivatives and modifications will be owned by Licensor and become a part of the Software licensed to You under this Agreement" — modifications are auto-owned by Magic Leap, restricting downstream redistribution; (c) **USES NOT PERMITTED**: "You may not distribute, copy or use the Software except as explicitly permitted herein. You may not sell, rent, lease, sublicense, lend, time-share or transfer, in whole or in part, or provide third parties access to prior or present versions (or any parts thereof) of the Software"; (d) **EXPORT REGULATION**: Licensee must comply with U.S. export control + OFAC embargo/sanction programs — note that fixed-wing UAV deployment in eastern/southern Ukraine in active-conflict context likely interacts with U.S. export controls + Russia/Ukraine/Crimea sanctions specifics (independent legal analysis required); (e) **GOVERNING LAW**: Florida (Broward County) — non-negotiable jurisdiction. **PROJECT IMPACT**: the GPS-Denied Onboard project's question_decomposition.md hard disqualifier is "anything whose license blocks military / dual-use deployment"; the Magic Leap LICENSE explicitly blocks commercial use AND blocks distribution. 
The project's deployment context (fixed-wing UAV in active-conflict Ukraine, AC-NEW-2 spoofing-promotion path explicitly deals with hostile electromagnetic warfare) is **dual-use military** by every reasonable interpretation. Therefore the canonical Magic Leap SuperPoint pretrained weights AND `lightglue/superpoint.py` inference code are **HARD DISQUALIFIED** for the project's commercial / dual-use deployment context. **Mitigation paths** (for D-C3-1 Plan-phase Choose block): (a) DISK+LightGlue (Apache-2.0 throughout) — paper Table 6 shows DISK+LightGlue stereo AUC@5°=67.02 vs SP+LightGlue 59.03 (+7.99 absolute) — DISK+LightGlue is **demonstrably superior on phototourism** to SP+LightGlue; (b) ALIKED+LightGlue (BSD-3-Clause + Apache-2.0); (c) re-train a SuperPoint-class extractor under permissive license — kornia has a SuperPoint reproduction (`kornia.feature.SuperPoint`) but its weights' license must be independently verified at Plan-phase (LightGlue-ONNX Source #73 also distributes its own SuperPoint+LightGlue ONNX weights — which inherit the Magic Leap restriction by transitive lineage); (d) accept Magic Leap noncommercial-research license for the project's R&D phase only with explicit Plan-phase commitment to swap before production deployment (legally risky — internal research could still be construed as commercial preparation given the dual-use deployment intent). **Recommendation: D-C3-1 = (a) DISK+LightGlue is the cleanest license-compliant alternative; per paper Table 6 it's also the strongest phototourism alternative**. ALIKED+LightGlue is the second-cleanest BSD-3-Clause + Apache-2.0 option but lacks the IMC 2020 / 2021 / 2023 documentary phototourism benchmarks that DISK+LightGlue has. 
- **Related Sub-question**: SQ3+SQ4 / C3 — SuperPoint pretrained weights license restriction analysis (License-track gate for the SP+LightGlue canonical mode); applies to D-C1-1 license posture interaction; raises NEW **D-C3-1 SuperPoint-replacement-strategy choice (DISK+LightGlue / ALIKED+LightGlue / SuperPoint-reproduction-with-permissive-license / accept-Magic-Leap-noncommercial-with-swap-commitment)** Plan-phase decision

### Source #73

- **Title**: LightGlue-ONNX — `fabio-sim/LightGlue-ONNX` (Sim, fabio-sim) — Open Neural Network Exchange compatible implementation of LightGlue + SuperPoint (and DISK) end-to-end pipeline; supports TensorRT, OpenVINO, FP16 mixed precision, FP8 Q/DQ quantization (NVIDIA ModelOpt — January 2026 addition); FlashAttention-2 fused ONNX models; MultiHead-Attention fusion optimization; ArgMax → TopK trick for ~30% speedup; Kornia integration as `kornia.feature.OnnxLightGlue`; CLI `lightglue-onnx` with `export | infer | trtexec` commands; canonical reference for Jetson/edge/embedded LightGlue deployment
- **Link**: README https://raw.githubusercontent.com/fabio-sim/LightGlue-ONNX/main/README.md (accessed 2026-05-08); repo https://github.com/fabio-sim/LightGlue-ONNX ; FP8 quantization blog post https://fabio-sim.github.io/blog/fp8-quantized-lightglue-tensorrt-nvidia-model-optimizer/ ; ONNX Runtime + TensorRT inference blog post https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/ ; Kornia integration https://kornia.readthedocs.io/en/latest/feature.html#kornia.feature.OnnxLightGlue
- **Tier**: L2 (third-party canonical ONNX export project — most-cited LightGlue ONNX deployment reference in the modern feature-matching deployment community as of 2026; explicitly endorsed by `cvg/LightGlue` README "Other links" section as the canonical TensorRT/OpenVINO export path; Kornia integration confirms broader community adoption)
- **Publication Date**: Initial commit Jun 2023 (one week after `cvg/LightGlue` paper
publication); active maintenance through January 2026 (most recent changelog entry: "19 January 2026: Add FP8 quantization workflow"); 1k+ stars - **Timeliness Status**: ✅ Fully within Critical-novelty window (active main + January 2026 changelog entries on FP8 quantization and refurbished CLI UX with modern uv) - **Version Info**: main HEAD; CLI `lightglue-onnx` with three commands: `export` (pipeline ONNX export with `--num-keypoints N`, `-b 2 -h 1024 -w 1024` static-shape parameterization), `infer` (ONNX Runtime inference with `-d cuda|tensorrt|openvino|cpu` provider selection, `--fp16` mixed-precision flag), `trtexec` (Polygraphy-based pure TensorRT inference with `--fp16` flag); legacy export path available via `--legacy-export` flag; FP8 quantization workflow via `lightglue_dynamo/scripts/quantize.py --quantize-mode fp8 --dq-only --simplify` produces FP8 Q/DQ ONNX models with `--precision-constraints prefer --fp16` TensorRT inference; uv-based dependency management (`uv sync` for inference-only; `uv sync --group export` for export support; `uv sync --group trt` for TensorRT CLI). **Performance evolution**: 28 Jun 2023 — initial end-to-end SP+LightGlue export; 11 Jul 2023 — mixed precision; 13 Jul 2023 — Flash Attention; 19 Jul 2023 — TensorRT support; 04 Oct 2023 — MultiHead-Attention fusion + Fused LightGlue ONNX with FlashAttention-2 (up to **80% faster inference** on long sequences); 27 Oct 2023 — Kornia integration; 02 Nov 2023 — TopK trick optimizes out ArgMax (~30% speedup); 17 Jul 2024 — end-to-end parallel dynamic batch size support; 09 Jan 2026 — modern uv UX refresh; 19 Jan 2026 — FP8 quantization workflow via NVIDIA ModelOpt - **Target Audience**: System architects + C3 implementer + C7 (Jetson runtime) implementer + Step-7.5 reviewer - **Research Boundary Match**: **Full match** for the project's pinned C3 Jetson deployment runtime question (LightGlue's TensorRT export path on Jetson Orin Nano Super at fp16 + INT8/FP8 + ONNX Runtime). 
The project's C7 row will inherit the choice between PyTorch-fp16, Torch-TensorRT, ONNX Runtime + TensorRT EP, or pure TensorRT — LightGlue-ONNX is the canonical reference for the latter two options. **Partial match** for the project's domain (the LightGlue-ONNX repository targets phototourism / general-purpose visual-localization, NOT aerial nadir specifically; the same aerial-domain caveat as `cvg/LightGlue` applies — D-C3-1 Plan-phase decision)
- **Summary**: LightGlue-ONNX is the canonical third-party ONNX/TensorRT/OpenVINO deployment path for `cvg/LightGlue`. **Critical findings for the C3 + C7 deployment gates**: (a) **End-to-end SP+LightGlue + DISK+LightGlue ONNX pipeline export** with static-shape parameterization (e.g., `-b 2 -h 1024 -w 1024 --num-keypoints 1024`) — image dimensions and keypoint count are baked in at export time; end-to-end parallel dynamic batch-size support was added 17 Jul 2024; (b) **TensorRT 8.5+ support on Jetson** is feasible — the `lightglue-onnx trtexec` CLI uses Polygraphy as the TensorRT runner, which is well-documented on JetPack; FP16 mixed-precision is the default and recommended Jetson configuration; (c) **FP8 quantization workflow** (Jan 2026 addition) via NVIDIA ModelOpt's Q/DQ insertion produces FP8 ONNX models that, when run with TensorRT `--precision-constraints prefer --fp16`, achieve **~2× speedup over fp16 baseline** on Hopper/Ada/Blackwell GPUs (reference: NVIDIA ModelOpt FP8 documentation) — **but Jetson Orin Nano Super is Ampere-architecture and NOT FP8-native, so the FP8 path is Plan-phase deferred for Jetson: it applies only if the project moves to an FP8-native platform (the Jetson Orin family is Ampere throughout, so an upgrade within that family does not help) or if the FP8 graph can fall back to INT8 quantization on Ampere through TensorRT precision emulation (unverified; verification required at Jetson MVE phase)**; (d) **FlashAttention-2 fused ONNX** (Oct 2023) with up to 80% faster inference on long-keypoint sequences via `onnxruntime>=1.16.0` — applies to the project's pinned
1024-keypoint extraction; (e) **TopK trick** (Nov 2023) optimizes out ArgMax for ~30% speedup — applies transparently after re-export; (f) **OpenVINO support** for Intel-CPU/iGPU deployment — not directly applicable to Jetson but useful for offline-PC pre-flight cache provisioning (C10 row); (g) **Kornia integration** via `kornia.feature.OnnxLightGlue` interface — drop-in replacement for `kornia.feature.LightGlue` when ONNX deployment is preferred. **Documented inference-time comparison** (linked blog post): on RTX-class GPUs the ONNX/TensorRT path achieves **3-5× speedup** over the canonical PyTorch path at fp16; FP8 path adds another ~2× on FP8-native architectures; Ampere/Jetson Orin Nano Super FP8 emulation factor is unverified (Jetson MVE phase). **License**: not explicitly checked in this fetch; repo README does not cite a LICENSE file in the visible header — Plan-phase verification gate (similar to D-C2-8 Nanne PyTorch port license-uncertainty caveat). **Acknowledged dependencies**: ONNX, TensorRT, ONNX Runtime, OpenVINO, NVIDIA ModelOpt (FP8), Polygraphy. **Project relevance**: this project's C7 (Jetson runtime) row will likely choose between PyTorch-fp16 (lowest engineering cost, highest deployment footprint), Torch-TensorRT (medium engineering cost, Jetson-friendly), ONNX Runtime + TensorRT EP via LightGlue-ONNX (medium engineering cost, well-documented Jetson pathway), or pure TensorRT via `trtexec` + Polygraphy (highest engineering cost, lowest deployment footprint, Jetson-friendly) — LightGlue-ONNX is the canonical reference for options 3 and 4. 
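The precision and runtime gating discussed in findings (b) and (c) can be sketched as two small helpers. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the FP8 threshold of compute capability 8.9 (Ada) is an assumption based on NVIDIA's architecture line-up with Jetson Orin at 8.7, and the provider strings are the standard ONNX Runtime execution-provider names.

```python
from typing import List, Sequence, Tuple

# Assumption: native FP8 tensor-core support starts at compute capability 8.9
# (Ada), followed by 9.0 (Hopper) and newer. Jetson Orin (Ampere) reports 8.7.
FP8_MIN_COMPUTE_CAPABILITY: Tuple[int, int] = (8, 9)

# Preference order for ONNX Runtime execution providers on an NVIDIA target.
ORT_PREFERENCE: List[str] = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def has_native_fp8(compute_capability: Tuple[int, int]) -> bool:
    """True if the GPU generation is assumed to execute FP8 natively."""
    return compute_capability >= FP8_MIN_COMPUTE_CAPABILITY


def pick_precision(compute_capability: Tuple[int, int], allow_fp8: bool = True) -> str:
    """Choose 'fp8' only on FP8-native hardware; otherwise stay on 'fp16'."""
    if allow_fp8 and has_native_fp8(compute_capability):
        return "fp8"
    return "fp16"


def order_providers(available: Sequence[str]) -> List[str]:
    """Filter the preference list down to providers that are actually available."""
    chosen = [p for p in ORT_PREFERENCE if p in available]
    if not chosen:
        raise RuntimeError("no supported ONNX Runtime execution provider available")
    return chosen


if __name__ == "__main__":
    print(pick_precision((8, 7)))  # Jetson Orin class (Ampere)
    print(order_providers(["CPUExecutionProvider", "CUDAExecutionProvider"]))
```

In a real deployment the `available` list would typically come from `onnxruntime.get_available_providers()`, and the ordered result would be passed as the `providers` argument when creating an inference session.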
- **Related Sub-question**: SQ3+SQ4 / C3 + C7 — LightGlue Jetson deployment runtime evidence (cross-source confirmation that LightGlue has a documented, actively-maintained TensorRT/ONNX/OpenVINO/FP8 export path; the project's C7 row will reference this source when the inference-runtime decision is closed at Plan-phase); also raises **D-C3-2 LightGlue inference runtime choice (PyTorch-fp16 / Torch-TensorRT / ONNX Runtime + TensorRT EP / pure TensorRT via trtexec + Polygraphy / FP8 ModelOpt-on-Jetson if Ampere FP8 emulation works)** Plan-phase decision

### Source #74
- **Title**: ALIKED canonical implementation — `Shiaoming/ALIKED` (Zhao et al. IEEE T-IM 2023) — official PyTorch reference implementation, README + LICENSE (BSD-3-Clause), `demo_pair.py` + `demo_seq.py` runnable demos, four pretrained model variants distributed in-tree under `models/` (`aliked-t16` tiny / `aliked-n16` normal / `aliked-n16rot` rotation-augmented normal / `aliked-n32` higher-SDDH-sample-count normal), `custom_ops/build.sh` legacy CUDA extension build (NOT used by the cvg/LightGlue port — the port replaced `custom_ops` with `torchvision.ops.deform_conv2d` directly per Source #70 `lightglue/aliked.py` lines 39 + 336–344, removing the build-from-source requirement)
- **Link**: README https://raw.githubusercontent.com/Shiaoming/ALIKED/main/README.md (accessed 2026-05-08); LICENSE https://raw.githubusercontent.com/Shiaoming/ALIKED/main/LICENSE (accessed 2026-05-08); repo https://github.com/Shiaoming/ALIKED ; cvg/LightGlue's ALIKED port `lightglue/aliked.py` https://raw.githubusercontent.com/cvg/LightGlue/main/lightglue/aliked.py (BSD-3-Clause inherited from Shiaoming/ALIKED canonical, with explicit author + license attribution in the file header lines 1–33)
- **Tier**: L1 (project-official codebase by the canonical ALIKED authors Xiaoming Zhao + Xingming Wu + Weihai Chen + Peter C. Y.
Chen + Qingsong Xu + Zhengguo Li, Beihang University + University of Macau + National University of Singapore + A*STAR Singapore; same author group as `Shiaoming/ALIKE` (T-MM 2022, the predecessor network), 1.4k+ stars at canonical repo, IEEE Transactions on Instrumentation & Measurement 2023 publication) - **Publication Date**: ALIKED paper IEEE T-IM April 2023 (DOI 10.1109/TIM.2023.3271000); canonical repo HEAD active (cvg/LightGlue port added the four-variant `aliked-t16/n16/n16rot/n32` interface post-publication via `lightglue/aliked.py`) - **Timeliness Status**: ✅ Within Critical-novelty window (April 2023 — modern competitive ground for sparse-extractor reference; widely-adopted reference implementation across modern feature-matching deployment community); cvg/LightGlue's ALIKED port itself is actively maintained on the cvg/LightGlue main branch - **Version Info**: main HEAD at access time. **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS** — `context7 resolve-library-id` returned no relevant matches for "ALIKED" (Supabase / Vitest / AI SDK / Mastra / Better Auth top-results, indicating no `Shiaoming/ALIKED` library entry in the context7 index); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical repo README + LICENSE was used. 
**Four ALIKED model variants exposed in `cvg/LightGlue` lightglue/aliked.py via `model_name` enum**: `aliked-t16` (Tiny: c1=8, c2=16, c3=32, c4=64, dim=**64-D descriptor**, K=3 SDDH kernel size, M=16 SDDH sample positions, **0.192M parameters**, **1.37 GFLOPs on 640×480 + 1k keypoints**, **125.87 FPS RTX 2060** — most-Jetson-friendly variant); `aliked-n16` (Normal: c1=16, c2=32, c3=64, c4=128, dim=**128-D descriptor**, K=3, M=16, **0.677M parameters**, **4.05 GFLOPs**, **77.40 FPS RTX 2060** — canonical paper baseline); `aliked-n16rot` (Normal + rotation augmentation training: same arch as n16 but with rotation-data-augmentation prior; better viewpoint-rotation invariance per paper Fig. 6 top chart, slightly worse 3D-reconstruction accuracy than n16 per paper §VI-C1); `aliked-n32` (Normal with higher SDDH sampling: c1=16, c2=32, c3=64, c4=128, dim=**128-D**, K=3, **M=32 SDDH sample positions**, **0.980M parameters**, **4.62 GFLOPs**, **75.64 FPS RTX 2060** — best matching accuracy variant). **In cvg/LightGlue the ALIKED extractor is wired to `LightGlue(features='aliked')` with `input_dim=128` matcher config** (per Source #70 `lightglue/lightglue.py` lines 345–348). **Default per-extractor config** (cvg/LightGlue `lightglue/aliked.py` lines 603–608): `model_name="aliked-n16"`, `max_num_keypoints=-1` (threshold-based mode), `detection_threshold=0.2`, `nms_radius=2`, `preprocess_conf={"resize": 1024}` — **same canonical 1024-largest-edge resize policy as SuperPoint + DISK**. Pretrained weights URL pattern: `https://github.com/Shiaoming/ALIKED/raw/main/models/{model_name}.pth` — auto-downloaded via `torch.hub.load_state_dict_from_url` at first construction. **Required input format**: `data["image"]` as `torch.Tensor[B, 3, H, W]` RGB; if `B, 1, H, W` grayscale provided, the extractor auto-converts via `kornia.color.grayscale_to_rgb` (per `lightglue/aliked.py` lines 749–750). 
**Output format**: `{keypoints: torch.Tensor[B, N, 2], descriptors: torch.Tensor[B, N, dim], keypoint_scores: torch.Tensor[B, N]}` where `dim ∈ {64, 128}` per variant. **There is also a `raco-aliked` sibling weight checkpoint distributed by cvg/LightGlue** (per `lightglue/lightglue.py` lines 349–352) — RACo (Random Augmentation in Color)-trained ALIKED variant; community contribution, not in canonical paper; skipped in this entry as "separately-cataloged sibling mode if elevated". - **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 NEW) - **Research Boundary Match**: **Full match** for the project's pinned mode of ALIKED+LightGlue (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run ALIKED-N(16) feature extraction on each independently at 1024-largest-edge, match via LightGlue with `features='aliked'`, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). The repo ships everything needed: extractor classes (`aliked-t16/n16/n16rot/n32` via `cvg/LightGlue` `lightglue.ALIKED(model_name=...)`), I/O utilities inherited from cvg/LightGlue (`load_image, rbd, match_pair`), visualization inherited (`viz2d`), pretrained weights auto-downloaded. **Asymmetric image-pair sizes are handled natively** — same independent per-image extraction pattern as SP+LightGlue + DISK+LightGlue. **Partial match** for the project's domain (canonical training on **MegaDepth perspective dataset (135 scenes, 1.35M image pairs sampled per DISK methodology)** + **R2D2 homographic dataset (Oxford-Paris + Aachen synthetic homographies)** — neither dataset is aerial nadir; **same aerial-domain caveat as SP+LightGlue + DISK+LightGlue + C2 candidates**; aerial applicability is **NOT explicitly validated in the canonical paper** — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight). 
**NEGATIVE finding for the Jetson deployment story**: Source #73 (`fabio-sim/LightGlue-ONNX`) does NOT ship a documented ALIKED end-to-end export pathway as of January 2026 — Source #73 README changelog explicitly lists SuperPoint (28 Jun 2023) + DISK (30 Jun 2023) extractor support, **but no ALIKED entry**; Source #73 citations section cites LightGlue + SuperPoint + DISK papers only, **with no ALIKED reference**; Source #73 example CLI commands all use `superpoint` as the positional extractor argument and there is no documented `aliked` CLI variant. **Plus the canonical `lightglue/aliked.py` uses `torchvision.ops.deform_conv2d`** which is a known-difficult ONNX export op (deformable conv historically required either ONNX opset ≥19 native `DeformConv` op OR a custom TensorRT plugin). **Implication for D-C3-2**: ALIKED+LightGlue's Jetson deployment story is materially WEAKER than DISK+LightGlue's or SP+LightGlue's; the project's options for ALIKED+LightGlue on Jetson are restricted to (a) PyTorch-fp16 only (likely 2-3× slower than DISK+LightGlue's TensorRT path), (b) custom ONNX export with deform_conv plugin (significant engineering effort), (c) wait for community LightGlue-ONNX ALIKED support to land, (d) accept Torch-TensorRT partial graph compilation with deform_conv falling back to PyTorch-eager (mixed runtime — operationally complex). - **Summary**: ALIKED is the canonical sparse-keypoint-and-descriptor extraction network introduced by Zhao et al. (IEEE T-IM 2023), with **Sparse Deformable Descriptor Head (SDDH) as its main contribution** — extracts deformable descriptors only at sparse keypoints (rather than dense descriptor maps as in SuperPoint / R2D2 / D2-Net / ASLFeat / DISK), reducing GFLOPs by ~6-200× vs prior methods at competitive matching accuracy. 
**CRITICAL LICENSE FINDING**: LICENSE file is **BSD-3-Clause** (Copyright (c) 2022, Zhao Xiaoming) — permissive; this places ALIKED ITSELF on the **BSD/permissive license track** alongside MixVPR (MIT) + SelaVPR (MIT) + NetVLAD-canonical (MIT) + EigenPlaces (MIT) + Kimera-VIO (BSD-2) + OKVIS2 (BSD-3) + DPVO (MIT) + cvg/LightGlue itself (Apache-2.0). cvg/LightGlue's `lightglue/aliked.py` port file inherits the BSD-3-Clause notice in its file header (lines 1–33 of the file include the full BSD-3-Clause notice + Magic Leap-style author attribution). **Architecture (paper §III + §IV)**: feature encoder with 4 ConvBlock/ResBlock stages (block3 + block4 use deformable convolutions per paper §III-A) → feature aggregation via four upsample blocks → score map head (SMH) for keypoint detection via Differentiable Keypoint Detection (DKD, inherited from ALIKE [10]) → SDDH for sparse deformable descriptor extraction at the detected keypoints. SDDH first samples a K×K patch around each keypoint, estimates M deformable sample positions via two convolution layers, samples M supporting features via bilinear sampling, encodes with selu+conv1x1 + aggregates with convM (paper Eq. 4–5) producing a `dim`-D descriptor with L2-norm. Per-keypoint cost is ∝ M (vs ∝ HW for DMH) → **drastic GFLOPs reduction**. **Network configurations (paper Table II)**: Tiny (c1=8, c2=16, c3=32, c4=64, dim=64), Normal (c1=16, c2=32, c3=64, c4=128, dim=128), Large (c1=32, c2=64, c3=128, c4=128, dim=128 with deeper desc head). cvg/LightGlue port exposes Tiny + Normal (with M=16 / M=32) but NOT Large. **Reported headline performance vs SOTA on RTX 2060** (paper Table IV — HPatches with 1k keypoints, 640×480): ALIKED-T(16) **125.87 FPS** / 0.192M params / 1.37 GFLOPs / MMA@3=72.99% / MHA@3=78.70%; ALIKED-N(16) **77.40 FPS** / 0.677M params / 4.05 GFLOPs / MMA@3=74.43% / MHA@3=77.22%; ALIKED-N(32) **75.64 FPS** / 0.980M params / 4.62 GFLOPs / MMA@3=75.23% / MHA@3=74.44%. 
**vs SuperPoint** (1.301M params / 26.11 GFLOPs / 52.63 FPS / MMA@3=65.37 / MHA@3=70.19): ALIKED-N(16) achieves +9.06 absolute MMA@3 and +7.03 absolute MHA@3 at **roughly 1/6th the GFLOPs and ~1.5× the FPS**. **vs DISK** (1.092M params / 98.97 GFLOPs / 11.81 FPS / MMA@3=77.59 / MHA@3=70.56): DISK has +3.16 absolute MMA@3 vs ALIKED-N(16) but **-6.66 absolute MHA@3** (DISK keypoints are evenly distributed → poorer homography estimation); ALIKED-N(16) is **6.6× faster with 1/24th the GFLOPs**. **Pose Estimation IMW-test (paper Table V, 2048 keypoints)**: ALIKED-N(16) stereo mAA(5°)=39.53 / mAA(10°)=52.28 (vs DISK 38.72/51.22: slightly better than DISK at 1/24th the GFLOPs); multiview track length TL=5.57 (vs DISK 5.50), with multiview mAA marginally below DISK, where DISK's higher #matches gives a bundle-adjustment edge. The stereo assignment of 39.53/52.28 is the one consistent with the PPC figure that follows (52.28 / 4.05 GFLOPs = 12.91); a second pair transcribed from the same row, 46.30/85.47 (vs DISK 44.80/85.20), therefore cannot be stereo mAA, and its column label must be re-verified against paper Table V at Plan-phase. **PPC (Performance Per Cost = mAA(10°)/GFLOPs)**: ALIKED-N(16) Stereo PPC=12.91 vs DISK 0.52 (**24.8× higher PPC**). **FM-Bench TUM/KITTI/T&T/CPC (paper Table VI)**: ALIKED-N(16) achieves best %Recall on TUM (63.60), best on T&T (92.10), and is competitive with DISK on KITTI (92.10 vs DISK 90.20) and CPC (58.00 vs DISK 59.10). **Aachen Day-Night visual relocalization (paper Table VII)**, with up to 1024 keypoints, at the (0.25m,2°)/(0.5m,5°)/(5m,10°) thresholds: ALIKED-N(32) = **77.6/88.8/100.0 (best in row)**; ALIKED-N(16) = **73.5/85.7/98.0**; ALIKED-T(16) = 70.4/87.8/98.0; **vs SuperPoint** = 58.2/66.3/72.4 — ALIKED-N(32) is +19.4 / +22.5 / +27.6 absolute over SuperPoint across the three tiers; **vs DISK** = 60.2/72.4/81.6 — ALIKED-N(32) is +17.4/+16.4/+18.4 absolute.
**The Aachen documentary lift over SuperPoint and DISK on the visual-localization task is the strongest documentary signal for ALIKED+LightGlue's project relevance** (the project's intended pipeline is identical: C2 NetVLAD-class top-K → C3 sparse-matcher → C4 PnP+RANSAC, all evaluated on Aachen Day-Night by Source #71 LightGlue paper + Source #75 ALIKED paper). **Rotation invariance (paper §VI-C1 + Fig. 6 top)**: ALIKED-N(16, rot) achieves **best rotation invariance** at 0–45° rotations (vs SuperPoint, ALIKE, DISK, etc.); however, ALIKED-N(16, rot) performs slightly worse in 3D reconstruction than ALIKED-N(16) — **D-C3-1-mitigation-specific consideration**: for the project's UAV nadir use case where heading variation is expected, `aliked-n16rot` may be the preferred sibling mode; rotation augmentation may not hurt aerial-nadir 3D-reconstruction performance materially because aerial-nadir scenes do not have a strong "up direction" cue (vs ground-level scenes where vertical cues are critical). Plan-phase decision raised (will be tagged D-C3-4 NEW): ALIKED-N(16) vs ALIKED-N(16rot) vs ALIKED-N(32) vs ALIKED-T(16) sibling-mode choice for the project's pinned ALIKED variant. **Limitations (paper §VI-E)**: SDDH has only one layer for deformable position estimation, so it struggles to model extreme image deformation (large scale + viewpoint differences simultaneously); this limitation is shared by all single-scale matching methods (SP, ALIKE, DISK, ASLFeat) at scale-difference >4×. **Custom_ops requirement on canonical Shiaoming/ALIKED**: the README mentions `cd custom_ops; sh build.sh` to build a CUDA extension for the deformable-position-estimation kernel — this is a **legacy path** that the cvg/LightGlue port has eliminated by using `torchvision.ops.deform_conv2d` directly (per `lightglue/aliked.py` lines 39 + 336–344); the project will use the cvg/LightGlue port and avoid the build-from-source dependency.
**Modern lineage**: ALIKED is the strict successor to ALIKE (T-MM 2022) which itself is the strict successor to SuperPoint (CVPR Workshop 2018) — the lineage establishes ALIKED as the **modern competitive lightweight CNN extractor**, with SDDH as the key innovation enabling lower GFLOPs at competitive accuracy.
- **Related Sub-question**: SQ3+SQ4 / C3 — ALIKED+LightGlue per-mode API capability verification (Mandatory `context7` lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + LICENSE WebFetch + canonical paper WebFetch [Source #75] + cvg/LightGlue `lightglue/aliked.py` source code inspection [transitively cited via Source #70] + LightGlue-ONNX ALIKED-export-absence finding [transitively cited via Source #73]); D-C3-1 RECOMMENDED-secondary-mitigation candidate (BSD-3-Clause + Apache-2.0 throughout, second-cleanest license-compliant option after DISK+LightGlue); raises NEW **D-C3-4 ALIKED-sibling-mode-choice (aliked-t16 64-D / aliked-n16 128-D canonical / aliked-n16rot 128-D rotation-augmented / aliked-n32 128-D higher-SDDH-sample-count)** Plan-phase decision

### Source #75
- **Title**: ALIKED canonical paper — "ALIKED: A Lighter Keypoint and Descriptor Extraction Network via Deformable Transformation" (Zhao, Wu, Chen, Chen, Xu, Li — IEEE Transactions on Instrumentation & Measurement, vol. 72, pp.
1–16, 2023, DOI 10.1109/TIM.2023.3271000, arXiv:2304.03608) - **Link**: arXiv abstract https://arxiv.org/abs/2304.03608 (April 2023); arXiv full PDF https://arxiv.org/pdf/2304.03608.pdf ; IEEE T-IM published version DOI 10.1109/TIM.2023.3271000 ; accessed 2026-05-08 - **Tier**: L1 (peer-reviewed IEEE T-IM 2023 + canonical implementation cross-referenced; documented modern competitive lightweight CNN extractor in the post-SuperPoint / post-ALIKE era; cited by 2024–2026 feature-matching papers as a competitive-fast extractor reference) - **Publication Date**: arXiv preprint 2023-04-07; IEEE T-IM publication mid-2023 - **Timeliness Status**: ✅ Within Critical-novelty window (April 2023 — modern competitive ground for sparse-extractor reference); Established-competitive-modern-extractor exemption applies (ALIKED is the post-ALIKE successor with explicit GFLOP-reduction claims that 2024–2026 successor candidates [XFeat, XFeat*] explicitly position themselves against) - **Version Info**: arXiv v1 (April 2023, IEEE T-IM camera-ready); paper §III architecture + §IV SDDH + §V sparse NRE loss + §VI experiments + §VI-A implementation details (Adam optimizer, betas 0.9/0.999, top-400 detected + 400 random keypoints with NMS, 800×800 training resolution, batch size 2, gradient accumulation over 6 batches, MegaDepth + R2D2 homographic datasets, 100K training steps, 100K best-checkpoint selection on validation) - **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer - **Research Boundary Match**: **Full match** for the algorithm (ALIKED with SDDH as the descriptor head, deformable convolution in the feature encoder's last 2 blocks, DKD for keypoint detection, sparse NRE loss for training); **partial match** for the project's domain (paper benchmarks: HPatches homography Table IV [planar scenes with illumination + viewpoint changes], IMW test Table V [phototourism stereo + multiview reconstruction], FM-Bench Table VI [TUM indoor SLAM, KITTI driving, T&T 
wide-baseline reconstruction, CPC wild-reconstruction-from-web], Aachen Day-Night Table VII [outdoor visual relocalization]; **NO aerial nadir benchmark** in the canonical paper). **Critical paper §I + §II-A reference position**: ALIKED is positioned as a **lighter keypoint and descriptor extraction network for resource-constrained visual measurement applications**, including SLAM, computational photography, and visual place recognition — directly aligned with the project's Jetson Orin Nano Super deployment context. The paper §VI-A explicitly tests on **mid-end NVIDIA GeForce RTX 2060** (a resource-constrained-class GPU) — Jetson Orin Nano Super is in the same class. - **Summary**: The canonical paper introduces ALIKED with three core contributions: (i) **SDDH (Sparse Deformable Descriptor Head)** that extracts deformable descriptors only at sparse keypoints (paper §IV) via M deformable sample positions per keypoint (rather than dense descriptor maps as in SuperPoint / DISK / R2D2 / ASLFeat / D2-Net) — drastic GFLOPs reduction of 6-200× vs prior methods (paper Table III + Table IV); (ii) **deformable convolutions in the last 2 blocks of the feature encoder** (paper §III-A) for geometric-invariance-aware feature extraction; (iii) **sparse NRE (Neural Reprojection Error) loss relaxation** (paper §V) — relaxes the dense NRE loss [DISK 2020 + ALIKE 2022] to sparse formulation, reducing GPU memory by ~3.5× and enabling training with batch size 2 on a single GPU (rather than DISK's RL-based training requirements). 
**Reported headline performance vs SOTA on HPatches Table IV (RTX 2060, 640×480, 1k keypoints)**: ALIKED-T(16) achieves **125.87 FPS / 0.192M params / 1.37 GFLOPs / MMA@3=72.99% / MHA@3=78.70%** — best MHA among compared methods despite smallest network (vs ALIKE-N 84.96 FPS / 0.318M / 7.91 GFLOPs / MMA@3=70.78 / MHA@3=75.74; vs SuperPoint 52.63 FPS / 1.301M / 26.11 GFLOPs / MMA@3=65.37 / MHA@3=70.19; vs DISK 11.81 FPS / 1.092M / 98.97 GFLOPs / MMA@3=77.59 / MHA@3=70.56). ALIKED-N(16) achieves **77.40 FPS / 0.677M params / 4.05 GFLOPs / MMA@3=74.43% / MHA@3=77.22%** — competitive with all top methods at fraction of GFLOPs. **Aachen Day-Night visual relocalization (paper Table VII, up to 2048 keypoints)**: ALIKED-N(32) achieves **(0.25m,2°)/(0.5m,5°)/(5m,10°) = 76.5/87.8/100.0** (vs SuperPoint 69.4/78.6/87.8 = +7.1/+9.2/+12.2; vs DISK 70.4/82.7/94.9 = +6.1/+5.1/+5.1; vs ALIKE-L 74.5/87.8/98.0 = +2.0/0.0/+2.0). **CRITICAL OBSERVATION FOR THE PROJECT**: paper Table VII Aachen Day-Night benchmark documents that **ALIKED-N(32) is the highest-performing tested keypoint extractor on the Aachen Day-Night benchmark at the strictest (0.25m,2°) tier with 2048 keypoints**, beating SuperPoint by +7.1 absolute, beating DISK by +6.1 absolute, beating ALIKE-L by +2.0 absolute. By transitive lineage with Source #71 LightGlue paper Table 3 (which reports Aachen Day-Night with **NetVLAD top-50 retrieval → SP+LightGlue → PnP+RANSAC pipeline** at Day (0.25m,2°)=89.2 — significantly better than SuperPoint+mNN's 69.4 on the same benchmark), the **expected pose-estimation accuracy of the ALIKED+LightGlue pipeline on Aachen Day-Night should approach or exceed SP+LightGlue's** because ALIKED-N(32)+mNN already beats SuperPoint+mNN by +7.1 absolute, and the LightGlue matcher provides similar relative lift over mNN for ALIKED as for SuperPoint. 
**However, no canonical paper directly evaluates ALIKED+LightGlue on Aachen Day-Night** — the cvg/LightGlue paper (Source #71) Table 3 only reports SP+LightGlue (the cvg/LightGlue ALIKED port + ALIKED-LightGlue weights were added post-paper). **3D reconstruction IMW test Table V (2048 keypoints)**: ALIKED-N(16) stereo mAA(10°)=52.28 / multiview mAA(10°)=71.78, competitive with DISK (51.22 / 72.96) at 1/24th the GFLOPs. The stereo value is the one consistent with the PPC figure (12.91 × 4.05 GFLOPs ≈ 52.28); the 85.47 / 85.20 pair previously transcribed as stereo mAA(10°) belongs to a different column of the same table, and its label must be re-verified against the paper at Plan-phase. **PPC (Performance Per Cost) in Table V**: ALIKED-N(16) PPC_stereo=12.91 vs DISK 0.52 — **24.8× higher PPC**. **Rotation invariance (paper §VI-C1 + Fig. 6 top)**: ALIKED-N(16, rot) achieves best rotation invariance among all tested methods at 0–45° image rotations (vs SuperPoint which is strong on rotation due to Homographic Adaptation training, vs ALIKE / DISK / R2D2 which are weak on rotation). **Scale invariance (paper §VI-C2 + Fig. 6 bottom)**: ALIKED-N(16) has best matching accuracy among single-scale methods, but all single-scale methods degrade to 0 at scale-difference >4×; multi-scale variant ALIKED-N(16, MS) handles up to 8× scale difference. **License**: BSD-3-Clause via Source #74 — canonical implementation. **NO direct ALIKED+LightGlue benchmark** in the cvg/LightGlue paper Table 3 / Table 6 / Table 7 (those tables document SP+LightGlue and DISK+LightGlue only); ALIKED+LightGlue benchmarks would need to be sourced from community evaluations (kornia, hloc, Image Matching WebUI, IMC competition leaderboards) at Plan-phase, OR the project measures ALIKED+LightGlue directly at Jetson MVE phase using the canonical pretrained weights.
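The efficiency ratios quoted from HPatches Table IV can be re-derived with a few lines of arithmetic. The sketch below only re-uses the figures transcribed in this entry (FPS, parameters, GFLOPs, MMA@3, MHA@3); the `Extractor` class and `compare` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Extractor:
    """One row of HPatches Table IV as transcribed in this registry entry."""
    fps: float        # RTX 2060, 640x480, 1k keypoints
    params_m: float   # parameters, millions
    gflops: float
    mma3: float       # MMA@3, percent
    mha3: float       # MHA@3, percent


TABLE_IV: Dict[str, Extractor] = {
    "ALIKED-T(16)": Extractor(125.87, 0.192, 1.37, 72.99, 78.70),
    "ALIKED-N(16)": Extractor(77.40, 0.677, 4.05, 74.43, 77.22),
    "ALIKE-N": Extractor(84.96, 0.318, 7.91, 70.78, 75.74),
    "SuperPoint": Extractor(52.63, 1.301, 26.11, 65.37, 70.19),
    "DISK": Extractor(11.81, 1.092, 98.97, 77.59, 70.56),
}


def compare(candidate: str, baseline: str) -> Dict[str, float]:
    """Ratios and absolute deltas of `candidate` relative to `baseline`."""
    c, b = TABLE_IV[candidate], TABLE_IV[baseline]
    return {
        "gflops_ratio": b.gflops / c.gflops,  # how many times cheaper the candidate is
        "fps_ratio": c.fps / b.fps,           # how many times faster the candidate is
        "delta_mma3": c.mma3 - b.mma3,
        "delta_mha3": c.mha3 - b.mha3,
    }


if __name__ == "__main__":
    for baseline in ("SuperPoint", "DISK"):
        result = compare("ALIKED-N(16)", baseline)
        print(baseline, {k: round(v, 2) for k, v in result.items()})
```

Re-deriving the ratios from the raw table values, rather than copying the rounded claims, gives the Step-7.5 reviewer a quick consistency check on transcription.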
- **Related Sub-question**: SQ3+SQ4 / C3 — ALIKED+LightGlue per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + ablation studies; documents the Aachen Day-Night documentary lift of ALIKED-N(32)+mNN over SuperPoint+mNN by +7.1 absolute at strictest tier as transitive evidence that ALIKED+LightGlue should be competitive with or beat SP+LightGlue on the visual-localization task; aerial-domain caveat documented; D-C3-1 RECOMMENDED-secondary-mitigation status confirmed)

### Source #76
- **Title**: DISK canonical implementation — `cvlab-epfl/disk` (Tyszkiewicz, Fua, Trulls — NeurIPS 2020) — official PyTorch reference implementation, README + Apache-2.0 LICENSE (confirmed via GitHub API metadata `license.spdx_id: "Apache-2.0"`), `detect.py` + `match.py` runnable inference demos, two pretrained checkpoints `save-depth.pth` (depth-based RL reward — paper default and best variant) + `save-epipolar.pth` (epipolar reward — supplementary material variant), 4-layer U-Net architecture requiring image dimensions multiple of 16; cvg/LightGlue's DISK port `lightglue/disk.py` integrates via `kornia.feature.DISK.from_pretrained("depth")` (Apache-2.0 inheritance through kornia integration)
- **Link**: README https://raw.githubusercontent.com/cvlab-epfl/disk/master/README.md (accessed 2026-05-08); GitHub API license metadata https://api.github.com/repos/cvlab-epfl/disk (accessed 2026-05-08; `license.spdx_id: "Apache-2.0"`); repo https://github.com/cvlab-epfl/disk (377 stars, 56 forks, last pushed 2023-12-15); cvg/LightGlue's DISK port `lightglue/disk.py` https://raw.githubusercontent.com/cvg/LightGlue/main/lightglue/disk.py
- **Tier**: L1 (project-official codebase by the canonical DISK authors Michał Tyszkiewicz + Pascal Fua + Eduard Trulls, EPFL CVLab + Google Zurich; NeurIPS 2020 publication; canonical implementation referenced by every subsequent feature-matching paper as the "RL-trained sparse extractor
reference"; included in cvg/LightGlue's canonical 5-extractor lineup [SuperPoint, **DISK**, ALIKED, SIFT, DoGHardNet]) - **Publication Date**: NeurIPS 2020 (paper accepted Sept 2020; arXiv preprint v1 2020-06-24); canonical repo creation 2020-10-20; last pushed 2023-12-15 (3 years of stable maintenance, no recent breaking changes — establishes mature reference codebase status) - **Timeliness Status**: ✅ Within Critical-novelty window (2020 — established competitive ground for sparse-extractor reference; widely-adopted reference implementation across feature-matching deployment community); Established-competitive-extractor-reference exemption applies (DISK is the canonical RL-policy-gradient sparse extractor reference, with its main innovation being end-to-end RL training of detection + description; the LightGlue paper Source #71 + ALIKED paper Source #75 + every subsequent feature-matching benchmark cites DISK as the "modern competitive sparse extractor reference baseline") - **Version Info**: master HEAD at access time (last pushed 2023-12-15). **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS** — `context7 resolve-library-id` returned no relevant matches for "DISK" feature extractor (top-results were Disk Inventory X / Expo Build Disk Cache / Blacksmith Sticky Disk / disko NixOS / gptman — all unrelated to feature-matching); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical repo README + GitHub API license metadata was used. **Two DISK pretrained checkpoints documented**: `save-depth.pth` (default; trained with depth-based RL reward; reproduces paper Table 1 + 2 results 0.51315 stereo AUC + 0.72705 multiview AUC on IMW2020 test set with 2k features at canonical schedule); `save-epipolar.pth` (alternate; trained with epipolar reward; supplementary material variant). 
**Native canonical inference CLI**: `python detect.py --height 1024 --width 1024 --n 2048 h5_artifacts_destination images_directory` produces `keypoints.h5 + descriptors.h5`; `python match.py --rt 0.95 --save-threshold 100 h5_artifacts_destination` produces matches via mutual-NN; ratio test threshold 0.95 documented. **Canonical model architecture**: 4-layer U-Net with deformable convolutions; image dimensions must be multiple of 16 (auto-padded preserving aspect ratio via `--height/--width` flags); produces 128-D L2-normalized descriptors per keypoint (per Source #71 LightGlue paper §3 + cvg/LightGlue `lightglue/disk.py` `desc_dim=128` default config). **In cvg/LightGlue the DISK extractor is wired via `kornia.feature.DISK.from_pretrained("depth")` with `LightGlue(features='disk')` matcher config** (`lightglue/disk.py` lines 1–53). **Default per-extractor config in cvg/LightGlue port**: `weights="depth"`, `max_num_keypoints=None` (threshold-based mode; project pinned to 1024), `desc_dim=128`, `nms_window_size=5`, `detection_threshold=0.0`, `pad_if_not_divisible=True` (auto-handles the multiple-of-16 constraint), `preprocess_conf={"resize": 1024, "grayscale": False}` — **same canonical 1024-largest-edge resize policy as SuperPoint + ALIKED**. Pretrained weights distributed via kornia (`kornia.feature.DISK.from_pretrained` accepts "depth" or "epipolar" weight key, auto-downloads from kornia model registry). **Required input format**: `data["image"]` as `torch.Tensor[B, 3, H, W]` RGB; if `[B, 1, H, W]` grayscale provided, the cvg/LightGlue port auto-converts via `kornia.color.grayscale_to_rgb` (per `lightglue/disk.py` lines 31–32). **Output format**: `{keypoints: torch.Tensor[B, N, 2], descriptors: torch.Tensor[B, N, 128], keypoint_scores: torch.Tensor[B, N]}` where `N ≤ max_num_keypoints` — same dict shape as SP+LightGlue + ALIKED+LightGlue, allowing direct LightGlue matcher swap via `features='disk'`. 
**Training data**: EPFL CVLab DISK dataset (~164 GB downloadable via `download_dataset` script), sampled from MegaDepth phototourism scenes with depth-map supervision; Low-GPU-memory training option `python train.py --substep 2 --batch-size 1 --chunk-size 10000 --warmup 500` documented to fit within 11/12 GB GPUs (~2 weeks of training); canonical training was on 32 GB V100s with `inverse_T = θ_M` annealed from 15 to 50 over 20 epochs; best checkpoint selection on validation AUC. **COLMAP integration**: ships `colmap/h5_to_db.py` for SfM pipeline integration. **No `lightglue/disk.py` LICENSE annotation in the file header** (vs ALIKED's explicit BSD-3-Clause file-header inheritance) — the cvg/LightGlue port file inherits Apache-2.0 from cvg/LightGlue itself (Source #70) and from canonical DISK (Apache-2.0). **kornia is also Apache-2.0** (well-established) — Apache-2.0 license track is preserved through the entire DISK+LightGlue stack. The `lightglue-onnx` companion (Source #73) **explicitly supports DISK** in its 30 Jun 2023 changelog entry: "DISK feature extraction support added"; CLI command pattern parallel to SP+LightGlue: `lightglue-onnx export disk_lightglue --num-keypoints 1024 -b 2 -h 1024 -w 1024 --fp16 --device cuda` and inference via `lightglue-onnx infer disk_lightglue --image image1.jpg --image image2.jpg -d tensorrt --fp16`. **Canonical paper IMW2020 stereo AUC numbers (paper Table 1)**: DISK 0.50432 stereo AUC + 0.72624 multiview AUC at 2k features (default schedule); 0.51315 / 0.72705 with original ad-hoc schedule. By transitive lineage with Source #71 LightGlue paper Table 6 (which documents DISK+LightGlue stereo AUC@5° = 67.02 vs SP+LightGlue 59.03 = +7.99 absolute on IMC 2020), DISK+LightGlue is the **demonstrably technically-superior C3 candidate to canonical SP+LightGlue on phototourism stereo** while preserving Apache-2.0 license track throughout. 
**Limitations (paper §4 + ALIKED paper Table III cross-cite)**: DISK has 1.092M params + **98.97 GFLOPs** at 640×480 + 1k keypoints — **24.4× higher GFLOPs than ALIKED-N(16)** (4.05 GFLOPs); **3.8× higher GFLOPs than SuperPoint** (26.11 GFLOPs); RTX 2060 throughput **11.81 FPS @ 640×480 + 1k keypoints = 84.7 ms per pair extraction-only** (slowest among modern competitive sparse extractors). However, the LightGlue-ONNX TensorRT acceleration pathway (Source #73) provides 3-5× speedup over PyTorch fp16, partially offsetting DISK's high GFLOPs cost — **TensorRT-equipped Jetson Orin Nano Super extrapolation: ~50-100 ms per pair @ 1024 keypoints fp16 + LightGlue-ONNX TensorRT EP / ~200-400 ms PyTorch-fp16-only fallback**; at K=10 top-K retrieval pairs/frame this puts AC-4.1 400 ms budget at MEDIUM-RISK margin (better than ALIKED's PyTorch-fp16-only HARSH-RISK margin but worse than SP+LightGlue's TIGHT margin due to DISK's higher raw GFLOPs). - **Target Audience**: System architects + C3 implementer + C7 (Jetson runtime) implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 RECOMMENDED-PRIMARY mitigation lock) - **Research Boundary Match**: **Full match** for the project's pinned mode of DISK+LightGlue (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run DISK feature extraction on each independently at 1024-largest-edge, match via LightGlue with `features='disk'`, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). The cvg/LightGlue port + kornia integration ships everything needed: extractor classes (`from lightglue import DISK; DISK(max_num_keypoints=1024)` instantiates `kornia.feature.DISK` under the hood), I/O utilities inherited from cvg/LightGlue (`load_image, rbd, match_pair`), visualization inherited (`viz2d`), pretrained weights auto-downloaded via kornia. 
**Asymmetric image-pair sizes are handled natively** — same independent per-image extraction pattern as SP+LightGlue + ALIKED+LightGlue. **Partial match** for the project's domain (canonical training on **EPFL CVLab DISK dataset (~164 GB) sampled from MegaDepth phototourism scenes with depth-map supervision** — NOT aerial nadir; **same aerial-domain caveat as SP+LightGlue + ALIKED+LightGlue + C2 candidates**; aerial applicability is **NOT explicitly validated in the canonical paper** — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight, OR via D-C2-1 retrain decision = (a) project-domain retrain on AerialVL with DISK's RL policy gradient training paradigm). **POSITIVE finding for the Jetson deployment story**: Source #73 (`fabio-sim/LightGlue-ONNX`) **DOES** ship a documented DISK end-to-end export pathway (changelog entry 30 Jun 2023); DISK+LightGlue is the **second-cleanest LightGlue extractor sibling for Jetson deployment** after SP+LightGlue (which has the most-mature ONNX/TensorRT pathway via 28 Jun 2023 changelog) but **before ALIKED+LightGlue (export-absent in LightGlue-ONNX)**. - **Summary**: DISK is the canonical **RL-policy-gradient sparse-keypoint-and-descriptor extraction network** introduced by Tyszkiewicz, Fua, and Trulls (NeurIPS 2020), with **end-to-end RL training of detection + description as its main contribution** — uses policy gradient (REINFORCE-class) to optimize directly for the high-level downstream objective of "many correct feature matches between image pairs", relaxing the discreteness barrier that prior end-to-end methods (SuperPoint, R2D2, D2-Net) approximated with surrogate losses. **CRITICAL LICENSE FINDING**: Apache-2.0 (confirmed via GitHub API metadata `license.spdx_id: "Apache-2.0"`) — permissive, BSD/permissive license track on the extractor; **paired with cvg/LightGlue's Apache-2.0 matcher** (Source #70) → **Apache-2.0 license track THROUGHOUT the DISK+LightGlue stack**. 
This makes DISK+LightGlue the **cleanest license-compliant LightGlue-extractor-sibling alternative to canonical SP+LightGlue's Magic-Leap-restrictive-extractor-weights HARD DISQUALIFIER** (vs ALIKED+LightGlue's BSD-3-Clause + Apache-2.0 mixed track which is also clean BSD/permissive but adds the export-pathway gap). **Architecture (paper §3 + ALIKED paper Table III cross-cite)**: 4-layer U-Net feature encoder with deformable convolutions in the bottleneck → score head (DKD-class) for keypoint detection → per-pixel dense descriptor head producing 128-D L2-normalized descriptors. Image dimensions must be a multiple of 16 due to U-Net's 4 downsampling stages (auto-padded preserving aspect ratio in the canonical CLI; auto-handled in cvg/LightGlue port via `pad_if_not_divisible=True`). **Two pretrained checkpoints distributed**: `save-depth.pth` (depth-based RL reward, default and best variant per paper) + `save-epipolar.pth` (epipolar reward, supplementary material variant). **Reported headline performance vs SOTA on IMW2020 (paper Table 1, 2k features)**: DISK 0.51315 stereo AUC + 0.72705 multiview AUC (canonical paper schedule) — **best single-extractor result on IMW2020 stereo at 2020 publication time**. **vs SuperPoint** (1.301M params / 26.11 GFLOPs): DISK has 1.092M params / 98.97 GFLOPs = **3.8× higher GFLOPs**; trades higher compute cost for higher matching accuracy. **vs ALIKED-N(16)** (0.677M params / 4.05 GFLOPs / 77.40 FPS RTX 2060): DISK has 1.092M params / 98.97 GFLOPs / 11.81 FPS — **24.4× higher GFLOPs / 6.6× lower FPS** but with +3.16 absolute MMA@3 on HPatches (per ALIKED paper Table III). **Aachen Day-Night visual relocalization (ALIKED paper Table VII, up to 2048 keypoints, mNN matcher)**: DISK 70.4/82.7/94.9 at (0.25m,2°)/(0.5m,5°)/(5m,10°) — beats SuperPoint (69.4/78.6/87.8) by +1.0/+4.1/+7.1 absolute across the three tiers (strictest to loosest), but trails ALIKED-N(32) (77.6/88.8/100.0) by 7.2/6.1/5.1 absolute.
**However, when paired with LightGlue matcher** (Source #71 paper Appendix A Table 6): **DISK+LightGlue stereo AUC@5° on IMC 2020 = 67.02 vs SP+LightGlue 59.03 = +7.99 absolute documentary technical superiority** + **DISK+LightGlue stereo AUC@10° on IMC 2020 = 83.45 vs SP+LightGlue 77.96 = +5.49 absolute** — **DISK+LightGlue is the demonstrably best documented LightGlue-extractor-sibling on phototourism stereo**. **No direct DISK+LightGlue Aachen Day-Night number** in either canonical paper (the cvg/LightGlue paper Table 3 documents only SP+LightGlue Aachen results); transitive lineage suggests DISK+LightGlue Aachen Day-Night should lift DISK+mNN's 70.4/82.7/94.9 by similar relative margin as LightGlue lifts SP over SP+mNN (paper §5.4 ~10-15 absolute lift across tiers expected, putting DISK+LightGlue Aachen Day at approximately 80-85/93-95/99-100 — competitive with SP+LightGlue's 89.2/95.4/98.5 but with more-uncertain documentary basis). **Training paradigm**: REINFORCE-class policy gradient with `inverse_T = θ_M` annealed from 15 to 50 over 20 epochs; depth-based reward = number of feature matches consistent with ground-truth depth maps (preferred to epipolar reward). Canonical training time = ~2 weeks on 32 GB V100; low-GPU-memory variant (12 GB) takes ~2 weeks at smaller batch/chunk size. **Custom dataset support**: ships `colmap/colmap2dataset.py` to import COLMAP outputs into DISK training format — directly applicable to project-side D-C2-1 = (a) aerial-retrain workflow (run COLMAP on AerialVL or Derkachi-flight scenes → import into DISK format → train DISK on aerial-nadir corpus). 
**Note on training cost**: DISK's RL-based training is more compute-intensive than ALIKED's sparse-NRE-loss training (paper §V — ALIKED reduces GPU memory by ~3.5× vs DISK's RL training); for the project's D-C2-1 retrain decision, DISK is **less retrain-friendly** than ALIKED at the GPU-memory level but **more retrain-friendly** than SP-reproduction (which would require Magic-Leap's Homographic Adaptation training pipeline + LICENSE clearance). **Kornia integration**: cvg/LightGlue's `lightglue/disk.py` port uses `kornia.feature.DISK.from_pretrained("depth")` — kornia auto-downloads the canonical `save-depth.pth` weights from kornia's model registry on first instantiation; no manual checkpoint download required. **LightGlue-ONNX support**: Source #73 ships DISK end-to-end ONNX export pathway documented in the 30 Jun 2023 changelog; CLI commands parallel SP+LightGlue export (`lightglue-onnx export disk_lightglue ...`). **Modern lineage**: DISK is the strict successor to SuperPoint (CVPR Workshop 2018) on the RL-trained-end-to-end axis (vs SuperPoint's Homographic-Adaptation-trained-with-surrogate-losses axis); the ALIKED paper (Source #75) positions itself as a successor to both DISK and SuperPoint; modern community evaluations (kornia, hloc, Image Matching Workshop competitions) consistently report DISK+LightGlue as a competitive top-3 sparse-matcher pipeline. 
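The multiple-of-16 constraint noted in the Summary follows from the four 2× downsampling stages (2^4 = 16). A minimal sketch of the padding arithmetic is given below. The helper name `pad_to_multiple` and the choice to pad on the bottom and right edges are illustrative assumptions, not the behaviour of the canonical CLI or of `pad_if_not_divisible=True` in the cvg/LightGlue port.

```python
from typing import Tuple


def pad_to_multiple(height: int, width: int, multiple: int = 16) -> Tuple[int, int, int, int]:
    """Return (padded_height, padded_width, pad_bottom, pad_right).

    Hypothetical helper: computes the smallest padding that makes both
    image sides divisible by `multiple` (16 = 2**4 for four 2x stages).
    """
    if height <= 0 or width <= 0 or multiple <= 0:
        raise ValueError("height, width and multiple must be positive")
    # (-x) % m is the distance from x up to the next multiple of m (0 if aligned).
    pad_bottom = (-height) % multiple
    pad_right = (-width) % multiple
    return height + pad_bottom, width + pad_right, pad_bottom, pad_right
```

For a 1024×683 frame (height 1024, width 683) this yields a padded size of 1024×688, i.e. 5 extra columns and no extra rows.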
- **Related Sub-question**: SQ3+SQ4 / C3 — DISK+LightGlue per-mode API capability verification (Mandatory `context7` lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + GitHub API license metadata WebFetch + canonical paper WebFetch [Source #77] + cvg/LightGlue `lightglue/disk.py` source code inspection [transitively cited via Source #70] + LightGlue-ONNX DISK-export-PRESENT finding [transitively cited via Source #73]); **D-C3-1 RECOMMENDED-PRIMARY-MITIGATION candidate** (Apache-2.0 throughout, demonstrably technically superior to canonical SP+LightGlue on phototourism stereo via paper Table 6 +7.99 absolute AUC@5° lift, and Jetson-deployment-ready via LightGlue-ONNX TensorRT pathway); reaffirms D-C2-1 reuse (canonical training on MegaDepth phototourism is NOT aerial nadir); reaffirms D-C3-2 LightGlue-inference-runtime choice with PREFERRED ONNX Runtime + TensorRT EP path for DISK+LightGlue on Jetson Orin Nano Super

### Source #77
- **Title**: DISK canonical paper — "DISK: Learning local features with policy gradient" (Tyszkiewicz, Fua, Trulls — Advances in Neural Information Processing Systems vol. 33, 2020, arXiv:2006.13566)
- **Link**: arXiv abstract https://arxiv.org/abs/2006.13566 (June 2020); arXiv full PDF https://arxiv.org/pdf/2006.13566.pdf ; NeurIPS 2020 proceedings https://proceedings.neurips.cc/paper/2020/hash/a42a596fc71e17828440030074d15e74-Abstract.html ; accessed 2026-05-08
- **Tier**: L1 (peer-reviewed NeurIPS 2020 + canonical implementation cross-referenced; documented modern competitive RL-trained sparse-extractor reference; cited by 2021–2026 feature-matching papers as the "policy-gradient end-to-end sparse extractor" and the "MegaDepth-trained dense-descriptor sparse-extractor"; included in cvg/LightGlue's canonical 5-extractor lineup with explicit Appendix A Table 6 documentary superiority over canonical SP+LightGlue)
- **Publication Date**: arXiv preprint 2020-06-24 (v1); NeurIPS 2020 publication December 2020
- **Timeliness Status**: ✅ Within Critical-novelty window (2020 — established competitive ground for sparse-extractor reference); Established-competitive-modern-extractor exemption applies (DISK is the canonical RL-policy-gradient sparse extractor reference, with its main innovation being the bridging of the training/inference discreteness gap; ALIKED paper §I + Source #71 LightGlue paper §3 both cite DISK as the modern competitive baseline)
- **Version Info**: arXiv v1 (June 2020, NeurIPS 2020 camera-ready). **Title note**: arXiv title "Local feature detection and description with policy gradient" is the original arXiv submission title; NeurIPS 2020 camera-ready title was changed to "DISK: Learning local features with policy gradient" (the canonical title used in the canonical README citation).
**Paper §1–2 Introduction + Related Work**: position DISK as the bridge between fully-end-to-end-trainable methods (with surrogate losses) and RL-based-detection methods (which had been limited to detection-only with hand-crafted descriptors due to weak training signal); DISK's main RL-training contribution is the relaxation to a "find many correct feature matches" surrogate objective that allows robust training from scratch with policy gradient. **Paper §3 Related Work + §4 Method**: 4-layer U-Net architecture; per-pixel dense descriptor head; per-pixel scoring head; training via policy gradient with depth-based or epipolar reward; `inverse_T = θ_M` matching temperature scheduling. **Paper §5 Experiments**: HPatches MMA@3 (vs SuperPoint, R2D2, D2-Net, ASLFeat — competitive top-tier; paper Figure 5 cached in canonical repo `results/hpatches/`); IMW2020 test set stereo + multiview AUC numbers (best single-extractor result at publication time per paper Table 1); 3D reconstruction quality on IMW competition images.
- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer
- **Research Boundary Match**: **Full match** for the algorithm (DISK with 4-layer U-Net + deformable bottleneck + per-pixel dense descriptor head + RL-policy-gradient training); **partial match** for the project's domain (paper benchmarks: HPatches Figure 5 [planar scenes with illumination + viewpoint changes], IMW2020 stereo + multiview reconstruction [phototourism dataset]; **NO aerial nadir benchmark** in the canonical paper; **NO Aachen Day-Night benchmark** in the canonical paper — Aachen results for DISK come from cross-paper evaluation in the ALIKED paper Source #75 Table VII via mNN matcher, plus from Source #71 LightGlue paper Appendix A which documents DISK+LightGlue on IMC 2020/2021/2023 + Aachen as cross-source evaluation).
**Critical paper §3 reference position**: DISK is positioned as the **RL-policy-gradient end-to-end sparse extractor that closes the training/inference discreteness gap**; the paper explicitly positions itself against (a) surrogate-loss methods (SuperPoint, R2D2, D2-Net) that approximate the matching objective and (b) Q-learning methods (GLAMpoints, Reinforced Feature Points) that rely on hand-crafted descriptors or pre-trained components. DISK's contribution is to combine policy gradient with end-to-end-learnable description for the first time at competitive accuracy. - **Summary**: The canonical paper introduces DISK with three core contributions: (i) **Policy-gradient end-to-end training of detection + description** that closes the training/inference discreteness gap that surrogate-loss methods approximate; (ii) **Surrogate objective "find many correct feature matches"** that gives stable RL training signal — enables training from scratch (vs Reinforced Feature Points which requires SuperPoint pre-training); (iii) **Inverse-softmax matching temperature `θ_M` scheduling** annealed from 15 to 50 over 20 epochs to bridge the RL stochasticity-to-discreteness gap. **Reported headline performance on IMW2020 (paper Table 1, 2k features)**: DISK 0.51315 stereo AUC + 0.72705 multiview AUC — best single-extractor result at 2020 publication time. **HPatches MMA (paper Figure 5)**: competitive with SuperPoint, R2D2, D2-Net, ASLFeat (cached results available in canonical repo `results/hpatches/` for cross-paper comparison). **By transitive cross-paper lineage**: ALIKED paper Source #75 Table III = DISK 1.092M params / 98.97 GFLOPs / 11.81 FPS RTX 2060 / MMA@3=77.59% / MHA@3=70.56% (vs SuperPoint 1.301M params / 26.11 GFLOPs / 52.63 FPS / MMA@3=65.37 / MHA@3=70.19; vs ALIKED-N(16) 0.677M params / 4.05 GFLOPs / 77.40 FPS / MMA@3=74.43 / MHA@3=77.22) — DISK has highest MMA@3 (best per-pixel matching accuracy among the three) but lowest FPS due to dense descriptor head. 
**CRITICAL OBSERVATION FOR THE PROJECT**: cvg/LightGlue paper Source #71 Appendix A Table 6 documents DISK+LightGlue stereo AUC@5° on IMC 2020 = 67.02 vs SP+LightGlue 59.03 = **+7.99 absolute documentary technical superiority** + DISK+LightGlue stereo AUC@10° = 83.45 vs SP+LightGlue 77.96 = **+5.49 absolute** — DISK+LightGlue is the **demonstrably best documented LightGlue-extractor-sibling on phototourism stereo with full Apache-2.0 license track preservation throughout**. **License**: Apache-2.0 via Source #76 — canonical implementation. **Limitations (paper §6 + cross-paper observations)**: DISK's RL-policy-gradient training is computationally expensive (~2 weeks on 32 GB V100; ~2 weeks at smaller batch/chunk size on 12 GB GPUs for low-memory training); DISK's 98.97 GFLOPs at 640×480 + 1k keypoints is the **highest among modern competitive sparse extractors** (24.4× higher than ALIKED-N(16); 3.8× higher than SuperPoint) — partial mitigation via LightGlue-ONNX TensorRT acceleration pathway (Source #73 + Source #76); aerial-domain caveat shared with all C-row components (D-C2-1 reuse — canonical training is on MegaDepth phototourism via depth-map-supervised RL, NOT aerial nadir). 
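The ratios and deltas quoted in this Summary are simple quotients and differences of the cited table figures. The sketch below recomputes them using only numbers already recorded in this registry (ALIKED paper Table III and LightGlue paper Appendix A Table 6); the dictionary and function names are illustrative, not part of any cited codebase.

```python
from typing import Dict, Tuple

# Figures as quoted in this registry (640x480 input, 1k keypoints, RTX 2060).
EXTRACTORS: Dict[str, Dict[str, float]] = {
    "DISK": {"gflops": 98.97, "fps": 11.81, "mma3": 77.59},
    "SuperPoint": {"gflops": 26.11, "fps": 52.63, "mma3": 65.37},
    "ALIKED-N(16)": {"gflops": 4.05, "fps": 77.40, "mma3": 74.43},
}

# IMC 2020 stereo (AUC@5deg, AUC@10deg) as quoted from LightGlue paper Table 6.
IMC2020_STEREO_AUC: Dict[str, Tuple[float, float]] = {
    "DISK+LightGlue": (67.02, 83.45),
    "SP+LightGlue": (59.03, 77.96),
}


def gflops_ratio(a: str, b: str) -> float:
    """How many times more GFLOPs extractor `a` needs than extractor `b`."""
    return EXTRACTORS[a]["gflops"] / EXTRACTORS[b]["gflops"]


def ms_per_frame(name: str) -> float:
    """Extraction time in milliseconds implied by the quoted FPS."""
    return 1000.0 / EXTRACTORS[name]["fps"]


def auc_delta(a: str, b: str) -> Tuple[float, float]:
    """Absolute AUC@5deg / AUC@10deg difference of pipeline `a` over `b`."""
    (a5, a10), (b5, b10) = IMC2020_STEREO_AUC[a], IMC2020_STEREO_AUC[b]
    return round(a5 - b5, 2), round(a10 - b10, 2)
```

With these inputs the GFLOPs ratios round to 24.4× (vs ALIKED-N(16)) and 3.8× (vs SuperPoint), `ms_per_frame("DISK")` is about 84.7 ms, and the AUC deltas are +7.99 and +5.49.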
- **Related Sub-question**: SQ3+SQ4 / C3 — DISK+LightGlue per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + RL training paradigm; documents the IMW2020 stereo + multiview reconstruction documentary advantage of DISK over SuperPoint at canonical paper publication time, plus the cross-paper LightGlue paper Appendix A Table 6 DISK+LightGlue +7.99 absolute AUC@5° superiority over canonical SP+LightGlue on IMC 2020 stereo as the **strongest documentary technical-superiority signal** for the D-C3-1 RECOMMENDED-PRIMARY-mitigation lock; aerial-domain caveat documented; D-C3-1 RECOMMENDED-PRIMARY-mitigation status confirmed)

### Source #78
- **Title**: SuperGlue canonical implementation — `magicleap/SuperGluePretrainedNetwork` (Sarlin, DeTone, Malisiewicz, Rabinovich — CVPR 2020 Oral) — official PyTorch reference implementation, README + Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement (identical wording to Source #72 SuperPoint LICENSE), `demo_superglue.py` (live webcam demo) + `match_pairs.py` (image-pair matching + evaluation) runnable inference scripts, two pretrained checkpoints `superglue_indoor.pth` (ScanNet-trained) + `superglue_outdoor.pth` (MegaDepth-trained), inference-only release (training code explicitly NOT released per README "We do not intend to release the SuperGlue training code"); SuperGlue is paired exclusively with canonical SuperPoint extractor (no SIFT-based or homography variants released per README) — both inherit Magic Leap restrictive license
- **Link**: README https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/master/README.md (accessed 2026-05-08); LICENSE https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/master/LICENSE (accessed 2026-05-08; **Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" SLA = identical wording to Source #72 SuperPoint LICENSE**); GitHub API license metadata https://api.github.com/repos/magicleap/SuperGluePretrainedNetwork (accessed 2026-05-08; `license.spdx_id: "NOASSERTION"` confirming non-OSI-approved license); repo https://github.com/magicleap/SuperGluePretrainedNetwork (4005 stars, 761 forks, last pushed 2024-08-30 — mature reference codebase status)
- **Tier**: L1 (project-official codebase by the canonical SuperGlue authors Paul-Edouard Sarlin + Daniel DeTone + Tomasz Malisiewicz + Andrew Rabinovich, Magic Leap; same author group as Source #72 SuperPoint canonical; CVPR 2020 Oral publication; canonical implementation referenced by every subsequent feature-matching paper as the "long-established graph-neural-network sparse matcher reference baseline"; explicitly displaced by LightGlue per Source #71 paper §5 documentary 4-10× speedup at competitive accuracy)
- **Publication Date**: CVPR 2020 (paper accepted Oral track); arXiv preprint v1 2019-11-26; canonical repo creation 2020-03-17; last pushed 2024-08-30 (4 years of stable maintenance for inference-only codebase, no training-code release ever)
- **Timeliness Status**: ✅ Within Established-baseline-reference window (2020 publication; the long-established graph-neural-network sparse matcher reference baseline that defines the mandatory-simple-baseline role per the engine Component Option Breadth rule); Established-competitive-mandatory-baseline exemption applies (SuperGlue is the **canonical sparse-matcher mandatory-simple-baseline reference** for the C3 row; cited as the displaced reference in Source #71 LightGlue paper §1 + §5 + Appendix A; cited in every modern feature-matching paper as the predecessor that LightGlue exceeds)
- **Version Info**: master HEAD at access time (last pushed 2024-08-30).
**Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS** — `context7 resolve-library-id` returned no relevant matches for "SuperGlue" feature matcher (top-result was Superglue API orchestration which is unrelated to feature-matching); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical magicleap/SuperGluePretrainedNetwork README + LICENSE + GitHub API license metadata was used. **Two SuperGlue pretrained checkpoints**: `superglue_indoor.pth` (ScanNet-trained, default; recommended config `--resize 640 --superglue indoor --max_keypoints 1024 --nms_radius 4`) + `superglue_outdoor.pth` (MegaDepth-trained; recommended config `--resize 1600 --superglue outdoor --max_keypoints 2048 --nms_radius 3 --resize_float`). **Operating bounds**: README explicitly recommends NOT running SuperPoint+SuperGlue below 160×120 resolution (QQVGA) or above 2000×1500. **CLI structure**: two main top-level scripts — `demo_superglue.py` (live webcam/IP-camera/directory/movie demo with keyboard control) + `match_pairs.py` (image-pair matching from text-file list + optional evaluation if ground truth provided). **Output dict format** (per `.npz` file structure documented in README): `{keypoints0: (N0, 2), keypoints1: (N1, 2), matches: (N0,) array of indices into keypoints1 with -1 for unmatched, match_confidence: (N0,)}`; the optional `--eval` mode adds `{error_t, error_R, precision, matching_score, num_correct, epipolar_errors}`. **Architecture** (per README): "Graph Neural Network combined with an Optimal Matching layer that is trained to perform matching on two sets of sparse image features" — operates as a "middle-end" performing context aggregation + matching + filtering in a single end-to-end architecture. 
**Pairing**: SuperGlue is **paired exclusively with canonical SuperPoint extractor**; README explicitly states "We do not intend to release the SIFT-based or homography SuperGlue models" — there is NO non-Magic-Leap-extractor variant of canonical SuperGlue. **CRITICAL RETRAIN BLOCKER**: README explicitly states "We do not intend to release the SuperGlue training code" — **training code is NOT released**, blocking any project-side D-C2-1 retrain decision for SuperGlue+SuperPoint pinned mode. **Documentary results** (per README evaluation tables): ScanNet test set (1500 indoor pairs) AUC@5/10/20 = **16.12/33.76/51.79**, Prec=84.37, MScore=31.14; YFCC test set (4000 outdoor pairs) AUC@5/10/20 = **39.02/59.51/75.72**, Prec=98.72, MScore=23.61. **Phototourism evaluation** is mentioned but not directly reproducible (Image Matching Challenge 2020 keeps test set ground truth private). **Hloc integration**: README explicitly cross-references `cvg/Hierarchical-Localization` (hloc) toolbox where SuperGlue is the canonical matcher prior to LightGlue's release; "Winner of 3 CVPR 2020 competitions on localization and image matching!" per README. - **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 + D-C3-1 — same Magic Leap restrictive HARD DISQUALIFIER as canonical SP+LightGlue applies to canonical SuperGlue+SuperPoint) - **Research Boundary Match**: **Full match** for the project's pinned mode of SuperGlue+SuperPoint mandatory-simple-baseline reference (single-image-pair sparse feature matching: take a UAV nadir frame + a retrieved satellite tile, run SuperPoint feature extraction on each independently, match via SuperGlue with `outdoor` checkpoint, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). 
The repo ships everything needed for **inference-only** evaluation: SuperPoint extractor (MagicLeap-pretrained, inherits Source #72 license), SuperGlue matcher (two pretrained checkpoints), `match_pairs.py` evaluation script, sample image pairs with ground truth. **Asymmetric image-pair sizes are handled natively** — same independent per-image extraction pattern as SP+LightGlue. **Partial match** for the project's domain (canonical training on **ScanNet indoor + MegaDepth phototourism outdoor scenes** — neither dataset is aerial nadir; **same aerial-domain caveat as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + C2 candidates**; aerial applicability is **NOT explicitly validated in the canonical paper or README** — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight; **D-C2-1 retrain decision is BLOCKED for SuperGlue+SuperPoint pinned mode** since training code is not released). **CRITICAL NEGATIVE finding for the role assessment**: SuperGlue+SuperPoint is **strictly inferior to SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue** for the project's deployment as a *Selected* candidate because: (i) **Same Magic Leap restrictive HARD DISQUALIFIER** as canonical SP+LightGlue (LICENSE wording is identical to Source #72) — blocks dual-use deployment; (ii) **No retrain capability** — training code explicitly not released; (iii) **4-10× slower than LightGlue** at similar accuracy per Source #71 paper §5 + Table 2 documentary evidence; (iv) **No alternative extractor** — paired exclusively with Magic Leap SuperPoint, no SIFT or homography variants released. **POSITIVE for the role**: SuperGlue+SuperPoint **IS** the canonical sparse-matcher mandatory-simple-baseline that the engine's Component Option Breadth rule requires to be cataloged — establishes the long-established reference floor against which modern leads (LightGlue, XFeat) must measurably exceed. 
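For feeding the downstream C4 pose estimator, the `.npz` layout described in the README (a `matches` array of indices into `keypoints1`, with -1 for unmatched keypoints, plus `match_confidence`) reduces to a list of 2D-2D correspondences. The sketch below uses plain Python sequences in place of the arrays; `matched_pairs` and its `min_confidence` parameter are hypothetical names introduced for illustration and are not part of the Magic Leap repo.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def matched_pairs(
    keypoints0: Sequence[Point],
    keypoints1: Sequence[Point],
    matches: Sequence[int],
    match_confidence: Sequence[float],
    min_confidence: float = 0.0,
) -> List[Tuple[Point, Point, float]]:
    """Turn the per-keypoint match indices into (point0, point1, confidence) triples.

    `matches[i]` is the index into `keypoints1` matched to `keypoints0[i]`,
    or -1 when keypoint i is unmatched (layout as described in the README).
    """
    if not (len(keypoints0) == len(matches) == len(match_confidence)):
        raise ValueError("keypoints0, matches and match_confidence must have equal length")
    pairs: List[Tuple[Point, Point, float]] = []
    for i, j in enumerate(matches):
        if j < 0:
            continue  # unmatched keypoint
        confidence = match_confidence[i]
        if confidence < min_confidence:
            continue  # optional, assumed filtering step before pose estimation
        pairs.append((keypoints0[i], keypoints1[j], confidence))
    return pairs
```

The confidence threshold is a project-side design choice assumed here for illustration; the README only documents the array layout.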
- **Summary**: SuperGlue is the canonical **graph-neural-network sparse matcher** introduced by Sarlin, DeTone, Malisiewicz, and Rabinovich (CVPR 2020 Oral), with **attentional graph neural network + optimal matching layer as its main contributions** — operates as a "middle-end" that takes two sets of SuperPoint keypoints + descriptors as input and outputs a soft assignment matrix between them; trained to perform matching end-to-end with attention-based context aggregation + Sinkhorn algorithm for optimal transport assignment. **CRITICAL LICENSE FINDING**: LICENSE file contents are **byte-for-byte identical** to Source #72 SuperPoint LICENSE = **Magic Leap "ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY" Software License Agreement** — non-OSI-approved (GitHub API `license.spdx_id: "NOASSERTION"`); same wording: "Licensee a personal, non-exclusive, non-transferable license to use the Software for noncommercial research purposes" + "You may not distribute, copy or use the Software except as explicitly permitted herein" + "You may not sell, rent, lease, sublicense, lend, time-share or transfer". **HARD DISQUALIFIER for canonical SuperGlue+SuperPoint mode in project's dual-use deployment context** (eastern/southern Ukraine fixed-wing UAV with AC-NEW-2 spoofing-promotion path is dual-use military by every reasonable interpretation, and the project's question_decomposition.md hard disqualifier list includes "anything whose license blocks military / dual-use deployment"). **Training code explicitly NOT released** per README — D-C2-1 retrain decision is **BLOCKED** for SuperGlue+SuperPoint pinned mode. 
**Documentary headline performance vs LightGlue** (per Source #71 paper §5 + Table 2 cross-cite): SuperGlue is **4-10× slower than LightGlue** at competitive but slightly lower accuracy; LightGlue paper Table 2 documents SP+LightGlue MegaDepth-1500 AUC@5°/10°/20° = 66.7/79.3/87.9 at 44.2 ms standard / 31.4 ms adaptive RTX 3080, vs SP+SuperGlue at slightly lower AUC + 4-10× slower runtime. **Documentary results in canonical README**: ScanNet test (indoor, 1500 pairs) AUC@5/10/20 = 16.12/33.76/51.79; YFCC test (outdoor, 4000 pairs) AUC@5/10/20 = 39.02/59.51/75.72. **Architecture**: graph neural network with self-attention + cross-attention layers (paper §3.1) + optimal matching layer with dustbin (paper §3.2) + Sinkhorn algorithm for soft assignment (paper §3.2.2). **Modern lineage**: SuperGlue (CVPR 2020) is the predecessor of LightGlue (ICCV 2023, Source #70 + #71); SuperGlue is also the predecessor of SuperGlue-LoRA + LoFTR + DKM + RoMa + MASt3R successor lineage — but per Fact #26 NGPS pre-screen template, dense matchers (LoFTR, DKM, RoMa, MASt3R) are pruned outright on AC-4.1 Jetson dense-matcher-latency disqualifier. **Limitations**: (a) Magic Leap restrictive license HARD DISQUALIFIER (same as Source #72); (b) no training-code release blocks aerial-domain retrain; (c) 4-10× slower than LightGlue; (d) paired exclusively with Magic Leap SuperPoint extractor (no Apache-2.0 / BSD-3-Clause extractor pairing variants released); (e) no FlashAttention support (LightGlue's structural advantage); (f) no adaptive-depth/adaptive-width pruning (LightGlue's structural advantage paper §3.3); (g) no canonical Jetson ONNX/TensorRT export pathway in the LightGlue-ONNX equivalent project (SuperGlue's ONNX export is community-maintained third-party, not productized). 
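The Sinkhorn step with a dustbin, named in the Architecture note above, can be illustrated in a few lines. This is a simplified sketch in the plain probability domain under stated assumptions: the function name `sinkhorn_with_dustbin`, the fixed `dustbin_score`, and the iteration count are illustrative choices, and the code is not the Magic Leap implementation. It augments an M×N score matrix with one dustbin row and column, gives each keypoint unit mass and the dustbins mass N and M respectively, and alternates row and column rescaling.

```python
import math
from typing import List, Sequence


def sinkhorn_with_dustbin(
    scores: Sequence[Sequence[float]],
    dustbin_score: float,
    iterations: int = 100,
) -> List[List[float]]:
    """Return the (M+1) x (N+1) soft assignment for an M x N score matrix."""
    m = len(scores)
    n = len(scores[0]) if m else 0
    if m == 0 or n == 0 or any(len(row) != n for row in scores):
        raise ValueError("scores must be a non-empty rectangular matrix")

    # Append a dustbin column to every row, then a full dustbin row.
    augmented = [list(row) + [dustbin_score] for row in scores]
    augmented.append([dustbin_score] * (n + 1))
    kernel = [[math.exp(s) for s in row] for row in augmented]

    # Target marginals: each keypoint carries mass 1; dustbins absorb the rest.
    row_mass = [1.0] * m + [float(n)]
    col_mass = [1.0] * n + [float(m)]

    u = [1.0] * (m + 1)
    v = [1.0] * (n + 1)
    for _ in range(iterations):
        # Rescale rows to match row_mass, then columns to match col_mass.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(n + 1)) for i in range(m + 1)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(m + 1)) for j in range(n + 1)]

    return [[u[i] * kernel[i][j] * v[j] for j in range(n + 1)] for i in range(m + 1)]
```

After convergence each keypoint row and column sums to approximately 1, with the mass not assigned to a real match going to the dustbin entry; a practical implementation would typically work in log space for numerical stability.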
- **Related Sub-question**: SQ3+SQ4 / C3 — SuperGlue+SuperPoint mandatory-simple-baseline per-mode API capability verification (Mandatory `context7` lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + LICENSE WebFetch + canonical paper [Source #79] + GitHub API license metadata; **HARD-LICENSE-DISQUALIFIER applies to canonical SuperGlue+SuperPoint mode in project's dual-use deployment context** — same Magic Leap restrictive license as canonical SP+LightGlue; **TRAINING-CODE-NOT-RELEASED** blocks D-C2-1 retrain decision; **role per engine Component Option Breadth rule = mandatory-simple-baseline reference floor**, i.e. the long-established sparse-matcher reference that modern leads must measurably exceed; documented Recall@K + AUC consistently 1-3 points (absolute) below LightGlue across HPatches / MegaDepth / Aachen / IMC at 4-10× slower runtime per Source #71)

### Source #79

- **Title**: SuperGlue canonical paper — "SuperGlue: Learning Feature Matching with Graph Neural Networks" (Sarlin, DeTone, Malisiewicz, Rabinovich — CVPR 2020 Oral, arXiv:1911.11763)
- **Link**: arXiv abstract https://arxiv.org/abs/1911.11763 (November 2019; CVPR 2020 camera-ready); arXiv full PDF https://arxiv.org/pdf/1911.11763.pdf ; CVPR 2020 proceedings https://openaccess.thecvf.com/content_CVPR_2020/papers/Sarlin_SuperGlue_Learning_Feature_Matching_With_Graph_Neural_Networks_CVPR_2020_paper.pdf ; psarlin.com/superglue (project website with videos, slides, recent updates); accessed 2026-05-08
- **Tier**: L1 (peer-reviewed CVPR 2020 Oral + canonical implementation cross-referenced; documented predecessor of LightGlue per Source #71 paper §1; cited by 2020-2026 feature-matching papers as the "graph-neural-network sparse matcher reference baseline"; winner of 3 CVPR 2020 competitions on localization and image matching per Source #78 README)
- **Publication Date**: arXiv preprint 2019-11-26 (v1); CVPR 2020 publication
June 2020 (Oral track)
- **Timeliness Status**: ✅ Within Established-baseline-reference window (2020 — established competitive ground for sparse-matcher reference; Established-competitive-mandatory-baseline exemption applies — SuperGlue is the canonical sparse-matcher reference baseline that defines the mandatory-simple-baseline role for the C3 row per the engine Component Option Breadth rule)
- **Version Info**: arXiv v1 (November 2019, CVPR 2020 Oral camera-ready). **Paper §3 architecture**: Attentional Graph Neural Network with bidirectional self-attention and cross-attention layers + Optimal Matching Layer with dustbin handling + Sinkhorn algorithm for differentiable optimal transport assignment. **Paper §4 training**: end-to-end training with sparse keypoint correspondence supervision; ScanNet for indoor model; MegaDepth for outdoor model; trained checkpoints are released, but training code is NOT released (per Source #78 README). **Paper §5 experiments**: ScanNet indoor pose estimation (Table 1; outperforms ratio test, mutual nearest-neighbor, OANet, NN-RANSAC); YFCC outdoor pose estimation (Table 2; outperforms same set of baselines); Phototourism reconstruction (Table 3; competitive with NN+RANSAC + GMS at higher pose accuracy); HPatches homography estimation (Table 4; matches the displacement-only state-of-the-art).
- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer
- **Research Boundary Match**: **Full match** for the algorithm (SuperGlue with attentional GNN + Sinkhorn optimal matching layer + dustbin handling); **partial match** for the project's domain (paper benchmarks: ScanNet indoor, YFCC outdoor, Phototourism reconstruction, HPatches homography — **NO aerial nadir benchmark** in the canonical paper).
**Critical paper §5 reference position**: SuperGlue is positioned as **the** sparse-matcher state-of-the-art at 2020 publication time, displacing classical mutual nearest-neighbor + RANSAC baselines + earlier learned matchers (OANet, NG-RANSAC, ACNe). The paper's main contribution is closing the gap between hand-crafted matchers (which have well-defined fallback semantics) and learned matchers (which prior to SuperGlue often degraded out-of-domain). **For the project's mandatory-simple-baseline role**: SuperGlue+SuperPoint is the **long-established sparse-matcher reference baseline that defines the simple-baseline floor** that modern leads (LightGlue, XFeat) must measurably exceed.
- **Summary**: The canonical paper introduces SuperGlue with three core contributions: (i) **Attentional Graph Neural Network with self-attention + cross-attention** that aggregates context across the matching image pair, allowing each keypoint to be matched based on both intra-image and inter-image structure; (ii) **Optimal Matching Layer with dustbin handling** (paper §3.2) that produces a soft assignment matrix where unmatched keypoints are assigned to a dustbin; (iii) **Sinkhorn algorithm for differentiable optimal transport** (paper §3.2.2) that allows end-to-end training of the entire matching pipeline. **Documentary headline results** (paper §5): ScanNet indoor pose estimation Table 1 outperforms NN+RANSAC + ratio test + OANet across all AUC tiers; YFCC outdoor pose estimation Table 2 outperforms same baselines; Phototourism reconstruction Table 3 competitive with state-of-the-art at higher pose accuracy; HPatches homography estimation Table 4 matches state-of-the-art.
**Documentary cross-reference with LightGlue** (Source #71 paper Table 2 cross-cite): **LightGlue is 4-10× faster than SuperGlue at competitive accuracy** — SP+LightGlue MegaDepth-1500 AUC@5°/10°/20°=66.7/79.3/87.9 at 44.2 ms standard / 31.4 ms adaptive RTX 3080 vs SP+SuperGlue similar AUC at 4-10× slower runtime; the LightGlue paper §1 explicitly positions LightGlue as the displacement of SuperGlue in the canonical NetVLAD top-K → sparse matcher → PnP+RANSAC pipeline shape. **License**: Magic Leap restrictive via Source #78 — canonical implementation. **Limitations**: (a) Magic Leap restrictive license HARD DISQUALIFIER (same as Source #72 SuperPoint and Source #78 SuperGlue); (b) no training-code release per Source #78 README blocks D-C2-1 retrain; (c) displaced by LightGlue per Source #71 paper §5 + Table 2 documentary 4-10× speedup at competitive accuracy; (d) paired exclusively with canonical SuperPoint extractor (no SIFT or homography variants released); (e) no FlashAttention or adaptive-depth/adaptive-width pruning structural advantages; (f) no productized Jetson ONNX/TensorRT export pathway (SuperGlue ONNX export is community-maintained third-party, not productized in the LightGlue-ONNX equivalent project). 
- **Related Sub-question**: SQ3+SQ4 / C3 — SuperGlue+SuperPoint mandatory-simple-baseline per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + training paradigm; documents the displaced-by-LightGlue reference position as the **long-established sparse-matcher reference baseline** that defines the simple-baseline floor for the C3 row per the engine Component Option Breadth rule; HARD-LICENSE-DISQUALIFIER applies + TRAINING-CODE-NOT-RELEASED blocks retrain; aerial-domain caveat documented; mandatory-simple-baseline role confirmed)

### Source #80

- **Title**: XFeat canonical implementation — `verlab/accelerated_features` (Potje, Cadar, Araujo, Martins, Nascimento — CVPR 2024) — official PyTorch reference implementation, README + Apache 2.0 LICENSE; minimalist 3-line inference API (`from modules.xfeat import XFeat; xfeat = XFeat(); output = xfeat.detectAndCompute(torch.randn(1,3,480,640), top_k=4096)[0]`); Torch Hub one-liner `torch.hub.load('verlab/accelerated_features', 'XFeat', pretrained=True, top_k=4096)`; two main inference modes — sparse (`xfeat`) + semi-dense (`xfeat-star`); training code released (notebook `XFeat_training_example.ipynb` + `python3 -m modules.training.train --training_type xfeat_default --megadepth_root_path <...> --synthetic_root_path <...> --ckpt_save_path <...>`); built-in evaluation harnesses (`python3 -m modules.eval.megadepth1500 --matcher xfeat` + `python3 -m modules.eval.scannet1500`); real-time webcam demo (`python3 realtime_demo.py --method XFeat`); **NEW: XFeat+LighterGlue** companion mode (~3× faster than original LightGlue, trained by VerLab using `cvg/glue-factory` library, distributed via `xfeat+lg_torch_hub.ipynb` notebook); kornia integration (acknowledged in README); **CRITICAL Contributing-section ask**: "Currently, it would be nice to have an export script to efficient deployment engines such as TensorRT and ONNX" — **ONNX/TensorRT export pathway is
COMMUNITY-CONTRIBUTION-NEEDED, NOT productized in canonical repo** (HARSHER D-C3-2 gate than DISK+LightGlue's well-documented LightGlue-ONNX TensorRT pathway, but TECHNICALLY SIMPLER than ALIKED+LightGlue's `torchvision.ops.deform_conv2d` ONNX-export blocker because XFeat is CNN-only with no deformable convolutions or unusual ops)
- **Link**: README https://raw.githubusercontent.com/verlab/accelerated_features/main/README.md (accessed 2026-05-08); GitHub API license metadata https://api.github.com/repos/verlab/accelerated_features (accessed 2026-05-08; `license.spdx_id: "Apache-2.0"`); repo https://github.com/verlab/accelerated_features (1614 stars, 207 forks, last pushed 2025-01-15 — actively maintained CVPR 2024 reference codebase with training-code-released + companion XFeat+LighterGlue + minimal-dependency PyTorch-only architecture); project page https://www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24/ ; HuggingFace Spaces demo https://huggingface.co/spaces/qubvel-hf/xfeat ; eight Colab notebooks distributed in-tree (including `minimal_example.ipynb`, `xfeat_matching.ipynb`, `xfeat_torch_hub.ipynb`, `XFeat_training_example.ipynb`, `xfeat+lg_torch_hub.ipynb`)
- **Tier**: L1 (project-official codebase by the canonical XFeat authors; CVPR 2024 publication acceptance; canonical implementation referenced in subsequent feature-matching papers as the modern-lightweight learned-feature reference; UFMG VerLab is the authors' affiliation and maintains the project; cross-affiliations span UFMG + Université de Bourgogne + Google Research + Université de Lorraine + Microsoft)
- **Publication Date**: CVPR 2024 paper acceptance + canonical repo creation 2024-04-15; last pushed 2025-01-15 (9 months of active maintenance, ongoing community contributions including XFeat+LighterGlue companion mode added post-paper-acceptance)
- **Timeliness Status**: ✅ Within Modern-competitive-lead window (2024 — modern competitive lightweight-CNN reference; XFeat is the modern-lightweight-CNN reference
baseline that defines the modern-lead role for the C3 row's lightweight-CNN axis)
- **Version Info**: main HEAD at access time (last pushed 2025-01-15). **Mode-enumeration query (1/3) — context7 NOT INDEXED + WebFetch fallback PASS** — `context7 resolve-library-id` returned `just-sultanov/xfeat` git-worktree-management CLI utility (UNRELATED to the canonical XFeat feature-matching library); per Per-Mode API Capability Verification rule item 2, fall-back to official-docs WebFetch on the canonical verlab/accelerated_features README + GitHub API license metadata + canonical paper (Source #81) was used. **Three primary inference modes**: (i) **XFeat sparse** with `top_k=4096` keypoints + 64-D float descriptors + Mutual Nearest Neighbor (MNN) matching; (ii) **XFeat\* semi-dense** with up to 10k features + 2-scale processing (0.65× + 1.3× input resize) + MNN + lightweight MLP-based offset refinement (offset prediction confidence threshold 0.2); (iii) **XFeat+LighterGlue** with VerLab-trained smaller LightGlue variant (~3× faster than original LightGlue per README claim). **Operating bounds**: README claims VGA real-time on Intel i5 CPU + 1,400 FPS batched RTX 4090 at VGA + 150 FPS single-batch RTX 4090 at VGA; paper Table 1 documents 27.1 FPS XFeat / 19.2 FPS XFeat\* on Intel i5-1135G7 CPU at VGA. **Supports gray-scale or RGB input** (paper §3.1 explicitly grayscale `H×W×C` with `C=1`; PyTorch tensor accepts `(B, 3, H, W)` per README minimal example). **CLI structure**: minimalist 3-line inference API + Torch Hub one-liner + 8 Colab notebooks + 3 evaluation scripts + 1 real-time webcam demo. **Output dict format**: per-image dict `{keypoints: (N, 2), scores: (N,), descriptors: (64, N) or (N, 64) depending on mode}` for sparse mode (XFeat); semi-dense mode (XFeat\*) adds `match_confidences` from MLP offset refinement.
**Architecture** (per paper §3 + README): featherweight CNN backbone with channel sequence `{4, 8, 24, 64, 64, 128}` (paper §3.1 triple-rate channel increase vs VGG's double-rate); 23 convolutional layers organized as 6 spatial-halving blocks + 2 fusion blocks; basic layer = Conv + ReLU + BatchNorm; **DECOUPLED keypoint detection branch** using 1×1 convolutions on 8×8 tensor-block-transformed image (paper §3.2 Keypoint Head); descriptor head = feature pyramid merging at 1/8, 1/16, 1/32 scales bilinearly upsampled to 1/8 + element-wise summation + fusion block; reliability map regression branch; match refinement module = lightweight MLP predicting 8×8 pixel-level offset distribution. **Pairing options** (per README): standalone XFeat sparse with MNN matching / standalone XFeat\* semi-dense with MNN+offset-refinement / **XFeat+LighterGlue** paired matcher (NEW companion mode using `cvg/glue-factory`-trained LighterGlue variant ~3× faster than canonical LightGlue per README claim). **Training**: explicitly distributed in-tree (`XFeat_training_example.ipynb`); training command `python3 -m modules.training.train --training_type xfeat_default --megadepth_root_path <...> --synthetic_root_path <...> --ckpt_save_path <...>`; uses MegaDepth + COCO_20k synthetic warped-pairs at 6:4 ratio per paper §3.3; training **on entry-level hardware** (paper §3.3 mentions 6.5 GB VRAM total + 36 hours on single RTX 4090 + batch size 10 + Adam optimizer LR 3e-4 + exponential decay 0.5 every 30k updates + convergence at 160k iterations). 
**Documentary results** (per paper Table 1, MegaDepth-1500 i5-1135G7 CPU VGA, AUC@5/10/20 + FPS): SuperPoint AUC@5/10/20 = 37.3/50.1/61.5 at 3.0 FPS (4096 kpts) / DISK = 53.8/65.9/75.0 at 1.2 FPS (4096 kpts) / DISK\* = 55.2/66.8/75.3 at 1.2 FPS (10k kpts) / ALIKE-Tiny = 49.4/61.8/71.4 at 5.3 FPS (4096 kpts) / **XFeat sparse = 42.6/56.4/67.7 at 27.1 FPS** (4096 kpts; **9× faster than SuperPoint at HIGHER AUC + 5× faster than ALIKE**) / **XFeat\* semi-dense = 50.2/65.4/77.1 at 19.2 FPS** (10k features; **comparable to DISK\* at 16× speedup**); paper Table 2 ScanNet-1500 indoor: **XFeat AUC@5=16.7 / XFeat\*=18.4** vs SuperPoint=12.5 / DISK=9.6/11.3 / ALIKE=8.0 — **XFeat outperforms ALL other methods on ScanNet indoor** despite all methods being trained on MegaDepth (paper Appendix E attributes this to hybrid MegaDepth+synthetic-warp-COCO training reducing landmark-dataset overfitting bias); paper Table 3 HPatches Homography MHA@3 Illumination/Viewpoint = **95.0/68.6** (XFeat) — best Illumination@3 in paper Table 3 across all methods including SuperPoint 94.6 and DISK 94.6.
- **Embedded/CPU deployment claim** (per paper Appendix C): on **Orange Pi Zero 3 (Cortex-A53 ARM, $28 device)** at 480×360 input, XFeat=**1.8 FPS** vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS — **XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device** without neural-network-inference optimization; this is the strongest documented embedded-deployment signal among all C3 candidates evaluated. Project's Jetson Orin Nano Super has dedicated GPU (1024-core Ampere) — XFeat on Jetson Orin Nano fp16 with TensorRT is expected to be **substantially faster** than on the Orange Pi Zero 3 ARM CPU, though this is an extrapolation rather than a measured result.
**Documentary headline performance vs LightGlue siblings** (per README MegaDepth-1500 cross-cite vs SP+LightGlue): XFeat+LighterGlue Fast (640, 1300 kpts) AUC@5/10/20 = **0.444/0.610/0.746** vs SP+LightGlue 0.469/0.633/0.762 (-2.5/-2.3/-1.6 AUC points); Accurate (1024, 4096 kpts) AUC@5/10/20 = **0.564/0.710/0.819** vs SP+LightGlue 0.591/0.738/0.841 (-2.7/-2.8/-2.2 AUC points) — XFeat+LighterGlue is **modestly below SP+LightGlue** at competitive accuracy + ~3× LighterGlue speedup.
- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — Apache-2.0 throughout = clean BSD/permissive track) + Plan-phase architect (modern-competitive-lead role for the C3 row's lightweight-CNN axis with strongest-documented-embedded-deployment signal among all C3 candidates evaluated)
- **Research Boundary Match**: **Full match** for the project's pinned mode of XFeat sparse / XFeat\* semi-dense / XFeat+LighterGlue paired matcher (single-image-pair sparse or semi-dense feature matching: take a UAV nadir frame + a retrieved satellite tile, run XFeat extractor on each independently, match via MNN sparse OR MLP-refinement-semi-dense OR LighterGlue-paired matcher, return 2D-2D correspondences with confidence scores feeding the project's downstream C4 PnP+RANSAC pose estimator). **Asymmetric image-pair sizes are handled natively** — same independent per-image extraction pattern as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue.
**Partial match** for the project's domain (canonical training on MegaDepth phototourism outdoor + COCO_20k synthetic-warp pairs — neither dataset is aerial nadir; **same aerial-domain caveat as SP+LightGlue + DISK+LightGlue + ALIKED+LightGlue + C2 candidates**; **D-C2-1 retrain decision REUSED** with **strongest retrain-friendliness signal among all C3 candidates evaluated** — paper §3.3 explicit "low memory usage of our method enables training on entry-level hardware, facilitating the fine-tuning or full training of our network for specific tasks and scene types" + 36 hours on single RTX 4090 + 6.5 GB VRAM total). **Aerial applicability is NOT explicitly validated in canonical paper or README** — project-side via Jetson MVE on AerialExtreMatch + Derkachi flight; **D-C2-1 retrain decision is materially less expensive** than DISK+LightGlue (~2 weeks 32 GB V100) + ALIKED+LightGlue (~24 hours RTX 3090). **CRITICAL POSITIVE finding**: XFeat is the only C3 candidate with **explicit documentation of CPU-real-time inference + embedded-device benchmarks** (paper Appendix C Orange Pi Zero 3 numbers); README explicitly states "Simple architecture components which facilitates deployment on embedded devices (jetson, raspberry pi, custom AI chips, etc..)" — **strongest embedded-deployment story among all C3 candidates evaluated**. **CRITICAL NEGATIVE finding**: NO PRODUCTIZED ONNX/TensorRT export pathway in canonical repo (README Contributing section explicit ask) — **D-C3-2 gate is HARSHER than DISK+LightGlue but TECHNICALLY SIMPLER than ALIKED+LightGlue** because XFeat is CNN-only with no deformable convolutions or unusual ops; project would need to invest custom-ONNX-export engineering effort but the architecture is straightforward (Conv + ReLU + BatchNorm only, no `torchvision.ops.deform_conv2d` blocker, no graph-neural-network attention export complexity). 
- **Summary**: XFeat is the canonical **lightweight-CNN learned feature extractor + matcher** introduced by Potje, Cadar, Araujo, Martins, and Nascimento (UFMG VerLab + Université de Bourgogne + Google Research + Université de Lorraine + Microsoft, CVPR 2024), with **three core contributions**: (i) lightweight CNN architecture with featherweight backbone using triple-rate channel increase (vs VGG's double-rate) channel sequence `{4, 8, 24, 64, 64, 128}` + 6 spatial-halving blocks + 2 fusion blocks + 23 total convolutional layers — designed for resource-constrained deployment without hardware-specific optimization; (ii) decoupled minimalist learnable keypoint detection branch using 1×1 convolutions on 8×8 tensor-block-transformed image with knowledge distillation from ALIKE-Tiny teacher; (iii) lightweight MLP-based match refinement module for pixel-level offsets from coarse semi-dense matches without high-resolution feature maps. **Two main inference modes**: **XFeat sparse** (top-K up to 4096 keypoints + 64-D float descriptors + MNN matching) and **XFeat\* semi-dense** (up to 10k features + 2-scale processing + MNN + MLP offset refinement). **NEW companion mode XFeat+LighterGlue** (VerLab-trained smaller LightGlue variant ~3× faster than original LightGlue per README claim, distributed in-tree via `xfeat+lg_torch_hub.ipynb`). **License: Apache 2.0** (canonical repo `license.spdx_id: "Apache-2.0"` per GitHub API metadata) — **clean BSD/permissive track throughout**, no copyleft + no Magic Leap restrictive disqualifier. 
**Documentary headline performance** (per paper Table 1 MegaDepth-1500 i5-1135G7 CPU VGA): **XFeat sparse AUC@5/10/20 = 42.6/56.4/67.7 at 27.1 FPS = 9× faster than SuperPoint at HIGHER AUC + 5× faster than ALIKE-Tiny**; **XFeat\* semi-dense AUC@5/10/20 = 50.2/65.4/77.1 at 19.2 FPS = comparable to DISK\* at 16× speedup**; paper Table 2 ScanNet-1500 indoor **XFeat outperforms all baselines including SuperPoint+DISK+ALIKE** despite all methods being MegaDepth-trained (paper Appendix E hybrid-training reduces landmark-dataset overfitting bias). **EMBEDDED DEPLOYMENT SIGNAL** (per paper Appendix C): on Orange Pi Zero 3 ($28 ARM Cortex-A53 device) at 480×360 input — XFeat=**1.8 FPS** vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS — **XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device** without neural-network-inference optimization. **Training**: 36 hours on single RTX 4090 + 6.5 GB VRAM total + MegaDepth + COCO_20k synthetic warp pairs at 6:4 ratio + 800×600 training resolution + batch size 10 + 160k iterations + Adam optimizer LR 3e-4 — **strongest retrain-friendliness signal among all C3 candidates evaluated** (vs DISK ~2 weeks 32 GB V100, ALIKED ~24 hours RTX 3090, SuperGlue training-code-not-released). 
**Limitations**: (a) NO PRODUCTIZED ONNX/TensorRT export pathway in canonical repo — README Contributing section explicit community-contribution ask; (b) AUC@5° on MegaDepth-1500 sparse mode (42.6) is materially below DISK (53.8) + DISK\* (55.2) + ALIKE-Tiny (49.4) — XFeat sparse is positioned as **"competitive accuracy at much higher speed"** rather than "best-accuracy"; (c) XFeat+LighterGlue MegaDepth-1500 AUC is modestly below SP+LightGlue: -2.5 (Fast) and -2.7 (Accurate) points at AUC@5°, with the largest gap across thresholds being -2.8 at AUC@10° (Accurate); (d) aerial-domain training caveat shared with all C3 candidates evaluated; (e) 64-D descriptors (vs SP/DISK 256-D/128-D) provide cache-footprint advantage but may have weaker descriptor discrimination at extreme cross-domain matching (paper §4.3 visual localization validation on Aachen Day-Night not directly extracted in this session — section 4.3 referenced in paper but headline numbers not in extracted snippet).
- **Related Sub-question**: SQ3+SQ4 / C3 — XFeat per-mode API capability verification (Mandatory `context7` lookup NOT INDEXED + WebFetch fallback PASS per Per-Mode rule item 2; cross-validated against canonical README + GitHub API license metadata + canonical paper [Source #81]; **APACHE-2.0-CLEAN-LICENSE-THROUGHOUT** + **STRONGEST EMBEDDED-DEPLOYMENT SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** (Orange Pi Zero 3 1.8 FPS; designed explicitly for "jetson, raspberry pi, custom AI chips, etc.") + **STRONGEST RETRAIN-FRIENDLINESS SIGNAL AMONG ALL C3 CANDIDATES EVALUATED** (36 hours on single RTX 4090, 6.5 GB VRAM total) + **NO PRODUCTIZED ONNX/TensorRT EXPORT PATHWAY** in canonical repo (README Contributing section explicit community-contribution ask — D-C3-2 gate HARSHER than DISK but TECHNICALLY SIMPLER than ALIKED) + **MODERN COMPETITIVE LEAD ROLE** for the C3 row's lightweight-CNN axis with two distinct inference modes (XFeat sparse / XFeat\* semi-dense) + companion XFeat+LighterGlue paired matcher mode + canonical evaluation harnesses for MegaDepth-1500 +
ScanNet-1500; aerial-domain-training caveat documented; **D-C3-6 NEW Plan-phase decision** for XFeat-mode-choice required)

### Source #81

- **Title**: XFeat canonical paper — "XFeat: Accelerated Features for Lightweight Image Matching" (Potje, Cadar, Araujo, Martins, Nascimento — CVPR 2024, arXiv:2404.19174)
- **Link**: arXiv abstract https://arxiv.org/abs/2404.19174 (April 2024); arXiv full HTML https://arxiv.org/html/2404.19174v1 ; CVPR 2024 proceedings https://openaccess.thecvf.com/content/CVPR2024/html/Potje_XFeat_Accelerated_Features_for_Lightweight_Image_Matching_CVPR_2024_paper.html ; project page https://www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24/ (videos, slides, supplementary material); accessed 2026-05-08
- **Tier**: L1 (peer-reviewed CVPR 2024 + canonical implementation cross-referenced; cited by 2024-2026 feature-matching papers as the modern-lightweight-CNN reference for resource-constrained deployment; UFMG VerLab + multiple cross-affiliations including Google Research + Microsoft underscoring the paper's industry credibility)
- **Publication Date**: arXiv preprint 2024-04-30 (v1); CVPR 2024 publication June 2024
- **Timeliness Status**: ✅ Within Modern-competitive-lead window (2024 — modern competitive lightweight-CNN reference; **strongest embedded-deployment signal among modern competitive C3 candidates** at the publication time and through the project's evaluation window 2026)
- **Version Info**: arXiv v1 (April 2024, CVPR 2024 camera-ready).
**Paper §3 architecture**: featherweight CNN backbone with channel sequence `{4, 8, 24, 64, 64, 128}` (paper §3.1 triple-rate channel increase vs VGG's double-rate); 23 convolutional layers organized as 6 spatial-halving blocks + 2 fusion blocks; decoupled keypoint detection branch (paper §3.2 Keypoint Head) using 1×1 convolutions on 8×8 tensor-block-transformed image with knowledge distillation from ALIKE-Tiny teacher; descriptor head (paper §3.2 Descriptor head) = feature pyramid merging at 1/8, 1/16, 1/32 scales bilinearly upsampled to 1/8 + element-wise summation + fusion block; reliability map regression branch; **match refinement module (paper §3.2 Dense matching)** = lightweight MLP predicting 8×8 pixel-level offset distribution conditioned on coarsely matched feature pair. **Paper §3.3 training**: dual-softmax loss for descriptor learning + L1 reliability loss + NLL fine-matching loss for offset prediction + NLL keypoint loss with knowledge distillation from ALIKE-Tiny; trained on MegaDepth + synthetic warped COCO at 6:4 ratio + 800×600 input + batch size 10 + 160k iterations + Adam LR 3e-4 + exponential decay 0.5 every 30k updates + 36 hours on single RTX 4090 + 6.5 GB VRAM total. **Paper §4 experiments**: MegaDepth-1500 (Table 1, outdoor pose estimation) + ScanNet-1500 (Table 2, indoor pose estimation) + HPatches (Table 3, homography estimation) + Aachen Day-Night (Section 4.3, visual localization via HLoc) + Appendix F (Table 6, learned-matcher comparison vs LoFTR, LightGlue, Patch2Pix). **Paper Appendix C detailed timing analysis**: i7-6700K CPU + Orange Pi Zero 3 ARM Cortex-A53 embedded device (480×360 input, XFeat=1.8 FPS vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS) — **XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device** without neural-network-inference optimization at the publication time. 
- **Target Audience**: System architects + C3 implementer + Step-7.5 reviewer + license-posture decision-maker (D-C1-1 — clean Apache-2.0) + Plan-phase architect (modern-competitive-lead role + strongest embedded-deployment signal in the C3 row + strongest retrain-friendliness signal in the C3 row)
- **Research Boundary Match**: **Full match** for the algorithm (XFeat lightweight-CNN feature extractor with featherweight backbone + decoupled keypoint head + lightweight MLP-based match refinement module); **partial match** for the project's domain (paper benchmarks: MegaDepth-1500 outdoor phototourism, ScanNet-1500 indoor RGB-D, HPatches homography, Aachen Day-Night day/night visual localization — NO aerial nadir benchmark in the canonical paper). **Critical paper §4 + Appendix F documentary cross-cite**: XFeat\* semi-dense at 1885 inliers and PPS=1.33 vs LightGlue 475 inliers PPS=0.31 (paper Appendix F Table 6) — XFeat\* delivers **4× more inliers per pair than LightGlue at 4× higher throughput**, though at lower pose accuracy in the same table (AUC@5° 50.2 vs 61.4); this indicates a match-density and throughput advantage for the semi-dense paradigm rather than an accuracy advantage over sparse learned matchers. **Paper §5 / Appendix F**: explicit positioning as complementary to learned matchers — "Our techniques are, in fact, complementary to learned matchers; for example, LightGlue can be trained using both XFeat and XFeat\* features" — anticipates the XFeat+LighterGlue companion mode released post-paper-acceptance per Source #80 README.
- **Summary**: The canonical paper introduces XFeat with **three core contributions**: (i) **a novel lightweight CNN architecture** with featherweight backbone using triple-rate channel increase strategy (vs VGG's double-rate) — channel sequence `{4, 8, 24, 64, 64, 128}` ensures minimal computational depth in early high-spatial-resolution layers while maintaining representational capacity through later deeper convolutions; (ii) **a minimalist learnable keypoint detection branch** that decouples detection from description using 1×1 convolutions on 8×8 tensor-block-transformed image with knowledge distillation from ALIKE-Tiny teacher (smaller backbone tends to concentrate on lower-level image features like corners/lines/blobs aligning with the 8×8 receptive field); (iii) **a novel lightweight MLP-based match refinement module** for pixel-level offsets from coarse semi-dense matches without high-resolution feature maps (vs LoFTR/ASpanFormer which require costly high-resolution feature maps), enabling efficient semi-dense matching in resource-constrained settings. **Documentary headline results**: paper Table 1 MegaDepth-1500 (5° / 10° / 20° AUC, FPS on i5-1135G7 CPU VGA) — **XFeat sparse 42.6/56.4/67.7 at 27.1 FPS** = 9× faster than SuperPoint (37.3/50.1/61.5 at 3.0 FPS) at higher AUC + 5× faster than ALIKE-Tiny (49.4/61.8/71.4 at 5.3 FPS) at slightly lower AUC; **XFeat\* semi-dense 50.2/65.4/77.1 at 19.2 FPS = comparable to DISK\* at 16× speedup**; paper Table 2 ScanNet-1500 indoor — **XFeat 16.7/32.6/47.8 + XFeat\* 18.4/34.7/50.3 outperforms ALL baselines including SuperPoint=12.5/24.4/36.7 + DISK + ALIKE** despite all methods being MegaDepth-trained (paper Appendix E attributes this to hybrid MegaDepth+synthetic-warp-COCO training reducing landmark-dataset overfitting bias); paper Table 3 HPatches homography MHA@3 illumination/viewpoint = 95.0/68.6 (XFeat) — best illumination@3 in paper Table 3 across all evaluated methods. 
**Paper Appendix C embedded-device timing analysis**: Orange Pi Zero 3 ARM Cortex-A53 ($28 device) at 480×360 input — XFeat=1.8 FPS vs SuperPoint=0.16 FPS vs ALIKE=0.58 FPS — **XFeat is the ONLY learned method capable of running over 1 FPS on highly-constrained embedded device** without neural-network-inference optimization. **Paper Appendix F learned-matcher comparison**: XFeat\* (coarse-fine) AUC@5/10/20 = 50.2/65.4/77.1 at 1.33 PPS vs LoFTR (learned matcher) 68.3/80.0/88.0 at 0.06 PPS + LightGlue (learned matcher) 61.4/75.0/84.8 at 0.31 PPS + Patch2Pix (coarse-fine) 47.8/61.0/71.0 at 0.05 PPS — XFeat\* delivers **4× more inliers per pair than LightGlue at 4× higher throughput**. **License**: Apache-2.0 via Source #80 — canonical implementation. **Limitations**: (a) AUC@5° on MegaDepth-1500 sparse mode is materially below DISK at strictest tier (42.6 vs 53.8 = -11.2 absolute) — XFeat sparse positioned as "competitive at much higher speed" rather than "best-accuracy"; (b) limited robustness to aggressive viewpoint changes and highly ambiguous image pairs as explicitly acknowledged in the final paragraph of paper Appendix F; (c) aerial-domain training caveat shared with all C3 candidates evaluated; (d) NO PRODUCTIZED ONNX/TensorRT export pathway in canonical repo per Source #80 README Contributing section explicit community-contribution ask.
- **Related Sub-question**: SQ3+SQ4 / C3 — XFeat per-mode API capability verification (cross-source verification of canonical paper architectural details + benchmark numbers + training paradigm + embedded-deployment evidence; documents the **modern-competitive-lead role with strongest documented embedded-deployment signal among all C3 candidates evaluated** at canonical paper publication time; aerial-domain caveat documented; modern-competitive-lead role confirmed; D-C3-6 NEW Plan-phase decision for XFeat-mode-choice required)
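
The derived ratios and deltas quoted for Sources #80 and #81 (9× / 5× / 16× speedups, AUC gaps vs SP+LightGlue, the -11.2 gap vs DISK, the 4× inlier and throughput ratios) can be re-checked mechanically. The following is a minimal sketch that recomputes them from the figures as transcribed in this registry, not from a fresh extraction of the paper or README. The names `TABLE1`, `README_AUC`, `TABLE6`, `speedup`, `auc_delta_points` and `ratio` are illustrative assumptions and are not part of any XFeat or LightGlue tooling.

```python
# Cross-check of derived figures in the XFeat entries.
# ASSUMPTION: all numbers are copied from this registry's transcription
# (paper Table 1, Appendix F Table 6, README comparison), not re-extracted.

# MegaDepth-1500, i5-1135G7 CPU, VGA: (AUC@5, AUC@10, AUC@20, FPS)
TABLE1 = {
    "SuperPoint": (37.3, 50.1, 61.5, 3.0),
    "DISK": (53.8, 65.9, 75.0, 1.2),
    "DISK*": (55.2, 66.8, 75.3, 1.2),
    "ALIKE-Tiny": (49.4, 61.8, 71.4, 5.3),
    "XFeat": (42.6, 56.4, 67.7, 27.1),
    "XFeat*": (50.2, 65.4, 77.1, 19.2),
}

# README MegaDepth-1500 comparison, AUC as fractions:
# config -> (XFeat+LighterGlue, SP+LightGlue), each (AUC@5, AUC@10, AUC@20)
README_AUC = {
    "fast": ((0.444, 0.610, 0.746), (0.469, 0.633, 0.762)),
    "accurate": ((0.564, 0.710, 0.819), (0.591, 0.738, 0.841)),
}

# Appendix F Table 6: (inliers, pairs per second)
TABLE6 = {"XFeat*": (1885, 1.33), "LightGlue": (475, 0.31)}


def speedup(method, baseline):
    """FPS ratio of `method` over `baseline`, from TABLE1."""
    return TABLE1[method][3] / TABLE1[baseline][3]


def auc_delta_points(config):
    """Per-threshold AUC gap in points: XFeat+LighterGlue minus SP+LightGlue."""
    ours, ref = README_AUC[config]
    return tuple(round((o - r) * 100, 1) for o, r in zip(ours, ref))


def ratio(column):
    """XFeat* over LightGlue for one TABLE6 column (0 = inliers, 1 = PPS)."""
    return TABLE6["XFeat*"][column] / TABLE6["LightGlue"][column]


if __name__ == "__main__":
    print(f"XFeat vs SuperPoint FPS ratio:  {speedup('XFeat', 'SuperPoint'):.1f}")
    print(f"XFeat vs ALIKE-Tiny FPS ratio:  {speedup('XFeat', 'ALIKE-Tiny'):.1f}")
    print(f"XFeat* vs DISK* FPS ratio:      {speedup('XFeat*', 'DISK*'):.1f}")
    for config in README_AUC:
        print(f"AUC gap vs SP+LightGlue ({config}): {auc_delta_points(config)}")
    gap = round(TABLE1["XFeat"][0] - TABLE1["DISK"][0], 1)
    print(f"XFeat sparse vs DISK, AUC@5 gap: {gap}")
    print(f"Inlier ratio: {ratio(0):.2f}, throughput ratio: {ratio(1):.2f}")
```

Keeping the per-threshold gaps as a tuple, rather than a single range, makes it harder for a summary sentence to attribute an AUC@10° gap to AUC@5°.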