mirror of
https://github.com/azaion/gps-denied-desktop.git
synced 2026-04-22 09:16:37 +00:00
add clarification to research methodology by including a step for solution comparison and user consultation
# Reasoning Chain
## Dimension 1: VO Matcher Selection
### Fact Confirmation
Draft04 uses SuperPoint+LightGlue for VO at 150-200 ms/frame (Fact #1). XFeat achieves an AUC@10° of 65.4 vs SuperPoint's 50.1, runs about 5x faster (~15 ms/frame on GPU), and is validated for UAV VO by SatLoc-Fusion (Facts #2, #3).
### Reference Comparison
SuperPoint+LightGlue provides higher-quality matching for wide-baseline cross-view pairs (satellite matching). However, for consecutive-frame VO with 60-80% overlap and mostly translational motion, XFeat's quality is sufficient; it actually outperforms SuperPoint on MegaDepth.
### Conclusion
The VO matcher should be reverted to XFeat. The regression was unintentional (it does not appear in the draft04 assessment findings). XFeat is roughly 10x faster per frame (150-200 ms down to ~15 ms) with comparable or better quality for the VO use case. SuperPoint+LightGlue should be retained only as a fallback option, not as the primary VO matcher.
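The primary-plus-fallback policy above can be sketched as a small configuration guard. The type and function names are illustrative, and the latency figures come from the facts cited earlier rather than from measurement here:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoMatcher:
    name: str
    approx_latency_ms: float  # per-frame GPU cost, from the cited facts

# Illustrative figures from Facts #1-#3, not measured here.
XFEAT = VoMatcher("xfeat", 15.0)
SP_LG = VoMatcher("superpoint+lightglue", 175.0)

def select_vo_matcher(xfeat_available: bool) -> VoMatcher:
    """XFeat is the primary VO matcher; SP+LG is retained only as a fallback."""
    return XFEAT if xfeat_available else SP_LG
```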
### Confidence
✅ High — XFeat superiority for this use case is supported by both benchmarks and a published UAV system (SatLoc-Fusion).
---
## Dimension 2: LiteSAM Maturity Risk
### Fact Confirmation
LiteSAM has 5 GitHub stars, 0 forks, 4 commits, no license, no issues, and no independent reproduction (Fact #5, #14). Its base, EfficientLoFTR, has 964 stars and CVPR 2024 publication (Fact #8).
### Reference Comparison
For a production system, relying on a model with no community adoption, no license, and single-point-of-failure weight hosting (Google Drive) is risky. EfficientLoFTR is proven and mature but has 2.4x more parameters (15.05M vs 6.31M).
### Conclusion
Keep LiteSAM as the primary satellite fine matcher (it IS better on benchmarks) but add EfficientLoFTR as a proven fallback. Add startup validation: verify the weight checksum and test inference on a reference pair; if LiteSAM fails any check, log a warning and auto-switch to EfficientLoFTR. This hedges the maturity risk while preserving the performance advantage.
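A minimal sketch of that startup validation, assuming the expected checksum is pinned in configuration and the reference-pair inference test is injected as a callable (function names are hypothetical; the real checks would run against actual model weights):

```python
import hashlib
from typing import Callable

def litesam_passes_startup_checks(weights: bytes,
                                  expected_sha256: str,
                                  reference_pair_ok: Callable[[], bool]) -> bool:
    """Verify the pinned weight checksum, then test inference on a reference pair."""
    if hashlib.sha256(weights).hexdigest() != expected_sha256:
        return False
    return reference_pair_ok()

def choose_fine_matcher(weights: bytes,
                        expected_sha256: str,
                        reference_pair_ok: Callable[[], bool]) -> str:
    """LiteSAM stays primary; any failed check triggers the proven fallback."""
    if litesam_passes_startup_checks(weights, expected_sha256, reference_pair_ok):
        return "litesam"
    print("WARNING: LiteSAM failed startup validation; switching to EfficientLoFTR")
    return "efficientloftr"
```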
### Confidence
✅ High — maturity metrics are objective; fallback strategy is standard engineering practice.
---
## Dimension 3: Hit Rate Claim Accuracy
### Fact Confirmation
Draft04 states "77.3% hit rate in Hard conditions on satellite-aerial benchmarks." The paper shows that the 77.3% figure comes from the authors' self-made dataset (Harbin/Qiqihar); on UAV-VisLoc Hard, LiteSAM achieves 61.65% (Fact #4).
### Reference Comparison
61.65% on UAV-VisLoc Hard is still better than SuperPoint+LightGlue's estimated 54-58%, but the gap is much narrower than 77.3% suggests.
### Conclusion
Correct the hit rate claim in the draft and report both numbers: 61.65% on UAV-VisLoc Hard and 77.3% on the self-made dataset. The improvement over SP+LG is real but more modest (~4-7 pp on UAV-VisLoc) than the draft implies (~19 pp).
### Confidence
✅ High — numbers directly from the paper.
---
## Dimension 4: Model Loading Security
### Fact Confirmation
CVE-2025-32434 (PyTorch ≤2.5.1) and CVE-2026-24747 (before 2.10.0) both allow code execution through torch.load even with weights_only=True (Fact #7). LiteSAM weights are on Google Drive with no integrity verification (Fact #6).
### Reference Comparison
All other models (SuperPoint, DINOv2) come from official registries (torch.hub, official repos). LiteSAM is the only model from an unverified source.
### Conclusion
Pin PyTorch ≥2.10.0. Add SHA256 checksum verification for ALL model weights, especially LiteSAM's: download the weights once, compute the checksum, store it in configuration, and verify it on every load. Prefer the safetensors format where available (DINOv2 from HuggingFace supports it).
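The two gates above (version pin and checksum) can be sketched as pure functions. `MINIMUM_TORCH` and the helper names are assumptions for illustration; in the real loader, `checksum_ok` would run on the checkpoint bytes before any call to `torch.load` or the safetensors loader:

```python
import hashlib

# Assumed pin: CVE-2025-32434 affects <=2.5.1, CVE-2026-24747 affects <2.10.0.
MINIMUM_TORCH = (2, 10, 0)

def torch_version_is_safe(version: str) -> bool:
    """True if the installed torch is at or above the pinned safe release."""
    numeric = version.split("+")[0]                 # "2.10.0+cu121" -> "2.10.0"
    parts = tuple(int(p) for p in numeric.split(".")[:3])
    return parts >= MINIMUM_TORCH

def checksum_ok(checkpoint_bytes: bytes, expected_sha256: str) -> bool:
    """Gate every checkpoint deserialization on a SHA256 pinned in configuration."""
    return hashlib.sha256(checkpoint_bytes).hexdigest() == expected_sha256
```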
### Confidence
✅ High — CVEs are documented, mitigation is standard practice.
---
## Dimension 5: VRAM Budget
### Fact Confirmation
With SuperPoint+LightGlue for VO, peak VRAM is ~1.6GB. With XFeat, it drops to ~900MB (Fact #15). RTX 2060 has 6GB total, with ~500MB system overhead.
### Reference Comparison
Both fit under 6GB, but XFeat provides 700MB more headroom for PyTorch CUDA allocator overhead, batch processing, and unexpected spikes.
### Conclusion
Reverting to XFeat for VO improves VRAM headroom from 4.4GB to 5.1GB. No further action needed on VRAM — both configurations are safe.
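The headroom arithmetic used above, made explicit (headroom here means total VRAM minus peak model footprint; the ~500 MB system overhead still has to come out of whatever remains):

```python
def vram_headroom_gb(total_gb: float, peak_model_gb: float) -> float:
    """VRAM left over for allocator overhead, batching, and spikes."""
    return round(total_gb - peak_model_gb, 2)

# RTX 2060 (6 GB total); peak figures from Fact #15, to be confirmed by
# actual measurement on hardware.
sp_lg_headroom = vram_headroom_gb(6.0, 1.6)   # SuperPoint+LightGlue VO
xfeat_headroom = vram_headroom_gb(6.0, 0.9)   # XFeat VO
```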
### Confidence
⚠️ Medium — VRAM estimates are approximate; actual measurement needed.
---
## Dimension 6: GTSAM Robustness
### Fact Confirmation
iSAM2 can throw IndeterminantLinearSystemException (Fact #13). No error handling is specified in draft04.
### Reference Comparison
This is a standard GTSAM failure mode. Production systems must handle it.
### Conclusion
Wrap iSAM2.update() in try/except. On exception: log the error, drop the problematic factor, and retry with a relaxed noise model (10x sigma). If the retry still fails, mark the current position estimate as VO-only. Never crash the pipeline on an optimizer failure.
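A sketch of that wrapper, with the gtsam call injected as a callable so the retry logic is visible on its own. It assumes `update(sigma_scale)` adds the pending factors with their noise sigmas multiplied by `sigma_scale` and runs one iSAM2 update, surfacing the indeterminate-linear-system failure as a RuntimeError (worth verifying against your gtsam Python build); factor dropping is omitted for brevity:

```python
from typing import Callable

def robust_isam2_update(update: Callable[[float], None],
                        log: Callable[[str], None] = print) -> str:
    """Never let an optimizer failure crash the pipeline.

    Returns "ok" on a normal update, "relaxed" if the 10x-sigma retry
    succeeded, and "vo_only" if the optimizer could not recover.
    """
    try:
        update(1.0)                  # normal update
        return "ok"
    except RuntimeError as err:
        log(f"iSAM2 update failed: {err}; retrying with 10x noise sigma")
    try:
        update(10.0)                 # relaxed retry
        return "relaxed"
    except RuntimeError as err:
        log(f"iSAM2 retry failed: {err}; marking position as VO-only")
        return "vo_only"
```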
### Confidence
✅ High — standard GTSAM robustness pattern.
---
## Dimension 7: Satellite Imagery Freshness
### Fact Confirmation
Google Maps imagery for eastern Ukraine conflict zones is 1-3 years old and intentionally kept outdated (Fact #12). This can significantly degrade feature matching accuracy.
### Reference Comparison
DINOv2 coarse retrieval is robust to seasonal changes (semantic matching). Fine matching (LiteSAM/SuperPoint) is more sensitive to structural changes (destroyed buildings, new constructions in conflict zone).
### Conclusion
Add imagery age awareness:

1. Log satellite tile age when available.
2. Increase satellite match noise sigma for known-outdated regions.
3. Lower confidence thresholds for matches in areas with known imagery staleness.
4. Document Maxar (paid, fresh) and user-provided tiles as higher-priority alternatives for conflict zones.

The existing multi-provider architecture already supports this; it only needs tuning.
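The noise-sigma inflation could look like the following helper. The +20%-per-year scaling, the 3x cap, and the 2-year default for tiles of unknown age are placeholder numbers to be tuned against real match residuals:

```python
from typing import Optional

def satellite_match_sigma_m(base_sigma_m: float,
                            imagery_age_years: Optional[float]) -> float:
    """Inflate satellite-match measurement noise with imagery age.

    Unknown-age tiles are treated as 2 years old; the scale is capped
    at 3x so stale imagery degrades gracefully rather than dominating.
    """
    age = 2.0 if imagery_age_years is None else imagery_age_years
    scale = min(1.0 + 0.2 * age, 3.0)
    return base_sigma_m * scale
```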
### Confidence
✅ High — Google's policy is documented; impact on matching is well-understood.