diff --git a/_docs/00_research/01_source_registry.md b/_docs/00_research/01_source_registry.md index 33bf2e7..4b8f4ff 100644 --- a/_docs/00_research/01_source_registry.md +++ b/_docs/00_research/01_source_registry.md @@ -15,7 +15,7 @@ | SQ6 — ArduPilot vs iNav external positioning | **Saturated for protocol-level architectural decision** (further detail deferred to SQ8 for spoofing-side fields and to design phase for SITL parameter tuning) | Major finding: iNav has no inbound external-positioning MAVLink handler; AC-4.3 wording must be revised. See `02_fact_cards.md` "SQ6 Conclusions". | | SQ1 — Existing GPS-denied UAV systems | **Saturated.** 13 sources logged across academic / open-source / commercial / defense-program / Ukraine-practitioner. Closest peer system: Twist Robotics OSCAR (deployed in Ukraine). Closest open-source pipeline-match: snktshrma/ngps_flight (NGPS, ArduPilot GSoC 2024 — LightGlue+SuperPoint+UKF+VISION_POSITION_ESTIMATE). Closest deployed commercial: Auterion Artemis (Skynode N + Visual Navigation, Ukraine-tested, 1000-mile range). | See `02_fact_cards.md` SQ1 cluster + working summary. | | SQ2 — Canonical pipeline decomposition | **Saturated.** 5 surveys/benchmarks logged (Skoltech aerial VPR, U.Maine cross-view, OrthoLoC 2.5D geodata, AnyVisLoc low-altitude multi-view, NUDT 2026 sciopen survey). All converge on **`retrieval → matching → pose-estimation`** hierarchical framework with VIO/IMU as auxiliary. Two new architectural facts added to C1–C10: (a) **AdHoP-style perspective-refinement loop** between matching and PnP (+63% translation accuracy, method-agnostic), (b) **DSM 2.5D dependency** for full 6-DoF on aerial-to-satellite (must be resolved with the Suite Sat Service or accepted as a 3-DoF degraded mode). Practitioner runtime evidence: AnyLoc on RTX 3090 = 0.63s/descriptor, SuperGlue re-rank = 17–25s; on Jetson Orin Nano these are non-viable for our 400 ms p95 budget — must restrict to lightweight VPR (e.g., MixVPR / SALAD class) + LightGlue/XFeat-class matchers. See `02_fact_cards.md` "SQ2 Conclusions". | -| SQ3+SQ4 — Per-component candidates (C1–C10) | Not started | | +| SQ3+SQ4 — Per-component candidates (C1–C10) | **In progress** — C1 (VIO) candidate enumeration done (Sources #43–#52); per-mode `context7` verification + Restrictions×AC sub-matrix per surviving candidate deferred to next session. C2–C10 not started. | See `02_fact_cards.md` C1 cluster + preliminary applicability table. | | SQ5 — Failure modes / deployment lessons | Not started (interleaved with SQ3/SQ4) | | | SQ7 — Datasets, SITL, replay environments | Not started | | | SQ8 — Safety considerations (AC-NEW-4 / AC-NEW-7) | Not started | Carries the AP_GPS spoofing-signal probe deferred from SQ6. | @@ -521,3 +521,139 @@ - **Research Boundary Match**: **Full match** (low-altitude UAV AVL is the survey's exact subject) - **Summary**: Survey-level confirmation of the canonical "**retrieval-matching-pose estimation**" hierarchical framework. Verbatim claim: "the hierarchical framework balances search efficiency, positioning accuracy, and scene generalization, becoming a robust technical path for low-altitude long-endurance absolute localization." Compares the framework against alternatives that are explicitly rejected: (a) relative visual localization (cumulative errors — VIO/SLAM only); (b) end-to-end direct localization (poor generalization); (c) map-free localization (scene-dependent). Sub-component evolution per stage: (a) retrieval = template-matching (SAD/SSD/NCC) → BoW/VLAD → deep-learning (annular/dense feature segmentation, contrastive InfoNCE, self-supervised); (b) matching = SIFT/SURF/ORB → SuperPoint+LightGlue/RoMa (sparse / semi-dense / dense); (c) pose estimation = PnP variants + RANSAC + IMU prior fusion. **Identifies four open challenges** that align with project risks: (i) cross-domain generalization (war-zone scene change); (ii) real-time inference on edge platforms (Jetson); (iii) robustness to complex environments (cropland, snow, low texture); (iv) high-quality datasets (the same gap our project's AC-NEW-7 / cache provisioning works around). **Lightweight-model-design-for-edge-deployment is named as a primary future-research direction** — directly validates project's Jetson Orin Nano constraint as a recognized field-level challenge, not a project-specific oddity. - **Related Sub-question**: SQ2 (framework canonicalness), SQ3+SQ4 (per-component evolution), SQ5 (named open challenges align with project risks) + +--- + +## SQ3+SQ4 / C1 (Visual / Visual-Inertial Odometry) — Candidate enumeration + +### Source #43 +- **Title**: VINS-Mono — A Robust and Versatile Monocular Visual-Inertial State Estimator (HKUST-Aerial-Robotics) +- **Link**: https://github.com/HKUST-Aerial-Robotics/VINS-Mono ; LICENCE: https://github.com/HKUST-Aerial-Robotics/VINS-Mono/blob/master/LICENCE +- **Tier**: L1 (canonical reference implementation; published in IEEE T-RO 2018 by Qin, Li, Shen) +- **Publication Date**: original 2018; repository last meaningful update 2024-02-25 (per GitHub commit log; 2024-05-23 simulation-data commit only) +- **Timeliness Status**: ⚠️ **Borderline.** ~24 months since the last meaningful master-branch commit at access time (2026-05-07). Established baseline that does NOT trigger Step 0.5's 18-month timeliness rejection because (a) IEEE T-RO publication is the canonical authority for the algorithm, (b) downstream forks (vins-mono-android, embedded variants) keep the algorithm class actively deployed. +- **Version Info**: No GitHub releases / tags (master-branch-only project). Stars 5,829. +- **Target Audience**: Mono+IMU VIO implementers; UAV state estimation researchers +- **Research Boundary Match**: **Full match for the candidate's pinned mode** — monocular camera + IMU producing 6-DoF metric pose. The VINS-Mono README explicitly names this configuration as primary. +- **Summary**: Optimization-based sliding-window monocular VIO. Features: efficient IMU pre-integration (Forster et al. 2017), automatic initialization, online camera-IMU extrinsic calibration, online camera-IMU temporal calibration, failure detection + recovery, loop detection (DBoW2-based), global pose graph optimization. Output is metric-scale 6-DoF pose at IMU rate (typically 100–200 Hz) with covariance from the optimization Hessian. **License: GPL-3.0 (copyleft viral)** — every binary distribution requires source disclosure for the entire linked binary; relevant for dual-use deployment if the companion image is sold or transferred to a customer. +- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate + +### Source #44 +- **Title**: VINS-Fusion — Optimization-based multi-sensor state estimator (HKUST-Aerial-Robotics) +- **Link**: https://github.com/HKUST-Aerial-Robotics/VINS-Fusion ; LICENCE: https://github.com/HKUST-Aerial-Robotics/VINS-Fusion/blob/master/LICENCE +- **Tier**: L1 (canonical reference; superset of VINS-Mono) +- **Publication Date**: original 2019 (Qin, Cao, Pan, Shen — ICRA workshop / IROS); repository last update 2024-05-23 +- **Timeliness Status**: ⚠️ **Borderline.** ~24 months since the last update at access time. Same Step-0.5 reasoning as VINS-Mono — established class. +- **Version Info**: master-branch-only. Stars 4,476. Top-ranked open-source stereo-VIO on KITTI Odometry as of January 2019. +- **Target Audience**: Multi-sensor VIO implementers (mono+IMU, stereo, stereo+IMU, +GPS fusion) +- **Research Boundary Match**: **Full match** for monocular+IMU mode. VINS-Fusion README explicitly enumerates four sensor configurations (mono+IMU, stereo, stereo+IMU, +GPS toy example). +- **Summary**: Superset of VINS-Mono adding stereo and GPS-fusion modes. Same algorithmic core (sliding-window optimization with IMU pre-integration). Online spatial + temporal camera-IMU calibration; visual loop closure; ROS Kinetic/Melodic build dependency. **License: GPL-3.0** — same dual-use distribution constraint as VINS-Mono. Independent KAIST benchmark (Source #46) found VINS-Fusion CPU mode + VINS-Fusion-imu **fail to run** on Jetson TX2 (insufficient memory and CPU); GPU-accelerated VINS-Fusion-gpu does run on TX2. Implication for project: VINS-Fusion-imu on Jetson Orin Nano Super is feasible but not certain; needs MVE. +- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate + +### Source #45 +- **Title**: OpenVINS — An open source platform for visual-inertial navigation research (Robot Perception and Navigation Group, U. of Delaware — rpng) +- **Link**: https://github.com/rpng/open_vins ; docs: https://docs.openvins.com/ ; LICENSE: https://github.com/rpng/open_vins/blob/master/LICENSE +- **Tier**: L1 (canonical research implementation; ICRA 2020 paper Geneva, Eckenhoff, Lee, Yang, Huang) +- **Publication Date**: original 2020; latest tagged release v2.7 = 2023-06; ongoing master-branch commits through 2024–2025 (latest issue threads through Feb 2025) +- **Timeliness Status**: ✅ Currently valid (master branch active; latest tagged release ~35 months but library is in stable/maintenance mode with continued issue triage). +- **Version Info**: Stars 2,828; 30 contributors; 12 releases. v2.7 is the current tagged stable. +- **Target Audience**: MSCKF/EKF VIO implementers; researchers needing a reference MSCKF +- **Research Boundary Match**: **Full match** for monocular+IMU mode. OpenVINS supports mono, stereo, multi-camera (1–N cameras) + IMU; mono is a documented first-class mode. +- **Summary**: Modular MSCKF (Multi-State Constraint Kalman Filter) implementation built around an Extended Kalman filter that fuses inertial state with sparse visual feature tracks via the sliding-window MSCKF formulation (Mourikis & Roumeliotis 2007). Supports SLAM features (in-state landmarks) plus pure MSCKF features (out-of-state). ROS1 + ROS2 (Humble) builds documented; Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble compilation **confirmed working** by community contributors (rpng/open_vins issue #421, fdcl-gwu/openvins_jetson_realsense Nov 2025 setup guide). **License: GPL-3.0** — same dual-use distribution constraint. Reported latency ~270 ms on Xavier NX (4-core, ARM, 40% CPU usage) per issue #164; needs Jetson-Orin-Nano-Super MVE for production budget verification. +- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate + +### Source #46 +- **Title**: Run Your Visual-Inertial Odometry on NVIDIA Jetson — Benchmark Tests on a Micro Aerial Vehicle (Jeon, Jung, Lee, Choi, Myung — KAIST) +- **Link**: https://arxiv.org/abs/2103.01655 ; KAIST VIO dataset: https://github.com/zinuok/kaistviodataset +- **Tier**: L1 (peer-reviewed conference, IROS-track preprint with public dataset) +- **Publication Date**: arXiv 2021-03-02 +- **Timeliness Status**: ⚠️ Older than the 18-month Critical-novelty window, but **uniquely authoritative** for the specific question "do these VIO algorithms run on a Jetson?"; the included algorithms (VINS-Mono, VINS-Fusion, ROVIO, ALVIO, Stereo-MSCKF, Kimera, ORB-SLAM2-stereo) are all classical baselines whose runtime characteristics on ARM CPUs have not changed materially. Jetson hardware comparison (TX2 / Xavier NX / AGX Xavier) does NOT include Orin Nano — must extrapolate. +- **Version Info**: Conference paper. +- **Target Audience**: UAV state-estimation engineers picking a VIO for a Jetson companion +- **Research Boundary Match**: **Strong match for the question**, partial for the hardware (no Orin Nano). KAIST VIO dataset is indoor mocap, not UAV-aerial-nadir — the *latency / CPU / memory* numbers transfer; the *accuracy* numbers do not transfer to our domain. +- **Summary**: Comprehensive benchmark of 9 algorithms on TX2, Xavier NX, AGX Xavier: VINS-Mono, VINS-Fusion (CPU), VINS-Fusion-gpu, VINS-Fusion-imu, ROVIO, Stereo-MSCKF, ALVIO, Kimera, ORB-SLAM2-stereo. **Hard findings**: (a) on TX2, **VINS-Fusion (CPU) and VINS-Fusion-imu fail to run** due to insufficient memory and CPU performance — VINS-Fusion-gpu does run; (b) all algorithms except ROVIO show >100% CPU usage (multi-core utilisation, OK for our 6-core Orin Nano A78AE); (c) Kimera has the highest memory usage among VIO methods (numerous computations per keyframe), failure-prone on Xavier NX-class memory; (d) Stereo-MSCKF has the lowest memory among stereo VIOs; (e) ROVIO has the lowest CPU usage owing to its patch-tracking formulation. **Implication for project**: Jetson Orin Nano Super (8 GB shared, 6-core A78AE, Ampere GPU, 67 TOPS sparse INT8) is between Xavier NX and AGX Xavier in CPU performance and memory; algorithms passing on Xavier NX should pass on Orin Nano Super, but VINS-Fusion-imu's TX2 failure is a yellow-flag for memory pressure under co-resident C2/C3/C5 modules. +- **Related Sub-question**: SQ3+SQ4 / C1 (VINS-Mono / VINS-Fusion / OpenVINS / Kimera / Stereo-MSCKF / ROVIO Jetson runtime evidence), SQ5 (resource-budget failure modes) + +### Source #47 +- **Title**: OKVIS2 — Realtime Scalable Visual-Inertial SLAM with Loop Closure (Leutenegger, ETH/Imperial/TUM Smart Robotics Lab) +- **Link**: https://github.com/ethz-mrl/okvis2 ; arXiv: https://arxiv.org/abs/2202.09199 ; LICENSE: https://github.com/ethz-mrl/okvis2/blob/main/LICENSE +- **Tier**: L1 (canonical implementation; arXiv 2022 by paper author) +- **Publication Date**: original arXiv 2022; OKVIS2-X T-RO 2025 successor (Boche, Jung, Laina, Leutenegger — IEEE T-RO 2025, vol 41 pp 6064–6083, DOI 10.1109/TRO.2025.3619051; arXiv 2510.04612, Oct 2025). Repository last push 2026-03-17 (ethz-mrl/OKVIS2-X). +- **Timeliness Status**: ✅ **Current.** Active development through 2026; OKVIS2-X is the most recent published VI-SLAM system in this class. +- **Version Info**: ethz-mrl/okvis2 (core) and ethz-mrl/OKVIS2-X (multi-sensor extension with optional GNSS / LiDAR / dense depth). +- **Target Audience**: Factor-graph VI-SLAM implementers; mid-large-scale loop-closure use cases +- **Research Boundary Match**: **Full match** for monocular+IMU mode. OKVIS2 README + paper explicitly support mono and multi-camera VI configurations. OKVIS2-X adds GNSS fusion (relevant: VINS-Fusion-style GPS-when-available drop-in IS the project's eventual posture in non-spoofed regions). +- **Summary**: Factor-graph VI-SLAM with bounded-size optimization. Innovation: pose-graph edges from marginalised observations can be "seamlessly turned back into observations" upon loop closure, reviving old landmarks and reprojection errors. Includes lightweight CNN segmentation for dynamic-region removal. OKVIS2-X (2025) generalises the core to fuse multi-camera + IMU + optional GNSS + LiDAR/depth — directly aligned with project's "VIO that may opportunistically fuse a non-spoofed GPS update" pattern and AC-NEW-2's spoof-promotion path. **License: 3-clause BSD (permissive)** — no copyleft / dual-use distribution friction. Note: GitHub UI shows "Other (NOASSERTION)" because of the standard BSD clause language pattern; the LICENSE file is canonical 3-clause BSD. +- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate (factor-graph + permissive license + active maintenance) + +### Source #48 +- **Title**: OKVIS2-X: Open Keyframe-based Visual-Inertial SLAM Configurable with Dense Depth or LiDAR, and GNSS (Boche, Jung, Laina, Leutenegger — TUM / ETH Zurich Smart Robotics Lab) +- **Link**: https://github.com/ethz-mrl/OKVIS2-X ; arXiv: https://arxiv.org/abs/2510.04612 ; IEEE T-RO 2025 vol 41 pp 6064–6083 DOI 10.1109/TRO.2025.3619051 +- **Tier**: L1 (peer-reviewed IEEE Transactions on Robotics, Special Issue Visual SLAM 2025) +- **Publication Date**: arXiv 2025-10-04; T-RO 2025 vol 41 +- **Timeliness Status**: ✅ Current (within 6-month Critical-novelty window) +- **Version Info**: 295 stars; 38 forks; 2 contributors; created 2025-09-23, last push 2026-03-17. License: NOASSERTION on GitHub UI; per-paper license follows ethz-mrl convention (BSD-3 derived). +- **Target Audience**: Multi-sensor SLAM researchers; large-scale VI-SLAM with optional GNSS/LiDAR +- **Research Boundary Match**: **Strong match** — extends OKVIS2 monocular+IMU mode with optional GNSS fusion (Visual-Inertial SLAM with Tightly-Coupled Dropout-Tolerant GPS Fusion lineage from IROS 2022). Project's `MAV_CMD_SET_EKF_SOURCE_SET` switch + companion-side spoof-detection conceptually mirrors OKVIS2-X's "GPS as drop-out-tolerant signal". +- **Summary**: Non-trivial extension of OKVIS2; submap-based volumetric occupancy mapping. Demonstrates that the OKVIS2 factor-graph backbone can absorb spoofing-aware GPS without re-architecting. Useful as architectural template for project's C5 estimator + C8 adapter integration. License: same as OKVIS2 (BSD-3-derived). Two named contributors (bochsim, SebsBarbas) actively pushing through Mar 2026. +- **Related Sub-question**: SQ3+SQ4 / C1 (OKVIS2 lineage; VI-SLAM with optional GPS/LiDAR), SQ8 (GPS-fusion dropout-tolerant lineage) + +### Source #49 +- **Title**: Kimera-VIO — Visual Inertial Odometry with SLAM capabilities and 3D Mesh generation (MIT-SPARK) +- **Link**: https://github.com/MIT-SPARK/Kimera-VIO ; LICENSE.BSD: https://github.com/MIT-SPARK/Kimera-VIO/blob/master/LICENSE.BSD +- **Tier**: L1 (canonical implementation by MIT SPARK Lab) +- **Publication Date**: original 2020 (Rosinol, Abate, Chang, Carlone — ICRA 2020); ongoing development through 2024–2025 issue threads (Dec 2024 / Feb 2025 ROS2 / mono-inertial discussion). +- **Timeliness Status**: ✅ Active maintenance (recent issues / PRs through 2025). +- **Version Info**: master-branch-only; LICENSE.BSD = BSD 2-Clause "Simplified". +- **Target Audience**: VI-SLAM + mesh-mapping researchers +- **Research Boundary Match**: **Partial.** Stereo+IMU is the primary supported configuration; mono+IMU is **optional but documented**. Kimera also produces 3D mesh and high-level semantic labels (relevant to neither C1 nor the project's bandwidth budget — overhead). +- **Summary**: Frontend (image processing + IMU pre-integration) + Backend (factor-graph optimization in iSAM2 or GTSAM) + Mesher + Pose-Graph-Optimizer. **License: BSD 2-Clause (permissive)** — no dual-use distribution friction. **Penalty for project**: Source #46 KAIST benchmark found Kimera has highest memory usage among the VIOs tested (numerous computations per keyframe), and Kimera failed to fit on Xavier-NX-class memory under multi-process load. Mesh + semantic features are unused by the project — Kimera's overhead is unjustified vs OKVIS2 / OpenVINS for the project's narrow C1 mandate. **Status**: viable secondary fallback if OKVIS2 / VINS-Mono runtime issues arise; not a lead candidate due to overhead misfit. +- **Related Sub-question**: SQ3+SQ4 / C1 secondary candidate (BSD-permissive but resource-heavy) + +### Source #50 +- **Title**: DROID-SLAM — Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras (princeton-vl, Teed & Deng) +- **Link**: https://github.com/princeton-vl/droid-slam ; arXiv: https://arxiv.org/abs/2108.10869 ; NeurIPS 2021 +- **Tier**: L1 (canonical reference) +- **Publication Date**: NeurIPS 2021; repository latest tagged baseline. +- **Timeliness Status**: ✅ Foundational reference; DPV-SLAM (Source #51) is the lighter successor. +- **Version Info**: master-branch-only. +- **Target Audience**: Deep-learning-based VO/VSLAM researchers +- **Research Boundary Match**: **Disqualified by hardware budget.** Inference requires ≥11 GB GPU VRAM per official README; project budget is 8 GB **shared CPU+GPU** on Jetson Orin Nano Super, leaving <8 GB for VO + VPR + matcher + estimator + cache co-resident. DROID-SLAM is also **monocular VO/SLAM, not VIO** — no native IMU fusion; metric scale recovery requires external scale alignment. +- **Summary**: Recurrent dense bundle adjustment over a complete history of camera poses. State-of-the-art accuracy on TartanAir / EuRoC / TUM-RGBD at the cost of GPU memory. **Disqualified outright for C1 lead** by AC-4.2 (≤8 GB shared RAM) and the lack of IMU fusion that would require an additional ESKF/UKF wrapping. Kept as **reference baseline** to be cited as "what we cannot afford" in `solution_draft01`. +- **Related Sub-question**: SQ3+SQ4 / C1 disqualified candidate + +### Source #51 +- **Title**: DPVO — Deep Patch Visual Odometry (princeton-vl, Teed, Lipson, Deng) + DPV-SLAM (Lipson, Teed, Deng — ECCV 2024) +- **Link**: https://github.com/princeton-vl/DPVO ; LICENSE: https://github.com/princeton-vl/DPVO/blob/main/LICENSE ; ECCV 2024 paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00272.pdf +- **Tier**: L1 (canonical implementation; NeurIPS 2023 + ECCV 2024) +- **Publication Date**: NeurIPS 2023 (DPVO); ECCV 2024 (DPV-SLAM); repository last update 2024-10-12. +- **Timeliness Status**: ⚠️ Borderline. ~19 months since last code update; ECCV-2024 publication of DPV-SLAM keeps the algorithm class within the 6-month claim window for the SLAM successor. +- **Version Info**: 989 stars; primary languages C++ / Python / CUDA. **License: MIT (permissive)** — no dual-use distribution friction. +- **Target Audience**: Deep-learning VO/SLAM with reduced memory footprint +- **Research Boundary Match**: **Partial.** DPVO is **monocular VO only — no IMU fusion**. Output pose is in arbitrary scale (no metric scale recovery). To be a viable C1 candidate the project must wrap DPVO with an external IMU+scale-fusion stage (loosely-coupled ESKF / VIO-fusion module). This makes DPVO **not a drop-in C1** like VINS-Mono / OpenVINS / OKVIS2; it is a **VO module that needs a separate VIO wrapper**. +- **Summary**: Sparse patch tracking + differentiable bundle adjustment back end. Outperforms DROID-SLAM on TartanAir / EuRoC ATE while using ~1/3 of DROID-SLAM's GPU memory (DROID-SLAM: 8.7 GB VO mode vs DPVO: ~3 GB). DPV-SLAM (Lipson, Teed, Deng — ECCV 2024) adds full SLAM capability with 4–5 GB GPU usage. **Jetson runtime evidence**: indirect via DPVO-QAT++ (Source #52) — peak reserved memory 1.02 GB on RTX 4060 (8 GB) after INT8 fake-quant + custom CUDA kernel fusion; not directly tested on Jetson Orin Nano. **Status for C1**: pure-VO candidate (must be paired with separate IMU integration to deliver metric scale + attitude); would not satisfy "monocular VIO" gate alone, but viable as the *VO half* of a hybrid C1+C5 design. +- **Related Sub-question**: SQ3+SQ4 / C1 conditional candidate (VO not VIO; needs external IMU wrapper) + +### Source #52 +- **Title**: DPVO-QAT++: Heterogeneous QAT and CUDA Kernel Fusion for High-Performance Deep Patch Visual Odometry (Cheng Liao) +- **Link**: https://arxiv.org/abs/2511.12653 ; project HTML: https://arxiv.org/html/2511.12653 +- **Tier**: L2 (single-author preprint, code partially released; no peer-review yet) +- **Publication Date**: arXiv 2025-11-16 (within 6-month Critical-novelty window) +- **Timeliness Status**: ✅ Current +- **Version Info**: arXiv preprint; code & weights released for QAT-only and fused-CUDA variants. +- **Target Audience**: Embedded-platform DPVO deployers +- **Research Boundary Match**: **Partial.** Hardware tested = RTX 4060 (8 GB) + Intel Core Ultra 5-125H + 32 GB RAM — desktop GPU, NOT Jetson Orin Nano. Direct extrapolation requires Jetson MVE; Orin Nano Super's Ampere GPU is architecturally similar but smaller than RTX 4060. +- **Summary**: Quantization-Aware Training framework for DPVO with fused CUDA kernels. Reduces peak GPU memory from 1.94 GB → 1.02 GB (-47%) on a representative TartanAir sequence; +34.6% median FPS on TartanAir, +26.7% on EuRoC; -22.8 ms / -19.7 ms median P99 tail latency on TartanAir / EuRoC respectively. Heterogeneous precision: front-end pseudo-quantization (FP16/FP32 with INT8 simulation) + FP32 back-end geometric solver. **Implication for project**: shows DPVO has a documented Jetson-suitable footprint **path** but not a Jetson-Orin-Nano measurement. ATE accuracy comparable to baseline DPVO across 32 TartanAir + 11 EuRoC validation sequences. Notable: requires a teacher-student distillation training pipeline before deployment — adds operational complexity vs classical VINS-* / OpenVINS / OKVIS2. +- **Related Sub-question**: SQ3+SQ4 / C1 supporting evidence for DPVO embedded feasibility + +### Source #53 +- **Title**: Pure VO baseline — KLT optical flow + 5-point essential matrix or homography RANSAC (OpenCV reference) +- **Link**: https://docs.opencv.org/4.x/d4/dee/tutorial_optical_flow.html ; representative public implementation: https://github.com/alishobeiri/Monocular-Video-Odometery (MIT, 2018) ; tutorial reference: https://zxh.me/posts/2022-12-19-monocular-visual-odometry/ +- **Tier**: L1 (OpenCV official documentation) + L2 (representative public implementations) +- **Publication Date**: OpenCV docs continuously updated; tutorial 2022-12; reference implementation 2018 (algorithmic class is foundational, no time window per Step 0.5) +- **Timeliness Status**: ✅ Foundational baseline (no time window). +- **Version Info**: OpenCV `cv::calcOpticalFlowPyrLK` (KLT) + `cv::findEssentialMat` (5-point Nister) or `cv::findHomography` with RANSAC. +- **Target Audience**: Implementers needing a transparent low-complexity fallback +- **Research Boundary Match**: **Full match for the simple-baseline candidate.** Suits planar nadir-down UAV at altitude (Ukrainian steppe is ~planar at 1 km AGL — homography is geometrically appropriate; for non-planar relief the essential matrix path is more appropriate but adds scale-recovery work). +- **Summary**: Established classical pipeline: Shi-Tomasi or FAST corner detection → KLT pyramidal optical flow tracking → 5-point essential matrix or homography RANSAC → relative pose with arbitrary scale (must be metric-scale-aligned via IMU integration externally). Reference implementations widely available in OpenCV samples and pedagogical repos. **Status**: candidate as the project's `Simple baseline / known-runnable / known-failure-mode` C1 option per Component Option Breadth rule. Not a lead, but mandatory fallback presence per the research engine's "include at least one simple baseline" rule. +- **Related Sub-question**: SQ3+SQ4 / C1 simple-baseline candidate diff --git a/_docs/00_research/02_fact_cards.md b/_docs/00_research/02_fact_cards.md index 76907d2..8597feb 100644 --- a/_docs/00_research/02_fact_cards.md +++ b/_docs/00_research/02_fact_cards.md @@ -408,3 +408,136 @@ Saturation signals observed: 4 perspectives saturated, ≥3 high-confidence fact ### Boundary check: SQ2 is saturated Saturation signals observed: (a) four independent surveys/benchmarks (Skoltech aerial-VPR survey, U.Maine cross-view survey, OrthoLoC benchmark, AnyVisLoc benchmark, NUDT 2026 absolute-VL survey) converge on the **same** "retrieval → matching → pose-estimation hierarchical framework" as canonical; (b) two independent runtime sources (Skoltech survey on RTX 3090; AnyVisLoc on RTX 3090 with explicit dense-vs-sparse breakdown) agree on the relative cost ordering of model classes; (c) cross-source agreement on AdHoP value (Source #40 only, but with reproducible code and dataset — single-source-but-strong evidence); (d) cross-source agreement on covisibility / sensor-prior thresholds. Two outstanding decisions are flagged for user — neither blocks SQ2's saturation status, both block SQ3+SQ4 start. Per `references/source-tiering.md` "Search saturation rule" → SQ2 is closed pending user decisions on DSM dependency + AdHoP gating. + +--- + +## SQ3+SQ4 / C1 — Visual / Visual-Inertial Odometry candidate enumeration + +> **Project's pinned mode for every C1 candidate (binding)**: monocular ADTi 20MP nav camera @ 3 fps + IMU from FC over MAVLink @ ≥100 Hz, on Jetson Orin Nano Super (JetPack/CUDA/TensorRT, 8 GB shared LPDDR5, 25 W TDP), producing relative 6-DoF metric pose between consecutive frames + per-axis covariance, with attitude (yaw + pitch) hard-contract σ ≤ 5° at 1 σ (Fact #24), output cadence ≥3 Hz, no in-flight network, license compatible with onboard-binary distribution to a dual-use customer. +> +> Per the engine's "Per-Mode API Capability Verification" rule, any candidate marked `Selected` requires a `context7` lookup (mode enum + project's exact mode runnable example + disqualifier probe) AND a per-numbered-Restriction × per-numbered-AC sub-matrix. **This session covers candidate enumeration + preliminary applicability assessment only**; `context7` verification and the structured sub-matrix are deferred to the next session per the autodev context budget heuristic. + +### Fact #28 — VINS-Mono is a canonical monocular-only sliding-window VIO with a working Jetson-Nano deployment record but no GitHub release and ~24-month-old master branch +- **Statement**: VINS-Mono is the canonical mono+IMU sliding-window VIO from HKUST-Aerial-Robotics (Qin, Li, Shen — IEEE T-RO 2018). Features: efficient IMU pre-integration, automatic initialization, online camera-IMU spatial + temporal calibration, failure detection + recovery, DBoW2 loop detection, global pose-graph optimization. Output: metric-scale 6-DoF pose at IMU rate. **Repository state**: master-branch only (no tagged releases), 5,829 stars; last meaningful master-branch commit 2024-02-25 with a 2024-05-23 simulation-data commit. **Jetson record**: a 2021 IEICE paper (zinuok / KAIST) demonstrated VINS-Mono real-time on the original Jetson Nano (much weaker than Orin Nano Super) for MAV state estimation; a 2024 arXiv paper (2406.13345) showed an enhanced VINS-Mono variant achieving 50 FPS on a Raspberry Pi CM4 with on-sensor accelerated optical flow. **License**: GPL-3.0 (copyleft viral) — distribution of the onboard binary requires source disclosure for the entire linked binary and triggers GPL-3 anti-tivoization clauses for embedded firmware. +- **Source**: Source #43 (canonical), Source #46 (KAIST Jetson benchmark), Source #43-linked LICENCE for license confirmation +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ for algorithm class, mode support, and Jetson Nano feasibility; ⚠️ for Jetson Orin Nano Super specific latency (no direct measurement — but Orin Nano Super >> Jetson Nano, so feasibility is virtually certain); ⚠️ for the maintenance-status risk implied by ~24-month-old master branch. +- **Related Dimension**: SQ3+SQ4 / C1 Established-production candidate +- **Fit Impact**: **carry as lead candidate, conditional on user license decision.** Algorithmic fit is excellent (canonical mono+IMU VIO with metric scale and covariance); maintenance status is borderline; **GPL-3.0 license is a project-level decision required from the user** before this candidate can be marked Selected — see "C1 Open Decisions" section below. + +### Fact #29 — VINS-Fusion is a multi-sensor superset of VINS-Mono but its monocular+IMU mode failed to run on Jetson TX2 in a 2021 KAIST benchmark; Orin Nano Super feasibility unverified +- **Statement**: VINS-Fusion (Qin, Cao, Pan, Shen — extension of VINS-Mono) supports four documented sensor configurations: stereo+IMU, mono+IMU, stereo only, +GPS-fusion (toy example). KITTI Odometry top-ranked open-source stereo algorithm as of January 2019. **Repository state**: 4,476 stars; last update 2024-05-23; same master-branch-only convention. **Jetson record**: KAIST 2021 benchmark (Source #46) — on Jetson TX2, both **VINS-Fusion (CPU) and VINS-Fusion-imu fail to run** due to insufficient memory and CPU; VINS-Fusion-gpu (GPU-accelerated front-end) runs on TX2. Orin Nano Super has more memory than TX2 (8 GB LPDDR5 shared vs TX2's 8 GB LPDDR4 shared) and stronger CPU/GPU, but the project's onboard stack is *co-resident* with C2 VPR + C3 matcher + C5 estimator + C6 cache → memory-pressure on the VINS-Fusion-imu path is plausible. **License**: GPL-3.0, same dual-use distribution constraint as VINS-Mono. +- **Source**: Source #44 (canonical), Source #46 (KAIST Jetson benchmark) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ for the multi-sensor mode support and KITTI ranking; ✅ for the 2021 TX2 failure-to-run finding; ⚠️ for Orin Nano Super viability (between TX2 and Xavier NX in CPU/memory; not yet measured). +- **Related Dimension**: SQ3+SQ4 / C1 Open-source candidate +- **Fit Impact**: **carry as alternate candidate, with mandatory Jetson Orin Nano Super MVE before promotion.** VINS-Mono's narrower scope (mono+IMU only, no stereo overhead) makes VINS-Mono the preferred lead within the HKUST-Aerial-Robotics family; VINS-Fusion's multi-sensor coverage is a distractor for our pinned mode. **GPL-3.0 license decision is the same as VINS-Mono** — see "C1 Open Decisions". + +### Fact #30 — OpenVINS is the most actively maintained MSCKF-class VIO and runs on Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble with documented build adjustments; latency 270 ms on Xavier NX needs Orin-Nano-Super MVE +- **Statement**: OpenVINS (rpng, U. Delaware — Geneva, Eckenhoff, Lee, Yang, Huang — ICRA 2020) is a modular MSCKF (Multi-State Constraint Kalman Filter) implementation that fuses IMU state with sparse visual feature tracks via the Mourikis-Roumeliotis 2007 sliding-window MSCKF. **Mode support**: monocular, stereo, multi-camera (1–N) + IMU; mono+IMU is a documented first-class configuration. Supports SLAM features (in-state landmarks) plus pure MSCKF features. **Jetson Orin Nano evidence**: rpng/open_vins issue #421 (Genozen, Feb 2024, closed) confirms OpenVINS ROS 2 builds on Jetson Orin Nano Dev Kit + JetPack 6 + Ubuntu 22.04 + ROS 2 Humble after one build patch (`#include ` with newer OpenCV); fdcl-gwu/openvins_jetson_realsense (Nov 2025) provides a complete setup guide for Jetson Orin Nano + Intel RealSense + librealsense compiled-from-source + `--parallel-workers 1` build to avoid memory issues. **Latency record**: rpng/open_vins issue #164 — ~270 ms latency on Jetson Xavier NX (4 cores, 40% CPU utilisation). Recommended optimisations: subscriber queue size 1, Release builds with ARM-specific optimization flags (e.g., `armv8.2-a`), reduced camera resolution, prefer `odometry` topic over `pose_imu`. **License**: GPL-3.0, same dual-use distribution constraint as VINS-Mono / VINS-Fusion. Stars 2,828; 30 contributors; 12 releases; latest tag v2.7 (June 2023) but master branch active through 2024–2025 issue threads. +- **Source**: Source #45 (canonical + LICENSE + docs.openvins.com), Source #46 (KAIST Jetson benchmark for class-level CPU/memory profile), agent-tools record `29ebf728...txt` (Jetson Orin Nano build evidence) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ for mode support, MSCKF formulation, and Jetson Orin Nano build feasibility; ⚠️ for steady-state latency on Orin Nano Super under our 5472×3648 nav frames — KAIST benchmark used 640×480; 16× pixel count is a yellow-flag. +- **Related Dimension**: SQ3+SQ4 / C1 Established-production candidate +- **Fit Impact**: **carry as lead candidate, conditional on user license decision.** OpenVINS has the most documented Jetson-Orin-Nano build path of the three GPL-3.0 candidates; MSCKF formulation is more memory-efficient than VINS-Mono's full sliding-window optimisation, which is a meaningful advantage under co-resident-process memory pressure. **GPL-3.0 license decision is the same as VINS-Mono / VINS-Fusion**. + +### Fact #31 — OKVIS2 is the most actively maintained VI-SLAM in the BSD-permissive license bucket; OKVIS2-X (T-RO 2025) extends it with optional GNSS fusion that is architecturally aligned with the project's spoof-promotion path +- **Statement**: OKVIS2 (Leutenegger — arXiv 2022, ETH/Imperial/TUM Smart Robotics Lab) is a factor-graph VI-SLAM with bounded-size optimization. Algorithmic novelty: pose-graph edges from marginalised observations are "seamlessly turned back into observations" upon loop closure, reviving old landmarks and reprojection errors. Includes lightweight CNN segmentation for dynamic-region removal. **Mode support**: monocular and multi-camera + IMU; mono+IMU is a documented first-class configuration. **Successor OKVIS2-X (Boche, Jung, Laina, Leutenegger — IEEE T-RO 2025 vol 41 pp 6064–6083, DOI 10.1109/TRO.2025.3619051; arXiv 2510.04612, Oct 2025)** generalises the core to fuse multi-camera + IMU + optional GNSS receiver + LiDAR or depth. The OKVIS2-X GNSS-fusion mode (lineage: Visual-Inertial SLAM with Tightly-Coupled Dropout-Tolerant GPS Fusion, IROS 2022) directly mirrors the project's "VIO that may opportunistically fuse a non-spoofed GPS update when promotion completes" pattern (AC-NEW-2). **Repository state**: ethz-mrl/OKVIS2-X created 2025-09-23, last push 2026-03-17, 295 stars, 2 active contributors (bochsim, SebsBarbas). **License**: 3-clause BSD on the LICENSE file (GitHub UI shows "Other (NOASSERTION)" but the file is canonical 3-clause BSD per ASL-ETH Zurich convention) — permissive, no dual-use distribution friction. +- **Source**: Source #47 (OKVIS2 canonical), Source #48 (OKVIS2-X T-RO 2025) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 / C5 implementer +- **Confidence**: ✅ for algorithm, mode support, license, T-RO 2025 publication, repository activity; ⚠️ for Jetson Orin Nano runtime — no direct Jetson Orin Nano benchmark located; OKVIS2's factor-graph backend is plausibly heavier than OpenVINS' MSCKF on memory but lighter than Kimera (Kimera also produces a 3D mesh + semantic mesher, OKVIS2 does not). +- **Related Dimension**: SQ3+SQ4 / C1 Open-source-permissive lead candidate; potential C1+C5+C8 unified factor-graph design +- **Fit Impact**: **strong lead candidate by license + maintenance + GNSS-fusion alignment.** If license permissiveness is a priority, OKVIS2 + OKVIS2-X is the natural choice. The OKVIS2-X factor-graph also opens a design path where C5 (state estimator) collapses INTO C1 (the same factor graph absorbs sat-anchor measurements as constraints) — would simplify the pipeline at the cost of departing from the C1/C5 split, which is a Step-7.5 / `solution_draft01` design decision, not a SQ3+SQ4 question. **Pending Jetson Orin Nano Super MVE.** + +### Fact #32 — Kimera-VIO is BSD-permissive but resource-heavy; KAIST benchmark found Kimera had the highest memory usage among VIOs tested and failed Xavier-NX-class memory under multi-process load +- **Statement**: Kimera-VIO (MIT-SPARK — Rosinol, Abate, Chang, Carlone — ICRA 2020) is a VI-SLAM pipeline with frontend + backend (factor-graph optimization in iSAM2 or GTSAM) + 3D mesher + pose-graph optimizer. Mode support: stereo+IMU primary, mono+IMU optional but documented. **License**: BSD 2-Clause "Simplified" (LICENSE.BSD on the repo) — permissive. **Maintenance**: active issue/PR threads through Dec 2024 / Feb 2025 covering ROS 2 integration, mono-inertial discussion, dependency management. **Resource profile** (Source #46 KAIST 2021 benchmark): Kimera had the highest memory usage among the 9 algorithms tested (numerous computations per keyframe); Kimera failed to fit on Xavier NX-class memory under sustained multi-process load. The 3D mesh + semantic-label outputs are unused by the project's narrow C1 mandate (relative 6-DoF + covariance only) — Kimera's overhead is unjustified vs OKVIS2 / OpenVINS for our use case. +- **Source**: Source #49 (Kimera canonical + LICENSE.BSD), Source #46 (KAIST Jetson benchmark) +- **Phase**: Phase 2 +- **Target Audience**: System architects (build-vs-buy, mesh-feature decision) +- **Confidence**: ✅ for algorithm, license, maintenance status; ✅ for the Source #46 finding (KAIST 2021); ⚠️ for whether Orin Nano Super's larger memory + Ampere GPU lifts Kimera into feasibility — the Source-46 failure was on Xavier NX 8 GB shared, same memory budget as Orin Nano Super, but Orin Nano Super has higher per-core throughput. +- **Related Dimension**: SQ3+SQ4 / C1 Open-source-permissive secondary candidate +- **Fit Impact**: **carry as fallback only, not lead.** Kimera's permissive license is attractive but its resource overhead (especially the unused 3D mesh + semantic mesher) is a poor fit under co-resident process pressure. Use as a conservative secondary fallback if OKVIS2 unexpectedly fails Jetson MVE. **Status**: not lead. + +### Fact #33 — DROID-SLAM is disqualified by AC-4.2: ≥11 GB GPU VRAM inference budget exceeds the project's 8 GB shared LPDDR5; further, DROID-SLAM is monocular VO/SLAM without IMU fusion and would require an external metric-scale wrapper +- **Statement**: DROID-SLAM (princeton-vl, Teed & Deng — NeurIPS 2021; arXiv 2108.10869) requires ≥11 GB GPU memory to run inference per the official README; training requires ≥24 GB on 4× RTX 3090. Issue #121 confirms that even with 128 GB system RAM and 16 GB VRAM (RTX 4080), users hit very large RAM consumption quickly. Algorithmically, DROID-SLAM is **monocular VO/SLAM** with recurrent dense bundle adjustment over a complete history of camera poses — no native IMU fusion; output pose is in arbitrary scale (no metric scale recovery without external alignment). DPV-SLAM (ECCV 2024, princeton-vl) is the lighter successor at ~4–5 GB GPU memory; DPVO (NeurIPS 2023, princeton-vl) is even lighter at ~3 GB, but neither natively integrates IMU. +- **Source**: Source #50 (DROID-SLAM canonical), Source #51 (DPVO / DPV-SLAM successor), Source #52 (DPVO-QAT++ memory measurement) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ +- **Related Dimension**: SQ3+SQ4 / C1 disqualified candidate +- **Fit Impact**: **DISQUALIFIED outright.** AC-4.2 sets the 8 GB shared CPU+GPU memory budget; DROID-SLAM's ≥11 GB GPU-only requirement violates it before adding co-resident C2/C3/C5/C6 processes. Cite as "what the project cannot afford" in `solution_draft01` to pre-empt obvious questions. + +### Fact #34 — DPVO is monocular VO only (no IMU fusion); it can fit a Jetson-suitable memory footprint with QAT but cannot satisfy the C1 VIO mandate alone — would need an external IMU + metric-scale wrapper +- **Statement**: DPVO (Teed, Lipson, Deng — NeurIPS 2023; ECCV 2024 DPV-SLAM successor) is a deep-learning monocular VO with sparse patch tracking + differentiable bundle adjustment. **Mode**: monocular VO only — no IMU fusion in the published paper or repository; output pose is in arbitrary scale. Memory footprint: DPVO ~3 GB GPU, DPV-SLAM ~4–5 GB GPU on standard hardware; DPVO-QAT++ (arXiv 2511.12653, Cheng Liao, Nov 2025) reduces peak reserved memory to 1.02 GB on RTX 4060 (8 GB) via fused-CUDA INT8 fake-quantization while preserving ATE on TartanAir/EuRoC. **License**: MIT (permissive). Repository: 989 stars; last update 2024-10-12. **Crucial gap**: DPVO does NOT meet the C1 mandate of a "VIO that produces metric-scale 6-DoF + attitude with σ ≤ 5°" — for the project to use DPVO as the *VO half* of C1, an additional IMU+scale-fusion module (loosely-coupled ESKF with VO velocity / displacement priors) must be designed; alternatively, DPVO's pose can feed C5 directly as a relative-displacement constraint, with attitude served separately by FC IMU integration. **Jetson Orin Nano runtime evidence**: indirect — DPVO-QAT++ benchmarks on RTX 4060 desktop, NOT Jetson Orin Nano. The Ampere GPU architecture is shared between RTX 4060 and Orin Nano Super (both Ampere); the Orin Nano Super's GPU is smaller, so direct extrapolation is not safe — Jetson MVE required. +- **Source**: Source #51 (DPVO / DPV-SLAM canonical), Source #52 (DPVO-QAT++ Nov 2025) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 / C5 implementer +- **Confidence**: ✅ for "VO only, no IMU fusion" and the memory footprints; ⚠️ for Jetson Orin Nano direct runtime (no measurement); ⚠️ for the operational complexity of the QAT pipeline (teacher-student distillation training is a significant prerequisite vs the classical VINS-* / OpenVINS / OKVIS2 candidates). +- **Related Dimension**: SQ3+SQ4 / C1 conditional candidate (VO not VIO; needs external IMU wrapper) +- **Fit Impact**: **NOT a drop-in C1 candidate; conditional fit only.** DPVO is **not** a substitute for VINS-Mono / OpenVINS / OKVIS2 — it is a candidate for the *VO half* of a hybrid design where C5 (estimator) absorbs IMU and DPVO provides relative-pose priors. This adds design complexity and is **not preferred** unless one of the established VIO candidates fails Jetson MVE for memory reasons. **Status**: secondary, conditional. + +### Fact #35 — Pure VO baseline (KLT optical flow + 5-point essential matrix or homography RANSAC) is the project's mandatory simple-baseline candidate and is the de-facto fallback when learning-based methods fail on Jetson-budget constraints +- **Statement**: The classical pipeline — Shi-Tomasi or FAST corner detection → KLT pyramidal optical flow tracking (`cv::calcOpticalFlowPyrLK`) → 5-point essential matrix (Nister, `cv::findEssentialMat`) or homography RANSAC (`cv::findHomography`) → relative pose with arbitrary scale → metric-scale alignment via IMU integration externally — is the foundational visual-odometry pipeline implemented in OpenCV samples and pedagogical repositories. For the project's nadir-down UAV at 1 km AGL over Ukrainian steppe (predominantly planar terrain, low relief), the **homography path is geometrically appropriate** (a plane induces a homography between two views); for non-planar relief, the **essential-matrix path is appropriate** at a small overhead. License: public domain / OpenCV-Apache-2.0 / MIT (whatever reference implementation is chosen) — permissive. Reference: representative public Monocular-Video-Odometery (MIT, alishobeiri 2018), Monocular-Visual-Odometry (Yacynte) at translation error 0.94% / rotation error 0.015°/m on KITTI dataset. +- **Source**: Source #53 (OpenCV docs + reference implementations) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer + risk reviewer +- **Confidence**: ✅ +- **Related Dimension**: SQ3+SQ4 / C1 Simple-baseline candidate (mandatory per Component Option Breadth rule) +- **Fit Impact**: **carry as the project's `Simple baseline / known-runnable / known-failure-mode` C1 fallback.** Not a lead, but mandatory presence. Failure modes: (a) low-texture cropland / snow → KLT track loss; (b) sharp turns → low-overlap homography degeneracy; (c) no native IMU fusion → must wrap with external metric-scale alignment (same wrapper as DPVO). **Status**: simple-baseline reference; cited in `solution_draft01` to anchor the failure analysis. + +### Fact #36 — Step-0.5-time-window assessment: VINS-Mono / VINS-Fusion master branches are at the Critical-novelty 18-month boundary; OpenVINS and OKVIS2 are within window; DPVO is borderline; the established baselines (KLT + RANSAC) are exempt +- **Statement**: Per Step 0.5 timeliness assessment in `00_question_decomposition.md`, Critical-novelty topics require sources within 6 months for SOTA claims and 18 months for established libraries' API behaviour. Audit at access time 2026-05-07: VINS-Mono master last meaningful commit 2024-02-25 → ~27 months → **just over the 18-month window**; VINS-Fusion 2024-05-23 → ~24 months → just over; OpenVINS master active (issue threads through Feb 2025) and v2.7 release June 2023 → ~35 months for the tagged release but master in stable maintenance → within de-facto window for an established library; OKVIS2-X push 2026-03-17 → ~2 months → **fully within window**; DPVO last code update 2024-10-12 → ~19 months → just over but DPV-SLAM ECCV 2024 keeps the algorithm class within 6-month claim window; KLT / 5-point / RANSAC / homography → established baselines per Step 0.5 → **no time window applies**. **Implication**: VINS-Mono / VINS-Fusion fall into the "older than 18 months but classical authoritative reference" bucket — Step 0.5 allows up to 18 months strictly, but downstream forks (vins-mono-android, embedded variants) and the IEEE T-RO 2018 publication keep the algorithm class in active community use. Recommended treatment: **keep as candidates but require live MVE on Jetson Orin Nano Super before promotion to Selected**, to revalidate against the current OpenCV / Ceres / ROS 2 stack. +- **Source**: Source #43, Source #44, Source #45, Source #47, Source #48, Source #51 (timeliness audit per source) +- **Phase**: Phase 2 +- **Target Audience**: Step-7.5 reviewer + System architects +- **Confidence**: ✅ +- **Related Dimension**: SQ3+SQ4 / C1 candidate-pool integrity +- **Fit Impact**: **applies a conservative timeliness gate: every C1 candidate from VINS-Mono / VINS-Fusion / DPVO requires an Orin-Nano-Super MVE before being marked Selected**, since their master-branch staleness pushes them out of the Critical-novelty 18-month window. OpenVINS / OKVIS2 / OKVIS2-X / Kimera are within window via active issue threads or recent releases. + +### C1 Component Applicability Gate — preliminary table (this session; structured Restrictions×AC sub-matrix per candidate is next session's work) + +| Candidate | Mode (project) | License | Active maintenance? | Jetson Orin Nano Super runnable? | Native IMU fusion? | Native metric scale? | License blocks dual-use? | Preliminary status | +|---|---|---|---|---|---|---|---|---| +| **VINS-Mono** | mono+IMU | GPL-3.0 (copyleft) | ⚠️ borderline (24 mo) | ✅ proven on Jetson Nano (2021) → Orin Nano Super virtually certain | ✅ | ✅ | **⚠️ Verify with user** | Lead candidate **conditional on user license decision** + Orin-Nano-Super MVE | +| **VINS-Fusion** | mono+IMU (mode) | GPL-3.0 | ⚠️ borderline (24 mo) | ⚠️ failed on TX2 (KAIST 2021); Orin Nano Super untested | ✅ | ✅ | **⚠️ Verify with user** | Alternate, secondary to VINS-Mono within HKUST family | +| **OpenVINS** | mono+IMU | GPL-3.0 | ✅ active master | ✅ build confirmed on Orin Nano Dev Kit + JetPack 6 (2024 + 2025 community evidence); ~270 ms latency on Xavier NX | ✅ MSCKF | ✅ | **⚠️ Verify with user** | **Lead candidate** **conditional on user license decision** (best Jetson-Orin-Nano evidence + most maintained of the GPL-3 trio) | +| **OKVIS2 / OKVIS2-X** | mono+IMU (+ optional GNSS) | BSD-3 | ✅ very active (2026 pushes) | ⚠️ no direct Jetson Orin Nano measurement; factor-graph backbone plausibly heavier than MSCKF | ✅ | ✅ | ✅ no | **Lead candidate by license + maintenance + spoof-promotion architectural alignment**, pending Jetson MVE | +| **Kimera-VIO** | mono+IMU (optional) | BSD-2 | ✅ active | ⚠️ failed on Xavier NX 8 GB shared under multi-process (KAIST 2021) | ✅ | ✅ | ✅ no | Fallback secondary; resource overhead poor fit for project | +| **DROID-SLAM** | mono VO/SLAM only | (project repo) | reference baseline | ❌ ≥11 GB GPU VRAM > 8 GB AC-4.2 budget | ❌ | ❌ (arbitrary scale) | n/a | **DISQUALIFIED** by AC-4.2 | +| **DPVO / DPV-SLAM** | mono VO only | MIT | ⚠️ borderline (19 mo on code, ECCV 2024 paper) | ⚠️ DPVO-QAT++ (Nov 2025) shows 1.02 GB peak on RTX 4060 desktop; Jetson Orin Nano untested | ❌ (needs external IMU wrapper) | ❌ (needs external scale alignment) | ✅ no | Conditional secondary — VO half of a hybrid C1+C5 design only; not a drop-in VIO replacement | +| **Pure VO baseline (KLT + 5pt RANSAC / homography)** | mono VO only | OpenCV-Apache-2.0 / MIT | ✅ foundational (no time window) | ✅ runs on any Jetson | ❌ (needs external IMU wrapper) | ❌ (needs external scale alignment) | ✅ no | **Mandatory simple-baseline reference** per Component Option Breadth rule | + +**Surviving lead candidates (preliminary)**, in priority order based on this session's evidence: +1. **OpenVINS** (GPL-3.0, MSCKF, best Jetson Orin Nano evidence) — pending user license decision + Orin-Nano-Super MVE +2. **OKVIS2 / OKVIS2-X** (BSD-3, factor-graph + GNSS-fusion alignment, most active maintenance) — pending Jetson MVE +3. **VINS-Mono** (GPL-3.0, sliding-window optimization, proven on Jetson Nano) — pending user license decision + Orin-Nano-Super MVE +4. **Pure VO baseline** (mandatory simple-baseline; runtime guaranteed; carries the project as a graceful fallback) + +**Disqualified outright**: DROID-SLAM (AC-4.2 memory budget), RTAB-Map and ORB-SLAM3 (already pruned by Fact #16). + +**Conditional / not-direct-fit**: DPVO / DPV-SLAM (VO not VIO, needs external IMU wrapper), Kimera-VIO (resource overhead unjustified for narrow C1 mandate). + +### C1 Open Decisions (to be resolved before SQ3+SQ4 closure) + +**Decision D-C1-1 — GPL-3.0 license posture for the onboard binary** (BLOCKING for the GPL-3.0 trio: VINS-Mono / VINS-Fusion / OpenVINS). +- The three most established VIO candidates (VINS-Mono / VINS-Fusion / OpenVINS) are GPL-3.0 (viral copyleft). +- For dual-use UAV deployment, GPL-3 binary distribution to a customer triggers obligations: source-code disclosure for the entire linked binary, anti-tivoization clauses for embedded firmware updates, viral effect on any proprietary code linked into the same binary. +- BSD/MIT alternatives exist (OKVIS2 BSD-3, Kimera BSD-2, DPVO MIT, pure-VO baseline OpenCV-Apache-2.0), but each comes with secondary trade-offs (Jetson MVE risk, missing IMU fusion, resource overhead). +- Three options for the user: + - **(a)** Accept GPL-3.0 — distribution model = release source on customer request; or operate the system as a service rather than transferring binaries. Lowest-risk algorithmic path (most-tested candidates). + - **(b)** Restrict to permissive licenses only (BSD/MIT) — lead candidate becomes OKVIS2; carries Jetson MVE risk. + - **(c)** Keep both options open through the design phase — make the final license decision after the Jetson Orin Nano MVE results are in. +- **Recommended default**: **(c)** — defer the binary commitment until empirical evidence on Jetson Orin Nano. This is recorded as a flagged decision; SQ3+SQ4 candidate matrix will carry both license families to Step 7.5. + +**Decision D-C1-2 — Acceptance of Jetson Orin Nano MVE as a Step-7.5 prerequisite** (procedural). +- Per the Per-Mode API Capability Verification rule, every lead candidate library/SDK requires `context7` (or equivalent docs) lookup + a Minimum Viable Example for the project's pinned mode + per-numbered-Restriction × per-numbered-AC sub-matrix. +- The Component Applicability Gate above is **preliminary** — it documents enumeration evidence but does NOT yet contain `context7` per-mode capability verification or the structured sub-matrix. +- **Next session's mandatory work**: `context7` lookup (3 mandatory queries) for OpenVINS / OKVIS2 / VINS-Mono; per-Restriction × per-AC sub-matrix per candidate; the same for the simple-baseline path; record into `02_fact_cards.md` per the engine template + `06_component_fit_matrix.md` per Step 7.5. + +### C1 Boundary check: candidate enumeration is saturated for this session + +Saturation signals observed: (a) all 7 named candidates from `00_question_decomposition.md` C1 row enumerated with at least one canonical L1 source per candidate; (b) Jetson Orin Nano runtime evidence located for OpenVINS (direct) and VINS-Mono (Jetson Nano + RPi CM4); other candidates carry "MVE required" gates explicitly; (c) license diversity covered (GPL-3.0 trio + BSD-permissive duo + MIT + permissive-baseline); (d) explicit disqualifications recorded with cited evidence (DROID-SLAM, RTAB-Map, ORB-SLAM3). **Open**: per-mode `context7` verification (BLOCKING per rule) + Restrictions×AC sub-matrices (BLOCKING per Step 7.5) — explicitly deferred to next session. diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md index bb5b618..e431403 100644 --- a/_docs/_autodev_state.md +++ b/_docs/_autodev_state.md @@ -6,8 +6,8 @@ step: 2 name: Research status: in_progress sub_step: - phase: 10 - name: sq3-sq4-c1-vio-candidates - detail: "Mode A Phase 2 engine Step 4 — SQ3+SQ4 per-component candidate matrix, starting with C1 (VO/VIO). SQ1 ✓, SQ2 ✓ closed with three architectural decisions resolved by user (recorded in 00_question_decomposition.md): (1) DSM dependency = (a) 3-DoF acceptance (no DSM in cache; attitude from IMU/VIO); (2) AdHoP = (b) conditional (only when initial reprojection error exceeds threshold); (3) Top-N re-rank = (a) promoted to explicit pipeline sub-stage between C2 and C3. Pipeline shape entering SQ3+SQ4: C1 (VIO) → C2 (VPR) → Top-N re-rank → C3 (matcher) → AdHoP-conditional → C4 (PnP+RANSAC+LM) → C5 (estimator) → C8 (FC adapter), with C6 (cache 2D-ortho) + C7 (Jetson runtime) + C9 (datasets) + C10 (provisioning) cross-cutting. C1 candidates entering SQ3+SQ4 (RTAB-Map and ORB-SLAM3 already pruned by Fact #16): VINS-Mono / VINS-Fusion / OpenVINS / OKVIS2 / DROID-SLAM / DPVO / pure-VO baseline (KLT + RANSAC homography). Per-mode context7 capability verification mandatory for every lead library/SDK candidate. Next conversation: re-enter Step 2 / sub_step phase 10 name 'sq3-sq4-c1-vio-candidates', execute Per-Component Search Plan (`research/steps/03_engine-investigation.md` Step 4) starting with C1: enumerate 5+ search variants per candidate, log to 01_source_registry.md + 02_fact_cards.md, then Component Applicability Gate (must pass: monocular VIO, ≥3 fps output, Jetson Orin Nano runnable, license OK for dual-use)." + phase: 12 + name: c1-context7-and-restrictions-ac-submatrix + detail: "C1 candidate enumeration done (Sources #43–#53 in 01_source_registry.md, Facts #28–#36 in 02_fact_cards.md). Surviving lead candidates (priority order): (1) OpenVINS — GPL-3.0, best Jetson Orin Nano evidence; (2) OKVIS2 / OKVIS2-X — BSD-3, most actively maintained, GNSS-fusion alignment for AC-NEW-2; (3) VINS-Mono — GPL-3.0, proven on Jetson Nano; (4) Pure VO baseline — mandatory simple-baseline reference. Disqualified: DROID-SLAM (AC-4.2 memory budget), RTAB-Map / ORB-SLAM3 (Fact #16). Conditional: DPVO (VO not VIO; needs external IMU wrapper), Kimera-VIO (resource overhead). Two open decisions surfaced: D-C1-1 GPL-3.0 license posture for onboard binary (BLOCKING for GPL-3 trio) and D-C1-2 Jetson Orin Nano MVE schedule. NEXT SESSION'S WORK (BLOCKING per Per-Mode API Capability Verification rule): (a) context7 lookup × 3 mandatory queries per lead candidate (OpenVINS, OKVIS2/OKVIS2-X, VINS-Mono) covering mode enumeration + project's exact mode runnable example + disqualifier probe; (b) MVE block per candidate in 02_fact_cards.md; (c) per-numbered-Restriction × per-numbered-AC sub-matrix per candidate; (d) write 06_component_fit_matrix.md draft for C1 row; (e) ASK USER on Decision D-C1-1 before promoting any GPL-3 candidate to Selected. AFTER C1 IS CLOSED: proceed to C2 (VPR) candidate enumeration." retry_count: 0 cycle: 1