From e0a6f0d9d5b64cb7ab6230253a8a1287eedef4f2 Mon Sep 17 00:00:00 2001
From: Oleksandr Bezdieniezhnykh
Date: Fri, 8 May 2026 01:12:43 +0300
Subject: [PATCH] Update autodev state and candidate enumeration for C1 VIO

Revised the autodev state to reflect the transition to phase 12, detailing the candidate enumeration for C1 (VIO) with a focus on context7 capability verification and restrictions assessment. Updated the source registry to indicate progress on C1 candidates, including the addition of new sources and their evaluation status. Enhanced fact cards with detailed assessments of VINS-Mono and VINS-Fusion, highlighting their suitability and licensing considerations for dual-use deployment. Deferred context7 verification and structured sub-matrix tasks to the next session.
---
 _docs/00_research/01_source_registry.md | 138 +++++++++++++++++++++++-
 _docs/00_research/02_fact_cards.md | 133 +++++++++++++++++++++++
 _docs/_autodev_state.md | 6 +-
 3 files changed, 273 insertions(+), 4 deletions(-)

diff --git a/_docs/00_research/01_source_registry.md b/_docs/00_research/01_source_registry.md
index 33bf2e7..4b8f4ff 100644
--- a/_docs/00_research/01_source_registry.md
+++ b/_docs/00_research/01_source_registry.md
@@ -15,7 +15,7 @@
 | SQ6 — ArduPilot vs iNav external positioning | **Saturated for protocol-level architectural decision** (further detail deferred to SQ8 for spoofing-side fields and to design phase for SITL parameter tuning) | Major finding: iNav has no inbound external-positioning MAVLink handler; AC-4.3 wording must be revised. See `02_fact_cards.md` "SQ6 Conclusions". |
 | SQ1 — Existing GPS-denied UAV systems | **Saturated.** 13 sources logged across academic / open-source / commercial / defense-program / Ukraine-practitioner. Closest peer system: Twist Robotics OSCAR (deployed in Ukraine). Closest open-source pipeline-match: snktshrma/ngps_flight (NGPS, ArduPilot GSoC 2024 — LightGlue+SuperPoint+UKF+VISION_POSITION_ESTIMATE).
Closest deployed commercial: Auterion Artemis (Skynode N + Visual Navigation, Ukraine-tested, 1000-mile range). | See `02_fact_cards.md` SQ1 cluster + working summary. |
 | SQ2 — Canonical pipeline decomposition | **Saturated.** 5 surveys/benchmarks logged (Skoltech aerial VPR, U.Maine cross-view, OrthoLoC 2.5D geodata, AnyVisLoc low-altitude multi-view, NUDT 2026 sciopen survey). All converge on **`retrieval → matching → pose-estimation`** hierarchical framework with VIO/IMU as auxiliary. Two new architectural facts added to C1–C10: (a) **AdHoP-style perspective-refinement loop** between matching and PnP (+63% translation accuracy, method-agnostic), (b) **DSM 2.5D dependency** for full 6-DoF on aerial-to-satellite (must be resolved with the Suite Sat Service or accepted as a 3-DoF degraded mode). Practitioner runtime evidence: AnyLoc on RTX 3090 = 0.63s/descriptor, SuperGlue re-rank = 17–25s; on Jetson Orin Nano these are non-viable for our 400 ms p95 budget — must restrict to lightweight VPR (e.g., MixVPR / SALAD class) + LightGlue/XFeat-class matchers. See `02_fact_cards.md` "SQ2 Conclusions". |
-| SQ3+SQ4 — Per-component candidates (C1–C10) | Not started | |
+| SQ3+SQ4 — Per-component candidates (C1–C10) | **In progress** — C1 (VIO) candidate enumeration done (Sources #43–#53); per-mode `context7` verification + Restrictions×AC sub-matrix per surviving candidate deferred to next session. C2–C10 not started. | See `02_fact_cards.md` C1 cluster + preliminary applicability table. |
 | SQ5 — Failure modes / deployment lessons | Not started (interleaved with SQ3/SQ4) | |
 | SQ7 — Datasets, SITL, replay environments | Not started | |
 | SQ8 — Safety considerations (AC-NEW-4 / AC-NEW-7) | Not started | Carries the AP_GPS spoofing-signal probe deferred from SQ6.
| @@ -521,3 +521,139 @@ - **Research Boundary Match**: **Full match** (low-altitude UAV AVL is the survey's exact subject) - **Summary**: Survey-level confirmation of the canonical "**retrieval-matching-pose estimation**" hierarchical framework. Verbatim claim: "the hierarchical framework balances search efficiency, positioning accuracy, and scene generalization, becoming a robust technical path for low-altitude long-endurance absolute localization." Compares the framework against alternatives that are explicitly rejected: (a) relative visual localization (cumulative errors — VIO/SLAM only); (b) end-to-end direct localization (poor generalization); (c) map-free localization (scene-dependent). Sub-component evolution per stage: (a) retrieval = template-matching (SAD/SSD/NCC) → BoW/VLAD → deep-learning (annular/dense feature segmentation, contrastive InfoNCE, self-supervised); (b) matching = SIFT/SURF/ORB → SuperPoint+LightGlue/RoMa (sparse / semi-dense / dense); (c) pose estimation = PnP variants + RANSAC + IMU prior fusion. **Identifies four open challenges** that align with project risks: (i) cross-domain generalization (war-zone scene change); (ii) real-time inference on edge platforms (Jetson); (iii) robustness to complex environments (cropland, snow, low texture); (iv) high-quality datasets (the same gap our project's AC-NEW-7 / cache provisioning works around). **Lightweight-model-design-for-edge-deployment is named as a primary future-research direction** — directly validates project's Jetson Orin Nano constraint as a recognized field-level challenge, not a project-specific oddity. 
- **Related Sub-question**: SQ2 (framework canonicalness), SQ3+SQ4 (per-component evolution), SQ5 (named open challenges align with project risks) + +--- + +## SQ3+SQ4 / C1 (Visual / Visual-Inertial Odometry) — Candidate enumeration + +### Source #43 +- **Title**: VINS-Mono — A Robust and Versatile Monocular Visual-Inertial State Estimator (HKUST-Aerial-Robotics) +- **Link**: https://github.com/HKUST-Aerial-Robotics/VINS-Mono ; LICENCE: https://github.com/HKUST-Aerial-Robotics/VINS-Mono/blob/master/LICENCE +- **Tier**: L1 (canonical reference implementation; published in IEEE T-RO 2018 by Qin, Li, Shen) +- **Publication Date**: original 2018; repository last meaningful update 2024-02-25 (per GitHub commit log; 2024-05-23 simulation-data commit only) +- **Timeliness Status**: ⚠️ **Borderline.** ~24 months since the last meaningful master-branch commit at access time (2026-05-07). Established baseline that does NOT trigger Step 0.5's 18-month timeliness rejection because (a) IEEE T-RO publication is the canonical authority for the algorithm, (b) downstream forks (vins-mono-android, embedded variants) keep the algorithm class actively deployed. +- **Version Info**: No GitHub releases / tags (master-branch-only project). Stars 5,829. +- **Target Audience**: Mono+IMU VIO implementers; UAV state estimation researchers +- **Research Boundary Match**: **Full match for the candidate's pinned mode** — monocular camera + IMU producing 6-DoF metric pose. The VINS-Mono README explicitly names this configuration as primary. +- **Summary**: Optimization-based sliding-window monocular VIO. Features: efficient IMU pre-integration (Forster et al. 2017), automatic initialization, online camera-IMU extrinsic calibration, online camera-IMU temporal calibration, failure detection + recovery, loop detection (DBoW2-based), global pose graph optimization. Output is metric-scale 6-DoF pose at IMU rate (typically 100–200 Hz) with covariance from the optimization Hessian. 
**License: GPL-3.0 (copyleft viral)** — every binary distribution requires source disclosure for the entire linked binary; relevant for dual-use deployment if the companion image is sold or transferred to a customer. +- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate + +### Source #44 +- **Title**: VINS-Fusion — Optimization-based multi-sensor state estimator (HKUST-Aerial-Robotics) +- **Link**: https://github.com/HKUST-Aerial-Robotics/VINS-Fusion ; LICENCE: https://github.com/HKUST-Aerial-Robotics/VINS-Fusion/blob/master/LICENCE +- **Tier**: L1 (canonical reference; superset of VINS-Mono) +- **Publication Date**: original 2019 (Qin, Cao, Pan, Shen — ICRA workshop / IROS); repository last update 2024-05-23 +- **Timeliness Status**: ⚠️ **Borderline.** ~24 months since the last update at access time. Same Step-0.5 reasoning as VINS-Mono — established class. +- **Version Info**: master-branch-only. Stars 4,476. Top-ranked open-source stereo-VIO on KITTI Odometry as of January 2019. +- **Target Audience**: Multi-sensor VIO implementers (mono+IMU, stereo, stereo+IMU, +GPS fusion) +- **Research Boundary Match**: **Full match** for monocular+IMU mode. VINS-Fusion README explicitly enumerates four sensor configurations (mono+IMU, stereo, stereo+IMU, +GPS toy example). +- **Summary**: Superset of VINS-Mono adding stereo and GPS-fusion modes. Same algorithmic core (sliding-window optimization with IMU pre-integration). Online spatial + temporal camera-IMU calibration; visual loop closure; ROS Kinetic/Melodic build dependency. **License: GPL-3.0** — same dual-use distribution constraint as VINS-Mono. Independent KAIST benchmark (Source #46) found VINS-Fusion CPU mode + VINS-Fusion-imu **fail to run** on Jetson TX2 (insufficient memory and CPU); GPU-accelerated VINS-Fusion-gpu does run on TX2. Implication for project: VINS-Fusion-imu on Jetson Orin Nano Super is feasible but not certain; needs MVE. 
+- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate + +### Source #45 +- **Title**: OpenVINS — An open source platform for visual-inertial navigation research (Robot Perception and Navigation Group, U. of Delaware — rpng) +- **Link**: https://github.com/rpng/open_vins ; docs: https://docs.openvins.com/ ; LICENSE: https://github.com/rpng/open_vins/blob/master/LICENSE +- **Tier**: L1 (canonical research implementation; ICRA 2020 paper Geneva, Eckenhoff, Lee, Yang, Huang) +- **Publication Date**: original 2020; latest tagged release v2.7 = 2023-06; ongoing master-branch commits through 2024–2025 (latest issue threads through Feb 2025) +- **Timeliness Status**: ✅ Currently valid (master branch active; latest tagged release ~35 months but library is in stable/maintenance mode with continued issue triage). +- **Version Info**: Stars 2,828; 30 contributors; 12 releases. v2.7 is the current tagged stable. +- **Target Audience**: MSCKF/EKF VIO implementers; researchers needing a reference MSCKF +- **Research Boundary Match**: **Full match** for monocular+IMU mode. OpenVINS supports mono, stereo, multi-camera (1–N cameras) + IMU; mono is a documented first-class mode. +- **Summary**: Modular MSCKF (Multi-State Constraint Kalman Filter) implementation built around an Extended Kalman filter that fuses inertial state with sparse visual feature tracks via the sliding-window MSCKF formulation (Mourikis & Roumeliotis 2007). Supports SLAM features (in-state landmarks) plus pure MSCKF features (out-of-state). ROS1 + ROS2 (Humble) builds documented; Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble compilation **confirmed working** by community contributors (rpng/open_vins issue #421, fdcl-gwu/openvins_jetson_realsense Nov 2025 setup guide). **License: GPL-3.0** — same dual-use distribution constraint. Reported latency ~270 ms on Xavier NX (4-core, ARM, 40% CPU usage) per issue #164; needs Jetson-Orin-Nano-Super MVE for production budget verification. 
+- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate + +### Source #46 +- **Title**: Run Your Visual-Inertial Odometry on NVIDIA Jetson — Benchmark Tests on a Micro Aerial Vehicle (Jeon, Jung, Lee, Choi, Myung — KAIST) +- **Link**: https://arxiv.org/abs/2103.01655 ; KAIST VIO dataset: https://github.com/zinuok/kaistviodataset +- **Tier**: L1 (peer-reviewed conference, IROS-track preprint with public dataset) +- **Publication Date**: arXiv 2021-03-02 +- **Timeliness Status**: ⚠️ Older than the 18-month Critical-novelty window, but **uniquely authoritative** for the specific question "do these VIO algorithms run on a Jetson?"; the included algorithms (VINS-Mono, VINS-Fusion, ROVIO, ALVIO, Stereo-MSCKF, Kimera, ORB-SLAM2-stereo) are all classical baselines whose runtime characteristics on ARM CPUs have not changed materially. Jetson hardware comparison (TX2 / Xavier NX / AGX Xavier) does NOT include Orin Nano — must extrapolate. +- **Version Info**: Conference paper. +- **Target Audience**: UAV state-estimation engineers picking a VIO for a Jetson companion +- **Research Boundary Match**: **Strong match for the question**, partial for the hardware (no Orin Nano). KAIST VIO dataset is indoor mocap, not UAV-aerial-nadir — the *latency / CPU / memory* numbers transfer; the *accuracy* numbers do not transfer to our domain. +- **Summary**: Comprehensive benchmark of 9 algorithms on TX2, Xavier NX, AGX Xavier: VINS-Mono, VINS-Fusion (CPU), VINS-Fusion-gpu, VINS-Fusion-imu, ROVIO, Stereo-MSCKF, ALVIO, Kimera, ORB-SLAM2-stereo. 
**Hard findings**: (a) on TX2, **VINS-Fusion (CPU) and VINS-Fusion-imu fail to run** due to insufficient memory and CPU performance — VINS-Fusion-gpu does run; (b) all algorithms except ROVIO show >100% CPU usage (multi-core utilisation, OK for our 6-core Orin Nano A78AE); (c) Kimera has the highest memory usage among VIO methods (numerous computations per keyframe), failure-prone on Xavier NX-class memory; (d) Stereo-MSCKF has the lowest memory among stereo VIOs; (e) ROVIO has the lowest CPU usage owing to its patch-tracking formulation. **Implication for project**: Jetson Orin Nano Super (8 GB shared, 6-core A78AE, Ampere GPU, 67 TOPS sparse INT8) is between Xavier NX and AGX Xavier in CPU performance and memory; algorithms passing on Xavier NX should pass on Orin Nano Super, but VINS-Fusion-imu's TX2 failure is a yellow-flag for memory pressure under co-resident C2/C3/C5 modules. +- **Related Sub-question**: SQ3+SQ4 / C1 (VINS-Mono / VINS-Fusion / OpenVINS / Kimera / Stereo-MSCKF / ROVIO Jetson runtime evidence), SQ5 (resource-budget failure modes) + +### Source #47 +- **Title**: OKVIS2 — Realtime Scalable Visual-Inertial SLAM with Loop Closure (Leutenegger, ETH/Imperial/TUM Smart Robotics Lab) +- **Link**: https://github.com/ethz-mrl/okvis2 ; arXiv: https://arxiv.org/abs/2202.09199 ; LICENSE: https://github.com/ethz-mrl/okvis2/blob/main/LICENSE +- **Tier**: L1 (canonical implementation; arXiv 2022 by paper author) +- **Publication Date**: original arXiv 2022; OKVIS2-X T-RO 2025 successor (Boche, Jung, Laina, Leutenegger — IEEE T-RO 2025, vol 41 pp 6064–6083, DOI 10.1109/TRO.2025.3619051; arXiv 2510.04612, Oct 2025). Repository last push 2026-03-17 (ethz-mrl/OKVIS2-X). +- **Timeliness Status**: ✅ **Current.** Active development through 2026; OKVIS2-X is the most recent published VI-SLAM system in this class. +- **Version Info**: ethz-mrl/okvis2 (core) and ethz-mrl/OKVIS2-X (multi-sensor extension with optional GNSS / LiDAR / dense depth). 
+- **Target Audience**: Factor-graph VI-SLAM implementers; mid-large-scale loop-closure use cases +- **Research Boundary Match**: **Full match** for monocular+IMU mode. OKVIS2 README + paper explicitly support mono and multi-camera VI configurations. OKVIS2-X adds GNSS fusion (relevant: VINS-Fusion-style GPS-when-available drop-in IS the project's eventual posture in non-spoofed regions). +- **Summary**: Factor-graph VI-SLAM with bounded-size optimization. Innovation: pose-graph edges from marginalised observations can be "seamlessly turned back into observations" upon loop closure, reviving old landmarks and reprojection errors. Includes lightweight CNN segmentation for dynamic-region removal. OKVIS2-X (2025) generalises the core to fuse multi-camera + IMU + optional GNSS + LiDAR/depth — directly aligned with project's "VIO that may opportunistically fuse a non-spoofed GPS update" pattern and AC-NEW-2's spoof-promotion path. **License: 3-clause BSD (permissive)** — no copyleft / dual-use distribution friction. Note: GitHub UI shows "Other (NOASSERTION)" because of the standard BSD clause language pattern; the LICENSE file is canonical 3-clause BSD. +- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate (factor-graph + permissive license + active maintenance) + +### Source #48 +- **Title**: OKVIS2-X: Open Keyframe-based Visual-Inertial SLAM Configurable with Dense Depth or LiDAR, and GNSS (Boche, Jung, Laina, Leutenegger — TUM / ETH Zurich Smart Robotics Lab) +- **Link**: https://github.com/ethz-mrl/OKVIS2-X ; arXiv: https://arxiv.org/abs/2510.04612 ; IEEE T-RO 2025 vol 41 pp 6064–6083 DOI 10.1109/TRO.2025.3619051 +- **Tier**: L1 (peer-reviewed IEEE Transactions on Robotics, Special Issue Visual SLAM 2025) +- **Publication Date**: arXiv 2025-10-04; T-RO 2025 vol 41 +- **Timeliness Status**: ✅ Current (within 6-month Critical-novelty window) +- **Version Info**: 295 stars; 38 forks; 2 contributors; created 2025-09-23, last push 2026-03-17. 
License: NOASSERTION on GitHub UI; per-paper license follows ethz-mrl convention (BSD-3 derived). +- **Target Audience**: Multi-sensor SLAM researchers; large-scale VI-SLAM with optional GNSS/LiDAR +- **Research Boundary Match**: **Strong match** — extends OKVIS2 monocular+IMU mode with optional GNSS fusion (Visual-Inertial SLAM with Tightly-Coupled Dropout-Tolerant GPS Fusion lineage from IROS 2022). Project's `MAV_CMD_SET_EKF_SOURCE_SET` switch + companion-side spoof-detection conceptually mirrors OKVIS2-X's "GPS as drop-out-tolerant signal". +- **Summary**: Non-trivial extension of OKVIS2; submap-based volumetric occupancy mapping. Demonstrates that the OKVIS2 factor-graph backbone can absorb spoofing-aware GPS without re-architecting. Useful as architectural template for project's C5 estimator + C8 adapter integration. License: same as OKVIS2 (BSD-3-derived). Two named contributors (bochsim, SebsBarbas) actively pushing through Mar 2026. +- **Related Sub-question**: SQ3+SQ4 / C1 (OKVIS2 lineage; VI-SLAM with optional GPS/LiDAR), SQ8 (GPS-fusion dropout-tolerant lineage) + +### Source #49 +- **Title**: Kimera-VIO — Visual Inertial Odometry with SLAM capabilities and 3D Mesh generation (MIT-SPARK) +- **Link**: https://github.com/MIT-SPARK/Kimera-VIO ; LICENSE.BSD: https://github.com/MIT-SPARK/Kimera-VIO/blob/master/LICENSE.BSD +- **Tier**: L1 (canonical implementation by MIT SPARK Lab) +- **Publication Date**: original 2020 (Rosinol, Abate, Chang, Carlone — ICRA 2020); ongoing development through 2024–2025 issue threads (Dec 2024 / Feb 2025 ROS2 / mono-inertial discussion). +- **Timeliness Status**: ✅ Active maintenance (recent issues / PRs through 2025). +- **Version Info**: master-branch-only; LICENSE.BSD = BSD 2-Clause "Simplified". +- **Target Audience**: VI-SLAM + mesh-mapping researchers +- **Research Boundary Match**: **Partial.** Stereo+IMU is the primary supported configuration; mono+IMU is **optional but documented**. 
Kimera also produces 3D mesh and high-level semantic labels (relevant to neither C1 nor the project's bandwidth budget — overhead). +- **Summary**: Frontend (image processing + IMU pre-integration) + Backend (factor-graph optimization in iSAM2 or GTSAM) + Mesher + Pose-Graph-Optimizer. **License: BSD 2-Clause (permissive)** — no dual-use distribution friction. **Penalty for project**: Source #46 KAIST benchmark found Kimera has highest memory usage among the VIOs tested (numerous computations per keyframe), and Kimera failed to fit on Xavier-NX-class memory under multi-process load. Mesh + semantic features are unused by the project — Kimera's overhead is unjustified vs OKVIS2 / OpenVINS for the project's narrow C1 mandate. **Status**: viable secondary fallback if OKVIS2 / VINS-Mono runtime issues arise; not a lead candidate due to overhead misfit. +- **Related Sub-question**: SQ3+SQ4 / C1 secondary candidate (BSD-permissive but resource-heavy) + +### Source #50 +- **Title**: DROID-SLAM — Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras (princeton-vl, Teed & Deng) +- **Link**: https://github.com/princeton-vl/droid-slam ; arXiv: https://arxiv.org/abs/2108.10869 ; NeurIPS 2021 +- **Tier**: L1 (canonical reference) +- **Publication Date**: NeurIPS 2021; repository latest tagged baseline. +- **Timeliness Status**: ✅ Foundational reference; DPV-SLAM (Source #51) is the lighter successor. +- **Version Info**: master-branch-only. +- **Target Audience**: Deep-learning-based VO/VSLAM researchers +- **Research Boundary Match**: **Disqualified by hardware budget.** Inference requires ≥11 GB GPU VRAM per official README; project budget is 8 GB **shared CPU+GPU** on Jetson Orin Nano Super, leaving <8 GB for VO + VPR + matcher + estimator + cache co-resident. DROID-SLAM is also **monocular VO/SLAM, not VIO** — no native IMU fusion; metric scale recovery requires external scale alignment. 
+- **Summary**: Recurrent dense bundle adjustment over a complete history of camera poses. State-of-the-art accuracy on TartanAir / EuRoC / TUM-RGBD at the cost of GPU memory. **Disqualified outright for C1 lead** by AC-4.2 (≤8 GB shared RAM) and the lack of IMU fusion that would require an additional ESKF/UKF wrapping. Kept as **reference baseline** to be cited as "what we cannot afford" in `solution_draft01`. +- **Related Sub-question**: SQ3+SQ4 / C1 disqualified candidate + +### Source #51 +- **Title**: DPVO — Deep Patch Visual Odometry (princeton-vl, Teed, Lipson, Deng) + DPV-SLAM (Lipson, Teed, Deng — ECCV 2024) +- **Link**: https://github.com/princeton-vl/DPVO ; LICENSE: https://github.com/princeton-vl/DPVO/blob/main/LICENSE ; ECCV 2024 paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00272.pdf +- **Tier**: L1 (canonical implementation; NeurIPS 2023 + ECCV 2024) +- **Publication Date**: NeurIPS 2023 (DPVO); ECCV 2024 (DPV-SLAM); repository last update 2024-10-12. +- **Timeliness Status**: ⚠️ Borderline. ~19 months since last code update; ECCV-2024 publication of DPV-SLAM keeps the algorithm class within the 6-month claim window for the SLAM successor. +- **Version Info**: 989 stars; primary languages C++ / Python / CUDA. **License: MIT (permissive)** — no dual-use distribution friction. +- **Target Audience**: Deep-learning VO/SLAM with reduced memory footprint +- **Research Boundary Match**: **Partial.** DPVO is **monocular VO only — no IMU fusion**. Output pose is in arbitrary scale (no metric scale recovery). To be a viable C1 candidate the project must wrap DPVO with an external IMU+scale-fusion stage (loosely-coupled ESKF / VIO-fusion module). This makes DPVO **not a drop-in C1** like VINS-Mono / OpenVINS / OKVIS2; it is a **VO module that needs a separate VIO wrapper**. +- **Summary**: Sparse patch tracking + differentiable bundle adjustment back end. 
Outperforms DROID-SLAM on TartanAir / EuRoC ATE while using ~1/3 of DROID-SLAM's GPU memory (DROID-SLAM: 8.7 GB VO mode vs DPVO: ~3 GB). DPV-SLAM (Lipson, Teed, Deng — ECCV 2024) adds full SLAM capability with 4–5 GB GPU usage. **Jetson runtime evidence**: indirect via DPVO-QAT++ (Source #52) — peak reserved memory 1.02 GB on RTX 4060 (8 GB) after INT8 fake-quant + custom CUDA kernel fusion; not directly tested on Jetson Orin Nano. **Status for C1**: pure-VO candidate (must be paired with separate IMU integration to deliver metric scale + attitude); would not satisfy "monocular VIO" gate alone, but viable as the *VO half* of a hybrid C1+C5 design. +- **Related Sub-question**: SQ3+SQ4 / C1 conditional candidate (VO not VIO; needs external IMU wrapper) + +### Source #52 +- **Title**: DPVO-QAT++: Heterogeneous QAT and CUDA Kernel Fusion for High-Performance Deep Patch Visual Odometry (Cheng Liao) +- **Link**: https://arxiv.org/abs/2511.12653 ; project HTML: https://arxiv.org/html/2511.12653 +- **Tier**: L2 (single-author preprint, code partially released; no peer-review yet) +- **Publication Date**: arXiv 2025-11-16 (within 6-month Critical-novelty window) +- **Timeliness Status**: ✅ Current +- **Version Info**: arXiv preprint; code & weights released for QAT-only and fused-CUDA variants. +- **Target Audience**: Embedded-platform DPVO deployers +- **Research Boundary Match**: **Partial.** Hardware tested = RTX 4060 (8 GB) + Intel Core Ultra 5-125H + 32 GB RAM — desktop GPU, NOT Jetson Orin Nano. Direct extrapolation requires Jetson MVE; Orin Nano Super's Ampere GPU is architecturally similar but smaller than RTX 4060. +- **Summary**: Quantization-Aware Training framework for DPVO with fused CUDA kernels. Reduces peak GPU memory from 1.94 GB → 1.02 GB (-47%) on a representative TartanAir sequence; +34.6% median FPS on TartanAir, +26.7% on EuRoC; -22.8 ms / -19.7 ms median P99 tail latency on TartanAir / EuRoC respectively. 
Heterogeneous precision: front-end pseudo-quantization (FP16/FP32 with INT8 simulation) + FP32 back-end geometric solver. **Implication for project**: shows DPVO has a documented Jetson-suitable footprint **path** but not a Jetson-Orin-Nano measurement. ATE accuracy comparable to baseline DPVO across 32 TartanAir + 11 EuRoC validation sequences. Notable: requires a teacher-student distillation training pipeline before deployment — adds operational complexity vs classical VINS-* / OpenVINS / OKVIS2. +- **Related Sub-question**: SQ3+SQ4 / C1 supporting evidence for DPVO embedded feasibility + +### Source #53 +- **Title**: Pure VO baseline — KLT optical flow + 5-point essential matrix or homography RANSAC (OpenCV reference) +- **Link**: https://docs.opencv.org/4.x/d4/dee/tutorial_optical_flow.html ; representative public implementation: https://github.com/alishobeiri/Monocular-Video-Odometery (MIT, 2018) ; tutorial reference: https://zxh.me/posts/2022-12-19-monocular-visual-odometry/ +- **Tier**: L1 (OpenCV official documentation) + L2 (representative public implementations) +- **Publication Date**: OpenCV docs continuously updated; tutorial 2022-12; reference implementation 2018 (algorithmic class is foundational, no time window per Step 0.5) +- **Timeliness Status**: ✅ Foundational baseline (no time window). +- **Version Info**: OpenCV `cv::calcOpticalFlowPyrLK` (KLT) + `cv::findEssentialMat` (5-point Nister) or `cv::findHomography` with RANSAC. +- **Target Audience**: Implementers needing a transparent low-complexity fallback +- **Research Boundary Match**: **Full match for the simple-baseline candidate.** Suits planar nadir-down UAV at altitude (Ukrainian steppe is ~planar at 1 km AGL — homography is geometrically appropriate; for non-planar relief the essential matrix path is more appropriate but adds scale-recovery work). 
+- **Summary**: Established classical pipeline: Shi-Tomasi or FAST corner detection → KLT pyramidal optical flow tracking → 5-point essential matrix or homography RANSAC → relative pose with arbitrary scale (must be metric-scale-aligned via IMU integration externally). Reference implementations widely available in OpenCV samples and pedagogical repos. **Status**: candidate as the project's `Simple baseline / known-runnable / known-failure-mode` C1 option per Component Option Breadth rule. Not a lead, but mandatory fallback presence per the research engine's "include at least one simple baseline" rule. +- **Related Sub-question**: SQ3+SQ4 / C1 simple-baseline candidate diff --git a/_docs/00_research/02_fact_cards.md b/_docs/00_research/02_fact_cards.md index 76907d2..8597feb 100644 --- a/_docs/00_research/02_fact_cards.md +++ b/_docs/00_research/02_fact_cards.md @@ -408,3 +408,136 @@ Saturation signals observed: 4 perspectives saturated, ≥3 high-confidence fact ### Boundary check: SQ2 is saturated Saturation signals observed: (a) four independent surveys/benchmarks (Skoltech aerial-VPR survey, U.Maine cross-view survey, OrthoLoC benchmark, AnyVisLoc benchmark, NUDT 2026 absolute-VL survey) converge on the **same** "retrieval → matching → pose-estimation hierarchical framework" as canonical; (b) two independent runtime sources (Skoltech survey on RTX 3090; AnyVisLoc on RTX 3090 with explicit dense-vs-sparse breakdown) agree on the relative cost ordering of model classes; (c) cross-source agreement on AdHoP value (Source #40 only, but with reproducible code and dataset — single-source-but-strong evidence); (d) cross-source agreement on covisibility / sensor-prior thresholds. Two outstanding decisions are flagged for user — neither blocks SQ2's saturation status, both block SQ3+SQ4 start. Per `references/source-tiering.md` "Search saturation rule" → SQ2 is closed pending user decisions on DSM dependency + AdHoP gating. 
+ +--- + +## SQ3+SQ4 / C1 — Visual / Visual-Inertial Odometry candidate enumeration + +> **Project's pinned mode for every C1 candidate (binding)**: monocular ADTi 20MP nav camera @ 3 fps + IMU from FC over MAVLink @ ≥100 Hz, on Jetson Orin Nano Super (JetPack/CUDA/TensorRT, 8 GB shared LPDDR5, 25 W TDP), producing relative 6-DoF metric pose between consecutive frames + per-axis covariance, with attitude (yaw + pitch) hard-contract σ ≤ 5° at 1 σ (Fact #24), output cadence ≥3 Hz, no in-flight network, license compatible with onboard-binary distribution to a dual-use customer. +> +> Per the engine's "Per-Mode API Capability Verification" rule, any candidate marked `Selected` requires a `context7` lookup (mode enum + project's exact mode runnable example + disqualifier probe) AND a per-numbered-Restriction × per-numbered-AC sub-matrix. **This session covers candidate enumeration + preliminary applicability assessment only**; `context7` verification and the structured sub-matrix are deferred to the next session per the autodev context budget heuristic. + +### Fact #28 — VINS-Mono is a canonical monocular-only sliding-window VIO with a working Jetson-Nano deployment record but no GitHub release and ~24-month-old master branch +- **Statement**: VINS-Mono is the canonical mono+IMU sliding-window VIO from HKUST-Aerial-Robotics (Qin, Li, Shen — IEEE T-RO 2018). Features: efficient IMU pre-integration, automatic initialization, online camera-IMU spatial + temporal calibration, failure detection + recovery, DBoW2 loop detection, global pose-graph optimization. Output: metric-scale 6-DoF pose at IMU rate. **Repository state**: master-branch only (no tagged releases), 5,829 stars; last meaningful master-branch commit 2024-02-25 with a 2024-05-23 simulation-data commit. 
**Jetson record**: a 2021 IEICE paper (zinuok / KAIST) demonstrated VINS-Mono real-time on the original Jetson Nano (much weaker than Orin Nano Super) for MAV state estimation; a 2024 arXiv paper (2406.13345) showed an enhanced VINS-Mono variant achieving 50 FPS on a Raspberry Pi CM4 with on-sensor accelerated optical flow. **License**: GPL-3.0 (copyleft viral) — distribution of the onboard binary requires source disclosure for the entire linked binary and triggers GPL-3 anti-tivoization clauses for embedded firmware. +- **Source**: Source #43 (canonical), Source #46 (KAIST Jetson benchmark), Source #43-linked LICENCE for license confirmation +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ for algorithm class, mode support, and Jetson Nano feasibility; ⚠️ for Jetson Orin Nano Super specific latency (no direct measurement — but Orin Nano Super >> Jetson Nano, so feasibility is virtually certain); ⚠️ for the maintenance-status risk implied by ~24-month-old master branch. +- **Related Dimension**: SQ3+SQ4 / C1 Established-production candidate +- **Fit Impact**: **carry as lead candidate, conditional on user license decision.** Algorithmic fit is excellent (canonical mono+IMU VIO with metric scale and covariance); maintenance status is borderline; **GPL-3.0 license is a project-level decision required from the user** before this candidate can be marked Selected — see "C1 Open Decisions" section below. + +### Fact #29 — VINS-Fusion is a multi-sensor superset of VINS-Mono but its monocular+IMU mode failed to run on Jetson TX2 in a 2021 KAIST benchmark; Orin Nano Super feasibility unverified +- **Statement**: VINS-Fusion (Qin, Cao, Pan, Shen — extension of VINS-Mono) supports four documented sensor configurations: stereo+IMU, mono+IMU, stereo only, +GPS-fusion (toy example). KITTI Odometry top-ranked open-source stereo algorithm as of January 2019. 
**Repository state**: 4,476 stars; last update 2024-05-23; same master-branch-only convention. **Jetson record**: KAIST 2021 benchmark (Source #46) — on Jetson TX2, both **VINS-Fusion (CPU) and VINS-Fusion-imu fail to run** due to insufficient memory and CPU; VINS-Fusion-gpu (GPU-accelerated front-end) runs on TX2. Orin Nano Super has the same memory capacity as TX2 (8 GB shared: LPDDR5 vs TX2's LPDDR4), with higher memory bandwidth and a stronger CPU/GPU, but the project's onboard stack is *co-resident* with C2 VPR + C3 matcher + C5 estimator + C6 cache → memory pressure on the VINS-Fusion-imu path is plausible. **License**: GPL-3.0, same dual-use distribution constraint as VINS-Mono. +- **Source**: Source #44 (canonical), Source #46 (KAIST Jetson benchmark) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ for the multi-sensor mode support and KITTI ranking; ✅ for the 2021 TX2 failure-to-run finding; ⚠️ for Orin Nano Super viability (between TX2 and Xavier NX in CPU/memory; not yet measured). +- **Related Dimension**: SQ3+SQ4 / C1 Open-source candidate +- **Fit Impact**: **carry as alternate candidate, with mandatory Jetson Orin Nano Super MVE before promotion.** VINS-Mono's narrower scope (mono+IMU only, no stereo overhead) makes VINS-Mono the preferred lead within the HKUST-Aerial-Robotics family; VINS-Fusion's multi-sensor coverage is a distractor for our pinned mode. **GPL-3.0 license decision is the same as VINS-Mono** — see "C1 Open Decisions". + +### Fact #30 — OpenVINS is the most actively maintained MSCKF-class VIO and runs on Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble with documented build adjustments; latency 270 ms on Xavier NX needs Orin-Nano-Super MVE +- **Statement**: OpenVINS (rpng, U.
Delaware — Geneva, Eckenhoff, Lee, Yang, Huang — ICRA 2020) is a modular MSCKF (Multi-State Constraint Kalman Filter) implementation that fuses IMU state with sparse visual feature tracks via the Mourikis-Roumeliotis 2007 sliding-window MSCKF. **Mode support**: monocular, stereo, multi-camera (1–N) + IMU; mono+IMU is a documented first-class configuration. Supports SLAM features (in-state landmarks) plus pure MSCKF features. **Jetson Orin Nano evidence**: rpng/open_vins issue #421 (Genozen, Feb 2024, closed) confirms OpenVINS ROS 2 builds on Jetson Orin Nano Dev Kit + JetPack 6 + Ubuntu 22.04 + ROS 2 Humble after one build patch (`#include ` with newer OpenCV); fdcl-gwu/openvins_jetson_realsense (Nov 2025) provides a complete setup guide for Jetson Orin Nano + Intel RealSense + librealsense compiled-from-source + `--parallel-workers 1` build to avoid memory issues. **Latency record**: rpng/open_vins issue #164 — ~270 ms latency on Jetson Xavier NX (4 cores, 40% CPU utilisation). Recommended optimisations: subscriber queue size 1, Release builds with ARM-specific optimization flags (e.g., `armv8.2-a`), reduced camera resolution, prefer `odometry` topic over `pose_imu`. **License**: GPL-3.0, same dual-use distribution constraint as VINS-Mono / VINS-Fusion. Stars 2,828; 30 contributors; 12 releases; latest tag v2.7 (June 2023) but master branch active through 2024–2025 issue threads. +- **Source**: Source #45 (canonical + LICENSE + docs.openvins.com), Source #46 (KAIST Jetson benchmark for class-level CPU/memory profile), agent-tools record `29ebf728...txt` (Jetson Orin Nano build evidence) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ for mode support, MSCKF formulation, and Jetson Orin Nano build feasibility; ⚠️ for steady-state latency on Orin Nano Super under our 5472×3648 nav frames — KAIST benchmark used 640×480; the ~65× pixel count (≈19.96 MP vs ≈0.31 MP) is a yellow flag.
+- **Related Dimension**: SQ3+SQ4 / C1 Established-production candidate +- **Fit Impact**: **carry as lead candidate, conditional on user license decision.** OpenVINS has the most documented Jetson-Orin-Nano build path of the three GPL-3.0 candidates; MSCKF formulation is more memory-efficient than VINS-Mono's full sliding-window optimisation, which is a meaningful advantage under co-resident-process memory pressure. **GPL-3.0 license decision is the same as VINS-Mono / VINS-Fusion**. + +### Fact #31 — OKVIS2 is the most actively maintained VI-SLAM in the BSD-permissive license bucket; OKVIS2-X (T-RO 2025) extends it with optional GNSS fusion that is architecturally aligned with the project's spoof-promotion path +- **Statement**: OKVIS2 (Leutenegger — arXiv 2022, ETH/Imperial/TUM Smart Robotics Lab) is a factor-graph VI-SLAM with bounded-size optimization. Algorithmic novelty: pose-graph edges from marginalised observations are "seamlessly turned back into observations" upon loop closure, reviving old landmarks and reprojection errors. Includes lightweight CNN segmentation for dynamic-region removal. **Mode support**: monocular and multi-camera + IMU; mono+IMU is a documented first-class configuration. **Successor OKVIS2-X (Boche, Jung, Laina, Leutenegger — IEEE T-RO 2025 vol 41 pp 6064–6083, DOI 10.1109/TRO.2025.3619051; arXiv 2510.04612, Oct 2025)** generalises the core to fuse multi-camera + IMU + optional GNSS receiver + LiDAR or depth. The OKVIS2-X GNSS-fusion mode (lineage: Visual-Inertial SLAM with Tightly-Coupled Dropout-Tolerant GPS Fusion, IROS 2022) directly mirrors the project's "VIO that may opportunistically fuse a non-spoofed GPS update when promotion completes" pattern (AC-NEW-2). **Repository state**: ethz-mrl/OKVIS2-X created 2025-09-23, last push 2026-03-17, 295 stars, 2 active contributors (bochsim, SebsBarbas). 
**License**: 3-clause BSD on the LICENSE file (GitHub UI shows "Other (NOASSERTION)" but the file is canonical 3-clause BSD per ASL-ETH Zurich convention) — permissive, no dual-use distribution friction. +- **Source**: Source #47 (OKVIS2 canonical), Source #48 (OKVIS2-X T-RO 2025) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 / C5 implementer +- **Confidence**: ✅ for algorithm, mode support, license, T-RO 2025 publication, repository activity; ⚠️ for Jetson Orin Nano runtime — no direct Jetson Orin Nano benchmark located; OKVIS2's factor-graph backend is plausibly heavier than OpenVINS' MSCKF on memory but lighter than Kimera (Kimera also runs a 3D mesher + semantic labelling, OKVIS2 does not). +- **Related Dimension**: SQ3+SQ4 / C1 Open-source-permissive lead candidate; potential C1+C5+C8 unified factor-graph design +- **Fit Impact**: **strong lead candidate by license + maintenance + GNSS-fusion alignment.** If license permissiveness is a priority, OKVIS2 + OKVIS2-X is the natural choice. The OKVIS2-X factor-graph also opens a design path where C5 (state estimator) collapses INTO C1 (the same factor graph absorbs sat-anchor measurements as constraints); this would simplify the pipeline at the cost of departing from the C1/C5 split, which is a Step-7.5 / `solution_draft01` design decision, not a SQ3+SQ4 question. **Pending Jetson Orin Nano Super MVE.** + +### Fact #32 — Kimera-VIO is BSD-permissive but resource-heavy; KAIST benchmark found Kimera had the highest memory usage among VIOs tested and failed Xavier-NX-class memory under multi-process load +- **Statement**: Kimera-VIO (MIT-SPARK — Rosinol, Abate, Chang, Carlone — ICRA 2020) is a VI-SLAM pipeline with frontend + backend (factor-graph optimization with iSAM2 in GTSAM) + 3D mesher + pose-graph optimizer. Mode support: stereo+IMU primary, mono+IMU optional but documented. **License**: BSD 2-Clause "Simplified" (LICENSE.BSD on the repo) — permissive.
**Maintenance**: active issue/PR threads through Dec 2024 / Feb 2025 covering ROS 2 integration, mono-inertial discussion, dependency management. **Resource profile** (Source #46 KAIST 2021 benchmark): Kimera had the highest memory usage among the 9 algorithms tested (numerous computations per keyframe); Kimera failed to fit on Xavier NX-class memory under sustained multi-process load. The 3D mesh + semantic-label outputs are unused by the project's narrow C1 mandate (relative 6-DoF + covariance only) — Kimera's overhead is unjustified vs OKVIS2 / OpenVINS for our use case. +- **Source**: Source #49 (Kimera canonical + LICENSE.BSD), Source #46 (KAIST Jetson benchmark) +- **Phase**: Phase 2 +- **Target Audience**: System architects (build-vs-buy, mesh-feature decision) +- **Confidence**: ✅ for algorithm, license, maintenance status; ✅ for the Source #46 finding (KAIST 2021); ⚠️ for whether Orin Nano Super's Ampere GPU and higher per-core throughput lift Kimera into feasibility: the Source #46 failure was on Xavier NX 8 GB shared, the same memory budget as Orin Nano Super, so the memory constraint itself is unchanged. +- **Related Dimension**: SQ3+SQ4 / C1 Open-source-permissive secondary candidate +- **Fit Impact**: **carry as fallback only, not lead.** Kimera's permissive license is attractive but its resource overhead (especially the unused 3D mesh + semantic mesher) is a poor fit under co-resident process pressure. Use as a conservative secondary fallback if OKVIS2 unexpectedly fails Jetson MVE. **Status**: not lead. + +### Fact #33 — DROID-SLAM is disqualified by AC-4.2: ≥11 GB GPU VRAM inference budget exceeds the project's 8 GB shared LPDDR5; further, DROID-SLAM is monocular VO/SLAM without IMU fusion and would require an external metric-scale wrapper +- **Statement**: DROID-SLAM (princeton-vl, Teed & Deng — NeurIPS 2021; arXiv 2108.10869) requires ≥11 GB GPU memory to run inference per the official README; training requires ≥24 GB on 4× RTX 3090.
Issue #121 confirms that even with 128 GB system RAM and 16 GB VRAM (RTX 4080), users hit very large RAM consumption quickly. Algorithmically, DROID-SLAM is **monocular VO/SLAM** with recurrent dense bundle adjustment over a complete history of camera poses — no native IMU fusion; output pose is in arbitrary scale (no metric scale recovery without external alignment). DPV-SLAM (ECCV 2024, princeton-vl) is the lighter successor at ~4–5 GB GPU memory; DPVO (NeurIPS 2023, princeton-vl) is even lighter at ~3 GB, but neither natively integrates IMU. +- **Source**: Source #50 (DROID-SLAM canonical), Source #51 (DPVO / DPV-SLAM successor), Source #52 (DPVO-QAT++ memory measurement) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer +- **Confidence**: ✅ +- **Related Dimension**: SQ3+SQ4 / C1 disqualified candidate +- **Fit Impact**: **DISQUALIFIED outright.** AC-4.2 sets the 8 GB shared CPU+GPU memory budget; DROID-SLAM's ≥11 GB GPU-only requirement violates it before adding co-resident C2/C3/C5/C6 processes. Cite as "what the project cannot afford" in `solution_draft01` to pre-empt obvious questions. + +### Fact #34 — DPVO is monocular VO only (no IMU fusion); it can fit a Jetson-suitable memory footprint with QAT but cannot satisfy the C1 VIO mandate alone — would need an external IMU + metric-scale wrapper +- **Statement**: DPVO (Teed, Lipson, Deng — NeurIPS 2023; ECCV 2024 DPV-SLAM successor) is a deep-learning monocular VO with sparse patch tracking + differentiable bundle adjustment. **Mode**: monocular VO only — no IMU fusion in the published paper or repository; output pose is in arbitrary scale. Memory footprint: DPVO ~3 GB GPU, DPV-SLAM ~4–5 GB GPU on standard hardware; DPVO-QAT++ (arXiv 2511.12653, Cheng Liao, Nov 2025) reduces peak reserved memory to 1.02 GB on RTX 4060 (8 GB) via fused-CUDA INT8 fake-quantization while preserving ATE on TartanAir/EuRoC. **License**: MIT (permissive). Repository: 989 stars; last update 2024-10-12. 
**Crucial gap**: DPVO does NOT meet the C1 mandate of a "VIO that produces metric-scale 6-DoF + attitude with σ ≤ 5°" — for the project to use DPVO as the *VO half* of C1, an additional IMU+scale-fusion module (loosely-coupled ESKF with VO velocity / displacement priors) must be designed; alternatively, DPVO's pose can feed C5 directly as a relative-displacement constraint, with attitude served separately by FC IMU integration. **Jetson Orin Nano runtime evidence**: indirect — DPVO-QAT++ benchmarks on RTX 4060 desktop, NOT Jetson Orin Nano. The RTX 4060 is an Ada Lovelace GPU whereas the Orin Nano Super's integrated GPU is Ampere, and the Orin Nano Super's GPU is much smaller, so direct extrapolation is not safe — Jetson MVE required. +- **Source**: Source #51 (DPVO / DPV-SLAM canonical), Source #52 (DPVO-QAT++ Nov 2025) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 / C5 implementer +- **Confidence**: ✅ for "VO only, no IMU fusion" and the memory footprints; ⚠️ for Jetson Orin Nano direct runtime (no measurement); ⚠️ for the operational complexity of the QAT pipeline (teacher-student distillation training is a significant prerequisite vs the classical VINS-* / OpenVINS / OKVIS2 candidates). +- **Related Dimension**: SQ3+SQ4 / C1 conditional candidate (VO not VIO; needs external IMU wrapper) +- **Fit Impact**: **NOT a drop-in C1 candidate; conditional fit only.** DPVO is **not** a substitute for VINS-Mono / OpenVINS / OKVIS2 — it is a candidate for the *VO half* of a hybrid design where C5 (estimator) absorbs IMU and DPVO provides relative-pose priors. This adds design complexity and is **not preferred** unless one of the established VIO candidates fails Jetson MVE for memory reasons. **Status**: secondary, conditional.
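The metric-scale gap described for DPVO can be illustrated with a minimal sketch. It assumes that the up-to-scale VO displacements and the IMU-integrated metric displacements cover the same frame intervals and are already expressed in a common reference frame, and it solves for a single scale factor in closed form. The function name `estimate_metric_scale` and the array layout are hypothetical; this is a simplified stand-in for the loosely-coupled ESKF mentioned in the fact card, not a proposed implementation of it.

```python
import numpy as np


def estimate_metric_scale(vo_disp, imu_disp):
    """Closed-form least-squares scale s minimising sum_i ||s * vo_i - imu_i||^2.

    vo_disp:  (N, 3) per-interval displacements from VO, arbitrary scale.
    imu_disp: (N, 3) per-interval displacements from IMU integration, metres.
    Both are assumed (hypothetically) to be rotated into the same frame already.
    """
    vo = np.asarray(vo_disp, dtype=float)
    imu = np.asarray(imu_disp, dtype=float)
    if vo.shape != imu.shape or vo.ndim != 2 or vo.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    # Setting the derivative of the cost to zero gives s = <vo, imu> / <vo, vo>.
    denom = float(np.sum(vo * vo))
    if denom < 1e-12:
        # Near-zero VO motion makes the scale unobservable.
        raise ValueError("VO displacements are degenerate (near-zero motion)")
    return float(np.sum(vo * imu) / denom)


if __name__ == "__main__":
    vo = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.5, 0.5, 1.0]])
    imu = 2.5 * vo  # synthetic, noise-free metric displacements
    print(estimate_metric_scale(vo, imu))
```

In practice the IMU-integrated displacements drift, so a windowed or filtered estimate (which is what the ESKF option would provide) is likely needed rather than a single batch solve.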
+ +### Fact #35 — Pure VO baseline (KLT optical flow + 5-point essential matrix or homography RANSAC) is the project's mandatory simple-baseline candidate and is the de-facto fallback when learning-based methods fail on Jetson-budget constraints +- **Statement**: The classical pipeline — Shi-Tomasi or FAST corner detection → KLT pyramidal optical flow tracking (`cv::calcOpticalFlowPyrLK`) → 5-point essential matrix (Nister, `cv::findEssentialMat`) or homography RANSAC (`cv::findHomography`) → relative pose with arbitrary scale → metric-scale alignment via IMU integration externally — is the foundational visual-odometry pipeline implemented in OpenCV samples and pedagogical repositories. For the project's nadir-down UAV at 1 km AGL over Ukrainian steppe (predominantly planar terrain, low relief), the **homography path is geometrically appropriate** (a plane induces a homography between two views); for non-planar relief, the **essential-matrix path is appropriate** at a small overhead. License: public domain / OpenCV-Apache-2.0 / MIT (whatever reference implementation is chosen) — permissive. Reference: representative public Monocular-Video-Odometery (MIT, alishobeiri 2018), Monocular-Visual-Odometry (Yacynte) at translation error 0.94% / rotation error 0.015°/m on KITTI dataset. +- **Source**: Source #53 (OpenCV docs + reference implementations) +- **Phase**: Phase 2 +- **Target Audience**: System architects + C1 implementer + risk reviewer +- **Confidence**: ✅ +- **Related Dimension**: SQ3+SQ4 / C1 Simple-baseline candidate (mandatory per Component Option Breadth rule) +- **Fit Impact**: **carry as the project's `Simple baseline / known-runnable / known-failure-mode` C1 fallback.** Not a lead, but mandatory presence. Failure modes: (a) low-texture cropland / snow → KLT track loss; (b) sharp turns → low-overlap homography degeneracy; (c) no native IMU fusion → must wrap with external metric-scale alignment (same wrapper as DPVO). 
**Status**: simple-baseline reference; cited in `solution_draft01` to anchor the failure analysis. + +### Fact #36 — Step-0.5-time-window assessment: VINS-Mono / VINS-Fusion master branches are past the Critical-novelty 18-month window; OpenVINS and OKVIS2 are within window; DPVO is borderline; the established baselines (KLT + RANSAC) are exempt +- **Statement**: Per Step 0.5 timeliness assessment in `00_question_decomposition.md`, Critical-novelty topics require sources within 6 months for SOTA claims and 18 months for established libraries' API behaviour. Audit at access time 2026-05-07: VINS-Mono master last meaningful commit 2024-02-25 → ~26 months → **past the 18-month window**; VINS-Fusion 2024-05-23 → ~24 months → also past; OpenVINS master active (issue threads through Feb 2025) and v2.7 release June 2023 → ~35 months for the tagged release but master in stable maintenance → within de-facto window for an established library; OKVIS2-X push 2026-03-17 → ~2 months → **fully within window**; DPVO last code update 2024-10-12 → ~19 months → just over, though DPV-SLAM (ECCV 2024) and DPVO-QAT++ (Nov 2025) show the algorithm class is still under active development; KLT / 5-point / RANSAC / homography → established baselines per Step 0.5 → **no time window applies**. **Implication**: VINS-Mono / VINS-Fusion fall into the "older than 18 months but classical authoritative reference" bucket — Step 0.5 allows up to 18 months strictly, but downstream forks (vins-mono-android, embedded variants) and the IEEE T-RO 2018 publication keep the algorithm class in active community use. Recommended treatment: **keep as candidates but require live MVE on Jetson Orin Nano Super before promotion to Selected**, to revalidate against the current OpenCV / Ceres / ROS 2 stack.
+- **Source**: Source #43, Source #44, Source #45, Source #47, Source #48, Source #51 (timeliness audit per source) +- **Phase**: Phase 2 +- **Target Audience**: Step-7.5 reviewer + System architects +- **Confidence**: ✅ +- **Related Dimension**: SQ3+SQ4 / C1 candidate-pool integrity +- **Fit Impact**: **applies a conservative timeliness gate: every C1 candidate from VINS-Mono / VINS-Fusion / DPVO requires an Orin-Nano-Super MVE before being marked Selected**, since their master-branch staleness pushes them out of the Critical-novelty 18-month window. OpenVINS / OKVIS2 / OKVIS2-X / Kimera are within window via active issue threads or recent releases. + +### C1 Component Applicability Gate — preliminary table (this session; structured Restrictions×AC sub-matrix per candidate is next session's work) + +| Candidate | Mode (project) | License | Active maintenance? | Jetson Orin Nano Super runnable? | Native IMU fusion? | Native metric scale? | License blocks dual-use? | Preliminary status | +|---|---|---|---|---|---|---|---|---| +| **VINS-Mono** | mono+IMU | GPL-3.0 (copyleft) | ⚠️ borderline (24 mo) | ✅ proven on Jetson Nano (2021) → Orin Nano Super virtually certain | ✅ | ✅ | **⚠️ Verify with user** | Lead candidate **conditional on user license decision** + Orin-Nano-Super MVE | +| **VINS-Fusion** | mono+IMU (mode) | GPL-3.0 | ⚠️ borderline (24 mo) | ⚠️ failed on TX2 (KAIST 2021); Orin Nano Super untested | ✅ | ✅ | **⚠️ Verify with user** | Alternate, secondary to VINS-Mono within HKUST family | +| **OpenVINS** | mono+IMU | GPL-3.0 | ✅ active master | ✅ build confirmed on Orin Nano Dev Kit + JetPack 6 (2024 + 2025 community evidence); ~270 ms latency on Xavier NX | ✅ MSCKF | ✅ | **⚠️ Verify with user** | **Lead candidate** **conditional on user license decision** (best Jetson-Orin-Nano evidence + most maintained of the GPL-3 trio) | +| **OKVIS2 / OKVIS2-X** | mono+IMU (+ optional GNSS) | BSD-3 | ✅ very active (2026 pushes) | ⚠️ no direct Jetson Orin Nano 
measurement; factor-graph backbone plausibly heavier than MSCKF | ✅ | ✅ | ✅ no | **Lead candidate by license + maintenance + spoof-promotion architectural alignment**, pending Jetson MVE | +| **Kimera-VIO** | mono+IMU (optional) | BSD-2 | ✅ active | ⚠️ failed on Xavier NX 8 GB shared under multi-process (KAIST 2021) | ✅ | ✅ | ✅ no | Fallback secondary; resource overhead poor fit for project | +| **DROID-SLAM** | mono VO/SLAM only | (project repo) | reference baseline | ❌ ≥11 GB GPU VRAM > 8 GB AC-4.2 budget | ❌ | ❌ (arbitrary scale) | n/a | **DISQUALIFIED** by AC-4.2 | +| **DPVO / DPV-SLAM** | mono VO only | MIT | ⚠️ borderline (19 mo on code, ECCV 2024 paper) | ⚠️ DPVO-QAT++ (Nov 2025) shows 1.02 GB peak on RTX 4060 desktop; Jetson Orin Nano untested | ❌ (needs external IMU wrapper) | ❌ (needs external scale alignment) | ✅ no | Conditional secondary — VO half of a hybrid C1+C5 design only; not a drop-in VIO replacement | +| **Pure VO baseline (KLT + 5pt RANSAC / homography)** | mono VO only | OpenCV-Apache-2.0 / MIT | ✅ foundational (no time window) | ✅ runs on any Jetson | ❌ (needs external IMU wrapper) | ❌ (needs external scale alignment) | ✅ no | **Mandatory simple-baseline reference** per Component Option Breadth rule | + +**Surviving lead candidates (preliminary)**, in priority order based on this session's evidence: +1. **OpenVINS** (GPL-3.0, MSCKF, best Jetson Orin Nano evidence) — pending user license decision + Orin-Nano-Super MVE +2. **OKVIS2 / OKVIS2-X** (BSD-3, factor-graph + GNSS-fusion alignment, most active maintenance) — pending Jetson MVE +3. **VINS-Mono** (GPL-3.0, sliding-window optimization, proven on Jetson Nano) — pending user license decision + Orin-Nano-Super MVE +4. **Pure VO baseline** (mandatory simple-baseline; runtime guaranteed; carries the project as a graceful fallback) + +**Disqualified outright**: DROID-SLAM (AC-4.2 memory budget), RTAB-Map and ORB-SLAM3 (already pruned by Fact #16). 
+ +**Conditional / not-direct-fit**: DPVO / DPV-SLAM (VO not VIO, needs external IMU wrapper), Kimera-VIO (resource overhead unjustified for narrow C1 mandate). + +### C1 Open Decisions (to be resolved before SQ3+SQ4 closure) + +**Decision D-C1-1 — GPL-3.0 license posture for the onboard binary** (BLOCKING for the GPL-3.0 trio: VINS-Mono / VINS-Fusion / OpenVINS). +- The three most established VIO candidates (VINS-Mono / VINS-Fusion / OpenVINS) are GPL-3.0 (viral copyleft). +- For dual-use UAV deployment, GPL-3 binary distribution to a customer triggers obligations: source-code disclosure for the entire linked binary, anti-tivoization clauses for embedded firmware updates, viral effect on any proprietary code linked into the same binary. +- BSD/MIT alternatives exist (OKVIS2 BSD-3, Kimera BSD-2, DPVO MIT, pure-VO baseline OpenCV-Apache-2.0), but each comes with secondary trade-offs (Jetson MVE risk, missing IMU fusion, resource overhead). +- Three options for the user: + - **(a)** Accept GPL-3.0 — distribution model = release source on customer request; or operate the system as a service rather than transferring binaries. Lowest-risk algorithmic path (most-tested candidates). + - **(b)** Restrict to permissive licenses only (BSD/MIT) — lead candidate becomes OKVIS2; carries Jetson MVE risk. + - **(c)** Keep both options open through the design phase — make the final license decision after the Jetson Orin Nano MVE results are in. +- **Recommended default**: **(c)** — defer the binary commitment until empirical evidence on Jetson Orin Nano. This is recorded as a flagged decision; SQ3+SQ4 candidate matrix will carry both license families to Step 7.5. + +**Decision D-C1-2 — Acceptance of Jetson Orin Nano MVE as a Step-7.5 prerequisite** (procedural). 
+- Per the Per-Mode API Capability Verification rule, every lead candidate library/SDK requires `context7` (or equivalent docs) lookup + a Minimum Viable Example for the project's pinned mode + per-numbered-Restriction × per-numbered-AC sub-matrix. +- The Component Applicability Gate above is **preliminary** — it documents enumeration evidence but does NOT yet contain `context7` per-mode capability verification or the structured sub-matrix. +- **Next session's mandatory work**: `context7` lookup (3 mandatory queries) for OpenVINS / OKVIS2 / VINS-Mono; per-Restriction × per-AC sub-matrix per candidate; the same for the simple-baseline path; record into `02_fact_cards.md` per the engine template + `06_component_fit_matrix.md` per Step 7.5. + +### C1 Boundary check: candidate enumeration is saturated for this session + +Saturation signals observed: (a) all 7 named candidates from `00_question_decomposition.md` C1 row enumerated with at least one canonical L1 source per candidate; (b) Jetson Orin Nano runtime evidence located for OpenVINS (direct) and VINS-Mono (Jetson Nano + RPi CM4); other candidates carry "MVE required" gates explicitly; (c) license diversity covered (GPL-3.0 trio + BSD-permissive duo + MIT + permissive-baseline); (d) explicit disqualifications recorded with cited evidence (DROID-SLAM, RTAB-Map, ORB-SLAM3). **Open**: per-mode `context7` verification (BLOCKING per rule) + Restrictions×AC sub-matrices (BLOCKING per Step 7.5) — explicitly deferred to next session. diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md index bb5b618..e431403 100644 --- a/_docs/_autodev_state.md +++ b/_docs/_autodev_state.md @@ -6,8 +6,8 @@ step: 2 name: Research status: in_progress sub_step: - phase: 10 - name: sq3-sq4-c1-vio-candidates - detail: "Mode A Phase 2 engine Step 4 — SQ3+SQ4 per-component candidate matrix, starting with C1 (VO/VIO). 
SQ1 ✓, SQ2 ✓ closed with three architectural decisions resolved by user (recorded in 00_question_decomposition.md): (1) DSM dependency = (a) 3-DoF acceptance (no DSM in cache; attitude from IMU/VIO); (2) AdHoP = (b) conditional (only when initial reprojection error exceeds threshold); (3) Top-N re-rank = (a) promoted to explicit pipeline sub-stage between C2 and C3. Pipeline shape entering SQ3+SQ4: C1 (VIO) → C2 (VPR) → Top-N re-rank → C3 (matcher) → AdHoP-conditional → C4 (PnP+RANSAC+LM) → C5 (estimator) → C8 (FC adapter), with C6 (cache 2D-ortho) + C7 (Jetson runtime) + C9 (datasets) + C10 (provisioning) cross-cutting. C1 candidates entering SQ3+SQ4 (RTAB-Map and ORB-SLAM3 already pruned by Fact #16): VINS-Mono / VINS-Fusion / OpenVINS / OKVIS2 / DROID-SLAM / DPVO / pure-VO baseline (KLT + RANSAC homography). Per-mode context7 capability verification mandatory for every lead library/SDK candidate. Next conversation: re-enter Step 2 / sub_step phase 10 name 'sq3-sq4-c1-vio-candidates', execute Per-Component Search Plan (`research/steps/03_engine-investigation.md` Step 4) starting with C1: enumerate 5+ search variants per candidate, log to 01_source_registry.md + 02_fact_cards.md, then Component Applicability Gate (must pass: monocular VIO, ≥3 fps output, Jetson Orin Nano runnable, license OK for dual-use)." + phase: 12 + name: c1-context7-and-restrictions-ac-submatrix + detail: "C1 candidate enumeration done (Sources #43–#53 in 01_source_registry.md, Facts #28–#36 in 02_fact_cards.md). Surviving lead candidates (priority order): (1) OpenVINS — GPL-3.0, best Jetson Orin Nano evidence; (2) OKVIS2 / OKVIS2-X — BSD-3, most actively maintained, GNSS-fusion alignment for AC-NEW-2; (3) VINS-Mono — GPL-3.0, proven on Jetson Nano; (4) Pure VO baseline — mandatory simple-baseline reference. Disqualified: DROID-SLAM (AC-4.2 memory budget), RTAB-Map / ORB-SLAM3 (Fact #16). Conditional: DPVO (VO not VIO; needs external IMU wrapper), Kimera-VIO (resource overhead). 
Two open decisions surfaced: D-C1-1 GPL-3.0 license posture for onboard binary (BLOCKING for GPL-3 trio) and D-C1-2 Jetson Orin Nano MVE schedule. NEXT SESSION'S WORK (BLOCKING per Per-Mode API Capability Verification rule): (a) context7 lookup × 3 mandatory queries per lead candidate (OpenVINS, OKVIS2/OKVIS2-X, VINS-Mono) covering mode enumeration + project's exact mode runnable example + disqualifier probe; (b) MVE block per candidate in 02_fact_cards.md; (c) per-numbered-Restriction × per-numbered-AC sub-matrix per candidate; (d) write 06_component_fit_matrix.md draft for C1 row; (e) ASK USER on Decision D-C1-1 before promoting any GPL-3 candidate to Selected. AFTER C1 IS CLOSED: proceed to C2 (VPR) candidate enumeration." retry_count: 0 cycle: 1