mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 15:11:12 +00:00
Update autodev state and candidate enumeration for C1 VIO
Revised the autodev state to reflect the transition to phase 12, detailing the candidate enumeration for C1 (VIO) with a focus on context7 capability verification and restrictions assessment. Updated the source registry to indicate progress on C1 candidates, including the addition of new sources and their evaluation status. Enhanced fact cards with detailed assessments of VINS-Mono and VINS-Fusion, highlighting their suitability and licensing considerations for dual-use deployment. Deferred context7 verification and structured sub-matrix tasks to the next session.
This commit is contained in:
@@ -15,7 +15,7 @@
|
||||
| SQ6 — ArduPilot vs iNav external positioning | **Saturated for protocol-level architectural decision** (further detail deferred to SQ8 for spoofing-side fields and to design phase for SITL parameter tuning) | Major finding: iNav has no inbound external-positioning MAVLink handler; AC-4.3 wording must be revised. See `02_fact_cards.md` "SQ6 Conclusions". |
|
||||
| SQ1 — Existing GPS-denied UAV systems | **Saturated.** 13 sources logged across academic / open-source / commercial / defense-program / Ukraine-practitioner. Closest peer system: Twist Robotics OSCAR (deployed in Ukraine). Closest open-source pipeline-match: snktshrma/ngps_flight (NGPS, ArduPilot GSoC 2024 — LightGlue+SuperPoint+UKF+VISION_POSITION_ESTIMATE). Closest deployed commercial: Auterion Artemis (Skynode N + Visual Navigation, Ukraine-tested, 1000-mile range). | See `02_fact_cards.md` SQ1 cluster + working summary. |
|
||||
| SQ2 — Canonical pipeline decomposition | **Saturated.** 5 surveys/benchmarks logged (Skoltech aerial VPR, U.Maine cross-view, OrthoLoC 2.5D geodata, AnyVisLoc low-altitude multi-view, NUDT 2026 sciopen survey). All converge on **`retrieval → matching → pose-estimation`** hierarchical framework with VIO/IMU as auxiliary. Two new architectural facts added to C1–C10: (a) **AdHoP-style perspective-refinement loop** between matching and PnP (+63% translation accuracy, method-agnostic), (b) **DSM 2.5D dependency** for full 6-DoF on aerial-to-satellite (must be resolved with the Suite Sat Service or accepted as a 3-DoF degraded mode). Practitioner runtime evidence: AnyLoc on RTX 3090 = 0.63s/descriptor, SuperGlue re-rank = 17–25s; on Jetson Orin Nano these are non-viable for our 400 ms p95 budget — must restrict to lightweight VPR (e.g., MixVPR / SALAD class) + LightGlue/XFeat-class matchers. See `02_fact_cards.md` "SQ2 Conclusions". |
|
||||
| SQ3+SQ4 — Per-component candidates (C1–C10) | Not started | |
|
||||
| SQ3+SQ4 — Per-component candidates (C1–C10) | **In progress** — C1 (VIO) candidate enumeration done (Sources #43–#52); per-mode `context7` verification + Restrictions×AC sub-matrix per surviving candidate deferred to next session. C2–C10 not started. | See `02_fact_cards.md` C1 cluster + preliminary applicability table. |
|
||||
| SQ5 — Failure modes / deployment lessons | Not started (interleaved with SQ3/SQ4) | |
|
||||
| SQ7 — Datasets, SITL, replay environments | Not started | |
|
||||
| SQ8 — Safety considerations (AC-NEW-4 / AC-NEW-7) | Not started | Carries the AP_GPS spoofing-signal probe deferred from SQ6. |
|
||||
@@ -521,3 +521,139 @@
|
||||
- **Research Boundary Match**: **Full match** (low-altitude UAV AVL is the survey's exact subject)
|
||||
- **Summary**: Survey-level confirmation of the canonical "**retrieval-matching-pose estimation**" hierarchical framework. Verbatim claim: "the hierarchical framework balances search efficiency, positioning accuracy, and scene generalization, becoming a robust technical path for low-altitude long-endurance absolute localization." Compares the framework against alternatives that are explicitly rejected: (a) relative visual localization (cumulative errors — VIO/SLAM only); (b) end-to-end direct localization (poor generalization); (c) map-free localization (scene-dependent). Sub-component evolution per stage: (a) retrieval = template-matching (SAD/SSD/NCC) → BoW/VLAD → deep-learning (annular/dense feature segmentation, contrastive InfoNCE, self-supervised); (b) matching = SIFT/SURF/ORB → SuperPoint+LightGlue/RoMa (sparse / semi-dense / dense); (c) pose estimation = PnP variants + RANSAC + IMU prior fusion. **Identifies four open challenges** that align with project risks: (i) cross-domain generalization (war-zone scene change); (ii) real-time inference on edge platforms (Jetson); (iii) robustness to complex environments (cropland, snow, low texture); (iv) high-quality datasets (the same gap our project's AC-NEW-7 / cache provisioning works around). **Lightweight-model-design-for-edge-deployment is named as a primary future-research direction** — directly validates project's Jetson Orin Nano constraint as a recognized field-level challenge, not a project-specific oddity.
|
||||
- **Related Sub-question**: SQ2 (framework canonicalness), SQ3+SQ4 (per-component evolution), SQ5 (named open challenges align with project risks)
|
||||
|
||||
---
|
||||
|
||||
## SQ3+SQ4 / C1 (Visual / Visual-Inertial Odometry) — Candidate enumeration
|
||||
|
||||
### Source #43
|
||||
- **Title**: VINS-Mono — A Robust and Versatile Monocular Visual-Inertial State Estimator (HKUST-Aerial-Robotics)
|
||||
- **Link**: https://github.com/HKUST-Aerial-Robotics/VINS-Mono ; LICENCE: https://github.com/HKUST-Aerial-Robotics/VINS-Mono/blob/master/LICENCE
|
||||
- **Tier**: L1 (canonical reference implementation; published in IEEE T-RO 2018 by Qin, Li, Shen)
|
||||
- **Publication Date**: original 2018; repository last meaningful update 2024-02-25 (per GitHub commit log; 2024-05-23 simulation-data commit only)
|
||||
- **Timeliness Status**: ⚠️ **Borderline.** ~24 months since the last meaningful master-branch commit at access time (2026-05-07). Established baseline that does NOT trigger Step 0.5's 18-month timeliness rejection because (a) IEEE T-RO publication is the canonical authority for the algorithm, (b) downstream forks (vins-mono-android, embedded variants) keep the algorithm class actively deployed.
|
||||
- **Version Info**: No GitHub releases / tags (master-branch-only project). Stars 5,829.
|
||||
- **Target Audience**: Mono+IMU VIO implementers; UAV state estimation researchers
|
||||
- **Research Boundary Match**: **Full match for the candidate's pinned mode** — monocular camera + IMU producing 6-DoF metric pose. The VINS-Mono README explicitly names this configuration as primary.
|
||||
- **Summary**: Optimization-based sliding-window monocular VIO. Features: efficient IMU pre-integration (Forster et al. 2017), automatic initialization, online camera-IMU extrinsic calibration, online camera-IMU temporal calibration, failure detection + recovery, loop detection (DBoW2-based), global pose graph optimization. Output is metric-scale 6-DoF pose at IMU rate (typically 100–200 Hz) with covariance from the optimization Hessian. **License: GPL-3.0 (copyleft viral)** — every binary distribution requires source disclosure for the entire linked binary; relevant for dual-use deployment if the companion image is sold or transferred to a customer.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate
|
||||
|
||||
### Source #44
|
||||
- **Title**: VINS-Fusion — Optimization-based multi-sensor state estimator (HKUST-Aerial-Robotics)
|
||||
- **Link**: https://github.com/HKUST-Aerial-Robotics/VINS-Fusion ; LICENCE: https://github.com/HKUST-Aerial-Robotics/VINS-Fusion/blob/master/LICENCE
|
||||
- **Tier**: L1 (canonical reference; superset of VINS-Mono)
|
||||
- **Publication Date**: original 2019 (Qin, Cao, Pan, Shen — ICRA workshop / IROS); repository last update 2024-05-23
|
||||
- **Timeliness Status**: ⚠️ **Borderline.** ~24 months since the last update at access time. Same Step-0.5 reasoning as VINS-Mono — established class.
|
||||
- **Version Info**: master-branch-only. Stars 4,476. Top-ranked open-source stereo-VIO on KITTI Odometry as of January 2019.
|
||||
- **Target Audience**: Multi-sensor VIO implementers (mono+IMU, stereo, stereo+IMU, +GPS fusion)
|
||||
- **Research Boundary Match**: **Full match** for monocular+IMU mode. VINS-Fusion README explicitly enumerates four sensor configurations (mono+IMU, stereo, stereo+IMU, +GPS toy example).
|
||||
- **Summary**: Superset of VINS-Mono adding stereo and GPS-fusion modes. Same algorithmic core (sliding-window optimization with IMU pre-integration). Online spatial + temporal camera-IMU calibration; visual loop closure; ROS Kinetic/Melodic build dependency. **License: GPL-3.0** — same dual-use distribution constraint as VINS-Mono. Independent KAIST benchmark (Source #46) found VINS-Fusion CPU mode + VINS-Fusion-imu **fail to run** on Jetson TX2 (insufficient memory and CPU); GPU-accelerated VINS-Fusion-gpu does run on TX2. Implication for project: VINS-Fusion-imu on Jetson Orin Nano Super is feasible but not certain; needs MVE.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate
|
||||
|
||||
### Source #45
|
||||
- **Title**: OpenVINS — An open source platform for visual-inertial navigation research (Robot Perception and Navigation Group, U. of Delaware — rpng)
|
||||
- **Link**: https://github.com/rpng/open_vins ; docs: https://docs.openvins.com/ ; LICENSE: https://github.com/rpng/open_vins/blob/master/LICENSE
|
||||
- **Tier**: L1 (canonical research implementation; ICRA 2020 paper Geneva, Eckenhoff, Lee, Yang, Huang)
|
||||
- **Publication Date**: original 2020; latest tagged release v2.7 = 2023-06; ongoing master-branch commits through 2024–2025 (latest issue threads through Feb 2025)
|
||||
- **Timeliness Status**: ✅ Currently valid (master branch active; latest tagged release ~35 months but library is in stable/maintenance mode with continued issue triage).
|
||||
- **Version Info**: Stars 2,828; 30 contributors; 12 releases. v2.7 is the current tagged stable.
|
||||
- **Target Audience**: MSCKF/EKF VIO implementers; researchers needing a reference MSCKF
|
||||
- **Research Boundary Match**: **Full match** for monocular+IMU mode. OpenVINS supports mono, stereo, multi-camera (1–N cameras) + IMU; mono is a documented first-class mode.
|
||||
- **Summary**: Modular MSCKF (Multi-State Constraint Kalman Filter) implementation built around an Extended Kalman filter that fuses inertial state with sparse visual feature tracks via the sliding-window MSCKF formulation (Mourikis & Roumeliotis 2007). Supports SLAM features (in-state landmarks) plus pure MSCKF features (out-of-state). ROS1 + ROS2 (Humble) builds documented; Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble compilation **confirmed working** by community contributors (rpng/open_vins issue #421, fdcl-gwu/openvins_jetson_realsense Nov 2025 setup guide). **License: GPL-3.0** — same dual-use distribution constraint. Reported latency ~270 ms on Xavier NX (4-core, ARM, 40% CPU usage) per issue #164; needs Jetson-Orin-Nano-Super MVE for production budget verification.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate
|
||||
|
||||
### Source #46
|
||||
- **Title**: Run Your Visual-Inertial Odometry on NVIDIA Jetson — Benchmark Tests on a Micro Aerial Vehicle (Jeon, Jung, Lee, Choi, Myung — KAIST)
|
||||
- **Link**: https://arxiv.org/abs/2103.01655 ; KAIST VIO dataset: https://github.com/zinuok/kaistviodataset
|
||||
- **Tier**: L1 (peer-reviewed conference, IROS-track preprint with public dataset)
|
||||
- **Publication Date**: arXiv 2021-03-02
|
||||
- **Timeliness Status**: ⚠️ Older than the 18-month Critical-novelty window, but **uniquely authoritative** for the specific question "do these VIO algorithms run on a Jetson?"; the included algorithms (VINS-Mono, VINS-Fusion, ROVIO, ALVIO, Stereo-MSCKF, Kimera, ORB-SLAM2-stereo) are all classical baselines whose runtime characteristics on ARM CPUs have not changed materially. Jetson hardware comparison (TX2 / Xavier NX / AGX Xavier) does NOT include Orin Nano — must extrapolate.
|
||||
- **Version Info**: Conference paper.
|
||||
- **Target Audience**: UAV state-estimation engineers picking a VIO for a Jetson companion
|
||||
- **Research Boundary Match**: **Strong match for the question**, partial for the hardware (no Orin Nano). KAIST VIO dataset is indoor mocap, not UAV-aerial-nadir — the *latency / CPU / memory* numbers transfer; the *accuracy* numbers do not transfer to our domain.
|
||||
- **Summary**: Comprehensive benchmark of 9 algorithms on TX2, Xavier NX, AGX Xavier: VINS-Mono, VINS-Fusion (CPU), VINS-Fusion-gpu, VINS-Fusion-imu, ROVIO, Stereo-MSCKF, ALVIO, Kimera, ORB-SLAM2-stereo. **Hard findings**: (a) on TX2, **VINS-Fusion (CPU) and VINS-Fusion-imu fail to run** due to insufficient memory and CPU performance — VINS-Fusion-gpu does run; (b) all algorithms except ROVIO show >100% CPU usage (multi-core utilisation, OK for our 6-core Orin Nano A78AE); (c) Kimera has the highest memory usage among VIO methods (numerous computations per keyframe), failure-prone on Xavier NX-class memory; (d) Stereo-MSCKF has the lowest memory among stereo VIOs; (e) ROVIO has the lowest CPU usage owing to its patch-tracking formulation. **Implication for project**: Jetson Orin Nano Super (8 GB shared, 6-core A78AE, Ampere GPU, 67 TOPS sparse INT8) is between Xavier NX and AGX Xavier in CPU performance and memory; algorithms passing on Xavier NX should pass on Orin Nano Super, but VINS-Fusion-imu's TX2 failure is a yellow-flag for memory pressure under co-resident C2/C3/C5 modules.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 (VINS-Mono / VINS-Fusion / OpenVINS / Kimera / Stereo-MSCKF / ROVIO Jetson runtime evidence), SQ5 (resource-budget failure modes)
|
||||
|
||||
### Source #47
|
||||
- **Title**: OKVIS2 — Realtime Scalable Visual-Inertial SLAM with Loop Closure (Leutenegger, ETH/Imperial/TUM Smart Robotics Lab)
|
||||
- **Link**: https://github.com/ethz-mrl/okvis2 ; arXiv: https://arxiv.org/abs/2202.09199 ; LICENSE: https://github.com/ethz-mrl/okvis2/blob/main/LICENSE
|
||||
- **Tier**: L1 (canonical implementation; arXiv 2022 by paper author)
|
||||
- **Publication Date**: original arXiv 2022; OKVIS2-X T-RO 2025 successor (Boche, Jung, Laina, Leutenegger — IEEE T-RO 2025, vol 41 pp 6064–6083, DOI 10.1109/TRO.2025.3619051; arXiv 2510.04612, Oct 2025). Repository last push 2026-03-17 (ethz-mrl/OKVIS2-X).
|
||||
- **Timeliness Status**: ✅ **Current.** Active development through 2026; OKVIS2-X is the most recent published VI-SLAM system in this class.
|
||||
- **Version Info**: ethz-mrl/okvis2 (core) and ethz-mrl/OKVIS2-X (multi-sensor extension with optional GNSS / LiDAR / dense depth).
|
||||
- **Target Audience**: Factor-graph VI-SLAM implementers; mid-large-scale loop-closure use cases
|
||||
- **Research Boundary Match**: **Full match** for monocular+IMU mode. OKVIS2 README + paper explicitly support mono and multi-camera VI configurations. OKVIS2-X adds GNSS fusion (relevant: VINS-Fusion-style GPS-when-available drop-in IS the project's eventual posture in non-spoofed regions).
|
||||
- **Summary**: Factor-graph VI-SLAM with bounded-size optimization. Innovation: pose-graph edges from marginalised observations can be "seamlessly turned back into observations" upon loop closure, reviving old landmarks and reprojection errors. Includes lightweight CNN segmentation for dynamic-region removal. OKVIS2-X (2025) generalises the core to fuse multi-camera + IMU + optional GNSS + LiDAR/depth — directly aligned with project's "VIO that may opportunistically fuse a non-spoofed GPS update" pattern and AC-NEW-2's spoof-promotion path. **License: 3-clause BSD (permissive)** — no copyleft / dual-use distribution friction. Note: GitHub UI shows "Other (NOASSERTION)" because of the standard BSD clause language pattern; the LICENSE file is canonical 3-clause BSD.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 lead candidate (factor-graph + permissive license + active maintenance)
|
||||
|
||||
### Source #48
|
||||
- **Title**: OKVIS2-X: Open Keyframe-based Visual-Inertial SLAM Configurable with Dense Depth or LiDAR, and GNSS (Boche, Jung, Laina, Leutenegger — TUM / ETH Zurich Smart Robotics Lab)
|
||||
- **Link**: https://github.com/ethz-mrl/OKVIS2-X ; arXiv: https://arxiv.org/abs/2510.04612 ; IEEE T-RO 2025 vol 41 pp 6064–6083 DOI 10.1109/TRO.2025.3619051
|
||||
- **Tier**: L1 (peer-reviewed IEEE Transactions on Robotics, Special Issue Visual SLAM 2025)
|
||||
- **Publication Date**: arXiv 2025-10-04; T-RO 2025 vol 41
|
||||
- **Timeliness Status**: ✅ Current (within 6-month Critical-novelty window)
|
||||
- **Version Info**: 295 stars; 38 forks; 2 contributors; created 2025-09-23, last push 2026-03-17. License: NOASSERTION on GitHub UI; per-paper license follows ethz-mrl convention (BSD-3 derived).
|
||||
- **Target Audience**: Multi-sensor SLAM researchers; large-scale VI-SLAM with optional GNSS/LiDAR
|
||||
- **Research Boundary Match**: **Strong match** — extends OKVIS2 monocular+IMU mode with optional GNSS fusion (Visual-Inertial SLAM with Tightly-Coupled Dropout-Tolerant GPS Fusion lineage from IROS 2022). Project's `MAV_CMD_SET_EKF_SOURCE_SET` switch + companion-side spoof-detection conceptually mirrors OKVIS2-X's "GPS as drop-out-tolerant signal".
|
||||
- **Summary**: Non-trivial extension of OKVIS2; submap-based volumetric occupancy mapping. Demonstrates that the OKVIS2 factor-graph backbone can absorb spoofing-aware GPS without re-architecting. Useful as architectural template for project's C5 estimator + C8 adapter integration. License: same as OKVIS2 (BSD-3-derived). Two named contributors (bochsim, SebsBarbas) actively pushing through Mar 2026.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 (OKVIS2 lineage; VI-SLAM with optional GPS/LiDAR), SQ8 (GPS-fusion dropout-tolerant lineage)
|
||||
|
||||
### Source #49
|
||||
- **Title**: Kimera-VIO — Visual Inertial Odometry with SLAM capabilities and 3D Mesh generation (MIT-SPARK)
|
||||
- **Link**: https://github.com/MIT-SPARK/Kimera-VIO ; LICENSE.BSD: https://github.com/MIT-SPARK/Kimera-VIO/blob/master/LICENSE.BSD
|
||||
- **Tier**: L1 (canonical implementation by MIT SPARK Lab)
|
||||
- **Publication Date**: original 2020 (Rosinol, Abate, Chang, Carlone — ICRA 2020); ongoing development through 2024–2025 issue threads (Dec 2024 / Feb 2025 ROS2 / mono-inertial discussion).
|
||||
- **Timeliness Status**: ✅ Active maintenance (recent issues / PRs through 2025).
|
||||
- **Version Info**: master-branch-only; LICENSE.BSD = BSD 2-Clause "Simplified".
|
||||
- **Target Audience**: VI-SLAM + mesh-mapping researchers
|
||||
- **Research Boundary Match**: **Partial.** Stereo+IMU is the primary supported configuration; mono+IMU is **optional but documented**. Kimera also produces 3D mesh and high-level semantic labels (relevant to neither C1 nor the project's bandwidth budget — overhead).
|
||||
- **Summary**: Frontend (image processing + IMU pre-integration) + Backend (factor-graph optimization in iSAM2 or GTSAM) + Mesher + Pose-Graph-Optimizer. **License: BSD 2-Clause (permissive)** — no dual-use distribution friction. **Penalty for project**: Source #46 KAIST benchmark found Kimera has highest memory usage among the VIOs tested (numerous computations per keyframe), and Kimera failed to fit on Xavier-NX-class memory under multi-process load. Mesh + semantic features are unused by the project — Kimera's overhead is unjustified vs OKVIS2 / OpenVINS for the project's narrow C1 mandate. **Status**: viable secondary fallback if OKVIS2 / VINS-Mono runtime issues arise; not a lead candidate due to overhead misfit.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 secondary candidate (BSD-permissive but resource-heavy)
|
||||
|
||||
### Source #50
|
||||
- **Title**: DROID-SLAM — Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras (princeton-vl, Teed & Deng)
|
||||
- **Link**: https://github.com/princeton-vl/droid-slam ; arXiv: https://arxiv.org/abs/2108.10869 ; NeurIPS 2021
|
||||
- **Tier**: L1 (canonical reference)
|
||||
- **Publication Date**: NeurIPS 2021; repository latest tagged baseline.
|
||||
- **Timeliness Status**: ✅ Foundational reference; DPV-SLAM (Source #51) is the lighter successor.
|
||||
- **Version Info**: master-branch-only.
|
||||
- **Target Audience**: Deep-learning-based VO/VSLAM researchers
|
||||
- **Research Boundary Match**: **Disqualified by hardware budget.** Inference requires ≥11 GB GPU VRAM per official README; project budget is 8 GB **shared CPU+GPU** on Jetson Orin Nano Super, leaving <8 GB for VO + VPR + matcher + estimator + cache co-resident. DROID-SLAM is also **monocular VO/SLAM, not VIO** — no native IMU fusion; metric scale recovery requires external scale alignment.
|
||||
- **Summary**: Recurrent dense bundle adjustment over a complete history of camera poses. State-of-the-art accuracy on TartanAir / EuRoC / TUM-RGBD at the cost of GPU memory. **Disqualified outright for C1 lead** by AC-4.2 (≤8 GB shared RAM) and the lack of IMU fusion that would require an additional ESKF/UKF wrapping. Kept as **reference baseline** to be cited as "what we cannot afford" in `solution_draft01`.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 disqualified candidate
|
||||
|
||||
### Source #51
|
||||
- **Title**: DPVO — Deep Patch Visual Odometry (princeton-vl, Teed, Lipson, Deng) + DPV-SLAM (Lipson, Teed, Deng — ECCV 2024)
|
||||
- **Link**: https://github.com/princeton-vl/DPVO ; LICENSE: https://github.com/princeton-vl/DPVO/blob/main/LICENSE ; ECCV 2024 paper: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00272.pdf
|
||||
- **Tier**: L1 (canonical implementation; NeurIPS 2023 + ECCV 2024)
|
||||
- **Publication Date**: NeurIPS 2023 (DPVO); ECCV 2024 (DPV-SLAM); repository last update 2024-10-12.
|
||||
- **Timeliness Status**: ⚠️ Borderline. ~19 months since last code update; ECCV-2024 publication of DPV-SLAM keeps the algorithm class within the 6-month claim window for the SLAM successor.
|
||||
- **Version Info**: 989 stars; primary languages C++ / Python / CUDA. **License: MIT (permissive)** — no dual-use distribution friction.
|
||||
- **Target Audience**: Deep-learning VO/SLAM with reduced memory footprint
|
||||
- **Research Boundary Match**: **Partial.** DPVO is **monocular VO only — no IMU fusion**. Output pose is in arbitrary scale (no metric scale recovery). To be a viable C1 candidate the project must wrap DPVO with an external IMU+scale-fusion stage (loosely-coupled ESKF / VIO-fusion module). This makes DPVO **not a drop-in C1** like VINS-Mono / OpenVINS / OKVIS2; it is a **VO module that needs a separate VIO wrapper**.
|
||||
- **Summary**: Sparse patch tracking + differentiable bundle adjustment back end. Outperforms DROID-SLAM on TartanAir / EuRoC ATE while using ~1/3 of DROID-SLAM's GPU memory (DROID-SLAM: 8.7 GB VO mode vs DPVO: ~3 GB). DPV-SLAM (Lipson, Teed, Deng — ECCV 2024) adds full SLAM capability with 4–5 GB GPU usage. **Jetson runtime evidence**: indirect via DPVO-QAT++ (Source #52) — peak reserved memory 1.02 GB on RTX 4060 (8 GB) after INT8 fake-quant + custom CUDA kernel fusion; not directly tested on Jetson Orin Nano. **Status for C1**: pure-VO candidate (must be paired with separate IMU integration to deliver metric scale + attitude); would not satisfy "monocular VIO" gate alone, but viable as the *VO half* of a hybrid C1+C5 design.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 conditional candidate (VO not VIO; needs external IMU wrapper)
|
||||
|
||||
### Source #52
|
||||
- **Title**: DPVO-QAT++: Heterogeneous QAT and CUDA Kernel Fusion for High-Performance Deep Patch Visual Odometry (Cheng Liao)
|
||||
- **Link**: https://arxiv.org/abs/2511.12653 ; project HTML: https://arxiv.org/html/2511.12653
|
||||
- **Tier**: L2 (single-author preprint, code partially released; no peer-review yet)
|
||||
- **Publication Date**: arXiv 2025-11-16 (within 6-month Critical-novelty window)
|
||||
- **Timeliness Status**: ✅ Current
|
||||
- **Version Info**: arXiv preprint; code & weights released for QAT-only and fused-CUDA variants.
|
||||
- **Target Audience**: Embedded-platform DPVO deployers
|
||||
- **Research Boundary Match**: **Partial.** Hardware tested = RTX 4060 (8 GB) + Intel Core Ultra 5-125H + 32 GB RAM — desktop GPU, NOT Jetson Orin Nano. Direct extrapolation requires Jetson MVE; Orin Nano Super's Ampere GPU is architecturally similar but smaller than RTX 4060.
|
||||
- **Summary**: Quantization-Aware Training framework for DPVO with fused CUDA kernels. Reduces peak GPU memory from 1.94 GB → 1.02 GB (-47%) on a representative TartanAir sequence; +34.6% median FPS on TartanAir, +26.7% on EuRoC; -22.8 ms / -19.7 ms median P99 tail latency on TartanAir / EuRoC respectively. Heterogeneous precision: front-end pseudo-quantization (FP16/FP32 with INT8 simulation) + FP32 back-end geometric solver. **Implication for project**: shows DPVO has a documented Jetson-suitable footprint **path** but not a Jetson-Orin-Nano measurement. ATE accuracy comparable to baseline DPVO across 32 TartanAir + 11 EuRoC validation sequences. Notable: requires a teacher-student distillation training pipeline before deployment — adds operational complexity vs classical VINS-* / OpenVINS / OKVIS2.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 supporting evidence for DPVO embedded feasibility
|
||||
|
||||
### Source #53
|
||||
- **Title**: Pure VO baseline — KLT optical flow + 5-point essential matrix or homography RANSAC (OpenCV reference)
|
||||
- **Link**: https://docs.opencv.org/4.x/d4/dee/tutorial_optical_flow.html ; representative public implementation: https://github.com/alishobeiri/Monocular-Video-Odometery (MIT, 2018) ; tutorial reference: https://zxh.me/posts/2022-12-19-monocular-visual-odometry/
|
||||
- **Tier**: L1 (OpenCV official documentation) + L2 (representative public implementations)
|
||||
- **Publication Date**: OpenCV docs continuously updated; tutorial 2022-12; reference implementation 2018 (algorithmic class is foundational, no time window per Step 0.5)
|
||||
- **Timeliness Status**: ✅ Foundational baseline (no time window).
|
||||
- **Version Info**: OpenCV `cv::calcOpticalFlowPyrLK` (KLT) + `cv::findEssentialMat` (5-point Nister) or `cv::findHomography` with RANSAC.
|
||||
- **Target Audience**: Implementers needing a transparent low-complexity fallback
|
||||
- **Research Boundary Match**: **Full match for the simple-baseline candidate.** Suits planar nadir-down UAV at altitude (Ukrainian steppe is ~planar at 1 km AGL — homography is geometrically appropriate; for non-planar relief the essential matrix path is more appropriate but adds scale-recovery work).
|
||||
- **Summary**: Established classical pipeline: Shi-Tomasi or FAST corner detection → KLT pyramidal optical flow tracking → 5-point essential matrix or homography RANSAC → relative pose with arbitrary scale (must be metric-scale-aligned via IMU integration externally). Reference implementations widely available in OpenCV samples and pedagogical repos. **Status**: candidate as the project's `Simple baseline / known-runnable / known-failure-mode` C1 option per Component Option Breadth rule. Not a lead, but mandatory fallback presence per the research engine's "include at least one simple baseline" rule.
|
||||
- **Related Sub-question**: SQ3+SQ4 / C1 simple-baseline candidate
|
||||
|
||||
Reference in New Issue
Block a user