gps-denied-onboard/_docs/00_research/02_fact_cards.md
Oleksandr Bezdieniezhnykh e0a6f0d9d5 Update autodev state and candidate enumeration for C1 VIO
Revised the autodev state to reflect the transition to phase 12, detailing the candidate enumeration for C1 (VIO) with a focus on context7 capability verification and restrictions assessment. Updated the source registry to indicate progress on C1 candidates, including the addition of new sources and their evaluation status. Enhanced fact cards with detailed assessments of VINS-Mono and VINS-Fusion, highlighting their suitability and licensing considerations for dual-use deployment. Deferred context7 verification and structured sub-matrix tasks to the next session.
2026-05-08 01:12:43 +03:00


Fact Cards

Mode A Phase 2 — engine Step 3 (Fact Extraction & Evidence Cards). Extracted from sources logged in 01_source_registry.md. Confidence labels: High (L1 / verified source code), ⚠️ Medium (L1/L2 with caveat), Low (L3/L4 inferential).

Bound to sub-questions in 00_question_decomposition.md. Many SQ6 facts also bind directly to the Project Constraint Matrix (acceptance_criteria.md / restrictions.md); per the engine's "Per-Mode API Capability Verification" rule, MAVLink/MSP messages are treated as candidate modes and are bound Pass/Fail/Verify/N/A against numbered ACs and restrictions.


SQ6 — ArduPilot Plane vs iNav external positioning

  • Statement: ArduPilot's AP_GPS_MAV driver (master) decodes MAVLINK_MSG_ID_GPS_INPUT and stores the resulting state into the GPS slot identified by gps_id. Decoded fields: lat/lon (degE7), alt (m → cm internally), hdop/vdop, velocity (vn/ve/vd m/s), speed/horizontal/vertical accuracy (m/s for speed, m for position), yaw (cdeg, 0 sentinel = "not provided"). Honors ignore_flags for ALT/HDOP/VDOP/VEL_HORIZ/VEL_VERT/SPEED_ACCURACY/HORIZONTAL_ACCURACY/VERTICAL_ACCURACY. Jitter-corrected timestamping is applied only when fix_type ≥ 3 and time_week > 0.
  • Source: Source #4 (AP_GPS_MAV.cpp master), Source #1 (Plane Non-GPS Navigation docs)
  • Phase: Phase 2
  • Target Audience: ArduPilot Plane operators / developers
  • Confidence:
  • Related Dimension: C8 (FC adapter), C5 (estimator covariance contract)
  • Fit Impact: supports selection — ArduPilot side of AC-4.3 is satisfied by GPS_INPUT as the primary external-positioning message; covariance fields (horiz_accuracy, vert_accuracy, speed_accuracy) are wired through.
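The field set above can be sketched as a plain dictionary builder. This is a minimal illustration, not the project's adapter: `build_gps_input` and its argument names are hypothetical, the timestamp fields (time_usec, time_week, time_week_ms) are left out, and the choice to ignore only hdop/vdop is an assumption. The flag values and units follow the MAVLink common.xml definition of GPS_INPUT (alt in m, vn/ve/vd in m/s).

```python
from enum import IntFlag


class GpsInputIgnore(IntFlag):
    """GPS_INPUT_IGNORE_FLAGS bit values from MAVLink common.xml."""
    ALT = 1
    HDOP = 2
    VDOP = 4
    VEL_HORIZ = 8
    VEL_VERT = 16
    SPEED_ACCURACY = 32
    HORIZONTAL_ACCURACY = 64
    VERTICAL_ACCURACY = 128


def build_gps_input(lat_deg, lon_deg, alt_m, vel_ned_ms,
                    horiz_acc_m, vert_acc_m, speed_acc_ms,
                    fix_type=3, gps_id=0):
    """Return GPS_INPUT field values as a dict (hypothetical helper)."""
    # Assumption: the estimator publishes accuracies but no DOP values,
    # so only hdop/vdop are flagged as ignored.
    ignore = GpsInputIgnore.HDOP | GpsInputIgnore.VDOP
    vn, ve, vd = vel_ned_ms
    return {
        "gps_id": gps_id,
        "ignore_flags": int(ignore),
        "fix_type": fix_type,
        "lat": round(lat_deg * 1e7),  # degE7
        "lon": round(lon_deg * 1e7),  # degE7
        "alt": float(alt_m),          # metres MSL
        "vn": vn, "ve": ve, "vd": vd,  # m/s, NED
        "speed_accuracy": speed_acc_ms,
        "horiz_accuracy": horiz_acc_m,
        "vert_accuracy": vert_acc_m,
    }


fields = build_gps_input(50.45, 30.52, 1000.0, (17.0, 0.0, 0.0),
                         horiz_acc_m=25.0, vert_acc_m=40.0, speed_acc_ms=1.5)
```

Leaving HORIZONTAL_ACCURACY unset in ignore_flags is what lets horiz_accuracy reach the FC-side quality checks.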

Fact #2 — ArduPilot's covariance honesty (AC-NEW-4) is enforced via the horiz_accuracy field of GPS_INPUT

  • Statement: When GPS_INPUT_IGNORE_FLAG_HORIZONTAL_ACCURACY is unset, AP_GPS stores packet.horiz_accuracy into state.horizontal_accuracy and sets state.have_horizontal_accuracy = true. EKF3's quality chain consumes this via (a) ground-stationary 3 m drift check (_gpsCheckScaler-modulated), (b) innovation gating (POS_I_GATE/VEL_I_GATE), (c) soft de-weighting via EK3_GLITCH_RADIUS (PR #24135). Under-reporting horiz_accuracy defeats these gates — exactly the AC-NEW-4 risk the project flagged.
  • Source: Source #4, Source #23 (PR #24135), Source #24 (AP_NavEKF3 master)
  • Phase: Phase 2
  • Target Audience: System designers writing the C5 estimator → C8 adapter
  • Confidence: High (source code + L1 docs); ⚠️ for the precise innovation-gate mechanics (deferred to design-phase SITL tuning)
  • Related Dimension: C5 covariance, AC-NEW-4
  • Fit Impact: architectural constraint — the C5 estimator MUST publish honest horiz_accuracy (not optimistic) for AP's EKF3 quality chain to function. Aligns directly with AC-1.4 / AC-NEW-4.

Fact #3 — ArduPilot supports runtime EKF source-set switching from companion via MAV_CMD_SET_EKF_SOURCE_SET

  • Statement: EKF3 supports up to three source sets (EK3_SRC1..3_*). A companion can request a switch by sending MAV_CMD_SET_EKF_SOURCE_SET. Alternative paths: RC aux-switch option 90 ("EKF Pos Source"), Lua scripts (e.g., ahrs-source.lua). Caveat from L1 docs: "no GCSs are currently known to implement this" — companion-driven switching works at the firmware level but is not exposed in stock GCS UIs.
  • Source: Source #2, Source #3
  • Phase: Phase 2
  • Target Audience: System designers handling AC-NEW-2 spoof-promotion path on ArduPilot
  • Confidence:
  • Related Dimension: C8 + AC-NEW-2
  • Fit Impact: supports selection — AP allows the project to model two source sets (set 1 = real GPS, set 2 = onboard GPS_INPUT) and switch automatically. Keeps companion lightweight; switching does not require the companion to suppress real-GPS itself.
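A minimal sketch of the companion-side switching decision follows. The command id 42007 and the use of param1 as the source-set id come from the MAVLink ardupilotmega dialect; the set assignment (1 = real GPS, 2 = onboard GPS_INPUT) is the one proposed above, and `source_set_command` with its `spoof_suspected` input is a hypothetical helper. Actually sending the COMMAND_LONG is left to whatever MAVLink library the companion uses.

```python
MAV_CMD_SET_EKF_SOURCE_SET = 42007  # ardupilotmega dialect command id

REAL_GPS_SET = 1     # assumption: EK3_SRC1_* points at the physical receiver
ONBOARD_GPS_SET = 2  # assumption: EK3_SRC2_* points at the GPS_INPUT instance


def source_set_command(spoof_suspected, active_set):
    """Return COMMAND_LONG fields for a switch, or None if no switch is needed."""
    wanted = ONBOARD_GPS_SET if spoof_suspected else REAL_GPS_SET
    if wanted == active_set:
        return None  # already on the right set; avoid redundant commands
    return {
        "command": MAV_CMD_SET_EKF_SOURCE_SET,
        "param1": float(wanted),  # SourceSetId, valid range 1..3
    }


cmd = source_set_command(spoof_suspected=True, active_set=REAL_GPS_SET)
```

Returning None when the wanted set is already active keeps the command channel quiet during steady state, which matters for the AC-NEW-2 latency measurement under load.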

Fact #4 — ArduPilot ODOMETRY-velocity-only fusion is currently NOT supported (open enhancement)

  • Statement: Issue #23485 confirms current limitation: feeding ODOMETRY without position causes EKF position-estimate timeout / failsafe. Implication: the project's visual_propagated mode (VO drift between satellite anchors, no global position) cannot be expressed as ODOMETRY-velocity-only on current AP — must be sent as a full GPS_INPUT with covariance widened to reflect drift uncertainty.
  • Source: Source #8
  • Phase: Phase 2
  • Target Audience: System designers
  • Confidence: (open enhancement, open as of accessed date)
  • Related Dimension: C5 + C8 + AC-1.3 (visual_propagated label) + AC-1.4 (covariance ellipse)
  • Fit Impact: architectural constraint — visual_propagated and dead_reckoned labels both ride GPS_INPUT with growing horiz_accuracy, NOT a separate ODOMETRY channel. Single-message contract = simpler. AC-NEW-8 thresholds (horiz_accuracy = 999.0 for "no fix") map directly.
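The "growing horiz_accuracy" contract can be illustrated with a small sketch. The linear drift model and the function name `propagated_report` are assumptions made for illustration; in the real system the growth would come from the C5 estimator's covariance. The 999.0 sentinel and the 100 m degrade threshold are the values quoted in this document for AC-NEW-8, and the fix_type codes follow GPS_INPUT (0 to 1 = no fix, 2 = 2D, 3 = 3D).

```python
NO_FIX_ACCURACY_M = 999.0  # "no fix" sentinel cited in this fact card
DEGRADED_ABOVE_M = 100.0   # AC-NEW-8 wording: 2D fix or worse above 100 m


def propagated_report(anchor_acc_m, drift_m_per_s, seconds_since_anchor):
    """Map time since the last satellite anchor to GPS_INPUT quality fields.

    Assumes uncertainty grows linearly with time since the last anchor.
    """
    acc = anchor_acc_m + drift_m_per_s * seconds_since_anchor
    if acc >= NO_FIX_ACCURACY_M:
        # Saturate at the sentinel and report "no fix".
        return {"fix_type": 1, "horiz_accuracy": NO_FIX_ACCURACY_M}
    if acc > DEGRADED_ABOVE_M:
        # Still reporting, but at degraded (2D) fix quality.
        return {"fix_type": 2, "horiz_accuracy": acc}
    return {"fix_type": 3, "horiz_accuracy": acc}


report = propagated_report(anchor_acc_m=5.0, drift_m_per_s=0.5,
                           seconds_since_anchor=300)
```

The point of the sketch is that the feed never goes silent: it degrades through fix_type levels while the reported accuracy widens.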

Fact #5 — iNav firmware (master, post-9.0) has NO inbound MAVLink handler for any external-positioning message

  • Statement: Authoritative inbound switch in src/main/telemetry/mavlink.c::processMAVLinkIncomingTelemetry (master) handles only: HEARTBEAT, PARAM_REQUEST_LIST (stub reply), MISSION_CLEAR_ALL, MISSION_COUNT, MISSION_ITEM, MISSION_REQUEST_LIST, MISSION_REQUEST, COMMAND_INT (only MAV_CMD_DO_REPOSITION), RC_CHANNELS_OVERRIDE, ADSB_VEHICLE, RADIO_STATUS. No GPS_INPUT, VISION_POSITION_ESTIMATE, ODOMETRY, GLOBAL_POSITION_INT, or GPS_RAW_INT are accepted as inputs. Wiki page (Source #10) confirms: "Limited command support: Commands that are not implemented are ignored."
  • Source: Source #9 (master code), Source #10 (wiki, edited 2025-12-11)
  • Phase: Phase 2
  • Target Audience: System designers + AC-4.3 author
  • Confidence:
  • Related Dimension: C8, AC-4.3
  • Fit Impact: DISQUALIFIES the literal AC-4.3 wording ("the standard external-positioning message type(s) accepted by ArduPilot AND iNav"). No single MAVLink external-positioning message is accepted by both FCs. Project must adopt a per-FC adapter design and AC-4.3 must be revised to acknowledge two transports.

Fact #6 — iNav accepts external GPS injection via two MSP paths; MSP2_SENSOR_GPS is the covariance-rich path

  • Statement: MSP_SET_RAW_GPS (201) (legacy MSP1, 14 bytes): fixType, numSat, lat, lon, alt (m, internal cm), speed (cm/s). No covariance, no per-axis velocity, no yaw. MSP2_SENSOR_GPS (7939, MSPv2 sensor plugin): instance, gpsWeek, msTOW, fixType, satellitesInView, hPosAccuracy (mm), vPosAccuracy (mm), hVelAccuracy (cm/s), hdop, lat, lon, mslAltitude (cm), nedVelNorth/East/Down (cm/s), groundCourse (deg×100, i.e. cdeg), trueYaw (deg×100, i.e. cdeg), date+time. Routes through mspGPSReceiveNewData() via GPS_PROVIDER_MSP. Requires build flag USE_GPS_PROTO_MSP, which is enabled by default in iNav's target/common.h, so stock firmware reaches this path.
  • Source: Source #12 (MSP message reference, master), Source #13 (target/common.h master + gps.c provider table)
  • Phase: Phase 2
  • Target Audience: System designers (C8 adapter, MSP transport)
  • Confidence:
  • Related Dimension: C8, C5 covariance contract
  • Fit Impact: supports selection of MSP2_SENSOR_GPS for the iNav adapter. Covariance fields (hPosAccuracy, vPosAccuracy, hVelAccuracy) align semantically with GPS_INPUT.horiz_accuracy / vert_accuracy / speed_accuracy, but unit conversions differ (mm vs m). The C8 adapter must therefore be FC-aware, not protocol-monomorphic.
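The unit mismatch between the two transports can be made concrete with a short sketch. The function names are hypothetical, and the 16-bit unsigned width of the MSP2 accuracy fields is an assumption that should be checked against iNav's message definition before use; the unit choices (m and m/s for GPS_INPUT, mm and cm/s for MSP2_SENSOR_GPS) are the ones stated in this fact card.

```python
UINT16_MAX = 0xFFFF  # assumption: MSP2 accuracy fields are 16-bit unsigned


def to_gps_input_accuracy(h_m, v_m, speed_ms):
    """ArduPilot side: GPS_INPUT takes metres and metres per second."""
    return {"horiz_accuracy": h_m, "vert_accuracy": v_m,
            "speed_accuracy": speed_ms}


def _clamp_u16(value):
    """Round to an integer and saturate into the uint16 range."""
    return max(0, min(UINT16_MAX, round(value)))


def to_msp2_accuracy(h_m, v_m, speed_ms):
    """iNav side: MSP2_SENSOR_GPS takes millimetres and centimetres per second."""
    return {
        "hPosAccuracy": _clamp_u16(h_m * 1000.0),       # m -> mm
        "vPosAccuracy": _clamp_u16(v_m * 1000.0),       # m -> mm
        "hVelAccuracy": _clamp_u16(speed_ms * 100.0),   # m/s -> cm/s
    }


msp = to_msp2_accuracy(12.5, 20.0, 0.8)
```

Under the uint16 assumption, a millimetre-scaled field saturates at about 65.5 m, so large uncertainties cannot be carried numerically on the iNav path and would have to be signalled through fixType instead.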

Fact #7 — iNav does NOT support dual-GPS arbitration; companion must be the SOLE GPS source

  • Statement: Issue #10141 is an open feature request for dual-GPS support. Current iNav (master incl. 9.0.x) has single-GPS architecture with one UART selected as the GPS port. There is no primary/secondary failover and no per-instance arbitration in the nav stack.
  • Source: Source #14
  • Phase: Phase 2
  • Target Audience: System designers (architecture)
  • Confidence:
  • Related Dimension: C8, C5, AC-NEW-2 (spoof promotion)
  • Fit Impact: architectural constraint — on iNav, real GPS receivers must NOT be wired directly to the FC. Real GPS goes to the companion; the companion fuses (or rejects) it and emits the single iNav-facing feed via MSP2_SENSOR_GPS (or via a UBX-emulation UART). AC-NEW-2 latency on iNav = companion's internal reaction time only; iNav does not participate in source switching at all.

Fact #8 — iNav explicitly does NOT validate GPS for spoofing; anti-spoofing is fully the companion's responsibility

  • Statement: iNav's docs/GPS_fix_estimation.md states verbatim: "Not a solution for GPS spoofing (GPS output is not validated in INAV)." Combined with Fact #7, the architectural conclusion on iNav: companion = anti-spoofing oracle + nav-camera estimator + IMU-propagation source, all collapsed into the single MSP2_SENSOR_GPS feed.
  • Source: Source #15
  • Phase: Phase 2
  • Target Audience: System designers; AC-NEW-2 / AC-3.5 / AC-NEW-8 owners
  • Confidence:
  • Related Dimension: AC-NEW-2, AC-3.5, AC-NEW-8
  • Fit Impact: supports selection of "companion as iNav's only GPS"; disqualifies any architecture that relies on iNav-side spoof detection for AC-NEW-2 reaction.

Fact #9 — iNav dead-reckoning has documented stability bugs under intermittent feeds; AC-NEW-8 must avoid letting iNav enter dead-reckoning

  • Statement: Issue #10588 documents porpoising and motor-burst behaviour during intermittent GPS outages on iNav fixed-wing dead-reckoning. The community recommendation captured in the issue: "GPS should be rejected if providing erroneous coordinates rather than no fix." inav_allow_dead_reckoning (default OFF) and inav_allow_gps_fix_estimation (default OFF) are both fixed-state booleans — entering dead-reckoning mid-flight is a discrete transition, not a smooth degrade.
  • Source: Source #15, Source #16 (Settings.md), Source #17 (#10588)
  • Phase: Phase 2
  • Target Audience: System designers; AC-NEW-8 owner
  • Confidence: High for setting names; ⚠️ for severity of stability bug (single open issue)
  • Related Dimension: AC-NEW-8, AC-3.5, C8
  • Fit Impact: architectural constraint — on iNav, the AC-NEW-8 path must keep emitting MSP2_SENSOR_GPS with growing hPosAccuracy rather than letting the feed drop and iNav switch to dead-reckoning. The "no fix" semantics on iNav must be expressed via the fixType field of MSP2_SENSOR_GPS (not by silence). Apart from fixType, the horizontal/vertical accuracy fields are the only quality signal available; iNav has no equivalent of the AP horiz_accuracy = 999.0 "no fix" sentinel, so which fixType enum values iNav treats as no-fix must be verified.

Fact #10 — iNav supports UBX-only over UART (NMEA dropped in 7.0); UBX emulation is a viable third transport

  • Statement: iNav 7.0 removed NMEA. Currently supports u-blox UBX protocol with version ≥ 15.00 in 9.0+. Recommended physical receivers: u-blox M8/M9/M10. Companion can implement a UBX-emulation writer on the iNav GPS UART (NAV-PVT mandatory; NAV-DOP optional). UBX carries hAcc/vAcc/headAcc/velocity components — covariance honesty preserved.
  • Source: Source #11 (iNav GPS-and-Compass-setup wiki)
  • Phase: Phase 2
  • Target Audience: System designers (transport-choice)
  • Confidence: High for UBX-only; ⚠️ for "minimum NAV-* set" — the canonical U-blox protocol spec (Source filed in agent-tools as fd8513f8-...txt) plus iNav's gps_ublox.c drive the precise message set; this is a follow-up search before final selection.
  • Related Dimension: C8 transport choice
  • Fit Impact: alternate candidate, NOT YET SELECTED — UBX path bypasses MSP queueing/arbitration concerns and treats the companion as a normal GPS to iNav. Trade-off: implementation cost (UBX writer + correct ACK behaviour) vs. MSP path (already-designed wire format, but iNav-specific).
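To give a sense of the UBX-writer implementation cost, here is a minimal framing sketch. The frame layout (sync bytes 0xB5 0x62, class, id, little-endian length, payload, two-byte Fletcher checksum) is the standard UBX format, and NAV-PVT is class 0x01, id 0x07 with a 92-byte payload. The helper names are hypothetical, and the all-zero payload is a placeholder only; filling the NAV-PVT fields and handling ACK behaviour are not covered here.

```python
import struct

UBX_SYNC = b"\xb5\x62"
NAV_CLASS, NAV_PVT_ID = 0x01, 0x07
NAV_PVT_PAYLOAD_LEN = 92


def ubx_checksum(body: bytes) -> bytes:
    """8-bit Fletcher checksum over class, id, length and payload."""
    ck_a = ck_b = 0
    for byte in body:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes((ck_a, ck_b))


def ubx_frame(msg_class: int, msg_id: int, payload: bytes = b"") -> bytes:
    """Wrap a payload in a complete UBX frame."""
    body = struct.pack("<BBH", msg_class, msg_id, len(payload)) + payload
    return UBX_SYNC + body + ubx_checksum(body)


# Placeholder NAV-PVT frame: correct size and framing, zeroed content.
frame = ubx_frame(NAV_CLASS, NAV_PVT_ID, bytes(NAV_PVT_PAYLOAD_LEN))
```

The framing itself is small; the bulk of the cost flagged above lies in populating NAV-PVT correctly and answering the configuration messages iNav sends to a receiver.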

SQ6 — Conclusions (working summary, will be re-checked at Step 7.5)

Per-FC adapter design is unavoidable (single-message AC-4.3 wording is unsatisfiable)

| FC | Inbound external-positioning transport | Message | Covariance fields | Per-axis velocity | Yaw | Source-switching from companion |
|---|---|---|---|---|---|---|
| ArduPilot Plane | MAVLink (TELEM/USB/UDP serial) | GPS_INPUT (id 232) — primary | horiz_accuracy, vert_accuracy (m), speed_accuracy (m/s) | vn, ve, vd (m/s) | yaw (cdeg, 0 = not provided) | MAV_CMD_SET_EKF_SOURCE_SET (FW supports; stock GCS UIs do not — companion-driven OK) |
| iNav | MSP2 (UART/USB) | MSP2_SENSOR_GPS (id 7939) — primary candidate | hPosAccuracy (mm), vPosAccuracy (mm), hVelAccuracy (cm/s) | nedVelNorth/East/Down (cm/s) | trueYaw (deg×100, i.e. cdeg) | N/A — iNav has single-GPS arch; companion = sole GPS source |
| iNav alt 1 | MSP1 | MSP_SET_RAW_GPS (id 201) — rejected for production | none | none | none | N/A |
| iNav alt 2 | UART | UBX emulation (NAV-PVT etc.) — alternate candidate, requires NAV-* subset verification | UBX hAcc/vAcc/headAcc (mm/cm/scale) | NED in NAV-PVT | yes | N/A |

Selection (preliminary, pending Step 7.5 component-fit gate):

  • AP path: GPS_INPUT — Selected (lead).
  • iNav path: MSP2_SENSOR_GPS — Selected (lead). UBX-emulation kept as fallback if MSP2_SENSOR_GPS proves rate-limited or quality-flag-lossy.

AC / Restriction binding (per-mode, Per-Mode API Capability Verification rule)

| Numbered AC / Restriction | AP GPS_INPUT | iNav MSP2_SENSOR_GPS | iNav MSP_SET_RAW_GPS |
|---|---|---|---|
| AC-1.4 (95% cov + source label {satellite_anchored, visual_propagated, dead_reckoned}) | Pass (horiz_accuracy carries 95% covariance proxy; source label is companion-side metadata, not in MAVLink — emit via STATUSTEXT/NAMED_VALUE_FLOAT) | Pass (hPosAccuracy = covariance proxy; same off-band source-label channel) | Fail (no covariance field → cannot publish 95% ellipse) |
| AC-NEW-4 (false-position safety budget; covariance honesty) | Pass (de-weighted via EK3_GLITCH_RADIUS if covariance is honest) | Verify (need to confirm iNav nav-stack actually uses hPosAccuracy for outlier handling — pre-Step-7.5 follow-up) | Fail |
| AC-NEW-2 (<3 s p95 spoof promotion) | Verify via SITL (MAV_CMD_SET_EKF_SOURCE_SET round-trip latency under load) | Pass by architecture (companion is sole GPS, no FC-side switch needed) | Pass-by-arch but Fails AC-1.4 |
| AC-NEW-8 (visual-blackout + spoofed GPS failsafe; covariance growth + degraded fix levels) | Pass (fix_type 0/1/2 + horiz_accuracy=999.0 documented sentinel maps to AC-NEW-8 thresholds) | Verify (iNav's fixType enum mapping for "no fix" — pre-Step-7.5 follow-up) | Fail (no graceful degrade signal) |
| AC-3.5 (label switch within ≤1 frame OR ≤400 ms; reject spoofed GPS as input) | Pass by architecture (EKF source switch + STATUSTEXT) | Pass by architecture (companion suppresses spoofed-GPS contribution upstream) | Pass-by-arch but Fails AC-1.4 |
| AC-4.3 (FC accepts the chosen messages) | Pass | Pass (default build, USE_GPS_PROTO_MSP on) | Pass but Fails AC-1.4 — discard |
| Restriction "Supported FCs: ArduPilot, iNav (both via standard MAVLink)" | Pass | Fail of "via standard MAVLink" — restriction's literal wording is incorrect because iNav has no inbound MAVLink external-positioning. The restriction must be revised to "ArduPilot via MAVLink GPS_INPUT; iNav via MSP2_SENSOR_GPS". | n/a |
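The per-mode binding can also be expressed as data, which makes the gate mechanical. The sketch below is an illustration only: the verdicts are transcribed from the AC rows of the matrix above (the restriction row is omitted, and "Pass-by-arch" cells are recorded as Pass for their own row), while the `gate` function and its three outcome names are hypothetical.

```python
MODES = ("AP GPS_INPUT", "iNav MSP2_SENSOR_GPS", "iNav MSP_SET_RAW_GPS")

# One verdict per mode, in MODES order, for each numbered AC.
BINDINGS = {
    "AC-1.4":   ("Pass",   "Pass",   "Fail"),
    "AC-NEW-4": ("Pass",   "Verify", "Fail"),
    "AC-NEW-2": ("Verify", "Pass",   "Pass"),
    "AC-NEW-8": ("Pass",   "Verify", "Fail"),
    "AC-3.5":   ("Pass",   "Pass",   "Pass"),
    "AC-4.3":   ("Pass",   "Pass",   "Pass"),
}


def gate(mode):
    """Classify a candidate mode: any Fail discards it, any Verify defers it."""
    column = MODES.index(mode)
    verdicts = {row[column] for row in BINDINGS.values()}
    if "Fail" in verdicts:
        return "discard"
    if "Verify" in verdicts:
        return "verify"
    return "select"


outcome = {mode: gate(mode) for mode in MODES}
```

Under this reading both lead candidates remain in "verify" until their open items are closed, and MSP_SET_RAW_GPS is discarded because of its Fail verdicts.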

Required AC / Restrictions edits flagged for user review

  1. AC-4.3 — current text says "the standard external-positioning message type(s) accepted by ArduPilot and iNav". Reality: no single message type is accepted by both. Proposed revision (outcome-shaped, IEEE-830-style): "WGS84 coordinates are delivered to each supported FC via that FC's documented external-positioning interface — MAVLink GPS_INPUT for ArduPilot Plane, MSP2 MSP2_SENSOR_GPS for iNav. Honest covariance is carried in the field each FC uses for outlier rejection (under-reported covariance is a defect — see AC-NEW-4). Source-label semantics per AC-1.4 are emitted out-of-band (FC-appropriate STATUSTEXT / NAMED_VALUE_FLOAT / equivalent)."
  2. Restriction "Communication protocol (pinned): MAVLink for both FC and GCS" — incorrect for iNav. Proposed revision: "Communication protocol: MAVLink for ArduPilot Plane and for QGroundControl GCS; MSP2 for iNav (UART or USB transport). MAVLink remains the GCS-facing protocol for both FCs." (iNav still emits MAVLink telemetry outbound to QGC; this is preserved.)
  3. AC-NEW-2 — keep numerical budget (<3 s p95) but split per-FC validation: ArduPilot validation = SITL round-trip of MAV_CMD_SET_EKF_SOURCE_SET from companion under spoof injection; iNav validation = companion-internal reaction time (companion-only metric — iNav doesn't participate).
  4. AC-NEW-8 — language "fix-quality 2D fix or worse when covariance > 100 m" maps to GPS_INPUT.fix_type for AP. iNav's fixType enum mapping (per gpsFixType_e in iNav's enums-reference) must be confirmed at design time before this AC is testable on iNav.

Open follow-up probes (deferred to SQ8 + design phase, NOT blocking SQ6 closure)

  • (SQ8) Confirm the precise MAVLink message + field set ArduPilot exposes for spoofing/jamming integrity reports (PR #2110 merged, but GPS_RAW_INT in current published common.xml shows no spoofing bits — likely lives in a sibling message such as GPS_INTEGRITY). This is the FC→companion direction needed for AC-NEW-2's input side and AC-3.5's spoofing detection.
  • (SQ8) UBX-emulation minimum NAV-* subset for iNav 9.0 (UBX ≥ 15.00). Authoritative inputs: U-blox protocol spec (cached) + iNav gps_ublox.c (cached). Output a "minimum companion-side UBX writer" definition.
  • (design) SITL parameter sets for both FCs for AC-NEW-2 / AC-NEW-8 validation. Out of research scope.
  • (design) Verify iNav nav-stack consumption of MSP2_SENSOR_GPS.hPosAccuracy for outlier handling (read src/main/io/gps_msp.c / mspGPSReceiveNewData in design phase, not research phase).

Boundary check: this SQ6 is saturated for the architectural decision

Saturation signals observed: ArduPilot side covered by L1 docs + L1 source code; iNav side covered by L1 source code (master) + L1 wiki (edited 2025-12-11) + L1 release notes (8.0/9.0). Three independent rounds of search yielded the same architectural conclusion (no inbound external-positioning MAVLink on iNav). Last queries returned no novel facts. Per references/source-tiering.md "Search saturation rule" → SQ6 is closed pending the SQ8 follow-up probes above; user decision required on the AC/restriction edits before further architectural work.


SQ1 — Existing / competitor GPS-denied UAV navigation systems

Fact #11 — Twist Robotics OSCAR is a deployed Ukrainian peer system in the same architectural class as this project

  • Statement: Twist Robotics (Ukraine) has a fielded camera + map-matching navigation module called OSCAR (Optical System of Coordinates with Automatic Relocalisation). The vendor states the system "captures the terrain, identifies landmarks, compares them with a map, determines coordinates, and transmits them to the autopilot as a reliable GPS signal" — the same five-stage architecture this project is building. Vendor-stated specs: ≤20 m accuracy without cumulative error, day/night/fog operation, and operational deployment of "more than 500,000 km across 25,000 combat missions over 24 months". Hardware includes active cooling, indicating a non-trivial onboard compute (likely Jetson-class). No public independent benchmark of the 20 m number.
  • Source: Source #25, Source #26
  • Phase: Phase 2
  • Target Audience: System architects + AC owners (existence-of-peer evidence, not implementation guide)
  • Confidence: High for "deployed at scale on Ukrainian combat platforms"; ⚠️ for "20 m accuracy" (vendor self-report); Low for "fully resistant to spoofing and jamming" (claim not independently verified)
  • Related Dimension: SQ1, SQ8 (anti-spoofing claim audit), SQ9 (synthesis — ours must beat or at least match this in the operational regime)
  • Fit Impact: establishes feasibility floor — a Ukrainian peer is operating a similar architecture against the same threat environment our system targets. Project framing must explicitly differentiate (e.g., 1 km AGL vs unspecified OSCAR altitude; 8 h endurance vs unspecified OSCAR endurance; AC-NEW-4 honest covariance contract vs OSCAR's unspecified covariance reporting).

Fact #12 — Auterion Artemis is a production-shipping fixed-wing one-way attack drone with Ukraine-validated GPS-denied navigation, defining the production benchmark for this class

  • Statement: Auterion completed the US Defense Innovation Unit Artemis program in October 2025, delivering a Shahed-class deep-strike drone with up to 1,000-mile range and up to 40 kg warhead, running on Auterion Skynode N mission computer + Auterion Visual Navigation system + built-in terminal guidance. Government evaluators signed off after operational flight tests in Ukraine including ground launch, GPS and GPS-denied navigation, long-range transit, and terminal engagement. Manufacturing is being established in US, UA, and DE; Auterion is offering the system to the US Department of War and allied nations.
  • Source: Source #31; Source #32 confirms Skynode S sibling architecture (NPU-equipped companion).
  • Phase: Phase 2
  • Target Audience: System architects (production-pattern reference)
  • Confidence:
  • Related Dimension: SQ1 (closest commercial production peer), SQ9 (architecture template)
  • Fit Impact: establishes production reference architecture — companion-class autopilot + visual navigation + terminal guidance is shipping at production scale to a US defense customer. Implication: building a per-FC adapter (project decision in SQ6) is consistent with what production stacks already do; integrating against the Artemis architecture is realistic; competing on price + Ukraine-specific operational tuning + AC-NEW-4 honest-covariance contract is a viable differentiation.

Fact #13 — Vantor Raptor is a production COTS visual-GPS-replacement software suite, demonstrating that "branded sat-tile basemap + on-drone vision software" is a viable commercial pattern

  • Statement: Vantor Raptor product family (Guide / Sync / Ace) provides vision-based GPS replacement using the drone's existing camera plus Vantor's "100 million-plus sq km of highly accurate 3D terrain data" (Vivid Terrain, vendor-stated 3 m accuracy). Vendor-demonstrated absolute accuracy: <7 m in all dimensions for aerial position (Guide), <3 m for ground coordinate extraction (Sync, Ace). Works at night and at low altitudes. Platform-agnostic, deployable on commodity hardware, integrates with existing onboard cameras. Inertial Labs has published a VINS-integrated Raptor Guide white paper. Recent partnerships: Niantic Spatial (Dec 2025) for unified air-to-ground positioning in GPS-denied areas; Maxar partnership with AIDC (Sep 2025) for Taiwan UAV resilience against GPS interference.
  • Source: Source #30
  • Phase: Phase 2
  • Target Audience: Architecture / business decision-makers (build-vs-buy framing)
  • Confidence: High for product existence + claimed accuracy bounds (vendor primary); ⚠️ for whether Vantor's commercial accuracy figures hold under the project's specific Ukrainian-steppe + active-conflict-tile-staleness conditions
  • Related Dimension: SQ1 (commercial), C2/C3 (commercial alternatives to building ourselves), SQ8 (basemap as a service vs offline cache)
  • Fit Impact: build-vs-buy lens — Raptor Guide's <7 m claim is better than the project's AC-1.1 budget (≤80 m / 95% under AC-1.1.1), so it's not a disqualifier on accuracy. Reasons we still build vs buy: (a) Vantor is a US vendor; export / dual-use licensing into the Ukrainian battlefield is uncertain; (b) restrictions specify offline cache from the project's own Azaion Suite Satellite Service (AC-2.x), not Vantor's Vivid Terrain — replacing the basemap is non-negotiable; (c) covariance honesty contract (AC-NEW-4) and source-label contract (AC-1.4) are project-specific and may not be exposed by Vantor's API. Outcome: keep Raptor as a competitive comparator in solution_draft01, NOT as a candidate component to integrate.

Fact #14 — snktshrma/ngps_flight (NGPS — ArduPilot GSoC 2024) is the closest open-source pipeline match to this project's exact C1+C2+C3+C5+C8 stack

  • Statement: NGPS = ROS 2 + ArduPilot pipeline composed of three packages: ap_ngps_ros2 (visual geo-localization at 12 Hz by matching live camera frames to georeferenced satellite imagery using LightGlue + SuperPoint, deep-learning-based feature matching), ap_ukf (Unscented Kalman Filter fusing NGPS absolute positions with VIO estimates), ap_vips (VIO providing relative pose). Output is fused odometry to ArduPilot's EKF (per related ArduPilot issue #23471, this is via VISION_POSITION_ESTIMATE requiring EKF source-set 2/3 with EK3_SRC*_POSXY=Vision). Project is published under ArduPilot's GSoC 2024 program. Sibling ap_nongps is an earlier OpenCV-based prototype.
  • Source: Source #33
  • Phase: Phase 2
  • Target Audience: Implementer / Engineer
  • Confidence: High for project existence, component breakdown, and matcher choice (LightGlue+SuperPoint); ⚠️ for runtime behaviour under our exact constraints (Jetson Orin Nano, 1 km AGL, 17 m/s, 3 fps); Low for production hardening / covariance honesty / spoof-defence (none documented)
  • Related Dimension: SQ1 (closest open-source peer), SQ2 (canonical pipeline confirmation), SQ3+SQ4 (architectural template for component candidate matrix), SQ6 (alternate AP transport debate)
  • Fit Impact: architectural template — confirms the project's split (C1 VIO ↔ C2/C3 visual absolute ↔ C5 fusion ↔ C8 FC adapter) is canonical, not novel. Two concrete deltas:
    1. Transport choice on AP: NGPS uses VISION_POSITION_ESTIMATE. SQ6 picked GPS_INPUT because it carries horiz_accuracy directly, supports source-set switching via MAV_CMD_SET_EKF_SOURCE_SET, and avoids EKF-source-set reconfiguration. The trade-off (NGPS's path vs SQ6's pick) must be re-examined at design time before final AP-transport selection.
    2. Estimator choice: NGPS uses UKF; SQ3/SQ4 will compare UKF vs ESKF vs MSCKF vs factor-graph (GTSAM) on the same matrix.

Fact #15 — RGB satellite-image matching as a low-altitude (<25 m AGL) localization technique is unreliable per the SPRIN-D Challenge; our 1 km AGL operates in the regime where the same authors note it "works reasonably well"

  • Statement: The CTU Prague team's SPRIN-D winning paper directly states: "Some teams used RGB satellite image-based matching, but this has proved to be highly unreliable at such low altitudes." (referring to <25 m AGL). The paper's related-work review separately notes that "high-altitude matching... works reasonably well, but at low altitudes (25 m) the viewpoint differs drastically, making roofs, facades, and vegetation inconsistent with satellite imagery." The project operates at ≤1 km AGL — which is the high-altitude regime in the paper's terminology — making RGB sat-matching the appropriate technique class. The paper's CPU-only winning method (LiDAR heightmap-gradients + clustered particle filter) is not transferable to our hardware: our project has no LiDAR.
  • Source: Source #28
  • Phase: Phase 2
  • Target Audience: Implementer / Engineer + Domain expert
  • Confidence:
  • Related Dimension: SQ1, SQ5 (failure modes), SQ2 (canonical pipeline)
  • Fit Impact: disambiguates a potentially-disqualifying lesson — the CTU paper's "RGB sat-matching is unreliable" finding does NOT disqualify our approach because the failure was caused by low-altitude viewpoint mismatch, which our 1 km AGL regime does not have. This must be cited explicitly in solution_draft01 to pre-empt the natural objection from anyone who reads the paper. Separately, the CTU paper's specific lessons are still binding: VIO degrades catastrophically without IMU vibration isolation; magnetometer is unreliable near steel/concrete; "ability to recover from periods of high uncertainty and re-localize" matters more than instantaneous RMSE — this last lesson is a direct architectural input for AC-NEW-2 / AC-NEW-8.

Fact #16 — RTAB-Map and ORB-SLAM3 both fail beyond 1 km / above 2 m/s flight in the SPRIN-D environment; our cruise profile (≤17 m/s, kilometers between satellite anchors) explicitly excludes both as primary candidates

  • Statement: The SPRIN-D paper states: "We tested state-of-the-art visual SLAM systems such as RTAB-Map and ORB-SLAM3 in a high-fidelity simulator, and found that both performance degraded significantly in a long-range scenario (beyond 1 km), as their memory and compute demands grow with the size of the environment. Moreover, RTAB-Map was unable to maintain quality odometry in faster flight speeds (beyond 2 m/s), while ORB-SLAM3 suffered from tracking loss in textureless areas."
  • Source: Source #28
  • Phase: Phase 2
  • Target Audience: Implementer / Engineer (component selection for C1)
  • Confidence:
  • Related Dimension: SQ1, SQ3+SQ4 component C1 (VO/VIO), SQ5 (failure modes)
  • Fit Impact: prunes the C1 candidate landscape — RTAB-Map and ORB-SLAM3 should not be pursued as C1 leads. Plausible C1 leads remain: VINS-Mono / VINS-Fusion / OpenVINS / OKVIS2 / DROID-SLAM / DPVO / pure VO baseline (KLT + RANSAC homography). NGPS (Fact #14) uses ap_vips = OpenVINS-class VIO — confirming an aligned community choice. Final C1 selection happens in SQ3+SQ4.

Fact #17 — DSMAC + TERCOM lineage: pre-cached scene matching for downward-looking navigation is a 40+ year deployed technique class with documented sub-10 m terminal accuracy

  • Statement: DSMAC (Digital Scene Matching Area Correlator) is an autonomous missile-guidance system based on area correlation of sensed downward-camera ground scenes against pre-stored reference imagery (often satellite reconnaissance). It achieves 3–10 m terminal accuracy by correlating buildings, road intersections, and distinctive terrain landmarks. Tomahawk: TERCOM (radar altimeter + DEM) for mid-flight + DSMAC for terminal guidance reduces CEP from ~30 m to "only meters". Documented combat record: 1991 Gulf War, >80% of 280 launched Tomahawks hit target. Recent miniaturisation: Destinus Ruta (300 km strike-class) is integrating UAV Navigation's (Spanish, Grupo Oesía) DSMAC-class system, validated in Ukrainian combat conditions including GNSS-denied / jamming / spoofing.
  • Source: Source #36, Source #27
  • Phase: Phase 2
  • Target Audience: Domain expert + Decision-maker
  • Confidence: High for the lineage and Tomahawk performance numbers (DTIC + open-source); ⚠️ for the Ruta-specific "DSMAC operating principle" inference (Defense Express analyst inference, not vendor disclosure)
  • Related Dimension: SQ1 (lineage), SQ8 (baseline accuracy expectations for AC-1.1.1 80 m / AC-NEW-4 false-position budget)
  • Fit Impact: establishes baseline accuracy expectations — the technique class has documented sub-10 m accuracy in the cruise-missile-terminal regime. Our budget (AC-1.1.1: <80 m at 1 km AGL with ≥0.5 m/px tiles) is loose by comparison, indicating that the AC budget is not aggressive against the technique-class baseline — it is aggressive against the Jetson Orin Nano + 8-h-continuous + 25 W envelope. Implication for AC-NEW-4: claiming P(error >500 m) <0.1% per flight is consistent with the DSMAC-lineage class; an honestly-reported failure rate at this level is realistic, not unprecedented.

Fact #18 — Hierarchical Image Matching (arXiv 2506.09748, June 2025) is a current academic SOTA pipeline for our exact problem, but uses DINOv2 — a heavyweight foundation model that must be benchmarked under our 25 W / 8 GB Jetson envelope before any selection

  • Statement: 2025 academic SOTA pipeline structure: (1) image retrieval module (off-the-shelf, optimal-transport feature aggregation); (2) Semantic-Aware and Structure-Constrained Matching Module (SASCM) using DINOv2 features + 4D correlation tensor + SoftMNN + 4D conv; (3) lightweight fine-grained matching module for pixel-level. Constructs UAV absolute visual localization without VIO/relative-localization dependence (retrieval-and-matching only). Evaluation on AerialVL + their own CS-UAV dataset claims superior accuracy under cross-source and cross-temporal variation.
  • Source: Source #29
  • Phase: Phase 2
  • Target Audience: Implementer / Engineer + Domain expert
  • Confidence: High for pipeline structure and method; ⚠️ for "superior" claim (single-paper benchmark; AerialExtreMatch evaluates 16 methods with broader rigor, so Source #34 is the better cross-method ranker); Low for Jetson-Orin-Nano runtime (no published number)
  • Related Dimension: SQ1 (academic SOTA), C2 (VPR), C3 (cross-domain registration), SQ5 (foundation-model-on-Jetson failure mode)
  • Fit Impact: academic-SOTA snapshot, candidate template — the retrieval → semantic-aware coarse → fine-grained pipeline is a candidate template for our C2+C3, but DINOv2 introduces a Jetson-deployment risk that must be quantified before commitment. Candidate-level decision: include DINOv2-based pipelines (AnyLoc, BoQ, this paper's SASCM) in the C2/C3 candidate matrix with mandatory MVE on Jetson Orin Nano under our exact frame size and 3 fps cadence. Reject DINOv2 if total inference latency cannot be brought under (400 ms - other-stages budget) at INT8 / fp16. Per Source #28 lesson, classical matchers (LightGlue+SuperPoint as in NGPS) should also be in the matrix as the "simple baseline / known-Jetson-runnable" option.

Fact #19 — AerialExtreMatch (2025) is the academic benchmark our C2+C3 candidate matrix must publish numbers against, with 32 difficulty-stratified cells exposing exactly the cross-source / cross-pitch / cross-scale failure modes our project will face

  • Statement: AerialExtreMatch publishes (a) 1.5 M synthetic train pairs (RGB+depth, diverse UAV/satellite viewpoints); (b) ~30,000 evaluation pairs in 32 difficulty levels stratified by overlap (4 bins: <20%, 20–40%, 40–60%, >60%), pitch difference (4 bins: 50–55°, 55–60°, 60–65°, 65–70°), and scale variation (2 bins: 1–2×, >2×); (c) a real-world UAV-localization split captured with DJI M300 RTK + H20T against UAV-derived orthomosaic/DSM AND lower-quality satellite maps. The benchmark evaluates 16 representative detector-based and detector-free image matching methods.
  • Source: Source #34
  • Phase: Phase 2
  • Target Audience: Domain expert + Implementer
  • Confidence:
  • Related Dimension: SQ1 (academic landscape), SQ7 (datasets), C2 (VPR), C3 (cross-domain registration)
  • Fit Impact: defines the C2/C3 evaluation matrix: every C2/C3 candidate going into solution_draft01 must report numbers on AerialExtreMatch's 32 difficulty cells, with at least the high-pitch (65–70°) and high-scale (>2×) cells representing our worst-case (UAV vs satellite tile geometry mismatch + ortho-rectification residual). The dataset's real-world UAV-localization split with both UAV-orthomosaic AND satellite-map references mirrors our project's offline-cache-tile semantics directly.
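
As a minimal sketch of how the 32 difficulty cells could be enumerated when laying out a results table: the bin labels follow the statement above, while the tuple layout and the helper names (`difficulty_cells`, `worst_case_cells`) are illustrative assumptions and not part of the benchmark's own tooling. Reading "high-pitch and high-scale" as the intersection of the two bins is also an assumption.

```python
from itertools import product

# Bin labels as listed in the AerialExtreMatch statement above.
OVERLAP_BINS = ["<20%", "20-40%", "40-60%", ">60%"]
PITCH_BINS = ["50-55°", "55-60°", "60-65°", "65-70°"]
SCALE_BINS = ["1-2x", ">2x"]


def difficulty_cells():
    """Return every (overlap, pitch, scale) combination: 4 x 4 x 2 = 32 cells."""
    return list(product(OVERLAP_BINS, PITCH_BINS, SCALE_BINS))


def worst_case_cells():
    """Cells in the highest pitch bin AND the >2x scale bin (assumed reading)."""
    return [c for c in difficulty_cells() if c[1] == "65-70°" and c[2] == ">2x"]


print(len(difficulty_cells()))  # 32
print(len(worst_case_cells()))  # 4
```

Under this reading, each candidate would report four worst-case numbers, one per overlap bin.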

Fact #20 — DARPA FLA + USAF SBIR establish the US-defense-program tailwind, but do not directly validate the project's specific regime (fixed-wing, ~1 km AGL, sat-tile basemap, 8-h endurance)

  • Statement: DARPA Fast Lightweight Autonomy (FLA) program ran 2015–2018 (Phase 1 Florida 2017; Phase 2 Georgia 2018; complete). Focused on small quadcopter autonomy at ≤20 m/s through cluttered indoor/outdoor environments using onboard cameras + LIDAR + sonar + IMU, no GPS / datalink / pilot. A 2025 retrospective (arXiv 2504.08122) reviews FLA testing methodology and Phase 1 results. A 2025 USAF SBIR Phase II solicitation (Sweetspot ID 7946c818-409f-5b31-8f06-554466071d83) requests visual position and navigation capability for sUAS in GPS-denied environments, confirming that the regulatory + funding environment for this category is active in 2025.
  • Source: Source #35
  • Phase: Phase 2
  • Target Audience: Decision-maker + Domain expert
  • Confidence:
  • Related Dimension: SQ1 (defense-program lineage)
  • Fit Impact: context only, no direct candidate gain — FLA pre-dates the project's specific regime by 8 years, focused on a different platform (multirotor) and altitude (low-altitude obstacle avoidance, not 1 km AGL nadir-camera satellite-anchor). Useful only to establish lineage and context. The USAF SBIR datapoint is more directly relevant: confirms that an active US-defense-funded need exists for sUAS visual position + navigation in GPS-denied environments — i.e., the project's market exists outside Ukraine.

SQ1 — Conclusions (working summary, will be re-checked at Step 7.5)

Existing-systems landscape (6 named-and-evidenced peer / adjacent systems)

| System | Class | Operational regime | Closest match dimension | Closest mismatch dimension | Status as evidence |
| --- | --- | --- | --- | --- | --- |
| Twist Robotics OSCAR (UA) | Deployed Ukrainian peer | Combat-deployed, fixed-wing-class, GPS-denied vision-nav | Same architecture, same threat environment | Altitude / endurance / FC / accuracy contract not publicly specified | Closest peer for "feasibility floor" |
| Auterion Artemis | Production COTS one-way attack drone | Shahed-class, 1000-mile range, 40 kg warhead, Ukraine-validated GPS-denied nav | Same architectural pattern (Skynode + Visual Navigation + terminal guidance) | One-way attack vs reusable; no covariance/source-label contract published | Closest production reference architecture |
| Vantor Raptor (Guide / Sync / Ace) | Production COTS software suite | Vision-based GPS replacement on existing drone camera + Vivid Terrain 3D basemap | Visual-position software pattern | Vendor-managed sat-tile basemap is not the project's Azaion Suite Satellite Service; no AC-NEW-4 / AC-1.4 contract | Closest commercial peer for "build-vs-buy" framing |
| snktshrma/ngps_flight (NGPS, ArduPilot GSoC 2024) | Open-source research prototype | LightGlue+SuperPoint+UKF+VISION_POSITION_ESTIMATE to AP | Same component split, same FC family | GSoC prototype, not production; no spoof defence; no covariance honesty | Closest open-source pipeline match; explicit architectural template |
| CTU Prague SPRIN-D winner | Academic / competition | Multirotor, ≤25 m AGL, LiDAR + heightmap gradient + particle filter on CPU | "Recover-from-uncertainty > low-instantaneous-RMSE" lesson; VIO discipline | LiDAR-required, low-altitude regime, no sat-tile basemap | Architectural-pattern reference + cautionary tale |
| Destinus Ruta + UAV Navigation | Production miniaturised cruise missile | 300 km strike, DSMAC-class, Ukraine-combat-validated | Pre-cached basemap + visual matching + autopilot ingestion | One-way attack, terminal guidance, no covariance contract | Shows DSMAC-class miniaturised into UAV tier |

Per-perspective coverage

| Perspective | Facts supporting | Saturation status |
| --- | --- | --- |
| Implementer / Engineer | Fact #14 (NGPS), Fact #16 (SLAM failure modes), Fact #18 (DINOv2 risk) | Saturated for SQ1; deeper component-level deep-dives go to SQ3/SQ4 |
| Practitioner / Field (Ukraine) | Fact #11 (OSCAR), Source #37 (~70% UAV losses to EW), Source #27 (Ruta + UAV Navigation Ukraine combat validation) | Saturated for SQ1 |
| Domain expert / Academic | Fact #18 (Hierarchical Matching SOTA), Fact #19 (AerialExtreMatch benchmark), Fact #15 (SPRIN-D regime distinction) | Saturated for SQ1; academic SOTA benchmarking handed off to SQ3/SQ4 + SQ7 |
| Contrarian / Devil's advocate | Fact #15 (low-altitude RGB matching unreliable lesson), Fact #16 (RTAB-Map / ORB-SLAM3 disqualified), Fact #18 (DINOv2-on-Jetson risk) | Saturated for SQ1 |
| Decision-maker / Business | Fact #12 (production-ready Auterion), Fact #13 (commercial Vantor build-vs-buy framing), Fact #20 (USAF SBIR market context) | Saturated for SQ1 |

Architectural conclusions for solution_draft01

  1. Build-vs-buy stance: build. Vantor Raptor and Auterion Visual Navigation are commercially superior on hardening + integration but neither exposes the covariance honesty contract (AC-NEW-4) nor uses the project-specified Azaion Suite Satellite Service tile cache (AC-2.x); both are dual-use export risks for the Ukrainian battlefield. NGPS (Fact #14) is the open-source architectural template to learn from but is a GSoC research prototype lacking production hardening, spoof defence, and the covariance-honesty contract. Architectural conclusion: build with NGPS as the template, with project-specific contracts (AC-NEW-4, AC-1.4, AC-NEW-7) and per-FC adapter (SQ6 conclusion) layered on top.
  2. Differentiation from OSCAR (Twist Robotics) must be made explicit in solution_draft01: (a) honest covariance contract per AC-NEW-4; (b) explicit {satellite_anchored, visual_propagated, dead_reckoned} source-label contract per AC-1.4; (c) AC-NEW-7 cache-poisoning safety budget on tile write-back; (d) ArduPilot Plane + iNav both supported per project's revised AC-4.3.
  3. Pipeline canonicalness: the C1+C2+C3+C4+C5+C8 split is canonical (NGPS + the 2025 hierarchical-matching paper + SPRIN-D winner all use the same shape; only the specific algorithm choices differ). SQ2 will sanity-check this against one more pipeline-survey paper, but this is essentially a low-risk question now.
  4. Component-pruning carried into SQ3/SQ4:
    • C1: prune RTAB-Map and ORB-SLAM3 as primary candidates per Fact #16. Carry: VINS-Mono / VINS-Fusion / OpenVINS / OKVIS2 / DROID-SLAM / DPVO / pure VO baseline.
    • C2/C3: mandatorily benchmark any DINOv2-based candidate (AnyLoc, BoQ, SASCM-style) against AerialExtreMatch at our pitch / scale / overlap regime AND against Jetson Orin Nano latency budget (per Fact #18). Maintain LightGlue+SuperPoint as the "simple-baseline / known-Jetson-runnable" option per NGPS precedent.
    • C8 transport: NGPS uses VISION_POSITION_ESTIMATE. SQ6 picked GPS_INPUT. Re-examine the trade-off in design phase, but SQ6's selection stands for the research draft.
  5. Lessons from SPRIN-D winner that must propagate to solution_draft01:
    • "Ability to recover from periods of high uncertainty and re-localize" > "low instantaneous RMSE" — directly informs AC-NEW-2 / AC-NEW-8.
    • VIO requires mechanically-decoupled IMU; this is a hardware-integration constraint, not a software issue.
    • Magnetometer is unreliable near steel/concrete; sensor fusion of heading sources is essential.
    • "No single sensor can be fully relied upon" — directly supports our IMU+camera+sat-tile multi-source posture.

Open follow-ups (deferred to later sub-questions)

  • (SQ8) Independent verification of OSCAR's "fully resistant to spoofing/jamming" claim — if available. Otherwise, Twist Robotics's claim remains a vendor-only signal.
  • (SQ8) Vantor Raptor and Auterion Visual Navigation's covariance reporting behaviour — for benchmarking AC-NEW-4 compliance.
  • (SQ3+SQ4 / C2) AnyLoc / BoQ / DINOv2-VLAD / MixVPR / EigenPlaces / NetVLAD on AerialExtreMatch for cross-source aerial — already in C2 search plan; SQ1 just confirmed they're the right candidate set.
  • (SQ3+SQ4 / C3) LightGlue / LoFTR / RoMa / DKM / MASt3R + classical SIFT+RANSAC + XFeat on AerialExtreMatch — already in C3 search plan; SQ1 confirms shape.
  • (SQ7) AerialExtreMatch + AerialVL + CS-UAV + RealUAV/SAVL + UAV-VisLoc as the dataset shortlist for our cross-validation — confirmed by SQ1 hits.

Boundary check: SQ1 is saturated

Saturation signals observed: all 5 perspectives saturated, ≥3 high-confidence facts per perspective, last 3 search rounds (Anduril Iris detail probe, ArduPilot prior-art probe, DSMAC lineage probe) yielded only one new substantive datapoint (NGPS) and confirmed already-known patterns. No unresolved contradictions. Per references/source-tiering.md "Search saturation rule" → SQ1 is closed.


SQ2 — Canonical pipeline decomposition (sanity-check)

Fact #21 — The canonical pipeline for offline-cache visual geo-localization is two-stage: global VPR retrieval, then local alignment (image matching → pose)

  • Statement: Source #38 (Skoltech aerial-VPR survey) defines the field's canonical pipeline verbatim: "Visual geolocalization can be implemented through various methods, typically relying on a pre-built database of images with known locations. This approach generally involves two stages: global localization (or Visual Place Recognition, VPR) and local alignment. Global localization involves identifying the nearest frame from the database (Image Retrieval), while local alignment determines the precise position using the selected frame." Source #42 (NUDT 2026 absolute-VL survey) names the same shape "retrieval → matching → pose-estimation hierarchical framework" and explicitly contrasts it against three rejected alternatives: (a) relative-only VIO/SLAM (cumulative error), (b) end-to-end direct localization (poor generalization), (c) map-free localization (scene-dependent). Source #39 (U.Maine cross-view survey) traces the same lineage from 2003 pixel-wise template-matching → 2013 hand-engineered features → 2017 CNN/triplet-loss → 2018+ Siamese/GAN → 2022+ Transformer → 2023 DINOv2-class. Source #41 (AnyVisLoc benchmark) implements this hierarchy as: image retrieval (rough) → image matching (2D-2D) → DSM-lift to 3D → PnP+RANSAC, with Top-N re-rank by inlier count as a critical fourth stage between matching and pose.
  • Source: Source #38, Source #39, Source #41, Source #42
  • Phase: Phase 2
  • Target Audience: Architects of solution_draft01
  • Confidence: High (four independent surveys/benchmarks converge)
  • Related Dimension: SQ2, C2 (VPR), C3 (cross-domain matching), C4 (pose estimation)
  • Fit Impact: confirms the project's C1–C10 decomposition is canonical for the C2 → C3 → C4 chain. The component split is not novel; the project's contribution is the integration discipline (covariance honesty AC-NEW-4, source-label contract AC-1.4, offline-cache safety AC-NEW-7) layered on top. Augment the existing decomposition with an explicit "Top-N re-rank by inlier count" stage between C3 and C4 (currently implicit).

Fact #22 — AdHoP (Adaptive Homography Preconditioning) is a method-agnostic post-matching refinement loop that improves translation accuracy by ~30% average and up to 63% for previously-underperforming methods, at the cost of a second matching pass

  • Statement: Source #40 (OrthoLoC benchmark, Sep 2025): from initial 2D-2D query↔orthophoto correspondences, estimate a homography H via DLT+RANSAC, warp the orthophoto with H to better match the query's perspective (reducing residual perspective gap), re-match in this warped frame, then map the new correspondences back to the original orthophoto via H⁻¹, lift to 3D using DSM, and run PnP+RANSAC + Levenberg-Marquardt refinement. Accept the AdHoP-refined pose only if reprojection error decreases vs. the non-refined pose. Quantitative effects (16,425 images, 47 locations, 1m-1° threshold): GIM+DKM 75.4% recall (best); AdHoP-refined methods see ~30% average matching improvement, ~20% translation/rotation error reduction; for previously-underperforming methods AdHoP yields up to 95% matching improvement (XFeat*) or 63% translation reduction (DKM); for RoMa, AdHoP lifts 1m-1° recall by +23 points (54.6% → 77.6%-class). Cross-domain regime (war-zone-equivalent: scene change between query and reference): translation error increases ~3× when only the visual modality differs, ~7× when both visual and structural (DSM) gaps exist (0.16 m → 1.12 m for GIM+DKM+AdHoP). Method-agnostic — works on top of any 2D-2D matcher.
  • Source: Source #40
  • Phase: Phase 2
  • Target Audience: System architects + C3/C4 implementers
  • Confidence: High for headline numbers (single-paper, but published dataset + open code + reproducible per repo)
  • Related Dimension: SQ2 (new sub-stage), C3 (matcher), C4 (pose), SQ5 (cross-domain failure mode)
  • Fit Impact: adds a new sub-stage between C3 and C4. Decision for solution_draft01: include AdHoP-class refinement as an optional stage gated on Jetson Orin Nano latency budget — if (single-pass match latency × 2) + homography estimation + reprojection check fits under (400 ms - other-stages), include it; otherwise reserve as offline-replay-time refinement. Cross-domain 3× translation-error penalty is a direct AC-NEW-4 calibration input — companion-side covariance must inflate proportionally when scene-change detection (deferred to SQ8) flags a stale tile.
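
The geometric bookkeeping of the AdHoP loop can be sketched as below. This is a minimal illustration under stated assumptions, not the OrthoLoC implementation: the matcher, the DLT+RANSAC homography fit, the DSM lift and PnP are all omitted, `H` is assumed to map original orthophoto pixels into the warped frame, and every function name and all latency figures in the usage lines are hypothetical.

```python
def apply_homography(H, pt):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def invert_3x3(M):
    """Invert a 3x3 matrix via the adjugate; raises if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def map_back_to_orthophoto(H, warped_matches):
    """Map (query_pt, warped_ortho_pt) pairs back to original orthophoto pixels via H^-1."""
    H_inv = invert_3x3(H)
    return [(q, apply_homography(H_inv, p)) for q, p in warped_matches]


def accept_refined_pose(reproj_err_initial, reproj_err_refined):
    """Keep the refined pose only if the reprojection error decreased."""
    return reproj_err_refined < reproj_err_initial


def adhop_fits_budget(match_ms, homography_ms, reproj_check_ms,
                      other_stages_ms, frame_budget_ms=400.0):
    """Latency gate: two matching passes plus overhead must fit the remaining budget."""
    return 2 * match_ms + homography_ms + reproj_check_ms <= frame_budget_ms - other_stages_ms


# Hypothetical usage: a pure-translation homography and made-up latencies.
H = [[1, 0, 10], [0, 1, -5], [0, 0, 1]]
print(map_back_to_orthophoto(H, [((3.0, 4.0), (110.0, 45.0))]))
print(adhop_fits_budget(match_ms=100, homography_ms=10, reproj_check_ms=5, other_stages_ms=150))
```

The budget gate is the piece that would decide between running the second pass onboard and reserving it for offline replay.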

Fact #23 — 6-DoF aerial-to-satellite localization requires DSM (Digital Surface Model) elevation data; without DSM, the system collapses to 3-DoF (position + 1 rotation) or must compute attitude purely from IMU/VIO

  • Statement: Source #40 OrthoLoC explicitly: "Our pipeline matches the query image with the DOP, lifts the matched 2D points in DOP to 3D using the DSM, and then estimates the camera pose using PnP and RANSAC." Without the DSM lift, the matcher produces 2D↔2D correspondences that constrain a homography (which encodes 3-DoF for a planar scene + planar camera) but not the full 6-DoF camera pose. Source #41 AnyVisLoc independently confirms by measuring: aerial-photogrammetry map (with paired DSM at 0.94 m/px) achieves 74.1% A@5m; satellite map (with ALOS 30 m DSM) achieves only 18.5% A@5m — a 4× accuracy collapse driven by DSM coarseness. The project's offline cache from the Azaion Suite Satellite Service is currently specified as 2D ortho tiles only (no DSM commitment in restrictions.md or AC). Three architectural responses are available: (a) 3-DoF acceptance — fix attitude from IMU/VIO, treat the matcher output as a homography-only constraint, ignore DSM; sacrifices the up-to-2× higher accuracy reported when DSM is present, but stays within current cache contract; (b) Request DSM tiles from the Suite Sat Service — adds C2 cache schema work + a Suite Sat Service contract change; preserves 6-DoF accuracy; (c) IMU/VIO-only attitude + 2D-2D matching translation — same as (a) but explicitly contracts the IMU/VIO module to provide attitude with σ ≤ 5° (per Fact #24); operationally identical to (a), differs only in how the contract is written.
  • Source: Source #40, Source #41
  • Phase: Phase 2
  • Target Audience: System architects + Suite Sat Service stakeholder + AC owner
  • Confidence: for the architectural claim; for the 4× accuracy collapse number
  • Related Dimension: SQ2 (decomposition), C2 (cache schema), C3 (matcher output contract), C4 (pose), C5 (estimator), C6 (IMU/VIO contract), AC-1.1 / AC-1.1.1 (accuracy budget)
  • Fit Impact: architectural decision required, surfaced for user. The current restrictions.md (no DSM commitment) implicitly forces option (a) or (c). The accuracy budget AC-1.1.1 (≤80 m at 1 km AGL) is loose enough that 3-DoF + IMU-attitude almost certainly satisfies it on a per-frame basis (per Fact #21 and DSMAC-class lineage in Fact #17), but requires explicit acknowledgement in the architecture before commitment. Proposed default for solution_draft01: option (c) — fix attitude from IMU/VIO with documented σ ≤ 5° contract on yaw, σ ≤ 5° on pitch (per Fact #24), translation from 2D-2D matching + camera pose. Flag option (b) as a "Suite Sat Service follow-up" if 6-DoF accuracy ever becomes a hard requirement.

Fact #24 — IMU-derived yaw and pitch priors with σ ≤ 5° are required for the matching+PnP stack to hit benchmark accuracy; σ ≥ 10° causes 2–4% A@5m drops, σ ≥ 30° causes ≥4% drops, σ ≥ 60° causes 25.7% drops

  • Statement: Source #41 AnyVisLoc systematically perturbs yaw and pitch priors and measures localization accuracy collapse. Yaw: σ = 5° → no impact; σ = 10° → 1.9% A@5m drop; σ = 30° → 4.1% drop; σ = 50° → 13.7% drop; σ = 60° → 25.7% drop. Pitch: σ < 5° → no impact; σ ≥ 7° → 15% drops. The benchmark is conducted at low altitude (30–300 m AGL) with a 20–90° pitch range; the lessons transfer to our 1 km AGL nadir-camera regime in direction, but the magnitudes may be lower at 1 km AGL because nadir geometry is less yaw-sensitive than oblique. Conservatively adopting the benchmark numbers gives a hard contract: IMU/VIO must deliver yaw with σ ≤ 5° and pitch with σ ≤ 5° to the matcher (1σ, not 95%, since the benchmark is single-σ). Pitch is naturally tighter on a nadir-fixed camera (mechanically constrained); yaw is the binding constraint and is the typical IMU/magnetometer failure mode (per SPRIN-D lesson Fact #15).
  • Source: Source #41
  • Phase: Phase 2
  • Target Audience: System architects + C1 (VIO) implementer + C5 (estimator) implementer
  • Confidence: High for the AnyVisLoc numbers; ⚠️ for direct transfer to 1 km AGL nadir regime (magnitudes likely smaller at our altitude/pitch; direction is conservative)
  • Related Dimension: SQ2 (sensor-prior contract), C1 (VIO output contract), C5 (estimator), C6 (IMU)
  • Fit Impact: architectural contract for solution_draft01: the C1 module's published contract to the C2/C3 stack is yaw σ ≤ 5° AND pitch σ ≤ 5°. Magnetometer-only yaw is insufficient by the SPRIN-D lesson (Fact #15) — VIO must contribute. Adds a constraint that flows back to the C6 IMU integration: IMU mechanical isolation per SPRIN-D Fact #15 is required; magnetometer + GPS-yaw startup alignment at the airbase (before take-off, while real GPS is healthy) is part of the boot sequence.
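
A minimal sketch of how the attitude contract and the quoted yaw sensitivity could be checked at runtime. The yaw table reproduces the Source #41 figures from the statement above; the linear interpolation between measured points, the clamping beyond 60°, and the function names are assumptions for illustration only.

```python
# (yaw sigma in degrees, A@5m drop in %) as quoted from Source #41 above.
YAW_DROP_TABLE = [(5.0, 0.0), (10.0, 1.9), (30.0, 4.1), (50.0, 13.7), (60.0, 25.7)]

YAW_SIGMA_LIMIT_DEG = 5.0    # contract from Fact #24 (1 sigma)
PITCH_SIGMA_LIMIT_DEG = 5.0  # contract from Fact #24 (1 sigma)


def expected_yaw_drop(sigma_deg):
    """Estimate the A@5m drop for a yaw sigma by interpolating the table (assumed linear)."""
    if sigma_deg <= YAW_DROP_TABLE[0][0]:
        return 0.0
    for (s0, d0), (s1, d1) in zip(YAW_DROP_TABLE, YAW_DROP_TABLE[1:]):
        if sigma_deg <= s1:
            return d0 + (d1 - d0) * (sigma_deg - s0) / (s1 - s0)
    return YAW_DROP_TABLE[-1][1]  # beyond the last measured point: clamp


def meets_attitude_contract(yaw_sigma_deg, pitch_sigma_deg):
    """True when both priors satisfy the sigma <= 5 degree contract."""
    return yaw_sigma_deg <= YAW_SIGMA_LIMIT_DEG and pitch_sigma_deg <= PITCH_SIGMA_LIMIT_DEG


print(meets_attitude_contract(4.0, 3.0))
print(meets_attitude_contract(6.0, 3.0))
print(expected_yaw_drop(20.0))
```

A check of this kind would sit at the C1 output boundary, so that a violated prior is flagged before the matcher is invoked.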

Fact #25 — Top-N re-ranking by inlier count is the dominant accuracy/cost trade-off; pure-matching-without-retrieval is catastrophic (A@5m collapses from 62.2% to 34.3% with the same matcher)

  • Statement: Source #41 AnyVisLoc and Source #38 Skoltech survey both quantify the value of retrieval as a search-space reducer for matching. Source #41 explicitly: "Top-N re-rank by inlier count is the best accuracy/cost trade-off" → 62.2% A@5m at 0.8 s/frame on RTX 3090. Without retrieval (pure exhaustive matching against the cache): 34.3% A@5m, i.e., almost half the accuracy at infeasible compute. Source #38 measures sparse-VPR re-ranking specifically: AnyLoc descriptor + SuperGlue re-rank on top-100 candidates = 15–25 s/frame on RTX 3090 (catastrophic for our 400 ms budget); LightGlue re-rank ≈ 1 s/frame (still over budget); SelaVPR re-rank < 0.1 s/frame (in-budget on RTX 3090, must be re-tested on Jetson Orin Nano). Re-ranking budget = (frame budget) − (descriptor extraction) − (initial top-N retrieval) − (matcher pose estimation) − (AdHoP if included).
  • Source: Source #38, Source #41
  • Phase: Phase 2
  • Target Audience: System architects + C2 implementer
  • Confidence: High (two-source convergence on the qualitative claim; quantitative numbers are RTX-3090-specific and must be Jetson-MVE'd)
  • Related Dimension: SQ2 (pipeline structure), C2 (VPR), C3 (matcher), SQ3+SQ4 (Jetson MVE)
  • Fit Impact: mandates Top-N re-rank by inlier count as a stage in solution_draft01. The choice of N (typical N = 5–20 in the literature) is a trade-off deferred to the SQ3+SQ4 candidate matrix, not SQ2.
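
As a rough sketch of the re-rank stage and its latency arithmetic: the function below re-orders the top-N retrieval candidates by inlier count and computes the remaining re-rank budget from the terms listed in the statement. The function names, the tile ids, the inlier counts and the millisecond figures are hypothetical; the inlier counter stands in for whatever matcher + RANSAC step is eventually selected.

```python
def rerank_by_inliers(candidates, count_inliers, top_n):
    """Re-rank the top-N retrieval candidates by geometric-verification inlier count.

    candidates: tile ids sorted by retrieval score, best first.
    count_inliers: callable tile_id -> int, supplied by the caller.
    """
    shortlist = candidates[:top_n]
    scored = [(count_inliers(tile), tile) for tile in shortlist]
    # Stable sort: tiles with equal inlier counts keep their retrieval order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [tile for _, tile in scored]


def rerank_budget_ms(frame_ms, descriptor_ms, retrieval_ms, pose_ms, adhop_ms=0.0):
    """Time left for re-ranking after the other per-frame stages are subtracted."""
    return frame_ms - descriptor_ms - retrieval_ms - pose_ms - adhop_ms


# Hypothetical usage with made-up inlier counts and stage latencies.
inliers = {"t1": 12, "t2": 85, "t3": 40, "t4": 999}
print(rerank_by_inliers(["t1", "t2", "t3", "t4"], inliers.get, top_n=3))
print(rerank_budget_ms(frame_ms=400, descriptor_ms=60, retrieval_ms=20, pose_ms=120))
```

Note that `t4` never gets verified in the usage example because it falls outside the shortlist, which is exactly the accuracy/cost trade-off that the choice of N controls.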

Fact #26 — High-accuracy SOTA models (AnyLoc + SuperGlue + RoMa-class) are NOT viable on Jetson Orin Nano under the 400 ms p95 budget; lightweight VPR (MixVPR / SALAD / SelaVPR-class) + lightweight matchers (LightGlue / XFeat-class) are the only candidates that survive a basic latency pre-screen

  • Statement: Two independent runtime measurements on RTX 3090 (≥10× faster than Jetson Orin Nano in dense matrix ops). Source #38 reports AnyLoc descriptor calculation at 0.37–0.84 s/frame (huge ViT-G DINOv2); SuperGlue re-rank at 15–25 s/frame on top-100; LightGlue re-rank at ~1 s/frame; SelaVPR re-rank at < 0.1 s/frame. Source #41 reports RoMa dense matcher at 659 ms/frame; SP+LightGlue+GIM sparse at 105 ms/frame; ratio = 6.3×. Memory: AnyLoc descriptors = 2.3–13.9 GB for 47k tiles (at the upper end this exceeds the 8 GB Jetson Orin Nano envelope even before model weights); SelaVPR descriptors < 0.2 GB. Pre-screen conclusion: AnyLoc / SuperGlue / RoMa-class are disqualified on the Jetson Orin Nano at 3 fps unless heavy quantization (INT8) reduces them ≥10×, which is not yet established for our latency target on this hardware. Surviving candidates from the literature: VPR: MixVPR, SALAD, SelaVPR, EigenPlaces, NetVLAD-class; matchers: LightGlue, XFeat, XFeat*, SP+LightGlue. Disqualification is preliminary; final go/no-go happens at SQ3+SQ4 with on-Jetson MVE per references/mode-A-mve-rules.md.
  • Source: Source #38, Source #41
  • Phase: Phase 2
  • Target Audience: C2 + C3 implementer; SQ3+SQ4 candidate-matrix author
  • Confidence: High for RTX-3090 numbers; ⚠️ for direct Jetson translation (Jetson Orin Nano AI score is well-published; ratio is conservative)
  • Related Dimension: SQ2 (Jetson budget feasibility), SQ3+SQ4 (candidate pre-screen), SQ5 (foundation-model-on-edge failure mode), C2, C3, C7 (Jetson runtime)
  • Fit Impact: prunes the SQ3+SQ4 candidate matrix BEFORE expensive Jetson MVE. Candidates entering SQ3+SQ4 with mandatory Jetson MVE: (C2 VPR) MixVPR, SALAD, SelaVPR, EigenPlaces, NetVLAD; (C3 matcher) LightGlue, XFeat, XFeat*, SP+LightGlue. Candidates that need Jetson INT8 quant before they earn an MVE slot: AnyLoc, BoQ, DINOv2-VLAD (must demonstrate INT8 build path with vendor-validated accuracy preservation). Candidates pruned outright: RoMa dense, SuperGlue, MASt3R (latency).
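
The memory side of the pre-screen is simple arithmetic and can be sketched as follows. The 47k tile count and the 8 GB envelope come from the statement above; the descriptor dimension and byte width in the usage lines are assumptions chosen for illustration, as is the 4 GB reserved for model weights and the rest of the stack.

```python
def descriptor_store_gb(n_tiles, dim, bytes_per_value):
    """Size of a flat per-tile descriptor table in decimal gigabytes."""
    return n_tiles * dim * bytes_per_value / 1e9


def fits_memory(store_gb, envelope_gb=8.0, reserved_gb=0.0):
    """True when the descriptor table fits in the envelope minus reserved memory."""
    return store_gb <= envelope_gb - reserved_gb


# Hypothetical usage: a 49,152-dimensional descriptor at 1 byte per value (assumed).
size_gb = descriptor_store_gb(n_tiles=47_000, dim=49_152, bytes_per_value=1)
print(round(size_gb, 2))
print(fits_memory(size_gb, reserved_gb=4.0))
print(fits_memory(13.9))  # upper end of the range quoted above
```

Running the same arithmetic per candidate before any Jetson MVE slot is booked keeps the memory disqualifier cheap to evaluate.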

Fact #27 — A 20% covisibility floor between query frame and reference tile is required for localization to succeed; below it, ALL methods fail regardless of matcher quality

  • Statement: Source #40 OrthoLoC: "When the covisibility between the UAV image and the orthographic geodata is too small (less than ~20%), the localization fails for all methods regardless of matcher quality." This is a geometric floor, not a method-specific limit. The implication for the project: any tile-cache design that allows a query to fall outside 20% covisibility with the best available cached tile must also include a runtime covisibility-check + graceful degrade to visual_propagated mode (per AC-1.4 source label). This is a runtime condition, not a one-time setup parameter.
  • Source: Source #40
  • Phase: Phase 2
  • Target Audience: C2 (cache scheduler) + C5 (estimator) + AC-1.4 owner
  • Confidence:
  • Related Dimension: SQ2 (boundary condition), C2 (tile cache), C5 (estimator state machine), AC-1.4
  • Fit Impact: adds a runtime invariant to solution_draft01: tile selection must guarantee ≥20% covisibility OR explicitly emit the visual_propagated source label per AC-1.4 with covariance widened per AC-NEW-4. This becomes a hard constraint on the C2 cache schema (must support tile-extent metadata) and a runtime check before invoking C3 matcher.
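
A minimal sketch of the runtime covisibility gate, assuming axis-aligned footprints in a shared metric frame and defining covisibility as the fraction of the query footprint covered by a tile. Both are simplifying assumptions (the OrthoLoC definition may differ), and the function names are hypothetical; the source labels are the ones named in the AC-1.4 contract.

```python
COVISIBILITY_FLOOR = 0.20  # geometric floor reported by Source #40


def covisibility(query_box, tile_box):
    """Fraction of the query footprint covered by the tile.

    Boxes are (xmin, ymin, xmax, ymax) in the same metric frame.
    """
    qx0, qy0, qx1, qy1 = query_box
    tx0, ty0, tx1, ty1 = tile_box
    w = max(0.0, min(qx1, tx1) - max(qx0, tx0))
    h = max(0.0, min(qy1, ty1) - max(qy0, ty0))
    query_area = (qx1 - qx0) * (qy1 - qy0)
    return (w * h) / query_area if query_area > 0 else 0.0


def select_source_label(query_box, tile_boxes):
    """Pick the source label from the best covisibility among cached tiles."""
    best = max((covisibility(query_box, t) for t in tile_boxes), default=0.0)
    if best >= COVISIBILITY_FLOOR:
        return "satellite_anchored", best
    return "visual_propagated", best


# Hypothetical usage: a 100 m x 100 m query footprint against two cached tiles.
query = (0.0, 0.0, 100.0, 100.0)
print(select_source_label(query, [(50.0, 0.0, 150.0, 100.0)]))
print(select_source_label(query, [(90.0, 0.0, 190.0, 100.0)]))
```

The second call falls below the floor, so the matcher would be skipped and the covariance widened per AC-NEW-4.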

SQ2 — Conclusions (working summary, will be re-checked at Step 7.5)

Pipeline-component coverage table (existing C1–C10 vs. survey-listed components)

| Survey/benchmark canonical stage | Project component (current) | Coverage status | Required action |
| --- | --- | --- | --- |
| Image retrieval (global VPR) | C2: Visual Place Recognition | covered | No change |
| Re-ranking (top-N inlier-based) | (currently implicit, inside C2 or C3) | ⚠️ implicit | Promote to explicit sub-stage (C2.5 or C3.0) in solution_draft01 |
| Local image matching (2D-2D, sparse or dense) | C3: Cross-domain registration | covered | Add Top-N re-rank-by-inlier-count requirement |
| AdHoP-style perspective preconditioning | (not represented) | missing | Add as optional sub-stage between C3 and C4, gated on Jetson latency budget |
| 2D-3D lift via DSM | (not represented; current cache is 2D ortho only) | architectural decision required | Decision required from user (see below) |
| Pose estimation (PnP + RANSAC + LM) | C4: Pose estimation | covered | No change |
| State estimator / fusion (UKF / ESKF / MSCKF / factor graph) | C5: Estimator / fusion | covered | Augmented with covariance-honesty contract from AC-NEW-4 |
| IMU + VIO contract | C1: VO/VIO + C6: IMU integration | covered | Add yaw σ ≤ 5°, pitch σ ≤ 5° hard contract from Fact #24 |
| Tile cache + scheduler | C2: VPR tile cache + C9: Cache hygiene | covered | Add 20% covisibility runtime invariant (Fact #27) |
| Anti-spoof / source-switch | C7: Spoof detection + C8: FC adapter | covered | Already addressed in SQ6 |
| Health monitoring / safety | C10: Safety / health monitoring | covered | Already addressed |

Architectural decisions surfaced (require user resolution before SQ3+SQ4 starts)

  1. DSM dependency on the Suite Sat Service tile cache (per Fact #23). Three options:

    • (a) 3-DoF acceptance — accept that without DSM, only position is recovered from matching; attitude is fixed by IMU/VIO with no satellite-tile cross-check. Lowest project scope. Requires AC budget verification (likely passes AC-1.1.1).
    • (b) Request DSM tiles — Suite Sat Service contract change. Highest accuracy. Adds ~1 cycle to delivery. Recommended if 6-DoF accuracy ever becomes a hard AC.
    • (c) IMU/VIO-attitude + 2D-2D matching translation — operationally identical to (a) but contracts the IMU/VIO module explicitly with σ ≤ 5° yaw / pitch (Fact #24).
    • Recommended default: (c) — explicit IMU/VIO contract; fall back to (b) if AC tightens.
  2. AdHoP refinement loop (per Fact #22). Three options:

    • (a) Always-on — included in every frame; Jetson budget must accommodate 2× matching latency.
    • (b) Conditional — only when initial reprojection error exceeds a threshold; gated on per-frame budget.
    • (c) Off (initial release) — relegate to offline-replay refinement.
    • Recommended default: (b) Conditional — fits within latency variance budget while capturing the cross-domain accuracy gain.
  3. Top-N re-rank promotion to explicit pipeline sub-stage (per Fact #25). Recommendation: promote to a named sub-stage in solution_draft01 with N as an SQ3+SQ4 hyperparameter sweep target.

Component-pruning carried into SQ3+SQ4

  • C2 candidates entering SQ3+SQ4 with mandatory Jetson MVE: MixVPR, SALAD, SelaVPR, EigenPlaces, NetVLAD.
  • C2 candidates entering SQ3+SQ4 conditional on INT8 quantization path: AnyLoc, BoQ, DINOv2-VLAD.
  • C2 candidates pruned: SuperGlue-as-reranker (latency).
  • C3 candidates entering SQ3+SQ4 with mandatory Jetson MVE: LightGlue, XFeat, XFeat*, SP+LightGlue (NGPS template).
  • C3 candidates pruned: RoMa, MASt3R, DKM (dense matcher latency on Jetson).
  • C3 candidates as "AerialExtreMatch reference points" only, NOT for production: GIM+DKM, GIM+LightGlue (per Source #40, used as accuracy benchmark only).

Boundary check: SQ2 is saturated

Saturation signals observed: (a) five independent surveys/benchmarks (Skoltech aerial-VPR survey, U.Maine cross-view survey, OrthoLoC benchmark, AnyVisLoc benchmark, NUDT 2026 absolute-VL survey) converge on the same "retrieval → matching → pose-estimation hierarchical framework" as canonical; (b) two independent runtime sources (Skoltech survey on RTX 3090; AnyVisLoc on RTX 3090 with explicit dense-vs-sparse breakdown) agree on the relative cost ordering of model classes; (c) AdHoP value rests on Source #40 only, but with reproducible code and dataset (single-source-but-strong evidence); (d) cross-source agreement on covisibility / sensor-prior thresholds. Two outstanding decisions are flagged for user; neither blocks SQ2's saturation status, both block SQ3+SQ4 start. Per references/source-tiering.md "Search saturation rule" → SQ2 is closed pending user decisions on DSM dependency + AdHoP gating.


SQ3+SQ4 / C1 — Visual / Visual-Inertial Odometry candidate enumeration

Project's pinned mode for every C1 candidate (binding): monocular ADTi 20MP nav camera @ 3 fps + IMU from FC over MAVLink @ ≥100 Hz, on Jetson Orin Nano Super (JetPack/CUDA/TensorRT, 8 GB shared LPDDR5, 25 W TDP), producing relative 6-DoF metric pose between consecutive frames + per-axis covariance, with attitude (yaw + pitch) hard-contract σ ≤ 5° at 1 σ (Fact #24), output cadence ≥3 Hz, no in-flight network, license compatible with onboard-binary distribution to a dual-use customer.

Per the engine's "Per-Mode API Capability Verification" rule, any candidate marked Selected requires a context7 lookup (mode enum + project's exact mode runnable example + disqualifier probe) AND a per-numbered-Restriction × per-numbered-AC sub-matrix. This session covers candidate enumeration + preliminary applicability assessment only; context7 verification and the structured sub-matrix are deferred to the next session per the autodev context budget heuristic.

Fact #28 — VINS-Mono is a canonical monocular-only sliding-window VIO with a working Jetson-Nano deployment record but no GitHub release and ~24-month-old master branch

  • Statement: VINS-Mono is the canonical mono+IMU sliding-window VIO from HKUST-Aerial-Robotics (Qin, Li, Shen — IEEE T-RO 2018). Features: efficient IMU pre-integration, automatic initialization, online camera-IMU spatial + temporal calibration, failure detection + recovery, DBoW2 loop detection, global pose-graph optimization. Output: metric-scale 6-DoF pose at IMU rate. Repository state: master-branch only (no tagged releases), 5,829 stars; last meaningful master-branch commit 2024-02-25 with a 2024-05-23 simulation-data commit. Jetson record: a 2021 IEICE paper (zinuok / KAIST) demonstrated VINS-Mono real-time on the original Jetson Nano (much weaker than Orin Nano Super) for MAV state estimation; a 2024 arXiv paper (2406.13345) showed an enhanced VINS-Mono variant achieving 50 FPS on a Raspberry Pi CM4 with on-sensor accelerated optical flow. License: GPL-3.0 (copyleft viral) — distribution of the onboard binary requires source disclosure for the entire linked binary and triggers GPL-3 anti-tivoization clauses for embedded firmware.
  • Source: Source #43 (canonical), Source #46 (KAIST Jetson benchmark), Source #43-linked LICENCE for license confirmation
  • Phase: Phase 2
  • Target Audience: System architects + C1 implementer
  • Confidence: High for algorithm class, mode support, and Jetson Nano feasibility; ⚠️ for Jetson Orin Nano Super specific latency (no direct measurement, but Orin Nano Super >> Jetson Nano, so feasibility is virtually certain); ⚠️ for the maintenance-status risk implied by the ~24-month-old master branch.
  • Related Dimension: SQ3+SQ4 / C1 Established-production candidate
  • Fit Impact: carry as lead candidate, conditional on user license decision. Algorithmic fit is excellent (canonical mono+IMU VIO with metric scale and covariance); maintenance status is borderline; GPL-3.0 license is a project-level decision required from the user before this candidate can be marked Selected — see "C1 Open Decisions" section below.

Fact #29 — VINS-Fusion is a multi-sensor superset of VINS-Mono but its monocular+IMU mode failed to run on Jetson TX2 in a 2021 KAIST benchmark; Orin Nano Super feasibility unverified

  • Statement: VINS-Fusion (Qin, Cao, Pan, Shen; extension of VINS-Mono) supports four documented sensor configurations: stereo+IMU, mono+IMU, stereo only, +GPS-fusion (toy example). KITTI Odometry top-ranked open-source stereo algorithm as of January 2019. Repository state: 4,476 stars; last update 2024-05-23; same master-branch-only convention. Jetson record: KAIST 2021 benchmark (Source #46): on Jetson TX2, both VINS-Fusion (CPU) and VINS-Fusion-imu fail to run due to insufficient memory and CPU; VINS-Fusion-gpu (GPU-accelerated front-end) runs on TX2. Orin Nano Super has the same memory capacity as TX2 (8 GB shared LPDDR5 vs TX2's 8 GB shared LPDDR4) with higher bandwidth and a stronger CPU/GPU, but the project's onboard stack is co-resident with C2 VPR + C3 matcher + C5 estimator + C6 cache, so memory pressure on the VINS-Fusion-imu path is plausible. License: GPL-3.0, same dual-use distribution constraint as VINS-Mono.
  • Source: Source #44 (canonical), Source #46 (KAIST Jetson benchmark)
  • Phase: Phase 2
  • Target Audience: System architects + C1 implementer
  • Confidence: High for the multi-sensor mode support and KITTI ranking; High for the 2021 TX2 failure-to-run finding; ⚠️ for Orin Nano Super viability (between TX2 and Xavier NX in CPU/memory; not yet measured).
  • Related Dimension: SQ3+SQ4 / C1 Open-source candidate
  • Fit Impact: carry as alternate candidate, with mandatory Jetson Orin Nano Super MVE before promotion. VINS-Mono's narrower scope (mono+IMU only, no stereo overhead) makes VINS-Mono the preferred lead within the HKUST-Aerial-Robotics family; VINS-Fusion's multi-sensor coverage is a distractor for our pinned mode. GPL-3.0 license decision is the same as VINS-Mono — see "C1 Open Decisions".

Fact #30 — OpenVINS is the most actively maintained MSCKF-class VIO and runs on Jetson Orin Nano Dev Kit + JetPack 6 + ROS 2 Humble with documented build adjustments; latency 270 ms on Xavier NX needs Orin-Nano-Super MVE

  • Statement: OpenVINS (rpng, U. Delaware; Geneva, Eckenhoff, Lee, Yang, Huang; ICRA 2020) is a modular MSCKF (Multi-State Constraint Kalman Filter) implementation that fuses IMU state with sparse visual feature tracks via the Mourikis-Roumeliotis 2007 sliding-window MSCKF. Mode support: monocular, stereo, multi-camera (1 to N) + IMU; mono+IMU is a documented first-class configuration. Supports SLAM features (in-state landmarks) plus pure MSCKF features. Jetson Orin Nano evidence: rpng/open_vins issue #421 (Genozen, Feb 2024, closed) confirms OpenVINS ROS 2 builds on Jetson Orin Nano Dev Kit + JetPack 6 + Ubuntu 22.04 + ROS 2 Humble after one build patch (#include <opencv2/aruco.hpp> with newer OpenCV); fdcl-gwu/openvins_jetson_realsense (Nov 2025) provides a complete setup guide for Jetson Orin Nano + Intel RealSense + librealsense compiled-from-source + --parallel-workers 1 build to avoid memory issues. Latency record: rpng/open_vins issue #164 reports ~270 ms latency on Jetson Xavier NX (4 cores, 40% CPU utilisation). Recommended optimisations: subscriber queue size 1, Release builds with ARM-specific optimization flags (e.g., armv8.2-a), reduced camera resolution, prefer odometry topic over pose_imu. License: GPL-3.0, same dual-use distribution constraint as VINS-Mono / VINS-Fusion. Stars 2,828; 30 contributors; 12 releases; latest tag v2.7 (June 2023) but master branch active through 2024-2025 issue threads.
  • Source: Source #45 (canonical + LICENSE + docs.openvins.com), Source #46 (KAIST Jetson benchmark for class-level CPU/memory profile), agent-tools record 29ebf728...txt (Jetson Orin Nano build evidence)
  • Phase: Phase 2
  • Target Audience: System architects + C1 implementer
  • Confidence: High for mode support, MSCKF formulation, and Jetson Orin Nano build feasibility; ⚠️ for steady-state latency on Orin Nano Super under our 5472×3648 nav frames: the KAIST benchmark used 640×480, so the ~65× pixel count (about 8× per axis) is a yellow flag unless frames are downscaled before tracking.
  • Related Dimension: SQ3+SQ4 / C1 Established-production candidate
  • Fit Impact: carry as lead candidate, conditional on user license decision. OpenVINS has the most documented Jetson-Orin-Nano build path of the three GPL-3.0 candidates; MSCKF formulation is more memory-efficient than VINS-Mono's full sliding-window optimisation, which is a meaningful advantage under co-resident-process memory pressure. GPL-3.0 license decision is the same as VINS-Mono / VINS-Fusion.
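
The resolution gap flagged in the confidence note can be checked with a few lines of arithmetic. This is only a quick check of the pixel-count ratio between the two resolutions quoted above; the function names are illustrative.

```python
import math

NAV_FRAME = (5472, 3648)     # project nav camera resolution quoted above
BENCH_FRAME = (640, 480)     # benchmark resolution quoted above


def pixel_ratio(frame_a, frame_b):
    """Ratio of total pixel counts between two (width, height) frames."""
    return (frame_a[0] * frame_a[1]) / (frame_b[0] * frame_b[1])


def linear_downscale_to_match(frame_a, frame_b):
    """Per-axis downscale factor that would equalise the pixel counts."""
    return math.sqrt(pixel_ratio(frame_a, frame_b))


ratio = pixel_ratio(NAV_FRAME, BENCH_FRAME)
scale = linear_downscale_to_match(NAV_FRAME, BENCH_FRAME)
print(f"pixel ratio ~{ratio:.1f}x, per-axis downscale ~{scale:.1f}x")
```

Any latency figure measured at the benchmark resolution therefore says little about full-resolution frames; an MVE would need to state the resolution actually fed to the tracker.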

Fact #31 — OKVIS2 is the most actively maintained VI-SLAM in the BSD-permissive license bucket; OKVIS2-X (T-RO 2025) extends it with optional GNSS fusion that is architecturally aligned with the project's spoof-promotion path

  • Statement: OKVIS2 (Leutenegger; arXiv 2022, ETH/Imperial/TUM Smart Robotics Lab) is a factor-graph VI-SLAM with bounded-size optimization. Algorithmic novelty: pose-graph edges from marginalised observations are "seamlessly turned back into observations" upon loop closure, reviving old landmarks and reprojection errors. Includes lightweight CNN segmentation for dynamic-region removal. Mode support: monocular and multi-camera + IMU; mono+IMU is a documented first-class configuration. Successor OKVIS2-X (Boche, Jung, Laina, Leutenegger; IEEE T-RO 2025 vol 41 pp 6064-6083, DOI 10.1109/TRO.2025.3619051; arXiv 2510.04612, Oct 2025) generalises the core to fuse multi-camera + IMU + optional GNSS receiver + LiDAR or depth. The OKVIS2-X GNSS-fusion mode (lineage: Visual-Inertial SLAM with Tightly-Coupled Dropout-Tolerant GPS Fusion, IROS 2022) directly mirrors the project's "VIO that may opportunistically fuse a non-spoofed GPS update when promotion completes" pattern (AC-NEW-2). Repository state: ethz-mrl/OKVIS2-X created 2025-09-23, last push 2026-03-17, 295 stars, 2 active contributors (bochsim, SebsBarbas). License: 3-clause BSD on the LICENSE file (GitHub UI shows "Other (NOASSERTION)" but the file is canonical 3-clause BSD per ASL-ETH Zurich convention): permissive, no dual-use distribution friction.
  • Source: Source #47 (OKVIS2 canonical), Source #48 (OKVIS2-X T-RO 2025)
  • Phase: Phase 2
  • Target Audience: System architects + C1 / C5 implementer
  • Confidence: High for algorithm, mode support, license, T-RO 2025 publication, repository activity; ⚠️ for Jetson Orin Nano runtime: no direct Jetson Orin Nano benchmark located; OKVIS2's factor-graph backend is plausibly heavier than OpenVINS' MSCKF on memory but lighter than Kimera (Kimera also produces a 3D mesh + semantic mesher, OKVIS2 does not).
  • Related Dimension: SQ3+SQ4 / C1 Open-source-permissive lead candidate; potential C1+C5+C8 unified factor-graph design
  • Fit Impact: strong lead candidate by license + maintenance + GNSS-fusion alignment. If license permissiveness is a priority, OKVIS2 + OKVIS2-X is the natural choice. The OKVIS2-X factor-graph also opens a design path where C5 (state estimator) collapses INTO C1 (the same factor graph absorbs sat-anchor measurements as constraints) — would simplify the pipeline at the cost of departing from the C1/C5 split, which is a Step-7.5 / solution_draft01 design decision, not a SQ3+SQ4 question. Pending Jetson Orin Nano Super MVE.

Fact #32 — Kimera-VIO is BSD-permissive but resource-heavy; KAIST benchmark found Kimera had the highest memory usage among VIOs tested and failed Xavier-NX-class memory under multi-process load

  • Statement: Kimera-VIO (MIT-SPARK; Rosinol, Abate, Chang, Carlone; ICRA 2020) is a VI-SLAM pipeline with frontend + backend (factor-graph optimization in GTSAM, using iSAM2) + 3D mesher + pose-graph optimizer. Mode support: stereo+IMU primary, mono+IMU optional but documented. License: BSD 2-Clause "Simplified" (LICENSE.BSD on the repo), permissive. Maintenance: active issue/PR threads through Dec 2024 / Feb 2025 covering ROS 2 integration, mono-inertial discussion, dependency management. Resource profile (Source #46 KAIST 2021 benchmark): Kimera had the highest memory usage among the 9 algorithms tested (numerous computations per keyframe); Kimera failed to fit on Xavier NX-class memory under sustained multi-process load. The 3D mesh + semantic-label outputs are unused by the project's narrow C1 mandate (relative 6-DoF + covariance only), so Kimera's overhead is unjustified vs OKVIS2 / OpenVINS for our use case.
  • Source: Source #49 (Kimera canonical + LICENSE.BSD), Source #46 (KAIST Jetson benchmark)
  • Phase: Phase 2
  • Target Audience: System architects (build-vs-buy, mesh-feature decision)
  • Confidence: High for algorithm, license, maintenance status; High for the Source #46 finding (KAIST 2021); ⚠️ for whether Orin Nano Super's Ampere GPU and higher per-core throughput lift Kimera into feasibility: the Source #46 failure was on Xavier NX 8 GB shared, the same memory budget as Orin Nano Super, so the memory constraint itself is unchanged.
  • Related Dimension: SQ3+SQ4 / C1 Open-source-permissive secondary candidate
  • Fit Impact: carry as fallback only, not lead. Kimera's permissive license is attractive but its resource overhead (especially the unused 3D mesh + semantic mesher) is a poor fit under co-resident process pressure. Use as a conservative secondary fallback if OKVIS2 unexpectedly fails Jetson MVE. Status: not lead.

Fact #33 — DROID-SLAM is disqualified by AC-4.2: ≥11 GB GPU VRAM inference budget exceeds the project's 8 GB shared LPDDR5; further, DROID-SLAM is monocular VO/SLAM without IMU fusion and would require an external metric-scale wrapper

  • Statement: DROID-SLAM (princeton-vl, Teed & Deng; NeurIPS 2021; arXiv 2108.10869) requires ≥11 GB GPU memory to run inference per the official README; training requires ≥24 GB on 4× RTX 3090. Issue #121 confirms that even with 128 GB system RAM and 16 GB VRAM (RTX 4080), users hit very large RAM consumption quickly. Algorithmically, DROID-SLAM is monocular VO/SLAM with recurrent dense bundle adjustment over a complete history of camera poses: no native IMU fusion; output pose is in arbitrary scale (no metric scale recovery without external alignment). DPV-SLAM (ECCV 2024, princeton-vl) is the lighter successor at ~4-5 GB GPU memory; DPVO (NeurIPS 2023, princeton-vl) is even lighter at ~3 GB, but neither natively integrates IMU.
  • Source: Source #50 (DROID-SLAM canonical), Source #51 (DPVO / DPV-SLAM successor), Source #52 (DPVO-QAT++ memory measurement)
  • Phase: Phase 2
  • Target Audience: System architects + C1 implementer
  • Confidence:
  • Related Dimension: SQ3+SQ4 / C1 disqualified candidate
  • Fit Impact: DISQUALIFIED outright. AC-4.2 sets the 8 GB shared CPU+GPU memory budget; DROID-SLAM's ≥11 GB GPU-only requirement violates it before adding co-resident C2/C3/C5/C6 processes. Cite as "what the project cannot afford" in solution_draft01 to pre-empt obvious questions.
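
The disqualification above is a plain budget comparison, sketched below. The `reserved_gb` parameter is a hypothetical allowance for co-resident processes (C2/C3/C5/C6); no such figure has been measured yet, so it defaults to zero and is shown only to illustrate how the margin shrinks.

```python
from dataclasses import dataclass

MEMORY_BUDGET_GB = 8.0   # AC-4.2: shared CPU+GPU memory on the target


@dataclass(frozen=True)
class Footprint:
    name: str
    gpu_mem_gb: float    # figure quoted in the fact cards


FOOTPRINTS = [
    Footprint("DROID-SLAM", 11.0),  # README lower bound (">= 11 GB")
    Footprint("DPVO", 3.0),         # approximate figure ("~3 GB")
]


def fits_budget(footprint, reserved_gb=0.0, budget_gb=MEMORY_BUDGET_GB):
    """True when the candidate plus reserved co-resident memory fits the budget."""
    return footprint.gpu_mem_gb + reserved_gb <= budget_gb


for fp in FOOTPRINTS:
    print(fp.name, "fits" if fits_budget(fp) else "exceeds budget")
```

A candidate that fits with `reserved_gb=0.0` is not thereby cleared; the check only shows that DROID-SLAM fails before any co-resident load is counted.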

Fact #34 — DPVO is monocular VO only (no IMU fusion); it can fit a Jetson-suitable memory footprint with QAT but cannot satisfy the C1 VIO mandate alone — would need an external IMU + metric-scale wrapper

  • Statement: DPVO (Teed, Lipson, Deng; NeurIPS 2023; ECCV 2024 DPV-SLAM successor) is a deep-learning monocular VO with sparse patch tracking + differentiable bundle adjustment. Mode: monocular VO only; no IMU fusion in the published paper or repository; output pose is in arbitrary scale. Memory footprint: DPVO ~3 GB GPU, DPV-SLAM ~4-5 GB GPU on standard hardware; DPVO-QAT++ (arXiv 2511.12653, Cheng Liao, Nov 2025) reduces peak reserved memory to 1.02 GB on RTX 4060 (8 GB) via fused-CUDA INT8 fake-quantization while preserving ATE on TartanAir/EuRoC. License: MIT (permissive). Repository: 989 stars; last update 2024-10-12. Crucial gap: DPVO does NOT meet the C1 mandate of a "VIO that produces metric-scale 6-DoF + attitude with σ ≤ 5°". For the project to use DPVO as the VO half of C1, an additional IMU+scale-fusion module (loosely-coupled ESKF with VO velocity / displacement priors) must be designed; alternatively, DPVO's pose can feed C5 directly as a relative-displacement constraint, with attitude served separately by FC IMU integration. Jetson Orin Nano runtime evidence: indirect; DPVO-QAT++ benchmarks on an RTX 4060 desktop, NOT Jetson Orin Nano. The two GPUs also differ in architecture (the RTX 4060 is Ada Lovelace, the Orin Nano Super is Ampere) and the Orin Nano Super's GPU is much smaller, so direct extrapolation is not safe; Jetson MVE required.
  • Source: Source #51 (DPVO / DPV-SLAM canonical), Source #52 (DPVO-QAT++ Nov 2025)
  • Phase: Phase 2
  • Target Audience: System architects + C1 / C5 implementer
  • Confidence: High for "VO only, no IMU fusion" and the memory footprints; ⚠️ for Jetson Orin Nano direct runtime (no measurement); ⚠️ for the operational complexity of the QAT pipeline (teacher-student distillation training is a significant prerequisite vs the classical VINS-* / OpenVINS / OKVIS2 candidates).
  • Related Dimension: SQ3+SQ4 / C1 conditional candidate (VO not VIO; needs external IMU wrapper)
  • Fit Impact: NOT a drop-in C1 candidate; conditional fit only. DPVO is not a substitute for VINS-Mono / OpenVINS / OKVIS2 — it is a candidate for the VO half of a hybrid design where C5 (estimator) absorbs IMU and DPVO provides relative-pose priors. This adds design complexity and is not preferred unless one of the established VIO candidates fails Jetson MVE for memory reasons. Status: secondary, conditional.
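
One simple way an external wrapper might recover metric scale for a scale-free VO is a least-squares fit between VO displacements and IMU-derived displacements over the same frame intervals. The sketch below is an assumption-laden illustration, not the loosely-coupled ESKF mentioned above: it presumes both displacement sets are already expressed in a common frame and that IMU drift over each short interval is negligible.

```python
def estimate_scale(vo_steps, imu_steps):
    """Least-squares scale s minimising sum ||s * vo - imu||^2 over paired steps.

    vo_steps:  list of (x, y, z) displacements in arbitrary VO units
    imu_steps: list of (x, y, z) displacements in metres for the same intervals
    """
    if len(vo_steps) != len(imu_steps):
        raise ValueError("displacement lists must be paired")
    numerator = sum(v[i] * m[i] for v, m in zip(vo_steps, imu_steps) for i in range(3))
    denominator = sum(c * c for v in vo_steps for c in v)
    if denominator == 0.0:
        raise ValueError("VO displacements are all zero; scale is unobservable")
    return numerator / denominator


# Toy data: VO steps that are exactly 2.5x too small.
vo = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
imu = [(2.5, 0.0, 0.0), (0.0, 5.0, 0.0)]
scale = estimate_scale(vo, imu)
metric_steps = [tuple(scale * c for c in step) for step in vo]
```

The zero-denominator guard matters in practice: during hover or very slow flight the scale is unobservable, which is one reason a filter-based wrapper is usually preferred over a batch fit.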

Fact #35 — Pure VO baseline (KLT optical flow + 5-point essential matrix or homography RANSAC) is the project's mandatory simple-baseline candidate and is the de-facto fallback when learning-based methods fail on Jetson-budget constraints

  • Statement: The classical pipeline — Shi-Tomasi or FAST corner detection → KLT pyramidal optical flow tracking (cv::calcOpticalFlowPyrLK) → 5-point essential matrix (Nister, cv::findEssentialMat) or homography RANSAC (cv::findHomography) → relative pose with arbitrary scale → metric-scale alignment via IMU integration externally — is the foundational visual-odometry pipeline implemented in OpenCV samples and pedagogical repositories. For the project's nadir-down UAV at 1 km AGL over Ukrainian steppe (predominantly planar terrain, low relief), the homography path is geometrically appropriate (a plane induces a homography between two views); for non-planar relief, the essential-matrix path is appropriate at a small overhead. License: public domain / OpenCV-Apache-2.0 / MIT (whatever reference implementation is chosen) — permissive. Reference: representative public Monocular-Video-Odometery (MIT, alishobeiri 2018), Monocular-Visual-Odometry (Yacynte) at translation error 0.94% / rotation error 0.015°/m on KITTI dataset.
  • Source: Source #53 (OpenCV docs + reference implementations)
  • Phase: Phase 2
  • Target Audience: System architects + C1 implementer + risk reviewer
  • Confidence:
  • Related Dimension: SQ3+SQ4 / C1 Simple-baseline candidate (mandatory per Component Option Breadth rule)
  • Fit Impact: carry as the project's Simple baseline / known-runnable / known-failure-mode C1 fallback. Not a lead, but mandatory presence. Failure modes: (a) low-texture cropland / snow → KLT track loss; (b) sharp turns → low-overlap homography degeneracy; (c) no native IMU fusion → must wrap with external metric-scale alignment (same wrapper as DPVO). Status: simple-baseline reference; cited in solution_draft01 to anchor the failure analysis.
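
The claim that a plane induces a homography between two views can be verified numerically. The sketch below uses plain Python rather than OpenCV so it runs anywhere; the intrinsics (focal length in pixels, principal point at the image centre) and the motion are made-up values for illustration. It uses the convention X2 = R·X1 + t with the ground plane n·X1 = d in the first camera frame, which gives H = K (R + t nᵀ / d) K⁻¹. In a real pipeline H would instead be estimated from tracked features with cv::findHomography.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def yaw_rotation(angle_rad):
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def plane_homography(K, K_inv, R, t, n, d):
    """H = K (R + t n^T / d) K^-1 for X2 = R X1 + t and plane n . X1 = d."""
    M = [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]
    return matmul(matmul(K, M), K_inv)


def project(K, X):
    u, v, w = matvec(K, X)
    return (u / w, v / w)


def warp(H, pixel):
    u, v, w = matvec(H, [pixel[0], pixel[1], 1.0])
    return (u / w, v / w)


# Assumed intrinsics for a 5472x3648 frame (illustrative values only).
F, CX, CY = 4000.0, 2736.0, 1824.0
K = [[F, 0.0, CX], [0.0, F, CY], [0.0, 0.0, 1.0]]
K_INV = [[1.0 / F, 0.0, -CX / F], [0.0, 1.0 / F, -CY / F], [0.0, 0.0, 1.0]]

R = yaw_rotation(math.radians(2.0))   # small heading change between frames
t = [-20.0, 5.0, 0.0]                 # metres, in the second camera frame
n, d = [0.0, 0.0, 1.0], 1000.0        # flat ground 1 km below a nadir camera

H = plane_homography(K, K_INV, R, t, n, d)
X1 = [50.0, -30.0, 1000.0]            # a ground point in the first camera frame
X2 = [r + ti for r, ti in zip(matvec(R, X1), t)]
predicted = warp(H, project(K, X1))
observed = project(K, X2)
```

For points off the plane (relief, buildings) `predicted` and `observed` diverge, which is the case the essential-matrix path is meant to cover.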

Fact #36 — Step-0.5-time-window assessment: VINS-Mono / VINS-Fusion master branches are at the Critical-novelty 18-month boundary; OpenVINS and OKVIS2 are within window; DPVO is borderline; the established baselines (KLT + RANSAC) are exempt

  • Statement: Per Step 0.5 timeliness assessment in 00_question_decomposition.md, Critical-novelty topics require sources within 6 months for SOTA claims and 18 months for established libraries' API behaviour. Audit at access time 2026-05-07: VINS-Mono master last meaningful commit 2024-02-25 → ~26 months → past the 18-month window; VINS-Fusion 2024-05-23 → ~24 months → past the window; OpenVINS master active (issue threads through Feb 2025) and v2.7 release June 2023 → ~35 months for the tagged release but master in stable maintenance → within de-facto window for an established library; OKVIS2-X push 2026-03-17 → ~2 months → fully within window; DPVO last code update 2024-10-12 → ~19 months → just over but DPV-SLAM ECCV 2024 keeps the algorithm class within 6-month claim window; KLT / 5-point / RANSAC / homography → established baselines per Step 0.5 → no time window applies. Implication: VINS-Mono / VINS-Fusion fall into the "older than 18 months but classical authoritative reference" bucket: Step 0.5 allows up to 18 months strictly, but downstream forks (vins-mono-android, embedded variants) and the IEEE T-RO 2018 publication keep the algorithm class in active community use. Recommended treatment: keep as candidates but require live MVE on Jetson Orin Nano Super before promotion to Selected, to revalidate against the current OpenCV / Ceres / ROS 2 stack.
  • Source: Source #43, Source #44, Source #45, Source #47, Source #48, Source #51 (timeliness audit per source)
  • Phase: Phase 2
  • Target Audience: Step-7.5 reviewer + System architects
  • Confidence:
  • Related Dimension: SQ3+SQ4 / C1 candidate-pool integrity
  • Fit Impact: applies a conservative timeliness gate: each of VINS-Mono, VINS-Fusion, and DPVO requires an Orin-Nano-Super MVE before being marked Selected, since master-branch staleness puts them outside the Critical-novelty 18-month window. OpenVINS / OKVIS2 / OKVIS2-X / Kimera are within window via active issue threads or recent releases.
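
The ages in the audit can be reproduced from the dates given above. The sketch below covers only the candidates for which a full date is stated, and converts days to months with a mean month length, which is an assumption of this example rather than a rule from Step 0.5.

```python
from datetime import date

ACCESS_DATE = date(2026, 5, 7)    # audit access time stated above
WINDOW_MONTHS = 18                # established-library window
DAYS_PER_MONTH = 365.25 / 12      # mean month length (assumed convention)

LAST_UPDATE = {
    "VINS-Mono": date(2024, 2, 25),
    "VINS-Fusion": date(2024, 5, 23),
    "DPVO": date(2024, 10, 12),
    "OKVIS2-X": date(2026, 3, 17),
}


def age_months(last_update, today=ACCESS_DATE):
    """Age of the last update in mean-length months."""
    return (today - last_update).days / DAYS_PER_MONTH


def outside_window(last_update, today=ACCESS_DATE, window=WINDOW_MONTHS):
    """True when the last update is older than the timeliness window."""
    return age_months(last_update, today) > window


for name, updated in LAST_UPDATE.items():
    status = "outside" if outside_window(updated) else "within"
    print(f"{name}: {age_months(updated):.1f} months, {status} window")
```

DPVO sits less than one month past the limit, so its classification is sensitive to the access date; the other three are unambiguous.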

C1 Component Applicability Gate — preliminary table (this session; structured Restrictions×AC sub-matrix per candidate is next session's work)

| Candidate | Mode (project) | License | Active maintenance? | Jetson Orin Nano Super runnable? | Native IMU fusion? | Native metric scale? | License blocks dual-use? | Preliminary status |
|---|---|---|---|---|---|---|---|---|
| VINS-Mono | mono+IMU | GPL-3.0 (copyleft) | ⚠️ borderline (24 mo) | proven on Jetson Nano (2021) → Orin Nano Super virtually certain | yes | yes | ⚠️ Verify with user | Lead candidate conditional on user license decision + Orin-Nano-Super MVE |
| VINS-Fusion | mono+IMU (mode) | GPL-3.0 | ⚠️ borderline (24 mo) | ⚠️ failed on TX2 (KAIST 2021); Orin Nano Super untested | yes | yes | ⚠️ Verify with user | Alternate, secondary to VINS-Mono within HKUST family |
| OpenVINS | mono+IMU | GPL-3.0 | active master | build confirmed on Orin Nano Dev Kit + JetPack 6 (2024 + 2025 community evidence); ~270 ms latency on Xavier NX | yes (MSCKF) | yes | ⚠️ Verify with user | Lead candidate conditional on user license decision (best Jetson-Orin-Nano evidence + most maintained of the GPL-3 trio) |
| OKVIS2 / OKVIS2-X | mono+IMU (+ optional GNSS) | BSD-3 | very active (2026 pushes) | ⚠️ no direct Jetson Orin Nano measurement; factor-graph backbone plausibly heavier than MSCKF | yes | yes | no | Lead candidate by license + maintenance + spoof-promotion architectural alignment, pending Jetson MVE |
| Kimera-VIO | mono+IMU (optional) | BSD-2 | active | ⚠️ failed on Xavier NX 8 GB shared under multi-process (KAIST 2021) | yes | yes | no | Fallback secondary; resource overhead poor fit for project |
| DROID-SLAM | mono VO/SLAM only | (project repo) | reference baseline | no: ≥11 GB GPU VRAM > 8 GB AC-4.2 budget | no | no (arbitrary scale) | n/a | DISQUALIFIED by AC-4.2 |
| DPVO / DPV-SLAM | mono VO only | MIT | ⚠️ borderline (19 mo on code, ECCV 2024 paper) | ⚠️ DPVO-QAT++ (Nov 2025) shows 1.02 GB peak on RTX 4060 desktop; Jetson Orin Nano untested | no (needs external IMU wrapper) | no (needs external scale alignment) | no | Conditional secondary: VO half of a hybrid C1+C5 design only; not a drop-in VIO replacement |
| Pure VO baseline (KLT + 5pt RANSAC / homography) | mono VO only | OpenCV-Apache-2.0 / MIT | foundational (no time window) | runs on any Jetson | no (needs external IMU wrapper) | no (needs external scale alignment) | no | Mandatory simple-baseline reference per Component Option Breadth rule |

Surviving lead candidates (preliminary), in priority order based on this session's evidence:

  1. OpenVINS (GPL-3.0, MSCKF, best Jetson Orin Nano evidence) — pending user license decision + Orin-Nano-Super MVE
  2. OKVIS2 / OKVIS2-X (BSD-3, factor-graph + GNSS-fusion alignment, most active maintenance) — pending Jetson MVE
  3. VINS-Mono (GPL-3.0, sliding-window optimization, proven on Jetson Nano) — pending user license decision + Orin-Nano-Super MVE
  4. Pure VO baseline (mandatory simple-baseline; runtime guaranteed; carries the project as a graceful fallback)

Disqualified outright: DROID-SLAM (AC-4.2 memory budget), RTAB-Map and ORB-SLAM3 (already pruned by Fact #16).

Conditional / not-direct-fit: DPVO / DPV-SLAM (VO not VIO, needs external IMU wrapper), Kimera-VIO (resource overhead unjustified for narrow C1 mandate).

C1 Open Decisions (to be resolved before SQ3+SQ4 closure)

Decision D-C1-1 — GPL-3.0 license posture for the onboard binary (BLOCKING for the GPL-3.0 trio: VINS-Mono / VINS-Fusion / OpenVINS).

  • The three most established VIO candidates (VINS-Mono / VINS-Fusion / OpenVINS) are GPL-3.0 (viral copyleft).
  • For dual-use UAV deployment, GPL-3 binary distribution to a customer triggers obligations: source-code disclosure for the entire linked binary, anti-tivoization clauses for embedded firmware updates, viral effect on any proprietary code linked into the same binary.
  • BSD/MIT alternatives exist (OKVIS2 BSD-3, Kimera BSD-2, DPVO MIT, pure-VO baseline OpenCV-Apache-2.0), but each comes with secondary trade-offs (Jetson MVE risk, missing IMU fusion, resource overhead).
  • Three options for the user:
    • (a) Accept GPL-3.0 — distribution model = release source on customer request; or operate the system as a service rather than transferring binaries. Lowest-risk algorithmic path (most-tested candidates).
    • (b) Restrict to permissive licenses only (BSD/MIT) — lead candidate becomes OKVIS2; carries Jetson MVE risk.
    • (c) Keep both options open through the design phase — make the final license decision after the Jetson Orin Nano MVE results are in.
  • Recommended default: (c) — defer the binary commitment until empirical evidence on Jetson Orin Nano. This is recorded as a flagged decision; SQ3+SQ4 candidate matrix will carry both license families to Step 7.5.

Decision D-C1-2 — Acceptance of Jetson Orin Nano MVE as a Step-7.5 prerequisite (procedural).

  • Per the Per-Mode API Capability Verification rule, every lead candidate library/SDK requires context7 (or equivalent docs) lookup + a Minimum Viable Example for the project's pinned mode + per-numbered-Restriction × per-numbered-AC sub-matrix.
  • The Component Applicability Gate above is preliminary — it documents enumeration evidence but does NOT yet contain context7 per-mode capability verification or the structured sub-matrix.
  • Next session's mandatory work: context7 lookup (3 mandatory queries) for OpenVINS / OKVIS2 / VINS-Mono; per-Restriction × per-AC sub-matrix per candidate; the same for the simple-baseline path; record into 02_fact_cards.md per the engine template + 06_component_fit_matrix.md per Step 7.5.
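
The deferred sub-matrix can be pictured as a mapping from numbered ACs and Restrictions to one of the four verdicts used in these notes (Pass / Fail / Verify / N/A). The sketch below is a hypothetical data shape only: the gating rule (any Fail disqualifies, any Verify blocks promotion) and the restriction ID `R-1` are assumptions for illustration, not taken from the engine template.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "Pass"
    FAIL = "Fail"
    VERIFY = "Verify"
    NA = "N/A"


def gate(sub_matrix):
    """Summarise a {constraint_id: Verdict} mapping into a candidate status."""
    verdicts = set(sub_matrix.values())
    if Verdict.FAIL in verdicts:
        return "Disqualified"
    if Verdict.VERIFY in verdicts:
        return "Pending verification"
    return "Selectable"


# Illustrative rows; "R-1" is a placeholder ID, not a real restriction number.
droid_slam = {"AC-4.2": Verdict.FAIL, "R-1": Verdict.NA}
openvins = {"AC-4.2": Verdict.VERIFY, "R-1": Verdict.PASS}

print(gate(droid_slam))   # Disqualified
print(gate(openvins))     # Pending verification
```

Under this shape, every lead candidate in the preliminary table would currently resolve to "Pending verification", which matches the deferred status recorded above.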

C1 Boundary check: candidate enumeration is saturated for this session

Saturation signals observed: (a) all 7 named candidates from 00_question_decomposition.md C1 row enumerated with at least one canonical L1 source per candidate; (b) Jetson-class runtime evidence located for OpenVINS (direct, on Jetson Orin Nano) and VINS-Mono (indirect: Jetson Nano + RPi CM4); other candidates carry "MVE required" gates explicitly; (c) license diversity covered (GPL-3.0 trio + BSD-permissive duo + MIT + permissive-baseline); (d) explicit disqualifications recorded with cited evidence (DROID-SLAM, RTAB-Map, ORB-SLAM3). Open: per-mode context7 verification (BLOCKING per rule) + Restrictions×AC sub-matrices (BLOCKING per Step 7.5); both are explicitly deferred to next session.