
Fact Cards — Phase 1 (AC & Restrictions Assessment)

Each fact card: statement, source(s), confidence (High / Med / Low), audience.


A — Position accuracy state of the art

F-A1. State-of-the-art UAV cross-view visual localization (drone image vs. ortho satellite map) at low altitude (30–300 m, multi-view, oblique allowed) achieves 74.1% recall@5 m on the AnyVisLoc benchmark (best combined retrieval + matching + PnP).

  • Source: S02 (AnyVisLoc paper, 2025).
  • Confidence: High. Audience: implementer / decision-maker.

F-A2. Cross-view image matching benchmarks report Relative Distance Score (RDS) up to 84.40% and MA@20 (matched within 20 m) up to 83.35% in nadir-favoring setups — i.e., 80%+ within 20 m is achievable with current methods on similar reference data.

  • Source: S39.
  • Confidence: Med. Audience: implementer / decision-maker.

F-A3. The most relevant fixed-wing aerial public benchmark is UAV-VisLoc (6,742 drone images, fixed-wing & multi-rotor, altitudes 405–840 m, ortho satellite reference at 0.3 m/px from Google Earth, 11 sites in China incl. cities/towns/farms/rivers/hills/forests).

  • Source: S01.
  • Confidence: High. Audience: implementer.

F-A4. AerialVL (RA-L 2024) is a fixed-wing UAV dataset with 11 sequences / ~70 km of trajectory, RGB camera with gimbal, NovAtel GNSS at 1.5 m RMS ground truth, and reference satellite map. Provides VPR + visual alignment + VO baselines.

  • Source: S03.
  • Confidence: High. Audience: implementer.

F-A5. The viewpoint discrepancy (oblique aerial vs. nadir satellite) and temporal staleness (seasonal / construction change) are the two dominant accuracy degraders cited across cross-view localization literature. ViewBridge (2025), OrthoLoC (2025), and AnyVisLoc all emphasise BEV projection or 3D-grounded matching as mitigation.

  • Source: S02, S36, S37.
  • Confidence: High. Audience: technical expert.

F-A6. Confidence-score schemes used by mature visual localization stacks: (a) RANSAC inlier ratio after PnP/homography; (b) reprojection error variance; (c) top-K retrieval similarity gap; (d) 6-DoF pose covariance from EKF/factor-graph optimization; (e) photometric consistency vs. tile.

  • Source: S03, S04, S32 (and ORB-SLAM3 lit).
  • Confidence: High. Audience: implementer.

B — Image registration & feature matching

F-B1. SuperPoint + LightGlue with TensorRT runs at ~286 FPS on RTX 3080 at 320×240. SuperPoint ≈ 0.95 ms, LightGlue ≈ 2.54 ms per pair on RTX 3080.

  • Source: S11.
  • Confidence: High. Audience: implementer.

F-B2. Jetson Orin NX (sibling SoC) has a working LightGlue+TensorRT deployment (CUTLASS FlashAttention V2 plugin, qdLMF repo), which confirms feasibility on Jetson Orin-class hardware. No publicly released benchmark exists for the Jetson Orin Nano Super specifically.

  • Source: S12.
  • Confidence: Med. Audience: implementer.

F-B3. XFeat (CVPR 2024) is 5× faster than LightGlue / SuperPoint while maintaining comparable accuracy; runs in real-time on a budget CPU (i5-1135G7); offers semi-dense matching mode; C++ + CUDA 12.2 implementations available.

  • Source: S08.
  • Confidence: High. Audience: implementer.

F-B4. MASt3R (ECCV 2024) achieves +30% absolute VCRE AUC on Map-free localization vs. prior SOTA — valuable for cross-view UAV/satellite due to its 3D-grounded matching, but is heavier (transformer with depth backbone) than LightGlue/XFeat — may exceed Jetson Orin Nano Super 8 GB envelope under the user's latency budget without aggressive distillation/quantization.

  • Source: S09.
  • Confidence: Med. Audience: technical expert.

F-B5. Mean Reprojection Error <1 px is a tight but achievable target for homography-fit on overlapping aerial pairs; for full-PnP across UAV–satellite pairs the typical achieved MRE is 1–3 px on cross-view benchmarks (heavily dependent on the pixel scale ratio between drone and satellite).

  • Sources: S01 (UAV-VisLoc), S03 (AerialVL), S36 (ViewBridge).
  • Confidence: Med. Audience: technical expert.

C — Resilience & re-localization

F-C1. Aerial VPR survey + aero-vloc benchmark (2024) provides a unified evaluation framework over AnyLoc, CosPlace, EigenPlaces, MixVPR, NetVLAD, SALAD, SelaVPR with re-ranking via LightGlue/SuperGlue. Datasets used: VPAir, ALTO, MARS-LVIG.

  • Source: S04.
  • Confidence: High. Audience: implementer.

F-C2. AnyLoc (DINOv2 + unsupervised VLAD) achieves up to 4× higher Recall@1 than environment-specialised approaches across urban / aerial / underwater / subterranean without training. Strong default for cross-view re-localization when training data is limited.

  • Source: S05.
  • Confidence: High. Audience: implementer / decision-maker.

F-C3. MixVPR: 94.6% R@1 on Pitts250k with <50% of NetVLAD's parameter count — the best lightweight VPR aggregation in 2023–2024.

  • Source: S06.
  • Confidence: High. Audience: implementer.

F-C4. Tile-zoom / overlap selection when constructing the satellite reference map is a critical parameter for VPR efficiency and accuracy in aerial domain (per the 2024 survey).

  • Source: S04.
  • Confidence: High. Audience: implementer.

D — Onboard real-time performance on Jetson Orin Nano Super

F-D1. Jetson Orin Nano Super (with JetPack 6.2 "Super Mode"): 67 TOPS sparse INT8 AI performance, 8 GB shared LPDDR5, supports 15 W / 25 W / MAXN SUPER power modes. The 25 W mode is the new "reference" performance mode.

  • Source: S14, S15.
  • Confidence: High. Audience: implementer / decision-maker.

F-D2. Sustained-load thermal throttling is real across the Jetson family — the earlier-gen Xavier NX (21 TOPS) throttled within 5 minutes running YOLOv8n at 640×480. The Orin Nano Super is reportedly more thermally efficient, but 8-hour sustained operation at 25 W requires forced-air cooling and possibly an active heatsink — this is not solvable purely in software.

  • Source: S14, S15 + practitioner test S14.
  • Confidence: Med. Audience: implementer / decision-maker.

F-D3. MAXN SUPER is uncapped; if power exceeds TDP the module auto-throttles. For sustained 8 h flight on a fixed-wing UAV with ~25 W power budget, the system MUST be sized to fit the 25 W envelope at 100% duty, not MAXN.

  • Source: S14.
  • Confidence: High. Audience: implementer.

F-D4. Naive scaling from RTX 3080 → Orin Nano Super for SuperPoint+LightGlue suggests a ~30–40× slowdown (RTX 3080 ≈ 30 TFLOPS FP16 vs. roughly 1 TFLOPS-class FP16 throughput on the Orin Nano Super). At 320×240: ≈3.5 ms × 35 ≈ ~120 ms/pair on the Jetson Orin Nano Super. Running matching on a downsampled image (e.g., 1024×683 from 6200×4100) is feasible within the 400 ms p95 budget when combined with feature caching for the satellite tile.

  • Source: derived from S11, S14 (back-of-envelope; needs empirical confirmation in Phase 2).
  • Confidence: Low. Audience: technical expert.
  • Action: Empirical benchmark on actual Jetson Orin Nano Super in implementation phase.
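
The back-of-envelope above can be written down explicitly. The TFLOPS figures are the fact card's own rough estimates, not measurements; treat the result as order-of-magnitude only.

```python
# F-D4 scaling estimate: RTX 3080 -> Orin Nano Super, by raw FP16 ratio.
# Both constants are the fact card's approximations, not benchmarks.
RTX3080_FP16_TFLOPS = 30.0
ORIN_NANO_SUPER_FP16_TFLOPS = 1.0

def scaled_latency_ms(rtx_latency_ms: float) -> float:
    """Scale an RTX 3080 latency by the raw FP16 throughput ratio."""
    return rtx_latency_ms * (RTX3080_FP16_TFLOPS / ORIN_NANO_SUPER_FP16_TFLOPS)

# SuperPoint (0.95 ms) + LightGlue (2.54 ms) per pair at 320x240, per F-B1:
pair_est_ms = scaled_latency_ms(0.95 + 2.54)  # ~105 ms; the card rounds up to ~120
```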

E — Satellite imagery sourcing & legality

F-E1. Google Maps / Map Tiles API explicitly prohibits offline use, image analysis, machine interpretation, object detection, geodata extraction, and "any systems or functions for automatic or autonomous control of vehicle behavior". Use of Google Maps satellite tiles for an offline UAV navigation system violates the Terms of Service.

  • Sources: S22 (Map Tiles API Policies), S23 (Maps Platform ToS).
  • Confidence: High (two L1 sources, explicit language). Audience: decision-maker / legal.
  • Severity: Hard blocker — must be resolved before solution design.

F-E2. Bing Maps also prohibits creating local copies / offline storage of tiles. Tile URLs are not stable; the supported access pattern is dynamic REST queries per session. Bing tiles are not a viable offline reference source either.

  • Source: S24.
  • Confidence: High. Audience: decision-maker.

F-E3. Maxar Vivid Mosaic offers a 30 cm global basemap (135 M km², ex-Antarctica) and a 15 cm urban basemap (7 M km²), continuously refreshed with AI-driven change detection. Pricing for archive imagery is approximately $25–32 / km² for similar 30 cm products. Licensing for offline tactical use must be negotiated explicitly with Maxar (Vantor) — this is the standard path for defense customers.

  • Sources: S25, S38.
  • Confidence: High. Audience: decision-maker.

F-E4. Airbus Pléiades Neo provides 30 cm via OneAtlas; volume pricing approximately €5–8.50 / km² on a 6-month sliding window. Direct competitor to Maxar at sub-meter resolution.

  • Sources: S26, S27.
  • Confidence: High. Audience: decision-maker.

F-E5. Sentinel-2 cloudless (EOX) provides a free global mosaic but at 10 m/px — well below the AC requirement of 0.5 m/px (ideally 0.3 m/px). At 1 km AGL Sentinel-2 is too coarse to achieve registration with a 24 cm/px drone image without massive scale-bridging losses.

  • Source: S28 + S01 (drone GSD).
  • Confidence: High. Audience: implementer.

F-E6. For Eastern/Southern Ukraine specifically, Sentinel-2 / Sentinel-1 are heavily used in 2022+ academic literature for damage / change detection. Maxar and Planet are the de-facto sources for sub-meter imagery of Ukraine. Recent satellite imagery for this region is operationally sensitive but commercially available.

  • Source: S28 (Ukraine 2024–2025 references in the EOX/Sentinel papers).
  • Confidence: High. Audience: decision-maker.

F-E7. Active-conflict-region staleness is a real risk: dam destruction (Kakhovka), urban damage, cratering, road realignment, smoke/dust — all can defeat cross-view matching against pre-conflict imagery. Imagery freshness budget should be tightened from "<2 years" to "<6 months for active sectors, <12 months for stable rear areas" — to be confirmed with operations.

  • Source: S28 + extrapolation from change-detection literature for Ukraine.
  • Confidence: Med. Audience: decision-maker.

F — Camera & GSD

F-F1. GSD formula: GSD (cm/px) = (Altitude_m × 100 × Sensor_w_mm) / (Focal_mm × Image_w_px). For a typical full-frame sensor (36 mm wide) with a 24 mm wide-angle lens at 1 km AGL and a 6200 px wide image: GSD ≈ 24 cm/px, frame footprint ≈ 1.49 km × 0.99 km. Drone-image GSDs of 0.1–0.2 m/px (per UAV-VisLoc) are consistent with this.

  • Source: S29.
  • Confidence: High. Audience: implementer.
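
A minimal sketch of the F-F1 formula, reproducing the card's worked example (function names are illustrative):

```python
def gsd_cm_per_px(altitude_m, sensor_w_mm, focal_mm, image_w_px):
    """F-F1: ground sampling distance in cm/px."""
    return (altitude_m * 100.0 * sensor_w_mm) / (focal_mm * image_w_px)

def footprint_km(altitude_m, sensor_w_mm, focal_mm, image_w_px, image_h_px):
    """Frame footprint (width, height) in km at the resulting GSD."""
    g = gsd_cm_per_px(altitude_m, sensor_w_mm, focal_mm, image_w_px)
    return (g * image_w_px / 100_000.0, g * image_h_px / 100_000.0)

# Worked example from the card: 36 mm full-frame sensor, 24 mm lens,
# 1 km AGL, 6200x4100 px image.
g = gsd_cm_per_px(1000, 36, 24, 6200)                 # ~24.2 cm/px
w_km, h_km = footprint_km(1000, 36, 24, 6200, 4100)   # ~1.5 km x ~0.99 km
```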

F-F2. Camera intrinsics calibration is mandatory — without known focal length, principal point, and distortion, sub-pixel MRE is impossible. Pre-flight checkerboard calibration is the standard; some payloads use factory-cal + temperature compensation.

  • Source: photogrammetry consensus (S01, S03, S29).
  • Confidence: High. Audience: implementer.

G — MAVLink integration & GPS injection

F-G1. GPS_INPUT is a standard MAVLink message. ArduPilot: set GPS1_TYPE=14 (MAVLink) and the autopilot will accept GPS_INPUT as the primary GPS. PX4: native GPS_INPUT support is limited; the standard workaround is to publish via VISION_POSITION_ESTIMATE through the EKF2 vision-pose pipeline.

  • Sources: S16, S17, S18.
  • Confidence: High. Audience: implementer / decision-maker.

F-G2. MAVSDK-Python does NOT natively support GPS_INPUT (open issue #320). For Python implementations, pymavlink must be used to emit raw GPS_INPUT messages.

  • Source: S18.
  • Confidence: High. Audience: implementer.
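
A hedged sketch of the unit scaling GPS_INPUT expects (lat/lon as int degE7, altitude in metres AMSL). `gps_input_fields` is a hypothetical helper; the actual pymavlink emission is only indicated in a comment.

```python
def gps_input_fields(lat_deg, lon_deg, alt_m_amsl, hdop, sats_visible):
    """Scale a position estimate into GPS_INPUT units (hypothetical helper)."""
    return {
        "fix_type": 3,                        # 3 = 3D fix
        "lat": int(round(lat_deg * 1e7)),     # degE7
        "lon": int(round(lon_deg * 1e7)),     # degE7
        "alt": float(alt_m_amsl),             # metres AMSL
        "hdop": float(hdop),
        "satellites_visible": int(sats_visible),
    }

f = gps_input_fields(48.5, 35.0, 1200.0, 0.8, 12)
# With pymavlink (per F-G2), these fields would feed the raw message, e.g.:
#   master.mav.gps_input_send(time_usec, gps_id, ignore_flags, ...,
#       f["fix_type"], f["lat"], f["lon"], f["alt"], f["hdop"], ...)
```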

F-G3. ArduPilot can blend or switch between GPS sources by quality (sat count, HDOP). If the legitimate (jammed) GPS keeps reporting plausible values while the spoofed/denied state is intermittent, the autopilot may oscillate between sources. The companion computer must explicitly disable the real GPS (or degrade its reported quality), or the autopilot must be configured to trust only GPS_INPUT, to avoid this.

  • Source: S33.
  • Confidence: High. Audience: implementer / security architect.

F-G4. PX4 has GPS spoofing detection baked into the EKF2 driver chain (u-blox spoof flag, ~1 s hysteresis, GNSS-fusion auto-disable on consistent spoof signal). This is a useful upstream signal for the GPS-Denied system to know "you are now the primary source".

  • Sources: S19, S20.
  • Confidence: High. Audience: implementer / security architect.

F-G5. PX4 failsafe delay COM_POS_FS_DELAY defaults to 1 s; EKF2_NOAID_TOUT controls dead-reckoning validity. Documented bugs exist (#23970) — version pinning matters.

  • Source: S21.
  • Confidence: Med. Audience: implementer.

F-G6. QGroundControl has only STATUSTEXT (string) as a first-class companion-computer message channel; ONBOARD_COMPUTER_STATUS (planned) and custom MAVLink messages (NAMED_VALUE_FLOAT/INT, custom dialect) are practical channels for re-localization request UI / confidence scores.

  • Sources: S34, S35.
  • Confidence: High. Audience: implementer.

H — Object localization (AI camera, gimbal-only pose)

F-H1. Trigonometric ground projection error with gimbal-angle-only (no airframe IMU attitude fusion onto AI cam) is dominated by the unknown UAV roll/pitch at the moment of capture. For a fixed-wing UAV, typical roll/pitch in straight cruise is ±2°; in turns up to ±25°. At 1 km AGL, a 5° unknown attitude → ~87 m ground-position error. The AC "object localization accuracy is consistent with frame-center accuracy" is therefore unrealistic without attitude fusion in turning flight.

  • Source: derived from F-F1 + standard photogrammetry trig.
  • Confidence: High. Audience: technical expert / decision-maker.
  • Action: revise AC to "consistent with frame-center accuracy in level flight; expect ±h·tan(unknown_attitude) in turns" OR add attitude fusion onto AI cam.
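
The trig behind F-H1, as a one-liner (flat-terrain, small-footprint approximation; the function name is illustrative):

```python
import math

def attitude_ground_error_m(agl_m, unknown_attitude_deg):
    """F-H1: ground-projection error from unmodelled roll/pitch,
    flat-terrain approximation: h * tan(theta)."""
    return agl_m * math.tan(math.radians(unknown_attitude_deg))

cruise = attitude_ground_error_m(1000, 2)   # ~35 m at +/-2 deg in straight cruise
turn = attitude_ground_error_m(1000, 5)     # ~87 m, the card's example
```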

F-H2. Flat-terrain assumption is reasonable for eastern/southern Ukraine (typical relief amplitude ~50–150 m over 10 km). At 1 km AGL with up to 5° gimbal off-nadir, terrain-induced ground-projection error from the flat-terrain assumption is typically <30 m for level flight — within the AC envelope. Riverbanks, tall buildings, and reservoir scarps are local exceptions.

  • Source: derived from S26 + S28 + Ukraine relief data.
  • Confidence: Med. Audience: technical expert.

I — Hardware envelope & power

F-I1. Jetson Orin Nano Super in 25 W mode: ~25 W average; with cooling adequately sized for 8-hour duty, sustained throttling can be avoided. Without active cooling, expect throttling within minutes (Xavier NX precedent).

  • Source: S14, S15.
  • Confidence: Med. Audience: implementer.

F-I2. Storage budget: User's "~10 GB" estimate for a 400 km² @ 0.3 m/px tile cache is correct (400 km² × 11 px²/m² with 3-byte JPEG ≈ 10–13 GB). Persistent cache across flights is feasible with a small NVMe (≥64 GB).

  • Source: arithmetic; cross-checked S25 (Vivid pricing per km²).
  • Confidence: High. Audience: implementer / decision-maker.
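
The arithmetic, spelled out (raw size before JPEG compression; the function name is illustrative):

```python
def tile_cache_gb(area_km2, gsd_m_per_px, bytes_per_px=3.0):
    """Raw (pre-JPEG) cache size in GB; JPEG output lands below this bound."""
    px_per_m2 = (1.0 / gsd_m_per_px) ** 2            # ~11.1 px^2/m^2 at 0.3 m/px
    total_bytes = area_km2 * 1e6 * px_per_m2 * bytes_per_px
    return total_bytes / 1e9

raw = tile_cache_gb(400, 0.3)   # ~13.3 GB raw, consistent with the 10-13 GB card figure
```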

J — Failsafe & resilience

F-J1. PX4's own GPS-loss failsafe defaults to ~1 s delay. A reasonable upstream "system fails to produce an estimate" failsafe N for the GPS-Denied system is 3–5 seconds — long enough to ride out one sharp turn / re-localization attempt without flapping, short enough to let the flight controller switch to IMU dead reckoning before drift exceeds tens of metres.

  • Source: S21 + practitioner heuristic.
  • Confidence: Med. Audience: implementer / decision-maker.

K — Public datasets for IMU / aerial dev & test

F-K1. No public dataset perfectly matches all four constraints: fixed-wing + ~1 km AGL + downward-facing + synchronized IMU + GPS truth. Closest match is AerialVL (fixed-wing + gimbal RGB + GNSS, ~70 km of tracks, 11 sequences, RA-L 2024). Altitude band for AerialVL is "different altitudes" (not always 1 km).

  • Source: S03.
  • Confidence: High. Audience: implementer.

F-K2. UAV-VisLoc is the largest fixed-wing drone-vs-satellite localization dataset (6,742 images, 405–840 m altitudes, 0.3 m/px Google Earth reference) — but it does not provide synchronized IMU.

  • Source: S01.
  • Confidence: High. Audience: implementer.

F-K3. MidAir (synthetic, quadcopter) provides full IMU + GPS + depth + semantic at low altitude. Good for training-time augmentation but not real-world testing for fixed-wing at 1 km AGL.

  • Source: S30.
  • Confidence: High. Audience: implementer.

F-K4. Recommended dev/test stack: AerialVL (primary real-world fixed-wing) + UAV-VisLoc (visual-localization-only validation at altitudes near 1 km) + MidAir (synthetic IMU augmentation) + the user's own 65 input-data photos for sanity / regression. Real IMU data from a dedicated test flight should still be planned for system V&V.

  • Source: synthesis of S01, S03, S30.
  • Confidence: High. Audience: decision-maker.

L & M — Restriction & AC gaps / contradictions

F-LM1. Restriction "up to 3000 photos per flight" is inconsistent with the stated 8-hour endurance × 3 fps = 86,400 photos and with the 500 ms minimum interval × 8 h = 57,600 photos. Likely interpretations: (a) On-disk retention budget (sub-sample for storage). (b) Imagery for an individual mission segment (~17 min × 3 fps = 3,000), not the full sortie. (c) A stale value carried over from a Mavic 3 attempt that should be updated.

  • Hard contradiction: needs user resolution before solution sizing.

F-LM2. Camera resolution range "FullHD to 6252×4168" is wide (~13× pixel-count delta). Per-frame pipeline cost scales with resolution; AC compliance is camera-dependent. Need to lock the target camera spec for AC validation.

F-LM3. Latency 400 ms vs. cycle 333 ms (3 fps): the user has confirmed <400 ms p95 with skip-allowed. This is internally consistent; the AC should be re-stated as "p95 latency <400 ms; up to ~10% of frames may be dropped under sustained load" to remove the apparent contradiction with frame rate.

F-LM4. Suggested missing AC (gap analysis):

  • L2 — Time-to-first-fix on cold start / mid-flight reboot (e.g., <30 s after IMU-extrapolated init).
  • L3 — Spoofing-promotion latency (system asserts its estimate over flight controller GPS within X seconds of denial).
  • L4 — Flight-data-recorder requirement (all photos + estimates + confidence + IMU traces at full rate, retained in non-volatile storage with a budgeted size cap).
  • L5 — False-position safety budget (e.g., probability of an estimate >500 m from truth must be <0.1% per flight).
  • L6 — Operational temperature / vibration envelope (MIL-STD-810 lite or RTCA DO-160G low-altitude variant).
  • L7 — Imagery freshness operationally enforced (e.g., reject tiles older than 12 months for active sectors).

F-LM5. Restriction "Google Maps allowed" is legally not allowed per F-E1/E2. The project must change source to a license-cleared provider (Maxar Vivid / Airbus Pléiades / commissioned tasking / government feed) before deployment. This is a blocker, not a tweak.


Mode B Findings — adversarial assessment of solution_draft01.md (2026-04-26)

M-1 (Component 6 / AC-4.3) — ODOMETRY is ArduPilot's preferred external-nav channel, not GPS_INPUT. ArduPilot's own dev docs (S41) call ODOMETRY "the preferred method" for sending external position estimates to EKF3, ahead of both VISION_POSITION_ESTIMATE and GPS_INPUT. ODOMETRY carries quaternion + 3-D linear velocity + a 21-element pos+attitude covariance (incl. native yaw error) + a quality field (-1=failed, 0=unset, 1..100). VISO_QUAL_MIN gates ignored messages on the FC side. GPS_INPUT collapses our 6-DoF covariance into a scalar h_acc / v_acc, which directly under-reports our yaw covariance and under-utilises the FC's EKF3. The draft's GPS_INPUT-only choice is sub-optimal for AC-NEW-4 (false-position safety) covariance fidelity.

  • Source: S41, S42, S43.
  • Confidence: High.
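
The 21-element covariance M-1 refers to can be illustrated. `pack_covariance_21` is a hypothetical helper following the row-major upper-triangle layout the ODOMETRY message documents; this is exactly the fidelity GPS_INPUT's scalar h_acc / v_acc cannot express.

```python
def pack_covariance_21(cov6):
    """Pack a symmetric 6x6 pose covariance (x, y, z, roll, pitch, yaw)
    into the 21-element row-major upper-triangular array carried by the
    MAVLink ODOMETRY message's pose_covariance field."""
    out = []
    for i in range(6):
        for j in range(i, 6):        # upper triangle, row-major
            out.append(float(cov6[i][j]))
    return out                        # out[0] = var(x), out[20] = var(yaw)
```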

M-2 (Component 3) — MASt3R is not viable as primary on Orin Nano Super at 25 W. mast3r-runtime (S57) lists Jetson Orin support as "Planned", not implemented. Speedy MASt3R (S57 paper-side) achieves 91 ms / pair on an A40 GPU, which has roughly 30× the throughput of a Jetson Orin Nano Super in 25 W mode → MASt3R extrapolates to ~2.5–3 s / pair on our target hardware without aggressive distillation/INT8 work that nobody has published yet. Drop MASt3R from the matcher primary shortlist; keep it only as a long-horizon research target.

  • Source: S57.
  • Confidence: High.

M-3 (Component 3) — Add GIM (ICLR 2024 spotlight) to the bench-off shortlist. GIM (S48) is a self-training framework that takes existing matchers (LightGlue, RoMa, DKM, LoFTR) and re-trains them on 50 h of internet videos for an 8.4–18.1 % zero-shot improvement. The "generalist trained on diverse video" framing is the closest published proxy for our domain transfer (eastern-Ukraine 1 km AGL nadir vs. service satellite tiles). GIM-LightGlue should be included alongside vanilla LightGlue.

  • Source: S48.
  • Confidence: High.

M-4 (Component 2) — Add SALAD (DINOv2 + Sinkhorn-VLAD) and BoQ to the VPR shortlist. Two CVPR 2024 papers landed after the draft's "AnyLoc primary + MixVPR fast-lane" decision was made:

  • DINOv2 SALAD (S47) — DINOv2 backbone + optimal-transport Sinkhorn aggregator with a "dustbin" cluster for non-informative features. R@1 = 75.0 % on MSLS Challenge, 92.2 % on MSLS Val, 76.0 % on NordLand. Already a supported method in aero-vloc (S04), so direct apples-to-apples bench against AnyLoc/MixVPR.
  • BoQ (S46) — bag of learnable queries with cross-attention; outperforms NetVLAD, MixVPR, EigenPlaces on 14 large-scale benchmarks; surpasses two-stage methods (Patch-NetVLAD, TransVPR, R2Former) at lower cost; DINOv2 results published Nov 2024. AnyLoc is no longer the only DINOv2-based VPR option in the cross-domain regime; the bench-off must include all four.
  • Source: S46, S47.
  • Confidence: High.

M-5 (Component 2 / 9 / latency) — DINOv2-base latency on Orin Nano Super is ~10× better than the draft assumed. Jetson AI Lab measurements (S40): DINOv2-base-patch14 = 126 inferences/sec on Orin Nano Super (~8 ms/inf at 224×224), 75 inf/s on the original Orin Nano (~13 ms/inf). The draft estimated 50–80 ms / 224×224. The latency budget therefore has substantially more headroom than the draft assumed — but only at 224×224; at higher input resolution, expect ~quadratic scaling (so 448×448 ≈ 32 ms/inf is still very comfortable inside the 400 ms p95 budget). This is a good-news finding that simplifies AC-4.1.

  • Source: S40.
  • Confidence: High (NVIDIA L1 source; precision implied FP16 from JetPack 6.2 default trtexec).

M-6 (Component 6 / Security) — mavlink-router is itself attack surface. Issue #436 (S45): public, easily-triggered, fuzzing-discovered stack-based buffer overflow in ConfFile::get_sections (memcpy of user-controlled section names into a 100-byte fixed buffer with no bounds check, plus an OOB write on null-terminator append). The repo has no formal security policy / no SECURITY.md. The draft's "share the MAVLink endpoint via a single mavlink-router instance" recipe drops a known-vulnerable C++ daemon onto a flight-critical companion. Mitigation options:

  1. Pin to a fixed-and-audited tag, harden the systemd unit (NoNewPrivileges, ReadOnlyPaths, sandbox), and config-file-validate before launch.
  2. Replace mavlink-router with a tiny in-process MAVLink endpoint multiplexer (Python or Go; this is ~150 lines of code given the only consumers are MAVSDK + pymavlink + mavlink-router-replacement → FC).
  3. Use distinct system-IDs for MAVSDK and pymavlink and let ArduPilot's native MAVLink routing (S35-class) do the muxing on the FC side.
  • Source: S45.
  • Confidence: High.
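
Option 2's in-process multiplexer reduces to routing raw frames by header fields. A toy sketch of MAVLink 2 header parsing and sysid-based demux (illustrative only, not a drop-in mavlink-router replacement):

```python
def mavlink2_header(frame: bytes):
    """Parse the fixed 10-byte MAVLink 2 header (magic byte 0xFD).
    Returns (payload_len, seq, sysid, compid, msgid), or None otherwise."""
    if len(frame) < 10 or frame[0] != 0xFD:
        return None
    length, incompat, compat, seq, sysid, compid = frame[1:7]
    msgid = frame[7] | (frame[8] << 8) | (frame[9] << 16)   # 24-bit little-endian
    return (length, seq, sysid, compid, msgid)

def route(frame: bytes, endpoints: dict):
    """Toy demux: deliver a frame to the endpoint owning its source sysid."""
    h = mavlink2_header(frame)
    return endpoints.get(h[2]) if h else None
```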

M-7 (Component 6 / Security) — MAVLink2 signing is a v1-mandatory configuration item, not "recommended". S44: signing is per-link, USB bypasses signing, keys live in FRAM (32-byte secret + timestamp), configured via Mission Planner (or the MAVProxy signing module). It works in ArduPilot 4.5+, but key provisioning is a per-airframe operator step that needs a documented procedure. Given that GPS_INPUT (or ODOMETRY) is a high-trust local channel feeding the flight-critical EKF, a signed MAVLink link companion↔FC is the only defence against an attacker who gains serial access. The draft mentions signing under "Security note (deferred to a Phase-4 security pass)" — Mode B promotes it to v1-required.

  • Source: S44.
  • Confidence: High.
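
A schematic of the signing scheme S44 describes (32-byte secret, 48-bit timestamp, signature truncated to 48 bits of SHA-256). The exact byte layout hashed on the wire is simplified here; treat this as an illustration of the trailer structure, not the wire format.

```python
import hashlib

def sign_frame(secret_key: bytes, frame_without_sig: bytes,
               link_id: int, timestamp_48: int) -> bytes:
    """Schematic MAVLink 2 signing: 48-bit truncated SHA-256 over the
    secret plus frame plus link id plus timestamp (layout simplified)."""
    assert len(secret_key) == 32          # per-airframe FRAM-provisioned secret
    ts = timestamp_48.to_bytes(6, "little")
    digest = hashlib.sha256(secret_key + frame_without_sig +
                            bytes([link_id]) + ts).digest()
    # appended trailer: link_id(1) + timestamp(6) + signature(6) = 13 bytes
    return bytes([link_id]) + ts + digest[:6]
```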

M-8 (Component 1 / Tile Cache) — MBTiles SQLite under our concurrent read+write workload needs WAL + a connection pool + transaction batching. S54: the canonical MBTiles SQLite failure modes are (a) "database is locked" errors when concurrent writers compete with readers (the default rollback journal is single-writer), and (b) per-tile commit overhead crippling throughput on non-SSD storage. Recipe:

  • PRAGMA journal_mode = WAL (mandatory for mixed read+write).
  • Connection pool (cf. MbtilesPool from maplibre/martin S54) — multiple read connections + one write connection.
  • Transaction batching: bulk insert per N tiles per Component-1b cycle, not per tile.
  • Disable per-INSERT commit; rely on transaction boundary. The draft's tile-cache section says "MBTiles SQLite + per-tile metadata" but doesn't specify these. Add as a hard implementation note.
  • Source: S54.
  • Confidence: High.
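
The recipe above, as a minimal sqlite3 sketch (schema reduced to the `tiles` table; function names are illustrative):

```python
import sqlite3

def open_mbtiles_writer(path):
    """Single write connection with the M-8 recipe applied."""
    con = sqlite3.connect(path)
    con.execute("PRAGMA journal_mode = WAL")    # readers never block the writer
    con.execute("PRAGMA synchronous = NORMAL")  # safe with WAL, far fewer fsyncs
    con.execute("""CREATE TABLE IF NOT EXISTS tiles (
        zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER,
        tile_data BLOB,
        PRIMARY KEY (zoom_level, tile_column, tile_row))""")
    return con

def write_batch(con, tiles):
    """Transaction batching: one commit per Component-1b cycle, not per tile."""
    with con:   # a single transaction wraps the whole batch
        con.executemany(
            "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", tiles)
```

Read connections would come from a separate pool (cf. MbtilesPool in maplibre/martin, S54), with exactly one writer.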

M-9 (Component 1b / Tile Dedup — new safety risk) — onboard tile overwrites can poison the cache. The draft's dedup rule:

If the cache has a tile, the cache tile's source ∈ {service}, the cache tile's capture_date is older than the AC-8.2 freshness threshold, AND our quality score > existing → write (overwrites with source = onboard).

The risk: a confidently-bad onboard pose (an over-confident EKF covariance escapes the σ_xy ≤ 10 m gate) writes a tile that is misaligned by, say, 30–50 m, but with a high inlier count. Next flight, that misaligned tile becomes the satellite anchor for another fix → error compounds across flights. This is a feedback-loop safety hazard that AC-NEW-4 (false-position budget) does not currently capture, because Monte-Carlo over a single flight doesn't model the cross-flight cache-poisoning amplification. Mitigations (any of, ideally all):

  1. Service-source tiles are immutable within freshness budget. Onboard tiles overwrite only stale or other-onboard tiles, never a fresh service tile.
  2. Voting layer at the Service ingest. An onboard tile gets promoted to "trusted basemap" only after N≥2 independent flights confirm consistent geo-alignment within X m of each other.
  3. Quality score includes parent-pose covariance as a hard gate, not just inlier count: a tile written from σ_xy > 5 m (tighter than the 10 m generation gate) is marked as "soft" and flagged in the sidecar.
  4. An additional AC: "AC-NEW-7 — cache-poisoning safety" — see proposed addition in solution_draft02.md.
  • Source: derived analytical finding (no single L1/L2 — this is a design-level hazard exposed by Mode B reasoning).
  • Confidence: ⚠️ Medium (hazard is real and well-known in cartography/SfM; specific mitigation choice is empirical).
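
Mitigations 1 and 3 reduce to a pure overwrite gate. This sketch is one possible policy, not the project's final rule; field names are illustrative.

```python
def may_overwrite(existing, candidate, freshness_days, sigma_gate_m=5.0):
    """Illustrative cache-poisoning gate combining M-9 mitigations 1 and 3.
    `existing` / `candidate` are dicts with 'source' ('service'|'onboard'),
    'age_days', and (for onboard tiles) 'sigma_xy_m' = parent-pose
    1-sigma horizontal uncertainty."""
    # Mitigation 3: hard covariance gate, tighter than the 10 m generation gate.
    if candidate["source"] == "onboard" and candidate["sigma_xy_m"] > sigma_gate_m:
        return False
    # Mitigation 1: service tiles are immutable within the freshness budget.
    if existing["source"] == "service" and existing["age_days"] <= freshness_days:
        return False
    return True
```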

M-10 (Component 9 / Process topology) — Free-threaded Python 3.13 is not v1-ready. S55: free-threading is experimental, has a "substantial single-threaded performance hit", many C extensions don't yet support it, and the GIL auto-re-enables on import of any non-FT-aware extension (which would silently include numba, possibly TensorRT bindings, possibly older pymavlink). The draft's choice (single asyncio Python process + TRT subprocess workers + numba on hot path) is correct for v1 — but the rationale should be sharpened from "GIL is a risk we mitigate" to "free-threaded Python is not yet a substitute; revisit in v1.1 once NumPy/SciPy/numba/TRT bindings stabilise on PEP 703."

  • Source: S55.
  • Confidence: High.

M-11 (Component 5 / W4.a) — ODOMETRY for fixed-wing in ArduPilot has known production gotchas. S42 confirms ODOMETRY landed Dec 2021; S43 (PR #30080, "External nav+gps fix", merged 2025) shows ongoing work on the source-switching path when running external-nav alongside GPS. Practitioner-reported issues from S41/S42 discussion:

  • velocity errors when companion-computer-derived velocity is fed into EKF3,
  • position-estimate resets when external-nav loses reference,
  • conflicts when running external-nav alongside GPS. This is directly relevant to AC-NEW-2 (3 s spoofing-promotion latency) — the source switch is exactly the path that has known bugs. Mode B's recommended hybrid (GPS_INPUT primary + ODOMETRY when full covariance is available) needs SITL coverage of source-switching scenarios as a hard prerequisite, not a v1.1 follow-up.
  • Source: S41, S42, S43.
  • Confidence: High.

M-12 (Component 1b / R-Terrain) — Eastern-Ukraine relief amplitude breaks the "flat enough" assumption near frame edges. S56: a Kharkiv-region UAV survey reports ~24 m peak-to-trough relief between low and high points in test areas, with creek + gully (yary/balky) systems. At 1 km AGL with a 35° HFOV camera, a 24 m elevation deviation at the frame edge produces ~17 m horizontal misalignment when projected via the flat-Earth assumption. That's inside AC-1.1 (50 m@80%) but eats into AC-1.2 (20 m@50%, hard-floor variant). Recommended addition: a per-sector DEM lookup (one-time pre-flight) that classifies sectors as "flat" (≤5 m amplitude), "moderate" (5–15 m), "rugged" (>15 m). The system uses tile-anchor weight-decay or skips ortho-tile generation in rugged sectors.

  • Source: S56.
  • Confidence: ⚠️ Medium (S56 is one regional survey; relief varies across the operational area).
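
The proposed sector classes and the flat-Earth error estimate, as small helpers (thresholds are the card's; function names are illustrative):

```python
import math

def classify_sector(relief_amplitude_m):
    """M-12's pre-flight DEM classes from relief amplitude."""
    if relief_amplitude_m <= 5.0:
        return "flat"
    if relief_amplitude_m <= 15.0:
        return "moderate"
    return "rugged"

def flat_earth_error_m(elev_deviation_m, off_nadir_deg):
    """Horizontal misalignment from projecting a point seen off-nadir
    through the flat-Earth assumption: dh * tan(angle)."""
    return elev_deviation_m * math.tan(math.radians(off_nadir_deg))

edge_error = flat_earth_error_m(24, 35)   # ~17 m, the card's worked number
```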

M-13 (Datasets) — TartanAir V2 is a stronger synthetic baseline than MidAir; flag for user reconsideration. S51: TartanAir V2 is photo-realistic (AirSim) with native IMU + 12-cam rigs + 65 environments + season/weather variation + custom camera models. The draft drops synthetic IMU per user instruction (AC-NEW-4 validation rewritten in solution_draft01). User's stated reason: Mavic-class dynamics ≠ fixed-wing dynamics. TartanAir V2 lets us configure motion patterns, so the dynamics-mismatch argument is weaker for TartanAir than for MidAir. This is a real choice for the user: either keep "real-data only" purism, or add TartanAir V2 as an early-bench-off-only baseline. Surface to user as an open question, not a unilateral change.

  • Source: S51.
  • Confidence: ⚠️ Medium (technical viability is high; product/operator preference is the user's call).

M-14 (Component 3 / W1.c) — Add AerialExtreMatch and 2chADCNN to the matcher V&V plan for season/viewpoint robustness. Two underweighted benchmarks:

  • AerialExtreMatch (S49): 1.5 M synthetic image pairs with 32 difficulty levels crossing overlap × scale × pitch — exact failure-mode profile for our 1 km AGL operational regime. Real-world UAV localization subset for end-to-end validation.
  • 2chADCNN (S50): season-aware UAV↔satellite template-matching reference. Either include as bench-off candidate (vs. generic GIM/RoMa), or as a season-robustness benchmark the bench-off candidates run against.
  • Source: S49, S50.
  • Confidence: High.

M-15 (Component 4) — Real fixed-wing monocular VO is harder than the draft implies. S52: SVO, DSO, and ORB-SLAM2 all "had significant difficulty maintaining localisation" on real fixed-wing flights at altitude. S53: high-altitude (300–1000 m AGL) VIO publishes drift numbers in the same band as our AC-1.3. Conclusion: the draft's choice ("custom 2-frame homography VO using the Component-3 matcher") is right for our framing (VO between satellite anchors, not standalone metric SLAM), but the AC-1.3 drift budget (<100 m without IMU, <50 m with IMU) needs validation against real fixed-wing footage — not Mavic-class footage — before lock.

  • Source: S52, S53.
  • Confidence: High.

Mode B Findings — second adversarial pass (user-driven, 2026-04-26)

M-16 (Component 2 / Granularity) — VPR retrieval unit must be decoupled from the storage-tile boundary. Both the Mode A and Mode B drafts said "FAISS IVF over per-tile DINOv2-VLAD vectors" using storage tiles at z=20 (~154 m × 154 m ground). A 1 km AGL nadir frame covers 30–100 such tiles depending on lens. Cosine similarity between a frame descriptor (covering ~600 × 450 m) and a tile descriptor (covering 154 × 154 m) is a fundamentally mismatched, noisy comparison. None of the published aerial-VPR systems does it this way:

  • AerialVL (S03) preprocesses the reference satellite map into frame-footprint-sized reference chunks matched to expected drone-frame ground coverage.
  • AnyLoc (S05) uses overlapping macro-windows scaled to query footprint on aerial.
  • NaviLoc uses a sliding-window descriptor over the basemap.

Conclusion: the storage tile (z=20, 512×512) stays as the dedup / orthorect unit. The VPR chunk is a separate concept: ground-footprint chunks sized to the expected frame coverage with 40–50 % overlap so any frame footprint lands cleanly inside ≥1 chunk. Optionally multi-scale (one set per altitude band). Index is over chunks, not tiles.
  • Source: re-reading S03 + S05 with the granularity question in mind; verified against the user-surfaced gap.
  • Confidence: High. The error mode is well-known in the aerial-VPR literature; the original draft just under-specified the retrieval unit.
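
The chunk-grid geometry above can be sketched directly — a minimal illustration assuming a rectangular operational area and the ~600 × 450 m mid-range frame footprint; the function name and the 0.45 overlap value are placeholders, not locked design:

```python
def chunk_origins(area_w_m, area_h_m, footprint_w_m, footprint_h_m, overlap=0.45):
    """Origins (x, y) of VPR chunks sized to the expected frame footprint,
    overlapped 40-50 % so any frame footprint lands fully inside >= 1 chunk."""
    step_x = footprint_w_m * (1.0 - overlap)
    step_y = footprint_h_m * (1.0 - overlap)
    xs = [i * step_x for i in range(int(area_w_m // step_x) + 1)]
    ys = [j * step_y for j in range(int(area_h_m // step_y) + 1)]
    return [(x, y) for y in ys for x in xs]

# 20 km x 20 km area (~400 km²), mid-range-lens footprint from the card
origins = chunk_origins(20_000, 20_000, 600, 450)
```

Multi-scale indexing (one chunk set per altitude band, M-19) is the same generator run with a different footprint.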

M-17 (Component 2 / Invocation policy) — VPR is a re-loc-trigger module, not an every-frame module. Per Component 5 EKF analysis, in steady state (recent anchor < 2 s, σ_xy < 20 m, VO healthy), a geometric prior from the IMU + VO predicted position is enough to pick top-K candidate VPR chunks by distance alone — no DINOv2 forward needed. VPR's value is concentrated in the resilience paths:

  • AC-NEW-1 cold start — no IMU prior at all → VPR is the only viable narrow.
  • AC-3.2 sharp turn — VO fails, IMU prior degrades fast → VPR re-anchors.
  • AC-3.3 disconnected segment — explicitly requires "global descriptor retrieval" — VPR.
  • σ_xy growth — when EKF position covariance escapes σ_xy ≥ 50 m, geometric prior is too wide; VPR re-narrows.

Conclusion: control flow is if (steady_state) { use geometric prior } else { invoke VPR }. Saves ~10–35 ms/frame and lets the VPR backbone idle (one less concurrent process during cruise). The DINOv2-base TRT engine still has to be resident in GPU memory for fast invocation.
  • Source: derived from M-1, M-5, AC-NEW-1, AC-3.2, AC-3.3, EKF analysis. Independently corroborated by user feedback on the architecture.
  • Confidence: High.
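
The steady-state gate above can be written down directly; a sketch with assumed field names and the thresholds from this card:

```python
STEADY_SIGMA_XY_M = 20.0   # below this + fresh anchor + healthy VO: cruise
ANCHOR_MAX_AGE_S = 2.0

def select_candidates(state, chunks_by_distance, vpr_retrieve, k=5):
    """Pick candidate VPR chunks: geometric prior in steady state,
    full DINOv2 VPR retrieval only on the resilience paths."""
    steady = (state["anchor_age_s"] < ANCHOR_MAX_AGE_S
              and state["sigma_xy_m"] < STEADY_SIGMA_XY_M
              and state["vo_healthy"])
    if steady:
        # cruise: rank chunks by distance from the IMU+VO predicted
        # position -- no backbone forward pass needed
        return chunks_by_distance(state["predicted_pos"], k)
    # cold start / sharp turn / sigma blow-up: invoke VPR
    return vpr_retrieve(state["frame_descriptor"], k)
```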

M-18 (Component 2 / Fallback) — expanding-window retry on unconvincing top-1. Standard pattern in re-loc literature: if top-1 VPR similarity is below threshold OR top-1/top-2 gap is below threshold (both signs that VPR is unsure), expand the candidate set to adjacent chunks (±1 chunk in each direction = 8 neighbours in a regular grid; or radius-N expansion for sparse-overlap layouts) before failing over to operator-assisted re-loc. Cheap to add: same FAISS index, larger K, no extra DINOv2 forward.

  • Source: standard relocalization pattern (cf. ORB-SLAM3, GISNav, NGPS implementations).
  • Confidence: High.
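
A sketch of the retry, assuming any k-NN index with a `search(query, k) -> (similarities, ids)` shape (FAISS-like); both thresholds are placeholders pending calibration:

```python
SIM_MIN = 0.65   # assumed top-1 similarity floor
GAP_MIN = 0.05   # assumed top-1 vs top-2 margin

def retrieve_with_retry(index, query, k=5, k_expanded=50):
    sims, ids = index.search(query, k)     # same index, no extra forward
    convinced = (sims[0] >= SIM_MIN
                 and (len(sims) < 2 or sims[0] - sims[1] >= GAP_MIN))
    if convinced:
        return ids[:k]
    # unconvinced: widen to neighbouring chunks via a larger K; only if
    # the matcher still rejects everything do we fail over to
    # operator-assisted re-loc
    sims, ids = index.search(query, k_expanded)
    return ids
```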

M-19 (Component 2 / Active-conflict robustness) — multi-scale chunks + OSM road overlay + sector-driven K + negative cache. Active-conflict scene change (destroyed buildings, cratering, dam flooding, road realignment) is a frequent operational reality in the eastern/southern Ukraine deployment, not an edge case. Layered mitigations beyond M-16/17/18:

  • Multi-scale VPR chunks: maintain BOTH fine-scale (z=20-derived) and coarse-scale (z=17/18-effective) chunk descriptor sets. Coarse-scale descriptors capture road-network + field-boundary + waterway structure that survives building destruction. ~12 MB extra disk, ~3 min one-time pre-flight DINOv2 forward.
  • OSM road-network overlay: extract OSM road geometry for the operational area pre-flight as a binary "road-mask" tile sidecar; matcher applies bonus inlier weighting on keypoints that fall on road edges. GISNav uses this pattern. Roads are the single most change-stable feature in active-conflict zones.
  • Sector volatility classification drives K (binds to AC-NEW-6 sector_class): K=5 stable / K=20 active / K=50 expanding-window-fallback.
  • Onboard-tile rapid promotion in active sectors: refines M-9's 2-flight voting — single-flight promotion allowed in active sectors when σ_xy ≤ 3 m AND OSM-road-overlap ≥ 70 % (dual gate keeps safety).
  • Negative cache: tiles repeatedly rejected by matcher across flights get trust_level = stale_destroyed, excluded from retrieval until Service refresh.

The two highest-leverage of these are multi-scale chunks and OSM overlay; the rest are essentially free.

  • Source: derived from M-9, M-16, M-17, M-18 + standard cartographic-stability reasoning + GISNav reference architecture; user-driven concern about active-conflict scene change frequency.
  • Confidence: High on multi-scale + OSM (literature-backed); ⚠️ Medium on the OSM-road-overlap-≥-70 % numeric threshold (needs empirical calibration).
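
The sector-driven K and the negative cache are near one-liners; the class names and K values come from this card, the trust-level field name is an assumption:

```python
K_BY_SECTOR = {"stable": 5, "active": 20, "expanding_fallback": 50}

def retrieval_k(sector_class):
    """Map AC-NEW-6 sector_class to retrieval K; unknown sectors are
    treated as active (conservative)."""
    return K_BY_SECTOR.get(sector_class, K_BY_SECTOR["active"])

def filter_negative_cache(candidate_tiles):
    # tiles repeatedly rejected by the matcher across flights are marked
    # stale_destroyed and excluded until the Service refreshes them
    return [t for t in candidate_tiles
            if t.get("trust_level") != "stale_destroyed"]
```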

M-20 (Component 1) — Storage tile zoom level pinned at z=20. Trade-off analysis in response to user question (z=18 vs z=20):

  • ADTi 20MP APS-C @ 1 km AGL with 24–50 mm lens → frame GSD in 8–18 cm/px range. Mid-range (~35 mm lens) → ~12 cm/px.
  • Frame-vs-reference scale ratio at z=20 (30 cm/px): 2.5× — well within the SP+LG / GIM-LightGlue "well-handled" band (≤4× per published IMW-style benchmarks).
  • Frame-vs-reference scale ratio at z=18 (~120 cm/px): 10× — outside the SP+LG well-handled band; sub-pixel keypoint-correspondence accuracy degrades sharply, pushing AC-1.2 (50 % @ 20 m) and AC-2.2 (MRE < 2.5 px) into risk territory.
  • Storage @ z=20 over 400 km² ≈ 2.8 GB cache + 30 MB DEM + 16 MB VPR chunk index ≈ 3 GB total — 28 % of the 10 GB budget, leaving 7 GB headroom for FDR overflow and multi-scale chunks (M-19).
  • Storage @ z=18 over 400 km² ≈ 220 MB total — saves ~2.5 GB but provides no operational benefit at our budget level.
  • Pre-flight compute: z=20 takes ~5 min; z=18 takes ~3 min. Both trivial on the bench. Not a deciding factor.
  • Decision: z=20 for the storage tile. The accuracy benefit is meaningful; the storage cost fits comfortably. Folded into restrictions.md.
  • Source: derived analysis using ADTi camera spec + Mode B finding S40 (DINOv2 latency) + IMW-style matcher-resolution-mismatch data.
  • Confidence: High.
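
The GSD arithmetic behind the z=20 decision, reproduced as code. The sensor geometry (23.5 mm APS-C width across 5472 px) is an assumed ADTi-like spec, not a confirmed datasheet value:

```python
SENSOR_W_M = 23.5e-3      # assumed APS-C sensor width
SENSOR_W_PX = 5472        # assumed horizontal resolution (20 MP-class)
Z20_GSD_M = 0.30          # reference tile GSD at z=20

def frame_gsd_m(altitude_m, focal_m):
    """Ground sample distance of a nadir frame at the given altitude."""
    pixel_pitch_m = SENSOR_W_M / SENSOR_W_PX
    return altitude_m * pixel_pitch_m / focal_m

def scale_ratio(altitude_m, focal_m, ref_gsd_m=Z20_GSD_M):
    """Frame-vs-reference scale ratio; stay <= 4x for the SP+LG band."""
    return ref_gsd_m / frame_gsd_m(altitude_m, focal_m)
```

At 1 km AGL with a ~35 mm lens this gives ~12 cm/px and a ~2.4× ratio against z=20; substituting z=18's ~1.2 m/px for ref_gsd_m pushes the ratio near 10×.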

M-21 (2chADCNN re-classification) — ceiling reference, NOT bench-off candidate. Closer reading of S50 (MDPI Drones 2023) reveals 2chADCNN is structurally incompatible with our bench-off:

  • Output format: template-overlap region (IoU-style), not sub-pixel keypoints. Component 3's PnP needs keypoint correspondences; 2chADCNN can't supply them.
  • Tested altitude band: 25–2500 m AGL, not 1 km. Their experimental envelope doesn't cover our regime.
  • No Jetson / TRT benchmark: trained on Intel i5 + 8 GB RAM CPU only.
  • Method paradigm: traversal-search template matching (slide the template over the satellite image at every position, compute similarity). Doesn't scale to a 400 km² operational area within our latency budget.
  • Reported numbers: real-summer overlap-IoU 0.92–0.99; synthetic-snow overlap-IoU 0.82–0.95. Useful as a published season-robustness number against which we benchmark our chosen modern matcher (SP+LG / GIM-LightGlue) — but not as a candidate for the matcher slot itself.

Walks back the "optionally a bench-off candidate" tag in M-14. 2chADCNN is purely a season-robustness ceiling reference.

Newer / more relevant season-aware references for the open-research reading list:

  • AFF-CNN-HTransformer cross-perspective UAV-satellite matching (Sci Reports 2025) — hybrid CNN+Transformer cross-view + season.
  • Polar-coordinate-transformation rotation-and-season-invariant UAV-satellite matching (2026) — explicitly addresses both rotation and season; intersects nicely with our IMU-driven de-rotation step.
  • Source: closer reading of S50 + new search results 2025-2026.
  • Confidence: High on 2chADCNN re-classification; ⚠️ Medium on the newer papers (need to read full PDFs before bench-off inclusion).

Mode B Round 2 (component replacements & sweep) — appended 2026-04-26

M-22 (Component 4 / VO architecture) — custom 2-frame homography VO is the wrong design. Source: S52 (AFIT thesis), S60 (cuVSLAM), S64 (Isaac ROS UAV reference), S72 (high-altitude VIO), S73 (DPV-SLAM).

  • Draft02 C-4 says "custom 2-frame VO via SuperPoint+LightGlue homography". This skips loop closure, sparse bundle adjustment, keyframe-based local mapping — every mechanism that bounds drift in production VO/SLAM systems.
  • AFIT thesis (S52) shows even ORB-SLAM2 / SVO / DSO struggle on real fixed-wing flights; a hand-rolled 2-frame homography VO will be strictly worse.
  • High-altitude VIO field test (S72): stereo-VIO = 2.186 m / 800 m at 40–100 m AGL; monocular-VIO is "acceptable but worse". At 1 km AGL motion parallax shrinks ~10–25× per frame, further degrading monocular VO.
  • Recommendation: replace custom 2-frame VO with cuVSLAM (S60, S64) in monocular + IMU mode.
  • Confidence: High on "custom 2-frame VO is wrong"; ⚠️ Medium on "cuVSLAM is the right replacement" — high-altitude fixed-wing performance is unproven on cuVSLAM's published benchmarks (KITTI urban driving + EuRoC indoor MAV). Bench-off in F-T1b mandatory.

M-23 (Component 4 / VO candidate evaluation on Jetson Orin Nano Super). Source: S60, S61, S62, S71, S73, S76.

  • cuVSLAM (S60): NVIDIA-supported, CUDA-optimized, drop-in via isaac_ros_visual_slam, Apache-2.0. Reference designs on Orin Nano (S64, S77) confirm runtime feasibility. <1% ATE on KITTI / <5 cm on EuRoC. Verdict: v1 lead candidate.
  • DPVO / DPV-SLAM (S61, S73): SOTA deep VO, but DPVO-QAT++ is benchmarked on RTX-4060, not Jetson. Original DPVO @ 2–5× real-time on RTX-3090 (4 GB) → Orin Nano Super extrapolation ≈ 4–10 FPS without QAT, ≈ 6–15 FPS with QAT. Borderline for 10 Hz target; not v1.
  • MASt3R-SLAM (S62): 15 FPS on a single GPU; sub-1 Hz extrapolated on Orin Nano Super. Infeasible for inline v1.
  • VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71): Classical, well-tested, but require manual integration (OpenCV pinning, ArUco fixes, DDS / ROS plumbing) and no Jetson-class CUDA acceleration of the front-end. Higher integration cost than cuVSLAM with no accuracy advantage.
  • Custom 2-frame homography VO (current draft02 plan): M-22 already disqualified.
  • Confidence: High.

M-24 (Component 3 / cross-view matcher — LiteSAM evaluation). Source: S58.

  • LiteSAM is purpose-built for satellite↔aerial AVL in GPS-denied environments. Architectural choices (TAIFormer + MinGRU sub-pixel refinement) are tailored to large appearance variations and texture-scarce regions — exactly our regime.
  • Results: 6.31 M params (2.4× smaller than EfficientLoFTR); RMSE@30 = 17.86 m on UAV-VisLoc; 61.98 ms on standard GPU; 497.49 ms on Jetson AGX Orin (FP16-optimized).
  • Crucial extrapolation: AGX Orin INT8 throughput ≈ 275 TOPS, Orin Nano Super ≈ 67 TOPS → 4× scaling factor → LiteSAM on Orin Nano Super ≈ 1500–2000 ms / pair. Well outside our 400 ms p95 budget for inline use.
  • Three useful roles (not the inline matcher):
    • (a) Re-localization fallback — invoked rarely (cold start, σ_xy > 50 m), 1.5–2 s latency tolerable.
    • (b) Validation oracle — ground-truth-quality matches for offline regression bench.
    • (c) Distillation teacher — train a smaller student model with LiteSAM-supervised correspondences for the satellite-aerial domain.
  • Verdict: add LiteSAM in roles (a)/(b)/(c); SP+LG (TRT FP16/INT8) remains the inline matcher.
  • Confidence: High on architectural fit; ⚠️ Medium on the 4× AGX-Orin → Orin Nano Super scaling — needs empirical confirmation in bench-off.
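
The Orin Nano Super extrapolation is plain arithmetic; linear TOPS scaling is a coarse assumption to be replaced by a real bench-off measurement:

```python
AGX_ORIN_TOPS = 275          # published INT8 peak
ORIN_NANO_SUPER_TOPS = 67    # published INT8 peak
LITESAM_AGX_MS = 497.49      # published FP16-optimized AGX Orin latency

# linear-in-TOPS scaling (coarse): ~2e3 ms/pair on Orin Nano Super,
# far outside the 400 ms p95 inline budget
est_ms = LITESAM_AGX_MS * (AGX_ORIN_TOPS / ORIN_NANO_SUPER_TOPS)
```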

M-25 (Component 3 / cross-view matcher — RoMa v2 / MapGlue / MATCHA). Source: S63 + earlier MapGlue / MATCHA notes.

  • RoMa v2 (S63): SOTA dense matcher, frozen DINOv3 + custom CUDA + predictive covariance. GPU-class compute. Infeasible inline on Orin Nano Super; viable as offline ceiling reference for Component 3 bench-off.
  • MapGlue / MATCHA: Cross-modal/multimodal matchers — useful research-track candidates but no Jetson deployment data; same offline-only verdict.
  • Verdict: not a v1 candidate; offline ceiling reference. The matcher bench-off (deferred research item) MUST include both as ceilings so we know how much accuracy we're trading away by using SP+LG inline.
  • Confidence: High.

M-26 (Component 5 / EKF→ESKF question — architectural reframing). Source: S65, S66, S67, S68, S69.

  • The FC (ArduPilot 4.5+) runs EKF3, a classical extended Kalman filter — not an ESKF. PX4 EKF2 is the ESKF (S68); we are not on PX4. We cannot swap the FC's filter.
  • The "EKF vs ESKF" debate therefore applies only to the companion-side filter (Component 5 in draft02).
  • Best practice for ArduPilot ExtNav setups (S65, S66, S67): companion does NOT run a heavy filter on top. Companion produces (visual fix → GPS_INPUT) and/or (relative pose → ODOMETRY) with well-calibrated covariances; ArduPilot EKF3 fuses those with the FC's IMU.
  • ArduPilot issues #30076 (S65) and #32506 (S66) document concrete failure modes when feeding the FC two simultaneous position sources — only one position source per axis at a time. The hybrid GPS_INPUT + ODOMETRY plan from M-1 must therefore split responsibilities by channel, not duplicate position on both.
  • Architectural revision: the companion-side EKF in draft02's C-5 is not necessary for v1. It can be replaced by a lightweight "covariance calibrator + outlier gate + source-label producer": each upstream (matcher, VO, IMU passthrough if any) emits a hypothesis with a covariance; a Mahalanobis gate rejects outliers; covariances are re-scaled if empirical residuals indicate over- or under-confidence; results are emitted on the appropriate MAVLink channel. No state propagation, no IMU integration on the companion.
  • If a companion-side filter is justified later (e.g., to smooth visual fixes before they reach the FC, or to integrate VO with the FC's downsampled-IMU stream the companion can subscribe to), use vanilla ESKF (S69) for orientation correctness — but only after F-T9 SITL shows the FC's EKF3 cannot handle our raw input quality.
  • Confidence: High on dropping the companion-side EKF for v1; ⚠️ Medium on whether we'll need to re-introduce one for v1.x.
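
A sketch of the gate half of the "covariance calibrator + outlier gate": 2-DoF chi-square threshold, hand-inverted 2×2 covariance, all names assumed:

```python
CHI2_2DOF_99 = 9.21   # 99th-percentile chi-square, 2 degrees of freedom

def mahalanobis_sq(residual_xy, cov_2x2):
    (dx, dy), ((a, b), (c, d)) = residual_xy, cov_2x2
    det = a * d - b * c
    if det <= 0:
        return float("inf")        # degenerate covariance: always reject
    # r^T * inv(cov) * r for a 2x2 covariance, expanded by hand
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def gate(residual_xy, cov_2x2, inflate=1.0):
    """Accept a hypothesis only if consistent with its own (optionally
    re-scaled) covariance; rejected fixes never reach the MAVLink channel."""
    cov = [[v * inflate for v in row] for row in cov_2x2]
    return mahalanobis_sq(residual_xy, cov) <= CHI2_2DOF_99
```

The inflate factor is where the "covariance calibrator" hooks in: if empirical residuals show an upstream is over-confident, its covariance is re-scaled before gating and before emission.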

M-27 (Component 1b / Ortho-Tile Generator — use Orthority). Source: S59.

  • Orthority (Python, MIT-class) supports frame + RPC camera models, GeoTIFF DEM lookup, RPC refinement, pan-sharpening — i.e., everything draft02's hand-rolled pinhole-on-DEM ortho was going to reinvent.
  • Pip-installable (pip install orthority). API-driven (per-image ortho via Ortho class) → callable inline from our Component 1b worker.
  • ODM is post-processing batch SfM — wrong tier; not for per-frame ortho on a 1 km AGL nadir camera with known FC pose.
  • Verdict: replace draft02's "Pinhole projection on per-sector DEM" with Orthority frame-camera ortho. Falls back to a 6-line cv2.warpPerspective + bilinear DEM lookup if Orthority's per-frame latency on Orin Nano Super blows our budget — measure in F-T14.
  • Confidence: High on Orthority being the right tier; ⚠️ Medium on the latency assumption — needs measurement.

M-28 (Component 1 / tile storage — MBTiles WAL stays; PMTiles / COG considered). Source: COG/PMTiles search results + draft02 M-8.

  • COG: Highly-tiled COG metadata can trigger 500 MB initial download on a 7 GB file (geotiff.js issue #479) — defeats selective access on a bandwidth-constrained UAV system. Not a fit.
  • PMTiles: Single-file alternative to MBTiles, cloud-optimized. Good for HTTP serving (RPi tests show competitive performance). For our use case (local microSD, embedded reader+writer), PMTiles loses the SQLite-WAL concurrency story we already designed for in M-8.
  • Verdict: MBTiles + WAL (M-8) remains the right choice. No revision.
  • Confidence: High.

M-29 (Component 9 / orchestrator — ROS 2 vs DIY Python). Source: S64, S77.

  • ROS 2 Humble + JetPack 6 + Isaac ROS 3.2 + cuVSLAM + MAVROS is a proven reference architecture on Orin Nano Super (S64, S77).
  • If we adopt cuVSLAM (M-22/M-23), the lowest-friction path is to consume cuVSLAM via isaac_ros_visual_slam (ROS 2 wrapper) and bridge to the FC via MAVROS — not to re-export cuVSLAM's C++ API into a custom Python orchestrator.
  • ROS 2 cost: extra ~2–5 % CPU for DDS + topic serialization; learning curve for the team; deployment image grows ~200 MB.
  • ROS 2 benefit: free integration of cuVSLAM, MAVROS, Isaac ROS perception nodes; battle-tested; observability via ros2 bag and rqt_* tooling.
  • DIY Python alternative (draft02 plan): keeps everything in one asyncio process; lowest overhead; but we re-export every ROS 2 component we want to consume (cuVSLAM via Python bindings, MAVROS-equivalent via pymavlink, etc.).
  • Verdict: lean toward ROS 2 Humble + Isaac ROS for v1, with our matcher / VPR / ortho / FDR / fusion-glue nodes implemented as ROS 2 Python nodes (rclpy). Decision is not locked — it's the largest open architectural question for round 2 and the user should be asked.
  • Confidence: ⚠️ Medium — depends on whether the team has ROS 2 experience and whether the ~5 % CPU overhead is acceptable inside the latency budget. This is a Q for the user.

M-30 (Component 5 / hybrid GPS_INPUT + ODOMETRY — channel split per S65/S66/S67). Source: S65, S66, S67.

  • M-1 (round 1) said "emit BOTH GPS_INPUT AND ODOMETRY in parallel". S65/S66/S67 say only one position source per axis at a time and document concrete bugs when the FC sees two.
  • Revised channel split:
    • Option A (simplest, recommended for v1): GPS_INPUT carries position + velocity (lat/lon/alt + N/E/D velocities + h_acc/v_acc/vel_acc covariance scalars). ODOMETRY is disabled for v1. ArduPilot configured EK3_SRC1_POSXY = GPS, EK3_SRC1_VELXY = GPS, EK3_SRC1_YAW = GPS+Compass. Our companion provides a "GPS-equivalent" via GPS_INPUT (GPS1_TYPE=14); ArduPilot treats it identically to a real receiver. Failover to backup GPS via EK3_SRC2_*.
    • Option B (richer, v1.1+): ODOMETRY carries position + velocity + yaw + full 21-element covariance, GPS_INPUT carries fix only as fallback (not actively fused while ODOMETRY is healthy). ArduPilot configured EK3_SRC1_POSXY = ExternalNav, EK3_SRC1_YAW = ExternalNav, with EK3_SRC2_POSXY = GPS as backup. Requires PR #30080-class fixes for clean source switching.
  • Original M-1 (both channels for the same axis) is a misconfiguration, not a feature. Walk back.
  • Verdict: v1 ships Option A. Option B is v1.1 territory once F-T9 confirms source-switching behaves cleanly under PR #30080.
  • Confidence: High.
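
A sketch of the Option A payload assembly. Field names and scalings follow the MAVLink GPS_INPUT message definition (degE7 positions, UINT16_MAX hdop/vdop sentinel); the helper itself and the constants inside it are illustrative, not our locked wire code:

```python
def gps_input_fields(lat_deg, lon_deg, alt_m, vel_ned,
                     h_acc_m, v_acc_m, vel_acc_m):
    vn, ve, vd = vel_ned
    return {
        "gps_id": 0,
        "ignore_flags": 0,            # we populate every field we claim
        "fix_type": 3,                # report as a 3D-fix equivalent
        "lat": int(round(lat_deg * 1e7)),   # MAVLink degE7 scaling
        "lon": int(round(lon_deg * 1e7)),
        "alt": alt_m,
        "hdop": 65535.0, "vdop": 65535.0,   # unknown: UINT16_MAX sentinel
        "vn": vn, "ve": ve, "vd": vd,       # N/E/D velocity, m/s
        "horiz_accuracy": h_acc_m,          # h_acc / v_acc / vel_acc
        "vert_accuracy": v_acc_m,           # covariance scalars from the
        "speed_accuracy": vel_acc_m,        # gate upstream
        "satellites_visible": 12,     # plausible constant for FC checks
    }
```

With pymavlink this dict maps onto `gps_input_send` (plus the time fields omitted here); ArduPilot with GPS1_TYPE=14 then fuses it like a real receiver under EK3_SRC1_POSXY = GPS.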

M-31 (Component 6 / sysid sharing on the wire). Source: S65, S67.

  • Round 1 M-6 picked "distinct system-IDs for MAVSDK (sysid=10) and pymavlink (sysid=11), sharing the serial port via ArduPilot's native MAVLink routing — no router daemon".
  • This decision survives round 2 unchanged. The distinct-sysid trick + ArduPilot native routing is documented and works for any MAVLink2 stack. No router CVE exposure (M-6 / S45).
  • Open task: confirm the chosen sysids don't collide with any MAVLink2 forwarding rule on QGroundControl GCS-side; document in deploy runbook.
  • Confidence: High.

M-32 (Component 9 / Python topology — confirmed). Source: S55.

  • Round 1 M-10: stay on CPython 3.11/3.12; defer free-threaded 3.13 to v1.1. Survives round 2 unchanged.
  • If Component 9 moves to ROS 2 (M-29), the Python version question still applies — rclpy supports 3.11/3.12; 3.13 free-threaded is also experimental there.
  • Confidence: High.

M-33 (Component 2 / VPR — no new entrants worth adding). Source: round-2 searches.

  • Searched for newer VPR SOTA than DINOv2-SALAD / BoQ (CVPR 2024). The 2025 landscape is matcher-centric (RoMa v2, LiteSAM, MASt3R-SLAM); no new VPR backbone has displaced SALAD/BoQ on aerial cross-domain.
  • Round 1 shortlist {AnyLoc, SALAD, BoQ, MixVPR} stands.
  • Confidence: High.

M-34 (Component 4 / camera intrinsics learning — calibration-free SLAM). Source: S62.

  • MASt3R-SLAM is calibration-free; cuVSLAM expects intrinsics. Our nav cam (ADTi 20MP APS-C) will be calibrated pre-flight via standard checkerboard procedure → cuVSLAM's intrinsics requirement is not a friction point.
  • Confidence: High.

M-35 (Component 5 / IMU access on the companion — open question). Source: S64 reference designs.

  • The reference cuVSLAM-on-Jetson designs (S64) use the camera's built-in IMU (RealSense D435i) for VIO. Our nav cam (ADTi 20MP APS-C) has no IMU; the FC has the IMU.
  • Two paths to feed IMU into companion-side cuVSLAM:
    • (a) MAVLink RAW_IMU / SCALED_IMU stream from FC → companion subscribes via pymavlink, feeds cuVSLAM. ~1 kHz IMU on FC down-rated to ~200–400 Hz over MAVLink is sufficient for monocular VIO; latency budget acceptable.
    • (b) Add a dedicated companion-side IMU (BNO055 / ICM-42688P / Bosch BMI270 over SPI/I²C) with its own time sync. More hardware, but no MAVLink-bus contention.
  • Verdict v1: try path (a); if cuVSLAM's IMU sync sensitivity (timestamping) is too tight for MAVLink-rated IMU, fall back to (b) in v1.1.
  • Confidence: ⚠️ Medium — depends on cuVSLAM's tolerance for IMU rate / timing jitter; needs empirical check during integration.
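
Path (a)'s main risk is rate and timing jitter over MAVLink; a pre-integration sanity check is cheap to sketch (thresholds are placeholders, to be tuned against cuVSLAM's actual tolerance):

```python
def imu_stream_ok(timestamps_s, min_rate_hz=200.0, max_jitter_s=0.005):
    """True if the MAVLink-rated IMU stream is fast and regular enough
    to feed companion-side VIO; thresholds are assumed placeholders."""
    if len(timestamps_s) < 2:
        return False
    dts = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    mean_dt = sum(dts) / len(dts)
    jitter = max(abs(dt - mean_dt) for dt in dts)
    return (1.0 / mean_dt) >= min_rate_hz and jitter <= max_jitter_s
```

If this check fails on the real MAVLink link, that is the empirical trigger for falling back to path (b)'s dedicated companion-side IMU.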