# Fact Cards — Phase 1 (AC & Restrictions Assessment)

Each fact card: statement, source(s), confidence (High / Med / Low), audience.

---

## A — Position accuracy state of the art

**F-A1**. State-of-the-art UAV cross-view visual localization (drone image vs. ortho satellite map) at low altitude (30–300 m, multi-view, oblique allowed) achieves **74.1% recall@5 m** on the AnyVisLoc benchmark (best combined retrieval + matching + PnP).

- Source: S02 (AnyVisLoc paper, 2025).
- Confidence: High. Audience: implementer / decision-maker.

**F-A2**. **Cross-view image matching benchmarks** report Relative Distance Score (RDS) up to 84.40% and **MA@20 (matched within 20 m) up to 83.35%** in nadir-favoring setups — i.e., 80%+ within 20 m is achievable with current methods on similar reference data.

- Source: S39.
- Confidence: Med. Audience: implementer / decision-maker.

**F-A3**. The most relevant **fixed-wing aerial public benchmark** is UAV-VisLoc (6,742 drone images, fixed-wing & multi-rotor, **altitudes 405–840 m**, ortho satellite reference at **0.3 m/px** from Google Earth, 11 sites in China incl. cities/towns/farms/rivers/hills/forests).

- Source: S01.
- Confidence: High. Audience: implementer.

**F-A4**. **AerialVL** (RA-L 2024) is a fixed-wing UAV dataset with 11 sequences / ~70 km of trajectory, RGB camera with **gimbal**, NovAtel GNSS at **1.5 m RMS** ground truth, and reference satellite map. Provides VPR + visual alignment + VO baselines.

- Source: S03.
- Confidence: High. Audience: implementer.

**F-A5**. The **viewpoint discrepancy** (oblique aerial vs. nadir satellite) and **temporal staleness** (seasonal / construction change) are the two dominant accuracy degraders cited across cross-view localization literature. ViewBridge (2025), OrthoLoC (2025), and AnyVisLoc all emphasise BEV projection or 3D-grounded matching as mitigation.

- Source: S02, S36, S37.
- Confidence: High. Audience: technical expert.

**F-A6**.
Confidence-score schemes used by mature visual localization stacks: (a) RANSAC inlier ratio after PnP/homography; (b) reprojection error variance; (c) top-K retrieval similarity gap; (d) 6-DoF pose covariance from EKF/factor-graph optimization; (e) photometric consistency vs. tile.

- Source: S03, S04, S32 (and ORB-SLAM3 lit).
- Confidence: High. Audience: implementer.

---

## B — Image registration & feature matching

**F-B1**. **SuperPoint + LightGlue** with TensorRT runs at ~286 FPS on an RTX 3080 at 320×240: SuperPoint ≈ 0.95 ms, LightGlue ≈ 2.54 ms per pair.

- Source: S11.
- Confidence: High. Audience: implementer.

**F-B2**. **Jetson Orin NX (sibling SoC)** has a working LightGlue+TensorRT deployment (CUTLASS FlashAttention V2 plugin, qdLMF repo) — confirms feasibility on Jetson Orin–class hardware. No publicly released benchmark exists for the Jetson Orin Nano Super specifically.

- Source: S12.
- Confidence: Med. Audience: implementer.

**F-B3**. **XFeat** (CVPR 2024) is **5× faster** than LightGlue / SuperPoint while maintaining comparable accuracy; runs in real time on a budget CPU (i5-1135G7); offers a semi-dense matching mode; C++ + CUDA 12.2 implementations available.

- Source: S08.
- Confidence: High. Audience: implementer.

**F-B4**. **MASt3R** (ECCV 2024) achieves +30% absolute VCRE AUC on Map-free localization vs. prior SOTA — valuable for cross-view UAV/satellite due to its 3D-grounded matching, but it is **heavier** (transformer with depth backbone) than LightGlue/XFeat and may exceed the Jetson Orin Nano Super 8 GB envelope under the user's latency budget without aggressive distillation/quantization.

- Source: S09.
- Confidence: Med. Audience: technical expert.

**F-B5**. **Mean Reprojection Error <1 px** is a tight but achievable target for *homography fit* on overlapping aerial pairs; for full PnP across UAV–satellite, the typically achieved MRE is 1–3 px on cross-view benchmarks (heavily dependent on the pixel-scale ratio between drone and satellite).
- Sources: S01 (UAV-VisLoc), S03 (AerialVL), S36 (ViewBridge).
- Confidence: Med. Audience: technical expert.

---

## C — Resilience & re-localization

**F-C1**. **Aerial VPR survey + aero-vloc benchmark** (2024) provides a unified evaluation framework over AnyLoc, CosPlace, EigenPlaces, MixVPR, NetVLAD, SALAD, SelaVPR with re-ranking via LightGlue/SuperGlue. Datasets used: VPAir, ALTO, MARS-LVIG.

- Source: S04.
- Confidence: High. Audience: implementer.

**F-C2**. **AnyLoc** (DINOv2 + unsupervised VLAD) achieves up to 4× higher Recall@1 than environment-specialised approaches across urban / aerial / underwater / subterranean **without training**. Strong default for cross-view re-localization when training data is limited.

- Source: S05.
- Confidence: High. Audience: implementer / decision-maker.

**F-C3**. **MixVPR**: 94.6% R@1 on Pitts250k with <50% of the parameter count of NetVLAD — best lightweight VPR aggregation in 2023–2024.

- Source: S06.
- Confidence: High. Audience: implementer.

**F-C4**. **Tile-zoom / overlap selection** when constructing the satellite reference map is a **critical** parameter for VPR efficiency and accuracy in the aerial domain (per the 2024 survey).

- Source: S04.
- Confidence: High. Audience: implementer.

---

## D — Onboard real-time performance on Jetson Orin Nano Super

**F-D1**. **Jetson Orin Nano Super** (with JetPack 6.2 "Super Mode"): **67 TOPS sparse INT8** AI performance, 8 GB shared LPDDR5, supports **15 W / 25 W / MAXN SUPER** power modes. The 25 W mode is the new "reference" performance mode.

- Source: S14, S15.
- Confidence: High. Audience: implementer / decision-maker.

**F-D2**. **Sustained-load thermal throttling** is real on the Jetson family — the earlier-gen Xavier NX (21 TOPS) throttled within 5 minutes running YOLOv8n at 640×480. The Orin Nano Super is reportedly more thermally efficient, but **8-hour sustained 25 W operation requires forced-air cooling and possibly an active heatsink** — not solvable purely in software.
- Source: S14, S15 + practitioner test S14.
- Confidence: Med. Audience: implementer / decision-maker.

**F-D3**. **MAXN SUPER** is uncapped; if power exceeds TDP the module auto-throttles. For a sustained 8 h flight on a fixed-wing UAV with a ~25 W power budget, **the system MUST be sized to fit the 25 W envelope at 100% duty**, not MAXN.

- Source: S14.
- Confidence: High. Audience: implementer.

**F-D4**. Naive scaling from RTX 3080 → Orin Nano Super for SuperPoint+LightGlue suggests ~30–40× lower throughput (RTX 3080 ≈ 30 TFLOPS FP16 vs. roughly 1 TFLOPS-class FP16 on the Orin Nano Super). At 320×240: ≈ 3.5 ms × 35 ≈ **~120 ms/pair on Jetson Orin Nano Super**. Running matching on a downsampled image (e.g., 1024×683 from 6200×4100) is feasible within the **400 ms p95 budget** when combined with feature caching for the satellite tile.

- Source: derived from S11, S14 (back-of-envelope; needs empirical confirmation in Phase 2).
- Confidence: Low. Audience: technical expert.
- **Action**: Empirical benchmark on actual Jetson Orin Nano Super in implementation phase.

---

## E — Satellite imagery sourcing & legality

**F-E1**. **Google Maps / Map Tiles API** explicitly prohibits offline use, image analysis, machine interpretation, object detection, geodata extraction, and "any systems or functions for automatic or autonomous control of vehicle behavior". **Use of Google Maps satellite tiles for an offline UAV navigation system violates the Terms of Service.**

- Sources: S22 (Map Tiles API Policies), S23 (Maps Platform ToS).
- Confidence: **High** (two L1 sources, explicit language). Audience: decision-maker / legal.
- **Severity**: Hard blocker — must be resolved before solution design.

**F-E2**. **Bing Maps** also prohibits creating local copies / offline storage of tiles. Tile URLs are not stable; the supported access pattern is dynamic REST queries per session. **Bing tiles are not a viable offline reference source either.**

- Source: S24.
- Confidence: High. Audience: decision-maker.

**F-E3**.
**Maxar Vivid Mosaic** offers a **30 cm global basemap** (135 M km², ex-Antarctica) and a **15 cm urban basemap** (7 M km²), **continuously refreshed** with AI-driven change detection. Pricing for archive imagery is approximately **$25–32 / km²** for similar 30 cm products. **Licensing for offline tactical use must be negotiated explicitly with Maxar (Vantor)** — this is the standard path for defense customers.

- Sources: S25, S38.
- Confidence: High. Audience: decision-maker.

**F-E4**. **Airbus Pléiades Neo** provides 30 cm via OneAtlas; volume pricing approximately **€5–8.50 / km²** on a 6-month sliding window. Direct competitor to Maxar at sub-meter resolution.

- Sources: S26, S27.
- Confidence: High. Audience: decision-maker.

**F-E5**. **Sentinel-2 cloudless** (EOX) provides a **free** global mosaic but at **10 m/px** — well below the AC requirement of 0.5 m/px (ideally 0.3 m/px). At **1 km AGL**, Sentinel-2 is too coarse to register against a 24 cm/px drone image without massive scale-bridging losses.

- Source: S28 + S01 (drone GSD).
- Confidence: High. Audience: implementer.

**F-E6**. For **Eastern/Southern Ukraine** specifically, Sentinel-2 / Sentinel-1 are heavily used in 2022+ academic literature for damage / change detection. **Maxar and Planet are the de facto sources for sub-meter imagery** of Ukraine. Recent satellite imagery for this region is operationally sensitive but commercially available.

- Source: S28 (Ukraine 2024–2025 references in the EOX/Sentinel papers).
- Confidence: High. Audience: decision-maker.

**F-E7**. **Active-conflict-region staleness** is a real risk: dam destruction (Kakhovka), urban damage, cratering, road realignment, smoke/dust — all can defeat cross-view matching against pre-conflict imagery. **The imagery freshness budget should be tightened from "<2 years" to "<6 months for active sectors, <12 months for stable rear areas"** — to be confirmed with operations.
- Source: S28 + extrapolation from the change-detection literature for Ukraine.
- Confidence: Med. Audience: decision-maker.

---

## F — Camera & GSD

**F-F1**. **GSD formula**: GSD (cm/px) = (Altitude_m × 100 × Sensor_w_mm) / (Focal_mm × Image_w_px). For a typical full-frame sensor (36 mm wide) with a 24 mm wide-angle lens at **1 km AGL** and a **6200 px** wide image: GSD ≈ **24 cm/px**, frame footprint ≈ **1.49 km × 0.99 km**. Drone imagery at 0.1–0.2 m/px (per UAV-VisLoc) is consistent with this.

- Source: S29.
- Confidence: High. Audience: implementer.

**F-F2**. **Camera intrinsics calibration is mandatory** — without known focal length, principal point, and distortion, sub-pixel MRE is impossible. Pre-flight checkerboard calibration is the standard; some payloads use factory calibration + temperature compensation.

- Source: photogrammetry consensus (S01, S03, S29).
- Confidence: High. Audience: implementer.

---

## G — MAVLink / MAVSDK / flight controller integration

**F-G1**. **GPS_INPUT** is a standard MAVLink message. **ArduPilot**: set `GPS1_TYPE=14` (MAVLink) and the autopilot will accept GPS_INPUT as the primary GPS. **PX4**: native GPS_INPUT support is limited; the standard workaround is to publish VISION_POSITION_ESTIMATE through the EKF2 vision-pose pipeline.

- Sources: S16, S17, S18.
- Confidence: High. Audience: implementer / decision-maker.

**F-G2**. **MAVSDK-Python does NOT natively support GPS_INPUT** (open issue #320). For Python implementations, **pymavlink** must be used to emit raw GPS_INPUT messages.

- Source: S18.
- Confidence: High. Audience: implementer.

**F-G3**. ArduPilot can **blend or switch between GPS sources** by quality (sat count, HDOP). If the legitimate (jammed) GPS keeps reporting plausible values while the spoofed/denied state is intermittent, the autopilot may oscillate between sources. **The companion computer must explicitly disable or down-weight the real GPS** (or the autopilot must be configured to trust *only* GPS_INPUT) to avoid this.
- Source: S33.
- Confidence: High. Audience: implementer / security architect.

**F-G4**. **PX4 has GPS spoofing detection** baked into the EKF2 driver chain (u-blox spoof flag, ~1 s hysteresis, GNSS-fusion auto-disable on a consistent spoof signal). This is a useful upstream signal telling the GPS-Denied system "you are now the primary source".

- Sources: S19, S20.
- Confidence: High. Audience: implementer / security architect.

**F-G5**. **PX4 failsafe delay** `COM_POS_FS_DELAY` defaults to **1 s**; `EKF2_NOAID_TOUT` controls dead-reckoning validity. Documented bugs exist (#23970) — version pinning matters.

- Source: S21.
- Confidence: Med. Audience: implementer.

**F-G6**. **QGroundControl** has only **STATUSTEXT** (string) as a first-class companion-computer message channel; ONBOARD_COMPUTER_STATUS (planned) and custom MAVLink messages (NAMED_VALUE_FLOAT/INT, custom dialect) are practical channels for a re-localization-request UI / confidence scores.

- Sources: S34, S35.
- Confidence: High. Audience: implementer.

---

## H — Object localization (AI camera, gimbal-only pose)

**F-H1**. **Trigonometric ground-projection error** with **gimbal angles only** (no airframe IMU attitude fusion onto the AI cam) is dominated by the **unknown UAV roll/pitch** at the moment of capture. For a fixed-wing UAV, typical roll/pitch in straight cruise is ±2°; in turns, up to ±25°. At **1 km AGL**, a 5° unknown attitude → ~87 m ground-position error. **The AC "object localization accuracy is consistent with frame-center accuracy" is therefore unrealistic without attitude fusion in turning flight.**

- Source: derived from F-F1 + standard photogrammetry trig.
- Confidence: High. Audience: technical expert / decision-maker.
- **Action**: revise the AC to "consistent with frame-center accuracy in level flight; expect ±h·tan(unknown_attitude) in turns" OR add attitude fusion onto the AI cam.

**F-H2**.
**Flat-terrain assumption** is reasonable for eastern/southern Ukraine (typical relief amplitude ~50–150 m over 10 km). At 1 km AGL with up to 5° gimbal off-nadir, terrain-induced ground-projection error from the flat-terrain assumption is typically <30 m in level flight — within the AC envelope. Riverbanks, tall buildings, and reservoir scarps are local exceptions.

- Source: derived from S26 + S28 + Ukraine relief data.
- Confidence: Med. Audience: technical expert.

---

## I — Hardware envelope & power

**F-I1**. Jetson Orin Nano Super in 25 W mode draws ~25 W average; with cooling adequately sized for 8-hour duty, sustained throttling can be avoided. Without active cooling, expect throttling within minutes (Xavier NX precedent).

- Source: S14, S15.
- Confidence: Med. Audience: implementer.

**F-I2**. **Storage budget**: the user's "~10 GB" estimate for a 400 km² @ 0.3 m/px tile cache is **correct** (400 km² × ~11 px/m² ≈ 4.4 Gpx; at ~3 bytes/px before JPEG compression ≈ 10–13 GB). A persistent cache across flights is feasible with a small NVMe (≥64 GB).

- Source: arithmetic; cross-checked against S25 (Vivid pricing per km²).
- Confidence: High. Audience: implementer / decision-maker.

---

## J — Failsafe & resilience

**F-J1**. PX4's own GPS-loss failsafe defaults to ~1 s delay. A reasonable upstream **"system fails to produce an estimate" failsafe `N`** for the GPS-Denied system is **3–5 seconds** — long enough to ride out one sharp turn / re-localization attempt without flapping, short enough to let the flight controller switch to IMU dead reckoning before drift exceeds tens of metres.

- Source: S21 + practitioner heuristic.
- Confidence: Med. Audience: implementer / decision-maker.

---

## K — Public datasets for IMU / aerial dev & test

**F-K1**. **No public dataset perfectly matches all four constraints**: fixed-wing + ~1 km AGL + downward-facing + synchronized IMU + GPS truth. **The closest match is AerialVL** (fixed-wing + gimbal RGB + GNSS, ~70 km of tracks, 11 sequences, RA-L 2024).
Altitude band for AerialVL is "different altitudes" (not always 1 km).

- Source: S03.
- Confidence: High. Audience: implementer.

**F-K2**. **UAV-VisLoc** is the largest fixed-wing **drone-vs-satellite localization** dataset (6,742 images, 405–840 m altitudes, 0.3 m/px Google Earth reference) — but it does not provide synchronized IMU.

- Source: S01.
- Confidence: High. Audience: implementer.

**F-K3**. **MidAir** (synthetic, quadcopter) provides full IMU + GPS + depth + semantic at low altitude. Good for **training-time augmentation** but not real-world testing for fixed-wing at 1 km AGL.

- Source: S30.
- Confidence: High. Audience: implementer.

**F-K4**. **Recommended dev/test stack**: AerialVL (primary real-world fixed-wing) + UAV-VisLoc (visual-localization-only validation at 1 km–neighborhood altitude) + MidAir (synthetic IMU augmentation) + the user's own 65 input-data photos for sanity / regression. Real IMU from a dedicated test flight should still be planned for system V&V.

- Source: synthesis of S01, S03, S30.
- Confidence: High. Audience: decision-maker.

---

## L & M — Restriction & AC gaps / contradictions

**F-LM1**. **Restriction "up to 3000 photos per flight"** is **inconsistent** with the stated 8-hour endurance × 3 fps = **86,400 photos** and with the 500 ms minimum interval × 8 h = 57,600 photos. Likely interpretations:

- (a) On-disk **retention** budget (sub-sample for storage).
- (b) Imagery for an *individual mission segment* (~17 min × 3 fps = 3,000), not the full sortie.
- (c) A stale value carried over from a Mavic 3 attempt that should be updated.
- **Hard contradiction**: needs user resolution before solution sizing.

**F-LM2**. **Camera resolution range "FullHD to 6252×4168"** is wide (~13× pixel-count delta). Per-frame pipeline cost scales with resolution; AC compliance is camera-dependent. Need to lock the **target camera spec** for AC validation.

**F-LM3**. **Latency 400 ms vs.
cycle 333 ms (3 fps)**: the user has confirmed `<400 ms p95` with skip-allowed. This is **internally consistent**; the AC should be re-stated as "p95 latency <400 ms; up to ~10% of frames may be dropped under sustained load" to remove the apparent contradiction with the frame rate.

**F-LM4**. **Suggested missing AC** (gap analysis):

- **L2** — Time-to-first-fix on cold start / mid-flight reboot (e.g., <30 s after IMU-extrapolated init).
- **L3** — Spoofing-promotion latency (system asserts its estimate over flight controller GPS within X seconds of denial).
- **L4** — Flight-data-recorder requirement (all photos + estimates + confidence + IMU traces at full rate, retained in non-volatile storage with a budgeted size cap).
- **L5** — False-position safety budget (e.g., probability of an estimate >500 m from truth must be <0.1% per flight).
- **L6** — Operational temperature / vibration envelope (MIL-STD-810 lite or RTCA DO-160G low-altitude variant).
- **L7** — Imagery freshness operationally enforced (e.g., reject tiles older than 12 months for active sectors).

**F-LM5**. **Restriction "Google Maps allowed"** is **legally not allowed** per F-E1/E2. The project must change source to a license-cleared provider (Maxar Vivid / Airbus Pléiades / commissioned tasking / government feed) before deployment. **This is a blocker, not a tweak.**

---

## Mode B Findings — adversarial assessment of `solution_draft01.md` (2026-04-26)

**M-1 (Component 6 / AC-4.3) — ODOMETRY is ArduPilot's preferred external-nav channel, not GPS_INPUT.** ArduPilot's own dev docs (S41) call **ODOMETRY "the preferred method"** for sending external position estimates to EKF3, ahead of both VISION_POSITION_ESTIMATE and GPS_INPUT. ODOMETRY carries quaternion + 3-D linear velocity + a **21-element pos+attitude covariance** (incl. native yaw error) + a `quality` field (-1=failed, 0=unset, 1..100); `VISO_QUAL_MIN` lets the FC reject messages below a quality floor.
GPS_INPUT collapses our 6-DoF covariance into a scalar `h_acc` / `v_acc`, which directly under-reports our yaw covariance and under-utilises the FC's EKF3. The draft's GPS_INPUT-only choice is sub-optimal for AC-NEW-4 (false-position safety) covariance fidelity.

- Source: S41, S42, S43.
- Confidence: ✅ High.

**M-2 (Component 3) — MASt3R is not viable as primary on Orin Nano Super at 25 W.** `mast3r-runtime` (S57) lists Jetson Orin support as **"Planned"**, not implemented. *Speedy MASt3R* (S57 paper-side) achieves 91 ms / pair on an **A40 GPU**, which is roughly **30× the throughput** of a Jetson Orin Nano Super in 25 W mode → MASt3R extrapolates to **~2.5–3 s / pair** on our target hardware without aggressive distillation/INT8 work that nobody has published yet. Drop MASt3R from the matcher *primary* shortlist; keep it only as a long-horizon research target.

- Source: S57.
- Confidence: ✅ High.

**M-3 (Component 3) — Add GIM (ICLR 2024 spotlight) to the bench-off shortlist.** GIM (S48) is a self-training framework that takes existing matchers (LightGlue, RoMa, DKM, LoFTR) and re-trains them on 50 h of internet videos for **8.4–18.1%** zero-shot improvement. The "generalist trained on diverse video" framing is the closest published proxy for our domain transfer (eastern-Ukraine 1 km AGL nadir vs. service satellite tiles). GIM-LightGlue should be included alongside vanilla LightGlue.

- Source: S48.
- Confidence: ✅ High.

**M-4 (Component 2) — Add SALAD (DINOv2 + Sinkhorn-VLAD) and BoQ to the VPR shortlist.** Two CVPR 2024 papers landed after the draft's "AnyLoc primary + MixVPR fast-lane" decision was made:

- **DINOv2 SALAD** (S47) — DINOv2 backbone + optimal-transport Sinkhorn aggregator with a "dustbin" cluster for non-informative features. R@1 = **75.0%** on MSLS Challenge, **92.2%** on MSLS Val, **76.0%** on NordLand. Already a supported method in `aero-vloc` (S04), so direct apples-to-apples bench against AnyLoc/MixVPR.
- **BoQ** (S46) — bag of learnable queries with cross-attention; **outperforms NetVLAD, MixVPR, EigenPlaces** on 14 large-scale benchmarks; surpasses two-stage methods (Patch-NetVLAD, TransVPR, R2Former) at lower cost; DinoV2 results published Nov 2024.

AnyLoc is no longer the only DINOv2-based VPR option in the cross-domain regime; the bench-off must include all four.

- Source: S46, S47.
- Confidence: ✅ High.

**M-5 (Component 2 / 9 / latency) — DINOv2-base latency on Orin Nano Super is ~10× better than the draft assumed.** Jetson AI Lab measurements (S40): **DINOv2-base-patch14 = 126 inferences/sec on Orin Nano Super** (~8 ms/inf at 224×224), 75 inf/s on the original Orin Nano (~13 ms/inf). The draft estimated 50–80 ms / 224×224. The latency budget therefore has substantially more headroom than the draft assumed — **but only at 224×224**; at higher input resolution, expect roughly quadratic scaling in image side length (so 448×448 ≈ 32 ms/inf is still very comfortable inside the 400 ms p95 budget). This is a **good-news** finding that simplifies AC-4.1.

- Source: S40.
- Confidence: ✅ High (NVIDIA L1 source; precision implied FP16 from JetPack 6.2 default trtexec).

**M-6 (Component 6 / Security) — `mavlink-router` is itself attack surface.** Issue #436 (S45): a public, easily triggered, fuzzing-discovered **stack-based buffer overflow** in `ConfFile::get_sections` (memcpy of user-controlled section names into a 100-byte fixed buffer with no bounds check, plus an OOB write on null-terminator append). The repo has **no formal security policy / no SECURITY.md**. The draft's "share the MAVLink endpoint via a single mavlink-router instance" recipe drops a known-vulnerable C++ daemon onto a flight-critical companion. Mitigation options:

1. Pin to a fixed-and-audited tag, harden the systemd unit (NoNewPrivileges, ReadOnlyPaths, sandbox), and validate the config file before launch.
2.
Replace mavlink-router with a tiny in-process MAVLink endpoint multiplexer (Python or Go; this is ~150 lines of code given the only consumers are MAVSDK + pymavlink + the mavlink-router replacement → FC).
3. Use distinct system IDs for MAVSDK and pymavlink and let ArduPilot's native MAVLink routing (S35-class) do the muxing on the FC side.

- Source: S45.
- Confidence: ✅ High.

**M-7 (Component 6 / Security) — MAVLink2 signing is a v1-mandatory configuration item, not "recommended".** S44: signing is per-link, **USB bypasses signing**, keys live in FRAM (32-byte secret + timestamp), configured via Mission Planner (or the MAVProxy `signing` module). It works in ArduPilot 4.5+, but key provisioning is a **per-airframe operator step** that needs a documented procedure. Given that GPS_INPUT (or ODOMETRY) is a high-trust local channel feeding the flight-critical EKF, a signed MAVLink link companion↔FC is the only defence against an attacker who gains serial access. The draft mentions signing under "Security note (deferred to a Phase-4 security pass)" — Mode B promotes it to v1-required.

- Source: S44.
- Confidence: ✅ High.

**M-8 (Component 1 / Tile Cache) — MBTiles SQLite under our concurrent read+write workload needs WAL + a connection pool + transaction batching.** S54: the canonical `mbtiles` SQLite failure modes are (a) `database is locked` errors when concurrent writers compete with readers (the default rollback journal is single-writer), and (b) per-tile commit overhead crippling throughput on non-SSD storage. Recipe:

- `PRAGMA journal_mode = WAL` (mandatory for mixed read+write).
- Connection pool (cf. `MbtilesPool` from maplibre/martin S54) — multiple read connections + one write connection.
- Transaction batching: bulk insert per N tiles per Component-1b cycle, not per tile.
- Disable per-INSERT commit; rely on the transaction boundary.

The draft's tile-cache section says "MBTiles SQLite + per-tile metadata" but doesn't specify these. Add as a hard implementation note.

- Source: S54.
- Confidence: ✅ High.

**M-9 (Component 1b / Tile Dedup — *new safety risk*) — onboard tile overwrites can poison the cache.** The draft's dedup rule:

> If cache has a tile and the cache tile's `source ∈ {service}` AND the cache tile's `capture_date` is older than the AC-8.2 freshness threshold AND our quality score > existing → **write** (overwrites with `source = onboard`).

The risk: a confidently-bad onboard pose (an over-confident EKF covariance escapes the σ_xy ≤ 10 m gate) writes a tile that is misaligned by, say, 30–50 m, but with a high inlier count. Next flight, that misaligned tile becomes the satellite anchor for *another* fix → the error compounds across flights. **This is a feedback-loop safety hazard that AC-NEW-4 (false-position budget) does not currently capture**, because Monte-Carlo over a single flight doesn't model the cross-flight cache-poisoning amplification. Mitigations (any of, ideally all):

1. **Service-source tiles are immutable within the freshness budget.** Onboard tiles overwrite only stale or other-onboard tiles, never a fresh service tile.
2. **Voting layer at the Service ingest.** An onboard tile gets promoted to "trusted basemap" only after **N≥2 independent flights** confirm consistent geo-alignment within X m of each other.
3. **Quality score includes parent-pose covariance as a hard gate**, not just inlier count: a tile written from σ_xy > 5 m (tighter than the 10 m generation gate) is marked as "soft" and flagged in the sidecar.
4. **An additional AC**: "AC-NEW-7 — cache-poisoning safety" — see the proposed addition in `solution_draft02.md`.

- Source: derived analytical finding (no single L1/L2 — this is a design-level hazard exposed by Mode B reasoning).
- Confidence: ⚠️ Medium (the hazard is real and well-known in cartography/SfM; the specific mitigation choice is empirical).
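The M-9 write and read gates can be sketched as follows. All names, thresholds, and `Tile` fields here are illustrative assumptions for discussion, not the draft's actual schema; mitigations 1 and 3 become a write gate, mitigation 2 a read-side trust gate:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative thresholds -- the real values come from AC-8.2 and the sigma_xy gates.
FRESHNESS_BUDGET = timedelta(days=365)
SIGMA_XY_HARD_GATE_M = 5.0    # mitigation 3: tighter than the 10 m generation gate
PROMOTION_FLIGHTS = 2         # mitigation 2: N independent confirming flights

@dataclass
class Tile:
    source: str               # "service" or "onboard"
    capture_date: date
    quality: float            # e.g. matcher inlier count / score
    parent_sigma_xy_m: float = 0.0
    confirmations: int = 0    # independent flights agreeing on geo-alignment

def may_overwrite(existing: Tile, candidate: Tile, today: date) -> bool:
    """Write gate: the draft's dedup rule with mitigations 1 and 3 layered on."""
    # Mitigation 1: a service tile inside its freshness budget is immutable.
    if existing.source == "service" and today - existing.capture_date <= FRESHNESS_BUDGET:
        return False
    # Mitigation 3: reject writes from over-confident poses via a hard covariance gate.
    if candidate.parent_sigma_xy_m > SIGMA_XY_HARD_GATE_M:
        return False
    # Draft rule: otherwise write only if the candidate improves the quality score.
    return candidate.quality > existing.quality

def is_trusted_basemap(tile: Tile) -> bool:
    """Read gate (mitigation 2): onboard tiles anchor fixes only after N confirmations."""
    return tile.source == "service" or tile.confirmations >= PROMOTION_FLIGHTS
```

The point of splitting the gates is that even a tile that legitimately wins the write gate stays out of the anchor set until independent flights confirm it, which breaks the cross-flight amplification loop.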
**M-10 (Component 9 / Process topology) — Free-threaded Python 3.13 is not v1-ready.** S55: free-threading is **experimental**, has a "substantial single-threaded performance hit", many C extensions don't yet support it, and the GIL **auto-re-enables on import of any non-FT-aware extension** (which would silently include numba, possibly TensorRT bindings, possibly older pymavlink). The draft's choice (single asyncio Python process + TRT subprocess workers + numba on the hot path) is correct for v1 — but the rationale should be sharpened from "GIL is a risk we mitigate" to **"free-threaded Python is not yet a substitute; revisit in v1.1 once NumPy/SciPy/numba/TRT bindings stabilise on PEP 703."**

- Source: S55.
- Confidence: ✅ High.

**M-11 (Component 5 / W4.a) — ODOMETRY for fixed-wing in ArduPilot has known production gotchas.** S42 confirms ODOMETRY landed in Dec 2021; S43 (PR #30080, "External nav+gps fix", merged 2025) shows ongoing work on the source-switching path when running external-nav alongside GPS. Practitioner-reported issues from the S41/S42 discussions:

- velocity errors when companion-computer-derived velocity is fed into EKF3,
- position-estimate resets when external-nav loses reference,
- conflicts when running external-nav alongside GPS.

This is directly relevant to AC-NEW-2 (3 s spoofing-promotion latency) — the source switch is exactly the path that has known bugs. Mode B's recommended hybrid (GPS_INPUT primary + ODOMETRY when full covariance is available) needs SITL coverage of source-switching scenarios as a hard prerequisite, not a v1.1 follow-up.

- Source: S41, S42, S43.
- Confidence: ✅ High.

**M-12 (Component 1b / R-Terrain) — Eastern-Ukraine relief amplitude breaks the "flat enough" assumption near frame edges.** S56: a Kharkiv-region UAV survey reports **~24 m peak-to-trough relief** between low and high points in the test areas, with creek + gully (yary/balky) systems.
At 1 km AGL with a 35° HFOV camera, a 24 m elevation deviation at the frame edge produces ~17 m of horizontal misalignment when projected via the flat-Earth assumption. That is **inside AC-1.1** (50 m@80%) but **eats into AC-1.2** (20 m@50%, hard-floor variant). Recommended addition: a per-sector DEM lookup (one-time, pre-flight) that classifies sectors as "flat" (≤5 m amplitude), "moderate" (5–15 m), or "rugged" (>15 m). The system uses tile-anchor weight decay or skips ortho-tile generation in rugged sectors.

- Source: S56.
- Confidence: ⚠️ Medium (S56 is one regional survey; relief varies across the operational area).

**M-13 (Datasets) — TartanAir V2 is a stronger synthetic baseline than MidAir; flag for user reconsideration.** S51: TartanAir V2 is photo-realistic (AirSim) with **native IMU + 12-cam rigs + 65 environments + season/weather variation + custom camera models**. The draft drops synthetic IMU per user instruction (AC-NEW-4 validation rewritten in solution_draft01). The user's stated reason: Mavic-class dynamics ≠ fixed-wing dynamics. TartanAir V2 lets us **configure motion patterns**, so the dynamics-mismatch argument is weaker for TartanAir than for MidAir. **This is a real choice for the user**: either keep "real-data only" purism, or add TartanAir V2 as an early-bench-off-only baseline. Surface to the user as an open question, not a unilateral change.

- Source: S51.
- Confidence: ⚠️ Medium (technical viability is high; product/operator preference is the user's call).

**M-14 (Component 3 / W1.c) — Add AerialExtreMatch and 2chADCNN to the matcher V&V plan for season/viewpoint robustness.** Two underweighted benchmarks:

- **AerialExtreMatch** (S49): 1.5 M synthetic image pairs with **32 difficulty levels** crossing overlap × scale × pitch — the exact failure-mode profile for our 1 km AGL operational regime. Real-world UAV localization subset for end-to-end validation.
- **2chADCNN** (S50): a season-aware UAV↔satellite template-matching reference.
Either include them as bench-off candidates (vs. generic GIM/RoMa), or as a season-robustness *benchmark* that the bench-off candidates run against.

- Source: S49, S50.
- Confidence: ✅ High.

**M-15 (Component 4) — Real fixed-wing monocular VO is harder than the draft implies.** S52: SVO, DSO, and ORB-SLAM2 all "had significant difficulty maintaining localisation" on real fixed-wing flights at altitude. S53: high-altitude (300–1000 m AGL) VIO publishes drift numbers in the same band as our AC-1.3. Conclusion: the draft's choice ("custom 2-frame homography VO using the Component-3 matcher") is **right** for our framing (VO between satellite anchors, not standalone metric SLAM), but the AC-1.3 drift budget (<100 m without IMU, <50 m with IMU) needs validation against real fixed-wing footage — *not* Mavic-class footage — before lock.

- Source: S52, S53.
- Confidence: ✅ High.

---

## Mode B Findings — second adversarial pass (user-driven, 2026-04-26)

**M-16 (Component 2 / Granularity) — the VPR retrieval unit must be decoupled from the storage-tile boundary.** The Mode A and Mode B drafts both said "FAISS IVF over per-tile DINOv2-VLAD vectors" using **storage tiles at z=20** (~154 m × 154 m ground). A 1 km AGL nadir frame covers **30–100 such tiles** depending on lens. Cosine similarity between a frame descriptor (covering ~600 × 450 m) and a tile descriptor (covering 154 × 154 m) is fundamentally mismatched and noisy. None of the published aerial-VPR systems do it this way:

- **AerialVL** (S03) preprocesses the reference satellite map into **frame-footprint-sized reference chunks** matched to the expected drone-frame ground coverage.
- **AnyLoc** (S05) uses overlapping macro-windows scaled to the query footprint on aerial data.
- **NaviLoc** uses a sliding-window descriptor over the basemap.

**Conclusion**: the storage tile (z=20, 512×512) stays as the dedup / orthorectification unit.
The **VPR chunk** is a separate concept: ground-footprint chunks sized to the expected frame coverage with **40–50% overlap** so any frame footprint lands cleanly inside ≥1 chunk. Optionally multi-scale (one set per altitude band). The index is over chunks, not tiles. - Source: re-reading S03 + S05 with the granularity question in mind; verified against the user-surfaced gap. - Confidence: ✅ High. The error mode is well-known in the aerial-VPR literature; the original draft just under-specified the retrieval unit. **M-17 (Component 2 / Invocation policy) — VPR is a re-loc-trigger module, not an every-frame module.** Per the Component 5 EKF analysis, in steady state (recent anchor < 2 s, σ_xy < 20 m, VO healthy), a geometric prior from the IMU + VO predicted position is enough to pick top-K candidate VPR chunks by **distance alone** — no DINOv2 forward pass needed. VPR's value is concentrated in the resilience paths: - **AC-NEW-1 cold start** — no IMU prior at all → VPR is the only viable way to narrow the search. - **AC-3.2 sharp turn** — VO fails, IMU prior degrades fast → VPR re-anchors. - **AC-3.3 disconnected segment** — explicitly requires "global descriptor retrieval" — VPR. - **σ_xy growth** — when the EKF position covariance grows past σ_xy ≥ 50 m, the geometric prior is too wide; VPR re-narrows it. **Conclusion**: the control flow is `if (steady_state) { use geometric prior } else { invoke VPR }`. This saves ~10–35 ms/frame and lets the VPR backbone idle (one less concurrent process during cruise). The DINOv2-base TRT engine still has to be resident in GPU memory for fast invocation.
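The M-17 steady-state gate reduces to a pure predicate; a minimal sketch (the 2 s / 20 m thresholds come from M-17 above, while the function name and signature are illustrative, not from any implemented module):

```python
def should_invoke_vpr(last_anchor_age_s: float,
                      sigma_xy_m: float,
                      vo_healthy: bool) -> bool:
    """Return True when full VPR retrieval (DINOv2 forward + FAISS search)
    is required; False when the geometric prior alone may pick candidate
    chunks. Steady-state per M-17: anchor < 2 s old, sigma_xy < 20 m,
    VO healthy."""
    steady_state = (last_anchor_age_s < 2.0
                    and sigma_xy_m < 20.0
                    and vo_healthy)
    return not steady_state

# Cruise: recent anchor, tight covariance, healthy VO -> geometric prior only.
assert should_invoke_vpr(0.5, 8.0, True) is False
# Sharp turn (AC-3.2): VO lost -> invoke VPR to re-anchor.
assert should_invoke_vpr(0.5, 8.0, False) is True
# Covariance growth: sigma_xy past the band -> re-narrow via VPR.
assert should_invoke_vpr(0.5, 55.0, True) is True
```

The predicate is cheap enough to evaluate every frame; only its True branch costs a DINOv2 forward pass.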
**M-18 (Component 2 / Fallback) — expanding-window retry on unconvincing top-1.** Standard pattern in re-loc literature: if top-1 VPR similarity is below threshold OR top-1/top-2 gap is below threshold (both signs that VPR is unsure), **expand the candidate set to adjacent chunks** (±1 chunk in each direction = 8 neighbours in a regular grid; or radius-N expansion for sparse-overlap layouts) before failing over to operator-assisted re-loc. Cheap to add: same FAISS index, larger K, no extra DINOv2 forward. - Source: standard relocalization pattern (cf. ORB-SLAM3, GISNav, NGPS implementations). - Confidence: ✅ High. **M-19 (Component 2 / Active-conflict robustness) — multi-scale chunks + OSM road overlay + sector-driven K + negative cache.** Active-conflict scene change (destroyed buildings, cratering, dam flooding, road realignment) is a frequent operational reality in the eastern/southern Ukraine deployment, not an edge case. Layered mitigations beyond M-16/17/18: - **Multi-scale VPR chunks**: maintain BOTH fine-scale (z=20-derived) and coarse-scale (z=17/18-effective) chunk descriptor sets. Coarse-scale descriptors capture road-network + field-boundary + waterway structure that survives building destruction. ~12 MB extra disk, ~3 min one-time pre-flight DINOv2 forward. - **OSM road-network overlay**: extract OSM road geometry for the operational area pre-flight as a binary "road-mask" tile sidecar; matcher applies bonus inlier weighting on keypoints that fall on road edges. GISNav uses this pattern. Roads are the single most change-stable feature in active-conflict zones. - **Sector volatility classification drives K** (binds to AC-NEW-6 `sector_class`): K=5 stable / K=20 active / K=50 expanding-window-fallback. - **Onboard-tile rapid promotion in active sectors**: refines M-9's 2-flight voting — single-flight promotion allowed in active sectors when σ_xy ≤ 3 m AND OSM-road-overlap ≥ 70 % (dual gate keeps safety). 
- **Negative cache**: tiles repeatedly rejected by matcher across flights get `trust_level = stale_destroyed`, excluded from retrieval until Service refresh. The two highest-leverage of these are multi-scale chunks and OSM overlay; the rest are essentially free. - Source: derived from M-9, M-16, M-17, M-18 + standard cartographic-stability reasoning + GISNav reference architecture; user-driven concern about active-conflict scene change frequency. - Confidence: ✅ High on multi-scale + OSM (literature-backed); ⚠️ Medium on the OSM-road-overlap-≥-70 % numeric threshold (needs empirical calibration). **M-20 (Component 1) — Storage tile zoom level pinned at z=20.** Trade-off analysis in response to user question (z=18 vs z=20): - ADTi 20MP APS-C @ 1 km AGL with 24–50 mm lens → frame GSD in 8–18 cm/px range. Mid-range (~35 mm lens) → ~12 cm/px. - Frame-vs-reference scale ratio at z=20 (30 cm/px): **2.5×** — well within the SP+LG / GIM-LightGlue "well-handled" band (≤4× per published IMW-style benchmarks). - Frame-vs-reference scale ratio at z=18 (~120 cm/px): **10×** — outside the SP+LG well-handled band; sub-pixel keypoint-correspondence accuracy degrades sharply, pushing AC-1.2 (50 % @ 20 m) and AC-2.2 (MRE < 2.5 px) into risk territory. - Storage @ z=20 over 400 km² ≈ 2.8 GB cache + 30 MB DEM + 16 MB VPR chunk index ≈ 3 GB total — **28 % of the 10 GB budget**, leaving 7 GB headroom for FDR overflow and multi-scale chunks (M-19). - Storage @ z=18 over 400 km² ≈ 220 MB total — saves ~2.5 GB but provides no operational benefit at our budget level. - Pre-flight compute: z=20 takes ~5 min; z=18 takes ~3 min. Both trivial on the bench. Not a deciding factor. - **Decision: z=20** for the storage tile. The accuracy benefit is meaningful; the storage cost fits comfortably. Folded into restrictions.md. - Source: derived analysis using ADTi camera spec + Mode B finding S40 (DINOv2 latency) + IMW-style matcher-resolution-mismatch data. - Confidence: ✅ High. 
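The M-20 scale-ratio arithmetic can be reproduced from pinhole geometry; a sketch assuming generic APS-C 20MP sensor dimensions (23.5 mm width, 5472 px across — illustrative values, not taken from the ADTi spec sheet):

```python
def frame_gsd_m(altitude_m: float, focal_mm: float,
                sensor_width_mm: float = 23.5, width_px: int = 5472) -> float:
    """Nadir ground sample distance (m/px) for a pinhole camera:
    GSD = altitude * pixel_pitch / focal_length."""
    pixel_pitch_m = (sensor_width_mm / 1000.0) / width_px
    return altitude_m * pixel_pitch_m / (focal_mm / 1000.0)

gsd = frame_gsd_m(1000.0, 35.0)   # mid-range lens at 1 km AGL -> ~0.12 m/px
ratio_z20 = 0.30 / gsd            # vs z=20 reference (30 cm/px)   -> ~2.5x
ratio_z18 = 1.20 / gsd            # vs z=18 reference (~120 cm/px) -> ~10x
```

Plugging in the 24 mm and 50 mm lens extremes reproduces the 8–18 cm/px frame-GSD band quoted in M-20.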
**M-21 (2chADCNN re-classification) — ceiling reference, NOT bench-off candidate.** Closer reading of S50 (MDPI Drones 2023) reveals 2chADCNN is structurally incompatible with our bench-off: - **Output format**: template-overlap region (IoU-style), not sub-pixel keypoints. Component 3's PnP needs keypoint correspondences; 2chADCNN can't supply them. - **Tested altitude band**: 252–500 m AGL, not 1 km. Their experimental envelope doesn't cover our regime. - **No Jetson / TRT benchmark**: evaluated only on an Intel i5 CPU with 8 GB RAM. - **Method paradigm**: traversal-search template matching (slide the template over the satellite image at every position, compute similarity). Doesn't scale to a 400 km² operational area within our latency budget. - **Reported numbers**: real-summer overlap-IoU 0.92–0.99; synthetic-snow overlap-IoU 0.82–0.95. Useful as a published season-robustness *number* against which we benchmark our chosen modern matcher (SP+LG / GIM-LightGlue) — but not as a candidate for the matcher slot itself. This walks back the "optionally a bench-off candidate" tag in M-14: 2chADCNN is **purely a season-robustness ceiling reference**. Newer / more relevant season-aware references for the open-research reading list: - **AFF-CNN-HTransformer cross-perspective UAV-satellite matching** (Sci Reports 2025) — hybrid CNN+Transformer cross-view + season. - **Polar-coordinate-transformation rotation-and-season-invariant UAV-satellite matching** (2026) — explicitly addresses both rotation and season; intersects nicely with our IMU-driven de-rotation step. - Source: closer reading of S50 + new search results 2025-2026. - Confidence: ✅ High on the 2chADCNN re-classification; ⚠️ Medium on the newer papers (need to read full PDFs before bench-off inclusion).
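The M-18 expanding-window retry reduces to a small wrapper around the FAISS query; a sketch where `search(k)` stands in for the index lookup and both confidence thresholds are placeholders pending empirical calibration:

```python
def retrieve_with_retry(search, k: int,
                        sim_thresh: float = 0.65,
                        gap_thresh: float = 0.05):
    """Expanding-window retry on an unconvincing top-1 (M-18 pattern).

    `search(k)` stands in for a FAISS query returning (chunk_id, similarity)
    pairs sorted by descending similarity. If the top-1 score is low OR the
    top-1/top-2 gap is small, widen the candidate set (same index, larger K,
    no extra DINOv2 forward) before failing over to operator-assisted re-loc.
    """
    hits = search(k)
    top1 = hits[0][1]
    gap = top1 - hits[1][1] if len(hits) > 1 else top1
    if top1 >= sim_thresh and gap >= gap_thresh:
        return hits           # confident: accept as-is
    return search(3 * k)      # unsure: pull in neighbouring chunks

# Toy index: one strong hit followed by weak ones -> accepted directly.
ranked = [("c7", 0.91), ("c3", 0.40), ("c9", 0.38),
          ("c1", 0.30), ("c2", 0.28), ("c4", 0.25)]
assert len(retrieve_with_retry(lambda k: ranked[:k], 2)) == 2
# Ambiguous top-1/top-2 -> the window expands to 3*K candidates.
ambiguous = [("c7", 0.52), ("c3", 0.51)] + ranked[2:]
assert len(retrieve_with_retry(lambda k: ambiguous[:k], 2)) == 6
```

For sector-driven K (M-19), the caller simply passes K=5/20/50 according to `sector_class` before this wrapper runs.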
--- ## Mode B Round 2 (component replacements & sweep) — appended 2026-04-26 **M-22 (Component 4 / VO architecture) — custom 2-frame homography VO is the wrong design.** Source: S52 (AFIT thesis), S60 (cuVSLAM), S64 (Isaac ROS UAV reference), S72 (high-altitude VIO), S73 (DPV-SLAM). - Draft02 C-4 says "custom 2-frame VO via SuperPoint+LightGlue homography". This skips loop closure, sparse bundle adjustment, and keyframe-based local mapping — every mechanism that bounds drift in production VO/SLAM systems. - The AFIT thesis (S52) shows even ORB-SLAM2 / SVO / DSO struggle on real fixed-wing flights; a hand-rolled 2-frame homography VO will be strictly worse. - High-altitude VIO field test (S72): stereo-VIO = 2.186 m / 800 m at 40–100 m AGL; monocular-VIO is "acceptable but worse". At 1 km AGL, per-frame motion parallax shrinks by ~10–25×, further degrading monocular VO. - **Recommendation: replace the custom 2-frame VO with cuVSLAM (S60, S64) in monocular + IMU mode.** - Confidence: ✅ High on "custom 2-frame VO is wrong"; ⚠️ Medium on "cuVSLAM is the right replacement" — high-altitude fixed-wing performance is unproven on cuVSLAM's published benchmarks (KITTI urban driving + EuRoC indoor MAV). Bench-off in F-T1b mandatory. **M-23 (Component 4 / VO candidate evaluation on Jetson Orin Nano Super).** Source: S60, S61, S62, S71, S73, S76. - **cuVSLAM (S60)**: NVIDIA-supported, CUDA-optimized, drop-in via `isaac_ros_visual_slam`, Apache-2.0. Reference designs on Orin Nano (S64, S77) confirm runtime feasibility. <1% ATE on KITTI / <5 cm on EuRoC. **Verdict: v1 lead candidate.** - **DPVO / DPV-SLAM (S61, S73)**: SOTA deep VO, but DPVO-QAT++ is benchmarked on an RTX 4060, not a Jetson. Original DPVO runs at 2–5× real-time on an RTX 3090 (4 GB) → Orin Nano Super extrapolation ≈ 4–10 FPS without QAT, ≈ 6–15 FPS with QAT. **Borderline for the 10 Hz target; not v1.** - **MASt3R-SLAM (S62)**: 15 FPS on a single GPU; sub-1 Hz extrapolated on Orin Nano Super.
**Infeasible for inline v1.** - **VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71)**: Classical, well-tested, but they require manual integration (OpenCV pinning, ArUco fixes, DDS / ROS plumbing) and have no Jetson-class CUDA acceleration of the front-end. Higher integration cost than cuVSLAM with no accuracy advantage. - **Custom 2-frame homography VO (current draft02 plan)**: M-22 already disqualified it. - Confidence: ✅ High. **M-24 (Component 3 / cross-view matcher — LiteSAM evaluation).** Source: S58. - LiteSAM is **purpose-built for satellite↔aerial AVL in GPS-denied environments**. Its architectural choices (TAIFormer + MinGRU sub-pixel refinement) are tailored to large appearance variations and texture-scarce regions — exactly our regime. - Results: 6.31 M params (2.4× smaller than EfficientLoFTR); RMSE@30 = 17.86 m on UAV-VisLoc; 61.98 ms on a standard GPU; **497.49 ms on Jetson AGX Orin** (FP16-optimized). - **Crucial extrapolation**: AGX Orin INT8 throughput ≈ 275 TOPS, Orin Nano Super ≈ 67 TOPS → 4× scaling factor → **LiteSAM on Orin Nano Super ≈ 1500–2000 ms / pair**. Well outside our 400 ms p95 budget for inline use. - **Three useful roles (not the inline matcher)**: - (a) **Re-localization fallback** — invoked rarely (cold start, σ_xy > 50 m), where 1.5–2 s latency is tolerable. - (b) **Validation oracle** — ground-truth-quality matches for the offline regression bench. - (c) **Distillation teacher** — train a smaller student model with LiteSAM-supervised correspondences for the satellite-aerial domain. - **Verdict: add LiteSAM in roles (a)/(b)/(c); SP+LG (TRT FP16/INT8) remains the inline matcher.** - Confidence: ✅ High on architectural fit; ⚠️ Medium on the 4× AGX-Orin → Orin Nano Super scaling — needs empirical confirmation in the bench-off. **M-25 (Component 3 / cross-view matcher — RoMa v2 / MapGlue / MATCHA).** Source: S63 + earlier MapGlue / MATCHA notes. - **RoMa v2 (S63)**: SOTA dense matcher, frozen DINOv3 + custom CUDA + predictive covariance. GPU-class compute.
Infeasible inline on Orin Nano Super; viable as **offline ceiling reference** for Component 3 bench-off. - **MapGlue / MATCHA**: Cross-modal/multimodal matchers — useful research-track candidates but no Jetson deployment data; same offline-only verdict. - **Verdict**: not a v1 candidate; offline ceiling reference. The matcher bench-off (deferred research item) MUST include both as ceilings so we know how much accuracy we're trading away by using SP+LG inline. - Confidence: ✅ High. **M-26 (Component 5 / EKF→ESKF question — architectural reframing).** Source: S65, S66, S67, S68, S69. - **The FC (ArduPilot 4.5+) runs EKF3, a classical extended Kalman filter — not an ESKF.** PX4 EKF2 is the ESKF (S68); we are not on PX4. We cannot swap the FC's filter. - The "EKF vs ESKF" debate therefore applies **only to the companion-side filter** (Component 5 in draft02). - **Best practice for ArduPilot ExtNav setups (S65, S66, S67)**: companion does NOT run a heavy filter on top. Companion produces (visual fix → GPS_INPUT) and/or (relative pose → ODOMETRY) with well-calibrated covariances; ArduPilot EKF3 fuses those with the FC's IMU. - ArduPilot issues #30076 (S65) and #32506 (S66) document concrete failure modes when feeding the FC two simultaneous position sources — **only one position source per axis at a time**. The hybrid `GPS_INPUT + ODOMETRY` plan from M-1 must therefore split responsibilities by **channel**, not duplicate position on both. - **Architectural revision**: the companion-side EKF in draft02's C-5 is **not necessary** for v1. It can be replaced by a lightweight **"covariance calibrator + outlier gate + source-label producer"**: each upstream (matcher, VO, IMU passthrough if any) emits a hypothesis with a covariance; a Mahalanobis gate rejects outliers; covariances are re-scaled if empirical residuals indicate over- or under-confidence; results are emitted on the appropriate MAVLink channel. No state propagation, no IMU integration on the companion. 
- **If a companion-side filter is justified later** (e.g., to smooth visual fixes before they reach the FC, or to integrate VO with the FC's downsampled-IMU stream the companion can subscribe to), use **vanilla ESKF (S69)** for orientation correctness — but only after F-T9 SITL shows the FC's EKF3 cannot handle our raw input quality. - Confidence: ✅ High on dropping the companion-side EKF for v1; ⚠️ Medium on whether we'll need to re-introduce one for v1.x. **M-27 (Component 1b / Ortho-Tile Generator — use Orthority).** Source: S59. - Orthority (Python, MIT-class) supports frame + RPC camera models, GeoTIFF DEM lookup, RPC refinement, pan-sharpening — i.e., everything draft02's hand-rolled pinhole-on-DEM ortho was going to reinvent. - Pip-installable (`pip install orthority`). API-driven (per-image ortho via `Ortho` class) → callable inline from our Component 1b worker. - ODM is post-processing batch SfM — wrong tier; not for per-frame ortho on a 1 km AGL nadir camera with known FC pose. - **Verdict: replace draft02's "Pinhole projection on per-sector DEM" with Orthority frame-camera ortho.** Falls back to a 6-line `cv2.warpPerspective` + bilinear DEM lookup if Orthority's per-frame latency on Orin Nano Super blows our budget — measure in F-T14. - Confidence: ✅ High on Orthority being the right tier; ⚠️ Medium on the latency assumption — needs measurement. **M-28 (Component 1 / tile storage — MBTiles WAL stays; PMTiles / COG considered).** Source: COG/PMTiles search results + draft02 M-8. - **COG**: Highly-tiled COG metadata can trigger 500 MB initial download on a 7 GB file (geotiff.js issue #479) — defeats selective access on a bandwidth-constrained UAV system. Not a fit. - **PMTiles**: Single-file alternative to MBTiles, cloud-optimized. Good for HTTP serving (RPi tests show competitive performance). For our use case (local microSD, embedded reader+writer), PMTiles loses the SQLite-WAL concurrency story we already designed for in M-8. 
- **Verdict: MBTiles + WAL (M-8) remains the right choice.** No revision. - Confidence: ✅ High. **M-29 (Component 9 / orchestrator — ROS 2 vs DIY Python).** Source: S64, S77. - ROS 2 Humble + JetPack 6 + Isaac ROS 3.2 + cuVSLAM + MAVROS is a **proven reference architecture on Orin Nano Super** (S64, S77). - If we adopt cuVSLAM (M-22/M-23), the lowest-friction path is to consume cuVSLAM via `isaac_ros_visual_slam` (ROS 2 wrapper) and bridge to the FC via MAVROS — not to re-wrap cuVSLAM's C++ API in a custom Python orchestrator. - **ROS 2 cost**: extra ~2–5 % CPU for DDS + topic serialization; learning curve for the team; deployment image grows ~200 MB. - **ROS 2 benefit**: free integration of cuVSLAM, MAVROS, Isaac ROS perception nodes; battle-tested; observability via `ros2 bag` and `rqt_*` tooling. - **DIY Python alternative** (draft02 plan): keeps everything in one asyncio process; lowest overhead; but we must re-implement every ROS 2 component we want to consume (cuVSLAM via Python bindings, a MAVROS-equivalent via pymavlink, etc.). - **Verdict: lean toward ROS 2 Humble + Isaac ROS for v1**, with our matcher / VPR / ortho / FDR / fusion-glue nodes implemented as ROS 2 Python nodes (`rclpy`). Decision is **not locked** — it's the largest open architectural question for round 2 and the user should be asked. - Confidence: ⚠️ Medium — depends on whether the team has ROS 2 experience and whether the ~5 % CPU overhead is acceptable inside the latency budget. **This is a Q for the user.** **M-30 (Component 5 / hybrid GPS_INPUT + ODOMETRY — channel split per S65/S66/S67).** Source: S65, S66, S67. - M-1 (round 1) said "emit BOTH GPS_INPUT AND ODOMETRY in parallel". S65/S66/S67 say **only one position source per axis at a time** and document concrete bugs when the FC sees two. - **Revised channel split**: - Option A (simplest, recommended for v1): **GPS_INPUT carries position + velocity** (lat/lon/alt + N/E/D velocities + h_acc/v_acc/vel_acc covariance scalars).
ODOMETRY is **disabled** for v1. ArduPilot configured `EK3_SRC1_POSXY = GPS`, `EK3_SRC1_VELXY = GPS`, `EK3_SRC1_YAW = GPS+Compass`. Our companion provides a "GPS-equivalent" via GPS_INPUT (`GPS1_TYPE=14`); ArduPilot treats it identically to a real receiver. Failover to backup GPS via `EK3_SRC2_*`. - Option B (richer, v1.1+): **ODOMETRY carries position + velocity + yaw + full 21-element covariance**, GPS_INPUT carries **fix only as fallback** (not actively fused while ODOMETRY is healthy). ArduPilot configured `EK3_SRC1_POSXY = ExternalNav`, `EK3_SRC1_YAW = ExternalNav`, with `EK3_SRC2_POSXY = GPS` as backup. Requires PR #30080-class fixes for clean source switching. - **Original M-1 (both channels for the same axis) is a misconfiguration**, not a feature. Walk back. - **Verdict**: v1 ships Option A. Option B is v1.1 territory once F-T9 confirms source-switching behaves cleanly under PR #30080. - Confidence: ✅ High. **M-31 (Component 6 / sysid sharing on the wire).** Source: S65, S67. - Round 1 M-6 picked "distinct system-IDs for MAVSDK (sysid=10) and pymavlink (sysid=11), sharing the serial port via ArduPilot's native MAVLink routing — no router daemon". - This decision survives round 2 unchanged. The distinct-sysid trick + ArduPilot native routing is documented and works for any MAVLink2 stack. No router CVE exposure (M-6 / S45). - Open task: confirm the chosen sysids don't collide with any MAVLink2 forwarding rule on QGroundControl GCS-side; document in deploy runbook. - Confidence: ✅ High. **M-32 (Component 9 / Python topology — confirmed).** Source: S55. - Round 1 M-10: stay on CPython 3.11/3.12; defer free-threaded 3.13 to v1.1. Survives round 2 unchanged. - If Component 9 moves to ROS 2 (M-29), the Python version question still applies — `rclpy` supports 3.11/3.12; 3.13 free-threaded is also experimental there. - Confidence: ✅ High. **M-33 (Component 2 / VPR — no new entrants worth adding).** Source: round-2 searches. 
- Searched for newer VPR SOTA than DINOv2-SALAD / BoQ (CVPR 2024). The 2025 landscape is matcher-centric (RoMa v2, LiteSAM, MASt3R-SLAM); no new VPR backbone has displaced SALAD/BoQ on aerial cross-domain benchmarks. - Round 1 shortlist {AnyLoc, SALAD, BoQ, MixVPR} stands. - Confidence: ✅ High. **M-34 (Component 4 / camera intrinsics — calibration-free SLAM).** Source: S62. - MASt3R-SLAM is calibration-free; cuVSLAM expects intrinsics. Our nav cam (ADTi 20MP APS-C) will be calibrated pre-flight via a standard checkerboard procedure → cuVSLAM's intrinsics requirement is **not** a friction point. - Confidence: ✅ High. **M-35 (Component 5 / IMU access on the companion — open question).** Source: S64 reference designs. - The reference cuVSLAM-on-Jetson designs (S64) use the camera's built-in IMU (RealSense D435i) for VIO. Our nav cam (ADTi 20MP APS-C) has no IMU; the FC has the IMU. - Two paths to feed the IMU into companion-side cuVSLAM: - (a) MAVLink `RAW_IMU` / `SCALED_IMU` stream from FC → companion subscribes via pymavlink, feeds cuVSLAM. **~1 kHz IMU on the FC down-rated to ~200–400 Hz over MAVLink** is sufficient for monocular VIO; latency budget acceptable. - (b) Add a dedicated companion-side IMU (BNO055 / ICM-42688P / Bosch BMI270 over SPI/I²C) with its own time sync. More hardware, but no MAVLink-bus contention. - **Verdict v1**: try path (a); if cuVSLAM's IMU sync sensitivity (timestamping) is too tight for a MAVLink-rated IMU stream, fall back to (b) in v1.1. - Confidence: ⚠️ Medium — depends on cuVSLAM's tolerance for IMU rate / timing jitter; needs empirical check during integration.
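As a concrete acceptance check for path (a), the companion can log IMU timestamps for a few seconds and verify rate and jitter before trusting the MAVLink-rated stream; a minimal sketch (the 200 Hz rate floor and 1.5 ms jitter ceiling are illustrative assumptions, not published cuVSLAM requirements):

```python
def imu_stream_ok(timestamps_us, min_rate_hz=200.0, max_jitter_us=1500.0):
    """Decide whether a MAVLink-rated IMU stream is adequate for monocular
    VIO: mean rate at or above min_rate_hz, and worst-case deviation of any
    inter-sample interval from the mean at or below max_jitter_us."""
    dts = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    mean_dt = sum(dts) / len(dts)
    rate_hz = 1e6 / mean_dt
    worst_jitter = max(abs(dt - mean_dt) for dt in dts)
    return rate_hz >= min_rate_hz and worst_jitter <= max_jitter_us

# A clean 250 Hz stream (4 ms spacing) passes; one 20 ms gap mid-stream fails.
clean = [i * 4000 for i in range(100)]
gappy = clean[:50] + [t + 16000 for t in clean[50:]]
assert imu_stream_ok(clean) is True
assert imu_stream_ok(gappy) is False
```

If this check fails repeatedly on the bench, that is the trigger for the path (b) dedicated-IMU fallback.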