gps-denied-onboard/_docs/01_solution/solution.md
Oleksandr Bezdieniezhnykh 9eba1689b3 - Introduced a new document detailing the current state of the autodev process, including steps, status, and findings.
- Revised acceptance criteria in the acceptance_criteria.md file to clarify metrics and expectations, including updates to GPS accuracy and image processing quality.
- Enhanced restrictions documentation to reflect operational parameters and constraints for UAV flights, including camera specifications and satellite imagery usage.
- Added new research documents for acceptance criteria assessment and question decomposition to support ongoing project evaluation and decision-making.
2026-04-26 14:28:10 +03:00

# Solution Draft 03
> **Mode**: B (Solution Assessment of `solution_draft02.md`).
> **Inputs**: `solution_draft02.md` (Mode B round 1) + `_docs/00_research/{03_mode_b_decomposition_round2,04_reasoning_chain_mode_b_round2,05_validation_log_mode_b_round2}.md` + Mode B round-2 sources S58–S77 in `01_source_registry.md` + Mode B round-2 fact cards M-22..M-35 in `02_fact_cards.md`.
> **Date**: 2026-04-26 (Mode B round 2).
> **Self-contained**: yes — supersedes `solution_draft02.md`.
>
> **What changed in round 2** (driven by user-explicit asks: VO, matcher, EKF/ESKF, ortho-tile generator + thorough sweep):
>
> - **Component 4 (VO)**: replace draft02's *custom 2-frame homography VO via SP+LG* with **cuVSLAM** (NVIDIA, CUDA-accelerated, drop-in via `isaac_ros_visual_slam`) in monocular + IMU mode (M-22, M-23, S60, S64).
> - **Component 5 (Fusion)**: **drop the companion-side EKF entirely for v1**. Replace with a lightweight **covariance calibrator + Mahalanobis outlier gate + source-label producer** — no state propagation, no IMU integration on the companion (M-26). Let ArduPilot EKF3 do the actual fusion. The "EKF vs ESKF" question becomes: *if* we re-introduce a companion filter in v1.x, use vanilla ESKF (S68, S69) — but for v1 the question is moot.
> - **Component 5 (Hybrid output)**: walk back round-1 M-1's "emit BOTH GPS_INPUT AND ODOMETRY in parallel for the same axis" — that triggers ArduPilot EKF3 double-fusion bugs (S65, S66, S67). v1 ships **GPS_INPUT only** (Option A in M-30); ODOMETRY-primary mode is v1.1 territory.
> - **Component 3 (Matcher)**: **SP+LG (TRT FP16/INT8) remains the inline matcher**; **LiteSAM (S58) added in three non-inline roles**: re-localization fallback (cold start, σ_xy > 50 m), validation oracle, distillation teacher (M-24). RoMa v2 (S63), MASt3R-SLAM (S62), MapGlue, MATCHA added to the matcher bench-off as **offline ceiling references** (M-25).
> - **Component 1b (Ortho-Tile Generator)**: replace draft02's hand-rolled "pinhole projection on per-sector DEM" with **Orthority** (S59) — Python library, frame + RPC camera, GeoTIFF DEM, pip-installable. Documented fall-back to `cv2.warpPerspective + bilinear DEM` if F-T14 latency measurement fails (M-27).
> - **Component 9 (Software platform)**: **ROS 2 Humble + Isaac ROS 3.2** chosen (Q6 → A, locked 2026-04-26). Natural pair for cuVSLAM and a published reference architecture on Orin Nano Super (S64, S77, M-29). DDS overhead (~25 % CPU, ~200 MB image growth) accepted in exchange for free integration of `isaac_ros_visual_slam`, MAVROS, and `ros2 bag` / `rqt_*` observability tooling.
> - **Component 1 (Tile storage)**, **C-2 (VPR)**, **C-6 (MAVLink)**, **C-7/C-8/C-10/C-11**: unchanged from draft02 (M-28, M-31, M-33).
>
> **Locked-in user decisions carried over from round 1** (unchanged):
>
> - **Q1** → A: GPS_INPUT primary channel (now: ONLY channel for v1 — see M-30 above).
> - **Q2** → A: distinct system-IDs via ArduPilot native MAVLink routing; **no `mavlink-router` daemon**.
> - **Q3** → A: AC-NEW-7 thresholds confirmed at P(>30 m)<1 %, P(>100 m)<0.1 % per flight.
> - **Q4** → A: TartanAir V2 included as early-stage synthetic baseline.
> - **Q5** → B (round 1): proceed to Plan in fresh conversation. **Round 2 was triggered after rollback for additional component-replacement investigation.**
> - Camera spec → ADTi 20MP 20L V1 APS-C; storage zoom → z=20.
>
> **Round-2 user decisions locked-in (2026-04-26)**:
>
> - **Q6** → A: **ROS 2 Humble + Isaac ROS 3.2** as the v1 orchestrator (M-29). DIY Python orchestrator dropped. Codified in Component 9.
> - **Q7** → A: **MAVLink `RAW_IMU` / `SCALED_IMU` from FC** (path a) as the v1 IMU source for cuVSLAM (M-35). Dedicated companion IMU is a v1.1 hardware revision triggered only if F-T1c shows sync-jitter problems. Codified in Component 4.
---
## Assessment Findings (Round 2 additions)
The round-1 findings table (15 rows: M-1 … M-21, including addenda M-19/M-20/M-21) carries forward unchanged. **Round 2 adds the following findings, with the same `old → weak → new` pattern**:
| Old Component Solution (round 1) | Weak Point (round 2 evidence) | New Solution (round 2) |
|----------------------------------|-------------------------------|------------------------|
| **C-4 (round 1)**: "custom 2-frame VO via SuperPoint+LightGlue / GIM-LightGlue homography." | **Functional, high (M-22)**. Custom 2-frame homography skips loop closure, sparse bundle adjustment, and keyframe-based local mapping, i.e. every mechanism that bounds drift in production VO/SLAM. AFIT thesis (S52) shows even ORB-SLAM2/SVO/DSO struggle on real fixed-wing flights; a hand-rolled 2-frame variant will be strictly worse. At 1 km AGL motion parallax shrinks ~10–25× per frame vs 100 m AGL, further degrading monocular VO. | **Replace with cuVSLAM** (NVIDIA, CUDA-accelerated, Apache-2.0; S60, S64). Monocular + IMU mode, drop-in via `isaac_ros_visual_slam` ROS 2 wrapper. <1 % ATE on KITTI / <5 cm on EuRoC. Fixed-wing 1 km AGL behaviour empirically TBD: bench-off in F-T1b mandatory before AC-1.3 lock. |
| **C-4 (round 1)**: same row, alternatives. | **Functional (M-23)**. Deep-VO alternatives evaluated for Orin Nano Super: DPVO/DPV-SLAM (S61, S73) extrapolate to 4–15 FPS, borderline for our 10 Hz target; MASt3R-SLAM (S62) is sub-1 Hz on Orin Nano Super and therefore infeasible; VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71) require non-trivial integration cost with no accuracy advantage over cuVSLAM. | cuVSLAM is **lead**; DPV-SLAM / VINS-Fusion / OpenVINS retained as **bench-off fall-backs** if cuVSLAM underperforms on fixed-wing 1 km AGL. MASt3R-SLAM / RoMa v2 reserved for **offline ceiling references**. |
| **C-3 (round 1)**: "SP+LG (TRT FP16) lead, GIM-LightGlue peer, RoMa/DKM bench-off, MASt3R dropped." | **Functional, positive (M-24)**. LiteSAM (S58, MDPI Oct 2025) is purpose-built for satellite↔aerial AVL: 6.31 M params (2.4× smaller than EfficientLoFTR), RMSE@30 = 17.86 m on UAV-VisLoc, beats EfficientLoFTR. **But on Jetson Orin Nano Super, extrapolated latency is ~1500–2000 ms / pair** (AGX Orin → Orin Nano Super 4× scaling), which is outside our 400 ms p95 budget for inline use. | **Add LiteSAM in three non-inline roles**: (a) re-localization fallback (cold start, σ_xy > 50 m, 1.5–2 s tolerable); (b) validation oracle for offline regression bench; (c) distillation teacher to train a satellite-aerial-specialised student model that fits the inline budget. **Inline matcher remains SP+LG / GIM-LG.** |
| **C-3 (round 1)**: same row, ceilings. | **Functional, positive (M-25)**. RoMa v2 (S63, Nov 2025): SOTA dense matcher with frozen DINOv3 backbone + custom CUDA + predictive covariance — best published pose-estimation accuracy. MASt3R-SLAM (S62), MapGlue, MATCHA: cross-modal/multimodal matchers with strong specialisation. All GPU-class compute. | **Add RoMa v2, MASt3R, MapGlue, MATCHA to the matcher bench-off as offline ceiling references** so we know how much accuracy we trade by using SP+LG inline. None becomes inline candidate. |
| **C-5 (round 1, M-1)**: "Onboard loosely-coupled EKF emits two parallel MAVLink streams: GPS_INPUT (primary) AND ODOMETRY (auxiliary, when available) for the same axis." | **Functional, safety, high (M-26, M-30)**. ArduPilot ExtNav best practice (S65, S66, S67): **only one position source per axis at a time**. Open issues #30076 and #32506 document concrete EKF3 misbehaviours when both ExtNav (ODOMETRY) and GPS (GPS_INPUT) are fed for overlapping axes — including unstable position with high variances and Z-axis snap-to-ODOMETRY. The "emit both in parallel" framing was a misconfiguration, not a feature. | **v1 ships GPS_INPUT only** (Option A in M-30). ODOMETRY emission disabled in v1. ArduPilot configured `EK3_SRC1_*=GPS+Compass`; failover via `EK3_SRC2_*`. **Option B (ODOMETRY-primary) is v1.1 work** once F-T9 SITL confirms PR #30080-class source-switching is clean. |
| **C-5 (round 1)**: "loosely-coupled EKF in our process." | **Architectural (M-26)**. The companion-side EKF was always going to feed the FC's own EKF3 → double-fusion. Visual fix → companion EKF → ArduPilot EKF3 stacks two filters on overlapping observations, breaks the single-source-per-axis invariant, and risks the same instability documented in #30076/#32506. | **Drop the companion-side EKF for v1.** Component 5 becomes a **"covariance calibrator + Mahalanobis outlier gate + source-label producer"** — no state propagation, no IMU integration. Each upstream (matcher, cuVSLAM) emits a hypothesis with covariance; outliers are gated; covariances are re-scaled if empirical residuals show over- or under-confidence; results are emitted on the appropriate MAVLink channel. **If v1.x evidence demands a companion-side filter**, use vanilla **ESKF** (S68, S69) — the right family for orientation correctness. |
| **C-1b (round 1)**: "Pinhole projection on per-sector DEM (flat-Earth in flat sectors; SRTM-30 m DEM lookup in moderate sectors)." | **Engineering (M-27)**. Implicit hand-rolled implementation reinvents distortion handling, RPC refinement, DEM bilinear lookup, and projection, all of which exist in the **Orthority** Python library (S59) under MIT-class licence, pip-installable. | **Use Orthority for per-frame ortho** (frame-camera mode). Falls back to `cv2.warpPerspective + bilinear DEM` (~5–20 ms estimated) if F-T14 measurement shows Orthority's per-frame latency on Orin Nano Super exceeds the 50 ms allotted to ortho. |
| **C-9 (round 1)**: "Single Python process (asyncio) on CPython 3.11/3.12; TRT subprocess workers." | **Architectural (M-29)**. With cuVSLAM adoption (M-23), the natural integration path is `isaac_ros_visual_slam` (ROS 2 wrapper) → MAVROS → FC. Re-exporting cuVSLAM into a custom asyncio orchestrator is high-friction. **ROS 2 Humble + JetPack 6 + Isaac ROS 3.2 is a published, working reference design on the exact hardware target** (S64, S77). | **Resolved (Q6 → A, locked 2026-04-26)**: ROS 2 Humble + Isaac ROS 3.2 chosen over the DIY Python orchestrator. ROS 2 cost: ~25 % CPU (DDS + topic serialisation), ~200 MB image growth, learning curve. ROS 2 benefit: free integration of cuVSLAM, MAVROS, observability via `ros2 bag` / `rqt_*`. Codified in Component 9. |
(Round-1 findings M-1 through M-21 — including the Phase-1-correction addenda — remain unchanged in their original form; round-2 supersedes only the rows above. Full round-1 rationale lives in `solution_draft02.md` for traceability and `_docs/00_research/02_fact_cards.md`.)
---
## Product Solution Description (Revised)
A companion-computer software stack that runs on the **Jetson Orin Nano Super** alongside an **ArduPilot 4.5+** flight controller and provides **GPS-equivalent position fixes** to the autopilot when real GPS is jammed, spoofed, or denied.
**Localization pipeline (per frame at 3 fps nav cam):**
1. **cuVSLAM** (monocular + IMU from FC `RAW_IMU` MAVLink stream) provides drift-bounded **relative pose** with keyframe-based local mapping + sparse bundle adjustment + loop closure.
2. **VPR** (DINOv2 SALAD/BoQ chosen by bench-off; AnyLoc fallback) narrows the satellite basemap to a top-K candidate-chunk shortlist on re-localization triggers (cold start, sharp turn, σ_xy > 50 m) — **conditional invocation** keeps cruise overhead near zero.
3. **Cross-view matcher** (SP+LG TRT FP16 inline; GIM-LightGlue peer in the bench-off; LiteSAM as **re-loc fallback**) produces sub-pixel keypoint correspondences against the candidate chunks; PnP yields an **absolute pose** + covariance.
4. **Component 5** (**covariance calibrator + Mahalanobis outlier gate + source-label producer** — *not* an EKF) consumes the absolute pose + cuVSLAM relative pose; rejects outliers; re-scales covariances; emits result on the appropriate MAVLink channel.
5. **GPS_INPUT** (`GPS1_TYPE=14`, MAVLink2-signed, pymavlink) is sent to the FC. ArduPilot EKF3 (24-state classical EKF, 400 Hz) does the actual fusion of our GPS-equivalent fix with its own IMU, baro, compass.
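
Step 5 can be illustrated with a minimal sketch of how a calibrated fix might be packed into MAVLink `GPS_INPUT` fields. The field names follow the MAVLink common message definition; the helper names, the `hdop` / `vdop` / `satellites_visible` placeholder values, and the fixed 18 s GPS-UTC offset are assumptions made for illustration, not values taken from this design.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
GPS_UTC_LEAP_S = 18  # assumption: GPS-UTC offset in force since 2017, no later leap second
WEEK_MS = 7 * 24 * 3600 * 1000


def gps_week_and_ms(utc):
    """Return (GPS week, milliseconds into that week) for a timezone-aware UTC datetime."""
    gps_ms = (utc - GPS_EPOCH) // timedelta(milliseconds=1) + GPS_UTC_LEAP_S * 1000
    return gps_ms // WEEK_MS, gps_ms % WEEK_MS


def build_gps_input_fields(utc, lat_deg, lon_deg, alt_msl_m, vel_ned_mps,
                           sigma_xy_m, sigma_z_m, sigma_v_mps):
    """Map a pose fix plus 1-sigma accuracies onto GPS_INPUT field names (hypothetical helper)."""
    week, week_ms = gps_week_and_ms(utc)
    vn, ve, vd = vel_ned_mps
    return {
        "time_usec": (utc - UNIX_EPOCH) // timedelta(microseconds=1),
        "gps_id": 0,
        "ignore_flags": 0,              # 0 = every field below is valid
        "time_week_ms": week_ms,
        "time_week": week,
        "fix_type": 3,                  # 3D fix
        "lat": round(lat_deg * 1e7),    # degE7
        "lon": round(lon_deg * 1e7),    # degE7
        "alt": float(alt_msl_m),        # metres above MSL
        "hdop": 1.0,                    # placeholder, not derived from the fix
        "vdop": 1.0,                    # placeholder, not derived from the fix
        "vn": vn, "ve": ve, "vd": vd,   # m/s, NED
        "speed_accuracy": sigma_v_mps,  # vel_acc
        "horiz_accuracy": sigma_xy_m,   # h_acc
        "vert_accuracy": sigma_z_m,     # v_acc
        "satellites_visible": 10,       # placeholder
    }
```

With pymavlink, a dictionary like this should map onto the generated `gps_input_send(...)` call argument by argument; that mapping, and the signing setup around it, would need to be checked against the pymavlink version actually deployed.
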
**Tile generation** (in-flight, asynchronous):
1. Per-frame eligibility check (σ_xy ≤ 5 m hard gate, terrain class flat/moderate, EKF source = `satellite_anchored`).
2. **Orthorectification via Orthority** (frame-camera model + per-sector DEM from SRTM 30 m).
3. Quality scoring + dedup against existing tile cache (service-tile immutability respected).
4. Write to MBTiles SQLite cache (WAL + connection pool + transaction batching) with `parent_pose_sigma_xy`, `terrain_class`, `trust_level`.
5. **Post-flight**: tiles uploaded to **Suite Service candidate pool**; **2-flight voting** at Service ingest promotes onboard tiles to trusted basemap.
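
Step 4 can be sketched with the standard-library `sqlite3` module. The `tiles` table follows the MBTiles layout (TMS row order); the `tile_sidecar` table, the function names, and the use of `INSERT OR IGNORE` as a stand-in for the dedup / immutability policy are assumptions for illustration only, and the connection pool is omitted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tiles (
    zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, tile_data BLOB,
    PRIMARY KEY (zoom_level, tile_column, tile_row)
);
CREATE TABLE IF NOT EXISTS tile_sidecar (
    zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER,
    parent_pose_sigma_xy REAL, terrain_class TEXT, trust_level TEXT,
    PRIMARY KEY (zoom_level, tile_column, tile_row)
);
"""


def open_cache(path):
    """Open (or create) the cache file with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.executescript(SCHEMA)
    return conn


def write_batch(conn, tiles):
    """Write tiles in one transaction.

    tiles: iterable of (z, x, y_xyz, data, sigma_xy, terrain_class, trust_level),
    where y_xyz counts rows from the top (XYZ / slippy-map convention).
    """
    with conn:  # commits on success, rolls back on exception
        for z, x, y, data, sigma_xy, terrain, trust in tiles:
            tms_row = (1 << z) - 1 - y  # MBTiles stores rows bottom-up (TMS)
            # OR IGNORE: an existing tile is never overwritten in this sketch.
            conn.execute("INSERT OR IGNORE INTO tiles VALUES (?, ?, ?, ?)",
                         (z, x, tms_row, data))
            conn.execute("INSERT OR IGNORE INTO tile_sidecar VALUES (?, ?, ?, ?, ?, ?)",
                         (z, x, tms_row, sigma_xy, terrain, trust))
```

Batching all rows of a frame into one transaction keeps the number of WAL commits low, which is the point of the "transaction batching" item above.
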
**Object localization** (separate path, AI camera): trig + airframe-attitude fusion via FC `ATTITUDE` MAVLink stream — unchanged from round 1.
**MAVLink endpoint**: shared between MAVSDK (telemetry, sysid=10) and pymavlink (GPS_INPUT, sysid=11) via **distinct system-IDs through ArduPilot's native MAVLink routing** — no `mavlink-router` daemon. **MAVLink2 signing mandatory in v1**.
```
Pre-flight (ground)
┌────────────────────────────────────────────────┐
│ Azaion Suite Satellite Service │
│ (sources commercial / agency imagery; │
│ ingests onboard tiles via candidate pool + │
│ 2-flight voting layer) │
└──────────────┬───────────────────┬─────────────┘
│ sync down │ upload back (post-flight)
▼ ▲
┌─────────────────┐
│ DEM (SRTM 30 m) │ ─────► sector classification
└─────────────────┘
Onboard (in-flight)
Nav Cam: ADTi 20MP, 3 fps AI Cam (gimbal+zoom, on-demand)
│ │
▼ ▼
┌────────────────────────────────────────────┐ ┌────────────────────┐
│ ROS 2 Humble + Isaac ROS 3.2 (Q6 → A) │ │ Object Geo-Locator │
│ ┌──────────────────────────┐ │ │ (pinhole+ATTITUDE) │
│ │ cuVSLAM (mono + IMU) │←──FC RAW_IMU │ └──────┬─────────────┘
│ │ → keyframe pose + cov │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ VPR (SALAD/BoQ/AnyLoc) │←─ re-loc │ │
│ │ on demand only │ triggers │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ Cross-view Matcher │ │ │
│ │ inline: SP+LG / GIM-LG │ │ │
│ │ re-loc: LiteSAM (rare) │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ PnP → absolute pose + Σ │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────┐ │ │
│ │ Component 5 (NOT an EKF) │ │ │
│ │ - covariance calibrator │ │ │
│ │ - Mahalanobis outlier gate │ │ │
│ │ - source-label producer │ │ │
│ └────────────┬─────────────────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────┐ │ │
│ │ Ortho-Tile Generator (Orthority) │ │ │
│ │ → MBTiles+WAL Tile Cache │ │ │
│ └──────────────────────────────────────┘ │ │
└────────────────┬───────────────────────────┘ │
▼ │
GPS_INPUT (pymavlink, signed) ──► ArduPilot │
(GPS1_TYPE=14, EK3_SRC1_POSXY=GPS, EK3_SRC2=GPS)│
│ (ODOMETRY disabled for v1; v1.1+) │
▼ │
Telemetry summary 12 Hz ──────► QGroundControl │
│ │
▼ │
Flight Data Recorder (NVMe, 64 GB cap, no raw frames)
```
---
## Architecture
### Overall principles (revised vs draft02)
1. **Pipeline = stages with explicit confidence**. Each stage emits a pose hypothesis + covariance + categorical label. **Component 5 calibrates and gates; ArduPilot EKF3 fuses.** *(Revised — M-26.)*
2. **All heavy NN inference runs on GPU via TensorRT** (FP16, INT8 where validated). Pre-extract satellite-tile descriptors offline (AC-8.3). *(Unchanged.)*
3. **Orchestration**: **ROS 2 Humble + Isaac ROS 3.2** (Q6 → A, locked). cuVSLAM consumed via `isaac_ros_visual_slam`; MAVROS bridges ROS 2 ↔ MAVLink for the FC. Our matcher / VPR / ortho / Component-5 calibrator / FDR / uploader run as `rclpy` Python nodes. CPython 3.11 / 3.12 inside the nodes; TensorRT engines + CUDA contexts owned per-node. *(Revised — M-29.)*
4. **Persistent satellite cache** across flights (~10 GB for 400 km²); per-flight FDR is separate. *(Unchanged.)*
5. **Every output to the FC carries a covariance** — GPS_INPUT (`h_acc`, `v_acc`, `vel_acc`). ODOMETRY emission disabled for v1 (Option A in M-30). *(Revised — M-30.)*
6. **Service tiles are basemap truth**; onboard tiles go through Service-side voting before promotion (M-9). *(Unchanged.)*
7. **MAVLink2 signing on every companion↔FC link** (M-7). USB bypasses signing — bench-only access. *(Unchanged.)*
8. **No companion-side state propagation** — the FC's EKF3 is the only filter. Any future companion-side filter (v1.x) will be an **ESKF** (S69), not a regular EKF. *(New — M-26.)*
---
### Component 1: Satellite Tile Cache & Descriptor Index
**Unchanged from draft02 / Mode B round 1** — MBTiles SQLite + WAL + connection pool + transaction batching; FAISS IVF over per-chunk DINOv2-VLAD vectors (chunk-decoupled per M-16); `terrain_class` and `trust_level` sidecar. (M-28: COG + PMTiles considered and rejected for our use case.)
---
### Component 1b: Ortho-Tile Generator *(REVISED — M-27)*
**Library**: **Orthority** (S59, Python, MIT-class) — frame-camera model with GeoTIFF DEM lookup. Pip-installable: `pip install orthority`. Replaces draft02's hand-rolled "pinhole projection on per-sector DEM".
**Pipeline per frame** (eligibility / quality / dedup logic unchanged from draft02; only the *projection step* is replaced):
1. **Eligibility check** (unchanged from draft02 / M-9 hard gate): skip when EKF source is `dead_reckoned`, σ_xy > 5 m, roll/pitch > 10°, no inliers, or sector is `rugged`. Sectors classified `moderate` get `terrain_uncertainty=true` sidecar flag.
2. **Orthorectification (revised)**: call `orthority.Ortho(frame, dem, camera_model).process()` with the frame-camera model populated from FC `ATTITUDE` (gimbal pitch / roll / yaw) + companion-resolved position + airframe altitude. SRTM-30 m DEM tile pre-loaded for the operational area.
3. **Resampling to basemap projection** (unchanged): EPSG:3857 z=20.
4. **Quality scoring** (unchanged from draft02): sharpness + coverage + match_inliers + parent_pose_sigma_xy + glare/cloud flag.
5. **Deduplication / write decision** (unchanged from draft02 — M-9 service-tile-immutability + soft/candidate gates).
6. **Sidecar metadata** (unchanged): `parent_pose_sigma_xy`, `terrain_class`, `trust_level`.
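
Steps 1 and 3 can be sketched as below. The gate values come from the list above; the `FrameState` container, the function names, and the 256 px tile size are assumptions for illustration. The tile-index formula is the usual Web-Mercator (EPSG:3857) slippy-map tiling.

```python
import math
from dataclasses import dataclass


@dataclass
class FrameState:
    source_label: str    # "satellite_anchored" | "vo_extrapolated" | "dead_reckoned"
    sigma_xy_m: float
    roll_deg: float
    pitch_deg: float
    inliers: int
    terrain_class: str   # "flat" | "moderate" | "rugged"


def tile_eligibility(s, sigma_gate_m=5.0, tilt_gate_deg=10.0):
    """Return (eligible, terrain_uncertainty) for one frame, following step 1."""
    skip = (
        s.source_label == "dead_reckoned"
        or s.sigma_xy_m > sigma_gate_m
        or abs(s.roll_deg) > tilt_gate_deg
        or abs(s.pitch_deg) > tilt_gate_deg
        or s.inliers == 0
        or s.terrain_class == "rugged"
    )
    return (not skip), (not skip and s.terrain_class == "moderate")


def latlon_to_tile(lat_deg, lon_deg, zoom=20):
    """XYZ tile index (x, y) containing a WGS-84 point at the given zoom."""
    n = 1 << zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def ground_resolution_m(lat_deg, zoom=20, tile_px=256):
    """Approximate metres per pixel of a Web-Mercator tile at this latitude."""
    equator_m = 2.0 * math.pi * 6378137.0
    return equator_m * math.cos(math.radians(lat_deg)) / (tile_px * (1 << zoom))
```

At z=20 the equatorial ground resolution works out to roughly 0.15 m per pixel and shrinks with the cosine of latitude, which gives a feel for the resampling target in step 3.
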
**Latency budget**: F-T14 (revised) measures Orthority's per-frame latency on Orin Nano Super. **Budget: ≤50 ms / frame.** Documented fall-back if exceeded: `cv2.warpPerspective` + bilinear DEM lookup (~5–20 ms estimated).
---
### Component 2: Visual Place Recognition (Global Retrieval)
**Unchanged from draft02 / Mode B round 1.** AnyLoc + SALAD + BoQ + MixVPR shortlist; conditional invocation (M-17); chunk-based retrieval unit (M-16); expanding-window retry (M-18); multi-scale chunks + OSM road-overlay + sector-volatility-driven K (M-19); active-conflict scene-change mitigations stand. (M-33: no new VPR backbone in 2025 displaces this.)
---
### Component 3: Cross-View Matching & PnP *(REVISED — M-24, M-25)*
**Inline lead**: **SuperPoint + LightGlue (TRT FP16/INT8)**, unchanged. Feasibility re-confirmed: ~50–200 ms / pair on Orin Nano Super FP16 at 320×240 → 640×480 (RTX 3080 baseline 0.96 + 2.54 ms scaled by Orin Nano Super throughput ratio; cross-validated by S76 YOLO26 reference points).
**Inline peer**: **GIM-LightGlue**, unchanged from draft02 (M-3, S48). +8.4–18.1 % zero-shot vs LightGlue baseline.
**Embedded fallback**: **XFeat (sparse + semi-dense)** — unchanged.
**Re-localization fallback** *(new, M-24)*: **LiteSAM** (S58). Invoked rarely (cold start, σ_xy > 50 m, sharp turn after cuVSLAM tracking loss). Latency budget: 1.5–2 s on Orin Nano Super. Accepted because re-loc events are rare and AC-NEW-1 cold-start budget is 30 s.
**Validation oracle** *(new — M-24)*: **LiteSAM run offline on bench data** for ground-truth-quality matches. Used to score the inline matcher's recall@30m on a per-flight basis without needing manual annotation.
**Distillation teacher** *(new — M-24)*: train a satellite-aerial-specialised student model (target ≤5 M params, ≤100 ms / pair) using LiteSAM-supervised correspondences on TartanAir V2 + AerialExtreMatch + UAV-VisLoc. Output is a candidate inline matcher for v1.x.
**Offline ceiling references** *(new — M-25)*: **RoMa v2** (S63), **MASt3R-SLAM** (S62), **MapGlue**, **MATCHA** — included in the matcher bench-off so we know how much accuracy we trade by using SP+LG inline. None becomes inline candidate.
**Bench-off scope (revised)** for the deferred research item:
- Inline candidates (must fit in 200 ms / pair on Orin Nano Super @ 25 W): SP+LG, GIM-LightGlue, XFeat (sparse), XFeat (semi-dense).
- Re-loc candidates (must fit in 2 s / pair): LiteSAM.
- Offline ceilings: RoMa v2, MASt3R-SLAM, MapGlue, MATCHA.
**Bench-off targets** (unchanged from draft02): AerialVL, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight.
**Score on**: AC-1.1 / AC-1.2 / AC-2.2 / p95 latency on Orin Nano Super 25 W / sustained 30-min thermal stability / peak GPU memory / **plus seasonal-robustness score** / **plus accuracy-vs-inline-feasibility frontier (re-loc role only for >200 ms candidates)**.
**PnP & projection**: unchanged from draft02.
**Input downsampling**: unchanged starting points (1024×768 for SP+LG / GIM-LG; 640×480 for XFeat sparse).
---
### Component 4: Visual Odometry *(REVISED — M-22, M-23)*
**v1 choice**: **cuVSLAM** (NVIDIA, CUDA-accelerated, Apache-2.0; S60). Monocular + IMU mode. Drop-in via `isaac_ros_visual_slam` ROS 2 wrapper (S64). Replaces draft02's "custom 2-frame VO via SP+LG / GIM-LG homography".
**Why cuVSLAM**:
- Production-grade VO/SLAM with keyframe-based local mapping + sparse bundle adjustment + loop closure — bounds drift, unlike a 2-frame homography.
- CUDA-accelerated, optimized for Jetson. Reference designs on Orin Nano (S64, S77) confirm runtime feasibility.
- <1 % ATE on KITTI / <5 cm on EuRoC.
- Minimal integration cost via the ROS 2 wrapper.
**Why not the alternatives**:
- DPVO / DPV-SLAM (S61, S73): extrapolated 4–15 FPS on Orin Nano Super, borderline for the 10 Hz target. Reserved as bench-off fall-back.
- MASt3R-SLAM (S62): sub-1 Hz on Orin Nano Super — infeasible inline.
- VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71): non-trivial integration cost; no accuracy advantage. Reserved as bench-off fall-backs.
- Custom 2-frame homography VO (draft02): wrong design (M-22).
**IMU source for cuVSLAM** (Q7 → A, locked, M-35): **MAVLink `RAW_IMU` / `SCALED_IMU` from FC** at ~200–400 Hz (path a). Subscribed inside the cuVSLAM node via MAVROS. **F-T1c** (new field test) measures sync-jitter under flight load; if it fails the threshold (TBD by cuVSLAM tolerance), v1.1 adds a dedicated companion IMU (BNO055 / ICM-42688P / BMI270) over SPI as a hardware revision.
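
One way the sync-jitter measurement could be summarised is shown below: a minimal sketch that takes a list of IMU sample timestamps in microseconds (for example the `time_usec` field of `RAW_IMU`, or companion-side receive times) and reports effective rate, inter-sample jitter, and the largest gap. The function name and the choice of statistics are assumptions; the pass threshold itself remains TBD as stated above.

```python
import statistics


def imu_timing_stats(timestamps_us):
    """Summarise IMU sample timing from monotonically increasing timestamps (microseconds)."""
    if len(timestamps_us) < 2:
        raise ValueError("need at least two samples")
    # Inter-sample intervals between consecutive timestamps.
    dts = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    mean_dt = statistics.fmean(dts)
    return {
        "rate_hz": 1e6 / mean_dt,                 # effective delivery rate
        "jitter_std_us": statistics.pstdev(dts),  # spread of the intervals
        "max_gap_us": max(dts),                   # worst-case dropout
    }
```

A perfectly regular 250 Hz stream (4000 µs spacing) would report zero jitter; bursts or drops on the MAVLink serial link would show up as a larger `jitter_std_us` and `max_gap_us`.
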
**Camera intrinsics**: nav cam (ADTi 20MP APS-C) calibrated pre-flight via standard checkerboard (M-34). cuVSLAM consumes the `camera_info` topic at start-up.
**Risk R8 reframed**: cuVSLAM's high-altitude fixed-wing performance is empirically unproven (its published benchmarks are urban driving + indoor MAV). **F-T1b (revised) bench-off mandatory before AC-1.3 lock**.
**Fall-back path**: if cuVSLAM underperforms on AerialVL fixed-wing trajectories, use a properly-scoped VO (DPV-SLAM with keyframe + bundle adjustment + loop closure, not 2-frame homography) as the v1.1 candidate. Custom 2-frame VO never comes back.
---
### Component 5: Companion-Side Output Stage *(REVISED — M-26, M-30)*
**Renamed**: was "IMU + Visual EKF Fusion" in draft02. Now: **"Companion-Side Output Stage — Covariance Calibrator + Outlier Gate + Source-Label Producer"**.
**Responsibility (v1)**:
1. Consume cuVSLAM relative-pose + cross-view matcher absolute-pose hypotheses.
2. Run a Mahalanobis outlier gate to drop fixes whose innovation w.r.t. cuVSLAM relative pose exceeds a threshold (computed against AC-NEW-4 false-position safety budget).
3. Re-scale covariances using empirical residuals (online, exponentially-weighted) to correct for systematic over- / under-confidence in the matcher / VPR / VO outputs.
4. Tag the result with a categorical source label: `satellite_anchored / vo_extrapolated / dead_reckoned`.
5. Emit on the appropriate MAVLink channel (GPS_INPUT for v1, Option A in M-30).
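
Responsibilities 2 to 4 can be sketched in a few lines of pure Python for the horizontal (2-D) case. Everything specific here is an assumption for illustration: the class and function names, the 99 % chi-square gate, the EWMA factor, and the calibration rule (scale the covariance so the running mean of the normalised innovation squared matches its expected value of 2 for two degrees of freedom). The actual threshold is to be derived from AC-NEW-4.

```python
import math
from dataclasses import dataclass

DOF = 2  # horizontal position only


def chi2_gate_2dof(p):
    """Chi-square quantile for 2 degrees of freedom (closed form: CDF = 1 - exp(-x/2))."""
    return -2.0 * math.log(1.0 - p)


def mahalanobis_sq_2d(innov, cov):
    """innov^T cov^-1 innov for a 2x2 covariance, using the explicit inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    dx, dy = innov
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


@dataclass
class OutputStage:
    alpha: float = 0.05           # EWMA weight for the residual statistic
    gate_p: float = 0.99          # gate probability (assumed value)
    nis_ewma: float = float(DOF)  # running mean of the raw normalised innovation squared

    @property
    def scale(self):
        """Covariance scale factor; > 1 means upstream was over-confident."""
        return max(self.nis_ewma / DOF, 1e-6)

    def process(self, innov, cov_abs, cov_pred, vo_healthy=True):
        """Gate one absolute fix against the VO-predicted position.

        innov: (dx, dy) between the matcher fix and the prediction, in metres.
        Returns (source_label, calibrated 2x2 covariance or None if rejected).
        """
        s_raw = [[cov_abs[i][j] + cov_pred[i][j] for j in range(2)] for i in range(2)]
        d2_raw = mahalanobis_sq_2d(innov, s_raw)
        if d2_raw / self.scale > chi2_gate_2dof(self.gate_p):
            return ("vo_extrapolated" if vo_healthy else "dead_reckoned"), None
        # Only accepted fixes update the statistic, so outliers cannot inflate it.
        self.nis_ewma += self.alpha * (d2_raw - self.nis_ewma)
        cov_out = [[self.scale * cov_abs[i][j] for j in range(2)] for i in range(2)]
        return "satellite_anchored", cov_out
```

Note that the class holds only a scalar running statistic; no pose state is propagated between calls, which is consistent with the "no state propagation" constraint. Updating only on accepted fixes is one possible design choice; it trades robustness to outliers against slower recovery from gross over-confidence.
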
**Explicitly NOT in v1**:
- ❌ State propagation (no `x_{k+1} = f(x_k, u_k) + w_k`).
- ❌ IMU integration (the FC's EKF3 does this with the FC's own IMU at 400 Hz).
- ❌ ODOMETRY emission (Option B in M-30 — v1.1+).
**ESKF question resolved**: ArduPilot EKF3 is a regular EKF (24-state) — we cannot swap the FC filter (S65, S66, S67, S68). The EKF-vs-ESKF debate applies only to a hypothetical companion-side filter, which we drop for v1. **If v1.x evidence (F-T9 SITL) demands a companion-side filter, use vanilla ESKF** (S69) — the right family for orientation correctness, with tangent-space covariance on SO(3).
**Hybrid-output channel split (M-30)**:
| Mode | `EK3_SRC1_*` configuration | Channel emission | Status |
|------|---------------------------|------------------|--------|
| **Option A (v1 default)** | `POSXY=GPS, VELXY=GPS, YAW=GPS+Compass`. `EK3_SRC2_*=GPS` for failover. | GPS_INPUT only (`GPS1_TYPE=14`). ODOMETRY disabled. | Ships in v1. |
| **Option B (v1.1+)** | `POSXY=ExternalNav, YAW=ExternalNav`. `EK3_SRC2_POSXY=GPS` for failover. | ODOMETRY primary; GPS_INPUT held in reserve, not actively fused while ODOMETRY healthy. | Requires PR #30080 fix; gated on F-T9 SITL pass. |
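
As an illustration of the table, the two modes could be written as parameter dictionaries. The numeric enum values below (3 = GPS, 6 = ExternalNav, yaw 3 = GPS with compass fallback, `GPS1_TYPE` 14 = MAVLink) are my reading of the ArduPilot parameter list and should be verified against the parameter reference for the firmware version in use; mapping the table's "GPS+Compass" onto yaw value 3 is an assumption, as is the helper name.

```python
# Assumed ArduPilot EKF3 source-selection enum values; verify before use.
SRC_GPS = 3
SRC_EXTNAV = 6
YAW_GPS_COMPASS_FALLBACK = 3
YAW_EXTNAV = 6

OPTION_A = {  # v1 default: GPS_INPUT only
    "GPS1_TYPE": 14,                       # MAVLink-fed GPS driver
    "EK3_SRC1_POSXY": SRC_GPS,
    "EK3_SRC1_VELXY": SRC_GPS,
    "EK3_SRC1_YAW": YAW_GPS_COMPASS_FALLBACK,
    "EK3_SRC2_POSXY": SRC_GPS,             # failover set
}

OPTION_B = {  # v1.1+ candidate: ODOMETRY primary
    "EK3_SRC1_POSXY": SRC_EXTNAV,
    "EK3_SRC1_YAW": YAW_EXTNAV,
    "EK3_SRC2_POSXY": SRC_GPS,             # GPS_INPUT held in reserve
}


def uses_external_nav(params):
    """True if any EK3_SRC* entry selects ExternalNav (value 6)."""
    return any(name.startswith("EK3_SRC") and value == SRC_EXTNAV
               for name, value in params.items())
```

A deploy-time check such as `assert not uses_external_nav(OPTION_A)` is one cheap way to catch an accidental mix of the two modes before flight.
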
---
### Component 6: MAVLink Integration & Source Promotion
**Unchanged from draft02 / round 1.** MAVSDK (telemetry, sysid=10) + pymavlink (GPS_INPUT, sysid=11), distinct system-IDs sharing the serial port via ArduPilot's native MAVLink routing. **No mavlink-router daemon.** **MAVLink2 signing mandatory**, per-airframe key in FC FRAM. Source-promotion logic and AC-NEW-2 (<3 s spoofing-promotion latency) carry forward unchanged. (M-31: sysid collision-check added to deploy runbook.)
---
### Component 7: Failsafe, Health & Re-Localization
Unchanged from draft02.
---
### Component 8: Object Localization (AI Camera)
Unchanged from draft02.
---
### Component 9: Software Platform & Process Topology *(LOCKED — Q6 → A, M-29)*
**v1 choice**: **ROS 2 Humble + Isaac ROS 3.2 on JetPack 6 / Ubuntu 22.04** (S64, S77).
**Process topology**:
- **C++ Isaac ROS node**: cuVSLAM via `isaac_ros_visual_slam` (consumes `camera_info` + image stream + IMU; publishes `nav_msgs/Odometry`).
- **C++ MAVROS node**: bridges ROS 2 ↔ MAVLink for the FC. `RAW_IMU` / `SCALED_IMU` subscribed by the cuVSLAM node; FC `ATTITUDE` consumed by Component 1b ortho node; `GPS_INPUT` published by Component 5 calibrator node.
- **Python `rclpy` nodes**: matcher (SP+LG TRT FP16/INT8), VPR (SALAD/BoQ on demand), Component 1b ortho generator (Orthority), Component 5 calibrator + outlier gate, FDR writer, Suite-Service uploader.
- **TensorRT engines + CUDA contexts** owned per-node (no shared CUDA context). Engines loaded at node start-up; warm-up inference at boot.
**Stack details (locked)**:
- CPython **3.11 or 3.12** inside `rclpy` nodes (free-threaded 3.13 deferred to v1.x — M-32, M-33).
- TensorRT **FP16 default**, INT8 where validated by the matcher bench-off.
- **numba JIT** for the calibrator's hot path (Mahalanobis distance + covariance re-scale).
- Configuration via YAML; structured-JSON logging to FDR; `ros2 bag` for in-flight telemetry capture.
**Cost / benefit reaffirmed**:
- **Cost**: ~25 % CPU for DDS + topic serialisation; ~200 MB extra deployment-image footprint; learning curve (mitigated by published reference designs in S64, S77).
- **Benefit**: drop-in `isaac_ros_visual_slam` for cuVSLAM, drop-in MAVROS for the FC bridge, free observability via `ros2 bag` and `rqt_*`, battle-tested by the wider robotics community.
**Reference designs**: S64 (Hackster.io GPS-Denied Drone), S77 (thomasthelliez ROS 2 / Isaac ROS guide), `bandofpv/VSLAM-UAV` (PX4 + ROS 2 reference), `sidharthmohannair/ros2-ardupilot-sitl-hardware` (ArduPilot + ROS 2 reference).
---
### Component 10: Flight Data Recorder
Unchanged from draft02 / round 1.
---
### Component 11: Confidence Score (cross-cutting)
Unchanged from draft02 / round 1.
---
## Testing Strategy
### Functional / Integration
- **F-T1** Tile cache load/lookup *(unchanged)*.
- **F-T1b** *(REVISED — M-22, M-23, R8 reframed)* AC-1.3 drift regression: run **cuVSLAM** on AerialVL fixed-wing trajectories (70 km of real flight). Pass = drift ≤ 100 m mono-only / ≤ 50 m mono+IMU between satellite anchors at 95th percentile. **Gates AC-1.3 lock.** If cuVSLAM fails: fall back to DPV-SLAM bench / VINS-Fusion bench.
- **F-T1c** *(new — M-22, M-23)* Compare cuVSLAM mono vs cuVSLAM mono+IMU on the same AerialVL trajectories — quantifies IMU contribution given MAVLink-rated IMU rate (path (a) of M-35).
- **F-T2** Tile generation + dedup *(extended — M-9 + M-27)*: replay a recorded flight; assert (a) ≤1 tile per ground sector covered ≥2× by nav cam; (b) tile has `parent_pose_sigma_xy` ≤ hard gate; (c) service tiles never overwritten within freshness budget; **(d) Orthority output equivalent to ground-truth ortho (RMSE < 1 px on synthetic frame with known DEM)**.
- **F-T3** Tile uploader → candidate pool *(unchanged from draft02)*.
- **F-T4** End-to-end against AerialVL.
- **F-T5** End-to-end against UAV-VisLoc.
- **F-T5b** End-to-end against AerialExtreMatch *(unchanged from draft02 — M-14)*.
- **F-T5c** Season-robustness regression against 2chADCNN season set *(unchanged from draft02 — M-14)*.
- **F-T6** End-to-end against internal Mavic flight footage.
- **F-T7** Sharp-turn handling *(extended — M-24)*: assert LiteSAM re-loc fallback recovers within 2 s on post-turn frames where SP+LG inline matcher fails.
- **F-T8** Disconnected-segment re-localization *(extended — M-24)*: include LiteSAM re-loc in the test matrix.
- **F-T9** ArduPilot SITL: full MAVLink loop *(REVISED — M-30)*. Test matrix:
  - **Option A mode** (v1 default): GPS_INPUT only; verify EKF3 fuses correctly; verify failover to backup GPS via `EK3_SRC2_*`.
  - **Option B mode** (v1.1 candidate): ODOMETRY-primary; verify PR #30080-class source-switching is clean; verify GPS_INPUT held in reserve does not double-fuse (issues #30076 / #32506 regression test).
  - Source switching: jam-onset → our channel; spoofed-real-GPS recovery → operator-confirmed source-restore.
  - **MAVLink2 signing on**: assert injection refused on signing failure; assert acceptance on valid signing.
- **F-T10** Operator re-loc workflow via QGC `STATUSTEXT` *(unchanged)*.
- **F-T11** Cold-start TTFF <30 s (AC-NEW-1) *(extended — M-24)*: include LiteSAM as the cold-start re-loc path.
- **F-T12** Spoofing-promotion <3 s (AC-NEW-2) *(unchanged)*.
- **F-T13** Object localization with airframe-attitude fusion *(unchanged)*.
- **F-T14** *(REVISED — M-27)* Per-sector DEM classification + **Orthority per-frame latency**: load SRTM-30 m for the operational area; assert sector classes (`flat`, `moderate`, `rugged`) match ground-truth DEM amplitudes; **measure Orthority per-frame ortho latency on Orin Nano Super @ 25 W**; assert ≤ 50 ms / frame budget. If exceeded: switch to `cv2.warpPerspective + bilinear DEM` fall-back.
- **F-T15** VPR retrieval-unit bench *(unchanged from draft02 — M-16/17/18)*.
- **F-T16** Synthetic cloud-occlusion injection *(unchanged from draft02)*.
- **F-T17** Mission replay assertion *(unchanged from draft02 — M-17)*.
- **F-T18** *(new — M-26)* Companion-side calibrator regression: replay a recorded flight; assert the calibrator's empirical residuals lie within the configured Mahalanobis gate; assert no state-propagation logic is invoked; assert ArduPilot EKF3 receives well-calibrated covariances (post-flight comparison of `h_acc` reported vs measured residual).
- **F-T19** *(new — Q6)* If Q6.A is chosen: ROS 2 topic-rate sanity test — assert all ROS 2 topics meet expected publish rates under simulated load.
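The sector-class assertion in F-T14 could be checked with something like the sketch below. It assumes "amplitude" means max minus min elevation inside a sector, and the two thresholds are hypothetical placeholders: the draft does not state where `flat` ends and `rugged` begins.

```python
from typing import Sequence

# Hypothetical thresholds in metres; the draft does not specify them.
FLAT_MAX_M = 30.0
MODERATE_MAX_M = 100.0


def classify_sector(elevations_m: Sequence[float]) -> str:
    """Classify one sector by relief amplitude (max - min DEM elevation)."""
    if not elevations_m:
        raise ValueError("sector has no DEM samples")
    amplitude = max(elevations_m) - min(elevations_m)
    if amplitude <= FLAT_MAX_M:
        return "flat"
    if amplitude <= MODERATE_MAX_M:
        return "moderate"
    return "rugged"
```

In a test, the SRTM samples for each sector would be passed in and the returned label compared against the ground-truth class for that sector.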
### Non-Functional
- **NF-T1** Latency p95 <400 ms on Orin Nano Super 25 W (AC-4.1) *(unchanged)*.
- **NF-T2** Memory <8 GB shared (AC-4.2) *(extended — Q6)*: ROS 2 + Isaac ROS deployment image must fit; reserve ≥1 GB for matcher + VPR engines.
- **NF-T3** Thermal: 8 h sustained 25 W (AC-NEW-5) *(unchanged)*.
- **NF-T4** False-position safety budget (AC-NEW-4) *(extended — M-26)*: Monte Carlo with synthetic over-confidence injection; verify Component 5's outlier gate rejects bad fixes BEFORE they reach ArduPilot EKF3 (companion-side gate; FC EKF3 gate is a second line of defence).
- **NF-T4b** AC-NEW-7 cache-poisoning safety budget *(unchanged — M-9)*.
- **NF-T5** Storage: 64 GB FDR cap with rollover *(unchanged)*.
- **NF-T6** Imagery freshness gate (AC-NEW-6) *(unchanged)*.
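As an illustration of the companion-side gate exercised by NF-T4, here is a minimal sketch of a 2-D Mahalanobis outlier test. The 99 % gate level, the function names, and the choice of reference position used to form the residual are all assumptions; the draft only states that a Mahalanobis gate exists.

```python
from typing import Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Vec2, Vec2]

# Chi-square 99 % quantile for 2 degrees of freedom (-2 * ln(0.01)).
# The gate level itself is an assumption, not a value from the draft.
CHI2_2DOF_99 = 9.210


def mahalanobis_sq_2d(residual: Vec2, cov: Mat2) -> float:
    """Squared Mahalanobis distance r^T * inv(cov) * r for a 2x2 covariance."""
    rx, ry = residual
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # inv(cov) = (1 / det) * [[d, -b], [-c, a]]
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def accept_fix(residual: Vec2, cov: Mat2, gate: float = CHI2_2DOF_99) -> bool:
    """Return True when the fix residual falls inside the gate."""
    return mahalanobis_sq_2d(residual, cov) <= gate
```

A Monte Carlo run for NF-T4 would shrink `cov` artificially (synthetic over-confidence) and count how many bad fixes still pass `accept_fix`.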
### Security
- **S-T1** … **S-T5** *(unchanged from draft02)*.
### Field
- **FT-1** … **FT-3** *(unchanged from draft02)*.
---
## Key Risks & Open Items (carried into Plan step)
| ID | Risk | Severity | Mitigation |
|----|------|----------|------------|
| R1 | Imagery licensing lead time (Service-side) | Med | Suite Service procurement |
| R2 | Latency budget on Orin Nano Super at 1024×768 | Med | Empirical bench-off in week 1 of impl |
| R3 | Cross-view accuracy at 1 km AGL with Ukrainian seasonal change | Med | 50 %@20 m hard floor; bench-off includes SALAD/BoQ/GIM-LG/2chADCNN/**LiteSAM-as-oracle** |
| R4 | MAVSDK + pymavlink coexistence | **Resolved** (M-6) | — |
| R5 | Thermal at 25 W for 8 h | Med | NF-T3 |
| R6 | AC-7.1 in turning flight | Low | v1.1 |
| R7 | Public dataset gap (V&V) | Med | Bench-off + first internal fixed-wing flight before AC-1.3 lock |
| **R8** *(REFRAMED — M-22, M-23)* | **cuVSLAM 1 km AGL fixed-wing performance is empirically unproven** | Med | F-T1b on AerialVL fixed-wing trajectories; FT-3 first internal fixed-wing flight; documented fall-back to DPV-SLAM / VINS-Fusion |
| R9 | Cross-flight cache poisoning | High (safety) | Service-tile immutability + 2-flight voting + σ_xy hard gate + AC-NEW-7 |
| R10 | Companion↔FC link is flight-critical attack surface | High (security) | MAVLink2 signing mandatory + native routing |
| R11 | ArduPilot ExtNav source-switching gotchas | Med | F-T9 SITL matrix; pin ArduPilot to PR #30080 version |
| R12 | Eastern-Ukraine relief amplitude breaks flat-Earth assumption | Med | Per-sector DEM lookup + runtime self-classifier |
| **R13** *(new — M-27)* | **Orthority per-frame latency on Orin Nano Super may exceed budget** | Low–Med | F-T14 measurement; fall-back to `cv2.warpPerspective + bilinear DEM` (~5–20 ms estimated) |
| **R14** *(new — M-26, M-30)* | **Dropping companion-side EKF may surface FC-side covariance-handling issues** | Low–Med | F-T18 calibrator regression + F-T9 SITL Option A; if EKF3 mishandles raw inputs, re-introduce vanilla ESKF in v1.x |
| R15 *(M-29)* | Orchestrator choice (Q6 → A locked: ROS 2 Humble + Isaac ROS 3.2) | **Resolved** | — |
| **R16** *(M-35)* | **MAVLink-rated IMU may be insufficient for cuVSLAM sync sensitivity** | Low–Med | F-T1c IMU-sync-jitter measurement; v1.1 hardware revision adds dedicated companion IMU if F-T1c fails (Q7 → A locked: path (a) for v1) |
---
## Proposed AC additions
**AC-NEW-7 — Cache-poisoning safety budget** *(unchanged from draft02 — M-9)*.
**AC-NEW-8 — VO drift bound on fixed-wing 1 km AGL** *(new — M-22, M-23, R8 reframed)*. Specifically: cuVSLAM (mono+IMU) drift between satellite anchors ≤ 50 m at 95th percentile on AerialVL fixed-wing trajectories; ≤ 100 m mono-only. Validated by **F-T1b**.
**AC-NEW-9 — Companion-side covariance calibration accuracy** *(new — M-26)*. Empirical residuals of GPS_INPUT pose, computed against ground truth on F-T1b trajectories, must lie within the reported `h_acc`/`v_acc` covariance with probability ≥ 95 %. (Calibration must not under- or over-claim.) Validated by **F-T18**.
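One way the AC-NEW-9 coverage check might be computed is sketched below. It assumes `h_acc` is a per-axis 1σ value and that horizontal errors are circular Gaussian, so the 95 % radius is about 2.448 × `h_acc`; the draft does not state how `h_acc` is to be interpreted, and the function name is hypothetical. A result well below 0.95 would indicate over-confident covariances, and a result near 1.0 would indicate over-conservative ones.

```python
import math
from typing import Sequence

# 95 % radius multiplier for a circular 2-D Gaussian: sqrt(-2 * ln(0.05)).
K95_2D = math.sqrt(-2.0 * math.log(0.05))


def coverage_95(errors_m: Sequence[float], h_acc_m: Sequence[float]) -> float:
    """Fraction of horizontal errors inside the claimed 95 % radius.

    errors_m: horizontal error of each GPS_INPUT fix against ground truth.
    h_acc_m:  the h_acc reported with the same fix (assumed 1-sigma, metres).
    """
    if len(errors_m) != len(h_acc_m) or not errors_m:
        raise ValueError("need equal-length, non-empty sequences")
    inside = sum(1 for e, s in zip(errors_m, h_acc_m) if e <= K95_2D * s)
    return inside / len(errors_m)
```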
---
## Open Research (deferred to dedicated research passes before Plan)
| Topic | Why now | Output | Owner |
|-------|---------|--------|-------|
| **Cross-view matcher bench-off** *(REVISED scope — M-24, M-25)* | Inline + re-loc + offline-ceiling tracks are now distinct | Selected inline matcher; selected re-loc matcher; ceiling reference numbers; distillation candidate teacher (LiteSAM) | Research skill, follow-up Mode A pass |
| **Input-resolution sweep** | Same as draft02 | Resolution per matcher candidate; sensitivity curves | Same pass |
| **VPR backbone bench-off** | Same as draft02 | Selected VPR backbone | Same pass |
| **VO bench-off** *(new — M-22, M-23)* | cuVSLAM is the lead but unproven on 1 km AGL fixed-wing | cuVSLAM mono / cuVSLAM mono+IMU / DPV-SLAM / VINS-Fusion / OpenVINS comparison on AerialVL + first internal fixed-wing flight | Research / impl. team |
| **Tile-generator quality scoring** | Same as draft02 | Calibrated thresholds for σ_xy / sharpness / glare | Implementation phase |
| **Orthority per-frame latency on Orin Nano Super** *(new — M-27)* | Confirms or rejects M-27 library choice | F-T14 measurement; if fail → `cv2.warpPerspective + bilinear DEM` fall-back path locked | Implementation phase |
| **Internal Mavic-flight V&V dataset** | Same as draft02 | Curated, ground-truth-labelled clips | Operations / data team |
| **First internal fixed-wing flight** | Same as draft02 | Recorded sortie with synced IMU + GPS truth + nav-cam stream | Field-test plan |
| ~~Q6 — Orchestrator decision~~ | **Locked 2026-04-26**: Q6 → A (ROS 2 Humble + Isaac ROS 3.2). | — | — |
| ~~Q7 — Companion IMU strategy~~ | **Locked 2026-04-26**: Q7 → A (MAVLink `RAW_IMU` from FC for v1; dedicated IMU only if F-T1c fails). | — | — |
| **Encryption-at-rest key management** | Same as draft02 | Threat-modelled design | Phase 4 security analysis |
---
## References
All citations are by ID from `_docs/00_research/01_source_registry.md`. Mode B round 2 sources: **S58–S77** (round 1 sources S40–S57 carried over).
- **VO**: S60 (cuVSLAM), S61 (DPVO-QAT++), S62 (MASt3R-SLAM), S64 (Isaac ROS UAV reference), S71 (VINS-Fusion / OpenVINS Jetson reports), S72 (high-altitude VIO), S73 (DPV-SLAM).
- **Matcher**: S58 (LiteSAM), S63 (RoMa v2), S74 (OrthoLoC + AdHoP), S75 (AerialExtreMatch open-review).
- **Fusion**: S65 (ArduPilot ExtNav double-fusion bug), S66 (Z-axis snap bug), S67 (EKF sources spec), S68 (PX4 EKF2 ESKF PR), S69 (Sola ESKF tutorial), S70 (T-ESKF + Hybrid ESKF/UKF 2025).
- **Ortho**: S59 (Orthority).
- **Sweep**: S76 (Orin Nano Super FP16/INT8 reference points), S77 (ROS 2 / Isaac ROS practical guide).
---
## Related Artifacts
- Mode A draft: `_docs/01_solution/solution_draft01.md` (superseded by draft02 → draft03).
- Mode B round 1 draft: `_docs/01_solution/solution_draft02.md` (superseded by draft03).
- Mode B round 2 decomposition: `_docs/00_research/03_mode_b_decomposition_round2.md`.
- Mode B round 2 reasoning chain: `_docs/00_research/04_reasoning_chain_mode_b_round2.md`.
- Mode B round 2 validation log: `_docs/00_research/05_validation_log_mode_b_round2.md`.
- AC & Restrictions assessment (Phase 1): `_docs/00_research/00_ac_assessment.md` *(unchanged)*.
- Source registry: `_docs/00_research/01_source_registry.md` (S01–S77).
- Fact cards: `_docs/00_research/02_fact_cards.md` (Phase 1 + Mode B round 1 M-1..M-21 + Mode B round 2 M-22..M-35).
- Tech stack consolidation: `_docs/01_solution/tech_stack.md` (deferred — Phase 3 optional).
- Security analysis: `_docs/01_solution/security_analysis.md` (deferred — Phase 4 optional, **promoted to recommended-before-Plan-lock** because of M-6/M-7).