- Introduced a new document detailing the current state of the autodev process, including steps, status, and findings.

- Revised acceptance criteria in the acceptance_criteria.md file to clarify metrics and expectations, including updates to GPS accuracy and image processing quality.
- Enhanced restrictions documentation to reflect operational parameters and constraints for UAV flights, including camera specifications and satellite imagery usage.
- Added new research documents for acceptance criteria assessment and question decomposition to support ongoing project evaluation and decision-making.

Oleksandr Bezdieniezhnykh
2026-04-26 14:28:10 +03:00
parent 2178737b36
commit 9eba1689b3
17 changed files with 2965 additions and 69 deletions
# Solution Draft 03
> **Mode**: B (Solution Assessment of `solution_draft02.md`).
> **Inputs**: `solution_draft02.md` (Mode B round 1) + `_docs/00_research/{03_mode_b_decomposition_round2,04_reasoning_chain_mode_b_round2,05_validation_log_mode_b_round2}.md` + Mode B round-2 sources S58–S77 in `01_source_registry.md` + Mode B round-2 fact cards M-22..M-35 in `02_fact_cards.md`.
> **Date**: 2026-04-26 (Mode B round 2).
> **Self-contained**: yes — supersedes `solution_draft02.md`.
>
> **What changed in round 2** (driven by user-explicit asks: VO, matcher, EKF/ESKF, ortho-tile generator + thorough sweep):
>
> - **Component 4 (VO)**: replace draft02's *custom 2-frame homography VO via SP+LG* with **cuVSLAM** (NVIDIA, CUDA-accelerated, drop-in via `isaac_ros_visual_slam`) in monocular + IMU mode (M-22, M-23, S60, S64).
> - **Component 5 (Fusion)**: **drop the companion-side EKF entirely for v1**. Replace with a lightweight **covariance calibrator + Mahalanobis outlier gate + source-label producer** — no state propagation, no IMU integration on the companion (M-26). Let ArduPilot EKF3 do the actual fusion. The "EKF vs ESKF" question becomes: *if* we re-introduce a companion filter in v1.x, use vanilla ESKF (S68, S69) — but for v1 the question is moot.
> - **Component 5 (Hybrid output)**: walk back round-1 M-1's "emit BOTH GPS_INPUT AND ODOMETRY in parallel for the same axis" — that triggers ArduPilot EKF3 double-fusion bugs (S65, S66, S67). v1 ships **GPS_INPUT only** (Option A in M-30); ODOMETRY-primary mode is v1.1 territory.
> - **Component 3 (Matcher)**: **SP+LG (TRT FP16/INT8) remains the inline matcher**; **LiteSAM (S58) added in three non-inline roles**: re-localization fallback (cold start, σ_xy > 50 m), validation oracle, distillation teacher (M-24). RoMa v2 (S63), MASt3R-SLAM (S62), MapGlue, MATCHA added to the matcher bench-off as **offline ceiling references** (M-25).
> - **Component 1b (Ortho-Tile Generator)**: replace draft02's hand-rolled "pinhole projection on per-sector DEM" with **Orthority** (S59) — Python library, frame + RPC camera, GeoTIFF DEM, pip-installable. Documented fall-back to `cv2.warpPerspective + bilinear DEM` if F-T14 latency measurement fails (M-27).
> - **Component 9 (Software platform)**: **ROS 2 Humble + Isaac ROS 3.2** chosen (Q6 → A, locked 2026-04-26). Natural pair for cuVSLAM and a published reference architecture on Orin Nano Super (S64, S77, M-29). DDS overhead (~25 % CPU, ~200 MB image growth) accepted in exchange for free integration of `isaac_ros_visual_slam`, MAVROS, and `ros2 bag` / `rqt_*` observability tooling.
> - **Component 1 (Tile storage)**, **C-2 (VPR)**, **C-6 (MAVLink)**, **C-7/C-8/C-10/C-11**: unchanged from draft02 (M-28, M-31, M-33).
>
> **Locked-in user decisions carried over from round 1** (unchanged):
>
> - **Q1** → A: GPS_INPUT primary channel (now: ONLY channel for v1 — see M-30 above).
> - **Q2** → A: distinct system-IDs via ArduPilot native MAVLink routing; **no `mavlink-router` daemon**.
> - **Q3** → A: AC-NEW-7 thresholds confirmed at P(>30 m)<1 %, P(>100 m)<0.1 % per flight.
> - **Q4** → A: TartanAir V2 included as early-stage synthetic baseline.
> - **Q5** → B (round 1): proceed to Plan in fresh conversation. **Round 2 was triggered after rollback for additional component-replacement investigation.**
> - Camera spec → ADTi 20MP 20L V1 APS-C; storage zoom → z=20.
>
> **Round-2 user decisions locked-in (2026-04-26)**:
>
> - **Q6** → A: **ROS 2 Humble + Isaac ROS 3.2** as the v1 orchestrator (M-29). DIY Python orchestrator dropped. Codified in Component 9.
> - **Q7** → A: **MAVLink `RAW_IMU` / `SCALED_IMU` from FC** (path a) as the v1 IMU source for cuVSLAM (M-35). Dedicated companion IMU is a v1.1 hardware revision triggered only if F-T1c shows sync-jitter problems. Codified in Component 4.
---
## Assessment Findings (Round 2 additions)
The round-1 findings table (15 rows: M-1 … M-21, including addenda M-19/M-20/M-21) carries forward unchanged. **Round 2 adds the following findings, with the same `old → weak → new` pattern**:
| Old Component Solution (round 1) | Weak Point (round 2 evidence) | New Solution (round 2) |
|----------------------------------|-------------------------------|------------------------|
| **C-4 (round 1)**: "custom 2-frame VO via SuperPoint+LightGlue / GIM-LightGlue homography." | **Functional, high (M-22)**. Custom 2-frame homography skips loop closure, sparse bundle adjustment, and keyframe-based local mapping — every mechanism that bounds drift in production VO/SLAM. AFIT thesis (S52) shows even ORB-SLAM2/SVO/DSO struggle on real fixed-wing flights; a hand-rolled 2-frame variant will be strictly worse. At 1 km AGL motion parallax shrinks ~10–25× per frame vs 100 m AGL, further degrading monocular VO. | **Replace with cuVSLAM** (NVIDIA, CUDA-accelerated, Apache-2.0; S60, S64). Monocular + IMU mode, drop-in via `isaac_ros_visual_slam` ROS 2 wrapper. <1 % ATE on KITTI / <5 cm on EuRoC. Fixed-wing 1 km AGL behaviour empirically TBD — bench-off in F-T1b mandatory before AC-1.3 lock. |
| **C-4 (round 1)**: same row, alternatives. | **Functional (M-23)**. Deep-VO alternatives evaluated for Orin Nano Super: DPVO/DPV-SLAM (S61, S73) extrapolate to 4–15 FPS — borderline for our 10 Hz target; MASt3R-SLAM (S62) is sub-1 Hz on Orin Nano Super — infeasible; VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71) carry non-trivial integration cost with no accuracy advantage over cuVSLAM. | cuVSLAM is **lead**; DPV-SLAM / VINS-Fusion / OpenVINS retained as **bench-off fall-backs** if cuVSLAM underperforms on fixed-wing 1 km AGL. MASt3R-SLAM / RoMa v2 reserved for **offline ceiling references**. |
| **C-3 (round 1)**: "SP+LG (TRT FP16) lead, GIM-LightGlue peer, RoMa/DKM bench-off, MASt3R dropped." | **Functional, positive (M-24)**. LiteSAM (S58, MDPI Oct 2025) is purpose-built for satellite↔aerial AVL: 6.31 M params (2.4× smaller than EfficientLoFTR), RMSE@30 = 17.86 m on UAV-VisLoc, beats EfficientLoFTR. **But on Jetson Orin Nano Super, extrapolated latency is ~1500–2000 ms / pair** (AGX Orin → Orin Nano Super 4× scaling) — outside our 400 ms p95 budget for inline use. | **Add LiteSAM in three non-inline roles**: (a) re-localization fallback (cold start, σ_xy > 50 m, 1.5–2 s tolerable); (b) validation oracle for offline regression bench; (c) distillation teacher to train a satellite-aerial-specialised student model that fits the inline budget. **Inline matcher remains SP+LG / GIM-LG.** |
| **C-3 (round 1)**: same row, ceilings. | **Functional, positive (M-25)**. RoMa v2 (S63, Nov 2025): SOTA dense matcher with frozen DINOv3 backbone + custom CUDA + predictive covariance — best published pose-estimation accuracy. MASt3R-SLAM (S62), MapGlue, MATCHA: cross-modal/multimodal matchers with strong specialisation. All GPU-class compute. | **Add RoMa v2, MASt3R, MapGlue, MATCHA to the matcher bench-off as offline ceiling references** so we know how much accuracy we trade by using SP+LG inline. None becomes an inline candidate. |
| **C-5 (round 1, M-1)**: "Onboard loosely-coupled EKF emits two parallel MAVLink streams: GPS_INPUT (primary) AND ODOMETRY (auxiliary, when available) for the same axis." | **Functional, safety, high (M-26, M-30)**. ArduPilot ExtNav best practice (S65, S66, S67): **only one position source per axis at a time**. Open issues #30076 and #32506 document concrete EKF3 misbehaviours when both ExtNav (ODOMETRY) and GPS (GPS_INPUT) are fed for overlapping axes — including unstable position with high variances and Z-axis snap-to-ODOMETRY. The "emit both in parallel" framing was a misconfiguration, not a feature. | **v1 ships GPS_INPUT only** (Option A in M-30). ODOMETRY emission disabled in v1. ArduPilot configured `EK3_SRC1_*=GPS+Compass`; failover via `EK3_SRC2_*`. **Option B (ODOMETRY-primary) is v1.1 work** once F-T9 SITL confirms PR #30080-class source-switching is clean. |
| **C-5 (round 1)**: "loosely-coupled EKF in our process." | **Architectural (M-26)**. The companion-side EKF was always going to feed the FC's own EKF3 → double-fusion. Visual fix → companion EKF → ArduPilot EKF3 stacks two filters on overlapping observations, breaks the single-source-per-axis invariant, and risks the same instability documented in #30076/#32506. | **Drop the companion-side EKF for v1.** Component 5 becomes a **"covariance calibrator + Mahalanobis outlier gate + source-label producer"** — no state propagation, no IMU integration. Each upstream (matcher, cuVSLAM) emits a hypothesis with covariance; outliers are gated; covariances are re-scaled if empirical residuals show over- or under-confidence; results are emitted on the appropriate MAVLink channel. **If v1.x evidence demands a companion-side filter**, use vanilla **ESKF** (S68, S69) — the right family for orientation correctness. |
| **C-1b (round 1)**: "Pinhole projection on per-sector DEM (flat-Earth in flat sectors; SRTM-30 m DEM lookup in moderate sectors)." | **Engineering (M-27)**. The hand-rolled implementation would reinvent distortion handling, RPC refinement, DEM bilinear lookup, and projection — all of which exist in the **Orthority** Python library (S59) under MIT-class licence, pip-installable. | **Use Orthority for per-frame ortho** (frame-camera mode). Falls back to `cv2.warpPerspective + bilinear DEM` (~5–20 ms estimated) if the F-T14 measurement shows Orthority's per-frame latency on Orin Nano Super exceeds the 50 ms allotted to ortho. |
| **C-9 (round 1)**: "Single Python process (asyncio) on CPython 3.11/3.12; TRT subprocess workers." | **Architectural (M-29)**. With cuVSLAM adoption (M-23), the natural integration path is `isaac_ros_visual_slam` (ROS 2 wrapper) → MAVROS → FC. Re-exporting cuVSLAM into a custom asyncio orchestrator is high-friction. **ROS 2 Humble + JetPack 6 + Isaac ROS 3.2 is a published, working reference design on the exact hardware target** (S64, S77). | **Resolved (Q6 → A, locked 2026-04-26)**: ROS 2 Humble + Isaac ROS 3.2 adopted over a DIY Python orchestrator. ROS 2 cost: ~25 % CPU (DDS + topic serialisation), ~200 MB image growth, learning curve. ROS 2 benefit: free integration of cuVSLAM, MAVROS, observability via `ros2 bag` / `rqt_*`. Codified in Component 9. |

(Round-1 findings M-1 through M-21 — including the Phase-1-correction addenda — remain unchanged in their original form; round-2 supersedes only the rows above. For traceability, the full round-1 rationale lives in `solution_draft02.md` and `_docs/00_research/02_fact_cards.md`.)

---
## Product Solution Description (Revised)
A companion-computer software stack that runs on the **Jetson Orin Nano Super** alongside an **ArduPilot 4.5+** flight controller and provides **GPS-equivalent position fixes** to the autopilot when real GPS is jammed, spoofed, or denied.

**Localization pipeline (per frame at 3 fps nav cam):**
1. **cuVSLAM** (monocular + IMU from FC `RAW_IMU` MAVLink stream) provides drift-bounded **relative pose** with keyframe-based local mapping + sparse bundle adjustment + loop closure.
2. **VPR** (DINOv2 SALAD/BoQ chosen by bench-off; AnyLoc fallback) narrows the satellite basemap to a top-K candidate-chunk shortlist on re-localization triggers (cold start, sharp turn, σ_xy > 50 m) — **conditional invocation** keeps cruise overhead near zero.
3. **Cross-view matcher** (SP+LG TRT FP16 inline; GIM-LightGlue peer in the bench-off; LiteSAM as **re-loc fallback**) produces sub-pixel keypoint correspondences against the candidate chunks; PnP yields an **absolute pose** + covariance.
4. **Component 5** (**covariance calibrator + Mahalanobis outlier gate + source-label producer** — *not* an EKF) consumes the absolute pose + cuVSLAM relative pose; rejects outliers; re-scales covariances; emits result on the appropriate MAVLink channel.
5. **GPS_INPUT** (`GPS1_TYPE=14`, MAVLink2-signed, pymavlink) is sent to the FC. ArduPilot EKF3 (24-state classical EKF, 400 Hz) does the actual fusion of our GPS-equivalent fix with its own IMU, baro, compass.

**Tile generation** (in-flight, asynchronous):
1. Per-frame eligibility check (σ_xy ≤ 5 m hard gate, terrain class flat/moderate, EKF source = `satellite_anchored`).
2. **Orthorectification via Orthority** (frame-camera model + per-sector DEM from SRTM 30 m).
3. Quality scoring + dedup against existing tile cache (service-tile immutability respected).
4. Write to MBTiles SQLite cache (WAL + connection pool + transaction batching) with `parent_pose_sigma_xy`, `terrain_class`, `trust_level`.
5. **Post-flight**: tiles uploaded to **Suite Service candidate pool**; **2-flight voting** at Service ingest promotes onboard tiles to trusted basemap.

**Object localization** (separate path, AI camera): trig + airframe-attitude fusion via FC `ATTITUDE` MAVLink stream — unchanged from round 1.

**MAVLink endpoint**: shared between MAVSDK (telemetry, sysid=10) and pymavlink (GPS_INPUT, sysid=11) via **distinct system-IDs through ArduPilot's native MAVLink routing** — no `mavlink-router` daemon. **MAVLink2 signing mandatory in v1**.
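
As a minimal sketch of what the pymavlink sender (sysid=11) could put on the wire, the helper below fills the MAVLink `GPS_INPUT` fields from one Component-5 fix. Assumptions not taken from this design: the horizontal covariance arrives as a symmetric 2×2 matrix in m², `horiz_accuracy` is reported as the 1-σ along the worst axis (square root of the largest eigenvalue), the velocity fields are flagged as ignored because Component 5 does no state propagation, and the `hdop` / `vdop` / `satellites_visible` values are placeholders that would have to be tuned against ArduPilot's GPS health checks. The function names are hypothetical.

```python
import math

# GPS_INPUT_IGNORE_FLAGS bits from the MAVLink common message set.
GPS_INPUT_IGNORE_FLAG_VEL_HORIZ = 8
GPS_INPUT_IGNORE_FLAG_VEL_VERT = 16
GPS_INPUT_IGNORE_FLAG_SPEED_ACCURACY = 32

GPS_EPOCH_UNIX_S = 315964800  # 1980-01-06T00:00:00Z
GPS_MINUS_UTC_S = 18          # leap-second offset in force since 2017-01-01


def horizontal_accuracy_m(cov_xy):
    """1-sigma along the worst horizontal axis: sqrt of the largest eigenvalue."""
    (a, b), (_, c) = cov_xy  # symmetric 2x2 covariance, m^2
    lam_max = 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return math.sqrt(lam_max)


def gps_input_fields(lat_deg, lon_deg, alt_amsl_m, cov_xy, sigma_z_m,
                     unix_time_s, satellites_visible=10):
    """Build a GPS_INPUT field dict (hypothetical helper; DOP/sats are placeholders)."""
    gps_s = unix_time_s - GPS_EPOCH_UNIX_S + GPS_MINUS_UTC_S
    week, seconds_of_week = divmod(gps_s, 604800)
    return {
        "time_usec": int(unix_time_s * 1e6),
        "gps_id": 0,
        # Component 5 emits no velocity, so tell the FC to ignore those fields.
        "ignore_flags": (GPS_INPUT_IGNORE_FLAG_VEL_HORIZ
                         | GPS_INPUT_IGNORE_FLAG_VEL_VERT
                         | GPS_INPUT_IGNORE_FLAG_SPEED_ACCURACY),
        "time_week_ms": int(seconds_of_week * 1000),
        "time_week": int(week),
        "fix_type": 3,                     # 3D fix
        "lat": int(round(lat_deg * 1e7)),  # degE7
        "lon": int(round(lon_deg * 1e7)),  # degE7
        "alt": float(alt_amsl_m),          # metres AMSL
        "hdop": 1.0,                       # placeholder
        "vdop": 1.0,                       # placeholder
        "vn": 0.0, "ve": 0.0, "vd": 0.0,
        "speed_accuracy": 0.0,
        "horiz_accuracy": horizontal_accuracy_m(cov_xy),
        "vert_accuracy": float(sigma_z_m),
        "satellites_visible": satellites_visible,  # placeholder
    }
```

The dict keys match the parameter names of pymavlink's generated sender, so a connection opened with `mavutil.mavlink_connection(..., source_system=11)` under the MAVLink2 dialect could send it with `conn.mav.gps_input_send(**fields)`, after enabling signing on that connection via `conn.setup_signing(...)`.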
```
Pre-flight (ground)
┌────────────────────────────────────────────────┐
│ Azaion Suite Satellite Service │
│ (sources commercial / agency imagery; │
│ ingests onboard tiles via candidate pool + │
│ 2-flight voting layer) │
└──────────────┬───────────────────┬─────────────┘
│ sync down │ upload back (post-flight)
▼ ▲
┌─────────────────┐
│ DEM (SRTM 30 m) │ ─────► sector classification
└─────────────────┘
Onboard (in-flight)
Nav Cam: ADTi 20MP, 3 fps AI Cam (gimbal+zoom, on-demand)
│ │
▼ ▼
┌────────────────────────────────────────────┐ ┌────────────────────┐
│ ROS 2 Humble + Isaac ROS 3.2 (Q6 → A)  │ │ Object Geo-Locator │
│ ┌──────────────────────────┐ │ │ (pinhole+ATTITUDE) │
│ │ cuVSLAM (mono + IMU) │←──FC RAW_IMU │ └──────┬─────────────┘
│ │ → keyframe pose + cov │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ VPR (SALAD/BoQ/AnyLoc) │←─ re-loc │ │
│ │ on demand only │ triggers │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ Cross-view Matcher │ │ │
│ │ inline: SP+LG / GIM-LG │ │ │
│ │ re-loc: LiteSAM (rare) │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ PnP → absolute pose + Σ │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────┐ │ │
│ │ Component 5 (NOT an EKF) │ │ │
│ │ - covariance calibrator │ │ │
│ │ - Mahalanobis outlier gate │ │ │
│ │ - source-label producer │ │ │
│ └────────────┬─────────────────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────┐ │ │
│ │ Ortho-Tile Generator (Orthority) │ │ │
│ │ → MBTiles+WAL Tile Cache │ │ │
│ └──────────────────────────────────────┘ │ │
└────────────────┬───────────────────────────┘ │
▼ │
GPS_INPUT (pymavlink, signed) ──► ArduPilot │
(GPS1_TYPE=14, EK3_SRC1_POSXY=GPS, EK3_SRC2=GPS)│
│ (ODOMETRY disabled for v1; v1.1+) │
▼ │
Telemetry summary 1–2 Hz ─────► QGroundControl │
│ │
▼ │
Flight Data Recorder (NVMe, 64 GB cap, no raw frames)
```
---
## Architecture
### Overall principles (revised vs draft02)
1. **Pipeline = stages with explicit confidence**. Each stage emits a pose hypothesis + covariance + categorical label. **Component 5 calibrates and gates; ArduPilot EKF3 fuses.** *(Revised — M-26.)*
2. **All heavy NN inference runs on GPU via TensorRT** (FP16, INT8 where validated). Pre-extract satellite-tile descriptors offline (AC-8.3). *(Unchanged.)*
3. **Orchestration**: **ROS 2 Humble + Isaac ROS 3.2** (Q6 → A, locked). cuVSLAM consumed via `isaac_ros_visual_slam`; MAVROS bridges ROS 2 ↔ MAVLink for the FC. Our matcher / VPR / ortho / Component-5 calibrator / FDR / uploader run as `rclpy` Python nodes. CPython 3.11 / 3.12 inside the nodes; TensorRT engines + CUDA contexts owned per-node. *(Revised — M-29.)*
4. **Persistent satellite cache** across flights (~10 GB for 400 km²); per-flight FDR is separate. *(Unchanged.)*
5. **Every output to the FC carries a covariance** — GPS_INPUT (`horiz_accuracy`, `vert_accuracy`, `speed_accuracy`). ODOMETRY emission disabled for v1 (Option A in M-30). *(Revised — M-30.)*
6. **Service tiles are basemap truth**; onboard tiles go through Service-side voting before promotion (M-9). *(Unchanged.)*
7. **MAVLink2 signing on every companion↔FC link** (M-7). USB bypasses signing — bench-only access. *(Unchanged.)*
8. **No companion-side state propagation** — the FC's EKF3 is the only filter. Any future companion-side filter (v1.x) will be an **ESKF** (S69), not a regular EKF. *(New — M-26.)*
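
Principle 1 implies one common message shape for every stage. A minimal sketch, assuming a local metric east/north frame and a 2-D horizontal covariance; the `PoseHypothesis` and `SourceLabel` types and their field names are hypothetical, not an existing interface, while the three label values are the ones Component 5 uses.

```python
import math
from dataclasses import dataclass
from enum import Enum


class SourceLabel(str, Enum):
    SATELLITE_ANCHORED = "satellite_anchored"
    VO_EXTRAPOLATED = "vo_extrapolated"
    DEAD_RECKONED = "dead_reckoned"


@dataclass(frozen=True)
class PoseHypothesis:
    """One stage output: position + covariance + categorical label (hypothetical schema)."""
    stamp_us: int        # capture time of the source frame, microseconds
    east_m: float        # position in a local metric frame (assumption: ENU)
    north_m: float
    cov_xy: tuple        # ((s_ee, s_en), (s_ne, s_nn)), metres^2
    label: SourceLabel
    stage: str           # producing stage, e.g. "matcher" or "cuvslam"

    def __post_init__(self):
        (a, b), (c, d) = self.cov_xy
        # A 2x2 symmetric matrix is positive definite iff a > 0 and det > 0.
        if not math.isclose(b, c) or a <= 0 or a * d - b * c <= 0:
            raise ValueError("cov_xy must be symmetric positive definite")
```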
---
### Component 1: Satellite Tile Cache & Descriptor Index
**Unchanged from draft02 / Mode B round 1** — MBTiles SQLite + WAL + connection pool + transaction batching; FAISS IVF over per-chunk DINOv2-VLAD vectors (chunk-decoupled per M-16); `terrain_class` and `trust_level` sidecar. (M-28: COG + PMTiles considered and rejected for our use case.)
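
The MBTiles + WAL + transaction-batching combination can be sketched with the standard-library `sqlite3` module. This assumes the MBTiles `tiles` table layout (TMS row order) and omits the `metadata` table, the connection pool, and the `terrain_class` / `trust_level` sidecar; the helper names are hypothetical. Existing rows are never overwritten here (`INSERT OR IGNORE`), on the assumption that the M-9 write decision is made upstream.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE IF NOT EXISTS tiles (zoom_level INTEGER, tile_column INTEGER,"
    " tile_row INTEGER, tile_data BLOB)",
    "CREATE UNIQUE INDEX IF NOT EXISTS tile_index"
    " ON tiles (zoom_level, tile_column, tile_row)",
)


def open_tile_cache(path):
    """Open (or create) an MBTiles file in WAL mode; returns (connection, journal mode)."""
    con = sqlite3.connect(path)
    # WAL lets readers (matcher / VPR) keep reading while the ortho writer commits.
    mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    con.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL (assumption)
    for stmt in SCHEMA:
        con.execute(stmt)
    con.commit()
    return con, mode


def xyz_to_tms_row(z, y):
    """MBTiles stores rows in TMS order (origin bottom-left); XYZ counts from the top."""
    return (1 << z) - 1 - y


def write_batch(con, z, tiles):
    """Insert many (x, y_xyz, blob) tiles in one transaction; never overwrites a tile."""
    with con:  # a single BEGIN ... COMMIT for the whole batch
        con.executemany(
            "INSERT OR IGNORE INTO tiles VALUES (?, ?, ?, ?)",
            ((z, x, xyz_to_tms_row(z, y), data) for x, y, data in tiles),
        )
```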
---
### Component 1b: Ortho-Tile Generator *(REVISED — M-27)*
**Library**: **Orthority** (S59, Python, MIT-class) — frame-camera model with GeoTIFF DEM lookup. Pip-installable: `pip install orthority`. Replaces draft02's hand-rolled "pinhole projection on per-sector DEM".

**Pipeline per frame** (eligibility / quality / dedup logic unchanged from draft02; only the *projection step* is replaced):
1. **Eligibility check** (unchanged from draft02 / M-9 hard gate): skip when EKF source is `dead_reckoned`, σ_xy > 5 m, roll/pitch > 10°, no inliers, or sector is `rugged`. Sectors classified `moderate` get `terrain_uncertainty=true` sidecar flag.
2. **Orthorectification (revised)**: call `orthority.Ortho(frame, dem, camera_model).process()` with the frame-camera model populated from FC `ATTITUDE` (gimbal pitch / roll / yaw) + companion-resolved position + airframe altitude. SRTM-30 m DEM tile pre-loaded for the operational area.
3. **Resampling to basemap projection** (unchanged): EPSG:3857 z=20.
4. **Quality scoring** (unchanged from draft02): sharpness + coverage + match_inliers + parent_pose_sigma_xy + glare/cloud flag.
5. **Deduplication / write decision** (unchanged from draft02 — M-9 service-tile-immutability + soft/candidate gates).
6. **Sidecar metadata** (unchanged): `parent_pose_sigma_xy`, `terrain_class`, `trust_level`.

**Latency budget**: F-T14 (revised) measures Orthority's per-frame latency on Orin Nano Super. **Budget: ≤50 ms / frame.** Documented fall-back if exceeded: `cv2.warpPerspective` + bilinear DEM lookup (~5–20 ms estimated).
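
The `cv2.warpPerspective` fall-back amounts to treating the frame footprint as a plane at a DEM-sampled height. A minimal numpy sketch, assuming a pinhole model without lens distortion, a world→camera rotation `R` built from FC `ATTITUDE` plus the nav-cam mounting, a local east/north/up metric frame, and a north-up ortho raster; all function names are hypothetical. Projecting ground points (X, Y, z_g) gives x ∝ K·R·((X, Y, z_g) − C) = K·[r1 r2 t]·(X, Y, 1), which is a 3×3 homography.

```python
import numpy as np


def bilinear(dem, col, row):
    """Bilinear DEM sample at fractional (col, row); assumes an interior point."""
    c0, r0 = int(np.floor(col)), int(np.floor(row))
    fc, fr = col - c0, row - r0
    z = dem[r0:r0 + 2, c0:c0 + 2]
    return ((1 - fr) * ((1 - fc) * z[0, 0] + fc * z[0, 1])
            + fr * ((1 - fc) * z[1, 0] + fc * z[1, 1]))


def ground_to_image_homography(K, R, C, z_ground):
    """H maps ground-plane points (X, Y, 1) at height z_ground to image pixels.

    K: 3x3 intrinsics, R: 3x3 world->camera rotation, C: camera centre (m).
    """
    t = R @ (np.array([0.0, 0.0, z_ground]) - np.asarray(C, float))
    return K @ np.column_stack((R[:, 0], R[:, 1], t))


def image_from_ortho(H, x0, y0, gsd):
    """Compose H with a north-up ortho raster whose top-left corner is (x0, y0)."""
    A = np.array([[gsd, 0.0, x0], [0.0, -gsd, y0], [0.0, 0.0, 1.0]])
    return H @ A


def apply_h(M, u, v):
    """Apply a homography to one point and dehomogenise."""
    p = M @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]

# M = image_from_ortho(...) maps ortho pixels to source pixels, so with OpenCV:
#   ortho = cv2.warpPerspective(frame, M, (width, height),
#                               flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
# z_ground would come from bilinear(dem, col, row) at the footprint centre.
```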
---
### Component 2: Visual Place Recognition (Global Retrieval)
**Unchanged from draft02 / Mode B round 1.** AnyLoc + SALAD + BoQ + MixVPR shortlist; conditional invocation (M-17); chunk-based retrieval unit (M-16); expanding-window retry (M-18); multi-scale chunks + OSM road-overlay + sector-volatility-driven K (M-19); active-conflict scene-change mitigations stand. (M-33: no new VPR backbone in 2025 displaces this.)

---
### Component 3: Cross-View Matching & PnP *(REVISED — M-24, M-25)*
**Inline lead**: **SuperPoint + LightGlue (TRT FP16/INT8)** — unchanged. Feasibility re-confirmed by extrapolation: ~50–200 ms / pair on Orin Nano Super FP16 at 320×240 → 640×480 (RTX 3080 baseline 0.96 + 2.54 ms scaled by Orin Nano Super throughput ratio; cross-validated by S76 YOLO26 reference points).

**Inline peer**: **GIM-LightGlue** — unchanged from draft02 (M-3, S48). +8.4–18.1 % zero-shot vs LightGlue baseline.

**Embedded fallback**: **XFeat (sparse + semi-dense)** — unchanged.

**Re-localization fallback** *(new — M-24)*: **LiteSAM** (S58). Invoked rarely (cold start, σ_xy > 50 m, sharp turn after cuVSLAM tracking loss). Latency budget: 1.5–2 s on Orin Nano Super. Accepted because re-loc events are rare and the AC-NEW-1 cold-start budget is 30 s.

**Validation oracle** *(new — M-24)*: **LiteSAM run offline on bench data** for reference-quality matches. Used to score the inline matcher's recall@30m on a per-flight basis without needing manual annotation.

**Distillation teacher** *(new — M-24)*: train a satellite-aerial-specialised student model (target ≤5 M params, ≤100 ms / pair) using LiteSAM-supervised correspondences on TartanAir V2 + AerialExtreMatch + UAV-VisLoc. Output is a candidate inline matcher for v1.x.

**Offline ceiling references** *(new — M-25)*: **RoMa v2** (S63), **MASt3R-SLAM** (S62), **MapGlue**, **MATCHA** — included in the matcher bench-off so we know how much accuracy we trade by using SP+LG inline. None becomes an inline candidate.

**Bench-off scope (revised)** for the deferred research item:
- Inline candidates (must fit in 200 ms / pair on Orin Nano Super @ 25 W): SP+LG, GIM-LightGlue, XFeat (sparse), XFeat (semi-dense).
- Re-loc candidates (must fit in 2 s / pair): LiteSAM.
- Offline ceilings: RoMa v2, MASt3R-SLAM, MapGlue, MATCHA.

**Bench-off targets** (unchanged from draft02): AerialVL, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight.

**Score on**: AC-1.1 / AC-1.2 / AC-2.2 / p95 latency on Orin Nano Super 25 W / sustained 30-min thermal stability / peak GPU memory / **plus seasonal-robustness score** / **plus accuracy-vs-inline-feasibility frontier (re-loc role only for >200 ms candidates)**.

**PnP & projection**: unchanged from draft02.

**Input downsampling**: unchanged starting points (1024×768 for SP+LG / GIM-LG; 640×480 for XFeat sparse).

---
### Component 4: Visual Odometry *(REVISED — M-22, M-23)*
**v1 choice**: **cuVSLAM** (NVIDIA, CUDA-accelerated, Apache-2.0; S60). Monocular + IMU mode. Drop-in via `isaac_ros_visual_slam` ROS 2 wrapper (S64). Replaces draft02's "custom 2-frame VO via SP+LG / GIM-LG homography".

**Why cuVSLAM**:
- Production-grade VO/SLAM with keyframe-based local mapping + sparse bundle adjustment + loop closure — bounds drift, unlike a 2-frame homography.
- CUDA-accelerated, optimized for Jetson. Reference designs on Orin Nano (S64, S77) confirm runtime feasibility.
- <1 % ATE on KITTI / <5 cm on EuRoC.
- Minimal integration cost via the ROS 2 wrapper.

**Why not the alternatives**:
- DPVO / DPV-SLAM (S61, S73): extrapolated 4–15 FPS on Orin Nano Super — borderline for the 10 Hz target. Reserved as bench-off fall-back.
- MASt3R-SLAM (S62): sub-1 Hz on Orin Nano Super — infeasible inline.
- VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71): non-trivial integration cost; no accuracy advantage. Reserved as bench-off fall-backs.
- Custom 2-frame homography VO (draft02): wrong design (M-22).

**IMU source for cuVSLAM** (Q7 → A, locked, M-35): **MAVLink `RAW_IMU` / `SCALED_IMU` from FC** at ~200–400 Hz (path a). Subscribed inside the cuVSLAM node via MAVROS. **F-T1c** (new field test) measures sync-jitter under flight load; if it fails the threshold (TBD by cuVSLAM tolerance), v1.1 adds a dedicated companion IMU (BNO055 / ICM-42688P / BMI270) over SPI as a hardware revision.
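
One way to quantify the MAVLink-delivered IMU stream for F-T1c is sketched below: effective rate and worst gap from the FC's `time_usec` stamps, plus transport jitter from the companion receive times. This is an assumed analysis, not the defined F-T1c procedure; it removes the unknown FC↔companion clock offset with a median but ignores clock drift, so it suits short windows only. The function name is hypothetical.

```python
import statistics


def imu_timing_stats(fc_time_usec, rx_time_usec):
    """Summarise IMU message timing from one replayed log (hypothetical helper).

    fc_time_usec: RAW_IMU / SCALED_IMU timestamps on the FC clock, microseconds.
    rx_time_usec: companion receive times for the same messages, microseconds.
    """
    if len(fc_time_usec) != len(rx_time_usec) or len(fc_time_usec) < 3:
        raise ValueError("need >= 3 paired samples")
    # Inter-sample spacing on the FC clock gives the effective publish rate.
    dts = [b - a for a, b in zip(fc_time_usec, fc_time_usec[1:])]
    mean_dt = statistics.fmean(dts)
    # Transport delay = receive - send; subtracting the median removes the
    # constant clock offset and leaves only the delay jitter.
    delays = [rx - fc for fc, rx in zip(fc_time_usec, rx_time_usec)]
    med = statistics.median(delays)
    resid = sorted(abs(d - med) for d in delays)
    p95 = resid[min(len(resid) - 1, int(0.95 * len(resid)))]  # nearest-rank style
    return {
        "rate_hz": 1e6 / mean_dt,
        "max_gap_us": max(dts),
        "jitter_p95_us": p95,
    }
```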
**Camera intrinsics**: nav cam (ADTi 20MP APS-C) calibrated pre-flight via standard checkerboard (M-34). cuVSLAM consumes the `camera_info` topic at start-up.

**Risk R8 reframed**: cuVSLAM's high-altitude fixed-wing performance is empirically unproven (its published benchmarks are urban driving + indoor MAV). **F-T1b (revised) bench-off mandatory before AC-1.3 lock**.

**Fall-back path**: if cuVSLAM underperforms on AerialVL fixed-wing trajectories, use a properly scoped VO (DPV-SLAM with keyframe + bundle adjustment + loop closure, not 2-frame homography) as the v1.1 candidate. Custom 2-frame VO never comes back.

---
### Component 5: Companion-Side Output Stage *(REVISED — M-26, M-30)*
**Renamed**: was "IMU + Visual EKF Fusion" in draft02. Now: **"Companion-Side Output Stage — Covariance Calibrator + Outlier Gate + Source-Label Producer"**.

**Responsibility (v1)**:
1. Consume cuVSLAM relative-pose + cross-view matcher absolute-pose hypotheses.
2. Run a Mahalanobis outlier gate to drop fixes whose innovation w.r.t. cuVSLAM relative pose exceeds a threshold (computed against AC-NEW-4 false-position safety budget).
3. Re-scale covariances using empirical residuals (online, exponentially-weighted) to correct for systematic over- / under-confidence in the matcher / VPR / VO outputs.
4. Tag the result with a categorical source label: `satellite_anchored / vo_extrapolated / dead_reckoned`.
5. Emit on the appropriate MAVLink channel (GPS_INPUT for v1, Option A in M-30).

**Explicitly NOT in v1**:
- ❌ State propagation (no `x_{k+1} = f(x_k, u_k) + w_k`).
- ❌ IMU integration (the FC's EKF3 does this with the FC's own IMU at 400 Hz).
- ❌ ODOMETRY emission (Option B in M-30 — v1.1+).

**ESKF question resolved**: ArduPilot EKF3 is a regular EKF (24-state) — we cannot swap the FC filter (S65, S66, S67, S68). The EKF-vs-ESKF debate applies only to a hypothetical companion-side filter, which we drop for v1. **If v1.x evidence (F-T9 SITL) demands a companion-side filter, use vanilla ESKF** (S69) — the right family for orientation correctness, with tangent-space covariance on SO(3).

**Hybrid-output channel split (M-30)**:
| Mode | `EK3_SRC1_*` configuration | Channel emission | Status |
|------|---------------------------|------------------|--------|
| **Option A (v1 default)** | `POSXY=GPS, VELXY=GPS, YAW=GPS+Compass`. `EK3_SRC2_*=GPS` for failover. | GPS_INPUT only (`GPS1_TYPE=14`). ODOMETRY disabled. | Ships in v1. |
| **Option B (v1.1+)** | `POSXY=ExternalNav, YAW=ExternalNav`. `EK3_SRC2_POSXY=GPS` for failover. | ODOMETRY primary; GPS_INPUT held in reserve, not actively fused while ODOMETRY healthy. | Requires PR #30080 fix; gated on F-T9 SITL pass. |
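
Steps 2 and 3 of the Component 5 responsibilities can be sketched for 2-D horizontal position as follows. Assumptions, all hypothetical: the innovation is the absolute fix minus a predicted position obtained by applying cuVSLAM's relative motion to the last accepted fix (`pred_xy`, covariance `cov_pred`); the gate is a chi-square test at false-rejection rate `alpha`, which for two degrees of freedom has the closed form g = −2 ln α; and a single scalar covariance scale `k` tracks an exponentially weighted mean of the normalised innovation squared (NIS), whose expectation is 2 for a calibrated 2-D Gaussian. The class name and defaults are illustrative, not tuned against AC-NEW-4.

```python
import math

import numpy as np


class CalibratorGate:
    """Mahalanobis gate + online covariance scale for 2-D fixes (hypothetical sketch)."""

    def __init__(self, alpha=1e-3, ewma=0.05, k_min=0.5, k_max=10.0):
        # Chi-square survival function for 2 DoF is exp(-g/2), so g = -2 ln(alpha).
        self.gate = -2.0 * math.log(alpha)
        self.ewma = ewma
        self.k_min, self.k_max = k_min, k_max
        self.k = 1.0         # covariance scale factor (1.0 = trust as reported)
        self.nis_mean = 2.0  # E[chi2 with 2 DoF] = 2 when covariances are calibrated

    def process(self, z_xy, cov_z, pred_xy, cov_pred):
        """Gate one absolute fix; returns (accepted, d2, emitted_cov or None)."""
        nu = np.asarray(z_xy, float) - np.asarray(pred_xy, float)
        s_raw = np.asarray(cov_z, float) + np.asarray(cov_pred, float)
        d2_raw = float(nu @ np.linalg.solve(s_raw, nu))
        d2 = d2_raw / self.k  # Mahalanobis distance under the k-scaled covariance
        if d2 > self.gate:
            return False, d2, None
        # Exponentially weighted NIS. Caveat: only accepted fixes update it,
        # which biases the estimate slightly low.
        self.nis_mean = (1.0 - self.ewma) * self.nis_mean + self.ewma * d2_raw
        self.k = min(self.k_max, max(self.k_min, self.nis_mean / 2.0))
        # Simplification: all miscalibration is attributed to the matcher covariance.
        return True, d2, self.k * np.asarray(cov_z, float)
```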
---
### Component 6: MAVLink Integration & Source Promotion
**Unchanged from draft02 / round 1.** MAVSDK (telemetry, sysid=10) + pymavlink (GPS_INPUT, sysid=11), distinct system-IDs sharing the serial port via ArduPilot's native MAVLink routing. **No mavlink-router daemon.** **MAVLink2 signing mandatory**, per-airframe key in FC FRAM. Source-promotion logic and AC-NEW-2 (<3 s spoofing-promotion latency) carry forward unchanged. (M-31: sysid collision-check added to deploy runbook.)

---
### Component 7: Failsafe, Health & Re-Localization
Unchanged from draft02.

---
### Component 8: Object Localization (AI Camera)
Unchanged from draft02.

---
### Component 9: Software Platform & Process Topology *(LOCKED — Q6 → A, M-29)*
**v1 choice**: **ROS 2 Humble + Isaac ROS 3.2 on JetPack 6 / Ubuntu 22.04** (S64, S77).

**Process topology**:
- **C++ Isaac ROS node**: cuVSLAM via `isaac_ros_visual_slam` (consumes `camera_info` + image stream + IMU; publishes `nav_msgs/Odometry`).
- **C++ MAVROS node**: bridges ROS 2 ↔ MAVLink for the FC. `RAW_IMU` / `SCALED_IMU` subscribed by the cuVSLAM node; FC `ATTITUDE` consumed by Component 1b ortho node; `GPS_INPUT` published by Component 5 calibrator node.
- **Python `rclpy` nodes**: matcher (SP+LG TRT FP16/INT8), VPR (SALAD/BoQ on demand), Component 1b ortho generator (Orthority), Component 5 calibrator + outlier gate, FDR writer, Suite-Service uploader.
- **TensorRT engines + CUDA contexts** owned per-node (no shared CUDA context). Engines loaded at node start-up; warm-up inference at boot.

**Stack details (locked)**:
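- CPython **3.11 or 3.12** inside `rclpy` nodes (free-threaded 3.13 deferred to v1.x — M-32, M-33). The binary ROS 2 Humble packages for Ubuntu 22.04 are built against the system Python 3.10, so running 3.11 / 3.12 implies building `rclpy` from source.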
- TensorRT **FP16 default**, INT8 where validated by the matcher bench-off.
- **numba JIT** for the calibrator's hot path (Mahalanobis distance + covariance re-scale).
- Configuration via YAML; structured-JSON logging to FDR; `ros2 bag` for in-flight telemetry capture.

**Cost / benefit reaffirmed**:
- **Cost**: ~25 % CPU for DDS + topic serialisation; ~200 MB extra deployment-image footprint; learning curve (mitigated by published reference designs in S64, S77).
- **Benefit**: drop-in `isaac_ros_visual_slam` for cuVSLAM, drop-in MAVROS for the FC bridge, free observability via `ros2 bag` and `rqt_*`, battle-tested by the wider robotics community.

**Reference designs**: S64 (Hackster.io GPS-Denied Drone), S77 (thomasthelliez ROS 2 / Isaac ROS guide), `bandofpv/VSLAM-UAV` (PX4 + ROS 2 reference), `sidharthmohannair/ros2-ardupilot-sitl-hardware` (ArduPilot + ROS 2 reference).

---
### Component 10: Flight Data Recorder
Unchanged from draft02 / round 1.

---
### Component 11: Confidence Score (cross-cutting)
Unchanged from draft02 / round 1.

---
## Testing Strategy
### Functional / Integration
- **F-T1** Tile cache load/lookup *(unchanged)*.
- **F-T1b** *(REVISED — M-22, M-23, R8 reframed)* AC-1.3 drift regression: run **cuVSLAM** on AerialVL fixed-wing trajectories (70 km of real flight). Pass = drift ≤ 100 m mono-only / ≤ 50 m mono+IMU between satellite anchors at 95th percentile. **Gates AC-1.3 lock.** If cuVSLAM fails: fall back to DPV-SLAM bench / VINS-Fusion bench.
- **F-T1c** *(new — M-22, M-23)* Compare cuVSLAM mono vs cuVSLAM mono+IMU on the same AerialVL trajectories — quantifies IMU contribution given MAVLink-rated IMU rate (path (a) of M-35).
- **F-T2** Tile generation + dedup *(extended — M-9 + M-27)*: replay a recorded flight; assert (a) ≤1 tile per ground sector covered ≥2× by nav cam; (b) tile has `parent_pose_sigma_xy` ≤ hard gate; (c) service tiles never overwritten within freshness budget; **(d) Orthority output equivalent to ground-truth ortho (RMSE < 1 px on synthetic frame with known DEM)**.
- **F-T3** Tile uploader → candidate pool *(unchanged from draft02)*.
- **F-T4** End-to-end against AerialVL.
- **F-T5** End-to-end against UAV-VisLoc.
- **F-T5b** End-to-end against AerialExtreMatch *(unchanged from draft02 — M-14)*.
- **F-T5c** Season-robustness regression against 2chADCNN season set *(unchanged from draft02 — M-14)*.
- **F-T6** End-to-end against internal Mavic flight footage.
- **F-T7** Sharp-turn handling *(extended — M-24)*: assert LiteSAM re-loc fallback recovers within 2 s on post-turn frames where SP+LG inline matcher fails.
- **F-T8** Disconnected-segment re-localization *(extended — M-24)*: include LiteSAM re-loc in the test matrix.
- **F-T9** ArduPilot SITL: full MAVLink loop *(REVISED — M-30)*. Test matrix:
  - **Option A mode** (v1 default): GPS_INPUT only; verify EKF3 fuses correctly; verify failover to backup GPS via `EK3_SRC2_*`.
  - **Option B mode** (v1.1 candidate): ODOMETRY-primary; verify PR #30080-class source-switching is clean; verify GPS_INPUT held in reserve does not double-fuse (issues #30076 / #32506 regression test).
  - Source switching: jam-onset → our channel; spoofed-real-GPS recovery → operator-confirmed source-restore.
  - **MAVLink2 signing on**: assert injection refused on signing failure; assert acceptance on valid signing.
- **F-T10** Operator re-loc workflow via QGC `STATUSTEXT` *(unchanged)*.
- **F-T11** Cold-start TTFF <30 s (AC-NEW-1) *(extended — M-24)*: include LiteSAM as the cold-start re-loc path.
- **F-T12** Spoofing-promotion <3 s (AC-NEW-2) *(unchanged)*.
- **F-T13** Object localization with airframe-attitude fusion *(unchanged)*.
- **F-T14** *(REVISED — M-27)* Per-sector DEM classification + **Orthority per-frame latency**: load SRTM-30 m for the operational area; assert sector classes (`flat`, `moderate`, `rugged`) match ground-truth DEM amplitudes; **measure Orthority per-frame ortho latency on Orin Nano Super @ 25 W**; assert ≤ 50 ms / frame budget. If exceeded: switch to `cv2.warpPerspective + bilinear DEM` fall-back.
- **F-T15** VPR retrieval-unit bench *(unchanged from draft02 — M-16/17/18)*.
- **F-T16** Synthetic cloud-occlusion injection *(unchanged from draft02)*.
- **F-T17** Mission replay assertion *(unchanged from draft02 — M-17)*.
- **F-T18** *(new — M-26)* Companion-side calibrator regression: replay a recorded flight; assert the calibrator's empirical residuals lie within the configured Mahalanobis gate; assert no state-propagation logic is invoked; assert ArduPilot EKF3 receives well-calibrated covariances (post-flight comparison of `h_acc` reported vs measured residual).
- **F-T19** *(new — Q6)* ROS 2 topic-rate sanity test (Q6 → A is locked: ROS 2 Humble + Isaac ROS 3.2): assert all ROS 2 topics meet their expected publish rates under simulated load.
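F-T14 compares the runtime sector classes against ground-truth DEM amplitudes, so it helps to have a reference classifier to check against. What follows is a minimal sketch, assuming "DEM amplitude" means max-minus-min elevation inside a square sector. The 30 m / 150 m thresholds and the `classify_sectors` helper are hypothetical placeholders, not values from this draft.

```python
from typing import List

# Hypothetical relief thresholds in metres (max minus min elevation in a sector).
# Real cut-offs would be calibrated on the operational-area DEM.
FLAT_MAX_RELIEF_M = 30.0
MODERATE_MAX_RELIEF_M = 150.0


def classify_sectors(dem: List[List[float]], sector_px: int) -> List[List[str]]:
    """Split a row-major DEM grid into sector_px x sector_px sectors and label
    each one 'flat', 'moderate' or 'rugged' by its relief amplitude.
    Partial sectors at the right/bottom edge are ignored in this sketch."""
    n_rows = len(dem) // sector_px
    n_cols = len(dem[0]) // sector_px
    labels: List[List[str]] = []
    for r in range(n_rows):
        row_labels: List[str] = []
        for c in range(n_cols):
            cells = [
                dem[i][j]
                for i in range(r * sector_px, (r + 1) * sector_px)
                for j in range(c * sector_px, (c + 1) * sector_px)
            ]
            relief = max(cells) - min(cells)
            if relief <= FLAT_MAX_RELIEF_M:
                row_labels.append("flat")
            elif relief <= MODERATE_MAX_RELIEF_M:
                row_labels.append("moderate")
            else:
                row_labels.append("rugged")
        labels.append(row_labels)
    return labels
```

With SRTM-30 m posting, `sector_px=34` gives sectors of roughly 1 km on a side. Max-minus-min is sensitive to single-pixel DEM spikes. A percentile range (for example 95th minus 5th) would be a more robust variant.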
### Non-Functional
- **NF-T1** Latency p95 <400 ms on Orin Nano Super 25 W (AC-4.1) *(unchanged)*.
- **NF-T2** Memory <8 GB shared (AC-4.2) *(extended — Q6)*: ROS 2 + Isaac ROS deployment image must fit; reserve ≥1 GB for matcher + VPR engines.
- **NF-T3** Thermal: 8 h sustained 25 W (AC-NEW-5) *(unchanged)*.
- **NF-T4** False-position safety budget (AC-NEW-4) *(extended — M-26)*: Monte Carlo with synthetic over-confidence injection; verify Component 5's outlier gate rejects bad fixes BEFORE they reach ArduPilot EKF3 (companion-side gate; FC EKF3 gate is a second line of defence).
- **NF-T4b** AC-NEW-7 cache-poisoning safety budget *(unchanged — M-9)*.
- **NF-T5** Storage: 64 GB FDR cap with rollover *(unchanged)*.
- **NF-T6** Imagery freshness gate (AC-NEW-6) *(unchanged)*.
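The companion-side outlier gate that NF-T4 exercises can be sketched as a chi-square test on the 2-D horizontal innovation. The sketch rests on three assumptions. First, the residual is taken in local east/north metres against some reference position; the FC's current EKF3 estimate is one option, but the draft does not pin this down. Second, the two covariances are independent and simply summed. Third, the 99 % gate probability is a placeholder. For 2 degrees of freedom the chi-square quantile has the closed form -2 ln(1 - p), so no statistics library is needed. Consistent with M-26, nothing here propagates state.

```python
import math
from typing import Sequence, Tuple

Matrix2 = Sequence[Sequence[float]]


def chi2_2dof_quantile(p: float) -> float:
    """Exact chi-square quantile for 2 degrees of freedom: -2 ln(1 - p)."""
    return -2.0 * math.log(1.0 - p)


def mahalanobis_sq_2d(r: Tuple[float, float], s: Matrix2) -> float:
    """r^T S^-1 r for a 2-vector r and a 2x2 positive-definite matrix S."""
    a, b = s[0]
    c, d = s[1]
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    rx, ry = r
    # S^-1 = (1/det) * [[d, -b], [-c, a]], expanded into the quadratic form.
    return (d * rx * rx - (b + c) * rx * ry + a * ry * ry) / det


def gate_fix(fix_en: Tuple[float, float], fix_cov: Matrix2,
             ref_en: Tuple[float, float], ref_cov: Matrix2,
             p_gate: float = 0.99) -> Tuple[bool, float]:
    """Accept a satellite-matched fix only if its innovation against the
    reference position passes the chi-square gate. Returns (accepted, d2)."""
    r = (fix_en[0] - ref_en[0], fix_en[1] - ref_en[1])
    s = [[fix_cov[i][j] + ref_cov[i][j] for j in range(2)] for i in range(2)]
    d2 = mahalanobis_sq_2d(r, s)
    return d2 <= chi2_2dof_quantile(p_gate), d2
```

In the NF-T4 Monte Carlo, synthetic over-confidence would show up as a shrunken `fix_cov`. That shrinks the innovation covariance and inflates `d2`, so gross outliers become easier to reject even when the fix claims a small error.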
### Security
- **S-T1** … **S-T5** *(unchanged from draft02)*.
### Field
- **FT-1** … **FT-3** *(unchanged from draft02)*.
---
## Key Risks & Open Items (carried into Plan step)
| ID | Risk | Severity | Mitigation |
|----|------|----------|------------|
| R1 | Imagery licensing lead time (Service-side) | Med | Suite Service procurement |
| R2 | Latency budget on Orin Nano Super at 1024×768 | Med | Empirical bench-off in week 1 of impl |
| R3 | Cross-view accuracy at 1 km AGL with Ukrainian seasonal change | Med | 50 %@20 m hard floor; bench-off includes SALAD/BoQ/GIM-LG/2chADCNN/**LiteSAM-as-oracle** |
| R4 | MAVSDK + pymavlink coexistence | **Resolved** (M-6) | — |
| R5 | Thermal at 25 W for 8 h | Med | NF-T3 |
| R6 | AC-7.1 in turning flight | Low | v1.1 |
| R7 | Public dataset gap (V&V) | Med | Bench-off + first internal fixed-wing flight before AC-1.3 lock |
| **R8** *(REFRAMED — M-22, M-23)* | **cuVSLAM 1 km AGL fixed-wing performance is empirically unproven** | Med | F-T1b on AerialVL fixed-wing trajectories; FT-3 first internal fixed-wing flight; documented fall-back to DPV-SLAM / VINS-Fusion |
| R9 | Cross-flight cache poisoning | High (safety) | Service-tile immutability + 2-flight voting + σ_xy hard gate + AC-NEW-7 |
| R10 | Companion↔FC link is flight-critical attack surface | High (security) | MAVLink2 signing mandatory + native routing |
| R11 | ArduPilot ExtNav source-switching gotchas | Med | F-T9 SITL matrix; pin ArduPilot to a release that includes PR #30080 |
| R12 | Eastern-Ukraine relief amplitude breaks flat-Earth assumption | Med | Per-sector DEM lookup + runtime self-classifier |
| **R13** *(new — M-27)* | **Orthority per-frame latency on Orin Nano Super may exceed budget** | Low–Med | F-T14 measurement; fall-back to `cv2.warpPerspective + bilinear DEM` (~5–20 ms estimated) |
| **R14** *(new — M-26, M-30)* | **Dropping companion-side EKF may surface FC-side covariance-handling issues** | Low–Med | F-T18 calibrator regression + F-T9 SITL Option A; if EKF3 mishandles raw inputs, re-introduce vanilla ESKF in v1.x |
| R15 *(M-29)* | Orchestrator choice (Q6 → A locked: ROS 2 Humble + Isaac ROS 3.2) | **Resolved** | — |
| **R16** *(M-35)* | **MAVLink-rated IMU may be insufficient for cuVSLAM sync sensitivity** | Low–Med | F-T1c IMU-sync-jitter measurement; v1.1 hardware revision adds dedicated companion IMU if F-T1c fails (Q7 → A locked: path (a) for v1) |
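The R9 mitigation combines a σ_xy hard gate with 2-flight voting. Below is a minimal sketch of the admission rule for flight-derived tiles, assuming "voting" means pairwise agreement between per-flight placements. `SIGMA_XY_MAX_M`, `AGREEMENT_TOL_M` and the `TileObservation` record are hypothetical. Service tiles are treated as immutable and never pass through this path.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

SIGMA_XY_MAX_M = 10.0     # hypothetical sigma_xy hard gate
AGREEMENT_TOL_M = 15.0    # hypothetical cross-flight agreement tolerance
MIN_DISTINCT_FLIGHTS = 2  # "2-flight voting"


@dataclass(frozen=True)
class TileObservation:
    flight_id: str
    east_m: float      # tile anchor placement in local metres
    north_m: float
    sigma_xy_m: float  # reported horizontal uncertainty of that placement


def admit_tile(observations: List[TileObservation]) -> bool:
    # 1. Hard gate: drop any placement whose sigma_xy exceeds the limit.
    passing = [o for o in observations if o.sigma_xy_m <= SIGMA_XY_MAX_M]
    # 2. One vote per flight (keep its tightest placement), so a single
    #    flight cannot supply both votes.
    per_flight: Dict[str, TileObservation] = {}
    for o in passing:
        kept = per_flight.get(o.flight_id)
        if kept is None or o.sigma_xy_m < kept.sigma_xy_m:
            per_flight[o.flight_id] = o
    votes = list(per_flight.values())
    if len(votes) < MIN_DISTINCT_FLIGHTS:
        return False
    # 3. Every pair of per-flight placements must agree within tolerance.
    for i in range(len(votes)):
        for j in range(i + 1, len(votes)):
            gap = math.hypot(votes[i].east_m - votes[j].east_m,
                             votes[i].north_m - votes[j].north_m)
            if gap > AGREEMENT_TOL_M:
                return False
    return True
```

Requiring pairwise agreement is stricter than a simple majority. With only two votes, the two rules coincide.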
---
## Proposed AC additions
**AC-NEW-7 — Cache-poisoning safety budget** *(unchanged from draft02 — M-9)*.
**AC-NEW-8 — VO drift bound on fixed-wing 1 km AGL** *(new — M-22, M-23, R8 reframed)*. Specifically: cuVSLAM (mono+IMU) drift between satellite anchors ≤ 50 m at the 95th percentile on AerialVL fixed-wing trajectories; ≤ 100 m in mono-only mode. Validated by **F-T1b**.
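One way to evaluate AC-NEW-8 on an F-T1b replay is sketched here. The draft does not define over which samples the 95th percentile is taken. This sketch assumes it is computed over per-epoch horizontal errors of VO-only epochs between anchors, using the nearest-rank definition. The sample tuple layout and helper names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

# AC-NEW-8 limits: p95 drift between satellite anchors, in metres.
DRIFT_P95_LIMIT_M = {"mono+imu": 50.0, "mono": 100.0}


def percentile_nearest_rank(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with >= pct % of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def check_drift(samples: Iterable[Tuple[float, float, float, float]],
                mode: str) -> Tuple[float, bool]:
    """samples: (vo_east, vo_north, truth_east, truth_north) in local metres,
    restricted to VO-only epochs between satellite anchors."""
    errors = [math.hypot(ve - te, vn - tn) for ve, vn, te, tn in samples]
    p95 = percentile_nearest_rank(errors, 95.0)
    return p95, p95 <= DRIFT_P95_LIMIT_M[mode]
```

An alternative reading takes the peak error of each inter-anchor segment and applies the percentile across segments. That version weights long anchor gaps more heavily and would need its own threshold discussion.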
**AC-NEW-9 — Companion-side covariance calibration accuracy** *(new — M-26)*. Empirical residuals of GPS_INPUT pose, computed against ground truth on F-T1b trajectories, must lie within the reported `h_acc`/`v_acc` covariance with probability ≥ 95 %. (Calibration must not under- or over-claim.) Validated by **F-T18**.
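Here is a sketch of the F-T18 coverage check behind AC-NEW-9. It assumes the calibrator reports `h_acc` as a per-axis 1-σ value. It then scales that value to the radius containing 95 % of a circular 2-D Gaussian, sqrt(-2 ln 0.05) ≈ 2.45. The upper coverage bound that flags over-claiming (99 %) is a hypothetical choice.

```python
import math
from typing import Sequence, Tuple


def radius_scale_2d(p: float) -> float:
    """Multiplier from per-axis 1-sigma to the radius containing probability p
    of a circular 2-D Gaussian (Rayleigh quantile): sqrt(-2 ln(1 - p))."""
    return math.sqrt(-2.0 * math.log(1.0 - p))


def horizontal_coverage(residuals_m: Sequence[float], h_acc_m: Sequence[float],
                        p: float = 0.95) -> float:
    """Fraction of epochs whose horizontal residual (metres, vs ground truth)
    falls inside the p-containment radius implied by the reported h_acc."""
    if not residuals_m or len(residuals_m) != len(h_acc_m):
        raise ValueError("need equal-length, non-empty residual and h_acc series")
    k = radius_scale_2d(p)
    inside = sum(1 for r, h in zip(residuals_m, h_acc_m) if r <= k * h)
    return inside / len(residuals_m)


def calibration_ok(residuals_m: Sequence[float], h_acc_m: Sequence[float],
                   p: float = 0.95, max_coverage: float = 0.99) -> Tuple[bool, float]:
    """Pass if coverage is at least p (no under-claiming) and at most
    max_coverage (hypothetical bound against over-claiming)."""
    coverage = horizontal_coverage(residuals_m, h_acc_m, p)
    return p <= coverage <= max_coverage, coverage
```

The vertical check for `v_acc` is the 1-D analogue, with a 95 % scale of about 1.96.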
---
## Open Research (deferred to dedicated research passes before Plan)
| Topic | Why now | Output | Owner |
|-------|---------|--------|-------|
| **Cross-view matcher bench-off** *(REVISED scope — M-24, M-25)* | Inline + re-loc + offline-ceiling tracks are now distinct | Selected inline matcher; selected re-loc matcher; ceiling reference numbers; distillation candidate teacher (LiteSAM) | Research skill, follow-up Mode A pass |
| **Input-resolution sweep** | Same as draft02 | Resolution per matcher candidate; sensitivity curves | Same pass |
| **VPR backbone bench-off** | Same as draft02 | Selected VPR backbone | Same pass |
| **VO bench-off** *(new — M-22, M-23)* | cuVSLAM is the lead but unproven on 1 km AGL fixed-wing | cuVSLAM mono / cuVSLAM mono+IMU / DPV-SLAM / VINS-Fusion / OpenVINS comparison on AerialVL + first internal fixed-wing flight | Research / impl. team |
| **Tile-generator quality scoring** | Same as draft02 | Calibrated thresholds for σ_xy / sharpness / glare | Implementation phase |
| **Orthority per-frame latency on Orin Nano Super** *(new — M-27)* | Confirms or rejects M-27 library choice | F-T14 measurement; if fail → `cv2.warpPerspective + bilinear DEM` fall-back path locked | Implementation phase |
| **Internal Mavic-flight V&V dataset** | Same as draft02 | Curated, ground-truth-labelled clips | Operations / data team |
| **First internal fixed-wing flight** | Same as draft02 | Recorded sortie with synced IMU + GPS truth + nav-cam stream | Field-test plan |
| ~~Q6 — Orchestrator decision~~ | **Locked 2026-04-26**: Q6 → A (ROS 2 Humble + Isaac ROS 3.2). | — | — |
| ~~Q7 — Companion IMU strategy~~ | **Locked 2026-04-26**: Q7 → A (MAVLink `RAW_IMU` from FC for v1; dedicated IMU only if F-T1c fails). | — | — |
| **Encryption-at-rest key management** | Same as draft02 | Threat-modelled design | Phase 4 security analysis |
---
## References
All citations are by ID from `_docs/00_research/01_source_registry.md`. Mode B round 2 sources: **S58–S77** (round 1 sources S40–S57 carried over).
- **VO**: S60 (cuVSLAM), S61 (DPVO-QAT++), S62 (MASt3R-SLAM), S64 (Isaac ROS UAV reference), S71 (VINS-Fusion / OpenVINS Jetson reports), S72 (high-altitude VIO), S73 (DPV-SLAM).
- **Matcher**: S58 (LiteSAM), S63 (RoMa v2), S74 (OrthoLoC + AdHoP), S75 (AerialExtreMatch open-review).
- **Fusion**: S65 (ArduPilot ExtNav double-fusion bug), S66 (Z-axis snap bug), S67 (EKF sources spec), S68 (PX4 EKF2 ESKF PR), S69 (Sola ESKF tutorial), S70 (T-ESKF + Hybrid ESKF/UKF 2025).
- **Ortho**: S59 (Orthority).
- **Sweep**: S76 (Orin Nano Super FP16/INT8 reference points), S77 (ROS 2 / Isaac ROS practical guide).
---
## Related Artifacts
- Mode A draft: `_docs/01_solution/solution_draft01.md` (superseded by draft02 → draft03).
- Mode B round 1 draft: `_docs/01_solution/solution_draft02.md` (superseded by draft03).
- Mode B round 2 decomposition: `_docs/00_research/03_mode_b_decomposition_round2.md`.
- Mode B round 2 reasoning chain: `_docs/00_research/04_reasoning_chain_mode_b_round2.md`.
- Mode B round 2 validation log: `_docs/00_research/05_validation_log_mode_b_round2.md`.
- AC & Restrictions assessment (Phase 1): `_docs/00_research/00_ac_assessment.md` *(unchanged)*.
- Source registry: `_docs/00_research/01_source_registry.md` (S01–S77).
- Fact cards: `_docs/00_research/02_fact_cards.md` (Phase 1 + Mode B round 1 M-1..M-21 + Mode B round 2 M-22..M-35).
- Tech stack consolidation: `_docs/01_solution/tech_stack.md` (deferred — Phase 3 optional).
- Security analysis: `_docs/01_solution/security_analysis.md` (deferred — Phase 4 optional, **promoted to recommended-before-Plan-lock** because of M-6/M-7).