Solution Draft 03
Mode: B (Solution Assessment of solution_draft02.md). Inputs: solution_draft02.md (Mode B round 1) + _docs/00_research/{03_mode_b_decomposition_round2,04_reasoning_chain_mode_b_round2,05_validation_log_mode_b_round2}.md + Mode B round-2 sources S58–S77 in 01_source_registry.md + Mode B round-2 fact cards M-22..M-35 in 02_fact_cards.md. Date: 2026-04-26 (Mode B round 2). Self-contained: yes — supersedes solution_draft02.md.
What changed in round 2 (driven by user-explicit asks: VO, matcher, EKF/ESKF, ortho-tile generator + thorough sweep):
- Component 4 (VO): replace draft02's custom 2-frame homography VO via SP+LG with cuVSLAM (NVIDIA, CUDA-accelerated, drop-in via isaac_ros_visual_slam) in monocular + IMU mode (M-22, M-23, S60, S64).
- Component 5 (Fusion): drop the companion-side EKF entirely for v1. Replace with a lightweight covariance calibrator + Mahalanobis outlier gate + source-label producer — no state propagation, no IMU integration on the companion (M-26). Let ArduPilot EKF3 do the actual fusion. The "EKF vs ESKF" question becomes: if we re-introduce a companion filter in v1.x, use vanilla ESKF (S68, S69) — but for v1 the question is moot.
- Component 5 (Hybrid output): walk back round-1 M-1's "emit BOTH GPS_INPUT AND ODOMETRY in parallel for the same axis" — that triggers ArduPilot EKF3 double-fusion bugs (S65, S66, S67). v1 ships GPS_INPUT only (Option A in M-30); ODOMETRY-primary mode is v1.1 territory.
- Component 3 (Matcher): SP+LG (TRT FP16/INT8) remains the inline matcher; LiteSAM (S58) added in three non-inline roles: re-localization fallback (cold start, σ_xy > 50 m), validation oracle, distillation teacher (M-24). RoMa v2 (S63), MASt3R-SLAM (S62), MapGlue, MATCHA added to the matcher bench-off as offline ceiling references (M-25).
- Component 1b (Ortho-Tile Generator): replace draft02's hand-rolled "pinhole projection on per-sector DEM" with Orthority (S59) — Python library, frame + RPC camera, GeoTIFF DEM, pip-installable. Documented fall-back to cv2.warpPerspective + bilinear DEM if F-T14 latency measurement fails (M-27).
- Component 9 (Software platform): ROS 2 Humble + Isaac ROS 3.2 chosen (Q6 → A, locked 2026-04-26). Natural pair for cuVSLAM and a published reference architecture on Orin Nano Super (S64, S77, M-29). DDS overhead (~2–5 % CPU, ~200 MB image growth) accepted in exchange for free integration of isaac_ros_visual_slam, MAVROS, and ros2 bag / rqt_* observability tooling.
- Component 1 (Tile storage), C-2 (VPR), C-6 (MAVLink), C-7/C-8/C-10/C-11: unchanged from draft02 (M-28, M-31, M-33).
Locked-in user decisions carried over from round 1 (unchanged):
- Q1 → A: GPS_INPUT primary channel (now: ONLY channel for v1 — see M-30 above).
- Q2 → A: distinct system-IDs via ArduPilot native MAVLink routing; no mavlink-router daemon.
- Q3 → A: AC-NEW-7 thresholds confirmed at P(>30 m)<1 %, P(>100 m)<0.1 % per flight.
- Q4 → A: TartanAir V2 included as early-stage synthetic baseline.
- Q5 → B (round 1): proceed to Plan in fresh conversation. Round 2 was triggered after rollback for additional component-replacement investigation.
- Camera spec → ADTi 20MP 20L V1 APS-C; storage zoom → z=20.
Round-2 user decisions locked-in (2026-04-26):
- Q6 → A: ROS 2 Humble + Isaac ROS 3.2 as the v1 orchestrator (M-29). DIY Python orchestrator dropped. Codified in Component 9.
- Q7 → A: MAVLink RAW_IMU / SCALED_IMU from FC (path a) as the v1 IMU source for cuVSLAM (M-35). Dedicated companion IMU is a v1.1 hardware revision triggered only if F-T1c shows sync-jitter problems. Codified in Component 4.
Assessment Findings (Round 2 additions)
The round-1 findings table (15 rows: M-1 … M-21, including addenda M-19/M-20/M-21) carries forward unchanged. Round 2 adds the following findings, with the same old → weak → new pattern:
| Old Component Solution (round 1) | Weak Point (round 2 evidence) | New Solution (round 2) |
|---|---|---|
| C-4 (round 1): "custom 2-frame VO via SuperPoint+LightGlue / GIM-LightGlue homography." | Functional, high (M-22). Custom 2-frame homography skips loop closure, sparse bundle adjustment, and keyframe-based local mapping — every mechanism that bounds drift in production VO/SLAM. AFIT thesis (S52) shows even ORB-SLAM2/SVO/DSO struggle on real fixed-wing flights; a hand-rolled 2-frame variant will be strictly worse. At 1 km AGL motion parallax shrinks ~10–25× per frame vs 100 m AGL, further degrading monocular VO. | Replace with cuVSLAM (NVIDIA, CUDA-accelerated, Apache-2.0; S60, S64). Monocular + IMU mode, drop-in via isaac_ros_visual_slam ROS 2 wrapper. <1 % ATE on KITTI / <5 cm on EuRoC. Fixed-wing 1 km AGL behaviour empirically TBD — bench-off in F-T1b mandatory before AC-1.3 lock. |
| C-4 (round 1): same row, alternatives. | Functional (M-23). Deep-VO alternatives evaluated for Orin Nano Super: DPVO/DPV-SLAM (S61, S73) extrapolate to 4–15 FPS — borderline for our 10 Hz target; MASt3R-SLAM (S62) is sub-1 Hz on Orin Nano Super — infeasible; VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71) require non-trivial integration cost with no accuracy advantage over cuVSLAM. | cuVSLAM is lead; DPV-SLAM / VINS-Fusion / OpenVINS retained as bench-off fall-backs if cuVSLAM underperforms on fixed-wing 1 km AGL. MASt3R-SLAM / RoMa v2 reserved for offline ceiling references. |
| C-3 (round 1): "SP+LG (TRT FP16) lead, GIM-LightGlue peer, RoMa/DKM bench-off, MASt3R dropped." | Functional, positive (M-24). LiteSAM (S58, MDPI Oct 2025) is purpose-built for satellite↔aerial AVL: 6.31 M params (2.4× smaller than EfficientLoFTR), RMSE@30 = 17.86 m on UAV-VisLoc, beats EfficientLoFTR. But on Jetson Orin Nano Super, extrapolated latency is ~1500–2000 ms / pair (AGX Orin → Orin Nano Super 4× scaling) — outside our 400 ms p95 budget for inline use. | Add LiteSAM in three non-inline roles: (a) re-localization fallback (cold start, σ_xy > 50 m, 1.5–2 s tolerable); (b) validation oracle for offline regression bench; (c) distillation teacher to train a satellite-aerial-specialised student model that fits the inline budget. Inline matcher remains SP+LG / GIM-LG. |
| C-3 (round 1): same row, ceilings. | Functional, positive (M-25). RoMa v2 (S63, Nov 2025): SOTA dense matcher with frozen DINOv3 backbone + custom CUDA + predictive covariance — best published pose-estimation accuracy. MASt3R-SLAM (S62), MapGlue, MATCHA: cross-modal/multimodal matchers with strong specialisation. All GPU-class compute. | Add RoMa v2, MASt3R, MapGlue, MATCHA to the matcher bench-off as offline ceiling references so we know how much accuracy we trade by using SP+LG inline. None becomes inline candidate. |
| C-5 (round 1, M-1): "Onboard loosely-coupled EKF emits two parallel MAVLink streams: GPS_INPUT (primary) AND ODOMETRY (auxiliary, when available) for the same axis." | Functional, safety, high (M-26, M-30). ArduPilot ExtNav best practice (S65, S66, S67): only one position source per axis at a time. Open issues #30076 and #32506 document concrete EKF3 misbehaviours when both ExtNav (ODOMETRY) and GPS (GPS_INPUT) are fed for overlapping axes — including unstable position with high variances and Z-axis snap-to-ODOMETRY. The "emit both in parallel" framing was a misconfiguration, not a feature. | v1 ships GPS_INPUT only (Option A in M-30). ODOMETRY emission disabled in v1. ArduPilot configured EK3_SRC1_*=GPS+Compass; failover via EK3_SRC2_*. Option B (ODOMETRY-primary) is v1.1 work once F-T9 SITL confirms PR #30080-class source-switching is clean. |
| C-5 (round 1): "loosely-coupled EKF in our process." | Architectural (M-26). The companion-side EKF was always going to feed the FC's own EKF3 → double-fusion. Visual fix → companion EKF → ArduPilot EKF3 stacks two filters on overlapping observations, breaks the single-source-per-axis invariant, and risks the same instability documented in #30076/#32506. | Drop the companion-side EKF for v1. Component 5 becomes a "covariance calibrator + Mahalanobis outlier gate + source-label producer" — no state propagation, no IMU integration. Each upstream (matcher, cuVSLAM) emits a hypothesis with covariance; outliers are gated; covariances are re-scaled if empirical residuals show over- or under-confidence; results are emitted on the appropriate MAVLink channel. If v1.x evidence demands a companion-side filter, use vanilla ESKF (S68, S69) — the right family for orientation correctness. |
| C-1b (round 1): "Pinhole projection on per-sector DEM (flat-Earth in flat sectors; SRTM-30 m DEM lookup in moderate sectors)." | Engineering (M-27). Implicit hand-rolled implementation reinvents distortion handling, RPC refinement, DEM bilinear lookup, projection — all of which exist in the Orthority Python library (S59) under MIT-class licence, pip-installable. | Use Orthority for per-frame ortho (frame-camera mode). Falls back to cv2.warpPerspective + bilinear DEM (~5–20 ms estimated) if F-T14 measurement shows Orthority's per-frame latency on Orin Nano Super > 50 ms allotted to ortho. |
| C-9 (round 1): "Single Python process (asyncio) on CPython 3.11/3.12; TRT subprocess workers." | Architectural (M-29). With cuVSLAM adoption (M-23), the natural integration path is isaac_ros_visual_slam (ROS 2 wrapper) → MAVROS → FC. Re-exporting cuVSLAM into a custom asyncio orchestrator is high-friction. ROS 2 Humble + JetPack 6 + Isaac ROS 3.2 is a published, working reference design on the exact hardware target (S64, S77). | Adopt ROS 2 Humble + Isaac ROS 3.2 as the v1 orchestrator (Q6 → A, locked 2026-04-26); DIY Python orchestrator dropped. ROS 2 cost: ~2–5 % CPU (DDS + topic serialisation), ~200 MB image growth, learning curve. ROS 2 benefit: free integration of cuVSLAM, MAVROS, observability via ros2 bag / rqt_*. |
(Round-1 findings M-1 through M-21 — including the Phase-1-correction addenda — remain unchanged in their original form; round-2 supersedes only the rows above. Full round-1 rationale lives in solution_draft02.md for traceability and _docs/00_research/02_fact_cards.md.)
Product Solution Description (Revised)
A companion-computer software stack that runs on the Jetson Orin Nano Super alongside an ArduPilot 4.5+ flight controller and provides GPS-equivalent position fixes to the autopilot when real GPS is jammed, spoofed, or denied.
Localization pipeline (per frame at 3 fps nav cam):
- cuVSLAM (monocular + IMU from FC RAW_IMU MAVLink stream) provides drift-bounded relative pose with keyframe-based local mapping + sparse bundle adjustment + loop closure.
- VPR (DINOv2 SALAD/BoQ chosen by bench-off; AnyLoc fallback) narrows the satellite basemap to a top-K candidate-chunk shortlist on re-localization triggers (cold start, sharp turn, σ_xy > 50 m) — conditional invocation keeps cruise overhead near zero.
- Cross-view matcher (SP+LG TRT FP16 inline; GIM-LightGlue peer in the bench-off; LiteSAM as re-loc fallback) produces sub-pixel keypoint correspondences against the candidate chunks; PnP yields an absolute pose + covariance.
- Component 5 (covariance calibrator + Mahalanobis outlier gate + source-label producer — not an EKF) consumes the absolute pose + cuVSLAM relative pose; rejects outliers; re-scales covariances; emits result on the appropriate MAVLink channel.
- GPS_INPUT (GPS1_TYPE=14, MAVLink2-signed, pymavlink) is sent to the FC. ArduPilot EKF3 (24-state classical EKF, 400 Hz) does the actual fusion of our GPS-equivalent fix with its own IMU, baro, compass.
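The conditional invocation in the pipeline above can be sketched as a small per-frame decision function. This is a minimal illustration, not the project's implementation: `NavState`, `plan_frame`, and the matcher labels are hypothetical names, and the "inline first, LiteSAM as fallback" ordering is an assumption. Only the three triggers (cold start, sharp turn, σ_xy > 50 m) come from the text.

```python
from dataclasses import dataclass

SIGMA_XY_RELOC_M = 50.0  # re-localization threshold quoted in the text


@dataclass(frozen=True)
class NavState:
    """Hypothetical per-frame summary; field names are illustrative."""
    cold_start: bool     # no absolute fix has been obtained yet
    sharp_turn: bool     # sharp turn detected, e.g. after VO tracking loss
    sigma_xy_m: float    # current horizontal 1-sigma position uncertainty


def needs_relocalization(state: NavState) -> bool:
    """Return True when any of the three triggers from the text fires."""
    return (
        state.cold_start
        or state.sharp_turn
        or state.sigma_xy_m > SIGMA_XY_RELOC_M
    )


def plan_frame(state: NavState) -> dict:
    """Decide which stages run for this frame.

    Assumption: on a trigger the inline matcher is tried first and
    LiteSAM is queued as the fallback; the text only says LiteSAM is
    reserved for re-localization.
    """
    if needs_relocalization(state):
        return {"run_vpr": True, "matchers": ["sp_lg", "litesam"]}
    # Cruise: VPR is skipped so its cost stays near zero.
    return {"run_vpr": False, "matchers": ["sp_lg"]}
```

In cruise, `plan_frame(NavState(False, False, 12.0))` skips VPR; once σ_xy exceeds 50 m the same call enables VPR and queues the fallback matcher.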
Tile generation (in-flight, asynchronous):
- Per-frame eligibility check (σ_xy ≤ 5 m hard gate, terrain class flat/moderate, EKF source = satellite_anchored).
- Orthorectification via Orthority (frame-camera model + per-sector DEM from SRTM 30 m).
- Quality scoring + dedup against existing tile cache (service-tile immutability respected).
- Write to MBTiles SQLite cache (WAL + connection pool + transaction batching) with parent_pose_sigma_xy, terrain_class, trust_level.
- Post-flight: tiles uploaded to Suite Service candidate pool; 2-flight voting at Service ingest promotes onboard tiles to trusted basemap.
Object localization (separate path, AI camera): trig + airframe-attitude fusion via FC ATTITUDE MAVLink stream — unchanged from round 1.
MAVLink endpoint: shared between MAVSDK (telemetry, sysid=10) and pymavlink (GPS_INPUT, sysid=11) via distinct system-IDs through ArduPilot's native MAVLink routing — no mavlink-router daemon. MAVLink2 signing mandatory in v1.
Pre-flight (ground)
┌────────────────────────────────────────────────┐
│ Azaion Suite Satellite Service │
│ (sources commercial / agency imagery; │
│ ingests onboard tiles via candidate pool + │
│ 2-flight voting layer) │
└──────────────┬───────────────────┬─────────────┘
│ sync down │ upload back (post-flight)
▼ ▲
┌─────────────────┐
│ DEM (SRTM 30 m) │ ─────► sector classification
└─────────────────┘
Onboard (in-flight)
Nav Cam: ADTi 20MP, 3 fps AI Cam (gimbal+zoom, on-demand)
│ │
▼ ▼
┌────────────────────────────────────────────┐ ┌────────────────────┐
│ ROS 2 Humble + Isaac ROS 3.2 (Q6 → A)  │ │ Object Geo-Locator │
│ ┌──────────────────────────┐ │ │ (pinhole+ATTITUDE) │
│ │ cuVSLAM (mono + IMU) │←──FC RAW_IMU │ └──────┬─────────────┘
│ │ → keyframe pose + cov │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ VPR (SALAD/BoQ/AnyLoc) │←─ re-loc │ │
│ │ on demand only │ triggers │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ Cross-view Matcher │ │ │
│ │ inline: SP+LG / GIM-LG │ │ │
│ │ re-loc: LiteSAM (rare) │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ PnP → absolute pose + Σ │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────┐ │ │
│ │ Component 5 (NOT an EKF) │ │ │
│ │ - covariance calibrator │ │ │
│ │ - Mahalanobis outlier gate │ │ │
│ │ - source-label producer │ │ │
│ └────────────┬─────────────────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────┐ │ │
│ │ Ortho-Tile Generator (Orthority) │ │ │
│ │ → MBTiles+WAL Tile Cache │ │ │
│ └──────────────────────────────────────┘ │ │
└────────────────┬───────────────────────────┘ │
▼ │
GPS_INPUT (pymavlink, signed) ──► ArduPilot │
(GPS1_TYPE=14, EK3_SRC1_POSXY=GPS, EK3_SRC2=GPS)│
│ (ODOMETRY disabled for v1; v1.1+) │
▼ │
Telemetry summary 1–2 Hz ──────► QGroundControl │
│ │
▼ │
Flight Data Recorder (NVMe, 64 GB cap, no raw frames)
Architecture
Overall principles (revised vs draft02)
- Pipeline = stages with explicit confidence. Each stage emits a pose hypothesis + covariance + categorical label. Component 5 calibrates and gates; ArduPilot EKF3 fuses. (Revised — M-26.)
- All heavy NN inference runs on GPU via TensorRT (FP16, INT8 where validated). Pre-extract satellite-tile descriptors offline (AC-8.3). (Unchanged.)
- Orchestration: ROS 2 Humble + Isaac ROS 3.2 (Q6 → A, locked). cuVSLAM consumed via isaac_ros_visual_slam; MAVROS bridges ROS 2 ↔ MAVLink for the FC. Our matcher / VPR / ortho / Component-5 calibrator / FDR / uploader run as rclpy Python nodes. CPython 3.11 / 3.12 inside the nodes; TensorRT engines + CUDA contexts owned per-node. (Revised — M-29.)
- Persistent satellite cache across flights (~10 GB for 400 km²); per-flight FDR is separate. (Unchanged.)
- Every output to the FC carries a covariance — GPS_INPUT (h_acc, v_acc, vel_acc). ODOMETRY emission disabled for v1 (Option A in M-30). (Revised — M-30.)
- Service tiles are basemap truth; onboard tiles go through Service-side voting before promotion (M-9). (Unchanged.)
- MAVLink2 signing on every companion↔FC link (M-7). USB bypasses signing — bench-only access. (Unchanged.)
- No companion-side state propagation — the FC's EKF3 is the only filter. Any future companion-side filter (v1.x) will be an ESKF (S69), not a regular EKF. (New — M-26.)
Component 1: Satellite Tile Cache & Descriptor Index
Unchanged from draft02 / Mode B round 1 — MBTiles SQLite + WAL + connection pool + transaction batching; FAISS IVF over per-chunk DINOv2-VLAD vectors (chunk-decoupled per M-16); terrain_class and trust_level sidecar. (M-28: COG + PMTiles considered and rejected for our use case.)
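As an illustration of the WAL + transaction-batching pattern named above, the following sketch uses Python's standard `sqlite3` module. It assumes an MBTiles-style `tiles` table; the `tile_sidecar` table, the function names, and the `"service"` / `"candidate"` trust-level strings are hypothetical and chosen only for this example. Connection pooling and the FAISS index are out of scope here.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE IF NOT EXISTS tiles (
    zoom_level  INTEGER NOT NULL,
    tile_column INTEGER NOT NULL,
    tile_row    INTEGER NOT NULL,  -- MBTiles uses the TMS (flipped-Y) row order
    tile_data   BLOB    NOT NULL,
    PRIMARY KEY (zoom_level, tile_column, tile_row)
);
-- Hypothetical sidecar table; columns mirror the sidecar fields in the text.
CREATE TABLE IF NOT EXISTS tile_sidecar (
    zoom_level  INTEGER NOT NULL,
    tile_column INTEGER NOT NULL,
    tile_row    INTEGER NOT NULL,
    parent_pose_sigma_xy REAL,
    terrain_class TEXT,
    trust_level   TEXT,
    PRIMARY KEY (zoom_level, tile_column, tile_row)
);
"""


def open_cache(path):
    """Open (or create) the cache file with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.executescript(SCHEMA)
    return conn


def write_batch(conn, tiles):
    """Write many tiles in ONE transaction; never overwrite a service tile.

    Each item is (z, x, y, data, sigma_xy, terrain_class, trust_level).
    Returns the number of tiles actually written.
    """
    written = 0
    with conn:  # commits on success, rolls back on exception
        for z, x, y, data, sigma, terrain, trust in tiles:
            row = conn.execute(
                "SELECT trust_level FROM tile_sidecar "
                "WHERE zoom_level=? AND tile_column=? AND tile_row=?",
                (z, x, y),
            ).fetchone()
            if row is not None and row[0] == "service":
                continue  # service-tile immutability
            conn.execute(
                "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)",
                (z, x, y, data),
            )
            conn.execute(
                "INSERT OR REPLACE INTO tile_sidecar VALUES (?, ?, ?, ?, ?, ?)",
                (z, x, y, sigma, terrain, trust),
            )
            written += 1
    return written
```

Batching many inserts inside one `with conn:` block avoids a commit (and its fsync) per tile, which is usually the dominant write cost in SQLite.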
Component 1b: Ortho-Tile Generator (REVISED — M-27)
Library: Orthority (S59, Python, MIT-class) — frame-camera model with GeoTIFF DEM lookup. Pip-installable: pip install orthority. Replaces draft02's hand-rolled "pinhole projection on per-sector DEM".
Pipeline per frame (eligibility / quality / dedup logic unchanged from draft02; only the projection step is replaced):
- Eligibility check (unchanged from draft02 / M-9 hard gate): skip when EKF source is dead_reckoned, σ_xy > 5 m, roll/pitch > 10°, no inliers, or sector is rugged. Sectors classified moderate get a terrain_uncertainty=true sidecar flag.
- Orthorectification (revised): call orthority.Ortho(frame, dem, camera_model).process() with the frame-camera model populated from FC ATTITUDE (gimbal pitch / roll / yaw) + companion-resolved position + airframe altitude. SRTM-30 m DEM tile pre-loaded for the operational area.
- Resampling to basemap projection (unchanged): EPSG:3857 z=20.
- Quality scoring (unchanged from draft02): sharpness + coverage + match_inliers + parent_pose_sigma_xy + glare/cloud flag.
- Deduplication / write decision (unchanged from draft02 — M-9 service-tile-immutability + soft/candidate gates).
- Sidecar metadata (unchanged): parent_pose_sigma_xy, terrain_class, trust_level.
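The eligibility step in the list above reduces to a handful of threshold checks. The sketch below is one possible reading of it: `FrameState` and `tile_eligibility` are hypothetical names, and the thresholds (5 m, 10°) are the ones quoted in the list. It follows the Component 1b wording (skip on dead_reckoned); the product description states the stricter condition that the source be satellite_anchored.

```python
from dataclasses import dataclass

SIGMA_XY_HARD_GATE_M = 5.0   # hard gate from the text
MAX_TILT_DEG = 10.0          # roll / pitch limit from the text


@dataclass(frozen=True)
class FrameState:
    """Hypothetical per-frame inputs to the eligibility check."""
    source_label: str    # "satellite_anchored" | "vo_extrapolated" | "dead_reckoned"
    sigma_xy_m: float    # horizontal 1-sigma pose uncertainty
    roll_deg: float
    pitch_deg: float
    match_inliers: int   # inlier count from the cross-view matcher
    terrain_class: str   # "flat" | "moderate" | "rugged"


def tile_eligibility(f: FrameState):
    """Return (eligible, sidecar_flags) for one nav-cam frame."""
    if f.source_label == "dead_reckoned":
        return False, {}
    if f.sigma_xy_m > SIGMA_XY_HARD_GATE_M:
        return False, {}
    if abs(f.roll_deg) > MAX_TILT_DEG or abs(f.pitch_deg) > MAX_TILT_DEG:
        return False, {}
    if f.match_inliers <= 0:
        return False, {}
    if f.terrain_class == "rugged":
        return False, {}
    # Moderate sectors are kept but flagged for downstream consumers.
    return True, {"terrain_uncertainty": f.terrain_class == "moderate"}
```

Returning the sidecar flag together with the verdict keeps the moderate-terrain marker attached to the tile from the moment it is accepted.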
Latency budget: F-T14 (revised) measures Orthority's per-frame latency on Orin Nano Super. Budget: ≤50 ms / frame. Documented fall-back if exceeded: cv2.warpPerspective + bilinear DEM lookup (~5–20 ms estimated).
Component 2: Visual Place Recognition (Global Retrieval)
Unchanged from draft02 / Mode B round 1. AnyLoc + SALAD + BoQ + MixVPR shortlist; conditional invocation (M-17); chunk-based retrieval unit (M-16); expanding-window retry (M-18); multi-scale chunks + OSM road-overlay + sector-volatility-driven K (M-19); active-conflict scene-change mitigations stand. (M-33: no new VPR backbone in 2025 displaces this.)
Component 3: Cross-View Matching & PnP (REVISED — M-24, M-25)
Inline lead: SuperPoint + LightGlue (TRT FP16/INT8) — unchanged. Feasibility re-confirmed: ~50–200 ms / pair on Orin Nano Super FP16 at 320×240 → 640×480 (RTX 3080 baseline 0.96 + 2.54 ms scaled by Orin Nano Super throughput ratio; cross-validated by S76 YOLO26 reference points).
Inline peer: GIM-LightGlue — unchanged from draft02 (M-3, S48). +8.4–18.1 % zero-shot vs LightGlue baseline.
Embedded fallback: XFeat (sparse + semi-dense) — unchanged.
Re-localization fallback (new — M-24): LiteSAM (S58). Invoked rarely (cold start, σ_xy > 50 m, sharp turn after cuVSLAM tracking loss). Latency budget: 1.5–2 s on Orin Nano Super. Accepted because re-loc events are rare and AC-NEW-1 cold-start budget is 30 s.
Validation oracle (new — M-24): LiteSAM run offline on bench data for ground-truth-quality matches. Used to score the inline matcher's recall@30m on a per-flight basis without needing manual annotation.
Distillation teacher (new — M-24): train a satellite-aerial-specialised student model (target ≤5 M params, ≤100 ms / pair) using LiteSAM-supervised correspondences on TartanAir V2 + AerialExtreMatch + UAV-VisLoc. Output is a candidate inline matcher for v1.x.
Offline ceiling references (new — M-25): RoMa v2 (S63), MASt3R-SLAM (S62), MapGlue, MATCHA — included in the matcher bench-off so we know how much accuracy we trade by using SP+LG inline. None becomes inline candidate.
Bench-off scope (revised) for the deferred research item:
- Inline candidates (must fit in 200 ms / pair on Orin Nano Super @ 25 W): SP+LG, GIM-LightGlue, XFeat (sparse), XFeat (semi-dense).
- Re-loc candidates (must fit in 2 s / pair): LiteSAM.
- Offline ceilings: RoMa v2, MASt3R-SLAM, MapGlue, MATCHA.
Bench-off targets (unchanged from draft02): AerialVL, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight.
Score on: AC-1.1, AC-1.2, AC-2.2, p95 latency on Orin Nano Super at 25 W, sustained 30-min thermal stability, peak GPU memory, seasonal-robustness score, and the accuracy-vs-inline-feasibility frontier (re-loc role only for >200 ms candidates).
PnP & projection: unchanged from draft02.
Input downsampling: unchanged starting points (1024×768 for SP+LG / GIM-LG; 640×480 for XFeat sparse).
Component 4: Visual Odometry (REVISED — M-22, M-23)
v1 choice: cuVSLAM (NVIDIA, CUDA-accelerated, Apache-2.0; S60). Monocular + IMU mode. Drop-in via isaac_ros_visual_slam ROS 2 wrapper (S64). Replaces draft02's "custom 2-frame VO via SP+LG / GIM-LG homography".
Why cuVSLAM:
- Production-grade VO/SLAM with keyframe-based local mapping + sparse bundle adjustment + loop closure — bounds drift, unlike a 2-frame homography.
- CUDA-accelerated, optimized for Jetson. Reference designs on Orin Nano (S64, S77) confirm runtime feasibility.
- <1 % ATE on KITTI / <5 cm on EuRoC.
- Minimal integration cost via the ROS 2 wrapper.
Why not the alternatives:
- DPVO / DPV-SLAM (S61, S73): extrapolated 4–15 FPS on Orin Nano Super — borderline for 10 Hz target. Reserved as bench-off fall-back.
- MASt3R-SLAM (S62): sub-1 Hz on Orin Nano Super — infeasible inline.
- VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71): non-trivial integration cost; no accuracy advantage. Reserved as bench-off fall-backs.
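- Custom 2-frame homography VO (draft02): rejected because it has no loop closure, sparse bundle adjustment, or keyframe-based local mapping to bound drift (M-22).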
IMU source for cuVSLAM (Q7 → A, locked, M-35): MAVLink RAW_IMU / SCALED_IMU from FC at ~200–400 Hz (path a). Subscribed inside the cuVSLAM node via MAVROS. F-T1c (new field test) measures sync-jitter under flight load; if it fails the threshold (TBD by cuVSLAM tolerance), v1.1 adds a dedicated companion IMU (BNO055 / ICM-42688P / BMI270) over SPI as a hardware revision.
Camera intrinsics: nav cam (ADTi 20MP APS-C) calibrated pre-flight via standard checkerboard (M-34). cuVSLAM consumes the camera_info topic at start-up.
Risk R8 reframed: cuVSLAM's high-altitude fixed-wing performance is empirically unproven (its published benchmarks are urban driving + indoor MAV). F-T1b (revised) bench-off mandatory before AC-1.3 lock.
Fall-back path: if cuVSLAM underperforms on AerialVL fixed-wing trajectories, use a properly-scoped VO (DPV-SLAM with keyframe + bundle adjustment + loop closure, not 2-frame homography) as the v1.1 candidate. Custom 2-frame VO never comes back.
Component 5: Companion-Side Output Stage (REVISED — M-26, M-30)
Renamed: was "IMU + Visual EKF Fusion" in draft02. Now: "Companion-Side Output Stage — Covariance Calibrator + Outlier Gate + Source-Label Producer".
Responsibility (v1):
- Consume cuVSLAM relative-pose + cross-view matcher absolute-pose hypotheses.
- Run a Mahalanobis outlier gate to drop fixes whose innovation w.r.t. cuVSLAM relative pose exceeds a threshold (computed against AC-NEW-4 false-position safety budget).
- Re-scale covariances using empirical residuals (online, exponentially-weighted) to correct for systematic over- / under-confidence in the matcher / VPR / VO outputs.
- Tag the result with a categorical source label: satellite_anchored / vo_extrapolated / dead_reckoned.
- Emit on the appropriate MAVLink channel (GPS_INPUT for v1, Option A in M-30).
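The gate and re-scaling responsibilities above can be sketched for a 2-D horizontal fix as follows. This is a minimal illustration under stated assumptions, not the project's calibrator: the class name, the 99 % chi-square gate, the EWMA factor `alpha`, and the choice to apply one scalar to the whole innovation covariance are all hypothetical. It keeps no pose state; the only memory is the running average of the normalised innovation squared (NIS), whose expected value is 2 for a well-calibrated 2-D fix.

```python
CHI2_2DOF_99 = 9.210  # 99 % quantile of chi-square, 2 degrees of freedom


def mahalanobis_sq_2d(r, s):
    """Squared Mahalanobis distance r^T S^-1 r for a 2-vector and 2x2 matrix."""
    (a, b), (c, d) = s
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    rx, ry = r
    # S^-1 = (1 / det) * [[d, -b], [-c, a]]
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


class CovarianceCalibrator:
    """Illustrative outlier gate + EWMA covariance re-scaler."""

    def __init__(self, alpha=0.05, gate=CHI2_2DOF_99, min_scale=0.1):
        self.alpha = alpha          # EWMA weight of the newest sample (assumed)
        self.gate = gate            # gate on the scaled distance (assumed)
        self.min_scale = min_scale  # floor so the scale never collapses to 0
        self.nis_ewma = 2.0         # expected NIS of a calibrated 2-D fix

    @property
    def scale(self):
        """Covariance multiplier: >1 inflates over-confident inputs."""
        return max(self.nis_ewma / 2.0, self.min_scale)

    def process(self, z_abs, cov_abs, z_vo, cov_vo):
        """Gate an absolute fix against the VO-predicted position.

        Returns (accepted, re-scaled covariance of the absolute fix or None).
        """
        r = (z_abs[0] - z_vo[0], z_abs[1] - z_vo[1])
        s = [[cov_abs[i][j] + cov_vo[i][j] for j in range(2)] for i in range(2)]
        nis = mahalanobis_sq_2d(r, s)
        if nis / self.scale > self.gate:
            return False, None  # outlier: drop, do not update the average
        self.nis_ewma = (1.0 - self.alpha) * self.nis_ewma + self.alpha * nis
        k = self.scale
        return True, [[k * v for v in row] for row in cov_abs]
```

A threshold derived from the AC-NEW-4 safety budget would replace the 99 % constant used here; the structure of the check stays the same.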
Explicitly NOT in v1:
- ❌ State propagation (no x_{k+1} = f(x_k, u_k) + w_k).
- ❌ IMU integration (the FC's EKF3 does this with the FC's own IMU at 400 Hz).
- ❌ ODOMETRY emission (Option B in M-30 — v1.1+).
ESKF question resolved: ArduPilot EKF3 is a regular EKF (24-state) — we cannot swap the FC filter (S65, S66, S67, S68). The EKF-vs-ESKF debate applies only to a hypothetical companion-side filter, which we drop for v1. If v1.x evidence (F-T9 SITL) demands a companion-side filter, use vanilla ESKF (S69) — the right family for orientation correctness, with tangent-space covariance on SO(3).
Hybrid-output channel split (M-30):
| Mode | EK3_SRC1_* configuration | Channel emission | Status |
|---|---|---|---|
| Option A (v1 default) | POSXY=GPS, VELXY=GPS, YAW=GPS+Compass. EK3_SRC2_*=GPS for failover. | GPS_INPUT only (GPS1_TYPE=14). ODOMETRY disabled. | Ships in v1. |
| Option B (v1.1+) | POSXY=ExternalNav, YAW=ExternalNav. EK3_SRC2_POSXY=GPS for failover. | ODOMETRY primary; GPS_INPUT held in reserve, not actively fused while ODOMETRY healthy. | Requires PR #30080 fix; gated on F-T9 SITL pass. |
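For Option A, the calibrated fix has to be packed into the fields of the MAVLink GPS_INPUT message. The sketch below builds that field set in plain Python. Assumptions are labelled in the code: the helper names are hypothetical, the horizontal accuracy is taken as the square root of the largest eigenvalue of the 2×2 position covariance (a conservative choice, not something the text prescribes), and `hdop`, `vdop`, and `satellites_visible` are placeholder values. The h_acc / v_acc / vel_acc shorthand used in this document corresponds to the message fields `horiz_accuracy`, `vert_accuracy`, and `speed_accuracy`.

```python
import math

GPS_EPOCH_UNIX = 315964800   # 1980-01-06T00:00:00Z as Unix time
GPS_UTC_LEAP_SECONDS = 18    # GPS minus UTC offset, valid since 2017-01-01
SECONDS_PER_WEEK = 604800


def gps_week_and_ms(unix_time_s):
    """Convert Unix time to (GPS week number, milliseconds into the week)."""
    gps_s = unix_time_s - GPS_EPOCH_UNIX + GPS_UTC_LEAP_SECONDS
    week = int(gps_s // SECONDS_PER_WEEK)
    ms = int(round((gps_s - week * SECONDS_PER_WEEK) * 1000.0))
    return week, ms


def horiz_accuracy_m(cov_xy):
    """Assumed mapping: 1-sigma along the worst axis of a symmetric 2x2 covariance."""
    a, b, d = cov_xy[0][0], cov_xy[0][1], cov_xy[1][1]
    lam_max = 0.5 * (a + d) + math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return math.sqrt(lam_max)


def build_gps_input(unix_time_s, lat_deg, lon_deg, alt_amsl_m,
                    vel_ned_mps, cov_xy, sigma_z_m, sigma_vel_mps):
    """Return a dict keyed by GPS_INPUT field names."""
    week, week_ms = gps_week_and_ms(unix_time_s)
    vn, ve, vd = vel_ned_mps
    return {
        "time_usec": int(unix_time_s * 1e6),
        "gps_id": 0,
        "ignore_flags": 0,                # nothing ignored in this sketch
        "time_week_ms": week_ms,
        "time_week": week,
        "fix_type": 3,                    # 3D fix
        "lat": int(round(lat_deg * 1e7)), # degE7
        "lon": int(round(lon_deg * 1e7)), # degE7
        "alt": float(alt_amsl_m),         # metres AMSL
        "hdop": 1.0,                      # placeholder (assumption)
        "vdop": 1.0,                      # placeholder (assumption)
        "vn": float(vn), "ve": float(ve), "vd": float(vd),
        "speed_accuracy": float(sigma_vel_mps),
        "horiz_accuracy": horiz_accuracy_m(cov_xy),
        "vert_accuracy": float(sigma_z_m),
        "satellites_visible": 10,         # placeholder (assumption)
    }
```

With pymavlink the resulting dict should be passable as keyword arguments to the generated `gps_input_send` method of a MAVLink2 connection, for example `conn.mav.gps_input_send(**fields)`; this is worth verifying against the installed dialect before relying on it.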
Component 6: MAVLink Integration & Source Promotion
Unchanged from draft02 / round 1. MAVSDK (telemetry, sysid=10) + pymavlink (GPS_INPUT, sysid=11), distinct system-IDs sharing the serial port via ArduPilot's native MAVLink routing. No mavlink-router daemon. MAVLink2 signing mandatory, per-airframe key in FC FRAM. Source-promotion logic and AC-NEW-2 (<3 s spoofing-promotion latency) carry forward unchanged. (M-31: sysid collision-check added to deploy runbook.)
Component 7: Failsafe, Health & Re-Localization
Unchanged from draft02.
Component 8: Object Localization (AI Camera)
Unchanged from draft02.
Component 9: Software Platform & Process Topology (LOCKED — Q6 → A, M-29)
v1 choice: ROS 2 Humble + Isaac ROS 3.2 on JetPack 6 / Ubuntu 22.04 (S64, S77).
Process topology:
- C++ Isaac ROS node: cuVSLAM via isaac_ros_visual_slam (consumes camera_info + image stream + IMU; publishes nav_msgs/Odometry).
- C++ MAVROS node: bridges ROS 2 ↔ MAVLink for the FC. RAW_IMU / SCALED_IMU subscribed by the cuVSLAM node; FC ATTITUDE consumed by Component 1b ortho node; GPS_INPUT published by Component 5 calibrator node.
- Python rclpy nodes: matcher (SP+LG TRT FP16/INT8), VPR (SALAD/BoQ on demand), Component 1b ortho generator (Orthority), Component 5 calibrator + outlier gate, FDR writer, Suite-Service uploader.
- TensorRT engines + CUDA contexts owned per-node (no shared CUDA context). Engines loaded at node start-up; warm-up inference at boot.
Stack details (locked):
- CPython 3.11 or 3.12 inside rclpy nodes (free-threaded 3.13 deferred to v1.x — M-32, M-33).
- TensorRT FP16 default, INT8 where validated by the matcher bench-off.
- numba JIT for the calibrator's hot path (Mahalanobis distance + covariance re-scale).
- Configuration via YAML; structured-JSON logging to FDR; ros2 bag for in-flight telemetry capture.
Cost / benefit reaffirmed:
- Cost: ~2–5 % CPU for DDS + topic serialisation; ~200 MB extra deployment-image footprint; learning curve (mitigated by published reference designs in S64, S77).
- Benefit: drop-in isaac_ros_visual_slam for cuVSLAM, drop-in MAVROS for the FC bridge, free observability via ros2 bag and rqt_*, battle-tested by the wider robotics community.
Reference designs: S64 (Hackster.io GPS-Denied Drone), S77 (thomasthelliez ROS 2 / Isaac ROS guide), bandofpv/VSLAM-UAV (PX4 + ROS 2 reference), sidharthmohannair/ros2-ardupilot-sitl-hardware (ArduPilot + ROS 2 reference).
Component 10: Flight Data Recorder
Unchanged from draft02 / round 1.
Component 11: Confidence Score (cross-cutting)
Unchanged from draft02 / round 1.
Testing Strategy
Functional / Integration
- F-T1 Tile cache load/lookup (unchanged).
- F-T1b (REVISED — M-22, M-23, R8 reframed) AC-1.3 drift regression: run cuVSLAM on AerialVL fixed-wing trajectories (70 km of real flight). Pass = drift ≤ 100 m mono-only / ≤ 50 m mono+IMU between satellite anchors at 95th percentile. Gates AC-1.3 lock. If cuVSLAM fails: fall back to DPV-SLAM bench / VINS-Fusion bench.
- F-T1c (new — M-22, M-23) Compare cuVSLAM mono vs cuVSLAM mono+IMU on the same AerialVL trajectories — quantifies IMU contribution given MAVLink-rated IMU rate (path (a) of M-35).
- F-T2 Tile generation + dedup (extended — M-9 + M-27): replay a recorded flight; assert (a) ≤1 tile per ground sector covered ≥2× by nav cam; (b) tile has parent_pose_sigma_xy ≤ hard gate; (c) service tiles never overwritten within freshness budget; (d) Orthority output equivalent to ground-truth ortho (RMSE < 1 px on synthetic frame with known DEM).
- F-T3 Tile uploader → candidate pool (unchanged from draft02).
- F-T4 End-to-end against AerialVL.
- F-T5 End-to-end against UAV-VisLoc.
- F-T5b End-to-end against AerialExtreMatch (unchanged from draft02 — M-14).
- F-T5c Season-robustness regression against 2chADCNN season set (unchanged from draft02 — M-14).
- F-T6 End-to-end against internal Mavic flight footage.
- F-T7 Sharp-turn handling (extended — M-24): assert LiteSAM re-loc fallback recovers within 2 s on post-turn frames where SP+LG inline matcher fails.
- F-T8 Disconnected-segment re-localization (extended — M-24): include LiteSAM re-loc in the test matrix.
- F-T9 ArduPilot SITL: full MAVLink loop (REVISED — M-30). Test matrix:
  - Option A mode (v1 default): GPS_INPUT only; verify EKF3 fuses correctly; verify failover to backup GPS via EK3_SRC2_*.
  - Option B mode (v1.1 candidate): ODOMETRY-primary; verify PR #30080-class source-switching is clean; verify GPS_INPUT held in reserve does not double-fuse (issues #30076 / #32506 regression test).
  - Source switching: jam-onset → our channel; spoofed-real-GPS recovery → operator-confirmed source-restore.
  - MAVLink2 signing on: assert injection refused on signing failure; assert acceptance on valid signing.
- F-T10 Operator re-loc workflow via QGC STATUSTEXT (unchanged).
- F-T11 Cold-start TTFF <30 s (AC-NEW-1) (extended — M-24): include LiteSAM as the cold-start re-loc path.
- F-T12 Spoofing-promotion <3 s (AC-NEW-2) (unchanged).
- F-T13 Object localization with airframe-attitude fusion (unchanged).
- F-T14 (REVISED — M-27) Per-sector DEM classification + Orthority per-frame latency: load SRTM-30 m for the operational area; assert sector classes (flat, moderate, rugged) match ground-truth DEM amplitudes; measure Orthority per-frame ortho latency on Orin Nano Super @ 25 W; assert ≤ 50 ms / frame budget. If exceeded: switch to cv2.warpPerspective + bilinear DEM fall-back.
- F-T15 VPR retrieval-unit bench (unchanged from draft02 — M-16/17/18).
- F-T16 Synthetic cloud-occlusion injection (unchanged from draft02).
- F-T17 Mission replay assertion (unchanged from draft02 — M-17).
- F-T18 (new — M-26) Companion-side calibrator regression: replay a recorded flight; assert the calibrator's empirical residuals lie within the configured Mahalanobis gate; assert no state-propagation logic is invoked; assert ArduPilot EKF3 receives well-calibrated covariances (post-flight comparison of h_acc reported vs measured residual).
- F-T19 (new — Q6 → A, locked) ROS 2 topic-rate sanity test — assert all ROS 2 topics meet expected publish rates under simulated load.
Non-Functional
- NF-T1 Latency p95 <400 ms on Orin Nano Super 25 W (AC-4.1) (unchanged).
- NF-T2 Memory <8 GB shared (AC-4.2) (extended — Q6): ROS 2 + Isaac ROS deployment image must fit; reserve ≥1 GB for matcher + VPR engines.
- NF-T3 Thermal: 8 h sustained 25 W (AC-NEW-5) (unchanged).
- NF-T4 False-position safety budget (AC-NEW-4) (extended — M-26): Monte Carlo with synthetic over-confidence injection; verify Component 5's outlier gate rejects bad fixes BEFORE they reach ArduPilot EKF3 (companion-side gate; FC EKF3 gate is a second line of defence).
- NF-T4b AC-NEW-7 cache-poisoning safety budget (unchanged — M-9).
- NF-T5 Storage: 64 GB FDR cap with rollover (unchanged).
- NF-T6 Imagery freshness gate (AC-NEW-6) (unchanged).
Security
- S-T1 … S-T5 (unchanged from draft02).
Field
- FT-1 … FT-3 (unchanged from draft02).
Key Risks & Open Items (carried into Plan step)
| ID | Risk | Severity | Mitigation |
|---|---|---|---|
| R1 | Imagery licensing lead time (Service-side) | Med | Suite Service procurement |
| R2 | Latency budget on Orin Nano Super at 1024×768 | Med | Empirical bench-off in week 1 of impl |
| R3 | Cross-view accuracy at 1 km AGL with Ukrainian seasonal change | Med | 50 %@20 m hard floor; bench-off includes SALAD/BoQ/GIM-LG/2chADCNN/LiteSAM-as-oracle |
| R4 | MAVSDK + pymavlink coexistence | Resolved (M-6) | — |
| R5 | Thermal at 25 W for 8 h | Med | NF-T3 |
| R6 | AC-7.1 in turning flight | Low | v1.1 |
| R7 | Public dataset gap (V&V) | Med | Bench-off + first internal fixed-wing flight before AC-1.3 lock |
| R8 (REFRAMED — M-22, M-23) | cuVSLAM 1 km AGL fixed-wing performance is empirically unproven | Med | F-T1b on AerialVL fixed-wing trajectories; FT-3 first internal fixed-wing flight; documented fall-back to DPV-SLAM / VINS-Fusion |
| R9 | Cross-flight cache poisoning | High (safety) | Service-tile immutability + 2-flight voting + σ_xy hard gate + AC-NEW-7 |
| R10 | Companion↔FC link is flight-critical attack surface | High (security) | MAVLink2 signing mandatory + native routing |
| R11 | ArduPilot ExtNav source-switching gotchas | Med | F-T9 SITL matrix; pin ArduPilot to PR #30080 version |
| R12 | Eastern-Ukraine relief amplitude breaks flat-Earth assumption | Med | Per-sector DEM lookup + runtime self-classifier |
| R13 (new — M-27) | Orthority per-frame latency on Orin Nano Super may exceed budget | Low–Med | F-T14 measurement; fall-back to cv2.warpPerspective + bilinear DEM (~5–20 ms estimated) |
| R14 (new — M-26, M-30) | Dropping companion-side EKF may surface FC-side covariance-handling issues | Low–Med | F-T18 calibrator regression + F-T9 SITL Option A; if EKF3 mishandles raw inputs, re-introduce vanilla ESKF in v1.x |
| R15 (M-29) | Orchestrator choice (Q6 → A locked: ROS 2 Humble + Isaac ROS 3.2) | Resolved | — |
| R16 (M-35) | MAVLink-rated IMU may be insufficient for cuVSLAM sync sensitivity | Low–Med | F-T1c IMU-sync-jitter measurement; v1.1 hardware revision adds dedicated companion IMU if F-T1c fails (Q7 → A locked: path (a) for v1) |
Proposed AC additions
AC-NEW-7 — Cache-poisoning safety budget (unchanged from draft02 — M-9).
AC-NEW-8 — VO drift bound on fixed-wing 1 km AGL (new — M-22, M-23, R8 reframed). Specifically: cuVSLAM (mono+IMU) drift between satellite anchors ≤ 50 m at 95th percentile on AerialVL fixed-wing trajectories; ≤ 100 m mono-only. Validated by F-T1b.
AC-NEW-9 — Companion-side covariance calibration accuracy (new — M-26). Empirical residuals of GPS_INPUT pose, computed against ground truth on F-T1b trajectories, must lie within the reported h_acc/v_acc covariance with probability ≥ 95 %. (Calibration must not under- or over-claim.) Validated by F-T18.
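The two statistical criteria above (a 95th-percentile drift bound in AC-NEW-8 and a ≥ 95 % coverage requirement in AC-NEW-9) could be evaluated along these lines. This is a sketch only: it assumes drift and residual magnitudes are already available in metres per sample, uses the nearest-rank percentile definition, and treats the reported h_acc directly as the bound to test against. How h_acc is scaled relative to one sigma is not specified here, so that part is an assumption. All function names are hypothetical.

```python
import math


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile: smallest value with at least p % of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def drift_within_bound(drifts_m, bound_m=50.0, p=95):
    """AC-NEW-8 style check: p-th percentile drift must not exceed bound_m."""
    return percentile_nearest_rank(drifts_m, p) <= bound_m


def coverage(residuals_m, reported_acc_m):
    """Fraction of samples whose |residual| lies within the reported accuracy."""
    if len(residuals_m) != len(reported_acc_m) or not residuals_m:
        raise ValueError("need equally sized, non-empty inputs")
    inside = sum(1 for r, acc in zip(residuals_m, reported_acc_m) if abs(r) <= acc)
    return inside / len(residuals_m)


def calibration_ok(residuals_m, reported_acc_m, min_coverage=0.95):
    """AC-NEW-9 style check: coverage must reach min_coverage."""
    return coverage(residuals_m, reported_acc_m) >= min_coverage


# Toy data: 20 drift samples of 1..20 m, and 4 residuals against h_acc = 5 m.
p95_drift = percentile_nearest_rank(list(range(1, 21)), 95)
toy_coverage = coverage([1.0, -2.0, 3.0, 10.0], [5.0, 5.0, 5.0, 5.0])
```

Coverage alone only detects under-claimed uncertainty. Checking the "must not over-claim" half of AC-NEW-9 would need an additional upper limit on coverage or on reported h_acc, which this sketch leaves out.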
Open Research (deferred to dedicated research passes before Plan)
| Topic | Why now | Output | Owner |
|---|---|---|---|
| Cross-view matcher bench-off (REVISED scope — M-24, M-25) | Inline + re-loc + offline-ceiling tracks are now distinct | Selected inline matcher; selected re-loc matcher; ceiling reference numbers; distillation candidate teacher (LiteSAM) | Research skill, follow-up Mode A pass |
| Input-resolution sweep | Same as draft02 | Resolution per matcher candidate; sensitivity curves | Same pass |
| VPR backbone bench-off | Same as draft02 | Selected VPR backbone | Same pass |
| VO bench-off (new — M-22, M-23) | cuVSLAM is the lead but unproven on 1 km AGL fixed-wing | cuVSLAM mono / cuVSLAM mono+IMU / DPV-SLAM / VINS-Fusion / OpenVINS comparison on AerialVL + first internal fixed-wing flight | Research / impl. team |
| Tile-generator quality scoring | Same as draft02 | Calibrated thresholds for σ_xy / sharpness / glare | Implementation phase |
| Orthority per-frame latency on Orin Nano Super (new — M-27) | Confirms or rejects M-27 library choice | F-T14 measurement; if fail → cv2.warpPerspective + bilinear DEM fall-back path locked | Implementation phase |
| Internal Mavic-flight V&V dataset | Same as draft02 | Curated, ground-truth-labelled clips | Operations / data team |
| First internal fixed-wing flight | Same as draft02 | Recorded sortie with synced IMU + GPS truth + nav-cam stream | Field-test plan |
| Locked 2026-04-26: Q6 → A (ROS 2 Humble + Isaac ROS 3.2). | — | — | |
| Locked 2026-04-26: Q7 → A (MAVLink RAW_IMU from FC for v1; dedicated IMU only if F-T1c fails). | — | — | |
| Encryption-at-rest key management | Same as draft02 | Threat-modelled design | Phase 4 security analysis |
References
All citations are by ID from _docs/00_research/01_source_registry.md. Mode B round 2 sources: S58–S77 (round 1 sources S40–S57 carried over).
- VO: S60 (cuVSLAM), S61 (DPVO-QAT++), S62 (MASt3R-SLAM), S64 (Isaac ROS UAV reference), S71 (VINS-Fusion / OpenVINS Jetson reports), S72 (high-altitude VIO), S73 (DPV-SLAM).
- Matcher: S58 (LiteSAM), S63 (RoMa v2), S74 (OrthoLoC + AdHoP), S75 (AerialExtreMatch open-review).
- Fusion: S65 (ArduPilot ExtNav double-fusion bug), S66 (Z-axis snap bug), S67 (EKF sources spec), S68 (PX4 EKF2 ESKF PR), S69 (Sola ESKF tutorial), S70 (T-ESKF + Hybrid ESKF/UKF 2025).
- Ortho: S59 (Orthority).
- Sweep: S76 (Orin Nano Super FP16/INT8 reference points), S77 (ROS 2 / Isaac ROS practical guide).
Related Artifacts
- Mode A draft: _docs/01_solution/solution_draft01.md (superseded by draft02 → draft03).
- Mode B round 1 draft: _docs/01_solution/solution_draft02.md (superseded by draft03).
- Mode B round 2 decomposition: _docs/00_research/03_mode_b_decomposition_round2.md.
- Mode B round 2 reasoning chain: _docs/00_research/04_reasoning_chain_mode_b_round2.md.
- Mode B round 2 validation log: _docs/00_research/05_validation_log_mode_b_round2.md.
- AC & Restrictions assessment (Phase 1): _docs/00_research/00_ac_assessment.md (unchanged).
- Source registry: _docs/00_research/01_source_registry.md (S01–S77).
- Fact cards: _docs/00_research/02_fact_cards.md (Phase 1 + Mode B round 1 M-1..M-21 + Mode B round 2 M-22..M-35).
- Tech stack consolidation: _docs/01_solution/tech_stack.md (deferred — Phase 3 optional).
- Security analysis: _docs/01_solution/security_analysis.md (deferred — Phase 4 optional, promoted to recommended-before-Plan-lock because of M-6/M-7).