gps-denied-onboard/_docs/01_solution/solution.md
Oleksandr Bezdieniezhnykh 9eba1689b3 - Introduced a new document detailing the current state of the autodev process, including steps, status, and findings.
- Revised acceptance criteria in the acceptance_criteria.md file to clarify metrics and expectations, including updates to GPS accuracy and image processing quality.
- Enhanced restrictions documentation to reflect operational parameters and constraints for UAV flights, including camera specifications and satellite imagery usage.
- Added new research documents for acceptance criteria assessment and question decomposition to support ongoing project evaluation and decision-making.
2026-04-26 14:28:10 +03:00

Solution Draft 03

Mode: B (Solution Assessment of solution_draft02.md). Inputs: solution_draft02.md (Mode B round 1) + _docs/00_research/{03_mode_b_decomposition_round2,04_reasoning_chain_mode_b_round2,05_validation_log_mode_b_round2}.md + Mode B round-2 sources S58–S77 in 01_source_registry.md + Mode B round-2 fact cards M-22..M-35 in 02_fact_cards.md. Date: 2026-04-26 (Mode B round 2). Self-contained: yes (supersedes solution_draft02.md).

What changed in round 2 (driven by user-explicit asks: VO, matcher, EKF/ESKF, ortho-tile generator + thorough sweep):

  • Component 4 (VO): replace draft02's custom 2-frame homography VO via SP+LG with cuVSLAM (NVIDIA, CUDA-accelerated, drop-in via isaac_ros_visual_slam) in monocular + IMU mode (M-22, M-23, S60, S64).
  • Component 5 (Fusion): drop the companion-side EKF entirely for v1. Replace with a lightweight covariance calibrator + Mahalanobis outlier gate + source-label producer — no state propagation, no IMU integration on the companion (M-26). Let ArduPilot EKF3 do the actual fusion. The "EKF vs ESKF" question becomes: if we re-introduce a companion filter in v1.x, use vanilla ESKF (S68, S69) — but for v1 the question is moot.
  • Component 5 (Hybrid output): walk back round-1 M-1's "emit BOTH GPS_INPUT AND ODOMETRY in parallel for the same axis" — that triggers ArduPilot EKF3 double-fusion bugs (S65, S66, S67). v1 ships GPS_INPUT only (Option A in M-30); ODOMETRY-primary mode is v1.1 territory.
  • Component 3 (Matcher): SP+LG (TRT FP16/INT8) remains the inline matcher; LiteSAM (S58) added in three non-inline roles: re-localization fallback (cold start, σ_xy > 50 m), validation oracle, distillation teacher (M-24). RoMa v2 (S63), MASt3R-SLAM (S62), MapGlue, MATCHA added to the matcher bench-off as offline ceiling references (M-25).
  • Component 1b (Ortho-Tile Generator): replace draft02's hand-rolled "pinhole projection on per-sector DEM" with Orthority (S59) — Python library, frame + RPC camera, GeoTIFF DEM, pip-installable. Documented fall-back to cv2.warpPerspective + bilinear DEM if F-T14 latency measurement fails (M-27).
  • Component 9 (Software platform): ROS 2 Humble + Isaac ROS 3.2 chosen (Q6 → A, locked 2026-04-26). Natural pair for cuVSLAM and a published reference architecture on Orin Nano Super (S64, S77, M-29). DDS overhead (~25 % CPU, ~200 MB image growth) accepted in exchange for free integration of isaac_ros_visual_slam, MAVROS, and ros2 bag / rqt_* observability tooling.
  • Component 1 (Tile storage), C-2 (VPR), C-6 (MAVLink), C-7/C-8/C-10/C-11: unchanged from draft02 (M-28, M-31, M-33).

Locked-in user decisions carried over from round 1 (unchanged):

  • Q1 → A: GPS_INPUT primary channel (now: ONLY channel for v1 — see M-30 above).
  • Q2 → A: distinct system-IDs via ArduPilot native MAVLink routing; no mavlink-router daemon.
  • Q3 → A: AC-NEW-7 thresholds confirmed at P(>30 m)<1 %, P(>100 m)<0.1 % per flight.
  • Q4 → A: TartanAir V2 included as early-stage synthetic baseline.
  • Q5 → B (round 1): proceed to Plan in fresh conversation. Round 2 was triggered after rollback for additional component-replacement investigation.
  • Camera spec → ADTi 20MP 20L V1 APS-C; storage zoom → z=20.

Round-2 user decisions locked-in (2026-04-26):

  • Q6 → A: ROS 2 Humble + Isaac ROS 3.2 as the v1 orchestrator (M-29). DIY Python orchestrator dropped. Codified in Component 9.
  • Q7 → A: MAVLink RAW_IMU / SCALED_IMU from FC (path a) as the v1 IMU source for cuVSLAM (M-35). Dedicated companion IMU is a v1.1 hardware revision triggered only if F-T1c shows sync-jitter problems. Codified in Component 4.

Assessment Findings (Round 2 additions)

The round-1 findings table (15 rows: M-1 … M-21, including addenda M-19/M-20/M-21) carries forward unchanged. Round 2 adds the following findings, with the same old → weak → new pattern:

| Old Component Solution (round 1) | Weak Point (round 2 evidence) | New Solution (round 2) |
| --- | --- | --- |
| C-4 (round 1): "custom 2-frame VO via SuperPoint+LightGlue / GIM-LightGlue homography." | Functional, high (M-22). Custom 2-frame homography skips loop closure, sparse bundle adjustment, and keyframe-based local mapping: every mechanism that bounds drift in production VO/SLAM. AFIT thesis (S52) shows even ORB-SLAM2/SVO/DSO struggle on real fixed-wing flights; a hand-rolled 2-frame variant will be strictly worse. At 1 km AGL motion parallax shrinks ~10–25× per frame vs 100 m AGL, further degrading monocular VO. | Replace with cuVSLAM (NVIDIA, CUDA-accelerated, Apache-2.0; S60, S64). Monocular + IMU mode, drop-in via isaac_ros_visual_slam ROS 2 wrapper. <1 % ATE on KITTI / <5 cm on EuRoC. Fixed-wing 1 km AGL behaviour empirically TBD; bench-off in F-T1b mandatory before AC-1.3 lock. |
| C-4 (round 1): same row, alternatives. | Functional (M-23). Deep-VO alternatives evaluated for Orin Nano Super: DPVO/DPV-SLAM (S61, S73) extrapolate to 4–15 FPS, borderline for our 10 Hz target; MASt3R-SLAM (S62) is sub-1 Hz on Orin Nano Super, hence infeasible; VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71) require non-trivial integration cost with no accuracy advantage over cuVSLAM. | cuVSLAM is lead; DPV-SLAM / VINS-Fusion / OpenVINS retained as bench-off fall-backs if cuVSLAM underperforms on fixed-wing 1 km AGL. MASt3R-SLAM / RoMa v2 reserved for offline ceiling references. |
| C-3 (round 1): "SP+LG (TRT FP16) lead, GIM-LightGlue peer, RoMa/DKM bench-off, MASt3R dropped." | Functional, positive (M-24). LiteSAM (S58, MDPI Oct 2025) is purpose-built for satellite↔aerial AVL: 6.31 M params (2.4× smaller than EfficientLoFTR), RMSE@30 = 17.86 m on UAV-VisLoc, beats EfficientLoFTR. But on Jetson Orin Nano Super, extrapolated latency is ~1500–2000 ms / pair (AGX Orin → Orin Nano Super 4× scaling), which is outside our 400 ms p95 budget for inline use. | Add LiteSAM in three non-inline roles: (a) re-localization fallback (cold start, σ_xy > 50 m, 1.5–2 s tolerable); (b) validation oracle for offline regression bench; (c) distillation teacher to train a satellite-aerial-specialised student model that fits the inline budget. Inline matcher remains SP+LG / GIM-LG. |
| C-3 (round 1): same row, ceilings. | Functional, positive (M-25). RoMa v2 (S63, Nov 2025): SOTA dense matcher with frozen DINOv3 backbone + custom CUDA + predictive covariance; best published pose-estimation accuracy. MASt3R-SLAM (S62), MapGlue, MATCHA: cross-modal/multimodal matchers with strong specialisation. All GPU-class compute. | Add RoMa v2, MASt3R, MapGlue, MATCHA to the matcher bench-off as offline ceiling references so we know how much accuracy we trade by using SP+LG inline. None becomes inline candidate. |
| C-5 (round 1, M-1): "Onboard loosely-coupled EKF emits two parallel MAVLink streams: GPS_INPUT (primary) AND ODOMETRY (auxiliary, when available) for the same axis." | Functional, safety, high (M-26, M-30). ArduPilot ExtNav best practice (S65, S66, S67): only one position source per axis at a time. Open issues #30076 and #32506 document concrete EKF3 misbehaviours when both ExtNav (ODOMETRY) and GPS (GPS_INPUT) are fed for overlapping axes, including unstable position with high variances and Z-axis snap-to-ODOMETRY. The "emit both in parallel" framing was a misconfiguration, not a feature. | v1 ships GPS_INPUT only (Option A in M-30). ODOMETRY emission disabled in v1. ArduPilot configured EK3_SRC1_*=GPS+Compass; failover via EK3_SRC2_*. Option B (ODOMETRY-primary) is v1.1 work once F-T9 SITL confirms PR #30080-class source-switching is clean. |
| C-5 (round 1): "loosely-coupled EKF in our process." | Architectural (M-26). The companion-side EKF was always going to feed the FC's own EKF3 → double-fusion. Visual fix → companion EKF → ArduPilot EKF3 stacks two filters on overlapping observations, breaks the single-source-per-axis invariant, and risks the same instability documented in #30076/#32506. | Drop the companion-side EKF for v1. Component 5 becomes a "covariance calibrator + Mahalanobis outlier gate + source-label producer" with no state propagation and no IMU integration. Each upstream (matcher, cuVSLAM) emits a hypothesis with covariance; outliers are gated; covariances are re-scaled if empirical residuals show over- or under-confidence; results are emitted on the appropriate MAVLink channel. If v1.x evidence demands a companion-side filter, use vanilla ESKF (S68, S69), the right family for orientation correctness. |
| C-1b (round 1): "Pinhole projection on per-sector DEM (flat-Earth in flat sectors; SRTM-30 m DEM lookup in moderate sectors)." | Engineering (M-27). Implicit hand-rolled implementation reinvents distortion handling, RPC refinement, DEM bilinear lookup, and projection, all of which exist in the Orthority Python library (S59) under MIT-class licence, pip-installable. | Use Orthority for per-frame ortho (frame-camera mode). Falls back to cv2.warpPerspective + bilinear DEM (~5–20 ms estimated) if F-T14 measurement shows Orthority's per-frame latency on Orin Nano Super exceeds the 50 ms allotted to ortho. |
| C-9 (round 1): "Single Python process (asyncio) on CPython 3.11/3.12; TRT subprocess workers." | Architectural (M-29). With cuVSLAM adoption (M-23), the natural integration path is isaac_ros_visual_slam (ROS 2 wrapper) → MAVROS → FC. Re-exporting cuVSLAM into a custom asyncio orchestrator is high-friction. ROS 2 Humble + JetPack 6 + Isaac ROS 3.2 is a published, working reference design on the exact hardware target (S64, S77). | Resolved (Q6 → A, locked 2026-04-26): ROS 2 Humble + Isaac ROS 3.2 chosen over the DIY Python orchestrator. ROS 2 cost: ~25 % CPU (DDS + topic serialisation), ~200 MB image growth, learning curve. ROS 2 benefit: free integration of cuVSLAM, MAVROS, observability via ros2 bag / rqt_*. |

(Round-1 findings M-1 through M-21 — including the Phase-1-correction addenda — remain unchanged in their original form; round-2 supersedes only the rows above. Full round-1 rationale lives in solution_draft02.md for traceability and _docs/00_research/02_fact_cards.md.)


Product Solution Description (Revised)

A companion-computer software stack that runs on the Jetson Orin Nano Super alongside an ArduPilot 4.5+ flight controller and provides GPS-equivalent position fixes to the autopilot when real GPS is jammed, spoofed, or denied.

Localization pipeline (per frame at 3 fps nav cam):

  1. cuVSLAM (monocular + IMU from FC RAW_IMU MAVLink stream) provides drift-bounded relative pose with keyframe-based local mapping + sparse bundle adjustment + loop closure.
  2. VPR (DINOv2 SALAD/BoQ chosen by bench-off; AnyLoc fallback) narrows the satellite basemap to a top-K candidate-chunk shortlist on re-localization triggers (cold start, sharp turn, σ_xy > 50 m) — conditional invocation keeps cruise overhead near zero.
  3. Cross-view matcher (SP+LG TRT FP16 inline; GIM-LightGlue peer in the bench-off; LiteSAM as re-loc fallback) produces sub-pixel keypoint correspondences against the candidate chunks; PnP yields an absolute pose + covariance.
  4. Component 5 (covariance calibrator + Mahalanobis outlier gate + source-label producer; not an EKF) consumes the absolute pose + cuVSLAM relative pose; rejects outliers; re-scales covariances; emits result on the appropriate MAVLink channel.
  5. GPS_INPUT (GPS1_TYPE=14, MAVLink2-signed, pymavlink) is sent to the FC. ArduPilot EKF3 (24-state classical EKF, 400 Hz) does the actual fusion of our GPS-equivalent fix with its own IMU, baro, compass.
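Step 5 can be illustrated with a minimal sketch of how a fix and its accuracies might be packed into the fields of the MAVLink GPS_INPUT message. The helper name `pack_gps_input`, the `GpsInputFields` tuple, and the placeholder values for `hdop`, `vdop`, `gps_id` and `satellites_visible` are assumptions made for illustration; only the field order and units follow the GPS_INPUT message definition.

```python
from typing import NamedTuple

GPS_EPOCH_UNIX_S = 315_964_800   # 1980-01-06T00:00:00Z expressed as Unix time
GPS_MINUS_UTC_S = 18             # GPS-UTC leap-second offset, valid since 2017
SECONDS_PER_WEEK = 604_800


class GpsInputFields(NamedTuple):
    """Fields in GPS_INPUT message order (yaw extension omitted)."""
    time_usec: int
    gps_id: int
    ignore_flags: int
    time_week_ms: int
    time_week: int
    fix_type: int
    lat: int                # degrees * 1e7
    lon: int                # degrees * 1e7
    alt: float              # metres AMSL
    hdop: float
    vdop: float
    vn: float               # m/s, north
    ve: float               # m/s, east
    vd: float               # m/s, down
    speed_accuracy: float   # m/s (vel_acc)
    horiz_accuracy: float   # m   (h_acc)
    vert_accuracy: float    # m   (v_acc)
    satellites_visible: int


def pack_gps_input(unix_time_s, lat_deg, lon_deg, alt_m, vel_ned,
                   h_acc_m, v_acc_m, vel_acc_mps):
    """Convert a calibrated fix into GPS_INPUT fields (hypothetical helper)."""
    gps_s = unix_time_s - GPS_EPOCH_UNIX_S + GPS_MINUS_UTC_S
    week, sec_of_week = divmod(gps_s, SECONDS_PER_WEEK)
    vn, ve, vd = vel_ned
    return GpsInputFields(
        time_usec=int(round(unix_time_s * 1e6)),
        gps_id=0,                      # assumption: single virtual GPS instance
        ignore_flags=0,                # all fields are populated
        time_week_ms=int(round(sec_of_week * 1000)),
        time_week=int(week),
        fix_type=3,                    # 3D fix
        lat=int(round(lat_deg * 1e7)),
        lon=int(round(lon_deg * 1e7)),
        alt=float(alt_m),
        hdop=1.0, vdop=1.0,            # placeholders (assumption)
        vn=float(vn), ve=float(ve), vd=float(vd),
        speed_accuracy=float(vel_acc_mps),
        horiz_accuracy=float(h_acc_m),
        vert_accuracy=float(v_acc_m),
        satellites_visible=10,         # placeholder (assumption)
    )
```

With pymavlink, the tuple would typically be passed positionally, for example `conn.mav.gps_input_send(*fields)` on a connection opened with `mavutil.mavlink_connection(...)`; the connection setup and signing are not shown here.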

Tile generation (in-flight, asynchronous):

  1. Per-frame eligibility check (σ_xy ≤ 5 m hard gate, terrain class flat/moderate, EKF source = satellite_anchored).
  2. Orthorectification via Orthority (frame-camera model + per-sector DEM from SRTM 30 m).
  3. Quality scoring + dedup against existing tile cache (service-tile immutability respected).
  4. Write to MBTiles SQLite cache (WAL + connection pool + transaction batching) with parent_pose_sigma_xy, terrain_class, trust_level.
  5. Post-flight: tiles uploaded to Suite Service candidate pool; 2-flight voting at Service ingest promotes onboard tiles to trusted basemap.
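Step 4 (WAL + transaction batching, with sidecar metadata) might look like the following sketch using Python's built-in `sqlite3`. The `tiles` table follows the MBTiles layout; the `tile_sidecar` table, the `"service"` / `"candidate"` trust values and both function names are hypothetical, and connection pooling is left out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tiles (
    zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, tile_data BLOB,
    PRIMARY KEY (zoom_level, tile_column, tile_row));
CREATE TABLE IF NOT EXISTS tile_sidecar (
    zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER,
    parent_pose_sigma_xy REAL, terrain_class TEXT, trust_level TEXT,
    PRIMARY KEY (zoom_level, tile_column, tile_row));
"""


def open_tile_cache(path):
    """Open (or create) the cache in WAL mode so readers are not blocked by writes."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.executescript(SCHEMA)
    return conn


def write_tile_batch(conn, batch):
    """Write a batch of (z, x, y, data, sigma_xy, terrain, trust) in one transaction.

    Tiles already marked as service tiles are never overwritten
    (service-tile immutability). Returns the number of tiles written.
    """
    written = 0
    with conn:  # one transaction for the whole batch
        for z, x, y, data, sigma_xy, terrain, trust in batch:
            row = conn.execute(
                "SELECT trust_level FROM tile_sidecar "
                "WHERE zoom_level=? AND tile_column=? AND tile_row=?",
                (z, x, y),
            ).fetchone()
            if row is not None and row[0] == "service":
                continue
            conn.execute("INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)",
                         (z, x, y, data))
            conn.execute("INSERT OR REPLACE INTO tile_sidecar VALUES (?, ?, ?, ?, ?, ?)",
                         (z, x, y, sigma_xy, terrain, trust))
            written += 1
    return written
```

Batching many tiles into a single transaction keeps the number of fsync-heavy commits low, which is the usual motivation for pairing WAL with batched writes.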

Object localization (separate path, AI camera): trig + airframe-attitude fusion via FC ATTITUDE MAVLink stream — unchanged from round 1.

MAVLink endpoint: shared between MAVSDK (telemetry, sysid=10) and pymavlink (GPS_INPUT, sysid=11) via distinct system-IDs through ArduPilot's native MAVLink routing — no mavlink-router daemon. MAVLink2 signing mandatory in v1.

                       Pre-flight (ground)
        ┌────────────────────────────────────────────────┐
        │  Azaion Suite Satellite Service                │
        │  (sources commercial / agency imagery;         │
        │   ingests onboard tiles via candidate pool +   │
        │   2-flight voting layer)                       │
        └──────────────┬───────────────────┬─────────────┘
                       │ sync down         │ upload back (post-flight)
                       ▼                   ▲
              ┌─────────────────┐
              │ DEM (SRTM 30 m) │ ─────► sector classification
              └─────────────────┘
                                    Onboard (in-flight)
   Nav Cam: ADTi 20MP, 3 fps        AI Cam (gimbal+zoom, on-demand)
        │                                    │
        ▼                                    ▼
 ┌────────────────────────────────────────────┐  ┌────────────────────┐
 │ ROS 2 Humble + Isaac ROS 3.2 (Q6 → A)      │  │ Object Geo-Locator │
 │ ┌──────────────────────────┐               │  │ (pinhole+ATTITUDE) │
 │ │ cuVSLAM (mono + IMU)     │←──FC RAW_IMU  │  └──────┬─────────────┘
 │ │ → keyframe pose + cov    │               │         │
 │ └────────────┬─────────────┘               │         │
 │              ▼                              │         │
 │ ┌──────────────────────────┐               │         │
 │ │ VPR (SALAD/BoQ/AnyLoc)   │←─ re-loc      │         │
 │ │ on demand only           │   triggers    │         │
 │ └────────────┬─────────────┘               │         │
 │              ▼                              │         │
 │ ┌──────────────────────────┐               │         │
 │ │ Cross-view Matcher       │               │         │
 │ │ inline: SP+LG / GIM-LG   │               │         │
 │ │ re-loc:  LiteSAM (rare)  │               │         │
 │ └────────────┬─────────────┘               │         │
 │              ▼                              │         │
 │ ┌──────────────────────────┐               │         │
 │ │ PnP → absolute pose + Σ  │               │         │
 │ └────────────┬─────────────┘               │         │
 │              ▼                              │         │
 │ ┌──────────────────────────────────────┐   │         │
 │ │ Component 5 (NOT an EKF)             │   │         │
 │ │  - covariance calibrator             │   │         │
 │ │  - Mahalanobis outlier gate          │   │         │
 │ │  - source-label producer             │   │         │
 │ └────────────┬─────────────────────────┘   │         │
 │              ▼                              │         │
 │ ┌──────────────────────────────────────┐   │         │
 │ │ Ortho-Tile Generator (Orthority)     │   │         │
 │ │  → MBTiles+WAL Tile Cache            │   │         │
 │ └──────────────────────────────────────┘   │         │
 └────────────────┬───────────────────────────┘         │
                  ▼                                      │
         GPS_INPUT (pymavlink, signed) ──► ArduPilot     │
         (GPS1_TYPE=14, EK3_SRC1_POSXY=GPS, EK3_SRC2=GPS)│
                  │  (ODOMETRY disabled for v1; v1.1+)   │
                  ▼                                      │
         Telemetry summary 1–2 Hz ─────► QGroundControl │
                  │                                      │
                  ▼                                      │
         Flight Data Recorder (NVMe, 64 GB cap, no raw frames)

Architecture

Overall principles (revised vs draft02)

  1. Pipeline = stages with explicit confidence. Each stage emits a pose hypothesis + covariance + categorical label. Component 5 calibrates and gates; ArduPilot EKF3 fuses. (Revised — M-26.)
  2. All heavy NN inference runs on GPU via TensorRT (FP16, INT8 where validated). Pre-extract satellite-tile descriptors offline (AC-8.3). (Unchanged.)
  3. Orchestration: ROS 2 Humble + Isaac ROS 3.2 (Q6 → A, locked). cuVSLAM consumed via isaac_ros_visual_slam; MAVROS bridges ROS 2 ↔ MAVLink for the FC. Our matcher / VPR / ortho / Component-5 calibrator / FDR / uploader run as rclpy Python nodes. CPython 3.11 / 3.12 inside the nodes; TensorRT engines + CUDA contexts owned per-node. (Revised — M-29.)
  4. Persistent satellite cache across flights (~10 GB for 400 km²); per-flight FDR is separate. (Unchanged.)
  5. Every output to the FC carries a covariance — GPS_INPUT (h_acc, v_acc, vel_acc). ODOMETRY emission disabled for v1 (Option A in M-30). (Revised — M-30.)
  6. Service tiles are basemap truth; onboard tiles go through Service-side voting before promotion (M-9). (Unchanged.)
  7. MAVLink2 signing on every companion↔FC link (M-7). USB bypasses signing — bench-only access. (Unchanged.)
  8. No companion-side state propagation — the FC's EKF3 is the only filter. Any future companion-side filter (v1.x) will be an ESKF (S69), not a regular EKF. (New — M-26.)
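As a rough plausibility check on the cache size quoted in principle 4, the arithmetic below estimates the number of z=20 Web Mercator tiles needed to cover 400 km². The latitude (49° N), the 256 px tile edge and the 16 KiB average compressed tile size are assumptions chosen for illustration, and descriptor-index storage is not counted.

```python
import math

EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0  # WGS84 equatorial radius


def ground_resolution_m_per_px(lat_deg, zoom, tile_px=256):
    """Web Mercator (EPSG:3857) ground resolution at a given latitude and zoom."""
    return (EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg))
            / (tile_px * 2 ** zoom))


def estimate_cache(area_km2, lat_deg, zoom=20, tile_px=256,
                   avg_tile_bytes=16 * 1024):
    """Return (tile_count, total_bytes) for covering area_km2 once.

    avg_tile_bytes is an assumed average compressed tile size.
    """
    tile_side_m = ground_resolution_m_per_px(lat_deg, zoom, tile_px) * tile_px
    tile_count = math.ceil(area_km2 * 1e6 / tile_side_m ** 2)
    return tile_count, tile_count * avg_tile_bytes


tiles, total_bytes = estimate_cache(400.0, lat_deg=49.0)
print(f"{tiles} tiles, {total_bytes / 1e9:.1f} GB")
```

Under these assumptions each tile covers roughly 25 m × 25 m, and the total lands in the same order of magnitude as the ~10 GB figure; a different average tile size shifts the result proportionally.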

Component 1: Satellite Tile Cache & Descriptor Index

Unchanged from draft02 / Mode B round 1 — MBTiles SQLite + WAL + connection pool + transaction batching; FAISS IVF over per-chunk DINOv2-VLAD vectors (chunk-decoupled per M-16); terrain_class and trust_level sidecar. (M-28: COG + PMTiles considered and rejected for our use case.)


Component 1b: Ortho-Tile Generator (REVISED — M-27)

Library: Orthority (S59, Python, MIT-class) — frame-camera model with GeoTIFF DEM lookup. Pip-installable: pip install orthority. Replaces draft02's hand-rolled "pinhole projection on per-sector DEM".

Pipeline per frame (eligibility / quality / dedup logic unchanged from draft02; only the projection step is replaced):

  1. Eligibility check (unchanged from draft02 / M-9 hard gate): skip when EKF source is dead_reckoned, σ_xy > 5 m, roll/pitch > 10°, no inliers, or sector is rugged. Sectors classified moderate get terrain_uncertainty=true sidecar flag.
  2. Orthorectification (revised): call orthority.Ortho(frame, dem, camera_model).process() with the frame-camera model populated from FC ATTITUDE (gimbal pitch / roll / yaw) + companion-resolved position + airframe altitude. SRTM-30 m DEM tile pre-loaded for the operational area.
  3. Resampling to basemap projection (unchanged): EPSG:3857 z=20.
  4. Quality scoring (unchanged from draft02): sharpness + coverage + match_inliers + parent_pose_sigma_xy + glare/cloud flag.
  5. Deduplication / write decision (unchanged from draft02 — M-9 service-tile-immutability + soft/candidate gates).
  6. Sidecar metadata (unchanged): parent_pose_sigma_xy, terrain_class, trust_level.
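The eligibility check in step 1 can be sketched as a pure function. The `FrameState` dataclass, its field names and the `tile_eligibility` helper are hypothetical; the thresholds (σ_xy > 5 m, roll/pitch > 10°, rugged sectors skipped, moderate sectors flagged) are taken from the list above.

```python
from dataclasses import dataclass

SIGMA_XY_HARD_GATE_M = 5.0
MAX_TILT_DEG = 10.0


@dataclass
class FrameState:
    source_label: str      # satellite_anchored / vo_extrapolated / dead_reckoned
    sigma_xy_m: float
    roll_deg: float
    pitch_deg: float
    match_inliers: int
    terrain_class: str     # flat / moderate / rugged


def tile_eligibility(frame):
    """Return (eligible, sidecar_flags) for one nav-cam frame."""
    if frame.source_label == "dead_reckoned":
        return False, {}
    if frame.sigma_xy_m > SIGMA_XY_HARD_GATE_M:
        return False, {}
    if abs(frame.roll_deg) > MAX_TILT_DEG or abs(frame.pitch_deg) > MAX_TILT_DEG:
        return False, {}
    if frame.match_inliers <= 0:
        return False, {}
    if frame.terrain_class == "rugged":
        return False, {}
    # Moderate sectors are accepted but flagged for downstream consumers.
    return True, {"terrain_uncertainty": frame.terrain_class == "moderate"}
```

Keeping the gate free of I/O makes it easy to replay recorded flights through it in F-T2.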

Latency budget: F-T14 (revised) measures Orthority's per-frame latency on Orin Nano Super. Budget: ≤50 ms / frame. Documented fall-back if exceeded: cv2.warpPerspective + bilinear DEM lookup (~5–20 ms estimated).


Component 2: Visual Place Recognition (Global Retrieval)

Unchanged from draft02 / Mode B round 1. AnyLoc + SALAD + BoQ + MixVPR shortlist; conditional invocation (M-17); chunk-based retrieval unit (M-16); expanding-window retry (M-18); multi-scale chunks + OSM road-overlay + sector-volatility-driven K (M-19); active-conflict scene-change mitigations stand. (M-33: no new VPR backbone in 2025 displaces this.)


Component 3: Cross-View Matching & PnP (REVISED — M-24, M-25)

Inline lead: SuperPoint + LightGlue (TRT FP16/INT8), unchanged. Feasibility re-confirmed: ~50–200 ms / pair on Orin Nano Super FP16 at 320×240 → 640×480 (RTX 3080 baseline 0.96 + 2.54 ms scaled by Orin Nano Super throughput ratio; cross-validated by S76 YOLO26 reference points).

Inline peer: GIM-LightGlue, unchanged from draft02 (M-3, S48). +8.4–18.1 % zero-shot vs LightGlue baseline.

Embedded fallback: XFeat (sparse + semi-dense) — unchanged.

Re-localization fallback (new, M-24): LiteSAM (S58). Invoked rarely (cold start, σ_xy > 50 m, sharp turn after cuVSLAM tracking loss). Latency budget: 1.5–2 s on Orin Nano Super. Accepted because re-loc events are rare and AC-NEW-1 cold-start budget is 30 s.
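The trigger conditions above reduce to a small predicate. This is a sketch only: the function name `select_matcher_path` and the returned labels are hypothetical, and the real trigger logic may include hysteresis or timers that are not described here.

```python
SIGMA_XY_RELOC_M = 50.0  # re-localization threshold quoted above


def select_matcher_path(cold_start, sigma_xy_m, tracking_lost):
    """Pick the matcher path for the next frame.

    Returns "reloc" (VPR + LiteSAM, 1.5-2 s budget) when any re-localization
    trigger fires, otherwise "inline" (SP+LG within the inline budget).
    """
    if cold_start or tracking_lost or sigma_xy_m > SIGMA_XY_RELOC_M:
        return "reloc"
    return "inline"
```

Because the predicate is false during normal cruise, the slower path adds no steady-state load.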

Validation oracle (new — M-24): LiteSAM run offline on bench data for ground-truth-quality matches. Used to score the inline matcher's recall@30m on a per-flight basis without needing manual annotation.

Distillation teacher (new — M-24): train a satellite-aerial-specialised student model (target ≤5 M params, ≤100 ms / pair) using LiteSAM-supervised correspondences on TartanAir V2 + AerialExtreMatch + UAV-VisLoc. Output is a candidate inline matcher for v1.x.

Offline ceiling references (new — M-25): RoMa v2 (S63), MASt3R-SLAM (S62), MapGlue, MATCHA — included in the matcher bench-off so we know how much accuracy we trade by using SP+LG inline. None becomes inline candidate.

Bench-off scope (revised) for the deferred research item:

  • Inline candidates (must fit in 200 ms / pair on Orin Nano Super @ 25 W): SP+LG, GIM-LightGlue, XFeat (sparse), XFeat (semi-dense).
  • Re-loc candidates (must fit in 2 s / pair): LiteSAM.
  • Offline ceilings: RoMa v2, MASt3R-SLAM, MapGlue, MATCHA.
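The scope rules above amount to binning each candidate by its measured latency. A minimal sketch, assuming the bench-off reports a p95 latency per pair in milliseconds; the function name and role labels are illustrative.

```python
INLINE_BUDGET_MS = 200.0    # inline candidates, Orin Nano Super @ 25 W
RELOC_BUDGET_MS = 2000.0    # re-loc candidates


def bench_role(p95_latency_ms):
    """Map a measured p95 latency per pair to the role a matcher can fill."""
    if p95_latency_ms <= INLINE_BUDGET_MS:
        return "inline"
    if p95_latency_ms <= RELOC_BUDGET_MS:
        return "re-loc"
    return "offline-ceiling"
```

For example, a matcher measured at 1800 ms per pair would be eligible only for the re-loc role, which matches how LiteSAM is positioned in this draft.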

Bench-off targets (unchanged from draft02): AerialVL, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight.

Score on: AC-1.1 / AC-1.2 / AC-2.2; p95 latency on Orin Nano Super 25 W; sustained 30-min thermal stability; peak GPU memory; seasonal-robustness score; accuracy-vs-inline-feasibility frontier (re-loc role only for >200 ms candidates).

PnP & projection: unchanged from draft02.

Input downsampling: unchanged starting points (1024×768 for SP+LG / GIM-LG; 640×480 for XFeat sparse).


Component 4: Visual Odometry (REVISED — M-22, M-23)

v1 choice: cuVSLAM (NVIDIA, CUDA-accelerated, Apache-2.0; S60). Monocular + IMU mode. Drop-in via isaac_ros_visual_slam ROS 2 wrapper (S64). Replaces draft02's "custom 2-frame VO via SP+LG / GIM-LG homography".

Why cuVSLAM:

  • Production-grade VO/SLAM with keyframe-based local mapping + sparse bundle adjustment + loop closure — bounds drift, unlike a 2-frame homography.
  • CUDA-accelerated, optimized for Jetson. Reference designs on Orin Nano (S64, S77) confirm runtime feasibility.
  • <1 % ATE on KITTI / <5 cm on EuRoC.
  • Minimal integration cost via the ROS 2 wrapper.

Why not the alternatives:

  • DPVO / DPV-SLAM (S61, S73): extrapolated 4–15 FPS on Orin Nano Super, borderline for the 10 Hz target. Reserved as bench-off fall-back.
  • MASt3R-SLAM (S62): sub-1 Hz on Orin Nano Super — infeasible inline.
  • VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71): non-trivial integration cost; no accuracy advantage. Reserved as bench-off fall-backs.
  • Custom 2-frame homography VO (draft02): wrong design (M-22).

IMU source for cuVSLAM (Q7 → A, locked, M-35): MAVLink RAW_IMU / SCALED_IMU from FC at ~200–400 Hz (path a). Subscribed inside the cuVSLAM node via MAVROS. F-T1c (new field test) measures sync-jitter under flight load; if it fails the threshold (TBD by cuVSLAM tolerance), v1.1 adds a dedicated companion IMU (BNO055 / ICM-42688P / BMI270) over SPI as a hardware revision.

Camera intrinsics: nav cam (ADTi 20MP APS-C) calibrated pre-flight via standard checkerboard (M-34). cuVSLAM consumes the camera_info topic at start-up.

Risk R8 reframed: cuVSLAM's high-altitude fixed-wing performance is empirically unproven (its published benchmarks are urban driving + indoor MAV). F-T1b (revised) bench-off mandatory before AC-1.3 lock.

Fall-back path: if cuVSLAM underperforms on AerialVL fixed-wing trajectories, use a properly-scoped VO (DPV-SLAM with keyframe + bundle adjustment + loop closure, not 2-frame homography) as the v1.1 candidate. Custom 2-frame VO never comes back.


Component 5: Companion-Side Output Stage (REVISED — M-26, M-30)

Renamed: was "IMU + Visual EKF Fusion" in draft02. Now: "Companion-Side Output Stage — Covariance Calibrator + Outlier Gate + Source-Label Producer".

Responsibility (v1):

  1. Consume cuVSLAM relative-pose + cross-view matcher absolute-pose hypotheses.
  2. Run a Mahalanobis outlier gate to drop fixes whose innovation w.r.t. cuVSLAM relative pose exceeds a threshold (computed against AC-NEW-4 false-position safety budget).
  3. Re-scale covariances using empirical residuals (online, exponentially-weighted) to correct for systematic over- / under-confidence in the matcher / VPR / VO outputs.
  4. Tag the result with a categorical source label: satellite_anchored / vo_extrapolated / dead_reckoned.
  5. Emit on the appropriate MAVLink channel (GPS_INPUT for v1, Option A in M-30).
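Steps 2 and 3 can be sketched for the horizontal (2-D) case in plain Python. Everything named here is illustrative: the 99 % chi-square threshold stands in for whatever value is derived from the AC-NEW-4 budget, the smoothing factor `alpha` is arbitrary, and the scheme of scaling the matcher covariance by the ratio of the observed to the expected squared Mahalanobis distance is one simple option, not necessarily the calibrator this design will ship.

```python
from dataclasses import dataclass

CHI2_2DOF_99 = 9.210   # 99 % quantile of chi-square with 2 dof (illustrative gate)
EXPECTED_D2_2DOF = 2.0  # mean of chi-square with 2 dof


def mahalanobis_sq_2d(dx, dy, cov):
    """Squared Mahalanobis distance of (dx, dy) under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def add_cov(p, q, scale_p=1.0):
    """Return scale_p * p + q for 2x2 matrices given as nested tuples."""
    return tuple(tuple(scale_p * p[i][j] + q[i][j] for j in range(2))
                 for i in range(2))


@dataclass
class CovarianceCalibrator:
    alpha: float = 0.05                 # EWMA smoothing factor (assumption)
    ewma_d2: float = EXPECTED_D2_2DOF   # starts at the calibrated expectation

    @property
    def scale(self):
        return self.ewma_d2 / EXPECTED_D2_2DOF

    def process(self, fix_xy, fix_cov, pred_xy, pred_cov):
        """Gate one absolute fix against the VO-predicted position.

        Returns the re-scaled fix covariance, or None if the fix is rejected.
        """
        dx, dy = fix_xy[0] - pred_xy[0], fix_xy[1] - pred_xy[1]
        gated = mahalanobis_sq_2d(dx, dy, add_cov(fix_cov, pred_cov, self.scale))
        if gated > CHI2_2DOF_99:
            return None  # outlier: never reaches the FC
        # Update the calibration statistic with the un-scaled distance.
        raw = mahalanobis_sq_2d(dx, dy, add_cov(fix_cov, pred_cov))
        self.ewma_d2 = (1.0 - self.alpha) * self.ewma_d2 + self.alpha * raw
        s = self.scale
        return tuple(tuple(s * v for v in row) for row in fix_cov)
```

Only accepted fixes update the running statistic, so a burst of outliers does not inflate the covariance scale; whether that is the right policy is a design choice for the calibrator regression test to confirm.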

Explicitly NOT in v1:

  • State propagation (no x_{k+1} = f(x_k, u_k) + w_k).
  • IMU integration (the FC's EKF3 does this with the FC's own IMU at 400 Hz).
  • ODOMETRY emission (Option B in M-30 — v1.1+).

ESKF question resolved: ArduPilot EKF3 is a regular EKF (24-state) — we cannot swap the FC filter (S65, S66, S67, S68). The EKF-vs-ESKF debate applies only to a hypothetical companion-side filter, which we drop for v1. If v1.x evidence (F-T9 SITL) demands a companion-side filter, use vanilla ESKF (S69) — the right family for orientation correctness, with tangent-space covariance on SO(3).

Hybrid-output channel split (M-30):

| Mode | EK3_SRC1_* configuration | Channel emission | Status |
| --- | --- | --- | --- |
| Option A (v1 default) | POSXY=GPS, VELXY=GPS, YAW=GPS+Compass. EK3_SRC2_*=GPS for failover. | GPS_INPUT only (GPS1_TYPE=14). ODOMETRY disabled. | Ships in v1. |
| Option B (v1.1+) | POSXY=ExternalNav, YAW=ExternalNav. EK3_SRC2_POSXY=GPS for failover. | ODOMETRY primary; GPS_INPUT held in reserve, not actively fused while ODOMETRY healthy. | Requires PR #30080 fix; gated on F-T9 SITL pass. |
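For Option A, the symbolic settings in the table can be turned into numeric parameter values for a ground-station parameter file. The sketch below is an interpretation, not the project's deploy config: it assumes "GPS+Compass" means ArduPilot's "GPS with Compass Fallback" yaw source (value 3), uses the ArduPilot enum values GPS = 3 for position and velocity sources, and the `OPTION_A` dictionary and `render_param_file` helper are hypothetical names.

```python
# ArduPilot EK3_SRCn_POSXY / VELXY source values (subset).
SRC_GPS = 3
SRC_EXTERNAL_NAV = 6
# ArduPilot EK3_SRCn_YAW source values (subset).
YAW_GPS_WITH_COMPASS_FALLBACK = 3
GPS_TYPE_MAVLINK = 14   # GPS driver fed by the GPS_INPUT message

OPTION_A = {
    "GPS1_TYPE": GPS_TYPE_MAVLINK,
    "EK3_SRC1_POSXY": SRC_GPS,
    "EK3_SRC1_VELXY": SRC_GPS,
    "EK3_SRC1_YAW": YAW_GPS_WITH_COMPASS_FALLBACK,  # assumed reading of "GPS+Compass"
    "EK3_SRC2_POSXY": SRC_GPS,                      # failover source set
}


def render_param_file(params):
    """Render parameters as NAME,VALUE lines, sorted for stable diffs."""
    return "\n".join(f"{name},{value}" for name, value in sorted(params.items()))


print(render_param_file(OPTION_A))
```

Option B would swap the SRC1 position and yaw sources to `SRC_EXTERNAL_NAV`; it is not rendered here because it is gated on the F-T9 SITL result.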

Component 6: MAVLink

Unchanged from draft02 / round 1. MAVSDK (telemetry, sysid=10) + pymavlink (GPS_INPUT, sysid=11), distinct system-IDs sharing the serial port via ArduPilot's native MAVLink routing. No mavlink-router daemon. MAVLink2 signing mandatory, per-airframe key in FC FRAM. Source-promotion logic and AC-NEW-2 (<3 s spoofing-promotion latency) carry forward unchanged. (M-31: sysid collision-check added to deploy runbook.)


Component 7: Failsafe, Health & Re-Localization

Unchanged from draft02.


Component 8: Object Localization (AI Camera)

Unchanged from draft02.


Component 9: Software Platform & Process Topology (LOCKED — Q6 → A, M-29)

v1 choice: ROS 2 Humble + Isaac ROS 3.2 on JetPack 6 / Ubuntu 22.04 (S64, S77).

Process topology:

  • C++ Isaac ROS node: cuVSLAM via isaac_ros_visual_slam (consumes camera_info + image stream + IMU; publishes nav_msgs/Odometry).
  • C++ MAVROS node: bridges ROS 2 ↔ MAVLink for the FC. RAW_IMU / SCALED_IMU subscribed by the cuVSLAM node; FC ATTITUDE consumed by Component 1b ortho node; GPS_INPUT published by Component 5 calibrator node.
  • Python rclpy nodes: matcher (SP+LG TRT FP16/INT8), VPR (SALAD/BoQ on demand), Component 1b ortho generator (Orthority), Component 5 calibrator + outlier gate, FDR writer, Suite-Service uploader.
  • TensorRT engines + CUDA contexts owned per-node (no shared CUDA context). Engines loaded at node start-up; warm-up inference at boot.

Stack details (locked):

  • CPython 3.11 or 3.12 inside rclpy nodes (free-threaded 3.13 deferred to v1.x — M-32, M-33).
  • TensorRT FP16 default, INT8 where validated by the matcher bench-off.
  • numba JIT for the calibrator's hot path (Mahalanobis distance + covariance re-scale).
  • Configuration via YAML; structured-JSON logging to FDR; ros2 bag for in-flight telemetry capture.

Cost / benefit reaffirmed:

  • Cost: ~25 % CPU for DDS + topic serialisation; ~200 MB extra deployment-image footprint; learning curve (mitigated by published reference designs in S64, S77).
  • Benefit: drop-in isaac_ros_visual_slam for cuVSLAM, drop-in MAVROS for the FC bridge, free observability via ros2 bag and rqt_*, battle-tested by the wider robotics community.

Reference designs: S64 (Hackster.io GPS-Denied Drone), S77 (thomasthelliez ROS 2 / Isaac ROS guide), bandofpv/VSLAM-UAV (PX4 + ROS 2 reference), sidharthmohannair/ros2-ardupilot-sitl-hardware (ArduPilot + ROS 2 reference).


Component 10: Flight Data Recorder

Unchanged from draft02 / round 1.


Component 11: Confidence Score (cross-cutting)

Unchanged from draft02 / round 1.


Testing Strategy

Functional / Integration

  • F-T1 Tile cache load/lookup (unchanged).
  • F-T1b (REVISED — M-22, M-23, R8 reframed) AC-1.3 drift regression: run cuVSLAM on AerialVL fixed-wing trajectories (70 km of real flight). Pass = drift ≤ 100 m mono-only / ≤ 50 m mono+IMU between satellite anchors at 95th percentile. Gates AC-1.3 lock. If cuVSLAM fails: fall back to DPV-SLAM bench / VINS-Fusion bench.
  • F-T1c (new — M-22, M-23) Compare cuVSLAM mono vs cuVSLAM mono+IMU on the same AerialVL trajectories — quantifies IMU contribution given MAVLink-rated IMU rate (path (a) of M-35).
  • F-T2 Tile generation + dedup (extended — M-9 + M-27): replay a recorded flight; assert (a) ≤1 tile per ground sector covered ≥2× by nav cam; (b) tile has parent_pose_sigma_xy ≤ hard gate; (c) service tiles never overwritten within freshness budget; (d) Orthority output equivalent to ground-truth ortho (RMSE < 1 px on synthetic frame with known DEM).
  • F-T3 Tile uploader → candidate pool (unchanged from draft02).
  • F-T4 End-to-end against AerialVL.
  • F-T5 End-to-end against UAV-VisLoc.
  • F-T5b End-to-end against AerialExtreMatch (unchanged from draft02 — M-14).
  • F-T5c Season-robustness regression against 2chADCNN season set (unchanged from draft02 — M-14).
  • F-T6 End-to-end against internal Mavic flight footage.
  • F-T7 Sharp-turn handling (extended — M-24): assert LiteSAM re-loc fallback recovers within 2 s on post-turn frames where SP+LG inline matcher fails.
  • F-T8 Disconnected-segment re-localization (extended — M-24): include LiteSAM re-loc in the test matrix.
  • F-T9 ArduPilot SITL: full MAVLink loop (REVISED — M-30). Test matrix:
    • Option A mode (v1 default): GPS_INPUT only; verify EKF3 fuses correctly; verify failover to backup GPS via EK3_SRC2_*.
    • Option B mode (v1.1 candidate): ODOMETRY-primary; verify PR #30080-class source-switching is clean; verify GPS_INPUT held in reserve does not double-fuse (issues #30076 / #32506 regression test).
    • Source switching: jam-onset → our channel; spoofed-real-GPS recovery → operator-confirmed source-restore.
    • MAVLink2 signing on: assert injection refused on signing failure; assert acceptance on valid signing.
  • F-T10 Operator re-loc workflow via QGC STATUSTEXT (unchanged).
  • F-T11 Cold-start TTFF <30 s (AC-NEW-1) (extended — M-24): include LiteSAM as the cold-start re-loc path.
  • F-T12 Spoofing-promotion <3 s (AC-NEW-2) (unchanged).
  • F-T13 Object localization with airframe-attitude fusion (unchanged).
  • F-T14 (REVISED — M-27) Per-sector DEM classification + Orthority per-frame latency: load SRTM-30 m for the operational area; assert sector classes (flat, moderate, rugged) match ground-truth DEM amplitudes; measure Orthority per-frame ortho latency on Orin Nano Super @ 25 W; assert ≤ 50 ms / frame budget. If exceeded: switch to cv2.warpPerspective + bilinear DEM fall-back.
  • F-T15 VPR retrieval-unit bench (unchanged from draft02 — M-16/17/18).
  • F-T16 Synthetic cloud-occlusion injection (unchanged from draft02).
  • F-T17 Mission replay assertion (unchanged from draft02 — M-17).
  • F-T18 (new — M-26) Companion-side calibrator regression: replay a recorded flight; assert the calibrator's empirical residuals lie within the configured Mahalanobis gate; assert no state-propagation logic is invoked; assert ArduPilot EKF3 receives well-calibrated covariances (post-flight comparison of h_acc reported vs measured residual).
  • F-T19 (new — Q6) If Q6.A is chosen: ROS 2 topic-rate sanity test — assert all ROS 2 topics meet expected publish rates under simulated load.
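The Mahalanobis outlier gate that F-T18 exercises can be sketched as below. This is a minimal illustration only: it assumes a 2-D horizontal residual, a 2×2 covariance, and a 99 % gate probability. The function names (`gate_threshold_2dof`, `mahalanobis_sq_2d`, `passes_gate`) are hypothetical and not taken from the draft. For two degrees of freedom the chi-square quantile has a closed form, so no statistics library is needed.

```python
import math


def gate_threshold_2dof(prob: float) -> float:
    """Chi-square quantile for 2 degrees of freedom (closed form: -2 ln(1 - p))."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must be in (0, 1)")
    return -2.0 * math.log(1.0 - prob)


def mahalanobis_sq_2d(residual, cov) -> float:
    """Squared Mahalanobis distance of a 2-D residual under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0:
        raise ValueError("covariance must be positive definite")
    rx, ry = residual
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def passes_gate(residual, cov, prob: float = 0.99) -> bool:
    """True if the fix is statistically consistent with its claimed covariance."""
    return mahalanobis_sq_2d(residual, cov) <= gate_threshold_2dof(prob)


if __name__ == "__main__":
    cov = [[25.0, 0.0], [0.0, 25.0]]          # hypothetical 5 m 1-sigma per axis
    print(passes_gate((3.0, 4.0), cov))       # small residual
    print(passes_gate((30.0, 40.0), cov))     # gross outlier
```

The gate holds no state and does no propagation, which matches the F-T18 assertion that no state-propagation logic is invoked on the companion.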

Non-Functional

  • NF-T1 Latency p95 <400 ms on Orin Nano Super 25 W (AC-4.1) (unchanged).
  • NF-T2 Memory <8 GB shared (AC-4.2) (extended — Q6): ROS 2 + Isaac ROS deployment image must fit; reserve ≥1 GB for matcher + VPR engines.
  • NF-T3 Thermal: 8 h sustained 25 W (AC-NEW-5) (unchanged).
  • NF-T4 False-position safety budget (AC-NEW-4) (extended — M-26): Monte Carlo with synthetic over-confidence injection; verify Component 5's outlier gate rejects bad fixes BEFORE they reach ArduPilot EKF3 (companion-side gate; FC EKF3 gate is a second line of defence).
  • NF-T4b AC-NEW-7 cache-poisoning safety budget (unchanged — M-9).
  • NF-T5 Storage: 64 GB FDR cap with rollover (unchanged).
  • NF-T6 Imagery freshness gate (AC-NEW-6) (unchanged).
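A rough sketch of the NF-T4 Monte Carlo idea follows. It assumes isotropic 2-D Gaussian errors, a reference position to form the residual against, and a 99 % chi-square gate. The `overconfidence` factor (true sigma divided by reported sigma) and the function name `acceptance_rate` are hypothetical choices for this illustration, not values from the draft.

```python
import math
import random


def acceptance_rate(overconfidence: float,
                    sigma_reported: float = 10.0,
                    n: int = 20000,
                    prob: float = 0.99,
                    seed: int = 0) -> float:
    """Fraction of simulated fixes that pass a 2-DoF Mahalanobis gate.

    The true error is drawn with sigma = overconfidence * sigma_reported,
    while the gate is evaluated with the (too small) reported sigma.
    """
    rng = random.Random(seed)
    threshold = -2.0 * math.log(1.0 - prob)   # chi-square quantile, 2 DoF
    sigma_true = overconfidence * sigma_reported
    accepted = 0
    for _ in range(n):
        ex = rng.gauss(0.0, sigma_true)
        ey = rng.gauss(0.0, sigma_true)
        d2 = (ex * ex + ey * ey) / (sigma_reported ** 2)
        if d2 <= threshold:
            accepted += 1
    return accepted / n


if __name__ == "__main__":
    for k in (1.0, 2.0, 5.0):
        print(f"overconfidence x{k}: accepted {acceptance_rate(k):.3f}")
```

Under these assumptions a well-calibrated source passes about 99 % of the time, and the acceptance rate drops as the injected over-confidence grows. The gate does not reject every over-confident fix, only those whose residual is inconsistent with the claimed covariance, which is one reason to keep the FC EKF3 gate as a second line of defence.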

Security

  • S-T1–S-T5 (unchanged from draft02).

Field

  • FT-1–FT-3 (unchanged from draft02).

Key Risks & Open Items (carried into Plan step)

| ID | Risk | Severity | Mitigation |
|---|---|---|---|
| R1 | Imagery licensing lead time (Service-side) | Med | Suite Service procurement |
| R2 | Latency budget on Orin Nano Super at 1024×768 | Med | Empirical bench-off in week 1 of impl |
| R3 | Cross-view accuracy at 1 km AGL with Ukrainian seasonal change | Med | 50 %@20 m hard floor; bench-off includes SALAD/BoQ/GIM-LG/2chADCNN/LiteSAM-as-oracle |
| R4 | MAVSDK + pymavlink coexistence | Resolved (M-6) | |
| R5 | Thermal at 25 W for 8 h | Med | NF-T3 |
| R6 | AC-7.1 in turning flight | Low | v1.1 |
| R7 | Public dataset gap (V&V) | Med | Bench-off + first internal fixed-wing flight before AC-1.3 lock |
| R8 | (REFRAMED — M-22, M-23) cuVSLAM 1 km AGL fixed-wing performance is empirically unproven | Med | F-T1b on AerialVL fixed-wing trajectories; FT-3 first internal fixed-wing flight; documented fall-back to DPV-SLAM / VINS-Fusion |
| R9 | Cross-flight cache poisoning | High (safety) | Service-tile immutability + 2-flight voting + σ_xy hard gate + AC-NEW-7 |
| R10 | Companion↔FC link is flight-critical attack surface | High (security) | MAVLink2 signing mandatory + native routing |
| R11 | ArduPilot ExtNav source-switching gotchas | Med | F-T9 SITL matrix; pin ArduPilot to PR #30080 version |
| R12 | Eastern-Ukraine relief amplitude breaks flat-Earth assumption | Med | Per-sector DEM lookup + runtime self-classifier |
| R13 | (new — M-27) Orthority per-frame latency on Orin Nano Super may exceed budget | Low–Med | F-T14 measurement; fall-back to cv2.warpPerspective + bilinear DEM (~5–20 ms estimated) |
| R14 | (new — M-26, M-30) Dropping companion-side EKF may surface FC-side covariance-handling issues | Low–Med | F-T18 calibrator regression + F-T9 SITL Option A; if EKF3 mishandles raw inputs, re-introduce vanilla ESKF in v1.x |
| R15 | (M-29) Orchestrator choice (Q6 → A locked: ROS 2 Humble + Isaac ROS 3.2) | Resolved | |
| R16 | (M-35) MAVLink-rated IMU may be insufficient for cuVSLAM sync sensitivity | Low–Med | F-T1c IMU-sync-jitter measurement; v1.1 hardware revision adds dedicated companion IMU if F-T1c fails (Q7 → A locked: path (a) for v1) |

Proposed AC additions

AC-NEW-7 — Cache-poisoning safety budget (unchanged from draft02 — M-9).

AC-NEW-8 — VO drift bound on fixed-wing 1 km AGL (new — M-22, M-23, R8 reframed). cuVSLAM drift between satellite anchors must stay ≤ 50 m at the 95th percentile in mono+IMU mode and ≤ 100 m in mono-only mode, measured on AerialVL fixed-wing trajectories. Validated by F-T1b.
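A check of the AC-NEW-8 bound might look like the sketch below. It assumes drift is logged as one value in metres per inter-anchor segment and uses a nearest-rank percentile. Both are choices made for this illustration, not requirements stated in the draft, and the helper names are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct % of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


def drift_within_bound(drifts_m, bound_m: float, pct: float = 95.0) -> bool:
    """True if the pct-th percentile of per-segment drift is within the bound."""
    return percentile_nearest_rank(drifts_m, pct) <= bound_m


if __name__ == "__main__":
    # Hypothetical per-segment drifts (metres) between satellite anchors.
    mono_imu = [10.0] * 19 + [200.0]
    print(percentile_nearest_rank(mono_imu, 95.0))
    print(drift_within_bound(mono_imu, bound_m=50.0))   # mono+IMU bound
```

Because the bound is a 95th-percentile statement, a single bad segment in twenty does not fail the criterion on its own. Whether that is acceptable for safety is a separate question covered by NF-T4.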

AC-NEW-9 — Companion-side covariance calibration accuracy (new — M-26). Empirical residuals of GPS_INPUT pose, computed against ground truth on F-T1b trajectories, must lie within the reported h_acc/v_acc covariance with probability ≥ 95 %. (Calibration must not under- or over-claim.) Validated by F-T18.
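The AC-NEW-9 coverage criterion can be illustrated as follows. The sketch assumes the reported h_acc is read as a bound that the horizontal error should stay inside at least 95 % of the time. That reading is an assumption, since the draft does not fix how h_acc maps to a confidence level. `calibration_summary` is a hypothetical helper name.

```python
import math


def calibration_summary(errors_m, reported_acc_m):
    """Return (coverage, p95_ratio) for measured errors against reported accuracy.

    coverage  : fraction of samples whose |error| is within the reported accuracy.
    p95_ratio : 95th percentile (nearest rank) of |error| / reported accuracy;
                values far below 1 suggest the reported accuracy is pessimistic.
    """
    if not errors_m or len(errors_m) != len(reported_acc_m):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a <= 0.0 for a in reported_acc_m):
        raise ValueError("reported accuracy must be positive")
    ratios = sorted(abs(e) / a for e, a in zip(errors_m, reported_acc_m))
    coverage = sum(1 for r in ratios if r <= 1.0) / len(ratios)
    rank = math.ceil(95 * len(ratios) / 100)
    return coverage, ratios[rank - 1]


if __name__ == "__main__":
    # Hypothetical replay: 20 fixes, h_acc reported as 10 m for each.
    errors = [1.0] * 19 + [30.0]
    h_acc = [10.0] * 20
    coverage, p95_ratio = calibration_summary(errors, h_acc)
    print(coverage >= 0.95, p95_ratio)
```

Coverage below 0.95 indicates over-claiming (accuracy reported too tight). A `p95_ratio` well below 1 indicates under-claiming. The draft asks for neither, but it gives no numeric limit for the second case, so none is asserted here.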


Open Research (deferred to dedicated research passes before Plan)

| Topic | Why now | Output | Owner |
|---|---|---|---|
| Cross-view matcher bench-off (REVISED scope — M-24, M-25) | Inline + re-loc + offline-ceiling tracks are now distinct | Selected inline matcher; selected re-loc matcher; ceiling reference numbers; distillation candidate teacher (LiteSAM) | Research skill, follow-up Mode A pass |
| Input-resolution sweep | Same as draft02 | Resolution per matcher candidate; sensitivity curves | Same pass |
| VPR backbone bench-off | Same as draft02 | Selected VPR backbone | Same pass |
| VO bench-off (new — M-22, M-23) | cuVSLAM is the lead but unproven on 1 km AGL fixed-wing | cuVSLAM mono / cuVSLAM mono+IMU / DPV-SLAM / VINS-Fusion / OpenVINS comparison on AerialVL + first internal fixed-wing flight | Research / impl. team |
| Tile-generator quality scoring | Same as draft02 | Calibrated thresholds for σ_xy / sharpness / glare | Implementation phase |
| Orthority per-frame latency on Orin Nano Super (new — M-27) | Confirms or rejects M-27 library choice | F-T14 measurement; if fail → cv2.warpPerspective + bilinear DEM fall-back path locked | Implementation phase |
| Internal Mavic-flight V&V dataset | Same as draft02 | Curated, ground-truth-labelled clips | Operations / data team |
| First internal fixed-wing flight | Same as draft02 | Recorded sortie with synced IMU + GPS truth + nav-cam stream | Field-test plan |
| Q6 — Orchestrator decision | Locked 2026-04-26: Q6 → A (ROS 2 Humble + Isaac ROS 3.2). | | |
| Q7 — Companion IMU strategy | Locked 2026-04-26: Q7 → A (MAVLink RAW_IMU from FC for v1; dedicated IMU only if F-T1c fails). | | |
| Encryption-at-rest key management | Same as draft02 | Threat-modelled design | Phase 4 security analysis |

References

All citations are by ID from _docs/00_research/01_source_registry.md. Mode B round 2 sources: S58–S77 (round 1 sources S40–S57 carried over).

  • VO: S60 (cuVSLAM), S61 (DPVO-QAT++), S62 (MASt3R-SLAM), S64 (Isaac ROS UAV reference), S71 (VINS-Fusion / OpenVINS Jetson reports), S72 (high-altitude VIO), S73 (DPV-SLAM).
  • Matcher: S58 (LiteSAM), S63 (RoMa v2), S74 (OrthoLoC + AdHoP), S75 (AerialExtreMatch open-review).
  • Fusion: S65 (ArduPilot ExtNav double-fusion bug), S66 (Z-axis snap bug), S67 (EKF sources spec), S68 (PX4 EKF2 ESKF PR), S69 (Sola ESKF tutorial), S70 (T-ESKF + Hybrid ESKF/UKF 2025).
  • Ortho: S59 (Orthority).
  • Sweep: S76 (Orin Nano Super FP16/INT8 reference points), S77 (ROS 2 / Isaac ROS practical guide).

  • Mode A draft: _docs/01_solution/solution_draft01.md (superseded by draft02 → draft03).
  • Mode B round 1 draft: _docs/01_solution/solution_draft02.md (superseded by draft03).
  • Mode B round 2 decomposition: _docs/00_research/03_mode_b_decomposition_round2.md.
  • Mode B round 2 reasoning chain: _docs/00_research/04_reasoning_chain_mode_b_round2.md.
  • Mode B round 2 validation log: _docs/00_research/05_validation_log_mode_b_round2.md.
  • AC & Restrictions assessment (Phase 1): _docs/00_research/00_ac_assessment.md (unchanged).
  • Source registry: _docs/00_research/01_source_registry.md (S01–S77).
  • Fact cards: _docs/00_research/02_fact_cards.md (Phase 1 + Mode B round 1 M-1..M-21 + Mode B round 2 M-22..M-35).
  • Tech stack consolidation: _docs/01_solution/tech_stack.md (deferred — Phase 3 optional).
  • Security analysis: _docs/01_solution/security_analysis.md (deferred — Phase 4 optional, promoted to recommended-before-Plan-lock because of M-6/M-7).