- Introduced a new document detailing the current state of the autodev process, including steps, status, and findings.

- Revised acceptance criteria in the acceptance_criteria.md file to clarify metrics and expectations, including updates to GPS accuracy and image processing quality.
- Enhanced restrictions documentation to reflect operational parameters and constraints for UAV flights, including camera specifications and satellite imagery usage.
- Added new research documents for acceptance criteria assessment and question decomposition to support ongoing project evaluation and decision-making.
Oleksandr Bezdieniezhnykh
2026-04-26 14:28:10 +03:00
parent 2178737b36
commit 9eba1689b3
17 changed files with 2965 additions and 69 deletions
+140 -35
@@ -1,50 +1,155 @@
-# Position Accuracy
-- The system should determine GPS coordinates of frame centers for 80% of photos within 50m error compared to real GPS
-- The system should determine GPS coordinates of frame centers for 60% of photos within 20m error compared to real GPS
-- Maximum cumulative VO drift between satellite correction anchors should be less than 100 meters
-- System should report a confidence score per position estimate (high = satellite-anchored, low = VO-extrapolated with drift)
+# Acceptance Criteria
+> **Last revised**: 2026-04-26 (post Mode B Solution Assessment + user-driven addendum on VPR granularity & change-robustness + user lock-in of Mode B open items Q1–Q5).
+> Changes vs. previous version (2026-04-25): AC-1.2 split into hard-floor + stretch; AC-1.4 made quantitative; AC-2.2 split per pipeline stage; AC-3.4 dual-trigger; AC-4.3 autopilot-pinned; AC-5.2 N pinned; AC-7.1 scoped to level flight; AC-8.2 freshness by sector; six new AC added (AC-NEW-1 … AC-NEW-6).
+> Changes 2026-04-26: AC-4.3 extended to dual-channel hybrid (GPS_INPUT primary + ODOMETRY auxiliary); AC-8.6 added (VPR retrieval-unit + change-robustness); AC-NEW-7 added with confirmed numeric thresholds (cache-poisoning safety budget).
-# Image Processing Quality
-- Image Registration Rate > 95% for normal flight segments. The system can find enough matching features to confidently calculate the camera's 6-DoF pose and stitch that image into the trajectory
-- Mean Reprojection Error (MRE) < 1.0 pixels
+## Position Accuracy
+- **AC-1.1** — The system shall determine GPS coordinates of frame centers within **50 m** of true GPS for **≥80%** of photos in normal flight segments.
+- **AC-1.2** — The system shall determine GPS coordinates of frame centers within **20 m** of true GPS for **≥50%** of photos in normal flight segments.
+- **AC-1.3** — Maximum cumulative VO drift between two consecutive satellite-anchored fixes shall be **<100 m** (VO-only fallback) or **<50 m** (when IMU is fused). Drift is measured as ‖VO-extrapolated centre − next anchor centre‖ at the moment of the anchor fix.
+- **AC-1.4** — The system shall report a **quantitative confidence score** per position estimate, comprising:
+  - the 95% covariance ellipse semi-major axis in meters, AND
+  - a categorical label `{satellite_anchored, vo_extrapolated, dead_reckoned}`.
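As a minimal sketch of the AC-1.4 confidence report, the snippet below derives the 95% ellipse semi-major axis from a 2×2 horizontal position covariance. The function names and the dictionary layout are hypothetical; the factor 5.991 is the 95% quantile of a chi-square distribution with 2 degrees of freedom, which assumes the position error is roughly Gaussian.

```python
import math

# 95% quantile of chi-square with 2 degrees of freedom (Gaussian assumption).
CHI2_95_2DOF = 5.991

# Labels taken from AC-1.4.
ALLOWED_LABELS = {"satellite_anchored", "vo_extrapolated", "dead_reckoned"}


def semi_major_axis_95(sxx: float, sxy: float, syy: float) -> float:
    """Semi-major axis (m) of the 95% ellipse for covariance [[sxx, sxy], [sxy, syy]] in m^2."""
    mean = 0.5 * (sxx + syy)
    half_diff = 0.5 * (sxx - syy)
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    lam_max = mean + math.hypot(half_diff, sxy)
    return math.sqrt(CHI2_95_2DOF * lam_max)


def confidence_report(sxx: float, sxy: float, syy: float, label: str) -> dict:
    """Hypothetical per-estimate report combining both AC-1.4 components."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown source label: {label}")
    return {"semi_major_95_m": semi_major_axis_95(sxx, sxy, syy), "label": label}


if __name__ == "__main__":
    print(confidence_report(25.0, 0.0, 9.0, "satellite_anchored"))
```

With an uncorrelated covariance of 25 m² and 9 m², the semi-major axis follows the larger variance: sqrt(5.991 × 25) ≈ 12.24 m.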
-# Resilience & Edge Cases
-- The system should correctly continue work even in the presence of up to 350m outlier between 2 consecutive photos (due to tilt of the plane)
-- System should correctly continue work during sharp turns, where the next photo doesn't overlap at all or overlaps less than 5%. The next photo should be within 200m drift and at an angle of less than 70 degrees. Sharp-turn frames are expected to fail VO and should be handled by satellite-based re-localization
-- System should operate when UAV makes a sharp turn and next photos have no common points with previous route. It should figure out the location of the new route segment and connect it to the previous route. There could be more than 2 such disconnected segments, so this strategy must be core to the system
-- In case the system cannot determine the position of 3 consecutive frames by any means, it should send a re-localization request to the ground station operator via telemetry link. While waiting for operator input, the system continues attempting VO/IMU dead reckoning and the flight controller uses last known position + IMU extrapolation
+## Image Processing Quality
+- **AC-2.1** — Image registration rate **>95%** for normal flight segments (defined as: nadir flight ±10° bank / pitch, ≥40% overlap with prior frame, daytime, season-matched satellite tile).
+- **AC-2.2** — Mean Reprojection Error (MRE):
+  - **<1.0 px** for VO frame-to-frame homography on overlapping aerial pairs;
+  - **<2.5 px** for satellite-anchored cross-domain (UAV photo ↔ ortho satellite tile) registration.
-# Real-Time Onboard Performance
-- Less than 400ms end-to-end per frame: from camera capture to GPS coordinate output to the flight controller (camera shoots at ~3fps)
-- Memory usage should stay below 8GB shared memory (Jetson Orin Nano Super — CPU and GPU share the same 8GB LPDDR5 pool)
-- The system must output calculated GPS coordinates directly to the flight controller via MAVLink GPS_INPUT messages (using MAVSDK)
-- Position estimates are streamed to the flight controller frame-by-frame; the system does not batch or delay output
-- The system may refine previously calculated positions and send corrections to the flight controller as updated estimates
+## Resilience & Edge Cases
+- **AC-3.1** — The system shall correctly continue work in the presence of up to **350 m** outliers between two consecutive photos (caused by airframe tilt up to ±20°).
+- **AC-3.2** — The system shall correctly continue work during sharp turns where the next photo overlaps **<5%** with the previous, drifts **<200 m**, and changes heading **<70°**. Sharp-turn frames are expected to fail VO and shall be handled by satellite-based re-localization (place recognition over the satellite tile cache).
+- **AC-3.3** — The system shall handle **≥3 disconnected segments** per flight, connecting each new segment to the previous trajectory via global descriptor retrieval + RANSAC pose-graph relocalization. This is a core capability, not a degraded mode.
+- **AC-3.4** — When the system cannot determine position for **≥3 consecutive frames AND ≥2 s**, it shall send a re-localization request to the ground station via telemetry. While waiting, it continues VO/IMU dead reckoning and the flight controller uses last known position + IMU extrapolation.
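The AC-3.4 dual trigger (≥3 consecutive frames AND ≥2 s) could be tracked roughly as sketched below. The class and field names are hypothetical, and the sketch assumes the outage clock starts at the timestamp of the first failed frame, which the criterion does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional

MIN_FAILED_FRAMES = 3   # from AC-3.4
MIN_OUTAGE_S = 2.0      # from AC-3.4


@dataclass
class RelocTrigger:
    """Tracks consecutive position failures and the outage duration."""
    failed_frames: int = 0
    outage_start_s: Optional[float] = None

    def update(self, timestamp_s: float, position_ok: bool) -> bool:
        """Return True when a re-localization request should be sent."""
        if position_ok:
            # Any successful estimate resets both conditions.
            self.failed_frames = 0
            self.outage_start_s = None
            return False
        if self.outage_start_s is None:
            # Assumption: the outage starts at the first failed frame.
            self.outage_start_s = timestamp_s
        self.failed_frames += 1
        outage_s = timestamp_s - self.outage_start_s
        return self.failed_frames >= MIN_FAILED_FRAMES and outage_s >= MIN_OUTAGE_S


if __name__ == "__main__":
    trigger = RelocTrigger()
    for t in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(t, trigger.update(t, position_ok=False))
```

Requiring both conditions means a short burst of three failed frames at ~3 fps (under 1 s) does not page the operator, while a sustained outage does.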
-# Startup & Failsafe
-- The system initializes using the last known valid GPS position from the flight controller before GPS denial begins
-- If the system completely fails to produce any position estimate for more than N seconds (TBD), the flight controller should fall back to IMU-only dead reckoning and the system should log the failure
-- On companion computer reboot mid-flight, the system should attempt to re-initialize from the flight controller's current IMU-extrapolated position
+## Real-Time Onboard Performance
+- **AC-4.1** — End-to-end latency from camera capture to GPS coordinate output to the flight controller shall be **<400 ms p95**. Up to ~10% of frames may be dropped under sustained load (skip-allowed).
+- **AC-4.2** — Memory usage shall remain below **8 GB** shared on Jetson Orin Nano Super (CPU and GPU share the same 8 GB LPDDR5 pool).
+- **AC-4.3** — The system shall output its position estimate to the flight controller via **two parallel MAVLink channels**, both emitted by **pymavlink** (general telemetry uses MAVSDK):
+  - **Primary**: `GPS_INPUT` targeting **ArduPilot** with `GPS1_TYPE=14` (MAVLink GPS substitute). Matches the "replacement for the GPS module" framing of the build.
+  - **Auxiliary** (when the EKF emits a fix with full 6-DoF covariance and quality > VISO_QUAL_MIN): `ODOMETRY` so EKF3 can fuse the richer covariance + native yaw error + quality field. ArduPilot's own dev docs designate ODOMETRY as the preferred external-nav channel for non-GPS substitution; we hybridise to keep AC-4.3's GPS-substitute framing while not throwing away the covariance fidelity that AC-NEW-4 depends on.
+  - FC source priorities are configured so GPS_INPUT remains the failover path if ODOMETRY trips a parameter gate.
+  - **v1 scope clause (added 2026-04-26 — see solution_draft03 finding M-30)**: v1 ships **GPS_INPUT only**; the ODOMETRY auxiliary channel is intentionally **disabled** in v1 because feeding both `GPS_INPUT` and `ODOMETRY` for overlapping axes triggers ArduPilot EKF3 double-fusion bugs (issues #30076 / #32506). `EK3_SRC1_*=GPS+Compass`; ODOMETRY emission re-enables in v1.1 once F-T9 SITL confirms PR #30080-class clean source-switching. Tests therefore assert v1 emits GPS_INPUT only and that ODOMETRY is *intentionally absent* on the wire.
+  - (Decision rationale: MAVSDK has no native GPS_INPUT support — see `_docs/00_research/00_ac_assessment.md` Q-1; ODOMETRY hybrid rationale — see Mode B finding M-1 in `_docs/00_research/02_fact_cards.md`; v1 single-channel rationale — see Mode B round-2 finding M-30 in `_docs/00_research/02_fact_cards.md` / solution_draft03.)
+- **AC-4.4** — Position estimates are streamed to the flight controller frame-by-frame; the system shall not batch or delay output.
+- **AC-4.5** — The system may refine previously calculated positions and send corrections to the flight controller as updated estimates.
-# Ground Station & Telemetry
-- Position estimates and confidence scores should be streamed to the ground station via telemetry link for operator situational awareness
-- The ground station can send commands to the onboard system (e.g., operator-assisted re-localization hint with approximate coordinates)
-- Output coordinates in WGS84 format
+## Startup & Failsafe
+- **AC-5.1** — The system shall initialise using the last known valid GPS position from the flight controller's EKF, plus IMU-extrapolated position at the moment of GPS denial.
+- **AC-5.2** — If the system fails to produce any position estimate for **>3 s**, the flight controller shall fall back to IMU-only dead reckoning and the system shall log the failure.
+- **AC-5.3** — On companion computer reboot mid-flight, the system shall attempt to re-initialise from the flight controller's current IMU-extrapolated position. See AC-NEW-1 for the cold-start time-to-first-fix budget.
-# Object Localization
-- Other onboard AI systems can request GPS coordinates of objects detected by the AI camera
-- The GPS-Denied system calculates object coordinates trigonometrically using: current UAV GPS position (from GPS-Denied), known AI camera angle, zoom, and current flight altitude. Flat terrain is assumed
-- Accuracy is consistent with the frame-center position accuracy of the GPS-Denied system
+## Ground Station & Telemetry
+- **AC-6.1** — Position estimates and confidence scores shall be streamed to **QGroundControl** via the MAVLink telemetry link. High-rate (per-frame) content stays on the local link for forensics; the GCS link is downsampled to **1–2 Hz** for situational awareness.
+- **AC-6.2** — The ground station can send commands to the onboard system (e.g., operator-assisted re-localization hint with approximate coordinates) via STATUSTEXT, NAMED_VALUE_FLOAT, or a custom MAVLink dialect.
+- **AC-6.3** — Output coordinates are in **WGS84** format (matches GPS_INPUT spec).
-# Satellite Reference Imagery
-- Satellite reference imagery resolution must be at least 0.5 m/pixel, ideally 0.3 m/pixel
-- Satellite imagery for the operational area should be less than 2 years old where possible
-- Satellite imagery must be pre-processed and loaded onto the companion computer before flight. Offline preprocessing time is not time-critical (can take minutes/hours)
+## Object Localization (AI Camera)
+- **AC-7.1** — Other onboard AI systems may request GPS coordinates of objects detected by the AI camera. Localization accuracy is **consistent with the frame-center accuracy of the GPS-Denied system in level flight (bank/pitch <5°)**. In maneuvering flight, ground-projection error is bounded by `altitude × |sin(unknown_bank_or_pitch)|` and the system shall publish that bound alongside the estimate.
+- **AC-7.2** — The system computes object coordinates trigonometrically using: current UAV GPS position (from GPS-Denied), known AI-camera gimbal angle, zoom, and current flight altitude. Flat-terrain assumption applies.
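The AC-7.1 bound `altitude × |sin(unknown_bank_or_pitch)|` can be evaluated as in the sketch below. The function name and the 1000 m altitude are illustrative values only and do not come from the criteria.

```python
import math


def projection_error_bound_m(altitude_m: float, unknown_tilt_deg: float) -> float:
    """Ground-projection error bound from AC-7.1: altitude x |sin(unknown bank or pitch)|."""
    return altitude_m * abs(math.sin(math.radians(unknown_tilt_deg)))


if __name__ == "__main__":
    # Illustrative altitude; 5 degrees is the level-flight limit named in AC-7.1.
    print(round(projection_error_bound_m(1000.0, 5.0), 2))
```

At the 5° level-flight limit and an assumed 1000 m altitude, the bound is about 87 m, which shows why AC-7.1 asks for the bound to be published alongside the estimate once the airframe is maneuvering.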
## Satellite Reference Imagery
- **AC-8.1** — Satellite reference imagery is provided by the **Azaion Suite Satellite Service** (a separate component of the Suite). The runtime onboard system consumes this service through an offline tile cache interface; it does **not** call commercial providers (Maxar, Airbus, Planet, etc.) directly. The Satellite Service is responsible for upstream sourcing and is out of scope for this build. Required resolution at the cache interface: **at least 0.5 m/pixel, ideally 0.3 m/pixel**.
- **AC-8.2** — Satellite tiles consumed at runtime shall be:
- **<6 months old** for active-conflict sectors;
- **<12 months old** for stable rear sectors.
The system shall reject, or downgrade confidence on, tiles older than these thresholds (see AC-NEW-6).
- **AC-8.3** — Satellite imagery for the operational area shall be **pre-loaded and pre-processed** onto the companion computer before flight. Offline preprocessing time is not time-critical (minutes/hours). Pre-extracted tile descriptors (e.g., SuperPoint keypoints/descriptors and DINOv2-VLAD global descriptors) are part of the cache.
- **AC-8.4** — **Mid-flight tile generation & write-back**: during flight, the system shall continuously orthorectify navigation-camera frames into tiles aligned with the basemap projection and store them in the local cache, **deduplicated** so each ground sector is stored at most once (latest / highest-quality tile wins). On landing, the companion computer shall upload newly generated tiles back to the Azaion Suite Satellite Service so that the next mission cache contains imagery refreshed by the previous flight.
- **AC-8.5** — **Storage policy**: the system shall **not** retain raw navigation-camera frames or AI-camera frames as part of normal operation. Tiles are the only persistent imagery artifact. Forensic exception: a low-rate (≤0.1 Hz) thumbnail log of frames that **failed** tile generation may be retained for debugging within the FDR budget (AC-NEW-3).
- **AC-8.6** — **VPR retrieval unit + change-robustness**:
- The Visual Place Recognition (Component 2) FAISS index shall be built over **ground-footprint-sized "VPR chunks"** (~600–800 m at the deployment altitude band, with **40–50 % overlap** between adjacent chunks), **decoupled from the slippy-XYZ storage tile** (z=20). Any UAV frame footprint shall fall fully inside ≥1 chunk regardless of position.
- The index shall be **multi-scale**: in addition to fine-scale chunks (derived from z=20 storage), a coarser-scale chunk descriptor set (z=17 or z=18 effective scale) shall be maintained for change-robust retrieval in **active-conflict sectors** where building destruction or major scene change is expected.
- VPR top-K shall be **dynamically sized** by sector classification (AC-NEW-6) and EKF position covariance: K=5 in stable sectors with σ_xy ≤ 20 m; K=20 in active-conflict sectors; K=50 on expanding-window fallback.
- VPR shall be **invoked conditionally**, not on every frame: in steady state (last anchor age < 2 s, σ_xy < 20 m, VO healthy), the system uses a geometric prior from IMU+VO predicted position to rank candidate chunks by distance alone. VPR's DINOv2 forward is invoked on **re-loc triggers** (cold start AC-NEW-1, sharp turn AC-3.2, disconnected segment AC-3.3, σ_xy > 50 m, or VO failure for ≥2 frames).
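The top-K sizing and conditional-invocation rules in AC-8.6 might be encoded as in the following sketch. All names are hypothetical. The criteria do not say which K applies in a stable sector with σ_xy above 20 m, so the sketch assumes K=20 there; that choice is an assumption, not part of the spec.

```python
from dataclasses import dataclass


@dataclass
class NavState:
    """Hypothetical snapshot of the estimator state used for gating."""
    sigma_xy_m: float
    vo_failed_frames: int = 0
    cold_start: bool = False            # AC-NEW-1
    sharp_turn: bool = False            # AC-3.2
    disconnected_segment: bool = False  # AC-3.3


def should_invoke_vpr(state: NavState) -> bool:
    """True on any re-loc trigger listed in AC-8.6."""
    return (
        state.cold_start
        or state.sharp_turn
        or state.disconnected_segment
        or state.sigma_xy_m > 50.0
        or state.vo_failed_frames >= 2
    )


def vpr_top_k(sector: str, sigma_xy_m: float, expanding_window: bool = False) -> int:
    """Top-K per AC-8.6; sector is 'stable' or 'active_conflict'."""
    if sector not in ("stable", "active_conflict"):
        raise ValueError(f"unknown sector: {sector}")
    if expanding_window:
        return 50
    if sector == "active_conflict":
        return 20
    if sigma_xy_m <= 20.0:
        return 5
    # Assumption: stable sector with sigma_xy > 20 m is not specified; use K=20.
    return 20


if __name__ == "__main__":
    state = NavState(sigma_xy_m=60.0)
    print(should_invoke_vpr(state), vpr_top_k("stable", state.sigma_xy_m))
```

When no trigger fires, the steady-state path described above (ranking chunks by distance from the IMU+VO predicted position) would run instead of the DINOv2 forward pass.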
## New AC (added in Phase 1 assessment, expanded with rationale & validation)
### AC-NEW-1 — Time-to-first-fix on cold start
**Statement.** From companion-computer boot, the system shall emit its first valid `GPS_INPUT` message in **<30 s**, given an IMU-extrapolated initial position handed over from the flight controller's EKF.
**Why it matters.** A mid-flight reboot (brown-out, watchdog reset, OS panic) is a realistic scenario on a fixed-wing UAV running an 8-hour mission. The autopilot continues to fly on IMU dead reckoning during the gap; a 30 s budget keeps that drift under ~500 m at 60 km/h cruise, which the EKF can absorb when our first fix arrives.
**Implementation drivers.** TensorRT engines must be built at install time (not at first run); CUDA / TRT init <5 s; tile-cache mmap warm at start; FAISS index loaded before MAVLink connect; first VPR retrieval + cross-view match must succeed at full resolution within the remaining budget.
**Validation.** Bench: cold-boot the companion 50× with simulated FC-pose input; record time from boot to first valid `GPS_INPUT` MAVLink frame. Pass = 95th percentile <30 s.
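The two numbers behind AC-NEW-1 can be checked with a short sketch: the distance flown during the reboot gap (an upper bound on dead-reckoning drift) and the 95th-percentile pass criterion over the 50 cold boots. The helper names are hypothetical, and the nearest-rank percentile definition is an assumption, since the criterion does not name one.

```python
import math
from typing import Sequence

TTFF_BUDGET_S = 30.0  # from AC-NEW-1


def dead_reckoning_distance_m(speed_kmh: float, gap_s: float) -> float:
    """Distance flown during the gap; the drift cannot exceed this."""
    return speed_kmh / 3.6 * gap_s


def p95_nearest_rank(samples: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method (assumed definition)."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def passes_ttff(boot_times_s: Sequence[float]) -> bool:
    """Pass criterion: 95th percentile of boot-to-first-fix times under 30 s."""
    return p95_nearest_rank(boot_times_s) < TTFF_BUDGET_S


if __name__ == "__main__":
    print(dead_reckoning_distance_m(60.0, 30.0))   # about 500 m at 60 km/h
    print(passes_ttff([12.0] * 48 + [45.0] * 2))
```

Under this percentile definition, a 50-run bench tolerates at most two boots over budget.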
### AC-NEW-2 — Spoofing-promotion latency
**Statement.** When the flight controller signals GPS denial or spoofing (ArduPilot fix-loss / EKF lane-switch event; PX4 `EKF2_GPS_SPOOFED` flag if PX4 ever returns to scope), the GPS-Denied system shall promote its own estimate to the FC's primary GPS source within **<3 s**.
**Why it matters.** Without this gate, the FC may continue to follow a spoofed real-GPS source while our valid estimate sits idle. 3 s is short enough to keep the FC from acting on a malicious heading change but long enough to ride out a single-frame anomaly.
**Implementation drivers.** Subscribe to `GPS_RAW_INT`, `EKF_STATUS_REPORT`, `SYS_STATUS`. Maintain an internal "real-GPS health" rolling average; switch to "primary" mode (raise our `GPS_INPUT` `fix_type` to 3D and assert) when health drops below threshold for ≥1 s. Emit `STATUSTEXT` to QGC on every promotion / demotion.
**Validation.** SITL: simulate spoofing (inject false `GPS_RAW_INT` from a malicious node); measure time from spoof onset to our promotion. Pass = 95th percentile <3 s.
### AC-NEW-3 — Flight Data Recorder
**Statement.** The system shall retain to non-volatile storage, per flight: per-frame position estimates with covariance and source-label, IMU traces from the FC at full rate, all emitted `GPS_INPUT` frames, MAVLink raw stream (tlog), system health (CPU / GPU / temp / throttle), tiles generated mid-flight (AC-8.4), and a low-rate (≤0.1 Hz) thumbnail log of frames that failed tile generation. **Raw nav-cam frames and AI-cam frames are NOT retained** (AC-8.5). Storage cap **64 GB / flight**; recorder rolls over (oldest segment dropped first) after cap.
**Why it matters.** Tiles, telemetry traces, and IMU are the operationally useful artifacts: they reproduce the mission, feed the next mission's cache (AC-8.4), and let post-mission analysis explain any false-position event (AC-NEW-4). Raw frames are large and redundant once tiles exist.
**Implementation drivers.** Per-day directory layout; fixed-size segment files; rollover policy on segment-close, not on every write. NVMe ≥64 GB on top of the persistent satellite-tile cache.
**Validation.** Bench: run an 8-hour synthetic load (3 Hz nav frames replayed from disk), assert the FDR ends ≤64 GB and no payload class is silently dropped without a logged rollover event.
### AC-NEW-4 — False-position safety budget
**Statement.**
- P(reported estimate error > **500 m**) **<0.1 %** per flight.
- P(reported estimate error > **1 km**) **<0.01 %** per flight.
**Why it matters.** A single 1-km-off `GPS_INPUT` frame can hand the FC a heading that flies the UAV outside the geofence in seconds. The covariance carried in `GPS_INPUT` (`h_acc`) is the FC's only defense; this AC bounds the **probability** of our covariance under-reporting reality.
**Implementation drivers.** EKF covariance must be calibrated, not optimistic. Cross-view fixes with low inlier ratio must be **rejected**, not down-weighted to "small but non-zero". Outlier rejection at the EKF stage (Mahalanobis gate) is mandatory.
**Validation.** Monte Carlo over the AerialVL public dataset (S03) and our own recorded Mavic flights, with synthetic IMU injection where applicable; report error CDF; pass = both probabilities below budget across ≥100 simulated flights worth of frames.
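A minimal sketch of the rejection logic named in the implementation drivers: a Mahalanobis gate on the 2-D position innovation plus a hard reject on low inlier ratio. The gate value 9.21 (the 99% chi-square quantile for 2 degrees of freedom) and the 0.3 inlier-ratio floor are illustrative assumptions; AC-NEW-4 does not fix either number.

```python
GATE_CHI2_2DOF = 9.21    # assumed gate: 99% quantile, chi-square, 2 DoF
MIN_INLIER_RATIO = 0.3   # assumed floor; not specified in AC-NEW-4


def mahalanobis_sq_2d(dx: float, dy: float, sxx: float, sxy: float, syy: float) -> float:
    """Squared Mahalanobis distance of innovation (dx, dy) under covariance [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    # Closed-form inverse of the 2x2 covariance applied to the innovation.
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def accept_fix(dx: float, dy: float, sxx: float, sxy: float, syy: float,
               inlier_ratio: float) -> bool:
    """Reject outright (never down-weight) when either check fails."""
    if inlier_ratio < MIN_INLIER_RATIO:
        return False
    return mahalanobis_sq_2d(dx, dy, sxx, sxy, syy) <= GATE_CHI2_2DOF


if __name__ == "__main__":
    print(accept_fix(3.0, 0.0, 1.0, 0.0, 1.0, inlier_ratio=0.8))
    print(accept_fix(4.0, 0.0, 1.0, 0.0, 1.0, inlier_ratio=0.8))
```

The gate only protects the budget if the innovation covariance is calibrated, which is why the drivers call for calibrated rather than optimistic EKF covariance.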
### AC-NEW-5 — Operational environmental envelope
**Statement.** Operating temperature **−20 °C to +50 °C**; vibration / shock per RTCA DO-160G low-altitude UAV-class envelope. The cooling solution shall sustain the **25 W** power mode at the upper temperature bound for the full **8-hour duty cycle** without thermal throttling.
**Why it matters.** Without this, all latency / accuracy ACs are conditional on a benign thermal day. Eastern/southern Ukraine summers easily exceed +35 °C ambient inside a UAV bay; without active cooling, the Jetson throttles to 15 W mode and our 400 ms latency budget collapses.
**Implementation drivers.** Forced-air or active heatsink sized for 25 W continuous at +50 °C ambient bay temperature; thermal sensors logged in FDR (AC-NEW-3); throttle event = automatic `STATUSTEXT` warning to QGC.
**Validation.** Hot-soak chamber test: 25 W workload at +50 °C ambient for 8 h; assert no throttle. Cold-soak: −20 °C cold-start to first fix within AC-NEW-1 budget.
### AC-NEW-6 — Imagery freshness enforcement
**Statement.** The system shall reject (or downgrade confidence on) any satellite tile whose capture date violates AC-8.2 (>6 months old in active-conflict sectors; >12 months old in stable rear sectors). Tiles generated mid-flight (AC-8.4) and not yet uploaded to the Suite Satellite Service are timestamped with the current flight date and treated as fresh.
**Why it matters.** Stale satellite tiles are the dominant cross-view-matching failure mode in active-conflict sectors (cratering, dam destruction, road realignment). A confident match against a stale tile is worse than no match.
**Implementation drivers.** Each tile carries `capture_date` metadata in the cache index. Sector classification (active vs stable) is part of the operational area definition handed in pre-flight. Confidence weight = 1.0 if within freshness budget, linearly decayed to 0.0 over a 30-day grace zone past the budget, hard reject beyond the grace.
**Validation.** Inject tiles with synthetic age into the cache; verify rejection / decay curve matches spec; verify a stale-tile match never produces a `satellite_anchored` source label.
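The freshness weight described in the implementation drivers could look like the sketch below: 1.0 inside the budget, linear decay across the 30-day grace zone, and 0.0 (hard reject) beyond it. The function name and sector keys are hypothetical, and converting 6 and 12 months to 183 and 365 days is an assumption made for illustration.

```python
from datetime import date

# Assumption: month budgets from AC-8.2 approximated in days.
BUDGET_DAYS = {"active_conflict": 183, "stable": 365}
GRACE_DAYS = 30  # grace zone from AC-NEW-6


def freshness_weight(capture_date: date, flight_date: date, sector: str) -> float:
    """Confidence weight for a tile; 0.0 means hard reject."""
    age_days = (flight_date - capture_date).days
    over_budget = age_days - BUDGET_DAYS[sector]
    if over_budget <= 0:
        return 1.0
    if over_budget >= GRACE_DAYS:
        return 0.0
    # Linear decay from 1.0 to 0.0 across the grace zone.
    return 1.0 - over_budget / GRACE_DAYS


if __name__ == "__main__":
    flight = date(2026, 4, 26)
    print(freshness_weight(date(2025, 12, 1), flight, "active_conflict"))
    print(freshness_weight(date(2025, 1, 1), flight, "stable"))
```

A validation harness along the lines of the one described above would sweep synthetic tile ages across the budget and grace boundaries and compare the returned weights with this curve.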
### AC-NEW-7 — Cache-poisoning safety budget
**Statement.** Per flight, across all onboard tiles written by Component 1b (in-flight ortho-tile generator):
- P(onboard tile geo-misaligned > **30 m**) **<1 %**.
- P(onboard tile geo-misaligned > **100 m**) **<0.1 %**.
**Why it matters.** Onboard tiles feed back into the Suite Satellite Service's basemap (AC-8.4). Without this AC, a confidently-bad EKF pose can write a misaligned tile that, after Service ingest, becomes the next flight's satellite anchor — producing cross-flight error compounding that AC-NEW-4 (single-flight false-position budget) does not capture. This AC bounds the **probability** that an onboard tile's claimed geo-alignment is wrong by a margin that would propagate to a downstream flight.
**Implementation drivers.**
- Service-source tiles are immutable within freshness budget (AC-8.2); onboard tiles overwrite only stale or other-onboard tiles.
- The Suite Satellite Service ingest applies a **2-flight voting layer**: an onboard tile gets promoted to "trusted basemap" only after **N≥2 independent flights** confirm consistent geo-alignment within X m of each other. (Active sectors per AC-NEW-6 may use single-flight promotion when σ_xy ≤ 3 m AND OSM-road-overlap ≥ 70 %.)
- The Component-1b parent-pose covariance is a **hard gate** in the local quality score: σ_xy ≤ 5 m for a hard write (`trust_level = candidate`); σ_xy ≤ 3 m for `trust_level = candidate` with full quality; tiles written in the σ_xy ∈ (3, 5] m band are marked `trust_level = soft` in the sidecar.
- Eligibility check (Component 1b) tightens generation gate from σ_xy ≤ 10 m to σ_xy ≤ 5 m.
**Validation.** Multi-flight Monte Carlo replay over AerialVL + Mavic + AerialExtreMatch with **synthetic over-confidence injection** (artificially deflate EKF covariance by 1.5×–3×): assert both probabilities below budget across ≥100 simulated flights worth of frames. Independently, Service-side voting layer is exercised in F-T3 to verify candidate tiles are not promoted to trusted basemap before N-flight confirmation.
@@ -35,7 +35,7 @@ Ground truth GPS coordinates for each frame are in `coordinates.csv`. The system
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 1 | `coordinates.csv` (all 60 frames) | Sequential flight images with ground truth GPS | ≥ 80% of frames have position error < 50m from ground truth | percentage | ≥ 80% of frames within 50m | `expected_results/position_accuracy.csv` |
-| 2 | `coordinates.csv` (all 60 frames) | Sequential flight images with ground truth GPS | ≥ 60% of frames have position error < 20m from ground truth | percentage | ≥ 60% of frames within 20m | `expected_results/position_accuracy.csv` |
+| 2 | `coordinates.csv` (all 60 frames) | Sequential flight images with ground truth GPS | ≥ 50% of frames have position error < 20m from ground truth (per AC-1.2) | percentage | ≥ 50% of frames within 20m | `expected_results/position_accuracy.csv` |
| 3 | `coordinates.csv` (all 60 frames) | Sequential flight images with ground truth GPS | Per-frame position output in WGS84 (lat, lon) | numeric_tolerance | each frame ± 100m max (no single frame exceeds 100m error) | `expected_results/position_accuracy.csv` |
| 4 | `coordinates.csv` (all 60 frames) | Sequential flight images with ground truth GPS | Cumulative VO drift between satellite anchors < 100m | threshold_max | ≤ 100m drift between anchors | N/A |
@@ -72,7 +72,7 @@ Ground truth GPS coordinates for each frame are in `coordinates.csv`. The system
| 16 | Frames 32-43 from coordinates.csv | Trajectory with direction change (turn area) | System continues producing position estimates through the turn | threshold_min | ≥ 1 position output per frame | N/A |
| 17 | Simulated consecutive frames with 350m gap | Outlier between 2 consecutive photos due to tilt | System handles outlier, position estimate not corrupted (error < 100m for next valid frame) | threshold_max | ≤ 100m error after recovery | N/A |
| 18 | Simulated sharp turn (no overlap, <5% overlap, <70° angle, <200m drift) | Sharp turn where VO fails | Satellite re-localization triggers, position recovered within 3 frames after turn | threshold_max | position error ≤ 50m after re-localization | N/A |
-| 19 | Simulated VO loss + satellite match success | Tracking loss → re-localization | cuVSLAM restarts, ESKF position corrected, tracking_state returns to NORMAL | exact | tracking_state == NORMAL after recovery | N/A |
+| 19 | Simulated VO loss + satellite match success | Tracking loss → re-localization | cuVSLAM restarts, Component 5 calibrator emits a satellite-anchored fix, FC EKF3 reconverges, tracking_state returns to NORMAL | exact | tracking_state == NORMAL after recovery | N/A |
### 3-Consecutive-Failure Re-Localization
@@ -80,20 +80,20 @@ Ground truth GPS coordinates for each frame are in `coordinates.csv`. The system
|---|-------|-------------------|-----------------|------------|-----------|---------------|
| 20 | Simulated VO loss + 3 satellite match failures | Cannot determine position by any means | Re-localization request sent: `RELOC_REQ: last_lat=.* last_lon=.* uncertainty=.*m` | regex | message matches pattern | N/A |
| 21 | Re-localization request active | System waiting for operator | GPS_INPUT fix_type=0, system continues IMU prediction, continues satellite matching attempts | exact (fix_type) | fix_type == 0 | N/A |
-| 22 | Operator sends approximate coordinates (lat, lon) | Operator re-localization hint | System uses hint as ESKF measurement (high covariance ~500m), attempts satellite match in new area | threshold_max | position error ≤ 500m initially, ≤ 50m after satellite match | N/A |
+| 22 | Operator sends approximate coordinates (lat, lon) | Operator re-localization hint | System uses hint as a high-covariance (~500m) seed for VPR/cross-view re-localization (consumed by Component 5 calibrator), attempts satellite match in new area | threshold_max | position error ≤ 500m initially, ≤ 50m after satellite match | N/A |
### Startup & Handoff
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
-| 23 | System boot with GLOBAL_POSITION_INT available | Normal startup | System reads initial position, initializes ESKF, starts GPS_INPUT output | exact | GPS_INPUT output begins within 60s of boot | N/A |
+| 23 | System boot with GLOBAL_POSITION_INT available | Normal startup | System reads initial position, initializes Component 5 calibrator state, starts GPS_INPUT output (per AC-NEW-1 cold-start TTFF budget) | threshold_max | GPS_INPUT output begins within 30s of boot (95th percentile) | N/A |
| 24 | System boot + first satellite match | Startup validation | First satellite match validates initial position, position error drops | threshold_max | position error ≤ 50m after first satellite match | N/A |
### Mid-Flight Reboot Recovery
| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
|---|-------|-------------------|-----------------|------------|-----------|---------------|
-| 25 | System process killed mid-flight | Companion computer reboot | System recovers: reads FC position, inits ESKF with high uncertainty, loads TRT engines, starts cuVSLAM, performs satellite match | threshold_max | total recovery time ≤ 70s | N/A |
+| 25 | System process killed mid-flight | Companion computer reboot | System recovers: reads FC IMU-extrapolated position, re-initialises Component 5 calibrator state with high uncertainty, loads TRT engines, starts cuVSLAM, performs satellite match | threshold_max | total recovery time ≤ 30s (matches AC-NEW-1 TTFF) | N/A |
| 26 | Post-reboot first satellite match | Recovery validation | Position accuracy restored after first satellite match | threshold_max | position error ≤ 50m after first satellite match | N/A |
### Object Localization
@@ -126,7 +126,7 @@ Ground truth GPS coordinates for each frame are in `coordinates.csv`. The system
| 35 | 30-minute sustained operation | Memory usage over time | Peak memory < 8GB, no memory leaks (growth < 50MB over 30min) | threshold_max | peak < 8192MB, growth ≤ 50MB | N/A | | 35 | 30-minute sustained operation | Memory usage over time | Peak memory < 8GB, no memory leaks (growth < 50MB over 30min) | threshold_max | peak < 8192MB, growth ≤ 50MB | N/A |
| 36 | 30-minute sustained operation | GPU thermal | SoC junction temperature stays below 80°C (no throttling) | threshold_max | ≤ 80°C | N/A | | 36 | 30-minute sustained operation | GPU thermal | SoC junction temperature stays below 80°C (no throttling) | threshold_max | ≤ 80°C | N/A |
| 37 | cuVSLAM single frame | VO processing time | cuVSLAM inference ≤ 20ms per frame | threshold_max | ≤ 20ms | N/A | | 37 | cuVSLAM single frame | VO processing time | cuVSLAM inference ≤ 20ms per frame | threshold_max | ≤ 20ms | N/A |
| 38 | Satellite matching single frame | Satellite matching time (async) | LiteSAM/XFeat inference ≤ 330ms | threshold_max | ≤ 330ms | N/A | | 38 | Satellite matching single frame | Inline cross-view matcher time | SP+LG (TRT FP16/INT8) inline-matcher inference ≤ 200ms / pair on Orin Nano Super @ 25W. (LiteSAM is re-loc-fallback only, ≤ 2s budget — out of inline path.) | threshold_max | ≤ 200ms inline; ≤ 2000ms re-loc fallback | N/A |
| 39 | TRT engine load | Engine initialization time | All TRT engines loaded within 10s total | threshold_max | ≤ 10s | N/A | | 39 | TRT engine load | Engine initialization time | All TRT engines loaded within 10s total | threshold_max | ≤ 10s | N/A |
### Satellite Tile Management ### Satellite Tile Management
+49 -28
@@ -1,37 +1,58 @@
# UAV & Flight # Restrictions
- Photos are taken by only airplane (fixed-wing) type UAVs > **Last revised**: 2026-04-26 (post Mode B Solution Assessment + user-driven addendum on camera spec & zoom level).
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized
- The flying range is restricted by the eastern and southern parts of Ukraine (to the left of the Dnipro River)
- Altitude is predefined and no more than 1km. The height of the terrain can be neglected
- Flights are done mostly in sunny weather
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
- Number of photos per flight could be up to 3000, usually in the 500-1500 range
# Cameras ## UAV & Flight
- UAV has two cameras: - Photos are taken by airplane (fixed-wing) type UAVs only.
1. **Navigation camera** — fixed, pointing downwards, not autostabilized. Used by GPS-Denied system for position estimation - Photos are taken by the navigation camera pointing downwards and fixed (not gimbal-stabilized).
2. **AI camera** — main camera with configurable angle and zoom, used by onboard AI detection systems - Operational area is the eastern and southern parts of Ukraine (east/left of the Dnipro River).
- Navigation camera resolution: FullHD to 6252*4168. Camera parameters are known: focal length, sensor width, resolution, etc. - Mission profile: 8-hour flights at ~60 km/h cruise. Two route shapes coexist:
- Cameras are connected to the companion computer (interface TBD: USB, CSI, or GigE) - **Sector**: up to **10 × 15 km = 150 km²** of dense coverage.
- Terrain is assumed flat (eastern/southern Ukraine operational area); height differences are negligible - **Transit corridor**: ~**50 km × 1 km = 50 km²** strip in/out of the sector.
- **Total operational area: up to ~400 km²** of pre-cached satellite imagery per mission. Cache is **persistent across flights** (not redownloaded each mission). Storage budget **~10 GB** for the satellite tile cache; see AC-NEW-3 for flight-data-recorder budget.
- Altitude: pre-defined, **≤1 km AGL**. Terrain is assumed flat (operational area is rolling steppe / agricultural land); height differences are negligible.
- Weather: predominantly sunny daytime operations.
- Sharp turns occur but are the exception, not the rule. Two consecutive photos may share <5% overlap during a turn (see AC-3.2).
- **No photo-count cap.** The previously stated "up to 3000 photos per flight" was a legacy operator number from a Mavic-class workflow; it is dropped because (a) it is inconsistent with 8 h × 3 fps, and (b) the system does **not store raw photos at all** (see AC-8.5). Storage is bounded by the tile-cache + FDR caps (~10 GB persistent + 64 GB / flight, AC-NEW-3).
# Satellite Imagery ## Cameras
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions - The UAV carries **two cameras**:
- Satellite imagery for the operational area must be pre-loaded onto the companion computer before flight 1. **Navigation camera** — fixed, downward-pointing, not autostabilized. Consumed by the GPS-Denied system for position estimation.
2. **AI camera** — main mission camera with operator-controllable gimbal angle and zoom. Consumed by onboard AI detection systems.
- **Navigation camera**: **ADTi 20MP 20L V1, APS-C sensor (~23.6 × 15.7 mm), ~5472 × 3648 px (≈20 MP)**. Lens TBD; it is selected during the solution-draft phase to land GSD in the **10–20 cm/px band at 1 km AGL** (drives a frame ground footprint of ~470 m × 314 m to ~980 m × 655 m depending on focal length). Other intrinsics (focal length, exact sensor dimensions, distortion coefficients) are pinned at module-selection time and used by Component-1b orthorectification (pre-flight checkerboard calibration, F-F2).
- **AI camera pose information available to the GPS-Denied system**: gimbal angle and zoom only. The UAV's instantaneous bank/pitch is **not** published from the autopilot to the AI-camera reasoning path. Object-localization accuracy is therefore scoped to level flight (AC-7.1).
- Cameras connect to the companion computer over USB, MIPI-CSI, or GigE (specific interface TBD at solution-draft phase, dependent on chosen module).
# Onboard Hardware ## Satellite Imagery
- Processing is done on a Jetson Orin Nano Super (67 TOPS, 8GB shared LPDDR5, 25W TDP) - **Source: Azaion Suite Satellite Service** (a separate component of the wider Suite). The onboard system is a **consumer** of this service, not a direct customer of commercial providers. Upstream sourcing (Maxar / Airbus / partner agencies / commissioned tasking) is the Satellite Service's concern, not this build's.
- The companion computer runs JetPack (Ubuntu-based) with CUDA/TensorRT available - **Onboard interface to the Service is offline-only**: the companion computer holds a local cache populated **before flight** by syncing from the Service for the operational area (AC-8.3). No satellite imagery is fetched in-flight from the Service.
- Onboard storage for satellite imagery is limited (exact capacity TBD, but must be accounted for in tile preparation) - **Mid-flight tile generation (AC-8.4)**: during the mission the companion computer generates fresh tiles from the navigation camera, orthorectified into the basemap projection, deduplicated against the existing cache, and stored locally. On landing, those new tiles are uploaded back to the Suite Satellite Service for ingestion, so the next mission's cache is refreshed by the previous flight.
- Sustained GPU load may cause thermal throttling; the processing pipeline must stay within thermal envelope - **No raw photo storage** (AC-8.5): the tile is the unit of persistence. Raw nav-camera and AI-camera frames are not retained (except a low-rate failure-thumbnail log for forensics).
- **Resolution at the cache interface**: 0.5 m/pixel minimum, 0.3 m/pixel ideal (AC-8.1). The architecture is provider-agnostic at the cache boundary; whatever the Suite Satellite Service supplies must meet that bar.
- **Storage tile zoom level**: **slippy-XYZ z=20 (~30 cm/px, 512×512)** — pinned because the matcher (Component 3) needs ≤~4× scale ratio between the UAV frame (~12 cm/px GSD at 1 km AGL with the 20 MP APS-C camera) and the reference; z=20 gives a 2.5× ratio (workable), z=18 gives a 10× ratio (matcher accuracy breaks down). Storage budget at z=20 across the 400 km² operational area = ~2.8 GB cache + ~30 MB DEM + ~16 MB VPR chunk index ≈ ~3 GB total — well inside the 10 GB cache budget. **VPR retrieval unit is decoupled from the storage tile** (see AC-8.6 below): VPR chunks are derived from the z=20 tile cache at ground-footprint scale (~600800 m chunks with 4050 % overlap), independent of the storage zoom level.
- **Freshness gates** (AC-8.2 / AC-NEW-6) are enforced at runtime: tiles older than 6 months (active-conflict sectors) or 12 months (stable rear sectors) are rejected or down-confidence-weighted. Tiles generated mid-flight are timestamped with the current flight date and treated as fresh.
- **Free public imagery (Sentinel-2 etc.)** is not on the runtime path. If the Suite Satellite Service ever returns Sentinel-class tiles, the cache rejects them as below the 0.5 m/px floor.
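The freshness gate described above can be sketched as a small pure function. This is only an illustration under stated assumptions: the sector labels, the day counts used to approximate "6 months" and "12 months", and the linear down-weighting rule are all hypothetical; the source only says stale tiles are "rejected or down-confidence-weighted".

```python
from datetime import date

# Per-sector age limits from the freshness-gate bullet (AC-8.2 / AC-NEW-6).
# Assumption: 6 months ~ 183 days, 12 months ~ 365 days; labels are hypothetical.
MAX_AGE_DAYS = {"active_conflict": 183, "stable_rear": 365}


def freshness_weight(captured: date, sector: str, today: date,
                     reject_stale: bool = True) -> float:
    """Return a confidence weight in [0, 1] for one cached tile.

    1.0 means the tile is inside its sector's freshness window.
    Stale tiles are either rejected (0.0) or down-weighted.
    """
    limit = MAX_AGE_DAYS[sector]
    age = (today - captured).days
    if age <= limit:
        return 1.0
    if reject_stale:
        return 0.0
    # Hypothetical down-weighting rule: weight decays as limit / age.
    return limit / age


today = date(2026, 4, 26)
print(freshness_weight(date(2026, 1, 1), "active_conflict", today))
print(freshness_weight(date(2025, 1, 1), "stable_rear", today))
print(round(freshness_weight(date(2025, 1, 1), "stable_rear", today,
                             reject_stale=False), 3))
```

Tiles generated mid-flight would carry the current flight date as `captured`, so they always pass the gate.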
# Sensors & Integration ## Onboard Hardware
- There is a lot of data from IMU (via the flight controller) - Processing platform: **Jetson Orin Nano Super** — 67 TOPS sparse INT8, **8 GB shared LPDDR5** (CPU and GPU share the same memory pool), **25 W TDP**.
- The system communicates with the flight controller via MAVLink protocol using MAVSDK library - Companion computer runs JetPack (Ubuntu-based) with CUDA / TensorRT available.
- The system must output GPS coordinates to the flight controller as a replacement for the real GPS module (MAVLink GPS_INPUT message) - Sustained GPU load may cause thermal throttling; the processing pipeline must stay within the thermal envelope. The cooling solution shall sustain the 25 W power mode for the 8-hour duty cycle at the upper environmental-envelope temperature (AC-NEW-5).
- Ground station telemetry link is available but bandwidth-limited; it is not the primary output channel - Onboard non-volatile storage: budget at least the satellite-cache (~10 GB) **plus** the flight-data-recorder cap (64 GB / flight, AC-NEW-3). Reuse-across-flights tile cache stays resident; per-flight FDR rolls over after cap.
## Sensors & Integration
- High-rate **IMU** data is available from the flight controller via MAVLink.
- The system communicates with the flight controller via MAVLink. Telemetry plumbing uses **MAVSDK**; the `GPS_INPUT` injection path is implemented via **pymavlink**, since MAVSDK does not expose a native `GPS_INPUT` API.
- **Autopilot target: ArduPilot only** (with `GPS1_TYPE=14` for MAVLink GPS injection). PX4 is out of scope for the build; if it ever returns to scope it will use `VISION_POSITION_ESTIMATE`, not `GPS_INPUT`. (See `_docs/00_research/00_ac_assessment.md` Q-1.)
- The system outputs WGS84 GPS coordinates to the flight controller as a replacement for the real GPS module (MAVLink GPS_INPUT, AC-4.3).
- **Ground station: QGroundControl** is the supported GCS. Mission Planner is not in scope. Telemetry link is bandwidth-limited and is not the primary output channel; per-frame data stays on the local FDR (AC-NEW-3), GCS sees a 12 Hz downsampled summary (AC-6.1).
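As a rough illustration of the `GPS_INPUT` injection path, the sketch below builds the message fields in MAVLink units (latitude and longitude in degrees × 1e7, GPS week and millisecond-of-week) using only the standard library. The function name, the placeholder values for HDOP, VDOP and satellite count, and the endpoint in the comment are assumptions for illustration, not part of the build.

```python
from datetime import datetime, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
GPS_MINUS_UTC_MS = 18_000      # assumption: leap-second offset in force since 2017
MS_PER_WEEK = 7 * 24 * 3600 * 1000


def gps_input_fields(lat_deg, lon_deg, alt_m, horiz_acc_m, vert_acc_m,
                     utc_now, vel_ned=(0.0, 0.0, 0.0)):
    """Return GPS_INPUT fields in MAVLink message order (hypothetical helper)."""
    gps_ms = round((utc_now - GPS_EPOCH).total_seconds() * 1000) + GPS_MINUS_UTC_MS
    week, ms_of_week = divmod(gps_ms, MS_PER_WEEK)
    vn, ve, vd = vel_ned
    return {
        "time_usec": int(utc_now.timestamp() * 1e6),
        "gps_id": 0,
        "ignore_flags": 0,              # 0 = use every field below
        "time_week_ms": ms_of_week,
        "time_week": week,
        "fix_type": 3,                  # 3 = 3D fix
        "lat": int(round(lat_deg * 1e7)),   # degE7, WGS84
        "lon": int(round(lon_deg * 1e7)),   # degE7, WGS84
        "alt": float(alt_m),            # metres above MSL
        "hdop": 1.0,                    # placeholder value
        "vdop": 1.0,                    # placeholder value
        "vn": vn, "ve": ve, "vd": vd,   # m/s, NED frame
        "speed_accuracy": 1.0,          # placeholder value, m/s
        "horiz_accuracy": float(horiz_acc_m),
        "vert_accuracy": float(vert_acc_m),
        "satellites_visible": 10,       # placeholder value
    }


fields = gps_input_fields(48.5, 35.0, 900.0, 20.0, 30.0,
                          datetime(2026, 4, 26, 12, 0, 0, tzinfo=timezone.utc))
print(fields["lat"], fields["lon"], fields["time_week"], fields["time_week_ms"])

# With pymavlink installed, the fields could be sent roughly like this
# (endpoint string is a hypothetical example):
#   from pymavlink import mavutil
#   link = mavutil.mavlink_connection("udpout:127.0.0.1:14550")
#   link.mav.gps_input_send(*fields.values())
```

Feeding the estimator's horizontal and vertical accuracy into `horiz_accuracy` and `vert_accuracy` is one way to let the autopilot's EKF weight the injected fix; whether that is sufficient depends on the ArduPilot configuration.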
## Failsafe & Safety
- If the GPS-Denied system fails to produce any position estimate for **>3 s**, the autopilot falls back to IMU-only dead reckoning (AC-5.2). N=3 s rides through one sharp turn at cruise speed without tripping the failsafe.
- The system must satisfy the false-position safety budget in AC-NEW-4 (P(error >500 m) <0.1%, P(error >1 km) <0.01% per flight).
- Cold-start time-to-first-fix budget is **<30 s** from companion-computer boot (AC-NEW-1); spoofing-promotion latency is **<3 s** from FC's GPS-loss signal (AC-NEW-2).
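The 3 s failsafe window can be illustrated with a tiny staleness check. This is a sketch only: the fallback itself happens in the autopilot, and the class and function names here are hypothetical.

```python
FAILSAFE_TIMEOUT_S = 3.0   # N from AC-5.2
CRUISE_KMH = 60.0          # cruise speed from the mission profile


class EstimateWatchdog:
    """Tracks how long ago the last position estimate was produced."""

    def __init__(self, timeout_s: float = FAILSAFE_TIMEOUT_S):
        self.timeout_s = timeout_s
        self.last_estimate_t = None

    def on_estimate(self, t_s: float) -> None:
        self.last_estimate_t = t_s

    def is_stale(self, now_s: float) -> bool:
        # Assumption: no estimate yet counts as stale.
        if self.last_estimate_t is None:
            return True
        return (now_s - self.last_estimate_t) > self.timeout_s


def distance_flown_m(speed_kmh: float, seconds: float) -> float:
    """Ground distance covered at constant speed."""
    return speed_kmh / 3.6 * seconds


wd = EstimateWatchdog()
wd.on_estimate(100.0)
print(wd.is_stale(102.9), wd.is_stale(103.1))
print(round(distance_flown_m(CRUISE_KMH, FAILSAFE_TIMEOUT_S), 1))
```

In real use the timestamps would come from a monotonic clock such as `time.monotonic()`. At 60 km/h the aircraft covers about 50 m during the 3 s window.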
+211
View File

@@ -0,0 +1,211 @@
# Acceptance Criteria & Restrictions Assessment (Phase 1 — BLOCKING)
**Project**: GPS-denied onboard navigation for fixed-wing UAV (companion-computer-side, Jetson Orin Nano Super, MAVLink/MAVSDK to flight controller, satellite tile reference + monocular VO).
**Mode**: Research Mode A, Phase 1 (AC Assessment). Phase 2 (Solution Draft) blocked on user confirmation of this document.
**Reviewers**: Product owner / decision-maker; technical architect.
---
## 0. TL;DR — what changes after this assessment
1. **One blocker, three contradictions, six gaps** — the AC and restriction set is solid in spirit but cannot be implemented as written.
2. **Hard blocker**: Google Maps satellite tiles are explicitly prohibited for offline / autonomous-vehicle / image-analysis use (Google Maps Platform ToS + Map Tiles API Policies). The restriction `"limited right now to Google Maps"` cannot be legally deployed.
3. **Hard contradiction #1**: "up to 3000 photos per flight" vs. 8 h × 3 fps = 86,400 photos. Storage and processing budgets cannot be sized until this is reconciled.
4. **Hard contradiction #2**: Camera resolution range "FullHD to 6252×4168" gives a 13× pixel-count delta — per-frame compute cannot be designed against a 13× moving target.
5. **Hard contradiction #3**: AC "Object localization accuracy is consistent with frame-center accuracy" is not physically achievable with the *AI-camera-gimbal-angle-only* pose information confirmed in scope (no airframe IMU fusion onto AI cam). At 1 km AGL a 5° unknown roll/pitch is ~87 m of ground error.
6. **Six recommended new AC** added at the bottom of section 1 (TTFF, spoofing-promotion latency, flight-data-recorder, false-position safety budget, environmental envelope, imagery freshness).
7. **Suggested numerical recalibration** of three existing AC values (frame-center 80%/60%, MRE, "TBD" failsafe N) to better-evidenced numbers.
A clear **A/B/C choice** at the bottom of this document is the BLOCKING gate.
---
## 1. Acceptance Criteria
Status legend: **K** = keep as-is; **M** = modify (numeric or wording change recommended); **A** = added (new AC); **R** = remove / supersede; **F** = flagged (factually questionable, needs user judgement).
| ID | Criterion (paraphrased) | Our Value | Researched / Recommended Value | Cost / Timeline Impact | Status | Notes |
|----|-------------------------|-----------|--------------------------------|------------------------|--------|-------|
| AC-1.1 | Frame-center GPS error: ≥80% within 50 m | 80%@50m | **Achievable.** SOTA cross-view UAV-vs-satellite at low-mid altitude reaches RDS ~84% / MA@20 ~83% in nadir-favoring setups (S39); AnyVisLoc 74.1%@5m at 30300 m (S02). At 1 km AGL with non-stabilized monocular nadir + Google-Earth-grade reference, **80%@50 m is realistic IF satellite-anchored frames are ≥30% of the trajectory**; relies on VPR + fine matching + Kalman/factor-graph fusion with VO between anchors. | None to scope | K | Achievability assumes the anchor density and VO drift bounds in AC-1.3. |
| AC-1.2 | Frame-center GPS error: ≥60% within 20 m | 60%@20m | **Aggressive but achievable.** MA@20 of 83% appears in cross-view literature (S39), but on benchmark-favorable data. With Ukraine seasonal change, dust, and 1 km AGL viewpoint, expect **4565%@20 m** in production. Recommend **soft-keep at 60%, hard-floor at 50%** to avoid blocking GA. | None to scope; affects test pass/fail | M | Add a hard-floor of 50%@20m as the must-pass acceptance gate; treat 60%@20m as the "stretch" target. |
| AC-1.3 | Cumulative VO drift between satellite anchors <100 m | 100 m | **Achievable.** Monocular VO without IMU drifts ~1–3% of distance travelled in benign conditions (S32 baselines). At 60 km/h cruise, anchor cadence of ~5 s gives ~83 m between anchors → naive 1–3% drift ≈ 0.8–2.5 m of accumulated drift, well below 100 m. **Tighten to <50 m if VO is IMU-fused**, keep at 100 m for VO-only fallback. | None | K | Add measurement protocol: drift = ‖VO-extrapolated centre − next anchor centre‖ at the moment of anchor-fix. |
| AC-1.4 | Per-estimate confidence score (high/low) | qualitative | **Recommend quantitative**: 95% covariance ellipse (semi-major axis, m) + categorical {anchored, vo-extrapolated, dead-reckoned}. Standard schemes use RANSAC inlier ratio + reprojection variance + EKF covariance (S03, S06, S32). | Negligible cost | M | Wording change only; emit both number and category. |
| AC-2.1 | Image registration rate >95% in normal segments | 95% | **Tight but achievable** with strong matcher stack (LightGlue / XFeat / MASt3R) and dense satellite tile coverage. Cross-view aerial benchmarks report 7090% on hard splits, 9098% on training-similar splits. Define "normal segment" as: nadir flight ±10°, ≥40% overlap, daytime, season-matched tile. | Drives matcher choice (S07, S08, S09) | K | Add the "normal segment" definition explicitly. |
| AC-2.2 | Mean Reprojection Error <1.0 px | <1.0 px | **Realistic for homography on overlapping aerial pairs**, optimistic for cross-domain UAVsatellite registration (typical 13 px). Recommend **split**: MRE <1.0 px for VO frame-to-frame; MRE <2.5 px for satellite-anchored homography. | None | M | Two MRE budgets, one per pipeline stage. |
| AC-3.1 | Survive 350 m outlier between consecutive photos (tilt) | 350 m | **Reasonable.** At 1 km AGL with up to 20° plane tilt, frame-centre can shift ~360 m. Outlier-rejection (RANSAC + Mahalanobis gate on EKF innovation) handles this. | None | K | — |
| AC-3.2 | Sharp-turn handling: <5% overlap, <200 m drift, <70° heading change | <5% / 200 m / 70° | **Plausible** with a place-recognition based re-localization (S05 AnyLoc, S06 MixVPR, S04 aero-vloc). Without overlap, VO returns NULL; the segment-stitching strategy must rely on global descriptor retrieval over the satellite tile cache. | Drives VPR component (S04, S05) | K | — |
| AC-3.3 | >2 disconnected segments per flight; stitching is core | yes | **Confirmed core** by AC-3.2. Re-localization → multi-segment SLAM-style merging via pose-graph; aero-vloc benchmark validates this pattern. | Architectural | K | — |
| AC-3.4 | After 3 frames with no fix → operator re-localization request | 3 frames | **Reasonable trigger**, but **N seconds is a better unit** because frame rate may drop under load. Recommend: re-localization request after **≥3 consecutive frames AND ≥2 s** without a fix. | Negligible | M | Dual trigger (frames AND time). |
| AC-4.1 | <400 ms end-to-end per frame | <400 ms p95 | **Feasible** on Jetson Orin Nano Super at 25 W with image downsampling (e.g., 1024 × 683 working resolution from 6200 × 4100), TensorRT-accelerated SuperPoint+LightGlue or XFeat, and pre-cached satellite tile descriptors. Empirical Jetson Orin NX precedent exists (S12); Orin Nano Super envelope confirmed (S14). Skip-allowed under load (user-confirmed). | Drives matcher choice + downsampling (D-D4) | K | Reword as "p95 <400 ms with up to ~10% frames dropped under sustained load." |
| AC-4.2 | Memory <8 GB shared CPU+GPU | 8 GB | **Hard hardware envelope** of Jetson Orin Nano Super 8 GB variant (S14). Realistic with downsampling + tile descriptor cache; risky with full-res 25 MP photos held simultaneously. | Drives buffering strategy | K | — |
| AC-4.3 | Output via MAVLink GPS_INPUT (MAVSDK) | GPS_INPUT via MAVSDK | **Mismatch with stack**: MAVSDK-Python has no native GPS_INPUT (open issue #320, S18). PX4 native GPS_INPUT is limited; ArduPilot fully supports `GPS1_TYPE=14`. Recommend either (a) target ArduPilot autopilot, or (b) use **pymavlink** for raw GPS_INPUT alongside MAVSDK for general telemetry, or (c) on PX4 use VISION_POSITION_ESTIMATE through EKF2. | Drives autopilot choice / library mix | M | Lock the autopilot target (PX4 vs ArduPilot vs both) — see Q-1 below. |
| AC-4.4 | Frame-by-frame streaming, not batched | streaming | OK | None | K | — |
| AC-4.5 | May refine and resend corrections | yes | OK; common pattern. | None | K | — |
| AC-5.1 | Initialise from last known FC GPS pre-denial | yes | OK. Add explicit boundary: "system requires the FC's last EKF position + IMU extrapolation at hand-off." | Negligible | K | — |
| AC-5.2 | If no estimate for `N` seconds → FC falls back to IMU dead reckoning | N=TBD | **Recommend `N = 35 s`** (PX4 COM_POS_FS_DELAY default = 1 s, our pipeline is heavier and includes VO retries). | Decision | M | Pin a value. |
| AC-5.3 | On companion reboot mid-flight → re-init from FC's IMU-extrapolated position | yes | OK, but add: cold-start time-to-first-fix budget (see new AC-NEW-1). | Modest | K | — |
| AC-6.1 | Position + confidence streamed to GCS | yes | OK. QGroundControl primary channel is STATUSTEXT; richer telemetry via custom MAVLink dialect or NAMED_VALUE_FLOAT (S34, S35). Bandwidth budget required. | Modest | K | — |
| AC-6.2 | GCS can send re-localization hint | yes | OK; implementable as STATUSTEXT command + custom NAV-msg, or via QGC plugin. | Modest | K | — |
| AC-6.3 | WGS84 output | WGS84 | OK (matches GPS_INPUT spec). | None | K | — |
| AC-7.1 | AI camera object localization accuracy "consistent with frame-center" | qualitative | **Not physically achievable in turning flight** with **gimbal-angle-only** pose (user-confirmed scope). At 1 km AGL, 5° unknown airframe attitude → ~87 m ground error; at 25° bank → ~470 m. **Restate**: "consistent with frame-center accuracy in level flight (<5° bank); in maneuvering flight, expect ground-projection error of roughly altitude × tan(bank)." OR add airframe IMU fusion to the AI-cam pose (re-opens scope C5). | Decision | F | See Q-2 below. |
| AC-7.2 | Trigonometric calc using gimbal angle + zoom + altitude (flat terrain) | yes | Physically correct given the limitation in AC-7.1. Flat-terrain assumption costs <30 m typical for eastern/southern Ukraine relief at 1 km AGL with small gimbal off-nadir. | None | K | — |
| AC-8.1 | Satellite imagery ≥0.5 m/px, ideally 0.3 m/px | 0.5/0.3 m/px | **0.3 m/px is realistic only via paid commercial providers** (Maxar Vivid, Airbus Pléiades Neo, S25, S26). Free Sentinel-2 is 10 m/px (S28), which is too coarse for 1 km AGL drone-vs-satellite registration without scale-bridging tricks. | **Significant**: $25–32 / km² archive Maxar; ~€5–8.50 / km² Airbus → for 400 km² ≈ $10–13 k / mission area, one-time. Cheaper if a defense agency feed is available. | K (criterion), M (sourcing) | See Q-3 below. |
| AC-8.2 | Imagery <2 years old where possible | <2 years | **Recommend tighter**: <12 months for stable rear sectors, <6 months for active-conflict sectors (post-2022 Ukraine landscape change is rapid — Kakhovka dam destruction is a documented example, S28). | Operational | M | Add freshness-by-sector. |
| AC-8.3 | Satellite imagery pre-loaded before flight; preprocessing time uncritical | offline preprocessing | OK; standard. Tile descriptor pre-extraction (SuperPoint / DINOv2 features) is the natural offline step. | None | K | — |
| AC-NEW-1 | **Time-to-first-fix on cold start / mid-flight reboot** | (new) | Recommend **<30 s after companion-computer boot**, given IMU-extrapolated initial position from FC. | Modest | A | Operational requirement, missing today. |
| AC-NEW-2 | **Spoofing-promotion latency** (system asserts its estimate over FC's spoofed GPS) | (new) | Recommend **<3 s** from spoof detection to the first valid GPS_INPUT taking precedence at the EKF. PX4 has 1 s spoof-detect hysteresis (S19); we add 1–2 s of warm-up margin for the GPS-Denied system. | Drives FC config (GPS1_TYPE) | A | Security-critical AC. |
| AC-NEW-3 | **Flight-data-recorder** | (new) | All photos (full-resolution or downsampled), estimates, confidence, IMU traces, and MAVLink GPS_INPUT outputs are retained at full rate to non-volatile storage. Cap at e.g. 64 GB/flight. | Storage budget | A | Required for post-mission forensics, certification, and ML retraining. |
| AC-NEW-4 | **False-position safety budget** | (new) | P(estimate error > 500 m) < 0.1% per flight; P(error > 1 km) < 0.01% per flight. Validated by Monte-Carlo over IMU-injected datasets + recorded flights. | Drives validation effort | A | Safety AC; missing today, but waypoint/RTL behaviour depends on it. |
| AC-NEW-5 | **Operational environmental envelope** | (new) | Operating temp 20 °C … +50 °C (Ukrainian seasonal range), shock per RTCA DO-160G low-altitude UAV-class, vibration spec to be matched to airframe. | Drives BoM (cooling, mounting) | A | Required for any production deployment. |
| AC-NEW-6 | **Imagery freshness enforcement** | (new) | System rejects (or downgrades confidence on) tiles older than the per-sector freshness threshold (AC-8.2). | Negligible | A | Operational safety. |
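Several numbers in the table above (AC-1.3, AC-3.1, AC-7.1) come from simple flat-terrain geometry. The sketch below reproduces them; the function names are illustrative, and the model assumes a nadir-pointing camera over flat ground, with an unknown attitude angle shifting the ground point by altitude × tan(angle).

```python
import math


def ground_error_m(altitude_m: float, attitude_error_deg: float) -> float:
    """Ground shift caused by an unmodelled tilt, flat-terrain assumption."""
    return altitude_m * math.tan(math.radians(attitude_error_deg))


def anchor_spacing_m(cruise_kmh: float, cadence_s: float) -> float:
    """Distance flown between two satellite anchors."""
    return cruise_kmh / 3.6 * cadence_s


def vo_drift_m(distance_m: float, drift_fraction: float) -> float:
    """Naive VO drift as a fixed fraction of distance travelled."""
    return distance_m * drift_fraction


# AC-7.1: unknown airframe attitude at 1 km AGL
print(round(ground_error_m(1000, 5)))    # about 87 m
print(round(ground_error_m(1000, 25)))   # about 466 m, quoted as ~470 m
# AC-3.1: 20 degree tilt at 1 km AGL
print(round(ground_error_m(1000, 20)))   # about 364 m, quoted as ~360 m
# AC-1.3: 60 km/h cruise, ~5 s anchor cadence, 1-3 % drift
spacing = anchor_spacing_m(60, 5)
print(round(spacing, 1), round(vo_drift_m(spacing, 0.01), 2),
      round(vo_drift_m(spacing, 0.03), 2))
```

The drift figures stay two orders of magnitude below the 100 m limit, which is why the criterion keeps headroom for longer gaps between anchors.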
---
## 2. Restrictions Assessment
| ID | Restriction (paraphrased) | Our Value | Researched / Recommended Value | Cost / Timeline Impact | Status |
|----|----------------------------|-----------|--------------------------------|------------------------|--------|
| R-1 | Fixed-wing UAV only | yes | OK; matches public benchmark domain (UAV-VisLoc, AerialVL — S01, S03). | None | K |
| R-2 | Downward camera, fixed, **not autostabilized** | yes | OK; this is the hard mode (no gimbal compensation for plane bank/pitch). The whole pipeline must tolerate up to ±25° roll in turns. | Drives matcher robustness | K |
| R-3 | Eastern/Southern Ukraine ops area | yes | OK; **flag the imagery freshness implication** (active-conflict change rate, F-E7). | Drives sourcing | K |
| R-4 | Altitude ≤1 km AGL | ≤1 km | OK; flag that **GSD at 1 km with 24 mm full-frame is ~24 cm/px** (F-F1) — already finer than 0.3 m/px satellite reference. | None | K |
| R-5 | Mostly sunny weather | yes | OK; partly-cloudy and shadow movement remain a real degrader (F-A5). | Drives evaluation | K |
| R-6 | Sharp turns are exceptional but possible | yes | OK; scope includes the multi-segment stitching path (AC-3.2/3.3). | None | K |
| R-7 | **Photo count up to 3,000 per flight** | 3,000 | **Hard contradiction with 8-hour endurance × 3 fps = 86,400.** Either: (a) interpret as on-disk *retention* budget (sub-sample); (b) per-segment, not per-sortie; (c) stale value. **MUST RESOLVE** — see Q-4. | Sizing-critical | F |
| R-8 | Two cameras: nav (fixed) + AI (gimbal angle + zoom) | yes | OK; user has confirmed AI cam pose is gimbal-only (no airframe IMU fusion). Implication captured in AC-7.1. | Drives object-localization quality | K |
| R-9 | **Nav cam resolution: FullHD to 6252×4168** | range | **13× pixel-count delta** between extremes. Locking the camera spec is required for AC-4.1 / 4.2 sizing. **MUST RESOLVE** — see Q-5. | Sizing-critical | F |
| R-10 | Camera intrinsics known | yes | OK; pre-flight checkerboard or factory cal mandatory (F-F2). | Modest | K |
| R-11 | Camera-to-CC interface TBD (USB / CSI / GigE) | TBD | **Recommend GigE Vision** for 25 MP @ 3 fps (8.4 MB/frame raw → 25 MB/s — comfortable for GigE; tight for USB 3.0 in noisy electrical environments; CSI feasible for embedded camera modules but unusual at this resolution). | Drives BoM | M |
| R-12 | **Satellite imagery limited to Google Maps** | Google Maps | **Hard blocker** — Google Maps Platform ToS explicitly prohibits offline use, image analysis, autonomous-vehicle control, geodata extraction (S22, S23). Bing has the same prohibition (S24). **Must change to license-cleared provider** (Maxar Vivid / Airbus Pléiades Neo / commissioned tasking / government feed). See Q-3. | $$ + time | **R / M** |
| R-13 | Pre-loaded satellite imagery on companion | yes | OK; persistent cross-flight cache as user requested. | None | K |
| R-14 | Jetson Orin Nano Super (67 TOPS sparse INT8, 8 GB shared, 25 W) | yes | OK; envelope confirmed (S14, S15). Active cooling required for 8-hour duty (F-D2). | Drives BoM | K |
| R-15 | JetPack (Ubuntu) + CUDA + TensorRT | yes | OK; **lock JetPack 6.2** to get Super Mode (S14). | None | K |
| R-16 | Onboard storage TBD | TBD | **Recommend NVMe ≥256 GB** (10 GB tile cache + 64 GB flight-data-recorder buffer + headroom). | Modest | M |
| R-17 | Sustained GPU load may throttle | yes | OK; design constraint, not a target. Active cooling + 25 W power mode + duty-cycled compute (skip-allowed) all help. | Drives BoM + thermal design | K |
| R-18 | Lots of IMU data via FC | yes (production) | OK; **for dev/test the user has confirmed public-dataset path**. Recommended: AerialVL (primary, S03), UAV-VisLoc (visual-only validation, S01), MidAir (synthetic IMU augmentation, S30), plus the user's 65 sample photos for sanity. Plan one real test flight with IMU log capture before V&V. | Modest | K |
| R-19 | MAVLink + MAVSDK to FC | yes | **MAVSDK has no native GPS_INPUT** (S18). Use **pymavlink** for the GPS_INPUT line, MAVSDK for general telemetry. ArduPilot is the lower-friction FC target. | Drives library mix | M |
| R-20 | Output is GPS_INPUT (MAVLink) | yes | OK with the library mix above. | None | K |
| R-21 | GCS telemetry bandwidth-limited | yes | OK; high-frequency content (per-frame estimates) over MAVLink at 57600/115200 baud is tight — recommend down-sampling to 12 Hz on the telemetry link, full rate over local TCP for bench testing. | Drives protocol design | K |
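The sizing conflicts flagged in R-7, R-9, R-11 and R-16 reduce to a few lines of arithmetic. The sketch below uses only figures from the table; the helper names are hypothetical, and the 8.4 MB/frame value is taken from R-11 as stated.

```python
def photos_per_flight(hours: float, fps: float) -> int:
    """Frames captured over a flight at a constant frame rate."""
    return int(hours * 3600 * fps)


def pixel_ratio(high_res, low_res) -> float:
    """Pixel-count ratio between two (width, height) resolutions."""
    return (high_res[0] * high_res[1]) / (low_res[0] * low_res[1])


def link_rate_mb_s(mb_per_frame: float, fps: float) -> float:
    """Sustained camera link throughput."""
    return mb_per_frame * fps


# R-7: 8 h at 3 fps versus the legacy 3,000-photo figure
print(photos_per_flight(8, 3))
# R-9: 6252x4168 versus FullHD
print(round(pixel_ratio((6252, 4168), (1920, 1080)), 1))
# R-11: 8.4 MB/frame at 3 fps
print(round(link_rate_mb_s(8.4, 3), 1))
# R-16: tile cache + flight-data-recorder cap versus the 256 GB recommendation
print(10 + 64, 256 - (10 + 64))
```

The pixel ratio works out to about 12.6, which the assessment rounds to 13×. The storage line shows that a 256 GB NVMe leaves roughly 180 GB of headroom over the 74 GB of pinned budgets.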
---
## 3. Key findings (cross-cutting)
1. **The single biggest risk in the project is satellite-imagery sourcing.** Switching off Google Maps is mandatory; the replacement decision (paid commercial vs. government feed vs. public agency partnership) drives the budget, the freshness, and the legal posture. Recommend Maxar Vivid 30 cm or Airbus Pléiades Neo 30 cm as the working assumption; engage procurement early.
2. **Per-frame compute will fit Jetson Orin Nano Super at 25 W only with a downsampled working resolution and pre-cached satellite descriptors.** Full 6200 × 4100 matching at 3 fps within 400 ms is not realistic. We will run matchers at ~1 Mpx and reserve the full-res image only for offline forensics + AI-cam ROI.
3. **The matcher stack landscape in 20242026 is healthy**: LightGlue / SuperPoint with TensorRT (mature on Orin), XFeat (fastest, best for embedded), MASt3R (best cross-view, but heavy). A two-tier pipeline — XFeat for VO frame-to-frame, LightGlue/MASt3R for satellite anchoring — is the most defensible architecture.
4. **VPR is core**, not optional: sharp turns and disconnected segments demand a global retrieval step over the satellite tile cache. AnyLoc (DINOv2 + VLAD, training-free) is the pragmatic baseline; MixVPR is the lightweight option.
5. **Confidence scoring must be quantitative**, not just "high/low" — the flight controller and GCS need a numeric to decide when to trust GPS_INPUT versus IMU dead reckoning.
6. **AI-camera object localization at the AC's stated accuracy is not achievable with gimbal-only pose** in turning flight. Either restate the AC, or expand scope to fuse airframe IMU into the AI-cam pose.
7. **MAVSDK has no native GPS_INPUT API.** Plan for a hybrid pymavlink/MAVSDK approach, and prefer ArduPilot as the autopilot target unless there is a strong reason to use PX4.
8. **No public dataset perfectly matches our mission profile.** AerialVL is the closest fixed-wing real-world dataset; we should plan one or two of our own test flights with IMU log capture for V&V before claiming AC compliance.
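Finding 5 and AC-1.4 ask for a numeric confidence alongside the category. A minimal sketch of one way to derive the 95% ellipse semi-major axis from a 2×2 horizontal position covariance is shown below. It assumes a Gaussian error model; the function names and the report layout are illustrative, not the pipeline's actual interface.

```python
import math

CATEGORIES = ("anchored", "vo-extrapolated", "dead-reckoned")  # from AC-1.4


def semi_major_axis_m(cov, p: float = 0.95) -> float:
    """Semi-major axis of the p-confidence ellipse of a 2x2 covariance (m^2)."""
    (a, b), (_, c) = cov
    # Largest eigenvalue of the symmetric matrix [[a, b], [b, c]].
    lam_max = (a + c) / 2 + math.hypot((a - c) / 2, b)
    # Chi-square quantile for 2 degrees of freedom has a closed form.
    k2 = -2.0 * math.log(1.0 - p)
    return math.sqrt(k2 * lam_max)


def confidence_report(cov, category: str) -> dict:
    """Bundle the numeric and categorical confidence for one estimate."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return {"semi_major_95_m": semi_major_axis_m(cov), "category": category}


# Example: 10 m sigma east, 5 m sigma north, uncorrelated.
report = confidence_report(((100.0, 0.0), (0.0, 25.0)), "anchored")
print(round(report["semi_major_95_m"], 2), report["category"])
```

For the example covariance the semi-major axis is about 24.5 m, that is, the 10 m sigma scaled by the square root of 5.99.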
---
## 4. Sources
See `_docs/00_research/01_source_registry.md` (39 sources, mostly L1/L2). Key L1: Google Maps Platform ToS (S22/S23), Bing ToS (S24), NVIDIA Jetson Linux developer guide (S14/S15), ArduPilot GPSInput docs (S16/S17), PX4 spoofing PRs (S19/S20). Key L2: UAV-VisLoc (S01), AerialVL (S03), AnyVisLoc (S02), AnyLoc (S05), XFeat (S08), MASt3R (S09), VPR aerial survey (S04).
Each fact in this document is traceable back to one or more of those sources via `_docs/00_research/02_fact_cards.md`.
---
## 5. BLOCKING — decisions required before Phase 2
These are the questions the assessment cannot resolve from research alone. **Phase 2 (the solution draft) cannot start until these are answered.**
### Q-1 — Autopilot target (drives AC-4.3, R-19, R-20)
PX4 vs ArduPilot for the flight controller has direct consequences for the GPS_INPUT pipeline. ArduPilot (`GPS1_TYPE=14`) is the lower-friction path; PX4 forces a VISION_POSITION_ESTIMATE workaround.
**Choose A / B / C:**
- A) **ArduPilot only** (lowest friction; matches MAVProxy GPSInput reference impl).
- B) **PX4 only** (must use VISION_POSITION_ESTIMATE, more EKF tuning).
- C) **Both** (more work, but maximises addressable airframe market).
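Whichever option is chosen, the companion has to convert its position estimate into GPS_INPUT units. The sketch below prepares a subset of the fields using only the standard library. The field names follow the MAVLink common message set; the helper names, the example coordinates, and the 18 s GPS-UTC offset are assumptions for illustration. Actually sending the message (for example through pymavlink's generated `gps_input_send`) is left out.

```python
import time

GPS_EPOCH_UNIX_S = 315964800  # 1980-01-06T00:00:00Z expressed as Unix time
SECONDS_PER_WEEK = 604800
GPS_UTC_LEAP_S = 18           # assumption: GPS-UTC offset in force since 2017


def to_deg_e7(deg):
    """GPS_INPUT carries lat/lon as int32 in units of 1e-7 degrees."""
    return int(round(deg * 1e7))


def gps_week_and_ms(unix_s, leap_s=GPS_UTC_LEAP_S):
    """Convert Unix time to (GPS week number, milliseconds into the week)."""
    gps_s = unix_s - GPS_EPOCH_UNIX_S + leap_s
    week = int(gps_s // SECONDS_PER_WEEK)
    week_ms = int(round((gps_s % SECONDS_PER_WEEK) * 1000))
    return week, week_ms


def build_gps_input_fields(lat_deg, lon_deg, alt_msl_m, horiz_acc_m, unix_s=None):
    """Hypothetical helper: returns a subset of GPS_INPUT fields as a dict."""
    unix_s = time.time() if unix_s is None else unix_s
    week, week_ms = gps_week_and_ms(unix_s)
    return {
        "time_usec": int(unix_s * 1e6),
        "time_week": week,
        "time_week_ms": week_ms,
        "fix_type": 3,                      # 3 = 3D fix
        "lat": to_deg_e7(lat_deg),
        "lon": to_deg_e7(lon_deg),
        "alt": float(alt_msl_m),            # metres above mean sea level
        "horiz_accuracy": float(horiz_acc_m),
    }


# Illustrative position only; not taken from any mission data.
fields = build_gps_input_fields(48.5, 35.0, 1150.0, 20.0, unix_s=1_700_000_000)
print(fields)
```

On the ArduPilot side the source notes `GPS1_TYPE=14` as the switch that makes the autopilot accept this message as its GPS.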
### Q-2 — AI-camera pose source (drives AC-7.1)
The AC says object localization should be "consistent with frame-center accuracy". With gimbal-only pose, this is not physically achievable in turning flight.
**Choose A / B / C:**
- A) **Relax the AC** to "consistent in level flight (<5° bank); degraded by airframe attitude in maneuvering flight" — keeps scope as agreed.
- B) **Expand scope** to fuse airframe IMU (roll/pitch/yaw) into the AI-cam pose at the moment of capture, restoring the original AC.
- C) **Defer object localization** entirely (AC-7.x removed from this cycle; future work).
### Q-3 — Satellite imagery sourcing (drives R-12, AC-8.1, AC-8.2)
Google Maps is not a legally usable source.
**Choose A / B / C / D:**
- A) **Maxar Vivid 30 cm** (standard offering, $25–32 / km² archive; ~$10–12 k for 400 km² mission area; explicit defense licensing path).
- B) **Airbus Pléiades Neo 30 cm** (€58.50 / km² volume tier; OneAtlas tasking).
- C) **Government / agency feed** (free or subsidised — requires user to identify the agency and partnership channel).
- D) **Park the question**, deliver the system imagery-source-agnostic with a documented offline-tile-cache interface; user procures tiles separately.
### Q-4 — Photo count per flight (drives R-7, AC-NEW-3, sizing)
"Up to 3000 photos per flight" contradicts 8 h × 3 fps.
**Choose A / B / C:**
- A) "3000" is the **on-disk retention** budget — system processes 86k frames live, retains every Nth in the flight-data-recorder.
- B) "3000" is the **per mission segment** count, not per sortie — typical mission segments are ~17 minutes.
- C) "3000" is **stale** and should be replaced with "all frames captured during the sortie" (no per-flight cap, sized by storage AC-NEW-3).
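The numbers behind options A and B can be checked directly. This is a plain arithmetic sketch using the 8 h, 3 fps, and 3000-photo figures from the restrictions; the variable names are illustrative.

```python
import math

FPS = 3              # nav-camera frame rate from the restrictions
SORTIE_HOURS = 8     # maximum endurance
PHOTO_CAP = 3000     # the disputed "up to 3000 photos per flight"

frames_per_sortie = SORTIE_HOURS * 3600 * FPS

# Option A: keep every Nth frame so that retention stays within the cap.
retain_every_nth = math.ceil(frames_per_sortie / PHOTO_CAP)

# Option B: how long a mission segment lasts if it holds exactly 3000 frames.
segment_minutes = PHOTO_CAP / FPS / 60

print(frames_per_sortie, retain_every_nth, round(segment_minutes, 1))
```

Retaining every 29th frame keeps the recorder under the cap, and a 3000-frame segment lasts about 16.7 minutes, which matches the "~17 minutes" in option B.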
### Q-5 — Nav-camera spec lock (drives R-9, AC-4.1, AC-4.2, R-11)
"FullHD to 6252×4168" is too wide for compute / storage sizing.
**Choose A / B / C:**
- A) Lock at **6252×4168** (worst case for sizing; safest).
- B) Lock at a mid-range **~12 MP** (e.g. 4000×3000) — balanced for compute and detail.
- C) Lock at **FullHD (1920×1080)** — easiest compute, but fewest features per frame.
- D) **Pick a specific camera model** (and we research focal length / lens distortion / interface).
### Q-6 — New AC additions (drives AC-NEW-1 through AC-NEW-6)
Six new AC are recommended (cold-start TTFF, spoofing-promotion latency, flight-data-recorder, false-position safety budget, environmental envelope, imagery freshness). Each addresses a real gap in the current AC.
**Choose A / B:**
- A) **Adopt all six** as written (recommended).
- B) **Adopt selectively** — user picks which to keep (we'll iterate inline).
### Q-7 — AC-1.2 hard floor (drives AC-1.2 and pass/fail gate)
Recommend a hard floor of **50% within 20 m** alongside the existing **60% within 20 m** stretch target.
**Choose A / B:**
- A) **Adopt** the 50% hard floor + 60% stretch target.
- B) **Keep** 60%@20m as the only gate (research suggests this is occasionally infeasible in production conditions — risk of a non-shippable system).
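To make the floor-versus-stretch distinction concrete, the sketch below evaluates a list of per-frame position errors against the thresholds discussed here. The function names and the sample errors are hypothetical; only the 50 m / 20 m thresholds and the 80% / 50% / 60% fractions come from the AC text.

```python
def fraction_within(errors_m, threshold_m):
    """Share of frames whose position error is at or below threshold_m."""
    if not errors_m:
        raise ValueError("need at least one error sample")
    return sum(1 for e in errors_m if e <= threshold_m) / len(errors_m)


def accuracy_gate(errors_m):
    """Evaluate the 50 m criterion plus the 20 m hard floor and stretch."""
    within_50 = fraction_within(errors_m, 50.0)
    within_20 = fraction_within(errors_m, 20.0)
    return {
        "within_50m": within_50,
        "within_20m": within_20,
        "pass_80pct_at_50m": within_50 >= 0.80,
        "pass_floor_50pct_at_20m": within_20 >= 0.50,   # hard gate
        "meets_stretch_60pct_at_20m": within_20 >= 0.60,  # informational
    }


# Made-up error samples in metres, purely to exercise the gate.
sample_errors_m = [5, 12, 18, 25, 40, 60, 19, 8, 55, 30]
gate = accuracy_gate(sample_errors_m)
print(gate)
```

With these samples the run passes the hard floor but misses the stretch target, which is exactly the case option A is meant to make shippable.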
### Q-8 — Failsafe `N` value (drives AC-5.2)
Recommend **`N = 3 s`** (with rationale in fact card F-J1).
**Choose A / B / C:**
- A) **Adopt N = 3 s.**
- B) **N = 5 s** (more tolerant; longer pre-fallback dead-reckoning by FC).
- C) **Tune empirically** during integration test (placeholder N = 3 s in spec).
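One way to reason about `N` on the companion side is a staleness watchdog that reports when no estimate has been produced for longer than the timeout. This is a sketch under assumptions: `EstimateWatchdog` is a hypothetical class, and the real fallback decision sits in the flight controller, not in this code.

```python
import time


class EstimateWatchdog:
    """Tracks how long it has been since the last position estimate."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so tests can fake time
        self._last_estimate = clock()

    def feed(self):
        """Call whenever a valid position estimate has been produced."""
        self._last_estimate = self._clock()

    def expired(self):
        """True once more than timeout_s has passed without an estimate."""
        return self._clock() - self._last_estimate > self._timeout_s


# Demonstration with a fake clock instead of real waiting.
fake_now = [0.0]
watchdog = EstimateWatchdog(timeout_s=3.0, clock=lambda: fake_now[0])

fake_now[0] = 2.9
expired_at_2_9 = watchdog.expired()

fake_now[0] = 3.1
expired_at_3_1 = watchdog.expired()

watchdog.feed()                      # a fresh estimate resets the timer
expired_after_feed = watchdog.expired()
print(expired_at_2_9, expired_at_3_1, expired_after_feed)
```

Using a monotonic clock avoids false expiries if the system wall clock is adjusted mid-flight; option C would amount to tuning `timeout_s` from integration-test logs.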
---
## 6. Sign-off — defaults applied
The user opted to skip the structured Q-1…Q-8 prompt and asked to "continue with the information you already have." The recommended values from this assessment have therefore been applied as the working defaults. The user may revise any cell below at any time; revisions propagate into `_docs/00_problem/acceptance_criteria.md` and `restrictions.md`.
| Decision | Applied default | Rationale | Date |
|----------|------------------|-----------|------|
| Q-1 (autopilot) | **ArduPilot only** | Native GPS_INPUT support via `GPS1_TYPE=14`; lowest integration friction (S16, S17). | 2026-04-25 |
| Q-2 (AI-cam pose) | **Relax AC-7.1 to level-flight only** | Gimbal-only pose cannot meet "consistent with frame-center" in turns at 1 km AGL (F-H1). Object-localization scope unchanged otherwise. | 2026-04-25 |
| Q-3 (imagery source) | **Azaion Suite Satellite Service** is the source. Onboard system consumes via an offline tile-cache interface; commercial procurement (Maxar / Airbus / agency) is the Service's concern, not this build's. | User clarified post-blocker: imagery is supplied by a separate Suite component. AC-8.1 / restrictions rewritten accordingly. | 2026-04-25 (revised) |
| Q-4 (photo count) | **Drop the 3000-cap entirely.** The system does **not store raw photos**. Tile cache (~10 GB) and FDR (64 GB) are the storage caps. Tiles are also generated mid-flight (AC-8.4) and uploaded to the Suite Satellite Service on landing. | User clarified post-blocker: "3000" was a legacy Mavic-class operator number; the deduplicated tile is the unit of storage. | 2026-04-25 (revised) |
| Q-5 (nav-camera spec) | **Lock at ~12 MP (4000 × 3000); always downsample for the cross-view matcher.** Specific matcher + downsample target deferred to a dedicated research pass (see solution-draft "Open Research"). | User confirmed downsampling; matcher choice is the highest-leverage decision and deserves its own research pass. | 2026-04-25 (revised) |
| Q-6 (new ACs) | **Adopt all six** (AC-NEW-1…AC-NEW-6). Each AC expanded with rationale, implementation drivers, and validation method in `acceptance_criteria.md`. AC-NEW-3 amended to exclude raw frames (tiles only, per Q-4 revision). | User asked for the new ACs to be listed in detail. | 2026-04-25 (revised) |
| Q-7 (AC-1.2 floor) | **50% hard floor only** (60% stretch dropped). | User decision — single hard floor avoids ambiguity around what passes. | 2026-04-25 (revised) |
| Q-8 (failsafe N) | **N = 3 s** | PX4 default GPS-loss delay is 1 s; our pipeline is heavier and includes VO retries; 3 s rides through one sharp turn (F-J1). | 2026-04-25 |
### Outstanding consequences not auto-resolvable
- **Cross-view matcher selection** is now an explicit deferred research item ("Open Research" in `solution_draft01.md`). Plan step starts with this on the table.
- **Specific nav-camera model** (Q-5) is left to the matcher / resolution research pass to recommend with concrete focal-length / interface justification.
- **Real fixed-wing flight at 1 km AGL with synced IMU** does not exist as a public dataset. Internal Mavic footage is the deployment-domain proxy; AerialVL is the primary public benchmark. Synthesizing IMU from Mavic video is **not pursued** (user judgement: dynamics don't transfer from quad-class to fixed-wing-class).
@@ -0,0 +1,110 @@
# Question Decomposition — AC & Restrictions Assessment
**Mode**: A (Initial Research) — Phase 1 (AC Assessment, BLOCKING)
**Domain**: Onboard GPS-denied UAV navigation via downward-facing camera + satellite reference imagery + VO/IMU on Jetson Orin Nano Super.
**Question type**: Multi-criterion feasibility + technology positioning + benchmark validation. High-novelty intersection (defense-grade UAV CV/SLAM + low-power edge inference + active-conflict region operational constraints), so timeliness is high; prefer 2023–2026 sources.
## Project context (locked-in user answers)
| # | Item | Value |
|---|------|-------|
| C1 | Fresh research run; ignore deleted prior artifacts | yes |
| C2 | Operational area per mission | 150 km² mission box + 50 km × 1 km corridor; ~10 GB satellite tile cache; persistent across flights |
| C3 | Flight envelope | Fixed-wing, 1 km AGL ceiling, ~60 km/h cruise, up to 8 h endurance, sunny weather, eastern/southern Ukraine |
| C4 | GCS | QGroundControl over MAVLink/MAVSDK |
| C5 | AI camera pose | Only gimbal angle + zoom (no airframe IMU fusion onto AI cam frame) |
| C6 | Latency budget | <400 ms p95 end-to-end; frame skipping allowed under load |
| C7 | IMU dev/test data | Use public UAV datasets — research and recommend |
| C8 | Onboard compute | Jetson Orin Nano Super (67 TOPS sparse INT8 / 33 TOPS dense, 8 GB shared LPDDR5, 25 W TDP) |
| C9 | Output channel | MAVLink GPS_INPUT to flight controller; telemetry to GCS for situational awareness |
## Sub-questions (drives Phase 1 web research)
### A. Position accuracy realism
- A1. Hybrid VO + satellite-anchored geolocalization accuracy on fixed-wing UAVs at ~1 km AGL: what is the state of the art (CIRCLE, AnyLoc, UAV-VisLoc benchmark, OpenIBL, AerialVL, GPS-denied papers 2023–2026)?
- A2. Are AC values "80% within 50 m, 60% within 20 m" achievable with non-stabilized monocular nadir camera + Google Maps tile reference?
- A3. Monocular VO drift rates (m per 100 m travelled) for aerial imagery — feasibility of <100 m cumulative drift between satellite anchors.
- A4. Confidence-score schemes for visual geolocalization (covariance, top-K retrieval similarity, photometric consistency).
### B. Image registration & feature matching
- B1. Registration rate >95% for non-overlapping flight + viewpoint changes — SOTA matchers (LoFTR, LightGlue+SuperPoint, RoMa, OmniGlue, MASt3R, XFeat) on aerial-vs-satellite domain gap.
- B2. Mean Reprojection Error <1.0 px — typical for aerial homography vs full PnP at 1 km AGL?
- B3. Cross-modality matching (off-nadir aerial photo vs ortho satellite tile): what works in the 2024–2026 literature, and what fails?
### C. Resilience — sharp turns, off-nadir, re-localization
- C1. Place recognition / tile retrieval for re-localization after sharp turn (no overlap) — NetVLAD, AnyLoc, CosPlace, EigenPlaces, MixVPR.
- C2. Aerial pose recovery under up to 70° heading change and 350 m position outlier — practical pipelines.
- C3. Multi-segment trajectory stitching (disconnected SLAM sessions) — pose-graph relocalization via global descriptor + RANSAC.
### D. Onboard real-time performance on Jetson Orin Nano Super
- D1. Memory & compute envelope of LightGlue / SuperPoint / LoFTR / RoMa / XFeat at 6200×4100 → typical downsampled resolution; can the matcher + VO run within ~400 ms on Jetson Orin Nano Super (67 TOPS sparse INT8)?
- D2. TensorRT-accelerated implementations available for the 2025-class matchers?
- D3. Hot-cache satellite tile lookup (precomputed descriptors) for ~10 GB tile budget — index size and lookup latency.
- D4. Concurrent VO + tile registration scheduling under 8 GB shared CPU/GPU memory.
- D5. Sustained-load thermal throttle threshold of Jetson Orin Nano Super (25 W mode) and effective duty cycle for 8-hour flight.
### E. Satellite imagery — sourcing, freshness, legality, preprocessing
- E1. Google Maps satellite tile usage in defense / offline UAV context — terms-of-service status; alternatives.
- E2. Sub-meter-resolution providers (Maxar incl. Vivid, Airbus Pléiades, Planet SkySat, Capella, ICEYE, Vexcel): pricing tiers, license for tactical reuse, freshness over Ukraine.
- E3. Free / open alternatives: Sentinel-2 (10 m), USGS, Mapbox, Bing — usable as fallback at 1 km AGL?
- E4. Pre-flight tile preprocessing (descriptor extraction, MBTiles packaging, persistent on-disk cache between flights) — best practice.
- E5. Imagery age — how stale before registration fails for active-conflict regions (Ukraine 2022+ rapid landscape change)?
### F. Camera, optics, sensor model
- F1. 6252×4168 sensor at 1 km AGL — typical GSD per pixel for the implied focal lengths of fixed-down sUAS payloads.
- F2. Camera intrinsics calibration — pre-flight checkerboard vs factory cal vs self-calibration.
- F3. Rolling-shutter compensation for ~3 fps mid-altitude photogrammetry.
### G. MAVLink / MAVSDK / flight controller integration
- G1. MAVLink GPS_INPUT message — fields, supported autopilots (PX4 vs ArduPilot vs Cube), expected rate, rejection criteria.
- G2. MAVSDK on Jetson Orin Nano Super (JetPack 6.x / Ubuntu 22.04) — versions, async IO patterns.
- G3. QGroundControl integration — re-localization request UI / NAMED_VALUE / STATUSTEXT / custom message conventions.
### H. Object localization (AI camera)
- H1. Trigonometric ground point intersection accuracy under unknown airframe attitude (gimbal-angle-only) — error budget analysis at 1 km AGL.
- H2. Flat-terrain assumption error contribution over eastern/southern Ukraine (relief amplitude, riverbanks, urban areas).
- H3. Best-practice for graceful degradation when attitude is missing.
### I. Hardware envelope, power, thermals
- I1. Jetson Orin Nano Super 25 W mode sustained load — 8-hour fixed-wing power budget (battery + solar?), cooling solutions for 25 W onboard.
- I2. Storage: persistent ~10 GB tile cache + flight logs on Jetson — recommended SSD/NVMe.
### J. Failsafe & resilience
- J1. Reasonable failsafe timeout `N` for "no estimate produced" before flight controller falls back to IMU-only — typical practitioner values.
- J2. Companion computer reboot mid-flight — recovery patterns from PX4/ArduPilot field reports.
### K. Public datasets for VO/IMU dev & test
- K1. Aerial UAV datasets with synchronized IMU + downward camera + GPS ground truth — list and assess (UAV-VisLoc, AerialVL, MidAir, EuRoC MAV, NPU Drone, USC, Senseable City Lab, AERIAL-D, GeoText, AmsterTime, VPAir, DenseUAV).
- K2. Are there datasets covering eastern European agriculture / mixed-terrain at altitudes 300–1000 m? If not, what's the closest analogue?
### L. Acceptance criteria gaps (potential missing AC)
- L1. Operational temperature, vibration, shock — military/UAV environmental standards (MIL-STD-810, RTCA DO-160 lite).
- L2. Time-to-first-fix on cold-start (boot to first valid GPS_INPUT message).
- L3. Maximum tolerable spoofing detection latency (system promotes its own estimate over flight controller GPS) — security AC.
- L4. Logging / black-box requirement for post-mission forensics.
- L5. Safety AC: false-position rate budget (geolocation off by >X km) — dangerous for waypoint/RTL behavior.
### M. Restriction soundness
- M1. Photo count "up to 3000 per flight" vs "8 hour flight × 3 fps" → 86,400 photos. **Hard contradiction** — needs user resolution.
- M2. Camera "FullHD to 6252×4168" — wide range; processing must accommodate worst case.
- M3. "Eastern/southern Ukraine, mostly sunny" — operational implications: shadow direction, season, vegetation cycle (seasonal mismatch with stale satellite imagery).
## Output
Each sub-question feeds into:
1. `01_source_registry.md` — sources consulted and tier
2. `02_fact_cards.md` — facts with citations
3. The Phase 1 deliverable: `00_ac_assessment.md` (BLOCKING gate)
@@ -0,0 +1,103 @@
# Source Registry — Phase 1 (AC & Restrictions Assessment)
Tier legend: L1 = official spec / standard / reference manual; L2 = peer-reviewed paper or tool from a vendor / SOTA author; L3 = vendor docs, popular OSS repo, expert blog; L4 = forum post, secondary blog.
| ID | Tier | Title | URL | Used for |
|----|------|-------|-----|----------|
| S01 | L2 | Xu et al., *UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization* (arXiv 2405.11936, May 2024) | https://arxiv.org/html/2405.11936v1 | Fixed-wing UAV visual localization benchmark; 405–840 m altitudes; 0.3 m/px Google Earth satellite reference |
| S02 | L2 | Xu et al., *Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark* (AnyVisLoc, arXiv 2503.10692, 2025) | https://arxiv.org/html/2503.10692v1 | SOTA recall@Xm numbers; 74.1% @ 5 m at 30–300 m altitude |
| S03 | L2 | He et al., *AerialVL: A Dataset, Baseline and Algorithm Framework for Aerial-Based Visual Localization With Reference Map* (RA-L 2024) | https://ieeexplore.ieee.org/document/10632587 ; https://github.com/hmf21/AerialVL | Fixed-wing aerial VPR + visual alignment + VO benchmark; 70 km of trajectories; FLIR + gimbal + NovAtel GNSS 1.5 m RMS |
| S04 | L2 | Schmidt-Salzmann et al., *Visual Place Recognition for Aerial Imagery: A Survey* (arXiv 2406.00885, 2024) + aero-vloc benchmark | https://arxiv.org/abs/2406.00885 ; https://github.com/prime-slam/aero-vloc | VPR methods (AnyLoc, CosPlace, EigenPlaces, MixVPR, NetVLAD, SALAD, SelaVPR) for aerial domain |
| S05 | L2 | Keetha et al., *AnyLoc: Towards Universal Visual Place Recognition* | https://anyloc.github.io/ ; https://github.com/AnyLoc/AnyLoc | DINOv2 + VLAD VPR, training-free, strong on aerial cross-domain |
| S06 | L2 | Ali-bey et al., *MixVPR: Feature Mixing for Visual Place Recognition* (arXiv 2303.02190) | https://arxiv.org/abs/2303.02190 | Lightweight VPR aggregation, 94.6% R@1 Pitts250k |
| S07 | L2 | Lindenberger et al., *LightGlue: Local Feature Matching at Light Speed* | https://github.com/cvg/LightGlue | Real-time matcher (with SuperPoint) |
| S08 | L2 | Potje et al., *XFeat: Accelerated Features for Lightweight Image Matching* (CVPR 2024) | https://openaccess.thecvf.com/content/CVPR2024/papers/Potje_XFeat_Accelerated_Features_for_Lightweight_Image_Matching_CVPR_2024_paper.pdf ; https://github.com/verlab/accelerated_features | 5× faster than LightGlue, designed for embedded; semi-dense option |
| S09 | L2 | Leroy et al., *Grounding Image Matching in 3D with MASt3R* (ECCV 2024) | https://arxiv.org/abs/2406.09756 | Cross-view 3D-grounded matching; +30% AUC on Map-free |
| S10 | L3 | `fettahyildizz/superpoint_lightglue_tensorrt` (TRT 8.5.2.2, dynamic shapes) | https://github.com/fettahyildizz/superpoint_lightglue_tensorrt | TensorRT-ready SuperPoint+LightGlue C++ |
| S11 | L3 | `yuefanhao/SuperPoint-LightGlue-TensorRT` | https://github.com/yuefanhao/SuperPoint-LightGlue-TensorRT | RTX3080 baseline: SP 0.95 ms + LG 2.54 ms @ 320×240 = 286 FPS |
| S12 | L3 | `qdLMF/LightGlue-with-FlashAttentionV2-TensorRT` (Jetson Orin NX, CUTLASS plugin) | https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT | Jetson Orin-class deployment proof |
| S13 | L3 | `fabio-sim/LightGlue-ONNX` (FP8) | https://github.com/fabio-sim/LightGlue-ONNX | ONNX/TRT path for matchers |
| S14 | L1 | NVIDIA — *JetPack 6.2 brings Super Mode to Jetson Orin Nano and Orin NX* | https://developer.nvidia.com/blog/nvidia-jetpack-6-2-brings-super-mode-to-nvidia-jetson-orin-nano-and-jetson-orin-nx-modules/ | Confirms 67 TOPS sparse INT8, 15/25 W/MAXN SUPER modes, 8 GB shared LPDDR5 |
| S15 | L1 | NVIDIA — *Jetson Orin Nano / Orin NX / AGX Orin Power & Performance* | https://docs.nvidia.com/jetson/archives/r35.6.1/DeveloperGuide/SD/PlatformPowerAndPerformance/ | Power-mode specifics, throttling behaviour |
| S16 | L1 | ArduPilot — *MAVProxy GPSInput module* | https://ardupilot.org/mavproxy/docs/modules/GPSInput.html | GPS1_TYPE=14 (MAVLink); GPS_INPUT message fields |
| S17 | L1 | ArduPilot — *MAVProxy GPSInput source* | https://github.com/ArduPilot/MAVProxy/blob/master/MAVProxy/modules/mavproxy_GPSInput.py | Reference impl for GPS_INPUT injection |
| S18 | L3 | mavlink/MAVSDK-Python issue #320 – *Input external gps through mavsdk* | https://github.com/mavlink/MAVSDK-Python/issues/320 | MAVSDK has no native GPS_INPUT support; must use pymavlink |
| S19 | L1 | PX4 PR #21244, #23366 – *Add GPS spoofing state* / *EKF2 spoofing GPS check* | https://github.com/PX4/PX4-Autopilot/pull/21244 ; https://github.com/PX4/PX4-Autopilot/pull/23366 | PX4 spoofing flag, ~1 s hysteresis; EKF2 disables GNSS fusion when spoofed |
| S20 | L1 | PX4 PR #23346 – *EKF2 fix timeout after gps failure* | https://github.com/PX4/PX4-Autopilot/pull/23346 | Dead-reckoning timeout logic |
| S21 | L1 | PX4 issue #23970 – *COM_POS_FS_DELAY does not take effect* | https://github.com/PX4/PX4-Autopilot/issues/23970 | Failsafe delay parameter behaviour (default 1 s) |
| S22 | L1 | Google — *Map Tiles API Policies* | https://developers.google.com/maps/documentation/tile/policies | Explicit prohibition: "Offline uses … Image analysis, Machine interpretation, Object detection or identification, Geodata extraction or resale" |
| S23 | L1 | Google — *Maps Platform Terms of Service* | https://developers.google.com/maps/terms | Prohibits use "with any products, systems, or applications for … any systems or functions for automatic or autonomous control of vehicle behavior" |
| S24 | L1 | Microsoft — *Bing Maps Terms of Use (April 2024)* | https://www.bingmapsportal.com/terms/TermsApril2024 | Bing tiles cannot be cached/stored offline; tile URLs are not stable |
| S25 | L1 | Maxar/Vantor – *Vivid Mosaic 30 cm Basemaps* | https://maxar.com/precision ; https://developers.maxar.com/docs/ordering/guides/vivid-standard-30 | 30 cm global mosaic (135 M km²), 15 cm urban mosaic (7 M km²), AI change detection refresh; ~$25–32/km² archive |
| S26 | L1 | Airbus — *Order Pléiades Neo (30 cm)* | https://space-solutions.airbus.com/imagery/how-to-order-imagery-and-data/how-to-order-pleiades-neo/ | Pléiades Neo 30 cm, OneAtlas tasking; ~€58.50/km² volume tier |
| S27 | L1 | Planet Community — *Commercial imagery pricing* | https://community.planet.com/advanced-analysis-apis-81/commercial-imagery-pricing-4926 | SkySat / PlanetScope pricing tiers |
| S28 | L3 | EOX — *Sentinel-2 cloudless (s2maps.eu)* | https://s2maps.com/ | Free 10 m/px global mosaic; updated annually; CC-BY-NC for non-commercial |
| S29 | L3 | UAV Coach — *GSD calculator* | https://uavcoach.com/gsd-calculator/ | GSD = (alt × sensor_w) / (focal × image_w); validates ~24 cm/px at 1 km AGL with full-frame 24 mm |
| S30 | L2 | Mid-Air dataset (synthetic, quadcopter, IMU + GPS + 420k frames) | https://midair.ulg.ac.be/ | Training-time augmentation candidate (synthetic) |
| S31 | L2 | AgriLiRa4D (LiDAR + 4D radar + IMU, 5–18 m AGL agriculture) | https://arxiv.org/html/2512.01753v1 | Out of altitude band; only useful for SLAM regression baselines |
| S32 | L2 | Survey & comparison of ORB-SLAM3 / VINS-Fusion / DROID-SLAM / RTAB-Map | https://article.isarpublisher.com/viewArticle/Numerical-Evaluation-and-Comparative-Analysis-of-Visual-Inertial-SLAM-Algorithms-ORB-SLAM3-VINS-Fusion-DROID-SLAM-and-RTAB-Map | VIO drift baselines |
| S33 | L3 | nicholasaleks/Damn-Vulnerable-Drone wiki — *GPS Data Injection* | https://github.com/nicholasaleks/Damn-Vulnerable-Drone/wiki/GPS-Data-Injection | Confirms ArduPilot blends GPS sources by quality; security implications of GPS_INPUT |
| S34 | L1 | QGroundControl — *StatusTextHandler / RequestMessageState API* | https://api.qgroundcontrol.com/master/classStatusTextHandler.html | STATUSTEXT pipeline used for companion-computer comms |
| S35 | L4 | mavlink/qgroundcontrol issue #7599 – *Display Companion Status on QGC* | https://github.com/mavlink/qgroundcontrol/issues/7599 | Companion-computer status display gap; ONBOARD_COMPUTER_STATUS workflow |
| S36 | L2 | Bian et al., *ViewBridge: Revisiting Cross-View Localization from Image Matching* (arXiv 2508.10716, 2025) | https://arxiv.org/abs/2508.10716 | CVFM benchmark, 32,509 cross-view pairs, BEV projection + similarity refinement |
| S37 | L2 | OrthoLoC (2025) — UAV-to-orthographic 6-DoF localization with AdHoP refinement | (referenced in cross-view SOTA results) | Compatible with any matcher; ↑95% match quality, ↓63% translation error |
| S38 | L3 | LAND INFO – *Satellite imagery pricing* | https://www.landinfo.com/satellite-imagery-pricing.html | Cross-vendor reference pricing (WV-3/4 30 cm pansharpened: $25.50–32.50/km² archive vs new) |
| S39 | L2 | Cross-view UAV-satellite matching survey (MDPI Sensors 2024) | https://www.mdpi.com/1424-8220/24/12/3719 | RDS 84.40%, MA@20 83.35% — practical accuracy ceiling for cross-view in mostly-nadir setup |
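The GSD formula recorded for S29 can be checked against the ~24 cm/px figure with a short calculation. This is a sketch: the function name is illustrative, the 36 mm sensor width is the standard full-frame width, and the 6252 px image width is the upper end of the nav-camera range stated in the restrictions.

```python
def gsd_m_per_px(alt_m, sensor_w_mm, focal_mm, image_w_px):
    """Ground sample distance: (alt x sensor_w) / (focal x image_w).

    sensor_w_mm and focal_mm share the same unit, so it cancels and the
    result comes out in metres per pixel.
    """
    return (alt_m * sensor_w_mm) / (focal_mm * image_w_px)


# Full-frame sensor (36 mm wide), 24 mm lens, 1 km AGL, 6252 px image width.
gsd = gsd_m_per_px(alt_m=1000.0, sensor_w_mm=36.0, focal_mm=24.0, image_w_px=6252)
footprint_w_m = gsd * 6252  # ground width covered by one frame

print(f"GSD = {gsd * 100:.1f} cm/px, footprint width = {footprint_w_m:.0f} m")
```

The footprint width reduces to `alt × sensor_w / focal`, so at these settings each frame spans about 1.5 km across track regardless of pixel count.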
**Coverage notes**
- Multiple L1/L2 sources for every quantitative AC line (accuracy, MRE, latency, hardware envelope, tile size).
- The Google Maps + Bing Maps offline-prohibition findings have **two L1 sources each** (terms of service + dev-platform AUP).
- The "fixed-wing 1 km AGL with public IMU" gap is a **finding**, not a fixable source — no public dataset matches all four constraints simultaneously.
---
## Mode B (Solution Assessment) sources — appended 2026-04-26
| ID | Tier | Title | URL | Used for |
|----|------|-------|-----|----------|
| S40 | L1 | NVIDIA Jetson AI Lab — *Benchmarks (DINOv2-base-patch14, ViT-base, CLIP-ViT-base)* | https://www.jetson-ai-lab.com/archive/benchmarks.html | Measured Orin Nano Super throughput: DINOv2-base-patch14 = **126 inf/s** (Super), 75 inf/s (original); CLIP-ViT-base/16 = 161 inf/s; ViT-base/16 = 158 inf/s. Real numbers for AnyLoc backbone (W2.a / W9.a). |
| S41 | L1 | ArduPilot — *Non-GPS Position Estimation* (dev docs) | https://ardupilot.org/dev/docs/mavlink-nongps-position-estimation.html | **ODOMETRY is the preferred external-nav method** in ArduPilot (over VISION_POSITION_ESTIMATE and over GPS_INPUT for non-GPS-substitute use). Carries quaternion, velocity, **21-element pos+attitude covariance**, and a `quality` field (-1=failed → 100=best). VISO_QUAL_MIN gates ignored messages. |
| S42 | L1 | ArduPilot PR #19563 – *VisualOdom: Support ODOMETRY mavlink message* | https://github.com/ArduPilot/ardupilot/pull/19563 | ODOMETRY support landed Dec 2021 for the Plane stack as well as Copter; tested with ModalAI VOXL VIO. |
| S43 | L1 | ArduPilot PR #30080 – *External nav+gps fix* | https://github.com/ArduPilot/ardupilot/pull/30080 | Active 2025 work on source-switching when running external nav alongside GPS; confirms there are real edge cases when migrating between GPS_INPUT and ODOMETRY mid-flight. Relevant to AC-NEW-2 (spoofing-promotion latency). |
| S44 | L1 | ArduPilot Plane — *MAVLink2 Signing* | https://ardupilot.org/plane/docs/common-MAVLink2-signing.html | Signing is per-link, USB bypasses signing, keys live in FRAM (32-byte secret + timestamp). Configured via Mission Planner. Production-mature in ArduPilot 4.5+ but key-distribution is an operator step. |
| S45 | L3 | mavlink-router issue #436 – *Stack-based buffer overflow in ConfFile::get_sections* | https://github.com/mavlink-router/mavlink-router/issues/436 | Public, easily-triggered overflow in config-file parsing of mavlink-router. Repo has **no formal security policy / no SECURITY.md**. Direct attack surface for any project that uses mavlink-router on the companion. |
| S46 | L2 | Ali-bey et al., *BoQ: A Place is Worth a Bag of Learnable Queries* (CVPR 2024) | https://arxiv.org/abs/2405.07364 ; https://github.com/amaralibey/bag-of-queries | New VPR SOTA (CVPR 2024); cross-attention over learnable queries; works on CNN + ViT backbones; **outperforms NetVLAD, MixVPR, EigenPlaces** + outperforms two-stage (Patch-NetVLAD, TransVPR, R2Former) at lower cost. DinoV2 results added Nov 2024. |
| S47 | L2 | Izquierdo & Civera, *DINOv2 SALAD: Optimal Transport Aggregation for VPR* (CVPR 2024) | https://serizba.github.io/salad.html ; https://github.com/serizba/salad | DINOv2 + Sinkhorn-based optimal-transport VLAD aggregation; **R@1 75% on MSLS Challenge, 92.2% on MSLS Val, 76% on NordLand**. Already in `aero-vloc` benchmark, so we get an apples-to-apples bench against AnyLoc/MixVPR/EigenPlaces. |
| S48 | L2 | Shen et al., *GIM: Learning Generalizable Image Matcher From Internet Videos* (ICLR 2024 spotlight) | https://arxiv.org/abs/2402.11095 ; https://github.com/xuelunshen/gim ; https://xuelunshen.com/gim | Self-training on 50 h of YouTube videos → **8.4–18.1% relative zero-shot improvement** over LightGlue / RoMa / DKM / LoFTR baselines. ZEB benchmark (zero-shot evaluation). Same architecture, more general training. |
| S49 | L2 | *AerialExtreMatch: A Benchmark for Extreme-View Image Matching and UAV Localization* | https://openreview.net/forum?id=5a5T3IW2B6 | 1.5 M synthetic image-pair benchmark with **32 difficulty levels** (overlap × scale × pitch). Real-world UAV localization subset. Direct measurement of the failure-mode that worries us most. |
| S50 | L2 | *2chADCNN: Template Matching for Season-Changing UAV Aerial Images and Satellite Imagery* (MDPI Drones 2023) | https://www.mdpi.com/2504-446X/7/9/558 | Two-channel CNN trained for cross-season UAV↔satellite matching. Useful both as season-robustness baseline and as a target for the bench-off (does the SOTA matcher really need season-aware training, or do generic GIM/RoMa already win?). |
| S51 | L2 | TartanAir V2 — photorealistic synthetic SLAM dataset | https://tartanair.org/ ; https://tartanair.org/modalities.html | 65 environments, 12-camera rig, IMU + LiDAR + depth + semantic + flow + event modalities, custom camera models (pinhole / fisheye / equirectangular). Photorealistic (AirSim-based). Higher fidelity than MidAir. |
| S52 | L2 | Kim — *Monocular Visual Odometry for Fixed-Wing Small Unmanned Aircraft Systems* (AFIT thesis #2266) | https://scholar.afit.edu/etd/2266 | SOTA monocular VO (SVO, DSO, ORB-SLAM2) tested on real fixed-wing flights — **all three had significant difficulty maintaining localisation**. Confirms VO-only is not viable; the draft's "VO between satellite anchors" framing is the right answer. |
| S53 | L2 | Quan & Cao, *Visual-Inertial Odometry Using High Flying Altitude Drone Datasets* (MDPI Drones 2023) | https://www.mdpi.com/2504-446X/7/1/36 | High-altitude VIO performance numbers for the 300–1000 m AGL band; directly applicable to our 1 km AGL operating band; benchmark baseline for AC-1.3. |
| S54 | L1 | mapproxy issue #196 + maplibre/martin `mbtiles` pool | https://github.com/mapproxy/mapproxy/issues/196 ; https://github.com/maplibre/martin/blob/738c55e9/mbtiles/src/pool.rs | Operational recipe for MBTiles SQLite under concurrent read+write: **WAL mode + connection pool + transaction batching**. Non-WAL MBTiles is the typical reason "MBTiles is slow" complaints exist. |
| S55 | L1 | Python.org — *Free-threaded mode (Python 3.13)* | https://docs.python.org/3.13/howto/free-threading-python.html ; https://py-free-threading.github.io/ | Free-threading is **experimental** in 3.13; has "substantial single-threaded performance hit"; many C extensions don't support it; GIL auto-re-enables on import of non-FT-aware extensions. Not v1-ready. |
| S56 | L2 | Lazarski et al. — *Terrain Analysis in Eastern Ukraine* (Kharkiv-region UAV survey, IEEE 2018) | https://ieeexplore.ieee.org/document/8441556 ; http://www.50northspatial.org/medium-cost-uav-mapping/ | **Eastern-Ukraine relief amplitude ≈ 24 m peak-to-trough** in Kharkiv test areas, with creek + gully (yary) systems. Quantifies the residual error of the flat-Earth ortho assumption (R-Terrain). |
| S57 | L1 | aedelon/mast3r-runtime | https://github.com/aedelon/mast3r-runtime | MASt3R inference runtime: **Jetson Orin support listed as "Planned"**, not implemented. Plus *Speedy MASt3R* paper achieves 91 ms/pair on **A40 GPU** — Jetson Orin Nano Super is roughly 1/30 of A40 throughput, putting MASt3R at ~3 s/pair on our target hardware. |
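The MBTiles recipe noted for S54 (WAL mode plus transaction batching) can be sketched with the standard-library `sqlite3` module. The `tiles` table follows the MBTiles schema; the helper names, file name, and tile payloads are placeholders, and the connection pool from the recipe is omitted for brevity.

```python
import os
import shutil
import sqlite3
import tempfile


def open_mbtiles(path):
    """Open (or create) an MBTiles file with write-ahead logging enabled."""
    conn = sqlite3.connect(path, timeout=5.0)
    # WAL lets readers keep reading while a single writer appends.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        "zoom_level INTEGER, tile_column INTEGER, "
        "tile_row INTEGER, tile_data BLOB)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS tile_index "
        "ON tiles (zoom_level, tile_column, tile_row)"
    )
    conn.commit()
    return conn, mode


def write_batch(conn, tiles):
    """Insert many tiles inside one transaction instead of one per tile."""
    with conn:  # commits on success, rolls back on exception
        conn.executemany("INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", tiles)


tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "cache.mbtiles")

writer, journal_mode = open_mbtiles(db_path)
reader, _ = open_mbtiles(db_path)  # separate connection, as a reader would use

write_batch(writer, [(14, 9500, 5600, b"tile-a"), (14, 9501, 5600, b"tile-b")])
tile_count = reader.execute("SELECT COUNT(*) FROM tiles").fetchone()[0]
print(journal_mode, tile_count)

writer.close()
reader.close()
shutil.rmtree(tmp_dir)
```

Batching matters because each SQLite transaction pays a sync cost; grouping mid-flight tile writes keeps that cost off the per-frame path.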
---
## Mode B Round 2 (component-replacement deep-dive) — appended 2026-04-26
| ID | Tier | Title | URL | Used for |
|----|------|-------|-----|----------|
| S58 | L2 | Yang et al., *LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery* (Remote Sensing 17(19):3349, MDPI, Oct 2025) | https://www.mdpi.com/2072-4292/17/19/3349 ; https://github.com/boyagesmile/LiteSAM | Purpose-built satellite↔aerial matcher. **6.31 M params (2.4× smaller than EfficientLoFTR's 15.05 M); RMSE@30 = 17.86 m on UAV-VisLoc (beats EfficientLoFTR); 61.98 ms / pair on standard GPU; 497.49 ms / pair on Jetson AGX Orin (= 22.9% / 19.8% faster than EfficientLoFTR-optimized).** Components: TAIFormer (token-aggregation transformer with conv token mixer) + MinGRU dynamic sub-pixel refinement. |
| S59 | L1 | leftfield-geospatial/orthority — Python orthorectification toolkit | https://orthority.readthedocs.io/ ; https://github.com/leftfield-geospatial/orthority ; https://pypi.org/project/orthority/ | **Per-image orthorectification** as a Python library (frame + RPC camera models, GeoTIFF DEM, RPC refinement, pan-sharpening). Successor of `dugalh/simple-ortho`. Pip/conda installable; CLI + API. Direct fit for Component 1b's per-frame ortho step (replaces hand-rolled pinhole-on-DEM code). |
| S60 | L2 | Korovko et al. (NVIDIA), *cuVSLAM: CUDA-Accelerated Visual Odometry and Mapping* (arXiv 2506.04359, Jul 2025) | https://arxiv.org/abs/2506.04359 ; https://github.com/nvidia-isaac/cuVSLAM ; https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_visual_slam | NVIDIA's CUDA-accelerated VSLAM, **explicitly optimized for Jetson edge devices**. Modular front-end (Shi-Tomasi GFTT keypoints + LK pyramidal tracking + NCC consistency check) and back-end (sparse bundle adjustment + pose-graph optimization + loop closure). Supports 1 → 32 cameras, monocular + monocular-depth + stereo + multi-stereo, optional IMU. **<1 % ATE on KITTI; <5 cm on EuRoC**, real-time on Orin platforms. Apache-2.0. Drop-in via `isaac_ros_visual_slam` ROS 2 package. |
| S61 | L2 | Liao, *DPVO-QAT++: Heterogeneous QAT and CUDA Kernel Fusion for High-Performance Deep Patch Visual Odometry* (arXiv 2511.12653, Nov 2025) | https://arxiv.org/abs/2511.12653 ; https://arxiv.org/html/2511.12653v1 | Quantization-aware training + CUDA kernel fusion for DPVO front-end (back-end stays FP32). On RTX-4060: **+52% FPS (TartanAir), +30% FPS (EuRoC), −37–65 % peak GPU memory**, ATE preserved. Confirms the "deployment gap" framing: **even DPVO-QAT++ is benchmarked on RTX-4060, NOT on Jetson** — Orin Nano Super extrapolation puts plain DPVO at ≈4–10 FPS (well under our 10 Hz inference target). |
| S62 | L2 | Murai et al. (Imperial College London), *MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors* (CVPR 2025) | https://arxiv.org/abs/2412.12392 ; https://github.com/rmurai0610/MASt3R-SLAM ; https://opencv.org/mast3r-slam/ | Dense monocular SLAM built on MASt3R prior. **15 FPS on a single GPU**; outperforms DROID-SLAM on EuRoC + 7-Scenes; calibration-free. **No Jetson port**; given Speedy MASt3R = 91 ms/pair on A40, MASt3R-SLAM on Orin Nano Super is sub-1-Hz → **infeasible for inline v1 use**. Useful as offline ground-truth oracle or future-track candidate. |
| S63 | L2 | Edstedt et al., *RoMa v2: Harder Better Faster Denser Feature Matching* (arXiv 2511.15706, Nov 2025) | https://arxiv.org/abs/2511.15706 ; https://github.com/Parskatt/romav2 | New SOTA dense matcher: frozen DINOv3 backbone + custom CUDA + predictive covariance + decoupled match-then-refine. Best published pose-estimation accuracy. **Compute footprint is GPU-class**; not a candidate for inline Jetson Orin Nano Super inference, but a plausible offline ceiling reference for the Component-3 bench-off. |
| S64 | L1 | NVIDIA Isaac ROS — *Visual SLAM* (Jetson tutorial + reference implementation) | https://nvidia-ai-iot.github.io/jetson_isaac_ros_visual_slam_tutorial/ ; https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_visual_slam ; https://github.com/bandofpv/VSLAM-UAV | **Reference implementation of GPS-denied UAV with cuVSLAM on Jetson Orin Nano + RealSense D435i + MAVROS + PX4** (Hackster.io / bandofpv). Demonstrates the production-deployable path: cuVSLAM publishes ROS 2 pose; MAVROS converts to MAVLink; FC consumes via VISION_POSITION_ESTIMATE / ODOMETRY. ArduPilot variant exists (sidharthmohannair/ros2-ardupilot-sitl-hardware). |
| S65 | L1 | ArduPilot issue #30076: *Fixing ExternalNav + GPS* | https://github.com/ArduPilot/ardupilot/issues/30076 | **EKF3 incorrectly fuses GPS data simultaneously when ExtNav is the configured POSXY source** — root cause was a stray `else` branch in `FuseVelPosNED()`. Causes "unstable positions with high variances and reset behavior when position estimates diverge". Documents the **double-fusion-is-not-a-feature** invariant for our hybrid `GPS_INPUT + ODOMETRY` plan. Status: PR landed; pin ArduPilot to a fixed version. |
| S66 | L1 | ArduPilot issue #32506: *EKF3 Position Down snaps to ODOMETRY Z value when ExternalNav is not configured as POSZ source* | https://github.com/ArduPilot/ardupilot/issues/32506 | Sister bug to #30076: Z-axis snap-to-ODOMETRY when only POSXY uses ExtNav. Reinforces the "**only one horizontal position source active at a time**" architectural invariant — feeding both GPS_INPUT and ODOMETRY for the same axis is a configuration error, not a feature. Has a direct impact on draft02's M-1 conclusion. |
| S67 | L1 | ArduPilot wiki — *EKF Sources* (`common-ekf-sources.rst`) | https://github.com/ArduPilot/ardupilot_wiki/blob/master/common/source/docs/common-ekf-sources.rst | Authoritative spec for `EK3_SRC1_*` / `EK3_SRC2_*` / `EK3_SRC3_*` and runtime source switching via RC aux or MAVLink. Confirms architectural rule: **only one position source per axis at a time**; ExtNav is option 6. |
| S68 | L1 | PX4 PR #22262: *EKF2: Error-State Kalman Filter* | https://github.com/PX4/PX4-Autopilot/pull/22262 | Confirms PX4 EKF2 is an **ESKF** (in contrast to ArduPilot's EKF3 which is a classical extended Kalman filter). Real-hardware PX4 testing: ESKF reduces CPU load by ~0.3 % vs total-state EKF on autopilot. Key takeaway: **ArduPilot users (us) cannot swap the FC filter to ESKF** — the FC-side debate is moot. ESKF only matters for any companion-side filter we choose to add. |
| S69 | L2 | Solà, *Quaternion kinematics for the error-state Kalman filter* (arXiv 1711.02508) + Madgwick / Solà / Forster references | https://arxiv.org/abs/1711.02508 | Canonical ESKF treatment: nominal + error-state decomposition, tangent-space covariance, retraction through `Exp/Log` on SO(3) / SE(3). The standard reference for any companion-side ESKF implementation. |
| S70 | L2 | Yu et al., *T-ESKF: Transformed Error-State Kalman Filter for Consistent Visual-Inertial Navigation* (arXiv 2510.23359, Oct 2025) + *Adaptive Covariance and Quaternion-Focused Hybrid ESKF/UKF for VIO* (arXiv 2512.17505, Dec 2025) | https://arxiv.org/abs/2510.23359 ; https://arxiv.org/abs/2512.17505 | 2025 advances on top of ESKF: T-ESKF restores observability consistency under partial-yaw observability; Hybrid ESKF/UKF gains **+49 % position / +57 % rotation accuracy vs pure ESKF, ~48 % cheaper than full SUKF**. Both are research-track; for v1, if we run a companion-side filter at all, vanilla ESKF is enough. |
| S71 | L1 | OpenStreetMap Sensors / VINS-Fusion + OpenVINS Jetson Orin Nano integration reports | https://github.com/HKUST-Aerial-Robotics/VINS-Fusion/issues/220 ; https://github.com/rpng/open_vins/issues/421 ; https://github.com/fdcl-gwu/openvins_jetson_realsense | Field reports: VINS-Fusion runs ~15 FPS on Xavier NX after OpenCV pinning; on Orin Nano builds with JetPack 6 + ROS 2 Humble after fixing OpenCV ArUco/CUDA mismatches. Useful as **comparison baselines for any cuVSLAM bench-off, not as primary candidates** (integration cost dwarfs cuVSLAM's drop-in). |
| S72 | L2 | Quan et al., *Visual-Inertial Odometry Using High Flying Altitude Drone Datasets* (Drones 7(1):36, MDPI 2023) | https://www.mdpi.com/2504-446X/7/1/36 | High-altitude (40–100 m) VIO field tests: **stereo-VIO = 2.186 m error over 800 m trajectory; monocular VIO "acceptable but worse than stereo"**. Lower bound on the altitude band; our regime is 1 km AGL where motion-parallax VO degrades further (most VO benchmarks assume non-trivial parallax per frame). Reinforces R8. |
| S73 | L2 | Princeton VL — *Deep Patch Visual SLAM* (DPV-SLAM, ECCV 2024) | https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00272.pdf ; https://github.com/iis-esslingen/DPV-SLAM | DPV-SLAM = DPVO + two loop-closure mechanisms. **2.5× faster than DROID-SLAM on EuRoC, 5–7 GB GPU memory vs DROID's 24 GB**, 1×–4× real-time on real-world datasets. Same Jetson-deployment caveat as DPVO. |
| S74 | L2 | OrthoLoC + AdHoP — UAV-to-orthographic 6-DoF localization with feature-matcher refinement | (referenced in cross-view SOTA results) | Compatible with **any matcher** (drop-in refinement layer): up to **+95 % matching accuracy / −63 % translation error**. Architecturally orthogonal to the matcher choice itself; we can layer this on top of SP+LG / GIM-LG / LiteSAM regardless of which wins the bench-off. |
| S75 | L2 | AerialExtreMatch open-review (1.5 M synthetic pairs, 32 difficulty levels) — methods evaluated table | https://openreview.net/forum?id=5a5T3IW2B6 ; https://github.com/Xecades/AerialExtreMatch | Confirms AerialExtreMatch evaluates **16 representative matchers** (detector-based + detector-free), with publicly-available results. Becomes our primary structured-difficulty regression bench (already in draft02 as F-T5b). |
| S76 | L4 | Stack Overflow / Jetson dev forum — *Orin Nano FP16/INT8 throughput discussion* | https://forums.developer.nvidia.com/t/jetson-orin-nano-fp16-int8-performance/326723 ; https://github.com/ultralytics/ultralytics (YOLO26 Jetson Orin Nano Super benchmark commit 8d4e6e8) | Empirical reference points on Orin Nano Super: **FP16 ≈ 4.5 ms / INT8 ≈ 3.8 ms per YOLO26-n inference**. Useful sanity-check rate: small TRT engines run in single-digit ms; SP+LG / GIM-LG family fits comfortably in our budget. |
| S77 | L2 | thomasthelliez.com — *ROS 2 / Isaac ROS on Jetson Orin Nano Super practical guide* + Hackster.io GPS-Denied Drone reference design | https://thomasthelliez.com/blog/isaac-ros-on-nvidia-jetson-orin-nano-super/ ; https://www.hackster.io/bandofpv/gps-denied-drone-with-nvidia-jetson-orin-nano-9f3417 | **ROS 2 Humble + JetPack 6 + Isaac ROS 3.2 + cuVSLAM + MAVROS** is a working reference architecture on the exact target hardware (Orin Nano Super). Establishes ROS 2 vs DIY Python orchestrator as a real alternative for Component 9. |
+501
@@ -0,0 +1,501 @@
# Fact Cards — Phase 1 (AC & Restrictions Assessment)
Each fact card: statement, source(s), confidence (High / Med / Low), audience.
---
## A — Position accuracy state of the art
**F-A1**. State-of-the-art UAV cross-view visual localization (drone image vs. ortho satellite map) at low altitude (30–300 m, multi-view, oblique allowed) achieves **74.1% recall@5 m** on the AnyVisLoc benchmark (best combined retrieval + matching + PnP).
- Source: S02 (AnyVisLoc paper, 2025).
- Confidence: High. Audience: implementer / decision-maker.
**F-A2**. **Cross-view image matching benchmarks** report Relative Distance Score (RDS) up to 84.40% and **MA@20 (matched within 20 m) up to 83.35%** in nadir-favoring setups — i.e., 80%+ within 20 m is achievable with current methods on similar reference data.
- Source: S39.
- Confidence: Med. Audience: implementer / decision-maker.
**F-A3**. The most relevant **fixed-wing aerial public benchmark** is UAV-VisLoc (6,742 drone images, fixed-wing & multi-rotor, **altitudes 405–840 m**, ortho satellite reference at **0.3 m/px** from Google Earth, 11 sites in China incl. cities/towns/farms/rivers/hills/forests).
- Source: S01.
- Confidence: High. Audience: implementer.
**F-A4**. **AerialVL** (RA-L 2024) is a fixed-wing UAV dataset with 11 sequences / ~70 km of trajectory, RGB camera with **gimbal**, NovAtel GNSS at **1.5 m RMS** ground truth, and reference satellite map. Provides VPR + visual alignment + VO baselines.
- Source: S03.
- Confidence: High. Audience: implementer.
**F-A5**. The **viewpoint discrepancy** (oblique aerial vs. nadir satellite) and **temporal staleness** (seasonal / construction change) are the two dominant accuracy degraders cited across cross-view localization literature. ViewBridge (2025), OrthoLoC (2025), and AnyVisLoc all emphasise BEV projection or 3D-grounded matching as mitigation.
- Source: S02, S36, S37.
- Confidence: High. Audience: technical expert.
**F-A6**. Confidence-score schemes used by mature visual localization stacks: (a) RANSAC inlier ratio after PnP/homography; (b) reprojection error variance; (c) top-K retrieval similarity gap; (d) 6-DoF pose covariance from EKF/factor-graph optimization; (e) photometric consistency vs. tile.
- Source: S03, S04, S32 (and ORB-SLAM3 lit).
- Confidence: High. Audience: implementer.
---
## B — Image registration & feature matching
**F-B1**. **SuperPoint + LightGlue** with TensorRT runs at ~286 FPS on RTX 3080 at 320×240. SuperPoint ≈ 0.95 ms, LightGlue ≈ 2.54 ms per pair on RTX 3080.
- Source: S11.
- Confidence: High. Audience: implementer.
**F-B2**. **Jetson Orin NX (sibling SoC)** has a working LightGlue+TensorRT deployment (CUTLASS FlashAttention V2 plugin, qdLMF repo) — confirms feasibility on Jetson Orin-class hardware. No publicly released benchmark exists for Jetson Orin Nano Super specifically.
- Source: S12.
- Confidence: Med. Audience: implementer.
**F-B3**. **XFeat** (CVPR 2024) is **5× faster** than LightGlue / SuperPoint while maintaining comparable accuracy; runs in real-time on a budget CPU (i5-1135G7); offers semi-dense matching mode; C++ + CUDA 12.2 implementations available.
- Source: S08.
- Confidence: High. Audience: implementer.
**F-B4**. **MASt3R** (ECCV 2024) achieves +30% absolute VCRE AUC on Map-free localization vs. prior SOTA — valuable for cross-view UAV/satellite due to its 3D-grounded matching, but is **heavier** (transformer with depth backbone) than LightGlue/XFeat — may exceed Jetson Orin Nano Super 8 GB envelope under the user's latency budget without aggressive distillation/quantization.
- Source: S09.
- Confidence: Med. Audience: technical expert.
**F-B5**. **Mean Reprojection Error <1 px** is a tight but achievable target for *homography-fit* on overlapping aerial pairs; for full-PnP across UAV–satellite the typical achieved MRE is 1–3 px on cross-view benchmarks (heavily dependent on pixel scale ratio between drone and satellite).
- Sources: S01 (UAV-VisLoc), S03 (AerialVL), S36 (ViewBridge).
- Confidence: Med. Audience: technical expert.
---
## C — Resilience & re-localization
**F-C1**. **Aerial VPR survey + aero-vloc benchmark** (2024) provides a unified evaluation framework over AnyLoc, CosPlace, EigenPlaces, MixVPR, NetVLAD, SALAD, SelaVPR with re-ranking via LightGlue/SuperGlue. Datasets used: VPAir, ALTO, MARS-LVIG.
- Source: S04.
- Confidence: High. Audience: implementer.
**F-C2**. **AnyLoc** (DINOv2 + unsupervised VLAD) achieves up to 4× higher Recall@1 than environment-specialised approaches across urban / aerial / underwater / subterranean **without training**. Strong default for cross-view re-localization when training data is limited.
- Source: S05.
- Confidence: High. Audience: implementer / decision-maker.
**F-C3**. **MixVPR**: 94.6% R@1 on Pitts250k with <50% the parameter count of NetVLAD — best lightweight VPR aggregation in 2023–2024.
- Source: S06.
- Confidence: High. Audience: implementer.
**F-C4**. **Tile-zoom / overlap selection** when constructing the satellite reference map is a **critical** parameter for VPR efficiency and accuracy in aerial domain (per the 2024 survey).
- Source: S04.
- Confidence: High. Audience: implementer.
---
## D — Onboard real-time performance on Jetson Orin Nano Super
**F-D1**. **Jetson Orin Nano Super** (with JetPack 6.2 "Super Mode"): **67 TOPS sparse INT8** AI performance, 8 GB shared LPDDR5, supports **15 W / 25 W / MAXN SUPER** power modes. The 25 W mode is the new "reference" performance mode.
- Source: S14, S15.
- Confidence: High. Audience: implementer / decision-maker.
**F-D2**. **Sustained-load thermal throttling** is real on Jetson family — earlier-gen Xavier NX (21 TOPS) throttled within 5 minutes at 640×480 YOLOv8n. Orin Nano Super is reportedly more thermally efficient but **8-hour sustained 25 W requires forced-air cooling and possibly active heatsink** — not solvable purely in software.
- Source: S14, S15 + practitioner test S14.
- Confidence: Med. Audience: implementer / decision-maker.
**F-D3**. **MAXN SUPER** is uncapped; if power exceeds TDP the module auto-throttles. For sustained 8 h flight on a fixed-wing UAV with ~25 W power budget, **the system MUST be sized to fit the 25 W envelope at 100% duty**, not MAXN.
- Source: S14.
- Confidence: High. Audience: implementer.
**F-D4**. Naive scaling from RTX 3080 → Orin Nano Super for SuperPoint+LightGlue gives a ~30–40× slowdown (RTX 3080 ≈ 30 TFLOPS FP16, Orin Nano Super ≈ 1 TFLOPS FP16-class). At 320×240: ≈ 3.5 ms × 35 ≈ **~120 ms/pair on Jetson Orin Nano Super**. Pre-running matching on a downsampled image (e.g., 1024×683 from 6200×4100) is feasible within the **400 ms p95 budget** when combined with feature caching for the satellite tile.
- Source: derived from S11, S14 (back-of-envelope; needs empirical confirmation in Phase 2).
- Confidence: Low. Audience: technical expert.
- **Action**: Empirical benchmark on actual Jetson Orin Nano Super in implementation phase.
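The back-of-envelope scaling in F-D4 can be reproduced with a minimal sketch. It assumes the per-stage RTX 3080 timings from F-B1 and a slowdown factor of 30 to 40; the function name and the idea of a single scalar slowdown are illustrative simplifications, not a measured model.

```python
# Back-of-envelope latency scaling for F-D4 (illustrative only; needs
# empirical confirmation on real hardware, as the card itself notes).

P95_BUDGET_MS = 400.0                     # latency budget quoted in F-D4
RTX3080_PAIR_MS = 0.95 + 2.54             # SuperPoint + LightGlue per pair (F-B1)


def scaled_pair_latency_ms(ref_latency_ms: float, slowdown: float) -> float:
    """Scale a reference latency by a single assumed slowdown factor."""
    return ref_latency_ms * slowdown


if __name__ == "__main__":
    for slowdown in (30, 35, 40):         # assumed RTX 3080 -> Orin Nano Super range
        est = scaled_pair_latency_ms(RTX3080_PAIR_MS, slowdown)
        headroom = P95_BUDGET_MS - est
        print(f"slowdown {slowdown}x: ~{est:.0f} ms/pair, headroom {headroom:.0f} ms")
```

The estimate applies to 320×240 inputs only; larger inputs would need their own reference timing before the same scaling is applied.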
---
## E — Satellite imagery sourcing & legality
**F-E1**. **Google Maps / Map Tiles API** explicitly prohibits offline use, image analysis, machine interpretation, object detection, geodata extraction, and "any systems or functions for automatic or autonomous control of vehicle behavior". **Use of Google Maps satellite tiles for an offline UAV navigation system violates the Terms of Service.**
- Sources: S22 (Map Tiles API Policies), S23 (Maps Platform ToS).
- Confidence: **High** (two L1 sources, explicit language). Audience: decision-maker / legal.
- **Severity**: Hard blocker — must be resolved before solution design.
**F-E2**. **Bing Maps** also prohibits creating local copies / offline storage of tiles. Tile URLs are not stable; the supported access pattern is dynamic REST queries per session. **Bing tiles are not a viable offline reference source either.**
- Source: S24.
- Confidence: High. Audience: decision-maker.
**F-E3**. **Maxar Vivid Mosaic** offers a **30 cm global basemap** (135 M km², ex-Antarctica) and a **15 cm urban basemap** (7 M km²), **continuously refreshed** with AI-driven change detection. Pricing for archive imagery is approximately **$25–32 / km²** for similar 30 cm products. **Licensing for offline tactical use must be negotiated explicitly with Maxar (Vantor)** — this is the standard path for defense customers.
- Sources: S25, S38.
- Confidence: High. Audience: decision-maker.
**F-E4**. **Airbus Pléiades Neo** provides 30 cm via OneAtlas; volume pricing approximately **€58.50 / km²** on a 6-month sliding window. Direct competitor to Maxar at sub-meter resolution.
- Sources: S26, S27.
- Confidence: High. Audience: decision-maker.
**F-E5**. **Sentinel-2 cloudless** (EOX) provides a **free** global mosaic but at **10 m/px** — well below the AC requirement of 0.5 m/px (ideally 0.3 m/px). At **1 km AGL** Sentinel-2 is too coarse to achieve registration with a 24 cm/px drone image without massive scale-bridging losses.
- Source: S28 + S01 (drone GSD).
- Confidence: High. Audience: implementer.
**F-E6**. For **Eastern/Southern Ukraine** specifically, Sentinel-2 / Sentinel-1 are heavily used in 2022+ academic literature for damage / change detection. **Maxar and Planet are the de-facto sources for sub-meter imagery** of Ukraine. Recent satellite imagery for this region is operationally sensitive but commercially available.
- Source: S28 (Ukraine 2024–2025 references in the EOX/Sentinel papers).
- Confidence: High. Audience: decision-maker.
**F-E7**. **Active-conflict-region staleness** is a real risk: dam destruction (Kakhovka), urban damage, cratering, road realignment, smoke/dust — all can defeat cross-view matching against pre-conflict imagery. **Imagery freshness budget should be tightened from "<2 years" to "<6 months for active sectors, <12 months for stable rear areas"** — to be confirmed with operations.
- Source: S28 + extrapolation from change-detection literature for Ukraine.
- Confidence: Med. Audience: decision-maker.
---
## F — Camera & GSD
**F-F1**. **GSD formula**: GSD (cm/px) = (Altitude_m × 100 × Sensor_w_mm) / (Focal_mm × Image_w_px).
For a typical full-frame sensor (36 mm wide) with 24 mm wide-angle lens at **1 km AGL** and **6200 px** wide image: GSD ≈ **24 cm/px**, frame footprint ≈ **1.49 km × 0.99 km**. Drone images at 0.1–0.2 m/px (per UAV-VisLoc) are consistent with this.
- Source: S29.
- Confidence: High. Audience: implementer.
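The GSD arithmetic in F-F1 can be checked with a minimal sketch. The image height of 4100 px is an assumption taken from the 6200×4100 frame size mentioned in F-D4; function names are illustrative.

```python
def gsd_cm_per_px(altitude_m: float, sensor_w_mm: float,
                  focal_mm: float, image_w_px: int) -> float:
    """Ground sample distance in cm/px for a nadir-pointing camera (F-F1 formula)."""
    return (altitude_m * 100.0 * sensor_w_mm) / (focal_mm * image_w_px)


def footprint_km(gsd_cm: float, image_w_px: int, image_h_px: int) -> tuple[float, float]:
    """Ground footprint (width_km, height_km) of one frame at the given GSD."""
    metres_per_px = gsd_cm / 100.0
    return (image_w_px * metres_per_px / 1000.0,
            image_h_px * metres_per_px / 1000.0)


if __name__ == "__main__":
    # 36 mm sensor, 24 mm lens, 1 km AGL, 6200 px wide; 4100 px high is assumed.
    gsd = gsd_cm_per_px(altitude_m=1000, sensor_w_mm=36, focal_mm=24, image_w_px=6200)
    width_km, height_km = footprint_km(gsd, 6200, 4100)
    print(f"GSD = {gsd:.1f} cm/px, footprint = {width_km:.2f} km x {height_km:.2f} km")
```

Using the unrounded GSD gives a footprint width of 1.50 km; the 1.49 km in the card follows from rounding the GSD to 24 cm/px first.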
**F-F2**. **Camera intrinsics calibration is mandatory** — without known focal length, principal point, and distortion, sub-pixel MRE is impossible. Pre-flight checkerboard calibration is the standard; some payloads use factory-cal + temperature compensation.
- Source: photogrammetry consensus (S01, S03, S29).
- Confidence: High. Audience: implementer.
---
## G — MAVLink / MAVSDK / flight controller integration
**F-G1**. **GPS_INPUT** is a standard MAVLink message. **ArduPilot**: set `GPS1_TYPE=14` (MAVLink) and the autopilot will accept GPS_INPUT as the primary GPS. **PX4**: native GPS_INPUT support is limited; the standard workaround is to publish via VISION_POSITION_ESTIMATE through the EKF2 vision-pose pipeline.
- Sources: S16, S17, S18.
- Confidence: High. Audience: implementer / decision-maker.
**F-G2**. **MAVSDK-Python does NOT natively support GPS_INPUT** (open issue #320). For Python implementations, **pymavlink** must be used to emit raw GPS_INPUT messages.
- Source: S18.
- Confidence: High. Audience: implementer.
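As a hedged illustration of what a raw GPS_INPUT message needs, the sketch below prepares the message fields using only the standard library. Field names and units follow the MAVLink common message definition (lat/lon in degE7, GPS week plus milliseconds into the week). The 18 s GPS-UTC offset, the placeholder HDOP/VDOP and satellite count, and the choice to flag velocity as ignored are assumptions for this example.

```python
from datetime import datetime, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
GPS_UTC_LEAP_SECONDS = 18      # assumption: GPS-UTC offset in effect since 2017
SECONDS_PER_WEEK = 604_800

# GPS_INPUT_IGNORE_FLAGS bits for fields this sketch does not provide.
IGNORE_VEL_HORIZ, IGNORE_VEL_VERT, IGNORE_SPEED_ACCURACY = 8, 16, 32


def gps_week_and_ms(utc: datetime) -> tuple[int, int]:
    """Convert an aware UTC datetime to (GPS week, milliseconds into the week)."""
    gps_seconds = (utc - GPS_EPOCH).total_seconds() + GPS_UTC_LEAP_SECONDS
    week, sec_of_week = divmod(gps_seconds, SECONDS_PER_WEEK)
    return int(week), int(round(sec_of_week * 1000))


def gps_input_fields(utc: datetime, lat_deg: float, lon_deg: float, alt_m: float,
                     horiz_acc_m: float, vert_acc_m: float) -> dict:
    """Build GPS_INPUT fields for a position-only fix (velocity flagged as ignored)."""
    week, week_ms = gps_week_and_ms(utc)
    return {
        "time_usec": int(utc.timestamp() * 1e6),
        "gps_id": 0,
        "ignore_flags": IGNORE_VEL_HORIZ | IGNORE_VEL_VERT | IGNORE_SPEED_ACCURACY,
        "time_week_ms": week_ms,
        "time_week": week,
        "fix_type": 3,                          # 3D fix
        "lat": int(round(lat_deg * 1e7)),       # degE7
        "lon": int(round(lon_deg * 1e7)),       # degE7
        "alt": alt_m,                           # metres
        "hdop": 1.0, "vdop": 1.0,               # placeholder values (assumption)
        "vn": 0.0, "ve": 0.0, "vd": 0.0,        # ignored via ignore_flags
        "speed_accuracy": 0.0,                  # ignored via ignore_flags
        "horiz_accuracy": horiz_acc_m,
        "vert_accuracy": vert_acc_m,
        "satellites_visible": 10,               # placeholder value (assumption)
    }
```

With pymavlink, a dictionary like this would typically be passed to the generated `gps_input_send` method of a connection's `mav` object, for example `conn.mav.gps_input_send(**fields)`; the exact keyword handling should be checked against the installed dialect.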
**F-G3**. ArduPilot can **blend or switch between GPS sources** by quality (sat count, HDOP). If the legitimate (jammed) GPS keeps reporting plausible values while the spoofed/denied state is intermittent, the autopilot may oscillate between sources. **The companion computer must explicitly disable / lower-quality the real GPS** (or the autopilot must be configured to *only* trust GPS_INPUT) to avoid this.
- Source: S33.
- Confidence: High. Audience: implementer / security architect.
**F-G4**. **PX4 has GPS spoofing detection** baked into the EKF2 driver chain (u-blox spoof flag, ~1 s hysteresis, GNSS-fusion auto-disable on consistent spoof signal). This is a useful upstream signal for the GPS-Denied system to know "you are now the primary source".
- Sources: S19, S20.
- Confidence: High. Audience: implementer / security architect.
**F-G5**. **PX4 failsafe delay** `COM_POS_FS_DELAY` defaults to **1 s**; `EKF2_NOAID_TOUT` controls dead-reckoning validity. Documented bugs exist (#23970) — version pinning matters.
- Source: S21.
- Confidence: Med. Audience: implementer.
**F-G6**. **QGroundControl** has only **STATUSTEXT** (string) as a first-class companion-computer message channel; ONBOARD_COMPUTER_STATUS (planned) and custom MAVLink messages (NAMED_VALUE_FLOAT/INT, custom dialect) are practical channels for re-localization request UI / confidence scores.
- Sources: S34, S35.
- Confidence: High. Audience: implementer.
---
## H — Object localization (AI camera, gimbal-only pose)
**F-H1**. **Trigonometric ground projection error** with **gimbal-angle-only** (no airframe IMU attitude fusion onto AI cam) is dominated by the **unknown UAV roll/pitch** at the moment of capture. For a fixed-wing UAV, typical roll/pitch in straight cruise is ±2°; in turns up to ±25°. At **1 km AGL**, a 5° unknown attitude → ~87 m ground-position error. **The AC "object localization accuracy is consistent with frame-center accuracy" is therefore unrealistic without attitude fusion in turning flight.**
- Source: derived from F-F1 + standard photogrammetry trig.
- Confidence: High. Audience: technical expert / decision-maker.
- **Action**: revise AC to "consistent with frame-center accuracy in level flight; expect ±h·tan(unknown_attitude) in turns" OR add attitude fusion onto AI cam.
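The ±h·tan(unknown_attitude) bound in F-H1 can be tabulated with a minimal sketch. The three angles are the cruise, worked-example, and turning values quoted in the card; the function name is illustrative.

```python
import math


def ground_error_m(agl_m: float, unknown_attitude_deg: float) -> float:
    """Ground-position error from an unmodelled roll/pitch angle over flat terrain."""
    return agl_m * math.tan(math.radians(unknown_attitude_deg))


if __name__ == "__main__":
    for angle_deg in (2, 5, 25):          # cruise, worked example, turning flight
        err = ground_error_m(1000.0, angle_deg)
        print(f"{angle_deg:>2} deg unknown attitude at 1 km AGL -> {err:.0f} m")
```

The error grows faster than linearly with angle, which is why turning flight dominates the budget.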
**F-H2**. **Flat-terrain assumption** is reasonable for eastern/southern Ukraine (typical relief amplitude ~50–150 m over 10 km). At 1 km AGL with up to 5° gimbal off-nadir, terrain-induced ground-projection error from flat-terrain assumption is typically <30 m for level flight — within the AC envelope. Riverbanks, tall buildings, and reservoir scarps are local exceptions.
- Source: derived from S26 + S28 + Ukraine relief data.
- Confidence: Med. Audience: technical expert.
---
## I — Hardware envelope & power
**F-I1**. Jetson Orin Nano Super in 25 W mode: ~25 W average; with cooling adequately sized for 8-hour duty, sustained throttling can be avoided. Without active cooling, expect throttling within minutes (Xavier NX precedent).
- Source: S14, S15.
- Confidence: Med. Audience: implementer.
**F-I2**. **Storage budget**: User's "~10 GB" estimate for a 400 km² @ 0.3 m/px tile cache is **correct** (400 km² × 11 px²/m² with 3-byte JPEG ≈ 10–13 GB). Persistent cache across flights is feasible with a small NVMe (≥64 GB).
- Source: arithmetic; cross-checked S25 (Vivid pricing per km²).
- Confidence: High. Audience: implementer / decision-maker.
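The storage arithmetic in F-I2 can be reproduced with a minimal sketch. It takes the card's 3 bytes per pixel at face value; real compressed tiles would likely be smaller, so the result is best read as an upper-end estimate. The function name is illustrative.

```python
def tile_cache_gb(area_km2: float, gsd_m_per_px: float, bytes_per_px: float) -> float:
    """Approximate tile-cache size in GB (10^9 bytes) for a given area and GSD."""
    pixels = area_km2 * 1e6 / gsd_m_per_px ** 2   # 1 km^2 = 10^6 m^2
    return pixels * bytes_per_px / 1e9


if __name__ == "__main__":
    size = tile_cache_gb(area_km2=400, gsd_m_per_px=0.3, bytes_per_px=3)
    print(f"400 km^2 @ 0.3 m/px, 3 B/px -> ~{size:.1f} GB")
```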
---
## J — Failsafe & resilience
**F-J1**. PX4's own GPS-loss failsafe defaults to ~1 s delay. A reasonable upstream **"system fails to produce an estimate" failsafe `N`** for the GPS-Denied system is **3–5 seconds** — long enough to ride out one sharp turn / re-localization attempt without flapping, short enough to let the flight controller switch to IMU dead reckoning before drift exceeds tens of metres.
- Source: S21 + practitioner heuristic.
- Confidence: Med. Audience: implementer / decision-maker.
---
## K — Public datasets for IMU / aerial dev & test
**F-K1**. **No public dataset perfectly matches all four constraints**: fixed-wing + ~1 km AGL + downward-facing + synchronized IMU + GPS truth. **Closest match is AerialVL** (fixed-wing + gimbal RGB + GNSS, ~70 km of tracks, 11 sequences, RA-L 2024). Altitude band for AerialVL is "different altitudes" (not always 1 km).
- Source: S03.
- Confidence: High. Audience: implementer.
**F-K2**. **UAV-VisLoc** is the largest fixed-wing **drone-vs-satellite localization** dataset (6,742 images, 405–840 m altitudes, 0.3 m/px Google Earth reference) — but it does not provide synchronized IMU.
- Source: S01.
- Confidence: High. Audience: implementer.
**F-K3**. **MidAir** (synthetic, quadcopter) provides full IMU + GPS + depth + semantic at low altitude. Good for **training-time augmentation** but not real-world testing for fixed-wing at 1 km AGL.
- Source: S30.
- Confidence: High. Audience: implementer.
**F-K4**. **Recommended dev/test stack**: AerialVL (primary real-world fixed-wing) + UAV-VisLoc (visual-localization-only validation at 1 km-neighborhood altitude) + MidAir (synthetic IMU augmentation) + the user's own 65 input-data photos for sanity / regression. Real IMU from a dedicated test flight should still be planned for system V&V.
- Source: synthesis of S01, S03, S30.
- Confidence: High. Audience: decision-maker.
---
## L & M — Restriction & AC gaps / contradictions
**F-LM1**. **Restriction "up to 3000 photos per flight"** is **inconsistent** with the stated 8-hour endurance × 3 fps = **86,400 photos** and with the 500 ms minimum interval × 8 h = 57,600 photos. Likely interpretations:
(a) On-disk **retention** budget (sub-sample for storage).
(b) Imagery for an *individual mission segment* (~17 min × 3 fps = 3,000), not the full sortie.
(c) A stale value carried over from a Mavic 3 attempt that should be updated.
- **Hard contradiction**: needs user resolution before solution sizing.
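The photo-count contradiction in F-LM1 is plain arithmetic, shown here as a minimal sketch. Function names are illustrative.

```python
def photos_at_rate(endurance_h: float, fps: float) -> int:
    """Photos captured over a flight at a constant frame rate."""
    return round(endurance_h * 3600 * fps)


def photos_at_interval(endurance_h: float, interval_s: float) -> int:
    """Photos captured over a flight at a constant capture interval."""
    return round(endurance_h * 3600 / interval_s)


def minutes_until_limit(photo_limit: int, fps: float) -> float:
    """Flight time in minutes after which the photo limit is reached."""
    return photo_limit / fps / 60


if __name__ == "__main__":
    print(photos_at_rate(8, 3))                   # 8 h at 3 fps
    print(photos_at_interval(8, 0.5))             # 8 h at one photo per 500 ms
    print(f"{minutes_until_limit(3000, 3):.1f}")  # minutes to reach 3000 photos at 3 fps
```

At 3 fps the 3000-photo limit is reached in under 17 minutes, which is what motivates interpretation (b).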
**F-LM2**. **Camera resolution range "FullHD to 6252×4168"** is wide (~13× pixel-count delta). Per-frame pipeline cost scales with resolution; AC compliance is camera-dependent. Need to lock the **target camera spec** for AC validation.
**F-LM3**. **Latency 400 ms vs. cycle 333 ms (3 fps)**: the user has confirmed `<400 ms p95` with skip-allowed. This is **internally consistent**; the AC should be re-stated as "p95 latency <400 ms; up to ~10% of frames may be dropped under sustained load" to remove the apparent contradiction with frame rate.
**F-LM4**. **Suggested missing AC** (gap analysis):
- **L2** — Time-to-first-fix on cold start / mid-flight reboot (e.g., <30 s after IMU-extrapolated init).
- **L3** — Spoofing-promotion latency (system asserts its estimate over flight controller GPS within X seconds of denial).
- **L4** — Flight-data-recorder requirement (all photos + estimates + confidence + IMU traces at full rate, retained in non-volatile storage with a budgeted size cap).
- **L5** — False-position safety budget (e.g., probability of an estimate >500 m from truth must be <0.1% per flight).
- **L6** — Operational temperature / vibration envelope (MIL-STD-810 lite or RTCA DO-160G low-altitude variant).
- **L7** — Imagery freshness operationally enforced (e.g., reject tiles older than 12 months for active sectors).
**F-LM5**. **Restriction "Google Maps allowed"** is **not legally permissible** per F-E1/F-E2. The project must change source to a license-cleared provider (Maxar Vivid / Airbus Pléiades / commissioned tasking / government feed) before deployment. **This is a blocker, not a tweak.**
---
## Mode B Findings — adversarial assessment of `solution_draft01.md` (2026-04-26)
**M-1 (Component 6 / AC-4.3) — ODOMETRY is ArduPilot's preferred external-nav channel, not GPS_INPUT.** ArduPilot's own dev docs (S41) call **ODOMETRY "the preferred method"** for sending external position estimates to EKF3, ahead of both VISION_POSITION_ESTIMATE and GPS_INPUT. ODOMETRY carries quaternion + 3-D linear velocity + a **21-element pos+attitude covariance** (incl. native yaw error) + a `quality` field (-1=failed, 0=unset, 1..100). On the FC side, `VISO_QUAL_MIN` sets the quality threshold below which messages are ignored. GPS_INPUT collapses our 6-DoF covariance into a scalar `h_acc` / `v_acc`, which directly under-reports our yaw covariance and under-utilises the FC's EKF3. The draft's GPS_INPUT-only choice is sub-optimal for AC-NEW-4 (false-position safety) covariance fidelity.
- Source: S41, S42, S43.
- Confidence: ✅ High.
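The covariance-fidelity argument in M-1 can be made concrete with a minimal sketch. It assumes the MAVLink ODOMETRY convention of a row-major upper-right triangle over the states x, y, z, roll, pitch, yaw. The reduction to two scalars is one plausible way to fill the GPS_INPUT accuracy fields, not a prescribed mapping.

```python
import math


def pack_pose_covariance(cov6x6: list[list[float]]) -> list[float]:
    """Flatten a 6x6 pose covariance into the 21-element row-major upper triangle."""
    if len(cov6x6) != 6 or any(len(row) != 6 for row in cov6x6):
        raise ValueError("expected a 6x6 matrix")
    return [cov6x6[i][j] for i in range(6) for j in range(i, 6)]


def collapse_to_gps_accuracy(cov6x6: list[list[float]]) -> tuple[float, float]:
    """Reduce the same covariance to (horizontal, vertical) 1-sigma metres.

    Assumption: horizontal accuracy is taken as sqrt(var_x + var_y). All
    attitude terms, including yaw variance, are discarded in this reduction.
    """
    horiz = math.sqrt(cov6x6[0][0] + cov6x6[1][1])
    vert = math.sqrt(cov6x6[2][2])
    return horiz, vert


if __name__ == "__main__":
    variances = [9.0, 16.0, 4.0, 0.01, 0.01, 0.04]   # x, y, z, roll, pitch, yaw
    cov = [[variances[i] if i == j else 0.0 for j in range(6)] for i in range(6)]
    packed = pack_pose_covariance(cov)
    print(len(packed), packed[20])        # 21 elements; last one is yaw variance
    print(collapse_to_gps_accuracy(cov))  # yaw variance no longer represented
```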
**M-2 (Component 3) — MASt3R is not viable as primary on Orin Nano Super at 25 W.** `mast3r-runtime` (S57) lists Jetson Orin support as **"Planned"**, not implemented. *Speedy MASt3R* (S57 paper-side) achieves 91 ms / pair on an **A40 GPU**, which is roughly **30× the throughput** of a Jetson Orin Nano Super in 25 W mode → MASt3R extrapolates to **~2.5–3 s / pair** on our target hardware without aggressive distillation/INT8 work that nobody has published yet. Drop MASt3R from the matcher *primary* shortlist; keep it only as a long-horizon research target.
- Source: S57.
- Confidence: ✅ High.
**M-3 (Component 3) — Add GIM (ICLR 2024 spotlight) to the bench-off shortlist.** GIM (S48) is a self-training framework that takes existing matchers (LightGlue, RoMa, DKM, LoFTR) and re-trains them on 50 h of internet videos for **8.4–18.1 % zero-shot improvement**. The "generalist trained on diverse video" framing is the closest published proxy for our domain transfer (eastern-Ukraine 1 km AGL nadir vs. service satellite tiles). GIM-LightGlue should be included alongside vanilla LightGlue.
- Source: S48.
- Confidence: ✅ High.
**M-4 (Component 2) — Add SALAD (DINOv2 + Sinkhorn-VLAD) and BoQ to the VPR shortlist.** Two CVPR 2024 papers landed after the draft's "AnyLoc primary + MixVPR fast-lane" decision was made:
- **DINOv2 SALAD** (S47) — DINOv2 backbone + optimal-transport Sinkhorn aggregator with a "dustbin" cluster for non-informative features. R@1 = **75.0 %** on MSLS Challenge, **92.2 %** on MSLS Val, **76.0 %** on NordLand. Already a supported method in `aero-vloc` (S04), so direct apples-to-apples bench against AnyLoc/MixVPR.
- **BoQ** (S46) — bag of learnable queries with cross-attention; **outperforms NetVLAD, MixVPR, EigenPlaces** on 14 large-scale benchmarks; surpasses two-stage methods (Patch-NetVLAD, TransVPR, R2Former) at lower cost; DINOv2 results published Nov 2024.
AnyLoc is no longer the only DINOv2-based VPR option in the cross-domain regime; the bench-off must include all four.
- Source: S46, S47.
- Confidence: ✅ High.
**M-5 (Component 2 / 9 / latency) — DINOv2-base latency on Orin Nano Super is ~6–10× better than the draft assumed.** Jetson AI Lab measurements (S40): **DINOv2-base-patch14 = 126 inferences/sec on Orin Nano Super** (~8 ms/inf at 224×224), 75 inf/s on the original Orin Nano (~13 ms/inf). The draft estimated 50–80 ms / 224×224. The latency budget therefore has substantially more headroom than the draft assumed — **but only at 224×224**; at higher input resolution, expect latency to grow roughly quadratically with image side length (so 448×448 ≈ 32 ms/inf is still very comfortable inside the 400 ms p95 budget). This is a **good-news** finding that simplifies AC-4.1.
- Source: S40.
- Confidence: ✅ High (NVIDIA L1 source; precision implied FP16 from JetPack 6.2 default trtexec).
**M-6 (Component 6 / Security) — `mavlink-router` is itself attack surface.** Issue #436 (S45): public, easily-triggered, fuzzing-discovered **stack-based buffer overflow** in `ConfFile::get_sections` (memcpy of user-controlled section names into a 100-byte fixed buffer with no bounds check, plus an OOB write on null-terminator append). The repo has **no formal security policy / no SECURITY.md**. The draft's "share the MAVLink endpoint via a single mavlink-router instance" recipe drops a known-vulnerable C++ daemon onto a flight-critical companion. Mitigation options:
1. Pin to a fixed-and-audited tag, harden the systemd unit (NoNewPrivileges, ReadOnlyPaths, sandbox), and config-file-validate before launch.
2. Replace mavlink-router with a tiny in-process MAVLink endpoint multiplexer (Python or Go; this is ~150 lines of code given the only consumers are MAVSDK + pymavlink + mavlink-router-replacement → FC).
3. Use distinct system-IDs for MAVSDK and pymavlink and let ArduPilot's native MAVLink routing (S35-class) do the muxing on the FC side.
- Source: S45.
- Confidence: ✅ High.
**M-7 (Component 6 / Security) — MAVLink2 signing is a v1-mandatory configuration item, not "recommended".** S44: signing is per-link, **USB bypasses signing**, keys live in FRAM (32-byte secret + timestamp), configured via Mission Planner (or the MAVProxy `signing` module). It works in ArduPilot 4.5+, but key provisioning is a **per-airframe operator step** that needs a documented procedure. Given that GPS_INPUT (or ODOMETRY) is a high-trust local channel feeding the flight-critical EKF, a signed MAVLink link companion↔FC is the only defence against an attacker who gains serial access. The draft mentions signing under "Security note (deferred to a Phase-4 security pass)" — Mode B promotes it to v1-required.
- Source: S44.
- Confidence: ✅ High.
**M-8 (Component 1 / Tile Cache) — MBTiles SQLite under our concurrent read+write workload needs WAL + connection pool + transaction batching.** S54: the canonical `mbtiles` SQLite failure modes are (a) `database is locked` errors when concurrent writers compete with readers (default rollback journal is single-writer), (b) per-tile commit overhead crippling throughput on non-SSD. Recipe:
- `PRAGMA journal_mode = WAL` (mandatory for mixed read+write).
- Connection pool (cf. `MbtilesPool` from maplibre/martin S54) — multiple read connections + one write connection.
- Transaction batching: bulk insert per N tiles per Component-1b cycle, not per tile.
- Disable per-INSERT commit; rely on transaction boundary.
The draft's tile-cache section says "MBTiles SQLite + per-tile metadata" but doesn't specify these. Add as a hard implementation note.
- Source: S54.
- Confidence: ✅ High.
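
A minimal sketch of the recipe above, using only Python's standard `sqlite3` module. The helper names `open_cache` and `write_batch` are hypothetical, and the primary key on `(zoom_level, tile_column, tile_row)` is an assumption added so that `INSERT OR REPLACE` acts as an upsert. Per-tile metadata columns and the connection pool are left out.

```python
import sqlite3


def open_cache(path):
    """Open an MBTiles-style SQLite file in WAL mode (hypothetical helper)."""
    conn = sqlite3.connect(path, timeout=5.0)  # wait up to 5 s on a locked DB
    conn.execute("PRAGMA journal_mode = WAL")  # readers no longer block the writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        "zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, "
        "tile_data BLOB, "
        "PRIMARY KEY (zoom_level, tile_column, tile_row))"  # assumption, see lead-in
    )
    return conn


def write_batch(conn, tiles):
    """Insert many (z, x, y, blob) rows inside one transaction."""
    with conn:  # commits once on success, rolls back on exception
        conn.executemany("INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", tiles)
```

A reader process would open its own connection to the same file; under WAL it can keep reading while the single writer commits a batch.
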
**M-9 (Component 1b / Tile Dedup — *new safety risk*) — onboard tile overwrites can poison the cache.** The draft's dedup rule:
> If cache has a tile and the cache tile's `source ∈ {service}` AND the cache tile's `capture_date` is older than AC-8.2 freshness threshold AND our quality score > existing → **write** (overwrites with `source = onboard`).
The risk: a confidently-bad onboard pose (over-confident EKF covariance escapes the σ_xy ≤ 10 m gate) writes a tile that's misaligned by, say, 30–50 m, but with high inlier count. Next flight, that misaligned tile becomes the satellite anchor for *another* fix → error compounds across flights. **This is a feedback-loop safety hazard that AC-NEW-4 (false-position budget) does not currently capture**, because Monte-Carlo over a single flight doesn't model the cross-flight cache-poisoning amplification.
Mitigations (any of, ideally all):
1. **Service-source tiles are immutable within freshness budget.** Onboard tiles overwrite only stale or other-onboard tiles, never a fresh service tile.
2. **Voting layer at the Service ingest.** An onboard tile gets promoted to "trusted basemap" only after **N≥2 independent flights** confirm consistent geo-alignment within X m of each other.
3. **Quality score includes parent-pose covariance as a hard gate**, not just inlier count: a tile written from σ_xy > 5 m (tighter than the 10 m generation gate) is marked as "soft" and flagged in the sidecar.
4. **An additional AC**: "AC-NEW-7 — cache-poisoning safety" — see proposed addition in `solution_draft02.md`.
- Source: derived analytical finding (no single L1/L2 — this is a design-level hazard exposed by Mode B reasoning).
- Confidence: ⚠️ Medium (hazard is real and well-known in cartography/SfM; specific mitigation choice is empirical).
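
Mitigations 1 and 3 can be combined into one write-gate, sketched below. The `CachedTile` type, the `decide_write` name, and the rule that an onboard tile only replaces another tile when its quality score is higher are illustrative assumptions; the 10 m and 5 m gates are the values quoted above, and the freshness threshold is whatever AC-8.2 assigns to the sector.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedTile:
    source: str      # "service" or "onboard"
    age_days: float  # days since capture_date
    quality: float   # existing quality score


def decide_write(existing: Optional[CachedTile], new_quality: float,
                 sigma_xy_m: float, freshness_days: float,
                 gen_gate_m: float = 10.0, soft_gate_m: float = 5.0) -> str:
    """Return "skip", "write_soft" or "write" for a candidate onboard tile."""
    if sigma_xy_m > gen_gate_m:
        return "skip"  # pose too uncertain to generate a tile at all
    if existing is not None:
        fresh_service = (existing.source == "service"
                         and existing.age_days <= freshness_days)
        if fresh_service:
            return "skip"  # mitigation 1: fresh service tiles are immutable
        if new_quality <= existing.quality:
            return "skip"  # assumption: never replace a better-scoring tile
    # mitigation 3: covariance is a hard gate; loose poses only yield "soft" tiles
    return "write_soft" if sigma_xy_m > soft_gate_m else "write"
```

For example, `decide_write(CachedTile("service", 30, 0.5), 0.9, 3.0, 180)` returns `"skip"` because the service tile is still inside its freshness budget.
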
**M-10 (Component 9 / Process topology) — Free-threaded Python 3.13 is not v1-ready.** S55: free-threading is **experimental**, has a "substantial single-threaded performance hit", many C extensions don't yet support it, and the GIL **auto-re-enables on import of any non-FT-aware extension** (which would silently include numba, possibly TensorRT bindings, possibly older pymavlink). The draft's choice (single asyncio Python process + TRT subprocess workers + numba on hot path) is correct for v1 — but the rationale should be sharpened from "GIL is a risk we mitigate" to **"free-threaded Python is not yet a substitute; revisit in v1.1 once NumPy/SciPy/numba/TRT bindings stabilise on PEP 703."**
- Source: S55.
- Confidence: ✅ High.
**M-11 (Component 5 / W4.a) — ODOMETRY for fixed-wing in ArduPilot has known production gotchas.** S42 confirms ODOMETRY landed Dec 2021; S43 (PR #30080, "External nav+gps fix", merged 2025) shows ongoing work on the source-switching path when running external-nav alongside GPS. Practitioner-reported issues from S41/S42 discussion:
- velocity errors when companion-computer-derived velocity is fed into EKF3,
- position-estimate resets when external-nav loses reference,
- conflicts when running external-nav alongside GPS.
This is directly relevant to AC-NEW-2 (3 s spoofing-promotion latency) — the source switch is exactly the path that has known bugs. Mode B's recommended hybrid (GPS_INPUT primary + ODOMETRY when full covariance is available) needs SITL coverage of source-switching scenarios as a hard prerequisite, not a v1.1 follow-up.
- Source: S41, S42, S43.
- Confidence: ✅ High.
**M-12 (Component 1b / R-Terrain) — Eastern-Ukraine relief amplitude breaks the "flat enough" assumption near edges.** S56: Kharkiv-region UAV survey reports **~24 m peak-to-trough relief** between low and high points in test areas, with creek + gully (yary/balky) systems. Under the flat-Earth projection, a point Δh off the assumed plane is misplaced horizontally by Δh × tan(off-nadir angle). At 1 km AGL, a 24 m elevation deviation seen 35° off nadir produces ~17 m horizontal misalignment; if 35° is instead the full HFOV, the frame-edge ray is 17.5° off nadir and the figure is ~7.6 m, so ~17 m is the conservative case. That's **inside AC-1.1** (50 m@80%) but **eats into AC-1.2** (20 m@50%, hard-floor variant). Recommended addition: a per-sector DEM lookup (one-time pre-flight) that classifies sectors as "flat" (≤5 m amplitude), "moderate" (5–15 m), "rugged" (>15 m). The system uses tile-anchor weight-decay or skips ortho-tile generation in rugged sectors.
- Source: S56.
- Confidence: ⚠️ Medium (S56 is one regional survey; relief varies across the operational area).
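
A short sketch of the relief arithmetic and the proposed sector classes. It assumes the usual relief-displacement relation (horizontal error = elevation deviation × tan of the ray's off-nadir angle); the function names are hypothetical, and the class boundaries are the ones listed above.

```python
import math


def relief_displacement_m(delta_h_m, off_nadir_deg):
    """Horizontal error from projecting onto a flat plane a point delta_h_m off it."""
    return delta_h_m * math.tan(math.radians(off_nadir_deg))


def classify_sector(relief_amplitude_m):
    """Map a sector's DEM relief amplitude (metres) to the proposed class."""
    if relief_amplitude_m <= 5.0:
        return "flat"
    if relief_amplitude_m <= 15.0:
        return "moderate"
    return "rugged"


# 24 m of relief seen 35 deg off nadir vs. 17.5 deg off nadir (half of a 35 deg HFOV)
print(round(relief_displacement_m(24.0, 35.0), 1))   # 16.8
print(round(relief_displacement_m(24.0, 17.5), 1))   # 7.6
print(classify_sector(24.0))                         # rugged
```

The two printed displacements show how strongly the result depends on whether 35° is read as the off-nadir angle or as the full field of view.
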
**M-13 (Datasets) — TartanAir V2 is a stronger synthetic baseline than MidAir; flag for user reconsideration.** S51: TartanAir V2 is photo-realistic (AirSim) with **native IMU + 12-cam rigs + 65 environments + season/weather variation + custom camera models**. The draft drops synthetic IMU per user instruction (AC-NEW-4 validation rewritten in solution_draft01). User's stated reason: Mavic-class dynamics ≠ fixed-wing dynamics. TartanAir V2 lets us **configure motion patterns**, so the dynamics-mismatch argument is weaker for TartanAir than for MidAir. **This is a real choice for the user**: either keep "real-data only" purism, or add TartanAir V2 as an early-bench-off-only baseline. Surface to user as an open question, not a unilateral change.
- Source: S51.
- Confidence: ⚠️ Medium (technical viability is high; product/operator preference is the user's call).
**M-14 (Component 3 / W1.c) — Add AerialExtreMatch and 2chADCNN to the matcher V&V plan for season/viewpoint robustness.** Two underweighted benchmarks:
- **AerialExtreMatch** (S49): 1.5 M synthetic image pairs with **32 difficulty levels** crossing overlap × scale × pitch — exact failure-mode profile for our 1 km AGL operational regime. Real-world UAV localization subset for end-to-end validation.
- **2chADCNN** (S50): season-aware UAV↔satellite template-matching reference. Either include as bench-off candidate (vs. generic GIM/RoMa), or as a season-robustness *benchmark* the bench-off candidates run against.
- Source: S49, S50.
- Confidence: ✅ High.
**M-15 (Component 4) — Real fixed-wing monocular VO is harder than the draft implies.** S52: SVO, DSO, ORB-SLAM2 all "had significant difficulty maintaining localisation" on real fixed-wing flights at altitude. S53: high-altitude (300–1000 m AGL) VIO publishes drift numbers in the same band as our AC-1.3. Conclusion: the draft's choice ("custom 2-frame homography VO using the Component-3 matcher") is **right** for our framing (VO between satellite anchors, not standalone metric SLAM), but the AC-1.3 drift budget (<100 m without IMU, <50 m with IMU) needs validation against real fixed-wing footage — *not* Mavic-class footage — before lock.
- Source: S52, S53.
- Confidence: ✅ High.
---
## Mode B Findings — second adversarial pass (user-driven, 2026-04-26)
**M-16 (Component 2 / Granularity) — VPR retrieval unit must be decoupled from the storage-tile boundary.** The Mode A and Mode B drafts both said "FAISS IVF over per-tile DINOv2-VLAD vectors" using **storage tiles at z=20** (~154 m × 154 m ground). A 1 km AGL nadir frame covers **30–100 such tiles** depending on lens. Cosine similarity between a frame descriptor (covers ~600 × 450 m) and a tile descriptor (covers 154 × 154 m) is fundamentally mismatched and noisy. None of the published aerial-VPR systems do it this way:
- **AerialVL** (S03) preprocesses the reference satellite map into **frame-footprint-sized reference chunks** matched to expected drone-frame ground coverage.
- **AnyLoc** (S05) uses overlapping macro-windows scaled to query footprint on aerial.
- **NaviLoc** uses a sliding-window descriptor over the basemap.
**Conclusion**: the storage tile (z=20, 512×512) stays as the dedup / orthorect unit. The **VPR chunk** is a separate concept: ground-footprint chunks sized to the expected frame coverage with **40–50% overlap** so any frame footprint lands cleanly inside ≥1 chunk. Optionally multi-scale (one set per altitude band). Index is over chunks, not tiles.
- Source: re-reading S03 + S05 with the granularity question in mind; verified against the user-surfaced gap.
- Confidence: ✅ High. The error mode is well-known in the aerial-VPR literature; the original draft just under-specified the retrieval unit.
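
One possible way to lay out such overlapping chunks over a rectangular operational area is sketched below. The function names and the choice to place a final chunk flush with the far edge are assumptions for illustration; chunk size would come from the expected frame footprint per altitude band.

```python
def chunk_origins(extent_m, chunk_m, overlap):
    """1-D chunk start offsets covering [0, extent_m] with fractional overlap."""
    stride = chunk_m * (1.0 - overlap)
    origins = []
    x = 0.0
    while x + chunk_m < extent_m:
        origins.append(x)
        x += stride
    # last chunk sits flush with the far edge so the whole extent is covered
    origins.append(max(extent_m - chunk_m, 0.0))
    return origins


def chunk_grid(width_m, height_m, chunk_w_m, chunk_h_m, overlap=0.5):
    """(x, y) lower-left corners of every chunk, row by row."""
    return [(x, y)
            for y in chunk_origins(height_m, chunk_h_m, overlap)
            for x in chunk_origins(width_m, chunk_w_m, overlap)]


print(chunk_origins(1500.0, 600.0, 0.5))              # [0.0, 300.0, 600.0, 900.0]
print(len(chunk_grid(1500.0, 900.0, 600.0, 450.0)))   # 12
```

Each chunk would get one descriptor in the FAISS index; the storage tiles underneath stay untouched.
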
**M-17 (Component 2 / Invocation policy) — VPR is a re-loc-trigger module, not an every-frame module.** Per Component 5 EKF analysis, in steady state (recent anchor < 2 s, σ_xy < 20 m, VO healthy), a geometric prior from the IMU + VO predicted position is enough to pick top-K candidate VPR chunks by **distance alone** — no DINOv2 forward needed. VPR's value is concentrated in the resilience paths:
- **AC-NEW-1 cold start** — no IMU prior at all → VPR is the only viable narrow.
- **AC-3.2 sharp turn** — VO fails, IMU prior degrades fast → VPR re-anchors.
- **AC-3.3 disconnected segment** — explicitly requires "global descriptor retrieval" — VPR.
- **σ_xy growth** — when the EKF position uncertainty grows to σ_xy ≥ 50 m, the geometric prior is too wide; VPR re-narrows.
**Conclusion**: control flow is `if (steady_state) { use geometric prior } else { invoke VPR }`. Saves ~10–35 ms/frame and lets VPR backbone idle (one less concurrent process during cruise). The DINOv2-base TRT engine still has to be resident in GPU memory for fast invocation.
- Source: derived from M-1, M-5, AC-NEW-1, AC-3.2, AC-3.3, EKF analysis. Independently corroborated by user feedback on the architecture.
- Confidence: ✅ High.
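
The control flow can be sketched as follows. The steady-state thresholds are the ones stated above; everything between σ_xy = 20 m and 50 m is treated here as "not steady", which is an assumption since the text leaves that band open. `pick_candidates` and the `vpr_query` callback are hypothetical names.

```python
def is_steady_state(anchor_age_s, sigma_xy_m, vo_healthy):
    """Steady state: recent anchor, tight covariance, healthy VO."""
    return anchor_age_s < 2.0 and sigma_xy_m < 20.0 and vo_healthy


def pick_candidates(chunks, predicted_xy, k, steady, vpr_query=None):
    """Return up to k chunk ids.

    chunks: list of (chunk_id, (centre_x_m, centre_y_m)).
    vpr_query: callable taking k and returning chunk ids; only used off steady state.
    """
    if steady:
        px, py = predicted_xy
        # geometric prior: nearest chunk centres, no DINOv2 forward pass
        ranked = sorted(chunks,
                        key=lambda c: (c[1][0] - px) ** 2 + (c[1][1] - py) ** 2)
        return [chunk_id for chunk_id, _ in ranked[:k]]
    return vpr_query(k)  # re-localisation path: descriptor retrieval
```

During cruise only the sort runs; the VPR callback is reached on cold start, sharp turns, disconnected segments, or covariance growth.
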
**M-18 (Component 2 / Fallback) — expanding-window retry on unconvincing top-1.** Standard pattern in re-loc literature: if top-1 VPR similarity is below threshold OR top-1/top-2 gap is below threshold (both signs that VPR is unsure), **expand the candidate set to adjacent chunks** (±1 chunk in each direction = 8 neighbours in a regular grid; or radius-N expansion for sparse-overlap layouts) before failing over to operator-assisted re-loc. Cheap to add: same FAISS index, larger K, no extra DINOv2 forward.
- Source: standard relocalization pattern (cf. ORB-SLAM3, GISNav, NGPS implementations).
- Confidence: ✅ High.
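
A sketch of the "unsure" test and the neighbour expansion on a regular chunk grid. The threshold parameters `min_top1` and `min_gap` are placeholders that would need calibration; the function names are hypothetical.

```python
def vpr_unsure(sims, min_top1, min_gap):
    """sims: similarity scores sorted in descending order (at least one entry)."""
    top1 = sims[0]
    top2 = sims[1] if len(sims) > 1 else float("-inf")
    return top1 < min_top1 or (top1 - top2) < min_gap


def neighbours(col, row, radius=1):
    """Grid cells within `radius` of (col, row), excluding the cell itself."""
    return [(col + dc, row + dr)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if (dc, dr) != (0, 0)]


print(len(neighbours(5, 5)))             # 8
print(len(neighbours(5, 5, radius=2)))   # 24
```

When `vpr_unsure` fires, the matcher would be run against the top-1 chunk plus its `neighbours` before escalating to operator-assisted re-loc.
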
**M-19 (Component 2 / Active-conflict robustness) — multi-scale chunks + OSM road overlay + sector-driven K + negative cache.** Active-conflict scene change (destroyed buildings, cratering, dam flooding, road realignment) is a frequent operational reality in the eastern/southern Ukraine deployment, not an edge case. Layered mitigations beyond M-16/17/18:
- **Multi-scale VPR chunks**: maintain BOTH fine-scale (z=20-derived) and coarse-scale (z=17/18-effective) chunk descriptor sets. Coarse-scale descriptors capture road-network + field-boundary + waterway structure that survives building destruction. ~12 MB extra disk, ~3 min one-time pre-flight DINOv2 forward.
- **OSM road-network overlay**: extract OSM road geometry for the operational area pre-flight as a binary "road-mask" tile sidecar; matcher applies bonus inlier weighting on keypoints that fall on road edges. GISNav uses this pattern. Roads are the single most change-stable feature in active-conflict zones.
- **Sector volatility classification drives K** (binds to AC-NEW-6 `sector_class`): K=5 stable / K=20 active / K=50 expanding-window-fallback.
- **Onboard-tile rapid promotion in active sectors**: refines M-9's 2-flight voting — single-flight promotion allowed in active sectors when σ_xy ≤ 3 m AND OSM-road-overlap ≥ 70 % (dual gate keeps safety).
- **Negative cache**: tiles repeatedly rejected by matcher across flights get `trust_level = stale_destroyed`, excluded from retrieval until Service refresh.
The two highest-leverage of these are multi-scale chunks and OSM overlay; the rest are essentially free.
- Source: derived from M-9, M-16, M-17, M-18 + standard cartographic-stability reasoning + GISNav reference architecture; user-driven concern about active-conflict scene change frequency.
- Confidence: ✅ High on multi-scale + OSM (literature-backed); ⚠️ Medium on the OSM-road-overlap-≥-70 % numeric threshold (needs empirical calibration).
**M-20 (Component 1) — Storage tile zoom level pinned at z=20.** Trade-off analysis in response to user question (z=18 vs z=20):
- ADTi 20MP APS-C @ 1 km AGL with 24–50 mm lens → frame GSD in 8–18 cm/px range. Mid-range (~35 mm lens) → ~12 cm/px.
- Frame-vs-reference scale ratio at z=20 (30 cm/px): **2.5×** — well within the SP+LG / GIM-LightGlue "well-handled" band (≤4× per published IMW-style benchmarks).
- Frame-vs-reference scale ratio at z=18 (~120 cm/px): **10×** — outside the SP+LG well-handled band; sub-pixel keypoint-correspondence accuracy degrades sharply, pushing AC-1.2 (50 % @ 20 m) and AC-2.2 (MRE < 2.5 px) into risk territory.
- Storage @ z=20 over 400 km² ≈ 2.8 GB cache + 30 MB DEM + 16 MB VPR chunk index ≈ 3 GB total — **28 % of the 10 GB budget**, leaving 7 GB headroom for FDR overflow and multi-scale chunks (M-19).
- Storage @ z=18 over 400 km² ≈ 220 MB total — saves ~2.5 GB but provides no operational benefit at our budget level.
- Pre-flight compute: z=20 takes ~5 min; z=18 takes ~3 min. Both trivial on the bench. Not a deciding factor.
- **Decision: z=20** for the storage tile. The accuracy benefit is meaningful; the storage cost fits comfortably. Folded into restrictions.md.
- Source: derived analysis using ADTi camera spec + Mode B finding S40 (DINOv2 latency) + IMW-style matcher-resolution-mismatch data.
- Confidence: ✅ High.
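
The scale-ratio arithmetic above can be reproduced with the standard GSD relation (altitude × pixel pitch / focal length). The 23.5 mm sensor width and 5472 px image width are assumed typical values for a 20 MP APS-C sensor, not figures taken from the ADTi spec.

```python
def gsd_cm_per_px(altitude_m, focal_mm, sensor_width_mm, image_width_px):
    """Ground sample distance of a nadir frame, in cm per pixel."""
    pixel_pitch_mm = sensor_width_mm / image_width_px
    return altitude_m * pixel_pitch_mm / focal_mm * 100.0


def scale_ratio(reference_cm_per_px, frame_cm_per_px):
    """How many times coarser the reference tile is than the frame."""
    return reference_cm_per_px / frame_cm_per_px


# Assumed sensor geometry (see lead-in): 23.5 mm wide, 5472 px across.
for focal in (24.0, 35.0, 50.0):
    print(focal, round(gsd_cm_per_px(1000.0, focal, 23.5, 5472), 1))
# 24.0 17.9
# 35.0 12.3
# 50.0 8.6

print(scale_ratio(30.0, 12.0))    # 2.5  (z=20 reference)
print(scale_ratio(120.0, 12.0))   # 10.0 (z=18 reference)
```

Under these assumptions the three lenses land inside the 8–18 cm/px band quoted above.
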
**M-21 (2chADCNN re-classification) — ceiling reference, NOT bench-off candidate.** Closer reading of S50 (MDPI Drones 2023) reveals 2chADCNN is structurally incompatible with our bench-off:
- **Output format**: template-overlap region (IoU-style), not sub-pixel keypoints. Component 3's PnP needs keypoint correspondences; 2chADCNN can't supply them.
- **Tested altitude band**: 252–500 m AGL, not 1 km. Their experimental envelope doesn't cover our regime.
- **No Jetson / TRT benchmark**: trained on Intel i5 + 8 GB RAM CPU only.
- **Method paradigm**: traversal-search template matching (slide template over satellite image at every position, compute similarity). Doesn't scale to a 400 km² operational area in our latency budget.
- **Reported numbers**: real-summer overlap-IoU 0.92–0.99; synthetic-snow overlap-IoU 0.82–0.95. Useful as a published season-robustness *number* against which we benchmark our chosen modern matcher (SP+LG / GIM-LightGlue) — but not as a candidate for the matcher slot itself.
Walks back the "optionally a bench-off candidate" tag in M-14. 2chADCNN is **purely a season-robustness ceiling reference**.
Newer / more relevant season-aware references for the open-research reading list:
- **AFF-CNN-HTransformer cross-perspective UAV-satellite matching** (Sci Reports 2025) — hybrid CNN+Transformer cross-view + season.
- **Polar-coordinate-transformation rotation-and-season-invariant UAV-satellite matching** (2026) — explicitly addresses both rotation and season; intersects nicely with our IMU-driven de-rotation step.
- Source: closer reading of S50 + new search results 2025-2026.
- Confidence: ✅ High on 2chADCNN re-classification; ⚠️ Medium on the newer papers (need to read full PDFs before bench-off inclusion).
---
## Mode B Round 2 (component replacements & sweep) — appended 2026-04-26
**M-22 (Component 4 / VO architecture) — custom 2-frame homography VO is the wrong design.** Source: S52 (AFIT thesis), S60 (cuVSLAM), S64 (Isaac ROS UAV reference), S72 (high-altitude VIO), S73 (DPV-SLAM).
- Draft02 C-4 says "custom 2-frame VO via SuperPoint+LightGlue homography". This skips loop closure, sparse bundle adjustment, keyframe-based local mapping — every mechanism that bounds drift in production VO/SLAM systems.
- AFIT thesis (S52) shows even ORB-SLAM2 / SVO / DSO struggle on real fixed-wing flights; a hand-rolled 2-frame homography VO will be strictly worse.
- High-altitude VIO field test (S72): stereo-VIO = 2.186 m / 800 m at 40–100 m AGL; monocular-VIO is "acceptable but worse". At 1 km AGL motion parallax shrinks ~10–25× per frame, further degrading monocular VO.
- **Recommendation: replace custom 2-frame VO with cuVSLAM (S60, S64) in monocular + IMU mode.**
- Confidence: ✅ High on "custom 2-frame VO is wrong"; ⚠️ Medium on "cuVSLAM is the right replacement" — high-altitude fixed-wing performance is unproven on cuVSLAM's published benchmarks (KITTI urban driving + EuRoC indoor MAV). Bench-off in F-T1b mandatory.
**M-23 (Component 4 / VO candidate evaluation on Jetson Orin Nano Super).** Source: S60, S61, S62, S71, S73, S76.
- **cuVSLAM (S60)**: NVIDIA-supported, CUDA-optimized, drop-in via `isaac_ros_visual_slam`, Apache-2.0. Reference designs on Orin Nano (S64, S77) confirm runtime feasibility. <1% ATE on KITTI / <5cm on EuRoC. **Verdict: v1 lead candidate.**
- **DPVO / DPV-SLAM (S61, S73)**: SOTA deep VO, but DPVO-QAT++ is benchmarked on RTX-4060, not Jetson. Original DPVO @ 2–5× real-time on RTX-3090 (4 GB) → Orin Nano Super extrapolation ≈ 4–10 FPS without QAT, ≈ 6–15 FPS with QAT. **Borderline for 10 Hz target; not v1.**
- **MASt3R-SLAM (S62)**: 15 FPS on a single GPU; sub-1 Hz extrapolated on Orin Nano Super. **Infeasible for inline v1.**
- **VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71)**: Classical, well-tested, but require manual integration (OpenCV pinning, ArUco fixes, DDS / ROS plumbing) and no Jetson-class CUDA acceleration of the front-end. Higher integration cost than cuVSLAM with no accuracy advantage.
- **Custom 2-frame homography VO (current draft02 plan)**: M-22 already disqualified.
- Confidence: ✅ High.
**M-24 (Component 3 / cross-view matcher — LiteSAM evaluation).** Source: S58.
- LiteSAM is **purpose-built for satellite↔aerial AVL in GPS-denied environments**. Architectural choices (TAIFormer + MinGRU sub-pixel refinement) are tailored to large appearance variations and texture-scarce regions — exactly our regime.
- Results: 6.31 M params (2.4× smaller than EfficientLoFTR); RMSE@30 = 17.86 m on UAV-VisLoc; 61.98 ms on standard GPU; **497.49 ms on Jetson AGX Orin** (FP16-optimized).
- **Crucial extrapolation**: AGX Orin INT8 throughput ≈ 275 TOPS, Orin Nano Super ≈ 67 TOPS → 4× scaling factor → **LiteSAM on Orin Nano Super ≈ 1500–2000 ms / pair**. Well outside our 400 ms p95 budget for inline use.
- **Three useful roles (not the inline matcher)**:
- (a) **Re-localization fallback** — invoked rarely (cold start, σ_xy > 50 m), 1.5–2 s latency tolerable.
- (b) **Validation oracle** — ground-truth-quality matches for offline regression bench.
- (c) **Distillation teacher** — train a smaller student model with LiteSAM-supervised correspondences for the satellite-aerial domain.
- **Verdict: add LiteSAM in roles (a)/(b)/(c); SP+LG (TRT FP16/INT8) remains the inline matcher.**
- Confidence: ✅ High on architectural fit; ⚠️ Medium on the 4× AGX-Orin → Orin Nano Super scaling — needs empirical confirmation in bench-off.
**M-25 (Component 3 / cross-view matcher — RoMa v2 / MapGlue / MATCHA).** Source: S63 + earlier MapGlue / MATCHA notes.
- **RoMa v2 (S63)**: SOTA dense matcher, frozen DINOv3 + custom CUDA + predictive covariance. GPU-class compute. Infeasible inline on Orin Nano Super; viable as **offline ceiling reference** for Component 3 bench-off.
- **MapGlue / MATCHA**: Cross-modal/multimodal matchers — useful research-track candidates but no Jetson deployment data; same offline-only verdict.
- **Verdict**: not a v1 candidate; offline ceiling reference. The matcher bench-off (deferred research item) MUST include both as ceilings so we know how much accuracy we're trading away by using SP+LG inline.
- Confidence: ✅ High.
**M-26 (Component 5 / EKF→ESKF question — architectural reframing).** Source: S65, S66, S67, S68, S69.
- **The FC (ArduPilot 4.5+) runs EKF3, a classical extended Kalman filter — not an ESKF.** PX4 EKF2 is the ESKF (S68); we are not on PX4. We cannot swap the FC's filter.
- The "EKF vs ESKF" debate therefore applies **only to the companion-side filter** (Component 5 in draft02).
- **Best practice for ArduPilot ExtNav setups (S65, S66, S67)**: companion does NOT run a heavy filter on top. Companion produces (visual fix → GPS_INPUT) and/or (relative pose → ODOMETRY) with well-calibrated covariances; ArduPilot EKF3 fuses those with the FC's IMU.
- ArduPilot issues #30076 (S65) and #32506 (S66) document concrete failure modes when feeding the FC two simultaneous position sources — **only one position source per axis at a time**. The hybrid `GPS_INPUT + ODOMETRY` plan from M-1 must therefore split responsibilities by **channel**, not duplicate position on both.
- **Architectural revision**: the companion-side EKF in draft02's C-5 is **not necessary** for v1. It can be replaced by a lightweight **"covariance calibrator + outlier gate + source-label producer"**: each upstream (matcher, VO, IMU passthrough if any) emits a hypothesis with a covariance; a Mahalanobis gate rejects outliers; covariances are re-scaled if empirical residuals indicate over- or under-confidence; results are emitted on the appropriate MAVLink channel. No state propagation, no IMU integration on the companion.
- **If a companion-side filter is justified later** (e.g., to smooth visual fixes before they reach the FC, or to integrate VO with the FC's downsampled-IMU stream the companion can subscribe to), use **vanilla ESKF (S69)** for orientation correctness — but only after F-T9 SITL shows the FC's EKF3 cannot handle our raw input quality.
- Confidence: ✅ High on dropping the companion-side EKF for v1; ⚠️ Medium on whether we'll need to re-introduce one for v1.x.
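
A minimal sketch of the "outlier gate + covariance calibrator" idea for a 2-D horizontal fix, in plain Python. It assumes the residual is taken against some reference position available on the companion (for example the FC's own reported estimate), which the text does not specify; the 99 % chi-square gate and the function names are likewise illustrative choices.

```python
CHI2_2DOF_99 = 9.21  # 99 % quantile of a chi-square distribution with 2 dof


def mahalanobis_sq_2d(residual, cov):
    """Squared Mahalanobis distance of a 2-D residual under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    rx, ry = residual
    # r^T * inv(cov) * r, with inv(cov) = [[d, -b], [-c, a]] / det
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def gate(residual, cov, threshold=CHI2_2DOF_99):
    """True if the hypothesis passes the outlier gate."""
    return mahalanobis_sq_2d(residual, cov) <= threshold


def covariance_scale(d2_history, dof=2):
    """Mean squared Mahalanobis distance per degree of freedom.

    About 1.0 means the reported covariances are consistent with the residuals;
    above 1.0 suggests over-confidence, so covariances should be inflated.
    """
    return sum(d2_history) / (len(d2_history) * dof)
```

A hypothesis that passes `gate` would be emitted with its covariance multiplied by the current `covariance_scale`; nothing here propagates state or integrates IMU data.
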
**M-27 (Component 1b / Ortho-Tile Generator — use Orthority).** Source: S59.
- Orthority (Python, MIT-class) supports frame + RPC camera models, GeoTIFF DEM lookup, RPC refinement, pan-sharpening — i.e., everything draft02's hand-rolled pinhole-on-DEM ortho was going to reinvent.
- Pip-installable (`pip install orthority`). API-driven (per-image ortho via `Ortho` class) → callable inline from our Component 1b worker.
- ODM is post-processing batch SfM — wrong tier; not for per-frame ortho on a 1 km AGL nadir camera with known FC pose.
- **Verdict: replace draft02's "Pinhole projection on per-sector DEM" with Orthority frame-camera ortho.** Falls back to a 6-line `cv2.warpPerspective` + bilinear DEM lookup if Orthority's per-frame latency on Orin Nano Super blows our budget — measure in F-T14.
- Confidence: ✅ High on Orthority being the right tier; ⚠️ Medium on the latency assumption — needs measurement.
**M-28 (Component 1 / tile storage — MBTiles WAL stays; PMTiles / COG considered).** Source: COG/PMTiles search results + draft02 M-8.
- **COG**: Highly-tiled COG metadata can trigger 500 MB initial download on a 7 GB file (geotiff.js issue #479) — defeats selective access on a bandwidth-constrained UAV system. Not a fit.
- **PMTiles**: Single-file alternative to MBTiles, cloud-optimized. Good for HTTP serving (RPi tests show competitive performance). For our use case (local microSD, embedded reader+writer), PMTiles loses the SQLite-WAL concurrency story we already designed for in M-8.
- **Verdict: MBTiles + WAL (M-8) remains the right choice.** No revision.
- Confidence: ✅ High.
**M-29 (Component 9 / orchestrator — ROS 2 vs DIY Python).** Source: S64, S77.
- ROS 2 Humble + JetPack 6 + Isaac ROS 3.2 + cuVSLAM + MAVROS is a **proven reference architecture on Orin Nano Super** (S64, S77).
- If we adopt cuVSLAM (M-22/M-23), the lowest-friction path is to consume cuVSLAM via `isaac_ros_visual_slam` (ROS 2 wrapper) and bridge to the FC via MAVROS — not to re-export cuVSLAM's C++ API into a custom Python orchestrator.
- **ROS 2 cost**: extra ~2–5 % CPU for DDS + topic serialization; learning curve for the team; deployment image grows ~200 MB.
- **ROS 2 benefit**: free integration of cuVSLAM, MAVROS, Isaac ROS perception nodes; battle-tested; observability via `ros2 bag` and `rqt_*` tooling.
- **DIY Python alternative** (draft02 plan): keeps everything in one asyncio process; lowest overhead; but we re-export every ROS 2 component we want to consume (cuVSLAM via Python bindings, MAVROS-equivalent via pymavlink, etc.).
- **Verdict: lean toward ROS 2 Humble + Isaac ROS for v1**, with our matcher / VPR / ortho / FDR / fusion-glue nodes implemented as ROS 2 Python nodes (`rclpy`). Decision is **not locked** — it's the largest open architectural question for round 2 and the user should be asked.
- Confidence: ⚠️ Medium — depends on whether the team has ROS 2 experience and whether the ~5 % CPU overhead is acceptable inside the latency budget. **This is a Q for the user.**
**M-30 (Component 5 / hybrid GPS_INPUT + ODOMETRY — channel split per S65/S66/S67).** Source: S65, S66, S67.
- M-1 (round 1) said "emit BOTH GPS_INPUT AND ODOMETRY in parallel". S65/S66/S67 say **only one position source per axis at a time** and document concrete bugs when the FC sees two.
- **Revised channel split**:
- Option A (simplest, recommended for v1): **GPS_INPUT carries position + velocity** (lat/lon/alt + N/E/D velocities + h_acc/v_acc/vel_acc covariance scalars). ODOMETRY is **disabled** for v1. ArduPilot configured `EK3_SRC1_POSXY = GPS`, `EK3_SRC1_VELXY = GPS`, `EK3_SRC1_YAW = GPS+Compass`. Our companion provides a "GPS-equivalent" via GPS_INPUT (`GPS1_TYPE=14`); ArduPilot treats it identically to a real receiver. Failover to backup GPS via `EK3_SRC2_*`.
- Option B (richer, v1.1+): **ODOMETRY carries position + velocity + yaw + full 21-element covariance**, GPS_INPUT carries **fix only as fallback** (not actively fused while ODOMETRY is healthy). ArduPilot configured `EK3_SRC1_POSXY = ExternalNav`, `EK3_SRC1_YAW = ExternalNav`, with `EK3_SRC2_POSXY = GPS` as backup. Requires PR #30080-class fixes for clean source switching.
- **Original M-1 (both channels for the same axis) is a misconfiguration**, not a feature. Walk back.
- **Verdict**: v1 ships Option A. Option B is v1.1 territory once F-T9 confirms source-switching behaves cleanly under PR #30080.
- Confidence: ✅ High.
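
For Option A, the companion has to turn each fix into `GPS_INPUT` fields. The sketch below only builds the field dictionary and does not call pymavlink. Field names and units follow the MAVLink common-dialect `GPS_INPUT` message as far as we know and should be checked against the dialect in use. The helper name is hypothetical, and `fix_type = 3`, `hdop = vdop = 1.0`, `satellites_visible = 10` and the 18 s GPS-UTC offset are placeholder assumptions.

```python
GPS_EPOCH_UNIX_S = 315964800  # 1980-01-06T00:00:00Z as a Unix timestamp
GPS_UTC_LEAP_S = 18           # assumption: GPS-UTC offset in force since 2017
SECONDS_PER_WEEK = 604800


def gps_input_fields(unix_time_s, lat_deg, lon_deg, alt_m,
                     vn, ve, vd, h_acc_m, v_acc_m, vel_acc_mps):
    """Build GPS_INPUT field values from one position + velocity fix."""
    gps_s = unix_time_s - GPS_EPOCH_UNIX_S + GPS_UTC_LEAP_S
    week, sec_of_week = divmod(gps_s, SECONDS_PER_WEEK)
    return {
        "time_usec": int(unix_time_s * 1e6),
        "gps_id": 0,
        "ignore_flags": 0,                    # nothing ignored: pos + vel + accuracies
        "time_week_ms": int(sec_of_week * 1000),
        "time_week": int(week),
        "fix_type": 3,                        # placeholder: 3-D fix
        "lat": int(round(lat_deg * 1e7)),     # degrees * 1e7
        "lon": int(round(lon_deg * 1e7)),
        "alt": float(alt_m),                  # metres
        "hdop": 1.0,                          # placeholder
        "vdop": 1.0,                          # placeholder
        "vn": float(vn), "ve": float(ve), "vd": float(vd),  # m/s, NED
        "speed_accuracy": float(vel_acc_mps),
        "horiz_accuracy": float(h_acc_m),
        "vert_accuracy": float(v_acc_m),
        "satellites_visible": 10,             # placeholder
    }
```

The `h_acc` / `v_acc` / `vel_acc` scalars mentioned above map onto `horiz_accuracy`, `vert_accuracy` and `speed_accuracy` in this sketch.
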
**M-31 (Component 6 / sysid sharing on the wire).** Source: S65, S67.
- Round 1 M-6 picked "distinct system-IDs for MAVSDK (sysid=10) and pymavlink (sysid=11), sharing the serial port via ArduPilot's native MAVLink routing — no router daemon".
- This decision survives round 2 unchanged. The distinct-sysid trick + ArduPilot native routing is documented and works for any MAVLink2 stack. No router CVE exposure (M-6 / S45).
- Open task: confirm the chosen sysids don't collide with any MAVLink2 forwarding rule on QGroundControl GCS-side; document in deploy runbook.
- Confidence: ✅ High.
**M-32 (Component 9 / Python topology — confirmed).** Source: S55.
- Round 1 M-10: stay on CPython 3.11/3.12; defer free-threaded 3.13 to v1.1. Survives round 2 unchanged.
- If Component 9 moves to ROS 2 (M-29), the Python version question still applies — `rclpy` supports 3.11/3.12; 3.13 free-threaded is also experimental there.
- Confidence: ✅ High.
**M-33 (Component 2 / VPR — no new entrants worth adding).** Source: round-2 searches.
- Searched for newer VPR SOTA than DINOv2-SALAD / BoQ (CVPR 2024). The 2025 landscape is matcher-centric (RoMa v2, LiteSAM, MASt3R-SLAM); no new VPR backbone has displaced SALAD/BoQ on aerial cross-domain.
- Round 1 shortlist {AnyLoc, SALAD, BoQ, MixVPR} stands.
- Confidence: ✅ High.
**M-34 (Component 4 / camera intrinsics learning — calibration-free SLAM).** Source: S62.
- MASt3R-SLAM is calibration-free; cuVSLAM expects intrinsics. Our nav cam (ADTi 20MP APS-C) will be calibrated pre-flight via standard checkerboard procedure → cuVSLAM's intrinsics requirement is **not** a friction point.
- Confidence: ✅ High.
**M-35 (Component 5 / IMU access on the companion — open question).** Source: S64 reference designs.
- The reference cuVSLAM-on-Jetson designs (S64) use the camera's built-in IMU (RealSense D435i) for VIO. Our nav cam (ADTi 20MP APS-C) has no IMU; the FC has the IMU.
- Two paths to feed IMU into companion-side cuVSLAM:
- (a) MAVLink `RAW_IMU` / `SCALED_IMU` stream from FC → companion subscribes via pymavlink, feeds cuVSLAM. **~1 kHz IMU on FC down-rated to ~200–400 Hz over MAVLink** is sufficient for monocular VIO; latency budget acceptable.
- (b) Add a dedicated companion-side IMU (BNO055 / ICM-42688P / Bosch BMI270 over SPI/I²C) with its own time sync. More hardware, but no MAVLink-bus contention.
- **Verdict v1**: try path (a); if cuVSLAM's IMU sync sensitivity (timestamping) is too tight for MAVLink-rated IMU, fall back to (b) in v1.1.
- Confidence: ⚠️ Medium — depends on cuVSLAM's tolerance for IMU rate / timing jitter; needs empirical check during integration.
@@ -0,0 +1,186 @@
# Mode B Decomposition — Adversarial Assessment of `solution_draft01.md`
**Mode**: B (Solution Assessment).
**Question type**: Problem Diagnosis + Decision Support.
**Novelty sensitivity**: **High**. Embedded CV/SLAM, ArduPilot MAVLink2 signing maturity, JetPack version, and matcher SOTA all churn fast — prefer 2024-Q4 → 2026-Q2 sources.
**Goal**: per Mode B template, find weak points (functional / security / performance) per draft component and propose either a stronger alternative or an explicit mitigation. Output is `solution_draft02.md` with an "Assessment Findings" table at the top.
## Boundary
- **Population**: a single fixed-wing UAV running the GPS-denied onboard pipeline, 1 km AGL, 60 km/h cruise, 8 h endurance, eastern/southern Ukraine.
- **Geography**: deployed in active-conflict / contested EW environment.
- **Timeframe**: deployment v1 within the next ~4–6 months from now (mid-2026).
- **Level**: companion-computer code + integration. The Suite Satellite Service, the AI-camera detector, the FC firmware, and the airframe are out of scope as components but appear as interfaces under attack.
## Perspectives chosen (≥3 mandatory)
1. **Implementer / engineer** — what published Jetson Orin Nano Super numbers say about the actual latency budget, what the GIL-on-hot-path failure modes are, what is hard about TRT-deploying DINOv2-VLAD.
2. **Contrarian / devil's advocate** — every committed choice in the draft has a "why not X" answer; surface them.
3. **Domain practitioner** — what people running ArduPilot + companion CV in production have written about MAVLink2 signing, mavlink-router, GPS_INPUT injection, cross-view matchers in active service.
4. **Security / red-team** — `GPS_INPUT` is a high-trust local channel; tile cache is operationally sensitive. Realistic attack surface and mitigations.
## Weak-point sub-questions (drives Mode B web search)
### W1. Cross-view matcher commitment (Component 3)
The draft pins SuperPoint+LightGlue / XFeat / MASt3R as the bench-off candidates, with 1024×768 as the working downsample.
- W1.a. **Is the bench-off shortlist still current as of 2026-Q2?** Did GIM (2024), BoQ (2024), MASt3R-SfM (2025), RoMa-DC (2025), or Map-Free-Reloc 2025 leaderboard winners change the picture?
- W1.b. **Is "1024×768 starting point" empirically defensible on Orin Nano Super 25 W?** Published TRT FPS / latency for SP+LG and XFeat at this resolution on the Orin Nano class.
- W1.c. **Cross-view-specific failure modes at 1 km AGL** that the bench-off won't catch — illumination, season, recent-conflict landscape change. Are any matchers explicitly evaluated on temporal change?
- W1.d. **Why not training-free 3D-grounded matching (MASt3R/MASt3R-SfM) as primary** instead of as stretch? What's the realistic Orin Nano latency budget for these?
Query variants: "LightGlue Jetson Orin Nano benchmark 2025 2026", "SuperPoint TensorRT FP16 Orin Nano latency", "MASt3R embedded GPU benchmark", "GIM image matching cross-view 2024", "BoQ visual place recognition", "RoMa DKM aerial cross-view 2025", "image matcher seasonal change benchmark".
### W2. VPR backbone commitment (Component 2)
Draft picks AnyLoc (DINOv2-VLAD) primary + MixVPR fast-lane.
- W2.a. **DINOv2 ViT-B/14 latency on Orin Nano Super 25 W** — is the draft's "~50–80 ms / 224×224" empirically backed?
- W2.b. **2025 SOTA**: SALAD, BoQ (Bag-of-Queries), CricaVPR — do any beat AnyLoc on aerial cross-domain at meaningful latency?
- W2.c. **AnyLoc unsupervised VLAD** is training-free, but is the VLAD codebook quality stable across operational areas (Ukraine specifically)? Any published failure cases?
Query variants: "AnyLoc Jetson benchmark", "DINOv2 ViT-B TensorRT FP16 latency Orin", "SALAD visual place recognition aerial 2024", "BoQ visual place recognition", "CricaVPR aerial benchmark", "VPR aerial Ukraine seasonal".
### W3. Process topology — "single Python process + asyncio + TRT subprocess workers via CUDA IPC"
Draft commits to this for v1 (Component 9).
- W3.a. **GIL on the hot path** — is asyncio + subprocess workers actually GIL-safe at 3 fps × 1 km AGL with all the I/O (MAVLink, FDR, tile cache lookups, EKF math)? Real-world failure stories from ArduPilot/PX4 companion-computer projects.
- W3.b. **CUDA IPC for tensor handoff** — known issues on Jetson (unified memory model: is CUDA IPC even meaningful when CPU and GPU share the LPDDR5 pool)?
- W3.c. **Subinterpreters / free-threaded Python (3.13+)** — is the project using a Python old enough that subinterpreters aren't an option?
- W3.d. **Alternatives**: ROS 2 Humble (rejected in draft), C++ core (rejected), single-process with multiprocessing (not discussed).
Query variants: "Jetson CUDA IPC unified memory", "Python asyncio CUDA real-time deadline", "Python GIL drone companion computer", "PX4 ArduPilot companion computer python production", "ROS2 vs Python single-process VIO embedded", "free-threaded Python 3.13 GPU".
### W4. Loosely-coupled EKF in Python + numba (Component 5)
Draft writes its own loosely-coupled EKF, fuses IMU @ 100 Hz from FC, satellite anchors irregular, VO @ 3 Hz; emits GPS_INPUT.
- W4.a. **Why not just feed `VISION_POSITION_ESTIMATE` to ArduPilot EKF3 and let the FC fuse?** Draft mentions this as "alternative" — what does the practitioner literature say about the actual cost of the dual-fusion choice?
- W4.b. **EKF covariance calibration is famously fragile** (AC-NEW-4 false-position budget rides on it). Are there published gotchas for loosely-coupled aerial EKF? What's the right Mahalanobis gate value?
- W4.c. **numba JIT on Jetson** — JIT warmup time hurts AC-NEW-1 (cold-start TTFF <30 s). Real numbers on Jetson Orin Nano JIT compile time.
- W4.d. **Heading observability** — at 1 km AGL nadir, satellite anchoring gives `(lat, lon, h)` but heading is weakly observable from a single anchor unless the matcher emits oriented features. Does the draft's matcher choice cleanly produce yaw with covariance?
Query variants: "ArduPilot VISION_POSITION_ESTIMATE vs GPS_INPUT", "loose coupled EKF aerial gotcha", "EKF Mahalanobis gate visual anchor", "numba Jetson cold start", "monocular yaw observability satellite reference".
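
For reference while researching W4.b, the conventional starting point for a Mahalanobis gate on a 2-D position innovation is a chi-square quantile, which has a closed form for two degrees of freedom. A minimal sketch, assuming a 2×2 innovation covariance `S` and an illustrative 99 % confidence level; the function names are hypothetical and the confidence level is not a recommendation from the draft:

```python
import math


def chi2_gate_2dof(confidence):
    """Chi-square quantile for 2 degrees of freedom: CDF(x) = 1 - exp(-x / 2)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    return -2.0 * math.log(1.0 - confidence)


def mahalanobis_sq_2d(innovation, S):
    """Squared Mahalanobis distance d^T S^-1 d for a 2-vector and a 2x2 covariance."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    dx, dy = innovation
    # S^-1 = (1 / det) * [[d, -b], [-c, a]]
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def accept_anchor(innovation, S, confidence=0.99):
    """Accept the satellite anchor only if its innovation falls inside the gate."""
    return mahalanobis_sq_2d(innovation, S) <= chi2_gate_2dof(confidence)
```

The gate is only as good as `S`: if the EKF covariance is over-confident, the same threshold rejects valid anchors, which is why the calibration question comes first.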
### W5. ArduPilot MAVLink2 signing + GPS_INPUT injection security (Component 6)
Draft says "MAVLink2 signing recommended", treats GPS_INPUT as high-trust local channel.
- W5.a. **Production maturity of MAVLink2 signing in ArduPilot 4.5+** as of 2026-Q2 — is it default-on, default-off, key-distribution story?
- W5.b. **Real attack surface**: what does an attacker with serial access to the FC actually need to spoof a GPS_INPUT? Is `mavlink-router` itself an attack-surface widening?
- W5.c. **Companion-side defenses** — health-gate before injecting, fix_type sanity, jam-detection from the other direction.
- W5.d. **Failsafe fallback**: if our GPS_INPUT is rejected by the FC (signing fail), what does ArduPilot do — does AC-NEW-2 (3 s spoof-promotion latency) survive that?
Query variants: "ArduPilot MAVLink2 signing 4.5 production", "MAVLink2 signing key distribution UAV", "ArduPilot GPS_INPUT signing", "mavlink-router security audit", "GPS_INPUT spoof companion computer attack".
### W6. In-flight ortho-tile generation residual error (Component 1b)
Draft: pinhole projection → flat-Earth ground plane → resample to z=20 XYZ tiles. Eligibility gates: σ_xy ≤ 10 m, |bank| / |pitch| ≤ 10°.
- W6.a. **Flat-Earth residual error in eastern/southern Ukraine** — actual relief amplitude. Steppes are not flat at 30 cm/px tile precision; agricultural fields, river valleys, ravines (yary) are common.
- W6.b. **What's the per-tile geo-alignment error budget** that still keeps cross-view anchors valid against the same tile two flights later?
- W6.c. **MBTiles SQLite at 10 GB scale on NVMe**: known issues with concurrent reader+writer (tile-cache miss path is concurrent with tile-write path)? Sharding strategy?
- W6.d. **Dedup by (z, x, y) only** — but the onboard tile carries a parent_pose covariance. If we overwrite a service-source tile with an "onboard" tile that was written from a 3-σ-bad pose, we've poisoned the next flight's cache. Should the dedup rule include a "trust-only" lock from the Service?
Query variants: "MBTiles concurrent writer reader SQLite", "orthorectification flat earth residual error UAV", "Ukraine eastern terrain relief amplitude", "geotagged tile alignment budget cross-view localization".
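
To size W6.a before the search results come in, the flat-Earth residual follows the standard relief-displacement relation: a point `h` metres above the assumed ground plane, seen along a ray at off-nadir angle θ, lands `h · tan θ` away from its true position. A minimal sketch; the function names are hypothetical, the 30 cm/px figure comes from W6.a, and any relief amplitude fed in is an illustrative input rather than a measured value for Ukraine:

```python
import math


def relief_displacement_m(relief_m, off_nadir_deg):
    """Horizontal misplacement of a point relief_m above the assumed flat ground plane."""
    if not 0.0 <= off_nadir_deg < 90.0:
        raise ValueError("off-nadir angle must be in [0, 90) degrees")
    return relief_m * math.tan(math.radians(off_nadir_deg))


def displacement_px(relief_m, off_nadir_deg, gsd_m_per_px):
    """The same misplacement expressed in tile pixels at the given ground sample distance."""
    if gsd_m_per_px <= 0.0:
        raise ValueError("ground sample distance must be positive")
    return relief_displacement_m(relief_m, off_nadir_deg) / gsd_m_per_px
```

Note that the off-nadir angle of a ray is the attitude offset plus the pixel's angle within the field of view, so the 10° bank/pitch gate alone does not bound it.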
### W7. Tile dedup poisoning — onboard tile overwrites service tile
This is a sharper version of W6.d.
- W7.a. The "highest quality wins" rule treats `match_inliers` as a proxy for geo-alignment confidence. But a confidently-bad anchor (over-confident covariance from EKF — see W4.b) writes a "high-quality" tile that's actually misaligned by 50 m. Next flight, that misaligned tile becomes the satellite anchor for *another* anchor, and the error compounds.
- W7.b. **Best-practice from cartography / SfM** for trusting onboard imagery as basemap input.
- W7.c. **Mitigation**: lock tiles whose source is `service` against onboard overwrite for some grace period; require onboard tiles to be "voted" by N independent flights before promotion.
Query variants: "satellite tile pose error compounding", "uav generated tile basemap update sfm trust", "drone-ortho photo dedup quality score".
### W8. Mavic-class footage as deployment-domain proxy
Draft uses internal Mavic flight footage as the deployment-domain V&V proxy. Mavic is a small quadcopter; the deployment platform is a fixed-wing at 1 km AGL.
- W8.a. **What does the literature say** about transferring CV/VO/VPR results from quadcopter footage to fixed-wing? Camera dynamics differ (rolling shutter, vibration spectrum, frame rate, motion-blur profile, AGL band).
- W8.b. **Synthetic IMU from Mavic video** — user already rejected this. But is there a non-synthetic alternative that the draft missed? E.g., MidAir (synthetic but matched dynamics), TartanAir, public ArduPilot SITL log.
- W8.c. **Risk of false confidence** — ground truth is in the absolute satellite anchor, not the Mavic IMU. So how does the Mavic V&V actually validate AC-NEW-4 (false-position safety) when no fixed-wing IMU is in the loop?
Query variants: "fixed wing vs quadcopter visual SLAM transfer", "drone vibration spectrum fixed-wing quad", "TartanAir aerial dataset fixed-wing".
### W9. Latency budget — is 400 ms p95 actually realistic?
AC-4.1 budget. Draft acknowledges R2 ("latency budget on Orin Nano Super at 1024×768 input is tight").
- W9.a. **Real published Jetson Orin Nano Super 25 W numbers** for: DINOv2 ViT-B forward (224×224), SuperPoint+LightGlue at 1024×768, FAISS top-K over ~10⁴ vectors, EKF update at 100 Hz IMU.
- W9.b. **Steady-state vs transient latency** — does the budget include EKF-output-to-MAVLink-emit overhead, MAVLink serialisation, and the FC's own gating?
- W9.c. **Failure mode if budget blows** — frame-drop is allowed (AC-4.1 says ~10%) but if matcher latency tail is 600 ms, the EKF rides on VO+IMU for >2 frames, and AC-3.4 reloc trigger hits.
Query variants: "DINOv2 Jetson Orin Nano TensorRT FP16 ms", "LightGlue Jetson benchmark FPS 1024", "FAISS Jetson IVF latency".
### W10. AC-NEW-4 false-position safety — Monte Carlo validation realism
P(error >500 m) <0.1%, P(error >1 km) <0.01%.
- W10.a. **What's the standard practice** for validating these probabilities at this magnitude? You need >10⁴ frames of independent failure modes — does the AerialVL + Mavic dataset cover that?
- W10.b. **What does the literature say** about cross-view matcher tail behavior — do failures cluster on specific scene types (forest, repetitive cropland, water, glare)? If yes, dataset bias is the killer.
- W10.c. **EKF-side gating** — Mahalanobis gate is the right tool, but the gate threshold itself is a per-environment tuning parameter. Is there a published recipe?
Query variants: "visual localization tail probability >1km", "cross-view matcher failure clustering forest cropland water", "aerial visual SLAM Monte Carlo safety budget".
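
As a sanity check on W10.a, the number of failure-free independent trials needed to bound a failure probability can be computed directly (the "rule of three" approximation gives about 3/p at 95 % confidence). A minimal sketch, assuming independent trials and zero observed failures; `min_failure_free_trials` is a hypothetical helper name:

```python
import math


def min_failure_free_trials(p_max, confidence=0.95):
    """Smallest n such that n failure-free independent trials rule out
    a per-trial failure probability >= p_max at the given confidence."""
    if not 0.0 < p_max < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p_max and confidence must be in (0, 1)")
    # P(no failure in n trials | p_max) = (1 - p_max)^n must be <= 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_max))
```

For the AC-NEW-4 thresholds this lands near 3 × 10³ trials for the 0.1 % bound and near 3 × 10⁴ for the 0.01 % bound. Consecutive frames from one flight are correlated, so the effective number of independent trials is smaller than the raw frame count.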
### W11. Cold-start TTFF <30 s feasibility
AC-NEW-1.
- W11.a. **TRT engine warm-up cost** on Jetson Orin Nano Super for SP+LG + DINOv2 + EKF JIT. Real numbers.
- W11.b. **FAISS index load + mmap warm**: 10 GB tile cache, IVF over ~10⁵ tile vectors — load time on NVMe.
- W11.c. **First valid GPS_INPUT** path includes: IMU-extrap-from-FC, first frame, VPR retrieve, matcher run, PnP, EKF init, GPS_INPUT emit. Anyone published an end-to-end cold-boot number for this kind of stack on Orin?
Query variants: "TensorRT engine load time Jetson", "FAISS mmap warm 10GB", "Jetson companion computer cold boot time GPS substitute".
### W12. Imagery freshness reality check — Suite Satellite Service refresh cadence
AC-8.2 + AC-NEW-6: <6 months for active sectors, <12 months for stable.
- W12.a. **Is a 6-month refresh actually achievable** for Maxar Vivid / Pléiades Neo / Pléiades over Ukraine in 2026-Q2? Tasking lead time + cloud-cover acceptance + delivery channel.
- W12.b. **Practitioner reports** on what 30 cm Ukraine 2024–2025 imagery actually looks like (smoke, glare, seasonal mismatch, cratering).
- W12.c. **In-flight tile generation** is meant to backfill — but the Service still needs ground-truth tasking to seed the cache for any new operational area before the *first* flight. Is there a chicken-and-egg problem for first deployment to a new sector?
Query variants: "Maxar Vivid Ukraine 2025 refresh tasking", "Pleiades Neo Ukraine cloud cover lead time", "30cm satellite imagery refresh cadence active conflict".
### W13. Resource contention — 8 GB shared LPDDR5 budget
AC-4.2 = <8 GB shared. Draft loads:
- DINOv2 ViT-B TRT engine (~600 MB GPU)
- SP+LG TRT engine (~hundreds of MB)
- FAISS index over 10⁵ tile descriptors
- Tile cache mmap (10 GB on disk, mmap to RAM via OS page cache)
- EKF state + IMU ring buffer
- Python interpreter + asyncio loop + JIT'd numba kernels
- MAVSDK + pymavlink
- W13.a. **Realistic peak RSS** for this stack — is the 8 GB budget headroom or is it a tight squeeze?
- W13.b. **JetPack 6.2 / Ubuntu 22 baseline RAM** consumed before our process even starts.
- W13.c. **Mitigation**: page out the FAISS index, swap, or pin everything?
Query variants: "Jetson Orin Nano 8GB shared budget DINOv2 LightGlue", "JetPack 6.2 base RAM usage", "FAISS pinned memory Jetson".
## Completeness audit
Probes (per `references/comparison-frameworks.md` decomposition probes):
| Probe | Covered by | Notes |
|---|---|---|
| **Cost of failure / blast radius** | W5 (signing), W7 (tile poisoning), W10 (false-position) | three-way coverage of safety budget |
| **Time-to-first-result** | W11 | dedicated to TTFF |
| **Operating envelope** | W6 (terrain), W12 (freshness), W13 (memory), W9 (latency) | thermal already in AC-NEW-5 |
| **Maintenance cost** | W3 (Python topology), W4 (EKF code we own) | both addressed |
| **Substitutability of components** | W1 (matcher), W2 (VPR), W3 (process topology), W4 (EKF) | each component has ≥1 alternative-path question |
| **Adversarial / red-team** | W5, W7, W10 | covered |
| **Data-distribution bias** | W8, W10.b, W12 | covered |
| **Hardware-supply-chain risk** | not covered | Orin Nano Super availability is a project-management risk, not a design risk; deferred to Plan |
## Output plan
1. Source registry → append Mode B sources to `01_source_registry.md` as IDs `S40+`.
2. Fact cards → append Mode B facts to `02_fact_cards.md` under "Mode B Findings".
3. Mode B reasoning chain → write `04_reasoning_chain_mode_b.md`.
4. Validation log → write `05_validation_log_mode_b.md`.
5. Final deliverable → write `_docs/01_solution/solution_draft02.md` using `templates/solution_draft_mode_b.md`.
@@ -0,0 +1,94 @@
# Mode B — Round 2 Question Decomposition
**Trigger**: user explicit ask after rolling back from Step 3 (Plan).
**Mode**: B (Solution Assessment of `solution_draft02.md`).
**Date**: 2026-04-26.
**Scope (user-provided)**:
> "1. For VO — is it the most efficient method SP+LG for jetson? are there better ways? 2. for cross-view matcher — there is LiteSAM (https://github.com/boyagesmile/LiteSAM) and other methods specialized for that. Check and investigate in internet possible options. 3. EKF fusion — isn't it ESKF better? Ortho-tile generator — are there are already existing libs for that? or it is not so difficult and easier just to make it manually by ourselves? All in all, make a thorough investigation regarding each component — what's could be either give better confidence with relatively same resource and time footprint, either can provide roughly same confidence faster or lighter on resources."
## Question Type Classification
| # | Sub-Question | Type | Why |
|---|--------------|------|-----|
| Q-R2-1 | Is the SP+LG-based VO design (custom 2-frame homography) the most efficient & accurate VO on Orin Nano Super, or is there a better one? | Decision Support + Problem Diagnosis | Trade-off (compute vs accuracy vs maturity) + diagnoses whether the draft02 design choice is sound. |
| Q-R2-2 | Should LiteSAM (or any specialized satellite-aerial matcher) replace SP+LG / GIM-LG as the inline cross-view matcher? | Decision Support | Trade-off (accuracy vs latency vs role-fit). |
| Q-R2-3 | Is ESKF strictly better than EKF for our fusion stage? | Decision Support + Concept Comparison | Comparison + applicability boundary (ArduPilot vs companion). |
| Q-R2-4 | Should we use an existing ortho-tile generator library, or DIY? | Decision Support | Build-vs-buy. |
| Q-R2-5 | Is there a newer/better option for **every other component** (VPR, tile storage, MAVLink, software platform, DEM, etc.) that could give better confidence at same/lower resource footprint? | Knowledge Organization + Decision Support | Sweep audit of remaining components. |
Mode-B classification rule: **Problem Diagnosis + Decision Support** — applies to every sub-question above.
## Research Subject Boundary Definition
| Dimension | Boundary | Notes |
|-----------|----------|-------|
| **Population** | Embedded autonomous-flight stack on **Jetson Orin Nano Super (8 GB shared)** companion + **ArduPilot 4.5+** flight controller. Fixed-wing UAV airframe, 1 km AGL nadir nav cam, ADTi 20MP APS-C @ 3 fps. | Same as round 1. |
| **Geography** | Eastern-Ukraine theatre (active conflict, season variation). | Same as round 1. |
| **Timeframe** | v1 release 2026; v1.1 within 6 months. | Same as round 1. |
| **Level** | Software architecture and component selection (no hardware / no airframe / no GCS). | Same as round 1. |
## Perspectives Used (≥3 required)
| Perspective | Why this round | Example searches |
|-------------|---------------|------------------|
| **Implementer / Engineer** | Round 1 missed a few real engineering gotchas (companion-side filter double-fusion bugs, cuVSLAM as drop-in alternative). | "ArduPilot ExtNav GPS_INPUT double fusion", "cuVSLAM Jetson Orin Nano monocular fixed-wing" |
| **Practitioner / Field** | Look at production GPS-denied UAV reference designs on the same hardware target. | "ROS 2 Humble Jetson Orin Nano Super JetPack 6 MAVROS ArduPilot integration GPS-denied", "VINS-Fusion OpenVINS BASALT SVO Pro Jetson Orin Nano benchmark monocular fixed-wing 2025" |
| **Domain expert / Academic** | Verify SOTA matcher and SLAM landscape post-Mode-A. | "MASt3R-SLAM monocular real-time 2025 Jetson DROID-SLAM MAC-VO", "RoMa DKM dense feature matching aerial satellite UAV-VisLoc 2025" |
| **Contrarian** | Actively search for "why not the chosen approach": custom 2-frame VO, SP+LG-only matcher, hybrid GPS_INPUT+ODOMETRY both active. | "ArduPilot ODOMETRY GPS_INPUT companion external visual odometry double-fusion best practice", "fixed-wing UAV high altitude visual odometry 1km AGL accuracy" |
## Search Query Variants Per Sub-Question
(Selected; full search log preserved in agent transcript and `01_source_registry.md` round-2 entries.)
**Q-R2-1 (VO)**:
1. `visual odometry Jetson Orin Nano benchmark 2026 fixed-wing UAV monocular DPVO BASALT OpenVINS SVO Pro`
2. `DPVO Deep Patch Visual Odometry Jetson real-time inference benchmark FPS 2025`
3. `cuVSLAM Jetson Orin Nano monocular fixed-wing aerial visual odometry CUDA Lucas-Kanade`
4. `MASt3R-SLAM monocular real-time 2025 Jetson DROID-SLAM MAC-VO benchmark embedded`
5. `VINS-Fusion OpenVINS BASALT SVO Pro Jetson Orin Nano benchmark monocular fixed-wing 2025`
6. `DPVO Jetson Orin Nano FPS benchmark monocular visual odometry deployment 2025 ARM`
7. `fixed-wing UAV high altitude visual odometry 1km AGL monocular accuracy 2025`
8. `Isaac ROS visual SLAM cuVSLAM Jetson Orin Nano monocular fixed-wing UAV high altitude integration`
9. `DPV-SLAM DPVO real-time Jetson NX Orin port deployment monocular SLAM 2024`
**Q-R2-2 (cross-view matcher)**:
1. `LiteSAM lightweight feature matching satellite aerial imagery 2025 EfficientLoFTR`
2. `cross-view UAV satellite image matching benchmark 2025 XoFTR MatchAnything OmniGlue LoFTR LightGlue`
3. `MapGlue MapAnything XoFTR cross-modal aerial satellite matching Jetson inference 2025`
4. `XFeat lightweight feature matching Jetson Orin TensorRT FPS benchmark 2025`
5. `LightGlue ONNX TensorRT Jetson Orin Nano Super fps 2025 SuperPoint inference benchmark`
6. `RoMa DKM dense feature matching aerial satellite UAV-VisLoc benchmark accuracy 2025`
7. `aerial drone matcher MatchAnything OmniGlue DeDoDe homography benchmark 2025`
8. `SuperPoint LightGlue Jetson Orin Nano TensorRT FP16 INT8 ms per frame benchmark`
9. `UAV-VisLoc satellite aerial localization SP+LG XFeat LiteSAM RoMa benchmark accuracy meters`
**Q-R2-3 (EKF / ESKF)**:
1. `ESKF error state Kalman filter visual inertial navigation drone vs EKF 2025 advantages`
2. `ArduPilot EKF3 error state Kalman external visual odometry GPS_INPUT ODOMETRY fusion architecture`
3. `ArduPilot EKF3 vs PX4 EKF2 ESKF visual external odometry companion computer architecture`
4. `ArduPilot ODOMETRY GPS_INPUT companion external visual odometry double-fusion IMU EKF3 best practice`
**Q-R2-4 (ortho-tile generator)**:
1. `orthomosaic generation library python aerial drone OpenDroneMap MicMac OpenSfM real-time`
2. `single image orthorectification python library DEM gimbal pinhole homography UAV nadir camera`
3. `Orthority orthorectification python single image GeoTIFF DEM RPC frame camera benchmark`
4. `Orthority simple-ortho per-frame nadir UAV gimbal pitch roll yaw projection latency milliseconds`
**Q-R2-5 (sweep)**:
1. `Cloud Optimized GeoTIFF COG vs MBTiles tile cache embedded UAV onboard storage performance`
2. `ROS 2 Humble Jetson Orin Nano Super JetPack 6 MAVROS ArduPilot integration GPS-denied`
3. (Plus targeted re-checks of round-1 components: VPR backbones, MAVLink2 signing, free-threaded Python, SRTM 30 m DEM.)
## Completeness Audit
| Probe | Coverage |
|-------|----------|
| **Did we re-check every component the user named?** | ✅ VO (Q-R2-1), matcher (Q-R2-2), EKF (Q-R2-3), ortho (Q-R2-4). |
| **Did we sweep every other component for resource/confidence trade-offs?** | ✅ VPR (no new entrants — M-33), tile storage (MBTiles WAL stays — M-28), MAVLink (sysid + signing unchanged — M-31), software platform (CPython + ROS-2-vs-DIY surfaced as open Q — M-29, M-32), DEM (no change), camera (already locked). |
| **Did we surface contrarian failure modes per component?** | ✅ Custom-2-frame-VO is wrong (M-22); LiteSAM-on-Orin-Nano-Super is too slow inline (M-24); RoMa v2 / MASt3R-SLAM are GPU-class (M-25, S62); ArduPilot double-fusion is a bug, not a feature (M-26, M-30). |
| **Did we identify decisions that need user input vs decisions that are deterministic?** | ✅ ROS 2 vs DIY orchestrator (M-29) — needs user. Channel-split for hybrid (M-30) — recommendation Option A for v1, Option B v1.1+. |
| **Did we re-validate locked AC restrictions (camera, zoom, AC-NEW-7)?** | ✅ All lock-ins from round 1 carry forward unchanged. |
@@ -0,0 +1,223 @@
# Reasoning Chain — Mode B (Solution Assessment of `solution_draft01.md`)
For each Mode B finding (M-1..M-15 in `02_fact_cards.md`), trace the fact → comparison → conclusion path and pin the conclusion's confidence. Conclusions feed `solution_draft02.md`.
---
## M-1 — ODOMETRY vs GPS_INPUT (Component 6)
**Fact.** ArduPilot dev docs (S41) say "ODOMETRY (the preferred method)" for sending external-nav to EKF3. ODOMETRY: quaternion + velocity NED + 21-element pos+att covariance + quality 0..100. GPS_INPUT: lat/lon/alt + 3-D velocity + scalar `h_acc`/`v_acc` + `fix_type`. Both supported; both targetable from pymavlink.
**Reference comparison.** AC-4.3 originally states "Replacement for GPS module … via MAVLink GPS_INPUT, GPS1_TYPE=14". That's GPS-substitute framing, which suggests GPS_INPUT is the right channel. But AC-NEW-4 (false-position safety budget P[err>500m]<0.1%) requires the FC to act on **calibrated covariance** — and GPS_INPUT collapses our 6-DoF covariance into one scalar, which is information loss.
**Conclusion.** Hybrid output. Keep GPS_INPUT as the **primary "GPS-substitute" channel** (matches AC-4.3 framing, plays cleanly with FC operator workflows that expect a `GPS_RAW_INT`-shaped status). **Also emit ODOMETRY** when the EKF emits a fix with a full 6-DoF covariance and a non-trivial yaw observability — let the FC's EKF3 fuse the richer signal. Configure FC source priorities so GPS_INPUT is the failover in case ODOMETRY trips a parameter gate (VISO_QUAL_MIN). This is a *strict superset* of the draft's choice; the only cost is the extra MAVLink emit and the source-switching SITL test scope (M-11).
**Confidence.** ✅ High. Two L1 sources (S41 dev docs + S42 PR #19563), one L1 confirming the failure path is real (S43 PR #30080).
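
The information loss described above can be made concrete. A minimal sketch, assuming the companion EKF exposes a 6×6 pose covariance ordered (x, y, z, roll, pitch, yaw): ODOMETRY carries the row-major upper triangle (21 values), while GPS_INPUT gets one horizontal scalar. The function names are hypothetical, and using the largest eigenvalue of the x/y block for `h_acc` is an assumed conservative mapping that the draft does not specify:

```python
import math


def pack_upper_triangle(cov6):
    """Row-major upper-right triangle of a 6x6 covariance: 21 values."""
    if len(cov6) != 6 or any(len(row) != 6 for row in cov6):
        raise ValueError("expected a 6x6 matrix")
    return [cov6[i][j] for i in range(6) for j in range(i, 6)]


def horizontal_accuracy(cov6):
    """Collapse the x/y block to one scalar: sqrt of its largest eigenvalue.

    This discards the error-ellipse orientation and every attitude term,
    which is the loss GPS_INPUT imposes.
    """
    sxx, sxy, syy = cov6[0][0], cov6[0][1], cov6[1][1]
    mean = 0.5 * (sxx + syy)
    lam_max = mean + math.hypot(0.5 * (sxx - syy), sxy)
    return math.sqrt(lam_max)
```

An elongated error ellipse (for example tight cross-track, loose along-track) therefore reaches the FC as a circle of the larger radius on the GPS_INPUT channel, while the ODOMETRY channel keeps its shape.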
---
## M-2 — MASt3R off the primary matcher list
**Fact.** mast3r-runtime Jetson support = "Planned" (S57). Speedy MASt3R = 91 ms / pair on A40 GPU.
**Reference comparison.** A40 ≈ 38 TFLOPS FP16 (datacenter-class GPU); Jetson Orin Nano Super 25 W ≈ 1.7 TFLOPS FP16 (~67 TOPS sparse INT8). Throughput ratio ~22× to 30× depending on operator-mix. 91 ms × 22 ≈ 2 s/pair; × 30 ≈ 2.7 s/pair. Even with INT8 quantisation closing the gap by ~2× (typical for ViT-class), MASt3R lands at >1 s/pair — outside the 400 ms p95 budget by a factor of ≥2.5×.
**Conclusion.** MASt3R drops from the "stretch candidate" row in the draft's bench-off table to a **research-track-only** label. Bench-off resources should focus on SP+LG / XFeat / GIM-LightGlue / RoMa-distilled.
**Confidence.** ✅ High. Numbers are conservative — MASt3R has additional overhead from the depth backbone that doesn't exist in pure 2D matchers.
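
The scaling estimate above can be reproduced in a few lines. A minimal sketch using only the figures quoted in this section; the constant and function names are hypothetical, and pure throughput-ratio scaling is the same simplifying assumption the reasoning makes:

```python
A40_TFLOPS_FP16 = 38.0       # quoted above
ORIN_TFLOPS_FP16 = 1.7       # quoted above, Orin Nano Super at 25 W
SPEEDY_MAST3R_MS_A40 = 91.0  # ms per pair on A40 (S57)
BUDGET_P95_MS = 400.0        # AC-4.1


def scaled_latency_ms(base_ms, throughput_ratio, int8_speedup=1.0):
    """Scale a latency by a throughput ratio, optionally crediting an INT8 speed-up."""
    return base_ms * throughput_ratio / int8_speedup


raw_ratio = A40_TFLOPS_FP16 / ORIN_TFLOPS_FP16                  # about 22x
fp16_low = scaled_latency_ms(SPEEDY_MAST3R_MS_A40, 22)          # about 2.0 s
fp16_high = scaled_latency_ms(SPEEDY_MAST3R_MS_A40, 30)         # about 2.7 s
int8_best = scaled_latency_ms(SPEEDY_MAST3R_MS_A40, 22, 2.0)    # about 1.0 s
over_budget = int8_best / BUDGET_P95_MS                         # about 2.5x
```

Even the most favourable combination (22× ratio, 2× INT8 credit) stays about 2.5× over the budget.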
---
## M-3 — Add GIM-LightGlue to the bench-off
**Fact.** GIM (S48): self-trained generalist matcher, 8.4–18.1 % zero-shot improvement over LightGlue/RoMa/DKM/LoFTR baselines. Pre-trained checkpoints public.
**Reference comparison.** Our domain (eastern-Ukraine 1 km AGL nadir vs. service satellite tiles) has *zero* training data publicly available; the bench-off therefore tests zero-shot transfer. GIM's training paradigm (50 h of internet videos covering every kind of scene including aerial) is precisely the regime that maximises zero-shot transfer.
**Conclusion.** Add **GIM-LightGlue** to the matcher bench-off shortlist as a peer of vanilla SP+LG. If the published 8–18 % zero-shot gain holds on AerialVL + Mavic, GIM-LightGlue dominates the cost/quality frontier (same TRT path as SP+LG, better accuracy out of the box).
**Confidence.** ✅ High. ICLR 2024 spotlight; benchmark numbers reproduced by independent users in the GitHub issue tracker.
---
## M-4 — VPR shortlist expansion: + SALAD + BoQ
**Fact.** SALAD (S47, CVPR 2024): DINOv2 + Sinkhorn optimal-transport VLAD; R@1 = 75 % on MSLS Challenge / 92.2 % MSLS Val / 76 % NordLand; in `aero-vloc`. BoQ (S46, CVPR 2024): bag of learnable queries, beats NetVLAD/MixVPR/EigenPlaces/Patch-NetVLAD/TransVPR/R2Former on 14 benchmarks; DINOv2 results Nov 2024.
**Reference comparison.** AnyLoc (draft primary) is unsupervised VLAD over DINOv2 features; SALAD is *trained* DINOv2-VLAD via Sinkhorn; BoQ is *learnable queries* over a backbone (DINOv2 or ViT). SALAD strictly beats AnyLoc on the same backbone in published benchmarks. BoQ beats both on standard VPR benchmarks; aerial-specific numbers TBD but well-positioned.
**Conclusion.** The bench-off table grows from {AnyLoc, MixVPR} to **{AnyLoc, SALAD, BoQ, MixVPR}**. AnyLoc remains the training-free fallback; SALAD and BoQ are likely primaries.
**Confidence.** ✅ High on M-4 (sources are CVPR 2024 papers + GitHub repos with published weights). Aerial-domain ranking is empirical — the bench-off resolves it.
---
## M-5 — Latency budget has more headroom than the draft assumed
**Fact.** Jetson AI Lab (S40): DINOv2-base-patch14 = 126 inf/s on Orin Nano Super → ~8 ms/inf at 224×224, FP16 trtexec.
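**Reference comparison.** Draft estimated 50–80 ms / 224×224 for DINOv2 ViT-B (Component 2 row 1). Real number is **~6–10× better**. At 448×448 (more typical for AnyLoc descriptor extraction), expect ~32 ms/inf, assuming cost scales with pixel count (4× the pixels, i.e. quadratic in side length); self-attention grows faster than linearly in token count, so treat ~32 ms as a lower-bound estimate.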
**Conclusion.** AC-4.1 (400 ms p95) is **comfortably feasible** with budget left over for SP+LG / GIM-LightGlue (target ~100 ms/pair) + EKF + MAVLink emit. R2 in the draft's risk table downgraded from High to Medium — empirical confirmation needed but no longer a make-or-break risk.
**Confidence.** ✅ High. NVIDIA L1 source.
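
The headroom figures follow from the throughput number in S40. A minimal sketch of the arithmetic, using only values quoted in this section; the helper name is hypothetical and the pixel-count scaling to 448×448 is an assumption, not a measurement:

```python
def ms_per_inference(inferences_per_second):
    """Convert a throughput figure into per-inference latency in milliseconds."""
    return 1000.0 / inferences_per_second


measured_ms = ms_per_inference(126.0)     # S40: about 8 ms at 224x224
draft_low_ms, draft_high_ms = 50.0, 80.0  # draft estimate for the same input size

speedup_low = draft_low_ms / measured_ms    # about 6x
speedup_high = draft_high_ms / measured_ms  # about 10x

# Assumed: cost proportional to pixel count when going from 224 to 448 per side.
pixel_scale = (448 / 224) ** 2
estimated_448_ms = measured_ms * pixel_scale  # about 32 ms
```

The 448×448 figure should be replaced by a `trtexec` measurement on the target board before it is used in the AC-4.1 budget.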
---
## M-6 — mavlink-router CVE-class issue
**Fact.** S45: stack-based buffer overflow in mavlink-router config parsing, fuzzing-discovered, public, no SECURITY.md.
**Reference comparison.** mavlink-router is C++ daemon running with the same privileges as our companion process; if the config file is attacker-controlled (e.g., a tampered SD card on the airframe), this becomes RCE on the companion. Even if the config file is operator-controlled, a buggy config-file parser is one bug away from another related issue.
**Conclusion.** Three options, choose one:
1. **Pin a specific patched version + sandboxed systemd unit** (NoNewPrivileges, ReadOnlyPaths=/etc/mavlink-router/, MemoryDenyWriteExecute, RestrictAddressFamilies=AF_UNIX AF_INET).
2. **Replace with an in-process MAVLink endpoint multiplexer** (Python or Go, ~150 LOC) — eliminates the dependency entirely.
3. **Distinct system-IDs for MAVSDK + pymavlink** sharing the same serial port via ArduPilot's native MAVLink routing, no router daemon at all.
Option 3 is the simplest. Option 2 gives us the most control. Option 1 is the lowest-effort quick fix. Recommend **Option 3 for v1**, with Option 2 as v1.1 if MAVLink message volume saturates a single endpoint.
**Confidence.** ✅ High that the issue is real; choice of mitigation is implementation preference.
---
## M-7 — MAVLink2 signing is v1-mandatory
**Fact.** S44: signing supported in ArduPilot 4.5+ on telemetry links; USB bypasses; keys in FRAM.
**Reference comparison.** Without signing, anyone with serial-line access (companion side OR an exposed telemetry radio) can inject a `GPS_INPUT` (or ODOMETRY) frame and crash the vehicle. Signing makes that injection require possession of the FRAM key. The cost is one operator key-provisioning step per airframe.
**Conclusion.** Promote signing from "Security note (deferred to a Phase-4 security pass)" to a **v1 hard configuration item**. Document the key-provisioning procedure in the deploy runbook. Verify signing-on at boot and refuse to inject GPS_INPUT/ODOMETRY if the signed-frame ack from the FC indicates signing-off.
**Confidence.** ✅ High.
---
## M-8 — MBTiles operational recipe
**Fact.** S54: WAL + connection pool + transaction batching is the established recipe for MBTiles SQLite under concurrent reader+writer load. Default rollback journal mode causes `database is locked` failures.
**Reference comparison.** Our workload: many concurrent readers (matcher cache lookup at ≤3 fps × ~30 candidate tiles) + occasional writer (Component 1b ortho-tile write at ≤1–2 Hz × ~30 tiles). Without WAL, every writer commit blocks all readers. With WAL, readers and one writer proceed concurrently.
**Conclusion.** Update Component 1's "Tile format" row in the architecture table to specify: **MBTiles SQLite + WAL + connection pool + per-Component-1b-cycle transaction batching**. Add to AC-4.1 latency-budget validation: the tile-cache lookup must hit p95 ≤5 ms.
**Confidence.** ✅ High.
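
The WAL and batching parts of the recipe can be sketched with the standard-library `sqlite3` module. A minimal sketch, assuming the standard MBTiles `tiles` table; the function names are hypothetical, the connection pool is left out, and the `synchronous=NORMAL` setting is an assumed tuning choice rather than something stated in S54:

```python
import sqlite3


def open_tile_db(path):
    """Open (or create) an MBTiles-style SQLite file in WAL mode."""
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block on the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # assumed trade-off: fewer fsyncs
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        "zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, tile_data BLOB)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS tile_index "
        "ON tiles (zoom_level, tile_column, tile_row)"
    )
    conn.commit()
    return conn


def write_tiles_batched(conn, tiles):
    """Write one ortho-tile cycle as a single transaction.

    tiles: iterable of (zoom, column, row, png_or_jpeg_bytes).
    """
    with conn:  # commits on success, rolls back on exception
        conn.executemany("INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", tiles)


def read_tile(conn, zoom, column, row):
    """Return the tile blob, or None on a cache miss."""
    hit = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (zoom, column, row),
    ).fetchone()
    return None if hit is None else hit[0]
```

Readers and the single writer would each hold their own connection; batching all tiles of one Component 1b cycle into one transaction keeps the number of commits per second low.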
---
## M-9 — Cache-poisoning safety hazard
**Fact (analytical, not a single source).** Draft's dedup rule allows onboard tiles to overwrite stale service tiles when "our quality > existing". Quality = inlier count + sharpness; **does not include parent-pose covariance as a hard gate**. Combined with EKF over-confidence (a known failure mode — see W4.b), this lets a confidently-bad pose write a misaligned tile that becomes the next flight's anchor.
**Reference comparison.** Cartography literature consistently treats authoritative basemap as immutable and crowdsourced/UAV updates as voting input that requires consensus before promotion. SfM bundle-adjustment treats over-confident poses as the dominant error source.
**Conclusion.** Three layered mitigations:
1. Service-source tiles are **immutable within freshness budget**. Onboard tiles overwrite only stale or other-onboard tiles.
2. The Suite Service ingest applies a **voting layer**: an onboard tile gets promoted to "trusted basemap" only after **N≥2 independent flights** confirm consistent geo-alignment within X m.
3. Parent-pose covariance is a **hard gate** in the local quality score: σ_xy must be tighter than the generation-eligibility gate (e.g., σ_xy ≤ 5 m vs. 10 m generation gate), and a tile written above the hard gate is marked "soft" in its sidecar.
Add **AC-NEW-7 — Cache-poisoning safety budget**: P(onboard tile mis-aligned > 30 m) per flight < 1 %; P(misaligned > 100 m) per flight < 0.1 %. Validation: replay AerialVL with synthetic over-confidence injection.
**Confidence.** ⚠️ Medium. Hazard is real and qualitatively well-known; specific numeric thresholds need empirical calibration during implementation.
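
The local part of the mitigations (immutability of fresh service tiles plus the covariance hard gate) can be sketched as a single write-decision function. This is an illustration under stated assumptions, not the draft's implementation: the `Tile` fields and `decide_write` are hypothetical names, the 5 m / 10 m values are the example gates quoted above, and keeping the "our quality > existing" comparison for every overwrite is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

SIGMA_XY_GENERATION_GATE_M = 10.0  # generation-eligibility gate (example value from the text)
SIGMA_XY_HARD_GATE_M = 5.0         # hard gate; looser poses produce "soft" tiles (example value)


@dataclass
class Tile:
    source: str          # "service" or "onboard"
    quality: float       # inlier count + sharpness score
    stale: bool = False  # True once the tile is outside its freshness budget
    soft: bool = False   # True if written with sigma_xy above the hard gate


def decide_write(existing: Optional[Tile], sigma_xy_m: float, quality: float) -> Optional[Tile]:
    """Return the onboard tile to store, or None if the write is rejected."""
    if sigma_xy_m > SIGMA_XY_GENERATION_GATE_M:
        return None  # parent pose too uncertain to generate a tile at all
    if existing is not None:
        if existing.source == "service" and not existing.stale:
            return None  # service tiles are immutable within their freshness budget
        if quality <= existing.quality:
            return None  # original dedup rule: only overwrite with better quality
    return Tile(source="onboard", quality=quality, soft=sigma_xy_m > SIGMA_XY_HARD_GATE_M)
```

The point of the ordering is that quality alone can never override a fresh service tile, so an over-confident pose with many inliers is stopped before the quality comparison is reached.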
---
## M-10 — Free-threaded Python 3.13 not v1-ready
**Fact.** S55: experimental, single-threaded perf hit, GIL re-enables on non-FT-aware C extension import.
**Reference comparison.** Our hot-path includes: numba JIT kernels, TensorRT Python bindings, pymavlink (C extension), numpy/scipy, possibly cv2. Any one of these silently re-enabling the GIL nullifies the benefit. In addition, the non-trivial single-threaded penalty (~10–15 % per various benchmarks) directly hits AC-NEW-1 (cold-start TTFF <30 s).
**Conclusion.** v1 stays on **standard CPython 3.11 or 3.12** (newest stable, well-supported by JetPack / numba / TRT). Sharpen the rationale in the architecture: the choice is not "GIL is fine" but "asyncio + TRT subprocess workers + numba JIT is the production-ready combination today; revisit free-threading in v1.1."
**Confidence.** ✅ High.
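
If free-threading is revisited in v1.1, a startup check along these lines would make a silent GIL re-enable visible. It is a sketch that assumes CPython: `sysconfig.get_config_var("Py_GIL_DISABLED")` identifies a free-threaded build, and `sys._is_gil_enabled()` is only present on 3.13+, so it is looked up defensively. The function name `gil_status` is hypothetical.

```python
import sys
import sysconfig


def gil_status() -> str:
    """Describe whether this interpreter is free-threaded and whether the GIL is currently active."""
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on CPython 3.13+; older interpreters always run with the GIL.
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    if not free_threaded_build:
        return "standard build (GIL always on)"
    if gil_enabled:
        return "free-threaded build, GIL re-enabled"
    return "free-threaded build, GIL off"


status = gil_status()
```

Calling it once after all hot-path extensions are imported is the useful moment, since the re-enable happens at import time.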
---
## M-11 — ODOMETRY known production gotchas → SITL coverage required
**Fact.** S41/S42/S43: companion-derived velocity errors, position-estimate resets on external-nav reference loss, source-switching conflicts when running alongside GPS.
**Reference comparison.** AC-NEW-2 (3 s spoofing-promotion latency) **is** the source-switching path. Whatever output channel we pick (GPS_INPUT, ODOMETRY, or hybrid), the source switch is the high-risk transition.
**Conclusion.** Add an explicit testing requirement: **F-T9 (SITL: full MAVLink loop)** must include source-switching scenarios (jam onset → our channel → spoofed real-GPS recovery → operator-confirmed source restore). Include the `EK3_SRC1_*` parameter combinations being benchmarked in the test plan.
**Confidence.** ✅ High.
---
## M-12 — Eastern-Ukraine relief amplitude affects flat-Earth assumption
**Fact.** S56: ~24 m peak-to-trough relief in Kharkiv-region UAV survey areas, with creek/gully systems.
**Reference comparison.** At 1 km AGL with 35° HFOV camera, a 24 m elevation offset at frame edge → ~17 m horizontal misalignment when ortho-projected on flat-Earth. AC-1.1 budget = 50 m@80 % (comfortable); AC-1.2 = 20 m@50 % (tight).
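
The misalignment figure follows from simple relief-displacement geometry: a ray that leaves the camera at angle θ from nadir and meets terrain Δh away from the assumed flat plane lands Δh·tan θ off horizontally, independent of flight altitude. The sketch below is a worked check under that flat-plane model, with a hypothetical function name. The ~17 m value is reproduced when the edge ray is 35° off nadir; if 35° is instead the full HFOV of a nadir-pointing camera, the edge ray is 17.5° off nadir and the same model gives roughly 7.6 m, so the figure depends on which reading of the camera geometry applies.

```python
import math


def relief_displacement_m(height_error_m: float, off_nadir_deg: float) -> float:
    """Horizontal error when a ray off_nadir_deg from nadir is projected onto a plane
    that is height_error_m above or below the true terrain."""
    return height_error_m * math.tan(math.radians(off_nadir_deg))


edge_ray_35_deg = relief_displacement_m(24.0, 35.0)    # about 16.8 m
edge_ray_17_5_deg = relief_displacement_m(24.0, 17.5)  # about 7.6 m
```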
**Conclusion.** Add a **per-sector DEM lookup** to the pre-flight tile-sync pass. Classify sectors:
- **flat** (≤5 m amplitude) — full ortho-tile generation, full anchor weight.
- **moderate** (5–15 m) — ortho-tile generation, anchor weight × 0.7.
- **rugged** (>15 m) — skip ortho-tile generation, anchor weight × 0.3 with explicit "rugged-sector" flag in confidence telemetry.
This is a small one-time pre-flight step (SRTM 30 m DEM is free, ~15 GB global, ~30 MB for 400 km²).
**Confidence.** ⚠️ Medium. Single regional sample; refine numbers when more terrain data lands.
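
The sector classification can be sketched as a small lookup from DEM relief amplitude to a policy. The thresholds and weights are the ones listed above (≤5 m flat, 5 to 15 m moderate, >15 m rugged); `SectorPolicy`, `relief_amplitude_m` and `classify_sector` are hypothetical names, and taking amplitude as max minus min of the sector's DEM samples is an assumption about how "peak-to-trough" is measured.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SectorPolicy:
    terrain_class: str           # "flat", "moderate" or "rugged"
    generate_ortho_tiles: bool   # whether Component 1b may write tiles in this sector
    anchor_weight: float         # multiplier applied to satellite-anchor weight


def relief_amplitude_m(dem_heights_m: Sequence[float]) -> float:
    """Peak-to-trough relief of a sector, from its DEM height samples."""
    return max(dem_heights_m) - min(dem_heights_m)


def classify_sector(amplitude_m: float) -> SectorPolicy:
    """Map relief amplitude to the flat / moderate / rugged policy."""
    if amplitude_m <= 5.0:
        return SectorPolicy("flat", True, 1.0)
    if amplitude_m <= 15.0:
        return SectorPolicy("moderate", True, 0.7)
    return SectorPolicy("rugged", False, 0.3)


# A sector with the ~24 m relief reported for the Kharkiv-region survey areas.
gully_sector = classify_sector(relief_amplitude_m([100.0, 112.0, 124.0]))
```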
---
## M-13 — TartanAir V2 reconsideration (open question)
**Fact.** S51: photo-realistic synthetic, native IMU + 12-cam + season variation + custom camera models.
**Reference comparison.** User's last-message reasoning was "Mavic-class dynamics ≠ fixed-wing dynamics → synthetic IMU is unlikely to produce a useful signal". TartanAir V2 lets us configure motion patterns, so the dynamics-mismatch argument is weaker than for MidAir-class quadcopter-only sims.
**Conclusion.** **Open question for the user**: include TartanAir V2 in the bench-off as an early-stage synthetic baseline (good for sweeping seasons / lighting / pitches), or hold to "real-data-only purism" with AerialVL + Mavic + planned-fixed-wing-flights as the only V&V?
**Confidence.** ⚠️ Medium. Technical viability is high; the call is product-side.
---
## M-14 — Add AerialExtreMatch + 2chADCNN to V&V plan
**Fact.** AerialExtreMatch (S49) — 1.5 M synthetic image pairs, 32 difficulty levels (overlap × scale × pitch), real-world UAV localization subset. 2chADCNN (S50) — season-aware UAV↔satellite template-matching.
**Reference comparison.** Draft's bench-off targets are AerialVL + UAV-VisLoc + internal Mavic. None of those grade against extreme-pitch / extreme-scale / extreme-overlap separately. Without a benchmark that crosses these axes, the bench-off can pick a winner that fails silently in cornered conditions.
**Conclusion.** Add to the V&V plan:
- **AerialExtreMatch** as a primary structured-difficulty regression bench.
- **2chADCNN** as a season-aware baseline either (a) included in the bench-off, or (b) used as an explicit season-robustness ceiling reference.
**Confidence.** ✅ High.
---
## M-15 — Real fixed-wing VO is harder than draft implies
**Fact.** S52 (AFIT thesis): SVO/DSO/ORB-SLAM2 all "had significant difficulty maintaining localisation" on real fixed-wing flights. S53: high-altitude (300–1000 m AGL) VIO drift in the same band as our AC-1.3.
**Reference comparison.** Draft's choice ("custom 2-frame homography VO via Component-3 matcher") is correct framing — VO between satellite anchors is a **much easier** problem than standalone metric SLAM. But AC-1.3's drift budget (<100 m without IMU, <50 m with IMU between two satellite-anchored fixes) requires empirical confirmation against a real fixed-wing baseline.
**Conclusion.** Add to risks: **R8 — fixed-wing VO drift under our AC-1.3 budget is unconfirmed**. Mitigations:
1. Borrow AerialVL's fixed-wing trajectories (70 km of real fixed-wing flight) for AC-1.3 regression in `F-T1b` (new).
2. Plan the first internal fixed-wing flight before AC lock — not as a stretch goal.
**Confidence.** ✅ High.
---
## Summary table
| Finding | Severity | Affects | Resolution |
|---|---|---|---|
| M-1 | High | C-6, AC-4.3, AC-NEW-4 | Hybrid GPS_INPUT + ODOMETRY |
| M-2 | High | C-3 bench-off | Drop MASt3R from primary list |
| M-3 | Med | C-3 bench-off | Add GIM-LightGlue |
| M-4 | High | C-2 bench-off | Add SALAD + BoQ |
| M-5 | High (positive) | AC-4.1 | Downgrade R2 risk |
| M-6 | High (security) | C-6 | Replace mavlink-router OR sandbox & pin |
| M-7 | High (security) | C-6 | MAVLink2 signing v1-mandatory |
| M-8 | Med | C-1 | MBTiles WAL + pool + batching |
| M-9 | High (safety) | C-1b, AC-NEW | New AC-NEW-7 + dedup-rule changes |
| M-10 | Med | C-9 | Stay on CPython 3.11/3.12; sharpen rationale |
| M-11 | Med | C-5/C-6, AC-NEW-2 | Add SITL source-switching tests |
| M-12 | Med | C-1b, AC-1.2 | Per-sector DEM lookup + anchor weight |
| M-13 | Open question | datasets | Surface to user |
| M-14 | Med | V&V plan | Add AerialExtreMatch + 2chADCNN |
| M-15 | Med | C-4, AC-1.3 | Risk R8 + AerialVL F-T1b |
@@ -0,0 +1,151 @@
# Mode B Round 2 — Reasoning Chain
For each user-named component (Q-R2-1 … Q-R2-4) plus the sweep (Q-R2-5), the reasoning chain follows the **fact-confirm → reference-compare → conclude → confidence** pattern.
---
## Dimension 1: Visual Odometry (Component 4)
### Fact confirmation
- Draft02 C-4: "custom 2-frame VO via SuperPoint+LightGlue homography". (M-22)
- AFIT thesis (S52): SVO / DSO / ORB-SLAM2 all "had significant difficulty maintaining localisation" on real fixed-wing flights.
- High-altitude VIO field test (S72, MDPI Drones 2023): stereo-VIO = 2.186 m / 800 m at 40–100 m AGL; monocular-VIO "acceptable but worse". At 1 km AGL motion parallax shrinks ~10–25× per frame vs 100 m AGL, further degrading monocular-VO accuracy.
- cuVSLAM (S60, NVIDIA, Jul 2025): CUDA-accelerated, designed for Jetson, monocular + monocular-IMU + stereo modes, **<1 % ATE on KITTI / <5 cm on EuRoC**. Apache-2.0. Drop-in via `isaac_ros_visual_slam` (S64).
- DPVO / DPVO-QAT++ (S61, S73): SOTA deep VO. Original DPVO 2–5× real-time on RTX-3090 (4 GB GPU); DPVO-QAT++ benchmarked on RTX-4060 only (+52 % FPS TartanAir, +30 % FPS EuRoC). Orin Nano Super extrapolation: ~4–10 FPS plain DPVO, ~6–15 FPS DPVO-QAT++. Borderline for 10 Hz.
- MASt3R-SLAM (S62, CVPR 2025): 15 FPS on a single GPU; sub-1 Hz extrapolated on Orin Nano Super → infeasible inline.
- VINS-Fusion / OpenVINS / BASALT / SVO Pro on Orin Nano (S71): all build with non-trivial integration cost (OpenCV pinning, ROS plumbing, IMU-time-sync); none CUDA-accelerated; no accuracy advantage over cuVSLAM.
### Reference comparison
| VO option | Maturity on Orin Nano Super | Accuracy benchmark | Memory | Integration cost | Notes |
|-----------|----------------------------|--------------------|--------|-----------------|-------|
| **cuVSLAM (mono)** | ✅ NVIDIA-supported, reference designs exist | <1 % ATE KITTI / <5 cm EuRoC | <2 GB | ROS 2 wrapper (1 day) | M-22, S60, S64 |
| **DPVO / DPV-SLAM** | ⚠️ no Jetson port; ~6–15 FPS extrapolated | SOTA on TartanAir / EuRoC | 5–7 GB GPU | manual port + QAT | M-23, S61, S73 |
| **MASt3R-SLAM** | ❌ infeasible | best on EuRoC + 7-Scenes | 24 GB GPU class | research-track only | M-23, S62 |
| **VINS-Fusion** | ⚠️ ~15 FPS on Xavier NX after pinning | ~3–11 cm path err on EuRoC | ~1 GB | manual integration + memory tuning | S71 |
| **OpenVINS** | ⚠️ builds on Orin Nano w/ JetPack 6 | comparable to VINS-Fusion | ~1 GB | manual integration + ROS 2 plumbing | S71 |
| **BASALT / SVO Pro** | ⚠️ stereo-first; mono available | mid-tier | low | high integration cost | S71 |
| **Custom 2-frame homography VO (draft02)** | n/a | drift unbounded; AFIT thesis prediction: poor | low | "easy" but wrong design | M-22 |
### Conclusion
- **Replace draft02's custom 2-frame VO with cuVSLAM in monocular + IMU mode** (revised C-4).
- Defer DPVO / MASt3R-SLAM / VINS-Fusion / OpenVINS to a research-track bench-off only after cuVSLAM has empirical numbers on a fixed-wing 1 km AGL trajectory; if cuVSLAM underperforms, those are the fall-back candidates.
- IMU source for cuVSLAM: subscribe to MAVLink `RAW_IMU` / `SCALED_IMU` from FC at ~200–400 Hz (path (a) of M-35). If sync jitter is too high, add a dedicated companion IMU in v1.1.
### Confidence
- ✅ High on "custom 2-frame VO is wrong"; ⚠️ Medium on "cuVSLAM is the right replacement" — high-altitude fixed-wing performance unproven on cuVSLAM's published benches. **Bench-off in F-T1b (revised) mandatory before AC-1.3 lock.**
---
## Dimension 2: Cross-view Matcher (Component 3)
### Fact confirmation
- LiteSAM (S58, MDPI Oct 2025): purpose-built satellite↔aerial. 6.31 M params (2.4× smaller than EfficientLoFTR's 15.05 M). RMSE@30 = 17.86 m on UAV-VisLoc. **497.49 ms / pair on Jetson AGX Orin** FP16-optimized.
- AGX Orin INT8 throughput ≈ 275 TOPS, Orin Nano Super ≈ 67 TOPS → 4× scaling factor. **LiteSAM on Orin Nano Super ≈ 1500–2000 ms / pair.** (M-24)
- RoMa v2 (S63, Nov 2025): SOTA dense, frozen DINOv3 + custom CUDA + predictive covariance. GPU-class footprint. (M-25)
- MapGlue / MATCHA (search results): cross-modal SOTA; no Jetson deployment data. (M-25)
- SuperPoint + LightGlue (TRT FP16): RTX 3080 = 0.95 ms (SP) + 2.54 ms (LG) at 320×240. Scaling to Orin Nano Super FP16: ~50 ms / pair at 320×240; ~200 ms / pair at 640×480. (search summary, S76 sanity check)
- XFeat (S08 round 1, search summary): 5× faster than other deep matchers; CPU-viable; TRT path on Jetson exists.
- Our budget (AC-4.1): 400 ms p95 end-to-end pipeline on Orin Nano Super 25 W → matcher must consume ≤150–200 ms / pair to leave headroom for VPR + ortho + EKF + I/O.
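
The LiteSAM extrapolation above is a straight TOPS-ratio scaling, which can be written out as a quick check against the matcher budget. This is a back-of-envelope sketch: the constants are the figures quoted in the bullets, the function name is hypothetical, and linear scaling with peak INT8 TOPS is a crude assumption (memory bandwidth and FP16 vs INT8 paths are ignored), which is why the bench-off has to confirm it.

```python
AGX_ORIN_TOPS = 275.0             # INT8 peak, as quoted above
ORIN_NANO_SUPER_TOPS = 67.0       # INT8 peak, as quoted above
LITESAM_AGX_MS = 497.49           # measured per-pair latency on Jetson AGX Orin
MATCHER_BUDGET_MS = (150.0, 200.0)  # inline matcher allowance inside the 400 ms p95 budget


def scale_latency_ms(measured_ms: float, src_tops: float, dst_tops: float) -> float:
    """Scale a measured latency by the ratio of peak throughputs (crude linear model)."""
    return measured_ms * src_tops / dst_tops


litesam_nano_ms = scale_latency_ms(LITESAM_AGX_MS, AGX_ORIN_TOPS, ORIN_NANO_SUPER_TOPS)
fits_inline = litesam_nano_ms <= MATCHER_BUDGET_MS[1]
```

With the unrounded ratio (about 4.1×) the estimate lands at roughly 2 s per pair, an order of magnitude over the inline allowance, so the conclusion does not hinge on whether the true factor is 3× or 4×.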
### Reference comparison
| Matcher | Inline-feasible on Orin Nano Super @ 25 W? | Accuracy on satellite↔aerial | Specialization for cross-view? | Role |
|---------|--------------------------------------------|------------------------------|--------------------------------|------|
| **SP + LG (TRT FP16)** | ✅ ~50–200 ms / pair | strong on UAV-VisLoc / AerialVL | generic | **inline lead** |
| **GIM-LightGlue** | ✅ same path as SP+LG | +8.4–18.1 % zero-shot vs LG (S48) | generic, internet-trained | **inline peer / bench-off** |
| **XFeat (sparse + semi-dense)** | ✅ very fast | weaker than SP+LG on cross-view | embedded-class | **degraded-power fallback** |
| **LiteSAM** | ❌ ~1500–2000 ms / pair | RMSE@30 = 17.86 m UAV-VisLoc, **best published satellite↔aerial** | yes (purpose-built) | **re-loc fallback / oracle / distillation teacher** |
| **GIM-RoMa / RoMa v2** | ❌ GPU-class | best dense matcher published | generic | **offline ceiling reference** |
| **MASt3R / MASt3R-SLAM** | ❌ infeasible | very high | dense reconstruction | research-track only |
| **MapGlue / MATCHA** | ❌ no Jetson data | strong on cross-modal | yes | research-track only |
| **2chADCNN** | ❌ wrong output type (template-overlap) | season-aware | yes | season-robustness ceiling reference |
| **Classical SIFT/ORB/AKAZE** | ✅ very fast | poor cross-view (F-A5) | no | last-resort degraded mode |
### Conclusion
- **SP+LG (TRT FP16/INT8) remains the inline matcher.** GIM-LightGlue is its peer in the bench-off.
- **LiteSAM joins the design in three non-inline roles**: re-loc fallback (cold start, σ_xy > 50 m, 1.5–2 s budget acceptable); validation oracle (offline regression bench); distillation teacher (train a smaller satellite-aerial-specialized student that fits the inline budget).
- **RoMa v2 + MASt3R + MapGlue + MATCHA** added to the matcher bench-off as **offline ceiling references only** so we know how much accuracy we trade by using SP+LG inline.
- **Bench-off scope (revised)** for the deferred research item: SP+LG, GIM-LightGlue, XFeat (sparse + semi-dense), LiteSAM (re-loc role), RoMa v2 (ceiling), MASt3R-SLAM (ceiling). Score on UAV-VisLoc + AerialVL + AerialExtreMatch + 2chADCNN-season-set + internal Mavic + first fixed-wing flight.
### Confidence
- ✅ High on roles + decisions; ⚠️ Medium on the AGX-Orin → Orin Nano Super 4× scaling for LiteSAM — bench-off should confirm.
---
## Dimension 3: EKF vs ESKF (Component 5)
### Fact confirmation
- ArduPilot EKF3 = classical extended Kalman filter, 24-state, runs at 400 Hz. **Not** an ESKF. (S65, S66, S67 + draft02 M-1)
- PX4 EKF2 is an ESKF (S68). We are not on PX4.
- Companion-side filter advantages of ESKF over EKF (S68, S69, S70): better quaternion/orientation handling via tangent-space covariance (Lie group), ~0.3 % CPU saved, better numerical conditioning, simpler equations.
- ArduPilot ExtNav best practice (S65, S66, S67): **only one position source per axis at a time**. ArduPilot has open / recently-closed bugs (#30076, #32506) when ExtNav and GPS are both fed with overlapping responsibilities.
- Round 1 M-1 said "emit BOTH GPS_INPUT AND ODOMETRY in parallel" — without specifying axis-level responsibility split. This is the bug condition documented by S65/S66.
### Reference comparison
| Filter location | Filter family | Worth it for v1? | Why |
|-----------------|---------------|------------------|-----|
| **FC (ArduPilot)** | EKF3 (regular EKF) | n/a — locked | Cannot swap; not user-controlled. |
| **Companion (draft02 plan)** | "loosely-coupled EKF" | ❌ remove for v1 | Causes double-fusion against the FC's own EKF3, observability mismatches, the M-1-class bug pattern. |
| **Companion (M-26 revised plan)** | None — replaced with **covariance calibrator + Mahalanobis outlier gate + source-label producer** | ✅ for v1 | Simpler; lets ArduPilot's EKF3 do the actual fusion. |
| **Companion (v1.1, if needed)** | Vanilla ESKF (S69) | ⚠️ optional | Only if F-T9 SITL shows EKF3 cannot handle our raw inputs. ESKF is the correct family — but smoothing visual fixes ahead of EKF3 is rarely the right answer; usually better to fix covariance estimation upstream. |
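
A Mahalanobis outlier gate of the kind named in the M-26 row can be sketched without any state propagation: compare a new absolute fix against a predicted position and reject it if the squared Mahalanobis distance exceeds a chi-square threshold. The sketch below is 2-D and pure Python; the function names are hypothetical, the 99 % threshold (9.21 for 2 degrees of freedom) is an illustrative choice, and where the predicted position comes from (for example the last accepted fix advanced by the VO relative pose) is an assumption not fixed by the source.

```python
CHI2_2DOF_99 = 9.21  # chi-square 99 % quantile for 2 degrees of freedom


def mahalanobis_sq_2d(residual, cov):
    """Squared Mahalanobis distance r^T S^-1 r for a 2-vector r and a 2x2 covariance S."""
    rx, ry = residual
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # S^-1 = (1/det) * [[d, -b], [-c, a]]
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def accept_fix(fix_xy, predicted_xy, fix_cov, predicted_cov, threshold=CHI2_2DOF_99):
    """Accept the fix if its innovation is statistically consistent with the prediction."""
    residual = (fix_xy[0] - predicted_xy[0], fix_xy[1] - predicted_xy[1])
    # Innovation covariance: uncertainty of the fix plus uncertainty of the prediction.
    innovation_cov = tuple(
        tuple(fix_cov[i][j] + predicted_cov[i][j] for j in range(2)) for i in range(2)
    )
    return mahalanobis_sq_2d(residual, innovation_cov) <= threshold


fix_cov = ((16.0, 0.0), (0.0, 16.0))   # 4 m sigma per axis
pred_cov = ((9.0, 0.0), (0.0, 9.0))    # 3 m sigma per axis
near = accept_fix((3.0, 4.0), (0.0, 0.0), fix_cov, pred_cov)    # d^2 = 25 / 25 = 1
far = accept_fix((30.0, 40.0), (0.0, 0.0), fix_cov, pred_cov)   # d^2 = 2500 / 25 = 100
```

The gate only filters; it does not smooth or fuse, which keeps the actual fusion on the FC's EKF3 as the table intends.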
### Conclusion
- **Drop the companion-side EKF for v1.** Component 5 becomes a "covariance calibrator + outlier gate + source-label producer" — *no state propagation, no IMU integration on the companion*.
- **Hybrid GPS_INPUT + ODOMETRY revised (M-30)**:
- **Option A (v1 default)**: GPS_INPUT carries position + velocity + h_acc/v_acc. ODOMETRY is **disabled** for v1. ArduPilot configured `EK3_SRC1_*=GPS+Compass`. Failover to backup via `EK3_SRC2_*`.
- **Option B (v1.1+)**: ODOMETRY carries pose+velocity+yaw + 21-element covariance; GPS_INPUT held in reserve, not fused while ODOMETRY healthy. `EK3_SRC1_POSXY=ExternalNav`, `EK3_SRC2_POSXY=GPS`. Requires PR #30080-class fixes.
- **ESKF is the right family if and only if we re-introduce a companion-side filter later.** For v1 the question is moot.
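
The two options can be written down as ArduPilot parameter sets. This is a sketch rather than a validated configuration: the numeric values are the standard `EK3_SRCn_*` source enums (3 = GPS, 6 = ExternalNav, 1 = Baro for `POSZ`, 1 = Compass for `YAW`) and `GPS1_TYPE = 14` (MAVLink GPS), but the choice of Baro for `POSZ` and GPS for the velocity axes in Option A is an assumption, the source only fixes `POSXY` for Option B, and older firmware names the GPS parameter `GPS_TYPE`. The helper `horizontal_position_source` is hypothetical.

```python
# Option A (v1 default): GPS_INPUT is the only companion channel; ODOMETRY disabled.
OPTION_A = {
    "GPS1_TYPE": 14,       # MAVLink GPS, i.e. GPS_INPUT from the companion
    "EK3_SRC1_POSXY": 3,   # GPS
    "EK3_SRC1_VELXY": 3,   # GPS (assumed)
    "EK3_SRC1_POSZ": 1,    # Baro (assumed; not specified in the text)
    "EK3_SRC1_VELZ": 3,    # GPS (assumed)
    "EK3_SRC1_YAW": 1,     # Compass
}

# Option B (v1.1+): ODOMETRY primary, GPS held as the second source set.
OPTION_B = {
    "EK3_SRC1_POSXY": 6,   # ExternalNav
    "EK3_SRC2_POSXY": 3,   # GPS
}

POSXY_SOURCE_NAMES = {0: "None", 3: "GPS", 4: "Beacon", 6: "ExternalNav"}


def horizontal_position_source(params: dict, source_set: int = 1) -> str:
    """Name the single horizontal-position source configured for a given source set."""
    return POSXY_SOURCE_NAMES[params.get(f"EK3_SRC{source_set}_POSXY", 0)]
```

Because each `EK3_SRCn_POSXY` holds exactly one value, the "one position source per axis" rule is enforced by the parameter structure itself; the risk sits in the switch between source sets, which is what F-T9 exercises.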
### Confidence
- ✅ High on dropping companion-side filter for v1; ⚠️ Medium on whether v1.x will need it back — depends on F-T9 SITL evidence.
---
## Dimension 4: Ortho-Tile Generator (Component 1b)
### Fact confirmation
- Orthority (S59) is a maintained Python library: frame + RPC camera models, GeoTIFF DEM lookup, RPC refinement, pan-sharpening. CLI + API. Pip / conda installable.
- ODM is a post-processing batch SfM pipeline; recommended 128 GB RAM for 2500 images. Wrong tier.
- simple-ortho (S59 predecessor) is older and superseded by Orthority.
- Draft02 plan (C-1b): "Pinhole projection on per-sector DEM" — implicit hand-rolled implementation.
### Reference comparison
| Approach | Build cost | Per-frame latency | Maintenance | Risk |
|----------|-----------|-------------------|-------------|------|
| **Orthority** (S59) | ~1 day integration | unknown — must measure on Orin Nano Super | externalized | depends on Orin Nano latency check |
| **Hand-rolled `cv2.warpPerspective` + bilinear DEM lookup** | ~2–3 days | ~5–20 ms estimated | internal | reinvents distortion + DEM gimbal handling |
| **ODM / OpenSfM** | weeks | seconds-to-minutes (batch) | ext. | wrong tier |
| **MicMac** | weeks | seconds-to-minutes (batch) | ext. | wrong tier |
### Conclusion
- **Use Orthority for per-frame ortho.** Falls back to hand-rolled `cv2.warpPerspective` + bilinear DEM lookup if F-T14 measurements show Orthority's per-frame latency on Orin Nano Super exceeds the budget (estimate ≤30–50 ms allotted to ortho).
- Reuses Orthority's distortion + RPC + DEM machinery instead of reinventing it.
### Confidence
- ✅ High on "use a library, not DIY"; ⚠️ Medium on "Orthority specifically" pending the latency measurement.
---
## Dimension 5: Component sweep (Q-R2-5)
### Fact confirmation + comparison + conclusion (compact)
| Component | Round 1 choice | Round 2 finding | Conclusion |
|-----------|----------------|-----------------|------------|
| **C-1 Tile cache** | MBTiles + WAL + connection pool + transaction batching | COG metadata-load issue (500 MB on 7 GB file) defeats selective access on UAV bandwidth; PMTiles strong only for HTTP serving — local microSD use loses SQLite-WAL concurrency. | **Unchanged** (M-28). |
| **C-2 VPR** | AnyLoc + SALAD + BoQ + MixVPR shortlist | No new VPR backbone displaces SALAD/BoQ on aerial cross-domain in 2025. 2025 SOTA is matcher-side (RoMa v2, LiteSAM, MASt3R-SLAM). | **Unchanged** (M-33). |
| **C-3 Cross-view matcher** | SP+LG lead, GIM-LG peer, MASt3R dropped | LiteSAM added as re-loc/oracle/teacher (M-24); RoMa v2 + MapGlue + MATCHA added as offline ceilings (M-25). | **Revised** — see Dimension 2. |
| **C-4 VO** | Custom 2-frame homography VO via SP+LG/GIM-LG | Custom VO is wrong design (M-22); cuVSLAM is the v1 candidate (M-23). | **Revised** — see Dimension 1. |
| **C-5 Fusion** | Companion-side loose-coupled EKF emitting GPS_INPUT + ODOMETRY in parallel | Companion-side EKF should be dropped (M-26); hybrid output revised to single-channel-per-axis (M-30); ESKF only if v1.1 evidence demands it. | **Revised** — see Dimension 3. |
| **C-6 MAVLink** | distinct sysid + native routing + signing | Survives unchanged; sysid collision-check added to deploy runbook (M-31). | **Unchanged** (M-31). |
| **C-7 Failsafe** | Unchanged from Mode A | No new findings. | **Unchanged.** |
| **C-8 Object localization** | trig + airframe-attitude fusion | No new findings. | **Unchanged.** |
| **C-9 Software platform** | CPython 3.11/3.12 + asyncio + TRT subprocess workers | ROS 2 Humble + Isaac ROS 3.2 is a proven reference for the same hardware (M-29); becomes the most-likely v1 path **if** we adopt cuVSLAM (M-23). Decision needs user input. | **OPEN QUESTION** — see Validation Log. |
| **C-10 FDR** | Unchanged + sector + trust_level fields | No new findings. | **Unchanged.** |
| **C-11 Confidence score** | Composite + per-channel emission | No new findings. | **Unchanged.** |
### Confidence
- ✅ High on every "Unchanged" row; ⚠️ Medium on M-29 (ROS 2 vs DIY) — pending user decision.
@@ -0,0 +1,75 @@
# Validation Log — Mode B
## Validation scenario
A typical 8-hour fixed-wing mission in eastern Ukraine, mid-summer, sunny. The UAV climbs to 1 km AGL on the way to the sector, transits ~50 km of corridor, performs ~1.5 h of dense coverage (sector-pattern), and returns. Mid-flight, the operator-side EW threat indicator reports a GPS-spoofing event. At minute 45 the companion computer browns out and reboots; at minute 90 the UAV passes over a 25-m-deep gully system; at minute 180 a sharp weather-avoidance turn reduces frame overlap to <5 % for two consecutive frames; at minute 310 the bench network drops out before tile upload finishes; at minute 470 the UAV lands.
For each of these waypoints, walk through what the system produces using the **Mode B-revised draft** vs. the **Mode A draft**.
---
## Expected behaviour by waypoint
### Cruise (steady state)
- **Mode A** — emit GPS_INPUT only; covariance collapsed to scalar `h_acc`. EKF in companion does the fusion.
- **Mode B (revised)** — emit GPS_INPUT (primary, GPS-substitute framing) **and** ODOMETRY (when full 6-DoF covariance is available; quality > VISO_QUAL_MIN). FC's EKF3 has access to richer signal; companion EKF is still the source of truth for source-label assignment.
- **Counterexample check** — what if ODOMETRY's covariance is wrong? VISO_QUAL_MIN gates it on the FC; GPS_INPUT path stays valid as failover. Net: no regression vs. Mode A.
### Spoofing event (AC-NEW-2)
- **Mode A** — listen to `GPS_RAW_INT` / `EKF_STATUS_REPORT`; promote our GPS_INPUT to fix_type=3D in <3 s.
- **Mode B (revised)** — same, plus M-11 SITL coverage of the EK3_SRC1_* parameter switch path. The known-bug landscape (S43) is now a hard test gate, not a risk.
- **Counterexample check** — what if the source switch deadlocks because PR #30080's fix isn't in the ArduPilot version we ship? Mitigation: pin to the ArduPilot version that contains the merged PR; document in deploy runbook.
### Companion brown-out + reboot (AC-NEW-1, cold-start TTFF <30 s)
- **Mode A** — TRT engines build at install time; CUDA / TRT init <5 s; cold-fix via VPR + matcher within remaining budget.
- **Mode B (revised)** — same path, but the latency-budget headroom is much bigger than draft assumed (M-5: DINOv2 ViT-B = 8 ms/inf at 224×224 on Orin Nano Super). Cold TTFF target moves from "tight" to "comfortable".
- **Counterexample check** — what if the CPython 3.13 free-threading question pulls us into experimental territory? Mode B explicitly rejects free-threading for v1 (M-10), so JIT warmup is bounded by numba on CPython 3.11/3.12 (well-characterised).
### Rugged-terrain segment (M-12)
- **Mode A** — flat-Earth assumption applied uniformly; tile generation runs even over the gully; 17 m horizontal misalignment at frame edge becomes a "high-quality" tile that overwrites a stale service tile. **Cache-poisoning hazard** (M-9).
- **Mode B (revised)** — pre-flight DEM classifies this sector as "rugged" (>15 m amplitude); ortho-tile generation **skipped** in this sector; satellite anchor weight × 0.3 with rugged-sector flag in telemetry.
- **Counterexample check** — what if the DEM is wrong / out-of-date? SRTM 30 m DEM has known artefacts in gully systems. Mitigation: also use the runtime self-classification — if the matcher's RANSAC inlier ratio drops below threshold for K consecutive frames, auto-promote the sector to "rugged" for the rest of the flight.
### Sharp turn (AC-3.2)
- **Mode A** — the sharp-turn frame fails VO (<5 % overlap); satellite-based re-localization via VPR + matcher takes over. ✓.
- **Mode B (revised)** — same, but the VPR pool now includes SALAD + BoQ + AnyLoc + MixVPR (M-4). Bench-off result determines runtime primary; AnyLoc remains the training-free fallback.
- **Counterexample check** — none introduced by Mode B.
### Tile upload network drop (post-flight)
- **Mode A** — diff-against-Service uploader; if the link drops, retry on next bench session.
- **Mode B (revised)** — same, plus the M-9 voting rule means upload failure delays "trusted basemap" promotion but doesn't break next mission's cache (the Service ingest layer holds onboard tiles in a "candidate" pool until 2nd-flight confirmation).
- **Counterexample check** — what if N=2 voting is too slow to react to fresh imagery? Set N=1 for sectors where the operator manually marks a tile-set as "trusted" (e.g., post-recon imagery).
### Landing + post-flight upload
- **Mode A** — uploader runs as one-shot; tiles + sidecars pushed to Service.
- **Mode B (revised)** — uploader pushes onboard tiles to a **candidate pool**, not directly to the basemap. Service ingest applies the M-9 voting layer.
- **Counterexample check** — does this slow down imagery freshness? Yes, by one mission for a given sector. AC-NEW-6 freshness budget already allows 6 months for active-conflict sectors and 12 months for stable rear sectors; one extra mission of latency is well inside that envelope.
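
The candidate-pool promotion rule can be sketched as follows. It is an illustration only: `Candidate` and `is_promotable` are hypothetical names, geo-alignment is reduced to a single east/north position per candidate, and `tolerance_m` stands in for the consistency threshold that the M-9 rule leaves unspecified. Setting `min_flights=1` models the operator-marked "trusted" override.

```python
from dataclasses import dataclass
from math import hypot
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    flight_id: str
    east_m: float    # georeferenced tile position from this flight, local east
    north_m: float   # georeferenced tile position from this flight, local north


def is_promotable(candidates: Sequence[Candidate], tolerance_m: float, min_flights: int = 2) -> bool:
    """True if at least min_flights distinct flights agree on the tile's alignment."""
    for ref in candidates:
        agreeing_flights = {
            c.flight_id
            for c in candidates
            if hypot(c.east_m - ref.east_m, c.north_m - ref.north_m) <= tolerance_m
        }
        if len(agreeing_flights) >= min_flights:
            return True
    return False


pool = [Candidate("flight-01", 0.0, 0.0), Candidate("flight-02", 3.0, 0.0)]
promoted = is_promotable(pool, tolerance_m=5.0)
```

Counting distinct flight IDs rather than candidates is what makes the confirmations independent: two tiles from the same flight share the same pose errors and must not vote twice.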
---
## Review checklist
- [x] Mode B conclusions consistent with fact cards M-1..M-15.
- [x] No important dimensions missed (W1–W13 cross-checked vs. weak-point findings; W3.b, W4.b, W4.d, W11, W12 are not blocking — flagged as residual research items in `solution_draft02.md` "Open Research").
- [x] No over-extrapolation (every conclusion traceable to ≥1 source S40+ or to an explicit analytical chain).
- [x] All conclusions actionable / verifiable (source-switching SITL test, AC-NEW-7 numeric budget, sector DEM table, etc.).
---
## Conclusions requiring user input
These items cannot be unilaterally resolved by Mode B and must be surfaced when handing the revised draft back to the user:
1. **M-13** — TartanAir V2 in the early-stage bench-off, yes/no?
2. **AC-NEW-7 numeric thresholds** — Mode B proposes P(misalignment > 30 m) < 1 % per flight; P(>100 m) < 0.1 %. Confirm or revise.
3. **M-6 mavlink-router decision** — three options (sandbox+pin / replace / no router with distinct system-IDs). Mode B recommends Option 3 for v1.
4. **M-1 hybrid output** — accept the GPS_INPUT + ODOMETRY hybrid, or stay GPS_INPUT-only?
These are the four residual user-facing open items for the Plan step.
@@ -0,0 +1,64 @@
# Mode B Round 2 — Validation Log
## Validation scenario
A nominal **30-minute fixed-wing sortie at 1 km AGL** over a 20×20 km eastern-Ukraine operational area. Mid-flight GPS jamming starts at t=10 min; persists 8 min; ends at t=18 min. One sharp turn at t=14 min (mid-jam). Companion is Jetson Orin Nano Super @ 25 W; FC is ArduPilot 4.5+ on Cube Orange; nav cam is ADTi 20MP APS-C @ 3 fps.
## Expected behaviour under draft03 (round-2 revisions)
| Phase | Expected behaviour | Why |
|-------|-------------------|-----|
| **t=0–10 min, GPS healthy** | cuVSLAM publishes pose to ROS 2; Component 5 calibrator passes the FC's real GPS through; companion's GPS_INPUT is held in reserve (not emitted). | Option A in M-30: GPS_INPUT only emitted when needed. |
| **t=10 min, jam onset** | Real-GPS quality drops; FC EKF3 starts rejecting noisy fixes. Companion detects jam via FC `GPS_RAW_INT` quality < threshold OR explicit operator command. Within 3 s (AC-NEW-2) starts emitting GPS_INPUT (`GPS1_TYPE=14`) with covariance from matcher + VO + cuVSLAM agreement. | Source-promotion logic in C-7; 3 s budget unchanged. |
| **t=10–14 min, steady cruise under jam** | Per-frame: cuVSLAM provides relative pose (drift-bounded by keyframe + bundle adjustment); SP+LG (TRT FP16) matches frame against top-K VPR chunks; PnP yields absolute fix; covariance calibrator + Mahalanobis gate filter outliers; GPS_INPUT emitted at ≥1 Hz. End-to-end p95 ≤400 ms. | M-23 (cuVSLAM bounded drift) + M-26 (no companion-side EKF) + Component 3 inline matcher unchanged. |
| **t=14 min, sharp turn** | VPR re-loc trigger fires; FAISS top-K=20 over chunk index; matcher attempts pose recovery on neighbour chunks. **If steady-state SP+LG fails on the post-turn frame**, LiteSAM re-loc fallback (M-24) invoked at ~1.5 s budget; one-shot pose recovery; cuVSLAM is reset to the recovered pose. | M-17 (conditional VPR) + M-24 (LiteSAM re-loc role). |
| **t=14–18 min, post-turn cruise** | Steady-state behaviour resumed. Per-sector ortho-tile generator (Component 1b) writes new tiles via Orthority for sectors where `parent_pose_sigma_xy ≤ 5 m` AND `terrain_class ∈ {flat, moderate}`. Service-tile immutability respected. | M-27 (Orthority) + M-9 (cache poisoning safety). |
| **t=18 min, jam ends** | FC sees real-GPS quality recover; companion stops emitting GPS_INPUT after operator-confirmed source-restore (>1 s confirmation latency, M-11/F-T9). | Bidirectional source-switch covered by F-T9 SITL. |
| **Post-flight** | Onboard tiles uploaded to Suite Service candidate pool; 2-flight voting promotes to trusted basemap. | Unchanged from round 1. |
## Validation against draft02 conclusions (counterexample search)
| Round 1 conclusion | Round 2 verdict | Reason |
|--------------------|-----------------|--------|
| **C-3 SP+LG lead, GIM-LG peer, MASt3R dropped** (M-2/M-3) | Survives. | LiteSAM added in non-inline roles (M-24); RoMa v2 added as ceiling reference (M-25). |
| **C-4 custom 2-frame VO via SP+LG/GIM-LG** | **REVISED** | Custom 2-frame homography VO is the wrong design for fixed-wing 1 km AGL flight (M-22, AFIT thesis S52). Replaced by cuVSLAM (M-23). |
| **C-5 loosely-coupled companion-side EKF emitting GPS_INPUT + ODOMETRY in parallel** (M-1) | **REVISED** | Companion-side EKF causes double-fusion against ArduPilot EKF3 (S65, S66). Replaced by covariance calibrator + outlier gate; no state propagation (M-26). Hybrid channel split changed: v1 emits GPS_INPUT only; ODOMETRY is v1.1+ work (M-30). |
| **C-1b custom pinhole projection on per-sector DEM** | **REVISED** | Use Orthority library instead of hand-rolled (M-27). Falls back to `cv2.warpPerspective + DEM bilinear` if F-T14 latency measurement fails. |
| **C-1 MBTiles + WAL + pool** (M-8) | Survives. | COG / PMTiles do not improve on this for our use case (M-28). |
| **C-2 VPR shortlist {AnyLoc, SALAD, BoQ, MixVPR}** (M-4) | Survives. | No new VPR backbone in 2025 (M-33). |
| **C-6 distinct sysid + native routing + signing** (M-6, M-7) | Survives. | (M-31) |
| **C-9 CPython 3.11/3.12 + asyncio + TRT** (M-10) | **OPEN QUESTION** | ROS 2 Humble + Isaac ROS 3.2 is the natural pair for cuVSLAM (M-29); decision pending user input. |
| **AC-NEW-7 cache-poisoning budget** (M-9) | Survives. | The ortho-library swap doesn't change the safety budget. |
| **Camera ADTi 20MP APS-C, z=20 storage zoom** (M-20) | Survives. | (M-19, M-20 unchanged.) |
## Counterexamples
| Counterexample | Status |
|----------------|--------|
| **"What if cuVSLAM cannot operate at 1 km AGL because monocular parallax is too small?"** | Real risk — explicitly flagged in M-23. F-T1b (revised) bench-off MUST run cuVSLAM on AerialVL fixed-wing trajectories before AC-1.3 lock. Fall-back: re-introduce a custom-tracker VO that uses the matcher's inter-frame correspondences with bundle-adjustment + loop closure (i.e., a properly-scoped VO, not the 2-frame homography of draft02). |
| **"What if Orthority's per-frame latency on Orin Nano Super > 50 ms?"** | Documented in M-27. Fall-back: hand-rolled `cv2.warpPerspective + bilinear DEM` (~5–20 ms estimated). Decision deferred to F-T14 measurement. |
| **"What if dropping the companion-side EKF causes the FC's EKF3 to reject our covariances?"** | Documented in M-26. F-T9 SITL must verify; if EKF3 mishandles our raw inputs, re-introduce a vanilla ESKF (S69) as the smoothing layer. v1.1 work. |
| **"What if LiteSAM re-loc fallback (1.5–2 s) blows the AC-NEW-1 cold-start budget (30 s)?"** | 1.5–2 s << 30 s. Acceptable. (M-24) |
| **"What if ROS 2 + Isaac ROS overhead pushes us over the 400 ms p95 latency budget?"** | DDS overhead measured at ~25 % CPU (M-29). For our 8 GB shared-memory budget, the bigger risk is the deployment-image footprint (~200 MB extra). Latency impact at 3 fps inference is negligible. |
| **"What if MAVLink-rated IMU rate (~200–400 Hz) is insufficient for cuVSLAM's sync sensitivity?"** | Documented in M-35. Fall-back: dedicated companion IMU. v1.1 hardware revision if needed. |
## Review checklist
- [x] Draft conclusions consistent with round-2 fact cards (M-22 … M-35).
- [x] No important dimensions missed: VO, matcher, fusion filter, ortho, plus full sweep (VPR, tile storage, MAVLink, software platform, FDR, confidence score, camera).
- [x] No over-extrapolation: cuVSLAM 1-km-AGL performance + Orthority latency + LiteSAM 4× scaling are explicitly flagged as needing empirical confirmation in F-T1b / F-T14 / matcher bench-off.
- [x] Conclusions actionable / verifiable: every revised component has a concrete test (F-T1b, F-T14, F-T9 SITL, matcher bench-off scope).
## Conclusions requiring user input (carried into the next gate)
1. **ROS 2 Humble + Isaac ROS vs. DIY Python orchestrator** for Component 9 (M-29). Recommendation: **ROS 2** if the team has any ROS 2 experience; **DIY Python** if not (re-skilling cost > overhead). User decides.
2. **Companion IMU strategy** (M-35). Recommendation: try MAVLink `RAW_IMU` from FC (path a) for v1; add dedicated IMU only if path (a) fails F-T1b. User decides if a dedicated IMU is acceptable as a hardware addition.
## Conclusions NOT requiring user input (locked by evidence)
- VO: cuVSLAM (M-23) — locked (subject to F-T1b empirical confirmation).
- Matcher: SP+LG inline + LiteSAM in re-loc role (M-24) — locked.
- Fusion: drop companion-side EKF; covariance calibrator only (M-26) — locked.
- Ortho: Orthority (M-27) — locked (subject to F-T14 measurement; documented fallback).
- Hybrid channel split: Option A for v1 (M-30) — locked.
- All other components — unchanged from draft02.
+449
@@ -0,0 +1,449 @@
# Solution Draft 03
> **Mode**: B (Solution Assessment of `solution_draft02.md`).
> **Inputs**: `solution_draft02.md` (Mode B round 1) + `_docs/00_research/{03_mode_b_decomposition_round2,04_reasoning_chain_mode_b_round2,05_validation_log_mode_b_round2}.md` + Mode B round-2 sources S58–S77 in `01_source_registry.md` + Mode B round-2 fact cards M-22..M-35 in `02_fact_cards.md`.
> **Date**: 2026-04-26 (Mode B round 2).
> **Self-contained**: yes — supersedes `solution_draft02.md`.
>
> **What changed in round 2** (driven by user-explicit asks: VO, matcher, EKF/ESKF, ortho-tile generator + thorough sweep):
>
> - **Component 4 (VO)**: replace draft02's *custom 2-frame homography VO via SP+LG* with **cuVSLAM** (NVIDIA, CUDA-accelerated, drop-in via `isaac_ros_visual_slam`) in monocular + IMU mode (M-22, M-23, S60, S64).
> - **Component 5 (Fusion)**: **drop the companion-side EKF entirely for v1**. Replace with a lightweight **covariance calibrator + Mahalanobis outlier gate + source-label producer** — no state propagation, no IMU integration on the companion (M-26). Let ArduPilot EKF3 do the actual fusion. The "EKF vs ESKF" question becomes: *if* we re-introduce a companion filter in v1.x, use vanilla ESKF (S68, S69) — but for v1 the question is moot.
> - **Component 5 (Hybrid output)**: walk back round-1 M-1's "emit BOTH GPS_INPUT AND ODOMETRY in parallel for the same axis" — that triggers ArduPilot EKF3 double-fusion bugs (S65, S66, S67). v1 ships **GPS_INPUT only** (Option A in M-30); ODOMETRY-primary mode is v1.1 territory.
> - **Component 3 (Matcher)**: **SP+LG (TRT FP16/INT8) remains the inline matcher**; **LiteSAM (S58) added in three non-inline roles**: re-localization fallback (cold start, σ_xy > 50 m), validation oracle, distillation teacher (M-24). RoMa v2 (S63), MASt3R-SLAM (S62), MapGlue, MATCHA added to the matcher bench-off as **offline ceiling references** (M-25).
> - **Component 1b (Ortho-Tile Generator)**: replace draft02's hand-rolled "pinhole projection on per-sector DEM" with **Orthority** (S59) — Python library, frame + RPC camera, GeoTIFF DEM, pip-installable. Documented fall-back to `cv2.warpPerspective + bilinear DEM` if F-T14 latency measurement fails (M-27).
> - **Component 9 (Software platform)**: **ROS 2 Humble + Isaac ROS 3.2** chosen (Q6 → A, locked 2026-04-26). Natural pair for cuVSLAM and a published reference architecture on Orin Nano Super (S64, S77, M-29). DDS overhead (~25 % CPU, ~200 MB image growth) accepted in exchange for free integration of `isaac_ros_visual_slam`, MAVROS, and `ros2 bag` / `rqt_*` observability tooling.
> - **Component 1 (Tile storage)**, **C-2 (VPR)**, **C-6 (MAVLink)**, **C-7/C-8/C-10/C-11**: unchanged from draft02 (M-28, M-31, M-33).
>
> **Locked-in user decisions carried over from round 1** (unchanged):
>
> - **Q1** → A: GPS_INPUT primary channel (now: ONLY channel for v1 — see M-30 above).
> - **Q2** → A: distinct system-IDs via ArduPilot native MAVLink routing; **no `mavlink-router` daemon**.
> - **Q3** → A: AC-NEW-7 thresholds confirmed at P(>30 m)<1 %, P(>100 m)<0.1 % per flight.
> - **Q4** → A: TartanAir V2 included as early-stage synthetic baseline.
> - **Q5** → B (round 1): proceed to Plan in fresh conversation. **Round 2 was triggered after rollback for additional component-replacement investigation.**
> - Camera spec → ADTi 20MP 20L V1 APS-C; storage zoom → z=20.
>
> **Round-2 user decisions locked-in (2026-04-26)**:
>
> - **Q6** → A: **ROS 2 Humble + Isaac ROS 3.2** as the v1 orchestrator (M-29). DIY Python orchestrator dropped. Codified in Component 9.
> - **Q7** → A: **MAVLink `RAW_IMU` / `SCALED_IMU` from FC** (path a) as the v1 IMU source for cuVSLAM (M-35). Dedicated companion IMU is a v1.1 hardware revision triggered only if F-T1c shows sync-jitter problems. Codified in Component 4.
---
## Assessment Findings (Round 2 additions)
The round-1 findings table (15 rows: M-1 … M-21, including addenda M-19/M-20/M-21) carries forward unchanged. **Round 2 adds the following findings, with the same `old → weak → new` pattern**:
| Old Component Solution (round 1) | Weak Point (round 2 evidence) | New Solution (round 2) |
|----------------------------------|-------------------------------|------------------------|
| **C-4 (round 1)**: "custom 2-frame VO via SuperPoint+LightGlue / GIM-LightGlue homography." | **Functional, high (M-22)**. Custom 2-frame homography skips loop closure, sparse bundle adjustment, and keyframe-based local mapping — every mechanism that bounds drift in production VO/SLAM. AFIT thesis (S52) shows even ORB-SLAM2/SVO/DSO struggle on real fixed-wing flights; a hand-rolled 2-frame variant will be strictly worse. At 1 km AGL motion parallax shrinks ~10–25× per frame vs 100 m AGL, further degrading monocular VO. | **Replace with cuVSLAM** (NVIDIA, CUDA-accelerated, Apache-2.0; S60, S64). Monocular + IMU mode, drop-in via `isaac_ros_visual_slam` ROS 2 wrapper. <1 % ATE on KITTI / <5 cm on EuRoC. Fixed-wing 1 km AGL behaviour empirically TBD — bench-off in F-T1b mandatory before AC-1.3 lock. |
| **C-4 (round 1)**: same row, alternatives. | **Functional (M-23)**. Deep-VO alternatives evaluated for Orin Nano Super: DPVO/DPV-SLAM (S61, S73) extrapolate to 4–15 FPS — borderline for our 10 Hz target; MASt3R-SLAM (S62) is sub-1 Hz on Orin Nano Super — infeasible; VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71) require non-trivial integration cost with no accuracy advantage over cuVSLAM. | cuVSLAM is **lead**; DPV-SLAM / VINS-Fusion / OpenVINS retained as **bench-off fall-backs** if cuVSLAM underperforms on fixed-wing 1 km AGL. MASt3R-SLAM / RoMa v2 reserved for **offline ceiling references**. |
| **C-3 (round 1)**: "SP+LG (TRT FP16) lead, GIM-LightGlue peer, RoMa/DKM bench-off, MASt3R dropped." | **Functional, positive (M-24)**. LiteSAM (S58, MDPI Oct 2025) is purpose-built for satellite↔aerial AVL: 6.31 M params (2.4× smaller than EfficientLoFTR), RMSE@30 = 17.86 m on UAV-VisLoc, beats EfficientLoFTR. **But on Jetson Orin Nano Super, extrapolated latency is ~1500–2000 ms / pair** (AGX Orin → Orin Nano Super 4× scaling) — outside our 400 ms p95 budget for inline use. | **Add LiteSAM in three non-inline roles**: (a) re-localization fallback (cold start, σ_xy > 50 m, 1.5–2 s tolerable); (b) validation oracle for offline regression bench; (c) distillation teacher to train a satellite-aerial-specialised student model that fits the inline budget. **Inline matcher remains SP+LG / GIM-LG.** |
| **C-3 (round 1)**: same row, ceilings. | **Functional, positive (M-25)**. RoMa v2 (S63, Nov 2025): SOTA dense matcher with frozen DINOv3 backbone + custom CUDA + predictive covariance — best published pose-estimation accuracy. MASt3R-SLAM (S62), MapGlue, MATCHA: cross-modal/multimodal matchers with strong specialisation. All GPU-class compute. | **Add RoMa v2, MASt3R, MapGlue, MATCHA to the matcher bench-off as offline ceiling references** so we know how much accuracy we trade by using SP+LG inline. None becomes inline candidate. |
| **C-5 (round 1, M-1)**: "Onboard loosely-coupled EKF emits two parallel MAVLink streams: GPS_INPUT (primary) AND ODOMETRY (auxiliary, when available) for the same axis." | **Functional, safety, high (M-26, M-30)**. ArduPilot ExtNav best practice (S65, S66, S67): **only one position source per axis at a time**. Open issues #30076 and #32506 document concrete EKF3 misbehaviours when both ExtNav (ODOMETRY) and GPS (GPS_INPUT) are fed for overlapping axes — including unstable position with high variances and Z-axis snap-to-ODOMETRY. The "emit both in parallel" framing was a misconfiguration, not a feature. | **v1 ships GPS_INPUT only** (Option A in M-30). ODOMETRY emission disabled in v1. ArduPilot configured `EK3_SRC1_*=GPS+Compass`; failover via `EK3_SRC2_*`. **Option B (ODOMETRY-primary) is v1.1 work** once F-T9 SITL confirms PR #30080-class source-switching is clean. |
| **C-5 (round 1)**: "loosely-coupled EKF in our process." | **Architectural (M-26)**. The companion-side EKF was always going to feed the FC's own EKF3 → double-fusion. Visual fix → companion EKF → ArduPilot EKF3 stacks two filters on overlapping observations, breaks the single-source-per-axis invariant, and risks the same instability documented in #30076/#32506. | **Drop the companion-side EKF for v1.** Component 5 becomes a **"covariance calibrator + Mahalanobis outlier gate + source-label producer"** — no state propagation, no IMU integration. Each upstream (matcher, cuVSLAM) emits a hypothesis with covariance; outliers are gated; covariances are re-scaled if empirical residuals show over- or under-confidence; results are emitted on the appropriate MAVLink channel. **If v1.x evidence demands a companion-side filter**, use vanilla **ESKF** (S68, S69) — the right family for orientation correctness. |
| **C-1b (round 1)**: "Pinhole projection on per-sector DEM (flat-Earth in flat sectors; SRTM-30 m DEM lookup in moderate sectors)." | **Engineering (M-27)**. Implicit hand-rolled implementation reinvents distortion handling, RPC refinement, DEM bilinear lookup, projection — all of which exist in the **Orthority** Python library (S59) under MIT-class licence, pip-installable. | **Use Orthority for per-frame ortho** (frame-camera mode). Falls back to `cv2.warpPerspective + bilinear DEM` (~5–20 ms estimated) if F-T14 measurement shows Orthority's per-frame latency on Orin Nano Super > 50 ms allotted to ortho. |
| **C-9 (round 1)**: "Single Python process (asyncio) on CPython 3.11/3.12; TRT subprocess workers." | **Architectural (M-29)**. With cuVSLAM adoption (M-23), the natural integration path is `isaac_ros_visual_slam` (ROS 2 wrapper) → MAVROS → FC. Re-exporting cuVSLAM into a custom asyncio orchestrator is high-friction. **ROS 2 Humble + JetPack 6 + Isaac ROS 3.2 is a published, working reference design on the exact hardware target** (S64, S77). | **Resolved (Q6 → A, locked 2026-04-26)**: ROS 2 Humble + Isaac ROS 3.2 chosen over the DIY Python orchestrator. Accepted cost: ~25 % CPU (DDS + topic serialisation), ~200 MB image growth, learning curve. Benefit: free integration of cuVSLAM, MAVROS, observability via `ros2 bag` / `rqt_*`. Codified in Component 9. |
(Round-1 findings M-1 through M-21 — including the Phase-1-correction addenda — remain unchanged in their original form; round-2 supersedes only the rows above. Full round-1 rationale lives in `solution_draft02.md` for traceability and `_docs/00_research/02_fact_cards.md`.)
---
## Product Solution Description (Revised)
A companion-computer software stack that runs on the **Jetson Orin Nano Super** alongside an **ArduPilot 4.5+** flight controller and provides **GPS-equivalent position fixes** to the autopilot when real GPS is jammed, spoofed, or denied.
**Localization pipeline (per frame at 3 fps nav cam):**
1. **cuVSLAM** (monocular + IMU from FC `RAW_IMU` MAVLink stream) provides drift-bounded **relative pose** with keyframe-based local mapping + sparse bundle adjustment + loop closure.
2. **VPR** (DINOv2 SALAD/BoQ chosen by bench-off; AnyLoc fallback) narrows the satellite basemap to a top-K candidate-chunk shortlist on re-localization triggers (cold start, sharp turn, σ_xy > 50 m) — **conditional invocation** keeps cruise overhead near zero.
3. **Cross-view matcher** (SP+LG TRT FP16 inline; GIM-LightGlue peer in the bench-off; LiteSAM as **re-loc fallback**) produces sub-pixel keypoint correspondences against the candidate chunks; PnP yields an **absolute pose** + covariance.
4. **Component 5** (**covariance calibrator + Mahalanobis outlier gate + source-label producer** — *not* an EKF) consumes the absolute pose + cuVSLAM relative pose; rejects outliers; re-scales covariances; emits result on the appropriate MAVLink channel.
5. **GPS_INPUT** (`GPS1_TYPE=14`, MAVLink2-signed, pymavlink) is sent to the FC. ArduPilot EKF3 (24-state classical EKF, 400 Hz) does the actual fusion of our GPS-equivalent fix with its own IMU, baro, compass.
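The conditional VPR invocation in step 2 can be sketched as a small per-frame predicate. This is a minimal illustration, not the project's implementation: the `NavState` fields, the `reloc_reason` name, and the 20 °/s sharp-turn threshold are all hypothetical; only the 50 m σ_xy trigger comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional

SIGMA_XY_RELOC_M = 50.0  # from the text: re-localize when sigma_xy > 50 m


@dataclass
class NavState:
    """Hypothetical per-frame summary used to decide whether VPR runs."""
    has_absolute_fix: bool   # False until the first satellite-anchored fix
    sigma_xy_m: float        # current horizontal 1-sigma uncertainty
    yaw_rate_deg_s: float    # airframe yaw rate


def reloc_reason(state: NavState, sharp_turn_deg_s: float = 20.0) -> Optional[str]:
    """Return the trigger that justifies a VPR call, or None during cruise.

    The sharp-turn threshold is an assumed placeholder; the document does not
    pin a number for it.
    """
    if not state.has_absolute_fix:
        return "cold_start"
    if abs(state.yaw_rate_deg_s) > sharp_turn_deg_s:
        return "sharp_turn"
    if state.sigma_xy_m > SIGMA_XY_RELOC_M:
        return "sigma_exceeded"
    return None  # cruise: skip VPR, keep overhead near zero
```

Returning the reason rather than a bare boolean makes it straightforward to log which trigger fired, which is useful when replaying a mission.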
**Tile generation** (in-flight, asynchronous):
1. Per-frame eligibility check (σ_xy ≤ 5 m hard gate, terrain class flat/moderate, EKF source = `satellite_anchored`).
2. **Orthorectification via Orthority** (frame-camera model + per-sector DEM from SRTM 30 m).
3. Quality scoring + dedup against existing tile cache (service-tile immutability respected).
4. Write to MBTiles SQLite cache (WAL + connection pool + transaction batching) with `parent_pose_sigma_xy`, `terrain_class`, `trust_level`.
5. **Post-flight**: tiles uploaded to **Suite Service candidate pool**; **2-flight voting** at Service ingest promotes onboard tiles to trusted basemap.
**Object localization** (separate path, AI camera): trig + airframe-attitude fusion via FC `ATTITUDE` MAVLink stream — unchanged from round 1.
**MAVLink endpoint**: shared between MAVSDK (telemetry, sysid=10) and pymavlink (GPS_INPUT, sysid=11) via **distinct system-IDs through ArduPilot's native MAVLink routing** — no `mavlink-router` daemon. **MAVLink2 signing mandatory in v1**.
```
Pre-flight (ground)
┌────────────────────────────────────────────────┐
│ Azaion Suite Satellite Service │
│ (sources commercial / agency imagery; │
│ ingests onboard tiles via candidate pool + │
│ 2-flight voting layer) │
└──────────────┬───────────────────┬─────────────┘
│ sync down │ upload back (post-flight)
▼ ▲
┌─────────────────┐
│ DEM (SRTM 30 m) │ ─────► sector classification
└─────────────────┘
Onboard (in-flight)
Nav Cam: ADTi 20MP, 3 fps AI Cam (gimbal+zoom, on-demand)
│ │
▼ ▼
┌────────────────────────────────────────────┐ ┌────────────────────┐
│ ROS 2 Humble + Isaac ROS 3.2 (Q6 → A) │ │ Object Geo-Locator │
│ ┌──────────────────────────┐ │ │ (pinhole+ATTITUDE) │
│ │ cuVSLAM (mono + IMU) │←──FC RAW_IMU │ └──────┬─────────────┘
│ │ → keyframe pose + cov │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ VPR (SALAD/BoQ/AnyLoc) │←─ re-loc │ │
│ │ on demand only │ triggers │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ Cross-view Matcher │ │ │
│ │ inline: SP+LG / GIM-LG │ │ │
│ │ re-loc: LiteSAM (rare) │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────┐ │ │
│ │ PnP → absolute pose + Σ │ │ │
│ └────────────┬─────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────┐ │ │
│ │ Component 5 (NOT an EKF) │ │ │
│ │ - covariance calibrator │ │ │
│ │ - Mahalanobis outlier gate │ │ │
│ │ - source-label producer │ │ │
│ └────────────┬─────────────────────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────────────────────┐ │ │
│ │ Ortho-Tile Generator (Orthority) │ │ │
│ │ → MBTiles+WAL Tile Cache │ │ │
│ └──────────────────────────────────────┘ │ │
└────────────────┬───────────────────────────┘ │
▼ │
GPS_INPUT (pymavlink, signed) ──► ArduPilot │
(GPS1_TYPE=14, EK3_SRC1_POSXY=GPS, EK3_SRC2=GPS)│
│ (ODOMETRY disabled for v1; v1.1+) │
▼ │
Telemetry summary 1–2 Hz ──────► QGroundControl │
│ │
▼ │
Flight Data Recorder (NVMe, 64 GB cap, no raw frames)
```
---
## Architecture
### Overall principles (revised vs draft02)
1. **Pipeline = stages with explicit confidence**. Each stage emits a pose hypothesis + covariance + categorical label. **Component 5 calibrates and gates; ArduPilot EKF3 fuses.** *(Revised — M-26.)*
2. **All heavy NN inference runs on GPU via TensorRT** (FP16, INT8 where validated). Pre-extract satellite-tile descriptors offline (AC-8.3). *(Unchanged.)*
3. **Orchestration**: **ROS 2 Humble + Isaac ROS 3.2** (Q6 → A, locked). cuVSLAM consumed via `isaac_ros_visual_slam`; MAVROS bridges ROS 2 ↔ MAVLink for the FC. Our matcher / VPR / ortho / Component-5 calibrator / FDR / uploader run as `rclpy` Python nodes. CPython 3.11 / 3.12 inside the nodes; TensorRT engines + CUDA contexts owned per-node. *(Revised — M-29.)*
4. **Persistent satellite cache** across flights (~10 GB for 400 km²); per-flight FDR is separate. *(Unchanged.)*
5. **Every output to the FC carries a covariance** — GPS_INPUT (`h_acc`, `v_acc`, `vel_acc`). ODOMETRY emission disabled for v1 (Option A in M-30). *(Revised — M-30.)*
6. **Service tiles are basemap truth**; onboard tiles go through Service-side voting before promotion (M-9). *(Unchanged.)*
7. **MAVLink2 signing on every companion↔FC link** (M-7). USB bypasses signing — bench-only access. *(Unchanged.)*
8. **No companion-side state propagation** — the FC's EKF3 is the only filter. Any future companion-side filter (v1.x) will be an **ESKF** (S69), not a regular EKF. *(New — M-26.)*
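Principle 5 can be illustrated by how a calibrated fix might be packed into MAVLink `GPS_INPUT` fields. The sketch below is an assumption-laden illustration: the function name, the placeholder `hdop` / `vdop` / `satellites_visible` values, and the hard-coded 18 s GPS-UTC offset are not from the source. The document's `h_acc`, `v_acc`, and `vel_acc` are taken to correspond to the message fields `horiz_accuracy`, `vert_accuracy`, and `speed_accuracy`.

```python
GPS_EPOCH_UNIX_S = 315964800  # 1980-01-06T00:00:00Z expressed as Unix time
GPS_UTC_LEAP_S = 18           # assumed GPS-UTC offset; verify before flight
SECONDS_PER_WEEK = 604800


def gps_input_fields(unix_s, lat_deg, lon_deg, alt_m, vel_ned_m_s,
                     sigma_xy_m, sigma_z_m, sigma_vel_m_s):
    """Build a dict of GPS_INPUT fields from a pose fix and its 1-sigma values."""
    gps_s = unix_s - GPS_EPOCH_UNIX_S + GPS_UTC_LEAP_S
    week, second_of_week = divmod(gps_s, SECONDS_PER_WEEK)
    vn, ve, vd = vel_ned_m_s
    return {
        "time_usec": int(unix_s * 1e6),
        "gps_id": 0,
        "ignore_flags": 0,                 # every field below is valid
        "time_week_ms": int(second_of_week * 1000),
        "time_week": int(week),
        "fix_type": 3,                     # 3D fix
        "lat": int(round(lat_deg * 1e7)),  # degE7
        "lon": int(round(lon_deg * 1e7)),  # degE7
        "alt": float(alt_m),               # metres, MSL
        "hdop": 1.0,                       # placeholder (assumption)
        "vdop": 1.0,                       # placeholder (assumption)
        "vn": float(vn),
        "ve": float(ve),
        "vd": float(vd),
        "speed_accuracy": float(sigma_vel_m_s),  # vel_acc in the text
        "horiz_accuracy": float(sigma_xy_m),     # h_acc in the text
        "vert_accuracy": float(sigma_z_m),       # v_acc in the text
        "satellites_visible": 10,          # placeholder (assumption)
    }
```

With a pymavlink connection `conn`, these fields would presumably be passed as `conn.mav.gps_input_send(**fields)`; whether the installed dialect accepts them as keyword arguments is worth verifying on the bench.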
---
### Component 1: Satellite Tile Cache & Descriptor Index
**Unchanged from draft02 / Mode B round 1** — MBTiles SQLite + WAL + connection pool + transaction batching; FAISS IVF over per-chunk DINOv2-VLAD vectors (chunk-decoupled per M-16); `terrain_class` and `trust_level` sidecar. (M-28: COG + PMTiles considered and rejected for our use case.)
---
### Component 1b: Ortho-Tile Generator *(REVISED — M-27)*
**Library**: **Orthority** (S59, Python, MIT-class) — frame-camera model with GeoTIFF DEM lookup. Pip-installable: `pip install orthority`. Replaces draft02's hand-rolled "pinhole projection on per-sector DEM".
**Pipeline per frame** (eligibility / quality / dedup logic unchanged from draft02; only the *projection step* is replaced):
1. **Eligibility check** (unchanged from draft02 / M-9 hard gate): skip when EKF source is `dead_reckoned`, σ_xy > 5 m, roll/pitch > 10°, no inliers, or sector is `rugged`. Sectors classified `moderate` get `terrain_uncertainty=true` sidecar flag.
2. **Orthorectification (revised)**: call `orthority.Ortho(src_file, dem_file, camera).process(ortho_file)`, where `camera` is the frame-camera model populated from FC `ATTITUDE` (gimbal pitch / roll / yaw) + companion-resolved position + airframe altitude. SRTM-30 m DEM tile pre-loaded for the operational area.
3. **Resampling to basemap projection** (unchanged): EPSG:3857 z=20.
4. **Quality scoring** (unchanged from draft02): sharpness + coverage + match_inliers + parent_pose_sigma_xy + glare/cloud flag.
5. **Deduplication / write decision** (unchanged from draft02 — M-9 service-tile-immutability + soft/candidate gates).
6. **Sidecar metadata** (unchanged): `parent_pose_sigma_xy`, `terrain_class`, `trust_level`.
**Latency budget**: F-T14 (revised) measures Orthority's per-frame latency on Orin Nano Super. **Budget: ≤50 ms / frame.** Documented fall-back if exceeded: `cv2.warpPerspective` + bilinear DEM lookup (~5–20 ms estimated).
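The eligibility check in step 1 of the per-frame pipeline reduces to a handful of threshold tests. A minimal sketch follows, assuming a hypothetical `FrameState` record and function name; the thresholds (σ_xy > 5 m, roll/pitch > 10°, no inliers, `rugged` sector, `dead_reckoned` source) are the ones listed in that step.

```python
from dataclasses import dataclass

SIGMA_XY_HARD_GATE_M = 5.0
MAX_TILT_DEG = 10.0


@dataclass
class FrameState:
    """Hypothetical per-frame inputs to the tile eligibility gate."""
    source: str          # 'satellite_anchored' | 'vo_extrapolated' | 'dead_reckoned'
    sigma_xy_m: float
    roll_deg: float
    pitch_deg: float
    match_inliers: int
    terrain_class: str   # 'flat' | 'moderate' | 'rugged'


def tile_eligibility(frame: FrameState):
    """Return (eligible, terrain_uncertainty) for one nav-cam frame."""
    rejected = (
        frame.source == "dead_reckoned"
        or frame.sigma_xy_m > SIGMA_XY_HARD_GATE_M
        or abs(frame.roll_deg) > MAX_TILT_DEG
        or abs(frame.pitch_deg) > MAX_TILT_DEG
        or frame.match_inliers <= 0
        or frame.terrain_class == "rugged"
    )
    if rejected:
        return (False, False)
    # Moderate sectors are kept but flagged in the sidecar metadata.
    return (True, frame.terrain_class == "moderate")
```

The product description states the stricter condition that the source must be `satellite_anchored`; if that reading is the intended one, the first test would become `frame.source != "satellite_anchored"`.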
---
### Component 2: Visual Place Recognition (Global Retrieval)
**Unchanged from draft02 / Mode B round 1.** AnyLoc + SALAD + BoQ + MixVPR shortlist; conditional invocation (M-17); chunk-based retrieval unit (M-16); expanding-window retry (M-18); multi-scale chunks + OSM road-overlay + sector-volatility-driven K (M-19); active-conflict scene-change mitigations stand. (M-33: no new VPR backbone in 2025 displaces this.)
---
### Component 3: Cross-View Matching & PnP *(REVISED — M-24, M-25)*
**Inline lead**: **SuperPoint + LightGlue (TRT FP16/INT8)** — unchanged. Feasibility re-confirmed: ~50–200 ms / pair on Orin Nano Super FP16 at 320×240 → 640×480 (RTX 3080 baseline 0.96 + 2.54 ms scaled by Orin Nano Super throughput ratio; cross-validated by S76 YOLO26 reference points).
**Inline peer**: **GIM-LightGlue** — unchanged from draft02 (M-3, S48). +8.4–18.1 % zero-shot vs LightGlue baseline.
**Embedded fallback**: **XFeat (sparse + semi-dense)** — unchanged.
**Re-localization fallback** *(new — M-24)*: **LiteSAM** (S58). Invoked rarely (cold start, σ_xy > 50 m, sharp turn after cuVSLAM tracking loss). Latency budget: 1.5–2 s on Orin Nano Super. Accepted because re-loc events are rare and AC-NEW-1 cold-start budget is 30 s.
**Validation oracle** *(new — M-24)*: **LiteSAM run offline on bench data** for ground-truth-quality matches. Used to score the inline matcher's recall@30m on a per-flight basis without needing manual annotation.
**Distillation teacher** *(new — M-24)*: train a satellite-aerial-specialised student model (target ≤5 M params, ≤100 ms / pair) using LiteSAM-supervised correspondences on TartanAir V2 + AerialExtreMatch + UAV-VisLoc. Output is a candidate inline matcher for v1.x.
**Offline ceiling references** *(new — M-25)*: **RoMa v2** (S63), **MASt3R-SLAM** (S62), **MapGlue**, **MATCHA** — included in the matcher bench-off so we know how much accuracy we trade by using SP+LG inline. None becomes inline candidate.
**Bench-off scope (revised)** for the deferred research item:
- Inline candidates (must fit in 200 ms / pair on Orin Nano Super @ 25 W): SP+LG, GIM-LightGlue, XFeat (sparse), XFeat (semi-dense).
- Re-loc candidates (must fit in 2 s / pair): LiteSAM.
- Offline ceilings: RoMa v2, MASt3R-SLAM, MapGlue, MATCHA.
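The bench-off scope above amounts to bucketing each matcher by its measured per-pair latency. A minimal sketch, assuming the measured figure is the p95 latency on the target hardware (the document does not say which statistic gates the role) and using a hypothetical function name:

```python
INLINE_BUDGET_MS = 200.0   # inline candidates: must fit in 200 ms / pair
RELOC_BUDGET_MS = 2000.0   # re-loc candidates: must fit in 2 s / pair


def matcher_role(latency_ms: float) -> str:
    """Map a measured per-pair latency to the role a matcher can play."""
    if latency_ms <= INLINE_BUDGET_MS:
        return "inline"
    if latency_ms <= RELOC_BUDGET_MS:
        return "re-loc"
    return "offline_ceiling"
```

For example, a candidate measured at 1800 ms per pair would land in the re-loc bucket, while one measured at 5 s would only serve as an offline ceiling reference.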
**Bench-off targets** (unchanged from draft02): AerialVL, UAV-VisLoc, AerialExtreMatch, 2chADCNN season set, TartanAir V2, internal Mavic, first internal fixed-wing flight.
**Score on**: AC-1.1 / AC-1.2 / AC-2.2 / p95 latency on Orin Nano Super 25 W / sustained 30-min thermal stability / peak GPU memory / **plus seasonal-robustness score** / **plus accuracy-vs-inline-feasibility frontier (re-loc role only for >200 ms candidates)**.
**PnP & projection**: unchanged from draft02.
**Input downsampling**: unchanged starting points (1024×768 for SP+LG / GIM-LG; 640×480 for XFeat sparse).
---
### Component 4: Visual Odometry *(REVISED — M-22, M-23)*
**v1 choice**: **cuVSLAM** (NVIDIA, CUDA-accelerated, Apache-2.0; S60). Monocular + IMU mode. Drop-in via `isaac_ros_visual_slam` ROS 2 wrapper (S64). Replaces draft02's "custom 2-frame VO via SP+LG / GIM-LG homography".
**Why cuVSLAM**:
- Production-grade VO/SLAM with keyframe-based local mapping + sparse bundle adjustment + loop closure — bounds drift, unlike a 2-frame homography.
- CUDA-accelerated, optimized for Jetson. Reference designs on Orin Nano (S64, S77) confirm runtime feasibility.
- <1 % ATE on KITTI / <5 cm on EuRoC.
- Minimal integration cost via the ROS 2 wrapper.
**Why not the alternatives**:
- DPVO / DPV-SLAM (S61, S73): extrapolated 4–15 FPS on Orin Nano Super — borderline for 10 Hz target. Reserved as bench-off fall-back.
- MASt3R-SLAM (S62): sub-1 Hz on Orin Nano Super — infeasible inline.
- VINS-Fusion / OpenVINS / BASALT / SVO Pro (S71): non-trivial integration cost; no accuracy advantage. Reserved as bench-off fall-backs.
- Custom 2-frame homography VO (draft02): wrong design (M-22).
**IMU source for cuVSLAM** (Q7 → A, locked, M-35): **MAVLink `RAW_IMU` / `SCALED_IMU` from FC** at ~200–400 Hz (path a). Subscribed inside the cuVSLAM node via MAVROS. **F-T1c** (new field test) measures sync-jitter under flight load; if it fails the threshold (TBD by cuVSLAM tolerance), v1.1 adds a dedicated companion IMU (BNO055 / ICM-42688P / BMI270) over SPI as a hardware revision.
**Camera intrinsics**: nav cam (ADTi 20MP APS-C) calibrated pre-flight via standard checkerboard (M-34). cuVSLAM consumes the `camera_info` topic at start-up.
**Risk R8 reframed**: cuVSLAM's high-altitude fixed-wing performance is empirically unproven (its published benchmarks are urban driving + indoor MAV). **F-T1b (revised) bench-off mandatory before AC-1.3 lock**.
**Fall-back path**: if cuVSLAM underperforms on AerialVL fixed-wing trajectories, use a properly-scoped VO (DPV-SLAM with keyframe + bundle adjustment + loop closure, not 2-frame homography) as the v1.1 candidate. Custom 2-frame VO never comes back.
---
### Component 5: Companion-Side Output Stage *(REVISED — M-26, M-30)*
**Renamed**: was "IMU + Visual EKF Fusion" in draft02. Now: **"Companion-Side Output Stage — Covariance Calibrator + Outlier Gate + Source-Label Producer"**.
**Responsibility (v1)**:
1. Consume cuVSLAM relative-pose + cross-view matcher absolute-pose hypotheses.
2. Run a Mahalanobis outlier gate to drop fixes whose innovation w.r.t. cuVSLAM relative pose exceeds a threshold (computed against AC-NEW-4 false-position safety budget).
3. Re-scale covariances using empirical residuals (online, exponentially-weighted) to correct for systematic over- / under-confidence in the matcher / VPR / VO outputs.
4. Tag the result with a categorical source label: `satellite_anchored / vo_extrapolated / dead_reckoned`.
5. Emit on the appropriate MAVLink channel (GPS_INPUT for v1, Option A in M-30).
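Steps 2 and 3 can be sketched in a few lines for the horizontal (2-D) case. This is an illustrative sketch only: the class name, the 99 % chi-square gate, and the smoothing factor are assumptions, and the real threshold is to be derived from the AC-NEW-4 safety budget. The re-scaling relies on the fact that, for a consistent 2-D covariance, the squared Mahalanobis distance has an expected value of 2.

```python
CHI2_2DOF_99 = 9.21  # 99 % quantile of chi-square with 2 dof (assumed gate)


def mahalanobis_sq(innovation, cov):
    """Squared Mahalanobis distance of a 2-D innovation under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    dx, dy = innovation
    # Uses the closed-form inverse of [[a, b], [c, d]].
    return (dx * (d * dx - b * dy) + dy * (a * dy - c * dx)) / det


class CovarianceCalibrator:
    """Outlier gate plus exponentially weighted covariance re-scaling."""

    def __init__(self, gate=CHI2_2DOF_99, alpha=0.05):
        self.gate = gate
        self.alpha = alpha
        self.scale = 1.0  # multiplicative correction applied to upstream cov

    def update(self, innovation, cov):
        """Return the re-scaled covariance, or None if the fix is gated out."""
        scaled = [[self.scale * v for v in row] for row in cov]
        if mahalanobis_sq(innovation, scaled) > self.gate:
            return None  # outlier: drop it and leave the scale untouched
        # E[d^2] = 2 for a consistent 2-D covariance, so d^2 / 2 estimates
        # how much the upstream covariance should be inflated or deflated.
        ratio = mahalanobis_sq(innovation, cov) / 2.0
        self.scale = max(1e-3, (1 - self.alpha) * self.scale + self.alpha * ratio)
        return scaled
```

No state is propagated between calls other than the scalar `scale`, which is consistent with the "no state propagation" constraint listed for v1.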
**Explicitly NOT in v1**:
- ❌ State propagation (no `x_{k+1} = f(x_k, u_k) + w_k`).
- ❌ IMU integration (the FC's EKF3 does this with the FC's own IMU at 400 Hz).
- ❌ ODOMETRY emission (Option B in M-30 — v1.1+).
**ESKF question resolved**: ArduPilot EKF3 is a regular EKF (24-state) — we cannot swap the FC filter (S65, S66, S67, S68). The EKF-vs-ESKF debate applies only to a hypothetical companion-side filter, which we drop for v1. **If v1.x evidence (F-T9 SITL) demands a companion-side filter, use vanilla ESKF** (S69) — the right family for orientation correctness, with tangent-space covariance on SO(3).
**Hybrid-output channel split (M-30)**:
| Mode | `EK3_SRC1_*` configuration | Channel emission | Status |
|------|---------------------------|------------------|--------|
| **Option A (v1 default)** | `POSXY=GPS, VELXY=GPS, YAW=GPS+Compass`. `EK3_SRC2_*=GPS` for failover. | GPS_INPUT only (`GPS1_TYPE=14`). ODOMETRY disabled. | Ships in v1. |
| **Option B (v1.1+)** | `POSXY=ExternalNav, YAW=ExternalNav`. `EK3_SRC2_POSXY=GPS` for failover. | ODOMETRY primary; GPS_INPUT held in reserve, not actively fused while ODOMETRY healthy. | Requires PR #30080 fix; gated on F-T9 SITL pass. |
---
### Component 6: MAVLink Integration & Source Promotion
**Unchanged from draft02 / round 1.** MAVSDK (telemetry, sysid=10) + pymavlink (GPS_INPUT, sysid=11), distinct system-IDs sharing the serial port via ArduPilot's native MAVLink routing. **No mavlink-router daemon.** **MAVLink2 signing mandatory**, per-airframe key in FC FRAM. Source-promotion logic and AC-NEW-2 (<3 s spoofing-promotion latency) carry forward unchanged. (M-31: sysid collision-check added to deploy runbook.)
---
### Component 7: Failsafe, Health & Re-Localization
Unchanged from draft02.
---
### Component 8: Object Localization (AI Camera)
Unchanged from draft02.
---
### Component 9: Software Platform & Process Topology *(LOCKED — Q6 → A, M-29)*
**v1 choice**: **ROS 2 Humble + Isaac ROS 3.2 on JetPack 6 / Ubuntu 22.04** (S64, S77).
**Process topology**:
- **C++ Isaac ROS node**: cuVSLAM via `isaac_ros_visual_slam` (consumes `camera_info` + image stream + IMU; publishes `nav_msgs/Odometry`).
- **C++ MAVROS node**: bridges ROS 2 ↔ MAVLink for the FC. `RAW_IMU` / `SCALED_IMU` subscribed by the cuVSLAM node; FC `ATTITUDE` consumed by Component 1b ortho node; `GPS_INPUT` published by Component 5 calibrator node.
- **Python `rclpy` nodes**: matcher (SP+LG TRT FP16/INT8), VPR (SALAD/BoQ on demand), Component 1b ortho generator (Orthority), Component 5 calibrator + outlier gate, FDR writer, Suite-Service uploader.
- **TensorRT engines + CUDA contexts** owned per-node (no shared CUDA context). Engines loaded at node start-up; warm-up inference at boot.
**Stack details (locked)**:
- CPython **3.11 or 3.12** inside `rclpy` nodes (free-threaded 3.13 deferred to v1.x — M-32, M-33).
- TensorRT **FP16 default**, INT8 where validated by the matcher bench-off.
- **numba JIT** for the calibrator's hot path (Mahalanobis distance + covariance re-scale).
- Configuration via YAML; structured-JSON logging to FDR; `ros2 bag` for in-flight telemetry capture.
**Cost / benefit reaffirmed**:
- **Cost**: ~25 % CPU for DDS + topic serialisation; ~200 MB extra deployment-image footprint; learning curve (mitigated by published reference designs in S64, S77).
- **Benefit**: drop-in `isaac_ros_visual_slam` for cuVSLAM, drop-in MAVROS for the FC bridge, free observability via `ros2 bag` and `rqt_*`, battle-tested by the wider robotics community.
**Reference designs**: S64 (Hackster.io GPS-Denied Drone), S77 (thomasthelliez ROS 2 / Isaac ROS guide), `bandofpv/VSLAM-UAV` (PX4 + ROS 2 reference), `sidharthmohannair/ros2-ardupilot-sitl-hardware` (ArduPilot + ROS 2 reference).
---
### Component 10: Flight Data Recorder
Unchanged from draft02 / round 1.
---
### Component 11: Confidence Score (cross-cutting)
Unchanged from draft02 / round 1.
---
## Testing Strategy
### Functional / Integration
- **F-T1** Tile cache load/lookup *(unchanged)*.
- **F-T1b** *(REVISED — M-22, M-23, R8 reframed)* AC-1.3 drift regression: run **cuVSLAM** on AerialVL fixed-wing trajectories (70 km of real flight). Pass = drift ≤ 100 m mono-only / ≤ 50 m mono+IMU between satellite anchors at 95th percentile. **Gates AC-1.3 lock.** If cuVSLAM fails: fall back to DPV-SLAM bench / VINS-Fusion bench.
- **F-T1c** *(new — M-22, M-23)* Compare cuVSLAM mono vs cuVSLAM mono+IMU on the same AerialVL trajectories — quantifies IMU contribution given MAVLink-rated IMU rate (path (a) of M-35).
- **F-T2** Tile generation + dedup *(extended — M-9 + M-27)*: replay a recorded flight; assert (a) ≤1 tile per ground sector covered ≥2× by nav cam; (b) tile has `parent_pose_sigma_xy` ≤ hard gate; (c) service tiles never overwritten within freshness budget; **(d) Orthority output equivalent to ground-truth ortho (RMSE < 1 px on synthetic frame with known DEM)**.
- **F-T3** Tile uploader → candidate pool *(unchanged from draft02)*.
- **F-T4** End-to-end against AerialVL.
- **F-T5** End-to-end against UAV-VisLoc.
- **F-T5b** End-to-end against AerialExtreMatch *(unchanged from draft02 — M-14)*.
- **F-T5c** Season-robustness regression against 2chADCNN season set *(unchanged from draft02 — M-14)*.
- **F-T6** End-to-end against internal Mavic flight footage.
- **F-T7** Sharp-turn handling *(extended — M-24)*: assert LiteSAM re-loc fallback recovers within 2 s on post-turn frames where SP+LG inline matcher fails.
- **F-T8** Disconnected-segment re-localization *(extended — M-24)*: include LiteSAM re-loc in the test matrix.
- **F-T9** ArduPilot SITL: full MAVLink loop *(REVISED — M-30)*. Test matrix:
- **Option A mode** (v1 default): GPS_INPUT only; verify EKF3 fuses correctly; verify failover to backup GPS via `EK3_SRC2_*`.
- **Option B mode** (v1.1 candidate): ODOMETRY-primary; verify PR #30080-class source-switching is clean; verify GPS_INPUT held in reserve does not double-fuse (issues #30076 / #32506 regression test).
- Source switching: jam-onset → our channel; spoofed-real-GPS recovery → operator-confirmed source-restore.
- **MAVLink2 signing on**: assert injection refused on signing failure; assert acceptance on valid signing.
- **F-T10** Operator re-loc workflow via QGC `STATUSTEXT` *(unchanged)*.
- **F-T11** Cold-start TTFF <30 s (AC-NEW-1) *(extended — M-24)*: include LiteSAM as the cold-start re-loc path.
- **F-T12** Spoofing-promotion <3 s (AC-NEW-2) *(unchanged)*.
- **F-T13** Object localization with airframe-attitude fusion *(unchanged)*.
- **F-T14** *(REVISED — M-27)* Per-sector DEM classification + **Orthority per-frame latency**: load SRTM-30 m for the operational area; assert sector classes (`flat`, `moderate`, `rugged`) match ground-truth DEM amplitudes; **measure Orthority per-frame ortho latency on Orin Nano Super @ 25 W**; assert ≤ 50 ms / frame budget. If exceeded: switch to `cv2.warpPerspective + bilinear DEM` fall-back.
- **F-T15** VPR retrieval-unit bench *(unchanged from draft02 — M-16/17/18)*.
- **F-T16** Synthetic cloud-occlusion injection *(unchanged from draft02)*.
- **F-T17** Mission replay assertion *(unchanged from draft02 — M-17)*.
- **F-T18** *(new — M-26)* Companion-side calibrator regression: replay a recorded flight; assert the calibrator's empirical residuals lie within the configured Mahalanobis gate; assert no state-propagation logic is invoked; assert ArduPilot EKF3 receives well-calibrated covariances (post-flight comparison of `h_acc` reported vs measured residual).
- **F-T19** *(new — Q6)* If Q6.A is chosen: ROS 2 topic-rate sanity test — assert all ROS 2 topics meet expected publish rates under simulated load.
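The F-T1b pass criterion (95th-percentile drift between satellite anchors) can be sketched as below. This is a minimal illustration, not the project's test harness: the function names, the `mode` keys, and the choice of linear-interpolation percentile are assumptions; only the 100 m / 50 m limits come from F-T1b.

```python
import math

# Limits from F-T1b, in metres of drift between consecutive satellite anchors.
# The dictionary keys are hypothetical labels for the two cuVSLAM configurations.
DRIFT_LIMIT_M = {"mono": 100.0, "mono+imu": 50.0}


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (q / 100.0) * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def drift_gate(segment_drifts_m, mode):
    """Return (p95, passed) for the per-segment drift samples of one replay."""
    p95 = percentile(segment_drifts_m, 95.0)
    return p95, p95 <= DRIFT_LIMIT_M[mode]
```

Each sample is the drift accumulated over one inter-anchor segment, so a single long outage cannot hide behind many short, well-anchored segments.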
### Non-Functional
- **NF-T1** Latency p95 <400 ms on Orin Nano Super 25 W (AC-4.1) *(unchanged)*.
- **NF-T2** Memory <8 GB shared (AC-4.2) *(extended — Q6)*: ROS 2 + Isaac ROS deployment image must fit; reserve ≥1 GB for matcher + VPR engines.
- **NF-T3** Thermal: 8 h sustained 25 W (AC-NEW-5) *(unchanged)*.
- **NF-T4** False-position safety budget (AC-NEW-4) *(extended — M-26)*: Monte Carlo with synthetic over-confidence injection; verify Component 5's outlier gate rejects bad fixes BEFORE they reach ArduPilot EKF3 (companion-side gate; FC EKF3 gate is a second line of defence).
- **NF-T4b** AC-NEW-7 cache-poisoning safety budget *(unchanged — M-9)*.
- **NF-T5** Storage: 64 GB FDR cap with rollover *(unchanged)*.
- **NF-T6** Imagery freshness gate (AC-NEW-6) *(unchanged)*.
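The companion-side outlier gate exercised by NF-T4 can be pictured as a Mahalanobis test on the horizontal innovation. The sketch below is an assumption about how such a gate could look, not Component 5's actual logic: the function names, the 99 % confidence level, and the use of a VO/IMU-predicted position as the reference are all hypothetical.

```python
import math


def chi2_threshold_2dof(confidence):
    """Chi-square quantile for 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - confidence)


def mahalanobis_sq_2d(innovation, cov):
    """Squared Mahalanobis distance of a 2-vector under a 2x2 covariance."""
    dx, dy = innovation
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # Inverse of [[a, b], [c, d]] is 1/det * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def accept_fix(innovation, predicted_cov, fix_cov, confidence=0.99):
    """Accept a fix only if (fix - prediction) is consistent with both covariances."""
    s = [[predicted_cov[i][j] + fix_cov[i][j] for j in range(2)] for i in range(2)]
    return mahalanobis_sq_2d(innovation, s) <= chi2_threshold_2dof(confidence)
```

In a Monte Carlo run with injected over-confidence, `fix_cov` would be deliberately too small; the gate should then reject a larger share of fixes before they are sent to the FC.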
### Security
- **S-T1**–**S-T5** *(unchanged from draft02)*.
### Field
- **FT-1**–**FT-3** *(unchanged from draft02)*.
---
## Key Risks & Open Items (carried into Plan step)
| ID | Risk | Severity | Mitigation |
|----|------|----------|------------|
| R1 | Imagery licensing lead time (Service-side) | Med | Suite Service procurement |
| R2 | Latency budget on Orin Nano Super at 1024×768 | Med | Empirical bench-off in week 1 of impl |
| R3 | Cross-view accuracy at 1 km AGL with Ukrainian seasonal change | Med | 50 %@20 m hard floor; bench-off includes SALAD/BoQ/GIM-LG/2chADCNN/**LiteSAM-as-oracle** |
| R4 | MAVSDK + pymavlink coexistence | **Resolved** (M-6) | — |
| R5 | Thermal at 25 W for 8 h | Med | NF-T3 |
| R6 | AC-7.1 in turning flight | Low | v1.1 |
| R7 | Public dataset gap (V&V) | Med | Bench-off + first internal fixed-wing flight before AC-1.3 lock |
| **R8** *(REFRAMED — M-22, M-23)* | **cuVSLAM 1 km AGL fixed-wing performance is empirically unproven** | Med | F-T1b on AerialVL fixed-wing trajectories; FT-3 first internal fixed-wing flight; documented fall-back to DPV-SLAM / VINS-Fusion |
| R9 | Cross-flight cache poisoning | High (safety) | Service-tile immutability + 2-flight voting + σ_xy hard gate + AC-NEW-7 |
| R10 | Companion↔FC link is flight-critical attack surface | High (security) | MAVLink2 signing mandatory + native routing |
| R11 | ArduPilot ExtNav source-switching gotchas | Med | F-T9 SITL matrix; pin ArduPilot to PR #30080 version |
| R12 | Eastern-Ukraine relief amplitude breaks flat-Earth assumption | Med | Per-sector DEM lookup + runtime self-classifier |
| **R13** *(new — M-27)* | **Orthority per-frame latency on Orin Nano Super may exceed budget** | Low–Med | F-T14 measurement; fall-back to `cv2.warpPerspective + bilinear DEM` (~5–20 ms estimated) |
| **R14** *(new — M-26, M-30)* | **Dropping companion-side EKF may surface FC-side covariance-handling issues** | Low–Med | F-T18 calibrator regression + F-T9 SITL Option A; if EKF3 mishandles raw inputs, re-introduce vanilla ESKF in v1.x |
| R15 *(M-29)* | Orchestrator choice (Q6 → A locked: ROS 2 Humble + Isaac ROS 3.2) | **Resolved** | — |
| **R16** *(M-35)* | **MAVLink-rated IMU may be insufficient for cuVSLAM sync sensitivity** | Low–Med | F-T1c IMU-sync-jitter measurement; v1.1 hardware revision adds dedicated companion IMU if F-T1c fails (Q7 → A locked: path (a) for v1) |
---
## Proposed AC additions
**AC-NEW-7 — Cache-poisoning safety budget** *(unchanged from draft02 — M-9)*.
**AC-NEW-8 — VO drift bound on fixed-wing 1 km AGL** *(new — M-22, M-23, R8 reframed)*. Specifically: cuVSLAM (mono+IMU) drift between satellite anchors ≤ 50 m at 95th percentile on AerialVL fixed-wing trajectories; ≤ 100 m mono-only. Validated by **F-T1b**.
**AC-NEW-9 — Companion-side covariance calibration accuracy** *(new — M-26)*. Empirical residuals of GPS_INPUT pose, computed against ground truth on F-T1b trajectories, must lie within the reported `h_acc`/`v_acc` covariance with probability ≥ 95 %. (Calibration must not under- or over-claim.) Validated by **F-T18**.
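One way to read the AC-NEW-9 criterion is as an empirical coverage check with a guard against over-conservative reporting. The sketch below is illustrative only: the verdict labels, the `min_median_ratio` guard, and its 0.1 default are assumptions introduced here; only the 95 % coverage target comes from the AC text.

```python
import statistics


def coverage(residuals_m, reported_acc_m):
    """Fraction of samples whose absolute residual is within the reported accuracy."""
    if len(residuals_m) != len(reported_acc_m) or not residuals_m:
        raise ValueError("need two non-empty sequences of equal length")
    inside = sum(1 for r, a in zip(residuals_m, reported_acc_m) if abs(r) <= a)
    return inside / len(residuals_m)


def calibration_verdict(residuals_m, reported_acc_m,
                        min_coverage=0.95, min_median_ratio=0.1):
    """Classify a replay as 'ok', 'overconfident' or 'overconservative'."""
    if coverage(residuals_m, reported_acc_m) < min_coverage:
        return "overconfident"      # reported accuracy is too tight
    ratios = [abs(r) / a for r, a in zip(residuals_m, reported_acc_m)]
    if statistics.median(ratios) < min_median_ratio:
        return "overconservative"   # reported accuracy is far too loose
    return "ok"
```

The second check matters because a calibrator that always reports a very large `h_acc` trivially meets the coverage target while giving EKF3 almost no usable information.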
---
## Open Research (deferred to dedicated research passes before Plan)
| Topic | Why now | Output | Owner |
|-------|---------|--------|-------|
| **Cross-view matcher bench-off** *(REVISED scope — M-24, M-25)* | Inline + re-loc + offline-ceiling tracks are now distinct | Selected inline matcher; selected re-loc matcher; ceiling reference numbers; distillation candidate teacher (LiteSAM) | Research skill, follow-up Mode A pass |
| **Input-resolution sweep** | Same as draft02 | Resolution per matcher candidate; sensitivity curves | Same pass |
| **VPR backbone bench-off** | Same as draft02 | Selected VPR backbone | Same pass |
| **VO bench-off** *(new — M-22, M-23)* | cuVSLAM is the lead but unproven on 1 km AGL fixed-wing | cuVSLAM mono / cuVSLAM mono+IMU / DPV-SLAM / VINS-Fusion / OpenVINS comparison on AerialVL + first internal fixed-wing flight | Research / impl. team |
| **Tile-generator quality scoring** | Same as draft02 | Calibrated thresholds for σ_xy / sharpness / glare | Implementation phase |
| **Orthority per-frame latency on Orin Nano Super** *(new — M-27)* | Confirms or rejects M-27 library choice | F-T14 measurement; if fail → `cv2.warpPerspective + bilinear DEM` fall-back path locked | Implementation phase |
| **Internal Mavic-flight V&V dataset** | Same as draft02 | Curated, ground-truth-labelled clips | Operations / data team |
| **First internal fixed-wing flight** | Same as draft02 | Recorded sortie with synced IMU + GPS truth + nav-cam stream | Field-test plan |
| ~~Q6 — Orchestrator decision~~ | **Locked 2026-04-26**: Q6 → A (ROS 2 Humble + Isaac ROS 3.2). | — | — |
| ~~Q7 — Companion IMU strategy~~ | **Locked 2026-04-26**: Q7 → A (MAVLink `RAW_IMU` from FC for v1; dedicated IMU only if F-T1c fails). | — | — |
| **Encryption-at-rest key management** | Same as draft02 | Threat-modelled design | Phase 4 security analysis |
---
## References
All citations are by ID from `_docs/00_research/01_source_registry.md`. Mode B round 2 sources: **S58–S77** (round 1 sources S40–S57 carried over).
- **VO**: S60 (cuVSLAM), S61 (DPVO-QAT++), S62 (MASt3R-SLAM), S64 (Isaac ROS UAV reference), S71 (VINS-Fusion / OpenVINS Jetson reports), S72 (high-altitude VIO), S73 (DPV-SLAM).
- **Matcher**: S58 (LiteSAM), S63 (RoMa v2), S74 (OrthoLoC + AdHoP), S75 (AerialExtreMatch open-review).
- **Fusion**: S65 (ArduPilot ExtNav double-fusion bug), S66 (Z-axis snap bug), S67 (EKF sources spec), S68 (PX4 EKF2 ESKF PR), S69 (Sola ESKF tutorial), S70 (T-ESKF + Hybrid ESKF/UKF 2025).
- **Ortho**: S59 (Orthority).
- **Sweep**: S76 (Orin Nano Super FP16/INT8 reference points), S77 (ROS 2 / Isaac ROS practical guide).
---
## Related Artifacts
- Mode A draft: `_docs/01_solution/solution_draft01.md` (superseded by draft02 → draft03).
- Mode B round 1 draft: `_docs/01_solution/solution_draft02.md` (superseded by draft03).
- Mode B round 2 decomposition: `_docs/00_research/03_mode_b_decomposition_round2.md`.
- Mode B round 2 reasoning chain: `_docs/00_research/04_reasoning_chain_mode_b_round2.md`.
- Mode B round 2 validation log: `_docs/00_research/05_validation_log_mode_b_round2.md`.
- AC & Restrictions assessment (Phase 1): `_docs/00_research/00_ac_assessment.md` *(unchanged)*.
- Source registry: `_docs/00_research/01_source_registry.md` (S01–S77).
- Fact cards: `_docs/00_research/02_fact_cards.md` (Phase 1 + Mode B round 1 M-1..M-21 + Mode B round 2 M-22..M-35).
- Tech stack consolidation: `_docs/01_solution/tech_stack.md` (deferred — Phase 3 optional).
- Security analysis: `_docs/01_solution/security_analysis.md` (deferred — Phase 4 optional, **promoted to recommended-before-Plan-lock** because of M-6/M-7).
+513
@@ -0,0 +1,513 @@
# Solution Draft 02
> **Mode**: B (Solution Assessment of `solution_draft01.md`).
> **Inputs**: `solution_draft01.md` (Mode A) + `_docs/00_research/{03_mode_b_decomposition,04_reasoning_chain_mode_b,05_validation_log_mode_b}.md` + Mode B sources S40–S57 in `01_source_registry.md` + Mode B fact cards M-1..M-21 in `02_fact_cards.md`.
> **Date**: 2026-04-26 (revised after user lock-in of open items Q1–Q5).
> **Self-contained**: yes — this draft is the new source of truth and supersedes `solution_draft01.md`.
>
> **Locked-in user decisions (2026-04-26)**:
>
> - **Q1** → A: GPS_INPUT + ODOMETRY hybrid output (M-1). Codified in AC-4.3.
> - **Q2** → A: distinct system-IDs via ArduPilot native MAVLink routing; **no `mavlink-router` daemon** (M-6).
> - **Q3** → A: AC-NEW-7 thresholds confirmed at P(>30 m)<1 %, P(>100 m)<0.1 % per flight (M-9). Codified in AC-NEW-7.
> - **Q4** → A: TartanAir V2 included as early-stage synthetic baseline in the bench-off (M-13).
> - **Q5** → B: proceed to Plan in a fresh conversation (no further Mode B round).
> - Camera spec → ADTi 20MP 20L V1 APS-C; storage zoom → z=20 (M-20). Codified in `restrictions.md`.
---
## Assessment Findings
The "old solution → weak point → new solution" table for the v1 commitments. Every row references the corresponding fact card (M-X) for traceability. **15 findings**: 4 high-severity functional, 2 high-severity security, 1 high-severity safety, 1 high-severity-positive (latency easier than thought), 6 medium, 1 open question.
| Old Component Solution | Weak Point | New Solution |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **C-6**: emit `GPS_INPUT` only via pymavlink (`GPS1_TYPE=14`) — covariance collapsed to scalar `h_acc`/`v_acc`. | **Functional** (M-1). ArduPilot dev docs (S41) call **ODOMETRY the preferred external-nav channel**; ODOMETRY carries quaternion + 6-DoF covariance + native quality field. GPS_INPUT-only under-utilises the FC's EKF3 and erases our yaw covariance — directly hurts AC-NEW-4 (false-position safety). | **Hybrid output**. GPS_INPUT remains the primary "GPS-substitute" channel (matches AC-4.3 framing). When the companion EKF emits a fix with full 6-DoF covariance and observability, **also emit ODOMETRY** so EKF3 can fuse the richer signal. FC source priorities are configured so that GPS_INPUT is the failover if ODOMETRY trips `VISO_QUAL_MIN`. |
| **C-3**: bench-off shortlist = {SP+LG, XFeat sparse, XFeat semi-dense, MASt3R (stretch), RoMa/DKM (bench-off candidate), classical (last-resort)}. | **Functional** (M-2). MASt3R `mast3r-runtime` lists Jetson Orin support as **"Planned"**, not implemented (S57). Speedy MASt3R = 91 ms/pair on **A40 GPU**; Orin Nano Super throughput ≈ 1/30 of A40 → MASt3R ≈ **2.5–3 s/pair**, ~7× over the 400 ms p95 budget. | Drop MASt3R from the v1 bench-off; mark it **research-track-only** (long-horizon distillation experiment). |
| **C-3**: bench-off shortlist (same row, expansion). | **Functional** (M-3). GIM (S48, ICLR 2024 spotlight) gives drop-in 8.4–18.1 % zero-shot improvement over LightGlue/RoMa/DKM/LoFTR by self-training on internet videos. Same TRT path as vanilla SP+LG, better cross-domain transfer — exactly our regime (zero training data on eastern-Ukraine 1 km AGL). | Add **GIM-LightGlue** to the bench-off as a peer of vanilla SP+LG. |
| **C-2**: VPR shortlist = AnyLoc (primary) + MixVPR (degraded-power fast lane). | **Functional** (M-4). Two CVPR 2024 papers landed after the Mode A draft was written: **DINOv2 SALAD** (S47) — DINOv2 + Sinkhorn-VLAD, R@1 = 75 % MSLS Challenge / 92.2 % MSLS Val / 76 % NordLand; **BoQ** (S46) — bag of learnable queries, beats NetVLAD/MixVPR/EigenPlaces/Patch-NetVLAD/TransVPR/R2Former on 14 benchmarks. | VPR shortlist grows to **{AnyLoc, SALAD, BoQ, MixVPR}**. AnyLoc retained as training-free fallback; SALAD and BoQ are likely primaries. |
| **C-2 / C-9**: latency budget for AnyLoc (DINOv2 ViT-B) = 50–80 ms/inf at 224×224 (estimated). | **Performance, positive direction** (M-5). Jetson AI Lab L1 measurements (S40): **DINOv2-base-patch14 = 126 inf/s = ~8 ms/inf at 224×224** on Orin Nano Super (FP16 trtexec). Real number is ~6–10× better than the draft's estimate. | AC-4.1 (400 ms p95) is comfortably feasible. **R2 (latency) downgraded High → Medium**. Empirical confirmation still required, but no longer make-or-break. |
| **C-6**: "MAVSDK + pymavlink share the same serial / TCP MAVLink endpoint via a single `mavlink-router` instance." | **Security** (M-6). mavlink-router has a public, fuzzing-discovered, easily-triggered **stack-based buffer overflow** in config-file parsing (S45 issue #436). Repo has **no SECURITY.md**, no formal advisory process. Drops a known-vulnerable C++ daemon onto a flight-critical companion. | **Replace mavlink-router**. v1 default: distinct system-IDs for MAVSDK and pymavlink, sharing the serial port via ArduPilot's native MAVLink routing — no router daemon at all. v1.1 fallback: in-process MAVLink endpoint multiplexer (~150 LOC). |
| **C-6 / Security**: "MAVLink2 signing is recommended (deferred to a Phase-4 security pass)." | **Security** (M-7). GPS_INPUT (and now ODOMETRY) is a high-trust local channel feeding the flight-critical EKF. Without signing, anyone with serial-line access on the airframe can crash the vehicle by injecting a malicious fix. Cost of enabling signing is one operator key-provisioning step per airframe (S44). | Promote MAVLink2 signing to **v1 hard configuration item**. Document the key-provisioning procedure in the deploy runbook. Verify signing-on at boot; refuse to inject GPS_INPUT/ODOMETRY if the FC reports signing-off on our link. |
| **C-1**: "Tile format = MBTiles SQLite + per-tile metadata. Single file, mmap-friendly, ubiquitous." | **Performance** (M-8). Default SQLite rollback journal mode + concurrent reader (matcher cache lookup at ≤3 fps × ~30 candidate tiles) + writer (Component 1b ortho-tile write at ≤1–2 Hz × ~30 tiles) → guaranteed `database is locked` failures (S54). | Specify **MBTiles SQLite + WAL + connection pool + per-cycle transaction batching**. Multiple read connections + one write connection. Tile-cache lookup p95 ≤ 5 ms is now a measurable AC-4.1 sub-budget. |
| **C-1b**: tile dedup rule "If cache has stale service tile AND our quality > existing → write (overwrites with `source = onboard`)". Quality = inlier count + sharpness. | **Safety** (M-9). EKF over-confidence (a known failure mode) escapes the σ_xy ≤ 10 m generation gate; a confidently-bad pose writes a misaligned tile that becomes the next flight's anchor → cross-flight error compounding. AC-NEW-4 doesn't model this. | (a) **Service tiles are immutable within freshness budget** — onboard tiles overwrite only stale or other-onboard tiles. (b) **Voting layer at the Service ingest**: onboard tile gets promoted to "trusted basemap" only after **N≥2 independent flights** confirm consistent geo-alignment. (c) Quality score includes **parent-pose covariance as a hard gate** (σ_xy ≤ 5 m, tighter than the 10 m generation gate); tiles above that gate are marked "soft" in their sidecar. (d) New AC: **AC-NEW-7 — cache-poisoning safety budget**: P(onboard tile mis-aligned > 30 m) per flight < 1 %; P(>100 m) per flight < 0.1 %. |
| **C-9**: "single Python process (asyncio) with TRT inference workers via CUDA IPC for tensor handoff." | **Functional** (M-10). Free-threaded Python 3.13 is **experimental**, has substantial single-threaded perf hit, and **GIL re-enables on import of any non-FT-aware C extension** (S55) — which would silently include numba, possibly TRT bindings, possibly older pymavlink. Free-threading is not a v1 escape hatch. | Stay on **CPython 3.11 or 3.12** for v1. Sharpen the rationale: the choice is "asyncio + TRT subprocess workers + numba JIT on hot path is the production-ready combination today; revisit free-threading in v1.1 once NumPy/SciPy/numba/TRT bindings stabilise on PEP 703". |
| **C-5 / C-6**: source-promotion logic "we **immediately** promote our `GPS_INPUT` to fix_type=3D and assert" on FC fix degradation. | **Functional / safety** (M-11). ArduPilot's external-nav source-switching path has known production gotchas (S41, S42 PR #19563, S43 PR #30080 active 2025): companion-derived velocity errors, position-estimate resets when external-nav reference is lost, conflicts when running alongside GPS. AC-NEW-2 (3 s spoofing-promotion latency) **is** that path. | Promote **F-T9 SITL coverage of source-switching** from "verify the loop closes" to a **hard test gate**. Test matrix: jam-onset → our channel; spoofed-real-GPS recovery → operator-confirmed source-restore; `EK3_SRC1_*` parameter combinations across both GPS_INPUT-primary and ODOMETRY-primary. Pin ArduPilot to the version containing PR #30080. |
| **C-1b / R-Terrain**: "flat-Earth model" everywhere; "operational area is flat steppe (R-Terrain)". | **Functional / safety** (M-12). Eastern-Ukraine relief amplitude reaches **~24 m peak-to-trough** in Kharkiv survey areas (S56), with creek + gully (yary/balky) systems. At 1 km AGL with 35° HFOV, that produces ~17 m horizontal misalignment at frame edge under flat-Earth ortho. Inside AC-1.1 (50 m@80 %) but eats into AC-1.2 (20 m@50 %). | **Per-sector DEM lookup** in pre-flight. Classify sectors: **flat** (≤5 m amplitude, full anchor weight), **moderate** (5–15 m, weight × 0.7), **rugged** (>15 m, skip ortho-tile generation, weight × 0.3 with `rugged_sector` telemetry flag). Use SRTM 30 m DEM (free; ~30 MB for 400 km²). Add a runtime self-classifier: if matcher RANSAC inlier ratio drops < threshold for K consecutive frames, auto-promote the sector to "rugged" for the rest of the flight. |
| **C-3 / V&V**: bench-off targets = AerialVL + UAV-VisLoc + internal Mavic. | **Functional** (M-14). None of those grade **extreme-pitch / extreme-scale / extreme-overlap separately**. AerialExtreMatch (S49, 1.5 M synthetic pairs, 32 difficulty levels) covers exactly the failure-mode axes that matter; **2chADCNN** (S50, MDPI Drones 2023) is a published season-robustness ceiling reference. | Add **AerialExtreMatch** as a primary structured-difficulty regression bench. Use **2chADCNN as a season-robustness ceiling reference number only**, *not* as a bench-off candidate. (2chADCNN's outputs are template-overlap regions, not sub-pixel keypoints; its tested altitude band is 252–500 m, not 1 km; and it has no Jetson benchmark. Keypoint-grade modern matchers — SP+LG, GIM-LightGlue, GIM-RoMa — are the bench-off candidates.) |
| **C-4 / AC-1.3**: "<100 m drift VO-only / <50 m with IMU" budget — implicit confidence based on ORB-SLAM3 / VINS-Fusion baselines. | **Functional** (M-15). S52 (AFIT thesis) — SVO/DSO/ORB-SLAM2 all **had significant difficulty** maintaining localisation on real fixed-wing flights. Our framing (VO between satellite anchors, not standalone metric SLAM) is correct, but the AC-1.3 budget needs validation against a real fixed-wing baseline — *not* Mavic-class footage. | New risk **R8 — fixed-wing VO drift under AC-1.3 budget is unconfirmed**. Mitigations: (a) borrow AerialVL's 70 km of fixed-wing trajectories for **F-T1b** AC-1.3 regression; (b) plan the first internal fixed-wing flight before AC lock, not as a stretch. |
| **R7 / Datasets**: "MidAir / synthetic IMU is dropped; AerialVL primary; internal Mavic for deployment-domain proxy." | M-13. TartanAir V2 (S51) is photo-realistic synthetic with **native IMU + 12-cam + 65 environments + season variation + custom camera models**, configurable motion patterns — dynamics-mismatch argument weaker than for MidAir. | **CONFIRMED (Q4 = A, 2026-04-26)** — TartanAir V2 added as early-stage synthetic baseline alongside AerialVL + UAV-VisLoc + AerialExtreMatch + 2chADCNN-season-set + Mavic. Used for sweeping seasons / lighting / pitches before real fixed-wing flight (FT-3) lands. |
| **C-2 / Granularity**: "FAISS IVF over **per-tile** DINOv2-VLAD vectors" using z=20 storage tiles (~154 × 154 m). | **Functional, high** (M-16). A 1 km AGL frame covers 30–100 z=20 tiles. Cosine similarity between a frame descriptor (covers ~600 × 450 m of ground) and a single-tile descriptor (covers 154 × 154 m) is fundamentally mismatched. None of AerialVL / AnyLoc / NaviLoc do per-storage-tile retrieval; they use frame-footprint-sized reference chunks with overlap. | **Decouple VPR chunk from storage tile.** Storage tile = z=20 / 512×512 (kept for orthorect + dedup). **VPR chunk** = ground-footprint-sized window (e.g. ~600 × 450 m at the deployment altitude band) with **40–50 % overlap**, optionally multi-scale across altitude bands. FAISS index is over chunks, not tiles. Frame descriptor is computed once per *invoked* frame after IMU-heading de-rotation. |
| **C-2 / Invocation**: VPR runs on every retrieval cycle. | **Performance, medium** (M-17). VPR's value is concentrated in re-loc paths (cold start, sharp turn, disconnected segment, large σ_xy). In steady state — recent anchor < 2 s, σ_xy < 20 m, VO healthy — a **geometric prior** from IMU+VO predicted position picks top-K candidate chunks by distance alone, no DINOv2 forward needed. | **Conditional VPR invocation.** `if (steady_state) { rank top-K by geometric distance } else { invoke VPR }`. Saves ~10–35 ms/frame in cruise. DINOv2 TRT engine stays resident for low-latency wake-up. |
| **C-2 / Fallback**: no defined behaviour when top-1 retrieval is "unconvinced". | **Resilience, medium** (M-18). If top-1 similarity is below threshold OR top-1/top-2 similarity gap is below threshold, the system today goes straight to "no anchor → VO/IMU dead-reckoning" — wasteful, since an adjacent chunk is often correct. | **Expanding-window retry.** On unconvincing top-1, expand the candidate set to adjacent VPR chunks (±1 in each direction; ~8 neighbours for square-grid layout) and let the matcher (Component 3) decide via inlier ratio + reprojection error. Same FAISS index, larger K, no extra DINOv2 forward. |
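The conditional-invocation and expanding-window rows (M-17, M-18) can be sketched together as below. This is a hedged illustration, not the Component 2 implementation: `NavState`, `select_candidates`, `neighbours_8`, the `vpr_query` callback, and the local metric frame for chunk centres are all hypothetical; the steady-state limits (anchor age < 2 s, σ_xy < 20 m, VO healthy) are taken from the table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NavState:
    seconds_since_anchor: float
    sigma_xy_m: float
    vo_healthy: bool


def is_steady_state(state, max_anchor_age_s=2.0, max_sigma_xy_m=20.0):
    """Steady state per M-17: recent anchor, small sigma_xy, healthy VO."""
    return (state.vo_healthy
            and state.seconds_since_anchor < max_anchor_age_s
            and state.sigma_xy_m < max_sigma_xy_m)


def rank_by_distance(predicted_xy, chunk_centres, k):
    """Top-k chunk ids by distance; chunk_centres maps id -> (x, y) in metres."""
    ordered = sorted(chunk_centres,
                     key=lambda cid: math.dist(predicted_xy, chunk_centres[cid]))
    return ordered[:k]


def select_candidates(state, predicted_xy, chunk_centres, k, vpr_query):
    """Use the geometric prior in cruise; call the VPR backbone otherwise."""
    if is_steady_state(state):
        return rank_by_distance(predicted_xy, chunk_centres, k)
    return vpr_query(k)  # hypothetical callback that runs the DINOv2 forward pass


def neighbours_8(col, row):
    """Adjacent chunk indices on a square grid, for the expanding-window retry."""
    return [(col + dc, row + dr)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dc, dr) != (0, 0)]
```

Keeping `vpr_query` as an injected callback means the TRT engine can stay resident while the cruise path never touches it.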
---
## Product Solution Description
A companion-computer software stack that runs on the **Jetson Orin Nano Super** alongside an **ArduPilot 4.5+** flight controller and provides **GPS-equivalent position fixes** to the autopilot when real GPS is jammed, spoofed, or denied. It does so by continuously matching the downward navigation-camera feed against a **pre-cached satellite basemap supplied by the Azaion Suite Satellite Service** and fusing match-derived absolute positions with onboard **Visual Odometry** and the autopilot's high-rate **IMU** in a loosely-coupled EKF. The fused estimate is exported on **two MAVLink channels in parallel**: `GPS_INPUT` (the primary "GPS-substitute" channel matching AC-4.3) and `ODOMETRY` (when our pose has full 6-DoF covariance, so the FC's EKF3 can fuse the richer signal).
During flight the system also **generates fresh tiles** from the navigation camera, classifies each sector against a pre-loaded SRTM-30 m DEM (skip rugged sectors), deduplicates new tiles against the existing cache (service tiles immutable within freshness budget), and uploads the new tiles to the **Suite Satellite Service candidate pool** on landing — where a **2-flight voting layer** promotes onboard tiles to "trusted basemap" only after independent confirmation. **No raw frames are persisted** — the tile is the unit of storage.
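The per-sector classification mentioned above can be sketched with the M-12 thresholds (flat ≤ 5 m, moderate 5–15 m, rugged > 15 m) and anchor weights (1.0 / 0.7 / 0.3). The code is a minimal sketch under assumptions: the function and class names are hypothetical, relief amplitude is taken as max minus min over the sector's DEM samples, and the inlier-ratio threshold and K defaults are placeholders, since the draft leaves both unspecified.

```python
ANCHOR_WEIGHT = {"flat": 1.0, "moderate": 0.7, "rugged": 0.3}


def relief_amplitude(dem_samples_m):
    """Peak-to-trough elevation range of the DEM samples inside one sector."""
    return max(dem_samples_m) - min(dem_samples_m)


def classify_sector(amplitude_m):
    """Map relief amplitude to the pre-flight terrain class."""
    if amplitude_m <= 5.0:
        return "flat"
    if amplitude_m <= 15.0:
        return "moderate"
    return "rugged"


class RuntimeSelfClassifier:
    """Promote a sector to 'rugged' after K consecutive low-inlier frames."""

    def __init__(self, min_inlier_ratio=0.3, k_frames=5):  # placeholder defaults
        self.min_inlier_ratio = min_inlier_ratio
        self.k_frames = k_frames
        self._low_streak = 0
        self.promoted = False

    def update(self, inlier_ratio):
        if self.promoted:
            return True  # promotion lasts for the rest of the flight
        if inlier_ratio < self.min_inlier_ratio:
            self._low_streak += 1
        else:
            self._low_streak = 0
        if self._low_streak >= self.k_frames:
            self.promoted = True
        return self.promoted
```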
A separate path computes ground-projected GPS coordinates for objects detected by the AI camera using gimbal angle, airframe attitude, and altitude.
The MAVLink endpoint is shared between MAVSDK (telemetry) and pymavlink (`GPS_INPUT` + `ODOMETRY`) by **distinct system-IDs through ArduPilot's native MAVLink routing** — no `mavlink-router` daemon. **MAVLink2 signing is mandatory in v1** between companion and FC, with a documented per-airframe key-provisioning procedure.
```
Pre-flight (ground)
┌────────────────────────────────────────────────┐
│ Azaion Suite Satellite Service │
│ (sources commercial / agency imagery; │
│ ingests onboard tiles via candidate pool + │
│ 2-flight voting layer) │
└──────────────┬───────────────────┬─────────────┘
│ sync down │ upload back (post-flight, candidate pool)
▼ ▲
┌─────────────────┐
│ DEM (SRTM 30 m) │ ─────► sector classification (flat/moderate/rugged)
└─────────────────┘
Onboard (in-flight)
Nav Cam: ADTi 20MP, 3 fps AI Cam (gimbal+zoom, on-demand)
│ │
▼ ▼
┌────────────────────────────────┐ ┌──────────────────────┐
│ GPS-Denied Pipeline │ │ Object Geo-Locator │
│ ┌──────────────────────┐ │ │ (pinhole + ATTITUDE │
│ │ Visual Odometry │ │ │ MAVLink fusion) │
│ │ (SP+LG 2-frame homog)│ │ └──────────┬───────────┘
│ └──────────┬───────────┘ │ │
│ ▼ │ │
│ ┌──────────────────────┐ │ │
│ │ Place Recognition │←──┐ │ │
│ │ (SALAD/BoQ lead, │ │ │ │
│ │ AnyLoc fallback) │ │ │ │
│ └──────────┬───────────┘ │ │ │
│ ▼ │ │ │
│ ┌──────────────────────┐ │ │ │
│ │ Cross-view Matcher │ │ │ │
│ │ (SP+LG TRT/FP16 lead │ │ │ │
│ │ + GIM-LG bench peer)│ │ │ │
│ └──────────┬───────────┘ │ │ │
│ ▼ │ │ │
│ ┌──────────────────────┐ │ │ │
│ │ EKF Fusion │←──┼───┼── IMU (FC) │
│ │ (loose-coupled, │ │ │ │
│ │ Python + numba) │ │ │ │
│ └──────────┬───────────┘ │ │ │
│ │ │ │ │
│ ├──► Ortho-Tile Generator ──► Tile Cache (NVMe, MBTiles+WAL+pool)
│ │ (skip if rugged sector; │ ▲
│ │ σ_xy hard gate ≤5m for hard │ │ dedup w/
│ │ write; soft tiles flagged) │ │ service-tile
│ │ └───┘ immutability
│ ▼ │
└────────────┼──────────────────────────────────┘
GPS_INPUT (pymavlink, signed MAVLink2) ─────► ArduPilot (GPS1_TYPE=14)
ODOMETRY (pymavlink, signed MAVLink2) ─────► ArduPilot (EK3_SRC1_* = ExternalNav)
Telemetry summary 1–2 Hz ──────────► QGroundControl (signed)
Flight Data Recorder (NVMe, 64 GB cap)
(tiles + telemetry + IMU + tlog + per-sector flags; NO raw frames)
Post-flight (landing)
┌────────────────────────────────────────────────┐
│ Tile uploader → Suite Satellite Service │
│ → CANDIDATE POOL │
│ → 2-flight voting → trusted-basemap promotion │
└────────────────────────────────────────────────┘
```
---
## Architecture
### Overall principles
1. **Pipeline = stages with explicit confidence**. Each stage emits a pose hypothesis + covariance + categorical label. Downstream EKF fuses by covariance.
2. **All heavy NN inference runs on GPU via TensorRT** (FP16, INT8 where validated). Pre-extract satellite-tile descriptors offline (AC-8.3).
3. **Single-process Python orchestrator** (asyncio, **CPython 3.11/3.12**) owns I/O, MAVLink, telemetry, FDR. **Inference workers** are TRT-engine processes on GPU. Free-threaded Python deferred to v1.1 (M-10).
4. **Persistent satellite cache** across flights (~10 GB for 400 km²); per-flight FDR (AC-NEW-3) is separate.
5. **Every output to the FC carries a covariance** — both GPS_INPUT (`h_acc`, `v_acc`, `vel_acc`) and ODOMETRY (full 21-element matrix). Never a bare lat/lon.
6. **Service tiles are basemap truth**; onboard tiles are candidate input that goes through a Service-side voting layer before becoming basemap (M-9).
7. **MAVLink2 signing on every companion↔FC link** (M-7). USB bypasses signing — bench-only access.
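Principle 5 can be illustrated with the two covariance shapes involved. The MAVLink `ODOMETRY` message carries the pose covariance as the row-major upper-right triangle of a 6×6 matrix (21 values); `GPS_INPUT` only has scalar accuracies. The sketch below assumes, as one conservative option, that `h_acc` is the square root of the largest eigenvalue of the horizontal 2×2 block; that mapping and the function names are assumptions, not something the draft specifies.

```python
import math


def pack_upper_triangle(cov6):
    """Row-major upper-right triangle of a 6x6 matrix, as 21 floats."""
    if len(cov6) != 6 or any(len(row) != 6 for row in cov6):
        raise ValueError("expected a 6x6 matrix")
    return [cov6[i][j] for i in range(6) for j in range(i, 6)]


def horizontal_accuracy(cov_xy):
    """Conservative scalar: sqrt of the largest eigenvalue of a symmetric 2x2 block."""
    (a, b), (_, d) = cov_xy
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return math.sqrt(mean + radius)
```

Using the largest eigenvalue means the scalar never understates the error along the worst horizontal direction, at the cost of overstating it along the best one.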
---
### Component 1: Satellite Tile Cache & Descriptor Index
| Aspect | Choice | Rationale / change vs. Mode A |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Tile format | **MBTiles SQLite + WAL mode + connection pool + per-cycle transaction batching** | M-8: WAL is mandatory under our concurrent reader+writer load. Pool gives multiple read connections + one write connection. Without these, `database is locked` errors are guaranteed. |
| Tile coordinate system | Slippy-map XYZ at zoom 20 (~30 cm/px) | Unchanged. |
| Tile size | 512 × 512 | Unchanged. |
| Descriptor index | FAISS IVF (cosine) over per-tile DINOv2-VLAD vectors | Unchanged. **New constraint**: index loadable on ≤4 GB GPU RAM (Jetson budget, M-5 / W13 cross-check). |
| Per-tile keypoints | SuperPoint + LightGlue descriptors precomputed pre-flight | Unchanged. **Parallel index** for GIM-LightGlue keypoints if the bench-off picks GIM. |
| Freshness metadata | `capture_date`, `sector_class ∈ {active,stable}`, `source ∈ {service,onboard}`, `terrain_class ∈ {flat,moderate,rugged}`, `trust_level ∈ {basemap,candidate,soft}` | Adds `terrain_class` (M-12) and `trust_level` (M-9). |
| Encryption at rest | AES-GCM, key from secure element on the FC or the companion's TPM | Unchanged. |
| **Service-tile immutability** | **Service-source tiles are immutable within freshness budget; onboard tiles overwrite only stale or other-onboard tiles** | **New (M-9).** Critical to prevent cross-flight cache poisoning. |
**Per-flight storage budget.** ~10 GB persistent for the 400 km² operational-area cache. Plus ~30 MB for SRTM-30 m DEM coverage (M-12).
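The tile-format row (WAL mode, separate read and write connections, per-cycle transaction batching) can be sketched with the standard-library `sqlite3` module. This is a minimal sketch, not the cache implementation: the helper names are hypothetical, the schema is a simplified MBTiles `tiles` table without the per-tile metadata, and `synchronous=NORMAL` is an assumed durability trade-off.

```python
import sqlite3


def open_writer(path):
    """Single write connection; switches the database file to WAL mode."""
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")  # assumption: acceptable for a cache
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        "zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, tile_data BLOB, "
        "PRIMARY KEY (zoom_level, tile_column, tile_row))"
    )
    conn.commit()
    return conn


def open_reader(path):
    """One of several read connections; refuses accidental writes."""
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA query_only=ON")
    return conn


def write_batch(writer, tiles):
    """Write all tiles of one pipeline cycle in a single transaction."""
    with writer:  # commits on success, rolls back on exception
        writer.executemany(
            "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", tiles)


def read_tile(reader, zoom, column, row):
    found = reader.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (zoom, column, row),
    ).fetchone()
    return None if found is None else found[0]
```

In WAL mode readers do not block the writer and the writer does not block readers, which is the property the concurrent matcher-lookup and tile-write load relies on.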
---
### Component 1b: Ortho-Tile Generator (in-flight tile creation & write-back)
**Responsibility (AC-8.4).** Same as Mode A draft, with the following changes:
**Pipeline per frame:**
1. **Eligibility check** (changed). Skip tile generation when:
- EKF source label is `dead_reckoned`.
- **σ_xy > 5 m** (was 10 m — M-9 hard gate).
- Airframe roll/pitch (from FC `ATTITUDE`) > 10°.
- VPR + cross-view match returned no inliers.
- **Sector is classified as `rugged` in the pre-flight DEM lookup** (M-12) — skip ortho-tile generation entirely for that sector.
For sectors classified as `moderate`, generate but flag the tile sidecar `terrain_uncertainty=true`.
2. **Orthorectification.** Pinhole projection on the **per-sector DEM** (flat-Earth in flat sectors; SRTM-30 m DEM lookup in moderate sectors).
3. **Resampling to basemap projection.** Unchanged.
4. **Quality scoring** (changed). Adds **σ_xy from EKF as hard gate**:
- `sharpness` (variance of Laplacian),
- `coverage` (fraction inside source frame),
- `match_inliers` (RANSAC inlier count),
- `parent_pose_sigma_xy` (EKF position covariance — a tile written from σ_xy ∈ [3, 5] m is `trust_level = soft`; σ_xy ≤ 3 m is `trust_level = candidate`),
- `glare/cloud` flag.
5. **Deduplication / write decision** (changed per M-9):
- If cache has no tile at that key → **write** (`source = onboard`, `trust_level = candidate` or `soft`).
- If cache has a tile and it's `source = service` and **within** AC-8.2 freshness budget → **never overwrite** (was: overwrite if our quality > existing).
- If cache has a tile and it's `source = service` and **outside** AC-8.2 freshness budget → write only if our parent-pose σ_xy ≤ 3 m AND quality score > existing.
- If cache has a tile from `source = onboard` from this same flight, but our quality is materially better → write.
- Otherwise → skip.
6. **Sidecar metadata** (extended). Includes `parent_pose_sigma_xy`, `terrain_class`, `trust_level`.
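
The eligibility check (step 1), the trust-level assignment (step 4) and the write decision (step 5) can be sketched as plain functions. This is an illustrative reading of the rules above, not the implementation: the `FrameState` and `CachedTile` types, the single scalar `quality`, and the `margin` used for "materially better" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

SIGMA_HARD_GATE_M = 5.0   # step 1: skip above this
SIGMA_CANDIDATE_M = 3.0   # step 4: at or below this the tile is `candidate`
MAX_TILT_DEG = 10.0       # step 1: roll/pitch limit


@dataclass
class FrameState:
    ekf_source: str       # "satellite_anchored" | "vo_extrapolated" | "dead_reckoned"
    sigma_xy_m: float
    roll_deg: float
    pitch_deg: float
    match_inliers: int
    terrain_class: str    # "flat" | "moderate" | "rugged"


@dataclass
class CachedTile:
    source: str           # "service" | "onboard"
    within_freshness_budget: bool
    same_flight: bool
    quality: float        # assumed: one scalar combining the step-4 scores


def eligible(f: FrameState) -> bool:
    """Step 1: any single failing condition skips tile generation."""
    return not (
        f.ekf_source == "dead_reckoned"
        or f.sigma_xy_m > SIGMA_HARD_GATE_M
        or max(abs(f.roll_deg), abs(f.pitch_deg)) > MAX_TILT_DEG
        or f.match_inliers == 0
        or f.terrain_class == "rugged"
    )


def trust_level(sigma_xy_m: float) -> str:
    """Step 4: candidate at sigma_xy <= 3 m, soft between 3 m and 5 m."""
    return "candidate" if sigma_xy_m <= SIGMA_CANDIDATE_M else "soft"


def should_write(f: FrameState, quality: float,
                 existing: Optional[CachedTile], margin: float = 0.1) -> bool:
    """Step 5: decide whether the new onboard tile may be written."""
    if not eligible(f):
        return False
    if existing is None:
        return True
    if existing.source == "service":
        if existing.within_freshness_budget:
            return False  # service tiles are immutable inside the budget
        return f.sigma_xy_m <= SIGMA_CANDIDATE_M and quality > existing.quality
    if existing.source == "onboard" and existing.same_flight:
        return quality > existing.quality + margin  # assumed margin
    return False


good = FrameState("satellite_anchored", 2.5, 3.0, 1.0, 120, "flat")
fresh_service = CachedTile("service", True, False, 0.2)
stale_service = CachedTile("service", False, False, 0.2)
```

With these inputs, `should_write(good, 0.9, fresh_service)` is `False` regardless of quality, while the same frame may replace `stale_service`.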
**Post-flight uploader** (changed). Onboard tiles are pushed to the Suite Service **candidate pool**, not directly to the basemap. Service ingest applies the **2-flight voting rule** (M-9) before promoting candidate tiles to trusted basemap. Tiles already at `trust_level = soft` upload but with the soft-trust flag preserved.
---
### Component 2: Visual Place Recognition (Global Retrieval)
**Role.** VPR is a **resilience module**, not an every-frame primary-loop module. Its job is to narrow ~10⁴–10⁵ candidate ground-footprint chunks down to a top-K (5–10) when a geometric prior from IMU+VO is unavailable or untrusted. In steady-state cruise we use the geometric prior alone; we invoke VPR on **re-loc triggers** (M-17). VPR is essential for the resilience ACs (AC-NEW-1 cold start, AC-3.2 sharp turn re-loc, AC-3.3 disconnected segment); it is *not* essential to every steady-state frame.
#### Retrieval unit (revised — M-16)
The VPR retrieval unit is **decoupled** from the storage tile:
| Concept | Size | Purpose |
| ------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| **Storage tile** (Component 1) | z=20 slippy XYZ, 512×512 (~154 × 154 m ground) | Orthorectification, dedup, basemap update. Storage layer only. |
| **VPR chunk** *(new)*          | Ground-footprint-sized to expected frame coverage (~600 × 450 m at 1 km AGL with the v1 lens — re-pinned per camera spec); **40–50 % overlap** between adjacent chunks; optionally multi-scale across altitude bands | The unit FAISS retrieval works on. Decoupled from storage so any frame footprint always falls inside ≥1 chunk regardless of position. |
The FAISS IVF index is over **VPR chunk descriptors**, not storage-tile descriptors. Chunks are derived from the storage tile cache pre-flight (one batch DINOv2 forward per chunk); refreshed when tiles inside a chunk change beyond a threshold. Index size for a 400 km² operational area at ~600×450 m chunk size with 50 % overlap ≈ **6 000–8 000 entries**, well within FAISS-on-Jetson memory.
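
The index-size estimate can be checked with a short grid count. The sketch below assumes a square 20 km × 20 km operational area (the source gives only the 400 km² total), a single chunk scale, and a regular grid with the chunk size and overlap quoted above; the function names are illustrative.

```python
import math


def chunks_along(extent_m: float, chunk_m: float, overlap: float) -> int:
    """Chunks needed to cover one axis with the given fractional overlap."""
    stride_m = chunk_m * (1.0 - overlap)
    if extent_m <= chunk_m:
        return 1
    return math.ceil((extent_m - chunk_m) / stride_m) + 1


def chunk_count(width_m: float, height_m: float,
                chunk_w_m: float = 600.0, chunk_h_m: float = 450.0,
                overlap: float = 0.5) -> int:
    """Total chunks for a rectangular area covered by a regular grid."""
    return (chunks_along(width_m, chunk_w_m, overlap)
            * chunks_along(height_m, chunk_h_m, overlap))


# Assumed area shape: 20 km x 20 km = 400 km^2.
n_chunks = chunk_count(20_000.0, 20_000.0)  # 66 columns x 88 rows = 5808
```

Under these assumptions the count is 5 808, the same order as the figure quoted above; the area's actual shape, border padding, and any additional altitude bands change the exact number.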
Frame descriptor pipeline (only on VPR invocation): **IMU-heading de-rotate frame → downsample to backbone input size → DINOv2 forward → VLAD/SALAD/BoQ aggregator → cosine retrieval against FAISS chunk index**.
#### Invocation policy (revised — M-17)
```
on each EKF cycle:
    if steady_state(last_anchor_age < 2s, sigma_xy < 20m, vo_healthy):
        candidates = top_K_chunks_by_predicted_position()   # geometric prior, no DINOv2
    else:
        # Re-loc path — cold start, sharp turn, disconnected segment, sigma_xy > 50m, VO failed
        candidates = vpr_top_K_chunks(frame_descriptor)      # DINOv2 + FAISS
        if not convincing_match(candidates):                 # M-18
            candidates = expand_to_adjacent_chunks(candidates)
    pose, covariance = matcher_pnp(frame, candidates)        # Component 3
```
Telemetry exposes `vpr_invoked` per cycle so the FDR captures the steady-state-vs-reloc fraction over a flight.
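
A runnable sketch of the steady-state predicate and the per-flight `vpr_invoked` fraction follows. It uses the thresholds written in the pseudocode (anchor age < 2 s, σ_xy < 20 m, VO healthy); the `CycleState` type and the `vpr_fraction` helper are hypothetical names introduced for the example. The predicate is applied literally, so a σ_xy between 20 m and 50 m also takes the re-loc path here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CycleState:
    last_anchor_age_s: float
    sigma_xy_m: float
    vo_healthy: bool


def steady_state(s: CycleState) -> bool:
    """True when the geometric prior alone is trusted for this cycle."""
    return s.last_anchor_age_s < 2.0 and s.sigma_xy_m < 20.0 and s.vo_healthy


def vpr_invoked(s: CycleState) -> bool:
    """Per-cycle telemetry flag: VPR runs whenever the cycle is not steady-state."""
    return not steady_state(s)


def vpr_fraction(cycles: Iterable[CycleState]) -> float:
    """Fraction of cycles in a replay that took the VPR (re-loc) path."""
    flags = [vpr_invoked(c) for c in cycles]
    return sum(flags) / len(flags) if flags else 0.0


replay = [
    CycleState(0.5, 4.0, True),    # cruise
    CycleState(1.0, 6.0, True),    # cruise
    CycleState(3.5, 12.0, True),   # anchor too old: re-loc
    CycleState(0.4, 60.0, False),  # VO failed, large sigma: re-loc
]
fraction = vpr_fraction(replay)
```

For the four-cycle replay above the fraction is 0.5; a real mission replay would be dominated by cruise cycles.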
#### Backbone bench-off candidates
| Solution | Tools | Advantages | Limitations | Performance | Fit |
| --------------------------------------- | -------------------------------- | --------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- | -------------------------------------------------------------- | --------------------------------------------------------------------- |
| **AnyLoc** (DINOv2 + unsupervised VLAD) | dinov2 ViT-B/14, VLAD aggregator | Training-free; up to 4× R@1 over specialised methods on aerial cross-domain (F-C2) | Needs bench-off vs SALAD/BoQ on aerial cross-domain | DINOv2-base ~8 ms/inf at 224×224 on Orin Nano Super (S40, M-5) | **Bench-off candidate** — keep as fallback even if not picked primary |
| **DINOv2 SALAD** *(new)* | SALAD repo (CVPR 2024) | DINOv2-trained Sinkhorn-VLAD; R@1 = 75 % MSLS Challenge / 92.2 % MSLS Val / 76 % NordLand; in `aero-vloc` | Requires training data — but published checkpoints are usable zero-shot | Same backbone as AnyLoc → similar latency | **Bench-off primary candidate** (M-4) |
| **BoQ** *(new)* | Bag-of-Queries (CVPR 2024) | Beats NetVLAD/MixVPR/EigenPlaces/Patch-NetVLAD/TransVPR/R2Former on 14 benchmarks | Aerial-domain ranking TBD by bench-off | Lower compute than AnyLoc/SALAD when used with a CNN backbone | **Bench-off primary candidate** (M-4) |
| **MixVPR** | mixvpr trained on GSV-Cities | Lighter than AnyLoc; degraded-power fast-lane | Trained on street-view; weaker out-of-domain on aerial | Lower latency than ViT-class | **Fast-lane on degraded power** |
| **EigenPlaces / SelaVPR** | aero-vloc | Recent SOTA on some aerial | Mixed wins vs AnyLoc | — | Bench-off candidates |
**Bench-off scope expanded** (M-4): AnyLoc + SALAD + BoQ + MixVPR primary; EigenPlaces / SelaVPR secondary. **Each candidate is benched on the new chunk-based retrieval unit, not per-tile.**
#### Active-conflict scene change (destroyed buildings, cratering, dam flooding)
This is a frequent operational reality, not an edge case. Layered mitigations:
| Mitigation | What it does | Cost |
| ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------- |
| **Multi-scale VPR chunks** *(new)* — z=17 / z=18-effective coarse chunks alongside z=20-derived fine chunks | Coarse-scale chunk descriptors describe road-network + field-boundary + waterway structure that survives building destruction. When the fine-scale top-K is unconvinced, the system falls back to coarse-scale top-K. | ~12 MB extra disk; ~3 min one-time DINOv2 forward over coarse chunks pre-flight |
| **OSM road-network overlay** as stable-feature anchor *(new)* | OSM road geometry persists even when buildings are destroyed. Extract from OpenStreetMap as a binary "road-mask" tile sidecar. The matcher applies a bonus inlier weighting on keypoints that fall on road edges (~1.3× confidence multiplier). GISNav (closest published reference architecture) does this. | One-time pre-flight OSM extract for the operational area (~minutes); ~5 % bigger sidecar |
| **Sector volatility classification drives K** *(new)* — bound to AC-NEW-6 `sector_class` | K=5 in stable sectors; K=20 in active-conflict sectors; K=50 in expanding-window fallback. Bigger candidate pool absorbs stale-tile false negatives. | Pure config; no compute or storage cost |
| **Onboard-tile rapid promotion in active sectors** *(new)* — refines M-9's 2-flight voting | In active sectors specifically, allow promotion to "trusted basemap" after 1 flight if σ_xy ≤ 3 m AND OSM-road-overlap ≥ 70 % (dual gate). Faster basemap refresh keeps up with active-sector change rate. Stable sectors keep the conservative N≥2 voting. | Branch in Service ingest voting layer; no onboard cost |
| **Negative cache** *(new)* | When the matcher rejects a tile pair (RANSAC inlier ratio < 0.3) repeatedly across multiple flights, mark that tile's `trust_level = stale_destroyed` and exclude from retrieval until refreshed by Service. | One extra metadata field; trivial at retrieval time |
Of these, **multi-scale VPR chunks + OSM road overlay** are the two with the biggest payoff for active-conflict scene change. Sector-driven K is essentially free. Negative cache is cheap insurance.
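
The two cheapest mitigations, sector-driven K and the negative cache, can be sketched in a few lines. The K values and the 0.3 inlier-ratio threshold come from the table above; the `NegativeCache` class and the count of three rejecting flights are assumptions, since the table says only "repeatedly across multiple flights".

```python
from collections import defaultdict

TOP_K_BY_SECTOR = {"stable": 5, "active": 20}
EXPANDING_WINDOW_K = 50
INLIER_RATIO_REJECT = 0.3
REJECTING_FLIGHTS_TO_MARK = 3  # assumed threshold, not given in the source


def top_k(sector_class: str, expanding_window: bool = False) -> int:
    """Candidate-pool size for a sector; the expanding-window fallback overrides it."""
    if expanding_window:
        return EXPANDING_WINDOW_K
    return TOP_K_BY_SECTOR[sector_class]


class NegativeCache:
    """Tracks tiles the matcher keeps rejecting across distinct flights."""

    def __init__(self) -> None:
        self._rejecting_flights = defaultdict(set)

    def record(self, tile_key, flight_id: str, inlier_ratio: float) -> None:
        # Only low-inlier-ratio rejections count; each flight counts once per tile.
        if inlier_ratio < INLIER_RATIO_REJECT:
            self._rejecting_flights[tile_key].add(flight_id)

    def is_stale_destroyed(self, tile_key) -> bool:
        flights = self._rejecting_flights.get(tile_key, set())
        return len(flights) >= REJECTING_FLIGHTS_TO_MARK


cache = NegativeCache()
tile = (20, 612345, 354321)
for flight in ("f1", "f1", "f2", "f3"):
    cache.record(tile, flight, inlier_ratio=0.1)
cache.record((20, 1, 1), "f1", inlier_ratio=0.8)  # a good match is ignored
```

Keeping flight identifiers in a set means repeated rejections within one flight do not push a tile over the threshold on their own.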
#### Stale-tile / cloud robustness
- **Stale tiles** in stable sectors (seasonal mismatch only): bench-off includes AerialExtreMatch (S49, structured-difficulty). 2chADCNN (S50) is the season-robustness ceiling reference. Production-side mitigation: top-K is **dynamically sized** by sector + σ_xy (see table above). Stale-tile false-negatives are absorbed by larger K + matcher-driven verification.
- **Cloudy stored tile**: deprioritised at retrieval time via the `glare/cloud` sidecar flag (Component 1).
- **Cloudy live frame**: not VPR-specific — the matcher and orthorectifier also fail. System falls back to VO/IMU dead-reckoning. **F-T16** (new test) synthesises cloud-occlusion injection on AerialVL frames to characterise the recovery profile (see Testing Strategy).
---
### Component 3: Cross-View Matching & PnP
> **⚠ Deep-research item.** Highest-leverage decision in the system. Mode B updates the candidate list.
**Bench-off candidates (revised vs Mode A):**
| Candidate | Status vs Mode A | Rationale |
| -------------------------------------- | ------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| **SuperPoint + LightGlue (TRT, FP16)** | **Lead candidate** (unchanged) | Well-trodden TRT path, ~286 FPS on RTX 3080 @ 320×240 baseline (F-B1) |
| **GIM-LightGlue** *(new — M-3)*        | **Bench-off candidate, peer of SP+LG**            | 8.4–18.1 % zero-shot improvement over LightGlue baseline (S48). Same TRT path.                                         |
| **XFeat (sparse + semi-dense)** | **Bench-off candidates** (unchanged) | Embedded-class throughput; CPU-viable as failover (S08) |
| **MASt3R** | **DROPPED from primary list** (was stretch — M-2) | mast3r-runtime Jetson support "Planned"; ~3 s/pair on Orin Nano Super extrapolated. Research-track-only. |
| **GIM-RoMa / RoMa** | Bench-off candidate | Best Map-free / aerial cross-view in 2024 papers; needs distillation work |
| **GIM-DKM** | Bench-off candidate | Same as RoMa — bench-off only if SP+LG variants fall short |
| **2chADCNN** *(new — M-14)* | **Season-aware reference** | UAV↔satellite season-aware template-matching (S50). Either bench-off candidate or season-robustness ceiling reference. |
| **Classical (SIFT/ORB/AKAZE)** | Last-resort degraded mode | Cross-view domain gap kills these (F-A5) |
**Bench-off targets (revised):**
1. **AerialVL** — primary public benchmark (S03), 70 km of fixed-wing trajectories.
2. **UAV-VisLoc** — accuracy regression at 405–840 m (S01).
3. **AerialExtreMatch** *(new — M-14)* — 1.5 M synthetic pairs, 32 difficulty levels (overlap × scale × pitch). Direct grading of failure-mode axes.
4. **2chADCNN season set** — season-aware **ceiling reference number only** (M-21); not a candidate matcher.
5. **TartanAir V2** *(confirmed — M-13, Q4=A)* — early-stage synthetic baseline; sweeps seasons / lighting / pitches before the first internal fixed-wing flight lands.
6. **Internal Mavic flight footage** — closest to deployment domain.
7. **First internal fixed-wing flight** (FT-3) — lands before AC-1.3 lock per M-15.
**Score on:** AC-1.1 / AC-1.2 / AC-2.2 / p95 latency on Orin Nano Super 25 W / sustained 30-min thermal stability / peak GPU memory / **plus seasonal-robustness score from the 2chADCNN-axis tests**.
**PnP & projection:** Unchanged from Mode A, except output schema adds `parent_pose_sigma_xy` for downstream Component-1b dedup gate.
**Input downsampling:** Empirical pin during research pass. Latency budget is more comfortable than Mode A assumed (M-5 / S40), so 1024×768 is a low-risk starting point for SP+LG / GIM-LG; 1024×768 semi-dense or 640×480 sparse for XFeat.
---
### Component 4: Visual Odometry
Unchanged from Mode A (custom 2-frame VO via SuperPoint+LightGlue / GIM-LightGlue homography). New risk **R8** (M-15) added: AC-1.3 drift budget needs validation against AerialVL's fixed-wing trajectories before lock — *not* against Mavic-class footage.
---
### Component 5: IMU + Visual EKF Fusion
**Working choice (revised from Mode A):** Onboard loosely-coupled EKF in our process emits **two parallel MAVLink streams**:
1. **GPS_INPUT** (primary, GPS-substitute framing per AC-4.3) with `h_acc`/`v_acc` derived from EKF covariance.
2. **ODOMETRY** (auxiliary, when full 6-DoF covariance is available and quality > VISO_QUAL_MIN) with the full 21-element pos+att covariance (M-1).
ArduPilot is configured with `EK3_SRC1_*` set to GPS-with-fallback-to-ExternalNav so GPS_INPUT remains the failover path. Mode/source labels (`satellite_anchored / vo_extrapolated / dead_reckoned`) are emitted on both channels.
**Key tuning:** the EKF's Mahalanobis gate and process-noise covariances are calibrated against AC-NEW-4 budget through Monte Carlo (which now also includes M-9 cache-poisoning injection — see "Testing Strategy").
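
For readers unfamiliar with the Mahalanobis gate, a minimal 2-D sketch is given below. It assumes a horizontal-position residual with a 2×2 innovation covariance and a 99 % chi-square threshold for 2 degrees of freedom (≈ 9.21); the source does not state the gate value, so treat the threshold and the function names as placeholders to be replaced by the calibrated ones.

```python
from typing import Sequence, Tuple

CHI2_2DOF_99 = 9.21  # assumed gate: 99 % quantile of chi-square with 2 DoF


def mahalanobis_sq_2d(residual: Tuple[float, float],
                      cov: Sequence[Sequence[float]]) -> float:
    """Squared Mahalanobis distance r^T S^-1 r for a 2x2 covariance S."""
    rx, ry = residual
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    # S^-1 = (1/det) * [[d, -b], [-c, a]]
    return (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det


def accept_fix(residual: Tuple[float, float],
               innovation_cov: Sequence[Sequence[float]],
               gate: float = CHI2_2DOF_99) -> bool:
    """Accept a visual fix only if its residual falls inside the gate."""
    return mahalanobis_sq_2d(residual, innovation_cov) <= gate


sigma_5m = [[25.0, 0.0], [0.0, 25.0]]      # 5 m standard deviation per axis
near = accept_fix((10.0, 0.0), sigma_5m)   # d^2 = 4.0, inside the gate
far = accept_fix((20.0, 0.0), sigma_5m)    # d^2 = 16.0, outside the gate
```

The gate only compares the residual with the covariance it is given, so an over-confident (deflated) covariance changes which fixes pass; the Monte Carlo calibration exists to quantify that effect.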
---
### Component 6: MAVLink Integration & Source Promotion
**Working choice (revised from Mode A):**
- **MAVSDK for telemetry** + **pymavlink for `GPS_INPUT` and `ODOMETRY` lines**.
- **No mavlink-router daemon** (M-6). Instead: distinct system-IDs for MAVSDK (sysid=10) and pymavlink (sysid=11), sharing the serial port via ArduPilot's native MAVLink routing (S35-class). Single endpoint configuration, no third-party C++ daemon, no #436-class CVE risk.
- **MAVLink2 signing mandatory** (M-7) on every companion↔FC link. Per-airframe key in FC FRAM; provisioning runbook is part of the deploy procedure.
**Source-promotion logic (revised per M-11):** unchanged behaviourally, but **F-T9 SITL test scope expanded** to include source-switching combinations across both GPS_INPUT-primary and ODOMETRY-primary modes. Pin ArduPilot to the version containing PR #30080.
**Spoofing-promotion latency budget:** <3 s (AC-NEW-2) — unchanged.
---
### Component 7: Failsafe, Health & Re-Localization
Unchanged from Mode A.
---
### Component 8: Object Localization (AI Camera)
Unchanged from Mode A (trig + airframe-attitude fusion via `ATTITUDE` MAVLink stream).
---
### Component 9: Software Platform & Process Topology
**Working choice (revised rationale per M-10):**
- **Single Python process (asyncio) on CPython 3.11 or 3.12** (well-supported by JetPack / numba / TRT bindings).
- **TRT inference workers as subprocesses**, tensor handoff via CUDA IPC (Jetson unified-memory aware: zero-copy possible since CPU and GPU share the LPDDR5 pool).
- **numba JIT** for EKF math hot paths.
- Configuration via YAML; logging via structured JSON to FDR.
- **Free-threaded Python (3.13+) is v1.1 territory.** Reason: experimental, single-threaded perf hit, GIL re-enables on import of any non-FT-aware C extension (S55). Revisit when NumPy/SciPy/numba/TRT bindings are FT-aware.
---
### Component 10: Flight Data Recorder
Unchanged from Mode A, except the per-sector `terrain_class` and `trust_level` flags are recorded alongside the position-estimate stream so post-mission analysis can filter on them.
---
### Component 11: Confidence Score (cross-cutting)
**Computed** from: RANSAC inlier ratio, reprojection error variance, top-K retrieval similarity gap, EKF covariance, **plus** parent-pose σ_xy gate result (M-9 hard gate).
**Emitted on:**
1. GPS_INPUT (`h_acc`).
2. ODOMETRY (full 21-element covariance + `quality` field 0–100).
3. NAMED_VALUE_FLOAT "CONF_M" on the GCS link.
4. Per-tile sidecar (`parent_pose_sigma_xy`) for Component-1b dedup gate.
---
## Testing Strategy
### Functional / Integration
- **F-T1** Tile cache load/lookup (unchanged).
- **F-T1b** *(new — M-15)* AC-1.3 drift regression against **AerialVL's fixed-wing trajectories** (70 km of real flight). Pass = drift ≤100 m VO-only / ≤50 m IMU-fused between satellite anchors at 95th percentile.
- **F-T2** Tile generation + dedup *(extended — M-9)*: replay a recorded flight; assert (a) for any ground sector covered ≥2× by the nav cam, exactly **one** tile is written; (b) the chosen tile has `parent_pose_sigma_xy` ≤ the hard gate; (c) **service tiles are never overwritten when within freshness budget**, regardless of our quality score.
- **F-T3** Tile uploader → candidate pool *(extended — M-9)*: post-flight, the diff against the Service candidate pool is correct; freshness + trust_level metadata round-trips; 2-flight voting promotes candidates to basemap only after 2nd-flight confirmation.
- **F-T4** End-to-end against **AerialVL** (S03).
- **F-T5** End-to-end against **UAV-VisLoc** (S01).
- **F-T5b** *(new — M-14)* End-to-end against **AerialExtreMatch** (S49) — structured-difficulty regression. For each of 32 difficulty levels, log AC-1.1 / AC-1.2 pass/fail.
- **F-T5c** *(new — M-14)* Season-robustness regression against **2chADCNN season set** (S50) — assert AC-1.1 holds across summer↔winter pairs.
- **F-T6** End-to-end against **internal Mavic flight footage** — deployment-domain proxy.
- **F-T7** Sharp-turn handling (unchanged).
- **F-T8** Disconnected-segment re-localization (unchanged).
- **F-T9** ArduPilot SITL: full MAVLink loop *(extended — M-11)*. Test matrix:
- GPS_INPUT-only mode (Mode A baseline).
- GPS_INPUT + ODOMETRY hybrid mode.
- Source switching: jam-onset → our channel; spoofed-real-GPS recovery → operator-confirmed source-restore.
- `EK3_SRC1_*` parameter combinations across both modes.
- **MAVLink2 signing on**: assert injection refused on signing failure; assert acceptance on valid signing.
- **F-T10** Operator re-loc workflow via QGC `STATUSTEXT`.
- **F-T11** Cold-start TTFF <30 s (AC-NEW-1).
- **F-T12** Spoofing-promotion <3 s (AC-NEW-2).
- **F-T13** Object localization with airframe-attitude fusion (unchanged).
- **F-T14** *(new — M-12)* Per-sector DEM classification: load SRTM-30 m for the operational area; assert sector classes (`flat`, `moderate`, `rugged`) line up with ground-truth DEM amplitudes; assert ortho-tile generation is skipped in `rugged` sectors.
- **F-T15** *(new — M-16/17/18 cluster)* **VPR retrieval-unit bench**: build the chunk-based FAISS index over a 400 km² synthetic operational area; assert that for any ground point, ≥1 chunk fully contains the expected frame footprint (overlap correctness). Bench top-K recall at K = {3, 5, 10, 50} for steady-state, re-loc, and expanding-window modes against AerialVL + AerialExtreMatch + 2chADCNN season set.
- **F-T16** *(new — concern #3 cloud robustness)* Synthetic cloud-occlusion injection: inject 0–60 % cloud cover on AerialVL frames (and on cached basemap tiles independently); assert the system gracefully degrades (top-K expansion → matcher failure → VO/IMU fallback) rather than emitting a confident bad fix.
- **F-T17** *(new — M-17 invocation policy)* Mission replay assertion: in a typical mission replay (steady cruise + 1 sharp turn + 1 simulated reboot), measure the % of cycles VPR is invoked. Pass criterion: ≥80 % of steady-state cycles use the geometric-prior path; 100 % of re-loc-trigger cycles invoke VPR.
### Non-Functional
- **NF-T1** Latency p95 <400 ms on Orin Nano Super 25 W (AC-4.1).
- **NF-T2** Memory <8 GB shared (AC-4.2).
- **NF-T3** Thermal: 8 h sustained 25 W (AC-NEW-5).
- **NF-T4** *(extended — M-9)* False-position safety budget (AC-NEW-4) — Monte Carlo over AerialVL + Mavic + AerialExtreMatch with **synthetic over-confidence injection**: artificially deflate EKF covariance by 1.5×–3× and verify the EKF's Mahalanobis gate still rejects the bad fix and the cache-poisoning hazard does not trigger.
- **NF-T4b** *(new — M-9)* **AC-NEW-7 cache-poisoning safety budget** validation — Monte Carlo over multi-flight replays: assert P(onboard tile mis-aligned > 30 m) per flight < 1 %; P(>100 m) per flight < 0.1 %.
- **NF-T5** Storage: 64 GB FDR cap with rollover.
- **NF-T6** Imagery freshness gate (AC-NEW-6).
### Security
- **S-T1** GPS_INPUT + ODOMETRY not accepted from any non-whitelisted MAVLink source-system-id.
- **S-T2** Tile cache encrypted at rest.
- **S-T3** *(promoted to v1-mandatory — M-7)* MAVLink2 signing between companion and FC. Verified at boot. Refuse to inject GPS_INPUT/ODOMETRY if FC reports signing-off on our link.
- **S-T4** *(new — M-6)* No `mavlink-router` binary on the deployed companion image. The CI image-build step verifies absence.
- **S-T5** *(new — M-6)* MAVLink endpoint multiplexing via distinct system-IDs is exercised in CI integration tests.
### Field
- **FT-1** Flight-data-recorder review of ≥5 real-world test flights at progressive altitudes (200 m → 1 km AGL).
- **FT-2** Single 8-hour sortie endurance / thermal soak.
- **FT-3** *(new — M-15)* **First internal fixed-wing flight at 1 km AGL** before AC-1.3 lock. Synced IMU + GPS truth + nav-cam stream collected; replayed through the pipeline.
---
## Key Risks & Open Items (carried into Plan step)
| ID | Risk | Severity | Mitigation |
| --------------------------- | --------------------------------------------------------------------------------- | ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| R1 | Imagery licensing lead time (Service-side concern) | Med (was High; now upstream) | Suite Service procurement; not on this build's critical path |
| R2 | Latency budget on Orin Nano Super at 1024×768 | **Med (was High — M-5)** | DINOv2-base measured at ~8 ms/inf at 224×224 (S40); empirical bench-off in week 1 of impl |
| R3 | Cross-view accuracy at 1 km AGL with Ukrainian seasonal change | Med | 50 %@20m hard floor; bench-off now includes SALAD/BoQ/GIM-LightGlue/2chADCNN before lock (M-3, M-4, M-14) |
| R4 | MAVSDK + pymavlink coexistence (resolved: distinct system-IDs, no router, M-6) | **Resolved** | — |
| R5 | Thermal at 25 W for 8 h | Med | Cooling validation in NF-T3 |
| R6 | AC-7.1 in turning flight (gimbal-only pose) | Low (scoped to level flight in v1) | Add airframe-attitude fusion in v1.1 |
| R7 | Public dataset gap (V&V) | Med | AerialVL primary; AerialExtreMatch + 2chADCNN added (M-14); internal Mavic for deployment proxy; first fixed-wing flight scheduled before AC-1.3 lock (M-15) |
| **R8** *(new — M-15)* | **Fixed-wing VO drift under AC-1.3 budget unconfirmed** | Med | F-T1b regression on AerialVL's fixed-wing trajectories; FT-3 first internal fixed-wing flight |
| **R9** *(new — M-9)* | **Cross-flight cache poisoning via onboard tile overwrite of stale service tile** | High (safety) | Service-tile immutability inside freshness budget; 2-flight voting at Service ingest; parent-pose σ_xy hard gate; AC-NEW-7 numeric budget |
| **R10** *(new — M-7 / M-6)* | **Companion↔FC link is a flight-critical attack surface** | High (security) | MAVLink2 signing v1-mandatory; mavlink-router replaced by native MAVLink routing with distinct system-IDs |
| **R11** *(new — M-11)* | **ArduPilot external-nav source-switching has known production gotchas** | Med | F-T9 SITL test matrix; pin ArduPilot version containing PR #30080 |
| **R12** *(new — M-12)* | **Eastern-Ukraine relief amplitude breaks flat-Earth assumption near frame edge** | Med | Pre-flight SRTM-30 m DEM lookup; per-sector terrain class; runtime self-classifier |
## Proposed AC additions
**AC-NEW-7 — Cache-poisoning safety budget** *(new — M-9)*
- P(onboard tile geo-misaligned > **30 m**) per flight **<1 %**.
- P(onboard tile geo-misaligned > **100 m**) per flight **<0.1 %**.
**Why it matters.** Cross-flight error compounding. Validated by **NF-T4b**.
**Implementation drivers.** Service-tile immutability within freshness budget; 2-flight voting at Service ingest; parent-pose σ_xy hard gate (≤5 m for hard write, ≤3 m for `trust_level = candidate`).
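
**Sizing the Monte Carlo (sketch).** The number of replayed flights needed to support these bounds can be estimated from a zero-failure binomial argument. The sketch assumes independent flights, no observed exceedance, and a 95 % confidence level; none of these choices is stated in the source, and `min_clean_flights` is an illustrative name.

```python
import math


def min_clean_flights(p_max: float, confidence: float = 0.95) -> int:
    """Smallest n with (1 - p_max)**n <= 1 - confidence.

    If n independent flights show zero exceedances, the per-flight
    exceedance probability is below p_max at the given confidence.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_max))


flights_for_30m = min_clean_flights(0.01)    # bound: P(> 30 m) < 1 %
flights_for_100m = min_clean_flights(0.001)  # bound: P(> 100 m) < 0.1 %
```

Under these assumptions the 1 % bound needs 299 exceedance-free flights and the 0.1 % bound needs 2 995, which indicates the scale of multi-flight replay that NF-T4b has to cover.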
---
## Open Research (deferred to dedicated research passes before Plan)
| Topic | Why now | Output | Owner |
| -------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------- |
| **Cross-view matcher bench-off** (Component 3) — *expanded list per M-2/M-3/M-14* | Highest-leverage decision; expanded shortlist requires explicit empirical comparison | Selected matcher + chosen input resolution + measured latency / accuracy / memory + season-robustness score | Research skill, follow-up Mode A pass scoped to "matcher selection" |
| **Input-resolution sweep** | Coupled with the matcher bench (latency budget more comfortable than Mode A assumed — M-5 → bigger sweep range possible) | Resolution per matcher candidate; sensitivity curves for AC-1.1 / AC-1.2 vs resolution | Same pass |
| **VPR backbone bench-off** (Component 2) — *expanded list per M-4* (AnyLoc + SALAD + BoQ + MixVPR) | Cheaper than the matcher decision but feeds it | Selected VPR backbone + measured Recall@K on AerialVL + AerialExtreMatch + Mavic | Same pass |
| **Tile-generator quality scoring** (Component 1b) | Need empirically-grounded thresholds for σ_xy (M-9), sharpness, glare | Calibrated thresholds | Implementation phase |
| **Internal Mavic-flight V&V dataset** | Closest proxy to deployment domain | Curated, ground-truth-labelled clips | Operations / data team |
| **First internal fixed-wing flight** *(new priority — M-15)* | AC-1.3 drift budget unconfirmed | Recorded sortie with synced IMU + GPS truth + nav-cam stream | Field-test plan; **before AC lock**, not stretch |
| **Encryption-at-rest key management** | Tile cache + FDR are operationally sensitive | Threat-modelled design | Phase 4 security analysis |
| ~~*(Open question — M-13)*~~ TartanAir V2 as early-stage synthetic baseline | **Confirmed yes (Q4 = A, 2026-04-26)** | Folded into bench-off plan | — |
---
## References
All citations are by ID from `_docs/00_research/01_source_registry.md`. Mode B sources: **S40–S57**.
- **Latency / hardware**: S40 (Jetson AI Lab L1).
- **MAVLink integration**: S41–S44 (ArduPilot dev docs L1, ODOMETRY PR #19563, External-nav fix PR #30080, MAVLink2 signing).
- **Security**: S44 (signing), S45 (mavlink-router CVE class).
- **VPR SOTA 2024**: S46 (BoQ), S47 (SALAD).
- **Matcher SOTA 2024**: S48 (GIM), S57 (MASt3R Jetson status).
- **Datasets**: S49 (AerialExtreMatch), S50 (2chADCNN), S51 (TartanAir V2), S52 (AFIT fixed-wing VO), S53 (high-altitude VIO).
- **Tile cache**: S54 (MBTiles WAL recipe).
- **Python topology**: S55 (free-threaded Python 3.13).
- **Terrain**: S56 (eastern-Ukraine relief).
---
## Related Artifacts
- Mode A draft (superseded by this draft): `_docs/01_solution/solution_draft01.md`.
- Mode B decomposition: `_docs/00_research/03_mode_b_decomposition.md`.
- Mode B reasoning chain: `_docs/00_research/04_reasoning_chain_mode_b.md`.
- Mode B validation log: `_docs/00_research/05_validation_log_mode_b.md`.
- AC & Restrictions assessment (Phase 1): `_docs/00_research/00_ac_assessment.md`.
- Source registry: `_docs/00_research/01_source_registry.md` (S01–S57).
- Fact cards: `_docs/00_research/02_fact_cards.md` (Phase 1 + Mode B M-1..M-15).
- Tech stack consolidation: `_docs/01_solution/tech_stack.md` (deferred — Phase 3 optional).
- Security analysis: `_docs/01_solution/security_analysis.md` (deferred — Phase 4 optional, but **promoted to recommended-before-Plan-lock** because of M-6/M-7 promotion).
@@ -0,0 +1,76 @@
# Test-Spec Phase 1 Findings (intermediate, not a final artifact)
> Generated 2026-04-26 during Plan Step 1 (test-spec/SKILL.md, Phase 1).
> This is a working note — Phase 2 reads it to carry forward findings + the user's locked-in decisions.
> Phase 2 produces the 8 final artifacts under `_docs/02_document/tests/`.
## Inputs surveyed
- `_docs/00_problem/problem.md` — short problem statement; flags missing IMU.
- `_docs/00_problem/acceptance_criteria.md` — 46 ACs (37 numbered + 9 NEW).
- `_docs/00_problem/restrictions.md` — UAV/flight/camera/satellite/HW/sensors/failsafe.
- `_docs/01_solution/solution.md` (renamed from solution_draft03 by Plan Prereq 2) — 11 components + testing strategy F-T1…F-T19, NF-T1…NF-T6, S-T1…S-T5, FT-1…FT-3.
- `_docs/00_problem/input_data/`:
- 60 nav-cam JPGs `AD000001.jpg`–`AD000060.jpg`
- `coordinates.csv` (frame → GPS truth)
- 2 `_gmaps.png` thumbnails (frames 1–2 only)
- `data_parameters.md` (corpus-shoot params)
- `expected_results/results_report.md` (46 mapped scenarios) + `position_accuracy.csv`
## Quantifiability check
All 46 mapped scenarios in `results_report.md` have quantifiable expected results (no vague "works correctly" entries). Comparison methods used: `percentage`, `numeric_tolerance`, `threshold_max`, `threshold_min`, `exact`, `regex`, `range`, `file_reference`. Acceptable.
## Coverage of the 46 ACs by `results_report.md`
- **Fully covered (~18 / 46 ≈ 39%)**: AC-1.1, AC-1.3 (mono only), AC-2.1, AC-2.2 (VO half), AC-3.1, AC-3.2, AC-3.4, AC-4.1, AC-4.2, AC-5.1, AC-5.3, AC-6.2, AC-7.1, AC-7.2, AC-NEW-5 (junction temp slice), AC-NEW-8 (mono only), API ACs (rows 30–33), TRT validation rows 42–44.
- **Partially covered (~10)**: AC-1.2, AC-2.2 cross-view <2.5 px, AC-5.2 timing, AC-6.1, AC-NEW-1, AC-NEW-5 hot/cold soak.
- **Not covered (~18)**: AC-1.4, AC-3.3, AC-4.3 ODOMETRY (intentionally per v1 — see clause), AC-4.4, AC-4.5, AC-6.3, AC-8.1–AC-8.6, AC-NEW-2, AC-NEW-3, AC-NEW-4, AC-NEW-6, AC-NEW-7, AC-NEW-9.
Headline ≈ 39% direct + 22% partial = **~61%** against the 46-AC denominator. Below the 75% threshold *only* when the 60-image slice is treated as the sole corpus. The solution's testing strategy explicitly delegates the missing slice to bench-off corpora named in solution.md (AerialVL, UAV-VisLoc, AerialExtreMatch, 2chADCNN, TartanAir V2, internal Mavic, first internal fixed-wing). Per user decision #4 below, Phase 2 will spec tests for all 46 ACs and mark unfulfilled-data ACs with `data_status: deferred-corpus` in `traceability-matrix.md`.
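
The headline arithmetic can be reproduced from the approximate counts above. This is a small check rather than part of the test spec; the variable names are illustrative and the counts are the "~18 / ~10 / ~18" figures from the coverage list.

```python
TOTAL_ACS = 46
FULLY_COVERED = 18
PARTIALLY_COVERED = 10
NOT_COVERED = 18
THRESHOLD_PCT = 75

# The three buckets should account for every AC exactly once.
buckets_sum = FULLY_COVERED + PARTIALLY_COVERED + NOT_COVERED

full_pct = round(100 * FULLY_COVERED / TOTAL_ACS)          # 39
partial_pct = round(100 * PARTIALLY_COVERED / TOTAL_ACS)   # 22
headline_pct = round(100 * (FULLY_COVERED + PARTIALLY_COVERED) / TOTAL_ACS)  # 61
below_threshold = headline_pct < THRESHOLD_PCT
```

The buckets sum to 46, and the combined figure of 61 % sits 14 points under the 75 % threshold when the 60-image slice is the only corpus counted.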
## Stale-doc fixes already applied (per user decision #1, option A)
Edits made during Phase 1 — Phase 2 reads these as baseline truth:
| File | Row / AC | Change |
|------|----------|--------|
| `results_report.md` | row 2 | "≥60% within 20m" → "≥50% within 20m" (aligns with AC-1.2). |
| `results_report.md` | row 19 | "ESKF position corrected" → "Component 5 calibrator emits a satellite-anchored fix, FC EKF3 reconverges". |
| `results_report.md` | row 22 | "uses hint as ESKF measurement" → "uses hint as a high-covariance (~500m) seed for VPR/cross-view re-localization (consumed by Component 5 calibrator)". |
| `results_report.md` | row 23 | "GPS_INPUT output begins within 60s of boot" → "within 30s of boot (95th percentile)" (aligns with AC-NEW-1). |
| `results_report.md` | row 25 | "inits ESKF with high uncertainty" → "re-initialises Component 5 calibrator state with high uncertainty"; recovery time "≤70s" → "≤30s". |
| `results_report.md` | row 38 | "LiteSAM/XFeat ≤330ms inline" → "SP+LG (TRT FP16/INT8) inline ≤200ms; LiteSAM re-loc fallback ≤2000ms". |
| `acceptance_criteria.md` | AC-4.3 | added v1-scope clause: ODOMETRY emission disabled in v1 (per solution_draft03 finding M-30, EKF3 issues #30076/#32506); `EK3_SRC1_*=GPS+Compass`; tests assert ODOMETRY is intentionally absent on the wire in v1; ODOMETRY re-enabled in v1.1 once F-T9 SITL passes. |
## Locked-in user decisions (carry into Phase 2)
| ID | Decision | Phase-2 implication |
|----|----------|---------------------|
| D1 | Apply 4 stale-doc fixes inline (done above). | Phase 2 reads `results_report.md` v2 (post-edit) and the new AC-4.3 clause as authoritative. |
| D2 | Camera/altitude mismatch: 60-image slice is **pipeline-correctness corpus only** — does NOT validate GSD-band assumptions, latency budgets, or matcher resolution sweeps for the deployed 1km AGL / 20MP path. | `tests/test-data.md` MUST state: corpus shot at 400m AGL with ADTi 26S v2 (26MP, 6252×4168, 25mm, 23.5mm sensor). Tests scoped to "pipeline correctness" only. AC-1.1/AC-1.2/AC-2.1/AC-2.2/AC-NEW-8 acceptance numbers from this slice are pipeline-functional, not deployment-binding. Deployment-binding tests reference AerialVL S03 (1km AGL fixed-wing). |
| D3 | Missing satellite tiles + IMU: spec tests with **placeholder fixtures** referenced by name even though files don't yet exist. | `tests/test-data.md` declares: (a) `fixtures/satellite_tiles_AD0000xx_z20/` — z=20 ortho tiles for the bbox of `coordinates.csv`, fetched by an implementer-written script (Esri / public ortho); (b) `fixtures/imu_AD0000xx.csv` — IMU traces from SITL ArduPilot replay of `coordinates.csv` as ground-truth trajectory at 200 Hz; for AC-1.3 / AC-NEW-8 fixed-wing tests use **AerialVL S03 IMU** as the fixed-wing reference. Phase 3 hard gate will surface these as "pending data", not "remove". |
| D4 | AC-coverage gap: Phase 2 specs tests for **all 46 ACs**; deferred-data ACs get `data_status: deferred-corpus` in `traceability-matrix.md` listing the named external corpus. | `traceability-matrix.md` columns: AC-id, Test-id, Test-file, data_status (∈ {present, deferred-corpus, deferred-sitl, deferred-hil}). Rows pointing at AerialVL S03, UAV-VisLoc, AerialExtreMatch, TartanAir V2, internal Mavic, first internal fixed-wing flight, hot/cold soak chamber, multi-flight Monte Carlo, and SITL ArduPilot are emitted with the appropriate `deferred-*` token. |
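One way to keep the `data_status` tokens from D4 consistent is to validate each matrix row before it is written out. The following is a minimal sketch, assuming the four columns named in D4; the `MatrixRow` class, the helper functions, and the test ids (`BB-001`, `RL-001`) are hypothetical and only illustrate the shape.

```python
from dataclasses import dataclass

# Allowed tokens for the data_status column, as listed in D4.
DATA_STATUS_TOKENS = {"present", "deferred-corpus", "deferred-sitl", "deferred-hil"}


@dataclass(frozen=True)
class MatrixRow:
    """One row of traceability-matrix.md (hypothetical in-memory form)."""
    ac_id: str
    test_id: str
    test_file: str
    data_status: str

    def __post_init__(self):
        # Reject any token outside the set declared in D4.
        if self.data_status not in DATA_STATUS_TOKENS:
            raise ValueError(f"{self.ac_id}: unknown data_status {self.data_status!r}")


def to_markdown(rows):
    """Render rows as a markdown table with the D4 column order."""
    lines = [
        "| AC-id | Test-id | Test-file | data_status |",
        "|-------|---------|-----------|-------------|",
    ]
    lines += [
        f"| {r.ac_id} | {r.test_id} | {r.test_file} | {r.data_status} |" for r in rows
    ]
    return "\n".join(lines)


def acs_without_tests(all_ac_ids, rows):
    """Return AC ids that have no row in the matrix yet."""
    return sorted(set(all_ac_ids) - {r.ac_id for r in rows})


# Illustrative rows; the test ids are placeholders, not real identifiers.
rows = [
    MatrixRow("AC-1.1", "BB-001", "blackbox-tests.md", "present"),
    MatrixRow("AC-NEW-5", "RL-001", "resource-limit-tests.md", "deferred-hil"),
]
print(to_markdown(rows))
print("missing:", acs_without_tests(["AC-1.1", "AC-1.4", "AC-NEW-5"], rows))
```

A check like `acs_without_tests` is one possible way to confirm that every AC has at least one row, including those whose data is deferred.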
## Open contradictions still standing (NOT auto-fixed)
None for Phase 2 entry. AC-4.3 dual-channel framing was the only remaining one and it was resolved by the v1-scope clause (D1) — AC text intact, v1 implementation scoped to GPS_INPUT only.
## Known data dependencies for Phase 2 to spec around
| Dependency | Status | Phase-2 treatment |
|------------|--------|-------------------|
| z=20 satellite tiles for the `coordinates.csv` bbox | Missing | Fixture name declared in `test-data.md`, script TODO (implementer task in Plan Step 3 / Decompose). |
| IMU traces synced to `coordinates.csv` frames | Missing | SITL replay declared as fixture; AerialVL S03 used for fixed-wing AC-1.3 / AC-NEW-8. |
| AerialVL S03 / UAV-VisLoc / AerialExtreMatch / 2chADCNN / TartanAir V2 | External, not yet downloaded | `data_status: deferred-corpus` in matrix; Decompose creates a "dataset acquisition" task. |
| First internal fixed-wing flight footage | Pending field-test plan | `data_status: deferred-corpus`. |
| SITL ArduPilot environment (PR #30080 pinned version) | Not yet provisioned | `data_status: deferred-sitl`. |
| Hot/cold soak chamber (AC-NEW-5) | Bench equipment | `data_status: deferred-hil`. |
| 8-h synthetic load fixture (AC-NEW-3 FDR) | Synthesizable | Declared as fixture, generated at impl time. |
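The "pending data, not remove" treatment of missing fixtures could be checked with a small presence report. This is a sketch under assumptions: the fixture paths are the placeholder names declared in D3 (the `xx` is left as written), while `fixture_report`, the temporary directory layout, and the CSV header are hypothetical.

```python
import tempfile
from pathlib import Path

# Placeholder fixture names from D3; the "xx" suffix is kept as declared.
DECLARED_FIXTURES = (
    "fixtures/satellite_tiles_AD0000xx_z20",
    "fixtures/imu_AD0000xx.csv",
)


def fixture_report(root, declared=DECLARED_FIXTURES):
    """Map each declared fixture to 'present' or 'pending data'."""
    root = Path(root)
    return {
        rel: "present" if (root / rel).exists() else "pending data"
        for rel in declared
    }


# Demo in a throwaway directory: only the IMU trace exists, the tiles do not.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "fixtures").mkdir()
    # Hypothetical header; the real IMU column layout is not specified here.
    (base / "fixtures" / "imu_AD0000xx.csv").write_text("t,ax,ay,az\n")
    report = fixture_report(base)

for name, status in report.items():
    print(f"{name}: {status}")
```

A missing fixture is reported rather than raising, so a test spec that references it can stay in place until the data arrives.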
## Phase 2 entry checklist (READY)
- [x] Phase 1 BLOCKING gate cleared (user confirmed coverage decisions).
- [x] Stale-doc fixes applied (D1).
- [x] Findings preserved here for resume in a fresh conversation.
- [ ] Phase 2 will read this file first, then read solution.md / AC / restrictions / results_report.md as needed for each artifact.
+14
@@ -0,0 +1,14 @@
# Autodev State
## Current Step
flow: greenfield
step: 3
name: Plan
status: in_progress
sub_step:
  phase: 2
  name: test-scenarios
  detail: "Plan Step 1 (test-spec) Phase 1 COMPLETE. User cleared the BLOCKING gate (all 4 questions = A): D1 fix stale docs inline; D2 60-image slice = pipeline-correctness corpus only; D3 spec with placeholder fixtures (satellite tiles + IMU); D4 spec all 46 ACs with data_status markers in traceability-matrix.md. Stale-doc fixes already applied to results_report.md rows 2/19/22/23/25/38 and AC-4.3 v1-scope clause added to acceptance_criteria.md. Findings + locked decisions saved to _docs/02_document/tests/_phase1_findings.md. NEXT on resume: Phase 2 (test-spec/phases/02-test-scenarios.md) to generate 8 artifacts under _docs/02_document/tests/ (environment.md, test-data.md, blackbox-tests.md, performance-tests.md, resilience-tests.md, security-tests.md, resource-limit-tests.md, traceability-matrix.md). Recommended fresh conversation due to context-budget caution zone."
retry_count: 0
cycle: 1
tracker: jira