gps-denied-onboard/_docs/00_research/05_validation_log_mode_b.md
Oleksandr Bezdieniezhnykh 9eba1689b3 - Introduced a new document detailing the current state of the autodev process, including steps, status, and findings.
- Revised acceptance criteria in the acceptance_criteria.md file to clarify metrics and expectations, including updates to GPS accuracy and image processing quality.
- Enhanced restrictions documentation to reflect operational parameters and constraints for UAV flights, including camera specifications and satellite imagery usage.
- Added new research documents for acceptance criteria assessment and question decomposition to support ongoing project evaluation and decision-making.
2026-04-26 14:28:10 +03:00

Validation Log — Mode B

Validation scenario

A typical 8-hour fixed-wing mission in eastern Ukraine, mid-summer, sunny. The UAV climbs to 1 km AGL on the way to the sector, transits ~50 km of corridor, performs ~1.5 h of dense coverage (sector-pattern), and returns. Mid-flight, the operator-side EW threat indicator reports a GPS-spoofing event. At minute 45 the companion computer browns out and reboots; at minute 90 the UAV passes over a 25-m-deep gully system; at minute 180 a sharp turn on weather avoidance reduces frame overlap to <5 % for two consecutive frames; at minute 310 the bench network drops out before tile upload finishes; at minute 470 the UAV lands.

For each of these waypoints, walk through what the system produces using the Mode B-revised draft vs. the Mode A draft.


Expected behaviour by waypoint

Cruise (steady state)

  • Mode A — emit GPS_INPUT only; covariance collapsed to scalar h_acc. EKF in companion does the fusion.
  • Mode B (revised) — emit GPS_INPUT (primary, GPS-substitute framing) and ODOMETRY (when full 6-DoF covariance is available and quality > VISO_QUAL_MIN). The FC's EKF3 gets a richer signal; the companion EKF remains the source of truth for source-label assignment.
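  • Counterexample check — what if ODOMETRY's covariance is wrong? VISO_QUAL_MIN gates ODOMETRY on the FC, but only by its reported quality, so the companion must lower the quality value whenever its covariance estimate is unreliable. The GPS_INPUT path stays valid as failover. Net: no regression vs. Mode A.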
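The cruise-time output rule can be sketched in a few lines. This is a minimal sketch under stated assumptions: the companion keeps a 2×2 horizontal position covariance, GPS_INPUT's scalar horiz_accuracy is taken as the 1-sigma semi-major axis of that error ellipse (the draft does not say how h_acc is derived), and the companion mirrors the FC-side VISO_QUAL_MIN check before adding ODOMETRY. `VioEstimate`, `horiz_accuracy_from_cov` and `messages_to_send` are hypothetical names; no MAVLink library is used.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class VioEstimate:
    """Hypothetical companion-side estimate used to build MAVLink output."""
    cov_xx: float  # horizontal position covariance, m^2
    cov_xy: float
    cov_yy: float
    pose_covariance: Optional[Sequence[float]]  # 21-element 6-DoF upper triangle, or None
    quality: int  # percent, as carried in the ODOMETRY quality field


def horiz_accuracy_from_cov(cov_xx: float, cov_xy: float, cov_yy: float) -> float:
    """Collapse a 2x2 covariance to one scalar (m): the 1-sigma semi-major axis.

    Largest eigenvalue of [[xx, xy], [xy, yy]] is
    (xx + yy)/2 + sqrt(((xx - yy)/2)^2 + xy^2).
    Assumption: the draft does not specify how h_acc is derived.
    """
    half_trace = 0.5 * (cov_xx + cov_yy)
    radius = math.sqrt((0.5 * (cov_xx - cov_yy)) ** 2 + cov_xy ** 2)
    return math.sqrt(half_trace + radius)


def messages_to_send(est: VioEstimate, viso_qual_min: int) -> List[str]:
    """GPS_INPUT always (primary path); ODOMETRY only with full covariance and
    quality above VISO_QUAL_MIN. The FC applies VISO_QUAL_MIN itself; mirroring
    it here is an assumption that avoids sending frames the FC would drop."""
    msgs = ["GPS_INPUT"]
    full_cov = est.pose_covariance is not None and len(est.pose_covariance) == 21
    if full_cov and est.quality > viso_qual_min:
        msgs.append("ODOMETRY")
    return msgs


est = VioEstimate(4.0, 0.0, 1.0, pose_covariance=[0.0] * 21, quality=80)
print(horiz_accuracy_from_cov(est.cov_xx, est.cov_xy, est.cov_yy))  # 2.0
print(messages_to_send(est, viso_qual_min=50))  # ['GPS_INPUT', 'ODOMETRY']
```

Using the largest eigenvalue rather than the average variance means an elongated error ellipse is never reported as tighter than its worst axis.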

Spoofing event (AC-NEW-2)

  • Mode A — listen to GPS_RAW_INT / EKF_STATUS_REPORT; promote our GPS_INPUT to fix_type=3D in <3 s.
  • Mode B (revised) — same, plus M-11 SITL coverage of the EK3_SRC1_* parameter switch path. Known bugs on this path (S43) become a hard test gate rather than an accepted risk.
  • Counterexample check — what if the source switch deadlocks because PR #30080's fix isn't in the ArduPilot version we ship? Mitigation: pin to the ArduPilot version that contains the merged PR; document in deploy runbook.
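Because M-11 turns the source-switch path into a hard test gate, the pass/fail check on one SITL run can be stated compactly. A minimal sketch, assuming the SITL harness logs timestamps for three hypothetical events (spoofing detected, first GPS_INPUT with fix_type=3, EKF3 source switch observed); the event names and log format are illustrative, not an existing ArduPilot or SITL interface.

```python
from typing import Dict

PROMOTION_BUDGET_S = 3.0  # AC-NEW-2: promote GPS_INPUT to fix_type=3 (3D) in < 3 s

# Hypothetical event names logged by the SITL harness (not an ArduPilot API).
REQUIRED_EVENTS = ("spoof_detected", "gps_input_3d", "ekf_source_switched")


def spoof_gate_passes(events: Dict[str, float], budget_s: float = PROMOTION_BUDGET_S) -> bool:
    """Pass/fail for one SITL spoofing run.

    events: event name -> SITL time in seconds. A missing event (for example a
    source switch that never completes) fails the gate instead of hanging it.
    """
    if any(name not in events for name in REQUIRED_EVENTS):
        return False
    t0 = events["spoof_detected"]
    promoted_in_time = 0.0 <= events["gps_input_3d"] - t0 < budget_s
    switched_after_detection = events["ekf_source_switched"] >= t0
    return promoted_in_time and switched_after_detection


print(spoof_gate_passes({"spoof_detected": 100.0, "gps_input_3d": 102.4,
                         "ekf_source_switched": 102.9}))  # True
print(spoof_gate_passes({"spoof_detected": 100.0, "gps_input_3d": 102.4}))  # False
```

Treating a missing switch event as a failure is what makes the PR #30080 deadlock scenario show up as a red test rather than a silent hang.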

Companion brown-out + reboot (AC-NEW-1, cold-start TTFF <30 s)

  • Mode A — TRT engines build at install time; CUDA / TRT init <5 s; cold-fix via VPR + matcher within remaining budget.
  • Mode B (revised) — same path, but the latency-budget headroom is much larger than the draft assumed (M-5: DINOv2 ViT-B = 8 ms/inference at 224×224 on Orin Nano Super). The cold-TTFF target moves from "tight" to "comfortable".
  • Counterexample check — what if the CPython 3.13 free-threading question pulls us into experimental territory? Mode B explicitly rejects free-threading for v1 (M-10), so JIT warm-up cost is that of numba on CPython 3.11/3.12, which is well characterised.
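The cold-start budget arithmetic can be made explicit. A minimal sketch in integer milliseconds: the 30 s TTFF target, the <5 s CUDA / TRT init bound and the 8 ms DINOv2 ViT-B inference time come from the text above; the companion boot time and the per-query matcher cost are hypothetical inputs, since neither is stated here.

```python
TTFF_BUDGET_MS = 30_000     # AC-NEW-1 cold-start TTFF target
CUDA_TRT_INIT_MS = 5_000    # upper bound for CUDA / TRT init (engines prebuilt at install)
VPR_INFER_MS = 8            # M-5: DINOv2 ViT-B at 224x224 on Orin Nano Super


def remaining_cold_fix_ms(boot_ms: int, other_stages_ms: int = 0) -> int:
    """Budget left for VPR + matcher after boot, CUDA/TRT init and other stages."""
    return TTFF_BUDGET_MS - boot_ms - CUDA_TRT_INIT_MS - other_stages_ms


def attempts_that_fit(budget_ms: int, matcher_ms: int) -> int:
    """How many (VPR inference + matcher) attempts fit; matcher_ms is hypothetical."""
    per_attempt_ms = VPR_INFER_MS + matcher_ms
    return max(0, budget_ms // per_attempt_ms)


left = remaining_cold_fix_ms(boot_ms=10_000)  # hypothetical companion boot time
print(left, attempts_that_fit(left, matcher_ms=92))  # 15000 150
```

Under those assumed inputs, 15 s remain after init, enough for 150 VPR + matcher attempts, and the VPR term is only 8 of the 100 ms spent per attempt.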

Rugged-terrain segment (M-12)

  • Mode A — flat-Earth assumption applied uniformly; tile generation runs even over the gully, so a 17 m horizontal misalignment at the frame edge becomes a "high-quality" tile that overwrites a stale Service tile. Cache-poisoning hazard (M-9).
  • Mode B (revised) — pre-flight DEM classifies this sector as "rugged" (>15 m amplitude); ortho-tile generation skipped in this sector; satellite anchor weight × 0.3 with rugged-sector flag in telemetry.
  • Counterexample check — what if the DEM is wrong / out-of-date? SRTM 30 m DEM has known artefacts in gully systems. Mitigation: also use the runtime self-classification — if the matcher's RANSAC inlier ratio drops below threshold for K consecutive frames, auto-promote the sector to "rugged" for the rest of the flight.
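The rugged-terrain handling combines three pieces: the size of the error, the pre-flight DEM rule and the runtime fallback. A minimal sketch under stated assumptions: relief displacement is taken from standard vertical-photo geometry as d = |Δh| · tan θ, where Δh is the terrain offset from the flat-Earth reference plane and θ the off-nadir angle of the ray; DEM "amplitude" is read as max minus min height within the sector; the inlier threshold and K are hypothetical tuning parameters, with one detector instance per sector. Under this formula the 17 m figure for a 25 m gully corresponds to tan θ = 0.68 (θ ≈ 34°) at the frame edge; that angle is back-solved here, not a documented camera parameter.

```python
import math
from typing import Dict, Sequence

RUGGED_AMPLITUDE_M = 15.0    # pre-flight rule: DEM amplitude > 15 m => "rugged"
RUGGED_ANCHOR_SCALE = 0.3    # satellite anchor weight multiplier in rugged sectors


def relief_displacement_m(dh_m: float, off_nadir_deg: float) -> float:
    """Horizontal error from projecting a point dh_m off the reference plane
    onto that plane along a ray off_nadir_deg from vertical: |dh| * tan(theta)."""
    return abs(dh_m) * math.tan(math.radians(off_nadir_deg))


def classify_sector(dem_heights_m: Sequence[float]) -> str:
    """'rugged' if max - min DEM height in the sector exceeds the threshold
    (reading 'amplitude' as peak-to-peak is an assumption)."""
    amplitude = max(dem_heights_m) - min(dem_heights_m)
    return "rugged" if amplitude > RUGGED_AMPLITUDE_M else "flat"


def sector_policy(sector_class: str, base_anchor_weight: float) -> Dict[str, object]:
    rugged = sector_class == "rugged"
    return {
        "generate_ortho_tiles": not rugged,  # skip ortho-tile generation when rugged
        "anchor_weight": base_anchor_weight * (RUGGED_ANCHOR_SCALE if rugged else 1.0),
        "telemetry_rugged_flag": rugged,
    }


class RuntimeRuggedDetector:
    """Latch a sector to 'rugged' after k consecutive frames whose RANSAC inlier
    ratio is below inlier_threshold (both values hypothetical)."""

    def __init__(self, inlier_threshold: float, k: int) -> None:
        self.inlier_threshold = inlier_threshold
        self.k = k
        self.streak = 0
        self.promoted = False

    def update(self, inlier_ratio: float) -> bool:
        if self.promoted:
            return True  # stays rugged for the rest of the flight
        self.streak = self.streak + 1 if inlier_ratio < self.inlier_threshold else 0
        self.promoted = self.streak >= self.k
        return self.promoted


theta = math.degrees(math.atan(0.68))  # back-solved frame-edge angle, about 34.2 deg
print(round(relief_displacement_m(25.0, theta), 1))  # 17.0
print(sector_policy(classify_sector([212.0, 205.0, 187.0]), 1.0))
```

The runtime detector only ever moves a sector towards "rugged", so a DEM that under-reports relief is corrected during the flight, while a DEM that over-reports it simply costs some tile generation.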

Sharp turn (AC-3.2)

  • Mode A — sharp-turn frames fail VO (<5 % overlap); satellite-based re-localization via VPR + matcher. ✓
  • Mode B (revised) — same, but the VPR pool now includes SALAD + BoQ + AnyLoc + MixVPR (M-4). Bench-off result determines runtime primary; AnyLoc remains the training-free fallback.
  • Counterexample check — none introduced by Mode B.
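How the bench-off result could select the runtime primary is sketched here under an assumed selection rule: among candidates that meet a per-query latency budget, take the one with the highest recall@1, and fall back to AnyLoc if none qualifies. The metric, the budget and the `BenchResult` fields are hypothetical; the draft does not fix the bench-off criteria.

```python
from dataclasses import dataclass
from typing import Iterable

FALLBACK_VPR = "AnyLoc"  # training-free fallback (M-4)


@dataclass
class BenchResult:
    name: str             # e.g. "SALAD", "BoQ", "AnyLoc", "MixVPR"
    recall_at_1: float    # hypothetical bench-off metric
    latency_ms: float     # per-query latency on the target board
    ready: bool = True    # engine built and weights available


def choose_vpr_primary(results: Iterable[BenchResult], latency_budget_ms: float) -> str:
    """Highest recall@1 among candidates within the latency budget; AnyLoc otherwise."""
    eligible = [r for r in results if r.ready and r.latency_ms <= latency_budget_ms]
    if not eligible:
        return FALLBACK_VPR
    return max(eligible, key=lambda r: r.recall_at_1).name
```

Keeping the rule as data plus one function means the runtime primary can change whenever the bench-off is rerun, while the training-free fallback needs no retraining to remain available.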

Tile upload network drop (post-flight)

  • Mode A — diff-against-Service uploader; if the link drops, retry on next bench session.
  • Mode B (revised) — same, plus the M-9 voting rule means an upload failure delays "trusted basemap" promotion but doesn't break the next mission's cache (the Service ingest layer holds onboard tiles in a "candidate" pool until second-flight confirmation).
  • Counterexample check — what if N=2 voting is too slow to react to fresh imagery? Set N=1 for sectors where the operator manually marks a tile-set as "trusted" (e.g., post-recon imagery).
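The M-9 candidate-pool rule can be sketched as a small state machine. A minimal sketch, assuming a tile counts as confirmed when the same tile ID is uploaded by N distinct flights (N=2 by default, N=1 in operator-trusted sectors); how the real ingest layer matches tiles across flights is not specified in the text, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CandidateTile:
    tile_id: str
    confirming_flights: Set[str] = field(default_factory=set)


class CandidatePool:
    """Hypothetical Service-side ingest: onboard tiles wait as candidates until
    N distinct flights have uploaded them (N=2 by default, N=1 in sectors the
    operator marked as trusted)."""

    def __init__(self, default_n: int = 2) -> None:
        self.default_n = default_n
        self.trusted_sectors: Set[str] = set()
        self.candidates: Dict[str, CandidateTile] = {}
        self.basemap: Set[str] = set()  # promoted ("trusted basemap") tile IDs

    def mark_trusted(self, sector: str) -> None:
        self.trusted_sectors.add(sector)

    def ingest(self, sector: str, tile_id: str, flight_id: str) -> bool:
        """Record one flight's upload of a tile; True once it is in the basemap."""
        if tile_id in self.basemap:
            return True
        tile = self.candidates.setdefault(tile_id, CandidateTile(tile_id))
        tile.confirming_flights.add(flight_id)  # re-uploads by one flight count once
        needed = 1 if sector in self.trusted_sectors else self.default_n
        if len(tile.confirming_flights) >= needed:
            self.basemap.add(tile_id)
            del self.candidates[tile_id]
            return True
        return False


pool = CandidatePool()
print(pool.ingest("sector-7", "tile-42", "flight-A"))  # False: candidate only
print(pool.ingest("sector-7", "tile-42", "flight-B"))  # True: second flight confirms
```

A failed upload only means a flight's vote arrives later or not at all; nothing reaches the basemap unconfirmed, so the next mission's cache is unaffected.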

Landing + post-flight upload

  • Mode A — uploader runs as one-shot; tiles + sidecars pushed to Service.
  • Mode B (revised) — uploader pushes onboard tiles to a candidate pool, not directly to the basemap. Service ingest applies the M-9 voting layer.
  • Counterexample check — does this slow down imagery freshness? Yes, by one mission for a given sector. AC-NEW-6 freshness budget already allows 6 months for active-conflict sectors and 12 months for stable rear sectors; one extra mission of latency is well inside that envelope.
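Whether one extra mission fits the AC-NEW-6 envelope depends on the age of the currently trusted basemap and on how soon the sector is flown again, neither of which is stated here. A minimal check, approximating 6 and 12 months as 183 and 365 days and treating the revisit interval as a hypothetical input:

```python
# AC-NEW-6 budgets; 6 and 12 months approximated as 183 and 365 days.
FRESHNESS_BUDGET_DAYS = {"active_conflict": 183, "stable_rear": 365}


def fits_freshness_budget(sector_class: str, basemap_age_days: int,
                          revisit_interval_days: int) -> bool:
    """True if the trusted basemap still meets the budget after waiting one more
    mission (revisit_interval_days is a hypothetical input) for N=2 confirmation."""
    return basemap_age_days + revisit_interval_days <= FRESHNESS_BUDGET_DAYS[sector_class]


print(fits_freshness_budget("active_conflict", basemap_age_days=30, revisit_interval_days=7))   # True
print(fits_freshness_budget("active_conflict", basemap_age_days=180, revisit_interval_days=7))  # False
```

The claim holds whenever basemap age plus one revisit interval stays under the sector's budget; an active-conflict sector whose trusted imagery is already close to 6 months old is the case to watch.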

Review checklist

  • Mode B conclusions consistent with fact cards M-1..M-15.
  • No important dimensions missed (W1..W13 cross-checked vs. weak-point findings; W3.b, W4.b, W4.d, W11, W12 are not blocking and are flagged as residual research items in solution_draft02.md "Open Research").
  • No over-extrapolation (every conclusion traceable to ≥1 source S40+ or to an explicit analytical chain).
  • All conclusions actionable / verifiable (source-switching SITL test, AC-NEW-7 numeric budget, sector DEM table, etc.).

Conclusions requiring user input

These items cannot be unilaterally resolved by Mode B and must be surfaced when handing the revised draft back to the user:

  1. M-13 — TartanAir V2 in the early-stage bench-off, yes/no?
  2. AC-NEW-7 numeric thresholds — Mode B proposes P(misalignment > 30 m) < 1 % per flight; P(>100 m) < 0.1 %. Confirm or revise.
  3. M-6 mavlink-router decision — three options (sandbox+pin / replace / no router with distinct system-IDs). Mode B recommends Option 3 for v1.
  4. M-1 hybrid output — accept the GPS_INPUT + ODOMETRY hybrid, or stay GPS_INPUT-only?

These are the four residual user-facing open items for the Plan step.
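For item 2, the proposed AC-NEW-7 thresholds can be checked against per-flight results, for example the worst misalignment seen in each SITL or replay flight (the data source is an assumption). The sketch also computes how many failure-free flights are needed before a bound such as P(>100 m) < 0.1 % is supported at 95 % confidence, assuming independent flights and using the exact zero-failure condition (1 − p)^n ≤ 1 − confidence.

```python
import math
from typing import Dict, Sequence, Tuple

# Proposed AC-NEW-7 thresholds: misalignment (m) -> max per-flight probability.
AC_NEW_7 = {30.0: 0.01, 100.0: 0.001}


def ac_new_7_report(max_misalignment_m: Sequence[float]) -> Dict[float, Tuple[float, bool]]:
    """Empirical exceedance rate per threshold and whether it is below the limit.
    Input: one value per flight, e.g. the worst misalignment seen in that flight."""
    n = len(max_misalignment_m)
    report = {}
    for threshold_m, p_max in AC_NEW_7.items():
        exceed = sum(1 for m in max_misalignment_m if m > threshold_m)
        p_hat = exceed / n
        report[threshold_m] = (p_hat, p_hat < p_max)
    return report


def flights_needed_zero_failures(p_max: float, confidence: float = 0.95) -> int:
    """Smallest n with (1 - p_max)^n <= 1 - confidence: the number of
    failure-free flights needed to support P < p_max at that confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_max))


print(flights_needed_zero_failures(0.01), flights_needed_zero_failures(0.001))  # 299 2995
```

For the proposed numbers this gives 299 and 2995 failure-free flights respectively, a useful figure to have when confirming or revising the thresholds.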