Validation Log — Mode B
Validation scenario
A typical 8-hour fixed-wing mission in eastern Ukraine, mid-summer, sunny. The UAV climbs to 1 km AGL on the way to the sector, transits ~50 km of corridor, performs ~1.5 h of dense coverage (sector-pattern), and returns. Mid-flight, the operator-side EW threat indicator reports a GPS-spoofing event. At minute 45 the companion computer browns out and reboots; at minute 90 the UAV passes over a 25-m-deep gully system; at minute 180 a sharp turn on weather avoidance reduces frame overlap to <5 % for two consecutive frames; at minute 310 the bench network drops out before tile upload finishes; at minute 470 the UAV lands.
For each of these waypoints, walk through what the system produces using the Mode B-revised draft vs. the Mode A draft.
Expected behaviour by waypoint
Cruise (steady state)
- Mode A — emit GPS_INPUT only; covariance collapsed to scalar h_acc. EKF in companion does the fusion.
- Mode B (revised) — emit GPS_INPUT (primary, GPS-substitute framing) and ODOMETRY (when full 6-DoF covariance is available; quality > VISO_QUAL_MIN). FC's EKF3 has access to richer signal; companion EKF is still the source of truth for source-label assignment.
- Counterexample check — what if ODOMETRY's covariance is wrong? VISO_QUAL_MIN gates it on the FC; GPS_INPUT path stays valid as failover. Net: no regression vs. Mode A.
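The cruise-state emission rule above can be sketched as a small selection function. This is a minimal illustration only: `PoseEstimate`, `select_outputs` and the companion-side `quality` score are hypothetical names, and the 21-element covariance layout assumes the upper-triangle form that the MAVLink ODOMETRY message uses for its pose covariance. The actual VISO_QUAL_MIN gate runs on the FC; the check here only mirrors it on the companion side.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class PoseEstimate:
    """Hypothetical companion-side pose estimate (names are illustrative)."""
    quality: int                      # assumed 0-100 quality score
    h_acc_m: float                    # scalar horizontal accuracy used by GPS_INPUT
    covariance_6dof: Optional[Sequence[float]] = None  # 21 upper-triangle terms, if known


def select_outputs(est: PoseEstimate, viso_qual_min: int) -> List[str]:
    """Return the MAVLink message names to emit for this estimate."""
    # GPS_INPUT is always emitted: it is the primary path and the failover.
    outputs = ["GPS_INPUT"]
    has_full_cov = est.covariance_6dof is not None and len(est.covariance_6dof) == 21
    # ODOMETRY is added only with a full 6-DoF covariance and sufficient quality.
    if has_full_cov and est.quality > viso_qual_min:
        outputs.append("ODOMETRY")
    return outputs
```

Because GPS_INPUT is unconditional, a bad or missing ODOMETRY covariance can only remove the richer signal, which matches the "no regression vs. Mode A" conclusion.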
Spoofing event (AC-NEW-2)
- Mode A — listen to GPS_RAW_INT/EKF_STATUS_REPORT; promote our GPS_INPUT to fix_type=3D in <3 s.
- Mode B (revised) — same, plus M-11 SITL coverage of the EK3_SRC1_* parameter switch path. The known-bug landscape (S43) is now a hard test gate, not a risk.
- Counterexample check — what if the source switch deadlocks because PR #30080's fix isn't in the ArduPilot version we ship? Mitigation: pin to the ArduPilot version that contains the merged PR; document in deploy runbook.
Companion brown-out + reboot (AC-NEW-1, cold-start TTFF <30 s)
- Mode A — TRT engines build at install time; CUDA / TRT init <5 s; cold-fix via VPR + matcher within remaining budget.
- Mode B (revised) — same path, but the latency-budget headroom is much larger than the draft assumed (M-5: DINOv2 ViT-B = 8 ms/inf at 224×224 on Orin Nano Super). Cold TTFF target moves from "tight" to "comfortable".
- Counterexample check — what if the CPython 3.13 free-threading question pulls us into experimental territory? Mode B explicitly rejects free-threading for v1 (M-10), so JIT warmup is bounded by numba on CPython 3.11/3.12 (well-characterised).
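The "comfortable" headroom claim can be made concrete with a back-of-the-envelope calculation. The sketch below uses only the figures quoted above (30 s TTFF budget, 5 s as the upper bound for CUDA / TRT init, 8 ms per VPR inference). It deliberately ignores OS boot, image I/O and matcher time, which are not quantified in this log, so the result is an upper bound on VPR attempts and not a measured budget. The function name is hypothetical.

```python
def vpr_inference_headroom(ttff_budget_ms: int, init_ms: int, per_inference_ms: int) -> int:
    """Count the back-to-back VPR inferences that fit after init.

    Integer milliseconds are used to avoid floating-point rounding.
    Assumption: nothing else (boot, I/O, matcher) consumes the budget.
    """
    remaining_ms = ttff_budget_ms - init_ms
    if remaining_ms <= 0:
        return 0
    return remaining_ms // per_inference_ms


# 30 s budget, 5 s init, 8 ms per inference
headroom = vpr_inference_headroom(30_000, 5_000, 8)
```

With these inputs `headroom` is 3125, so even if boot and matching consume most of the remaining 25 s, the VPR stage itself is unlikely to be the limiting factor.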
Rugged-terrain segment (M-12)
- Mode A — flat-Earth assumption applied uniformly; tile generation runs even over the gully; a tile with 17 m horizontal misalignment at the frame edge is treated as "high-quality" and overwrites a stale service tile. Cache-poisoning hazard (M-9).
- Mode B (revised) — pre-flight DEM classifies this sector as "rugged" (>15 m amplitude); ortho-tile generation is skipped in this sector; the satellite anchor weight is multiplied by 0.3 and a rugged-sector flag is set in telemetry.
- Counterexample check — what if the DEM is wrong / out-of-date? SRTM 30 m DEM has known artefacts in gully systems. Mitigation: also use the runtime self-classification — if the matcher's RANSAC inlier ratio drops below threshold for K consecutive frames, auto-promote the sector to "rugged" for the rest of the flight.
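The combination of the pre-flight DEM class and the runtime self-classification can be sketched as a latching classifier. This is an illustrative sketch, not the project's implementation: the class and method names are hypothetical, and the inlier-ratio threshold and K are left as constructor parameters because this log does not fix their values. Only the 15 m amplitude limit and the 0.3 weight factor come from the text above.

```python
class RuggedSectorClassifier:
    """Latches a sector to 'rugged' from the DEM or from matcher behaviour."""

    def __init__(self, dem_amplitude_m: float, inlier_ratio_min: float,
                 k_consecutive: int, amplitude_limit_m: float = 15.0):
        # Pre-flight classification from the DEM relief amplitude.
        self.rugged = dem_amplitude_m > amplitude_limit_m
        self.inlier_ratio_min = inlier_ratio_min  # hypothetical threshold
        self.k_consecutive = k_consecutive        # hypothetical K
        self._low_streak = 0

    def update(self, inlier_ratio: float) -> bool:
        """Feed one frame's RANSAC inlier ratio; return the current class."""
        if self.rugged:
            return True  # latched for the rest of the flight
        if inlier_ratio < self.inlier_ratio_min:
            self._low_streak += 1
        else:
            self._low_streak = 0  # the streak must be consecutive
        if self._low_streak >= self.k_consecutive:
            self.rugged = True
        return self.rugged

    def satellite_anchor_scale(self) -> float:
        """Weight multiplier for satellite anchors (0.3 in rugged sectors)."""
        return 0.3 if self.rugged else 1.0

    def allow_tile_generation(self) -> bool:
        """Ortho-tile generation is skipped in rugged sectors."""
        return not self.rugged
```

The latch is one-way by design: once a sector is promoted, a few good frames do not re-enable tile generation, which keeps a wrong or stale DEM from reopening the cache-poisoning path.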
Sharp turn (AC-3.2)
- Mode A — the sharp-turn frames fail VO (<5 % overlap); satellite-based re-localization via VPR + matcher takes over. ✓
- Mode B (revised) — same, but the VPR pool now includes SALAD + BoQ + AnyLoc + MixVPR (M-4). Bench-off result determines runtime primary; AnyLoc remains the training-free fallback.
- Counterexample check — none introduced by Mode B.
Tile upload network drop (post-flight)
- Mode A — diff-against-Service uploader; if the link drops, retry on next bench session.
- Mode B (revised) — same, plus the M-9 voting rule means upload failure delays "trusted basemap" promotion but doesn't break next mission's cache (the Service ingest layer holds onboard tiles in a "candidate" pool until 2nd-flight confirmation).
- Counterexample check — what if N=2 voting is too slow to react to fresh imagery? Set N=1 for sectors where the operator manually marks a tile-set as "trusted" (e.g., post-recon imagery).
Landing + post-flight upload
- Mode A — uploader runs as one-shot; tiles + sidecars pushed to Service.
- Mode B (revised) — uploader pushes onboard tiles to a candidate pool, not directly to the basemap. Service ingest applies the M-9 voting layer.
- Counterexample check — does this slow down imagery freshness? Yes, by one mission for a given sector. AC-NEW-6 freshness budget already allows 6 months for active-conflict sectors and 12 months for stable rear sectors; one extra mission of latency is well inside that envelope.
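The candidate-pool behaviour described for the upload waypoints can be sketched as follows. This is a simplified illustration with hypothetical names (`CandidatePool`, `ingest`, `mark_sector_trusted`): it only counts distinct flights per tile, whereas a real voting layer would presumably also check that the flights agree on the tile content.

```python
from collections import defaultdict


class CandidatePool:
    """Holds onboard tiles until enough distinct flights confirm them."""

    def __init__(self, default_n: int = 2):
        self.default_n = default_n
        self._flights = defaultdict(set)  # tile_id -> ids of flights that uploaded it
        self.trusted_sectors = set()      # sectors the operator marked as trusted
        self.basemap = set()              # tile ids promoted to the trusted basemap

    def mark_sector_trusted(self, sector_id: str) -> None:
        """Operator override: tiles in this sector are promoted with N=1."""
        self.trusted_sectors.add(sector_id)

    def ingest(self, tile_id: str, sector_id: str, flight_id: str) -> bool:
        """Record an upload; return True if the tile is now in the basemap."""
        # A set means a retried upload from the same flight is not counted twice.
        self._flights[tile_id].add(flight_id)
        required = 1 if sector_id in self.trusted_sectors else self.default_n
        if len(self._flights[tile_id]) >= required:
            self.basemap.add(tile_id)
        return tile_id in self.basemap
```

Under this scheme a dropped link followed by a retry only delays promotion; it cannot promote a tile early or remove anything already in the basemap.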
Review checklist
- Mode B conclusions consistent with fact cards M-1..M-15.
- No important dimensions missed (W1–W13 cross-checked vs. weak-point findings; W3.b, W4.b, W4.d, W11, W12 are not blocking — flagged as residual research items in solution_draft02.md, "Open Research").
- No over-extrapolation (every conclusion traceable to ≥1 source S40+ or to an explicit analytical chain).
- All conclusions actionable / verifiable (source-switching SITL test, AC-NEW-7 numeric budget, sector DEM table, etc.).
Conclusions requiring user input
These items cannot be unilaterally resolved by Mode B and must be surfaced when handing the revised draft back to the user:
- M-13 — TartanAir V2 in the early-stage bench-off, yes/no?
- AC-NEW-7 numeric thresholds — Mode B proposes P(misalignment > 30 m) < 1 % per flight; P(>100 m) < 0.1 %. Confirm or revise.
- M-6 mavlink-router decision — three options (sandbox+pin / replace / no router with distinct system-IDs). Mode B recommends Option 3 for v1.
- M-1 hybrid output — accept the GPS_INPUT + ODOMETRY hybrid, or stay GPS_INPUT-only?
These are the four residual user-facing open items for the Plan step.
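To make the proposed AC-NEW-7 thresholds easier to confirm or revise, the check can be written down explicitly. The sketch below is one possible reading, not a settled definition: it estimates the two probabilities as exceedance fractions over misalignment samples from a single flight, which is an assumption about what "per flight" means. The function names are hypothetical.

```python
from typing import Sequence


def exceedance_fraction(errors_m: Sequence[float], limit_m: float) -> float:
    """Fraction of misalignment samples strictly above limit_m."""
    if not errors_m:
        raise ValueError("at least one misalignment sample is required")
    return sum(1 for e in errors_m if e > limit_m) / len(errors_m)


def meets_ac_new_7(errors_m: Sequence[float],
                   p30_max: float = 0.01, p100_max: float = 0.001) -> bool:
    """Check P(>30 m) < 1 % and P(>100 m) < 0.1 % on one flight's samples."""
    return (exceedance_fraction(errors_m, 30.0) < p30_max
            and exceedance_fraction(errors_m, 100.0) < p100_max)
```

Note that resolving a 0.1 % rate needs well over a thousand samples per flight, so the sampling unit (per frame, per tile, or per fix) is part of what needs confirming.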