- Introduced a new document detailing the current state of the autodev process, including steps, status, and findings.

- Revised acceptance criteria in the acceptance_criteria.md file to clarify metrics and expectations, including updates to GPS accuracy and image processing quality.
- Enhanced restrictions documentation to reflect operational parameters and constraints for UAV flights, including camera specifications and satellite imagery usage.
- Added new research documents for acceptance criteria assessment and question decomposition to support ongoing project evaluation and decision-making.
Oleksandr Bezdieniezhnykh
2026-04-26 14:28:10 +03:00
parent 2178737b36
commit 9eba1689b3
17 changed files with 2965 additions and 69 deletions
@@ -0,0 +1,75 @@
# Validation Log — Mode B
## Validation scenario
A typical 8-hour fixed-wing mission in eastern Ukraine, mid-summer, sunny. The UAV climbs to 1 km AGL on the way to the sector, transits ~50 km of corridor, performs ~1.5 h of dense coverage (sector-pattern), and returns. Mid-flight, the operator-side EW threat indicator reports a GPS-spoofing event. At minute 45 the companion computer browns out and reboots; at minute 90 the UAV passes over a 25-m-deep gully system; at minute 180 a sharp turn on weather avoidance reduces frame overlap to <5 % for two consecutive frames; at minute 310 the bench network drops out before tile upload finishes; at minute 470 the UAV lands.
For each of these waypoints, walk through what the system produces using the **Mode B-revised draft** vs. the **Mode A draft**.
---
## Expected behaviour by waypoint
### Cruise (steady state)
- **Mode A** — emit GPS_INPUT only; covariance collapsed to scalar `h_acc`. EKF in companion does the fusion.
- **Mode B (revised)** — emit GPS_INPUT (primary, GPS-substitute framing) **and** ODOMETRY (only when full 6-DoF covariance is available and quality > VISO_QUAL_MIN). The FC's EKF3 gets a richer signal; the companion EKF is still the source of truth for source-label assignment.
- **Counterexample check** — what if ODOMETRY's covariance is wrong? VISO_QUAL_MIN gates it on the FC; GPS_INPUT path stays valid as failover. Net: no regression vs. Mode A.
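The cruise-phase output rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names are hypothetical, and collapsing the horizontal covariance to `h_acc` via the largest eigenvalue is an assumption (the log only says the covariance is "collapsed to scalar").

```python
import math


def horizontal_accuracy(cov_xx, cov_xy, cov_yy):
    """Collapse a 2x2 horizontal position covariance (m^2) to a scalar (m).

    Assumption: use the square root of the largest eigenvalue, i.e. the
    1-sigma length of the error ellipse's major axis.
    """
    mean = 0.5 * (cov_xx + cov_yy)
    radius = math.hypot(0.5 * (cov_xx - cov_yy), cov_xy)
    return math.sqrt(mean + radius)


def select_outputs(quality, viso_qual_min, has_full_covariance):
    """Return the message names to emit for one pose estimate.

    GPS_INPUT is always emitted (Mode A behaviour and the failover path).
    ODOMETRY is added only when the full 6-DoF covariance is available
    and the quality exceeds the VISO_QUAL_MIN gate.
    """
    outputs = ["GPS_INPUT"]
    if has_full_covariance and quality > viso_qual_min:
        outputs.append("ODOMETRY")
    return outputs
```

Because GPS_INPUT is emitted unconditionally, a rejected or missing ODOMETRY message leaves the Mode A path untouched, which is the "no regression" argument in the counterexample check.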
### Spoofing event (AC-NEW-2)
- **Mode A** — listen to `GPS_RAW_INT` / `EKF_STATUS_REPORT`; promote our GPS_INPUT to fix_type=3D in <3 s.
- **Mode B (revised)** — same, plus M-11 SITL coverage of the EK3_SRC1_* parameter switch path. The known-bug landscape (S43) is now a hard test gate, not a risk.
- **Counterexample check** — what if the source switch deadlocks because PR #30080's fix isn't in the ArduPilot version we ship? Mitigation: pin to the ArduPilot version that contains the merged PR; document in deploy runbook.
### Companion brown-out + reboot (AC-NEW-1, cold-start TTFF <30 s)
- **Mode A** — TRT engines build at install time; CUDA / TRT init <5 s; cold-fix via VPR + matcher within remaining budget.
- **Mode B (revised)** — same path, but the latency-budget headroom is much larger than the draft assumed (M-5: DINOv2 ViT-B = 8 ms/inf at 224×224 on Orin Nano Super). The cold TTFF target moves from "tight" to "comfortable".
- **Counterexample check** — what if the CPython 3.13 free-threading question pulls us into experimental territory? Mode B explicitly rejects free-threading for v1 (M-10), so JIT warmup is bounded by numba on CPython 3.11/3.12 (well-characterised).
### Rugged-terrain segment (M-12)
- **Mode A** — flat-Earth assumption applied uniformly; tile generation runs even over the gully; 17 m horizontal misalignment at frame edge becomes a "high-quality" tile that overwrites a stale service tile. **Cache-poisoning hazard** (M-9).
- **Mode B (revised)** — pre-flight DEM classifies this sector as "rugged" (>15 m amplitude); ortho-tile generation is **skipped** in this sector; the satellite anchor weight is scaled by 0.3, and a rugged-sector flag is set in telemetry.
- **Counterexample check** — what if the DEM is wrong / out-of-date? SRTM 30 m DEM has known artefacts in gully systems. Mitigation: also use the runtime self-classification — if the matcher's RANSAC inlier ratio drops below threshold for K consecutive frames, auto-promote the sector to "rugged" for the rest of the flight.
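A minimal sketch of the two rugged-sector classifiers described above: the pre-flight DEM amplitude test and the runtime promotion on K consecutive low-inlier frames. The 15 m amplitude and the 0.3 weight factor come from the log; the inlier-ratio threshold, the value of K, and all names are hypothetical placeholders.

```python
def dem_is_rugged(elevations_m, amplitude_threshold_m=15.0):
    """Pre-flight check: a sector is rugged if its DEM amplitude exceeds the threshold."""
    return max(elevations_m) - min(elevations_m) > amplitude_threshold_m


class RuntimeRuggedDetector:
    """Promote a sector to 'rugged' after K consecutive low-inlier frames.

    min_inlier_ratio and k_consecutive are illustrative values only;
    the log does not specify them.
    """

    def __init__(self, min_inlier_ratio=0.3, k_consecutive=5):
        self.min_inlier_ratio = min_inlier_ratio
        self.k_consecutive = k_consecutive
        self.low_streak = 0
        self.rugged = False

    def update(self, inlier_ratio):
        """Feed one frame's RANSAC inlier ratio; return the current rugged flag."""
        if self.rugged:
            return True  # latched for the rest of the flight
        if inlier_ratio < self.min_inlier_ratio:
            self.low_streak += 1
        else:
            self.low_streak = 0  # the streak must be consecutive
        if self.low_streak >= self.k_consecutive:
            self.rugged = True
        return self.rugged


def anchor_weight(base_weight, rugged, rugged_factor=0.3):
    """Scale the satellite anchor weight by 0.3 in rugged sectors."""
    return base_weight * rugged_factor if rugged else base_weight
```

The latch in `update` reflects the "for the rest of the flight" wording: one good frame after promotion does not clear the flag.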
### Sharp turn (AC-3.2)
- **Mode A** — the sharp-turn frame fails VO (<5 % overlap); the system falls back to satellite-based re-localization via VPR + matcher. ✓.
- **Mode B (revised)** — same, but the VPR pool now includes SALAD + BoQ + AnyLoc + MixVPR (M-4). Bench-off result determines runtime primary; AnyLoc remains the training-free fallback.
- **Counterexample check** — none introduced by Mode B.
### Tile upload network drop (post-flight)
- **Mode A** — diff-against-Service uploader; if the link drops, retry on next bench session.
- **Mode B (revised)** — same, plus the M-9 voting rule means upload failure delays "trusted basemap" promotion but doesn't break next mission's cache (the Service ingest layer holds onboard tiles in a "candidate" pool until 2nd-flight confirmation).
- **Counterexample check** — what if N=2 voting is too slow to react to fresh imagery? Set N=1 for sectors where the operator manually marks a tile-set as "trusted" (e.g., post-recon imagery).
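The candidate-pool behaviour can be sketched as below, assuming the M-9 vote simply counts distinct flights that delivered a given tile. A real ingest layer would presumably also check that the tiles agree with each other. The class and method names are hypothetical.

```python
from collections import defaultdict


class CandidatePool:
    """Hold onboard tiles as candidates until enough distinct flights confirm them."""

    def __init__(self, required_confirmations=2):
        self.required = required_confirmations
        self.flights_by_tile = defaultdict(set)
        self.trusted_sectors = set()

    def mark_sector_trusted(self, sector_id):
        """Operator override: tiles in this sector are promoted at N=1."""
        self.trusted_sectors.add(sector_id)

    def ingest(self, tile_id, sector_id, flight_id):
        """Record an upload; return True if the tile may be promoted to the basemap."""
        self.flights_by_tile[tile_id].add(flight_id)
        needed = 1 if sector_id in self.trusted_sectors else self.required
        return len(self.flights_by_tile[tile_id]) >= needed


pool = CandidatePool()
first = pool.ingest("tile-17", "sector-A", "flight-1")   # stays a candidate
retry = pool.ingest("tile-17", "sector-A", "flight-1")   # re-upload, same flight
second = pool.ingest("tile-17", "sector-A", "flight-2")  # second flight confirms
```

Keying on a set of flight IDs means a retried upload after a network drop does not count as a second vote, so a failed upload only delays promotion.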
### Landing + post-flight upload
- **Mode A** — uploader runs as one-shot; tiles + sidecars pushed to Service.
- **Mode B (revised)** — uploader pushes onboard tiles to a **candidate pool**, not directly to the basemap. Service ingest applies the M-9 voting layer.
- **Counterexample check** — does this slow down imagery freshness? Yes, by one mission for a given sector. AC-NEW-6 freshness budget already allows 6 months for active-conflict sectors and 12 months for stable rear sectors; one extra mission of latency is well inside that envelope.
---
## Review checklist
- [x] Mode B conclusions consistent with fact cards M-1..M-15.
- [x] No important dimensions missed (W1..W13 cross-checked vs. weak-point findings; W3.b, W4.b, W4.d, W11, W12 are not blocking — flagged as residual research items in `solution_draft02.md` "Open Research").
- [x] No over-extrapolation (every conclusion traceable to ≥1 source S40+ or to an explicit analytical chain).
- [x] All conclusions actionable / verifiable (source-switching SITL test, AC-NEW-7 numeric budget, sector DEM table, etc.).
---
## Conclusions requiring user input
These items cannot be unilaterally resolved by Mode B and must be surfaced when handing the revised draft back to the user:
1. **M-13** — TartanAir V2 in the early-stage bench-off, yes/no?
2. **AC-NEW-7 numeric thresholds** — Mode B proposes P(misalignment > 30 m) < 1 % per flight; P(>100 m) < 0.1 %. Confirm or revise.
3. **M-6 mavlink-router decision** — three options (sandbox+pin / replace / no router with distinct system-IDs). Mode B recommends Option 3 for v1.
4. **M-1 hybrid output** — accept the GPS_INPUT + ODOMETRY hybrid, or stay GPS_INPUT-only?
These are the four residual user-facing open items for the Plan step.
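For item 2, the proposed AC-NEW-7 thresholds could be checked against measured data roughly as follows. This sketch assumes the probabilities are read as the fraction of tile misalignment samples within one flight that exceed each bound; if the intended meaning is per-flight probability across many flights, the same check would apply to per-flight maxima instead. The names are hypothetical.

```python
# (threshold in metres, maximum allowed exceedance fraction), as proposed by Mode B
AC_NEW_7_BUDGET = ((30.0, 0.01), (100.0, 0.001))


def exceedance_fraction(errors_m, threshold_m):
    """Fraction of misalignment samples strictly above the threshold."""
    if not errors_m:
        raise ValueError("no misalignment samples")
    return sum(1 for e in errors_m if e > threshold_m) / len(errors_m)


def meets_budget(errors_m, budget=AC_NEW_7_BUDGET):
    """True if every threshold's exceedance fraction is below its limit."""
    return all(exceedance_fraction(errors_m, t) < p for t, p in budget)


# Example: 1000 samples, five of them at 40 m (0.5 % above 30 m, none above 100 m).
sample_flight = [5.0] * 995 + [40.0] * 5
budget_ok = meets_budget(sample_flight)
```

With a limit of 0.1 %, a flight needs more than 1000 samples before a single >100 m outlier can be tolerated, which is worth keeping in mind when confirming or revising the numbers.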