gps-denied-onboard/_docs/01_solution/decisions/0001-e2e-dataset-strategy.md
Yuzviak 560dc38f0a docs(adr): record ADR 0001 — e2e validation on public UAV datasets
First Architecture Decision Record for this project. Captures the
rationale for building the e2e harness on VPAIR / MARS-LVIG / EuRoC
rather than blocking on proprietary Mavic data collection; lists
three alternatives considered and why rejected; records the first
real-run baseline (VPAIR ATE ~1770 km) as a measurable starting
point for future VO+ESKF tuning regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 13:55:22 +03:00


ADR 0001 — E2E Validation on Public UAV Datasets

Date: 2026-04-16
Status: Accepted
Supersedes:
Superseded by:

Context

The next_steps.md roadmap (item 3) requires bringing the codebase under the autopilot existing-code flow: build an end-to-end test harness that runs the full product as a black box on real UAV data before further refactoring (VO → cuVSLAM, src/gps_denied/src/). Without that harness, any refactor is a blind change — unit tests would stay green while the integrated pipeline could degrade silently.

The obvious data source — a synchronised IMU + downward-camera log from the target tactical fixed-wing UAV — is not available:

  • Denys Popov (Mavic) was asked for logs but delivery is open-ended.
  • In-house Mavic operators would need to be convinced to perform non-mission flights with nadir-oriented cameras at high altitude, which is not their normal pattern.
  • The DJI Mavic flight in /home/yuzviak/Azaion/Data/ has no raw IMU (DJI only exports fused attitude + velocities), so the ESKF path is not exercised.

Blocking the harness until someone flies a mission to spec costs weeks. Meanwhile several mature public UAV datasets exist that ship synchronised camera + IMU + ground truth.

Decision

Build the e2e harness on public UAV datasets as the primary validation substrate. Three datasets, three test tiers:

| Tier | Dataset | Role | Platform match |
|---|---|---|---|
| Primary | VPAIR (sample) | Fixed-wing, nadir, 300–400 m; closest to target envelope | Fixed-wing light aircraft ≈ tactical fixed-wing dynamics (constant forward motion, banks) |
| Stress | MARS-LVIG | Rotary UAV with explicit featureless sequences; stress-test VO on low-texture terrain flagged as CRITICAL RISK in solution_draft02 | DJI M300 RTK (rotary); dynamics mismatch accepted, chosen for the terrain variety |
| CI regression | EuRoC MH_01 | Industry-standard VIO benchmark; every published VIO algorithm reports EuRoC numbers, so harness output can be sanity-checked against literature | Indoor micro-MAV; dynamics mismatch accepted, chosen for reproducibility and paper-comparability |

Proprietary data collection remains on the roadmap but is deprioritised to a later phase (see next_steps.md §4). It becomes relevant once (a) we need to validate on our specific airframe's IMU vibration spectrum and camera intrinsics, or (b) VO+ESKF tuning stabilises and we want a "final" dataset captured at our operational envelope.

All adapters share a single DatasetAdapter interface (src/gps_denied/testing/datasets/base.py) with capability flags (has_raw_imu, has_rtk_gt, has_loop_closures, platform_class), so integration tests auto-skip paths the current adapter can't exercise rather than failing them.
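A minimal sketch of how the capability-flag contract could look. The flag names come from the paragraph above; everything else (the `Capabilities` dataclass, the `require` helper, the `frames` method, the `PoseOnlyAdapter` class and the `"fixed_wing"` value) is hypothetical and not taken from `base.py`. The sketch raises the standard-library `unittest.SkipTest` so it has no third-party dependency; a pytest-based suite would more likely call `pytest.skip` at the same point.

```python
import unittest
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Capabilities:
    """Flags named in the ADR; the defaults are an assumption."""
    has_raw_imu: bool = False
    has_rtk_gt: bool = False
    has_loop_closures: bool = False
    platform_class: str = "unknown"  # value vocabulary is hypothetical


class DatasetAdapter(ABC):
    """Hypothetical shape of the shared adapter interface."""
    capabilities: Capabilities = Capabilities()

    @abstractmethod
    def frames(self) -> Iterator[dict]:
        """Yield one record per camera frame (record layout is illustrative)."""


def require(adapter: DatasetAdapter, *flags: str) -> None:
    """Skip the calling test if the adapter lacks any of the boolean flags."""
    missing = [f for f in flags if not getattr(adapter.capabilities, f)]
    if missing:
        name = type(adapter).__name__
        raise unittest.SkipTest(f"{name} lacks: {', '.join(missing)}")


class PoseOnlyAdapter(DatasetAdapter):
    """Stand-in for an adapter that ships poses but no raw IMU."""
    capabilities = Capabilities(platform_class="fixed_wing")

    def frames(self) -> Iterator[dict]:
        yield {"timestamp_ns": 0, "image_path": "frame_000000.png"}


def run_eskf_check(adapter: DatasetAdapter) -> str:
    """Example test body: only meaningful when raw IMU samples exist."""
    require(adapter, "has_raw_imu")
    return "eskf path exercised"
```

With this shape, a test that needs raw IMU is reported as skipped, not failed, when run against a pose-only adapter, which is the behaviour the paragraph above describes.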

Consequences

Positive:

  • Harness and refactor work unblocked on day one — no person-dependent delay.
  • Three datasets cover three distinct failure modes: fixed-wing dynamics (VPAIR), low-texture terrain (MARS-LVIG featureless), indoor benchmark comparability (EuRoC).
  • Public numbers for EuRoC mean we can cross-check our ATE against OpenVINS / VINS-Fusion — catches gross regressions immediately.
  • Capability-flag pattern means an adapter that ships poses but not raw IMU (VPAIR) still contributes: VO+GPR+graph paths run, ESKF path is explicitly skipped.

Negative / accepted trade-offs:

  • VPAIR and MARS-LVIG are academic-use-only licences. Fine for R&D and internal CI, not for shipping benchmarks in commercial material. ALTO (BSD-3) is the escape hatch if commercial-license benchmarks become a requirement.
  • VPAIR sample has no raw IMU — full ESKF+VO path is not exercised by it. EuRoC and MARS-LVIG cover that gap.
  • Altitude envelope of public datasets (indoor EuRoC, 80–130 m MARS-LVIG, 300–400 m VPAIR) undershoots the target 200–1500 m tactical envelope. Extrapolating upward is a leap of faith; real-target-altitude validation stays on the roadmap.
  • Dataset formats vary wildly (EuRoC ASL, VPAIR ECEF+Euler text, MARS-LVIG ROS bags). Each adapter is custom. Mitigation: shared coord.py helpers (ECEF→WGS84, Euler→quaternion) and a small design contract enforced by the ABC.
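The two conversions named for the shared `coord.py` helpers can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, the Euler convention is assumed to be intrinsic Z-Y-X (yaw, then pitch, then roll) with angles in radians, and the quaternion is returned as `(w, x, y, z)`. The convention VPAIR actually uses should be checked against its documentation. `wgs84_to_ecef` is included only so the inverse can be round-trip checked.

```python
import math

# WGS84 ellipsoid constants.
WGS84_A = 6378137.0                    # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def wgs84_to_ecef(lat_deg, lon_deg, h):
    """Geodetic (degrees, metres) to ECEF metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + h) * math.sin(lat)
    return x, y, z


def ecef_to_wgs84(x, y, z, iterations=5):
    """ECEF metres to geodetic (degrees, metres) by fixed-point iteration.

    Not valid at the poles, where cos(lat) is zero.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1.0 - WGS84_E2))  # exact when h == 0
    h = 0.0
    for _ in range(iterations):
        n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - WGS84_E2 * n / (n + h)))
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    h = p / math.cos(lat) - n
    return math.degrees(lat), math.degrees(lon), h


def euler_to_quaternion(roll, pitch, yaw):
    """Z-Y-X Euler angles (radians) to a unit quaternion (w, x, y, z)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )
```

Keeping these as plain functions lets each adapter stay format-specific while the numerically sensitive conversions are written and tested once.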

First real-run evidence (VPAIR sample, 2026-04-16): pipeline completes on 200 fixed-wing nadir frames without crashing, but ATE RMSE ~1770 km — VO alone diverges catastrophically without IMU or satellite anchoring. This is the baseline. Expected improvement comes from EuRoC (has raw IMU → ESKF path works) and from tuning VO+GPR for high-altitude nadir imagery.
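For readers unfamiliar with the metric, ATE RMSE is the root-mean-square of the per-frame position error between the estimated and ground-truth trajectories. The sketch below assumes both trajectories are already time-associated and expressed in the same local metric frame. The ADR does not say whether the harness aligns trajectories first (for example with a rigid or similarity fit), so no alignment is done here, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def ate_rmse(estimated: Sequence[Vec3], ground_truth: Sequence[Vec3]) -> float:
    """RMSE of position error in metres, without trajectory alignment."""
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and equally long")
    squared_error = 0.0
    for (ex, ey, ez), (gx, gy, gz) in zip(estimated, ground_truth):
        squared_error += (ex - gx) ** 2 + (ey - gy) ** 2 + (ez - gz) ** 2
    return math.sqrt(squared_error / len(estimated))
```

A regression check in the harness could then compare this value against the recorded baseline and fail when it grows.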

Alternatives considered

  • Wait for Denys / internal pilots to collect data. Rejected: unbounded delay, no guarantee of raw IMU, and nothing we can start on today.
  • Use only EuRoC. Rejected: indoor micro-MAV dynamics are nothing like a tactical fixed-wing, and it has no place-recognition-against-satellite angle. Good for CI, insufficient for primary validation.
  • Preprocess datasets into a single normalised format and load via one reader. Rejected on YAGNI — three adapters with a shared ABC stay clearer than a converter pipeline and don't lose dataset-specific metadata.
  • Use Mid-Air (synthetic). Rejected: synthetic extremely-low-altitude quadcopter flights. Adds no signal over SyntheticAdapter which we already have for harness self-test.

References