Yuzviak 1bf8b2a684 docs: record EuRoC MH_01 real-run baseline across all doc surfaces
Updates README, testing/README, next_steps.md, and ADR 0001 with the
first real EuRoC MH_01 e2e run (100 frames, ~30s wall-time, ATE RMSE
~10.9 km → xfail). Places the EuRoC result alongside the prior VPAIR
baseline (~1770 km) so a future reader can see both failure modes at a
glance:

- VPAIR diverges because no raw IMU → ESKF never engages
- EuRoC diverges because indoor scene has no satellite anchor, so
  VO+ESKF drift without an external correction

Also records the branching policy (rename ``euroc_mh01`` →
``euroc_machine_hall``; empty URL due to DSpace UI gate; manual
fetch via DOI 10.3929/ethz-b-000690084).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 17:52:06 +03:00

# ADR 0001 — E2E Validation on Public UAV Datasets
**Date:** 2026-04-16
**Status:** Accepted
**Supersedes:**
**Superseded by:**
## Context
The `next_steps.md` roadmap (item 3) requires bringing the codebase under the `autopilot existing-code` flow: build an end-to-end test harness that runs the full product as a black box on real UAV data *before* further refactoring (VO → cuVSLAM, `src/gps_denied/` → `src/`). Without that harness, any refactor is a blind change; unit tests would stay green while the integrated pipeline could degrade silently.
The obvious data source — a synchronised IMU + downward-camera log from the target tactical fixed-wing UAV — is not available:
- Denys Popov (Mavic) was asked for logs but delivery is open-ended.
- In-house Mavic operators would need to be convinced to perform non-mission flights with nadir-oriented cameras at high altitude, which is not their normal pattern.
- The DJI Mavic flight in `/home/yuzviak/Azaion/Data/` has no raw IMU (DJI only exports fused attitude + velocities), so the ESKF path is not exercised.
Blocking the harness until someone flies a mission to spec costs weeks. Meanwhile several mature public UAV datasets exist that ship synchronised camera + IMU + ground truth.
## Decision
Build the e2e harness on **public UAV datasets** as the primary validation substrate. Three datasets, three test tiers:
| Tier | Dataset | Role | Platform match |
|---|---|---|---|
| Primary | **VPAIR** (sample) | Fixed-wing, nadir, 300–400 m: closest to the target envelope | Fixed-wing light aircraft ≈ tactical fixed-wing dynamics (constant forward motion, banks) |
| Stress | **MARS-LVIG** | Rotary UAV with explicit *featureless* sequences — stress-test VO on low-texture terrain flagged as CRITICAL RISK in solution_draft02 | DJI M300 RTK (rotary) — dynamics mismatch accepted, chosen for the terrain variety |
| CI regression | **EuRoC MH_01** | Industry-standard VIO benchmark; every published VIO algorithm reports EuRoC numbers, so harness output can be sanity-checked against literature | Indoor micro-MAV — dynamics mismatch accepted, chosen for reproducibility and paper-comparability |
Proprietary data collection remains on the roadmap but is deprioritised to a later phase (see `next_steps.md` §4). It becomes relevant once (a) we need to validate on our specific airframe's IMU vibration spectrum and camera intrinsics, or (b) VO+ESKF tuning stabilises and we want a "final" dataset captured at our operational envelope.
All adapters share a single `DatasetAdapter` interface (`src/gps_denied/testing/datasets/base.py`) with capability flags (`has_raw_imu`, `has_rtk_gt`, `has_loop_closures`, `platform_class`), so integration tests auto-skip paths the current adapter can't exercise rather than failing them.
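
A minimal sketch of the capability-flag pattern follows. It assumes nothing about the real `base.py` beyond the four flag names above: the `Capabilities` dataclass, the `PlatformClass` values, the `iter_frames` method, the `require` helper and `PosesOnlyAdapter` are all hypothetical. The helper raises `unittest.SkipTest`, which pytest reports as a skip, so a test that needs raw IMU is skipped rather than failed on a poses-only dataset.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Iterator, Tuple
import unittest


class PlatformClass(Enum):
    # Illustrative values only; the real enum may differ.
    FIXED_WING = "fixed_wing"
    ROTARY = "rotary"
    INDOOR_MAV = "indoor_mav"
    SYNTHETIC = "synthetic"


@dataclass(frozen=True)
class Capabilities:
    """Declared capabilities of one dataset (flag names taken from the ADR)."""
    has_raw_imu: bool
    has_rtk_gt: bool
    has_loop_closures: bool
    platform_class: PlatformClass


class DatasetAdapter(ABC):
    """Contract every dataset adapter implements (hypothetical methods)."""

    @property
    @abstractmethod
    def capabilities(self) -> Capabilities:
        """Return the dataset's capability flags."""

    @abstractmethod
    def iter_frames(self) -> Iterator[Tuple[float, str]]:
        """Yield (timestamp_s, image_path) pairs in time order."""


def require(adapter: DatasetAdapter, flag: str) -> None:
    """Skip the calling test if the adapter lacks `flag`.

    pytest treats unittest.SkipTest as a skip, so no pytest import is needed here.
    """
    if not getattr(adapter.capabilities, flag):
        raise unittest.SkipTest(f"{type(adapter).__name__} does not provide {flag}")


class PosesOnlyAdapter(DatasetAdapter):
    """Hypothetical VPAIR-like adapter: images and poses, no raw IMU."""

    @property
    def capabilities(self) -> Capabilities:
        return Capabilities(
            has_raw_imu=False,
            has_rtk_gt=False,  # illustrative value
            has_loop_closures=False,  # illustrative value
            platform_class=PlatformClass.FIXED_WING,
        )

    def iter_frames(self) -> Iterator[Tuple[float, str]]:
        yield (0.0, "frame_0000.png")
        yield (0.5, "frame_0001.png")


def check_eskf_path(adapter: DatasetAdapter) -> str:
    """Stand-in for an integration test that needs the ESKF path."""
    require(adapter, "has_raw_imu")  # ESKF needs raw IMU; skip otherwise
    return "eskf path exercised"
```

Calling `check_eskf_path(PosesOnlyAdapter())` raises `unittest.SkipTest`, while an adapter whose `capabilities` report `has_raw_imu=True` runs through. Declaring the flags up front, rather than inferring them from missing files, lets a test decide to skip before it touches any data.
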
## Consequences
**Positive:**
- Harness and refactor work unblocked on day one — no person-dependent delay.
- Three datasets cover three distinct failure modes: fixed-wing dynamics (VPAIR), low-texture terrain (MARS-LVIG featureless), indoor benchmark comparability (EuRoC).
- Public numbers for EuRoC mean we can cross-check our ATE against OpenVINS / VINS-Fusion — catches gross regressions immediately.
- Capability-flag pattern means an adapter that ships poses but not raw IMU (VPAIR) still contributes: VO+GPR+graph paths run, ESKF path is explicitly skipped.
**Negative / accepted trade-offs:**
- VPAIR and MARS-LVIG are academic-use-only licences. Fine for R&D and internal CI, not for shipping benchmarks in commercial material. ALTO (BSD-3) is the escape hatch if commercial-license benchmarks become a requirement.
- VPAIR sample has no raw IMU — full ESKF+VO path is not exercised by it. EuRoC and MARS-LVIG cover that gap.
- Altitude envelope of public datasets (indoor EuRoC, 80–130 m MARS-LVIG, 300–400 m VPAIR) covers at most the low end of the target 200–1500 m tactical envelope. Extrapolating upward is a leap of faith; real-target-altitude validation stays on the roadmap.
- Dataset formats vary wildly (EuRoC ASL, VPAIR ECEF+Euler text, MARS-LVIG ROS bags). Each adapter is custom. Mitigation: shared `coord.py` helpers (ECEF→WGS84, Euler→quaternion) and a small design contract enforced by the ABC.
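
The two conversions named for `coord.py` are standard geodesy and rotation maths, sketched below. This is not the repo's implementation: the function names are hypothetical, the ECEF inversion is a simple fixed-point iteration on the WGS84 ellipsoid, and the Euler convention (intrinsic Z-Y-X, i.e. yaw-pitch-roll, in radians, returning a Hamilton `(w, x, y, z)` quaternion) is an assumption that has to be checked against each dataset's documentation.

```python
import math
from typing import Tuple

# WGS84 ellipsoid constants
WGS84_A = 6378137.0                   # semi-major axis [m]
WGS84_F = 1.0 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)  # first eccentricity squared


def ecef_to_wgs84(x: float, y: float, z: float) -> Tuple[float, float, float]:
    """ECEF metres -> (lat_deg, lon_deg, ellipsoidal height_m).

    Fixed-point iteration on latitude; fine at UAV latitudes, but
    p / cos(lat) becomes ill-conditioned very close to the poles.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)  # distance from the Earth's spin axis
    lat = math.atan2(z, p * (1.0 - WGS84_E2))  # initial guess assumes h = 0
    for _ in range(10):
        n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)  # prime-vertical radius
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - WGS84_E2 * n / (n + h)))
    # Recompute height at the converged latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    h = p / math.cos(lat) - n
    return math.degrees(lat), math.degrees(lon), h


def euler_zyx_to_quaternion(roll: float, pitch: float, yaw: float) -> Tuple[float, float, float, float]:
    """Intrinsic Z-Y-X Euler angles (radians) -> Hamilton quaternion (w, x, y, z)."""
    cr, sr = math.cos(roll / 2.0), math.sin(roll / 2.0)
    cp, sp = math.cos(pitch / 2.0), math.sin(pitch / 2.0)
    cy, sy = math.cos(yaw / 2.0), math.sin(yaw / 2.0)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )
```

Because Euler conventions differ between datasets (axis order, intrinsic vs extrinsic, degrees vs radians), pinning one convention per helper and converting in each adapter keeps the mistake local to a single file.
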
**First real-run evidence:**
- **VPAIR sample** (2026-04-16, 200 fixed-wing nadir frames): pipeline completes without crashing, ATE RMSE ~1770 km — VO alone diverges because VPAIR ships poses only, no raw IMU, so ESKF never engages.
- **EuRoC MH_01** (2026-04-17, first 100 indoor MAV frames, ~30 s wall-time): pipeline completes, ATE RMSE ~10.9 km. Raw IMU is present (200 Hz ADIS16448), so ESKF *does* run, but the satellite-matching anchor never fires on an indoor scene. It still diverges, but the error is roughly two orders of magnitude smaller than VPAIR's (1770 / 10.9 ≈ 160×), and the underlying cause is different. Both are xfail-gated.
These are the starting measurements. Improvements will come from tuning VO+ESKF for each regime and wiring a substitute anchor (e.g. synthetic ArUco markers for indoor, GPR-against-satellite at the right resolution for VPAIR).
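
For reading the ATE figures recorded above, a minimal sketch of ATE RMSE over time-associated positions follows. The function name and the optional translation-only alignment are assumptions for illustration, not the harness's code. Published EuRoC results (OpenVINS, VINS-Fusion) are typically reported after aligning the estimate to ground truth (e.g. SE(3) or 4-DoF position+yaw), so a like-for-like cross-check needs the same alignment as the paper being compared against.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def ate_rmse(est: Sequence[Vec3], gt: Sequence[Vec3], align_translation: bool = False) -> float:
    """ATE RMSE in the input units (e.g. metres).

    Assumes est[i] and gt[i] are already time-associated and expressed in the
    same world frame (e.g. local ENU). No rotation or scale alignment is done.
    """
    if len(est) != len(gt) or len(est) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    n = len(est)
    offset = [0.0, 0.0, 0.0]
    if align_translation:
        # Optional: remove the mean offset between the two trajectories.
        offset = [sum(g[k] - e[k] for e, g in zip(est, gt)) / n for k in range(3)]
    sq_sum = 0.0
    for e, g in zip(est, gt):
        sq_sum += sum((e[k] + offset[k] - g[k]) ** 2 for k in range(3))
    return math.sqrt(sq_sum / n)
```

A harness test can compare the returned value against a threshold and keep the test under `pytest.mark.xfail` while the pipeline still diverges, then drop the marker once the threshold holds.
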
## Alternatives considered
- **Wait for Denys / internal pilots to collect data.** Rejected: unbounded delay, no guarantee of raw IMU, and nothing we can start on today.
- **Use only EuRoC.** Rejected: indoor micro-MAV dynamics are nothing like a tactical fixed-wing, and it has no place-recognition-against-satellite angle. Good for CI, insufficient for primary validation.
- **Preprocess datasets into a single normalised format and load via one reader.** Rejected on YAGNI — three adapters with a shared ABC stay clearer than a converter pipeline and don't lose dataset-specific metadata.
- **Use Mid-Air (synthetic).** Rejected: its synthetic, extremely low-altitude quadcopter flights add no signal over `SyntheticAdapter`, which we already have for harness self-test.
## References
- Roadmap context: [next_steps.md](../../../next_steps.md) §3
- Harness architecture: [src/gps_denied/testing/README.md](../../../src/gps_denied/testing/README.md)
- Target system: [_docs/01_solution/solution.md](../solution.md), §Testing Strategy
- Low-texture CRITICAL RISK: [_docs/01_solution/solution_draft02.md](../solution_draft02.md)
- Local working drafts (not in repo): `.planning/brainstorms/2026-04-16-e2e-datasets-design.md`, `.planning/brainstorms/2026-04-16-e2e-datasets-plan.md`