mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-22 22:06:37 +00:00
1bf8b2a684
Updates README, testing/README, next_steps.md, and ADR 0001 with the first real EuRoC MH_01 e2e run (100 frames, ~30s wall-time, ATE RMSE ~10.9 km → xfail).

Places the EuRoC result alongside the prior VPAIR baseline (~1770 km) so a future reader can see both failure modes at a glance:

- VPAIR diverges because no raw IMU → ESKF never engages
- EuRoC diverges because indoor scene has no satellite anchor, so VO+ESKF drift without an external correction

Also records the branching policy (rename ``euroc_mh01`` → ``euroc_machine_hall``; empty URL due to DSpace UI gate; manual fetch via DOI 10.3929/ethz-b-000690084).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# ADR 0001: E2E Validation on Public UAV Datasets

**Date:** 2026-04-16
**Status:** Accepted
**Supersedes:** none
**Superseded by:** none

## Context

The `next_steps.md` roadmap (item 3) requires bringing the codebase under the `autopilot existing-code` flow: build an end-to-end test harness that runs the full product as a black box on real UAV data *before* further refactoring (VO → cuVSLAM, `src/gps_denied/` → `src/`). Without that harness, any refactor is a blind change: unit tests would stay green while the integrated pipeline could degrade silently.

The obvious data source, a synchronised IMU + downward-camera log from the target tactical fixed-wing UAV, is not available:

- Denys Popov (Mavic) was asked for logs, but delivery is open-ended.
- In-house Mavic operators would need to be convinced to perform non-mission flights with nadir-oriented cameras at high altitude, which is not their normal pattern.
- The DJI Mavic flight in `/home/yuzviak/Azaion/Data/` has no raw IMU (DJI only exports fused attitude + velocities), so the ESKF path is not exercised.

Blocking the harness until someone flies a mission to spec would cost weeks. Meanwhile, several mature public UAV datasets ship synchronised camera + IMU + ground truth.

## Decision

Build the e2e harness on **public UAV datasets** as the primary validation substrate. Three datasets, three test tiers:

| Tier | Dataset | Role | Platform match |
|---|---|---|---|
| Primary | **VPAIR** (sample) | Fixed-wing, nadir, 300–400 m; closest to the target envelope | Fixed-wing light aircraft ≈ tactical fixed-wing dynamics (constant forward motion, banks) |
| Stress | **MARS-LVIG** | Rotary UAV with explicit *featureless* sequences; stress-tests VO on low-texture terrain, flagged as CRITICAL RISK in `solution_draft02` | DJI M300 RTK (rotary); dynamics mismatch accepted, chosen for the terrain variety |
| CI regression | **EuRoC MH_01** | Industry-standard VIO benchmark; most published VIO algorithms report EuRoC numbers, so harness output can be sanity-checked against the literature | Indoor micro-MAV; dynamics mismatch accepted, chosen for reproducibility and paper-comparability |

Proprietary data collection remains on the roadmap but is deprioritised to a later phase (see `next_steps.md` §4). It becomes relevant once (a) we need to validate on our specific airframe's IMU vibration spectrum and camera intrinsics, or (b) VO+ESKF tuning stabilises and we want a "final" dataset captured at our operational envelope.

All adapters share a single `DatasetAdapter` interface (`src/gps_denied/testing/datasets/base.py`) with capability flags (`has_raw_imu`, `has_rtk_gt`, `has_loop_closures`, `platform_class`), so integration tests auto-skip paths the current adapter can't exercise rather than failing them.

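As a rough illustration of the capability-flag pattern described above, the sketch below shows how an adapter could declare what it provides and how a test helper could decide whether to skip. Only the four flag names and the `DatasetAdapter` name come from this ADR; the `Capabilities` dataclass, the `frames` method, `PosesOnlyAdapter`, and `skip_reason` are hypothetical and are not the actual contents of `base.py`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical container for the four flags named in this ADR."""
    has_raw_imu: bool
    has_rtk_gt: bool
    has_loop_closures: bool
    platform_class: str  # free-form label; the allowed values are an assumption


class DatasetAdapter(ABC):
    """Sketch of a shared adapter interface; the real ABC may look different."""

    @property
    @abstractmethod
    def capabilities(self) -> Capabilities:
        """Describe what this dataset can exercise."""

    @abstractmethod
    def frames(self) -> Iterator[dict]:
        """Yield per-frame records (the record shape is an assumption)."""


class PosesOnlyAdapter(DatasetAdapter):
    """Illustrative adapter for a dataset that ships poses but no raw IMU."""

    @property
    def capabilities(self) -> Capabilities:
        # Only has_raw_imu=False mirrors the ADR; the other values are placeholders.
        return Capabilities(False, False, False, "fixed_wing")

    def frames(self) -> Iterator[dict]:
        return iter(())  # no data in this sketch


def skip_reason(adapter: DatasetAdapter, required: Sequence[str]) -> Optional[str]:
    """Return a skip message if any required boolean flag is False, else None."""
    caps = adapter.capabilities
    missing = [name for name in required if not getattr(caps, name)]
    return f"adapter lacks: {', '.join(missing)}" if missing else None
```

In a pytest test, one could call `pytest.skip(reason)` whenever `skip_reason(adapter, ["has_raw_imu"])` returns a message, so that an ESKF test on a poses-only dataset is reported as skipped rather than failed.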
## Consequences

**Positive:**

- Harness and refactor work unblocked on day one, with no person-dependent delay.
- Three datasets cover three distinct concerns: fixed-wing dynamics (VPAIR), low-texture terrain (MARS-LVIG featureless), indoor benchmark comparability (EuRoC).
- Public numbers for EuRoC mean we can cross-check our ATE against OpenVINS / VINS-Fusion, which catches gross regressions immediately.
- Capability-flag pattern means an adapter that ships poses but not raw IMU (VPAIR) still contributes: VO+GPR+graph paths run, ESKF path is explicitly skipped.

**Negative / accepted trade-offs:**

- VPAIR and MARS-LVIG are academic-use-only licences. Fine for R&D and internal CI, not for shipping benchmarks in commercial material. ALTO (BSD-3) is the escape hatch if commercial-licence benchmarks become a requirement.
- VPAIR sample has no raw IMU, so the full ESKF+VO path is not exercised by it. EuRoC and MARS-LVIG cover that gap.
- The altitude envelope of the public datasets (indoor EuRoC, 80–130 m MARS-LVIG, 300–400 m VPAIR) covers only the low end of the target 200–1500 m tactical envelope; only VPAIR falls inside it. Extrapolating upward is a leap of faith; real-target-altitude validation stays on the roadmap.
- Dataset formats vary wildly (EuRoC ASL, VPAIR ECEF+Euler text, MARS-LVIG ROS bags). Each adapter is custom. Mitigation: shared `coord.py` helpers (ECEF→WGS84, Euler→quaternion) and a small design contract enforced by the ABC.

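The two conversions named for the shared `coord.py` helpers can be sketched as follows. This is a minimal illustration, not the repo's implementation: the function names are hypothetical, the ECEF conversion uses Bowring's closed-form approximation, and the Euler convention (intrinsic Z-Y-X, i.e. yaw, pitch, roll) is an assumption that would need checking against the dataset's own documentation.

```python
import math

# WGS84 defining constants
WGS84_A = 6378137.0                      # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563            # flattening
WGS84_B = WGS84_A * (1.0 - WGS84_F)      # semi-minor axis, metres
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)     # first eccentricity squared
WGS84_EP2 = (WGS84_A**2 - WGS84_B**2) / WGS84_B**2  # second eccentricity squared


def ecef_to_wgs84(x: float, y: float, z: float):
    """ECEF metres -> (lat_deg, lon_deg, alt_m) via Bowring's formula.

    The altitude expression divides by cos(lat), so this sketch is not
    suitable very close to the poles.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)                          # distance from the spin axis
    theta = math.atan2(z * WGS84_A, p * WGS84_B)  # auxiliary (parametric) angle
    lat = math.atan2(
        z + WGS84_EP2 * WGS84_B * math.sin(theta) ** 3,
        p - WGS84_E2 * WGS84_A * math.cos(theta) ** 3,
    )
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)  # prime vertical radius
    alt = p / math.cos(lat) - n
    return math.degrees(lat), math.degrees(lon), alt


def euler_zyx_to_quaternion(roll: float, pitch: float, yaw: float):
    """Radians -> unit quaternion (w, x, y, z), assuming Z-Y-X rotation order."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )
```

Keeping these as pure functions with no dataset-specific state is one way to let every adapter reuse them while the format-specific parsing stays inside the adapter.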
**First real-run evidence:**

- **VPAIR sample** (2026-04-16, 200 fixed-wing nadir frames): pipeline completes without crashing, ATE RMSE ~1770 km. VO alone diverges because VPAIR ships poses only, with no raw IMU, so ESKF never engages.
- **EuRoC MH_01** (2026-04-17, first 100 indoor MAV frames, ~30 s wall-time): pipeline completes, ATE RMSE ~10.9 km. Raw IMU is present (200 Hz ADIS16448), so ESKF *does* run, but the satellite-matching anchor never fires on an indoor scene. The estimate still diverges, with an error roughly two orders of magnitude smaller than VPAIR's and for a different underlying reason. Both runs are xfail-gated.

These are the starting measurements. Improvements will come from tuning VO+ESKF for each regime and wiring a substitute anchor (e.g. synthetic ArUco markers for indoor, GPR-against-satellite at the right resolution for VPAIR).

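For readers unfamiliar with the metric, the sketch below shows one simple way an ATE RMSE figure like those above can be computed from matched position samples. It is an assumption-laden illustration: the function name is hypothetical, it compares positions directly without the trajectory alignment step that many evaluation tools apply first, and it says nothing about how the harness actually computes or gates the number.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def ate_rmse(estimated: Sequence[Vec3], ground_truth: Sequence[Vec3]) -> float:
    """Root-mean-square of per-sample position error, in the input units.

    Assumes both sequences are already time-matched and expressed in the
    same frame; no alignment is performed here.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and the same length")
    squared_error_sum = 0.0
    for est, gt in zip(estimated, ground_truth):
        squared_error_sum += sum((e - g) ** 2 for e, g in zip(est, gt))
    return math.sqrt(squared_error_sum / len(estimated))


def orders_of_magnitude_apart(larger: float, smaller: float) -> float:
    """Base-10 logarithm of the ratio between two positive error figures."""
    return math.log10(larger / smaller)
```

Applied to the two figures above, `orders_of_magnitude_apart(1770, 10.9)` comes out at about 2.2. A test that is expected to fail for now could be marked with `pytest.mark.xfail(reason=...)` and compare the computed RMSE against an error budget; the budget value itself is not specified in this ADR.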
## Alternatives considered

- **Wait for Denys / internal pilots to collect data.** Rejected: unbounded delay, no guarantee of raw IMU, and it gives us nothing we can start on today.
- **Use only EuRoC.** Rejected: indoor micro-MAV dynamics are nothing like a tactical fixed-wing, and it has no place-recognition-against-satellite angle. Good for CI, insufficient for primary validation.
- **Preprocess datasets into a single normalised format and load via one reader.** Rejected on YAGNI: three adapters with a shared ABC stay clearer than a converter pipeline and don't lose dataset-specific metadata.
- **Use Mid-Air (synthetic).** Rejected: it consists of synthetic, extremely low-altitude quadcopter flights and adds no signal over the `SyntheticAdapter`, which we already have for harness self-test.

## References

- Roadmap context: [next_steps.md](../../../next_steps.md) §3
- Harness architecture: [src/gps_denied/testing/README.md](../../../src/gps_denied/testing/README.md)
- Target system: [_docs/01_solution/solution.md](../solution.md), §Testing Strategy
- Low-texture CRITICAL RISK: [_docs/01_solution/solution_draft02.md](../solution_draft02.md)
- Local working drafts (not in repo): `.planning/brainstorms/2026-04-16-e2e-datasets-design.md`, `.planning/brainstorms/2026-04-16-e2e-datasets-plan.md`