gps-denied-onboard/src/gps_denied/testing/README.md
Yuzviak 81ec7c317c docs: record PR #10 — all 5 EuRoC MH baseline numbers
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:19:41 +03:00

# `gps_denied.testing` — E2E Test Harness
Test-only subpackage. Not imported by product code. Runs the full `FlightProcessor` pipeline as a black box on public UAV datasets and compares estimated trajectories against ground truth.
## When to use this
- Adding a new public dataset → implement a `DatasetAdapter` subclass here.
- Debugging the pipeline on real flight-like data → run an e2e test locally with a real dataset in `./datasets/`.
- Guarding a refactor (VO → cuVSLAM, `src/gps_denied/` → `src/`, etc.) → run `pytest tests/e2e/` before and after, and compare the numbers.
Do **not** put production code here and do not import `gps_denied.testing.*` from `gps_denied.core.*` or `gps_denied.api.*`. The import direction is one-way: tests may see the product, the product must not see tests.
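One way to keep the import direction honest is a small AST scan over the product tree. The sketch below is illustrative only: `forbidden_imports` and `scan_product_tree` are hypothetical helpers, not part of the package, and relative imports are deliberately ignored.

```python
from __future__ import annotations

import ast
from pathlib import Path

FORBIDDEN = "gps_denied.testing"


def _is_forbidden(module: str, prefix: str = FORBIDDEN) -> bool:
    # Match the package itself or any submodule, but not e.g. "gps_denied.testing_utils".
    return module == prefix or module.startswith(prefix + ".")


def forbidden_imports(source: str, prefix: str = FORBIDDEN) -> list[str]:
    """Return every absolute import in `source` that reaches into `prefix`."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits += [a.name for a in node.names if _is_forbidden(a.name, prefix)]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if _is_forbidden(node.module, prefix):
                hits.append(node.module)
            else:
                # `from gps_denied import testing` also pulls in the test package.
                hits += [f"{node.module}.{a.name}" for a in node.names
                         if _is_forbidden(f"{node.module}.{a.name}", prefix)]
    return hits


def scan_product_tree(root: Path) -> dict[str, list[str]]:
    """Map each offending file under core/ and api/ (hypothetical root) to its hits."""
    report: dict[str, list[str]] = {}
    for sub in ("core", "api"):
        for path in sorted((root / sub).rglob("*.py")):
            hits = forbidden_imports(path.read_text(encoding="utf-8"))
            if hits:
                report[str(path)] = hits
    return report
```

A unit test could then assert `scan_product_tree(Path("src/gps_denied")) == {}`.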
## Package layout
```
src/gps_denied/testing/
  coord.py          ECEF→WGS84 (Heikkinen closed-form), Euler→quaternion (ZYX aerospace)
  metrics.py        trajectory_rmse, absolute_trajectory_error, relative_pose_error
  harness.py        E2EHarness + HarnessResult
  download.py       DATASET_REGISTRY + SHA256-verified download_dataset()
  datasets/
    base.py         DatasetAdapter ABC, DatasetCapabilities, DatasetFrame/IMU/Pose
    synthetic.py    SyntheticAdapter (harness self-test)
    euroc.py        EuRoCAdapter (ETHZ ASL MAV format)
    vpair.py        VPAIRAdapter (AerVisLoc sample: ECEF + Euler)
    mars_lvig.py    MARSLVIGAdapter (pre-extracted ROS bag layout)
```
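`coord.py` advertises Heikkinen's closed-form ECEF→WGS84 conversion. As a reference point for anyone touching that code, here is an independent standard-library sketch of the published formula with WGS84 constants, plus the forward geodetic→ECEF map for a round-trip check. The function names and the `(lat_deg, lon_deg, h_m)` return order are assumptions and do not claim to match `coord.ecef_to_wgs84`.

```python
import math

# WGS84 ellipsoid.
A = 6378137.0                       # semi-major axis [m]
FLAT = 1.0 / 298.257223563          # flattening
B = A * (1.0 - FLAT)                # semi-minor axis [m]
E2 = FLAT * (2.0 - FLAT)            # first eccentricity squared
EP2 = (A * A - B * B) / (B * B)     # second eccentricity squared


def ecef_to_geodetic(x: float, y: float, z: float) -> tuple[float, float, float]:
    """Heikkinen closed-form ECEF [m] -> (lat_deg, lon_deg, ellipsoidal height m)."""
    p = math.hypot(x, y)
    f_ = 54.0 * B * B * z * z
    g = p * p + (1.0 - E2) * z * z - E2 * (A * A - B * B)
    c = E2 * E2 * f_ * p * p / g ** 3
    s = (1.0 + c + math.sqrt(c * c + 2.0 * c)) ** (1.0 / 3.0)
    k = s + 1.0 + 1.0 / s
    big_p = f_ / (3.0 * k * k * g * g)
    q = math.sqrt(1.0 + 2.0 * E2 * E2 * big_p)
    r0 = (-big_p * E2 * p / (1.0 + q)
          + math.sqrt(0.5 * A * A * (1.0 + 1.0 / q)
                      - big_p * (1.0 - E2) * z * z / (q * (1.0 + q))
                      - 0.5 * big_p * p * p))
    u = math.hypot(p - E2 * r0, z)
    v = math.sqrt((p - E2 * r0) ** 2 + (1.0 - E2) * z * z)
    z0 = B * B * z / (A * v)
    h = u * (1.0 - B * B / (A * v))
    lat = math.atan2(z + EP2 * z0, p)   # atan2 keeps the pole (p == 0) finite
    lon = math.atan2(y, x)
    return math.degrees(lat), math.degrees(lon), h


def geodetic_to_ecef(lat_deg: float, lon_deg: float, h: float) -> tuple[float, float, float]:
    """Forward map, used here only to round-trip the inverse."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)   # prime-vertical radius
    return ((n + h) * math.cos(lat) * math.cos(lon),
            (n + h) * math.cos(lat) * math.sin(lon),
            (n * (1.0 - E2) + h) * math.sin(lat))
```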
Tests live at `tests/e2e/`. Real datasets are expected at repo root in `./datasets/<name>/` (gitignored).
## DatasetAdapter contract
Every adapter is a read-only iterator over one dataset sequence. It has a `name`, declared `capabilities`, and three streams: frames, IMU samples, ground-truth poses. Frames carry a timestamp and an image path; IMU carries body-frame accel+gyro; poses are WGS84 lat/lon/alt plus a unit quaternion.
```python
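from __future__ import annotations

from abc import ABC
from collections.abc import Iterator

# DatasetCapabilities, DatasetFrame, DatasetIMU and DatasetPose live in datasets/base.py.
class DatasetAdapter(ABC):
    @property
    def name(self) -> str: ...  # e.g. "euroc:MH_01"

    @property
    def capabilities(self) -> DatasetCapabilities: ...

    def iter_frames(self) -> Iterator[DatasetFrame]: ...
    def iter_imu(self) -> Iterator[DatasetIMU]: ...
    def iter_ground_truth(self) -> Iterator[DatasetPose]: ...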
```
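To make the contract concrete, here is a toy in-memory adapter. The record classes `Frame`, `Imu` and `Pose` (field names, units, `(w, x, y, z)` quaternion order) are assumptions standing in for the real `DatasetFrame`/`DatasetIMU`/`DatasetPose`, and `capabilities` is omitted for brevity. Note that each `iter_*` call returns a fresh iterator, so a stream can be read more than once.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Frame:                      # hypothetical stand-in for DatasetFrame
    timestamp_ns: int
    image_path: Path


@dataclass(frozen=True)
class Imu:                        # hypothetical stand-in for DatasetIMU
    timestamp_ns: int
    accel: tuple[float, float, float]   # body frame, m/s^2
    gyro: tuple[float, float, float]    # body frame, rad/s


@dataclass(frozen=True)
class Pose:                       # hypothetical stand-in for DatasetPose
    timestamp_ns: int
    lat: float
    lon: float
    alt: float
    quat_wxyz: tuple[float, float, float, float]   # unit quaternion


class InMemoryAdapter:
    """Replays pre-built lists through the three read-only streams."""

    def __init__(self, name: str, frames: Iterable[Frame],
                 imu: Iterable[Imu], poses: Iterable[Pose]) -> None:
        self._name = name
        self._frames, self._imu, self._poses = list(frames), list(imu), list(poses)

    @property
    def name(self) -> str:
        return self._name

    def iter_frames(self) -> Iterator[Frame]:
        return iter(self._frames)

    def iter_imu(self) -> Iterator[Imu]:
        return iter(self._imu)

    def iter_ground_truth(self) -> Iterator[Pose]:
        return iter(self._poses)
```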
If the dataset is not present on disk (or is incomplete), the adapter's `__init__` raises `DatasetNotAvailableError` with an actionable message. Test fixtures catch that and `pytest.skip` — they never fail.
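A minimal sketch of the fail-fast check, assuming an EuRoC/ASL-style sequence directory. The required-file list, the class name `CheckedSequence` and the message wording are illustrative; `DatasetNotAvailableError` is redefined locally so the block runs on its own.

```python
from __future__ import annotations

from pathlib import Path


class DatasetNotAvailableError(RuntimeError):
    """Local stand-in for the package's exception of the same name."""


# Assumed files an EuRoC/ASL-style sequence must contain before the adapter is usable.
REQUIRED_FILES = (
    "mav0/cam0/data.csv",
    "mav0/imu0/data.csv",
    "mav0/state_groundtruth_estimate0/data.csv",
)


class CheckedSequence:
    """Fails in __init__ so a fixture can turn the error into pytest.skip."""

    def __init__(self, root: str | Path) -> None:
        self.root = Path(root)
        missing = [rel for rel in REQUIRED_FILES if not (self.root / rel).is_file()]
        if missing:
            raise DatasetNotAvailableError(
                f"{self.root}: missing {', '.join(missing)}. "
                "Download the sequence and unpack it so these files exist."
            )

# Fixture pattern (tests/e2e/conftest.py):
#     try:
#         return CheckedSequence(path)
#     except DatasetNotAvailableError as exc:
#         pytest.skip(str(exc))
```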
### Capability flags
`DatasetCapabilities` tells tests what to expect. Tests use these flags to skip paths the adapter can't exercise:
| Flag | What it means | Example false case |
|---|---|---|
| `has_raw_imu` | `iter_imu()` yields raw accel+gyro at ≥100 Hz | VPAIR sample (ships 6-DoF poses only) |
| `has_rtk_gt` | Ground-truth positions are RTK-grade (<0.1 m) | EuRoC (Leica laser tracker or Vicon: millimetre-grade, but not RTK) |
| `has_loop_closures` | Trajectory revisits locations (affects GPR expected hit rate) | Most open-field fixed-wing flights |
| `platform_class` | `fixed_wing` / `rotary` / `indoor` / `synthetic` — dynamics differ sharply | — |
When a test needs `has_raw_imu=True` but the adapter reports `False`, the integration test should call `pytest.skip` at the top rather than assert.
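The gating can be expressed as one helper that returns a skip message. `Capabilities` below is a hypothetical mirror of `DatasetCapabilities` built from the table's flag names (field types assumed), and `skip_reason` is an illustrative helper, not part of the package.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:               # hypothetical mirror of DatasetCapabilities
    has_raw_imu: bool
    has_rtk_gt: bool
    has_loop_closures: bool
    platform_class: str           # "fixed_wing" | "rotary" | "indoor" | "synthetic"


def skip_reason(caps: Capabilities, **required: object) -> str | None:
    """Return a skip message if `caps` lacks any required value, else None."""
    unmet = [f"{key}={want!r} (adapter has {getattr(caps, key)!r})"
             for key, want in required.items() if getattr(caps, key) != want]
    return "capabilities not met: " + ", ".join(unmet) if unmet else None

# At the top of an integration test:
#     reason = skip_reason(adapter.capabilities, has_raw_imu=True)
#     if reason:
#         pytest.skip(reason)
```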
## Writing a new adapter — recipe
1. **Decide capabilities first.** Read the dataset's paper/README. Does it ship raw IMU? RTK? What's the platform class?
2. **Add a failing adapter unit test** in `tests/e2e/test_<name>_adapter.py` using a `tmp_path`-based fabricated fixture. Mirror the real file layout (directory names, CSV headers, value ranges).
3. **Implement the adapter.** Reuse `coord.ecef_to_wgs84` and `coord.euler_to_quaternion` if the dataset ships those. Synthesize timestamps if the dataset doesn't have them (e.g. VPAIR — 5 Hz = 200 000 000 ns period).
4. **Add a session-scoped fixture** in `tests/e2e/conftest.py` that looks for the real dataset under `./datasets/<name>/<subdir>/` and skips with an actionable install hint.
5. **Add an integration test** in `tests/e2e/test_<name>.py` with `@pytest.mark.e2e @pytest.mark.needs_dataset` (add `@pytest.mark.e2e_slow` if >2 min). Compare harness output to GT using `metrics.absolute_trajectory_error`. When the pipeline is not yet tuned for the dataset, use `pytest.xfail()` to document the current gap instead of hard failing.
6. **Register SHA256** of the known-good dataset archive in `DATASET_REGISTRY`. Leave `url=""` if downloads are form-gated — the registry then documents the hash without enabling drive-by fetches.
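Steps 3's two conversions can be sketched as follows. `synthetic_timestamps_ns` and `euler_zyx_to_quaternion` are hypothetical names; the quaternion uses the standard ZYX (yaw-pitch-roll) aerospace convention with angles in radians, but the `(w, x, y, z)` output order is an assumption that may differ from `coord.euler_to_quaternion`.

```python
import math


def synthetic_timestamps_ns(n_frames: int, rate_hz: float, start_ns: int = 0) -> list:
    """Evenly spaced timestamps for datasets that ship none (5 Hz -> 200_000_000 ns)."""
    period_ns = round(1e9 / rate_hz)
    return [start_ns + i * period_ns for i in range(n_frames)]


def euler_zyx_to_quaternion(roll: float, pitch: float, yaw: float) -> tuple:
    """ZYX aerospace Euler angles [rad] -> unit quaternion (w, x, y, z)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (cr * cp * cy + sr * sp * sy,
            sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy)
```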
## Harness data flow
```
adapter.iter_frames()       ─┐
adapter.iter_imu()          ─┴─▶ E2EHarness.run() ─▶ FlightProcessor.process_frame() ─▶ collected estimates
adapter.iter_ground_truth() ───────────────────────▶ HarnessResult.ground_truth (ENU metres)
                                                                  │
                                                                  ▼
                                                    metrics.absolute_trajectory_error()
                                                                  │
                                                                  ▼
                                                     RMSE assert or pytest.xfail()
```
The harness owns a minimal `FlightProcessor` built with `MagicMock` repository and SSE streamer, wires in the real `vo/gpr/metric/graph/chunk_mgr/recovery` components via `attach_components()`, and feeds frames sequentially. GPS estimates (`FrameResult.gps`) are collected; both estimate and GT tracks are converted to a local ENU frame rooted at GT pose 0 so trajectory metrics don't depend on the absolute geodetic origin.
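The ENU step and the metric can be sketched with the standard library alone. Assumptions: WGS84 constants, the origin is GT pose 0 given as `(lat, lon, alt)`, estimates are already time-matched to GT samples, and the ATE is a plain unaligned RMSE. Whether `metrics.absolute_trajectory_error` also performs an alignment is not stated here, so treat this as illustrative.

```python
import math

A = 6378137.0                          # WGS84 semi-major axis [m]
FLAT = 1.0 / 298.257223563
E2 = FLAT * (2.0 - FLAT)               # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h):
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
    return ((n + h) * math.cos(lat) * math.cos(lon),
            (n + h) * math.cos(lat) * math.sin(lon),
            (n * (1.0 - E2) + h) * math.sin(lat))


def to_enu(lat_deg, lon_deg, h, origin):
    """East/North/Up metres of a geodetic point relative to origin = (lat, lon, h)."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, h)
    x0, y0, z0 = geodetic_to_ecef(*origin)
    dx, dy, dz = x - x0, y - y0, z - z0
    sl, cl = math.sin(math.radians(origin[0])), math.cos(math.radians(origin[0]))
    so, co = math.sin(math.radians(origin[1])), math.cos(math.radians(origin[1]))
    east = -so * dx + co * dy
    north = -sl * co * dx - sl * so * dy + cl * dz
    up = cl * co * dx + cl * so * dy + sl * dz
    return east, north, up


def ate_rmse(est_enu, gt_enu):
    """Unaligned ATE: RMSE of per-sample position error over time-matched pairs."""
    if len(est_enu) != len(gt_enu) or not est_enu:
        raise ValueError("need equal-length, non-empty, time-matched tracks")
    sq = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(est_enu, gt_enu)]
    return math.sqrt(sum(sq) / len(sq))
```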
## Running
```bash
# Fast: unit + adapter tests, skip anything needing a real dataset
pytest tests/e2e/ -q
# CI tier: run what has a dataset locally, stay under ~30s
pytest tests/e2e/ -m "e2e and not e2e_slow" -v
# Nightly tier: VPAIR, MARS-LVIG, other long runs
pytest tests/e2e/ -m e2e_slow -v
# Inspect the registry / trigger a manual-instruction printout
python scripts/download_dataset.py <name>   # exits 3 and prints fetch steps
                                            # if the entry has an empty URL
```
Markers (`e2e`, `e2e_slow`, `needs_dataset`) are registered in `pyproject.toml`.
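A minimal sketch of the registry check behind `download_dataset.py`. `RegistryEntry`, its `instructions` field and the three helpers are hypothetical; the only behaviour taken from this README is SHA256 verification and exit code 3 for an entry with an empty URL. The return value 0 for a downloadable entry is likewise an assumption.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass(frozen=True)
class RegistryEntry:          # hypothetical shape of a DATASET_REGISTRY value
    url: str                  # "" when downloads are form-gated
    sha256: str               # hex digest of the known-good archive
    instructions: str = ""


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, entry: RegistryEntry, chunk_size: int = 1 << 20) -> None:
    """Stream the file in chunks so multi-GB archives never sit in memory."""
    with path.open("rb") as fh:
        actual = sha256_of_chunks(iter(lambda: fh.read(chunk_size), b""))
    if actual != entry.sha256:
        raise ValueError(f"{path}: sha256 {actual} != expected {entry.sha256}")


def manual_fetch_exit_code(name: str, entry: RegistryEntry) -> int:
    """Print manual steps and return 3 when there is no URL to fetch from."""
    if not entry.url:
        print(f"{name}: no direct URL, fetch manually:\n{entry.instructions}")
        print(f"then check: sha256 == {entry.sha256}")
        return 3
    return 0
```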
## Existing adapters at a glance
| Adapter | Platform | Raw IMU | GT | Real-run status |
|---|---|---|---|---|
| `SyntheticAdapter` | — | yes (zero motion) | exact | smoke test only, always runs |
| `EuRoCAdapter` | indoor MAV | 200 Hz ADIS16448 | Leica MS50 laser tracker (MH seqs) | all 5 MH seqs, 100 frames each: ESKF ATE 0.007–0.205 m (scale=5 mm/frame), **10/10 PASS**; GPS ATE xfail (indoor, no satellite tiles) |
| `VPAIRAdapter` | fixed-wing light aircraft | no (pose-only) | GNSS/INS ~1 m | ran once — ATE ~1770 km, xfail documented; VO alone diverges without anchoring |
| `MARSLVIGAdapter` | rotary (DJI M300 RTK) | yes | RTK | pending (requires pre-extracted ROS bag) |
## References
- Dataset-selection rationale: [ADR 0001](../../../_docs/01_solution/decisions/0001-e2e-dataset-strategy.md)
- Roadmap checklist: [next_steps.md](../../../next_steps.md)
- Target system solution: [_docs/01_solution/solution.md](../../../_docs/01_solution/solution.md), §Testing Strategy