mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-22 09:26:38 +00:00
docs(testing): add architecture guide for the e2e harness subpackage
Explains the DatasetAdapter contract (name/capabilities/iter_*), capability-flag semantics (has_raw_imu, has_rtk_gt, platform_class), the recipe for adding a new adapter (fabricated fixture → adapter → conftest fixture → integration test → registry SHA256), and the current state of each shipped adapter including the VPAIR ~1770 km ATE real-run baseline. Lives next to the code so it stays in sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,119 @@
# `gps_denied.testing` — E2E Test Harness
Test-only subpackage. Not imported by product code. Runs the full `FlightProcessor` pipeline as a black box on public UAV datasets and compares estimated trajectories against ground truth.
## When to use this

- Adding a new public dataset → implement a `DatasetAdapter` subclass here.
- Debugging the pipeline on real flight-like data → run an e2e test locally with a real dataset in `./datasets/`.
- Guarding a refactor (VO → cuVSLAM, `src/gps_denied/` → `src/`, etc.) → run `pytest tests/e2e/` before and after, compare numbers.

Do **not** put production code here and do not import `gps_denied.testing.*` from `gps_denied.core.*` or `gps_denied.api.*`. The import direction is one-way: tests may see the product, the product must not see tests.
## Package layout

```
src/gps_denied/testing/
  coord.py        ECEF→WGS84 (Heikkinen closed-form), Euler→quaternion (ZYX aerospace)
  metrics.py      trajectory_rmse, absolute_trajectory_error, relative_pose_error
  harness.py      E2EHarness + HarnessResult
  download.py     DATASET_REGISTRY + SHA256-verified download_dataset()
  datasets/
    base.py       DatasetAdapter ABC, DatasetCapabilities, DatasetFrame/IMU/Pose
    synthetic.py  SyntheticAdapter (harness self-test)
    euroc.py      EuRoCAdapter (ETHZ ASL MAV format)
    vpair.py      VPAIRAdapter (AerVisLoc sample; ECEF + Euler)
    mars_lvig.py  MARSLVIGAdapter (pre-extracted ROS bag layout)
```
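For orientation, the ZYX aerospace convention listed for `coord.py` composes yaw about Z, then pitch about Y, then roll about X. The following is a minimal sketch of that conversion. The function name, the use of radians, and the `(w, x, y, z)` return order are assumptions made for illustration and may not match the real `coord.euler_to_quaternion`.

```python
from __future__ import annotations

import math


def euler_zyx_to_quaternion(roll: float, pitch: float, yaw: float) -> tuple[float, float, float, float]:
    """Hypothetical helper: ZYX (yaw, pitch, roll) angles in radians -> unit quaternion (w, x, y, z)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    # Standard product q = q_yaw(Z) * q_pitch(Y) * q_roll(X).
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return w, x, y, z


# A pure 90-degree yaw rotates about Z only, so x and y stay zero.
q_yaw90 = euler_zyx_to_quaternion(0.0, 0.0, math.pi / 2)
```

Because each factor is a unit quaternion, the product is already normalized; an adapter would only need to renormalize if it interpolates between poses.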
Tests live at `tests/e2e/`. Real datasets are expected at repo root in `./datasets/<name>/` (gitignored).
## DatasetAdapter contract
Every adapter is a read-only iterator over one dataset sequence. It has a `name`, declared `capabilities`, and three streams: frames, IMU samples, ground-truth poses. Frames carry a timestamp and an image path; IMU carries body-frame accel+gyro; poses are WGS84 lat/lon/alt plus a unit quaternion.
```python
from abc import ABC
from typing import Iterator

# DatasetCapabilities, DatasetFrame, DatasetIMU and DatasetPose live alongside this class in datasets/base.py.


class DatasetAdapter(ABC):
    @property
    def name(self) -> str: ...  # e.g. "euroc:MH_01"

    @property
    def capabilities(self) -> DatasetCapabilities: ...

    def iter_frames(self) -> Iterator[DatasetFrame]: ...
    def iter_imu(self) -> Iterator[DatasetIMU]: ...
    def iter_ground_truth(self) -> Iterator[DatasetPose]: ...
```
If the dataset is not present on disk (or is incomplete), the adapter's `__init__` raises `DatasetNotAvailableError` with an actionable message. Test fixtures catch that error and call `pytest.skip`, so a missing dataset never fails a run.
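The availability check can be sketched as follows. Only `DatasetNotAvailableError` comes from this guide; `ExampleAdapter`, `REQUIRED_ENTRIES`, the file names and `open_or_reason` are hypothetical stand-ins, and the real adapters may check different things.

```python
from __future__ import annotations

from pathlib import Path


class DatasetNotAvailableError(RuntimeError):
    """Raised when a dataset is missing or incomplete on disk."""


class ExampleAdapter:
    """Hypothetical adapter; the required entries below are illustrative only."""

    REQUIRED_ENTRIES = ("poses.csv", "images")

    def __init__(self, root: Path) -> None:
        missing = [name for name in self.REQUIRED_ENTRIES if not (root / name).exists()]
        if missing:
            # Actionable: say what is missing and where it is expected.
            raise DatasetNotAvailableError(
                f"dataset not found at {root} (missing: {', '.join(missing)}); "
                "unpack it under ./datasets/<name>/ and re-run"
            )
        self.root = root


def open_or_reason(root: Path) -> tuple[ExampleAdapter | None, str | None]:
    """Return (adapter, None) on success or (None, skip_reason) when unavailable."""
    try:
        return ExampleAdapter(root), None
    except DatasetNotAvailableError as exc:
        return None, str(exc)
```

A fixture would use the second return value as the skip reason, i.e. pass it to `pytest.skip(reason)` instead of letting the exception propagate.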
### Capability flags

`DatasetCapabilities` tells tests what to expect. Tests use these flags to skip paths the adapter can't exercise:

| Flag | What it means | Example false case |
|---|---|---|
| `has_raw_imu` | `iter_imu()` yields raw accel+gyro at ≥100 Hz | VPAIR sample (ships 6-DoF poses only) |
| `has_rtk_gt` | Ground-truth positions are RTK-grade (<0.1 m) | EuRoC (uses Vicon, millimetre-grade but not RTK) |
| `has_loop_closures` | Trajectory revisits locations (affects GPR expected hit rate) | Most open-field fixed-wing flights |
| `platform_class` | `fixed_wing` / `rotary` / `indoor` / `synthetic`; dynamics differ sharply | n/a |

When a test needs `has_raw_imu=True` but the adapter reports `False`, the integration test should call `pytest.skip` at the top rather than assert.
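A capability gate of this kind might look like the sketch below. `Capabilities` is a stand-in for `DatasetCapabilities` whose fields simply mirror the table above, and `unmet_requirements` is a hypothetical helper; neither is confirmed to match the real definitions in `datasets/base.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """Stand-in for DatasetCapabilities; the field list mirrors the flag table."""

    has_raw_imu: bool
    has_rtk_gt: bool
    has_loop_closures: bool
    platform_class: str


def unmet_requirements(caps: Capabilities, **required: bool) -> list[str]:
    """Return the names of required flags whose value differs from what the test needs."""
    return [flag for flag, wanted in required.items() if getattr(caps, flag) != wanted]


# Illustrative values for a pose-only fixed-wing dataset.
pose_only = Capabilities(
    has_raw_imu=False,
    has_rtk_gt=False,
    has_loop_closures=False,
    platform_class="fixed_wing",
)

missing = unmet_requirements(pose_only, has_raw_imu=True)
if missing:
    reason = f"adapter lacks required capabilities: {', '.join(missing)}"
    # In an integration test this is where pytest.skip(reason) would be called.
```

Computing the reason before any pipeline work keeps the skip cheap and makes the report say which flag was missing.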
## Writing a new adapter — recipe

1. **Decide capabilities first.** Read the dataset's paper/README. Does it ship raw IMU? RTK? What's the platform class?
2. **Add a failing adapter unit test** in `tests/e2e/test_<name>_adapter.py` using a `tmp_path`-based fabricated fixture. Mirror the real file layout (directory names, CSV headers, value ranges).
3. **Implement the adapter.** Reuse `coord.ecef_to_wgs84` and `coord.euler_to_quaternion` if the dataset ships ECEF positions or Euler angles. Synthesize timestamps if the dataset doesn't have them (e.g. VPAIR: 5 Hz, i.e. a 200 000 000 ns period).
4. **Add a session-scoped fixture** in `tests/e2e/conftest.py` that looks for the real dataset under `./datasets/<name>/<subdir>/` and skips with an actionable install hint.
5. **Add an integration test** in `tests/e2e/test_<name>.py` with `@pytest.mark.e2e @pytest.mark.needs_dataset` (add `@pytest.mark.e2e_slow` if >2 min). Compare harness output to GT using `metrics.absolute_trajectory_error`. When the pipeline is not yet tuned for the dataset, use `pytest.xfail()` to document the current gap instead of hard failing.
6. **Register SHA256** of the known-good dataset archive in `DATASET_REGISTRY`. Leave `url=""` if downloads are form-gated; the registry then documents the hash without enabling drive-by fetches.
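Two of the steps above reduce to small helpers: synthesizing timestamps at a fixed rate (step 3) and hashing an archive for the registry (step 6). The sketch below is illustrative only; the names `synth_timestamps_ns`, `sha256_of` and `matches_registry` are assumptions and are not taken from `download.py` or any shipped adapter.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

PERIOD_5HZ_NS = 200_000_000  # 5 Hz, the VPAIR period quoted in step 3


def synth_timestamps_ns(n_frames: int, period_ns: int = PERIOD_5HZ_NS, start_ns: int = 0) -> list[int]:
    """Evenly spaced integer-nanosecond timestamps for a dataset that ships none."""
    return [start_ns + i * period_ns for i in range(n_frames)]


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large archives are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registry(path: Path, expected_sha256: str) -> bool:
    """Compare a file's hash with the registered value, ignoring case and whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Keeping timestamps as integer nanoseconds avoids the drift that accumulates when a float period such as 0.2 s is added repeatedly.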
## Harness data flow

```
adapter.iter_frames()       ─┐
adapter.iter_imu()           ├─▶ E2EHarness.run() ─▶ FlightProcessor.process_frame() ─▶ collected estimates
adapter.iter_ground_truth() ────────────────────▶ HarnessResult.ground_truth (ENU metres)
                                                            │
                                                            ▼
                                          metrics.absolute_trajectory_error()
                                                            │
                                                            ▼
                                              RMSE assert or pytest.xfail()
```
The harness owns a minimal `FlightProcessor` built with a `MagicMock` repository and SSE streamer, wires in the real `vo/gpr/metric/graph/chunk_mgr/recovery` components via `attach_components()`, and feeds it frames sequentially. It collects the GPS estimates (`FrameResult.gps`), then converts both the estimated and GT tracks to a local ENU frame rooted at GT pose 0, so trajectory metrics do not depend on the absolute geodetic origin.
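To make the ENU step concrete, here is a minimal sketch that converts WGS84 tracks to a local ENU frame rooted at the first ground-truth pose and then computes a plain position RMSE. The function names are hypothetical, and the error is computed without any trajectory alignment; the real harness and `metrics.absolute_trajectory_error` may do this differently.

```python
from __future__ import annotations

import math

WGS84_A = 6378137.0                # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563      # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)

Geodetic = tuple[float, float, float]   # lat (deg), lon (deg), alt (m)
Vec3 = tuple[float, float, float]


def geodetic_to_ecef(lat_deg: float, lon_deg: float, alt_m: float) -> Vec3:
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


def to_enu(track: list[Geodetic], origin: Geodetic) -> list[Vec3]:
    """Express each point as (east, north, up) metres relative to `origin`."""
    lat0, lon0 = math.radians(origin[0]), math.radians(origin[1])
    x0, y0, z0 = geodetic_to_ecef(*origin)
    sl, cl = math.sin(lat0), math.cos(lat0)
    so, co = math.sin(lon0), math.cos(lon0)
    out = []
    for point in track:
        x, y, z = geodetic_to_ecef(*point)
        dx, dy, dz = x - x0, y - y0, z - z0
        east = -so * dx + co * dy
        north = -sl * co * dx - sl * so * dy + cl * dz
        up = cl * co * dx + cl * so * dy + sl * dz
        out.append((east, north, up))
    return out


def position_rmse(est: list[Vec3], gt: list[Vec3]) -> float:
    """RMSE of per-pose Euclidean position error; assumes the tracks are already paired."""
    if not est or len(est) != len(gt):
        raise ValueError("tracks must be non-empty and of equal length")
    squared = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(est, gt)]
    return math.sqrt(sum(squared) / len(squared))


# Both tracks share the same origin (GT pose 0), so the error is origin-independent.
gt_track = [(50.0, 30.0, 100.0), (50.001, 30.0, 100.0)]
gt_enu = to_enu(gt_track, gt_track[0])
```

Rooting both tracks at the same pose means a constant offset between estimate and ground truth shows up directly as the RMSE, in metres.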
## Running

```bash
# Fast: unit + adapter tests, skip anything needing a real dataset
pytest tests/e2e/ -q

# CI tier: run what has a dataset locally, stay under ~30s
pytest tests/e2e/ -m "e2e and not e2e_slow" -v

# Nightly tier: VPAIR, MARS-LVIG, other long runs
pytest tests/e2e/ -m e2e_slow -v

# Download a dataset registered in DATASET_REGISTRY with a URL
python scripts/download_dataset.py euroc_mh01
```
Markers (`e2e`, `e2e_slow`, `needs_dataset`) are registered in `pyproject.toml`.
## Existing adapters at a glance

| Adapter | Platform | Raw IMU | GT | Real-run status |
|---|---|---|---|---|
| `SyntheticAdapter` | n/a | yes (zero motion) | exact | smoke test only, always runs |
| `EuRoCAdapter` | indoor MAV | 200 Hz ADIS16448 | Vicon | pending first real run (dataset download in progress) |
| `VPAIRAdapter` | fixed-wing light aircraft | no (pose-only) | GNSS/INS ~1 m | ran once: ATE ~1770 km, xfail documented; VO alone diverges without anchoring |
| `MARSLVIGAdapter` | rotary (DJI M300 RTK) | yes | RTK | pending (requires pre-extracted ROS bag) |
## References

- Dataset-selection rationale: [ADR 0001](../../../_docs/01_solution/decisions/0001-e2e-dataset-strategy.md)
- Roadmap checklist: [next_steps.md](../../../next_steps.md)
- Target system solution: [_docs/01_solution/solution.md](../../../_docs/01_solution/solution.md), §Testing Strategy