gps_denied.testing — E2E Test Harness
Test-only subpackage. Not imported by product code. Runs the full FlightProcessor pipeline as a black box on public UAV datasets and compares estimated trajectories against ground truth.
When to use this
- Adding a new public dataset → implement a DatasetAdapter subclass here.
- Debugging the pipeline on real flight-like data → run an e2e test locally with a real dataset in ./datasets/.
- Guarding a refactor (VO → cuVSLAM, src/gps_denied/ → src/, etc.) → run pytest tests/e2e/ before and after, compare numbers.
Do not put production code here and do not import gps_denied.testing.* from gps_denied.core.* or gps_denied.api.*. The import direction is one-way: tests may see the product, the product must not see tests.
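The one-way import rule can be checked mechanically. The following is a minimal sketch, not part of the package: the helper names (imported_modules, is_forbidden, find_violations) are hypothetical, and it only inspects absolute imports, so relative imports would need extra handling.

```python
import ast
from pathlib import Path
from typing import List, Set, Tuple

FORBIDDEN_PREFIX = "gps_denied.testing"


def imported_modules(source: str) -> Set[str]:
    """Collect dotted names of absolute imports found in a Python source string."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules.add(node.module)
            # Also record "pkg.name" so `from gps_denied import testing` is caught.
            modules.update(f"{node.module}.{alias.name}" for alias in node.names)
    return modules


def is_forbidden(module: str) -> bool:
    """True for gps_denied.testing itself or anything below it."""
    return module == FORBIDDEN_PREFIX or module.startswith(FORBIDDEN_PREFIX + ".")


def find_violations(package_dir: Path) -> List[Tuple[Path, str]]:
    """List (file, module) pairs where product code imports the test subpackage."""
    violations = []
    for path in sorted(package_dir.rglob("*.py")):
        for module in sorted(imported_modules(path.read_text(encoding="utf-8"))):
            if is_forbidden(module):
                violations.append((path, module))
    return violations
```

A unit test could then assert that find_violations returns an empty list for the core and api package directories.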
Package layout
src/gps_denied/testing/
    coord.py         ECEF→WGS84 (Heikkinen closed-form), Euler→quaternion (ZYX aerospace)
    metrics.py       trajectory_rmse, absolute_trajectory_error, relative_pose_error
    harness.py       E2EHarness + HarnessResult
    download.py      DATASET_REGISTRY + SHA256-verified download_dataset()
    datasets/
        base.py      DatasetAdapter ABC, DatasetCapabilities, DatasetFrame/IMU/Pose
        synthetic.py SyntheticAdapter (harness self-test)
        euroc.py     EuRoCAdapter (ETHZ ASL MAV format)
        vpair.py     VPAIRAdapter (AerVisLoc sample, ECEF + Euler)
        mars_lvig.py MARSLVIGAdapter (pre-extracted ROS bag layout)
Tests live at tests/e2e/. Real datasets are expected at repo root in ./datasets/<name>/ (gitignored).
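As an illustration of the ZYX aerospace convention mentioned for coord.py, here is a minimal sketch of an Euler-to-quaternion conversion. The function name, the argument order (roll, pitch, yaw in radians) and the (w, x, y, z) component order are assumptions; the actual coord.euler_to_quaternion signature may differ.

```python
import math
from typing import Tuple


def euler_zyx_to_quaternion(roll: float, pitch: float, yaw: float) -> Tuple[float, float, float, float]:
    """Convert ZYX (yaw, then pitch, then roll) Euler angles in radians to a unit quaternion.

    Returns (w, x, y, z). Hypothetical helper; not the package's own function.
    """
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)

    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)
```

Zero angles map to the identity quaternion (1, 0, 0, 0), and a pure yaw of 90° maps to (cos 45°, 0, 0, sin 45°).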
DatasetAdapter contract
Every adapter is a read-only iterator over one dataset sequence. It has a name, declared capabilities, and three streams: frames, IMU samples, ground-truth poses. Frames carry a timestamp and an image path; IMU carries body-frame accel+gyro; poses are WGS84 lat/lon/alt plus a unit quaternion.
class DatasetAdapter(ABC):
    @property
    def name(self) -> str: ...                      # e.g. "euroc:MH_01"
    @property
    def capabilities(self) -> DatasetCapabilities: ...
    def iter_frames(self) -> Iterator[DatasetFrame]: ...
    def iter_imu(self) -> Iterator[DatasetIMU]: ...
    def iter_ground_truth(self) -> Iterator[DatasetPose]: ...
If the dataset is not present on disk (or is incomplete), the adapter's __init__ raises DatasetNotAvailableError with an actionable message. Test fixtures catch that and call pytest.skip, so a missing dataset skips the test instead of failing it.
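The contract can be illustrated with a toy adapter. This is a sketch under stated assumptions: the Frame dataclass, the frames.csv layout and the load_or_skip_reason helper are hypothetical stand-ins, and the local DatasetNotAvailableError only mimics the real exception so the example runs on its own.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Optional, Tuple


class DatasetNotAvailableError(RuntimeError):
    """Local stand-in for the harness exception of the same name."""


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for DatasetFrame: a timestamp and an image path."""
    timestamp_ns: int
    image_path: Path


class ToyAdapter:
    """Read-only iterator over a made-up layout: <root>/frames.csv + <root>/images/."""

    def __init__(self, root: Path) -> None:
        self._root = Path(root)
        self._index = self._root / "frames.csv"
        if not self._index.is_file():
            # Fail early with a message that tells the user what to do.
            raise DatasetNotAvailableError(
                f"{self._index} not found; unpack the dataset under {self._root}"
            )

    @property
    def name(self) -> str:
        return f"toy:{self._root.name}"

    def iter_frames(self) -> Iterator[Frame]:
        with self._index.open(newline="") as fh:
            for row in csv.DictReader(fh):
                yield Frame(int(row["timestamp_ns"]), self._root / "images" / row["filename"])


def load_or_skip_reason(root: Path) -> Tuple[Optional[ToyAdapter], Optional[str]]:
    """Return (adapter, None) if available, else (None, reason).

    A pytest fixture would pass the reason to pytest.skip(reason).
    """
    try:
        return ToyAdapter(root), None
    except DatasetNotAvailableError as exc:
        return None, str(exc)
```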
Capability flags
DatasetCapabilities tells tests what to expect. Tests use these flags to skip paths the adapter can't exercise:
| Flag | What it means | Example false case |
|---|---|---|
| has_raw_imu | iter_imu() yields raw accel+gyro at ≥100 Hz | VPAIR sample (ships 6-DoF poses only) |
| has_rtk_gt | Ground-truth positions are RTK-grade (<0.1 m) | EuRoC (uses Vicon, millimetre-grade but not RTK) |
| has_loop_closures | Trajectory revisits locations (affects GPR expected hit rate) | Most open-field fixed-wing flights |
| platform_class | fixed_wing / rotary / indoor / synthetic; dynamics differ sharply | n/a |
When a test needs has_raw_imu=True but the adapter reports False, the integration test should call pytest.skip at the top rather than fail an assertion.
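A capability gate of this kind might look as follows. This is a sketch only: the Capabilities dataclass mirrors the flags in the table but is not the real DatasetCapabilities, and skip_reason is a hypothetical helper whose result a test would hand to pytest.skip.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical mirror of DatasetCapabilities, using the flags from the table."""
    has_raw_imu: bool
    has_rtk_gt: bool
    has_loop_closures: bool
    platform_class: str  # "fixed_wing" | "rotary" | "indoor" | "synthetic"


def missing_capabilities(caps: Capabilities, **required: bool) -> List[str]:
    """Names of required boolean flags that the adapter does not provide."""
    return [flag for flag, needed in required.items() if needed and not getattr(caps, flag)]


def skip_reason(adapter_name: str, caps: Capabilities, **required: bool) -> Optional[str]:
    """Return a skip message, or None when every required flag is satisfied."""
    missing = missing_capabilities(caps, **required)
    if not missing:
        return None
    return f"{adapter_name} lacks: {', '.join(missing)}"


# Illustrative values only; they do not describe a real dataset.
pose_only = Capabilities(has_raw_imu=False, has_rtk_gt=True,
                         has_loop_closures=False, platform_class="fixed_wing")
reason = skip_reason("example:seq", pose_only, has_raw_imu=True)
# In a test body: if reason: pytest.skip(reason)
```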
Writing a new adapter — recipe
- Decide capabilities first. Read the dataset's paper/README. Does it ship raw IMU? RTK? What's the platform class?
- Add a failing adapter unit test in tests/e2e/test_<name>_adapter.py using a tmp_path-based fabricated fixture. Mirror the real file layout (directory names, CSV headers, value ranges).
- Implement the adapter. Reuse coord.ecef_to_wgs84 and coord.euler_to_quaternion if the dataset ships ECEF positions or Euler angles. Synthesize timestamps if the dataset doesn't have them (e.g. VPAIR: 5 Hz, i.e. a 200 000 000 ns period).
- Add a session-scoped fixture in tests/e2e/conftest.py that looks for the real dataset under ./datasets/<name>/<subdir>/ and skips with an actionable install hint.
- Add an integration test in tests/e2e/test_<name>.py with @pytest.mark.e2e @pytest.mark.needs_dataset (add @pytest.mark.e2e_slow if >2 min). Compare harness output to GT using metrics.absolute_trajectory_error. When the pipeline is not yet tuned for the dataset, use pytest.xfail() to document the current gap instead of hard failing.
- Register the SHA256 of the known-good dataset archive in DATASET_REGISTRY. Leave url="" if downloads are form-gated; the registry then documents the hash without enabling drive-by fetches.
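Two of the recipe steps are easy to sketch in isolation: synthesizing timestamps for a fixed-rate dataset and checking an archive against a registered SHA256. The function names below are hypothetical and do not reflect the actual adapter or download_dataset() code.

```python
import hashlib
from pathlib import Path
from typing import Iterator


def synth_timestamps_ns(n_frames: int, rate_hz: int, start_ns: int = 0) -> Iterator[int]:
    """Yield evenly spaced integer-nanosecond timestamps for a fixed frame rate.

    Computing i * 1e9 // rate per index avoids accumulating rounding drift
    when the period is not a whole number of nanoseconds.
    """
    for i in range(n_frames):
        yield start_ns + (i * 1_000_000_000) // rate_hz


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large archives fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(path: Path, expected_sha256: str) -> bool:
    """Compare a file's hash with the registered value, ignoring hex case."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

At 5 Hz the first three timestamps are 0, 200 000 000 and 400 000 000 ns, which matches the period quoted in the recipe.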
Harness data flow
adapter.iter_frames() ─┐
adapter.iter_imu() ├─▶ E2EHarness.run() ─▶ FlightProcessor.process_frame() ─▶ collected estimates
adapter.iter_ground_truth() ────────────────────▶ HarnessResult.ground_truth (ENU metres)
│
▼
metrics.absolute_trajectory_error()
│
▼
RMSE assert or pytest.xfail()
The harness owns a minimal FlightProcessor built with MagicMock stand-ins for the repository and SSE streamer, wires in the real vo/gpr/metric/graph/chunk_mgr/recovery components via attach_components(), and feeds frames sequentially. GPS estimates (FrameResult.gps) are collected; both estimate and GT tracks are converted to a local ENU frame rooted at GT pose 0 so trajectory metrics don't depend on the absolute geodetic origin.
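The ENU conversion and the error metric at the end of the flow can be sketched with the standard WGS84 formulas. This is an illustrative sketch, not the code in harness.py or metrics.py: the function names are hypothetical, and ate_rmse here is a plain positional RMSE over index-matched samples with no trajectory alignment step.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

WGS84_A = 6378137.0                    # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg: float, lon_deg: float, alt_m: float) -> Vec3:
    """WGS84 latitude/longitude (degrees) and ellipsoidal height (m) to ECEF metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return (x, y, z)


def geodetic_to_enu(point: Vec3, origin: Vec3) -> Vec3:
    """East/north/up offset in metres of `point` relative to `origin` (both lat, lon, alt)."""
    px, py, pz = geodetic_to_ecef(*point)
    ox, oy, oz = geodetic_to_ecef(*origin)
    dx, dy, dz = px - ox, py - oy, pz - oz
    lat, lon = math.radians(origin[0]), math.radians(origin[1])
    east = -math.sin(lon) * dx + math.cos(lon) * dy
    north = (-math.sin(lat) * math.cos(lon) * dx
             - math.sin(lat) * math.sin(lon) * dy
             + math.cos(lat) * dz)
    up = (math.cos(lat) * math.cos(lon) * dx
          + math.cos(lat) * math.sin(lon) * dy
          + math.sin(lat) * dz)
    return (east, north, up)


def ate_rmse(estimate: Sequence[Vec3], ground_truth: Sequence[Vec3]) -> float:
    """RMSE of per-sample position error between two index-matched ENU tracks."""
    if not estimate or len(estimate) != len(ground_truth):
        raise ValueError("tracks must be non-empty and of equal length")
    squared = [sum((a - b) ** 2 for a, b in zip(e, g))
               for e, g in zip(estimate, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))
```

Rooting both tracks at GT pose 0 means the first ground-truth sample maps to (0, 0, 0), so the metric reflects relative trajectory shape rather than the absolute geodetic origin.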
Running
# Fast: unit + adapter tests, skip anything needing a real dataset
pytest tests/e2e/ -q
# CI tier: run what has a dataset locally, stay under ~30s
pytest tests/e2e/ -m "e2e and not e2e_slow" -v
# Nightly tier: VPAIR, MARS-LVIG, other long runs
pytest tests/e2e/ -m e2e_slow -v
# Inspect the registry / trigger a manual-instruction printout
python scripts/download_dataset.py <name> # exits 3 and prints fetch steps
# if the entry has an empty URL
Markers (e2e, e2e_slow, needs_dataset) are registered in pyproject.toml.
Existing adapters at a glance
| Adapter | Platform | Raw IMU | GT | Real-run status |
|---|---|---|---|---|
| SyntheticAdapter | n/a | yes (zero motion) | exact | smoke test only, always runs |
| EuRoCAdapter | indoor MAV | 200 Hz ADIS16448 | Vicon | ran on first 100 frames: vo_success 99/100 (ORB); ESKF not yet initialized in harness (no start_gps); ATE xfail documented |
| VPAIRAdapter | fixed-wing light aircraft | no (pose-only) | GNSS/INS ~1 m | ran once: ATE ~1770 km, xfail documented; VO alone diverges without anchoring |
| MARSLVIGAdapter | rotary (DJI M300 RTK) | yes | RTK | pending (requires pre-extracted ROS bag) |
References
- Dataset-selection rationale: ADR 0001
- Roadmap checklist: next_steps.md
- Target system solution: _docs/01_solution/solution.md, §Testing Strategy