gps-denied-onboard/src/gps_denied/testing/README.md
Yuzviak be7eb338c1 docs(testing): add architecture guide for the e2e harness subpackage
Explains the DatasetAdapter contract (name/capabilities/iter_*),
capability-flag semantics (has_raw_imu, has_rtk_gt, platform_class),
the recipe for adding a new adapter (fabricated fixture → adapter →
conftest fixture → integration test → registry SHA256), and the
current state of each shipped adapter including the VPAIR ~1770 km
ATE real-run baseline. Lives next to the code so it stays in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 13:55:17 +03:00

gps_denied.testing — E2E Test Harness

Test-only subpackage. Not imported by product code. Runs the full FlightProcessor pipeline as a black box on public UAV datasets and compares estimated trajectories against ground truth.

When to use this

  • Adding a new public dataset → implement a DatasetAdapter subclass here.
  • Debugging the pipeline on real flight-like data → run an e2e test locally with a real dataset in ./datasets/.
  • Guarding a refactor (VO → cuVSLAM, src/gps_denied/src/, etc.) → run pytest tests/e2e/ before and after, compare numbers.

Do not put production code here and do not import gps_denied.testing.* from gps_denied.core.* or gps_denied.api.*. The import direction is one-way: tests may see the product, the product must not see tests.
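The one-way import rule can be checked mechanically. The following is a minimal sketch, not part of the harness: the helper name `forbidden_imports` is hypothetical, and it only inspects absolute imports (relative imports are ignored).

```python
import ast

FORBIDDEN_PREFIX = "gps_denied.testing"  # the test-only subpackage


def forbidden_imports(source: str, prefix: str = FORBIDDEN_PREFIX) -> list[str]:
    """Return every absolute import in `source` that resolves under `prefix`."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            module = node.module or ""
            # Covers both `from gps_denied.testing import x`
            # and `from gps_denied import testing`.
            names = [module] + [f"{module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import (out of scope here)
        hits += [n for n in names if n == prefix or n.startswith(prefix + ".")]
    return hits
```

A guard test could read each `.py` file under the product packages, call this helper on the file's text, and assert that the result is empty.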

Package layout

src/gps_denied/testing/
  coord.py                   ECEF→WGS84 (Heikkinen closed-form), Euler→quaternion (ZYX aerospace)
  metrics.py                 trajectory_rmse, absolute_trajectory_error, relative_pose_error
  harness.py                 E2EHarness + HarnessResult
  download.py                DATASET_REGISTRY + SHA256-verified download_dataset()
  datasets/
    base.py                  DatasetAdapter ABC, DatasetCapabilities, DatasetFrame/IMU/Pose
    synthetic.py             SyntheticAdapter (harness self-test)
    euroc.py                 EuRoCAdapter (ETHZ ASL MAV format)
    vpair.py                 VPAIRAdapter (AerVisLoc sample — ECEF + Euler)
    mars_lvig.py             MARSLVIGAdapter (pre-extracted ROS bag layout)

Tests live at tests/e2e/. Real datasets are expected at repo root in ./datasets/<name>/ (gitignored).
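For orientation, the Euler→quaternion conversion listed for coord.py can be sketched as below. This is the textbook ZYX (yaw, pitch, roll) formula, not a copy of coord.euler_to_quaternion; the function name, the argument order, the use of radians, and the (w, x, y, z) component order are all assumptions.

```python
import math


def euler_zyx_to_quaternion(
    roll: float, pitch: float, yaw: float
) -> tuple[float, float, float, float]:
    """Z-Y-X (yaw, pitch, roll) Euler angles in radians -> unit quaternion (w, x, y, z)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)
```

A pure 90° yaw gives a quaternion with equal w and z components and zero x and y, which is a quick sanity check when a dataset's angle convention is unclear.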

DatasetAdapter contract

Every adapter is a read-only iterator over one dataset sequence. It has a name, declared capabilities, and three streams: frames, IMU samples, ground-truth poses. Frames carry a timestamp and an image path; IMU carries body-frame accel+gyro; poses are WGS84 lat/lon/alt plus a unit quaternion.

class DatasetAdapter(ABC):
    @property
    def name(self) -> str: ...                      # e.g. "euroc:MH_01"

    @property
    def capabilities(self) -> DatasetCapabilities: ...

    def iter_frames(self) -> Iterator[DatasetFrame]: ...
    def iter_imu(self) -> Iterator[DatasetIMU]: ...
    def iter_ground_truth(self) -> Iterator[DatasetPose]: ...

If the dataset is missing from disk or incomplete, the adapter's __init__ raises DatasetNotAvailableError with an actionable message. Test fixtures catch that error and call pytest.skip, so a missing dataset never fails the suite.
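The contract and the missing-dataset behaviour can be illustrated with a toy adapter. This is a sketch under stated assumptions: every class below is a stand-in defined locally (the real types live in datasets/base.py), `FolderAdapter` is hypothetical, the field names `timestamp_ns` and `image_path` are guesses, and only the frame stream is shown.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


class DatasetNotAvailableError(RuntimeError):
    """Stand-in for the harness exception of the same name."""


@dataclass(frozen=True)
class DatasetFrame:  # stand-in: a timestamp plus an image path
    timestamp_ns: int
    image_path: Path


class FrameOnlyAdapter(ABC):  # trimmed stand-in for DatasetAdapter
    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def iter_frames(self) -> Iterator[DatasetFrame]: ...


class FolderAdapter(FrameOnlyAdapter):
    """Hypothetical adapter: one directory of PNG frames at a fixed rate."""

    def __init__(self, root: Path, period_ns: int = 200_000_000) -> None:
        images = sorted(root.glob("*.png")) if root.is_dir() else []
        if not images:
            # Fail early, and tell the user what to do about it.
            raise DatasetNotAvailableError(
                f"No frames found under {root}. Download the dataset and unpack it there."
            )
        self._root, self._images, self._period_ns = root, images, period_ns

    @property
    def name(self) -> str:
        return f"folder:{self._root.name}"

    def iter_frames(self) -> Iterator[DatasetFrame]:
        # Timestamps are synthesized from the frame index.
        for i, path in enumerate(self._images):
            yield DatasetFrame(timestamp_ns=i * self._period_ns, image_path=path)
```

A fixture in tests/e2e/conftest.py would wrap the constructor in try/except DatasetNotAvailableError and pass the message to pytest.skip.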

Capability flags

DatasetCapabilities tells tests what to expect. Tests use these flags to skip paths the adapter can't exercise:

| Flag | What it means | Example false case |
|---|---|---|
| has_raw_imu | iter_imu() yields raw accel+gyro at ≥100 Hz | VPAIR sample (ships 6-DoF poses only) |
| has_rtk_gt | Ground-truth positions are RTK-grade (<0.1 m) | EuRoC (Vicon or Leica laser-tracker ground truth, millimetre-grade but not RTK) |
| has_loop_closures | Trajectory revisits locations (affects GPR expected hit rate) | Most open-field fixed-wing flights |
| platform_class | fixed_wing / rotary / indoor / synthetic; dynamics differ sharply | n/a |

When a test needs has_raw_imu=True but the adapter reports False, the integration test should call pytest.skip at the top of the test body rather than fail an assertion.
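One way to express that check is a small helper that turns missing flags into a skip message. This is a sketch only: the dataclass is a local stand-in whose field names follow the flags above, and `skip_reason` is a hypothetical helper, not part of the harness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetCapabilities:  # stand-in; the real class lives in datasets/base.py
    has_raw_imu: bool
    has_rtk_gt: bool
    has_loop_closures: bool
    platform_class: str  # "fixed_wing", "rotary", "indoor" or "synthetic"


def skip_reason(caps: DatasetCapabilities, *required: str) -> Optional[str]:
    """Return a message naming the required boolean flags that are False, else None."""
    missing = [flag for flag in required if not getattr(caps, flag)]
    return f"adapter lacks: {', '.join(missing)}" if missing else None
```

At the top of an integration test, a non-None result from `skip_reason(adapter.capabilities, "has_raw_imu")` would be passed straight to pytest.skip.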

Writing a new adapter — recipe

  1. Decide capabilities first. Read the dataset's paper/README. Does it ship raw IMU? RTK? What's the platform class?
  2. Add a failing adapter unit test in tests/e2e/test_<name>_adapter.py using a tmp_path-based fabricated fixture. Mirror the real file layout (directory names, CSV headers, value ranges).
  3. Implement the adapter. Reuse coord.ecef_to_wgs84 and coord.euler_to_quaternion if the dataset ships those. Synthesize timestamps if the dataset doesn't have them (e.g. VPAIR — 5 Hz = 200 000 000 ns period).
  4. Add a session-scoped fixture in tests/e2e/conftest.py that looks for the real dataset under ./datasets/<name>/<subdir>/ and skips with an actionable install hint.
  5. Add an integration test in tests/e2e/test_<name>.py with @pytest.mark.e2e @pytest.mark.needs_dataset (add @pytest.mark.e2e_slow if >2 min). Compare harness output to GT using metrics.absolute_trajectory_error. When the pipeline is not yet tuned for the dataset, use pytest.xfail() to document the current gap instead of hard failing.
  6. Register SHA256 of the known-good dataset archive in DATASET_REGISTRY. Leave url="" if downloads are form-gated — the registry then documents the hash without enabling drive-by fetches.
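Two of the steps above lend themselves to a short sketch: synthesizing timestamps (step 3) and checking an archive against its registered hash (step 6). The function names here are hypothetical and the layout of DATASET_REGISTRY entries is not shown in this guide, so the comparison takes the expected hash as a plain string.

```python
import hashlib
from pathlib import Path


def synthesize_timestamps_ns(n_frames: int, rate_hz: float) -> list[int]:
    """Evenly spaced integer-nanosecond timestamps starting at 0."""
    period_ns = round(1_000_000_000 / rate_hz)  # 5 Hz -> 200_000_000 ns
    return [i * period_ns for i in range(n_frames)]


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large archive never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registry(path: Path, expected_sha256: str) -> bool:
    """Compare a file's hash to the registered one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

Multiplying the frame index by an integer period, rather than accumulating a float, keeps the synthesized timestamps free of drift.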

Harness data flow

adapter.iter_frames()  ─┐
adapter.iter_imu()      ├─▶ E2EHarness.run() ─▶ FlightProcessor.process_frame() ─▶ collected estimates
adapter.iter_ground_truth() ────────────────────▶ HarnessResult.ground_truth (ENU metres)
                                                          │
                                                          ▼
                                             metrics.absolute_trajectory_error()
                                                          │
                                                          ▼
                                         RMSE assert or pytest.xfail()

The harness owns a minimal FlightProcessor built with a MagicMock repository and SSE streamer. It wires in the real vo/gpr/metric/graph/chunk_mgr/recovery components via attach_components() and feeds frames sequentially. GPS estimates (FrameResult.gps) are collected, and both the estimate and GT tracks are converted to a local ENU frame rooted at GT pose 0, so trajectory metrics don't depend on the absolute geodetic origin.
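The last two stages of that flow (ENU conversion, then trajectory error) can be sketched as follows. This is an illustration, not the harness code: it assumes a spherical-Earth, small-area approximation for the ENU step, index-matched poses, and no alignment step. The actual metrics.absolute_trajectory_error may differ on all three points, and both function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean radius; adequate only for short tracks


def to_local_enu(track, origin):
    """Approximate (lat_deg, lon_deg, alt_m) points as (east, north, up) metres from origin."""
    lat0, lon0, alt0 = origin
    cos_lat0 = math.cos(math.radians(lat0))
    return [
        (
            math.radians(lon - lon0) * cos_lat0 * EARTH_RADIUS_M,
            math.radians(lat - lat0) * EARTH_RADIUS_M,
            alt - alt0,
        )
        for lat, lon, alt in track
    ]


def ate_rmse(estimate, ground_truth):
    """RMSE of 3-D position error over index-matched poses, with no alignment."""
    if len(estimate) != len(ground_truth) or not estimate:
        raise ValueError("tracks must be non-empty and the same length")
    squared = [math.dist(e, g) ** 2 for e, g in zip(estimate, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))
```

Rooting both tracks at the same GT pose means a constant offset in the estimate shows up directly in the error instead of being absorbed by the choice of origin.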

Running

# Fast: unit + adapter tests, skip anything needing a real dataset
pytest tests/e2e/ -q

# CI tier: run what has a dataset locally, stay under ~30s
pytest tests/e2e/ -m "e2e and not e2e_slow" -v

# Nightly tier: VPAIR, MARS-LVIG, other long runs
pytest tests/e2e/ -m e2e_slow -v

# Download a dataset registered in DATASET_REGISTRY with a URL
python scripts/download_dataset.py euroc_mh01

Markers (e2e, e2e_slow, needs_dataset) are registered in pyproject.toml.

Existing adapters at a glance

| Adapter | Platform | Raw IMU | GT | Real-run status |
|---|---|---|---|---|
| SyntheticAdapter | | yes (zero motion) | exact | smoke test only, always runs |
| EuRoCAdapter | indoor MAV | 200 Hz ADIS16448 | Vicon / Leica laser tracker | pending first real run (dataset download in progress) |
| VPAIRAdapter | fixed-wing light aircraft | no (pose-only) | GNSS/INS ~1 m | ran once: ATE ~1770 km, xfail documented; VO alone diverges without anchoring |
| MARSLVIGAdapter | rotary (DJI M300 RTK) | yes | RTK | pending (requires pre-extracted ROS bag) |

References