gps-denied-onboard/_docs/02_document/tests/e2e-test-suite.md

# E2E Test Suite

## Scope

The e2e test suite is separate test tooling, not part of the onboard runtime. It drives black-box replay, public dataset, SITL, Jetson, and representative validation through public runtime interfaces only.

## Purpose

- Feed navigation frames, telemetry traces, cache manifests, and fault triggers into the system under test.
- Validate emitted coordinates, confidence fields, MAVLink `GPS_INPUT`, QGC status, FDR, and generated-tile evidence.
- Produce release evidence without importing runtime internals.

## Ownership

- **Epic**: AZ-217 (E2E Test Suite / test-support work, not product runtime)
- **Owns**:
  - `tests/blackbox/**`
  - `tests/e2e/**`
  - `e2e/replay/**`
  - `e2e/reports/**`
- **Does not own**:
  - `src/**`
  - runtime component internals
  - production deployment code

## Public Interfaces Under Test

| Interface | Protocol / Contract |
|-----------|---------------------|
| Navigation frames | Ordered image/video replay with timestamps |
| FC telemetry | MAVLink replay or generated stream |
| Satellite cache | Local COG + manifest + descriptor fixtures |
| GPS output | MAVLink `GPS_INPUT` |
| Operator status | QGC-visible MAVLink status |
| FDR | Filesystem/database-backed evidence outputs |

## Runner Contract

| Method | Input | Output | Error Types |
|--------|-------|--------|-------------|
| `run_scenario` | `ScenarioRequest` | `ScenarioReport` | `FixtureInvalid`, `RuntimeFailed`, `ThresholdFailed` |
| `validate_fixture` | `FixtureRequest` | `FixtureValidationReport` | `FixtureInvalid` |

```yaml
ScenarioRequest:
  scenario_id: string
  execution_environment: enum(replay, sitl, jetson, representative)
  fixture_paths: list[string]

ScenarioReport:
  scenario_id: string
  result: enum(pass, fail, blocked)
  metrics: object
  artifacts: list[path]
  failure_reason: string optional
```

## Scenario Coverage

| Scenario | Purpose | Evidence |
|----------|---------|----------|
| Still-image accuracy runner | Verify project still-image replay reports frame-center accuracy | Per-image error, aggregate pass rates, covariance, source label, anchor age |
| Synchronized VIO replay runner | Verify Derkachi and public/representative synchronized data drive BASALT/wrapper tests | Fixture alignment, trajectory comparison, VIO registration, latency, covariance calibration |
| Satellite anchor replay runner | Verify VPR and anchor verification scenarios are executable | Retrieval recall, MRE, accepted/rejected anchors, freshness behavior |
| Outlier/sharp-turn/disconnected runner | Verify relocalization resilience scenarios are executable | Degraded-mode timelines and relocalization outcomes |
| Blackout and spoofing runner | Verify total blackout plus spoofing through SITL/replay | Mode-switch timing, covariance growth, failsafe thresholds |
| MAVLink/QGC contract runner | Verify MAVLink output and GCS status assertions | `GPS_INPUT`, WGS84 coordinates, status rate, command ingress |
| Startup/reboot runner | Verify cold-start and companion reboot scenarios | First valid `GPS_INPUT` p95 and FC-state reinitialization |
| Object coordinate contract runner | Verify AI-camera object coordinate request at system boundary | Frame-center-consistent coordinate accuracy and projection bound |
| Tile Manager runner | Verify cache, generated tiles, and storage tests | Cache load, tile write gates, no raw-frame retention, stale rejection, poisoning evidence |

## Release Evidence

The suite assembles CSV, Markdown, MAVLink tlogs, FDR summaries, cache validation reports, and pass/fail metadata into release evidence bundles. Missing public or representative data is reported as `blocked`, not `passed`.

## Non-Responsibilities

- No onboard flight logic.
- No direct estimator, BASALT, wrapper, or tile-manager imports.
- No mutation of runtime internal state.
- No production service APIs.