E2E replay tests (AZ-404 + AZ-835 + cycle-4)
End-to-end regression suite for the gps-denied-replay console-script
(AZ-402). Two distinct entry points live here:
| Entry point | Source | Coverage |
|---|---|---|
| AZ-265 / AZ-404: 60 s Derkachi clip with synthetic tlog | test_derkachi_1min.py | Original AC-1..AC-10 of the replay epic; runs on Tier-1 + Tier-2 |
| AZ-835 / AZ-840: full (tlog, video, calibration) orchestrator | test_az835_e2e_real_flight.py | Tier-2 only; closes the real-flight validation loop end-to-end (extract → seed → FAISS → run → verdict) |
The cycle-4 replay-input redesign (AZ-894 / AZ-895 / AZ-896) replaces
the tlog auto-sync surface with a CSV-driven path; the AZ-265 suite is
the regression net that catches drift in the legacy path during the
deprecation window. See replay_protocol.md Invariants 12-14 for the
authoritative contract.
How to run
AZ-404 Derkachi 60 s suite (Tier-1 + Tier-2)
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/test_derkachi_1min.py -v
Without RUN_REPLAY_E2E=1 the heavy tests skip cleanly. The
unconditional tests (the AC-4a mode-agnosticism scan, the AC-7 skip-gate
self-check, and the helper unit tests in test_helpers.py) still run.
AZ-835 orchestrator test — full (tlog, video, calibration) loop (Tier-2 only)
Closes Epic AZ-835's narrative: given a real-flight .tlog + the
matching nadir video + camera calibration, the orchestrator runs the
7-step pipeline end-to-end and writes a verdict report.
Required inputs (already in-repo for the Derkachi reference fixture):
- .tlog: pymavlink binary log from a real flight. Reference fixture: _docs/00_problem/input_data/flight_derkachi/data_imu.csv (the canonical CSV that _tlog_synth.py reconstructs the tlog from) plus the synthesised tlog the conftest emits at session start.
- Nadir video: _docs/00_problem/input_data/flight_derkachi/*.mp4 (large asset; not always checked in to the workstation clone; pull from the Jetson e2e harness or git LFS if absent).
- Calibration: tests/fixtures/calibration/adti26.json (factory-sheet approximation for the Topotek KHP20S30; real intrinsics still TBD).
Tier-2 invocation (Jetson):
ssh jetson-e2e
cd /workspace/gps-denied-onboard
export RUN_REPLAY_E2E=1
pytest tests/e2e/replay/test_az835_e2e_real_flight.py -v --tb=short -m tier2
AZ-962: docker-compose.test.jetson.yml exports
GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml
automatically and bind-mounts ./configs:/opt/configs:ro, so no
manual env-var export is required when running through
scripts/run-tests-jetson.sh. The YAML at configs/operator_replay.yaml
declares the four blocks the fixture requires (c6 / c7 / c10 / c11);
secrets (SATELLITE_PROVIDER_API_KEY) flow in from .env.test via
the loader's ENV_KEY_MAP. c10_provisioning.backbones is
intentionally empty pending AZ-964 (the orchestrator test will
SKIP at the "no backbones" gate until AZ-964 lands).
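The config gate described above can be sketched as a small check over the parsed YAML. This is a minimal sketch under stated assumptions, not the loader's actual code: it assumes the file has already been parsed into a dict and that top-level keys follow a `<block id>_<name>` pattern. Only c10_provisioning.backbones is named in this README; the function names and the other key names are hypothetical.

```python
from typing import Any, Mapping, Optional

# Block ids the fixture requires, per the paragraph above.
REQUIRED_BLOCK_IDS = ("c6", "c7", "c10", "c11")


def missing_blocks(config: Mapping[str, Any]) -> list[str]:
    """Return the required block ids that no top-level key covers."""
    missing = []
    for block_id in REQUIRED_BLOCK_IDS:
        # Assumption: keys look like "c10_provisioning", i.e. "<id>_<name>".
        covered = any(
            key == block_id or key.startswith(block_id + "_") for key in config
        )
        if not covered:
            missing.append(block_id)
    return missing


def skip_reason(config: Mapping[str, Any]) -> Optional[str]:
    """Return a skip reason, or None when the orchestrator test may proceed."""
    absent = missing_blocks(config)
    if absent:
        return "operator config lacks blocks: " + ", ".join(absent)
    backbones = (config.get("c10_provisioning") or {}).get("backbones")
    if not backbones:
        # Mirrors the "no backbones" gate that holds until AZ-964 lands.
        return "no backbones"
    return None
```

With an empty backbones list the sketch reports "no backbones", which corresponds to the SKIP the orchestrator test is expected to show today.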
The bundled local-development entry point is scripts/run-tests-jetson.sh,
which handles the SSH alias + rsync + remote pytest invocation. See
_docs/02_document/tests/tier2-jetson-testing.md for the harness contract.
Skip gates (in evaluation order):
1. @pytest.mark.tier2: the per-suite Tier-2 plugin gates this off on dev macOS / Tier-1 Docker (matches the AZ-839 / AZ-699 contract).
2. RUN_REPLAY_E2E not in {1, true, yes, on}.
3. gps-denied-replay console-script not on PATH.
4. Real Derkachi video missing or placeholder-sized.
5. operator_pre_flight_setup fixture itself skipped: the downstream consumer inherits the SKIP automatically (pytest's fixture-skip propagation).
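The evaluation order of the gates can be illustrated with a plain function that returns the first gate that fires. This is a sketch, not the suite's implementation: the function name, the size threshold for a "placeholder-sized" video, and the case-insensitive comparison of RUN_REPLAY_E2E are all assumptions.

```python
import shutil
from pathlib import Path
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}
# Hypothetical threshold: anything smaller is treated as a placeholder file.
MIN_VIDEO_BYTES = 1_000_000


def first_skip_reason(
    *,
    tier2_enabled: bool,
    env: Mapping[str, str],
    video_path: Path,
    setup_skipped: bool,
) -> Optional[str]:
    """Return the first gate that fires, in the listed order, or None."""
    if not tier2_enabled:
        return "tier2 marker gated off on this host"
    # Assumption: the comparison ignores case and surrounding whitespace.
    if env.get("RUN_REPLAY_E2E", "").strip().lower() not in TRUTHY:
        return "RUN_REPLAY_E2E not set to a truthy value"
    if shutil.which("gps-denied-replay", path=env.get("PATH")) is None:
        return "gps-denied-replay console-script not on PATH"
    if not video_path.is_file() or video_path.stat().st_size < MIN_VIDEO_BYTES:
        return "real Derkachi video missing or placeholder-sized"
    if setup_skipped:
        return "operator_pre_flight_setup fixture skipped"
    return None
```

Returning a reason string rather than a boolean keeps the skip message specific, which helps when reading a Tier-2 log to see which gate stopped the run.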
Expected runtime on Tier-2 Jetson AGX Orin (cold cache): ≤ 8 min end-to-end (≤ 5 min C3 fixture cold-start budget + ≤ 3 min for the replay + verdict compute). Warm-cache reinvocations within the same compose session: ≤ 60 s.
Verdict report location: _docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md.
The report is emitted ALWAYS, regardless of PASS / FAIL on the AZ-696
AC-3 threshold (≥ 80 % of emissions within 100 m of tlog ground truth).
The success criterion at the fixture level is "honest report exists with
distribution data", not "PASS". The PASS / FAIL line of the report itself
is the operator-facing answer to "did this flight clip localise within
the threshold".
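The AC-3 threshold arithmetic can be sketched as follows. This is an illustration under assumptions, not the code in _helpers.py: it uses a haversine distance on a spherical Earth, hypothetical function names, and treats an empty emission list as FAIL.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; adequate at the 100 m scale


def horizontal_error_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def verdict(
    errors_m: Iterable[float],
    threshold_m: float = 100.0,
    min_fraction: float = 0.80,
) -> Tuple[str, float]:
    """Return ("PASS" | "FAIL", fraction of emissions within threshold_m)."""
    errors = list(errors_m)
    if not errors:
        # Assumption: a run with no emissions cannot pass.
        return "FAIL", 0.0
    fraction = sum(e <= threshold_m for e in errors) / len(errors)
    return ("PASS" if fraction >= min_fraction else "FAIL"), fraction
```

Returning the fraction alongside the verdict matches the intent stated above: the report carries distribution data either way, and PASS / FAIL is only one line of it.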
Imagery source license attribution (dev/research use only)
The Jetson e2e harness's satellite-provider instance downloads tiles
from the Google Maps satellite layer (mt0..mt3.google.com/vt/lyrs=s),
governed by Google Maps Platform Terms of Service. Every tile served by
the harness carries the "Imagery © Google" attribution string.
This is dev/research use only. Production deployment of the
gps-denied-onboard companion against a Google-Maps-sourced
satellite-provider requires either a Google Maps Platform licensing
review or migration to a true CC-BY satellite source on the parent-suite
side (parent-suite ticket TBD; see _docs/02_document/architecture.md
§ satellite-provider integration). The onboard-side seed scripts
(tests/fixtures/derkachi_c6/seed_region.py, seed_route.py) propagate
the attribution into the test fixture's metadata; do not remove it.
Fixture state
| Artifact | Status | Source |
|---|---|---|
| flight_derkachi.mp4 | available | _docs/00_problem/input_data/flight_derkachi/ |
| data_imu.csv | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | _tlog_synth.py reproduces a pymavlink .tlog from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (tests/fixtures/calibration/adti26.json) | The real Topotek KHP20S30 intrinsics are unknown per camera_info.md. AC-3 accuracy depends on this. |
| Operator pre-flight rehearsal | blocked | tests/fixtures/mock-suite-sat-service/ is a bootstrap stub (only GET /healthz); AC-8 skips until the full D-PROJ-2 contract lands. |
Clip range
The first 60 s of the Derkachi flight (Time=0.0 → Time=60.0). The
take-off region exercises the AZ-405 IMU-take-off auto-sync detector;
the cruise region that follows stresses the satellite-anchor + VIO
drift-correction path. To change the trim, edit _CLIP_START_S and
_CLIP_END_S in conftest.py.
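The trim can be pictured as a filter over the CSV rows. The sketch below is an assumption-laden illustration rather than the conftest code: it reuses the constant names mentioned above, assumes the timestamp column is called Time and holds seconds, and treats both clip bounds as inclusive. The function name clip_rows is hypothetical.

```python
import csv
import io
from typing import Iterator, TextIO

# Same names as the constants in conftest.py; values are the 60 s trim above.
_CLIP_START_S = 0.0
_CLIP_END_S = 60.0


def clip_rows(
    csv_file: TextIO,
    start_s: float = _CLIP_START_S,
    end_s: float = _CLIP_END_S,
) -> Iterator[dict]:
    """Yield CSV rows whose Time value lies inside [start_s, end_s]."""
    for row in csv.DictReader(csv_file):
        # Assumption: the timestamp column is named "Time" and holds seconds.
        if start_s <= float(row["Time"]) <= end_s:
            yield row


# Synthetic stand-in for data_imu.csv: 4900 rows at 10 Hz (0.0 .. 489.9 s).
sample = io.StringIO(
    "Time\n" + "\n".join(f"{i / 10:.1f}" for i in range(4900)) + "\n"
)
clipped = list(clip_rows(sample))
print(len(clipped))  # 601 rows: 0.0, 0.1, ..., 60.0 inclusive
```

At 10 Hz an inclusive 60 s window holds 601 samples, which is a useful sanity figure when changing the trim.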
Expected runtime (Tier-1)
| Test | Expected wall clock |
|---|---|
| AC-1 (--pace asap) | ≤ 30 s (Tier-2 only) |
| AC-2 schema match | piggybacks on AC-1 (Tier-2 only) |
| AC-5 determinism | 2 × asap runs (≤ 60 s total; Tier-2 only) |
| AC-6 realtime | 60 s ± 3 s (Tier-2 only) |
| AC-6 asap | ≤ 30 s (Tier-2 only) |
| Total suite | ~6 min wall clock on Tier-2; skips on Mac |
The AC-1 / AC-2 / AC-5 tests share --pace asap runs but each
fixture invocation produces a fresh output file, so they do not
short-circuit each other (preserves AC-5's two-runs-diff guarantee).
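The two-runs-diff idea behind AC-5 can be sketched independently of the replay CLI. The helper below is hypothetical: it takes any callable that writes one output file, runs it twice into separate fresh files, and compares the bytes. Whether the real test compares raw bytes or parsed records is not stated here, so byte comparison is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def two_runs_match(run_once: Callable[[Path], None], work_dir: Path) -> bool:
    """Run twice into separate fresh files and compare the resulting bytes."""
    outputs = [work_dir / "run_a.jsonl", work_dir / "run_b.jsonl"]
    for out in outputs:
        if out.exists():
            out.unlink()  # never reuse an earlier run's output
        run_once(out)
    return file_digest(outputs[0]) == file_digest(outputs[1])
```

In a test, run_once would wrap the --pace asap invocation; deleting any stale output first is what keeps one run from short-circuiting the other.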
AC matrix
| AC | Test | State |
|---|---|---|
| AC-1: exit 0 + JSONL count match | test_ac1_exits_0_jsonl_count_match | Tier-2 (Jetson only) |
| AC-2: JSONL schema match | test_ac2_jsonl_schema_match | Tier-2 (Jetson only) |
| AC-3: ≤ 100 m for 80 % of ticks | test_ac3_within_100m_80pct_of_ticks | Tier-2 (Jetson only) |
| AC-4a: mode-agnosticism AST scan | test_ac4_mode_agnosticism_ast_scan | unconditional |
| AC-4b: encoder byte-equality | test_ac4_encoder_byte_equality | skip (waiting on AZ-558) |
| AC-5: determinism | test_ac5_determinism_two_runs_diff | Tier-2 (Jetson only) |
| AC-6a: realtime 60 s ± 5 % | test_ac6_pace_realtime_60s_within_5pct | Tier-2 (Jetson only) |
| AC-6b: asap ≤ 30 s | test_ac6_pace_asap_under_30s | Tier-2 (Jetson only) |
| AC-7: skip-gate self-check | test_ac7_skip_gate_consistent_with_env_var | unconditional |
| AC-8: operator workflow rehearsal | test_ac8_operator_workflow | skip (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | test_helpers.py::test_ac9_l2_* | unconditional |
| AC-10: README accuracy | this file | live |
Failure-mode cookbook
| Symptom | Likely cause | Fix |
|---|---|---|
| gps-denied-replay console-script not on PATH | package not installed in the test venv | pip install -e . |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (synthesizer is deterministic; non-determinism would be a real bug) |
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground truth source; verify the GLOBAL_POSITION_INT columns are still the AC-3 reference (per flight_derkachi/README.md) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in C5 estimator OR a clock leaked into the runtime | bisect via git log against the C5 / clock modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust _HEAVY_SKIP boundary if it persists |
| tlog missing required messages | _tlog_synth.py lost a message group | check _REQUIRED_MESSAGE_GROUPS in tlog_replay_adapter.py against the synth output |
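For the last row of the cookbook, the comparison between required message groups and the synthesiser output can be sketched as a set check. The group contents below are a hypothetical stand-in; the real _REQUIRED_MESSAGE_GROUPS in tlog_replay_adapter.py may have a different shape. The sketch also assumes a group is satisfied when any one of its message types appears.

```python
from typing import Iterable, Mapping, Sequence

# Hypothetical stand-in for _REQUIRED_MESSAGE_GROUPS; the real contents live in
# tlog_replay_adapter.py and may differ.
EXAMPLE_REQUIRED_GROUPS: Mapping[str, Sequence[str]] = {
    "position": ("GLOBAL_POSITION_INT",),
    "attitude": ("ATTITUDE",),
    "imu": ("RAW_IMU", "SCALED_IMU"),
}


def missing_groups(
    seen_types: Iterable[str],
    required: Mapping[str, Sequence[str]] = EXAMPLE_REQUIRED_GROUPS,
) -> list[str]:
    """Return groups for which none of the accepted message types was seen."""
    seen = set(seen_types)
    return [
        name
        for name, accepted in required.items()
        if not seen.intersection(accepted)
    ]
```

The set of seen types could be collected from the synthesised tlog with pymavlink, for example by opening it with mavutil.mavlink_connection and recording get_type() for each message returned by recv_match() until it returns None.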
Files
tests/e2e/replay/
├── README.md ← this file
├── __init__.py ← package marker + module-level docstring
├── _helpers.py ← parse_jsonl, l2_horizontal_m, match_percentage,
│ CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py ← CSV → tlog generator
├── conftest.py ← derkachi_replay_inputs, replay_runner,
│ operator_pre_flight_setup fixtures
├── test_helpers.py ← unit tests for _helpers (unconditional)
├── test_derkachi_1min.py ← AC-1..AC-8 + AC-7 skip gate + AC-4a AST scan
└── test_az835_e2e_real_flight.py ← AZ-835 / AZ-840 orchestrator test (Tier-2 only)
Follow-up work
- AZ-777: DONE (Phases 1+2 shipped cycle 3). C11 contract adapted, e2e-runner wired against real satellite-provider. Phases 3-5 superseded by Epic AZ-835 children (AZ-839, AZ-840, AZ-841).
- Real Topotek KHP20S30 calibration: still needed for AC-3 accuracy now that AZ-777 has landed (the threshold is ≤ 100 m for 80 % of ticks).
- AZ-558: closes AC-4b (route C8 encoders through MavlinkTransport).
- D-PROJ-2 mock-suite-sat-service: unblocks AC-8 (operator workflow rehearsal).
Epic AZ-835 ticket map
The Tier-2 orchestrator path shipped under Epic AZ-835. Sub-tickets:
| Ticket | Role |
|---|---|
| AZ-836 | TlogRouteExtractor — active-segment trim + Douglas-Peucker coarsen tlog GPS to ≤ N waypoints |
| AZ-838 | SatelliteProviderRouteClient + seed_route.py CLI — POST RouteSpec to satellite-provider, poll mapsReady |
| AZ-839 | C3 operator_pre_flight_setup real fixture — wires C1+C2+C11+C10 against the seeded catalog |
| AZ-840 | C4 E2E orchestrator test — drives the full 7-step pipeline from (tlog, video, calibration) |
| AZ-842 | C6 Docs — replay_protocol.md Invariants 12-14 + architecture.md + this README (cycle-4 rescope) |
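The Douglas-Peucker coarsening named under AZ-836 can be sketched as below. This is a textbook version under assumptions, not TlogRouteExtractor itself: it works on planar (x, y) coordinates in metres (lat/lon would need projecting first), and it reaches the "≤ N waypoints" bound by doubling the tolerance until few enough points remain, which is one possible strategy among several. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) in metres; project lat/lon first


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection so the nearest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Drop points that deviate at most epsilon from the simplified line."""
    if len(points) < 3:
        return list(points)
    index, worst = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], points[0], points[-1])
        if d > worst:
            index, worst = i, d
    if worst <= epsilon:
        return [points[0], points[-1]]
    # Recursive form; very long tracks may need an iterative variant.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


def coarsen_to_max(
    points: Sequence[Point], max_points: int, epsilon: float = 1.0
) -> List[Point]:
    """Double epsilon until at most max_points waypoints remain."""
    if max_points < 2 or epsilon <= 0.0:
        raise ValueError("need max_points >= 2 and epsilon > 0")
    result = douglas_peucker(points, epsilon)
    while len(result) > max_points:
        epsilon *= 2.0
        result = douglas_peucker(points, epsilon)
    return result
```

Doubling the tolerance always terminates because a large enough epsilon collapses the track to its two endpoints; the trade-off is that the final waypoint count can land well below N.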
The cycle-4 replay-input redesign tickets ride alongside the Epic:
| Ticket | Role |
|---|---|
| AZ-894 | CsvReplayInputAdapter — new CSV-driven primary path on the single canonical clock |
| AZ-895 | Auto-sync surface deprecation — tlog adapter reduced to audit-only role |
| AZ-896 | CSV format spec (csv_replay_format.md) + example data_imu.csv |
| AZ-897 | Operator-facing UI (React + Tailwind paired-upload form) — cycle 5+ |