# E2E replay tests (AZ-404)

End-to-end regression suite that runs the `gps-denied-replay` console-script (AZ-402) against the Derkachi 60 s clip and asserts the AZ-265 epic acceptance criteria.

## How to run

```bash
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/ -v
```

Without `RUN_REPLAY_E2E=1` the heavy tests skip cleanly. The unconditional tests (the AC-4a mode-agnosticism scan, the AC-7 skip-gate self-check, and the helper tests in `test_helpers.py`) still run.

## Fixture state

| Artifact | Status | Source |
|----------|--------|--------|
| `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
| `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a `pymavlink` `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 is `xfail`ed until a real calibration ships. |
| Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |

## Clip range

The first 60 s of the Derkachi flight (Time=0.0 → Time=60.0). The take-off region exercises the AZ-405 IMU-take-off auto-sync detector; the cruise region that follows stresses the satellite-anchor + VIO drift-correction path. To change the trim, edit `_CLIP_START_S` and `_CLIP_END_S` in `conftest.py`.

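
As a rough illustration of what the clip trim amounts to, the sketch below keeps only the CSV rows whose timestamp falls inside the clip window. It is not the code in `conftest.py`: the constant names mirror `_CLIP_START_S` / `_CLIP_END_S` but are stand-ins, the `Time` column name is inferred from the range notation above, the `ax` column is invented for the sample, and treating both window ends as inclusive is an assumption.

```python
import csv
import io

# Hypothetical stand-ins for `_CLIP_START_S` / `_CLIP_END_S` in conftest.py.
CLIP_START_S = 0.0
CLIP_END_S = 60.0


def trim_rows(csv_text, start_s=CLIP_START_S, end_s=CLIP_END_S, time_column="Time"):
    """Return the CSV rows whose timestamp lies in [start_s, end_s] (inclusive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if start_s <= float(row[time_column]) <= end_s]


# Tiny synthetic sample; the column layout is an assumption, not the real file.
sample = "Time,ax\n0.0,0.1\n30.0,0.2\n60.0,0.3\n60.1,0.4\n"
clip = trim_rows(sample)
print(len(clip), clip[-1]["Time"])
```

Moving the window to a later part of the flight would then only mean passing different `start_s` / `end_s` values.
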
## Expected runtime (Tier-1)

| Test | Expected wall clock |
|------|---------------------|
| AC-1 (`--pace asap`) | ≤ 30 s |
| AC-2 schema match | piggybacks on AC-1 |
| AC-5 determinism | 2 × asap runs (≤ 60 s total) |
| AC-6 realtime | 60 s ± 3 s |
| AC-6 asap | ≤ 30 s |
| Total suite | ≤ 6 min on Jetson AGX Orin |

The AC-1 / AC-2 / AC-5 tests share `--pace asap` runs but each fixture invocation produces a fresh output file, so they do not short-circuit each other (preserves AC-5's two-runs-diff guarantee).

## AC matrix

| AC | Test | State |
|----|------|-------|
| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | runs on Tier-1 |
| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | runs on Tier-1 |
| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | `xfail` (waiting on real calibration) |
| AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
| AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | `skip` (waiting on AZ-558) |
| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | runs on Tier-1 |
| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | runs on Tier-1 |
| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | runs on Tier-1 |
| AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
| AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | `skip` (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
| AC-10: README accuracy | this file | live |

## Failure-mode cookbook

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `gps-denied-replay console-script not on PATH` | package not installed in the test venv | `pip install -e .` |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (synthesizer is deterministic; non-determinism would be a real bug) |
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground truth source; verify the GLOBAL_POSITION_INT columns are still the AC-3 reference (per `flight_derkachi/README.md`) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in C5 estimator OR a clock leaked into the runtime | bisect via `git log` against the C5 / `clock` modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust `_HEAVY_SKIP` boundary if it persists |
| `tlog missing required messages` | `_tlog_synth.py` lost a message group | check `_REQUIRED_MESSAGE_GROUPS` in `tlog_replay_adapter.py` against the synth output |

## Files

```
tests/e2e/replay/
├── README.md              ← this file
├── __init__.py            ← package marker + module-level docstring
├── _helpers.py            ← parse_jsonl, l2_horizontal_m, match_percentage,
│                            CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py         ← CSV → tlog generator
├── conftest.py            ← derkachi_replay_inputs, replay_runner,
│                            operator_pre_flight_setup fixtures
├── test_helpers.py        ← unit tests for _helpers (unconditional)
└── test_derkachi_1min.py  ← AC-1..AC-8 + AC-7 skip gate + AC-4a AST scan
```

## Follow-up work

* **Real Topotek KHP20S30 calibration**: unblocks AC-3.
* **AZ-558**: closes AC-4b (route C8 encoders through `MavlinkTransport`).
* **D-PROJ-2 mock-suite-sat-service**: unblocks AC-8 (operator workflow rehearsal).
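
Once a real calibration unblocks AC-3, the criterion from the AC matrix (≤ 100 m for 80 % of ticks) reduces to a per-tick horizontal error and a pass fraction. The sketch below shows one way to compute both. It is not the repo's `l2_horizontal_m` / `match_percentage` from `_helpers.py`, whose signatures are not shown here: the function names, the equirectangular approximation, the Earth-radius constant, and the sample coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def horizontal_distance_m(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Equirectangular approximation of the horizontal distance between two
    WGS84 points; adequate for separations of a few hundred metres."""
    lat1, lat2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dlat = lat2 - lat1
    # Scale the longitude difference by the cosine of the mean latitude.
    dlon = math.radians(lon2_deg - lon1_deg) * math.cos((lat1 + lat2) / 2.0)
    return EARTH_RADIUS_M * math.hypot(dlat, dlon)


def fraction_within(errors_m, threshold_m=100.0):
    """Fraction of per-tick errors at or below threshold_m (0.0 if empty)."""
    if not errors_m:
        return 0.0
    return sum(1 for e in errors_m if e <= threshold_m) / len(errors_m)


# Synthetic (estimate, ground truth) pairs, not taken from the Derkachi data.
pairs = [
    ((50.0000, 36.0000), (50.0000, 36.0000)),
    ((50.0005, 36.0000), (50.0000, 36.0000)),
    ((50.0030, 36.0000), (50.0000, 36.0000)),
]
errors = [horizontal_distance_m(*est, *truth) for est, truth in pairs]
print(fraction_within(errors) >= 0.80)
```

In this shape, the AC-3 assertion would compare the returned fraction against 0.80, with ground truth taken from whichever source the test designates.
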