Batch 63 of /autodev replay slice. Adds the AZ-404 E2E test harness against the Derkachi fixture and resolves the AZ-389 dependency phantom (closing AZ-559 Won't Fix).

E2E test (AZ-404)
- tests/e2e/replay/_tlog_synth.py: deterministic CSV->tlog generator (the original Derkachi tlog is not in repo; data_imu.csv is its export, so we round-trip the CSV through pymavlink). Verified: SCALED_IMU2 + ATTITUDE + GPS_RAW_INT + HEARTBEAT round-trip cleanly through mavutil.mavlink_connection.
- tests/e2e/replay/_helpers.py: parse_jsonl, l2_horizontal_m (haversine), match_percentage, CapturingMavlinkTransport (ready for AZ-558 unblock), GroundTruthRow + load_ground_truth_csv.
- tests/e2e/replay/conftest.py: derkachi_replay_inputs (session scope), replay_runner (subprocess fixture per AZ-402 CLI), operator_pre_flight_setup placeholder.
- tests/e2e/replay/test_derkachi_1min.py: 9 tests covering AC-1..AC-8 with AC-7 skip-gate self-check + AC-4a mode-agnosticism AST scan (passes unconditionally, confirms ADR-011 holding).
- tests/e2e/replay/test_helpers.py: 14 unit tests covering AC-9 helper L2 correctness + match_percentage + parse_jsonl + CapturingMavlinkTransport (all unconditional).
- tests/e2e/replay/README.md: AC matrix, fixture state, runtime budget, failure cookbook (AC-10).

AC matrix
- AC-1, AC-2, AC-5, AC-6 implemented and Tier-1 gated on RUN_REPLAY_E2E=1.
- AC-3 (<=100m for 80%) xfail until real Topotek KHP20S30 calibration ships (camera_info.md states intrinsics are unknown).
- AC-4a (mode-agnosticism AST scan) PASSES unconditionally.
- AC-4b (encoder byte-equality) skip until AZ-558 routes C8 bytes through MavlinkTransport.
- AC-7 (skip-gate self-check) PASSES unconditionally.
- AC-8 (operator workflow rehearsal) skip until D-PROJ-2 mock-suite-sat-service implements tile-fetch + index-build endpoints.
- AC-9 (helper L2 correctness) 14 PASSES unconditionally.

AZ-389 housekeeping
- AZ-559 closed Won't Fix: investigation against c6_tile_cache/_types.py confirmed TileSource.ONBOARD_INGEST + TileMetadata.quality_metadata + write_tile's FreshnessRejectionError already cover the mid-flight ingest semantic. The "missing API" was a spec-vs-impl naming mismatch.
- AZ-389 spec rewritten to consume the existing write_tile API + catch FreshnessRejectionError per AC-NEW-3 opportunistic emission.
- _dependencies_table.md reverted: AZ-389 deps -> AZ-303 (was AZ-559 in the previous commit on this branch); total 150 / 497 pts.

Tests
- Full regression: 2099 passed (+14 new e2e/replay), 94 skipped (incl. 8 e2e/replay heavy-tier + documented blocker skips), 3 perf-microbench flakes deselected (test_cli_cold_start_under_2s, test_cold_start_under_500ms_p99, test_nfr_perf_sign_microbench; all pass in isolation - pre-existing under-load flakes on dev macOS).

Reviews
- _docs/03_implementation/reviews/batch_63_review.md: code review PASS_WITH_WARNINGS (3 documented spec-gap deferrals: AC-3, AC-4b, AC-8).
- _docs/03_implementation/cumulative_review_batches_61-63_cycle1_report.md: cumulative review PASS_WITH_WARNINGS. Action items: prioritise AZ-558 (closes AZ-401 AC-9 + AZ-404 AC-4b); consider 2pt hygiene PBI for Protocol-completeness AST scan to catch the AZ-389 / AZ-559 phantom-API pattern at task-prep time.

Architecture invariants observably holding
- ADR-011 (replay-as-configuration): AC-4a's AST scan over src/gps_denied_onboard/components/**/*.py finds zero violations - components branch on neither config.mode nor any synonym.
- Single composition root (replay protocol Invariant 11): AZ-402 CLI dispatches to runtime_root.main(config); does not call compose_root directly.

Co-authored-by: Cursor <cursoragent@cursor.com>
E2E replay tests (AZ-404)
End-to-end regression suite that runs the gps-denied-replay
console-script (AZ-402) against the Derkachi 60 s clip and asserts
the AZ-265 epic acceptance criteria.
How to run
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/ -v
Without RUN_REPLAY_E2E=1 the heavy tests skip cleanly. The
unconditional tests (the AC-4a mode-agnosticism scan, the AC-7 skip-gate
self-check, and the helper unit tests in test_helpers.py) still run.
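The skip gate described above can be sketched as a small predicate over the environment. This is an illustration only: the function names `replay_e2e_enabled` and `skip_reason` are hypothetical, and the exact comparison against the string "1" is an assumption based on the `RUN_REPLAY_E2E=1` invocation shown here, not the suite's actual implementation.

```python
import os


def replay_e2e_enabled(environ=None):
    """Return True only when RUN_REPLAY_E2E is exactly "1" (assumed convention)."""
    env = os.environ if environ is None else environ
    return env.get("RUN_REPLAY_E2E") == "1"


def skip_reason(environ=None):
    """Return None when the heavy tests should run, else a readable skip reason."""
    if replay_e2e_enabled(environ):
        return None
    return "set RUN_REPLAY_E2E=1 to run the heavy replay tests"
```

With pytest, a predicate like this would typically feed a `pytest.mark.skipif(not replay_e2e_enabled(), reason=...)` marker on each heavy test, which leaves the unconditional tests unaffected.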
Fixture state
| Artifact | Status | Source |
|---|---|---|
| flight_derkachi.mp4 | available | _docs/00_problem/input_data/flight_derkachi/ |
| data_imu.csv | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | _tlog_synth.py reproduces a pymavlink .tlog from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (tests/fixtures/calibration/adti26.json) | The real Topotek KHP20S30 intrinsics are unknown per camera_info.md. AC-3 is xfailed until a real calibration ships. |
| Operator pre-flight rehearsal | blocked | tests/fixtures/mock-suite-sat-service/ is a bootstrap stub (only GET /healthz); AC-8 skips until the full D-PROJ-2 contract lands. |
Clip range
The first 60 s of the Derkachi flight (Time=0.0 → Time=60.0). The
take-off region exercises the AZ-405 IMU-take-off auto-sync detector;
the cruise region that follows stresses the satellite-anchor + VIO
drift-correction path. To change the trim, edit _CLIP_START_S and
_CLIP_END_S in conftest.py.
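As a rough illustration of how a trim window like this could be applied to the IMU CSV, the sketch below filters rows by a time column. It assumes the column is literally named `Time` and that both clip bounds are inclusive; neither is confirmed by this README, and `trim_rows` is a hypothetical helper, not the code in conftest.py. The sample data is synthetic.

```python
import csv
import io

# Names mirror the constants mentioned above; the values match the documented clip.
_CLIP_START_S = 0.0
_CLIP_END_S = 60.0


def trim_rows(csv_text, start_s=_CLIP_START_S, end_s=_CLIP_END_S, time_col="Time"):
    """Keep rows whose time falls inside [start_s, end_s] (inclusive by assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if start_s <= float(row[time_col]) <= end_s]


# Synthetic 10 Hz sample covering 0.0 s to 69.9 s; not real Derkachi data.
sample = "Time,ax\n" + "".join(f"{i / 10:.1f},0.0\n" for i in range(700))
clip = trim_rows(sample)
```

At 10 Hz an inclusive 0 to 60 s window holds 601 rows, which is a handy sanity check when changing the trim.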
Expected runtime (Tier-1)
| Test | Expected wall clock |
|---|---|
| AC-1 (--pace asap) | ≤ 30 s |
| AC-2 schema match | piggybacks on AC-1 |
| AC-5 determinism | 2 × asap runs (≤ 60 s total) |
| AC-6 realtime | 60 s ± 3 s |
| AC-6 asap | ≤ 30 s |
| Total suite | ≤ 6 min on Jetson AGX Orin |
The AC-1 / AC-2 / AC-5 tests share --pace asap runs but each
fixture invocation produces a fresh output file, so they do not
short-circuit each other (preserves AC-5's two-runs-diff guarantee).
AC matrix
| AC | Test | State |
|---|---|---|
| AC-1: exit 0 + JSONL count match | test_ac1_exits_0_jsonl_count_match | runs on Tier-1 |
| AC-2: JSONL schema match | test_ac2_jsonl_schema_match | runs on Tier-1 |
| AC-3: ≤ 100 m for 80 % of ticks | test_ac3_within_100m_80pct_of_ticks | xfail (waiting on real calibration) |
| AC-4a: mode-agnosticism AST scan | test_ac4_mode_agnosticism_ast_scan | unconditional |
| AC-4b: encoder byte-equality | test_ac4_encoder_byte_equality | skip (waiting on AZ-558) |
| AC-5: determinism | test_ac5_determinism_two_runs_diff | runs on Tier-1 |
| AC-6a: realtime 60 s ± 5 % | test_ac6_pace_realtime_60s_within_5pct | runs on Tier-1 |
| AC-6b: asap ≤ 30 s | test_ac6_pace_asap_under_30s | runs on Tier-1 |
| AC-7: skip-gate self-check | test_ac7_skip_gate_consistent_with_env_var | unconditional |
| AC-8: operator workflow rehearsal | test_ac8_operator_workflow | skip (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | test_helpers.py::test_ac9_l2_* | unconditional |
| AC-10: README accuracy | this file | live |
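To make the AC-4a row concrete, here is a minimal sketch of a mode-agnosticism scan built on the standard `ast` module. It flags `config.mode` (or `<anything>.config.mode`) only when it appears in the condition of an `if`, `while`, or conditional expression. The helper names are hypothetical and the heuristic is an assumption; the real scan may cover synonyms, `match` statements, and other patterns.

```python
import ast


def _is_config_mode(node):
    """True for attribute accesses shaped like config.mode or x.config.mode."""
    if not (isinstance(node, ast.Attribute) and node.attr == "mode"):
        return False
    owner = node.value
    if isinstance(owner, ast.Name):
        return owner.id == "config"
    return getattr(owner, "attr", None) == "config"


def find_mode_branches(source):
    """Return sorted line numbers where a branch condition reads config.mode."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.IfExp, ast.While)):
            for sub in ast.walk(node.test):
                if _is_config_mode(sub):
                    hits.add(sub.lineno)
    return sorted(hits)


clean = "def step(config, frame):\n    return frame * config.gain\n"
dirty = (
    "def step(self, frame):\n"
    "    if self.config.mode == 'replay':\n"
    "        return None\n"
    "    return frame\n"
)
```

A test in this style would read every component source file, call the scanner, and assert that the combined hit list is empty.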
Failure-mode cookbook
| Symptom | Likely cause | Fix |
|---|---|---|
| gps-denied-replay console-script not on PATH | package not installed in the test venv | pip install -e . |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (synthesizer is deterministic; non-determinism would be a real bug) |
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground truth source; verify the GLOBAL_POSITION_INT columns are still the AC-3 reference (per flight_derkachi/README.md) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in C5 estimator OR a clock leaked into the runtime | bisect via git log against the C5 / clock modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust _HEAVY_SKIP boundary if it persists |
| tlog missing required messages | _tlog_synth.py lost a message group | check _REQUIRED_MESSAGE_GROUPS in tlog_replay_adapter.py against the synth output |
Files
tests/e2e/replay/
├── README.md ← this file
├── __init__.py ← package marker + module-level docstring
├── _helpers.py ← parse_jsonl, l2_horizontal_m, match_percentage,
│ CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py ← CSV → tlog generator
├── conftest.py ← derkachi_replay_inputs, replay_runner,
│ operator_pre_flight_setup fixtures
├── test_helpers.py ← unit tests for _helpers (unconditional)
└── test_derkachi_1min.py ← AC-1..AC-8 + AC-7 skip gate + AC-4a AST scan
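The helper names listed for _helpers.py suggest the following shapes. This is a sketch under stated assumptions, not the repository code: the signatures, the mean Earth radius constant, and the empty-input behaviour of `match_percentage` are all guesses; only the helper names and the use of haversine come from the commit notes.

```python
import json
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumed constant


def parse_jsonl(text):
    """Parse one JSON object per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def l2_horizontal_m(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in metres between two WGS84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp against rounding so asin never sees a value above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def match_percentage(errors_m, threshold_m=100.0):
    """Percentage of ticks whose horizontal error is within threshold_m."""
    if not errors_m:
        return 0.0  # assumed convention for an empty run
    hits = sum(1 for err in errors_m if err <= threshold_m)
    return 100.0 * hits / len(errors_m)
```

Under these assumptions, AC-3 would amount to checking that `match_percentage(errors, 100.0) >= 80.0`, where `errors` holds one `l2_horizontal_m` value per tick against the ground-truth position.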
Follow-up work
- Real Topotek KHP20S30 calibration: unblocks AC-3.
- AZ-558: closes AC-4b (route C8 encoders through MavlinkTransport).
- D-PROJ-2 mock-suite-sat-service: unblocks AC-8 (operator workflow rehearsal).