# E2E replay tests (AZ-404 + AZ-835 + cycle-4)
End-to-end regression suite for the `gps-denied-replay` console-script
(AZ-402). Two distinct entry points live here:

| Entry point | Source | Coverage |
|---|---|---|
| AZ-265 / AZ-404: 60 s Derkachi clip with synthetic tlog | `test_derkachi_1min.py` | Original AC-1..AC-10 of the replay epic; runs on Tier-1 + Tier-2 |
| AZ-835 / AZ-840: full (tlog, video, calibration) orchestrator | `test_az835_e2e_real_flight.py` | Tier-2 only; closes the real-flight validation loop end-to-end (extract → seed → FAISS → run → verdict) |
The cycle-4 replay-input redesign (AZ-894 / AZ-895 / AZ-896) replaces
the tlog auto-sync surface with a CSV-driven path; the AZ-265 suite is
the regression net that catches drift in the legacy path during the
deprecation window. See `replay_protocol.md` Invariants 12-14 for the
authoritative contract.
## How to run
### AZ-404 Derkachi 60 s suite (Tier-1 + Tier-2)
```bash
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/test_derkachi_1min.py -v
```
Without `RUN_REPLAY_E2E=1` the heavy tests skip cleanly. The two
unconditional tests (the AC-4a mode-agnosticism AST scan and the AC-7
skip-gate self-check) still run, as do the helper tests in `test_helpers.py`.
### AZ-835 orchestrator test: full (tlog, video, calibration) loop (Tier-2 only)
Closes Epic AZ-835's narrative: given a real-flight .tlog + the
matching nadir video + camera calibration, the orchestrator runs the
7-step pipeline end-to-end and writes a verdict report.
Required inputs (already in-repo for the Derkachi reference fixture):

- `.tlog`: pymavlink binary log from a real flight. Reference fixture: `_docs/00_problem/input_data/flight_derkachi/data_imu.csv` (the canonical CSV that `_tlog_synth.py` reconstructs the tlog from) plus the synthesised tlog the conftest emits at session start.
- Nadir video: `_docs/00_problem/input_data/flight_derkachi/*.mp4` (large asset; not always checked into the workstation clone, so pull it from the Jetson e2e harness or git LFS if absent).
- Calibration: `tests/fixtures/calibration/adti26.json` (factory-sheet approximation for the Topotek KHP20S30; real intrinsics still TBD).
Tier-2 invocation (Jetson):

```bash
ssh jetson-e2e
cd /workspace/gps-denied-onboard
export RUN_REPLAY_E2E=1
pytest tests/e2e/replay/test_az835_e2e_real_flight.py -v --tb=short -m tier2
```
AZ-962: `docker-compose.test.jetson.yml` exports
`GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml`
automatically and bind-mounts `./configs:/opt/configs:ro`, so no
manual env-var export is required when running through
`scripts/run-tests-jetson.sh`. The YAML at `configs/operator_replay.yaml`
declares the four blocks the fixture requires (c6 / c7 / c10 / c11);
secrets (`SATELLITE_PROVIDER_API_KEY`) flow in from `.env.test` via
the loader's `ENV_KEY_MAP`. `c10_provisioning.backbones` is
intentionally empty pending AZ-964 (the orchestrator test will
SKIP at the "no backbones" gate until AZ-964 lands).
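The "no backbones" gate can be pictured as a check on the parsed operator config. This is a minimal sketch, assuming the YAML has already been loaded into plain dicts and lists and that `c10_provisioning.backbones` is a list; the helper name `backbones_skip_reason` is hypothetical and is not the fixture's real API.

```python
from typing import Any, Mapping, Optional


def backbones_skip_reason(config: Mapping[str, Any]) -> Optional[str]:
    """Return a skip reason when no C10 backbones are provisioned, else None.

    Hypothetical helper: mirrors the documented "no backbones" gate and
    assumes the operator YAML was parsed into nested dicts/lists beforehand.
    """
    # A missing block or an explicit YAML null both count as "nothing provisioned".
    provisioning = config.get("c10_provisioning") or {}
    backbones = provisioning.get("backbones") or []
    if not backbones:
        return "no backbones configured in c10_provisioning (pending AZ-964)"
    return None
```

In a fixture, a non-`None` result would be handed to `pytest.skip(reason)`, so the orchestrator test reports SKIP rather than FAIL while the list stays empty.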
The bundled local-development entry point is `scripts/run-tests-jetson.sh`,
which handles the SSH alias + rsync + remote pytest invocation. See
`_docs/02_document/tests/tier2-jetson-testing.md` for the harness contract.
Skip gates (in evaluation order):

- `@pytest.mark.tier2`: the per-suite Tier-2 plugin gates this off on dev macOS / Tier-1 Docker (matches the AZ-839 / AZ-699 contract).
- `RUN_REPLAY_E2E` not in `{1, true, yes, on}`.
- `gps-denied-replay` console-script not on `PATH`.
- Real Derkachi video missing or placeholder-sized.
- `operator_pre_flight_setup` fixture itself skipped: the downstream consumer inherits the SKIP automatically (pytest's fixture-skip propagation).
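The environment, `PATH`, and video gates in that list can be sketched as one function that returns the first failing reason. This is an illustration under stated assumptions, not the suite's code: the function name `replay_skip_reason` and the `_MIN_VIDEO_BYTES` placeholder threshold are hypothetical, and the `tier2` marker and fixture-skip propagation are left to pytest itself.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional

# Values accepted as "on" for RUN_REPLAY_E2E, per the documented gate.
_TRUTHY = {"1", "true", "yes", "on"}

# Hypothetical threshold: anything smaller is treated as a placeholder video.
_MIN_VIDEO_BYTES = 1_000_000


def replay_skip_reason(video: Path, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first failing skip gate (in evaluation order), or None."""
    env = os.environ if env is None else env
    if env.get("RUN_REPLAY_E2E", "").strip().lower() not in _TRUTHY:
        return "RUN_REPLAY_E2E is not set to one of 1/true/yes/on"
    if shutil.which("gps-denied-replay") is None:
        return "gps-denied-replay console-script not on PATH"
    if not video.is_file() or video.stat().st_size < _MIN_VIDEO_BYTES:
        return f"Derkachi video missing or placeholder-sized: {video}"
    return None
```

Returning a reason string instead of calling `pytest.skip` directly keeps the ordering testable in isolation, which is the same property the AC-7 skip-gate self-check relies on.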
Expected runtime on Tier-2 Jetson AGX Orin (cold cache): ≤ 8 min end-to-end (≤ 5 min C3 fixture cold-start budget + ≤ 3 min for the replay + verdict compute). Warm-cache reinvocations within the same compose session: ≤ 60 s.
Verdict report location: `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
The report is emitted ALWAYS, regardless of PASS / FAIL on the AZ-696
AC-3 threshold (≥ 80 % of emissions within 100 m of tlog ground truth).
The success criterion at the fixture level is "honest report exists with
distribution data", not "PASS". The PASS / FAIL line of the report itself
is the operator-facing answer to "did this flight clip localise within
the threshold".
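The AC-3 PASS / FAIL line reduces to a fraction-within-threshold test. This is a minimal sketch, assuming paired (estimate, ground truth) fixes in WGS84 degrees. It uses an equirectangular approximation of horizontal distance, which is adequate at the ~100 m scale of the threshold but is not necessarily what `l2_horizontal_m` in `_helpers.py` does; the names `horizontal_error_m` and `ac3_verdict` are hypothetical.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def horizontal_error_m(est: LatLon, truth: LatLon) -> float:
    """Equirectangular approximation of horizontal distance in metres."""
    lat1, lon1 = map(math.radians, est)
    lat2, lon2 = map(math.radians, truth)
    # Scale the longitude difference by cos(mean latitude) to get east-west arc.
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2.0)
    y = lat2 - lat1
    return EARTH_RADIUS_M * math.hypot(x, y)


def ac3_verdict(pairs: Sequence[Tuple[LatLon, LatLon]],
                threshold_m: float = 100.0,
                required_fraction: float = 0.80) -> Tuple[float, str]:
    """Return (fraction within threshold, "PASS" or "FAIL")."""
    if not pairs:
        return 0.0, "FAIL"
    within = sum(1 for est, truth in pairs if horizontal_error_m(est, truth) <= threshold_m)
    fraction = within / len(pairs)
    return fraction, "PASS" if fraction >= required_fraction else "FAIL"
```

Returning the fraction alongside the verdict matches the report's contract: the distribution data is always emitted, and PASS / FAIL is only one line of it.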
### Imagery source license attribution (dev/research use only)
The Jetson e2e harness's satellite-provider instance downloads tiles
from the Google Maps satellite layer (`mt0..mt3.google.com/vt/lyrs=s`),
governed by Google Maps Platform Terms of Service. Every tile served by
the harness carries the "Imagery © Google" attribution string.
This is dev/research use only. Production deployment of the
gps-denied-onboard companion against a Google-Maps-sourced
satellite-provider requires either a Google Maps Platform licensing
review or migration to a true CC-BY satellite source on the parent-suite
side (parent-suite ticket TBD; see `_docs/02_document/architecture.md`
§ satellite-provider integration). The onboard-side seed scripts
(`tests/fixtures/derkachi_c6/seed_region.py`, `seed_route.py`) propagate
the attribution into the test fixture's metadata; do not remove it.
## Fixture state
| Artifact | Status | Source |
|---|---|---|
| `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
| `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a pymavlink `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 accuracy depends on this. |
| Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |
## Clip range
The first 60 s of the Derkachi flight (`Time=0.0` → `Time=60.0`). The
take-off region exercises the AZ-405 IMU-take-off auto-sync detector;
the cruise region that follows stresses the satellite-anchor + VIO
drift-correction path. To change the trim, edit `_CLIP_START_S` and
`_CLIP_END_S` in `conftest.py`.
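The trim amounts to a filter on the CSV's `Time` column. This is a minimal sketch, assuming the column is literally named `Time` and that the window is half-open `[start, end)`; whether `conftest.py` includes the 60.0 s sample is not stated here. The constant and function names are hypothetical stand-ins for `_CLIP_START_S` / `_CLIP_END_S`.

```python
import csv
from typing import Dict, Iterable, List

# Hypothetical mirrors of the documented defaults in conftest.py.
CLIP_START_S = 0.0
CLIP_END_S = 60.0


def trim_rows(rows: Iterable[Dict[str, str]],
              start_s: float = CLIP_START_S,
              end_s: float = CLIP_END_S) -> List[Dict[str, str]]:
    """Keep rows whose Time value falls in the half-open window [start_s, end_s)."""
    return [row for row in rows if start_s <= float(row["Time"]) < end_s]


def load_clip(csv_path: str,
              start_s: float = CLIP_START_S,
              end_s: float = CLIP_END_S) -> List[Dict[str, str]]:
    """Read a data_imu.csv-style file and return only the clipped rows."""
    with open(csv_path, newline="") as fh:
        return trim_rows(csv.DictReader(fh), start_s, end_s)
```

At 10 Hz, the half-open 60 s window keeps 600 of the 4900 rows; a closed window would keep 601.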
## Expected runtime (Derkachi 60 s suite)
| Test | Expected wall clock |
|---|---|
| AC-1 (`--pace asap`) | ≤ 30 s (Tier-2 only) |
| AC-2 schema match | piggybacks on AC-1 (Tier-2 only) |
| AC-5 determinism | 2 × asap runs (≤ 60 s total; Tier-2 only) |
| AC-6 realtime | 60 s ± 3 s (Tier-2 only) |
| AC-6 asap | ≤ 30 s (Tier-2 only) |
| Total suite | ~6 min wall clock on Tier-2; skips on Mac |
The AC-1 / AC-2 / AC-5 tests share `--pace asap` runs, but each
fixture invocation produces a fresh output file, so they do not
short-circuit each other (this preserves AC-5's two-runs-diff guarantee).
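The AC-5 and AC-6a checks can be sketched as two small predicates. This assumes AC-5 compares the two JSONL outputs byte-for-byte (the suite may diff them line by line instead) and that AC-6a's "± 5 %" is taken relative to the nominal 60 s, which gives the 57 s to 63 s band in the table; both function names are hypothetical.

```python
from pathlib import Path


def outputs_identical(first: Path, second: Path) -> bool:
    """AC-5-style check: two fresh replay outputs must match byte-for-byte."""
    return first.read_bytes() == second.read_bytes()


def within_realtime_tolerance(elapsed_s: float,
                              nominal_s: float = 60.0,
                              tolerance: float = 0.05) -> bool:
    """AC-6a-style check: |elapsed - nominal| must not exceed 5 % of nominal (3 s for 60 s)."""
    return abs(elapsed_s - nominal_s) <= tolerance * nominal_s
```

Comparing two distinct files is why each fixture invocation needs its own output path: diffing a file against itself would pass trivially.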
## AC matrix
| AC | Test | State |
|---|---|---|
| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | xfail (AZ-963, open-loop ESKF) |
| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | Tier-2 (Jetson only) |
| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | xfail (AZ-963, open-loop ESKF) |
| AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
| AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | skip (waiting on AZ-558) |
| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | xfail (AZ-963, open-loop ESKF) |
| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | xfail (AZ-963, open-loop ESKF) |
| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | xfail (AZ-963, open-loop ESKF) |
| AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
| AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | skip (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
| AC-10: README accuracy | this file | live |
## Failure-mode cookbook
| Symptom | Likely cause | Fix |
|---|---|---|
| `gps-denied-replay` console-script not on `PATH` | package not installed in the test venv | `pip install -e .` |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (the synthesizer is deterministic; non-determinism would be a real bug) |
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground-truth source; verify the `GLOBAL_POSITION_INT` columns are still the AC-3 reference (per `flight_derkachi/README.md`) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in the C5 estimator OR a clock leaked into the runtime | bisect via `git log` against the C5 / clock modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust the `_HEAVY_SKIP` boundary if it persists |
| tlog missing required messages | `_tlog_synth.py` lost a message group | check `_REQUIRED_MESSAGE_GROUPS` in `tlog_replay_adapter.py` against the synth output |
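Two of the cookbook diagnostics reduce to simple comparisons. This sketch assumes `_REQUIRED_MESSAGE_GROUPS` can be flattened to a set of MAVLink message-type names (its real structure may differ, for example groups of alternatives) and that the "> 5 %" line-count drift is measured relative to the expected count; both function names are hypothetical.

```python
from typing import Iterable, Set


def missing_message_types(observed: Iterable[str], required: Iterable[str]) -> Set[str]:
    """Return required MAVLink message types absent from the synthesised tlog."""
    return set(required) - set(observed)


def count_drift_exceeds(actual: int, expected: int, max_fraction: float = 0.05) -> bool:
    """True when the JSONL line count is off by more than max_fraction of expected."""
    if expected <= 0:
        # With nothing expected, any emitted line counts as drift.
        return actual != expected
    return abs(actual - expected) / expected > max_fraction
```

With pymavlink installed, `mavutil.mavlink_connection(path)` opens a `.tlog`; calling `recv_match()` until it returns `None` and collecting each message's `get_type()` yields the `observed` set.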
## Files

```
tests/e2e/replay/
├── README.md               ← this file
├── __init__.py             ← package marker + module-level docstring
├── _helpers.py             ← parse_jsonl, l2_horizontal_m, match_percentage,
│                             CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py          ← CSV → tlog generator
├── conftest.py             ← derkachi_replay_inputs, replay_runner,
│                             operator_pre_flight_setup fixtures
├── test_helpers.py         ← unit tests for _helpers (unconditional)
└── test_derkachi_1min.py   ← AC-1..AC-8 + AC-7 skip gate + AC-4a AST scan
```
## Follow-up work

- AZ-963: five Derkachi ACs (AC-1, AC-3, AC-5, AC-6a, AC-6b) are `xfail` until a reference C6 tile cache exists (resolution path: AZ-777 / AZ-974).
- Real Topotek KHP20S30 calibration: needed for AC-3 accuracy even after AZ-777 lands (the threshold is ≤ 100 m for 80 % of ticks).
- AZ-558: closes AC-4b (route C8 encoders through `MavlinkTransport`).
- D-PROJ-2 `mock-suite-sat-service`: unblocks AC-8 (operator workflow rehearsal).
## Epic AZ-835 ticket map
The Tier-2 orchestrator path shipped under Epic AZ-835. Sub-tickets:
| Ticket | Role |
|---|---|
| AZ-836 | TlogRouteExtractor — active-segment trim + Douglas-Peucker coarsen tlog GPS to ≤ N waypoints |
| AZ-838 | SatelliteProviderRouteClient + seed_route.py CLI — POST RouteSpec to satellite-provider, poll mapsReady |
| AZ-839 | C3 operator_pre_flight_setup real fixture — wires C1+C2+C11+C10 against the seeded catalog |
| AZ-840 | C4 E2E orchestrator test — drives the full 7-step pipeline from (tlog, video, calibration) |
| AZ-842 | C6 Docs — replay_protocol.md Invariants 12-14 + architecture.md + this README (cycle-4 rescope) |
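The AZ-836 coarsening step can be illustrated with a textbook Douglas-Peucker pass plus a loop that loosens the tolerance until the route fits in N waypoints. This is a sketch under stated assumptions, not `TlogRouteExtractor` itself: it assumes GPS fixes were already projected to planar metres (for example a local ENU frame), it omits the active-segment trim, and `douglas_peucker` / `coarsen_to_max` are hypothetical names. The recursion is fine for sketch-sized inputs but could hit Python's recursion limit on very long, pathological tracks.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) in metres, e.g. a local ENU frame


def _segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Drop points closer than epsilon to the chord; recurse on the farthest one."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


def coarsen_to_max(points: Sequence[Point], max_points: int, epsilon: float = 1.0) -> List[Point]:
    """Double epsilon until the simplified route has at most max_points waypoints."""
    if max_points < 2:
        raise ValueError("max_points must be >= 2 (start and end are always kept)")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive so that doubling makes progress")
    simplified = douglas_peucker(points, epsilon)
    while len(simplified) > max_points:
        epsilon *= 2.0
        simplified = douglas_peucker(points, epsilon)
    return simplified
```

The loop always terminates: once epsilon exceeds the largest deviation from the chord, only the two endpoints remain.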
The cycle-4 replay-input redesign tickets ride alongside the Epic:
| Ticket | Role |
|---|---|
| AZ-894 | CsvReplayInputAdapter — new CSV-driven primary path on the single canonical clock |
| AZ-895 | Auto-sync surface deprecation — tlog adapter reduced to audit-only role |
| AZ-896 | CSV format spec (csv_replay_format.md) + example data_imu.csv |
| AZ-897 | Operator-facing UI (React + Tailwind paired-upload form) — cycle 5+ |