gps-denied-onboard/tests/e2e/replay/README.md
Oleksandr Bezdieniezhnykh 763d8b21ad [AZ-962] [AZ-964] [AZ-965] operator_replay.yaml + Tier-2 wiring
AZ-962 SHIPPED — Tier-2 Jetson AZ-840 orchestrator test no longer
SKIPs at the env-var gate. configs/operator_replay.yaml registers
c6/c7/c10/c11 with sane defaults (backbones intentionally empty,
see AZ-965); docker-compose.test.jetson.yml exports
GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml
and bind-mounts ./configs:/opt/configs:ro. ENV_KEY_MAP gains
SATELLITE_PROVIDER_URL → c11_tile_manager.satellite_provider_url
and SATELLITE_PROVIDER_API_KEY → c11_tile_manager.service_api_key
so secrets flow from .env.test and never sit in YAML. README drops
the manual export step. 97/97 c11 + config unit tests stay green.

Tier-2 re-run (4 failed / 48 passed / 1 skipped / 1 xfailed /
1 xpassed / 2 errors in 84.99s vs baseline 3 skipped — i.e. -2
skipped, +2 errors): AZ-840 orchestrator test moves from SKIP to
ERROR with a deeper, real gate — IndexUnavailableError on
FaissDescriptorIndex against a fresh c6_tile_cache.root_dir.

AZ-964 (3 SP, todo/) filed for FAISS index bootstrap in the AZ-839
C3 fixture. AZ-965 (3 SP, todo/, blocked by AZ-964) filed for
NetVLAD ONNX backbone provisioning — the next gate the orchestrator
test will hit once FAISS clears.

Cycle-4 e2e gate remains NOT GREEN: AZ-840 chain is now AZ-964 →
AZ-965 → PASS; 60s smoke chain is AZ-963 → PASS. OKVIS2 deferral
directive (2026-05-29) unchanged — still gated behind Derkachi
e2e green, still NOT MET.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-29 16:42:55 +03:00

# E2E replay tests (AZ-404 + AZ-835 + cycle-4)
End-to-end regression suite for the `gps-denied-replay` console-script
(AZ-402). Two distinct entry points live here:
| Entry point | Source | Coverage |
|-------------|--------|----------|
| **AZ-265 / AZ-404** — 60 s Derkachi clip with synthetic tlog | `test_derkachi_1min.py` | Original AC-1..AC-10 of the replay epic; runs on Tier-1 + Tier-2 |
| **AZ-835 / AZ-840** — full `(tlog, video, calibration)` orchestrator | `test_az835_e2e_real_flight.py` | Tier-2 only; closes the real-flight validation loop end-to-end (extract → seed → FAISS → run → verdict) |
The cycle-4 replay-input redesign (AZ-894 / AZ-895 / AZ-896) replaces
the tlog auto-sync surface with a CSV-driven path; the AZ-265 suite is
the regression net that catches drift in the legacy path during the
deprecation window. See `replay_protocol.md` Invariants 12-14 for the
authoritative contract.
## How to run
### AZ-404 Derkachi 60 s suite (Tier-1 + Tier-2)
```bash
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/test_derkachi_1min.py -v
```
Without `RUN_REPLAY_E2E=1` the heavy tests skip cleanly. The
unconditional tests (AC-4a mode-agnosticism scan, AC-7 skip-gate
self-check, and the helpers in `test_helpers.py`) still run.
### AZ-835 orchestrator test — full `(tlog, video, calibration)` loop (Tier-2 only)
Closes Epic AZ-835's narrative: given a real-flight `.tlog` + the
matching nadir video + camera calibration, the orchestrator runs the
7-step pipeline end-to-end and writes a verdict report.
**Required inputs** (already in-repo for the Derkachi reference fixture):
- `.tlog` — pymavlink binary log from a real flight. Reference fixture:
`_docs/00_problem/input_data/flight_derkachi/data_imu.csv` (the canonical
CSV that `_tlog_synth.py` reconstructs the tlog from) plus the synthesised
tlog the conftest emits at session start.
- Nadir video — `_docs/00_problem/input_data/flight_derkachi/*.mp4` (large
asset; not always checked in to the workstation clone — pull from the
Jetson e2e harness or git LFS if absent).
- Calibration — `tests/fixtures/calibration/adti26.json` (factory-sheet
approximation for the Topotek KHP20S30; real intrinsics still TBD).
**Tier-2 invocation** (Jetson):
```bash
ssh jetson-e2e
cd /workspace/gps-denied-onboard
export RUN_REPLAY_E2E=1
pytest tests/e2e/replay/test_az835_e2e_real_flight.py -v --tb=short -m tier2
```
AZ-962: `docker-compose.test.jetson.yml` exports
`GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml`
automatically and bind-mounts `./configs:/opt/configs:ro`, so no
manual env-var export is required when running through
`scripts/run-tests-jetson.sh`. The YAML at `configs/operator_replay.yaml`
declares the four blocks the fixture requires (c6 / c7 / c10 / c11);
secrets (`SATELLITE_PROVIDER_API_KEY`) flow in from `.env.test` via
the loader's `ENV_KEY_MAP`. `c10_provisioning.backbones` is
intentionally empty pending AZ-965 (the orchestrator test will
SKIP at the "no backbones" gate until AZ-965 lands).
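
To illustrate how secrets can reach the config without sitting in YAML, here is a minimal sketch of an env-var overlay. The two `ENV_KEY_MAP` entries are the ones named in the AZ-962 change notes; the function `apply_env_overrides` and its merge behaviour are hypothetical and may differ from the real loader.

```python
import copy
from typing import Any, Dict, Mapping

# Entries taken from the AZ-962 notes; the overlay logic below is an assumption.
ENV_KEY_MAP = {
    "SATELLITE_PROVIDER_URL": "c11_tile_manager.satellite_provider_url",
    "SATELLITE_PROVIDER_API_KEY": "c11_tile_manager.service_api_key",
}


def apply_env_overrides(config: Dict[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of `config` with mapped env vars written to their dotted keys."""
    merged = copy.deepcopy(config)  # never mutate the parsed YAML in place
    for env_name, dotted_key in ENV_KEY_MAP.items():
        if env_name not in env:
            continue  # keep whatever the YAML (or a default) already provides
        *parents, leaf = dotted_key.split(".")
        node = merged
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = env[env_name]
    return merged
```

With this shape, the YAML can omit `service_api_key` entirely and the value only ever exists in the process environment populated from `.env.test`.
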
The bundled local-development entry point is `scripts/run-tests-jetson.sh`,
which handles the SSH alias + rsync + remote pytest invocation. See
`_docs/02_document/tests/tier2-jetson-testing.md` for the harness contract.
**Skip gates (in evaluation order)**:
1. `@pytest.mark.tier2` — the per-suite Tier-2 plugin gates this off on dev
macOS / Tier-1 Docker (matches the AZ-839 / AZ-699 contract).
2. `RUN_REPLAY_E2E` not in `{1, true, yes, on}`.
3. `gps-denied-replay` console-script not on `PATH`.
4. Real Derkachi video missing or placeholder-sized.
5. `operator_pre_flight_setup` fixture itself skipped — the downstream
consumer inherits the SKIP automatically (pytest's fixture-skip
propagation).
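
Gates 2 and 3 above can be sketched as plain predicates. This is an illustration only: the helper names `replay_e2e_enabled` and `first_skip_reason` are hypothetical, and the case-insensitive, whitespace-trimmed comparison is an assumption about how the real gate normalises the value.

```python
import os
import shutil
from typing import Mapping, Optional

_TRUTHY = frozenset({"1", "true", "yes", "on"})


def replay_e2e_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Gate 2: RUN_REPLAY_E2E must be one of the accepted truthy spellings."""
    env = os.environ if env is None else env
    # Lower-casing and stripping are assumptions, not confirmed behaviour.
    return env.get("RUN_REPLAY_E2E", "").strip().lower() in _TRUTHY


def first_skip_reason(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first failing gate's reason, or None when both gates pass."""
    if not replay_e2e_enabled(env):
        return "RUN_REPLAY_E2E is not set to a truthy value"
    if shutil.which("gps-denied-replay") is None:  # gate 3
        return "gps-denied-replay console-script not on PATH"
    return None
```

A test would typically pass the returned reason to `pytest.skip(...)` so the skip message says which gate fired.
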
**Expected runtime on Tier-2 Jetson AGX Orin** (cold cache): ≤ 8 min
end-to-end (≤ 5 min C3 fixture cold-start budget + ≤ 3 min for the
replay + verdict compute). Warm-cache reinvocations within the same
compose session: ≤ 60 s.
**Verdict report location**: `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
The report is emitted ALWAYS, regardless of PASS / FAIL on the AZ-696
AC-3 threshold (≥ 80 % of emissions within 100 m of tlog ground truth).
The success criterion at the fixture level is "honest report exists with
distribution data", not "PASS". The PASS / FAIL line of the report itself
is the operator-facing answer to "did this flight clip localise within
the threshold".
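
As a rough illustration of the AC-3 threshold, the sketch below computes the fraction of emissions within 100 m of ground truth and derives the PASS / FAIL line. The function names are hypothetical, and the haversine distance on a spherical Earth is an assumption; the suite's own `l2_horizontal_m` and `match_percentage` helpers may compute this differently.

```python
import math
from typing import Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude_deg, longitude_deg)
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def horizontal_distance_m(a: LatLon, b: LatLon) -> float:
    """Haversine great-circle distance between two points, in metres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def fraction_within(estimates: Sequence[LatLon], truth: Sequence[LatLon],
                    radius_m: float = 100.0) -> float:
    """Fraction of paired samples whose horizontal error is <= radius_m."""
    if len(estimates) != len(truth):
        raise ValueError("estimates and truth must be paired one-to-one")
    if not estimates:
        return 0.0
    hits = sum(horizontal_distance_m(e, t) <= radius_m
               for e, t in zip(estimates, truth))
    return hits / len(estimates)


def verdict(fraction: float, required: float = 0.80) -> str:
    """PASS when at least `required` of the emissions are inside the radius."""
    return "PASS" if fraction >= required else "FAIL"
```

Keeping the fraction and the verdict as separate steps matches the intent above: the distribution data is always reported, and the PASS / FAIL line is derived from it.
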
### Imagery source license attribution (dev/research use only)
The Jetson e2e harness's `satellite-provider` instance downloads tiles
from the **Google Maps satellite layer** (`mt0..mt3.google.com/vt/lyrs=s`),
governed by Google Maps Platform Terms of Service. Every tile served by
the harness carries the **"Imagery © Google"** attribution string.
**This is dev/research use only.** Production deployment of the
gps-denied-onboard companion against a Google-Maps-sourced
`satellite-provider` requires either a Google Maps Platform licensing
review or migration to a true CC-BY satellite source on the parent-suite
side (parent-suite ticket TBD; see `_docs/02_document/architecture.md`
§ `satellite-provider` integration). The onboard-side seed scripts
(`tests/fixtures/derkachi_c6/seed_region.py`, `seed_route.py`) propagate
the attribution into the test fixture's metadata; do not remove it.
## Fixture state
| Artifact | Status | Source |
|----------|--------|--------|
| `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
| `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a `pymavlink` `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 is `xfail`ed until a real calibration ships. |
| Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |
## Clip range
The first 60 s of the Derkachi flight (Time=0.0 → Time=60.0). The
take-off region exercises the AZ-405 IMU-take-off auto-sync detector;
the cruise region that follows stresses the satellite-anchor + VIO
drift-correction path. To change the trim, edit `_CLIP_START_S` and
`_CLIP_END_S` in `conftest.py`.
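
For orientation, trimming the CSV to the clip window could look like the sketch below. The column name `Time`, the half-open interval `[start, end)`, and the function name `trim_rows` are assumptions made for illustration; the actual trimming in `conftest.py` may differ.

```python
from typing import Dict, Iterable, List

# Values mirror the 0.0 -> 60.0 s clip described above.
_CLIP_START_S = 0.0
_CLIP_END_S = 60.0


def trim_rows(rows: Iterable[Dict[str, str]],
              start_s: float = _CLIP_START_S,
              end_s: float = _CLIP_END_S,
              time_column: str = "Time") -> List[Dict[str, str]]:
    """Keep rows whose timestamp falls in [start_s, end_s)."""
    return [row for row in rows if start_s <= float(row[time_column]) < end_s]
```

Rows as produced by `csv.DictReader` fit this signature directly. Under the half-open assumption, a 10 Hz log yields 600 rows for a 60 s clip.
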
## Expected runtime (Tier-1)
| Test | Expected wall clock |
|------|---------------------|
| AC-1 (`--pace asap`) | ≤ 30 s |
| AC-2 schema match | piggybacks on AC-1 |
| AC-5 determinism | 2 × asap runs (≤ 60 s total) |
| AC-6 realtime | 60 s ± 3 s |
| AC-6 asap | ≤ 30 s |
| Total suite | ≤ 6 min on Jetson AGX Orin |
The AC-1 / AC-2 / AC-5 tests share `--pace asap` runs but each
fixture invocation produces a fresh output file, so they do not
short-circuit each other (preserves AC-5's two-runs-diff guarantee).
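
The two-runs-diff idea behind AC-5 can be sketched as a byte-level comparison of the two output files. This is only an illustration under the assumption that determinism means byte-identical JSONL; the helper names are hypothetical and the real test may compare parsed records instead.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, for a cheap equality check."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def first_differing_line(a: Path, b: Path) -> Optional[int]:
    """Return the 1-based number of the first differing line, or None if identical."""
    lines_a = a.read_bytes().splitlines()
    lines_b = b.read_bytes().splitlines()
    for number, (x, y) in enumerate(zip(lines_a, lines_b), start=1):
        if x != y:
            return number
    if len(lines_a) != len(lines_b):
        # One file is a strict prefix of the other.
        return min(len(lines_a), len(lines_b)) + 1
    return None
```

Reporting the first differing line, rather than only a digest mismatch, gives a starting point when bisecting a determinism regression.
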
## AC matrix
| AC | Test | State |
|----|------|-------|
| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | runs on Tier-1 |
| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | runs on Tier-1 |
| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | `xfail` (waiting on real calibration) |
| AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
| AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | `skip` (waiting on AZ-558) |
| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | runs on Tier-1 |
| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | runs on Tier-1 |
| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | runs on Tier-1 |
| AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
| AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | `skip` (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
| AC-10: README accuracy | this file | live |
## Failure-mode cookbook
| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `gps-denied-replay console-script not on PATH` | package not installed in the test venv | `pip install -e .` |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (synthesizer is deterministic; non-determinism would be a real bug) |
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground truth source — verify the GLOBAL_POSITION_INT columns are still the AC-3 reference (per `flight_derkachi/README.md`) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in C5 estimator OR a clock leaked into the runtime | bisect via `git log` against the C5 / `clock` modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust `_HEAVY_SKIP` boundary if it persists |
| `tlog missing required messages` | `_tlog_synth.py` lost a message group | check `_REQUIRED_MESSAGE_GROUPS` in `tlog_replay_adapter.py` against the synth output |
## Files
```
tests/e2e/replay/
├── README.md ← this file
├── __init__.py ← package marker + module-level docstring
├── _helpers.py ← parse_jsonl, l2_horizontal_m, match_percentage,
│ CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py ← CSV → tlog generator
├── conftest.py ← derkachi_replay_inputs, replay_runner,
│ operator_pre_flight_setup fixtures
├── test_helpers.py ← unit tests for _helpers (unconditional)
├── test_derkachi_1min.py ← AC-1..AC-8 + AC-7 skip gate + AC-4a AST scan
└── test_az835_e2e_real_flight.py ← AZ-835 / AZ-840 Tier-2 orchestrator test
```
## Follow-up work
* **Real Topotek KHP20S30 calibration** — unblocks AC-3.
* **AZ-558** — closes AC-4b (route C8 encoders through `MavlinkTransport`).
* **D-PROJ-2 mock-suite-sat-service** — unblocks AC-8 (operator
workflow rehearsal).
## Epic AZ-835 ticket map
The Tier-2 orchestrator path shipped under Epic
[AZ-835](https://denyspopov.atlassian.net/browse/AZ-835). Sub-tickets:
| Ticket | Role |
|--------|------|
| [AZ-836](https://denyspopov.atlassian.net/browse/AZ-836) | `TlogRouteExtractor` — active-segment trim + Douglas-Peucker coarsen tlog GPS to ≤ N waypoints |
| [AZ-838](https://denyspopov.atlassian.net/browse/AZ-838) | `SatelliteProviderRouteClient` + `seed_route.py` CLI — POST RouteSpec to satellite-provider, poll `mapsReady` |
| [AZ-839](https://denyspopov.atlassian.net/browse/AZ-839) | C3 `operator_pre_flight_setup` real fixture — wires C1+C2+C11+C10 against the seeded catalog |
| [AZ-840](https://denyspopov.atlassian.net/browse/AZ-840) | C4 E2E orchestrator test — drives the full 7-step pipeline from `(tlog, video, calibration)` |
| [AZ-842](https://denyspopov.atlassian.net/browse/AZ-842) | C6 Docs — `replay_protocol.md` Invariants 12-14 + `architecture.md` + this README (cycle-4 rescope) |
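
For readers unfamiliar with the Douglas-Peucker step mentioned for AZ-836, here is a minimal sketch of the textbook algorithm on planar points, plus one possible way to honour a "≤ N waypoints" cap by relaxing the tolerance. It is not the `TlogRouteExtractor` implementation: the function names, the tolerance-doubling strategy, and the use of already-projected planar coordinates (rather than raw latitude / longitude) are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y), e.g. metres in a local frame


def _line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Drop points that lie within epsilon of the simplified polyline."""
    n = len(points)
    if n < 3:
        return list(points)
    keep = [False] * n
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]  # explicit stack avoids deep recursion on long tracks
    while stack:
        lo, hi = stack.pop()
        idx, dmax = -1, 0.0
        for i in range(lo + 1, hi):
            d = _line_distance(points[i], points[lo], points[hi])
            if d > dmax:
                idx, dmax = i, d
        if dmax > epsilon:
            keep[idx] = True
            stack.append((lo, idx))
            stack.append((idx, hi))
    return [p for p, kept in zip(points, keep) if kept]


def coarsen_to_max(points: Sequence[Point], max_points: int,
                   epsilon: float = 1.0) -> List[Point]:
    """Double epsilon until the simplified track has at most max_points."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2 (both endpoints are kept)")
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    simplified = douglas_peucker(points, epsilon)
    while len(simplified) > max_points:
        epsilon *= 2.0
        simplified = douglas_peucker(points, epsilon)
    return simplified
```

Doubling the tolerance is a simple way to reach a point budget, at the cost of possibly overshooting it; a search over epsilon would land closer to N.
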
The cycle-4 replay-input redesign tickets ride alongside the Epic:
| Ticket | Role |
|--------|------|
| [AZ-894](https://denyspopov.atlassian.net/browse/AZ-894) | `CsvReplayInputAdapter` — new CSV-driven primary path on the single canonical clock |
| [AZ-895](https://denyspopov.atlassian.net/browse/AZ-895) | Auto-sync surface deprecation — tlog adapter reduced to audit-only role |
| [AZ-896](https://denyspopov.atlassian.net/browse/AZ-896) | CSV format spec (`csv_replay_format.md`) + example `data_imu.csv` |
| [AZ-897](https://denyspopov.atlassian.net/browse/AZ-897) | Operator-facing UI (React + Tailwind paired-upload form) — cycle 5+ |