gps-denied-onboard/tests/e2e/replay/README.md
Oleksandr Bezdieniezhnykh 42b1db6ace [AZ-842] Batch 04 cycle 4: AZ-835 docs + cycle-4 redesign narrative
Closes AZ-835 Epic C6 (docs) and folds the cycle-4 replay-input
redesign narrative (AZ-894 CSV adapter / AZ-895 auto-sync deprecation
/ AZ-896 format spec / AZ-897 UI follow-up) into the three
authoritative documents.

Modified:
- _docs/02_document/contracts/replay/replay_protocol.md: extend
  Invariant 12 with sub-invariants 12.c (route-driven supersedes
  bbox; ~100x tile efficiency + did-fly-vs-might-fly honesty) and
  12.d (fixture failure-handling: validation/terminal re-raise;
  transient -> C11 backoff x3). Add Invariant 14 with sub-
  invariants 14.a-14.d covering the single canonical clock model,
  the CSV-driven path, the tlog adapter's audit-only role, the
  auto-sync deprecation, and the AZ-897 UI follow-up pointer.
- _docs/02_document/architecture.md: add the AZ-777 Phase 3+
  superseded-by-Epic-AZ-835 supersession block + new "Replay input
  redesign (cycle 4)" sub-section with the cycle-4 ticket table.
- tests/e2e/replay/README.md: top section restructured for two
  distinct entry points (AZ-265/AZ-404 vs. AZ-835/AZ-840); add
  full AZ-835 orchestrator-test section (env vars, skip gates,
  expected runtime, verdict report path); add Imagery (c) Google
  attribution + dev-only caveat; add Epic AZ-835 ticket map.

Spec deviation: AC-1b says "new Invariant 13" but Invariant 13 is
already taken (C4<->C5 pairing, AZ-776 / ADR-012), and is referenced
by number in architecture.md, c4_pose description.md, and ADR-012
prose. Cycle-4 content shipped as Invariant 14 to preserve those
cross-references; renumbering would have cascaded to 3 files outside
AZ-842's ownership envelope. Documented in batch report.

Out-of-scope hygiene gap (NOT fixed in this batch):
BUILD_CSV_REPLAY_ADAPTER flag is not yet enumerated in
_docs/02_document/module-layout.md's Build-Time Exclusion Map.
Inherited from cycle-4 AZ-894. Suggested as a cycle-5+ hygiene PBI.

AZ-835 epic file stays in todo/ until AZ-841 (backlog) is resolved.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-29 11:13:33 +03:00

# E2E replay tests (AZ-404 + AZ-835 + cycle-4)
End-to-end regression suite for the `gps-denied-replay` console-script
(AZ-402). Two distinct entry points live here:
| Entry point | Source | Coverage |
|-------------|--------|----------|
| **AZ-265 / AZ-404** — 60 s Derkachi clip with synthetic tlog | `test_derkachi_1min.py` | Original AC-1..AC-10 of the replay epic; runs on Tier-1 + Tier-2 |
| **AZ-835 / AZ-840** — full `(tlog, video, calibration)` orchestrator | `test_az835_e2e_real_flight.py` | Tier-2 only; closes the real-flight validation loop end-to-end (extract → seed → FAISS → run → verdict) |
The cycle-4 replay-input redesign (AZ-894 / AZ-895 / AZ-896) replaces
the tlog auto-sync surface with a CSV-driven path; the AZ-265 suite is
the regression net that catches drift in the legacy path during the
deprecation window. See `replay_protocol.md` Invariants 12-14 for the
authoritative contract.
## How to run
### AZ-404 Derkachi 60 s suite (Tier-1 + Tier-2)
```bash
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/test_derkachi_1min.py -v
```
Without `RUN_REPLAY_E2E=1` the heavy tests skip cleanly. The unconditional
tests (the AC-4a mode-agnosticism scan, the AC-7 skip-gate self-check, and
the helper tests in `test_helpers.py`) still run.
### AZ-835 orchestrator test — full `(tlog, video, calibration)` loop (Tier-2 only)
Closes Epic AZ-835's narrative: given a real-flight `.tlog` + the
matching nadir video + camera calibration, the orchestrator runs the
7-step pipeline end-to-end and writes a verdict report.
**Required inputs** (already in-repo for the Derkachi reference fixture):
- `.tlog` — pymavlink binary log from a real flight. Reference fixture:
`_docs/00_problem/input_data/flight_derkachi/data_imu.csv` (the canonical
CSV that `_tlog_synth.py` reconstructs the tlog from) plus the synthesised
tlog the conftest emits at session start.
- Nadir video — `_docs/00_problem/input_data/flight_derkachi/*.mp4` (large
asset; not always checked in to the workstation clone — pull from the
Jetson e2e harness or git LFS if absent).
- Calibration — `tests/fixtures/calibration/adti26.json` (factory-sheet
approximation for the Topotek KHP20S30; real intrinsics still TBD).
**Tier-2 invocation** (Jetson):
```bash
ssh jetson-e2e
cd /workspace/gps-denied-onboard
export RUN_REPLAY_E2E=1
export GPS_DENIED_OPERATOR_CONFIG_PATH=/workspace/configs/operator_replay.yaml
pytest tests/e2e/replay/test_az835_e2e_real_flight.py -v --tb=short -m tier2
```
The bundled local-development entry point is `scripts/run-tests-jetson.sh`,
which handles the SSH alias + rsync + remote pytest invocation. See
`_docs/02_document/tests/tier2-jetson-testing.md` for the harness contract.
**Skip gates (in evaluation order)**:
1. `@pytest.mark.tier2` — the per-suite Tier-2 plugin gates this off on dev
macOS / Tier-1 Docker (matches the AZ-839 / AZ-699 contract).
2. `RUN_REPLAY_E2E` not in `{1, true, yes, on}`.
3. `gps-denied-replay` console-script not on `PATH`.
4. Real Derkachi video missing or placeholder-sized.
5. `operator_pre_flight_setup` fixture itself skipped — the downstream
consumer inherits the SKIP automatically (pytest's fixture-skip
propagation).
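
The environment-level gates (2 to 4) can be sketched as a single ordered check. This is a minimal illustration, not the suite's actual conftest logic: the function name `first_skip_reason`, the `MIN_VIDEO_BYTES` threshold, and the case-insensitive matching of the env-var value are all assumptions made for the example.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional

# Values of RUN_REPLAY_E2E that enable the heavy tests (from the gate list above).
TRUTHY = {"1", "true", "yes", "on"}
# Hypothetical cut-off: a smaller file is treated as a placeholder, not a real clip.
MIN_VIDEO_BYTES = 1_000_000


def first_skip_reason(
    env: Mapping[str, str],
    video: Path,
    script: str = "gps-denied-replay",
) -> Optional[str]:
    """Return the first failing gate (2, 3, 4 in that order), or None if all pass."""
    # Gate 2: opt-in env var. Case-insensitive matching is an assumption.
    if env.get("RUN_REPLAY_E2E", "").strip().lower() not in TRUTHY:
        return "RUN_REPLAY_E2E not enabled"
    # Gate 3: the console-script must be resolvable on the given PATH.
    if shutil.which(script, path=env.get("PATH")) is None:
        return f"{script} console-script not on PATH"
    # Gate 4: the video must exist and be larger than a placeholder.
    if not video.is_file() or video.stat().st_size < MIN_VIDEO_BYTES:
        return "real Derkachi video missing or placeholder-sized"
    return None
```

In a fixture, a non-`None` result would typically be passed to `pytest.skip(reason)`, so the first failing gate is the one that shows up in the test report.
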
**Expected runtime on Tier-2 Jetson AGX Orin** (cold cache): ≤ 8 min
end-to-end (≤ 5 min C3 fixture cold-start budget + ≤ 3 min for the
replay + verdict compute). Warm-cache reinvocations within the same
compose session: ≤ 60 s.
**Verdict report location**: `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
The report is emitted ALWAYS, regardless of PASS / FAIL on the AZ-696
AC-3 threshold (≥ 80 % of emissions within 100 m of tlog ground truth).
The success criterion at the fixture level is "honest report exists with
distribution data", not "PASS". The PASS / FAIL line of the report itself
is the operator-facing answer to "did this flight clip localise within
the threshold".
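
To make the threshold concrete, the sketch below computes the fraction of emissions within 100 m of ground truth and derives the PASS / FAIL line and the dated report path. It is illustrative only: the names `horizontal_error_m`, `verdict`, and `report_path` are hypothetical (the suite's own helpers live in `_helpers.py` and their signatures are not shown here), the haversine distance is one reasonable choice of horizontal metric, and the emissions are assumed to be already paired one-to-one with ground-truth fixes.

```python
import math
from datetime import date
from typing import Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude_deg, longitude_deg), WGS84
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def horizontal_error_m(a: LatLon, b: LatLon) -> float:
    """Great-circle (haversine) distance in metres between two points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def verdict(
    estimates: Sequence[LatLon],
    truth: Sequence[LatLon],
    max_error_m: float = 100.0,
    min_fraction: float = 0.80,
) -> Tuple[float, str]:
    """Return (fraction within max_error_m, 'PASS' or 'FAIL')."""
    if not estimates or len(estimates) != len(truth):
        raise ValueError("estimates and truth must be non-empty and paired")
    within = sum(
        horizontal_error_m(e, t) <= max_error_m for e, t in zip(estimates, truth)
    )
    fraction = within / len(estimates)
    return fraction, "PASS" if fraction >= min_fraction else "FAIL"


def report_path(day: date) -> str:
    """Build the dated verdict report path described above."""
    return f"_docs/06_metrics/real_flight_validation_{day.isoformat()}.md"
```

Note that the report would be written for both outcomes; only the PASS / FAIL line inside it changes.
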
### Imagery source license attribution (dev/research use only)
The Jetson e2e harness's `satellite-provider` instance downloads tiles
from the **Google Maps satellite layer** (`mt0..mt3.google.com/vt/lyrs=s`),
governed by Google Maps Platform Terms of Service. Every tile served by
the harness carries the **"Imagery © Google"** attribution string.
**This is dev/research use only.** Production deployment of the
gps-denied-onboard companion against a Google-Maps-sourced
`satellite-provider` requires either a Google Maps Platform licensing
review or migration to a true CC-BY satellite source on the parent-suite
side (parent-suite ticket TBD; see `_docs/02_document/architecture.md`
§ `satellite-provider` integration). The onboard-side seed scripts
(`tests/fixtures/derkachi_c6/seed_region.py`, `seed_route.py`) propagate
the attribution into the test fixture's metadata; do not remove it.
## Fixture state
| Artifact | Status | Source |
|----------|--------|--------|
| `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
| `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a `pymavlink` `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 is `xfail`ed until a real calibration ships. |
| Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |
## Clip range
The first 60 s of the Derkachi flight (Time=0.0 → Time=60.0). The
take-off region exercises the AZ-405 IMU-take-off auto-sync detector;
the cruise region that follows stresses the satellite-anchor + VIO
drift-correction path. To change the trim, edit `_CLIP_START_S` and
`_CLIP_END_S` in `conftest.py`.
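
As a rough illustration of what the trim amounts to on the CSV side, the sketch below keeps only rows whose timestamp falls inside the clip window. The function name `trim_rows`, the `Time` column name, and the inclusive bounds are assumptions for the example; the real trimming logic lives in `conftest.py` and may differ.

```python
import csv
import io
from typing import Dict, List


def trim_rows(
    csv_text: str,
    start_s: float,
    end_s: float,
    time_column: str = "Time",  # assumed column name
) -> List[Dict[str, str]]:
    """Keep rows with start_s <= time <= end_s (inclusive bounds assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if start_s <= float(row[time_column]) <= end_s]
```
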
## Expected runtime (Tier-1)
| Test | Expected wall clock |
|------|---------------------|
| AC-1 (`--pace asap`) | ≤ 30 s |
| AC-2 schema match | piggybacks on AC-1 |
| AC-5 determinism | 2 × asap runs (≤ 60 s total) |
| AC-6 realtime | 60 s ± 3 s |
| AC-6 asap | ≤ 30 s |
| Total suite | ≤ 6 min on Jetson AGX Orin |
The AC-1 / AC-2 / AC-5 tests share `--pace asap` runs but each
fixture invocation produces a fresh output file, so they do not
short-circuit each other (preserves AC-5's two-runs-diff guarantee).
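
A two-runs-diff check of this kind can be sketched as a line-by-line comparison of the two output files. This is a hypothetical helper (`first_difference` is not a name from the suite), and whether AC-5 compares raw lines or parsed records is not stated here; the sketch compares raw lines.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

Diff = Tuple[int, Optional[str], Optional[str]]


def first_difference(run_a: Iterable[str], run_b: Iterable[str]) -> Optional[Diff]:
    """Return (1-based line number, line_a, line_b) at the first mismatch.

    A missing line (one run is shorter) is reported as None on that side.
    Returns None when both runs are identical.
    """
    for line_no, (a, b) in enumerate(zip_longest(run_a, run_b), start=1):
        if a != b:
            return line_no, a, b
    return None
```

Reporting the first differing line, rather than a bare boolean, gives a starting point when bisecting a determinism regression.
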
## AC matrix
| AC | Test | State |
|----|------|-------|
| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | runs on Tier-1 |
| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | runs on Tier-1 |
| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | `xfail` (waiting on real calibration) |
| AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
| AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | `skip` (waiting on AZ-558) |
| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | runs on Tier-1 |
| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | runs on Tier-1 |
| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | runs on Tier-1 |
| AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
| AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | `skip` (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
| AC-10: README accuracy | this file | live |
## Failure-mode cookbook
| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `gps-denied-replay console-script not on PATH` | package not installed in the test venv | `pip install -e .` |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (synthesizer is deterministic; non-determinism would be a real bug) |
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground truth source — verify the GLOBAL_POSITION_INT columns are still the AC-3 reference (per `flight_derkachi/README.md`) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in C5 estimator OR a clock leaked into the runtime | bisect via `git log` against the C5 / `clock` modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust `_HEAVY_SKIP` boundary if it persists |
| `tlog missing required messages` | `_tlog_synth.py` lost a message group | check `_REQUIRED_MESSAGE_GROUPS` in `tlog_replay_adapter.py` against the synth output |
## Files
```
tests/e2e/replay/
├── README.md                      ← this file
├── __init__.py                    ← package marker + module-level docstring
├── _helpers.py                    ← parse_jsonl, l2_horizontal_m, match_percentage,
│                                    CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py                 ← CSV → tlog generator
├── conftest.py                    ← derkachi_replay_inputs, replay_runner,
│                                    operator_pre_flight_setup fixtures
├── test_helpers.py                ← unit tests for _helpers (unconditional)
├── test_derkachi_1min.py          ← AC-1..AC-8 (incl. AC-7 skip gate + AC-4a AST scan)
└── test_az835_e2e_real_flight.py  ← AZ-835 / AZ-840 orchestrator test (Tier-2 only)
```
## Follow-up work
* **Real Topotek KHP20S30 calibration** — unblocks AC-3.
* **AZ-558** — closes AC-4b (route C8 encoders through `MavlinkTransport`).
* **D-PROJ-2 mock-suite-sat-service** — unblocks AC-8 (operator
workflow rehearsal).
## Epic AZ-835 ticket map
The Tier-2 orchestrator path shipped under Epic
[AZ-835](https://denyspopov.atlassian.net/browse/AZ-835). Sub-tickets:
| Ticket | Role |
|--------|------|
| [AZ-836](https://denyspopov.atlassian.net/browse/AZ-836) | `TlogRouteExtractor` — active-segment trim + Douglas-Peucker coarsen tlog GPS to ≤ N waypoints |
| [AZ-838](https://denyspopov.atlassian.net/browse/AZ-838) | `SatelliteProviderRouteClient` + `seed_route.py` CLI — POST RouteSpec to satellite-provider, poll `mapsReady` |
| [AZ-839](https://denyspopov.atlassian.net/browse/AZ-839) | C3 `operator_pre_flight_setup` real fixture — wires C1+C2+C11+C10 against the seeded catalog |
| [AZ-840](https://denyspopov.atlassian.net/browse/AZ-840) | C4 E2E orchestrator test — drives the full 7-step pipeline from `(tlog, video, calibration)` |
| [AZ-842](https://denyspopov.atlassian.net/browse/AZ-842) | C6 Docs — `replay_protocol.md` Invariants 12-14 + `architecture.md` + this README (cycle-4 rescope) |
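
For readers unfamiliar with the Douglas-Peucker coarsening mentioned for AZ-836, here is a minimal sketch of the textbook algorithm. It is not the `TlogRouteExtractor` implementation: it assumes the GPS track has already been projected to planar metres, and the `coarsen_to_max` wrapper (doubling the tolerance until the waypoint cap is met) is just one possible way to satisfy a "≤ N waypoints" constraint.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) in metres; projection is assumed done


def _line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:  # degenerate chord: fall back to point distance
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (a[1] - p[1]) - (a[0] - p[0]) * dy) / length


def douglas_peucker(points: Sequence[Point], epsilon_m: float) -> List[Point]:
    """Drop points that deviate at most epsilon_m from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord first -> last.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _line_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon_m:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], epsilon_m)
    right = douglas_peucker(points[index:], epsilon_m)
    return left[:-1] + right  # avoid duplicating the split point


def coarsen_to_max(
    points: Sequence[Point], max_waypoints: int, epsilon_m: float = 1.0
) -> List[Point]:
    """Hypothetical wrapper: double the tolerance until the cap is met."""
    if max_waypoints < 2:
        raise ValueError("max_waypoints must be at least 2")
    result = douglas_peucker(points, epsilon_m)
    while len(result) > max_waypoints:
        epsilon_m *= 2.0
        result = douglas_peucker(points, epsilon_m)
    return result
```

The recursive form is the easiest to read; for very long tracks an iterative variant with an explicit stack would avoid Python's recursion limit.
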
The cycle-4 replay-input redesign tickets ride alongside the Epic:
| Ticket | Role |
|--------|------|
| [AZ-894](https://denyspopov.atlassian.net/browse/AZ-894) | `CsvReplayInputAdapter` — new CSV-driven primary path on the single canonical clock |
| [AZ-895](https://denyspopov.atlassian.net/browse/AZ-895) | Auto-sync surface deprecation — tlog adapter reduced to audit-only role |
| [AZ-896](https://denyspopov.atlassian.net/browse/AZ-896) | CSV format spec (`csv_replay_format.md`) + example `data_imu.csv` |
| [AZ-897](https://denyspopov.atlassian.net/browse/AZ-897) | Operator-facing UI (React + Tailwind paired-upload form) — cycle 5+ |