mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 05:41:13 +00:00
763d8b21ad
AZ-962 SHIPPED — Tier-2 Jetson AZ-840 orchestrator test no longer SKIPs at the env-var gate. configs/operator_replay.yaml registers c6/c7/c10/c11 with sane defaults (backbones intentionally empty, see AZ-965); docker-compose.test.jetson.yml exports GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml and bind-mounts ./configs:/opt/configs:ro. ENV_KEY_MAP gains SATELLITE_PROVIDER_URL → c11_tile_manager.satellite_provider_url and SATELLITE_PROVIDER_API_KEY → c11_tile_manager.service_api_key so secrets flow from .env.test and never sit in YAML. README drops the manual export step. 97/97 c11 + config unit tests stay green. Tier-2 re-run (4 failed / 48 passed / 1 skipped / 1 xfailed / 1 xpassed / 2 errors in 84.99s vs baseline 3 skipped — i.e. -2 skipped, +2 errors): AZ-840 orchestrator test moves from SKIP to ERROR with a deeper, real gate — IndexUnavailableError on FaissDescriptorIndex against a fresh c6_tile_cache.root_dir. AZ-964 (3 SP, todo/) filed for FAISS index bootstrap in the AZ-839 C3 fixture. AZ-965 (3 SP, todo/, blocked by AZ-964) filed for NetVLAD ONNX backbone provisioning — the next gate the orchestrator test will hit once FAISS clears. Cycle-4 e2e gate remains NOT GREEN: AZ-840 chain is now AZ-964 → AZ-965 → PASS; 60s smoke chain is AZ-963 → PASS. OKVIS2 deferral directive (2026-05-29) unchanged — still gated behind Derkachi e2e green, still NOT MET. Co-authored-by: Cursor <cursoragent@cursor.com>
216 lines
12 KiB
Markdown
# E2E replay tests (AZ-404 + AZ-835 + cycle-4)

End-to-end regression suite for the `gps-denied-replay` console-script
(AZ-402). Two distinct entry points live here:

| Entry point | Source | Coverage |
|-------------|--------|----------|
| **AZ-265 / AZ-404** — 60 s Derkachi clip with synthetic tlog | `test_derkachi_1min.py` | Original AC-1..AC-10 of the replay epic; runs on Tier-1 + Tier-2 |
| **AZ-835 / AZ-840** — full `(tlog, video, calibration)` orchestrator | `test_az835_e2e_real_flight.py` | Tier-2 only; closes the real-flight validation loop end-to-end (extract → seed → FAISS → run → verdict) |

The cycle-4 replay-input redesign (AZ-894 / AZ-895 / AZ-896) replaces
the tlog auto-sync surface with a CSV-driven path; the AZ-265 suite is
the regression net that catches drift in the legacy path during the
deprecation window. See `replay_protocol.md` Invariants 12-14 for the
authoritative contract.

## How to run

### AZ-404 Derkachi 60 s suite (Tier-1 + Tier-2)

```bash
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/test_derkachi_1min.py -v
```

Without `RUN_REPLAY_E2E=1` the heavy tests skip cleanly. The
unconditional tests (the AC-4a mode-agnosticism scan, the AC-7 skip-gate
self-check, and the helper tests in `test_helpers.py`) still run.

### AZ-835 orchestrator test — full `(tlog, video, calibration)` loop (Tier-2 only)

Closes Epic AZ-835's narrative: given a real-flight `.tlog` + the
matching nadir video + camera calibration, the orchestrator runs the
7-step pipeline end-to-end and writes a verdict report.

**Required inputs** (already in-repo for the Derkachi reference fixture):

- `.tlog` — pymavlink binary log from a real flight. Reference fixture:
  `_docs/00_problem/input_data/flight_derkachi/data_imu.csv` (the canonical
  CSV that `_tlog_synth.py` reconstructs the tlog from) plus the synthesised
  tlog the conftest emits at session start.
- Nadir video — `_docs/00_problem/input_data/flight_derkachi/*.mp4` (large
  asset; not always checked in to the workstation clone — pull from the
  Jetson e2e harness or git LFS if absent).
- Calibration — `tests/fixtures/calibration/adti26.json` (factory-sheet
  approximation for the Topotek KHP20S30; real intrinsics still TBD).

**Tier-2 invocation** (Jetson):

```bash
ssh jetson-e2e
cd /workspace/gps-denied-onboard
export RUN_REPLAY_E2E=1
pytest tests/e2e/replay/test_az835_e2e_real_flight.py -v --tb=short -m tier2
```

AZ-962: `docker-compose.test.jetson.yml` exports
`GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml`
automatically and bind-mounts `./configs:/opt/configs:ro`, so no
manual env-var export is required when running through
`scripts/run-tests-jetson.sh`. The YAML at `configs/operator_replay.yaml`
declares the four blocks the fixture requires (c6 / c7 / c10 / c11);
secrets (`SATELLITE_PROVIDER_API_KEY`) flow in from `.env.test` via
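the loader's `ENV_KEY_MAP`. `c10_provisioning.backbones` is
intentionally empty pending AZ-965 (NetVLAD ONNX backbone provisioning).
Until AZ-964 (FAISS index bootstrap in the AZ-839 C3 fixture) lands, the
orchestrator test stops earlier with an `IndexUnavailableError` from
`FaissDescriptorIndex` against a fresh `c6_tile_cache.root_dir`; the
"no backbones" gate is the next one it will hit.
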
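The environment-to-config overlay described above could look roughly like the sketch below. Only the two `ENV_KEY_MAP` entries come from the AZ-962 notes; the function name `apply_env_overrides`, the dotted-path walk, and the sample URL and key are assumptions made for illustration, not the project's loader.

```python
import copy
import os
from typing import Any, Mapping

# The two entries below are the ones named in the AZ-962 notes.
ENV_KEY_MAP = {
    "SATELLITE_PROVIDER_URL": "c11_tile_manager.satellite_provider_url",
    "SATELLITE_PROVIDER_API_KEY": "c11_tile_manager.service_api_key",
}


def apply_env_overrides(
    config: Mapping[str, Any], env: Mapping[str, str] = os.environ
) -> dict:
    """Hypothetical helper: return a copy of `config` with mapped env values overlaid."""
    merged = copy.deepcopy(dict(config))
    for env_name, dotted_path in ENV_KEY_MAP.items():
        value = env.get(env_name)
        if not value:
            continue  # unset or empty: keep whatever the YAML declared
        *parents, leaf = dotted_path.split(".")
        node = merged
        for key in parents:
            node = node.setdefault(key, {})  # create missing blocks on the way down
        node[leaf] = value
    return merged


if __name__ == "__main__":
    # Placeholder values, not real endpoints or credentials.
    yaml_config = {"c11_tile_manager": {"satellite_provider_url": "http://localhost:8080"}}
    fake_env = {"SATELLITE_PROVIDER_API_KEY": "example-key"}
    print(apply_env_overrides(yaml_config, fake_env))
```

Returning a deep copy keeps the parsed YAML untouched, so the secret only ever lives in the merged in-memory config.
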
The bundled local-development entry point is `scripts/run-tests-jetson.sh`,
which handles the SSH alias + rsync + remote pytest invocation. See
`_docs/02_document/tests/tier2-jetson-testing.md` for the harness contract.

**Skip gates (in evaluation order)**:

1. `@pytest.mark.tier2` — the per-suite Tier-2 plugin gates this off on dev
   macOS / Tier-1 Docker (matches the AZ-839 / AZ-699 contract).
2. `RUN_REPLAY_E2E` not in `{1, true, yes, on}`.
3. `gps-denied-replay` console-script not on `PATH`.
4. Real Derkachi video missing or placeholder-sized.
5. `operator_pre_flight_setup` fixture itself skipped — the downstream
   consumer inherits the SKIP automatically (pytest's fixture-skip
   propagation).

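Gates 2 to 4 above are plain environment and filesystem checks, so they can be sketched without pytest. The following is a minimal illustration, not the suite's code: the function names, the case and whitespace normalisation, and the `min_video_bytes` threshold for "placeholder-sized" are all assumptions. Gates 1 and 5 depend on the pytest marker and fixture machinery and are left out.

```python
import os
import shutil
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag_enabled(env: Mapping[str, str], name: str = "RUN_REPLAY_E2E") -> bool:
    """True when the flag holds one of the accepted truthy spellings."""
    # Lower-casing and stripping whitespace is an assumption of this sketch.
    return env.get(name, "").strip().lower() in _TRUTHY


def first_skip_reason(
    env: Mapping[str, str], video_path: str, min_video_bytes: int = 1_000_000
) -> Optional[str]:
    """Return the first failing gate's reason, or None when the test may run."""
    if not env_flag_enabled(env):
        return "RUN_REPLAY_E2E not enabled"
    if shutil.which("gps-denied-replay") is None:
        return "gps-denied-replay console-script not on PATH"
    # min_video_bytes is a made-up cut-off for a placeholder file.
    if not os.path.isfile(video_path) or os.path.getsize(video_path) < min_video_bytes:
        return "real Derkachi video missing or placeholder-sized"
    return None


if __name__ == "__main__":
    print(first_skip_reason(dict(os.environ), "flight_derkachi.mp4"))
```

Checking the gates in a fixed order means the reported skip reason always names the earliest unmet precondition.
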
**Expected runtime on Tier-2 Jetson AGX Orin** (cold cache): ≤ 8 min
end-to-end (≤ 5 min C3 fixture cold-start budget + ≤ 3 min for the
replay + verdict compute). Warm-cache reinvocations within the same
compose session: ≤ 60 s.

**Verdict report location**: `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
The report is emitted ALWAYS, regardless of PASS / FAIL on the AZ-696
AC-3 threshold (≥ 80 % of emissions within 100 m of tlog ground truth).
The success criterion at the fixture level is "honest report exists with
distribution data", not "PASS". The PASS / FAIL line of the report itself
is the operator-facing answer to "did this flight clip localise within
the threshold".

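The AC-3 threshold reduces to a small computation: measure the horizontal error of each emission against ground truth, then check what fraction falls within 100 m. The sketch below uses a haversine distance as a stand-in; the suite's own helpers (`l2_horizontal_m`, `match_percentage` in `_helpers.py`) are not shown here and may compute this differently. The function names and the sample coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def horizontal_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def verdict(
    pairs: Sequence[Tuple[LatLon, LatLon]],
    max_error_m: float = 100.0,
    min_fraction: float = 0.80,
) -> Tuple[float, bool]:
    """Return (fraction of emissions within max_error_m, PASS flag)."""
    errors: List[float] = [horizontal_distance_m(*est, *truth) for est, truth in pairs]
    if not errors:
        return 0.0, False  # no emissions cannot pass
    fraction = sum(e <= max_error_m for e in errors) / len(errors)
    return fraction, fraction >= min_fraction


if __name__ == "__main__":
    truth = (50.0, 36.0)  # made-up coordinates
    pairs = [(truth, truth)] * 4 + [((51.0, 36.0), truth)]
    print(verdict(pairs))
```

Returning the fraction alongside the flag matches the report's intent: the distribution data is useful even when the PASS line is not reached.
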
### Imagery source license attribution (dev/research use only)

The Jetson e2e harness's `satellite-provider` instance downloads tiles
from the **Google Maps satellite layer** (`mt0..mt3.google.com/vt/lyrs=s`),
governed by Google Maps Platform Terms of Service. Every tile served by
the harness carries the **"Imagery © Google"** attribution string.

**This is dev/research use only.** Production deployment of the
gps-denied-onboard companion against a Google-Maps-sourced
`satellite-provider` requires either a Google Maps Platform licensing
review or migration to a true CC-BY satellite source on the parent-suite
side (parent-suite ticket TBD; see `_docs/02_document/architecture.md`
§ `satellite-provider` integration). The onboard-side seed scripts
(`tests/fixtures/derkachi_c6/seed_region.py`, `seed_route.py`) propagate
the attribution into the test fixture's metadata; do not remove it.

## Fixture state

| Artifact | Status | Source |
|----------|--------|--------|
| `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
| `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a `pymavlink` `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 is `xfail`ed until a real calibration ships. |
| Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |

## Clip range

The first 60 s of the Derkachi flight (Time=0.0 → Time=60.0). The
take-off region exercises the AZ-405 IMU-take-off auto-sync detector;
the cruise region that follows stresses the satellite-anchor + VIO
drift-correction path. To change the trim, edit `_CLIP_START_S` and
`_CLIP_END_S` in `conftest.py`.

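As a rough picture of what the trim does to the CSV, the sketch below filters rows by a time column. It assumes the column is literally named `Time`, that both clip bounds are inclusive, and that the constants hold `0.0` and `60.0`; `clip_rows` and the synthetic data are hypothetical, and the real `conftest.py` may trim differently.

```python
import csv
import io
from typing import Dict, List

# Values assumed from the clip description (Time=0.0 to Time=60.0).
_CLIP_START_S = 0.0
_CLIP_END_S = 60.0


def clip_rows(
    csv_text: str,
    start_s: float = _CLIP_START_S,
    end_s: float = _CLIP_END_S,
    time_column: str = "Time",  # assumed column name
) -> List[Dict[str, str]]:
    """Keep the rows whose timestamp lies in [start_s, end_s]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if start_s <= float(row[time_column]) <= end_s]


if __name__ == "__main__":
    # Synthetic stand-in for data_imu.csv: 4900 rows at 10 Hz.
    lines = ["Time,value"] + [f"{i / 10:.1f},{i}" for i in range(4900)]
    kept = clip_rows("\n".join(lines))
    print(len(kept), kept[0]["Time"], kept[-1]["Time"])  # 601 0.0 60.0
```

With inclusive bounds at 10 Hz the 60 s clip holds 601 samples, one more than 60 × 10, because both endpoints are kept.
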
## Expected runtime (Tier-1)

| Test | Expected wall clock |
|------|---------------------|
| AC-1 (`--pace asap`) | ≤ 30 s |
| AC-2 schema match | piggybacks on AC-1 |
| AC-5 determinism | 2 × asap runs (≤ 60 s total) |
| AC-6 realtime | 60 s ± 3 s |
| AC-6 asap | ≤ 30 s |
| Total suite | ≤ 6 min on Jetson AGX Orin |

The AC-1 / AC-2 / AC-5 tests share `--pace asap` runs but each
fixture invocation produces a fresh output file, so they do not
short-circuit each other (preserves AC-5's two-runs-diff guarantee).

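One simple way to picture the two-runs-diff guarantee is a byte-level comparison of the two output files. The sketch below hashes each file and compares digests; whether the real AC-5 test compares bytes, parsed records, or something else is not stated here, so treat `file_digest` and `runs_are_identical` as hypothetical names for an illustrative approach.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def runs_are_identical(first: Path, second: Path) -> bool:
    """True when both run outputs are byte-for-byte equal."""
    return file_digest(first) == file_digest(second)


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    run_a = workdir / "run_a.jsonl"
    run_b = workdir / "run_b.jsonl"
    # Stand-in content; a real run would be written by the replay CLI.
    run_a.write_text('{"tick": 0}\n')
    run_b.write_text('{"tick": 0}\n')
    print(runs_are_identical(run_a, run_b))
```

Because each fixture invocation writes a fresh file, a comparison like this always looks at two independent runs and never at the same file twice.
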
## AC matrix

| AC | Test | State |
|----|------|-------|
| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | runs on Tier-1 |
| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | runs on Tier-1 |
| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | `xfail` (waiting on real calibration) |
| AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
| AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | `skip` (waiting on AZ-558) |
| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | runs on Tier-1 |
| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | runs on Tier-1 |
| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | runs on Tier-1 |
| AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
| AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | `skip` (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
| AC-10: README accuracy | this file | live |

## Failure-mode cookbook

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `gps-denied-replay console-script not on PATH` | package not installed in the test venv | `pip install -e .` |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (synthesizer is deterministic; non-determinism would be a real bug) |
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground truth source — verify the GLOBAL_POSITION_INT columns are still the AC-3 reference (per `flight_derkachi/README.md`) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in C5 estimator OR a clock leaked into the runtime | bisect via `git log` against the C5 / `clock` modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust `_HEAVY_SKIP` boundary if it persists |
| `tlog missing required messages` | `_tlog_synth.py` lost a message group | check `_REQUIRED_MESSAGE_GROUPS` in `tlog_replay_adapter.py` against the synth output |

## Files

```
tests/e2e/replay/
├── README.md               ← this file
├── __init__.py             ← package marker + module-level docstring
├── _helpers.py             ← parse_jsonl, l2_horizontal_m, match_percentage,
│                             CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py          ← CSV → tlog generator
├── conftest.py             ← derkachi_replay_inputs, replay_runner,
│                             operator_pre_flight_setup fixtures
├── test_helpers.py         ← unit tests for _helpers (unconditional)
└── test_derkachi_1min.py   ← AC-1..AC-8 + AC-7 skip gate + AC-4a AST scan
```

## Follow-up work

* **Real Topotek KHP20S30 calibration** — unblocks AC-3.
* **AZ-558** — closes AC-4b (route C8 encoders through `MavlinkTransport`).
* **D-PROJ-2 mock-suite-sat-service** — unblocks AC-8 (operator
  workflow rehearsal).

## Epic AZ-835 ticket map

The Tier-2 orchestrator path shipped under Epic
[AZ-835](https://denyspopov.atlassian.net/browse/AZ-835). Sub-tickets:

| Ticket | Role |
|--------|------|
| [AZ-836](https://denyspopov.atlassian.net/browse/AZ-836) | `TlogRouteExtractor` — active-segment trim + Douglas-Peucker coarsen tlog GPS to ≤ N waypoints |
| [AZ-838](https://denyspopov.atlassian.net/browse/AZ-838) | `SatelliteProviderRouteClient` + `seed_route.py` CLI — POST RouteSpec to satellite-provider, poll `mapsReady` |
| [AZ-839](https://denyspopov.atlassian.net/browse/AZ-839) | C3 `operator_pre_flight_setup` real fixture — wires C1+C2+C11+C10 against the seeded catalog |
| [AZ-840](https://denyspopov.atlassian.net/browse/AZ-840) | C4 E2E orchestrator test — drives the full 7-step pipeline from `(tlog, video, calibration)` |
| [AZ-842](https://denyspopov.atlassian.net/browse/AZ-842) | C6 Docs — `replay_protocol.md` Invariants 12-14 + `architecture.md` + this README (cycle-4 rescope) |

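For readers unfamiliar with the Douglas-Peucker coarsening named in the AZ-836 row, here is a minimal, self-contained sketch. It works on planar `(x, y)` points in metres (projecting lat/lon is out of scope), and it reaches "at most N waypoints" by growing the tolerance until the count fits. That growth loop, the function names, and the default parameters are assumptions of this sketch; `TlogRouteExtractor` may take a different route to the same bound.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) in metres


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Drop points that lie within epsilon of the simplified polyline."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]  # iterative form avoids deep recursion
    while stack:
        first, last = stack.pop()
        worst_dist, worst_idx = 0.0, -1
        for i in range(first + 1, last):
            d = _point_segment_distance(points[i], points[first], points[last])
            if d > worst_dist:
                worst_dist, worst_idx = d, i
        if worst_idx != -1 and worst_dist > epsilon:
            keep[worst_idx] = True
            stack.append((first, worst_idx))
            stack.append((worst_idx, last))
    return [p for p, kept in zip(points, keep) if kept]


def coarsen_to_max(
    points: Sequence[Point], max_points: int, epsilon: float = 1.0, growth: float = 2.0
) -> List[Point]:
    """Grow epsilon until at most max_points waypoints remain (assumed strategy)."""
    if max_points < 2 or epsilon <= 0.0 or growth <= 1.0:
        raise ValueError("need max_points >= 2, epsilon > 0 and growth > 1")
    simplified = douglas_peucker(points, epsilon)
    while len(simplified) > max_points:
        epsilon *= growth
        simplified = douglas_peucker(points, epsilon)
    return simplified


if __name__ == "__main__":
    zigzag = [(float(i), (i % 2) * 10.0) for i in range(11)]
    print(coarsen_to_max(zigzag, max_points=3))
```

The loop always terminates: once the tolerance exceeds the largest deviation, only the two endpoints survive, which satisfies any bound of two or more.
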
The cycle-4 replay-input redesign tickets ride alongside the Epic:

| Ticket | Role |
|--------|------|
| [AZ-894](https://denyspopov.atlassian.net/browse/AZ-894) | `CsvReplayInputAdapter` — new CSV-driven primary path on the single canonical clock |
| [AZ-895](https://denyspopov.atlassian.net/browse/AZ-895) | Auto-sync surface deprecation — tlog adapter reduced to audit-only role |
| [AZ-896](https://denyspopov.atlassian.net/browse/AZ-896) | CSV format spec (`csv_replay_format.md`) + example `data_imu.csv` |
| [AZ-897](https://denyspopov.atlassian.net/browse/AZ-897) | Operator-facing UI (React + Tailwind paired-upload form) — cycle 5+ |