# E2E replay tests (AZ-404 + AZ-835 + cycle-4)

End-to-end regression suite for the `gps-denied-replay` console-script
(AZ-402). Two distinct entry points live here:

| Entry point | Source | Coverage |
|-------------|--------|----------|
| **AZ-265 / AZ-404** — 60 s Derkachi clip with synthetic tlog | `test_derkachi_1min.py` | Original AC-1..AC-10 of the replay epic; runs on Tier-1 + Tier-2 |
| **AZ-835 / AZ-840** — full `(tlog, video, calibration)` orchestrator | `test_az835_e2e_real_flight.py` | Tier-2 only; closes the real-flight validation loop end-to-end (extract → seed → FAISS → run → verdict) |

The cycle-4 replay-input redesign (AZ-894 / AZ-895 / AZ-896) replaces
the tlog auto-sync surface with a CSV-driven path; the AZ-265 suite is
the regression net that catches drift in the legacy path during the
deprecation window. See `replay_protocol.md` Invariants 12-14 for the
authoritative contract.

## How to run

### AZ-404 Derkachi 60 s suite (Tier-1 + Tier-2)

```bash
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/test_derkachi_1min.py -v
```

Without `RUN_REPLAY_E2E=1` the heavy tests skip cleanly. The
unconditional tests (the AC-4a mode-agnosticism scan and the AC-7
skip-gate self-check, plus the helper tests in `test_helpers.py`) still run.

### AZ-835 orchestrator test — full `(tlog, video, calibration)` loop (Tier-2 only)

Closes Epic AZ-835's narrative: given a real-flight `.tlog` + the
matching nadir video + camera calibration, the orchestrator runs the
7-step pipeline end-to-end and writes a verdict report.

**Required inputs** (already in-repo for the Derkachi reference fixture):

- `.tlog` — pymavlink binary log from a real flight. Reference fixture:
  `_docs/00_problem/input_data/flight_derkachi/data_imu.csv` (the canonical
  CSV that `_tlog_synth.py` reconstructs the tlog from) plus the synthesised
  tlog the conftest emits at session start.
- Nadir video — `_docs/00_problem/input_data/flight_derkachi/*.mp4` (large
  asset; not always checked in to the workstation clone — pull from the
  Jetson e2e harness or git LFS if absent).
- Calibration — `tests/fixtures/calibration/adti26.json` (factory-sheet
  approximation for the Topotek KHP20S30; real intrinsics still TBD).


**Tier-2 invocation** (Jetson):

```bash
ssh jetson-e2e
cd /workspace/gps-denied-onboard
export RUN_REPLAY_E2E=1
pytest tests/e2e/replay/test_az835_e2e_real_flight.py -v --tb=short -m tier2
```

AZ-962: `docker-compose.test.jetson.yml` exports
`GPS_DENIED_OPERATOR_CONFIG_PATH=/opt/configs/operator_replay.yaml`
automatically and bind-mounts `./configs:/opt/configs:ro`, so no
manual env-var export is required when running through
`scripts/run-tests-jetson.sh`. The YAML at `configs/operator_replay.yaml`
declares the four blocks the fixture requires (c6 / c7 / c10 / c11);
secrets (`SATELLITE_PROVIDER_API_KEY`) flow in from `.env.test` via
the loader's `ENV_KEY_MAP`. `c10_provisioning.backbones` is
intentionally empty pending AZ-964 (the orchestrator test will
SKIP at the "no backbones" gate until AZ-964 lands).

The bundled local-development entry point is `scripts/run-tests-jetson.sh`,
which handles the SSH alias + rsync + remote pytest invocation. See
`_docs/02_document/tests/tier2-jetson-testing.md` for the harness contract.


**Skip gates (in evaluation order)**:

1. `@pytest.mark.tier2` — the per-suite Tier-2 plugin gates this off on dev
   macOS / Tier-1 Docker (matches the AZ-839 / AZ-699 contract).
2. `RUN_REPLAY_E2E` not in `{1, true, yes, on}`.
3. `gps-denied-replay` console-script not on `PATH`.
4. Real Derkachi video missing or placeholder-sized.
5. `operator_pre_flight_setup` fixture itself skipped — the downstream
   consumer inherits the SKIP automatically (pytest's fixture-skip
   propagation).

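
Gates 2 and 3 can be sketched in Python as follows. This is a minimal illustration, not the suite's actual conftest code: the helper names `replay_e2e_enabled` and `first_skip_reason` are hypothetical, and matching the accepted values case-insensitively is an assumption.

```python
import os
import shutil

# Accepted spellings for RUN_REPLAY_E2E, taken from gate 2 above.
_TRUTHY = {"1", "true", "yes", "on"}


def replay_e2e_enabled(environ=None):
    """Return True when RUN_REPLAY_E2E holds one of the accepted values."""
    env = os.environ if environ is None else environ
    # Assumption: the comparison ignores case and surrounding whitespace.
    return env.get("RUN_REPLAY_E2E", "").strip().lower() in _TRUTHY


def first_skip_reason(environ=None, which=shutil.which):
    """Return the first failing gate's reason, or None if both gates pass."""
    if not replay_e2e_enabled(environ):
        return "RUN_REPLAY_E2E not enabled"
    if which("gps-denied-replay") is None:
        return "gps-denied-replay console-script not on PATH"
    return None
```

In a pytest fixture, a non-`None` reason would typically be handed to `pytest.skip(reason)`, which is what lets downstream consumers inherit the skip.
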

**Expected runtime on Tier-2 Jetson AGX Orin** (cold cache): ≤ 8 min
end-to-end (≤ 5 min C3 fixture cold-start budget + ≤ 3 min for the
replay + verdict compute). Warm-cache reinvocations within the same
compose session: ≤ 60 s.

**Verdict report location**: `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
The report is always emitted, regardless of PASS / FAIL on the AZ-696
AC-3 threshold (≥ 80 % of emissions within 100 m of tlog ground truth).
The success criterion at the fixture level is "honest report exists with
distribution data", not "PASS". The PASS / FAIL line of the report itself
is the operator-facing answer to "did this flight clip localise within
the threshold".

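
As a rough illustration of how the PASS / FAIL line could be derived from the AC-3 threshold, the sketch below computes the fraction of emissions within 100 m of ground truth. The function names are hypothetical (the suite's own `l2_horizontal_m` and `match_percentage` helpers may differ in signature and method), and the haversine formula is used here only as a stand-in distance metric.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; adequate at the 100 m scale


def horizontal_distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def verdict(pairs, threshold_m=100.0, required_fraction=0.80):
    """pairs: iterable of ((est_lat, est_lon), (truth_lat, truth_lon)).

    Returns ("PASS" | "FAIL", fraction of pairs within threshold_m).
    """
    errors = [horizontal_distance_m(*est, *truth) for est, truth in pairs]
    if not errors:
        # No emissions at all cannot satisfy the threshold.
        return "FAIL", 0.0
    fraction = sum(e <= threshold_m for e in errors) / len(errors)
    return ("PASS" if fraction >= required_fraction else "FAIL"), fraction
```

Returning the fraction alongside the label mirrors the fixture-level criterion above: the distribution data is reported whether or not the threshold is met.
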
### Imagery source license attribution (dev/research use only)

The Jetson e2e harness's `satellite-provider` instance downloads tiles
from the **Google Maps satellite layer** (`mt0..mt3.google.com/vt/lyrs=s`),
governed by Google Maps Platform Terms of Service. Every tile served by
the harness carries the **"Imagery © Google"** attribution string.

**This is dev/research use only.** Production deployment of the
gps-denied-onboard companion against a Google-Maps-sourced
`satellite-provider` requires either a Google Maps Platform licensing
review or migration to a true CC-BY satellite source on the parent-suite
side (parent-suite ticket TBD; see `_docs/02_document/architecture.md`
§ `satellite-provider` integration). The onboard-side seed scripts
(`tests/fixtures/derkachi_c6/seed_region.py`, `seed_route.py`) propagate
the attribution into the test fixture's metadata; do not remove it.

## Fixture state

| Artifact | Status | Source |
|----------|--------|--------|
| `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
| `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a `pymavlink` `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 accuracy depends on this. |
| Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |

## Clip range

The first 60 s of the Derkachi flight (Time=0.0 → Time=60.0). The
take-off region exercises the AZ-405 IMU-take-off auto-sync detector;
the cruise region that follows stresses the satellite-anchor + VIO
drift-correction path. To change the trim, edit `_CLIP_START_S` and
`_CLIP_END_S` in `conftest.py`.

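
The trim itself amounts to a time-window filter over the CSV rows. A minimal sketch, assuming a `Time` column in seconds (inferred from the `Time=0.0` notation above) and an inclusive end bound; `trim_rows` is a hypothetical helper, not the conftest implementation:

```python
import csv
import io

# Stand-ins for _CLIP_START_S / _CLIP_END_S in conftest.py.
CLIP_START_S = 0.0
CLIP_END_S = 60.0


def trim_rows(csv_text, start_s=CLIP_START_S, end_s=CLIP_END_S, time_column="Time"):
    """Return the CSV rows whose timestamp lies in [start_s, end_s]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if start_s <= float(row[time_column]) <= end_s]
```

For example, a file with rows at `Time` 0.0, 59.9, 60.0 and 60.1 keeps the first three under these bounds.
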
## Expected runtime (Tier-1)

| Test | Expected wall clock |
|------|---------------------|
| AC-1 (`--pace asap`) | ≤ 30 s (Tier-2 only) |
| AC-2 schema match | piggybacks on AC-1 (Tier-2 only) |
| AC-5 determinism | 2 × asap runs (≤ 60 s total; Tier-2 only) |
| AC-6 realtime | 60 s ± 3 s (Tier-2 only) |
| AC-6 asap | ≤ 30 s (Tier-2 only) |
| Total suite | ~6 min wall clock on Tier-2; skips on Mac |

The AC-1 / AC-2 / AC-5 tests share `--pace asap` runs but each
fixture invocation produces a fresh output file, so they do not
short-circuit each other (preserves AC-5's two-runs-diff guarantee).

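
A two-runs-diff check of this kind can be sketched as a digest comparison of the two output files. This assumes determinism means byte-identical output, which the actual AC-5 test may define differently; both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(first_run_path, second_run_path):
    """True when two replay output files are byte-for-byte equal."""
    return file_digest(first_run_path) == file_digest(second_run_path)
```

Because each run writes its own fresh file, the comparison always sees two independent outputs rather than the same file twice.
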
## AC matrix

| AC | Test | State |
|----|------|-------|
| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | `xfail` (AZ-963 — open-loop ESKF) |
| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | Tier-2 (Jetson only) |
| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | `xfail` (AZ-963 — open-loop ESKF) |
| AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
| AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | `skip` (waiting on AZ-558) |
| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | `xfail` (AZ-963 — open-loop ESKF) |
| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | `xfail` (AZ-963 — open-loop ESKF) |
| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | `xfail` (AZ-963 — open-loop ESKF) |
| AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
| AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | `skip` (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
| AC-10: README accuracy | this file | live |

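
AC-6a's ± 5 % window around the 60 s nominal works out to 60 × 0.05 = 3 s, i.e. 57 s to 63 s of wall clock. A minimal sketch of such a check, with hypothetical helper names that are not taken from the suite:

```python
import time


def measure_wall_clock(fn):
    """Run fn() once and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    fn()
    return time.monotonic() - start


def within_relative_tolerance(elapsed_s, nominal_s=60.0, tolerance=0.05):
    """True when elapsed_s lies within nominal_s * (1 ± tolerance)."""
    return abs(elapsed_s - nominal_s) <= nominal_s * tolerance
```

`time.monotonic()` is used rather than `time.time()` so that system clock adjustments during a run cannot distort the measurement.
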
## Failure-mode cookbook

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `gps-denied-replay console-script not on PATH` | package not installed in the test venv | `pip install -e .` |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (synthesizer is deterministic; non-determinism would be a real bug) |
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground truth source — verify the GLOBAL_POSITION_INT columns are still the AC-3 reference (per `flight_derkachi/README.md`) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in C5 estimator OR a clock leaked into the runtime | bisect via `git log` against the C5 / `clock` modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust `_HEAVY_SKIP` boundary if it persists |
| `tlog missing required messages` | `_tlog_synth.py` lost a message group | check `_REQUIRED_MESSAGE_GROUPS` in `tlog_replay_adapter.py` against the synth output |

## Files

```
tests/e2e/replay/
├── README.md                        ← this file
├── __init__.py                      ← package marker + module-level docstring
├── _helpers.py                      ← parse_jsonl, l2_horizontal_m, match_percentage,
│                                      CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py                   ← CSV → tlog generator
├── conftest.py                      ← derkachi_replay_inputs, replay_runner,
│                                      operator_pre_flight_setup fixtures
├── test_helpers.py                  ← unit tests for _helpers (unconditional)
├── test_derkachi_1min.py            ← AC-1..AC-8, including the AC-7 skip gate and AC-4a AST scan
└── test_az835_e2e_real_flight.py    ← AZ-835 / AZ-840 orchestrator test (Tier-2 only)
```

## Follow-up work

* **AZ-963** — five Derkachi ACs (`AC-1`, `AC-3`, `AC-5`, `AC-6a`, `AC-6b`)
  are `xfail` until a reference C6 tile cache exists (resolution path:
  AZ-777 / AZ-974).
* **Real Topotek KHP20S30 calibration** — needed for AC-3 accuracy even
  after AZ-777 lands (the threshold is ≤ 100 m for 80 % of ticks).
* **AZ-558** — closes AC-4b (route C8 encoders through `MavlinkTransport`).
* **D-PROJ-2 mock-suite-sat-service** — unblocks AC-8 (operator
  workflow rehearsal).

## Epic AZ-835 ticket map

The Tier-2 orchestrator path shipped under Epic
[AZ-835](https://denyspopov.atlassian.net/browse/AZ-835). Sub-tickets:

| Ticket | Role |
|--------|------|
| [AZ-836](https://denyspopov.atlassian.net/browse/AZ-836) | `TlogRouteExtractor` — active-segment trim + Douglas-Peucker coarsen tlog GPS to ≤ N waypoints |
| [AZ-838](https://denyspopov.atlassian.net/browse/AZ-838) | `SatelliteProviderRouteClient` + `seed_route.py` CLI — POST RouteSpec to satellite-provider, poll `mapsReady` |
| [AZ-839](https://denyspopov.atlassian.net/browse/AZ-839) | C3 `operator_pre_flight_setup` real fixture — wires C1+C2+C11+C10 against the seeded catalog |
| [AZ-840](https://denyspopov.atlassian.net/browse/AZ-840) | C4 E2E orchestrator test — drives the full 7-step pipeline from `(tlog, video, calibration)` |
| [AZ-842](https://denyspopov.atlassian.net/browse/AZ-842) | C6 Docs — `replay_protocol.md` Invariants 12-14 + `architecture.md` + this README (cycle-4 rescope) |


The cycle-4 replay-input redesign tickets ride alongside the Epic:

| Ticket | Role |
|--------|------|
| [AZ-894](https://denyspopov.atlassian.net/browse/AZ-894) | `CsvReplayInputAdapter` — new CSV-driven primary path on the single canonical clock |
| [AZ-895](https://denyspopov.atlassian.net/browse/AZ-895) | Auto-sync surface deprecation — tlog adapter reduced to audit-only role |
| [AZ-896](https://denyspopov.atlassian.net/browse/AZ-896) | CSV format spec (`csv_replay_format.md`) + example `data_imu.csv` |
| [AZ-897](https://denyspopov.atlassian.net/browse/AZ-897) | Operator-facing UI (React + Tailwind paired-upload form) — cycle 5+ |