mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 16:11:13 +00:00
d7e6b0959e
Batch 63 of /autodev replay slice. Adds the AZ-404 E2E test harness against the Derkachi fixture and resolves the AZ-389 dependency phantom (closing AZ-559 Won't Fix).

E2E test (AZ-404)
- tests/e2e/replay/_tlog_synth.py: deterministic CSV->tlog generator (the original Derkachi tlog is not in repo; data_imu.csv is its export, so we round-trip the CSV through pymavlink). Verified: SCALED_IMU2 + ATTITUDE + GPS_RAW_INT + HEARTBEAT round-trip cleanly through mavutil.mavlink_connection.
- tests/e2e/replay/_helpers.py: parse_jsonl, l2_horizontal_m (haversine), match_percentage, CapturingMavlinkTransport (ready for AZ-558 unblock), GroundTruthRow + load_ground_truth_csv.
- tests/e2e/replay/conftest.py: derkachi_replay_inputs (session scope), replay_runner (subprocess fixture per AZ-402 CLI), operator_pre_flight_setup placeholder.
- tests/e2e/replay/test_derkachi_1min.py: 9 tests covering AC-1..AC-8 with AC-7 skip-gate self-check + AC-4a mode-agnosticism AST scan (passes unconditionally, confirms ADR-011 holding).
- tests/e2e/replay/test_helpers.py: 14 unit tests covering AC-9 helper L2 correctness + match_percentage + parse_jsonl + CapturingMavlinkTransport (all unconditional).
- tests/e2e/replay/README.md: AC matrix, fixture state, runtime budget, failure cookbook (AC-10).

AC matrix
- AC-1, AC-2, AC-5, AC-6 implemented and Tier-1 gated on RUN_REPLAY_E2E=1.
- AC-3 (<=100m for 80%) xfail until real Topotek KHP20S30 calibration ships (camera_info.md states intrinsics are unknown).
- AC-4a (mode-agnosticism AST scan) PASSES unconditionally.
- AC-4b (encoder byte-equality) skip until AZ-558 routes C8 bytes through MavlinkTransport.
- AC-7 (skip-gate self-check) PASSES unconditionally.
- AC-8 (operator workflow rehearsal) skip until D-PROJ-2 mock-suite-sat-service implements tile-fetch + index-build endpoints.
- AC-9 (helper L2 correctness) 14 PASSES unconditionally.

AZ-389 housekeeping
- AZ-559 closed Won't Fix: investigation against c6_tile_cache/_types.py confirmed TileSource.ONBOARD_INGEST + TileMetadata.quality_metadata + write_tile's FreshnessRejectionError already cover the mid-flight ingest semantic. The "missing API" was a spec-vs-impl naming mismatch.
- AZ-389 spec rewritten to consume the existing write_tile API + catch FreshnessRejectionError per AC-NEW-3 opportunistic emission.
- _dependencies_table.md reverted: AZ-389 deps -> AZ-303 (was AZ-559 in the previous commit on this branch); total 150 / 497 pts.

Tests
- Full regression: 2099 passed (+14 new e2e/replay), 94 skipped (incl. 8 e2e/replay heavy-tier + documented blocker skips), 3 perf-microbench flakes deselected (test_cli_cold_start_under_2s, test_cold_start_under_500ms_p99, test_nfr_perf_sign_microbench; all pass in isolation - pre-existing under-load flakes on dev macOS).

Reviews
- _docs/03_implementation/reviews/batch_63_review.md: code review PASS_WITH_WARNINGS (3 documented spec-gap deferrals: AC-3, AC-4b, AC-8).
- _docs/03_implementation/cumulative_review_batches_61-63_cycle1_report.md: cumulative review PASS_WITH_WARNINGS. Action items: prioritise AZ-558 (closes AZ-401 AC-9 + AZ-404 AC-4b); consider 2pt hygiene PBI for Protocol-completeness AST scan to catch the AZ-389 / AZ-559 phantom-API pattern at task-prep time.

Architecture invariants observably holding
- ADR-011 (replay-as-configuration): AC-4a's AST scan over src/gps_denied_onboard/components/**/*.py finds zero violations - components branch on neither config.mode nor any synonym.
- Single composition root (replay protocol Invariant 11): AZ-402 CLI dispatches to runtime_root.main(config); does not call compose_root directly.

Co-authored-by: Cursor <cursoragent@cursor.com>
100 lines
5.2 KiB
Markdown
# E2E replay tests (AZ-404)

End-to-end regression suite that runs the `gps-denied-replay`
console-script (AZ-402) against the Derkachi 60 s clip and asserts
the AZ-265 epic acceptance criteria.

## How to run

```bash
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/ -v
```

Without `RUN_REPLAY_E2E=1` the heavy tests skip cleanly. The
unconditional tests (the AC-4a mode-agnosticism scan, the AC-7
skip-gate self-check, and the helper unit tests in `test_helpers.py`)
still run.

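The gate described above can be pictured as a small predicate on the environment. This is a minimal sketch under stated assumptions, not a copy of `conftest.py`: the function name `heavy_tests_enabled` is hypothetical, and only the variable name `RUN_REPLAY_E2E` comes from this README.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "RUN_REPLAY_E2E"  # variable name taken from this README


def heavy_tests_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when RUN_REPLAY_E2E is set to exactly "1".

    `environ` is injectable so the gate itself can be unit-tested
    without mutating the real process environment.
    """
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "") == "1"


print(heavy_tests_enabled({"RUN_REPLAY_E2E": "1"}))  # True
print(heavy_tests_enabled({}))                       # False
```

In a test module, a predicate like this would typically feed `pytest.mark.skipif(not heavy_tests_enabled(), reason="set RUN_REPLAY_E2E=1 to run")`, so heavy tests report as skipped rather than failed.
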
## Fixture state

| Artifact | Status | Source |
|----------|--------|--------|
| `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
| `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a `pymavlink` `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 is `xfail`ed until a real calibration ships. |
| Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |

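The `data_imu.csv` figures in the table are mutually consistent: with uniform sampling, N rows span (N - 1) sample intervals. A short sketch of that arithmetic follows; the helper name `span_seconds` is hypothetical and is not part of the test suite.

```python
def span_seconds(row_count: int, rate_hz: float) -> float:
    """Time between the first and last sample of a uniformly sampled series."""
    if row_count < 1 or rate_hz <= 0:
        raise ValueError("need at least one row and a positive rate")
    # N rows are separated by N - 1 intervals of 1 / rate_hz seconds each.
    return (row_count - 1) / rate_hz


print(span_seconds(4900, 10.0))  # 489.9
```
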
## Clip range

The first 60 s of the Derkachi flight (Time=0.0 → Time=60.0). The
take-off region exercises the AZ-405 IMU-take-off auto-sync detector;
the cruise region that follows stresses the satellite-anchor + VIO
drift-correction path. To change the trim, edit `_CLIP_START_S` and
`_CLIP_END_S` in `conftest.py`.

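As an illustration of what the two constants control, the sketch below trims CSV rows to the clip window. It is a sketch under assumptions rather than the fixture's actual code: it assumes the CSV has a column literally named `Time` (in seconds) and that both bounds are inclusive, and the function name `trim_rows` and the `ax` column are invented for the example.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

_CLIP_START_S = 0.0   # constant names from this README; values match the 60 s clip
_CLIP_END_S = 60.0


def trim_rows(
    rows: Iterable[Dict[str, str]],
    start_s: float = _CLIP_START_S,
    end_s: float = _CLIP_END_S,
    time_key: str = "Time",  # assumed column name
) -> Iterator[Dict[str, str]]:
    """Yield only the rows whose timestamp lies inside [start_s, end_s]."""
    for row in rows:
        if start_s <= float(row[time_key]) <= end_s:
            yield row


# Tiny in-memory stand-in for data_imu.csv (made-up values).
sample = io.StringIO("Time,ax\n0.0,0.1\n30.0,0.2\n60.0,0.3\n60.1,0.4\n")
kept = list(trim_rows(csv.DictReader(sample)))
print(len(kept))  # 3
```
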
## Expected runtime (Tier-1)

| Test | Expected wall clock |
|------|---------------------|
| AC-1 (`--pace asap`) | ≤ 30 s |
| AC-2 schema match | piggybacks on AC-1 |
| AC-5 determinism | 2 × asap runs (≤ 60 s total) |
| AC-6a realtime | 60 s ± 3 s (± 5 %) |
| AC-6b asap | ≤ 30 s |
| Total suite | ≤ 6 min on Jetson AGX Orin |

The AC-1 / AC-2 / AC-5 tests share `--pace asap` runs, but each
fixture invocation produces a fresh output file, so they do not
short-circuit each other; this preserves AC-5's two-runs-diff
guarantee.

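The two pacing budgets reduce to simple wall-clock checks. The sketch below shows one way to express them; the helper names are hypothetical, and only the thresholds (60 s ± 5 % for realtime, ≤ 30 s for asap) come from this README.

```python
def realtime_within_budget(
    elapsed_s: float, target_s: float = 60.0, rel_tol: float = 0.05
) -> bool:
    """True when a realtime-paced run lands within ±rel_tol of target_s."""
    return abs(elapsed_s - target_s) <= rel_tol * target_s


def asap_within_budget(elapsed_s: float, limit_s: float = 30.0) -> bool:
    """True when an asap-paced run finishes within limit_s."""
    return elapsed_s <= limit_s


print(realtime_within_budget(61.2))  # True: 1.2 s off, budget is 3 s
print(realtime_within_budget(64.0))  # False: 4 s off
print(asap_within_budget(12.5))      # True
```

A real test would measure `elapsed_s` with a monotonic clock (for example `time.monotonic()`) around the subprocess call, so that system clock adjustments do not distort the result.
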
## AC matrix

| AC | Test | State |
|----|------|-------|
| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | runs on Tier-1 |
| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | runs on Tier-1 |
| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | `xfail` (waiting on real calibration) |
| AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
| AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | `skip` (waiting on AZ-558) |
| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | runs on Tier-1 |
| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | runs on Tier-1 |
| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | runs on Tier-1 |
| AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
| AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | `skip` (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
| AC-10: README accuracy | this file | live |

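To make the AC-4a row concrete, here is a minimal sketch of a mode-agnosticism AST scan. It is an illustration under assumptions, not the code in `test_derkachi_1min.py`: it flags only attribute reads of the form `config.mode` or `<anything>.config.mode`, and the function name `find_mode_reads` is hypothetical. The real scan may cover more patterns, such as synonyms of `mode`.

```python
import ast
from typing import List, Tuple


def find_mode_reads(source: str, filename: str = "<memory>") -> List[Tuple[int, str]]:
    """Return (line number, description) for every `config.mode` attribute read."""
    hits: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Attribute) and node.attr == "mode":
            owner = node.value
            # Owner is either a bare name (`config`) or an attribute (`self.config`).
            if isinstance(owner, ast.Name):
                owner_name = owner.id
            else:
                owner_name = getattr(owner, "attr", None)
            if owner_name == "config":
                hits.append((node.lineno, "config.mode"))
    return sorted(hits)


clean = "def step(config):\n    return config.rate_hz\n"
dirty = (
    "def step(self):\n"
    "    if self.config.mode == 'replay':\n"
    "        return 1\n"
    "    return 0\n"
)
print(find_mode_reads(clean))  # []
print(find_mode_reads(dirty))  # [(2, 'config.mode')]
```

A test built on this would read each `*.py` file under the components directory, call the scanner, and assert that the combined hit list is empty.
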
## Failure-mode cookbook

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `gps-denied-replay console-script not on PATH` | package not installed in the test venv | `pip install -e .` |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (the synthesizer is deterministic; non-determinism would be a real bug) |
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground-truth source; verify that the GLOBAL_POSITION_INT columns are still the AC-3 reference (per `flight_derkachi/README.md`) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in the C5 estimator OR a clock leaked into the runtime | bisect via `git log` against the C5 / `clock` modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust `_HEAVY_SKIP` boundary if it persists |
| `tlog missing required messages` | `_tlog_synth.py` lost a message group | check `_REQUIRED_MESSAGE_GROUPS` in `tlog_replay_adapter.py` against the synth output |

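When chasing the AC-5 row, it helps to know where two runs first diverge before bisecting. The sketch below compares two runs line by line; it is a debugging aid written for this README, not part of the suite. The function name `first_divergence` is hypothetical and the JSONL field names in the sample are made up.

```python
from typing import Optional, Sequence, Tuple

Divergence = Tuple[int, Optional[str], Optional[str]]


def first_divergence(run_a: Sequence[str], run_b: Sequence[str]) -> Optional[Divergence]:
    """Return (1-based line number, line_a, line_b) of the first difference.

    Returns None when both runs are identical. A missing line (one run is
    shorter than the other) is reported as None on the shorter side.
    """
    for line_no, (a, b) in enumerate(zip(run_a, run_b), start=1):
        if a != b:
            return line_no, a, b
    if len(run_a) != len(run_b):
        line_no = min(len(run_a), len(run_b)) + 1
        a = run_a[line_no - 1] if line_no <= len(run_a) else None
        b = run_b[line_no - 1] if line_no <= len(run_b) else None
        return line_no, a, b
    return None


# Made-up output lines; the real JSONL schema is defined by AC-2.
run_1 = ['{"t": 0.0, "x": 1.0}', '{"t": 0.1, "x": 1.5}']
run_2 = ['{"t": 0.0, "x": 1.0}', '{"t": 0.1, "x": 1.5000001}']
print(first_divergence(run_1, run_1))     # None
print(first_divergence(run_1, run_2)[0])  # 2
```

In practice the inputs would be the two output files read with `Path.read_text().splitlines()`; the first diverging tick narrows the search to the component that emitted it.
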
## Files

```
tests/e2e/replay/
├── README.md              ← this file
├── __init__.py            ← package marker + module-level docstring
├── _helpers.py            ← parse_jsonl, l2_horizontal_m, match_percentage,
│                            CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py         ← CSV → tlog generator
├── conftest.py            ← derkachi_replay_inputs, replay_runner,
│                            operator_pre_flight_setup fixtures
├── test_helpers.py        ← unit tests for _helpers (unconditional)
└── test_derkachi_1min.py  ← AC-1..AC-8 + AC-7 skip gate + AC-4a AST scan
```

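For orientation, the two numeric helpers listed under `_helpers.py` can be sketched as follows. The names `l2_horizontal_m` and `match_percentage` come from the file listing; the signatures, the Earth-radius constant, and the inclusive threshold are assumptions made for this sketch and may differ from the real module.

```python
import math
from typing import Iterable

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; an assumption of this sketch


def l2_horizontal_m(lat1_deg: float, lon1_deg: float,
                    lat2_deg: float, lon2_deg: float) -> float:
    """Great-circle (haversine) distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard asin against floating-point overshoot.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def match_percentage(errors_m: Iterable[float], threshold_m: float = 100.0) -> float:
    """Percentage of ticks whose horizontal error is at or below threshold_m."""
    errors = list(errors_m)
    if not errors:
        return 0.0
    return 100.0 * sum(e <= threshold_m for e in errors) / len(errors)


print(l2_horizontal_m(50.0, 36.0, 50.0, 36.0))          # 0.0
print(match_percentage([10.0, 50.0, 99.0, 100.0, 250.0]))  # 80.0
```

With these definitions, AC-3 amounts to asserting `match_percentage(errors) >= 80.0`, where `errors` holds the per-tick `l2_horizontal_m` between the estimate and the ground truth.
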
## Follow-up work

* **Real Topotek KHP20S30 calibration**: unblocks AC-3.
* **AZ-558**: closes AC-4b (route C8 encoders through `MavlinkTransport`).
* **D-PROJ-2 mock-suite-sat-service**: unblocks AC-8 (operator
  workflow rehearsal).