[AZ-840] [AZ-835] e2e orchestrator test (E-AZ-835 C4)

Wraps the AZ-699 verdict-report path with the AZ-839
operator_pre_flight_setup C3 fixture so a single Tier-2 test
takes only (tlog, video, calibration) and runs the full 7-step
pipeline on the Jetson harness without operator hand-curation.

New surface (tests-only, no src/ changes):
- tests/e2e/replay/_e2e_orchestrator.py — orchestrator with
  OrchestratorStep enum, OrchestrationFailure exception (step
  prefix per AC-5), OrchestrationReport dataclass,
  write_effective_replay_config helper, and
  run_e2e_orchestration entry point covering steps 1-2-6-7.
- tests/e2e/replay/test_e2e_orchestrator_unit.py — 17 unit
  tests covering each failure mode + happy path with mocked
  subprocess + ground-truth loader (AC-8).
- tests/e2e/replay/test_az835_e2e_real_flight.py — Tier-2 +
  RUN_REPLAY_E2E gated integration test asserting that the
  verdict report exists and the 15-min budget holds (AC-1,
  AC-2, AC-3, AC-4, AC-6).

The effective config write overlays c6_tile_cache.root_dir
onto the static operator YAML at runtime so the airborne
subprocess shares the cache_root the C3 fixture chose. Field-
level merge — every other operator-config block stays
verbatim. The static YAML on disk is never touched.
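
A minimal sketch of the field-level overlay described above,
operating on an already-parsed config mapping. The function
name overlay_tile_cache_root and the sample keys max_tiles
and camera are hypothetical; only c6_tile_cache.root_dir
comes from this commit. YAML loading and dumping are left
out on purpose.

```python
import copy
from pathlib import Path
from typing import Any


def overlay_tile_cache_root(
    operator_config: dict[str, Any], cache_root: Path
) -> dict[str, Any]:
    """Return a copy of the config with c6_tile_cache.root_dir replaced."""
    # Deep copy so the parsed static config is never mutated.
    effective = copy.deepcopy(operator_config)
    # Only this one field changes; sibling keys and other blocks are kept.
    tile_cache = effective.setdefault("c6_tile_cache", {})
    tile_cache["root_dir"] = str(cache_root)
    return effective


# Hypothetical static operator config, as it might look after YAML parsing.
static = {
    "c6_tile_cache": {"root_dir": "/static/cache", "max_tiles": 1000},
    "camera": {"fps": 30},
}
effective = overlay_tile_cache_root(static, Path("/tmp/c3_cache"))
```

The caller would serialize the returned mapping to a
separate effective-config file, which keeps the static YAML
on disk untouched.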

Test run: tests/e2e/replay: 45 passed, 10 skipped (9
pre-existing skips + 1 new tier2 skip). No src/ touched, no
AZ-839 driver changes; AC-7 (AZ-699 still passes) holds by
inspection.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-23 15:27:41 +03:00
parent 8c4be9ace0
commit ade0c86f2b
6 changed files with 1680 additions and 1 deletions
@@ -1,75 +0,0 @@
# E2E orchestrator test (AZ-835 C4)
**Task**: AZ-840_e2e_orchestrator_test
**Name**: E2E orchestrator test ingesting raw (tlog, video, calibration) and running steps 1-7 (AZ-835 C4)
**Description**: Fourth building block of Epic AZ-835. A single pytest test that takes only `(tlog, video, calibration)` and runs the full 7-step pipeline end-to-end on the Jetson harness — without any operator hand-curation between steps. Extends or wraps the existing AZ-699 verdict test (`tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`) so the verdict-report-writing path is preserved. This is the test that closes the Epic's narrative: "give it a tlog + video, and the system does everything else."
**Complexity**: 3 SP
**Dependencies**: AZ-839 (C3, `operator_pre_flight_setup` real fixture — HARD); AZ-836 (C1, RouteSpec — In Testing); AZ-838 (C2, SatelliteProviderRouteClient — In Testing); AZ-699 (real flight validation runner — done); AZ-405 (tlog/video auto-sync — done); AZ-702 (camera factory-sheet calibration — done); AZ-696 (≥ 80 % within 100 m threshold — done); AZ-835 (parent Epic)
**Component**: `tests/e2e/replay/test_az835_e2e_real_flight.py` (new) OR extend `test_derkachi_real_tlog.py`
**Tracker**: AZ-840 (https://denyspopov.atlassian.net/browse/AZ-840)
**Parent Epic**: AZ-835
Jira AZ-840 is the authoritative spec; this file is the in-workspace mirror.
## Inputs (test parameters)
- `tlog_path: Path` — raw ArduPilot tlog binary (Derkachi as the reference fixture; parametrize for future tlogs).
- `video_path: Path` — raw flight video.
- `calibration_path: Path` — camera factory-sheet calibration (AZ-702).
## Pipeline orchestration
The 7 steps from the Epic:
1. **Active flight cut + tlog/video sync** — call AZ-405's `tlog_video_adapter`. If active-segment detection needs a small extension, file as an in-scope sub-fix; if it needs a meaningful new feature, STOP and propose a sibling ticket.
2. **On-fly frame + IMU extraction** — `VideoFileFrameSource` + `TlogReplayFcAdapter`. No change.
3. **Auto-create route** — call AZ-836's `extract_route_from_tlog(tlog, max_waypoints=10)`. Assert the returned `RouteSpec` materially follows the tlog trajectory.
4. **POST route to satellite-provider** — delegate to AZ-839 (C3) fixture `operator_pre_flight_setup` (which itself calls AZ-838's `SatelliteProviderRouteClient.seed_route`). The fixture's `PopulatedC6Cache` is the dependency boundary.
5. **Build FAISS index** — driven by C3 fixture as part of populating the cache.
6. **Run gps-denied airborne pipeline** — invoke the `gps-denied-replay` console-script or equivalent direct-call entry point against the populated cache + tlog/video/calibration. Reuse the airborne composition root path AZ-699 exercises today.
7. **Get GPS fixes, check vs tlog GPS** — call `helpers/accuracy_report.py` + `helpers/gps_compare.py` to compute the horizontal-error distribution and emit the verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
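
The steps above can be chained so that a failure names the step it came from (AC-5). The following is a minimal sketch, not the orchestrator itself: the enum members, the `run_steps` helper, and the message format are assumptions made for illustration. Only the names `OrchestratorStep` and `OrchestrationFailure` appear in the commit message.

```python
from enum import Enum
from typing import Callable, Sequence


class OrchestratorStep(Enum):
    # Hypothetical member names; the real enum may differ.
    SYNC = 1
    EXTRACT = 2
    SEED_ROUTE = 4
    BUILD_INDEX = 5
    RUN_PIPELINE = 6
    VERDICT = 7


class OrchestrationFailure(RuntimeError):
    """Raised when a step fails; the message is prefixed with the step name."""

    def __init__(self, step: OrchestratorStep, cause: BaseException) -> None:
        super().__init__(f"[{step.name}] {cause}")
        self.step = step


def run_steps(
    steps: Sequence[tuple[OrchestratorStep, Callable[[], None]]]
) -> list[OrchestratorStep]:
    """Run steps in order and stop loudly at the first failure."""
    completed: list[OrchestratorStep] = []
    for step, action in steps:
        try:
            action()
        except Exception as exc:
            # Re-raise with the failing step attached; later steps never run.
            raise OrchestrationFailure(step, exc) from exc
        completed.append(step)
    return completed


def reject_route() -> None:
    raise ValueError("satellite-provider rejected the route")


try:
    run_steps(
        [
            (OrchestratorStep.SYNC, lambda: None),
            (OrchestratorStep.SEED_ROUTE, reject_route),
            (OrchestratorStep.BUILD_INDEX, lambda: None),
        ]
    )
except OrchestrationFailure as failure:
    message = str(failure)
    failed_step = failure.step
```

Chaining with `raise ... from exc` keeps the original traceback, so the step prefix adds context without hiding the root cause.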
## Test gating
- `@pytest.mark.tier2`.
- Skip unless `RUN_REPLAY_E2E=1` is set, with an explicit skip reason that names the missing env var; no silent skip.
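
One way to keep the skip reason explicit is to compute it in a small helper and pass it to pytest. This is a sketch under assumptions: the helper name `replay_e2e_skip_reason` and the wording of the reason are hypothetical, and the pytest wiring is shown only as a comment so the block runs without pytest installed.

```python
import os
from typing import Mapping, Optional

REPLAY_E2E_ENV = "RUN_REPLAY_E2E"


def replay_e2e_skip_reason(
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return None when the gate is open, else a reason naming the env var."""
    if environ.get(REPLAY_E2E_ENV) == "1":
        return None
    return f"{REPLAY_E2E_ENV}=1 is not set; skipping the Tier-2 replay e2e test"


# Possible wiring (assumes pytest and a registered `tier2` marker):
#
#   @pytest.mark.tier2
#   @pytest.mark.skipif(
#       replay_e2e_skip_reason() is not None,
#       reason=replay_e2e_skip_reason() or "",
#   )
#   def test_az835_e2e_real_flight(...): ...
```

Because the reason string is built from the variable name, the skip output always states which env var is missing.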
## Verdict report
Emit ALWAYS, even on FAIL. The success criterion for AC-2 is that the report exists and the distribution is honest, NOT that the verdict is PASS.
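
The "report regardless of verdict" rule can be sketched as below. This is an illustration only and does not reproduce `helpers/accuracy_report.py` or `helpers/gps_compare.py`: the function names, the spherical-Earth haversine distance, and the report body are assumptions. The 100 m threshold, the 80 % share, and the file name pattern come from this spec.

```python
import math
from datetime import date
from pathlib import Path
from typing import Sequence

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS-84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(errors_m: Sequence[float], threshold_m: float = 100.0) -> float:
    """Share of fixes whose horizontal error is at or below the threshold."""
    if not errors_m:
        return 0.0
    return sum(e <= threshold_m for e in errors_m) / len(errors_m)


def write_verdict_report(
    report_dir: Path, errors_m: Sequence[float], day: date
) -> tuple[Path, str]:
    """Write the report for any verdict and return (path, verdict)."""
    share = fraction_within(errors_m)
    verdict = "PASS" if share >= 0.80 else "FAIL"
    report_dir.mkdir(parents=True, exist_ok=True)
    path = report_dir / f"real_flight_validation_{day.isoformat()}.md"
    # The file is written before the caller looks at the verdict.
    path.write_text(
        f"# Real flight validation {day.isoformat()}\n\n"
        f"- Fixes: {len(errors_m)}\n"
        f"- Within 100 m: {share:.1%}\n"
        f"- Verdict: {verdict}\n",
        encoding="utf-8",
    )
    return path, verdict
```

A test built on this would assert that the returned path exists and leave the PASS/FAIL threshold to a separate assertion.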
## Acceptance criteria
| # | Criterion |
|---|-----------|
| AC-1 | Test takes only `(tlog, video, calibration)` and runs steps 1-7 end-to-end on Tier-2 Jetson. No operator hand-curation between steps. |
| AC-2 | Test produces the AZ-699 verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with the honest horizontal-error distribution, REGARDLESS of PASS/FAIL on the AZ-696 AC-3 threshold (≥ 80 % within 100 m). |
| AC-3 | Test reuses the C3 fixture's `operator_pre_flight_setup` for steps 3-5; no duplicate seeding/downloading logic. |
| AC-4 | Test runs to completion within 15 min wall time on the Derkachi clip (soft target for first delivery; hard NFR set after first measurement is recorded in the report). |
| AC-5 | Mid-pipeline failure (e.g. step 4 satellite-provider rejection, step 5 FAISS sidecar mismatch) fails LOUD with a clear error pointing at the failing step. No silent skip past a failing step. |
| AC-6 | Test is gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`; explicit skip reason names the missing env var. |
| AC-7 | The existing AZ-699 verdict test continues to pass (this test does not break or supersede it; either it lives alongside, or AZ-699 is folded into this test with the verdict-writing path preserved). |
| AC-8 | Unit tests cover the orchestration helper layer (parameter validation, error propagation between steps). The end-to-end happy path is the Jetson integration test. |
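
For AC-4, the wall time has to be measured even when the run fails, so it can be recorded in the report. A minimal sketch, assuming a hypothetical `measure_wall_time` context manager and `WallTime` record; neither name comes from the codebase.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator

WALL_TIME_BUDGET_S = 15 * 60  # AC-4 soft target: 15 minutes


@dataclass
class WallTime:
    elapsed_s: float = 0.0

    @property
    def within_budget(self) -> bool:
        return self.elapsed_s <= WALL_TIME_BUDGET_S


@contextmanager
def measure_wall_time() -> Iterator[WallTime]:
    """Record elapsed wall time, including when the body raises."""
    record = WallTime()
    start = time.monotonic()  # monotonic clock, immune to system clock changes
    try:
        yield record
    finally:
        record.elapsed_s = time.monotonic() - start


with measure_wall_time() as wall:
    pass  # the orchestration call would go here
```

After the block, `wall.elapsed_s` can be written into the report and `wall.within_budget` asserted as the soft target.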
## Out of scope
- Un-xfailing the AZ-777 AC-4 / AC-5 tests (AZ-841 / C5).
- Documentation updates beyond the test file's own docstring (AZ-842 / C6).
- Real-time tlog ingestion (one finished `.tlog` per test invocation).
- Multi-flight aggregate validation.
- Performance optimization beyond the AC-4 soft target.
- Modifying the airborne composition root.
## Risks
**Risk 1 — Integration glue between AZ-405 tlog/video sync and the airborne pipeline's frame-source contract.** The auto-sync adapter and the airborne composition root were authored in different cycles; small impedance mismatches are likely. Mitigation: if the glue exceeds the 3 SP budget, STOP and propose a sub-ticket rather than expanding scope.
**Risk 2 — Step 1 active-segment detection may need extension.** AZ-405 covered tlog↔video sync; take-off/landing boundary detection may not be implemented. Mitigation: file an in-scope sub-fix if small; STOP and propose a sibling ticket if not.
## References
- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
- Hard dep (C3 fixture): AZ-839 — https://denyspopov.atlassian.net/browse/AZ-839
- Existing verdict test: `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`
- Tlog/video adapter: `src/gps_denied_onboard/replay_input/tlog_video_adapter.py` (AZ-405)
- Helpers: `src/gps_denied_onboard/helpers/accuracy_report.py`, `src/gps_denied_onboard/helpers/gps_compare.py`