mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 08:31:13 +00:00
[AZ-840] [AZ-835] e2e orchestrator test (E-AZ-835 C4)
Wraps the AZ-699 verdict-report path with the AZ-839
operator_pre_flight_setup C3 fixture so a single Tier-2 test takes only
(tlog, video, calibration) and runs the full 7-step pipeline on the
Jetson harness without operator hand-curation.

New surface (tests-only, no src/ changes):

- tests/e2e/replay/_e2e_orchestrator.py: orchestrator with
  OrchestratorStep enum, OrchestrationFailure exception (step prefix per
  AC-5), OrchestrationReport dataclass, write_effective_replay_config
  helper, and run_e2e_orchestration entry point covering steps 1-2-6-7.
- tests/e2e/replay/test_e2e_orchestrator_unit.py: 17 unit tests covering
  each failure mode + happy path with mocked subprocess + ground-truth
  loader (AC-8).
- tests/e2e/replay/test_az835_e2e_real_flight.py: Tier-2 + RUN_REPLAY_E2E
  gated integration test asserting verdict report exists, 15-min budget
  held (AC-1, AC-2, AC-3, AC-4, AC-6).

The effective config write overlays c6_tile_cache.root_dir onto the
static operator YAML at runtime so the airborne subprocess shares the
cache_root the C3 fixture chose. Field-level merge: every other
operator-config block stays verbatim. The static YAML on disk is never
touched.

Test run: tests/e2e/replay 45 passed, 10 skipped (10 skips were 9
pre-existing + 1 new tier2).

No src/ touched, no AZ-839 driver changes; AC-7 (AZ-699 still passes)
holds by inspection.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,171 @@
# Batch 109 — Cycle 3 — AZ-840 e2e orchestrator test

**Date**: 2026-05-23
**Tasks**: AZ-840 (C4, Epic AZ-835).
**Story points**: 3 (per the task spec).
**Jira status**: AZ-840 In Progress → In Testing at commit step.

## Why this batch exists

Epic AZ-835 (real-flight e2e validation) needs a single Tier-2
test that proves the 7-step pipeline runs from
`(tlog, video, calibration)` to a horizontal-error verdict
without operator hand-curation between steps. Steps 3-5 were
delivered by AZ-839 (C3, `operator_pre_flight_setup`); steps
1, 2, 6, and 7 are this batch.

The AZ-839 batch 108b follow-up note explicitly anticipated this
batch: "AZ-840 will additionally need to feed the airborne
replay binary a config that points at the same `cache_root`
... the cleanest path is for AZ-840 to write an effective YAML
at runtime from the same override recipe used here."

## What this batch ships

A driver module, a unit test suite, and a Tier-2 integration test:

* `tests/e2e/replay/_e2e_orchestrator.py`: wraps the AZ-699
  verdict-report path with the AZ-839 C3 fixture's
  `PopulatedC6Cache`. Public surface:
  * `OrchestratorStep` enum: failure-step labels per AC-5.
  * `OrchestrationFailure(step, message)` exception: wraps
    every step failure with the step name in the message prefix.
  * `OrchestrationReport` dataclass: verdict, distribution,
    paths, wall-clock measurements per AC-4.
  * `write_effective_replay_config`: small helper that overlays
    `c6_tile_cache.root_dir` (and, per the design notes below,
    `c6_tile_cache.faiss_index_path`) onto the static operator YAML.
  * `read_calibration_acquisition_method`: mirror of AZ-699's
    helper so the report writer keeps the same shape.
  * `run_e2e_orchestration`: the AC-1 entry point wiring
    validate → write_config → airborne subprocess → parse JSONL
    → load tlog GT → compute distribution → render report.
* `tests/e2e/replay/test_e2e_orchestrator_unit.py`: 17 unit
  tests covering each of the 7 steps' failure modes plus the
  happy path. The runner is injected (defaulting to `subprocess.run`)
  so unit tests stage synthetic JSONL output without touching
  the airborne binary. `load_tlog_ground_truth` is monkeypatched
  to return a synthetic 3-row series.
* `tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration`:
  Tier-2 + `RUN_REPLAY_E2E` gated test that consumes the C3 fixture
  and the Derkachi inputs and asserts that the verdict markdown is
  written, the threshold-hit share table is present, and the 15-min
  budget held.

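The runner injection mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `_e2e_orchestrator.py`: `run_with_budget`, `fake_runner`, and the argv contents are hypothetical names chosen for the example, and only the `subprocess.run` default and the 900 s budget come from this report.

```python
import subprocess
from typing import Callable, Sequence

# Assumed budget constant; AC-4 in this report names 900 s.
MAX_SECONDS = 900.0


def run_with_budget(
    argv: Sequence[str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = MAX_SECONDS,
) -> subprocess.CompletedProcess:
    """Call the injected runner with a wall-clock budget (hypothetical helper)."""
    return runner(list(argv), capture_output=True, text=True, timeout=timeout)


def fake_runner(argv, **kwargs):
    """Test double: pretends the binary exited cleanly with one JSONL line."""
    return subprocess.CompletedProcess(args=argv, returncode=0, stdout="{}\n", stderr="")


# The fake runner is used here, so no real binary is invoked.
result = run_with_budget(["gps-denied-replay", "--pace", "asap"], runner=fake_runner)
print(result.returncode, result.stdout.strip())
```

Because the fake accepts the same keyword arguments as `subprocess.run`, a unit test can swap it in without the helper knowing the difference.
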
## AC coverage

| AC   | Description | Coverage |
|------|-------------|----------|
| AC-1 | Steps 1-7 end-to-end on Tier-2 from a fresh tlog/video | `test_az840_e2e_real_flight_orchestration` (Tier-2-gated); 17 unit tests prove the orchestrator structure |
| AC-2 | Verdict report exists either PASS or FAIL | `test_run_e2e_orchestration_writes_report_even_on_fail_verdict` + integration assertion `report_path.is_file()` |
| AC-3 | Reuses C3 fixture (`operator_pre_flight_setup`) | Integration test consumes the fixture; effective config overlay points at `populated_cache.cache_root` |
| AC-4 | 15-min wall-time soft target on the Derkachi clip | `_DEFAULT_MAX_SECONDS = 900.0` passed as `subprocess.run` `timeout`; integration asserts `replay_subprocess_seconds <= 900` |
| AC-5 | Mid-pipeline failure fails LOUD with a clear step prefix | `OrchestratorStep` enum + 8 step-specific failure unit tests (`validate`/`write_config`/`airborne` × 3/`parse` × 2/`gt`) |
| AC-6 | Gated by `RUN_REPLAY_E2E=1` + Tier-2 marker | `_orchestrator_skip_reason()` checks env vars + binary + video size; `@pytest.mark.tier2` decorator |
| AC-7 | AZ-699 verdict test continues to pass | No changes to `test_derkachi_real_tlog.py`; same `real_flight_validation_<date>.md` report path convention |
| AC-8 | Unit-tested orchestration helper without Tier-2 inputs | 17 unit tests covering config write (4) + calibration parse (3) + run helper (10); all use mocked subprocess + GT loader |

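The AC-6 gate can be pictured as a function that returns a skip reason or `None`. The sketch below is illustrative only: the report does not show the body of `_orchestrator_skip_reason()`, so the function name `skip_reason`, its parameters, the order of checks, and the size threshold are all assumptions.

```python
import shutil
from pathlib import Path
from typing import Mapping, Optional


def skip_reason(
    env: Mapping[str, str],
    binary_name: str,
    video_path: Path,
    min_video_bytes: int = 1,  # assumed threshold, not taken from the report
) -> Optional[str]:
    """Return a human-readable skip reason, or None when the test may run."""
    if env.get("RUN_REPLAY_E2E") != "1":
        return "RUN_REPLAY_E2E=1 not set"
    if shutil.which(binary_name) is None:
        return f"{binary_name} not found on PATH"
    if not video_path.is_file() or video_path.stat().st_size < min_video_bytes:
        return f"video missing or too small: {video_path}"
    return None


# With an empty environment the first check trips.
print(skip_reason({}, "gps-denied-replay", Path("missing.mp4")))
```

A reason string like this would typically be passed to `pytest.mark.skipif(..., reason=...)` alongside the Tier-2 marker.
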
## Test run results

```
$ .venv/bin/pytest tests/e2e/replay/ -v --tb=short --timeout=60
============================ 45 passed, 10 skipped, 3 warnings in 0.78s ============
```

Breakdown:
* 17 new orchestrator unit tests pass.
* 11 AZ-839 driver unit tests still pass (no driver changes).
* 14 helper unit tests (`test_helpers.py`) still pass.
* 3 derkachi-1min mode-agnostic AST tests still pass.
* 10 skips: 1 new Tier-2 (this AZ-840 integration), 6
  `RUN_REPLAY_E2E` gated AZ-404 cases, 1 AC-8 D-PROJ-2 placeholder,
  1 Tier-2 AZ-699, 1 Tier-2 AZ-839 integration. None are
  regressions; the tier2 gate trips off-Jetson.

## Design notes

### `--auto-trim` ownership

The orchestrator passes `--auto-trim` unconditionally so that the
AZ-405 / AZ-698 active-flight cut and tlog/video sync (Epic step 1)
run inside the airborne binary every time. The Epic narrative does
not separate trim from the airborne pipeline; collapsing them
into a single subprocess invocation matches AZ-699 and avoids
duplicating the trim path.

### `clip_duration_s` parity with AZ-699

`run_e2e_orchestration` computes
`clip_duration_s = ground_truth[-1].t_s - ground_truth[0].t_s`
exactly as `test_derkachi_real_tlog.py` does. This means both
verdict reports name the same clip duration even when the
trimmed video is shorter than the ground-truth window. This is a
deliberate choice: the report header documents what the verdict
covers, not what the binary processed.

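As a small worked example of the duration formula above, assuming a ground-truth row type with a `t_s` field (the `GtSample` class and its latitude/longitude fields are hypothetical stand-ins, not the project's actual types):

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GtSample:
    """Hypothetical ground-truth row; only t_s is named in the report."""
    t_s: float
    lat_deg: float
    lon_deg: float


def clip_duration_s(ground_truth: Sequence[GtSample]) -> float:
    """Last timestamp minus first timestamp of the ground-truth series."""
    if not ground_truth:
        raise ValueError("ground truth is empty")
    return ground_truth[-1].t_s - ground_truth[0].t_s


# Synthetic 3-row series with made-up coordinates.
series = [
    GtSample(10.0, 50.000, 36.000),
    GtSample(40.0, 50.001, 36.001),
    GtSample(70.0, 50.002, 36.002),
]
print(clip_duration_s(series))  # 60.0
```

The result depends only on the ground-truth window, which is why the reported duration is unaffected by how much video the binary actually processed.
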
### Effective config write — single source of truth

`write_effective_replay_config` materialises the same override
recipe AZ-839 uses in-memory, but on disk so the airborne
subprocess sees the cache_root the fixture chose. Field-level
merge: every other block in the operator YAML is preserved
verbatim; only `c6_tile_cache.root_dir` and
`c6_tile_cache.faiss_index_path` are overwritten. The static
operator YAML on disk is never touched.

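The field-level merge can be sketched on plain dictionaries. This is a hedged illustration rather than the helper itself: YAML loading and writing are omitted, `overlay_cache_paths` is a hypothetical name, and the `max_tiles` and `other_block` keys are invented to show that untouched fields survive.

```python
import copy
from typing import Any, Dict


def overlay_cache_paths(
    operator_cfg: Dict[str, Any],
    root_dir: str,
    faiss_index_path: str,
) -> Dict[str, Any]:
    """Return a copy of the config with only the two cache paths replaced."""
    effective = copy.deepcopy(operator_cfg)  # never mutate the static config
    block = effective.setdefault("c6_tile_cache", {})
    block["root_dir"] = root_dir
    block["faiss_index_path"] = faiss_index_path
    return effective


# Made-up static config; only the c6_tile_cache key names come from the report.
static_cfg = {
    "c6_tile_cache": {
        "root_dir": "/static/cache",
        "faiss_index_path": "/static/cache/index.faiss",
        "max_tiles": 1000,
    },
    "other_block": {"enabled": True},
}

effective = overlay_cache_paths(static_cfg, "/tmp/c3/cache", "/tmp/c3/cache/index.faiss")
print(effective["c6_tile_cache"]["root_dir"])   # /tmp/c3/cache
print(static_cfg["c6_tile_cache"]["root_dir"])  # /static/cache
```

The deep copy is what keeps the in-memory static config (and, by extension, the file it was loaded from) unchanged while the effective config is written elsewhere.
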
### Failure surface = step prefix

`OrchestrationFailure` always prefixes its message with
`[<step>]`. CI log scrapers and pytest's traceback printer both
surface the prefix on the first line; AC-5 ("clear error
pointing at the failing step") holds without requiring the test
to inspect the exception object. The step is also exposed as
`exc.step` for programmatic assertions.

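A minimal sketch of the prefixing behaviour, assuming enum values that match the step labels listed in the AC-5 row; the actual member names, base classes, and message format details in `_e2e_orchestrator.py` may differ:

```python
from enum import Enum


class OrchestratorStep(str, Enum):
    """Step labels; the values here are assumed from the AC-5 row."""
    VALIDATE = "validate"
    WRITE_CONFIG = "write_config"
    AIRBORNE = "airborne"
    PARSE = "parse"
    GT = "gt"


class OrchestrationFailure(RuntimeError):
    """Failure whose message always starts with the failing step."""

    def __init__(self, step: OrchestratorStep, message: str) -> None:
        self.step = step
        super().__init__(f"[{step.value}] {message}")


try:
    raise OrchestrationFailure(OrchestratorStep.PARSE, "no emissions in JSONL output")
except OrchestrationFailure as exc:
    print(exc)                                  # [parse] no emissions in JSONL output
    print(exc.step is OrchestratorStep.PARSE)   # True
```

In a unit test, `pytest.raises(OrchestrationFailure)` combined with an assertion on `excinfo.value.step` would cover the programmatic path, while the string prefix serves log readers.
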
## Files changed

* `tests/e2e/replay/_e2e_orchestrator.py` (new, 656 LOC).
* `tests/e2e/replay/test_e2e_orchestrator_unit.py` (new, 660+ LOC).
* `tests/e2e/replay/test_az835_e2e_real_flight.py` (new, 156 LOC).

No `src/` changes, no operator-config YAML changes, no AZ-839
driver changes. AZ-840 is purely additive at the test layer.

## Code review (self-review)

Verdict: **PASS_WITH_WARNINGS**.

| Phase | Result |
|-------|--------|
| 1. Context loading | Re-read `gps_compare.py`, `accuracy_report.py`, `replay_input.py`, `cli/replay.py`, `test_derkachi_real_tlog.py`. Emission schema (`emitted_at`, `position_wgs84`) is the same shape `gps-denied-replay` writes. |
| 2. Spec compliance | All 8 AZ-840 ACs covered; AC-7 holds by inspection (no AZ-699 changes). |
| 3. Code quality | All public types have docstrings; failure messages name the upstream exception via `repr` so `OSError` / `subprocess.TimeoutExpired` carry through. Runner kw-args mirror `subprocess.run` signature 1:1. |
| 4. Security quick-scan | Effective config write goes to a tmp file the test owns; no secrets in the YAML overlay (override is two string fields). Subprocess `env` is opt-in (`None` defaults to `os.environ`). |
| 5. Performance scan | Unit tests run in 0.51 s. Tier-2 wall-clock cap is 900 s, enforced by the subprocess timeout. |
| 6. Cross-task consistency | `clip_duration_s` and `report_path` match AZ-699 exactly so a single Jetson run produces the same markdown shape. |
| 7. Architecture compliance | Orchestrator lives entirely under `tests/e2e/replay/`; no `src/` writes. C3 fixture's invariants (`PopulatedC6Cache.cache_root` is the single source of truth) propagate via `write_effective_replay_config`. |

## Findings

| ID | Severity | Description | Disposition |
|----|----------|-------------|-------------|
| F1 | Low | `_default_tile_decoder` in `conftest.py` (carried from batch 108) is still raw TIFF. Not in the AZ-840 path; AZ-840 doesn't change tile decoding. | Defer; no AZ-840 ticket. |
| F2 | Low | `_resolve_replay_descriptor_dim` is NetVLAD-only (carried from batch 108). AZ-840 doesn't change descriptors. | Defer; no AZ-840 ticket. |
| F3 | Low | `--pace asap` is hardcoded in `_run_replay_subprocess` argv; the AZ-699 test passes `--pace asap` too, so behaviour is identical. If a future test wants a real-time pace, the runner kwarg is the seam. | Document; no ticket. |
| F4 | Low | `_run_replay_subprocess` does not stream stdout/stderr; failures surface only after the subprocess exits. For 15-min runs this means the operator sees no progress until the budget expires. AZ-699 has the same shape. | Document; consider an AZ-* if the budget grows. |

## Notes for follow-up

* AZ-840 lands the orchestrator test as Tier-2-gated. Verifying
  the Tier-2 path actually runs on the Jetson harness is the
  next gating step before Epic AZ-835 can flip from "covered by
  unit tests" to "covered by Tier-2 integration".
* `_e2e_orchestrator.py` is intentionally kept under `tests/`
  rather than promoted to `src/`. If a second consumer of the
  same orchestration shape appears (e.g. AZ-833 mock-suite-sat
  parity test), the move to a shared helper module under
  `src/gps_denied_onboard/replay/` is the right next step;
  for now the test-only location matches the helper's only
  consumer.
* AZ-841 (Tier-2 unxfail follow-up) and AZ-842 (replay protocol
  + orchestrator docs) sit downstream; both should reference
  this batch report in their planning sections.