[AZ-840] [AZ-835] e2e orchestrator test (E-AZ-835 C4)

Wraps the AZ-699 verdict-report path with the AZ-839
operator_pre_flight_setup C3 fixture so a single Tier-2 test
takes only (tlog, video, calibration) and runs the full 7-step
pipeline on the Jetson harness without operator hand-curation.

New surface (tests-only, no src/ changes):
- tests/e2e/replay/_e2e_orchestrator.py — orchestrator with
  OrchestratorStep enum, OrchestrationFailure exception (step
  prefix per AC-5), OrchestrationReport dataclass,
  write_effective_replay_config helper, and
  run_e2e_orchestration entry point covering steps 1-2-6-7.
- tests/e2e/replay/test_e2e_orchestrator_unit.py — 17 unit
  tests covering each failure mode + happy path with mocked
  subprocess + ground-truth loader (AC-8).
- tests/e2e/replay/test_az835_e2e_real_flight.py — Tier-2 +
  RUN_REPLAY_E2E gated integration test asserting verdict
  report exists, 15-min budget held (AC-1, AC-2, AC-3, AC-4,
  AC-6).

The effective config write overlays c6_tile_cache.root_dir
onto the static operator YAML at runtime, and blanks
c6_tile_cache.faiss_index_path, so the airborne subprocess
shares the cache_root the C3 fixture chose. The merge is
field-level: every other operator-config block keeps its
values. The static YAML on disk is never touched.

Test run: tests/e2e/replay 45 passed, 10 skipped (10 skips
were 9 pre-existing + 1 new tier2). No src/ touched, no
AZ-839 driver changes; AC-7 (AZ-699 still passes) holds by
inspection.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-23 15:27:41 +03:00
parent 8c4be9ace0
commit ade0c86f2b
6 changed files with 1680 additions and 1 deletions
@@ -0,0 +1,171 @@
# Batch 109 — Cycle 3 — AZ-840 e2e orchestrator test
**Date**: 2026-05-23
**Tasks**: AZ-840 (C4 — Epic AZ-835).
**Story points**: 3 (per the task spec).
**Jira status**: AZ-840 In Progress → In Testing at commit step.
## Why this batch exists
Epic AZ-835 (real-flight e2e validation) needs a single Tier-2
test that proves the 7-step pipeline runs from
`(tlog, video, calibration)` to a horizontal-error verdict
without operator hand-curation between steps. Steps 3-5 were
delivered by AZ-839 (C3 — `operator_pre_flight_setup`); steps
1-2-6-7 are this batch.
The AZ-839 batch 108b follow-up note explicitly anticipated this
batch: "AZ-840 will additionally need to feed the airborne
replay binary a config that points at the same `cache_root`
... the cleanest path is for AZ-840 to write an effective YAML
at runtime from the same override recipe used here."
## What this batch ships
A driver module + unit test suite + Tier-2 integration test:
* `tests/e2e/replay/_e2e_orchestrator.py` — wraps the AZ-699
  verdict-report path with the AZ-839 C3 fixture's
  `PopulatedC6Cache`. Public surface:
  * `OrchestratorStep` enum — failure-step labels per AC-5.
  * `OrchestrationFailure(step, message)` exception — wraps
    every step failure with the step name in the message prefix.
  * `OrchestrationReport` dataclass — verdict, distribution,
    paths, wall-clock measurements per AC-4.
  * `write_effective_replay_config` — small helper that overlays
    `c6_tile_cache.root_dir` onto the static operator YAML.
  * `read_calibration_acquisition_method` — mirror of AZ-699's
    helper so the report writer keeps the same shape.
  * `run_e2e_orchestration` — the AC-1 entry point wiring
    validate → write_config → airborne subprocess → parse JSONL
    → load tlog GT → compute distribution → render report.
* `tests/e2e/replay/test_e2e_orchestrator_unit.py` — 17 unit
  tests covering each of the 7 steps' failure modes plus the
  happy path. The runner is injected (`subprocess.run` default)
  so unit tests stage synthetic JSONL output without touching
  the airborne binary. `load_tlog_ground_truth` is monkeypatched
  to return a synthetic 3-row series.
* `tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration`
  — Tier-2 + RUN_REPLAY_E2E gated test that consumes the C3
  fixture + Derkachi inputs and asserts the verdict markdown is
  written, the threshold-hit share table is present, and the
  15-min budget held.
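The runner-injection seam described above can be sketched as follows. This is an illustration only: `make_fake_runner` and `run_with` are hypothetical names, the `seq` field is a made-up record key, and the real unit tests may stage their output differently.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def make_fake_runner(output_path: Path, rows: list):
    """Build a stand-in for subprocess.run that stages JSONL output.

    Hypothetical helper for illustration, not the project's fixture.
    """

    def fake_runner(argv, **kwargs):
        # Write one JSON object per line, as a replay binary would.
        output_path.write_text("".join(json.dumps(row) + "\n" for row in rows))
        return subprocess.CompletedProcess(
            args=argv, returncode=0, stdout="", stderr=""
        )

    return fake_runner


def run_with(runner, argv, timeout_s: float) -> int:
    """Call the runner with subprocess.run-style kwargs; return the exit code."""
    completed = runner(argv, capture_output=True, text=True, timeout=timeout_s)
    return completed.returncode


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "emissions.jsonl"
    synthetic = [{"seq": 0}, {"seq": 1}, {"seq": 2}]  # made-up records
    exit_code = run_with(
        make_fake_runner(out, synthetic), ["gps-denied-replay"], 900.0
    )
    staged_rows = [json.loads(line) for line in out.read_text().splitlines()]
```

Because the fake accepts the same keyword arguments as `subprocess.run`, the code under test does not need a separate branch for the mocked case.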
## AC coverage
| AC | Description | Coverage |
|-----|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| AC-1| Steps 1-7 end-to-end on Tier-2 from a fresh tlog/video | `test_az840_e2e_real_flight_orchestration` (Tier-2-gated); 17 unit tests prove the orchestrator structure |
| AC-2| Verdict report exists either PASS or FAIL | `test_run_e2e_orchestration_writes_report_even_on_fail_verdict` + integration assertion `report_path.is_file()` |
| AC-3| Reuses C3 fixture (`operator_pre_flight_setup`) | Integration test consumes the fixture; effective config overlay points at `populated_cache.cache_root` |
| AC-4| 15-min wall-time soft target on the Derkachi clip | `_DEFAULT_MAX_SECONDS = 900.0` passed as `subprocess.run` `timeout`; integration asserts `replay_subprocess_seconds <= 900`|
| AC-5| Mid-pipeline failure fails LOUD with a clear step prefix | `OrchestratorStep` enum + 8 step-specific failure unit tests (`validate`/`write_config`/`airborne` × 3/`parse` × 2/`gt`) |
| AC-6| Gated by `RUN_REPLAY_E2E=1` + Tier-2 marker | `_orchestrator_skip_reason()` checks env vars + binary + video size; `@pytest.mark.tier2` decorator |
| AC-7| AZ-699 verdict test continues to pass | No changes to `test_derkachi_real_tlog.py`; same `real_flight_validation_<date>.md` report path convention |
| AC-8| Unit-tested orchestration helper without Tier-2 inputs | 17 unit tests covering config write (4) + calibration parse (3) + run helper (10) — all use mocked subprocess + GT loader |
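For AC-4, the wall-clock budget relies on the standard `timeout` argument of `subprocess.run`. The sketch below shows that mechanism in isolation; `run_with_budget` is a hypothetical name, and the child process is a plain Python sleep standing in for the replay binary.

```python
import subprocess
import sys
from typing import Optional, Tuple


def run_with_budget(argv, max_seconds: float) -> Tuple[bool, Optional[int]]:
    """Return (timed_out, returncode) for a child run under a time budget."""
    try:
        completed = subprocess.run(
            argv, capture_output=True, text=True, timeout=max_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing lingers.
        return True, None
    return False, completed.returncode


# A child that sleeps far longer than the budget trips the timeout.
timed_out, code = run_with_budget(
    [sys.executable, "-c", "import time; time.sleep(5)"], 0.5
)
```

In the orchestrator, the equivalent `TimeoutExpired` is translated into an `OrchestrationFailure` with the `airborne_pipeline` step, so a blown budget and a non-zero exit surface under the same step prefix.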
## Test run results
```
$ .venv/bin/pytest tests/e2e/replay/ -v --tb=short --timeout=60
============================ 45 passed, 10 skipped, 3 warnings in 0.78s ============
```
Breakdown:
* 17 new orchestrator unit tests pass.
* 11 AZ-839 driver unit tests still pass (no driver changes).
* 14 helper unit tests (`test_helpers.py`) still pass.
* 3 derkachi-1min mode-agnostic AST tests still pass.
* 10 skips: 1 new Tier-2 (this AZ-840 integration), 6
RUN_REPLAY_E2E gated AZ-404 cases, 1 AC-8 D-PROJ-2 placeholder,
1 Tier-2 AZ-699, 1 Tier-2 AZ-839 integration. None are
regressions; the tier2 gate trips off-Jetson.
## Design notes
### `--auto-trim` ownership
The orchestrator passes `--auto-trim` unconditionally so AZ-405 /
AZ-698 active-flight-cut + tlog/video sync (Epic step 1) runs
inside the airborne binary every time. The Epic narrative does
not separate trim from the airborne pipeline; collapsing them
into a single subprocess invocation matches AZ-699 and avoids
duplicating the trim path.
### `clip_duration_s` parity with AZ-699
`run_e2e_orchestration` computes
`clip_duration_s = ground_truth[-1].t_s - ground_truth[0].t_s`
exactly as `test_derkachi_real_tlog.py` does. This means both
verdict reports name the same clip duration even when the
trimmed video is shorter than the ground-truth window — a
deliberate choice: the report header documents what the verdict
covers, not what the binary processed.
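A worked example of the duration rule, using a stand-in row type (the real `GroundTruthRow` carries more fields; only `t_s` matters here, and the timestamps are invented):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruthRow:
    # Stand-in for the project's GroundTruthRow; illustration only.
    t_s: float


def clip_duration_s(ground_truth) -> float:
    """Span of the ground-truth window in seconds; 0.0 for an empty series."""
    if not ground_truth:
        return 0.0
    return ground_truth[-1].t_s - ground_truth[0].t_s


# Invented timestamps: a window that starts at t=10 s and ends at t=70 s.
rows = [GroundTruthRow(10.0), GroundTruthRow(40.0), GroundTruthRow(70.0)]
duration = clip_duration_s(rows)
```

With these rows the header would report 60 s, regardless of how many seconds of video the binary kept after trimming.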
### Effective config write — single source of truth
`write_effective_replay_config` materialises the same override
recipe AZ-839 uses in-memory, but on disk so the airborne
subprocess sees the cache_root the fixture chose. Field-level
merge: every other block in the operator YAML keeps its values
(the merged document is re-serialised, so comments and key order
are not carried over); only `c6_tile_cache.root_dir` and
`c6_tile_cache.faiss_index_path` are overwritten, the latter with
an empty string so the index path falls back to
`<cache_root>/descriptor.index`. The static
operator YAML on disk is never touched.
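The field-level merge can be sketched on plain dictionaries, leaving YAML I/O aside. `overlay_cache_root` is a hypothetical name, and the `max_tiles` and `backend` keys are invented purely to show that unrelated fields survive.

```python
import copy


def overlay_cache_root(base: dict, cache_root: str) -> dict:
    """Return a copy of ``base`` with only the two cache fields overridden."""
    merged = copy.deepcopy(base)  # never mutate the caller's config
    block = merged.get("c6_tile_cache")
    block = dict(block) if isinstance(block, dict) else {}
    block["root_dir"] = cache_root
    block["faiss_index_path"] = ""  # empty string: use the default index path
    merged["c6_tile_cache"] = block
    return merged


# Invented base config; only the c6_tile_cache field names come from the report.
base = {
    "c6_tile_cache": {
        "root_dir": "/static/cache",
        "faiss_index_path": "/static/cache/old.index",
        "max_tiles": 128,
    },
    "c2_vpr": {"backend": "netvlad"},
}
merged = overlay_cache_root(base, "/tmp/c3-cache")
```

The deep copy stands in for "the static YAML on disk is never touched": the base mapping is left exactly as it was loaded.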
### Failure surface = step prefix
`OrchestrationFailure` always prefixes its message with
`[<step>]`. CI log scrapers and pytest's traceback printer both
surface the prefix on the first line; AC-5 ("clear error
pointing at the failing step") holds without requiring the test
to inspect the exception object. The step is also exposed as
`exc.step` for programmatic assertions.
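Both ways of observing the failing step can be shown with a cut-down stand-in. `Step` and `StepFailure` below are simplified, hypothetical versions of the real enum and exception, kept small so the example runs on its own.

```python
from enum import Enum


class Step(str, Enum):
    # Simplified stand-in for OrchestratorStep; one member is enough here.
    PARSE = "parse_emissions"


class StepFailure(RuntimeError):
    """Simplified stand-in for OrchestrationFailure."""

    def __init__(self, step: Step, message: str) -> None:
        super().__init__(f"[{step.value}] {message}")
        self.step = step


try:
    raise StepFailure(Step.PARSE, "zero records")
except StepFailure as exc:
    first_line = str(exc)      # what a log reader sees
    failing_step = exc.step    # what a test can assert on
```

A log scraper can match on the bracketed prefix, while a unit test can compare `exc.step` against the enum member without parsing the message.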
## Files changed
* `tests/e2e/replay/_e2e_orchestrator.py` (new, 655 LOC).
* `tests/e2e/replay/test_e2e_orchestrator_unit.py` (new, 660+ LOC).
* `tests/e2e/replay/test_az835_e2e_real_flight.py` (new, 182 LOC).
No `src/` changes, no operator-config YAML changes, no AZ-839
driver changes. AZ-840 is purely additive at the test layer.
## Code review (self-review)
Verdict: **PASS_WITH_WARNINGS**.
| Phase | Result |
|-------|--------|
| 1. Context loading | Re-read `gps_compare.py`, `accuracy_report.py`, `replay_input.py`, `cli/replay.py`, `test_derkachi_real_tlog.py`. Emission schema (`emitted_at`, `position_wgs84`) is the same shape `gps-denied-replay` writes. |
| 2. Spec compliance | All 8 AZ-840 ACs covered; AC-7 holds by inspection (no AZ-699 changes). |
| 3. Code quality | All public types have docstrings; failure messages name the upstream exception via `repr` so `OSError` / `subprocess.TimeoutExpired` carry through. Runner kw-args mirror `subprocess.run` signature 1:1. |
| 4. Security quick-scan | Effective config write goes to a tmp file the test owns; no secrets in the YAML overlay (override is two string fields). Subprocess `env` is opt-in (`None` defaults to `os.environ`). |
| 5. Performance scan | Unit tests run in 0.51 s. Tier-2 wall-clock cap is 900 s, enforced by the subprocess timeout. |
| 6. Cross-task consistency | `clip_duration_s` and `report_path` match AZ-699 exactly so a single Jetson run produces the same markdown shape. |
| 7. Architecture compliance | Orchestrator lives entirely under `tests/e2e/replay/`; no `src/` writes. C3 fixture's invariants (`PopulatedC6Cache.cache_root` is the single source of truth) propagate via `write_effective_replay_config`. |
## Findings
| ID | Severity | Description | Disposition |
|----|----------|-------------|-------------|
| F1 | Low | `_default_tile_decoder` in `conftest.py` (carried from batch 108) — still raw TIFF. Not in the AZ-840 path; AZ-840 doesn't change tile decoding. | Defer; no AZ-840 ticket. |
| F2 | Low | `_resolve_replay_descriptor_dim` is NetVLAD-only (carried from batch 108). AZ-840 doesn't change descriptors. | Defer; no AZ-840 ticket. |
| F3 | Low | `--pace asap` is hardcoded in `_run_replay_subprocess` argv; the AZ-699 test passes `--pace asap` too, so behaviour is identical. If a future test wants a real-time pace, the runner kwarg is the seam. | Document; no ticket. |
| F4 | Low | `_run_replay_subprocess` does not stream stdout/stderr; output surfaces only after the subprocess exits, and only when it exits non-zero. For 15-min runs this means the operator sees no progress until the subprocess exits or the budget expires. AZ-699 has the same shape. | Document; consider an AZ-* if the budget grows. |
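If F4 is ever picked up, one possible shape for a streaming runner is sketched below. It is not part of this batch: `run_streaming` is a hypothetical name, and the child process is a short Python one-liner rather than the replay binary.

```python
import subprocess
import sys


def run_streaming(argv, on_line, timeout_s: float) -> int:
    """Run argv, pass each output line to on_line as it arrives, return the exit code.

    Caveat: the timeout only bounds the final wait. A child that hangs while
    holding stdout open would need a watchdog thread, which this sketch omits.
    """
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        for line in proc.stdout:  # yields lines until the child closes stdout
            on_line(line.rstrip("\n"))
        return proc.wait(timeout=timeout_s)
    except BaseException:
        proc.kill()
        proc.wait()
        raise
    finally:
        proc.stdout.close()


seen = []
exit_code = run_streaming(
    [sys.executable, "-c", "print('frame 1'); print('frame 2')"],
    seen.append,
    timeout_s=30.0,
)
```

Merging stderr into stdout keeps the sketch to a single pipe; a production variant would probably keep the two streams separate so the failure message can still quote them individually.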
## Notes for follow-up
* AZ-840 lands the orchestrator test as Tier-2-gated. Verifying
the Tier-2 path actually runs on the Jetson harness is the
next gating step before Epic AZ-835 can flip from "covered by
unit tests" to "covered by Tier-2 integration".
* `_e2e_orchestrator.py` is intentionally kept under `tests/`
rather than promoted to `src/`. If a second consumer of the
same orchestration shape appears (e.g. AZ-833 mock-suite-sat
parity test), the move to a shared helper module under
`src/gps_denied_onboard/replay/` is the right next step;
for now the test-only location matches the helper's only
consumer.
* AZ-841 (Tier-2 unxfail follow-up) and AZ-842 (replay protocol
+ orchestrator docs) sit downstream — both should reference
this batch report in their planning sections.
+1 -1
@@ -8,7 +8,7 @@ status: in_progress
sub_step:
phase: 7
name: batch-loop
-detail: "batch 109 next; AZ-840 C4"
+detail: "batch 110 next; full-suite gate after AZ-840 C4 ship"
retry_count: 0
cycle: 3
tracker: jira
+655
@@ -0,0 +1,655 @@
"""E2E orchestrator for the AZ-835 7-step pipeline (AZ-840 / Epic AZ-835 C4).

Wraps the AZ-699 verdict-report writing path with the AZ-839 C3
fixture's ``PopulatedC6Cache`` so a single Tier-2 test can run from
``(tlog, video, calibration)`` to a horizontal-error report without
operator hand-curation between steps. The 7-step Epic narrative
(``_docs/02_tasks/todo/AZ-840_e2e_orchestrator_test.md``):

1. Active flight cut + tlog/video sync — handled by ``gps-denied-replay``
   ``--auto-trim`` (AZ-405 / AZ-698) inside the airborne binary.
2. On-fly frame + IMU extraction — same binary's per-frame loop.
3. Auto-create route — done by the C3 fixture
   (``operator_pre_flight_setup`` calls ``extract_route_from_tlog``).
4. POST route to satellite-provider — C3 fixture (AZ-838
   ``SatelliteProviderRouteClient.seed_route``).
5. Build FAISS index — C3 fixture (AZ-322 ``DescriptorBatcher``).
6. Run gps-denied airborne pipeline — this module's
   ``_run_replay_subprocess`` invokes ``gps-denied-replay`` against
   the populated cache.
7. Get GPS fixes, check vs tlog GPS — this module's
   ``_load_ground_truth`` + ``horizontal_error_distribution`` +
   ``render_report`` writes the verdict markdown.

The C3 fixture mutates ``c6_tile_cache.root_dir`` to point at a
``tmp_path_factory.mktemp`` value (AZ-839 batch 108b). The static
operator YAML at ``GPS_DENIED_OPERATOR_CONFIG_PATH`` cannot know
that path. ``write_effective_replay_config`` reads the static YAML,
overlays the ``c6_tile_cache.root_dir`` override, writes the merged
result to a tmp file, and returns the path the airborne binary
will load via ``--config``. This keeps a single source of truth
for the cache_root override across the in-memory C3 fixture path
and the subprocess airborne path.

Public surface (the names listed in ``__all__``):

* :class:`OrchestratorStep` — failure-step labels per AC-5 ("fails
  LOUD with a clear error pointing at the failing step").
* :class:`OrchestrationFailure` — wraps the underlying exception
  with the step that produced it.
* :class:`OrchestrationReport` — return value of
  :func:`run_e2e_orchestration` (verdict, distribution, paths,
  wall-clock measurements per AC-4).
* :func:`write_effective_replay_config` — small helper for the
  config merge step.
* :func:`read_calibration_acquisition_method` — mirror of the
  AZ-699 calibration-provenance helper.
* :func:`run_e2e_orchestration` — the AC-1 entry point.
"""
from __future__ import annotations

import datetime
import json
import logging
import subprocess
import time
from collections.abc import Callable, Mapping
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Any

import yaml

from gps_denied_onboard.helpers.accuracy_report import (
    AC3_GATE_PCT,
    AC3_GATE_THRESHOLD_M,
    ReportContext,
    render_report,
    verdict_passes_ac3,
)
from gps_denied_onboard.helpers.gps_compare import (
    GroundTruthRow,
    HorizontalErrorDistribution,
    horizontal_error_distribution,
)
from gps_denied_onboard.replay_input import load_tlog_ground_truth
from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache

__all__ = [
    "OrchestrationFailure",
    "OrchestrationReport",
    "OrchestratorStep",
    "read_calibration_acquisition_method",
    "run_e2e_orchestration",
    "write_effective_replay_config",
]

# Replay-subprocess wall-clock cap for the Derkachi clip per AZ-840
# AC-4 (15 min soft target). Exposed as a default that the integration
# test can override; the unit tests rely on the contract that the
# runner argument is a free callable.
_DEFAULT_MAX_SECONDS: float = 900.0

_LOGGER = logging.getLogger("tests.e2e.replay.e2e_orchestrator")


class OrchestratorStep(str, Enum):
    """Labels for the 7-step pipeline used by :class:`OrchestrationFailure`.

    AC-5: every failure that reaches the test surface must name the
    step that produced it. The string values are stable so test
    assertions and log readers can match on them.
    """

    VALIDATE_INPUTS = "validate_inputs"
    WRITE_EFFECTIVE_CONFIG = "write_effective_config"
    AIRBORNE_PIPELINE = "airborne_pipeline"
    PARSE_EMISSIONS = "parse_emissions"
    LOAD_GROUND_TRUTH = "load_ground_truth"
    COMPUTE_DISTRIBUTION = "compute_distribution"
    RENDER_REPORT = "render_report"


class OrchestrationFailure(RuntimeError):
    """Failure inside one of the 7 orchestration steps (AC-5).

    The :attr:`step` attribute names the failing step; the message
    embeds it as the prefix so plain log-readers see the failure
    location without inspecting the exception object.
    """

    def __init__(self, step: OrchestratorStep, message: str) -> None:
        super().__init__(f"[{step.value}] {message}")
        self.step = step


@dataclass(frozen=True, slots=True)
class OrchestrationReport:
    """Return value of :func:`run_e2e_orchestration`.

    Attributes:
        verdict_passed: ``True`` iff the run met the AZ-696 epic
            AC-3 gate (>= AC3_GATE_PCT% within AC3_GATE_THRESHOLD_M m).
        distribution: Computed horizontal-error distribution.
        report_path: Markdown report written under ``report_dir``.
        emissions_count: Total estimator-output records consumed.
        wall_clock_s: Wall-clock seconds for the orchestration run
            (excludes the C3 fixture setup; covers steps 1-2-6-7).
        replay_subprocess_seconds: Wall-clock seconds the airborne
            replay subprocess took. Always <= ``wall_clock_s``.
    """

    verdict_passed: bool
    distribution: HorizontalErrorDistribution
    report_path: Path
    emissions_count: int
    wall_clock_s: float
    replay_subprocess_seconds: float


def read_calibration_acquisition_method(calibration_path: Path) -> str:
    """Return the AZ-702 ``acquisition_method`` field, or ``"unknown"``.

    Mirrors ``test_derkachi_real_tlog._read_calibration_acquisition_method``
    so the AZ-840 verdict report can name the calibration provenance
    in its failure message (AZ-699 AC-3). Pure helper; the report
    writer needs the string, not the JSON.
    """
    try:
        data = json.loads(calibration_path.read_text())
    except (OSError, json.JSONDecodeError):
        return "unknown"
    # A non-object top level (e.g. a JSON list) has no ``.get``; treat it
    # as unknown instead of raising AttributeError.
    if not isinstance(data, dict):
        return "unknown"
    method = data.get("acquisition_method")
    if isinstance(method, str) and method:
        return method
    return "unknown"


def write_effective_replay_config(
    *,
    base_config_path: Path,
    cache_root: Path,
    output_path: Path,
) -> Path:
    """Merge the cache_root override into the static operator YAML.

    Reads ``base_config_path`` as YAML, sets
    ``c6_tile_cache.root_dir`` to ``cache_root``, blanks
    ``c6_tile_cache.faiss_index_path`` (forcing the FAISS index path
    to fall back to ``<cache_root>/descriptor.index``), and writes
    the merged document to ``output_path`` as YAML.

    The merge is field-level: every other block in the base YAML keeps
    its values. The output is re-serialised with sorted keys, so
    comments and key order from the base file are not retained. This
    keeps a single source of truth for the operator config; the test
    harness only contributes the dynamic cache_root.

    Returns:
        The ``output_path`` argument, for ergonomic chaining.

    Raises:
        OrchestrationFailure (step=WRITE_EFFECTIVE_CONFIG): Base YAML
            unreadable, malformed, or not a top-level mapping, or
            ``output_path`` not writable.
    """
    try:
        base_text = base_config_path.read_text()
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"cannot read base config at {base_config_path}: {exc!r}",
        ) from exc
    try:
        base_data = yaml.safe_load(base_text) or {}
    except yaml.YAMLError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"base config YAML at {base_config_path} is malformed: {exc!r}",
        ) from exc
    if not isinstance(base_data, dict):
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"base config YAML at {base_config_path} must be a mapping; "
            f"got {type(base_data).__name__}",
        )
    # Copy the existing block so only the two cache fields are overridden.
    c6_block_raw = base_data.get("c6_tile_cache")
    c6_block = dict(c6_block_raw) if isinstance(c6_block_raw, dict) else {}
    c6_block["root_dir"] = str(cache_root)
    c6_block["faiss_index_path"] = ""
    base_data["c6_tile_cache"] = c6_block
    try:
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(
            yaml.safe_dump(base_data, sort_keys=True, default_flow_style=False)
        )
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"cannot write effective config at {output_path}: {exc!r}",
        ) from exc
    return output_path


def run_e2e_orchestration(
    *,
    populated_cache: PopulatedC6Cache,
    base_config_path: Path,
    tlog_path: Path,
    video_path: Path,
    calibration_path: Path,
    signing_key_path: Path,
    replay_binary: Path,
    output_path: Path,
    report_dir: Path,
    effective_config_path: Path,
    run_date_utc: str | None = None,
    runner: Callable[..., subprocess.CompletedProcess[str]] = subprocess.run,
    subprocess_env: Mapping[str, str] | None = None,
    max_seconds: float = _DEFAULT_MAX_SECONDS,
    logger: logging.Logger | None = None,
) -> OrchestrationReport:
    """Run AZ-835 steps 1-7 against the AZ-839 populated cache.

    Steps 3-5 are the responsibility of ``populated_cache`` (the
    AZ-839 C3 fixture); this function covers 1-2-6 (the airborne
    replay subprocess) and 7 (verdict report). The C3 fixture and
    this function share the cache_root via
    :func:`write_effective_replay_config` so the airborne binary
    reads the same FAISS index the fixture wrote (AC-3).

    Args:
        populated_cache: C3 fixture output (AZ-839). Carries
            ``cache_root``, ``faiss_index_path``, and the route
            spec the test pipeline produced.
        base_config_path: Static operator config YAML
            (``GPS_DENIED_OPERATOR_CONFIG_PATH``). Must register
            ``c6_tile_cache``, ``c10_provisioning``, ``c2_vpr``,
            ``c4_pose``, and ``c5_state`` blocks for the airborne
            binary to compose the replay graph.
        tlog_path: ArduPilot binary tlog the test consumes.
        video_path: Flight video file the test consumes.
        calibration_path: Camera calibration JSON (AZ-702
            factory-sheet for Derkachi).
        signing_key_path: MAVLink signing-key file. Replay protocol
            Invariant 11 — required even for the noop transport.
        replay_binary: ``gps-denied-replay`` console-script path.
        output_path: Where the airborne binary writes JSONL
            estimator emissions.
        report_dir: Directory the verdict markdown is written to.
        effective_config_path: Where the cache_root-merged YAML is
            written. The path is passed to the airborne binary via
            ``--config``.
        run_date_utc: ISO-8601 date for the report filename and
            header. Defaults to today UTC.
        runner: ``subprocess.run`` by default; tests inject a fake
            that emits a synthetic JSONL output.
        subprocess_env: Optional complete environment for the
            replay subprocess. When given, it replaces the inherited
            environment rather than being merged into it. ``None``
            means the subprocess inherits ``os.environ``.
        max_seconds: Hard wall-clock cap for the airborne replay
            subprocess. The orchestrator times out the runner via
            its ``timeout`` kwarg; an exceeded budget surfaces as
            ``OrchestrationFailure(step=AIRBORNE_PIPELINE)``.
        logger: Optional logger. Defaults to the module logger.

    Returns:
        :class:`OrchestrationReport` on success. The verdict can
        be PASS or FAIL — AC-2 mandates the report exists either
        way.

    Raises:
        OrchestrationFailure: Any of the 7 steps failed. The
            ``step`` attribute names the failing step.
    """
    log = logger or _LOGGER
    started = time.monotonic()
    effective_run_date = run_date_utc or (
        datetime.datetime.now(datetime.timezone.utc).date().isoformat()
    )
    _validate_inputs(
        base_config_path=base_config_path,
        tlog_path=tlog_path,
        video_path=video_path,
        calibration_path=calibration_path,
        signing_key_path=signing_key_path,
        replay_binary=replay_binary,
        report_dir=report_dir,
    )
    write_effective_replay_config(
        base_config_path=base_config_path,
        cache_root=populated_cache.cache_root,
        output_path=effective_config_path,
    )
    replay_subprocess_seconds = _run_replay_subprocess(
        replay_binary=replay_binary,
        video_path=video_path,
        tlog_path=tlog_path,
        output_path=output_path,
        calibration_path=calibration_path,
        config_path=effective_config_path,
        signing_key_path=signing_key_path,
        max_seconds=max_seconds,
        runner=runner,
        env=subprocess_env,
        logger=log,
    )
    emissions = _parse_jsonl(output_path)
    ground_truth = _load_ground_truth(tlog_path)
    distribution = _compute_distribution(emissions, ground_truth)
    context = ReportContext(
        run_date_utc=effective_run_date,
        tlog_path=tlog_path,
        video_path=video_path,
        calibration_acquisition_method=read_calibration_acquisition_method(
            calibration_path
        ),
        clip_duration_s=(
            ground_truth[-1].t_s - ground_truth[0].t_s
            if ground_truth
            else 0.0
        ),
        emissions_count=len(emissions),
    )
    verdict_passed = verdict_passes_ac3(distribution)
    report_path = _render_and_write_report(
        distribution=distribution,
        context=context,
        passed=verdict_passed,
        report_dir=report_dir,
    )
    log.info(
        "e2e_orchestrator: report written",
        extra={
            "kind": "e2e_orchestrator.report_written",
            "kv": {
                "report_path": str(report_path),
                "verdict_passed": verdict_passed,
                "share_within_threshold_pct": (
                    distribution.threshold_hit_share.get(
                        AC3_GATE_THRESHOLD_M, 0.0
                    )
                    * 100.0
                ),
                "ac3_gate_pct": AC3_GATE_PCT,
                "emissions_count": len(emissions),
                "ground_truth_pairings": distribution.count,
            },
        },
    )
    wall_clock_s = max(0.0, time.monotonic() - started)
    return OrchestrationReport(
        verdict_passed=verdict_passed,
        distribution=distribution,
        report_path=report_path,
        emissions_count=len(emissions),
        wall_clock_s=wall_clock_s,
        replay_subprocess_seconds=replay_subprocess_seconds,
    )


def _validate_inputs(
    *,
    base_config_path: Path,
    tlog_path: Path,
    video_path: Path,
    calibration_path: Path,
    signing_key_path: Path,
    replay_binary: Path,
    report_dir: Path,
) -> None:
    """Fail fast on missing inputs (AC-5 — surface the failing step early)."""
    file_inputs: tuple[tuple[str, Path], ...] = (
        ("base_config_path", base_config_path),
        ("tlog_path", tlog_path),
        ("video_path", video_path),
        ("calibration_path", calibration_path),
        ("signing_key_path", signing_key_path),
        ("replay_binary", replay_binary),
    )
    for label, path in file_inputs:
        if not path.is_file():
            raise OrchestrationFailure(
                OrchestratorStep.VALIDATE_INPUTS,
                f"{label} is not a file: {path}",
            )
    try:
        report_dir.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.VALIDATE_INPUTS,
            f"report_dir {report_dir} cannot be created: {exc!r}",
        ) from exc


def _run_replay_subprocess(
    *,
    replay_binary: Path,
    video_path: Path,
    tlog_path: Path,
    output_path: Path,
    calibration_path: Path,
    config_path: Path,
    signing_key_path: Path,
    max_seconds: float,
    runner: Callable[..., subprocess.CompletedProcess[str]],
    env: Mapping[str, str] | None,
    logger: logging.Logger,
) -> float:
    """Invoke gps-denied-replay with --auto-trim; return wall-clock seconds.

    Wraps :func:`subprocess.run` so unit tests can inject a fake
    runner. ``--auto-trim`` is always enabled here: the orchestrator
    delegates the AZ-405 / AZ-698 sync path (Epic AZ-835 step 1) to
    the airborne binary on every run.

    Raises:
        OrchestrationFailure (step=AIRBORNE_PIPELINE): Non-zero exit,
            timeout, or runner-level OSError.
    """
    argv = [
        str(replay_binary),
        "--video",
        str(video_path),
        "--tlog",
        str(tlog_path),
        "--output",
        str(output_path),
        "--camera-calibration",
        str(calibration_path),
        "--config",
        str(config_path),
        "--mavlink-signing-key",
        str(signing_key_path),
        "--pace",
        "asap",
        "--auto-trim",
    ]
    started = time.monotonic()
    try:
        completed = runner(
            argv,
            capture_output=True,
            text=True,
            timeout=max_seconds,
            # A non-None env replaces the child's environment entirely.
            env=dict(env) if env is not None else None,
        )
    except subprocess.TimeoutExpired as exc:
        raise OrchestrationFailure(
            OrchestratorStep.AIRBORNE_PIPELINE,
            f"gps-denied-replay timed out after {max_seconds:.0f} s",
        ) from exc
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.AIRBORNE_PIPELINE,
            f"cannot launch gps-denied-replay at {replay_binary}: {exc!r}",
        ) from exc
    elapsed_s = max(0.0, time.monotonic() - started)
    if completed.returncode != 0:
        raise OrchestrationFailure(
            OrchestratorStep.AIRBORNE_PIPELINE,
            f"gps-denied-replay exited {completed.returncode}\n"
            f"stdout:\n{completed.stdout}\nstderr:\n{completed.stderr}",
        )
    logger.info(
        "e2e_orchestrator: replay subprocess complete",
        extra={
            "kind": "e2e_orchestrator.replay_subprocess",
            "kv": {
                "elapsed_s": elapsed_s,
                "max_seconds": max_seconds,
            },
        },
    )
    return elapsed_s


def _parse_jsonl(path: Path) -> list[dict[str, Any]]:
    """Read one JSON record per non-blank line.

    Raises:
        OrchestrationFailure (step=PARSE_EMISSIONS): Output file
            missing, unreadable, has zero records, or contains a
            malformed line.
    """
    if not path.is_file():
        raise OrchestrationFailure(
            OrchestratorStep.PARSE_EMISSIONS,
            f"replay output JSONL not found: {path}",
        )
    try:
        text = path.read_text()
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.PARSE_EMISSIONS,
            f"replay output JSONL unreadable at {path}: {exc!r}",
        ) from exc
    rows: list[dict[str, Any]] = []
    for line_idx, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            raise OrchestrationFailure(
                OrchestratorStep.PARSE_EMISSIONS,
                f"malformed JSON at line {line_idx} of {path}: {exc.msg}",
            ) from exc
        if not isinstance(row, dict):
            raise OrchestrationFailure(
                OrchestratorStep.PARSE_EMISSIONS,
                f"line {line_idx} of {path} is not a JSON object: {row!r}",
            )
        rows.append(row)
    if not rows:
        raise OrchestrationFailure(
            OrchestratorStep.PARSE_EMISSIONS,
            f"replay output JSONL at {path} has zero records — pipeline "
            "produced no estimator emissions",
        )
    return rows


def _load_ground_truth(tlog_path: Path) -> list[GroundTruthRow]:
    """Extract WGS84 ground truth from the binary tlog.

    Raises:
        OrchestrationFailure (step=LOAD_GROUND_TRUTH): Loader
            error or empty record list.
    """
    try:
        series = load_tlog_ground_truth(tlog_path).records
    except Exception as exc:
        raise OrchestrationFailure(
            OrchestratorStep.LOAD_GROUND_TRUTH,
            f"load_tlog_ground_truth({tlog_path}) failed: {exc!r}",
        ) from exc
    rows: list[GroundTruthRow] = [
        GroundTruthRow(
            t_s=fix.ts_ns / 1e9,
            lat_deg=fix.lat_deg,
            lon_deg=fix.lon_deg,
            alt_m=fix.alt_m,
        )
        for fix in series
    ]
    if not rows:
        raise OrchestrationFailure(
            OrchestratorStep.LOAD_GROUND_TRUTH,
            f"tlog ground truth at {tlog_path} has zero rows",
        )
    return rows


def _compute_distribution(
    emissions: list[dict[str, Any]],
    ground_truth: list[GroundTruthRow],
) -> HorizontalErrorDistribution:
    """Compute the horizontal-error distribution.

    Raises:
        OrchestrationFailure (step=COMPUTE_DISTRIBUTION): Helper
            error or zero ground-truth pairings (every emission
            fell outside the GT time window).
    """
    try:
        distribution = horizontal_error_distribution(emissions, ground_truth)
    except Exception as exc:
        raise OrchestrationFailure(
            OrchestratorStep.COMPUTE_DISTRIBUTION,
            f"horizontal_error_distribution failed: {exc!r}",
        ) from exc
    if distribution.count == 0:
        raise OrchestrationFailure(
            OrchestratorStep.COMPUTE_DISTRIBUTION,
            "no emissions paired with ground truth — JSONL timestamps "
            "outside the tlog GPS window?",
        )
    return distribution


def _render_and_write_report(
    *,
    distribution: HorizontalErrorDistribution,
    context: ReportContext,
    passed: bool,
    report_dir: Path,
) -> Path:
    """Render the verdict markdown and write it to ``report_dir``.

    Raises:
        OrchestrationFailure (step=RENDER_REPORT): Render or write
            failure; ``report_dir`` was already created by
            :func:`_validate_inputs`.
    """
    try:
        report_text = render_report(distribution, context, passed=passed)
    except Exception as exc:
        raise OrchestrationFailure(
            OrchestratorStep.RENDER_REPORT,
            f"render_report failed: {exc!r}",
        ) from exc
    report_path = (
        report_dir / f"real_flight_validation_{context.run_date_utc}.md"
    )
    try:
        report_path.write_text(report_text)
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.RENDER_REPORT,
            f"cannot write report at {report_path}: {exc!r}",
        ) from exc
    return report_path
@@ -0,0 +1,182 @@
"""AZ-840 — E2E orchestrator integration test (AC-1 / AC-2 / AC-3 / AC-4 / AC-6).
The Tier-2 entry point that closes Epic AZ-835's narrative: from a
``(tlog, video, calibration)`` triple, run the full 7-step pipeline
end-to-end on the Jetson harness without operator hand-curation
between steps.
The test consumes:
* :func:`tests.e2e.replay.conftest.operator_pre_flight_setup` —
the AZ-839 C3 fixture that owns steps 3-5 (route extraction +
satellite-provider seeding + FAISS index build) and yields a
:class:`PopulatedC6Cache` keyed off a freshly-mktemp'd
``cache_root``.
* :func:`tests.e2e.replay.conftest.derkachi_replay_inputs` — the
shared session fixture that materialises the Derkachi tlog +
video + factory-sheet calibration + signing-key file.
* :func:`tests.e2e.replay._e2e_orchestrator.run_e2e_orchestration`
— the AC-1 driver that wires everything below the C3 fixture.
The driver writes a fresh effective replay config per session
(merging the static operator YAML with the cache_root override),
invokes ``gps-denied-replay --auto-trim``, parses the JSONL
emissions, computes the horizontal-error distribution, and writes
the verdict markdown under ``_docs/06_metrics/`` (AC-2).
Skip gates (in evaluation order):
1. ``@pytest.mark.tier2``: the per-suite Tier-2 plugin gates this
off on dev macOS (matches the AZ-839 / AZ-699 contract).
2. ``RUN_REPLAY_E2E`` not in ``{1, true, yes, on}``.
3. ``GPS_DENIED_OPERATOR_CONFIG_PATH`` unset or blank (the same
env var the C3 fixture consumes).
4. ``gps-denied-replay`` console-script found neither on ``PATH``
nor next to the running interpreter (venv ``bin``).
5. Real video missing or placeholder-sized (mirrors AZ-699's gate).
6. ``operator_pre_flight_setup`` fixture itself skipped: the
downstream consumer inherits the SKIP automatically (pytest's
fixture-skip propagation).
AC-7 (AZ-699 continues to pass) is satisfied by inspection: this
test does not modify ``test_derkachi_real_tlog.py``. It writes its
report to the same path (``real_flight_validation_<date>.md``), so
whichever test runs last overwrites the file. On a given clip both
tests are expected to reach the same verdict (both PASS or both
FAIL), so the overwrite does not change the recorded outcome.
"""
from __future__ import annotations
import os
import shutil
import sys
from collections.abc import Iterator
from pathlib import Path
import pytest
from tests.e2e.replay._e2e_orchestrator import (
OrchestrationReport,
run_e2e_orchestration,
)
from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
from tests.e2e.replay.conftest import DerkachiReplayInputs
def _repo_root() -> Path:
return Path(__file__).resolve().parents[3]
def _derkachi_dir() -> Path:
return _repo_root() / "_docs" / "00_problem" / "input_data" / "flight_derkachi"
_MIN_REAL_VIDEO_BYTES: int = 1_000_000
def _replay_binary() -> Path | None:
"""Return the absolute path to ``gps-denied-replay`` or ``None``.
Same lookup order AZ-699 uses: PATH first, venv bin second.
"""
binary = shutil.which("gps-denied-replay")
if binary is not None:
return Path(binary)
venv_bin = Path(sys.executable).parent / "gps-denied-replay"
if venv_bin.exists():
return venv_bin
return None
def _orchestrator_skip_reason() -> str | None:
"""Return a SKIP message when env / inputs preclude a Jetson run."""
if os.environ.get("RUN_REPLAY_E2E", "").strip().lower() not in {
"1",
"true",
"yes",
"on",
}:
return "AZ-840 e2e orchestrator gated by RUN_REPLAY_E2E=1"
if not os.environ.get("GPS_DENIED_OPERATOR_CONFIG_PATH", "").strip():
return (
"AZ-840 e2e orchestrator requires GPS_DENIED_OPERATOR_CONFIG_PATH "
"(same env var the C3 fixture consumes)"
)
if _replay_binary() is None:
return "gps-denied-replay console-script not installed"
video = _derkachi_dir() / "flight_derkachi.mp4"
if not video.is_file():
return f"Derkachi video missing: {video}"
if video.stat().st_size < _MIN_REAL_VIDEO_BYTES:
return (
f"Derkachi video at {video} is only {video.stat().st_size} "
"bytes — placeholder, not a real recording"
)
return None
@pytest.fixture
def az840_skip_gate() -> Iterator[None]:
"""Skip-gate the orchestrator test before any heavy fixtures resolve."""
reason = _orchestrator_skip_reason()
if reason is not None:
pytest.skip(reason)
yield
@pytest.mark.tier2
def test_az840_e2e_real_flight_orchestration(
az840_skip_gate: None,
operator_pre_flight_setup: PopulatedC6Cache,
derkachi_replay_inputs: DerkachiReplayInputs,
tmp_path: Path,
) -> None:
# Arrange — every input besides cache_root comes from the existing
# session fixtures so the same Tier-2 harness setup that powers
# AZ-699 + AZ-839 is exercised.
binary = _replay_binary()
assert binary is not None, "skip gate already verified the binary exists"
base_config_path = Path(os.environ["GPS_DENIED_OPERATOR_CONFIG_PATH"])
output_path = tmp_path / "estimator_output.jsonl"
effective_config_path = tmp_path / "operator_config_effective.yaml"
report_dir = _repo_root() / "_docs" / "06_metrics"
# Act
report = run_e2e_orchestration(
populated_cache=operator_pre_flight_setup,
base_config_path=base_config_path,
tlog_path=derkachi_replay_inputs.tlog_path,
video_path=derkachi_replay_inputs.video_path,
calibration_path=derkachi_replay_inputs.calibration_path,
signing_key_path=derkachi_replay_inputs.signing_key_path,
replay_binary=binary,
output_path=output_path,
report_dir=report_dir,
effective_config_path=effective_config_path,
)
# Assert AC-2 + AC-4: report exists; replay subprocess stays within the 15-min budget.
assert isinstance(report, OrchestrationReport)
assert report.report_path.is_file()
body = report.report_path.read_text()
assert "## Horizontal error (metres)" in body
assert "## Threshold-hit share" in body
assert "Mean" in body
for threshold in (10, 25, 50, 100):
assert f"| {threshold} |" in body, (
f"threshold {threshold} m row missing from report"
)
assert report.replay_subprocess_seconds <= 900.0, (
"AZ-840 AC-4: replay subprocess exceeded 15-min soft target"
)
assert report.wall_clock_s >= report.replay_subprocess_seconds
assert report.distribution.count > 0, (
"no emissions paired with ground truth — orchestration produced "
"data but every emission fell outside the tlog GPS window"
)
# Assert AC-3 — the effective config was written and points at the
# cache_root the C3 fixture supplied.
assert effective_config_path.is_file()
effective_text = effective_config_path.read_text()
assert str(operator_pre_flight_setup.cache_root) in effective_text
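The `RUN_REPLAY_E2E` gate reduces to a case-insensitive membership check. A minimal sketch, assuming a hypothetical helper name `env_flag_enabled` and an explicit `environ` mapping so the check can be exercised without touching the real process environment:

```python
from __future__ import annotations

from collections.abc import Mapping

# Same truthy spellings the skip gate accepts.
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def env_flag_enabled(name: str, environ: Mapping[str, str]) -> bool:
    """Return True when the variable is set to a truthy spelling.

    Surrounding whitespace and letter case are ignored; an unset
    variable counts as disabled.
    """
    return environ.get(name, "").strip().lower() in _TRUTHY


if __name__ == "__main__":
    import os

    print(env_flag_enabled("RUN_REPLAY_E2E", os.environ))
```

Passing the mapping in, rather than reading `os.environ` inside the helper, is a design choice of this sketch only; the test module above reads `os.environ` directly.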
@@ -0,0 +1,671 @@
"""Unit tests for the AZ-840 e2e orchestrator (AC-8).
The end-to-end happy path is the Tier-2 integration test in
``test_az835_e2e_real_flight.py`` (AC-1 / AC-2). This module covers
the orchestration helper layer in isolation:
* Param validation — every required path must exist before the
airborne subprocess is spawned (AC-5 fails LOUD).
* Effective-config merge: the ``c6_tile_cache.root_dir`` override
is written to YAML and the static ``faiss_index_path`` is cleared;
the rest of the base config is preserved.
* Error propagation per step — every documented failure surfaces
as :class:`OrchestrationFailure` with the correct
:class:`OrchestratorStep` label.
* Happy path — when the runner returns success and the JSONL +
ground truth align, :class:`OrchestrationReport` carries a
written report path and an honest verdict (AC-2: report exists
PASS or FAIL).
The tests inject a fake ``runner`` so no real
``gps-denied-replay`` subprocess is spawned. Real binary execution
is exercised on the Jetson harness via the AC-1 integration test.
"""
from __future__ import annotations
import json
import subprocess
from pathlib import Path
from unittest.mock import MagicMock
import pytest
import yaml
from gps_denied_onboard.helpers.accuracy_report import (
AC3_GATE_THRESHOLD_M,
)
from gps_denied_onboard.replay_input.tlog_route import RouteSpec
from tests.e2e.replay._e2e_orchestrator import (
OrchestrationFailure,
OrchestrationReport,
OrchestratorStep,
read_calibration_acquisition_method,
run_e2e_orchestration,
write_effective_replay_config,
)
from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
# ----------------------------------------------------------------------
# Helpers
def _build_populated_cache(tmp_path: Path) -> PopulatedC6Cache:
"""Construct a synthetic :class:`PopulatedC6Cache`.
The orchestrator only consumes ``cache_root`` from the cache,
so the FAISS sidecar paths are placeholders. The route_spec is
a minimal one-waypoint instance — no AZ-836 invariants are
re-asserted by AZ-840.
"""
cache_root = tmp_path / "cache_root"
cache_root.mkdir()
return PopulatedC6Cache(
cache_root=cache_root,
tile_store_path=cache_root / "tiles",
faiss_index_path=cache_root / "descriptor.index",
faiss_sidecar_sha256_path=cache_root / "descriptor.index.sha256",
faiss_sidecar_meta_path=cache_root / "descriptor.index.meta.json",
route_spec=RouteSpec(
waypoints=((50.10, 36.10),),
suggested_region_size_meters=500.0,
source_tlog=Path("test.tlog"),
source_segment=(0, 100),
total_distance_meters=0.0,
),
tile_count=1,
elapsed_seconds=0.0,
)
def _stage_inputs(tmp_path: Path) -> dict[str, Path]:
"""Write touch-files for every input path the orchestrator validates.
The base config YAML carries one stub block so the merge step
has a real document to overlay on.
"""
base_config = tmp_path / "operator_config.yaml"
base_config.write_text(
yaml.safe_dump(
{
"mode": "replay",
"c6_tile_cache": {
"store_runtime": "postgres_filesystem",
"metadata_runtime": "postgres_filesystem",
"descriptor_index_runtime": "faiss_hnsw",
"root_dir": "/var/lib/gps-denied/tiles",
"faiss_index_path": "/some/static/path/descriptor.index",
},
}
)
)
tlog = tmp_path / "input.tlog"
tlog.write_bytes(b"\x00")
video = tmp_path / "input.mp4"
video.write_bytes(b"\x00")
calibration = tmp_path / "calibration.json"
calibration.write_text(json.dumps({"acquisition_method": "factory-sheet"}))
signing_key = tmp_path / "signing_key.bin"
signing_key.write_bytes(b"\x00" * 32)
binary = tmp_path / "gps-denied-replay"
binary.write_text("")
return {
"base_config_path": base_config,
"tlog_path": tlog,
"video_path": video,
"calibration_path": calibration,
"signing_key_path": signing_key,
"replay_binary": binary,
}
def _ground_truth_tlog_loader(
monkeypatch: pytest.MonkeyPatch,
*,
times_s: tuple[float, ...] = (0.0, 1.0, 2.0),
lat_deg: float = 50.10,
lon_deg: float = 36.10,
alt_m: float = 100.0,
) -> None:
"""Stub the orchestrator's ground-truth loader so unit tests skip MAVLink.
The orchestrator imports ``load_tlog_ground_truth`` from
``gps_denied_onboard.replay_input``; patching the symbol *as
bound on the orchestrator module* keeps the patch local to the
unit suite (no cross-test bleed).
"""
fixes = [
_StubGpsFix(
ts_ns=int(t * 1e9),
lat_deg=lat_deg,
lon_deg=lon_deg,
alt_m=alt_m,
)
for t in times_s
]
series = _StubGpsSeries(records=tuple(fixes))
monkeypatch.setattr(
"tests.e2e.replay._e2e_orchestrator.load_tlog_ground_truth",
lambda *_args, **_kwargs: series,
)
class _StubGpsFix:
"""Mirrors the fields the orchestrator reads from each tlog row."""
__slots__ = ("ts_ns", "lat_deg", "lon_deg", "alt_m")
def __init__(
self, *, ts_ns: int, lat_deg: float, lon_deg: float, alt_m: float
) -> None:
self.ts_ns = ts_ns
self.lat_deg = lat_deg
self.lon_deg = lon_deg
self.alt_m = alt_m
class _StubGpsSeries:
"""Drop-in replacement for :class:`TlogGroundTruth`."""
def __init__(self, *, records: tuple[_StubGpsFix, ...]) -> None:
self.records = records
def _build_runner_emitting(
output_path: Path,
*,
rows: list[dict[str, object]],
returncode: int = 0,
stdout: str = "",
stderr: str = "",
) -> MagicMock:
"""Return a fake ``subprocess.run`` that writes JSONL on call."""
def _run(argv, **kwargs): # type: ignore[no-untyped-def]
if rows:
output_path.parent.mkdir(parents=True, exist_ok=True)
output_path.write_text(
"\n".join(json.dumps(row) for row in rows) + "\n"
)
return subprocess.CompletedProcess(
args=argv,
returncode=returncode,
stdout=stdout,
stderr=stderr,
)
return MagicMock(side_effect=_run)
# ----------------------------------------------------------------------
# write_effective_replay_config
def test_write_effective_replay_config_overlays_root_dir(
tmp_path: Path,
) -> None:
# Arrange
inputs = _stage_inputs(tmp_path)
cache_root = tmp_path / "cache"
cache_root.mkdir()
output_path = tmp_path / "effective.yaml"
# Act
written_path = write_effective_replay_config(
base_config_path=inputs["base_config_path"],
cache_root=cache_root,
output_path=output_path,
)
# Assert
assert written_path == output_path
merged = yaml.safe_load(output_path.read_text())
assert merged["c6_tile_cache"]["root_dir"] == str(cache_root)
# The static faiss_index_path is expected to be cleared alongside the root_dir overlay.
assert merged["c6_tile_cache"]["faiss_index_path"] == ""
assert merged["mode"] == "replay"
assert (
merged["c6_tile_cache"]["store_runtime"] == "postgres_filesystem"
), "non-overridden c6_tile_cache fields must survive"
def test_write_effective_replay_config_creates_block_when_absent(
tmp_path: Path,
) -> None:
# Arrange
base = tmp_path / "operator.yaml"
base.write_text(yaml.safe_dump({"mode": "replay"}))
cache_root = tmp_path / "cache"
cache_root.mkdir()
# Act
write_effective_replay_config(
base_config_path=base,
cache_root=cache_root,
output_path=tmp_path / "effective.yaml",
)
# Assert
merged = yaml.safe_load((tmp_path / "effective.yaml").read_text())
assert merged["c6_tile_cache"]["root_dir"] == str(cache_root)
def test_write_effective_replay_config_malformed_yaml_fails(
tmp_path: Path,
) -> None:
# Arrange
base = tmp_path / "bad.yaml"
base.write_text(":\n : not yaml:")
cache_root = tmp_path / "cache"
cache_root.mkdir()
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
write_effective_replay_config(
base_config_path=base,
cache_root=cache_root,
output_path=tmp_path / "effective.yaml",
)
assert exc_info.value.step is OrchestratorStep.WRITE_EFFECTIVE_CONFIG
def test_write_effective_replay_config_non_mapping_top_level_fails(
tmp_path: Path,
) -> None:
# Arrange
base = tmp_path / "bad.yaml"
base.write_text("- not a mapping\n")
cache_root = tmp_path / "cache"
cache_root.mkdir()
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
write_effective_replay_config(
base_config_path=base,
cache_root=cache_root,
output_path=tmp_path / "effective.yaml",
)
assert exc_info.value.step is OrchestratorStep.WRITE_EFFECTIVE_CONFIG
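The field-level overlay these tests exercise can be pictured on plain dicts. This is a sketch under stated assumptions, not the body of `write_effective_replay_config` (which is not shown in this section): `overlay_root_dir` is a hypothetical name, YAML loading and dumping are left out, and the `faiss_index_path` reset that the first test asserts is deliberately not modelled.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def overlay_root_dir(base: dict[str, Any], cache_root: str) -> dict[str, Any]:
    """Return a copy of ``base`` with ``c6_tile_cache.root_dir`` replaced.

    The input mapping is never mutated, which mirrors the rule that the
    static operator config on disk stays untouched.
    """
    merged = deepcopy(base)
    block = merged.get("c6_tile_cache")
    if not isinstance(block, dict):
        # Create the block when the base config does not carry one.
        block = {}
        merged["c6_tile_cache"] = block
    block["root_dir"] = cache_root
    return merged


if __name__ == "__main__":
    base = {"mode": "replay", "c6_tile_cache": {"store_runtime": "postgres_filesystem"}}
    print(overlay_root_dir(base, "/tmp/cache_root"))
```

Copying before merging keeps sibling keys such as `store_runtime` intact while changing only the one field.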
# ----------------------------------------------------------------------
# read_calibration_acquisition_method
def test_read_calibration_acquisition_method_returns_field_when_present(
tmp_path: Path,
) -> None:
# Arrange
path = tmp_path / "cal.json"
path.write_text(json.dumps({"acquisition_method": "factory-sheet"}))
# Assert
assert read_calibration_acquisition_method(path) == "factory-sheet"
def test_read_calibration_acquisition_method_returns_unknown_on_missing(
tmp_path: Path,
) -> None:
# Arrange
path = tmp_path / "cal.json"
path.write_text(json.dumps({"some_other_field": True}))
# Assert
assert read_calibration_acquisition_method(path) == "unknown"
def test_read_calibration_acquisition_method_returns_unknown_on_malformed(
tmp_path: Path,
) -> None:
# Arrange
path = tmp_path / "cal.json"
path.write_text("{not valid json")
# Assert
assert read_calibration_acquisition_method(path) == "unknown"
# ----------------------------------------------------------------------
# run_e2e_orchestration — param validation (AC-5)
def test_run_e2e_orchestration_missing_tlog_fails_loud(
tmp_path: Path,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
inputs["tlog_path"].unlink()
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=tmp_path / "out.jsonl",
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.VALIDATE_INPUTS
assert "tlog_path" in str(exc_info.value)
def test_run_e2e_orchestration_missing_binary_fails_loud(
tmp_path: Path,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
inputs["replay_binary"].unlink()
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=tmp_path / "out.jsonl",
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.VALIDATE_INPUTS
assert "replay_binary" in str(exc_info.value)
# ----------------------------------------------------------------------
# run_e2e_orchestration — subprocess error propagation (AC-5)
def test_run_e2e_orchestration_replay_nonzero_exit_fails_loud(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
runner = MagicMock(
return_value=subprocess.CompletedProcess(
args=[],
returncode=1,
stdout="",
stderr="boom",
)
)
_ground_truth_tlog_loader(monkeypatch)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
assert "exited 1" in str(exc_info.value)
assert "boom" in str(exc_info.value)
def test_run_e2e_orchestration_replay_timeout_fails_loud(
tmp_path: Path,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
def _timeout(*_args, **_kwargs):
raise subprocess.TimeoutExpired(cmd=["replay"], timeout=0.1)
runner = MagicMock(side_effect=_timeout)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=tmp_path / "out.jsonl",
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
max_seconds=0.1,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
assert "timed out" in str(exc_info.value)
def test_run_e2e_orchestration_replay_oserror_fails_loud(
tmp_path: Path,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
def _oserror(*_args, **_kwargs):
raise OSError("permission denied")
runner = MagicMock(side_effect=_oserror)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=tmp_path / "out.jsonl",
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
assert "permission denied" in str(exc_info.value)
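The three subprocess tests above rely on the orchestrator accepting an injectable `runner`. A minimal sketch of that seam follows; `run_replay` and `ReplayFailed` are hypothetical names, the keyword arguments passed to the runner are assumptions, and the message wording only loosely follows the substrings the tests check for.

```python
from __future__ import annotations

import subprocess
from collections.abc import Callable, Sequence


class ReplayFailed(RuntimeError):
    """Hypothetical stand-in for OrchestrationFailure(AIRBORNE_PIPELINE)."""


def run_replay(
    argv: Sequence[str],
    *,
    runner: Callable[..., subprocess.CompletedProcess[str]] = subprocess.run,
    max_seconds: float = 900.0,
) -> subprocess.CompletedProcess[str]:
    """Run the replay command and map every failure mode to one exception."""
    try:
        result = runner(
            list(argv),
            capture_output=True,
            text=True,
            timeout=max_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired as exc:
        raise ReplayFailed(f"replay timed out after {max_seconds} s") from exc
    except OSError as exc:
        raise ReplayFailed(f"replay could not start: {exc!r}") from exc
    if result.returncode != 0:
        raise ReplayFailed(f"replay exited {result.returncode}: {result.stderr}")
    return result


if __name__ == "__main__":
    import sys

    done = run_replay([sys.executable, "-c", "print('ok')"], max_seconds=30.0)
    print(done.stdout.strip())
```

Defaulting `runner` to `subprocess.run` means production callers pass nothing extra, while a test can substitute a `MagicMock` or a plain function with the same call shape.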
# ----------------------------------------------------------------------
# run_e2e_orchestration — empty / malformed JSONL (AC-5)
def test_run_e2e_orchestration_empty_jsonl_fails_loud(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
def _runner(argv, **_kwargs): # type: ignore[no-untyped-def]
output_path.write_text("\n\n") # only blanks
return subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
runner = MagicMock(side_effect=_runner)
_ground_truth_tlog_loader(monkeypatch)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.PARSE_EMISSIONS
def test_run_e2e_orchestration_malformed_jsonl_fails_loud(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
def _runner(argv, **_kwargs): # type: ignore[no-untyped-def]
output_path.write_text('{"valid": true}\nnot a json line\n')
return subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
runner = MagicMock(side_effect=_runner)
_ground_truth_tlog_loader(monkeypatch)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.PARSE_EMISSIONS
# ----------------------------------------------------------------------
# run_e2e_orchestration — ground truth loader failure (AC-5)
def test_run_e2e_orchestration_ground_truth_loader_failure_fails_loud(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
runner = _build_runner_emitting(
output_path,
rows=[
{
"emitted_at": int(0.5 * 1e9),
"position_wgs84": {
"lat_deg": 50.10,
"lon_deg": 36.10,
"alt_m": 100.0,
},
}
],
)
def _raise(*_args, **_kwargs):
raise ValueError("tlog corrupt")
monkeypatch.setattr(
"tests.e2e.replay._e2e_orchestrator.load_tlog_ground_truth",
_raise,
)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.LOAD_GROUND_TRUTH
assert "tlog corrupt" in str(exc_info.value)
# ----------------------------------------------------------------------
# run_e2e_orchestration — happy path (AC-1 / AC-2)
def test_run_e2e_orchestration_happy_path_writes_report(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
report_dir = tmp_path / "metrics"
effective_config_path = tmp_path / "eff.yaml"
rows = [
{
"emitted_at": int(0.5 * 1e9),
"position_wgs84": {"lat_deg": 50.10, "lon_deg": 36.10, "alt_m": 100.0},
},
{
"emitted_at": int(1.5 * 1e9),
"position_wgs84": {"lat_deg": 50.10, "lon_deg": 36.10, "alt_m": 100.0},
},
]
runner = _build_runner_emitting(output_path, rows=rows)
_ground_truth_tlog_loader(monkeypatch)
# Act
report = run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=report_dir,
effective_config_path=effective_config_path,
runner=runner,
run_date_utc="2026-05-23",
**inputs, # type: ignore[arg-type]
)
# Assert
assert isinstance(report, OrchestrationReport)
assert report.report_path.is_file()
assert report.emissions_count == 2
assert report.distribution.count == 2
assert report.verdict_passed is True
body = report.report_path.read_text()
assert "## Horizontal error (metres)" in body
assert "## Threshold-hit share" in body
assert f"| {AC3_GATE_THRESHOLD_M:g} |" in body
runner.assert_called_once()
argv_passed = runner.call_args.args[0]
assert str(effective_config_path) in argv_passed
assert "--auto-trim" in argv_passed
merged = yaml.safe_load(effective_config_path.read_text())
assert merged["c6_tile_cache"]["root_dir"] == str(cache.cache_root)
def test_run_e2e_orchestration_writes_report_even_on_fail_verdict(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange: emissions sit roughly 1.3 km from ground truth (0.01 deg offset on both axes), far above the 100 m gate.
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
report_dir = tmp_path / "metrics"
rows = [
{
"emitted_at": int(0.5 * 1e9),
"position_wgs84": {"lat_deg": 50.110, "lon_deg": 36.110, "alt_m": 100.0},
},
{
"emitted_at": int(1.5 * 1e9),
"position_wgs84": {"lat_deg": 50.110, "lon_deg": 36.110, "alt_m": 100.0},
},
]
runner = _build_runner_emitting(output_path, rows=rows)
_ground_truth_tlog_loader(monkeypatch)
# Act
report = run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=report_dir,
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
run_date_utc="2026-05-23",
**inputs, # type: ignore[arg-type]
)
# Assert — AC-2: report exists regardless of PASS/FAIL.
assert report.verdict_passed is False
assert report.report_path.is_file()
assert "FAIL" in report.report_path.read_text()
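As a sanity check on the FAIL-verdict fixture, the offset between the stubbed ground truth (50.10, 36.10) and the emitted position (50.110, 36.110) can be estimated with the haversine formula. The formula and the 6,371 km mean Earth radius are assumptions of this sketch; the project's accuracy helper may compute horizontal error differently.

```python
from __future__ import annotations

import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, assumed for this sketch


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2.0) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2
    )
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    offset_m = haversine_m(50.10, 36.10, 50.110, 36.110)
    print(f"offset is about {offset_m:.0f} m")
```

The result lands a little above 1.3 km, more than ten times the 100 m threshold, so the fixture exercises the FAIL branch with a wide margin.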