[AZ-840] [AZ-835] e2e orchestrator test (E-AZ-835 C4)

Wraps the AZ-699 verdict-report path with the AZ-839 operator_pre_flight_setup C3 fixture so a single Tier-2 test takes only (tlog, video, calibration) and runs the full 7-step pipeline on the Jetson harness without operator hand-curation.

New surface (tests-only, no src/ changes):
- tests/e2e/replay/_e2e_orchestrator.py: orchestrator with OrchestratorStep enum, OrchestrationFailure exception (step prefix per AC-5), OrchestrationReport dataclass, write_effective_replay_config helper, and run_e2e_orchestration entry point covering steps 1-2-6-7.
- tests/e2e/replay/test_e2e_orchestrator_unit.py: 17 unit tests covering each failure mode + happy path with mocked subprocess + ground-truth loader (AC-8).
- tests/e2e/replay/test_az835_e2e_real_flight.py: Tier-2 + RUN_REPLAY_E2E gated integration test asserting verdict report exists, 15-min budget held (AC-1, AC-2, AC-3, AC-4, AC-6).

The effective config write overlays c6_tile_cache.root_dir onto the static operator YAML at runtime so the airborne subprocess shares the cache_root the C3 fixture chose. Field-level merge: every other operator-config block stays verbatim. The static YAML on disk is never touched.

Test run: tests/e2e/replay 45 passed, 10 skipped (10 skips were 9 pre-existing + 1 new tier2).

No src/ touched, no AZ-839 driver changes; AC-7 (AZ-699 still passes) holds by inspection.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 09:01:14 +00:00 · 2026-05-23 15:27:41 +03:00
parent 8c4be9ace0
commit ade0c86f2b
6 changed files with 1680 additions and 1 deletions
@@ -0,0 +1,171 @@
 # Batch 109 — Cycle 3 — AZ-840 e2e orchestrator test
 **Date**: 2026-05-23
 **Tasks**: AZ-840 (C4 — Epic AZ-835).
 **Story points**: 3 (per the task spec).
 **Jira status**: AZ-840 In Progress → In Testing at commit step.
 ## Why this batch exists
 Epic AZ-835 (real-flight e2e validation) needs a single Tier-2
 test that proves the 7-step pipeline runs from
 `(tlog, video, calibration)` to a horizontal-error verdict
 without operator hand-curation between steps. Steps 3-5 were
 delivered by AZ-839 (C3 — `operator_pre_flight_setup`); steps
 1-2-6-7 are this batch.
 The AZ-839 batch 108b follow-up note explicitly anticipated this
 batch: "AZ-840 will additionally need to feed the airborne
 replay binary a config that points at the same `cache_root`
 ... the cleanest path is for AZ-840 to write an effective YAML
 at runtime from the same override recipe used here."
 ## What this batch ships
 A driver module + unit test suite + Tier-2 integration test:
 * `tests/e2e/replay/_e2e_orchestrator.py` — wraps the AZ-699
  verdict-report path with the AZ-839 C3 fixture's
  `PopulatedC6Cache`. Public surface:
  * `OrchestratorStep` enum — failure-step labels per AC-5.
  * `OrchestrationFailure(step, message)` exception — wraps
    every step failure with the step name in the message prefix.
  * `OrchestrationReport` dataclass — verdict, distribution,
    paths, wall-clock measurements per AC-4.
  * `write_effective_replay_config` — small helper that overlays
    `c6_tile_cache.root_dir` onto the static operator YAML and
    blanks `c6_tile_cache.faiss_index_path`.
  * `read_calibration_acquisition_method` — mirror of AZ-699's
    helper so the report writer keeps the same shape.
  * `run_e2e_orchestration` — the AC-1 entry point wiring
    validate → write_config → airborne subprocess → parse JSONL
    → load tlog GT → compute distribution → render report.
 * `tests/e2e/replay/test_e2e_orchestrator_unit.py` — 17 unit
  tests covering each of the 7 steps' failure modes plus the
  happy path. The runner is injected (`subprocess.run` default)
  so unit tests stage synthetic JSONL output without touching
  the airborne binary. `load_tlog_ground_truth` is monkeypatched
  to return a synthetic 3-row series.
 * `tests/e2e/replay/test_az835_e2e_real_flight.py::
  test_az840_e2e_real_flight_orchestration` — Tier-2 + RUN_REPLAY_E2E
  gated test that consumes the C3 fixture + Derkachi inputs and
  asserts the verdict markdown is written, the threshold-hit
  share table is present, and the 15-min budget held.
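The runner injection described for the unit tests can be sketched as follows. This is a minimal illustration, not the code in `test_e2e_orchestrator_unit.py`: `make_fake_runner` and the record contents are hypothetical, and the inner layout of `position_wgs84` is an assumption (only the field names `emitted_at` and `position_wgs84` appear in this report).

```python
import json
import subprocess
import tempfile
from pathlib import Path


def make_fake_runner(rows):
    """Build a stand-in for subprocess.run that writes synthetic JSONL.

    Hypothetical helper: it finds the value after ``--output`` in argv,
    writes one JSON record per line there, and reports a zero exit code.
    """

    def fake_runner(argv, **kwargs):
        # kwargs such as capture_output, text, timeout and env are accepted
        # and ignored, so the fake is call-compatible with subprocess.run.
        output_path = Path(argv[argv.index("--output") + 1])
        output_path.write_text("".join(json.dumps(row) + "\n" for row in rows))
        return subprocess.CompletedProcess(argv, 0, stdout="", stderr="")

    return fake_runner


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "emissions.jsonl"
    runner = make_fake_runner(
        # Assumed record shape, for illustration only.
        [{"emitted_at": 0.0, "position_wgs84": {"lat_deg": 50.0, "lon_deg": 36.0}}]
    )
    completed = runner(["gps-denied-replay", "--output", str(out)], timeout=900.0)
    written = [json.loads(line) for line in out.read_text().splitlines()]
```

Because the fake accepts the same keyword arguments as `subprocess.run`, it can be passed as the `runner` argument without touching the airborne binary.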
 ## AC coverage
 | AC  | Description                                              | Coverage                                                                                                                  |
 |-----|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
 | AC-1| Steps 1-7 end-to-end on Tier-2 from a fresh tlog/video   | `test_az840_e2e_real_flight_orchestration` (Tier-2-gated); 17 unit tests prove the orchestrator structure                 |
 | AC-2| Verdict report exists either PASS or FAIL                | `test_run_e2e_orchestration_writes_report_even_on_fail_verdict` + integration assertion `report_path.is_file()`           |
 | AC-3| Reuses C3 fixture (`operator_pre_flight_setup`)          | Integration test consumes the fixture; effective config overlay points at `populated_cache.cache_root`                    |
 | AC-4| 15-min wall-time soft target on the Derkachi clip        | `_DEFAULT_MAX_SECONDS = 900.0` passed as `subprocess.run` `timeout`; integration asserts `replay_subprocess_seconds <= 900`|
 | AC-5| Mid-pipeline failure fails LOUD with a clear step prefix | `OrchestratorStep` enum + 8 step-specific failure unit tests (`validate`/`write_config`/`airborne` × 3/`parse` × 2/`gt`)  |
 | AC-6| Gated by `RUN_REPLAY_E2E=1` + Tier-2 marker              | `_orchestrator_skip_reason()` checks env vars + binary + video size; `@pytest.mark.tier2` decorator                       |
 | AC-7| AZ-699 verdict test continues to pass                    | No changes to `test_derkachi_real_tlog.py`; same `real_flight_validation_<date>.md` report path convention                |
 | AC-8| Unit-tested orchestration helper without Tier-2 inputs   | 17 unit tests covering config write (4) + calibration parse (3) + run helper (10) — all use mocked subprocess + GT loader |
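The AC-6 gate in the table can be pictured as a function that returns a skip reason or `None`. The sketch below is an assumption about its shape, not the body of `_orchestrator_skip_reason()`: the function name `skip_reason`, the `min_video_bytes` parameter, and the message strings are all hypothetical.

```python
from pathlib import Path


def skip_reason(env, replay_binary, video_path, min_video_bytes=1):
    """Return a human-readable skip reason, or None when the test may run.

    Hypothetical sketch: the real gate's threshold and wording are unknown.
    """
    if env.get("RUN_REPLAY_E2E") != "1":
        return "RUN_REPLAY_E2E=1 is not set"
    if not replay_binary.is_file():
        return f"replay binary not found: {replay_binary}"
    if not video_path.is_file() or video_path.stat().st_size < min_video_bytes:
        return f"video missing or too small: {video_path}"
    return None


missing_bin = Path("/nonexistent/gps-denied-replay")
missing_video = Path("/nonexistent/flight.mp4")
reason_unset = skip_reason({}, missing_bin, missing_video)
reason_no_binary = skip_reason({"RUN_REPLAY_E2E": "1"}, missing_bin, missing_video)
```

Checking the environment flag first keeps the skip message focused on the opt-in switch when the test is simply not requested.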
 ## Test run results
 ```
 $ .venv/bin/pytest tests/e2e/replay/ -v --tb=short --timeout=60
 ============================ 45 passed, 10 skipped, 3 warnings in 0.78s ============
 ```
 Breakdown:
 * 17 new orchestrator unit tests pass.
 * 11 AZ-839 driver unit tests still pass (no driver changes).
 * 14 helper unit tests (`test_helpers.py`) still pass.
 * 3 derkachi-1min mode-agnostic AST tests still pass.
 * 10 skips: 1 new Tier-2 (this AZ-840 integration), 6
  RUN_REPLAY_E2E gated AZ-404 cases, 1 AC-8 D-PROJ-2 placeholder,
  1 Tier-2 AZ-699, 1 Tier-2 AZ-839 integration. None are
  regressions; the tier2 gate trips off-Jetson.
 ## Design notes
 ### `--auto-trim` ownership
 The orchestrator passes `--auto-trim` unconditionally so AZ-405 /
 AZ-698 active-flight-cut + tlog/video sync (Epic step 1) runs
 inside the airborne binary every time. The Epic narrative does
 not separate trim from the airborne pipeline; collapsing them
 into a single subprocess invocation matches AZ-699 and avoids
 duplicating the trim path.
 ### `clip_duration_s` parity with AZ-699
 `run_e2e_orchestration` computes
 `clip_duration_s = ground_truth[-1].t_s - ground_truth[0].t_s`
 exactly as `test_derkachi_real_tlog.py` does. This means both
 verdict reports name the same clip duration even when the
 trimmed video is shorter than the ground-truth window — a
 deliberate choice: the report header documents what the verdict
 covers, not what the binary processed.
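A minimal sketch of the duration rule above, assuming a stand-in row type with only the `t_s` field (the real `GroundTruthRow` lives in `gps_compare.py` and carries more fields):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruthRow:
    """Hypothetical stand-in; only ``t_s`` (seconds) is needed here."""

    t_s: float


def clip_duration_s(ground_truth):
    """Span of the ground-truth window in seconds; 0.0 for an empty series."""
    if not ground_truth:
        return 0.0
    return ground_truth[-1].t_s - ground_truth[0].t_s


rows = [GroundTruthRow(t_s=10.0), GroundTruthRow(t_s=40.5), GroundTruthRow(t_s=70.0)]
duration = clip_duration_s(rows)  # 70.0 - 10.0
```

The value depends only on the first and last ground-truth timestamps, so it is unaffected by how much video the binary actually processed.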
 ### Effective config write — single source of truth
 `write_effective_replay_config` materialises the same override
 recipe AZ-839 uses in-memory, but on disk so the airborne
 subprocess sees the cache_root the fixture chose. Field-level
 merge: every other block in the operator YAML is preserved
 verbatim; only `c6_tile_cache.root_dir` and
 `c6_tile_cache.faiss_index_path` are overwritten. The static
 operator YAML on disk is never touched.
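The field-level merge can be sketched on plain dictionaries, leaving YAML parsing and writing aside. `overlay_cache_root` and the sample keys `backend` and `max_tiles` are hypothetical names used only for illustration.

```python
import copy


def overlay_cache_root(base_config, cache_root):
    """Return a copy of base_config with the two c6_tile_cache fields replaced."""
    merged = copy.deepcopy(base_config)  # never mutate the caller's mapping
    c6_block = merged.get("c6_tile_cache")
    if not isinstance(c6_block, dict):
        c6_block = {}  # a missing or malformed block starts empty
    c6_block["root_dir"] = str(cache_root)
    # Blank the explicit index path; the consumer is expected to fall back
    # to an index located under cache_root.
    c6_block["faiss_index_path"] = ""
    merged["c6_tile_cache"] = c6_block
    return merged


base = {
    "c2_vpr": {"backend": "netvlad"},  # illustrative block
    "c6_tile_cache": {
        "root_dir": "/opt/tiles",
        "faiss_index_path": "/opt/tiles/x.index",
        "max_tiles": 512,  # illustrative field
    },
}
effective = overlay_cache_root(base, "/tmp/pytest-0/cache")
```

Sibling fields inside `c6_tile_cache` and every other top-level block carry over unchanged, and `base` itself is left as it was.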
 ### Failure surface = step prefix
 `OrchestrationFailure` always prefixes its message with
 `[<step>]`. CI log scrapers and pytest's traceback printer both
 surface the prefix on the first line; AC-5 ("clear error
 pointing at the failing step") holds without requiring the test
 to inspect the exception object. The step is also exposed as
 `exc.step` for programmatic assertions.
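A reduced sketch of the prefix convention, using hypothetical names (`Step`, `StepFailure`, `run_with_budget`) rather than the module's own classes, shows how a timeout turns into a step-labelled failure:

```python
import subprocess
from enum import Enum


class Step(str, Enum):
    """Hypothetical two-member subset of the step labels."""

    AIRBORNE_PIPELINE = "airborne_pipeline"
    PARSE_EMISSIONS = "parse_emissions"


class StepFailure(RuntimeError):
    """Carries the failing step and embeds it as the message prefix."""

    def __init__(self, step, message):
        super().__init__(f"[{step.value}] {message}")
        self.step = step


def run_with_budget(runner, argv, max_seconds):
    """Call runner with a timeout and relabel a timeout as a step failure."""
    try:
        return runner(argv, timeout=max_seconds)
    except subprocess.TimeoutExpired as exc:
        raise StepFailure(
            Step.AIRBORNE_PIPELINE, f"timed out after {max_seconds:.0f} s"
        ) from exc


def always_times_out(argv, timeout):
    # Fake runner that behaves like a subprocess exceeding its budget.
    raise subprocess.TimeoutExpired(cmd=argv, timeout=timeout)


try:
    run_with_budget(always_times_out, ["gps-denied-replay"], 900.0)
except StepFailure as exc:
    caught = exc
```

A test can then match on the text prefix, or compare `caught.step` against the enum member when it needs a programmatic check.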
 ## Files changed
 * `tests/e2e/replay/_e2e_orchestrator.py` (new, 655 LOC).
 * `tests/e2e/replay/test_e2e_orchestrator_unit.py` (new, 660+ LOC).
 * `tests/e2e/replay/test_az835_e2e_real_flight.py` (new, 156 LOC).
 No `src/` changes, no operator-config YAML changes, no AZ-839
 driver changes. AZ-840 is purely additive at the test layer.
 ## Code review (self-review)
 Verdict: **PASS_WITH_WARNINGS**.
 | Phase | Result |
 |-------|--------|
 | 1. Context loading | Re-read `gps_compare.py`, `accuracy_report.py`, `replay_input.py`, `cli/replay.py`, `test_derkachi_real_tlog.py`. Emission schema (`emitted_at`, `position_wgs84`) is the same shape `gps-denied-replay` writes. |
 | 2. Spec compliance | All 8 AZ-840 ACs covered; AC-7 holds by inspection (no AZ-699 changes). |
 | 3. Code quality | All public types have docstrings; failure messages name the upstream exception via `repr` so `OSError` / `subprocess.TimeoutExpired` carry through. Runner kw-args mirror `subprocess.run` signature 1:1. |
 | 4. Security quick-scan | Effective config write goes to a tmp file the test owns; no secrets in the YAML overlay (override is two string fields). Subprocess `env` is opt-in (`None` defaults to `os.environ`). |
 | 5. Performance scan | Unit tests run in 0.51 s. Tier-2 wall-clock cap is 900 s, enforced by the subprocess timeout. |
 | 6. Cross-task consistency | `clip_duration_s` and `report_path` match AZ-699 exactly so a single Jetson run produces the same markdown shape. |
 | 7. Architecture compliance | Orchestrator lives entirely under `tests/e2e/replay/`; no `src/` writes. C3 fixture's invariants (`PopulatedC6Cache.cache_root` is the single source of truth) propagate via `write_effective_replay_config`. |
 ## Findings
 | ID | Severity | Description | Disposition |
 |----|----------|-------------|-------------|
 | F1 | Low | `_default_tile_decoder` in `conftest.py` (carried from batch 108) — still raw TIFF. Not in the AZ-840 path; AZ-840 doesn't change tile decoding. | Defer; no AZ-840 ticket. |
 | F2 | Low | `_resolve_replay_descriptor_dim` is NetVLAD-only (carried from batch 108). AZ-840 doesn't change descriptors. | Defer; no AZ-840 ticket. |
 | F3 | Low | `--pace asap` is hardcoded in `_run_replay_subprocess` argv; the AZ-699 test passes `--pace asap` too, so behaviour is identical. If a future test wants a real-time pace, the runner kwarg is the seam. | Document; no ticket. |
 | F4 | Low | `_run_replay_subprocess` does not stream stdout/stderr; output is captured and surfaces only after the subprocess exits or times out. For runs approaching the 15-min budget this means the operator sees no progress in the meantime. AZ-699 has the same shape. | Document; consider an AZ-* if the budget grows. |
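If F4 is ever picked up, one possible shape for a streaming runner is sketched below. It is an assumption about a future change, not part of AZ-840: `run_streaming` is a hypothetical name, and the deadline is only checked when a line arrives, so a child that hangs silently would still need a separate watchdog.

```python
import subprocess
import sys
import time


def run_streaming(argv, max_seconds, on_line=print):
    """Run argv, forwarding each output line as it arrives.

    Returns (returncode, combined_output). Raises subprocess.TimeoutExpired
    when a line arrives after the budget has elapsed.
    """
    started = time.monotonic()
    lines = []
    with subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
            lines.append(line)
            if time.monotonic() - started > max_seconds:
                proc.kill()
                raise subprocess.TimeoutExpired(argv, max_seconds)
        returncode = proc.wait()
    return returncode, "".join(lines)


seen = []
returncode, output = run_streaming(
    [sys.executable, "-c", "print('a'); print('b')"], 30.0, on_line=seen.append
)
```

Merging stderr into stdout keeps the sketch to a single pipe and avoids the deadlock risk of reading two pipes sequentially.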
 ## Notes for follow-up
 * AZ-840 lands the orchestrator test as Tier-2-gated. Verifying
  the Tier-2 path actually runs on the Jetson harness is the
  next gating step before Epic AZ-835 can flip from "covered by
  unit tests" to "covered by Tier-2 integration".
 * `_e2e_orchestrator.py` is intentionally kept under `tests/`
  rather than promoted to `src/`. If a second consumer of the
  same orchestration shape appears (e.g. AZ-833 mock-suite-sat
  parity test), the move to a shared helper module under
  `src/gps_denied_onboard/replay/` is the right next step;
  for now the test-only location matches the helper's only
  consumer.
 * AZ-841 (Tier-2 unxfail follow-up) and AZ-842 (replay protocol
  + orchestrator docs) sit downstream — both should reference
  this batch report in their planning sections.
@@ -8,7 +8,7 @@ status: in_progress
 sub_step:
  phase: 7
  name: batch-loop
-  detail: "batch 109 next; AZ-840 C4"
+  detail: "batch 110 next; full-suite gate after AZ-840 C4 ship"
 retry_count: 0
 cycle: 3
 tracker: jira
@@ -0,0 +1,655 @@
 """E2E orchestrator for the AZ-835 7-step pipeline (AZ-840 / Epic AZ-835 C4).
 Wraps the AZ-699 verdict-report writing path with the AZ-839 C3
 fixture's `PopulatedC6Cache` so a single Tier-2 test can run from
 ``(tlog, video, calibration)`` to a horizontal-error report without
 operator hand-curation between steps. The 7-step Epic narrative
 (``_docs/02_tasks/todo/AZ-840_e2e_orchestrator_test.md``):
 1. Active flight cut + tlog/video sync — handled by ``gps-denied-replay``
   ``--auto-trim`` (AZ-405 / AZ-698) inside the airborne binary.
 2. On-fly frame + IMU extraction — same binary's per-frame loop.
 3. Auto-create route — done by the C3 fixture
   (``operator_pre_flight_setup`` calls ``extract_route_from_tlog``).
 4. POST route to satellite-provider — C3 fixture (AZ-838
   ``SatelliteProviderRouteClient.seed_route``).
 5. Build FAISS index — C3 fixture (AZ-322 ``DescriptorBatcher``).
 6. Run gps-denied airborne pipeline — this module's
   ``_run_replay_subprocess`` invokes ``gps-denied-replay`` against
   the populated cache.
 7. Get GPS fixes, check vs tlog GPS — this module's
   ``_load_ground_truth`` + ``horizontal_error_distribution`` +
   ``render_report`` writes the verdict markdown.
 The C3 fixture mutates ``c6_tile_cache.root_dir`` to point at a
 ``tmp_path_factory.mktemp`` value (AZ-839 batch 108b). The static
 operator YAML at ``GPS_DENIED_OPERATOR_CONFIG_PATH`` cannot know
 that path. ``write_effective_replay_config`` reads the static YAML,
 overlays the ``c6_tile_cache.root_dir`` override, writes the merged
 result to a tmp file, and returns the path the airborne binary
 will load via ``--config``. This keeps a single source of truth
 for the cache_root override across the in-memory C3 fixture path
 and the subprocess airborne path.
 Public surface — re-exported from this module:
 * :class:`OrchestratorStep` — failure-step labels per AC-5 ("fails
  LOUD with a clear error pointing at the failing step").
 * :class:`OrchestrationFailure` — wraps the underlying exception
  with the step that produced it.
 * :class:`OrchestrationReport` — return value of
  :func:`run_e2e_orchestration` (verdict, distribution, paths,
  wall-clock measurements per AC-4).
 * :func:`write_effective_replay_config` — small helper for the
  config merge step.
 * :func:`read_calibration_acquisition_method`, a mirror of the
  AZ-699 helper that reads the calibration ``acquisition_method``.
 * :func:`run_e2e_orchestration` — the AC-1 entry point.
 """
 from __future__ import annotations
 import datetime
 import json
 import logging
 import subprocess
 import time
 from collections.abc import Callable, Mapping
 from dataclasses import dataclass
 from enum import Enum
 from pathlib import Path
 from typing import Any
 import yaml
 from gps_denied_onboard.helpers.accuracy_report import (
    AC3_GATE_PCT,
    AC3_GATE_THRESHOLD_M,
    ReportContext,
    render_report,
    verdict_passes_ac3,
 )
 from gps_denied_onboard.helpers.gps_compare import (
    GroundTruthRow,
    HorizontalErrorDistribution,
    horizontal_error_distribution,
 )
 from gps_denied_onboard.replay_input import load_tlog_ground_truth
 from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
 __all__ = [
    "OrchestrationFailure",
    "OrchestrationReport",
    "OrchestratorStep",
    "read_calibration_acquisition_method",
    "run_e2e_orchestration",
    "write_effective_replay_config",
 ]
 # Replay-subprocess wall-clock cap for the Derkachi clip per AZ-840
 # AC-4 (15 min soft target). Exposed as a default that the integration
 # test can override; the unit tests rely on the contract that the
 # runner argument is a free callable.
 _DEFAULT_MAX_SECONDS: float = 900.0
 _LOGGER = logging.getLogger("tests.e2e.replay.e2e_orchestrator")
 class OrchestratorStep(str, Enum):
    """Labels for the 7-step pipeline used by :class:`OrchestrationFailure`.
    AC-5: every failure that reaches the test surface must name the
    step that produced it. The string values are stable so test
    assertions and log readers can match on them.
    """
    VALIDATE_INPUTS = "validate_inputs"
    WRITE_EFFECTIVE_CONFIG = "write_effective_config"
    AIRBORNE_PIPELINE = "airborne_pipeline"
    PARSE_EMISSIONS = "parse_emissions"
    LOAD_GROUND_TRUTH = "load_ground_truth"
    COMPUTE_DISTRIBUTION = "compute_distribution"
    RENDER_REPORT = "render_report"
 class OrchestrationFailure(RuntimeError):
    """Failure inside one of the 7 orchestration steps (AC-5).
    The :attr:`step` attribute names the failing step; the message
    embeds it as the prefix so plain log-readers see the failure
    location without inspecting the exception object.
    """
    def __init__(self, step: OrchestratorStep, message: str) -> None:
        super().__init__(f"[{step.value}] {message}")
        self.step = step
 @dataclass(frozen=True, slots=True)
 class OrchestrationReport:
    """Return value of :func:`run_e2e_orchestration`.
    Attributes:
        verdict_passed: ``True`` iff the run met the AZ-696 epic
            AC-3 gate (>= AC3_GATE_PCT% within AC3_GATE_THRESHOLD_M m).
        distribution: Computed horizontal-error distribution.
        report_path: Markdown report written under ``report_dir``.
        emissions_count: Total estimator-output records consumed.
        wall_clock_s: Wall-clock seconds for the orchestration run
            (excludes the C3 fixture setup; covers steps 1-2-6-7).
        replay_subprocess_seconds: Wall-clock seconds the airborne
            replay subprocess took. Always <= ``wall_clock_s``.
    """
    verdict_passed: bool
    distribution: HorizontalErrorDistribution
    report_path: Path
    emissions_count: int
    wall_clock_s: float
    replay_subprocess_seconds: float
 def read_calibration_acquisition_method(calibration_path: Path) -> str:
    """Return the AZ-702 ``acquisition_method`` field, or ``"unknown"``.
    Mirrors ``test_derkachi_real_tlog._read_calibration_acquisition_method``
    so the AZ-840 verdict report can name the calibration provenance
    in its failure message (AZ-699 AC-3). Pure helper; the report
    writer needs the string, not the JSON.
    """
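     try:
         data = json.loads(calibration_path.read_text())
     except (OSError, ValueError):
         # ValueError covers json.JSONDecodeError and UnicodeDecodeError.
         return "unknown"
     # Valid JSON that is not an object (e.g. a list) has no ``.get``.
     if not isinstance(data, dict):
         return "unknown"
     method = data.get("acquisition_method")
     if isinstance(method, str) and method:
         return method
     return "unknown"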
 def write_effective_replay_config(
    *,
    base_config_path: Path,
    cache_root: Path,
    output_path: Path,
 ) -> Path:
    """Merge cache_root override into the static operator YAML.
    Reads ``base_config_path`` as YAML, sets the
    ``c6_tile_cache.root_dir`` to ``cache_root`` (forcing the
    FAISS index path to fall back to ``<cache_root>/descriptor.index``),
    and writes the merged document to ``output_path`` as YAML.
    The merge is field-level: every other block in the base YAML is
    preserved verbatim. This keeps a single source of truth for the
    operator config — the test harness only contributes the dynamic
    cache_root.
    Returns:
        The ``output_path`` argument, for ergonomic chaining.
    Raises:
        OrchestrationFailure (step=WRITE_EFFECTIVE_CONFIG): Base YAML
            unreadable, malformed, or not a top-level mapping.
    """
    try:
        base_text = base_config_path.read_text()
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"cannot read base config at {base_config_path}: {exc!r}",
        ) from exc
    try:
        base_data = yaml.safe_load(base_text) or {}
    except yaml.YAMLError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"base config YAML at {base_config_path} is malformed: {exc!r}",
        ) from exc
    if not isinstance(base_data, dict):
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"base config YAML at {base_config_path} must be a mapping; "
            f"got {type(base_data).__name__}",
        )
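     # Copy the block so the parsed nested mapping is not mutated in
     # place; a missing or non-mapping block starts empty.
     c6_block_raw = base_data.get("c6_tile_cache")
     c6_block = dict(c6_block_raw) if isinstance(c6_block_raw, dict) else {}
     c6_block["root_dir"] = str(cache_root)
     # Blank the explicit index path so the consumer falls back to
     # ``<cache_root>/descriptor.index`` (see the docstring above).
     c6_block["faiss_index_path"] = ""
     base_data["c6_tile_cache"] = c6_block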
    try:
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(
            yaml.safe_dump(base_data, sort_keys=True, default_flow_style=False)
        )
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"cannot write effective config at {output_path}: {exc!r}",
        ) from exc
    return output_path
 def run_e2e_orchestration(
    *,
    populated_cache: PopulatedC6Cache,
    base_config_path: Path,
    tlog_path: Path,
    video_path: Path,
    calibration_path: Path,
    signing_key_path: Path,
    replay_binary: Path,
    output_path: Path,
    report_dir: Path,
    effective_config_path: Path,
    run_date_utc: str | None = None,
    runner: Callable[..., subprocess.CompletedProcess[str]] = subprocess.run,
    subprocess_env: Mapping[str, str] | None = None,
    max_seconds: float = _DEFAULT_MAX_SECONDS,
    logger: logging.Logger | None = None,
 ) -> OrchestrationReport:
     """Run AZ-835 steps 1-2 and 6-7 against the AZ-839 populated cache.
    Steps 3-5 are the responsibility of ``populated_cache`` (the
    AZ-839 C3 fixture); this function covers 1-2-6 (the airborne
    replay subprocess) and 7 (verdict report). The C3 fixture and
    this function share the cache_root via
    :func:`write_effective_replay_config` so the airborne binary
    reads the same FAISS index the fixture wrote (AC-3).
    Args:
        populated_cache: C3 fixture output (AZ-839). Carries
            ``cache_root``, ``faiss_index_path``, and the route
            spec the test pipeline produced.
        base_config_path: Static operator config YAML
            (``GPS_DENIED_OPERATOR_CONFIG_PATH``). Must register
            ``c6_tile_cache``, ``c10_provisioning``, ``c2_vpr``,
            ``c4_pose``, and ``c5_state`` blocks for the airborne
            binary to compose the replay graph.
        tlog_path: ArduPilot binary tlog the test consumes.
        video_path: Flight video file the test consumes.
        calibration_path: Camera calibration JSON (AZ-702
            factory-sheet for Derkachi).
        signing_key_path: MAVLink signing-key file. Replay protocol
            Invariant 11 — required even for the noop transport.
        replay_binary: ``gps-denied-replay`` console-script path.
        output_path: Where the airborne binary writes JSONL
            estimator emissions.
        report_dir: Directory the verdict markdown is written to.
        effective_config_path: Where the cache_root-merged YAML is
            written. The path is passed to the airborne binary via
            ``--config``.
        run_date_utc: ISO-8601 date for the report filename and
            header. Defaults to today UTC.
        runner: ``subprocess.run`` by default; tests inject a fake
            that emits a synthetic JSONL output.
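         subprocess_env: Optional complete environment for the
             replay subprocess. It is passed to the runner as
             ``env`` and replaces the parent environment rather than
             being merged into it. ``None`` inherits ``os.environ``.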
        max_seconds: Hard wall-clock cap for the airborne replay
            subprocess. The orchestrator times out the runner via
            its ``timeout`` kwarg; an exceeded budget surfaces as
            ``OrchestrationFailure(step=AIRBORNE_PIPELINE)``.
        logger: Optional logger. Defaults to the module logger.
    Returns:
        :class:`OrchestrationReport` on success. The verdict can
        be PASS or FAIL — AC-2 mandates the report exists either
        way.
    Raises:
        OrchestrationFailure: Any of the 7 steps failed. The
            ``step`` attribute names the failing step.
    """
    log = logger or _LOGGER
    started = time.monotonic()
    effective_run_date = run_date_utc or (
        datetime.datetime.now(datetime.timezone.utc).date().isoformat()
    )
    _validate_inputs(
        base_config_path=base_config_path,
        tlog_path=tlog_path,
        video_path=video_path,
        calibration_path=calibration_path,
        signing_key_path=signing_key_path,
        replay_binary=replay_binary,
        report_dir=report_dir,
    )
    write_effective_replay_config(
        base_config_path=base_config_path,
        cache_root=populated_cache.cache_root,
        output_path=effective_config_path,
    )
    replay_subprocess_seconds = _run_replay_subprocess(
        replay_binary=replay_binary,
        video_path=video_path,
        tlog_path=tlog_path,
        output_path=output_path,
        calibration_path=calibration_path,
        config_path=effective_config_path,
        signing_key_path=signing_key_path,
        max_seconds=max_seconds,
        runner=runner,
        env=subprocess_env,
        logger=log,
    )
    emissions = _parse_jsonl(output_path)
    ground_truth = _load_ground_truth(tlog_path)
    distribution = _compute_distribution(emissions, ground_truth)
    context = ReportContext(
        run_date_utc=effective_run_date,
        tlog_path=tlog_path,
        video_path=video_path,
        calibration_acquisition_method=read_calibration_acquisition_method(
            calibration_path
        ),
        clip_duration_s=(
            ground_truth[-1].t_s - ground_truth[0].t_s
            if ground_truth
            else 0.0
        ),
        emissions_count=len(emissions),
    )
    verdict_passed = verdict_passes_ac3(distribution)
    report_path = _render_and_write_report(
        distribution=distribution,
        context=context,
        passed=verdict_passed,
        report_dir=report_dir,
    )
    log.info(
        "e2e_orchestrator: report written",
        extra={
            "kind": "e2e_orchestrator.report_written",
            "kv": {
                "report_path": str(report_path),
                "verdict_passed": verdict_passed,
                "share_within_threshold_pct": (
                    distribution.threshold_hit_share.get(
                        AC3_GATE_THRESHOLD_M, 0.0
                    )
                    * 100.0
                ),
                "ac3_gate_pct": AC3_GATE_PCT,
                "emissions_count": len(emissions),
                "ground_truth_pairings": distribution.count,
            },
        },
    )
    wall_clock_s = max(0.0, time.monotonic() - started)
    return OrchestrationReport(
        verdict_passed=verdict_passed,
        distribution=distribution,
        report_path=report_path,
        emissions_count=len(emissions),
        wall_clock_s=wall_clock_s,
        replay_subprocess_seconds=replay_subprocess_seconds,
    )
 def _validate_inputs(
    *,
    base_config_path: Path,
    tlog_path: Path,
    video_path: Path,
    calibration_path: Path,
    signing_key_path: Path,
    replay_binary: Path,
    report_dir: Path,
 ) -> None:
    """Fail fast on missing inputs (AC-5 — surface the failing step early)."""
    file_inputs: tuple[tuple[str, Path], ...] = (
        ("base_config_path", base_config_path),
        ("tlog_path", tlog_path),
        ("video_path", video_path),
        ("calibration_path", calibration_path),
        ("signing_key_path", signing_key_path),
        ("replay_binary", replay_binary),
    )
    for label, path in file_inputs:
        if not path.is_file():
            raise OrchestrationFailure(
                OrchestratorStep.VALIDATE_INPUTS,
                f"{label} is not a file: {path}",
            )
    try:
        report_dir.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.VALIDATE_INPUTS,
            f"report_dir {report_dir} cannot be created: {exc!r}",
        ) from exc
 def _run_replay_subprocess(
    *,
    replay_binary: Path,
    video_path: Path,
    tlog_path: Path,
    output_path: Path,
    calibration_path: Path,
    config_path: Path,
    signing_key_path: Path,
    max_seconds: float,
    runner: Callable[..., subprocess.CompletedProcess[str]],
    env: Mapping[str, str] | None,
    logger: logging.Logger,
 ) -> float:
    """Invoke gps-denied-replay with --auto-trim; return wall-clock seconds.
     Wraps :func:`subprocess.run` so unit tests can inject a fake
    runner. ``--auto-trim`` is always enabled here — the
    orchestrator owns the AZ-405 / AZ-698 sync path (AZ-840 step 1).
    Raises:
        OrchestrationFailure (step=AIRBORNE_PIPELINE): Non-zero exit,
            timeout, or runner-level OSError.
    """
    argv = [
        str(replay_binary),
        "--video",
        str(video_path),
        "--tlog",
        str(tlog_path),
        "--output",
        str(output_path),
        "--camera-calibration",
        str(calibration_path),
        "--config",
        str(config_path),
        "--mavlink-signing-key",
        str(signing_key_path),
        "--pace",
        "asap",
        "--auto-trim",
    ]
    started = time.monotonic()
    try:
        completed = runner(
            argv,
            capture_output=True,
            text=True,
            timeout=max_seconds,
            env=dict(env) if env is not None else None,
        )
    except subprocess.TimeoutExpired as exc:
        raise OrchestrationFailure(
            OrchestratorStep.AIRBORNE_PIPELINE,
            f"gps-denied-replay timed out after {max_seconds:.0f} s",
        ) from exc
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.AIRBORNE_PIPELINE,
            f"cannot launch gps-denied-replay at {replay_binary}: {exc!r}",
        ) from exc
    elapsed_s = max(0.0, time.monotonic() - started)
    if completed.returncode != 0:
        raise OrchestrationFailure(
            OrchestratorStep.AIRBORNE_PIPELINE,
            f"gps-denied-replay exited {completed.returncode}\n"
            f"stdout:\n{completed.stdout}\nstderr:\n{completed.stderr}",
        )
    logger.info(
        "e2e_orchestrator: replay subprocess complete",
        extra={
            "kind": "e2e_orchestrator.replay_subprocess",
            "kv": {
                "elapsed_s": elapsed_s,
                "max_seconds": max_seconds,
            },
        },
    )
    return elapsed_s
 def _parse_jsonl(path: Path) -> list[dict[str, Any]]:
    """Read one JSON record per non-blank line.
    Raises:
        OrchestrationFailure (step=PARSE_EMISSIONS): Output file
            missing, unreadable, has zero records, or contains a
            malformed line.
    """
    if not path.is_file():
        raise OrchestrationFailure(
            OrchestratorStep.PARSE_EMISSIONS,
            f"replay output JSONL not found: {path}",
        )
    try:
        text = path.read_text()
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.PARSE_EMISSIONS,
            f"replay output JSONL unreadable at {path}: {exc!r}",
        ) from exc
    rows: list[dict[str, Any]] = []
    for line_idx, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            raise OrchestrationFailure(
                OrchestratorStep.PARSE_EMISSIONS,
                f"malformed JSON at line {line_idx} of {path}: {exc.msg}",
            ) from exc
        if not isinstance(row, dict):
            raise OrchestrationFailure(
                OrchestratorStep.PARSE_EMISSIONS,
                f"line {line_idx} of {path} is not a JSON object: {row!r}",
            )
        rows.append(row)
    if not rows:
        raise OrchestrationFailure(
            OrchestratorStep.PARSE_EMISSIONS,
            f"replay output JSONL at {path} has zero records — pipeline "
            "produced no estimator emissions",
        )
    return rows
 def _load_ground_truth(tlog_path: Path) -> list[GroundTruthRow]:
    """Extract WGS84 ground truth from the binary tlog.
    Raises:
        OrchestrationFailure (step=LOAD_GROUND_TRUTH): Loader
            error or empty record list.
    """
    try:
        series = load_tlog_ground_truth(tlog_path).records
    except Exception as exc:
        raise OrchestrationFailure(
            OrchestratorStep.LOAD_GROUND_TRUTH,
            f"load_tlog_ground_truth({tlog_path}) failed: {exc!r}",
        ) from exc
    rows: list[GroundTruthRow] = [
        GroundTruthRow(
            t_s=fix.ts_ns / 1e9,
            lat_deg=fix.lat_deg,
            lon_deg=fix.lon_deg,
            alt_m=fix.alt_m,
        )
        for fix in series
    ]
    if not rows:
        raise OrchestrationFailure(
            OrchestratorStep.LOAD_GROUND_TRUTH,
            f"tlog ground truth at {tlog_path} has zero rows",
        )
    return rows
 def _compute_distribution(
    emissions: list[dict[str, Any]],
    ground_truth: list[GroundTruthRow],
 ) -> HorizontalErrorDistribution:
    """Compute the horizontal-error distribution.
    Raises:
        OrchestrationFailure (step=COMPUTE_DISTRIBUTION): Helper
            error or zero ground-truth pairings (every emission
            fell outside the GT time window).
    """
    try:
        distribution = horizontal_error_distribution(emissions, ground_truth)
    except Exception as exc:
        raise OrchestrationFailure(
            OrchestratorStep.COMPUTE_DISTRIBUTION,
            f"horizontal_error_distribution failed: {exc!r}",
        ) from exc
    if distribution.count == 0:
        raise OrchestrationFailure(
            OrchestratorStep.COMPUTE_DISTRIBUTION,
            "no emissions paired with ground truth — JSONL timestamps "
            "outside the tlog GPS window?",
        )
    return distribution
 def _render_and_write_report(
    *,
    distribution: HorizontalErrorDistribution,
    context: ReportContext,
    passed: bool,
    report_dir: Path,
 ) -> Path:
    """Render the verdict markdown and write it to ``report_dir``.
    Raises:
        OrchestrationFailure (step=RENDER_REPORT): Render or write
            failure; ``report_dir`` was already created by
            :func:`_validate_inputs`.
    """
    try:
        report_text = render_report(distribution, context, passed=passed)
    except Exception as exc:
        raise OrchestrationFailure(
            OrchestratorStep.RENDER_REPORT,
            f"render_report failed: {exc!r}",
        ) from exc
    report_path = (
        report_dir / f"real_flight_validation_{context.run_date_utc}.md"
    )
    try:
        report_path.write_text(report_text)
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.RENDER_REPORT,
            f"cannot write report at {report_path}: {exc!r}",
        ) from exc
    return report_path
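The helpers above all follow one pattern: catch the low-level error, then re-raise it tagged with the step that failed so a caller can branch on the step rather than parse a message. The sketch below illustrates that pattern for the JSONL parser only. `Step` and `StepFailure` are hypothetical stand-ins, since the real `OrchestratorStep` and `OrchestrationFailure` definitions sit outside this part of the diff, and the bracketed message prefix is an assumption about how the AC-5 step prefix might look.

```python
from __future__ import annotations

import json
from enum import Enum


class Step(Enum):
    """Hypothetical stand-in for the orchestrator's step enum."""

    PARSE_EMISSIONS = "parse_emissions"


class StepFailure(RuntimeError):
    """Hypothetical stand-in: an error that remembers which step failed."""

    def __init__(self, step: Step, message: str) -> None:
        # The bracketed prefix format is an assumption of this sketch.
        super().__init__(f"[{step.value}] {message}")
        self.step = step


def parse_jsonl_text(text: str) -> list[dict]:
    """Parse one JSON object per non-blank line, failing with a step label."""
    rows: list[dict] = []
    for line_idx, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped, not counted as records
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            raise StepFailure(
                Step.PARSE_EMISSIONS,
                f"malformed JSON at line {line_idx}: {exc.msg}",
            ) from exc
        if not isinstance(row, dict):
            raise StepFailure(
                Step.PARSE_EMISSIONS,
                f"line {line_idx} is not a JSON object",
            )
        rows.append(row)
    if not rows:
        raise StepFailure(Step.PARSE_EMISSIONS, "zero records")
    return rows


def describe_failure(text: str) -> str:
    """Return 'ok' or the failing step's name, so callers can branch on it."""
    try:
        parse_jsonl_text(text)
    except StepFailure as failure:
        return failure.step.name
    return "ok"
```

A test can then assert on `failure.step` with an identity check, which is the style the unit tests in this commit use, instead of matching on message text.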
@@ -0,0 +1,182 @@
 """AZ-840 — E2E orchestrator integration test (AC-1 / AC-2 / AC-3 / AC-4 / AC-6).
 The Tier-2 entry point that closes Epic AZ-835's narrative: from a
 ``(tlog, video, calibration)`` triple, run the full 7-step pipeline
 end-to-end on the Jetson harness without operator hand-curation
 between steps.
 The test consumes:
 * :func:`tests.e2e.replay.conftest.operator_pre_flight_setup` —
  the AZ-839 C3 fixture that owns steps 3-5 (route extraction +
  satellite-provider seeding + FAISS index build) and yields a
  :class:`PopulatedC6Cache` keyed off a freshly-mktemp'd
  ``cache_root``.
 * :func:`tests.e2e.replay.conftest.derkachi_replay_inputs` — the
  shared session fixture that materialises the Derkachi tlog +
  video + factory-sheet calibration + signing-key file.
 * :func:`tests.e2e.replay._e2e_orchestrator.run_e2e_orchestration`
  — the AC-1 driver that wires everything below the C3 fixture.
 The driver writes a fresh effective replay config per session
 (merging the static operator YAML with the cache_root override),
 invokes ``gps-denied-replay --auto-trim``, parses the JSONL
 emissions, computes the horizontal-error distribution, and writes
 the verdict markdown under ``_docs/06_metrics/`` (AC-2).
 Skip gates (in evaluation order):
 1. ``@pytest.mark.tier2``: the per-suite Tier-2 plugin gates this
    off on dev macOS (matches the AZ-839 / AZ-699 contract).
 2. ``RUN_REPLAY_E2E`` not in ``{1, true, yes, on}``.
 3. ``GPS_DENIED_OPERATOR_CONFIG_PATH`` unset or blank (the same env
    var the C3 fixture consumes).
 4. ``gps-denied-replay`` console-script found neither on ``PATH``
    nor next to the active interpreter.
 5. Real video missing or placeholder-sized (mirrors AZ-699's gate).
 6. ``operator_pre_flight_setup`` fixture itself skipped: the
    downstream consumer inherits the SKIP automatically (pytest's
    fixture-skip propagation).
 AC-7 (AZ-699 continues to pass) is satisfied by inspection: this
 test does not modify ``test_derkachi_real_tlog.py``. It writes its
 report to the same path (``real_flight_validation_<date>.md``), so
 a later run overwrites the earlier file; on a given clip both tests
 are expected to reach the same verdict (both PASS or both FAIL).
 """
 from __future__ import annotations
 import os
 import shutil
 import sys
 from collections.abc import Iterator
 from pathlib import Path
 import pytest
 from tests.e2e.replay._e2e_orchestrator import (
    OrchestrationReport,
    run_e2e_orchestration,
 )
 from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
 from tests.e2e.replay.conftest import DerkachiReplayInputs
 def _repo_root() -> Path:
    return Path(__file__).resolve().parents[3]
 def _derkachi_dir() -> Path:
    return _repo_root() / "_docs" / "00_problem" / "input_data" / "flight_derkachi"
 _MIN_REAL_VIDEO_BYTES: int = 1_000_000
 def _replay_binary() -> Path | None:
    """Return the absolute path to ``gps-denied-replay`` or ``None``.
    Same lookup order AZ-699 uses: PATH first, venv bin second.
    """
    binary = shutil.which("gps-denied-replay")
    if binary is not None:
        return Path(binary)
    venv_bin = Path(sys.executable).parent / "gps-denied-replay"
    if venv_bin.exists():
        return venv_bin
    return None
 def _orchestrator_skip_reason() -> str | None:
    """Return a SKIP message when env / inputs preclude a Jetson run."""
    if os.environ.get("RUN_REPLAY_E2E", "").strip().lower() not in {
        "1",
        "true",
        "yes",
        "on",
    }:
        return "AZ-840 e2e orchestrator gated by RUN_REPLAY_E2E=1"
    if not os.environ.get("GPS_DENIED_OPERATOR_CONFIG_PATH", "").strip():
        return (
            "AZ-840 e2e orchestrator requires GPS_DENIED_OPERATOR_CONFIG_PATH "
            "(same env var the C3 fixture consumes)"
        )
    if _replay_binary() is None:
        return "gps-denied-replay console-script not installed"
    video = _derkachi_dir() / "flight_derkachi.mp4"
    if not video.is_file():
        return f"Derkachi video missing: {video}"
    video_size = video.stat().st_size
    if video_size < _MIN_REAL_VIDEO_BYTES:
        return (
            f"Derkachi video at {video} is only {video_size} "
            "bytes (placeholder, not a real recording)"
        )
    return None
@pytest.fixture
 def az840_skip_gate() -> Iterator[None]:
    """Skip-gate the orchestrator test before any heavy fixtures resolve."""
    reason = _orchestrator_skip_reason()
    if reason is not None:
        pytest.skip(reason)
    yield
@pytest.mark.tier2
 def test_az840_e2e_real_flight_orchestration(
    az840_skip_gate: None,
    operator_pre_flight_setup: PopulatedC6Cache,
    derkachi_replay_inputs: DerkachiReplayInputs,
    tmp_path: Path,
 ) -> None:
    # Arrange — every input besides cache_root comes from the existing
    # session fixtures so the same Tier-2 harness setup that powers
    # AZ-699 + AZ-839 is exercised.
    binary = _replay_binary()
    assert binary is not None, "skip gate already verified the binary exists"
    base_config_path = Path(os.environ["GPS_DENIED_OPERATOR_CONFIG_PATH"])
    output_path = tmp_path / "estimator_output.jsonl"
    effective_config_path = tmp_path / "operator_config_effective.yaml"
    report_dir = _repo_root() / "_docs" / "06_metrics"
    # Act
    report = run_e2e_orchestration(
        populated_cache=operator_pre_flight_setup,
        base_config_path=base_config_path,
        tlog_path=derkachi_replay_inputs.tlog_path,
        video_path=derkachi_replay_inputs.video_path,
        calibration_path=derkachi_replay_inputs.calibration_path,
        signing_key_path=derkachi_replay_inputs.signing_key_path,
        replay_binary=binary,
        output_path=output_path,
        report_dir=report_dir,
        effective_config_path=effective_config_path,
    )
    # Assert AC-2 + AC-4: report exists; replay subprocess within the 15-min budget.
    assert isinstance(report, OrchestrationReport)
    assert report.report_path.is_file()
    body = report.report_path.read_text()
    assert "## Horizontal error (metres)" in body
    assert "## Threshold-hit share" in body
    assert "Mean" in body
    for threshold in (10, 25, 50, 100):
        assert f"| {threshold} |" in body, (
            f"threshold {threshold} m row missing from report"
        )
    assert report.replay_subprocess_seconds <= 900.0, (
        "AZ-840 AC-4: replay subprocess exceeded 15-min soft target"
    )
    assert report.wall_clock_s >= report.replay_subprocess_seconds
    assert report.distribution.count > 0, (
        "no emissions paired with ground truth — orchestration produced "
        "data but every emission fell outside the tlog GPS window"
    )
    # Assert AC-3 — the effective config was written and points at the
    # cache_root the C3 fixture supplied.
    assert effective_config_path.is_file()
    effective_text = effective_config_path.read_text()
    assert str(operator_pre_flight_setup.cache_root) in effective_text
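The AC-3 assertion above relies on the effective config being a field-level overlay of the operator config, with the static file left untouched. A minimal sketch of that overlay on plain dictionaries follows. It assumes the real `write_effective_replay_config` does the equivalent around a YAML load and dump, which is not shown here. `overlay_cache_root` is a hypothetical name, and the blanking of `faiss_index_path` is inferred from the unit-test assertions in this commit rather than from the helper's source.

```python
from __future__ import annotations

import copy
from pathlib import Path
from typing import Any


def overlay_cache_root(base: dict[str, Any], cache_root: Path) -> dict[str, Any]:
    """Return a copy of ``base`` whose tile-cache block points at ``cache_root``.

    Hypothetical helper: only ``c6_tile_cache.root_dir`` (and, by inference
    from the unit tests, ``faiss_index_path``) changes; other keys survive.
    """
    merged = copy.deepcopy(base)  # never mutate the caller's document
    block = merged.get("c6_tile_cache")
    if not isinstance(block, dict):
        block = {}  # create the block when it is absent or not a mapping
    block["root_dir"] = str(cache_root)
    block["faiss_index_path"] = ""  # inferred behaviour, see lead-in
    merged["c6_tile_cache"] = block
    return merged


base_config = {
    "mode": "replay",
    "c6_tile_cache": {
        "store_runtime": "postgres_filesystem",
        "root_dir": "/var/lib/gps-denied/tiles",
        "faiss_index_path": "/some/static/path/descriptor.index",
    },
}
effective = overlay_cache_root(base_config, Path("/tmp/cache_root"))
```

Deep-copying before the overlay is what keeps the base document unchanged, mirroring the commit's guarantee that the static YAML on disk is never touched.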
@@ -0,0 +1,671 @@
 """Unit tests for the AZ-840 e2e orchestrator (AC-8).
 The end-to-end happy path is the Tier-2 integration test in
 ``test_az835_e2e_real_flight.py`` (AC-1 / AC-2). This module covers
 the orchestration helper layer in isolation:
 * Param validation — every required path must exist before the
  airborne subprocess is spawned (AC-5 fails LOUD).
 * Effective-config merge: the ``c6_tile_cache.root_dir`` override
  is written to YAML and ``faiss_index_path`` is blanked; every
  other field of the base config is preserved.
 * Error propagation per step — every documented failure surfaces
  as :class:`OrchestrationFailure` with the correct
  :class:`OrchestratorStep` label.
 * Happy path — when the runner returns success and the JSONL +
  ground truth align, :class:`OrchestrationReport` carries a
  written report path and an honest verdict (AC-2: report exists
  PASS or FAIL).
 The tests inject a fake ``runner`` so no real
 ``gps-denied-replay`` subprocess is spawned. Real binary execution
 is exercised on the Jetson harness via the AC-1 integration test.
 """
 from __future__ import annotations
 import json
 import subprocess
 from pathlib import Path
 from unittest.mock import MagicMock
 import pytest
 import yaml
 from gps_denied_onboard.helpers.accuracy_report import (
    AC3_GATE_THRESHOLD_M,
 )
 from gps_denied_onboard.replay_input.tlog_route import RouteSpec
 from tests.e2e.replay._e2e_orchestrator import (
    OrchestrationFailure,
    OrchestrationReport,
    OrchestratorStep,
    read_calibration_acquisition_method,
    run_e2e_orchestration,
    write_effective_replay_config,
 )
 from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
 # ----------------------------------------------------------------------
 # Helpers
 def _build_populated_cache(tmp_path: Path) -> PopulatedC6Cache:
    """Construct a synthetic :class:`PopulatedC6Cache`.
    The orchestrator only consumes ``cache_root`` from the cache,
    so the FAISS sidecar paths are placeholders. The route_spec is
    a minimal one-waypoint instance — no AZ-836 invariants are
    re-asserted by AZ-840.
    """
    cache_root = tmp_path / "cache_root"
    cache_root.mkdir()
    return PopulatedC6Cache(
        cache_root=cache_root,
        tile_store_path=cache_root / "tiles",
        faiss_index_path=cache_root / "descriptor.index",
        faiss_sidecar_sha256_path=cache_root / "descriptor.index.sha256",
        faiss_sidecar_meta_path=cache_root / "descriptor.index.meta.json",
        route_spec=RouteSpec(
            waypoints=((50.10, 36.10),),
            suggested_region_size_meters=500.0,
            source_tlog=Path("test.tlog"),
            source_segment=(0, 100),
            total_distance_meters=0.0,
        ),
        tile_count=1,
        elapsed_seconds=0.0,
    )
 def _stage_inputs(tmp_path: Path) -> dict[str, Path]:
    """Write touch-files for every input path the orchestrator validates.
    The base config YAML carries one stub block so the merge step
    has a real document to overlay on.
    """
    base_config = tmp_path / "operator_config.yaml"
    base_config.write_text(
        yaml.safe_dump(
            {
                "mode": "replay",
                "c6_tile_cache": {
                    "store_runtime": "postgres_filesystem",
                    "metadata_runtime": "postgres_filesystem",
                    "descriptor_index_runtime": "faiss_hnsw",
                    "root_dir": "/var/lib/gps-denied/tiles",
                    "faiss_index_path": "/some/static/path/descriptor.index",
                },
            }
        )
    )
    tlog = tmp_path / "input.tlog"
    tlog.write_bytes(b"\x00")
    video = tmp_path / "input.mp4"
    video.write_bytes(b"\x00")
    calibration = tmp_path / "calibration.json"
    calibration.write_text(json.dumps({"acquisition_method": "factory-sheet"}))
    signing_key = tmp_path / "signing_key.bin"
    signing_key.write_bytes(b"\x00" * 32)
    binary = tmp_path / "gps-denied-replay"
    binary.write_text("")
    return {
        "base_config_path": base_config,
        "tlog_path": tlog,
        "video_path": video,
        "calibration_path": calibration,
        "signing_key_path": signing_key,
        "replay_binary": binary,
    }
 def _ground_truth_tlog_loader(
    monkeypatch: pytest.MonkeyPatch,
    *,
    times_s: tuple[float, ...] = (0.0, 1.0, 2.0),
    lat_deg: float = 50.10,
    lon_deg: float = 36.10,
    alt_m: float = 100.0,
 ) -> None:
    """Stub the orchestrator's ground-truth loader so unit tests skip MAVLink.
    The orchestrator imports ``load_tlog_ground_truth`` from
    ``gps_denied_onboard.replay_input``; patching the symbol *as
    bound on the orchestrator module* keeps the patch local to the
    unit suite (no cross-test bleed).
    """
    fixes = [
        _StubGpsFix(
            ts_ns=int(t * 1e9),
            lat_deg=lat_deg,
            lon_deg=lon_deg,
            alt_m=alt_m,
        )
        for t in times_s
    ]
    series = _StubGpsSeries(records=tuple(fixes))
    monkeypatch.setattr(
        "tests.e2e.replay._e2e_orchestrator.load_tlog_ground_truth",
        lambda *_args, **_kwargs: series,
    )
 class _StubGpsFix:
    """Mirrors the fields the orchestrator reads from each tlog row."""
    __slots__ = ("ts_ns", "lat_deg", "lon_deg", "alt_m")
    def __init__(
        self, *, ts_ns: int, lat_deg: float, lon_deg: float, alt_m: float
    ) -> None:
        self.ts_ns = ts_ns
        self.lat_deg = lat_deg
        self.lon_deg = lon_deg
        self.alt_m = alt_m
 class _StubGpsSeries:
    """Drop-in replacement for :class:`TlogGroundTruth`."""
    def __init__(self, *, records: tuple[_StubGpsFix, ...]) -> None:
        self.records = records
 def _build_runner_emitting(
    output_path: Path,
    *,
    rows: list[dict[str, object]],
    returncode: int = 0,
    stdout: str = "",
    stderr: str = "",
 ) -> MagicMock:
    """Return a fake ``subprocess.run`` that writes JSONL on call."""
    def _run(argv, **kwargs):  # type: ignore[no-untyped-def]
        if rows:
            output_path.parent.mkdir(parents=True, exist_ok=True)
            output_path.write_text(
                "\n".join(json.dumps(row) for row in rows) + "\n"
            )
        return subprocess.CompletedProcess(
            args=argv,
            returncode=returncode,
            stdout=stdout,
            stderr=stderr,
        )
    return MagicMock(side_effect=_run)
 # ----------------------------------------------------------------------
 # write_effective_replay_config
 def test_write_effective_replay_config_overlays_root_dir(
    tmp_path: Path,
 ) -> None:
    # Arrange
    inputs = _stage_inputs(tmp_path)
    cache_root = tmp_path / "cache"
    cache_root.mkdir()
    output_path = tmp_path / "effective.yaml"
    # Act
    written_path = write_effective_replay_config(
        base_config_path=inputs["base_config_path"],
        cache_root=cache_root,
        output_path=output_path,
    )
    # Assert
    assert written_path == output_path
    merged = yaml.safe_load(output_path.read_text())
    assert merged["c6_tile_cache"]["root_dir"] == str(cache_root)
    assert merged["c6_tile_cache"]["faiss_index_path"] == ""
    assert merged["mode"] == "replay"
    assert (
        merged["c6_tile_cache"]["store_runtime"] == "postgres_filesystem"
    ), "non-overridden c6_tile_cache fields must survive"
 def test_write_effective_replay_config_creates_block_when_absent(
    tmp_path: Path,
 ) -> None:
    # Arrange
    base = tmp_path / "operator.yaml"
    base.write_text(yaml.safe_dump({"mode": "replay"}))
    cache_root = tmp_path / "cache"
    cache_root.mkdir()
    # Act
    write_effective_replay_config(
        base_config_path=base,
        cache_root=cache_root,
        output_path=tmp_path / "effective.yaml",
    )
    # Assert
    merged = yaml.safe_load((tmp_path / "effective.yaml").read_text())
    assert merged["c6_tile_cache"]["root_dir"] == str(cache_root)
 def test_write_effective_replay_config_malformed_yaml_fails(
    tmp_path: Path,
 ) -> None:
    # Arrange
    base = tmp_path / "bad.yaml"
    base.write_text(":\n  : not yaml:")
    cache_root = tmp_path / "cache"
    cache_root.mkdir()
    # Act + Assert
    with pytest.raises(OrchestrationFailure) as exc_info:
        write_effective_replay_config(
            base_config_path=base,
            cache_root=cache_root,
            output_path=tmp_path / "effective.yaml",
        )
    assert exc_info.value.step is OrchestratorStep.WRITE_EFFECTIVE_CONFIG
 def test_write_effective_replay_config_non_mapping_top_level_fails(
    tmp_path: Path,
 ) -> None:
    # Arrange
    base = tmp_path / "bad.yaml"
    base.write_text("- not a mapping\n")
    cache_root = tmp_path / "cache"
    cache_root.mkdir()
    # Act + Assert
    with pytest.raises(OrchestrationFailure) as exc_info:
        write_effective_replay_config(
            base_config_path=base,
            cache_root=cache_root,
            output_path=tmp_path / "effective.yaml",
        )
    assert exc_info.value.step is OrchestratorStep.WRITE_EFFECTIVE_CONFIG
 # ----------------------------------------------------------------------
 # read_calibration_acquisition_method
 def test_read_calibration_acquisition_method_returns_field_when_present(
    tmp_path: Path,
 ) -> None:
    # Arrange
    path = tmp_path / "cal.json"
    path.write_text(json.dumps({"acquisition_method": "factory-sheet"}))
    # Assert
    assert read_calibration_acquisition_method(path) == "factory-sheet"
 def test_read_calibration_acquisition_method_returns_unknown_on_missing(
    tmp_path: Path,
 ) -> None:
    # Arrange
    path = tmp_path / "cal.json"
    path.write_text(json.dumps({"some_other_field": True}))
    # Assert
    assert read_calibration_acquisition_method(path) == "unknown"
 def test_read_calibration_acquisition_method_returns_unknown_on_malformed(
    tmp_path: Path,
 ) -> None:
    # Arrange
    path = tmp_path / "cal.json"
    path.write_text("{not valid json")
    # Assert
    assert read_calibration_acquisition_method(path) == "unknown"
 # ----------------------------------------------------------------------
 # run_e2e_orchestration — param validation (AC-5)
 def test_run_e2e_orchestration_missing_tlog_fails_loud(
    tmp_path: Path,
 ) -> None:
    # Arrange
    cache = _build_populated_cache(tmp_path)
    inputs = _stage_inputs(tmp_path)
    inputs["tlog_path"].unlink()
    # Act + Assert
    with pytest.raises(OrchestrationFailure) as exc_info:
        run_e2e_orchestration(
            populated_cache=cache,
            output_path=tmp_path / "out.jsonl",
            report_dir=tmp_path / "metrics",
            effective_config_path=tmp_path / "eff.yaml",
            **inputs,  # type: ignore[arg-type]
        )
    assert exc_info.value.step is OrchestratorStep.VALIDATE_INPUTS
    assert "tlog_path" in str(exc_info.value)
 def test_run_e2e_orchestration_missing_binary_fails_loud(
    tmp_path: Path,
 ) -> None:
    # Arrange
    cache = _build_populated_cache(tmp_path)
    inputs = _stage_inputs(tmp_path)
    inputs["replay_binary"].unlink()
    # Act + Assert
    with pytest.raises(OrchestrationFailure) as exc_info:
        run_e2e_orchestration(
            populated_cache=cache,
            output_path=tmp_path / "out.jsonl",
            report_dir=tmp_path / "metrics",
            effective_config_path=tmp_path / "eff.yaml",
            **inputs,  # type: ignore[arg-type]
        )
    assert exc_info.value.step is OrchestratorStep.VALIDATE_INPUTS
    assert "replay_binary" in str(exc_info.value)
 # ----------------------------------------------------------------------
 # run_e2e_orchestration — subprocess error propagation (AC-5)
 def test_run_e2e_orchestration_replay_nonzero_exit_fails_loud(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
 ) -> None:
    # Arrange
    cache = _build_populated_cache(tmp_path)
    inputs = _stage_inputs(tmp_path)
    output_path = tmp_path / "out.jsonl"
    runner = MagicMock(
        return_value=subprocess.CompletedProcess(
            args=[],
            returncode=1,
            stdout="",
            stderr="boom",
        )
    )
    _ground_truth_tlog_loader(monkeypatch)
    # Act + Assert
    with pytest.raises(OrchestrationFailure) as exc_info:
        run_e2e_orchestration(
            populated_cache=cache,
            output_path=output_path,
            report_dir=tmp_path / "metrics",
            effective_config_path=tmp_path / "eff.yaml",
            runner=runner,
            **inputs,  # type: ignore[arg-type]
        )
    assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
    assert "exited 1" in str(exc_info.value)
    assert "boom" in str(exc_info.value)
 def test_run_e2e_orchestration_replay_timeout_fails_loud(
    tmp_path: Path,
 ) -> None:
    # Arrange
    cache = _build_populated_cache(tmp_path)
    inputs = _stage_inputs(tmp_path)
    def _timeout(*_args, **_kwargs):
        raise subprocess.TimeoutExpired(cmd=["replay"], timeout=0.1)
    runner = MagicMock(side_effect=_timeout)
    # Act + Assert
    with pytest.raises(OrchestrationFailure) as exc_info:
        run_e2e_orchestration(
            populated_cache=cache,
            output_path=tmp_path / "out.jsonl",
            report_dir=tmp_path / "metrics",
            effective_config_path=tmp_path / "eff.yaml",
            runner=runner,
            max_seconds=0.1,
            **inputs,  # type: ignore[arg-type]
        )
    assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
    assert "timed out" in str(exc_info.value)
 def test_run_e2e_orchestration_replay_oserror_fails_loud(
    tmp_path: Path,
 ) -> None:
    # Arrange
    cache = _build_populated_cache(tmp_path)
    inputs = _stage_inputs(tmp_path)
    def _oserror(*_args, **_kwargs):
        raise OSError("permission denied")
    runner = MagicMock(side_effect=_oserror)
    # Act + Assert
    with pytest.raises(OrchestrationFailure) as exc_info:
        run_e2e_orchestration(
            populated_cache=cache,
            output_path=tmp_path / "out.jsonl",
            report_dir=tmp_path / "metrics",
            effective_config_path=tmp_path / "eff.yaml",
            runner=runner,
            **inputs,  # type: ignore[arg-type]
        )
    assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
    assert "permission denied" in str(exc_info.value)
 # ----------------------------------------------------------------------
 # run_e2e_orchestration — empty / malformed JSONL (AC-5)
 def test_run_e2e_orchestration_empty_jsonl_fails_loud(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
 ) -> None:
    # Arrange
    cache = _build_populated_cache(tmp_path)
    inputs = _stage_inputs(tmp_path)
    output_path = tmp_path / "out.jsonl"
    def _runner(argv, **_kwargs):  # type: ignore[no-untyped-def]
        output_path.write_text("\n\n")  # only blanks
        return subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
    runner = MagicMock(side_effect=_runner)
    _ground_truth_tlog_loader(monkeypatch)
    # Act + Assert
    with pytest.raises(OrchestrationFailure) as exc_info:
        run_e2e_orchestration(
            populated_cache=cache,
            output_path=output_path,
            report_dir=tmp_path / "metrics",
            effective_config_path=tmp_path / "eff.yaml",
            runner=runner,
            **inputs,  # type: ignore[arg-type]
        )
    assert exc_info.value.step is OrchestratorStep.PARSE_EMISSIONS
 def test_run_e2e_orchestration_malformed_jsonl_fails_loud(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
 ) -> None:
    # Arrange
    cache = _build_populated_cache(tmp_path)
    inputs = _stage_inputs(tmp_path)
    output_path = tmp_path / "out.jsonl"
    def _runner(argv, **_kwargs):  # type: ignore[no-untyped-def]
        output_path.write_text('{"valid": true}\nnot a json line\n')
        return subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
    runner = MagicMock(side_effect=_runner)
    _ground_truth_tlog_loader(monkeypatch)
    # Act + Assert
    with pytest.raises(OrchestrationFailure) as exc_info:
        run_e2e_orchestration(
            populated_cache=cache,
            output_path=output_path,
            report_dir=tmp_path / "metrics",
            effective_config_path=tmp_path / "eff.yaml",
            runner=runner,
            **inputs,  # type: ignore[arg-type]
        )
    assert exc_info.value.step is OrchestratorStep.PARSE_EMISSIONS
 # ----------------------------------------------------------------------
 # run_e2e_orchestration — ground truth loader failure (AC-5)
 def test_run_e2e_orchestration_ground_truth_loader_failure_fails_loud(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
 ) -> None:
    # Arrange
    cache = _build_populated_cache(tmp_path)
    inputs = _stage_inputs(tmp_path)
    output_path = tmp_path / "out.jsonl"
    runner = _build_runner_emitting(
        output_path,
        rows=[
            {
                "emitted_at": int(0.5 * 1e9),
                "position_wgs84": {
                    "lat_deg": 50.10,
                    "lon_deg": 36.10,
                    "alt_m": 100.0,
                },
            }
        ],
    )
    def _raise(*_args, **_kwargs):
        raise ValueError("tlog corrupt")
    monkeypatch.setattr(
        "tests.e2e.replay._e2e_orchestrator.load_tlog_ground_truth",
        _raise,
    )
    # Act + Assert
    with pytest.raises(OrchestrationFailure) as exc_info:
        run_e2e_orchestration(
            populated_cache=cache,
            output_path=output_path,
            report_dir=tmp_path / "metrics",
            effective_config_path=tmp_path / "eff.yaml",
            runner=runner,
            **inputs,  # type: ignore[arg-type]
        )
    assert exc_info.value.step is OrchestratorStep.LOAD_GROUND_TRUTH
    assert "tlog corrupt" in str(exc_info.value)
 # ----------------------------------------------------------------------
 # run_e2e_orchestration — happy path (AC-1 / AC-2)
 def test_run_e2e_orchestration_happy_path_writes_report(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
 ) -> None:
    # Arrange
    cache = _build_populated_cache(tmp_path)
    inputs = _stage_inputs(tmp_path)
    output_path = tmp_path / "out.jsonl"
    report_dir = tmp_path / "metrics"
    effective_config_path = tmp_path / "eff.yaml"
    rows = [
        {
            "emitted_at": int(0.5 * 1e9),
            "position_wgs84": {"lat_deg": 50.10, "lon_deg": 36.10, "alt_m": 100.0},
        },
        {
            "emitted_at": int(1.5 * 1e9),
            "position_wgs84": {"lat_deg": 50.10, "lon_deg": 36.10, "alt_m": 100.0},
        },
    ]
    runner = _build_runner_emitting(output_path, rows=rows)
    _ground_truth_tlog_loader(monkeypatch)
    # Act
    report = run_e2e_orchestration(
        populated_cache=cache,
        output_path=output_path,
        report_dir=report_dir,
        effective_config_path=effective_config_path,
        runner=runner,
        run_date_utc="2026-05-23",
        **inputs,  # type: ignore[arg-type]
    )
    # Assert
    assert isinstance(report, OrchestrationReport)
    assert report.report_path.is_file()
    assert report.emissions_count == 2
    assert report.distribution.count == 2
    assert report.verdict_passed is True
    body = report.report_path.read_text()
    assert "## Horizontal error (metres)" in body
    assert "## Threshold-hit share" in body
    assert f"| {AC3_GATE_THRESHOLD_M:g} |" in body
    runner.assert_called_once()
    argv_passed = runner.call_args.args[0]
    assert str(effective_config_path) in argv_passed
    assert "--auto-trim" in argv_passed
    merged = yaml.safe_load(effective_config_path.read_text())
    assert merged["c6_tile_cache"]["root_dir"] == str(cache.cache_root)
 def test_run_e2e_orchestration_writes_report_even_on_fail_verdict(
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
 ) -> None:
    # Arrange: emissions sit about 1.3 km from ground truth, far above the 100 m gate.
    cache = _build_populated_cache(tmp_path)
    inputs = _stage_inputs(tmp_path)
    output_path = tmp_path / "out.jsonl"
    report_dir = tmp_path / "metrics"
    rows = [
        {
            "emitted_at": int(0.5 * 1e9),
            "position_wgs84": {"lat_deg": 50.110, "lon_deg": 36.110, "alt_m": 100.0},
        },
        {
            "emitted_at": int(1.5 * 1e9),
            "position_wgs84": {"lat_deg": 50.110, "lon_deg": 36.110, "alt_m": 100.0},
        },
    ]
    runner = _build_runner_emitting(output_path, rows=rows)
    _ground_truth_tlog_loader(monkeypatch)
    # Act
    report = run_e2e_orchestration(
        populated_cache=cache,
        output_path=output_path,
        report_dir=report_dir,
        effective_config_path=tmp_path / "eff.yaml",
        runner=runner,
        run_date_utc="2026-05-23",
        **inputs,  # type: ignore[arg-type]
    )
    # Assert — AC-2: report exists regardless of PASS/FAIL.
    assert report.verdict_passed is False
    assert report.report_path.is_file()
    assert "FAIL" in report.report_path.read_text()
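As a sanity check on the FAIL-verdict fixture, the offset between the stubbed ground truth at (50.10, 36.10) and the emitted position at (50.110, 36.110) can be estimated with the haversine formula. This is only a sketch: the project's `horizontal_error_distribution` helper may compute distance differently, and both the mean Earth radius and the `haversine_m` name are assumptions made here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(
    lat1_deg: float, lon1_deg: float, lat2_deg: float, lon2_deg: float
) -> float:
    """Great-circle distance in metres between two WGS84 lat/lon points."""
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2_deg - lon1_deg)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Ground-truth stub versus the emissions used in the FAIL-verdict test.
offset_m = haversine_m(50.10, 36.10, 50.110, 36.110)
```

Under these assumptions the offset comes out at roughly 1.3 km, more than ten times the 100 m gate, so the FAIL verdict does not depend on small differences between distance formulas.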