[AZ-840] [AZ-835] e2e orchestrator test (E-AZ-835 C4)

Wraps the AZ-699 verdict-report path with the AZ-839 operator_pre_flight_setup C3 fixture so a single Tier-2 test takes only (tlog, video, calibration) and runs the full 7-step pipeline on the Jetson harness without operator hand-curation.

New surface (tests-only, no src/ changes):

- tests/e2e/replay/_e2e_orchestrator.py: orchestrator with OrchestratorStep enum, OrchestrationFailure exception (step prefix per AC-5), OrchestrationReport dataclass, write_effective_replay_config helper, and run_e2e_orchestration entry point covering steps 1-2-6-7.
- tests/e2e/replay/test_e2e_orchestrator_unit.py: 17 unit tests covering each failure mode + happy path with mocked subprocess + ground-truth loader (AC-8).
- tests/e2e/replay/test_az835_e2e_real_flight.py: Tier-2 + RUN_REPLAY_E2E gated integration test asserting verdict report exists, 15-min budget held (AC-1, AC-2, AC-3, AC-4, AC-6).

The effective config write overlays c6_tile_cache.root_dir onto the static operator YAML at runtime so the airborne subprocess shares the cache_root the C3 fixture chose. Field-level merge: every other operator-config block stays verbatim. The static YAML on disk is never touched.

Test run: tests/e2e/replay 45 passed, 10 skipped (10 skips were 9 pre-existing + 1 new tier2).

No src/ touched, no AZ-839 driver changes; AC-7 (AZ-699 still passes) holds by inspection.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 07:01:14 +00:00 · 2026-05-23 15:27:41 +03:00
parent 8c4be9ace0
commit ade0c86f2b
6 changed files with 1680 additions and 1 deletions
@@ -0,0 +1,171 @@
+# Batch 109 — Cycle 3 — AZ-840 e2e orchestrator test
+
+**Date**: 2026-05-23
+**Tasks**: AZ-840 (C4 — Epic AZ-835).
+**Story points**: 3 (per the task spec).
+**Jira status**: AZ-840 In Progress → In Testing at commit step.
+
+## Why this batch exists
+
+Epic AZ-835 (real-flight e2e validation) needs a single Tier-2
+test that proves the 7-step pipeline runs from
+`(tlog, video, calibration)` to a horizontal-error verdict
+without operator hand-curation between steps. Steps 3-5 were
+delivered by AZ-839 (C3 — `operator_pre_flight_setup`); steps
+1-2-6-7 are this batch.
+
+The AZ-839 batch 108b follow-up note explicitly anticipated this
+batch: "AZ-840 will additionally need to feed the airborne
+replay binary a config that points at the same `cache_root`
+... the cleanest path is for AZ-840 to write an effective YAML
+at runtime from the same override recipe used here."
+
+## What this batch ships
+
+A driver module + unit test suite + Tier-2 integration test:
+
+* `tests/e2e/replay/_e2e_orchestrator.py` — wraps the AZ-699
+  verdict-report path with the AZ-839 C3 fixture's
+  `PopulatedC6Cache`. Public surface:
+  * `OrchestratorStep` enum — failure-step labels per AC-5.
+  * `OrchestrationFailure(step, message)` exception — wraps
+    every step failure with the step name in the message prefix.
+  * `OrchestrationReport` dataclass — verdict, distribution,
+    paths, wall-clock measurements per AC-4.
+  * `write_effective_replay_config` — small helper that overlays
+    `c6_tile_cache.root_dir` onto the static operator YAML and
+    blanks `c6_tile_cache.faiss_index_path` so the index path
+    falls back under the new root.
+  * `read_calibration_acquisition_method` — mirror of AZ-699's
+    helper so the report writer keeps the same shape.
+  * `run_e2e_orchestration` — the AC-1 entry point wiring
+    validate → write_config → airborne subprocess → parse JSONL
+    → load tlog GT → compute distribution → render report.
+* `tests/e2e/replay/test_e2e_orchestrator_unit.py` — 17 unit
+  tests covering each of the 7 steps' failure modes plus the
+  happy path. The runner is injected (`subprocess.run` default)
+  so unit tests stage synthetic JSONL output without touching
+  the airborne binary. `load_tlog_ground_truth` is monkeypatched
+  to return a synthetic 3-row series.
+* `tests/e2e/replay/test_az835_e2e_real_flight.py::
+  test_az840_e2e_real_flight_orchestration` — Tier-2 + RUN_REPLAY_E2E
+  gated test that consumes the C3 fixture + Derkachi inputs and
+  asserts the verdict markdown is written, the threshold-hit
+  share table is present, and the 15-min budget held.
+
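The injected-runner seam described in the list above can be sketched as follows. This is a minimal illustration, not the code in `test_e2e_orchestrator_unit.py`: `make_fake_runner` is a hypothetical name, and the record contents are placeholders that only borrow the `emitted_at` key named in the self-review table.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path
from typing import Any, Callable


def make_fake_runner(
    records: list[dict[str, Any]],
    returncode: int = 0,
) -> Callable[..., subprocess.CompletedProcess[str]]:
    """Build a stand-in for subprocess.run that stages JSONL output.

    Hypothetical helper: it looks up the value following ``--output``
    in argv and writes one JSON object per line to that path.
    """

    def fake_run(argv: list[str], **kwargs: Any) -> subprocess.CompletedProcess[str]:
        # kwargs (capture_output, text, timeout, env) are accepted and ignored.
        output_path = Path(argv[argv.index("--output") + 1])
        output_path.write_text(
            "".join(json.dumps(record) + "\n" for record in records)
        )
        return subprocess.CompletedProcess(
            args=argv, returncode=returncode, stdout="", stderr=""
        )

    return fake_run
```

Passing a non-zero `returncode` to the same factory would be one way to stage an airborne-step failure without launching the real binary.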
+## AC coverage
+
+| AC  | Description                                              | Coverage                                                                                                                  |
+|-----|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
+| AC-1| Steps 1-7 end-to-end on Tier-2 from a fresh tlog/video   | `test_az840_e2e_real_flight_orchestration` (Tier-2-gated); 17 unit tests prove the orchestrator structure                 |
+| AC-2| Verdict report exists either PASS or FAIL                | `test_run_e2e_orchestration_writes_report_even_on_fail_verdict` + integration assertion `report_path.is_file()`           |
+| AC-3| Reuses C3 fixture (`operator_pre_flight_setup`)          | Integration test consumes the fixture; effective config overlay points at `populated_cache.cache_root`                    |
+| AC-4| 15-min wall-time soft target on the Derkachi clip        | `_DEFAULT_MAX_SECONDS = 900.0` passed as `subprocess.run` `timeout`; integration asserts `replay_subprocess_seconds <= 900`|
+| AC-5| Mid-pipeline failure fails LOUD with a clear step prefix | `OrchestratorStep` enum + 8 step-specific failure unit tests (`validate`/`write_config`/`airborne` × 3/`parse` × 2/`gt`)  |
+| AC-6| Gated by `RUN_REPLAY_E2E=1` + Tier-2 marker              | `_orchestrator_skip_reason()` checks env vars + binary + video size; `@pytest.mark.tier2` decorator                       |
+| AC-7| AZ-699 verdict test continues to pass                    | No changes to `test_derkachi_real_tlog.py`; same `real_flight_validation_<date>.md` report path convention                |
+| AC-8| Unit-tested orchestration helper without Tier-2 inputs   | 17 unit tests covering config write (4) + calibration parse (3) + run helper (10) — all use mocked subprocess + GT loader |
+
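The AC-6 gate in the table above can be pictured with a small skip-reason function. This is a sketch under assumptions, not the body of `_orchestrator_skip_reason()`: the function name, its parameters, and the `min_video_bytes` threshold are hypothetical; only the `RUN_REPLAY_E2E=1` variable and the three kinds of check (env, binary, video size) come from this report.

```python
from __future__ import annotations

from pathlib import Path
from typing import Mapping


def orchestrator_skip_reason(
    env: Mapping[str, str],
    replay_binary: Path | None,
    video_path: Path | None,
    min_video_bytes: int = 1,  # assumed threshold, not taken from the test
) -> str | None:
    """Return why the Tier-2 test should skip, or None when it may run."""
    if env.get("RUN_REPLAY_E2E") != "1":
        return "RUN_REPLAY_E2E=1 is not set"
    if replay_binary is None or not replay_binary.is_file():
        return f"replay binary not found: {replay_binary}"
    if video_path is None or not video_path.is_file():
        return f"video not found: {video_path}"
    if video_path.stat().st_size < min_video_bytes:
        return f"video at {video_path} is smaller than {min_video_bytes} bytes"
    return None
```

A test module could feed the result to `pytest.mark.skipif(reason is not None, reason=...)` alongside the `tier2` marker.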
+## Test run results
+
+```
+$ .venv/bin/pytest tests/e2e/replay/ -v --tb=short --timeout=60
+============================ 45 passed, 10 skipped, 3 warnings in 0.78s ============
+```
+
+Breakdown:
+* 17 new orchestrator unit tests pass.
+* 11 AZ-839 driver unit tests still pass (no driver changes).
+* 14 helper unit tests (`test_helpers.py`) still pass.
+* 3 derkachi-1min mode-agnostic AST tests still pass.
+* 10 skips: 1 new Tier-2 (this AZ-840 integration), 6
+  RUN_REPLAY_E2E gated AZ-404 cases, 1 AC-8 D-PROJ-2 placeholder,
+  1 Tier-2 AZ-699, 1 Tier-2 AZ-839 integration. None are
+  regressions; the tier2 gate trips off-Jetson.
+
+## Design notes
+
+### `--auto-trim` ownership
+
+The orchestrator passes `--auto-trim` unconditionally so AZ-405 /
+AZ-698 active-flight-cut + tlog/video sync (Epic step 1) runs
+inside the airborne binary every time. The Epic narrative does
+not separate trim from the airborne pipeline; collapsing them
+into a single subprocess invocation matches AZ-699 and avoids
+duplicating the trim path.
+
+### `clip_duration_s` parity with AZ-699
+
+`run_e2e_orchestration` computes
+`clip_duration_s = ground_truth[-1].t_s - ground_truth[0].t_s`
+exactly as `test_derkachi_real_tlog.py` does. This means both
+verdict reports name the same clip duration even when the
+trimmed video is shorter than the ground-truth window — a
+deliberate choice: the report header documents what the verdict
+covers, not what the binary processed.
+
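A small worked example of the duration rule above, using a stand-in row type that carries only `t_s` (the real `GroundTruthRow` has more fields; the timestamps here are made up for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Stand-in for GroundTruthRow; only the timestamp matters here."""

    t_s: float


def clip_duration_s(ground_truth: list) -> float:
    # Span of the ground-truth window, with the same empty-list
    # fallback (0.0) the orchestrator uses when building the context.
    if not ground_truth:
        return 0.0
    return ground_truth[-1].t_s - ground_truth[0].t_s


rows = [Row(100.0), Row(130.5), Row(160.0)]
duration = clip_duration_s(rows)  # 160.0 - 100.0 = 60.0
```

The result depends only on the first and last ground-truth timestamps, so it is unaffected by how much video the binary trimmed.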
+### Effective config write — single source of truth
+
+`write_effective_replay_config` materialises the same override
+recipe AZ-839 uses in-memory, but on disk so the airborne
+subprocess sees the cache_root the fixture chose. Field-level
+merge: every other block in the operator YAML is preserved
+verbatim; only `c6_tile_cache.root_dir` and
+`c6_tile_cache.faiss_index_path` are overwritten. The static
+operator YAML on disk is never touched.
+
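The field-level overlay above can be sketched on plain dictionaries, leaving YAML I/O aside. `overlay_cache_root` is a hypothetical name, the sample config keys other than `c6_tile_cache.root_dir` and `c6_tile_cache.faiss_index_path` are invented for the example, and the deep copy is an illustrative choice rather than something the helper in this batch is known to do.

```python
from __future__ import annotations

import copy
from typing import Any


def overlay_cache_root(base: dict[str, Any], cache_root: str) -> dict[str, Any]:
    """Return a copy of ``base`` with only the two cache fields overridden."""
    merged = copy.deepcopy(base)  # never mutate the caller's config
    block = merged.get("c6_tile_cache")
    block = dict(block) if isinstance(block, dict) else {}
    block["root_dir"] = cache_root
    # Blank the explicit index path so it is resolved under the new root.
    block["faiss_index_path"] = ""
    merged["c6_tile_cache"] = block
    return merged


base = {
    "c2_vpr": {"model": "netvlad"},  # invented sample block
    "c6_tile_cache": {
        "root_dir": "/static/cache",
        "faiss_index_path": "/static/cache/descriptor.index",
        "tile_size": 256,  # invented sample field
    },
}
merged = overlay_cache_root(base, "/tmp/c3-cache")
```

Sibling fields inside `c6_tile_cache` and every other top-level block come through unchanged, which is the property the section relies on.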
+### Failure surface = step prefix
+
+`OrchestrationFailure` always prefixes its message with
+`[<step>]`. CI log scrapers and pytest's traceback printer both
+surface the prefix on the first line; AC-5 ("clear error
+pointing at the failing step") holds without requiring the test
+to inspect the exception object. The step is also exposed as
+`exc.step` for programmatic assertions.
+
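To make the prefix contract concrete, here is a trimmed copy of the two types from `_e2e_orchestrator.py` (one enum member instead of seven) plus a hypothetical `first_log_line` helper that mimics what a plain log reader would see. The failure text is invented for the example.

```python
from enum import Enum


class OrchestratorStep(str, Enum):
    # Trimmed to one member for the example; the module defines seven.
    AIRBORNE_PIPELINE = "airborne_pipeline"


class OrchestrationFailure(RuntimeError):
    def __init__(self, step: OrchestratorStep, message: str) -> None:
        super().__init__(f"[{step.value}] {message}")
        self.step = step


def first_log_line(exc: BaseException) -> str:
    """First line of the message, as a log scraper would match it."""
    return str(exc).splitlines()[0]


try:
    raise OrchestrationFailure(
        OrchestratorStep.AIRBORNE_PIPELINE,
        "gps-denied-replay exited 2\nstderr:\nboom",
    )
except OrchestrationFailure as exc:
    caught = exc  # keep a reference; `exc` is unbound after the block

prefix_line = first_log_line(caught)
```

Even when the message carries multi-line stdout/stderr, the step label stays on the first line, and `caught.step` remains available for programmatic checks.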
+## Files changed
+
+* `tests/e2e/replay/_e2e_orchestrator.py` (new, 655 LOC).
+* `tests/e2e/replay/test_e2e_orchestrator_unit.py` (new, 660+ LOC).
+* `tests/e2e/replay/test_az835_e2e_real_flight.py` (new, 156 LOC).
+
+No `src/` changes, no operator-config YAML changes, no AZ-839
+driver changes. AZ-840 is purely additive at the test layer.
+
+## Code review (self-review)
+
+Verdict: **PASS_WITH_WARNINGS**.
+
+| Phase | Result |
+|-------|--------|
+| 1. Context loading | Re-read `gps_compare.py`, `accuracy_report.py`, `replay_input.py`, `cli/replay.py`, `test_derkachi_real_tlog.py`. Emission schema (`emitted_at`, `position_wgs84`) is the same shape `gps-denied-replay` writes. |
+| 2. Spec compliance | All 8 AZ-840 ACs covered; AC-7 holds by inspection (no AZ-699 changes). |
+| 3. Code quality | All public types have docstrings; failure messages name the upstream exception via `repr` so `OSError` / `subprocess.TimeoutExpired` carry through. Runner kw-args mirror `subprocess.run` signature 1:1. |
+| 4. Security quick-scan | Effective config write goes to a tmp file the test owns; no secrets in the YAML overlay (override is two string fields). Subprocess `env` is opt-in (`None` defaults to `os.environ`). |
+| 5. Performance scan | Unit tests run in 0.51 s. Tier-2 wall-clock cap is 900 s, enforced by the subprocess timeout. |
+| 6. Cross-task consistency | `clip_duration_s` and `report_path` match AZ-699 exactly so a single Jetson run produces the same markdown shape. |
+| 7. Architecture compliance | Orchestrator lives entirely under `tests/e2e/replay/`; no `src/` writes. C3 fixture's invariants (`PopulatedC6Cache.cache_root` is the single source of truth) propagate via `write_effective_replay_config`. |
+
+## Findings
+
+| ID | Severity | Description | Disposition |
+|----|----------|-------------|-------------|
+| F1 | Low | `_default_tile_decoder` in `conftest.py` (carried from batch 108) — still raw TIFF. Not in the AZ-840 path; AZ-840 doesn't change tile decoding. | Defer; no AZ-840 ticket. |
+| F2 | Low | `_resolve_replay_descriptor_dim` is NetVLAD-only (carried from batch 108). AZ-840 doesn't change descriptors. | Defer; no AZ-840 ticket. |
+| F3 | Low | `--pace asap` is hardcoded in `_run_replay_subprocess` argv; the AZ-699 test passes `--pace asap` too, so behaviour is identical. If a future test wants a real-time pace, the runner kwarg is the seam. | Document; no ticket. |
+| F4 | Low | `_run_replay_subprocess` does not stream stdout/stderr; failures surface only after the subprocess exits. For 15-min runs this means the operator sees no progress until the budget expires. AZ-699 has the same shape. | Document; consider an AZ-* if the budget grows. |
+
+## Notes for follow-up
+
+* AZ-840 lands the orchestrator test as Tier-2-gated. Verifying
+  the Tier-2 path actually runs on the Jetson harness is the
+  next gating step before Epic AZ-835 can flip from "covered by
+  unit tests" to "covered by Tier-2 integration".
+* `_e2e_orchestrator.py` is intentionally kept under `tests/`
+  rather than promoted to `src/`. If a second consumer of the
+  same orchestration shape appears (e.g. AZ-833 mock-suite-sat
+  parity test), the move to a shared helper module under
+  `src/gps_denied_onboard/replay/` is the right next step;
+  for now the test-only location matches the helper's only
+  consumer.
+* AZ-841 (Tier-2 unxfail follow-up) and AZ-842 (replay protocol
+  + orchestrator docs) sit downstream — both should reference
+  this batch report in their planning sections.
@@ -8,7 +8,7 @@ status: in_progress
 sub_step:
  phase: 7
  name: batch-loop
-  detail: "batch 109 next; AZ-840 C4"
+  detail: "batch 110 next; full-suite gate after AZ-840 C4 ship"
 retry_count: 0
 cycle: 3
 tracker: jira
@@ -0,0 +1,655 @@
+"""E2E orchestrator for the AZ-835 7-step pipeline (AZ-840 / Epic AZ-835 C4).
+
+Wraps the AZ-699 verdict-report writing path with the AZ-839 C3
+fixture's `PopulatedC6Cache` so a single Tier-2 test can run from
+``(tlog, video, calibration)`` to a horizontal-error report without
+operator hand-curation between steps. The 7-step Epic narrative
+(``_docs/02_tasks/todo/AZ-840_e2e_orchestrator_test.md``):
+
+1. Active flight cut + tlog/video sync — handled by ``gps-denied-replay``
+   ``--auto-trim`` (AZ-405 / AZ-698) inside the airborne binary.
+2. On-fly frame + IMU extraction — same binary's per-frame loop.
+3. Auto-create route — done by the C3 fixture
+   (``operator_pre_flight_setup`` calls ``extract_route_from_tlog``).
+4. POST route to satellite-provider — C3 fixture (AZ-838
+   ``SatelliteProviderRouteClient.seed_route``).
+5. Build FAISS index — C3 fixture (AZ-322 ``DescriptorBatcher``).
+6. Run gps-denied airborne pipeline — this module's
+   ``_run_replay_subprocess`` invokes ``gps-denied-replay`` against
+   the populated cache.
+7. Get GPS fixes, check vs tlog GPS — this module's
+   ``_load_ground_truth`` + ``horizontal_error_distribution`` +
+   ``render_report`` together write the verdict markdown.
+
+The C3 fixture mutates ``c6_tile_cache.root_dir`` to point at a
+``tmp_path_factory.mktemp`` value (AZ-839 batch 108b). The static
+operator YAML at ``GPS_DENIED_OPERATOR_CONFIG_PATH`` cannot know
+that path. ``write_effective_replay_config`` reads the static YAML,
+overlays the ``c6_tile_cache.root_dir`` override, writes the merged
+result to a tmp file, and returns the path the airborne binary
+will load via ``--config``. This keeps a single source of truth
+for the cache_root override across the in-memory C3 fixture path
+and the subprocess airborne path.
+
+Public surface — re-exported from this module:
+
+* :class:`OrchestratorStep` — failure-step labels per AC-5 ("fails
+  LOUD with a clear error pointing at the failing step").
+* :class:`OrchestrationFailure` — wraps the underlying exception
+  with the step that produced it.
+* :class:`OrchestrationReport` — return value of
+  :func:`run_e2e_orchestration` (verdict, distribution, paths,
+  wall-clock measurements per AC-4).
+* :func:`write_effective_replay_config` — small helper for the
+  config merge step.
+* :func:`run_e2e_orchestration` — the AC-1 entry point.
+"""
+
+from __future__ import annotations
+
+import datetime
+import json
+import logging
+import subprocess
+import time
+from collections.abc import Callable, Mapping
+from dataclasses import dataclass
+from enum import Enum
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from gps_denied_onboard.helpers.accuracy_report import (
+    AC3_GATE_PCT,
+    AC3_GATE_THRESHOLD_M,
+    ReportContext,
+    render_report,
+    verdict_passes_ac3,
+)
+from gps_denied_onboard.helpers.gps_compare import (
+    GroundTruthRow,
+    HorizontalErrorDistribution,
+    horizontal_error_distribution,
+)
+from gps_denied_onboard.replay_input import load_tlog_ground_truth
+
+from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
+
+__all__ = [
+    "OrchestrationFailure",
+    "OrchestrationReport",
+    "OrchestratorStep",
+    "read_calibration_acquisition_method",
+    "run_e2e_orchestration",
+    "write_effective_replay_config",
+]
+
+
+# Replay-subprocess wall-clock cap for the Derkachi clip per AZ-840
+# AC-4 (15 min soft target). Exposed as a default that the integration
+# test can override; the unit tests rely on the contract that the
+# runner argument is a free callable.
+_DEFAULT_MAX_SECONDS: float = 900.0
+
+_LOGGER = logging.getLogger("tests.e2e.replay.e2e_orchestrator")
+
+
+class OrchestratorStep(str, Enum):
+    """Labels for the 7-step pipeline used by :class:`OrchestrationFailure`.
+
+    AC-5: every failure that reaches the test surface must name the
+    step that produced it. The string values are stable so test
+    assertions and log readers can match on them.
+    """
+
+    VALIDATE_INPUTS = "validate_inputs"
+    WRITE_EFFECTIVE_CONFIG = "write_effective_config"
+    AIRBORNE_PIPELINE = "airborne_pipeline"
+    PARSE_EMISSIONS = "parse_emissions"
+    LOAD_GROUND_TRUTH = "load_ground_truth"
+    COMPUTE_DISTRIBUTION = "compute_distribution"
+    RENDER_REPORT = "render_report"
+
+
+class OrchestrationFailure(RuntimeError):
+    """Failure inside one of the 7 orchestration steps (AC-5).
+
+    The :attr:`step` attribute names the failing step; the message
+    embeds it as the prefix so plain log-readers see the failure
+    location without inspecting the exception object.
+    """
+
+    def __init__(self, step: OrchestratorStep, message: str) -> None:
+        super().__init__(f"[{step.value}] {message}")
+        self.step = step
+
+
+@dataclass(frozen=True, slots=True)
+class OrchestrationReport:
+    """Return value of :func:`run_e2e_orchestration`.
+
+    Attributes:
+        verdict_passed: ``True`` iff the run met the AZ-696 epic
+            AC-3 gate (>= AC3_GATE_PCT% within AC3_GATE_THRESHOLD_M m).
+        distribution: Computed horizontal-error distribution.
+        report_path: Markdown report written under ``report_dir``.
+        emissions_count: Total estimator-output records consumed.
+        wall_clock_s: Wall-clock seconds for the orchestration run
+            (excludes the C3 fixture setup; covers steps 1-2-6-7).
+        replay_subprocess_seconds: Wall-clock seconds the airborne
+            replay subprocess took. Always <= ``wall_clock_s``.
+    """
+
+    verdict_passed: bool
+    distribution: HorizontalErrorDistribution
+    report_path: Path
+    emissions_count: int
+    wall_clock_s: float
+    replay_subprocess_seconds: float
+
+
+def read_calibration_acquisition_method(calibration_path: Path) -> str:
+    """Return the AZ-702 ``acquisition_method`` field, or ``"unknown"``.
+
+    Mirrors ``test_derkachi_real_tlog._read_calibration_acquisition_method``
+    so the AZ-840 verdict report can name the calibration provenance
+    in its failure message (AZ-699 AC-3). Pure helper; the report
+    writer needs the string, not the JSON.
+    """
+    try:
+        data = json.loads(calibration_path.read_text())
+    except (OSError, json.JSONDecodeError):
+        return "unknown"
+    method = data.get("acquisition_method")
+    if isinstance(method, str) and method:
+        return method
+    return "unknown"
+
+
+def write_effective_replay_config(
+    *,
+    base_config_path: Path,
+    cache_root: Path,
+    output_path: Path,
+) -> Path:
+    """Merge cache_root override into the static operator YAML.
+
+    Reads ``base_config_path`` as YAML, sets the
+    ``c6_tile_cache.root_dir`` to ``cache_root`` (forcing the
+    FAISS index path to fall back to ``<cache_root>/descriptor.index``),
+    and writes the merged document to ``output_path`` as YAML.
+
+    The merge is field-level: every other block in the base YAML
+    keeps its values. Key order and comments are not preserved,
+    because the document is re-serialised with ``sort_keys=True``.
+    This keeps a single source of truth for the operator config:
+    the test harness only contributes the dynamic cache_root.
+
+    Returns:
+        The ``output_path`` argument, for ergonomic chaining.
+
+    Raises:
+        OrchestrationFailure (step=WRITE_EFFECTIVE_CONFIG): Base YAML
+            unreadable, malformed, or not a top-level mapping.
+    """
+
+    try:
+        base_text = base_config_path.read_text()
+    except OSError as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
+            f"cannot read base config at {base_config_path}: {exc!r}",
+        ) from exc
+
+    try:
+        base_data = yaml.safe_load(base_text) or {}
+    except yaml.YAMLError as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
+            f"base config YAML at {base_config_path} is malformed: {exc!r}",
+        ) from exc
+    if not isinstance(base_data, dict):
+        raise OrchestrationFailure(
+            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
+            f"base config YAML at {base_config_path} must be a mapping; "
+            f"got {type(base_data).__name__}",
+        )
+
+    c6_block_raw = base_data.get("c6_tile_cache")
+    c6_block = dict(c6_block_raw) if isinstance(c6_block_raw, dict) else {}
+    c6_block["root_dir"] = str(cache_root)
+    c6_block["faiss_index_path"] = ""
+    base_data["c6_tile_cache"] = c6_block
+
+    try:
+        output_path.parent.mkdir(parents=True, exist_ok=True)
+        output_path.write_text(
+            yaml.safe_dump(base_data, sort_keys=True, default_flow_style=False)
+        )
+    except OSError as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
+            f"cannot write effective config at {output_path}: {exc!r}",
+        ) from exc
+    return output_path
+
+
+def run_e2e_orchestration(
+    *,
+    populated_cache: PopulatedC6Cache,
+    base_config_path: Path,
+    tlog_path: Path,
+    video_path: Path,
+    calibration_path: Path,
+    signing_key_path: Path,
+    replay_binary: Path,
+    output_path: Path,
+    report_dir: Path,
+    effective_config_path: Path,
+    run_date_utc: str | None = None,
+    runner: Callable[..., subprocess.CompletedProcess[str]] = subprocess.run,
+    subprocess_env: Mapping[str, str] | None = None,
+    max_seconds: float = _DEFAULT_MAX_SECONDS,
+    logger: logging.Logger | None = None,
+) -> OrchestrationReport:
+    """Run AZ-835 steps 1, 2, 6 and 7 against the AZ-839 populated cache.
+
+    Steps 3-5 are the responsibility of ``populated_cache`` (the
+    AZ-839 C3 fixture); this function covers 1-2-6 (the airborne
+    replay subprocess) and 7 (verdict report). The C3 fixture and
+    this function share the cache_root via
+    :func:`write_effective_replay_config` so the airborne binary
+    reads the same FAISS index the fixture wrote (AC-3).
+
+    Args:
+        populated_cache: C3 fixture output (AZ-839). Carries
+            ``cache_root``, ``faiss_index_path``, and the route
+            spec the test pipeline produced.
+        base_config_path: Static operator config YAML
+            (``GPS_DENIED_OPERATOR_CONFIG_PATH``). Must register
+            ``c6_tile_cache``, ``c10_provisioning``, ``c2_vpr``,
+            ``c4_pose``, and ``c5_state`` blocks for the airborne
+            binary to compose the replay graph.
+        tlog_path: ArduPilot binary tlog the test consumes.
+        video_path: Flight video file the test consumes.
+        calibration_path: Camera calibration JSON (AZ-702
+            factory-sheet for Derkachi).
+        signing_key_path: MAVLink signing-key file. Replay protocol
+            Invariant 11 — required even for the noop transport.
+        replay_binary: ``gps-denied-replay`` console-script path.
+        output_path: Where the airborne binary writes JSONL
+            estimator emissions.
+        report_dir: Directory the verdict markdown is written to.
+        effective_config_path: Where the cache_root-merged YAML is
+            written. The path is passed to the airborne binary via
+            ``--config``.
+        run_date_utc: ISO-8601 date for the report filename and
+            header. Defaults to today UTC.
+        runner: ``subprocess.run`` by default; tests inject a fake
+            that emits a synthetic JSONL output.
+        subprocess_env: Optional environment for the replay
+            subprocess. When given, it replaces the inherited
+            environment rather than overlaying it; ``None``
+            inherits ``os.environ``.
+        max_seconds: Hard wall-clock cap for the airborne replay
+            subprocess. The orchestrator times out the runner via
+            its ``timeout`` kwarg; an exceeded budget surfaces as
+            ``OrchestrationFailure(step=AIRBORNE_PIPELINE)``.
+        logger: Optional logger. Defaults to the module logger.
+
+    Returns:
+        :class:`OrchestrationReport` on success. The verdict can
+        be PASS or FAIL — AC-2 mandates the report exists either
+        way.
+
+    Raises:
+        OrchestrationFailure: Any of the 7 steps failed. The
+            ``step`` attribute names the failing step.
+    """
+
+    log = logger or _LOGGER
+    started = time.monotonic()
+    effective_run_date = run_date_utc or (
+        datetime.datetime.now(datetime.timezone.utc).date().isoformat()
+    )
+
+    _validate_inputs(
+        base_config_path=base_config_path,
+        tlog_path=tlog_path,
+        video_path=video_path,
+        calibration_path=calibration_path,
+        signing_key_path=signing_key_path,
+        replay_binary=replay_binary,
+        report_dir=report_dir,
+    )
+
+    write_effective_replay_config(
+        base_config_path=base_config_path,
+        cache_root=populated_cache.cache_root,
+        output_path=effective_config_path,
+    )
+
+    replay_subprocess_seconds = _run_replay_subprocess(
+        replay_binary=replay_binary,
+        video_path=video_path,
+        tlog_path=tlog_path,
+        output_path=output_path,
+        calibration_path=calibration_path,
+        config_path=effective_config_path,
+        signing_key_path=signing_key_path,
+        max_seconds=max_seconds,
+        runner=runner,
+        env=subprocess_env,
+        logger=log,
+    )
+
+    emissions = _parse_jsonl(output_path)
+
+    ground_truth = _load_ground_truth(tlog_path)
+
+    distribution = _compute_distribution(emissions, ground_truth)
+
+    context = ReportContext(
+        run_date_utc=effective_run_date,
+        tlog_path=tlog_path,
+        video_path=video_path,
+        calibration_acquisition_method=read_calibration_acquisition_method(
+            calibration_path
+        ),
+        clip_duration_s=(
+            ground_truth[-1].t_s - ground_truth[0].t_s
+            if ground_truth
+            else 0.0
+        ),
+        emissions_count=len(emissions),
+    )
+    verdict_passed = verdict_passes_ac3(distribution)
+    report_path = _render_and_write_report(
+        distribution=distribution,
+        context=context,
+        passed=verdict_passed,
+        report_dir=report_dir,
+    )
+
+    log.info(
+        "e2e_orchestrator: report written",
+        extra={
+            "kind": "e2e_orchestrator.report_written",
+            "kv": {
+                "report_path": str(report_path),
+                "verdict_passed": verdict_passed,
+                "share_within_threshold_pct": (
+                    distribution.threshold_hit_share.get(
+                        AC3_GATE_THRESHOLD_M, 0.0
+                    )
+                    * 100.0
+                ),
+                "ac3_gate_pct": AC3_GATE_PCT,
+                "emissions_count": len(emissions),
+                "ground_truth_pairings": distribution.count,
+            },
+        },
+    )
+
+    wall_clock_s = max(0.0, time.monotonic() - started)
+    return OrchestrationReport(
+        verdict_passed=verdict_passed,
+        distribution=distribution,
+        report_path=report_path,
+        emissions_count=len(emissions),
+        wall_clock_s=wall_clock_s,
+        replay_subprocess_seconds=replay_subprocess_seconds,
+    )
+
+
+def _validate_inputs(
+    *,
+    base_config_path: Path,
+    tlog_path: Path,
+    video_path: Path,
+    calibration_path: Path,
+    signing_key_path: Path,
+    replay_binary: Path,
+    report_dir: Path,
+) -> None:
+    """Fail fast on missing inputs (AC-5 — surface the failing step early)."""
+    file_inputs: tuple[tuple[str, Path], ...] = (
+        ("base_config_path", base_config_path),
+        ("tlog_path", tlog_path),
+        ("video_path", video_path),
+        ("calibration_path", calibration_path),
+        ("signing_key_path", signing_key_path),
+        ("replay_binary", replay_binary),
+    )
+    for label, path in file_inputs:
+        if not path.is_file():
+            raise OrchestrationFailure(
+                OrchestratorStep.VALIDATE_INPUTS,
+                f"{label} is not a file: {path}",
+            )
+    try:
+        report_dir.mkdir(parents=True, exist_ok=True)
+    except OSError as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.VALIDATE_INPUTS,
+            f"report_dir {report_dir} cannot be created: {exc!r}",
+        ) from exc
+
+
+def _run_replay_subprocess(
+    *,
+    replay_binary: Path,
+    video_path: Path,
+    tlog_path: Path,
+    output_path: Path,
+    calibration_path: Path,
+    config_path: Path,
+    signing_key_path: Path,
+    max_seconds: float,
+    runner: Callable[..., subprocess.CompletedProcess[str]],
+    env: Mapping[str, str] | None,
+    logger: logging.Logger,
+) -> float:
+    """Invoke gps-denied-replay with --auto-trim; return wall-clock seconds.
+
+    Wraps :func:`subprocess.run` so unit tests can inject a fake
+    runner. ``--auto-trim`` is always enabled here — the
+    orchestrator owns the AZ-405 / AZ-698 sync path (AZ-840 step 1).
+
+    Raises:
+        OrchestrationFailure (step=AIRBORNE_PIPELINE): Non-zero exit,
+            timeout, or runner-level OSError.
+    """
+
+    argv = [
+        str(replay_binary),
+        "--video",
+        str(video_path),
+        "--tlog",
+        str(tlog_path),
+        "--output",
+        str(output_path),
+        "--camera-calibration",
+        str(calibration_path),
+        "--config",
+        str(config_path),
+        "--mavlink-signing-key",
+        str(signing_key_path),
+        "--pace",
+        "asap",
+        "--auto-trim",
+    ]
+    started = time.monotonic()
+    try:
+        completed = runner(
+            argv,
+            capture_output=True,
+            text=True,
+            timeout=max_seconds,
+            env=dict(env) if env is not None else None,
+        )
+    except subprocess.TimeoutExpired as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.AIRBORNE_PIPELINE,
+            f"gps-denied-replay timed out after {max_seconds:.0f} s",
+        ) from exc
+    except OSError as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.AIRBORNE_PIPELINE,
+            f"cannot launch gps-denied-replay at {replay_binary}: {exc!r}",
+        ) from exc
+
+    elapsed_s = max(0.0, time.monotonic() - started)
+    if completed.returncode != 0:
+        raise OrchestrationFailure(
+            OrchestratorStep.AIRBORNE_PIPELINE,
+            f"gps-denied-replay exited {completed.returncode}\n"
+            f"stdout:\n{completed.stdout}\nstderr:\n{completed.stderr}",
+        )
+    logger.info(
+        "e2e_orchestrator: replay subprocess complete",
+        extra={
+            "kind": "e2e_orchestrator.replay_subprocess",
+            "kv": {
+                "elapsed_s": elapsed_s,
+                "max_seconds": max_seconds,
+            },
+        },
+    )
+    return elapsed_s
+
+
+def _parse_jsonl(path: Path) -> list[dict[str, Any]]:
+    """Read one JSON record per non-blank line.
+
+    Raises:
+        OrchestrationFailure (step=PARSE_EMISSIONS): Output file
+            missing, unreadable, has zero records, or contains a
+            malformed line.
+    """
+    if not path.is_file():
+        raise OrchestrationFailure(
+            OrchestratorStep.PARSE_EMISSIONS,
+            f"replay output JSONL not found: {path}",
+        )
+    try:
+        text = path.read_text()
+    except OSError as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.PARSE_EMISSIONS,
+            f"replay output JSONL unreadable at {path}: {exc!r}",
+        ) from exc
+    rows: list[dict[str, Any]] = []
+    for line_idx, line in enumerate(text.splitlines(), start=1):
+        if not line.strip():
+            continue
+        try:
+            row = json.loads(line)
+        except json.JSONDecodeError as exc:
+            raise OrchestrationFailure(
+                OrchestratorStep.PARSE_EMISSIONS,
+                f"malformed JSON at line {line_idx} of {path}: {exc.msg}",
+            ) from exc
+        if not isinstance(row, dict):
+            raise OrchestrationFailure(
+                OrchestratorStep.PARSE_EMISSIONS,
+                f"line {line_idx} of {path} is not a JSON object: {row!r}",
+            )
+        rows.append(row)
+    if not rows:
+        raise OrchestrationFailure(
+            OrchestratorStep.PARSE_EMISSIONS,
+            f"replay output JSONL at {path} has zero records — pipeline "
+            "produced no estimator emissions",
+        )
+    return rows
+
+
+def _load_ground_truth(tlog_path: Path) -> list[GroundTruthRow]:
+    """Extract WGS84 ground truth from the binary tlog.
+
+    Raises:
+        OrchestrationFailure (step=LOAD_GROUND_TRUTH): Loader
+            error or empty record list.
+    """
+    try:
+        series = load_tlog_ground_truth(tlog_path).records
+    except Exception as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.LOAD_GROUND_TRUTH,
+            f"load_tlog_ground_truth({tlog_path}) failed: {exc!r}",
+        ) from exc
+    rows: list[GroundTruthRow] = [
+        GroundTruthRow(
+            t_s=fix.ts_ns / 1e9,
+            lat_deg=fix.lat_deg,
+            lon_deg=fix.lon_deg,
+            alt_m=fix.alt_m,
+        )
+        for fix in series
+    ]
+    if not rows:
+        raise OrchestrationFailure(
+            OrchestratorStep.LOAD_GROUND_TRUTH,
+            f"tlog ground truth at {tlog_path} has zero rows",
+        )
+    return rows
+
+
+def _compute_distribution(
+    emissions: list[dict[str, Any]],
+    ground_truth: list[GroundTruthRow],
+) -> HorizontalErrorDistribution:
+    """Compute the horizontal-error distribution.
+
+    Raises:
+        OrchestrationFailure (step=COMPUTE_DISTRIBUTION): Helper
+            error or zero ground-truth pairings (every emission
+            fell outside the GT time window).
+    """
+    try:
+        distribution = horizontal_error_distribution(emissions, ground_truth)
+    except Exception as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.COMPUTE_DISTRIBUTION,
+            f"horizontal_error_distribution failed: {exc!r}",
+        ) from exc
+    if distribution.count == 0:
+        raise OrchestrationFailure(
+            OrchestratorStep.COMPUTE_DISTRIBUTION,
+            "no emissions paired with ground truth — JSONL timestamps "
+            "outside the tlog GPS window?",
+        )
+    return distribution
+
+
+def _render_and_write_report(
+    *,
+    distribution: HorizontalErrorDistribution,
+    context: ReportContext,
+    passed: bool,
+    report_dir: Path,
+) -> Path:
+    """Render the verdict markdown and write it to ``report_dir``.
+
+    Raises:
+        OrchestrationFailure (step=RENDER_REPORT): Render or write
+            failure; ``report_dir`` was already created by
+            :func:`_validate_inputs`.
+    """
+    try:
+        report_text = render_report(distribution, context, passed=passed)
+    except Exception as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.RENDER_REPORT,
+            f"render_report failed: {exc!r}",
+        ) from exc
+    report_path = (
+        report_dir / f"real_flight_validation_{context.run_date_utc}.md"
+    )
+    try:
+        report_path.write_text(report_text)
+    except OSError as exc:
+        raise OrchestrationFailure(
+            OrchestratorStep.RENDER_REPORT,
+            f"cannot write report at {report_path}: {exc!r}",
+        ) from exc
+    return report_path
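Note on the step-labelled failures used by the helpers above: the pattern can be exercised in isolation. The sketch below is a minimal approximation, not the code in `_e2e_orchestrator.py`. `Step`, `StepFailure`, `parse_jsonl_text` and the `[step] detail` message format are hypothetical stand-ins chosen for illustration; only the control flow (skip blank lines, reject malformed or non-object lines, reject an empty result) follows `_parse_jsonl`.

```python
import enum
import json
from typing import Any


class Step(enum.Enum):
    """Hypothetical stand-in for OrchestratorStep (one member only)."""

    PARSE_EMISSIONS = "parse_emissions"


class StepFailure(RuntimeError):
    """Hypothetical stand-in for OrchestrationFailure.

    The "[step] detail" prefix is an assumed format, not the real one.
    """

    def __init__(self, step: Step, detail: str) -> None:
        super().__init__(f"[{step.value}] {detail}")
        self.step = step


def parse_jsonl_text(text: str) -> list[dict[str, Any]]:
    """Parse one JSON object per non-blank line, labelling every failure."""
    rows: list[dict[str, Any]] = []
    for line_idx, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are tolerated, as in _parse_jsonl
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            raise StepFailure(
                Step.PARSE_EMISSIONS,
                f"malformed JSON at line {line_idx}: {exc.msg}",
            ) from exc
        if not isinstance(row, dict):
            raise StepFailure(
                Step.PARSE_EMISSIONS,
                f"line {line_idx} is not a JSON object: {row!r}",
            )
        rows.append(row)
    if not rows:
        raise StepFailure(Step.PARSE_EMISSIONS, "zero records")
    return rows
```

Because the step travels on the exception as an attribute, a caller can assert on `exc.step` with `is` instead of matching message text, which is what the unit tests in this commit do.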
@@ -0,0 +1,182 @@
+"""AZ-840 — E2E orchestrator integration test (AC-1 / AC-2 / AC-3 / AC-4 / AC-6).
+
+The Tier-2 entry point that closes Epic AZ-835's narrative: from a
+``(tlog, video, calibration)`` triple, run the full 7-step pipeline
+end-to-end on the Jetson harness without operator hand-curation
+between steps.
+
+The test consumes:
+
+* :func:`tests.e2e.replay.conftest.operator_pre_flight_setup` —
+  the AZ-839 C3 fixture that owns steps 3-5 (route extraction +
+  satellite-provider seeding + FAISS index build) and yields a
+  :class:`PopulatedC6Cache` keyed off a freshly-mktemp'd
+  ``cache_root``.
+* :func:`tests.e2e.replay.conftest.derkachi_replay_inputs` — the
+  shared session fixture that materialises the Derkachi tlog +
+  video + factory-sheet calibration + signing-key file.
+* :func:`tests.e2e.replay._e2e_orchestrator.run_e2e_orchestration`
+  — the AC-1 driver that wires everything below the C3 fixture.
+
+The driver writes a fresh effective replay config per session
+(merging the static operator YAML with the cache_root override),
+invokes ``gps-denied-replay --auto-trim``, parses the JSONL
+emissions, computes the horizontal-error distribution, and writes
+the verdict markdown under ``_docs/06_metrics/`` (AC-2).
+
+Skip gates (in evaluation order):
+
+1. ``@pytest.mark.tier2`` — the per-suite Tier-2 plugin gates this
+   off on dev macOS (matches the AZ-839 / AZ-699 contract).
+2. ``RUN_REPLAY_E2E`` not in ``{1, true, yes, on}``.
+3. ``GPS_DENIED_OPERATOR_CONFIG_PATH`` unset or blank (the same
+   env var the C3 fixture consumes).
+4. ``gps-denied-replay`` console-script found neither on ``PATH``
+   nor next to the active interpreter.
+5. Real video missing or placeholder-sized (mirrors AZ-699's gate).
+6. ``operator_pre_flight_setup`` fixture itself skipped: the
+   downstream consumer inherits the SKIP automatically (pytest's
+   fixture-skip propagation).
+
+AC-7 (AZ-699 continues to pass) is satisfied by inspection: this
+test does not modify ``test_derkachi_real_tlog.py``. It writes its
+report to the same path (``real_flight_validation_<date>.md``), so
+whichever test runs last overwrites the file; on a given clip both
+tests are expected to reach the same PASS or FAIL verdict.
+"""
+
+from __future__ import annotations
+
+import os
+import shutil
+import sys
+from collections.abc import Iterator
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.replay._e2e_orchestrator import (
+    OrchestrationReport,
+    run_e2e_orchestration,
+)
+from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
+from tests.e2e.replay.conftest import DerkachiReplayInputs
+
+
+def _repo_root() -> Path:
+    return Path(__file__).resolve().parents[3]
+
+
+def _derkachi_dir() -> Path:
+    return _repo_root() / "_docs" / "00_problem" / "input_data" / "flight_derkachi"
+
+
+_MIN_REAL_VIDEO_BYTES: int = 1_000_000
+
+
+def _replay_binary() -> Path | None:
+    """Return the absolute path to ``gps-denied-replay`` or ``None``.
+
+    Same lookup order AZ-699 uses: PATH first, venv bin second.
+    """
+
+    binary = shutil.which("gps-denied-replay")
+    if binary is not None:
+        return Path(binary)
+    venv_bin = Path(sys.executable).parent / "gps-denied-replay"
+    if venv_bin.exists():
+        return venv_bin
+    return None
+
+
+def _orchestrator_skip_reason() -> str | None:
+    """Return a SKIP message when env / inputs preclude a Jetson run."""
+
+    if os.environ.get("RUN_REPLAY_E2E", "").strip().lower() not in {
+        "1",
+        "true",
+        "yes",
+        "on",
+    }:
+        return "AZ-840 e2e orchestrator gated by RUN_REPLAY_E2E=1"
+    if not os.environ.get("GPS_DENIED_OPERATOR_CONFIG_PATH", "").strip():
+        return (
+            "AZ-840 e2e orchestrator requires GPS_DENIED_OPERATOR_CONFIG_PATH "
+            "(same env var the C3 fixture consumes)"
+        )
+    if _replay_binary() is None:
+        return "gps-denied-replay console-script not installed"
+    video = _derkachi_dir() / "flight_derkachi.mp4"
+    if not video.is_file():
+        return f"Derkachi video missing: {video}"
+    if video.stat().st_size < _MIN_REAL_VIDEO_BYTES:
+        return (
+            f"Derkachi video at {video} is only {video.stat().st_size} "
+            "bytes — placeholder, not a real recording"
+        )
+    return None
+
+
+@pytest.fixture
+def az840_skip_gate() -> Iterator[None]:
+    """Skip-gate the test before later function-scoped fixtures resolve.
+
+    pytest sets up higher-scoped fixtures (e.g. session) before this one.
+    """
+
+    reason = _orchestrator_skip_reason()
+    if reason is not None:
+        pytest.skip(reason)
+    yield
+
+
+@pytest.mark.tier2
+def test_az840_e2e_real_flight_orchestration(
+    az840_skip_gate: None,
+    operator_pre_flight_setup: PopulatedC6Cache,
+    derkachi_replay_inputs: DerkachiReplayInputs,
+    tmp_path: Path,
+) -> None:
+    # Arrange — every input besides cache_root comes from the existing
+    # session fixtures so the same Tier-2 harness setup that powers
+    # AZ-699 + AZ-839 is exercised.
+    binary = _replay_binary()
+    assert binary is not None, "skip gate already verified the binary exists"
+    base_config_path = Path(os.environ["GPS_DENIED_OPERATOR_CONFIG_PATH"])
+    output_path = tmp_path / "estimator_output.jsonl"
+    effective_config_path = tmp_path / "operator_config_effective.yaml"
+    report_dir = _repo_root() / "_docs" / "06_metrics"
+
+    # Act
+    report = run_e2e_orchestration(
+        populated_cache=operator_pre_flight_setup,
+        base_config_path=base_config_path,
+        tlog_path=derkachi_replay_inputs.tlog_path,
+        video_path=derkachi_replay_inputs.video_path,
+        calibration_path=derkachi_replay_inputs.calibration_path,
+        signing_key_path=derkachi_replay_inputs.signing_key_path,
+        replay_binary=binary,
+        output_path=output_path,
+        report_dir=report_dir,
+        effective_config_path=effective_config_path,
+    )
+
+    # Assert AC-2 + AC-4: report exists; replay subprocess within the 15-min budget.
+    assert isinstance(report, OrchestrationReport)
+    assert report.report_path.is_file()
+    body = report.report_path.read_text()
+    assert "## Horizontal error (metres)" in body
+    assert "## Threshold-hit share" in body
+    assert "Mean" in body
+    for threshold in (10, 25, 50, 100):
+        assert f"| {threshold} |" in body, (
+            f"threshold {threshold} m row missing from report"
+        )
+    assert report.replay_subprocess_seconds <= 900.0, (
+        "AZ-840 AC-4: replay subprocess exceeded 15-min soft target"
+    )
+    assert report.wall_clock_s >= report.replay_subprocess_seconds
+    assert report.distribution.count > 0, (
+        "no emissions paired with ground truth — orchestration produced "
+        "data but every emission fell outside the tlog GPS window"
+    )
+
+    # Assert AC-3 — the effective config was written and points at the
+    # cache_root the C3 fixture supplied.
+    assert effective_config_path.is_file()
+    effective_text = effective_config_path.read_text()
+    assert str(operator_pre_flight_setup.cache_root) in effective_text
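Note on the effective-config write described in the module docstring above: the field-level overlay can be sketched on plain dicts. This is a hedged approximation, not the body of `write_effective_replay_config`; `overlay_cache_root` is a hypothetical name, and YAML loading and dumping are left out so the example stays stdlib-only. Blanking `faiss_index_path` mirrors what the unit test in this commit asserts for a config that already carries the block; blanking it when the block is absent is an assumption.

```python
import copy
from typing import Any


def overlay_cache_root(base: dict[str, Any], cache_root: str) -> dict[str, Any]:
    """Return a copy of ``base`` with c6_tile_cache.root_dir overridden.

    Hypothetical helper: every other key is carried over verbatim and the
    input mapping is never mutated (the static config stays untouched).
    """
    merged = copy.deepcopy(base)
    block = merged.setdefault("c6_tile_cache", {})
    if not isinstance(block, dict):
        raise TypeError("c6_tile_cache must be a mapping")
    block["root_dir"] = cache_root
    # Assumption: the static index path is blanked so it cannot point
    # outside the freshly chosen cache root.
    block["faiss_index_path"] = ""
    return merged


static_config = {
    "mode": "replay",
    "c6_tile_cache": {
        "store_runtime": "postgres_filesystem",
        "root_dir": "/var/lib/gps-denied/tiles",
        "faiss_index_path": "/some/static/path/descriptor.index",
    },
}
effective = overlay_cache_root(static_config, "/tmp/cache_root")
```

Working on a deep copy is what keeps the merge side-effect free: `static_config` still holds its original `root_dir` after the call, matching the commit's statement that the static YAML on disk is never touched.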
@@ -0,0 +1,671 @@
+"""Unit tests for the AZ-840 e2e orchestrator (AC-8).
+
+The end-to-end happy path is the Tier-2 integration test in
+``test_az835_e2e_real_flight.py`` (AC-1 / AC-2). This module covers
+the orchestration helper layer in isolation:
+
+* Param validation — every required path must exist before the
+  airborne subprocess is spawned (AC-5 fails LOUD).
+* Effective-config merge — the ``c6_tile_cache.root_dir`` override
+  is written to YAML; the rest of the base config is preserved.
+* Error propagation per step — every documented failure surfaces
+  as :class:`OrchestrationFailure` with the correct
+  :class:`OrchestratorStep` label.
+* Happy path — when the runner returns success and the JSONL +
+  ground truth align, :class:`OrchestrationReport` carries a
+  written report path and an honest verdict (AC-2: report exists
+  PASS or FAIL).
+
+The tests inject a fake ``runner`` so no real
+``gps-denied-replay`` subprocess is spawned. Real binary execution
+is exercised on the Jetson harness via the AC-1 integration test.
+"""
+
+from __future__ import annotations
+
+import json
+import subprocess
+from pathlib import Path
+from unittest.mock import MagicMock
+
+import pytest
+import yaml
+
+from gps_denied_onboard.helpers.accuracy_report import (
+    AC3_GATE_THRESHOLD_M,
+)
+from gps_denied_onboard.replay_input.tlog_route import RouteSpec
+
+from tests.e2e.replay._e2e_orchestrator import (
+    OrchestrationFailure,
+    OrchestrationReport,
+    OrchestratorStep,
+    read_calibration_acquisition_method,
+    run_e2e_orchestration,
+    write_effective_replay_config,
+)
+from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
+
+
+# ----------------------------------------------------------------------
+# Helpers
+
+
+def _build_populated_cache(tmp_path: Path) -> PopulatedC6Cache:
+    """Construct a synthetic :class:`PopulatedC6Cache`.
+
+    The orchestrator only consumes ``cache_root`` from the cache,
+    so the FAISS sidecar paths are placeholders. The route_spec is
+    a minimal one-waypoint instance — no AZ-836 invariants are
+    re-asserted by AZ-840.
+    """
+
+    cache_root = tmp_path / "cache_root"
+    cache_root.mkdir()
+    return PopulatedC6Cache(
+        cache_root=cache_root,
+        tile_store_path=cache_root / "tiles",
+        faiss_index_path=cache_root / "descriptor.index",
+        faiss_sidecar_sha256_path=cache_root / "descriptor.index.sha256",
+        faiss_sidecar_meta_path=cache_root / "descriptor.index.meta.json",
+        route_spec=RouteSpec(
+            waypoints=((50.10, 36.10),),
+            suggested_region_size_meters=500.0,
+            source_tlog=Path("test.tlog"),
+            source_segment=(0, 100),
+            total_distance_meters=0.0,
+        ),
+        tile_count=1,
+        elapsed_seconds=0.0,
+    )
+
+
+def _stage_inputs(tmp_path: Path) -> dict[str, Path]:
+    """Write touch-files for every input path the orchestrator validates.
+
+    The base config YAML carries one stub block so the merge step
+    has a real document to overlay on.
+    """
+
+    base_config = tmp_path / "operator_config.yaml"
+    base_config.write_text(
+        yaml.safe_dump(
+            {
+                "mode": "replay",
+                "c6_tile_cache": {
+                    "store_runtime": "postgres_filesystem",
+                    "metadata_runtime": "postgres_filesystem",
+                    "descriptor_index_runtime": "faiss_hnsw",
+                    "root_dir": "/var/lib/gps-denied/tiles",
+                    "faiss_index_path": "/some/static/path/descriptor.index",
+                },
+            }
+        )
+    )
+
+    tlog = tmp_path / "input.tlog"
+    tlog.write_bytes(b"\x00")
+    video = tmp_path / "input.mp4"
+    video.write_bytes(b"\x00")
+    calibration = tmp_path / "calibration.json"
+    calibration.write_text(json.dumps({"acquisition_method": "factory-sheet"}))
+    signing_key = tmp_path / "signing_key.bin"
+    signing_key.write_bytes(b"\x00" * 32)
+    binary = tmp_path / "gps-denied-replay"
+    binary.write_text("")
+
+    return {
+        "base_config_path": base_config,
+        "tlog_path": tlog,
+        "video_path": video,
+        "calibration_path": calibration,
+        "signing_key_path": signing_key,
+        "replay_binary": binary,
+    }
+
+
+def _ground_truth_tlog_loader(
+    monkeypatch: pytest.MonkeyPatch,
+    *,
+    times_s: tuple[float, ...] = (0.0, 1.0, 2.0),
+    lat_deg: float = 50.10,
+    lon_deg: float = 36.10,
+    alt_m: float = 100.0,
+) -> None:
+    """Stub the orchestrator's ground-truth loader so unit tests skip MAVLink.
+
+    The orchestrator imports ``load_tlog_ground_truth`` from
+    ``gps_denied_onboard.replay_input``; patching the symbol *as
+    bound on the orchestrator module* keeps the patch local to the
+    unit suite (no cross-test bleed).
+    """
+
+    fixes = [
+        _StubGpsFix(
+            ts_ns=int(t * 1e9),
+            lat_deg=lat_deg,
+            lon_deg=lon_deg,
+            alt_m=alt_m,
+        )
+        for t in times_s
+    ]
+    series = _StubGpsSeries(records=tuple(fixes))
+    monkeypatch.setattr(
+        "tests.e2e.replay._e2e_orchestrator.load_tlog_ground_truth",
+        lambda *_args, **_kwargs: series,
+    )
+
+
+class _StubGpsFix:
+    """Mirrors the fields the orchestrator reads from each tlog row."""
+
+    __slots__ = ("ts_ns", "lat_deg", "lon_deg", "alt_m")
+
+    def __init__(
+        self, *, ts_ns: int, lat_deg: float, lon_deg: float, alt_m: float
+    ) -> None:
+        self.ts_ns = ts_ns
+        self.lat_deg = lat_deg
+        self.lon_deg = lon_deg
+        self.alt_m = alt_m
+
+
+class _StubGpsSeries:
+    """Drop-in replacement for :class:`TlogGroundTruth`."""
+
+    def __init__(self, *, records: tuple[_StubGpsFix, ...]) -> None:
+        self.records = records
+
+
+def _build_runner_emitting(
+    output_path: Path,
+    *,
+    rows: list[dict[str, object]],
+    returncode: int = 0,
+    stdout: str = "",
+    stderr: str = "",
+) -> MagicMock:
+    """Return a fake ``subprocess.run`` that writes JSONL on call."""
+
+    def _run(argv, **kwargs):  # type: ignore[no-untyped-def]
+        if rows:
+            output_path.parent.mkdir(parents=True, exist_ok=True)
+            output_path.write_text(
+                "\n".join(json.dumps(row) for row in rows) + "\n"
+            )
+        return subprocess.CompletedProcess(
+            args=argv,
+            returncode=returncode,
+            stdout=stdout,
+            stderr=stderr,
+        )
+
+    return MagicMock(side_effect=_run)
+
+
+# ----------------------------------------------------------------------
+# write_effective_replay_config
+
+
+def test_write_effective_replay_config_overlays_root_dir(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    inputs = _stage_inputs(tmp_path)
+    cache_root = tmp_path / "cache"
+    cache_root.mkdir()
+    output_path = tmp_path / "effective.yaml"
+
+    # Act
+    written_path = write_effective_replay_config(
+        base_config_path=inputs["base_config_path"],
+        cache_root=cache_root,
+        output_path=output_path,
+    )
+
+    # Assert
+    assert written_path == output_path
+    merged = yaml.safe_load(output_path.read_text())
+    assert merged["c6_tile_cache"]["root_dir"] == str(cache_root)
+    assert merged["c6_tile_cache"]["faiss_index_path"] == ""
+    assert merged["mode"] == "replay"
+    assert (
+        merged["c6_tile_cache"]["store_runtime"] == "postgres_filesystem"
+    ), "non-overridden c6_tile_cache fields must survive"
+
+
+def test_write_effective_replay_config_creates_block_when_absent(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    base = tmp_path / "operator.yaml"
+    base.write_text(yaml.safe_dump({"mode": "replay"}))
+    cache_root = tmp_path / "cache"
+    cache_root.mkdir()
+
+    # Act
+    write_effective_replay_config(
+        base_config_path=base,
+        cache_root=cache_root,
+        output_path=tmp_path / "effective.yaml",
+    )
+
+    # Assert
+    merged = yaml.safe_load((tmp_path / "effective.yaml").read_text())
+    assert merged["c6_tile_cache"]["root_dir"] == str(cache_root)
+
+
+def test_write_effective_replay_config_malformed_yaml_fails(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    base = tmp_path / "bad.yaml"
+    base.write_text(":\n  : not yaml:")
+    cache_root = tmp_path / "cache"
+    cache_root.mkdir()
+
+    # Act + Assert
+    with pytest.raises(OrchestrationFailure) as exc_info:
+        write_effective_replay_config(
+            base_config_path=base,
+            cache_root=cache_root,
+            output_path=tmp_path / "effective.yaml",
+        )
+    assert exc_info.value.step is OrchestratorStep.WRITE_EFFECTIVE_CONFIG
+
+
+def test_write_effective_replay_config_non_mapping_top_level_fails(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    base = tmp_path / "bad.yaml"
+    base.write_text("- not a mapping\n")
+    cache_root = tmp_path / "cache"
+    cache_root.mkdir()
+
+    # Act + Assert
+    with pytest.raises(OrchestrationFailure) as exc_info:
+        write_effective_replay_config(
+            base_config_path=base,
+            cache_root=cache_root,
+            output_path=tmp_path / "effective.yaml",
+        )
+    assert exc_info.value.step is OrchestratorStep.WRITE_EFFECTIVE_CONFIG
+
+
+# ----------------------------------------------------------------------
+# read_calibration_acquisition_method
+
+
+def test_read_calibration_acquisition_method_returns_field_when_present(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    path = tmp_path / "cal.json"
+    path.write_text(json.dumps({"acquisition_method": "factory-sheet"}))
+
+    # Assert
+    assert read_calibration_acquisition_method(path) == "factory-sheet"
+
+
+def test_read_calibration_acquisition_method_returns_unknown_on_missing(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    path = tmp_path / "cal.json"
+    path.write_text(json.dumps({"some_other_field": True}))
+
+    # Assert
+    assert read_calibration_acquisition_method(path) == "unknown"
+
+
+def test_read_calibration_acquisition_method_returns_unknown_on_malformed(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    path = tmp_path / "cal.json"
+    path.write_text("{not valid json")
+
+    # Assert
+    assert read_calibration_acquisition_method(path) == "unknown"
+
+
+# ----------------------------------------------------------------------
+# run_e2e_orchestration — param validation (AC-5)
+
+
+def test_run_e2e_orchestration_missing_tlog_fails_loud(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    cache = _build_populated_cache(tmp_path)
+    inputs = _stage_inputs(tmp_path)
+    inputs["tlog_path"].unlink()
+
+    # Act + Assert
+    with pytest.raises(OrchestrationFailure) as exc_info:
+        run_e2e_orchestration(
+            populated_cache=cache,
+            output_path=tmp_path / "out.jsonl",
+            report_dir=tmp_path / "metrics",
+            effective_config_path=tmp_path / "eff.yaml",
+            **inputs,  # type: ignore[arg-type]
+        )
+    assert exc_info.value.step is OrchestratorStep.VALIDATE_INPUTS
+    assert "tlog_path" in str(exc_info.value)
+
+
+def test_run_e2e_orchestration_missing_binary_fails_loud(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    cache = _build_populated_cache(tmp_path)
+    inputs = _stage_inputs(tmp_path)
+    inputs["replay_binary"].unlink()
+
+    # Act + Assert
+    with pytest.raises(OrchestrationFailure) as exc_info:
+        run_e2e_orchestration(
+            populated_cache=cache,
+            output_path=tmp_path / "out.jsonl",
+            report_dir=tmp_path / "metrics",
+            effective_config_path=tmp_path / "eff.yaml",
+            **inputs,  # type: ignore[arg-type]
+        )
+    assert exc_info.value.step is OrchestratorStep.VALIDATE_INPUTS
+    assert "replay_binary" in str(exc_info.value)
+
+
+# ----------------------------------------------------------------------
+# run_e2e_orchestration — subprocess error propagation (AC-5)
+
+
+def test_run_e2e_orchestration_replay_nonzero_exit_fails_loud(
+    tmp_path: Path,
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    # Arrange
+    cache = _build_populated_cache(tmp_path)
+    inputs = _stage_inputs(tmp_path)
+    output_path = tmp_path / "out.jsonl"
+    runner = MagicMock(
+        return_value=subprocess.CompletedProcess(
+            args=[],
+            returncode=1,
+            stdout="",
+            stderr="boom",
+        )
+    )
+    _ground_truth_tlog_loader(monkeypatch)
+
+    # Act + Assert
+    with pytest.raises(OrchestrationFailure) as exc_info:
+        run_e2e_orchestration(
+            populated_cache=cache,
+            output_path=output_path,
+            report_dir=tmp_path / "metrics",
+            effective_config_path=tmp_path / "eff.yaml",
+            runner=runner,
+            **inputs,  # type: ignore[arg-type]
+        )
+    assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
+    assert "exited 1" in str(exc_info.value)
+    assert "boom" in str(exc_info.value)
+
+
+def test_run_e2e_orchestration_replay_timeout_fails_loud(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    cache = _build_populated_cache(tmp_path)
+    inputs = _stage_inputs(tmp_path)
+
+    def _timeout(*_args, **_kwargs):
+        raise subprocess.TimeoutExpired(cmd=["replay"], timeout=0.1)
+
+    runner = MagicMock(side_effect=_timeout)
+
+    # Act + Assert
+    with pytest.raises(OrchestrationFailure) as exc_info:
+        run_e2e_orchestration(
+            populated_cache=cache,
+            output_path=tmp_path / "out.jsonl",
+            report_dir=tmp_path / "metrics",
+            effective_config_path=tmp_path / "eff.yaml",
+            runner=runner,
+            max_seconds=0.1,
+            **inputs,  # type: ignore[arg-type]
+        )
+    assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
+    assert "timed out" in str(exc_info.value)
+
+
+def test_run_e2e_orchestration_replay_oserror_fails_loud(
+    tmp_path: Path,
+) -> None:
+    # Arrange
+    cache = _build_populated_cache(tmp_path)
+    inputs = _stage_inputs(tmp_path)
+
+    def _oserror(*_args, **_kwargs):
+        raise OSError("permission denied")
+
+    runner = MagicMock(side_effect=_oserror)
+
+    # Act + Assert
+    with pytest.raises(OrchestrationFailure) as exc_info:
+        run_e2e_orchestration(
+            populated_cache=cache,
+            output_path=tmp_path / "out.jsonl",
+            report_dir=tmp_path / "metrics",
+            effective_config_path=tmp_path / "eff.yaml",
+            runner=runner,
+            **inputs,  # type: ignore[arg-type]
+        )
+    assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
+    assert "permission denied" in str(exc_info.value)
+
+
+# ----------------------------------------------------------------------
+# run_e2e_orchestration — empty / malformed JSONL (AC-5)
+
+
+def test_run_e2e_orchestration_empty_jsonl_fails_loud(
+    tmp_path: Path,
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    # Arrange
+    cache = _build_populated_cache(tmp_path)
+    inputs = _stage_inputs(tmp_path)
+    output_path = tmp_path / "out.jsonl"
+
+    def _runner(argv, **_kwargs):  # type: ignore[no-untyped-def]
+        output_path.write_text("\n\n")  # only blanks
+        return subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
+
+    runner = MagicMock(side_effect=_runner)
+    _ground_truth_tlog_loader(monkeypatch)
+
+    # Act + Assert
+    with pytest.raises(OrchestrationFailure) as exc_info:
+        run_e2e_orchestration(
+            populated_cache=cache,
+            output_path=output_path,
+            report_dir=tmp_path / "metrics",
+            effective_config_path=tmp_path / "eff.yaml",
+            runner=runner,
+            **inputs,  # type: ignore[arg-type]
+        )
+    assert exc_info.value.step is OrchestratorStep.PARSE_EMISSIONS
+
+
+def test_run_e2e_orchestration_malformed_jsonl_fails_loud(
+    tmp_path: Path,
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    # Arrange
+    cache = _build_populated_cache(tmp_path)
+    inputs = _stage_inputs(tmp_path)
+    output_path = tmp_path / "out.jsonl"
+
+    def _runner(argv, **_kwargs):  # type: ignore[no-untyped-def]
+        output_path.write_text('{"valid": true}\nnot a json line\n')
+        return subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
+
+    runner = MagicMock(side_effect=_runner)
+    _ground_truth_tlog_loader(monkeypatch)
+
+    # Act + Assert
+    with pytest.raises(OrchestrationFailure) as exc_info:
+        run_e2e_orchestration(
+            populated_cache=cache,
+            output_path=output_path,
+            report_dir=tmp_path / "metrics",
+            effective_config_path=tmp_path / "eff.yaml",
+            runner=runner,
+            **inputs,  # type: ignore[arg-type]
+        )
+    assert exc_info.value.step is OrchestratorStep.PARSE_EMISSIONS
+
+
+# ----------------------------------------------------------------------
+# run_e2e_orchestration — ground truth loader failure (AC-5)
+
+
+def test_run_e2e_orchestration_ground_truth_loader_failure_fails_loud(
+    tmp_path: Path,
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    # Arrange
+    cache = _build_populated_cache(tmp_path)
+    inputs = _stage_inputs(tmp_path)
+    output_path = tmp_path / "out.jsonl"
+    runner = _build_runner_emitting(
+        output_path,
+        rows=[
+            {
+                "emitted_at": int(0.5 * 1e9),
+                "position_wgs84": {
+                    "lat_deg": 50.10,
+                    "lon_deg": 36.10,
+                    "alt_m": 100.0,
+                },
+            }
+        ],
+    )
+
+    def _raise(*_args, **_kwargs):
+        raise ValueError("tlog corrupt")
+
+    monkeypatch.setattr(
+        "tests.e2e.replay._e2e_orchestrator.load_tlog_ground_truth",
+        _raise,
+    )
+
+    # Act + Assert
+    with pytest.raises(OrchestrationFailure) as exc_info:
+        run_e2e_orchestration(
+            populated_cache=cache,
+            output_path=output_path,
+            report_dir=tmp_path / "metrics",
+            effective_config_path=tmp_path / "eff.yaml",
+            runner=runner,
+            **inputs,  # type: ignore[arg-type]
+        )
+    assert exc_info.value.step is OrchestratorStep.LOAD_GROUND_TRUTH
+    assert "tlog corrupt" in str(exc_info.value)
+
+
+# ----------------------------------------------------------------------
+# run_e2e_orchestration — happy path (AC-1 / AC-2)
+
+
+def test_run_e2e_orchestration_happy_path_writes_report(
+    tmp_path: Path,
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    # Arrange
+    cache = _build_populated_cache(tmp_path)
+    inputs = _stage_inputs(tmp_path)
+    output_path = tmp_path / "out.jsonl"
+    report_dir = tmp_path / "metrics"
+    effective_config_path = tmp_path / "eff.yaml"
+    rows = [
+        {
+            "emitted_at": int(0.5 * 1e9),
+            "position_wgs84": {"lat_deg": 50.10, "lon_deg": 36.10, "alt_m": 100.0},
+        },
+        {
+            "emitted_at": int(1.5 * 1e9),
+            "position_wgs84": {"lat_deg": 50.10, "lon_deg": 36.10, "alt_m": 100.0},
+        },
+    ]
+    runner = _build_runner_emitting(output_path, rows=rows)
+    _ground_truth_tlog_loader(monkeypatch)
+
+    # Act
+    report = run_e2e_orchestration(
+        populated_cache=cache,
+        output_path=output_path,
+        report_dir=report_dir,
+        effective_config_path=effective_config_path,
+        runner=runner,
+        run_date_utc="2026-05-23",
+        **inputs,  # type: ignore[arg-type]
+    )
+
+    # Assert
+    assert isinstance(report, OrchestrationReport)
+    assert report.report_path.is_file()
+    assert report.emissions_count == 2
+    assert report.distribution.count == 2
+    assert report.verdict_passed is True
+    body = report.report_path.read_text()
+    assert "## Horizontal error (metres)" in body
+    assert "## Threshold-hit share" in body
+    assert f"| {AC3_GATE_THRESHOLD_M:g} |" in body
+    runner.assert_called_once()
+    argv_passed = runner.call_args.args[0]
+    assert str(effective_config_path) in argv_passed
+    assert "--auto-trim" in argv_passed
+    merged = yaml.safe_load(effective_config_path.read_text())
+    assert merged["c6_tile_cache"]["root_dir"] == str(cache.cache_root)
+
+
+def test_run_e2e_orchestration_writes_report_even_on_fail_verdict(
+    tmp_path: Path,
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    # Arrange: emissions are offset 0.01 deg in lat and lon (roughly 1.3 km
+    # from ground truth), far above the 100 m gate.
+    cache = _build_populated_cache(tmp_path)
+    inputs = _stage_inputs(tmp_path)
+    output_path = tmp_path / "out.jsonl"
+    report_dir = tmp_path / "metrics"
+    rows = [
+        {
+            "emitted_at": int(0.5 * 1e9),
+            "position_wgs84": {"lat_deg": 50.110, "lon_deg": 36.110, "alt_m": 100.0},
+        },
+        {
+            "emitted_at": int(1.5 * 1e9),
+            "position_wgs84": {"lat_deg": 50.110, "lon_deg": 36.110, "alt_m": 100.0},
+        },
+    ]
+    runner = _build_runner_emitting(output_path, rows=rows)
+    _ground_truth_tlog_loader(monkeypatch)
+
+    # Act
+    report = run_e2e_orchestration(
+        populated_cache=cache,
+        output_path=output_path,
+        report_dir=report_dir,
+        effective_config_path=tmp_path / "eff.yaml",
+        runner=runner,
+        run_date_utc="2026-05-23",
+        **inputs,  # type: ignore[arg-type]
+    )
+
+    # Assert — AC-2: report exists regardless of PASS/FAIL.
+    assert report.verdict_passed is False
+    assert report.report_path.is_file()
+    assert "FAIL" in report.report_path.read_text()
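Note on the fail-verdict fixture above: the emissions sit at (50.110, 36.110) while the stubbed ground truth sits at (50.10, 36.10). A quick way to sanity-check how far apart those points are is a great-circle estimate. The sketch below uses the haversine formula on a spherical Earth; whether `horizontal_error_distribution` uses this exact formula is not shown in this diff, so `haversine_m` and `EARTH_RADIUS_M` are illustrative assumptions only.

```python
import math

# Assumption: mean Earth radius on a sphere; the project's helper may use
# a different model (e.g. a WGS84 ellipsoid).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(
    lat1_deg: float, lon1_deg: float, lat2_deg: float, lon2_deg: float
) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = (
        math.sin(dphi / 2.0) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2
    )
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Ground truth from the stub loader vs. the emission used in the FAIL test.
offset_m = haversine_m(50.10, 36.10, 50.110, 36.110)
```

The 0.01 degree latitude step contributes about 1.1 km and the longitude step about 0.7 km at this latitude, so the combined offset is roughly 1.3 km, well clear of a 100 m gate.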