mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 09:01:14 +00:00
[AZ-840] [AZ-835] e2e orchestrator test (E-AZ-835 C4)
Wraps the AZ-699 verdict-report path with the AZ-839 operator_pre_flight_setup C3 fixture so a single Tier-2 test takes only (tlog, video, calibration) and runs the full 7-step pipeline on the Jetson harness without operator hand-curation. New surface (tests-only, no src/ changes): - tests/e2e/replay/_e2e_orchestrator.py — orchestrator with OrchestratorStep enum, OrchestrationFailure exception (step prefix per AC-5), OrchestrationReport dataclass, write_effective_replay_config helper, and run_e2e_orchestration entry point covering steps 1-2-6-7. - tests/e2e/replay/test_e2e_orchestrator_unit.py — 17 unit tests covering each failure mode + happy path with mocked subprocess + ground-truth loader (AC-8). - tests/e2e/replay/test_az835_e2e_real_flight.py — Tier-2 + RUN_REPLAY_E2E gated integration test asserting verdict report exists, 15-min budget held (AC-1, AC-2, AC-3, AC-4, AC-6). The effective config write overlays c6_tile_cache.root_dir onto the static operator YAML at runtime so the airborne subprocess shares the cache_root the C3 fixture chose. Field- level merge — every other operator-config block stays verbatim. The static YAML on disk is never touched. Test run: tests/e2e/replay 45 passed, 10 skipped (10 skips were 9 pre-existing + 1 new tier2). No src/ touched, no AZ-839 driver changes; AC-7 (AZ-699 still passes) holds by inspection. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -0,0 +1,171 @@
|
|||||||
|
# Batch 109 — Cycle 3 — AZ-840 e2e orchestrator test
|
||||||
|
|
||||||
|
**Date**: 2026-05-23
|
||||||
|
**Tasks**: AZ-840 (C4 — Epic AZ-835).
|
||||||
|
**Story points**: 3 (per the task spec).
|
||||||
|
**Jira status**: AZ-840 In Progress → In Testing at commit step.
|
||||||
|
|
||||||
|
## Why this batch exists
|
||||||
|
|
||||||
|
Epic AZ-835 (real-flight e2e validation) needs a single Tier-2
|
||||||
|
test that proves the 7-step pipeline runs from
|
||||||
|
`(tlog, video, calibration)` to a horizontal-error verdict
|
||||||
|
without operator hand-curation between steps. Steps 3-5 were
|
||||||
|
delivered by AZ-839 (C3 — `operator_pre_flight_setup`); steps
|
||||||
|
1-2-6-7 are this batch.
|
||||||
|
|
||||||
|
The AZ-839 batch 108b follow-up note explicitly anticipated this
|
||||||
|
batch: "AZ-840 will additionally need to feed the airborne
|
||||||
|
replay binary a config that points at the same `cache_root`
|
||||||
|
... the cleanest path is for AZ-840 to write an effective YAML
|
||||||
|
at runtime from the same override recipe used here."
|
||||||
|
|
||||||
|
## What this batch ships
|
||||||
|
|
||||||
|
A driver module + unit test suite + Tier-2 integration test:
|
||||||
|
|
||||||
|
* `tests/e2e/replay/_e2e_orchestrator.py` — wraps the AZ-699
|
||||||
|
verdict-report path with the AZ-839 C3 fixture's
|
||||||
|
`PopulatedC6Cache`. Public surface:
|
||||||
|
* `OrchestratorStep` enum — failure-step labels per AC-5.
|
||||||
|
* `OrchestrationFailure(step, message)` exception — wraps
|
||||||
|
every step failure with the step name in the message prefix.
|
||||||
|
* `OrchestrationReport` dataclass — verdict, distribution,
|
||||||
|
paths, wall-clock measurements per AC-4.
|
||||||
|
* `write_effective_replay_config` — small helper that overlays
|
||||||
|
`c6_tile_cache.root_dir` onto the static operator YAML.
|
||||||
|
* `read_calibration_acquisition_method` — mirror of AZ-699's
|
||||||
|
helper so the report writer keeps the same shape.
|
||||||
|
* `run_e2e_orchestration` — the AC-1 entry point wiring
|
||||||
|
validate → write_config → airborne subprocess → parse JSONL
|
||||||
|
→ load tlog GT → compute distribution → render report.
|
||||||
|
* `tests/e2e/replay/test_e2e_orchestrator_unit.py` — 17 unit
|
||||||
|
tests covering each of the 7 steps' failure modes plus the
|
||||||
|
happy path. The runner is injected (`subprocess.run` default)
|
||||||
|
so unit tests stage synthetic JSONL output without touching
|
||||||
|
the airborne binary. `load_tlog_ground_truth` is monkeypatched
|
||||||
|
to return a synthetic 3-row series.
|
||||||
|
* `tests/e2e/replay/test_az835_e2e_real_flight.py::
|
||||||
|
test_az840_e2e_real_flight_orchestration` — Tier-2 + RUN_REPLAY_E2E
|
||||||
|
gated test that consumes the C3 fixture + Derkachi inputs and
|
||||||
|
asserts the verdict markdown is written, the threshold-hit
|
||||||
|
share table is present, and the 15-min budget held.
|
||||||
|
|
||||||
|
## AC coverage
|
||||||
|
|
||||||
|
| AC | Description | Coverage |
|
||||||
|
|-----|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
|
||||||
|
| AC-1| Steps 1-7 end-to-end on Tier-2 from a fresh tlog/video | `test_az840_e2e_real_flight_orchestration` (Tier-2-gated); 17 unit tests prove the orchestrator structure |
|
||||||
|
| AC-2| Verdict report exists either PASS or FAIL | `test_run_e2e_orchestration_writes_report_even_on_fail_verdict` + integration assertion `report_path.is_file()` |
|
||||||
|
| AC-3| Reuses C3 fixture (`operator_pre_flight_setup`) | Integration test consumes the fixture; effective config overlay points at `populated_cache.cache_root` |
|
||||||
|
| AC-4| 15-min wall-time soft target on the Derkachi clip | `_DEFAULT_MAX_SECONDS = 900.0` passed as `subprocess.run` `timeout`; integration asserts `replay_subprocess_seconds <= 900`|
|
||||||
|
| AC-5| Mid-pipeline failure fails LOUD with a clear step prefix | `OrchestratorStep` enum + 8 step-specific failure unit tests (`validate`/`write_config`/`airborne` × 3/`parse` × 2/`gt`) |
|
||||||
|
| AC-6| Gated by `RUN_REPLAY_E2E=1` + Tier-2 marker | `_orchestrator_skip_reason()` checks env vars + binary + video size; `@pytest.mark.tier2` decorator |
|
||||||
|
| AC-7| AZ-699 verdict test continues to pass | No changes to `test_derkachi_real_tlog.py`; same `real_flight_validation_<date>.md` report path convention |
|
||||||
|
| AC-8| Unit-tested orchestration helper without Tier-2 inputs | 17 unit tests covering config write (4) + calibration parse (3) + run helper (10) — all use mocked subprocess + GT loader |
|
||||||
|
|
||||||
|
## Test run results
|
||||||
|
|
||||||
|
```
|
||||||
|
$ .venv/bin/pytest tests/e2e/replay/ -v --tb=short --timeout=60
|
||||||
|
============================ 45 passed, 10 skipped, 3 warnings in 0.78s ============
|
||||||
|
```
|
||||||
|
|
||||||
|
Breakdown:
|
||||||
|
* 17 new orchestrator unit tests pass.
|
||||||
|
* 11 AZ-839 driver unit tests still pass (no driver changes).
|
||||||
|
* 14 helper unit tests (`test_helpers.py`) still pass.
|
||||||
|
* 3 derkachi-1min mode-agnostic AST tests still pass.
|
||||||
|
* 10 skips: 1 new Tier-2 (this AZ-840 integration), 6
|
||||||
|
RUN_REPLAY_E2E gated AZ-404 cases, 1 AC-8 D-PROJ-2 placeholder,
|
||||||
|
1 Tier-2 AZ-699, 1 Tier-2 AZ-839 integration. None are
|
||||||
|
regressions; the tier2 gate trips off-Jetson.
|
||||||
|
|
||||||
|
## Design notes
|
||||||
|
|
||||||
|
### `--auto-trim` ownership
|
||||||
|
|
||||||
|
The orchestrator passes `--auto-trim` unconditionally so AZ-405 /
|
||||||
|
AZ-698 active-flight-cut + tlog/video sync (Epic step 1) runs
|
||||||
|
inside the airborne binary every time. The Epic narrative does
|
||||||
|
not separate trim from the airborne pipeline; collapsing them
|
||||||
|
into a single subprocess invocation matches AZ-699 and avoids
|
||||||
|
duplicating the trim path.
|
||||||
|
|
||||||
|
### `clip_duration_s` parity with AZ-699
|
||||||
|
|
||||||
|
`run_e2e_orchestration` computes
|
||||||
|
`clip_duration_s = ground_truth[-1].t_s - ground_truth[0].t_s`
|
||||||
|
exactly as `test_derkachi_real_tlog.py` does. This means both
|
||||||
|
verdict reports name the same clip duration even when the
|
||||||
|
trimmed video is shorter than the ground-truth window — a
|
||||||
|
deliberate choice: the report header documents what the verdict
|
||||||
|
covers, not what the binary processed.
|
||||||
|
|
||||||
|
### Effective config write — single source of truth
|
||||||
|
|
||||||
|
`write_effective_replay_config` materialises the same override
|
||||||
|
recipe AZ-839 uses in-memory, but on disk so the airborne
|
||||||
|
subprocess sees the cache_root the fixture chose. Field-level
|
||||||
|
merge: every other block in the operator YAML is preserved
|
||||||
|
verbatim; only `c6_tile_cache.root_dir` and
|
||||||
|
`c6_tile_cache.faiss_index_path` are overwritten. The static
|
||||||
|
operator YAML on disk is never touched.
|
||||||
|
|
||||||
|
### Failure surface = step prefix
|
||||||
|
|
||||||
|
`OrchestrationFailure` always prefixes its message with
|
||||||
|
`[<step>]`. CI log scrapers and pytest's traceback printer both
|
||||||
|
surface the prefix on the first line; AC-5 ("clear error
|
||||||
|
pointing at the failing step") holds without requiring the test
|
||||||
|
to inspect the exception object. The step is also exposed as
|
||||||
|
`exc.step` for programmatic assertions.
|
||||||
|
|
||||||
|
## Files changed
|
||||||
|
|
||||||
|
* `tests/e2e/replay/_e2e_orchestrator.py` (new, 656 LOC).
|
||||||
|
* `tests/e2e/replay/test_e2e_orchestrator_unit.py` (new, 660+ LOC).
|
||||||
|
* `tests/e2e/replay/test_az835_e2e_real_flight.py` (new, 156 LOC).
|
||||||
|
|
||||||
|
No `src/` changes, no operator-config YAML changes, no AZ-839
|
||||||
|
driver changes. AZ-840 is purely additive at the test layer.
|
||||||
|
|
||||||
|
## Code review (self-review)
|
||||||
|
|
||||||
|
Verdict: **PASS_WITH_WARNINGS**.
|
||||||
|
|
||||||
|
| Phase | Result |
|
||||||
|
|-------|--------|
|
||||||
|
| 1. Context loading | Re-read `gps_compare.py`, `accuracy_report.py`, `replay_input.py`, `cli/replay.py`, `test_derkachi_real_tlog.py`. Emission schema (`emitted_at`, `position_wgs84`) is the same shape `gps-denied-replay` writes. |
|
||||||
|
| 2. Spec compliance | All 8 AZ-840 ACs covered; AC-7 holds by inspection (no AZ-699 changes). |
|
||||||
|
| 3. Code quality | All public types have docstrings; failure messages name the upstream exception via `repr` so `OSError` / `subprocess.TimeoutExpired` carry through. Runner kw-args mirror `subprocess.run` signature 1:1. |
|
||||||
|
| 4. Security quick-scan | Effective config write goes to a tmp file the test owns; no secrets in the YAML overlay (override is two string fields). Subprocess `env` is opt-in (`None` defaults to `os.environ`). |
|
||||||
|
| 5. Performance scan | Unit tests run in 0.51 s. Tier-2 wall-clock cap is 900 s, enforced by the subprocess timeout. |
|
||||||
|
| 6. Cross-task consistency | `clip_duration_s` and `report_path` match AZ-699 exactly so a single Jetson run produces the same markdown shape. |
|
||||||
|
| 7. Architecture compliance | Orchestrator lives entirely under `tests/e2e/replay/`; no `src/` writes. C3 fixture's invariants (`PopulatedC6Cache.cache_root` is the single source of truth) propagate via `write_effective_replay_config`. |
|
||||||
|
|
||||||
|
## Findings
|
||||||
|
|
||||||
|
| ID | Severity | Description | Disposition |
|
||||||
|
|----|----------|-------------|-------------|
|
||||||
|
| F1 | Low | `_default_tile_decoder` in `conftest.py` (carried from batch 108) — still raw TIFF. Not in the AZ-840 path; AZ-840 doesn't change tile decoding. | Defer; no AZ-840 ticket. |
|
||||||
|
| F2 | Low | `_resolve_replay_descriptor_dim` is NetVLAD-only (carried from batch 108). AZ-840 doesn't change descriptors. | Defer; no AZ-840 ticket. |
|
||||||
|
| F3 | Low | `--pace asap` is hardcoded in `_run_replay_subprocess` argv; the AZ-699 test passes `--pace asap` too, so behaviour is identical. If a future test wants a real-time pace, the runner kwarg is the seam. | Document; no ticket. |
|
||||||
|
| F4 | Low | `_run_replay_subprocess` does not stream stdout/stderr; failures surface only after the subprocess exits. For 15-min runs this means the operator sees no progress until the budget expires. AZ-699 has the same shape. | Document; consider an AZ-* if the budget grows. |
|
||||||
|
|
||||||
|
## Notes for follow-up
|
||||||
|
|
||||||
|
* AZ-840 lands the orchestrator test as Tier-2-gated. Verifying
|
||||||
|
the Tier-2 path actually runs on the Jetson harness is the
|
||||||
|
next gating step before Epic AZ-835 can flip from "covered by
|
||||||
|
unit tests" to "covered by Tier-2 integration".
|
||||||
|
* `_e2e_orchestrator.py` is intentionally kept under `tests/`
|
||||||
|
rather than promoted to `src/`. If a second consumer of the
|
||||||
|
same orchestration shape appears (e.g. AZ-833 mock-suite-sat
|
||||||
|
parity test), the move to a shared helper module under
|
||||||
|
`src/gps_denied_onboard/replay/` is the right next step;
|
||||||
|
for now the test-only location matches the helper's only
|
||||||
|
consumer.
|
||||||
|
* AZ-841 (Tier-2 unxfail follow-up) and AZ-842 (replay protocol
|
||||||
|
+ orchestrator docs) sit downstream — both should reference
|
||||||
|
this batch report in their planning sections.
|
||||||
@@ -8,7 +8,7 @@ status: in_progress
|
|||||||
sub_step:
|
sub_step:
|
||||||
phase: 7
|
phase: 7
|
||||||
name: batch-loop
|
name: batch-loop
|
||||||
detail: "batch 109 next; AZ-840 C4"
|
detail: "batch 110 next; full-suite gate after AZ-840 C4 ship"
|
||||||
retry_count: 0
|
retry_count: 0
|
||||||
cycle: 3
|
cycle: 3
|
||||||
tracker: jira
|
tracker: jira
|
||||||
|
|||||||
@@ -0,0 +1,655 @@
|
|||||||
|
"""E2E orchestrator for the AZ-835 7-step pipeline (AZ-840 / Epic AZ-835 C4).
|
||||||
|
|
||||||
|
Wraps the AZ-699 verdict-report writing path with the AZ-839 C3
|
||||||
|
fixture's `PopulatedC6Cache` so a single Tier-2 test can run from
|
||||||
|
``(tlog, video, calibration)`` to a horizontal-error report without
|
||||||
|
operator hand-curation between steps. The 7-step Epic narrative
|
||||||
|
(``_docs/02_tasks/todo/AZ-840_e2e_orchestrator_test.md``):
|
||||||
|
|
||||||
|
1. Active flight cut + tlog/video sync — handled by ``gps-denied-replay``
|
||||||
|
``--auto-trim`` (AZ-405 / AZ-698) inside the airborne binary.
|
||||||
|
2. On-fly frame + IMU extraction — same binary's per-frame loop.
|
||||||
|
3. Auto-create route — done by the C3 fixture
|
||||||
|
(``operator_pre_flight_setup`` calls ``extract_route_from_tlog``).
|
||||||
|
4. POST route to satellite-provider — C3 fixture (AZ-838
|
||||||
|
``SatelliteProviderRouteClient.seed_route``).
|
||||||
|
5. Build FAISS index — C3 fixture (AZ-322 ``DescriptorBatcher``).
|
||||||
|
6. Run gps-denied airborne pipeline — this module's
|
||||||
|
``_run_replay_subprocess`` invokes ``gps-denied-replay`` against
|
||||||
|
the populated cache.
|
||||||
|
7. Get GPS fixes, check vs tlog GPS — this module's
|
||||||
|
``_load_ground_truth`` + ``horizontal_error_distribution`` +
|
||||||
|
``render_report`` writes the verdict markdown.
|
||||||
|
|
||||||
|
The C3 fixture mutates ``c6_tile_cache.root_dir`` to point at a
|
||||||
|
``tmp_path_factory.mktemp`` value (AZ-839 batch 108b). The static
|
||||||
|
operator YAML at ``GPS_DENIED_OPERATOR_CONFIG_PATH`` cannot know
|
||||||
|
that path. ``write_effective_replay_config`` reads the static YAML,
|
||||||
|
overlays the ``c6_tile_cache.root_dir`` override, writes the merged
|
||||||
|
result to a tmp file, and returns the path the airborne binary
|
||||||
|
will load via ``--config``. This keeps a single source of truth
|
||||||
|
for the cache_root override across the in-memory C3 fixture path
|
||||||
|
and the subprocess airborne path.
|
||||||
|
|
||||||
|
Public surface — re-exported from this module:
|
||||||
|
|
||||||
|
* :class:`OrchestratorStep` — failure-step labels per AC-5 ("fails
|
||||||
|
LOUD with a clear error pointing at the failing step").
|
||||||
|
* :class:`OrchestrationFailure` — wraps the underlying exception
|
||||||
|
with the step that produced it.
|
||||||
|
* :class:`OrchestrationReport` — return value of
|
||||||
|
:func:`run_e2e_orchestration` (verdict, distribution, paths,
|
||||||
|
wall-clock measurements per AC-4).
|
||||||
|
* :func:`write_effective_replay_config` — small helper for the
|
||||||
|
config merge step.
|
||||||
|
* :func:`run_e2e_orchestration` — the AC-1 entry point.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import datetime
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import subprocess
|
||||||
|
import time
|
||||||
|
from collections.abc import Callable, Mapping
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from enum import Enum
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import yaml
|
||||||
|
|
||||||
|
from gps_denied_onboard.helpers.accuracy_report import (
|
||||||
|
AC3_GATE_PCT,
|
||||||
|
AC3_GATE_THRESHOLD_M,
|
||||||
|
ReportContext,
|
||||||
|
render_report,
|
||||||
|
verdict_passes_ac3,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.helpers.gps_compare import (
|
||||||
|
GroundTruthRow,
|
||||||
|
HorizontalErrorDistribution,
|
||||||
|
horizontal_error_distribution,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.replay_input import load_tlog_ground_truth
|
||||||
|
|
||||||
|
from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"OrchestrationFailure",
|
||||||
|
"OrchestrationReport",
|
||||||
|
"OrchestratorStep",
|
||||||
|
"read_calibration_acquisition_method",
|
||||||
|
"run_e2e_orchestration",
|
||||||
|
"write_effective_replay_config",
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
# Replay-subprocess wall-clock cap for the Derkachi clip per AZ-840
|
||||||
|
# AC-4 (15 min soft target). Exposed as a default that the integration
|
||||||
|
# test can override; the unit tests rely on the contract that the
|
||||||
|
# runner argument is a free callable.
|
||||||
|
_DEFAULT_MAX_SECONDS: float = 900.0
|
||||||
|
|
||||||
|
_LOGGER = logging.getLogger("tests.e2e.replay.e2e_orchestrator")
|
||||||
|
|
||||||
|
|
||||||
|
class OrchestratorStep(str, Enum):
|
||||||
|
"""Labels for the 7-step pipeline used by :class:`OrchestrationFailure`.
|
||||||
|
|
||||||
|
AC-5: every failure that reaches the test surface must name the
|
||||||
|
step that produced it. The string values are stable so test
|
||||||
|
assertions and log readers can match on them.
|
||||||
|
"""
|
||||||
|
|
||||||
|
VALIDATE_INPUTS = "validate_inputs"
|
||||||
|
WRITE_EFFECTIVE_CONFIG = "write_effective_config"
|
||||||
|
AIRBORNE_PIPELINE = "airborne_pipeline"
|
||||||
|
PARSE_EMISSIONS = "parse_emissions"
|
||||||
|
LOAD_GROUND_TRUTH = "load_ground_truth"
|
||||||
|
COMPUTE_DISTRIBUTION = "compute_distribution"
|
||||||
|
RENDER_REPORT = "render_report"
|
||||||
|
|
||||||
|
|
||||||
|
class OrchestrationFailure(RuntimeError):
|
||||||
|
"""Failure inside one of the 7 orchestration steps (AC-5).
|
||||||
|
|
||||||
|
The :attr:`step` attribute names the failing step; the message
|
||||||
|
embeds it as the prefix so plain log-readers see the failure
|
||||||
|
location without inspecting the exception object.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, step: OrchestratorStep, message: str) -> None:
|
||||||
|
super().__init__(f"[{step.value}] {message}")
|
||||||
|
self.step = step
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True, slots=True)
|
||||||
|
class OrchestrationReport:
|
||||||
|
"""Return value of :func:`run_e2e_orchestration`.
|
||||||
|
|
||||||
|
Attributes:
|
||||||
|
verdict_passed: ``True`` iff the run met the AZ-696 epic
|
||||||
|
AC-3 gate (>= AC3_GATE_PCT% within AC3_GATE_THRESHOLD_M m).
|
||||||
|
distribution: Computed horizontal-error distribution.
|
||||||
|
report_path: Markdown report written under ``report_dir``.
|
||||||
|
emissions_count: Total estimator-output records consumed.
|
||||||
|
wall_clock_s: Wall-clock seconds for the orchestration run
|
||||||
|
(excludes the C3 fixture setup; covers steps 1-2-6-7).
|
||||||
|
replay_subprocess_seconds: Wall-clock seconds the airborne
|
||||||
|
replay subprocess took. Always <= ``wall_clock_s``.
|
||||||
|
"""
|
||||||
|
|
||||||
|
verdict_passed: bool
|
||||||
|
distribution: HorizontalErrorDistribution
|
||||||
|
report_path: Path
|
||||||
|
emissions_count: int
|
||||||
|
wall_clock_s: float
|
||||||
|
replay_subprocess_seconds: float
|
||||||
|
|
||||||
|
|
||||||
|
def read_calibration_acquisition_method(calibration_path: Path) -> str:
|
||||||
|
"""Return the AZ-702 ``acquisition_method`` field, or ``"unknown"``.
|
||||||
|
|
||||||
|
Mirrors ``test_derkachi_real_tlog._read_calibration_acquisition_method``
|
||||||
|
so the AZ-840 verdict report can name the calibration provenance
|
||||||
|
in its failure message (AZ-699 AC-3). Pure helper; the report
|
||||||
|
writer needs the string, not the JSON.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
data = json.loads(calibration_path.read_text())
|
||||||
|
except (OSError, json.JSONDecodeError):
|
||||||
|
return "unknown"
|
||||||
|
method = data.get("acquisition_method")
|
||||||
|
if isinstance(method, str) and method:
|
||||||
|
return method
|
||||||
|
return "unknown"
|
||||||
|
|
||||||
|
|
||||||
|
def write_effective_replay_config(
|
||||||
|
*,
|
||||||
|
base_config_path: Path,
|
||||||
|
cache_root: Path,
|
||||||
|
output_path: Path,
|
||||||
|
) -> Path:
|
||||||
|
"""Merge cache_root override into the static operator YAML.
|
||||||
|
|
||||||
|
Reads ``base_config_path`` as YAML, sets the
|
||||||
|
``c6_tile_cache.root_dir`` to ``cache_root`` (forcing the
|
||||||
|
FAISS index path to fall back to ``<cache_root>/descriptor.index``),
|
||||||
|
and writes the merged document to ``output_path`` as YAML.
|
||||||
|
|
||||||
|
The merge is field-level: every other block in the base YAML is
|
||||||
|
preserved verbatim. This keeps a single source of truth for the
|
||||||
|
operator config — the test harness only contributes the dynamic
|
||||||
|
cache_root.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
The ``output_path`` argument, for ergonomic chaining.
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
OrchestrationFailure (step=WRITE_EFFECTIVE_CONFIG): Base YAML
|
||||||
|
unreadable, malformed, or not a top-level mapping.
|
||||||
|
"""
|
||||||
|
|
||||||
|
try:
|
||||||
|
base_text = base_config_path.read_text()
|
||||||
|
except OSError as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
|
||||||
|
f"cannot read base config at {base_config_path}: {exc!r}",
|
||||||
|
) from exc
|
||||||
|
|
||||||
|
try:
|
||||||
|
base_data = yaml.safe_load(base_text) or {}
|
||||||
|
except yaml.YAMLError as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
|
||||||
|
f"base config YAML at {base_config_path} is malformed: {exc!r}",
|
||||||
|
) from exc
|
||||||
|
if not isinstance(base_data, dict):
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
|
||||||
|
f"base config YAML at {base_config_path} must be a mapping; "
|
||||||
|
f"got {type(base_data).__name__}",
|
||||||
|
)
|
||||||
|
|
||||||
|
c6_block_raw = base_data.get("c6_tile_cache")
|
||||||
|
c6_block = dict(c6_block_raw) if isinstance(c6_block_raw, dict) else {}
|
||||||
|
c6_block["root_dir"] = str(cache_root)
|
||||||
|
c6_block["faiss_index_path"] = ""
|
||||||
|
base_data["c6_tile_cache"] = c6_block
|
||||||
|
|
||||||
|
try:
|
||||||
|
output_path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
output_path.write_text(
|
||||||
|
yaml.safe_dump(base_data, sort_keys=True, default_flow_style=False)
|
||||||
|
)
|
||||||
|
except OSError as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
|
||||||
|
f"cannot write effective config at {output_path}: {exc!r}",
|
||||||
|
) from exc
|
||||||
|
return output_path
|
||||||
|
|
||||||
|
|
||||||
|
def run_e2e_orchestration(
|
||||||
|
*,
|
||||||
|
populated_cache: PopulatedC6Cache,
|
||||||
|
base_config_path: Path,
|
||||||
|
tlog_path: Path,
|
||||||
|
video_path: Path,
|
||||||
|
calibration_path: Path,
|
||||||
|
signing_key_path: Path,
|
||||||
|
replay_binary: Path,
|
||||||
|
output_path: Path,
|
||||||
|
report_dir: Path,
|
||||||
|
effective_config_path: Path,
|
||||||
|
run_date_utc: str | None = None,
|
||||||
|
runner: Callable[..., subprocess.CompletedProcess[str]] = subprocess.run,
|
||||||
|
subprocess_env: Mapping[str, str] | None = None,
|
||||||
|
max_seconds: float = _DEFAULT_MAX_SECONDS,
|
||||||
|
logger: logging.Logger | None = None,
|
||||||
|
) -> OrchestrationReport:
|
||||||
|
"""Run AZ-835 steps 1-7 against the AZ-839 populated cache.
|
||||||
|
|
||||||
|
Steps 3-5 are the responsibility of ``populated_cache`` (the
|
||||||
|
AZ-839 C3 fixture); this function covers 1-2-6 (the airborne
|
||||||
|
replay subprocess) and 7 (verdict report). The C3 fixture and
|
||||||
|
this function share the cache_root via
|
||||||
|
:func:`write_effective_replay_config` so the airborne binary
|
||||||
|
reads the same FAISS index the fixture wrote (AC-3).
|
||||||
|
|
||||||
|
Args:
|
||||||
|
populated_cache: C3 fixture output (AZ-839). Carries
|
||||||
|
``cache_root``, ``faiss_index_path``, and the route
|
||||||
|
spec the test pipeline produced.
|
||||||
|
base_config_path: Static operator config YAML
|
||||||
|
(``GPS_DENIED_OPERATOR_CONFIG_PATH``). Must register
|
||||||
|
``c6_tile_cache``, ``c10_provisioning``, ``c2_vpr``,
|
||||||
|
``c4_pose``, and ``c5_state`` blocks for the airborne
|
||||||
|
binary to compose the replay graph.
|
||||||
|
tlog_path: ArduPilot binary tlog the test consumes.
|
||||||
|
video_path: Flight video file the test consumes.
|
||||||
|
calibration_path: Camera calibration JSON (AZ-702
|
||||||
|
factory-sheet for Derkachi).
|
||||||
|
signing_key_path: MAVLink signing-key file. Replay protocol
|
||||||
|
Invariant 11 — required even for the noop transport.
|
||||||
|
replay_binary: ``gps-denied-replay`` console-script path.
|
||||||
|
output_path: Where the airborne binary writes JSONL
|
||||||
|
estimator emissions.
|
||||||
|
report_dir: Directory the verdict markdown is written to.
|
||||||
|
effective_config_path: Where the cache_root-merged YAML is
|
||||||
|
written. The path is passed to the airborne binary via
|
||||||
|
``--config``.
|
||||||
|
run_date_utc: ISO-8601 date for the report filename and
|
||||||
|
header. Defaults to today UTC.
|
||||||
|
runner: ``subprocess.run`` by default; tests inject a fake
|
||||||
|
that emits a synthetic JSONL output.
|
||||||
|
subprocess_env: Optional environment overlay for the
|
||||||
|
replay subprocess. ``None`` means ``os.environ``.
|
||||||
|
max_seconds: Hard wall-clock cap for the airborne replay
|
||||||
|
subprocess. The orchestrator times out the runner via
|
||||||
|
its ``timeout`` kwarg; an exceeded budget surfaces as
|
||||||
|
``OrchestrationFailure(step=AIRBORNE_PIPELINE)``.
|
||||||
|
logger: Optional logger. Defaults to the module logger.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
:class:`OrchestrationReport` on success. The verdict can
|
||||||
|
be PASS or FAIL — AC-2 mandates the report exists either
|
||||||
|
way.
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
OrchestrationFailure: Any of the 7 steps failed. The
|
||||||
|
``step`` attribute names the failing step.
|
||||||
|
"""
|
||||||
|
|
||||||
|
log = logger or _LOGGER
|
||||||
|
started = time.monotonic()
|
||||||
|
effective_run_date = run_date_utc or (
|
||||||
|
datetime.datetime.now(datetime.timezone.utc).date().isoformat()
|
||||||
|
)
|
||||||
|
|
||||||
|
_validate_inputs(
|
||||||
|
base_config_path=base_config_path,
|
||||||
|
tlog_path=tlog_path,
|
||||||
|
video_path=video_path,
|
||||||
|
calibration_path=calibration_path,
|
||||||
|
signing_key_path=signing_key_path,
|
||||||
|
replay_binary=replay_binary,
|
||||||
|
report_dir=report_dir,
|
||||||
|
)
|
||||||
|
|
||||||
|
write_effective_replay_config(
|
||||||
|
base_config_path=base_config_path,
|
||||||
|
cache_root=populated_cache.cache_root,
|
||||||
|
output_path=effective_config_path,
|
||||||
|
)
|
||||||
|
|
||||||
|
replay_subprocess_seconds = _run_replay_subprocess(
|
||||||
|
replay_binary=replay_binary,
|
||||||
|
video_path=video_path,
|
||||||
|
tlog_path=tlog_path,
|
||||||
|
output_path=output_path,
|
||||||
|
calibration_path=calibration_path,
|
||||||
|
config_path=effective_config_path,
|
||||||
|
signing_key_path=signing_key_path,
|
||||||
|
max_seconds=max_seconds,
|
||||||
|
runner=runner,
|
||||||
|
env=subprocess_env,
|
||||||
|
logger=log,
|
||||||
|
)
|
||||||
|
|
||||||
|
emissions = _parse_jsonl(output_path)
|
||||||
|
|
||||||
|
ground_truth = _load_ground_truth(tlog_path)
|
||||||
|
|
||||||
|
distribution = _compute_distribution(emissions, ground_truth)
|
||||||
|
|
||||||
|
context = ReportContext(
|
||||||
|
run_date_utc=effective_run_date,
|
||||||
|
tlog_path=tlog_path,
|
||||||
|
video_path=video_path,
|
||||||
|
calibration_acquisition_method=read_calibration_acquisition_method(
|
||||||
|
calibration_path
|
||||||
|
),
|
||||||
|
clip_duration_s=(
|
||||||
|
ground_truth[-1].t_s - ground_truth[0].t_s
|
||||||
|
if ground_truth
|
||||||
|
else 0.0
|
||||||
|
),
|
||||||
|
emissions_count=len(emissions),
|
||||||
|
)
|
||||||
|
verdict_passed = verdict_passes_ac3(distribution)
|
||||||
|
report_path = _render_and_write_report(
|
||||||
|
distribution=distribution,
|
||||||
|
context=context,
|
||||||
|
passed=verdict_passed,
|
||||||
|
report_dir=report_dir,
|
||||||
|
)
|
||||||
|
|
||||||
|
log.info(
|
||||||
|
"e2e_orchestrator: report written",
|
||||||
|
extra={
|
||||||
|
"kind": "e2e_orchestrator.report_written",
|
||||||
|
"kv": {
|
||||||
|
"report_path": str(report_path),
|
||||||
|
"verdict_passed": verdict_passed,
|
||||||
|
"share_within_threshold_pct": (
|
||||||
|
distribution.threshold_hit_share.get(
|
||||||
|
AC3_GATE_THRESHOLD_M, 0.0
|
||||||
|
)
|
||||||
|
* 100.0
|
||||||
|
),
|
||||||
|
"ac3_gate_pct": AC3_GATE_PCT,
|
||||||
|
"emissions_count": len(emissions),
|
||||||
|
"ground_truth_pairings": distribution.count,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
wall_clock_s = max(0.0, time.monotonic() - started)
|
||||||
|
return OrchestrationReport(
|
||||||
|
verdict_passed=verdict_passed,
|
||||||
|
distribution=distribution,
|
||||||
|
report_path=report_path,
|
||||||
|
emissions_count=len(emissions),
|
||||||
|
wall_clock_s=wall_clock_s,
|
||||||
|
replay_subprocess_seconds=replay_subprocess_seconds,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _validate_inputs(
|
||||||
|
*,
|
||||||
|
base_config_path: Path,
|
||||||
|
tlog_path: Path,
|
||||||
|
video_path: Path,
|
||||||
|
calibration_path: Path,
|
||||||
|
signing_key_path: Path,
|
||||||
|
replay_binary: Path,
|
||||||
|
report_dir: Path,
|
||||||
|
) -> None:
|
||||||
|
"""Fail fast on missing inputs (AC-5 — surface the failing step early)."""
|
||||||
|
file_inputs: tuple[tuple[str, Path], ...] = (
|
||||||
|
("base_config_path", base_config_path),
|
||||||
|
("tlog_path", tlog_path),
|
||||||
|
("video_path", video_path),
|
||||||
|
("calibration_path", calibration_path),
|
||||||
|
("signing_key_path", signing_key_path),
|
||||||
|
("replay_binary", replay_binary),
|
||||||
|
)
|
||||||
|
for label, path in file_inputs:
|
||||||
|
if not path.is_file():
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.VALIDATE_INPUTS,
|
||||||
|
f"{label} is not a file: {path}",
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
report_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
except OSError as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.VALIDATE_INPUTS,
|
||||||
|
f"report_dir {report_dir} cannot be created: {exc!r}",
|
||||||
|
) from exc
|
||||||
|
|
||||||
|
|
||||||
|
def _run_replay_subprocess(
|
||||||
|
*,
|
||||||
|
replay_binary: Path,
|
||||||
|
video_path: Path,
|
||||||
|
tlog_path: Path,
|
||||||
|
output_path: Path,
|
||||||
|
calibration_path: Path,
|
||||||
|
config_path: Path,
|
||||||
|
signing_key_path: Path,
|
||||||
|
max_seconds: float,
|
||||||
|
runner: Callable[..., subprocess.CompletedProcess[str]],
|
||||||
|
env: Mapping[str, str] | None,
|
||||||
|
logger: logging.Logger,
|
||||||
|
) -> float:
|
||||||
|
"""Invoke gps-denied-replay with --auto-trim; return wall-clock seconds.
|
||||||
|
|
||||||
|
Wraps :class:`subprocess.run` so unit tests can inject a fake
|
||||||
|
runner. ``--auto-trim`` is always enabled here — the
|
||||||
|
orchestrator owns the AZ-405 / AZ-698 sync path (AZ-840 step 1).
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
OrchestrationFailure (step=AIRBORNE_PIPELINE): Non-zero exit,
|
||||||
|
timeout, or runner-level OSError.
|
||||||
|
"""
|
||||||
|
|
||||||
|
argv = [
|
||||||
|
str(replay_binary),
|
||||||
|
"--video",
|
||||||
|
str(video_path),
|
||||||
|
"--tlog",
|
||||||
|
str(tlog_path),
|
||||||
|
"--output",
|
||||||
|
str(output_path),
|
||||||
|
"--camera-calibration",
|
||||||
|
str(calibration_path),
|
||||||
|
"--config",
|
||||||
|
str(config_path),
|
||||||
|
"--mavlink-signing-key",
|
||||||
|
str(signing_key_path),
|
||||||
|
"--pace",
|
||||||
|
"asap",
|
||||||
|
"--auto-trim",
|
||||||
|
]
|
||||||
|
started = time.monotonic()
|
||||||
|
try:
|
||||||
|
completed = runner(
|
||||||
|
argv,
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=max_seconds,
|
||||||
|
env=dict(env) if env is not None else None,
|
||||||
|
)
|
||||||
|
except subprocess.TimeoutExpired as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.AIRBORNE_PIPELINE,
|
||||||
|
f"gps-denied-replay timed out after {max_seconds:.0f} s",
|
||||||
|
) from exc
|
||||||
|
except OSError as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.AIRBORNE_PIPELINE,
|
||||||
|
f"cannot launch gps-denied-replay at {replay_binary}: {exc!r}",
|
||||||
|
) from exc
|
||||||
|
|
||||||
|
elapsed_s = max(0.0, time.monotonic() - started)
|
||||||
|
if completed.returncode != 0:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.AIRBORNE_PIPELINE,
|
||||||
|
f"gps-denied-replay exited {completed.returncode}\n"
|
||||||
|
f"stdout:\n{completed.stdout}\nstderr:\n{completed.stderr}",
|
||||||
|
)
|
||||||
|
logger.info(
|
||||||
|
"e2e_orchestrator: replay subprocess complete",
|
||||||
|
extra={
|
||||||
|
"kind": "e2e_orchestrator.replay_subprocess",
|
||||||
|
"kv": {
|
||||||
|
"elapsed_s": elapsed_s,
|
||||||
|
"max_seconds": max_seconds,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
)
|
||||||
|
return elapsed_s
|
||||||
|
|
||||||
|
|
||||||
|
def _parse_jsonl(path: Path) -> list[dict[str, Any]]:
|
||||||
|
"""Read one JSON record per non-blank line.
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
OrchestrationFailure (step=PARSE_EMISSIONS): Output file
|
||||||
|
missing, unreadable, has zero records, or contains a
|
||||||
|
malformed line.
|
||||||
|
"""
|
||||||
|
if not path.is_file():
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.PARSE_EMISSIONS,
|
||||||
|
f"replay output JSONL not found: {path}",
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
text = path.read_text()
|
||||||
|
except OSError as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.PARSE_EMISSIONS,
|
||||||
|
f"replay output JSONL unreadable at {path}: {exc!r}",
|
||||||
|
) from exc
|
||||||
|
rows: list[dict[str, Any]] = []
|
||||||
|
for line_idx, line in enumerate(text.splitlines(), start=1):
|
||||||
|
if not line.strip():
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
row = json.loads(line)
|
||||||
|
except json.JSONDecodeError as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.PARSE_EMISSIONS,
|
||||||
|
f"malformed JSON at line {line_idx} of {path}: {exc.msg}",
|
||||||
|
) from exc
|
||||||
|
if not isinstance(row, dict):
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.PARSE_EMISSIONS,
|
||||||
|
f"line {line_idx} of {path} is not a JSON object: {row!r}",
|
||||||
|
)
|
||||||
|
rows.append(row)
|
||||||
|
if not rows:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.PARSE_EMISSIONS,
|
||||||
|
f"replay output JSONL at {path} has zero records — pipeline "
|
||||||
|
"produced no estimator emissions",
|
||||||
|
)
|
||||||
|
return rows
|
||||||
|
|
||||||
|
|
||||||
|
def _load_ground_truth(tlog_path: Path) -> list[GroundTruthRow]:
|
||||||
|
"""Extract WGS84 ground truth from the binary tlog.
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
OrchestrationFailure (step=LOAD_GROUND_TRUTH): Loader
|
||||||
|
error or empty record list.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
series = load_tlog_ground_truth(tlog_path).records
|
||||||
|
except Exception as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.LOAD_GROUND_TRUTH,
|
||||||
|
f"load_tlog_ground_truth({tlog_path}) failed: {exc!r}",
|
||||||
|
) from exc
|
||||||
|
rows: list[GroundTruthRow] = [
|
||||||
|
GroundTruthRow(
|
||||||
|
t_s=fix.ts_ns / 1e9,
|
||||||
|
lat_deg=fix.lat_deg,
|
||||||
|
lon_deg=fix.lon_deg,
|
||||||
|
alt_m=fix.alt_m,
|
||||||
|
)
|
||||||
|
for fix in series
|
||||||
|
]
|
||||||
|
if not rows:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.LOAD_GROUND_TRUTH,
|
||||||
|
f"tlog ground truth at {tlog_path} has zero rows",
|
||||||
|
)
|
||||||
|
return rows
|
||||||
|
|
||||||
|
|
||||||
|
def _compute_distribution(
|
||||||
|
emissions: list[dict[str, Any]],
|
||||||
|
ground_truth: list[GroundTruthRow],
|
||||||
|
) -> HorizontalErrorDistribution:
|
||||||
|
"""Compute the horizontal-error distribution.
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
OrchestrationFailure (step=COMPUTE_DISTRIBUTION): Helper
|
||||||
|
error or zero ground-truth pairings (every emission
|
||||||
|
fell outside the GT time window).
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
distribution = horizontal_error_distribution(emissions, ground_truth)
|
||||||
|
except Exception as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.COMPUTE_DISTRIBUTION,
|
||||||
|
f"horizontal_error_distribution failed: {exc!r}",
|
||||||
|
) from exc
|
||||||
|
if distribution.count == 0:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.COMPUTE_DISTRIBUTION,
|
||||||
|
"no emissions paired with ground truth — JSONL timestamps "
|
||||||
|
"outside the tlog GPS window?",
|
||||||
|
)
|
||||||
|
return distribution
|
||||||
|
|
||||||
|
|
||||||
|
def _render_and_write_report(
|
||||||
|
*,
|
||||||
|
distribution: HorizontalErrorDistribution,
|
||||||
|
context: ReportContext,
|
||||||
|
passed: bool,
|
||||||
|
report_dir: Path,
|
||||||
|
) -> Path:
|
||||||
|
"""Render the verdict markdown and write it to ``report_dir``.
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
OrchestrationFailure (step=RENDER_REPORT): Render or write
|
||||||
|
failure; ``report_dir`` was already created by
|
||||||
|
:func:`_validate_inputs`.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
report_text = render_report(distribution, context, passed=passed)
|
||||||
|
except Exception as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.RENDER_REPORT,
|
||||||
|
f"render_report failed: {exc!r}",
|
||||||
|
) from exc
|
||||||
|
report_path = (
|
||||||
|
report_dir / f"real_flight_validation_{context.run_date_utc}.md"
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
report_path.write_text(report_text)
|
||||||
|
except OSError as exc:
|
||||||
|
raise OrchestrationFailure(
|
||||||
|
OrchestratorStep.RENDER_REPORT,
|
||||||
|
f"cannot write report at {report_path}: {exc!r}",
|
||||||
|
) from exc
|
||||||
|
return report_path
|
||||||
@@ -0,0 +1,182 @@
|
|||||||
|
"""AZ-840 — E2E orchestrator integration test (AC-1 / AC-2 / AC-3 / AC-4 / AC-6).
|
||||||
|
|
||||||
|
The Tier-2 entry point that closes Epic AZ-835's narrative: from a
|
||||||
|
``(tlog, video, calibration)`` triple, run the full 7-step pipeline
|
||||||
|
end-to-end on the Jetson harness without operator hand-curation
|
||||||
|
between steps.
|
||||||
|
|
||||||
|
The test consumes:
|
||||||
|
|
||||||
|
* :func:`tests.e2e.replay.conftest.operator_pre_flight_setup` —
|
||||||
|
the AZ-839 C3 fixture that owns steps 3-5 (route extraction +
|
||||||
|
satellite-provider seeding + FAISS index build) and yields a
|
||||||
|
:class:`PopulatedC6Cache` keyed off a freshly-mktemp'd
|
||||||
|
``cache_root``.
|
||||||
|
* :func:`tests.e2e.replay.conftest.derkachi_replay_inputs` — the
|
||||||
|
shared session fixture that materialises the Derkachi tlog +
|
||||||
|
video + factory-sheet calibration + signing-key file.
|
||||||
|
* :func:`tests.e2e.replay._e2e_orchestrator.run_e2e_orchestration`
|
||||||
|
— the AC-1 driver that wires everything below the C3 fixture.
|
||||||
|
|
||||||
|
The driver writes a fresh effective replay config per session
|
||||||
|
(merging the static operator YAML with the cache_root override),
|
||||||
|
invokes ``gps-denied-replay --auto-trim``, parses the JSONL
|
||||||
|
emissions, computes the horizontal-error distribution, and writes
|
||||||
|
the verdict markdown under ``_docs/06_metrics/`` (AC-2).
|
||||||
|
|
||||||
|
Skip gates (in evaluation order):
|
||||||
|
|
||||||
|
1. ``@pytest.mark.tier2`` — the per-suite Tier-2 plugin gates this
|
||||||
|
off on dev macOS (matches the AZ-839 / AZ-699 contract).
|
||||||
|
2. ``RUN_REPLAY_E2E`` not in ``{1, true, yes, on}``.
|
||||||
|
3. ``gps-denied-replay`` console-script not on ``PATH``.
|
||||||
|
4. Real video missing or placeholder-sized (mirrors AZ-699's gate).
|
||||||
|
5. ``operator_pre_flight_setup`` fixture itself skipped — the
|
||||||
|
downstream consumer inherits the SKIP automatically (pytest's
|
||||||
|
fixture-skip propagation).
|
||||||
|
|
||||||
|
AC-7 (AZ-699 continues to pass) is satisfied by inspection: this
|
||||||
|
test does not modify ``test_derkachi_real_tlog.py`` and writes its
|
||||||
|
report to the same path (``real_flight_validation_<date>.md``) but
|
||||||
|
in an idempotent way — both tests writing PASS or both writing
|
||||||
|
FAIL is the expected joint outcome on a given clip.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import os
|
||||||
|
import shutil
|
||||||
|
import sys
|
||||||
|
from collections.abc import Iterator
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from tests.e2e.replay._e2e_orchestrator import (
|
||||||
|
OrchestrationReport,
|
||||||
|
run_e2e_orchestration,
|
||||||
|
)
|
||||||
|
from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
|
||||||
|
from tests.e2e.replay.conftest import DerkachiReplayInputs
|
||||||
|
|
||||||
|
|
||||||
|
def _repo_root() -> Path:
|
||||||
|
return Path(__file__).resolve().parents[3]
|
||||||
|
|
||||||
|
|
||||||
|
def _derkachi_dir() -> Path:
|
||||||
|
return _repo_root() / "_docs" / "00_problem" / "input_data" / "flight_derkachi"
|
||||||
|
|
||||||
|
|
||||||
|
_MIN_REAL_VIDEO_BYTES: int = 1_000_000
|
||||||
|
|
||||||
|
|
||||||
|
def _replay_binary() -> Path | None:
|
||||||
|
"""Return the absolute path to ``gps-denied-replay`` or ``None``.
|
||||||
|
|
||||||
|
Same lookup order AZ-699 uses: PATH first, venv bin second.
|
||||||
|
"""
|
||||||
|
|
||||||
|
binary = shutil.which("gps-denied-replay")
|
||||||
|
if binary is not None:
|
||||||
|
return Path(binary)
|
||||||
|
venv_bin = Path(sys.executable).parent / "gps-denied-replay"
|
||||||
|
if venv_bin.exists():
|
||||||
|
return venv_bin
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _orchestrator_skip_reason() -> str | None:
|
||||||
|
"""Return a SKIP message when env / inputs preclude a Jetson run."""
|
||||||
|
|
||||||
|
if os.environ.get("RUN_REPLAY_E2E", "").strip().lower() not in {
|
||||||
|
"1",
|
||||||
|
"true",
|
||||||
|
"yes",
|
||||||
|
"on",
|
||||||
|
}:
|
||||||
|
return "AZ-840 e2e orchestrator gated by RUN_REPLAY_E2E=1"
|
||||||
|
if not os.environ.get("GPS_DENIED_OPERATOR_CONFIG_PATH", "").strip():
|
||||||
|
return (
|
||||||
|
"AZ-840 e2e orchestrator requires GPS_DENIED_OPERATOR_CONFIG_PATH "
|
||||||
|
"(same env var the C3 fixture consumes)"
|
||||||
|
)
|
||||||
|
if _replay_binary() is None:
|
||||||
|
return "gps-denied-replay console-script not installed"
|
||||||
|
video = _derkachi_dir() / "flight_derkachi.mp4"
|
||||||
|
if not video.is_file():
|
||||||
|
return f"Derkachi video missing: {video}"
|
||||||
|
if video.stat().st_size < _MIN_REAL_VIDEO_BYTES:
|
||||||
|
return (
|
||||||
|
f"Derkachi video at {video} is only {video.stat().st_size} "
|
||||||
|
"bytes — placeholder, not a real recording"
|
||||||
|
)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def az840_skip_gate() -> Iterator[None]:
|
||||||
|
"""Skip-gate the orchestrator test before any heavy fixtures resolve."""
|
||||||
|
|
||||||
|
reason = _orchestrator_skip_reason()
|
||||||
|
if reason is not None:
|
||||||
|
pytest.skip(reason)
|
||||||
|
yield
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.tier2
|
||||||
|
def test_az840_e2e_real_flight_orchestration(
|
||||||
|
az840_skip_gate: None,
|
||||||
|
operator_pre_flight_setup: PopulatedC6Cache,
|
||||||
|
derkachi_replay_inputs: DerkachiReplayInputs,
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange — every input besides cache_root comes from the existing
|
||||||
|
# session fixtures so the same Tier-2 harness setup that powers
|
||||||
|
# AZ-699 + AZ-839 is exercised.
|
||||||
|
binary = _replay_binary()
|
||||||
|
assert binary is not None, "skip gate already verified the binary exists"
|
||||||
|
base_config_path = Path(os.environ["GPS_DENIED_OPERATOR_CONFIG_PATH"])
|
||||||
|
output_path = tmp_path / "estimator_output.jsonl"
|
||||||
|
effective_config_path = tmp_path / "operator_config_effective.yaml"
|
||||||
|
report_dir = _repo_root() / "_docs" / "06_metrics"
|
||||||
|
|
||||||
|
# Act
|
||||||
|
report = run_e2e_orchestration(
|
||||||
|
populated_cache=operator_pre_flight_setup,
|
||||||
|
base_config_path=base_config_path,
|
||||||
|
tlog_path=derkachi_replay_inputs.tlog_path,
|
||||||
|
video_path=derkachi_replay_inputs.video_path,
|
||||||
|
calibration_path=derkachi_replay_inputs.calibration_path,
|
||||||
|
signing_key_path=derkachi_replay_inputs.signing_key_path,
|
||||||
|
replay_binary=binary,
|
||||||
|
output_path=output_path,
|
||||||
|
report_dir=report_dir,
|
||||||
|
effective_config_path=effective_config_path,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assert AC-2 + AC-4 — report exists; full run within the 15-min budget.
|
||||||
|
assert isinstance(report, OrchestrationReport)
|
||||||
|
assert report.report_path.is_file()
|
||||||
|
body = report.report_path.read_text()
|
||||||
|
assert "## Horizontal error (metres)" in body
|
||||||
|
assert "## Threshold-hit share" in body
|
||||||
|
assert "Mean" in body
|
||||||
|
for threshold in (10, 25, 50, 100):
|
||||||
|
assert f"| {threshold} |" in body, (
|
||||||
|
f"threshold {threshold} m row missing from report"
|
||||||
|
)
|
||||||
|
assert report.replay_subprocess_seconds <= 900.0, (
|
||||||
|
"AZ-840 AC-4: replay subprocess exceeded 15-min soft target"
|
||||||
|
)
|
||||||
|
assert report.wall_clock_s >= report.replay_subprocess_seconds
|
||||||
|
assert report.distribution.count > 0, (
|
||||||
|
"no emissions paired with ground truth — orchestration produced "
|
||||||
|
"data but every emission fell outside the tlog GPS window"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assert AC-3 — the effective config was written and points at the
|
||||||
|
# cache_root the C3 fixture supplied.
|
||||||
|
assert effective_config_path.is_file()
|
||||||
|
effective_text = effective_config_path.read_text()
|
||||||
|
assert str(operator_pre_flight_setup.cache_root) in effective_text
|
||||||
@@ -0,0 +1,671 @@
|
|||||||
|
"""Unit tests for the AZ-840 e2e orchestrator (AC-8).
|
||||||
|
|
||||||
|
The end-to-end happy path is the Tier-2 integration test in
|
||||||
|
``test_az835_e2e_real_flight.py`` (AC-1 / AC-2). This module covers
|
||||||
|
the orchestration helper layer in isolation:
|
||||||
|
|
||||||
|
* Param validation — every required path must exist before the
|
||||||
|
airborne subprocess is spawned (AC-5 fails LOUD).
|
||||||
|
* Effective-config merge — the ``c6_tile_cache.root_dir`` override
|
||||||
|
is written to YAML; the rest of the base config is preserved.
|
||||||
|
* Error propagation per step — every documented failure surfaces
|
||||||
|
as :class:`OrchestrationFailure` with the correct
|
||||||
|
:class:`OrchestratorStep` label.
|
||||||
|
* Happy path — when the runner returns success and the JSONL +
|
||||||
|
ground truth align, :class:`OrchestrationReport` carries a
|
||||||
|
written report path and an honest verdict (AC-2: report exists
|
||||||
|
PASS or FAIL).
|
||||||
|
|
||||||
|
The tests inject a fake ``runner`` so no real
|
||||||
|
``gps-denied-replay`` subprocess is spawned. Real binary execution
|
||||||
|
is exercised on the Jetson harness via the AC-1 integration test.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import subprocess
|
||||||
|
from pathlib import Path
|
||||||
|
from unittest.mock import MagicMock
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
import yaml
|
||||||
|
|
||||||
|
from gps_denied_onboard.helpers.accuracy_report import (
|
||||||
|
AC3_GATE_THRESHOLD_M,
|
||||||
|
)
|
||||||
|
from gps_denied_onboard.replay_input.tlog_route import RouteSpec
|
||||||
|
|
||||||
|
from tests.e2e.replay._e2e_orchestrator import (
|
||||||
|
OrchestrationFailure,
|
||||||
|
OrchestrationReport,
|
||||||
|
OrchestratorStep,
|
||||||
|
read_calibration_acquisition_method,
|
||||||
|
run_e2e_orchestration,
|
||||||
|
write_effective_replay_config,
|
||||||
|
)
|
||||||
|
from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------------
|
||||||
|
# Helpers
|
||||||
|
|
||||||
|
|
||||||
|
def _build_populated_cache(tmp_path: Path) -> PopulatedC6Cache:
|
||||||
|
"""Construct a synthetic :class:`PopulatedC6Cache`.
|
||||||
|
|
||||||
|
The orchestrator only consumes ``cache_root`` from the cache,
|
||||||
|
so the FAISS sidecar paths are placeholders. The route_spec is
|
||||||
|
a minimal one-waypoint instance — no AZ-836 invariants are
|
||||||
|
re-asserted by AZ-840.
|
||||||
|
"""
|
||||||
|
|
||||||
|
cache_root = tmp_path / "cache_root"
|
||||||
|
cache_root.mkdir()
|
||||||
|
return PopulatedC6Cache(
|
||||||
|
cache_root=cache_root,
|
||||||
|
tile_store_path=cache_root / "tiles",
|
||||||
|
faiss_index_path=cache_root / "descriptor.index",
|
||||||
|
faiss_sidecar_sha256_path=cache_root / "descriptor.index.sha256",
|
||||||
|
faiss_sidecar_meta_path=cache_root / "descriptor.index.meta.json",
|
||||||
|
route_spec=RouteSpec(
|
||||||
|
waypoints=((50.10, 36.10),),
|
||||||
|
suggested_region_size_meters=500.0,
|
||||||
|
source_tlog=Path("test.tlog"),
|
||||||
|
source_segment=(0, 100),
|
||||||
|
total_distance_meters=0.0,
|
||||||
|
),
|
||||||
|
tile_count=1,
|
||||||
|
elapsed_seconds=0.0,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _stage_inputs(tmp_path: Path) -> dict[str, Path]:
|
||||||
|
"""Write touch-files for every input path the orchestrator validates.
|
||||||
|
|
||||||
|
The base config YAML carries one stub block so the merge step
|
||||||
|
has a real document to overlay on.
|
||||||
|
"""
|
||||||
|
|
||||||
|
base_config = tmp_path / "operator_config.yaml"
|
||||||
|
base_config.write_text(
|
||||||
|
yaml.safe_dump(
|
||||||
|
{
|
||||||
|
"mode": "replay",
|
||||||
|
"c6_tile_cache": {
|
||||||
|
"store_runtime": "postgres_filesystem",
|
||||||
|
"metadata_runtime": "postgres_filesystem",
|
||||||
|
"descriptor_index_runtime": "faiss_hnsw",
|
||||||
|
"root_dir": "/var/lib/gps-denied/tiles",
|
||||||
|
"faiss_index_path": "/some/static/path/descriptor.index",
|
||||||
|
},
|
||||||
|
}
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
tlog = tmp_path / "input.tlog"
|
||||||
|
tlog.write_bytes(b"\x00")
|
||||||
|
video = tmp_path / "input.mp4"
|
||||||
|
video.write_bytes(b"\x00")
|
||||||
|
calibration = tmp_path / "calibration.json"
|
||||||
|
calibration.write_text(json.dumps({"acquisition_method": "factory-sheet"}))
|
||||||
|
signing_key = tmp_path / "signing_key.bin"
|
||||||
|
signing_key.write_bytes(b"\x00" * 32)
|
||||||
|
binary = tmp_path / "gps-denied-replay"
|
||||||
|
binary.write_text("")
|
||||||
|
|
||||||
|
return {
|
||||||
|
"base_config_path": base_config,
|
||||||
|
"tlog_path": tlog,
|
||||||
|
"video_path": video,
|
||||||
|
"calibration_path": calibration,
|
||||||
|
"signing_key_path": signing_key,
|
||||||
|
"replay_binary": binary,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _ground_truth_tlog_loader(
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
*,
|
||||||
|
times_s: tuple[float, ...] = (0.0, 1.0, 2.0),
|
||||||
|
lat_deg: float = 50.10,
|
||||||
|
lon_deg: float = 36.10,
|
||||||
|
alt_m: float = 100.0,
|
||||||
|
) -> None:
|
||||||
|
"""Stub the orchestrator's ground-truth loader so unit tests skip MAVLink.
|
||||||
|
|
||||||
|
The orchestrator imports ``load_tlog_ground_truth`` from
|
||||||
|
``gps_denied_onboard.replay_input``; patching the symbol *as
|
||||||
|
bound on the orchestrator module* keeps the patch local to the
|
||||||
|
unit suite (no cross-test bleed).
|
||||||
|
"""
|
||||||
|
|
||||||
|
fixes = [
|
||||||
|
_StubGpsFix(
|
||||||
|
ts_ns=int(t * 1e9),
|
||||||
|
lat_deg=lat_deg,
|
||||||
|
lon_deg=lon_deg,
|
||||||
|
alt_m=alt_m,
|
||||||
|
)
|
||||||
|
for t in times_s
|
||||||
|
]
|
||||||
|
series = _StubGpsSeries(records=tuple(fixes))
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"tests.e2e.replay._e2e_orchestrator.load_tlog_ground_truth",
|
||||||
|
lambda *_args, **_kwargs: series,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class _StubGpsFix:
|
||||||
|
"""Mirrors the fields the orchestrator reads from each tlog row."""
|
||||||
|
|
||||||
|
__slots__ = ("ts_ns", "lat_deg", "lon_deg", "alt_m")
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self, *, ts_ns: int, lat_deg: float, lon_deg: float, alt_m: float
|
||||||
|
) -> None:
|
||||||
|
self.ts_ns = ts_ns
|
||||||
|
self.lat_deg = lat_deg
|
||||||
|
self.lon_deg = lon_deg
|
||||||
|
self.alt_m = alt_m
|
||||||
|
|
||||||
|
|
||||||
|
class _StubGpsSeries:
|
||||||
|
"""Drop-in replacement for :class:`TlogGroundTruth`."""
|
||||||
|
|
||||||
|
def __init__(self, *, records: tuple[_StubGpsFix, ...]) -> None:
|
||||||
|
self.records = records
|
||||||
|
|
||||||
|
|
||||||
|
def _build_runner_emitting(
|
||||||
|
output_path: Path,
|
||||||
|
*,
|
||||||
|
rows: list[dict[str, object]],
|
||||||
|
returncode: int = 0,
|
||||||
|
stdout: str = "",
|
||||||
|
stderr: str = "",
|
||||||
|
) -> "MagicMock":
|
||||||
|
"""Return a fake ``subprocess.run`` that writes JSONL on call."""
|
||||||
|
|
||||||
|
def _run(argv, **kwargs): # type: ignore[no-untyped-def]
|
||||||
|
if rows:
|
||||||
|
output_path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
output_path.write_text(
|
||||||
|
"\n".join(json.dumps(row) for row in rows) + "\n"
|
||||||
|
)
|
||||||
|
return subprocess.CompletedProcess(
|
||||||
|
args=argv,
|
||||||
|
returncode=returncode,
|
||||||
|
stdout=stdout,
|
||||||
|
stderr=stderr,
|
||||||
|
)
|
||||||
|
|
||||||
|
return MagicMock(side_effect=_run)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------------
|
||||||
|
# write_effective_replay_config
|
||||||
|
|
||||||
|
|
||||||
|
def test_write_effective_replay_config_overlays_root_dir(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
cache_root = tmp_path / "cache"
|
||||||
|
cache_root.mkdir()
|
||||||
|
output_path = tmp_path / "effective.yaml"
|
||||||
|
|
||||||
|
# Act
|
||||||
|
written_path = write_effective_replay_config(
|
||||||
|
base_config_path=inputs["base_config_path"],
|
||||||
|
cache_root=cache_root,
|
||||||
|
output_path=output_path,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert written_path == output_path
|
||||||
|
merged = yaml.safe_load(output_path.read_text())
|
||||||
|
assert merged["c6_tile_cache"]["root_dir"] == str(cache_root)
|
||||||
|
assert merged["c6_tile_cache"]["faiss_index_path"] == ""
|
||||||
|
assert merged["mode"] == "replay"
|
||||||
|
assert (
|
||||||
|
merged["c6_tile_cache"]["store_runtime"] == "postgres_filesystem"
|
||||||
|
), "non-overridden c6_tile_cache fields must survive"
|
||||||
|
|
||||||
|
|
||||||
|
def test_write_effective_replay_config_creates_block_when_absent(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
base = tmp_path / "operator.yaml"
|
||||||
|
base.write_text(yaml.safe_dump({"mode": "replay"}))
|
||||||
|
cache_root = tmp_path / "cache"
|
||||||
|
cache_root.mkdir()
|
||||||
|
|
||||||
|
# Act
|
||||||
|
write_effective_replay_config(
|
||||||
|
base_config_path=base,
|
||||||
|
cache_root=cache_root,
|
||||||
|
output_path=tmp_path / "effective.yaml",
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
merged = yaml.safe_load((tmp_path / "effective.yaml").read_text())
|
||||||
|
assert merged["c6_tile_cache"]["root_dir"] == str(cache_root)
|
||||||
|
|
||||||
|
|
||||||
|
def test_write_effective_replay_config_malformed_yaml_fails(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
base = tmp_path / "bad.yaml"
|
||||||
|
base.write_text(":\n : not yaml:")
|
||||||
|
cache_root = tmp_path / "cache"
|
||||||
|
cache_root.mkdir()
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(OrchestrationFailure) as exc_info:
|
||||||
|
write_effective_replay_config(
|
||||||
|
base_config_path=base,
|
||||||
|
cache_root=cache_root,
|
||||||
|
output_path=tmp_path / "effective.yaml",
|
||||||
|
)
|
||||||
|
assert exc_info.value.step is OrchestratorStep.WRITE_EFFECTIVE_CONFIG
|
||||||
|
|
||||||
|
|
||||||
|
def test_write_effective_replay_config_non_mapping_top_level_fails(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
base = tmp_path / "bad.yaml"
|
||||||
|
base.write_text("- not a mapping\n")
|
||||||
|
cache_root = tmp_path / "cache"
|
||||||
|
cache_root.mkdir()
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(OrchestrationFailure) as exc_info:
|
||||||
|
write_effective_replay_config(
|
||||||
|
base_config_path=base,
|
||||||
|
cache_root=cache_root,
|
||||||
|
output_path=tmp_path / "effective.yaml",
|
||||||
|
)
|
||||||
|
assert exc_info.value.step is OrchestratorStep.WRITE_EFFECTIVE_CONFIG
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------------
|
||||||
|
# read_calibration_acquisition_method
|
||||||
|
|
||||||
|
|
||||||
|
def test_read_calibration_acquisition_method_returns_field_when_present(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
path = tmp_path / "cal.json"
|
||||||
|
path.write_text(json.dumps({"acquisition_method": "factory-sheet"}))
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert read_calibration_acquisition_method(path) == "factory-sheet"
|
||||||
|
|
||||||
|
|
||||||
|
def test_read_calibration_acquisition_method_returns_unknown_on_missing(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
path = tmp_path / "cal.json"
|
||||||
|
path.write_text(json.dumps({"some_other_field": True}))
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert read_calibration_acquisition_method(path) == "unknown"
|
||||||
|
|
||||||
|
|
||||||
|
def test_read_calibration_acquisition_method_returns_unknown_on_malformed(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
path = tmp_path / "cal.json"
|
||||||
|
path.write_text("{not valid json")
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert read_calibration_acquisition_method(path) == "unknown"
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------------
|
||||||
|
# run_e2e_orchestration — param validation (AC-5)
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_e2e_orchestration_missing_tlog_fails_loud(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
cache = _build_populated_cache(tmp_path)
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
inputs["tlog_path"].unlink()
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(OrchestrationFailure) as exc_info:
|
||||||
|
run_e2e_orchestration(
|
||||||
|
populated_cache=cache,
|
||||||
|
output_path=tmp_path / "out.jsonl",
|
||||||
|
report_dir=tmp_path / "metrics",
|
||||||
|
effective_config_path=tmp_path / "eff.yaml",
|
||||||
|
**inputs, # type: ignore[arg-type]
|
||||||
|
)
|
||||||
|
assert exc_info.value.step is OrchestratorStep.VALIDATE_INPUTS
|
||||||
|
assert "tlog_path" in str(exc_info.value)
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_e2e_orchestration_missing_binary_fails_loud(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
cache = _build_populated_cache(tmp_path)
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
inputs["replay_binary"].unlink()
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(OrchestrationFailure) as exc_info:
|
||||||
|
run_e2e_orchestration(
|
||||||
|
populated_cache=cache,
|
||||||
|
output_path=tmp_path / "out.jsonl",
|
||||||
|
report_dir=tmp_path / "metrics",
|
||||||
|
effective_config_path=tmp_path / "eff.yaml",
|
||||||
|
**inputs, # type: ignore[arg-type]
|
||||||
|
)
|
||||||
|
assert exc_info.value.step is OrchestratorStep.VALIDATE_INPUTS
|
||||||
|
assert "replay_binary" in str(exc_info.value)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------------
|
||||||
|
# run_e2e_orchestration — subprocess error propagation (AC-5)
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_e2e_orchestration_replay_nonzero_exit_fails_loud(
|
||||||
|
tmp_path: Path,
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
cache = _build_populated_cache(tmp_path)
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
output_path = tmp_path / "out.jsonl"
|
||||||
|
runner = MagicMock(
|
||||||
|
return_value=subprocess.CompletedProcess(
|
||||||
|
args=[],
|
||||||
|
returncode=1,
|
||||||
|
stdout="",
|
||||||
|
stderr="boom",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
_ground_truth_tlog_loader(monkeypatch)
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(OrchestrationFailure) as exc_info:
|
||||||
|
run_e2e_orchestration(
|
||||||
|
populated_cache=cache,
|
||||||
|
output_path=output_path,
|
||||||
|
report_dir=tmp_path / "metrics",
|
||||||
|
effective_config_path=tmp_path / "eff.yaml",
|
||||||
|
runner=runner,
|
||||||
|
**inputs, # type: ignore[arg-type]
|
||||||
|
)
|
||||||
|
assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
|
||||||
|
assert "exited 1" in str(exc_info.value)
|
||||||
|
assert "boom" in str(exc_info.value)
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_e2e_orchestration_replay_timeout_fails_loud(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
cache = _build_populated_cache(tmp_path)
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
|
||||||
|
def _timeout(*_args, **_kwargs):
|
||||||
|
raise subprocess.TimeoutExpired(cmd=["replay"], timeout=0.1)
|
||||||
|
|
||||||
|
runner = MagicMock(side_effect=_timeout)
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(OrchestrationFailure) as exc_info:
|
||||||
|
run_e2e_orchestration(
|
||||||
|
populated_cache=cache,
|
||||||
|
output_path=tmp_path / "out.jsonl",
|
||||||
|
report_dir=tmp_path / "metrics",
|
||||||
|
effective_config_path=tmp_path / "eff.yaml",
|
||||||
|
runner=runner,
|
||||||
|
max_seconds=0.1,
|
||||||
|
**inputs, # type: ignore[arg-type]
|
||||||
|
)
|
||||||
|
assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
|
||||||
|
assert "timed out" in str(exc_info.value)
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_e2e_orchestration_replay_oserror_fails_loud(
|
||||||
|
tmp_path: Path,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
cache = _build_populated_cache(tmp_path)
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
|
||||||
|
def _oserror(*_args, **_kwargs):
|
||||||
|
raise OSError("permission denied")
|
||||||
|
|
||||||
|
runner = MagicMock(side_effect=_oserror)
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(OrchestrationFailure) as exc_info:
|
||||||
|
run_e2e_orchestration(
|
||||||
|
populated_cache=cache,
|
||||||
|
output_path=tmp_path / "out.jsonl",
|
||||||
|
report_dir=tmp_path / "metrics",
|
||||||
|
effective_config_path=tmp_path / "eff.yaml",
|
||||||
|
runner=runner,
|
||||||
|
**inputs, # type: ignore[arg-type]
|
||||||
|
)
|
||||||
|
assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
|
||||||
|
assert "permission denied" in str(exc_info.value)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------------
|
||||||
|
# run_e2e_orchestration — empty / malformed JSONL (AC-5)
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_e2e_orchestration_empty_jsonl_fails_loud(
|
||||||
|
tmp_path: Path,
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
cache = _build_populated_cache(tmp_path)
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
output_path = tmp_path / "out.jsonl"
|
||||||
|
|
||||||
|
def _runner(argv, **_kwargs): # type: ignore[no-untyped-def]
|
||||||
|
output_path.write_text("\n\n") # only blanks
|
||||||
|
return subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
|
||||||
|
|
||||||
|
runner = MagicMock(side_effect=_runner)
|
||||||
|
_ground_truth_tlog_loader(monkeypatch)
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(OrchestrationFailure) as exc_info:
|
||||||
|
run_e2e_orchestration(
|
||||||
|
populated_cache=cache,
|
||||||
|
output_path=output_path,
|
||||||
|
report_dir=tmp_path / "metrics",
|
||||||
|
effective_config_path=tmp_path / "eff.yaml",
|
||||||
|
runner=runner,
|
||||||
|
**inputs, # type: ignore[arg-type]
|
||||||
|
)
|
||||||
|
assert exc_info.value.step is OrchestratorStep.PARSE_EMISSIONS
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_e2e_orchestration_malformed_jsonl_fails_loud(
|
||||||
|
tmp_path: Path,
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
cache = _build_populated_cache(tmp_path)
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
output_path = tmp_path / "out.jsonl"
|
||||||
|
|
||||||
|
def _runner(argv, **_kwargs): # type: ignore[no-untyped-def]
|
||||||
|
output_path.write_text('{"valid": true}\nnot a json line\n')
|
||||||
|
return subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
|
||||||
|
|
||||||
|
runner = MagicMock(side_effect=_runner)
|
||||||
|
_ground_truth_tlog_loader(monkeypatch)
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(OrchestrationFailure) as exc_info:
|
||||||
|
run_e2e_orchestration(
|
||||||
|
populated_cache=cache,
|
||||||
|
output_path=output_path,
|
||||||
|
report_dir=tmp_path / "metrics",
|
||||||
|
effective_config_path=tmp_path / "eff.yaml",
|
||||||
|
runner=runner,
|
||||||
|
**inputs, # type: ignore[arg-type]
|
||||||
|
)
|
||||||
|
assert exc_info.value.step is OrchestratorStep.PARSE_EMISSIONS
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------------
|
||||||
|
# run_e2e_orchestration — ground truth loader failure (AC-5)
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_e2e_orchestration_ground_truth_loader_failure_fails_loud(
|
||||||
|
tmp_path: Path,
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
cache = _build_populated_cache(tmp_path)
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
output_path = tmp_path / "out.jsonl"
|
||||||
|
runner = _build_runner_emitting(
|
||||||
|
output_path,
|
||||||
|
rows=[
|
||||||
|
{
|
||||||
|
"emitted_at": int(0.5 * 1e9),
|
||||||
|
"position_wgs84": {
|
||||||
|
"lat_deg": 50.10,
|
||||||
|
"lon_deg": 36.10,
|
||||||
|
"alt_m": 100.0,
|
||||||
|
},
|
||||||
|
}
|
||||||
|
],
|
||||||
|
)
|
||||||
|
|
||||||
|
def _raise(*_args, **_kwargs):
|
||||||
|
raise ValueError("tlog corrupt")
|
||||||
|
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"tests.e2e.replay._e2e_orchestrator.load_tlog_ground_truth",
|
||||||
|
_raise,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Act + Assert
|
||||||
|
with pytest.raises(OrchestrationFailure) as exc_info:
|
||||||
|
run_e2e_orchestration(
|
||||||
|
populated_cache=cache,
|
||||||
|
output_path=output_path,
|
||||||
|
report_dir=tmp_path / "metrics",
|
||||||
|
effective_config_path=tmp_path / "eff.yaml",
|
||||||
|
runner=runner,
|
||||||
|
**inputs, # type: ignore[arg-type]
|
||||||
|
)
|
||||||
|
assert exc_info.value.step is OrchestratorStep.LOAD_GROUND_TRUTH
|
||||||
|
assert "tlog corrupt" in str(exc_info.value)
|
||||||
|
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------------
|
||||||
|
# run_e2e_orchestration — happy path (AC-1 / AC-2)
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_e2e_orchestration_happy_path_writes_report(
|
||||||
|
tmp_path: Path,
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
# Arrange
|
||||||
|
cache = _build_populated_cache(tmp_path)
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
output_path = tmp_path / "out.jsonl"
|
||||||
|
report_dir = tmp_path / "metrics"
|
||||||
|
effective_config_path = tmp_path / "eff.yaml"
|
||||||
|
rows = [
|
||||||
|
{
|
||||||
|
"emitted_at": int(0.5 * 1e9),
|
||||||
|
"position_wgs84": {"lat_deg": 50.10, "lon_deg": 36.10, "alt_m": 100.0},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"emitted_at": int(1.5 * 1e9),
|
||||||
|
"position_wgs84": {"lat_deg": 50.10, "lon_deg": 36.10, "alt_m": 100.0},
|
||||||
|
},
|
||||||
|
]
|
||||||
|
runner = _build_runner_emitting(output_path, rows=rows)
|
||||||
|
_ground_truth_tlog_loader(monkeypatch)
|
||||||
|
|
||||||
|
# Act
|
||||||
|
report = run_e2e_orchestration(
|
||||||
|
populated_cache=cache,
|
||||||
|
output_path=output_path,
|
||||||
|
report_dir=report_dir,
|
||||||
|
effective_config_path=effective_config_path,
|
||||||
|
runner=runner,
|
||||||
|
run_date_utc="2026-05-23",
|
||||||
|
**inputs, # type: ignore[arg-type]
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assert
|
||||||
|
assert isinstance(report, OrchestrationReport)
|
||||||
|
assert report.report_path.is_file()
|
||||||
|
assert report.emissions_count == 2
|
||||||
|
assert report.distribution.count == 2
|
||||||
|
assert report.verdict_passed is True
|
||||||
|
body = report.report_path.read_text()
|
||||||
|
assert "## Horizontal error (metres)" in body
|
||||||
|
assert "## Threshold-hit share" in body
|
||||||
|
assert f"| {AC3_GATE_THRESHOLD_M:g} |" in body
|
||||||
|
runner.assert_called_once()
|
||||||
|
argv_passed = runner.call_args.args[0]
|
||||||
|
assert str(effective_config_path) in argv_passed
|
||||||
|
assert "--auto-trim" in argv_passed
|
||||||
|
merged = yaml.safe_load(effective_config_path.read_text())
|
||||||
|
assert merged["c6_tile_cache"]["root_dir"] == str(cache.cache_root)
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_e2e_orchestration_writes_report_even_on_fail_verdict(
|
||||||
|
tmp_path: Path,
|
||||||
|
monkeypatch: pytest.MonkeyPatch,
|
||||||
|
) -> None:
|
||||||
|
# Arrange — emissions are 1 km from ground truth, far above the 100 m gate.
|
||||||
|
cache = _build_populated_cache(tmp_path)
|
||||||
|
inputs = _stage_inputs(tmp_path)
|
||||||
|
output_path = tmp_path / "out.jsonl"
|
||||||
|
report_dir = tmp_path / "metrics"
|
||||||
|
rows = [
|
||||||
|
{
|
||||||
|
"emitted_at": int(0.5 * 1e9),
|
||||||
|
"position_wgs84": {"lat_deg": 50.110, "lon_deg": 36.110, "alt_m": 100.0},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"emitted_at": int(1.5 * 1e9),
|
||||||
|
"position_wgs84": {"lat_deg": 50.110, "lon_deg": 36.110, "alt_m": 100.0},
|
||||||
|
},
|
||||||
|
]
|
||||||
|
runner = _build_runner_emitting(output_path, rows=rows)
|
||||||
|
_ground_truth_tlog_loader(monkeypatch)
|
||||||
|
|
||||||
|
# Act
|
||||||
|
report = run_e2e_orchestration(
|
||||||
|
populated_cache=cache,
|
||||||
|
output_path=output_path,
|
||||||
|
report_dir=report_dir,
|
||||||
|
effective_config_path=tmp_path / "eff.yaml",
|
||||||
|
runner=runner,
|
||||||
|
run_date_utc="2026-05-23",
|
||||||
|
**inputs, # type: ignore[arg-type]
|
||||||
|
)
|
||||||
|
|
||||||
|
# Assert — AC-2: report exists regardless of PASS/FAIL.
|
||||||
|
assert report.verdict_passed is False
|
||||||
|
assert report.report_path.is_file()
|
||||||
|
assert "FAIL" in report.report_path.read_text()
|
||||||
Reference in New Issue
Block a user