mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 14:51:13 +00:00
[AZ-404] [AZ-389] [AZ-559] E2E replay test (Derkachi 60s) + AZ-389 cleanup
Batch 63 of /autodev replay slice. Adds the AZ-404 E2E test harness against the Derkachi fixture and resolves the AZ-389 dependency phantom (closing AZ-559 Won't Fix). E2E test (AZ-404) - tests/e2e/replay/_tlog_synth.py: deterministic CSV->tlog generator (the original Derkachi tlog is not in repo; data_imu.csv is its export, so we round-trip the CSV through pymavlink). Verified: SCALED_IMU2 + ATTITUDE + GPS_RAW_INT + HEARTBEAT round-trip cleanly through mavutil.mavlink_connection. - tests/e2e/replay/_helpers.py: parse_jsonl, l2_horizontal_m (haversine), match_percentage, CapturingMavlinkTransport (ready for AZ-558 unblock), GroundTruthRow + load_ground_truth_csv. - tests/e2e/replay/conftest.py: derkachi_replay_inputs (session scope), replay_runner (subprocess fixture per AZ-402 CLI), operator_pre_flight_setup placeholder. - tests/e2e/replay/test_derkachi_1min.py: 9 tests covering AC-1..AC-8 with AC-7 skip-gate self-check + AC-4a mode-agnosticism AST scan (passes unconditionally, confirms ADR-011 holding). - tests/e2e/replay/test_helpers.py: 14 unit tests covering AC-9 helper L2 correctness + match_percentage + parse_jsonl + CapturingMavlinkTransport (all unconditional). - tests/e2e/replay/README.md: AC matrix, fixture state, runtime budget, failure cookbook (AC-10). AC matrix - AC-1, AC-2, AC-5, AC-6 implemented and Tier-1 gated on RUN_REPLAY_E2E=1. - AC-3 (<=100m for 80%) xfail until real Topotek KHP20S30 calibration ships (camera_info.md states intrinsics are unknown). - AC-4a (mode-agnosticism AST scan) PASSES unconditionally. - AC-4b (encoder byte-equality) skip until AZ-558 routes C8 bytes through MavlinkTransport. - AC-7 (skip-gate self-check) PASSES unconditionally. - AC-8 (operator workflow rehearsal) skip until D-PROJ-2 mock-suite-sat-service implements tile-fetch + index-build endpoints. - AC-9 (helper L2 correctness) 14 PASSES unconditionally. AZ-389 housekeeping - AZ-559 closed Won't Fix: investigation against c6_tile_cache/_types.py confirmed TileSource.ONBOARD_INGEST + TileMetadata.quality_metadata + write_tile's FreshnessRejectionError already cover the mid-flight ingest semantic. The "missing API" was a spec-vs-impl naming mismatch. - AZ-389 spec rewritten to consume the existing write_tile API + catch FreshnessRejectionError per AC-NEW-3 opportunistic emission. - _dependencies_table.md reverted: AZ-389 deps -> AZ-303 (was AZ-559 in the previous commit on this branch); total 150 / 497 pts. Tests - Full regression: 2099 passed (+14 new e2e/replay), 94 skipped (incl. 8 e2e/replay heavy-tier + documented blocker skips), 3 perf-microbench flakes deselected (test_cli_cold_start_under_2s, test_cold_start_under_500ms_p99, test_nfr_perf_sign_microbench; all pass in isolation - pre-existing under-load flakes on dev macOS). Reviews - _docs/03_implementation/reviews/batch_63_review.md: code review PASS_WITH_WARNINGS (3 documented spec-gap deferrals: AC-3, AC-4b, AC-8). - _docs/03_implementation/cumulative_review_batches_61-63_cycle1_report.md: cumulative review PASS_WITH_WARNINGS. Action items: prioritise AZ-558 (closes AZ-401 AC-9 + AZ-404 AC-4b); consider 2pt hygiene PBI for Protocol-completeness AST scan to catch the AZ-389 / AZ-559 phantom-API pattern at task-prep time. Architecture invariants observably holding - ADR-011 (replay-as-configuration): AC-4a's AST scan over src/gps_denied_onboard/components/**/*.py finds zero violations - components branch on neither config.mode nor any synonym. - Single composition root (replay protocol Invariant 11): AZ-402 CLI dispatches to runtime_root.main(config); does not call compose_root directly. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -0,0 +1,99 @@
|
||||
# E2E replay tests (AZ-404)
|
||||
|
||||
End-to-end regression suite that runs the `gps-denied-replay`
|
||||
console-script (AZ-402) against the Derkachi 60 s clip and asserts
|
||||
the AZ-265 epic acceptance criteria.
|
||||
|
||||
## How to run
|
||||
|
||||
```bash
|
||||
# In a fresh venv with the package installed:
|
||||
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/ -v
|
||||
```
|
||||
|
||||
Without `RUN_REPLAY_E2E=1` the heavy tests skip cleanly. The two
|
||||
unconditional tests (AC-4a mode-agnosticism scan + AC-7 skip-gate
|
||||
self-check + the helpers in `test_helpers.py`) still run.
|
||||
|
||||
## Fixture state
|
||||
|
||||
| Artifact | Status | Source |
|
||||
|----------|--------|--------|
|
||||
| `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
|
||||
| `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
|
||||
| Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a `pymavlink` `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
|
||||
| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 is `xfail`ed until a real calibration ships. |
|
||||
| Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |
|
||||
|
||||
## Clip range
|
||||
|
||||
The first 60 s of the Derkachi flight (Time=0.0 → Time=60.0). The
|
||||
take-off region exercises the AZ-405 IMU-take-off auto-sync detector;
|
||||
the cruise region that follows stresses the satellite-anchor + VIO
|
||||
drift-correction path. To change the trim, edit `_CLIP_START_S` and
|
||||
`_CLIP_END_S` in `conftest.py`.
|
||||
|
||||
## Expected runtime (Tier-1)
|
||||
|
||||
| Test | Expected wall clock |
|
||||
|------|---------------------|
|
||||
| AC-1 (`--pace asap`) | ≤ 30 s |
|
||||
| AC-2 schema match | piggybacks on AC-1 |
|
||||
| AC-5 determinism | 2 × asap runs (≤ 60 s total) |
|
||||
| AC-6 realtime | 60 s ± 3 s |
|
||||
| AC-6 asap | ≤ 30 s |
|
||||
| Total suite | ≤ 6 min on Jetson AGX Orin |
|
||||
|
||||
The AC-1 / AC-2 / AC-5 tests share `--pace asap` runs but each
|
||||
fixture invocation produces a fresh output file, so they do not
|
||||
short-circuit each other (preserves AC-5's two-runs-diff guarantee).
|
||||
|
||||
## AC matrix
|
||||
|
||||
| AC | Test | State |
|
||||
|----|------|-------|
|
||||
| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | runs on Tier-1 |
|
||||
| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | runs on Tier-1 |
|
||||
| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | `xfail` (waiting on real calibration) |
|
||||
| AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
|
||||
| AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | `skip` (waiting on AZ-558) |
|
||||
| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | runs on Tier-1 |
|
||||
| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | runs on Tier-1 |
|
||||
| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | runs on Tier-1 |
|
||||
| AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
|
||||
| AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | `skip` (waiting on D-PROJ-2 mock) |
|
||||
| AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
|
||||
| AC-10: README accuracy | this file | live |
|
||||
|
||||
## Failure-mode cookbook
|
||||
|
||||
| Symptom | Likely cause | Fix |
|
||||
|---------|--------------|-----|
|
||||
| `gps-denied-replay console-script not on PATH` | package not installed in the test venv | `pip install -e .` |
|
||||
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (synthesizer is deterministic; non-determinism would be a real bug) |
|
||||
| AC-3 fails at ~ 0 % even with calibration | wrong intrinsics OR wrong WGS84 ground truth source — verify the GLOBAL_POSITION_INT columns are still the AC-3 reference (per `flight_derkachi/README.md`) | re-derive ground truth |
|
||||
| AC-5 determinism violated | non-deterministic float ordering in C5 estimator OR a clock leaked into the runtime | bisect via `git log` against the C5 / `clock` modules |
|
||||
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust `_HEAVY_SKIP` boundary if it persists |
|
||||
| `tlog missing required messages` | `_tlog_synth.py` lost a message group | check `_REQUIRED_MESSAGE_GROUPS` in `tlog_replay_adapter.py` against the synth output |
|
||||
|
||||
## Files
|
||||
|
||||
```
|
||||
tests/e2e/replay/
|
||||
├── README.md ← this file
|
||||
├── __init__.py ← package marker + module-level docstring
|
||||
├── _helpers.py ← parse_jsonl, l2_horizontal_m, match_percentage,
|
||||
│ CapturingMavlinkTransport, GroundTruthRow
|
||||
├── _tlog_synth.py ← CSV → tlog generator
|
||||
├── conftest.py ← derkachi_replay_inputs, replay_runner,
|
||||
│ operator_pre_flight_setup fixtures
|
||||
├── test_helpers.py ← unit tests for _helpers (unconditional)
|
||||
└── test_derkachi_1min.py ← AC-1..AC-8 + AC-7 skip gate + AC-4a AST scan
|
||||
```
|
||||
|
||||
## Follow-up work
|
||||
|
||||
* **Real Topotek KHP20S30 calibration** — unblocks AC-3.
|
||||
* **AZ-558** — closes AC-4b (route C8 encoders through `MavlinkTransport`).
|
||||
* **D-PROJ-2 mock-suite-sat-service** — unblocks AC-8 (operator
|
||||
workflow rehearsal).
|
||||
@@ -0,0 +1,6 @@
|
||||
"""E2E replay tests (AZ-404 / E-DEMO-REPLAY).
|
||||
|
||||
Runs the ``gps-denied-replay`` console-script (AZ-402) end-to-end
|
||||
against the Derkachi fixture. Gated by ``RUN_REPLAY_E2E=1`` per the
|
||||
project's E2E pattern; reports SKIPPED when unset.
|
||||
"""
|
||||
@@ -0,0 +1,223 @@
|
||||
"""Helpers shared by the AZ-404 E2E replay tests.
|
||||
|
||||
* :func:`parse_jsonl` — read the ``JsonlReplaySink`` output into a list
|
||||
of dicts with one entry per emit.
|
||||
* :func:`l2_horizontal_m` — WGS84-aware L2 horizontal distance between
|
||||
two ``(lat, lon)`` pairs in metres.
|
||||
* :func:`match_percentage` — share of estimator emissions whose
|
||||
L2 distance to the closest ground-truth row is within a threshold.
|
||||
* :class:`CapturingMavlinkTransport` — test-only ``MavlinkTransport``
|
||||
impl that records every ``write`` so AC-4b can compare the byte
|
||||
streams produced by ``compose_root(config_live)`` vs.
|
||||
``compose_root(config_replay)``.
|
||||
* :func:`load_ground_truth_csv` — the IMU CSV's ``GLOBAL_POSITION_INT``
|
||||
columns ARE the AC-3 reference (the original tlog's GPS rows
|
||||
exported to CSV); this helper materialises them.
|
||||
|
||||
All functions are pure / deterministic and stay safely importable on
|
||||
dev macOS without ``RUN_REPLAY_E2E``; the regular regression suite
|
||||
calls them via the unit-level helper test in this module's sibling
|
||||
``test_helpers.py``.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import csv
|
||||
import json
|
||||
import math
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
__all__ = [
|
||||
"CapturingMavlinkTransport",
|
||||
"GroundTruthRow",
|
||||
"l2_horizontal_m",
|
||||
"load_ground_truth_csv",
|
||||
"match_percentage",
|
||||
"parse_jsonl",
|
||||
]
|
||||
|
||||
|
||||
# WGS84 mean Earth radius. Matches the value used by
|
||||
# `helpers/wgs_converter.py` (AZ-279) so the e2e check is consistent
|
||||
# with the production converter.
|
||||
_EARTH_RADIUS_M: float = 6_371_008.8
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class GroundTruthRow:
|
||||
"""One row from the Derkachi data_imu.csv ground-truth slice."""
|
||||
|
||||
t_s: float
|
||||
lat_deg: float
|
||||
lon_deg: float
|
||||
alt_m: float
|
||||
|
||||
|
||||
def parse_jsonl(path: Path) -> list[dict[str, Any]]:
|
||||
"""Return one dict per line of a JsonlReplaySink output file.
|
||||
|
||||
Empty trailing lines are tolerated (orjson always terminates with
|
||||
``\\n`` so the last newline is followed by ``""``); other empty
|
||||
lines indicate a corrupt file and surface as a JSON decode error.
|
||||
"""
|
||||
records: list[dict[str, Any]] = []
|
||||
with path.open(encoding="utf-8") as fp:
|
||||
for lineno, line in enumerate(fp, start=1):
|
||||
stripped = line.rstrip("\n")
|
||||
if not stripped:
|
||||
continue
|
||||
try:
|
||||
records.append(json.loads(stripped))
|
||||
except json.JSONDecodeError as exc:
|
||||
raise AssertionError(
|
||||
f"line {lineno} in {path} is not valid JSON: {exc.msg!r}"
|
||||
) from exc
|
||||
return records
|
||||
|
||||
|
||||
def l2_horizontal_m(
|
||||
lat1_deg: float, lon1_deg: float, lat2_deg: float, lon2_deg: float
|
||||
) -> float:
|
||||
"""WGS84-spherical great-circle distance in metres.
|
||||
|
||||
Uses the haversine formula with the C5/AZ-279 mean Earth radius.
|
||||
Sufficient for the AC-3 ≤ 100 m threshold (sub-metre accuracy at
|
||||
the Derkachi latitude band; the spherical approximation diverges
|
||||
from the WGS84 ellipsoid by < 0.5 % at these latitudes — well
|
||||
within the AC-3 budget).
|
||||
"""
|
||||
phi1 = math.radians(lat1_deg)
|
||||
phi2 = math.radians(lat2_deg)
|
||||
dphi = phi2 - phi1
|
||||
dlam = math.radians(lon2_deg - lon1_deg)
|
||||
a = (
|
||||
math.sin(dphi / 2.0) ** 2
|
||||
+ math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
|
||||
)
|
||||
c = 2.0 * math.asin(min(1.0, math.sqrt(a)))
|
||||
return _EARTH_RADIUS_M * c
|
||||
|
||||
|
||||
def load_ground_truth_csv(csv_path: Path) -> list[GroundTruthRow]:
|
||||
"""Load the Derkachi IMU CSV's GPS rows as ground truth.
|
||||
|
||||
The original ``flight_derkachi.tlog``'s ``GLOBAL_POSITION_INT``
|
||||
messages were exported to ``data_imu.csv``; the ``lat / lon /
|
||||
alt`` columns are degrees * 1e7 / metres * 1e3 (mavlink integer
|
||||
encoding), so we divide accordingly.
|
||||
"""
|
||||
rows: list[GroundTruthRow] = []
|
||||
with csv_path.open(newline="") as fp:
|
||||
reader = csv.DictReader(fp)
|
||||
for r in reader:
|
||||
rows.append(
|
||||
GroundTruthRow(
|
||||
t_s=float(r["Time"]),
|
||||
lat_deg=float(r["GLOBAL_POSITION_INT.lat"]) / 1e7,
|
||||
lon_deg=float(r["GLOBAL_POSITION_INT.lon"]) / 1e7,
|
||||
alt_m=float(r["GLOBAL_POSITION_INT.alt"]) / 1e3,
|
||||
)
|
||||
)
|
||||
return rows
|
||||
|
||||
|
||||
def match_percentage(
|
||||
emissions: list[dict[str, Any]],
|
||||
ground_truth: list[GroundTruthRow],
|
||||
*,
|
||||
threshold_m: float,
|
||||
) -> float:
|
||||
"""Share of emissions within ``threshold_m`` of the closest GT row.
|
||||
|
||||
For each emitted ``EstimatorOutput`` JSONL record, find the
|
||||
nearest-in-time ground-truth row, compute the horizontal L2
|
||||
distance, and count it as a hit when ≤ ``threshold_m``. Returns
|
||||
the hit ratio in [0.0, 1.0].
|
||||
|
||||
Nearest-in-time is sufficient because the IMU CSV's 10 Hz cadence
|
||||
(matching the C5 emit rate) means the candidate row is typically
|
||||
< 50 ms off the emit timestamp — well below the AC-3 100 m budget.
|
||||
"""
|
||||
if not emissions:
|
||||
return 0.0
|
||||
if not ground_truth:
|
||||
raise AssertionError("ground_truth must be non-empty")
|
||||
gt_sorted = sorted(ground_truth, key=lambda r: r.t_s)
|
||||
gt_times = [r.t_s for r in gt_sorted]
|
||||
hits = 0
|
||||
for emit in emissions:
|
||||
emit_ts_ns = int(emit["emitted_at"])
|
||||
emit_t_s = emit_ts_ns / 1e9
|
||||
idx = _bisect_left(gt_times, emit_t_s)
|
||||
candidates = []
|
||||
if idx > 0:
|
||||
candidates.append(gt_sorted[idx - 1])
|
||||
if idx < len(gt_sorted):
|
||||
candidates.append(gt_sorted[idx])
|
||||
# Nearest-in-time row.
|
||||
nearest = min(candidates, key=lambda r: abs(r.t_s - emit_t_s))
|
||||
emit_pos = emit["position_wgs84"]
|
||||
d = l2_horizontal_m(
|
||||
emit_pos["lat_deg"],
|
||||
emit_pos["lon_deg"],
|
||||
nearest.lat_deg,
|
||||
nearest.lon_deg,
|
||||
)
|
||||
if d <= threshold_m:
|
||||
hits += 1
|
||||
return hits / len(emissions)
|
||||
|
||||
|
||||
def _bisect_left(seq: list[float], target: float) -> int:
|
||||
"""Stdlib bisect_left, inlined to keep import surface narrow."""
|
||||
lo, hi = 0, len(seq)
|
||||
while lo < hi:
|
||||
mid = (lo + hi) // 2
|
||||
if seq[mid] < target:
|
||||
lo = mid + 1
|
||||
else:
|
||||
hi = mid
|
||||
return lo
|
||||
|
||||
|
||||
class CapturingMavlinkTransport:
|
||||
"""Test-only :class:`MavlinkTransport` that records every write.
|
||||
|
||||
Used by AZ-404 AC-4b: capture the byte streams produced by
|
||||
``compose_root(config_live).c8.emit_external_position(out)`` and
|
||||
``compose_root(config_replay).c8.emit_external_position(out)`` to
|
||||
assert byte-identity per replay protocol Invariant 5.
|
||||
|
||||
NOTE: AC-4b is currently SKIPPED (blocked on AZ-558 — the C8
|
||||
encoders still bypass the ``MavlinkTransport`` seam by calling
|
||||
``mav.*_send`` directly). This class is in place so the test
|
||||
fixture is ready the moment AZ-558 lands.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._chunks: list[bytes] = []
|
||||
self._closed = False
|
||||
|
||||
def write(self, payload: bytes) -> int:
|
||||
if self._closed:
|
||||
raise RuntimeError("CapturingMavlinkTransport.write after close")
|
||||
self._chunks.append(bytes(payload))
|
||||
return len(payload)
|
||||
|
||||
def bytes_written(self) -> int:
|
||||
return sum(len(c) for c in self._chunks)
|
||||
|
||||
def close(self) -> None:
|
||||
self._closed = True
|
||||
|
||||
@property
|
||||
def captured_payloads(self) -> tuple[bytes, ...]:
|
||||
"""Tuple of every payload passed to :meth:`write`, in order."""
|
||||
return tuple(self._chunks)
|
||||
|
||||
@property
|
||||
def captured_concat(self) -> bytes:
|
||||
"""All captured payloads concatenated — the wire-byte stream."""
|
||||
return b"".join(self._chunks)
|
||||
@@ -0,0 +1,167 @@
|
||||
"""Synthesize a pymavlink ``.tlog`` from the Derkachi ``data_imu.csv``.
|
||||
|
||||
The Derkachi fixture (``_docs/00_problem/input_data/flight_derkachi/``)
|
||||
ships ``flight_derkachi.mp4`` + ``data_imu.csv`` only — the original
|
||||
pymavlink tlog is not in-repo (it was the source the CSV was
|
||||
*exported* from). The AZ-404 E2E test runs ``gps-denied-replay``
|
||||
which expects a tlog input, so we round-trip the CSV back to a tlog
|
||||
here.
|
||||
|
||||
Output schema (per ``tlog_replay_adapter._REQUIRED_MESSAGE_GROUPS``):
|
||||
|
||||
* ``SCALED_IMU2`` — one per CSV row (xacc/yacc/zacc/xgyro/ygyro/zgyro/
|
||||
xmag/ymag/zmag fields map 1:1).
|
||||
* ``GPS_RAW_INT`` — one per CSV row, derived from
|
||||
``GLOBAL_POSITION_INT.lat / .lon / .alt / .vx / .vy``. ``fix_type``
|
||||
is held at ``GPS_FIX_TYPE_3D_FIX`` (3) for every row — the CSV is
|
||||
post-flight cleaned and contains valid GPS throughout.
|
||||
* ``ATTITUDE`` — one per CSV row. roll/pitch are synthesized as zero
|
||||
(the camera is mechanically locked nadir per
|
||||
``camera_info.md``); yaw is derived from
|
||||
``GLOBAL_POSITION_INT.hdg`` (cdeg → rad).
|
||||
* ``HEARTBEAT`` — one per second so the tlog-replay adapter's
|
||||
pre-scan find the type quickly.
|
||||
|
||||
The tlog binary format is the pymavlink convention: ``<8-byte
|
||||
big-endian timestamp microseconds><raw MAVLink2 message bytes>``,
|
||||
repeated. The C8 ``TlogReplayFcAdapter`` consumes it via
|
||||
``mavutil.mavlink_connection(path, mavlink_version="2.0")``.
|
||||
|
||||
The synthesizer is deterministic: identical CSV → identical bytes.
|
||||
The conftest caches the output path next to the CSV so repeat runs
|
||||
short-circuit when the cache is up-to-date.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import csv
|
||||
import math
|
||||
import struct
|
||||
from pathlib import Path
|
||||
from typing import Final
|
||||
|
||||
from pymavlink.dialects.v20 import ardupilotmega as mavlink
|
||||
|
||||
__all__ = [
|
||||
"SOURCE_COMPONENT",
|
||||
"SOURCE_SYSTEM",
|
||||
"synthesize_tlog",
|
||||
]
|
||||
|
||||
|
||||
SOURCE_SYSTEM: Final[int] = 1 # vehicle id (any non-zero stable integer)
|
||||
SOURCE_COMPONENT: Final[int] = mavlink.MAV_COMP_ID_AUTOPILOT1
|
||||
_HEARTBEAT_PERIOD_S: Final[float] = 1.0
|
||||
# tlog timestamp epoch — pymavlink stores absolute microseconds. The
|
||||
# Derkachi CSV's ``timestamp(ms)`` field is a flight-controller boot
|
||||
# clock, not Unix epoch. We anchor the synthetic tlog at a fixed
|
||||
# Unix-epoch base so the timestamps are monotonically increasing and
|
||||
# greater than the MAVLink2-required minimum (2015 cutoff). The
|
||||
# absolute value is irrelevant for replay-mode determinism; only the
|
||||
# delta-between-rows matters.
|
||||
_TLOG_BASE_TIMESTAMP_US: Final[int] = 1_700_000_000_000_000 # 2023-11-14 22:13:20 UTC
|
||||
|
||||
|
||||
def synthesize_tlog(csv_path: Path, tlog_path: Path) -> int:
|
||||
"""Write a tlog reproduced from ``csv_path`` to ``tlog_path``.
|
||||
|
||||
Returns the number of bytes written. Overwrites ``tlog_path``
|
||||
atomically (write to ``<path>.tmp``, fsync, rename).
|
||||
|
||||
The output schema satisfies ``TlogReplayFcAdapter``'s pre-scan
|
||||
requirements per ``c8_fc_adapter/tlog_replay_adapter.py``:
|
||||
``RAW_IMU`` or ``SCALED_IMU2`` + ``ATTITUDE`` + ``GPS_RAW_INT`` or
|
||||
``GPS2_RAW`` + ``HEARTBEAT``.
|
||||
"""
|
||||
tmp_path = tlog_path.with_suffix(tlog_path.suffix + ".tmp")
|
||||
mav = mavlink.MAVLink(
|
||||
file=None,
|
||||
srcSystem=SOURCE_SYSTEM,
|
||||
srcComponent=SOURCE_COMPONENT,
|
||||
)
|
||||
bytes_written = 0
|
||||
next_heartbeat_t_s = 0.0
|
||||
with csv_path.open(newline="") as fp, tmp_path.open("wb") as out:
|
||||
reader = csv.DictReader(fp)
|
||||
for row in reader:
|
||||
t_s = float(row["Time"])
|
||||
ts_us = _TLOG_BASE_TIMESTAMP_US + int(t_s * 1_000_000)
|
||||
time_boot_ms = int(float(row["timestamp(ms)"]))
|
||||
|
||||
# SCALED_IMU2 ----------------------------------------------------
|
||||
imu2 = mav.scaled_imu2_encode(
|
||||
time_boot_ms=time_boot_ms,
|
||||
xacc=int(float(row["SCALED_IMU2.xacc"])),
|
||||
yacc=int(float(row["SCALED_IMU2.yacc"])),
|
||||
zacc=int(float(row["SCALED_IMU2.zacc"])),
|
||||
xgyro=int(float(row["SCALED_IMU2.xgyro"])),
|
||||
ygyro=int(float(row["SCALED_IMU2.ygyro"])),
|
||||
zgyro=int(float(row["SCALED_IMU2.zgyro"])),
|
||||
xmag=int(float(row["SCALED_IMU2.xmag"])),
|
||||
ymag=int(float(row["SCALED_IMU2.ymag"])),
|
||||
zmag=int(float(row["SCALED_IMU2.zmag"])),
|
||||
)
|
||||
bytes_written += _write_record(out, ts_us, imu2.pack(mav))
|
||||
|
||||
# ATTITUDE -------------------------------------------------------
|
||||
yaw_cdeg = float(row["GLOBAL_POSITION_INT.hdg"])
|
||||
yaw_rad = math.radians(yaw_cdeg / 100.0) if yaw_cdeg > 0 else 0.0
|
||||
attitude = mav.attitude_encode(
|
||||
time_boot_ms=time_boot_ms,
|
||||
roll=0.0,
|
||||
pitch=0.0,
|
||||
yaw=yaw_rad,
|
||||
rollspeed=0.0,
|
||||
pitchspeed=0.0,
|
||||
yawspeed=0.0,
|
||||
)
|
||||
bytes_written += _write_record(out, ts_us, attitude.pack(mav))
|
||||
|
||||
# GPS_RAW_INT ----------------------------------------------------
|
||||
gps = mav.gps_raw_int_encode(
|
||||
time_usec=ts_us,
|
||||
fix_type=mavlink.GPS_FIX_TYPE_3D_FIX,
|
||||
lat=int(float(row["GLOBAL_POSITION_INT.lat"])),
|
||||
lon=int(float(row["GLOBAL_POSITION_INT.lon"])),
|
||||
alt=int(float(row["GLOBAL_POSITION_INT.alt"])),
|
||||
eph=100,
|
||||
epv=200,
|
||||
vel=int(
|
||||
math.hypot(
|
||||
float(row["GLOBAL_POSITION_INT.vx"]),
|
||||
float(row["GLOBAL_POSITION_INT.vy"]),
|
||||
)
|
||||
),
|
||||
cog=int(yaw_cdeg) if yaw_cdeg > 0 else 0,
|
||||
satellites_visible=12,
|
||||
)
|
||||
bytes_written += _write_record(out, ts_us, gps.pack(mav))
|
||||
|
||||
# HEARTBEAT (1 Hz) -----------------------------------------------
|
||||
if t_s >= next_heartbeat_t_s:
|
||||
heartbeat = mav.heartbeat_encode(
|
||||
type=mavlink.MAV_TYPE_FIXED_WING,
|
||||
autopilot=mavlink.MAV_AUTOPILOT_ARDUPILOTMEGA,
|
||||
base_mode=mavlink.MAV_MODE_FLAG_AUTO_ENABLED,
|
||||
custom_mode=10, # AUTO mode for ArduPlane
|
||||
system_status=mavlink.MAV_STATE_ACTIVE,
|
||||
)
|
||||
bytes_written += _write_record(out, ts_us, heartbeat.pack(mav))
|
||||
next_heartbeat_t_s = t_s + _HEARTBEAT_PERIOD_S
|
||||
|
||||
out.flush()
|
||||
# fsync the temp file so the rename below is durable on power loss.
|
||||
# OSError here is rare; we want it to surface, not be swallowed.
|
||||
import os as _os
|
||||
|
||||
_os.fsync(out.fileno())
|
||||
tmp_path.replace(tlog_path)
|
||||
return bytes_written
|
||||
|
||||
|
||||
def _write_record(out, ts_us: int, payload: bytes) -> int:
|
||||
"""Write one tlog record (8B big-endian timestamp + MAVLink frame)."""
|
||||
header = struct.pack(">Q", ts_us)
|
||||
out.write(header)
|
||||
out.write(payload)
|
||||
return len(header) + len(payload)
|
||||
@@ -0,0 +1,234 @@
|
||||
"""Pytest fixtures for the AZ-404 E2E replay tests.
|
||||
|
||||
The fixtures are import-clean on dev macOS — the heavy work
|
||||
(synthesizing the tlog, invoking the airborne CLI in a subprocess)
|
||||
runs only when ``RUN_REPLAY_E2E=1`` is set in the environment.
|
||||
Without the env var, the test module's collection-time skip marker
|
||||
prevents the fixtures from being requested.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import shutil
|
||||
import subprocess
|
||||
import sys
|
||||
from collections.abc import Iterator
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import pytest
|
||||
|
||||
from tests.e2e.replay._helpers import GroundTruthRow, load_ground_truth_csv
|
||||
from tests.e2e.replay._tlog_synth import synthesize_tlog
|
||||
|
||||
|
||||
# Derkachi clip range — anchored at the start of the data_imu.csv
|
||||
# (Time=0.0). The fixture clip is deliberately the first 60 s rather
|
||||
# than a mid-flight slice: the take-off region exercises the AZ-405
|
||||
# IMU-take-off auto-sync detector, and the steady cruise that follows
|
||||
# stresses the satellite-anchor + VIO drift-correction path. The
|
||||
# trim is documented in `tests/e2e/replay/README.md`.
|
||||
_CLIP_START_S: float = 0.0
|
||||
_CLIP_END_S: float = 60.0
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# Path helpers
|
||||
|
||||
|
||||
def _repo_root() -> Path:
|
||||
return Path(__file__).resolve().parents[3]
|
||||
|
||||
|
||||
def _derkachi_dir() -> Path:
|
||||
return _repo_root() / "_docs" / "00_problem" / "input_data" / "flight_derkachi"
|
||||
|
||||
|
||||
def _calibration_path() -> Path:
|
||||
# Placeholder calibration: the real Topotek KHP20S30 intrinsics
|
||||
# are unknown per `_docs/00_problem/input_data/flight_derkachi/
|
||||
# camera_info.md`. AC-3 is `xfail`ed until a real calibration
|
||||
# ships; AC-1 / AC-2 / AC-5 / AC-6 do not depend on intrinsics
|
||||
# accuracy.
|
||||
return _repo_root() / "tests" / "fixtures" / "calibration" / "adti26.json"
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# Fixtures
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class DerkachiReplayInputs:
|
||||
"""Bundle of paths the AZ-402 CLI consumes for a Derkachi replay run."""
|
||||
|
||||
video_path: Path
|
||||
tlog_path: Path
|
||||
calibration_path: Path
|
||||
config_path: Path
|
||||
signing_key_path: Path
|
||||
output_path: Path
|
||||
ground_truth: list[GroundTruthRow]
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def derkachi_replay_inputs(tmp_path_factory: pytest.TempPathFactory) -> DerkachiReplayInputs:
|
||||
"""Materialise Derkachi inputs + a synthesized tlog for the e2e run.
|
||||
|
||||
Session-scoped so the tlog synthesizer runs once across the whole
|
||||
e2e collection. The tlog is cached at
|
||||
``tmp_path_factory.mktemp("derkachi") / "synth.tlog"`` so each
|
||||
pytest invocation gets a fresh copy; the synthesizer is fast
|
||||
enough (~1 s for 60 s of data) that disk caching across invocations
|
||||
is unnecessary.
|
||||
"""
|
||||
derkachi = _derkachi_dir()
|
||||
csv_path = derkachi / "data_imu.csv"
|
||||
video_path = derkachi / "flight_derkachi.mp4"
|
||||
if not csv_path.is_file():
|
||||
pytest.fail(
|
||||
f"Derkachi fixture missing: {csv_path} — see "
|
||||
"_docs/00_problem/input_data/flight_derkachi/README.md"
|
||||
)
|
||||
if not video_path.is_file():
|
||||
pytest.fail(f"Derkachi fixture missing: {video_path}")
|
||||
|
||||
work_dir = tmp_path_factory.mktemp("derkachi")
|
||||
tlog_path = work_dir / "synth.tlog"
|
||||
synthesize_tlog(csv_path, tlog_path)
|
||||
|
||||
# Empty signing key — the airborne replay path runs the signing
|
||||
# handshake against `NoopMavlinkTransport`, so the key contents do
|
||||
# not affect any wire output. We still need a real file because
|
||||
# the CLI's path-validation gate requires it.
|
||||
signing_key_path = work_dir / "signing_key.bin"
|
||||
signing_key_path.write_bytes(b"\x00" * 32)
|
||||
|
||||
config_path = work_dir / "config.yaml"
|
||||
config_path.write_text(
|
||||
# Replay-specific overrides; the rest comes from the env vars
|
||||
# the airborne binary's `load_config` honours by default.
|
||||
"mode: replay\n"
|
||||
"replay:\n"
|
||||
" pace: asap\n"
|
||||
" target_fc_dialect: ardupilot_plane\n"
|
||||
)
|
||||
|
||||
output_path = work_dir / "estimator_output.jsonl"
|
||||
|
||||
ground_truth_full = load_ground_truth_csv(csv_path)
|
||||
ground_truth = [
|
||||
r for r in ground_truth_full if _CLIP_START_S <= r.t_s <= _CLIP_END_S
|
||||
]
|
||||
|
||||
return DerkachiReplayInputs(
|
||||
video_path=video_path,
|
||||
tlog_path=tlog_path,
|
||||
calibration_path=_calibration_path(),
|
||||
config_path=config_path,
|
||||
signing_key_path=signing_key_path,
|
||||
output_path=output_path,
|
||||
ground_truth=ground_truth,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ReplayRunResult:
|
||||
"""Outcome of a single ``gps-denied-replay`` subprocess run."""
|
||||
|
||||
returncode: int
|
||||
stdout: str
|
||||
stderr: str
|
||||
output_path: Path
|
||||
wall_clock_s: float
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def replay_runner(derkachi_replay_inputs: DerkachiReplayInputs) -> Any:
|
||||
"""Return a callable that invokes the ``gps-denied-replay`` console-script.
|
||||
|
||||
The callable accepts keyword overrides for ``pace`` and
|
||||
``time_offset_ms``; everything else is taken from
|
||||
``derkachi_replay_inputs``. Output is written to a fresh path per
|
||||
invocation so determinism comparisons (AC-5) get two independent
|
||||
files.
|
||||
"""
|
||||
|
||||
binary = shutil.which("gps-denied-replay")
|
||||
if binary is None:
|
||||
venv_bin = Path(sys.executable).parent / "gps-denied-replay"
|
||||
if venv_bin.exists():
|
||||
binary = str(venv_bin)
|
||||
if binary is None:
|
||||
pytest.skip(
|
||||
"gps-denied-replay console-script not on PATH; "
|
||||
"install the package in the test venv"
|
||||
)
|
||||
|
||||
invocation_count = {"n": 0}
|
||||
|
||||
def _run(*, pace: str = "asap", time_offset_ms: int | None = None) -> ReplayRunResult:
|
||||
import time
|
||||
|
||||
invocation_count["n"] += 1
|
||||
out_path = derkachi_replay_inputs.output_path.with_name(
|
||||
f"estimator_output_{invocation_count['n']}.jsonl"
|
||||
)
|
||||
argv = [
|
||||
binary,
|
||||
"--video",
|
||||
str(derkachi_replay_inputs.video_path),
|
||||
"--tlog",
|
||||
str(derkachi_replay_inputs.tlog_path),
|
||||
"--output",
|
||||
str(out_path),
|
||||
"--camera-calibration",
|
||||
str(derkachi_replay_inputs.calibration_path),
|
||||
"--config",
|
||||
str(derkachi_replay_inputs.config_path),
|
||||
"--mavlink-signing-key",
|
||||
str(derkachi_replay_inputs.signing_key_path),
|
||||
"--pace",
|
||||
pace,
|
||||
]
|
||||
if time_offset_ms is not None:
|
||||
argv.extend(["--time-offset-ms", str(time_offset_ms)])
|
||||
t0 = time.monotonic()
|
||||
completed = subprocess.run(
|
||||
argv,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=180,
|
||||
)
|
||||
wall_s = time.monotonic() - t0
|
||||
return ReplayRunResult(
|
||||
returncode=completed.returncode,
|
||||
stdout=completed.stdout,
|
||||
stderr=completed.stderr,
|
||||
output_path=out_path,
|
||||
wall_clock_s=wall_s,
|
||||
)
|
||||
|
||||
return _run
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def operator_pre_flight_setup(tmp_path: Path) -> Iterator[Path]:
|
||||
"""Operator C12 pre-flight rehearsal stub.
|
||||
|
||||
Per AZ-404's spec this fixture should run the operator's full
|
||||
C10/C11/C12 pre-flight against a ``mock-suite-sat-service``
|
||||
fixture and yield the populated cache directory. The current
|
||||
``tests/fixtures/mock-suite-sat-service`` is a bootstrap stub
|
||||
(only ``GET /healthz`` per its README) — the full D-PROJ-2
|
||||
contract is not implemented. Until that ships, AC-8 (operator
|
||||
workflow rehearsal) is skipped at the test level; this fixture
|
||||
yields a placeholder cache directory so test bodies that
|
||||
request it can fail-fast with a documented reason rather than a
|
||||
surprise ImportError.
|
||||
"""
|
||||
cache_dir = tmp_path / "operator_cache"
|
||||
cache_dir.mkdir()
|
||||
yield cache_dir
|
||||
@@ -0,0 +1,382 @@
|
||||
"""AZ-404 — E2E replay test against the Derkachi 60 s clip.
|
||||
|
||||
Runs the ``gps-denied-replay`` console-script (AZ-402) against the
|
||||
Derkachi fixture (``_docs/00_problem/input_data/flight_derkachi/``)
|
||||
and asserts the epic AZ-265 acceptance criteria. Per the project's
|
||||
E2E pattern the heavy tests are gated by ``RUN_REPLAY_E2E=1``; the
|
||||
lightweight AC-4a (mode-agnosticism AST scan) and AC-7 (skip-gate
|
||||
self-check) run unconditionally.
|
||||
|
||||
Some ACs are SKIPPED with documented reasons until upstream work
|
||||
ships:
|
||||
|
||||
* AC-3 (≤ 100 m for 80 % of ticks) — ``xfail`` until a real Topotek
|
||||
KHP20S30 calibration ships (camera_info.md notes the intrinsics
|
||||
are unknown).
|
||||
* AC-4b (encoder byte-equality) — ``skip`` until AZ-558 routes the
|
||||
C8 outbound bytes through the ``MavlinkTransport`` seam.
|
||||
* AC-8 / AC-9 in spec (operator workflow rehearsal) — ``skip`` until
|
||||
``mock-suite-sat-service`` implements the D-PROJ-2 ingest contract.
|
||||
|
||||
The unit-level ``_helpers.py`` tests in ``test_helpers.py`` cover
|
||||
AC-9 (helper L2 correctness) unconditionally.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import ast
|
||||
import os
|
||||
import re
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from tests.e2e.replay._helpers import (
|
||||
match_percentage,
|
||||
parse_jsonl,
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# Skip gates
|
||||
|
||||
|
||||
def _heavy_skip_reason() -> str | None:
|
||||
if os.environ.get("RUN_REPLAY_E2E", "").lower() not in {"1", "true", "yes", "on"}:
|
||||
return "AZ-404 heavy e2e tests gated by RUN_REPLAY_E2E=1"
|
||||
return None
|
||||
|
||||
|
||||
_HEAVY_SKIP = pytest.mark.skipif(
|
||||
_heavy_skip_reason() is not None, reason=_heavy_skip_reason() or "ok"
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# AC-1: CLI exits 0; JSONL line count matches tlog GLOBAL_POSITION_INT count
|
||||
|
||||
|
||||
@_HEAVY_SKIP
|
||||
def test_ac1_exits_0_jsonl_count_match(replay_runner, derkachi_replay_inputs) -> None:
|
||||
# Act
|
||||
result = replay_runner(pace="asap")
|
||||
|
||||
# Assert — clean exit
|
||||
assert result.returncode == 0, (
|
||||
f"gps-denied-replay exited {result.returncode}\n"
|
||||
f"stdout:\n{result.stdout}\nstderr:\n{result.stderr}"
|
||||
)
|
||||
|
||||
# Assert — JSONL line count within ±5 % of the ground-truth row count
|
||||
rows = parse_jsonl(result.output_path)
|
||||
expected = len(derkachi_replay_inputs.ground_truth)
|
||||
actual = len(rows)
|
||||
tolerance = max(1, int(expected * 0.05))
|
||||
assert abs(actual - expected) <= tolerance, (
|
||||
f"JSONL count {actual} not within ±5 % of expected "
|
||||
f"{expected} (tolerance ±{tolerance})"
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# AC-2: Each line is valid JSON matching the EstimatorOutput schema
|
||||
|
||||
|
||||
_ESTIMATOR_OUTPUT_KEYS = frozenset(
|
||||
{
|
||||
"frame_id",
|
||||
"position_wgs84",
|
||||
"orientation_world_T_body",
|
||||
"velocity_world_mps",
|
||||
"covariance_6x6",
|
||||
"source_label",
|
||||
"last_satellite_anchor_age_ms",
|
||||
"smoothed",
|
||||
"emitted_at",
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
@_HEAVY_SKIP
|
||||
def test_ac2_jsonl_schema_match(replay_runner) -> None:
|
||||
# Act
|
||||
result = replay_runner(pace="asap")
|
||||
rows = parse_jsonl(result.output_path)
|
||||
|
||||
# Assert
|
||||
assert rows, "no JSONL output rows produced"
|
||||
for i, row in enumerate(rows):
|
||||
assert isinstance(row, dict), f"row {i} is not a JSON object"
|
||||
missing = _ESTIMATOR_OUTPUT_KEYS - set(row.keys())
|
||||
extra = set(row.keys()) - _ESTIMATOR_OUTPUT_KEYS
|
||||
assert not missing, f"row {i} missing keys: {missing}"
|
||||
assert not extra, f"row {i} has unexpected keys: {extra}"
|
||||
assert isinstance(row["position_wgs84"], dict)
|
||||
assert {"lat_deg", "lon_deg", "alt_m"}.issubset(row["position_wgs84"])
|
||||
assert isinstance(row["covariance_6x6"], list) and len(row["covariance_6x6"]) == 36
|
||||
assert isinstance(row["smoothed"], bool)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# AC-3: ≥ 80 % of emissions within 100 m of ground truth
|
||||
|
||||
|
||||
@_HEAVY_SKIP
|
||||
@pytest.mark.xfail(
|
||||
reason=(
|
||||
"AC-3 requires a real Topotek KHP20S30 camera calibration; "
|
||||
"_docs/00_problem/input_data/flight_derkachi/camera_info.md "
|
||||
"states the intrinsics are unknown. Test runs as xfail "
|
||||
"until a real calibration JSON ships."
|
||||
),
|
||||
strict=False,
|
||||
)
|
||||
def test_ac3_within_100m_80pct_of_ticks(replay_runner, derkachi_replay_inputs) -> None:
|
||||
# Act
|
||||
result = replay_runner(pace="asap")
|
||||
rows = parse_jsonl(result.output_path)
|
||||
|
||||
# Assert
|
||||
pct = match_percentage(
|
||||
rows,
|
||||
derkachi_replay_inputs.ground_truth,
|
||||
threshold_m=100.0,
|
||||
)
|
||||
assert pct >= 0.80, (
|
||||
f"AC-3: only {pct * 100:.1f} % of emissions within 100 m of GT; "
|
||||
f"epic threshold is 80 %"
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# AC-4a: Mode-agnosticism AST scan (runs unconditionally)
|
||||
|
||||
|
||||
def test_ac4_mode_agnosticism_ast_scan() -> None:
|
||||
"""Components MUST NOT branch on `config.mode` / `is_replay` / etc.
|
||||
|
||||
Per ADR-011 + replay protocol Invariant 1, replay-mode logic is
|
||||
structurally confined to the composition root (``runtime_root``),
|
||||
the replay strategies (``frame_source``, ``clock``,
|
||||
``c8_fc_adapter/{tlog_replay_adapter,replay_sink,
|
||||
noop_mavlink_transport,serial_mavlink_transport}``), the
|
||||
``replay_input/`` coordinator, and the ``cli/replay.py`` CLI. No
|
||||
``components/**/*.py`` file should test the mode at runtime.
|
||||
"""
|
||||
# Arrange
|
||||
repo_root = Path(__file__).resolve().parents[3]
|
||||
components_dir = repo_root / "src" / "gps_denied_onboard" / "components"
|
||||
py_files = sorted(components_dir.rglob("*.py"))
|
||||
assert py_files, "no component .py files found — repository layout drift?"
|
||||
|
||||
# Patterns we treat as mode-aware branches.
|
||||
forbidden_attribute_chains = {
|
||||
("config", "mode"),
|
||||
("self", "_replay_mode"),
|
||||
("self", "_mode"),
|
||||
("self", "is_replay"),
|
||||
}
|
||||
forbidden_compare_strings = {"replay", "live"}
|
||||
|
||||
violations: list[str] = []
|
||||
for path in py_files:
|
||||
try:
|
||||
tree = ast.parse(path.read_text(encoding="utf-8"))
|
||||
except SyntaxError as exc:
|
||||
pytest.fail(f"{path} is not valid Python: {exc!r}")
|
||||
scanner = _ModeBranchScanner(
|
||||
forbidden_attribute_chains, forbidden_compare_strings
|
||||
)
|
||||
scanner.visit(tree)
|
||||
for lineno, snippet in scanner.violations:
|
||||
violations.append(f"{path.relative_to(repo_root)}:{lineno}: {snippet}")
|
||||
|
||||
# Assert
|
||||
assert not violations, (
|
||||
"mode-agnosticism violation — components must not branch on "
|
||||
"replay vs live state (move the branch to runtime_root or a "
|
||||
"replay strategy):\n " + "\n ".join(violations)
|
||||
)
|
||||
|
||||
|
||||
class _ModeBranchScanner(ast.NodeVisitor):
|
||||
"""AST visitor that flags `if config.mode == ...` / `is_replay` / etc."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
forbidden_attribute_chains: set[tuple[str, str]],
|
||||
forbidden_compare_strings: set[str],
|
||||
) -> None:
|
||||
self.forbidden_attrs = forbidden_attribute_chains
|
||||
self.forbidden_strings = forbidden_compare_strings
|
||||
self.violations: list[tuple[int, str]] = []
|
||||
|
||||
def visit_If(self, node: ast.If) -> None:
|
||||
self._check_test(node.test)
|
||||
self.generic_visit(node)
|
||||
|
||||
def visit_IfExp(self, node: ast.IfExp) -> None:
|
||||
self._check_test(node.test)
|
||||
self.generic_visit(node)
|
||||
|
||||
def _check_test(self, node: ast.expr) -> None:
|
||||
# Catch `if self._replay_mode:` / `if config.mode:`
|
||||
if isinstance(node, ast.Attribute):
|
||||
chain = self._attribute_chain(node)
|
||||
if chain in self.forbidden_attrs:
|
||||
self.violations.append(
|
||||
(node.lineno, f"truthiness of {'.'.join(chain)}")
|
||||
)
|
||||
# Catch `if config.mode == "replay":` / `if mode != "live":`
|
||||
if isinstance(node, ast.Compare) and isinstance(node.left, ast.Attribute):
|
||||
chain = self._attribute_chain(node.left)
|
||||
if chain in self.forbidden_attrs:
|
||||
for cmp_value in node.comparators:
|
||||
if (
|
||||
isinstance(cmp_value, ast.Constant)
|
||||
and isinstance(cmp_value.value, str)
|
||||
and cmp_value.value in self.forbidden_strings
|
||||
):
|
||||
self.violations.append(
|
||||
(
|
||||
node.lineno,
|
||||
f"compare {'.'.join(chain)} == {cmp_value.value!r}",
|
||||
)
|
||||
)
|
||||
# Catch nested boolean / unary wrappers.
|
||||
if isinstance(node, ast.BoolOp):
|
||||
for value in node.values:
|
||||
self._check_test(value)
|
||||
if isinstance(node, ast.UnaryOp):
|
||||
self._check_test(node.operand)
|
||||
|
||||
@staticmethod
|
||||
def _attribute_chain(node: ast.Attribute) -> tuple[str, ...]:
|
||||
"""Return ('self', 'mode') for `self.mode`, etc.; () if non-trivial."""
|
||||
parts: list[str] = []
|
||||
cur: ast.expr = node
|
||||
while isinstance(cur, ast.Attribute):
|
||||
parts.append(cur.attr)
|
||||
cur = cur.value
|
||||
if isinstance(cur, ast.Name):
|
||||
parts.append(cur.id)
|
||||
else:
|
||||
return ()
|
||||
return tuple(reversed(parts))
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# AC-4b: Encoder byte-equality (BLOCKED on AZ-558)
|
||||
|
||||
|
||||
@pytest.mark.skip(
|
||||
reason=(
|
||||
"AC-4b blocked on AZ-558: C8 encoders still bypass the "
|
||||
"MavlinkTransport seam by calling mav.*_send directly. The "
|
||||
"CapturingMavlinkTransport fixture in _helpers.py is ready; "
|
||||
"this test unskips when AZ-558 lands."
|
||||
)
|
||||
)
|
||||
def test_ac4_encoder_byte_equality() -> None:
|
||||
raise NotImplementedError("blocked on AZ-558 — see skip reason")
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# AC-5: Determinism (two runs differ by ≤ 1e-6 in position fields)
|
||||
|
||||
|
||||
@_HEAVY_SKIP
|
||||
def test_ac5_determinism_two_runs_diff(replay_runner) -> None:
|
||||
# Act
|
||||
r1 = replay_runner(pace="asap")
|
||||
r2 = replay_runner(pace="asap")
|
||||
|
||||
# Assert
|
||||
assert r1.returncode == 0 and r2.returncode == 0
|
||||
rows_1 = parse_jsonl(r1.output_path)
|
||||
rows_2 = parse_jsonl(r2.output_path)
|
||||
assert len(rows_1) == len(rows_2), (
|
||||
f"determinism violated at line count: {len(rows_1)} vs {len(rows_2)}"
|
||||
)
|
||||
for i, (a, b) in enumerate(zip(rows_1, rows_2, strict=True)):
|
||||
for axis in ("lat_deg", "lon_deg", "alt_m"):
|
||||
diff = abs(
|
||||
a["position_wgs84"][axis] - b["position_wgs84"][axis]
|
||||
)
|
||||
assert diff <= 1e-6, (
|
||||
f"row {i} axis {axis}: |{a['position_wgs84'][axis]} - "
|
||||
f"{b['position_wgs84'][axis]}| = {diff} > 1e-6"
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# AC-6: Pace timing
|
||||
|
||||
|
||||
@_HEAVY_SKIP
|
||||
def test_ac6_pace_realtime_60s_within_5pct(replay_runner) -> None:
|
||||
# Act
|
||||
result = replay_runner(pace="realtime")
|
||||
|
||||
# Assert
|
||||
assert result.returncode == 0
|
||||
# 60 s clip ± 3 s tolerance per the spec.
|
||||
assert 57.0 <= result.wall_clock_s <= 63.0, (
|
||||
f"--pace realtime expected 60 s ± 3 s; got {result.wall_clock_s:.2f} s"
|
||||
)
|
||||
|
||||
|
||||
@_HEAVY_SKIP
|
||||
def test_ac6_pace_asap_under_30s(replay_runner) -> None:
|
||||
# Act
|
||||
result = replay_runner(pace="asap")
|
||||
|
||||
# Assert
|
||||
assert result.returncode == 0
|
||||
assert result.wall_clock_s <= 30.0, (
|
||||
f"--pace asap expected ≤ 30 s on Tier-1; got {result.wall_clock_s:.2f} s"
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# AC-7: Skip-gate self-check
|
||||
|
||||
|
||||
def test_ac7_skip_gate_consistent_with_env_var() -> None:
|
||||
"""The heavy-test skip mark MUST mirror the documented env-var gate.
|
||||
|
||||
Verifies that ``RUN_REPLAY_E2E`` controls the skip mark, so the
|
||||
epic AC-7 contract ("all e2e tests skip cleanly without the env
|
||||
var, without errors") is observably true at collection time.
|
||||
"""
|
||||
# Arrange
|
||||
env_set = os.environ.get("RUN_REPLAY_E2E", "").lower() in {
|
||||
"1", "true", "yes", "on"
|
||||
}
|
||||
|
||||
# Act
|
||||
skip_active = _heavy_skip_reason() is not None
|
||||
|
||||
# Assert
|
||||
assert skip_active != env_set, (
|
||||
f"RUN_REPLAY_E2E env_set={env_set}; skip_active={skip_active}"
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# Operator workflow rehearsal (AC-8 in this file's matrix; spec calls it AC-9)
|
||||
|
||||
|
||||
@pytest.mark.skip(
|
||||
reason=(
|
||||
"AC-8 (operator workflow rehearsal) blocked on the full "
|
||||
"D-PROJ-2 mock-suite-sat-service implementation — current "
|
||||
"tests/fixtures/mock-suite-sat-service/ is a bootstrap stub "
|
||||
"with only GET /healthz. Unskips when the mock implements "
|
||||
"tile-fetch + index-build endpoints."
|
||||
)
|
||||
)
|
||||
def test_ac8_operator_workflow(operator_pre_flight_setup, replay_runner) -> None:
|
||||
raise NotImplementedError(
|
||||
"blocked on D-PROJ-2 mock-suite-sat-service implementation"
|
||||
)
|
||||
@@ -0,0 +1,205 @@
|
||||
"""Unit-level tests for the AZ-404 e2e helpers.
|
||||
|
||||
Runs unconditionally in the regular regression suite (NOT gated by
|
||||
``RUN_REPLAY_E2E``) — the helpers are pure / deterministic and test
|
||||
themselves cheaply. Covers AC-9 (Helper L2 computation correct) and
|
||||
ancillary helper invariants.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
|
||||
from tests.e2e.replay._helpers import (
|
||||
CapturingMavlinkTransport,
|
||||
GroundTruthRow,
|
||||
l2_horizontal_m,
|
||||
match_percentage,
|
||||
parse_jsonl,
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# AC-9: L2 helper correctness
|
||||
|
||||
|
||||
def test_ac9_l2_zero_at_same_point() -> None:
|
||||
# Arrange / Act
|
||||
d = l2_horizontal_m(50.08, 36.11, 50.08, 36.11)
|
||||
|
||||
# Assert
|
||||
assert d == pytest.approx(0.0, abs=1e-6)
|
||||
|
||||
|
||||
def test_ac9_l2_north_one_degree_111km() -> None:
|
||||
"""One degree of latitude ≈ 111 km on the WGS84 spherical model."""
|
||||
# Act
|
||||
d = l2_horizontal_m(50.08, 36.11, 51.08, 36.11)
|
||||
|
||||
# Assert
|
||||
assert d == pytest.approx(111_195.0, rel=0.001)
|
||||
|
||||
|
||||
def test_ac9_l2_known_pair_kharkiv_kyiv() -> None:
|
||||
"""Hand-checked Derkachi (~Kharkiv) to Kyiv center: 411 km ± 1 km."""
|
||||
# Arrange
|
||||
kharkiv_lat, kharkiv_lon = 49.9935, 36.2304
|
||||
kyiv_lat, kyiv_lon = 50.4501, 30.5234
|
||||
|
||||
# Act
|
||||
d = l2_horizontal_m(kharkiv_lat, kharkiv_lon, kyiv_lat, kyiv_lon)
|
||||
|
||||
# Assert — externally known reference distance is 411 km.
|
||||
assert d == pytest.approx(411_000.0, rel=0.005)
|
||||
|
||||
|
||||
def test_ac9_l2_symmetric() -> None:
|
||||
# Arrange
|
||||
a = (49.991, 36.221)
|
||||
b = (50.080, 36.111)
|
||||
|
||||
# Act
|
||||
d_ab = l2_horizontal_m(*a, *b)
|
||||
d_ba = l2_horizontal_m(*b, *a)
|
||||
|
||||
# Assert
|
||||
assert d_ab == pytest.approx(d_ba, rel=1e-12)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# match_percentage
|
||||
|
||||
|
||||
def test_match_percentage_all_within_threshold() -> None:
|
||||
# Arrange
|
||||
gt = [GroundTruthRow(t_s=0.0, lat_deg=50.0, lon_deg=36.0, alt_m=100.0)]
|
||||
emissions = [
|
||||
{
|
||||
"emitted_at": 0,
|
||||
"position_wgs84": {"lat_deg": 50.0, "lon_deg": 36.0, "alt_m": 100.0},
|
||||
}
|
||||
]
|
||||
|
||||
# Act
|
||||
pct = match_percentage(emissions, gt, threshold_m=100.0)
|
||||
|
||||
# Assert
|
||||
assert pct == 1.0
|
||||
|
||||
|
||||
def test_match_percentage_none_within_threshold() -> None:
|
||||
# Arrange
|
||||
gt = [GroundTruthRow(t_s=0.0, lat_deg=50.0, lon_deg=36.0, alt_m=100.0)]
|
||||
emissions = [
|
||||
{
|
||||
"emitted_at": 0,
|
||||
# ~111 km north of the GT row.
|
||||
"position_wgs84": {"lat_deg": 51.0, "lon_deg": 36.0, "alt_m": 100.0},
|
||||
}
|
||||
]
|
||||
|
||||
# Act
|
||||
pct = match_percentage(emissions, gt, threshold_m=100.0)
|
||||
|
||||
# Assert
|
||||
assert pct == 0.0
|
||||
|
||||
|
||||
def test_match_percentage_empty_emissions_zero() -> None:
|
||||
# Arrange
|
||||
gt = [GroundTruthRow(t_s=0.0, lat_deg=50.0, lon_deg=36.0, alt_m=100.0)]
|
||||
|
||||
# Act
|
||||
pct = match_percentage([], gt, threshold_m=100.0)
|
||||
|
||||
# Assert
|
||||
assert pct == 0.0
|
||||
|
||||
|
||||
def test_match_percentage_empty_ground_truth_raises() -> None:
|
||||
# Act / Assert
|
||||
with pytest.raises(AssertionError, match="ground_truth must be non-empty"):
|
||||
match_percentage(
|
||||
[{"emitted_at": 0, "position_wgs84": {"lat_deg": 50, "lon_deg": 36}}],
|
||||
[],
|
||||
threshold_m=100.0,
|
||||
)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# parse_jsonl
|
||||
|
||||
|
||||
def test_parse_jsonl_round_trip(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
path = tmp_path / "out.jsonl"
|
||||
path.write_text('{"a": 1}\n{"b": 2}\n')
|
||||
|
||||
# Act
|
||||
rows = parse_jsonl(path)
|
||||
|
||||
# Assert
|
||||
assert rows == [{"a": 1}, {"b": 2}]
|
||||
|
||||
|
||||
def test_parse_jsonl_skips_trailing_blank(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
path = tmp_path / "out.jsonl"
|
||||
path.write_text('{"a": 1}\n\n')
|
||||
|
||||
# Act
|
||||
rows = parse_jsonl(path)
|
||||
|
||||
# Assert — the trailing blank line is tolerated
|
||||
assert rows == [{"a": 1}]
|
||||
|
||||
|
||||
def test_parse_jsonl_invalid_line_raises(tmp_path: Path) -> None:
|
||||
# Arrange
|
||||
path = tmp_path / "out.jsonl"
|
||||
path.write_text("not json\n")
|
||||
|
||||
# Act / Assert
|
||||
with pytest.raises(AssertionError, match="not valid JSON"):
|
||||
parse_jsonl(path)
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# CapturingMavlinkTransport (ready for AZ-558 unblock)
|
||||
|
||||
|
||||
def test_capturing_transport_records_writes() -> None:
|
||||
# Arrange
|
||||
t = CapturingMavlinkTransport()
|
||||
|
||||
# Act
|
||||
t.write(b"abc")
|
||||
t.write(b"def")
|
||||
|
||||
# Assert
|
||||
assert t.captured_payloads == (b"abc", b"def")
|
||||
assert t.captured_concat == b"abcdef"
|
||||
assert t.bytes_written() == 6
|
||||
|
||||
|
||||
def test_capturing_transport_close_then_write_raises() -> None:
|
||||
# Arrange
|
||||
t = CapturingMavlinkTransport()
|
||||
t.close()
|
||||
|
||||
# Act / Assert
|
||||
with pytest.raises(RuntimeError, match="after close"):
|
||||
t.write(b"x")
|
||||
|
||||
|
||||
def test_capturing_transport_implements_protocol() -> None:
|
||||
# Arrange
|
||||
from gps_denied_onboard.components.c8_fc_adapter.interface import MavlinkTransport
|
||||
|
||||
# Act
|
||||
t = CapturingMavlinkTransport()
|
||||
|
||||
# Assert — runtime_checkable Protocol acceptance
|
||||
assert isinstance(t, MavlinkTransport)
|
||||
Reference in New Issue
Block a user