mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 20:51:14 +00:00
[AZ-404] [AZ-389] [AZ-559] E2E replay test (Derkachi 60s) + AZ-389 cleanup
Batch 63 of /autodev replay slice. Adds the AZ-404 E2E test harness against the
Derkachi fixture and resolves the AZ-389 dependency phantom (closing AZ-559
Won't Fix).

E2E test (AZ-404)
- tests/e2e/replay/_tlog_synth.py: deterministic CSV->tlog generator (the
  original Derkachi tlog is not in repo; data_imu.csv is its export, so we
  round-trip the CSV through pymavlink). Verified: SCALED_IMU2 + ATTITUDE +
  GPS_RAW_INT + HEARTBEAT round-trip cleanly through
  mavutil.mavlink_connection.
- tests/e2e/replay/_helpers.py: parse_jsonl, l2_horizontal_m (haversine),
  match_percentage, CapturingMavlinkTransport (ready for AZ-558 unblock),
  GroundTruthRow + load_ground_truth_csv.
- tests/e2e/replay/conftest.py: derkachi_replay_inputs (session scope),
  replay_runner (subprocess fixture per AZ-402 CLI),
  operator_pre_flight_setup placeholder.
- tests/e2e/replay/test_derkachi_1min.py: 9 tests covering AC-1..AC-8 with
  AC-7 skip-gate self-check + AC-4a mode-agnosticism AST scan (passes
  unconditionally, confirms ADR-011 holding).
- tests/e2e/replay/test_helpers.py: 14 unit tests covering AC-9 helper L2
  correctness + match_percentage + parse_jsonl + CapturingMavlinkTransport
  (all unconditional).
- tests/e2e/replay/README.md: AC matrix, fixture state, runtime budget,
  failure cookbook (AC-10).

AC matrix
- AC-1, AC-2, AC-5, AC-6 implemented and Tier-1 gated on RUN_REPLAY_E2E=1.
- AC-3 (<=100m for 80%) xfail until real Topotek KHP20S30 calibration ships
  (camera_info.md states intrinsics are unknown).
- AC-4a (mode-agnosticism AST scan) PASSES unconditionally.
- AC-4b (encoder byte-equality) skip until AZ-558 routes C8 bytes through
  MavlinkTransport.
- AC-7 (skip-gate self-check) PASSES unconditionally.
- AC-8 (operator workflow rehearsal) skip until D-PROJ-2
  mock-suite-sat-service implements tile-fetch + index-build endpoints.
- AC-9 (helper L2 correctness) 14 PASSES unconditionally.

AZ-389 housekeeping
- AZ-559 closed Won't Fix: investigation against c6_tile_cache/_types.py
  confirmed TileSource.ONBOARD_INGEST + TileMetadata.quality_metadata +
  write_tile's FreshnessRejectionError already cover the mid-flight ingest
  semantic. The "missing API" was a spec-vs-impl naming mismatch.
- AZ-389 spec rewritten to consume the existing write_tile API + catch
  FreshnessRejectionError per AC-NEW-3 opportunistic emission.
- _dependencies_table.md reverted: AZ-389 deps -> AZ-303 (was AZ-559 in the
  previous commit on this branch); total 150 / 497 pts.

Tests
- Full regression: 2099 passed (+14 new e2e/replay), 94 skipped (incl. 8
  e2e/replay heavy-tier + documented blocker skips), 3 perf-microbench flakes
  deselected (test_cli_cold_start_under_2s, test_cold_start_under_500ms_p99,
  test_nfr_perf_sign_microbench; all pass in isolation - pre-existing
  under-load flakes on dev macOS).

Reviews
- _docs/03_implementation/reviews/batch_63_review.md: code review
  PASS_WITH_WARNINGS (3 documented spec-gap deferrals: AC-3, AC-4b, AC-8).
- _docs/03_implementation/cumulative_review_batches_61-63_cycle1_report.md:
  cumulative review PASS_WITH_WARNINGS. Action items: prioritise AZ-558
  (closes AZ-401 AC-9 + AZ-404 AC-4b); consider 2pt hygiene PBI for
  Protocol-completeness AST scan to catch the AZ-389 / AZ-559 phantom-API
  pattern at task-prep time.

Architecture invariants observably holding
- ADR-011 (replay-as-configuration): AC-4a's AST scan over
  src/gps_denied_onboard/components/**/*.py finds zero violations -
  components branch on neither config.mode nor any synonym.
- Single composition root (replay protocol Invariant 11): AZ-402 CLI
  dispatches to runtime_root.main(config); does not call compose_root
  directly.

Co-authored-by: Cursor <cursoragent@cursor.com>
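The diff below does not include _helpers.py, so the following is only a minimal sketch of what a haversine-based horizontal distance and a "fraction within threshold" helper could look like. The names haversine_m and fraction_within, the (lat, lon) tuple layout, the index-based pairing, and the Earth-radius constant are all assumptions made for illustration; the real l2_horizontal_m and match_percentage may differ (for example, by pairing rows on timestamps).

```python
import math

# Assumed constant: mean Earth radius in metres (not taken from the repo).
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1_deg: float, lon1_deg: float,
                lat2_deg: float, lon2_deg: float) -> float:
    """Great-circle distance in metres between two WGS84 lat/lon points."""
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(estimates: list[tuple[float, float]],
                    truths: list[tuple[float, float]],
                    threshold_m: float) -> float:
    """Fraction of index-paired (lat, lon) points closer than threshold_m.

    Hypothetical simplification: pairs are formed by position in the list.
    """
    pairs = list(zip(estimates, truths))
    if not pairs:
        return 0.0
    hits = sum(
        1 for (e, t) in pairs
        if haversine_m(e[0], e[1], t[0], t[1]) <= threshold_m
    )
    return hits / len(pairs)


if __name__ == "__main__":
    # Made-up coordinates: the second estimate is 0.01 deg of latitude
    # (roughly 1.1 km) away from its truth point, so only one of two matches.
    est = [(50.0, 36.0), (50.0, 36.0)]
    gt = [(50.0, 36.0), (50.01, 36.0)]
    print(fraction_within(est, gt, threshold_m=100.0))  # 0.5
```

Returning a fraction in [0, 1] rather than a percentage matches how the AC-3 test in the diff compares the result against 0.80.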
@@ -0,0 +1,382 @@
"""AZ-404: E2E replay test against the Derkachi 60 s clip.

Runs the ``gps-denied-replay`` console-script (AZ-402) against the
Derkachi fixture (``_docs/00_problem/input_data/flight_derkachi/``)
and asserts the epic AZ-265 acceptance criteria. Per the project's
E2E pattern the heavy tests are gated by ``RUN_REPLAY_E2E=1``; the
lightweight AC-4a (mode-agnosticism AST scan) and AC-7 (skip-gate
self-check) run unconditionally.

Some ACs are skipped or marked ``xfail``, with documented reasons,
until upstream work ships:

* AC-3 (≤ 100 m for 80 % of ticks): ``xfail`` until a real Topotek
  KHP20S30 calibration ships (camera_info.md notes the intrinsics
  are unknown).
* AC-4b (encoder byte-equality): ``skip`` until AZ-558 routes the
  C8 outbound bytes through the ``MavlinkTransport`` seam.
* AC-8 (operator workflow rehearsal; the spec numbers it AC-9):
  ``skip`` until ``mock-suite-sat-service`` implements the D-PROJ-2
  ingest contract.

The unit-level ``_helpers.py`` tests in ``test_helpers.py`` cover
this file's AC-9 (helper L2 correctness) unconditionally.
"""

from __future__ import annotations

import ast
import os
from pathlib import Path

import pytest

from tests.e2e.replay._helpers import (
    match_percentage,
    parse_jsonl,
)


# ----------------------------------------------------------------------
# Skip gates


def _heavy_skip_reason() -> str | None:
    if os.environ.get("RUN_REPLAY_E2E", "").lower() not in {"1", "true", "yes", "on"}:
        return "AZ-404 heavy e2e tests gated by RUN_REPLAY_E2E=1"
    return None


_HEAVY_SKIP = pytest.mark.skipif(
    _heavy_skip_reason() is not None, reason=_heavy_skip_reason() or "ok"
)


# ----------------------------------------------------------------------
# AC-1: CLI exits 0; JSONL line count matches tlog GLOBAL_POSITION_INT count
# (checked below against the ground-truth row count, within ±5 %)


@_HEAVY_SKIP
def test_ac1_exits_0_jsonl_count_match(replay_runner, derkachi_replay_inputs) -> None:
    # Act
    result = replay_runner(pace="asap")

    # Assert: clean exit
    assert result.returncode == 0, (
        f"gps-denied-replay exited {result.returncode}\n"
        f"stdout:\n{result.stdout}\nstderr:\n{result.stderr}"
    )

    # Assert: JSONL line count within ±5 % of the ground-truth row count
    rows = parse_jsonl(result.output_path)
    expected = len(derkachi_replay_inputs.ground_truth)
    actual = len(rows)
    tolerance = max(1, int(expected * 0.05))
    assert abs(actual - expected) <= tolerance, (
        f"JSONL count {actual} not within ±5 % of expected "
        f"{expected} (tolerance ±{tolerance})"
    )


# ----------------------------------------------------------------------
# AC-2: Each line is valid JSON matching the EstimatorOutput schema


_ESTIMATOR_OUTPUT_KEYS = frozenset(
    {
        "frame_id",
        "position_wgs84",
        "orientation_world_T_body",
        "velocity_world_mps",
        "covariance_6x6",
        "source_label",
        "last_satellite_anchor_age_ms",
        "smoothed",
        "emitted_at",
    }
)


@_HEAVY_SKIP
def test_ac2_jsonl_schema_match(replay_runner) -> None:
    # Act
    result = replay_runner(pace="asap")
    rows = parse_jsonl(result.output_path)

    # Assert
    assert rows, "no JSONL output rows produced"
    for i, row in enumerate(rows):
        assert isinstance(row, dict), f"row {i} is not a JSON object"
        missing = _ESTIMATOR_OUTPUT_KEYS - set(row.keys())
        extra = set(row.keys()) - _ESTIMATOR_OUTPUT_KEYS
        assert not missing, f"row {i} missing keys: {missing}"
        assert not extra, f"row {i} has unexpected keys: {extra}"
        assert isinstance(row["position_wgs84"], dict)
        assert {"lat_deg", "lon_deg", "alt_m"}.issubset(row["position_wgs84"])
        assert isinstance(row["covariance_6x6"], list) and len(row["covariance_6x6"]) == 36
        assert isinstance(row["smoothed"], bool)


# ----------------------------------------------------------------------
# AC-3: ≥ 80 % of emissions within 100 m of ground truth


@_HEAVY_SKIP
@pytest.mark.xfail(
    reason=(
        "AC-3 requires a real Topotek KHP20S30 camera calibration; "
        "_docs/00_problem/input_data/flight_derkachi/camera_info.md "
        "states the intrinsics are unknown. Test runs as xfail "
        "until a real calibration JSON ships."
    ),
    strict=False,
)
def test_ac3_within_100m_80pct_of_ticks(replay_runner, derkachi_replay_inputs) -> None:
    # Act
    result = replay_runner(pace="asap")
    rows = parse_jsonl(result.output_path)

    # Assert
    pct = match_percentage(
        rows,
        derkachi_replay_inputs.ground_truth,
        threshold_m=100.0,
    )
    assert pct >= 0.80, (
        f"AC-3: only {pct * 100:.1f} % of emissions within 100 m of GT; "
        f"epic threshold is 80 %"
    )


# ----------------------------------------------------------------------
# AC-4a: Mode-agnosticism AST scan (runs unconditionally)


def test_ac4_mode_agnosticism_ast_scan() -> None:
    """Components MUST NOT branch on `config.mode` / `is_replay` / etc.

    Per ADR-011 + replay protocol Invariant 1, replay-mode logic is
    structurally confined to the composition root (``runtime_root``),
    the replay strategies (``frame_source``, ``clock``,
    ``c8_fc_adapter/{tlog_replay_adapter,replay_sink,
    noop_mavlink_transport,serial_mavlink_transport}``), the
    ``replay_input/`` coordinator, and the ``cli/replay.py`` CLI. No
    ``components/**/*.py`` file should test the mode at runtime.
    """
    # Arrange
    repo_root = Path(__file__).resolve().parents[3]
    components_dir = repo_root / "src" / "gps_denied_onboard" / "components"
    py_files = sorted(components_dir.rglob("*.py"))
    assert py_files, "no component .py files found; repository layout drift?"

    # Patterns we treat as mode-aware branches.
    forbidden_attribute_chains = {
        ("config", "mode"),
        ("self", "_replay_mode"),
        ("self", "_mode"),
        ("self", "is_replay"),
    }
    forbidden_compare_strings = {"replay", "live"}

    violations: list[str] = []
    for path in py_files:
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError as exc:
            pytest.fail(f"{path} is not valid Python: {exc!r}")
        scanner = _ModeBranchScanner(
            forbidden_attribute_chains, forbidden_compare_strings
        )
        scanner.visit(tree)
        for lineno, snippet in scanner.violations:
            violations.append(f"{path.relative_to(repo_root)}:{lineno}: {snippet}")

    # Assert
    assert not violations, (
        "mode-agnosticism violation: components must not branch on "
        "replay vs live state (move the branch to runtime_root or a "
        "replay strategy):\n " + "\n ".join(violations)
    )


class _ModeBranchScanner(ast.NodeVisitor):
    """AST visitor that flags `if config.mode == ...` / `is_replay` / etc."""

    def __init__(
        self,
        forbidden_attribute_chains: set[tuple[str, str]],
        forbidden_compare_strings: set[str],
    ) -> None:
        self.forbidden_attrs = forbidden_attribute_chains
        self.forbidden_strings = forbidden_compare_strings
        self.violations: list[tuple[int, str]] = []

    def visit_If(self, node: ast.If) -> None:
        self._check_test(node.test)
        self.generic_visit(node)

    def visit_IfExp(self, node: ast.IfExp) -> None:
        self._check_test(node.test)
        self.generic_visit(node)

    def _check_test(self, node: ast.expr) -> None:
        # Catch `if self._replay_mode:` / `if config.mode:`
        if isinstance(node, ast.Attribute):
            chain = self._attribute_chain(node)
            if chain in self.forbidden_attrs:
                self.violations.append(
                    (node.lineno, f"truthiness of {'.'.join(chain)}")
                )
        # Catch `if config.mode == "replay":` / `if config.mode != "live":`
        # (any comparison operator; only an attribute on the left is checked).
        if isinstance(node, ast.Compare) and isinstance(node.left, ast.Attribute):
            chain = self._attribute_chain(node.left)
            if chain in self.forbidden_attrs:
                for cmp_value in node.comparators:
                    if (
                        isinstance(cmp_value, ast.Constant)
                        and isinstance(cmp_value.value, str)
                        and cmp_value.value in self.forbidden_strings
                    ):
                        self.violations.append(
                            (
                                node.lineno,
                                f"compare {'.'.join(chain)} against {cmp_value.value!r}",
                            )
                        )
        # Catch nested boolean / unary wrappers.
        if isinstance(node, ast.BoolOp):
            for value in node.values:
                self._check_test(value)
        if isinstance(node, ast.UnaryOp):
            self._check_test(node.operand)

    @staticmethod
    def _attribute_chain(node: ast.Attribute) -> tuple[str, ...]:
        """Return ('self', 'mode') for `self.mode`, etc.; () if non-trivial."""
        parts: list[str] = []
        cur: ast.expr = node
        while isinstance(cur, ast.Attribute):
            parts.append(cur.attr)
            cur = cur.value
        if isinstance(cur, ast.Name):
            parts.append(cur.id)
        else:
            return ()
        return tuple(reversed(parts))


# ----------------------------------------------------------------------
# AC-4b: Encoder byte-equality (BLOCKED on AZ-558)


@pytest.mark.skip(
    reason=(
        "AC-4b blocked on AZ-558: C8 encoders still bypass the "
        "MavlinkTransport seam by calling mav.*_send directly. The "
        "CapturingMavlinkTransport fixture in _helpers.py is ready; "
        "this test unskips when AZ-558 lands."
    )
)
def test_ac4_encoder_byte_equality() -> None:
    raise NotImplementedError("blocked on AZ-558; see skip reason")


# ----------------------------------------------------------------------
# AC-5: Determinism (two runs differ by ≤ 1e-6 in position fields)


@_HEAVY_SKIP
def test_ac5_determinism_two_runs_diff(replay_runner) -> None:
    # Act
    r1 = replay_runner(pace="asap")
    r2 = replay_runner(pace="asap")

    # Assert
    assert r1.returncode == 0 and r2.returncode == 0
    rows_1 = parse_jsonl(r1.output_path)
    rows_2 = parse_jsonl(r2.output_path)
    assert len(rows_1) == len(rows_2), (
        f"determinism violated at line count: {len(rows_1)} vs {len(rows_2)}"
    )
    for i, (a, b) in enumerate(zip(rows_1, rows_2, strict=True)):
        for axis in ("lat_deg", "lon_deg", "alt_m"):
            diff = abs(
                a["position_wgs84"][axis] - b["position_wgs84"][axis]
            )
            assert diff <= 1e-6, (
                f"row {i} axis {axis}: |{a['position_wgs84'][axis]} - "
                f"{b['position_wgs84'][axis]}| = {diff} > 1e-6"
            )


# ----------------------------------------------------------------------
# AC-6: Pace timing


@_HEAVY_SKIP
def test_ac6_pace_realtime_60s_within_5pct(replay_runner) -> None:
    # Act
    result = replay_runner(pace="realtime")

    # Assert
    assert result.returncode == 0
    # 60 s clip ± 3 s tolerance per the spec.
    assert 57.0 <= result.wall_clock_s <= 63.0, (
        f"--pace realtime expected 60 s ± 3 s; got {result.wall_clock_s:.2f} s"
    )


@_HEAVY_SKIP
def test_ac6_pace_asap_under_30s(replay_runner) -> None:
    # Act
    result = replay_runner(pace="asap")

    # Assert
    assert result.returncode == 0
    assert result.wall_clock_s <= 30.0, (
        f"--pace asap expected ≤ 30 s on Tier-1; got {result.wall_clock_s:.2f} s"
    )


# ----------------------------------------------------------------------
# AC-7: Skip-gate self-check


def test_ac7_skip_gate_consistent_with_env_var() -> None:
    """The heavy-test skip mark MUST mirror the documented env-var gate.

    Verifies that ``RUN_REPLAY_E2E`` controls the skip mark, so the
    epic AC-7 contract ("all e2e tests skip cleanly without the env
    var, without errors") is observably true at collection time.
    """
    # Arrange
    env_set = os.environ.get("RUN_REPLAY_E2E", "").lower() in {
        "1", "true", "yes", "on"
    }

    # Act
    skip_active = _heavy_skip_reason() is not None

    # Assert
    assert skip_active != env_set, (
        f"RUN_REPLAY_E2E env_set={env_set}; skip_active={skip_active}"
    )


# ----------------------------------------------------------------------
# Operator workflow rehearsal (AC-8 in this file's matrix; spec calls it AC-9)


@pytest.mark.skip(
    reason=(
        "AC-8 (operator workflow rehearsal) blocked on the full "
        "D-PROJ-2 mock-suite-sat-service implementation; the current "
        "tests/fixtures/mock-suite-sat-service/ is a bootstrap stub "
        "with only GET /healthz. Unskips when the mock implements "
        "tile-fetch + index-build endpoints."
    )
)
def test_ac8_operator_workflow(operator_pre_flight_setup, replay_runner) -> None:
    raise NotImplementedError(
        "blocked on D-PROJ-2 mock-suite-sat-service implementation"
    )
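
The AC-4a scan above can be tried in isolation on an in-memory source string. The following is a simplified sketch of the same idea, not the scanner from the diff: it walks the whole `if` test expression with `ast.walk` instead of recursing only through `BoolOp` / `UnaryOp`, so it is somewhat broader, and the names FORBIDDEN, attribute_chain, find_mode_branches and SAMPLE are hypothetical.

```python
import ast

# Hypothetical subset of the forbidden chains used by the test above.
FORBIDDEN = {("config", "mode"), ("self", "is_replay")}


def attribute_chain(node: ast.AST) -> tuple[str, ...]:
    """Return ('config', 'mode') for `config.mode`; () for anything else."""
    parts: list[str] = []
    cur = node
    while isinstance(cur, ast.Attribute):
        parts.append(cur.attr)
        cur = cur.value
    if not isinstance(cur, ast.Name):
        return ()
    parts.append(cur.id)
    return tuple(reversed(parts))


def find_mode_branches(source: str) -> list[int]:
    """Line numbers where an if / conditional-expression test reads a forbidden attribute."""
    hits: list[int] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.If, ast.IfExp)):
            continue
        # Inspect every sub-expression of the condition only, not the body.
        for sub in ast.walk(node.test):
            if isinstance(sub, ast.Attribute) and attribute_chain(sub) in FORBIDDEN:
                hits.append(sub.lineno)
    return sorted(hits)


# Made-up component code: two mode-aware branches and one harmless read.
SAMPLE = '''
def step(self, config, frame):
    if config.mode == "replay":
        return None
    if frame.ok and not self.is_replay:
        return frame
    return config.mode
'''

if __name__ == "__main__":
    print(find_mode_branches(SAMPLE))  # [3, 5]
```

The final `return config.mode` is not reported because only branch conditions are inspected; reading the mode without branching on it is outside what this kind of scan looks for.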