[AZ-404] [AZ-389] [AZ-559] E2E replay test (Derkachi 60s) + AZ-389 cleanup

Batch 63 of /autodev replay slice. Adds the AZ-404 E2E test harness against the Derkachi fixture and resolves the AZ-389 dependency phantom (closing AZ-559 Won't Fix).

E2E test (AZ-404)
- tests/e2e/replay/_tlog_synth.py: deterministic CSV->tlog generator (the original Derkachi tlog is not in repo; data_imu.csv is its export, so we round-trip the CSV through pymavlink). Verified: SCALED_IMU2 + ATTITUDE + GPS_RAW_INT + HEARTBEAT round-trip cleanly through mavutil.mavlink_connection.
- tests/e2e/replay/_helpers.py: parse_jsonl, l2_horizontal_m (haversine), match_percentage, CapturingMavlinkTransport (ready for AZ-558 unblock), GroundTruthRow + load_ground_truth_csv.
- tests/e2e/replay/conftest.py: derkachi_replay_inputs (session scope), replay_runner (subprocess fixture per AZ-402 CLI), operator_pre_flight_setup placeholder.
- tests/e2e/replay/test_derkachi_1min.py: 9 tests covering AC-1..AC-8 with AC-7 skip-gate self-check + AC-4a mode-agnosticism AST scan (passes unconditionally, confirms ADR-011 holding).
- tests/e2e/replay/test_helpers.py: 14 unit tests covering AC-9 helper L2 correctness + match_percentage + parse_jsonl + CapturingMavlinkTransport (all unconditional).
- tests/e2e/replay/README.md: AC matrix, fixture state, runtime budget, failure cookbook (AC-10).

AC matrix
- AC-1, AC-2, AC-5, AC-6 implemented and Tier-1 gated on RUN_REPLAY_E2E=1.
- AC-3 (<=100m for 80%) xfail until real Topotek KHP20S30 calibration ships (camera_info.md states intrinsics are unknown).
- AC-4a (mode-agnosticism AST scan) PASSES unconditionally.
- AC-4b (encoder byte-equality) skip until AZ-558 routes C8 bytes through MavlinkTransport.
- AC-7 (skip-gate self-check) PASSES unconditionally.
- AC-8 (operator workflow rehearsal) skip until D-PROJ-2 mock-suite-sat-service implements tile-fetch + index-build endpoints.
- AC-9 (helper L2 correctness) 14 PASSES unconditionally.

AZ-389 housekeeping
- AZ-559 closed Won't Fix: investigation against c6_tile_cache/_types.py confirmed TileSource.ONBOARD_INGEST + TileMetadata.quality_metadata + write_tile's FreshnessRejectionError already cover the mid-flight ingest semantic. The "missing API" was a spec-vs-impl naming mismatch.
- AZ-389 spec rewritten to consume the existing write_tile API + catch FreshnessRejectionError per AC-NEW-3 opportunistic emission.
- _dependencies_table.md reverted: AZ-389 deps -> AZ-303 (was AZ-559 in the previous commit on this branch); total 150 / 497 pts.

Tests
- Full regression: 2099 passed (+14 new e2e/replay), 94 skipped (incl. 8 e2e/replay heavy-tier + documented blocker skips), 3 perf-microbench flakes deselected (test_cli_cold_start_under_2s, test_cold_start_under_500ms_p99, test_nfr_perf_sign_microbench; all pass in isolation - pre-existing under-load flakes on dev macOS).

Reviews
- _docs/03_implementation/reviews/batch_63_review.md: code review PASS_WITH_WARNINGS (3 documented spec-gap deferrals: AC-3, AC-4b, AC-8).
- _docs/03_implementation/cumulative_review_batches_61-63_cycle1_report.md: cumulative review PASS_WITH_WARNINGS. Action items: prioritise AZ-558 (closes AZ-401 AC-9 + AZ-404 AC-4b); consider 2pt hygiene PBI for Protocol-completeness AST scan to catch the AZ-389 / AZ-559 phantom-API pattern at task-prep time.

Architecture invariants observably holding
- ADR-011 (replay-as-configuration): AC-4a's AST scan over src/gps_denied_onboard/components/**/*.py finds zero violations - components branch on neither config.mode nor any synonym.
- Single composition root (replay protocol Invariant 11): AZ-402 CLI dispatches to runtime_root.main(config); does not call compose_root directly.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 20:51:14 +00:00 · 2026-05-14 21:41:39 +03:00
parent 4f10fd230f
commit d7e6b0959e
13 changed files with 1611 additions and 26 deletions
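
The commit message names `l2_horizontal_m (haversine)` and `match_percentage` in `_helpers.py`, but that file is not part of the hunk shown here. The following is a minimal sketch of how a haversine distance and a "fraction within threshold" metric are commonly written. It assumes index-paired `(lat_deg, lon_deg)` tuples and a mean Earth radius of 6 371 000 m. The names `haversine_m` and `fraction_within` are hypothetical and are not the signatures used in the repository, which may pair rows differently (for example by timestamp).

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(estimates, truths, threshold_m):
    """Fraction of index-paired (lat, lon) tuples no farther apart than threshold_m.

    Hypothetical helper: pairs estimate i with truth i and ignores any
    unpaired tail. Returns 0.0 when there is nothing to compare.
    """
    pairs = list(zip(estimates, truths))
    if not pairs:
        return 0.0
    hits = sum(
        1
        for est, gt in pairs
        if haversine_m(est[0], est[1], gt[0], gt[1]) <= threshold_m
    )
    return hits / len(pairs)
```

Under these assumptions, an AC-3 style check reduces to `fraction_within(estimates, truths, 100.0) >= 0.80`.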
@@ -0,0 +1,382 @@
+"""AZ-404 — E2E replay test against the Derkachi 60 s clip.
+
+Runs the ``gps-denied-replay`` console-script (AZ-402) against the
+Derkachi fixture (``_docs/00_problem/input_data/flight_derkachi/``)
+and asserts the epic AZ-265 acceptance criteria. Per the project's
+E2E pattern the heavy tests are gated by ``RUN_REPLAY_E2E=1``; the
+lightweight AC-4a (mode-agnosticism AST scan) and AC-7 (skip-gate
+self-check) run unconditionally.
+
+Some ACs are skipped or expected to fail, with documented reasons,
+until upstream work ships:
+
+* AC-3 (≤ 100 m for 80 % of ticks): ``xfail`` until a real Topotek
+  KHP20S30 calibration ships (camera_info.md notes the intrinsics
+  are unknown).
+* AC-4b (encoder byte-equality): ``skip`` until AZ-558 routes the
+  C8 outbound bytes through the ``MavlinkTransport`` seam.
+* AC-8 (operator workflow rehearsal; numbered AC-9 in the spec):
+  ``skip`` until ``mock-suite-sat-service`` implements the D-PROJ-2
+  ingest contract.
+
+The unit-level ``_helpers.py`` tests in ``test_helpers.py`` cover
+this file's AC-9 (helper L2 correctness) unconditionally.
+"""
+
+from __future__ import annotations
+
+import ast
+import os
+import re
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.replay._helpers import (
+    match_percentage,
+    parse_jsonl,
+)
+
+
+# ----------------------------------------------------------------------
+# Skip gates
+
+
+def _heavy_skip_reason() -> str | None:
+    if os.environ.get("RUN_REPLAY_E2E", "").lower() not in {"1", "true", "yes", "on"}:
+        return "AZ-404 heavy e2e tests gated by RUN_REPLAY_E2E=1"
+    return None
+
+
+_HEAVY_SKIP = pytest.mark.skipif(
+    _heavy_skip_reason() is not None, reason=_heavy_skip_reason() or "ok"
+)
+
+
+# ----------------------------------------------------------------------
+# AC-1: CLI exits 0; JSONL line count within ±5 % of the ground-truth row count
+
+
+@_HEAVY_SKIP
+def test_ac1_exits_0_jsonl_count_match(replay_runner, derkachi_replay_inputs) -> None:
+    # Act
+    result = replay_runner(pace="asap")
+
+    # Assert — clean exit
+    assert result.returncode == 0, (
+        f"gps-denied-replay exited {result.returncode}\n"
+        f"stdout:\n{result.stdout}\nstderr:\n{result.stderr}"
+    )
+
+    # Assert — JSONL line count within ±5 % of the ground-truth row count
+    rows = parse_jsonl(result.output_path)
+    expected = len(derkachi_replay_inputs.ground_truth)
+    actual = len(rows)
+    tolerance = max(1, int(expected * 0.05))
+    assert abs(actual - expected) <= tolerance, (
+        f"JSONL count {actual} not within ±5 % of expected "
+        f"{expected} (tolerance ±{tolerance})"
+    )
+
+
+# ----------------------------------------------------------------------
+# AC-2: Each line is valid JSON matching the EstimatorOutput schema
+
+
+_ESTIMATOR_OUTPUT_KEYS = frozenset(
+    {
+        "frame_id",
+        "position_wgs84",
+        "orientation_world_T_body",
+        "velocity_world_mps",
+        "covariance_6x6",
+        "source_label",
+        "last_satellite_anchor_age_ms",
+        "smoothed",
+        "emitted_at",
+    }
+)
+
+
+@_HEAVY_SKIP
+def test_ac2_jsonl_schema_match(replay_runner) -> None:
+    # Act
+    result = replay_runner(pace="asap")
+    rows = parse_jsonl(result.output_path)
+
+    # Assert
+    assert rows, "no JSONL output rows produced"
+    for i, row in enumerate(rows):
+        assert isinstance(row, dict), f"row {i} is not a JSON object"
+        missing = _ESTIMATOR_OUTPUT_KEYS - set(row.keys())
+        extra = set(row.keys()) - _ESTIMATOR_OUTPUT_KEYS
+        assert not missing, f"row {i} missing keys: {missing}"
+        assert not extra, f"row {i} has unexpected keys: {extra}"
+        assert isinstance(row["position_wgs84"], dict)
+        assert {"lat_deg", "lon_deg", "alt_m"}.issubset(row["position_wgs84"])
+        assert isinstance(row["covariance_6x6"], list) and len(row["covariance_6x6"]) == 36
+        assert isinstance(row["smoothed"], bool)
+
+
+# ----------------------------------------------------------------------
+# AC-3: ≥ 80 % of emissions within 100 m of ground truth
+
+
+@_HEAVY_SKIP
+@pytest.mark.xfail(
+    reason=(
+        "AC-3 requires a real Topotek KHP20S30 camera calibration; "
+        "_docs/00_problem/input_data/flight_derkachi/camera_info.md "
+        "states the intrinsics are unknown. Test runs as xfail "
+        "until a real calibration JSON ships."
+    ),
+    strict=False,
+)
+def test_ac3_within_100m_80pct_of_ticks(replay_runner, derkachi_replay_inputs) -> None:
+    # Act
+    result = replay_runner(pace="asap")
+    rows = parse_jsonl(result.output_path)
+
+    # Assert
+    pct = match_percentage(
+        rows,
+        derkachi_replay_inputs.ground_truth,
+        threshold_m=100.0,
+    )
+    assert pct >= 0.80, (
+        f"AC-3: only {pct * 100:.1f} % of emissions within 100 m of GT; "
+        f"epic threshold is 80 %"
+    )
+
+
+# ----------------------------------------------------------------------
+# AC-4a: Mode-agnosticism AST scan (runs unconditionally)
+
+
+def test_ac4_mode_agnosticism_ast_scan() -> None:
+    """Components MUST NOT branch on `config.mode` / `is_replay` / etc.
+
+    Per ADR-011 + replay protocol Invariant 1, replay-mode logic is
+    structurally confined to the composition root (``runtime_root``),
+    the replay strategies (``frame_source``, ``clock``,
+    ``c8_fc_adapter/{tlog_replay_adapter,replay_sink,
+    noop_mavlink_transport,serial_mavlink_transport}``), the
+    ``replay_input/`` coordinator, and the ``cli/replay.py`` CLI. No
+    ``components/**/*.py`` file should test the mode at runtime.
+    """
+    # Arrange
+    repo_root = Path(__file__).resolve().parents[3]
+    components_dir = repo_root / "src" / "gps_denied_onboard" / "components"
+    py_files = sorted(components_dir.rglob("*.py"))
+    assert py_files, "no component .py files found — repository layout drift?"
+
+    # Patterns we treat as mode-aware branches.
+    forbidden_attribute_chains = {
+        ("config", "mode"),
+        ("self", "_replay_mode"),
+        ("self", "_mode"),
+        ("self", "is_replay"),
+    }
+    forbidden_compare_strings = {"replay", "live"}
+
+    violations: list[str] = []
+    for path in py_files:
+        try:
+            tree = ast.parse(path.read_text(encoding="utf-8"))
+        except SyntaxError as exc:
+            pytest.fail(f"{path} is not valid Python: {exc!r}")
+        scanner = _ModeBranchScanner(
+            forbidden_attribute_chains, forbidden_compare_strings
+        )
+        scanner.visit(tree)
+        for lineno, snippet in scanner.violations:
+            violations.append(f"{path.relative_to(repo_root)}:{lineno}: {snippet}")
+
+    # Assert
+    assert not violations, (
+        "mode-agnosticism violation — components must not branch on "
+        "replay vs live state (move the branch to runtime_root or a "
+        "replay strategy):\n  " + "\n  ".join(violations)
+    )
+
+
+class _ModeBranchScanner(ast.NodeVisitor):
+    """AST visitor that flags `if config.mode == ...` / `is_replay` / etc."""
+
+    def __init__(
+        self,
+        forbidden_attribute_chains: set[tuple[str, str]],
+        forbidden_compare_strings: set[str],
+    ) -> None:
+        self.forbidden_attrs = forbidden_attribute_chains
+        self.forbidden_strings = forbidden_compare_strings
+        self.violations: list[tuple[int, str]] = []
+
+    def visit_If(self, node: ast.If) -> None:
+        self._check_test(node.test)
+        self.generic_visit(node)
+
+    def visit_IfExp(self, node: ast.IfExp) -> None:
+        self._check_test(node.test)
+        self.generic_visit(node)
+
+    def _check_test(self, node: ast.expr) -> None:
+        # Catch `if self._replay_mode:` / `if config.mode:`
+        if isinstance(node, ast.Attribute):
+            chain = self._attribute_chain(node)
+            if chain in self.forbidden_attrs:
+                self.violations.append(
+                    (node.lineno, f"truthiness of {'.'.join(chain)}")
+                )
+        # Catch `if config.mode == "replay":` / `if config.mode != "live":`
+        # (only attribute chains on the left-hand side; bare names are not matched).
+        if isinstance(node, ast.Compare) and isinstance(node.left, ast.Attribute):
+            chain = self._attribute_chain(node.left)
+            if chain in self.forbidden_attrs:
+                for cmp_value in node.comparators:
+                    if (
+                        isinstance(cmp_value, ast.Constant)
+                        and isinstance(cmp_value.value, str)
+                        and cmp_value.value in self.forbidden_strings
+                    ):
+                        self.violations.append(
+                            (
+                                node.lineno,
+                                f"compare {'.'.join(chain)} with {cmp_value.value!r}",
+                            )
+                        )
+        # Catch nested boolean / unary wrappers.
+        if isinstance(node, ast.BoolOp):
+            for value in node.values:
+                self._check_test(value)
+        if isinstance(node, ast.UnaryOp):
+            self._check_test(node.operand)
+
+    @staticmethod
+    def _attribute_chain(node: ast.Attribute) -> tuple[str, ...]:
+        """Return ('self', 'mode') for `self.mode`, etc.; () if non-trivial."""
+        parts: list[str] = []
+        cur: ast.expr = node
+        while isinstance(cur, ast.Attribute):
+            parts.append(cur.attr)
+            cur = cur.value
+        if isinstance(cur, ast.Name):
+            parts.append(cur.id)
+        else:
+            return ()
+        return tuple(reversed(parts))
+
+
+# ----------------------------------------------------------------------
+# AC-4b: Encoder byte-equality (BLOCKED on AZ-558)
+
+
+@pytest.mark.skip(
+    reason=(
+        "AC-4b blocked on AZ-558: C8 encoders still bypass the "
+        "MavlinkTransport seam by calling mav.*_send directly. The "
+        "CapturingMavlinkTransport fixture in _helpers.py is ready; "
+        "this test unskips when AZ-558 lands."
+    )
+)
+def test_ac4_encoder_byte_equality() -> None:
+    raise NotImplementedError("blocked on AZ-558 — see skip reason")
+
+
+# ----------------------------------------------------------------------
+# AC-5: Determinism (two runs differ by ≤ 1e-6 in position fields)
+
+
+@_HEAVY_SKIP
+def test_ac5_determinism_two_runs_diff(replay_runner) -> None:
+    # Act
+    r1 = replay_runner(pace="asap")
+    r2 = replay_runner(pace="asap")
+
+    # Assert
+    assert r1.returncode == 0 and r2.returncode == 0
+    rows_1 = parse_jsonl(r1.output_path)
+    rows_2 = parse_jsonl(r2.output_path)
+    assert len(rows_1) == len(rows_2), (
+        f"determinism violated at line count: {len(rows_1)} vs {len(rows_2)}"
+    )
+    for i, (a, b) in enumerate(zip(rows_1, rows_2, strict=True)):
+        for axis in ("lat_deg", "lon_deg", "alt_m"):
+            diff = abs(
+                a["position_wgs84"][axis] - b["position_wgs84"][axis]
+            )
+            assert diff <= 1e-6, (
+                f"row {i} axis {axis}: |{a['position_wgs84'][axis]} - "
+                f"{b['position_wgs84'][axis]}| = {diff} > 1e-6"
+            )
+
+
+# ----------------------------------------------------------------------
+# AC-6: Pace timing
+
+
+@_HEAVY_SKIP
+def test_ac6_pace_realtime_60s_within_5pct(replay_runner) -> None:
+    # Act
+    result = replay_runner(pace="realtime")
+
+    # Assert
+    assert result.returncode == 0
+    # 60 s clip ± 3 s (5 %) tolerance per the spec.
+    assert 57.0 <= result.wall_clock_s <= 63.0, (
+        f"--pace realtime expected 60 s ± 3 s; got {result.wall_clock_s:.2f} s"
+    )
+
+
+@_HEAVY_SKIP
+def test_ac6_pace_asap_under_30s(replay_runner) -> None:
+    # Act
+    result = replay_runner(pace="asap")
+
+    # Assert
+    assert result.returncode == 0
+    assert result.wall_clock_s <= 30.0, (
+        f"--pace asap expected ≤ 30 s on Tier-1; got {result.wall_clock_s:.2f} s"
+    )
+
+
+# ----------------------------------------------------------------------
+# AC-7: Skip-gate self-check
+
+
+def test_ac7_skip_gate_consistent_with_env_var() -> None:
+    """The heavy-test skip mark MUST mirror the documented env-var gate.
+
+    Verifies that ``RUN_REPLAY_E2E`` controls the skip mark, so the
+    epic AC-7 contract ("all e2e tests skip cleanly without the env
+    var, without errors") is observably true at collection time.
+    """
+    # Arrange
+    env_set = os.environ.get("RUN_REPLAY_E2E", "").lower() in {
+        "1", "true", "yes", "on"
+    }
+
+    # Act
+    skip_active = _heavy_skip_reason() is not None
+
+    # Assert
+    assert skip_active != env_set, (
+        f"RUN_REPLAY_E2E env_set={env_set}; skip_active={skip_active}"
+    )
+
+
+# ----------------------------------------------------------------------
+# Operator workflow rehearsal (AC-8 in this file's matrix; spec calls it AC-9)
+
+
+@pytest.mark.skip(
+    reason=(
+        "AC-8 (operator workflow rehearsal) blocked on the full "
+        "D-PROJ-2 mock-suite-sat-service implementation — current "
+        "tests/fixtures/mock-suite-sat-service/ is a bootstrap stub "
+        "with only GET /healthz. Unskips when the mock implements "
+        "tile-fetch + index-build endpoints."
+    )
+)
+def test_ac8_operator_workflow(operator_pre_flight_setup, replay_runner) -> None:
+    raise NotImplementedError(
+        "blocked on D-PROJ-2 mock-suite-sat-service implementation"
+    )
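
The AC-4a scan above compares each attribute chain against the forbidden set by exact tuple equality. As written, a longer chain such as `self.config.mode` would therefore not appear to match `("config", "mode")`. The sketch below shows one possible variation of the same AST technique that matches on the trailing elements of the chain instead. It is an illustration under stated assumptions, not the scanner used in the repository: `FORBIDDEN_SUFFIXES`, `attribute_chain` and `find_mode_branches` are hypothetical names, and it is deliberately broader in that it flags any forbidden attribute read anywhere inside an `if` or ternary test.

```python
import ast

# Hypothetical forbidden chains, matched against the END of an attribute chain.
FORBIDDEN_SUFFIXES = {("config", "mode"), ("self", "is_replay")}


def attribute_chain(node):
    """Return ('self', 'config', 'mode') for `self.config.mode`; () otherwise."""
    parts = []
    cur = node
    while isinstance(cur, ast.Attribute):
        parts.append(cur.attr)
        cur = cur.value
    if not isinstance(cur, ast.Name):
        return ()  # chain rooted at a call, subscript, etc.
    parts.append(cur.id)
    return tuple(reversed(parts))


def find_mode_branches(source):
    """Sorted line numbers of `if` / ternary tests that read a forbidden attribute."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.If, ast.IfExp)):
            continue
        # Inspect every attribute access inside the branch condition.
        for sub in ast.walk(node.test):
            if not isinstance(sub, ast.Attribute):
                continue
            chain = attribute_chain(sub)
            if any(
                chain[-len(suffix):] == suffix
                for suffix in FORBIDDEN_SUFFIXES
                if len(chain) >= len(suffix)
            ):
                hits.append(node.lineno)
                break  # one report per branch is enough
    return sorted(hits)


SAMPLE = '''
def step(self, config):
    if self.config.mode == "replay":
        return 1
    x = 2 if config.mode != "live" else 3
    if self.ready:
        return x
'''

print(find_mode_branches(SAMPLE))  # [3, 5]
```

Suffix matching trades precision for recall: it also flags unrelated objects that happen to expose a `config.mode` attribute, so a real scan would likely need an allow-list for such cases.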