gps-denied-onboard/tests/e2e/replay
Oleksandr Bezdieniezhnykh fd52cc9b1d [AZ-845][AZ-846][AZ-847] Refactor 02: relocate RouteSpec + widen lint
2026-05-24 10:07:20 +03:00

E2E replay tests (AZ-404)

End-to-end regression suite that runs the gps-denied-replay console-script (AZ-402) against the Derkachi 60 s clip and asserts the AZ-265 epic acceptance criteria.

How to run

```bash
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/ -v
```

Without `RUN_REPLAY_E2E=1` the heavy tests skip cleanly. The two unconditional tests in `test_derkachi_1min.py` (the AC-4a mode-agnosticism AST scan and the AC-7 skip-gate self-check) still run, as do the helper unit tests in `test_helpers.py`.
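
A minimal sketch of how an env-var skip gate like this, and the consistency that an AC-7-style self-check verifies, might be written. It assumes that only the exact string `1` enables the heavy tests; the function names are hypothetical and the real `conftest.py` may accept other values:

```python
import os
from typing import Mapping, Optional

# Assumption: only the exact value "1" enables the heavy tests.
ENV_VAR = "RUN_REPLAY_E2E"


def heavy_tests_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the heavy E2E replay tests should run."""
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR) == "1"


def skip_reason(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Reason string for a skip marker, or None when the heavy tests should run."""
    if heavy_tests_enabled(environ):
        return None
    return f"set {ENV_VAR}=1 to run the heavy replay tests"


# In a test module this would typically feed a pytest marker, e.g.:
#   pytest.mark.skipif(not heavy_tests_enabled(), reason=skip_reason() or "")
```

Taking the environment as an optional mapping lets a self-check exercise both branches of the gate without mutating `os.environ`.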

Fixture state

| Artifact | Status | Source / notes |
|---|---|---|
| `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
| `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
| Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a pymavlink `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 is xfailed until a real calibration ships. |
| Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |

Clip range

The first 60 s of the Derkachi flight (`Time=0.0` → `Time=60.0`). The take-off region exercises the AZ-405 IMU-take-off auto-sync detector; the cruise region that follows stresses the satellite-anchor + VIO drift-correction path. To change the trim, edit `_CLIP_START_S` and `_CLIP_END_S` in `conftest.py`.
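
A minimal sketch of what trimming the CSV to the clip window could look like. It assumes the trim filters rows on the CSV's `Time` column and uses a half-open window `[start, end)`; both are assumptions, and the real `conftest.py` may trim differently (for example, inclusively, or on the synthesized tlog instead):

```python
import csv
import io
from typing import Dict, Iterable, List

# Same names as the conftest.py constants; values match the documented 60 s clip.
_CLIP_START_S = 0.0
_CLIP_END_S = 60.0


def trim_rows(rows: Iterable[Dict[str, str]],
              start_s: float = _CLIP_START_S,
              end_s: float = _CLIP_END_S) -> List[Dict[str, str]]:
    """Keep rows whose Time value lies in the half-open window [start_s, end_s)."""
    return [row for row in rows if start_s <= float(row["Time"]) < end_s]


# Usage: a synthetic 10 Hz CSV with 4900 rows (Time = 0.0 .. 489.9).
buf = io.StringIO("Time\n" + "\n".join(f"{i / 10:.1f}" for i in range(4900)))
clip = trim_rows(csv.DictReader(buf))  # 600 rows: Time = 0.0 .. 59.9
```

With a half-open window, the 10 Hz clip has exactly 600 rows. An inclusive upper bound would add the `Time=60.0` row.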

Expected runtime (Tier-1)

| Test | Expected wall clock |
|---|---|
| AC-1 (`--pace asap`) | ≤ 30 s |
| AC-2 schema match | piggybacks on AC-1 |
| AC-5 determinism | 2 × asap runs (≤ 60 s total) |
| AC-6a realtime | 60 s ± 3 s (± 5 %) |
| AC-6b asap | ≤ 30 s |
| Total suite | ≤ 6 min on Jetson AGX Orin |
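
The AC-6a band is ± 5 % of the 60 s clip (0.05 × 60 s = 3 s), and AC-6b is a plain upper bound. A minimal sketch of both pass/fail checks follows. The function names, and the assumption that each test compares a single measured wall-clock duration, are illustrative rather than the suite's actual code:

```python
# Hypothetical constants mirroring the AC-6a / AC-6b targets above.
REALTIME_TARGET_S = 60.0
REALTIME_REL_TOL = 0.05   # ± 5 %
ASAP_BUDGET_S = 30.0

band_s = REALTIME_REL_TOL * REALTIME_TARGET_S  # 3 s either side of 60 s


def within_tolerance(elapsed_s: float, target_s: float, rel_tol: float) -> bool:
    """True if elapsed_s is within ± rel_tol (a fraction) of target_s."""
    return abs(elapsed_s - target_s) <= rel_tol * target_s


def asap_within_budget(elapsed_s: float, budget_s: float = ASAP_BUDGET_S) -> bool:
    """True if an asap run finished within the wall-clock budget."""
    return elapsed_s <= budget_s
```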

The AC-1, AC-2 and AC-5 tests all run with `--pace asap`, but each fixture invocation writes a fresh output file. A later invocation therefore never short-circuits by picking up an earlier run's output, which preserves AC-5's guarantee that it diffs two genuinely independent runs.
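
AC-5's guarantee reduces to "two fresh asap runs produce identical output". A minimal sketch of such a comparison follows, assuming a byte-for-byte, line-by-line comparison of the two JSONL files is what counts as identical. The real test may compare parsed records instead, and `first_divergence` is a hypothetical name:

```python
import itertools
import tempfile
from pathlib import Path
from typing import Optional


def first_divergence(path_a: Path, path_b: Path) -> Optional[int]:
    """Return the 1-based line number where two JSONL files first differ, or None."""
    with path_a.open("rb") as fa, path_b.open("rb") as fb:
        # zip_longest pads the shorter file with None, so a truncated run is caught too.
        for lineno, (la, lb) in enumerate(itertools.zip_longest(fa, fb), start=1):
            if la != lb:
                return lineno
    return None


# Usage: two identical runs, one that diverges on line 2, and one truncated run.
with tempfile.TemporaryDirectory() as tmp:
    run_a, run_b, run_c, run_d = (Path(tmp) / n for n in ("a.jsonl", "b.jsonl", "c.jsonl", "d.jsonl"))
    run_a.write_text('{"tick": 0}\n{"tick": 1}\n')
    run_b.write_text('{"tick": 0}\n{"tick": 1}\n')
    run_c.write_text('{"tick": 0}\n{"tick": 2}\n')
    run_d.write_text('{"tick": 0}\n')
    same = first_divergence(run_a, run_b)       # None
    diverged = first_divergence(run_a, run_c)   # 2
    truncated = first_divergence(run_a, run_d)  # 2
```

Reporting the first differing line, rather than a bare boolean, gives a starting point for the C5 / clock bisection described in the cookbook.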

AC matrix

| AC | Test | State |
|---|---|---|
| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | runs on Tier-1 |
| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | runs on Tier-1 |
| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | xfail (waiting on real calibration) |
| AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
| AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | skip (waiting on AZ-558) |
| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | runs on Tier-1 |
| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | runs on Tier-1 |
| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | runs on Tier-1 |
| AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
| AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | skip (waiting on D-PROJ-2 mock) |
| AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
| AC-10: README accuracy | this file | live |
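
`_helpers.py` provides `l2_horizontal_m` and `match_percentage` for AC-3 and AC-9, but their signatures are not documented here. A minimal sketch of what the AC-3 check computes follows. It assumes these hypothetical signatures and an equirectangular local-plane L2 distance on a spherical Earth; haversine or a proper ENU projection would be equally plausible choices:

```python
import math
from typing import Sequence

# Mean Earth radius in metres (spherical approximation; an assumption here).
EARTH_RADIUS_M = 6_371_000.0


def l2_horizontal_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Horizontal distance in metres via an equirectangular local-plane approximation."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * EARTH_RADIUS_M  # east
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_M                       # north
    return math.hypot(dx, dy)


def match_percentage(errors_m: Sequence[float], threshold_m: float) -> float:
    """Percentage of ticks whose horizontal error is <= threshold_m."""
    if not errors_m:
        return 0.0
    hits = sum(1 for e in errors_m if e <= threshold_m)
    return 100.0 * hits / len(errors_m)


# AC-3 style check: at least 80 % of ticks within 100 m.
errors = [10.0, 50.0, 150.0, 90.0, 200.0]
pct = match_percentage(errors, 100.0)  # 60.0, so this run would fail AC-3
```

At these distances (≤ 100 m), the equirectangular approximation and haversine agree far below the AC-3 threshold, so the choice mainly matters for AC-9's expected values.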

Failure-mode cookbook

| Symptom | Likely cause | Fix |
|---|---|---|
| `gps-denied-replay` console-script not on `PATH` | package not installed in the test venv | `pip install -e .` |
| AC-1 line count off by > 5 % | tlog synthesizer drifted from the CSV | regenerate by re-running the test (the synthesizer is deterministic, so non-determinism here would be a real bug) |
| AC-3 fails at ~0 % even with a real calibration | wrong intrinsics OR wrong WGS84 ground-truth source; verify the `GLOBAL_POSITION_INT` columns are still the AC-3 reference (per `flight_derkachi/README.md`) | re-derive ground truth |
| AC-5 determinism violated | non-deterministic float ordering in the C5 estimator OR a clock leaked into the runtime | bisect via `git log` against the C5 / clock modules |
| AC-6 realtime drifts on shared CI | shared-runner contention; the spec allows widening to ± 5 s | adjust the `_HEAVY_SKIP` boundary if it persists |
| tlog missing required messages | `_tlog_synth.py` lost a message group | check `_REQUIRED_MESSAGE_GROUPS` in `tlog_replay_adapter.py` against the synth output |
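
For the last row, a minimal sketch of checking a set of observed message types against required groups. It assumes each group is satisfied by any one of its message types. The group names and contents below are illustrative only, and the real `_REQUIRED_MESSAGE_GROUPS` may be structured differently. With pymavlink, the observed types could be collected by repeatedly calling `recv_match()` on `mavutil.mavlink_connection(path)` until it returns `None`, recording `msg.get_type()` for each message:

```python
from typing import Dict, FrozenSet, Iterable, List

# Illustrative groups only; not the real _REQUIRED_MESSAGE_GROUPS.
REQUIRED_GROUPS: Dict[str, FrozenSet[str]] = {
    "position": frozenset({"GLOBAL_POSITION_INT"}),
    "attitude": frozenset({"ATTITUDE"}),
    "imu": frozenset({"RAW_IMU", "SCALED_IMU"}),
}


def missing_groups(message_types: Iterable[str],
                   required: Dict[str, FrozenSet[str]] = REQUIRED_GROUPS) -> List[str]:
    """Sorted names of groups for which none of the message types were seen."""
    present = set(message_types)
    return sorted(name for name, types in required.items() if not (types & present))


# Usage: message types as they might be collected from a synthesized tlog.
seen = ["ATTITUDE", "GLOBAL_POSITION_INT", "ATTITUDE", "HEARTBEAT"]
gaps = missing_groups(seen)  # ['imu']
```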

Files

```text
tests/e2e/replay/
├── README.md                   ← this file
├── __init__.py                 ← package marker + module-level docstring
├── _helpers.py                 ← parse_jsonl, l2_horizontal_m, match_percentage,
│                                  CapturingMavlinkTransport, GroundTruthRow
├── _tlog_synth.py              ← CSV → tlog generator
├── conftest.py                 ← derkachi_replay_inputs, replay_runner,
│                                  operator_pre_flight_setup fixtures
├── test_helpers.py             ← unit tests for _helpers (unconditional)
└── test_derkachi_1min.py       ← AC-1..AC-8 + AC-7 skip gate + AC-4a AST scan
```
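
This README describes AC-4a only as a "mode-agnosticism AST scan". This sketch assumes one plausible reading: runtime modules must not branch on whether they are being replayed. It is a minimal scan using Python's `ast` module, and the deny-list is hypothetical:

```python
import ast
from typing import FrozenSet, List, Tuple

# Hypothetical deny-list: identifiers that would let runtime code ask "am I replaying?".
FORBIDDEN_NAMES: FrozenSet[str] = frozenset({"is_replay", "replay_mode", "REPLAY"})


def find_mode_checks(source: str, filename: str = "<string>") -> List[Tuple[int, str]]:
    """Return sorted (line, identifier) pairs for every Name/Attribute on the deny-list."""
    hits: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            hits.append((node.lineno, node.id))
        elif isinstance(node, ast.Attribute) and node.attr in FORBIDDEN_NAMES:
            hits.append((node.lineno, node.attr))
    return sorted(hits)


sample = (
    "def step(cfg, frame):\n"
    "    if cfg.is_replay:\n"
    "        return None\n"
    "    return frame\n"
)
violations = find_mode_checks(sample)  # [(2, 'is_replay')]
```

Scanning the AST rather than grepping the text avoids false positives from comments and string literals. Because the scan needs no replay run, it can stay unconditional.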

Follow-up work

  • Real Topotek KHP20S30 calibration — unblocks AC-3.
  • AZ-558 — closes AC-4b (route C8 encoders through MavlinkTransport).
  • D-PROJ-2 mock-suite-sat-service — unblocks AC-8 (operator workflow rehearsal).