Batch 05 — Cycle 4 Implementation Report
Date: 2026-09-06
Task: AZ-963 — Fix Derkachi 60 s smoke regressions (ESKF divergence on CSV-only path)
Chosen option: D (xfail with rationale) + E (investigate XPASS)
Changes
tests/e2e/replay/test_derkachi_1min.py
Added @pytest.mark.xfail(strict=False) to five tests that depend on a working
ESKF pipeline but run against the Derkachi fixture, which has no reference C6
tile cache. Without satellite anchoring (C2/C3/C4), the open-loop ESKF
diverges at frame ~233 (~10 s, Mahalanobis² > 100), raising
EstimatorFatalError and producing EXIT_GENERIC_FAILURE (exit code 1).
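For readers unfamiliar with the Mahalanobis² check, the sketch below shows one way such a divergence gate can work. It is an illustration only: it assumes a diagonal innovation covariance, the helper names are hypothetical, and the exception class is a local stand-in for the project's EstimatorFatalError. The report does not state how the real estimator computes the statistic.

```python
class EstimatorFatalError(RuntimeError):
    """Local stand-in for the project's exception of the same name."""


# Threshold quoted in this report (Mahalanobis^2 > 100).
MAHALANOBIS_SQ_LIMIT = 100.0


def mahalanobis_sq(innovation, variances):
    """Squared Mahalanobis distance, assuming a diagonal covariance."""
    return sum(y * y / v for y, v in zip(innovation, variances))


def check_innovation(innovation, variances):
    """Return the statistic, or raise once it exceeds the limit."""
    d2 = mahalanobis_sq(innovation, variances)
    if d2 > MAHALANOBIS_SQ_LIMIT:
        raise EstimatorFatalError(
            f"Mahalanobis^2 {d2:.1f} exceeds {MAHALANOBIS_SQ_LIMIT}"
        )
    return d2
```

A small innovation passes the gate and returns its statistic; a large one raises, which in the real pipeline corresponds to the non-zero exit described above.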
Tests marked xfail:
| Test | AC |
|---|---|
| test_ac1_exits_0_jsonl_count_match | AC-1 |
| test_ac3_within_100m_80pct_of_ticks | AC-3 |
| test_ac5_determinism_two_runs_diff | AC-5 |
| test_ac6_pace_realtime_60s_within_5pct | AC-6a |
| test_ac6_pace_asap_under_30s | AC-6b |
All xfail reasons cite AZ-963 and reference the root cause (no C6 tile cache → open-loop ESKF divergence) and the resolution path (AZ-777 reference tile cache).
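A minimal sketch of what one such marker might look like. The reason wording is hypothetical (the actual strings live in test_derkachi_1min.py), and the test body is a placeholder rather than the real replay invocation.

```python
import pytest

# Hypothetical wording; the real reason strings are in test_derkachi_1min.py.
AZ963_REASON = (
    "AZ-963: Derkachi fixture has no reference C6 tile cache, so the "
    "open-loop ESKF diverges; resolved by AZ-777 (reference tile cache)."
)


@pytest.mark.xfail(strict=False, reason=AZ963_REASON)
def test_ac1_exits_0_jsonl_count_match():
    # Placeholder body: the real test runs the replay and compares JSONL counts.
    raise AssertionError("replay exited with code 1")
```

With strict=False, a test that unexpectedly passes is reported as XPASS instead of failing the run, which is why the accidental pass described below showed up as XPASS rather than as an error.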
XPASS root cause: test_ac3_within_100m_80pct_of_ticks was passing
unexpectedly because it did not check returncode. The ~233 JSONL rows
emitted before the ESKF diverged happened to fall within 100 m of ground
truth, so the metric assertion succeeded even though the process had exited
with code 1. Added assert result.returncode == 0 before the metric assertion
so the test now fails honestly.
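The ordering of the two assertions can be sketched as follows. The helper name, its parameters, and the simulated error values are hypothetical; only the returncode check and the 100 m / 80 % criterion come from this report.

```python
import subprocess


def assert_within_100m(result, errors_m, limit_m=100.0, min_fraction=0.80):
    """Check the exit code first, then the accuracy metric."""
    # A crashed run must fail here, even if its partial output looks accurate.
    assert result.returncode == 0, f"replay exited with {result.returncode}"
    within = sum(1 for e in errors_m if e <= limit_m)
    assert within / len(errors_m) >= min_fraction


# Simulated run: the process failed, but every emitted row is within 100 m.
crashed = subprocess.CompletedProcess(args=["replay"], returncode=1)
try:
    assert_within_100m(crashed, [12.0, 40.0, 85.0])
except AssertionError as exc:
    print(f"fails honestly: {exc}")
```

Without the first assertion, the simulated run above would satisfy the metric and pass, which is the accidental XPASS described here.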
tests/e2e/replay/README.md
Updated AC matrix: AC-1/AC-3/AC-5/AC-6a/AC-6b now marked xfail (AZ-963).
Added AZ-777 to Follow-up work as the only resolution path for AZ-963.
Updated Expected runtime notes.
Test results
tests/e2e/replay/test_derkachi_1min.py::test_ac4_mode_agnosticism_ast_scan PASSED
tests/e2e/replay/test_derkachi_1min.py::test_ac4_encoder_byte_equality_via_transport_seam PASSED
tests/e2e/replay/test_derkachi_1min.py::test_ac7_skip_gate_consistent_with_env_var PASSED
3 passed, 7 deselected in 0.28s
All unconditional (non-gated) tests pass. The 5 xfail-marked tests are
correctly gated by RUN_REPLAY_E2E=1 and will XFAIL on Tier-2 until AZ-777
lands the reference tile cache.
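One way an environment-variable gate of this kind can be expressed is sketched below. The helper and marker names are hypothetical, and the output above shows the gated tests as deselected, so the project's actual mechanism may differ from a skipif marker.

```python
import os

import pytest


def replay_e2e_enabled(environ=None):
    """Return True only when RUN_REPLAY_E2E is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get("RUN_REPLAY_E2E") == "1"


# Hypothetical marker name; apply it to each gated test.
requires_replay_e2e = pytest.mark.skipif(
    not replay_e2e_enabled(),
    reason="set RUN_REPLAY_E2E=1 to run the Tier-2 replay tests",
)
```

Stacking this marker with the xfail marker means a gated test is skipped when the variable is unset and reported as XFAIL when it is set.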
Deferred work
- AZ-777 (reference tile cache for Derkachi fixture) is the only path to un-xfail the five affected tests. No other code changes are needed.
- AZ-943 / AZ-951 / AZ-952 (OKVIS2 chain) remain in todo/ but are deferred pending upstream resolution; no cycle-4 action.