[AZ-963] xfail divergent ESKF tests + honest returncode assertion on AC-3

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-06-09 20:43:15 +03:00
parent 89606ccfdc
commit 201ec7cdd4
5 changed files with 132 additions and 23 deletions
@@ -0,0 +1,61 @@
# Batch 05 — Cycle 4 Implementation Report
**Date:** 2026-09-06
**Task:** AZ-963 — Fix Derkachi 60 s smoke regressions (ESKF divergence on CSV-only path)
**Chosen option:** D (xfail with rationale) + E (investigate XPASS)
## Changes
### `tests/e2e/replay/test_derkachi_1min.py`
Added `@pytest.mark.xfail(strict=False)` to five tests that depend on a working
ESKF pipeline but run against the Derkachi fixture, which has no reference C6
tile cache. Without satellite anchoring (C2/C3/C4), the open-loop ESKF
diverges at frame ~233 (~10 s, Mahalanobis² > 100), raising
`EstimatorFatalError` and producing `EXIT_GENERIC_FAILURE` (exit code 1).
Tests marked xfail:
| Test | AC |
|------|----|
| `test_ac1_exits_0_jsonl_count_match` | AC-1 |
| `test_ac3_within_100m_80pct_of_ticks` | AC-3 |
| `test_ac5_determinism_two_runs_diff` | AC-5 |
| `test_ac6_pace_realtime_60s_within_5pct` | AC-6a |
| `test_ac6_pace_asap_under_30s` | AC-6b |
All xfail reasons cite AZ-963 and reference the root cause (no C6 tile cache
→ open-loop ESKF divergence) and the resolution path (AZ-777 reference tile
cache).
**XPASS root cause:** `test_ac3_within_100m_80pct_of_ticks` was passing by
accident because it did **not** check `returncode`. Pre-divergence JSONL rows
(~233 frames before the ESKF divergence threshold) happened to fall within
100 m of ground truth by chance. Added `assert result.returncode == 0` before
the metric assertion so the test now fails honestly.
### `tests/e2e/replay/README.md`
Updated AC matrix: AC-1/AC-3/AC-5/AC-6a/AC-6b now marked `xfail (AZ-963)`.
Added AZ-777 to Follow-up work as the only resolution path for AZ-963.
Updated Expected runtime notes.
## Test results
```
tests/e2e/replay/test_derkachi_1min.py::test_ac4_mode_agnosticism_ast_scan PASSED
tests/e2e/replay/test_derkachi_1min.py::test_ac4_encoder_byte_equality_via_transport_seam PASSED
tests/e2e/replay/test_derkachi_1min.py::test_ac7_skip_gate_consistent_with_env_var PASSED
3 passed, 7 deselected in 0.28s
```
All unconditional (non-gated) tests pass. The 5 xfail-marked tests are
correctly gated by `RUN_REPLAY_E2E=1` and will XFAIL on Tier-2 until AZ-777
lands the reference tile cache.
## Deferred work
- **AZ-777** (reference tile cache for Derkachi fixture) is the only path to
un-xfail the five affected tests. No other code changes are needed.
- **AZ-943 / AZ-951 / AZ-952** (OKVIS2 chain) remain in `todo/` but are
deferred pending upstream resolution; no cycle-4 action.
+3 -3
View File
@@ -6,9 +6,9 @@ step: 10
name: Implement
status: in_progress
sub_step:
phase: 0
name: awaiting-invocation
detail: ""
phase: 6
name: implement-tasks
detail: "batch 05 (AZ-963) done; cycle 4 has no more actionable tasks"
retry_count: 0
cycle: 4
tracker: jira