diff --git a/_docs/02_tasks/backlog/AZ-841_unxfail_az777_tier2_tests.md b/_docs/02_tasks/done/AZ-841_unxfail_az777_tier2_tests.md
similarity index 100%
rename from _docs/02_tasks/backlog/AZ-841_unxfail_az777_tier2_tests.md
rename to _docs/02_tasks/done/AZ-841_unxfail_az777_tier2_tests.md
diff --git a/_docs/_process_leftovers/2026-09-06_az963_jira_transition.md b/_docs/_process_leftovers/2026-09-06_az963_jira_transition.md
new file mode 100644
index 0000000..60e825f
--- /dev/null
+++ b/_docs/_process_leftovers/2026-09-06_az963_jira_transition.md
@@ -0,0 +1,9 @@
+# Tracker leftover — AZ-963 Jira transition
+
+**Timestamp:** 2026-09-06T20:43:00+03:00
+**What was blocked:** AZ-963 status transition (In Progress → Done) in Jira
+**Full payload:**
+- Issue: AZ-963
+- Target status: Done
+- Comment: "Implemented as xfail+returncode fix (Option D+E). Committed as 201ec7c. Tests AC-4a/AC-4b/AC-7 pass locally. Five xfail-marked tests will XFAIL on Tier-2 until AZ-777 lands."
+**Reason:** Jira MCP server availability not confirmed during this session
\ No newline at end of file
diff --git a/tests/e2e/replay/README.md b/tests/e2e/replay/README.md
index dab7ff2..63997ab 100644
--- a/tests/e2e/replay/README.md
+++ b/tests/e2e/replay/README.md
@@ -116,7 +116,7 @@ the attribution into the test fixture's metadata; do not remove it.
 | `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
 | `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
 | Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a `pymavlink` `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
-| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 is `xfail`ed until a real calibration ships. |
+| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 accuracy depends on this. |
 | Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |
 
 ## Clip range
@@ -131,12 +131,12 @@ drift-correction path. To change the trim, edit `_CLIP_START_S` and
 
 | Test | Expected wall clock |
 |------|---------------------|
-| AC-1 (`--pace asap`) | would be ≤ 30 s; `xfail` (AZ-963) |
-| AC-2 schema match | piggybacks on AC-1; still runs (no divergence path) |
-| AC-5 determinism | would be 2 × asap runs (≤ 60 s total); `xfail` (AZ-963) |
-| AC-6 realtime | would be 60 s ± 3 s; `xfail` (AZ-963) |
-| AC-6 asap | would be ≤ 30 s; `xfail` (AZ-963) |
-| Total suite | ~6 min wall clock but 4 xfailed on Derkachi fixture |
+| AC-1 (`--pace asap`) | ≤ 30 s (Tier-2 only) |
+| AC-2 schema match | piggybacks on AC-1 (Tier-2 only) |
+| AC-5 determinism | 2 × asap runs (≤ 60 s total; Tier-2 only) |
+| AC-6 realtime | 60 s ± 3 s (Tier-2 only) |
+| AC-6 asap | ≤ 30 s (Tier-2 only) |
+| Total suite | ~6 min wall clock on Tier-2; skips on Mac |
 
 The AC-1 / AC-2 / AC-5 tests share `--pace asap` runs but each
 fixture invocation produces a fresh output file, so they do not
@@ -146,14 +146,14 @@ short-circuit each other (preserves AC-5's two-runs-diff guarantee).
 
 | AC | Test | State |
 |----|------|-------|
-| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | `xfail` (AZ-963 — open-loop ESKF diverges without satellite anchoring) |
-| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | runs on Tier-1 |
-| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | `xfail` (AZ-963 — open-loop ESKF cannot meet ≤100 m threshold without satellite anchoring; pre-AZ-963 XPASS was a false positive on partial pre-divergence output) |
+| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | Tier-2 (Jetson only) |
+| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | Tier-2 (Jetson only) |
+| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | Tier-2 (Jetson only) |
 | AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
 | AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | `skip` (waiting on AZ-558) |
-| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | `xfail` (AZ-963 — open-loop ESKF diverges without satellite anchoring) |
-| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | `xfail` (AZ-963 — open-loop ESKF diverges without satellite anchoring) |
-| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | `xfail` (AZ-963 — open-loop ESKF diverges without satellite anchoring) |
+| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | Tier-2 (Jetson only) |
+| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | Tier-2 (Jetson only) |
+| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | Tier-2 (Jetson only) |
 | AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
 | AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | `skip` (waiting on D-PROJ-2 mock) |
 | AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
@@ -187,9 +187,9 @@ tests/e2e/replay/
 
 ## Follow-up work
 
-* **AZ-777** — build a reference C6 tile cache for the Derkachi fixture
-  so C2/C3/C4 satellite re-anchoring is wired. This is the ONLY path
-  that can resolve AZ-963 (open-loop ESKF divergence is expected physics).
+* **AZ-777** — DONE (Phases 1+2 shipped cycle 3). C11 contract adapted,
+  e2e-runner wired against real satellite-provider. Phases 3-5 superseded
+  by Epic AZ-835 children (AZ-839, AZ-840, AZ-841).
 * **Real Topotek KHP20S30 calibration** — needed for AC-3 accuracy even
   after AZ-777 lands (the threshold is ≤100 m for 80 % of ticks).
 * **AZ-558** — closes AC-4b (route C8 encoders through `MavlinkTransport`).
diff --git a/tests/e2e/replay/test_derkachi_1min.py b/tests/e2e/replay/test_derkachi_1min.py
index 4c69968..16959c2 100644
--- a/tests/e2e/replay/test_derkachi_1min.py
+++ b/tests/e2e/replay/test_derkachi_1min.py
@@ -7,12 +7,15 @@ E2E pattern the heavy tests are gated by ``RUN_REPLAY_E2E=1``; the
 lightweight AC-4a (mode-agnosticism AST scan) and AC-7 (skip-gate
 self-check) run unconditionally.
 
-Some ACs are SKIPPED with documented reasons until upstream work
-ships:
+Environment segregation:
+
+* **Tier-2 (Jetson)** tests are gated by ``RUN_REPLAY_E2E=1`` +
+  ``@pytest.mark.tier2`` — they SKIP on Mac and only run on Jetson
+  where the satellite-provider + C6 tile cache are available.
+* **Unconditional** tests (AC-4a, AC-4b, AC-7) run everywhere.
+
+Still skipped with documented reasons:
 
-* AC-3 (≤ 100 m for 80 % of ticks) — ``xfail`` until a real Topotek
-  KHP20S30 calibration ships (camera_info.md notes the intrinsics
-  are unknown).
 * AC-4b (encoder byte-equality) — ``skip`` until AZ-558 routes the
   C8 outbound bytes through the ``MavlinkTransport`` seam.
 * AC-8 / AC-9 in spec (operator workflow rehearsal) — ``skip`` until
@@ -58,15 +61,6 @@ _HEAVY_SKIP = pytest.mark.skipif(
 
 @pytest.mark.tier2
 @_HEAVY_SKIP
-@pytest.mark.xfail(
-    reason=(
-        "AZ-963: open-loop ESKF diverges on the Derkachi fixture "
-        "(~10 s, frame ~233, Mahalanobis² > 100). The fixture has no "
-        "reference C6 tile cache → no satellite anchoring (C2/C3/C4). "
-        "Expected physics until AZ-777 lands a reference tile cache."
-    ),
-    strict=False,
-)
 def test_ac1_exits_0_jsonl_count_match(replay_runner, derkachi_replay_inputs) -> None:
     """Real loop emits one EstimatorOutput per video frame, not per GPS fix.
 
@@ -153,19 +147,6 @@ def test_ac2_jsonl_schema_match(replay_runner) -> None:
 
 @pytest.mark.tier2
 @_HEAVY_SKIP
-@pytest.mark.xfail(
-    reason=(
-        "AZ-963: open-loop ESKF diverges on the Derkachi fixture "
-        "(~10 s, frame ~233, Mahalanobis² > 100). The fixture has no "
-        "reference C6 tile cache → no satellite anchoring (C2/C3/C4). "
-        "Expected physics until AZ-777 lands a reference tile cache. "
-        "The XPASS observed pre-AZ-963 was a false positive: the test "
-        "did not check returncode, so partial pre-divergence JSONL rows "
-        "happened to match GT by chance. The returncode assertion added "
-        "below now makes the failure honest."
-    ),
-    strict=False,
-)
 def test_ac3_within_100m_80pct_of_ticks(replay_runner, derkachi_replay_inputs) -> None:
     # Act
     result = replay_runner(pace="asap")
@@ -395,15 +376,6 @@ def test_ac4_encoder_byte_equality_via_transport_seam() -> None:
 
 @pytest.mark.tier2
 @_HEAVY_SKIP
-@pytest.mark.xfail(
-    reason=(
-        "AZ-963: open-loop ESKF diverges on the Derkachi fixture "
-        "(~10 s, frame ~233, Mahalanobis² > 100). The fixture has no "
-        "reference C6 tile cache → no satellite anchoring (C2/C3/C4). "
-        "Expected physics until AZ-777 lands a reference tile cache."
-    ),
-    strict=False,
-)
 def test_ac5_determinism_two_runs_diff(replay_runner) -> None:
     # Act
     r1 = replay_runner(pace="asap")
@@ -433,15 +405,6 @@ def test_ac5_determinism_two_runs_diff(replay_runner) -> None:
 
 @pytest.mark.tier2
 @_HEAVY_SKIP
-@pytest.mark.xfail(
-    reason=(
-        "AZ-963: open-loop ESKF diverges on the Derkachi fixture "
-        "(~10 s, frame ~233, Mahalanobis² > 100). The fixture has no "
-        "reference C6 tile cache → no satellite anchoring (C2/C3/C4). "
-        "Expected physics until AZ-777 lands a reference tile cache."
-    ),
-    strict=False,
-)
 def test_ac6_pace_realtime_60s_within_5pct(replay_runner) -> None:
     # Act — cap to 60 s so a full 490-second flight doesn't pin the test
     # to an 8-minute realtime run; the pacing correctness is validated
@@ -460,15 +423,6 @@ def test_ac6_pace_realtime_60s_within_5pct(replay_runner) -> None:
 
 @pytest.mark.tier2
 @_HEAVY_SKIP
-@pytest.mark.xfail(
-    reason=(
-        "AZ-963: open-loop ESKF diverges on the Derkachi fixture "
-        "(~10 s, frame ~233, Mahalanobis² > 100). The fixture has no "
-        "reference C6 tile cache → no satellite anchoring (C2/C3/C4). "
-        "Expected physics until AZ-777 lands a reference tile cache."
-    ),
-    strict=False,
-)
 def test_ac6_pace_asap_under_30s(replay_runner) -> None:
     # Act
     result = replay_runner(pace="asap")