Mirror of https://github.com/azaion/gps-denied-onboard.git, synced 2026-06-21 10:31:13 +00:00
[AZ-841] Remove xfail markers from Derkachi tests — environment segregation via tier2+RUN_REPLAY_E2E
+16 -16
@@ -116,7 +116,7 @@ the attribution into the test fixture's metadata; do not remove it.
 | `flight_derkachi.mp4` | available | `_docs/00_problem/input_data/flight_derkachi/` |
 | `data_imu.csv` | available | same dir; 4900 rows at 10 Hz over 489.9 s |
 | Synthetic tlog | generated at fixture time | `_tlog_synth.py` reproduces a `pymavlink` `.tlog` from the CSV (the original tlog is not in-repo; the CSV was its export) |
-| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 is `xfail`ed until a real calibration ships. |
+| Camera calibration | placeholder (`tests/fixtures/calibration/adti26.json`) | The real Topotek KHP20S30 intrinsics are unknown per `camera_info.md`. AC-3 accuracy depends on this. |
 | Operator pre-flight rehearsal | blocked | `tests/fixtures/mock-suite-sat-service/` is a bootstrap stub (only `GET /healthz`); AC-8 skips until the full D-PROJ-2 contract lands. |
 
 ## Clip range
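The `data_imu.csv` row above can be checked with simple arithmetic. N evenly spaced samples at rate f span (N - 1) / f seconds from the first timestamp to the last. For 4900 rows at 10 Hz that gives 4899 / 10 = 489.9 s, which matches the table. The helper name below is illustrative only.

```python
def sample_span_s(rows: int, rate_hz: float) -> float:
    """Seconds between the first and last of `rows` evenly spaced samples."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    if rows < 2:
        return 0.0  # zero or one sample spans no time
    # N samples have N - 1 gaps of 1 / rate_hz seconds each.
    return (rows - 1) / rate_hz


# Values taken from the fixture table: 4900 rows at 10 Hz.
print(f"{sample_span_s(4900, 10.0):.1f} s")  # 489.9 s
```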
@@ -131,12 +131,12 @@ drift-correction path. To change the trim, edit `_CLIP_START_S` and
 
 | Test | Expected wall clock |
 |------|---------------------|
-| AC-1 (`--pace asap`) | would be ≤ 30 s; `xfail` (AZ-963) |
-| AC-2 schema match | piggybacks on AC-1; still runs (no divergence path) |
-| AC-5 determinism | would be 2 × asap runs (≤ 60 s total); `xfail` (AZ-963) |
-| AC-6 realtime | would be 60 s ± 3 s; `xfail` (AZ-963) |
-| AC-6 asap | would be ≤ 30 s; `xfail` (AZ-963) |
-| Total suite | ~6 min wall clock but 4 xfailed on Derkachi fixture |
+| AC-1 (`--pace asap`) | ≤ 30 s (Tier-2 only) |
+| AC-2 schema match | piggybacks on AC-1 (Tier-2 only) |
+| AC-5 determinism | 2 × asap runs (≤ 60 s total; Tier-2 only) |
+| AC-6 realtime | 60 s ± 3 s (Tier-2 only) |
+| AC-6 asap | ≤ 30 s (Tier-2 only) |
+| Total suite | ~6 min wall clock on Tier-2; skips on Mac |
 
 The AC-1 / AC-2 / AC-5 tests share `--pace asap` runs but each
 fixture invocation produces a fresh output file, so they do not
@@ -146,14 +146,14 @@ short-circuit each other (preserves AC-5's two-runs-diff guarantee).
 
 | AC | Test | State |
 |----|------|-------|
-| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | `xfail` (AZ-963 — open-loop ESKF diverges without satellite anchoring) |
-| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | runs on Tier-1 |
-| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | `xfail` (AZ-963 — open-loop ESKF cannot meet ≤100 m threshold without satellite anchoring; pre-AZ-963 XPASS was a false positive on partial pre-divergence output) |
+| AC-1: exit 0 + JSONL count match | `test_ac1_exits_0_jsonl_count_match` | Tier-2 (Jetson only) |
+| AC-2: JSONL schema match | `test_ac2_jsonl_schema_match` | Tier-2 (Jetson only) |
+| AC-3: ≤ 100 m for 80 % of ticks | `test_ac3_within_100m_80pct_of_ticks` | Tier-2 (Jetson only) |
 | AC-4a: mode-agnosticism AST scan | `test_ac4_mode_agnosticism_ast_scan` | unconditional |
 | AC-4b: encoder byte-equality | `test_ac4_encoder_byte_equality` | `skip` (waiting on AZ-558) |
-| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | `xfail` (AZ-963 — open-loop ESKF diverges without satellite anchoring) |
-| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | `xfail` (AZ-963 — open-loop ESKF diverges without satellite anchoring) |
-| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | `xfail` (AZ-963 — open-loop ESKF diverges without satellite anchoring) |
+| AC-5: determinism | `test_ac5_determinism_two_runs_diff` | Tier-2 (Jetson only) |
+| AC-6a: realtime 60 s ± 5 % | `test_ac6_pace_realtime_60s_within_5pct` | Tier-2 (Jetson only) |
+| AC-6b: asap ≤ 30 s | `test_ac6_pace_asap_under_30s` | Tier-2 (Jetson only) |
 | AC-7: skip-gate self-check | `test_ac7_skip_gate_consistent_with_env_var` | unconditional |
 | AC-8: operator workflow rehearsal | `test_ac8_operator_workflow` | `skip` (waiting on D-PROJ-2 mock) |
 | AC-9: helper L2 correctness | `test_helpers.py::test_ac9_l2_*` | unconditional |
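AC-3's pass criterion (at most 100 m of error on at least 80 % of ticks) and the AC-9 "L2" helpers can be sketched as follows. This is a minimal illustration, not the project's helper module. It assumes estimates and ground truth are already paired per tick and projected into a shared local metric frame (east/north metres). The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Assumption: positions are (east_m, north_m) in a shared local metric frame.
Point = Tuple[float, float]


def l2_error_m(est: Point, gt: Point) -> float:
    """Euclidean (L2) horizontal distance between an estimate and ground truth."""
    return math.hypot(est[0] - gt[0], est[1] - gt[1])


def fraction_within(est: Sequence[Point], gt: Sequence[Point], threshold_m: float) -> float:
    """Share of ticks whose L2 error is at most `threshold_m`."""
    if len(est) != len(gt):
        raise ValueError("estimate and ground-truth tick counts differ")
    if not est:
        raise ValueError("no ticks to score")
    hits = sum(1 for e, g in zip(est, gt) if l2_error_m(e, g) <= threshold_m)
    return hits / len(est)


def ac3_passes(est: Sequence[Point], gt: Sequence[Point],
               threshold_m: float = 100.0, min_fraction: float = 0.8) -> bool:
    """AC-3 style gate: at least `min_fraction` of ticks within `threshold_m`."""
    return fraction_within(est, gt, threshold_m) >= min_fraction


gt = [(0.0, 0.0)] * 5
est = [(0.0, 0.0), (30.0, 40.0), (0.0, 100.0), (300.0, 0.0), (10.0, 10.0)]
print(fraction_within(est, gt, 100.0), ac3_passes(est, gt))  # 0.8 True
```

Requiring equal tick counts keeps a truncated output from being scored as if it were a complete run.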
@@ -187,9 +187,9 @@ tests/e2e/replay/
 
 ## Follow-up work
 
-* **AZ-777** — build a reference C6 tile cache for the Derkachi fixture
-  so C2/C3/C4 satellite re-anchoring is wired. This is the ONLY path
-  that can resolve AZ-963 (open-loop ESKF divergence is expected physics).
+* **AZ-777** — DONE (Phases 1+2 shipped cycle 3). C11 contract adapted,
+  e2e-runner wired against real satellite-provider. Phases 3-5 superseded
+  by Epic AZ-835 children (AZ-839, AZ-840, AZ-841).
 * **Real Topotek KHP20S30 calibration** — needed for AC-3 accuracy even
   after AZ-777 lands (the threshold is ≤100 m for 80 % of ticks).
 * **AZ-558** — closes AC-4b (route C8 encoders through `MavlinkTransport`).
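The AC table describes AC-4a only as a "mode-agnosticism AST scan". One plausible shape for such a check uses the standard-library `ast` module. It walks a module's syntax tree and reports any reference to names that would let production code branch on replay versus live mode. The banned names below are hypothetical placeholders, because the real test's target list is not part of this commit.

```python
import ast
from typing import Iterable, List, Tuple

# Hypothetical placeholders: identifiers production code must not branch on.
FORBIDDEN_NAMES = frozenset({"is_replay", "REPLAY_MODE"})


def find_mode_references(source: str, filename: str = "<src>",
                         forbidden: Iterable[str] = FORBIDDEN_NAMES) -> List[Tuple[str, int, str]]:
    """Return (filename, line, name) for each bare name or attribute in `forbidden`."""
    banned = set(forbidden)
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Name) and node.id in banned:
            hits.append((filename, node.lineno, node.id))
        elif isinstance(node, ast.Attribute) and node.attr in banned:
            hits.append((filename, node.lineno, node.attr))
    # ast.walk is breadth-first, so sort to report findings in source order.
    return sorted(hits, key=lambda hit: hit[1])


sample = "def f(cfg):\n    if cfg.is_replay:\n        return 1\n    return REPLAY_MODE\n"
print(find_mode_references(sample))  # [('<src>', 2, 'is_replay'), ('<src>', 4, 'REPLAY_MODE')]
```

Scanning the syntax tree instead of grepping the text avoids false hits in comments and string literals.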
@@ -7,12 +7,15 @@ E2E pattern the heavy tests are gated by ``RUN_REPLAY_E2E=1``; the
 lightweight AC-4a (mode-agnosticism AST scan) and AC-7 (skip-gate
 self-check) run unconditionally.
 
-Some ACs are SKIPPED with documented reasons until upstream work
-ships:
+Environment segregation:
+
+* **Tier-2 (Jetson)** tests are gated by ``RUN_REPLAY_E2E=1`` +
+  ``@pytest.mark.tier2`` — they SKIP on Mac and only run on Jetson
+  where the satellite-provider + C6 tile cache are available.
+* **Unconditional** tests (AC-4a, AC-4b, AC-7) run everywhere.
+
+Still skipped with documented reasons:
 
-* AC-3 (≤ 100 m for 80 % of ticks) — ``xfail`` until a real Topotek
-  KHP20S30 calibration ships (camera_info.md notes the intrinsics
-  are unknown).
 * AC-4b (encoder byte-equality) — ``skip`` until AZ-558 routes the
   C8 outbound bytes through the ``MavlinkTransport`` seam.
 * AC-8 / AC-9 in spec (operator workflow rehearsal) — ``skip`` until
@@ -58,15 +61,6 @@ _HEAVY_SKIP = pytest.mark.skipif(
 
 @pytest.mark.tier2
 @_HEAVY_SKIP
-@pytest.mark.xfail(
-    reason=(
-        "AZ-963: open-loop ESKF diverges on the Derkachi fixture "
-        "(~10 s, frame ~233, Mahalanobis² > 100). The fixture has no "
-        "reference C6 tile cache → no satellite anchoring (C2/C3/C4). "
-        "Expected physics until AZ-777 lands a reference tile cache."
-    ),
-    strict=False,
-)
 def test_ac1_exits_0_jsonl_count_match(replay_runner, derkachi_replay_inputs) -> None:
     """Real loop emits one EstimatorOutput per video frame, not per GPS fix.
 
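The hunk header above truncates the definition of `_HEAVY_SKIP`. The sketch below models the env-var decision it encodes as a pure function, so that an AC-7 style self-check can exercise it with a fake environment. It assumes the gate runs the heavy tests only when `RUN_REPLAY_E2E` is exactly `"1"`. The function name is hypothetical, and the commented `skipif` line is an illustration rather than the real definition.

```python
import os
from typing import Mapping, Optional

GATE_VAR = "RUN_REPLAY_E2E"


def heavy_skip_reason(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return why heavy replay tests should skip, or None if they should run.

    Assumption: only the exact value "1" enables the heavy (Tier-2) tests.
    """
    env = os.environ if env is None else env
    if env.get(GATE_VAR) == "1":
        return None
    return f"{GATE_VAR} != 1: heavy replay E2E runs only on Tier-2 (Jetson)"


# A test module could back its marker with this (illustrative only):
#   _HEAVY_SKIP = pytest.mark.skipif(heavy_skip_reason() is not None,
#                                    reason=heavy_skip_reason() or "")
print(heavy_skip_reason({GATE_VAR: "1"}))  # None
print(heavy_skip_reason({}) is not None)   # True
```

With `tier2` applied as well, a Jetson job can select these tests with `pytest -m tier2`, and `-m "not tier2"` limits a Mac run to the unconditional ones. The `tier2` marker needs to be registered in the pytest configuration, or pytest emits an unknown-marker warning.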
@@ -153,19 +147,6 @@ def test_ac2_jsonl_schema_match(replay_runner) -> None:
 
 @pytest.mark.tier2
 @_HEAVY_SKIP
-@pytest.mark.xfail(
-    reason=(
-        "AZ-963: open-loop ESKF diverges on the Derkachi fixture "
-        "(~10 s, frame ~233, Mahalanobis² > 100). The fixture has no "
-        "reference C6 tile cache → no satellite anchoring (C2/C3/C4). "
-        "Expected physics until AZ-777 lands a reference tile cache. "
-        "The XPASS observed pre-AZ-963 was a false positive: the test "
-        "did not check returncode, so partial pre-divergence JSONL rows "
-        "happened to match GT by chance. The returncode assertion added "
-        "below now makes the failure honest."
-    ),
-    strict=False,
-)
 def test_ac3_within_100m_80pct_of_ticks(replay_runner, derkachi_replay_inputs) -> None:
     # Act
     result = replay_runner(pace="asap")
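The removed AC-3 `xfail` reason explains why the earlier XPASS was a false positive: rows written before divergence were scored even though the process had failed. The guard it mentions is to check the exit code before reading any output. A sketch follows, assuming the runner returns an object with a `returncode` attribute such as `subprocess.CompletedProcess`. The helper name is hypothetical.

```python
import json
import subprocess
from typing import List


def rows_if_clean_exit(result: subprocess.CompletedProcess, jsonl_text: str) -> List[dict]:
    """Parse JSONL rows only if the replay process exited successfully."""
    if result.returncode != 0:
        # A crashed or diverged run may still have flushed some valid rows;
        # scoring them would report a pass for a run that actually failed.
        raise AssertionError(
            f"replay exited with {result.returncode}; refusing to score partial output"
        )
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


ok = subprocess.CompletedProcess(args=["replay"], returncode=0)
print(rows_if_clean_exit(ok, '{"tick": 0}\n{"tick": 1}\n'))  # [{'tick': 0}, {'tick': 1}]
```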
@@ -395,15 +376,6 @@ def test_ac4_encoder_byte_equality_via_transport_seam() -> None:
 
 @pytest.mark.tier2
 @_HEAVY_SKIP
-@pytest.mark.xfail(
-    reason=(
-        "AZ-963: open-loop ESKF diverges on the Derkachi fixture "
-        "(~10 s, frame ~233, Mahalanobis² > 100). The fixture has no "
-        "reference C6 tile cache → no satellite anchoring (C2/C3/C4). "
-        "Expected physics until AZ-777 lands a reference tile cache."
-    ),
-    strict=False,
-)
 def test_ac5_determinism_two_runs_diff(replay_runner) -> None:
     # Act
     r1 = replay_runner(pace="asap")
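AC-5 runs the replay twice at `--pace asap` and diffs the two outputs. The sketch below assumes each run's JSONL is available as text. It reports the first differing line instead of a bare boolean, which makes a nondeterminism failure easier to localise. Names are hypothetical.

```python
from itertools import zip_longest
from typing import Optional, Tuple


def first_divergence(run_a: str, run_b: str) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (1-based line number, line_a, line_b) of the first mismatch, or None.

    A missing line on either side shows up as None, so a truncated run
    is reported as a divergence instead of passing silently.
    """
    pairs = zip_longest(run_a.splitlines(), run_b.splitlines())
    for lineno, (line_a, line_b) in enumerate(pairs, start=1):
        if line_a != line_b:
            return lineno, line_a, line_b
    return None


print(first_divergence('{"t":0}\n{"t":1}\n', '{"t":0}\n{"t":1}\n'))  # None
print(first_divergence('{"t":0}\n{"t":1}', '{"t":0}\n{"t":2}'))    # (2, '{"t":1}', '{"t":2}')
```

A line-by-line comparison like this assumes the output contains no wall-clock timestamps or other per-run fields. Any such fields would have to be stripped before comparing.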
@@ -433,15 +405,6 @@ def test_ac5_determinism_two_runs_diff(replay_runner) -> None:
 
 @pytest.mark.tier2
 @_HEAVY_SKIP
-@pytest.mark.xfail(
-    reason=(
-        "AZ-963: open-loop ESKF diverges on the Derkachi fixture "
-        "(~10 s, frame ~233, Mahalanobis² > 100). The fixture has no "
-        "reference C6 tile cache → no satellite anchoring (C2/C3/C4). "
-        "Expected physics until AZ-777 lands a reference tile cache."
-    ),
-    strict=False,
-)
 def test_ac6_pace_realtime_60s_within_5pct(replay_runner) -> None:
     # Act — cap to 60 s so a full 490-second flight doesn't pin the test
     # to an 8-minute realtime run; the pacing correctness is validated
@@ -460,15 +423,6 @@ def test_ac6_pace_realtime_60s_within_5pct(replay_runner) -> None:
 
 @pytest.mark.tier2
 @_HEAVY_SKIP
-@pytest.mark.xfail(
-    reason=(
-        "AZ-963: open-loop ESKF diverges on the Derkachi fixture "
-        "(~10 s, frame ~233, Mahalanobis² > 100). The fixture has no "
-        "reference C6 tile cache → no satellite anchoring (C2/C3/C4). "
-        "Expected physics until AZ-777 lands a reference tile cache."
-    ),
-    strict=False,
-)
 def test_ac6_pace_asap_under_30s(replay_runner) -> None:
     # Act
     result = replay_runner(pace="asap")
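The AC-6 budgets in the wall-clock table reduce to two checks:

* Realtime pacing within ±5 % of 60 s, which is ±3 s since 60 × 0.05 = 3.
* Asap completion in at most 30 s.

The sketch below uses hypothetical helper names. It times a callable with `time.monotonic`, so system clock adjustments during a run do not skew the measurement.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[float, T]:
    """Run `fn` and return (elapsed seconds, result) using a monotonic clock."""
    start = time.monotonic()
    result = fn()
    return time.monotonic() - start, result


def within_pct(measured_s: float, target_s: float, pct: float) -> bool:
    """True if `measured_s` is within +/- `pct` percent of `target_s`."""
    return abs(measured_s - target_s) <= target_s * pct / 100.0


def under_budget(measured_s: float, budget_s: float) -> bool:
    """True if `measured_s` does not exceed `budget_s`."""
    return measured_s <= budget_s


print(within_pct(62.9, 60.0, 5.0), within_pct(63.5, 60.0, 5.0))  # True False
print(under_budget(29.0, 30.0))  # True
```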