mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 14:41:15 +00:00
[AZ-404] [AZ-389] [AZ-559] E2E replay test (Derkachi 60s) + AZ-389 cleanup
Batch 63 of /autodev replay slice. Adds the AZ-404 E2E test harness against the Derkachi fixture and resolves the AZ-389 dependency phantom (closing AZ-559 Won't Fix). E2E test (AZ-404) - tests/e2e/replay/_tlog_synth.py: deterministic CSV->tlog generator (the original Derkachi tlog is not in repo; data_imu.csv is its export, so we round-trip the CSV through pymavlink). Verified: SCALED_IMU2 + ATTITUDE + GPS_RAW_INT + HEARTBEAT round-trip cleanly through mavutil.mavlink_connection. - tests/e2e/replay/_helpers.py: parse_jsonl, l2_horizontal_m (haversine), match_percentage, CapturingMavlinkTransport (ready for AZ-558 unblock), GroundTruthRow + load_ground_truth_csv. - tests/e2e/replay/conftest.py: derkachi_replay_inputs (session scope), replay_runner (subprocess fixture per AZ-402 CLI), operator_pre_flight_setup placeholder. - tests/e2e/replay/test_derkachi_1min.py: 9 tests covering AC-1..AC-8 with AC-7 skip-gate self-check + AC-4a mode-agnosticism AST scan (passes unconditionally, confirms ADR-011 holding). - tests/e2e/replay/test_helpers.py: 14 unit tests covering AC-9 helper L2 correctness + match_percentage + parse_jsonl + CapturingMavlinkTransport (all unconditional). - tests/e2e/replay/README.md: AC matrix, fixture state, runtime budget, failure cookbook (AC-10). AC matrix - AC-1, AC-2, AC-5, AC-6 implemented and Tier-1 gated on RUN_REPLAY_E2E=1. - AC-3 (<=100m for 80%) xfail until real Topotek KHP20S30 calibration ships (camera_info.md states intrinsics are unknown). - AC-4a (mode-agnosticism AST scan) PASSES unconditionally. - AC-4b (encoder byte-equality) skip until AZ-558 routes C8 bytes through MavlinkTransport. - AC-7 (skip-gate self-check) PASSES unconditionally. - AC-8 (operator workflow rehearsal) skip until D-PROJ-2 mock-suite-sat-service implements tile-fetch + index-build endpoints. - AC-9 (helper L2 correctness) 14 PASSES unconditionally. AZ-389 housekeeping - AZ-559 closed Won't Fix: investigation against c6_tile_cache/_types.py confirmed TileSource.ONBOARD_INGEST + TileMetadata.quality_metadata + write_tile's FreshnessRejectionError already cover the mid-flight ingest semantic. The "missing API" was a spec-vs-impl naming mismatch. - AZ-389 spec rewritten to consume the existing write_tile API + catch FreshnessRejectionError per AC-NEW-3 opportunistic emission. - _dependencies_table.md reverted: AZ-389 deps -> AZ-303 (was AZ-559 in the previous commit on this branch); total 150 / 497 pts. Tests - Full regression: 2099 passed (+14 new e2e/replay), 94 skipped (incl. 8 e2e/replay heavy-tier + documented blocker skips), 3 perf-microbench flakes deselected (test_cli_cold_start_under_2s, test_cold_start_under_500ms_p99, test_nfr_perf_sign_microbench; all pass in isolation - pre-existing under-load flakes on dev macOS). Reviews - _docs/03_implementation/reviews/batch_63_review.md: code review PASS_WITH_WARNINGS (3 documented spec-gap deferrals: AC-3, AC-4b, AC-8). - _docs/03_implementation/cumulative_review_batches_61-63_cycle1_report.md: cumulative review PASS_WITH_WARNINGS. Action items: prioritise AZ-558 (closes AZ-401 AC-9 + AZ-404 AC-4b); consider 2pt hygiene PBI for Protocol-completeness AST scan to catch the AZ-389 / AZ-559 phantom-API pattern at task-prep time. Architecture invariants observably holding - ADR-011 (replay-as-configuration): AC-4a's AST scan over src/gps_denied_onboard/components/**/*.py finds zero violations - components branch on neither config.mode nor any synonym. - Single composition root (replay protocol Invariant 11): AZ-402 CLI dispatches to runtime_root.main(config); does not call compose_root directly. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -1,11 +1,10 @@
|
||||
# C5 Orthorectifier → C6 mid-flight tile gen sub-path
|
||||
|
||||
**Task**: AZ-389_c5_orthorectifier_c6
|
||||
**Name**: C5 internal orthorectifier — produces mid-flight tile candidates for C6
|
||||
**Description**: Implement the orthorectifier sub-path inside C5: when a frame has converged in the iSAM2 graph (≥1 satellite anchor + visual consistency), apply the camera intrinsics + extrinsics + the C5-known pose to orthorectify the nav-camera frame into a tile-aligned image patch; emit a `MidFlightTileCandidate(tile_id, pixels, quality_metadata, source_pose)` to C6 (via the storage interface AZ-303 `tile_store.put_mid_flight_candidate(...)`). Quality metadata: `inlier_count`, `cov_norm`, `pose_age_ms`. The orthorectifier is C5-internal (per epic spec § Scope: "orthorectifier (lives within C5 as an internal subcomponent)"); it consumes the converged pose + nav frame from a per-frame buffer; it emits at most ONE candidate per frame (gated by quality thresholds: `cov_norm < threshold` AND `inlier_count > floor`). Triggered after a successful `current_estimate()` call when quality conditions hold.
|
||||
**Name**: C5 internal orthorectifier — produces mid-flight tile candidates for C6 via existing `TileStore.write_tile` + `ONBOARD_INGEST` source
|
||||
**Description**: Implement the orthorectifier sub-path inside C5: when a frame has converged in the iSAM2 graph (≥1 satellite anchor + visual consistency), apply the camera intrinsics + extrinsics + the C5-known pose to orthorectify the nav-camera frame into a tile-aligned image patch; persist it to C6 as a `TileSource.ONBOARD_INGEST` tile via the existing `TileStore.write_tile(tile_blob, metadata)` API (AZ-303). The orthorectifier is C5-internal (per epic spec § Scope: "orthorectifier (lives within C5 as an internal subcomponent)"); it consumes the converged pose + nav frame from a per-frame buffer; it emits at most ONE tile per frame (gated by quality thresholds: `cov_norm < threshold` AND `inlier_count > floor`). Triggered after a successful `current_estimate()` call when quality conditions hold AND `source_label == SATELLITE_ANCHORED`. Per AC-NEW-3 the emission is opportunistic: a `FreshnessRejectionError` from C6's freshness gate is caught and dropped (DEBUG log only).
|
||||
**Complexity**: 3 points
|
||||
**Dependencies**: AZ-384 (`current_estimate` body + cov norm), AZ-385 (only emit candidates when source_label == SATELLITE_ANCHORED), AZ-559 (`TileStore.put_mid_flight_candidate` Protocol method + `MidFlightTileCandidate` DTO + persistence — AZ-303 shipped under-spec, AZ-559 closes the gap), AZ-263, AZ-269, AZ-266, AZ-272 (FDR)
|
||||
**Status**: BLOCKED on AZ-559 (storage extension). Dependency-gap surfaced during batch 63 of the autodev replay-track sequence; original spec assumed AZ-303 had delivered the mid-flight tile API but it had not.
|
||||
**Dependencies**: AZ-384 (`current_estimate` body + cov norm), AZ-385 (only emit candidates when source_label == SATELLITE_ANCHORED), AZ-303 (`TileStore.write_tile` + `TileMetadata` + `TileQualityMetadata` + `TileSource.ONBOARD_INGEST`), AZ-263, AZ-269, AZ-266, AZ-272 (FDR)
|
||||
**Component**: c5_state (epic AZ-260 / E-C5)
|
||||
**Tracker**: AZ-389
|
||||
**Epic**: AZ-260 (E-C5)
|
||||
@@ -14,7 +13,11 @@
|
||||
|
||||
- `_docs/02_document/contracts/c5_state/state_estimator_protocol.md`.
|
||||
- `_docs/02_document/components/07_c5_state/description.md` — orthorectifier mention; § 1 downstream "C6 (mid-flight tile gen via orthorectifier)".
|
||||
- `_docs/02_document/contracts/c6_tile_cache/tile_store.md` — `put_mid_flight_candidate` API.
|
||||
- `_docs/02_document/contracts/c6_tile_cache/tile_store.md` — `write_tile` API (the four-method baseline).
|
||||
|
||||
### History
|
||||
|
||||
The original v1.0.0 spec referenced a separate `tile_store.put_mid_flight_candidate(MidFlightTileCandidate)` API that does not exist; investigation against `c6_tile_cache/_types.py` and `interface.py` showed `TileSource.ONBOARD_INGEST` + `TileMetadata.quality_metadata` + `write_tile`'s built-in `FreshnessRejectionError` semantic already cover the entire mid-flight ingest path. AZ-559 was filed and immediately closed Won't Fix; the spec is rewritten here against the actual API surface.
|
||||
|
||||
## Problem
|
||||
|
||||
@@ -24,15 +27,20 @@ Without this task, the system never emits mid-flight tile candidates → C6's ca
|
||||
|
||||
- `src/gps_denied_onboard/components/c5_state/_orthorectifier.py` defining:
|
||||
- `Orthorectifier` class (component-internal; not in `__all__`).
|
||||
- Method: `try_emit_candidate(frame, pose_estimate, cov_6x6, inlier_count, source_label) -> MidFlightTileCandidate | None`.
|
||||
- Method: `try_emit_candidate(frame, pose_estimate, cov_6x6, inlier_count, mre_px, source_label) -> TileId | None`.
|
||||
- Quality gates: `cov_norm < cov_threshold` AND `inlier_count > inlier_floor` AND `source_label == SATELLITE_ANCHORED`.
|
||||
- Orthorectification math: project nav-camera frame to tile plane via camera intrinsics + extrinsics + pose; nearest-neighbour or bilinear sampling.
|
||||
- JPEG encoding of the orthorectified patch (via OpenCV `cv2.imencode`).
|
||||
- Constructs a `TileMetadata` with `source = TileSource.ONBOARD_INGEST`, `voting_status = VotingStatus.PENDING`, `quality_metadata = TileQualityMetadata(estimator_label, covariance_2x2_horizontal_subblock, last_anchor_age_ms, mre_px, imu_bias_norm)`, `flight_id`, `companion_id`, `freshness_label = FreshnessLabel.FRESH` (the gate runs at insert time).
|
||||
- Calls `tile_store.write_tile(jpeg_bytes, tile_metadata)`.
|
||||
- Catches `FreshnessRejectionError` per AC-NEW-3 (opportunistic) and returns `None`; logs DEBUG `"c5.state.mid_flight_candidate_freshness_rejected"`.
|
||||
- Returns the persisted `TileId` on success.
|
||||
- Hook in `GtsamIsam2StateEstimator.current_estimate()` post-emission (or post-`add_pose_anchor` — implementer choice; gated to fire AT MOST once per frame).
|
||||
- ESKF estimator: also has the hook (mid-flight tile gen is independent of state-estimator strategy).
|
||||
- Configurable thresholds in `config.state.orthorectifier.{cov_norm_threshold, inlier_floor}`.
|
||||
- Defensive: skip emission silently if quality gates fail (NOT a degraded-mode error; tile gen is opportunistic per AC-NEW-3).
|
||||
- DEBUG log on every emission attempt; INFO log on first emission per flight.
|
||||
- Unit tests: known pose + frame → expected orthorectified output; quality-gate skip behaviour; emission rate-limit (once per frame).
|
||||
- Unit tests: known pose + frame → expected orthorectified output; quality-gate skip behaviour; emission rate-limit (once per frame); `FreshnessRejectionError` swallowed silently.
|
||||
|
||||
## Scope
|
||||
|
||||
@@ -44,7 +52,7 @@ Without this task, the system never emits mid-flight tile candidates → C6's ca
|
||||
- Unit tests.
|
||||
|
||||
### Excluded
|
||||
- The C6 `tile_store.put_mid_flight_candidate` body — owned by AZ-303 / E-C6.
|
||||
- Any new C6 API — the existing `write_tile` + `TileSource.ONBOARD_INGEST` + `TileMetadata` covers everything (closed AZ-559 confirms).
|
||||
- C6's downstream tile-cache eviction integration — owned by AZ-308.
|
||||
- The orthorectification kernel optimisation — production-acceptable kernel uses NumPy or OpenCV `cv2.warpPerspective`; CUDA optimisation is a feature-cycle improvement.
|
||||
|
||||
@@ -52,23 +60,25 @@ Without this task, the system never emits mid-flight tile candidates → C6's ca
|
||||
|
||||
**AC-1: Orthorectification correctness** — synthetic camera pose + planar tile → output pixels match expected projection within 1-pixel tolerance.
|
||||
|
||||
**AC-2: Quality gate skip** — `cov_norm > threshold` → no candidate emitted; DEBUG log only.
|
||||
**AC-2: Quality gate skip — covariance** — `cov_norm > threshold` → no tile written; DEBUG log only.
|
||||
|
||||
**AC-3: Source label gate** — `source_label != SATELLITE_ANCHORED` → no emission.
|
||||
|
||||
**AC-4: Once-per-frame rate limit** — even if `current_estimate` is called multiple times for the same frame, at most ONE candidate is emitted.
|
||||
**AC-4: Once-per-frame rate limit** — even if `current_estimate` is called multiple times for the same frame, at most ONE tile is written.
|
||||
|
||||
**AC-5: Both estimators participate** — iSAM2 + ESKF both attempt candidate emission.
|
||||
**AC-5: Both estimators participate** — iSAM2 + ESKF both attempt candidate emission via the same `Orthorectifier` instance (or an equivalent per-estimator instance — implementer choice).
|
||||
|
||||
**AC-6: Composition wiring** — the orthorectifier is constructed inside the estimator at `__init__` time; `tile_store` is constructor-injected.
|
||||
**AC-6: Composition wiring** — the orthorectifier is constructed inside the estimator at `__init__` time; `tile_store: TileStore` is constructor-injected.
|
||||
|
||||
**AC-7: First-emission INFO log** — `kind="c5.state.first_mid_flight_candidate"` with `{frame_id, tile_id, cov_norm}`.
|
||||
|
||||
**AC-8: Defensive skip on missing inputs** — if `frame` or `pose_estimate` is None, skip silently with DEBUG log (NOT an error).
|
||||
|
||||
**AC-9: Freshness rejection caught** — when `tile_store.write_tile` raises `FreshnessRejectionError`, the orthorectifier returns `None` and emits a DEBUG log; no exception propagates to `current_estimate`'s callers (replay protocol Invariant: opportunistic emission per AC-NEW-3).
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
- `try_emit_candidate` p95 ≤ 30 ms (orthorectification kernel cost).
|
||||
- `try_emit_candidate` p95 ≤ 30 ms (orthorectification kernel cost, including JPEG encode).
|
||||
- Memory ≤ 50 MB resident (frame buffer + working memory).
|
||||
|
||||
## Constraints
|
||||
@@ -76,14 +86,16 @@ Without this task, the system never emits mid-flight tile candidates → C6's ca
|
||||
- Component-internal (not in C5 `__all__`).
|
||||
- Once-per-frame rate limit.
|
||||
- Quality gates are mandatory; AC-NEW-3 gain is contingent on emitted candidates being high-quality.
|
||||
- The `TileQualityMetadata.covariance_2x2` field carries the **horizontal-position 2x2 sub-block** of the C5 pose covariance (not the full 6x6); the orthorectifier uses the full 6x6 for its OWN gate (`cov_norm < threshold`) but persists only the 2x2 sub-block in `TileQualityMetadata` per the existing schema.
|
||||
- `inlier_count` is NOT a field on `TileQualityMetadata`; the orthorectifier uses it for the gate but persists `mre_px` (mean reprojection error) which serves the same downstream consumer (C6 voting status updater).
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
- **Risk: Orthorectification produces low-quality tiles under degenerate pose** — quality gates filter; if still problematic, AZ-308 cache-eviction policy filters at storage time.
|
||||
- **Risk: AZ-303 `put_mid_flight_candidate` API not yet stable** — this task ships against the documented API surface.
|
||||
- **Risk: `cov_norm` (Frobenius norm of 6x6) vs `covariance_2x2` (horizontal sub-block) mismatch confuses readers** — *Mitigation*: docstring on the orthorectifier explicitly distinguishes the two uses; the gate operates on the 6x6 norm; the sub-block is only persisted for downstream voting-status readers.
|
||||
|
||||
## Runtime Completeness
|
||||
|
||||
- **Named capability**: orthorectifier → mid-flight tile candidate emission.
|
||||
- **Production code**: real orthorectification kernel (NumPy or OpenCV), real quality gates, real tile_store.put_mid_flight_candidate call.
|
||||
- **Unacceptable substitutes**: emitting raw nav-frame pixels (not orthorectified); skipping the quality gates (AC-NEW-3 corruption).
|
||||
- **Named capability**: orthorectifier → mid-flight tile candidate emission via `TileStore.write_tile`.
|
||||
- **Production code**: real orthorectification kernel (NumPy or OpenCV), real quality gates, real `tile_store.write_tile` call against the production `PostgresFilesystemStore` in the airborne composition root.
|
||||
- **Unacceptable substitutes**: emitting raw nav-frame pixels (not orthorectified); skipping the quality gates (AC-NEW-3 corruption); inventing a parallel `put_mid_flight_candidate` path when `write_tile` already exists.
|
||||
|
||||
@@ -1,126 +0,0 @@
|
||||
# Replay — E2E replay fixture test (Derkachi 1–2 min clip + tlog) + mode-agnosticism + operator workflow
|
||||
|
||||
**Task**: AZ-404_replay_e2e_fixture
|
||||
**Name**: E2E replay fixture test — Derkachi 1–2 min clip + tlog; AC-3 ≤ 100 m for ≥ 80 % of ticks + mode-agnosticism enforcement + operator-workflow rehearsal
|
||||
**Description**: Implement `tests/e2e/replay/test_derkachi_1min.py` running the `gps-denied-replay` console-script against a 1–2 min Derkachi clip + matching pymavlink `.tlog` and asserting AC-3 of the epic: L2 horizontal distance ≤ 100 m for ≥ 80 % of ticks (matches AC-1.3 cumulative-drift bound). Per ADR-011 the test runs against the **single airborne image** in replay mode — there is no separate replay-cli image to verify. Also asserts:
|
||||
|
||||
- AC-1 (CLI exits 0; JSONL line count within ±5 % of `GLOBAL_POSITION_INT` tlog count);
|
||||
- AC-2 (each line is valid JSON matching `EstimatorOutput` schema);
|
||||
- AC-4 — **revised per ADR-011** — mode-agnosticism of the C1–C7 + C13 components + byte-equality of C8 outbound encoders between live and replay (the v1.0.0 SBOM-diff check is replaced by these two AST/byte assertions);
|
||||
- AC-5 (determinism: same input → same output within ≤ 1e-6 float drift in position fields, run twice and diff);
|
||||
- AC-6 (`--pace realtime` runs in 60 ± 5 s; `--pace asap` in ≤ 30 s on Tier-1 hardware);
|
||||
- AC-9 (operator pre-flight workflow rehearsal: the test setup runs the operator's C10/C11/C12 pre-flight flow against a mock satellite-provider before invoking the replay CLI, demonstrating that the operator workflow is identical between live and replay modes).
|
||||
|
||||
Test fixture: re-uses the existing Derkachi corpus (`_docs/00_problem/input_data/flight_derkachi/`) — clip a 60–120 s segment + matching tlog window. Test gated by `RUN_REPLAY_E2E=1` env var in CI (Tier-1 capable; not run on every PR by default per the project's existing E2E gating pattern).
|
||||
|
||||
**Complexity**: 5 points (unchanged from v1.0.0 — the test surface is the same; AC-4 is reworded but no smaller; AC-9 is added, AC-8 removed).
|
||||
**Dependencies**: AZ-402 (CLI entrypoint); AZ-401 (compose_root replay branch); AZ-405 (`ReplayInputAdapter` + auto-sync inside replay_input/); the Derkachi fixture (`_docs/00_problem/input_data/flight_derkachi/`); the airborne Docker image (the same image the live binary ships in — no replay-specific image; ADR-011); AZ-263, AZ-269, AZ-266, AZ-272, AZ-273.
|
||||
**Component**: replay-tests (epic AZ-265 / E-DEMO-REPLAY) — test at `tests/e2e/replay/`.
|
||||
**Tracker**: AZ-404
|
||||
**Epic**: AZ-265 (E-DEMO-REPLAY)
|
||||
|
||||
### Document Dependencies
|
||||
|
||||
- `_docs/02_document/contracts/replay/replay_protocol.md` (v2.0.0) — Invariants 1, 5, 7, 10, 12 (mode-agnosticism + encoder byte-equality + JSONL one-line-per-emit + determinism + real C6 cache in replay).
|
||||
- `_docs/02_document/architecture.md` — **ADR-011** (replay-as-configuration; the design-defining decision that AC-4 enforces).
|
||||
- `_docs/02_document/components/07_c5_state/description.md` — `EstimatorOutput` schema.
|
||||
- `_docs/00_problem/input_data/flight_derkachi/README.md` — fixture documentation.
|
||||
- `_docs/00_problem/input_data/expected_results/position_accuracy.csv` — ground-truth GPS for the AC-3 assertion.
|
||||
|
||||
## Problem
|
||||
|
||||
Without this task, AC-3 (the epic's primary acceptance gate — demo confidence equals field test confidence on the same footage) is unverified. AC-5 (determinism) and AC-6 (pace timing) are similarly unverified at the system level. Under ADR-011, AC-4 (mode-agnosticism + byte-equality of C8 encoders) and AC-9 (operator workflow rehearsal) are now the structural guarantees that replace the v1.0.0 SBOM diff — without this task, the airborne and replay code paths can drift silently and nothing in CI catches it.
|
||||
|
||||
## Outcome
|
||||
|
||||
- `tests/e2e/replay/conftest.py`:
|
||||
- Fixture `derkachi_replay_inputs` returning `(video_path, tlog_path, calib_path, ground_truth_csv)`.
|
||||
- Fixture `operator_pre_flight_setup` (NEW per AC-9): runs the operator C12 pre-flight flow against a `mock-suite-sat-service` fixture (per ADR-007) — plan route → download tiles → build C10 manifest+engines+descriptor index → assert the cache content hash matches the expected fixture. The fixture yields the populated cache directory + the manifest path.
|
||||
- Fixture `replay_runner` invoking the CLI via `subprocess.run(["gps-denied-replay", ...])` (or equivalent) against the populated cache and returning the captured stdout/stderr + exit code + parsed JSONL output.
|
||||
- `tests/e2e/replay/test_derkachi_1min.py`:
|
||||
- `test_ac1_exits_0_jsonl_count_match`.
|
||||
- `test_ac2_jsonl_schema_match`.
|
||||
- `test_ac3_within_100m_80pct_of_ticks`.
|
||||
- `test_ac4_mode_agnosticism_ast_scan` (NEW per ADR-011): AST scan asserts no `components/**/*.py` file contains `if config.mode` / `if mode == "replay"` / `is_replay` style branches. The scan is part of this E2E test for centralized ownership of the invariant; can be hoisted to a standalone lint later if useful.
|
||||
- `test_ac4_encoder_byte_equality` (NEW per ADR-011): construct two identical `EstimatorOutput` instances; pass one through `compose_root(config_live).fc_adapter.emit_external_position(out)` (with `SerialMavlinkTransport` replaced by a `CapturingMavlinkTransport` test fixture); pass the other through `compose_root(config_replay).fc_adapter.emit_external_position(out)` (with `NoopMavlinkTransport` replaced by the same `CapturingMavlinkTransport`); assert the captured byte streams are byte-identical (replay protocol Invariant 5).
|
||||
- `test_ac5_determinism_two_runs_diff`.
|
||||
- `test_ac6_pace_realtime_60s_within_5pct`.
|
||||
- `test_ac6_pace_asap_under_30s`.
|
||||
- `test_ac9_operator_workflow` (NEW per ADR-011): use the `operator_pre_flight_setup` fixture; assert the cache directory's content hash matches the expected fixture hash; then invoke `replay_runner` against the populated cache; assert AC-3 passes. This is the integration proof that the operator workflow is identical between live and replay.
|
||||
- Helper `tests/e2e/replay/_helpers.py`:
|
||||
- JSONL parser → list of `EstimatorOutput`.
|
||||
- L2 horizontal-distance computation (WGS84-aware; uses `WgsConverter` AZ-279 inside the test for ground-truth comparison).
|
||||
- Match-percentage computation against ground-truth GPS.
|
||||
- `CapturingMavlinkTransport` test fixture (used by `test_ac4_encoder_byte_equality`).
|
||||
- CI gating: tests marked `@pytest.mark.skipif(not os.getenv("RUN_REPLAY_E2E"), reason="...")` per the project's E2E pattern.
|
||||
- Documentation: `tests/e2e/replay/README.md` describes how to run locally + which env var enables in CI + the operator-workflow rehearsal fixture.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
- All 8 test methods (AC-1, AC-2, AC-3, AC-4 mode-agnosticism, AC-4 byte-equality, AC-5, AC-6 realtime, AC-6 asap, AC-9 operator workflow).
|
||||
- Helper functions for JSONL parsing + ground-truth comparison + `CapturingMavlinkTransport`.
|
||||
- Conftest fixtures incl. `operator_pre_flight_setup`.
|
||||
- README.
|
||||
|
||||
### Excluded
|
||||
- AC-7 / AC-8 auto-sync detection unit tests — owned by AZ-405 (the E2E test uses the auto-sync via the CLI, but unit-level positive/ambiguous/hand-launch cases live with AZ-405).
|
||||
- Test against a separate replay-cli Docker image — **dropped per ADR-011**; the test runs against the airborne image only.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: test_ac1_exits_0_jsonl_count_match passes** — runs the CLI; exit code is 0; JSONL line count is within ±5 % of the tlog's `GLOBAL_POSITION_INT` count.
|
||||
|
||||
**AC-2: test_ac2_jsonl_schema_match passes** — every JSONL line is a valid JSON object with all `EstimatorOutput` schema fields present + correct types.
|
||||
|
||||
**AC-3: test_ac3_within_100m_80pct_of_ticks passes** — for the Derkachi fixture with known ground-truth GPS, ≥ 80 % of emitted `EstimatorOutput` records have L2 horizontal distance ≤ 100 m from ground truth.
|
||||
|
||||
**AC-4a: test_ac4_mode_agnosticism_ast_scan passes** — AST scan over `src/gps_denied_onboard/components/**/*.py` asserts no file contains an `if config.mode` / `if mode == "replay"` / `if self._replay_mode` / `is_replay` style branch. Replay-mode logic is structurally confined to the composition root + the replay strategies + the `replay_input/` coordinator.
|
||||
|
||||
**AC-4b: test_ac4_encoder_byte_equality passes** — for a known `EstimatorOutput`, the C8 outbound encoder byte stream is byte-identical between `compose_root(config_live)` and `compose_root(config_replay)` (verified via `CapturingMavlinkTransport`). The MAVLink 2.0 signing handshake runs in both modes; the dummy signing key in replay produces a byte-equivalent encoded output.
|
||||
|
||||
**AC-5: test_ac5_determinism_two_runs_diff passes** — run the CLI twice with identical args; load both JSONL outputs; assert position fields differ by ≤ 1e-6 float (replay protocol Invariant 10).
|
||||
|
||||
**AC-6a: test_ac6_pace_realtime_60s_within_5pct passes** — run with `--pace realtime` on a 60 s clip; assert wall-clock duration is 60 s ± 3 s.
|
||||
|
||||
**AC-6b: test_ac6_pace_asap_under_30s passes** — run with `--pace asap` on the same 60 s clip; assert wall-clock duration ≤ 30 s on Tier-1 hardware.
|
||||
|
||||
**AC-7: All tests skip cleanly without RUN_REPLAY_E2E** — when the env var is unset, `pytest tests/e2e/replay/` reports all 8 tests as SKIPPED, not FAILED.
|
||||
|
||||
**AC-8: test_ac9_operator_workflow passes** — the `operator_pre_flight_setup` fixture runs the operator C12 pre-flight flow against a mock satellite-provider; the resulting cache directory's content hash matches the expected fixture; the replay CLI then runs against the populated cache and AC-3 passes. Demonstrates replay protocol Invariant 12 (real C6 cache in replay) + epic AC-9 (operator workflow identity).
|
||||
|
||||
**AC-9: Helper L2 computation correct** — unit-level test of the WGS84 L2 helper against hand-computed expected distance for a known coord pair.
|
||||
|
||||
**AC-10: README accuracy** — `tests/e2e/replay/README.md` documents the env var, the fixture location, the expected runtime per pace, the operator-workflow rehearsal fixture, and the failure-mode cookbook (e.g., "if AC-3 fails, regenerate ground-truth via X").
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
- E2E suite runtime ≤ 6 min on Tier-1 hardware (one operator pre-flight setup + one realtime run + one asap run + two determinism asap runs + AC-4 byte-equality + AST scan; the operator-workflow setup adds ~30 s vs. v1.0.0).
|
||||
- E2E memory ≤ 4 GB resident (epic NFT).
|
||||
|
||||
## Constraints
|
||||
|
||||
- Re-use the Derkachi fixture (`_docs/00_problem/input_data/flight_derkachi/`); do NOT introduce new fixture data unless explicitly missing.
|
||||
- Re-use the `mock-suite-sat-service` test fixture (per ADR-007) for the operator pre-flight rehearsal.
|
||||
- pytest is the test runner.
|
||||
- Tier-1 hardware assumed (Jetson AGX Orin or equivalent x86 with CUDA per the project's CI matrix).
|
||||
- The 1–2 min clip is a sub-segment of the existing Derkachi flight; the segment range is documented in `tests/e2e/replay/README.md`.
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
- **Risk: AC-3 flake under non-deterministic ML inference** — *Mitigation*: AC-5 (determinism) covers the two-runs-equal case; AC-3 is the offline-replay-quality check; if the system is non-deterministic enough to flake AC-3, that's a deeper bug worth surfacing.
|
||||
- **Risk: Derkachi fixture clip not yet trimmed** — *Mitigation*: this task includes producing the trimmed clip + tlog window as part of the fixture; the conftest fixture file holds the trim definition (start/end timestamps).
|
||||
- **Risk: AC-6 realtime timing flakes on shared CI runners** — *Mitigation*: ± 3 s tolerance is generous; if flakes persist, the tolerance widens to ± 5 s in a follow-up.
|
||||
- **Risk (new per ADR-011): mode-agnosticism AST scan false-positives** — *Mitigation*: the scan whitelist is owned by this test; legitimate uses of `config.mode` inside `runtime_root/*` are NOT scanned (only `components/**/*.py`); the test fails with the offending file path + line so the author can move the branch into `runtime_root` or into a replay strategy.
|
||||
- **Risk (new per ADR-011): encoder byte-equality fails because the MAVLink signing nonce / counter differs between live and replay** — *Mitigation*: the test uses a `DeterministicSigningKey` fixture that seeds the per-flight nonce / counter to a known value; both `compose_root(config_live)` and `compose_root(config_replay)` use this seeded key. If the byte streams still differ after the deterministic-seeding fix, that is a genuine drift between live and replay encoders and is a P0 bug.
|
||||
|
||||
## Runtime Completeness
|
||||
|
||||
- **Named capability**: end-to-end replay regression test against the Derkachi fixture + mode-agnosticism enforcement + operator-workflow rehearsal.
|
||||
- **Production code**: real CLI invocation, real ground-truth comparison, real determinism diff, real AST scan, real encoder byte-stream capture, real operator C12 pre-flight run.
|
||||
- **Allowed external stubs**: `mock-suite-sat-service` (per ADR-007) for the operator pre-flight rehearsal only; no other stubs — this is the integration-fidelity test.
|
||||
- **Unacceptable substitutes**: an in-process pytest harness that bypasses the CLI subprocess (defeats AC-1 — the deliverable is the console-script entrypoint); a separate replay-cli Docker image test (defeats ADR-011 — there is only one image).
|
||||
|
||||
## Contract
|
||||
|
||||
Verifies `_docs/02_document/contracts/replay/replay_protocol.md` (v2.0.0) — Invariants 1, 5, 7, 10, 12; epic ACs 1, 2, 3, 4 (mode-agnosticism + byte-equality), 5, 6, 9.
|
||||
Reference in New Issue
Block a user