mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 23:01:13 +00:00
Decompose Step 6 snapshot: 140 task specs + contract docs

Closes out greenfield Step 6 (Decompose) for all 14 components (C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446 plus the _dependencies_table.md and component contract documents. State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,103 @@
# Replay — E2E replay fixture test (Derkachi 1–2 min clip + tlog)

**Task**: AZ-404_replay_e2e_fixture
**Name**: E2E replay fixture test — Derkachi 1–2 min clip + tlog; AC-3 ≤ 100 m for ≥ 80 % of ticks
**Description**: Implement `tests/e2e/replay/test_derkachi_1min.py`, which runs the `gps-denied-replay` CLI against a 1–2 min Derkachi clip plus the matching pymavlink `.tlog` and asserts AC-3 of the epic: L2 horizontal distance ≤ 100 m for ≥ 80 % of ticks (matches the AC-1.3 cumulative-drift bound). The test also asserts AC-1 (CLI exits 0; JSONL line count within ±5 % of the tlog's `GLOBAL_POSITION_INT` count), AC-2 (each line is valid JSON matching the `EstimatorOutput` schema), AC-5 (determinism: run twice on the same input and diff; position fields agree within 1e-6), and AC-6 (`--pace realtime` runs in 60 ± 3 s, i.e. ±5 %; `--pace asap` runs in ≤ 30 s on Tier-1 hardware). Test fixture: reuses the existing Derkachi corpus (`_docs/00_problem/input_data/flight_derkachi/`), clipped to a 60–120 s segment with the matching tlog window. The test is gated by the `RUN_REPLAY_E2E=1` env var in CI (Tier-1 capable; not run on every PR by default, per the project's existing E2E gating pattern).
**Complexity**: 5 points
**Dependencies**: AZ-402 (CLI entrypoint); AZ-403 (Docker image used by E2E in CI); AZ-401 (composition root); the Derkachi fixture (`_docs/00_problem/input_data/flight_derkachi/`); AZ-263, AZ-269, AZ-266, AZ-272, AZ-273
**Component**: replay-tests (epic AZ-265 / E-DEMO-REPLAY) — test at `tests/e2e/replay/`
**Tracker**: AZ-404
**Epic**: AZ-265 (E-DEMO-REPLAY)

### Document Dependencies

- `_docs/02_document/contracts/replay/replay_protocol.md` — Invariants 7, 10 (determinism).
- `_docs/02_document/components/07_c5_state/description.md` — `EstimatorOutput` schema.
- `_docs/00_problem/input_data/flight_derkachi/README.md` — fixture documentation.
- `_docs/00_problem/input_data/expected_results/position_accuracy.csv` — ground-truth GPS for the AC-3 assertion.

## Problem

Without this task, AC-3 (the epic's primary acceptance gate — demo confidence equals field test confidence on the same footage) is unverified. AC-5 (determinism) and AC-6 (pace timing) are similarly unverified at the system level.

## Outcome

- `tests/e2e/replay/conftest.py`:
  - Fixture `derkachi_replay_inputs` returning `(video_path, tlog_path, calib_path, ground_truth_csv)`.
  - Fixture `replay_runner` invoking the CLI via `subprocess.run(["gps-denied-replay", ...])` (or equivalent) and returning the captured stdout/stderr + exit code + parsed JSONL output.
- `tests/e2e/replay/test_derkachi_1min.py`:
  - `test_ac1_exits_0_jsonl_count_match`.
  - `test_ac2_jsonl_schema_match`.
  - `test_ac3_within_100m_80pct_of_ticks`.
  - `test_ac5_determinism_two_runs_diff`.
  - `test_ac6_pace_realtime_60s_within_5pct`.
  - `test_ac6_pace_asap_under_30s`.
- Helper `tests/e2e/replay/_helpers.py`:
  - JSONL parser → list of `EstimatorOutput`.
  - L2 horizontal-distance computation (WGS84-aware; uses `WgsConverter` AZ-279 inside the test for ground-truth comparison).
  - Match-percentage computation against ground-truth GPS.
- CI gating: tests marked `@pytest.mark.skipif(not os.getenv("RUN_REPLAY_E2E"), reason="...")` per the project's E2E pattern.
- Documentation: `tests/e2e/replay/README.md` describes how to run locally + which env var enables in CI.
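
The `replay_runner` behaviour listed above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `ReplayResult`, `parse_jsonl`, and `run_replay` are hypothetical names, and the sketch assumes the CLI writes its JSONL to stdout (the spec does not say whether output goes to stdout or to a file).

```python
import json
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class ReplayResult:
    """Captured outcome of one CLI run (hypothetical container, not from the spec)."""
    returncode: int
    stdout: str
    stderr: str
    records: list = field(default_factory=list)


def parse_jsonl(text: str) -> list:
    """Parse one JSON value per non-empty line; raises ValueError on bad JSON."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def run_replay(cmd: list, timeout_s: float = 300.0) -> ReplayResult:
    """Run the command as a real subprocess and parse its stdout as JSONL.

    Assumption: JSONL arrives on stdout. If the CLI writes to a file instead,
    read and parse that file here.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    records = parse_jsonl(proc.stdout) if proc.returncode == 0 else []
    return ReplayResult(proc.returncode, proc.stdout, proc.stderr, records)


if __name__ == "__main__":
    # Stand-in command so the sketch runs without the real CLI installed.
    demo = run_replay([sys.executable, "-c", "print('{\"tick\": 1}')"])
    print(demo.returncode, demo.records)
```

In `conftest.py`, a `replay_runner` fixture could wrap `run_replay` with the real `["gps-denied-replay", ...]` argument list. Keeping the subprocess boundary intact matters here because the deliverable under test is the CLI itself.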
## Scope

### Included

- All 6 test methods, covering epic AC-1, AC-2, AC-3, AC-5, and AC-6 (AC-6 has two: realtime and asap). AC-7 / AC-8 (auto-sync) are owned by AZ-405, and AC-4 is owned by the SBOM diff in AZ-403.
- Helper functions for JSONL parsing + ground-truth comparison.
- Conftest fixtures.
- README.

### Excluded

- AC-7 / AC-8 auto-sync tests — owned by AZ-405 (auto-sync task).
- AC-4 SBOM-diff verification — owned by AZ-403 (Dockerfile + CI task).

## Acceptance Criteria

**AC-1: test_ac1_exits_0_jsonl_count_match passes** — runs the CLI; exit code is 0; JSONL line count is within ±5 % of the tlog's `GLOBAL_POSITION_INT` count.
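
The ±5 % count comparison in AC-1 might look like the sketch below. Both function names are hypothetical. The tlog counter assumes pymavlink's `mavutil.mavlink_connection` opens the `.tlog` file and that `recv_match` returns `None` at end of file; it is imported lazily so the tolerance check works without pymavlink installed.

```python
def count_within_tolerance(jsonl_count: int, tlog_count: int, rel_tol: float = 0.05) -> bool:
    """True when jsonl_count lies within ±rel_tol of tlog_count."""
    if tlog_count <= 0:
        return False  # an empty tlog window cannot validate anything
    return abs(jsonl_count - tlog_count) <= rel_tol * tlog_count


def count_global_position_int(tlog_path: str) -> int:
    """Count GLOBAL_POSITION_INT messages in a tlog using pymavlink."""
    from pymavlink import mavutil  # lazy import; only needed for real tlogs

    conn = mavutil.mavlink_connection(tlog_path)
    count = 0
    # Non-blocking reads on a log file are assumed to yield None once exhausted.
    while conn.recv_match(type="GLOBAL_POSITION_INT") is not None:
        count += 1
    return count
```

Treating a zero tlog count as a failure is a design choice in this sketch: it stops an empty or mis-trimmed tlog window from passing the test vacuously.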

**AC-2: test_ac2_jsonl_schema_match passes** — every JSONL line is a valid JSON object with all `EstimatorOutput` schema fields present + correct types.
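
A per-line check for AC-2 could be sketched as below. The field map is purely illustrative: `t_us`, `lat_deg`, and `lon_deg` are hypothetical placeholders, and the real field list has to come from the `EstimatorOutput` schema in the C5 state component description.

```python
import json

# Hypothetical field map for illustration only; replace with the real
# EstimatorOutput schema fields and types.
ASSUMED_FIELDS = {"t_us": int, "lat_deg": float, "lon_deg": float}


def schema_errors(line: str, fields: dict = ASSUMED_FIELDS) -> list:
    """Return human-readable problems for one JSONL line (empty list means valid)."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]
    errors = []
    for name, expected in fields.items():
        if name not in obj:
            errors.append(f"missing field: {name}")
            continue
        value = obj[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        ok = isinstance(value, expected) and not isinstance(value, bool)
        # Accept JSON integers where a float is expected (e.g. 50 vs 50.0).
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            ok = True
        if not ok:
            errors.append(f"wrong type for {name}: {type(value).__name__}")
    return errors
```

Returning a list of messages rather than a boolean lets the test report every offending line and field in a single failure.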

**AC-3: test_ac3_within_100m_80pct_of_ticks passes** — for the Derkachi fixture with known ground-truth GPS, ≥ 80 % of emitted `EstimatorOutput` records have L2 horizontal distance ≤ 100 m from ground truth.
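
The AC-3 computation can be illustrated with the sketch below. It is not the project's helper: the spec calls for `WgsConverter` (AZ-279), whose API is not shown here, so this stand-in uses the haversine formula on a spherical Earth instead. The function names are hypothetical, and the sketch assumes estimates and ground truth are already paired by timestamp.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation, not WGS84


def horizontal_distance_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle (haversine) distance in metres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(estimates, truths, limit_m: float = 100.0) -> float:
    """Fraction of paired (lat, lon) samples whose distance is <= limit_m."""
    if not estimates or len(estimates) != len(truths):
        raise ValueError("need equal-length, non-empty, time-aligned sequences")
    hits = sum(
        horizontal_distance_m(e[0], e[1], t[0], t[1]) <= limit_m
        for e, t in zip(estimates, truths)
    )
    return hits / len(estimates)
```

The test would then assert `fraction_within(estimates, truths) >= 0.8`. A hand-computable case in the spirit of AC-9: two points on the same meridian 0.001° apart are R·Δφ ≈ 111.195 m apart under this spherical model. A WGS84-aware helper will give a slightly different figure, so the expected value in the real unit test should come from the ellipsoidal computation.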

**AC-4: test_ac5_determinism_two_runs_diff passes** (covers epic AC-5) — run the CLI twice with identical args; load both JSONL outputs; assert that position fields differ by at most 1e-6 (Invariant 10).
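
The two-run diff could be sketched as below. The position field names (`lat_deg`, `lon_deg`, `alt_m`) are hypothetical placeholders for whatever the `EstimatorOutput` schema defines, and the sketch assumes both runs emit records in the same order.

```python
import math

# Hypothetical position field names; substitute the ones from the real schema.
POSITION_FIELDS = ("lat_deg", "lon_deg", "alt_m")


def position_drift_violations(run_a, run_b, fields=POSITION_FIELDS, tol=1e-6):
    """Return (index, field, a, b) tuples where two runs differ by more than tol."""
    if len(run_a) != len(run_b):
        raise ValueError(f"record count differs: {len(run_a)} vs {len(run_b)}")
    violations = []
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        for name in fields:
            # Absolute tolerance only, matching the spec's fixed 1e-6 bound.
            if not math.isclose(a[name], b[name], rel_tol=0.0, abs_tol=tol):
                violations.append((i, name, a[name], b[name]))
    return violations
```

The test would assert that the returned list is empty. Collecting every violation, rather than stopping at the first, makes it easier to see whether drift is isolated to one tick or accumulates over the run.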

**AC-5: test_ac6_pace_realtime_60s_within_5pct passes** (covers epic AC-6) — run with `--pace realtime` on a 60 s clip; assert wall-clock duration is 60 s ± 3 s (±5 %).

**AC-6: test_ac6_pace_asap_under_30s passes** (covers epic AC-6) — run with `--pace asap` on the same 60 s clip; assert wall-clock duration ≤ 30 s on Tier-1 hardware.
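
The two pace checks reduce to timing a subprocess and comparing against a bound. A minimal sketch, with hypothetical helper names, using `time.monotonic` so that system clock adjustments do not skew the measurement:

```python
import subprocess
import time


def timed_run(cmd: list) -> tuple:
    """Run cmd to completion and return (exit_code, wall_clock_seconds)."""
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, time.monotonic() - start


def realtime_ok(elapsed_s: float, clip_s: float = 60.0, tol_s: float = 3.0) -> bool:
    """Check --pace realtime: wall clock within clip_s ± tol_s."""
    return abs(elapsed_s - clip_s) <= tol_s


def asap_ok(elapsed_s: float, limit_s: float = 30.0) -> bool:
    """Check --pace asap: wall clock at or under limit_s."""
    return elapsed_s <= limit_s
```

Note that this measures the whole process lifetime, so any startup cost inside the CLI counts toward the realtime window. Whether the spec intends that is not stated, so treat it as an assumption of the sketch.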

**AC-7: All tests skip cleanly without RUN_REPLAY_E2E** — when the env var is unset, `pytest tests/e2e/replay/` reports all 6 tests as SKIPPED, not FAILED.
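
One way to express the gate is a small predicate that the skip marker consumes. The sketch below assumes the convention that only the exact value `1` enables the suite; the spec mentions both `RUN_REPLAY_E2E=1` and a plain truthiness check, so this stricter reading is an assumption. `replay_e2e_enabled` and `SKIP_REASON` are hypothetical names.

```python
import os

SKIP_REASON = "set RUN_REPLAY_E2E=1 to run the replay E2E suite"


def replay_e2e_enabled(environ=None) -> bool:
    """True only when RUN_REPLAY_E2E is exactly "1" (assumed convention)."""
    env = os.environ if environ is None else environ
    return env.get("RUN_REPLAY_E2E", "") == "1"
```

In the test module this could feed a module-level `pytestmark = pytest.mark.skipif(not replay_e2e_enabled(), reason=SKIP_REASON)`, which marks every test in the file at once. With a bare `not os.getenv("RUN_REPLAY_E2E")` condition, setting `RUN_REPLAY_E2E=0` would still enable the suite, because any non-empty string is truthy.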

**AC-8: Tests run via Docker image** — also verify the CLI works via `docker run --rm gps-denied-replay-cli gps-denied-replay ...` for at least one of the AC tests (AC-1) — proves the image entrypoint is functional.

**AC-9: Helper L2 computation correct** — unit-level test of the WGS84 L2 helper against hand-computed expected distance for a known coord pair.

**AC-10: README accuracy** — `tests/e2e/replay/README.md` documents the env var, the fixture location, the expected runtime per pace, and the failure-mode cookbook (e.g., "if AC-3 fails, regenerate ground-truth via X").

## Non-Functional Requirements

- E2E suite runtime ≤ 5 min on Tier-1 hardware (one realtime run + one asap run + two determinism asap runs + two more for AC-1/AC-2).
- E2E memory ≤ 4 GB resident (epic NFT).
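
The 5-minute budget can be sanity-checked from the per-run limits stated in the acceptance criteria. The arithmetic below assumes the run breakdown in the first bullet, takes each run at its allowed maximum, and ignores pytest startup, fixture setup, and the Docker-based run.

```python
REALTIME_MAX_S = 60 + 3   # one --pace realtime run at the upper tolerance
ASAP_MAX_S = 30           # per-run ceiling for --pace asap
ASAP_RUNS = 1 + 2 + 2     # asap pace test + two determinism runs + AC-1/AC-2 runs

worst_case_s = REALTIME_MAX_S + ASAP_RUNS * ASAP_MAX_S
headroom_s = 5 * 60 - worst_case_s
print(worst_case_s, headroom_s)  # 213 87
```

Under these assumptions the suite fits with roughly a minute and a half to spare, which is the margin available for overhead not counted here.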
## Constraints

- Re-use the Derkachi fixture (`_docs/00_problem/input_data/flight_derkachi/`); do NOT introduce new fixture data unless explicitly missing.
- pytest is the test runner.
- Tier-1 hardware assumed (Jetson AGX Orin or equivalent x86 with CUDA per the project's CI matrix).
- The 1–2 min clip is a sub-segment of the existing Derkachi flight; the segment range is documented in `tests/e2e/replay/README.md`.

## Risks & Mitigation

- **Risk: AC-3 flake under non-deterministic ML inference** — *Mitigation*: AC-5 (determinism) covers the two-runs-equal case; AC-3 is the offline-replay-quality check; if the system is non-deterministic enough to flake AC-3, that's a deeper bug worth surfacing.
- **Risk: Derkachi fixture clip not yet trimmed** — *Mitigation*: this task includes producing the trimmed clip + tlog window as part of the fixture; the conftest fixture file holds the trim definition (start/end timestamps).
- **Risk: AC-6 realtime timing flakes on shared CI runners** — *Mitigation*: ± 3 s tolerance is generous; if flakes persist, the tolerance widens to ± 5 s in a follow-up.
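
The trim definition mentioned in the second mitigation could be held in a small validated value object. This is only a sketch: `TrimWindow` and `DERKACHI_TRIM` are hypothetical names, and the timestamps are placeholders, since the real segment range is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrimWindow:
    """Clip boundaries in seconds from the start of the source flight video."""
    start_s: float
    end_s: float

    def __post_init__(self):
        duration = self.end_s - self.start_s
        # Enforce the 1-2 min clip length required by the task.
        if self.start_s < 0 or not 60.0 <= duration <= 120.0:
            raise ValueError(f"trim window must start at >= 0 and span 60-120 s, got {duration} s")

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s

    def contains(self, t_s: float) -> bool:
        """True when a timestamp falls inside the window (inclusive)."""
        return self.start_s <= t_s <= self.end_s


# Placeholder values only; the real segment range belongs in the README.
DERKACHI_TRIM = TrimWindow(start_s=0.0, end_s=60.0)
```

Validating the duration at construction time means a mistyped trim range fails at import, before any CLI run is attempted. The same `contains` check could select the matching tlog messages, assuming their timestamps are first converted to the video's time base.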
## Runtime Completeness

- **Named capability**: end-to-end replay regression test against the Derkachi fixture.
- **Production code**: real CLI invocation, real ground-truth comparison, real determinism diff.
- **Allowed external stubs**: NONE — this is the integration-fidelity test.
- **Unacceptable substitutes**: an in-process pytest harness that bypasses the CLI subprocess (defeats AC-1 + AC-8 — the deliverable is the CLI binary).

## Contract

Verifies `_docs/02_document/contracts/replay/replay_protocol.md` — Invariants 7 + 10; epic ACs 1, 2, 3, 5, 6.