Implement the AC-8.4 and AC-NEW-6 blackbox scenarios for mid-flight tile generation, dedup, landing-time upload, and freshness gating.

Helpers:
- runner/helpers/mid_flight_tile_evaluator.py — pure-logic evaluators for tile generation rate, Mode B Fact #105 schema check, footprint + GSD dedup (via geo.distance_m), upload-audit reconciliation, and the AC-5/AC-6 capture_utc + freshness-gate checks.
- runner/helpers/mock_suite_sat_audit.py — httpx wrapper for the mock-suite-sat-service /tiles/audit endpoint with strict response-shape validation.

Scenarios:
- tests/positive/test_ft_p_17_mid_flight_tiles.py
- tests/negative/test_ft_n_06_mid_flight_freshness.py

Both skip when sitl_replay_ready is false and fail loudly when fixture records are missing (tests-as-gates discipline). 52 new unit tests (41 evaluator + 11 audit client) cover every helper branch.

Review: PASS_WITH_WARNINGS (2 Low — duplicate haversine carry-over, upstream production dependency surface).

Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 83 — AZ-422 (FT-P-17 + FT-N-06 mid-flight tile generation + freshness)
Tracker: AZ-422
Tasks: 1 task / 3 complexity points
Date: 2026-05-17
Verdict: PASS_WITH_WARNINGS
Review: _docs/03_implementation/reviews/batch_83_review.md
Scope
- FT-P-17 (positive, AC-8.4): mid-flight orthorectified tile generation, per-tile quality metadata, dedup, landing-time upload to mock-suite-sat-service.
- FT-N-06 (negative, AC-NEW-6): per-tile capture_utc within ±60 s of generation wall-clock; freshness gate must not reject freshly generated tiles as stale.
Both scenarios parameterize across (fc_adapter ∈ {ardupilot, inav}, vio_strategy ∈ {okvis2, klt_ransac, vins_mono}) → 12 collected test cases.
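The case count follows from the Cartesian product of the two fixture axes. The sketch below only reproduces that arithmetic; the real suite drives it through the conftest fc_adapter / vio_strategy fixtures, and the names FC_ADAPTERS, VIO_STRATEGIES, SCENARIOS and expand_grid are hypothetical.

```python
from itertools import product

# Axis values as listed in the scope; the constant names are illustrative only.
FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")
SCENARIOS = ("FT-P-17", "FT-N-06")


def expand_grid(scenarios=SCENARIOS):
    """Return one (scenario, fc_adapter, vio_strategy) tuple per collected case."""
    return [
        (scenario, fc, vio)
        for scenario in scenarios
        for fc, vio in product(FC_ADAPTERS, VIO_STRATEGIES)
    ]


cases = expand_grid()
print(len(cases))  # 12: 2 scenarios x 2 FC adapters x 3 VIO strategies
```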
Files
Created
- e2e/runner/helpers/mid_flight_tile_evaluator.py — pure-logic evaluators for AC-1..AC-6:
  - evaluate_tile_generation_rate (AC-1)
  - evaluate_tile_quality_metadata (AC-2; Mode B Fact #105 schema mirror)
  - evaluate_dedup (AC-3; Vincenty distance via geo.distance_m + GSD-fraction check)
  - evaluate_upload_acks (AC-4)
  - evaluate_capture_date_freshness (AC-5; ISO-8601 parse + monotonic-ms drift)
  - evaluate_freshness_gate (AC-6)
- e2e/runner/helpers/mock_suite_sat_audit.py — thin httpx client for GET /tiles/audit; input-validation, HTTP, and JSON-shape errors are all raised as RuntimeError.
- e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py — FT-P-17 scenario covering AC-1..AC-4 + AC-7.
- e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py — FT-N-06 scenario covering AC-5 + AC-6 + AC-7.
- e2e/_unit_tests/helpers/test_mid_flight_tile_evaluator.py — 41 unit tests covering happy paths + boundary + error cases for every evaluator.
- e2e/_unit_tests/helpers/test_mock_suite_sat_audit.py — 11 unit tests covering happy paths + every error branch with httpx.MockTransport.
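As an illustration of the AC-3 dedup rule (footprint within 1 m AND GSD within 5 %), a minimal sketch follows. It is not the shipped evaluate_dedup: it swaps in a plain haversine distance where the helper uses geo.distance_m, and the Tile fields, the relative-GSD formula, and the function names are all assumptions.

```python
import math
from dataclasses import dataclass
from itertools import combinations

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; stand-in for geo.distance_m


@dataclass(frozen=True)
class Tile:
    # Hypothetical record shape, for illustration only.
    tile_id: str
    lat_deg: float
    lon_deg: float
    gsd_m: float


def haversine_m(a: Tile, b: Tile) -> float:
    """Great-circle distance between two tile centres, in metres."""
    lat1, lat2 = math.radians(a.lat_deg), math.radians(b.lat_deg)
    dlat = lat2 - lat1
    dlon = math.radians(b.lon_deg - a.lon_deg)
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def is_duplicate(a: Tile, b: Tile, footprint_tol_m: float = 1.0, gsd_tol_frac: float = 0.05) -> bool:
    """True when both the footprint and the GSD tolerance are met."""
    if a.gsd_m <= 0 or b.gsd_m <= 0:
        raise ValueError("GSD must be positive")
    gsd_rel_diff = abs(a.gsd_m - b.gsd_m) / max(a.gsd_m, b.gsd_m)
    return haversine_m(a, b) <= footprint_tol_m and gsd_rel_diff <= gsd_tol_frac


def find_duplicate_pairs(tiles):
    """Return the id pairs that a dedup stage should have collapsed."""
    return [(a.tile_id, b.tile_id) for a, b in combinations(tiles, 2) if is_duplicate(a, b)]
```

Under these assumptions, an empty result from find_duplicate_pairs means no surviving pair satisfies both tolerances at once; a pair that is close in position but differs strongly in GSD is not treated as a duplicate.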
Modified
- e2e/_unit_tests/test_directory_layout.py — registered 4 new paths under the AZ-406 layout invariant.
Test Results
$ pytest _unit_tests/helpers/test_mid_flight_tile_evaluator.py \
_unit_tests/helpers/test_mock_suite_sat_audit.py \
_unit_tests/test_directory_layout.py -x
============================= 157 passed in 1.07s ==============================
Scenario collection:
$ pytest tests/positive/test_ft_p_17_mid_flight_tiles.py \
tests/negative/test_ft_n_06_mid_flight_freshness.py --collect-only
collected 12 items (6 per scenario: {ardupilot,inav} × {okvis2,klt_ransac,vins_mono}; 2 scenarios)
(Pre-existing OSError: Read-only file system: '/e2e-results' in pytest_sessionfinish is unrelated NFR-recorder teardown noise; doesn't affect collection or assertion logic.)
AC Verification
| AC | Coverage |
|---|---|
| AC-1 tile rate ≥1 per ~3 s high-quality nav frames | evaluate_tile_generation_rate + scenario assertion + 5 unit tests |
| AC-2 quality metadata (Mode B Fact #105) | evaluate_tile_quality_metadata + scenario assertion + 7 unit tests |
| AC-3 dedup (±1 m footprint AND ±5 % GSD) | evaluate_dedup + scenario assertion + 8 unit tests |
| AC-4 landing upload HTTP 202 for every tile | evaluate_upload_acks + fetch_audit + scenario assertion + 5 unit tests + 11 HTTP unit tests |
| AC-5 \|capture_utc − generated_at\| ≤ 60 s | evaluate_capture_date_freshness + scenario assertion + 8 unit tests |
| AC-6 no tile-load-rejected: stale for fresh tiles | evaluate_freshness_gate + scenario assertion + 7 unit tests |
| AC-7 parameterization | 12 collected variants (6 per scenario) via conftest fc_adapter / vio_strategy fixtures |
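The AC-5 row reduces to a timestamp comparison. The sketch below assumes both values arrive as ISO-8601 strings with an explicit UTC offset; it does not model the monotonic-ms drift handling of the real evaluate_capture_date_freshness, and the function names are hypothetical.

```python
from datetime import datetime, timezone

MAX_DRIFT_S = 60.0  # AC-5 tolerance


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp and normalise it to UTC."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit offset first.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp lacks a UTC offset: {ts!r}")
    return parsed.astimezone(timezone.utc)


def capture_drift_s(capture_utc: str, generated_at: str) -> float:
    """Absolute difference between capture and generation time, in seconds."""
    return abs((parse_utc(capture_utc) - parse_utc(generated_at)).total_seconds())


def is_fresh(capture_utc: str, generated_at: str, max_drift_s: float = MAX_DRIFT_S) -> bool:
    """True when |capture_utc - generated_at| is within the tolerance (inclusive)."""
    return capture_drift_s(capture_utc, generated_at) <= max_drift_s
```

Rejecting naive timestamps rather than guessing a zone keeps a malformed fixture from passing by accident.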
traces_to markers wire scenarios to the traceability matrix:
- FT-P-17: AC-8.4, AC-1, AC-2, AC-3, AC-4, AC-7
- FT-N-06: AC-NEW-6, AC-5, AC-6, AC-7
Code Review
Verdict: PASS_WITH_WARNINGS — 0 Critical, 0 High, 2 Low.
- F1 (carry-over): gcs_telemetry_evaluator.py's private haversine duplicates geo.distance_m. Already surfaced in the batches 79–81 cumulative review; deferred to a dedicated refactor batch.
- F2 (production-dependency surface): both scenarios depend on upstream features (see Production Dependencies below). Tests skip cleanly when fixtures are missing and fail loudly when fixtures exist but records are missing, adhering to the "tests as gates" principle.
Full review: _docs/03_implementation/reviews/batch_83_review.md.
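The skip-versus-fail behaviour noted in F2 can be sketched as a small gate function. This is an assumption-laden illustration: the real scenarios would use pytest's own skip and fail mechanisms, whereas the sketch raises two hypothetical exception types to stay standard-library only, and the "kind" record field is assumed.

```python
class ScenarioSkipped(Exception):
    """Fixture environment is absent, so the scenario cannot run at all."""


class FixtureGateError(AssertionError):
    """Fixture environment exists but the expected records are missing."""


def gate_scenario(sitl_replay_ready: bool, records: list, record_kind: str) -> list:
    """Return the records of the requested kind, or skip / fail loudly."""
    if not sitl_replay_ready:
        # No replay fixture: skipping is legitimate, nothing was exercised.
        raise ScenarioSkipped("sitl_replay_ready is false")
    matching = [r for r in records if r.get("kind") == record_kind]
    if not matching:
        # The replay ran but produced nothing to check: treat as a failure,
        # otherwise an empty fixture would pass silently.
        raise FixtureGateError(f"no {record_kind!r} records in fixture")
    return matching
```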
Production Dependencies
These features must exist for the scenarios to actually run (rather than skip):
1. SUT-side mid-flight-tile-output FDR record kind matching the Mode B Fact #105 schema (TILE_REQUIRED_TOP_LEVEL_FIELDS + TILE_REQUIRED_QUALITY_FIELDS).
2. SUT-side tile-load-rejected FDR record with reason="stale" emitted by the freshness gate.
3. SUT-side simulate_landing() MAVLink command (or equivalent public-input trigger) for landing-event tile upload.
4. Fixture-builder-side Derkachi 5-min replay scenario emitting both record kinds for the parameterized FC × VIO grid.
5. Fixture-builder-side FT_P_17_HIGH_QUALITY_WINDOW_S env var injection (total seconds of high-quality nav frames per AC-2.1a normal-segment criterion).
6. Already exists: mock-suite-sat-service /tiles/audit endpoint (e2e/fixtures/mock-suite-sat/app.py).
7. Already exists: mock_suite_sat_url and sitl_replay_ready pytest fixtures (used by sibling scenarios).
Dependencies 1–5 are tracked against epic E-OBC (Mode B work) and AZ-595 fixture builder — outside the blackbox-test workspace.
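To show how the two SUT-side record kinds feed the AC-6 check, here is a minimal sketch. Only the record kind tile-load-rejected and reason="stale" come from the list above; the "kind" and "tile_id" field names and the function name are assumptions, and the shipped evaluate_freshness_gate may be structured differently.

```python
def stale_rejections_of_fresh_tiles(records, fresh_tile_ids):
    """Return ids of fresh tiles that the freshness gate rejected as stale.

    An empty result means the gate behaved as AC-6 requires.
    """
    fresh = set(fresh_tile_ids)
    return sorted(
        r["tile_id"]
        for r in records
        if r.get("kind") == "tile-load-rejected"
        and r.get("reason") == "stale"
        and r.get("tile_id") in fresh
    )


records = [
    {"kind": "mid-flight-tile-output", "tile_id": "t1"},
    {"kind": "tile-load-rejected", "tile_id": "t2", "reason": "stale"},
    {"kind": "tile-load-rejected", "tile_id": "t9", "reason": "stale"},
]
# t2 was freshly generated, so its stale rejection is a violation;
# t9 is not in the fresh set and is ignored.
print(stale_rejections_of_fresh_tiles(records, ["t1", "t2"]))  # ['t2']
```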
Architecture Compliance
- All new files under e2e/, owned by the Blackbox Tests component per _docs/02_document/module-layout.md.
- No imports from src/gps_denied_onboard (explicit public-boundary discipline note in mid_flight_tile_evaluator.py).
- No new cyclic dependencies.
- httpx and pyproj (via geo) reuse — no new infrastructure libraries introduced.
Sub-step Trace
Phases executed per implement/SKILL.md:
- phase 5 (load-spec) → AZ-422 spec read
- phase 6 (implement-tasks-sequentially) → helper + scenarios + unit tests
- phase 7 (verify-ac-coverage) → 7-AC trace above
- phase 8 (code-review) → batch_83_review.md (PASS_WITH_WARNINGS)
- phase 11 (commit-batch) → next.