Oleksandr Bezdieniezhnykh 5def1a3eb3 [AZ-422] Add FT-P-17 + FT-N-06 mid-flight tile blackbox tests
Implement the AC-8.4 and AC-NEW-6 blackbox scenarios for mid-flight
tile generation, dedup, landing-time upload, and freshness gating.

Helpers:
- runner/helpers/mid_flight_tile_evaluator.py — pure-logic evaluators
  for tile generation rate, Mode B Fact #105 schema check, footprint+
  GSD dedup (via geo.distance_m), upload-audit reconciliation, and
  the AC-5/AC-6 capture_utc + freshness-gate checks.
- runner/helpers/mock_suite_sat_audit.py — httpx wrapper for the
  mock-suite-sat-service /tiles/audit endpoint with strict response-
  shape validation.

Scenarios:
- tests/positive/test_ft_p_17_mid_flight_tiles.py
- tests/negative/test_ft_n_06_mid_flight_freshness.py

Both skip when sitl_replay_ready is false and fail loudly when fixture
records are missing (tests-as-gates discipline). 52 new unit tests
(41 evaluator + 11 audit client) cover every helper branch.

Review: PASS_WITH_WARNINGS (2 Low — duplicate haversine carry-over,
upstream production dependency surface).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 15:28:39 +03:00


Batch 83 — AZ-422 (FT-P-17 + FT-N-06 mid-flight tile generation + freshness)

Tracker: AZ-422
Tasks: 1 task / 3 complexity points
Date: 2026-05-17
Verdict: PASS_WITH_WARNINGS
Review: _docs/03_implementation/reviews/batch_83_review.md

Scope

  • FT-P-17 (positive, AC-8.4): mid-flight orthorectified tile generation, per-tile quality metadata, dedup, landing-time upload to mock-suite-sat-service.
  • FT-N-06 (negative, AC-NEW-6): per-tile capture_utc within ±60 s of generation wall-clock; freshness gate must not reject freshly generated tiles as stale.

Both scenarios are parameterized across fc_adapter ∈ {ardupilot, inav} × vio_strategy ∈ {okvis2, klt_ransac, vins_mono}: 6 variants per scenario, 12 collected test cases in total.
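The 2 × 3 grid can be enumerated as in the sketch below. It assumes the variants are the plain cross product of the two value sets; the names `FC_ADAPTERS`, `VIO_STRATEGIES`, and `variant_ids` are hypothetical, since the real parameterization lives in the conftest fixtures and is not shown in this report.

```python
from itertools import product

# Value sets taken from the report; the constant names are illustrative only.
FC_ADAPTERS = ("ardupilot", "inav")
VIO_STRATEGIES = ("okvis2", "klt_ransac", "vins_mono")


def variant_ids():
    """Return one id per (fc_adapter, vio_strategy) pair, in product order."""
    return [f"{fc}-{vio}" for fc, vio in product(FC_ADAPTERS, VIO_STRATEGIES)]


if __name__ == "__main__":
    ids = variant_ids()
    # 6 variants per scenario; two scenarios give the 12 collected items.
    print(len(ids), ids)
```

With two scenarios sharing the same grid, the collected count is 2 × 6 = 12.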

Files

Created

  • e2e/runner/helpers/mid_flight_tile_evaluator.py — pure-logic evaluators for AC-1..AC-6:
    • evaluate_tile_generation_rate (AC-1)
    • evaluate_tile_quality_metadata (AC-2; Mode B Fact #105 schema mirror)
    • evaluate_dedup (AC-3; Vincenty distance via geo.distance_m + GSD-fraction check)
    • evaluate_upload_acks (AC-4)
    • evaluate_capture_date_freshness (AC-5; ISO-8601 parse + monotonic-ms drift)
    • evaluate_freshness_gate (AC-6)
  • e2e/runner/helpers/mock_suite_sat_audit.py — thin httpx client for GET /tiles/audit; input-validation failures, HTTP errors, and JSON-shape errors are all raised as RuntimeError.
  • e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py — FT-P-17 scenario covering AC-1..AC-4 + AC-7.
  • e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py — FT-N-06 scenario covering AC-5 + AC-6 + AC-7.
  • e2e/_unit_tests/helpers/test_mid_flight_tile_evaluator.py — 41 unit tests covering happy paths + boundary + error cases for every evaluator.
  • e2e/_unit_tests/helpers/test_mock_suite_sat_audit.py — 11 unit tests covering happy paths + every error branch with httpx.MockTransport.
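The AC-3 dedup rule (footprints within 1 m and GSD within 5 %) can be sketched as follows. This is a minimal illustration, not the code in mid_flight_tile_evaluator.py: a spherical haversine stands in for geo.distance_m, the footprint is reduced to a single centre point, the GSD tolerance is taken relative to the larger of the two values, and the `Tile` fields and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; assumption for the stand-in


@dataclass(frozen=True)
class Tile:
    tile_id: str
    lat_deg: float  # footprint centre latitude (hypothetical field)
    lon_deg: float  # footprint centre longitude (hypothetical field)
    gsd_m: float    # ground sample distance in metres per pixel


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; stand-in for geo.distance_m."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_duplicate(a, b, footprint_tol_m=1.0, gsd_tol_fraction=0.05):
    """Two tiles are duplicates only if BOTH tolerances are met."""
    close = haversine_m(a.lat_deg, a.lon_deg, b.lat_deg, b.lon_deg) <= footprint_tol_m
    similar_gsd = abs(a.gsd_m - b.gsd_m) <= gsd_tol_fraction * max(a.gsd_m, b.gsd_m)
    return close and similar_gsd


def find_duplicates(tiles):
    """Return id pairs of tiles that violate the dedup rule."""
    return [(a.tile_id, b.tile_id) for a, b in combinations(tiles, 2) if is_duplicate(a, b)]
```

The pairwise scan is quadratic in the number of tiles, which is acceptable for a short replay; a spatial index would be the natural replacement for larger sets.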

Modified

  • e2e/_unit_tests/test_directory_layout.py — registered 4 new paths under the AZ-406 layout invariant.

Test Results

$ pytest _unit_tests/helpers/test_mid_flight_tile_evaluator.py \
         _unit_tests/helpers/test_mock_suite_sat_audit.py \
         _unit_tests/test_directory_layout.py -x
============================= 157 passed in 1.07s ==============================

Scenario collection:

$ pytest tests/positive/test_ft_p_17_mid_flight_tiles.py \
         tests/negative/test_ft_n_06_mid_flight_freshness.py --collect-only
collected 12 items  (2 scenarios × {ardupilot,inav} × {okvis2,klt_ransac,vins_mono}; 6 per scenario)

(The pre-existing OSError: Read-only file system: '/e2e-results' raised in pytest_sessionfinish is unrelated NFR-recorder teardown noise; it does not affect collection or assertion logic.)

AC Verification

| AC | Coverage |
|---|---|
| AC-1 tile rate ≥1 per ~3 s high-quality nav frames | evaluate_tile_generation_rate + scenario assertion + 5 unit tests |
| AC-2 quality metadata (Mode B Fact #105) | evaluate_tile_quality_metadata + scenario assertion + 7 unit tests |
| AC-3 dedup (±1 m footprint AND ±5 % GSD) | evaluate_dedup + scenario assertion + 8 unit tests |
| AC-4 landing upload HTTP 202 for every tile | evaluate_upload_acks + fetch_audit + scenario assertion + 5 unit tests + 11 HTTP unit tests |
| AC-5 \|capture_utc − generated_at\| ≤ 60 s | evaluate_capture_date_freshness + scenario assertion + 8 unit tests |
| AC-6 no tile-load-rejected: stale for fresh tiles | evaluate_freshness_gate + scenario assertion + 7 unit tests |
| AC-7 parameterization | 12 collected variants (6 per scenario) via conftest fc_adapter / vio_strategy fixtures |
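The AC-5 check, that capture_utc lies within 60 s of the generation time, can be sketched as below. This assumes both values arrive as ISO-8601 strings carrying a UTC offset; the report also mentions a monotonic-ms drift check, which is not modelled here. The function names `parse_utc`, `capture_drift_s`, and `is_fresh` are hypothetical.

```python
from datetime import datetime, timezone

MAX_DRIFT_S = 60.0  # tolerance from AC-5


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp and normalise it to UTC."""
    # datetime.fromisoformat rejects a trailing 'Z' before Python 3.11.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # A naive timestamp cannot be compared safely, so reject it.
        raise ValueError(f"timestamp lacks a UTC offset: {ts!r}")
    return dt.astimezone(timezone.utc)


def capture_drift_s(capture_utc: str, generated_at: str) -> float:
    """Absolute difference between capture and generation time, in seconds."""
    return abs((parse_utc(capture_utc) - parse_utc(generated_at)).total_seconds())


def is_fresh(capture_utc: str, generated_at: str, max_drift_s: float = MAX_DRIFT_S) -> bool:
    """True when the drift is within the tolerance (boundary inclusive)."""
    return capture_drift_s(capture_utc, generated_at) <= max_drift_s
```

Rejecting naive timestamps outright, rather than assuming UTC, keeps a missing offset from silently passing the gate.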

traces_to markers wire scenarios to the traceability matrix:

  • FT-P-17: AC-8.4,AC-1,AC-2,AC-3,AC-4,AC-7
  • FT-N-06: AC-NEW-6,AC-5,AC-6,AC-7

Code Review

Verdict: PASS_WITH_WARNINGS — 0 Critical, 0 High, 2 Low.

  • F1 (carry-over): gcs_telemetry_evaluator.py's private haversine duplicates geo.distance_m. Already surfaced in the batches 79–81 cumulative review; deferred to a dedicated refactor batch.
  • F2 (production-dependency surface): both scenarios depend on upstream features (see Production Dependencies below). Tests skip cleanly when fixtures are missing and fail loudly when fixtures exist but records are missing, in line with the "tests as gates" principle.
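The skip-versus-fail behaviour described in F2 can be sketched as a small gate. This is an illustration under assumptions: the function name `require_records` and the `"kind"` key are hypothetical, and the sketch raises `unittest.SkipTest` (which pytest reports as a skip) so that it stays standard-library only; the actual scenarios may well call `pytest.skip` instead.

```python
import unittest


def require_records(sitl_replay_ready: bool, records: list, kind: str) -> list:
    """Return records of the given kind, skipping or failing per the gate rule."""
    if not sitl_replay_ready:
        # Fixtures are not available: this is an environment gap, so skip.
        raise unittest.SkipTest("SITL replay fixtures are not ready")
    matching = [r for r in records if r.get("kind") == kind]
    if not matching:
        # Fixtures exist but the SUT emitted nothing: a real failure.
        raise AssertionError(f"fixtures are ready but no {kind!r} records were found")
    return matching
```

For example, `require_records(True, records, "mid-flight-tile-output")` either returns the tile records or fails the test, so a missing SUT feature cannot pass silently once the fixtures exist.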

Full review: _docs/03_implementation/reviews/batch_83_review.md.

Production Dependencies

These features must exist for the scenarios to actually run (rather than skip):

  1. SUT-side mid-flight-tile-output FDR record kind matching the Mode B Fact #105 schema (TILE_REQUIRED_TOP_LEVEL_FIELDS + TILE_REQUIRED_QUALITY_FIELDS).
  2. SUT-side tile-load-rejected FDR record with reason="stale" emitted by the freshness gate.
  3. SUT-side simulate_landing() MAVLink command (or equivalent public-input trigger) for landing-event tile upload.
  4. Fixture-builder-side Derkachi 5-min replay scenario emitting both record kinds for the parameterized FC × VIO grid.
  5. Fixture-builder-side FT_P_17_HIGH_QUALITY_WINDOW_S env var injection (total seconds of high-quality nav frames per AC-2.1a normal-segment criterion).
  6. Already exists: mock-suite-sat-service /tiles/audit endpoint (e2e/fixtures/mock-suite-sat/app.py).
  7. Already exists: mock_suite_sat_url and sitl_replay_ready pytest fixtures (used by sibling scenarios).

Dependencies 1–5 are tracked against epic E-OBC (Mode B work) and the AZ-595 fixture builder, outside the blackbox-test workspace.
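To illustrate how dependency 5 feeds the AC-1 rate check, the sketch below reads FT_P_17_HIGH_QUALITY_WINDOW_S and derives a minimum tile count from the "one tile per ~3 s" criterion. The exact tolerance behind "~3 s" is not stated in this report, so the floor-based threshold, the `SECONDS_PER_TILE` constant, and the function names are assumptions.

```python
import math
import os

ENV_VAR = "FT_P_17_HIGH_QUALITY_WINDOW_S"
SECONDS_PER_TILE = 3.0  # AC-1: at least one tile per ~3 s (exact tolerance assumed)


def read_window_s(environ=os.environ) -> float:
    """Read the high-quality window length in seconds, failing loudly if absent."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        raise RuntimeError(f"{ENV_VAR} is not set")
    try:
        value = float(raw)
    except ValueError as exc:
        raise RuntimeError(f"{ENV_VAR} is not a number: {raw!r}") from exc
    if not math.isfinite(value) or value <= 0:
        raise RuntimeError(f"{ENV_VAR} must be a positive finite number: {raw!r}")
    return value


def min_expected_tiles(window_s: float, seconds_per_tile: float = SECONDS_PER_TILE) -> int:
    """Smallest tile count that satisfies one tile per seconds_per_tile."""
    return math.floor(window_s / seconds_per_tile)


def rate_ok(tile_count: int, window_s: float) -> bool:
    """True when enough tiles were generated for the given window."""
    return tile_count >= min_expected_tiles(window_s)
```

Passing the environment as a parameter keeps the reader testable without mutating the process environment.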

Architecture Compliance

  • All new files under e2e/, owned by the Blackbox Tests component per _docs/02_document/module-layout.md.
  • No imports from src/gps_denied_onboard (explicit public-boundary discipline note in mid_flight_tile_evaluator.py).
  • No new cyclic dependencies.
  • Reuses httpx and pyproj (via geo); no new infrastructure libraries introduced.

Sub-step Trace

Phases executed per implement/SKILL.md:

  • phase 5 (load-spec) → AZ-422 spec read
  • phase 6 (implement-tasks-sequentially) → helper + scenarios + unit tests
  • phase 7 (verify-ac-coverage) → 7-AC trace above
  • phase 8 (code-review) → batch_83_review.md (PASS_WITH_WARNINGS)
  • phase 11 (commit-batch) → next.