Files
gps-denied-onboard/_docs/03_implementation/reviews/batch_83_review.md
T
Oleksandr Bezdieniezhnykh 5def1a3eb3 [AZ-422] Add FT-P-17 + FT-N-06 mid-flight tile blackbox tests
Implement the AC-8.4 and AC-NEW-6 blackbox scenarios for mid-flight
tile generation, dedup, landing-time upload, and freshness gating.

Helpers:
- runner/helpers/mid_flight_tile_evaluator.py — pure-logic evaluators
  for tile generation rate, Mode B Fact #105 schema check, footprint+
  GSD dedup (via geo.distance_m), upload-audit reconciliation, and
  the AC-5/AC-6 capture_utc + freshness-gate checks.
- runner/helpers/mock_suite_sat_audit.py — httpx wrapper for the
  mock-suite-sat-service /tiles/audit endpoint with strict response-
  shape validation.

Scenarios:
- tests/positive/test_ft_p_17_mid_flight_tiles.py
- tests/negative/test_ft_n_06_mid_flight_freshness.py

Both skip when sitl_replay_ready is false and fail loudly when fixture
records are missing (tests-as-gates discipline). 52 new unit tests
(41 evaluator + 11 audit client) cover every helper branch.

Review: PASS_WITH_WARNINGS (2 Low — duplicate haversine carry-over,
upstream production dependency surface).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 15:28:39 +03:00

9.5 KiB
Raw Blame History

Code Review Report

Batch: 83 — AZ-422 (FT-P-17 + FT-N-06 mid-flight tile generation + freshness) Date: 2026-05-17 Verdict: PASS_WITH_WARNINGS

Files Reviewed

Created:

  • e2e/runner/helpers/mid_flight_tile_evaluator.py
  • e2e/runner/helpers/mock_suite_sat_audit.py
  • e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py
  • e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py
  • e2e/_unit_tests/helpers/test_mid_flight_tile_evaluator.py
  • e2e/_unit_tests/helpers/test_mock_suite_sat_audit.py

Modified:

  • e2e/_unit_tests/test_directory_layout.py (registered 4 new paths)

Findings

# Severity Category File:Line Title
1 Low Maintainability e2e/runner/helpers/gcs_telemetry_evaluator.py (carry-over) Duplicate haversine helper not consolidated to geo.distance_m
2 Low Spec-Gap e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py, e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py Tests depend on upstream production + fixture-builder features that don't exist yet

Finding Details

F1: Duplicate haversine helper not consolidated to geo.distance_m (Low / Maintainability — carry-over)

  • Location: e2e/runner/helpers/gcs_telemetry_evaluator.py
  • Description: Batch 81 introduced a private haversine function inside gcs_telemetry_evaluator.py for search-region-shift distance math. runner.helpers.geo.distance_m is the project-wide Vincenty helper (used by this batch's mid_flight_tile_evaluator.py for dedup). Two helpers, two algorithms, same intent.
  • Suggestion: Migrate gcs_telemetry_evaluator.py to geo.distance_m in a dedicated refactor batch (≤1 point). Out of scope for AZ-422 — would expand the diff into a helper already shipped and reviewed.
  • Task: Carry-over from batches 7981 cumulative review.

F2: Tests depend on upstream production + fixture-builder features that don't exist yet (Low / Spec-Gap)

  • Location: e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py, e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py
  • Description: Both scenarios require:
    • SUT writing mid-flight-tile-output FDR records with the Mode B Fact #105 schema (production, owned by epic E-OBC / Mode B work — outside the test harness).
    • SUT emitting tile-load-rejected FDR records with reason="stale" from the freshness gate (same).
    • simulate_landing() MAVLink command or equivalent public-input mechanism that triggers landing-time tile upload (production, public-input).
    • mock-suite-sat-service audit endpoint (already exists in e2e/fixtures/mock-suite-sat/app.py).
    • Fixture builder support for the FT_P_17_HIGH_QUALITY_WINDOW_S env var, computed from segment-quality FDR records (AZ-595).
    • Fixture builder support for AZ-422 5-min Derkachi replay scenario.
  • The tests skip cleanly when sitl_replay_ready is false (consistent with FT-P-12/13/15/16/18) and fail loudly when the fixture exists but the required records are missing — adhering to the "tests as gates" principle.
  • Suggestion: Surface as a single line in the AZ-422 batch report under Production Dependencies. No code change in this batch.
  • Task: AZ-422.

Phase 1: Context Loading

Read AZ-422 task spec (_docs/02_tasks/todo/AZ-422_ft_p_17_ftn_06_mid_flight_tiles.md). All seven ACs and the SUT boundary statement understood before review.

Phase 2: Spec Compliance

AC Helper Scenario assertion Unit-test coverage
AC-1 (≥1 tile / 3 s high-quality nav frames) evaluate_tile_generation_rate assert rate_report.passes 5 cases (exact pass, under-min fail, zero window, invalid arg, empty list)
AC-2 (Mode B Fact #105 quality fields populated) evaluate_tile_quality_metadata assert quality_report.passes 7 cases (full pass, missing quality field, missing top-level, non-dict quality, empty list, null value, partial drop)
AC-3 (dedup: ±1 m footprint AND ±5 % GSD) evaluate_dedup assert dedup_report.passes 8 cases (dup same centre, far apart pass, different GSD pass, close-GSD dupe, missing GSD skip, empty pass, invalid args raise, 3-tile pair detection)
AC-4 (landing upload HTTP 202 every tile) evaluate_upload_acks + mock_suite_sat_audit.fetch_audit assert upload_report.passes 5 cases (all acked pass, missing tile fail, extra audit pass, empty generated fail, malformed-entry skip, non-dict skip) + 11 HTTP-client cases
AC-5 ( capture_utc generated_at ≤ 60 s) evaluate_capture_date_freshness
AC-6 (no tile-load-rejected: stale for fresh tiles) evaluate_freshness_gate assert freshness_report.passes 7 cases (no rejections pass, unrelated rejection pass, fresh stale-rejected fail, non-stale reason ignored, tile_id key variant, non-dict skip, custom stale reason)
AC-7 (parameterized across (fc_adapter, vio_strategy)) conftest fixtures 6 collected variants per scenario = 12 total

All ACs satisfied. The @pytest.mark.traces_to("AC-8.4,AC-1,...") markers wire scenarios to AC IDs for the traceability matrix.

Phase 3: Code Quality

  • SOLID: each evaluator is a pure function over a TileSpec/dict input returning a frozen-dataclass report. mock_suite_sat_audit.fetch_audit is a single-responsibility HTTP wrapper. Test files mirror helper shape (one test file per helper module).
  • Error handling: all error paths raise ValueError (input validation) or RuntimeError (HTTP / response shape). No bare except or silent swallowing. The except (TypeError, ValueError) in _parse_iso8601_utc_seconds is typed and limited to parsing failure.
  • Naming: evaluate_* matches sibling helpers (gcs_telemetry_evaluator, tile_cache_inspector); report dataclasses follow <Concern>Report / <Concern>EntryReport naming.
  • Complexity: longest function evaluate_tile_quality_metadata at ~25 lines; evaluate_dedup at ~25 lines with an O(N²) loop the docstring explicitly documents. All under coderule's 50-line threshold.
  • DRY: no in-batch duplication. Cross-batch duplication of haversine logic surfaced as F1.
  • Test quality: every unit test uses Arrange/Act/Assert comments per coderule. Tests assert meaningful behavior (specific drift values, specific failing tile IDs, specific error substrings) rather than "no error thrown".
  • Dead code: none. _top_level_field_to_attr is a documented forward-compatibility seam (1:1 today, allows future field-name drift handling).

Phase 4: Security Quick-Scan

  • No SQL, no string-interpolated queries (FDR is JSON file iteration).
  • No subprocess(... shell=True), no exec, no eval.
  • No hardcoded secrets; the HTTP client takes a base URL from mock_suite_sat_url fixture.
  • Input validation: fetch_audit validates base_url non-empty, run_id non-empty, HTTP 2xx, JSON object body, entries list shape.
  • No sensitive data in error messages; HTTP error message truncates body to 200 chars.
  • No insecure deserialization — JSON parsed via httpx.Response.json() (stdlib json.loads underneath); shape checked post-parse.

Phase 5: Performance Scan

  • evaluate_dedup is O(N²) — explicitly documented in the docstring as acceptable for <100 tiles per 5-min replay. For longer flights a spatial index (KD-tree) would be needed; out of scope for AZ-422.
  • fetch_audit is one-shot HTTP GET with 10 s timeout and no retries — appropriate for a co-located mock in the compose harness.
  • FDR iteration in scenarios uses fdr_reader.iter_records (generator).
  • No N+1 patterns, no unbounded fetching beyond the SUT's tile count.

Phase 6: Cross-Task Consistency

  • TileSpec mirrors e2e/fixtures/mock-suite-sat/app.py's TilePublishRequest + TileQualityMetadata (Mode B Fact #105). Constants TILE_REQUIRED_TOP_LEVEL_FIELDS and TILE_REQUIRED_QUALITY_FIELDS make the contract explicit.
  • Scenario skip pattern (if not sitl_replay_ready: pytest.skip(...)) matches FT-P-12/13/15/16/18.
  • httpx HTTP client matches the dependency already pinned for sibling helpers.
  • geo.distance_m reuse rather than re-implementing Vincenty/haversine in this helper.
  • Test fixture imports (evidence_dir, run_id, nfr_recorder, fc_adapter, vio_strategy, mock_suite_sat_url, sitl_replay_ready) match the conftest signature used by sibling scenarios.

Phase 7: Architecture Compliance

  • All new files live under e2e/ — owned exclusively by the Blackbox Tests component per _docs/02_document/module-layout.md.
  • No imports from src/gps_denied_onboard — explicit "public-boundary discipline" note at top of mid_flight_tile_evaluator.py; verified by reading every import.
  • No layering violations.
  • No new cyclic module dependencies (the helper imports from .geo only; tests import from runner.helpers.*).
  • No duplicate symbols across components (the carry-over duplicate haversine is intra-component, tracked in F1).
  • No cross-cutting concerns re-implemented locally; HTTP client, JSON parsing, and geo math all delegate to shared dependencies.

Verdict

  • 0 Critical, 0 High → no FAIL trigger.
  • 2 Low (one carry-over, one production-dependency surface) → PASS_WITH_WARNINGS.

Batch 83 is ready to commit. The two Low findings are surfaced for batch report and feed forward into the next cumulative review (batches 8284) without blocking.