Implement the AC-8.4 and AC-NEW-6 blackbox scenarios for mid-flight tile generation, dedup, landing-time upload, and freshness gating.

Helpers:
- runner/helpers/mid_flight_tile_evaluator.py: pure-logic evaluators for tile generation rate, the Mode B Fact #105 schema check, footprint + GSD dedup (via geo.distance_m), upload-audit reconciliation, and the AC-5/AC-6 capture_utc + freshness-gate checks.
- runner/helpers/mock_suite_sat_audit.py: httpx wrapper for the mock-suite-sat-service /tiles/audit endpoint with strict response-shape validation.

Scenarios:
- tests/positive/test_ft_p_17_mid_flight_tiles.py
- tests/negative/test_ft_n_06_mid_flight_freshness.py

Both skip when sitl_replay_ready is false and fail loudly when fixture records are missing (tests-as-gates discipline).

52 new unit tests (41 evaluator + 11 audit client) cover every helper branch.

Review: PASS_WITH_WARNINGS (2 Low: duplicate haversine carry-over, upstream production dependency surface).

Co-authored-by: Cursor <cursoragent@cursor.com>
Code Review Report
Batch: 83 (AZ-422: FT-P-17 + FT-N-06 mid-flight tile generation + freshness)
Date: 2026-05-17
Verdict: PASS_WITH_WARNINGS
Files Reviewed
Created:
- `e2e/runner/helpers/mid_flight_tile_evaluator.py`
- `e2e/runner/helpers/mock_suite_sat_audit.py`
- `e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py`
- `e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py`
- `e2e/_unit_tests/helpers/test_mid_flight_tile_evaluator.py`
- `e2e/_unit_tests/helpers/test_mock_suite_sat_audit.py`
Modified:
- `e2e/_unit_tests/test_directory_layout.py` (registered 4 new paths)
Findings
| # | Severity | Category | File:Line | Title |
|---|---|---|---|---|
| 1 | Low | Maintainability | `e2e/runner/helpers/gcs_telemetry_evaluator.py` (carry-over) | Duplicate haversine helper not consolidated to `geo.distance_m` |
| 2 | Low | Spec-Gap | `e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py`, `e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py` | Tests depend on upstream production + fixture-builder features that don't exist yet |
Finding Details
F1: Duplicate haversine helper not consolidated to geo.distance_m (Low / Maintainability — carry-over)
- Location: `e2e/runner/helpers/gcs_telemetry_evaluator.py`
- Description: Batch 81 introduced a private haversine function inside `gcs_telemetry_evaluator.py` for search-region-shift distance math. `runner.helpers.geo.distance_m` is the project-wide Vincenty helper (used by this batch's `mid_flight_tile_evaluator.py` for dedup). Two helpers, two algorithms, same intent.
- Suggestion: Migrate `gcs_telemetry_evaluator.py` to `geo.distance_m` in a dedicated refactor batch (≤1 point). This is out of scope for AZ-422: it would expand the diff into a helper that has already shipped and been reviewed.
- Task: Carry-over from the batches 79–81 cumulative review.
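F1's "two helpers, two algorithms" point can be made concrete with a small comparison. The sketch below is illustrative only: `haversine_m` and `vincenty_m` are hypothetical stand-ins written for this report (the real code is the private haversine in `gcs_telemetry_evaluator.py` and `geo.distance_m`, whose signatures are not reproduced here), and the WGS-84 constants and the 6,371,008.8 m mean Earth radius are assumptions. It suggests the two disagree by hundreds of metres over one degree of latitude but by well under a centimetre at the ±1 m scale used by AC-3 dedup, which is consistent with rating F1 as a maintainability issue rather than a correctness one.

```python
import math

# Assumed constants: WGS-84 ellipsoid for Vincenty, IUGG mean radius for haversine.
WGS84_A = 6_378_137.0
WGS84_F = 1 / 298.257223563
WGS84_B = WGS84_A * (1 - WGS84_F)
MEAN_EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere (hypothetical stand-in for the private helper)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * MEAN_EARTH_RADIUS_M * math.asin(math.sqrt(h))


def vincenty_m(lat1: float, lon1: float, lat2: float, lon2: float,
               max_iter: int = 200, tol: float = 1e-12) -> float:
    """Vincenty inverse distance on the WGS-84 ellipsoid (stand-in for geo.distance_m)."""
    f = WGS84_F
    u1 = math.atan((1 - f) * math.tan(math.radians(lat1)))  # reduced latitudes
    u2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)
    big_l = math.radians(lon2 - lon1)
    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam, cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0.0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # Both points on the equator give cos2_alpha == 0; cos(2*sigma_m) is then taken as 0.
        cos_2sm = cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha if cos2_alpha else 0.0
        c = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = big_l + (1 - c) * f * sin_alpha * (
            sigma + c * sin_sigma * (cos_2sm + c * cos_sigma * (-1 + 2 * cos_2sm ** 2))
        )
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty iteration did not converge (near-antipodal points?)")
    u_sq = cos2_alpha * (WGS84_A ** 2 - WGS84_B ** 2) / WGS84_B ** 2
    big_a = 1 + u_sq / 16384 * (4096 + u_sq * (-768 + u_sq * (320 - 175 * u_sq)))
    big_b = u_sq / 1024 * (256 + u_sq * (-128 + u_sq * (74 - 47 * u_sq)))
    delta_sigma = big_b * sin_sigma * (
        cos_2sm + big_b / 4 * (
            cos_sigma * (-1 + 2 * cos_2sm ** 2)
            - big_b / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)
        )
    )
    return WGS84_B * big_a * (sigma - delta_sigma)


# One degree of latitude at the equator: sphere and ellipsoid differ by hundreds of metres.
long_gap = haversine_m(0.0, 0.0, 1.0, 0.0) - vincenty_m(0.0, 0.0, 1.0, 0.0)
# A roughly 1 m northward offset (the AC-3 dedup scale): the two agree to sub-centimetre.
north_1m = 50.0 + 1 / 111_200.0
short_gap = abs(haversine_m(50.0, 30.0, north_1m, 30.0) - vincenty_m(50.0, 30.0, north_1m, 30.0))
print(f"1 deg gap: {long_gap:.1f} m, ~1 m gap: {short_gap * 1000:.2f} mm")
```

Consolidating on one helper removes the question of which of these two answers a given evaluator is producing.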
F2: Tests depend on upstream production + fixture-builder features that don't exist yet (Low / Spec-Gap)
- Location: `e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py`, `e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py`
- Description: Both scenarios require:
  - The SUT writing `mid-flight-tile-output` FDR records with the Mode B Fact #105 schema (production code owned by epic E-OBC / Mode B work, outside the test harness).
  - The SUT emitting `tile-load-rejected` FDR records with `reason="stale"` from the freshness gate (same ownership).
  - A `simulate_landing()` MAVLink command or equivalent public-input mechanism that triggers landing-time tile upload (production, public input).
  - The `mock-suite-sat-service` audit endpoint (already exists in `e2e/fixtures/mock-suite-sat/app.py`).
  - Fixture-builder support for the `FT_P_17_HIGH_QUALITY_WINDOW_S` env var, computed from segment-quality FDR records (AZ-595).
  - Fixture-builder support for the AZ-422 5-min Derkachi replay scenario.
- The tests skip cleanly when `sitl_replay_ready` is false (consistent with FT-P-12/13/15/16/18) and fail loudly when the fixture exists but the required records are missing, adhering to the "tests as gates" principle.
- Suggestion: Surface as a single line in the AZ-422 batch report under Production Dependencies. No code change in this batch.
- Task: AZ-422.
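The skip-versus-fail split described in F2 can be sketched as a single gate function. This is a hypothetical helper, not code from the batch: the scenarios inline this logic and call `pytest.skip(...)`, whereas the sketch raises stdlib `unittest.SkipTest` and `AssertionError` so it runs without pytest, and the `"type"` key used to select FDR records is an assumed record shape.

```python
import unittest
from collections.abc import Iterable


def require_records(sitl_replay_ready: bool, records: Iterable[dict], record_type: str) -> list[dict]:
    """Return FDR records of `record_type`, or skip/fail per the tests-as-gates rule.

    Hypothetical helper; the "type" key is an assumed FDR record field.
    """
    if not sitl_replay_ready:
        # No replay fixture at all: the environment cannot run the scenario, so skip.
        raise unittest.SkipTest("sitl_replay_ready is false; SITL replay fixture not built")
    matching = [r for r in records if isinstance(r, dict) and r.get("type") == record_type]
    if not matching:
        # Fixture exists but the SUT produced nothing: a real gap, so fail loudly.
        raise AssertionError(f"replay fixture present but no {record_type!r} FDR records found")
    return matching
```

The design choice is that a missing environment is not a verdict on the SUT, while a present environment with missing output is.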
Phase 1: Context Loading
Read the AZ-422 task spec (`_docs/02_tasks/todo/AZ-422_ft_p_17_ftn_06_mid_flight_tiles.md`). All seven ACs and the SUT boundary statement were understood before review.
Phase 2: Spec Compliance
| AC | Helper | Scenario assertion | Unit-test coverage |
|---|---|---|---|
| AC-1 (≥1 tile per 3 s of high-quality nav frames) | `evaluate_tile_generation_rate` | `assert rate_report.passes` | 5 cases (exact pass, under-min fail, zero window, invalid arg, empty list) |
| AC-2 (Mode B Fact #105 quality fields populated) | `evaluate_tile_quality_metadata` | `assert quality_report.passes` | 7 cases (full pass, missing quality field, missing top-level, non-dict quality, empty list, null value, partial drop) |
| AC-3 (dedup: ±1 m footprint AND ±5 % GSD) | `evaluate_dedup` | `assert dedup_report.passes` | 8 cases (dup same centre, far apart pass, different GSD pass, close-GSD dupe, missing GSD skip, empty pass, invalid args raise, 3-tile pair detection) |
| AC-4 (landing upload returns HTTP 202 for every tile) | `evaluate_upload_acks` + `mock_suite_sat_audit.fetch_audit` | `assert upload_report.passes` | 6 cases (all acked pass, missing tile fail, extra audit pass, empty generated fail, malformed-entry skip, non-dict skip) + 11 HTTP-client cases |
| AC-5 (\|capture_utc − generated_at\| ≤ 60 s) and AC-6 (no `tile-load-rejected: stale` for fresh tiles) | `evaluate_capture_date_freshness`, `evaluate_freshness_gate` | `assert freshness_report.passes` | 7 cases (no rejections pass, unrelated rejection pass, fresh stale-rejected fail, non-stale reason ignored, tile_id key variant, non-dict skip, custom stale reason) |
| AC-7 (parameterized across `(fc_adapter, vio_strategy)`) | conftest fixtures | 6 collected variants per scenario = 12 total | n/a |
All ACs satisfied. The @pytest.mark.traces_to("AC-8.4,AC-1,...") markers wire scenarios to AC IDs for the traceability matrix.
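One reading of the AC-1 rule, as a sketch: count generated tiles against `floor(window_s / 3)`, where `window_s` is the high-quality navigation window (the `FT_P_17_HIGH_QUALITY_WINDOW_S` value mentioned in F2). `evaluate_rate` and `RateReport` are hypothetical names; the real `evaluate_tile_generation_rate` signature is not shown in this report, and treating a zero-length window as a trivial pass is an assumption (the report only says a "zero window" case is tested).

```python
import math
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class RateReport:
    tile_count: int
    window_s: float
    required: int
    passes: bool


def evaluate_rate(tile_ids: Sequence[str], window_s: float, period_s: float = 3.0) -> RateReport:
    """AC-1 sketch: require at least one tile per `period_s` of high-quality navigation."""
    if window_s < 0 or period_s <= 0:
        raise ValueError(f"invalid window_s={window_s!r} or period_s={period_s!r}")
    # Whole periods only: a 10 s window demands 3 tiles, not 3.33.
    required = math.floor(window_s / period_s)
    count = len(tile_ids)
    return RateReport(tile_count=count, window_s=window_s, required=required, passes=count >= required)
```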
Phase 3: Code Quality
- SOLID: each evaluator is a pure function over a `TileSpec`/dict input returning a frozen-dataclass report. `mock_suite_sat_audit.fetch_audit` is a single-responsibility HTTP wrapper. Test files mirror helper shape (one test file per helper module).
- Error handling: all error paths raise `ValueError` (input validation) or `RuntimeError` (HTTP / response shape). No bare `except` or silent swallowing. The `except (TypeError, ValueError)` in `_parse_iso8601_utc_seconds` is typed and limited to parsing failure.
- Naming: `evaluate_*` matches sibling helpers (`gcs_telemetry_evaluator`, `tile_cache_inspector`); report dataclasses follow `<Concern>Report` / `<Concern>EntryReport` naming.
- Complexity: the longest functions are `evaluate_tile_quality_metadata` and `evaluate_dedup` at ~25 lines each; `evaluate_dedup` contains an O(N²) loop that its docstring explicitly documents. All are under coderule's 50-line threshold.
- DRY: no in-batch duplication. Cross-batch duplication of haversine logic is surfaced as F1.
- Test quality: every unit test uses Arrange/Act/Assert comments per coderule. Tests assert meaningful behavior (specific drift values, specific failing tile IDs, specific error substrings) rather than "no error thrown".
- Dead code: none. `_top_level_field_to_attr` is a documented forward-compatibility seam (1:1 today; allows future field-name drift handling).
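The typed-`except` parsing pattern and the AC-5 freshness bound can be sketched together. `parse_iso8601_utc_seconds` and `capture_date_is_fresh` are hypothetical stand-ins for `_parse_iso8601_utc_seconds` and `evaluate_capture_date_freshness`; returning `None` on a parse failure and rejecting naive (zone-less) timestamps are assumptions. The `capture_utc` / `generated_at` field names and the 60 s bound come from AC-5.

```python
from datetime import datetime
from typing import Optional


def parse_iso8601_utc_seconds(value: object) -> Optional[float]:
    """Parse an ISO-8601 timestamp to epoch seconds, or return None if it cannot be parsed."""
    try:
        if isinstance(value, str) and value.endswith("Z"):
            # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
            value = value[:-1] + "+00:00"
        parsed = datetime.fromisoformat(value)  # TypeError for non-str, ValueError for bad text
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        return None  # assumption: naive timestamps are rejected, not assumed to be UTC
    return parsed.timestamp()


def capture_date_is_fresh(tile: dict, max_drift_s: float = 60.0) -> bool:
    """AC-5 sketch: |capture_utc - generated_at| <= max_drift_s."""
    capture = parse_iso8601_utc_seconds(tile.get("capture_utc"))
    generated = parse_iso8601_utc_seconds(tile.get("generated_at"))
    if capture is None or generated is None:
        return False  # missing or unparseable timestamps fail in this sketch
    return abs(capture - generated) <= max_drift_s
```

Keeping the `except` tuple narrow means a genuine bug (for example an `AttributeError` from a refactor) still surfaces instead of being reported as a bad timestamp.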
Phase 4: Security Quick-Scan
- No SQL and no string-interpolated queries (FDR access is JSON file iteration).
- No `subprocess` calls with `shell=True`, no `exec`, no `eval`.
- No hardcoded secrets; the HTTP client takes its base URL from the `mock_suite_sat_url` fixture.
- Input validation: `fetch_audit` validates a non-empty `base_url`, a non-empty `run_id`, an HTTP 2xx status, a JSON-object body, and the shape of the `entries` list.
- No sensitive data in error messages; the HTTP error message truncates the response body to 200 chars.
- No insecure deserialization: JSON is parsed via `httpx.Response.json()` (stdlib `json.loads` underneath), and the shape is checked after parsing.
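The validation order listed above (non-empty inputs, 2xx status, JSON-object body, `entries` list, 200-char error truncation) can be sketched without a network. `build_audit_url` and `parse_audit_response` are hypothetical; the `run_id` query-parameter name is an assumption, and the `/tiles/audit` path comes from the commit message. In a real wrapper, an `httpx.get(url, timeout=10.0)` call would sit between the two functions and pass `response.status_code` and `response.text` through.

```python
import json
from urllib.parse import urlencode

MAX_ERROR_BODY_CHARS = 200  # matches the truncation length stated in the review


def build_audit_url(base_url: str, run_id: str) -> str:
    """Build the audit URL; the `run_id` query-parameter name is an assumption."""
    if not base_url or not base_url.strip():
        raise ValueError("base_url must be non-empty")
    if not run_id or not run_id.strip():
        raise ValueError("run_id must be non-empty")
    return f"{base_url.rstrip('/')}/tiles/audit?{urlencode({'run_id': run_id})}"


def parse_audit_response(status_code: int, body_text: str) -> list:
    """Check status, then body type, then `entries` shape; raise RuntimeError on any miss."""
    if not 200 <= status_code < 300:
        raise RuntimeError(
            f"audit request failed: HTTP {status_code}: {body_text[:MAX_ERROR_BODY_CHARS]}"
        )
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError as exc:
        raise RuntimeError("audit response is not valid JSON") from exc
    if not isinstance(body, dict):
        raise RuntimeError(f"audit response must be a JSON object, got {type(body).__name__}")
    entries = body.get("entries")
    if not isinstance(entries, list):
        raise RuntimeError("audit response 'entries' must be a list")
    return entries
```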
Phase 5: Performance Scan
- `evaluate_dedup` is O(N²), which its docstring explicitly documents as acceptable for <100 tiles per 5-min replay. Longer flights would need a spatial index (KD-tree); out of scope for AZ-422.
- `fetch_audit` is a one-shot HTTP GET with a 10 s timeout and no retries, which is appropriate for a co-located mock in the compose harness.
- FDR iteration in scenarios uses `fdr_reader.iter_records` (a generator).
- No N+1 patterns and no unbounded fetching beyond the SUT's tile count.
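A minimal sketch of the O(N²) pairwise dedup that AC-3 and the performance note describe: a pair counts as a duplicate only when the centres are within 1 m AND the GSDs are within 5 %. `Tile` and `find_duplicate_pairs` are hypothetical; measuring the 5 % against the larger of the two GSDs is an assumption, and the haversine default only stands in for `geo.distance_m` (the distance function is injectable so the real helper could be passed in).

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Optional


@dataclass(frozen=True)
class Tile:
    tile_id: str
    lat: float
    lon: float
    gsd_m: Optional[float]  # None models the "missing GSD" case


def _haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    r = 6_371_008.8  # assumed mean Earth radius; stand-in for geo.distance_m
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))


def find_duplicate_pairs(
    tiles: list,
    max_centre_m: float = 1.0,
    max_gsd_rel: float = 0.05,
    distance_m: Callable[[float, float, float, float], float] = _haversine_m,
) -> list:
    """O(N^2) pairwise dedup: a pair is a duplicate only if BOTH tolerances hold."""
    if max_centre_m <= 0 or max_gsd_rel <= 0:
        raise ValueError("tolerances must be positive")
    dupes = []
    for a, b in combinations(tiles, 2):  # N*(N-1)/2 comparisons
        if a.gsd_m is None or b.gsd_m is None:
            continue  # GSD not comparable; skipped, as in the "missing GSD skip" case
        gsd_close = abs(a.gsd_m - b.gsd_m) <= max_gsd_rel * max(a.gsd_m, b.gsd_m)
        if gsd_close and distance_m(a.lat, a.lon, b.lat, b.lon) <= max_centre_m:
            dupes.append((a.tile_id, b.tile_id))
    return dupes
```

Checking the cheap GSD condition first avoids a distance computation for most non-duplicate pairs. For longer flights, the KD-tree route would typically project centres into a metric frame and use something like `scipy.spatial.KDTree.query_pairs(r)` to find candidate pairs within 1 m before applying the GSD check.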
Phase 6: Cross-Task Consistency
- `TileSpec` mirrors `TilePublishRequest` + `TileQualityMetadata` (Mode B Fact #105) from `e2e/fixtures/mock-suite-sat/app.py`. The constants `TILE_REQUIRED_TOP_LEVEL_FIELDS` and `TILE_REQUIRED_QUALITY_FIELDS` make the contract explicit.
- The scenario skip pattern (`if not sitl_replay_ready: pytest.skip(...)`) matches FT-P-12/13/15/16/18.
- The `httpx` HTTP client matches the dependency already pinned for sibling helpers.
- `geo.distance_m` is reused rather than re-implementing Vincenty/haversine in this helper.
- Test fixture imports (`evidence_dir`, `run_id`, `nfr_recorder`, `fc_adapter`, `vio_strategy`, `mock_suite_sat_url`, `sitl_replay_ready`) match the conftest signature used by sibling scenarios.
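Required-field constants make the Mode B Fact #105 contract checkable in a few lines. In this sketch the tuple contents are placeholders (only `capture_utc` and `generated_at` appear elsewhere in this report; the real tuples live in `mid_flight_tile_evaluator.py` and mirror `TileQualityMetadata`), and `quality_metadata_problems` is a hypothetical helper that treats a null value the same as a missing key, which is one plausible reading of the "null value" unit-test case in Phase 2.

```python
from collections.abc import Mapping

# Placeholder contents: the real constants mirror TilePublishRequest / TileQualityMetadata.
REQUIRED_TOP_LEVEL_FIELDS = ("tile_id", "capture_utc", "generated_at", "quality")
REQUIRED_QUALITY_FIELDS = ("confidence", "nav_source")


def quality_metadata_problems(record: Mapping) -> list:
    """List every contract violation in one tile record (an empty list means it passes)."""
    problems = [f"missing top-level field {name!r}"
                for name in REQUIRED_TOP_LEVEL_FIELDS if record.get(name) is None]
    quality = record.get("quality")
    if quality is None:
        return problems  # already reported as a missing top-level field
    if not isinstance(quality, Mapping):
        return problems + [f"'quality' must be an object, got {type(quality).__name__}"]
    problems += [f"missing quality field {name!r}"
                 for name in REQUIRED_QUALITY_FIELDS if quality.get(name) is None]
    return problems
```

Collecting all problems rather than stopping at the first lets a failing scenario report every dropped field in one run.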
Phase 7: Architecture Compliance
- All new files live under `e2e/`, which `_docs/02_document/module-layout.md` assigns exclusively to the Blackbox Tests component.
- No imports from `src/gps_denied_onboard`: an explicit "public-boundary discipline" note sits at the top of `mid_flight_tile_evaluator.py`, and every import was read to verify it.
- No layering violations.
- No new cyclic module dependencies (the helper imports only from `.geo`; tests import from `runner.helpers.*`).
- No duplicate symbols across components (the carry-over duplicate haversine is intra-component and tracked in F1).
- No cross-cutting concerns re-implemented locally; HTTP client, JSON parsing, and geo math all delegate to shared dependencies.
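The "no imports from `src/gps_denied_onboard`" check was done by reading every import; it could also be automated. A sketch using the stdlib `ast` module follows; `forbidden_imports` is hypothetical, and it assumes the production package is imported as `gps_denied_onboard` (possibly behind a `src.` prefix). Relative imports such as `from .geo import ...` are deliberately ignored because they stay inside the helper package.

```python
import ast

FORBIDDEN_ROOT = "gps_denied_onboard"  # assumed import name of the production package


def forbidden_imports(source: str, forbidden_root: str = FORBIDDEN_ROOT) -> list:
    """Return every absolute import in `source` whose dotted path contains `forbidden_root`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # level > 0 means a relative import; skipped above
        else:
            continue
        hits += [name for name in names if forbidden_root in name.split(".")]
    return hits
```

Run over each file under `e2e/`, an empty result would turn the manual boundary check into a repeatable unit test.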
Verdict
- 0 Critical, 0 High → no FAIL trigger.
- 2 Low (one carry-over, one production-dependency surface) → PASS_WITH_WARNINGS.
Batch 83 is ready to commit. The two Low findings are surfaced for batch report and feed forward into the next cumulative review (batches 82–84) without blocking.