mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 22:31:13 +00:00

Files

T

Oleksandr Bezdieniezhnykh 5def1a3eb3 [AZ-422] Add FT-P-17 + FT-N-06 mid-flight tile blackbox tests

Implement the AC-8.4 and AC-NEW-6 blackbox scenarios for mid-flight
tile generation, dedup, landing-time upload, and freshness gating.

Helpers:
- runner/helpers/mid_flight_tile_evaluator.py — pure-logic evaluators
  for tile generation rate, Mode B Fact #105 schema check, footprint+
  GSD dedup (via geo.distance_m), upload-audit reconciliation, and
  the AC-5/AC-6 capture_utc + freshness-gate checks.
- runner/helpers/mock_suite_sat_audit.py — httpx wrapper for the
  mock-suite-sat-service /tiles/audit endpoint with strict response-
  shape validation.

Scenarios:
- tests/positive/test_ft_p_17_mid_flight_tiles.py
- tests/negative/test_ft_n_06_mid_flight_freshness.py

Both skip when sitl_replay_ready is false and fail loudly when fixture
records are missing (tests-as-gates discipline). 52 new unit tests
(41 evaluator + 11 audit client) cover every helper branch.

Review: PASS_WITH_WARNINGS (2 Low — duplicate haversine carry-over,
upstream production dependency surface).

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-17 15:28:39 +03:00

9.5 KiB

Raw Blame History

Code Review Report

Batch: 83 — AZ-422 (FT-P-17 + FT-N-06 mid-flight tile generation + freshness) Date: 2026-05-17 Verdict: PASS_WITH_WARNINGS

Files Reviewed

Created:

e2e/runner/helpers/mid_flight_tile_evaluator.py
e2e/runner/helpers/mock_suite_sat_audit.py
e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py
e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py
e2e/_unit_tests/helpers/test_mid_flight_tile_evaluator.py
e2e/_unit_tests/helpers/test_mock_suite_sat_audit.py

Modified:

e2e/_unit_tests/test_directory_layout.py (registered 4 new paths)

Findings

#	Severity	Category	File:Line	Title
1	Low	Maintainability	`e2e/runner/helpers/gcs_telemetry_evaluator.py` (carry-over)	Duplicate haversine helper not consolidated to `geo.distance_m`
2	Low	Spec-Gap	`e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py`, `e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py`	Tests depend on upstream production + fixture-builder features that don't exist yet

Finding Details

F1: Duplicate haversine helper not consolidated to geo.distance_m (Low / Maintainability — carry-over)

Location: e2e/runner/helpers/gcs_telemetry_evaluator.py
Description: Batch 81 introduced a private haversine function inside gcs_telemetry_evaluator.py for search-region-shift distance math. runner.helpers.geo.distance_m is the project-wide Vincenty helper (used by this batch's mid_flight_tile_evaluator.py for dedup). Two helpers, two algorithms, same intent.
Suggestion: Migrate gcs_telemetry_evaluator.py to geo.distance_m in a dedicated refactor batch (≤1 point). Out of scope for AZ-422 — would expand the diff into a helper already shipped and reviewed.
Task: Carry-over from batches 79–81 cumulative review.

F2: Tests depend on upstream production + fixture-builder features that don't exist yet (Low / Spec-Gap)

Location: e2e/tests/positive/test_ft_p_17_mid_flight_tiles.py, e2e/tests/negative/test_ft_n_06_mid_flight_freshness.py
Description: Both scenarios require:
- SUT writing mid-flight-tile-output FDR records with the Mode B Fact #105 schema (production, owned by epic E-OBC / Mode B work — outside the test harness).
- SUT emitting tile-load-rejected FDR records with reason="stale" from the freshness gate (same).
- simulate_landing() MAVLink command or equivalent public-input mechanism that triggers landing-time tile upload (production, public-input).
- mock-suite-sat-service audit endpoint (already exists in e2e/fixtures/mock-suite-sat/app.py).
- Fixture builder support for the FT_P_17_HIGH_QUALITY_WINDOW_S env var, computed from segment-quality FDR records (AZ-595).
- Fixture builder support for AZ-422 5-min Derkachi replay scenario.
The tests skip cleanly when sitl_replay_ready is false (consistent with FT-P-12/13/15/16/18) and fail loudly when the fixture exists but the required records are missing — adhering to the "tests as gates" principle.
Suggestion: Surface as a single line in the AZ-422 batch report under Production Dependencies. No code change in this batch.
Task: AZ-422.

Phase 1: Context Loading

Read AZ-422 task spec (_docs/02_tasks/todo/AZ-422_ft_p_17_ftn_06_mid_flight_tiles.md). All seven ACs and the SUT boundary statement understood before review.

Phase 2: Spec Compliance

AC	Helper	Scenario assertion	Unit-test coverage
AC-1 (≥1 tile / 3 s high-quality nav frames)	`evaluate_tile_generation_rate`	`assert rate_report.passes`	5 cases (exact pass, under-min fail, zero window, invalid arg, empty list)
AC-2 (Mode B Fact #105 quality fields populated)	`evaluate_tile_quality_metadata`	`assert quality_report.passes`	7 cases (full pass, missing quality field, missing top-level, non-dict quality, empty list, null value, partial drop)
AC-3 (dedup: ±1 m footprint AND ±5 % GSD)	`evaluate_dedup`	`assert dedup_report.passes`	8 cases (dup same centre, far apart pass, different GSD pass, close-GSD dupe, missing GSD skip, empty pass, invalid args raise, 3-tile pair detection)
AC-4 (landing upload HTTP 202 every tile)	`evaluate_upload_acks` + `mock_suite_sat_audit.fetch_audit`	`assert upload_report.passes`	5 cases (all acked pass, missing tile fail, extra audit pass, empty generated fail, malformed-entry skip, non-dict skip) + 11 HTTP-client cases
AC-5 (	capture_utc − generated_at	≤ 60 s)	`evaluate_capture_date_freshness`
AC-6 (no `tile-load-rejected: stale` for fresh tiles)	`evaluate_freshness_gate`	`assert freshness_report.passes`	7 cases (no rejections pass, unrelated rejection pass, fresh stale-rejected fail, non-stale reason ignored, tile_id key variant, non-dict skip, custom stale reason)
AC-7 (parameterized across `(fc_adapter, vio_strategy)`)	conftest fixtures	6 collected variants per scenario = 12 total	—

All ACs satisfied. The @pytest.mark.traces_to("AC-8.4,AC-1,...") markers wire scenarios to AC IDs for the traceability matrix.

Phase 3: Code Quality

SOLID: each evaluator is a pure function over a TileSpec/dict input returning a frozen-dataclass report. mock_suite_sat_audit.fetch_audit is a single-responsibility HTTP wrapper. Test files mirror helper shape (one test file per helper module).
Error handling: all error paths raise ValueError (input validation) or RuntimeError (HTTP / response shape). No bare except or silent swallowing. The except (TypeError, ValueError) in _parse_iso8601_utc_seconds is typed and limited to parsing failure.
Naming: evaluate_* matches sibling helpers (gcs_telemetry_evaluator, tile_cache_inspector); report dataclasses follow <Concern>Report / <Concern>EntryReport naming.
Complexity: longest function evaluate_tile_quality_metadata at ~25 lines; evaluate_dedup at ~25 lines with an O(N²) loop the docstring explicitly documents. All under coderule's 50-line threshold.
DRY: no in-batch duplication. Cross-batch duplication of haversine logic surfaced as F1.
Test quality: every unit test uses Arrange/Act/Assert comments per coderule. Tests assert meaningful behavior (specific drift values, specific failing tile IDs, specific error substrings) rather than "no error thrown".
Dead code: none. _top_level_field_to_attr is a documented forward-compatibility seam (1:1 today, allows future field-name drift handling).

Phase 4: Security Quick-Scan

No SQL, no string-interpolated queries (FDR is JSON file iteration).
No subprocess(... shell=True), no exec, no eval.
No hardcoded secrets; the HTTP client takes a base URL from mock_suite_sat_url fixture.
Input validation: fetch_audit validates base_url non-empty, run_id non-empty, HTTP 2xx, JSON object body, entries list shape.
No sensitive data in error messages; HTTP error message truncates body to 200 chars.
No insecure deserialization — JSON parsed via httpx.Response.json() (stdlib json.loads underneath); shape checked post-parse.

Phase 5: Performance Scan

evaluate_dedup is O(N²) — explicitly documented in the docstring as acceptable for <100 tiles per 5-min replay. For longer flights a spatial index (KD-tree) would be needed; out of scope for AZ-422.
fetch_audit is one-shot HTTP GET with 10 s timeout and no retries — appropriate for a co-located mock in the compose harness.
FDR iteration in scenarios uses fdr_reader.iter_records (generator).
No N+1 patterns, no unbounded fetching beyond the SUT's tile count.

Phase 6: Cross-Task Consistency

TileSpec mirrors e2e/fixtures/mock-suite-sat/app.py's TilePublishRequest + TileQualityMetadata (Mode B Fact #105). Constants TILE_REQUIRED_TOP_LEVEL_FIELDS and TILE_REQUIRED_QUALITY_FIELDS make the contract explicit.
Scenario skip pattern (if not sitl_replay_ready: pytest.skip(...)) matches FT-P-12/13/15/16/18.
httpx HTTP client matches the dependency already pinned for sibling helpers.
geo.distance_m reuse rather than re-implementing Vincenty/haversine in this helper.
Test fixture imports (evidence_dir, run_id, nfr_recorder, fc_adapter, vio_strategy, mock_suite_sat_url, sitl_replay_ready) match the conftest signature used by sibling scenarios.

Phase 7: Architecture Compliance

All new files live under e2e/ — owned exclusively by the Blackbox Tests component per _docs/02_document/module-layout.md.
No imports from src/gps_denied_onboard — explicit "public-boundary discipline" note at top of mid_flight_tile_evaluator.py; verified by reading every import.
No layering violations.
No new cyclic module dependencies (the helper imports from .geo only; tests import from runner.helpers.*).
No duplicate symbols across components (the carry-over duplicate haversine is intra-component, tracked in F1).
No cross-cutting concerns re-implemented locally; HTTP client, JSON parsing, and geo math all delegate to shared dependencies.

Verdict

0 Critical, 0 High → no FAIL trigger.
2 Low (one carry-over, one production-dependency surface) → PASS_WITH_WARNINGS.

Batch 83 is ready to commit. The two Low findings are surfaced for batch report and feed forward into the next cumulative review (batches 82–84) without blocking.

9.5 KiB Raw Blame History Unescape Escape