# Code Review Report **Batch**: 81 — AZ-420 (FT-P-12 GCS downsample + FT-P-13 GCS command path) **Date**: 2026-05-17 **Verdict**: PASS_WITH_WARNINGS ## Findings | # | Severity | Category | File:Line | Title | |----|----------|---------------|--------------------------------------------------------------------|--------------------------------------------------------| | 1 | Low | Scope | `e2e/runner/helpers/gcs_telemetry_evaluator.py` | `HintAckReport.passes` returns False for empty hints | | 2 | Low | Maintainability | `e2e/tests/positive/test_ft_p_13_gcs_command.py:114` | FDR records loaded twice if regions list is long | ### Finding Details **F1: `HintAckReport.passes` returns False when no hints supplied** (Low / Scope) - Location: `e2e/runner/helpers/gcs_telemetry_evaluator.py:205-210` - Description: `passes` returns `False` if the hint list is empty. The scenario test pre-checks `if not hints: pytest.fail(...)` before calling `correlate_hint_acks`, so this branch is never reached in practice. But a future caller could be surprised — "no hints = trivially pass" is arguably the more defensible default for a pure evaluator. - Suggestion: leave as-is; the explicit upstream `pytest.fail` is cleaner than overloading the evaluator's semantics. Documented in the dataclass docstring. - Task: AZ-420 **F2: FDR record loop appends to two lists in one pass** (Low / Maintainability) - Location: `e2e/tests/positive/test_ft_p_13_gcs_command.py:117-137` - Description: The test walks the FDR archive once and appends to both `acks` and `regions`. The if/elif keeps the walk O(n), but the branch ordering makes the test harder to scan when a future contributor adds a third record type. - Suggestion: defer until a third record type is needed; splitting prematurely adds two loops for no current benefit. - Task: AZ-420 ## Findings Sweep ### Phase 1 — Context Loading Read AZ-420 spec, project restrictions, module-layout, blackbox-tests docs (FT-P-12 / FT-P-13 sections), and the previously implemented templates (`test_ft_p_02_derkachi_drift.py`, `test_ft_p_09_ap_signing.py`) to inventory the test patterns and fixture surface. Reviewed `mavlink_gcs_adapter.py` to understand the SUT's outbound summary shape (`GLOBAL_POSITION_INT` + `NAMED_VALUE_FLOAT`) — only the position message is counted for AC-6.1 to avoid double-counting the decorative companion. ### Phase 2 — Spec Compliance (AC trace) * **AC-1** (FT-P-12 GCS rate ∈ `[1, 2]` Hz) ✓ - Scenario: `test_ft_p_12_gcs_downsample` calls `compute_gcs_summary_rate` and asserts `report.passes`. - Pure-logic coverage: 10 tests in `test_gcs_telemetry_evaluator.py` (window bounds, boundary 1.0/2.0/inclusive, single-message degeneracy, identical-timestamps, non-position filtering, custom bounds, invalid bounds → `ValueError`). * **AC-2** (FT-P-13 hint ack ≤ 2 s via FDR) ✓ - Scenario: `test_ft_p_13_gcs_command` calls `correlate_hint_acks` and asserts `ack_report.passes`. - Pure-logic coverage: 6 tests (single-hint single-ack, multi-hint greedy pairing, ack-before-hint ignored, latency exactly at boundary, missing ack → `passes = False`, empty hints). * **AC-3** (FT-P-13 search prior bias) ✓ - Scenario: `test_ft_p_13_gcs_command` calls `evaluate_search_region_shift` against `anchor_search_region` FDR records and asserts `shift_report.passes`. - Pure-logic coverage: 5 shift tests + 3 haversine sanity tests (no pre-hint region, no post-hint region, shift toward hint, drift away from hint, equal distance — non-strict comparison documented). * **AC-4** (FT-P-13 no rejection) ✓ - Scenario: `test_ft_p_13_gcs_command` calls `detect_hint_rejection` and asserts `rejection_report.passes`. - Pure-logic coverage: 7 tests (no STATUSTEXT, rejection inside window, rejection outside window, case-insensitive token match, BAD_SIGNATURE / UNAUTHORIZED / REJECTED tokens, invalid `window_us` → `ValueError`). * **AC-5** (parameterisation) ✓ - `pytest --collect-only` confirms 6 param IDs per scenario: `[ardupilot|inav]-[okvis2|klt_ransac|vins_mono]`. Both tests accept `fc_adapter` + `vio_strategy` fixtures via conftest. ### Phase 3 — Code Quality * SRP: `gcs_telemetry_evaluator.py` owns four independent evaluators (`compute_gcs_summary_rate`, `correlate_hint_acks`, `evaluate_search_region_shift`, `detect_hint_rejection`) + two tlog→DTO adapters (`extract_inbound_hints`, `parse_reloc_payload`). Each function has one reason to change. ✓ * No silent error suppression: invalid bounds raise `ValueError` with a message naming the offending value (`min_required_hz must be ≥0, got -1`); malformed payload parses raise `ValueError` with the raw text (`hint payload must have 3 comma-separated fields...`); ack correlation has no try/except. ✓ * No code comments narrating mechanics; only docstrings + a one-line comment on the greedy-pairing intent ("keep moving forward to find the last pre-hint"). Tests use AAA pattern. ✓ * Function complexity: longest is `evaluate_search_region_shift` at 35 lines including the dataclass-construction tail. All under the 50-line / cyclomatic-10 threshold. ✓ * Naming: `inject_timestamp_us`, `ack_timestamp_us`, `distance_after_m`, `passes` — units are in the names; no `data` / `item` / `candidate` vagueness. ✓ ### Phase 4 — Security Quick-Scan * No SQL, no `shell=True`, no `eval`, no `exec`. ✓ * No hardcoded secrets; no API keys. ✓ * Input validation: `parse_reloc_payload` validates field count and float parsing before returning; `compute_gcs_summary_rate` validates rate bounds; `detect_hint_rejection` validates `window_us > 0`. ✓ * No sensitive data in logs (no log statements in helper). ✓ ### Phase 5 — Performance * `compute_gcs_summary_rate`: O(n) over messages, one materialisation into `timestamps`. ✓ * `correlate_hint_acks`: O(n log n) ack sort + single linear pass with greedy cursor. ✓ * `evaluate_search_region_shift`: O(r) single pass over regions. ✓ * `detect_hint_rejection`: O(m) single pass over messages with early filter on `msg_type`. ✓ * No blocking I/O in async contexts (no async here). ✓ * `collect_messages_to_list` materialises the tlog iterator once; scenarios then run 3 analyzers over the result without re-parsing — same pattern as `ap_contract_evaluator`. ✓ ### Phase 6 — Cross-Task Consistency * `capture_gcs_tlog` mirrors `capture_ap_tlog` exactly: same signature `(host: str, duration_s: float) -> Path`, same env-var resolution (`E2E_SITL_REPLAY_DIR`), same RuntimeError messaging pattern, same `duration_s > 0` precondition. ✓ * `traces_to` marker format matches FT-P-09 / FT-P-02 conventions (`AC-6.1,AC-1,AC-5` — top-level NFR + per-AC IDs comma-separated). ✓ * Fixture naming follows `_.tlog` (matches existing `ap_tlog_.tlog` next to it). ✓ ### Phase 7 — Architecture Compliance Inputs: `_docs/02_document/module-layout.md` (Blackbox Tests owns `e2e/**`); changed files all under `e2e/`. 1. **Layer direction**: all imports inside `e2e/` reference `runner.helpers.*` (same component). No imports of `src/gps_denied_onboard.*`. Verified by `test_no_sut_imports.py` (PASS). ✓ 2. **Public API respect**: `gcs_telemetry_evaluator` imports only `runner.helpers.mavproxy_tlog_reader.TlogMessage` (a sibling helper); scenario tests import only from `runner.helpers.*` and stdlib. No cross-component imports. ✓ 3. **No new cyclic dependencies**: `gcs_telemetry_evaluator` → `mavproxy_tlog_reader`; no back-edge from reader to evaluator. Scenario tests are leaf modules (nothing imports them). ✓ 4. **Duplicate symbols**: no class/function/constant in the new helper duplicates an existing symbol anywhere in `e2e/`. `compute_gcs_summary_rate` is the GCS-summary-rate counterpart to `ap_contract_evaluator.compute_gps_input_rate` but is named differently and operates on a distinct message type (`GLOBAL_POSITION_INT` vs. `GPS_INPUT`). ✓ 5. **Cross-cutting concerns**: haversine math is local to this helper. Project does not yet have a shared geo-math module; one helper instance is acceptable until a second consumer appears (e.g. FT-N-04 spoof detection might want it). Noted for future refactor; not flagged as a finding. ✓ ## Production Dependencies (forward-look) FT-P-13 (AC-2 / AC-3) transitively depends on two production capabilities that are documented as deferred work: * **Inbound STATUSTEXT command parser** in `c8_fc_adapter/mavlink_gcs_adapter.py` (currently emits but does not consume). The C12 `MavlinkOperatorCommandTransport` concrete implementation is a Protocol-only stub today. * **`anchor_search_region` FDR record** emitted by the C2 backbone per nav-camera frame. The FDR schema (AC-NEW-3 family) reserves the slot, but no producer wires it yet. Both gaps are tracked outside AZ-420 — the test is shaped so it exercises these capabilities the moment they land, and skips cleanly (via `sitl_replay_ready`) or fails loudly (via the explicit `pytest.fail` on empty hint list / empty FDR archive) otherwise. This is the "tests as gates" pattern endorsed by the implement skill. ## Regression Gate Full `e2e/_unit_tests/` suite: **746 passed in 147.57 s**, single run, no flakes. Up from 700 (batch 80 baseline) by +46: * +39 in new `test_gcs_telemetry_evaluator.py` (10 rate, 6 ack-corr, 3 haversine, 5 shift, 7 rejection, 4 extract-hints, 3 parse-payload, 1 collect_messages_to_list). * +4 in `test_sitl_observer.py` (`capture_gcs_tlog` happy path + missing env + missing fixture + zero/negative duration). * +2 in `test_directory_layout.py` (new helper module + 2 new scenario tests under positive/). * +1 net from a `test_no_sut_imports.py` walk that now covers the new helper. No tests removed; no tests skipped under normal CI execution; the two new scenarios skip locally because `E2E_SITL_REPLAY_DIR` is unset, which is the intended container-vs-host boundary.