Files
gps-denied-onboard/_docs/03_implementation/reviews/batch_81_review.md
T
Oleksandr Bezdieniezhnykh bb744d9078 [AZ-420] Batch 81: FT-P-12 + FT-P-13 GCS scenarios
FT-P-12: parse mavproxy-listener tlog over a 60 s Derkachi replay and
assert SUT->GCS GLOBAL_POSITION_INT cadence lands in [1, 2] Hz (AC-6.1).

FT-P-13: inject `RELOC:<lat>,<lon>,<radius_m>` STATUSTEXT while the SUT
is in dead_reckoned; verify FDR `c8.gcs.operator_command` ack <=2s,
`anchor_search_region` centre shifts toward the hint, and no
BAD_SIGNATURE / UNAUTHORIZED / REJECTED STATUSTEXT lands in the
post-inject window (AC-6.2).

Adds runner.helpers.gcs_telemetry_evaluator (rate, hint-ack correlation,
haversine search-region shift, rejection scan) and
sitl_observer.capture_gcs_tlog (parity surface to capture_ap_tlog).
Pure-logic coverage: 39 new unit tests; full e2e/_unit_tests/ suite
746 passing (was 700). Scenarios skip locally on missing SITL replay
fixture; production hooks (inbound STATUSTEXT parser, anchor_search_region
FDR emitter) tracked outside this task.

See _docs/03_implementation/batch_81_report.md +
reviews/batch_81_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 14:46:08 +03:00

10 KiB

Code Review Report

Batch: 81 — AZ-420 (FT-P-12 GCS downsample + FT-P-13 GCS command path) Date: 2026-05-17 Verdict: PASS_WITH_WARNINGS

Findings

# Severity Category File:Line Title
1 Low Scope e2e/runner/helpers/gcs_telemetry_evaluator.py HintAckReport.passes returns False for empty hints
2 Low Maintainability e2e/tests/positive/test_ft_p_13_gcs_command.py:114 FDR records loaded twice if regions list is long

Finding Details

F1: HintAckReport.passes returns False when no hints supplied (Low / Scope)

  • Location: e2e/runner/helpers/gcs_telemetry_evaluator.py:205-210
  • Description: passes returns False if the hint list is empty. The scenario test pre-checks if not hints: pytest.fail(...) before calling correlate_hint_acks, so this branch is never reached in practice. But a future caller could be surprised — "no hints = trivially pass" is arguably the more defensible default for a pure evaluator.
  • Suggestion: leave as-is; the explicit upstream pytest.fail is cleaner than overloading the evaluator's semantics. Documented in the dataclass docstring.
  • Task: AZ-420

F2: FDR record loop appends to two lists in one pass (Low / Maintainability)

  • Location: e2e/tests/positive/test_ft_p_13_gcs_command.py:117-137
  • Description: The test walks the FDR archive once and appends to both acks and regions. The if/elif keeps the walk O(n), but the branch ordering makes the test harder to scan when a future contributor adds a third record type.
  • Suggestion: defer until a third record type is needed; splitting prematurely adds two loops for no current benefit.
  • Task: AZ-420

Findings Sweep

Phase 1 — Context Loading

Read AZ-420 spec, project restrictions, module-layout, blackbox-tests docs (FT-P-12 / FT-P-13 sections), and the previously implemented templates (test_ft_p_02_derkachi_drift.py, test_ft_p_09_ap_signing.py) to inventory the test patterns and fixture surface. Reviewed mavlink_gcs_adapter.py to understand the SUT's outbound summary shape (GLOBAL_POSITION_INT + NAMED_VALUE_FLOAT) — only the position message is counted for AC-6.1 to avoid double-counting the decorative companion.

Phase 2 — Spec Compliance (AC trace)

  • AC-1 (FT-P-12 GCS rate ∈ [1, 2] Hz) ✓

    • Scenario: test_ft_p_12_gcs_downsample calls compute_gcs_summary_rate and asserts report.passes.
    • Pure-logic coverage: 10 tests in test_gcs_telemetry_evaluator.py (window bounds, boundary 1.0/2.0/inclusive, single-message degeneracy, identical-timestamps, non-position filtering, custom bounds, invalid bounds → ValueError).
  • AC-2 (FT-P-13 hint ack ≤ 2 s via FDR) ✓

    • Scenario: test_ft_p_13_gcs_command calls correlate_hint_acks and asserts ack_report.passes.
    • Pure-logic coverage: 6 tests (single-hint single-ack, multi-hint greedy pairing, ack-before-hint ignored, latency exactly at boundary, missing ack → passes = False, empty hints).
  • AC-3 (FT-P-13 search prior bias) ✓

    • Scenario: test_ft_p_13_gcs_command calls evaluate_search_region_shift against anchor_search_region FDR records and asserts shift_report.passes.
    • Pure-logic coverage: 5 shift tests + 3 haversine sanity tests (no pre-hint region, no post-hint region, shift toward hint, drift away from hint, equal distance — non-strict comparison documented).
  • AC-4 (FT-P-13 no rejection) ✓

    • Scenario: test_ft_p_13_gcs_command calls detect_hint_rejection and asserts rejection_report.passes.
    • Pure-logic coverage: 7 tests (no STATUSTEXT, rejection inside window, rejection outside window, case-insensitive token match, BAD_SIGNATURE / UNAUTHORIZED / REJECTED tokens, invalid window_usValueError).
  • AC-5 (parameterisation) ✓

    • pytest --collect-only confirms 6 param IDs per scenario: [ardupilot|inav]-[okvis2|klt_ransac|vins_mono]. Both tests accept fc_adapter + vio_strategy fixtures via conftest.

Phase 3 — Code Quality

  • SRP: gcs_telemetry_evaluator.py owns four independent evaluators (compute_gcs_summary_rate, correlate_hint_acks, evaluate_search_region_shift, detect_hint_rejection) + two tlog→DTO adapters (extract_inbound_hints, parse_reloc_payload). Each function has one reason to change. ✓
  • No silent error suppression: invalid bounds raise ValueError with a message naming the offending value (min_required_hz must be ≥0, got -1); malformed payload parses raise ValueError with the raw text (hint payload must have 3 comma-separated fields...); ack correlation has no try/except. ✓
  • No code comments narrating mechanics; only docstrings + a one-line comment on the greedy-pairing intent ("keep moving forward to find the last pre-hint"). Tests use AAA pattern. ✓
  • Function complexity: longest is evaluate_search_region_shift at 35 lines including the dataclass-construction tail. All under the 50-line / cyclomatic-10 threshold. ✓
  • Naming: inject_timestamp_us, ack_timestamp_us, distance_after_m, passes — units are in the names; no data / item / candidate vagueness. ✓

Phase 4 — Security Quick-Scan

  • No SQL, no shell=True, no eval, no exec. ✓
  • No hardcoded secrets; no API keys. ✓
  • Input validation: parse_reloc_payload validates field count and float parsing before returning; compute_gcs_summary_rate validates rate bounds; detect_hint_rejection validates window_us > 0. ✓
  • No sensitive data in logs (no log statements in helper). ✓

Phase 5 — Performance

  • compute_gcs_summary_rate: O(n) over messages, one materialisation into timestamps. ✓
  • correlate_hint_acks: O(n log n) ack sort + single linear pass with greedy cursor. ✓
  • evaluate_search_region_shift: O(r) single pass over regions. ✓
  • detect_hint_rejection: O(m) single pass over messages with early filter on msg_type. ✓
  • No blocking I/O in async contexts (no async here). ✓
  • collect_messages_to_list materialises the tlog iterator once; scenarios then run 3 analyzers over the result without re-parsing — same pattern as ap_contract_evaluator. ✓

Phase 6 — Cross-Task Consistency

  • capture_gcs_tlog mirrors capture_ap_tlog exactly: same signature (host: str, duration_s: float) -> Path, same env-var resolution (E2E_SITL_REPLAY_DIR), same RuntimeError messaging pattern, same duration_s > 0 precondition. ✓
  • traces_to marker format matches FT-P-09 / FT-P-02 conventions (AC-6.1,AC-1,AC-5 — top-level NFR + per-AC IDs comma-separated). ✓
  • Fixture naming follows <artifact>_<host>.tlog (matches existing ap_tlog_<host>.tlog next to it). ✓

Phase 7 — Architecture Compliance

Inputs: _docs/02_document/module-layout.md (Blackbox Tests owns e2e/**); changed files all under e2e/.

  1. Layer direction: all imports inside e2e/ reference runner.helpers.* (same component). No imports of src/gps_denied_onboard.*. Verified by test_no_sut_imports.py (PASS). ✓
  2. Public API respect: gcs_telemetry_evaluator imports only runner.helpers.mavproxy_tlog_reader.TlogMessage (a sibling helper); scenario tests import only from runner.helpers.* and stdlib. No cross-component imports. ✓
  3. No new cyclic dependencies: gcs_telemetry_evaluatormavproxy_tlog_reader; no back-edge from reader to evaluator. Scenario tests are leaf modules (nothing imports them). ✓
  4. Duplicate symbols: no class/function/constant in the new helper duplicates an existing symbol anywhere in e2e/. compute_gcs_summary_rate is the GCS-summary-rate counterpart to ap_contract_evaluator.compute_gps_input_rate but is named differently and operates on a distinct message type (GLOBAL_POSITION_INT vs. GPS_INPUT). ✓
  5. Cross-cutting concerns: haversine math is local to this helper. Project does not yet have a shared geo-math module; one helper instance is acceptable until a second consumer appears (e.g. FT-N-04 spoof detection might want it). Noted for future refactor; not flagged as a finding. ✓

Production Dependencies (forward-look)

FT-P-13 (AC-2 / AC-3) transitively depends on two production capabilities that are documented as deferred work:

  • Inbound STATUSTEXT command parser in c8_fc_adapter/mavlink_gcs_adapter.py (currently emits but does not consume). The C12 MavlinkOperatorCommandTransport concrete implementation is a Protocol-only stub today.
  • anchor_search_region FDR record emitted by the C2 backbone per nav-camera frame. The FDR schema (AC-NEW-3 family) reserves the slot, but no producer wires it yet.

Both gaps are tracked outside AZ-420 — the test is shaped so it exercises these capabilities the moment they land, and skips cleanly (via sitl_replay_ready) or fails loudly (via the explicit pytest.fail on empty hint list / empty FDR archive) otherwise. This is the "tests as gates" pattern endorsed by the implement skill.

Regression Gate

Full e2e/_unit_tests/ suite: 746 passed in 147.57 s, single run, no flakes. Up from 700 (batch 80 baseline) by +46:

  • +39 in new test_gcs_telemetry_evaluator.py (10 rate, 6 ack-corr, 3 haversine, 5 shift, 7 rejection, 4 extract-hints, 3 parse-payload, 1 collect_messages_to_list).
  • +4 in test_sitl_observer.py (capture_gcs_tlog happy path + missing env + missing fixture + zero/negative duration).
  • +2 in test_directory_layout.py (new helper module + 2 new scenario tests under positive/).
  • +1 net from a test_no_sut_imports.py walk that now covers the new helper.

No tests removed; no tests skipped under normal CI execution; the two new scenarios skip locally because E2E_SITL_REPLAY_DIR is unset, which is the intended container-vs-host boundary.