FT-P-12: parse mavproxy-listener tlog over a 60 s Derkachi replay and assert SUT->GCS GLOBAL_POSITION_INT cadence lands in [1, 2] Hz (AC-6.1). FT-P-13: inject `RELOC:<lat>,<lon>,<radius_m>` STATUSTEXT while the SUT is in dead_reckoned; verify FDR `c8.gcs.operator_command` ack <=2s, `anchor_search_region` centre shifts toward the hint, and no BAD_SIGNATURE / UNAUTHORIZED / REJECTED STATUSTEXT lands in the post-inject window (AC-6.2). Adds runner.helpers.gcs_telemetry_evaluator (rate, hint-ack correlation, haversine search-region shift, rejection scan) and sitl_observer.capture_gcs_tlog (parity surface to capture_ap_tlog). Pure-logic coverage: 39 new unit tests; full e2e/_unit_tests/ suite 746 passing (was 700). Scenarios skip locally on missing SITL replay fixture; production hooks (inbound STATUSTEXT parser, anchor_search_region FDR emitter) tracked outside this task. See _docs/03_implementation/batch_81_report.md + reviews/batch_81_review.md. Co-authored-by: Cursor <cursoragent@cursor.com>
10 KiB
Code Review Report
Batch: 81 — AZ-420 (FT-P-12 GCS downsample + FT-P-13 GCS command path) Date: 2026-05-17 Verdict: PASS_WITH_WARNINGS
Findings
| # | Severity | Category | File:Line | Title |
|---|---|---|---|---|
| 1 | Low | Scope | e2e/runner/helpers/gcs_telemetry_evaluator.py |
HintAckReport.passes returns False for empty hints |
| 2 | Low | Maintainability | e2e/tests/positive/test_ft_p_13_gcs_command.py:114 |
FDR records loaded twice if regions list is long |
Finding Details
F1: HintAckReport.passes returns False when no hints supplied (Low / Scope)
- Location:
e2e/runner/helpers/gcs_telemetry_evaluator.py:205-210 - Description:
passesreturnsFalseif the hint list is empty. The scenario test pre-checksif not hints: pytest.fail(...)before callingcorrelate_hint_acks, so this branch is never reached in practice. But a future caller could be surprised — "no hints = trivially pass" is arguably the more defensible default for a pure evaluator. - Suggestion: leave as-is; the explicit upstream
pytest.failis cleaner than overloading the evaluator's semantics. Documented in the dataclass docstring. - Task: AZ-420
F2: FDR record loop appends to two lists in one pass (Low / Maintainability)
- Location:
e2e/tests/positive/test_ft_p_13_gcs_command.py:117-137 - Description: The test walks the FDR archive once and appends to
both
acksandregions. The if/elif keeps the walk O(n), but the branch ordering makes the test harder to scan when a future contributor adds a third record type. - Suggestion: defer until a third record type is needed; splitting prematurely adds two loops for no current benefit.
- Task: AZ-420
Findings Sweep
Phase 1 — Context Loading
Read AZ-420 spec, project restrictions, module-layout, blackbox-tests
docs (FT-P-12 / FT-P-13 sections), and the previously implemented
templates (test_ft_p_02_derkachi_drift.py, test_ft_p_09_ap_signing.py)
to inventory the test patterns and fixture surface. Reviewed
mavlink_gcs_adapter.py to understand the SUT's outbound summary
shape (GLOBAL_POSITION_INT + NAMED_VALUE_FLOAT) — only the
position message is counted for AC-6.1 to avoid double-counting the
decorative companion.
Phase 2 — Spec Compliance (AC trace)
-
AC-1 (FT-P-12 GCS rate ∈
[1, 2]Hz) ✓- Scenario:
test_ft_p_12_gcs_downsamplecallscompute_gcs_summary_rateand assertsreport.passes. - Pure-logic coverage: 10 tests in
test_gcs_telemetry_evaluator.py(window bounds, boundary 1.0/2.0/inclusive, single-message degeneracy, identical-timestamps, non-position filtering, custom bounds, invalid bounds →ValueError).
- Scenario:
-
AC-2 (FT-P-13 hint ack ≤ 2 s via FDR) ✓
- Scenario:
test_ft_p_13_gcs_commandcallscorrelate_hint_acksand assertsack_report.passes. - Pure-logic coverage: 6 tests (single-hint single-ack, multi-hint
greedy pairing, ack-before-hint ignored, latency exactly at
boundary, missing ack →
passes = False, empty hints).
- Scenario:
-
AC-3 (FT-P-13 search prior bias) ✓
- Scenario:
test_ft_p_13_gcs_commandcallsevaluate_search_region_shiftagainstanchor_search_regionFDR records and assertsshift_report.passes. - Pure-logic coverage: 5 shift tests + 3 haversine sanity tests (no pre-hint region, no post-hint region, shift toward hint, drift away from hint, equal distance — non-strict comparison documented).
- Scenario:
-
AC-4 (FT-P-13 no rejection) ✓
- Scenario:
test_ft_p_13_gcs_commandcallsdetect_hint_rejectionand assertsrejection_report.passes. - Pure-logic coverage: 7 tests (no STATUSTEXT, rejection
inside window, rejection outside window, case-insensitive
token match, BAD_SIGNATURE / UNAUTHORIZED / REJECTED tokens,
invalid
window_us→ValueError).
- Scenario:
-
AC-5 (parameterisation) ✓
pytest --collect-onlyconfirms 6 param IDs per scenario:[ardupilot|inav]-[okvis2|klt_ransac|vins_mono]. Both tests acceptfc_adapter+vio_strategyfixtures via conftest.
Phase 3 — Code Quality
- SRP:
gcs_telemetry_evaluator.pyowns four independent evaluators (compute_gcs_summary_rate,correlate_hint_acks,evaluate_search_region_shift,detect_hint_rejection) + two tlog→DTO adapters (extract_inbound_hints,parse_reloc_payload). Each function has one reason to change. ✓ - No silent error suppression: invalid bounds raise
ValueErrorwith a message naming the offending value (min_required_hz must be ≥0, got -1); malformed payload parses raiseValueErrorwith the raw text (hint payload must have 3 comma-separated fields...); ack correlation has no try/except. ✓ - No code comments narrating mechanics; only docstrings + a one-line comment on the greedy-pairing intent ("keep moving forward to find the last pre-hint"). Tests use AAA pattern. ✓
- Function complexity: longest is
evaluate_search_region_shiftat 35 lines including the dataclass-construction tail. All under the 50-line / cyclomatic-10 threshold. ✓ - Naming:
inject_timestamp_us,ack_timestamp_us,distance_after_m,passes— units are in the names; nodata/item/candidatevagueness. ✓
Phase 4 — Security Quick-Scan
- No SQL, no
shell=True, noeval, noexec. ✓ - No hardcoded secrets; no API keys. ✓
- Input validation:
parse_reloc_payloadvalidates field count and float parsing before returning;compute_gcs_summary_ratevalidates rate bounds;detect_hint_rejectionvalidateswindow_us > 0. ✓ - No sensitive data in logs (no log statements in helper). ✓
Phase 5 — Performance
compute_gcs_summary_rate: O(n) over messages, one materialisation intotimestamps. ✓correlate_hint_acks: O(n log n) ack sort + single linear pass with greedy cursor. ✓evaluate_search_region_shift: O(r) single pass over regions. ✓detect_hint_rejection: O(m) single pass over messages with early filter onmsg_type. ✓- No blocking I/O in async contexts (no async here). ✓
collect_messages_to_listmaterialises the tlog iterator once; scenarios then run 3 analyzers over the result without re-parsing — same pattern asap_contract_evaluator. ✓
Phase 6 — Cross-Task Consistency
capture_gcs_tlogmirrorscapture_ap_tlogexactly: same signature(host: str, duration_s: float) -> Path, same env-var resolution (E2E_SITL_REPLAY_DIR), same RuntimeError messaging pattern, sameduration_s > 0precondition. ✓traces_tomarker format matches FT-P-09 / FT-P-02 conventions (AC-6.1,AC-1,AC-5— top-level NFR + per-AC IDs comma-separated). ✓- Fixture naming follows
<artifact>_<host>.tlog(matches existingap_tlog_<host>.tlognext to it). ✓
Phase 7 — Architecture Compliance
Inputs: _docs/02_document/module-layout.md (Blackbox Tests owns
e2e/**); changed files all under e2e/.
- Layer direction: all imports inside
e2e/referencerunner.helpers.*(same component). No imports ofsrc/gps_denied_onboard.*. Verified bytest_no_sut_imports.py(PASS). ✓ - Public API respect:
gcs_telemetry_evaluatorimports onlyrunner.helpers.mavproxy_tlog_reader.TlogMessage(a sibling helper); scenario tests import only fromrunner.helpers.*and stdlib. No cross-component imports. ✓ - No new cyclic dependencies:
gcs_telemetry_evaluator→mavproxy_tlog_reader; no back-edge from reader to evaluator. Scenario tests are leaf modules (nothing imports them). ✓ - Duplicate symbols: no class/function/constant in the new
helper duplicates an existing symbol anywhere in
e2e/.compute_gcs_summary_rateis the GCS-summary-rate counterpart toap_contract_evaluator.compute_gps_input_ratebut is named differently and operates on a distinct message type (GLOBAL_POSITION_INTvs.GPS_INPUT). ✓ - Cross-cutting concerns: haversine math is local to this helper. Project does not yet have a shared geo-math module; one helper instance is acceptable until a second consumer appears (e.g. FT-N-04 spoof detection might want it). Noted for future refactor; not flagged as a finding. ✓
Production Dependencies (forward-look)
FT-P-13 (AC-2 / AC-3) transitively depends on two production capabilities that are documented as deferred work:
- Inbound STATUSTEXT command parser in
c8_fc_adapter/mavlink_gcs_adapter.py(currently emits but does not consume). The C12MavlinkOperatorCommandTransportconcrete implementation is a Protocol-only stub today. anchor_search_regionFDR record emitted by the C2 backbone per nav-camera frame. The FDR schema (AC-NEW-3 family) reserves the slot, but no producer wires it yet.
Both gaps are tracked outside AZ-420 — the test is shaped so it
exercises these capabilities the moment they land, and skips
cleanly (via sitl_replay_ready) or fails loudly (via the
explicit pytest.fail on empty hint list / empty FDR archive)
otherwise. This is the "tests as gates" pattern endorsed by the
implement skill.
Regression Gate
Full e2e/_unit_tests/ suite: 746 passed in 147.57 s, single run,
no flakes. Up from 700 (batch 80 baseline) by +46:
- +39 in new
test_gcs_telemetry_evaluator.py(10 rate, 6 ack-corr, 3 haversine, 5 shift, 7 rejection, 4 extract-hints, 3 parse-payload, 1 collect_messages_to_list). - +4 in
test_sitl_observer.py(capture_gcs_tloghappy path + missing env + missing fixture + zero/negative duration). - +2 in
test_directory_layout.py(new helper module + 2 new scenario tests under positive/). - +1 net from a
test_no_sut_imports.pywalk that now covers the new helper.
No tests removed; no tests skipped under normal CI execution; the
two new scenarios skip locally because E2E_SITL_REPLAY_DIR is
unset, which is the intended container-vs-host boundary.