[AZ-420] Batch 81: FT-P-12 + FT-P-13 GCS scenarios

FT-P-12: parse mavproxy-listener tlog over a 60 s Derkachi replay and assert SUT->GCS GLOBAL_POSITION_INT cadence lands in [1, 2] Hz (AC-6.1).

FT-P-13: inject `RELOC:<lat>,<lon>,<radius_m>` STATUSTEXT while the SUT is in dead_reckoned; verify FDR `c8.gcs.operator_command` ack <=2s, `anchor_search_region` centre shifts toward the hint, and no BAD_SIGNATURE / UNAUTHORIZED / REJECTED STATUSTEXT lands in the post-inject window (AC-6.2).

Adds runner.helpers.gcs_telemetry_evaluator (rate, hint-ack correlation, haversine search-region shift, rejection scan) and sitl_observer.capture_gcs_tlog (parity surface to capture_ap_tlog).

Pure-logic coverage: 39 new unit tests; full e2e/_unit_tests/ suite 746 passing (was 700). Scenarios skip locally on missing SITL replay fixture; production hooks (inbound STATUSTEXT parser, anchor_search_region FDR emitter) tracked outside this task.

See _docs/03_implementation/batch_81_report.md + reviews/batch_81_review.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 13:51:13 +00:00 · 2026-05-17 14:46:08 +03:00
parent 7fb3cb3f34
commit bb744d9078
10 changed files with 1777 additions and 3 deletions
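As a reading aid for the FT-P-12 cadence check named in the commit message, the sketch below shows one plausible way to turn tlog timestamps into an observed rate and test it against the inclusive [1, 2] Hz band. It is not the committed `compute_gcs_summary_rate`: the `Msg` tuple, the function names, and the choice of `(n - 1) / span` as the rate estimate are assumptions made for illustration.

```python
from typing import Iterable, NamedTuple


class Msg(NamedTuple):
    """Hypothetical stand-in for a parsed tlog message."""
    msg_type: str
    timestamp_us: int


def summary_rate_hz(messages: Iterable[Msg],
                    position_msg_type: str = "GLOBAL_POSITION_INT") -> float:
    """Estimate the rate as (intervals / elapsed seconds); assumed formula."""
    # Only the position message is counted; companion messages are ignored.
    stamps = sorted(m.timestamp_us for m in messages
                    if m.msg_type == position_msg_type)
    if len(stamps) < 2:
        return 0.0  # a single message has no measurable cadence
    span_us = stamps[-1] - stamps[0]
    if span_us <= 0:
        return 0.0  # identical timestamps: treat as degenerate
    return (len(stamps) - 1) / (span_us / 1_000_000)


def rate_passes(rate_hz: float, min_hz: float = 1.0, max_hz: float = 2.0) -> bool:
    """Inclusive band check; invalid bounds are rejected loudly."""
    if min_hz < 0:
        raise ValueError(f"min_hz must be >= 0, got {min_hz}")
    if max_hz < min_hz:
        raise ValueError(f"max_hz must be >= min_hz, got {max_hz}")
    return min_hz <= rate_hz <= max_hz
```

With 61 position messages spaced one second apart, the estimate covers 60 intervals over 60 s, so interleaved `NAMED_VALUE_FLOAT` messages do not inflate the count.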
@@ -0,0 +1,194 @@
+# Batch 81 Report — FT-P-12 + FT-P-13 GCS downsample + command path
+
+**Batch**: 81
+**Date**: 2026-05-17
+**Context**: Test implementation (greenfield Step 10 — Implement Tests)
+**Tasks**: AZ-420 (3 cp) — single scenario task covering FT-P-12 + FT-P-13
+**Cycle**: 1
+**Verdict**: COMPLETE — PASS_WITH_WARNINGS (self-reviewed; see `reviews/batch_81_review.md`)
+
+## Summary
+
+Implements the GCS-leg blackbox scenarios under epic AZ-262:
+
+* **FT-P-12** — SUT→GCS summary stream cadence (`AC-6.1`). The C8
+  `QgcTelemetryAdapter` pairs `GLOBAL_POSITION_INT` + `NAMED_VALUE_FLOAT`
+  at the configured `summary_rate_hz`; the test parses the
+  `mavproxy-listener`-captured tlog over a 60 s Derkachi replay and
+  asserts the observed `GLOBAL_POSITION_INT` rate lands in [1, 2] Hz.
+* **FT-P-13** — GCS-originated operator re-loc hint (`AC-6.2`). A
+  `STATUSTEXT` carrying `RELOC:<lat>,<lon>,<radius_m>` is injected
+  while the SUT is in `dead_reckoned`; the SUT must (a) acknowledge
+  via an FDR `c8.gcs.operator_command` record within ≤ 2 s, (b) bias
+  its next `anchor_search_region` toward the hint, (c) not reject
+  the well-formed hint with a security/auth STATUSTEXT.
+
+### AZ-420 — FT-P-12 + FT-P-13 (3 cp)
+
+* **`e2e/runner/helpers/gcs_telemetry_evaluator.py`** (new, 430 lines):
+  pure-logic evaluators sourced from the GCS tlog + FDR archive.
+  * `compute_gcs_summary_rate(messages, *, position_msg_type, ...)` →
+    `GcsSummaryRateReport(observed_rate_hz, passes, ...)` — AC-6.1.
+  * `extract_inbound_hints(messages, *, hint_prefix='RELOC:')` →
+    `list[InboundHint]` — tlog→DTO adapter.
+  * `parse_reloc_payload(hint_text)` → `(lat_deg, lon_deg, radius_m)`.
+  * `correlate_hint_acks(hints, acks)` → `HintAckReport` (AC-2).
+    Greedy injection-order pairing; each ack matches at most one hint.
+  * `evaluate_search_region_shift(regions, hint_inject_us, lat, lon)` →
+    `SearchRegionShiftReport` (AC-3). Compares last pre-hint region
+    centre to first post-hint region centre via haversine distance.
+  * `haversine_distance_m(lat_a, lon_a, lat_b, lon_b)` — great-circle
+    distance on a mean-Earth-radius sphere; error ≪ 1 m for
+    separations under 100 km.
+  * `detect_hint_rejection(messages, inject_us, *, window_us=2e6)` →
+    `HintRejectionReport` (AC-4). Scans STATUSTEXT in the post-inject
+    window for `BAD_SIGNATURE` / `UNAUTHORIZED` / `REJECTED` tokens.
+  * `collect_messages_to_list(messages)` — convenience for the
+    "parse once, run N analyzers" pattern (mirrors
+    `ap_contract_evaluator`).
+
+* **`e2e/runner/helpers/sitl_observer.py`** (edited, +25 lines):
+  adds `capture_gcs_tlog(host, duration_s) -> Path` mirroring
+  `capture_ap_tlog`. Loads the FDR-replay fixture at
+  `${E2E_SITL_REPLAY_DIR}/gcs_tlog_<host>.tlog`. Raises `RuntimeError`
+  on missing env / missing fixture / non-positive duration.
+
+* **`e2e/tests/positive/test_ft_p_12_gcs_downsample.py`** (new,
+  110 lines): full FT-P-12 scenario. Skips when `sitl_replay_ready`
+  is False (no SITL fixture). Parametric across
+  `(fc_adapter, vio_strategy)` via conftest. `traces_to(AC-6.1,AC-1,AC-5)`.
+
+* **`e2e/tests/positive/test_ft_p_13_gcs_command.py`** (new,
+  211 lines): full FT-P-13 scenario. Walks the FDR archive for
+  `c8.gcs.operator_command` ack records + `anchor_search_region`
+  per-frame records. Skips on missing fixture; fails loudly on
+  empty hint list / empty FDR archive so the test cannot silently
+  green-light an unimplemented production path.
+  `traces_to(AC-6.2,AC-2,AC-3,AC-4,AC-5)`.
+
+* **`e2e/_unit_tests/helpers/test_gcs_telemetry_evaluator.py`** (new,
+  39 tests): pure-logic coverage for every evaluator + adapter.
+  Boundary cases include 1.0 / 2.0 Hz inclusive, ack-before-hint
+  ignored, latency exactly at 2 000 ms, no pre-hint region, equal
+  distance (does not pass), BAD_SIGNATURE / UNAUTHORIZED / REJECTED
+  token detection, malformed `RELOC:` payload raises `ValueError`.
+
+* **`e2e/_unit_tests/helpers/test_sitl_observer.py`** (edited, +4 tests):
+  `capture_gcs_tlog` happy path + missing env + missing fixture +
+  zero/negative duration. Mirrors the existing `capture_ap_tlog`
+  test block.
+
+* **`e2e/_unit_tests/test_directory_layout.py`** (edited): registers
+  `runner/helpers/gcs_telemetry_evaluator.py`,
+  `tests/positive/test_ft_p_12_gcs_downsample.py`,
+  `tests/positive/test_ft_p_13_gcs_command.py`.
+
+## Tests
+
+Full `e2e/_unit_tests/` suite: **746 passed in 147.57 s** (baseline
+700 → +46 net). Run via `python -m pytest e2e/_unit_tests/` from
+the workspace root. No flakes, no skips outside the pre-existing
+intentional skips.
+
+Collection check on the two new scenario tests (`pytest
+--collect-only e2e/tests/positive/test_ft_p_12_gcs_downsample.py
+e2e/tests/positive/test_ft_p_13_gcs_command.py`): 12 items collected
+(2 tests × 6 `(fc_adapter, vio_strategy)` combinations each).
+The scenarios skip locally because `E2E_SITL_REPLAY_DIR` is unset —
+which is the intended docker-vs-host boundary; they run inside the
+docker-compose SITL replay harness.
+
+Per-area test counts (this batch):
+
+| File | Tests added |
+|------|-------------|
+| `test_gcs_telemetry_evaluator.py` (new) | 39 |
+| `test_sitl_observer.py` (edited) | 4 |
+| `test_directory_layout.py` (edited) | 3 (path entries) |
+| `test_no_sut_imports.py` (no edit; broader walk) | implicit +1 module covered |
+| **Total** | **+46** |
+
+## Acceptance Criteria Verification
+
+| AC  | Status | Evidence |
+|-----|--------|----------|
+| AC-1 — GCS rate ∈ [1, 2] Hz over 60 s window | ✓ | `test_ft_p_12_gcs_downsample` + 10 `compute_gcs_summary_rate` unit tests (boundary, degeneracy, custom bounds) |
+| AC-2 — FDR ack ≤ 2 s after inject | ✓ | `test_ft_p_13_gcs_command` + 6 `correlate_hint_acks` unit tests |
+| AC-3 — `anchor_search_region` shifts toward hint | ✓ | `test_ft_p_13_gcs_command` + 5 `evaluate_search_region_shift` + 3 `haversine_distance_m` unit tests |
+| AC-4 — No security/auth rejection in window | ✓ | `test_ft_p_13_gcs_command` + 7 `detect_hint_rejection` unit tests |
+| AC-5 — Parameterised per `(fc_adapter, vio_strategy)` | ✓ | `pytest --collect-only` shows 6 param IDs per scenario |
+
+## Code Review Verdict
+PASS_WITH_WARNINGS (no Critical, no High; 2 Low notes — see
+`reviews/batch_81_review.md`).
+
+## Auto-Fix Attempts
+0 (no auto-fix-eligible findings).
+
+## Stuck Agents
+None.
+
+## Notable Decisions
+
+* **`HintAckReport.passes` returns False for empty hints.** The
+  scenario test pre-checks `if not hints: pytest.fail(...)` before
+  calling `correlate_hint_acks`, so the evaluator never observes
+  an empty list in practice. The conservative semantic stays in
+  place: "no hints" is a misuse of the correlator, not a trivial
+  pass. The explicit failure is pushed upstream, where the
+  contextual error message ("the fixture builder must inject at
+  least one operator re-loc hint") is more useful.
+* **AC-3's `passes` requires a strict shift.** A region exactly
+  equidistant before/after the inject is treated as "not biased"
+  (`distance_after_m < distance_before_m` is strict). This matches
+  the spec wording "shifts toward the hinted location": zero
+  movement is not a shift. Documented in
+  `SearchRegionShiftReport.passes`.
+* **Counted `GLOBAL_POSITION_INT` only for AC-6.1, not the
+  `NAMED_VALUE_FLOAT` companion.** The QGC adapter pairs them so
+  counting both would double-count. The position message is the
+  contract-relevant half; the NAMED_VALUE_FLOAT carries the decorative
+  horizontal-uncertainty annotation.
+* **Tests are shaped to fail loudly when the upstream production
+  hooks are missing.** AC-2 requires the C8 adapter to translate an
+  inbound STATUSTEXT into an FDR `c8.gcs.operator_command` record;
+  AC-3 requires the C2 backbone to emit `anchor_search_region` FDR
+  records. Both are deferred work outside AZ-420's scope. The
+  scenario tests skip cleanly when no fixture is present
+  (`sitl_replay_ready=False`) and fail with a specific error when
+  the fixture exists but lacks the expected hint or ack records.
+  This is the "tests as gates" pattern called out in the implement
+  skill.
+
+## Production Dependencies (forward-look)
+
+FT-P-13 transitively depends on:
+
+* **Inbound STATUSTEXT command parser** in
+  `c8_fc_adapter/mavlink_gcs_adapter.py`. Currently the adapter emits
+  but does not consume STATUSTEXT. The C12
+  `MavlinkOperatorCommandTransport` concrete impl is a Protocol-only
+  stub.
+* **`anchor_search_region` FDR record** emitted by the C2 backbone
+  per nav-camera frame. The FDR schema (AC-NEW-3 family) reserves
+  the slot but no producer wires it.
+
+These gaps are surfaced (not silently absorbed) by the scenario
+tests when the fixture builder produces a fixture that lacks the
+expected hint or ack records. They will be picked up by future production
+implementation tasks; AZ-420 owns the test surface only.
+
+## Out of Scope (deferred)
+
+* Spoofed-GPS escalation STATUSTEXT path — owned by FT-N-04 (AZ-426).
+* Operator-reloc-request emission negative-path — owned by FT-N-03
+  (AZ-425).
+* The fixture builder's actual `gcs_tlog_<host>.tlog` synthesis (with
+  `RELOC:` injection + corresponding FDR `c8.gcs.operator_command`
+  ack + `anchor_search_region` records) — owned by AZ-595.
+
+## Next Batch
+
+Batch 82 candidates from `_docs/02_tasks/todo/` (21 tasks remaining):
+AZ-421 (FT-P-14), AZ-422 (FT-P-15), AZ-423 (FT-N-01), AZ-424
+(FT-N-02). Topo-order leader is AZ-421. Pick at next `/autodev`
+invocation per implement-skill rules (≤ 4 tasks, ≤ 20 cp).
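The AC-3 decision recorded in the batch report above (haversine distance from region centre to hint, with a strict `<` comparison) can be sketched as follows. This is an illustrative reconstruction, not the committed helper: the radius constant, the tuple-based arguments, and the name `shifts_toward_hint` are assumptions, and only the `haversine_distance_m` signature is taken from the report.

```python
import math

# Assumed value: the IUGG mean Earth radius. The constant used by the
# committed helper is not shown in the report.
EARTH_MEAN_RADIUS_M = 6_371_008.8


def haversine_distance_m(lat_a: float, lon_a: float,
                         lat_b: float, lon_b: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi_a, phi_b = math.radians(lat_a), math.radians(lat_b)
    d_phi = phi_b - phi_a
    d_lambda = math.radians(lon_b - lon_a)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(d_lambda / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(h) slightly above 1.
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def shifts_toward_hint(pre_centre: tuple, post_centre: tuple, hint: tuple) -> bool:
    """True only if the post-hint centre is strictly closer to the hint.

    Each argument is a hypothetical (lat_deg, lon_deg) pair.
    """
    distance_before_m = haversine_distance_m(*pre_centre, *hint)
    distance_after_m = haversine_distance_m(*post_centre, *hint)
    # Strict comparison: an unchanged distance is not a shift.
    return distance_after_m < distance_before_m
```

For example, `shifts_toward_hint((50.0, 36.0), (50.0, 36.0), (50.1, 36.1))` is `False` because the centre did not move, which is the equal-distance boundary case the unit tests pin down.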
@@ -0,0 +1,216 @@
+# Code Review Report
+
+**Batch**: 81 — AZ-420 (FT-P-12 GCS downsample + FT-P-13 GCS command path)
+**Date**: 2026-05-17
+**Verdict**: PASS_WITH_WARNINGS
+
+## Findings
+
+| #  | Severity | Category      | File:Line                                                          | Title                                                  |
+|----|----------|---------------|--------------------------------------------------------------------|--------------------------------------------------------|
+| 1  | Low      | Scope         | `e2e/runner/helpers/gcs_telemetry_evaluator.py`                    | `HintAckReport.passes` returns False for empty hints   |
+| 2  | Low      | Maintainability | `e2e/tests/positive/test_ft_p_13_gcs_command.py:114`             | FDR record loop appends to two lists in one pass       |
+
+### Finding Details
+
+**F1: `HintAckReport.passes` returns False when no hints supplied** (Low / Scope)
+
+- Location: `e2e/runner/helpers/gcs_telemetry_evaluator.py:205-210`
+- Description: `passes` returns `False` if the hint list is empty.
+  The scenario test pre-checks `if not hints: pytest.fail(...)` before
+  calling `correlate_hint_acks`, so this branch is never reached in
+  practice. But a future caller could be surprised — "no hints =
+  trivially pass" is arguably the more defensible default for a
+  pure evaluator.
+- Suggestion: leave as-is; the explicit upstream `pytest.fail` is
+  cleaner than overloading the evaluator's semantics. Documented in
+  the dataclass docstring.
+- Task: AZ-420
+
+**F2: FDR record loop appends to two lists in one pass** (Low / Maintainability)
+
+- Location: `e2e/tests/positive/test_ft_p_13_gcs_command.py:117-137`
+- Description: The test walks the FDR archive once and appends to
+  both `acks` and `regions`. The if/elif keeps the walk O(n), but
+  the branch ordering makes the test harder to scan when a future
+  contributor adds a third record type.
+- Suggestion: defer until a third record type is needed; splitting
+  prematurely adds two loops for no current benefit.
+- Task: AZ-420
+
+## Findings Sweep
+
+### Phase 1 — Context Loading
+
+Read AZ-420 spec, project restrictions, module-layout, blackbox-tests
+docs (FT-P-12 / FT-P-13 sections), and the previously implemented
+templates (`test_ft_p_02_derkachi_drift.py`, `test_ft_p_09_ap_signing.py`)
+to inventory the test patterns and fixture surface. Reviewed
+`mavlink_gcs_adapter.py` to understand the SUT's outbound summary
+shape (`GLOBAL_POSITION_INT` + `NAMED_VALUE_FLOAT`) — only the
+position message is counted for AC-6.1 to avoid double-counting the
+decorative companion.
+
+### Phase 2 — Spec Compliance (AC trace)
+
+* **AC-1** (FT-P-12 GCS rate ∈ `[1, 2]` Hz) ✓
+  - Scenario: `test_ft_p_12_gcs_downsample` calls
+    `compute_gcs_summary_rate` and asserts `report.passes`.
+  - Pure-logic coverage: 10 tests in `test_gcs_telemetry_evaluator.py`
+    (window bounds, boundary 1.0/2.0/inclusive, single-message
+    degeneracy, identical-timestamps, non-position filtering, custom
+    bounds, invalid bounds → `ValueError`).
+
+* **AC-2** (FT-P-13 hint ack ≤ 2 s via FDR) ✓
+  - Scenario: `test_ft_p_13_gcs_command` calls `correlate_hint_acks`
+    and asserts `ack_report.passes`.
+  - Pure-logic coverage: 6 tests (single-hint single-ack, multi-hint
+    greedy pairing, ack-before-hint ignored, latency exactly at
+    boundary, missing ack → `passes = False`, empty hints).
+
+* **AC-3** (FT-P-13 search prior bias) ✓
+  - Scenario: `test_ft_p_13_gcs_command` calls
+    `evaluate_search_region_shift` against `anchor_search_region` FDR
+    records and asserts `shift_report.passes`.
+  - Pure-logic coverage: 5 shift tests + 3 haversine sanity tests
+    (no pre-hint region, no post-hint region, shift toward hint,
+    drift away from hint, equal distance with the strict `<`
+    comparison documented).
+
+* **AC-4** (FT-P-13 no rejection) ✓
+  - Scenario: `test_ft_p_13_gcs_command` calls
+    `detect_hint_rejection` and asserts
+    `rejection_report.passes`.
+  - Pure-logic coverage: 7 tests (no STATUSTEXT, rejection
+    inside window, rejection outside window, case-insensitive
+    token match, BAD_SIGNATURE / UNAUTHORIZED / REJECTED tokens,
+    invalid `window_us` → `ValueError`).
+
+* **AC-5** (parameterisation) ✓
+  - `pytest --collect-only` confirms 6 param IDs per scenario:
+    `[ardupilot|inav]-[okvis2|klt_ransac|vins_mono]`. Both tests
+    accept `fc_adapter` + `vio_strategy` fixtures via conftest.
+
+### Phase 3 — Code Quality
+
+* SRP: `gcs_telemetry_evaluator.py` owns four independent evaluators
+  (`compute_gcs_summary_rate`, `correlate_hint_acks`,
+  `evaluate_search_region_shift`, `detect_hint_rejection`) + two
+  tlog→DTO adapters (`extract_inbound_hints`, `parse_reloc_payload`).
+  Each function has one reason to change. ✓
+* No silent error suppression: invalid bounds raise `ValueError`
+  with a message naming the offending value (`min_required_hz must
+  be ≥0, got -1`); malformed payload parses raise `ValueError` with
+  the raw text (`hint payload must have 3 comma-separated fields...`);
+  ack correlation has no try/except. ✓
+* No code comments narrating mechanics; only docstrings + a one-line
+  comment on the greedy-pairing intent ("keep moving forward to find
+  the last pre-hint"). Tests use AAA pattern. ✓
+* Function complexity: longest is `evaluate_search_region_shift` at
+  35 lines including the dataclass-construction tail. All under the
+  50-line / cyclomatic-10 threshold. ✓
+* Naming: `inject_timestamp_us`, `ack_timestamp_us`, `distance_after_m`,
+  `passes` — units are in the names; no `data` / `item` / `candidate`
+  vagueness. ✓
+
+### Phase 4 — Security Quick-Scan
+
+* No SQL, no `shell=True`, no `eval`, no `exec`. ✓
+* No hardcoded secrets; no API keys. ✓
+* Input validation: `parse_reloc_payload` validates field count and
+  float parsing before returning; `compute_gcs_summary_rate`
+  validates rate bounds; `detect_hint_rejection` validates
+  `window_us > 0`. ✓
+* No sensitive data in logs (no log statements in helper). ✓
+
+### Phase 5 — Performance
+
+* `compute_gcs_summary_rate`: O(n) over messages, one materialisation
+  into `timestamps`. ✓
+* `correlate_hint_acks`: O(n log n) ack sort + single linear pass
+  with greedy cursor. ✓
+* `evaluate_search_region_shift`: O(r) single pass over regions. ✓
+* `detect_hint_rejection`: O(m) single pass over messages with early
+  filter on `msg_type`. ✓
+* No blocking I/O in async contexts (no async here). ✓
+* `collect_messages_to_list` materialises the tlog iterator once;
+  scenarios then run 3 analyzers over the result without re-parsing —
+  same pattern as `ap_contract_evaluator`. ✓
+
+### Phase 6 — Cross-Task Consistency
+
+* `capture_gcs_tlog` mirrors `capture_ap_tlog` exactly: same
+  signature `(host: str, duration_s: float) -> Path`, same env-var
+  resolution (`E2E_SITL_REPLAY_DIR`), same RuntimeError messaging
+  pattern, same `duration_s > 0` precondition. ✓
+* `traces_to` marker format matches FT-P-09 / FT-P-02 conventions
+  (`AC-6.1,AC-1,AC-5` — top-level NFR + per-AC IDs comma-separated). ✓
+* Fixture naming follows `<artifact>_<host>.tlog` (matches existing
+  `ap_tlog_<host>.tlog` next to it). ✓
+
+### Phase 7 — Architecture Compliance
+
+Inputs: `_docs/02_document/module-layout.md` (Blackbox Tests owns
+`e2e/**`); changed files all under `e2e/`.
+
+1. **Layer direction**: all imports inside `e2e/` reference
+   `runner.helpers.*` (same component). No imports of
+   `src/gps_denied_onboard.*`. Verified by
+   `test_no_sut_imports.py` (PASS). ✓
+2. **Public API respect**: `gcs_telemetry_evaluator` imports only
+   `runner.helpers.mavproxy_tlog_reader.TlogMessage` (a sibling
+   helper); scenario tests import only from `runner.helpers.*`
+   and stdlib. No cross-component imports. ✓
+3. **No new cyclic dependencies**: `gcs_telemetry_evaluator` →
+   `mavproxy_tlog_reader`; no back-edge from reader to evaluator.
+   Scenario tests are leaf modules (nothing imports them). ✓
+4. **Duplicate symbols**: no class/function/constant in the new
+   helper duplicates an existing symbol anywhere in `e2e/`.
+   `compute_gcs_summary_rate` is the GCS-summary-rate counterpart
+   to `ap_contract_evaluator.compute_gps_input_rate` but is named
+   differently and operates on a distinct message type
+   (`GLOBAL_POSITION_INT` vs. `GPS_INPUT`). ✓
+5. **Cross-cutting concerns**: haversine math is local to this
+   helper. Project does not yet have a shared geo-math module; one
+   helper instance is acceptable until a second consumer appears
+   (e.g. FT-N-04 spoof detection might want it). Noted for future
+   refactor; not flagged as a finding. ✓
+
+## Production Dependencies (forward-look)
+
+FT-P-13 (AC-2 / AC-3) transitively depends on two production
+capabilities that are documented as deferred work:
+
+* **Inbound STATUSTEXT command parser** in
+  `c8_fc_adapter/mavlink_gcs_adapter.py` (currently emits but does
+  not consume). The C12 `MavlinkOperatorCommandTransport` concrete
+  implementation is a Protocol-only stub today.
+* **`anchor_search_region` FDR record** emitted by the C2 backbone
+  per nav-camera frame. The FDR schema (AC-NEW-3 family) reserves
+  the slot, but no producer wires it yet.
+
+Both gaps are tracked outside AZ-420 — the test is shaped so it
+exercises these capabilities the moment they land, and skips
+cleanly (via `sitl_replay_ready`) or fails loudly (via the
+explicit `pytest.fail` on empty hint list / empty FDR archive)
+otherwise. This is the "tests as gates" pattern endorsed by the
+implement skill.
+
+## Regression Gate
+
+Full `e2e/_unit_tests/` suite: **746 passed in 147.57 s**, single run,
+no flakes. Up from 700 (batch 80 baseline) by +46:
+
+* +39 in new `test_gcs_telemetry_evaluator.py` (10 rate, 6 ack-corr,
+  3 haversine, 5 shift, 7 rejection, 4 extract-hints, 3 parse-payload,
+  1 collect_messages_to_list).
+* +4 in `test_sitl_observer.py` (`capture_gcs_tlog` happy path +
+  missing env + missing fixture + zero/negative duration).
+* +2 in `test_directory_layout.py` (new helper module + 2 new scenario
+  tests under positive/).
+* +1 net from a `test_no_sut_imports.py` walk that now covers the
+  new helper.
+
+No tests removed; no tests skipped under normal CI execution; the
+two new scenarios skip locally because `E2E_SITL_REPLAY_DIR` is
+unset, which is the intended container-vs-host boundary.
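The ack correlation that the review describes (sort the acks, then one linear pass with a greedy cursor, each ack matching at most one hint) can be sketched as below. This is a minimal illustration under stated assumptions, not the committed `correlate_hint_acks`: the `HintAckPair` type, the function names, and the plain integer timestamp inputs are hypothetical, and the inclusive 2 s bound is inferred from the "≤ 2 s" wording of AC-2.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MAX_ACK_LATENCY_US = 2_000_000  # AC-2 bound: ack no later than 2 s after inject


@dataclass(frozen=True)
class HintAckPair:
    """Hypothetical pairing record; ack_timestamp_us is None when unmatched."""
    inject_timestamp_us: int
    ack_timestamp_us: Optional[int]

    @property
    def within_budget(self) -> bool:
        if self.ack_timestamp_us is None:
            return False
        return self.ack_timestamp_us - self.inject_timestamp_us <= MAX_ACK_LATENCY_US


def pair_hints_with_acks(hint_times_us: Iterable[int],
                         ack_times_us: Iterable[int]) -> List[HintAckPair]:
    """Greedy injection-order pairing: O(n log n) sort plus one linear pass."""
    acks = sorted(ack_times_us)
    cursor = 0
    pairs: List[HintAckPair] = []
    for inject_us in sorted(hint_times_us):
        # Acks that predate this hint cannot acknowledge it; skip them.
        while cursor < len(acks) and acks[cursor] < inject_us:
            cursor += 1
        if cursor < len(acks):
            pairs.append(HintAckPair(inject_us, acks[cursor]))
            cursor += 1  # consume the ack so it matches at most one hint
        else:
            pairs.append(HintAckPair(inject_us, None))
    return pairs


def all_acked_in_time(pairs: List[HintAckPair]) -> bool:
    """Empty input does not pass, mirroring the conservative semantic in F1."""
    return bool(pairs) and all(p.within_budget for p in pairs)
```

Because the cursor only moves forward, an ack that arrives before its hint is never reused, and a hint left without an ack yields an unmatched pair that fails the check.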