Files
gps-denied-onboard/_docs/03_implementation/reviews/cumulative_79_81_review.md
T
Oleksandr Bezdieniezhnykh b0296da911 [AZ-420] Batch 81 housekeeping + cumulative 79-81 review
Archive AZ-420 to done/; add cumulative review for batches 79-81 (PASS,
no new findings); advance autodev state to await batch 82.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 14:48:45 +03:00

9.1 KiB

Cumulative Code Review — Batches 79, 80, 81

Date: 2026-05-17 Verdict: PASS

Covers the arc:

  • Batch 79 / AZ-599 — FT-P-02 Derkachi vertical-slice fixture builder + _common.py extraction (factored out shared helpers from the b78 FT-P-01 builder).
  • Batch 80 / AZ-600 — refactor _common.py + build_p01_fixtures.py
    • build_p02_fixtures.py into a parameterised strategy framework (builder.py).
  • Batch 81 / AZ-420 — FT-P-12 (GCS downsample) + FT-P-13 (GCS command path) scenarios with a new gcs_telemetry_evaluator.py helper + sitl_observer.capture_gcs_tlog.

The three batches together close two unrelated arcs:

  1. Fixture-builder consolidation: the b78 FT-P-01 vertical slice landed as a one-off builder; b79 added a second one-off (FT-P-02) plus an _common.py extraction; b80 then collapsed both into a strategy-pattern framework where future scenarios compose existing strategies. Net result: a single durable builder.py instead of N per-scenario builders.
  2. GCS-leg blackbox coverage: b81 adds the FT-P-12 and FT-P-13 scenarios that exercise the operator-facing telemetry stream and command path — orthogonal to the fixture-builder work and the first scenario pair that probes the C8 QgcTelemetryAdapter.

Cross-Cutting Themes

Convergence on a single helper / evaluator shape

Both b80's builder.py strategies and b81's gcs_telemetry_evaluator.py follow the same convention seen across the wider runner.helpers package:

  1. Pure-logic dataclasses (FixtureBuilderConfig, GcsSummaryRateReport, HintAckReport, SearchRegionShiftReport, HintRejectionReport, InboundHint, FdrCommandAck, SearchRegionRecord) — @dataclass(frozen=True) with a passes property where the dataclass is an evaluation result.
  2. Iterable[TlogMessage] ingress — b81's four evaluators all accept Iterable[TlogMessage] instead of a Sequence so they compose with mavproxy_tlog_reader.iter_messages without materialisation. The scenario test materialises once via collect_messages_to_list and reuses for multiple analyzers (mirrors ap_contract_evaluator.collect_messages_to_list from b78).
  3. No src/gps_denied_onboard importse2e/_unit_tests/ test_no_sut_imports.py walks the entire e2e/ tree and passes across all three batches.

Fixture-naming convention

The artifact naming pattern <artifact>_<host>.tlog is consistent across batches:

  • ap_tlog_<host>.tlog (pre-existing, AP/SITL-side)
  • gcs_tlog_<host>.tlog (b81, GCS-side)

Same env var (E2E_SITL_REPLAY_DIR), same observer surface (sitl_observer.capture_ap_tlog / capture_gcs_tlog), same RuntimeError messaging convention ("RuntimeError: capture_X: env var not set", "fixture not found at PATH").

Strategy-pattern reuse readiness

B80's builder.py factored four ABCs (VideoSource, TlogSource, FdrProjection) + six concrete impls. B81 was the first scenario batch after b80; it consumes no new strategy from builder.py (FT-P-12/13 are runtime tests, not fixture-builder scenarios), so the strategy framework's reuse is still latent. The next builder work (any of FT-P-04/05/07/08/10/11) will be the actual test of b80's refactor — they should each land as a ~30-line config factory, not a new builder module.

Phase-by-Phase Findings

Phase 1 — Context Loading

Read all three batch reports, all three per-batch reviews, and the task specs AZ-599, AZ-600, AZ-420. Mapped changed files to tasks via the batch reports' "Tasks" sections.

Phase 2 — Spec Compliance (cumulative)

All three per-batch reviews already verified ACs against tests. Re-walking AC coverage across the union:

  • AZ-599 (FT-P-02 builder): 7 scenario integration tests (test_sitl_replay_builder_p02.py). Behavior preserved through b80's refactor (same tests now in slimmed file, plus new strategy-level coverage in test_sitl_replay_builder_builder.py).
  • AZ-600 (strategy refactor): 33 strategy-level tests + 6 + 7 scenario tests = 46 total across the three files.
  • AZ-420 (FT-P-12+13): 39 evaluator + adapter unit tests + 2 scenario tests (skip locally; collect 12 param IDs total).

No AC went from covered → uncovered between batches.

Phase 3 — Code Quality (cumulative)

  • SRP across batches: b80 splits orchestration (build_fixtures) from materialisation (each strategy) — clean. B81's gcs_telemetry_evaluator.py keeps four independent evaluator functions in one module — defensible because they all consume the same tlog stream and the same FDR record family; splitting would just shuffle imports.
  • Naming consistency: timestamps end in _us or _ms everywhere; distances end in _m; rates end in _hz; coordinates end in _deg. Units in the name, not in comments. ✓
  • Test AAA pattern: spot-checked 10 random tests across all three batches. All use # Arrange / # Act / # Assert comments per the coding rules. ✓
  • No comment narration of mechanics: only docstrings + one-liners on non-obvious invariants (STATIONARY_Z_ACCEL_MG gravity encoding in b80; the greedy-pairing "keep moving forward to find the last pre-hint" in b81). ✓

Phase 4 — Security Quick-Scan (cumulative)

  • No SQL, no shell=True, no eval/exec. ✓
  • No hardcoded secrets/API keys. ✓
  • subprocess.run in builder.py's run_gps_denied_replay uses a list-form cmd, not shell-interpolation. ✓
  • Tlog/FDR record parsing wraps int() / float() calls with ValueError re-raise messaging on the offending payload (b81's parse_reloc_payload, b80's ImuCsvTlog). ✓

Phase 5 — Performance (cumulative)

  • build_fixtures is bound by subprocess (run_gps_denied_replay) wall-clock, not in-process compute. Per-strategy materialisation is O(n) over input rows/frames. ✓
  • gcs_telemetry_evaluator is all O(n) over messages with a single O(n log n) ack sort. ✓
  • No new blocking I/O introduced. No new time.sleep polling loops. ✓

Phase 6 — Cross-Task Consistency

  • Interface compatibility across batches: b80 removed _common.py; b81 doesn't reference it. The only b81-side import from the fixture-builder area is the implicit fixture file path pattern, which is unchanged. No drift. ✓
  • Duplicate symbols: compute_gcs_summary_rate (b81) and compute_gps_input_rate (pre-existing ap_contract_evaluator) compute the same (N-1)/window rate over different message types — same algorithm, different AC. The fact they're separate functions is correct (they have different domain semantics). If a third "rate over message type X" appears, then a generic helper would be warranted. ✓
  • Shared logging / config: no batch introduced a new logging or config loader; the existing runner.helpers.replay_mode env-var helpers are reused. ✓

Phase 7 — Architecture Compliance

_docs/02_document/module-layout.md declares Blackbox Tests as the sole owner of e2e/**.

  1. Layer direction: all imports across the 23 changed files resolve within e2e/ (runner.helpers.*, e2e.fixtures.*, stdlib, pytest, mavlink/pymavlink, cv2 in the b80 builder). No imports of src/gps_denied_onboard.*. ✓
  2. Public API respect: all cross-helper imports go through module-level public names; no _private symbol crossed a module boundary. ✓
  3. No new cyclic deps: b80's builder.py is a leaf consumed by build_p01_fixtures.py and build_p02_fixtures.py; b81's gcs_telemetry_evaluator.py is a leaf consumed by the two scenario tests. No back-edges. ✓
  4. No duplicate symbols across components: there is only one component touched (Blackbox Tests), so this check is vacuous except internally — covered by Phase 6.
  5. Cross-cutting concerns: haversine math (b81) is local. The project does not yet have a shared geo-math module; if a second consumer appears (e.g., spoof-distance check in FT-N-04), promote to runner/helpers/geo_math.py. Tracked here, not flagged. ✓

Findings

(None new at cumulative scope.)

The two Low/Note findings from batch_81_review.md (HintAckReport.passes empty-hints semantic; FDR record loop branch ordering) remain as documented; both are scoped to b81 alone and neither is exacerbated by viewing b79/b80/b81 jointly. Carry over to the next cumulative cycle if a future batch touches the same files.

Regression Gate (cumulative)

Full e2e/_unit_tests/ suite: 746 passing in 147.57 s.

Baseline history across the arc:

Cumulative window Suite count Δ
Pre-batch-79 ~625
Post-batch-79 686 +61 (FT-P-02 builder scenario tests)
Post-batch-80 700 +14 (strategy reorganisation; finer-grained per-strategy tests)
Post-batch-81 746 +46 (FT-P-12/13 unit + scenario fixtures)

No tests removed without an equivalent replacement (test_common_module_exports_used_by_b01 retired in b80 because the "shared helper" pattern it pinned is now structural rather than file-location-based). No flakes observed across the three full-suite runs that closed each batch.