Archive AZ-420 to done/; add cumulative review for batches 79-81 (PASS, no new findings); advance autodev state to await batch 82. Co-authored-by: Cursor <cursoragent@cursor.com>
9.1 KiB
Cumulative Code Review — Batches 79, 80, 81
Date: 2026-05-17 Verdict: PASS
Covers the arc:
- Batch 79 / AZ-599 — FT-P-02 Derkachi vertical-slice fixture
builder +
_common.pyextraction (factored out shared helpers from the b78 FT-P-01 builder). - Batch 80 / AZ-600 — refactor
_common.py+build_p01_fixtures.pybuild_p02_fixtures.pyinto a parameterised strategy framework (builder.py).
- Batch 81 / AZ-420 — FT-P-12 (GCS downsample) + FT-P-13 (GCS
command path) scenarios with a new
gcs_telemetry_evaluator.pyhelper +sitl_observer.capture_gcs_tlog.
The three batches together close two unrelated arcs:
- Fixture-builder consolidation: the b78 FT-P-01 vertical slice
landed as a one-off builder; b79 added a second one-off (FT-P-02)
plus an
_common.pyextraction; b80 then collapsed both into a strategy-pattern framework where future scenarios compose existing strategies. Net result: a single durablebuilder.pyinstead of N per-scenario builders. - GCS-leg blackbox coverage: b81 adds the FT-P-12 and FT-P-13
scenarios that exercise the operator-facing telemetry stream
and command path — orthogonal to the fixture-builder work and
the first scenario pair that probes the C8
QgcTelemetryAdapter.
Cross-Cutting Themes
Convergence on a single helper / evaluator shape
Both b80's builder.py strategies and b81's
gcs_telemetry_evaluator.py follow the same convention seen across
the wider runner.helpers package:
- Pure-logic dataclasses (
FixtureBuilderConfig,GcsSummaryRateReport,HintAckReport,SearchRegionShiftReport,HintRejectionReport,InboundHint,FdrCommandAck,SearchRegionRecord) —@dataclass(frozen=True)with apassesproperty where the dataclass is an evaluation result. Iterable[TlogMessage]ingress — b81's four evaluators all acceptIterable[TlogMessage]instead of a Sequence so they compose withmavproxy_tlog_reader.iter_messageswithout materialisation. The scenario test materialises once viacollect_messages_to_listand reuses for multiple analyzers (mirrorsap_contract_evaluator.collect_messages_to_listfrom b78).- No
src/gps_denied_onboardimports —e2e/_unit_tests/ test_no_sut_imports.pywalks the entiree2e/tree and passes across all three batches.
Fixture-naming convention
The artifact naming pattern <artifact>_<host>.tlog is consistent
across batches:
ap_tlog_<host>.tlog(pre-existing, AP/SITL-side)gcs_tlog_<host>.tlog(b81, GCS-side)
Same env var (E2E_SITL_REPLAY_DIR), same observer surface
(sitl_observer.capture_ap_tlog / capture_gcs_tlog), same
RuntimeError messaging convention ("RuntimeError: capture_X: env var
not set", "fixture not found at PATH").
Strategy-pattern reuse readiness
B80's builder.py factored four ABCs (VideoSource, TlogSource,
FdrProjection) + six concrete impls. B81 was the first scenario
batch after b80; it consumes no new strategy from builder.py
(FT-P-12/13 are runtime tests, not fixture-builder scenarios), so
the strategy framework's reuse is still latent. The next builder
work (any of FT-P-04/05/07/08/10/11) will be the actual test of
b80's refactor — they should each land as a ~30-line config
factory, not a new builder module.
Phase-by-Phase Findings
Phase 1 — Context Loading
Read all three batch reports, all three per-batch reviews, and the task specs AZ-599, AZ-600, AZ-420. Mapped changed files to tasks via the batch reports' "Tasks" sections.
Phase 2 — Spec Compliance (cumulative)
All three per-batch reviews already verified ACs against tests. Re-walking AC coverage across the union:
- AZ-599 (FT-P-02 builder): 7 scenario integration tests (test_sitl_replay_builder_p02.py). Behavior preserved through b80's refactor (same tests now in slimmed file, plus new strategy-level coverage in test_sitl_replay_builder_builder.py).
- AZ-600 (strategy refactor): 33 strategy-level tests + 6 + 7 scenario tests = 46 total across the three files.
- AZ-420 (FT-P-12+13): 39 evaluator + adapter unit tests + 2 scenario tests (skip locally; collect 12 param IDs total).
No AC went from covered → uncovered between batches.
Phase 3 — Code Quality (cumulative)
- SRP across batches: b80 splits orchestration (
build_fixtures) from materialisation (each strategy) — clean. B81'sgcs_telemetry_evaluator.pykeeps four independent evaluator functions in one module — defensible because they all consume the same tlog stream and the same FDR record family; splitting would just shuffle imports. - Naming consistency: timestamps end in
_usor_mseverywhere; distances end in_m; rates end in_hz; coordinates end in_deg. Units in the name, not in comments. ✓ - Test AAA pattern: spot-checked 10 random tests across all three
batches. All use
# Arrange / # Act / # Assertcomments per the coding rules. ✓ - No comment narration of mechanics: only docstrings + one-liners
on non-obvious invariants (
STATIONARY_Z_ACCEL_MGgravity encoding in b80; the greedy-pairing "keep moving forward to find the last pre-hint" in b81). ✓
Phase 4 — Security Quick-Scan (cumulative)
- No SQL, no
shell=True, noeval/exec. ✓ - No hardcoded secrets/API keys. ✓
subprocess.runinbuilder.py'srun_gps_denied_replayuses a list-formcmd, not shell-interpolation. ✓- Tlog/FDR record parsing wraps
int()/float()calls with ValueError re-raise messaging on the offending payload (b81'sparse_reloc_payload, b80'sImuCsvTlog). ✓
Phase 5 — Performance (cumulative)
build_fixturesis bound by subprocess (run_gps_denied_replay) wall-clock, not in-process compute. Per-strategy materialisation is O(n) over input rows/frames. ✓gcs_telemetry_evaluatoris all O(n) over messages with a single O(n log n) ack sort. ✓- No new blocking I/O introduced. No new
time.sleeppolling loops. ✓
Phase 6 — Cross-Task Consistency
- Interface compatibility across batches: b80 removed
_common.py; b81 doesn't reference it. The only b81-side import from the fixture-builder area is the implicit fixture file path pattern, which is unchanged. No drift. ✓ - Duplicate symbols:
compute_gcs_summary_rate(b81) andcompute_gps_input_rate(pre-existingap_contract_evaluator) compute the same(N-1)/windowrate over different message types — same algorithm, different AC. The fact they're separate functions is correct (they have different domain semantics). If a third "rate over message type X" appears, then a generic helper would be warranted. ✓ - Shared logging / config: no batch introduced a new logging or
config loader; the existing
runner.helpers.replay_modeenv-var helpers are reused. ✓
Phase 7 — Architecture Compliance
_docs/02_document/module-layout.md declares Blackbox Tests as the
sole owner of e2e/**.
- Layer direction: all imports across the 23 changed files
resolve within
e2e/(runner.helpers.*,e2e.fixtures.*, stdlib, pytest, mavlink/pymavlink, cv2 in the b80 builder). No imports ofsrc/gps_denied_onboard.*. ✓ - Public API respect: all cross-helper imports go through
module-level public names; no
_privatesymbol crossed a module boundary. ✓ - No new cyclic deps: b80's
builder.pyis a leaf consumed bybuild_p01_fixtures.pyandbuild_p02_fixtures.py; b81'sgcs_telemetry_evaluator.pyis a leaf consumed by the two scenario tests. No back-edges. ✓ - No duplicate symbols across components: there is only one component touched (Blackbox Tests), so this check is vacuous except internally — covered by Phase 6.
- Cross-cutting concerns: haversine math (b81) is local. The
project does not yet have a shared geo-math module; if a second
consumer appears (e.g., spoof-distance check in FT-N-04), promote
to
runner/helpers/geo_math.py. Tracked here, not flagged. ✓
Findings
(None new at cumulative scope.)
The two Low/Note findings from batch_81_review.md
(HintAckReport.passes empty-hints semantic; FDR record loop branch
ordering) remain as documented; both are scoped to b81 alone and
neither is exacerbated by viewing b79/b80/b81 jointly. Carry over
to the next cumulative cycle if a future batch touches the same
files.
Regression Gate (cumulative)
Full e2e/_unit_tests/ suite: 746 passing in 147.57 s.
Baseline history across the arc:
| Cumulative window | Suite count | Δ |
|---|---|---|
| Pre-batch-79 | ~625 | |
| Post-batch-79 | 686 | +61 (FT-P-02 builder scenario tests) |
| Post-batch-80 | 700 | +14 (strategy reorganisation; finer-grained per-strategy tests) |
| Post-batch-81 | 746 | +46 (FT-P-12/13 unit + scenario fixtures) |
No tests removed without an equivalent replacement
(test_common_module_exports_used_by_b01 retired in b80 because
the "shared helper" pattern it pinned is now structural rather than
file-location-based). No flakes observed across the three full-suite
runs that closed each batch.