Implement all 11 `sitl_observer` public surfaces as an offline
FDR-replay strategy (reads JSON fixtures under `${E2E_SITL_REPLAY_DIR}`
instead of live pymavlink/yamspy). Replace 12 per-scenario
`_harness_helpers_implemented` probes with one shared session-scoped
`sitl_replay_ready` fixture in `e2e/tests/conftest.py`.
Net: -636 LoC of duplicated scenario gating, +17 LoC shared fixture,
+38 new unit tests (596 total, up from 558). Includes K=3 cumulative
review for batches 73-75 (PASS).
Co-authored-by: Cursor <cursoragent@cursor.com>
8.2 KiB
Cumulative Code Review Report
Batches: 73, 74, 75 (AZ-424, AZ-425, AZ-426, AZ-594, AZ-595) Date: 2026-05-17 Verdict: PASS
Scope
Three consecutive implementation batches that together close the "negative-scenario evaluators + core harness stubs + offline observer strategy" arc:
- Batch 73 (AZ-424 / AZ-425 / AZ-426) — added the three remaining negative-scenario evaluators (outlier tolerance, outage request, blackout-spoof) and their FT-N-01 / FT-N-03 / FT-N-04 scenarios.
- Batch 74 (AZ-594) — turned the three core harness stubs
(
fdr_reader.iter_records,frame_source_replay.replay_video,imu_replay.ImuReplayer.replay) fromNotImplementedErrorinto real implementations. - Batch 75 (AZ-595) — implemented all 11
sitl_observerpublic surfaces as an offline FDR-replay observer reading from${E2E_SITL_REPLAY_DIR}; collapsed 12 per-scenario_harness_helpers_implementedprobes into one sharedsitl_replay_readyfixture.
Cross-Batch Findings
(none)
Cross-Batch Sweep
Spec Compliance
Every AC across the five tickets has at least one unit test plus the
intended scenario-side wiring. Coverage matrices are reproduced in the
per-batch review files (batch_73_review.md § Phase 2,
batch_74_review.md § Phase 2, batch_75_review.md § Phase 2). Spot
checks across the three batches:
- AC numbering is consistent within each ticket and traced via
@pytest.mark.traces_to(...)on every scenario. - The pure-logic ACs (evaluator math, schema parsing, FDR record
enumeration) are unit-tested at module level; the AC-5 / AC-6 "full
scenario passes under the parametrize matrix" rows are gated on the
shared
sitl_replay_readyfixture and will activate the moment the fixture builder lands.
Code Quality (cross-batch)
- Single responsibility holds across the three batches: each new
evaluator owns exactly the AC math it claims (outlier_tolerance,
outage_request, blackout_spoof). Each new helper body owns exactly
one stream-of-records surface (fdr_reader, frame_source_replay,
imu_replay). The offline
sitl_observerowns env-var resolution + fixture-JSON ingestion, with one helper per fixture filename. - No suppressed errors across any batch: all parse paths raise
ValueErrorwith a file pointer; missing inputs raiseFileNotFoundError; the_load_optional_*family insitl_observerfalls back to[]ONLY when the file is genuinely absent — a present-but-malformed file still raises. - AAA comment discipline holds across the three batches: 61 (b73)
- 34 (b74) + 38 (b75) = 133 new unit tests, each tagged with
# Arrange / # Act / # Assert. No vague stubs.
- 34 (b74) + 38 (b75) = 133 new unit tests, each tagged with
- No code comments narrate code; module docstrings explain non-obvious design choices (wire-vs-runner schema rename, OpenCV realtime/non-realtime distinction, offline replay rationale).
- Public boundary holds: every new module imports only stdlib +
cv2(frame_source_replay) + pre-existing internal helpers. Zerofrom gps_denied_onboard ...imports across the three batches.
Security (cross-batch)
- No new secrets, credentials, or network surfaces introduced in any of the three batches. Batch 75 in particular removes the implicit requirement for a live SITL container in scenario unit runs — the runner now reads from on-disk JSON fixtures via a read-only env var.
- No
eval,exec,pickle,subprocess, oryaml.load(unsafe=True)in any new module. - The wire-schema gate in
fdr_reader._parse_envelope(b74) is the cross-batch safety invariant — any SUT schema drift surfaces at parse time, never as silent default-zero records.
Performance (cross-batch)
- All new evaluators / helpers are O(N) in the bounded inputs they
consume (per-window frame count, per-fixture event count, per-CSV
row count). The only O(N log N) step is
fdr_reader.iter_records's multi-file merge-sort, deliberately accepted to give scenarios a monotonic-time guarantee. - No I/O at module-import time across all three batches.
_FdrReplayObserver(b75) caches the parsed payload behind theget_observerfactory so repeat scenario calls for the same (fc_kind, host) do not re-parse JSON.
Cross-Task Consistency
- The three batches are tightly coupled by design:
- b73 scenarios introduced
_harness_helpers_implementedprobes that gated onNotImplementedErrorfromfdr_reader,frame_source_replay,imu_replay, andsitl_observer. - b74 landed three of those four helpers — the probes started
catching
FileNotFoundErrorinstead ofNotImplementedError, and (per the outerexcept Exception: return False) continued to skip cleanly. No batch-73 scenario broke. - b75 landed the fourth helper, then deleted every
_harness_helpers_implementedprobe in favour of the sharedsitl_replay_readyfixture. The skip path is now keyed on a single env var (E2E_SITL_REPLAY_DIR) rather than fourtry / except NotImplementedErrorprobes.
- b73 scenarios introduced
- Skip semantics are now uniform: every scenario that depends on
the FDR-replay path emits a skip message of the form
"FT-X-Y full scenario requiresE2E_SITL_REPLAY_DIRto point at a prepared SITL replay fixture (AZ-595). Pure-logic ACs covered by <unit-test>."The grep-by-scenario inventory inbatch_75_review.mdPhase 6 enumerates the 12 scenarios. - Dataclass field names hold across batches: the b75
EkfDivergenceEvent,GpsHealthSample,ConsistencyCheckEvent,FcGpsState,MspFrameCapture,InavGpsStatefield names match exactly what the b73 evaluators (outlier_tolerance_evaluator,outage_request_evaluator,blackout_spoof_evaluator) and b72 consumers (ap_contract_evaluator,msp_frame_observer) already reference. No consumer required edits in b75. FileNotFoundErroris the cross-batch convention for missing on-disk inputs —accuracy_evaluator,multi_segment_evaluator,mavproxy_tlog_reader,cold_start_evaluator(pre-existing),fdr_reader/frame_source_replay/imu_replay(b74), andsitl_observer._load_required_json(b75) all agree.sleep_fninjection pattern introduced in b74 forFrameSourceReplayer+ImuReplayeris also the pattern used by pre-existing helperstile_cache_builder,age_injector. b75 did not introduce any new sleep paths.
Architecture Compliance
- Module placement unchanged across all three batches. Every new
helper / evaluator lives at
e2e/runner/helpers/<name>.py; every new unit test lives ate2e/_unit_tests/helpers/test_<name>.py. Directory-layout invariant test (test_directory_layout.py) passes unmodified across b73→b75 (the b73 additions registered the three new evaluators; b74/b75 only edited existing registered files). - No
src/gps_denied_onboardimports introduced in any of the three batches. Confirmed bygrep. - Backwards-compatible scenario contract: every public surface
that scenarios called pre-b73 retains the same name + return type
through b75. The only deletions are the local
_harness_helpers_implemented/_NullSink/_NullImuEmitterhelpers that lived inside the scenario files themselves — never part of the public helper API.
Test Results (cumulative)
| Batch | New unit tests | Cumulative pass count | Net delta |
|---|---|---|---|
| 72 (baseline) | — | 460 | — |
| 73 | 61 (14 outlier + 18 outage + 29 blackout-spoof) | 527 | +67 |
| 74 | 34 (14 fdr_reader + 10 frame_source_replay + 10 imu_replay) | 558 | +31 |
| 75 | 38 (sitl_observer end-to-end) | 596 | +38 |
- Full
e2e/_unit_testssuite: 596 passed in 123 s at end of b75. - No new linter errors at any batch boundary.
- The pre-existing collection-time
/e2e-results/evidenceteardown warning persists when scenarios are collected outside docker — it is a known b67 artefact, not caused by any of these three batches.
Cumulative Verdict
PASS. The three batches together (a) close the negative-scenario evaluator coverage, (b) land the three core harness stubs that unblocked every NotImplementedError-gated scenario, and (c) unify the scenario skip pattern behind one env var. No cross-batch inconsistencies, no architectural drift, no security regressions.