# Cumulative Code Review Report **Batches**: 73, 74, 75 (AZ-424, AZ-425, AZ-426, AZ-594, AZ-595) **Date**: 2026-05-17 **Verdict**: PASS ## Scope Three consecutive implementation batches that together close the "negative-scenario evaluators + core harness stubs + offline observer strategy" arc: * **Batch 73 (AZ-424 / AZ-425 / AZ-426)** — added the three remaining negative-scenario evaluators (outlier tolerance, outage request, blackout-spoof) and their FT-N-01 / FT-N-03 / FT-N-04 scenarios. * **Batch 74 (AZ-594)** — turned the three core harness stubs (`fdr_reader.iter_records`, `frame_source_replay.replay_video`, `imu_replay.ImuReplayer.replay`) from `NotImplementedError` into real implementations. * **Batch 75 (AZ-595)** — implemented all 11 `sitl_observer` public surfaces as an offline FDR-replay observer reading from `${E2E_SITL_REPLAY_DIR}`; collapsed 12 per-scenario `_harness_helpers_implemented` probes into one shared `sitl_replay_ready` fixture. ## Cross-Batch Findings (none) ## Cross-Batch Sweep ### Spec Compliance Every AC across the five tickets has at least one unit test plus the intended scenario-side wiring. Coverage matrices are reproduced in the per-batch review files (`batch_73_review.md` § Phase 2, `batch_74_review.md` § Phase 2, `batch_75_review.md` § Phase 2). Spot checks across the three batches: * AC numbering is consistent within each ticket and traced via `@pytest.mark.traces_to(...)` on every scenario. * The pure-logic ACs (evaluator math, schema parsing, FDR record enumeration) are unit-tested at module level; the AC-5 / AC-6 "full scenario passes under the parametrize matrix" rows are gated on the shared `sitl_replay_ready` fixture and will activate the moment the fixture builder lands. ### Code Quality (cross-batch) * **Single responsibility holds across the three batches**: each new evaluator owns exactly the AC math it claims (outlier_tolerance, outage_request, blackout_spoof). Each new helper body owns exactly one stream-of-records surface (fdr_reader, frame_source_replay, imu_replay). The offline `sitl_observer` owns env-var resolution + fixture-JSON ingestion, with one helper per fixture filename. * **No suppressed errors across any batch**: all parse paths raise `ValueError` with a file pointer; missing inputs raise `FileNotFoundError`; the `_load_optional_*` family in `sitl_observer` falls back to `[]` ONLY when the file is genuinely absent — a present-but-malformed file still raises. * **AAA comment discipline holds across the three batches**: 61 (b73) + 34 (b74) + 38 (b75) = **133 new unit tests**, each tagged with `# Arrange / # Act / # Assert`. No vague stubs. * **No code comments narrate code**; module docstrings explain non-obvious design choices (wire-vs-runner schema rename, OpenCV realtime/non-realtime distinction, offline replay rationale). * **Public boundary holds**: every new module imports only stdlib + `cv2` (frame_source_replay) + pre-existing internal helpers. Zero `from gps_denied_onboard ...` imports across the three batches. ### Security (cross-batch) * **No new secrets, credentials, or network surfaces introduced** in any of the three batches. Batch 75 in particular removes the implicit requirement for a live SITL container in scenario unit runs — the runner now reads from on-disk JSON fixtures via a read-only env var. * **No `eval`, `exec`, `pickle`, `subprocess`, or `yaml.load(unsafe=True)`** in any new module. * The wire-schema gate in `fdr_reader._parse_envelope` (b74) is the cross-batch safety invariant — any SUT schema drift surfaces at parse time, never as silent default-zero records. ### Performance (cross-batch) * All new evaluators / helpers are O(N) in the bounded inputs they consume (per-window frame count, per-fixture event count, per-CSV row count). The only O(N log N) step is `fdr_reader.iter_records`'s multi-file merge-sort, deliberately accepted to give scenarios a monotonic-time guarantee. * No I/O at module-import time across all three batches. * `_FdrReplayObserver` (b75) caches the parsed payload behind the `get_observer` factory so repeat scenario calls for the same (fc_kind, host) do not re-parse JSON. ### Cross-Task Consistency * **The three batches are tightly coupled by design**: - b73 scenarios introduced `_harness_helpers_implemented` probes that gated on `NotImplementedError` from `fdr_reader`, `frame_source_replay`, `imu_replay`, and `sitl_observer`. - b74 landed three of those four helpers — the probes started catching `FileNotFoundError` instead of `NotImplementedError`, and (per the outer `except Exception: return False`) continued to skip cleanly. No batch-73 scenario broke. - b75 landed the fourth helper, then deleted every `_harness_helpers_implemented` probe in favour of the shared `sitl_replay_ready` fixture. The skip path is now keyed on a single env var (`E2E_SITL_REPLAY_DIR`) rather than four `try / except NotImplementedError` probes. * **Skip semantics are now uniform**: every scenario that depends on the FDR-replay path emits a skip message of the form `"FT-X-Y full scenario requires `E2E_SITL_REPLAY_DIR` to point at a prepared SITL replay fixture (AZ-595). Pure-logic ACs covered by ."` The grep-by-scenario inventory in `batch_75_review.md` Phase 6 enumerates the 12 scenarios. * **Dataclass field names hold across batches**: the b75 `EkfDivergenceEvent`, `GpsHealthSample`, `ConsistencyCheckEvent`, `FcGpsState`, `MspFrameCapture`, `InavGpsState` field names match exactly what the b73 evaluators (`outlier_tolerance_evaluator`, `outage_request_evaluator`, `blackout_spoof_evaluator`) and b72 consumers (`ap_contract_evaluator`, `msp_frame_observer`) already reference. No consumer required edits in b75. * **`FileNotFoundError` is the cross-batch convention** for missing on-disk inputs — `accuracy_evaluator`, `multi_segment_evaluator`, `mavproxy_tlog_reader`, `cold_start_evaluator` (pre-existing), `fdr_reader` / `frame_source_replay` / `imu_replay` (b74), and `sitl_observer._load_required_json` (b75) all agree. * **`sleep_fn` injection pattern** introduced in b74 for `FrameSourceReplayer` + `ImuReplayer` is also the pattern used by pre-existing helpers `tile_cache_builder`, `age_injector`. b75 did not introduce any new sleep paths. ### Architecture Compliance * **Module placement unchanged** across all three batches. Every new helper / evaluator lives at `e2e/runner/helpers/.py`; every new unit test lives at `e2e/_unit_tests/helpers/test_.py`. Directory-layout invariant test (`test_directory_layout.py`) passes unmodified across b73→b75 (the b73 additions registered the three new evaluators; b74/b75 only edited existing registered files). * **No `src/gps_denied_onboard` imports** introduced in any of the three batches. Confirmed by `grep`. * **Backwards-compatible scenario contract**: every public surface that scenarios called pre-b73 retains the same name + return type through b75. The only deletions are the local `_harness_helpers_implemented` / `_NullSink` / `_NullImuEmitter` helpers that lived inside the scenario files themselves — never part of the public helper API. ## Test Results (cumulative) | Batch | New unit tests | Cumulative pass count | Net delta | |-------|----------------|------------------------|-----------| | 72 (baseline) | — | 460 | — | | 73 | 61 (14 outlier + 18 outage + 29 blackout-spoof) | 527 | +67 | | 74 | 34 (14 fdr_reader + 10 frame_source_replay + 10 imu_replay) | 558 | +31 | | 75 | 38 (sitl_observer end-to-end) | **596** | +38 | * Full `e2e/_unit_tests` suite: **596 passed in 123 s** at end of b75. * No new linter errors at any batch boundary. * The pre-existing collection-time `/e2e-results/evidence` teardown warning persists when scenarios are collected outside docker — it is a known b67 artefact, not caused by any of these three batches. ## Cumulative Verdict PASS. The three batches together (a) close the negative-scenario evaluator coverage, (b) land the three core harness stubs that unblocked every NotImplementedError-gated scenario, and (c) unify the scenario skip pattern behind one env var. No cross-batch inconsistencies, no architectural drift, no security regressions.