Adds three pure-logic evaluators + scenarios + unit tests covering the project's failure-mode robustness ladder (AC-3.1, AC-3.4, AC-3.5, AC-NEW-8): * outlier_tolerance_evaluator (AZ-424 / FT-N-01): per-event 50 m drift bound + 3-frame covariance-monotonic window over the AZ-408 outlier injector's medium-density manifest. * outage_request_evaluator (AZ-425 / FT-N-03): detects 3+ consecutive missing-frame windows; validates OPERATOR_RELOC_REQUEST STATUSTEXT arrives at 2 s ±500 ms, dead_reckoned label during outage, and no FC EKF divergence. * blackout_spoof_evaluator (AZ-426 / FT-N-04): eight-AC ladder across the 5 s / 15 s / 35 s sub-windows — switch latency, spoof rejection, monotonic covariance, honest horiz_accuracy, STATUSTEXT 1-2 Hz, 35 s escalation thresholds, and recovery gate. Each scenario is skip-gated on the AZ-441 / AZ-407 / AZ-416 replay / SITL / mavproxy helpers; unit tests (14 + 18 + 29 = 61) cover the AC logic today. Full e2e unit-test suite: 527 passed (+67). Co-authored-by: Cursor <cursoragent@cursor.com>
12 KiB
Code Review Report
Batch: 73 — AZ-424, AZ-425, AZ-426 Date: 2026-05-17 Verdict: PASS
Findings
(none)
Findings Sweep
Phase 1 — Context Loading
Loaded specs AZ-424_ft_n_01_outlier_tolerance.md,
AZ-425_ft_n_03_outage_reloc.md, AZ-426_ft_n_04_blackout_spoof.md.
Re-read injector surfaces touched by the new evaluators:
e2e/fixtures/injectors/outlier.py (manifest.csv schema +
OutlierInjectionReport.out_root), e2e/fixtures/injectors/blackout_spoof.py
(BlackoutSpoofPlan, BlackoutSpoofSchedule.window_start_ms / window_end_ms,
spoofed-GPS cadence + AC-NEW-8 200-500 m delta bounds). Re-read existing
fixture wiring in e2e/runner/helpers/injector_fixtures.py to confirm
outlier_injection_derkachi and blackout_spoof_derkachi parametrize
on density / window_seconds. Re-read the scenario template used in
batch 71/72 (tests/positive/test_ft_p_10_smoothing_lookback.py,
tests/negative/test_ft_n_02_sharp_turn_failure.py) for the
_harness_helpers_implemented gate pattern and the FDR / mavproxy /
sitl_observer access conventions.
Phase 2 — Spec Compliance
AZ-424 (FT-N-01)
| AC | Coverage | Status |
|---|---|---|
| AC-1 (medium-density injection; ≥10 outliers) | test_constants_match_spec, test_evaluate_count_below_minimum_fails, test_evaluate_count_at_minimum_passes_count_gate, scenario assertion via MIN_OUTLIER_COUNT |
Covered |
| AC-2 (drift bound ≤50 m per outlier) | test_evaluate_event_drift_within_budget, test_evaluate_event_drift_exceeds_budget_fails, test_evaluate_event_missing_neighbour_drift_none, scenario per-event assertion via OutlierEventReport.passes_drift |
Covered |
| AC-3 (covariance monotonic across 3-frame window) | test_evaluate_event_cov_monotonic_passes, test_evaluate_event_cov_decreasing_fails, test_evaluate_event_cov_flat_window_passes, scenario assertion via passes_covariance |
Covered |
| AC-4 (parameterization per fc_adapter × vio_strategy) | scenario uses conftest fc_adapter/vio_strategy fixtures + indirect outlier_injection_derkachi (density=medium, seed=0) |
Covered |
| CSV evidence | test_write_csv_evidence_round_trips, scenario writes ft-n-01-{fc_adapter}-{vio_strategy}.csv |
Covered |
AZ-425 (FT-N-03)
| AC | Coverage | Status |
|---|---|---|
| AC-1 (≥3 consecutive missing frames) | test_detect_no_outage_returns_empty, test_detect_run_below_min_length_ignored, test_detect_single_outage_window, test_detect_multiple_windows, test_detect_trailing_outage_window, scenario assertion via passes_min_length |
Covered |
AC-2 (STATUSTEXT OPERATOR_RELOC_REQUEST within 2 s ±500 ms of onset) |
test_statustext_within_tolerance_passes, test_statustext_within_tolerance_late_passes, test_statustext_too_early_fails, test_statustext_too_late_fails, test_statustext_missing_fails, test_statustext_payload_mismatch_fails, scenario assertion via passes_statustext |
Covered |
| AC-3 (dead_reckoned label during outage) | test_dead_reckoned_during_window_passes, test_dead_reckoned_absent_fails, scenario assertion via passes_dead_reckoned |
Covered |
| AC-4 (no FC EKF divergence event during outage) | test_ekf_divergence_during_window_fails, test_ekf_divergence_outside_window_ignored, scenario assertion via passes_ekf |
Covered |
| AC-5 (parameterization) | scenario uses conftest fc_adapter/vio_strategy fixtures |
Covered |
| CSV evidence | test_write_csv_evidence_round_trips, scenario writes ft-n-03-{fc_adapter}-{vio_strategy}.csv |
Covered |
AZ-426 (FT-N-04)
| AC | Coverage | Status |
|---|---|---|
| AC-1 (switch latency ≤1 frame OR ≤400 ms) | test_switch_latency_within_400_ms_passes (validates min(400, frame_period_ms) budget), test_switch_latency_within_one_frame_passes, test_switch_latency_at_one_frame_boundary_passes, test_switch_latency_missing_dead_reckoned_fails, scenario assertion via switch_latency.passes |
Covered |
| AC-2 (spoof-rejected events AND no satellite re-anchor inside window) | test_spoof_rejection_pass, test_spoof_rejection_no_events_fails, test_spoof_rejection_label_returns_to_satellite_fails, scenario assertion via spoof_rejection.passes |
Covered |
| AC-3 (covariance monotonic) | test_covariance_monotonic_pass, test_covariance_monotonic_decreasing_fails, scenario assertion via covariance_monotonic.passes |
Covered |
AC-4 (horiz_accuracy ≥ 0.95 × cov_semi_major_m) |
test_honest_accuracy_pass, test_honest_accuracy_boundary_pass, test_honest_accuracy_violation_fails, scenario assertion via honest_accuracy.passes |
Covered |
AC-5 (VISUAL_BLACKOUT_IMU_ONLY rate ∈ [1, 2] Hz) |
test_statustext_rate_pass_at_1hz, test_statustext_rate_pass_at_2hz, test_statustext_rate_too_slow_fails, test_statustext_rate_too_fast_fails, scenario assertion via statustext_rate.passes |
Covered |
| AC-6 (35 s only: cov 100 m → fix_type ≤2D) | test_escalation_non_35s_window_passes_vacuously, test_escalation_35s_ac6_fix_type_degraded_passes, test_escalation_35s_ac6_fix_type_not_degraded_fails, scenario assertion gated on is_35s via escalation.passes_ac6 |
Covered |
AC-7 (35 s only: cov 500 m OR 30 s duration → horiz=999, VISUAL_BLACKOUT_FAILSAFE within 500 ms) |
test_escalation_35s_no_crossings_passes (vacuous on duration-only path), test_escalation_35s_ac7_horiz_not_999_fails, scenario assertion gated on is_35s via escalation.passes_ac7 |
Covered |
| AC-8 (recovery gate: ≥10 s stable + consistency check pass) | test_recovery_gate_pass, test_recovery_gate_unstable_fails, test_recovery_gate_spoofed_fails, test_recovery_gate_no_consistency_check_fails, test_recovery_gate_no_recovery_attempt_vacuous_pass, scenario assertion via recovery_gate.passes |
Covered |
| AC-9 (parameterization × 3 windows) | scenario indirect-parametrizes blackout_spoof_derkachi over _WINDOW_LADDER_S = (5.0, 15.0, 35.0) with ids ["5s", "15s", "35s"]; conftest fc_adapter/vio_strategy adds 6 variants = 18 collected items per fc_adapter pair |
Covered |
| CSV evidence | test_write_csv_evidence_round_trips, scenario writes ft-n-04-{window_s}s-{fc_adapter}-{vio_strategy}.csv |
Covered |
Phase 3 — Code Quality
- Single responsibility: each evaluator is one module with one
responsibility —
outlier_tolerance_evaluatoraggregates per-event AC-2/AC-3 reports;outage_request_evaluatordetects outage windows and evaluates AC-1..AC-4 per window;blackout_spoof_evaluatorevaluates the AC-1..AC-8 ladder against oneBlackoutWindow. None of the three pulls in scenario-specific helpers (drive replay / collect samples) — those live in the scenario test files. - Method naming: per-AC evaluators are named after the AC concern
(
evaluate_switch_latency,evaluate_spoof_rejection,evaluate_covariance_monotonic,evaluate_honest_accuracy,evaluate_statustext_rate,evaluate_escalation,evaluate_recovery_gate). The aggregateevaluate(...)in each module composes the per-AC reports into a single dataclass. - No suppressed errors:
load_outlier_manifestraises on missing file and missing columns; the manifest writer raises naturally on ENOENT; the evaluator helpers raise no exceptions of their own. No bareexcept, no2>/dev/null-equivalents. - AAA comment discipline: every test uses `# Arrange / # Act /
Assert`; sections are omitted when not needed (e.g. constant
invariant tests just have# Assert). - Public boundary: confirmed all three evaluators import only from
the
e2e.runner.helpers.geosymbol (when needed) and dataclasses / stdlib. Nofrom gps_denied_onboard .... Confirmed via grep.
Phase 4 — Security
- No new secrets, credentials, or network paths. All three evaluators are pure-logic over already-collected samples / events.
- Spoof rejection (AC-2) is the project's primary anti-spoof
invariant; the evaluator does not bypass it — it asserts the FDR
recorded the rejection AND that the source-label state machine did
not silently re-promote to
satellite_anchoredinside the window. - Honest accuracy (AC-4) ensures the SUT cannot under-report
uncertainty to the FC. The evaluator's check is
horiz_accuracy ≥ 0.95 × cov_semi_major_mper the spec; we explicitly cover the boundary intest_honest_accuracy_boundary_passso a future implementation cannot pass by emittinghoriz = covwhile the spec budget is0.95 × cov.
Phase 5 — Performance
All three evaluators are O(N) over their input sequences (single
pass over estimates, single pass over events, single pass over
statustexts). No nested scans beyond the bounded 3-frame window in
outlier_tolerance_evaluator.evaluate_event. CSV writes use
buffered csv.writer. No file I/O at module import time.
Phase 6 — Cross-Task Consistency
- Shared
geo.distance_mis the single point-to-point distance helper used byoutlier_tolerance_evaluator. Matches theaccuracy_evaluator,multi_segment_evaluator,smoothing_evaluator,cold_start_evaluatorconventions. - Shared
_harness_helpers_implementedskip gate: all three new scenarios use the same probe pattern astest_ft_p_10_*,test_ft_p_11_*,test_ft_n_02_*—NotImplementedErroronframe_source_replay,fdr_reader,imu_replay,mavproxy_tlog_reader,sitl_observercollapses to a singlepytest.skip(...)with a pointer to the relevant unit test. - Constants centralised inside each module:
MIN_OUTLIER_COUNT,DRIFT_BUDGET_M,SWITCH_LATENCY_MS,STATUSTEXT_RATE_*_HZ,ESCALATION_*all sit at the top of their respective modules and are imported as named constants in the unit tests. No magic numbers inline. - Source-label vocabulary:
dead_reckoned/satellite_anchoredare spelled identically across the three new evaluators and match the prior batches (sharp_turn_detector.ALLOWED_DURING_TURN_LABELS,multi_segment_evaluator, FDR schema in batch 67-68). - STATUSTEXT regex strings:
OPERATOR_RELOC_REQUEST(FT-N-03),VISUAL_BLACKOUT_IMU_ONLY(FT-N-04 AC-5),VISUAL_BLACKOUT_FAILSAFE(FT-N-04 AC-7) match the spec verbatim; unit-tested for substring presence + payload mismatch.
Phase 7 — Architecture Compliance
- Module placement: all three evaluators live in
e2e/runner/helpers/; their unit tests ine2e/_unit_tests/helpers/; their scenarios ine2e/tests/negative/. Consistent with the AZ-406 layout and the directory-layout invariant test (which now lists the three new helpers + three new scenarios). - No
src/gps_denied_onboardimports anywhere in the new code. Verified by inspection — the evaluators only consume typed dataclasses populated by the scenario from public-boundary sources (FDR, mavproxy tlog, SITL state, injector manifests). - Scenario gating: each new scenario file uses
pytest.skip(...)with an explicit message pointing to the unit test that covers the gated AC logic. This is the established pattern from FT-P-07/08/09/10/11 and FT-N-02 — scenario coverage comes online once the AZ-441 / AZ-407 / AZ-416 leftovers ship.
Test Results
- New unit tests: 14 (outlier) + 18 (outage) + 29 (blackout-spoof) = 61 new tests
- Plus 6 new entries in the parametrized
test_required_path_exists(3 evaluator paths + 3 scenario paths) — counted toward the suite total. - Full
e2e/_unit_testssuite: 527 passed in 130 s (previous cumulative: 460 → +67 net). - Scenario collection for the three negative tests: 48 items collect
cleanly (parametrized across
fc_adapter × vio_strategy × {density | window_seconds}). The session-end/e2e-results/evidence/per-nfrteardown error is the same pre-existing wart documented in batches 69-72 (nfr_recorder hardcoded path; not introduced by this batch).