mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 22:01:13 +00:00

Files

T

Oleksandr Bezdieniezhnykh 2d6d44af5d [AZ-424] [AZ-425] [AZ-426] Implement negatives set (FT-N-01/03/04)

Adds three pure-logic evaluators + scenarios + unit tests covering the
project's failure-mode robustness ladder (AC-3.1, AC-3.4, AC-3.5,
AC-NEW-8):

* outlier_tolerance_evaluator (AZ-424 / FT-N-01): per-event 50 m drift
  bound + 3-frame covariance-monotonic window over the AZ-408 outlier
  injector's medium-density manifest.
* outage_request_evaluator (AZ-425 / FT-N-03): detects 3+ consecutive
  missing-frame windows; validates OPERATOR_RELOC_REQUEST STATUSTEXT
  arrives at 2 s ±500 ms, dead_reckoned label during outage, and no
  FC EKF divergence.
* blackout_spoof_evaluator (AZ-426 / FT-N-04): eight-AC ladder across
  the 5 s / 15 s / 35 s sub-windows — switch latency, spoof rejection,
  monotonic covariance, honest horiz_accuracy, STATUSTEXT 1-2 Hz,
  35 s escalation thresholds, and recovery gate.

Each scenario is skip-gated on the AZ-441 / AZ-407 / AZ-416 replay /
SITL / mavproxy helpers; unit tests (14 + 18 + 29 = 61) cover the
AC logic today. Full e2e unit-test suite: 527 passed (+67).

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-17 08:26:16 +03:00

12 KiB

Raw Blame History

Code Review Report

Batch: 73 — AZ-424, AZ-425, AZ-426 Date: 2026-05-17 Verdict: PASS

Findings

(none)

Findings Sweep

Phase 1 — Context Loading

Loaded specs AZ-424_ft_n_01_outlier_tolerance.md, AZ-425_ft_n_03_outage_reloc.md, AZ-426_ft_n_04_blackout_spoof.md. Re-read injector surfaces touched by the new evaluators: e2e/fixtures/injectors/outlier.py (manifest.csv schema + OutlierInjectionReport.out_root), e2e/fixtures/injectors/blackout_spoof.py (BlackoutSpoofPlan, BlackoutSpoofSchedule.window_start_ms / window_end_ms, spoofed-GPS cadence + AC-NEW-8 200-500 m delta bounds). Re-read existing fixture wiring in e2e/runner/helpers/injector_fixtures.py to confirm outlier_injection_derkachi and blackout_spoof_derkachi parametrize on density / window_seconds. Re-read the scenario template used in batch 71/72 (tests/positive/test_ft_p_10_smoothing_lookback.py, tests/negative/test_ft_n_02_sharp_turn_failure.py) for the _harness_helpers_implemented gate pattern and the FDR / mavproxy / sitl_observer access conventions.

Phase 2 — Spec Compliance

AZ-424 (FT-N-01)

AC	Coverage	Status
AC-1 (medium-density injection; ≥10 outliers)	`test_constants_match_spec`, `test_evaluate_count_below_minimum_fails`, `test_evaluate_count_at_minimum_passes_count_gate`, scenario assertion via `MIN_OUTLIER_COUNT`	Covered
AC-2 (drift bound ≤50 m per outlier)	`test_evaluate_event_drift_within_budget`, `test_evaluate_event_drift_exceeds_budget_fails`, `test_evaluate_event_missing_neighbour_drift_none`, scenario per-event assertion via `OutlierEventReport.passes_drift`	Covered
AC-3 (covariance monotonic across 3-frame window)	`test_evaluate_event_cov_monotonic_passes`, `test_evaluate_event_cov_decreasing_fails`, `test_evaluate_event_cov_flat_window_passes`, scenario assertion via `passes_covariance`	Covered
AC-4 (parameterization per fc_adapter × vio_strategy)	scenario uses conftest `fc_adapter`/`vio_strategy` fixtures + indirect `outlier_injection_derkachi` (density=medium, seed=0)	Covered
CSV evidence	`test_write_csv_evidence_round_trips`, scenario writes `ft-n-01-{fc_adapter}-{vio_strategy}.csv`	Covered

AZ-425 (FT-N-03)

AC	Coverage	Status
AC-1 (≥3 consecutive missing frames)	`test_detect_no_outage_returns_empty`, `test_detect_run_below_min_length_ignored`, `test_detect_single_outage_window`, `test_detect_multiple_windows`, `test_detect_trailing_outage_window`, scenario assertion via `passes_min_length`	Covered
AC-2 (STATUSTEXT `OPERATOR_RELOC_REQUEST` within 2 s ±500 ms of onset)	`test_statustext_within_tolerance_passes`, `test_statustext_within_tolerance_late_passes`, `test_statustext_too_early_fails`, `test_statustext_too_late_fails`, `test_statustext_missing_fails`, `test_statustext_payload_mismatch_fails`, scenario assertion via `passes_statustext`	Covered
AC-3 (dead_reckoned label during outage)	`test_dead_reckoned_during_window_passes`, `test_dead_reckoned_absent_fails`, scenario assertion via `passes_dead_reckoned`	Covered
AC-4 (no FC EKF divergence event during outage)	`test_ekf_divergence_during_window_fails`, `test_ekf_divergence_outside_window_ignored`, scenario assertion via `passes_ekf`	Covered
AC-5 (parameterization)	scenario uses conftest `fc_adapter`/`vio_strategy` fixtures	Covered
CSV evidence	`test_write_csv_evidence_round_trips`, scenario writes `ft-n-03-{fc_adapter}-{vio_strategy}.csv`	Covered

AZ-426 (FT-N-04)

AC	Coverage	Status
AC-1 (switch latency ≤1 frame OR ≤400 ms)	`test_switch_latency_within_400_ms_passes` (validates `min(400, frame_period_ms)` budget), `test_switch_latency_within_one_frame_passes`, `test_switch_latency_at_one_frame_boundary_passes`, `test_switch_latency_missing_dead_reckoned_fails`, scenario assertion via `switch_latency.passes`	Covered
AC-2 (spoof-rejected events AND no satellite re-anchor inside window)	`test_spoof_rejection_pass`, `test_spoof_rejection_no_events_fails`, `test_spoof_rejection_label_returns_to_satellite_fails`, scenario assertion via `spoof_rejection.passes`	Covered
AC-3 (covariance monotonic)	`test_covariance_monotonic_pass`, `test_covariance_monotonic_decreasing_fails`, scenario assertion via `covariance_monotonic.passes`	Covered
AC-4 (`horiz_accuracy ≥ 0.95 × cov_semi_major_m`)	`test_honest_accuracy_pass`, `test_honest_accuracy_boundary_pass`, `test_honest_accuracy_violation_fails`, scenario assertion via `honest_accuracy.passes`	Covered
AC-5 (`VISUAL_BLACKOUT_IMU_ONLY` rate ∈ [1, 2] Hz)	`test_statustext_rate_pass_at_1hz`, `test_statustext_rate_pass_at_2hz`, `test_statustext_rate_too_slow_fails`, `test_statustext_rate_too_fast_fails`, scenario assertion via `statustext_rate.passes`	Covered
AC-6 (35 s only: cov 100 m → fix_type ≤2D)	`test_escalation_non_35s_window_passes_vacuously`, `test_escalation_35s_ac6_fix_type_degraded_passes`, `test_escalation_35s_ac6_fix_type_not_degraded_fails`, scenario assertion gated on `is_35s` via `escalation.passes_ac6`	Covered
AC-7 (35 s only: cov 500 m OR 30 s duration → `horiz=999`, `VISUAL_BLACKOUT_FAILSAFE` within 500 ms)	`test_escalation_35s_no_crossings_passes` (vacuous on duration-only path), `test_escalation_35s_ac7_horiz_not_999_fails`, scenario assertion gated on `is_35s` via `escalation.passes_ac7`	Covered
AC-8 (recovery gate: ≥10 s stable + consistency check pass)	`test_recovery_gate_pass`, `test_recovery_gate_unstable_fails`, `test_recovery_gate_spoofed_fails`, `test_recovery_gate_no_consistency_check_fails`, `test_recovery_gate_no_recovery_attempt_vacuous_pass`, scenario assertion via `recovery_gate.passes`	Covered
AC-9 (parameterization × 3 windows)	scenario indirect-parametrizes `blackout_spoof_derkachi` over `_WINDOW_LADDER_S = (5.0, 15.0, 35.0)` with ids `["5s", "15s", "35s"]`; conftest `fc_adapter`/`vio_strategy` adds 6 variants = 18 collected items per fc_adapter pair	Covered
CSV evidence	`test_write_csv_evidence_round_trips`, scenario writes `ft-n-04-{window_s}s-{fc_adapter}-{vio_strategy}.csv`	Covered

Phase 3 — Code Quality

Single responsibility: each evaluator is one module with one responsibility — outlier_tolerance_evaluator aggregates per-event AC-2/AC-3 reports; outage_request_evaluator detects outage windows and evaluates AC-1..AC-4 per window; blackout_spoof_evaluator evaluates the AC-1..AC-8 ladder against one BlackoutWindow. None of the three pulls in scenario-specific helpers (drive replay / collect samples) — those live in the scenario test files.
Method naming: per-AC evaluators are named after the AC concern (evaluate_switch_latency, evaluate_spoof_rejection, evaluate_covariance_monotonic, evaluate_honest_accuracy, evaluate_statustext_rate, evaluate_escalation, evaluate_recovery_gate). The aggregate evaluate(...) in each module composes the per-AC reports into a single dataclass.
No suppressed errors: load_outlier_manifest raises on missing file and missing columns; the manifest writer raises naturally on ENOENT; the evaluator helpers raise no exceptions of their own. No bare except, no 2>/dev/null-equivalents.
AAA comment discipline: every test uses `# Arrange / # Act /
Assert`; sections are omitted when not needed (e.g. constant
invariant tests just have # Assert).
Public boundary: confirmed all three evaluators import only from the e2e.runner.helpers.geo symbol (when needed) and dataclasses / stdlib. No from gps_denied_onboard .... Confirmed via grep.

Phase 4 — Security

No new secrets, credentials, or network paths. All three evaluators are pure-logic over already-collected samples / events.
Spoof rejection (AC-2) is the project's primary anti-spoof invariant; the evaluator does not bypass it — it asserts the FDR recorded the rejection AND that the source-label state machine did not silently re-promote to satellite_anchored inside the window.
Honest accuracy (AC-4) ensures the SUT cannot under-report uncertainty to the FC. The evaluator's check is horiz_accuracy ≥ 0.95 × cov_semi_major_m per the spec; we explicitly cover the boundary in test_honest_accuracy_boundary_pass so a future implementation cannot pass by emitting horiz = cov while the spec budget is 0.95 × cov.

Phase 5 — Performance

All three evaluators are O(N) over their input sequences (single pass over estimates, single pass over events, single pass over statustexts). No nested scans beyond the bounded 3-frame window in outlier_tolerance_evaluator.evaluate_event. CSV writes use buffered csv.writer. No file I/O at module import time.

Phase 6 — Cross-Task Consistency

Shared geo.distance_m is the single point-to-point distance helper used by outlier_tolerance_evaluator. Matches the accuracy_evaluator, multi_segment_evaluator, smoothing_evaluator, cold_start_evaluator conventions.
Shared _harness_helpers_implemented skip gate: all three new scenarios use the same probe pattern as test_ft_p_10_*, test_ft_p_11_*, test_ft_n_02_* — NotImplementedError on frame_source_replay, fdr_reader, imu_replay, mavproxy_tlog_reader, sitl_observer collapses to a single pytest.skip(...) with a pointer to the relevant unit test.
Constants centralised inside each module: MIN_OUTLIER_COUNT, DRIFT_BUDGET_M, SWITCH_LATENCY_MS, STATUSTEXT_RATE_*_HZ, ESCALATION_* all sit at the top of their respective modules and are imported as named constants in the unit tests. No magic numbers inline.
Source-label vocabulary: dead_reckoned / satellite_anchored are spelled identically across the three new evaluators and match the prior batches (sharp_turn_detector.ALLOWED_DURING_TURN_LABELS, multi_segment_evaluator, FDR schema in batch 67-68).
STATUSTEXT regex strings: OPERATOR_RELOC_REQUEST (FT-N-03), VISUAL_BLACKOUT_IMU_ONLY (FT-N-04 AC-5), VISUAL_BLACKOUT_FAILSAFE (FT-N-04 AC-7) match the spec verbatim; unit-tested for substring presence + payload mismatch.

Phase 7 — Architecture Compliance

Module placement: all three evaluators live in e2e/runner/helpers/; their unit tests in e2e/_unit_tests/helpers/; their scenarios in e2e/tests/negative/. Consistent with the AZ-406 layout and the directory-layout invariant test (which now lists the three new helpers + three new scenarios).
No src/gps_denied_onboard imports anywhere in the new code. Verified by inspection — the evaluators only consume typed dataclasses populated by the scenario from public-boundary sources (FDR, mavproxy tlog, SITL state, injector manifests).
Scenario gating: each new scenario file uses pytest.skip(...) with an explicit message pointing to the unit test that covers the gated AC logic. This is the established pattern from FT-P-07/08/09/10/11 and FT-N-02 — scenario coverage comes online once the AZ-441 / AZ-407 / AZ-416 leftovers ship.

Test Results

New unit tests: 14 (outlier) + 18 (outage) + 29 (blackout-spoof) = 61 new tests
Plus 6 new entries in the parametrized test_required_path_exists (3 evaluator paths + 3 scenario paths) — counted toward the suite total.
Full e2e/_unit_tests suite: 527 passed in 130 s (previous cumulative: 460 → +67 net).
Scenario collection for the three negative tests: 48 items collect cleanly (parametrized across fc_adapter × vio_strategy × {density | window_seconds}). The session-end /e2e-results/evidence/per-nfr teardown error is the same pre-existing wart documented in batches 69-72 (nfr_recorder hardcoded path; not introduced by this batch).

12 KiB Raw Blame History Unescape Escape