Files
Oleksandr Bezdieniezhnykh 330893be5c [AZ-432] [AZ-433] [AZ-434] [AZ-435] Add NFT-RES-01..04 resilience scenarios
Batch 86: 4 NFT-RES blackbox scenarios + 4 helper evaluators + 74 unit
tests + directory-layout registration.

* AZ-432 NFT-RES-01: 30 s IMU-only fallback drift bound (AC-3.5 + AC-NEW-7);
  two sub-cases (no_imu ≤100m, good_imu_combined_factor ≤50m).
* AZ-433 NFT-RES-02: companion mid-flight reboot (AC-5.2 + AC-5.3); resume
  ≤30s + first-emission accuracy ≤100m.
* AZ-434 NFT-RES-03: 100-iteration Monte Carlo envelope (AC-NEW-4);
  iteration-count + master-seed determinism + envelope ratio ≥0.95.
  Canonical-param by default; E2E_NFT_RES_03_FULL_MATRIX=1 unlocks matrix.
* AZ-435 NFT-RES-04: 35s blackout+spoof escalation ladder (AC-NEW-8);
  AC-1 (cov-2d→fix-degrade ≤500ms) + AC-2 (failsafe→999+STATUSTEXT
  ≤500ms) + AC-ORDER (strict ordering).

Verdict: PASS_WITH_WARNINGS (0 Critical, 0 High, 0 Medium, 5 Low).
F5 documents intentional threshold duplication with blackout_spoof
evaluator (prevents contract drift between FT-N-04 and NFT-RES-04).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 17:09:04 +03:00

9.1 KiB

Code Review Report — Batch 86

Batch: 86 (AZ-432 + AZ-433 + AZ-434 + AZ-435 — Resilience NFTs) Date: 2026-05-17 Verdict: PASS_WITH_WARNINGS

Findings

# Severity Category File:Line Title
1 Low Maintainability e2e/runner/helpers/{imu_fallback_drift,companion_reboot,monte_carlo_envelope,escalation_ladder}_evaluator.py write_csv_evidence duplication continues — now 8 evaluators (carry-over of batch-85 F4)
2 Low Spec-Gap (surfacing) e2e/tests/resilience/test_nft_res_01_imu_only_fallback.py Sub-case (a) "no IMU" needs SUT-side disable-IMU config OR empty IMU stream; production dep on AZ-595
3 Low Spec-Gap (surfacing) e2e/tests/resilience/test_nft_res_02_companion_reboot.py Process-restart observation requires runner-side health probe; AZ-444 owns Tier-2, Tier-1 needs docker-compose healthcheck wiring
4 Low Maintainability e2e/tests/resilience/test_nft_res_0{1,2,3,4}_*.py _resolve_fixture_path duplicated across 4 new scenarios (matches NFT-PERF pattern)
5 Low Maintainability (intentional) e2e/runner/helpers/escalation_ladder_evaluator.py Thresholds (COV_2D_THRESHOLD_M, COV_FAILSAFE_THRESHOLD_M, etc.) deliberately re-defined locally rather than imported from blackout_spoof_evaluator

Finding Details

F1: write_csv_evidence duplication continues (Low / Maintainability — carry-over)

  • Location: imu_fallback_drift_evaluator.py, companion_reboot_evaluator.py, monte_carlo_envelope_evaluator.py, escalation_ladder_evaluator.py
  • Description: Batch-85 surfaced this on 4 evaluators (F4); batch 86 adds 4 more. The pattern is: open file, write header row, write data row(s), close. All 8 evaluators implement it independently.
  • Suggestion: Continued hygiene PBI — extract a _emit_csv(path, header, rows) helper. Not blocking; will be surfaced again in the cumulative review of batches 85-87.
  • Task: AZ-432 / AZ-433 / AZ-434 / AZ-435

F2: Sub-case (a) "no IMU" needs SUT-side disable-IMU path (Low / Spec-Gap — surfacing)

  • Location: e2e/tests/resilience/test_nft_res_01_imu_only_fallback.py
  • Description: AZ-432 sub-case (a) requires the SUT to run with IMU input disabled. The task spec calls out two options: (i) SUT config flag E2E_NFT_RES_01_DISABLE_IMU=1 (or equivalent) propagated to the SUT, or (ii) empty IMU stream from the FC inbound proxy. Neither path is wired in production code today; the fixture builder (AZ-595) is responsible for setting up the variant that produces the captured no-IMU estimates.
  • Suggestion: Track as production dependency for AZ-595 — the SITL replay builder must emit a no_imu sub-case capture using one of those two strategies and label it accordingly in the fixture JSON.
  • Task: AZ-432

F3: Process-restart observation needs runner-side wiring (Low / Spec-Gap — surfacing)

  • Location: e2e/tests/resilience/test_nft_res_02_companion_reboot.py
  • Description: AZ-433 AC-1 (process restarts within ≤5 s) requires the runner to observe the moment the SUT process is back up. On Tier-2, AZ-444 owns the systemd watchdog observation. On Tier-1, the equivalent docker-compose healthcheck transition (unhealthy → healthy) is not yet captured into the fixture JSON. The scenario tolerates process_restarted_monotonic_ms = null (AC-1 fails loud), but the positive path requires the harness.
  • Suggestion: Track as production dependency for AZ-444 (Tier-2) + AZ-595 (Tier-1 docker-compose healthcheck capture).
  • Task: AZ-433

F4: _resolve_fixture_path duplicated across scenarios (Low / Maintainability)

  • Location: test_nft_res_0{1,2,3,4}_*.py
  • Description: Each scenario re-implements the env-var-or-default fixture path resolver (E2E_NFT_RES_NN_FIXTURE → absolute or relative to E2E_SITL_REPLAY_DIR). The NFT-PERF batch (85) introduced the same pattern across 4 scenarios. Now 8 scenarios duplicate it.
  • Suggestion: Future hygiene PBI — extract a runner.helpers.fixture_resolver.resolve(env_var, default_name) utility. Not blocking; isolation per scenario keeps each test self-contained today.
  • Task: AZ-432 / AZ-433 / AZ-434 / AZ-435

F5: Thresholds intentionally re-defined locally (Low / Maintainability — intentional)

  • Location: e2e/runner/helpers/escalation_ladder_evaluator.py:25-40
  • Description: COV_2D_THRESHOLD_M = 100.0, COV_FAILSAFE_THRESHOLD_M = 500.0, DURATION_FAILSAFE_S = 30.0, FIX_TYPE_2D = 2, HORIZ_ACCURACY_FAILSAFE = 999.0, STATUSTEXT_FAILSAFE = "VISUAL_BLACKOUT_FAILSAFE", ESCALATION_LATENCY_MS = 500 are also defined in blackout_spoof_evaluator.py under the same numeric values. The duplication is intentional — NFT-RES-04 and FT-N-04 must continue to fail loud if either evaluator's contract drifts away from the other. A shared constants module would silently propagate one team's edit into both AC suites.
  • Suggestion: NONE. Documented in escalation_ladder_evaluator.py module docstring. If a future cumulative review proposes to "DRY" these, reject and re-cite this finding.
  • Task: AZ-435

Phase Notes

Phase 1 — Context

All 4 task specs read; ACs walked through against helpers + scenarios. Architecture-compliance file (module-layout.md § blackbox_tests) re-read to confirm e2e/** ownership envelope.

Phase 2 — Spec Compliance

Task AC Evidence
AZ-432 AC-1 30 s window imu_fallback_drift_evaluator.BlackoutWindow.window_in_spec + scenario AC-1 assert
AZ-432 AC-2 no-IMU drift ≤ 100 m evaluate_subcase(... SUBCASE_NO_IMU).passes + 2 unit tests + scenario AC-2 assert
AZ-432 AC-3 good-IMU drift ≤ 50 m evaluate_subcase(... SUBCASE_GOOD_IMU).passes + 2 unit tests + scenario AC-3 assert
AZ-432 AC-4 parameterization 6 collected variants per scenario
AZ-433 AC-1 restart trigger ≤ 5 s companion_reboot_evaluator.passes_restart_trigger + 4 unit tests + scenario AC-1 assert
AZ-433 AC-2 resume time ≤ 30 s passes_resume_time + 4 unit tests + scenario AC-2 assert
AZ-433 AC-3 accuracy ≤ 100 m passes_first_emission_accuracy + 5 unit tests + scenario AC-3 assert
AZ-433 AC-4 parameterization 6 collected variants
AZ-434 AC-1 N ≥ 100 monte_carlo_envelope_evaluator.passes_iteration_count + 3 unit tests + scenario AC-1 assert
AZ-434 AC-2 determinism determinism_fingerprint + 4 unit tests + scenario dual-evaluate AC-2 assert
AZ-434 AC-3 envelope ≥ 95 % passes_envelope + 6 unit tests + scenario AC-3 assert
AZ-434 AC-4 parameterization Canonical-only by default; E2E_NFT_RES_03_FULL_MATRIX=1 unlocks full 6 variants
AZ-435 AC-1 100 m → fix-degrade ≤ 500 ms escalation_ladder_evaluator.fix_degrade.passes + 4 unit tests + scenario AC-1 assert
AZ-435 AC-2 500 m/30 s → 999+STATUSTEXT ≤ 500 ms failsafe.passes + 6 unit tests + scenario AC-2 assert
AZ-435 AC-ORDER cov-2d strictly precedes failsafe ordering.passes + 3 unit tests + scenario AC-ORDER assert
AZ-435 AC-3 parameterization 6 collected variants

All ACs PASS.

Phase 3 — Code Quality

  • SRP: each evaluator owns one resilience concern (drift / restart / envelope / ladder). Scenarios are thin adapters.
  • Error handling: explicit ValueError on bad helper input (negative latency, unknown sub-case); pytest.fail() on bad fixture shape; no bare except.
  • Naming: clear (SubCaseReport, RestartEvidence, IterationOutcome, EscalationLadderReport); follows project conventions.
  • Complexity: longest function is escalation_ladder_evaluator.evaluate (~60 lines), all sequential helper calls — acceptable.
  • DRY: F1 + F4 + F5 noted above.
  • Test quality: every helper public function has ≥3 unit tests covering happy path, boundary, failure; total 74 new unit tests (16 + 19 + 15 + 24).

Phase 4 — Security Quick-Scan

  • No SQL, no subprocess(shell=True), no eval/exec, no hardcoded secrets, no insecure deserialization (JSON loads only with explicit shape validation).
  • Pass.

Phase 5 — Performance Scan

  • All evaluators are O(n) single-pass sweeps over their inputs.
  • No nested loops, no N+1 patterns, no blocking I/O in hot paths.
  • Pass.

Phase 6 — Cross-Task Consistency

  • All 4 helpers share geo.distance_m (consistent with batch-85 + earlier pattern).
  • All 4 scenarios use the same skip pattern (sitl_replay_readypytest.skip; missing fixture → pytest.fail).
  • All 4 scenarios use nfr_recorder.record_metric with ac_id kwarg consistently.
  • F5 documents the one intentional duplication (escalation-ladder thresholds vs blackout-spoof thresholds).

Phase 7 — Architecture Compliance

  • All new files under e2e/** per module-layout.md § blackbox_tests Owns glob.
  • No imports from src/gps_denied_onboard/** (verified — only runner.helpers.geo, runner.helpers.sitl_observer, pyproj, stdlib).
  • No new cyclic dependencies. New evaluators are leaves of the import DAG.
  • No new infrastructure libraries.

Auto-Fix Gate

0 Critical, 0 High, 0 Medium, 5 Low. All Low findings are either intentional, surfacing, or future hygiene. No auto-fix attempts triggered; no escalation.

Verdict

PASS_WITH_WARNINGS.