Oleksandr Bezdieniezhnykh 330893be5c [AZ-432] [AZ-433] [AZ-434] [AZ-435] Add NFT-RES-01..04 resilience scenarios
Batch 86: 4 NFT-RES blackbox scenarios + 4 helper evaluators + 74 unit
tests + directory-layout registration.

* AZ-432 NFT-RES-01: 30 s IMU-only fallback drift bound (AC-3.5 + AC-NEW-7);
  two sub-cases (no_imu ≤100m, good_imu_combined_factor ≤50m).
* AZ-433 NFT-RES-02: companion mid-flight reboot (AC-5.2 + AC-5.3); resume
  ≤30s + first-emission accuracy ≤100m.
* AZ-434 NFT-RES-03: 100-iteration Monte Carlo envelope (AC-NEW-4);
  iteration-count + master-seed determinism + envelope ratio ≥0.95.
  Canonical-param by default; E2E_NFT_RES_03_FULL_MATRIX=1 unlocks matrix.
* AZ-435 NFT-RES-04: 35s blackout+spoof escalation ladder (AC-NEW-8);
  AC-1 (cov-2d→fix-degrade ≤500ms) + AC-2 (failsafe→999+STATUSTEXT
  ≤500ms) + AC-ORDER (strict ordering).

Verdict: PASS_WITH_WARNINGS (0 Critical, 0 High, 0 Medium, 5 Low).
F5 documents intentional threshold duplication with blackout_spoof
evaluator (prevents contract drift between FT-N-04 and NFT-RES-04).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 17:09:04 +03:00

Batch 86 — AZ-432 + AZ-433 + AZ-434 + AZ-435 (Resilience NFTs)

Tracker: AZ-432, AZ-433, AZ-434, AZ-435
Tasks: 4 tasks / 14 complexity points (3 + 3 + 5 + 3)
Date: 2026-05-17
Verdict: PASS_WITH_WARNINGS
Review: _docs/03_implementation/reviews/batch_86_review.md

Scope

  • AZ-432 / NFT-RES-01 (AC-3.5 + AC-NEW-7) — 30 s pure-vision-blackout drift bound; two sub-cases (no_imu ≤100 m, good_imu_combined_factor ≤50 m); Tier-1 OR Tier-2.
  • AZ-433 / NFT-RES-02 (AC-5.2 + AC-5.3) — Mid-flight Docker/systemd restart; resume ≤30 s + first-emission accuracy ≤100 m; Tier-1 OR Tier-2.
  • AZ-434 / NFT-RES-03 (AC-NEW-4) — 100-iteration Monte Carlo statistical envelope; AC-1 (N≥100) + AC-2 (master-seed determinism) + AC-3 (count(error_m ≤ 1.96 × cov_semi_major_m) / total ≥ 0.95); canonical-param by default, E2E_NFT_RES_03_FULL_MATRIX=1 unlocks full matrix.
  • AZ-435 / NFT-RES-04 (AC-NEW-8 escalation) — 35 s blackout+spoof full ladder; AC-1 (cov-2d → fix-degrade ≤500 ms) + AC-2 (failsafe trigger → 999+STATUSTEXT ≤500 ms) + AC-ORDER (cov-2d strictly precedes failsafe trigger).

Files

Created (12 files)

  • e2e/runner/helpers/imu_fallback_drift_evaluator.py — sub-case drift evaluator (no_imu / good_imu_combined_factor) with window-in-spec guard.
  • e2e/runner/helpers/companion_reboot_evaluator.py — restart-trigger + resume-time + first-emission-accuracy verdicts from one captured RestartEvidence.
  • e2e/runner/helpers/monte_carlo_envelope_evaluator.py — iteration-count + envelope-ratio + SHA-256 determinism fingerprint.
  • e2e/runner/helpers/escalation_ladder_evaluator.py — cov-2d/cov-500/duration triggers, latency budgets, strict ordering.
  • e2e/tests/resilience/test_nft_res_01_imu_only_fallback.py — NFT-RES-01 scenario (Tier-1/2; fixture-consumer).
  • e2e/tests/resilience/test_nft_res_02_companion_reboot.py — NFT-RES-02 scenario (Tier-1/2; fixture-consumer).
  • e2e/tests/resilience/test_nft_res_03_monte_carlo.py — NFT-RES-03 scenario (Tier-1/2; canonical-only by default; full-matrix gated).
  • e2e/tests/resilience/test_nft_res_04_blackout_escalation.py — NFT-RES-04 scenario (Tier-1/2; fixture-consumer; sibling of FT-N-04).
  • e2e/_unit_tests/helpers/test_imu_fallback_drift_evaluator.py — 16 unit tests.
  • e2e/_unit_tests/helpers/test_companion_reboot_evaluator.py — 19 unit tests.
  • e2e/_unit_tests/helpers/test_monte_carlo_envelope_evaluator.py — 15 unit tests.
  • e2e/_unit_tests/helpers/test_escalation_ladder_evaluator.py — 24 unit tests.

Modified

  • e2e/_unit_tests/test_directory_layout.py — registered 8 new paths.

Test Results

$ pytest e2e/_unit_tests/helpers/test_imu_fallback_drift_evaluator.py \
         e2e/_unit_tests/helpers/test_companion_reboot_evaluator.py \
         e2e/_unit_tests/helpers/test_monte_carlo_envelope_evaluator.py \
         e2e/_unit_tests/helpers/test_escalation_ladder_evaluator.py \
         e2e/_unit_tests/test_directory_layout.py
================ 199 passed in 0.69s ================

Scenario collection (24 cases, all parameterised):

$ pytest e2e/tests/resilience/ --collect-only -p no:csv --evidence-out=/tmp/e2e-test-evidence
collected 24 items
  test_nft_res_01_imu_only_fallback:    6 cases
  test_nft_res_02_companion_reboot:     6 cases
  test_nft_res_03_monte_carlo:          6 cases
  test_nft_res_04_blackout_escalation:  6 cases

Scenario smoke (all 24 skip cleanly with rich diagnostic messages):

24 skipped in 0.17s

Skip breakdown:

  • 13 skip-on-sitl_replay_ready=False (no E2E_SITL_REPLAY_DIR locally — expected pattern).
  • 8 skip-on-vins_mono (vins_mono is research-build-only per D-C1-1-SUB-A; the conftest skip rule applies on production builds).
  • 3 skip-on-non-canonical-param for NFT-RES-03 (AC-4 default canonical-only; unlock with E2E_NFT_RES_03_FULL_MATRIX=1).
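The canonical-only default for NFT-RES-03 could be gated roughly as follows. This is a sketch under assumptions: `skip_reason` is a hypothetical helper, and the real scenario may wire the gate through conftest instead. The environment variable name and the canonical pair (ardupilot, okvis2) are taken from this report.

```python
import os
from typing import Mapping, Optional

CANONICAL_PARAM = ("ardupilot", "okvis2")       # canonical pair named in the report
FULL_MATRIX_ENV = "E2E_NFT_RES_03_FULL_MATRIX"  # opt-in flag named in the report


def skip_reason(
    fc_adapter: str,
    vio_strategy: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return a skip message for non-canonical variants, or None to run."""
    env = os.environ if env is None else env
    if env.get(FULL_MATRIX_ENV) == "1":
        return None  # full matrix explicitly unlocked
    if (fc_adapter, vio_strategy) == CANONICAL_PARAM:
        return None  # canonical variant always runs
    return (
        f"non-canonical param ({fc_adapter}, {vio_strategy}); "
        f"set {FULL_MATRIX_ENV}=1 to run the full matrix"
    )
```

Inside a test, a non-None result would typically be passed to `pytest.skip(reason)`, which keeps the diagnostic message visible in the skip summary.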

AC Verification

AZ-432 / NFT-RES-01

| AC | Coverage |
|----|----------|
| AC-1 30 s window injected | BlackoutWindow.window_in_spec (±2 s tolerance) + 3 unit tests + scenario gate |
| AC-2 no-IMU drift ≤ 100 m | evaluate_subcase(... no_imu).passes + 2 unit tests + scenario AC-2 assert |
| AC-3 good-IMU drift ≤ 50 m | evaluate_subcase(... good_imu_combined_factor).passes + 2 unit tests + scenario AC-3 assert |
| AC-4 parameterization | 6 collected variants (fc_adapter × vio_strategy) |
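A minimal sketch of how the window guard and the two drift limits might combine into one verdict. The class and function names carry a `Sketch` suffix because their fields and signatures are assumptions, as is the choice of the maximum drift over the window as the bounded quantity. The 30 s window, ±2 s tolerance, and 100 m / 50 m limits come from the ACs above.

```python
from dataclasses import dataclass
from typing import Sequence

# Limits from AC-2 and AC-3; window and tolerance from AC-1.
DRIFT_LIMIT_M = {"no_imu": 100.0, "good_imu_combined_factor": 50.0}
NOMINAL_WINDOW_S = 30.0
WINDOW_TOLERANCE_S = 2.0


@dataclass(frozen=True)
class BlackoutWindowSketch:
    start_s: float
    end_s: float

    @property
    def window_in_spec(self) -> bool:
        """True when the injected window lasts 30 s within +/- 2 s."""
        duration_s = self.end_s - self.start_s
        return abs(duration_s - NOMINAL_WINDOW_S) <= WINDOW_TOLERANCE_S


@dataclass(frozen=True)
class SubcaseVerdictSketch:
    subcase: str
    max_drift_m: float
    limit_m: float
    window_in_spec: bool

    @property
    def passes(self) -> bool:
        # An out-of-spec window invalidates the measurement, so it fails too.
        return self.window_in_spec and self.max_drift_m <= self.limit_m


def evaluate_subcase_sketch(
    window: BlackoutWindowSketch, drift_samples_m: Sequence[float], subcase: str
) -> SubcaseVerdictSketch:
    if subcase not in DRIFT_LIMIT_M:
        raise ValueError(f"unknown sub-case: {subcase}")
    if not drift_samples_m:
        raise ValueError("no drift samples captured inside the window")
    return SubcaseVerdictSketch(
        subcase, max(drift_samples_m), DRIFT_LIMIT_M[subcase], window.window_in_spec
    )
```

Folding the window guard into `passes` means a run with a mis-sized blackout cannot pass on drift alone.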

AZ-433 / NFT-RES-02

| AC | Coverage |
|----|----------|
| AC-1 restart trigger ≤ 5 s | passes_restart_trigger + 4 unit tests + scenario AC-1 assert |
| AC-2 resume time ≤ 30 s | passes_resume_time + 4 unit tests + scenario AC-2 assert |
| AC-3 first-emission accuracy ≤ 100 m | passes_first_emission_accuracy + 5 unit tests + scenario AC-3 assert |
| AC-4 parameterization | 6 collected variants |
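The three verdicts can all be derived from a single evidence record, as the sketch below illustrates. The field names are hypothetical, and so is the reference point for the resume time: it is measured here from the process-restarted timestamp, whereas the real evaluator may measure from the restart command. Only the 5 s, 30 s, and 100 m budgets come from the ACs above.

```python
from dataclasses import dataclass

RESTART_TRIGGER_BUDGET_S = 5.0  # AC-1
RESUME_BUDGET_S = 30.0          # AC-2
FIRST_EMISSION_LIMIT_M = 100.0  # AC-3


@dataclass(frozen=True)
class RestartEvidenceSketch:
    restart_command_s: float       # when the restart was issued
    process_restarted_s: float     # when the process was observed back up
    first_emission_s: float        # first estimate emitted after the restart
    first_emission_error_m: float  # distance from ground truth at that emission

    @property
    def passes_restart_trigger(self) -> bool:
        latency_s = self.process_restarted_s - self.restart_command_s
        # Negative latency means out-of-order timestamps, which also fails.
        return 0.0 <= latency_s <= RESTART_TRIGGER_BUDGET_S

    @property
    def passes_resume_time(self) -> bool:
        resume_s = self.first_emission_s - self.process_restarted_s
        return 0.0 <= resume_s <= RESUME_BUDGET_S

    @property
    def passes_first_emission_accuracy(self) -> bool:
        return 0.0 <= self.first_emission_error_m <= FIRST_EMISSION_LIMIT_M
```

Keeping the verdicts as properties of one immutable record makes it harder for the three assertions to be computed from inconsistent captures.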

AZ-434 / NFT-RES-03

| AC | Coverage |
|----|----------|
| AC-1 N ≥ 100 iterations | passes_iteration_count + 3 unit tests + scenario AC-1 assert |
| AC-2 master-seed determinism | determinism_fingerprint + 4 unit tests + scenario dual-evaluate AC-2 assert |
| AC-3 envelope ratio ≥ 0.95 | passes_envelope + 6 unit tests + scenario AC-3 assert |
| AC-4 parameterization | Canonical (ardupilot, okvis2) by default; full matrix via E2E_NFT_RES_03_FULL_MATRIX=1 |
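One way to build a SHA-256 determinism fingerprint for the AC-2 dual-evaluate check is to hash a canonical JSON serialization of the seed and the per-iteration samples. The sketch below assumes that approach; the function name, the input shape, and the serialization choices are illustrative and may differ from the real `determinism_fingerprint`.

```python
import hashlib
import json
from typing import Sequence, Tuple

Frame = Tuple[float, float]  # (error_m, cov_semi_major_m), shape assumed


def determinism_fingerprint_sketch(
    master_seed: int, iterations: Sequence[Sequence[Frame]]
) -> str:
    """Hex SHA-256 over a canonical JSON form of the seed and all samples."""
    payload = {
        "master_seed": master_seed,
        "iterations": [
            [[float(error_m), float(cov_m)] for error_m, cov_m in frames]
            for frames in iterations
        ],
    }
    # sort_keys and fixed separators keep the byte stream stable across runs.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Evaluating the same capture twice and comparing the two hex digests gives a cheap equality check; any change in the seed or in a single sample yields a different digest.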

AZ-435 / NFT-RES-04

| AC | Coverage |
|----|----------|
| AC-1 100 m → fix-degrade ≤ 500 ms | fix_degrade.passes + 4 unit tests + scenario AC-1 assert |
| AC-2 500 m / 30 s → 999+STATUSTEXT ≤ 500 ms | failsafe.passes + 6 unit tests + scenario AC-2 assert |
| AC-ORDER cov-2d strictly precedes failsafe | ordering.passes + 3 unit tests + scenario AC-ORDER assert |
| AC-3 parameterization | 6 collected variants |
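The two latency budgets and the strict-ordering rule reduce to simple timestamp comparisons. The sketch below assumes the four relevant timestamps have already been extracted from the estimate and STATUSTEXT streams; the record type and field names are hypothetical, and only the 500 ms budget and the strict ordering come from the ACs above.

```python
from dataclasses import dataclass
from typing import Dict

LATENCY_BUDGET_S = 0.5  # 500 ms budget shared by AC-1 and AC-2


@dataclass(frozen=True)
class LadderTimestampsSketch:
    cov_2d_trigger_s: float     # cov-2d threshold crossed
    fix_degrade_s: float        # degraded fix first observed
    failsafe_trigger_s: float   # failsafe threshold (cov-500 or duration) crossed
    failsafe_emission_s: float  # 999+STATUSTEXT first observed


def latency_passes(trigger_s: float, response_s: float) -> bool:
    """Response must follow the trigger within the 500 ms budget."""
    latency_s = response_s - trigger_s
    return 0.0 <= latency_s <= LATENCY_BUDGET_S


def evaluate_ladder_sketch(t: LadderTimestampsSketch) -> Dict[str, bool]:
    return {
        "AC-1": latency_passes(t.cov_2d_trigger_s, t.fix_degrade_s),
        "AC-2": latency_passes(t.failsafe_trigger_s, t.failsafe_emission_s),
        # Strict ordering: equal timestamps do not count as "precedes".
        "AC-ORDER": t.cov_2d_trigger_s < t.failsafe_trigger_s,
    }
```

Reporting the three verdicts separately keeps a late STATUSTEXT distinguishable from an out-of-order ladder in the evidence output.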

traces_to markers:

  • NFT-RES-01: AC-3.5,AC-NEW-7,AC-1,AC-2,AC-3,AC-4
  • NFT-RES-02: AC-5.2,AC-5.3,AC-1,AC-2,AC-3,AC-4
  • NFT-RES-03: AC-NEW-4,AC-1,AC-2,AC-3,AC-4
  • NFT-RES-04: AC-NEW-8,AC-1,AC-2,AC-3

Code Review

Verdict: PASS_WITH_WARNINGS — 0 Critical, 0 High, 0 Medium, 5 Low.

  • F1 (Low / Maintainability — carry-over of batch-85 F4): write_csv_evidence boilerplate now in 8 evaluators. Future hygiene PBI.
  • F2 (Low / Spec-Gap surfacing): AZ-432 sub-case (a) needs SUT-side disable-IMU path OR empty IMU stream from FC inbound proxy — production dep on AZ-595.
  • F3 (Low / Spec-Gap surfacing): AZ-433 process-restart observation needs runner-side health-probe; AZ-444 owns Tier-2, Tier-1 needs docker-compose healthcheck wiring.
  • F4 (Low / Maintainability): _resolve_fixture_path duplicated across 4 new scenarios (matches NFT-PERF pattern from batch 85). Future hygiene PBI.
  • F5 (Low / Maintainability — intentional): escalation_ladder_evaluator thresholds intentionally re-defined locally rather than imported from blackout_spoof_evaluator. Documented; will be cited if a future review proposes to "DRY" them.

Full review: _docs/03_implementation/reviews/batch_86_review.md.

Production Dependencies

Surfaced for the cumulative review window (85-87) + traceability matrix:

  1. AZ-595 (fixture builder): emit four fixtures:
       • nft_res_01_imu_fallback.json (both sub-cases × 30 s blackout × estimate+GT samples)
       • nft_res_02_companion_reboot.json (restart-command + process-restarted + first-post-restart-emission + GT-at-emission timestamps)
       • nft_res_03_monte_carlo.json (master_seed + 100 iterations × per-frame (error_m, cov_semi_major_m))
       • nft_res_04_blackout_escalation.json (35 s window + estimate stream with cov/horiz/fix_type + STATUSTEXT stream)
  2. AZ-444 (Tier-2 runner): per-iteration clean-state lifecycle for NFT-RES-03 (fdr-output volume wipe + SUT cold restart × 100); systemd watchdog observation for NFT-RES-02 process-restart timing.
  3. AZ-595 + SUT: SUT-side no_imu config path OR FC-proxy empty IMU stream injection for AZ-432 sub-case (a).
  4. SUT-side: outbound stream MUST carry cov_semi_major_m, horiz_accuracy, and fix_type per-frame for NFT-RES-04 to detect the ladder. Existing FT-N-04 already requires the first two; fix_type is new for NFT-RES-04 AC-1 (MAVLink GPS_INPUT.fix_type ≤ 2 for AP, equivalent for iNav).
  5. Already exists: sitl_replay_ready fixture, sitl_observer.replay_dir(), evidence_dir, nfr_recorder (AZ-406/AZ-445), geo.distance_m (AZ-407), conftest fc_adapter / vio_strategy / vins_mono skip rules.

Architecture Compliance

  • All new files under e2e/, owned by the Blackbox Tests cross-cutting component per _docs/02_document/module-layout.md.
  • No imports from src/gps_denied_onboard (verified — only runner.helpers.geo, runner.helpers.sitl_observer, pyproj, stdlib).
  • No new cyclic dependencies. New evaluators are leaves of the import DAG.
  • No new infrastructure libraries.
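The no-SUT-imports claim can be re-checked mechanically by parsing each new file and listing its import targets. The helper below is a hypothetical sketch using the standard `ast` module, not the method the review actually used; it flags any import whose dotted path contains `gps_denied_onboard`.

```python
import ast
from typing import List


def forbidden_imports(
    source: str, forbidden_root: str = "gps_denied_onboard"
) -> List[str]:
    """Return imported module paths that contain the forbidden package name."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            if forbidden_root in name.split("."):
                found.append(name)
    return found
```

Running it over the text of each file under e2e/ and asserting an empty result would turn the manual verification into a repeatable unit test. The source is only parsed, never executed, so third-party modules need not be installed.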

Sub-step Trace

Phases executed per implement/SKILL.md:

  • phase 5 (load-spec) → 4 task specs read
  • phase 6 (implement-tasks-sequentially) → helpers + scenarios + unit tests for all 4 tasks
  • phase 7 (verify-ac-coverage) → ACs traced above
  • phase 8 (code-review) → batch_86_review.md (PASS_WITH_WARNINGS)
  • phase 8.5 (cumulative-review) → defer to batch 87 (K=3 window 85-87)
  • phase 11 (commit-batch) → next.