# Batch 86 — AZ-432 + AZ-433 + AZ-434 + AZ-435 (Resilience NFTs)

**Tracker**: AZ-432, AZ-433, AZ-434, AZ-435
**Tasks**: 4 tasks / 14 complexity points (3 + 3 + 5 + 3)
**Date**: 2026-05-17
**Verdict**: PASS_WITH_WARNINGS
**Review**: `_docs/03_implementation/reviews/batch_86_review.md`

## Scope

- **AZ-432 / NFT-RES-01 (AC-3.5 + AC-NEW-7)** — 30 s pure-vision-blackout drift bound; two sub-cases (`no_imu` ≤100 m, `good_imu_combined_factor` ≤50 m); Tier-1 OR Tier-2.
- **AZ-433 / NFT-RES-02 (AC-5.2 + AC-5.3)** — Mid-flight Docker/systemd restart; resume ≤30 s + first-emission accuracy ≤100 m; Tier-1 OR Tier-2.
- **AZ-434 / NFT-RES-03 (AC-NEW-4)** — 100-iteration Monte Carlo statistical envelope; AC-1 (N≥100) + AC-2 (master-seed determinism) + AC-3 (`count(error_m ≤ 1.96 × cov_semi_major_m) / total ≥ 0.95`); canonical-param by default, `E2E_NFT_RES_03_FULL_MATRIX=1` unlocks full matrix.
- **AZ-435 / NFT-RES-04 (AC-NEW-8 escalation)** — 35 s blackout+spoof full ladder; AC-1 (cov-2d → fix-degrade ≤500 ms) + AC-2 (failsafe trigger → 999+STATUSTEXT ≤500 ms) + AC-ORDER (cov-2d strictly precedes failsafe trigger).

## Files

### Created (12 files)

- `e2e/runner/helpers/imu_fallback_drift_evaluator.py` — sub-case drift evaluator (no_imu / good_imu_combined_factor) with window-in-spec guard.
- `e2e/runner/helpers/companion_reboot_evaluator.py` — restart-trigger + resume-time + first-emission-accuracy verdicts from one captured `RestartEvidence`.
- `e2e/runner/helpers/monte_carlo_envelope_evaluator.py` — iteration-count + envelope-ratio + SHA-256 determinism fingerprint.
- `e2e/runner/helpers/escalation_ladder_evaluator.py` — cov-2d/cov-500/duration triggers, latency budgets, strict ordering.
- `e2e/tests/resilience/test_nft_res_01_imu_only_fallback.py` — NFT-RES-01 scenario (Tier-1/2; fixture-consumer).
- `e2e/tests/resilience/test_nft_res_02_companion_reboot.py` — NFT-RES-02 scenario (Tier-1/2; fixture-consumer).
- `e2e/tests/resilience/test_nft_res_03_monte_carlo.py` — NFT-RES-03 scenario (Tier-1/2; canonical-only by default; full-matrix gated).
- `e2e/tests/resilience/test_nft_res_04_blackout_escalation.py` — NFT-RES-04 scenario (Tier-1/2; fixture-consumer; sibling of FT-N-04).
- `e2e/_unit_tests/helpers/test_imu_fallback_drift_evaluator.py` — 16 unit tests.
- `e2e/_unit_tests/helpers/test_companion_reboot_evaluator.py` — 19 unit tests.
- `e2e/_unit_tests/helpers/test_monte_carlo_envelope_evaluator.py` — 15 unit tests.
- `e2e/_unit_tests/helpers/test_escalation_ladder_evaluator.py` — 24 unit tests.

### Modified

- `e2e/_unit_tests/test_directory_layout.py` — registered 8 new paths.

## Test Results

```
$ pytest e2e/_unit_tests/helpers/test_imu_fallback_drift_evaluator.py \
    e2e/_unit_tests/helpers/test_companion_reboot_evaluator.py \
    e2e/_unit_tests/helpers/test_monte_carlo_envelope_evaluator.py \
    e2e/_unit_tests/helpers/test_escalation_ladder_evaluator.py \
    e2e/_unit_tests/test_directory_layout.py
================ 199 passed in 0.69s ================
```

Scenario collection (24 cases, all parameterised):

```
$ pytest e2e/tests/resilience/ --collect-only -p no:csv --evidence-out=/tmp/e2e-test-evidence
collected 24 items
  test_nft_res_01_imu_only_fallback: 6 cases
  test_nft_res_02_companion_reboot: 6 cases
  test_nft_res_03_monte_carlo: 6 cases
  test_nft_res_04_blackout_escalation: 6 cases
```

Scenario smoke (all 24 skip cleanly with rich diagnostic messages):

```
24 skipped in 0.17s
```

Skip breakdown:

- 13 skip-on-`sitl_replay_ready=False` (no `E2E_SITL_REPLAY_DIR` locally — expected pattern).
- 8 skip-on-`vins_mono` (research-build-only per D-C1-1-SUB-A — conftest applies on production builds).
- 3 skip-on-non-canonical-param for NFT-RES-03 (AC-4 default canonical-only; unlock with `E2E_NFT_RES_03_FULL_MATRIX=1`).
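The canonical-only gate behind the last skip bucket can be sketched roughly as below. This is an illustrative sketch, not the code in `test_nft_res_03_monte_carlo.py`: the function name `non_canonical_skip_reason`, the adapter identifier `"inav"`, and the exact comparison against `"1"` are assumptions. Only the environment variable name and the canonical pair (ardupilot, okvis2) come from this report.

```python
import os
from typing import Mapping, Optional, Tuple

# Canonical parameter pair and env var name as stated in this report.
CANONICAL_PARAM: Tuple[str, str] = ("ardupilot", "okvis2")
FULL_MATRIX_ENV = "E2E_NFT_RES_03_FULL_MATRIX"


def non_canonical_skip_reason(
    fc_adapter: str,
    vio_strategy: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return a skip message for a non-canonical variant, or None to run it.

    Hypothetical helper: the real scenario may structure this differently.
    """
    env = os.environ if env is None else env
    # Full matrix explicitly unlocked: every variant runs.
    if env.get(FULL_MATRIX_ENV) == "1":
        return None
    # The canonical variant always runs.
    if (fc_adapter, vio_strategy) == CANONICAL_PARAM:
        return None
    return (
        f"NFT-RES-03 runs canonical-only by default; "
        f"({fc_adapter}, {vio_strategy}) needs {FULL_MATRIX_ENV}=1"
    )
```

In a scenario, the returned message would typically be passed to `pytest.skip(...)` when it is not `None`, which keeps the skip reason visible in the smoke output.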
## AC Verification

### AZ-432 / NFT-RES-01

| AC | Coverage |
|----|----------|
| AC-1 30 s window injected | `BlackoutWindow.window_in_spec` (±2 s tolerance) + 3 unit tests + scenario gate |
| AC-2 no-IMU drift ≤ 100 m | `evaluate_subcase(... no_imu).passes` + 2 unit tests + scenario AC-2 assert |
| AC-3 good-IMU drift ≤ 50 m | `evaluate_subcase(... good_imu_combined_factor).passes` + 2 unit tests + scenario AC-3 assert |
| AC-4 parameterization | 6 collected variants (fc_adapter × vio_strategy) |

### AZ-433 / NFT-RES-02

| AC | Coverage |
|----|----------|
| AC-1 restart trigger ≤ 5 s | `passes_restart_trigger` + 4 unit tests + scenario AC-1 assert |
| AC-2 resume time ≤ 30 s | `passes_resume_time` + 4 unit tests + scenario AC-2 assert |
| AC-3 first-emission accuracy ≤ 100 m | `passes_first_emission_accuracy` + 5 unit tests + scenario AC-3 assert |
| AC-4 parameterization | 6 collected variants |

### AZ-434 / NFT-RES-03

| AC | Coverage |
|----|----------|
| AC-1 N ≥ 100 iterations | `passes_iteration_count` + 3 unit tests + scenario AC-1 assert |
| AC-2 master-seed determinism | `determinism_fingerprint` + 4 unit tests + scenario dual-evaluate AC-2 assert |
| AC-3 envelope ratio ≥ 0.95 | `passes_envelope` + 6 unit tests + scenario AC-3 assert |
| AC-4 parameterization | Canonical (ardupilot, okvis2) by default; full matrix via `E2E_NFT_RES_03_FULL_MATRIX=1` |

### AZ-435 / NFT-RES-04

| AC | Coverage |
|----|----------|
| AC-1 100 m → fix-degrade ≤ 500 ms | `fix_degrade.passes` + 4 unit tests + scenario AC-1 assert |
| AC-2 500 m / 30 s → 999+STATUSTEXT ≤ 500 ms | `failsafe.passes` + 6 unit tests + scenario AC-2 assert |
| AC-ORDER cov-2d strictly precedes failsafe | `ordering.passes` + 3 unit tests + scenario AC-ORDER assert |
| AC-3 parameterization | 6 collected variants |

`traces_to` markers:

- NFT-RES-01: `AC-3.5,AC-NEW-7,AC-1,AC-2,AC-3,AC-4`
- NFT-RES-02: `AC-5.2,AC-5.3,AC-1,AC-2,AC-3,AC-4`
- NFT-RES-03: `AC-NEW-4,AC-1,AC-2,AC-3,AC-4`
- NFT-RES-04: `AC-NEW-8,AC-1,AC-2,AC-3`

## Code Review

**Verdict**: PASS_WITH_WARNINGS — 0 Critical, 0 High, 0 Medium, 5 Low.

- **F1 (Low / Maintainability — carry-over of batch-85 F4)**: `write_csv_evidence` boilerplate now in 8 evaluators. Future hygiene PBI.
- **F2 (Low / Spec-Gap surfacing)**: AZ-432 sub-case (a) needs SUT-side disable-IMU path OR empty IMU stream from FC inbound proxy — production dep on AZ-595.
- **F3 (Low / Spec-Gap surfacing)**: AZ-433 process-restart observation needs runner-side health-probe; AZ-444 owns Tier-2, Tier-1 needs docker-compose healthcheck wiring.
- **F4 (Low / Maintainability)**: `_resolve_fixture_path` duplicated across 4 new scenarios (matches NFT-PERF pattern from batch 85). Future hygiene PBI.
- **F5 (Low / Maintainability — intentional)**: `escalation_ladder_evaluator` thresholds intentionally re-defined locally rather than imported from `blackout_spoof_evaluator`. Documented; will be cited if a future review proposes to "DRY" them.

Full review: `_docs/03_implementation/reviews/batch_86_review.md`.

## Production Dependencies

Surfaced for the cumulative review window (85-87) + traceability matrix:

1. **AZ-595 (fixture builder)**: emit `nft_res_01_imu_fallback.json` (both sub-cases × 30 s blackout × estimate+GT samples), `nft_res_02_companion_reboot.json` (restart-command + process-restarted + first-post-restart-emission + GT-at-emission timestamps), `nft_res_03_monte_carlo.json` (master_seed + 100 iterations × per-frame (error_m, cov_semi_major_m)), `nft_res_04_blackout_escalation.json` (35 s window + estimate stream with cov/horiz/fix_type + STATUSTEXT stream).
2. **AZ-444 (Tier-2 runner)**: per-iteration clean-state lifecycle for NFT-RES-03 (fdr-output volume wipe + SUT cold restart × 100); systemd watchdog observation for NFT-RES-02 process-restart timing.
3. **AZ-595 + SUT**: SUT-side `no_imu` config path OR FC-proxy empty IMU stream injection for AZ-432 sub-case (a).
4. **SUT-side**: outbound stream MUST carry `cov_semi_major_m`, `horiz_accuracy`, and `fix_type` per-frame for NFT-RES-04 to detect the ladder. Existing FT-N-04 already requires the first two; `fix_type` is new for NFT-RES-04 AC-1 (MAVLink `GPS_INPUT.fix_type ≤ 2` for AP, equivalent for iNav).
5. **Already exists**: `sitl_replay_ready` fixture, `sitl_observer.replay_dir()`, `evidence_dir`, `nfr_recorder` (AZ-406/AZ-445), `geo.distance_m` (AZ-407), conftest `fc_adapter` / `vio_strategy` / `vins_mono` skip rules.

## Architecture Compliance

- All new files under `e2e/`, owned by the Blackbox Tests cross-cutting component per `_docs/02_document/module-layout.md`.
- No imports from `src/gps_denied_onboard` (verified — only `runner.helpers.geo`, `runner.helpers.sitl_observer`, `pyproj`, stdlib).
- No new cyclic dependencies. New evaluators are leaves of the import DAG.
- No new infrastructure libraries.

## Sub-step Trace

Phases executed per `implement/SKILL.md`:

- phase 5 (load-spec) → 4 task specs read
- phase 6 (implement-tasks-sequentially) → helpers + scenarios + unit tests for all 4 tasks
- phase 7 (verify-ac-coverage) → ACs traced above
- phase 8 (code-review) → batch_86_review.md (PASS_WITH_WARNINGS)
- phase 8.5 (cumulative-review) → defer to batch 87 (K=3 window 85-87)
- phase 11 (commit-batch) → next.
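
For reference, the NFT-RES-03 criteria traced in this report (AC-3 envelope ratio `count(error_m ≤ 1.96 × cov_semi_major_m) / total ≥ 0.95` and the AC-2 SHA-256 determinism fingerprint) can be illustrated with a minimal sketch. The function names, signatures, and the JSON serialisation used for hashing are assumptions made for illustration; they are not the API of `monte_carlo_envelope_evaluator.py`.

```python
import hashlib
import json
from typing import Sequence, Tuple

# One per-frame sample: (error_m, cov_semi_major_m). Layout is assumed.
Sample = Tuple[float, float]


def envelope_ratio(samples: Sequence[Sample], k: float = 1.96) -> float:
    """Fraction of samples whose error lies within k * cov_semi_major_m."""
    if not samples:
        raise ValueError("envelope ratio is undefined for zero samples")
    inside = sum(1 for error_m, cov_m in samples if error_m <= k * cov_m)
    return inside / len(samples)


def envelope_passes(samples: Sequence[Sample], min_ratio: float = 0.95) -> bool:
    """AC-3 style verdict: ratio must reach the 0.95 floor."""
    return envelope_ratio(samples) >= min_ratio


def fingerprint(master_seed: int, samples: Sequence[Sample]) -> str:
    """SHA-256 over a canonical JSON encoding of the seed and samples.

    Two evaluations of the same seeded run should yield the same digest;
    the encoding here is an assumed stand-in for the real one.
    """
    payload = json.dumps(
        {"master_seed": master_seed, "samples": [list(s) for s in samples]},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A dual-evaluate determinism check in this style would compute `fingerprint` twice over the same seeded run and assert the two digests are equal.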