# Code Review Report — Batch 86 **Batch**: 86 (AZ-432 + AZ-433 + AZ-434 + AZ-435 — Resilience NFTs) **Date**: 2026-05-17 **Verdict**: PASS_WITH_WARNINGS ## Findings | # | Severity | Category | File:Line | Title | |---|----------|----------|-----------|-------| | 1 | Low | Maintainability | `e2e/runner/helpers/{imu_fallback_drift,companion_reboot,monte_carlo_envelope,escalation_ladder}_evaluator.py` | `write_csv_evidence` duplication continues — now 8 evaluators (carry-over of batch-85 F4) | | 2 | Low | Spec-Gap (surfacing) | `e2e/tests/resilience/test_nft_res_01_imu_only_fallback.py` | Sub-case (a) "no IMU" needs SUT-side disable-IMU config OR empty IMU stream; production dep on AZ-595 | | 3 | Low | Spec-Gap (surfacing) | `e2e/tests/resilience/test_nft_res_02_companion_reboot.py` | Process-restart observation requires runner-side health probe; AZ-444 owns Tier-2, Tier-1 needs docker-compose healthcheck wiring | | 4 | Low | Maintainability | `e2e/tests/resilience/test_nft_res_0{1,2,3,4}_*.py` | `_resolve_fixture_path` duplicated across 4 new scenarios (matches NFT-PERF pattern) | | 5 | Low | Maintainability (intentional) | `e2e/runner/helpers/escalation_ladder_evaluator.py` | Thresholds (`COV_2D_THRESHOLD_M`, `COV_FAILSAFE_THRESHOLD_M`, etc.) deliberately re-defined locally rather than imported from `blackout_spoof_evaluator` | ### Finding Details **F1: `write_csv_evidence` duplication continues** (Low / Maintainability — carry-over) - Location: `imu_fallback_drift_evaluator.py`, `companion_reboot_evaluator.py`, `monte_carlo_envelope_evaluator.py`, `escalation_ladder_evaluator.py` - Description: Batch-85 surfaced this on 4 evaluators (F4); batch 86 adds 4 more. The pattern is: open file, write header row, write data row(s), close. All 8 evaluators implement it independently. - Suggestion: Continued hygiene PBI — extract a `_emit_csv(path, header, rows)` helper. Not blocking; will be surfaced again in the cumulative review of batches 85-87. - Task: AZ-432 / AZ-433 / AZ-434 / AZ-435 **F2: Sub-case (a) "no IMU" needs SUT-side disable-IMU path** (Low / Spec-Gap — surfacing) - Location: `e2e/tests/resilience/test_nft_res_01_imu_only_fallback.py` - Description: AZ-432 sub-case (a) requires the SUT to run with IMU input disabled. The task spec calls out two options: (i) SUT config flag `E2E_NFT_RES_01_DISABLE_IMU=1` (or equivalent) propagated to the SUT, or (ii) empty IMU stream from the FC inbound proxy. Neither path is wired in production code today; the fixture builder (AZ-595) is responsible for setting up the variant that produces the captured no-IMU estimates. - Suggestion: Track as production dependency for AZ-595 — the SITL replay builder must emit a `no_imu` sub-case capture using one of those two strategies and label it accordingly in the fixture JSON. - Task: AZ-432 **F3: Process-restart observation needs runner-side wiring** (Low / Spec-Gap — surfacing) - Location: `e2e/tests/resilience/test_nft_res_02_companion_reboot.py` - Description: AZ-433 AC-1 (process restarts within ≤5 s) requires the runner to observe the moment the SUT process is back up. On Tier-2, AZ-444 owns the systemd watchdog observation. On Tier-1, the equivalent docker-compose `healthcheck` transition (`unhealthy → healthy`) is not yet captured into the fixture JSON. The scenario tolerates `process_restarted_monotonic_ms = null` (AC-1 fails loud), but the *positive* path requires the harness. - Suggestion: Track as production dependency for AZ-444 (Tier-2) + AZ-595 (Tier-1 docker-compose healthcheck capture). - Task: AZ-433 **F4: `_resolve_fixture_path` duplicated across scenarios** (Low / Maintainability) - Location: `test_nft_res_0{1,2,3,4}_*.py` - Description: Each scenario re-implements the env-var-or-default fixture path resolver (`E2E_NFT_RES_NN_FIXTURE` → absolute or relative to `E2E_SITL_REPLAY_DIR`). The NFT-PERF batch (85) introduced the same pattern across 4 scenarios. Now 8 scenarios duplicate it. - Suggestion: Future hygiene PBI — extract a `runner.helpers.fixture_resolver.resolve(env_var, default_name)` utility. Not blocking; isolation per scenario keeps each test self-contained today. - Task: AZ-432 / AZ-433 / AZ-434 / AZ-435 **F5: Thresholds intentionally re-defined locally** (Low / Maintainability — intentional) - Location: `e2e/runner/helpers/escalation_ladder_evaluator.py:25-40` - Description: `COV_2D_THRESHOLD_M = 100.0`, `COV_FAILSAFE_THRESHOLD_M = 500.0`, `DURATION_FAILSAFE_S = 30.0`, `FIX_TYPE_2D = 2`, `HORIZ_ACCURACY_FAILSAFE = 999.0`, `STATUSTEXT_FAILSAFE = "VISUAL_BLACKOUT_FAILSAFE"`, `ESCALATION_LATENCY_MS = 500` are also defined in `blackout_spoof_evaluator.py` under the same numeric values. The duplication is **intentional** — NFT-RES-04 and FT-N-04 must continue to fail loud if either evaluator's contract drifts away from the other. A shared constants module would silently propagate one team's edit into both AC suites. - Suggestion: NONE. Documented in `escalation_ladder_evaluator.py` module docstring. If a future cumulative review proposes to "DRY" these, reject and re-cite this finding. - Task: AZ-435 ## Phase Notes ### Phase 1 — Context All 4 task specs read; ACs walked through against helpers + scenarios. Architecture-compliance file (`module-layout.md` § blackbox_tests) re-read to confirm `e2e/**` ownership envelope. ### Phase 2 — Spec Compliance | Task | AC | Evidence | |------|----|----------| | AZ-432 | AC-1 30 s window | `imu_fallback_drift_evaluator.BlackoutWindow.window_in_spec` + scenario AC-1 assert | | AZ-432 | AC-2 no-IMU drift ≤ 100 m | `evaluate_subcase(... SUBCASE_NO_IMU).passes` + 2 unit tests + scenario AC-2 assert | | AZ-432 | AC-3 good-IMU drift ≤ 50 m | `evaluate_subcase(... SUBCASE_GOOD_IMU).passes` + 2 unit tests + scenario AC-3 assert | | AZ-432 | AC-4 parameterization | 6 collected variants per scenario | | AZ-433 | AC-1 restart trigger ≤ 5 s | `companion_reboot_evaluator.passes_restart_trigger` + 4 unit tests + scenario AC-1 assert | | AZ-433 | AC-2 resume time ≤ 30 s | `passes_resume_time` + 4 unit tests + scenario AC-2 assert | | AZ-433 | AC-3 accuracy ≤ 100 m | `passes_first_emission_accuracy` + 5 unit tests + scenario AC-3 assert | | AZ-433 | AC-4 parameterization | 6 collected variants | | AZ-434 | AC-1 N ≥ 100 | `monte_carlo_envelope_evaluator.passes_iteration_count` + 3 unit tests + scenario AC-1 assert | | AZ-434 | AC-2 determinism | `determinism_fingerprint` + 4 unit tests + scenario dual-evaluate AC-2 assert | | AZ-434 | AC-3 envelope ≥ 95 % | `passes_envelope` + 6 unit tests + scenario AC-3 assert | | AZ-434 | AC-4 parameterization | Canonical-only by default; `E2E_NFT_RES_03_FULL_MATRIX=1` unlocks full 6 variants | | AZ-435 | AC-1 100 m → fix-degrade ≤ 500 ms | `escalation_ladder_evaluator.fix_degrade.passes` + 4 unit tests + scenario AC-1 assert | | AZ-435 | AC-2 500 m/30 s → 999+STATUSTEXT ≤ 500 ms | `failsafe.passes` + 6 unit tests + scenario AC-2 assert | | AZ-435 | AC-ORDER cov-2d strictly precedes failsafe | `ordering.passes` + 3 unit tests + scenario AC-ORDER assert | | AZ-435 | AC-3 parameterization | 6 collected variants | All ACs PASS. ### Phase 3 — Code Quality - SRP: each evaluator owns one resilience concern (drift / restart / envelope / ladder). Scenarios are thin adapters. - Error handling: explicit `ValueError` on bad helper input (negative latency, unknown sub-case); `pytest.fail()` on bad fixture shape; no bare `except`. - Naming: clear (`SubCaseReport`, `RestartEvidence`, `IterationOutcome`, `EscalationLadderReport`); follows project conventions. - Complexity: longest function is `escalation_ladder_evaluator.evaluate` (~60 lines), all sequential helper calls — acceptable. - DRY: F1 + F4 + F5 noted above. - Test quality: every helper public function has ≥3 unit tests covering happy path, boundary, failure; total 74 new unit tests (16 + 19 + 15 + 24). ### Phase 4 — Security Quick-Scan - No SQL, no `subprocess(shell=True)`, no `eval/exec`, no hardcoded secrets, no insecure deserialization (JSON loads only with explicit shape validation). - Pass. ### Phase 5 — Performance Scan - All evaluators are O(n) single-pass sweeps over their inputs. - No nested loops, no N+1 patterns, no blocking I/O in hot paths. - Pass. ### Phase 6 — Cross-Task Consistency - All 4 helpers share `geo.distance_m` (consistent with batch-85 + earlier pattern). - All 4 scenarios use the same skip pattern (`sitl_replay_ready` → `pytest.skip`; missing fixture → `pytest.fail`). - All 4 scenarios use `nfr_recorder.record_metric` with `ac_id` kwarg consistently. - F5 documents the one intentional duplication (escalation-ladder thresholds vs blackout-spoof thresholds). ### Phase 7 — Architecture Compliance - All new files under `e2e/**` per `module-layout.md` § blackbox_tests `Owns` glob. - No imports from `src/gps_denied_onboard/**` (verified — only `runner.helpers.geo`, `runner.helpers.sitl_observer`, `pyproj`, stdlib). - No new cyclic dependencies. New evaluators are leaves of the import DAG. - No new infrastructure libraries. ## Auto-Fix Gate 0 Critical, 0 High, 0 Medium, 5 Low. All Low findings are either intentional, surfacing, or future hygiene. No auto-fix attempts triggered; no escalation. ## Verdict **PASS_WITH_WARNINGS**.