# Batch 85 — AZ-428 + AZ-429 + AZ-430 + AZ-431 (Performance NFTs)

**Tracker**: AZ-428, AZ-429, AZ-430, AZ-431
**Tasks**: 4 tasks / 13 complexity points (5 + 2 + 5 + 3)*
**Date**: 2026-05-17
**Verdict**: PASS_WITH_WARNINGS
**Review**: `_docs/03_implementation/reviews/batch_85_review.md`

*Note on points: the 4-task batch totals 13 points. The grouping is driven by AC-coverage cohesion, because all four tasks are Performance NFTs that share the `_percentile` helper. Each individual task stays within the user batch rule ("create PBIs of 2-3 points (≤5)"); grouping them into one batch is intentional so the shared evaluator code lands together.

## Scope

- **AZ-428 / NFT-PERF-01 (AC-4.1)** — Tier-2-only end-to-end latency p95 ≤ 400 ms across two configs (K=3@25 °C and K=2@50 °C hybrid); 5-min Derkachi replay; per-stage partition (D-CROSS-LATENCY-1) recorded for trend tracking only (informational).
- **AZ-429 / NFT-PERF-02 (AC-4.4)** — Frame-by-frame streaming: p95(inter-emit) ≤ 350 ms, and no window of ≥3 consecutive missed emits.
- **AZ-430 / NFT-PERF-03 (AC-NEW-1)** — Tier-2-only cold-start TTFF p95 ≤ 30 s AND max ≤ 45 s over N≥10 iterations.
- **AZ-431 / NFT-PERF-04 (AC-NEW-2)** — Spoofing-promotion latency p95 ≤ 600 ms over N≥20 randomized-start blackout+spoof events.

## Files

### Created

- `e2e/runner/helpers/streaming_evaluator.py` — inter-emit and missed-emit-window evaluators; also hosts the shared `_percentile` helper used by the other 3 evaluators.
- `e2e/runner/helpers/spoof_promotion_evaluator.py` — per-event latency from `t_blackout_onset` to the first `dead_reckoned` label switch, plus aggregate p50/p95/p99.
- `e2e/runner/helpers/ttff_evaluator.py` — per-iteration TTFF samples plus the AC-3/AC-4 aggregate.
- `e2e/runner/helpers/e2e_latency_evaluator.py` — per-frame latency, frame-drop accounting, and per-stage partition recording.
- `e2e/tests/performance/test_nft_perf_01_e2e_latency.py` — NFT-PERF-01 scenario (Tier-2; two configs).
- `e2e/tests/performance/test_nft_perf_02_streaming.py` — NFT-PERF-02 scenario (Tier-1/2; fc-adapter-aware timestamp extraction).
- `e2e/tests/performance/test_nft_perf_03_ttff.py` — NFT-PERF-03 scenario (Tier-2-only; fixture consumer).
- `e2e/tests/performance/test_nft_perf_04_spoof_promotion.py` — NFT-PERF-04 scenario (Tier-1/2; fixture consumer).
- `e2e/_unit_tests/helpers/test_streaming_evaluator.py` — 16 unit tests.
- `e2e/_unit_tests/helpers/test_spoof_promotion_evaluator.py` — 15 unit tests.
- `e2e/_unit_tests/helpers/test_ttff_evaluator.py` — 14 unit tests.
- `e2e/_unit_tests/helpers/test_e2e_latency_evaluator.py` — 15 unit tests.

### Modified

- `e2e/_unit_tests/test_directory_layout.py` — registered 8 new paths.

## Test Results

```
$ pytest e2e/_unit_tests/helpers/test_streaming_evaluator.py \
    e2e/_unit_tests/helpers/test_spoof_promotion_evaluator.py \
    e2e/_unit_tests/helpers/test_ttff_evaluator.py \
    e2e/_unit_tests/helpers/test_e2e_latency_evaluator.py \
    e2e/_unit_tests/test_directory_layout.py
================ 177 passed in 0.34s ================
```

Scenario collection (24 cases, all parameterized):

```
$ pytest e2e/tests/performance/ --collect-only -p no:csv
collected 24 items
test_nft_perf_01_e2e_latency: 6 cases
test_nft_perf_02_streaming_inter_emit: 6 cases
test_nft_perf_03_cold_start_ttff: 6 cases
test_nft_perf_04_spoof_promotion_latency: 6 cases
```

Full unit suite: `977 passed, 2 failed`. Both failures are pre-existing (a `pytest-csv` vs `csv_reporter` plugin conflict on subprocess pytest invocations), confirmed against a `git stash` baseline; they were not introduced by batch 85.
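The streaming evaluator listed under Created (inter-emit p95 plus missed-emit windows, with a shared `_percentile`) could look roughly like the sketch below. It is a hypothetical illustration, not the file's actual contents: the function names echo the report's `evaluate_inter_emit` / `evaluate_missed_emits`, but their signatures, the report dataclasses, the per-frame boolean encoding of missed emits, and the linear-interpolation percentile method are all assumptions.

```python
"""Hypothetical sketch of streaming_evaluator-style checks (NFT-PERF-02)."""
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Sequence

P95_INTER_EMIT_MS = 350.0  # AC-1: p95(inter-emit) <= 350 ms
MISSED_RUN_LIMIT = 3  # AC-2: a run of >= 3 consecutive missed emits fails


def percentile(samples: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (assumed; stands in for _percentile)."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    if not 0.0 <= q <= 100.0:
        raise ValueError(f"q must be in [0, 100], got {q}")
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    # Interpolate between the two neighbouring order statistics.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


@dataclass(frozen=True)
class InterEmitReport:
    p95_ms: float

    @property
    def passes_p95(self) -> bool:
        return self.p95_ms <= P95_INTER_EMIT_MS


def evaluate_inter_emit(emit_times_ms: Sequence[float]) -> InterEmitReport:
    """p95 of the gaps between consecutive (time-ordered) emit timestamps."""
    if len(emit_times_ms) < 2:
        raise ValueError("need at least two emits to form an inter-emit gap")
    gaps = [b - a for a, b in zip(emit_times_ms, emit_times_ms[1:])]
    return InterEmitReport(p95_ms=percentile(gaps, 95.0))


@dataclass(frozen=True)
class MissedEmitReport:
    longest_run: int

    @property
    def passes(self) -> bool:
        return self.longest_run < MISSED_RUN_LIMIT


def evaluate_missed_emits(emitted_per_frame: Sequence[bool]) -> MissedEmitReport:
    """Longest run of consecutive frames with no emit (False = missed)."""
    longest = current = 0
    for emitted in emitted_per_frame:
        current = 0 if emitted else current + 1
        longest = max(longest, current)
    return MissedEmitReport(longest_run=longest)
```

Linear interpolation and nearest-rank can yield different p95 values on short captures, so whichever method the real `_percentile` uses is worth pinning in its unit tests.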
## AC Verification

### AZ-428 / NFT-PERF-01

| AC | Coverage |
|----|----------|
| AC-1 tier guard | `@pytest.mark.tier2_only` |
| AC-2 K=3@25 °C p95 ≤ 400 ms | per-config assertion in scenario + 4 unit tests |
| AC-3 K=2 hybrid@50 °C p95 ≤ 400 ms | per-config assertion in scenario |
| AC-4 frame-drop ≤ 10 % | `LatencyReport.passes_frame_drop` + 3 unit tests |
| AC-5 partition recorded | `write_partition_csv` (informational; no threshold) + 1 unit test |
| AC-6 parameterization | 6 collected variants per config |

### AZ-429 / NFT-PERF-02

| AC | Coverage |
|----|----------|
| AC-1 p95 inter-emit ≤ 350 ms | `evaluate_inter_emit.passes_p95` + 6 unit tests |
| AC-2 no ≥3 consecutive missed emits | `evaluate_missed_emits.longest_run` + 4 unit tests |
| AC-3 parameterization | 6 collected variants (fc_adapter × vio_strategy) |

### AZ-430 / NFT-PERF-03

| AC | Coverage |
|----|----------|
| AC-1 tier guard | `@pytest.mark.tier2_only` |
| AC-2 clean state per iteration | delegated to Tier-2 harness (AZ-444) — surfaced as F3 |
| AC-3 p95(TTFF) ≤ 30 s | `te.evaluate.passes_p95` + 4 unit tests |
| AC-4 max(TTFF) ≤ 45 s | `te.evaluate.passes_max` + 2 unit tests |
| AC-5 parameterization | 6 collected variants |

### AZ-431 / NFT-PERF-04

| AC | Coverage |
|----|----------|
| AC-1 N≥20 events | `evaluate.passes_event_count` + scenario fixture validation |
| AC-2 p95 ≤ 600 ms | `evaluate.passes_p95` + 4 unit tests |
| AC-3 parameterization | 6 collected variants |

`traces_to` markers:

- NFT-PERF-01: `AC-4.1,AC-1,AC-2,AC-3,AC-4,AC-5,AC-6`
- NFT-PERF-02: `AC-4.4,AC-1,AC-2,AC-3`
- NFT-PERF-03: `AC-NEW-1,AC-1,AC-2,AC-3,AC-4,AC-5`
- NFT-PERF-04: `AC-NEW-2,AC-1,AC-2,AC-3`

## Code Review

**Verdict**: PASS_WITH_WARNINGS — 0 Critical, 0 High, 1 Medium, 3 Low.

- **F1 (Medium / Maintainability — fixed in batch)**: NFT-PERF-04's `_resolve_events_fixture_path` duplicated the `sitl_observer` import across two branches. Hoisted to function-top during the review pass.
- **F2 (Low / Spec-Gap surfacing)**: Production dependency: the `blackout_spoof.py` injector cannot emit N=20 randomized-start events, so the scenario consumes an external fixture from the AZ-595 fixture builder. Surfaced and tracked.
- **F3 (Low / Spec-Gap surfacing)**: AZ-430 AC-2 (per-iteration clean state) is delegated to the Tier-2 harness (AZ-444); the scenario only consumes the captured fixture.
- **F4 (Low / Maintainability)**: CSV-emit boilerplate is duplicated across the 4 evaluators; deferred to a future hygiene PBI.

Full review: `_docs/03_implementation/reviews/batch_85_review.md`.

## Production Dependencies

Surfaced for the cumulative review window (batches 85-87) and the traceability matrix:

1. **AZ-444 (Tier-2 runner)**: per-iteration `fdr-output` volume wipe + SUT cold lifecycle restart for NFT-PERF-03; `tier2-on-jetson.sh` orchestration of N=10 iterations.
2. **AZ-595 (fixture builder)**: emit `nft_perf_01_latency.json` (N=900 frames × 2 configs + per-stage partition samples), the `nft_perf_02_streaming` capture, `nft_perf_03_ttff.json` (N≥10 iteration records), and `nft_perf_04_events.json` (N≥20 randomized-start blackout+spoof events with per-event outbound-label samples).
3. **SUT-side**: the outbound stream MUST carry `source_label` ∈ {`satellite_anchored`, `visual_propagated`, `dead_reckoned`} for NFT-PERF-04 to detect promotion; the FDR (or equivalent) MUST expose per-stage timings (C1, C2, C2.5, C3, C3.5, C4, C4 cov, C5, serialization, OS jitter) for NFT-PERF-01 AC-5 partition recording.
4. **AZ-595 + Derkachi flight**: the K=2 + Jacobian-cov hybrid auto-degrade configuration must be activatable from the fixture-builder side so that the K=2@50 °C config captures the right SUT mode.
5. **Already exists**: `sitl_replay_ready` fixture, `mavproxy_tlog_reader`, `msp_frame_observer`, `sitl_observer.replay_dir()`, `evidence_dir`, `nfr_recorder` (AZ-406).
6. **Already exists**: `fc_adapter` / `vio_strategy` parameterization, `tier2_only` marker, `scenario_id` marker, `traces_to` marker.
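Dependency 3 is what makes NFT-PERF-04 measurable: promotion latency is read off the outbound `source_label`. The following is a minimal sketch of the per-event computation described for `spoof_promotion_evaluator.py` (latency from `t_blackout_onset` to the first `dead_reckoned` sample, then p50/p95/p99 with the N≥20 and p95 ≤ 600 ms checks). It assumes each event arrives as time-ordered `(timestamp_ms, source_label)` pairs. The `SpoofEvent`, `promotion_latency_ms`, and `PromotionReport` names, and treating a never-promoted event as a failure, are hypothetical choices made for illustration.

```python
"""Hypothetical sketch of spoof-promotion latency evaluation (NFT-PERF-04)."""
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

SOURCE_LABELS = frozenset({"satellite_anchored", "visual_propagated", "dead_reckoned"})
P95_LIMIT_MS = 600.0  # AC-2
MIN_EVENTS = 20  # AC-1


def percentile(samples: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (assumed method)."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


@dataclass(frozen=True)
class SpoofEvent:
    t_blackout_onset_ms: float
    # Time-ordered outbound samples: (timestamp_ms, source_label).
    samples: Sequence[Tuple[float, str]]


def promotion_latency_ms(event: SpoofEvent) -> Optional[float]:
    """Onset -> first dead_reckoned sample at/after onset; None if never promoted."""
    for t_ms, label in event.samples:
        if label not in SOURCE_LABELS:
            raise ValueError(f"unexpected source_label {label!r}")
        if t_ms >= event.t_blackout_onset_ms and label == "dead_reckoned":
            return t_ms - event.t_blackout_onset_ms
    return None


@dataclass(frozen=True)
class PromotionReport:
    latencies_ms: Tuple[float, ...]  # promoted events only
    n_events: int
    n_unpromoted: int

    def quantiles(self) -> dict[int, float]:
        return {q: percentile(self.latencies_ms, q) for q in (50, 95, 99)}

    @property
    def passes_event_count(self) -> bool:
        return self.n_events >= MIN_EVENTS

    @property
    def passes_p95(self) -> bool:
        # Assumption: any event that never promotes fails the check outright.
        if self.n_unpromoted or not self.latencies_ms:
            return False
        return percentile(self.latencies_ms, 95) <= P95_LIMIT_MS


def evaluate(events: Sequence[SpoofEvent]) -> PromotionReport:
    latencies = [promotion_latency_ms(e) for e in events]
    promoted = tuple(x for x in latencies if x is not None)
    return PromotionReport(promoted, len(latencies), len(latencies) - len(promoted))
```

Rejecting unknown labels early keeps a SUT-side contract break (dependency 3) from silently looking like a slow promotion.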
## Architecture Compliance

- All new files live under `e2e/`, owned by the Blackbox Tests component per `_docs/02_document/module-layout.md`.
- No imports from `src/gps_denied_onboard` (verified; each evaluator docstring carries an explicit "does NOT import" note).
- No new cyclic dependencies. The only cross-evaluator dependency is the shared `streaming_evaluator._percentile` helper.
- No new infrastructure libraries.

## Sub-step Trace

Phases executed per `implement/SKILL.md`:

- phase 5 (load-spec) → 4 task specs read
- phase 6 (implement-tasks-sequentially) → helpers + scenarios + unit tests for all 4 tasks
- phase 7 (verify-ac-coverage) → ACs traced above
- phase 8 (code-review) → `batch_85_review.md` (PASS_WITH_WARNINGS)
- phase 8.5 (cumulative-review) → deferred to batch 87 (K=3 window starts at batch 85)
- phase 11 (commit-batch) → next
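Per Architecture Compliance, the evaluators share only `_percentile`. The hypothetical sketch below shows how the NFT-PERF-03 TTFF verdict (p95 ≤ 30 s and max ≤ 45 s over N≥10 iterations) and the NFT-PERF-01 checks for a single config (p95 ≤ 400 ms, frame-drop ≤ 10 %) could both be built on one such helper. The local `percentile` stands in for `streaming_evaluator._percentile`, and its linear interpolation is an assumption. `TtffReport`, `evaluate_latency`, the field layout of `LatencyReport`, and using `None` to mark a dropped frame are illustrative only and are not the batch's actual API.

```python
"""Hypothetical sketch: TTFF (NFT-PERF-03) and e2e latency (NFT-PERF-01) verdicts."""
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

TTFF_P95_S, TTFF_MAX_S, TTFF_MIN_ITERATIONS = 30.0, 45.0, 10
LATENCY_P95_MS, MAX_FRAME_DROP = 400.0, 0.10


def percentile(samples: Sequence[float], q: float) -> float:
    """Stand-in for streaming_evaluator._percentile (linear interpolation assumed)."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


@dataclass(frozen=True)
class TtffReport:
    samples_s: Tuple[float, ...]  # one TTFF sample per cold-start iteration

    @property
    def passes_iteration_count(self) -> bool:
        return len(self.samples_s) >= TTFF_MIN_ITERATIONS

    @property
    def passes_p95(self) -> bool:  # AC-3
        return percentile(self.samples_s, 95) <= TTFF_P95_S

    @property
    def passes_max(self) -> bool:  # AC-4
        return max(self.samples_s) <= TTFF_MAX_S


@dataclass(frozen=True)
class LatencyReport:
    latencies_ms: Tuple[float, ...]  # frames that produced an output
    n_frames_in: int  # frames fed into the replay for this config

    @property
    def frame_drop_ratio(self) -> float:
        return (self.n_frames_in - len(self.latencies_ms)) / self.n_frames_in

    @property
    def passes_frame_drop(self) -> bool:
        return self.frame_drop_ratio <= MAX_FRAME_DROP

    @property
    def passes_p95(self) -> bool:
        return percentile(self.latencies_ms, 95) <= LATENCY_P95_MS


def evaluate_latency(per_frame_ms: Sequence[Optional[float]]) -> LatencyReport:
    """None marks a dropped frame (assumed encoding)."""
    if not per_frame_ms:
        raise ValueError("no frames captured")
    kept = tuple(x for x in per_frame_ms if x is not None)
    return LatencyReport(kept, len(per_frame_ms))
```

Keeping thresholds as module constants and verdicts as properties lets each scenario assert per config while the unit tests exercise the same boundaries directly.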