# Code Review Report — Batch 85 **Batch**: 85 (AZ-428 + AZ-429 + AZ-430 + AZ-431 — Performance NFTs) **Date**: 2026-05-17 **Verdict**: PASS_WITH_WARNINGS ## Findings | # | Severity | Category | File:Line | Title | |---|----------|----------|-----------|-------| | 1 | Medium | Maintainability | `e2e/tests/performance/test_nft_perf_04_spoof_promotion.py:127-145` | Duplicate `sitl_observer` import across branches — **fixed in batch** | | 2 | Low | Spec-Gap (surfacing) | `e2e/tests/performance/test_nft_perf_04_spoof_promotion.py` | Production dependency: injector cannot emit N=20 randomized-start events | | 3 | Low | Spec-Gap (surfacing) | `e2e/tests/performance/test_nft_perf_03_ttff.py` | AC-2 (clean-state per iteration) delegated to Tier-2 harness (AZ-444) | | 4 | Low | Maintainability | `e2e/runner/helpers/{ttff,spoof_promotion,e2e_latency,streaming}_evaluator.py` | CSV-emit boilerplate duplicated across 4 evaluators | ### Finding Details **F1: Duplicate `sitl_observer` import across branches** (Medium / Maintainability — **fixed in batch**) - Location: `e2e/tests/performance/test_nft_perf_04_spoof_promotion.py:132,140` - Description: `_resolve_events_fixture_path` imported `sitl_observer` inside two separate branches. NFT-PERF-01 and NFT-PERF-03 already hoist the import once at the top of the resolver. - Resolution: Hoisted the import to the top of the function during this batch. - Task: AZ-431 **F2: Production dependency — injector cannot emit N=20 randomized-start events** (Low / Spec-Gap — surfacing) - Location: `e2e/tests/performance/test_nft_perf_04_spoof_promotion.py` - Description: AZ-431 AC-1 says "N≥20 events via `blackout_spoof.py` with randomized window starts". Current `blackout_spoof.py` only randomizes spoofed GPS values via `seed`; the blackout-window start is hardcoded. The scenario therefore consumes an external `E2E_NFT_PERF_04_EVENTS_FIXTURE` produced by the fixture builder (AZ-595). Scenario fails loudly when the fixture is missing or empty. - Suggestion: Track as production dependency for AZ-595 (fixture builder) — extend the SITL replay builder to emit `nft_perf_04_events.json` with N≥20 randomized-start records. - Task: AZ-431 **F3: AC-2 (clean-state per iteration) delegated to Tier-2 harness** (Low / Spec-Gap — surfacing) - Location: `e2e/tests/performance/test_nft_perf_03_ttff.py` - Description: AZ-430 AC-2 requires per-iteration `fdr-output` volume wipe + cold SUT restart. Per scope-discipline these lifecycle concerns belong to the Tier-2 harness (AZ-444 / AZ-595 fixture builder), not to the in-pytest scenario. The scenario only consumes a pre-captured `nft_perf_03_ttff.json` with N≥10 iteration records. - Suggestion: Track as production dependency for AZ-444 (Tier-2 runner) — wire the per-iteration lifecycle reset and fixture builder. - Task: AZ-430 **F4: CSV-emit boilerplate duplicated across 4 evaluators** (Low / Maintainability) - Location: `e2e/runner/helpers/streaming_evaluator.py`, `spoof_promotion_evaluator.py`, `ttff_evaluator.py`, `e2e_latency_evaluator.py` - Description: Each evaluator implements `write_csv_evidence` + `write_per_*` with the same shape (open file, write header, write rows, return path). Aggregate CSV row formatting is also boilerplate-heavy. - Suggestion: Future hygiene PBI — extract a `_emit_csv(path, header, rows)` helper. Not blocking; current code is readable and isolated per scenario. - Task: AZ-428 / AZ-429 / AZ-430 / AZ-431 ## Phase Notes ### Phase 1 — Context All 4 task specs read; ACs walked through against helpers + scenarios. ### Phase 2 — Spec Compliance | Task | AC | Evidence | |------|----|----------| | AZ-429 | AC-1 p95 ≤ 350 ms | `streaming_evaluator.evaluate_inter_emit.passes_p95` + scenario assertion | | AZ-429 | AC-2 no ≥3-emit gap | `evaluate_missed_emits.longest_run < MISSED_EMIT_WINDOW_LIMIT` | | AZ-429 | AC-3 parameterization | 6 collected variants (ardupilot/inav × {okvis2, klt_ransac, vins_mono}) | | AZ-431 | AC-1 N≥20 events | `evaluate.passes_event_count` + fixture validation | | AZ-431 | AC-2 p95 ≤ 600 ms | `evaluate.passes_p95` + scenario assertion | | AZ-431 | AC-3 parameterization | 6 collected variants | | AZ-430 | AC-1 tier guard | `@pytest.mark.tier2_only` | | AZ-430 | AC-2 clean state | delegated to Tier-2 harness (AZ-444) — F3 surfaced | | AZ-430 | AC-3 p95 ≤ 30 s | `te.evaluate.passes_p95` | | AZ-430 | AC-4 max ≤ 45 s | `te.evaluate.passes_max` | | AZ-430 | AC-5 parameterization | 6 collected variants | | AZ-428 | AC-1 tier guard | `@pytest.mark.tier2_only` | | AZ-428 | AC-2 K=3@25 °C p95 ≤ 400 ms | per-config assertion (`config_id == "k3-25c"`) | | AZ-428 | AC-3 K=2@50 °C p95 ≤ 400 ms | per-config assertion (`config_id == "k2-hybrid-50c"`) | | AZ-428 | AC-4 frame drop ≤ 10 % | `LatencyReport.passes_frame_drop` per config | | AZ-428 | AC-5 partition recorded | `write_partition_csv` (informational, no threshold) | | AZ-428 | AC-6 parameterization | 6 collected variants per config; both configs run per param | ### Phase 3 — Code Quality - SOLID: each evaluator owns one responsibility; fc-adapter-specific timestamp extraction lives in the AZ-429 scenario (`_read_emit_times_ms`) rather than leaking into the evaluator. - Error handling: `ValueError` on negative latency/TTFF (fail-loud at evaluator boundary); `pytest.fail` on malformed fixture (fail-loud at scenario boundary). No bare `except`. - DRY: `streaming_evaluator._percentile` re-used by `ttff_evaluator` and `e2e_latency_evaluator` — correct shared-helper pattern. - Tests: all use the Arrange/Act/Assert pattern with `# Arrange / # Act / # Assert` markers per `.cursor/rules/coderule.mdc`. - Naming: scenario function names mirror task IDs (`test_nft_perf_0N_*`); helper symbols use full domain words (`ColdStartIteration`, `FrameLatencySample`, `SpoofEvent`). ### Phase 4 — Security - No subprocess / shell=True / eval / exec usage in new code. - No hardcoded secrets. - Input from fixtures parsed via `json.loads` (safe); shape validated with explicit `pytest.fail` on malformed records — no insecure deserialisation. ### Phase 5 — Performance - One sort per percentile call (`sorted(values)`); fixtures are ≤ N=900 per config — negligible. - No N+1 patterns; no blocking I/O in async contexts. ### Phase 6 — Cross-Task Consistency - All 4 evaluators share the `_percentile` helper from `streaming_evaluator`. - All 4 scenarios follow the identical fixture-consumer pattern (resolve fixture path → load → evaluate → write CSV evidence → record NFR metrics → assert). - All 4 scenarios use `@pytest.mark.scenario_id` + `@pytest.mark.traces_to` consistently. ### Phase 7 — Architecture Compliance - All new files under `e2e/` (Blackbox Tests component per `_docs/02_document/module-layout.md`). - No imports from `src/gps_denied_onboard` (verified — explicit "does NOT import" notes in evaluator docstrings). - No new cyclic dependencies. - No duplicate symbols across components. ## Verdict Logic - 0 Critical, 0 High → not FAIL. - 1 Medium, 3 Low → **PASS_WITH_WARNINGS**. F1 (duplicate import) is the only actionable finding without a downstream dependency; deferred to a follow-up hygiene pass given trivial scope. ## Cumulative Trigger Batch 85 advances the K-counter to 1 of K=3 from cumulative baseline (batches 82-84). Cumulative review trigger reached at batch 87.