Batch 85 — 4 Performance NFT scenarios + pure-logic evaluators. - NFT-PERF-01 (AZ-428, Tier-2): two-config e2e latency p95 ≤ 400 ms (K=3@25°C, K=2 hybrid@50°C) + frame-drop ≤10% + informational per-stage partition recording (D-CROSS-LATENCY-1). - NFT-PERF-02 (AZ-429): inter-emit p95 ≤ 350 ms + no ≥3 missed-emit windows. fc-adapter-aware SITL timestamp extraction (tlog vs MSP). - NFT-PERF-03 (AZ-430, Tier-2): cold-start TTFF p95 ≤ 30 s AND max ≤ 45 s over N≥10 iterations. - NFT-PERF-04 (AZ-431): spoof-promotion latency p95 ≤ 600 ms over N≥20 randomized-start blackout+spoof events. All scenarios consume external fixtures (AZ-595 dependency surfaced) and fail loudly when fixtures are missing or empty. Public-boundary discipline preserved — evaluators do NOT import src/gps_denied_onboard. Tests: 60 new unit tests pass; 24 scenarios collect (4 tests × 2 fc × 3 vio). Code review: PASS_WITH_WARNINGS — 1 Medium (fixed in batch), 3 Low (production-dependency surfacings + future hygiene). Co-authored-by: Cursor <cursoragent@cursor.com>
7.3 KiB
Code Review Report — Batch 85
Batch: 85 (AZ-428 + AZ-429 + AZ-430 + AZ-431 — Performance NFTs) Date: 2026-05-17 Verdict: PASS_WITH_WARNINGS
Findings
| # | Severity | Category | File:Line | Title |
|---|---|---|---|---|
| 1 | Medium | Maintainability | e2e/tests/performance/test_nft_perf_04_spoof_promotion.py:127-145 |
Duplicate sitl_observer import across branches — fixed in batch |
| 2 | Low | Spec-Gap (surfacing) | e2e/tests/performance/test_nft_perf_04_spoof_promotion.py |
Production dependency: injector cannot emit N=20 randomized-start events |
| 3 | Low | Spec-Gap (surfacing) | e2e/tests/performance/test_nft_perf_03_ttff.py |
AC-2 (clean-state per iteration) delegated to Tier-2 harness (AZ-444) |
| 4 | Low | Maintainability | e2e/runner/helpers/{ttff,spoof_promotion,e2e_latency,streaming}_evaluator.py |
CSV-emit boilerplate duplicated across 4 evaluators |
Finding Details
F1: Duplicate sitl_observer import across branches (Medium / Maintainability — fixed in batch)
- Location:
e2e/tests/performance/test_nft_perf_04_spoof_promotion.py:132,140 - Description:
_resolve_events_fixture_pathimportedsitl_observerinside two separate branches. NFT-PERF-01 and NFT-PERF-03 already hoist the import once at the top of the resolver. - Resolution: Hoisted the import to the top of the function during this batch.
- Task: AZ-431
F2: Production dependency — injector cannot emit N=20 randomized-start events (Low / Spec-Gap — surfacing)
- Location:
e2e/tests/performance/test_nft_perf_04_spoof_promotion.py - Description: AZ-431 AC-1 says "N≥20 events via
blackout_spoof.pywith randomized window starts". Currentblackout_spoof.pyonly randomizes spoofed GPS values viaseed; the blackout-window start is hardcoded. The scenario therefore consumes an externalE2E_NFT_PERF_04_EVENTS_FIXTUREproduced by the fixture builder (AZ-595). Scenario fails loudly when the fixture is missing or empty. - Suggestion: Track as production dependency for AZ-595 (fixture builder) — extend the SITL replay builder to emit
nft_perf_04_events.jsonwith N≥20 randomized-start records. - Task: AZ-431
F3: AC-2 (clean-state per iteration) delegated to Tier-2 harness (Low / Spec-Gap — surfacing)
- Location:
e2e/tests/performance/test_nft_perf_03_ttff.py - Description: AZ-430 AC-2 requires per-iteration
fdr-outputvolume wipe + cold SUT restart. Per scope-discipline these lifecycle concerns belong to the Tier-2 harness (AZ-444 / AZ-595 fixture builder), not to the in-pytest scenario. The scenario only consumes a pre-capturednft_perf_03_ttff.jsonwith N≥10 iteration records. - Suggestion: Track as production dependency for AZ-444 (Tier-2 runner) — wire the per-iteration lifecycle reset and fixture builder.
- Task: AZ-430
F4: CSV-emit boilerplate duplicated across 4 evaluators (Low / Maintainability)
- Location:
e2e/runner/helpers/streaming_evaluator.py,spoof_promotion_evaluator.py,ttff_evaluator.py,e2e_latency_evaluator.py - Description: Each evaluator implements
write_csv_evidence+write_per_*with the same shape (open file, write header, write rows, return path). Aggregate CSV row formatting is also boilerplate-heavy. - Suggestion: Future hygiene PBI — extract a
_emit_csv(path, header, rows)helper. Not blocking; current code is readable and isolated per scenario. - Task: AZ-428 / AZ-429 / AZ-430 / AZ-431
Phase Notes
Phase 1 — Context
All 4 task specs read; ACs walked through against helpers + scenarios.
Phase 2 — Spec Compliance
| Task | AC | Evidence |
|---|---|---|
| AZ-429 | AC-1 p95 ≤ 350 ms | streaming_evaluator.evaluate_inter_emit.passes_p95 + scenario assertion |
| AZ-429 | AC-2 no ≥3-emit gap | evaluate_missed_emits.longest_run < MISSED_EMIT_WINDOW_LIMIT |
| AZ-429 | AC-3 parameterization | 6 collected variants (ardupilot/inav × {okvis2, klt_ransac, vins_mono}) |
| AZ-431 | AC-1 N≥20 events | evaluate.passes_event_count + fixture validation |
| AZ-431 | AC-2 p95 ≤ 600 ms | evaluate.passes_p95 + scenario assertion |
| AZ-431 | AC-3 parameterization | 6 collected variants |
| AZ-430 | AC-1 tier guard | @pytest.mark.tier2_only |
| AZ-430 | AC-2 clean state | delegated to Tier-2 harness (AZ-444) — F3 surfaced |
| AZ-430 | AC-3 p95 ≤ 30 s | te.evaluate.passes_p95 |
| AZ-430 | AC-4 max ≤ 45 s | te.evaluate.passes_max |
| AZ-430 | AC-5 parameterization | 6 collected variants |
| AZ-428 | AC-1 tier guard | @pytest.mark.tier2_only |
| AZ-428 | AC-2 K=3@25 °C p95 ≤ 400 ms | per-config assertion (config_id == "k3-25c") |
| AZ-428 | AC-3 K=2@50 °C p95 ≤ 400 ms | per-config assertion (config_id == "k2-hybrid-50c") |
| AZ-428 | AC-4 frame drop ≤ 10 % | LatencyReport.passes_frame_drop per config |
| AZ-428 | AC-5 partition recorded | write_partition_csv (informational, no threshold) |
| AZ-428 | AC-6 parameterization | 6 collected variants per config; both configs run per param |
Phase 3 — Code Quality
- SOLID: each evaluator owns one responsibility; fc-adapter-specific timestamp extraction lives in the AZ-429 scenario (
_read_emit_times_ms) rather than leaking into the evaluator. - Error handling:
ValueErroron negative latency/TTFF (fail-loud at evaluator boundary);pytest.failon malformed fixture (fail-loud at scenario boundary). No bareexcept. - DRY:
streaming_evaluator._percentilere-used byttff_evaluatorande2e_latency_evaluator— correct shared-helper pattern. - Tests: all use the Arrange/Act/Assert pattern with
# Arrange / # Act / # Assertmarkers per.cursor/rules/coderule.mdc. - Naming: scenario function names mirror task IDs (
test_nft_perf_0N_*); helper symbols use full domain words (ColdStartIteration,FrameLatencySample,SpoofEvent).
Phase 4 — Security
- No subprocess / shell=True / eval / exec usage in new code.
- No hardcoded secrets.
- Input from fixtures parsed via
json.loads(safe); shape validated with explicitpytest.failon malformed records — no insecure deserialisation.
Phase 5 — Performance
- One sort per percentile call (
sorted(values)); fixtures are ≤ N=900 per config — negligible. - No N+1 patterns; no blocking I/O in async contexts.
Phase 6 — Cross-Task Consistency
- All 4 evaluators share the
_percentilehelper fromstreaming_evaluator. - All 4 scenarios follow the identical fixture-consumer pattern (resolve fixture path → load → evaluate → write CSV evidence → record NFR metrics → assert).
- All 4 scenarios use
@pytest.mark.scenario_id+@pytest.mark.traces_toconsistently.
Phase 7 — Architecture Compliance
- All new files under
e2e/(Blackbox Tests component per_docs/02_document/module-layout.md). - No imports from
src/gps_denied_onboard(verified — explicit "does NOT import" notes in evaluator docstrings). - No new cyclic dependencies.
- No duplicate symbols across components.
Verdict Logic
- 0 Critical, 0 High → not FAIL.
- 1 Medium, 3 Low → PASS_WITH_WARNINGS.
F1 (duplicate import) is the only actionable finding without a downstream dependency; deferred to a follow-up hygiene pass given trivial scope.
Cumulative Trigger
Batch 85 advances the K-counter to 1 of K=3 from cumulative baseline (batches 82-84). Cumulative review trigger reached at batch 87.