# Performance Tests

Authored by `/test-spec` Phase 2 (2026-05-19).

Performance tests measure latency, rate, and sustained-load characteristics. The functional behaviour that those characteristics enable lives in `blackbox-tests.md`, and resource ceilings live in `resource-limit-tests.md`.

Every scenario records steady-state metrics. Cold-start measurements are explicitly excluded by a warm-up precondition. Pass criteria use the methods in `_docs/00_problem/input_data/expected_results/results_report.md` (referenced by row id).

---

## Latency

### NFT-PERF-L1: Tier-1 per-frame end-to-end latency ≤ 100 ms

**Summary**: Per-frame end-to-end latency through the Tier-1 contract (frame in → normalised-box record out) ≤ 100 ms at 1280 px input.

**Traces to**: AC `Latency — Primitive (Tier 1) object detection / L1`.

**Tier**: HW (representative Jetson Orin Nano Super) OR benchmarked replay (the only way to satisfy the project-level Acceptance Gate).

**Metric**: per-frame wall-clock from the RTSP frame-receive timestamp to the normalised-box emission timestamp.

**Preconditions**:

- Warm-up: 100 frames played before measurement starts (TensorRT engine warm, autopilot's frame pipeline in steady state).
- Single 1280 px frame replayed via `rtsp-loopback`; the live Tier-1 service is colocated on the same Jetson.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Play `fixtures/images/4d6e1830d211ad50.jpg` as a 60 s loop at 30 fps | record per-frame (frame_receive_ts, normalised_box_emit_ts); compute Δms |
| 2 | Aggregate over the measurement window | report p50, p95, p99, max |

**Pass criteria**: `p95 ≤ 100 ms` AND `max ≤ 150 ms` (max gives soft headroom; the AC enforces the p95 line).

**Duration**: 60 s after warm-up.

**Test status**: READY (fixture present); the Tier requires HW for the release gate.

---

### NFT-PERF-L2: Tier-2 per-ROI semantic confirmation ≤ 200 ms

**Summary**: Per-ROI latency through Tier-2 semantic confirmation ≤ 200 ms.
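
Each latency scenario in this section ends with an Aggregate step that reduces per-sample Δms values to percentiles and compares p95 (and, for NFT-PERF-L1, max) against a budget. `results_report.md` holds the authoritative percentile method; the sketch below assumes the nearest-rank definition purely for illustration, and the names `LatencySummary`, `nearest_rank`, `summarise`, and `passes` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LatencySummary:
    p50: float
    p95: float
    p99: float
    max: float


def nearest_rank(sorted_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct % of samples <= it."""
    if not sorted_ms:
        raise ValueError("no samples in the steady-state window")
    rank = max(1, math.ceil(pct * len(sorted_ms) / 100))  # 1-based rank
    return sorted_ms[rank - 1]


def summarise(deltas_ms: Sequence[float]) -> LatencySummary:
    """Reduce per-sample delta values (steady-state window only) to the reported statistics."""
    ordered = sorted(deltas_ms)
    return LatencySummary(
        p50=nearest_rank(ordered, 50),
        p95=nearest_rank(ordered, 95),
        p99=nearest_rank(ordered, 99),
        max=ordered[-1],
    )


def passes(summary: LatencySummary, p95_budget_ms: float,
           max_budget_ms: Optional[float] = None) -> bool:
    """AC p95 gate, plus an optional max gate such as NFT-PERF-L1's 150 ms headroom."""
    if summary.p95 > p95_budget_ms:
        return False
    return max_budget_ms is None or summary.max <= max_budget_ms


# Illustrative only: 1..100 ms stands in for real per-frame deltas.
example = summarise([float(ms) for ms in range(1, 101)])
l1_ok = passes(example, p95_budget_ms=100, max_budget_ms=150)
```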

**Traces to**: AC `Latency — Semantic confirmation (Tier 2) / L2`.

**Tier**: HW + Tier-B (inline ROI crop generation).

**Metric**: per-ROI wall-clock from ROI submission to Tier-2 until Tier-2 emits the semantic confirmation.

**Preconditions**:

- Warm-up: 50 ROIs processed before measurement.
- Test runner derives a ~640×640 ROI inline from `fixtures/images/4d6e1830d211ad50.jpg` and injects it directly into the SUT's Tier-2 entry (via a test-only ROI submission API exposed in test builds).

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Submit 1000 ROIs at 5 Hz | per-ROI Δms |
| 2 | Aggregate | p50, p95, p99 |

**Pass criteria**: `p95 ≤ 200 ms`.

**Duration**: 200 s.

**Test status**: READY.

---

### NFT-PERF-L3: Tier-3 deep-analysis ≤ 5 s per ROI

**Summary**: Per-ROI deep analysis (Tier-3 / VLM, when enabled) ≤ 5 s.

**Traces to**: AC `Latency — Deep semantic confirmation (Tier 3 / VLM, when enabled) / L3`.

**Tier**: HW + Tier-B (vlm-mock).

**Metric**: per-ROI wall-clock from the SUT issuing a Tier-3 IPC call to the VLM response being received and schema-validated.

**Preconditions**:

- Warm-up: 5 Tier-3 calls.
- `vlm-mock` configured to respond from the `vlm-io-pairs` fixture; Tier-3 enabled via SUT config.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Trigger 100 Tier-3 calls via injected ROIs | per-call Δms |
| 2 | Aggregate | p50, p95, p99 |

**Pass criteria**: `p95 ≤ 5000 ms`.

**Duration**: as needed for 100 calls.

**Test status**: DEFERRED — ``.

---

### NFT-PERF-L4: Camera zoom transition (medium → high) ≤ 2 s

**Summary**: Wall-clock from issuing the medium→high zoom command to the physical zoom transition completing ≤ 2 s, including the 1–2 s physical floor (restriction).

**Traces to**: AC `Latency — Camera zoom transition / L4`, RESTRICT `Hardware — 40× optical zoom traversal takes 1–2 s wall-clock`.

**Tier**: HW (physical A40 OR benchmarked replay); pure-emulator runs are not acceptable per `expected_results/results_report.md → Notes on this spec`.
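
This scenario's latency ends at the first zoom telemetry sample that reports `target_zoom_band`, not at the next telemetry message, so each outbound command has to be paired with that specific sample. A minimal sketch, assuming commands and telemetry are available as time-ordered `(timestamp_s, zoom_band)` tuples; the tuple shapes and `zoom_transition_deltas_ms` are hypothetical. A command whose target band never appears before the next command is returned as `None`, so an incomplete transition is counted as a failure instead of silently dropping out of the p95.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

Command = Tuple[float, str]    # (sent_ts_s, target_zoom_band), hypothetical shape
Telemetry = Tuple[float, str]  # (report_ts_s, zoom_band), hypothetical shape


def zoom_transition_deltas_ms(commands: Sequence[Command],
                              telemetry: Sequence[Telemetry]) -> List[Optional[float]]:
    """Pair each zoom command with the first later telemetry sample reporting its target band.

    Both inputs must be sorted by timestamp. The search for command i stops at
    command i+1, so a transition that never completes is returned as None rather
    than being credited with a sample that belongs to a later command.
    """
    tel_ts = [ts for ts, _ in telemetry]
    deltas: List[Optional[float]] = []
    for i, (sent_ts, target) in enumerate(commands):
        stop_ts = commands[i + 1][0] if i + 1 < len(commands) else float("inf")
        j = bisect_left(tel_ts, sent_ts)  # first sample at or after the command
        delta: Optional[float] = None
        while j < len(telemetry) and telemetry[j][0] < stop_ts:
            if telemetry[j][1] == target:
                delta = (telemetry[j][0] - sent_ts) * 1000.0
                break
            j += 1
        deltas.append(delta)
    return deltas
```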

**Metric**: wall-clock from the outbound zoom command (observed on gimbal UDP) to `gimbal-mock` zoom telemetry reporting `target_zoom_band`.

**Preconditions**:

- SUT in `ZoomedIn` mode after a sweep-to-zoom transition; gimbal at medium zoom.
- HW Jetson OR `gimbal-mock` replaying recorded A40 zoom telemetry with realistic traversal time.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Trigger 30 medium→high zoom transitions via scripted POI sequence | per-transition Δms |
| 2 | Aggregate | p50, p95, max |

**Pass criteria**: `p95 ≤ 2000 ms`.

**Test status**: DEFERRED — ``.

---

### NFT-PERF-L5: Decision-to-movement latency ≤ 500 ms

**Summary**: From the internal scan-control decision (POI detected mid-sweep) to the camera physically beginning to move ≤ 500 ms.

**Traces to**: AC `Latency — Decision-to-movement latency / L5`.

**Tier**: HW + Tier-B.

**Metric**: wall-clock from Tier-1 detection received at the scan-controller to the first gimbal command observed on `gimbal-mock`.

**Preconditions**:

- Warm-up: 10 scripted POI events.
- Scripted scan-decision events followed by camera physical motion observed on the gimbal UDP channel.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Inject 100 POI detections at random sweep positions | per-event Δms (detection-receive-ts → gimbal-command-out-ts) |
| 2 | Aggregate | p95 |

**Pass criteria**: `p95 ≤ 500 ms`.

**Test status**: DEFERRED — ``.

---

### NFT-PERF-L6: Movement candidate enqueue ≤ 1 s (wide sweep)

**Summary**: From the movement event in the visual stream to the candidate being enqueued for zoomed inspection ≤ 1 s during the wide-area sweep.

**Traces to**: AC `Latency — Movement candidate enqueue / L6`.

**Tier**: B + E.

**Metric**: wall-clock from the ground-truth movement-event timestamp (annotated in the fixture) to the candidate appearing on operator-stream.

**Preconditions**:

- Warm-up: 30 s of sweep playback.
- Synchronised RTSP + gimbal.csv + telemetry.csv (DEFERRED CSV pair).
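
Because this metric starts at an annotated ground-truth timestamp rather than at a SUT-internal event, each annotated movement event has to be matched to the candidate it produced. A minimal sketch, assuming both sides are reduced to sorted timestamps and that the first unconsumed candidate at or after an event belongs to it (a simplification; a real harness may need to match on position or track id). `enqueue_latencies_ms` and its `horizon_s` cut-off are hypothetical, not part of the AC. Events with no candidate come back as `None` so misses stay visible.

```python
from typing import List, Optional, Sequence


def enqueue_latencies_ms(event_ts_s: Sequence[float],
                         candidate_ts_s: Sequence[float],
                         horizon_s: float = 5.0) -> List[Optional[float]]:
    """Greedy one-to-one match of ground-truth movement events to candidate arrivals.

    Both inputs must be sorted ascending. Each candidate is consumed at most once,
    and a candidate arriving more than horizon_s after an event is not attributed
    to it (horizon_s is a hypothetical tuning knob).
    """
    latencies: List[Optional[float]] = []
    c = 0
    for ev in event_ts_s:
        # Skip candidates that arrived before this event (spurious or already unmatched).
        while c < len(candidate_ts_s) and candidate_ts_s[c] < ev:
            c += 1
        if c < len(candidate_ts_s) and candidate_ts_s[c] - ev <= horizon_s:
            latencies.append((candidate_ts_s[c] - ev) * 1000.0)
            c += 1  # consume the matched candidate
        else:
            latencies.append(None)  # missed event: a failure, not a skipped sample
    return latencies
```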

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Replay `fixtures/movement/video01.mp4` + paired CSVs | record per-event Δms |
| 2 | Aggregate over ~20 movement events | p95 |

**Pass criteria**: `p95 ≤ 1000 ms`.

**Test status**: DEFERRED — ``.

---

### NFT-PERF-L7: Movement candidate enqueue ≤ 1.5 s (zoomed-in)

**Summary**: Same as L6, but during a zoomed-in hold; the budget is relaxed to 1.5 s to accommodate gimbal slew.

**Traces to**: AC `Latency — Movement candidate enqueue … during the zoomed-in inspection / L7`.

**Tier**: B + E.

**Metric**: same as L6, but starting from a `ZoomedIn` hold.

**Preconditions**:

- SUT in `ZoomedIn` hold; a small mover appears mid-hold.
- DEFERRED zoomed-in CSV pair.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Drive SUT into ZoomedIn hold; replay zoomed-in scene with small mover | per-event Δms |
| 2 | Aggregate over ~10 movement events | p95 |

**Pass criteria**: `p95 ≤ 1500 ms`.

**Test status**: DEFERRED — ``.

---

### NFT-PERF-L8: Zoom-out → zoom-in transition ≤ 2 s

**Summary**: From a POI detected during the sweep to the ROI fully zoomed and held ≤ 2 s wall-clock.

**Traces to**: AC `Latency — Zoom-out → zoom-in transition / L8`.

**Tier**: HW + Tier-B.

**Metric**: wall-clock from Tier-1 detection injection to the first frame at full zoom on the ROI (observed via `gimbal-mock` zoom telemetry and the operator-stream ROI overlay).

**Preconditions**:

- Warm-up.
- Scripted sweep + injected POI.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Inject 30 mid-sweep POIs | per-transition Δms |
| 2 | Aggregate | p95 |

**Pass criteria**: `p95 ≤ 2000 ms`.

**Test status**: DEFERRED — ``.

---

### NFT-PERF-L9: Operator command → action ≤ 500 ms

**Summary**: From the operator click event (entering the SUT on the operator-stream return path) to the corresponding outbound command observed on its destination channel ≤ 500 ms. Modem RTT is explicitly excluded by measuring on the SUT side of the modem.
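
Since modem RTT is excluded, both timestamps for this scenario are taken on the SUT side: the arrival of the operator message and the first outbound command it causes. A minimal sketch, assuming (hypothetically) that each outbound command can be attributed to the click that triggered it through a shared correlation id; `Click`, `OutboundCommand`, `click_to_action_ms` and the channel labels are illustrative, not an existing SUT interface. Clicks that never produce a command map to `None`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Click:
    click_id: str        # hypothetical correlation id carried by the operator message
    arrival_ts_s: float  # arrival at the SUT, measured on the SUT side of the modem


@dataclass(frozen=True)
class OutboundCommand:
    click_id: str        # hypothetical: the click this command was issued for
    channel: str         # e.g. "mavlink", "gimbal", "mode" (illustrative labels)
    sent_ts_s: float


def click_to_action_ms(clicks: Iterable[Click],
                       commands: Iterable[OutboundCommand]) -> Dict[str, Optional[float]]:
    """Latency from click arrival to the FIRST outbound command attributed to that click."""
    first_sent: Dict[str, float] = {}
    for cmd in commands:
        prev = first_sent.get(cmd.click_id)
        if prev is None or cmd.sent_ts_s < prev:
            first_sent[cmd.click_id] = cmd.sent_ts_s
    return {
        c.click_id: ((first_sent[c.click_id] - c.arrival_ts_s) * 1000.0
                     if c.click_id in first_sent else None)
        for c in clicks
    }
```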

**Traces to**: AC `Latency — Operator command → action / L9`.

**Tier**: B + E.

**Metric**: wall-clock from operator-stream message arrival at the SUT to the first outbound command observed on the affected channel (MAVLink waypoint POST, gimbal command, mode-change emission).

**Preconditions**:

- Operator-session-scripts include click events at deterministic offsets.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Replay scripted operator-click sequence (50 clicks across confirm / decline / target-follow / abort) | per-click Δms |
| 2 | Aggregate | p95 |

**Pass criteria**: `p95 ≤ 500 ms`.

**Test status**: DEFERRED — `` for signed commands; the happy-path placeholder is usable today for an early measurement (mark it as an interim baseline only).

---

## Throughput / Rate

### NFT-PERF-T1: POI rate to operator capped at ≤ 5 / min

**Summary**: Even when Tier-1 produces detections faster than the cap, the rate of POIs SURFACED to the operator MUST stay ≤ 5 / min (hard cap, frozen 2026-05-06).

**Traces to**: AC `Throughput / Rate — POI rate surfaced to the operator / T1`.

**Tier**: B.

**Metric**: count of POIs emitted on operator-stream per rolling 60 s window.

**Preconditions**:

- Synthetic POI feed sustained at 20 POIs / min via `synthetic-poi-feeds`.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Inject sustained 20 POI/min feed for 10 minutes | per-minute count of surfaced POIs |
| 2 | Compute max over any rolling 60 s window | rolling-max |

**Pass criteria**: `rolling-max ≤ 5` POIs/min for every 60 s window.

**Duration**: 10 min.

**Test status**: READY (synthetic feeds inline-authorable).

---

### NFT-PERF-T2: Position telemetry rate ∈ [1 Hz, 10 Hz]

**Summary**: The position telemetry the SUT consumes from the airframe link MUST sustain ≥ 1 Hz (target 10 Hz) over a 60 s window.

**Traces to**: AC `Throughput / Rate — Position telemetry rate / T2`.

**Tier**: B (with MAVLink replay) + E (live SITL).
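
NFT-PERF-T1 and this scenario both reduce timestamped events to windowed counts: T1 needs the maximum count over any rolling 60 s window, and T2 needs per-second counts over a fixed 60 s window. A minimal sketch, assuming event timestamps in seconds sorted ascending and treating windows as half-open (an assumption; `results_report.md` is authoritative). All function names are hypothetical. The rolling maximum only checks windows that start at an event, because sliding any window right until its left edge meets its first event never removes an event from it.

```python
from bisect import bisect_left
from typing import List, Sequence


def rolling_max_count(ts_s: Sequence[float], window_s: float = 60.0) -> int:
    """Largest number of events in any half-open window [t, t + window_s)."""
    best = 0
    for i, start in enumerate(ts_s):
        end = bisect_left(ts_s, start + window_s, lo=i)  # index of first event outside
        best = max(best, end - i)
    return best


def per_second_counts(ts_s: Sequence[float], start_s: float,
                      duration_s: int = 60) -> List[int]:
    """Message count per 1 s bucket over [start_s, start_s + duration_s); empty seconds are 0."""
    counts = [0] * duration_s
    for t in ts_s:
        offset = t - start_s
        if 0.0 <= offset < duration_s:
            counts[int(offset)] += 1  # int() floors because offset is non-negative
    return counts


def t1_passes(surfaced_poi_ts_s: Sequence[float], cap: int = 5) -> bool:
    """NFT-PERF-T1: no rolling 60 s window may contain more than `cap` surfaced POIs."""
    return rolling_max_count(surfaced_poi_ts_s, 60.0) <= cap


def t2_passes(counts: Sequence[int]) -> bool:
    """NFT-PERF-T2: min >= 1 msg/s AND mean >= 9.5 msg/s over the window."""
    return min(counts) >= 1 and sum(counts) / len(counts) >= 9.5
```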

**Metric**: count of `GLOBAL_POSITION_INT` messages consumed by the SUT per second.

**Preconditions**:

- MAVLink stream replayed at the configured target rate (10 Hz).

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Replay 60 s of `GLOBAL_POSITION_INT` at 10 Hz | per-second consumed count |
| 2 | Aggregate | min, mean |

**Pass criteria**: `min ≥ 1 Hz` AND `mean ≥ 9.5 Hz` (target 10 Hz with ≤ 5 % tolerance).

**Test status**: DEFERRED — ``.

---

### NFT-PERF-T3: Frame-rate floor → suppress zoom-in + health yellow

**Summary**: When the sustained camera frame rate drops below 10 fps for ≥ 5 s, zoom-in transitions MUST be suppressed AND overall health MUST surface yellow.

**Traces to**: AC `Throughput / Rate — Sustained camera frame-rate floor / T3`.

**Tier**: B.

**Metric**: a pair of booleans: (a) was a zoom-in suppressed during the low-FPS window? (b) did health surface yellow?

**Preconditions**:

- SUT in normal sweep mode.
- `rtsp-loopback` plays `fixtures/videos/94d42580bd1ad6ff.mp4` with throttled decode, injecting frame drops to keep FPS < 10 for ≥ 5 s.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Start playback at normal 30 fps | health remains green; zoom-in proceeds normally on detection |
| 2 | Throttle decode + drop frames to push FPS below 10 for ≥ 5 s | record: (a) whether a zoom-in-required event during this window was suppressed; (b) whether `GET /health` returns `overall == "yellow"` |

**Pass criteria**: both observations TRUE.

**Duration**: 30 s (5 s low-FPS window + buffer).

**Test status**: READY (fixture present; throttling implemented by the consumer).

---

## Sustained-load (handoff to resource-limit-tests)

The two sustained-resource AC rows (Re1, Re2) live as resource-limit tests rather than performance tests because the pass criterion is "stays within ceiling for the duration", not "is fast enough":

- Re1 — combined RSS ≤ 6 GB onboard for everything autopilot owns — see `resource-limit-tests.md → NFT-RES-LIM-Re1`.
- Re2 — Tier-1 per-frame latency Δ ≤ 5 ms when autopilot's workload runs concurrently — see `resource-limit-tests.md → NFT-RES-LIM-Re2`.

Re2 is the Tier-1 non-degradation contract; the absolute Tier-1 latency target is L1.

---

## Common preconditions for every performance scenario

- **Warm-up**: every scenario MUST include an explicit warm-up phase whose duration is recorded in the CSV report. This separates cold-start cost from steady-state behaviour.
- **Steady-state window**: pass criteria apply only to the steady-state window (after warm-up), not to the warm-up itself.
- **Hardware honesty**: scenarios that name Tier HW MUST run on a representative Jetson Orin Nano Super OR on a benchmarked replay. Pure-x86-emulator runs report results but do NOT contribute to the project-level Acceptance Gate.
- **Concurrent workload disclosure**: every scenario records whether other autopilot subsystems were running concurrently (Tier-1 inference, VLM, MAVLink, etc.). Re2 is the only scenario that REQUIRES a concurrent workload; the others MUST report it for context.
- **Seed + determinism**: where a test's inputs involve randomness (e.g., synthetic-POI ordering tie-breakers), the seed is captured in the CSV report.
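
The common preconditions amount to a small amount of per-run bookkeeping: drop warm-up samples before evaluating pass criteria, and write the warm-up duration, the concurrent-workload disclosure, and the seed alongside the result. A minimal sketch; `RunRecord`, `steady_state`, `write_report` and the CSV column names are hypothetical, since the actual report schema is defined elsewhere, not here.

```python
import csv
import io
import random
from dataclasses import asdict, dataclass, fields
from typing import List, Sequence, TextIO, Tuple


@dataclass(frozen=True)
class RunRecord:
    scenario_id: str          # e.g. "NFT-PERF-L1"
    warmup_s: float           # recorded warm-up duration
    concurrent_workload: str  # disclosed concurrent subsystems; "" if none
    seed: int                 # RNG seed for any randomised inputs
    samples_used: int         # sample count in the steady-state window


def steady_state(samples: Sequence[Tuple[float, float]], warmup_end_s: float) -> List[float]:
    """Keep values of (timestamp_s, value) samples taken at or after the end of warm-up."""
    return [value for ts, value in samples if ts >= warmup_end_s]


def write_report(rows: Sequence[RunRecord], out: TextIO) -> None:
    """Write one CSV row per run, header first."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(RunRecord)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


# Seeded tie-breaking: replaying with the logged seed reproduces the same order.
seed = 1234  # illustrative value; a real run would log whatever seed it used
ties = ["poi-a", "poi-b", "poi-c"]
random.Random(seed).shuffle(ties)

buf = io.StringIO()
write_report([RunRecord("NFT-PERF-L1", 10.0, "", seed, 1800)], buf)
```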