autopilot/_docs/02_document/tests/performance-tests.md

# Performance Tests

Authored by `/test-spec` Phase 2 (2026-05-19). Performance tests measure latency / rate / sustained-load characteristics. Functional behaviour that those characteristics enable lives in `blackbox-tests.md`. Resource ceilings live in `resource-limit-tests.md`.

Every scenario records steady-state metrics — cold-start measurements are explicitly excluded by a warm-up precondition. Pass criteria use the methods in `_docs/00_problem/input_data/expected_results/results_report.md` (referenced by row id).

---

## Latency

### NFT-PERF-L1: Tier-1 per-frame end-to-end latency ≤ 100 ms
**Summary**: Per-frame end-to-end latency through the Tier-1 contract (frame in → normalised-box record out) ≤ 100 ms at 1280 px input.
**Traces to**: AC `Latency — Primitive (Tier 1) object detection / L1`.
**Tier**: HW (representative Jetson Orin Nano Super) OR benchmarked replay; these are the only two modes that count toward the project-level Acceptance Gate.
**Metric**: per-frame wall-clock from RTSP frame-receive timestamp to normalised-box emission timestamp.

**Preconditions**:
- Warm-up: 100 frames played before measurement starts (TensorRT engine warm, autopilot's frame pipeline in steady state).
- Single 1280 px frame replayed via `rtsp-loopback`; the live Tier-1 service is colocated on the same Jetson.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Play `fixtures/images/4d6e1830d211ad50.jpg` as a 60 s loop at 30 fps | record per-frame (frame_receive_ts, normalised_box_emit_ts); compute Δms |
| 2 | Aggregate over the measurement window | report p50, p95, p99, max |

**Pass criteria**: `p95 ≤ 100 ms` AND `max ≤ 150 ms` (the `max` bound adds soft headroom; the AC itself enforces the p95 bound).
**Duration**: 60 s after warm-up.
**Test status**: READY (fixture present); Tier requires HW for the release gate.
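
The aggregation in step 2 and the pass check can be sketched as follows. This is a minimal sketch, not the project's harness: it assumes timestamps are captured as integer nanoseconds (for example from `time.monotonic_ns()`) and it assumes a nearest-rank percentile, whereas the authoritative method is the one referenced in `results_report.md`. The names `percentile`, `summarise_latency`, and `l1_passes` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (0 < q <= 100) of an ascending-sorted sequence."""
    if not sorted_values:
        raise ValueError("no samples to aggregate")
    rank = max(1, math.ceil(q * len(sorted_values) / 100.0))
    return sorted_values[rank - 1]


def summarise_latency(
    pairs: Sequence[Tuple[int, int]], warmup_frames: int = 0
) -> Dict[str, float]:
    """Aggregate (frame_receive_ns, box_emit_ns) pairs into latency stats in ms.

    warmup_frames drops leading samples, in case the recording still
    contains the warm-up phase (assumption: samples are in capture order).
    """
    deltas_ms = sorted((emit - recv) / 1e6 for recv, emit in pairs[warmup_frames:])
    return {
        "p50": percentile(deltas_ms, 50),
        "p95": percentile(deltas_ms, 95),
        "p99": percentile(deltas_ms, 99),
        "max": deltas_ms[-1],
    }


def l1_passes(summary: Dict[str, float]) -> bool:
    """Pass criteria from this scenario: p95 <= 100 ms AND max <= 150 ms."""
    return summary["p95"] <= 100.0 and summary["max"] <= 150.0
```

The same summary function would serve the other latency scenarios, with only the threshold check differing per scenario.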

---

### NFT-PERF-L2: Tier-2 per-ROI semantic confirmation ≤ 200 ms
**Summary**: Per-ROI latency through Tier-2 semantic confirmation ≤ 200 ms.
**Traces to**: AC `Latency — Semantic confirmation (Tier 2) / L2`.
**Tier**: HW + Tier-B (inline ROI crop generation).
**Metric**: per-ROI wall-clock from ROI submission to Tier-2 until Tier-2 emits the semantic confirmation.

**Preconditions**:
- Warm-up: 50 ROIs processed before measurement.
- Test runner derives a ~640×640 ROI inline from `fixtures/images/4d6e1830d211ad50.jpg` and injects it directly into the SUT's Tier-2 entry (via a test-only ROI submission API exposed in test builds).

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Submit 1000 ROIs at 5 Hz | per-ROI Δms |
| 2 | Aggregate | p50, p95, p99 |

**Pass criteria**: `p95 ≤ 200 ms`.
**Duration**: 200 s.
**Test status**: READY.
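
Submitting 1000 ROIs at 5 Hz gives the 200 s duration stated above (1000 / 5). One way to hold that rate without drift is to schedule each submission against an absolute deadline rather than sleeping a fixed interval. The sketch below assumes a hypothetical blocking `submit(i)` callable that returns once Tier-2 has emitted its confirmation; the real test-only ROI submission API is not specified here. The clock and sleep functions are injectable so the pacing can be checked without waiting.

```python
import time
from typing import Callable, List


def submit_paced(
    submit: Callable[[int], None],
    count: int = 1000,
    rate_hz: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Call submit(i) on a fixed-rate schedule; return per-call latency in ms."""
    period_s = 1.0 / rate_hz
    start = clock()
    latencies_ms: List[float] = []
    for i in range(count):
        # Absolute deadline for submission i, so slow calls do not
        # accumulate drift across the run.
        delay = (start + i * period_s) - clock()
        if delay > 0:
            sleep(delay)
        t0 = clock()
        submit(i)  # hypothetical: blocks until the confirmation is emitted
        latencies_ms.append((clock() - t0) * 1000.0)
    return latencies_ms
```

If a single call takes longer than one period, the next submission is issued immediately; whether that is acceptable for this scenario is a harness decision.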

---

### NFT-PERF-L3: Tier-3 deep-analysis ≤ 5 s per ROI
**Summary**: Per-ROI deep-analysis (Tier-3 / VLM, when enabled) ≤ 5 s.
**Traces to**: AC `Latency — Deep semantic confirmation (Tier 3 / VLM, when enabled) / L3`.
**Tier**: HW + Tier-B (vlm-mock).
**Metric**: per-ROI wall-clock from the SUT issuing a Tier-3 IPC call until the VLM response is received and schema-validated.

**Preconditions**:
- Warm-up: 5 Tier-3 calls.
- `vlm-mock` configured to respond from `vlm-io-pairs` fixture; Tier-3 enabled via SUT config.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Trigger 100 Tier-3 calls via injected ROIs | per-call Δms |
| 2 | Aggregate | p50, p95, p99 |

**Pass criteria**: `p95 ≤ 5000 ms`.
**Duration**: as needed for 100 calls.
**Test status**: DEFERRED — `<DEFERRED: vlm-io-pairs (real I/O) and the pinned local VLM model>`.

---

### NFT-PERF-L4: Camera zoom transition (medium → high) ≤ 2 s
**Summary**: Wall-clock from issuing the medium→high zoom command to completion of the physical zoom transition ≤ 2 s. The budget includes the 1–2 s physical traversal floor (the hardware restriction traced below).
**Traces to**: AC `Latency — Camera zoom transition / L4`, RESTRICT `Hardware — 40× optical zoom traversal takes 1–2 s wall-clock`.
**Tier**: HW (physical A40 OR benchmarked replay) — pure-emulator runs not acceptable per `expected_results/results_report.md → Notes on this spec`.
**Metric**: wall-clock from outbound zoom command (observed on gimbal UDP) to gimbal-mock zoom telemetry reporting target_zoom_band.

**Preconditions**:
- SUT in `ZoomedIn` mode after a sweep-to-zoom transition; gimbal at medium zoom.
- HW Jetson OR `gimbal-mock` replaying recorded A40 zoom telemetry with realistic traversal time.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Trigger 30 medium→high zoom transitions via scripted POI sequence | per-transition Δms |
| 2 | Aggregate | p50, p95, max |

**Pass criteria**: `p95 ≤ 2000 ms`.
**Test status**: DEFERRED — `<DEFERRED: SITL or hardware-in-loop ViewPro A40 zoom command capture>`.

---

### NFT-PERF-L5: Decision-to-movement latency ≤ 500 ms
**Summary**: From the internal scan-control decision (POI detected mid-sweep) to the camera physically beginning to move ≤ 500 ms.
**Traces to**: AC `Latency — Decision-to-movement latency / L5`.
**Tier**: HW + Tier-B.
**Metric**: wall-clock from Tier-1 detection received at the scan-controller to first gimbal command observed on `gimbal-mock`.

**Preconditions**:
- Warm-up: 10 scripted POI events.
- Scripted scan-decision events followed by camera physical motion observed on the gimbal UDP channel.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Inject 100 POI detections at random sweep positions | per-event Δms (detection-receive-ts → gimbal-command-out-ts) |
| 2 | Aggregate | p95 |

**Pass criteria**: `p95 ≤ 500 ms`.
**Test status**: DEFERRED — `<DEFERRED: scripted scan decision events with paired gimbal telemetry capture>`.
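
The per-event Δms in step 1 requires pairing each injected detection with the gimbal command it caused. A minimal sketch of one pairing rule follows, under the assumption that both timestamp lists come from the same clock and are expressed in milliseconds: each event is matched to the first command at or after it, but only if that command arrives before the next event. The function name `response_latencies_ms` is hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def response_latencies_ms(
    event_ts_ms: Sequence[float], command_ts_ms: Sequence[float]
) -> List[Optional[float]]:
    """Latency from each event to its first following command, or None.

    A command is attributed to an event only if it occurs before the next
    event, so one late command cannot be credited to two events.
    """
    events = sorted(event_ts_ms)
    commands = sorted(command_ts_ms)
    latencies: List[Optional[float]] = []
    for k, t in enumerate(events):
        next_event = events[k + 1] if k + 1 < len(events) else float("inf")
        i = bisect_left(commands, t)  # index of first command at or after t
        if i < len(commands) and commands[i] < next_event:
            latencies.append(commands[i] - t)
        else:
            latencies.append(None)  # no response observed for this event
    return latencies
```

An event with no matching command (`None`) is arguably a failure in its own right and should probably be reported rather than silently dropped before computing p95.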

---

### NFT-PERF-L6: Movement candidate enqueue ≤ 1 s (wide sweep)
**Summary**: From the movement event in the visual stream to candidate enqueued for zoomed inspection ≤ 1 s during the wide-area sweep.
**Traces to**: AC `Latency — Movement candidate enqueue / L6`.
**Tier**: B + E.
**Metric**: wall-clock from ground-truth movement-event timestamp (annotated in the fixture) to candidate appearing on operator-stream.

**Preconditions**:
- Warm-up: 30 s of sweep playback.
- Synchronised RTSP + gimbal.csv + telemetry.csv (DEFERRED CSV pair).

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Replay `fixtures/movement/video01.mp4` + paired CSVs | record per-event Δms |
| 2 | Aggregate over ~20 movement events | p95 |

**Pass criteria**: `p95 ≤ 1000 ms`.
**Test status**: DEFERRED — `<DEFERRED: paired gimbal.csv + telemetry.csv for video01.mp4 with annotated movement-event timestamps>`.

---

### NFT-PERF-L7: Movement candidate enqueue ≤ 1.5 s (zoomed-in)
**Summary**: Same as L6 but during a zoomed-in hold; budget relaxed to 1.5 s to accommodate gimbal slew.
**Traces to**: AC `Latency — Movement candidate enqueue … during the zoomed-in inspection / L7`.
**Tier**: B + E.
**Metric**: same as L6 but starting from a ZoomedIn hold.

**Preconditions**:
- SUT in ZoomedIn hold; small mover appears mid-hold.
- DEFERRED zoomed-in CSV pair.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Drive SUT into ZoomedIn hold; replay zoomed-in scene with small mover | per-event Δms |
| 2 | Aggregate over ~10 movement events | p95 |

**Pass criteria**: `p95 ≤ 1500 ms`.
**Test status**: DEFERRED — `<DEFERRED: paired gimbal.csv + telemetry.csv at zoomed-in band>`.

---

### NFT-PERF-L8: Zoom-out → zoom-in transition ≤ 2 s
**Summary**: From POI detected during sweep to ROI fully zoomed and held ≤ 2 s wall-clock.
**Traces to**: AC `Latency — Zoom-out → zoom-in transition / L8`.
**Tier**: HW + Tier-B.
**Metric**: wall-clock from Tier-1 detection injected → first frame at full zoom on the ROI (observed via gimbal-mock zoom telemetry and the operator-stream ROI overlay).

**Preconditions**:
- Warm-up phase (duration not yet specified here; record the value used in the CSV report).
- Scripted sweep + injected POI.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Inject 30 mid-sweep POIs | per-transition Δms |
| 2 | Aggregate | p95 |

**Pass criteria**: `p95 ≤ 2000 ms`.
**Test status**: DEFERRED — `<DEFERRED: sweep → zoomed-inspection transition capture with annotated transition-complete timestamps>`.

---

### NFT-PERF-L9: Operator command → action ≤ 500 ms
**Summary**: From operator click event (entering the SUT on the operator-stream return path) to the corresponding outbound command observed on its destination channel ≤ 500 ms. Modem RTT is explicitly excluded by measuring on the SUT side of the modem.
**Traces to**: AC `Latency — Operator command → action / L9`.
**Tier**: B + E.
**Metric**: wall-clock from operator-stream message arrival at SUT → first outbound command observed on the affected channel (MAVLink waypoint POST, gimbal command, mode-change emission).

**Preconditions**:
- Operator-session-scripts include click events at deterministic offsets.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Replay scripted operator-click sequence (50 clicks across confirm / decline / target-follow / abort) | per-click Δms |
| 2 | Aggregate | p95 |

**Pass criteria**: `p95 ≤ 500 ms`.
**Test status**: DEFERRED — `<DEFERRED: operator-envelopes once Q9 resolves>` for signed commands; happy-path placeholder usable today for an early measurement (mark interim baseline only).

---

## Throughput / Rate

### NFT-PERF-T1: POI rate to operator capped at ≤ 5 / min
**Summary**: Even when Tier-1 produces detections faster than the cap, the rate of POIs SURFACED to the operator MUST stay ≤ 5 / min (hard cap, frozen 2026-05-06).
**Traces to**: AC `Throughput / Rate — POI rate surfaced to the operator / T1`.
**Tier**: B.
**Metric**: count of POIs emitted on operator-stream per rolling 60 s window.

**Preconditions**:
- Synthetic POI feed sustained at 20 POIs / min via `synthetic-poi-feeds`.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Inject sustained 20 POI/min feed for 10 minutes | per-minute count of surfaced POIs |
| 2 | Compute max over any rolling 60 s window | rolling-max |

**Pass criteria**: `rolling-max ≤ 5`, i.e. no rolling 60 s window contains more than 5 surfaced POIs.
**Duration**: 10 min.
**Test status**: READY (synthetic feeds inline-authorable).
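
The rolling-max in step 2 can be computed with a two-pointer sweep over the surfaced-POI timestamps. The sketch below assumes timestamps in seconds and a half-open window `(t - 60, t]`; whether the AC intends open or closed window edges is not stated here, so treat the boundary handling as an assumption. The name `rolling_max_count` is hypothetical.

```python
from typing import Sequence


def rolling_max_count(event_ts_s: Sequence[float], window_s: float = 60.0) -> int:
    """Largest number of events falling in any rolling window of window_s seconds."""
    ts = sorted(event_ts_s)
    best = 0
    left = 0
    for right, t in enumerate(ts):
        # Shrink from the left until the window (t - window_s, t] holds
        # only events strictly newer than t - window_s.
        while t - ts[left] >= window_s:
            left += 1
        best = max(best, right - left + 1)
    return best


def t1_passes(surfaced_ts_s: Sequence[float], cap: int = 5) -> bool:
    """Pass criterion from this scenario: rolling-max <= 5 per 60 s window."""
    return rolling_max_count(surfaced_ts_s) <= cap
```

Checking only windows that end at an event is sufficient, because the count in a window can only reach a new maximum at the moment an event enters it.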

---

### NFT-PERF-T2: Position telemetry rate ∈ [1 Hz, 10 Hz]
**Summary**: The position telemetry the SUT consumes from the airframe link MUST sustain ≥1 Hz, target 10 Hz, over a 60 s window.
**Traces to**: AC `Throughput / Rate — Position telemetry rate / T2`.
**Tier**: B (with MAVLink replay) + E (live SITL).
**Metric**: count of `GLOBAL_POSITION_INT` messages consumed by the SUT per second.

**Preconditions**:
- MAVLink stream replayed at the configured target rate (10 Hz).

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Replay 60 s of GLOBAL_POSITION_INT at 10 Hz | per-second consumed count |
| 2 | Aggregate | min, mean |

**Pass criteria**: `min ≥ 1 Hz` AND `mean ≥ 9.5 Hz` (target 10 Hz with ≤ 5 % tolerance).
**Test status**: DEFERRED — `<DEFERRED: MAVLink replay fixture over a 60 s window>`.
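
The per-second consumed count and the two-part pass check can be sketched as below. This assumes the harness can log a receive timestamp (in seconds) for every `GLOBAL_POSITION_INT` message the SUT consumes; how that log is obtained is outside this sketch, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def per_second_counts(
    msg_ts_s: Sequence[float], start_s: float, duration_s: int = 60
) -> List[int]:
    """Count messages in each whole-second bucket of the measurement window."""
    counts = [0] * duration_s
    for t in msg_ts_s:
        idx = math.floor(t - start_s)  # floor, so pre-window samples go negative
        if 0 <= idx < duration_s:
            counts[idx] += 1
    return counts


def t2_passes(counts: Sequence[int]) -> bool:
    """Pass criteria from this scenario: min >= 1 Hz AND mean >= 9.5 Hz."""
    return min(counts) >= 1 and sum(counts) / len(counts) >= 9.5
```

The `min` check catches a one-second dropout that a healthy mean would otherwise hide.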

---

### NFT-PERF-T3: Frame-rate floor → suppress zoom-in + health yellow
**Summary**: When the sustained camera frame rate drops below 10 fps for ≥5 s, zoom-in transitions MUST be suppressed AND overall health MUST surface yellow.
**Traces to**: AC `Throughput / Rate — Sustained camera frame-rate floor / T3`.
**Tier**: B.
**Metric**: a pair of booleans: (a) was a zoom-in suppressed during the low-FPS window? (b) did health surface yellow?

**Preconditions**:
- SUT in normal sweep mode.
- `rtsp-loopback` plays `fixtures/videos/94d42580bd1ad6ff.mp4` with throttled decode injecting frame drops to keep FPS < 10 for ≥ 5 s.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Start playback at normal 30 fps | health remains green; zoom-in proceeds normally on detection |
| 2 | Throttle decode + drop frames to push FPS below 10 for ≥ 5 s | record: (a) whether a zoom-in-required event during this window was suppressed; (b) whether `GET /health` returns `overall == "yellow"` |

**Pass criteria**: both observations TRUE.
**Duration**: 30 s (5 s low-FPS window + buffer).
**Test status**: READY (fixture present; throttling implemented by consumer).
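
Because the consumer implements the throttling, it is worth verifying that the low-FPS window actually occurred before trusting the two observations. The sketch below assumes the frame rate is sampled as per-second frame counts and treats "sustained" as at least 5 consecutive one-second buckets below 10 fps; the AC's exact definition of the sustained rate is not given here. The function names are hypothetical, and `health_overall` stands for the `overall` value returned by `GET /health`.

```python
from typing import Sequence


def longest_low_fps_run(per_second_frames: Sequence[int], floor_fps: int = 10) -> int:
    """Length in seconds of the longest run of buckets below floor_fps."""
    longest = run = 0
    for frames in per_second_frames:
        run = run + 1 if frames < floor_fps else 0
        longest = max(longest, run)
    return longest


def t3_passes(
    per_second_frames: Sequence[int],
    zoom_in_suppressed: bool,
    health_overall: str,
    min_run_s: int = 5,
) -> bool:
    """True only if the low-FPS window was produced AND both observations hold."""
    window_produced = longest_low_fps_run(per_second_frames) >= min_run_s
    return window_produced and zoom_in_suppressed and health_overall == "yellow"
```

A `False` result here conflates "throttling never produced the window" with "the SUT misbehaved"; a real harness would likely report those two cases separately.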

---

## Sustained-load (handoff to resource-limit-tests)

The two sustained-resource AC rows (Re1, Re2) live as resource-limit tests rather than performance tests because the pass criterion is "stays within ceiling for the duration", not "is fast enough":

- Re1 — combined RSS ≤ 6 GB onboard for everything autopilot owns — see `resource-limit-tests.md → NFT-RES-LIM-Re1`.
- Re2 — Tier-1 per-frame latency Δ ≤ 5 ms when autopilot's workload runs concurrently — see `resource-limit-tests.md → NFT-RES-LIM-Re2`. Re2 is the Tier-1 non-degradation contract; the absolute Tier-1 latency target is L1.

---

## Common preconditions for every performance scenario

- **Warm-up**: every scenario MUST include an explicit warm-up phase whose duration is recorded in the CSV report. This separates cold-start cost from steady-state behaviour.
- **Steady-state window**: pass criteria apply only to the steady-state window (after warm-up), not to the warm-up itself.
- **Hardware honesty**: scenarios that name Tier HW MUST run on representative Jetson Orin Nano Super OR on a benchmarked replay. Pure-x86-emulator runs report results but do NOT contribute to the project-level Acceptance Gate.
- **Concurrent workload disclosure**: every scenario records whether other autopilot subsystems were running concurrently (Tier-1 inference, VLM, MAVLink, etc.). Re2 is the only scenario that REQUIRES concurrent workload; the others MUST report it for context.
- **Seed + determinism**: where the test inputs involve randomness (e.g., synthetic-POI ordering tie-breakers), the seed is captured in the CSV report.
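
The preconditions above imply a set of fields that every row of the CSV report needs to carry. A minimal sketch of such a row follows; the column names and the `ScenarioRecord` type are hypothetical, since the actual report schema is not defined in this document.

```python
import csv
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional, TextIO


@dataclass
class ScenarioRecord:
    scenario_id: str            # e.g. "NFT-PERF-L1"
    warmup_s: float             # recorded warm-up duration
    measured_s: float           # steady-state window the criteria apply to
    hardware: str               # e.g. Jetson, benchmarked replay, or x86 emulator
    concurrent_workload: str    # which other subsystems were running
    seed: Optional[int]         # None when the scenario has no randomness
    passed: bool


def write_report(records: Iterable[ScenarioRecord], out: TextIO) -> None:
    """Write one CSV row per scenario, with a header derived from the dataclass."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(ScenarioRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
```

Deriving the header from the dataclass keeps the report columns and the recorded fields from drifting apart as preconditions are added.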