# Performance Tests

Authored by `/test-spec` Phase 2 (2026-05-19). Performance tests measure latency / rate / sustained-load characteristics. Functional behaviour that those characteristics enable lives in `blackbox-tests.md`. Resource ceilings live in `resource-limit-tests.md`.

Every scenario records steady-state metrics — cold-start measurements are explicitly excluded by a warm-up precondition. Pass criteria use the methods in `_docs/00_problem/input_data/expected_results/results_report.md` (referenced by row id).

---

## Latency

### NFT-PERF-L1: Tier-1 per-frame end-to-end latency ≤ 100 ms
**Summary**: Per-frame end-to-end latency through the Tier-1 contract (frame in → normalised-box record out) ≤ 100 ms at 1280 px input.
**Traces to**: AC `Latency — Primitive (Tier 1) object detection / L1`.
**Tier**: HW (representative Jetson Orin Nano Super) OR benchmarked replay (the only way to satisfy the project-level Acceptance Gate).
**Metric**: per-frame wall-clock from RTSP frame-receive timestamp to normalised-box emission timestamp.

**Preconditions**:
- Warm-up: 100 frames played before measurement starts (TensorRT engine warm, autopilot's frame pipeline in steady state).
- Single 1280 px frame replayed via `rtsp-loopback`; the live Tier-1 service is colocated on the same Jetson.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Play `fixtures/images/4d6e1830d211ad50.jpg` as a 60 s loop at 30 fps | record per-frame (frame_receive_ts, normalised_box_emit_ts); compute Δms |
| 2 | Aggregate over the measurement window | report p50, p95, p99, max |

**Pass criteria**: `p95 ≤ 100 ms` AND `max ≤ 150 ms` (max gives a soft headroom; AC enforces the p95 line).
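The aggregation in step 2 and the pass check above can be sketched as follows. This is a minimal illustration, not the project's harness: `percentile` and `evaluate_l1` are hypothetical names, Python is used only for brevity, and the nearest-rank percentile definition is an assumption (the authoritative method is the one referenced in `results_report.md`). Warm-up samples are dropped before any statistic is computed.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile (assumed method): the smallest sample such that
    at least q percent of the data is at or below it."""
    if not samples:
        raise ValueError("no samples to aggregate")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_l1(latencies_ms, warmup_frames=100,
                p95_limit_ms=100.0, max_limit_ms=150.0):
    """Drop the warm-up frames, then report p50/p95/p99/max and the verdict."""
    steady = latencies_ms[warmup_frames:]
    report = {
        "p50": percentile(steady, 50),
        "p95": percentile(steady, 95),
        "p99": percentile(steady, 99),
        "max": max(steady),
    }
    report["passed"] = (report["p95"] <= p95_limit_ms
                        and report["max"] <= max_limit_ms)
    return report
```

Slicing off the warm-up frames before sorting keeps cold-start outliers out of p99 and max, which is what the steady-state rule requires.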
**Duration**: 60 s after warm-up.
**Test status**: READY (fixture present); Tier requires HW for the release gate.

---

### NFT-PERF-L2: Tier-2 per-ROI semantic confirmation ≤ 200 ms
**Summary**: Per-ROI latency through Tier-2 semantic confirmation ≤ 200 ms.
**Traces to**: AC `Latency — Semantic confirmation (Tier 2) / L2`.
**Tier**: HW + Tier-B (inline ROI crop generation).
**Metric**: per-ROI wall-clock from ROI submission to Tier-2 until Tier-2 emits its semantic confirmation.

**Preconditions**:
- Warm-up: 50 ROIs processed before measurement.
- Test runner derives a ~640×640 ROI inline from `fixtures/images/4d6e1830d211ad50.jpg` and injects it directly into the SUT's Tier-2 entry (via a test-only ROI submission API exposed in test builds).

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Submit 1000 ROIs at 5 Hz | per-ROI Δms |
| 2 | Aggregate | p50, p95, p99 |

**Pass criteria**: `p95 ≤ 200 ms`.
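The load shape in step 1 (1000 ROIs at 5 Hz) can be written down as a fixed submission schedule. The sketch below is illustrative only: `submission_offsets` is a hypothetical helper, and pacing against a fixed schedule rather than "submit when the previous ROI returns" is an assumption about how the runner drives the SUT.

```python
def submission_offsets(count, rate_hz):
    """Offsets in seconds from measurement start at which each ROI is submitted."""
    if count <= 0 or rate_hz <= 0:
        raise ValueError("count and rate_hz must be positive")
    return [i / rate_hz for i in range(count)]


offsets = submission_offsets(1000, 5.0)
period_ms = 1000.0 / 5.0              # gap between consecutive submissions
window_s = len(offsets) / 5.0         # nominal length of the measurement window
```

At 5 Hz the gap between submissions is 200 ms, the same as the p95 budget, so any ROI that exceeds the budget is still in flight when the next one is submitted. The nominal window is 1000 / 5 = 200 s.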
**Duration**: 200 s.
**Test status**: READY.

---

### NFT-PERF-L3: Tier-3 deep-analysis ≤ 5 s per ROI
**Summary**: Per-ROI deep-analysis (Tier-3 / VLM, when enabled) ≤ 5 s.
**Traces to**: AC `Latency — Deep semantic confirmation (Tier 3 / VLM, when enabled) / L3`.
**Tier**: HW + Tier-B (vlm-mock).
**Metric**: per-ROI wall-clock from the SUT issuing a Tier-3 IPC call until the VLM response is received and schema-validated.

**Preconditions**:
- Warm-up: 5 Tier-3 calls.
- `vlm-mock` configured to respond from `vlm-io-pairs` fixture; Tier-3 enabled via SUT config.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Trigger 100 Tier-3 calls via injected ROIs | per-call Δms |
| 2 | Aggregate | p50, p95, p99 |

**Pass criteria**: `p95 ≤ 5000 ms`.
**Duration**: as needed for 100 calls.
**Test status**: DEFERRED — `<DEFERRED: vlm-io-pairs (real I/O) and the pinned local VLM model>`.

---

### NFT-PERF-L4: Camera zoom transition (medium → high) ≤ 2 s
**Summary**: Wall-clock from issuing the medium→high zoom command to the physical zoom transition completing ≤ 2 s, including the 1–2 s physical floor (restriction).
**Traces to**: AC `Latency — Camera zoom transition / L4`, RESTRICT `Hardware — 40× optical zoom traversal takes 1–2 s wall-clock`.
**Tier**: HW (physical A40 OR benchmarked replay) — pure-emulator runs are not acceptable per `expected_results/results_report.md → Notes on this spec`.
**Metric**: wall-clock from outbound zoom command (observed on gimbal UDP) to gimbal-mock zoom telemetry reporting target_zoom_band.

**Preconditions**:
- SUT in `ZoomedIn` mode after a sweep-to-zoom transition; gimbal at medium zoom.
- HW Jetson OR `gimbal-mock` replaying recorded A40 zoom telemetry with realistic traversal time.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Trigger 30 medium→high zoom transitions via scripted POI sequence | per-transition Δms |
| 2 | Aggregate | p50, p95, max |

**Pass criteria**: `p95 ≤ 2000 ms`.
**Test status**: DEFERRED — `<DEFERRED: SITL or hardware-in-loop ViewPro A40 zoom command capture>`.

---

### NFT-PERF-L5: Decision-to-movement latency ≤ 500 ms
**Summary**: From the internal scan-control decision (POI detected mid-sweep) to the camera physically beginning to move ≤ 500 ms.
**Traces to**: AC `Latency — Decision-to-movement latency / L5`.
**Tier**: HW + Tier-B.
**Metric**: wall-clock from Tier-1 detection received at the scan-controller to first gimbal command observed on `gimbal-mock`.

**Preconditions**:
- Warm-up: 10 scripted POI events.
- Scripted scan-decision events followed by camera physical motion observed on the gimbal UDP channel.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Inject 100 POI detections at random sweep positions | per-event Δms (detection-receive-ts → gimbal-command-out-ts) |
| 2 | Aggregate | p95 |

**Pass criteria**: `p95 ≤ 500 ms`.
**Test status**: DEFERRED — `<DEFERRED: scripted scan decision events with paired gimbal telemetry capture>`.

---

### NFT-PERF-L6: Movement candidate enqueue ≤ 1 s (wide sweep)
**Summary**: From the movement event in the visual stream to candidate enqueued for zoomed inspection ≤ 1 s during the wide-area sweep.
**Traces to**: AC `Latency — Movement candidate enqueue / L6`.
**Tier**: B + E.
**Metric**: wall-clock from ground-truth movement-event timestamp (annotated in the fixture) to candidate appearing on operator-stream.

**Preconditions**:
- Warm-up: 30 s of sweep playback.
- Synchronised RTSP + gimbal.csv + telemetry.csv (DEFERRED CSV pair).

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Replay `fixtures/movement/video01.mp4` + paired CSVs | record per-event Δms |
| 2 | Aggregate over ~20 movement events | p95 |

**Pass criteria**: `p95 ≤ 1000 ms`.
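Computing the per-event Δms in step 1 requires pairing each annotated movement event with a candidate observed on operator-stream. One possible pairing rule is sketched below. It is an assumption, since the fixture annotations are still deferred: each ground-truth event is matched to the first candidate at or after it, and `enqueue_delays_ms` is a hypothetical name.

```python
from bisect import bisect_left


def enqueue_delays_ms(event_ts_ms, candidate_ts_ms):
    """For each ground-truth event, return the delay to the first candidate
    observed at or after it, or None if no candidate ever follows."""
    candidates = sorted(candidate_ts_ms)
    delays = []
    for event_ts in event_ts_ms:
        i = bisect_left(candidates, event_ts)
        delays.append(candidates[i] - event_ts if i < len(candidates) else None)
    return delays


# Three annotated events; the last one never produces a candidate.
delays = enqueue_delays_ms([1000, 5000, 9000], [1400, 5900, 6100])
```

A `None` entry is a missed event and would have to count against the scenario rather than be dropped from the p95. This rule also lets one candidate satisfy two closely spaced events, so a real harness would likely pair on an event identifier once the fixture provides one.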
**Test status**: DEFERRED — `<DEFERRED: paired gimbal.csv + telemetry.csv for video01.mp4 with annotated movement-event timestamps>`.

---

### NFT-PERF-L7: Movement candidate enqueue ≤ 1.5 s (zoomed-in)
**Summary**: Same as L6 but during a zoomed-in hold; budget relaxed to 1.5 s to accommodate gimbal slew.
**Traces to**: AC `Latency — Movement candidate enqueue … during the zoomed-in inspection / L7`.
**Tier**: B + E.
**Metric**: same as L6 but starting from a ZoomedIn hold.

**Preconditions**:
- SUT in ZoomedIn hold; small mover appears mid-hold.
- DEFERRED zoomed-in CSV pair.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Drive SUT into ZoomedIn hold; replay zoomed-in scene with small mover | per-event Δms |
| 2 | Aggregate over ~10 movement events | p95 |

**Pass criteria**: `p95 ≤ 1500 ms`.
**Test status**: DEFERRED — `<DEFERRED: paired gimbal.csv + telemetry.csv at zoomed-in band>`.

---

### NFT-PERF-L8: Zoom-out → zoom-in transition ≤ 2 s
**Summary**: From POI detected during sweep to ROI fully zoomed and held ≤ 2 s wall-clock.
**Traces to**: AC `Latency — Zoom-out → zoom-in transition / L8`.
**Tier**: HW + Tier-B.
**Metric**: wall-clock from Tier-1 detection injected → first frame at full zoom on the ROI (observed via gimbal-mock zoom telemetry and the operator-stream ROI overlay).

**Preconditions**:
- Warm-up.
- Scripted sweep + injected POI.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Inject 30 mid-sweep POIs | per-transition Δms |
| 2 | Aggregate | p95 |

**Pass criteria**: `p95 ≤ 2000 ms`.
**Test status**: DEFERRED — `<DEFERRED: sweep → zoomed-inspection transition capture with annotated transition-complete timestamps>`.

---

### NFT-PERF-L9: Operator command → action ≤ 500 ms
**Summary**: From operator click event (entering the SUT on the operator-stream return path) to the corresponding outbound command observed on its destination channel ≤ 500 ms; modem RTT is explicitly excluded by measuring on the SUT side of the modem.
**Traces to**: AC `Latency — Operator command → action / L9`.
**Tier**: B + E.
**Metric**: wall-clock from operator-stream message arrival at SUT → first outbound command observed on the affected channel (MAVLink waypoint POST, gimbal command, mode-change emission).

**Preconditions**:
- Operator-session-scripts include click events at deterministic offsets.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Replay scripted operator-click sequence (50 clicks across confirm / decline / target-follow / abort) | per-click Δms |
| 2 | Aggregate | p95 |

**Pass criteria**: `p95 ≤ 500 ms`.
**Test status**: DEFERRED — `<DEFERRED: operator-envelopes once Q9 resolves>` for signed commands; happy-path placeholder usable today for an early measurement (mark interim baseline only).

---

## Throughput / Rate

### NFT-PERF-T1: POI rate to operator capped at ≤ 5 / min
**Summary**: Even when Tier-1 produces detections faster than the cap, the rate of POIs SURFACED to the operator MUST stay ≤ 5 / min (hard cap, frozen 2026-05-06).
**Traces to**: AC `Throughput / Rate — POI rate surfaced to the operator / T1`.
**Tier**: B.
**Metric**: count of POIs emitted on operator-stream per rolling 60 s window.

**Preconditions**:
- Synthetic POI feed sustained at 20 POIs / min via `synthetic-poi-feeds`.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Inject sustained 20 POI/min feed for 10 minutes | per-minute count of surfaced POIs |
| 2 | Compute max over any rolling 60 s window | rolling-max |

**Pass criteria**: `rolling-max ≤ 5` POIs/min for every 60 s window.
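The rolling-max in step 2 can be computed with a two-pointer sweep over the emission timestamps. The sketch below is a minimal illustration with a hypothetical function name. It assumes timestamps in seconds and treats each window as half-open, `(t - 60, t]`, so two POIs exactly 60 s apart never share a window; the spec does not state the boundary convention.

```python
def rolling_max_count(emit_ts_s, window_s=60.0):
    """Largest number of emissions inside any window (t - window_s, t],
    evaluated with t at each emission timestamp."""
    ts = sorted(emit_ts_s)
    best = 0
    start = 0
    for end, t in enumerate(ts):
        # Advance the left edge until the oldest emission is inside the window.
        while t - ts[start] >= window_s:
            start += 1
        best = max(best, end - start + 1)
    return best


# One POI every 12 s for 10 minutes stays at the cap of 5 per window.
at_cap = rolling_max_count([i * 12.0 for i in range(50)])
# One POI every 10 s puts 6 in a window and would fail the criterion.
over_cap = rolling_max_count([i * 10.0 for i in range(60)])
```

Checking windows that end at an emission is sufficient, because any window with the maximum count can be slid right until its right edge lands on an emission without losing one.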
**Duration**: 10 min.
**Test status**: READY (synthetic feeds inline-authorable).

---

### NFT-PERF-T2: Position telemetry rate ∈ [1 Hz, 10 Hz]
**Summary**: The position telemetry the SUT consumes from the airframe link MUST sustain ≥1 Hz, target 10 Hz, over a 60 s window.
**Traces to**: AC `Throughput / Rate — Position telemetry rate / T2`.
**Tier**: B (with MAVLink replay) + E (live SITL).
**Metric**: count of `GLOBAL_POSITION_INT` messages consumed by the SUT per second.

**Preconditions**:
- MAVLink stream replayed at the configured target rate (10 Hz).

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Replay 60 s of GLOBAL_POSITION_INT at 10 Hz | per-second consumed count |
| 2 | Aggregate | min, mean |

**Pass criteria**: `min ≥ 1 Hz` AND `mean ≥ 9.5 Hz` (target 10 Hz with ≤ 5 % tolerance).
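The per-second count and the min/mean aggregation can be sketched as follows. The function names are hypothetical, and the sketch assumes consumption timestamps are expressed in seconds relative to the start of the 60 s window.

```python
from collections import Counter


def per_second_rate(consumed_ts_s, window_s=60):
    """Bucket consumption timestamps into whole seconds of the window and
    return (min, mean) messages per second."""
    counts = Counter(int(t) for t in consumed_ts_s if 0 <= t < window_s)
    per_second = [counts.get(second, 0) for second in range(window_s)]
    return min(per_second), sum(per_second) / window_s


def t2_passed(min_hz, mean_hz):
    return min_hz >= 1 and mean_hz >= 9.5


# A clean 10 Hz replay: 600 messages spread over 60 s.
clean = per_second_rate([i / 10 for i in range(600)])
# The same replay with every message in second 30 lost.
gap = per_second_rate([i / 10 for i in range(600) if not 300 <= i < 310])
```

Seconds with no messages are counted as zero rather than skipped, so a one-second dropout fails the `min ≥ 1 Hz` line even though the mean (590 / 60 ≈ 9.83 Hz) would still pass on its own.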
**Test status**: DEFERRED — `<DEFERRED: MAVLink replay fixture over a 60 s window>`.

---

### NFT-PERF-T3: Frame-rate floor → suppress zoom-in + health yellow
**Summary**: When the sustained camera frame rate drops below 10 fps for ≥5 s, zoom-in transitions MUST be suppressed AND overall health MUST surface yellow.
**Traces to**: AC `Throughput / Rate — Sustained camera frame-rate floor / T3`.
**Tier**: B.
**Metric**: pair: (boolean — was a zoom-in suppressed during the low-FPS window?), (boolean — did health surface yellow?).

**Preconditions**:
- SUT in normal sweep mode.
- `rtsp-loopback` plays `fixtures/videos/94d42580bd1ad6ff.mp4` with throttled decode injecting frame drops to keep FPS < 10 for ≥ 5 s.

| Step | Consumer Action | Measurement |
|---|---|---|
| 1 | Start playback at normal 30 fps | health remains green; zoom-in proceeds normally on detection |
| 2 | Throttle decode + drop frames to push FPS below 10 for ≥ 5 s | record: (a) whether a zoom-in-required event during this window was suppressed; (b) whether `GET /health` returns `overall == "yellow"` |

**Pass criteria**: both observations TRUE.
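Because the throttling is implemented by the consumer, the harness should first confirm that step 2 really produced a low-FPS window of at least 5 s before it trusts the two observations. A minimal sketch of that check and of the verdict follows. The names are hypothetical, and sampling the delivered frame rate once per second is an assumption, not something the spec prescribes.

```python
def longest_low_fps_run_s(fps_samples, floor_fps=10.0, sample_period_s=1.0):
    """Length in seconds of the longest consecutive run of samples below the floor."""
    longest = current = 0
    for fps in fps_samples:
        current = current + 1 if fps < floor_fps else 0
        longest = max(longest, current)
    return longest * sample_period_s


def t3_passed(fps_samples, zoom_in_suppressed, health_overall):
    """Valid only if the low-FPS stimulus lasted >= 5 s; then both observations must hold."""
    stimulus_ok = longest_low_fps_run_s(fps_samples) >= 5.0
    return stimulus_ok and zoom_in_suppressed and health_overall == "yellow"


# 10 s at 30 fps, 6 s throttled to 8 fps, then recovery.
samples = [30.0] * 10 + [8.0] * 6 + [30.0] * 5
```

With these samples, `t3_passed(samples, True, "yellow")` holds, while a run in which the throttled stretch is interrupted by a single good second would be rejected as an invalid stimulus rather than reported as an SUT failure.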
**Duration**: 30 s (5 s low-FPS window + buffer).
**Test status**: READY (fixture present; throttling implemented by consumer).

---

## Sustained-load (handoff to resource-limit-tests)

The two sustained-resource AC rows (Re1, Re2) live as resource-limit tests rather than performance tests because the pass criterion is "stays within ceiling for the duration", not "is fast enough":

- Re1 — combined RSS ≤ 6 GB onboard for everything autopilot owns — see `resource-limit-tests.md → NFT-RES-LIM-Re1`.
- Re2 — Tier-1 per-frame latency Δ ≤ 5 ms when autopilot's workload runs concurrently — see `resource-limit-tests.md → NFT-RES-LIM-Re2`. Re2 is the Tier-1 non-degradation contract; the absolute Tier-1 latency target is L1.

---

## Common preconditions for every performance scenario

- **Warm-up**: every scenario MUST include an explicit warm-up phase whose duration is recorded in the CSV report. This separates cold-start cost from steady-state behaviour.
- **Steady-state window**: pass criteria apply only to the steady-state window (after warm-up), not to the warm-up itself.
- **Hardware honesty**: scenarios that name Tier HW MUST run on representative Jetson Orin Nano Super OR on a benchmarked replay. Pure-x86-emulator runs report results but do NOT contribute to the project-level Acceptance Gate.
- **Concurrent workload disclosure**: every scenario records whether other autopilot subsystems were running concurrently (Tier-1 inference, VLM, MAVLink, etc.). Re2 is the only scenario that REQUIRES concurrent workload; the others MUST report it for context.
- **Seed + determinism**: where the test inputs involve randomness (e.g., synthetic-POI ordering tie-breakers), the seed is captured in the CSV report.
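The items above imply a minimum set of per-run fields in the CSV report. A sketch of one possible row layout follows. The column names, the `ScenarioRun` type and the example values are all hypothetical; the spec only requires that warm-up duration, concurrent workload and seed are recorded and that emulator-only runs are distinguishable from gate-eligible ones.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ScenarioRun:
    scenario_id: str
    tier: str
    warmup_s: float            # recorded so cold-start cost stays separate
    steady_state_s: float      # window the pass criteria apply to
    concurrent_workload: str   # disclosed for context
    seed: int                  # captured wherever inputs involve randomness
    counts_toward_gate: bool   # False for pure-x86-emulator runs


def to_csv(runs):
    """Serialise runs to CSV text with one header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ScenarioRun)])
    writer.writeheader()
    for run in runs:
        writer.writerow(asdict(run))
    return buf.getvalue()


# Illustrative values only.
report = to_csv([ScenarioRun("NFT-PERF-L2", "HW + Tier-B", 10.0, 200.0,
                             "tier1-inference", 1234, True)])
```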