Performance Tests

Authored by /test-spec Phase 2 (2026-05-19). Performance tests measure latency / rate / sustained-load characteristics. Functional behaviour that those characteristics enable lives in blackbox-tests.md. Resource ceilings live in resource-limit-tests.md.

Every scenario records steady-state metrics — cold-start measurements are explicitly excluded by a warm-up precondition. Pass criteria use the methods in _docs/00_problem/input_data/expected_results/results_report.md (referenced by row id).


Latency

NFT-PERF-L1: Tier-1 per-frame end-to-end latency ≤ 100 ms

Summary: Per-frame end-to-end latency through the Tier-1 contract (frame in → normalised-box record out) ≤ 100 ms at 1280 px input. Traces to: AC Latency — Primitive (Tier 1) object detection / L1. Tier: HW (representative Jetson Orin Nano Super) OR benchmarked replay; only these two tiers count toward the project-level Acceptance Gate. Metric: per-frame wall-clock from RTSP frame-receive timestamp to normalised-box emission timestamp.

Preconditions:

  • Warm-up: 100 frames played before measurement starts (TensorRT engine warm, autopilot's frame pipeline in steady state).
  • Single 1280 px frame replayed via rtsp-loopback; the live Tier-1 service is colocated on the same Jetson.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Play fixtures/images/4d6e1830d211ad50.jpg as a 60 s loop at 30 fps | record per-frame (frame_receive_ts, normalised_box_emit_ts); compute Δms |
| 2 | Aggregate over the measurement window | report p50, p95, p99, max |

Pass criteria: p95 ≤ 100 ms AND max ≤ 150 ms (max gives a soft headroom; AC enforces the p95 line). Duration: 60 s after warm-up. Test status: READY (fixture present); Tier requires HW for the release gate.
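
The aggregation and pass check for L1 can be sketched as follows. This is a minimal illustration, not the project's harness: the function names are hypothetical, timestamps are assumed to be in seconds, warm-up is assumed to be the first 100 recorded frames, and the nearest-rank percentile is an assumption (the authoritative method is whatever results_report.md defines).

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile (assumed method): the smallest sample such
    that at least q percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def summarise_l1(frames, warmup_frames=100):
    """frames: list of (frame_receive_ts, normalised_box_emit_ts) in seconds.

    The first `warmup_frames` entries are discarded so that only the
    steady-state window contributes to the verdict.
    """
    steady = frames[warmup_frames:]
    deltas_ms = [(emit - recv) * 1000.0 for recv, emit in steady]
    summary = {
        "p50": percentile(deltas_ms, 50),
        "p95": percentile(deltas_ms, 95),
        "p99": percentile(deltas_ms, 99),
        "max": max(deltas_ms),
    }
    # Pass criteria for NFT-PERF-L1: p95 <= 100 ms AND max <= 150 ms.
    summary["passed"] = summary["p95"] <= 100.0 and summary["max"] <= 150.0
    return summary
```

A single outlier above 150 ms fails the run even when p95 is comfortably inside the budget, which is the intent of the soft-headroom bound on max.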


NFT-PERF-L2: Tier-2 per-ROI semantic confirmation ≤ 200 ms

Summary: Per-ROI latency through Tier-2 semantic confirmation ≤ 200 ms. Traces to: AC Latency — Semantic confirmation (Tier 2) / L2. Tier: HW + Tier-B (inline ROI crop generation). Metric: per-ROI wall-clock from ROI submission to Tier-2 until Tier-2 emits its semantic confirmation.

Preconditions:

  • Warm-up: 50 ROIs processed before measurement.
  • Test runner derives a ~640×640 ROI inline from fixtures/images/4d6e1830d211ad50.jpg and injects it directly into the SUT's Tier-2 entry (via a test-only ROI submission API exposed in test builds).

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Submit 1000 ROIs at 5 Hz | per-ROI Δms |
| 2 | Aggregate | p50, p95, p99 |

Pass criteria: p95 ≤ 200 ms. Duration: 200 s. Test status: READY.
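
The 200 s duration follows directly from the submission plan: 1000 ROIs at 5 Hz. A small pacing sketch makes the arithmetic explicit; the helper names are hypothetical, and it assumes evenly spaced submissions with the 50 warm-up ROIs paced separately, before the measured run.

```python
def submission_offsets(count, rate_hz):
    """Offsets in seconds, from the start of the measured run, at which
    each ROI is submitted when pacing evenly at `rate_hz`."""
    period = 1.0 / rate_hz
    return [i * period for i in range(count)]


def nominal_duration_s(count, rate_hz):
    """Nominal length of the measured run: count / rate."""
    return count / rate_hz
```

With `count=1000` and `rate_hz=5`, the nominal duration is 200 s and the last ROI is submitted at roughly 199.8 s.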


NFT-PERF-L3: Tier-3 deep-analysis ≤ 5 s per ROI

Summary: Per-ROI deep-analysis (Tier-3 / VLM, when enabled) ≤ 5 s. Traces to: AC Latency — Deep semantic confirmation (Tier 3 / VLM, when enabled) / L3. Tier: HW + Tier-B (vlm-mock). Metric: per-ROI wall-clock from SUT issuing a Tier-3 IPC call to VLM response received and schema-validated.

Preconditions:

  • Warm-up: 5 Tier-3 calls.
  • vlm-mock configured to respond from vlm-io-pairs fixture; Tier-3 enabled via SUT config.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Trigger 100 Tier-3 calls via injected ROIs | per-call Δms |
| 2 | Aggregate | p50, p95, p99 |

Pass criteria: p95 ≤ 5000 ms. Duration: as needed for 100 calls. Test status: DEFERRED — <DEFERRED: vlm-io-pairs (real I/O) and the pinned local VLM model>.


NFT-PERF-L4: Camera zoom transition (medium → high) ≤ 2 s

Summary: Wall-clock from issuing the medium→high zoom command to the physical zoom transition completing ≤ 2 s, including the 1–2 s physical floor (restriction). Traces to: AC Latency — Camera zoom transition / L4, RESTRICT Hardware — 40× optical zoom traversal takes 1–2 s wall-clock. Tier: HW (physical A40 OR benchmarked replay); pure-emulator runs are not acceptable per expected_results/results_report.md → Notes on this spec. Metric: wall-clock from outbound zoom command (observed on gimbal UDP) to gimbal-mock zoom telemetry reporting target_zoom_band.

Preconditions:

  • SUT in ZoomedIn mode after a sweep-to-zoom transition; gimbal at medium zoom.
  • HW Jetson OR gimbal-mock replaying recorded A40 zoom telemetry with realistic traversal time.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Trigger 30 medium→high zoom transitions via scripted POI sequence | per-transition Δms |
| 2 | Aggregate | p50, p95, max |

Pass criteria: p95 ≤ 2000 ms. Test status: DEFERRED — <DEFERRED: SITL or hardware-in-loop ViewPro A40 zoom command capture>.


NFT-PERF-L5: Decision-to-movement latency ≤ 500 ms

Summary: From the internal scan-control decision (POI detected mid-sweep) to the camera physically beginning to move ≤ 500 ms. Traces to: AC Latency — Decision-to-movement latency / L5. Tier: HW + Tier-B. Metric: wall-clock from Tier-1 detection received at the scan-controller to first gimbal command observed on gimbal-mock.

Preconditions:

  • Warm-up: 10 scripted POI events.
  • Scripted scan-decision events followed by camera physical motion observed on the gimbal UDP channel.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Inject 100 POI detections at random sweep positions | per-event Δms (detection-receive-ts → gimbal-command-out-ts) |
| 2 | Aggregate | p95 |

Pass criteria: p95 ≤ 500 ms. Test status: DEFERRED — <DEFERRED: scripted scan decision events with paired gimbal telemetry capture>.
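
Computing the per-event Δms requires pairing each injected detection with the gimbal command it caused. The spec names only the two timestamps, so the matching rule below is an assumption: each detection is paired with the first not-yet-used gimbal command at or after it. The function name is hypothetical and timestamps are assumed to be integer milliseconds on a shared clock.

```python
from bisect import bisect_left


def pair_latencies_ms(detection_ts, command_ts):
    """Return one latency per detection (in ms), or None when no later
    gimbal command is available to pair with it.

    Assumed rule: a command is consumed by at most one detection, and a
    detection takes the earliest unused command at or after its timestamp.
    """
    detections = sorted(detection_ts)
    commands = sorted(command_ts)
    deltas = []
    next_cmd = 0  # index of the first command not yet paired
    for det in detections:
        idx = bisect_left(commands, det, lo=next_cmd)
        if idx == len(commands):
            deltas.append(None)  # unmatched detection: report, do not drop
            continue
        deltas.append(commands[idx] - det)
        next_cmd = idx + 1
    return deltas
```

Unmatched detections are kept as `None` rather than dropped, so that a missing gimbal command shows up in the report instead of silently improving the p95.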


NFT-PERF-L6: Movement candidate enqueue ≤ 1 s (wide sweep)

Summary: From the movement event in the visual stream to candidate enqueued for zoomed inspection ≤ 1 s during the wide-area sweep. Traces to: AC Latency — Movement candidate enqueue / L6. Tier: B + E. Metric: wall-clock from ground-truth movement-event timestamp (annotated in the fixture) to candidate appearing on operator-stream.

Preconditions:

  • Warm-up: 30 s of sweep playback.
  • Synchronised RTSP + gimbal.csv + telemetry.csv (DEFERRED CSV pair).

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Replay fixtures/movement/video01.mp4 + paired CSVs | record per-event Δms |
| 2 | Aggregate over ~20 movement events | p95 |

Pass criteria: p95 ≤ 1000 ms. Test status: DEFERRED — <DEFERRED: paired gimbal.csv + telemetry.csv for video01.mp4 with annotated movement-event timestamps>.


NFT-PERF-L7: Movement candidate enqueue ≤ 1.5 s (zoomed-in)

Summary: Same as L6 but during a zoomed-in hold; budget relaxed to 1.5 s to accommodate gimbal slew. Traces to: AC Latency — Movement candidate enqueue … during the zoomed-in inspection / L7. Tier: B + E. Metric: same as L6 but starting from a ZoomedIn hold.

Preconditions:

  • SUT in ZoomedIn hold; small mover appears mid-hold.
  • DEFERRED zoomed-in CSV pair.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Drive SUT into ZoomedIn hold; replay zoomed-in scene with small mover | per-event Δms |
| 2 | Aggregate over ~10 movement events | p95 |

Pass criteria: p95 ≤ 1500 ms. Test status: DEFERRED — <DEFERRED: paired gimbal.csv + telemetry.csv at zoomed-in band>.


NFT-PERF-L8: Zoom-out → zoom-in transition ≤ 2 s

Summary: From POI detected during sweep to ROI fully zoomed and held ≤ 2 s wall-clock. Traces to: AC Latency — Zoom-out → zoom-in transition / L8. Tier: HW + Tier-B. Metric: wall-clock from Tier-1 detection injected → first frame at full zoom on the ROI (observed via gimbal-mock zoom telemetry and the operator-stream ROI overlay).

Preconditions:

  • Warm-up.
  • Scripted sweep + injected POI.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Inject 30 mid-sweep POIs | per-transition Δms |
| 2 | Aggregate | p95 |

Pass criteria: p95 ≤ 2000 ms. Test status: DEFERRED — <DEFERRED: sweep → zoomed-inspection transition capture with annotated transition-complete timestamps>.


NFT-PERF-L9: Operator command → action ≤ 500 ms

Summary: From operator click event (entering the SUT on the operator-stream return path) to the corresponding outbound command observed on its destination channel ≤ 500 ms; modem RTT is explicitly excluded by measuring on the SUT side of the modem. Traces to: AC Latency — Operator command → action / L9. Tier: B + E. Metric: wall-clock from operator-stream message arrival at SUT → first outbound command observed on the affected channel (MAVLink waypoint POST, gimbal command, mode-change emission).

Preconditions:

  • Operator-session-scripts include click events at deterministic offsets.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Replay scripted operator-click sequence (50 clicks across confirm / decline / target-follow / abort) | per-click Δms |
| 2 | Aggregate | p95 |

Pass criteria: p95 ≤ 500 ms. Test status: DEFERRED — <DEFERRED: operator-envelopes once Q9 resolves> for signed commands; happy-path placeholder usable today for an early measurement (mark interim baseline only).


Throughput / Rate

NFT-PERF-T1: POI rate to operator capped at ≤ 5 / min

Summary: Even when Tier-1 produces detections faster than the cap, the rate of POIs SURFACED to the operator MUST stay ≤ 5 / min (hard cap, frozen 2026-05-06). Traces to: AC Throughput / Rate — POI rate surfaced to the operator / T1. Tier: B. Metric: count of POIs emitted on operator-stream per rolling 60 s window.

Preconditions:

  • Synthetic POI feed sustained at 20 POIs / min via synthetic-poi-feeds.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Inject sustained 20 POI/min feed for 10 minutes | per-minute count of surfaced POIs |
| 2 | Compute max over any rolling 60 s window | rolling-max |

Pass criteria: rolling-max ≤ 5 POIs/min for every 60 s window. Duration: 10 min. Test status: READY (synthetic feeds inline-authorable).
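
The rolling-max in step 2 can be computed in a single pass over the surfaced-POI timestamps. The sketch below is illustrative only: the function name is hypothetical, timestamps are assumed to be in seconds, and the window is assumed to be half-open, (t − 60, t], so two POIs exactly 60 s apart do not share a window.

```python
from collections import deque


def rolling_max_count(event_ts, window_s=60.0):
    """Largest number of events that fall inside any window of length
    `window_s`. It is enough to check windows that end at an event,
    because the count can only peak at the moment an event arrives."""
    window = deque()
    worst = 0
    for ts in sorted(event_ts):
        window.append(ts)
        # Evict events that are window_s or more older than the newest one.
        while window[0] <= ts - window_s:
            window.popleft()
        worst = max(worst, len(window))
    return worst
```

For T1 the scenario passes when `rolling_max_count(surfaced_ts) <= 5` over the 10 min run. A feed surfaced every 12 s sits exactly on the cap, while one surfaced every 10 s exceeds it.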


NFT-PERF-T2: Position telemetry rate ∈ [1 Hz, 10 Hz]

Summary: The position telemetry the SUT consumes from the airframe link MUST sustain ≥1 Hz, target 10 Hz, over a 60 s window. Traces to: AC Throughput / Rate — Position telemetry rate / T2. Tier: B (with MAVLink replay) + E (live SITL). Metric: count of GLOBAL_POSITION_INT messages consumed by the SUT per second.

Preconditions:

  • MAVLink stream replayed at the configured target rate (10 Hz).

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Replay 60 s of GLOBAL_POSITION_INT at 10 Hz | per-second consumed count |
| 2 | Aggregate | min, mean |

Pass criteria: min ≥ 1 Hz AND mean ≥ 9.5 Hz (target 10 Hz with ≤ 5 % tolerance). Test status: DEFERRED — <DEFERRED: MAVLink replay fixture over a 60 s window>.
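
A minimal sketch of the per-second count and the two-part verdict, assuming arrival times are integer milliseconds relative to the start of the 60 s window and that messages are bucketed into whole seconds (both assumptions; the function name is hypothetical):

```python
def position_rate_summary(arrival_ms, window_s=60):
    """Bucket GLOBAL_POSITION_INT arrival times (integer ms since window
    start) into 1 s bins and evaluate the T2 pass criteria."""
    counts = [0] * window_s
    for ts in arrival_ms:
        bucket = ts // 1000
        if 0 <= bucket < window_s:  # ignore anything outside the window
            counts[bucket] += 1
    min_hz = min(counts)
    mean_hz = sum(counts) / window_s
    return {
        "min_hz": min_hz,
        "mean_hz": mean_hz,
        # min >= 1 Hz AND mean >= 9.5 Hz (10 Hz target, 5 % tolerance)
        "passed": min_hz >= 1 and mean_hz >= 9.5,
    }
```

The two conditions catch different failures: a one-second dropout fails on the minimum even if the mean stays high, and a uniformly slow 9 Hz stream fails on the mean even though no second is empty.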


NFT-PERF-T3: Frame-rate floor → suppress zoom-in + health yellow

Summary: When the sustained camera frame rate drops below 10 fps for ≥5 s, zoom-in transitions MUST be suppressed AND overall health MUST surface yellow. Traces to: AC Throughput / Rate — Sustained camera frame-rate floor / T3. Tier: B. Metric: pair: (boolean — was a zoom-in suppressed during the low-FPS window?), (boolean — did health surface yellow?).

Preconditions:

  • SUT in normal sweep mode.
  • rtsp-loopback plays fixtures/videos/94d42580bd1ad6ff.mp4 with throttled decode injecting frame drops to keep FPS < 10 for ≥ 5 s.

| Step | Consumer Action | Measurement |
|------|-----------------|-------------|
| 1 | Start playback at normal 30 fps | health remains green; zoom-in proceeds normally on detection |
| 2 | Throttle decode + drop frames to push FPS below 10 for ≥ 5 s | record: (a) whether a zoom-in-required event during this window was suppressed; (b) whether GET /health returns overall == "yellow" |

Pass criteria: both observations TRUE. Duration: 30 s (5 s low-FPS window + buffer). Test status: READY (fixture present; throttling implemented by consumer).
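
Before the two booleans mean anything, the runner should confirm that the throttling really produced a low-FPS window of at least 5 s. The sketch below is one way to check that and combine it with the observations. It assumes frame timestamps are integer milliseconds since playback start and that frame rate is measured in 1 s buckets; the spec does not define the measurement interval, and the function names are hypothetical.

```python
def longest_low_fps_run_s(frame_ts_ms, duration_s, floor_fps=10):
    """Length in seconds of the longest run of consecutive 1 s buckets
    whose frame count is below `floor_fps`."""
    counts = [0] * duration_s
    for ts in frame_ts_ms:
        bucket = ts // 1000
        if 0 <= bucket < duration_s:
            counts[bucket] += 1
    longest = current = 0
    for count in counts:
        current = current + 1 if count < floor_fps else 0
        longest = max(longest, current)
    return longest


def t3_verdict(frame_ts_ms, duration_s, zoom_in_suppressed, health_overall):
    """True only if the low-FPS precondition held for >= 5 s AND both
    observations from step 2 are true."""
    low_window_ok = longest_low_fps_run_s(frame_ts_ms, duration_s) >= 5
    return low_window_ok and zoom_in_suppressed and health_overall == "yellow"
```

Gating the verdict on the measured low-FPS run keeps a run where throttling silently failed from being reported as a pass or a product failure.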


Sustained-load (handoff to resource-limit-tests)

The two sustained-resource AC rows (Re1, Re2) live as resource-limit tests rather than performance tests because the pass criterion is "stays within ceiling for the duration", not "is fast enough":

  • Re1 — combined RSS ≤ 6 GB onboard for everything autopilot owns — see resource-limit-tests.md → NFT-RES-LIM-Re1.
  • Re2 — Tier-1 per-frame latency Δ ≤ 5 ms when autopilot's workload runs concurrently — see resource-limit-tests.md → NFT-RES-LIM-Re2. Re2 is the Tier-1 non-degradation contract; the absolute Tier-1 latency target is L1.

Common preconditions for every performance scenario

  • Warm-up: every scenario MUST include an explicit warm-up phase whose duration is recorded in the CSV report. This separates cold-start cost from steady-state behaviour.
  • Steady-state window: pass criteria apply only to the steady-state window (after warm-up), not to the warm-up itself.
  • Hardware honesty: scenarios that name Tier HW MUST run on representative Jetson Orin Nano Super OR on a benchmarked replay. Pure-x86-emulator runs report results but do NOT contribute to the project-level Acceptance Gate.
  • Concurrent workload disclosure: every scenario records whether other autopilot subsystems were running concurrently (Tier-1 inference, VLM, MAVLink, etc.). Re2 is the only scenario that REQUIRES concurrent workload; the others MUST report it for context.
  • Seed + determinism: where a test's inputs involve randomness (e.g., synthetic-POI ordering tie-breakers), the seed is captured in the CSV report.
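
The preconditions above imply a handful of fields that every scenario's CSV row needs to carry. The sketch below shows one possible shape for such a row. The column names and the `ScenarioRecord` type are hypothetical, since this spec does not define the CSV schema; only the requirement to record warm-up duration, concurrent workload, and seed comes from the list above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ScenarioRecord:
    scenario_id: str           # e.g. "NFT-PERF-L1"
    warmup_s: float            # duration of the explicit warm-up phase
    steady_state_s: float      # window the pass criteria apply to
    tier: str                  # tier the run actually executed on
    concurrent_workloads: str  # ";"-joined subsystem names, "" if none
    seed: str                  # RNG seed, "" if the inputs are not random


def write_report(records):
    """Serialise scenario records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(ScenarioRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()
```

Keeping warm-up and steady-state durations as separate columns makes it possible to audit that pass criteria were evaluated on the steady-state window only.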