# Performance Tests

**Task**: AZ-146_test_performance
**Name**: Performance Tests
**Description**: Implement E2E tests measuring detection latency, concurrent inference throughput, tiling overhead, and video processing frame rate
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure
**Component**: Integration Tests
**Jira**: AZ-146
**Epic**: AZ-137

## Problem

Performance characteristics must be baselined and verified: single-image latency, concurrent request handling with the 2-worker ThreadPoolExecutor, tiling overhead for large images, and video processing frame rate. These tests establish performance contracts.

## Outcome

- Single-image latency profiled (p50, p95, p99) for a warm engine
- Concurrent inference behavior validated (2-at-a-time processing confirmed)
- Large-image tiling overhead measured and bounded
- Video processing frame rate baselined

## Scope

### Included

- NFT-PERF-01: Single image detection latency
- NFT-PERF-02: Concurrent inference throughput
- NFT-PERF-03: Large image tiling processing time
- NFT-PERF-04: Video processing frame rate

### Excluded

- GPU vs CPU comparative benchmarks
- Memory usage profiling
- Load testing beyond 4 concurrent requests

## Acceptance Criteria

**AC-1: Single image latency**
Given a warm engine
When 10 sequential POST /detect requests are sent with small-image
Then p95 latency < 5000ms for ONNX CPU, or p95 < 1000ms for TensorRT GPU

**AC-2: Concurrent throughput**
Given a warm engine
When 2 concurrent POST /detect requests are sent
Then both complete without error
And 3 concurrent requests show queuing (total time > time for 2)

**AC-3: Tiling overhead**
Given a warm engine
When POST /detect is sent with large-image (4000x3000)
Then the request completes within 120s
And processing time scales proportionally with the tile count

**AC-4: Video frame rate**
Given a warm engine with SSE connected
When async detection processes test-video with frame_period=4
Then processing completes within 5x the video duration (< 50s)
And the frame processing rate is consistent (no stalls > 10s)

## Non-Functional Requirements

**Performance**
- Tests themselves should complete within defined bounds
- Results should be logged for trend analysis

## Integration Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | Est. Duration |
|--------|------------------------|--------------|-------------------|---------------|
| AC-1 | Engine warm | 10 sequential detections | p95 < 5000ms (CPU) | ~60s |
| AC-2 | Engine warm | 2 then 3 concurrent requests | Queuing observed at 3 | ~30s |
| AC-3 | Engine warm, large-image | Single large-image detection | Completes < 120s | ~120s |
| AC-4 | Engine warm, SSE connected | Video detection | < 50s, consistent rate | ~120s |

## Constraints

- Pass criteria differ between CPU (ONNX) and GPU (TensorRT) profiles
- Concurrent request tests must account for connection overhead
- Video frame rate depends on hardware; the test measures consistency, not absolute speed

## Risks & Mitigation

**Risk 1: CI hardware variability**
- *Risk*: Latency thresholds may fail on slower CI hardware
- *Mitigation*: Use generous thresholds; mark as performance benchmark tests that can be skipped in resource-constrained CI
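
The AC-1 percentile measurement and the AC-2 queuing check can be sketched as below. This is a minimal, self-contained illustration, not the actual test suite: `detect()` is a hypothetical stand-in for a real POST /detect call (in the real test it would be an HTTP request, e.g. via `requests.post`, against a warm engine), and the helper names (`percentile`, `measure_latency`, `measure_queuing`) are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def detect(image="small-image"):
    # Hypothetical stand-in for POST /detect; sleeps to simulate
    # inference work. Replace with a real HTTP call in the E2E test.
    time.sleep(0.05)
    return {"detections": []}

def percentile(samples, p):
    # Nearest-rank percentile over the collected latency samples.
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

def measure_latency(n=10):
    # AC-1: n sequential requests against a warm engine;
    # report p50/p95/p99 in milliseconds.
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        detect()
        latencies.append((time.perf_counter() - start) * 1000)
    return {p: percentile(latencies, p) for p in (50, 95, 99)}

def measure_queuing(workers=2):
    # AC-2: with a 2-worker pool, a third concurrent request must
    # queue, so 3 requests take longer in total than 2 requests.
    pool = ThreadPoolExecutor(max_workers=workers)

    def timed_batch(n):
        start = time.perf_counter()
        list(pool.map(lambda _: detect(), range(n)))
        return time.perf_counter() - start

    return timed_batch(2), timed_batch(3)
```

In the real suite, the p95 value from `measure_latency` would be asserted against the profile-specific threshold (5000ms for ONNX CPU, 1000ms for TensorRT GPU), and the queuing assertion is simply that the 3-request wall time exceeds the 2-request wall time, with headroom for connection overhead.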