ai-training/_docs/02_tasks/done/AZ-146_test_performance.md
Oleksandr Bezdieniezhnykh cbf370c765 Refactor task management structure and update documentation
- Changed the directory structure for task specifications to include a dedicated `todo/` folder within `_docs/02_tasks/` for tasks ready for implementation.
- Updated references in various skills and documentation to reflect the new task lifecycle, including changes in the `implementer` and `decompose` skills.
- Enhanced the README and flow documentation to clarify the new task organization and its implications for the implementation process.

These updates improve task management clarity and streamline the implementation workflow.
2026-03-28 01:17:45 +02:00


Performance Tests

Task: AZ-146_test_performance
Name: Performance Tests
Description: Implement E2E tests measuring detection latency, concurrent inference throughput, tiling overhead, and video processing frame rate
Complexity: 3 points
Dependencies: AZ-138_test_infrastructure
Component: Integration Tests
Jira: AZ-146
Epic: AZ-137

Problem

Performance characteristics must be baselined and verified: single image latency, concurrent request handling with the 2-worker ThreadPoolExecutor, tiling overhead for large images, and video processing frame rate. These tests establish performance contracts.

Outcome

  • Single image latency profiled (p50, p95, p99) for warm engine
  • Concurrent inference behavior validated (2-at-a-time processing confirmed)
  • Large image tiling overhead measured and bounded
  • Video processing frame rate baselined

Scope

Included

  • NFT-PERF-01: Single image detection latency
  • NFT-PERF-02: Concurrent inference throughput
  • NFT-PERF-03: Large image tiling processing time
  • NFT-PERF-04: Video processing frame rate

Excluded

  • GPU vs CPU comparative benchmarks
  • Memory usage profiling
  • Load testing beyond 4 concurrent requests

Acceptance Criteria

AC-1: Single image latency
Given a warm engine
When 10 sequential POST /detect requests are sent with small-image
Then p95 latency < 5000ms for ONNX CPU, or p95 < 1000ms for TensorRT GPU
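A latency profile for AC-1 could be computed along these lines. The nearest-rank percentile helper is self-contained; in the real test the input list would hold wall-clock times of 10 sequential POST /detect calls against a warm engine (note that with only 10 samples, p95 and p99 both reduce to the slowest request).

```python
import math

def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile (q in 0..100) over measured latencies."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Profile p50/p95/p99 as required by AC-1."""
    return {f"p{q}": percentile(latencies_ms, q) for q in (50, 95, 99)}
```

The test would then assert `summarize(latencies)["p95"] < 5000` (or `< 1000` on the TensorRT GPU profile).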

AC-2: Concurrent throughput
Given a warm engine
When 2 concurrent POST /detect requests are sent
Then both complete without error
And 3 concurrent requests show queuing (total time for 3 exceeds the time for 2, because the third request waits for a free worker)
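The queuing effect AC-2 checks for can be sketched by mirroring the server's 2-worker ThreadPoolExecutor locally; `time.sleep` stands in for a detection call here, so the timings are illustrative rather than the service's actual latencies.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def elapsed_for(n_requests: int, work_s: float = 0.1) -> float:
    """Total wall time for n simulated detections on a 2-worker pool."""
    with ThreadPoolExecutor(max_workers=2) as pool:  # mirrors the server's pool size
        start = time.monotonic()
        futures = [pool.submit(time.sleep, work_s) for _ in range(n_requests)]
        for f in futures:
            f.result()  # surface any errors, as AC-2 requires
        return time.monotonic() - start

two = elapsed_for(2)    # both run in parallel: ~1x work_s
three = elapsed_for(3)  # third request queues: ~2x work_s
```

The E2E version replaces the sleeps with concurrent POST /detect calls and asserts `three > two` to confirm 2-at-a-time processing.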

AC-3: Tiling overhead
Given a warm engine
When POST /detect is sent with large-image (4000x3000)
Then the request completes within 120s
And processing time scales proportionally with tile count
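To assert proportional scaling, the test needs an expected tile count for a given image size. A minimal sketch of a sliding-window tile counter follows; the 640px tile size and 64px overlap are illustrative assumptions, not the service's actual tiling parameters.

```python
import math

def tile_count(width: int, height: int, tile: int = 640, overlap: int = 64) -> int:
    """Tiles needed to cover an image with an overlapping sliding window.

    Tile size and overlap are placeholder values; substitute the
    service's real tiling configuration.
    """
    stride = tile - overlap
    cols = max(1, math.ceil((width - overlap) / stride))
    rows = max(1, math.ceil((height - overlap) / stride))
    return cols * rows

large = tile_count(4000, 3000)  # the AC-3 large-image case
small = tile_count(640, 640)    # single-tile baseline
```

The proportionality check would then bound `time_large / time_small` by some tolerance factor around `large / small`.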

AC-4: Video frame rate
Given a warm engine with SSE connected
When async detection processes test-video with frame_period=4
Then processing completes within 5x the video duration (< 50s)
And the frame processing rate is consistent (no stalls > 10s)
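The "no stalls" clause of AC-4 reduces to checking the largest gap between consecutive frame-completion timestamps. A self-contained sketch is below; in the real test the timestamps would be recorded as SSE progress events arrive (the exact event names and payload shape are not specified here).

```python
def max_gap_s(frame_times: list[float]) -> float:
    """Largest gap between consecutive frame-completion timestamps (seconds)."""
    if len(frame_times) < 2:
        return 0.0
    ordered = sorted(frame_times)
    return max(b - a for a, b in zip(ordered, ordered[1:]))

def is_consistent(frame_times: list[float], stall_limit_s: float = 10.0) -> bool:
    """AC-4 consistency check: no inter-frame stall longer than the limit."""
    return max_gap_s(frame_times) <= stall_limit_s
```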

Non-Functional Requirements

Performance

  • Tests themselves should complete within defined bounds
  • Results should be logged for trend analysis
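Logging results for trend analysis could be as simple as appending one JSON line per run, which CI can later aggregate and plot. The file name and record fields below are assumptions, not an existing convention in this repo.

```python
import json
import time
from pathlib import Path

def log_result(path: Path, test_id: str, metrics: dict) -> None:
    """Append one JSON line per test run so CI can track latency trends."""
    record = {"ts": time.time(), "test": test_id, **metrics}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

A run might call `log_result(Path("perf_results.jsonl"), "NFT-PERF-01", {"p95_ms": 4200.0})` after each test.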

Integration Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Engine warm | 10 sequential detections | p95 < 5000ms (CPU) | ~60s |
| AC-2 | Engine warm | 2 then 3 concurrent requests | Queuing observed at 3 | ~30s |
| AC-3 | Engine warm, large-image | Single large image detection | Completes < 120s | ~120s |
| AC-4 | Engine warm, SSE connected | Video detection | < 50s, consistent rate | ~120s |

Constraints

  • Pass criteria differ between CPU (ONNX) and GPU (TensorRT) profiles
  • Concurrent request tests must account for connection overhead
  • Video frame rate depends on hardware; test measures consistency, not absolute speed

Risks & Mitigation

Risk 1: CI hardware variability

  • Risk: Latency thresholds may fail on slower CI hardware
  • Mitigation: Use generous thresholds; mark as performance benchmark tests that can be skipped in resource-constrained CI