Refactor task management structure and update documentation

- Changed the directory structure for task specifications to include a dedicated `todo/` folder within `_docs/02_tasks/` for tasks ready for implementation.
- Updated references in various skills and documentation to reflect the new task lifecycle, including changes in the `implementer` and `decompose` skills.
- Enhanced the README and flow documentation to clarify the new task organization and its implications for the implementation process.

These updates improve task management clarity and streamline the implementation workflow.
Author: Oleksandr Bezdieniezhnykh
Date: 2026-03-28 01:17:45 +02:00
Parent: 8c665bd0a4
Commit: cbf370c765
35 changed files with 1348 additions and 58 deletions
@@ -0,0 +1,86 @@
# Performance Tests
**Task**: AZ-146_test_performance
**Name**: Performance Tests
**Description**: Implement E2E tests measuring detection latency, concurrent inference throughput, tiling overhead, and video processing frame rate
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure
**Component**: Integration Tests
**Jira**: AZ-146
**Epic**: AZ-137
## Problem
Performance characteristics must be baselined and verified: single image latency, concurrent request handling with the 2-worker ThreadPoolExecutor, tiling overhead for large images, and video processing frame rate. These tests establish performance contracts.
## Outcome
- Single image latency profiled (p50, p95, p99) for warm engine
- Concurrent inference behavior validated (2-at-a-time processing confirmed)
- Large image tiling overhead measured and bounded
- Video processing frame rate baselined
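The p50/p95/p99 figures above can be computed with a simple nearest-rank helper; a minimal sketch (function name and sample data are illustrative, not part of the test suite):

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based nearest rank
    return ordered[max(rank - 1, 0)]

samples = list(range(1, 101))  # stand-in latencies: 1..100 ms
p50, p95, p99 = (percentile(samples, p) for p in (50, 95, 99))
```

Nearest-rank is deliberately conservative (no interpolation), which keeps pass/fail thresholds unambiguous in CI logs.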
## Scope
### Included
- NFT-PERF-01: Single image detection latency
- NFT-PERF-02: Concurrent inference throughput
- NFT-PERF-03: Large image tiling processing time
- NFT-PERF-04: Video processing frame rate
### Excluded
- GPU vs CPU comparative benchmarks
- Memory usage profiling
- Load testing beyond 4 concurrent requests
## Acceptance Criteria
**AC-1: Single image latency**
Given a warm engine
When 10 sequential POST /detect requests are sent with the small-image fixture
Then p95 latency < 5000ms for ONNX CPU or p95 < 1000ms for TensorRT GPU
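A measurement harness for AC-1 can be sketched as follows; it times an arbitrary callable rather than a live POST /detect request, so the workload passed in is a placeholder assumption:

```python
import time

def measure_sequential(call, n=10):
    """Time n sequential invocations of `call`; return per-call latencies in ms."""
    latencies = []
    for _ in range(n):
        t0 = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - t0) * 1000)
    return latencies

# In the real test, `call` would issue POST /detect against the warm engine.
warm_up = lambda: time.sleep(0.001)  # stand-in workload
latencies = measure_sequential(warm_up, n=10)
```

Keeping the harness workload-agnostic lets the same code drive both the ONNX CPU and TensorRT GPU profiles with different thresholds.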
**AC-2: Concurrent throughput**
Given a warm engine
When 2 concurrent POST /detect requests are sent
Then both complete without error
And 3 concurrent requests show queuing (total time > time for 2)
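The queuing check in AC-2 can be approximated locally with Python's `ThreadPoolExecutor`, mirroring the service's 2-worker pool; the fixed 0.05s sleep stands in for a real detection call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_concurrent(call, n, workers=2):
    """Run n invocations of `call` on a bounded pool; return total wall time (s)."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(call) for _ in range(n)]
        for f in futures:
            f.result()  # propagate any errors
    return time.perf_counter() - t0

work = lambda: time.sleep(0.05)   # stand-in for one detection
t_two = run_concurrent(work, 2)   # both fit in the pool: ~1x workload
t_three = run_concurrent(work, 3) # third request queues: ~2x workload
assert t_three > t_two            # queuing observed at 3 concurrent requests
```

The real test would compare wall times of 2 vs 3 concurrent POST /detect requests in the same way, with generous margins for connection overhead.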
**AC-3: Tiling overhead**
Given a warm engine
When POST /detect is sent with large-image (4000x3000)
Then request completes within 120s
And processing time scales proportionally with tile count
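Checking that processing time scales with tile count requires knowing how many tiles a 4000x3000 image produces. A sketch, assuming a 1024px sliding window with 128px overlap (both values are illustrative, not taken from the service config):

```python
import math

def tile_count(width, height, tile=1024, overlap=128):
    """Number of tiles a W x H image splits into under a sliding window."""
    step = tile - overlap
    nx = max(1, math.ceil((width - overlap) / step))
    ny = max(1, math.ceil((height - overlap) / step))
    return nx * ny

# Expected per-tile budget check: total time / tile_count should stay
# roughly constant across image sizes if scaling is proportional.
tiles = tile_count(4000, 3000)
```

With these assumed parameters a 4000x3000 image yields 20 tiles, so the 120s bound implies roughly a 6s per-tile budget.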
**AC-4: Video frame rate**
Given a warm engine with SSE connected
When async detection processes test-video with frame_period=4
Then processing completes within 5x video duration (< 50s)
And frame processing rate is consistent (no stalls > 10s)
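The stall check in AC-4 reduces to inspecting gaps between frame-completion timestamps collected from the SSE stream; a minimal sketch (timestamp values are illustrative):

```python
def max_gap(timestamps):
    """Largest gap (s) between consecutive frame-completion timestamps."""
    return max((b - a for a, b in zip(timestamps, timestamps[1:])), default=0.0)

# Consistent rate: no gap between processed frames exceeds the 10s stall threshold.
frame_times = [0.0, 2.1, 4.0, 6.2]  # stand-in SSE event timestamps
assert max_gap(frame_times) < 10.0
```

Measuring the maximum inter-frame gap rather than the mean rate matches the constraint below that the test verifies consistency, not absolute speed.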
## Non-Functional Requirements
**Performance**
- Tests themselves should complete within defined bounds
- Results should be logged for trend analysis
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | Est. Duration |
|--------|------------------------|-------------|-------------------|---------------|
| AC-1 | Engine warm | 10 sequential detections | p95 < 5000ms (CPU) | ~60s |
| AC-2 | Engine warm | 2 then 3 concurrent requests | Queuing observed at 3 | ~30s |
| AC-3 | Engine warm, large-image | Single large image detection | Completes < 120s | ~120s |
| AC-4 | Engine warm, SSE connected | Video detection | < 50s, consistent rate | ~120s |
## Constraints
- Pass criteria differ between CPU (ONNX) and GPU (TensorRT) profiles
- Concurrent request tests must account for connection overhead
- Video frame rate depends on hardware; test measures consistency, not absolute speed
## Risks & Mitigation
**Risk 1: CI hardware variability**
- *Risk*: Latency thresholds may fail on slower CI hardware
- *Mitigation*: Use generous thresholds; mark as performance benchmark tests that can be skipped in resource-constrained CI