# Performance Tests

### NFT-PERF-01: Single image detection latency

**Summary**: Measure end-to-end latency for a single small-image detection request after the engine is warm.

**Traces to**: AC-API-2

**Metric**: Request-to-response latency (ms)

**Preconditions**:

- Engine is initialized and warm (at least 1 prior detection)

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Send 10 sequential `POST /detect` requests with small-image | Record each request-response latency |
| 2 | Compute p50, p95, p99 | — |

**Pass criteria**: p95 latency < 5000 ms for ONNX CPU; p95 < 1000 ms for TensorRT GPU

**Duration**: ~60s (10 requests)

---
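The measurement loop and the percentile computation can be sketched as below. This is a minimal illustration, not the harness itself: `send_request` stands in for whatever client call issues `POST /detect`, and the percentile uses the nearest-rank method (the real harness may use a different interpolation).

```python
import math
import time
from typing import Callable

def measure_latencies(send_request: Callable[[], None], n: int = 10) -> list[float]:
    """Send n sequential requests and record each request-response latency in ms."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        send_request()  # placeholder for the actual POST /detect call
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def percentile(latencies: list[float], p: float) -> float:
    """Nearest-rank percentile (p in (0, 100]) over the recorded latencies."""
    ordered = sorted(latencies)
    rank = min(max(math.ceil(p / 100.0 * len(ordered)), 1), len(ordered))
    return ordered[rank - 1]
```

With 10 samples, p95 and p99 both resolve to the slowest request under nearest-rank, which is why 10 requests is the practical minimum for this test.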

### NFT-PERF-02: Concurrent inference throughput

**Summary**: Verify that the system handles 2 concurrent inference requests (the ThreadPoolExecutor limit).

**Traces to**: RESTRICT-HW-3

**Metric**: Throughput (requests/second), latency under concurrency

**Preconditions**:

- Engine is initialized and warm

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Send 2 concurrent `POST /detect` requests with small-image | Measure both response times |
| 2 | Send 3 concurrent requests | Third request should queue behind the first two |
| 3 | Record total time for 3 concurrent requests vs. 2 concurrent | — |

**Pass criteria**: 2 concurrent requests complete without error. 3 concurrent requests: total time > time for 2 (queuing observed).

**Duration**: ~30s

---

### NFT-PERF-03: Large image tiling processing time

**Summary**: Measure processing time for a large image that triggers GSD-based tiling.

**Traces to**: AC-IP-2

**Metric**: Total processing time (ms), tiles processed

**Preconditions**:

- Engine is initialized and warm

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | `POST /detect` with large-image (4000×3000) and GSD config | Record total response time |
| 2 | Compare with the small-image baseline from NFT-PERF-01 | Ratio indicates tiling overhead |

**Pass criteria**: Request completes within 120s. Processing time scales proportionally with the number of tiles (not exponentially).

**Duration**: ~120s

---

### NFT-PERF-04: Video processing frame rate

**Summary**: Measure the effective frame processing rate during video detection.

**Traces to**: AC-VP-1

**Metric**: Frames processed per second, total processing time

**Preconditions**:

- Engine is initialized and warm
- SSE client connected

**Steps**:

| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | `POST /detect/test-media-perf` with test-video and `frame_period_recognition: 4` | — |
| 2 | Count SSE events and measure total time from "started" to "AIProcessed" | Compute frames/second |

**Pass criteria**: Processing completes within 5× the video duration (10s video → < 50s processing). Frame processing rate is consistent (no stalls > 10s between events).

**Duration**: ~120s
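Both pass criteria reduce to arithmetic over SSE event timestamps. A minimal sketch, assuming the client records one timestamp per event from "started" through "AIProcessed" and that each intermediate event corresponds to one processed frame:

```python
def frame_rate_and_max_gap(event_times: list[float]) -> tuple[float, float]:
    """Given monotonically increasing timestamps (seconds) for SSE events,
    from "started" through "AIProcessed", return (frames per second,
    largest gap between consecutive events). One event per processed
    frame is an assumption made here for illustration."""
    total = event_times[-1] - event_times[0]
    frames = len(event_times) - 1  # events after "started"
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]
    return frames / total, max(gaps)
```

The first value is checked against the 5× video-duration budget; the second flags stalls, failing the run if any inter-event gap exceeds 10s.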