Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

2026-04-22 10:46:31 +00:00 · 2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,325 @@
+# E2E Non-Functional Tests
+
+## Performance Tests
+
+### NFT-PERF-01: Single image detection latency
+
+**Summary**: Measure end-to-end latency for a single small image detection request after engine is warm.
+**Traces to**: AC-API-2
+**Metric**: Request-to-response latency (ms)
+
+**Preconditions**:
+- Engine is initialized and warm (at least 1 prior detection)
+
+**Steps**:
+
+| Step | Consumer Action | Measurement |
+|------|----------------|-------------|
+| 1 | Send 10 sequential `POST /detect` with small-image | Record each request-response latency |
+| 2 | Compute p50, p95, p99 | — |
+
+**Pass criteria**: p95 latency < 5000ms for ONNX CPU, p95 < 1000ms for TensorRT GPU
+**Duration**: ~60s (10 requests)
+
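The percentile step in NFT-PERF-01 can be sketched as follows. This is a minimal illustration, not the project's test harness: it uses the nearest-rank definition of a percentile, and the helper names `percentile`, `summarize`, and `passes` are hypothetical.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of the data at or below it."""
    if not samples_ms:
        raise ValueError("no latency samples recorded")
    ordered = sorted(samples_ms)
    # Rank is 1-based; clamp to 1 so very small p still maps to the first sample.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms):
    """Return the three percentiles named in step 2."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


def passes(samples_ms, p95_limit_ms):
    """Apply the pass criterion: p95 strictly below the limit for the engine under test."""
    return percentile(samples_ms, 95) < p95_limit_ms
```

With only 10 samples, nearest-rank p95 and p99 both resolve to the slowest request, so a single outlier decides the result. A larger sample count would make the tail percentiles more meaningful.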
+---
+
+### NFT-PERF-02: Concurrent inference throughput
+
+**Summary**: Verify the system handles 2 concurrent inference requests (ThreadPoolExecutor limit).
+**Traces to**: RESTRICT-HW-3
+**Metric**: Throughput (requests/second), latency under concurrency
+
+**Preconditions**:
+- Engine is initialized and warm
+
+**Steps**:
+
+| Step | Consumer Action | Measurement |
+|------|----------------|-------------|
+| 1 | Send 2 concurrent `POST /detect` requests with small-image | Measure both response times |
+| 2 | Send 3 concurrent requests | Third request should queue behind the first two |
+| 3 | Record total time for 3 concurrent requests vs 2 concurrent | — |
+
+**Pass criteria**: 2 concurrent requests complete without error. With 3 concurrent requests, total time exceeds the total time for 2 concurrent requests, showing that the third request queued.
+**Duration**: ~30s
+
+---
+
+### NFT-PERF-03: Large image tiling processing time
+
+**Summary**: Measure processing time for a large image that triggers GSD-based tiling.
+**Traces to**: AC-IP-2
+**Metric**: Total processing time (ms), tiles processed
+
+**Preconditions**:
+- Engine is initialized and warm
+
+**Steps**:
+
+| Step | Consumer Action | Measurement |
+|------|----------------|-------------|
+| 1 | `POST /detect` with large-image (4000×3000) and GSD config | Record total response time |
+| 2 | Compare with small-image baseline from NFT-PERF-01 | Ratio indicates tiling overhead |
+
+**Pass criteria**: Request completes within 120s. Processing time grows roughly linearly with the number of tiles, not exponentially.
+**Duration**: ~120s
+
+---
+
+### NFT-PERF-04: Video processing frame rate
+
+**Summary**: Measure effective frame processing rate during video detection.
+**Traces to**: AC-VP-1
+**Metric**: Frames processed per second, total processing time
+
+**Preconditions**:
+- Engine is initialized and warm
+- SSE client connected
+
+**Steps**:
+
+| Step | Consumer Action | Measurement |
+|------|----------------|-------------|
+| 1 | `POST /detect/test-media-perf` with test-video and `frame_period_recognition: 4` | — |
+| 2 | Count SSE events and measure total time from "started" to "AIProcessed" | Compute frames/second |
+
+**Pass criteria**: Processing completes within 5× video duration (10s video → < 50s processing). Frame processing rate is consistent (no stalls > 10s between events).
+**Duration**: ~120s
+
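The two pass conditions in NFT-PERF-04 (total time under 5× the video duration, no stall longer than 10s) can be evaluated from the arrival times of the SSE events. The sketch below assumes the test has already collected those timestamps in seconds; the function name and the returned keys are hypothetical.

```python
def check_video_run(event_times_s, video_duration_s, max_factor=5.0, max_gap_s=10.0):
    """Evaluate a run from event timestamps.

    event_times_s: increasing timestamps in seconds, where the first entry is the
    "started" response and the last is the final AIProcessed event.
    """
    if len(event_times_s) < 2:
        raise ValueError("need at least a start and an end timestamp")
    total = event_times_s[-1] - event_times_s[0]
    if total <= 0:
        raise ValueError("timestamps must be increasing")
    # Gap between each pair of consecutive events; the largest one is the worst stall.
    gaps = [later - earlier for earlier, later in zip(event_times_s, event_times_s[1:])]
    longest_gap = max(gaps)
    return {
        "total_s": total,
        "events_per_s": (len(event_times_s) - 1) / total,
        "longest_gap_s": longest_gap,
        "passed": total < max_factor * video_duration_s and longest_gap <= max_gap_s,
    }
```

The rate here is events per second. Converting it to frames per second depends on how many events the service emits per processed frame, which the specification does not state.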
+---
+
+## Resilience Tests
+
+### NFT-RES-01: Loader service outage after engine initialization
+
+**Summary**: Verify that detections continue working when the Loader service goes down after the engine is already loaded.
+**Traces to**: RESTRICT-ENV-1
+
+**Preconditions**:
+- Engine is initialized (model already downloaded)
+
+**Fault injection**:
+- Stop mock-loader service
+
+**Steps**:
+
+| Step | Action | Expected Behavior |
+|------|--------|------------------|
+| 1 | Stop mock-loader | — |
+| 2 | `POST /detect` with small-image | 200 OK — detection succeeds (engine already in memory) |
+| 3 | `GET /health` | `aiAvailability` remains "Enabled" |
+
+**Pass criteria**: Detection continues to work. Health status remains stable. No errors from loader unavailability.
+
+---
+
+### NFT-RES-02: Annotations service outage during async detection
+
+**Summary**: Verify that async detection completes and delivers SSE events even when Annotations service is down.
+**Traces to**: RESTRICT-ENV-2
+
+**Preconditions**:
+- Engine is initialized
+- SSE client connected
+
+**Fault injection**:
+- Stop mock-annotations mid-processing
+
+**Steps**:
+
+| Step | Action | Expected Behavior |
+|------|--------|------------------|
+| 1 | Start async detection: `POST /detect/test-media-res02` | `{"status": "started"}` |
+| 2 | After first few SSE events, stop mock-annotations | — |
+| 3 | Continue listening to SSE | Events continue arriving. Annotations POST failures are silently caught |
+| 4 | Wait for completion | Final `AIProcessed` event received |
+
+**Pass criteria**: Detection pipeline completes fully. SSE delivery is unaffected. No crash or 500 errors.
+
+---
+
+### NFT-RES-03: Engine initialization retry after transient loader failure
+
+**Summary**: Verify that if model download fails on first attempt, a subsequent detection request retries initialization.
+**Traces to**: AC-EL-2
+
+**Preconditions**:
+- Fresh service (engine not initialized)
+
+**Fault injection**:
+- Mock-loader returns 503 on first model request, then recovers
+
+**Steps**:
+
+| Step | Action | Expected Behavior |
+|------|--------|------------------|
+| 1 | Configure mock-loader to fail first request | — |
+| 2 | `POST /detect` with small-image | Error (503 or 422) |
+| 3 | Configure mock-loader to succeed | — |
+| 4 | `POST /detect` with small-image | 200 OK — engine initializes on retry |
+
+**Pass criteria**: Second detection succeeds after loader recovers. System does not permanently lock into error state.
+
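The behavior NFT-RES-03 checks for is that a failed initialization is not cached. The toy model below shows that property in isolation. `FlakyLoader`, `LazyEngine`, and `handle_detect` are hypothetical stand-ins, not the service's actual classes, and the 503 mapping is one of the two error codes the test allows.

```python
class FlakyLoader:
    """Stand-in for mock-loader: fails the first `failures` downloads, then succeeds."""

    def __init__(self, failures=1):
        self.failures = failures
        self.calls = 0

    def download_model(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("loader returned 503")
        return b"model-bytes"


class LazyEngine:
    """Initializes on first use and keeps no record of a failed attempt."""

    def __init__(self, loader):
        self.loader = loader
        self.model = None

    def detect(self, image):
        if self.model is None:
            # If the download raises, self.model stays None, so the next call retries.
            self.model = self.loader.download_model()
        return {"status": 200, "detections": []}


def handle_detect(engine, image):
    """Map a loader failure to an error response instead of crashing."""
    try:
        return engine.detect(image)
    except ConnectionError:
        return {"status": 503}
```

A design that stored an "initialization failed" flag on the first error would fail step 4, which is the permanent error state the pass criteria rule out.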
+---
+
+### NFT-RES-04: Service restart with in-memory state loss
+
+**Summary**: Verify that after a service restart, all in-memory state (`_active_detections`, `_event_queues`) is cleanly reset.
+**Traces to**: RESTRICT-OP-5, RESTRICT-OP-6
+
+**Preconditions**:
+- Previous detection may have been in progress
+
+**Fault injection**:
+- Restart detections container
+
+**Steps**:
+
+| Step | Action | Expected Behavior |
+|------|--------|------------------|
+| 1 | Restart detections container | — |
+| 2 | `GET /health` | Returns `aiAvailability: "None"` (fresh start) |
+| 3 | `POST /detect/any-media-id` | Accepted (no stale `_active_detections` entry blocking it) |
+
+**Pass criteria**: No stale state from previous session. All endpoints functional after restart.
+
+---
+
+## Security Tests
+
+### NFT-SEC-01: Malformed multipart payload handling
+
+**Summary**: Verify that the service handles malformed multipart requests without crashing.
+**Traces to**: AC-API-2 (security)
+
+**Steps**:
+
+| Step | Consumer Action | Expected Response |
+|------|----------------|------------------|
+| 1 | Send `POST /detect` with truncated multipart body (missing boundary) | 400 or 422 — not 500 |
+| 2 | Send `POST /detect` with Content-Type: multipart but no file part | 400 — empty image |
+| 3 | `GET /health` after malformed requests | Service is still healthy |
+
+**Pass criteria**: All malformed requests return 4xx. Service remains operational.
+
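The malformed payloads for NFT-SEC-01 are easiest to control when the multipart body is built by hand rather than by an HTTP client. A minimal sketch, assuming the upload field is called `file` (the specification does not name it) and using hypothetical helper names:

```python
def build_multipart(field, filename, payload, boundary):
    """Build a well-formed single-part multipart/form-data body as bytes."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + payload + tail


def drop_closing_boundary(body, boundary):
    """Step 1 payload: cut the body just before the closing boundary."""
    closing = f"\r\n--{boundary}--".encode("ascii")
    cut = body.rfind(closing)
    if cut == -1:
        raise ValueError("closing boundary not found")
    return body[:cut]


def build_without_file_part(boundary):
    """Step 2 payload: a multipart body that contains only the closing boundary."""
    return f"--{boundary}--\r\n".encode("ascii")
```

Each body would be sent with a `Content-Type: multipart/form-data; boundary=...` header that names the same boundary, so that only the body itself is malformed.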
+---
+
+### NFT-SEC-02: Oversized request body
+
+**Summary**: Verify system behavior when an extremely large file is uploaded.
+**Traces to**: RESTRICT-OP-4
+
+**Steps**:
+
+| Step | Consumer Action | Expected Response |
+|------|----------------|------------------|
+| 1 | Send `POST /detect` with a 500 MB random file | Error response (413, 400, or timeout) — not OOM crash |
+| 2 | `GET /health` | Service is still running |
+
+**Pass criteria**: Service does not crash or run out of memory. Returns an error or times out gracefully.
+
+---
+
+### NFT-SEC-03: JWT token is forwarded without modification
+
+**Summary**: Verify that the Authorization header is forwarded to the Annotations service as-is.
+**Traces to**: AC-API-3
+
+**Steps**:
+
+| Step | Consumer Action | Expected Response |
+|------|----------------|------------------|
+| 1 | `POST /detect/test-media-sec` with `Authorization: Bearer test-jwt-123` and `x-refresh-token: refresh-456` | `{"status": "started"}` |
+| 2 | After processing, query mock-annotations `GET /mock/annotations` | Recorded request contains `Authorization: Bearer test-jwt-123` header |
+
+**Pass criteria**: Exact token received by mock-annotations matches what the consumer sent.
+
+---
+
+## Resource Limit Tests
+
+### NFT-RES-LIM-01: ThreadPoolExecutor worker limit (2 concurrent)
+
+**Summary**: Verify that no more than 2 inference operations run simultaneously.
+**Traces to**: RESTRICT-HW-3
+
+**Preconditions**:
+- Engine is initialized
+
+**Monitoring**:
+- Track concurrent request timings
+
+**Steps**:
+
+| Step | Consumer Action | Expected Behavior |
+|------|----------------|------------------|
+| 1 | Send 4 concurrent `POST /detect` requests | — |
+| 2 | Measure response arrival times | First 2 complete roughly together; next 2 complete after |
+
+**Duration**: ~60s
+**Pass criteria**: Clear evidence of 2-at-a-time processing (second batch starts after first completes). All 4 requests eventually succeed.
+
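The 2-at-a-time pattern that NFT-RES-LIM-01 looks for can be reproduced locally with a `ThreadPoolExecutor` and a sleep standing in for inference. This is a simulation of the expected timing shape only; `measure_peak_concurrency` and the 50 ms work time are assumptions for illustration, and the real test would observe HTTP response times instead.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def measure_peak_concurrency(num_requests=4, max_workers=2, work_s=0.05):
    """Run fake inference jobs and report peak parallelism plus sorted finish times."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def fake_inference(_index):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(work_s)  # stands in for model inference
        with lock:
            running -= 1
        return time.monotonic()

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        finished = [t - start for t in pool.map(fake_inference, range(num_requests))]
    return peak, sorted(finished)
```

With 4 jobs and 2 workers, the sorted finish times fall into two groups roughly one work period apart, which is the same signature step 2 expects in the response arrival times.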
+---
+
+### NFT-RES-LIM-02: SSE queue depth limit (100 events)
+
+**Summary**: Verify that the SSE queue per client does not exceed 100 events.
+**Traces to**: AC-API-4
+
+**Preconditions**:
+- Engine is initialized
+
+**Monitoring**:
+- SSE event count
+
+**Steps**:
+
+| Step | Consumer Action | Expected Behavior |
+|------|----------------|------------------|
+| 1 | Open SSE connection but do not read (stall client) | — |
+| 2 | Trigger async detection that produces > 100 events | — |
+| 3 | After processing completes, drain the SSE queue | ≤ 100 events received |
+
+**Duration**: ~120s
+**Pass criteria**: No more than 100 events buffered. No OOM or connection errors from queue growth.
+
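A bounded queue is the usual way to get the cap that NFT-RES-LIM-02 verifies. The sketch below uses the standard library's `queue.Queue` with a drop-newest policy. Both choices are assumptions: the specification states the limit of 100 but not the queue type or which events are discarded when the client stalls.

```python
import queue

MAX_EVENTS = 100  # per-client limit stated in the test title


def offer(q, event):
    """Enqueue without blocking; return False when the queue is full and the event is dropped."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        return False


def drain(q):
    """Read everything currently buffered, as a stalled client would on resuming."""
    events = []
    while True:
        try:
            events.append(q.get_nowait())
        except queue.Empty:
            return events
```

For example, offering 150 events to `queue.Queue(maxsize=MAX_EVENTS)` keeps the first 100 and drops the rest, so the drained count stays at or below the limit regardless of how many events the detection produced.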
+---
+
+### NFT-RES-LIM-03: Max 300 detections per frame
+
+**Summary**: Verify that the system returns at most 300 detections per frame (model output limit).
+**Traces to**: RESTRICT-SW-6
+
+**Preconditions**:
+- Engine is initialized
+- Image with dense scene expected to produce many detections
+
+**Monitoring**:
+- Detection count per response
+
+**Duration**: ~30s
+**Pass criteria**: No response contains more than 300 detections. Dense images hit the cap without errors.
+
+---
+
+### NFT-RES-LIM-04: Log file rotation and retention
+
+**Summary**: Verify that log files rotate daily and are retained for 30 days.
+**Traces to**: AC-LOG-1, AC-LOG-2
+
+**Preconditions**:
+- Detections service running with Logs/ volume mounted for inspection
+
+**Monitoring**:
+- Log file creation, naming, and count
+
+**Steps**:
+
+| Step | Consumer Action | Expected Behavior |
+|------|----------------|------------------|
+| 1 | Make several detection requests | Logs written to `Logs/log_inference_YYYYMMDD.txt` |
+| 2 | Verify log file name matches current date | File name contains today's date |
+| 3 | Verify log content format | Contains INFO/DEBUG/WARNING entries with timestamps |
+
+**Duration**: ~10s
+**Pass criteria**: Log file exists with correct date-based naming. Content includes structured log entries.
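
The file-name check in steps 1 and 2 can be sketched with a small parser for the `log_inference_YYYYMMDD.txt` pattern. The helper names are hypothetical, and treating a file as expired only when it is strictly more than 30 days old is an assumption, since the specification does not define the boundary.

```python
import re
from datetime import date, datetime

LOG_NAME = re.compile(r"^log_inference_(\d{8})\.txt$")


def expected_log_name(day):
    """File name the service is expected to write on the given date."""
    return f"log_inference_{day:%Y%m%d}.txt"


def log_date(filename):
    """Return the date encoded in a log file name, or None if the name does not match."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that do not form a real calendar date


def files_past_retention(filenames, today, keep_days=30):
    """List log files older than the retention window."""
    expired = []
    for name in filenames:
        day = log_date(name)
        if day is not None and (today - day).days > keep_days:
            expired.append(name)
    return sorted(expired)
```

A test would compare `expected_log_name(date.today())` against the contents of the mounted `Logs/` directory. An empty result from `files_past_retention` is consistent with the 30-day retention rule but only becomes meaningful once the volume holds more than 30 days of logs.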