Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

Oleksandr Bezdieniezhnykh
2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,325 @@
# E2E Non-Functional Tests
## Performance Tests
### NFT-PERF-01: Single image detection latency
**Summary**: Measure end-to-end latency for a single small image detection request after engine is warm.
**Traces to**: AC-API-2
**Metric**: Request-to-response latency (ms)
**Preconditions**:
- Engine is initialized and warm (at least 1 prior detection)
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Send 10 sequential `POST /detect` with small-image | Record each request-response latency |
| 2 | Compute p50, p95, p99 | — |
**Pass criteria**: p95 latency < 5000ms for ONNX CPU, p95 < 1000ms for TensorRT GPU
**Duration**: ~60s (10 requests)
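
Step 2 can be sketched as follows. This is a minimal example, assuming the nearest-rank percentile definition (the spec does not name one); the latency values and the `percentile` and `summarize` helpers are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the three percentiles named in step 2."""
    return {name: percentile(latencies_ms, p) for name, p in (("p50", 50), ("p95", 95), ("p99", 99))}


# Hypothetical latencies (ms) from 10 sequential POST /detect calls.
latencies_ms = [820, 790, 805, 1310, 840, 815, 799, 830, 2100, 810]
stats = summarize(latencies_ms)
print(stats)
```

With only 10 samples, nearest-rank p95 and p99 both resolve to the slowest request, so a single outlier decides the pass criterion.
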
---
### NFT-PERF-02: Concurrent inference throughput
**Summary**: Verify the system handles 2 concurrent inference requests (ThreadPoolExecutor limit).
**Traces to**: RESTRICT-HW-3
**Metric**: Throughput (requests/second), latency under concurrency
**Preconditions**:
- Engine is initialized and warm
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | Send 2 concurrent `POST /detect` requests with small-image | Measure both response times |
| 2 | Send 3 concurrent requests | Third request should queue behind the first two |
| 3 | Record total time for 3 concurrent requests vs 2 concurrent | — |
**Pass criteria**: 2 concurrent requests complete without error. 3 concurrent requests: total time > time for 2 (queuing observed).
**Duration**: ~30s
---
### NFT-PERF-03: Large image tiling processing time
**Summary**: Measure processing time for a large image that triggers GSD-based tiling.
**Traces to**: AC-IP-2
**Metric**: Total processing time (ms), tiles processed
**Preconditions**:
- Engine is initialized and warm
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | `POST /detect` with large-image (4000×3000) and GSD config | Record total response time |
| 2 | Compare with small-image baseline from NFT-PERF-01 | Ratio indicates tiling overhead |
**Pass criteria**: Request completes within 120s. Processing time grows roughly linearly with the number of tiles, not exponentially.
**Duration**: ~120s
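
One possible way to make the scaling criterion checkable is to compare per-tile time against the small-image baseline from NFT-PERF-01. The sketch below is an assumption, not part of the spec: the `tolerance` factor of 2.0, the helper names, and the sample numbers are all hypothetical.

```python
def per_tile_ratio(small_ms, large_ms, tiles):
    """Time spent per tile on the large image, relative to the single-image baseline."""
    if small_ms <= 0 or tiles <= 0:
        raise ValueError("baseline time and tile count must be positive")
    return (large_ms / tiles) / small_ms


def scales_roughly_linearly(small_ms, large_ms, tiles, tolerance=2.0):
    """True if each tile costs at most `tolerance` times the baseline (hypothetical threshold)."""
    return per_tile_ratio(small_ms, large_ms, tiles) <= tolerance


# Hypothetical measurements: 800 ms baseline, 12 tiles processed in 9600 ms.
ratio = per_tile_ratio(800, 9600, 12)
print(ratio, scales_roughly_linearly(800, 9600, 12))
```
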
---
### NFT-PERF-04: Video processing frame rate
**Summary**: Measure effective frame processing rate during video detection.
**Traces to**: AC-VP-1
**Metric**: Frames processed per second, total processing time
**Preconditions**:
- Engine is initialized and warm
- SSE client connected
**Steps**:
| Step | Consumer Action | Measurement |
|------|----------------|-------------|
| 1 | `POST /detect/test-media-perf` with test-video and `frame_period_recognition: 4` | — |
| 2 | Count SSE events and measure total time from "started" to "AIProcessed" | Compute frames/second |
**Pass criteria**: Processing completes within 5× video duration (10s video → < 50s processing). Frame processing rate is consistent (no stalls > 10s between events).
**Duration**: ~120s
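
The measurement in step 2 and both pass conditions can be sketched as below. This assumes the test client records the time of the "started" response and the arrival time of every SSE event, with the final `AIProcessed` event last; the `analyze_run` helper and the sample timestamps are hypothetical.

```python
def analyze_run(started_at, event_times, video_duration_s, max_factor=5.0, max_gap_s=10.0):
    """Summarize one video run from the start time and SSE event arrival times (seconds)."""
    if not event_times:
        raise ValueError("no SSE events recorded")
    total_s = event_times[-1] - started_at
    if total_s <= 0:
        raise ValueError("events must arrive after the start time")
    points = [started_at, *event_times]
    longest_gap_s = max(b - a for a, b in zip(points, points[1:]))
    return {
        "total_s": total_s,
        "events_per_s": len(event_times) / total_s,
        "longest_gap_s": longest_gap_s,
        # Completes within 5x video duration, and no stall longer than 10s.
        "passed": total_s < max_factor * video_duration_s and longest_gap_s <= max_gap_s,
    }


# Hypothetical run: 10s video, one event every 2s.
steady = analyze_run(0.0, [2.0, 4.0, 6.0, 8.0, 10.0], video_duration_s=10)
# Hypothetical run with a 13s stall between the first and second event.
stalled = analyze_run(0.0, [2.0, 15.0, 16.0], video_duration_s=10)
print(steady["passed"], stalled["passed"])
```
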
---
## Resilience Tests
### NFT-RES-01: Loader service outage after engine initialization
**Summary**: Verify that detections continue working when the Loader service goes down after the engine is already loaded.
**Traces to**: RESTRICT-ENV-1
**Preconditions**:
- Engine is initialized (model already downloaded)
**Fault injection**:
- Stop mock-loader service
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Stop mock-loader | — |
| 2 | `POST /detect` with small-image | 200 OK — detection succeeds (engine already in memory) |
| 3 | `GET /health` | `aiAvailability` remains "Enabled" |
**Pass criteria**: Detection continues to work. Health status remains stable. No errors from loader unavailability.
---
### NFT-RES-02: Annotations service outage during async detection
**Summary**: Verify that async detection completes and delivers SSE events even when Annotations service is down.
**Traces to**: RESTRICT-ENV-2
**Preconditions**:
- Engine is initialized
- SSE client connected
**Fault injection**:
- Stop mock-annotations mid-processing
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Start async detection: `POST /detect/test-media-res01` | `{"status": "started"}` |
| 2 | After first few SSE events, stop mock-annotations | — |
| 3 | Continue listening to SSE | Events continue arriving. Annotations POST failures are silently caught |
| 4 | Wait for completion | Final `AIProcessed` event received |
**Pass criteria**: Detection pipeline completes fully. SSE delivery is unaffected. No crash or 500 errors.
---
### NFT-RES-03: Engine initialization retry after transient loader failure
**Summary**: Verify that if model download fails on first attempt, a subsequent detection request retries initialization.
**Traces to**: AC-EL-2
**Preconditions**:
- Fresh service (engine not initialized)
**Fault injection**:
- Mock-loader returns 503 on first model request, then recovers
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Configure mock-loader to fail first request | — |
| 2 | `POST /detect` with small-image | Error (503 or 422) |
| 3 | Configure mock-loader to succeed | — |
| 4 | `POST /detect` with small-image | 200 OK — engine initializes on retry |
**Pass criteria**: Second detection succeeds after loader recovers. System does not permanently lock into error state.
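
The behavior under test can be illustrated with a lazy initializer that stores nothing on failure, so the next request retries. This is a sketch of the pattern only, not the service's actual code: `LazyEngine` and `FlakyLoader` are hypothetical names, and `ConnectionError` stands in for the mock-loader's 503.

```python
class LazyEngine:
    """Initialize on first use; a failed attempt leaves no state behind."""

    def __init__(self, load_model):
        self._load_model = load_model
        self._engine = None

    def get(self):
        if self._engine is None:
            # If load_model raises, _engine stays None and the next call retries.
            self._engine = self._load_model()
        return self._engine


class FlakyLoader:
    """Stand-in for mock-loader: fails the first `failures` calls, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("loader returned 503")
        return "engine"


loader = FlakyLoader(failures=1)
holder = LazyEngine(loader)
try:
    holder.get()
    first_attempt = "ok"
except ConnectionError:
    first_attempt = "failed"
second_attempt = holder.get()
print(first_attempt, second_attempt, loader.calls)
```
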
---
### NFT-RES-04: Service restart with in-memory state loss
**Summary**: Verify that after a service restart, all in-memory state (_active_detections, _event_queues) is cleanly reset.
**Traces to**: RESTRICT-OP-5, RESTRICT-OP-6
**Preconditions**:
- Previous detection may have been in progress
**Fault injection**:
- Restart detections container
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Restart detections container | — |
| 2 | `GET /health` | Returns `aiAvailability: "None"` (fresh start) |
| 3 | `POST /detect/any-media-id` | Accepted (no stale _active_detections blocking it) |
**Pass criteria**: No stale state from previous session. All endpoints functional after restart.
---
## Security Tests
### NFT-SEC-01: Malformed multipart payload handling
**Summary**: Verify that the service handles malformed multipart requests without crashing.
**Traces to**: AC-API-2 (security)
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Send `POST /detect` with truncated multipart body (missing boundary) | 400 or 422 — not 500 |
| 2 | Send `POST /detect` with `Content-Type: multipart/form-data` but no file part | 400 — empty image |
| 3 | `GET /health` after malformed requests | Service is still healthy |
**Pass criteria**: All malformed requests return 4xx. Service remains operational.
---
### NFT-SEC-02: Oversized request body
**Summary**: Verify system behavior when an extremely large file is uploaded.
**Traces to**: RESTRICT-OP-4
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | Send `POST /detect` with a 500 MB random file | Error response (413, 400, or timeout) — not OOM crash |
| 2 | `GET /health` | Service is still running |
**Pass criteria**: Service does not crash or run out of memory. Returns an error or times out gracefully.
---
### NFT-SEC-03: JWT token is forwarded without modification
**Summary**: Verify that the Authorization header is forwarded to the Annotations service as-is.
**Traces to**: AC-API-3
**Steps**:
| Step | Consumer Action | Expected Response |
|------|----------------|------------------|
| 1 | `POST /detect/test-media-sec` with `Authorization: Bearer test-jwt-123` and `x-refresh-token: refresh-456` | `{"status": "started"}` |
| 2 | After processing, query mock-annotations `GET /mock/annotations` | Recorded request contains `Authorization: Bearer test-jwt-123` header |
**Pass criteria**: Exact token received by mock-annotations matches what the consumer sent.
---
## Resource Limit Tests
### NFT-RES-LIM-01: ThreadPoolExecutor worker limit (2 concurrent)
**Summary**: Verify that no more than 2 inference operations run simultaneously.
**Traces to**: RESTRICT-HW-3
**Preconditions**:
- Engine is initialized
**Monitoring**:
- Track concurrent request timings
**Steps**:
| Step | Consumer Action | Expected Behavior |
|------|----------------|------------------|
| 1 | Send 4 concurrent `POST /detect` requests | — |
| 2 | Measure response arrival times | First 2 complete roughly together; next 2 complete after |
**Pass criteria**: Clear evidence of 2-at-a-time processing (second batch starts after first completes). All 4 requests eventually succeed.
**Duration**: ~60s
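
The limit being tested follows from how a two-worker pool schedules work. The sketch below shows that in-process with a probe that records peak concurrency; the real test can only infer the same thing from response timings. `ConcurrencyProbe` and `fake_inference` are hypothetical, and `time.sleep` stands in for inference.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyProbe:
    """Context manager that records how many callers are inside it at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self._active -= 1
        return False


def fake_inference(probe, request_id, work_s=0.05):
    """Pretend to run inference for one request."""
    with probe:
        time.sleep(work_s)
    return request_id


probe = ConcurrencyProbe()
with ThreadPoolExecutor(max_workers=2) as pool:
    # Four requests submitted at once; only two can run at any moment.
    results = list(pool.map(lambda i: fake_inference(probe, i), range(4)))
print(results, probe.peak)
```
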
---
### NFT-RES-LIM-02: SSE queue depth limit (100 events)
**Summary**: Verify that the SSE queue per client does not exceed 100 events.
**Traces to**: AC-API-4
**Preconditions**:
- Engine is initialized
**Monitoring**:
- SSE event count
**Steps**:
| Step | Consumer Action | Expected Behavior |
|------|----------------|------------------|
| 1 | Open SSE connection but do not read (stall client) | — |
| 2 | Trigger async detection that produces > 100 events | — |
| 3 | After processing completes, drain the SSE queue | ≤ 100 events received |
**Pass criteria**: No more than 100 events buffered. No OOM or connection errors from queue growth.
**Duration**: ~120s
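
A bounded per-client queue is one way to get this behavior. The sketch below uses the standard-library `queue.Queue` as a stand-in for whatever queue type the service uses, and assumes that events arriving at a full queue are dropped; both the queue type and the drop policy are assumptions, and `publish` is a hypothetical helper.

```python
import queue

MAX_EVENTS = 100  # queue depth limit from the test title


def publish(client_queue, event):
    """Enqueue without blocking; report False if the client's queue is already full."""
    try:
        client_queue.put_nowait(event)
        return True
    except queue.Full:
        return False


client_queue = queue.Queue(maxsize=MAX_EVENTS)

# Stalled client: 250 events are produced while nothing is read.
accepted = sum(publish(client_queue, {"frame": i}) for i in range(250))

# Client resumes and drains whatever was buffered.
drained = []
while not client_queue.empty():
    drained.append(client_queue.get_nowait())
print(accepted, len(drained))
```
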
---
### NFT-RES-LIM-03: Max 300 detections per frame
**Summary**: Verify that the system returns at most 300 detections per frame (model output limit).
**Traces to**: RESTRICT-SW-6
**Preconditions**:
- Engine is initialized
- Image with dense scene expected to produce many detections
**Monitoring**:
- Detection count per response
**Pass criteria**: No response contains more than 300 detections. Dense images hit the cap without errors.
**Duration**: ~30s
---
### NFT-RES-LIM-04: Log file rotation and retention
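**Summary**: Verify that the service writes to a daily, date-named log file with structured entries. Daily rotation across midnight and 30-day retention cannot be observed within the ~10s run and are not exercised by these steps.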
**Traces to**: AC-LOG-1, AC-LOG-2
**Preconditions**:
- Detections service running with Logs/ volume mounted for inspection
**Monitoring**:
- Log file creation, naming, and count
**Steps**:
| Step | Consumer Action | Expected Behavior |
|------|----------------|------------------|
| 1 | Make several detection requests | Logs written to `Logs/log_inference_YYYYMMDD.txt` |
| 2 | Verify log file name matches current date | File name contains today's date |
| 3 | Verify log content format | Contains INFO/DEBUG/WARNING entries with timestamps |
**Pass criteria**: Log file exists with correct date-based naming. Content includes structured log entries.
**Duration**: ~10s
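
The naming check in steps 1 and 2 can be sketched as below, together with a helper that lists files older than the 30-day retention window. The helper names are hypothetical; only the `log_inference_YYYYMMDD.txt` pattern comes from the spec.

```python
import re
from datetime import date, datetime, timedelta

LOG_NAME = re.compile(r"^log_inference_(\d{8})\.txt$")


def expected_log_name(day):
    """File name the service should be writing to on the given day."""
    return f"log_inference_{day:%Y%m%d}.txt"


def log_date(filename):
    """Parse the date out of a log file name, or return None if it does not match."""
    match = LOG_NAME.match(filename)
    if not match:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits, but not a real calendar date


def expired(filenames, today, keep_days=30):
    """Names of log files dated before the retention cutoff."""
    cutoff = today - timedelta(days=keep_days)
    return [n for n in filenames if (d := log_date(n)) is not None and d < cutoff]


today = date(2026, 3, 22)
names = ["log_inference_20260201.txt", "log_inference_20260320.txt", "notes.txt"]
print(expected_log_name(today), expired(names, today))
```
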