[AZ-137] [AZ-138] Decompose test tasks and scaffold E2E test infrastructure

Made-with: Cursor
Oleksandr Bezdieniezhnykh
2026-03-23 14:07:54 +02:00
parent 091d9a8fb0
commit 86d8e7e22d
47 changed files with 1883 additions and 88 deletions
@@ -0,0 +1,99 @@
# Resource Limit Tests
**Task**: AZ-148_test_resource_limits
**Name**: Resource Limit Tests
**Description**: Implement E2E tests verifying ThreadPoolExecutor worker limit, SSE queue depth cap, max detections per frame, SSE overflow handling, and log file rotation
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure, AZ-142_test_async_sse
**Component**: Integration Tests
**Jira**: AZ-148
**Epic**: AZ-137
## Problem
The system enforces several resource limits: 2 concurrent inference workers, 100-event SSE queue depth, 300 max detections per frame, and daily log rotation. Tests must verify these limits are enforced correctly and that overflow conditions are handled gracefully.
## Outcome
- ThreadPoolExecutor limited to 2 concurrent inference operations
- SSE queue capped at 100 events per client; overflow events are silently dropped
- No response contains more than 300 detections per frame
- Log files use date-based naming with daily rotation
- SSE overflow does not crash the service or the detection pipeline
## Scope
### Included
- FT-N-08: SSE queue overflow is silently dropped
- NFT-RES-LIM-01: ThreadPoolExecutor worker limit (2 concurrent)
- NFT-RES-LIM-02: SSE queue depth limit (100 events)
- NFT-RES-LIM-03: Max 300 detections per frame
- NFT-RES-LIM-04: Log file rotation and retention
### Excluded
- Memory limits (OS-level, not application-enforced)
- Disk space limits
- Network bandwidth throttling
## Acceptance Criteria
**AC-1: Worker limit**
Given an initialized engine
When 4 concurrent POST /detect requests are sent
Then the first 2 complete at roughly the same time and the next 2 complete afterwards (2-at-a-time processing)
And all 4 requests eventually succeed
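
One way to turn "2-at-a-time" into an assertion is to group response completion times into waves. The sketch below is a local simulation only: `fake_inference`, `group_into_waves`, and the gap threshold are hypothetical names and values, and the local pool merely stands in for the server-side limit. A real test would replace `fake_inference` with the actual POST /detect call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def group_into_waves(completion_times, gap_threshold):
    """Group completion times into waves separated by more than gap_threshold seconds."""
    waves = []
    for t in sorted(completion_times):
        if waves and t - waves[-1][-1] <= gap_threshold:
            waves[-1].append(t)  # close to the previous completion: same wave
        else:
            waves.append([t])  # large gap: a new wave starts
    return waves


def fake_inference(duration):
    # Hypothetical stand-in for one POST /detect request.
    time.sleep(duration)
    return time.monotonic()


def run_simulation(requests=4, workers=2, duration=0.2):
    # The local pool models the 2-worker limit; in the real test the limit is server-side.
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fake_inference, duration) for _ in range(requests)]
        finished = [f.result() - start for f in futures]
    return group_into_waves(finished, gap_threshold=duration / 2)


waves = run_simulation()
wave_sizes = [len(w) for w in waves]
```

With 4 requests and 2 workers, `wave_sizes` should show two waves of two. Against a real service, the threshold would need tuning to the observed inference time.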
**AC-2: SSE queue depth**
Given an SSE client connected but not reading (stalled)
When async detection produces > 100 events
Then the stalled client receives <= 100 events when it resumes reading
And no out-of-memory (OOM) or connection errors occur
**AC-3: SSE overflow handling**
Given an SSE client that has paused reading
When async detection generates many events
Then detection completes normally (no error from overflow)
And the stalled client receives at most 100 buffered events
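
The behavior AC-2 and AC-3 expect can be illustrated with a bounded per-client buffer. This is a minimal sketch, not the service's implementation: `BoundedEventQueue` is a hypothetical name, and dropping the newest event (rather than the oldest) is an assumption, since the task does not say which is discarded.

```python
import queue


class BoundedEventQueue:
    """Per-client buffer that drops new events once maxsize is reached (assumed policy)."""

    def __init__(self, maxsize=100):
        self._queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0

    def publish(self, event):
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            # Silent drop: the producer never sees an error from a slow client.
            self.dropped += 1

    def drain(self):
        """Return everything currently buffered, as a resuming client would read it."""
        events = []
        while True:
            try:
                events.append(self._queue.get_nowait())
            except queue.Empty:
                return events


buffer = BoundedEventQueue(maxsize=100)
for i in range(150):  # producer outpaces the stalled client
    buffer.publish({"frame": i})
received = buffer.drain()
```

The producer loop finishes without raising, which mirrors AC-3, and the drained list is capped at the queue size, which mirrors AC-2.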
**AC-4: Max detections per frame**
Given an initialized engine and a dense scene image
When POST /detect is called
Then response contains at most 300 detections
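
A small helper can separate "within the cap" from "the cap was actually reached", which matters when the fixture image yields fewer than 300 detections. This is a sketch: `check_detection_cap` is a hypothetical helper, and it assumes the test has already extracted one frame's detections as a list, because the response schema is not specified here.

```python
MAX_DETECTIONS_PER_FRAME = 300  # limit stated in this task


def check_detection_cap(detections, limit=MAX_DETECTIONS_PER_FRAME):
    """Return (within_cap, cap_reached) for one frame's detection list."""
    count = len(detections)
    within_cap = count <= limit
    cap_reached = count == limit  # True only when the frame hit the limit exactly
    return within_cap, cap_reached


# Hypothetical frames: a sparse one and one that hits the limit.
sparse_result = check_detection_cap([{"id": i} for i in range(42)])
dense_result = check_detection_cap([{"id": i} for i in range(300)])
```

A test could assert on `within_cap` and log `cap_reached`, so a run that never approaches the limit is visible in the report.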
**AC-5: Log file rotation**
Given the service is running with Logs/ volume mounted
When detection requests are made
Then a log file exists at Logs/log_inference_YYYYMMDD.txt with today's date
And log content contains structured INFO/DEBUG/WARNING entries
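
The expected file name can be derived from the date instead of being hard-coded. The sketch below assumes the `YYYYMMDD` part is the local calendar date and that level names appear as whole words in each line. Both are assumptions, and `expected_log_path` and `count_level_lines` are hypothetical helpers.

```python
import re
from datetime import date
from pathlib import Path

LEVEL_PATTERN = re.compile(r"\b(INFO|DEBUG|WARNING)\b")


def expected_log_path(logs_dir, day=None):
    """Build Logs/log_inference_YYYYMMDD.txt for the given day (default: today)."""
    day = day or date.today()
    return Path(logs_dir) / f"log_inference_{day:%Y%m%d}.txt"


def count_level_lines(text):
    """Count lines that mention at least one of the expected log levels."""
    return sum(1 for line in text.splitlines() if LEVEL_PATTERN.search(line))


example_path = expected_log_path("Logs", date(2026, 3, 23))
example_count = count_level_lines("INFO engine ready\nplain line\nWARNING slow frame")
```

If the service and the test runner use different time zones, a test that runs near midnight may need to accept either of two adjacent dates.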
## Non-Functional Requirements
**Reliability**
- Resource limits must be enforced without crash or undefined behavior
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Engine warm | 4 concurrent POST /detect | 2-at-a-time processing pattern | Max 60s |
| AC-2 | Engine warm, stalled SSE | Async detection > 100 events | <= 100 events buffered | Max 120s |
| AC-3 | Engine warm, stalled SSE | Detection pipeline behavior | Completes normally | Max 120s |
| AC-4 | Engine warm, dense scene image | POST /detect | <= 300 detections | Max 30s |
| AC-5 | Service running, Logs/ mounted | Detection requests | Date-named log file exists | Max 10s |
## Constraints
- Worker limit test requires precise timing measurement of response arrivals
- SSE overflow test requires ability to pause/resume SSE client reading
- Detection cap test requires an image producing many detections (the test fixture may not reach 300)
- Log rotation test verifies the naming convention only; verifying full 30-day retention would require a long-running test
## Risks & Mitigation
**Risk 1: Insufficient detections for cap test**
- *Risk*: Test image may not produce 300 detections to actually hit the cap
- *Mitigation*: Assert detection count <= 300 and accept the test as passing if the count is under the limit; note that a result below 300 shows the cap is not exceeded but does not prove it is enforced
**Risk 2: SSE client stall implementation**
- *Risk*: HTTP client libraries may not support controlled read pausing
- *Mitigation*: Use a raw socket or a thread-based approach to control when events are consumed
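
The thread-based option might look like the sketch below: a background thread pulls lines from any iterable, but only while a gate is open. `StallableReader` is a hypothetical helper. In a real test, the iterable would be the line iterator of a streaming SSE response rather than the generator used here.

```python
import threading


class StallableReader:
    """Consume lines on a background thread, but only while the gate is open."""

    def __init__(self, lines):
        self._iterator = iter(lines)
        self._gate = threading.Event()  # starts closed, i.e. stalled
        self.received = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def pause(self):
        self._gate.clear()

    def resume(self):
        self._gate.set()

    def join(self, timeout=None):
        self._thread.join(timeout)

    def _run(self):
        while True:
            self._gate.wait()  # block before reading, so nothing is pulled while stalled
            try:
                line = next(self._iterator)
            except StopIteration:
                return
            self.received.append(line)


reader = StallableReader(f"event {i}" for i in range(5))
reader.start()  # gate is closed, so the thread waits without reading
stalled_count = len(reader.received)
reader.resume()
reader.join(timeout=5)
```

When the client stalls on a real connection, OS socket buffers absorb some data before the server-side queue starts to fill. The test therefore probably needs to generate well over 100 events to exercise the cap.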