ai-training/_docs/02_tasks/done/AZ-148_test_resource_limits.md
Oleksandr Bezdieniezhnykh cbf370c765 Refactor task management structure and update documentation
- Changed the directory structure for task specifications to include a dedicated `todo/` folder within `_docs/02_tasks/` for tasks ready for implementation.
- Updated references in various skills and documentation to reflect the new task lifecycle, including changes in the `implementer` and `decompose` skills.
- Enhanced the README and flow documentation to clarify the new task organization and its implications for the implementation process.

These updates improve task management clarity and streamline the implementation workflow.
2026-03-28 01:17:45 +02:00

Resource Limit Tests

Task: AZ-148_test_resource_limits
Name: Resource Limit Tests
Description: Implement E2E tests verifying ThreadPoolExecutor worker limit, SSE queue depth cap, max detections per frame, SSE overflow handling, and log file rotation
Complexity: 3 points
Dependencies: AZ-138_test_infrastructure, AZ-142_test_async_sse
Component: Integration Tests
Jira: AZ-148
Epic: AZ-137

Problem

The system enforces several resource limits: 2 concurrent inference workers, 100-event SSE queue depth, 300 max detections per frame, and daily log rotation. Tests must verify these limits are enforced correctly and that overflow conditions are handled gracefully.

Outcome

  • ThreadPoolExecutor limited to 2 concurrent inference operations
  • SSE queue capped at 100 events per client, overflow silently dropped
  • No response contains more than 300 detections per frame
  • Log files use date-based naming with daily rotation
  • SSE overflow does not crash the service or the detection pipeline

Scope

Included

  • FT-N-08: SSE queue overflow is silently dropped
  • NFT-RES-LIM-01: ThreadPoolExecutor worker limit (2 concurrent)
  • NFT-RES-LIM-02: SSE queue depth limit (100 events)
  • NFT-RES-LIM-03: Max 300 detections per frame
  • NFT-RES-LIM-04: Log file rotation and retention

Excluded

  • Memory limits (OS-level, not application-enforced)
  • Disk space limits
  • Network bandwidth throttling

Acceptance Criteria

AC-1: Worker limit
Given an initialized engine
When 4 concurrent POST /detect requests are sent
Then the first 2 complete roughly together and the next 2 complete afterwards (2-at-a-time processing)
And all 4 requests eventually succeed
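The 2-at-a-time pattern in AC-1 can be checked from response completion timestamps alone. The following is a minimal sketch, not the project's test code: the helper names `split_into_waves` and `looks_two_at_a_time` and the `gap` threshold are hypothetical, and a suitable `gap` depends on the actual inference latency.

```python
from typing import Sequence


def split_into_waves(completion_times: Sequence[float], gap: float) -> list[list[float]]:
    """Group completion times into waves.

    A new wave starts whenever the distance to the previous completion
    exceeds `gap` seconds. `gap` is a hypothetical tuning parameter.
    """
    waves: list[list[float]] = []
    for t in sorted(completion_times):
        if waves and t - waves[-1][-1] <= gap:
            waves[-1].append(t)  # close to the previous completion: same wave
        else:
            waves.append([t])  # large pause: a new wave begins
    return waves


def looks_two_at_a_time(completion_times: Sequence[float], gap: float) -> bool:
    """True if completions arrive in at least two waves of at most 2 requests each."""
    waves = split_into_waves(completion_times, gap)
    return len(waves) >= 2 and all(len(wave) <= 2 for wave in waves)
```

In a real test the timestamps would be recorded with `time.monotonic()` as each of the 4 responses arrives, and `gap` would be set to a fraction of the measured single-request latency.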

AC-2: SSE queue depth
Given an SSE client connected but not reading (stalled)
When async detection produces > 100 events
Then stalled client receives <= 100 events when it resumes reading
And no OOM or connection errors

AC-3: SSE overflow handling
Given an SSE client that pauses reading
When async detection generates many events
Then detection completes normally (no error from overflow)
And the stalled client receives at most 100 buffered events
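The behavior that AC-2 and AC-3 describe matches a bounded per-client queue whose producer never blocks. The sketch below illustrates that model only; it assumes an `asyncio.Queue` and assumes the newest event is the one dropped, neither of which is confirmed by this task. The names `offer` and `simulate_stalled_client` are hypothetical.

```python
import asyncio

QUEUE_DEPTH = 100  # per-client cap stated in this task


def offer(queue: asyncio.Queue, event: str) -> bool:
    """Enqueue without blocking; silently drop the event if the queue is full."""
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        return False  # overflow: the producer carries on, no error is raised


async def simulate_stalled_client(produced: int) -> tuple[int, int]:
    """Produce events for a client that never reads; return (buffered, dropped)."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=QUEUE_DEPTH)
    dropped = 0
    for i in range(produced):
        if not offer(queue, f"event-{i}"):
            dropped += 1
    return queue.qsize(), dropped
```

Under this model, producing 250 events for a stalled client leaves 100 buffered and 150 dropped, and the producer loop finishes without raising, which is the outcome the E2E tests look for from outside the service.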

AC-4: Max detections per frame
Given an initialized engine and a dense scene image
When POST /detect is called
Then response contains at most 300 detections
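A small helper can make the AC-4 assertion explicit and also report whether the cap was actually reached. This is a sketch under the assumption that the parsed response yields a list of detection objects; the response shape and the name `check_detection_cap` are hypothetical.

```python
MAX_DETECTIONS = 300  # per-frame cap stated in this task


def check_detection_cap(detections: list, limit: int = MAX_DETECTIONS) -> bool:
    """Fail if the cap is exceeded; return True only if the cap was reached exactly."""
    count = len(detections)
    assert count <= limit, f"{count} detections exceed the cap of {limit}"
    return count == limit
```

A `False` return value means the test passed without the image being dense enough to hit the limit, which is worth logging in the test report.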

AC-5: Log file rotation
Given the service is running with Logs/ volume mounted
When detection requests are made
Then log file exists at Logs/log_inference_YYYYMMDD.txt with today's date
And log content contains structured INFO/DEBUG/WARNING entries
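The file name in AC-5 can be derived from the current date, and the content check reduced to a search for level names. The sketch below assumes the level appears as a standalone word on each log line; the exact line format is not specified in this task, and the helper names are hypothetical.

```python
import re
from datetime import date
from pathlib import Path

# Assumption: log lines contain the level name as a separate word.
LEVEL_PATTERN = re.compile(r"\b(INFO|DEBUG|WARNING)\b")


def expected_log_path(logs_dir: str, day: date) -> Path:
    """Build Logs/log_inference_YYYYMMDD.txt for the given day."""
    return Path(logs_dir) / f"log_inference_{day:%Y%m%d}.txt"


def has_level_entries(text: str) -> bool:
    """True if at least one line carries an INFO, DEBUG, or WARNING level."""
    return any(LEVEL_PATTERN.search(line) for line in text.splitlines())
```

A test would call `expected_log_path("Logs", date.today())`, assert that the file exists, and pass its contents to `has_level_entries`. Requests sent close to midnight may land in the next day's file, so the test may need to tolerate either date.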

Non-Functional Requirements

Reliability

  • Resource limits must be enforced without crash or undefined behavior

Integration Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Engine warm | 4 concurrent POST /detect | 2-at-a-time processing pattern | Max 60s |
| AC-2 | Engine warm, stalled SSE | Async detection > 100 events | <= 100 events buffered | Max 120s |
| AC-3 | Engine warm, stalled SSE | Detection pipeline behavior | Completes normally | Max 120s |
| AC-4 | Engine warm, dense scene image | POST /detect | <= 300 detections | Max 30s |
| AC-5 | Service running, Logs/ mounted | Detection requests | Date-named log file exists | Max 10s |

Constraints

  • Worker limit test requires precise timing measurement of response arrivals
  • SSE overflow test requires ability to pause/resume SSE client reading
  • Detection cap test requires an image producing many detections (may not reach 300 with test fixture)
  • Log rotation test verifies the naming convention only; verifying full 30-day retention would require a long-running test

Risks & Mitigation

Risk 1: Insufficient detections for cap test

  • Risk: Test image may not produce 300 detections to actually hit the cap
  • Mitigation: Assert detection count <= 300 and accept the test as passing when the count is under the limit; in that case the test confirms the cap is not exceeded but does not exercise it

Risk 2: SSE client stall implementation

  • Risk: HTTP client libraries may not support controlled read pausing
  • Mitigation: Use raw socket or thread-based approach to control when events are consumed
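The thread-based mitigation for Risk 2 could look like the sketch below: a reader thread consumes lines from a stream only while a gate is open. `GatedReader` is a hypothetical helper, and the line source is assumed to be any blocking iterable of lines, for example a streaming HTTP response's line iterator.

```python
import threading
from typing import Iterable, Iterator, Optional


class GatedReader:
    """Consume lines from `source` only while the gate is open (hypothetical helper)."""

    def __init__(self, source: Iterable[str]) -> None:
        self._source: Iterator[str] = iter(source)
        self._gate = threading.Event()  # cleared means the client is stalled
        self.received: list[str] = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()  # starts in the stalled state

    def pause(self) -> None:
        self._gate.clear()  # takes effect before the next line is read

    def resume(self) -> None:
        self._gate.set()

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def _run(self) -> None:
        while True:
            self._gate.wait()  # block here while stalled
            line = next(self._source, None)
            if line is None:
                return  # stream ended
            self.received.append(line)
```

A test would start the reader, trigger async detection while the gate is closed, then call `resume()` and count the entries in `received`.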