mirror of https://github.com/azaion/ai-training.git synced 2026-04-22 22:16:35 +00:00

Oleksandr Bezdieniezhnykh cbf370c765 Refactor task management structure and update documentation

- Changed the directory structure for task specifications to include a dedicated `todo/` folder within `_docs/02_tasks/` for tasks ready for implementation.
- Updated references in various skills and documentation to reflect the new task lifecycle, including changes in the `implementer` and `decompose` skills.
- Enhanced the README and flow documentation to clarify the new task organization and its implications for the implementation process.

These updates improve task management clarity and streamline the implementation workflow.

2026-03-28 01:17:45 +02:00

Resource Limit Tests

Task: AZ-148_test_resource_limits
Name: Resource Limit Tests
Description: Implement E2E tests verifying ThreadPoolExecutor worker limit, SSE queue depth cap, max detections per frame, SSE overflow handling, and log file rotation
Complexity: 3 points
Dependencies: AZ-138_test_infrastructure, AZ-142_test_async_sse
Component: Integration Tests
Jira: AZ-148
Epic: AZ-137

Problem

The system enforces several resource limits: 2 concurrent inference workers, 100-event SSE queue depth, 300 max detections per frame, and daily log rotation. Tests must verify these limits are enforced correctly and that overflow conditions are handled gracefully.

Outcome

ThreadPoolExecutor limited to 2 concurrent inference operations
SSE queue capped at 100 events per client, overflow silently dropped
No response contains more than 300 detections per frame
Log files use date-based naming with daily rotation
SSE overflow does not crash the service or the detection pipeline

Scope

Included

FT-N-08: SSE queue overflow is silently dropped
NFT-RES-LIM-01: ThreadPoolExecutor worker limit (2 concurrent)
NFT-RES-LIM-02: SSE queue depth limit (100 events)
NFT-RES-LIM-03: Max 300 detections per frame
NFT-RES-LIM-04: Log file rotation and retention

Excluded

Memory limits (OS-level, not application-enforced)
Disk space limits
Network bandwidth throttling

Acceptance Criteria

AC-1: Worker limit
Given an initialized engine
When 4 concurrent POST /detect requests are sent
Then the first 2 complete roughly together and the next 2 complete afterwards (2-at-a-time processing)
And all 4 requests eventually succeed
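The 2-at-a-time pattern in AC-1 can be checked by clustering response completion times into waves. The following is a minimal sketch, not the project's test code: `group_into_waves`, `fake_detect`, and `run_simulation` are hypothetical names, and a local `ThreadPoolExecutor(max_workers=2)` with sleeping tasks stands in for the real service. In an actual test, the completion times would come from client threads sending POST /detect, and the `gap` threshold would need tuning to the real inference duration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def group_into_waves(completion_times, gap):
    """Cluster completion times into waves.

    A new wave starts whenever a completion arrives more than `gap`
    seconds after the previous one. Returns the size of each wave.
    """
    waves = []
    previous = None
    for t in sorted(completion_times):
        if previous is None or t - previous > gap:
            waves.append(1)
        else:
            waves[-1] += 1
        previous = t
    return waves


def fake_detect(duration):
    """Hypothetical stand-in for POST /detect: sleep, then report completion time."""
    time.sleep(duration)
    return time.monotonic()


def run_simulation(requests=4, workers=2, duration=0.3):
    """Submit `requests` tasks to a pool limited to `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fake_detect, duration) for _ in range(requests)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # With 4 requests and 2 workers, two waves of two are expected.
    print(group_into_waves(run_simulation(), gap=0.15))
```

Asserting on wave sizes rather than absolute timestamps keeps the check tolerant of scheduling jitter, as long as `gap` is comfortably smaller than one inference duration.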

AC-2: SSE queue depth
Given an SSE client connected but not reading (stalled)
When async detection produces > 100 events
Then stalled client receives <= 100 events when it resumes reading
And no OOM or connection errors

AC-3: SSE overflow handling
Given an SSE client pauses reading
When async detection generates many events
Then detection completes normally (no error from overflow)
And stalled client receives at most 100 buffered events
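The behavior AC-2 and AC-3 describe matches a bounded per-client queue whose producer never blocks. The sketch below illustrates that idea with the standard `queue.Queue`; it is an assumption about the mechanism, not the service's actual implementation. In particular, this version drops the newest events once the queue is full, whereas the spec only says overflow is silently dropped, so the real service may discard the oldest events instead. `offer` and `drain` are hypothetical helper names.

```python
import queue

SSE_QUEUE_DEPTH = 100  # per-client cap stated in the spec


def offer(q, event):
    """Enqueue without blocking; return False if the event was dropped."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        # Overflow is swallowed so the producer (detection) never fails.
        return False


def drain(q):
    """Simulate a stalled client resuming: read everything that was buffered."""
    events = []
    while True:
        try:
            events.append(q.get_nowait())
        except queue.Empty:
            return events


if __name__ == "__main__":
    client_queue = queue.Queue(maxsize=SSE_QUEUE_DEPTH)
    accepted = sum(offer(client_queue, i) for i in range(250))
    print(accepted, len(drain(client_queue)))
```

Because `offer` never raises and never blocks, the producer side completes regardless of how far behind the client is, which is the property AC-3 asks the test to confirm.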

AC-4: Max detections per frame
Given an initialized engine and a dense scene image
When POST /detect is called
Then response contains at most 300 detections

AC-5: Log file rotation
Given the service is running with Logs/ volume mounted
When detection requests are made
Then log file exists at Logs/log_inference_YYYYMMDD.txt with today's date
And log content contains structured INFO/DEBUG/WARNING entries
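For AC-5, the expected file name can be derived from the current date and the log content scanned for level markers. This is a rough sketch under assumptions: the helper names are hypothetical, the exact log line layout is not given in the spec (so the check only looks for the level words), and `demo` writes made-up sample lines to a temporary directory in place of the real Logs/ volume.

```python
import datetime
import pathlib
import re
import tempfile

# Matches the three levels named in AC-5 as whole words.
LEVEL_PATTERN = re.compile(r"\b(INFO|DEBUG|WARNING)\b")


def expected_log_path(logs_dir, today=None):
    """Build Logs/log_inference_YYYYMMDD.txt for the given (or current) date."""
    today = today or datetime.date.today()
    return pathlib.Path(logs_dir) / f"log_inference_{today:%Y%m%d}.txt"


def count_leveled_lines(path):
    """Count lines that mention at least one of the expected log levels."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if LEVEL_PATTERN.search(line))


def demo():
    """Exercise the helpers against a throwaway directory with invented log lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = expected_log_path(tmp, datetime.date(2026, 3, 28))
        path.write_text(
            "2026-03-28 INFO engine started\n"
            "unstructured noise\n"
            "2026-03-28 WARNING slow frame\n",
            encoding="utf-8",
        )
        return path.name, count_leveled_lines(path)
```

A test that runs close to midnight could see the date roll over between the request and the check, so it may be worth computing the expected path both before and after issuing requests and accepting either.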

Non-Functional Requirements

Reliability

Resource limits must be enforced without crash or undefined behavior

Integration Tests

AC Ref	Initial Data/Conditions	What to Test	Expected Behavior	NFR References
AC-1	Engine warm	4 concurrent POST /detect	2-at-a-time processing pattern	Max 60s
AC-2	Engine warm, stalled SSE	Async detection > 100 events	<= 100 events buffered	Max 120s
AC-3	Engine warm, stalled SSE	Detection pipeline behavior	Completes normally	Max 120s
AC-4	Engine warm, dense scene image	POST /detect	<= 300 detections	Max 30s
AC-5	Service running, Logs/ mounted	Detection requests	Date-named log file exists	Max 10s

Constraints

Worker limit test requires precise timing measurement of response arrivals
SSE overflow test requires ability to pause/resume SSE client reading
Detection cap test requires an image producing many detections (may not reach 300 with test fixture)
Log rotation test verifies only the date-based naming convention; verifying the full 30-day retention would require a long-running test

Risks & Mitigation

Risk 1: Insufficient detections for cap test

Risk: Test image may not produce 300 detections to actually hit the cap
Mitigation: Assert that the detection count is <= 300 and accept the test as passing when the count is under the limit; in that case the test confirms the cap is not exceeded but does not exercise it
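One way to make the limitation in Risk 1 visible in test output is to report whether the cap was actually reached, separately from whether it was respected. The helper below is a hypothetical sketch; it assumes the detections have already been extracted from the response as a list, since the response schema is not described here.

```python
MAX_DETECTIONS = 300  # per-frame cap stated in the spec


def check_detection_cap(detections, cap=MAX_DETECTIONS):
    """Summarize how a detection list relates to the cap.

    within_cap:  the hard requirement (count <= cap).
    cap_reached: True only when the count equals the cap, which suggests
                 the fixture was dense enough to hit the limit.
    """
    count = len(detections)
    return {
        "count": count,
        "within_cap": count <= cap,
        "cap_reached": count == cap,
    }


if __name__ == "__main__":
    # A sparse fixture passes but does not exercise the cap.
    print(check_detection_cap([{"label": "x"}] * 42))
```

A test could assert on `within_cap` and log a warning when `cap_reached` is false, so that a passing run with a sparse fixture is not mistaken for proof that truncation works.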

Risk 2: SSE client stall implementation

Risk: HTTP client libraries may not support controlled read pausing
Mitigation: Use raw socket or thread-based approach to control when events are consumed
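The raw-socket mitigation could look roughly like the following: the client holds an open connection, deliberately does not call `recv()` for a while, then drains whatever arrives and counts the events. This is a sketch under assumptions, with hypothetical function names; it takes an already connected socket (sending the HTTP request is left out) and assumes events carry `data:` lines separated by blank lines. Note that the operating system's socket buffers also hold data while the client is stalled, so the count observed by the client reflects both the service queue and those buffers.

```python
import socket
import time


def parse_sse_data(raw):
    """Return the data payload of each complete SSE event in `raw` bytes."""
    text = raw.decode("utf-8", errors="replace").replace("\r\n", "\n")
    events = []
    # Blocks are separated by a blank line; the final piece is either
    # empty or an incomplete event, so it is skipped.
    for block in text.split("\n\n")[:-1]:
        data_lines = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[5:]
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            events.append("\n".join(data_lines))
    return events


def read_after_stall(sock, stall_seconds, idle_timeout=1.0):
    """Stall without reading, then drain the socket and parse the events."""
    time.sleep(stall_seconds)  # the stall: no recv() calls during this time
    sock.settimeout(idle_timeout)
    chunks = []
    while True:
        try:
            chunk = sock.recv(65536)
        except socket.timeout:
            break  # nothing more arrived within idle_timeout
        if not chunk:
            break  # peer closed the connection
        chunks.append(chunk)
    return parse_sse_data(b"".join(chunks))


def demo():
    """Feed a fake SSE stream through a local socket pair."""
    client, server = socket.socketpair()
    server.sendall(b"HTTP/1.1 200 OK\r\nContent-Type: text/event-stream\r\n\r\n")
    for i in range(3):
        server.sendall(f"data: event-{i}\n\n".encode())
    server.close()
    try:
        return read_after_stall(client, stall_seconds=0.1)
    finally:
        client.close()
```

Taking a connected socket as the argument keeps the stall logic independent of how the connection is made, so the same function can be driven by `socket.socketpair()` in a unit check and by a real TCP connection in the E2E test.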
