Refactor task management structure and update documentation

- Changed the directory structure for task specifications to include a dedicated `todo/` folder within `_docs/02_tasks/` for tasks ready for implementation.
- Updated references in various skills and documentation to reflect the new task lifecycle, including changes in the `implementer` and `decompose` skills.
- Enhanced the README and flow documentation to clarify the new task organization and its implications for the implementation process.

These updates improve task management clarity and streamline the implementation workflow.
Oleksandr Bezdieniezhnykh
2026-03-28 01:17:45 +02:00
parent 8c665bd0a4
commit cbf370c765
35 changed files with 1348 additions and 58 deletions
@@ -0,0 +1,87 @@
# Health & Engine Lifecycle Tests
**Task**: AZ-139_test_health_engine
**Name**: Health & Engine Lifecycle Tests
**Description**: Implement E2E tests verifying health endpoint responses and engine lazy initialization lifecycle
**Complexity**: 3 points
**Dependencies**: AZ-138_test_infrastructure
**Component**: Integration Tests
**Jira**: AZ-139
**Epic**: AZ-137
## Problem
The health endpoint and engine initialization lifecycle are critical for operational monitoring and service readiness. Tests must verify that the health endpoint correctly reflects engine state transitions (None → Downloading → Enabled/Error) and that engine initialization is lazy (triggered by first detection, not at startup).
## Outcome
- Health endpoint behavior verified across all engine states
- Lazy initialization confirmed (no engine load at startup)
- ONNX fallback path validated on CPU-only environments
- Engine state transitions observable through health endpoint
## Scope
### Included
- FT-P-01: Health check returns status before engine initialization
- FT-P-02: Health check reflects engine availability after initialization
- FT-P-14: Engine lazy initialization on first detection request
- FT-P-15: ONNX fallback when GPU unavailable
### Excluded
- TensorRT-specific engine tests (require GPU hardware)
- Performance benchmarking of engine initialization time
- Engine error recovery scenarios (covered in resilience tests)
## Acceptance Criteria
**AC-1: Pre-init health check**
Given the detections service just started with no prior requests
When GET /health is called
Then response is 200 with status "healthy" and aiAvailability "None"
**AC-2: Post-init health check**
Given a successful detection has been performed
When GET /health is called
Then aiAvailability reflects an active engine state (not "None" or "Downloading")
**AC-3: Lazy initialization**
Given a fresh service start
When GET /health is called immediately
Then aiAvailability is "None" (engine not loaded at startup)
And after POST /detect with a valid image, GET /health shows engine active
**AC-4: ONNX fallback**
Given the service runs without GPU runtime (CPU-only profile)
When POST /detect is called with a valid image
Then detection succeeds via ONNX Runtime without TensorRT errors
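The Health → Detect → Health sequence from AC-3 could be sketched as below. This is a minimal illustration, not the project's test code. `check_lazy_init`, `FakeService`, and `INACTIVE_STATES` are hypothetical names, and the in-memory fake stands in for real HTTP calls to GET /health and POST /detect. The `status` and `aiAvailability` fields and the "Enabled" value come from this spec. Everything else is an assumption.

```python
from typing import Callable, Dict

# Per AC-2, these values mean the engine is not yet usable.
INACTIVE_STATES = {"None", "Downloading"}


def check_lazy_init(get_health: Callable[[], Dict[str, str]],
                    detect: Callable[[], int]) -> None:
    """Assert the AC-3 sequence: engine idle at startup, active after detection."""
    before = get_health()
    assert before["aiAvailability"] == "None", "engine was loaded at startup"

    # The first detection request is what should trigger engine initialization.
    assert detect() == 200, "detection request failed"

    after = get_health()
    assert after["aiAvailability"] not in INACTIVE_STATES, "engine not active"


class FakeService:
    """Hypothetical in-memory stand-in for the detections service."""

    def __init__(self) -> None:
        self.state = "None"

    def get_health(self) -> Dict[str, str]:
        return {"status": "healthy", "aiAvailability": self.state}

    def detect(self) -> int:
        self.state = "Enabled"  # lazy init happens on first detection
        return 200


service = FakeService()
check_lazy_init(service.get_health, service.detect)
```

In a real E2E test the two callables would wrap HTTP requests against the running container. Passing them in as parameters keeps the assertion logic independent of the HTTP client.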
## Non-Functional Requirements
**Performance**
- Health check response within 2s
- First detection (including engine init) within 60s
**Reliability**
- Tests must work on both CPU-only and GPU Docker profiles
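One way to enforce the performance budgets above is a small timing wrapper around each call. The sketch below is an assumption about how that might look. `call_within`, `HEALTH_BUDGET_S`, and `FIRST_DETECT_BUDGET_S` are hypothetical names, and only the 2s and 60s values come from this spec.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

HEALTH_BUDGET_S = 2.0         # health check response budget
FIRST_DETECT_BUDGET_S = 60.0  # first detection, including engine init


def call_within(fn: Callable[[], T], budget_s: float) -> Tuple[T, float]:
    """Run fn, then fail if it took longer than budget_s seconds.

    Returns the result together with the measured elapsed time.
    """
    start = time.monotonic()  # monotonic clock is unaffected by wall-clock changes
    result = fn()
    elapsed = time.monotonic() - start
    assert elapsed <= budget_s, f"took {elapsed:.2f}s, budget is {budget_s:.2f}s"
    return result, elapsed


# Example with a trivial callable standing in for GET /health.
result, elapsed = call_within(lambda: {"status": "healthy"}, HEALTH_BUDGET_S)
```

Note that this wrapper measures the call after it completes and does not interrupt a hung request, so a client-side request timeout would still be needed alongside it.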
## Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Fresh service, no requests | GET /health before any detection | 200, status: "healthy", aiAvailability: "None" | Max 2s |
| AC-2 | After POST /detect succeeds | GET /health after detection | aiAvailability not "None" or "Downloading" | Max 30s |
| AC-3 | Fresh service | Health → Detect → Health sequence | State transition None → active | Max 60s |
| AC-4 | CPU-only Docker profile | POST /detect on CPU profile | Detection succeeds via ONNX | Max 60s |
## Constraints
- Tests must use the CPU Docker profile for ONNX fallback verification
- Engine initialization time varies by hardware; timeouts must be generous
- The health endpoint schema depends on the `AiAvailabilityStatus` enum defined in the codebase
## Risks & Mitigation
**Risk 1: Engine init timeout on slow CI**
- *Risk*: Engine initialization may exceed timeout on resource-constrained CI runners
- *Mitigation*: Use generous timeouts (60s) and mark the test as a known slow test
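The generous-timeout mitigation could be implemented by polling the health endpoint until the engine leaves its inactive states. The following is a sketch under stated assumptions: `wait_for_state` and its parameters are hypothetical, and the clock and sleep functions are injectable only so the helper can be exercised without real delays.

```python
import time
from typing import Callable


def wait_for_state(read_state: Callable[[], str],
                   is_ready: Callable[[str], bool],
                   timeout_s: float = 60.0,
                   interval_s: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll read_state until is_ready accepts the value or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        state = read_state()
        if is_ready(state):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"engine still in state {state!r} after {timeout_s}s")
        sleep(interval_s)


# Example: a scripted sequence standing in for repeated GET /health calls.
states = iter(["None", "Downloading", "Enabled"])
final = wait_for_state(
    read_state=lambda: next(states),
    is_ready=lambda s: s not in {"None", "Downloading"},
    interval_s=0.0,
)
```

If the tests run under pytest, a custom marker (for example one named `slow`, an assumed name) could tag tests that use this helper so they can be deselected on fast CI runs.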