Health & Engine Lifecycle Tests
Task: AZ-139_test_health_engine
Name: Health & Engine Lifecycle Tests
Description: Implement E2E tests verifying health endpoint responses and engine lazy initialization lifecycle
Complexity: 3 points
Dependencies: AZ-138_test_infrastructure
Component: Integration Tests
Jira: AZ-139
Epic: AZ-137
Problem
The health endpoint and engine initialization lifecycle are critical for operational monitoring and service readiness. Tests must verify that the health endpoint correctly reflects engine state transitions (None → Downloading → Enabled/Error) and that engine initialization is lazy (triggered by first detection, not at startup).
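The state transitions described above can be sketched as a forward-only check over the values a test observes while polling the health endpoint. This is a minimal sketch, not the service's code: the `AiAvailability` enum is a hypothetical mirror of the codebase's `AiAvailabilityStatus`, its string values are taken from the states named in this task, and treating "Enabled" and "Error" as terminal is an assumption.

```python
from enum import Enum
from typing import Iterable


class AiAvailability(Enum):
    """Hypothetical mirror of AiAvailabilityStatus (values assumed from this task)."""
    NONE = "None"
    DOWNLOADING = "Downloading"
    ENABLED = "Enabled"
    ERROR = "Error"


# Position of each state in the lifecycle None -> Downloading -> Enabled/Error.
_RANK = {
    AiAvailability.NONE: 0,
    AiAvailability.DOWNLOADING: 1,
    AiAvailability.ENABLED: 2,
    AiAvailability.ERROR: 2,
}


def is_forward_only(observed: Iterable[str]) -> bool:
    """Return True if the observed states never move backwards.

    Repeated states are ignored, and skipped states are allowed, because
    polling may miss a short-lived "Downloading" phase.
    """
    states = [AiAvailability(value) for value in observed]
    for prev, curr in zip(states, states[1:]):
        if prev == curr:
            continue
        if _RANK[curr] <= _RANK[prev]:
            return False
    return True
```

A test could collect every `aiAvailability` value it sees during a run and pass the list to `is_forward_only` as a final sanity check.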
Outcome
- Health endpoint behavior verified across all engine states
- Lazy initialization confirmed (no engine load at startup)
- ONNX fallback path validated on CPU-only environments
- Engine state transitions observable through health endpoint
Scope
Included
- FT-P-01: Health check returns status before engine initialization
- FT-P-02: Health check reflects engine availability after initialization
- FT-P-14: Engine lazy initialization on first detection request
- FT-P-15: ONNX fallback when GPU unavailable
Excluded
- TensorRT-specific engine tests (require GPU hardware)
- Performance benchmarking of engine initialization time
- Engine error recovery scenarios (covered in resilience tests)
Acceptance Criteria
AC-1: Pre-init health check
Given the detections service just started with no prior requests
When GET /health is called
Then response is 200 with status "healthy" and aiAvailability "None"
AC-2: Post-init health check
Given a successful detection has been performed
When GET /health is called
Then aiAvailability reflects an active engine state (not "None" or "Downloading")
AC-3: Lazy initialization
Given a fresh service start
When GET /health is called immediately
Then aiAvailability is "None" (engine not loaded at startup)
And after POST /detect with a valid image, GET /health shows engine active
AC-4: ONNX fallback
Given the service runs without GPU runtime (CPU-only profile)
When POST /detect is called with a valid image
Then detection succeeds via ONNX Runtime without TensorRT errors
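The health, detect, health sequence from AC-1 and AC-3 could be expressed roughly as follows. This is a sketch under assumptions: `check_lazy_initialization`, the two callables, and `FakeDetectionService` are hypothetical names, and a 200 status for a successful POST /detect is assumed rather than stated in this task. The response fields `status` and `aiAvailability` come from the criteria above.

```python
from typing import Callable, Dict, Tuple

HealthFn = Callable[[], Tuple[int, Dict[str, str]]]  # returns (status code, JSON body)
DetectFn = Callable[[bytes], int]                    # returns status code

# States that do not count as an active engine (per AC-2).
INACTIVE_STATES = {"None", "Downloading"}


def check_lazy_initialization(get_health: HealthFn, post_detect: DetectFn,
                              image: bytes) -> None:
    """Run the health -> detect -> health sequence and assert each step."""
    status_code, body = get_health()
    assert status_code == 200
    assert body["status"] == "healthy"
    assert body["aiAvailability"] == "None", "engine must not load at startup"

    # Assumption: a successful detection responds with HTTP 200.
    assert post_detect(image) == 200

    _, body = get_health()
    assert body["aiAvailability"] not in INACTIVE_STATES


class FakeDetectionService:
    """In-memory stand-in, used only to exercise the check without Docker."""

    def __init__(self) -> None:
        self.availability = "None"

    def get_health(self) -> Tuple[int, Dict[str, str]]:
        return 200, {"status": "healthy", "aiAvailability": self.availability}

    def post_detect(self, image: bytes) -> int:
        self.availability = "Enabled"  # engine is loaded on first detection
        return 200
```

In a real E2E run the two callables would wrap HTTP calls against the service container; passing them in keeps the sequence itself independent of the HTTP client.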
Non-Functional Requirements
Performance
- Health check response within 2s
- First detection (including engine init) within 60s
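One way to enforce these two budgets in a test is to time each call and compare against a named constant. The helper names below (`timed`, `assert_within`) and the constants are hypothetical; only the 2s and 60s values come from this task.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

HEALTH_BUDGET_S = 2.0            # health check response budget
FIRST_DETECTION_BUDGET_S = 60.0  # first detection, including engine init


def timed(call: Callable[[], T]) -> Tuple[T, float]:
    """Run `call` and return its result together with the elapsed seconds."""
    start = time.monotonic()
    result = call()
    return result, time.monotonic() - start


def assert_within(elapsed_s: float, budget_s: float, label: str) -> None:
    """Raise AssertionError with a readable message if the budget is exceeded."""
    if elapsed_s > budget_s:
        raise AssertionError(
            f"{label} took {elapsed_s:.2f}s, budget is {budget_s:.0f}s"
        )
```

`time.monotonic` is used instead of wall-clock time so that clock adjustments on the CI host do not distort the measurement.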
Reliability
- Tests must work on both CPU-only and GPU Docker profiles
Integration Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|---|---|---|---|---|
| AC-1 | Fresh service, no requests | GET /health before any detection | 200, status: "healthy", aiAvailability: "None" | Max 2s |
| AC-2 | After POST /detect succeeds | GET /health after detection | aiAvailability not "None" or "Downloading" | Max 30s |
| AC-3 | Fresh service | Health → Detect → Health sequence | State transition None → active | Max 60s |
| AC-4 | CPU-only Docker profile | POST /detect on CPU profile | Detection succeeds via ONNX | Max 60s |
Constraints
- Tests must use the CPU Docker profile for ONNX fallback verification
- Engine initialization time varies by hardware; timeouts must be generous
- Health endpoint schema depends on AiAvailabilityStatus enum from codebase
Risks & Mitigation
Risk 1: Engine init timeout on slow CI
- Risk: Engine initialization may exceed timeout on resource-constrained CI runners
- Mitigation: Use generous timeouts (60s) and mark the test as a known slow test
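The generous-timeout mitigation could be implemented as a polling loop with a deadline rather than a single fixed sleep, so fast runners finish early and slow runners get the full 60s. This is a sketch: `wait_for_engine` and its parameters are hypothetical, and `sleep` and `clock` are injectable only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_engine(get_availability: Callable[[], str],
                    timeout_s: float = 60.0,
                    interval_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until aiAvailability leaves "None"/"Downloading" or the deadline passes.

    Returns the first other state seen. That state may be "Error", so the
    caller still has to assert on the returned value.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_availability()
        if state not in ("None", "Downloading"):
            return state
        if clock() >= deadline:
            raise TimeoutError(
                f"engine still {state!r} after {timeout_s:.0f}s"
            )
        sleep(interval_s)
```

For example, `wait_for_engine(lambda: fetch_health()["aiAvailability"])` would block for at most about 60s, where `fetch_health` stands for whatever helper the test suite uses to call GET /health.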