[AZ-137] [AZ-138] Decompose test tasks and scaffold E2E test infrastructure

Made-with: Cursor
2026-04-23 02:46:31 +00:00 · 2026-03-23 14:07:54 +02:00
parent 091d9a8fb0
commit 86d8e7e22d
47 changed files with 1883 additions and 88 deletions
@@ -0,0 +1,87 @@
+# Health & Engine Lifecycle Tests
+
+**Task**: AZ-139_test_health_engine
+**Name**: Health & Engine Lifecycle Tests
+**Description**: Implement E2E tests that verify health endpoint responses and the lazy engine initialization lifecycle
+**Complexity**: 3 points
+**Dependencies**: AZ-138_test_infrastructure
+**Component**: Integration Tests
+**Jira**: AZ-139
+**Epic**: AZ-137
+
+## Problem
+
+The health endpoint and engine initialization lifecycle are critical for operational monitoring and service readiness. Tests must verify that the health endpoint correctly reflects engine state transitions (None → Downloading → Enabled/Error) and that engine initialization is lazy (triggered by first detection, not at startup).
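+
+A minimal sketch of how a test could validate the sequence of states sampled from repeated GET /health calls (AC-3). It assumes aiAvailability is serialized as the strings "None", "Downloading", "Enabled", and "Error" named above. AiAvailability and is_valid_observed_sequence are hypothetical test-side helpers, not the codebase's AiAvailabilityStatus enum. Polling can miss short-lived states such as Downloading, so the check only requires that samples never move backwards and that a terminal state never changes.
+
+```python
+from enum import Enum
+
+
+class AiAvailability(Enum):
+    """Hypothetical test-side mirror of the aiAvailability values (assumed strings)."""
+
+    NONE = "None"
+    DOWNLOADING = "Downloading"
+    ENABLED = "Enabled"
+    ERROR = "Error"
+
+
+# Lifecycle position: None -> Downloading -> Enabled/Error (both terminal).
+_RANK = {
+    AiAvailability.NONE: 0,
+    AiAvailability.DOWNLOADING: 1,
+    AiAvailability.ENABLED: 2,
+    AiAvailability.ERROR: 2,
+}
+_TERMINAL_RANK = 2
+
+
+def is_valid_observed_sequence(observed: list[str]) -> bool:
+    """Return True if sampled states never regress in the lifecycle.
+
+    Raises ValueError if a sample is not one of the assumed state strings.
+    """
+    states = [AiAvailability(value) for value in observed]
+    for prev, curr in zip(states, states[1:]):
+        if _RANK[curr] < _RANK[prev]:
+            return False  # moved backwards, e.g. Enabled -> None
+        if _RANK[prev] == _TERMINAL_RANK and curr is not prev:
+            return False  # switched between terminal states, e.g. Enabled -> Error
+    return True
+```
+
+For example, ["None", "Enabled"] passes because Downloading was simply not sampled, while ["Enabled", "None"] fails.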
+
+## Outcome
+
+- Health endpoint behavior verified across all engine states
+- Lazy initialization confirmed (no engine load at startup)
+- ONNX fallback path validated on CPU-only environments
+- Engine state transitions observable through health endpoint
+
+## Scope
+
+### Included
+- FT-P-01: Health check returns status before engine initialization
+- FT-P-02: Health check reflects engine availability after initialization
+- FT-P-14: Engine lazy initialization on first detection request
+- FT-P-15: ONNX fallback when GPU unavailable
+
+### Excluded
+- TensorRT-specific engine tests (require GPU hardware)
+- Performance benchmarking of engine initialization time
+- Engine error recovery scenarios (covered in resilience tests)
+
+## Acceptance Criteria
+
+**AC-1: Pre-init health check**
+Given the detections service just started with no prior requests
+When GET /health is called
+Then response is 200 with status "healthy" and aiAvailability "None"
+
+**AC-2: Post-init health check**
+Given a successful detection has been performed
+When GET /health is called
+Then aiAvailability reflects an active engine state (not "None" or "Downloading")
+
+**AC-3: Lazy initialization**
+Given a fresh service start
+When GET /health is called immediately
+Then aiAvailability is "None" (engine not loaded at startup)
+And after POST /detect with a valid image, GET /health shows an active engine state
+
+**AC-4: ONNX fallback**
+Given the service runs without GPU runtime (CPU-only profile)
+When POST /detect is called with a valid image
+Then detection succeeds via ONNX Runtime without TensorRT errors
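+
+A minimal sketch of the health-check side of AC-1 to AC-3 using only the Python standard library. BASE_URL, get_health, check_pre_init, and check_post_init are hypothetical names. The host and port depend on the Docker profile, and the status and aiAvailability field names are taken from the criteria above. This task does not specify the POST /detect request format, so the sketch leaves that call out.
+
+```python
+import json
+import urllib.request
+
+# Hypothetical address; the real one depends on the Docker profile in use.
+BASE_URL = "http://localhost:8080"
+
+# States that AC-2 rules out once a detection has succeeded.
+INACTIVE_STATES = {"None", "Downloading"}
+
+
+def get_health(base_url: str = BASE_URL, timeout: float = 2.0) -> tuple[int, dict]:
+    """GET /health and return (HTTP status, parsed JSON body).
+
+    The 2 s socket timeout roughly enforces the health-check NFR. urlopen
+    raises HTTPError for 4xx/5xx responses, so those surface as exceptions.
+    """
+    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
+        return resp.status, json.loads(resp.read())
+
+
+def check_pre_init(status_code: int, body: dict) -> None:
+    """AC-1 / first half of AC-3: service is healthy, engine not yet loaded."""
+    assert status_code == 200, status_code
+    assert body.get("status") == "healthy", body
+    assert body.get("aiAvailability") == "None", body
+
+
+def check_post_init(body: dict) -> None:
+    """AC-2 / second half of AC-3: engine reports an active state."""
+    state = body.get("aiAvailability")
+    assert state is not None and state not in INACTIVE_STATES, body
+```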
+
+## Non-Functional Requirements
+
+**Performance**
+- Health check response within 2s
+- First detection (including engine init) within 60s
+
+**Reliability**
+- Tests must work on both CPU-only and GPU Docker profiles
+
+## Integration Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | Fresh service, no requests | GET /health before any detection | 200, status: "healthy", aiAvailability: "None" | Max 2s |
+| AC-2 | After POST /detect succeeds | GET /health after detection | aiAvailability not "None" | Max 30s |
+| AC-3 | Fresh service | Health → Detect → Health sequence | State transition None → active | Max 60s |
+| AC-4 | CPU-only Docker profile | POST /detect on CPU profile | Detection succeeds via ONNX | Max 60s |
+
+## Constraints
+
+- Tests must use the CPU Docker profile for ONNX fallback verification
+- Engine initialization time varies by hardware; timeouts must be generous
+- The health endpoint schema depends on the AiAvailabilityStatus enum in the codebase
+
+## Risks & Mitigation
+
+**Risk 1: Engine init timeout on slow CI**
+- *Risk*: Engine initialization may exceed timeout on resource-constrained CI runners
+- *Mitigation*: Use generous timeouts (60s) and mark the test as known-slow
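+
+A minimal sketch of the polling loop behind the generous-timeout mitigation. It uses a monotonic deadline, so a slow CI runner fails with a clear message instead of hanging. poll_until is a hypothetical helper. fetch_state would typically wrap a GET /health call and return its aiAvailability value, and the 1 s interval is an assumed default.
+
+```python
+import time
+from typing import Callable
+
+
+def poll_until(
+    fetch_state: Callable[[], str],
+    is_done: Callable[[str], bool],
+    timeout: float = 60.0,
+    interval: float = 1.0,
+) -> list[str]:
+    """Call fetch_state() until is_done(state) is true or timeout seconds pass.
+
+    Returns every observed state so the caller can also check the
+    None -> active transition. Raises TimeoutError when the deadline passes.
+    """
+    deadline = time.monotonic() + timeout
+    observed: list[str] = []
+    while True:
+        state = fetch_state()
+        observed.append(state)
+        if is_done(state):
+            return observed
+        if time.monotonic() >= deadline:
+            raise TimeoutError(f"not done after {timeout}s; observed {observed}")
+        time.sleep(interval)
+```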