Health & Engine Lifecycle Tests

Task: AZ-139_test_health_engine
Name: Health & Engine Lifecycle Tests
Description: Implement E2E tests verifying health endpoint responses and engine lazy initialization lifecycle
Complexity: 3 points
Dependencies: AZ-138_test_infrastructure
Component: Integration Tests
Jira: AZ-139
Epic: AZ-137

Problem

The health endpoint and engine initialization lifecycle are critical for operational monitoring and service readiness. Tests must verify that the health endpoint correctly reflects engine state transitions (None → Downloading → Enabled/Error) and that engine initialization is lazy (triggered by first detection, not at startup).
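The state progression above can be checked against a series of health polls. The following is a minimal sketch, assuming the four state names in the paragraph above are the exact strings returned in aiAvailability (the real AiAvailabilityStatus enum may define other values). Because a poller can miss a short-lived "Downloading" state, the sketch checks only that states never move backwards. The names STATE_RANK and is_forward_only are hypothetical.

```python
# Rank of each state in the None -> Downloading -> Enabled/Error progression.
# Assumption: these strings match the values the health endpoint returns.
# "Enabled" and "Error" share a rank, so a switch between them is not flagged.
STATE_RANK = {"None": 0, "Downloading": 1, "Enabled": 2, "Error": 2}


def is_forward_only(observed):
    """Return True if the polled states never move backwards.

    Intermediate states may be skipped and repeats are allowed,
    but an unknown state or a step back to an earlier state fails.
    """
    ranks = []
    for state in observed:
        if state not in STATE_RANK:
            return False
        ranks.append(STATE_RANK[state])
    # Every consecutive pair must be non-decreasing in rank.
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

A test could collect aiAvailability values while a detection request is in flight and pass the resulting list to is_forward_only.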

Outcome

Health endpoint behavior verified across all engine states
Lazy initialization confirmed (no engine load at startup)
ONNX fallback path validated on CPU-only environments
Engine state transitions observable through health endpoint

Scope

Included

FT-P-01: Health check returns status before engine initialization
FT-P-02: Health check reflects engine availability after initialization
FT-P-14: Engine lazy initialization on first detection request
FT-P-15: ONNX fallback when GPU unavailable

Excluded

TensorRT-specific engine tests (require GPU hardware)
Performance benchmarking of engine initialization time
Engine error recovery scenarios (covered in resilience tests)

Acceptance Criteria

AC-1: Pre-init health check
Given the detections service just started with no prior requests
When GET /health is called
Then response is 200 with status "healthy" and aiAvailability "None"

AC-2: Post-init health check
Given a successful detection has been performed
When GET /health is called
Then aiAvailability reflects an active engine state (not "None" or "Downloading")

AC-3: Lazy initialization
Given a fresh service start
When GET /health is called immediately
Then aiAvailability is "None" (engine not loaded at startup)
And after POST /detect with a valid image, GET /health shows engine active

AC-4: ONNX fallback
Given the service runs without GPU runtime (CPU-only profile)
When POST /detect is called with a valid image
Then detection succeeds via ONNX Runtime without TensorRT errors
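
The Health, Detect, Health sequence in AC-1 to AC-3 could be sketched as below. This is an illustration under stated assumptions, not the project's test code: BASE_URL, HttpClient, and check_lazy_init are hypothetical names, the image is assumed to be sent as a raw request body, and a successful detection is assumed to return 200. Only the /health and /detect paths and the status and aiAvailability fields come from the criteria above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: adjust to the test environment


class HttpClient:
    """Thin stdlib HTTP client for the two endpoints named in the criteria."""

    def __init__(self, base_url=BASE_URL):
        self.base_url = base_url

    def health(self):
        # Returns (HTTP status, parsed JSON body).
        with urllib.request.urlopen(self.base_url + "/health", timeout=2) as resp:
            return resp.status, json.loads(resp.read())

    def detect(self, image_bytes):
        # Assumption: raw bytes as the body; the real API may expect multipart.
        req = urllib.request.Request(
            self.base_url + "/detect",
            data=image_bytes,
            method="POST",
            headers={"Content-Type": "application/octet-stream"},
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            return resp.status


def check_lazy_init(client, image_bytes):
    """Run Health -> Detect -> Health and return the final aiAvailability."""
    status, body = client.health()
    assert status == 200
    assert body["status"] == "healthy"
    assert body["aiAvailability"] == "None", "engine loaded before first detection"

    assert client.detect(image_bytes) == 200

    _, body = client.health()
    assert body["aiAvailability"] not in ("None", "Downloading")
    return body["aiAvailability"]
```

Taking the client as a parameter keeps the sequence logic separate from transport details, so the same check can run against a stub in unit tests and against HttpClient in the Docker environment.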

Non-Functional Requirements

Performance

Health check response within 2s
First detection (including engine init) within 60s
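
One way to enforce these two budgets in a test is a small timing wrapper. The sketch below is an assumption about how the checks might be structured; call_within and the two constant names are hypothetical, while the 2s and 60s values come from the list above.

```python
import time

HEALTH_BUDGET_S = 2         # health check response budget
FIRST_DETECT_BUDGET_S = 60  # first detection, including engine init


def call_within(budget_s, fn, *args, **kwargs):
    """Call fn and fail if it took longer than budget_s seconds.

    Returns (result, elapsed_seconds). The call is not interrupted;
    the budget is checked only after fn returns.
    """
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    assert elapsed <= budget_s, f"took {elapsed:.2f}s, budget {budget_s}s"
    return result, elapsed
```

Since the wrapper does not interrupt a hanging call, the underlying HTTP request should still carry its own timeout.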

Reliability

Tests must work on both CPU-only and GPU Docker profiles

Integration Tests

AC Ref	Initial Data/Conditions	What to Test	Expected Behavior	NFR References
AC-1	Fresh service, no requests	GET /health before any detection	200, aiAvailability: "None"	Max 2s
AC-2	After POST /detect succeeds	GET /health after detection	aiAvailability not "None"	Max 30s
AC-3	Fresh service	Health → Detect → Health sequence	State transition None → active	Max 60s
AC-4	CPU-only Docker profile	POST /detect on CPU profile	Detection succeeds via ONNX	Max 60s

Constraints

Tests must use the CPU Docker profile for ONNX fallback verification
Engine initialization time varies by hardware; timeouts must be generous (up to 60s for the first detection, per the performance requirements)
Health endpoint schema depends on AiAvailabilityStatus enum from codebase

Risks & Mitigation

Risk 1: Engine init timeout on slow CI

Risk: Engine initialization may exceed timeout on resource-constrained CI runners
Mitigation: Use generous timeouts (60s) and mark as known slow test
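
A generous timeout is easier to apply through polling than through a fixed sleep, since fast runners finish early and slow runners get the full window. The following is a minimal sketch under that assumption; wait_for_state and its parameters are hypothetical, and the clock and sleep arguments exist only so the helper can be exercised without real waiting.

```python
import time


def wait_for_state(fetch_state, accept, timeout_s=60.0, interval_s=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_state() until accept(state) is true or timeout_s elapses.

    Returns the list of states seen, ending with the accepted one.
    Raises TimeoutError carrying the observed history otherwise.
    """
    deadline = clock() + timeout_s
    seen = []
    while True:
        state = fetch_state()
        seen.append(state)
        if accept(state):
            return seen
        if clock() >= deadline:
            raise TimeoutError(f"no accepted state within {timeout_s}s; saw {seen}")
        sleep(interval_s)
```

In pytest, such a test could also carry a custom marker (for example a project-defined slow marker registered in the pytest configuration) so that it can be deselected on quick runs.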
