detections/_docs/02_tasks/AZ-139_test_health_engine.md
2026-03-23 14:07:54 +02:00

Health & Engine Lifecycle Tests

Task: AZ-139_test_health_engine
Name: Health & Engine Lifecycle Tests
Description: Implement E2E tests verifying health endpoint responses and engine lazy initialization lifecycle
Complexity: 3 points
Dependencies: AZ-138_test_infrastructure
Component: Integration Tests
Jira: AZ-139
Epic: AZ-137

Problem

The health endpoint and engine initialization lifecycle are critical for operational monitoring and service readiness. Tests must verify that the health endpoint correctly reflects engine state transitions (None → Downloading → Enabled/Error) and that engine initialization is lazy (triggered by first detection, not at startup).
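
The state transitions described above can be sketched as a small checker over the values a test observes while polling. This is a minimal sketch, not the codebase's definition: the member values mirror only the strings named in this task, the real `AiAvailabilityStatus` enum may define more states, and `is_plausible_lifecycle` is a hypothetical helper name. It assumes a poller may miss the "Downloading" state or see the same state twice, so it only rejects backward moves and switches between terminal states.

```python
from enum import Enum


class AiAvailabilityStatus(Enum):
    # Assumption: values mirror the strings named in this task; the real
    # enum in the codebase may contain more members or other spellings.
    NONE = "None"
    DOWNLOADING = "Downloading"
    ENABLED = "Enabled"
    ERROR = "Error"


# Rank each state by how far along the lifecycle it is.
# Enabled and Error are both treated as terminal (rank 2).
_RANK = {
    AiAvailabilityStatus.NONE: 0,
    AiAvailabilityStatus.DOWNLOADING: 1,
    AiAvailabilityStatus.ENABLED: 2,
    AiAvailabilityStatus.ERROR: 2,
}


def is_plausible_lifecycle(observed: list[str]) -> bool:
    """Return True if polled aiAvailability values never move backwards."""
    # An unknown string raises ValueError here, which surfaces schema drift.
    states = [AiAvailabilityStatus(value) for value in observed]
    for previous, current in zip(states, states[1:]):
        if _RANK[current] < _RANK[previous]:
            return False
        # Seeing two different terminal states in one run is contradictory.
        if _RANK[previous] == 2 and current is not previous:
            return False
    return True
```

A test could collect every `aiAvailability` value it reads during a run and pass the list to this helper as a final sanity check.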

Outcome

  • Health endpoint behavior verified across all engine states
  • Lazy initialization confirmed (no engine load at startup)
  • ONNX fallback path validated on CPU-only environments
  • Engine state transitions observable through health endpoint

Scope

Included

  • FT-P-01: Health check returns status before engine initialization
  • FT-P-02: Health check reflects engine availability after initialization
  • FT-P-14: Engine lazy initialization on first detection request
  • FT-P-15: ONNX fallback when GPU unavailable

Excluded

  • TensorRT-specific engine tests (require GPU hardware)
  • Performance benchmarking of engine initialization time
  • Engine error recovery scenarios (covered in resilience tests)

Acceptance Criteria

AC-1: Pre-init health check
Given the detections service just started with no prior requests
When GET /health is called
Then response is 200 with status "healthy" and aiAvailability "None"

AC-2: Post-init health check
Given a successful detection has been performed
When GET /health is called
Then aiAvailability reflects an active engine state (not "None" or "Downloading")

AC-3: Lazy initialization
Given a fresh service start
When GET /health is called immediately
Then aiAvailability is "None" (engine not loaded at startup)
And after POST /detect with a valid image, GET /health shows engine active

AC-4: ONNX fallback
Given the service runs without GPU runtime (CPU-only profile)
When POST /detect is called with a valid image
Then detection succeeds via ONNX Runtime without TensorRT errors
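
As an illustration of how AC-1 and AC-2 might translate into checks on a parsed `/health` response, here is a minimal sketch. The field names `status` and `aiAvailability` come from the criteria above; the function names and the idea of returning a list of violations are assumptions made for this example, and the HTTP call itself is left to whatever client the test infrastructure provides.

```python
# States that AC-2 explicitly rules out as "active".
INACTIVE_STATES = {"None", "Downloading"}


def check_pre_init(status_code: int, body: dict) -> list[str]:
    """Collect AC-1 violations; an empty list means the check passed."""
    problems = []
    if status_code != 200:
        problems.append(f"expected HTTP 200, got {status_code}")
    if body.get("status") != "healthy":
        problems.append(f"expected status 'healthy', got {body.get('status')!r}")
    if body.get("aiAvailability") != "None":
        problems.append(
            f"expected aiAvailability 'None', got {body.get('aiAvailability')!r}"
        )
    return problems


def check_post_init(body: dict) -> list[str]:
    """Collect AC-2 violations for a health body read after a detection."""
    availability = body.get("aiAvailability")
    if availability is None:
        return ["aiAvailability missing from health response"]
    # Follows AC-2 literally: anything except "None"/"Downloading" passes.
    # Note that "Error" would pass too; a stricter test may exclude it.
    if availability in INACTIVE_STATES:
        return [f"engine not active, aiAvailability is {availability!r}"]
    return []
```

Returning violations instead of asserting immediately lets a test report every mismatch in one failure message, for example `assert not check_pre_init(code, body)`.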

Non-Functional Requirements

Performance

  • Health check response within 2s
  • First detection (including engine init) within 60s
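
One way to enforce these two budgets in a test is to time the call and fail if it runs over. The sketch below assumes a hypothetical helper `call_within` and hypothetical constant names; only the 2s and 60s values come from this task. It measures after the call returns and does not abort a hanging request, so the HTTP client would still need its own timeout.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Budgets taken from the performance requirements in this task.
HEALTH_BUDGET_S = 2.0
FIRST_DETECTION_BUDGET_S = 60.0


def call_within(budget_s: float, fn: Callable[[], T]) -> T:
    """Run fn and raise AssertionError if it took longer than budget_s."""
    started = time.monotonic()  # monotonic clock is immune to wall-clock jumps
    result = fn()
    elapsed = time.monotonic() - started
    if elapsed > budget_s:
        raise AssertionError(
            f"call took {elapsed:.2f}s, budget was {budget_s:.2f}s"
        )
    return result
```

A health test might then wrap its request as `call_within(HEALTH_BUDGET_S, lambda: client.get_health())`, where `client` is whatever fixture the test infrastructure supplies.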

Reliability

  • Tests must work on both CPU-only and GPU Docker profiles

Integration Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Fresh service, no requests | GET /health before any detection | 200, aiAvailability: "None" | Max 2s |
| AC-2 | After POST /detect succeeds | GET /health after detection | aiAvailability not "None" | Max 30s |
| AC-3 | Fresh service | Health → Detect → Health sequence | State transition None → active | Max 60s |
| AC-4 | CPU-only Docker profile | POST /detect on CPU profile | Detection succeeds via ONNX | Max 60s |
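
The Health → Detect → Health sequence for AC-3 can be sketched against an abstract client so the flow is easy to read without a running service. Everything here is illustrative: `DetectionsClient`, `FakeLazyService`, and `run_lazy_init_sequence` are hypothetical names, the fake only imitates lazy initialization in memory, and treating HTTP 200 as "detection succeeded" is an assumption not stated in this task.

```python
from typing import Protocol


class DetectionsClient(Protocol):
    """Hypothetical minimal interface a real HTTP client fixture could satisfy."""

    def get_health(self) -> dict: ...
    def post_detect(self, image: bytes) -> int: ...


class FakeLazyService:
    """In-memory stand-in: the engine becomes active on the first detect."""

    def __init__(self) -> None:
        self._availability = "None"

    def get_health(self) -> dict:
        return {"status": "healthy", "aiAvailability": self._availability}

    def post_detect(self, image: bytes) -> int:
        self._availability = "Enabled"  # simulates lazy engine initialization
        return 200


def run_lazy_init_sequence(client: DetectionsClient, image: bytes) -> tuple[str, str]:
    """Check that the engine is idle before, and active after, one detection."""
    before = client.get_health()["aiAvailability"]
    assert before == "None", f"engine loaded at startup: {before!r}"

    status_code = client.post_detect(image)
    assert status_code == 200, f"detection failed with HTTP {status_code}"

    after = client.get_health()["aiAvailability"]
    assert after not in ("None", "Downloading"), f"engine not active: {after!r}"
    return before, after
```

Against the fake, `run_lazy_init_sequence(FakeLazyService(), b"image-bytes")` returns the before and after states; in the real suite the same function would receive a client bound to the service under test.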

Constraints

  • Tests must use the CPU Docker profile for ONNX fallback verification
  • Engine initialization time varies by hardware; timeouts must be generous
  • Health endpoint schema depends on AiAvailabilityStatus enum from codebase

Risks & Mitigation

Risk 1: Engine init timeout on slow CI

  • Risk: Engine initialization may exceed timeout on resource-constrained CI runners
  • Mitigation: Use generous timeouts (60s) and mark as known slow test
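
The mitigation above could be implemented by polling the health endpoint until the engine leaves its inactive states or a deadline passes. This is a sketch under assumptions: `wait_for_engine_active` is a hypothetical helper, the 60s default mirrors the timeout named in this task, and the 1s polling interval is an arbitrary choice. The clock and sleep functions are injectable so the helper itself can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_engine_active(
    read_availability: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until aiAvailability is neither "None" nor "Downloading".

    Returns the first active state seen, or raises TimeoutError once the
    deadline has passed while the engine is still inactive.
    """
    deadline = clock() + timeout_s
    while True:
        state = read_availability()
        if state not in ("None", "Downloading"):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"engine still {state!r} after {timeout_s}s")
        sleep(interval_s)
```

Because the returned state may be "Error", a caller that needs a working engine would still compare the result against the expected value.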