Health & Engine Lifecycle Tests

Task: AZ-139_test_health_engine
Name: Health & Engine Lifecycle Tests
Description: Implement E2E tests verifying health endpoint responses and engine lazy initialization lifecycle
Complexity: 3 points
Dependencies: AZ-138_test_infrastructure
Component: Integration Tests
Jira: AZ-139
Epic: AZ-137

Problem

The health endpoint and engine initialization lifecycle are critical for operational monitoring and service readiness. Tests must verify that the health endpoint correctly reflects engine state transitions (None → Downloading → Enabled/Error) and that engine initialization is lazy (triggered by first detection, not at startup).
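The state transitions named above can be written down as a small lookup that a test could use to sanity-check a series of polled `aiAvailability` values. This is only a sketch: the member names come from this task description, the real `AiAvailabilityStatus` enum in the codebase may define more or different values, and treating `Enabled` and `Error` as terminal is an assumption (error recovery is out of scope here).

```python
from enum import Enum


class EngineState(str, Enum):
    # Values as named in this task; the codebase enum may differ (assumption).
    NONE = "None"
    DOWNLOADING = "Downloading"
    ENABLED = "Enabled"
    ERROR = "Error"


# Direct transitions: None -> Downloading -> Enabled/Error.
# Enabled and Error are treated as terminal here (assumption).
TRANSITIONS = {
    EngineState.NONE: {EngineState.DOWNLOADING},
    EngineState.DOWNLOADING: {EngineState.ENABLED, EngineState.ERROR},
    EngineState.ENABLED: set(),
    EngineState.ERROR: set(),
}


def reachable_from(state):
    """Return every state reachable from `state` in one or more steps."""
    seen, stack = set(), [state]
    while stack:
        for nxt in TRANSITIONS[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_plausible_sequence(observed):
    """Check a list of polled values; repeats and skipped states are allowed."""
    states = [EngineState(value) for value in observed]
    for prev, cur in zip(states, states[1:]):
        if cur != prev and cur not in reachable_from(prev):
            return False
    return True
```

Reachability is used instead of strict one-step transitions because a poller can miss the short-lived `Downloading` state and observe `None` followed directly by `Enabled`.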

Outcome

  • Health endpoint behavior verified across all engine states
  • Lazy initialization confirmed (no engine load at startup)
  • ONNX fallback path validated on CPU-only environments
  • Engine state transitions observable through health endpoint

Scope

Included

  • FT-P-01: Health check returns status before engine initialization
  • FT-P-02: Health check reflects engine availability after initialization
  • FT-P-14: Engine lazy initialization on first detection request
  • FT-P-15: ONNX fallback when GPU unavailable

Excluded

  • TensorRT-specific engine tests (require GPU hardware)
  • Performance benchmarking of engine initialization time
  • Engine error recovery scenarios (covered in resilience tests)

Acceptance Criteria

AC-1: Pre-init health check
Given the detections service just started with no prior requests
When GET /health is called
Then response is 200 with status "healthy" and aiAvailability "None"

AC-2: Post-init health check
Given a successful detection has been performed
When GET /health is called
Then aiAvailability reflects an active engine state (not "None" or "Downloading")

AC-3: Lazy initialization
Given a fresh service start
When GET /health is called immediately
Then aiAvailability is "None" (engine not loaded at startup)
And after POST /detect with a valid image, GET /health shows engine active

AC-4: ONNX fallback
Given the service runs without GPU runtime (CPU-only profile)
When POST /detect is called with a valid image
Then detection succeeds via ONNX Runtime without TensorRT errors
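The checks in AC-1 to AC-3 could be sketched as plain functions that take the HTTP calls as injected callables, so the same logic runs against the real service or a stand-in. The field names `status` and `aiAvailability` come from the criteria above; the helper names, the `FakeService` stand-in, and the assumption that a successful detection returns HTTP 200 are illustrative only.

```python
def check_pre_init(status_code, body):
    # AC-1: 200, status "healthy", engine not yet loaded.
    assert status_code == 200
    assert body["status"] == "healthy"
    assert body["aiAvailability"] == "None"


def check_post_init(body):
    # AC-2: any state other than "None" or "Downloading" counts as active.
    assert body["aiAvailability"] not in ("None", "Downloading")


def run_lazy_init_sequence(get_health, post_detect):
    """AC-3: Health -> Detect -> Health; returns the two observed states."""
    code, before = get_health()
    check_pre_init(code, before)
    # Assumption: a successful detection responds with HTTP 200.
    assert post_detect() == 200
    _, after = get_health()
    check_post_init(after)
    return before["aiAvailability"], after["aiAvailability"]


class FakeService:
    """Hypothetical in-memory stand-in that loads its engine on first detect."""

    def __init__(self):
        self.loaded = False

    def get_health(self):
        state = "Enabled" if self.loaded else "None"
        return 200, {"status": "healthy", "aiAvailability": state}

    def post_detect(self):
        self.loaded = True
        return 200


fake = FakeService()
observed = run_lazy_init_sequence(fake.get_health, fake.post_detect)
```

In an actual E2E test the two callables would wrap real requests to `GET /health` and `POST /detect`. The stand-in only shows that the sequence fails if the engine is already loaded at startup.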

Non-Functional Requirements

Performance

  • Health check response within 2s
  • First detection (including engine init) within 60s
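One way to enforce these budgets in a test is to time each call with a monotonic clock and compare the elapsed time against the limit. The sketch below assumes nothing about the HTTP client; `assert_within` and the constant names are hypothetical helpers, and only the 2s and 60s values come from the requirements above.

```python
import time

HEALTH_BUDGET_S = 2.0          # health check response budget
FIRST_DETECT_BUDGET_S = 60.0   # first detection, including engine init


def timed(call):
    """Run `call` and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = call()
    return result, time.monotonic() - start


def assert_within(call, budget_s):
    """Run `call`, fail if it took longer than `budget_s`, else return its result."""
    result, elapsed = timed(call)
    assert elapsed <= budget_s, f"took {elapsed:.2f}s, budget is {budget_s}s"
    return result
```

A test might then wrap its request as `assert_within(lambda: get_health(), HEALTH_BUDGET_S)`, where `get_health` is whatever function issues the request.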

Reliability

  • Tests must work on both CPU-only and GPU Docker profiles

Integration Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
| --- | --- | --- | --- | --- |
| AC-1 | Fresh service, no requests | GET /health before any detection | 200, aiAvailability: "None" | Max 2s |
| AC-2 | After POST /detect succeeds | GET /health after detection | aiAvailability not "None" | Max 30s |
| AC-3 | Fresh service | Health → Detect → Health sequence | State transition None → active | Max 60s |
| AC-4 | CPU-only Docker profile | POST /detect on CPU profile | Detection succeeds via ONNX | Max 60s |

Constraints

  • Tests must use the CPU Docker profile for ONNX fallback verification
  • Engine initialization time varies by hardware; timeouts must be generous
  • Health endpoint schema depends on AiAvailabilityStatus enum from codebase

Risks & Mitigation

Risk 1: Engine init timeout on slow CI

  • Risk: Engine initialization may exceed timeout on resource-constrained CI runners
  • Mitigation: Use generous timeouts (60s) and mark as known slow test
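The generous-timeout mitigation could take the form of a polling helper with a hard deadline, so a slow runner gets the full 60s while a fast one finishes as soon as the engine reports active. This is a minimal sketch. `wait_until` is a hypothetical helper, and the predicate would typically wrap a `GET /health` call.

```python
import time


def wait_until(predicate, timeout_s=60.0, interval_s=1.0):
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))
```

Under pytest, the test that calls this helper could also carry a custom marker (for example one named `slow`, a hypothetical choice) so that it can be deselected in quick local runs.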