
Deployment Procedures

Deployment Strategy

Pattern: Rolling deployment (single-instance service; blue-green not applicable without load balancer).

Rationale: The Detections service runs as a single instance and holds only ephemeral in-memory state (_active_detections, _event_queues). Rolling replacement with graceful shutdown is therefore the simplest approach.

Zero-downtime: Achievable only with a load-balanced multi-instance setup. For single-instance deployments, expect brief downtime (~10-30s) while the container restarts.

Deployment Flow

1. Pull new images → pull-images.sh
2. Stop current services gracefully → stop-services.sh (30s grace)
3. Start new services → start-services.sh
4. Verify health → health-check.sh
5. If unhealthy → rollback (deploy.sh --rollback)
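The five steps above could be orchestrated by a small driver. A sketch in POSIX sh (script names come from the flow; the function body is illustrative, not the actual deploy.sh):

```shell
# deploy.sh sketch: run the five-step flow, rolling back on a failed
# health check. Assumes the helper scripts live in the working directory.
deploy() {
  ./pull-images.sh    || return 1  # 1. pull new image tags
  ./stop-services.sh  || return 1  # 2. graceful stop (SIGTERM, 30s grace)
  ./start-services.sh || return 1  # 3. bring up new containers
  if ./health-check.sh; then       # 4. verify /health
    echo "deploy ok"
  else
    echo "unhealthy, rolling back" >&2
    ./deploy.sh --rollback         # 5. redeploy the previous tag
  fi
}
```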

Health Checks

Check     | Type     | Endpoint                                 | Interval | Threshold
----------|----------|------------------------------------------|----------|----------------------
Liveness  | HTTP GET | /health                                  | 30s      | 3 failures → restart
Readiness | HTTP GET | /health (check aiAvailability != "None") | 10s      | Wait for engine init

Note: The service's /health endpoint returns {"status": "healthy"} immediately; aiAvailability transitions through DOWNLOADING → CONVERTING → ENABLED only after the first inference request triggers lazy engine initialization. Readiness therefore depends on the use case: the API accepts requests immediately, but inference succeeds only once the engine has initialized.
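Because of the lazy init, a readiness probe must inspect the response body, not just the status code. A dependency-free sketch in POSIX sh (pattern-matching the JSON instead of using jq; field names are taken from the /health response described above):

```shell
# is_ready: succeed only when /health reports a healthy service AND an
# initialized engine (aiAvailability is anything other than "None").
# $1 is the raw JSON body from GET /health.
is_ready() {
  case "$1" in
    *'"aiAvailability"'*'"None"'*) return 1 ;;  # engine not yet initialized
    *'"status"'*'"healthy"'*)      return 0 ;;  # healthy and past lazy init
    *)                             return 1 ;;  # unexpected body
  esac
}
```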

Rollback Procedures

Trigger criteria:

  • Health check fails after deployment (3 consecutive failures)
  • Error rate spike detected in monitoring
  • Critical bug reported by users

Rollback steps:

  1. Run deploy.sh --rollback — redeploys the previous image tag
  2. Verify health via health-check.sh
  3. Notify stakeholders of rollback
  4. Investigate root cause

Previous image tag: Saved by stop-services.sh to .deploy-previous-tag before each deployment.
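The tag round-trip could be as simple as the following sketch (the file name comes from the doc; the function names and the example tag are illustrative):

```shell
TAG_FILE=".deploy-previous-tag"

# Called by stop-services.sh before replacement: remember which image
# tag is currently live so deploy.sh --rollback can restore it.
save_previous_tag() {
  printf '%s\n' "$1" > "$TAG_FILE"
}

# Called by deploy.sh --rollback: print the saved tag, or fail loudly
# if no deployment has recorded one yet.
previous_tag() {
  [ -f "$TAG_FILE" ] || { echo "no previous tag recorded" >&2; return 1; }
  cat "$TAG_FILE"
}
```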

Deployment Checklist

Pre-deployment:

  • All CI pipeline stages pass (lint, test, security, e2e)
  • Security scan clean (zero critical/high CVEs)
  • Environment variables configured on target
  • classes.json available on target (or mounted volume)
  • ONNX model available in Loader service
  • Loader and Annotations services reachable from target
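The locally verifiable items of this list can be automated. A sketch assuming the service reads its upstream URLs from LOADER_URL and ANNOTATIONS_URL and expects classes.json in the working directory (all three names are assumptions, not confirmed configuration):

```shell
# preflight: fail fast before deploying if obvious prerequisites are
# missing; report every problem, not just the first one found.
# LOADER_URL / ANNOTATIONS_URL / classes.json are assumed names.
preflight() {
  ok=0
  [ -n "$LOADER_URL" ]      || { echo "LOADER_URL is unset" >&2; ok=1; }
  [ -n "$ANNOTATIONS_URL" ] || { echo "ANNOTATIONS_URL is unset" >&2; ok=1; }
  [ -f classes.json ]       || { echo "classes.json not found" >&2; ok=1; }
  return "$ok"
}
```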

Post-deployment:

  • /health returns 200
  • aiAvailability transitions to ENABLED after first request
  • SSE stream endpoint accessible
  • Detection request returns valid results
  • Logs show no errors
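The first two items can be scripted as a post-deployment smoke test. A curl-based sketch (BASE_URL and the polling window are assumptions; the endpoint and field names follow the checklist above):

```shell
BASE_URL="${BASE_URL:-http://localhost:8000}"

# smoke: /health must return HTTP 200, and aiAvailability must leave
# "None" within the polling window (lazy init completes after the
# first inference request).
smoke() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/health") || return 1
  [ "$code" = "200" ] || { echo "/health returned $code" >&2; return 1; }

  i=0
  while [ "$i" -lt 30 ]; do                    # up to 30 polls x 10s = 5 min
    body=$(curl -s "$BASE_URL/health") || return 1
    case "$body" in
      *'"aiAvailability"'*'"None"'*) ;;        # still initializing
      *) echo "smoke ok"; return 0 ;;
    esac
    sleep 10
    i=$((i + 1))
  done
  echo "engine never became ready" >&2
  return 1
}
```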

Graceful Shutdown

Current limitation: No application-level graceful shutdown is implemented. In-progress detections are aborted when the container stops, and the background TensorRT conversion runs in a daemon thread, so it terminates with the process.

Mitigation: The stop-services.sh script sends SIGTERM with a 30-second grace period, allowing uvicorn to finish in-flight HTTP requests. Long-running video detections may be interrupted.
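In Docker terms, the mitigation is just the stop timeout: docker stop sends SIGTERM immediately and SIGKILL once the grace period expires. A sketch (the container name is illustrative):

```shell
# graceful_stop: SIGTERM now, SIGKILL after a 30-second grace period,
# matching the window stop-services.sh is described as using.
graceful_stop() {
  docker stop --time 30 "$1"
}
```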

Database Migrations

Not applicable — the Detections service has no database. All persistence is delegated to the Annotations service.