# Observability

## Logging

Current state: loguru writes daily-rotated files (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) plus stdout/stderr. Format: `[HH:mm:ss LEVEL] message`.

Recommended improvements:

| Aspect | Current | Recommended |
|---|---|---|
| Format | Human-readable | Structured JSON to stdout (container-friendly) |
| Fields | timestamp, level, message | + service, correlation_id, context |
| PII | Not applicable | No user IDs or tokens in logs |
| Retention | 30 days (file) | Console in dev; 7 days staging; 30 days production (via log aggregator) |

Container logging pattern: Log to stdout/stderr only; let the container runtime (Docker/K8s) handle log collection and routing. Remove file-based logging in containerized deployments.
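As a sketch of the recommended structured format: loguru can emit JSON directly via `serialize=True` on `logger.add`, but for illustration here is a dependency-free version using only the standard library. The `service` and `correlation_id` field names follow the table above; the `req-42` value is a placeholder.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line (container-friendly)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "service": "detections",
            "message": record.getMessage(),
            # Attached via logging's `extra=` mechanism, if present.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


# Log to stdout only; the container runtime handles collection and routing.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("detections")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("inference started", extra={"correlation_id": "req-42"})
```

In practice the correlation ID would be set once per request (e.g. from an incoming request header) through a logging filter or a `contextvars.ContextVar`, rather than passed on every call.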

## Metrics

Recommended `/metrics` endpoint (Prometheus-compatible):

| Metric | Type | Labels | Description |
|---|---|---|---|
| `detections_requests_total` | Counter | method, endpoint, status | Total HTTP requests |
| `detections_request_duration_seconds` | Histogram | method, endpoint | Request processing time |
| `detections_inference_duration_seconds` | Histogram | media_type (image/video) | Inference processing time |
| `detections_active_inferences` | Gauge | — | Currently running inference jobs (0-2) |
| `detections_sse_clients` | Gauge | — | Connected SSE clients |
| `detections_engine_status` | Gauge | engine_type | 1 = ready, 0 = not ready |

Collection: Prometheus scrapes `/metrics` at a 15-second interval.
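A minimal sketch of registering these metrics with the `prometheus_client` library. The metric names, types, and labels come from the table above; the dedicated registry and the `metrics_endpoint` helper are illustrative wiring, not existing code.

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
)

registry = CollectorRegistry()

REQUESTS = Counter(
    "detections_requests_total", "Total HTTP requests",
    ["method", "endpoint", "status"], registry=registry,
)
REQUEST_DURATION = Histogram(
    "detections_request_duration_seconds", "Request processing time",
    ["method", "endpoint"], registry=registry,
)
INFERENCE_DURATION = Histogram(
    "detections_inference_duration_seconds", "Inference processing time",
    ["media_type"], registry=registry,
)
ACTIVE_INFERENCES = Gauge(
    "detections_active_inferences", "Currently running inference jobs (0-2)",
    registry=registry,
)
SSE_CLIENTS = Gauge(
    "detections_sse_clients", "Connected SSE clients", registry=registry,
)
ENGINE_STATUS = Gauge(
    "detections_engine_status", "1=ready, 0=not ready",
    ["engine_type"], registry=registry,
)

# Example instrumentation of one hypothetical request:
REQUESTS.labels(method="POST", endpoint="/detect/image", status="200").inc()
INFERENCE_DURATION.labels(media_type="image").observe(0.42)
ENGINE_STATUS.labels(engine_type="onnx").set(1)


def metrics_endpoint() -> bytes:
    """Body for a GET /metrics handler (Prometheus text exposition format)."""
    return generate_latest(registry)
```

`detections_active_inferences` can be kept accurate by wrapping the inference call in `ACTIVE_INFERENCES.track_inprogress()`; in a FastAPI handler, `metrics_endpoint()` would be returned with the `CONTENT_TYPE_LATEST` media type.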

## Distributed Tracing

Limited applicability: The Detections service makes outbound HTTP calls to Loader and Annotations. Trace context propagation is recommended for cross-service correlation.

| Span | Parent | Description |
|---|---|---|
| `detections.detect_image` | Client request | Full image detection flow |
| `detections.detect_video` | Client request | Full video detection flow |
| `detections.model_download` | `detect_*` | Model download from Loader |
| `detections.post_annotation` | `detect_*` | Annotation POST to Annotations service |

Implementation: OpenTelemetry Python SDK with OTLP exporter. Sampling: 100% in dev/staging, 10% in production.
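A sketch of that wiring under the production sampling policy, assuming the `opentelemetry-sdk` and `opentelemetry-exporter-otlp` packages; the collector endpoint is a placeholder.

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# 10% head sampling for production; use ratio 1.0 in dev/staging.
sampler = ParentBased(root=TraceIdRatioBased(0.10))

provider = TracerProvider(
    resource=Resource.create({"service.name": "detections"}),
    sampler=sampler,
)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("detections")

# Span names from the table above; nesting produces the parent/child links.
with tracer.start_as_current_span("detections.detect_image"):
    with tracer.start_as_current_span("detections.model_download"):
        ...  # HTTP GET to Loader
    with tracer.start_as_current_span("detections.post_annotation"):
        ...  # HTTP POST to Annotations
```

For cross-service correlation, the outbound calls to Loader and Annotations should carry the trace context, e.g. by calling `opentelemetry.propagate.inject(headers)` on the request headers before sending.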

## Alerting

| Severity | Response Time | Condition |
|---|---|---|
| Critical | 5 min | Health endpoint returns non-200; container restart loop |
| High | 30 min | Error rate > 5%; inference duration p95 > 10s |
| Medium | 4 hours | SSE client count = 0 for extended period; disk > 80% |
| Low | Next business day | Elevated log warnings; model download retries |

## Dashboards

**Operations dashboard:**

- Service health status
- Request rate by endpoint
- Inference duration histogram (p50, p95, p99)
- Active inference count (0-2 gauge)
- SSE connected clients
- Error rate by type

**Inference dashboard:**

- Detections per frame/video
- Model availability status timeline
- Engine type distribution (ONNX vs TensorRT)
- Video batch processing rate