# Observability
## Logging
**Current state**: loguru with daily rotated files (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) + stdout/stderr. Format: `[HH:mm:ss LEVEL] message`.

**Recommended improvements**:

| Aspect | Current | Recommended |
|--------|---------|-------------|
| Format | Human-readable | Structured JSON to stdout (container-friendly) |
| Fields | timestamp, level, message | + service, correlation_id, context |
| PII | Not applicable | Ensure no user IDs or tokens appear in logs |
| Retention | 30 days (file) | Console in dev; 7 days staging; 30 days production (via log aggregator) |

**Container logging pattern**: log to stdout/stderr only and let the container runtime (Docker/K8s) handle log collection and routing; remove file-based logging in containerized deployments.
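
The structured-JSON-to-stdout pattern can be sketched with the standard library's `logging` module (with loguru, `logger.add(sys.stdout, serialize=True)` achieves the same effect; the `service` and `correlation_id` field names below are illustrative assumptions, not the service's current schema):

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (container-friendly)."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "message": record.getMessage(),
            "service": "detections",
            # correlation_id is attached per-request via the `extra` kwarg
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)

# stdout only: no file sinks in containerized deployments
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("detections")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("inference started", extra={"correlation_id": "req-123"})
```

Each line is then a self-describing event that a log aggregator can index without parsing a bespoke text format.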
## Metrics
**Recommended `/metrics` endpoint** (Prometheus-compatible):

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `detections_requests_total` | Counter | method, endpoint, status | Total HTTP requests |
| `detections_request_duration_seconds` | Histogram | method, endpoint | Request processing time |
| `detections_inference_duration_seconds` | Histogram | media_type (image/video) | Inference processing time |
| `detections_active_inferences` | Gauge | — | Currently running inference jobs (0-2) |
| `detections_sse_clients` | Gauge | — | Connected SSE clients |
| `detections_engine_status` | Gauge | engine_type | 1=ready, 0=not ready |

**Collection**: Prometheus scrape at 15s intervals.
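
A sketch of how the counter and gauge rows above map onto the Prometheus text exposition format that the scraper expects (hand-rolled here for illustration; in practice the `prometheus_client` library generates this output):

```python
def render_metrics(requests_total, active_inferences, sse_clients):
    """Render a minimal /metrics body in Prometheus text exposition format.

    requests_total: dict mapping (method, endpoint, status) -> count.
    """
    lines = [
        "# HELP detections_requests_total Total HTTP requests",
        "# TYPE detections_requests_total counter",
    ]
    # Counters: one sample line per label combination
    for (method, endpoint, status), count in sorted(requests_total.items()):
        lines.append(
            f'detections_requests_total{{method="{method}",endpoint="{endpoint}",status="{status}"}} {count}'
        )
    # Gauges: a single unlabelled sample each
    lines += [
        "# HELP detections_active_inferences Currently running inference jobs",
        "# TYPE detections_active_inferences gauge",
        f"detections_active_inferences {active_inferences}",
        "# HELP detections_sse_clients Connected SSE clients",
        "# TYPE detections_sse_clients gauge",
        f"detections_sse_clients {sse_clients}",
    ]
    return "\n".join(lines) + "\n"
```

Histograms follow the same format but expose `_bucket`, `_sum`, and `_count` series per label set.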
## Distributed Tracing
**Limited applicability**: the Detections service makes outbound HTTP calls to Loader and Annotations; propagating trace context on those calls is recommended for cross-service correlation.

| Span | Parent | Description |
|------|--------|-------------|
| `detections.detect_image` | Client request | Full image detection flow |
| `detections.detect_video` | Client request | Full video detection flow |
| `detections.model_download` | detect_* | Model download from Loader |
| `detections.post_annotation` | detect_* | Annotation POST to Annotations service |

**Implementation**: OpenTelemetry Python SDK with OTLP exporter. Sampling: 100% in dev/staging, 10% in production.
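
OpenTelemetry propagates context between services via the W3C Trace Context `traceparent` header. A stdlib-only sketch of the header the SDK injects on outbound calls to Loader/Annotations (once instrumented, the SDK does this automatically; this helper only illustrates the wire format):

```python
import secrets

def make_traceparent(trace_id=None, sampled=True):
    """Build a W3C `traceparent` header: version-traceid-spanid-flags."""
    trace_id = trace_id or secrets.token_hex(16)   # 32 hex chars, shared by the whole trace
    span_id = secrets.token_hex(8)                 # 16 hex chars, unique per span
    flags = "01" if sampled else "00"              # 01 = sampled (e.g. 10% in production)
    return f"00-{trace_id}-{span_id}-{flags}"

# A child span (e.g. detections.model_download) reuses the parent's trace_id
# so the backend can stitch both spans into one trace:
parent = make_traceparent()
child = make_traceparent(trace_id=parent.split("-")[1])
```

The sampling decision travels in the flags byte, so downstream services can honor the head-based 10% production sampling without re-deciding.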
## Alerting
| Severity | Response Time | Condition |
|----------|---------------|-----------|
| Critical | 5 min | Health endpoint returns non-200; container restart loop |
| High | 30 min | Error rate > 5%; inference duration p95 > 10s |
| Medium | 4 hours | SSE client count = 0 for an extended period; disk usage > 80% |
| Low | Next business day | Elevated log warnings; model download retries |
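
The High-severity conditions above can be expressed as a small evaluation over recent request samples. A sketch with a hypothetical helper (real deployments would encode these thresholds as Prometheus alerting rules rather than application code):

```python
def high_severity_alerts(samples, error_rate_threshold=0.05, p95_threshold_s=10.0):
    """Evaluate the High-severity alert conditions over recent requests.

    samples: list of (http_status, inference_duration_seconds) tuples.
    Returns the names of firing alerts.
    """
    firing = []
    if not samples:
        return firing
    # Condition 1: error rate > 5% (5xx responses over the window)
    errors = sum(1 for status, _ in samples if status >= 500)
    if errors / len(samples) > error_rate_threshold:
        firing.append("ErrorRateHigh")
    # Condition 2: inference duration p95 > 10s (nearest-rank percentile)
    durations = sorted(d for _, d in samples)
    p95 = durations[int(0.95 * (len(durations) - 1))]
    if p95 > p95_threshold_s:
        firing.append("InferenceDurationHigh")
    return firing
```

Evaluating over a sliding window (e.g. the last 5 minutes of samples) keeps the alert from flapping on single slow requests.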
## Dashboards
**Operations dashboard**:
- Service health status
- Request rate by endpoint
- Inference duration histogram (p50, p95, p99)
- Active inference count (0-2 gauge)
- SSE connected clients
- Error rate by type

**Inference dashboard**:
- Detections per frame/video
- Model availability status timeline
- Engine type distribution (ONNX vs TensorRT)
- Video batch processing rate
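
The p50/p95/p99 panels derive from the `detections_inference_duration_seconds` histogram. A sketch of the quantile estimation that Prometheus's `histogram_quantile()` performs over cumulative buckets (linear interpolation within the bucket containing the target rank; the bucket bounds in the example are illustrative):

```python
def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative histogram buckets.

    buckets: sorted list of (upper_bound, cumulative_count); the last
    bound may be float("inf") for the open-ended +Inf bucket.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # cannot interpolate into the open-ended bucket;
                # fall back to the last finite bound
                return prev_bound
            # linear interpolation inside this bucket
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * frac
        prev_bound, prev_count = bound, count
    return prev_bound
```

Because the estimate interpolates within buckets, bucket bounds should be chosen around the thresholds that matter (e.g. a bound near the 10s alert threshold) so p95 is accurate where alerts fire.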