# Observability

## Logging

**Current state**: loguru with daily rotated files (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) + stdout/stderr. Format: `[HH:mm:ss LEVEL] message`.

**Recommended improvements**:

| Aspect | Current | Recommended |
|--------|---------|-------------|
| Format | Human-readable | Structured JSON to stdout (container-friendly) |
| Fields | timestamp, level, message | + service, correlation_id, context |
| PII | Not applicable | No user IDs or tokens in logs |
| Retention | 30 days (file) | Console in dev; 7 days staging; 30 days production (via log aggregator) |

**Container logging pattern**: Log to stdout/stderr only; let the container runtime (Docker/K8s) handle log collection and routing. Remove file-based logging in containerized deployments.

## Metrics

**Recommended `/metrics` endpoint** (Prometheus-compatible):

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `detections_requests_total` | Counter | method, endpoint, status | Total HTTP requests |
| `detections_request_duration_seconds` | Histogram | method, endpoint | Request processing time |
| `detections_inference_duration_seconds` | Histogram | media_type (image/video) | Inference processing time |
| `detections_active_inferences` | Gauge | — | Currently running inference jobs (0-2) |
| `detections_sse_clients` | Gauge | — | Connected SSE clients |
| `detections_engine_status` | Gauge | engine_type | 1=ready, 0=not ready |

**Collection**: Prometheus scrape at 15s intervals.

## Distributed Tracing

**Limited applicability**: The Detections service makes outbound HTTP calls to Loader and Annotations. Trace context propagation is recommended for cross-service correlation.
| Span | Parent | Description |
|------|--------|-------------|
| `detections.detect_image` | Client request | Full image detection flow |
| `detections.detect_video` | Client request | Full video detection flow |
| `detections.model_download` | detect_* | Model download from Loader |
| `detections.post_annotation` | detect_* | Annotation POST to Annotations service |

**Implementation**: OpenTelemetry Python SDK with OTLP exporter. Sampling: 100% in dev/staging, 10% in production.

## Alerting

| Severity | Response Time | Condition |
|----------|---------------|-----------|
| Critical | 5 min | Health endpoint returns non-200; container restart loop |
| High | 30 min | Error rate > 5%; inference duration p95 > 10s |
| Medium | 4 hours | SSE client count = 0 for extended period; disk > 80% |
| Low | Next business day | Elevated log warnings; model download retries |

## Dashboards

**Operations dashboard**:

- Service health status
- Request rate by endpoint
- Inference duration histogram (p50, p95, p99)
- Active inference count (0-2 gauge)
- SSE connected clients
- Error rate by type

**Inference dashboard**:

- Detections per frame/video
- Model availability status timeline
- Engine type distribution (ONNX vs TensorRT)
- Video batch processing rate