# Observability

## Logging
Current state: loguru writes daily-rotated files (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) plus stdout/stderr. Format: `[HH:mm:ss LEVEL] message`.
Recommended improvements:
| Aspect | Current | Recommended |
|---|---|---|
| Format | Human-readable | Structured JSON to stdout (container-friendly) |
| Fields | timestamp, level, message | + service, correlation_id, context |
| PII | Not applicable | No user IDs or tokens in logs |
| Retention | 30 days (file) | Console in dev; 7 days staging; 30 days production (via log aggregator) |
Container logging pattern: Log to stdout/stderr only; let the container runtime (Docker/K8s) handle log collection and routing. Remove file-based logging in containerized deployments.
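The structured-JSON recommendation can be sketched with a stdlib `logging` formatter; the field names (`service`, `correlation_id`, `context`) follow the table above, and the static `"detections"` service name is an assumption. If the service stays on loguru, `logger.add(sys.stdout, serialize=True)` produces equivalent JSON output.

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (container-friendly)."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "service": "detections",  # static service name (assumed)
            "message": record.getMessage(),
        }
        # Optional per-request fields attached via `extra=` on the log call.
        for field in ("correlation_id", "context"):
            value = getattr(record, field, None)
            if value is not None:
                payload[field] = value
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)  # stdout only; no file sinks
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("detections")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("inference started", extra={"correlation_id": "req-123"})
```

Because no user IDs or tokens ever enter the `payload` dict, the PII guideline is enforced at the formatter rather than at each call site.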
## Metrics
Recommended /metrics endpoint (Prometheus-compatible):
| Metric | Type | Labels | Description |
|---|---|---|---|
| `detections_requests_total` | Counter | method, endpoint, status | Total HTTP requests |
| `detections_request_duration_seconds` | Histogram | method, endpoint | Request processing time |
| `detections_inference_duration_seconds` | Histogram | media_type (image/video) | Inference processing time |
| `detections_active_inferences` | Gauge | — | Currently running inference jobs (0-2) |
| `detections_sse_clients` | Gauge | — | Connected SSE clients |
| `detections_engine_status` | Gauge | engine_type | 1=ready, 0=not ready |
Collection: Prometheus scrape at 15s intervals.
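A minimal sketch of registering these metrics with the `prometheus_client` package; the metric names and labels come from the table above, while the sample `.labels(...)` values are illustrative. `generate_latest()` renders the text exposition format a `/metrics` endpoint would return.

```python
# Sketch, assuming the prometheus_client package is installed.
from prometheus_client import Counter, Gauge, Histogram, generate_latest

REQUESTS = Counter(
    "detections_requests_total", "Total HTTP requests",
    ["method", "endpoint", "status"])
REQUEST_DURATION = Histogram(
    "detections_request_duration_seconds", "Request processing time",
    ["method", "endpoint"])
INFERENCE_DURATION = Histogram(
    "detections_inference_duration_seconds", "Inference processing time",
    ["media_type"])
ACTIVE_INFERENCES = Gauge(
    "detections_active_inferences", "Currently running inference jobs (0-2)")
SSE_CLIENTS = Gauge("detections_sse_clients", "Connected SSE clients")
ENGINE_STATUS = Gauge(
    "detections_engine_status", "1=ready, 0=not ready", ["engine_type"])

# Illustrative label values (assumed endpoint/engine names).
REQUESTS.labels(method="POST", endpoint="/detect/image", status="200").inc()
ENGINE_STATUS.labels(engine_type="onnx").set(1)

print(generate_latest().decode())  # body of the /metrics response
```

Wiring this into the HTTP framework is then a matter of returning `generate_latest()` with the `text/plain; version=0.0.4` content type from a `/metrics` route.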
## Distributed Tracing
Limited applicability: The Detections service makes outbound HTTP calls to Loader and Annotations. Trace context propagation is recommended for cross-service correlation.
| Span | Parent | Description |
|---|---|---|
| `detections.detect_image` | Client request | Full image detection flow |
| `detections.detect_video` | Client request | Full video detection flow |
| `detections.model_download` | `detect_*` | Model download from Loader |
| `detections.post_annotation` | `detect_*` | Annotation POST to Annotations service |
Implementation: OpenTelemetry Python SDK with OTLP exporter. Sampling: 100% in dev/staging, 10% in production.
## Alerting
| Severity | Response Time | Condition |
|---|---|---|
| Critical | 5 min | Health endpoint returns non-200; container restart loop |
| High | 30 min | Error rate > 5%; inference duration p95 > 10s |
| Medium | 4 hours | SSE client count = 0 for extended period; disk > 80% |
| Low | Next business day | Elevated log warnings; model download retries |
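The Critical and High conditions could be expressed as Prometheus alerting rules against the metrics defined earlier; this is a sketch, and the `job` label value and the `status=~"5.."` error convention are assumptions about the scrape config and HTTP status labeling.

```yaml
groups:
  - name: detections
    rules:
      - alert: DetectionsDown
        expr: up{job="detections"} == 0
        for: 1m
        labels: {severity: critical}
      - alert: DetectionsHighErrorRate
        expr: |
          sum(rate(detections_requests_total{status=~"5.."}[5m]))
            / sum(rate(detections_requests_total[5m])) > 0.05
        for: 10m
        labels: {severity: high}
      - alert: DetectionsSlowInference
        expr: |
          histogram_quantile(0.95,
            sum(rate(detections_inference_duration_seconds_bucket[5m])) by (le)) > 10
        for: 10m
        labels: {severity: high}
```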
## Dashboards
Operations dashboard:
- Service health status
- Request rate by endpoint
- Inference duration histogram (p50, p95, p99)
- Active inference count (0-2 gauge)
- SSE connected clients
- Error rate by type
Inference dashboard:
- Detections per frame/video
- Model availability status timeline
- Engine type distribution (ONNX vs TensorRT)
- Video batch processing rate