[AZ-178] Implement streaming video detection endpoint

- Added `/detect/video` endpoint for true streaming video detection, allowing inference to start as upload bytes arrive.
- Introduced `run_detect_video_stream` method in the inference module to handle video processing from a file-like object.
- Updated media hashing to include a new function for computing hashes directly from files with minimal I/O.
- Enhanced documentation to reflect changes in video processing and API behavior.

Made-with: Cursor
Author: Oleksandr Bezdieniezhnykh
Date: 2026-04-01 03:11:43 +03:00
Commit: be4cab4fcb (parent e65d8da6a3)
42 changed files with 2983 additions and 29 deletions
# CI/CD Pipeline
## Platform
GitHub Actions (default recommendation; adaptable to Azure Pipelines).
## Pipeline Stages
| Stage | Trigger | Steps | Quality Gate |
|-------|---------|-------|-------------|
| Lint | Every push | `black --check`, Cython syntax check | Zero errors |
| Unit Test | Every push | `pytest tests/ -v --csv=report.csv` | All pass |
| Security Scan | Every push | `pip-audit`, Trivy image scan | Zero critical/high CVEs |
| Build | PR merge to dev | Build `detections-cpu` and `detections-gpu` images, tag with git SHA | Build succeeds |
| E2E Test | After build | `docker compose -f e2e/docker-compose.test.yml up --abort-on-container-exit` | All e2e tests pass |
| Push | After e2e | Push images to container registry | Push succeeds |
| Deploy Staging | After push | Deploy to staging via `scripts/deploy.sh` | Health check passes |
| Deploy Production | Manual approval | Deploy to production via `scripts/deploy.sh` | Health check passes |
## Workflow Definition (GitHub Actions)
```yaml
name: CI/CD
on:
push:
branches: [dev, main]
pull_request:
branches: [dev]
env:
REGISTRY: ${{ secrets.REGISTRY }}
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install black
- run: black --check src/ tests/
test:
runs-on: ubuntu-latest
needs: lint
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install -r requirements.txt
- run: python setup.py build_ext --inplace
- run: cd src && pytest ../tests/ -v
security:
runs-on: ubuntu-latest
needs: lint
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install pip-audit
- run: pip-audit -r requirements.txt
build:
runs-on: ubuntu-latest
needs: [test, security]
if: github.event_name == 'push'
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ secrets.REGISTRY_USER }}
password: ${{ secrets.REGISTRY_PASSWORD }}
- uses: docker/build-push-action@v5
with:
context: .
file: Dockerfile
push: true
tags: ${{ env.REGISTRY }}/azaion/detections-cpu:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
e2e:
runs-on: ubuntu-latest
needs: build
steps:
- uses: actions/checkout@v4
- run: |
cd e2e
COMPOSE_PROFILES=cpu docker compose -f docker-compose.test.yml up \
--build --abort-on-container-exit --exit-code-from e2e-runner
deploy-staging:
runs-on: ubuntu-latest
needs: e2e
if: github.ref == 'refs/heads/dev'
environment: staging
steps:
- uses: actions/checkout@v4
- run: |
IMAGE_TAG=${{ github.sha }} \
DEPLOY_HOST=${{ secrets.STAGING_HOST }} \
DEPLOY_USER=${{ secrets.STAGING_USER }} \
bash scripts/deploy.sh
deploy-production:
runs-on: ubuntu-latest
needs: e2e
if: github.ref == 'refs/heads/main'
environment:
name: production
url: ${{ secrets.PRODUCTION_URL }}
steps:
- uses: actions/checkout@v4
- run: |
IMAGE_TAG=${{ github.sha }} \
DEPLOY_HOST=${{ secrets.PRODUCTION_HOST }} \
DEPLOY_USER=${{ secrets.PRODUCTION_USER }} \
bash scripts/deploy.sh
```
## Caching Strategy
| Cache Type | Scope | Tool |
|-----------|-------|------|
| Python dependencies | Per requirements.txt hash | actions/cache + pip cache dir |
| Docker layers | Per Dockerfile hash | BuildKit GHA cache |
| Cython compiled modules | Per src/ hash | actions/cache |
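The per-hash cache scopes above all reduce to hashing the cache's input files; in GitHub Actions this is done by `hashFiles(...)`, but the idea can be sketched in plain shell (assumes `sha256sum` is available; the `cache_key` helper name is illustrative, not part of the pipeline):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Derive a cache key from the content of the files that define its scope,
# mirroring hashFiles(): identical content -> identical key -> cache hit.
cache_key() {
  local prefix="$1"; shift
  printf '%s-%s\n' "$prefix" "$(cat "$@" | sha256sum | cut -d' ' -f1)"
}

# e.g. cache_key pip requirements.txt
#      cache_key cython src/*.pyx
```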
## Parallelization
- `test` and `security` jobs run in parallel after `lint`
- `build` waits for both `test` and `security`
- GPU image build can be added as a parallel job to CPU build
## Notifications
| Event | Channel |
|-------|---------|
| Build failure | GitHub PR status check (blocks merge) |
| Security scan failure | GitHub PR status check + team notification |
| Deployment success/failure | Deployment environment status |
# Containerization Plan
## Image Variants
### detections-cpu (Dockerfile)
| Aspect | Specification |
|--------|--------------|
| Base image | `python:3.11-slim` (pinned digest recommended) |
| Build stages | Single stage (`gcc` must remain in the image because `setup.py` compiles the Cython modules) |
| Non-root user | `adduser --disabled-password --gecos '' appuser` + `USER appuser` |
| Health check | `HEALTHCHECK --interval=30s --timeout=5s CMD curl -f http://localhost:8080/health \|\| exit 1` |
| Exposed ports | 8080 |
| Entrypoint | `uvicorn main:app --host 0.0.0.0 --port 8080` |
**Changes needed to existing Dockerfile**:
1. Add non-root user (security finding F7)
2. Add HEALTHCHECK directive
3. Pin `python:3.11-slim` to specific digest
4. Add `curl` to apt-get install (for health check)
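Once the image is rebuilt, changes 1 and 2 can be verified from `docker inspect` output; the sketch below (helper names are hypothetical) takes the JSON as an argument so it can be exercised without a Docker daemon:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 if the image config sets the non-root user from change 1.
runs_as_nonroot() {
  local inspect_json="$1"
  echo "$inspect_json" | grep -q '"User": *"appuser"'
}

# Returns 0 if a HEALTHCHECK (change 2) is baked into the image config.
has_healthcheck() {
  local inspect_json="$1"
  echo "$inspect_json" | grep -q '"Healthcheck"'
}

# Typical use (image name is illustrative):
#   json=$(docker inspect detections-cpu:dev)
#   runs_as_nonroot "$json" && has_healthcheck "$json" && echo "hardened"
```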
### detections-gpu (Dockerfile.gpu)
| Aspect | Specification |
|--------|--------------|
| Base image | `nvidia/cuda:12.2.0-runtime-ubuntu22.04` |
| Build stages | Single stage |
| Non-root user | `adduser --disabled-password --gecos '' appuser` + `USER appuser` |
| Health check | `HEALTHCHECK --interval=30s --timeout=5s CMD curl -f http://localhost:8080/health \|\| exit 1` |
| Exposed ports | 8080 |
| Entrypoint | `uvicorn main:app --host 0.0.0.0 --port 8080` |
| Runtime | Requires `--runtime=nvidia` or `nvidia` runtime in Docker |
**Changes needed to existing Dockerfile.gpu**:
1. Add non-root user
2. Add HEALTHCHECK directive
3. Add `curl` to apt-get install
### .dockerignore
```
.git
.gitignore
_docs/
_standalone/
e2e/
tests/
*.md
.env
.env.*
.cursor/
.venv/
venv/
__pycache__/
*.pyc
build/
dist/
*.egg-info
Logs/
```
## Docker Compose — Local Development
`docker-compose.yml` (already partially exists as `e2e/docker-compose.mocks.yml`):
```yaml
name: detections-dev
services:
mock-loader:
build: ./e2e/mocks/loader
ports:
- "18080:8080"
volumes:
- ./e2e/fixtures:/models
networks:
- dev-net
mock-annotations:
build: ./e2e/mocks/annotations
ports:
- "18081:8081"
networks:
- dev-net
detections:
build:
context: .
dockerfile: Dockerfile
ports:
- "8080:8080"
depends_on:
- mock-loader
- mock-annotations
env_file: .env
environment:
LOADER_URL: http://mock-loader:8080
ANNOTATIONS_URL: http://mock-annotations:8081
volumes:
- ./e2e/fixtures/classes.json:/app/classes.json:ro
- detections-logs:/app/Logs
shm_size: 512m
networks:
- dev-net
volumes:
detections-logs:
networks:
dev-net:
driver: bridge
```
## Docker Compose — Blackbox Tests
Already exists: `e2e/docker-compose.test.yml`. No changes needed — supports both `cpu` and `gpu` profiles with mock services and test runner.
## Image Tagging Strategy
| Context | Tag Format | Example |
|---------|------------|---------|
| CI builds | `<registry>/azaion/detections-cpu:<git-sha>` | `registry.example.com/azaion/detections-cpu:a1b2c3d` |
| CI builds (GPU) | `<registry>/azaion/detections-gpu:<git-sha>` | `registry.example.com/azaion/detections-gpu:a1b2c3d` |
| Local development | `detections-cpu:dev` | — |
| Latest stable | `<registry>/azaion/detections-cpu:latest` | Updated on merge to main |
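The tag formats above can be generated mechanically in CI; a minimal sketch (the `image_ref` helper name is hypothetical):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build a fully qualified image reference following the tagging convention:
#   image_ref <registry> <variant: cpu|gpu> <tag>
image_ref() {
  local registry="$1" variant="$2" tag="$3"
  printf '%s/azaion/detections-%s:%s\n' "$registry" "$variant" "$tag"
}

# In CI the tag would be the short git SHA, e.g.:
#   image_ref "$REGISTRY" cpu "$(git rev-parse --short HEAD)"
```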
# Deployment Scripts
All scripts are in `scripts/` at the project root. Each is a `bash` script using `set -euo pipefail`, sources `.env` from the project root, and supports `--help`.
## Script Reference
| Script | Purpose | Key Env Vars |
|--------|---------|-------------|
| `deploy.sh` | Orchestrates full deployment | REGISTRY, IMAGE_TAG, DEPLOY_HOST, DEPLOY_USER |
| `pull-images.sh` | Pulls Docker images from registry | REGISTRY, IMAGE_TAG |
| `start-services.sh` | Starts the detections container | REGISTRY, IMAGE_TAG, LOADER_URL, ANNOTATIONS_URL |
| `stop-services.sh` | Gracefully stops and removes container | — |
| `health-check.sh` | Verifies `/health` endpoint | HEALTH_CHECK_HOST, HEALTH_CHECK_PORT |
## Usage
### Full Deployment
```bash
REGISTRY=registry.example.com IMAGE_TAG=a1b2c3d bash scripts/deploy.sh
```
### Remote Deployment (via SSH)
```bash
REGISTRY=registry.example.com \
IMAGE_TAG=a1b2c3d \
DEPLOY_HOST=production-server.example.com \
DEPLOY_USER=deploy \
bash scripts/deploy.sh
```
### Rollback
```bash
bash scripts/deploy.sh --rollback
```
Redeploys the previous image tag (saved in `.deploy-previous-tag` by `stop-services.sh`).
### Local Development
```bash
docker compose -f docker-compose.yml up
```
Or using the e2e compose for testing:
```bash
cd e2e && COMPOSE_PROFILES=cpu docker compose -f docker-compose.test.yml up --build
```
## Deployment Flow
```
deploy.sh
├── pull-images.sh → docker pull
├── stop-services.sh → docker stop (30s grace) + save previous tag
├── start-services.sh → docker run
└── health-check.sh → curl /health (10 retries, 3s interval)
```
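The retry loop in `health-check.sh` can be sketched as below. The probe command is injectable here so the loop is testable without a live service; the real script presumably probes with `curl -fsS`:

```bash
#!/usr/bin/env bash
set -euo pipefail

# wait_healthy <retries> <interval-seconds> <probe command...>
# Runs the probe until it succeeds or retries are exhausted.
wait_healthy() {
  local retries="$1" interval="$2"; shift 2
  local i
  for ((i = 1; i <= retries; i++)); do
    if "$@"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "health check failed after $retries attempts" >&2
  return 1
}

# Real usage, matching the 10x3s policy above (env var names from the table):
#   wait_healthy 10 3 curl -fsS "http://${HEALTH_CHECK_HOST}:${HEALTH_CHECK_PORT}/health"
```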
## Environment Variables
All scripts source `.env` from the project root if it exists. Variables can also be passed directly via the environment. See `.env.example` for the full list.
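Sourcing `.env` is typically done with `set -a` so that every assignment in the file is exported to child processes; a minimal sketch of the pattern:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Source KEY=VALUE lines from a .env file, exporting each assignment.
# Note: values in the file override the current environment, so variables
# that must win should be set after calling load_env.
load_env() {
  local env_file="${1:-.env}"
  [ -f "$env_file" ] || return 0  # a missing .env is not an error
  set -a                          # auto-export everything the file assigns
  # shellcheck disable=SC1090
  . "$env_file"
  set +a
}
```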
# Deployment Procedures
## Deployment Strategy
**Pattern**: Rolling deployment (single-instance service; blue-green not applicable without load balancer).
**Rationale**: The Detections service is a single-instance stateless service with in-memory state (`_active_detections`, `_event_queues`). Rolling replacement with graceful shutdown is the simplest approach.
**Zero-downtime**: Achievable only with load-balanced multi-instance setup. For single-instance deployments, brief downtime (~10-30s) during container restart is expected.
## Deployment Flow
```
1. Pull new images → pull-images.sh
2. Stop current services gracefully → stop-services.sh (30s grace)
3. Start new services → start-services.sh
4. Verify health → health-check.sh
5. If unhealthy → rollback (deploy.sh --rollback)
```
## Health Checks
| Check | Type | Endpoint | Interval | Threshold |
|-------|------|----------|----------|-----------|
| Liveness | HTTP GET | `/health` | 30s | 3 failures → restart |
| Readiness | HTTP GET | `/health` (check `aiAvailability` != "None") | 10s | Wait for engine init |
**Note**: The service's `/health` endpoint returns `{"status": "healthy"}` immediately, with `aiAvailability` transitioning through DOWNLOADING → CONVERTING → ENABLED on first inference request (lazy init). Readiness depends on the use case — the API is ready to accept requests immediately, but inference is only ready after engine initialization.
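A readiness probe therefore has to look inside the health payload, not just at the status code. The field extraction can be sketched as below (a crude grep/sed parse for illustration; a real probe would more likely use `jq`):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Extract the aiAvailability value from a /health JSON payload.
ai_availability() {
  echo "$1" | sed -n 's/.*"aiAvailability"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Ready for inference only once the engine has initialized (not "None").
is_inference_ready() {
  local status
  status=$(ai_availability "$1")
  [ -n "$status" ] && [ "$status" != "None" ]
}

# Typical probe (payload shape assumed from the note above):
#   is_inference_ready "$(curl -fsS http://localhost:8080/health)"
```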
## Rollback Procedures
**Trigger criteria**:
- Health check fails after deployment (3 consecutive failures)
- Error rate spike detected in monitoring
- Critical bug reported by users
**Rollback steps**:
1. Run `deploy.sh --rollback` — redeploys the previous image tag
2. Verify health via `health-check.sh`
3. Notify stakeholders of rollback
4. Investigate root cause
**Previous image tag**: Saved by `stop-services.sh` to `.deploy-previous-tag` before each deployment.
## Deployment Checklist
Pre-deployment:
- [ ] All CI pipeline stages pass (lint, test, security, e2e)
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Environment variables configured on target
- [ ] `classes.json` available on target (or mounted volume)
- [ ] ONNX model available in Loader service
- [ ] Loader and Annotations services reachable from target
Post-deployment:
- [ ] `/health` returns 200
- [ ] `aiAvailability` transitions to ENABLED after first request
- [ ] SSE stream endpoint accessible
- [ ] Detection request returns valid results
- [ ] Logs show no errors
## Graceful Shutdown
**Current limitation**: No graceful shutdown implemented. In-progress detections are aborted on container stop. Background TensorRT conversion runs in a daemon thread (terminates with process).
**Mitigation**: The `stop-services.sh` script sends SIGTERM with a 30-second grace period, allowing uvicorn to finish in-flight HTTP requests. Long-running video detections may be interrupted.
## Database Migrations
Not applicable — the Detections service has no database. All persistence is delegated to the Annotations service.
# Environment Strategy
## Environments
| Environment | Purpose | Infrastructure | Data |
|-------------|---------|---------------|------|
| Development | Local developer workflow | docker-compose with mock services | Mock Loader serves test ONNX model; mock Annotations accepts all requests |
| Staging | Pre-production validation | Mirrors production topology (Docker or K8s) | Real Loader with test model; real Annotations with test database |
| Production | Live system | Docker with GPU (TensorRT) + reverse proxy | Real Loader, real Annotations, production model |
## Environment Variable Management
| Source | Environment | Method |
|--------|-------------|--------|
| `.env` file | Development | Loaded by docker-compose; git-ignored |
| `.env.example` | All | Template committed to VCS (no secrets) |
| Secret manager | Staging/Production | Inject via deployment scripts or K8s secrets |
All required variables are listed in `.env.example`. The application fails fast on missing `classes.json` (startup crash) but uses safe defaults for all other variables.
## Secrets Management
| Secret | Development | Staging | Production |
|--------|-------------|---------|------------|
| Container registry credentials | Local registry or none | CI/CD secret | CI/CD secret |
| SSH deploy key | N/A | CI/CD secret | CI/CD secret |
| Bearer tokens | Test tokens from mock | Real auth service | Real auth service |
**Rotation policy**: Registry credentials and deploy keys should be rotated every 90 days. Bearer tokens are per-request (no stored credentials in the service).
**No secrets stored by the service**: Detections is stateless — tokens come from client HTTP headers and are forwarded to the Annotations service. No database credentials, API keys, or encryption keys are needed.
## Configuration Per Environment
| Config | Development | Staging | Production |
|--------|-------------|---------|------------|
| LOADER_URL | http://mock-loader:8080 | http://loader:8080 | http://loader:8080 |
| ANNOTATIONS_URL | http://mock-annotations:8081 | http://annotations:8080 | http://annotations:8080 |
| GPU | Not required (ONNX CPU) | Optional | Required (TensorRT) |
| Log level | DEBUG (stdout) | INFO (file + stdout) | INFO (file) |
| TLS | None | Reverse proxy | Reverse proxy |
| Rate limiting | None | Reverse proxy (optional) | Reverse proxy (required) |
# Observability
## Logging
**Current state**: loguru with daily rotated files (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) + stdout/stderr. Format: `[HH:mm:ss LEVEL] message`.
**Recommended improvements**:
| Aspect | Current | Recommended |
|--------|---------|-------------|
| Format | Human-readable | Structured JSON to stdout (container-friendly) |
| Fields | timestamp, level, message | + service, correlation_id, context |
| PII | Not applicable | No user IDs or tokens in logs |
| Retention | 30 days (file) | Console in dev; 7 days staging; 30 days production (via log aggregator) |
**Container logging pattern**: Log to stdout/stderr only; let the container runtime (Docker/K8s) handle log collection and routing. Remove file-based logging in containerized deployments.
## Metrics
**Recommended `/metrics` endpoint** (Prometheus-compatible):
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `detections_requests_total` | Counter | method, endpoint, status | Total HTTP requests |
| `detections_request_duration_seconds` | Histogram | method, endpoint | Request processing time |
| `detections_inference_duration_seconds` | Histogram | media_type (image/video) | Inference processing time |
| `detections_active_inferences` | Gauge | — | Currently running inference jobs (0-2) |
| `detections_sse_clients` | Gauge | — | Connected SSE clients |
| `detections_engine_status` | Gauge | engine_type | 1=ready, 0=not ready |
**Collection**: Prometheus scrape at 15s intervals.
## Distributed Tracing
**Limited applicability**: The Detections service makes outbound HTTP calls to Loader and Annotations. Trace context propagation is recommended for cross-service correlation.
| Span | Parent | Description |
|------|--------|-------------|
| `detections.detect_image` | Client request | Full image detection flow |
| `detections.detect_video` | Client request | Full video detection flow |
| `detections.model_download` | detect_* | Model download from Loader |
| `detections.post_annotation` | detect_* | Annotation POST to Annotations service |
**Implementation**: OpenTelemetry Python SDK with OTLP exporter. Sampling: 100% in dev/staging, 10% in production.
## Alerting
| Severity | Response Time | Condition |
|----------|---------------|-----------|
| Critical | 5 min | Health endpoint returns non-200; container restart loop |
| High | 30 min | Error rate > 5%; inference duration p95 > 10s |
| Medium | 4 hours | SSE client count = 0 for extended period; disk > 80% |
| Low | Next business day | Elevated log warnings; model download retries |
## Dashboards
**Operations dashboard**:
- Service health status
- Request rate by endpoint
- Inference duration histogram (p50, p95, p99)
- Active inference count (0-2 gauge)
- SSE connected clients
- Error rate by type
**Inference dashboard**:
- Detections per frame/video
- Model availability status timeline
- Engine type distribution (ONNX vs TensorRT)
- Video batch processing rate
# Deployment Status Report
**Date**: 2026-03-31
**Project**: Azaion.Detections
## Component Readiness
| Component | Status | Notes |
|-----------|--------|-------|
| Detections Service (CPU) | Implemented & Tested | Dockerfile exists, e2e tests pass |
| Detections Service (GPU) | Implemented & Tested | Dockerfile.gpu exists, e2e tests pass |
| E2E Test Suite | Implemented | docker-compose.test.yml exists |
| Mock Services (Loader, Annotations) | Implemented | docker-compose.mocks.yml exists |
## External Dependencies
| Dependency | Type | Required | Notes |
|------------|------|----------|-------|
| Loader Service | REST API | Yes | Model download/upload |
| Annotations Service | REST API | Yes | Result persistence, auth refresh |
| classes.json | Config file | Yes | Mounted into container |
| ONNX model (azaion.onnx) | AI model | Yes | Served by Loader |
## Infrastructure Prerequisites
| Prerequisite | Status | Action Required |
|-------------|--------|-----------------|
| Container registry | Needed | User must provide registry URL |
| Docker runtime | Available | Both CPU and GPU Dockerfiles exist |
| NVIDIA runtime | Needed for GPU | NVIDIA Container Toolkit (`nvidia` runtime) required for Dockerfile.gpu |
| Reverse proxy / TLS | Needed | No TLS at application level; requires external termination |
| DNS / Service discovery | Needed | Services communicate by hostname |
| Persistent storage | Not needed | Service is stateless |
## Environment Variables
| Variable | Default | Required | Purpose |
|----------|---------|----------|---------|
| LOADER_URL | http://loader:8080 | Yes | Loader service endpoint |
| ANNOTATIONS_URL | http://annotations:8080 | Yes | Annotations service endpoint |
| CLASSES_JSON_PATH | classes.json | Yes | Detection class definitions |
| LOG_DIR | Logs | No | Log output directory |
| VIDEOS_DIR | ./data/videos | No | Upload video storage |
| IMAGES_DIR | ./data/images | No | Upload image storage |
## Deployment Blockers
| Blocker | Severity | Mitigation |
|---------|----------|------------|
| Security findings (FAIL verdict) | High | Fix Critical/High CVEs before production (see `_docs/05_security/security_report.md`) |
| Containers run as root | Medium | Add non-root USER directive to Dockerfiles |
| No CI/CD pipeline | Medium | Define and implement pipeline |
## Recommendation
The application is functionally ready for deployment. Security findings from the audit should be addressed before production deployment — at minimum, pin dependencies to fix CVE-2025-43859 and CVE-2026-28356.