[AZ-178] Implement streaming video detection endpoint

- Added `/detect/video` endpoint for true streaming video detection, allowing inference to start as upload bytes arrive.
- Introduced `run_detect_video_stream` method in the inference module to handle video processing from a file-like object.
- Updated media hashing to include a new function for computing hashes directly from files with minimal I/O.
- Enhanced documentation to reflect changes in video processing and API behavior.

Made-with: Cursor
Author: Oleksandr Bezdieniezhnykh
Date: 2026-04-01 03:11:43 +03:00
Commit: be4cab4fcb (parent e65d8da6a3)
42 changed files with 2983 additions and 29 deletions
# CI/CD Pipeline
## Platform
GitHub Actions (default recommendation; adaptable to Azure Pipelines).
## Pipeline Stages
| Stage | Trigger | Steps | Quality Gate |
|-------|---------|-------|-------------|
| Lint | Every push | `black --check`, Cython syntax check | Zero errors |
| Unit Test | Every push | `pytest tests/ -v --csv=report.csv` | All pass |
| Security Scan | Every push | `pip-audit`, Trivy image scan | Zero critical/high CVEs |
| Build | PR merge to dev | Build `detections-cpu` and `detections-gpu` images, tag with git SHA | Build succeeds |
| E2E Test | After build | `docker compose -f e2e/docker-compose.test.yml up --abort-on-container-exit` | All e2e tests pass |
| Push | After e2e | Push images to container registry | Push succeeds |
| Deploy Staging | After push | Deploy to staging via `scripts/deploy.sh` | Health check passes |
| Deploy Production | Manual approval | Deploy to production via `scripts/deploy.sh` | Health check passes |
## Workflow Definition (GitHub Actions)
```yaml
name: CI/CD
on:
push:
branches: [dev, main]
pull_request:
branches: [dev]
env:
REGISTRY: ${{ secrets.REGISTRY }}
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install black
- run: black --check src/ tests/
test:
runs-on: ubuntu-latest
needs: lint
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install -r requirements.txt
- run: python setup.py build_ext --inplace
- run: cd src && pytest ../tests/ -v
security:
runs-on: ubuntu-latest
needs: lint
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install pip-audit
- run: pip-audit -r requirements.txt
build:
runs-on: ubuntu-latest
needs: [test, security]
if: github.event_name == 'push'
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ secrets.REGISTRY_USER }}
password: ${{ secrets.REGISTRY_PASSWORD }}
- uses: docker/build-push-action@v5
with:
context: .
file: Dockerfile
push: true
tags: ${{ env.REGISTRY }}/azaion/detections-cpu:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
e2e:
runs-on: ubuntu-latest
needs: build
steps:
- uses: actions/checkout@v4
- run: |
cd e2e
COMPOSE_PROFILES=cpu docker compose -f docker-compose.test.yml up \
--build --abort-on-container-exit --exit-code-from e2e-runner
deploy-staging:
runs-on: ubuntu-latest
needs: e2e
if: github.ref == 'refs/heads/dev'
environment: staging
steps:
- uses: actions/checkout@v4
- run: |
IMAGE_TAG=${{ github.sha }} \
DEPLOY_HOST=${{ secrets.STAGING_HOST }} \
DEPLOY_USER=${{ secrets.STAGING_USER }} \
bash scripts/deploy.sh
deploy-production:
runs-on: ubuntu-latest
needs: e2e
if: github.ref == 'refs/heads/main'
environment:
name: production
url: ${{ secrets.PRODUCTION_URL }}
steps:
- uses: actions/checkout@v4
- run: |
IMAGE_TAG=${{ github.sha }} \
DEPLOY_HOST=${{ secrets.PRODUCTION_HOST }} \
DEPLOY_USER=${{ secrets.PRODUCTION_USER }} \
bash scripts/deploy.sh
```
## Caching Strategy
| Cache Type | Scope | Tool |
|-----------|-------|------|
| Python dependencies | Per requirements.txt hash | actions/cache + pip cache dir |
| Docker layers | Per Dockerfile hash | BuildKit GHA cache |
| Cython compiled modules | Per src/ hash | actions/cache |
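The per-hash cache scopes above all reduce to hashing the cache's input files; in GitHub Actions this is done by `hashFiles(...)`, but the idea can be sketched in plain shell (assumes `sha256sum` is available; the `cache_key` helper name is illustrative, not part of the pipeline):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Derive a cache key from the content of the files that define its scope,
# mirroring hashFiles(): identical content -> identical key -> cache hit.
cache_key() {
  local prefix="$1"; shift
  printf '%s-%s\n' "$prefix" "$(cat "$@" | sha256sum | cut -d' ' -f1)"
}

# e.g. cache_key pip requirements.txt
#      cache_key cython src/*.pyx
```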
## Parallelization
- `test` and `security` jobs run in parallel after `lint`
- `build` waits for both `test` and `security`
- GPU image build can be added as a parallel job to CPU build
## Notifications
| Event | Channel |
|-------|---------|
| Build failure | GitHub PR status check (blocks merge) |
| Security scan failure | GitHub PR status check + team notification |
| Deployment success/failure | Deployment environment status |
# Containerization Plan
## Image Variants
### detections-cpu (Dockerfile)
| Aspect | Specification |
|--------|--------------|
| Base image | `python:3.11-slim` (pinned digest recommended) |
| Build stages | Single stage (`gcc` must remain in the image because `setup.py` compiles the Cython modules) |
| Non-root user | `adduser --disabled-password --gecos '' appuser` + `USER appuser` |
| Health check | `HEALTHCHECK --interval=30s --timeout=5s CMD curl -f http://localhost:8080/health \|\| exit 1` |
| Exposed ports | 8080 |
| Entrypoint | `uvicorn main:app --host 0.0.0.0 --port 8080` |
**Changes needed to existing Dockerfile**:
1. Add non-root user (security finding F7)
2. Add HEALTHCHECK directive
3. Pin `python:3.11-slim` to specific digest
4. Add `curl` to apt-get install (for health check)
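Once the image is rebuilt, changes 1 and 2 can be verified from `docker inspect` output; the sketch below (helper names are hypothetical) takes the JSON as an argument so it can be exercised without a Docker daemon:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 if the image config sets the non-root user from change 1.
runs_as_nonroot() {
  local inspect_json="$1"
  echo "$inspect_json" | grep -q '"User": *"appuser"'
}

# Returns 0 if a HEALTHCHECK (change 2) is baked into the image config.
has_healthcheck() {
  local inspect_json="$1"
  echo "$inspect_json" | grep -q '"Healthcheck"'
}

# Typical use (image name is illustrative):
#   json=$(docker inspect detections-cpu:dev)
#   runs_as_nonroot "$json" && has_healthcheck "$json" && echo "hardened"
```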
### detections-gpu (Dockerfile.gpu)
| Aspect | Specification |
|--------|--------------|
| Base image | `nvidia/cuda:12.2.0-runtime-ubuntu22.04` |
| Build stages | Single stage |
| Non-root user | `adduser --disabled-password --gecos '' appuser` + `USER appuser` |
| Health check | `HEALTHCHECK --interval=30s --timeout=5s CMD curl -f http://localhost:8080/health \|\| exit 1` |
| Exposed ports | 8080 |
| Entrypoint | `uvicorn main:app --host 0.0.0.0 --port 8080` |
| Runtime | Requires `--runtime=nvidia` or `nvidia` runtime in Docker |
**Changes needed to existing Dockerfile.gpu**:
1. Add non-root user
2. Add HEALTHCHECK directive
3. Add `curl` to apt-get install
### .dockerignore
```
.git
.gitignore
_docs/
_standalone/
e2e/
tests/
*.md
.env
.env.*
.cursor/
.venv/
venv/
__pycache__/
*.pyc
build/
dist/
*.egg-info
Logs/
```
## Docker Compose — Local Development
`docker-compose.yml` (already partially exists as `e2e/docker-compose.mocks.yml`):
```yaml
name: detections-dev
services:
mock-loader:
build: ./e2e/mocks/loader
ports:
- "18080:8080"
volumes:
- ./e2e/fixtures:/models
networks:
- dev-net
mock-annotations:
build: ./e2e/mocks/annotations
ports:
- "18081:8081"
networks:
- dev-net
detections:
build:
context: .
dockerfile: Dockerfile
ports:
- "8080:8080"
depends_on:
- mock-loader
- mock-annotations
env_file: .env
environment:
LOADER_URL: http://mock-loader:8080
ANNOTATIONS_URL: http://mock-annotations:8081
volumes:
- ./e2e/fixtures/classes.json:/app/classes.json:ro
- detections-logs:/app/Logs
shm_size: 512m
networks:
- dev-net
volumes:
detections-logs:
networks:
dev-net:
driver: bridge
```
## Docker Compose — Blackbox Tests
Already exists: `e2e/docker-compose.test.yml`. No changes needed — supports both `cpu` and `gpu` profiles with mock services and test runner.
## Image Tagging Strategy
| Context | Tag Format | Example |
|---------|------------|---------|
| CI builds | `<registry>/azaion/detections-cpu:<git-sha>` | `registry.example.com/azaion/detections-cpu:a1b2c3d` |
| CI builds (GPU) | `<registry>/azaion/detections-gpu:<git-sha>` | `registry.example.com/azaion/detections-gpu:a1b2c3d` |
| Local development | `detections-cpu:dev` | — |
| Latest stable | `<registry>/azaion/detections-cpu:latest` | Updated on merge to main |
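The tag formats above can be generated mechanically in CI; a minimal sketch (the `image_ref` helper name is hypothetical):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build a fully qualified image reference following the tagging convention:
#   image_ref <registry> <variant: cpu|gpu> <tag>
image_ref() {
  local registry="$1" variant="$2" tag="$3"
  printf '%s/azaion/detections-%s:%s\n' "$registry" "$variant" "$tag"
}

# In CI the tag would be the short git SHA, e.g.:
#   image_ref "$REGISTRY" cpu "$(git rev-parse --short HEAD)"
```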
# Deployment Scripts
All scripts are in `scripts/` at the project root. Each is a `bash` script using `set -euo pipefail`, sources `.env` from the project root, and supports `--help`.
## Script Reference
| Script | Purpose | Key Env Vars |
|--------|---------|-------------|
| `deploy.sh` | Orchestrates full deployment | REGISTRY, IMAGE_TAG, DEPLOY_HOST, DEPLOY_USER |
| `pull-images.sh` | Pulls Docker images from registry | REGISTRY, IMAGE_TAG |
| `start-services.sh` | Starts the detections container | REGISTRY, IMAGE_TAG, LOADER_URL, ANNOTATIONS_URL |
| `stop-services.sh` | Gracefully stops and removes container | — |
| `health-check.sh` | Verifies `/health` endpoint | HEALTH_CHECK_HOST, HEALTH_CHECK_PORT |
## Usage
### Full Deployment
```bash
REGISTRY=registry.example.com IMAGE_TAG=a1b2c3d bash scripts/deploy.sh
```
### Remote Deployment (via SSH)
```bash
REGISTRY=registry.example.com \
IMAGE_TAG=a1b2c3d \
DEPLOY_HOST=production-server.example.com \
DEPLOY_USER=deploy \
bash scripts/deploy.sh
```
### Rollback
```bash
bash scripts/deploy.sh --rollback
```
Redeploys the previous image tag (saved in `.deploy-previous-tag` by `stop-services.sh`).
### Local Development
```bash
docker compose -f docker-compose.yml up
```
Or using the e2e compose for testing:
```bash
cd e2e && COMPOSE_PROFILES=cpu docker compose -f docker-compose.test.yml up --build
```
## Deployment Flow
```
deploy.sh
├── pull-images.sh → docker pull
├── stop-services.sh → docker stop (30s grace) + save previous tag
├── start-services.sh → docker run
└── health-check.sh → curl /health (10 retries, 3s interval)
```
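The retry loop in `health-check.sh` can be sketched as below. The probe command is injectable here so the loop is testable without a live service; the real script presumably probes with `curl -fsS`:

```bash
#!/usr/bin/env bash
set -euo pipefail

# wait_healthy <retries> <interval-seconds> <probe command...>
# Runs the probe until it succeeds or retries are exhausted.
wait_healthy() {
  local retries="$1" interval="$2"; shift 2
  local i
  for ((i = 1; i <= retries; i++)); do
    if "$@"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "health check failed after $retries attempts" >&2
  return 1
}

# Real usage, matching the 10x3s policy above (env var names from the table):
#   wait_healthy 10 3 curl -fsS "http://${HEALTH_CHECK_HOST}:${HEALTH_CHECK_PORT}/health"
```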
## Environment Variables
All scripts source `.env` from the project root if it exists. Variables can also be passed directly via the environment. See `.env.example` for the full list.
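Sourcing `.env` is typically done with `set -a` so that every assignment in the file is exported to child processes; a minimal sketch of the pattern:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Source KEY=VALUE lines from a .env file, exporting each assignment.
# Note: values in the file override the current environment, so variables
# that must win should be set after calling load_env.
load_env() {
  local env_file="${1:-.env}"
  [ -f "$env_file" ] || return 0  # a missing .env is not an error
  set -a                          # auto-export everything the file assigns
  # shellcheck disable=SC1090
  . "$env_file"
  set +a
}
```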
# Deployment Procedures
## Deployment Strategy
**Pattern**: Rolling deployment (single-instance service; blue-green not applicable without load balancer).
**Rationale**: The Detections service is a single-instance stateless service with in-memory state (`_active_detections`, `_event_queues`). Rolling replacement with graceful shutdown is the simplest approach.
**Zero-downtime**: Achievable only with load-balanced multi-instance setup. For single-instance deployments, brief downtime (~10-30s) during container restart is expected.
## Deployment Flow
```
1. Pull new images → pull-images.sh
2. Stop current services gracefully → stop-services.sh (30s grace)
3. Start new services → start-services.sh
4. Verify health → health-check.sh
5. If unhealthy → rollback (deploy.sh --rollback)
```
## Health Checks
| Check | Type | Endpoint | Interval | Threshold |
|-------|------|----------|----------|-----------|
| Liveness | HTTP GET | `/health` | 30s | 3 failures → restart |
| Readiness | HTTP GET | `/health` (check `aiAvailability` != "None") | 10s | Wait for engine init |
**Note**: The service's `/health` endpoint returns `{"status": "healthy"}` immediately, with `aiAvailability` transitioning through DOWNLOADING → CONVERTING → ENABLED on first inference request (lazy init). Readiness depends on the use case — the API is ready to accept requests immediately, but inference is only ready after engine initialization.
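A readiness probe therefore has to look inside the health payload, not just at the status code. The field extraction can be sketched as below (a crude grep/sed parse for illustration; a real probe would more likely use `jq`):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Extract the aiAvailability value from a /health JSON payload.
ai_availability() {
  echo "$1" | sed -n 's/.*"aiAvailability"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Ready for inference only once the engine has initialized (not "None").
is_inference_ready() {
  local status
  status=$(ai_availability "$1")
  [ -n "$status" ] && [ "$status" != "None" ]
}

# Typical probe (payload shape assumed from the note above):
#   is_inference_ready "$(curl -fsS http://localhost:8080/health)"
```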
## Rollback Procedures
**Trigger criteria**:
- Health check fails after deployment (3 consecutive failures)
- Error rate spike detected in monitoring
- Critical bug reported by users
**Rollback steps**:
1. Run `deploy.sh --rollback` — redeploys the previous image tag
2. Verify health via `health-check.sh`
3. Notify stakeholders of rollback
4. Investigate root cause
**Previous image tag**: Saved by `stop-services.sh` to `.deploy-previous-tag` before each deployment.
## Deployment Checklist
Pre-deployment:
- [ ] All CI pipeline stages pass (lint, test, security, e2e)
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Environment variables configured on target
- [ ] `classes.json` available on target (or mounted volume)
- [ ] ONNX model available in Loader service
- [ ] Loader and Annotations services reachable from target
Post-deployment:
- [ ] `/health` returns 200
- [ ] `aiAvailability` transitions to ENABLED after first request
- [ ] SSE stream endpoint accessible
- [ ] Detection request returns valid results
- [ ] Logs show no errors
## Graceful Shutdown
**Current limitation**: No graceful shutdown implemented. In-progress detections are aborted on container stop. Background TensorRT conversion runs in a daemon thread (terminates with process).
**Mitigation**: The `stop-services.sh` script sends SIGTERM with a 30-second grace period, allowing uvicorn to finish in-flight HTTP requests. Long-running video detections may be interrupted.
## Database Migrations
Not applicable — the Detections service has no database. All persistence is delegated to the Annotations service.
# Environment Strategy
## Environments
| Environment | Purpose | Infrastructure | Data |
|-------------|---------|---------------|------|
| Development | Local developer workflow | docker-compose with mock services | Mock Loader serves test ONNX model; mock Annotations accepts all requests |
| Staging | Pre-production validation | Mirrors production topology (Docker or K8s) | Real Loader with test model; real Annotations with test database |
| Production | Live system | Docker with GPU (TensorRT) + reverse proxy | Real Loader, real Annotations, production model |
## Environment Variable Management
| Source | Environment | Method |
|--------|-------------|--------|
| `.env` file | Development | Loaded by docker-compose; git-ignored |
| `.env.example` | All | Template committed to VCS (no secrets) |
| Secret manager | Staging/Production | Inject via deployment scripts or K8s secrets |
All required variables are listed in `.env.example`. The application fails fast on missing `classes.json` (startup crash) but uses safe defaults for all other variables.
## Secrets Management
| Secret | Development | Staging | Production |
|--------|-------------|---------|------------|
| Container registry credentials | Local registry or none | CI/CD secret | CI/CD secret |
| SSH deploy key | N/A | CI/CD secret | CI/CD secret |
| Bearer tokens | Test tokens from mock | Real auth service | Real auth service |
**Rotation policy**: Registry credentials and deploy keys should be rotated every 90 days. Bearer tokens are per-request (no stored credentials in the service).
**No secrets stored by the service**: Detections is stateless — tokens come from client HTTP headers and are forwarded to the Annotations service. No database credentials, API keys, or encryption keys are needed.
## Configuration Per Environment
| Config | Development | Staging | Production |
|--------|-------------|---------|------------|
| LOADER_URL | http://mock-loader:8080 | http://loader:8080 | http://loader:8080 |
| ANNOTATIONS_URL | http://mock-annotations:8081 | http://annotations:8080 | http://annotations:8080 |
| GPU | Not required (ONNX CPU) | Optional | Required (TensorRT) |
| Log level | DEBUG (stdout) | INFO (file + stdout) | INFO (file) |
| TLS | None | Reverse proxy | Reverse proxy |
| Rate limiting | None | Reverse proxy (optional) | Reverse proxy (required) |
# Observability
## Logging
**Current state**: loguru with daily rotated files (`Logs/log_inference_YYYYMMDD.txt`, 30-day retention) + stdout/stderr. Format: `[HH:mm:ss LEVEL] message`.
**Recommended improvements**:
| Aspect | Current | Recommended |
|--------|---------|-------------|
| Format | Human-readable | Structured JSON to stdout (container-friendly) |
| Fields | timestamp, level, message | + service, correlation_id, context |
| PII | Not applicable | No user IDs or tokens in logs |
| Retention | 30 days (file) | Console in dev; 7 days staging; 30 days production (via log aggregator) |
**Container logging pattern**: Log to stdout/stderr only; let the container runtime (Docker/K8s) handle log collection and routing. Remove file-based logging in containerized deployments.
## Metrics
**Recommended `/metrics` endpoint** (Prometheus-compatible):
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `detections_requests_total` | Counter | method, endpoint, status | Total HTTP requests |
| `detections_request_duration_seconds` | Histogram | method, endpoint | Request processing time |
| `detections_inference_duration_seconds` | Histogram | media_type (image/video) | Inference processing time |
| `detections_active_inferences` | Gauge | — | Currently running inference jobs (0-2) |
| `detections_sse_clients` | Gauge | — | Connected SSE clients |
| `detections_engine_status` | Gauge | engine_type | 1=ready, 0=not ready |
**Collection**: Prometheus scrape at 15s intervals.
## Distributed Tracing
**Limited applicability**: The Detections service makes outbound HTTP calls to Loader and Annotations. Trace context propagation is recommended for cross-service correlation.
| Span | Parent | Description |
|------|--------|-------------|
| `detections.detect_image` | Client request | Full image detection flow |
| `detections.detect_video` | Client request | Full video detection flow |
| `detections.model_download` | detect_* | Model download from Loader |
| `detections.post_annotation` | detect_* | Annotation POST to Annotations service |
**Implementation**: OpenTelemetry Python SDK with OTLP exporter. Sampling: 100% in dev/staging, 10% in production.
## Alerting
| Severity | Response Time | Condition |
|----------|---------------|-----------|
| Critical | 5 min | Health endpoint returns non-200; container restart loop |
| High | 30 min | Error rate > 5%; inference duration p95 > 10s |
| Medium | 4 hours | SSE client count = 0 for extended period; disk > 80% |
| Low | Next business day | Elevated log warnings; model download retries |
## Dashboards
**Operations dashboard**:
- Service health status
- Request rate by endpoint
- Inference duration histogram (p50, p95, p99)
- Active inference count (0-2 gauge)
- SSE connected clients
- Error rate by type
**Inference dashboard**:
- Detections per frame/video
- Model availability status timeline
- Engine type distribution (ONNX vs TensorRT)
- Video batch processing rate
# Deployment Status Report
**Date**: 2026-03-31
**Project**: Azaion.Detections
## Component Readiness
| Component | Status | Notes |
|-----------|--------|-------|
| Detections Service (CPU) | Implemented & Tested | Dockerfile exists, e2e tests pass |
| Detections Service (GPU) | Implemented & Tested | Dockerfile.gpu exists, e2e tests pass |
| E2E Test Suite | Implemented | docker-compose.test.yml exists |
| Mock Services (Loader, Annotations) | Implemented | docker-compose.mocks.yml exists |
## External Dependencies
| Dependency | Type | Required | Notes |
|------------|------|----------|-------|
| Loader Service | REST API | Yes | Model download/upload |
| Annotations Service | REST API | Yes | Result persistence, auth refresh |
| classes.json | Config file | Yes | Mounted into container |
| ONNX model (azaion.onnx) | AI model | Yes | Served by Loader |
## Infrastructure Prerequisites
| Prerequisite | Status | Action Required |
|-------------|--------|-----------------|
| Container registry | Needed | User must provide registry URL |
| Docker runtime | Available | Both CPU and GPU Dockerfiles exist |
| NVIDIA runtime | Needed for GPU | NVIDIA Container Toolkit (`nvidia` runtime) required for Dockerfile.gpu |
| Reverse proxy / TLS | Needed | No TLS at application level; requires external termination |
| DNS / Service discovery | Needed | Services communicate by hostname |
| Persistent storage | Not needed | Service is stateless |
## Environment Variables
| Variable | Default | Required | Purpose |
|----------|---------|----------|---------|
| LOADER_URL | http://loader:8080 | Yes | Loader service endpoint |
| ANNOTATIONS_URL | http://annotations:8080 | Yes | Annotations service endpoint |
| CLASSES_JSON_PATH | classes.json | Yes | Detection class definitions |
| LOG_DIR | Logs | No | Log output directory |
| VIDEOS_DIR | ./data/videos | No | Upload video storage |
| IMAGES_DIR | ./data/images | No | Upload image storage |
## Deployment Blockers
| Blocker | Severity | Mitigation |
|---------|----------|------------|
| Security findings (FAIL verdict) | High | Fix Critical/High CVEs before production (see `_docs/05_security/security_report.md`) |
| Containers run as root | Medium | Add non-root USER directive to Dockerfiles |
| No CI/CD pipeline | Medium | Define and implement pipeline |
## Recommendation
The application is functionally ready for deployment. Security findings from the audit should be addressed before production deployment — at minimum, pin dependencies to fix CVE-2025-43859 and CVE-2026-28356.