Containerization Plan

Image Variants

detections-cpu (Dockerfile)

| Aspect | Specification |
|---|---|
| Base image | `python:3.11-slim` (pinned digest recommended) |
| Build stages | Single stage (Cython compile requires `gcc` at runtime for `setup.py`) |
| Non-root user | `adduser --disabled-password --gecos '' appuser` + `USER appuser` |
| Health check | `HEALTHCHECK --interval=30s --timeout=5s CMD curl -f http://localhost:8080/health \|\| exit 1` |
| Exposed ports | 8080 |
| Entrypoint | `uvicorn main:app --host 0.0.0.0 --port 8080` |

Changes needed to existing Dockerfile:

  1. Add non-root user (security finding F7)
  2. Add HEALTHCHECK directive
  3. Pin python:3.11-slim to specific digest
  4. Add curl to apt-get install (for health check)
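
Taken together, the four changes might look like this in the existing Dockerfile. This is a sketch only: the digest is a placeholder to be resolved with `docker pull`, and the build steps are abbreviated.

```dockerfile
# 3. Pin the base image to a specific digest (placeholder shown)
FROM python:3.11-slim@sha256:<digest>

# 4. curl is required by the HEALTHCHECK below; gcc by the Cython compile
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt

# 1. Non-root user (security finding F7)
RUN adduser --disabled-password --gecos '' appuser
USER appuser

EXPOSE 8080

# 2. Health check against the service's /health endpoint
HEALTHCHECK --interval=30s --timeout=5s \
    CMD curl -f http://localhost:8080/health || exit 1

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
```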

detections-gpu (Dockerfile.gpu)

| Aspect | Specification |
|---|---|
| Base image | `nvidia/cuda:12.2.0-runtime-ubuntu22.04` |
| Build stages | Single stage |
| Non-root user | `adduser --disabled-password --gecos '' appuser` + `USER appuser` |
| Health check | `HEALTHCHECK --interval=30s --timeout=5s CMD curl -f http://localhost:8080/health \|\| exit 1` |
| Exposed ports | 8080 |
| Entrypoint | `uvicorn main:app --host 0.0.0.0 --port 8080` |
| Runtime | Requires `--runtime=nvidia` or the `nvidia` runtime configured in Docker |

Changes needed to existing Dockerfile.gpu:

  1. Add non-root user
  2. Add HEALTHCHECK directive
  3. Add curl to apt-get install

.dockerignore

```
.git
.gitignore
_docs/
_standalone/
e2e/
tests/
*.md
.env
.env.*
.cursor/
.venv/
venv/
__pycache__/
*.pyc
build/
dist/
*.egg-info
Logs/
```

Docker Compose — Local Development

docker-compose.yml (already partially exists as e2e/docker-compose.mocks.yml):

```yaml
name: detections-dev

services:
  mock-loader:
    build: ./e2e/mocks/loader
    ports:
      - "18080:8080"
    volumes:
      - ./e2e/fixtures:/models
    networks:
      - dev-net

  mock-annotations:
    build: ./e2e/mocks/annotations
    ports:
      - "18081:8081"
    networks:
      - dev-net

  detections:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    depends_on:
      - mock-loader
      - mock-annotations
    env_file: .env
    environment:
      LOADER_URL: http://mock-loader:8080
      ANNOTATIONS_URL: http://mock-annotations:8081
    volumes:
      - ./e2e/fixtures/classes.json:/app/classes.json:ro
      - detections-logs:/app/Logs
    shm_size: 512m
    networks:
      - dev-net

volumes:
  detections-logs:

networks:
  dev-net:
    driver: bridge
```
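
With the compose file above saved as docker-compose.yml, the stack comes up as follows; the health probe hits the same `/health` endpoint used by the `HEALTHCHECK` directives (assumes a running Docker daemon):

```shell
docker compose up --build -d
curl -f http://localhost:8080/health
```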

Docker Compose — Blackbox Tests

Already exists as e2e/docker-compose.test.yml. No changes needed — it already supports both the cpu and gpu profiles, with mock services and a test runner.

detections-jetson (Dockerfile.jetson)

| Aspect | Specification |
|---|---|
| Base image | `nvcr.io/nvidia/l4t-base:r36.3.0` (JetPack 6.x, aarch64) |
| TensorRT | Pre-installed via JetPack — `python3-libnvinfer` apt package (NOT pip) |
| PyCUDA | Pre-installed via JetPack — `python3-pycuda` apt package (NOT pip) |
| Build stages | Single stage (Cython compile requires `gcc`) |
| Non-root user | `adduser --disabled-password --gecos '' appuser` + `USER appuser` |
| Exposed ports | 8080 |
| Entrypoint | `uvicorn main:app --host 0.0.0.0 --port 8080` |
| Runtime | Requires NVIDIA Container Runtime (`runtime: nvidia` in docker-compose) |

Jetson-specific behaviour:

  • requirements-jetson.txt derives from requirements.txt; tensorrt and pycuda are excluded from pip because JetPack already provides them
  • Engine filename auto-encodes CC+SM (e.g. azaion.cc_8.7_sm_16.engine for Orin Nano), ensuring the Jetson engine is distinct from any x86-cached engine
  • INT8 is used when azaion.int8_calib.cache is available on the Loader service; precision suffix appended to engine filename (*.int8.engine); FP16 fallback when cache is absent
  • docker-compose.jetson.yml uses runtime: nvidia for the NVIDIA Container Runtime
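
The engine-naming and precision-selection behaviour above can be sketched as follows. `get_engine_filename` does exist in `tensorrt_engine.pyx` per the commit notes, but this signature and the exact filename format are illustrative assumptions, not the real API:

```python
from typing import Optional

def get_engine_filename(cc: str, sm_count: int,
                        calib_cache_path: Optional[str]) -> str:
    """Encode compute capability, SM count, and precision into the engine
    filename so a Jetson INT8 engine is never confused with an FP16 or
    x86-built engine in the cache (illustrative format, not the real one)."""
    # INT8 only when a calibration cache is available; otherwise fall back to FP16
    precision = "int8" if calib_cache_path else "fp16"
    return f"azaion.cc_{cc}_sm_{sm_count}.{precision}.engine"
```

With a calibration cache present this yields a name like `azaion.cc_8.7_sm_16.int8.engine`; without one, the `.fp16.engine` variant is produced, matching the fallback rule above.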

Compose usage on Jetson:

```shell
docker compose -f docker-compose.jetson.yml up
```

Image Tagging Strategy

| Context | Tag format | Example / Notes |
|---|---|---|
| CI builds | `<registry>/azaion/detections-cpu:<git-sha>` | `registry.example.com/azaion/detections-cpu:a1b2c3d` |
| CI builds (GPU) | `<registry>/azaion/detections-gpu:<git-sha>` | `registry.example.com/azaion/detections-gpu:a1b2c3d` |
| Local development | `detections-cpu:dev` | |
| Latest stable | `<registry>/azaion/detections-cpu:latest` | Updated on merge to `main` |
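
A CI step could derive the `<git-sha>` tag roughly like this (registry host as in the examples above; `a1b2c3d` is only a fallback for running outside a git checkout, and the build/push line is commented as the follow-up step):

```shell
# Derive the image tag from the short commit SHA
GIT_SHA=$(git rev-parse --short HEAD 2>/dev/null || echo "a1b2c3d")
TAG="registry.example.com/azaion/detections-cpu:${GIT_SHA}"
echo "$TAG"
# docker build -t "$TAG" . && docker push "$TAG"
```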