mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 08:36:31 +00:00
[AZ-180] Add Jetson Orin Nano support with INT8 TensorRT engine
- Dockerfile.jetson: JetPack 6.x L4T base image (aarch64), TensorRT and PyCUDA from apt
- requirements-jetson.txt: derived from requirements.txt, no pip tensorrt/pycuda
- docker-compose.jetson.yml: runtime: nvidia for NVIDIA Container Runtime
- tensorrt_engine.pyx: convert_from_source accepts optional calib_cache_path; INT8 used when cache present, FP16 fallback; get_engine_filename encodes precision suffix to avoid engine cache confusion
- inference.pyx: init_ai tries INT8 engine then FP16 on lookup; downloads calibration cache before conversion thread; passes cache path through to convert_from_source
- constants_inf: add INT8_CALIB_CACHE_FILE constant
- Unit tests for AC-3 (INT8 flag set when cache provided) and AC-4 (FP16 when no cache)

Made-with: Cursor
@@ -34,3 +34,4 @@

| Task | Name | Complexity | Dependencies | Epic | Status |
|------|------|------------|--------------|------|--------|
| AZ-177 | remove_redundant_video_prewrite | 2 | AZ-173 | AZ-172 | todo |
| AZ-180 | jetson_orin_nano_support | 5 | None | pending | todo |

@@ -0,0 +1,121 @@
# Jetson Orin Nano Support (TensorRT + INT8)

**Task**: AZ-180_jetson_orin_nano_support

**Name**: Jetson Orin Nano Support

**Description**: Run the detection service on NVIDIA Jetson Orin Nano with a JetPack 6.x container image, INT8 engine conversion using a pre-generated calibration cache, and docker-compose configuration.

**Complexity**: 5 points

**Dependencies**: None

**Component**: Deployment + Inference Engine

**Tracker**: AZ-180

**Epic**: pending

## Problem

The detection service cannot run on NVIDIA Jetson Orin Nano for two reasons:

1. The existing `Dockerfile.gpu` and `requirements-gpu.txt` are x86-specific: TensorRT and PyCUDA are pip-installed, but on Jetson they ship with JetPack and cannot be installed via pip on aarch64.

2. The ONNX→TensorRT conversion in `convert_from_source()` only supports FP16/FP32. On Jetson Orin Nano (8 GB shared RAM, 40 TOPS INT8 vs 20 TFLOPS FP16), INT8 gives roughly 1.5–2× the throughput of FP16, but requires a calibration cache to quantize per-layer activations.

## Outcome

- A `Dockerfile.jetson` that builds and runs on Jetson Orin Nano (aarch64, JetPack 6.x)
- A `requirements-jetson.txt` that installs Python dependencies without pip-installing tensorrt or pycuda
- A `docker-compose.jetson.yml` with NVIDIA Container Runtime configuration
- `convert_from_source()` in `tensorrt_engine.pyx` extended to accept an optional INT8 calibration cache path; if the cache is present, INT8 is used, otherwise FP16 is the fallback
- `init_ai()` in `inference.pyx` extended to try downloading the calibration cache from the Loader service before starting the conversion thread
- The service reports `engineType: tensorrt` on `GET /health` on Jetson

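The first outcome item could look roughly like the sketch below. This is a hypothetical `Dockerfile.jetson`; the base image tag, the apt package set, and the `CMD` entrypoint are all assumptions to verify against NVIDIA's L4T container catalog and the actual service layout.

```dockerfile
# Hypothetical sketch -- the base image tag is an assumption; pick the
# JetPack 6.x tag matching the device's L4T release.
FROM nvcr.io/nvidia/l4t-jetpack:r36.2.0

# TensorRT and PyCUDA come from JetPack/apt, NOT from pip (see Constraints).
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3-pip python3-pycuda \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements-jetson.txt .
RUN pip3 install --no-cache-dir -r requirements-jetson.txt

COPY . .
# Entrypoint is a placeholder; the real service module name may differ.
CMD ["python3", "-m", "detections"]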
## Scope

### Included

- `Dockerfile.jetson` using a JetPack 6.x L4T base image with pre-installed TensorRT and PyCUDA
- `requirements-jetson.txt` derived from `requirements.txt`, excluding tensorrt and pycuda
- `docker-compose.jetson.yml` with `runtime: nvidia`
- `tensorrt_engine.pyx`: extend `convert_from_source(bytes onnx_model, str calib_cache_path=None)`; set the `INT8` flag and load the cache when a path is provided, fall back to FP16 when not
- `inference.pyx`: extend `init_ai()` to attempt download of `azaion.int8_calib.cache` from Loader before spawning the conversion thread, passing the local path to `convert_from_source()`
- Update `_docs/04_deploy/containerization.md` with the Jetson image variant

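The precision decision and the filename encoding from the two `.pyx` items above can be sketched in pure Python. `select_precision` is a hypothetical helper name; `get_engine_filename` mirrors the helper named in the commit message, but its exact filename scheme here is an assumption. In the real `convert_from_source()` the returned precision would map to `trt.BuilderFlag.INT8` / `trt.BuilderFlag.FP16` on the builder config.

```python
import os

def select_precision(calib_cache_path):
    # INT8 only when a readable calibration cache exists (AC-3);
    # otherwise FP16, the pre-existing behavior (AC-4).
    if calib_cache_path and os.path.isfile(calib_cache_path):
        return "int8"
    return "fp16"

def get_engine_filename(model_name, sm, precision):
    # Encode compute capability and precision in the filename so the
    # Jetson INT8 engine never collides with a cached x86/FP16 engine
    # (see Constraints and Non-Functional Requirements).
    return f"{model_name}.sm{sm}.{precision}.engine"
```

For example, `get_engine_filename("azaion", 87, select_precision(None))` would yield `azaion.sm87.fp16.engine` on an Orin Nano without a cache.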
### Excluded

- DLA (Deep Learning Accelerator) targeting
- Calibration cache generation tooling (cache is produced offline and uploaded to Loader manually)
- CI/CD pipeline for Jetson (no build agents available)
- ONNX fallback in Jetson image

## Acceptance Criteria

**AC-1: Jetson image builds**

Given a machine with Docker and aarch64 buildx (or a Jetson device)

When `docker build -f Dockerfile.jetson .` is executed

Then the image builds without error and Cython extensions compile for aarch64

**AC-2: Service starts with TensorRT engine on Jetson**

Given the container is running on a Jetson Orin Nano with JetPack 6.x

When `GET /health` is called after engine initialization

Then `aiAvailability` is `Enabled` and `engineType` is `tensorrt`

**AC-3: INT8 conversion used when calibration cache is available**

Given a valid `azaion.int8_calib.cache` is accessible on the Loader service

When the service initializes and no cached engine exists

Then ONNX→TensorRT conversion uses INT8 precision

**AC-4: FP16 fallback when calibration cache is absent**

Given no `azaion.int8_calib.cache` is available on the Loader service

When the service initializes and no cached engine exists

Then ONNX→TensorRT conversion falls back to FP16 (existing behavior)

**AC-5: Detection endpoint functions on Jetson**

Given the container is running and a valid model is accessible via the Loader service

When `POST /detect/image` is called with a test image

Then detections are returned with the same structure as on x86

**AC-6: Compose brings up the service on Jetson**

Given a Jetson device with docker-compose and NVIDIA Container Runtime installed

When `docker compose -f docker-compose.jetson.yml up` is executed

Then the detections service is reachable on port 8080

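A `docker-compose.jetson.yml` satisfying AC-6 could be as small as the following sketch. The service name, build context, and `restart` policy are assumptions; `runtime: nvidia` and port 8080 come from this spec.

```yaml
services:
  detections:
    build:
      context: .
      dockerfile: Dockerfile.jetson
    runtime: nvidia    # NVIDIA Container Runtime (see Scope)
    ports:
      - "8080:8080"    # AC-6: service reachable on port 8080
    restart: unless-stopped
```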
## Non-Functional Requirements

**Compatibility**

- JetPack 6.x (CUDA 12.2, TensorRT 10.x)
- Jetson Orin Nano (aarch64, SM 8.7)

**Reliability**

- The engine filename auto-encodes the compute capability (SM) and precision, so the Jetson engine file is distinct from any x86-cached engine
- INT8 conversion is best-effort: calibration cache download failure is non-fatal, and the service falls back to FP16

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-3 | `convert_from_source()` with a valid calib cache path | INT8 flag set in builder config |
| AC-4 | `convert_from_source()` with `calib_cache_path=None` | FP16 flag set, no INT8 flag |

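Shaped roughly as below, the two tests can run off-device. `FakeConfig` and `configure_precision` are hypothetical stand-ins for the TensorRT builder config and the precision branch inside `convert_from_source()`, since the real `tensorrt` module only imports inside the Jetson image.

```python
class FakeConfig:
    """Stand-in for trt.IBuilderConfig: records which precision flags are set."""
    def __init__(self):
        self.flags = set()

    def set_flag(self, flag):
        self.flags.add(flag)

INT8, FP16 = "INT8", "FP16"

def configure_precision(config, calib_cache_path):
    # Hypothetical seam mirroring convert_from_source(): INT8 only when a
    # cache path is given; FP16 is always set as the fallback precision.
    if calib_cache_path is not None:
        config.set_flag(INT8)
    config.set_flag(FP16)

def test_ac3_int8_flag_set_when_cache_provided():
    cfg = FakeConfig()
    configure_precision(cfg, "/tmp/azaion.int8_calib.cache")
    assert INT8 in cfg.flags

def test_ac4_fp16_only_when_no_cache():
    cfg = FakeConfig()
    configure_precision(cfg, None)
    assert FP16 in cfg.flags and INT8 not in cfg.flags
```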
## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-2 | Jetson device, container running, model accessible | GET /health after warm-up | engineType=tensorrt, aiAvailability=Enabled | — |
| AC-5 | Jetson device, test image | POST /detect/image | Valid DetectionDto list | — |
| AC-6 | Jetson device, docker-compose.jetson.yml | docker compose up | Service healthy on :8080 | — |

Note: AC-2, AC-5, and AC-6 require physical Jetson hardware and cannot run in standard CI.

## Constraints

- TensorRT and PyCUDA must NOT be pip-installed; they are provided by JetPack in the base image
- The base image must be a JetPack 6.x L4T image, not a generic CUDA image
- Calibration cache download failure must be non-fatal: log a warning and fall back to FP16
- INT8 conversion and FP16 conversion produce different engine files (different filenames) so cached engines are not confused

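The non-fatal download constraint might look like this sketch. The function name, the URL shape, and the use of `urllib` are assumptions; the actual `init_ai()` presumably reuses whatever HTTP client already talks to the Loader service.

```python
import logging
import urllib.request

def try_download_calib_cache(url, dest_path):
    # Best-effort: any failure logs a warning and returns None, which
    # makes the caller fall back to FP16 conversion (see Constraints).
    try:
        urllib.request.urlretrieve(url, dest_path)
        return dest_path
    except Exception as exc:  # broad by design: download is non-fatal
        logging.warning(
            "INT8 calibration cache unavailable (%s); falling back to FP16", exc
        )
        return None
```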
## Risks & Mitigation

**Risk 1: JetPack TensorRT Python binding path**

- *Risk*: The L4T base image may place the TensorRT Python bindings in a non-standard location
- *Mitigation*: Verify `python3 -c "import tensorrt"` in the base image layer; adjust `PYTHONPATH` if needed

**Risk 2: PyCUDA availability in base image**

- *Risk*: Some L4T images do not include pycuda
- *Mitigation*: Fall back to `apt-get install python3-pycuda` or a source build with `CUDA_ROOT` set

**Risk 3: INT8 accuracy degradation**

- *Risk*: Without a well-representative calibration dataset, mAP may drop by more than 1 point
- *Mitigation*: The calibration cache is generated offline with controlled data; FP16 fallback remains available