[AZ-180] Update module and component docs for Jetson/INT8 changes

Made-with: Cursor
Oleksandr Bezdieniezhnykh
2026-04-02 07:25:22 +03:00
parent 2ed9ce3336
commit 7a7f2a4cdd
5 changed files with 37 additions and 23 deletions
@@ -2,7 +2,7 @@
## Overview
-**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.
+**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion with FP16 and INT8 precision support.
**Pattern**: Strategy pattern — `InferenceEngine` defines the contract; `OnnxEngine` and `TensorRTEngine` are interchangeable implementations.
@@ -43,8 +43,8 @@ cdef class OnnxEngine(InferenceEngine):
cdef class TensorRTEngine(InferenceEngine):
# Implements all base methods
@staticmethod get_gpu_memory_bytes(int device_id) -> int
-@staticmethod get_engine_filename(int device_id) -> str
-@staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
+@staticmethod get_engine_filename(str precision="fp16") -> str  # "fp16" or "int8"
+@staticmethod convert_from_source(bytes onnx_model, str calib_cache_path=None) -> bytes or None
```
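The Strategy pattern described above can be sketched in plain Python. This is a minimal illustration, not the module's actual implementation: the method names `load` and `infer` and the `run` helper are assumptions; only the class names and the interchangeability contract come from the doc.

```python
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Common contract both backends implement (method names are illustrative)."""

    @abstractmethod
    def load(self, model_bytes: bytes) -> None: ...

    @abstractmethod
    def infer(self, batch):
        """Return detections as [batch][detection_index][x1, y1, x2, y2, conf, class_id]."""


class OnnxEngine(InferenceEngine):
    def load(self, model_bytes):
        self.model = model_bytes  # stub: real code builds an onnxruntime.InferenceSession

    def infer(self, batch):
        return [[] for _ in batch]  # stub: one (empty) detection list per batch item


class TensorRTEngine(InferenceEngine):
    def load(self, model_bytes):
        self.model = model_bytes  # stub: real code deserializes a TensorRT engine

    def infer(self, batch):
        return [[] for _ in batch]


def run(engine: InferenceEngine, model: bytes, batch):
    # Callers depend only on the abstract interface, so backends swap freely.
    engine.load(model)
    return engine.infer(batch)
```

Because `run` only touches the `InferenceEngine` contract, the Inference Pipeline can select a backend at startup without branching on the concrete type anywhere else.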
## External API
@@ -61,13 +61,14 @@ None — internal component consumed by Inference Pipeline.
- **OnnxEngine**: default batch_size=1; loads model into `onnxruntime.InferenceSession`
- **TensorRTEngine**: default batch_size=4; dynamic dimensions default to 1280×1280 input, 300 max detections
-- **Model conversion**: `convert_from_onnx` uses 90% of GPU memory as workspace, enables FP16 if hardware supports it
-- **Engine filename**: GPU-specific (`azaion.cc_{major}.{minor}_sm_{count}.engine`) — allows pre-built engine caching per GPU architecture
+- **Model conversion**: `convert_from_source` uses 90% of GPU memory as workspace; INT8 precision when a calibration cache is supplied, FP16 if the GPU supports it, FP32 otherwise
+- **Engine filename**: GPU-specific with precision suffix (`azaion.cc_{major}.{minor}_sm_{count}.engine` for FP16, `*.int8.engine` for INT8) — prevents cache confusion between precision variants
- Output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
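The precision fallback and filename scheme in the bullets above can be sketched as follows. The helper names `choose_precision` and `engine_filename` are hypothetical; only the fallback order and the filename pattern are taken from the doc.

```python
def choose_precision(calib_cache_path, fp16_supported):
    # Documented fallback order: INT8 only when a calibration cache is
    # supplied, otherwise FP16 if the GPU supports it, otherwise FP32.
    if calib_cache_path is not None:
        return "int8"
    return "fp16" if fp16_supported else "fp32"


def engine_filename(cc_major, cc_minor, sm_count, precision="fp16"):
    # GPU-architecture-specific base name; INT8 engines get a distinct
    # suffix so cached FP16 and INT8 variants never collide.
    base = f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}"
    return f"{base}.int8.engine" if precision == "int8" else f"{base}.engine"
```

Keying the cache on compute capability, SM count, and precision means a pre-built engine is reused only on a matching GPU with a matching precision variant.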
## Caveats
- TensorRT engine files are GPU-architecture-specific and not portable
+- INT8 engine files require a pre-computed calibration cache; the cache is generated offline and uploaded to Loader manually
- Importing `pycuda.autoinit` is required for its side effect (it initializes the CUDA context)
- Dynamic shapes defaulting to 1280×1280 is hardcoded — not configurable
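Given the documented output layout `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`, a downstream consumer might thin results by confidence like this. This is an illustrative sketch; `filter_detections` is not part of the module.

```python
def filter_detections(output, conf_threshold=0.5):
    """Keep only detections at or above the threshold.

    `output` is [batch][detection_index][x1, y1, x2, y2, confidence, class_id];
    confidence sits at index 4 of each detection row.
    """
    return [
        [det for det in dets if det[4] >= conf_threshold]
        for dets in output
    ]
```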