mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 09:36:32 +00:00
[AZ-180] Update module and component docs for Jetson/INT8 changes
Made-with: Cursor
This commit is contained in:
@@ -2,7 +2,7 @@
 
 ## Overview
 
-**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.
+**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion with FP16 and INT8 precision support.
 
 **Pattern**: Strategy pattern — `InferenceEngine` defines the contract; `OnnxEngine` and `TensorRTEngine` are interchangeable implementations.
 
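The Strategy-pattern layout described above can be sketched in plain Python. A minimal sketch: the class names follow the doc, but the `infer` signature and `run_pipeline` helper are assumptions of this illustration, not the component's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class InferenceEngine(ABC):
    # Contract shared by both backends (the doc's Strategy pattern).
    @abstractmethod
    def infer(self, batch: Any) -> List[list]:
        ...


class OnnxEngine(InferenceEngine):
    # Stand-in for the ONNX Runtime backend; the real implementation
    # wraps an onnxruntime.InferenceSession.
    def infer(self, batch: Any) -> List[list]:
        return [[] for _ in batch]  # one (empty) detection list per image


class TensorRTEngine(InferenceEngine):
    # Stand-in for the TensorRT backend.
    def infer(self, batch: Any) -> List[list]:
        return [[] for _ in batch]


def run_pipeline(engine: InferenceEngine, batch: list) -> List[list]:
    # The caller depends only on the abstract contract, so the two
    # engine implementations are interchangeable.
    return engine.infer(batch)
```

Because the pipeline only sees `InferenceEngine`, swapping backends is a construction-time decision with no changes to calling code.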
@@ -43,8 +43,8 @@ cdef class OnnxEngine(InferenceEngine):
 
 cdef class TensorRTEngine(InferenceEngine):
     # Implements all base methods
     @staticmethod get_gpu_memory_bytes(int device_id) -> int
-    @staticmethod get_engine_filename(int device_id) -> str
-    @staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
+    @staticmethod get_engine_filename(str precision="fp16") -> str  # "fp16" or "int8"
+    @staticmethod convert_from_source(bytes onnx_model, str calib_cache_path=None) -> bytes or None
 ```
 
 ## External API
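The precision-suffixed filename scheme that `get_engine_filename` produces (per the naming pattern in this doc) can be sketched as follows. Passing the compute capability and SM count as explicit parameters is an assumption of this sketch; the real method presumably queries the GPU itself.

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int,
                    precision: str = "fp16") -> str:
    """Build the per-GPU engine cache filename described in the doc.

    cc_major/cc_minor: CUDA compute capability of the target GPU.
    sm_count: number of streaming multiprocessors.
    precision: "fp16" or "int8".
    """
    if precision not in ("fp16", "int8"):
        raise ValueError("precision must be 'fp16' or 'int8'")
    base = f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}"
    # INT8 engines get a distinct ".int8" suffix so they never collide
    # with FP16 engine caches for the same GPU architecture.
    return f"{base}.int8.engine" if precision == "int8" else f"{base}.engine"
```

Encoding both the GPU architecture and the precision in the filename is what lets pre-built engines be cached safely: an FP16 engine for one GPU can never be picked up as an INT8 engine, or as an engine for a different architecture.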
@@ -61,13 +61,14 @@ None — internal component consumed by Inference Pipeline.
 
 - **OnnxEngine**: default batch_size=1; loads the model into an `onnxruntime.InferenceSession`
 - **TensorRTEngine**: default batch_size=4; dynamic dimensions default to a 1280×1280 input and 300 max detections
-- **Model conversion**: `convert_from_onnx` uses 90% of GPU memory as workspace and enables FP16 if the hardware supports it
-- **Engine filename**: GPU-specific (`azaion.cc_{major}.{minor}_sm_{count}.engine`) — allows pre-built engine caching per GPU architecture
+- **Model conversion**: `convert_from_source` uses 90% of GPU memory as workspace; INT8 precision when a calibration cache is supplied, FP16 if the GPU supports it, FP32 otherwise
+- **Engine filename**: GPU-specific with a precision suffix (`azaion.cc_{major}.{minor}_sm_{count}.engine` for FP16, `*.int8.engine` for INT8) — prevents cache confusion between precision variants
 - Output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
 
 ## Caveats
 
 - TensorRT engine files are GPU-architecture-specific and not portable
+- INT8 engine files require a pre-computed calibration cache; the cache is generated offline and uploaded to the Loader manually
 - `pycuda.autoinit` must be imported for its side effect of initializing the CUDA context
 - The 1280×1280 default for dynamic shapes is hardcoded and not configurable
 
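The precision fallback chain that the conversion bullets describe (INT8 when a calibration cache is supplied, FP16 when the GPU supports it, FP32 otherwise) can be sketched as a small helper. The function name and signature are illustrative; `fp16_supported` stands in for the TensorRT hardware-capability query.

```python
from typing import Optional


def select_precision(calib_cache_path: Optional[str],
                     fp16_supported: bool) -> str:
    # Fallback chain from the doc: INT8 > FP16 > FP32.
    if calib_cache_path is not None:
        return "int8"   # calibration cache present -> quantized engine
    if fp16_supported:
        return "fp16"   # hardware supports half precision
    return "fp32"       # safe default when neither applies
```

Note that INT8 is gated purely on the presence of the calibration cache, which matches the caveat above: without the offline-generated cache, conversion silently falls back to FP16 or FP32.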