mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 22:06:32 +00:00
Module: tensorrt_engine
Purpose
TensorRT-based inference engine — high-performance GPU inference with CUDA memory management and ONNX-to-TensorRT model conversion. Supports FP16 and INT8 precision; INT8 is used when a pre-computed calibration cache is supplied.
Public Interface
Class: TensorRTEngine (extends InferenceEngine)
| Method | Signature | Description |
|---|---|---|
| `__init__` | `(bytes model_bytes, int batch_size=4, **kwargs)` | Deserializes the TensorRT engine from bytes, allocates CUDA input/output memory, and creates the execution context and stream |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from the input tensor shape |
| `get_batch_size` | `() -> int` | Returns the batch size |
| `run` | `(input_data) -> list` | Async H2D copy → execute → D2H copy; returns output as a numpy array |
| `get_gpu_memory_bytes` | `(int device_id) -> int` | Static. Returns total GPU memory in bytes (defaults to 2 GB if unavailable) |
| `get_engine_filename` | `(str precision="fp16") -> str` | Static. Returns an engine filename encoding compute capability, SM count, and precision suffix: `azaion.cc_{major}.{minor}_sm_{count}.engine` (FP16) or `azaion.cc_{major}.{minor}_sm_{count}.int8.engine` (INT8) |
| `convert_from_source` | `(bytes onnx_model, str calib_cache_path=None) -> bytes or None` | Static. Converts an ONNX model to a serialized TensorRT engine. Uses INT8 when `calib_cache_path` is provided and the file exists; falls back to FP16 if the GPU supports it, FP32 otherwise |
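As a rough illustration of the filename convention used by `get_engine_filename` (a sketch assuming the parameter names shown; the module's exact format string may differ):

```python
def get_engine_filename(cc_major: int, cc_minor: int, sm_count: int,
                        precision: str = "fp16") -> str:
    """Sketch of the engine-filename convention: encodes GPU compute
    capability and SM count, with an .int8 suffix only for INT8 engines."""
    base = f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}"
    return f"{base}.int8.engine" if precision == "int8" else f"{base}.engine"
```

For example, a compute-capability 8.7 GPU with 8 SMs would map to `azaion.cc_8.7_sm_8.engine` for FP16 and `azaion.cc_8.7_sm_8.int8.engine` for INT8, so engines built at different precisions never collide in the cache.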
Helper Class: _CacheCalibrator (module-private)
Implements trt.IInt8EntropyCalibrator2. Loads a pre-generated INT8 calibration cache from disk and supplies it to the TensorRT builder. Used only when convert_from_source is called with a valid calib_cache_path.
Internal Logic
- Input shape defaults to 1280×1280 for dynamic dimensions.
- Output shape defaults to 300 max detections × 6 values (x1, y1, x2, y2, conf, cls) for dynamic dimensions.
- `run` uses async CUDA memory transfers with stream synchronization.
- `convert_from_source` uses explicit batch mode:
  - If `calib_cache_path` is a valid file path → sets `BuilderFlag.INT8` and assigns `_CacheCalibrator` as the calibrator
  - Else if GPU has fast FP16 → sets `BuilderFlag.FP16`
  - Else → FP32 (no flag)
- Engine filenames encode a precision suffix (`*.int8.engine` vs `*.engine`) to prevent INT8/FP16 engine cache confusion across conversions.
Dependencies
- External: `tensorrt`, `pycuda.driver`, `pycuda.autoinit`, `pynvml`, `numpy`, `os`
- Internal: `inference_engine` (base class), `constants_inf` (logging)
Consumers
`inference` — instantiated when a compatible NVIDIA GPU is found; also calls `convert_from_source` and `get_engine_filename`
Data Models
None (wraps TensorRT runtime objects).
Configuration
- Engine filename encodes GPU compute capability + SM count + precision suffix.
- Workspace memory is 90% of available GPU memory.
- INT8 calibration cache path is supplied at conversion time (downloaded by `inference.init_ai`).
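The workspace sizing rule above can be sketched as follows (the 90% factor and the 2 GB fallback are from this document; the function name is illustrative):

```python
def workspace_bytes(total_gpu_memory_bytes=None):
    """Sketch: budget 90% of total GPU memory for the TensorRT builder
    workspace, using the 2 GB default when the memory query fails."""
    DEFAULT_TOTAL = 2 * 1024 ** 3  # 2 GB fallback, as in get_gpu_memory_bytes
    total = total_gpu_memory_bytes or DEFAULT_TOTAL
    return int(total * 0.9)
```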
External Integrations
None directly — model bytes provided by caller.
Security
None.
Tests
`tests/test_az180_jetson_int8.py` — unit tests for the INT8 flag (AC-3) and FP16 fallback (AC-4); skipped when TensorRT is not available (a GPU environment is required).