Mirror of https://github.com/azaion/detections.git (synced 2026-04-22 22:06:32 +00:00)
2.2 KiB
# Module: tensorrt_engine

## Purpose

TensorRT-based inference engine — high-performance GPU inference with CUDA memory management and ONNX-to-TensorRT model conversion.

## Public Interface

### Class: `TensorRTEngine` (extends `InferenceEngine`)
| Method | Signature | Description |
|---|---|---|
| `__init__` | `(bytes model_bytes, int batch_size=4, **kwargs)` | Deserializes TensorRT engine from bytes, allocates CUDA input/output memory, creates execution context and stream |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from input tensor shape |
| `get_batch_size` | `() -> int` | Returns batch size |
| `run` | `(input_data) -> list` | Async H2D copy → execute → D2H copy, returns output as numpy array |
| `get_gpu_memory_bytes` | `(int device_id) -> int` | Static. Returns total GPU memory in bytes (default 2 GB if unavailable) |
| `get_engine_filename` | `(int device_id) -> str` | Static. Returns engine filename with compute capability and SM count: `azaion.cc_{major}.{minor}_sm_{count}.engine` |
| `convert_from_onnx` | `(bytes onnx_model) -> bytes or None` | Static. Converts ONNX model to TensorRT serialized engine. Uses 90% of GPU memory as workspace. Enables FP16 if supported. |
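The `get_engine_filename` naming scheme can be sketched with plain string formatting. The helper name and the compute-capability/SM values below are illustrative, not taken from the module:

```python
def format_engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    # Hypothetical helper mirroring the documented naming scheme:
    # azaion.cc_{major}.{minor}_sm_{count}.engine
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"

# e.g. a GPU with compute capability 8.6 and 84 SMs
print(format_engine_filename(8, 6, 84))  # azaion.cc_8.6_sm_84.engine
```

Encoding the GPU's compute capability and SM count in the filename lets a cached engine be reused only on hardware it was built for, since serialized TensorRT engines are not portable across GPU architectures.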
## Internal Logic

- Input shape defaults to 1280×1280 for dynamic dimensions.
- Output shape defaults to 300 max detections × 6 values (x1, y1, x2, y2, conf, cls) for dynamic dimensions.
- `run` uses async CUDA memory transfers with stream synchronization.
- `convert_from_onnx` uses explicit batch mode and configures FP16 precision when the GPU supports it.
- Default batch size is 4 (vs. `OnnxEngine`'s 1).
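As a rough illustration of the default output layout (not the module's actual post-processing), a flat 300 × 6 buffer such as the one returned by the D2H copy can be reshaped and filtered with numpy:

```python
import numpy as np

MAX_DETECTIONS = 300   # default max detections for dynamic dims
VALUES_PER_DET = 6     # x1, y1, x2, y2, conf, cls

# Simulated flat output buffer, as it might look after the D2H copy
flat = np.zeros(MAX_DETECTIONS * VALUES_PER_DET, dtype=np.float32)
flat[:6] = [10.0, 20.0, 110.0, 220.0, 0.9, 2.0]  # one fake detection

dets = flat.reshape(MAX_DETECTIONS, VALUES_PER_DET)
boxes, conf, cls = dets[:, :4], dets[:, 4], dets[:, 5]

# Keep only detections above a confidence threshold
keep = dets[conf > 0.5]
print(keep.shape)  # (1, 6)
```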
## Dependencies

- External: `tensorrt`, `pycuda.driver`, `pycuda.autoinit`, `pynvml`, `numpy`
- Internal: `inference_engine` (base class), `constants_inf` (logging)
## Consumers

- `inference` — instantiated when a compatible NVIDIA GPU is found; also calls `convert_from_onnx` and `get_engine_filename`.
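A hedged sketch of the consumer's selection flow, using a stub in place of the real engine. The class, function, and parameter names here are illustrative only and do not reflect the actual `inference` module API:

```python
class StubTensorRTEngine:
    """Stand-in mirroring the documented constructor and batch-size contract."""
    def __init__(self, model_bytes: bytes, batch_size: int = 4, **kwargs):
        self.model_bytes = model_bytes
        self.batch_size = batch_size

    def get_batch_size(self) -> int:
        return self.batch_size

def build_engine(gpu_available: bool, onnx_model: bytes):
    # Illustrative selection logic: take the TensorRT path only when a
    # compatible NVIDIA GPU exists, otherwise fall back to another engine.
    if not gpu_available:
        return None
    engine_bytes = onnx_model  # real code would call convert_from_onnx here
    return StubTensorRTEngine(engine_bytes)

engine = build_engine(True, b"onnx-bytes")
print(engine.get_batch_size())  # 4
```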
## Data Models

None (wraps TensorRT runtime objects).
## Configuration

- Engine filename is GPU-specific (compute capability + SM count).
- Workspace memory is 90% of available GPU memory.
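A minimal sketch of the workspace sizing rule, assuming the documented 2 GB fallback applies when the GPU memory query fails. The function and constant names are illustrative:

```python
from typing import Optional

DEFAULT_GPU_MEMORY_BYTES = 2 * 1024**3  # documented 2 GB fallback
WORKSPACE_FRACTION = 0.9                # 90% of GPU memory as workspace

def workspace_bytes(total_gpu_memory: Optional[int]) -> int:
    # Fall back to 2 GB when the GPU memory query is unavailable
    total = total_gpu_memory if total_gpu_memory is not None else DEFAULT_GPU_MEMORY_BYTES
    return int(total * WORKSPACE_FRACTION)

print(workspace_bytes(None))         # 1932735283  (90% of 2 GiB)
print(workspace_bytes(8 * 1024**3))  # 7730941132  (90% of 8 GiB)
```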
## External Integrations

None directly — model bytes are provided by the caller.

## Security

None.

## Tests

None found.