detections/_docs/02_document/modules/tensorrt_engine.md

Module: tensorrt_engine

Purpose

TensorRT-based inference engine — high-performance GPU inference with CUDA memory management and ONNX-to-TensorRT model conversion.

Public Interface

Class: TensorRTEngine (extends InferenceEngine)

| Method | Signature | Description |
| --- | --- | --- |
| `__init__` | `(bytes model_bytes, int batch_size=4, **kwargs)` | Deserializes the TensorRT engine from bytes, allocates CUDA input/output memory, creates the execution context and stream |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from the input tensor shape |
| `get_batch_size` | `() -> int` | Returns the batch size |
| `run` | `(input_data) -> list` | Async H2D copy → execute → D2H copy; returns output as a numpy array |
| `get_gpu_memory_bytes` | `(int device_id) -> int` | Static. Returns total GPU memory in bytes (defaults to 2 GB if unavailable) |
| `get_engine_filename` | `(int device_id) -> str` | Static. Returns an engine filename encoding compute capability and SM count: `azaion.cc_{major}.{minor}_sm_{count}.engine` |
| `convert_from_onnx` | `(bytes onnx_model) -> bytes or None` | Static. Converts an ONNX model to a serialized TensorRT engine. Uses 90% of GPU memory as workspace; enables FP16 if supported |
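A minimal sketch of how the GPU-specific engine filename might be composed. The helper name and example GPU are assumptions; the real static method queries the device's compute capability and SM count (via pycuda/pynvml) rather than taking them as arguments.

```python
def build_engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Compose the GPU-specific engine filename from compute capability
    and streaming-multiprocessor count (format from the table above).
    Hypothetical helper; the real method derives these values from device_id."""
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"

# Example: an RTX 3080 reports compute capability 8.6 with 68 SMs
print(build_engine_filename(8, 6, 68))  # azaion.cc_8.6_sm_68.engine
```

Encoding both compute capability and SM count in the filename means a cached engine is never loaded on a different GPU model, where the serialized kernels would be invalid.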

Internal Logic

  • Input shape defaults to 1280×1280 for dynamic dimensions.
  • Output shape defaults to 300 max detections × 6 values (x1, y1, x2, y2, conf, cls) for dynamic dimensions.
  • run uses async CUDA memory transfers with stream synchronization.
  • convert_from_onnx uses explicit batch mode, configures FP16 precision when GPU supports it.
  • Default batch size is 4 (vs OnnxEngine's 1).
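The dynamic-dimension defaults above can be sketched as a small shape-resolution helper. Names and structure are illustrative; the real engine reads tensor shapes from the TensorRT bindings and substitutes these defaults only for dynamic (`-1`) dimensions.

```python
INPUT_DEFAULT = (1280, 1280)   # default H×W when input dims are dynamic
OUTPUT_DEFAULT = (300, 6)      # max detections × (x1, y1, x2, y2, conf, cls)

def resolve_shape(shape, defaults):
    """Replace dynamic (-1) dimensions with the module's documented defaults."""
    return tuple(d if d > 0 else fb for d, fb in zip(shape, defaults))

print(resolve_shape((-1, -1), INPUT_DEFAULT))   # (1280, 1280)
print(resolve_shape((-1, 6), OUTPUT_DEFAULT))   # (300, 6)
```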

Dependencies

  • External: tensorrt, pycuda.driver, pycuda.autoinit, pynvml, numpy
  • Internal: inference_engine (base class), constants_inf (logging)

Consumers

  • inference — instantiated when compatible NVIDIA GPU is found; also calls convert_from_onnx and get_engine_filename

Data Models

None (wraps TensorRT runtime objects).

Configuration

  • Engine filename is GPU-specific (compute capability + SM count).
  • Workspace memory is 90% of available GPU memory.
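A hedged sketch of the workspace sizing described above, combining the 90% rule with the 2 GB fallback from `get_gpu_memory_bytes`. The function name is hypothetical; the real code passes the result to the TensorRT builder configuration.

```python
DEFAULT_GPU_MEMORY = 2 * 1024 ** 3  # 2 GiB fallback when the memory query fails

def workspace_bytes(total_gpu_memory=None):
    """Return the builder workspace size: 90% of total GPU memory,
    falling back to the documented 2 GiB default when the query is unavailable."""
    total = total_gpu_memory if total_gpu_memory else DEFAULT_GPU_MEMORY
    return int(total * 0.9)

print(workspace_bytes(8 * 1024 ** 3))  # 90% of 8 GiB
```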

External Integrations

None directly — model bytes provided by caller.

Security

None.

Tests

None found.