detections/_docs/02_document/components/02_inference_engines/description.md

# Component: Inference Engines

## Overview

**Purpose:** Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.

**Pattern:** Strategy pattern: `InferenceEngine` defines the contract; `OnnxEngine` and `TensorRTEngine` are interchangeable implementations.

**Upstream:** Domain (`constants_inf` for logging). **Downstream:** Inference Pipeline (creates and uses engines).

## Modules

| Module | Role |
|---|---|
| `inference_engine` | Abstract base class defining `get_input_shape`, `get_batch_size`, `run` |
| `onnx_engine` | ONNX Runtime implementation (CPU/CUDA) |
| `tensorrt_engine` | TensorRT implementation (GPU) + ONNX→TensorRT converter |

## Internal Interfaces

### `InferenceEngine` (abstract)

```cython
cdef class InferenceEngine:
    __init__(bytes model_bytes, int batch_size=1, **kwargs)
    cdef tuple get_input_shape()       # -> (height, width)
    cdef int get_batch_size()          # -> batch_size
    cdef run(input_data)               # -> list of output tensors
```
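In pure-Python terms, the Strategy contract and an interchangeable implementation might look like the sketch below. `ToyEngine` is a hypothetical stand-in for `OnnxEngine`/`TensorRTEngine`, used only to show that callers depend on the abstract interface, not a concrete backend:

```python
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Pure-Python mirror of the Cython contract (illustrative only)."""

    def __init__(self, model_bytes: bytes, batch_size: int = 1, **kwargs):
        self.model_bytes = model_bytes
        self.batch_size = batch_size

    @abstractmethod
    def get_input_shape(self) -> tuple:
        """Return the (height, width) the model expects."""

    def get_batch_size(self) -> int:
        return self.batch_size

    @abstractmethod
    def run(self, input_data) -> list:
        """Return a list of output tensors."""


class ToyEngine(InferenceEngine):
    """Hypothetical concrete strategy, standing in for a real backend."""

    def get_input_shape(self) -> tuple:
        return (1280, 1280)

    def run(self, input_data) -> list:
        # Echo the input back as a single "output tensor".
        return [input_data]


engine: InferenceEngine = ToyEngine(b"model-bytes", batch_size=4)
print(engine.get_input_shape())  # (1280, 1280)
print(engine.get_batch_size())   # 4
```

Because the pipeline only calls the three abstract methods, swapping `ToyEngine` for any other subclass requires no caller changes.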

### `OnnxEngine`

```cython
cdef class OnnxEngine(InferenceEngine):
    # Implements all base methods
    # Provider priority: CUDA > CPU
```

### `TensorRTEngine`

```cython
cdef class TensorRTEngine(InferenceEngine):
    # Implements all base methods
    @staticmethod get_gpu_memory_bytes(int device_id) -> int
    @staticmethod get_engine_filename(int device_id) -> str
    @staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
```

## External API

None — internal component consumed by Inference Pipeline.

## Data Access Patterns

- Model bytes are loaded in-memory (provided by the caller)
- TensorRT: CUDA device memory is allocated at init; asynchronous H2D/D2H transfers occur during inference
- ONNX: memory is managed internally by onnxruntime
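Since engines receive raw model bytes rather than file paths, the caller owns all file I/O. A minimal sketch (the file name and contents are placeholders):

```python
import os
import tempfile

# Stand-in for a real .onnx file on disk.
with tempfile.NamedTemporaryFile(suffix=".onnx", delete=False) as f:
    f.write(b"onnx-bytes")
    model_path = f.name

# The caller reads the model once; engines work purely in-memory.
with open(model_path, "rb") as f:
    model_bytes = f.read()

os.unlink(model_path)
print(len(model_bytes))  # 10
```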

## Implementation Details

- `OnnxEngine`: default `batch_size=1`; loads the model into an `onnxruntime.InferenceSession`
- `TensorRTEngine`: default `batch_size=4`; dynamic dimensions default to a 1280×1280 input and 300 max detections
- Model conversion: `convert_from_onnx` uses 90% of GPU memory as workspace and enables FP16 if the hardware supports it
- Engine filename: GPU-specific (`azaion.cc_{major}.{minor}_sm_{count}.engine`), which allows pre-built engine caching per GPU architecture
- Output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
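The naming, sizing, and output conventions above can be expressed as small pure-Python helpers. These are illustrative sketches of the documented rules, not the actual `tensorrt_engine` code:

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    # GPU-specific cache name: compute capability + SM count.
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"


def workspace_bytes(total_gpu_memory: int, fraction: float = 0.9) -> int:
    # convert_from_onnx reserves ~90% of GPU memory as builder workspace.
    return int(total_gpu_memory * fraction)


def iter_detections(outputs):
    # Flatten [batch][detection_index][x1, y1, x2, y2, confidence, class_id].
    for batch_idx, detections in enumerate(outputs):
        for x1, y1, x2, y2, confidence, class_id in detections:
            yield batch_idx, (x1, y1, x2, y2), confidence, int(class_id)


# e.g. an RTX 3080-class GPU: compute capability 8.6, 28 SMs (values assumed)
print(engine_filename(8, 6, 28))     # azaion.cc_8.6_sm_28.engine
print(workspace_bytes(8 * 1024**3))  # 7730941132
```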

## Caveats

- TensorRT engine files are GPU-architecture-specific and are not portable across devices
- Importing `pycuda.autoinit` is required for its side effect of initializing the CUDA context
- The 1280×1280 default for dynamic input shapes is hardcoded and not configurable

## Dependency Graph

```mermaid
graph TD
    onnx_engine --> inference_engine
    onnx_engine --> constants_inf
    tensorrt_engine --> inference_engine
    tensorrt_engine --> constants_inf
```

## Logging Strategy

Model metadata is logged at init, and conversion progress and errors are logged via `constants_inf.log` and `constants_inf.logerror`.