mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 21:46:31 +00:00
2.9 KiB
Component: Inference Engines
Overview
Purpose: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.
Pattern: Strategy pattern — InferenceEngine defines the contract; OnnxEngine and TensorRTEngine are interchangeable implementations.
Upstream: Domain (constants_inf for logging). Downstream: Inference Pipeline (creates and uses engines).
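The strategy pattern described above can be sketched in plain Python. Names mirror the Cython interface; the stub bodies and the default shape are illustrative assumptions, not the real implementation:

```python
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Common contract; concrete engines are interchangeable (strategy pattern)."""

    def __init__(self, model_bytes: bytes, batch_size: int = 1, **kwargs):
        self.model_bytes = model_bytes
        self.batch_size = batch_size

    @abstractmethod
    def get_input_shape(self) -> tuple:
        """Return the (height, width) the model expects."""

    def get_batch_size(self) -> int:
        return self.batch_size

    @abstractmethod
    def run(self, input_data) -> list:
        """Return a list of output tensors."""


class OnnxEngine(InferenceEngine):
    # Stub body for illustration; the real class wraps onnxruntime.
    def get_input_shape(self):
        return (1280, 1280)  # assumed default, see Implementation Details

    def run(self, input_data):
        return []  # placeholder


# The pipeline depends only on the base type, so backends swap freely:
engine: InferenceEngine = OnnxEngine(b"\x00", batch_size=1)
print(engine.get_batch_size())
```

Because the pipeline codes against `InferenceEngine`, a `TensorRTEngine` with the same three methods can be substituted without touching pipeline code.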
Modules
| Module | Role |
|---|---|
| inference_engine | Abstract base class defining get_input_shape, get_batch_size, run |
| onnx_engine | ONNX Runtime implementation (CPU/CUDA) |
| tensorrt_engine | TensorRT implementation (GPU) + ONNX→TensorRT converter |
Internal Interfaces
InferenceEngine (abstract)
```
cdef class InferenceEngine:
    __init__(bytes model_bytes, int batch_size=1, **kwargs)
    cdef tuple get_input_shape()  # -> (height, width)
    cdef int get_batch_size()     # -> batch_size
    cdef run(input_data)          # -> list of output tensors
```
OnnxEngine
```
cdef class OnnxEngine(InferenceEngine):
    # Implements all base methods
    # Provider priority: CUDA > CPU
```
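The CUDA-over-CPU priority amounts to an ordered provider list handed to onnxruntime. The helper below is a sketch; in the real session setup, `available` would come from `onnxruntime.get_available_providers()`:

```python
def select_providers(available):
    """Order execution providers by priority: CUDA first, CPU as fallback.

    `available` mimics the result of onnxruntime.get_available_providers().
    """
    priority = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in priority if p in available]
    # onnxruntime would receive this list via:
    #   onnxruntime.InferenceSession(model_bytes, providers=chosen)
    return chosen


print(select_providers(["CPUExecutionProvider"]))
# -> ['CPUExecutionProvider']
```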
TensorRTEngine
```
cdef class TensorRTEngine(InferenceEngine):
    # Implements all base methods
    @staticmethod get_gpu_memory_bytes(int device_id) -> int
    @staticmethod get_engine_filename(int device_id) -> str
    @staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
```
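The cache filename pattern from Implementation Details can be sketched as a pure function. The real static method derives the compute capability and SM count from the CUDA device id; taking them as parameters here is a simplification:

```python
def get_engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the GPU-specific engine cache filename.

    Pattern: azaion.cc_{major}.{minor}_sm_{count}.engine
    (compute capability + streaming multiprocessor count), so a
    pre-built engine is only reused on a matching GPU architecture.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"


print(get_engine_filename(8, 6, 84))
# -> azaion.cc_8.6_sm_84.engine
```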
External API
None — internal component consumed by Inference Pipeline.
Data Access Patterns
- Model bytes loaded in-memory (provided by caller)
- TensorRT: CUDA device memory allocated at init, async H2D/D2H transfers during inference
- ONNX: managed by onnxruntime internally
Implementation Details
- OnnxEngine: default batch_size=1; loads the model into onnxruntime.InferenceSession
- TensorRTEngine: default batch_size=4; dynamic dimensions default to 1280×1280 input, 300 max detections
- Model conversion: convert_from_onnx uses 90% of GPU memory as workspace and enables FP16 if the hardware supports it
- Engine filename: GPU-specific (azaion.cc_{major}.{minor}_sm_{count}.engine), allowing pre-built engine caching per GPU architecture
- Output format: [batch][detection_index][x1, y1, x2, y2, confidence, class_id]
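Given the [batch][detection_index][x1, y1, x2, y2, confidence, class_id] layout, a downstream consumer can filter by confidence like this. The function name and threshold are illustrative, not part of the component:

```python
def filter_detections(batch_output, conf_threshold=0.5):
    """Keep detections at or above the confidence threshold, per batch item.

    Each detection is [x1, y1, x2, y2, confidence, class_id],
    so index 4 is the confidence score.
    """
    kept = []
    for detections in batch_output:  # one entry per batch item
        kept.append([d for d in detections if d[4] >= conf_threshold])
    return kept


output = [[[10.0, 20.0, 110.0, 220.0, 0.9, 0],
           [5.0, 5.0, 50.0, 50.0, 0.2, 1]]]
print(filter_detections(output))  # keeps only the 0.9-confidence box
```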
Caveats
- TensorRT engine files are GPU-architecture-specific and not portable
- The pycuda.autoinit import is required for its side effect (it initializes the CUDA context)
- Dynamic shapes defaulting to 1280×1280 are hardcoded, not configurable
Dependency Graph
```mermaid
graph TD
    onnx_engine --> inference_engine
    onnx_engine --> constants_inf
    tensorrt_engine --> inference_engine
    tensorrt_engine --> constants_inf
```
Logging Strategy
Logs model metadata at init and conversion progress/errors via constants_inf.log/logerror.