# Module: inference/onnx_engine

## Purpose

Defines the abstract `InferenceEngine` base class and the `OnnxEngine` implementation for running ONNX model inference with GPU acceleration.

## Public Interface

### InferenceEngine (ABC)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(model_path: str, batch_size: int = 1, **kwargs)` | Abstract constructor |
| `get_input_shape` | `() -> Tuple[int, int]` | Returns (height, width) of the model input |
| `get_batch_size` | `() -> int` | Returns the batch size |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Runs inference and returns the output tensors |

### OnnxEngine (extends InferenceEngine)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(model_bytes, batch_size: int = 1, **kwargs)` | Loads an ONNX model from bytes and creates an `InferenceSession` with CUDA and CPU providers |
| `get_input_shape` | `() -> Tuple[int, int]` | Returns (height, width) from the model input shape |
| `get_batch_size` | `() -> int` | Returns the batch size (from the model shape, or else the constructor argument) |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Runs the ONNX inference session |

## Internal Logic

- Uses ONNX Runtime with `CUDAExecutionProvider` (primary) and `CPUExecutionProvider` (fallback).
- Reads model metadata to extract class names from the custom metadata map.
- If the model input shape has a fixed batch dimension (not -1), it overrides the constructor `batch_size`.

## Dependencies

- `onnxruntime` (external): ONNX inference runtime
- `numpy` (external)
- `abc`, `typing` (stdlib)

## Consumers

- `inference/inference`
- `inference/tensorrt_engine` (inherits `InferenceEngine`)
- `train` (imports `OnnxEngine`)

## Data Models

None.

## Configuration

None.

## External Integrations

- ONNX Runtime GPU execution (CUDA)

## Security

None.

## Tests

None.
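## Appendix: Interface Sketch

The interface and internal logic described above can be sketched as follows. This is a minimal, hypothetical reconstruction, not the module's actual source: the `model_path="<in-memory>"` placeholder, the attribute names (`_input_shape`, `_input_name`, `metadata`), and the deferred `onnxruntime` import are assumptions made for illustration. Only documented `onnxruntime` calls (`InferenceSession`, `get_inputs`, `get_modelmeta`, `run`) are used.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

import numpy as np


class InferenceEngine(ABC):
    """Abstract base class for model inference backends."""

    def __init__(self, model_path: str, batch_size: int = 1, **kwargs):
        self.model_path = model_path
        self.batch_size = batch_size

    @abstractmethod
    def get_input_shape(self) -> Tuple[int, int]:
        """Return (height, width) of the model input."""

    @abstractmethod
    def get_batch_size(self) -> int:
        """Return the effective batch size."""

    @abstractmethod
    def run(self, input_data: np.ndarray) -> List[np.ndarray]:
        """Run inference and return the output tensors."""


class OnnxEngine(InferenceEngine):
    """ONNX Runtime backend with CUDA (primary) and CPU (fallback) providers."""

    def __init__(self, model_bytes: bytes, batch_size: int = 1, **kwargs):
        # Placeholder path: the model is loaded from bytes, not from disk.
        super().__init__(model_path="<in-memory>", batch_size=batch_size, **kwargs)
        # Deferred import so the module can be imported without the runtime installed.
        import onnxruntime as ort

        self.session = ort.InferenceSession(
            model_bytes,
            providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )
        inp = self.session.get_inputs()[0]
        # Shape is typically (N, C, H, W); a dynamic batch dim shows up as -1 or a string.
        n, _, h, w = inp.shape
        self._input_shape = (h, w)
        self._input_name = inp.name
        if isinstance(n, int) and n > 0:
            # A fixed batch dimension in the model overrides the constructor argument.
            self.batch_size = n
        # Custom metadata map; class names may be stored here by the exporter.
        self.metadata = self.session.get_modelmeta().custom_metadata_map

    def get_input_shape(self) -> Tuple[int, int]:
        return self._input_shape

    def get_batch_size(self) -> int:
        return self.batch_size

    def run(self, input_data: np.ndarray) -> List[np.ndarray]:
        # None requests all model outputs, in declaration order.
        return self.session.run(None, {self._input_name: input_data})
```

The deferred `onnxruntime` import keeps the ABC usable by consumers such as `inference/tensorrt_engine`, which inherit `InferenceEngine` without needing the ONNX runtime present.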