# Module: onnx_engine

## Purpose

ONNX Runtime-based inference engine, used as a CPU/CUDA fallback when TensorRT is unavailable.

## Public Interface

### Class: OnnxEngine (extends InferenceEngine)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(model_bytes: bytes, batch_size: int = 1, **kwargs)` | Loads the ONNX model from bytes and creates an `InferenceSession` with CUDA-before-CPU provider priority. Reads the input shape and batch size from model metadata. |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from the input tensor shape. |
| `get_batch_size` | `() -> int` | Returns the batch size (from the model if not dynamic, otherwise from the constructor). |
| `run` | `(input_data) -> list` | Runs session inference and returns the output tensors. |

## Internal Logic

- Provider order: `["CUDAExecutionProvider", "CPUExecutionProvider"]`; ONNX Runtime selects the best available.
- If the model's batch dimension is dynamic (`-1`), the constructor's `batch_size` parameter is used.
- Logs the model's input metadata and custom metadata map at init.

## Dependencies

- **External**: `onnxruntime`
- **Internal**: `inference_engine` (base class), `constants_inf` (logging)

## Consumers

- `inference`: instantiated when no compatible NVIDIA GPU is found.

## Data Models

None (wraps `onnxruntime.InferenceSession`).

## Configuration

None.

## External Integrations

None directly; model bytes are provided by the caller (loaded via `loader_http_client`).

## Security

None.

## Tests

None found.
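## Usage Sketch

A minimal sketch of the class described above, assuming an NCHW input tensor. The helper names (`parse_input_shape`, `resolve_batch_size`) and the class body are illustrative, not the module's actual internals; the real class extends `InferenceEngine` and also logs input metadata and the custom metadata map at init.

```python
# Illustrative sketch only; helper names are hypothetical, not the module's real internals.
from typing import Sequence, Tuple, Union

Dim = Union[int, str, None]  # ONNX dims may be static ints, symbolic names, or None


def parse_input_shape(shape: Sequence[Dim]) -> Tuple[Dim, Dim]:
    """Return (height, width), assuming an NCHW input tensor."""
    if len(shape) != 4:
        raise ValueError(f"expected NCHW input, got shape {shape!r}")
    return shape[2], shape[3]


def resolve_batch_size(batch_dim: Dim, fallback: int) -> int:
    """Use the model's batch dim when static; otherwise the constructor's value."""
    # Dynamic batch dims appear as -1, None, or a symbolic string such as "N".
    if isinstance(batch_dim, int) and batch_dim > 0:
        return batch_dim
    return fallback


class OnnxEngine:
    """Minimal sketch; the real class extends InferenceEngine."""

    PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]

    def __init__(self, model_bytes: bytes, batch_size: int = 1, **kwargs):
        # Deferred import so the sketch is loadable without onnxruntime installed.
        import onnxruntime as ort

        # InferenceSession accepts serialized model bytes directly.
        self.session = ort.InferenceSession(model_bytes, providers=self.PROVIDERS)
        inp = self.session.get_inputs()[0]
        self._input_name = inp.name
        self._input_shape = parse_input_shape(inp.shape)
        self._batch_size = resolve_batch_size(inp.shape[0], batch_size)

    def get_input_shape(self) -> tuple:
        return self._input_shape

    def get_batch_size(self) -> int:
        return self._batch_size

    def run(self, input_data) -> list:
        # Passing None as output names returns all model outputs.
        return self.session.run(None, {self._input_name: input_data})
```

The dynamic-batch rule is the part worth noting: a static positive batch dim wins, while `-1`, `None`, or a symbolic dim falls back to the constructor's `batch_size`.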