# Module: onnx_engine

## Purpose

ONNX Runtime-based inference engine, used as a CPU/CUDA fallback when TensorRT is unavailable.

## Public Interface

### Class: OnnxEngine (extends InferenceEngine)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(model_bytes: bytes, batch_size: int = 1, **kwargs)` | Loads the ONNX model from bytes and creates an `InferenceSession` with CUDA-before-CPU provider priority. Reads the input shape and batch size from model metadata. |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from the input tensor shape. |
| `get_batch_size` | `() -> int` | Returns the batch size (from the model if not dynamic, otherwise from the constructor). |
| `run` | `(input_data) -> list` | Runs session inference and returns the output tensors. |

## Internal Logic

- Provider order: `["CUDAExecutionProvider", "CPUExecutionProvider"]`; ONNX Runtime selects the best available.
- If the model's batch dimension is dynamic (`-1`), the constructor's `batch_size` parameter is used.
- Logs the model's input metadata and custom metadata map at init.

## Dependencies

- **External**: `onnxruntime`
- **Internal**: `inference_engine` (base class), `constants_inf` (logging)

## Consumers

- `inference`: instantiated when no compatible NVIDIA GPU is found.

## Data Models

None (wraps `onnxruntime.InferenceSession`).

## Configuration

None.

## External Integrations

None directly; model bytes are provided by the caller (loaded via `loader_http_client`).

## Security

None.

## Tests

None found.
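## Usage Sketch

A minimal sketch of the class described above, assuming an NCHW input tensor. The helper names (`parse_input_shape`, `resolve_batch_size`) and the class body are illustrative, not the module's actual internals; the real class extends `InferenceEngine` and also logs input metadata and the custom metadata map at init.

```python
# Illustrative sketch only; helper names are hypothetical, not the module's real internals.
from typing import Sequence, Tuple, Union

Dim = Union[int, str, None]  # ONNX dims may be static ints, symbolic names, or None


def parse_input_shape(shape: Sequence[Dim]) -> Tuple[Dim, Dim]:
    """Return (height, width), assuming an NCHW input tensor."""
    if len(shape) != 4:
        raise ValueError(f"expected NCHW input, got shape {shape!r}")
    return shape[2], shape[3]


def resolve_batch_size(batch_dim: Dim, fallback: int) -> int:
    """Use the model's batch dim when static; otherwise the constructor's value."""
    # Dynamic batch dims appear as -1, None, or a symbolic string such as "N".
    if isinstance(batch_dim, int) and batch_dim > 0:
        return batch_dim
    return fallback


class OnnxEngine:
    """Minimal sketch; the real class extends InferenceEngine."""

    PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]

    def __init__(self, model_bytes: bytes, batch_size: int = 1, **kwargs):
        # Deferred import so the sketch is loadable without onnxruntime installed.
        import onnxruntime as ort

        # InferenceSession accepts serialized model bytes directly.
        self.session = ort.InferenceSession(model_bytes, providers=self.PROVIDERS)
        inp = self.session.get_inputs()[0]
        self._input_name = inp.name
        self._input_shape = parse_input_shape(inp.shape)
        self._batch_size = resolve_batch_size(inp.shape[0], batch_size)

    def get_input_shape(self) -> tuple:
        return self._input_shape

    def get_batch_size(self) -> int:
        return self._batch_size

    def run(self, input_data) -> list:
        # Passing None as output names returns all model outputs.
        return self.session.run(None, {self._input_name: input_data})
```

The dynamic-batch rule is the part worth noting: a static positive batch dim wins, while `-1`, `None`, or a symbolic dim falls back to the constructor's `batch_size`.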