# Module: onnx_engine
## Purpose
ONNX Runtime-based inference engine; serves as the CUDA/CPU fallback when TensorRT is unavailable.
## Public Interface
### Class: OnnxEngine (extends InferenceEngine)
| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(bytes model_bytes, int batch_size=1, **kwargs)` | Loads ONNX model from bytes, creates InferenceSession with CUDA > CPU provider priority. Reads input shape and batch size from model metadata. |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from input tensor shape |
| `get_batch_size` | `() -> int` | Returns batch size (from model if not dynamic, else from constructor) |
| `run` | `(input_data) -> list` | Runs session inference, returns output tensors |
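A minimal sketch of how this interface could be implemented (an illustration, not the module's actual code: the `InferenceEngine` base class is omitted, the NCHW input layout and the `_resolve` helper are assumptions):

```python
class OnnxEngine:
    """Sketch of an ONNX Runtime engine with CUDA-to-CPU provider fallback."""

    PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]

    def __init__(self, model_bytes: bytes, batch_size: int = 1, **kwargs):
        import onnxruntime as ort  # lazy import: this backend is optional
        self._session = ort.InferenceSession(model_bytes, providers=self.PROVIDERS)
        inp = self._session.get_inputs()[0]
        self._input_name = inp.name
        self._input_shape, self._batch_size = self._resolve(inp.shape, batch_size)

    @staticmethod
    def _resolve(shape, fallback_batch_size):
        # NCHW layout assumed: [batch, channels, height, width].
        # A dynamic batch dimension is reported as -1 or a symbolic string.
        batch_dim = shape[0]
        if isinstance(batch_dim, int) and batch_dim > 0:
            batch = batch_dim            # fixed in the model
        else:
            batch = fallback_batch_size  # dynamic: honor the constructor value
        return tuple(shape[2:4]), batch

    def get_input_shape(self) -> tuple:
        return self._input_shape

    def get_batch_size(self) -> int:
        return self._batch_size

    def run(self, input_data) -> list:
        return self._session.run(None, {self._input_name: input_data})
```

`ort.InferenceSession` accepts raw model bytes as its first argument, so no temporary file is needed for models fetched over HTTP.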
## Internal Logic
- Provider order: `["CUDAExecutionProvider", "CPUExecutionProvider"]`; ONNX Runtime uses the first provider in the list that is available on the host.
- If the model's batch dimension is dynamic (`-1`), the engine falls back to the constructor's `batch_size` parameter.
- Logs model input metadata and custom metadata map at init.
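The init-time logging described above could gather its data roughly like this (a hypothetical helper, not the module's code; any object exposing `get_inputs()` and `get_modelmeta()` from the `InferenceSession` interface works):

```python
def describe_model(session):
    """Summarize input metadata and the custom metadata map for logging.

    `session` is anything with onnxruntime's InferenceSession interface:
    get_inputs() returns objects with .name/.shape/.type, and
    get_modelmeta().custom_metadata_map is a dict of model-author keys.
    """
    input_lines = [
        f"input {i.name}: shape={i.shape} dtype={i.type}"
        for i in session.get_inputs()
    ]
    return input_lines, dict(session.get_modelmeta().custom_metadata_map)
```

The returned strings and dict would then be passed to whatever logger `constants_inf` provides.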
## Dependencies
- **External**: `onnxruntime`
- **Internal**: `inference_engine` (base class), `constants_inf` (logging)
## Consumers
- `inference` — instantiated when no compatible NVIDIA GPU is found
## Data Models
None (wraps `onnxruntime.InferenceSession`).
## Configuration
None.
## External Integrations
None directly — model bytes are provided by caller (loaded via `loader_http_client`).
## Security
None.
## Tests
None found.