detections/_docs/02_document/modules/onnx_engine.md

Module: onnx_engine

Purpose

ONNX Runtime-based inference engine — CPU/CUDA fallback when TensorRT is unavailable.

Public Interface

Class: OnnxEngine (extends InferenceEngine)

| Method | Signature | Description |
| --- | --- | --- |
| `__init__` | `(bytes model_bytes, int batch_size=1, **kwargs)` | Loads the ONNX model from bytes and creates an `InferenceSession` with CUDA > CPU provider priority. Reads the input shape and batch size from model metadata. |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from the input tensor shape. |
| `get_batch_size` | `() -> int` | Returns the batch size (from the model if static, otherwise the value passed to the constructor). |
| `run` | `(input_data) -> list` | Runs session inference and returns the output tensors. |
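The shape- and batch-reading behaviour above can be sketched as a small standalone helper. It assumes onnxruntime's convention of reporting an NCHW input shape in which dynamic dimensions appear as strings (e.g. `"batch"`) or `None`; the helper name and structure are illustrative, not the module's actual code:

```python
def parse_input_shape(onnx_shape, default_batch=1):
    """Derive (batch, height, width) from an ONNX NCHW input shape.

    onnxruntime reports dynamic dimensions as strings or None, so any
    non-integer batch dimension falls back to the constructor value.
    Illustrative sketch only -- not the module's actual code.
    """
    batch, _channels, height, width = onnx_shape
    if not isinstance(batch, int) or batch < 1:  # dynamic dimension
        batch = default_batch
    return batch, height, width

# A static batch baked into the model wins over the constructor default:
print(parse_input_shape([8, 3, 640, 640], default_batch=1))       # (8, 640, 640)
# A dynamic batch falls back to the constructor value:
print(parse_input_shape(["batch", 3, 480, 640], default_batch=4)) # (4, 480, 640)
```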

Internal Logic

  • Provider order: ["CUDAExecutionProvider", "CPUExecutionProvider"] — ONNX Runtime uses the first provider in the list that is available on the host, so CUDA is preferred and CPU is the fallback.
  • If the model's batch dimension is dynamic (-1), uses the constructor's batch_size parameter.
  • Logs model input metadata and custom metadata map at init.
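The provider fallback can be illustrated in isolation: onnxruntime skips providers that are not available on the host, so handing it the full priority list amounts to intersecting that list with what the machine supports, in priority order. A minimal sketch of that selection (standalone, not the module's code):

```python
PROVIDER_PRIORITY = ["CUDAExecutionProvider", "CPUExecutionProvider"]

def select_providers(available, priority=PROVIDER_PRIORITY):
    """Keep only the preferred providers present on this host,
    preserving priority order -- a sketch of what onnxruntime does
    with the providers list at session creation."""
    chosen = [p for p in priority if p in set(available)]
    if not chosen:
        raise RuntimeError("no usable execution provider found")
    return chosen

# On a CUDA-capable host both survive, GPU first:
print(select_providers(["CPUExecutionProvider", "CUDAExecutionProvider"]))
# On a CPU-only host only the CPU provider remains:
print(select_providers(["CPUExecutionProvider"]))
```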

Dependencies

  • External: onnxruntime
  • Internal: inference_engine (base class), constants_inf (logging)

Consumers

  • inference — instantiated when no compatible NVIDIA GPU is found

Data Models

None (wraps onnxruntime.InferenceSession).

Configuration

None.

External Integrations

None directly — model bytes are provided by caller (loaded via loader_http_client).

Security

None.

Tests

None found.