Mirror of https://github.com/azaion/detections.git (synced 2026-04-22 22:06:32 +00:00)
2.2 KiB
# Module: tensorrt_engine

## Purpose

TensorRT-based inference engine — high-performance GPU inference with CUDA memory management and ONNX-to-TensorRT model conversion.

## Public Interface

### Class: `TensorRTEngine` (extends `InferenceEngine`)
| Method | Signature | Description |
|---|---|---|
| `__init__` | `(bytes model_bytes, int batch_size=4, **kwargs)` | Deserializes TensorRT engine from bytes, allocates CUDA input/output memory, creates execution context and stream |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from input tensor shape |
| `get_batch_size` | `() -> int` | Returns batch size |
| `run` | `(input_data) -> list` | Async H2D copy → execute → D2H copy, returns output as numpy array |
| `get_gpu_memory_bytes` | `(int device_id) -> int` | Static. Returns total GPU memory in bytes (default 2 GB if unavailable) |
| `get_engine_filename` | `(int device_id) -> str` | Static. Returns engine filename with compute capability and SM count: `azaion.cc_{major}.{minor}_sm_{count}.engine` |
| `convert_from_onnx` | `(bytes onnx_model) -> bytes or None` | Static. Converts ONNX model to TensorRT serialized engine. Uses 90% of GPU memory as workspace. Enables FP16 if supported. |
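The `get_engine_filename` naming scheme can be sketched with plain string formatting. The helper name and the compute-capability/SM values below are illustrative, not taken from the module:

```python
def format_engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    # Hypothetical helper mirroring the documented naming scheme:
    # azaion.cc_{major}.{minor}_sm_{count}.engine
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"

# e.g. a GPU with compute capability 8.6 and 84 SMs
print(format_engine_filename(8, 6, 84))  # azaion.cc_8.6_sm_84.engine
```

Encoding the GPU's compute capability and SM count in the filename lets a cached engine be reused only on hardware it was built for, since serialized TensorRT engines are not portable across GPU architectures.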
## Internal Logic

- Input shape defaults to 1280×1280 for dynamic dimensions.
- Output shape defaults to 300 max detections × 6 values (x1, y1, x2, y2, conf, cls) for dynamic dimensions.
- `run` uses async CUDA memory transfers with stream synchronization.
- `convert_from_onnx` uses explicit batch mode and configures FP16 precision when the GPU supports it.
- Default batch size is 4 (vs. `OnnxEngine`'s 1).
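As a rough illustration of the default output layout (not the module's actual post-processing), a flat 300 × 6 buffer such as the one returned by the D2H copy can be reshaped and filtered with numpy:

```python
import numpy as np

MAX_DETECTIONS = 300   # default max detections for dynamic dims
VALUES_PER_DET = 6     # x1, y1, x2, y2, conf, cls

# Simulated flat output buffer, as it might look after the D2H copy
flat = np.zeros(MAX_DETECTIONS * VALUES_PER_DET, dtype=np.float32)
flat[:6] = [10.0, 20.0, 110.0, 220.0, 0.9, 2.0]  # one fake detection

dets = flat.reshape(MAX_DETECTIONS, VALUES_PER_DET)
boxes, conf, cls = dets[:, :4], dets[:, 4], dets[:, 5]

# Keep only detections above a confidence threshold
keep = dets[conf > 0.5]
print(keep.shape)  # (1, 6)
```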
## Dependencies

- External: `tensorrt`, `pycuda.driver`, `pycuda.autoinit`, `pynvml`, `numpy`
- Internal: `inference_engine` (base class), `constants_inf` (logging)
## Consumers

- `inference` — instantiated when a compatible NVIDIA GPU is found; also calls `convert_from_onnx` and `get_engine_filename`.
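A hedged sketch of the consumer's selection flow, using a stub in place of the real engine. The class, function, and parameter names here are illustrative only and do not reflect the actual `inference` module API:

```python
class StubTensorRTEngine:
    """Stand-in mirroring the documented constructor and batch-size contract."""
    def __init__(self, model_bytes: bytes, batch_size: int = 4, **kwargs):
        self.model_bytes = model_bytes
        self.batch_size = batch_size

    def get_batch_size(self) -> int:
        return self.batch_size

def build_engine(gpu_available: bool, onnx_model: bytes):
    # Illustrative selection logic: take the TensorRT path only when a
    # compatible NVIDIA GPU exists, otherwise fall back to another engine.
    if not gpu_available:
        return None
    engine_bytes = onnx_model  # real code would call convert_from_onnx here
    return StubTensorRTEngine(engine_bytes)

engine = build_engine(True, b"onnx-bytes")
print(engine.get_batch_size())  # 4
```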
## Data Models

None (wraps TensorRT runtime objects).
## Configuration

- Engine filename is GPU-specific (compute capability + SM count).
- Workspace memory is 90% of available GPU memory.
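A minimal sketch of the workspace sizing rule, assuming the documented 2 GB fallback applies when the GPU memory query fails. The function and constant names are illustrative:

```python
from typing import Optional

DEFAULT_GPU_MEMORY_BYTES = 2 * 1024**3  # documented 2 GB fallback
WORKSPACE_FRACTION = 0.9                # 90% of GPU memory as workspace

def workspace_bytes(total_gpu_memory: Optional[int]) -> int:
    # Fall back to 2 GB when the GPU memory query is unavailable
    total = total_gpu_memory if total_gpu_memory is not None else DEFAULT_GPU_MEMORY_BYTES
    return int(total * WORKSPACE_FRACTION)

print(workspace_bytes(None))         # 1932735283  (90% of 2 GiB)
print(workspace_bytes(8 * 1024**3))  # 7730941132  (90% of 8 GiB)
```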
## External Integrations

None directly — model bytes are provided by the caller.

## Security

None.

## Tests

None found.