Module: inference/tensorrt_engine

Purpose

TensorRT-based inference engine implementation. Provides GPU-accelerated inference using NVIDIA TensorRT with CUDA memory management, plus ONNX-to-TensorRT conversion.

Public Interface

TensorRTEngine (extends InferenceEngine)

  • __init__(model_bytes: bytes, **kwargs) — Deserializes a TensorRT engine from bytes and allocates CUDA memory.
  • get_input_shape() -> Tuple[int, int] — Returns model input dimensions as (height, width).
  • get_batch_size() -> int — Returns the configured batch size.
  • run(input_data: np.ndarray) -> List[np.ndarray] — Runs asynchronous inference on a CUDA stream; returns output tensors.
  • get_gpu_memory_bytes(device_id=0) -> int (static) — Queries total GPU VRAM in bytes via pynvml.
  • get_engine_filename(device_id=0) -> str | None (static) — Generates a device-specific engine filename.
  • convert_from_onnx(onnx_model: bytes) -> bytes | None (static) — Converts an ONNX model to a serialized TensorRT engine plan.
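Viewed as plain Python, the interface above amounts to a small abstract base class. The sketch below is illustrative only: the real InferenceEngine ABC lives in inference/onnx_engine and its exact signatures may differ, and DummyEngine is a hypothetical stand-in used here to show the contract without a GPU.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

import numpy as np


class InferenceEngine(ABC):
    """Illustrative sketch of the ABC that TensorRTEngine extends."""

    @abstractmethod
    def get_input_shape(self) -> Tuple[int, int]:
        """(height, width) expected by the model."""

    @abstractmethod
    def get_batch_size(self) -> int:
        """Configured batch size."""

    @abstractmethod
    def run(self, input_data: np.ndarray) -> List[np.ndarray]:
        """Run inference and return output tensors."""


class DummyEngine(InferenceEngine):
    """Hypothetical stand-in; no CUDA or TensorRT involved."""

    def get_input_shape(self) -> Tuple[int, int]:
        return (1280, 1280)  # spatial default noted under Internal Logic

    def get_batch_size(self) -> int:
        return 1

    def run(self, input_data: np.ndarray) -> List[np.ndarray]:
        # Echo a [batch, 300, 6] tensor shaped like the documented output.
        return [np.zeros((self.get_batch_size(), 300, 6), dtype=np.float32)]
```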

Internal Logic

  • Initialization: Deserializes TensorRT engine, creates execution context, allocates pinned host memory and device memory for input/output tensors.
  • Dynamic shapes: Handles -1 (dynamic) dimensions, defaults to 1280×1280 for spatial dims, batch size from engine or constructor.
  • Output shape: [batch_size, 300 max detections, 6 values per detection (x1, y1, x2, y2, conf, cls)].
  • Inference flow: Host→Device async copy → execute_async_v3 → synchronize → Device→Host copy.
  • ONNX conversion: Creates TensorRT builder, parses ONNX, configures workspace (90% of GPU memory), enables FP16 if supported, builds serialized network.
  • Engine filename: azaion.cc_{major}.{minor}_sm_{sm_count}.engine — uniquely identifies engine per GPU architecture.
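The documented output layout ([batch, 300, 6] with x1, y1, x2, y2, conf, cls per row) can be filtered with plain NumPy. This is a minimal sketch, not the project's actual post-processing code, and the 0.25 confidence threshold is an arbitrary assumption:

```python
import numpy as np


def filter_detections(output: np.ndarray, conf_threshold: float = 0.25):
    """Keep rows of a [300, 6] detection tensor above a confidence threshold.

    Columns follow the documented layout: x1, y1, x2, y2, conf, cls.
    """
    keep = output[:, 4] >= conf_threshold
    boxes = output[keep, :4]
    scores = output[keep, 4]
    classes = output[keep, 5].astype(int)
    return boxes, scores, classes


# Two fake detections: one confident, one below the threshold.
dets = np.zeros((300, 6), dtype=np.float32)
dets[0] = [10, 20, 110, 220, 0.9, 2]
dets[1] = [5, 5, 50, 50, 0.1, 0]
boxes, scores, classes = filter_detections(dets)
```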

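The filename scheme above reduces to simple string formatting once the compute capability and SM count are known. The helper below illustrates the documented pattern only; how the real get_engine_filename queries those values is not shown here, so the inputs are passed in explicitly:

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the documented per-GPU engine filename.

    cc_major/cc_minor: CUDA compute capability (e.g. 8.6 for Ampere consumer GPUs).
    sm_count: number of streaming multiprocessors on the device.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

For example, an RTX 3090 (compute capability 8.6, 82 SMs) would map to azaion.cc_8.6_sm_82.engine, so engines built on one GPU architecture are never loaded on another.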
Dependencies

  • inference/onnx_engine — InferenceEngine ABC
  • tensorrt (external) — TensorRT runtime and builder
  • pycuda.driver (external) — CUDA memory management
  • pycuda.autoinit (external) — CUDA context auto-initialization
  • pynvml (external) — GPU memory query
  • numpy (external); json, struct, re, subprocess, pathlib, typing (stdlib)

Consumers

start_inference

Data Models

None.

Configuration

None.

External Integrations

  • NVIDIA TensorRT runtime (GPU inference)
  • CUDA driver API (memory allocation, streams)
  • NVML (GPU hardware queries)

Security

None.

Tests

None.