Module: inference/tensorrt_engine

Purpose

TensorRT-based inference engine implementation. Provides GPU-accelerated inference using NVIDIA TensorRT with CUDA memory management, plus ONNX-to-TensorRT conversion.

Public Interface

TensorRTEngine (extends InferenceEngine)

  • __init__(model_bytes: bytes, **kwargs) — Deserializes a TensorRT engine from bytes and allocates CUDA memory.
  • get_input_shape() -> Tuple[int, int] — Returns model input dimensions as (height, width).
  • get_batch_size() -> int — Returns the configured batch size.
  • run(input_data: np.ndarray) -> List[np.ndarray] — Runs asynchronous inference on a CUDA stream; returns output tensors.
  • get_gpu_memory_bytes(device_id=0) -> int (static) — Queries total GPU VRAM in bytes via pynvml.
  • get_engine_filename(device_id=0) -> str | None (static) — Generates a device-specific engine filename.
  • convert_from_onnx(onnx_model: bytes) -> bytes | None (static) — Converts an ONNX model to a serialized TensorRT engine plan.
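Viewed as plain Python, the interface above amounts to a small abstract base class. The sketch below is illustrative only: the real InferenceEngine ABC lives in inference/onnx_engine and its exact signatures may differ, and DummyEngine is a hypothetical stand-in used here to show the contract without a GPU.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

import numpy as np


class InferenceEngine(ABC):
    """Illustrative sketch of the ABC that TensorRTEngine extends."""

    @abstractmethod
    def get_input_shape(self) -> Tuple[int, int]:
        """(height, width) expected by the model."""

    @abstractmethod
    def get_batch_size(self) -> int:
        """Configured batch size."""

    @abstractmethod
    def run(self, input_data: np.ndarray) -> List[np.ndarray]:
        """Run inference and return output tensors."""


class DummyEngine(InferenceEngine):
    """Hypothetical stand-in; no CUDA or TensorRT involved."""

    def get_input_shape(self) -> Tuple[int, int]:
        return (1280, 1280)  # spatial default noted under Internal Logic

    def get_batch_size(self) -> int:
        return 1

    def run(self, input_data: np.ndarray) -> List[np.ndarray]:
        # Echo a [batch, 300, 6] tensor shaped like the documented output.
        return [np.zeros((self.get_batch_size(), 300, 6), dtype=np.float32)]
```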

Internal Logic

  • Initialization: Deserializes TensorRT engine, creates execution context, allocates pinned host memory and device memory for input/output tensors.
  • Dynamic shapes: Handles -1 (dynamic) dimensions, defaults to 1280×1280 for spatial dims, batch size from engine or constructor.
  • Output shape: [batch_size, 300 max detections, 6 values per detection (x1, y1, x2, y2, conf, cls)].
  • Inference flow: Host→Device async copy → execute_async_v3 → synchronize → Device→Host copy.
  • ONNX conversion: Creates TensorRT builder, parses ONNX, configures workspace (90% of GPU memory), enables FP16 if supported, builds serialized network.
  • Engine filename: azaion.cc_{major}.{minor}_sm_{sm_count}.engine — uniquely identifies engine per GPU architecture.
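The documented output layout ([batch, 300, 6] with x1, y1, x2, y2, conf, cls per row) can be filtered with plain NumPy. This is a minimal sketch, not the project's actual post-processing code, and the 0.25 confidence threshold is an arbitrary assumption:

```python
import numpy as np


def filter_detections(output: np.ndarray, conf_threshold: float = 0.25):
    """Keep rows of a [300, 6] detection tensor above a confidence threshold.

    Columns follow the documented layout: x1, y1, x2, y2, conf, cls.
    """
    keep = output[:, 4] >= conf_threshold
    boxes = output[keep, :4]
    scores = output[keep, 4]
    classes = output[keep, 5].astype(int)
    return boxes, scores, classes


# Two fake detections: one confident, one below the threshold.
dets = np.zeros((300, 6), dtype=np.float32)
dets[0] = [10, 20, 110, 220, 0.9, 2]
dets[1] = [5, 5, 50, 50, 0.1, 0]
boxes, scores, classes = filter_detections(dets)
```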

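The filename scheme above reduces to simple string formatting once the compute capability and SM count are known. The helper below illustrates the documented pattern only; how the real get_engine_filename queries those values is not shown here, so the inputs are passed in explicitly:

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the documented per-GPU engine filename.

    cc_major/cc_minor: CUDA compute capability (e.g. 8.6 for Ampere consumer GPUs).
    sm_count: number of streaming multiprocessors on the device.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

For example, an RTX 3090 (compute capability 8.6, 82 SMs) would map to azaion.cc_8.6_sm_82.engine, so engines built on one GPU architecture are never loaded on another.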
Dependencies

  • inference/onnx_engine — InferenceEngine ABC
  • tensorrt (external) — TensorRT runtime and builder
  • pycuda.driver (external) — CUDA memory management
  • pycuda.autoinit (external) — CUDA context auto-initialization
  • pynvml (external) — GPU memory query
  • numpy (external); json, struct, re, subprocess, pathlib, typing (stdlib)

Consumers

start_inference

Data Models

None.

Configuration

None.

External Integrations

  • NVIDIA TensorRT runtime (GPU inference)
  • CUDA driver API (memory allocation, streams)
  • NVML (GPU hardware queries)

Security

None.

Tests

None.