Module: inference/tensorrt_engine
Purpose
TensorRT-based inference engine implementation. Provides GPU-accelerated inference using NVIDIA TensorRT with CUDA memory management, plus ONNX-to-TensorRT conversion.
Public Interface
TensorRTEngine (extends InferenceEngine)
| Method | Signature | Returns | Description |
|---|---|---|---|
| `__init__` | `(model_bytes: bytes, **kwargs)` | — | Deserializes TensorRT engine from bytes, allocates CUDA memory |
| `get_input_shape` | `() -> Tuple[int, int]` | `(height, width)` | Returns model input dimensions |
| `get_batch_size` | `() -> int` | `int` | Returns configured batch size |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Output tensors | Runs async inference on CUDA stream |
| `get_gpu_memory_bytes` | `(device_id=0) -> int` | GPU memory in bytes | Queries total GPU VRAM via pynvml (static) |
| `get_engine_filename` | `(device_id=0) -> str \| None` | Filename string | Generates device-specific engine filename (static) |
| `convert_from_onnx` | `(onnx_model: bytes) -> bytes \| None` | Serialized TensorRT plan | Converts ONNX model to TensorRT engine (static) |
Internal Logic
- Initialization: Deserializes TensorRT engine, creates execution context, allocates pinned host memory and device memory for input/output tensors.
- Dynamic shapes: Handles -1 (dynamic) dimensions, defaults to 1280×1280 for spatial dims, batch size from engine or constructor.
- Output shape: [batch_size, 300 max detections, 6 values per detection (x1, y1, x2, y2, conf, cls)].
- Inference flow: Host→Device async copy → execute_async_v3 → synchronize → Device→Host copy.
- ONNX conversion: Creates TensorRT builder, parses ONNX, configures workspace (90% of GPU memory), enables FP16 if supported, builds serialized network.
- Engine filename: `azaion.cc_{major}.{minor}_sm_{sm_count}.engine` — uniquely identifies the engine per GPU architecture.
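The dynamic-shape rule above (spatial `-1` dims fall back to 1280, a dynamic batch dim falls back to a supplied default) can be sketched as a pure function. `resolve_input_shape` and `DEFAULT_SIDE` are illustrative names, not the module's actual API, and NCHW ordering is assumed.

```python
from typing import Sequence, Tuple

# Fallback for dynamic (-1) spatial dimensions, per the note above.
DEFAULT_SIDE = 1280


def resolve_input_shape(dims: Sequence[int], default_batch: int = 1) -> Tuple[int, ...]:
    """Resolve a TensorRT input shape such as (-1, 3, -1, -1) to concrete dims."""
    batch, channels, height, width = dims
    return (
        batch if batch > 0 else default_batch,
        channels,
        height if height > 0 else DEFAULT_SIDE,
        width if width > 0 else DEFAULT_SIDE,
    )
```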
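The output layout above (`[batch, 300, 6]`, each row being `(x1, y1, x2, y2, conf, cls)`) is typically post-processed with a confidence filter. `filter_detections` is a hypothetical helper showing how a consumer might slice that tensor; it is not part of the module.

```python
from typing import List

import numpy as np


def filter_detections(output: np.ndarray, conf_threshold: float = 0.25) -> List[np.ndarray]:
    """Keep rows whose confidence (column 4) meets the threshold, per image."""
    assert output.ndim == 3 and output.shape[2] == 6  # [batch, max_dets, 6]
    return [dets[dets[:, 4] >= conf_threshold] for dets in output]
```

Padded slots in the 300-row tensor carry zero confidence, so they drop out naturally.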
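The inference flow above (Host→Device copy, `execute_async_v3`, synchronize, Device→Host copy) can be sketched as below. This is a hedged sketch, not the module's code: it assumes pre-allocated pinned host buffers and device allocations, the TensorRT ≥ 8.5 tensor-name API, and defers the `pycuda` import so the function only needs a GPU stack when actually called. All parameter names are illustrative.

```python
import numpy as np


def run_async_inference(context, stream, host_in, dev_in, host_out, dev_out,
                        input_name: str, output_name: str) -> np.ndarray:
    """Sketch of one copy/execute/copy cycle (needs pycuda + TensorRT at call time)."""
    import pycuda.driver as cuda  # deferred: GPU-only dependency

    # Host -> Device: async copy of the pinned input buffer onto the stream.
    cuda.memcpy_htod_async(dev_in, host_in, stream)
    # Bind device addresses to tensor names, then launch (TensorRT >= 8.5 API).
    context.set_tensor_address(input_name, int(dev_in))
    context.set_tensor_address(output_name, int(dev_out))
    context.execute_async_v3(stream_handle=stream.handle)
    # Device -> Host, then block until the stream has drained.
    cuda.memcpy_dtoh_async(host_out, dev_out, stream)
    stream.synchronize()
    return host_out
```

Keeping every step on one CUDA stream is what makes the copies and the kernel launch overlap-safe without extra events.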
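The ONNX-conversion steps above (builder, parser, workspace limit, optional FP16, serialized build) map onto the TensorRT builder API roughly as follows. A sketch under assumptions, not the module's implementation: the function name, the hard-coded VRAM default, and the deferred import are all illustrative, and the explicit-batch flag targets the TensorRT 8.x API.

```python
def convert_onnx_to_trt(onnx_model: bytes,
                        total_vram_bytes: int = 8 << 30,
                        workspace_fraction: float = 0.9):
    """Sketch of ONNX -> serialized TensorRT plan (needs tensorrt at call time)."""
    import tensorrt as trt  # deferred: GPU-only dependency

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

    # Parse the ONNX bytes into the network definition.
    parser = trt.OnnxParser(network, logger)
    if not parser.parse(onnx_model):
        return None  # errors are retrievable via parser.get_error(i)

    config = builder.create_builder_config()
    # Cap the workspace at a fraction of total VRAM (the doc says 90%).
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE,
                                 int(total_vram_bytes * workspace_fraction))
    # Enable FP16 only where the hardware has fast support for it.
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    plan = builder.build_serialized_network(network, config)
    return bytes(plan) if plan is not None else None
```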
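The naming scheme in the last bullet can be shown as a pure formatter. The real static method presumably queries the GPU for its compute capability and SM count (the module depends on `pycuda` and `pynvml`); `make_engine_filename` below is a hypothetical helper that only demonstrates the pattern.

```python
def make_engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Format the per-GPU engine filename using the pattern quoted above."""
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

Encoding compute capability and SM count in the filename ensures a plan built on one GPU architecture is never deserialized on another.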
Dependencies
- `inference/onnx_engine` — InferenceEngine ABC
- `tensorrt` (external) — TensorRT runtime and builder
- `pycuda.driver` (external) — CUDA memory management
- `pycuda.autoinit` (external) — CUDA context auto-initialization
- `pynvml` (external) — GPU memory query
- `numpy`, `json`, `struct`, `re`, `subprocess`, `pathlib`, `typing` (stdlib/external)
Consumers
start_inference
Data Models
None.
Configuration
None.
External Integrations
- NVIDIA TensorRT runtime (GPU inference)
- CUDA driver API (memory allocation, streams)
- NVML (GPU hardware queries)
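The NVML integration (backing `get_gpu_memory_bytes`) can be sketched with `pynvml`. A minimal sketch, assuming NVML is present on the host; the function name is illustrative and the import is deferred so the snippet only needs an NVIDIA stack when called.

```python
def total_gpu_memory_bytes(device_id: int = 0) -> int:
    """Query total VRAM for one device via NVML (needs pynvml at call time)."""
    import pynvml  # deferred: only meaningful on NVIDIA hosts

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_id)
        return pynvml.nvmlDeviceGetMemoryInfo(handle).total
    finally:
        pynvml.nvmlShutdown()  # always release the NVML session
```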
Security
None.
Tests
None.