Mirror of https://github.com/azaion/ai-training.git, synced 2026-04-22 13:26:35 +00:00
Refactor constants management to use Pydantic BaseModel for configuration
- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
# Module: inference/tensorrt_engine
## Purpose
TensorRT-based inference engine implementation. Provides GPU-accelerated inference using NVIDIA TensorRT with CUDA memory management, plus ONNX-to-TensorRT conversion.
## Public Interface
### TensorRTEngine (extends InferenceEngine)
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `(model_bytes: bytes, **kwargs)` | — | Deserializes a TensorRT engine from bytes and allocates CUDA memory |
| `get_input_shape` | `() -> Tuple[int, int]` | `(height, width)` | Returns the model input dimensions |
| `get_batch_size` | `() -> int` | `int` | Returns the configured batch size |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Output tensors | Runs asynchronous inference on a CUDA stream |
| `get_gpu_memory_bytes` | `(device_id=0) -> int` | GPU memory in bytes | Queries total GPU VRAM via pynvml (static) |
| `get_engine_filename` | `(device_id=0) -> str \| None` | Filename string | Generates a device-specific engine filename (static) |
| `convert_from_onnx` | `(onnx_model: bytes) -> bytes \| None` | Serialized TensorRT plan | Converts an ONNX model to a TensorRT engine (static) |
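Per the output layout noted under Internal Logic, `run` returns tensors shaped `[batch_size, 300, 6]`, with each detection row holding `(x1, y1, x2, y2, conf, cls)`. A minimal post-processing sketch under that assumption — the confidence threshold and the zero-padded-row convention here are illustrative, not taken from this module:

```python
import numpy as np

def filter_detections(output: np.ndarray, conf_threshold: float = 0.5) -> np.ndarray:
    """Keep rows of a [300, 6] detection tensor whose confidence
    (column 4) exceeds the threshold; unused slots are assumed zero-padded."""
    return output[output[:, 4] > conf_threshold]

# Illustrative single-image output: 300 detection slots, 6 values each.
batch_output = np.zeros((1, 300, 6), dtype=np.float32)
batch_output[0, 0] = [10, 20, 110, 220, 0.9, 0]  # one confident detection
batch_output[0, 1] = [15, 25, 90, 200, 0.3, 2]   # below threshold, dropped
kept = filter_detections(batch_output[0])
```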
## Internal Logic
- **Initialization**: Deserializes TensorRT engine, creates execution context, allocates pinned host memory and device memory for input/output tensors.
- **Dynamic shapes**: Handles -1 (dynamic) dimensions: spatial dims default to 1280×1280, and batch size is taken from the engine or the constructor.
- **Output shape**: `[batch_size, 300, 6]` — up to 300 detections per image, each with 6 values (x1, y1, x2, y2, conf, cls).
- **Inference flow**: Host→Device async copy → execute_async_v3 → synchronize → Device→Host copy.
- **ONNX conversion**: Creates TensorRT builder, parses ONNX, configures workspace (90% of GPU memory), enables FP16 if supported, builds serialized network.
- **Engine filename**: `azaion.cc_{major}.{minor}_sm_{sm_count}.engine` — uniquely identifies engine per GPU architecture.
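The dynamic-shape handling above can be sketched as a small pure-Python helper. The function name and the NCHW layout are assumptions for illustration; only the stated fallbacks (1280×1280 spatial, batch size from engine or constructor) come from this doc:

```python
from typing import Sequence, Tuple

DEFAULT_SPATIAL = 1280  # stated default for dynamic height/width

def resolve_input_shape(dims: Sequence[int], batch_size: int = 1) -> Tuple[int, int, int, int]:
    """Replace dynamic (-1) entries in an NCHW shape: batch falls back to
    the configured batch size, spatial dims fall back to 1280."""
    n, c, h, w = dims
    return (
        batch_size if n == -1 else n,
        c,
        DEFAULT_SPATIAL if h == -1 else h,
        DEFAULT_SPATIAL if w == -1 else w,
    )
```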
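The filename convention above can be reproduced directly from the format string. The helper below is a hypothetical standalone sketch (the real static method queries compute capability and SM count from the GPU via pynvml); the example values are illustrative:

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the device-specific engine filename
    azaion.cc_{major}.{minor}_sm_{sm_count}.engine."""
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"

# e.g. a compute-capability 8.6 GPU with 82 SMs (illustrative values)
name = engine_filename(8, 6, 82)
```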
## Dependencies
- `inference/onnx_engine` — InferenceEngine ABC
- `tensorrt` (external) — TensorRT runtime and builder
- `pycuda.driver` (external) — CUDA memory management
- `pycuda.autoinit` (external) — CUDA context auto-initialization
- `pynvml` (external) — GPU memory query
- `numpy`, `json`, `struct`, `re`, `subprocess`, `pathlib`, `typing` (stdlib/external)
## Consumers
- `start_inference`
## Data Models
None.
## Configuration
None.
## External Integrations
- NVIDIA TensorRT runtime (GPU inference)
- CUDA driver API (memory allocation, streams)
- NVML (GPU hardware queries)
## Security
None.
## Tests
None.