Mirror of https://github.com/azaion/ai-training.git (synced 2026-04-22)
Restrictions
Hardware
- Training requires NVIDIA GPU with ≥24GB VRAM (validated: RTX 4090). Batch size 11 consumes ~22GB; batch size 12 exceeds 24GB.
- TensorRT inference requires NVIDIA GPU with TensorRT support. Engine files are GPU-architecture-specific (compiled per compute capability).
- ONNX Runtime inference requires NVIDIA GPU with CUDA support (~6.3GB VRAM for 200s video).
- Edge inference requires RK3588 SoC (OrangePi5).
- Hardware fingerprinting reads CPU model, GPU name, RAM total, and drive serial — requires access to these system properties.
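The exact combination scheme is not documented here, but a fingerprint built from those four properties could look like the following sketch. The function name and the `|`-joined SHA-256 layout are assumptions for illustration; gathering the raw values (CPU model, GPU name, RAM total, drive serial) is platform-specific and left to the caller.

```python
import hashlib

def hardware_fingerprint(cpu_model: str, gpu_name: str,
                         ram_total_bytes: int, drive_serial: str) -> str:
    # Join the four system properties in a fixed order, then hash,
    # so the same machine always yields the same identifier.
    payload = "|".join([cpu_model, gpu_name, str(ram_total_bytes), drive_serial])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the hash is order-sensitive, any change to one property (e.g. swapping the GPU) produces a completely different fingerprint.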
Software
- Python 3.10+ (uses `match` statements).
- CUDA 12.1 with PyTorch 2.3.0.
- TensorRT runtime for production GPU inference.
- ONNX Runtime with CUDAExecutionProvider for cross-platform inference.
- Albumentations for augmentation transforms.
- boto3 for S3-compatible CDN access.
- rstream for RabbitMQ Streams protocol.
- cryptography library for AES-256-CBC encryption.
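As a sketch of the AES-256-CBC scheme with the `cryptography` library: the function names below are hypothetical, and the key is passed in as a parameter here even though the project hardcodes it. A random 16-byte IV is prepended to each ciphertext, and PKCS7 padding brings the plaintext to the AES block size.

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_model(data: bytes, key: bytes) -> bytes:
    # key must be 32 bytes for AES-256; a fresh IV is generated per file.
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(data) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()

def decrypt_model(blob: bytes, key: bytes) -> bytes:
    # The first 16 bytes of the blob are the IV written by encrypt_model.
    iv, ciphertext = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = dec.update(ciphertext) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

This also illustrates why changing the hardcoded key invalidates old artifacts: CBC decryption with a different key yields garbage that fails PKCS7 unpadding.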
Environment
- Filesystem paths default to `/azaion/root` (configurable via `config.yaml`).
- Requires network access to the Azaion REST API, an S3-compatible CDN, and a RabbitMQ instance.
- Configuration files (`config.yaml`, `cdn.yaml`) must be present with valid credentials.
- `classes.json` must be present with the 17 annotation class definitions.
- No containerization; processes run directly on the host OS.
Operational
- Training duration: ~11.5 days for 360K annotations on a single RTX 4090.
- Augmentation runs as an infinite loop with 5-minute sleep intervals.
- Annotation queue consumer runs as a persistent async process.
- TensorRT engine files are GPU-architecture-specific — must be regenerated when moving to a different GPU.
- Model encryption key is hardcoded — changing it invalidates all previously encrypted models.
- No graceful shutdown mechanism for the augmentation process.
- No reconnection logic for the annotation queue consumer on disconnect.
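To illustrate the gap around the augmentation loop: a graceful shutdown could be as small as the sketch below, where a `threading.Event` replaces the bare 5-minute `sleep`. The `run_once` callback and function names are hypothetical; this is not the repository's code.

```python
import signal
import threading

stop_event = threading.Event()

def _request_stop(signum, frame):
    # Ask the loop to exit after the current pass.
    stop_event.set()

def augmentation_loop(run_once, interval_s: float = 300.0) -> None:
    """Run augmentation passes until SIGTERM arrives.

    Event.wait() doubles as the 5-minute sleep, so a stop request
    interrupts the wait immediately rather than after up to 5 minutes.
    """
    signal.signal(signal.SIGTERM, _request_stop)
    while not stop_event.is_set():
        run_once()
        stop_event.wait(interval_s)
```

The same event could be checked inside `run_once` between work items, letting an in-flight pass wind down cleanly instead of being killed mid-write.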