Mirror of https://github.com/azaion/ai-training.git (synced 2026-04-22)
Restrictions
Hardware
- Training requires NVIDIA GPU with ≥24GB VRAM (validated: RTX 4090). Batch size 11 consumes ~22GB; batch size 12 exceeds 24GB.
- TensorRT inference requires NVIDIA GPU with TensorRT support. Engine files are GPU-architecture-specific (compiled per compute capability).
- ONNX Runtime inference requires NVIDIA GPU with CUDA support (~6.3GB VRAM for 200s video).
- Edge inference requires RK3588 SoC (OrangePi5).
- Hardware fingerprinting reads CPU model, GPU name, RAM total, and drive serial — requires access to these system properties.
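The exact combination scheme is not documented here, but a fingerprint built from those four properties could look like the following sketch. The function name and the `|`-joined SHA-256 layout are assumptions for illustration; gathering the raw values (CPU model, GPU name, RAM total, drive serial) is platform-specific and left to the caller.

```python
import hashlib

def hardware_fingerprint(cpu_model: str, gpu_name: str,
                         ram_total_bytes: int, drive_serial: str) -> str:
    # Join the four system properties in a fixed order, then hash,
    # so the same machine always yields the same identifier.
    payload = "|".join([cpu_model, gpu_name, str(ram_total_bytes), drive_serial])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the hash is order-sensitive, any change to one property (e.g. swapping the GPU) produces a completely different fingerprint.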
Software
- Python 3.10+ (uses `match` statements).
- CUDA 12.1 with PyTorch 2.3.0.
- TensorRT runtime for production GPU inference.
- ONNX Runtime with CUDAExecutionProvider for cross-platform inference.
- Albumentations for augmentation transforms.
- boto3 for S3-compatible CDN access.
- rstream for RabbitMQ Streams protocol.
- cryptography library for AES-256-CBC encryption.
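As a sketch of the AES-256-CBC scheme with the `cryptography` library: the function names below are hypothetical, and the key is passed in as a parameter here even though the project hardcodes it. A random 16-byte IV is prepended to each ciphertext, and PKCS7 padding brings the plaintext to the AES block size.

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_model(data: bytes, key: bytes) -> bytes:
    # key must be 32 bytes for AES-256; a fresh IV is generated per file.
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(data) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()

def decrypt_model(blob: bytes, key: bytes) -> bytes:
    # The first 16 bytes of the blob are the IV written by encrypt_model.
    iv, ciphertext = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = dec.update(ciphertext) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

This also illustrates why changing the hardcoded key invalidates old artifacts: CBC decryption with a different key yields garbage that fails PKCS7 unpadding.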
Environment
- Filesystem paths default to `/azaion/root` (configurable via `config.yaml`).
- Requires network access to the Azaion REST API, an S3-compatible CDN, and a RabbitMQ instance.
- Configuration files (`config.yaml`, `cdn.yaml`) must be present with valid credentials.
- `classes.json` must be present with the 17 annotation class definitions.
- No containerization; processes run directly on the host OS.
Operational
- Training duration: ~11.5 days for 360K annotations on a single RTX 4090.
- Augmentation runs as an infinite loop with 5-minute sleep intervals.
- Annotation queue consumer runs as a persistent async process.
- TensorRT engine files are GPU-architecture-specific — must be regenerated when moving to a different GPU.
- Model encryption key is hardcoded — changing it invalidates all previously encrypted models.
- No graceful shutdown mechanism for the augmentation process.
- No reconnection logic for the annotation queue consumer on disconnect.
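To illustrate the gap around the augmentation loop: a graceful shutdown could be as small as the sketch below, where a `threading.Event` replaces the bare 5-minute `sleep`. The `run_once` callback and function names are hypothetical; this is not the repository's code.

```python
import signal
import threading

stop_event = threading.Event()

def _request_stop(signum, frame):
    # Ask the loop to exit after the current pass.
    stop_event.set()

def augmentation_loop(run_once, interval_s: float = 300.0) -> None:
    """Run augmentation passes until SIGTERM arrives.

    Event.wait() doubles as the 5-minute sleep, so a stop request
    interrupts the wait immediately rather than after up to 5 minutes.
    """
    signal.signal(signal.SIGTERM, _request_stop)
    while not stop_event.is_set():
        run_once()
        stop_event.wait(interval_s)
```

The same event could be checked inside `run_once` between work items, letting an in-flight pass wind down cleanly instead of being killed mid-write.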