mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-23 00:26:35 +00:00
Refactor constants management to use Pydantic BaseModel for configuration
- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
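The Pydantic-based structure described above might look like the following minimal sketch. The field names (`root`, `models_dir`, `datasets_dir`) and the module-level singleton are illustrative assumptions; the actual constants.py may organize paths differently.

```python
from pathlib import Path

from pydantic import BaseModel


class Config(BaseModel):
    """Centralized path configuration replacing bare module-level variables.

    Field names are hypothetical; the real constants.py may use others.
    """

    root: Path = Path("/azaion")
    models_dir: Path = Path("/azaion/models")
    datasets_dir: Path = Path("/azaion/datasets")


# A module-level instance that train.py, exports.py, etc. can import.
config = Config()
```

Consumers would then reference `config.models_dir` instead of a loose constant, and Pydantic validates and coerces the values (e.g. `str` to `Path`) at construction time.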
# Restrictions
## Hardware
- Training requires NVIDIA GPU with ≥24GB VRAM (validated: RTX 4090). Batch size 11 consumes ~22GB; batch size 12 exceeds 24GB.
- TensorRT inference requires NVIDIA GPU with TensorRT support. Engine files are GPU-architecture-specific (compiled per compute capability).
- ONNX Runtime inference requires NVIDIA GPU with CUDA support (~6.3GB VRAM for 200s video).
- Edge inference requires RK3588 SoC (OrangePi5).
- Hardware fingerprinting reads CPU model, GPU name, RAM total, and drive serial — requires access to these system properties.
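The fingerprinting step above can be sketched as follows. The function name, the join order, and the choice of SHA-256 are assumptions for illustration; the source does not specify how the four properties are combined.

```python
import hashlib


def hardware_fingerprint(cpu_model: str, gpu_name: str,
                         ram_total_bytes: int, drive_serial: str) -> str:
    """Derive a stable ID from the four system properties listed above.

    Hypothetical sketch: concatenate the properties in a fixed order and
    hash them, so the same machine always yields the same fingerprint.
    """
    blob = "|".join([cpu_model, gpu_name, str(ram_total_bytes), drive_serial])
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Any change to one property (e.g. swapping the drive) produces a different digest, which is the behavior a hardware lock relies on.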
## Software
- Python 3.10+ (uses `match` statements).
- CUDA 12.1 with PyTorch 2.3.0.
- TensorRT runtime for production GPU inference.
- ONNX Runtime with CUDAExecutionProvider for cross-platform inference.
- Albumentations for augmentation transforms.
- boto3 for S3-compatible CDN access.
- rstream for RabbitMQ Streams protocol.
- cryptography library for AES-256-CBC encryption.
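The AES-256-CBC usage noted in the last bullet can be sketched with the cryptography library as below. The function names and the IV-prepended-to-ciphertext layout are assumptions; the project's actual framing of the encrypted model file may differ.

```python
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def encrypt_model(data: bytes, key: bytes) -> bytes:
    """AES-256-CBC: 32-byte key, random 16-byte IV prepended to ciphertext."""
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()  # CBC needs block-aligned input
    padded = padder.update(data) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()


def decrypt_model(blob: bytes, key: bytes) -> bytes:
    """Reverse of encrypt_model: split off the IV, decrypt, strip padding."""
    iv, ct = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

Because CBC with PKCS7 padding provides no integrity check, a corrupted model file may decrypt to garbage rather than fail loudly; that is a property of the mode, not of this sketch.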
## Environment
- Filesystem paths hardcoded to `/azaion/` root (configurable via `config.yaml`).
- Requires network access to Azaion REST API, S3-compatible CDN, and RabbitMQ instance.
- Configuration files (`config.yaml`, `cdn.yaml`) must be present with valid credentials.
- `classes.json` must be present with the 17 annotation class definitions.
- No containerization — processes run directly on host OS.
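A fail-fast check for the `classes.json` requirement above could look like this sketch. The top-level `{"classes": [...]}` layout and the function name are assumptions; only the 17-class count comes from the restriction itself.

```python
import json
from pathlib import Path

EXPECTED_CLASSES = 17  # per the restriction above


def load_classes(path: Path) -> list:
    """Load annotation class definitions, failing fast on a wrong count.

    Assumes a hypothetical top-level {"classes": [...]} file layout.
    """
    classes = json.loads(path.read_text())["classes"]
    if len(classes) != EXPECTED_CLASSES:
        raise ValueError(
            f"classes.json: expected {EXPECTED_CLASSES} classes, "
            f"got {len(classes)}"
        )
    return classes
```

Validating the count at startup turns a silent label-index mismatch during training into an immediate, explicit error.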
## Operational
- Training duration: ~11.5 days for 360K annotations on a single RTX 4090.
- Augmentation runs as an infinite loop with 5-minute sleep intervals.
- Annotation queue consumer runs as a persistent async process.
- TensorRT engine files are GPU-architecture-specific — must be regenerated when moving to a different GPU.
- Model encryption key is hardcoded — changing it invalidates all previously encrypted models.
- No graceful shutdown mechanism for the augmentation process.
- No reconnection logic for the annotation queue consumer on disconnect.
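The augmentation pattern described above (infinite loop, 5-minute sleep, no graceful shutdown) can be sketched as follows. The `sleep` and `max_passes` parameters are hypothetical test hooks, not part of the real process, which runs unbounded until killed externally.

```python
import time

AUGMENTATION_INTERVAL_S = 300  # the 5-minute sleep noted above


def augmentation_loop(run_pass, sleep=time.sleep, max_passes=None):
    """Run augmentation passes forever, sleeping 5 minutes between them.

    No shutdown signal is checked, matching the "no graceful shutdown"
    restriction; sleep/max_passes exist here only to make the sketch
    testable.
    """
    done = 0
    while max_passes is None or done < max_passes:
        run_pass()
        sleep(AUGMENTATION_INTERVAL_S)
        done += 1
```

A process built this way can only be stopped with a signal that kills it mid-pass, which is exactly the operational caveat the bullet list records.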