# Restrictions
## Hardware

- Training requires an NVIDIA GPU with ≥24 GB VRAM (validated on an RTX 4090). Batch size 11 consumes ~22 GB; batch size 12 exceeds 24 GB.
- TensorRT inference requires an NVIDIA GPU with TensorRT support. Engine files are GPU-architecture-specific (compiled per compute capability).
- ONNX Runtime inference requires an NVIDIA GPU with CUDA support (~6.3 GB VRAM for a 200 s video).
- Edge inference requires an RK3588 SoC (OrangePi 5).
- Hardware fingerprinting reads the CPU model, GPU name, total RAM, and drive serial, and therefore requires access to these system properties.

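The fingerprinting requirement above can be pictured as a stable hash over those four properties. This is a minimal sketch, not the project's actual scheme: the field set, separator, and hash are assumptions, and the GPU name and drive serial (which need vendor tooling such as `nvidia-smi -L` or udev/smartctl) are stubbed with placeholders.

```python
import hashlib
import os
import platform

def hardware_fingerprint() -> str:
    """Illustrative sketch: hash four host properties into one stable ID.

    GPU name and drive serial require vendor tooling and are stubbed here.
    """
    # Total physical RAM in bytes (POSIX systems).
    ram_total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    parts = [
        platform.processor() or platform.machine(),  # CPU model, best effort
        "GPU_NAME_PLACEHOLDER",                      # would come from the GPU driver
        str(ram_total),                              # total RAM
        "DRIVE_SERIAL_PLACEHOLDER",                  # would come from the drive
    ]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```

Because the hash is deterministic per host, any change to one of the inputs (e.g. swapping the drive) produces a different fingerprint.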
## Software

- Python 3.10+ (the codebase uses `match` statements).
- CUDA 12.1 with PyTorch 2.3.0.
- TensorRT runtime for production GPU inference.
- ONNX Runtime with `CUDAExecutionProvider` for cross-platform inference.
- Albumentations for augmentation transforms.
- boto3 for S3-compatible CDN access.
- rstream for the RabbitMQ Streams protocol.
- The `cryptography` library for AES-256-CBC encryption.

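As a sketch of the last dependency, AES-256-CBC with the `cryptography` library typically looks like the following. The framing here (a fresh 16-byte IV prepended to the ciphertext, PKCS#7 padding) is a common convention, not necessarily the project's actual format, and real key handling is out of scope.

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_model(data: bytes, key: bytes) -> bytes:
    """AES-256-CBC sketch: prepend a random 16-byte IV to the ciphertext."""
    assert len(key) == 32  # AES-256 requires a 32-byte key
    iv = os.urandom(16)
    padder = padding.PKCS7(algorithms.AES.block_size).padder()
    padded = padder.update(data) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()

def decrypt_model(blob: bytes, key: bytes) -> bytes:
    """Reverse of encrypt_model: split off the IV, decrypt, strip padding."""
    iv, ct = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(algorithms.AES.block_size).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```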
## Environment

- Filesystem paths are hardcoded under the `/azaion/` root (configurable via `config.yaml`).
- Requires network access to the Azaion REST API, an S3-compatible CDN, and a RabbitMQ instance.
- Configuration files (`config.yaml`, `cdn.yaml`) must be present with valid credentials.
- `classes.json` must be present with the 17 annotation class definitions.
- No containerization: processes run directly on the host OS.

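For illustration, overriding the `/azaion/` root via `config.yaml` might look like the fragment below. The key names are assumptions for this sketch, not the project's actual schema.

```yaml
# config.yaml (illustrative only; the real key names may differ)
paths:
  root: /azaion              # default filesystem root
  datasets: /azaion/datasets
  models: /azaion/models
```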
## Operational

- Training duration: ~11.5 days for 360K annotations on a single RTX 4090.
- Augmentation runs as an infinite loop with 5-minute sleep intervals.
- The annotation queue consumer runs as a persistent async process.
- TensorRT engine files are GPU-architecture-specific and must be regenerated when moving to a different GPU.
- The model encryption key is hardcoded; changing it invalidates all previously encrypted models.
- No graceful shutdown mechanism for the augmentation process.
- No reconnection logic for the annotation queue consumer on disconnect.
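The missing graceful shutdown could be closed with a standard signal-flag pattern. This is a sketch under assumed structure, not the project's code: `run_pass` stands in for the real augmentation step.

```python
import signal
import threading

stop = threading.Event()

def _request_stop(signum, frame):
    stop.set()  # ask the loop to exit after the current pass

signal.signal(signal.SIGTERM, _request_stop)
signal.signal(signal.SIGINT, _request_stop)

def augmentation_loop(run_pass, interval_s: float = 300.0) -> int:
    """Run `run_pass` every `interval_s` seconds until a stop is requested.

    Returns the number of completed passes.
    """
    passes = 0
    while not stop.is_set():
        run_pass()
        passes += 1
        stop.wait(interval_s)  # sleeps, but wakes immediately on shutdown
    return passes
```

Using `Event.wait` instead of `time.sleep` means a SIGTERM ends the 5-minute wait immediately while still letting the current pass finish, so no work is lost mid-write.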