# Restrictions
## Hardware

- Training requires an NVIDIA GPU with ≥24 GB VRAM (validated on an RTX 4090). Batch size 11 consumes ~22 GB; batch size 12 exceeds 24 GB.
- TensorRT inference requires an NVIDIA GPU with TensorRT support. Engine files are GPU-architecture-specific (compiled per compute capability).
- ONNX Runtime inference requires an NVIDIA GPU with CUDA support (~6.3 GB VRAM for a 200 s video).
- Edge inference requires an RK3588 SoC (OrangePi 5).
- Hardware fingerprinting reads the CPU model, GPU name, total RAM, and drive serial, and therefore requires access to these system properties.

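The fingerprinting requirement above can be pictured as a stable hash over those four properties. This is a minimal sketch, not the project's actual scheme: the field set, separator, and hash are assumptions, and the GPU name and drive serial (which need vendor tooling such as `nvidia-smi -L` or udev/smartctl) are stubbed with placeholders.

```python
import hashlib
import os
import platform

def hardware_fingerprint() -> str:
    """Illustrative sketch: hash four host properties into one stable ID.

    GPU name and drive serial require vendor tooling and are stubbed here.
    """
    # Total physical RAM in bytes (POSIX systems).
    ram_total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    parts = [
        platform.processor() or platform.machine(),  # CPU model, best effort
        "GPU_NAME_PLACEHOLDER",                      # would come from the GPU driver
        str(ram_total),                              # total RAM
        "DRIVE_SERIAL_PLACEHOLDER",                  # would come from the drive
    ]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```

Because the hash is deterministic per host, any change to one of the inputs (e.g. swapping the drive) produces a different fingerprint.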
## Software

- Python 3.10+ (the codebase uses `match` statements).
- CUDA 12.1 with PyTorch 2.3.0.
- TensorRT runtime for production GPU inference.
- ONNX Runtime with `CUDAExecutionProvider` for cross-platform inference.
- Albumentations for augmentation transforms.
- boto3 for S3-compatible CDN access.
- rstream for the RabbitMQ Streams protocol.
- The `cryptography` library for AES-256-CBC encryption.

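As a sketch of the last dependency, AES-256-CBC with the `cryptography` library typically looks like the following. The framing here (a fresh 16-byte IV prepended to the ciphertext, PKCS#7 padding) is a common convention, not necessarily the project's actual format, and real key handling is out of scope.

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_model(data: bytes, key: bytes) -> bytes:
    """AES-256-CBC sketch: prepend a random 16-byte IV to the ciphertext."""
    assert len(key) == 32  # AES-256 requires a 32-byte key
    iv = os.urandom(16)
    padder = padding.PKCS7(algorithms.AES.block_size).padder()
    padded = padder.update(data) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()

def decrypt_model(blob: bytes, key: bytes) -> bytes:
    """Reverse of encrypt_model: split off the IV, decrypt, strip padding."""
    iv, ct = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(algorithms.AES.block_size).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```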
## Environment

- Filesystem paths are hardcoded under the `/azaion/` root (configurable via `config.yaml`).
- Requires network access to the Azaion REST API, an S3-compatible CDN, and a RabbitMQ instance.
- Configuration files (`config.yaml`, `cdn.yaml`) must be present with valid credentials.
- `classes.json` must be present with the 17 annotation class definitions.
- No containerization: processes run directly on the host OS.

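For illustration, overriding the `/azaion/` root via `config.yaml` might look like the fragment below. The key names are assumptions for this sketch, not the project's actual schema.

```yaml
# config.yaml (illustrative only; the real key names may differ)
paths:
  root: /azaion              # default filesystem root
  datasets: /azaion/datasets
  models: /azaion/models
```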
## Operational

- Training duration: ~11.5 days for 360K annotations on a single RTX 4090.
- Augmentation runs as an infinite loop with 5-minute sleep intervals.
- The annotation queue consumer runs as a persistent async process.
- TensorRT engine files are GPU-architecture-specific and must be regenerated when moving to a different GPU.
- The model encryption key is hardcoded; changing it invalidates all previously encrypted models.
- No graceful shutdown mechanism for the augmentation process.
- No reconnection logic for the annotation queue consumer on disconnect.
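The missing graceful shutdown could be closed with a standard signal-flag pattern. This is a sketch under assumed structure, not the project's code: `run_pass` stands in for the real augmentation step.

```python
import signal
import threading

stop = threading.Event()

def _request_stop(signum, frame):
    stop.set()  # ask the loop to exit after the current pass

signal.signal(signal.SIGTERM, _request_stop)
signal.signal(signal.SIGINT, _request_stop)

def augmentation_loop(run_pass, interval_s: float = 300.0) -> int:
    """Run `run_pass` every `interval_s` seconds until a stop is requested.

    Returns the number of completed passes.
    """
    passes = 0
    while not stop.is_set():
        run_pass()
        passes += 1
        stop.wait(interval_s)  # sleeps, but wakes immediately on shutdown
    return passes
```

Using `Event.wait` instead of `time.sleep` means a SIGTERM ends the 5-minute wait immediately while still letting the current pass finish, so no work is lost mid-write.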