Restrictions

Hardware

  • Training requires NVIDIA GPU with ≥24GB VRAM (validated: RTX 4090). Batch size 11 consumes ~22GB; batch size 12 exceeds 24GB.
  • TensorRT inference requires NVIDIA GPU with TensorRT support. Engine files are GPU-architecture-specific (compiled per compute capability).
  • ONNX Runtime inference requires NVIDIA GPU with CUDA support (~6.3GB VRAM for a 200-second video).
  • Edge inference requires RK3588 SoC (OrangePi5).
  • Hardware fingerprinting reads CPU model, GPU name, RAM total, and drive serial — requires access to these system properties.
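The fingerprinting requirement above can be sketched as follows. This is a hedged illustration of how those four properties might be gathered on Linux, not the project's actual implementation; the paths, the `nvidia-smi` query, and the SHA-256 combination step are all assumptions.

```python
import hashlib
import platform
import subprocess


def _read_first_line(path: str) -> str:
    """Best-effort read of a sysfs/procfs value; empty string if unavailable."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return ""


def hardware_fingerprint() -> str:
    """Combine CPU model, GPU name, total RAM, and drive serial into one hash."""
    cpu = platform.processor() or _read_first_line("/proc/cpuinfo")
    try:
        gpu = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            text=True,
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        gpu = "unknown-gpu"
    ram = _read_first_line("/proc/meminfo")  # MemTotal line
    serial = _read_first_line("/sys/block/sda/device/serial") or "unknown-serial"
    raw = "|".join([cpu, gpu, ram, serial])
    return hashlib.sha256(raw.encode()).hexdigest()
```

Because the fingerprint feeds into identity checks, any fallback values (as in the `unknown-gpu` case here) would change the resulting hash, which is why the restriction requires real access to these system properties.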

Software

  • Python 3.10+ (uses match statements).
  • CUDA 12.1 with PyTorch 2.3.0.
  • TensorRT runtime for production GPU inference.
  • ONNX Runtime with CUDAExecutionProvider for cross-platform inference.
  • Albumentations for augmentation transforms.
  • boto3 for S3-compatible CDN access.
  • rstream for RabbitMQ Streams protocol.
  • cryptography library for AES-256-CBC encryption.
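The AES-256-CBC dependency above can be illustrated with the `cryptography` library's hazmat layer. This is a minimal sketch of the cipher mode, assuming a random per-message IV prepended to the ciphertext and PKCS7 padding; the project's actual key handling and framing are not documented here.

```python
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def encrypt_bytes(data: bytes, key: bytes) -> bytes:
    """AES-256-CBC encrypt (key must be 32 bytes); prepends the 16-byte IV."""
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()  # AES block size is 128 bits
    padded = padder.update(data) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()


def decrypt_bytes(blob: bytes, key: bytes) -> bytes:
    """Inverse of encrypt_bytes: split off the IV, decrypt, strip padding."""
    iv, ct = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

Note that CBC with a hardcoded key (see the Operational section) means the key itself is the single point of failure: rotating it requires re-encrypting every stored model.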

Environment

  • Filesystem paths default to the /azaion/ root; the default can be overridden via config.yaml.
  • Requires network access to Azaion REST API, S3-compatible CDN, and RabbitMQ instance.
  • Configuration files (config.yaml, cdn.yaml) must be present with valid credentials.
  • classes.json must be present with the 17 annotation class definitions.
  • No containerization — processes run directly on host OS.
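The path-root override described above might look like the following. This is a hedged sketch only: the `root` key name and the config schema are assumptions, not the project's real config.yaml layout.

```python
from pathlib import Path

import yaml  # PyYAML

DEFAULT_ROOT = Path("/azaion")


def load_root(config_path: str = "config.yaml") -> Path:
    """Return the filesystem root, preferring a `root` key in config.yaml.

    Falls back to the /azaion/ default when the file or key is absent.
    """
    p = Path(config_path)
    if p.exists():
        cfg = yaml.safe_load(p.read_text()) or {}
        return Path(cfg.get("root", DEFAULT_ROOT))
    return DEFAULT_ROOT
```

A scheme like this keeps the default behaviour identical on machines without a config file, while letting a deployment relocate the tree with a one-line YAML entry.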

Operational

  • Training duration: ~11.5 days for 360K annotations on a single RTX 4090.
  • Augmentation runs as an infinite loop with 5-minute sleep intervals.
  • Annotation queue consumer runs as a persistent async process.
  • TensorRT engine files are GPU-architecture-specific — must be regenerated when moving to a different GPU.
  • Model encryption key is hardcoded — changing it invalidates all previously encrypted models.
  • No graceful shutdown mechanism for the augmentation process.
  • No reconnection logic for the annotation queue consumer on disconnect.
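The augmentation-loop pattern described above (infinite loop, 5-minute sleep, no shutdown hook) can be sketched as below. The `pass_fn` callable and the `max_iterations` escape hatch are hypothetical additions for illustration and testing; the real process, per the restrictions, loops forever with no graceful-shutdown mechanism.

```python
import time

SLEEP_SECONDS = 300  # 5-minute interval between augmentation passes


def augmentation_loop(pass_fn, max_iterations=None, sleep_seconds=SLEEP_SECONDS):
    """Run augmentation passes forever (or for max_iterations, in tests).

    Sleeps between passes but not after the final one, so bounded runs
    return promptly.
    """
    done = 0
    while max_iterations is None or done < max_iterations:
        pass_fn()
        done += 1
        if max_iterations is None or done < max_iterations:
            time.sleep(sleep_seconds)
```

Because the loop body has no signal handling, a SIGTERM mid-pass can interrupt work in flight; that is the gap the "no graceful shutdown" restriction refers to.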