Architecture
System Context
Azaion AI Training is a Python-based ML pipeline for training, exporting, and deploying YOLOv11 object detection models. The system operates within the Azaion platform ecosystem, consuming annotated image data and producing encrypted inference-ready models.
Boundaries
| Boundary | Interface | Protocol |
|---|---|---|
| Azaion REST API | ApiClient | HTTPS (JWT auth) |
| S3-compatible CDN | CDNManager (boto3) | HTTPS (S3 API) |
| RabbitMQ Streams | rstream Consumer | AMQP 1.0 |
| Local filesystem | Direct I/O | POSIX paths at /azaion/ |
| NVIDIA GPU | PyTorch, TensorRT, ONNX RT, PyCUDA | CUDA 12.1 |
System Context Diagram
Tech Stack
| Layer | Technology | Version/Detail |
|---|---|---|
| Language | Python | 3.10+ (match statements) |
| ML Framework | Ultralytics YOLO | YOLOv11 medium |
| Deep Learning | PyTorch | 2.3.0 (CUDA 12.1) |
| GPU Inference | TensorRT | FP16/INT8, async CUDA streams |
| GPU Inference (alt) | ONNX Runtime GPU | CUDAExecutionProvider |
| Edge Inference | RKNN | RK3588 (OrangePi5) |
| Augmentation | Albumentations | Geometric + color transforms |
| Computer Vision | OpenCV | Image I/O, preprocessing, display |
| Object Storage | boto3 | S3-compatible CDN |
| Message Queue | rstream | RabbitMQ Streams consumer |
| Serialization | msgpack | Queue message format |
| Encryption | cryptography | AES-256-CBC |
| HTTP Client | requests | REST API communication |
| Configuration | PyYAML | YAML config files |
| Visualization | matplotlib, netron | Annotation display, model graphs |
Deployment Model
The system runs as multiple independent processes on machines with NVIDIA GPUs:
| Process | Entry Point | Runtime | Typical Host |
|---|---|---|---|
| Training | train.py | Long-running (days) | GPU server (RTX 4090, 24GB VRAM) |
| Augmentation | augmentation.py | Continuous loop (infinite) | Same GPU server or CPU-only |
| Annotation Queue | annotation-queue/annotation_queue_handler.py | Continuous (async) | Any server with network access |
| Inference | start_inference.py | On-demand | GPU-equipped machine |
| Data Tools | convert-annotations.py, dataset-visualiser.py | Ad-hoc | Developer machine |
No containerization (Dockerfile), CI/CD pipeline, or orchestration infrastructure was found in the codebase. Deployment appears to be manual.
Data Model Overview
Annotation Data Flow
Annotation Class System
- 17 base classes (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier)
- 3 weather modes: Norm (offset 0), Wint (offset 20), Night (offset 40)
- Total class slots: 80 (17 × 3 = 51 used, 29 reserved)
- Format: YOLO (center_x, center_y, width, height — all normalized 0–1)
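The class/offset scheme above can be sketched in a few lines. The class names and weather offsets come from this document; the helper function itself is illustrative, not the project's actual code:

```python
# Sketch of the class-ID scheme: 17 base classes plus a per-weather-mode
# offset, giving 51 used slots out of 80. The helper is illustrative only.
BASE_CLASSES = [
    "ArmorVehicle", "Truck", "Vehicle", "Artillery", "Shadow", "Trenches",
    "MilitaryMan", "TyreTracks", "AdditArmoredTank", "Smoke", "Plane",
    "Moto", "CamouflageNet", "CamouflageBranches", "Roof", "Building",
    "Caponier",
]
WEATHER_OFFSETS = {"Norm": 0, "Wint": 20, "Night": 40}

def class_id(base_class: str, weather: str) -> int:
    """Map a base class and weather mode to its YOLO class slot."""
    return WEATHER_OFFSETS[weather] + BASE_CLASSES.index(base_class)

print(class_id("ArmorVehicle", "Norm"))  # 0
print(class_id("Truck", "Night"))        # 41
```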
Model Artifacts
| Format | Use | Export Details |
|---|---|---|
| .pt | Training checkpoint | YOLOv11 PyTorch weights |
| .onnx | Cross-platform inference | 1280px, dynamic batch (1–8), NMS baked in |
| .engine | GPU inference (production) | TensorRT FP16, dynamic batch max 8, per-GPU architecture |
| .rknn | Edge inference | RK3588 target (OrangePi5) |
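With the Ultralytics API, these artifacts would be produced roughly as below. The checkpoint path is a placeholder, the exact flags and RKNN-format support depend on the installed Ultralytics version, and the TensorRT and RKNN exports require their respective toolchains and hardware:

```python
from ultralytics import YOLO

model = YOLO("best.pt")  # trained .pt checkpoint; path assumed

# .onnx: 1280 px input with a dynamic batch axis
model.export(format="onnx", imgsz=1280, dynamic=True)

# .engine: TensorRT FP16, built for the local GPU architecture
model.export(format="engine", imgsz=1280, half=True)

# .rknn: RK3588 edge target (requires the RKNN toolkit)
model.export(format="rknn")
```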
Integration Points
Azaion REST API
- POST /login → JWT token
- POST /resources/{folder} → file upload (Bearer auth)
- POST /resources/get/{folder} → encrypted file download (hardware-bound key)
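A sketch of how the Bearer-authenticated upload might be assembled with requests, after a prior POST /login has yielded the token. The base URL, folder name, token, and multipart field name are all placeholders or assumptions:

```python
import requests

def build_upload_request(base, folder, token, payload):
    """Prepare the Bearer-authenticated POST /resources/{folder} upload."""
    req = requests.Request(
        "POST",
        f"{base}/resources/{folder}",
        headers={"Authorization": f"Bearer {token}"},
        files={"file": ("part.bin", payload)},  # multipart field name assumed
    )
    return req.prepare()

r = build_upload_request("https://api.example.com", "models", "example-token", b"\x00")
print(r.method, r.url)  # POST https://api.example.com/resources/models
```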
S3-compatible CDN
- Upload: model big parts (upload_fileobj)
- Download: model big parts (download_file)
- Separate read/write access keys
RabbitMQ Streams
- Queue: azaion-annotations
- Protocol: AMQP with the rstream library
- Message format: msgpack with positional integer keys
- Offset tracking: persisted to offset.yaml
Non-Functional Requirements (Observed)
| Category | Observation | Source |
|---|---|---|
| Training duration | ~11.5 days for 360K annotations on 1× RTX 4090 | Code comment in train.py |
| VRAM usage | batch=11 → ~22GB (batch=12 fails at 24.2GB) | Code comment in train.py |
| Inference speed | TensorRT: 54s for 200s video (3.7GB VRAM) | Code comment in start_inference.py |
| ONNX inference | 81s for 200s video (6.3GB VRAM) | Code comment in start_inference.py |
| Augmentation ratio | 8× (1 original + 7 augmented per image) | augmentation.py |
| Frame sampling | Every 4th frame during inference | inference/inference.py |
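The two inference rows reduce to a back-of-envelope comparison (same 200 s video):

```python
# Figures from the observations above: TensorRT vs ONNX Runtime.
trt_time, onnx_time = 54, 81     # seconds
trt_vram, onnx_vram = 3.7, 6.3   # GB of VRAM

speedup = (onnx_time - trt_time) / onnx_time      # 27/81 ≈ 33% faster
vram_saving = (onnx_vram - trt_vram) / onnx_vram  # ≈ 41% less VRAM
print(f"{speedup:.0%} faster, {vram_saving:.0%} less VRAM")
```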
Security Architecture
| Mechanism | Implementation | Location |
|---|---|---|
| API authentication | JWT token (email/password login) | api_client.py |
| Resource encryption | AES-256-CBC (hardware-bound key) | security.py |
| Model encryption | AES-256-CBC (static key) | security.py |
| Split model storage | Small part on API, big part on CDN | api_client.py |
| Hardware fingerprinting | CPU+GPU+RAM+drive serial hash | hardware_service.py |
| CDN access control | Separate read/write S3 credentials | cdn_manager.py |
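A minimal AES-256-CBC round trip with the cryptography library, matching the mechanism named above. Key derivation (hardware-bound vs. static) is out of scope here, so a random key stands in:

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)  # 256-bit key; the real system derives/stores its own
iv = os.urandom(16)   # per-message CBC initialization vector

# Pad to the 16-byte AES block size, then encrypt.
padder = padding.PKCS7(128).padder()
padded = padder.update(b"model bytes") + padder.finalize()
encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
ciphertext = encryptor.update(padded) + encryptor.finalize()

# Decrypt and strip the padding.
decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
unpadder = padding.PKCS7(128).unpadder()
recovered = (unpadder.update(decryptor.update(ciphertext) + decryptor.finalize())
             + unpadder.finalize())
```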
Security Concerns
- Hardcoded credentials in config.yaml and cdn.yaml
- Hardcoded model encryption key in security.py
- No TLS certificate validation visible in code
- No input validation on API responses
- Queue credentials in plaintext config files
Key Architectural Decisions
| Decision | Rationale (inferred) |
|---|---|
| YOLOv11 medium at 1280px | Balance between detection quality and training time |
| Split model storage | Prevent model theft from single storage compromise |
| Hardware-bound API encryption | Tie resource access to authorized machines |
| TensorRT for production inference | ~33% faster than ONNX, ~42% less VRAM |
| Augmentation as separate process | Decouples data prep from training; runs continuously |
| Annotation queue as separate service | Independent lifecycle; different dependency set |
| RKNN export for OrangePi5 | Edge deployment on low-power ARM SoC |