# Architecture

## System Context

Azaion AI Training is a Python-based ML pipeline for training, exporting, and deploying YOLOv11 object detection models. The system operates within the Azaion platform ecosystem, consuming annotated image data and producing encrypted, inference-ready models.

### Boundaries

| Boundary | Interface | Protocol |
|----------|-----------|----------|
| Azaion REST API | ApiClient | HTTPS (JWT auth) |
| S3-compatible CDN | CDNManager (boto3) | HTTPS (S3 API) |
| RabbitMQ Streams | rstream Consumer | AMQP 1.0 |
| Local filesystem | Direct I/O | POSIX paths at `/azaion/` |
| NVIDIA GPU | PyTorch, TensorRT, ONNX RT, PyCUDA | CUDA 12.1 |

### System Context Diagram

```mermaid
graph LR
    subgraph "Azaion Platform"
        API[Azaion REST API]
        CDN[S3-compatible CDN]
        Queue[RabbitMQ Streams]
    end
    subgraph "AI Training System"
        AQ[Annotation Queue Consumer]
        AUG[Augmentation Pipeline]
        TRAIN[Training Pipeline]
        INF[Inference Engine]
    end
    subgraph "Storage"
        FS["/azaion/ filesystem"]
    end
    subgraph "Hardware"
        GPU[NVIDIA GPU]
    end
    Queue -->|annotation events| AQ
    AQ -->|images + labels| FS
    FS -->|raw annotations| AUG
    AUG -->|augmented data| FS
    FS -->|processed dataset| TRAIN
    TRAIN -->|trained model| GPU
    TRAIN -->|encrypted model, small part| API
    TRAIN -->|encrypted model, big part| CDN
    API -->|encrypted model, small part| INF
    CDN -->|encrypted model, big part| INF
    INF -->|inference| GPU
```

## Tech Stack

| Layer | Technology | Version/Detail |
|-------|-----------|---------------|
| Language | Python | 3.10+ (match statements) |
| ML Framework | Ultralytics YOLO | YOLOv11 medium |
| Deep Learning | PyTorch | 2.3.0 (CUDA 12.1) |
| GPU Inference | TensorRT | FP16/INT8, async CUDA streams |
| GPU Inference (alt) | ONNX Runtime GPU | CUDAExecutionProvider |
| Edge Inference | RKNN | RK3588 (OrangePi5) |
| Augmentation | Albumentations | Geometric + color transforms |
| Computer Vision | OpenCV | Image I/O, preprocessing, display |
| Object Storage | boto3 | S3-compatible CDN |
| Message Queue | rstream | RabbitMQ Streams consumer |
| Serialization | msgpack | Queue message format |
| Encryption | cryptography | AES-256-CBC |
| HTTP Client | requests | REST API communication |
| Configuration | PyYAML | YAML config files |
| Visualization | matplotlib, netron | Annotation display, model graphs |

## Deployment Model

The system runs as multiple independent processes on machines with NVIDIA GPUs:

| Process | Entry Point | Runtime | Typical Host |
|---------|------------|---------|-------------|
| Training | `train.py` | Long-running (days) | GPU server (RTX 4090, 24 GB VRAM) |
| Augmentation | `augmentation.py` | Continuous loop (infinite) | Same GPU server or CPU-only |
| Annotation Queue | `annotation-queue/annotation_queue_handler.py` | Continuous (async) | Any server with network access |
| Inference | `start_inference.py` | On-demand | GPU-equipped machine |
| Data Tools | `convert-annotations.py`, `dataset-visualiser.py` | Ad hoc | Developer machine |

No containerization (no Dockerfile), CI/CD pipeline, or orchestration infrastructure was found in the codebase; deployment appears to be manual.
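The tech stack lists msgpack as the queue serialization format, and the annotation queue consumer receives messages keyed by positional integers rather than field names. A minimal sketch of decoding such a message is below; the field positions and their meanings (`F_IMAGE_ID`, `F_CLASS_ID`, `F_BBOX`) are hypothetical, not the codebase's actual mapping:

```python
import msgpack

# Hypothetical positional keys -- the real mapping lives in the
# annotation-queue handler and is not documented in this file.
F_IMAGE_ID, F_CLASS_ID, F_BBOX = 0, 1, 2

def decode_annotation(raw: bytes) -> dict:
    """Decode a queue message whose map keys are positional integers."""
    # strict_map_key=False is required: msgpack >= 1.0 rejects
    # non-str map keys by default.
    msg = msgpack.unpackb(raw, strict_map_key=False)
    return {
        "image_id": msg[F_IMAGE_ID],
        "class_id": msg[F_CLASS_ID],
        "bbox": msg[F_BBOX],  # YOLO-normalized (cx, cy, w, h)
    }

# Round-trip example with a synthetic message
raw = msgpack.packb({F_IMAGE_ID: "img-001", F_CLASS_ID: 3,
                     F_BBOX: [0.5, 0.5, 0.2, 0.1]})
print(decode_annotation(raw))
```

Positional integer keys keep messages compact on the wire at the cost of readability, which is why a decoding shim like this is useful at the consumer boundary.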
## Data Model Overview

### Annotation Data Flow

```
Raw annotations (Queue)
  → /azaion/data-seed/                 (unvalidated)
  → /azaion/data/                      (validated)
  → /azaion/data-processed/            (augmented, 8×)
  → /azaion/datasets/azaion-{date}/    (train/valid/test split)
  → /azaion/data-corrupted/            (invalid labels)
  → /azaion/data_deleted/              (soft-deleted)
```

### Annotation Class System

- 17 base classes (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier)
- 3 weather modes: Norm (offset 0), Wint (offset 20), Night (offset 40)
- Total class slots: 80 (17 × 3 = 51 used, 29 reserved)
- Format: YOLO (center_x, center_y, width, height; all normalized to 0–1)

### Model Artifacts

| Format | Use | Export Details |
|--------|-----|---------------|
| `.pt` | Training checkpoint | YOLOv11 PyTorch weights |
| `.onnx` | Cross-platform inference | 1280 px, batch=4, NMS baked in |
| `.engine` | GPU inference (production) | TensorRT FP16, batch=4, per-GPU architecture |
| `.rknn` | Edge inference | RK3588 target (OrangePi5) |

## Integration Points

### Azaion REST API

- `POST /login` → JWT token
- `POST /resources/{folder}` → file upload (Bearer auth)
- `POST /resources/get/{folder}` → encrypted file download (hardware-bound key)

### S3-compatible CDN

- Upload: model big parts (`upload_fileobj`)
- Download: model big parts (`download_file`)
- Separate read/write access keys

### RabbitMQ Streams

- Queue: `azaion-annotations`
- Protocol: AMQP with the rstream library
- Message format: msgpack with positional integer keys
- Offset tracking: persisted to `offset.yaml`

## Non-Functional Requirements (Observed)

| Category | Observation | Source |
|----------|------------|--------|
| Training duration | ~11.5 days for 360K annotations on 1× RTX 4090 | Code comment in train.py |
| VRAM usage | batch=11 → ~22 GB (batch=12 fails at 24.2 GB) | Code comment in train.py |
| Inference speed | TensorRT: 54 s for a 200 s video (3.7 GB VRAM) | Code comment in start_inference.py |
| ONNX inference | 81 s for a 200 s video (6.3 GB VRAM) | Code comment in start_inference.py |
| Augmentation ratio | 8× (1 original + 7 augmented per image) | augmentation.py |
| Frame sampling | Every 4th frame during inference | inference/inference.py |

## Security Architecture

| Mechanism | Implementation | Location |
|-----------|---------------|----------|
| API authentication | JWT token (email/password login) | api_client.py |
| Resource encryption | AES-256-CBC (hardware-bound key) | security.py |
| Model encryption | AES-256-CBC (static key) | security.py |
| Split model storage | Small part on API, big part on CDN | api_client.py |
| Hardware fingerprinting | CPU+GPU+RAM+drive serial hash | hardware_service.py |
| CDN access control | Separate read/write S3 credentials | cdn_manager.py |

### Security Concerns

- Hardcoded credentials in `config.yaml` and `cdn.yaml`
- Hardcoded model encryption key in `security.py`
- No TLS certificate validation visible in the code
- No input validation on API responses
- Queue credentials in plaintext config files

## Key Architectural Decisions

| Decision | Rationale (inferred) |
|----------|---------------------|
| YOLOv11 medium at 1280 px | Balance between detection quality and training time |
| Split model storage | Prevent model theft from a single storage compromise |
| Hardware-bound API encryption | Tie resource access to authorized machines |
| TensorRT for production inference | ~33% faster than ONNX, ~42% less VRAM |
| Augmentation as separate process | Decouples data prep from training; runs continuously |
| Annotation queue as separate service | Independent lifecycle; different dependency set |
| RKNN export for OrangePi5 | Edge deployment on a low-power ARM SoC |
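The security architecture names AES-256-CBC via the `cryptography` package for both resource and model encryption. A minimal sketch of that primitive with PKCS7 padding and a prepended IV follows; the key handling here (a random in-memory key) is illustrative only, since the codebase's actual keys are hardware-bound or hardcoded, as the Security Concerns list flags:

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt(data: bytes, key: bytes) -> bytes:
    """AES-256-CBC encrypt with PKCS7 padding; the 16-byte IV is prepended."""
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()  # 128-bit AES block size
    padded = padder.update(data) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()

def decrypt(blob: bytes, key: bytes) -> bytes:
    """Split off the prepended IV, decrypt, and strip the padding."""
    iv, ct = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()

key = os.urandom(32)  # 256-bit key; real key management is the hard part
blob = encrypt(b"model bytes", key)
assert decrypt(blob, key) == b"model bytes"
```

The primitive itself is the easy part; the concerns listed above (static keys, plaintext configs) are all key-management issues that no choice of cipher mode can fix.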