Architecture
System Context
Azaion AI Training is a Python-based ML pipeline for training, exporting, and deploying YOLOv11 object detection models. The system operates within the Azaion platform ecosystem, consuming annotated image data and producing encrypted inference-ready models.
Boundaries
| Boundary | Interface | Protocol |
|---|---|---|
| Azaion REST API | ApiClient | HTTPS (JWT auth) |
| S3-compatible CDN | CDNManager (boto3) | HTTPS (S3 API) |
| RabbitMQ Streams | rstream Consumer | AMQP 1.0 |
| Local filesystem | Direct I/O | POSIX paths at /azaion/ |
| NVIDIA GPU | PyTorch, TensorRT, ONNX RT, PyCUDA | CUDA 12.1 |
System Context Diagram
Tech Stack
| Layer | Technology | Version/Detail |
|---|---|---|
| Language | Python | 3.10+ (match statements) |
| ML Framework | Ultralytics YOLO | YOLOv11 medium |
| Deep Learning | PyTorch | 2.3.0 (CUDA 12.1) |
| GPU Inference | TensorRT | FP16/INT8, async CUDA streams |
| GPU Inference (alt) | ONNX Runtime GPU | CUDAExecutionProvider |
| Edge Inference | RKNN | RK3588 (OrangePi5) |
| Augmentation | Albumentations | Geometric + color transforms |
| Computer Vision | OpenCV | Image I/O, preprocessing, display |
| Object Storage | boto3 | S3-compatible CDN |
| Message Queue | rstream | RabbitMQ Streams consumer |
| Serialization | msgpack | Queue message format |
| Encryption | cryptography | AES-256-CBC |
| HTTP Client | requests | REST API communication |
| Configuration | PyYAML | YAML config files |
| Visualization | matplotlib, netron | Annotation display, model graphs |
Deployment Model
The system runs as multiple independent processes on machines with NVIDIA GPUs:
| Process | Entry Point | Runtime | Typical Host |
|---|---|---|---|
| Training | train.py | Long-running (days) | GPU server (RTX 4090, 24GB VRAM) |
| Augmentation | augmentation.py | Continuous loop (infinite) | Same GPU server or CPU-only |
| Annotation Queue | annotation-queue/annotation_queue_handler.py | Continuous (async) | Any server with network access |
| Inference | start_inference.py | On-demand | GPU-equipped machine |
| Data Tools | convert-annotations.py, dataset-visualiser.py | Ad-hoc | Developer machine |
No containerization (Dockerfile), CI/CD pipeline, or orchestration infrastructure was found in the codebase. Deployment appears to be manual.
Data Model Overview
Annotation Data Flow
Annotation Class System
- 17 base classes (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier)
- 3 weather modes: Norm (offset 0), Wint (offset 20), Night (offset 40)
- Total class slots: 80 (17 × 3 = 51 used, 29 reserved)
- Format: YOLO (center_x, center_y, width, height — all normalized 0–1)
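The class/offset scheme above can be sketched in a few lines. The class names and weather offsets come from this document; the helper function itself is illustrative, not the project's actual code:

```python
# Sketch of the class-ID scheme: 17 base classes plus a per-weather-mode
# offset, giving 51 used slots out of 80. The helper is illustrative only.
BASE_CLASSES = [
    "ArmorVehicle", "Truck", "Vehicle", "Artillery", "Shadow", "Trenches",
    "MilitaryMan", "TyreTracks", "AdditArmoredTank", "Smoke", "Plane",
    "Moto", "CamouflageNet", "CamouflageBranches", "Roof", "Building",
    "Caponier",
]
WEATHER_OFFSETS = {"Norm": 0, "Wint": 20, "Night": 40}

def class_id(base_class: str, weather: str) -> int:
    """Map a base class and weather mode to its YOLO class slot."""
    return WEATHER_OFFSETS[weather] + BASE_CLASSES.index(base_class)

print(class_id("ArmorVehicle", "Norm"))  # 0
print(class_id("Truck", "Night"))        # 41
```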
Model Artifacts
| Format | Use | Export Details |
|---|---|---|
| .pt | Training checkpoint | YOLOv11 PyTorch weights |
| .onnx | Cross-platform inference | 1280px, dynamic batch (1–8), NMS baked in |
| .engine | GPU inference (production) | TensorRT FP16, dynamic batch max 8, per-GPU architecture |
| .rknn | Edge inference | RK3588 target (OrangePi5) |
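With the Ultralytics API, these artifacts would be produced roughly as below. The checkpoint path is a placeholder, the exact flags and RKNN-format support depend on the installed Ultralytics version, and the TensorRT and RKNN exports require their respective toolchains and hardware:

```python
from ultralytics import YOLO

model = YOLO("best.pt")  # trained .pt checkpoint; path assumed

# .onnx: 1280 px input with a dynamic batch axis
model.export(format="onnx", imgsz=1280, dynamic=True)

# .engine: TensorRT FP16, built for the local GPU architecture
model.export(format="engine", imgsz=1280, half=True)

# .rknn: RK3588 edge target (requires the RKNN toolkit)
model.export(format="rknn")
```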
Integration Points
Azaion REST API
- POST /login → JWT token
- POST /resources/{folder} → file upload (Bearer auth)
- POST /resources/get/{folder} → encrypted file download (hardware-bound key)
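A sketch of how the Bearer-authenticated upload might be assembled with requests, after a prior POST /login has yielded the token. The base URL, folder name, token, and multipart field name are all placeholders or assumptions:

```python
import requests

def build_upload_request(base, folder, token, payload):
    """Prepare the Bearer-authenticated POST /resources/{folder} upload."""
    req = requests.Request(
        "POST",
        f"{base}/resources/{folder}",
        headers={"Authorization": f"Bearer {token}"},
        files={"file": ("part.bin", payload)},  # multipart field name assumed
    )
    return req.prepare()

r = build_upload_request("https://api.example.com", "models", "example-token", b"\x00")
print(r.method, r.url)  # POST https://api.example.com/resources/models
```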
S3-compatible CDN
- Upload: model big parts (upload_fileobj)
- Download: model big parts (download_file)
- Separate read/write access keys
RabbitMQ Streams
- Queue: azaion-annotations
- Protocol: AMQP with the rstream library
- Message format: msgpack with positional integer keys
- Offset tracking: persisted to offset.yaml
Non-Functional Requirements (Observed)
| Category | Observation | Source |
|---|---|---|
| Training duration | ~11.5 days for 360K annotations on 1× RTX 4090 | Code comment in train.py |
| VRAM usage | batch=11 → ~22GB (batch=12 fails at 24.2GB) | Code comment in train.py |
| Inference speed | TensorRT: 54s for 200s video (3.7GB VRAM) | Code comment in start_inference.py |
| ONNX inference | 81s for 200s video (6.3GB VRAM) | Code comment in start_inference.py |
| Augmentation ratio | 8× (1 original + 7 augmented per image) | augmentation.py |
| Frame sampling | Every 4th frame during inference | inference/inference.py |
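The two inference rows reduce to a back-of-envelope comparison (same 200 s video):

```python
# Figures from the observations above: TensorRT vs ONNX Runtime.
trt_time, onnx_time = 54, 81     # seconds
trt_vram, onnx_vram = 3.7, 6.3   # GB of VRAM

speedup = (onnx_time - trt_time) / onnx_time      # 27/81 ≈ 33% faster
vram_saving = (onnx_vram - trt_vram) / onnx_vram  # ≈ 41% less VRAM
print(f"{speedup:.0%} faster, {vram_saving:.0%} less VRAM")
```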
Security Architecture
| Mechanism | Implementation | Location |
|---|---|---|
| API authentication | JWT token (email/password login) | api_client.py |
| Resource encryption | AES-256-CBC (hardware-bound key) | security.py |
| Model encryption | AES-256-CBC (static key) | security.py |
| Split model storage | Small part on API, big part on CDN | api_client.py |
| Hardware fingerprinting | CPU+GPU+RAM+drive serial hash | hardware_service.py |
| CDN access control | Separate read/write S3 credentials | cdn_manager.py |
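A minimal AES-256-CBC round trip with the cryptography library, matching the mechanism named above. Key derivation (hardware-bound vs. static) is out of scope here, so a random key stands in:

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)  # 256-bit key; the real system derives/stores its own
iv = os.urandom(16)   # per-message CBC initialization vector

# Pad to the 16-byte AES block size, then encrypt.
padder = padding.PKCS7(128).padder()
padded = padder.update(b"model bytes") + padder.finalize()
encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
ciphertext = encryptor.update(padded) + encryptor.finalize()

# Decrypt and strip the padding.
decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
unpadder = padding.PKCS7(128).unpadder()
recovered = (unpadder.update(decryptor.update(ciphertext) + decryptor.finalize())
             + unpadder.finalize())
```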
Security Concerns
- Hardcoded credentials in config.yaml and cdn.yaml
- Hardcoded model encryption key in security.py
- No TLS certificate validation visible in code
- No input validation on API responses
- Queue credentials in plaintext config files
Key Architectural Decisions
| Decision | Rationale (inferred) |
|---|---|
| YOLOv11 medium at 1280px | Balance between detection quality and training time |
| Split model storage | Prevent model theft from single storage compromise |
| Hardware-bound API encryption | Tie resource access to authorized machines |
| TensorRT for production inference | ~33% faster than ONNX, ~42% less VRAM |
| Augmentation as separate process | Decouples data prep from training; runs continuously |
| Annotation queue as separate service | Independent lifecycle; different dependency set |
| RKNN export for OrangePi5 | Edge deployment on low-power ARM SoC |