# Architecture
## System Context
Azaion AI Training is a Python-based ML pipeline for training, exporting, and deploying YOLOv11 object detection models. The system operates within the Azaion platform ecosystem, consuming annotated image data and producing encrypted inference-ready models.
### Boundaries
| Boundary | Interface | Protocol |
|----------|-----------|----------|
| Azaion REST API | ApiClient | HTTPS (JWT auth) |
| S3-compatible CDN | CDNManager (boto3) | HTTPS (S3 API) |
| RabbitMQ Streams | rstream Consumer | AMQP 1.0 |
| Local filesystem | Direct I/O | POSIX paths at `/azaion/` |
| NVIDIA GPU | PyTorch, TensorRT, ONNX RT, PyCUDA | CUDA 12.1 |
### System Context Diagram
```mermaid
graph LR
    subgraph "Azaion Platform"
        API[Azaion REST API]
        CDN[S3-compatible CDN]
        Queue[RabbitMQ Streams]
    end
    subgraph "AI Training System"
        AQ[Annotation Queue Consumer]
        AUG[Augmentation Pipeline]
        TRAIN[Training Pipeline]
        INF[Inference Engine]
    end
    subgraph "Storage"
        FS["/azaion/ filesystem"]
    end
    subgraph "Hardware"
        GPU[NVIDIA GPU]
    end
    Queue -->|annotation events| AQ
    AQ -->|images + labels| FS
    FS -->|raw annotations| AUG
    AUG -->|augmented data| FS
    FS -->|processed dataset| TRAIN
    TRAIN -->|trained model| GPU
    TRAIN -->|encrypted model| API
    TRAIN -->|encrypted model big part| CDN
    API -->|encrypted model small part| INF
    CDN -->|encrypted model big part| INF
    INF -->|inference| GPU
```
## Tech Stack
| Layer | Technology | Version/Detail |
|-------|-----------|---------------|
| Language | Python | 3.10+ (match statements) |
| ML Framework | Ultralytics YOLO | YOLOv11 medium |
| Deep Learning | PyTorch | 2.3.0 (CUDA 12.1) |
| GPU Inference | TensorRT | FP16/INT8, async CUDA streams |
| GPU Inference (alt) | ONNX Runtime GPU | CUDAExecutionProvider |
| Edge Inference | RKNN | RK3588 (OrangePi5) |
| Augmentation | Albumentations | Geometric + color transforms |
| Computer Vision | OpenCV | Image I/O, preprocessing, display |
| Object Storage | boto3 | S3-compatible CDN |
| Message Queue | rstream | RabbitMQ Streams consumer |
| Serialization | msgpack | Queue message format |
| Encryption | cryptography | AES-256-CBC |
| HTTP Client | requests | REST API communication |
| Configuration | PyYAML | YAML config files |
| Visualization | matplotlib, netron | Annotation display, model graphs |
## Deployment Model
The system runs as multiple independent processes on machines with NVIDIA GPUs:
| Process | Entry Point | Runtime | Typical Host |
|---------|------------|---------|-------------|
| Training | `train.py` | Long-running (days) | GPU server (RTX 4090, 24GB VRAM) |
| Augmentation | `augmentation.py` | Continuous loop (infinite) | Same GPU server or CPU-only |
| Annotation Queue | `annotation-queue/annotation_queue_handler.py` | Continuous (async) | Any server with network access |
| Inference | `start_inference.py` | On-demand | GPU-equipped machine |
| Data Tools | `convert-annotations.py`, `dataset-visualiser.py` | Ad-hoc | Developer machine |
No containerization (no Dockerfile), CI/CD configuration, or orchestration infrastructure was found in the codebase; deployment appears to be manual.
## Data Model Overview
### Annotation Data Flow
```
Raw annotations (Queue) → /azaion/data-seed/ (unvalidated)
  → /azaion/data/ (validated)
  → /azaion/data-processed/ (augmented, 8×)
  → /azaion/datasets/azaion-{date}/ (train/valid/test split)

Side paths:
  → /azaion/data-corrupted/ (invalid labels)
  → /azaion/data_deleted/ (soft-deleted)
```
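The validated/corrupted split above implies a routing step that checks each label file before promoting it out of `data-seed/`. A minimal sketch of that check, assuming YOLO `.txt` labels and the 80-slot class range described below (function names and the exact validation rules are hypothetical, not read from the codebase):

```python
from pathlib import Path
import shutil

# Hypothetical stage directories, matching the flow above.
DATA = Path("/azaion/data")
DATA_CORRUPTED = Path("/azaion/data-corrupted")

def label_is_valid(label_text: str, num_classes: int = 80) -> bool:
    """Check that every label line parses as 'class cx cy w h'
    with an in-range class ID and normalized coordinates."""
    for line in label_text.strip().splitlines():
        parts = line.split()
        if len(parts) != 5:
            return False
        try:
            cls = int(parts[0])
            coords = [float(p) for p in parts[1:]]
        except ValueError:
            return False
        if not 0 <= cls < num_classes:
            return False
        if any(not 0.0 <= c <= 1.0 for c in coords):
            return False
    return True

def route_label(label_path: Path) -> Path:
    """Promote a seed label (and, in the real pipeline, its image)
    to data/ when valid, data-corrupted/ otherwise."""
    dest = DATA if label_is_valid(label_path.read_text()) else DATA_CORRUPTED
    dest.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(label_path), str(dest / label_path.name)))
```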
### Annotation Class System
- 17 base classes (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier)
- 3 weather modes: Norm (offset 0), Wint (offset 20), Night (offset 40)
- Total class slots: 80 (17 × 3 = 51 used, 29 reserved)
- Format: YOLO (center_x, center_y, width, height — all normalized to 0–1)
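The offset scheme above means a detection's class slot encodes both the object type and the weather mode. A sketch of the mapping and its inverse (reconstructed from the offsets stated here, not copied from the codebase):

```python
# 17 base classes in the order listed above; weather modes are fixed ID offsets.
BASE_CLASSES = [
    "ArmorVehicle", "Truck", "Vehicle", "Artillery", "Shadow",
    "Trenches", "MilitaryMan", "TyreTracks", "AdditArmoredTank",
    "Smoke", "Plane", "Moto", "CamouflageNet", "CamouflageBranches",
    "Roof", "Building", "Caponier",
]
WEATHER_OFFSETS = {"Norm": 0, "Wint": 20, "Night": 40}

def class_id(base: str, weather: str) -> int:
    """Map (base class, weather mode) to one of the 80 class slots."""
    return WEATHER_OFFSETS[weather] + BASE_CLASSES.index(base)

def decode_class(slot: int) -> tuple[str, str]:
    """Inverse mapping: recover (base class, weather mode) from a slot.
    Slots 17-19, 37-39, and 57-79 are the 29 reserved gaps."""
    for weather, offset in sorted(WEATHER_OFFSETS.items(),
                                  key=lambda kv: kv[1], reverse=True):
        if slot >= offset and slot - offset < len(BASE_CLASSES):
            return BASE_CLASSES[slot - offset], weather
    raise ValueError(f"unused class slot: {slot}")
```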
### Model Artifacts
| Format | Use | Export Details |
|--------|-----|---------------|
| `.pt` | Training checkpoint | YOLOv11 PyTorch weights |
| `.onnx` | Cross-platform inference | 1280px, batch=4, NMS baked in |
| `.engine` | GPU inference (production) | TensorRT FP16, batch=4, per-GPU architecture |
| `.rknn` | Edge inference | RK3588 target (OrangePi5) |
## Integration Points
### Azaion REST API
- `POST /login` → JWT token
- `POST /resources/{folder}` → file upload (Bearer auth)
- `POST /resources/get/{folder}` → encrypted file download (hardware-bound key)
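A minimal client sketch for the three endpoints, using the `requests` library the doc lists. The base URL, JSON field names (`email`, `password`, `token`), and multipart field name are assumptions; only the paths and the Bearer scheme come from the doc:

```python
import requests

API_BASE = "https://api.example.invalid"  # placeholder; real host not in the doc

def bearer(token: str) -> dict[str, str]:
    """Authorization header for endpoints behind JWT auth."""
    return {"Authorization": f"Bearer {token}"}

def login(email: str, password: str) -> str:
    """POST /login → JWT token (response field name is an assumption)."""
    r = requests.post(f"{API_BASE}/login",
                      json={"email": email, "password": password}, timeout=30)
    r.raise_for_status()
    return r.json()["token"]

def upload(token: str, folder: str, path: str) -> None:
    """POST /resources/{folder} → file upload with Bearer auth."""
    with open(path, "rb") as f:
        r = requests.post(f"{API_BASE}/resources/{folder}",
                          headers=bearer(token), files={"file": f}, timeout=300)
    r.raise_for_status()
```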
### S3-compatible CDN
- Upload: model big parts (`upload_fileobj`)
- Download: model big parts (`download_file`)
- Separate read/write access keys
### RabbitMQ Streams
- Queue: `azaion-annotations`
- Protocol: AMQP with rstream library
- Message format: msgpack with positional integer keys
- Offset tracking: persisted to `offset.yaml`
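"Positional integer keys" means a decoded message is a dict keyed by field position rather than field name, e.g. `{0: ..., 1: ...}` out of `msgpack.unpackb`. A sketch of the consumer-side translation; the field positions and names below are hypothetical, only the keying scheme is from the doc:

```python
# Hypothetical position → name mapping for annotation events.
FIELD_NAMES = {0: "image_url", 1: "labels", 2: "event_type"}

def decode_annotation_event(fields: dict) -> dict:
    """Translate a positionally keyed msgpack payload into named fields,
    silently skipping positions this consumer does not know about
    (which lets producers append fields without breaking old consumers)."""
    return {name: fields[pos]
            for pos, name in FIELD_NAMES.items() if pos in fields}
```

The last-consumed stream offset would then be written to `offset.yaml` after each batch so the consumer can resume where it left off.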
## Non-Functional Requirements (Observed)
| Category | Observation | Source |
|----------|------------|--------|
| Training duration | ~11.5 days for 360K annotations on 1× RTX 4090 | Code comment in train.py |
| VRAM usage | batch=11 → ~22GB (batch=12 fails at 24.2GB) | Code comment in train.py |
| Inference speed | TensorRT: 54s for 200s video (3.7GB VRAM) | Code comment in start_inference.py |
| ONNX inference | 81s for 200s video (6.3GB VRAM) | Code comment in start_inference.py |
| Augmentation ratio | 8× (1 original + 7 augmented per image) | augmentation.py |
| Frame sampling | Every 4th frame during inference | inference/inference.py |
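The every-4th-frame sampling in the last row can be sketched as a simple stride over frame indices (the helper name is ours; the real loop in `inference/inference.py` presumably reads frames via OpenCV and skips the rest):

```python
def frames_to_process(total_frames: int, step: int = 4) -> range:
    """Indices of the frames the inference loop runs the model on,
    given every-Nth-frame sampling; intermediate frames are skipped."""
    return range(0, total_frames, step)
```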
## Security Architecture
| Mechanism | Implementation | Location |
|-----------|---------------|----------|
| API authentication | JWT token (email/password login) | api_client.py |
| Resource encryption | AES-256-CBC (hardware-bound key) | security.py |
| Model encryption | AES-256-CBC (static key) | security.py |
| Split model storage | Small part on API, big part on CDN | api_client.py |
| Hardware fingerprinting | CPU+GPU+RAM+drive serial hash | hardware_service.py |
| CDN access control | Separate read/write S3 credentials | cdn_manager.py |
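A sketch of AES-256-CBC with the `cryptography` library named in the tech stack. PKCS7 padding and prepending a random IV to the ciphertext are our assumptions; `security.py` may handle the IV and key material differently:

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """AES-256-CBC: pad to the 16-byte block size, encrypt under a
    fresh random IV, and prepend the IV to the ciphertext."""
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()

def decrypt(blob: bytes, key: bytes) -> bytes:
    """Split off the IV, decrypt, and strip the PKCS7 padding."""
    iv, ct = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

For the hardware-bound variant, the 32-byte key would be derived from the machine fingerprint produced by `hardware_service.py` rather than stored statically.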
### Security Concerns
- Hardcoded credentials in `config.yaml` and `cdn.yaml`
- Hardcoded model encryption key in `security.py`
- No TLS certificate validation visible in code
- No input validation on API responses
- Queue credentials in plaintext config files
## Key Architectural Decisions
| Decision | Rationale (inferred) |
|----------|---------------------|
| YOLOv11 medium at 1280px | Balance between detection quality and training time |
| Split model storage | Prevent model theft from single storage compromise |
| Hardware-bound API encryption | Tie resource access to authorized machines |
| TensorRT for production inference | ~33% faster than ONNX, ~42% less VRAM |
| Augmentation as separate process | Decouples data prep from training; runs continuously |
| Annotation queue as separate service | Independent lifecycle; different dependency set |
| RKNN export for OrangePi5 | Edge deployment on low-power ARM SoC |