# Final Documentation Report — Azaion AI Training

## Executive Summary

Azaion AI Training is a Python-based ML pipeline for training, deploying, and running YOLOv11 object detection models targeting aerial military asset recognition. The system comprises 8 components (21 modules) spanning annotation ingestion, data augmentation, GPU-accelerated training, multi-format model export, encrypted model distribution, and real-time inference — with edge deployment capability via RKNN on OrangePi5 devices.

The codebase is functional and production-used (last training run: 2024-06-27) but has no CI/CD, no containerization, no formal test framework, and several hardcoded credentials. Verification identified 5 code bugs, 3 high-severity security issues, and 1 missing module.

## Problem Statement

The system automates detection of 17 classes of military objects and infrastructure in aerial/satellite imagery across 3 weather conditions (Normal, Winter, Night). It replaces manual image analysis with a continuous pipeline: human-annotated data flows in via RabbitMQ, is augmented 8× for training diversity, trains YOLOv11 models over multi-day GPU runs, and distributes encrypted models to inference clients that run real-time video detection.

## Architecture Overview

**Tech stack**: Python 3.10+ · PyTorch 2.3.0 (CUDA 12.1) · Ultralytics YOLOv11m · TensorRT · ONNX Runtime · Albumentations · boto3 · rstream · cryptography

**Deployment**: 5 independent processes (no orchestration, no containers) running on GPU-equipped servers. Manual deployment.
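To make the encrypted-distribution step concrete, here is a minimal sketch of AES-256-CBC model encryption as the Security component describes it. The function names, the IV-prepending convention, and the key parameter are assumptions for illustration — the actual implementation lives in `security.py` (which, per the security findings, currently hardcodes the key):

```python
# Hedged sketch of AES-256-CBC model encryption/decryption. Function names
# and the prepended-IV layout are illustrative assumptions, not security.py's
# verbatim API. The key should come from secret storage, never source code.
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def encrypt_model(model_bytes: bytes, key: bytes) -> bytes:
    """Encrypt a model blob; a random 16-byte IV is prepended to the output."""
    iv = os.urandom(16)
    padder = padding.PKCS7(algorithms.AES.block_size).padder()
    padded = padder.update(model_bytes) + padder.finalize()
    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + encryptor.update(padded) + encryptor.finalize()


def decrypt_model(blob: bytes, key: bytes) -> bytes:
    """Split off the IV, decrypt, and strip PKCS7 padding."""
    iv, ciphertext = blob[:16], blob[16:]
    decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = padding.PKCS7(algorithms.AES.block_size).unpadder()
    return unpadder.update(padded) + unpadder.finalize()
```

Prepending the IV keeps each encrypted model self-describing, so the inference client only needs the shared 32-byte key to decrypt a downloaded artifact.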
## Component Summary

| # | Component | Modules | Purpose | Key Dependencies |
|---|-----------|---------|---------|------------------|
| 01 | Core Infrastructure | constants, utils | Shared paths, config keys, Dotdict helper | None |
| 02 | Security & Hardware | security, hardware_service | AES-256-CBC encryption, hardware fingerprinting | cryptography, pynvml |
| 03 | API & CDN Client | api_client, cdn_manager | REST API (JWT auth) + S3 CDN communication | requests, boto3, Security |
| 04 | Data Models | dto/annotationClass, dto/imageLabel | Annotation class definitions, image+label container | OpenCV, matplotlib |
| 05 | Data Pipeline | augmentation, convert-annotations, dataset-visualiser | 8× augmentation, format conversion, visualization | Albumentations, Data Models |
| 06 | Training Pipeline | train, exports, manual_run | Dataset formation → YOLO training → export → encrypted upload | Ultralytics, API & CDN, Security |
| 07 | Inference Engine | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Model download, decryption, TensorRT/ONNX video inference | TensorRT, ONNX Runtime, PyCUDA |
| 08 | Annotation Queue | annotation_queue_dto, annotation_queue_handler | Async RabbitMQ Streams consumer for annotation CRUD events | rstream, msgpack |

## System Flows

| # | Flow | Entry Point | Path | Output |
|---|------|-------------|------|--------|
| 1 | Annotation Ingestion | RabbitMQ message | Queue → Handler → Filesystem | Images + labels on disk |
| 2 | Data Augmentation | Filesystem scan (5-min loop) | /data/ → Augmentator → /data-processed/ | 8× augmented images + labels |
| 3 | Training Pipeline | train.py `__main__` | /data-processed/ → Dataset split → YOLO train → Export → Encrypt → Upload | Encrypted model on API + CDN |
| 4 | Model Download & Inference | start_inference.py `__main__` | API + CDN download → Decrypt → TensorRT init → Video frames → Detections | Annotated video output |
| 5 | Model Export (Multi-Format) | train.py / manual_run.py | .pt → .onnx / .engine / .rknn | Multi-format model artifacts |

## Risk Observations

### Code Bugs (from Verification)

| # | Location | Issue | Impact |
|---|----------|-------|--------|
| 1 | augmentation.py:118 | `self.total_to_process` undefined (should be `self.total_images_to_process`) | AttributeError during progress logging |
| 2 | train.py:93,99 | `copied` counter never incremented | Incorrect progress reporting (cosmetic) |
| 3 | exports.py:97 | Stale `ApiClient(ApiCredentials(...))` constructor call with wrong params | `upload_model` would fail at runtime |
| 4 | inference/tensorrt_engine.py:43-44 | `batch_size` uninitialized for dynamic batch dimensions | NameError for models with dynamic batch size |
| 5 | dataset-visualiser.py:6 | Imports from a `preprocessing` module that doesn't exist | Script cannot run |

### Security Issues

| Issue | Severity | Location |
|-------|----------|----------|
| Hardcoded API credentials | High | config.yaml |
| Hardcoded CDN access keys (4 keys) | High | cdn.yaml |
| Hardcoded model encryption key | High | security.py:67 |
| Queue credentials in plaintext | Medium | config.yaml, annotation-queue/config.yaml |
| No TLS certificate validation | Low | api_client.py |

### Structural Concerns

- No CI/CD pipeline or containerization
- No formal test framework (2 script-based tests, 1 broken)
- Duplicated AnnotationClass/WeatherMode code in 3 locations
- No graceful shutdown for the augmentation process
- No reconnect logic for the annotation queue consumer
- Manual deployment only

## Open Questions

- The `preprocessing` module may have existed previously and been deleted or renamed — its absence breaks `dataset-visualiser.py` and `tests/imagelabel_visualize_test.py`
- `exports.upload_model` may be intentionally deprecated in favor of the ApiClient-based flow in `train.py`
- The `orangepi5/` shell scripts were not analyzed (bash, not Python) — they appear to be setup/run scripts for edge deployment
- `checkpoint.txt` (2024-06-27) suggests the training infrastructure was last used in mid-2024

## Artifact Index

| Path | Description | Step |
|------|-------------|------|
| `_docs/00_problem/problem.md` | Problem statement | 6 |
| `_docs/00_problem/restrictions.md` | Hardware, software, environment, operational restrictions | 6 |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria from code | 6 |
| `_docs/00_problem/input_data/data_parameters.md` | Input data schemas and formats | 6 |
| `_docs/00_problem/security_approach.md` | Security mechanisms and known issues | 6 |
| `_docs/01_solution/solution.md` | Retrospective solution document | 5 |
| `_docs/02_document/00_discovery.md` | Codebase discovery: tree, tech stack, dependency graph | 0 |
| `_docs/02_document/modules/*.md` | 21 module-level documentation files | 1 |
| `_docs/02_document/components/0N_*/description.md` | 8 component specifications | 2 |
| `_docs/02_document/diagrams/components.md` | Component relationship diagram (Mermaid) | 2 |
| `_docs/02_document/architecture.md` | System architecture document | 3 |
| `_docs/02_document/system-flows.md` | 5 system flow diagrams with sequence diagrams | 3 |
| `_docs/02_document/data_model.md` | Data model with ER diagram | 3 |
| `_docs/02_document/diagrams/flows/flow_*.md` | Individual flow diagrams (4 files) | 3 |
| `_docs/02_document/04_verification_log.md` | Verification results: 87 entities, 5 bugs, 3 security issues | 4 |
| `_docs/02_document/FINAL_report.md` | This report | 7 |
| `_docs/02_document/state.json` | Document skill progress tracking | — |
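As a closing illustration of Flow 3's dataset-formation step, the sketch below shows one way the paired image+label pool in `/data-processed/` could be split into YOLO-style train/val directories. Directory layout, the `.jpg`/`.txt` pairing, and the 80/20 ratio are assumptions for illustration — the actual logic lives in `train.py`:

```python
# Hypothetical sketch of Flow 3's dataset-formation step: split paired
# image/label files into YOLO-style images/{train,val} + labels/{train,val}
# trees. Names, extensions, and the val_ratio default are assumptions.
import random
import shutil
from pathlib import Path


def split_dataset(processed: Path, out: Path,
                  val_ratio: float = 0.2, seed: int = 42) -> dict:
    """Copy image/label pairs into train/val subtrees; return split sizes."""
    images = sorted(processed.glob("*.jpg"))
    random.Random(seed).shuffle(images)          # deterministic shuffle
    n_val = int(len(images) * val_ratio)
    splits = {"val": images[:n_val], "train": images[n_val:]}
    for split, files in splits.items():
        for sub in ("images", "labels"):
            (out / sub / split).mkdir(parents=True, exist_ok=True)
        for img in files:
            label = img.with_suffix(".txt")      # YOLO: one .txt per image
            shutil.copy2(img, out / "images" / split / img.name)
            if label.exists():
                shutil.copy2(label, out / "labels" / split / label.name)
    return {k: len(v) for k, v in splits.items()}
```

A fixed seed keeps the split reproducible across the multi-day training runs the report describes, so a restarted run sees the same train/val partition.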