Refactor constants management to use Pydantic BaseModel for configuration

- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
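The shape of the change can be sketched as follows. Field names and defaults here are illustrative, drawn from the paths listed in the discovery document, and are not the actual class definition in `constants.py`:

```python
from pathlib import Path
from pydantic import BaseModel


class Config(BaseModel):
    """Illustrative sketch of the new configuration class; the real
    field names in constants.py may differ."""
    root_dir: Path = Path("/azaion")
    data_dir: Path = Path("/azaion/data")
    processed_dir: Path = Path("/azaion/data-processed")
    datasets_dir: Path = Path("/azaion/datasets")
    models_dir: Path = Path("/azaion/models")
    small_size_kb: int = 3


# A single shared instance replaces the old module-level variables;
# consumers access paths as config.data_dir instead of importing constants.
config = Config()
```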
Author: Oleksandr Bezdieniezhnykh
Date: 2026-03-27 18:18:30 +02:00
Parent: b68c07b540
Commit: 142c6c4de8
106 changed files with 5706 additions and 654 deletions
# Codebase Discovery
## Directory Tree
```
ai-training/
├── annotation-queue/ # Separate sub-service: annotation message queue consumer
│ ├── annotation_queue_dto.py
│ ├── annotation_queue_handler.py
│ ├── classes.json
│ ├── config.yaml
│ ├── offset.yaml
│ ├── requirements.txt
│ └── run.sh
├── dto/ # Data transfer objects for the training pipeline
│ ├── annotationClass.py
│ ├── annotation_bulk_message.py (empty)
│ ├── annotation_message.py (empty)
│ └── imageLabel.py
├── inference/ # Inference engine subsystem (ONNX + TensorRT)
│ ├── __init__.py (empty)
│ ├── dto.py
│ ├── inference.py
│ ├── onnx_engine.py
│ └── tensorrt_engine.py
├── orangepi5/ # Setup scripts for OrangePi5 edge device
│ ├── 01 install.sh
│ ├── 02 install-inference.sh
│ └── 03 run_inference.sh
├── scripts/
│ └── init-sftp.sh
├── tests/
│ ├── data.yaml
│ ├── imagelabel_visualize_test.py
│ ├── libomp140.x86_64.dll (binary workaround for Windows)
│ └── security_test.py
├── api_client.py # API client for Azaion backend + CDN resource management
├── augmentation.py # Image augmentation pipeline (albumentations)
├── cdn_manager.py # S3-compatible CDN upload/download via boto3
├── cdn.yaml # CDN credentials config
├── checkpoint.txt # Last training checkpoint timestamp
├── classes.json # Annotation class definitions (17 classes + weather modes)
├── config.yaml # Main config (API url, queue, directories)
├── constants.py # Shared path constants and config keys
├── convert-annotations.py # Annotation format converter (Pascal VOC / bbox → YOLO)
├── dataset-visualiser.py # Interactive dataset visualization tool
├── exports.py # Model export (ONNX, TensorRT, RKNN) and upload
├── hardware_service.py # Hardware fingerprinting (CPU/GPU/RAM/drive serial)
├── install.sh # Dependency installation script
├── manual_run.py # Manual training/export entry point
├── requirements.txt # Python dependencies
├── security.py # AES-256-CBC encryption/decryption + key derivation
├── start_inference.py # Inference entry point (downloads model, runs TensorRT)
├── train.py # Main training pipeline (dataset formation → YOLO training → export)
└── utils.py # Utility classes (Dotdict)
```
## Tech Stack Summary
| Category | Technology | Details |
|----------|-----------|---------|
| Language | Python 3.10+ | Match statements used (3.10 feature) |
| ML Framework | Ultralytics (YOLO) | YOLOv11 object detection model |
| Deep Learning | PyTorch 2.3.0 (CUDA 12.1) | GPU-accelerated training |
| Inference (Primary) | TensorRT | GPU inference with FP16/INT8 support |
| Inference (Fallback) | ONNX Runtime GPU | Cross-platform inference |
| Augmentation | Albumentations | Image augmentation pipeline |
| Computer Vision | OpenCV (cv2) | Image I/O, preprocessing, visualization |
| CDN/Storage | boto3 (S3-compatible) | Model artifact storage |
| Message Queue | RabbitMQ Streams (rstream) | Annotation message consumption |
| Serialization | msgpack | Queue message deserialization |
| Encryption | cryptography (AES-256-CBC) | Model encryption, API resource encryption |
| GPU Management | pycuda, pynvml | CUDA memory management, device queries |
| HTTP | requests | API communication |
| Config | PyYAML | Configuration files |
| Visualization | matplotlib, netron | Annotation display, model graph viewer |
| Edge Deployment | RKNN (RK3588) | OrangePi5 inference target |
## Dependency Graph
### Internal Module Dependencies (textual)
**Leaves (no internal dependencies):**
- `constants` — path constants, config keys
- `utils` — Dotdict helper
- `security` — encryption/decryption, key derivation
- `hardware_service` — hardware fingerprinting
- `cdn_manager` — S3-compatible CDN client
- `dto/annotationClass` — annotation class model + JSON reader
- `dto/imageLabel` — image+labels container with visualization
- `inference/dto` — Detection, Annotation, AnnotationClass (inference-specific)
- `inference/onnx_engine` — InferenceEngine ABC + OnnxEngine implementation
- `convert-annotations` — standalone annotation format converter
- `annotation-queue/annotation_queue_dto` — queue message DTOs
**Level 1 (depends on leaves):**
- `api_client` → constants, cdn_manager, hardware_service, security
- `augmentation` → constants, dto/imageLabel
- `inference/tensorrt_engine` → inference/onnx_engine (InferenceEngine ABC)
- `inference/inference` → inference/dto, inference/onnx_engine
- `annotation-queue/annotation_queue_handler` → annotation_queue_dto
**Level 2 (depends on level 1):**
- `exports` → constants, api_client, cdn_manager, security, utils
**Level 3 (depends on level 2):**
- `train` → constants, api_client, cdn_manager, dto/annotationClass, inference/onnx_engine, security, utils, exports
- `start_inference` → constants, api_client, cdn_manager, inference/inference, inference/tensorrt_engine, security, utils
**Level 4 (depends on level 3):**
- `manual_run` → constants, train, augmentation
**Broken dependency:**
- `dataset-visualiser` → constants, dto/annotationClass, dto/imageLabel, **preprocessing** (module not found in codebase)
### Dependency Graph (Mermaid)
```mermaid
graph TD
constants --> api_client
constants --> augmentation
constants --> exports
constants --> train
constants --> manual_run
constants --> start_inference
constants --> dataset-visualiser
utils --> exports
utils --> train
utils --> start_inference
security --> api_client
security --> exports
security --> train
security --> start_inference
hardware_service --> api_client
cdn_manager --> api_client
cdn_manager --> exports
cdn_manager --> train
cdn_manager --> start_inference
api_client --> exports
api_client --> train
api_client --> start_inference
dto_annotationClass[dto/annotationClass] --> train
dto_annotationClass --> dataset-visualiser
dto_imageLabel[dto/imageLabel] --> augmentation
dto_imageLabel --> dataset-visualiser
inference_dto[inference/dto] --> inference_inference[inference/inference]
inference_onnx[inference/onnx_engine] --> inference_inference
inference_onnx --> inference_trt[inference/tensorrt_engine]
inference_onnx --> train
inference_inference --> start_inference
inference_trt --> start_inference
exports --> train
train --> manual_run
augmentation --> manual_run
aq_dto[annotation-queue/annotation_queue_dto] --> aq_handler[annotation-queue/annotation_queue_handler]
```
## Topological Processing Order
| Batch | Modules |
|-------|---------|
| 1 (leaves) | constants, utils, security, hardware_service, cdn_manager |
| 2 (leaves) | dto/annotationClass, dto/imageLabel, inference/dto, inference/onnx_engine |
| 3 (level 1) | api_client, augmentation, inference/tensorrt_engine, inference/inference |
| 4 (level 2) | exports, convert-annotations, dataset-visualiser |
| 5 (level 3) | train, start_inference |
| 6 (level 4) | manual_run |
| 7 (separate) | annotation-queue/annotation_queue_dto, annotation-queue/annotation_queue_handler |
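The batch layering above follows mechanically from the dependency graph. As a sketch, Kahn's algorithm over an abbreviated subset of the edges reproduces it (the edge list here is a small sample of the full graph, chosen for illustration):

```python
from collections import defaultdict

# (dependency, dependent) pairs — a subset of the dependency graph above
edges = [
    ("constants", "api_client"), ("constants", "train"),
    ("security", "api_client"), ("cdn_manager", "api_client"),
    ("api_client", "exports"), ("exports", "train"),
    ("train", "manual_run"),
]


def topo_batches(edges):
    """Group modules into batches where each batch depends only on earlier ones."""
    deps = defaultdict(set)        # module -> unresolved dependencies
    dependents = defaultdict(set)  # module -> modules that depend on it
    nodes = set()
    for a, b in edges:
        deps[b].add(a)
        dependents[a].add(b)
        nodes |= {a, b}
    batches = []
    ready = {n for n in nodes if not deps[n]}  # leaves first
    while ready:
        batches.append(sorted(ready))
        nxt = set()
        for n in ready:
            for d in dependents[n]:
                deps[d].discard(n)
                if not deps[d]:
                    nxt.add(d)
        ready = nxt
    return batches


print(topo_batches(edges))
```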
## Entry Points
| Entry Point | Description |
|-------------|-------------|
| `train.py` (`__main__`) | Main pipeline: form dataset → train YOLO → export + upload ONNX model |
| `augmentation.py` (`__main__`) | Continuous augmentation loop (runs indefinitely) |
| `start_inference.py` (`__main__`) | Download encrypted TensorRT model → run video inference |
| `manual_run.py` (script) | Ad-hoc training/export commands |
| `convert-annotations.py` (`__main__`) | One-shot annotation format conversion |
| `dataset-visualiser.py` (`__main__`) | Interactive annotation visualization |
| `annotation-queue/annotation_queue_handler.py` (`__main__`) | Async queue consumer for annotation CRUD events |
## Leaf Modules
constants, utils, security, hardware_service, cdn_manager, dto/annotationClass, dto/imageLabel, inference/dto, inference/onnx_engine, convert-annotations, annotation-queue/annotation_queue_dto
## Observations
- **Security concern**: `config.yaml` and `cdn.yaml` contain hardcoded credentials (API passwords, S3 access keys). These should be moved to environment variables or a secrets manager.
- **Missing module**: `dataset-visualiser.py` imports from `preprocessing` which does not exist in the codebase.
- **Duplicate code**: `AnnotationClass` and `WeatherMode` are defined in three separate locations: `dto/annotationClass.py`, `inference/dto.py`, and `annotation-queue/annotation_queue_dto.py`.
- **Empty files**: `dto/annotation_bulk_message.py`, `dto/annotation_message.py`, and `inference/__init__.py` are empty.
- **Separate sub-service**: `annotation-queue/` has its own `requirements.txt` and `config.yaml`, functioning as an independent service.
- **Hardcoded encryption key**: `security.py` has a hardcoded model encryption key string.
- **No formal test framework**: tests are script-based, not using pytest/unittest.
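As a minimal illustration of the credentials fix suggested in the first observation — the variable names below are hypothetical, not taken from the codebase:

```python
import os


def load_secret(name: str) -> str:
    """Read a required secret from the environment, failing loudly if absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Required secret {name!r} is not set")
    return value


# Hypothetical usage replacing hardcoded values in config.yaml / cdn.yaml:
# api_password = load_secret("AZAION_API_PASSWORD")
# cdn_access_key = load_secret("AZAION_CDN_ACCESS_KEY")
```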
# Verification Log
## Summary
| Metric | Count |
|--------|-------|
| Entities verified | 87 |
| Entities flagged | 0 |
| Corrections applied | 0 |
| Bugs found in code | 5 |
| Missing modules | 1 |
| Duplicated code | 1 pattern (3 locations) |
| Security issues | 3 |
| Completeness | 21/21 modules (100%) |
## Entity Verification
All class names, function names, method signatures, and module names referenced in documentation were verified against the actual source code. No hallucinated entities found.
### Verified Entities (key samples)
| Entity | Location | Doc Reference | Status |
|--------|----------|--------------|--------|
| `Security.encrypt_to` | security.py:14 | modules/security.md | OK |
| `Security.decrypt_to` | security.py:28 | modules/security.md | OK |
| `Security.get_model_encryption_key` | security.py:66 | modules/security.md | OK |
| `get_hardware_info` | hardware_service.py:5 | modules/hardware_service.md | OK |
| `CDNManager.upload` | cdn_manager.py:28 | modules/cdn_manager.md | OK |
| `CDNManager.download` | cdn_manager.py:37 | modules/cdn_manager.md | OK |
| `ApiClient.login` | api_client.py:43 | modules/api_client.md | OK |
| `ApiClient.load_bytes` | api_client.py:63 | modules/api_client.md | OK |
| `ApiClient.upload_big_small_resource` | api_client.py:113 | modules/api_client.md | OK |
| `Augmentator.augment_annotations` | augmentation.py:125 | modules/augmentation.md | OK |
| `Augmentator.augment_inner` | augmentation.py:55 | modules/augmentation.md | OK |
| `InferenceEngine` (ABC) | inference/onnx_engine.py:7 | modules/inference_onnx_engine.md | OK |
| `OnnxEngine` | inference/onnx_engine.py:25 | modules/inference_onnx_engine.md | OK |
| `TensorRTEngine` | inference/tensorrt_engine.py:16 | modules/inference_tensorrt_engine.md | OK |
| `TensorRTEngine.convert_from_onnx` | inference/tensorrt_engine.py:104 | modules/inference_tensorrt_engine.md | OK |
| `Inference.process` | inference/inference.py:83 | modules/inference_inference.md | OK |
| `Inference.remove_overlapping_detections` | inference/inference.py:120 | modules/inference_inference.md | OK |
| `AnnotationQueueHandler.on_message` | annotation-queue/annotation_queue_handler.py:87 | modules/annotation_queue_handler.md | OK |
| `AnnotationMessage` | annotation-queue/annotation_queue_dto.py:91 | modules/annotation_queue_dto.md | OK |
| `form_dataset` | train.py:42 | modules/train.md | OK |
| `train_dataset` | train.py:147 | modules/train.md | OK |
| `export_onnx` | exports.py:29 | modules/exports.md | OK |
| `export_rknn` | exports.py:19 | modules/exports.md | OK |
| `export_tensorrt` | exports.py:45 | modules/exports.md | OK |
| `upload_model` | exports.py:82 | modules/exports.md | OK |
| `WeatherMode` | dto/annotationClass.py:6 | modules/dto_annotationClass.md | OK |
| `AnnotationClass.read_json` | dto/annotationClass.py:18 | modules/dto_annotationClass.md | OK |
| `ImageLabel.visualize` | dto/imageLabel.py:12 | modules/dto_imageLabel.md | OK |
| `Dotdict` | utils.py:1 | modules/utils.md | OK |
## Code Bugs Found During Verification
### Bug 1: `augmentation.py` — undefined attribute `total_to_process`
- **Location**: augmentation.py, line 118
- **Issue**: References `self.total_to_process` but only `self.total_images_to_process` is defined in `__init__`
- **Impact**: AttributeError at runtime during progress logging
- **Documented in**: modules/augmentation.md, components/05_data_pipeline/description.md
### Bug 2: `train.py` `copy_annotations` — reporting bug
- **Location**: train.py, lines 93 and 99
- **Issue**: `copied = 0` is declared but never incremented; the inner function increments the global `total_files_copied` instead, so the final message `f'Copied all {copied} annotations'` always prints 0.
- **Impact**: Incorrect progress reporting (cosmetic)
- **Documented in**: modules/train.md, components/06_training/description.md
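The pattern behind Bug 2 is a counter shadowed across scopes. A simplified reproduction and fix (names simplified; this is not the actual train.py code):

```python
def copy_all_broken(items):
    """Reproduces the bug: the local counter is never incremented."""
    copied = 0
    total_files_copied = [0]          # stand-in for the global counter

    def copy_one(item):
        total_files_copied[0] += 1    # bug: a different counter is updated

    for item in items:
        copy_one(item)
    return f"Copied all {copied} annotations"   # always reports 0


def copy_all_fixed(items):
    """Fix: bind the inner function to the enclosing counter."""
    copied = 0

    def copy_one(item):
        nonlocal copied
        copied += 1

    for item in items:
        copy_one(item)
    return f"Copied all {copied} annotations"
```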
### Bug 3: `exports.py` `upload_model` — stale ApiClient constructor call
- **Location**: exports.py, line 97
- **Issue**: `ApiClient(ApiCredentials(api_c.url, api_c.user, api_c.pw, api_c.folder))` — but `ApiClient.__init__` takes no args, and `ApiCredentials.__init__` takes `(url, email, password)`, not `(url, user, pw, folder)`.
- **Impact**: `upload_model` function would fail at runtime. This function appears to be stale code — the actual upload flow in `train.py:export_current_model` uses the correct `ApiClient()` constructor.
- **Documented in**: modules/exports.md, components/06_training/description.md
### Bug 4: `inference/tensorrt_engine.py` — potential uninitialized `batch_size`
- **Location**: inference/tensorrt_engine.py, lines 43–44
- **Issue**: `self.batch_size` is only set if `engine_input_shape[0] != -1`. If the batch dimension is dynamic (-1), `self.batch_size` is never assigned before being read in `self.input_shape = [self.batch_size, ...]`.
- **Impact**: AttributeError at runtime for models with a dynamic batch size (unless `batch_size` is supplied via kwargs or set elsewhere)
- **Documented in**: modules/inference_tensorrt_engine.md, components/07_inference/description.md
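A defensive pattern for the dynamic-batch case described in Bug 4 might look like this. It is a sketch, not the engine's actual code; `engine_input_shape` and the kwarg name are assumptions:

```python
def resolve_batch_size(engine_input_shape, default_batch_size=1, **kwargs):
    """Always produce a concrete batch size, even for dynamic (-1) dimensions."""
    if engine_input_shape[0] != -1:
        return engine_input_shape[0]              # static batch dimension
    # dynamic batch: fall back to an explicit kwarg, then to a safe default
    return kwargs.get("batch_size", default_batch_size)
```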
### Bug 5: `dataset-visualiser.py` — missing import
- **Location**: dataset-visualiser.py, line 6
- **Issue**: `from preprocessing import read_labels` — the `preprocessing` module does not exist in the codebase.
- **Impact**: Script cannot run; ImportError at startup
- **Documented in**: modules/dataset_visualiser.md, components/05_data_pipeline/description.md
## Missing Modules
| Module | Referenced By | Status |
|--------|-------------|--------|
| `preprocessing` | dataset-visualiser.py, tests/imagelabel_visualize_test.py | Not found in codebase |
## Duplicated Code
### AnnotationClass + WeatherMode (3 locations)
| Location | Differences |
|----------|-------------|
| `dto/annotationClass.py` | Standard version. `color_tuple` property strips first 3 chars. |
| `inference/dto.py` | Adds `opencv_color` BGR field. Same `read_json` logic. |
| `annotation-queue/annotation_queue_dto.py` | Adds `opencv_color`. Reads `classes.json` from CWD (not relative to package). |
## Security Issues
| Issue | Location | Severity |
|-------|----------|----------|
| Hardcoded API credentials | config.yaml (email, password) | High |
| Hardcoded CDN access keys | cdn.yaml (4 access keys) | High |
| Hardcoded encryption key | security.py:67 (`get_model_encryption_key`) | High |
| Queue credentials in plaintext | config.yaml, annotation-queue/config.yaml | Medium |
| No TLS cert validation in API calls | api_client.py | Low |
## Completeness Check
All 21 source modules documented. All 8 components cover all modules with no gaps.
| Component | Modules | Complete |
|-----------|---------|----------|
| 01 Core | constants, utils | Yes |
| 02 Security | security, hardware_service | Yes |
| 03 API & CDN | api_client, cdn_manager | Yes |
| 04 Data Models | dto/annotationClass, dto/imageLabel | Yes |
| 05 Data Pipeline | augmentation, convert-annotations, dataset-visualiser | Yes |
| 06 Training | train, exports, manual_run | Yes |
| 07 Inference | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Yes |
| 08 Annotation Queue | annotation_queue_dto, annotation_queue_handler | Yes |
## Consistency Check
- Component docs agree with architecture doc: Yes
- Flow diagrams match component interfaces: Yes
- Module dependency graph in discovery matches import analysis: Yes
- Data model doc matches filesystem layout in architecture: Yes
## Remaining Gaps / Uncertainties
- The `preprocessing` module may have existed previously and been deleted or renamed
- `exports.upload_model` may be intentionally deprecated in favor of the ApiClient-based flow in train.py
- `checkpoint.txt` content (`2024-06-27 20:51:35`) suggests training infrastructure was last used in mid-2024
- The `orangepi5/` shell scripts were not analyzed (bash, not Python) — they appear to be setup/run scripts for edge deployment
# Final Documentation Report — Azaion AI Training
## Executive Summary
Azaion AI Training is a Python-based ML pipeline for training, deploying, and running YOLOv11 object detection models targeting aerial military asset recognition. The system comprises 8 components (21 modules) spanning annotation ingestion, data augmentation, GPU-accelerated training, multi-format model export, encrypted model distribution, and real-time inference — with edge deployment capability via RKNN on OrangePi5 devices.
The codebase is functional and production-used (last training run: 2024-06-27) but has no CI/CD, no containerization, no formal test framework, and several hardcoded credentials. Verification identified 5 code bugs, 3 high-severity security issues, and 1 missing module.
## Problem Statement
The system automates detection of 17 classes of military objects and infrastructure in aerial/satellite imagery across 3 weather conditions (Normal, Winter, Night). It replaces manual image analysis with a continuous pipeline: human-annotated data flows in via RabbitMQ, is augmented 8× for training diversity, trains YOLOv11 models over multi-day GPU runs, and distributes encrypted models to inference clients that run real-time video detection.
## Architecture Overview
**Tech stack**: Python 3.10+ · PyTorch 2.3.0 (CUDA 12.1) · Ultralytics YOLOv11m · TensorRT · ONNX Runtime · Albumentations · boto3 · rstream · cryptography
**Deployment**: 5 independent processes (no orchestration, no containers) running on GPU-equipped servers. Manual deployment.
## Component Summary
| # | Component | Modules | Purpose | Key Dependencies |
|---|-----------|---------|---------|-----------------|
| 01 | Core Infrastructure | constants, utils | Shared paths, config keys, Dotdict helper | None |
| 02 | Security & Hardware | security, hardware_service | AES-256-CBC encryption, hardware fingerprinting | cryptography, pynvml |
| 03 | API & CDN Client | api_client, cdn_manager | REST API (JWT auth) + S3 CDN communication | requests, boto3, Security |
| 04 | Data Models | dto/annotationClass, dto/imageLabel | Annotation class definitions, image+label container | OpenCV, matplotlib |
| 05 | Data Pipeline | augmentation, convert-annotations, dataset-visualiser | 8× augmentation, format conversion, visualization | Albumentations, Data Models |
| 06 | Training Pipeline | train, exports, manual_run | Dataset formation → YOLO training → export → encrypted upload | Ultralytics, API & CDN, Security |
| 07 | Inference Engine | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Model download, decryption, TensorRT/ONNX video inference | TensorRT, ONNX Runtime, PyCUDA |
| 08 | Annotation Queue | annotation_queue_dto, annotation_queue_handler | Async RabbitMQ Streams consumer for annotation CRUD events | rstream, msgpack |
## System Flows
| # | Flow | Entry Point | Path | Output |
|---|------|-------------|------|--------|
| 1 | Annotation Ingestion | RabbitMQ message | Queue → Handler → Filesystem | Images + labels on disk |
| 2 | Data Augmentation | Filesystem scan (5-min loop) | /data/ → Augmentator → /data-processed/ | 8× augmented images + labels |
| 3 | Training Pipeline | train.py __main__ | /data-processed/ → Dataset split → YOLO train → Export → Encrypt → Upload | Encrypted model on API + CDN |
| 4 | Model Download & Inference | start_inference.py __main__ | API + CDN download → Decrypt → TensorRT init → Video frames → Detections | Annotated video output |
| 5 | Model Export (Multi-Format) | train.py / manual_run.py | .pt → .onnx / .engine / .rknn | Multi-format model artifacts |
## Risk Observations
### Code Bugs (from Verification)
| # | Location | Issue | Impact |
|---|----------|-------|--------|
| 1 | augmentation.py:118 | `self.total_to_process` undefined (should be `self.total_images_to_process`) | AttributeError during progress logging |
| 2 | train.py:93,99 | `copied` counter never incremented | Incorrect progress reporting (cosmetic) |
| 3 | exports.py:97 | Stale `ApiClient(ApiCredentials(...))` constructor call with wrong params | `upload_model` function would fail at runtime |
| 4 | inference/tensorrt_engine.py:43-44 | `batch_size` uninitialized for dynamic batch dimensions | AttributeError for models with dynamic batch size |
| 5 | dataset-visualiser.py:6 | Imports from `preprocessing` module that doesn't exist | Script cannot run |
### Security Issues
| Issue | Severity | Location |
|-------|----------|----------|
| Hardcoded API credentials | High | config.yaml |
| Hardcoded CDN access keys (4 keys) | High | cdn.yaml |
| Hardcoded model encryption key | High | security.py:67 |
| Queue credentials in plaintext | Medium | config.yaml, annotation-queue/config.yaml |
| No TLS certificate validation | Low | api_client.py |
### Structural Concerns
- No CI/CD pipeline or containerization
- No formal test framework (2 script-based tests, 1 broken)
- Duplicated AnnotationClass/WeatherMode code in 3 locations
- No graceful shutdown for augmentation process
- No reconnect logic for annotation queue consumer
- Manual deployment only
## Open Questions
- The `preprocessing` module may have existed previously and been deleted or renamed — its absence breaks `dataset-visualiser.py` and `tests/imagelabel_visualize_test.py`
- `exports.upload_model` may be intentionally deprecated in favor of the ApiClient-based flow in `train.py`
- The `orangepi5/` shell scripts were not analyzed (bash, not Python) — they appear to be setup/run scripts for edge deployment
- `checkpoint.txt` (2024-06-27) suggests training infrastructure was last used in mid-2024
## Artifact Index
| Path | Description | Step |
|------|-------------|------|
| `_docs/00_problem/problem.md` | Problem statement | 6 |
| `_docs/00_problem/restrictions.md` | Hardware, software, environment, operational restrictions | 6 |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria from code | 6 |
| `_docs/00_problem/input_data/data_parameters.md` | Input data schemas and formats | 6 |
| `_docs/00_problem/security_approach.md` | Security mechanisms and known issues | 6 |
| `_docs/01_solution/solution.md` | Retrospective solution document | 5 |
| `_docs/02_document/00_discovery.md` | Codebase discovery: tree, tech stack, dependency graph | 0 |
| `_docs/02_document/modules/*.md` | 21 module-level documentation files | 1 |
| `_docs/02_document/components/0N_*/description.md` | 8 component specifications | 2 |
| `_docs/02_document/diagrams/components.md` | Component relationship diagram (Mermaid) | 2 |
| `_docs/02_document/architecture.md` | System architecture document | 3 |
| `_docs/02_document/system-flows.md` | 5 system flow diagrams with sequence diagrams | 3 |
| `_docs/02_document/data_model.md` | Data model with ER diagram | 3 |
| `_docs/02_document/diagrams/flows/flow_*.md` | Individual flow diagrams (4 files) | 3 |
| `_docs/02_document/04_verification_log.md` | Verification results: 87 entities, 5 bugs, 3 security issues | 4 |
| `_docs/02_document/FINAL_report.md` | This report | 7 |
| `_docs/02_document/state.json` | Document skill progress tracking | — |
# Architecture
## System Context
Azaion AI Training is a Python-based ML pipeline for training, exporting, and deploying YOLOv11 object detection models. The system operates within the Azaion platform ecosystem, consuming annotated image data and producing encrypted inference-ready models.
### Boundaries
| Boundary | Interface | Protocol |
|----------|-----------|----------|
| Azaion REST API | ApiClient | HTTPS (JWT auth) |
| S3-compatible CDN | CDNManager (boto3) | HTTPS (S3 API) |
| RabbitMQ Streams | rstream Consumer | AMQP 1.0 |
| Local filesystem | Direct I/O | POSIX paths at `/azaion/` |
| NVIDIA GPU | PyTorch, TensorRT, ONNX RT, PyCUDA | CUDA 12.1 |
### System Context Diagram
```mermaid
graph LR
subgraph "Azaion Platform"
API[Azaion REST API]
CDN[S3-compatible CDN]
Queue[RabbitMQ Streams]
end
subgraph "AI Training System"
AQ[Annotation Queue Consumer]
AUG[Augmentation Pipeline]
TRAIN[Training Pipeline]
INF[Inference Engine]
end
subgraph "Storage"
FS["/azaion/ filesystem"]
end
subgraph "Hardware"
GPU[NVIDIA GPU]
end
Queue -->|annotation events| AQ
AQ -->|images + labels| FS
FS -->|raw annotations| AUG
AUG -->|augmented data| FS
FS -->|processed dataset| TRAIN
TRAIN -->|trained model| GPU
TRAIN -->|encrypted model| API
TRAIN -->|encrypted model big part| CDN
API -->|encrypted model small part| INF
CDN -->|encrypted model big part| INF
INF -->|inference| GPU
```
## Tech Stack
| Layer | Technology | Version/Detail |
|-------|-----------|---------------|
| Language | Python | 3.10+ (match statements) |
| ML Framework | Ultralytics YOLO | YOLOv11 medium |
| Deep Learning | PyTorch | 2.3.0 (CUDA 12.1) |
| GPU Inference | TensorRT | FP16/INT8, async CUDA streams |
| GPU Inference (alt) | ONNX Runtime GPU | CUDAExecutionProvider |
| Edge Inference | RKNN | RK3588 (OrangePi5) |
| Augmentation | Albumentations | Geometric + color transforms |
| Computer Vision | OpenCV | Image I/O, preprocessing, display |
| Object Storage | boto3 | S3-compatible CDN |
| Message Queue | rstream | RabbitMQ Streams consumer |
| Serialization | msgpack | Queue message format |
| Encryption | cryptography | AES-256-CBC |
| HTTP Client | requests | REST API communication |
| Configuration | PyYAML | YAML config files |
| Visualization | matplotlib, netron | Annotation display, model graphs |
## Deployment Model
The system runs as multiple independent processes on machines with NVIDIA GPUs:
| Process | Entry Point | Runtime | Typical Host |
|---------|------------|---------|-------------|
| Training | `train.py` | Long-running (days) | GPU server (RTX 4090, 24GB VRAM) |
| Augmentation | `augmentation.py` | Continuous loop (infinite) | Same GPU server or CPU-only |
| Annotation Queue | `annotation-queue/annotation_queue_handler.py` | Continuous (async) | Any server with network access |
| Inference | `start_inference.py` | On-demand | GPU-equipped machine |
| Data Tools | `convert-annotations.py`, `dataset-visualiser.py` | Ad-hoc | Developer machine |
No containerization (no Dockerfile), no CI/CD pipeline, and no orchestration infrastructure were found in the codebase; deployment appears to be manual.
## Data Model Overview
### Annotation Data Flow
```
Raw annotations (Queue) → /azaion/data-seed/ (unvalidated)
→ /azaion/data/ (validated)
→ /azaion/data-processed/ (augmented, 8×)
→ /azaion/datasets/azaion-{date}/ (train/valid/test split)
→ /azaion/data-corrupted/ (invalid labels)
→ /azaion/data_deleted/ (soft-deleted)
```
### Annotation Class System
- 17 base classes (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier)
- 3 weather modes: Norm (offset 0), Wint (offset 20), Night (offset 40)
- Total class slots: 80 (17 × 3 = 51 used, 29 reserved)
- Format: YOLO (center_x, center_y, width, height — all normalized to 0–1)
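The YOLO label format above can be illustrated with a small conversion helper (a sketch; the codebase's own converter in `convert-annotations.py` may differ in detail):

```python
def bbox_to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner bbox to YOLO's normalized center format."""
    cx = (x_min + x_max) / 2 / img_w   # center x, 0–1
    cy = (y_min + y_max) / 2 / img_h   # center y, 0–1
    w = (x_max - x_min) / img_w        # width,    0–1
    h = (y_max - y_min) / img_h        # height,   0–1
    return cx, cy, w, h


# e.g. a 100×50 px box at (200, 100) in a 1280×1280 image
print(bbox_to_yolo(200, 100, 300, 150, 1280, 1280))
```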
### Model Artifacts
| Format | Use | Export Details |
|--------|-----|---------------|
| `.pt` | Training checkpoint | YOLOv11 PyTorch weights |
| `.onnx` | Cross-platform inference | 1280px, batch=4, NMS baked in |
| `.engine` | GPU inference (production) | TensorRT FP16, batch=4, per-GPU architecture |
| `.rknn` | Edge inference | RK3588 target (OrangePi5) |
## Integration Points
### Azaion REST API
- `POST /login` → JWT token
- `POST /resources/{folder}` → file upload (Bearer auth)
- `POST /resources/get/{folder}` → encrypted file download (hardware-bound key)
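The endpoints above imply a standard bearer-token flow. A sketch of the request shapes involved — the payload field names are assumptions, not verified against `api_client.py`:

```python
def login_payload(email: str, password: str) -> dict:
    """Body for POST /login; field names are assumed, not verified."""
    return {"email": email, "password": password}


def auth_headers(jwt_token: str) -> dict:
    """Bearer header attached to subsequent /resources requests."""
    return {"Authorization": f"Bearer {jwt_token}"}
```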
### S3-compatible CDN
- Upload: model big parts (`upload_fileobj`)
- Download: model big parts (`download_file`)
- Separate read/write access keys
### RabbitMQ Streams
- Queue: `azaion-annotations`
- Protocol: AMQP with rstream library
- Message format: msgpack with positional integer keys
- Offset tracking: persisted to `offset.yaml`
## Non-Functional Requirements (Observed)
| Category | Observation | Source |
|----------|------------|--------|
| Training duration | ~11.5 days for 360K annotations on 1× RTX 4090 | Code comment in train.py |
| VRAM usage | batch=11 → ~22GB (batch=12 fails at 24.2GB) | Code comment in train.py |
| Inference speed | TensorRT: 54s for 200s video (3.7GB VRAM) | Code comment in start_inference.py |
| ONNX inference | 81s for 200s video (6.3GB VRAM) | Code comment in start_inference.py |
| Augmentation ratio | 8× (1 original + 7 augmented per image) | augmentation.py |
| Frame sampling | Every 4th frame during inference | inference/inference.py |
## Security Architecture
| Mechanism | Implementation | Location |
|-----------|---------------|----------|
| API authentication | JWT token (email/password login) | api_client.py |
| Resource encryption | AES-256-CBC (hardware-bound key) | security.py |
| Model encryption | AES-256-CBC (static key) | security.py |
| Split model storage | Small part on API, big part on CDN | api_client.py |
| Hardware fingerprinting | CPU+GPU+RAM+drive serial hash | hardware_service.py |
| CDN access control | Separate read/write S3 credentials | cdn_manager.py |
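The split-storage mechanism can be sketched as a simple byte split. The actual split point is not documented; `SMALL_SIZE_KB = 3` from `constants.py` suggests a 3 KB small part, but that role is an assumption here:

```python
SMALL_SIZE_KB = 3  # from constants.py; its use as the split point is assumed


def split_model(encrypted: bytes, small_kb: int = SMALL_SIZE_KB):
    """Split an encrypted model blob into a small part (API) and big part (CDN)."""
    cut = small_kb * 1024
    return encrypted[:cut], encrypted[cut:]


def join_model(small: bytes, big: bytes) -> bytes:
    """Inference side: recombine the downloaded parts before decryption."""
    return small + big
```

Neither part alone is a usable model, so compromising one storage backend does not leak the weights.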
### Security Concerns
- Hardcoded credentials in `config.yaml` and `cdn.yaml`
- Hardcoded model encryption key in `security.py`
- No TLS certificate validation visible in code
- No input validation on API responses
- Queue credentials in plaintext config files
## Key Architectural Decisions
| Decision | Rationale (inferred) |
|----------|---------------------|
| YOLOv11 medium at 1280px | Balance between detection quality and training time |
| Split model storage | Prevent model theft from single storage compromise |
| Hardware-bound API encryption | Tie resource access to authorized machines |
| TensorRT for production inference | ~33% faster than ONNX, ~42% less VRAM |
| Augmentation as separate process | Decouples data prep from training; runs continuously |
| Annotation queue as separate service | Independent lifecycle; different dependency set |
| RKNN export for OrangePi5 | Edge deployment on low-power ARM SoC |
# Component: Core Infrastructure
## Overview
Shared constants and utility classes that form the foundation for all other components. Provides path definitions, config file references, and helper data structures.
**Pattern**: Configuration constants + utility library
**Upstream**: None (leaf component)
**Downstream**: All other components
## Modules
- `constants` — filesystem paths, config keys, thresholds
- `utils` — Dotdict helper class
## Internal Interfaces
### constants (public symbols)
All path/string constants — see module doc for full list. Key exports:
- Directory paths: `data_dir`, `processed_dir`, `datasets_dir`, `models_dir` and their images/labels subdirectories
- Config references: `CONFIG_FILE`, `CDN_CONFIG`, `OFFSET_FILE`
- Model paths: `CURRENT_PT_MODEL`, `CURRENT_ONNX_MODEL`
- Thresholds: `SMALL_SIZE_KB = 3`
### utils.Dotdict
```python
class Dotdict(dict):
    # Enables config.url instead of config["url"]
    __getattr__ = dict.get  # returns None for missing keys (see Caveats)
```
## Data Access Patterns
None — pure constants, no I/O.
## Implementation Details
- All paths rooted at `/azaion/` — assumes a fixed deployment directory structure
- No environment-variable override for any path — paths are entirely static
## Caveats
- Hardcoded root `/azaion/` makes local development without that directory structure impossible
- No `.env` or environment-based configuration override mechanism
- `Dotdict.__getattr__` uses `dict.get` which returns `None` for missing keys instead of raising `AttributeError`
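The `Dotdict` caveat above can be reproduced in a few lines (a minimal re-implementation of the behaviour this doc describes, not the actual `utils` source):

```python
class Dotdict(dict):
    # Minimal reproduction of utils.Dotdict as described above
    __getattr__ = dict.get

cfg = Dotdict({"url": "https://example.com"})
print(cfg.url)      # attribute-style access works
print(cfg.missing)  # None — no AttributeError, typos fail silently
```

This is why a typo like `cfg.ur1` propagates `None` downstream instead of failing fast.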
## Dependency Graph
```mermaid
graph TD
constants --> api_client_comp[API & CDN]
constants --> training_comp[Training]
constants --> data_pipeline_comp[Data Pipeline]
constants --> inference_comp[Inference]
utils --> training_comp
utils --> inference_comp
```
## Logging Strategy
None.
# Component: Security & Hardware Identity
## Overview
Provides cryptographic operations (AES-256-CBC encryption/decryption) and hardware fingerprinting. Used for protecting model files in transit and at rest, and for binding API encryption keys to specific machines.
**Pattern**: Utility/service library (static methods)
**Upstream**: None (leaf component)
**Downstream**: API & CDN, Training, Inference
## Modules
- `security` — AES encryption, key derivation (SHA-384), hardcoded model key
- `hardware_service` — cross-platform hardware info collection (CPU, GPU, RAM, drive serial)
## Internal Interfaces
### Security (static methods)
```python
Security.encrypt_to(input_bytes: bytes, key: str) -> bytes
Security.decrypt_to(ciphertext_with_iv: bytes, key: str) -> bytes
Security.calc_hash(key: str) -> str
Security.get_hw_hash(hardware: str) -> str
Security.get_api_encryption_key(creds, hardware_hash: str) -> str
Security.get_model_encryption_key() -> str
```
### hardware_service
```python
get_hardware_info() -> str
```
## Data Access Patterns
- `hardware_service` executes shell commands to query OS/hardware info
- `security` performs in-memory cryptographic operations only
## Implementation Details
- **Encryption**: AES-256-CBC. Key = SHA-256(key_string). IV = 16 random bytes prepended to ciphertext. PKCS7 padding.
- **Key derivation hierarchy**:
1. `get_model_encryption_key()` → hardcoded secret → SHA-384 → base64
2. `get_hw_hash(hardware_string)` → salted hardware string → SHA-384 → base64
3. `get_api_encryption_key(creds, hw_hash)` → email+password+hw_hash+salt → SHA-384 → base64
- **Hardware fingerprint format**: `CPU: {cpu}. GPU: {gpu}. Memory: {memory}. DriveSerial: {serial}`
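The derivation hierarchy above (SHA-384 → base64) can be sketched as follows. This is illustrative only: the salt value and the exact concatenation order of the inputs are assumptions, not taken from the source.

```python
import base64
import hashlib

SALT = "example-salt"  # placeholder — the real salt is not shown in this doc

def sha384_b64(s: str) -> str:
    # SHA-384 digest of the input string, base64-encoded (per the hierarchy above)
    return base64.b64encode(hashlib.sha384(s.encode()).digest()).decode()

def get_hw_hash(hardware: str) -> str:
    # Step 2: salted hardware string -> SHA-384 -> base64
    return sha384_b64(hardware + SALT)

def get_api_encryption_key(email: str, password: str, hw_hash: str) -> str:
    # Step 3: email + password + hw_hash + salt -> SHA-384 -> base64
    return sha384_b64(email + password + hw_hash + SALT)
```

Because the hardware string feeds the key, the derived API key only decrypts resources on the machine that produced the fingerprint.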
## Caveats
- **Hardcoded model encryption key** in `get_model_encryption_key()` — anyone with source code access can derive the key
- **Shell command injection risk**: `hardware_service` uses `shell=True` subprocess — safe since no user input is involved, but fragile
- **PKCS7 unpadding** in `decrypt_to` uses a manual check instead of the `cryptography` library's unpadder — a potential padding-oracle vector if decryption errors are observable to an attacker
- `BUFFER_SIZE` constant declared but unused in security.py
## Dependency Graph
```mermaid
graph TD
hardware_service --> api_client[API & CDN: api_client]
security --> api_client
security --> training[Training]
security --> inference[Inference: start_inference]
```
## Logging Strategy
None — operations are silent except for exceptions.
# Component: API & CDN Client
## Overview
Communication layer for the Azaion backend API and S3-compatible CDN. Handles authentication, encrypted file transfer, and the split-resource pattern for secure model distribution.
**Pattern**: Client library with split-storage resource management
**Upstream**: Core (constants), Security (encryption, hardware identity)
**Downstream**: Training, Inference, Exports
## Modules
- `api_client` — REST client for Azaion API, JWT auth, encrypted resource download/upload, split big/small pattern
- `cdn_manager` — boto3 S3 client with separate read/write credentials
## Internal Interfaces
### CDNCredentials
```python
CDNCredentials(host, downloader_access_key, downloader_access_secret, uploader_access_key, uploader_access_secret)
```
### CDNManager
```python
CDNManager(credentials: CDNCredentials)
CDNManager.upload(bucket: str, filename: str, file_bytes: bytearray) -> bool
CDNManager.download(bucket: str, filename: str) -> bool
```
### ApiCredentials
```python
ApiCredentials(url, email, password)
```
### ApiClient
```python
ApiClient()
ApiClient.login() -> None
ApiClient.upload_file(filename: str, file_bytes: bytearray, folder: str) -> None
ApiClient.load_bytes(filename: str, folder: str) -> bytes
ApiClient.load_big_small_resource(resource_name: str, folder: str, key: str) -> bytes
ApiClient.upload_big_small_resource(resource: bytes, resource_name: str, folder: str, key: str) -> None
```
## External API Specification
### Azaion REST API (consumed)
| Endpoint | Method | Auth | Description |
|----------|--------|------|-------------|
| `/login` | POST | None (returns JWT) | `{"email": ..., "password": ...}``{"token": ...}` |
| `/resources/{folder}` | POST | Bearer JWT | Multipart file upload |
| `/resources/get/{folder}` | POST | Bearer JWT | Download encrypted resource (sends hardware info in body) |
### S3-compatible CDN
| Operation | Description |
|-----------|-------------|
| `upload_fileobj` | Upload bytes to S3 bucket |
| `download_file` | Download file from S3 bucket to disk |
## Data Access Patterns
- API Client reads `config.yaml` on init for API credentials
- CDN credentials loaded by API Client from encrypted `cdn.yaml` (downloaded from API)
- Split resources: big part stored locally + CDN, small part on API server
## Implementation Details
- **JWT auto-refresh**: On 401/403 response, automatically re-authenticates and retries
- **Split-resource pattern**: Encrypts data → splits at a ~20% boundary (the small part is at least SMALL_SIZE_KB * 1024 bytes) → small part to API, big part to CDN. Neither part alone can reconstruct the original.
- **CDN credential isolation**: Separate S3 access keys for upload vs download (least-privilege)
- **CDN self-bootstrap**: `cdn.yaml` credentials are themselves encrypted and downloaded from the API during ApiClient init
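The split/join step of the split-resource pattern can be sketched as below. The function names and the exact cut rule (20% with a `SMALL_SIZE_KB * 1024` floor) are inferred from this doc, not copied from `api_client.py`:

```python
SMALL_SIZE_KB = 3  # mirrors the constant documented in Core Infrastructure

def split_resource(encrypted: bytes) -> tuple:
    # Small part is roughly 20% of the encrypted blob, but never less
    # than SMALL_SIZE_KB * 1024 bytes; the remainder is the big part.
    cut = max(len(encrypted) // 5, SMALL_SIZE_KB * 1024)
    return encrypted[:cut], encrypted[cut:]

def join_resource(small: bytes, big: bytes) -> bytes:
    # Reassembly is plain concatenation; decryption happens afterwards
    return small + big
```

Since the split happens *after* encryption, either part alone is an unusable fragment of ciphertext.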
## Caveats
- Credentials hardcoded in `config.yaml` and `cdn.yaml` — not using environment variables or secrets manager
- `cdn_manager.download()` saves to current working directory with the same filename
- No retry logic beyond JWT refresh (no exponential backoff, no connection retry)
- `CDNManager` imports `sys`, `yaml`, `os` but doesn't use them
## Dependency Graph
```mermaid
graph TD
constants --> api_client
security --> api_client
hardware_service --> api_client
cdn_manager --> api_client
api_client --> exports
api_client --> train
api_client --> start_inference
cdn_manager --> exports
cdn_manager --> train
```
## Logging Strategy
Print statements for upload/download confirmations and errors. No structured logging.
# Component: Data Models
## Overview
Shared data transfer objects for the training pipeline: annotation class definitions (with weather modes) and image+label containers for visualization and augmentation.
**Pattern**: Plain data classes / value objects
**Upstream**: None (leaf)
**Downstream**: Data Pipeline (augmentation, dataset-visualiser), Training (YAML generation)
## Modules
- `dto/annotationClass` — AnnotationClass, WeatherMode enum, classes.json reader
- `dto/imageLabel` — ImageLabel container with bbox visualization
## Internal Interfaces
### WeatherMode (Enum)
| Member | Value | Description |
|--------|-------|-------------|
| Norm | 0 | Normal weather |
| Wint | 20 | Winter |
| Night | 40 | Night |
### AnnotationClass
```python
AnnotationClass(id: int, name: str, color: str)
AnnotationClass.read_json() -> dict[int, AnnotationClass] # static
AnnotationClass.color_tuple -> tuple # property, RGB ints
```
### ImageLabel
```python
ImageLabel(image_path: str, image: np.ndarray, labels_path: str, labels: list)
ImageLabel.visualize(annotation_classes: dict) -> None
```
## Data Access Patterns
- `AnnotationClass.read_json()` reads `classes.json` from project root (relative to `dto/` parent)
- `ImageLabel.visualize()` renders to matplotlib window (no disk I/O)
## Implementation Details
- 17 base annotation classes × 3 weather modes = 51 classes with offset IDs (0–16, 20–36, 40–56)
- System reserves 80 class slots (DEFAULT_CLASS_NUM in train.py)
- YOLO label format: [x_center, y_center, width, height, class_id] — all normalized to 0–1
- `color_tuple` parsing strips first 3 chars (assumes "#ff" prefix format) — fragile if color format changes
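The weather-mode ID offsetting above amounts to a simple addition; a sketch of the mapping (helper name is illustrative, grounded in the enum values documented here):

```python
from enum import Enum

class WeatherMode(Enum):
    Norm = 0
    Wint = 20
    Night = 40

def effective_class_id(base_id: int, mode: WeatherMode) -> int:
    # 17 base classes (ids 0-16), offset by the weather-mode value (0/20/40)
    return base_id + mode.value
```

So base class 3 under `Night` becomes class 43 in the YOLO labels.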
## Caveats
- `AnnotationClass` duplicated in 3 locations (dto, inference/dto, annotation-queue/annotation_queue_dto) with slight differences
- `color_tuple` property has a non-obvious parsing approach that may break on different color string formats
- Empty files: `dto/annotation_bulk_message.py` and `dto/annotation_message.py` suggest planned but unimplemented DTOs
## Dependency Graph
```mermaid
graph TD
dto_annotationClass[dto/annotationClass] --> train
dto_annotationClass --> dataset-visualiser
dto_imageLabel[dto/imageLabel] --> augmentation
dto_imageLabel --> dataset-visualiser
```
## Logging Strategy
None.
# Component: Data Pipeline
## Overview
Tools for preparing and managing annotation data: augmentation of training images, format conversion from external annotation systems, and visual inspection of annotated datasets.
**Pattern**: Batch processing tools (standalone scripts + library)
**Upstream**: Core (constants), Data Models (ImageLabel, AnnotationClass)
**Downstream**: Training (augmented images feed into dataset formation)
## Modules
- `augmentation` — image augmentation pipeline (albumentations)
- `convert-annotations` — Pascal VOC / oriented bbox → YOLO format converter
- `dataset-visualiser` — interactive annotation visualization tool
## Internal Interfaces
### Augmentator
```python
Augmentator()
Augmentator.augment_annotations(from_scratch: bool = False) -> None
Augmentator.augment_inner(img_ann: ImageLabel) -> list[ImageLabel]
Augmentator.correct_bboxes(labels) -> list
Augmentator.read_labels(labels_path) -> list[list]
```
### convert-annotations (functions)
```python
convert(folder, dest_folder, read_annotations, ann_format) -> None
minmax2yolo(width, height, xmin, xmax, ymin, ymax) -> tuple
read_pascal_voc(width, height, s: str) -> list[str]
read_bbox_oriented(width, height, s: str) -> list[str]
```
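A minimal sketch of the conversion `minmax2yolo` performs, inferred from its signature above (the actual implementation may differ in detail):

```python
def minmax2yolo(width, height, xmin, xmax, ymin, ymax):
    # Pixel-space min/max box -> normalized YOLO (cx, cy, w, h)
    cx = (xmin + xmax) / 2.0 / width
    cy = (ymin + ymax) / 2.0 / height
    w = (xmax - xmin) / width
    h = (ymax - ymin) / height
    return cx, cy, w, h
```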
### dataset-visualiser (functions)
```python
visualise_dataset() -> None
visualise_processed_folder() -> None
```
## Data Access Patterns
- **Augmentation**: Reads from `/azaion/data/images/` + `/azaion/data/labels/`, writes to `/azaion/data-processed/images/` + `/azaion/data-processed/labels/`
- **Conversion**: Reads from user-specified source folder, writes to destination folder
- **Visualiser**: Reads from datasets or processed folder, renders to matplotlib window
## Implementation Details
- **Augmentation pipeline**: Per image → 1 original copy + 7 augmented variants (8× data expansion)
- HorizontalFlip (60%), BrightnessContrast (40%), Affine (80%), MotionBlur (10%), HueSaturation (40%)
- Bbox correction clips outside-boundary boxes, removes boxes < 1% of image
- Incremental: skips already-processed images
- Continuous mode: infinite loop with 5-minute sleep between rounds
- Concurrent: ThreadPoolExecutor for parallel image processing
- **Format conversion**: Pluggable reader pattern — `convert()` accepts any reader function that maps (width, height, text) → YOLO lines
- **Visualiser**: Interactive (waits for keypress) — developer debugging tool
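The bbox-correction step above (clip out-of-bounds boxes, drop boxes under 1% of image area) can be sketched like this — an illustrative re-implementation of the described behaviour, not the actual `Augmentator.correct_bboxes` source:

```python
def correct_bboxes(labels, min_area=0.01):
    # labels: [cls, cx, cy, w, h] in normalized YOLO coordinates
    corrected = []
    for cls, cx, cy, w, h in labels:
        # Clip the box corners to the image boundary [0, 1]
        x1, y1 = max(cx - w / 2, 0.0), max(cy - h / 2, 0.0)
        x2, y2 = min(cx + w / 2, 1.0), min(cy + h / 2, 1.0)
        nw, nh = x2 - x1, y2 - y1
        # Drop degenerate boxes and boxes below 1% of the image area
        if nw > 0 and nh > 0 and nw * nh >= min_area:
            corrected.append([cls, (x1 + x2) / 2, (y1 + y2) / 2, nw, nh])
    return corrected
```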
## Caveats
- `dataset-visualiser` imports from `preprocessing` module which does not exist — broken import
- `dataset-visualiser` has hardcoded dataset date (`2024-06-18`) and start index (35247)
- `convert-annotations` hardcodes class mappings (Truck=1, Car/Taxi=2) — not configurable
- Augmentation parameters are hardcoded, not configurable via config file
- Augmentation `total_to_process` attribute referenced in `augment_annotation` but never set (uses `total_images_to_process`)
## Dependency Graph
```mermaid
graph TD
constants --> augmentation
dto_imageLabel[dto/imageLabel] --> augmentation
constants --> dataset-visualiser
dto_annotationClass[dto/annotationClass] --> dataset-visualiser
dto_imageLabel --> dataset-visualiser
augmentation --> manual_run
```
## Logging Strategy
Print statements for progress tracking (processed count, errors). No structured logging.
# Component: Training Pipeline
## Overview
End-to-end YOLOv11 object detection training workflow: dataset formation from augmented annotations, model training, multi-format export (ONNX, TensorRT, RKNN), and encrypted model upload.
**Pattern**: Pipeline / orchestrator
**Upstream**: Core, Security, API & CDN, Data Models, Data Pipeline (augmented images)
**Downstream**: None (produces trained models consumed externally)
## Modules
- `train` — main pipeline: dataset formation → YOLO training → export → upload
- `exports` — model format conversion (ONNX, TensorRT, RKNN) + upload utilities
- `manual_run` — ad-hoc developer script for selective pipeline steps
## Internal Interfaces
### train
```python
form_dataset() -> None
copy_annotations(images, folder: str) -> None
check_label(label_path: str) -> bool
create_yaml() -> None
resume_training(last_pt_path: str) -> None
train_dataset() -> None
export_current_model() -> None
```
### exports
```python
export_rknn(model_path: str) -> None
export_onnx(model_path: str, batch_size: int = 4) -> None
export_tensorrt(model_path: str) -> None
form_data_sample(destination_path: str, size: int = 500, write_txt_log: bool = False) -> None
show_model(model: str = None) -> None
upload_model(model_path: str, filename: str, size_small_in_kb: int = 3) -> None
```
## Data Access Patterns
- **Input**: Reads augmented images from `/azaion/data-processed/images/` + labels
- **Dataset output**: Creates dated dataset at `/azaion/datasets/azaion-{YYYY-MM-DD}/` with train/valid/test splits
- **Model output**: Saves trained models to `/azaion/models/azaion-{YYYY-MM-DD}/`, copies best.pt to `/azaion/models/azaion.pt`
- **Upload**: Encrypted model uploaded as split big/small to CDN + API
- **Corrupted data**: Invalid labels moved to `/azaion/data-corrupted/`
## Implementation Details
- **Dataset split**: 70% train / 20% valid / 10% test (random shuffle)
- **Label validation**: `check_label()` verifies all YOLO coordinates are ≤ 1.0
- **YAML generation**: Writes `data.yaml` with 80 class names (17 actual from classes.json × 3 weather modes, rest as placeholders)
- **Training config**: YOLOv11 medium (`yolo11m.yaml`), epochs=120, batch=11 (tuned for 24GB VRAM), imgsz=1280, save_period=1, workers=24
- **Post-training**: Removes intermediate epoch checkpoints, keeps only `best.pt`
- **Export chain**: `.pt` → ONNX (1280px, batch=4, NMS) → encrypted → split → upload
- **TensorRT export**: batch=4, FP16, NMS, simplify
- **RKNN export**: targets RK3588 SoC (OrangePi5)
- **Concurrent file copying**: ThreadPoolExecutor for parallel image/label copying during dataset formation
- **`__main__`** in `train.py`: `train_dataset()``export_current_model()`
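The 70/20/10 split described above reduces to a shuffle and two slices; a sketch under that assumption (function name and seeding are illustrative — `form_dataset()` itself shuffles without a fixed seed):

```python
import random

def split_dataset(filenames, seed=None):
    # Shuffle, then slice into 70% train / 20% valid / 10% test
    files = list(filenames)
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * 0.7)
    n_valid = int(len(files) * 0.2)
    return (files[:n_train],
            files[n_train:n_train + n_valid],
            files[n_train + n_valid:])
```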
## Caveats
- Training hyperparameters are hardcoded (not configurable via config file)
- `old_images_percentage = 75` declared but unused
- `train.py` imports `subprocess`, `sleep` but doesn't use them
- `train.py` imports `OnnxEngine` but doesn't use it
- `exports.upload_model()` creates `ApiClient` with different constructor signature than the one in `api_client.py` — likely stale code
- `copy_annotations` uses a global `total_files_copied` counter with a local `copied` variable that stays at 0 — reporting bug
- `resume_training` references `yaml` (the module) instead of a YAML file path in the `data` parameter
## Dependency Graph
```mermaid
graph TD
constants --> train
constants --> exports
api_client --> train
api_client --> exports
cdn_manager --> train
cdn_manager --> exports
security --> train
security --> exports
utils --> train
utils --> exports
dto_annotationClass[dto/annotationClass] --> train
inference_onnx[inference/onnx_engine] --> train
exports --> train
train --> manual_run
augmentation --> manual_run
```
## Logging Strategy
Print statements for progress (file count, shuffling status, training results). No structured logging.
# Component: Inference Engine
## Overview
Real-time object detection inference subsystem supporting ONNX Runtime and TensorRT backends. Processes video streams with batched inference, custom NMS, and live visualization.
**Pattern**: Strategy pattern (InferenceEngine ABC) + pipeline orchestrator
**Upstream**: Core, Security, API & CDN (for model download)
**Downstream**: None (end-user facing — processes video input)
## Modules
- `inference/dto` — Detection, Annotation, AnnotationClass data classes
- `inference/onnx_engine` — InferenceEngine ABC + OnnxEngine implementation
- `inference/tensorrt_engine` — TensorRTEngine implementation with CUDA memory management + ONNX converter
- `inference/inference` — Video processing pipeline (preprocess → infer → postprocess → draw)
- `start_inference` — Entry point: downloads model, initializes engine, runs on video
## Internal Interfaces
### InferenceEngine (ABC)
```python
InferenceEngine.__init__(model_path: str, batch_size: int = 1, **kwargs)
InferenceEngine.get_input_shape() -> Tuple[int, int]
InferenceEngine.get_batch_size() -> int
InferenceEngine.run(input_data: np.ndarray) -> List[np.ndarray]
```
### OnnxEngine (extends InferenceEngine)
Constructor takes `model_bytes` (not path). Uses CUDAExecutionProvider + CPUExecutionProvider.
### TensorRTEngine (extends InferenceEngine)
Constructor takes `model_bytes: bytes`. Additional static methods:
```python
TensorRTEngine.get_gpu_memory_bytes(device_id=0) -> int
TensorRTEngine.get_engine_filename(device_id=0) -> str | None
TensorRTEngine.convert_from_onnx(onnx_model: bytes) -> bytes | None
```
### Inference
```python
Inference(engine: InferenceEngine, confidence_threshold, iou_threshold)
Inference.preprocess(frames: list) -> np.ndarray
Inference.postprocess(batch_frames, batch_timestamps, output) -> list[Annotation]
Inference.process(video: str) -> None
Inference.draw(annotation: Annotation) -> None
Inference.remove_overlapping_detections(detections) -> list[Detection]
```
## Data Access Patterns
- Model bytes loaded by caller (start_inference via ApiClient.load_big_small_resource)
- Video input via cv2.VideoCapture (file path)
- No disk writes during inference
## Implementation Details
- **Video processing**: Every 4th frame processed (25% frame sampling), batched to engine batch size
- **Preprocessing**: cv2.dnn.blobFromImage (1/255 scale, model input size, BGR→RGB)
- **Postprocessing**: Raw detections filtered by confidence, coordinates normalized to [0,1], custom NMS applied
- **Custom NMS**: Pairwise IoU comparison. Keeps higher confidence; ties broken by lower class ID.
- **TensorRT**: Async CUDA execution (memcpy_htod_async → execute_async_v3 → synchronize → memcpy_dtoh)
- **TensorRT shapes**: Default 1280×1280 input, 300 max detections, 6 values per detection (x1,y1,x2,y2,conf,cls)
- **ONNX conversion**: TensorRT builder with 90% GPU memory workspace, FP16 if supported
- **Engine filename**: GPU-architecture-specific: `azaion.cc_{major}.{minor}_sm_{sm_count}.engine`
- **start_inference flow**: ApiClient → load encrypted TensorRT model (big/small split) → decrypt → TensorRTEngine → Inference.process()
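The custom NMS rule above (pairwise IoU; keep higher confidence, break ties with the lower class ID) can be sketched as follows. The detection dict layout is an assumption for illustration — the real code uses `Detection` objects:

```python
def iou(a, b):
    # a, b: (x1, y1, x2, y2) boxes in normalized [0, 1] coordinates
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    if inter == 0.0:
        return 0.0
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def remove_overlapping_detections(dets, iou_threshold=0.5):
    # Visit detections best-first: higher confidence wins, ties go to
    # the lower class id; suppress anything overlapping a kept box.
    kept = []
    for d in sorted(dets, key=lambda d: (-d["conf"], d["cls"])):
        if all(iou(d["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(d)
    return kept
```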
## Caveats
- `start_inference.get_engine_filename()` duplicates `TensorRTEngine.get_engine_filename()`
- Video path hardcoded in `start_inference` (`tests/ForAI_test.mp4`)
- `inference/dto` has its own AnnotationClass — duplicated from `dto/annotationClass`
- cv2.imshow display requires a GUI environment — won't work headless
- TensorRT `batch_size` attribute used before assignment if engine input shape has dynamic batch — potential NameError
## Dependency Graph
```mermaid
graph TD
inference_dto[inference/dto] --> inference_inference[inference/inference]
inference_onnx[inference/onnx_engine] --> inference_inference
inference_onnx --> inference_trt[inference/tensorrt_engine]
inference_trt --> start_inference
inference_inference --> start_inference
constants --> start_inference
api_client --> start_inference
security --> start_inference
```
## Logging Strategy
Print statements for metadata, download progress, timing. cv2.imshow for visual output.
# Component: Annotation Queue Service
## Overview
Self-contained async service that consumes annotation CRUD events from a RabbitMQ Streams queue and persists images + labels to the filesystem. Operates independently from the training pipeline.
**Pattern**: Message-driven event handler / consumer service
**Upstream**: External RabbitMQ Streams queue (Azaion platform)
**Downstream**: Data Pipeline (files written become input for augmentation)
## Modules
- `annotation-queue/annotation_queue_dto` — message DTOs (AnnotationMessage, AnnotationBulkMessage, AnnotationStatus, Detection, etc.)
- `annotation-queue/annotation_queue_handler` — async queue consumer with message routing and file management
## Internal Interfaces
### AnnotationQueueHandler
```python
AnnotationQueueHandler()
AnnotationQueueHandler.start() -> async
AnnotationQueueHandler.on_message(message: AMQPMessage, context: MessageContext) -> None
AnnotationQueueHandler.save_annotation(ann: AnnotationMessage) -> None
AnnotationQueueHandler.validate(msg: AnnotationBulkMessage) -> None
AnnotationQueueHandler.delete(msg: AnnotationBulkMessage) -> None
```
### Key DTOs
```python
AnnotationMessage(msgpack_bytes) # Full annotation with image + detections
AnnotationBulkMessage(msgpack_bytes) # Bulk validate/delete
AnnotationStatus: Created(10), Edited(20), Validated(30), Deleted(40)
RoleEnum: Operator(10), Validator(20), CompanionPC(30), Admin(40), ApiAdmin(1000)
```
## Data Access Patterns
- **Queue**: Consumes from RabbitMQ Streams queue `azaion-annotations` using rstream library
- **Offset persistence**: `offset.yaml` tracks last processed message offset for resume
- **Filesystem writes**:
- Validated annotations → `{root}/data/images/` + `{root}/data/labels/`
- Unvalidated (seed) → `{root}/data-seed/images/` + `{root}/data-seed/labels/`
- Deleted → `{root}/data_deleted/images/` + `{root}/data_deleted/labels/`
## Implementation Details
- **Message routing**: Based on `AnnotationStatus` from AMQP application properties:
- Created/Edited → save label + optionally image; validator role writes to data, operator to seed
- Validated (bulk) → move from seed to data
- Deleted (bulk) → move to deleted directory
- **Role-based logic**: `RoleEnum.is_validator()` returns True for Validator, Admin, ApiAdmin — these roles write directly to validated data directory
- **Serialization**: Messages are msgpack-encoded with positional integer keys. Detections are embedded as a JSON string within the msgpack payload.
- **Offset tracking**: After each successfully processed message, offset is persisted to `offset.yaml` (survives restarts)
- **Logging**: TimedRotatingFileHandler with daily rotation, 7-day retention, writes to `logs/` directory
- **Separate dependencies**: Own `requirements.txt` (pyyaml, msgpack, rstream only)
- **Own config.yaml**: Points to test directories by default (`data-test`, `data-test-seed`)
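The role-based routing above hinges on `RoleEnum.is_validator()`; a sketch of that check, grounded in the enum values and behaviour documented for `annotation_queue_dto`:

```python
from enum import Enum

class RoleEnum(Enum):
    Operator = 10
    Validator = 20
    CompanionPC = 30
    Admin = 40
    ApiAdmin = 1000

    def is_validator(self) -> bool:
        # Validator, Admin and ApiAdmin write straight to the validated
        # data directory; Operator annotations land in the seed directory.
        return self in (RoleEnum.Validator, RoleEnum.Admin, RoleEnum.ApiAdmin)
```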
## Caveats
- Credentials hardcoded in `config.yaml` (queue host, user, password)
- AnnotationClass duplicated (third copy) with slight differences from dto/ version
- No reconnection logic for queue disconnections
- No dead-letter queue or message retry on processing failures
- `save_annotation` writes empty label files when detections list has no newline separators between entries
- The annotation-queue `config.yaml` uses different directory names (`data-test` vs `data`) than the main `config.yaml` — likely a test vs production configuration issue
## Dependency Graph
```mermaid
graph TD
annotation_queue_dto --> annotation_queue_handler
rstream_ext[rstream library] --> annotation_queue_handler
msgpack_ext[msgpack library] --> annotation_queue_dto
```
## Logging Strategy
`logging` module with TimedRotatingFileHandler. Format: `HH:MM:SS|message`. Daily rotation, 7-day retention. Also outputs to stdout.
# Data Model
## Entity Overview
This system does not use a database. All data is stored as files on the filesystem and in-memory data structures. The primary entities are annotation images, labels, and ML models.
## Entities
### Annotation Image
- **Storage**: JPEG files on filesystem
- **Naming**: `{uuid}.jpg` (name assigned by Azaion platform)
- **Lifecycle**: Created → Seed/Validated → Augmented → Dataset → Model Training
### Annotation Label (YOLO format)
- **Storage**: Text files on filesystem
- **Naming**: `{uuid}.txt` (matches image name)
- **Format**: One line per detection: `{class_id} {center_x} {center_y} {width} {height}`
- **Coordinates**: All normalized to 0–1 range relative to image dimensions
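Parsing one such label line is straightforward; a minimal sketch (the helper name is illustrative, the format is the one documented above):

```python
def parse_yolo_label(line: str):
    # "class_id cx cy w h" — coordinates normalized to 0-1
    cls, cx, cy, w, h = line.split()
    return int(cls), float(cx), float(cy), float(w), float(h)
```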
### AnnotationClass
- **Storage**: `classes.json` (static file, 17 entries)
- **Fields**: Id (int), Name (str), ShortName (str), Color (hex str)
- **Weather expansion**: Each class × 3 weather modes → IDs offset by 0/20/40
- **Total slots**: 80 (51 used, 29 reserved as "Class-N" placeholders)
### Detection (inference)
- **In-memory only**: Created during inference postprocessing
- **Fields**: x, y, w, h (normalized), cls (int), confidence (float)
### Annotation (inference)
- **In-memory only**: Groups detections per video frame
- **Fields**: frame (image), time (ms), detections (list)
### AnnotationMessage (queue)
- **Wire format**: msgpack with positional integer keys
- **Fields**: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (bytes), createdRole, createdEmail, source, status
### ML Model
- **Formats**: .pt, .onnx, .engine, .rknn
- **Encryption**: AES-256-CBC before upload
- **Split storage**: .small part (API server) + .big part (CDN)
- **Naming**: `azaion.{ext}` for current model; `azaion.cc_{major}.{minor}_sm_{count}.engine` for GPU-specific TensorRT
## Filesystem Entity Relationships
```mermaid
erDiagram
ANNOTATION_IMAGE ||--|| ANNOTATION_LABEL : "matches by filename stem"
ANNOTATION_CLASS ||--o{ ANNOTATION_LABEL : "class_id references"
ANNOTATION_IMAGE }o--|| DATASET_SPLIT : "copied into"
ANNOTATION_LABEL }o--|| DATASET_SPLIT : "copied into"
DATASET_SPLIT ||--|| TRAINING_RUN : "input to"
TRAINING_RUN ||--|| MODEL_PT : "produces"
MODEL_PT ||--|| MODEL_ONNX : "exported to"
MODEL_PT ||--|| MODEL_ENGINE : "exported to"
MODEL_PT ||--|| MODEL_RKNN : "exported to"
MODEL_ONNX ||--|| ENCRYPTED_MODEL : "encrypted"
MODEL_ENGINE ||--|| ENCRYPTED_MODEL : "encrypted"
ENCRYPTED_MODEL ||--|| MODEL_SMALL : "split part"
ENCRYPTED_MODEL ||--|| MODEL_BIG : "split part"
```
## Directory Layout (Data Lifecycle)
```
/azaion/
├── data-seed/ ← Unvalidated annotations (from operators)
│ ├── images/
│ └── labels/
├── data/ ← Validated annotations (from validators/admins)
│ ├── images/
│ └── labels/
├── data-processed/ ← Augmented data (8× expansion)
│ ├── images/
│ └── labels/
├── data-corrupted/ ← Invalid labels (coords > 1.0)
│ ├── images/
│ └── labels/
├── data_deleted/ ← Soft-deleted annotations
│ ├── images/
│ └── labels/
├── data-sample/ ← Random sample for review
├── datasets/ ← Training datasets (dated)
│ └── azaion-{YYYY-MM-DD}/
│ ├── train/images/ + labels/
│ ├── valid/images/ + labels/
│ ├── test/images/ + labels/
│ └── data.yaml
└── models/ ← Trained model artifacts
├── azaion.pt ← Current best model
├── azaion.onnx ← Current ONNX export
└── azaion-{YYYY-MM-DD}/← Per-training-run results
└── weights/
└── best.pt
```
## Configuration Files
| File | Location | Contents |
|------|----------|---------|
| `config.yaml` | Project root | API credentials, queue config, directory paths |
| `cdn.yaml` | Project root | CDN endpoint + S3 access keys |
| `classes.json` | Project root | Annotation class definitions (17 classes) |
| `checkpoint.txt` | Project root | Last training checkpoint timestamp |
| `offset.yaml` | annotation-queue/ | Queue consumer offset |
| `data.yaml` | Per dataset | YOLO training config (class names, split paths) |
# Component Relationship Diagram
```mermaid
graph TD
subgraph "Core Infrastructure"
core[01 Core<br/>constants, utils]
end
subgraph "Security & Hardware"
sec[02 Security<br/>security, hardware_service]
end
subgraph "API & CDN Client"
api[03 API & CDN<br/>api_client, cdn_manager]
end
subgraph "Data Models"
dto[04 Data Models<br/>dto/annotationClass, dto/imageLabel]
end
subgraph "Data Pipeline"
data[05 Data Pipeline<br/>augmentation, convert-annotations,<br/>dataset-visualiser]
end
subgraph "Training Pipeline"
train[06 Training<br/>train, exports, manual_run]
end
subgraph "Inference Engine"
infer[07 Inference<br/>inference/*, start_inference]
end
subgraph "Annotation Queue Service"
queue[08 Annotation Queue<br/>annotation-queue/*]
end
core --> api
core --> data
core --> train
core --> infer
sec --> api
sec --> train
sec --> infer
api --> train
api --> infer
dto --> data
dto --> train
data -.->|augmented images<br/>on filesystem| train
queue -.->|annotation files<br/>on filesystem| data
style core fill:#e8f5e9
style sec fill:#fff3e0
style api fill:#e3f2fd
style dto fill:#f3e5f5
style data fill:#fce4ec
style train fill:#e0f2f1
style infer fill:#f9fbe7
style queue fill:#efebe9
```
## Component Summary
| # | Component | Modules | Purpose |
|---|-----------|---------|---------|
| 01 | Core Infrastructure | constants, utils | Shared paths, config keys, helper classes |
| 02 | Security & Hardware | security, hardware_service | AES encryption, key derivation, hardware fingerprinting |
| 03 | API & CDN Client | api_client, cdn_manager | REST API + S3 CDN communication, split-resource pattern |
| 04 | Data Models | dto/annotationClass, dto/imageLabel | Annotation classes, image+label container |
| 05 | Data Pipeline | augmentation, convert-annotations, dataset-visualiser | Data prep: augmentation, format conversion, visualization |
| 06 | Training Pipeline | train, exports, manual_run | YOLO training, model export, encrypted upload |
| 07 | Inference Engine | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Real-time video object detection |
| 08 | Annotation Queue | annotation_queue_dto, annotation_queue_handler | Async annotation event consumer service |
## Module Coverage Verification
All 21 source modules are covered by exactly one component:
- 01: constants, utils (2)
- 02: security, hardware_service (2)
- 03: api_client, cdn_manager (2)
- 04: dto/annotationClass, dto/imageLabel (2)
- 05: augmentation, convert-annotations, dataset-visualiser (3)
- 06: train, exports, manual_run (3)
- 07: inference/dto, inference/onnx_engine, inference/tensorrt_engine, inference/inference, start_inference (5)
- 08: annotation-queue/annotation_queue_dto, annotation-queue/annotation_queue_handler (2)
- **Total: 21 modules covered**
## Inter-Component Communication
| From | To | Mechanism |
|------|----|-----------|
| Annotation Queue → Data Pipeline | Filesystem | Queue writes images/labels → augmentation reads them |
| Data Pipeline → Training | Filesystem | Augmented images in `/azaion/data-processed/` → dataset formation |
| Training → API & CDN | API calls | Encrypted model upload (split big/small) |
| Inference → API & CDN | API calls | Encrypted model download (reassemble big/small) |
| API & CDN → Security | Function calls | Encryption/decryption for transit protection |
| API & CDN → Core | Import | Path constants, config file references |
# Flow: Annotation Ingestion
See `_docs/02_document/system-flows.md` — Flow 1.
# Flow: Data Augmentation
See `_docs/02_document/system-flows.md` — Flow 2.
# Flow: Model Download & Inference
See `_docs/02_document/system-flows.md` — Flow 4.
# Flow: Training Pipeline
See `_docs/02_document/system-flows.md` — Flow 3.
# Module: annotation-queue/annotation_queue_dto
## Purpose
Data transfer objects for the annotation queue consumer. Defines message types for annotation CRUD events received from a RabbitMQ Streams queue.
## Public Interface
### AnnotationClass (local copy)
Same as dto/annotationClass but reads `classes.json` from current working directory and adds `opencv_color` BGR field.
### AnnotationStatus (Enum)
| Member | Value |
|--------|-------|
| Created | 10 |
| Edited | 20 |
| Validated | 30 |
| Deleted | 40 |
### SourceEnum (Enum)
| Member | Value |
|--------|-------|
| AI | 0 |
| Manual | 1 |
### RoleEnum (Enum)
| Member | Value | Description |
|--------|-------|-------------|
| Operator | 10 | Regular annotator |
| Validator | 20 | Annotation validator |
| CompanionPC | 30 | Companion device |
| Admin | 40 | Administrator |
| ApiAdmin | 1000 | API-level admin |
`RoleEnum.is_validator() -> bool`: Returns True for Validator, Admin, ApiAdmin.
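The role gate can be sketched as a plain `Enum` built from the values in the table above (a minimal reconstruction; the real class in `annotation_queue_dto.py` may carry more behavior):

```python
from enum import Enum

class RoleEnum(Enum):
    Operator = 10
    Validator = 20
    CompanionPC = 30
    Admin = 40
    ApiAdmin = 1000

    def is_validator(self) -> bool:
        # Validators, admins, and API admins may write straight to the data dir
        return self in (RoleEnum.Validator, RoleEnum.Admin, RoleEnum.ApiAdmin)
```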
### Detection
| Field | Type |
|-------|------|
| `annotation_name` | str |
| `cls` | int |
| `x`, `y`, `w`, `h` | float |
| `confidence` | float (optional) |
### AnnotationCreatedMessageNarrow
Lightweight message with only `name` and `createdEmail` (from msgpack fields 1, 2).
### AnnotationMessage
Full annotation message deserialized from msgpack:
| Field | Type | Source |
|-------|------|--------|
| `createdDate` | datetime | msgpack field 0 (Timestamp) |
| `name` | str | field 1 |
| `originalMediaName` | str | field 2 |
| `time` | timedelta | field 3 (microseconds/10) |
| `imageExtension` | str | field 4 |
| `detections` | list[Detection] | field 5 (JSON string) |
| `image` | bytes | field 6 |
| `createdRole` | RoleEnum | field 7 |
| `createdEmail` | str | field 8 |
| `source` | SourceEnum | field 9 |
| `status` | AnnotationStatus | field 10 |
### AnnotationBulkMessage
Bulk operation message for validate/delete:
| Field | Type | Source |
|-------|------|--------|
| `annotation_names` | list[str] | msgpack field 0 |
| `annotation_status` | AnnotationStatus | field 1 |
| `createdEmail` | str | field 2 |
| `createdDate` | datetime | field 3 (Timestamp) |
## Internal Logic
- All messages are deserialized from msgpack binary using positional integer keys.
- Detections within AnnotationMessage are stored as a JSON string inside the msgpack payload.
- Module-level `annotation_classes = AnnotationClass.read_json()` is loaded at import time for Detection.__str__ formatting.
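Since fields are addressed by position rather than name, decoding reduces to indexing into a dict keyed by integers. An illustration using a plain dict in place of a real `msgpack.unpackb` result (field numbers mirror the AnnotationMessage table above; values are made up):

```python
import json

# A decoded msgpack payload arrives as a dict with integer keys.
payload = {
    1: "ann-0001",          # name
    2: "flight-42.mp4",     # originalMediaName
    4: ".jpg",              # imageExtension
    5: '[{"cls": 2, "x": 0.5, "y": 0.5, "w": 0.1, "h": 0.1}]',  # detections
}

name = payload[1]
# Detections travel as a JSON string nested inside the msgpack payload,
# so they need a second decoding pass.
detections = json.loads(payload[5])
```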
## Dependencies
- `msgpack` (external) — binary message deserialization
- `json`, `datetime`, `enum` (stdlib)
## Consumers
annotation-queue/annotation_queue_handler
## Data Models
AnnotationClass, AnnotationStatus, SourceEnum, RoleEnum, Detection, AnnotationCreatedMessageNarrow, AnnotationMessage, AnnotationBulkMessage.
## Configuration
Reads `classes.json` from current working directory.
## External Integrations
None (pure data classes).
## Security
None.
## Tests
None.
# Module: annotation-queue/annotation_queue_handler
## Purpose
Async consumer for the Azaion annotation queue (RabbitMQ Streams). Listens for annotation CRUD events and writes/moves image+label files on the filesystem.
## Public Interface
### AnnotationQueueHandler
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `()` | — | Reads config.yaml, creates directories, initializes rstream Consumer, reads offset |
| `start` | `async ()` | — | Starts consumer, subscribes to queue stream, runs event loop |
| `on_message` | `(message: AMQPMessage, context: MessageContext)` | — | Message callback: routes by AnnotationStatus to save/validate/delete |
| `save_annotation` | `(ann: AnnotationMessage)` | — | Writes label file + image to data or seed directory based on role |
| `validate` | `(msg: AnnotationBulkMessage)` | — | Moves annotations from seed to data directory |
| `delete` | `(msg: AnnotationBulkMessage)` | — | Moves annotations to deleted directory |
### AnnotationQueueHandler.AnnotationName (inner class)
Helper that pre-computes file paths for an annotation name across data/seed directories.
## Internal Logic
- **Queue protocol**: Subscribes to a RabbitMQ Streams queue using rstream library with AMQP message decoding. Resumes from a persisted offset stored in `offset.yaml`.
- **Message routing** (via `application_properties['AnnotationStatus']`):
- `Created` / `Edited``save_annotation`: If validator role, writes to data dir; else writes to seed dir. For Created status, also saves the image bytes. For Edited by validator, moves image from seed to data.
- `Validated``validate`: Bulk-moves all named annotations from seed to data directory.
- `Deleted``delete`: Bulk-moves all named annotations to the deleted directory.
- **Offset tracking**: After each message, increments offset and persists to `offset.yaml`.
- **Directory layout**:
- `{root}/data/images/` + `{root}/data/labels/` — validated annotations
- `{root}/data-seed/images/` + `{root}/data-seed/labels/` — unvalidated annotations
- `{root}/data_deleted/images/` + `{root}/data_deleted/labels/` — soft-deleted annotations
- **Logging**: TimedRotatingFileHandler with daily rotation, 7-day retention, logs to `logs/` directory.
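The routing step above can be pictured as a dispatch on the `AnnotationStatus` value carried in `application_properties` (a simplified synchronous sketch; the real handler is async and also persists the offset after each message):

```python
from enum import Enum

class AnnotationStatus(Enum):
    Created = 10
    Edited = 20
    Validated = 30
    Deleted = 40

def route(status: AnnotationStatus, handlers: dict) -> str:
    # Created/Edited share the save path; Validated and Deleted bulk-move files.
    if status in (AnnotationStatus.Created, AnnotationStatus.Edited):
        return handlers["save"]
    return handlers[status.name.lower()]

handlers = {"save": "save_annotation", "validated": "validate", "deleted": "delete"}
```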
## Dependencies
- `annotation_queue_dto` — AnnotationStatus, AnnotationMessage, AnnotationBulkMessage
- `rstream` (external) — RabbitMQ Streams consumer
- `yaml` (external) — config and offset persistence
- `asyncio`, `os`, `shutil`, `sys`, `logging`, `datetime` (stdlib)
## Consumers
None (entry point — runs via `__main__`).
## Data Models
Uses AnnotationMessage, AnnotationBulkMessage from annotation_queue_dto.
## Configuration
- `config.yaml`: API creds (url, email, password), queue config (host, port, consumer_user, consumer_pw, name), directory structure (root, data, data_seed, data_processed, data_deleted, images, labels)
- `offset.yaml`: persisted queue consumer offset
## External Integrations
- RabbitMQ Streams queue (rstream library) on host `188.245.120.247:5552`
- Filesystem: `/azaion/data/`, `/azaion/data-seed/`, `/azaion/data_deleted/`
## Security
- Queue credentials in `config.yaml` (hardcoded — security concern)
- No encryption of annotation data at rest
## Tests
None.
# Module: api_client
## Purpose
HTTP client for the Azaion backend API. Handles authentication, file upload/download with encryption, and split-resource management (big/small model parts).
## Public Interface
### ApiCredentials
| Field | Type | Description |
|-------|------|-------------|
| `url` | str | API base URL |
| `email` | str | Login email |
| `password` | str | Login password |
### ApiClient
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `()` | — | Reads `config.yaml` for API creds, reads `cdn.yaml` via `load_bytes`, initializes CDNManager |
| `login` | `()` | — | POST `/login` → stores JWT token |
| `upload_file` | `(filename: str, file_bytes: bytearray, folder: str)` | — | Uploads file to API resource endpoint |
| `load_bytes` | `(filename: str, folder: str) -> bytes` | Decrypted bytes | Downloads encrypted resource from API, decrypts with hardware-bound key |
| `load_big_small_resource` | `(resource_name: str, folder: str, key: str) -> bytes` | Decrypted bytes | Reassembles a split resource: big part from local disk + small part from API, decrypts combined |
| `upload_big_small_resource` | `(resource: bytes, resource_name: str, folder: str, key: str)` | — | Encrypts resource, splits into big (CDN) + small (API), uploads both |
## Internal Logic
- **Authentication**: JWT-based. Auto-login on first request, re-login on 401/403.
- **load_bytes**: Sends hardware fingerprint in request payload. Server returns encrypted bytes. Client decrypts using key derived from credentials + hardware hash.
- **Split resource pattern**: Large files (models) are split into two parts:
- `*.small` — first N bytes (min of `SMALL_SIZE_KB * 1024` or 20% of encrypted size) — stored on API server
- `*.big` — remainder — stored on CDN (S3)
- This split ensures the model cannot be reconstructed from either storage alone.
- **CDN initialization**: On construction, `cdn.yaml` is loaded via `load_bytes` (from API, encrypted), then used to initialize `CDNManager`.
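The small/big split itself is simple byte slicing. A sketch under the documented rule (small part = min of `SMALL_SIZE_KB * 1024` and 20% of the encrypted size); function and variable names are illustrative:

```python
SMALL_SIZE_KB = 3  # mirrors the constant in `constants`

def split_resource(encrypted: bytes, small_size_kb: int = SMALL_SIZE_KB):
    # Small part is capped at both the KB threshold and 20% of the payload,
    # so neither storage location ever holds a reconstructable model alone.
    small_len = min(small_size_kb * 1024, len(encrypted) // 5)
    return encrypted[:small_len], encrypted[small_len:]
```

Reassembly on download is the reverse concatenation: small part from the API plus big part from the CDN, decrypted as one blob.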
## Dependencies
- `constants` — config file paths, size thresholds, model folder name
- `cdn_manager` — CDNCredentials, CDNManager for S3 operations
- `hardware_service``get_hardware_info()` for hardware fingerprint
- `security` — encryption/decryption, key derivation
- `requests` (external) — HTTP client
- `yaml` (external) — config parsing
- `io`, `json`, `os` (stdlib)
## Consumers
exports, train, start_inference
## Data Models
`ApiCredentials` — API connection credentials.
## Configuration
- `config.yaml` — API URL, email, password
- `cdn.yaml` — CDN credentials (loaded encrypted from API at init time)
## External Integrations
- Azaion REST API (`POST /login`, `POST /resources/{folder}`, `POST /resources/get/{folder}`)
- S3-compatible CDN via CDNManager
## Security
- JWT token-based authentication with auto-refresh on 401/403
- Hardware-bound encryption for downloaded resources
- Split model storage prevents single-point compromise
- Credentials read from `config.yaml` (hardcoded in file — security concern)
## Tests
None.
# Module: augmentation
## Purpose
Image augmentation pipeline that takes raw annotated images and produces multiple augmented variants for training data expansion. Runs continuously in a loop.
## Public Interface
### Augmentator
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `()` | — | Initializes augmentation transforms and counters |
| `augment_annotations` | `(from_scratch: bool = False)` | — | Processes all unprocessed images from `data/images``data-processed/images` |
| `augment_annotation` | `(image_file)` | — | Processes a single image file: reads image + labels, augments, saves results |
| `augment_inner` | `(img_ann: ImageLabel) -> list[ImageLabel]` | List of augmented images | Generates 1 original + 7 augmented variants |
| `correct_bboxes` | `(labels) -> list` | Corrected labels | Clips bounding boxes to image boundaries, removes tiny boxes |
| `read_labels` | `(labels_path) -> list[list]` | Parsed YOLO labels | Reads YOLO-format label file into list of [x, y, w, h, class_id] |
## Internal Logic
- **Augmentation pipeline** (albumentations Compose):
1. HorizontalFlip (p=0.6)
2. RandomBrightnessContrast (p=0.4)
3. Affine: scale 0.8–1.2, rotate ±35°, shear ±10° (p=0.8)
4. MotionBlur (p=0.1)
5. HueSaturationValue (p=0.4)
- Each image produces **8 outputs**: 1 original copy + 7 augmented variants
- Naming: `{stem}_{1..7}.jpg` for augmented, original keeps its name
- **Bbox correction**: clips bounding boxes that extend outside image borders, removes boxes smaller than `correct_min_bbox_size` (0.01 of image dimension)
- **Incremental processing**: skips images already present in `processed_images_dir`
- **Concurrent**: uses `ThreadPoolExecutor` for parallel processing
- **Continuous mode**: `__main__` runs augmentation in an infinite loop with 5-minute sleep between rounds
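The bbox-correction step can be sketched as follows, for normalized YOLO boxes `[cx, cy, w, h, class_id]` (a simplified reconstruction; the real `correct_bboxes` may differ in detail):

```python
def correct_bboxes(labels, min_size=0.01):
    corrected = []
    for cx, cy, w, h, cls in labels:
        # Clip the box edges back into the [0, 1] image frame...
        x1, y1 = max(cx - w / 2, 0.0), max(cy - h / 2, 0.0)
        x2, y2 = min(cx + w / 2, 1.0), min(cy + h / 2, 1.0)
        w, h = x2 - x1, y2 - y1
        # ...then drop boxes that became degenerate or tiny after clipping.
        if w >= min_size and h >= min_size:
            corrected.append([(x1 + x2) / 2, (y1 + y2) / 2, w, h, cls])
    return corrected
```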
## Dependencies
- `constants` — directory paths (data_images_dir, data_labels_dir, processed_*)
- `dto/imageLabel` — ImageLabel container class
- `albumentations` (external) — augmentation transforms
- `cv2` (external) — image read/write
- `numpy` (external) — image array handling
- `concurrent.futures`, `os`, `shutil`, `time`, `datetime`, `pathlib` (stdlib)
## Consumers
manual_run
## Data Models
Uses `ImageLabel` from `dto/imageLabel`.
## Configuration
Hardcoded augmentation parameters (probabilities, ranges). Directory paths from `constants`.
## External Integrations
Filesystem I/O: reads from `/azaion/data/`, writes to `/azaion/data-processed/`.
## Security
None.
## Tests
None.
# Module: cdn_manager
## Purpose
Manages file upload and download to/from an S3-compatible CDN (MinIO/similar) using separate credentials for upload and download operations.
## Public Interface
### CDNCredentials
| Field | Type | Description |
|-------|------|-------------|
| `host` | str | CDN endpoint URL |
| `downloader_access_key` | str | S3 access key for downloads |
| `downloader_access_secret` | str | S3 secret for downloads |
| `uploader_access_key` | str | S3 access key for uploads |
| `uploader_access_secret` | str | S3 secret for uploads |
### CDNManager
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `(credentials: CDNCredentials)` | — | Creates two boto3 S3 clients (download + upload) |
| `upload` | `(bucket: str, filename: str, file_bytes: bytearray) -> bool` | True on success | Uploads bytes to S3 bucket |
| `download` | `(bucket: str, filename: str) -> bool` | True on success | Downloads file from S3 to current directory |
## Internal Logic
- Maintains two separate boto3 S3 clients with different credentials (read vs write separation)
- Upload uses `upload_fileobj` with in-memory BytesIO wrapper
- Download uses `download_file` (saves directly to disk with same filename)
- Both methods catch all exceptions, print error, return bool
## Dependencies
- `boto3` (external) — S3 client
- `io`, `sys`, `yaml`, `os` (stdlib) — Note: `sys`, `yaml`, `os` are imported but unused
## Consumers
api_client, exports, train, start_inference
## Data Models
`CDNCredentials` — plain data class holding S3 access credentials.
## Configuration
Credentials loaded from `cdn.yaml` by callers (not by this module directly).
## External Integrations
- S3-compatible object storage (configured via `CDNCredentials.host`)
## Security
- Separate read/write credentials enforce least-privilege access
- Credentials passed in at construction time, not hardcoded here
## Tests
None.
# Module: constants
## Purpose
Centralizes all filesystem path constants, config file names, file extensions, and size thresholds used across the training pipeline.
## Public Interface
| Name | Type | Value/Description |
|------|------|-------------------|
| `azaion` | str | Root directory: `/azaion` |
| `prefix` | str | Naming prefix: `azaion-` |
| `data_dir` | str | `/azaion/data` |
| `data_images_dir` | str | `/azaion/data/images` |
| `data_labels_dir` | str | `/azaion/data/labels` |
| `processed_dir` | str | `/azaion/data-processed` |
| `processed_images_dir` | str | `/azaion/data-processed/images` |
| `processed_labels_dir` | str | `/azaion/data-processed/labels` |
| `corrupted_dir` | str | `/azaion/data-corrupted` |
| `corrupted_images_dir` | str | `/azaion/data-corrupted/images` |
| `corrupted_labels_dir` | str | `/azaion/data-corrupted/labels` |
| `sample_dir` | str | `/azaion/data-sample` |
| `datasets_dir` | str | `/azaion/datasets` |
| `models_dir` | str | `/azaion/models` |
| `date_format` | str | `%Y-%m-%d` |
| `checkpoint_file` | str | `checkpoint.txt` |
| `checkpoint_date_format` | str | `%Y-%m-%d %H:%M:%S` |
| `CONFIG_FILE` | str | `config.yaml` |
| `JPG_EXT` | str | `.jpg` |
| `TXT_EXT` | str | `.txt` |
| `OFFSET_FILE` | str | `offset.yaml` |
| `SMALL_SIZE_KB` | int | `3` (KB threshold for split-upload small part) |
| `CDN_CONFIG` | str | `cdn.yaml` |
| `MODELS_FOLDER` | str | `models` |
| `CURRENT_PT_MODEL` | str | `/azaion/models/azaion.pt` |
| `CURRENT_ONNX_MODEL` | str | `/azaion/models/azaion.onnx` |
## Internal Logic
Pure constant definitions using `os.path.join`. No functions, no classes, no dynamic behavior.
## Dependencies
- `os.path` (stdlib)
## Consumers
api_client, augmentation, exports, train, manual_run, start_inference, dataset-visualiser
## Data Models
None.
## Configuration
Defines `CONFIG_FILE = 'config.yaml'` and `CDN_CONFIG = 'cdn.yaml'` — the filenames for runtime configuration. Does not read them.
## External Integrations
None.
## Security
None.
## Tests
None.
# Module: convert-annotations
## Purpose
Standalone script that converts annotation files from external formats (Pascal VOC XML, oriented bounding box text) to YOLO format.
## Public Interface
| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `convert` | `(folder, dest_folder, read_annotations, ann_format)` | — | Generic converter: reads images + annotations from folder, writes YOLO format to dest |
| `minmax2yolo` | `(width, height, xmin, xmax, ymin, ymax) -> tuple` | (cx, cy, w, h) | Converts pixel min/max coords to normalized YOLO center format |
| `read_pascal_voc` | `(width, height, s: str) -> list[str]` | YOLO label lines | Parses Pascal VOC XML, maps class names to IDs, outputs YOLO lines |
| `read_bbox_oriented` | `(width, height, s: str) -> list[str]` | YOLO label lines | Parses 14-column oriented bbox format, outputs YOLO lines (hardcoded class 2) |
| `rename_images` | `(folder)` | — | Renames files by trimming last 7 chars + replacing extension with .png |
## Internal Logic
- **convert()**: Iterates image files in source folder, reads corresponding annotation file, calls format-specific reader, copies image and writes YOLO label to destination.
- **Pascal VOC**: Parses XML `<object>` elements, maps class names via `name_class_map` (Truck→1, Car/Taxi→2), filters forbidden classes (Motorcycle). Default class = 1.
- **Oriented bbox**: 14-column space-separated format, extracts min/max from columns 6–13, hardcodes class to 2.
- **Validation**: Skips labels where normalized coordinates exceed 1.0 (out of bounds).
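The min/max-to-YOLO conversion is pure arithmetic; a sketch consistent with the signature documented above:

```python
def minmax2yolo(width, height, xmin, xmax, ymin, ymax):
    # Convert pixel min/max corners to normalized center/size (YOLO format).
    cx = (xmin + xmax) / 2 / width
    cy = (ymin + ymax) / 2 / height
    w = (xmax - xmin) / width
    h = (ymax - ymin) / height
    return cx, cy, w, h
```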
## Dependencies
- `cv2` (external) — image reading for dimensions
- `xml.etree.cElementTree` (stdlib) — Pascal VOC XML parsing
- `os`, `shutil`, `pathlib` (stdlib)
## Consumers
None (standalone script).
## Data Models
None.
## Configuration
Hardcoded class mappings: `name_class_map = {'Truck': 1, 'Car': 2, 'Taxi': 2}`, `forbidden_classes = ['Motorcycle']`.
## External Integrations
Filesystem I/O only.
## Security
None.
## Tests
None.
# Module: dataset-visualiser
## Purpose
Interactive tool for visually inspecting annotated images from datasets or the processed folder, displaying bounding boxes with class colors.
## Public Interface
| Function | Signature | Description |
|----------|-----------|-------------|
| `visualise_dataset` | `()` | Iterates images in a specific dataset folder, shows each with annotations. Waits for keypress. |
| `visualise_processed_folder` | `()` | Shows images from the processed folder with annotations. |
## Internal Logic
- **visualise_dataset()**: Hardcoded to a specific dataset date (`2024-06-18`), iterates from index 35247 onward. Reads image + labels, calls `ImageLabel.visualize()`, waits for user input to advance.
- **visualise_processed_folder()**: Lists all processed images, shows the first one.
- Both functions use `read_labels()` imported from a `preprocessing` module **which does not exist** in the codebase — this is a broken import.
## Dependencies
- `constants` — directory paths (datasets_dir, prefix, processed_*)
- `dto/annotationClass` — AnnotationClass for class colors
- `dto/imageLabel` — ImageLabel for visualization
- `preprocessing`**MISSING MODULE** (read_labels function)
- `cv2` (external), `matplotlib` (external), `os`, `pathlib` (stdlib)
## Consumers
None (standalone script).
## Data Models
Uses ImageLabel, AnnotationClass.
## Configuration
Hardcoded dataset path and start index.
## External Integrations
Filesystem I/O, matplotlib interactive display.
## Security
None.
## Tests
None.
# Module: dto/annotationClass
## Purpose
Defines the `AnnotationClass` data model and `WeatherMode` enum used in the training pipeline. Reads annotation class definitions from `classes.json`.
## Public Interface
### WeatherMode (Enum)
| Member | Value | Description |
|--------|-------|-------------|
| `Norm` | 0 | Normal weather |
| `Wint` | 20 | Winter conditions |
| `Night` | 40 | Night conditions |
### AnnotationClass
| Field/Method | Type/Signature | Description |
|-------------|----------------|-------------|
| `id` | int | Class ID (weather_offset + base_id) |
| `name` | str | Class name (with weather suffix if non-Norm) |
| `color` | str | Hex color string (e.g. `#ff0000`) |
| `color_tuple` | property → tuple | RGB tuple parsed from hex color |
| `read_json()` | static → dict[int, AnnotationClass] | Reads `classes.json`, expands across weather modes, returns dict keyed by ID |
## Internal Logic
- `read_json()` locates `classes.json` relative to the parent directory of the `dto/` package
- For each of the 3 weather modes, creates an AnnotationClass per entry in `classes.json` with offset IDs (0, 20, 40)
- This yields 51 classes in total (17 base × 3 weather modes); because the per-mode ID offsets are 0, 20, and 40, the class ID space effectively reserves 80 slots
- `color_tuple` strips the first 3 characters of the color string and parses hex pairs
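A generic version of the hex-to-tuple parsing looks like this (a sketch only: the real property reportedly skips the first *three* characters, which suggests the stored color strings carry an extra leading character beyond `#`):

```python
def color_tuple(color: str):
    # Parse a hex color like '#ff8800' into an (R, G, B) integer tuple.
    hex_part = color.lstrip('#')
    return tuple(int(hex_part[i:i + 2], 16) for i in (0, 2, 4))
```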
## Dependencies
- `json`, `enum`, `os.path` (stdlib)
## Consumers
train (for YAML generation), dataset-visualiser (for visualization colors)
## Data Models
`AnnotationClass` — annotation class with ID, name, color. `WeatherMode` — enum for weather conditions.
## Configuration
Reads `classes.json` from project root (relative path from `dto/` parent).
## External Integrations
None.
## Security
None.
## Tests
None directly; used transitively by `tests/imagelabel_visualize_test.py`.
# Module: dto/imageLabel
## Purpose
Container class for an image with its YOLO-format bounding box labels, plus a visualization method for debugging annotations.
## Public Interface
### ImageLabel
| Field/Method | Type/Signature | Description |
|-------------|----------------|-------------|
| `image_path` | str | Filesystem path to the image |
| `image` | numpy.ndarray | OpenCV image array |
| `labels_path` | str | Filesystem path to the labels file |
| `labels` | list[list] | List of YOLO bboxes: [x_center, y_center, width, height, class_id] |
| `visualize` | `(annotation_classes: dict) -> None` | Draws bounding boxes on image and displays via matplotlib |
## Internal Logic
- `visualize()` converts BGR→RGB, iterates labels, converts normalized YOLO coordinates to pixel coordinates, draws colored rectangles using `annotation_classes[class_num].color_tuple`, displays with matplotlib.
- Labels use YOLO format: center_x, center_y, width, height (all normalized 0–1), class_id as last element.
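The normalized-to-pixel conversion used when drawing can be sketched as (names are illustrative; the real method draws directly with `cv2.rectangle`):

```python
def yolo_to_pixels(label, img_w, img_h):
    # [cx, cy, w, h, class_id] normalized -> integer pixel corners for drawing.
    cx, cy, w, h, cls = label
    x1 = int((cx - w / 2) * img_w)
    y1 = int((cy - h / 2) * img_h)
    x2 = int((cx + w / 2) * img_w)
    y2 = int((cy + h / 2) * img_h)
    return (x1, y1), (x2, y2), int(cls)
```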
## Dependencies
- `cv2` (external) — image manipulation
- `matplotlib.pyplot` (external) — image display
## Consumers
augmentation (as augmented image container), dataset-visualiser (for visualization)
## Data Models
`ImageLabel` — image + labels container.
## Configuration
None.
## External Integrations
None.
## Security
None.
## Tests
Used by `tests/imagelabel_visualize_test.py`.
# Module: exports
## Purpose
Model export utilities: converts trained YOLO .pt models to ONNX, TensorRT, and RKNN formats. Also handles encrypted model upload (split big/small pattern) and data sampling.
## Public Interface
| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `export_rknn` | `(model_path: str)` | — | Exports YOLO model to RKNN format (RK3588 target), cleans up temp folder |
| `export_onnx` | `(model_path: str, batch_size: int = 4)` | — | Exports YOLO model to ONNX (1280px, NMS enabled, GPU device 0) |
| `export_tensorrt` | `(model_path: str)` | — | Exports YOLO model to TensorRT engine (batch=4, half precision, NMS) |
| `form_data_sample` | `(destination_path: str, size: int = 500, write_txt_log: bool = False)` | — | Creates a random sample of processed images |
| `show_model` | `(model: str = None)` | — | Opens model visualization in netron |
| `upload_model` | `(model_path: str, filename: str, size_small_in_kb: int = 3)` | — | Encrypts model, splits big/small, uploads to API + CDN |
## Internal Logic
- **export_onnx**: Removes existing ONNX file if present, exports at 1280px with NMS baked in and simplification.
- **export_tensorrt**: Uses YOLO's built-in TensorRT export (batch=4, FP16, NMS, simplify).
- **export_rknn**: Exports to RKNN format targeting RK3588 SoC, moves result file and cleans temp directory.
- **upload_model**: Encrypts with `Security.get_model_encryption_key()`, splits encrypted bytes at 30%/70% boundary (or `size_small_in_kb * 1024`), uploads small part to API, big part to CDN.
- **form_data_sample**: Randomly shuffles processed images, copies first N to destination folder.
## Dependencies
- `constants` — directory paths, model paths, config file names
- `api_client` — ApiClient, ApiCredentials for upload
- `cdn_manager` — CDNManager, CDNCredentials for CDN upload
- `security` — model encryption key, encrypt_to
- `utils` — Dotdict for config access
- `ultralytics` (external) — YOLO model
- `netron` (external) — model visualization
- `yaml`, `os`, `shutil`, `random`, `pathlib` (stdlib)
## Consumers
train (export_tensorrt, upload_model, export_onnx)
## Data Models
None.
## Configuration
Reads `config.yaml` for API credentials (in `upload_model`), `cdn.yaml` for CDN credentials.
## External Integrations
- Ultralytics YOLO export pipeline
- Netron model viewer
- Azaion API + CDN for model upload
## Security
- Models are encrypted with AES-256-CBC before upload
- Split storage (big on CDN, small on API) prevents single-point compromise
## Tests
None.
# Module: hardware_service
## Purpose
Collects hardware fingerprint information (CPU, GPU, RAM, drive serial) from the host machine for use in hardware-bound encryption key derivation.
## Public Interface
| Function | Signature | Returns |
|----------|-----------|---------|
| `get_hardware_info` | `() -> str` | Formatted string: `CPU: {cpu}. GPU: {gpu}. Memory: {memory}. DriveSerial: {drive_serial}` |
## Internal Logic
- Detects OS via `os.name` (`nt` for Windows, else Linux)
- **Windows**: PowerShell commands to query `Win32_Processor`, `Win32_VideoController`, `Win32_OperatingSystem`, disk serial
- **Linux**: `lscpu`, `lspci`, `free`, `/sys/block/sda/device/` serial
- Parses multi-line output: first line = CPU, second = GPU, second-to-last = memory, last = drive serial
- Handles multiple GPUs by taking first GPU and last two lines for memory/drive
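The line-positional parsing can be sketched as follows (a simplification; the exact commands and line layout vary by OS, and the helper name is illustrative):

```python
def parse_hw_lines(output: str) -> str:
    # First line = CPU, second = first GPU, last two = memory and drive serial.
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    cpu, gpu = lines[0], lines[1]
    memory, drive_serial = lines[-2], lines[-1]
    return f"CPU: {cpu}. GPU: {gpu}. Memory: {memory}. DriveSerial: {drive_serial}"
```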
## Dependencies
- `os`, `subprocess` (stdlib)
## Consumers
api_client (used in `load_bytes` to generate hardware string for encryption)
## Data Models
None.
## Configuration
None.
## External Integrations
Executes OS-level shell commands to query hardware.
## Security
The hardware fingerprint is used as input to `Security.get_hw_hash()` and subsequently `Security.get_api_encryption_key()`, binding API encryption to the specific machine.
## Tests
None.
# Module: inference/dto
## Purpose
Data transfer objects for the inference subsystem: Detection, Annotation, and a local copy of AnnotationClass/WeatherMode.
## Public Interface
### Detection
| Field | Type | Description |
|-------|------|-------------|
| `x` | float | Normalized center X |
| `y` | float | Normalized center Y |
| `w` | float | Normalized width |
| `h` | float | Normalized height |
| `cls` | int | Class ID |
| `confidence` | float | Detection confidence score |
| `overlaps(det2, iou_threshold) -> bool` | method | IoU-based overlap check |
### Annotation
| Field | Type | Description |
|-------|------|-------------|
| `frame` | numpy.ndarray | Video frame image |
| `time` | int/float | Timestamp in the video |
| `detections` | list[Detection] | Detected objects in this frame |
### AnnotationClass (duplicate)
Same as `dto/annotationClass.AnnotationClass` but with an additional `opencv_color` field (BGR tuple). Reads from `classes.json` relative to `inference/` parent directory.
### WeatherMode (duplicate)
Same as `dto/annotationClass.WeatherMode`.
## Internal Logic
- `Detection.overlaps()` computes IoU between two bounding boxes and returns True if above threshold.
- `AnnotationClass` here adds `opencv_color` as a pre-computed BGR tuple from the hex color for efficient OpenCV rendering.
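A sketch of the IoU check, using normalized `(cx, cy, w, h)` boxes as documented above (a reconstruction; the real method operates on `Detection` attributes):

```python
def iou(a, b):
    # a, b = (cx, cy, w, h) in normalized coordinates
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def overlaps(a, b, iou_threshold=0.5):
    return iou(a, b) > iou_threshold
```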
## Dependencies
- `json`, `enum`, `os.path` (stdlib)
## Consumers
inference/inference
## Data Models
Detection, Annotation, AnnotationClass, WeatherMode.
## Configuration
Reads `classes.json` from project root.
## External Integrations
None.
## Security
None.
## Tests
None.
# Module: inference/inference
## Purpose
High-level video inference pipeline. Orchestrates preprocessing → engine inference → postprocessing → visualization for object detection on video streams.
## Public Interface
### Inference
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `(engine: InferenceEngine, confidence_threshold, iou_threshold)` | — | Stores engine, thresholds, loads annotation classes |
| `preprocess` | `(frames: list) -> np.ndarray` | Batched blob tensor | Normalizes, resizes, and stacks frames into NCHW blob |
| `postprocess` | `(batch_frames, batch_timestamps, output) -> list[Annotation]` | Annotations per frame | Extracts detections from raw output, applies confidence filter and NMS |
| `process` | `(video: str)` | — | End-to-end: reads video → batched inference → draws + displays results |
| `draw` | `(annotation: Annotation)` | — | Draws bounding boxes with class labels on frame, shows via cv2.imshow |
| `remove_overlapping_detections` | `(detections: list[Detection]) -> list[Detection]` | Filtered list | Custom NMS: removes overlapping detections keeping higher confidence |
## Internal Logic
- **Video processing**: Reads video via cv2.VideoCapture, processes every 4th frame (frame_count % 4), batches frames to engine batch size.
- **Preprocessing**: `cv2.dnn.blobFromImage` with 1/255 scaling, model input size, BGR→RGB swap.
- **Postprocessing**: Iterates raw output, filters by confidence threshold, normalizes coordinates from model space to [0,1], creates Detection objects, applies custom NMS.
- **Custom NMS**: Pairwise IoU comparison. When two detections overlap above threshold, keeps the one with higher confidence (ties broken by lower class ID).
- **Visualization**: Draws colored rectangles and confidence labels using annotation class colors in OpenCV window.
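The custom NMS described above (keep higher confidence, ties broken by lower class ID) can be sketched like this, with detections as plain `(cx, cy, w, h, cls, conf)` tuples; names and the inline IoU helper are illustrative:

```python
def remove_overlapping(detections, iou_threshold=0.5):
    # Rank so better detections come first: higher confidence, then lower class ID.
    ranked = sorted(detections, key=lambda d: (-d[5], d[4]))
    kept = []
    for det in ranked:
        # Keep a detection only if it does not overlap anything already kept.
        if not any(_overlaps(det, k, iou_threshold) for k in kept):
            kept.append(det)
    return kept

def _overlaps(a, b, thr):
    # Pairwise IoU between two normalized center/size boxes.
    ix = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    iy = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return union > 0 and inter / union > thr
```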
## Dependencies
- `inference/dto` — Detection, Annotation, AnnotationClass
- `inference/onnx_engine` — InferenceEngine ABC (type hint)
- `cv2` (external) — video I/O, image processing, display
- `numpy` (external) — tensor operations
## Consumers
start_inference
## Data Models
Uses Detection, Annotation from `inference/dto`.
## Configuration
`confidence_threshold` and `iou_threshold` set at construction.
## External Integrations
- OpenCV video capture (file or stream input)
- OpenCV GUI window for real-time display
## Security
None.
## Tests
None.
# Module: inference/onnx_engine
## Purpose
Defines the abstract `InferenceEngine` base class and the `OnnxEngine` implementation for running ONNX model inference with GPU acceleration.
## Public Interface
### InferenceEngine (ABC)
| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(model_path: str, batch_size: int = 1, **kwargs)` | Abstract constructor |
| `get_input_shape` | `() -> Tuple[int, int]` | Returns (height, width) of model input |
| `get_batch_size` | `() -> int` | Returns the batch size |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Runs inference, returns output tensors |
### OnnxEngine (extends InferenceEngine)
| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(model_bytes, batch_size: int = 1, **kwargs)` | Loads ONNX model from bytes, creates InferenceSession with CUDA+CPU providers |
| `get_input_shape` | `() -> Tuple[int, int]` | Returns (height, width) from model input shape |
| `get_batch_size` | `() -> int` | Returns batch size (from model shape or constructor arg) |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Runs ONNX inference session |
## Internal Logic
- Uses ONNX Runtime with `CUDAExecutionProvider` (primary) and `CPUExecutionProvider` (fallback).
- Reads model metadata to extract class names from custom metadata map.
- If model input shape has a fixed batch dimension (not -1), overrides the constructor batch_size.
## Dependencies
- `onnxruntime` (external) — ONNX inference runtime
- `numpy` (external)
- `abc`, `typing` (stdlib)
## Consumers
inference/inference, inference/tensorrt_engine (inherits InferenceEngine), train (imports OnnxEngine)
## Data Models
None.
## Configuration
None.
## External Integrations
- ONNX Runtime GPU execution (CUDA)
## Security
None.
## Tests
None.
# Module: inference/tensorrt_engine
## Purpose
TensorRT-based inference engine implementation. Provides GPU-accelerated inference using NVIDIA TensorRT with CUDA memory management, plus ONNX-to-TensorRT conversion.
## Public Interface
### TensorRTEngine (extends InferenceEngine)
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `(model_bytes: bytes, **kwargs)` | — | Deserializes TensorRT engine from bytes, allocates CUDA memory |
| `get_input_shape` | `() -> Tuple[int, int]` | (height, width) | Returns model input dimensions |
| `get_batch_size` | `() -> int` | int | Returns configured batch size |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Output tensors | Runs async inference on CUDA stream |
| `get_gpu_memory_bytes` | `(device_id=0) -> int` | GPU memory in bytes | Queries total GPU VRAM via pynvml (static) |
| `get_engine_filename` | `(device_id=0) -> str \| None` | Filename string | Generates device-specific engine filename (static) |
| `convert_from_onnx` | `(onnx_model: bytes) -> bytes \| None` | Serialized TensorRT plan | Converts ONNX model to TensorRT engine (static) |
## Internal Logic
- **Initialization**: Deserializes TensorRT engine, creates execution context, allocates pinned host memory and device memory for input/output tensors.
- **Dynamic shapes**: Handles -1 (dynamic) dimensions, defaults to 1280×1280 for spatial dims, batch size from engine or constructor.
- **Output shape**: [batch_size, 300 max detections, 6 values per detection (x1, y1, x2, y2, conf, cls)].
- **Inference flow**: Host→Device async copy → execute_async_v3 → synchronize → Device→Host copy.
- **ONNX conversion**: Creates TensorRT builder, parses ONNX, configures workspace (90% of GPU memory), enables FP16 if supported, builds serialized network.
- **Engine filename**: `azaion.cc_{major}.{minor}_sm_{sm_count}.engine` — uniquely identifies engine per GPU architecture.
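The filename scheme can be expressed as a pure formatting function (in the real module the compute capability and SM count come from CUDA device queries via pycuda; here they are plain parameters for illustration):

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    # Uniquely identifies a serialized TensorRT plan per GPU architecture:
    # a plan built for one compute capability generally does not load on
    # a device with a different one, so the file name encodes both the
    # compute capability and the streaming-multiprocessor count.
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```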
## Dependencies
- `inference/onnx_engine` — InferenceEngine ABC
- `tensorrt` (external) — TensorRT runtime and builder
- `pycuda.driver` (external) — CUDA memory management
- `pycuda.autoinit` (external) — CUDA context auto-initialization
- `pynvml` (external) — GPU memory query
- `numpy`, `json`, `struct`, `re`, `subprocess`, `pathlib`, `typing` (stdlib/external)
## Consumers
start_inference
## Data Models
None.
## Configuration
None.
## External Integrations
- NVIDIA TensorRT runtime (GPU inference)
- CUDA driver API (memory allocation, streams)
- NVML (GPU hardware queries)
## Security
None.
## Tests
None.
# Module: manual_run
## Purpose
Ad-hoc script for manual training operations. Contains commented-out alternatives and a hardcoded workflow for copying model weights and exporting.
## Public Interface
No functions or classes. Script-level code only.
## Internal Logic
- Contains commented-out calls to `Augmentator().augment_annotations()`, `train.train_dataset()`, `train.resume_training()`.
- Active code: references a specific model date (`2025-05-18`), removes intermediate epoch checkpoint files, copies `best.pt` to `CURRENT_PT_MODEL`, then calls `train.export_current_model()`.
- Serves as a developer convenience script for one-off training/export operations.
## Dependencies
- `constants` — models_dir, prefix, CURRENT_PT_MODEL
- `train` — export_current_model
- `augmentation` — Augmentator (imported, usage commented out)
- `glob`, `os`, `shutil` (stdlib)
## Consumers
None (standalone script).
## Data Models
None.
## Configuration
Hardcoded model date: `2025-05-18`.
## External Integrations
Filesystem operations on `/azaion/models/`.
## Security
None.
## Tests
None.
# Module: security
## Purpose
Provides AES-256-CBC encryption/decryption and key derivation functions used to protect model files and API resources in transit.
## Public Interface
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `Security.encrypt_to` | `(input_bytes: bytes, key: str) -> bytes` | IV + ciphertext | AES-256-CBC encrypt with PKCS7 padding; prepends 16-byte random IV |
| `Security.decrypt_to` | `(ciphertext_with_iv_bytes: bytes, key: str) -> bytes` | plaintext bytes | Extracts IV from first 16 bytes, decrypts, removes PKCS7 padding |
| `Security.calc_hash` | `(key: str) -> str` | base64-encoded SHA-384 hash | General-purpose hash function |
| `Security.get_hw_hash` | `(hardware: str) -> str` | base64 hash | Derives a hardware-specific hash using `Azaion_{hardware}_%$$$)0_` salt |
| `Security.get_api_encryption_key` | `(creds, hardware_hash: str) -> str` | base64 hash | Derives API encryption key from credentials + hardware hash |
| `Security.get_model_encryption_key` | `() -> str` | base64 hash | Returns a fixed encryption key derived from a hardcoded secret string |
## Internal Logic
- Encryption: SHA-256 of the key string → 32-byte AES key. Random 16-byte IV generated per encryption. PKCS7 padding applied. Output = IV ∥ ciphertext.
- Decryption: First 16 bytes = IV, remainder = ciphertext. Manual PKCS7 unpadding (reads the pad length from the last byte and checks it is a valid value ≤ 16).
- Key derivation uses SHA-384 + base64 encoding for all hash-based keys.
- `BUFFER_SIZE = 64 * 1024` is declared but unused.
## Dependencies
- `cryptography.hazmat` (external) — AES cipher, CBC mode, PKCS7 padding
- `hashlib`, `base64`, `os` (stdlib)
## Consumers
api_client, exports, train, start_inference, tests/security_test
## Data Models
None.
## Configuration
None consumed at runtime. Contains hardcoded key material.
## External Integrations
None.
## Security
- **Hardcoded model encryption key**: `get_model_encryption_key()` uses a static string `'-#%@AzaionKey@%#---234sdfklgvhjbnn'`. This is a significant security concern — the key should be stored in a secrets manager or environment variable.
- API encryption key is derived from user credentials + hardware fingerprint, providing per-device uniqueness.
- AES-256-CBC with random IV is cryptographically sound for symmetric encryption.
## Tests
- `tests/security_test.py` — basic round-trip encrypt/decrypt test (script-based, no test framework).
# Module: start_inference
## Purpose
Entry point for running inference on video files using a TensorRT engine. Downloads the encrypted model from the API/CDN, initializes the engine, and processes video.
## Public Interface
| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `get_engine_filename` | `(device_id=0) -> str \| None` | Engine filename | Generates GPU-specific engine filename (duplicate of TensorRTEngine.get_engine_filename) |
`__main__` block: Creates ApiClient, downloads encrypted TensorRT model (split big/small), initializes TensorRTEngine, runs Inference on a test video.
## Internal Logic
- **Model download flow**: ApiClient → `load_big_small_resource` → reassembles from local big part + API-downloaded small part → decrypts with model encryption key → raw engine bytes.
- **Inference setup**: TensorRTEngine initialized from decrypted bytes, Inference configured with confidence_threshold=0.5, iou_threshold=0.3.
- **Video source**: Hardcoded to `tests/ForAI_test.mp4`.
- **get_engine_filename()**: Duplicates `TensorRTEngine.get_engine_filename()` — generates `azaion.cc_{major}.{minor}_sm_{sm_count}.engine` based on CUDA device compute capability and SM count.
## Dependencies
- `constants` — config file paths
- `api_client` — ApiClient, ApiCredentials for model download
- `cdn_manager` — CDNManager, CDNCredentials (imported but CDN managed by api_client)
- `inference/inference` — Inference pipeline
- `inference/tensorrt_engine` — TensorRTEngine
- `security` — model encryption key
- `utils` — Dotdict
- `pycuda.driver` (external) — CUDA device queries
- `yaml` (external)
## Consumers
None (entry point).
## Data Models
None.
## Configuration
- Confidence threshold: 0.5
- IoU threshold: 0.3
- Video path: `tests/ForAI_test.mp4` (hardcoded)
## External Integrations
- Azaion API + CDN for model download
- TensorRT GPU inference
- OpenCV video capture and display
## Security
- Model is downloaded encrypted (split big/small) and decrypted locally
- Uses hardware-bound and model encryption keys
## Tests
None.
# Module: train
## Purpose
Main training pipeline. Forms YOLO datasets from processed annotations, trains YOLOv11 models, and exports/uploads the trained model.
## Public Interface
| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `form_dataset` | `()` | — | Creates train/valid/test split from processed images |
| `copy_annotations` | `(images, folder: str)` | — | Copies image+label pairs to a dataset split folder (concurrent) |
| `check_label` | `(label_path: str) -> bool` | bool | Validates YOLO label file (all coords ≤ 1.0) |
| `create_yaml` | `()` | — | Generates YOLO `data.yaml` with class names from `classes.json` |
| `resume_training` | `(last_pt_path: str)` | — | Resumes training from a checkpoint |
| `train_dataset` | `()` | — | Full pipeline: form_dataset → create_yaml → train YOLOv11 → save model |
| `export_current_model` | `()` | — | Exports current .pt to ONNX, encrypts, uploads as split resource |
## Internal Logic
- **Dataset formation**: Shuffles all processed images, splits 70/20/10 (train/valid/test). Copies in parallel via ThreadPoolExecutor. Corrupted labels (coords > 1.0) are moved to `/azaion/data-corrupted/`.
- **YAML generation**: Reads annotation classes from `classes.json`, builds `data.yaml` with 80 class names (17 actual + 63 placeholders "Class-N"), sets train/valid/test paths.
- **Training**: YOLOv11 medium (`yolo11m.yaml`), 120 epochs, batch=11 (tuned for 24GB VRAM), 1280px input, save every epoch, 24 workers.
- **Post-training**: Copies results to `/azaion/models/{date}/`, removes intermediate epoch checkpoints, copies `best.pt` to `CURRENT_PT_MODEL`.
- **Export**: Calls `export_onnx`, reads the ONNX file, encrypts with model key, uploads via `upload_big_small_resource`.
- **Dataset naming**: `azaion-{YYYY-MM-DD}` using current date.
- **`__main__`**: Runs `train_dataset()` then `export_current_model()`.
## Dependencies
- `constants` — all directory/path constants
- `api_client` — ApiClient for model upload
- `cdn_manager` — CDNCredentials, CDNManager (imported but CDN init done via api_client)
- `dto/annotationClass` — AnnotationClass for class name generation
- `inference/onnx_engine` — OnnxEngine (imported but unused in current code)
- `security` — model encryption key
- `utils` — Dotdict
- `exports` — export_tensorrt, upload_model, export_onnx
- `ultralytics` (external) — YOLO training and export
- `yaml`, `concurrent.futures`, `glob`, `os`, `random`, `shutil`, `subprocess`, `datetime`, `pathlib`, `time` (stdlib)
## Consumers
manual_run
## Data Models
Uses AnnotationClass for class definitions.
## Configuration
- Training hyperparameters hardcoded: epochs=120, batch=11, imgsz=1280, save_period=1, workers=24
- Dataset split ratios: train_set=70, valid_set=20, test_set=10
- old_images_percentage=75 (declared but unused)
- DEFAULT_CLASS_NUM=80
## External Integrations
- Ultralytics YOLOv11 training pipeline
- Azaion API + CDN for model upload
- Filesystem: `/azaion/datasets/`, `/azaion/models/`, `/azaion/data-processed/`, `/azaion/data-corrupted/`
## Security
- Trained models are encrypted before upload
- Uses `Security.get_model_encryption_key()` for encryption
## Tests
None.
# Module: utils
## Purpose
Provides a dictionary subclass that supports dot-notation attribute access.
## Public Interface
| Name | Type | Signature |
|------|------|-----------|
| `Dotdict` | class (extends `dict`) | `Dotdict(dict)` |
`Dotdict` overrides `__getattr__`, `__setattr__`, `__delattr__` to delegate to `dict.get`, `dict.__setitem__`, `dict.__delitem__` respectively.
## Internal Logic
Single-class module. Allows `config.url` instead of `config["url"]` for YAML-loaded dicts.
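A minimal sketch of the class as described:

```python
class Dotdict(dict):
    # dict subclass allowing config.url instead of config["url"].
    # Missing attributes return None (dict.get semantics) rather than
    # raising AttributeError.
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
```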
## Dependencies
None (stdlib `dict` only).
## Consumers
exports, train, start_inference
## Data Models
None.
## Configuration
None.
## External Integrations
None.
## Security
None.
## Tests
None.
---
1. Update YOLO to the 26m version.
2. Don't use external augmentation; use YOLO's built-in augmentation and pass the additional parameters in the train command, each parameter on its own line with a proper comment.
3. Because of that, the processed folder is no longer needed; just use the data dir.
4. Don't copy the files themselves into the dataset folder; use hard symlinks instead.
5. Unify the constants directories in config: remove annotation-queue/config.yaml and use constants instead.
---
{
"current_step": "complete",
"completed_steps": ["discovery", "module-analysis", "component-assembly", "system-synthesis", "verification", "solution-extraction", "problem-extraction", "final-report"],
"focus_dir": null,
"modules_total": 21,
"modules_documented": [
"constants", "utils", "security", "hardware_service", "cdn_manager",
"dto/annotationClass", "dto/imageLabel", "inference/dto", "inference/onnx_engine",
"api_client", "augmentation", "inference/tensorrt_engine", "inference/inference",
"exports", "convert-annotations", "dataset-visualiser",
"train", "start_inference",
"manual_run",
"annotation-queue/annotation_queue_dto", "annotation-queue/annotation_queue_handler"
],
"modules_remaining": [],
"module_batch": 7,
"components_written": [
"01_core", "02_security", "03_api_cdn", "04_data_models",
"05_data_pipeline", "06_training", "07_inference", "08_annotation_queue"
],
"last_updated": "2026-03-26T00:00:00Z"
}
# System Flows
## Flow 1: Annotation Ingestion (Annotation Queue → Filesystem)
```mermaid
sequenceDiagram
participant RMQ as RabbitMQ Streams
participant AQH as AnnotationQueueHandler
participant FS as Filesystem
RMQ->>AQH: AMQP message (msgpack)
AQH->>AQH: Decode message, read AnnotationStatus
alt Created / Edited
AQH->>AQH: Parse AnnotationMessage (image + detections)
alt Validator / Admin role
AQH->>FS: Write label → /data/labels/{name}.txt
AQH->>FS: Write image → /data/images/{name}.jpg
else Operator role
AQH->>FS: Write label → /data-seed/labels/{name}.txt
AQH->>FS: Write image → /data-seed/images/{name}.jpg
end
else Validated (bulk)
AQH->>FS: Move images+labels from /data-seed/ → /data/
else Deleted (bulk)
AQH->>FS: Move images+labels → /data_deleted/
end
AQH->>FS: Persist offset to offset.yaml
```
### Data Flow Table
| Step | Input | Output | Component |
|------|-------|--------|-----------|
| Receive | AMQP message (msgpack) | AnnotationMessage / AnnotationBulkMessage | Annotation Queue |
| Route | AnnotationStatus header | Dispatch to save/validate/delete | Annotation Queue |
| Save | Image bytes + detection JSON | .jpg + .txt files on disk | Annotation Queue |
| Track | Message context offset | offset.yaml | Annotation Queue |
---
## Flow 2: Data Augmentation
```mermaid
sequenceDiagram
participant FS as Filesystem (/azaion/data/)
participant AUG as Augmentator
participant PFS as Filesystem (/azaion/data-processed/)
loop Every 5 minutes
AUG->>FS: Scan /data/images/ for unprocessed files
AUG->>AUG: Filter out already-processed images
loop Each unprocessed image (parallel)
AUG->>FS: Read image + labels
AUG->>AUG: Correct bounding boxes (clip + filter)
AUG->>AUG: Generate 7 augmented variants
AUG->>PFS: Write 8 images (original + 7 augmented)
AUG->>PFS: Write 8 label files
end
AUG->>AUG: Sleep 5 minutes
end
```
---
## Flow 3: Training Pipeline
```mermaid
sequenceDiagram
participant PFS as Filesystem (/data-processed/)
participant TRAIN as train.py
participant DS as Filesystem (/datasets/)
participant YOLO as Ultralytics YOLO
participant API as Azaion API
participant CDN as S3 CDN
TRAIN->>PFS: Read all processed images
TRAIN->>TRAIN: Shuffle, split 70/20/10
TRAIN->>DS: Copy to train/valid/test folders
Note over TRAIN: Corrupted labels → /data-corrupted/
TRAIN->>TRAIN: Generate data.yaml (80 class names)
TRAIN->>YOLO: Train yolo11m (120 epochs, batch=11, 1280px)
YOLO-->>TRAIN: Training results + best.pt
TRAIN->>DS: Copy results to /models/{date}/
TRAIN->>TRAIN: Copy best.pt → /models/azaion.pt
TRAIN->>TRAIN: Export .pt → .onnx (1280px, batch=4)
TRAIN->>TRAIN: Read azaion.onnx bytes
TRAIN->>TRAIN: Encrypt with model key (AES-256-CBC)
TRAIN->>TRAIN: Split: small (≤3KB or 20%) + big (rest)
TRAIN->>API: Upload azaion.onnx.small
TRAIN->>CDN: Upload azaion.onnx.big
```
---
## Flow 4: Model Download & Inference
```mermaid
sequenceDiagram
participant INF as start_inference.py
participant API as Azaion API
participant CDN as S3 CDN
participant SEC as Security
participant TRT as TensorRTEngine
participant VID as Video File
participant GUI as OpenCV Window
INF->>INF: Determine GPU-specific engine filename
INF->>SEC: Get model encryption key
INF->>API: Login (JWT)
INF->>API: Download {engine}.small (encrypted)
INF->>INF: Read {engine}.big from local disk
INF->>INF: Reassemble: small + big
INF->>SEC: Decrypt (AES-256-CBC)
INF->>TRT: Initialize engine from bytes
TRT->>TRT: Allocate CUDA memory (input + output)
loop Video frames
INF->>VID: Read frame (every 4th)
INF->>INF: Batch frames to batch_size
INF->>TRT: Preprocess (blob, normalize, resize)
TRT->>TRT: CUDA memcpy host→device
TRT->>TRT: Execute inference (async)
TRT->>TRT: CUDA memcpy device→host
INF->>INF: Postprocess (confidence filter + NMS)
INF->>GUI: Draw bounding boxes + display
end
```
### Data Flow Table
| Step | Input | Output | Component |
|------|-------|--------|-----------|
| Model resolve | GPU compute capability | Engine filename | Inference |
| Download small | API endpoint + JWT | Encrypted small bytes | API & CDN |
| Load big | Local filesystem | Encrypted big bytes | API & CDN |
| Reassemble | small + big bytes | Full encrypted model | API & CDN |
| Decrypt | Encrypted model + key | Raw TensorRT engine | Security |
| Init engine | Engine bytes | CUDA buffers allocated | Inference |
| Preprocess | Video frame | NCHW float32 blob | Inference |
| Inference | Input blob | Raw detection tensor | Inference |
| Postprocess | Raw tensor | List[Detection] | Inference |
| Visualize | Detections + frame | Annotated frame | Inference |
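The preprocess step in the table above can be approximated in plain numpy (a sketch equivalent to `cv2.dnn.blobFromImage` with 1/255 scaling and BGR→RGB swap; resizing is omitted and the frame is assumed to already match the model input size):

```python
import numpy as np

def preprocess(frame_bgr: np.ndarray) -> np.ndarray:
    # frame_bgr: HxWx3 uint8 frame, already resized to the model input size.
    rgb = frame_bgr[:, :, ::-1]                  # BGR -> RGB channel swap
    blob = rgb.astype(np.float32) / 255.0        # 1/255 scaling to [0, 1]
    blob = blob.transpose(2, 0, 1)[np.newaxis]   # HWC -> NCHW with batch dim
    return blob
```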
---
## Flow 5: Model Export (Multi-Format)
```mermaid
flowchart LR
PT[azaion.pt] -->|export_onnx| ONNX[azaion.onnx]
PT -->|export_tensorrt| TRT[azaion.engine]
PT -->|export_rknn| RKNN[azaion.rknn]
ONNX -->|encrypt + split| UPLOAD[API + CDN upload]
TRT -->|encrypt + split| UPLOAD
```
| Target Format | Resolution | Batch | Precision | Use Case |
|---------------|-----------|-------|-----------|----------|
| ONNX | 1280px | 4 | FP32 | Cross-platform inference |
| TensorRT | auto | 4 | FP16 | Production GPU inference |
| RKNN | auto | auto | auto | OrangePi5 edge device |
---
## Error Scenarios
| Flow | Error | Handling |
|------|-------|---------|
| Annotation ingestion | Malformed message | Caught by on_message exception handler, logged |
| Annotation ingestion | Queue disconnect | Process exits (no reconnect logic) |
| Augmentation | Corrupted image | Caught per-thread, logged, skipped |
| Augmentation | Transform failure | Caught per-variant, logged, fewer augmentations produced |
| Training | Corrupted label (coords > 1.0) | Moved to /data-corrupted/ |
| Training | Power outage | save_period=1 enables resume_training from last epoch |
| API download | 401/403 | Auto-relogin + retry |
| API download | 500 | Printed, no retry |
| Inference | CUDA error | RuntimeError raised |
| CDN upload/download | Any exception | Caught, printed, returns False |
# Blackbox Test Scenarios
## BT-AUG: Augmentation Pipeline
### BT-AUG-01: Single image produces 8 outputs
- **Input**: 1 image + 1 valid label from fixture dataset
- **Action**: Run `Augmentator.augment_inner()` on the image
- **Expected**: Returns list of exactly 8 ImageLabel objects
- **Traces**: AC: 8× augmentation ratio
### BT-AUG-02: Augmented filenames follow naming convention
- **Input**: Image with stem "test_image"
- **Action**: Run `augment_inner()`
- **Expected**: Output filenames: `test_image.jpg`, `test_image_1.jpg` through `test_image_7.jpg`; matching `.txt` labels
- **Traces**: AC: Augmentation output format
### BT-AUG-03: All output bounding boxes in valid range
- **Input**: 1 image + label with multiple bboxes
- **Action**: Run `augment_inner()`
- **Expected**: Every bbox coordinate in every output label is in [0.0, 1.0]
- **Traces**: AC: Bounding boxes clipped to [0, 1]
### BT-AUG-04: Bounding box correction clips edge bboxes
- **Input**: Label with bbox near edge: `0 0.99 0.5 0.2 0.1`
- **Action**: Run `correct_bboxes()`
- **Expected**: Width reduced so bbox fits within [margin, 1-margin]; no coordinate exceeds bounds
- **Traces**: AC: Bounding boxes clipped to [0, 1]
### BT-AUG-05: Tiny bounding boxes removed after correction
- **Input**: Label with tiny bbox that becomes < 0.01 after clipping
- **Action**: Run `correct_bboxes()`
- **Expected**: Bbox removed from output (area < correct_min_bbox_size)
- **Traces**: AC: Bounding boxes with area < 0.01% discarded
### BT-AUG-06: Empty label produces 8 outputs with empty labels
- **Input**: 1 image + empty label file
- **Action**: Run `augment_inner()`
- **Expected**: 8 ImageLabel objects returned; all have empty labels lists
- **Traces**: AC: Augmentation handles empty annotations
### BT-AUG-07: Full augmentation pipeline (filesystem integration)
- **Input**: 5 images + labels copied to data/ directory in tmp_path
- **Action**: Run `augment_annotations()` with patched paths
- **Expected**: 40 images (5 × 8) in processed images dir; 40 matching labels in processed labels dir
- **Traces**: AC: 8× augmentation, filesystem output
### BT-AUG-08: Augmentation skips already-processed images
- **Input**: 5 images in data/; 3 already present in processed/ dir
- **Action**: Run `augment_annotations()`
- **Expected**: Only 2 new images processed (16 new outputs); existing 3 untouched
- **Traces**: AC: Augmentation processes only unprocessed images
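The clip-and-filter behavior exercised by BT-AUG-04 and BT-AUG-05 can be sketched as follows (the coordinate convention and the exact margin handling are assumptions; only the clip-to-[0, 1] and minimum-area rules come from the scenarios above):

```python
def correct_bboxes(bboxes, min_area=0.0001):
    # bboxes: list of (cls, x_center, y_center, w, h) in normalized coords.
    # Clip each box to the [0, 1] image frame; drop boxes whose clipped
    # area falls below min_area (0.01% of the image).
    out = []
    for cls, xc, yc, w, h in bboxes:
        x1 = max(0.0, xc - w / 2)
        y1 = max(0.0, yc - h / 2)
        x2 = min(1.0, xc + w / 2)
        y2 = min(1.0, yc + h / 2)
        nw, nh = x2 - x1, y2 - y1
        if nw * nh < min_area:
            continue  # tiny (or fully clipped-away) box: discard
        out.append((cls, (x1 + x2) / 2, (y1 + y2) / 2, nw, nh))
    return out
```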
---
## BT-DSF: Dataset Formation
### BT-DSF-01: 70/20/10 split ratio
- **Input**: 100 images + labels in processed/ dir
- **Action**: Run `form_dataset()` with patched paths
- **Expected**: train: 70 images+labels, valid: 20, test: 10
- **Traces**: AC: Dataset split 70/20/10
### BT-DSF-02: Split directories structure
- **Input**: 100 images + labels
- **Action**: Run `form_dataset()`
- **Expected**: Created: `train/images/`, `train/labels/`, `valid/images/`, `valid/labels/`, `test/images/`, `test/labels/`
- **Traces**: AC: YOLO dataset directory structure
### BT-DSF-03: Total files preserved across splits
- **Input**: 100 valid images + labels
- **Action**: Run `form_dataset()`
- **Expected**: `count(train) + count(valid) + count(test) == 100` (no data loss)
- **Traces**: AC: Dataset integrity
### BT-DSF-04: Corrupted labels moved to corrupted directory
- **Input**: 95 valid + 5 corrupted labels (coords > 1.0)
- **Action**: Run `form_dataset()` with patched paths
- **Expected**: 5 images+labels in `data-corrupted/`; 95 across train/valid/test splits
- **Traces**: AC: Corrupted labels filtered
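The 70/20/10 split checked by BT-DSF-01 and BT-DSF-03 can be sketched as (a minimal shuffle-and-slice sketch; the real `form_dataset` also copies files and filters corrupted labels):

```python
import random

def split_dataset(images, seed=None):
    # Shuffle, then slice 70/20/10 into train/valid/test. Slicing a single
    # shuffled list guarantees no image is lost or duplicated across splits.
    imgs = list(images)
    random.Random(seed).shuffle(imgs)
    n = len(imgs)
    n_train = n * 70 // 100
    n_valid = n * 20 // 100
    return (imgs[:n_train],
            imgs[n_train:n_train + n_valid],
            imgs[n_train + n_valid:])
```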
---
## BT-LBL: Label Validation
### BT-LBL-01: Valid label accepted
- **Input**: Label file: `0 0.5 0.5 0.1 0.1`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `True`
- **Traces**: AC: Valid YOLO label format
### BT-LBL-02: Label with x > 1.0 rejected
- **Input**: Label file: `0 1.5 0.5 0.1 0.1`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False`
- **Traces**: AC: Corrupted labels detected
### BT-LBL-03: Label with height > 1.0 rejected
- **Input**: Label file: `0 0.5 0.5 0.1 1.2`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False`
- **Traces**: AC: Corrupted labels detected
### BT-LBL-04: Missing label file rejected
- **Input**: Non-existent file path
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False`
- **Traces**: AC: Missing labels handled
### BT-LBL-05: Multi-line label with one corrupted line
- **Input**: Label file: `0 0.5 0.5 0.1 0.1\n3 0.5 0.5 0.1 1.5`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False` (any corrupted line fails the whole file)
- **Traces**: AC: Corrupted labels detected
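A sketch of a `check_label` implementation satisfying BT-LBL-01 through BT-LBL-05 (the lower-bound check is an assumption; the documented rule only states that all coords must be ≤ 1.0):

```python
def check_label(label_path: str) -> bool:
    # Valid YOLO label: each non-empty line is "cls x y w h" with all four
    # coordinates in [0, 1]. Missing files and any out-of-range line fail
    # the whole file.
    try:
        with open(label_path) as f:
            for line in f:
                parts = line.split()
                if not parts:
                    continue
                coords = [float(v) for v in parts[1:5]]
                if len(coords) != 4 or any(c < 0.0 or c > 1.0 for c in coords):
                    return False
    except OSError:
        return False
    return True
```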
---
## BT-ENC: Encryption
### BT-ENC-01: Encrypt-decrypt roundtrip (arbitrary data)
- **Input**: 1024 random bytes, key "test-key"
- **Action**: `decrypt_to(encrypt_to(data, key), key)`
- **Expected**: Output equals input bytes exactly
- **Traces**: AC: AES-256-CBC encryption
### BT-ENC-02: Encrypt-decrypt roundtrip (ONNX model)
- **Input**: `azaion.onnx` bytes, model encryption key
- **Action**: `decrypt_to(encrypt_to(model_bytes, key), key)`
- **Expected**: Output equals input bytes exactly
- **Traces**: AC: Model encryption
### BT-ENC-03: Empty input roundtrip
- **Input**: `b""`, key "test-key"
- **Action**: `decrypt_to(encrypt_to(b"", key), key)`
- **Expected**: Output equals `b""`
- **Traces**: AC: Edge case handling
### BT-ENC-04: Single byte roundtrip
- **Input**: `b"\x00"`, key "test-key"
- **Action**: `decrypt_to(encrypt_to(b"\x00", key), key)`
- **Expected**: Output equals `b"\x00"`
- **Traces**: AC: Edge case handling
### BT-ENC-05: Different keys produce different ciphertext
- **Input**: Same 1024 bytes, keys "key-a" and "key-b"
- **Action**: `encrypt_to(data, "key-a")` vs `encrypt_to(data, "key-b")`
- **Expected**: Ciphertexts differ
- **Traces**: AC: Key-dependent encryption
### BT-ENC-06: Wrong key fails decryption
- **Input**: Encrypted with "key-a", decrypt with "key-b"
- **Action**: `decrypt_to(encrypted, "key-b")`
- **Expected**: Output does NOT equal original input
- **Traces**: AC: Key-dependent encryption
---
## BT-SPL: Model Split Storage
### BT-SPL-01: Split respects size constraint
- **Input**: 10000 encrypted bytes
- **Action**: Split into small + big per `SMALL_SIZE_KB = 3` logic
- **Expected**: small ≤ max(3072 bytes, 20% of total); big = remainder
- **Traces**: AC: Model split ≤3KB or 20%
### BT-SPL-02: Reassembly produces original
- **Input**: 10000 encrypted bytes → split → reassemble
- **Action**: `small + big`
- **Expected**: Equals original encrypted bytes
- **Traces**: AC: Split model integrity
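The split rule from BT-SPL-01 can be sketched as follows (the precedence between the 3 KB floor and the 20% share is assumed from the scenario wording):

```python
SMALL_SIZE_KB = 3

def split_model(data: bytes):
    # Small part: at most max(3 KB, 20% of the total), capped at the data
    # length. The small part travels via the API; the big part via the CDN.
    small_len = min(len(data), max(SMALL_SIZE_KB * 1024, len(data) // 5))
    return data[:small_len], data[small_len:]
```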
---
## BT-CLS: Annotation Class Loading
### BT-CLS-01: Load 17 base classes
- **Input**: `classes.json`
- **Action**: `AnnotationClass.read_json()`
- **Expected**: Dict with 17 unique class entries (base IDs)
- **Traces**: AC: 17 base classes
### BT-CLS-02: Weather mode expansion
- **Input**: `classes.json`
- **Action**: `AnnotationClass.read_json()`
- **Expected**: Same class at offset 0 (Norm), 20 (Wint), 40 (Night); e.g., ID 0, 20, 40 all represent ArmorVehicle
- **Traces**: AC: 3 weather modes
### BT-CLS-03: YAML generation produces 80 class names
- **Input**: `classes.json` + dataset path
- **Action**: `create_yaml()` with patched paths
- **Expected**: data.yaml contains `nc: 80`, 17 named classes + 63 `Class-N` placeholders
- **Traces**: AC: 80 total class slots
---
## BT-HSH: Hardware Hash
### BT-HSH-01: Deterministic output
- **Input**: "test-hardware-info"
- **Action**: `Security.get_hw_hash()` called twice
- **Expected**: Both calls return identical string
- **Traces**: AC: Hardware fingerprinting determinism
### BT-HSH-02: Different inputs produce different hashes
- **Input**: "hw-a" and "hw-b"
- **Action**: `Security.get_hw_hash()` on each
- **Expected**: Results differ
- **Traces**: AC: Hardware-bound uniqueness
### BT-HSH-03: Output is valid base64
- **Input**: "test-hardware-info"
- **Action**: `Security.get_hw_hash()`
- **Expected**: Matches regex `^[A-Za-z0-9+/]+=*$`
- **Traces**: AC: Hash format
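The hash scheme exercised by BT-HSH-01 through BT-HSH-03 (SHA-384 + base64, as documented for `calc_hash`) can be sketched as:

```python
import base64
import hashlib

def calc_hash(key: str) -> str:
    # SHA-384 of the input string, base64-encoded — the derivation used
    # for all hash-based keys in the security module.
    return base64.b64encode(hashlib.sha384(key.encode()).digest()).decode()
```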
---
## BT-INF: ONNX Inference
### BT-INF-01: Model loads successfully
- **Input**: `azaion.onnx` bytes
- **Action**: `OnnxEngine(model_bytes)`
- **Expected**: No exception; engine object created with valid input_shape and batch_size
- **Traces**: AC: ONNX inference capability
### BT-INF-02: Inference returns output
- **Input**: ONNX engine + 1 preprocessed image
- **Action**: `engine.run(input_blob)`
- **Expected**: Returns list of numpy arrays; first array has shape [batch, N, 6+]
- **Traces**: AC: ONNX inference produces results
### BT-INF-03: Postprocessing returns valid detections
- **Input**: ONNX engine output from real image
- **Action**: `Inference.postprocess()`
- **Expected**: Returns list of Annotation objects; each Detection has x,y,w,h ∈ [0,1], cls ∈ [0,79], confidence ∈ [0,1]
- **Traces**: AC: Detection format validity
---
## BT-NMS: Overlap Removal
### BT-NMS-01: Overlapping detections — keep higher confidence
- **Input**: 2 Detection objects at same position, confidence 0.9 and 0.5, IoU > 0.3
- **Action**: `remove_overlapping_detections()`
- **Expected**: 1 detection returned (confidence 0.9)
- **Traces**: AC: NMS IoU threshold 0.3
### BT-NMS-02: Non-overlapping detections — keep both
- **Input**: 2 Detection objects at distant positions, IoU < 0.3
- **Action**: `remove_overlapping_detections()`
- **Expected**: 2 detections returned
- **Traces**: AC: NMS preserves non-overlapping
### BT-NMS-03: Chain overlap resolution
- **Input**: 3 Detection objects: A overlaps B (IoU > 0.3), B overlaps C (IoU > 0.3), A doesn't overlap C
- **Action**: `remove_overlapping_detections()`
- **Expected**: ≤ 2 detections; highest confidence per overlapping pair kept
- **Traces**: AC: NMS handles chains
---
## BT-AQM: Annotation Queue Message Parsing
### BT-AQM-01: Parse Created annotation message
- **Input**: Msgpack bytes matching AnnotationMessage schema (status=Created, role=Validator)
- **Action**: Decode and construct AnnotationMessage
- **Expected**: All fields populated: name, detections, image bytes, status == "Created", role == "Validator"
- **Traces**: AC: Annotation message parsing
### BT-AQM-02: Parse Validated bulk message
- **Input**: Msgpack bytes with status=Validated, list of names
- **Action**: Decode and construct AnnotationBulkMessage
- **Expected**: Status == "Validated", names list matches input
- **Traces**: AC: Bulk validation parsing
### BT-AQM-03: Parse Deleted bulk message
- **Input**: Msgpack bytes with status=Deleted, list of names
- **Action**: Decode and construct AnnotationBulkMessage
- **Expected**: Status == "Deleted", names list matches input
- **Traces**: AC: Bulk deletion parsing
### BT-AQM-04: Malformed message raises exception
- **Input**: Invalid msgpack bytes
- **Action**: Attempt to decode
- **Expected**: Exception raised
- **Traces**: AC: Error handling for malformed messages
# Test Environment
## Runtime Requirements
| Requirement | Specification |
|-------------|--------------|
| Python | 3.10+ |
| OS | Linux or macOS (POSIX filesystem paths) |
| GPU | Optional — ONNX inference falls back to CPUExecutionProvider |
| Disk | Temp directory for fixture data (~500MB for augmentation output) |
| Network | Not required (all tests are offline) |
## Execution Modes
Tests MUST be runnable in two ways:
### 1. Local (no Docker) — primary mode
Run directly on the host machine. Required for macOS development where Docker has GPU/performance limitations.
```bash
scripts/run-tests-local.sh
```
### 2. Docker — CI/portable mode
Run inside a container for reproducible CI environments (Linux-based CI runners).
```bash
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit
```
Both modes run the same pytest suite; the only difference is the runtime environment.
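The compose file referenced above is not shown in this document; a minimal sketch of what it might contain is below. The service name, build context, and mount path are assumptions, not taken from the repository:

```yaml
# docker-compose.test.yml (sketch, not the actual file)
services:
  tests:
    build: .
    command: pytest -q tests/
    volumes:
      # Mount fixture data read-only so container runs cannot mutate it.
      - ./_docs/00_problem/input_data:/app/_docs/00_problem/input_data:ro
```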
## Dependencies
All test dependencies are a subset of the production `requirements.txt` plus pytest:
| Package | Purpose |
|---------|---------|
| pytest | Test runner |
| albumentations | Augmentation tests |
| opencv-python-headless | Image I/O (headless — no GUI) |
| numpy | Array operations |
| onnxruntime | ONNX inference (CPU fallback) |
| cryptography | Encryption tests |
| msgpack | Annotation queue message tests |
| PyYAML | Config/YAML generation tests |
## Fixture Data
| Fixture | Location | Size |
|---------|----------|------|
| 100 annotated images | `_docs/00_problem/input_data/dataset/images/` | ~50MB |
| 100 YOLO labels | `_docs/00_problem/input_data/dataset/labels/` | ~10KB |
| ONNX model | `_docs/00_problem/input_data/azaion.onnx` | 81MB |
| Class definitions | `classes.json` (project root) | 2KB |
## Test Isolation
- Each test creates a temporary directory (via `tmp_path` pytest fixture) for filesystem operations
- No tests modify the actual `/azaion/` directory structure
- No tests require running external services (RabbitMQ, Azaion API, S3 CDN)
- Constants paths are patched/overridden to point to temp directories during tests
## Excluded (Require External Services)
| Component | Service Required | Reason for Exclusion |
|-----------|-----------------|---------------------|
| API upload/download | Azaion REST API | No mock server; real API has auth |
| CDN upload/download | S3-compatible CDN | No mock S3; real CDN has credentials |
| Queue consumption | RabbitMQ Streams | No mock broker; rstream requires live connection |
| TensorRT inference | NVIDIA GPU + TensorRT | Hardware-specific; cannot run in CI without GPU |
---
# Performance Test Scenarios
## PT-AUG-01: Augmentation throughput
- **Input**: 10 images from fixture dataset
- **Action**: Run `augment_annotations()`, measure wall time
- **Expected**: Completes within 60 seconds (10 images × 8 outputs = 80 files)
- **Traces**: Restriction: Augmentation runs continuously
- **Note**: Threshold is generous; actual performance depends on CPU
## PT-AUG-02: Parallel augmentation speedup
- **Input**: 10 images from fixture dataset
- **Action**: Run with ThreadPoolExecutor vs sequential, compare times
- **Expected**: Parallel is ≥ 1.5× faster than sequential
- **Traces**: AC: Parallelized per-image processing
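PT-AUG-02's sequential-vs-parallel comparison can be harnessed as below, using a sleep-based stand-in for per-image work (augmentation is partly I/O-bound, which is where `ThreadPoolExecutor` pays off despite the GIL). `fake_augment` is a toy, not the project's worker:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_augment(name: str) -> str:
    # Stand-in for per-image augmentation; the sleep models I/O-bound
    # work (disk reads/writes of image variants).
    time.sleep(0.05)
    return f"{name}-augmented"

images = [f"img-{i:03d}" for i in range(10)]

t0 = time.perf_counter()
sequential = [fake_augment(n) for n in images]
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(fake_augment, images))
t_par = time.perf_counter() - t0

# For this sleep-bound toy the ratio comfortably clears the 1.5x bar.
speedup = t_seq / t_par
```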
## PT-DSF-01: Dataset formation throughput
- **Input**: 100 images + labels
- **Action**: Run `form_dataset()`, measure wall time
- **Expected**: Completes within 30 seconds
- **Traces**: Restriction: Dataset formation before training
## PT-ENC-01: Encryption throughput
- **Input**: 10MB random bytes
- **Action**: Encrypt + decrypt roundtrip, measure wall time
- **Expected**: Completes within 5 seconds
- **Traces**: AC: Model encryption feasible for large models
## PT-INF-01: ONNX inference latency (single image)
- **Input**: 1 preprocessed image + ONNX model
- **Action**: Run single inference, measure wall time
- **Expected**: Completes within 10 seconds on CPU (no GPU requirement for test)
- **Traces**: AC: Inference capability
- **Note**: Production uses GPU; CPU is slower but validates correctness
---
# Resilience Test Scenarios
## RT-AUG-01: Augmentation handles corrupted image gracefully
- **Input**: 1 valid image + 1 corrupted image file (truncated JPEG) in data/ dir
- **Action**: Run `augment_annotations()`
- **Expected**: Valid image produces 8 outputs; corrupted image skipped without crashing pipeline; total output: 8 files
- **Traces**: Restriction: Augmentation exception handling per-image
## RT-AUG-02: Augmentation handles missing label file
- **Input**: 1 image with no matching label file
- **Action**: Run `augment_annotation()` on the image
- **Expected**: Exception caught per-thread; does not crash pipeline
- **Traces**: Restriction: Augmentation exception handling
## RT-AUG-03: Augmentation transform failure produces fewer variants
- **Input**: 1 image + label that causes some transforms to fail (extremely narrow bbox)
- **Action**: Run `augment_inner()`
- **Expected**: Returns between 1 and 8 ImageLabel objects (the original is always present; failed variants are skipped); no crash
- **Traces**: Restriction: Transform failure handling
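The three RT-AUG scenarios all rest on the same pattern: each image's work is wrapped in a try/except inside its worker, so one bad input contributes zero outputs instead of killing the pool. A sketch with hypothetical function names (the real pipeline's workers differ):

```python
from concurrent.futures import ThreadPoolExecutor

def augment_one(path: str) -> int:
    # Hypothetical per-image worker: raises on unreadable input.
    if "corrupt" in path:
        raise ValueError(f"cannot decode {path}")
    return 8  # 1 original + 7 augmented variants

def safe_augment(path: str) -> int:
    # Per-image try/except: one bad file never crashes the pipeline,
    # it just yields no outputs (RT-AUG-01/02).
    try:
        return augment_one(path)
    except Exception:
        return 0

paths = ["good.jpg", "corrupt.jpg"]
with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = list(pool.map(safe_augment, paths))

total = sum(outputs)  # 8, matching RT-AUG-01's expected total
```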
## RT-DSF-01: Dataset formation with empty processed directory
- **Input**: Empty processed images dir
- **Action**: Run `form_dataset()`
- **Expected**: Creates empty train/valid/test directories; no crash
- **Traces**: Restriction: Edge case handling
## RT-ENC-01: Decrypt with corrupted ciphertext
- **Input**: Randomly modified ciphertext bytes
- **Action**: `Security.decrypt_to(corrupted_bytes, key)`
- **Expected**: Either raises exception or returns garbage bytes (not original)
- **Traces**: AC: Encryption integrity
## RT-AQM-01: Malformed msgpack message
- **Input**: Random bytes that aren't valid msgpack
- **Action**: Pass to message handler
- **Expected**: Exception caught; handler doesn't crash
- **Traces**: AC: Error handling for malformed messages
---
# Resource Limit Test Scenarios
## RL-AUG-01: Augmentation output count bounded
- **Input**: 1 image
- **Action**: Run `augment_inner()`
- **Expected**: Returns exactly 8 outputs (never more, even with retries)
- **Traces**: AC: 8× augmentation ratio (1 original + 7 augmented)
## RL-DSF-01: Dataset split ratios sum to 100%
- **Input**: Any number of images
- **Action**: Check `train_set + valid_set + test_set`
- **Expected**: Equals 100
- **Traces**: AC: 70/20/10 split
## RL-DSF-02: No data duplication across splits
- **Input**: 100 images
- **Action**: Run `form_dataset()`, collect all filenames across train/valid/test
- **Expected**: No filename appears in more than one split
- **Traces**: AC: Dataset integrity
## RL-ENC-01: Encrypted output size bounded
- **Input**: N bytes plaintext
- **Action**: Encrypt
- **Expected**: Ciphertext size ≤ N + 32 bytes (16 IV + up to 16 padding)
- **Traces**: Restriction: AES-256-CBC overhead
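The RL-ENC-01 bound (and ST-ENC-01's random-IV property) follows directly from the AES-256-CBC layout the scenarios assume: a 16-byte IV prepended to the ciphertext plus 1-16 bytes of PKCS7 padding. The repository's `Security` class is not shown, so this is a standalone sketch of that layout using the `cryptography` package from the dependency table:

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_cbc(plaintext: bytes, key: bytes) -> bytes:
    # IV is prepended to the ciphertext, so total overhead is
    # 16 (IV) + 1..16 (PKCS7 padding) bytes, i.e. at most N + 32.
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()
```

Because the IV is drawn fresh per call, encrypting the same plaintext twice with the same key yields different ciphertexts of identical length.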
## RL-CLS-01: Total class count is exactly 80
- **Input**: `classes.json`
- **Action**: Generate class list for YAML
- **Expected**: Exactly 80 entries (17 named × 3 weather + 29 placeholders = 80)
- **Traces**: AC: 80 total class slots
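The RL-CLS-01 arithmetic (17 x 3 = 51 named, plus 29 placeholders) can be checked with a tiny list-construction sketch. The base names and the name/weather composition scheme are illustrative only; the real names come from `classes.json`:

```python
# Sketch of the 80-slot class list: 17 base classes x 3 weather modes,
# padded with placeholders to exactly 80 entries.
base = [f"class{i}" for i in range(17)]  # stand-ins for the real class names
weather = ["Norm", "Wint", "Night"]
named = [f"{b}-{w}" for w in weather for b in base]  # 51 entries
classes = named + [f"placeholder{i}" for i in range(80 - len(named))]
```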
---
# Security Test Scenarios
## ST-ENC-01: Encryption produces different ciphertext each time (random IV)
- **Input**: Same 1024 bytes, same key, encrypt twice
- **Action**: Compare two ciphertexts
- **Expected**: Ciphertexts differ (random IV ensures non-deterministic output)
- **Traces**: AC: AES-256-CBC with random IV
## ST-ENC-02: Wrong key cannot recover plaintext
- **Input**: Encrypt with "key-a", attempt decrypt with "key-b"
- **Action**: `Security.decrypt_to(encrypted, "key-b")`
- **Expected**: Output != original plaintext
- **Traces**: AC: Key-dependent encryption
## ST-ENC-03: Model encryption key is deterministic
- **Input**: Call `Security.get_model_encryption_key()` twice
- **Action**: Compare results
- **Expected**: Identical strings
- **Traces**: AC: Static model encryption key
## ST-HSH-01: Hardware hash is deterministic for same input
- **Input**: Same hardware info string
- **Action**: `Security.get_hw_hash()` called twice
- **Expected**: Identical output
- **Traces**: AC: Hardware fingerprinting determinism
## ST-HSH-02: Different hardware produces different hash
- **Input**: Two different hardware info strings
- **Action**: `Security.get_hw_hash()` on each
- **Expected**: Different outputs
- **Traces**: AC: Hardware-bound uniqueness
## ST-HSH-03: API encryption key depends on credentials + hardware
- **Input**: Same credentials with different hardware hashes
- **Action**: `Security.get_api_encryption_key()` for each
- **Expected**: Different keys
- **Traces**: AC: Hardware-bound API encryption
## ST-HSH-04: API encryption key depends on credentials
- **Input**: Different credentials with same hardware hash
- **Action**: `Security.get_api_encryption_key()` for each
- **Expected**: Different keys
- **Traces**: AC: Credential-dependent API encryption
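ST-HSH-03/04 only require that the derived key change when either input changes; any domain-separated hash over both inputs gives that. The actual derivation inside `Security.get_api_encryption_key()` is not shown, so the function below is purely illustrative:

```python
import hashlib

def derive_api_key(credentials: str, hw_hash: str) -> str:
    # Illustrative derivation only: the real Security class may differ.
    # Length-prefixing the parts avoids ambiguity ("ab"+"c" vs "a"+"bc").
    material = f"{len(credentials)}:{credentials}|{len(hw_hash)}:{hw_hash}"
    return hashlib.sha256(material.encode()).hexdigest()
```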
---
# Test Data Management
## Fixture Sources
| ID | Data Item | Source | Format | Preparation |
|----|-----------|--------|--------|-------------|
| FD-01 | Annotated images (100) | `_docs/00_problem/input_data/dataset/images/` | JPEG | Copy subset to tmp_path at test start |
| FD-02 | YOLO labels (100) | `_docs/00_problem/input_data/dataset/labels/` | TXT | Copy subset to tmp_path at test start |
| FD-03 | ONNX model | `_docs/00_problem/input_data/azaion.onnx` | ONNX | Read bytes at test start |
| FD-04 | Class definitions | `classes.json` (project root) | JSON | Copy to tmp_path at test start |
| FD-05 | Corrupted labels | Generated at test time | TXT | Create labels with coords > 1.0 |
| FD-06 | Edge-case bboxes | Generated at test time | In-memory | Construct bboxes near image boundaries |
| FD-07 | Detection objects | Generated at test time | In-memory | Construct Detection instances for NMS tests |
| FD-08 | Msgpack messages | Generated at test time | bytes | Construct AnnotationMessage-compatible msgpack |
| FD-09 | Random binary data | Generated at test time | bytes | `os.urandom(N)` for encryption tests |
| FD-10 | Empty label file | Generated at test time | TXT | Empty file for augmentation edge case |
## Data Lifecycle
1. **Setup**: pytest `conftest.py` copies fixture files to `tmp_path`
2. **Execution**: Tests operate on copied data in isolation
3. **Teardown**: `tmp_path` is automatically cleaned by pytest
## Expected Results Location
All expected results are defined in `_docs/00_problem/input_data/expected_results/results_report.md` (37 test scenarios mapped).
---
# Traceability Matrix
## Acceptance Criteria Coverage
| AC / Restriction | Test IDs | Coverage |
|------------------|----------|----------|
| 8× augmentation ratio | BT-AUG-01, BT-AUG-06, BT-AUG-07, RL-AUG-01 | Full |
| Augmentation naming convention | BT-AUG-02 | Full |
| Bounding boxes clipped to [0,1] | BT-AUG-03, BT-AUG-04 | Full |
| Tiny bboxes (< 0.01) discarded | BT-AUG-05 | Full |
| Augmentation skips already-processed | BT-AUG-08 | Full |
| Augmentation parallelized | PT-AUG-02 | Full |
| Augmentation handles corrupted images | RT-AUG-01 | Full |
| Augmentation handles missing labels | RT-AUG-02 | Full |
| Transform failure graceful | RT-AUG-03 | Full |
| Dataset split 70/20/10 | BT-DSF-01, RL-DSF-01 | Full |
| Dataset directory structure | BT-DSF-02 | Full |
| Dataset integrity (no data loss) | BT-DSF-03, RL-DSF-02 | Full |
| Corrupted label filtering | BT-DSF-04, BT-LBL-01 to BT-LBL-05 | Full |
| AES-256-CBC encryption | BT-ENC-01 to BT-ENC-06, ST-ENC-01, ST-ENC-02 | Full |
| Model encryption roundtrip | BT-ENC-02 | Full |
| Model split ≤3KB or 20% | BT-SPL-01, BT-SPL-02 | Full |
| 17 base classes | BT-CLS-01 | Full |
| 3 weather modes (Norm/Wint/Night) | BT-CLS-02 | Full |
| 80 total class slots | BT-CLS-03, RL-CLS-01 | Full |
| YAML generation (nc: 80) | BT-CLS-03 | Full |
| Hardware hash determinism | BT-HSH-01 to BT-HSH-03, ST-HSH-01, ST-HSH-02 | Full |
| Hardware-bound API encryption | ST-HSH-03, ST-HSH-04 | Full |
| ONNX inference loads model | BT-INF-01 | Full |
| ONNX inference returns detections | BT-INF-02, BT-INF-03 | Full |
| NMS overlap removal (IoU 0.3) | BT-NMS-01, BT-NMS-02, BT-NMS-03 | Full |
| Annotation message parsing | BT-AQM-01 to BT-AQM-04, RT-AQM-01 | Full |
| Encryption size overhead bounded | RL-ENC-01 | Full |
| Static model encryption key | ST-ENC-03 | Full |
| Random IV per encryption | ST-ENC-01 | Full |
## Uncovered (Require External Services)
| AC / Restriction | Reason |
|------------------|--------|
| TensorRT inference (54s for 200s video) | Requires NVIDIA GPU + TensorRT runtime |
| API upload/download with JWT auth | Requires live Azaion API |
| CDN upload/download (S3) | Requires live S3-compatible CDN |
| Queue offset persistence | Requires live RabbitMQ Streams |
| Auto-relogin on 401/403 | Requires live Azaion API |
| Frame sampling every 4th frame | Requires video file (fixture not provided) |
| Confidence threshold 0.3 filtering | Partially covered by BT-INF-03 (validates range, not exact threshold) |
## Summary
| Metric | Value |
|--------|-------|
| Total AC + Restrictions | 36 |
| Covered by tests | 29 |
| Uncovered (external deps) | 7 |
| **Coverage** | **80.6%** |
## Test Count Summary
| Category | Count |
|----------|-------|
| Blackbox tests | 32 |
| Performance tests | 5 |
| Resilience tests | 6 |
| Security tests | 7 |
| Resource limit tests | 5 |
| **Total** | **55** |