Mirror of https://github.com/azaion/ai-training.git (synced 2026-04-22 12:56:35 +00:00)
Refactor constants management to use Pydantic BaseModel for configuration
- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
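The centralization described in the commit message can be sketched as follows; the field names and paths are hypothetical, since the actual `constants.py` contents are not shown here:

```python
from pathlib import Path

from pydantic import BaseModel


class Config(BaseModel):
    """Centralized path configuration replacing loose module-level constants.

    Field names are illustrative; the real Config class in constants.py
    may differ.
    """
    data_dir: Path = Path("/azaion/data")
    processed_dir: Path = Path("/azaion/data-processed")
    datasets_dir: Path = Path("/azaion/datasets")


# Modules import one shared, validated instance:
config = Config()
```

Callers then reference `config.data_dir` instead of a bare module-level variable, keeping every path discoverable and type-checked in one place.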
# Codebase Discovery

## Directory Tree

```
ai-training/
├── annotation-queue/          # Separate sub-service: annotation message queue consumer
│   ├── annotation_queue_dto.py
│   ├── annotation_queue_handler.py
│   ├── classes.json
│   ├── config.yaml
│   ├── offset.yaml
│   ├── requirements.txt
│   └── run.sh
├── dto/                       # Data transfer objects for the training pipeline
│   ├── annotationClass.py
│   ├── annotation_bulk_message.py (empty)
│   ├── annotation_message.py (empty)
│   └── imageLabel.py
├── inference/                 # Inference engine subsystem (ONNX + TensorRT)
│   ├── __init__.py (empty)
│   ├── dto.py
│   ├── inference.py
│   ├── onnx_engine.py
│   └── tensorrt_engine.py
├── orangepi5/                 # Setup scripts for OrangePi5 edge device
│   ├── 01 install.sh
│   ├── 02 install-inference.sh
│   └── 03 run_inference.sh
├── scripts/
│   └── init-sftp.sh
├── tests/
│   ├── data.yaml
│   ├── imagelabel_visualize_test.py
│   ├── libomp140.x86_64.dll (binary workaround for Windows)
│   └── security_test.py
├── api_client.py              # API client for Azaion backend + CDN resource management
├── augmentation.py            # Image augmentation pipeline (albumentations)
├── cdn_manager.py             # S3-compatible CDN upload/download via boto3
├── cdn.yaml                   # CDN credentials config
├── checkpoint.txt             # Last training checkpoint timestamp
├── classes.json               # Annotation class definitions (17 classes + weather modes)
├── config.yaml                # Main config (API url, queue, directories)
├── constants.py               # Shared path constants and config keys
├── convert-annotations.py     # Annotation format converter (Pascal VOC / bbox → YOLO)
├── dataset-visualiser.py      # Interactive dataset visualization tool
├── exports.py                 # Model export (ONNX, TensorRT, RKNN) and upload
├── hardware_service.py        # Hardware fingerprinting (CPU/GPU/RAM/drive serial)
├── install.sh                 # Dependency installation script
├── manual_run.py              # Manual training/export entry point
├── requirements.txt           # Python dependencies
├── security.py                # AES-256-CBC encryption/decryption + key derivation
├── start_inference.py         # Inference entry point (downloads model, runs TensorRT)
├── train.py                   # Main training pipeline (dataset formation → YOLO training → export)
└── utils.py                   # Utility classes (Dotdict)
```
## Tech Stack Summary

| Category | Technology | Details |
|----------|-----------|---------|
| Language | Python 3.10+ | Match statements used (3.10 feature) |
| ML Framework | Ultralytics (YOLO) | YOLOv11 object detection model |
| Deep Learning | PyTorch 2.3.0 (CUDA 12.1) | GPU-accelerated training |
| Inference (Primary) | TensorRT | GPU inference with FP16/INT8 support |
| Inference (Fallback) | ONNX Runtime GPU | Cross-platform inference |
| Augmentation | Albumentations | Image augmentation pipeline |
| Computer Vision | OpenCV (cv2) | Image I/O, preprocessing, visualization |
| CDN/Storage | boto3 (S3-compatible) | Model artifact storage |
| Message Queue | RabbitMQ Streams (rstream) | Annotation message consumption |
| Serialization | msgpack | Queue message deserialization |
| Encryption | cryptography (AES-256-CBC) | Model encryption, API resource encryption |
| GPU Management | pycuda, pynvml | CUDA memory management, device queries |
| HTTP | requests | API communication |
| Config | PyYAML | Configuration files |
| Visualization | matplotlib, netron | Annotation display, model graph viewer |
| Edge Deployment | RKNN (RK3588) | OrangePi5 inference target |
## Dependency Graph

### Internal Module Dependencies (textual)

**Leaves (no internal dependencies):**

- `constants` — path constants, config keys
- `utils` — Dotdict helper
- `security` — encryption/decryption, key derivation
- `hardware_service` — hardware fingerprinting
- `cdn_manager` — S3-compatible CDN client
- `dto/annotationClass` — annotation class model + JSON reader
- `dto/imageLabel` — image+labels container with visualization
- `inference/dto` — Detection, Annotation, AnnotationClass (inference-specific)
- `inference/onnx_engine` — InferenceEngine ABC + OnnxEngine implementation
- `convert-annotations` — standalone annotation format converter
- `annotation-queue/annotation_queue_dto` — queue message DTOs

**Level 1 (depends on leaves):**

- `api_client` → constants, cdn_manager, hardware_service, security
- `augmentation` → constants, dto/imageLabel
- `inference/tensorrt_engine` → inference/onnx_engine (InferenceEngine ABC)
- `inference/inference` → inference/dto, inference/onnx_engine
- `annotation-queue/annotation_queue_handler` → annotation_queue_dto

**Level 2 (depends on level 1):**

- `exports` → constants, api_client, cdn_manager, security, utils

**Level 3 (depends on level 2):**

- `train` → constants, api_client, cdn_manager, dto/annotationClass, inference/onnx_engine, security, utils, exports
- `start_inference` → constants, api_client, cdn_manager, inference/inference, inference/tensorrt_engine, security, utils

**Level 4 (depends on level 3):**

- `manual_run` → constants, train, augmentation

**Broken dependency:**

- `dataset-visualiser` → constants, dto/annotationClass, dto/imageLabel, **preprocessing** (module not found in codebase)
### Dependency Graph (Mermaid)

```mermaid
graph TD
    constants --> api_client
    constants --> augmentation
    constants --> exports
    constants --> train
    constants --> manual_run
    constants --> start_inference
    constants --> dataset-visualiser

    utils --> exports
    utils --> train
    utils --> start_inference

    security --> api_client
    security --> exports
    security --> train
    security --> start_inference

    hardware_service --> api_client

    cdn_manager --> api_client
    cdn_manager --> exports
    cdn_manager --> train
    cdn_manager --> start_inference

    api_client --> exports
    api_client --> train
    api_client --> start_inference

    dto_annotationClass[dto/annotationClass] --> train
    dto_annotationClass --> dataset-visualiser

    dto_imageLabel[dto/imageLabel] --> augmentation
    dto_imageLabel --> dataset-visualiser

    inference_dto[inference/dto] --> inference_inference[inference/inference]
    inference_onnx[inference/onnx_engine] --> inference_inference
    inference_onnx --> inference_trt[inference/tensorrt_engine]
    inference_onnx --> train

    inference_inference --> start_inference
    inference_trt --> start_inference

    exports --> train
    train --> manual_run
    augmentation --> manual_run

    aq_dto[annotation-queue/annotation_queue_dto] --> aq_handler[annotation-queue/annotation_queue_handler]
```
## Topological Processing Order

| Batch | Modules |
|-------|---------|
| 1 (leaves) | constants, utils, security, hardware_service, cdn_manager |
| 2 (leaves) | dto/annotationClass, dto/imageLabel, inference/dto, inference/onnx_engine |
| 3 (level 1) | api_client, augmentation, inference/tensorrt_engine, inference/inference |
| 4 (level 2) | exports, convert-annotations, dataset-visualiser |
| 5 (level 3) | train, start_inference |
| 6 (level 4) | manual_run |
| 7 (separate) | annotation-queue/annotation_queue_dto, annotation-queue/annotation_queue_handler |
## Entry Points

| Entry Point | Description |
|-------------|-------------|
| `train.py` (`__main__`) | Main pipeline: form dataset → train YOLO → export + upload ONNX model |
| `augmentation.py` (`__main__`) | Continuous augmentation loop (runs indefinitely) |
| `start_inference.py` (`__main__`) | Download encrypted TensorRT model → run video inference |
| `manual_run.py` (script) | Ad-hoc training/export commands |
| `convert-annotations.py` (`__main__`) | One-shot annotation format conversion |
| `dataset-visualiser.py` (`__main__`) | Interactive annotation visualization |
| `annotation-queue/annotation_queue_handler.py` (`__main__`) | Async queue consumer for annotation CRUD events |
## Leaf Modules

constants, utils, security, hardware_service, cdn_manager, dto/annotationClass, dto/imageLabel, inference/dto, inference/onnx_engine, convert-annotations, annotation-queue/annotation_queue_dto
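`utils` is described as a Dotdict helper. A common implementation of that idiom (this is a sketch of the pattern, not the repository's actual `utils.py`) is:

```python
class Dotdict(dict):
    """dict subclass allowing attribute-style access: d.key == d['key']."""
    __getattr__ = dict.get          # missing keys return None instead of raising
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


# Attribute and item access are interchangeable:
d = Dotdict({"epochs": 100})
d.batch = 11
```

Such a wrapper is typically used to give YAML-loaded config dicts the ergonomics of object attributes.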
## Observations

- **Security concern**: `config.yaml` and `cdn.yaml` contain hardcoded credentials (API passwords, S3 access keys). These should be moved to environment variables or a secrets manager.
- **Missing module**: `dataset-visualiser.py` imports from `preprocessing`, which does not exist in the codebase.
- **Duplicate code**: `AnnotationClass` and `WeatherMode` are defined in three separate locations: `dto/annotationClass.py`, `inference/dto.py`, and `annotation-queue/annotation_queue_dto.py`.
- **Empty files**: `dto/annotation_bulk_message.py`, `dto/annotation_message.py`, and `inference/__init__.py` are empty.
- **Separate sub-service**: `annotation-queue/` has its own `requirements.txt` and `config.yaml`, functioning as an independent service.
- **Hardcoded encryption key**: `security.py` has a hardcoded model encryption key string.
- **No formal test framework**: tests are script-based, not using pytest/unittest.
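A low-effort remediation for the hardcoded-credential observations above is to read secrets from the environment at startup and fail fast when they are absent. A stdlib sketch (the variable names are hypothetical, not taken from the repository's config):

```python
import os


def require_secret(name: str) -> str:
    """Fetch a required secret from the environment, failing loudly if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: set the {name} environment variable")
    return value


# Hypothetical usage replacing literals in config.yaml / security.py:
# api_password = require_secret("AZAION_API_PASSWORD")
# model_key = require_secret("AZAION_MODEL_KEY")
```

Failing at startup, rather than mid-run, keeps a misconfigured process from training for days before the first upload attempt exposes the missing credential.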
# Verification Log

## Summary

| Metric | Count |
|--------|-------|
| Entities verified | 87 |
| Entities flagged | 0 |
| Corrections applied | 0 |
| Bugs found in code | 5 |
| Missing modules | 1 |
| Duplicated code | 1 pattern (3 locations) |
| Security issues | 3 |
| Completeness | 21/21 modules (100%) |

## Entity Verification

All class names, function names, method signatures, and module names referenced in documentation were verified against the actual source code. No hallucinated entities were found.
### Verified Entities (key samples)

| Entity | Location | Doc Reference | Status |
|--------|----------|---------------|--------|
| `Security.encrypt_to` | security.py:14 | modules/security.md | OK |
| `Security.decrypt_to` | security.py:28 | modules/security.md | OK |
| `Security.get_model_encryption_key` | security.py:66 | modules/security.md | OK |
| `get_hardware_info` | hardware_service.py:5 | modules/hardware_service.md | OK |
| `CDNManager.upload` | cdn_manager.py:28 | modules/cdn_manager.md | OK |
| `CDNManager.download` | cdn_manager.py:37 | modules/cdn_manager.md | OK |
| `ApiClient.login` | api_client.py:43 | modules/api_client.md | OK |
| `ApiClient.load_bytes` | api_client.py:63 | modules/api_client.md | OK |
| `ApiClient.upload_big_small_resource` | api_client.py:113 | modules/api_client.md | OK |
| `Augmentator.augment_annotations` | augmentation.py:125 | modules/augmentation.md | OK |
| `Augmentator.augment_inner` | augmentation.py:55 | modules/augmentation.md | OK |
| `InferenceEngine` (ABC) | inference/onnx_engine.py:7 | modules/inference_onnx_engine.md | OK |
| `OnnxEngine` | inference/onnx_engine.py:25 | modules/inference_onnx_engine.md | OK |
| `TensorRTEngine` | inference/tensorrt_engine.py:16 | modules/inference_tensorrt_engine.md | OK |
| `TensorRTEngine.convert_from_onnx` | inference/tensorrt_engine.py:104 | modules/inference_tensorrt_engine.md | OK |
| `Inference.process` | inference/inference.py:83 | modules/inference_inference.md | OK |
| `Inference.remove_overlapping_detections` | inference/inference.py:120 | modules/inference_inference.md | OK |
| `AnnotationQueueHandler.on_message` | annotation-queue/annotation_queue_handler.py:87 | modules/annotation_queue_handler.md | OK |
| `AnnotationMessage` | annotation-queue/annotation_queue_dto.py:91 | modules/annotation_queue_dto.md | OK |
| `form_dataset` | train.py:42 | modules/train.md | OK |
| `train_dataset` | train.py:147 | modules/train.md | OK |
| `export_onnx` | exports.py:29 | modules/exports.md | OK |
| `export_rknn` | exports.py:19 | modules/exports.md | OK |
| `export_tensorrt` | exports.py:45 | modules/exports.md | OK |
| `upload_model` | exports.py:82 | modules/exports.md | OK |
| `WeatherMode` | dto/annotationClass.py:6 | modules/dto_annotationClass.md | OK |
| `AnnotationClass.read_json` | dto/annotationClass.py:18 | modules/dto_annotationClass.md | OK |
| `ImageLabel.visualize` | dto/imageLabel.py:12 | modules/dto_imageLabel.md | OK |
| `Dotdict` | utils.py:1 | modules/utils.md | OK |
## Code Bugs Found During Verification

### Bug 1: `augmentation.py` — undefined attribute `total_to_process`

- **Location**: augmentation.py, line 118
- **Issue**: References `self.total_to_process`, but only `self.total_images_to_process` is defined in `__init__`
- **Impact**: AttributeError at runtime during progress logging
- **Documented in**: modules/augmentation.md, components/05_data_pipeline/description.md

### Bug 2: `train.py` `copy_annotations` — reporting bug

- **Location**: train.py, lines 93 and 99
- **Issue**: `copied = 0` is declared but never incremented. The global `total_files_copied` is incremented inside the inner function, but `copied` is what gets printed in the final message, so `f'Copied all {copied} annotations'` always prints 0.
- **Impact**: Incorrect progress reporting (cosmetic)
- **Documented in**: modules/train.md, components/06_training/description.md

### Bug 3: `exports.py` `upload_model` — stale ApiClient constructor call

- **Location**: exports.py, line 97
- **Issue**: `ApiClient(ApiCredentials(api_c.url, api_c.user, api_c.pw, api_c.folder))` — but `ApiClient.__init__` takes no args, and `ApiCredentials.__init__` takes `(url, email, password)`, not `(url, user, pw, folder)`.
- **Impact**: `upload_model` would fail at runtime. The function appears to be stale code — the actual upload flow in `train.py:export_current_model` uses the correct `ApiClient()` constructor.
- **Documented in**: modules/exports.md, components/06_training/description.md

### Bug 4: `inference/tensorrt_engine.py` — potentially uninitialized `batch_size`

- **Location**: inference/tensorrt_engine.py, lines 43–44
- **Issue**: `self.batch_size` is only set if `engine_input_shape[0] != -1`. If the batch dimension is dynamic (-1), `self.batch_size` is never assigned before being used in `self.input_shape = [self.batch_size, ...]`.
- **Impact**: AttributeError at runtime for models with dynamic batch size (unless `batch_size` is passed via kwargs or set elsewhere)
- **Documented in**: modules/inference_tensorrt_engine.md, components/07_inference/description.md

### Bug 5: `dataset-visualiser.py` — missing import

- **Location**: dataset-visualiser.py, line 6
- **Issue**: `from preprocessing import read_labels` — the `preprocessing` module does not exist in the codebase.
- **Impact**: Script cannot run; ImportError at startup
- **Documented in**: modules/dataset_visualiser.md, components/05_data_pipeline/description.md
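A defensive fix for Bug 4 is to resolve the batch size explicitly before the input shape is built, so the attribute always exists. This is a sketch of the guard, not the repository's code; the default value is an assumption:

```python
def resolve_batch_size(engine_input_shape, default_batch_size=1):
    """Return the engine's static batch dimension, or a caller-supplied
    default when the dimension is dynamic (-1), so the attribute is
    always assigned exactly once."""
    batch_dim = engine_input_shape[0]
    return batch_dim if batch_dim != -1 else default_batch_size
```

In `__init__`, something like `self.batch_size = resolve_batch_size(engine_input_shape, kwargs.get("batch_size", 1))` would guarantee `self.batch_size` is set before `self.input_shape = [self.batch_size, ...]` runs.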
## Missing Modules

| Module | Referenced By | Status |
|--------|---------------|--------|
| `preprocessing` | dataset-visualiser.py, tests/imagelabel_visualize_test.py | Not found in codebase |
## Duplicated Code

### AnnotationClass + WeatherMode (3 locations)

| Location | Differences |
|----------|-------------|
| `dto/annotationClass.py` | Standard version. `color_tuple` property strips first 3 chars. |
| `inference/dto.py` | Adds `opencv_color` BGR field. Same `read_json` logic. |
| `annotation-queue/annotation_queue_dto.py` | Adds `opencv_color`. Reads `classes.json` from CWD (not relative to package). |
## Security Issues

| Issue | Location | Severity |
|-------|----------|----------|
| Hardcoded API credentials | config.yaml (email, password) | High |
| Hardcoded CDN access keys | cdn.yaml (4 access keys) | High |
| Hardcoded encryption key | security.py:67 (`get_model_encryption_key`) | High |
| Queue credentials in plaintext | config.yaml, annotation-queue/config.yaml | Medium |
| No TLS cert validation in API calls | api_client.py | Low |
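Instead of the hardcoded key at security.py:67, a 256-bit AES key can be derived from a secret held outside the codebase. A stdlib sketch using PBKDF2 (the parameters are illustrative and not the repository's actual key-derivation scheme):

```python
import hashlib
import os


def derive_aes256_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte (AES-256) key from a passphrase via PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, iterations, dklen=32)


# The salt is random and stored alongside the ciphertext, never in the code:
salt = os.urandom(16)
key = derive_aes256_key("passphrase-from-env", salt)
```

The passphrase itself would come from an environment variable or secrets manager, so rotating it requires no code change.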
## Completeness Check

All 21 source modules documented. All 8 components cover all modules with no gaps.

| Component | Modules | Complete |
|-----------|---------|----------|
| 01 Core | constants, utils | Yes |
| 02 Security | security, hardware_service | Yes |
| 03 API & CDN | api_client, cdn_manager | Yes |
| 04 Data Models | dto/annotationClass, dto/imageLabel | Yes |
| 05 Data Pipeline | augmentation, convert-annotations, dataset-visualiser | Yes |
| 06 Training | train, exports, manual_run | Yes |
| 07 Inference | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Yes |
| 08 Annotation Queue | annotation_queue_dto, annotation_queue_handler | Yes |
## Consistency Check

- Component docs agree with architecture doc: Yes
- Flow diagrams match component interfaces: Yes
- Module dependency graph in discovery matches import analysis: Yes
- Data model doc matches filesystem layout in architecture: Yes
## Remaining Gaps / Uncertainties

- The `preprocessing` module may have existed previously and been deleted or renamed
- `exports.upload_model` may be intentionally deprecated in favor of the ApiClient-based flow in train.py
- `checkpoint.txt` content (`2024-06-27 20:51:35`) suggests the training infrastructure was last used in mid-2024
- The `orangepi5/` shell scripts were not analyzed (bash, not Python) — they appear to be setup/run scripts for edge deployment
# Final Documentation Report — Azaion AI Training

## Executive Summary

Azaion AI Training is a Python-based ML pipeline for training, deploying, and running YOLOv11 object detection models targeting aerial military asset recognition. The system comprises 8 components (21 modules) spanning annotation ingestion, data augmentation, GPU-accelerated training, multi-format model export, encrypted model distribution, and real-time inference — with edge deployment capability via RKNN on OrangePi5 devices.

The codebase is functional and production-used (last training run: 2024-06-27) but has no CI/CD, no containerization, no formal test framework, and several hardcoded credentials. Verification identified 5 code bugs, 3 high-severity security issues, and 1 missing module.
## Problem Statement

The system automates detection of 17 classes of military objects and infrastructure in aerial/satellite imagery across 3 weather conditions (Normal, Winter, Night). It replaces manual image analysis with a continuous pipeline: human-annotated data flows in via RabbitMQ, is augmented 8× for training diversity, trains YOLOv11 models over multi-day GPU runs, and distributes encrypted models to inference clients that run real-time video detection.
## Architecture Overview

**Tech stack**: Python 3.10+ · PyTorch 2.3.0 (CUDA 12.1) · Ultralytics YOLOv11m · TensorRT · ONNX Runtime · Albumentations · boto3 · rstream · cryptography

**Deployment**: 5 independent processes (no orchestration, no containers) running on GPU-equipped servers. Manual deployment.
## Component Summary

| # | Component | Modules | Purpose | Key Dependencies |
|---|-----------|---------|---------|------------------|
| 01 | Core Infrastructure | constants, utils | Shared paths, config keys, Dotdict helper | None |
| 02 | Security & Hardware | security, hardware_service | AES-256-CBC encryption, hardware fingerprinting | cryptography, pynvml |
| 03 | API & CDN Client | api_client, cdn_manager | REST API (JWT auth) + S3 CDN communication | requests, boto3, Security |
| 04 | Data Models | dto/annotationClass, dto/imageLabel | Annotation class definitions, image+label container | OpenCV, matplotlib |
| 05 | Data Pipeline | augmentation, convert-annotations, dataset-visualiser | 8× augmentation, format conversion, visualization | Albumentations, Data Models |
| 06 | Training Pipeline | train, exports, manual_run | Dataset formation → YOLO training → export → encrypted upload | Ultralytics, API & CDN, Security |
| 07 | Inference Engine | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Model download, decryption, TensorRT/ONNX video inference | TensorRT, ONNX Runtime, PyCUDA |
| 08 | Annotation Queue | annotation_queue_dto, annotation_queue_handler | Async RabbitMQ Streams consumer for annotation CRUD events | rstream, msgpack |
## System Flows

| # | Flow | Entry Point | Path | Output |
|---|------|-------------|------|--------|
| 1 | Annotation Ingestion | RabbitMQ message | Queue → Handler → Filesystem | Images + labels on disk |
| 2 | Data Augmentation | Filesystem scan (5-min loop) | /data/ → Augmentator → /data-processed/ | 8× augmented images + labels |
| 3 | Training Pipeline | train.py `__main__` | /data-processed/ → Dataset split → YOLO train → Export → Encrypt → Upload | Encrypted model on API + CDN |
| 4 | Model Download & Inference | start_inference.py `__main__` | API + CDN download → Decrypt → TensorRT init → Video frames → Detections | Annotated video output |
| 5 | Model Export (Multi-Format) | train.py / manual_run.py | .pt → .onnx / .engine / .rknn | Multi-format model artifacts |
## Risk Observations

### Code Bugs (from Verification)

| # | Location | Issue | Impact |
|---|----------|-------|--------|
| 1 | augmentation.py:118 | `self.total_to_process` undefined (should be `self.total_images_to_process`) | AttributeError during progress logging |
| 2 | train.py:93,99 | `copied` counter never incremented | Incorrect progress reporting (cosmetic) |
| 3 | exports.py:97 | Stale `ApiClient(ApiCredentials(...))` constructor call with wrong params | `upload_model` fails at runtime |
| 4 | inference/tensorrt_engine.py:43-44 | `batch_size` uninitialized for dynamic batch dimensions | AttributeError for models with dynamic batch size |
| 5 | dataset-visualiser.py:6 | Imports from `preprocessing`, which doesn't exist | Script cannot run |
### Security Issues

| Issue | Severity | Location |
|-------|----------|----------|
| Hardcoded API credentials | High | config.yaml |
| Hardcoded CDN access keys (4 keys) | High | cdn.yaml |
| Hardcoded model encryption key | High | security.py:67 |
| Queue credentials in plaintext | Medium | config.yaml, annotation-queue/config.yaml |
| No TLS certificate validation | Low | api_client.py |
### Structural Concerns

- No CI/CD pipeline or containerization
- No formal test framework (2 script-based tests, 1 broken)
- Duplicated AnnotationClass/WeatherMode code in 3 locations
- No graceful shutdown for augmentation process
- No reconnect logic for annotation queue consumer
- Manual deployment only
## Open Questions

- The `preprocessing` module may have existed previously and been deleted or renamed — its absence breaks `dataset-visualiser.py` and `tests/imagelabel_visualize_test.py`
- `exports.upload_model` may be intentionally deprecated in favor of the ApiClient-based flow in `train.py`
- The `orangepi5/` shell scripts were not analyzed (bash, not Python) — they appear to be setup/run scripts for edge deployment
- `checkpoint.txt` (2024-06-27) suggests the training infrastructure was last used in mid-2024
## Artifact Index

| Path | Description | Step |
|------|-------------|------|
| `_docs/00_problem/problem.md` | Problem statement | 6 |
| `_docs/00_problem/restrictions.md` | Hardware, software, environment, operational restrictions | 6 |
| `_docs/00_problem/acceptance_criteria.md` | Measurable acceptance criteria from code | 6 |
| `_docs/00_problem/input_data/data_parameters.md` | Input data schemas and formats | 6 |
| `_docs/00_problem/security_approach.md` | Security mechanisms and known issues | 6 |
| `_docs/01_solution/solution.md` | Retrospective solution document | 5 |
| `_docs/02_document/00_discovery.md` | Codebase discovery: tree, tech stack, dependency graph | 0 |
| `_docs/02_document/modules/*.md` | 21 module-level documentation files | 1 |
| `_docs/02_document/components/0N_*/description.md` | 8 component specifications | 2 |
| `_docs/02_document/diagrams/components.md` | Component relationship diagram (Mermaid) | 2 |
| `_docs/02_document/architecture.md` | System architecture document | 3 |
| `_docs/02_document/system-flows.md` | 5 system flow diagrams with sequence diagrams | 3 |
| `_docs/02_document/data_model.md` | Data model with ER diagram | 3 |
| `_docs/02_document/diagrams/flows/flow_*.md` | Individual flow diagrams (4 files) | 3 |
| `_docs/02_document/04_verification_log.md` | Verification results: 87 entities, 5 bugs, 3 security issues | 4 |
| `_docs/02_document/FINAL_report.md` | This report | 7 |
| `_docs/02_document/state.json` | Document skill progress tracking | — |
# Architecture

## System Context

Azaion AI Training is a Python-based ML pipeline for training, exporting, and deploying YOLOv11 object detection models. The system operates within the Azaion platform ecosystem, consuming annotated image data and producing encrypted inference-ready models.

### Boundaries

| Boundary | Interface | Protocol |
|----------|-----------|----------|
| Azaion REST API | ApiClient | HTTPS (JWT auth) |
| S3-compatible CDN | CDNManager (boto3) | HTTPS (S3 API) |
| RabbitMQ Streams | rstream Consumer | RabbitMQ Stream protocol |
| Local filesystem | Direct I/O | POSIX paths at `/azaion/` |
| NVIDIA GPU | PyTorch, TensorRT, ONNX RT, PyCUDA | CUDA 12.1 |
### System Context Diagram

```mermaid
graph LR
    subgraph "Azaion Platform"
        API[Azaion REST API]
        CDN[S3-compatible CDN]
        Queue[RabbitMQ Streams]
    end

    subgraph "AI Training System"
        AQ[Annotation Queue Consumer]
        AUG[Augmentation Pipeline]
        TRAIN[Training Pipeline]
        INF[Inference Engine]
    end

    subgraph "Storage"
        FS["/azaion/ filesystem"]
    end

    subgraph "Hardware"
        GPU[NVIDIA GPU]
    end

    Queue -->|annotation events| AQ
    AQ -->|images + labels| FS
    FS -->|raw annotations| AUG
    AUG -->|augmented data| FS
    FS -->|processed dataset| TRAIN
    TRAIN -->|trained model| GPU
    TRAIN -->|encrypted model| API
    TRAIN -->|encrypted model big part| CDN
    API -->|encrypted model small part| INF
    CDN -->|encrypted model big part| INF
    INF -->|inference| GPU
```
## Tech Stack

| Layer | Technology | Version/Detail |
|-------|-----------|----------------|
| Language | Python | 3.10+ (match statements) |
| ML Framework | Ultralytics YOLO | YOLOv11 medium |
| Deep Learning | PyTorch | 2.3.0 (CUDA 12.1) |
| GPU Inference | TensorRT | FP16/INT8, async CUDA streams |
| GPU Inference (alt) | ONNX Runtime GPU | CUDAExecutionProvider |
| Edge Inference | RKNN | RK3588 (OrangePi5) |
| Augmentation | Albumentations | Geometric + color transforms |
| Computer Vision | OpenCV | Image I/O, preprocessing, display |
| Object Storage | boto3 | S3-compatible CDN |
| Message Queue | rstream | RabbitMQ Streams consumer |
| Serialization | msgpack | Queue message format |
| Encryption | cryptography | AES-256-CBC |
| HTTP Client | requests | REST API communication |
| Configuration | PyYAML | YAML config files |
| Visualization | matplotlib, netron | Annotation display, model graphs |
## Deployment Model

The system runs as multiple independent processes on machines with NVIDIA GPUs:

| Process | Entry Point | Runtime | Typical Host |
|---------|-------------|---------|--------------|
| Training | `train.py` | Long-running (days) | GPU server (RTX 4090, 24GB VRAM) |
| Augmentation | `augmentation.py` | Continuous loop (infinite) | Same GPU server or CPU-only |
| Annotation Queue | `annotation-queue/annotation_queue_handler.py` | Continuous (async) | Any server with network access |
| Inference | `start_inference.py` | On-demand | GPU-equipped machine |
| Data Tools | `convert-annotations.py`, `dataset-visualiser.py` | Ad-hoc | Developer machine |

No containerization (Dockerfile), CI/CD pipeline, or orchestration infrastructure was found in the codebase. Deployment appears to be manual.
## Data Model Overview

### Annotation Data Flow

```
Raw annotations (Queue) → /azaion/data-seed/              (unvalidated)
                        → /azaion/data/                   (validated)
                        → /azaion/data-processed/         (augmented, 8×)
                        → /azaion/datasets/azaion-{date}/ (train/valid/test split)
                        → /azaion/data-corrupted/         (invalid labels)
                        → /azaion/data_deleted/           (soft-deleted)
```
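The `/azaion/datasets/azaion-{date}/` stage implies a train/valid/test split of the processed data. A generic sketch of such a split follows; the 80/10/10 ratios and function shape are assumptions, not values read from `train.py`:

```python
import random


def split_dataset(items, train=0.8, valid=0.1, seed=42):
    """Shuffle deterministically, then slice into train/valid/test partitions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (
        items[:n_train],                      # training partition
        items[n_train:n_train + n_valid],     # validation partition
        items[n_train + n_valid:],            # test partition (remainder)
    )


train_set, valid_set, test_set = split_dataset(range(100))
```

A fixed seed keeps the split reproducible across runs, which matters when datasets are re-formed per date-stamped directory.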
### Annotation Class System

- 17 base classes (ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier)
- 3 weather modes: Norm (offset 0), Wint (offset 20), Night (offset 40)
- Total class slots: 80 (17 × 3 = 51 used, 29 reserved)
- Format: YOLO (center_x, center_y, width, height — all normalized 0–1)
### Model Artifacts

| Format | Use | Export Details |
|--------|-----|---------------|
| `.pt` | Training checkpoint | YOLOv11 PyTorch weights |
| `.onnx` | Cross-platform inference | 1280px, batch=4, NMS baked in |
| `.engine` | GPU inference (production) | TensorRT FP16, batch=4, per-GPU architecture |
| `.rknn` | Edge inference | RK3588 target (OrangePi5) |

## Integration Points

### Azaion REST API
- `POST /login` → JWT token
- `POST /resources/{folder}` → file upload (Bearer auth)
- `POST /resources/get/{folder}` → encrypted file download (hardware-bound key)

### S3-compatible CDN
- Upload: model big parts (`upload_fileobj`)
- Download: model big parts (`download_file`)
- Separate read/write access keys

### RabbitMQ Streams
- Queue: `azaion-annotations`
- Protocol: RabbitMQ Streams via the rstream library
- Message format: msgpack with positional integer keys
- Offset tracking: persisted to `offset.yaml`

## Non-Functional Requirements (Observed)

| Category | Observation | Source |
|----------|------------|--------|
| Training duration | ~11.5 days for 360K annotations on 1× RTX 4090 | Code comment in train.py |
| VRAM usage | batch=11 → ~22GB (batch=12 fails at 24.2GB) | Code comment in train.py |
| Inference speed | TensorRT: 54s for 200s video (3.7GB VRAM) | Code comment in start_inference.py |
| ONNX inference | 81s for 200s video (6.3GB VRAM) | Code comment in start_inference.py |
| Augmentation ratio | 8× (1 original + 7 augmented per image) | augmentation.py |
| Frame sampling | Every 4th frame during inference | inference/inference.py |

## Security Architecture

| Mechanism | Implementation | Location |
|-----------|---------------|----------|
| API authentication | JWT token (email/password login) | api_client.py |
| Resource encryption | AES-256-CBC (hardware-bound key) | security.py |
| Model encryption | AES-256-CBC (static key) | security.py |
| Split model storage | Small part on API, big part on CDN | api_client.py |
| Hardware fingerprinting | CPU+GPU+RAM+drive serial hash | hardware_service.py |
| CDN access control | Separate read/write S3 credentials | cdn_manager.py |

### Security Concerns

- Hardcoded credentials in `config.yaml` and `cdn.yaml`
- Hardcoded model encryption key in `security.py`
- No TLS certificate validation visible in code
- No input validation on API responses
- Queue credentials in plaintext config files

## Key Architectural Decisions

| Decision | Rationale (inferred) |
|----------|---------------------|
| YOLOv11 medium at 1280px | Balance between detection quality and training time |
| Split model storage | Prevent model theft from single storage compromise |
| Hardware-bound API encryption | Tie resource access to authorized machines |
| TensorRT for production inference | ~33% faster than ONNX, ~42% less VRAM |
| Augmentation as separate process | Decouples data prep from training; runs continuously |
| Annotation queue as separate service | Independent lifecycle; different dependency set |
| RKNN export for OrangePi5 | Edge deployment on low-power ARM SoC |

# Component: Core Infrastructure

## Overview
Shared constants and utility classes that form the foundation for all other components. Provides path definitions, config file references, and helper data structures.

**Pattern**: Configuration constants + utility library
**Upstream**: None (leaf component)
**Downstream**: All other components

## Modules
- `constants` — filesystem paths, config keys, thresholds
- `utils` — Dotdict helper class

## Internal Interfaces

### constants (public symbols)
All path/string constants — see module doc for full list. Key exports:
- Directory paths: `data_dir`, `processed_dir`, `datasets_dir`, `models_dir` and their images/labels subdirectories
- Config references: `CONFIG_FILE`, `CDN_CONFIG`, `OFFSET_FILE`
- Model paths: `CURRENT_PT_MODEL`, `CURRENT_ONNX_MODEL`
- Thresholds: `SMALL_SIZE_KB = 3`

### utils.Dotdict
```python
class Dotdict(dict):
    """Dict subclass enabling config.url instead of config["url"]."""
    __getattr__ = dict.get          # returns None for missing keys (see Caveats)
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
```
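
A minimal usage sketch (the class is re-declared here so the snippet runs standalone) showing both the convenient attribute access and the silent-`None` behavior noted under Caveats:

```python
class Dotdict(dict):
    """Standalone reconstruction of the utils helper for illustration."""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

# The hostname here is a made-up example value.
config = Dotdict({"url": "https://api.example.invalid"})
url = config.url          # attribute-style access works
missing = config.missing  # silently None, not an AttributeError
```

This is convenient for YAML-loaded configs, but a typo in a key name surfaces as `None` far from the access site rather than as an immediate error.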

## Data Access Patterns
None — pure constants, no I/O.

## Implementation Details
- All paths rooted at `/azaion/` — assumes a fixed deployment directory structure
- No environment-variable override for any path — paths are entirely static

## Caveats
- Hardcoded root `/azaion/` makes local development without that directory structure impossible
- No `.env` or environment-based configuration override mechanism
- `Dotdict.__getattr__` uses `dict.get`, which returns `None` for missing keys instead of raising `AttributeError`

## Dependency Graph
```mermaid
graph TD
    constants --> api_client_comp[API & CDN]
    constants --> training_comp[Training]
    constants --> data_pipeline_comp[Data Pipeline]
    constants --> inference_comp[Inference]
    utils --> training_comp
    utils --> inference_comp
```

## Logging Strategy
None.

# Component: Security & Hardware Identity

## Overview
Provides cryptographic operations (AES-256-CBC encryption/decryption) and hardware fingerprinting. Used for protecting model files in transit and at rest, and for binding API encryption keys to specific machines.

**Pattern**: Utility/service library (static methods)
**Upstream**: None (leaf component)
**Downstream**: API & CDN, Training, Inference

## Modules
- `security` — AES encryption, key derivation (SHA-384), hardcoded model key
- `hardware_service` — cross-platform hardware info collection (CPU, GPU, RAM, drive serial)

## Internal Interfaces

### Security (static methods)
```python
Security.encrypt_to(input_bytes: bytes, key: str) -> bytes
Security.decrypt_to(ciphertext_with_iv: bytes, key: str) -> bytes
Security.calc_hash(key: str) -> str
Security.get_hw_hash(hardware: str) -> str
Security.get_api_encryption_key(creds, hardware_hash: str) -> str
Security.get_model_encryption_key() -> str
```

### hardware_service
```python
get_hardware_info() -> str
```

## Data Access Patterns
- `hardware_service` executes shell commands to query OS/hardware info
- `security` performs in-memory cryptographic operations only

## Implementation Details
- **Encryption**: AES-256-CBC. Key = SHA-256(key_string). IV = 16 random bytes prepended to ciphertext. PKCS7 padding.
- **Key derivation hierarchy**:
  1. `get_model_encryption_key()` → hardcoded secret → SHA-384 → base64
  2. `get_hw_hash(hardware_string)` → salted hardware string → SHA-384 → base64
  3. `get_api_encryption_key(creds, hw_hash)` → email+password+hw_hash+salt → SHA-384 → base64
- **Hardware fingerprint format**: `CPU: {cpu}. GPU: {gpu}. Memory: {memory}. DriveSerial: {serial}`

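The described scheme (SHA-256-derived key, random IV prepended, PKCS7 padding) can be sketched with the same `cryptography` library the project depends on. The function names mirror `Security.encrypt_to`/`decrypt_to` but this is not the repository code; note it uses the library's unpadder, which, per the Caveats below, the real `decrypt_to` does not:

```python
import hashlib
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def encrypt_to(input_bytes: bytes, key: str) -> bytes:
    aes_key = hashlib.sha256(key.encode()).digest()  # Key = SHA-256(key_string)
    iv = os.urandom(16)                              # random IV, prepended to output
    padder = padding.PKCS7(128).padder()
    padded = padder.update(input_bytes) + padder.finalize()
    enc = Cipher(algorithms.AES(aes_key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()


def decrypt_to(ciphertext_with_iv: bytes, key: str) -> bytes:
    aes_key = hashlib.sha256(key.encode()).digest()
    iv, ct = ciphertext_with_iv[:16], ciphertext_with_iv[16:]
    dec = Cipher(algorithms.AES(aes_key), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()         # library unpadder, not a manual check
    return unpadder.update(padded) + unpadder.finalize()
```

Because the IV is random per call, encrypting the same plaintext twice yields different ciphertexts, but both decrypt back to the original.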
## Caveats
- **Hardcoded model encryption key** in `get_model_encryption_key()` — anyone with source code access can derive the key
- **Shell command injection risk**: `hardware_service` uses `shell=True` subprocess — safe since no user input is involved, but fragile
- **PKCS7 unpadding** in `decrypt_to` uses a manual check instead of the `cryptography` library's unpadder — potential padding oracle if error handling is observable
- `BUFFER_SIZE` constant declared but unused in security.py

## Dependency Graph
```mermaid
graph TD
    hardware_service --> api_client[API & CDN: api_client]
    security --> api_client
    security --> training[Training]
    security --> inference[Inference: start_inference]
```

## Logging Strategy
None — operations are silent except for exceptions.

# Component: API & CDN Client

## Overview
Communication layer for the Azaion backend API and S3-compatible CDN. Handles authentication, encrypted file transfer, and the split-resource pattern for secure model distribution.

**Pattern**: Client library with split-storage resource management
**Upstream**: Core (constants), Security (encryption, hardware identity)
**Downstream**: Training, Inference, Exports

## Modules
- `api_client` — REST client for the Azaion API: JWT auth, encrypted resource download/upload, split big/small pattern
- `cdn_manager` — boto3 S3 client with separate read/write credentials

## Internal Interfaces

### CDNCredentials
```python
CDNCredentials(host, downloader_access_key, downloader_access_secret, uploader_access_key, uploader_access_secret)
```

### CDNManager
```python
CDNManager(credentials: CDNCredentials)
CDNManager.upload(bucket: str, filename: str, file_bytes: bytearray) -> bool
CDNManager.download(bucket: str, filename: str) -> bool
```

### ApiCredentials
```python
ApiCredentials(url, email, password)
```

### ApiClient
```python
ApiClient()
ApiClient.login() -> None
ApiClient.upload_file(filename: str, file_bytes: bytearray, folder: str) -> None
ApiClient.load_bytes(filename: str, folder: str) -> bytes
ApiClient.load_big_small_resource(resource_name: str, folder: str, key: str) -> bytes
ApiClient.upload_big_small_resource(resource: bytes, resource_name: str, folder: str, key: str) -> None
```

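The client's JWT handling (described under Implementation Details: re-authenticate and retry once on 401/403) can be sketched generically. Everything here is hypothetical scaffolding, not the actual `ApiClient` code; the real client uses `requests` against the Azaion API:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    """Stand-in for an HTTP response object."""
    status_code: int
    body: str = ""


def with_auth_retry(login: Callable[[], str],
                    request: Callable[[str], Response]) -> Response:
    """Perform a request; if the token is rejected, re-login once and retry."""
    token = login()
    resp = request(token)
    if resp.status_code in (401, 403):  # expired or invalid JWT
        token = login()                  # re-authenticate
        resp = request(token)            # single retry
    return resp


# Simulated backend: the first token is stale, re-login yields a fresh one.
_tokens = iter(["stale", "fresh"])

def fake_login() -> str:
    return next(_tokens)

def fake_request(token: str) -> Response:
    return Response(200, "ok") if token == "fresh" else Response(401)

result = with_auth_retry(fake_login, fake_request)
```

As the Caveats note, this single retry is the only resilience mechanism; there is no backoff or connection-level retry.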

## External API Specification

### Azaion REST API (consumed)
| Endpoint | Method | Auth | Description |
|----------|--------|------|-------------|
| `/login` | POST | None (returns JWT) | `{"email": ..., "password": ...}` → `{"token": ...}` |
| `/resources/{folder}` | POST | Bearer JWT | Multipart file upload |
| `/resources/get/{folder}` | POST | Bearer JWT | Download encrypted resource (sends hardware info in body) |

### S3-compatible CDN
| Operation | Description |
|-----------|-------------|
| `upload_fileobj` | Upload bytes to S3 bucket |
| `download_file` | Download file from S3 bucket to disk |

## Data Access Patterns
- ApiClient reads `config.yaml` on init for API credentials
- CDN credentials loaded by ApiClient from encrypted `cdn.yaml` (downloaded from API)
- Split resources: big part stored locally + CDN, small part on API server

## Implementation Details
- **JWT auto-refresh**: On a 401/403 response, automatically re-authenticates and retries
- **Split-resource pattern**: Encrypts data → splits at a ~20% boundary (minimum `SMALL_SIZE_KB * 1024` bytes) → small part to API, big part to CDN. Neither part alone can reconstruct the original.
- **CDN credential isolation**: Separate S3 access keys for upload vs. download (least privilege)
- **CDN self-bootstrap**: `cdn.yaml` credentials are themselves encrypted and downloaded from the API during ApiClient init

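A hedged sketch of the split step. The exact boundary formula is an assumption (roughly 20% of the encrypted blob, but at least `SMALL_SIZE_KB * 1024` bytes); `split_resource`/`join_resource` are illustrative names, not repository functions:

```python
import os

SMALL_SIZE_KB = 3  # mirrors constants.SMALL_SIZE_KB


def split_resource(encrypted: bytes, size_small_in_kb: int = SMALL_SIZE_KB):
    """Split an already-encrypted blob into (small, big).

    The small part goes to the API server, the big part to the CDN;
    neither half alone decrypts to anything meaningful.
    """
    cut = max(len(encrypted) // 5, size_small_in_kb * 1024)  # assumed formula
    cut = min(cut, len(encrypted))                           # guard tiny blobs
    return encrypted[:cut], encrypted[cut:]


def join_resource(small: bytes, big: bytes) -> bytes:
    """Reassembly is plain concatenation before decryption."""
    return small + big


small, big = split_resource(os.urandom(100_000))
```

Because the split happens after encryption, an attacker who compromises only the CDN (or only the API) holds an undecryptable fragment.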
## Caveats
- Credentials hardcoded in `config.yaml` and `cdn.yaml` — not using environment variables or a secrets manager
- `cdn_manager.download()` saves to the current working directory with the same filename
- No retry logic beyond JWT refresh (no exponential backoff, no connection retry)
- `CDNManager` imports `sys`, `yaml`, `os` but doesn't use them

## Dependency Graph
```mermaid
graph TD
    constants --> api_client
    security --> api_client
    hardware_service --> api_client
    cdn_manager --> api_client
    api_client --> exports
    api_client --> train
    api_client --> start_inference
    cdn_manager --> exports
    cdn_manager --> train
```

## Logging Strategy
Print statements for upload/download confirmations and errors. No structured logging.

# Component: Data Models

## Overview
Shared data transfer objects for the training pipeline: annotation class definitions (with weather modes) and image+label containers for visualization and augmentation.

**Pattern**: Plain data classes / value objects
**Upstream**: None (leaf)
**Downstream**: Data Pipeline (augmentation, dataset-visualiser), Training (YAML generation)

## Modules
- `dto/annotationClass` — AnnotationClass, WeatherMode enum, classes.json reader
- `dto/imageLabel` — ImageLabel container with bbox visualization

## Internal Interfaces

### WeatherMode (Enum)
| Member | Value | Description |
|--------|-------|-------------|
| Norm | 0 | Normal weather |
| Wint | 20 | Winter |
| Night | 40 | Night |

### AnnotationClass
```python
AnnotationClass(id: int, name: str, color: str)
AnnotationClass.read_json() -> dict[int, AnnotationClass]  # static
AnnotationClass.color_tuple -> tuple  # property, RGB ints
```

### ImageLabel
```python
ImageLabel(image_path: str, image: np.ndarray, labels_path: str, labels: list)
ImageLabel.visualize(annotation_classes: dict) -> None
```

## Data Access Patterns
- `AnnotationClass.read_json()` reads `classes.json` from the project root (relative to the parent of `dto/`)
- `ImageLabel.visualize()` renders to a matplotlib window (no disk I/O)

## Implementation Details
- 17 base annotation classes × 3 weather modes = 51 classes with offset IDs (0–16, 20–36, 40–56)
- System reserves 80 class slots (DEFAULT_CLASS_NUM in train.py)
- YOLO label format: [x_center, y_center, width, height, class_id] — all normalized 0–1
- `color_tuple` parsing strips the first 3 chars (assumes a "#ff" prefix format) — fragile if the color format changes

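The weather-offset arithmetic can be sketched in a few lines; `effective_class_id` is an illustrative helper, not a function that exists in the codebase:

```python
from enum import Enum


class WeatherMode(Enum):
    Norm = 0    # normal weather, class ids 0-16
    Wint = 20   # winter, class ids 20-36
    Night = 40  # night, class ids 40-56


def effective_class_id(base_id: int, mode: WeatherMode) -> int:
    """Map a base class id (0-16) into its weather-specific slot."""
    if not 0 <= base_id <= 16:
        raise ValueError(f"base class id out of range: {base_id}")
    return base_id + mode.value
```

For example, base class 1 (Truck) becomes id 21 in winter mode and id 41 at night, which is why ids 17–19, 37–39, and 57–79 stay unused.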
## Caveats
- `AnnotationClass` duplicated in 3 locations (dto, inference/dto, annotation-queue/annotation_queue_dto) with slight differences
- `color_tuple` property has a non-obvious parsing approach that may break on different color string formats
- Empty files `dto/annotation_bulk_message.py` and `dto/annotation_message.py` suggest planned but unimplemented DTOs

## Dependency Graph
```mermaid
graph TD
    dto_annotationClass[dto/annotationClass] --> train
    dto_annotationClass --> dataset-visualiser
    dto_imageLabel[dto/imageLabel] --> augmentation
    dto_imageLabel --> dataset-visualiser
```

## Logging Strategy
None.

# Component: Data Pipeline

## Overview
Tools for preparing and managing annotation data: augmentation of training images, format conversion from external annotation systems, and visual inspection of annotated datasets.

**Pattern**: Batch processing tools (standalone scripts + library)
**Upstream**: Core (constants), Data Models (ImageLabel, AnnotationClass)
**Downstream**: Training (augmented images feed into dataset formation)

## Modules
- `augmentation` — image augmentation pipeline (albumentations)
- `convert-annotations` — Pascal VOC / oriented bbox → YOLO format converter
- `dataset-visualiser` — interactive annotation visualization tool

## Internal Interfaces

### Augmentator
```python
Augmentator()
Augmentator.augment_annotations(from_scratch: bool = False) -> None
Augmentator.augment_inner(img_ann: ImageLabel) -> list[ImageLabel]
Augmentator.correct_bboxes(labels) -> list
Augmentator.read_labels(labels_path) -> list[list]
```

### convert-annotations (functions)
```python
convert(folder, dest_folder, read_annotations, ann_format) -> None
minmax2yolo(width, height, xmin, xmax, ymin, ymax) -> tuple
read_pascal_voc(width, height, s: str) -> list[str]
read_bbox_oriented(width, height, s: str) -> list[str]
```

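A hedged sketch of what `minmax2yolo` plausibly computes, converting absolute min/max pixel coordinates into normalized YOLO center format; the real function's rounding and exact return ordering may differ:

```python
def minmax2yolo(width, height, xmin, xmax, ymin, ymax):
    """Convert absolute pixel min/max box coordinates to normalized
    YOLO (center_x, center_y, box_w, box_h), each in the 0-1 range."""
    cx = (xmin + xmax) / 2.0 / width
    cy = (ymin + ymax) / 2.0 / height
    bw = (xmax - xmin) / width
    bh = (ymax - ymin) / height
    return cx, cy, bw, bh
```

For a 100×200 image, the box (xmin=10, xmax=30, ymin=40, ymax=80) becomes a center at (0.2, 0.3) with width 0.2 and height 0.2.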
### dataset-visualiser (functions)
```python
visualise_dataset() -> None
visualise_processed_folder() -> None
```

## Data Access Patterns
- **Augmentation**: Reads from `/azaion/data/images/` + `/azaion/data/labels/`, writes to `/azaion/data-processed/images/` + `/azaion/data-processed/labels/`
- **Conversion**: Reads from a user-specified source folder, writes to a destination folder
- **Visualiser**: Reads from the datasets or processed folder, renders to a matplotlib window

## Implementation Details
- **Augmentation pipeline**: Per image → 1 original copy + 7 augmented variants (8× data expansion)
  - HorizontalFlip (60%), BrightnessContrast (40%), Affine (80%), MotionBlur (10%), HueSaturation (40%)
  - Bbox correction clips outside-boundary boxes, removes boxes < 1% of image
  - Incremental: skips already-processed images
  - Continuous mode: infinite loop with a 5-minute sleep between rounds
  - Concurrent: ThreadPoolExecutor for parallel image processing
- **Format conversion**: Pluggable reader pattern — `convert()` accepts any reader function that maps (width, height, text) → YOLO lines
- **Visualiser**: Interactive (waits for a keypress) — developer debugging tool

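The bbox-correction step (clip boxes to the image, drop boxes smaller than 1% of its area) can be sketched in pure Python. This is a behavioral sketch of `Augmentator.correct_bboxes`, not the repository code, and it assumes each label is `[cx, cy, w, h, class_id]` in normalized YOLO form:

```python
def correct_bboxes(labels):
    """Clip YOLO boxes to the unit square; drop boxes covering < 1% of the image."""
    corrected = []
    for cx, cy, w, h, cls in labels:
        # Convert center format to corners, clip to [0, 1].
        x1 = max(cx - w / 2, 0.0); x2 = min(cx + w / 2, 1.0)
        y1 = max(cy - h / 2, 0.0); y2 = min(cy + h / 2, 1.0)
        nw, nh = x2 - x1, y2 - y1
        if nw <= 0 or nh <= 0 or nw * nh < 0.01:
            continue  # box fell entirely outside the image, or is too small
        corrected.append([(x1 + x2) / 2, (y1 + y2) / 2, nw, nh, cls])
    return corrected
```

This matters after Affine augmentation, which can push boxes partially or fully outside the frame.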
## Caveats
- `dataset-visualiser` imports from a `preprocessing` module that does not exist — broken import
- `dataset-visualiser` has a hardcoded dataset date (`2024-06-18`) and start index (35247)
- `convert-annotations` hardcodes class mappings (Truck=1, Car/Taxi=2) — not configurable
- Augmentation parameters are hardcoded, not configurable via a config file
- The `total_to_process` attribute referenced in `augment_annotation` is never set (the attribute actually assigned is `total_images_to_process`)

## Dependency Graph
```mermaid
graph TD
    constants --> augmentation
    dto_imageLabel[dto/imageLabel] --> augmentation
    constants --> dataset-visualiser
    dto_annotationClass[dto/annotationClass] --> dataset-visualiser
    dto_imageLabel --> dataset-visualiser
    augmentation --> manual_run
```

## Logging Strategy
Print statements for progress tracking (processed count, errors). No structured logging.

# Component: Training Pipeline

## Overview
End-to-end YOLOv11 object detection training workflow: dataset formation from augmented annotations, model training, multi-format export (ONNX, TensorRT, RKNN), and encrypted model upload.

**Pattern**: Pipeline / orchestrator
**Upstream**: Core, Security, API & CDN, Data Models, Data Pipeline (augmented images)
**Downstream**: None (produces trained models consumed externally)

## Modules
- `train` — main pipeline: dataset formation → YOLO training → export → upload
- `exports` — model format conversion (ONNX, TensorRT, RKNN) + upload utilities
- `manual_run` — ad-hoc developer script for selective pipeline steps

## Internal Interfaces

### train
```python
form_dataset() -> None
copy_annotations(images, folder: str) -> None
check_label(label_path: str) -> bool
create_yaml() -> None
resume_training(last_pt_path: str) -> None
train_dataset() -> None
export_current_model() -> None
```

### exports
```python
export_rknn(model_path: str) -> None
export_onnx(model_path: str, batch_size: int = 4) -> None
export_tensorrt(model_path: str) -> None
form_data_sample(destination_path: str, size: int = 500, write_txt_log: bool = False) -> None
show_model(model: str = None) -> None
upload_model(model_path: str, filename: str, size_small_in_kb: int = 3) -> None
```

## Data Access Patterns
- **Input**: Reads augmented images from `/azaion/data-processed/images/` + labels
- **Dataset output**: Creates a dated dataset at `/azaion/datasets/azaion-{YYYY-MM-DD}/` with train/valid/test splits
- **Model output**: Saves trained models to `/azaion/models/azaion-{YYYY-MM-DD}/`, copies best.pt to `/azaion/models/azaion.pt`
- **Upload**: Encrypted model uploaded as a split big/small pair to CDN + API
- **Corrupted data**: Invalid labels moved to `/azaion/data-corrupted/`

## Implementation Details
- **Dataset split**: 70% train / 20% valid / 10% test (random shuffle)
- **Label validation**: `check_label()` verifies all YOLO coordinates are ≤ 1.0
- **YAML generation**: Writes `data.yaml` with 80 class names (17 base classes from classes.json × 3 weather modes = 51 real names, rest as placeholders)
- **Training config**: YOLOv11 medium (`yolo11m.yaml`), epochs=120, batch=11 (tuned for 24GB VRAM), imgsz=1280, save_period=1, workers=24
- **Post-training**: Removes intermediate epoch checkpoints, keeps only `best.pt`
- **Export chain**: `.pt` → ONNX (1280px, batch=4, NMS) → encrypted → split → upload
- **TensorRT export**: batch=4, FP16, NMS, simplify
- **RKNN export**: targets the RK3588 SoC (OrangePi5)
- **Concurrent file copying**: ThreadPoolExecutor for parallel image/label copying during dataset formation
- **`__main__` in `train.py`**: `train_dataset()` → `export_current_model()`

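The 70/20/10 split can be sketched as follows. `split_dataset` is an illustrative helper, not the repository's `form_dataset`; the exact boundary rounding is an assumption:

```python
import random


def split_dataset(image_names, seed=None):
    """Shuffle image names and split 70% train / 20% valid / 10% test."""
    names = list(image_names)
    random.Random(seed).shuffle(names)  # deterministic when a seed is given
    n = len(names)
    n_train = int(n * 0.70)
    n_valid = int(n * 0.20)
    return (names[:n_train],
            names[n_train:n_train + n_valid],
            names[n_train + n_valid:])
```

Each image's matching `.txt` label file would then be copied alongside it into the corresponding split directory.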
## Caveats
- Training hyperparameters are hardcoded (not configurable via a config file)
- `old_images_percentage = 75` declared but unused
- `train.py` imports `subprocess` and `sleep` but doesn't use them
- `train.py` imports `OnnxEngine` but doesn't use it
- `exports.upload_model()` creates `ApiClient` with a different constructor signature than the one in `api_client.py` — likely stale code
- `copy_annotations` uses a global `total_files_copied` counter with a local `copied` variable that stays at 0 — reporting bug
- `resume_training` passes `yaml` (the module) instead of a YAML file path as the `data` parameter

## Dependency Graph
```mermaid
graph TD
    constants --> train
    constants --> exports
    api_client --> train
    api_client --> exports
    cdn_manager --> train
    cdn_manager --> exports
    security --> train
    security --> exports
    utils --> train
    utils --> exports
    dto_annotationClass[dto/annotationClass] --> train
    inference_onnx[inference/onnx_engine] --> train
    exports --> train
    train --> manual_run
    augmentation --> manual_run
```

## Logging Strategy
Print statements for progress (file count, shuffling status, training results). No structured logging.

# Component: Inference Engine

## Overview
Real-time object detection inference subsystem supporting ONNX Runtime and TensorRT backends. Processes video streams with batched inference, custom NMS, and live visualization.

**Pattern**: Strategy pattern (InferenceEngine ABC) + pipeline orchestrator
**Upstream**: Core, Security, API & CDN (for model download)
**Downstream**: None (end-user facing — processes video input)

## Modules
- `inference/dto` — Detection, Annotation, AnnotationClass data classes
- `inference/onnx_engine` — InferenceEngine ABC + OnnxEngine implementation
- `inference/tensorrt_engine` — TensorRTEngine implementation with CUDA memory management + ONNX converter
- `inference/inference` — video processing pipeline (preprocess → infer → postprocess → draw)
- `start_inference` — entry point: downloads the model, initializes an engine, runs on video

## Internal Interfaces

### InferenceEngine (ABC)
```python
InferenceEngine.__init__(model_path: str, batch_size: int = 1, **kwargs)
InferenceEngine.get_input_shape() -> Tuple[int, int]
InferenceEngine.get_batch_size() -> int
InferenceEngine.run(input_data: np.ndarray) -> List[np.ndarray]
```

### OnnxEngine (extends InferenceEngine)
Constructor takes `model_bytes` (not a path). Uses CUDAExecutionProvider + CPUExecutionProvider.

### TensorRTEngine (extends InferenceEngine)
Constructor takes `model_bytes: bytes`. Additional static methods:
```python
TensorRTEngine.get_gpu_memory_bytes(device_id=0) -> int
TensorRTEngine.get_engine_filename(device_id=0) -> str | None
TensorRTEngine.convert_from_onnx(onnx_model: bytes) -> bytes | None
```

### Inference
```python
Inference(engine: InferenceEngine, confidence_threshold, iou_threshold)
Inference.preprocess(frames: list) -> np.ndarray
Inference.postprocess(batch_frames, batch_timestamps, output) -> list[Annotation]
Inference.process(video: str) -> None
Inference.draw(annotation: Annotation) -> None
Inference.remove_overlapping_detections(detections) -> list[Detection]
```

## Data Access Patterns
- Model bytes loaded by the caller (start_inference via ApiClient.load_big_small_resource)
- Video input via cv2.VideoCapture (file path)
- No disk writes during inference

## Implementation Details
- **Video processing**: Every 4th frame processed (25% frame sampling), batched to the engine batch size
- **Preprocessing**: cv2.dnn.blobFromImage (1/255 scale, model input size, BGR→RGB)
- **Postprocessing**: Raw detections filtered by confidence, coordinates normalized to [0,1], custom NMS applied
- **Custom NMS**: Pairwise IoU comparison. Keeps the higher confidence; ties broken by lower class ID.
- **TensorRT**: Async CUDA execution (memcpy_htod_async → execute_async_v3 → synchronize → memcpy_dtoh)
- **TensorRT shapes**: Default 1280×1280 input, 300 max detections, 6 values per detection (x1, y1, x2, y2, conf, cls)
- **ONNX conversion**: TensorRT builder with a 90%-of-GPU-memory workspace, FP16 if supported
- **Engine filename**: GPU-architecture-specific: `azaion.cc_{major}.{minor}_sm_{sm_count}.engine`
- **start_inference flow**: ApiClient → load encrypted TensorRT model (big/small split) → decrypt → TensorRTEngine → Inference.process()

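The custom NMS rule (keep the higher-confidence detection on overlap; break confidence ties by lower class id) can be sketched in pure Python. This is a behavioral sketch of `remove_overlapping_detections`, not the repository implementation; detections here are `(box, confidence, class_id)` tuples with `box = [x1, y1, x2, y2]`:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def remove_overlapping_detections(detections, iou_threshold=0.5):
    """Greedy NMS: sort by (higher confidence, then lower class id),
    keep a detection only if it overlaps nothing already kept."""
    kept = []
    for det in sorted(detections, key=lambda d: (-d[1], d[2])):
        if all(iou(det[0], k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept
```

Sorting by `(-confidence, class_id)` realizes both rules at once: the winner of any overlapping pair is visited, and therefore kept, first.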
## Caveats
- `start_inference.get_engine_filename()` duplicates `TensorRTEngine.get_engine_filename()`
- Video path hardcoded in `start_inference` (`tests/ForAI_test.mp4`)
- `inference/dto` has its own AnnotationClass — duplicated from `dto/annotationClass`
- cv2.imshow display requires a GUI environment — won't work headless
- TensorRT `batch_size` attribute used before assignment if the engine input shape has a dynamic batch dimension — potential NameError

## Dependency Graph
```mermaid
graph TD
    inference_dto[inference/dto] --> inference_inference[inference/inference]
    inference_onnx[inference/onnx_engine] --> inference_inference
    inference_onnx --> inference_trt[inference/tensorrt_engine]
    inference_trt --> start_inference
    inference_inference --> start_inference
    constants --> start_inference
    api_client --> start_inference
    security --> start_inference
```

## Logging Strategy
Print statements for metadata, download progress, timing. cv2.imshow for visual output.

# Component: Annotation Queue Service

## Overview
Self-contained async service that consumes annotation CRUD events from a RabbitMQ Streams queue and persists images + labels to the filesystem. Operates independently from the training pipeline.

**Pattern**: Message-driven event handler / consumer service
**Upstream**: External RabbitMQ Streams queue (Azaion platform)
**Downstream**: Data Pipeline (files written become input for augmentation)

## Modules
- `annotation-queue/annotation_queue_dto` — message DTOs (AnnotationMessage, AnnotationBulkMessage, AnnotationStatus, Detection, etc.)
- `annotation-queue/annotation_queue_handler` — async queue consumer with message routing and file management

## Internal Interfaces

### AnnotationQueueHandler
```python
AnnotationQueueHandler()
AnnotationQueueHandler.start() -> None  # async coroutine
AnnotationQueueHandler.on_message(message: AMQPMessage, context: MessageContext) -> None
AnnotationQueueHandler.save_annotation(ann: AnnotationMessage) -> None
AnnotationQueueHandler.validate(msg: AnnotationBulkMessage) -> None
AnnotationQueueHandler.delete(msg: AnnotationBulkMessage) -> None
```

### Key DTOs
```python
AnnotationMessage(msgpack_bytes)      # Full annotation with image + detections
AnnotationBulkMessage(msgpack_bytes)  # Bulk validate/delete
AnnotationStatus: Created(10), Edited(20), Validated(30), Deleted(40)
RoleEnum: Operator(10), Validator(20), CompanionPC(30), Admin(40), ApiAdmin(1000)
```

## Data Access Patterns
- **Queue**: Consumes from the RabbitMQ Streams queue `azaion-annotations` using the rstream library
- **Offset persistence**: `offset.yaml` tracks the last processed message offset for resume
- **Filesystem writes**:
  - Validated annotations → `{root}/data/images/` + `{root}/data/labels/`
  - Unvalidated (seed) → `{root}/data-seed/images/` + `{root}/data-seed/labels/`
  - Deleted → `{root}/data_deleted/images/` + `{root}/data_deleted/labels/`

## Implementation Details
- **Message routing**: Based on the `AnnotationStatus` in the AMQP application properties:
  - Created/Edited → save label + optionally image; a validator role writes to data, an operator to seed
  - Validated (bulk) → move from seed to data
  - Deleted (bulk) → move to the deleted directory
- **Role-based logic**: `RoleEnum.is_validator()` returns True for Validator, Admin, ApiAdmin — these roles write directly to the validated data directory
- **Serialization**: Messages are msgpack-encoded with positional integer keys. Detections are embedded as a JSON string within the msgpack payload.
- **Offset tracking**: After each successfully processed message, the offset is persisted to `offset.yaml` (survives restarts)
- **Logging**: TimedRotatingFileHandler with daily rotation, 7-day retention, writes to the `logs/` directory
- **Separate dependencies**: Own `requirements.txt` (pyyaml, msgpack, rstream only)
- **Own config.yaml**: Points to test directories by default (`data-test`, `data-test-seed`)

## Caveats

- Credentials hardcoded in `config.yaml` (queue host, user, password)
- AnnotationClass duplicated (third copy) with slight differences from the dto/ version
- No reconnection logic for queue disconnections
- No dead-letter queue or message retry on processing failures
- `save_annotation` writes empty label files when the detections list has no newline separators between entries
- The annotation-queue `config.yaml` uses different directory names (`data-test` vs `data`) than the main `config.yaml` — likely a test vs production configuration issue
## Dependency Graph

```mermaid
graph TD
    annotation_queue_dto --> annotation_queue_handler
    rstream_ext[rstream library] --> annotation_queue_handler
    msgpack_ext[msgpack library] --> annotation_queue_dto
```
## Logging Strategy

`logging` module with TimedRotatingFileHandler. Format: `HH:MM:SS|message`. Daily rotation, 7-day retention. Also outputs to stdout.

@@ -0,0 +1,106 @@
# Data Model

## Entity Overview

This system does not use a database. All data is stored either as files on the filesystem or in in-memory data structures. The primary entities are annotation images, labels, and ML models.
## Entities

### Annotation Image
- **Storage**: JPEG files on filesystem
- **Naming**: `{uuid}.jpg` (name assigned by Azaion platform)
- **Lifecycle**: Created → Seed/Validated → Augmented → Dataset → Model Training
### Annotation Label (YOLO format)
- **Storage**: Text files on filesystem
- **Naming**: `{uuid}.txt` (matches image name)
- **Format**: One line per detection: `{class_id} {center_x} {center_y} {width} {height}`
- **Coordinates**: All normalized to 0–1 range relative to image dimensions
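To make the format concrete, here is a minimal parser sketch (a hypothetical helper, not code from the repository):

```python
def parse_yolo_label_line(line: str):
    """Parse one YOLO label line: `{class_id} {cx} {cy} {w} {h}`."""
    fields = line.split()
    class_id = int(fields[0])
    cx, cy, w, h = (float(v) for v in fields[1:5])
    # All coordinates are normalized, so each must lie in the 0-1 range.
    if not all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)):
        raise ValueError(f"coordinate out of bounds: {line!r}")
    return class_id, cx, cy, w, h
```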
### AnnotationClass
- **Storage**: `classes.json` (static file, 17 entries)
- **Fields**: Id (int), Name (str), ShortName (str), Color (hex str)
- **Weather expansion**: Each class × 3 weather modes → IDs offset by 0/20/40
- **Total slots**: 80 (51 used, 29 reserved as "Class-N" placeholders)
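The weather expansion can be expressed as a small sketch (the enum values come from `dto/annotationClass`; the helper function itself is hypothetical):

```python
from enum import Enum

class WeatherMode(Enum):
    Norm = 0    # normal weather
    Wint = 20   # winter conditions
    Night = 40  # night conditions

def expanded_class_id(base_id: int, mode: WeatherMode) -> int:
    # Each base class occupies three slots, offset by 0/20/40.
    return mode.value + base_id
```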
### Detection (inference)
- **In-memory only**: Created during inference postprocessing
- **Fields**: x, y, w, h (normalized), cls (int), confidence (float)

### Annotation (inference)
- **In-memory only**: Groups detections per video frame
- **Fields**: frame (image), time (ms), detections (list)

### AnnotationMessage (queue)
- **Wire format**: msgpack with positional integer keys
- **Fields**: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (bytes), createdRole, createdEmail, source, status

### ML Model
- **Formats**: .pt, .onnx, .engine, .rknn
- **Encryption**: AES-256-CBC before upload
- **Split storage**: .small part (API server) + .big part (CDN)
- **Naming**: `azaion.{ext}` for current model; `azaion.cc_{major}.{minor}_sm_{count}.engine` for GPU-specific TensorRT
## Filesystem Entity Relationships

```mermaid
erDiagram
    ANNOTATION_IMAGE ||--|| ANNOTATION_LABEL : "matches by filename stem"
    ANNOTATION_CLASS ||--o{ ANNOTATION_LABEL : "class_id references"
    ANNOTATION_IMAGE }o--|| DATASET_SPLIT : "copied into"
    ANNOTATION_LABEL }o--|| DATASET_SPLIT : "copied into"
    DATASET_SPLIT ||--|| TRAINING_RUN : "input to"
    TRAINING_RUN ||--|| MODEL_PT : "produces"
    MODEL_PT ||--|| MODEL_ONNX : "exported to"
    MODEL_PT ||--|| MODEL_ENGINE : "exported to"
    MODEL_PT ||--|| MODEL_RKNN : "exported to"
    MODEL_ONNX ||--|| ENCRYPTED_MODEL : "encrypted"
    MODEL_ENGINE ||--|| ENCRYPTED_MODEL : "encrypted"
    ENCRYPTED_MODEL ||--|| MODEL_SMALL : "split part"
    ENCRYPTED_MODEL ||--|| MODEL_BIG : "split part"
```
## Directory Layout (Data Lifecycle)

```
/azaion/
├── data-seed/           ← Unvalidated annotations (from operators)
│   ├── images/
│   └── labels/
├── data/                ← Validated annotations (from validators/admins)
│   ├── images/
│   └── labels/
├── data-processed/      ← Augmented data (8× expansion)
│   ├── images/
│   └── labels/
├── data-corrupted/      ← Invalid labels (coords > 1.0)
│   ├── images/
│   └── labels/
├── data_deleted/        ← Soft-deleted annotations
│   ├── images/
│   └── labels/
├── data-sample/         ← Random sample for review
├── datasets/            ← Training datasets (dated)
│   └── azaion-{YYYY-MM-DD}/
│       ├── train/images/ + labels/
│       ├── valid/images/ + labels/
│       ├── test/images/ + labels/
│       └── data.yaml
└── models/              ← Trained model artifacts
    ├── azaion.pt        ← Current best model
    ├── azaion.onnx      ← Current ONNX export
    └── azaion-{YYYY-MM-DD}/   ← Per-training-run results
        └── weights/
            └── best.pt
```
## Configuration Files

| File | Location | Contents |
|------|----------|----------|
| `config.yaml` | Project root | API credentials, queue config, directory paths |
| `cdn.yaml` | Project root | CDN endpoint + S3 access keys |
| `classes.json` | Project root | Annotation class definitions (17 classes) |
| `checkpoint.txt` | Project root | Last training checkpoint timestamp |
| `offset.yaml` | annotation-queue/ | Queue consumer offset |
| `data.yaml` | Per dataset | YOLO training config (class names, split paths) |
@@ -0,0 +1,99 @@

# Component Relationship Diagram

```mermaid
graph TD
    subgraph "Core Infrastructure"
        core[01 Core<br/>constants, utils]
    end

    subgraph "Security & Hardware"
        sec[02 Security<br/>security, hardware_service]
    end

    subgraph "API & CDN Client"
        api[03 API & CDN<br/>api_client, cdn_manager]
    end

    subgraph "Data Models"
        dto[04 Data Models<br/>dto/annotationClass, dto/imageLabel]
    end

    subgraph "Data Pipeline"
        data[05 Data Pipeline<br/>augmentation, convert-annotations,<br/>dataset-visualiser]
    end

    subgraph "Training Pipeline"
        train[06 Training<br/>train, exports, manual_run]
    end

    subgraph "Inference Engine"
        infer[07 Inference<br/>inference/*, start_inference]
    end

    subgraph "Annotation Queue Service"
        queue[08 Annotation Queue<br/>annotation-queue/*]
    end

    core --> api
    core --> data
    core --> train
    core --> infer

    sec --> api
    sec --> train
    sec --> infer

    api --> train
    api --> infer

    dto --> data
    dto --> train

    data -.->|augmented images<br/>on filesystem| train

    queue -.->|annotation files<br/>on filesystem| data

    style core fill:#e8f5e9
    style sec fill:#fff3e0
    style api fill:#e3f2fd
    style dto fill:#f3e5f5
    style data fill:#fce4ec
    style train fill:#e0f2f1
    style infer fill:#f9fbe7
    style queue fill:#efebe9
```
## Component Summary

| # | Component | Modules | Purpose |
|---|-----------|---------|---------|
| 01 | Core Infrastructure | constants, utils | Shared paths, config keys, helper classes |
| 02 | Security & Hardware | security, hardware_service | AES encryption, key derivation, hardware fingerprinting |
| 03 | API & CDN Client | api_client, cdn_manager | REST API + S3 CDN communication, split-resource pattern |
| 04 | Data Models | dto/annotationClass, dto/imageLabel | Annotation classes, image+label container |
| 05 | Data Pipeline | augmentation, convert-annotations, dataset-visualiser | Data prep: augmentation, format conversion, visualization |
| 06 | Training Pipeline | train, exports, manual_run | YOLO training, model export, encrypted upload |
| 07 | Inference Engine | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Real-time video object detection |
| 08 | Annotation Queue | annotation_queue_dto, annotation_queue_handler | Async annotation event consumer service |
## Module Coverage Verification

All 21 source modules are covered by exactly one component:

- 01: constants, utils (2)
- 02: security, hardware_service (2)
- 03: api_client, cdn_manager (2)
- 04: dto/annotationClass, dto/imageLabel (2)
- 05: augmentation, convert-annotations, dataset-visualiser (3)
- 06: train, exports, manual_run (3)
- 07: inference/dto, inference/onnx_engine, inference/tensorrt_engine, inference/inference, start_inference (5)
- 08: annotation-queue/annotation_queue_dto, annotation-queue/annotation_queue_handler (2)
- **Total: 21 modules covered**
## Inter-Component Communication

| Flow | Mechanism | Description |
|------|-----------|-------------|
| Annotation Queue → Data Pipeline | Filesystem | Queue writes images/labels → augmentation reads them |
| Data Pipeline → Training | Filesystem | Augmented images in `/azaion/data-processed/` → dataset formation |
| Training → API & CDN | API calls | Encrypted model upload (split big/small) |
| Inference → API & CDN | API calls | Encrypted model download (reassemble big/small) |
| API & CDN → Security | Function calls | Encryption/decryption for transit protection |
| API & CDN → Core | Import | Path constants, config file references |
@@ -0,0 +1,3 @@

# Flow: Annotation Ingestion

See `_docs/02_document/system-flows.md` — Flow 1.

@@ -0,0 +1,3 @@

# Flow: Data Augmentation

See `_docs/02_document/system-flows.md` — Flow 2.

@@ -0,0 +1,3 @@

# Flow: Model Download & Inference

See `_docs/02_document/system-flows.md` — Flow 4.

@@ -0,0 +1,3 @@

# Flow: Training Pipeline

See `_docs/02_document/system-flows.md` — Flow 3.
@@ -0,0 +1,97 @@

# Module: annotation-queue/annotation_queue_dto

## Purpose
Data transfer objects for the annotation queue consumer. Defines message types for annotation CRUD events received from a RabbitMQ Streams queue.

## Public Interface

### AnnotationClass (local copy)
Same as dto/annotationClass but reads `classes.json` from the current working directory and adds an `opencv_color` BGR field.

### AnnotationStatus (Enum)
| Member | Value |
|--------|-------|
| Created | 10 |
| Edited | 20 |
| Validated | 30 |
| Deleted | 40 |
### SourceEnum (Enum)
| Member | Value |
|--------|-------|
| AI | 0 |
| Manual | 1 |

### RoleEnum (Enum)
| Member | Value | Description |
|--------|-------|-------------|
| Operator | 10 | Regular annotator |
| Validator | 20 | Annotation validator |
| CompanionPC | 30 | Companion device |
| Admin | 40 | Administrator |
| ApiAdmin | 1000 | API-level admin |

`RoleEnum.is_validator() -> bool`: Returns True for Validator, Admin, ApiAdmin.
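In sketch form (values from the table above; the method body is an assumption about the implementation):

```python
from enum import Enum

class RoleEnum(Enum):
    Operator = 10
    Validator = 20
    CompanionPC = 30
    Admin = 40
    ApiAdmin = 1000

    def is_validator(self) -> bool:
        # Validator, Admin and ApiAdmin all write straight to the
        # validated data directory.
        return self in (RoleEnum.Validator, RoleEnum.Admin, RoleEnum.ApiAdmin)
```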
### Detection
| Field | Type |
|-------|------|
| `annotation_name` | str |
| `cls` | int |
| `x`, `y`, `w`, `h` | float |
| `confidence` | float (optional) |

### AnnotationCreatedMessageNarrow
Lightweight message with only `name` and `createdEmail` (from msgpack fields 1, 2).

### AnnotationMessage
Full annotation message deserialized from msgpack:

| Field | Type | Source |
|-------|------|--------|
| `createdDate` | datetime | msgpack field 0 (Timestamp) |
| `name` | str | field 1 |
| `originalMediaName` | str | field 2 |
| `time` | timedelta | field 3 (microseconds/10) |
| `imageExtension` | str | field 4 |
| `detections` | list[Detection] | field 5 (JSON string) |
| `image` | bytes | field 6 |
| `createdRole` | RoleEnum | field 7 |
| `createdEmail` | str | field 8 |
| `source` | SourceEnum | field 9 |
| `status` | AnnotationStatus | field 10 |

### AnnotationBulkMessage
Bulk operation message for validate/delete:

| Field | Type | Source |
|-------|------|--------|
| `annotation_names` | list[str] | msgpack field 0 |
| `annotation_status` | AnnotationStatus | field 1 |
| `createdEmail` | str | field 2 |
| `createdDate` | datetime | field 3 (Timestamp) |
## Internal Logic
- All messages are deserialized from msgpack binary using positional integer keys.
- Detections within AnnotationMessage are stored as a JSON string inside the msgpack payload.
- Module-level `annotation_classes = AnnotationClass.read_json()` is loaded at import time for `Detection.__str__` formatting.

## Dependencies
- `msgpack` (external) — binary message deserialization
- `json`, `datetime`, `enum` (stdlib)

## Consumers
annotation-queue/annotation_queue_handler

## Data Models
AnnotationClass, AnnotationStatus, SourceEnum, RoleEnum, Detection, AnnotationCreatedMessageNarrow, AnnotationMessage, AnnotationBulkMessage.

## Configuration
Reads `classes.json` from the current working directory.

## External Integrations
None (pure data classes).

## Security
None.

## Tests
None.
@@ -0,0 +1,59 @@

# Module: annotation-queue/annotation_queue_handler

## Purpose
Async consumer for the Azaion annotation queue (RabbitMQ Streams). Listens for annotation CRUD events and writes/moves image+label files on the filesystem.

## Public Interface

### AnnotationQueueHandler
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `()` | — | Reads config.yaml, creates directories, initializes rstream Consumer, reads offset |
| `start` | `async ()` | — | Starts consumer, subscribes to queue stream, runs event loop |
| `on_message` | `(message: AMQPMessage, context: MessageContext)` | — | Message callback: routes by AnnotationStatus to save/validate/delete |
| `save_annotation` | `(ann: AnnotationMessage)` | — | Writes label file + image to data or seed directory based on role |
| `validate` | `(msg: AnnotationBulkMessage)` | — | Moves annotations from seed to data directory |
| `delete` | `(msg: AnnotationBulkMessage)` | — | Moves annotations to deleted directory |
### AnnotationQueueHandler.AnnotationName (inner class)
Helper that pre-computes file paths for an annotation name across data/seed directories.

## Internal Logic
- **Queue protocol**: Subscribes to a RabbitMQ Streams queue using the rstream library with AMQP message decoding. Resumes from a persisted offset stored in `offset.yaml`.
- **Message routing** (via `application_properties['AnnotationStatus']`):
  - `Created` / `Edited` → `save_annotation`: If validator role, writes to data dir; else writes to seed dir. For Created status, also saves the image bytes. For Edited by validator, moves image from seed to data.
  - `Validated` → `validate`: Bulk-moves all named annotations from seed to data directory.
  - `Deleted` → `delete`: Bulk-moves all named annotations to the deleted directory.
- **Offset tracking**: After each message, increments offset and persists to `offset.yaml`.
- **Directory layout**:
  - `{root}/data/images/` + `{root}/data/labels/` — validated annotations
  - `{root}/data-seed/images/` + `{root}/data-seed/labels/` — unvalidated annotations
  - `{root}/data_deleted/images/` + `{root}/data_deleted/labels/` — soft-deleted annotations
- **Logging**: TimedRotatingFileHandler with daily rotation, 7-day retention, logs to `logs/` directory.
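The bulk move performed by `validate` can be sketched like this (directory names from the layout above; the helper is illustrative, not the actual implementation):

```python
import shutil
from pathlib import Path

def validate(annotation_names, seed_root, data_root):
    """Move each annotation's image + label from the seed tree to the data tree."""
    for name in annotation_names:
        for subdir, ext in (("images", ".jpg"), ("labels", ".txt")):
            src = Path(seed_root) / subdir / f"{name}{ext}"
            if src.exists():
                dst_dir = Path(data_root) / subdir
                dst_dir.mkdir(parents=True, exist_ok=True)
                shutil.move(str(src), str(dst_dir / src.name))
```

`delete` follows the same pattern with `data_deleted` as the destination.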
## Dependencies
- `annotation_queue_dto` — AnnotationStatus, AnnotationMessage, AnnotationBulkMessage
- `rstream` (external) — RabbitMQ Streams consumer
- `yaml` (external) — config and offset persistence
- `asyncio`, `os`, `shutil`, `sys`, `logging`, `datetime` (stdlib)

## Consumers
None (entry point — runs via `__main__`).

## Data Models
Uses AnnotationMessage, AnnotationBulkMessage from annotation_queue_dto.

## Configuration
- `config.yaml`: API creds (url, email, password), queue config (host, port, consumer_user, consumer_pw, name), directory structure (root, data, data_seed, data_processed, data_deleted, images, labels)
- `offset.yaml`: persisted queue consumer offset

## External Integrations
- RabbitMQ Streams queue (rstream library) on host `188.245.120.247:5552`
- Filesystem: `/azaion/data/`, `/azaion/data-seed/`, `/azaion/data_deleted/`

## Security
- Queue credentials in `config.yaml` (hardcoded — security concern)
- No encryption of annotation data at rest

## Tests
None.
@@ -0,0 +1,64 @@

# Module: api_client

## Purpose
HTTP client for the Azaion backend API. Handles authentication, file upload/download with encryption, and split-resource management (big/small model parts).

## Public Interface

### ApiCredentials
| Field | Type | Description |
|-------|------|-------------|
| `url` | str | API base URL |
| `email` | str | Login email |
| `password` | str | Login password |

### ApiClient
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `()` | — | Reads `config.yaml` for API creds, reads `cdn.yaml` via `load_bytes`, initializes CDNManager |
| `login` | `()` | — | POST `/login` → stores JWT token |
| `upload_file` | `(filename: str, file_bytes: bytearray, folder: str)` | — | Uploads file to API resource endpoint |
| `load_bytes` | `(filename: str, folder: str) -> bytes` | Decrypted bytes | Downloads encrypted resource from API, decrypts with hardware-bound key |
| `load_big_small_resource` | `(resource_name: str, folder: str, key: str) -> bytes` | Decrypted bytes | Reassembles a split resource: big part from local disk + small part from API, decrypts combined |
| `upload_big_small_resource` | `(resource: bytes, resource_name: str, folder: str, key: str)` | — | Encrypts resource, splits into big (CDN) + small (API), uploads both |
## Internal Logic
- **Authentication**: JWT-based. Auto-login on first request, re-login on 401/403.
- **load_bytes**: Sends hardware fingerprint in request payload. Server returns encrypted bytes. Client decrypts using a key derived from credentials + hardware hash.
- **Split resource pattern**: Large files (models) are split into two parts:
  - `*.small` — first N bytes (min of `SMALL_SIZE_KB * 1024` or 20% of encrypted size) — stored on API server
  - `*.big` — remainder — stored on CDN (S3)
  - This split ensures the model cannot be reconstructed from either storage alone.
- **CDN initialization**: On construction, `cdn.yaml` is loaded via `load_bytes` (from API, encrypted), then used to initialize `CDNManager`.
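A sketch of the split computation (the `SMALL_SIZE_KB` value of 3 comes from `constants`; the function name is hypothetical):

```python
SMALL_SIZE_KB = 3  # from constants: size threshold for the small part

def split_resource(encrypted: bytes) -> tuple[bytes, bytes]:
    """Split an encrypted blob into (small, big) parts as described above."""
    small_size = min(SMALL_SIZE_KB * 1024, int(len(encrypted) * 0.2))
    return encrypted[:small_size], encrypted[small_size:]
```

Reassembly before decryption is then simply `small + big`.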
## Dependencies
- `constants` — config file paths, size thresholds, model folder name
- `cdn_manager` — CDNCredentials, CDNManager for S3 operations
- `hardware_service` — `get_hardware_info()` for hardware fingerprint
- `security` — encryption/decryption, key derivation
- `requests` (external) — HTTP client
- `yaml` (external) — config parsing
- `io`, `json`, `os` (stdlib)

## Consumers
exports, train, start_inference

## Data Models
`ApiCredentials` — API connection credentials.

## Configuration
- `config.yaml` — API URL, email, password
- `cdn.yaml` — CDN credentials (loaded encrypted from API at init time)

## External Integrations
- Azaion REST API (`POST /login`, `POST /resources/{folder}`, `POST /resources/get/{folder}`)
- S3-compatible CDN via CDNManager

## Security
- JWT token-based authentication with auto-refresh on 401/403
- Hardware-bound encryption for downloaded resources
- Split model storage prevents single-point compromise
- Credentials read from `config.yaml` (hardcoded in file — security concern)

## Tests
None.
@@ -0,0 +1,56 @@

# Module: augmentation

## Purpose
Image augmentation pipeline that takes raw annotated images and produces multiple augmented variants for training data expansion. Runs continuously in a loop.

## Public Interface

### Augmentator
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `()` | — | Initializes augmentation transforms and counters |
| `augment_annotations` | `(from_scratch: bool = False)` | — | Processes all unprocessed images from `data/images` → `data-processed/images` |
| `augment_annotation` | `(image_file)` | — | Processes a single image file: reads image + labels, augments, saves results |
| `augment_inner` | `(img_ann: ImageLabel) -> list[ImageLabel]` | List of augmented images | Generates 1 original + 7 augmented variants |
| `correct_bboxes` | `(labels) -> list` | Corrected labels | Clips bounding boxes to image boundaries, removes tiny boxes |
| `read_labels` | `(labels_path) -> list[list]` | Parsed YOLO labels | Reads YOLO-format label file into a list of [x, y, w, h, class_id] |
## Internal Logic
- **Augmentation pipeline** (albumentations Compose):
  1. HorizontalFlip (p=0.6)
  2. RandomBrightnessContrast (p=0.4)
  3. Affine: scale 0.8–1.2, rotate ±35°, shear ±10° (p=0.8)
  4. MotionBlur (p=0.1)
  5. HueSaturationValue (p=0.4)
- Each image produces **8 outputs**: 1 original copy + 7 augmented variants
- Naming: `{stem}_{1..7}.jpg` for augmented; the original keeps its name
- **Bbox correction**: clips bounding boxes that extend outside image borders, removes boxes smaller than `correct_min_bbox_size` (0.01 of image dimension)
- **Incremental processing**: skips images already present in `processed_images_dir`
- **Concurrent**: uses `ThreadPoolExecutor` for parallel processing
- **Continuous mode**: `__main__` runs augmentation in an infinite loop with a 5-minute sleep between rounds
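The bbox-correction step can be sketched in pure Python (behaviour inferred from the description above; the real `correct_bboxes` may differ in details):

```python
def correct_bboxes(labels, min_size=0.01):
    """Clip normalized YOLO boxes (cx, cy, w, h, cls) to the image and
    drop boxes that end up smaller than `min_size` in either dimension."""
    corrected = []
    for cx, cy, w, h, cls in labels:
        # Convert to corner form and clip to the [0, 1] image area.
        x1, y1 = max(0.0, cx - w / 2), max(0.0, cy - h / 2)
        x2, y2 = min(1.0, cx + w / 2), min(1.0, cy + h / 2)
        nw, nh = x2 - x1, y2 - y1
        if nw >= min_size and nh >= min_size:
            corrected.append(((x1 + x2) / 2, (y1 + y2) / 2, nw, nh, cls))
    return corrected
```

After affine transforms (rotation, shear) push boxes past the image border, this keeps the clipped remainder and discards slivers.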
## Dependencies
- `constants` — directory paths (data_images_dir, data_labels_dir, processed_*)
- `dto/imageLabel` — ImageLabel container class
- `albumentations` (external) — augmentation transforms
- `cv2` (external) — image read/write
- `numpy` (external) — image array handling
- `concurrent.futures`, `os`, `shutil`, `time`, `datetime`, `pathlib` (stdlib)

## Consumers
manual_run

## Data Models
Uses `ImageLabel` from `dto/imageLabel`.

## Configuration
Hardcoded augmentation parameters (probabilities, ranges). Directory paths from `constants`.

## External Integrations
Filesystem I/O: reads from `/azaion/data/`, writes to `/azaion/data-processed/`.

## Security
None.

## Tests
None.
@@ -0,0 +1,51 @@

# Module: cdn_manager

## Purpose
Manages file upload and download to/from an S3-compatible CDN (MinIO or similar) using separate credentials for upload and download operations.

## Public Interface

### CDNCredentials
| Field | Type | Description |
|-------|------|-------------|
| `host` | str | CDN endpoint URL |
| `downloader_access_key` | str | S3 access key for downloads |
| `downloader_access_secret` | str | S3 secret for downloads |
| `uploader_access_key` | str | S3 access key for uploads |
| `uploader_access_secret` | str | S3 secret for uploads |

### CDNManager
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `(credentials: CDNCredentials)` | — | Creates two boto3 S3 clients (download + upload) |
| `upload` | `(bucket: str, filename: str, file_bytes: bytearray) -> bool` | True on success | Uploads bytes to S3 bucket |
| `download` | `(bucket: str, filename: str) -> bool` | True on success | Downloads file from S3 to current directory |

## Internal Logic
- Maintains two separate boto3 S3 clients with different credentials (read vs write separation)
- Upload uses `upload_fileobj` with an in-memory BytesIO wrapper
- Download uses `download_file` (saves directly to disk with the same filename)
- Both methods catch all exceptions, print the error, and return a bool

## Dependencies
- `boto3` (external) — S3 client
- `yaml` (external) — imported but unused
- `io`, `sys`, `os` (stdlib) — `sys` and `os` imported but unused

## Consumers
api_client, exports, train, start_inference

## Data Models
`CDNCredentials` — plain data class holding S3 access credentials.

## Configuration
Credentials loaded from `cdn.yaml` by callers (not by this module directly).

## External Integrations
- S3-compatible object storage (configured via `CDNCredentials.host`)

## Security
- Separate read/write credentials enforce least-privilege access
- Credentials passed in at construction time, not hardcoded here

## Tests
None.
@@ -0,0 +1,59 @@

# Module: constants

## Purpose
Centralizes all filesystem path constants, config file names, file extensions, and size thresholds used across the training pipeline.

## Public Interface

| Name | Type | Value/Description |
|------|------|-------------------|
| `azaion` | str | Root directory: `/azaion` |
| `prefix` | str | Naming prefix: `azaion-` |
| `data_dir` | str | `/azaion/data` |
| `data_images_dir` | str | `/azaion/data/images` |
| `data_labels_dir` | str | `/azaion/data/labels` |
| `processed_dir` | str | `/azaion/data-processed` |
| `processed_images_dir` | str | `/azaion/data-processed/images` |
| `processed_labels_dir` | str | `/azaion/data-processed/labels` |
| `corrupted_dir` | str | `/azaion/data-corrupted` |
| `corrupted_images_dir` | str | `/azaion/data-corrupted/images` |
| `corrupted_labels_dir` | str | `/azaion/data-corrupted/labels` |
| `sample_dir` | str | `/azaion/data-sample` |
| `datasets_dir` | str | `/azaion/datasets` |
| `models_dir` | str | `/azaion/models` |
| `date_format` | str | `%Y-%m-%d` |
| `checkpoint_file` | str | `checkpoint.txt` |
| `checkpoint_date_format` | str | `%Y-%m-%d %H:%M:%S` |
| `CONFIG_FILE` | str | `config.yaml` |
| `JPG_EXT` | str | `.jpg` |
| `TXT_EXT` | str | `.txt` |
| `OFFSET_FILE` | str | `offset.yaml` |
| `SMALL_SIZE_KB` | int | `3` (KB threshold for split-upload small part) |
| `CDN_CONFIG` | str | `cdn.yaml` |
| `MODELS_FOLDER` | str | `models` |
| `CURRENT_PT_MODEL` | str | `/azaion/models/azaion.pt` |
| `CURRENT_ONNX_MODEL` | str | `/azaion/models/azaion.onnx` |

## Internal Logic
Pure constant definitions using `os.path.join`. No functions, no classes, no dynamic behavior.

## Dependencies
- `os.path` (stdlib)

## Consumers
api_client, augmentation, exports, train, manual_run, start_inference, dataset-visualiser

## Data Models
None.

## Configuration
Defines `CONFIG_FILE = 'config.yaml'` and `CDN_CONFIG = 'cdn.yaml'` — the filenames for runtime configuration. Does not read them.

## External Integrations
None.

## Security
None.

## Tests
None.
@@ -0,0 +1,43 @@

# Module: convert-annotations

## Purpose
Standalone script that converts annotation files from external formats (Pascal VOC XML, oriented bounding box text) to YOLO format.

## Public Interface

| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `convert` | `(folder, dest_folder, read_annotations, ann_format)` | — | Generic converter: reads images + annotations from folder, writes YOLO format to dest |
| `minmax2yolo` | `(width, height, xmin, xmax, ymin, ymax) -> tuple` | (cx, cy, w, h) | Converts pixel min/max coords to normalized YOLO center format |
| `read_pascal_voc` | `(width, height, s: str) -> list[str]` | YOLO label lines | Parses Pascal VOC XML, maps class names to IDs, outputs YOLO lines |
| `read_bbox_oriented` | `(width, height, s: str) -> list[str]` | YOLO label lines | Parses 14-column oriented bbox format, outputs YOLO lines (hardcoded class 2) |
| `rename_images` | `(folder)` | — | Renames files by trimming the last 7 chars and replacing the extension with .png |
## Internal Logic
- **convert()**: Iterates image files in the source folder, reads the corresponding annotation file, calls the format-specific reader, copies the image and writes the YOLO label to the destination.
- **Pascal VOC**: Parses XML `<object>` elements, maps class names via `name_class_map` (Truck→1, Car/Taxi→2), filters forbidden classes (Motorcycle). Default class = 1.
- **Oriented bbox**: 14-column space-separated format; extracts min/max from columns 6–13, hardcodes class to 2.
- **Validation**: Skips labels where normalized coordinates exceed 1.0 (out of bounds).
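The `minmax2yolo` conversion follows directly from the definitions (a sketch consistent with the signature above):

```python
def minmax2yolo(width, height, xmin, xmax, ymin, ymax):
    """Pixel-space min/max box -> normalized YOLO (cx, cy, w, h)."""
    cx = (xmin + xmax) / 2 / width   # box center x, normalized by image width
    cy = (ymin + ymax) / 2 / height  # box center y, normalized by image height
    w = (xmax - xmin) / width        # box width, normalized
    h = (ymax - ymin) / height       # box height, normalized
    return cx, cy, w, h
```

The validation step above then drops any result where a component exceeds 1.0.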
## Dependencies
- `cv2` (external) — image reading for dimensions
- `xml.etree.cElementTree` (stdlib) — Pascal VOC XML parsing
- `os`, `shutil`, `pathlib` (stdlib)

## Consumers
None (standalone script).

## Data Models
None.

## Configuration
Hardcoded class mappings: `name_class_map = {'Truck': 1, 'Car': 2, 'Taxi': 2}`, `forbidden_classes = ['Motorcycle']`.

## External Integrations
Filesystem I/O only.

## Security
None.

## Tests
None.
@@ -0,0 +1,41 @@

# Module: dataset-visualiser

## Purpose
Interactive tool for visually inspecting annotated images from datasets or the processed folder, displaying bounding boxes with class colors.

## Public Interface

| Function | Signature | Description |
|----------|-----------|-------------|
| `visualise_dataset` | `()` | Iterates images in a specific dataset folder, shows each with annotations. Waits for keypress. |
| `visualise_processed_folder` | `()` | Shows images from the processed folder with annotations. |

## Internal Logic
- **visualise_dataset()**: Hardcoded to a specific dataset date (`2024-06-18`), iterates from index 35247 onward. Reads image + labels, calls `ImageLabel.visualize()`, waits for user input to advance.
- **visualise_processed_folder()**: Lists all processed images, shows the first one.
- Both functions use `read_labels()` imported from a `preprocessing` module **which does not exist** in the codebase — this is a broken import.

## Dependencies
- `constants` — directory paths (datasets_dir, prefix, processed_*)
- `dto/annotationClass` — AnnotationClass for class colors
- `dto/imageLabel` — ImageLabel for visualization
- `preprocessing` — **MISSING MODULE** (read_labels function)
- `cv2` (external), `matplotlib` (external), `os`, `pathlib` (stdlib)

## Consumers
None (standalone script).

## Data Models
Uses ImageLabel, AnnotationClass.

## Configuration
Hardcoded dataset path and start index.

## External Integrations
Filesystem I/O, matplotlib interactive display.

## Security
None.

## Tests
None.
@@ -0,0 +1,49 @@
|
||||
# Module: dto/annotationClass

## Purpose

Defines the `AnnotationClass` data model and `WeatherMode` enum used in the training pipeline. Reads annotation class definitions from `classes.json`.

## Public Interface

### WeatherMode (Enum)

| Member | Value | Description |
|--------|-------|-------------|
| `Norm` | 0 | Normal weather |
| `Wint` | 20 | Winter conditions |
| `Night` | 40 | Night conditions |

### AnnotationClass

| Field/Method | Type/Signature | Description |
|-------------|----------------|-------------|
| `id` | int | Class ID (weather_offset + base_id) |
| `name` | str | Class name (with weather suffix if non-Norm) |
| `color` | str | Hex color string (e.g. `#ff0000`) |
| `color_tuple` | property → tuple | RGB tuple parsed from hex color |
| `read_json()` | static → dict[int, AnnotationClass] | Reads `classes.json`, expands across weather modes, returns dict keyed by ID |

## Internal Logic

- `read_json()` locates `classes.json` relative to the parent directory of the `dto/` package
- For each of the 3 weather modes, creates an AnnotationClass per entry in `classes.json` with offset IDs (0, 20, 40)
- This yields 51 classes (17 base × 3 modes), although the system reserves 80 class slots in total
- `color_tuple` strips the first 3 characters of the color string and parses hex pairs

## Dependencies

- `json`, `enum`, `os.path` (stdlib)

## Consumers

train (for YAML generation), dataset-visualiser (for visualization colors)

## Data Models

`AnnotationClass` — annotation class with ID, name, color. `WeatherMode` — enum for weather conditions.

## Configuration

Reads `classes.json` from project root (relative path from `dto/` parent).

## External Integrations

None.

## Security

None.

## Tests

None directly; used transitively by `tests/imagelabel_visualize_test.py`.

---

# Module: dto/imageLabel

## Purpose

Container class for an image with its YOLO-format bounding box labels, plus a visualization method for debugging annotations.

## Public Interface

### ImageLabel

| Field/Method | Type/Signature | Description |
|-------------|----------------|-------------|
| `image_path` | str | Filesystem path to the image |
| `image` | numpy.ndarray | OpenCV image array |
| `labels_path` | str | Filesystem path to the labels file |
| `labels` | list[list] | List of YOLO bboxes: [x_center, y_center, width, height, class_id] |
| `visualize` | `(annotation_classes: dict) -> None` | Draws bounding boxes on image and displays via matplotlib |

## Internal Logic

- `visualize()` converts BGR→RGB, iterates labels, converts normalized YOLO coordinates to pixel coordinates, draws colored rectangles using `annotation_classes[class_num].color_tuple`, displays with matplotlib.
- Labels use YOLO format: center_x, center_y, width, height (all normalized 0–1), class_id as last element.
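
The normalized-to-pixel conversion performed inside `visualize()` amounts to the following. This is a self-contained sketch; the helper name `yolo_to_pixel_box` is hypothetical and not part of the module.

```python
def yolo_to_pixel_box(label, img_w, img_h):
    """Convert one normalized YOLO bbox [xc, yc, w, h, cls] to pixel corner points."""
    xc, yc, w, h, cls = label
    # Centre/size (0..1) -> top-left and bottom-right pixel corners.
    x1 = int((xc - w / 2) * img_w)
    y1 = int((yc - h / 2) * img_h)
    x2 = int((xc + w / 2) * img_w)
    y2 = int((yc + h / 2) * img_h)
    return (x1, y1), (x2, y2), int(cls)
```

The two corner points are what a drawing call such as a rectangle primitive would consume.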

## Dependencies

- `cv2` (external) — image manipulation
- `matplotlib.pyplot` (external) — image display

## Consumers

augmentation (as augmented image container), dataset-visualiser (for visualization)

## Data Models

`ImageLabel` — image + labels container.

## Configuration

None.

## External Integrations

None.

## Security

None.

## Tests

Used by `tests/imagelabel_visualize_test.py`.

---

# Module: exports

## Purpose

Model export utilities: converts trained YOLO .pt models to ONNX, TensorRT, and RKNN formats. Also handles encrypted model upload (split big/small pattern) and data sampling.

## Public Interface

| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `export_rknn` | `(model_path: str)` | — | Exports YOLO model to RKNN format (RK3588 target), cleans up temp folder |
| `export_onnx` | `(model_path: str, batch_size: int = 4)` | — | Exports YOLO model to ONNX (1280px, NMS enabled, GPU device 0) |
| `export_tensorrt` | `(model_path: str)` | — | Exports YOLO model to TensorRT engine (batch=4, half precision, NMS) |
| `form_data_sample` | `(destination_path: str, size: int = 500, write_txt_log: bool = False)` | — | Creates a random sample of processed images |
| `show_model` | `(model: str = None)` | — | Opens model visualization in netron |
| `upload_model` | `(model_path: str, filename: str, size_small_in_kb: int = 3)` | — | Encrypts model, splits big/small, uploads to API + CDN |

## Internal Logic

- **export_onnx**: Removes existing ONNX file if present, exports at 1280px with NMS baked in and simplification.
- **export_tensorrt**: Uses YOLO's built-in TensorRT export (batch=4, FP16, NMS, simplify).
- **export_rknn**: Exports to RKNN format targeting RK3588 SoC, moves result file and cleans temp directory.
- **upload_model**: Encrypts with `Security.get_model_encryption_key()`, splits encrypted bytes at 30%/70% boundary (or `size_small_in_kb * 1024`), uploads small part to API, big part to CDN.
- **form_data_sample**: Randomly shuffles processed images, copies first N to destination folder.
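
The big/small split can be sketched as below. Note this is an assumption-laden illustration: the documentation mentions both a 30%/70% boundary and a `size_small_in_kb * 1024` boundary, and the precedence between the two (here, the smaller of the two) is a guess; the function name is hypothetical.

```python
def split_big_small(encrypted: bytes, size_small_in_kb: int = 3) -> tuple[bytes, bytes]:
    """Split an encrypted payload into a small (API) part and a big (CDN) part.

    Assumed rule: the small part is size_small_in_kb KiB, capped at 30% of
    the payload so that small models still split roughly 30%/70%.
    """
    boundary = min(size_small_in_kb * 1024, int(len(encrypted) * 0.3))
    return encrypted[:boundary], encrypted[boundary:]
```

Reassembly on the consumer side is simply `small + big`, which is what the download path depends on.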

## Dependencies

- `constants` — directory paths, model paths, config file names
- `api_client` — ApiClient, ApiCredentials for upload
- `cdn_manager` — CDNManager, CDNCredentials for CDN upload
- `security` — model encryption key, encrypt_to
- `utils` — Dotdict for config access
- `ultralytics` (external) — YOLO model
- `netron` (external) — model visualization
- `yaml` (external); `os`, `shutil`, `random`, `pathlib` (stdlib)

## Consumers

train (export_tensorrt, upload_model, export_onnx)

## Data Models

None.

## Configuration

Reads `config.yaml` for API credentials (in `upload_model`), `cdn.yaml` for CDN credentials.

## External Integrations

- Ultralytics YOLO export pipeline
- Netron model viewer
- Azaion API + CDN for model upload

## Security

- Models are encrypted with AES-256-CBC before upload
- Split storage (big on CDN, small on API) prevents single-point compromise

## Tests

None.

---

# Module: hardware_service

## Purpose

Collects hardware fingerprint information (CPU, GPU, RAM, drive serial) from the host machine for use in hardware-bound encryption key derivation.

## Public Interface

| Function | Signature | Returns |
|----------|-----------|---------|
| `get_hardware_info` | `() -> str` | Formatted string: `CPU: {cpu}. GPU: {gpu}. Memory: {memory}. DriveSerial: {drive_serial}` |

## Internal Logic

- Detects OS via `os.name` (`nt` for Windows, else Linux)
- **Windows**: PowerShell commands to query `Win32_Processor`, `Win32_VideoController`, `Win32_OperatingSystem`, disk serial
- **Linux**: `lscpu`, `lspci`, `free`, `/sys/block/sda/device/` serial
- Parses multi-line output: first line = CPU, second = GPU, second-to-last = memory, last = drive serial
- Handles multiple GPUs by taking the first GPU and the last two lines for memory/drive

## Dependencies

- `os`, `subprocess` (stdlib)

## Consumers

api_client (used in `load_bytes` to generate the hardware string for encryption)

## Data Models

None.

## Configuration

None.

## External Integrations

Executes OS-level shell commands to query hardware.

## Security

The hardware fingerprint is used as input to `Security.get_hw_hash()` and subsequently `Security.get_api_encryption_key()`, binding API encryption to the specific machine.

## Tests

None.

---

# Module: inference/dto

## Purpose

Data transfer objects for the inference subsystem: Detection, Annotation, and a local copy of AnnotationClass/WeatherMode.

## Public Interface

### Detection

| Field | Type | Description |
|-------|------|-------------|
| `x` | float | Normalized center X |
| `y` | float | Normalized center Y |
| `w` | float | Normalized width |
| `h` | float | Normalized height |
| `cls` | int | Class ID |
| `confidence` | float | Detection confidence score |
| `overlaps(det2, iou_threshold) -> bool` | method | IoU-based overlap check |

### Annotation

| Field | Type | Description |
|-------|------|-------------|
| `frame` | numpy.ndarray | Video frame image |
| `time` | int/float | Timestamp in the video |
| `detections` | list[Detection] | Detected objects in this frame |

### AnnotationClass (duplicate)

Same as `dto/annotationClass.AnnotationClass` but with an additional `opencv_color` field (BGR tuple). Reads from `classes.json` relative to the `inference/` parent directory.

### WeatherMode (duplicate)

Same as `dto/annotationClass.WeatherMode`.

## Internal Logic

- `Detection.overlaps()` computes IoU between two bounding boxes and returns True if above threshold.
- `AnnotationClass` here adds `opencv_color` as a pre-computed BGR tuple from the hex color for efficient OpenCV rendering.
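
The IoU check behind `Detection.overlaps()` can be sketched as free functions (the real code is a method on the dataclass; the tuple-based signature here is a simplification for illustration):

```python
def iou(a, b):
    """IoU of two boxes in normalized centre format (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    # Intersection extent is clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def overlaps(det1, det2, iou_threshold: float = 0.3) -> bool:
    """True when the two boxes overlap more than the IoU threshold."""
    return iou(det1, det2) > iou_threshold
```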

## Dependencies

- `json`, `enum`, `os.path` (stdlib)

## Consumers

inference/inference

## Data Models

Detection, Annotation, AnnotationClass, WeatherMode.

## Configuration

Reads `classes.json` from project root.

## External Integrations

None.

## Security

None.

## Tests

None.

---

# Module: inference/inference

## Purpose

High-level video inference pipeline. Orchestrates preprocessing → engine inference → postprocessing → visualization for object detection on video streams.

## Public Interface

### Inference

| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `(engine: InferenceEngine, confidence_threshold, iou_threshold)` | — | Stores engine, thresholds, loads annotation classes |
| `preprocess` | `(frames: list) -> np.ndarray` | Batched blob tensor | Normalizes, resizes, and stacks frames into an NCHW blob |
| `postprocess` | `(batch_frames, batch_timestamps, output) -> list[Annotation]` | Annotations per frame | Extracts detections from raw output, applies confidence filter and NMS |
| `process` | `(video: str)` | — | End-to-end: reads video → batched inference → draws + displays results |
| `draw` | `(annotation: Annotation)` | — | Draws bounding boxes with class labels on frame, shows via cv2.imshow |
| `remove_overlapping_detections` | `(detections: list[Detection]) -> list[Detection]` | Filtered list | Custom NMS: removes overlapping detections, keeping higher confidence |

## Internal Logic

- **Video processing**: Reads video via cv2.VideoCapture, processes every 4th frame (frame_count % 4), batches frames to engine batch size.
- **Preprocessing**: `cv2.dnn.blobFromImage` with 1/255 scaling, model input size, BGR→RGB swap.
- **Postprocessing**: Iterates raw output, filters by confidence threshold, normalizes coordinates from model space to [0,1], creates Detection objects, applies custom NMS.
- **Custom NMS**: Pairwise IoU comparison. When two detections overlap above threshold, keeps the one with higher confidence (ties broken by lower class ID).
- **Visualization**: Draws colored rectangles and confidence labels using annotation class colors in an OpenCV window.
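
The custom NMS described above can be sketched as follows. This is an illustrative reimplementation, not the module's code: detections are plain dicts here (the real code uses `Detection` objects), and the greedy sort-then-filter strategy is one way to realize "keep higher confidence, ties broken by lower class ID".

```python
def _iou(a, b):
    """IoU of two detections in normalized centre format."""
    ax1, ay1 = a["x"] - a["w"] / 2, a["y"] - a["h"] / 2
    ax2, ay2 = a["x"] + a["w"] / 2, a["y"] + a["h"] / 2
    bx1, by1 = b["x"] - b["w"] / 2, b["y"] - b["h"] / 2
    bx2, by2 = b["x"] + b["w"] / 2, b["y"] + b["h"] / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = a["w"] * a["h"] + b["w"] * b["h"] - inter
    return inter / union if union > 0 else 0.0

def remove_overlapping_detections(detections, iou_threshold: float = 0.3):
    """Greedy NMS: keep a detection only if it does not overlap an already-kept one."""
    # Higher confidence first; ties broken by lower class ID.
    ordered = sorted(detections, key=lambda d: (-d["confidence"], d["cls"]))
    kept = []
    for det in ordered:
        if all(_iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```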

## Dependencies

- `inference/dto` — Detection, Annotation, AnnotationClass
- `inference/onnx_engine` — InferenceEngine ABC (type hint)
- `cv2` (external) — video I/O, image processing, display
- `numpy` (external) — tensor operations

## Consumers

start_inference

## Data Models

Uses Detection, Annotation from `inference/dto`.

## Configuration

`confidence_threshold` and `iou_threshold` set at construction.

## External Integrations

- OpenCV video capture (file or stream input)
- OpenCV GUI window for real-time display

## Security

None.

## Tests

None.

---

# Module: inference/onnx_engine

## Purpose

Defines the abstract `InferenceEngine` base class and the `OnnxEngine` implementation for running ONNX model inference with GPU acceleration.

## Public Interface

### InferenceEngine (ABC)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(model_path: str, batch_size: int = 1, **kwargs)` | Abstract constructor |
| `get_input_shape` | `() -> Tuple[int, int]` | Returns (height, width) of model input |
| `get_batch_size` | `() -> int` | Returns the batch size |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Runs inference, returns output tensors |

### OnnxEngine (extends InferenceEngine)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(model_bytes, batch_size: int = 1, **kwargs)` | Loads ONNX model from bytes, creates InferenceSession with CUDA+CPU providers |
| `get_input_shape` | `() -> Tuple[int, int]` | Returns (height, width) from model input shape |
| `get_batch_size` | `() -> int` | Returns batch size (from model shape or constructor arg) |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Runs ONNX inference session |

## Internal Logic

- Uses ONNX Runtime with `CUDAExecutionProvider` (primary) and `CPUExecutionProvider` (fallback).
- Reads model metadata to extract class names from the custom metadata map.
- If the model input shape has a fixed batch dimension (not -1), it overrides the constructor batch_size.

## Dependencies

- `onnxruntime` (external) — ONNX inference runtime
- `numpy` (external)
- `abc`, `typing` (stdlib)

## Consumers

inference/inference, inference/tensorrt_engine (inherits InferenceEngine), train (imports OnnxEngine)

## Data Models

None.

## Configuration

None.

## External Integrations

- ONNX Runtime GPU execution (CUDA)

## Security

None.

## Tests

None.

---

# Module: inference/tensorrt_engine

## Purpose

TensorRT-based inference engine implementation. Provides GPU-accelerated inference using NVIDIA TensorRT with CUDA memory management, plus ONNX-to-TensorRT conversion.

## Public Interface

### TensorRTEngine (extends InferenceEngine)

| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `(model_bytes: bytes, **kwargs)` | — | Deserializes TensorRT engine from bytes, allocates CUDA memory |
| `get_input_shape` | `() -> Tuple[int, int]` | (height, width) | Returns model input dimensions |
| `get_batch_size` | `() -> int` | int | Returns configured batch size |
| `run` | `(input_data: np.ndarray) -> List[np.ndarray]` | Output tensors | Runs async inference on a CUDA stream |
| `get_gpu_memory_bytes` | `(device_id=0) -> int` | GPU memory in bytes | Queries total GPU VRAM via pynvml (static) |
| `get_engine_filename` | `(device_id=0) -> str \| None` | Filename string | Generates device-specific engine filename (static) |
| `convert_from_onnx` | `(onnx_model: bytes) -> bytes \| None` | Serialized TensorRT plan | Converts ONNX model to TensorRT engine (static) |

## Internal Logic

- **Initialization**: Deserializes the TensorRT engine, creates an execution context, allocates pinned host memory and device memory for input/output tensors.
- **Dynamic shapes**: Handles -1 (dynamic) dimensions, defaults to 1280×1280 for spatial dims, batch size from engine or constructor.
- **Output shape**: [batch_size, 300 max detections, 6 values per detection (x1, y1, x2, y2, conf, cls)].
- **Inference flow**: Host→Device async copy → execute_async_v3 → synchronize → Device→Host copy.
- **ONNX conversion**: Creates a TensorRT builder, parses the ONNX model, configures workspace (90% of GPU memory), enables FP16 if supported, builds the serialized network.
- **Engine filename**: `azaion.cc_{major}.{minor}_sm_{sm_count}.engine` — uniquely identifies the engine per GPU architecture.
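
The filename scheme reduces to a single format string. In the real module the three values come from CUDA device queries (compute capability major/minor and multiprocessor count); the standalone function below is a hypothetical sketch that takes them as parameters instead.

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the per-GPU TensorRT engine filename described above.

    A serialized TensorRT plan is only valid for the GPU architecture it was
    built on, so the name encodes compute capability and SM count.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```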

## Dependencies

- `inference/onnx_engine` — InferenceEngine ABC
- `tensorrt` (external) — TensorRT runtime and builder
- `pycuda.driver` (external) — CUDA memory management
- `pycuda.autoinit` (external) — CUDA context auto-initialization
- `pynvml` (external) — GPU memory query
- `numpy` (external); `json`, `struct`, `re`, `subprocess`, `pathlib`, `typing` (stdlib)

## Consumers

start_inference

## Data Models

None.

## Configuration

None.

## External Integrations

- NVIDIA TensorRT runtime (GPU inference)
- CUDA driver API (memory allocation, streams)
- NVML (GPU hardware queries)

## Security

None.

## Tests

None.

---

# Module: manual_run

## Purpose

Ad-hoc script for manual training operations. Contains commented-out alternatives and a hardcoded workflow for copying model weights and exporting.

## Public Interface

No functions or classes. Script-level code only.

## Internal Logic

- Contains commented-out calls to `Augmentator().augment_annotations()`, `train.train_dataset()`, `train.resume_training()`.
- Active code: references a specific model date (`2025-05-18`), removes intermediate epoch checkpoint files, copies `best.pt` to `CURRENT_PT_MODEL`, then calls `train.export_current_model()`.
- Serves as a developer convenience script for one-off training/export operations.

## Dependencies

- `constants` — models_dir, prefix, CURRENT_PT_MODEL
- `train` — export_current_model
- `augmentation` — Augmentator (imported, usage commented out)
- `glob`, `os`, `shutil` (stdlib)

## Consumers

None (standalone script).

## Data Models

None.

## Configuration

Hardcoded model date: `2025-05-18`.

## External Integrations

Filesystem operations on `/azaion/models/`.

## Security

None.

## Tests

None.

---

# Module: security

## Purpose

Provides AES-256-CBC encryption/decryption and key derivation functions used to protect model files and API resources in transit.

## Public Interface

| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `Security.encrypt_to` | `(input_bytes: bytes, key: str) -> bytes` | IV + ciphertext | AES-256-CBC encrypt with PKCS7 padding; prepends 16-byte random IV |
| `Security.decrypt_to` | `(ciphertext_with_iv_bytes: bytes, key: str) -> bytes` | plaintext bytes | Extracts IV from first 16 bytes, decrypts, removes PKCS7 padding |
| `Security.calc_hash` | `(key: str) -> str` | base64-encoded SHA-384 hash | General-purpose hash function |
| `Security.get_hw_hash` | `(hardware: str) -> str` | base64 hash | Derives a hardware-specific hash using the `Azaion_{hardware}_%$$$)0_` salt |
| `Security.get_api_encryption_key` | `(creds, hardware_hash: str) -> str` | base64 hash | Derives API encryption key from credentials + hardware hash |
| `Security.get_model_encryption_key` | `() -> str` | base64 hash | Returns a fixed encryption key derived from a hardcoded secret string |

## Internal Logic

- Encryption: SHA-256 of the key string → 32-byte AES key. Random 16-byte IV generated per encryption. PKCS7 padding applied. Output = IV ∥ ciphertext.
- Decryption: First 16 bytes = IV, remainder = ciphertext. Manual PKCS7 unpadding (checks last byte is 1–16).
- Key derivation uses SHA-384 + base64 encoding for all hash-based keys.
- `BUFFER_SIZE = 64 * 1024` is declared but unused.
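
The SHA-384 + base64 key-derivation step can be sketched with the stdlib alone. This is a minimal illustration of the documented scheme, not the module's code; the exact string encoding (UTF-8 assumed here) and the salt template applied in `get_hw_hash` are taken from the descriptions above.

```python
import base64
import hashlib

def calc_hash(key: str) -> str:
    """base64-encoded SHA-384 digest of a key string, as documented."""
    return base64.b64encode(hashlib.sha384(key.encode()).digest()).decode()

def get_hw_hash(hardware: str) -> str:
    """Hardware-bound hash using the documented salt template."""
    return calc_hash(f"Azaion_{hardware}_%$$$)0_")
```

Because SHA-384 yields 48 bytes, the base64 result is always a 64-character string with no padding.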

## Dependencies

- `cryptography.hazmat` (external) — AES cipher, CBC mode, PKCS7 padding
- `hashlib`, `base64`, `os` (stdlib)

## Consumers

api_client, exports, train, start_inference, tests/security_test

## Data Models

None.

## Configuration

None consumed at runtime. Contains hardcoded key material.

## External Integrations

None.

## Security

- **Hardcoded model encryption key**: `get_model_encryption_key()` uses a static string `'-#%@AzaionKey@%#---234sdfklgvhjbnn'`. This is a significant security concern; the key should be stored in a secrets manager or an environment variable.
- API encryption key is derived from user credentials + hardware fingerprint, providing per-device uniqueness.
- AES-256-CBC with a random per-message IV is a sound symmetric encryption construction.

## Tests

- `tests/security_test.py` — basic round-trip encrypt/decrypt test (script-based, no test framework).

---

# Module: start_inference

## Purpose

Entry point for running inference on video files using a TensorRT engine. Downloads the encrypted model from the API/CDN, initializes the engine, and processes video.

## Public Interface

| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `get_engine_filename` | `(device_id=0) -> str \| None` | Engine filename | Generates GPU-specific engine filename (duplicate of TensorRTEngine.get_engine_filename) |

`__main__` block: Creates ApiClient, downloads the encrypted TensorRT model (split big/small), initializes TensorRTEngine, runs Inference on a test video.

## Internal Logic

- **Model download flow**: ApiClient → `load_big_small_resource` → reassembles from local big part + API-downloaded small part → decrypts with the model encryption key → raw engine bytes.
- **Inference setup**: TensorRTEngine initialized from decrypted bytes, Inference configured with confidence_threshold=0.5, iou_threshold=0.3.
- **Video source**: Hardcoded to `tests/ForAI_test.mp4`.
- **get_engine_filename()**: Duplicates `TensorRTEngine.get_engine_filename()` — generates `azaion.cc_{major}.{minor}_sm_{sm_count}.engine` based on CUDA device compute capability and SM count.

## Dependencies

- `constants` — config file paths
- `api_client` — ApiClient, ApiCredentials for model download
- `cdn_manager` — CDNManager, CDNCredentials (imported, but the CDN is managed by api_client)
- `inference/inference` — Inference pipeline
- `inference/tensorrt_engine` — TensorRTEngine
- `security` — model encryption key
- `utils` — Dotdict
- `pycuda.driver` (external) — CUDA device queries
- `yaml` (external)

## Consumers

None (entry point).

## Data Models

None.

## Configuration

- Confidence threshold: 0.5
- IoU threshold: 0.3
- Video path: `tests/ForAI_test.mp4` (hardcoded)

## External Integrations

- Azaion API + CDN for model download
- TensorRT GPU inference
- OpenCV video capture and display

## Security

- Model is downloaded encrypted (split big/small) and decrypted locally
- Uses hardware-bound and model encryption keys

## Tests

None.

---

# Module: train

## Purpose

Main training pipeline. Forms YOLO datasets from processed annotations, trains YOLOv11 models, and exports/uploads the trained model.

## Public Interface

| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `form_dataset` | `()` | — | Creates train/valid/test split from processed images |
| `copy_annotations` | `(images, folder: str)` | — | Copies image+label pairs to a dataset split folder (concurrent) |
| `check_label` | `(label_path: str) -> bool` | bool | Validates a YOLO label file (all coords ≤ 1.0) |
| `create_yaml` | `()` | — | Generates YOLO `data.yaml` with class names from `classes.json` |
| `resume_training` | `(last_pt_path: str)` | — | Resumes training from a checkpoint |
| `train_dataset` | `()` | — | Full pipeline: form_dataset → create_yaml → train YOLOv11 → save model |
| `export_current_model` | `()` | — | Exports current .pt to ONNX, encrypts, uploads as split resource |

## Internal Logic

- **Dataset formation**: Shuffles all processed images, splits 70/20/10 (train/valid/test). Copies in parallel via ThreadPoolExecutor. Corrupted labels (coords > 1.0) are moved to `/azaion/data-corrupted/`.
- **YAML generation**: Reads annotation classes from `classes.json`, builds `data.yaml` with 80 class names (17 actual + 63 "Class-N" placeholders), sets train/valid/test paths.
- **Training**: YOLOv11 medium (`yolo11m.yaml`), 120 epochs, batch=11 (tuned for 24GB VRAM), 1280px input, save every epoch, 24 workers.
- **Post-training**: Copies results to `/azaion/models/{date}/`, removes intermediate epoch checkpoints, copies `best.pt` to `CURRENT_PT_MODEL`.
- **Export**: Calls `export_onnx`, reads the ONNX file, encrypts with the model key, uploads via `upload_big_small_resource`.
- **Dataset naming**: `azaion-{YYYY-MM-DD}` using the current date.
- **`__main__`**: Runs `train_dataset()` then `export_current_model()`.
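
The shuffle-and-split step of dataset formation can be sketched as follows (the function name and the exact rounding of the split boundaries are assumptions; the documented ratios are 70/20/10):

```python
import random

def split_dataset(images, train_pct=70, valid_pct=20, test_pct=10, seed=None):
    """Shuffle image paths and split them by the documented 70/20/10 ratios."""
    assert train_pct + valid_pct + test_pct == 100
    pool = list(images)
    random.Random(seed).shuffle(pool)  # seeded RNG keeps the sketch reproducible
    n_train = len(pool) * train_pct // 100
    n_valid = len(pool) * valid_pct // 100
    return pool[:n_train], pool[n_train:n_train + n_valid], pool[n_train + n_valid:]
```

In the real pipeline each split is then copied concurrently into the dataset folder via ThreadPoolExecutor.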

## Dependencies

- `constants` — all directory/path constants
- `api_client` — ApiClient for model upload
- `cdn_manager` — CDNCredentials, CDNManager (imported, but CDN init is done via api_client)
- `dto/annotationClass` — AnnotationClass for class name generation
- `inference/onnx_engine` — OnnxEngine (imported but unused in current code)
- `security` — model encryption key
- `utils` — Dotdict
- `exports` — export_tensorrt, upload_model, export_onnx
- `ultralytics` (external) — YOLO training and export
- `yaml` (external); `concurrent.futures`, `glob`, `os`, `random`, `shutil`, `subprocess`, `datetime`, `pathlib`, `time` (stdlib)

## Consumers

manual_run

## Data Models

Uses AnnotationClass for class definitions.

## Configuration

- Training hyperparameters hardcoded: epochs=120, batch=11, imgsz=1280, save_period=1, workers=24
- Dataset split ratios: train_set=70, valid_set=20, test_set=10
- old_images_percentage=75 (declared but unused)
- DEFAULT_CLASS_NUM=80

## External Integrations

- Ultralytics YOLOv11 training pipeline
- Azaion API + CDN for model upload
- Filesystem: `/azaion/datasets/`, `/azaion/models/`, `/azaion/data-processed/`, `/azaion/data-corrupted/`

## Security

- Trained models are encrypted before upload
- Uses `Security.get_model_encryption_key()` for encryption

## Tests

None.

---

# Module: utils

## Purpose

Provides a dictionary subclass that supports dot-notation attribute access.

## Public Interface

| Name | Type | Signature |
|------|------|-----------|
| `Dotdict` | class (extends `dict`) | `Dotdict(dict)` |

`Dotdict` overrides `__getattr__`, `__setattr__`, `__delattr__` to delegate to `dict.get`, `dict.__setitem__`, `dict.__delitem__` respectively.

## Internal Logic

Single-class module. Allows `config.url` instead of `config["url"]` for YAML-loaded dicts.
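
The delegation described above amounts to the classic three-line recipe; this sketch reproduces the documented behaviour, including the side effect that a missing attribute reads as `None` (because `dict.get` backs `__getattr__`) rather than raising `AttributeError`:

```python
class Dotdict(dict):
    """dict with attribute-style access, matching the documented delegation."""
    __getattr__ = dict.get           # missing keys read as None instead of raising
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
```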

## Dependencies

None (stdlib `dict` only).

## Consumers

exports, train, start_inference

## Data Models

None.

## Configuration

None.

## External Integrations

None.

## Security

None.

## Tests

None.

---

1. Update YOLO to the 26m version.
2. Don't use external augmentation; use YOLO's built-in augmentation and pass the additional parameters in the train command, each parameter on its own line with a proper comment.
3. Because of that, the processed folder is no longer needed; just use the data dir.
4. Do not copy the files themselves to the dataset folder; use hard links (symlinks) instead.
5. Unify the directory constants in config: remove annotation-queue/config.yaml and use constants for that instead.

---


---

```json
{
  "current_step": "complete",
  "completed_steps": ["discovery", "module-analysis", "component-assembly", "system-synthesis", "verification", "solution-extraction", "problem-extraction", "final-report"],
  "focus_dir": null,
  "modules_total": 21,
  "modules_documented": [
    "constants", "utils", "security", "hardware_service", "cdn_manager",
    "dto/annotationClass", "dto/imageLabel", "inference/dto", "inference/onnx_engine",
    "api_client", "augmentation", "inference/tensorrt_engine", "inference/inference",
    "exports", "convert-annotations", "dataset-visualiser",
    "train", "start_inference",
    "manual_run",
    "annotation-queue/annotation_queue_dto", "annotation-queue/annotation_queue_handler"
  ],
  "modules_remaining": [],
  "module_batch": 7,
  "components_written": [
    "01_core", "02_security", "03_api_cdn", "04_data_models",
    "05_data_pipeline", "06_training", "07_inference", "08_annotation_queue"
  ],
  "last_updated": "2026-03-26T00:00:00Z"
}
```

---

# System Flows

## Flow 1: Annotation Ingestion (Annotation Queue → Filesystem)

```mermaid
sequenceDiagram
    participant RMQ as RabbitMQ Streams
    participant AQH as AnnotationQueueHandler
    participant FS as Filesystem

    RMQ->>AQH: AMQP message (msgpack)
    AQH->>AQH: Decode message, read AnnotationStatus

    alt Created / Edited
        AQH->>AQH: Parse AnnotationMessage (image + detections)
        alt Validator / Admin role
            AQH->>FS: Write label → /data/labels/{name}.txt
            AQH->>FS: Write image → /data/images/{name}.jpg
        else Operator role
            AQH->>FS: Write label → /data-seed/labels/{name}.txt
            AQH->>FS: Write image → /data-seed/images/{name}.jpg
        end
    else Validated (bulk)
        AQH->>FS: Move images+labels from /data-seed/ → /data/
    else Deleted (bulk)
        AQH->>FS: Move images+labels → /data_deleted/
    end

    AQH->>FS: Persist offset to offset.yaml
```

### Data Flow Table

| Step | Input | Output | Component |
|------|-------|--------|-----------|
| Receive | AMQP message (msgpack) | AnnotationMessage / AnnotationBulkMessage | Annotation Queue |
| Route | AnnotationStatus header | Dispatch to save/validate/delete | Annotation Queue |
| Save | Image bytes + detection JSON | .jpg + .txt files on disk | Annotation Queue |
| Track | Message context offset | offset.yaml | Annotation Queue |

---

## Flow 2: Data Augmentation

```mermaid
sequenceDiagram
    participant FS as Filesystem (/azaion/data/)
    participant AUG as Augmentator
    participant PFS as Filesystem (/azaion/data-processed/)

    loop Every 5 minutes
        AUG->>FS: Scan /data/images/ for unprocessed files
        AUG->>AUG: Filter out already-processed images
        loop Each unprocessed image (parallel)
            AUG->>FS: Read image + labels
            AUG->>AUG: Correct bounding boxes (clip + filter)
            AUG->>AUG: Generate 7 augmented variants
            AUG->>PFS: Write 8 images (original + 7 augmented)
            AUG->>PFS: Write 8 label files
        end
        AUG->>AUG: Sleep 5 minutes
    end
```

---

## Flow 3: Training Pipeline

```mermaid
sequenceDiagram
    participant PFS as Filesystem (/data-processed/)
    participant TRAIN as train.py
    participant DS as Filesystem (/datasets/)
    participant YOLO as Ultralytics YOLO
    participant API as Azaion API
    participant CDN as S3 CDN

    TRAIN->>PFS: Read all processed images
    TRAIN->>TRAIN: Shuffle, split 70/20/10
    TRAIN->>DS: Copy to train/valid/test folders
    Note over TRAIN: Corrupted labels → /data-corrupted/

    TRAIN->>TRAIN: Generate data.yaml (80 class names)
    TRAIN->>YOLO: Train yolo11m (120 epochs, batch=11, 1280px)
    YOLO-->>TRAIN: Training results + best.pt

    TRAIN->>DS: Copy results to /models/{date}/
    TRAIN->>TRAIN: Copy best.pt → /models/azaion.pt

    TRAIN->>TRAIN: Export .pt → .onnx (1280px, batch=4)
    TRAIN->>TRAIN: Read azaion.onnx bytes
    TRAIN->>TRAIN: Encrypt with model key (AES-256-CBC)
    TRAIN->>TRAIN: Split: small (≤3KB or 20%) + big (rest)

    TRAIN->>API: Upload azaion.onnx.small
    TRAIN->>CDN: Upload azaion.onnx.big
```

---

## Flow 4: Model Download & Inference

```mermaid
sequenceDiagram
    participant INF as start_inference.py
    participant API as Azaion API
    participant CDN as S3 CDN
    participant SEC as Security
    participant TRT as TensorRTEngine
    participant VID as Video File
    participant GUI as OpenCV Window

    INF->>INF: Determine GPU-specific engine filename
    INF->>SEC: Get model encryption key

    INF->>API: Login (JWT)
    INF->>API: Download {engine}.small (encrypted)
    INF->>INF: Read {engine}.big from local disk
    INF->>INF: Reassemble: small + big
    INF->>SEC: Decrypt (AES-256-CBC)

    INF->>TRT: Initialize engine from bytes
    TRT->>TRT: Allocate CUDA memory (input + output)

    loop Video frames
        INF->>VID: Read frame (every 4th)
        INF->>INF: Batch frames to batch_size

        INF->>TRT: Preprocess (blob, normalize, resize)
        TRT->>TRT: CUDA memcpy host→device
        TRT->>TRT: Execute inference (async)
        TRT->>TRT: CUDA memcpy device→host

        INF->>INF: Postprocess (confidence filter + NMS)
        INF->>GUI: Draw bounding boxes + display
    end
```

### Data Flow Table

| Step | Input | Output | Component |
|------|-------|--------|-----------|
| Model resolve | GPU compute capability | Engine filename | Inference |
| Download small | API endpoint + JWT | Encrypted small bytes | API & CDN |
| Load big | Local filesystem | Encrypted big bytes | API & CDN |
| Reassemble | small + big bytes | Full encrypted model | API & CDN |
| Decrypt | Encrypted model + key | Raw TensorRT engine | Security |
| Init engine | Engine bytes | CUDA buffers allocated | Inference |
| Preprocess | Video frame | NCHW float32 blob | Inference |
| Inference | Input blob | Raw detection tensor | Inference |
| Postprocess | Raw tensor | List[Detection] | Inference |
| Visualize | Detections + frame | Annotated frame | Inference |

---

## Flow 5: Model Export (Multi-Format)

```mermaid
flowchart LR
    PT[azaion.pt] -->|export_onnx| ONNX[azaion.onnx]
    PT -->|export_tensorrt| TRT[azaion.engine]
    PT -->|export_rknn| RKNN[azaion.rknn]
    ONNX -->|encrypt + split| UPLOAD[API + CDN upload]
    TRT -->|encrypt + split| UPLOAD
```

| Target Format | Resolution | Batch | Precision | Use Case |
|---------------|-----------|-------|-----------|----------|
| ONNX | 1280px | 4 | FP32 | Cross-platform inference |
| TensorRT | auto | 4 | FP16 | Production GPU inference |
| RKNN | auto | auto | auto | OrangePi5 edge device |

---

## Error Scenarios

| Flow | Error | Handling |
|------|-------|---------|
| Annotation ingestion | Malformed message | Caught by on_message exception handler, logged |
| Annotation ingestion | Queue disconnect | Process exits (no reconnect logic) |
| Augmentation | Corrupted image | Caught per-thread, logged, skipped |
| Augmentation | Transform failure | Caught per-variant, logged, fewer augmentations produced |
| Training | Corrupted label (coords > 1.0) | Moved to /data-corrupted/ |
| Training | Power outage | save_period=1 enables resume_training from last epoch |
| API download | 401/403 | Auto-relogin + retry |
| API download | 500 | Printed, no retry |
| Inference | CUDA error | RuntimeError raised |
| CDN upload/download | Any exception | Caught, printed, returns False |

---

# Blackbox Test Scenarios

## BT-AUG: Augmentation Pipeline

### BT-AUG-01: Single image produces 8 outputs
- **Input**: 1 image + 1 valid label from fixture dataset
- **Action**: Run `Augmentator.augment_inner()` on the image
- **Expected**: Returns list of exactly 8 ImageLabel objects
- **Traces**: AC: 8× augmentation ratio

### BT-AUG-02: Augmented filenames follow naming convention
- **Input**: Image with stem "test_image"
- **Action**: Run `augment_inner()`
- **Expected**: Output filenames: `test_image.jpg`, `test_image_1.jpg` through `test_image_7.jpg`; matching `.txt` labels
- **Traces**: AC: Augmentation output format

### BT-AUG-03: All output bounding boxes in valid range
- **Input**: 1 image + label with multiple bboxes
- **Action**: Run `augment_inner()`
- **Expected**: Every bbox coordinate in every output label is in [0.0, 1.0]
- **Traces**: AC: Bounding boxes clipped to [0, 1]

### BT-AUG-04: Bounding box correction clips edge bboxes
- **Input**: Label with bbox near edge: `0 0.99 0.5 0.2 0.1`
- **Action**: Run `correct_bboxes()`
- **Expected**: Width reduced so bbox fits within [margin, 1-margin]; no coordinate exceeds bounds
- **Traces**: AC: Bounding boxes clipped to [0, 1]

### BT-AUG-05: Tiny bounding boxes removed after correction
- **Input**: Label with tiny bbox that becomes < 0.01 after clipping
- **Action**: Run `correct_bboxes()`
- **Expected**: Bbox removed from output (area < correct_min_bbox_size)
- **Traces**: AC: Bounding boxes with area < 0.01% discarded
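
The clipping-and-filtering behaviour exercised by BT-AUG-04/05 can be sketched as below; the real `correct_bboxes()` signature, margin, and `correct_min_bbox_size` value are assumptions:

```python
def correct_bboxes(bboxes, margin=0.0, min_area=0.0001):
    """Clip YOLO-format (cls, x, y, w, h) boxes into bounds; drop tiny ones.

    Coordinates are normalized centre/size values in [0, 1].
    """
    corrected = []
    for cls, x, y, w, h in bboxes:
        # Convert to corner coordinates and clip into [margin, 1 - margin].
        x1 = max(x - w / 2, margin)
        y1 = max(y - h / 2, margin)
        x2 = min(x + w / 2, 1 - margin)
        y2 = min(y + h / 2, 1 - margin)
        w2, h2 = x2 - x1, y2 - y1
        # Discard degenerate or too-small boxes after clipping.
        if w2 > 0 and h2 > 0 and w2 * h2 >= min_area:
            corrected.append((cls, (x1 + x2) / 2, (y1 + y2) / 2, w2, h2))
    return corrected
```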

### BT-AUG-06: Empty label produces 8 outputs with empty labels
- **Input**: 1 image + empty label file
- **Action**: Run `augment_inner()`
- **Expected**: 8 ImageLabel objects returned; all have empty labels lists
- **Traces**: AC: Augmentation handles empty annotations

### BT-AUG-07: Full augmentation pipeline (filesystem integration)
- **Input**: 5 images + labels copied to data/ directory in tmp_path
- **Action**: Run `augment_annotations()` with patched paths
- **Expected**: 40 images (5 × 8) in processed images dir; 40 matching labels in processed labels dir
- **Traces**: AC: 8× augmentation, filesystem output

### BT-AUG-08: Augmentation skips already-processed images
- **Input**: 5 images in data/; 3 already present in processed/ dir
- **Action**: Run `augment_annotations()`
- **Expected**: Only 2 new images processed (16 new outputs); existing 3 untouched
- **Traces**: AC: Augmentation processes only unprocessed images

---

## BT-DSF: Dataset Formation

### BT-DSF-01: 70/20/10 split ratio
- **Input**: 100 images + labels in processed/ dir
- **Action**: Run `form_dataset()` with patched paths
- **Expected**: train: 70 images+labels, valid: 20, test: 10
- **Traces**: AC: Dataset split 70/20/10
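
A sketch of the split logic these scenarios target; `form_dataset()`'s actual shuffling and file copying are elided, and only the 70/20/10 index split is shown (the percentage constants are assumptions):

```python
import random

def split_indices(n, train=70, valid=20, test=10, seed=0):
    """Shuffle n item indices and split them by percentage (default 70/20/10)."""
    assert train + valid + test == 100
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for the sketch
    n_train = n * train // 100
    n_valid = n * valid // 100
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]
```

Because the three slices partition one shuffled list, no index can land in two splits and none is lost.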

### BT-DSF-02: Split directories structure
- **Input**: 100 images + labels
- **Action**: Run `form_dataset()`
- **Expected**: Created: `train/images/`, `train/labels/`, `valid/images/`, `valid/labels/`, `test/images/`, `test/labels/`
- **Traces**: AC: YOLO dataset directory structure

### BT-DSF-03: Total files preserved across splits
- **Input**: 100 valid images + labels
- **Action**: Run `form_dataset()`
- **Expected**: `count(train) + count(valid) + count(test) == 100` (no data loss)
- **Traces**: AC: Dataset integrity

### BT-DSF-04: Corrupted labels moved to corrupted directory
- **Input**: 95 valid + 5 corrupted labels (coords > 1.0)
- **Action**: Run `form_dataset()` with patched paths
- **Expected**: 5 images+labels in `data-corrupted/`; 95 across train/valid/test splits
- **Traces**: AC: Corrupted labels filtered

---

## BT-LBL: Label Validation

### BT-LBL-01: Valid label accepted
- **Input**: Label file: `0 0.5 0.5 0.1 0.1`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `True`
- **Traces**: AC: Valid YOLO label format

### BT-LBL-02: Label with x > 1.0 rejected
- **Input**: Label file: `0 1.5 0.5 0.1 0.1`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False`
- **Traces**: AC: Corrupted labels detected

### BT-LBL-03: Label with height > 1.0 rejected
- **Input**: Label file: `0 0.5 0.5 0.1 1.2`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False`
- **Traces**: AC: Corrupted labels detected

### BT-LBL-04: Missing label file rejected
- **Input**: Non-existent file path
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False`
- **Traces**: AC: Missing labels handled

### BT-LBL-05: Multi-line label with one corrupted line
- **Input**: Label file: `0 0.5 0.5 0.1 0.1\n3 0.5 0.5 0.1 1.5`
- **Action**: Call `check_label(path)`
- **Expected**: Returns `False` (any corrupted line fails the whole file)
- **Traces**: AC: Corrupted labels detected
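
A plausible shape for `check_label()` consistent with BT-LBL-01 through BT-LBL-05; the real implementation may differ in how it tokenizes lines:

```python
from pathlib import Path

def check_label(path) -> bool:
    """Return True only if every line is 'cls x y w h' with all coords in [0, 1]."""
    p = Path(path)
    if not p.is_file():
        return False                      # missing label -> rejected (BT-LBL-04)
    for line in p.read_text().splitlines():
        if not line.strip():
            continue                      # ignore blank lines
        parts = line.split()
        if len(parts) != 5:
            return False
        try:
            coords = [float(v) for v in parts[1:]]
        except ValueError:
            return False
        # Any out-of-range coordinate fails the whole file (BT-LBL-05).
        if any(c < 0.0 or c > 1.0 for c in coords):
            return False
    return True
```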

---

## BT-ENC: Encryption

### BT-ENC-01: Encrypt-decrypt roundtrip (arbitrary data)
- **Input**: 1024 random bytes, key "test-key"
- **Action**: `decrypt_to(encrypt_to(data, key), key)`
- **Expected**: Output equals input bytes exactly
- **Traces**: AC: AES-256-CBC encryption
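
A sketch of an AES-256-CBC roundtrip that satisfies these scenarios, using the `cryptography` package from the dependency list. The SHA-256 key derivation, the IV-prepended layout, and PKCS7 padding are assumptions; the project's actual `encrypt_to`/`decrypt_to` internals are not shown here:

```python
import hashlib
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def _derive(key: str) -> bytes:
    # 32-byte AES-256 key from an arbitrary string (assumed derivation).
    return hashlib.sha256(key.encode()).digest()

def encrypt_to(data: bytes, key: str) -> bytes:
    iv = os.urandom(16)            # random IV -> non-deterministic ciphertext
    pad = 16 - len(data) % 16      # PKCS7 padding (full block when len % 16 == 0)
    enc = Cipher(algorithms.AES(_derive(key)), modes.CBC(iv)).encryptor()
    return iv + enc.update(data + bytes([pad]) * pad) + enc.finalize()

def decrypt_to(blob: bytes, key: str) -> bytes:
    dec = Cipher(algorithms.AES(_derive(key)), modes.CBC(blob[:16])).decryptor()
    padded = dec.update(blob[16:]) + dec.finalize()
    return padded[:-padded[-1]]    # strip PKCS7 padding
```

This layout also bounds the ciphertext at N + 32 bytes (16-byte IV plus at most one padding block), matching RL-ENC-01.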

### BT-ENC-02: Encrypt-decrypt roundtrip (ONNX model)
- **Input**: `azaion.onnx` bytes, model encryption key
- **Action**: `decrypt_to(encrypt_to(model_bytes, key), key)`
- **Expected**: Output equals input bytes exactly
- **Traces**: AC: Model encryption

### BT-ENC-03: Empty input roundtrip
- **Input**: `b""`, key "test-key"
- **Action**: `decrypt_to(encrypt_to(b"", key), key)`
- **Expected**: Output equals `b""`
- **Traces**: AC: Edge case handling

### BT-ENC-04: Single byte roundtrip
- **Input**: `b"\x00"`, key "test-key"
- **Action**: `decrypt_to(encrypt_to(b"\x00", key), key)`
- **Expected**: Output equals `b"\x00"`
- **Traces**: AC: Edge case handling

### BT-ENC-05: Different keys produce different ciphertext
- **Input**: Same 1024 bytes, keys "key-a" and "key-b"
- **Action**: `encrypt_to(data, "key-a")` vs `encrypt_to(data, "key-b")`
- **Expected**: Ciphertexts differ
- **Traces**: AC: Key-dependent encryption

### BT-ENC-06: Wrong key fails decryption
- **Input**: Encrypted with "key-a", decrypt with "key-b"
- **Action**: `decrypt_to(encrypted, "key-b")`
- **Expected**: Output does NOT equal original input
- **Traces**: AC: Key-dependent encryption

---

## BT-SPL: Model Split Storage

### BT-SPL-01: Split respects size constraint
- **Input**: 10000 encrypted bytes
- **Action**: Split into small + big per `SMALL_SIZE_KB = 3` logic
- **Expected**: small ≤ max(3072 bytes, 20% of total); big = remainder
- **Traces**: AC: Model split ≤3KB or 20%
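
The size constraint can be sketched as follows; the exact split rule in the codebase is assumed, only the bound stated above is taken from the scenario:

```python
SMALL_SIZE_KB = 3

def split_model(blob: bytes):
    """Split encrypted model bytes into a 'small' head and a 'big' tail.

    The small part is at most max(3 KB, 20% of the total), so even large
    models keep only a bounded head on the API side.
    """
    small_len = min(len(blob), max(SMALL_SIZE_KB * 1024, len(blob) // 5))
    return blob[:small_len], blob[small_len:]
```

Reassembly is plain concatenation (`small + big`), which is what BT-SPL-02 checks.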

### BT-SPL-02: Reassembly produces original
- **Input**: 10000 encrypted bytes → split → reassemble
- **Action**: `small + big`
- **Expected**: Equals original encrypted bytes
- **Traces**: AC: Split model integrity

---

## BT-CLS: Annotation Class Loading

### BT-CLS-01: Load 17 base classes
- **Input**: `classes.json`
- **Action**: `AnnotationClass.read_json()`
- **Expected**: Dict with 17 unique class entries (base IDs)
- **Traces**: AC: 17 base classes

### BT-CLS-02: Weather mode expansion
- **Input**: `classes.json`
- **Action**: `AnnotationClass.read_json()`
- **Expected**: Same class at offset 0 (Norm), 20 (Wint), 40 (Night); e.g., ID 0, 20, 40 all represent ArmorVehicle
- **Traces**: AC: 3 weather modes

### BT-CLS-03: YAML generation produces 80 class names
- **Input**: `classes.json` + dataset path
- **Action**: `create_yaml()` with patched paths
- **Expected**: data.yaml contains `nc: 80`, 17 named classes + 63 `Class-N` placeholders
- **Traces**: AC: 80 total class slots
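
The offset-based weather expansion can be sketched as below. The function name, the `Class-N` placeholder format, and the shape of the base-class dict are assumptions; only the offsets (0/20/40) and the 80-slot total come from the scenarios above:

```python
def expand_classes(base: dict, total=80, offsets=(0, 20, 40)) -> list:
    """Place the base classes at each weather offset; fill gaps with Class-N.

    base maps base class ID (0..16) to class name.
    """
    names = [f"Class-{i}" for i in range(total)]   # placeholder slots
    for offset in offsets:                          # Norm / Wint / Night
        for base_id, name in base.items():
            names[offset + base_id] = name
    return names
```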

---

## BT-HSH: Hardware Hash

### BT-HSH-01: Deterministic output
- **Input**: "test-hardware-info"
- **Action**: `Security.get_hw_hash()` called twice
- **Expected**: Both calls return identical string
- **Traces**: AC: Hardware fingerprinting determinism

### BT-HSH-02: Different inputs produce different hashes
- **Input**: "hw-a" and "hw-b"
- **Action**: `Security.get_hw_hash()` on each
- **Expected**: Results differ
- **Traces**: AC: Hardware-bound uniqueness

### BT-HSH-03: Output is valid base64
- **Input**: "test-hardware-info"
- **Action**: `Security.get_hw_hash()`
- **Expected**: Matches regex `^[A-Za-z0-9+/]+=*$`
- **Traces**: AC: Hash format
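
A minimal sketch satisfying all three properties; SHA-256 is an assumption, since the scenarios only pin determinism, uniqueness, and base64 output:

```python
import base64
import hashlib

def get_hw_hash(hw_info: str) -> str:
    """Deterministic base64 fingerprint of a hardware-info string (sketch)."""
    digest = hashlib.sha256(hw_info.encode()).digest()
    return base64.b64encode(digest).decode()
```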

---

## BT-INF: ONNX Inference

### BT-INF-01: Model loads successfully
- **Input**: `azaion.onnx` bytes
- **Action**: `OnnxEngine(model_bytes)`
- **Expected**: No exception; engine object created with valid input_shape and batch_size
- **Traces**: AC: ONNX inference capability

### BT-INF-02: Inference returns output
- **Input**: ONNX engine + 1 preprocessed image
- **Action**: `engine.run(input_blob)`
- **Expected**: Returns list of numpy arrays; first array has shape [batch, N, 6+]
- **Traces**: AC: ONNX inference produces results

### BT-INF-03: Postprocessing returns valid detections
- **Input**: ONNX engine output from real image
- **Action**: `Inference.postprocess()`
- **Expected**: Returns list of Annotation objects; each Detection has x,y,w,h ∈ [0,1], cls ∈ [0,79], confidence ∈ [0,1]
- **Traces**: AC: Detection format validity

---

## BT-NMS: Overlap Removal

### BT-NMS-01: Overlapping detections — keep higher confidence
- **Input**: 2 Detection objects at same position, confidence 0.9 and 0.5, IoU > 0.3
- **Action**: `remove_overlapping_detections()`
- **Expected**: 1 detection returned (confidence 0.9)
- **Traces**: AC: NMS IoU threshold 0.3

### BT-NMS-02: Non-overlapping detections — keep both
- **Input**: 2 Detection objects at distant positions, IoU < 0.3
- **Action**: `remove_overlapping_detections()`
- **Expected**: 2 detections returned
- **Traces**: AC: NMS preserves non-overlapping

### BT-NMS-03: Chain overlap resolution
- **Input**: 3 Detection objects: A overlaps B (IoU > 0.3), B overlaps C (IoU > 0.3), A doesn't overlap C
- **Action**: `remove_overlapping_detections()`
- **Expected**: ≤ 2 detections; highest confidence per overlapping pair kept
- **Traces**: AC: NMS handles chains
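
A greedy NMS sketch with the 0.3 IoU threshold from these scenarios; Detection is modelled here as a plain dict with centre-format boxes, not the project's actual Detection class:

```python
def iou(a, b):
    """IoU of two (x, y, w, h) centre-format boxes."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def remove_overlapping_detections(dets, iou_threshold=0.3):
    """Greedy NMS: keep highest-confidence boxes, drop overlaps above threshold."""
    kept = []
    for det in sorted(dets, key=lambda d: d["conf"], reverse=True):
        if all(iou(det["box"], k["box"]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```

Because candidates are visited in confidence order, a chain A-B-C resolves by suppressing B against the stronger box first, which is why BT-NMS-03 expects at most two survivors.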

---

## BT-AQM: Annotation Queue Message Parsing

### BT-AQM-01: Parse Created annotation message
- **Input**: Msgpack bytes matching AnnotationMessage schema (status=Created, role=Validator)
- **Action**: Decode and construct AnnotationMessage
- **Expected**: All fields populated: name, detections, image bytes, status == "Created", role == "Validator"
- **Traces**: AC: Annotation message parsing

### BT-AQM-02: Parse Validated bulk message
- **Input**: Msgpack bytes with status=Validated, list of names
- **Action**: Decode and construct AnnotationBulkMessage
- **Expected**: Status == "Validated", names list matches input
- **Traces**: AC: Bulk validation parsing

### BT-AQM-03: Parse Deleted bulk message
- **Input**: Msgpack bytes with status=Deleted, list of names
- **Action**: Decode and construct AnnotationBulkMessage
- **Expected**: Status == "Deleted", names list matches input
- **Traces**: AC: Bulk deletion parsing

### BT-AQM-04: Malformed message raises exception
- **Input**: Invalid msgpack bytes
- **Action**: Attempt to decode
- **Expected**: Exception raised
- **Traces**: AC: Error handling for malformed messages
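
The status/role dispatch that sits behind these messages (per Flow 1) can be sketched as below; msgpack decoding is elided and the target-directory strings are illustrative, not the handler's actual return values:

```python
def route(message: dict) -> str:
    """Map a decoded annotation message to its destination (sketch).

    Created/Edited messages land in /data (Validator/Admin) or /data-seed
    (Operator); bulk Validated/Deleted messages trigger directory moves.
    """
    status = message["status"]
    if status in ("Created", "Edited"):
        return "/data" if message["role"] in ("Validator", "Admin") else "/data-seed"
    if status == "Validated":
        return "move /data-seed -> /data"
    if status == "Deleted":
        return "move -> /data_deleted"
    raise ValueError(f"unknown status: {status}")
```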

---

# Test Environment

## Runtime Requirements

| Requirement | Specification |
|-------------|--------------|
| Python | 3.10+ |
| OS | Linux or macOS (POSIX filesystem paths) |
| GPU | Optional — ONNX inference falls back to CPUExecutionProvider |
| Disk | Temp directory for fixture data (~500MB for augmentation output) |
| Network | Not required (all tests are offline) |

## Execution Modes

Tests MUST be runnable in two ways:

### 1. Local (no Docker) — primary mode
Run directly on the host machine. Required for macOS development where Docker has GPU/performance limitations.

```bash
scripts/run-tests-local.sh
```

### 2. Docker — CI/portable mode
Run inside a container for reproducible CI environments (Linux-based CI runners).

```bash
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit
```

Both modes run the same pytest suite; the only difference is the runtime environment.

## Dependencies

All test dependencies are a subset of the production `requirements.txt` plus pytest:

| Package | Purpose |
|---------|---------|
| pytest | Test runner |
| albumentations | Augmentation tests |
| opencv-python-headless | Image I/O (headless — no GUI) |
| numpy | Array operations |
| onnxruntime | ONNX inference (CPU fallback) |
| cryptography | Encryption tests |
| msgpack | Annotation queue message tests |
| PyYAML | Config/YAML generation tests |

## Fixture Data

| Fixture | Location | Size |
|---------|----------|------|
| 100 annotated images | `_docs/00_problem/input_data/dataset/images/` | ~50MB |
| 100 YOLO labels | `_docs/00_problem/input_data/dataset/labels/` | ~10KB |
| ONNX model | `_docs/00_problem/input_data/azaion.onnx` | 81MB |
| Class definitions | `classes.json` (project root) | 2KB |

## Test Isolation

- Each test creates a temporary directory (via `tmp_path` pytest fixture) for filesystem operations
- No tests modify the actual `/azaion/` directory structure
- No tests require running external services (RabbitMQ, Azaion API, S3 CDN)
- Constants paths are patched/overridden to point to temp directories during tests

## Excluded (Require External Services)

| Component | Service Required | Reason for Exclusion |
|-----------|-----------------|---------------------|
| API upload/download | Azaion REST API | No mock server; real API has auth |
| CDN upload/download | S3-compatible CDN | No mock S3; real CDN has credentials |
| Queue consumption | RabbitMQ Streams | No mock broker; rstream requires live connection |
| TensorRT inference | NVIDIA GPU + TensorRT | Hardware-specific; cannot run in CI without GPU |

---

# Performance Test Scenarios

## PT-AUG-01: Augmentation throughput
- **Input**: 10 images from fixture dataset
- **Action**: Run `augment_annotations()`, measure wall time
- **Expected**: Completes within 60 seconds (10 images × 8 outputs = 80 files)
- **Traces**: Restriction: Augmentation runs continuously
- **Note**: Threshold is generous; actual performance depends on CPU

## PT-AUG-02: Parallel augmentation speedup
- **Input**: 10 images from fixture dataset
- **Action**: Run with ThreadPoolExecutor vs sequential, compare times
- **Expected**: Parallel is ≥ 1.5× faster than sequential
- **Traces**: AC: Parallelized per-image processing

## PT-DSF-01: Dataset formation throughput
- **Input**: 100 images + labels
- **Action**: Run `form_dataset()`, measure wall time
- **Expected**: Completes within 30 seconds
- **Traces**: Restriction: Dataset formation before training

## PT-ENC-01: Encryption throughput
- **Input**: 10MB random bytes
- **Action**: Encrypt + decrypt roundtrip, measure wall time
- **Expected**: Completes within 5 seconds
- **Traces**: AC: Model encryption feasible for large models

## PT-INF-01: ONNX inference latency (single image)
- **Input**: 1 preprocessed image + ONNX model
- **Action**: Run single inference, measure wall time
- **Expected**: Completes within 10 seconds on CPU (no GPU requirement for test)
- **Traces**: AC: Inference capability
- **Note**: Production uses GPU; CPU is slower but validates correctness

---

# Resilience Test Scenarios

## RT-AUG-01: Augmentation handles corrupted image gracefully
- **Input**: 1 valid image + 1 corrupted image file (truncated JPEG) in data/ dir
- **Action**: Run `augment_annotations()`
- **Expected**: Valid image produces 8 outputs; corrupted image skipped without crashing pipeline; total output: 8 files
- **Traces**: Restriction: Augmentation exception handling per-image

## RT-AUG-02: Augmentation handles missing label file
- **Input**: 1 image with no matching label file
- **Action**: Run `augment_annotation()` on the image
- **Expected**: Exception caught per-thread; does not crash pipeline
- **Traces**: Restriction: Augmentation exception handling

## RT-AUG-03: Augmentation transform failure produces fewer variants
- **Input**: 1 image + label that causes some transforms to fail (extremely narrow bbox)
- **Action**: Run `augment_inner()`
- **Expected**: Returns 1-8 ImageLabel objects (original always present; failed variants skipped); no crash
- **Traces**: Restriction: Transform failure handling

## RT-DSF-01: Dataset formation with empty processed directory
- **Input**: Empty processed images dir
- **Action**: Run `form_dataset()`
- **Expected**: Creates empty train/valid/test directories; no crash
- **Traces**: Restriction: Edge case handling

## RT-ENC-01: Decrypt with corrupted ciphertext
- **Input**: Randomly modified ciphertext bytes
- **Action**: `Security.decrypt_to(corrupted_bytes, key)`
- **Expected**: Either raises exception or returns garbage bytes (not original)
- **Traces**: AC: Encryption integrity

## RT-AQM-01: Malformed msgpack message
- **Input**: Random bytes that aren't valid msgpack
- **Action**: Pass to message handler
- **Expected**: Exception caught; handler doesn't crash
- **Traces**: AC: Error handling for malformed messages

---

# Resource Limit Test Scenarios

## RL-AUG-01: Augmentation output count bounded
- **Input**: 1 image
- **Action**: Run `augment_inner()`
- **Expected**: Returns exactly 8 outputs (never more, even with retries)
- **Traces**: AC: 8× augmentation ratio (1 original + 7 augmented)

## RL-DSF-01: Dataset split ratios sum to 100%
- **Input**: Any number of images
- **Action**: Check `train_set + valid_set + test_set`
- **Expected**: Equals 100
- **Traces**: AC: 70/20/10 split

## RL-DSF-02: No data duplication across splits
- **Input**: 100 images
- **Action**: Run `form_dataset()`, collect all filenames across train/valid/test
- **Expected**: No filename appears in more than one split
- **Traces**: AC: Dataset integrity

## RL-ENC-01: Encrypted output size bounded
- **Input**: N bytes plaintext
- **Action**: Encrypt
- **Expected**: Ciphertext size ≤ N + 32 bytes (16 IV + up to 16 padding)
- **Traces**: Restriction: AES-256-CBC overhead

## RL-CLS-01: Total class count is exactly 80
- **Input**: `classes.json`
- **Action**: Generate class list for YAML
- **Expected**: Exactly 80 entries (17 named × 3 weather + 29 placeholders = 80)
- **Traces**: AC: 80 total class slots

---

# Security Test Scenarios

## ST-ENC-01: Encryption produces different ciphertext each time (random IV)
- **Input**: Same 1024 bytes, same key, encrypt twice
- **Action**: Compare two ciphertexts
- **Expected**: Ciphertexts differ (random IV ensures non-deterministic output)
- **Traces**: AC: AES-256-CBC with random IV

## ST-ENC-02: Wrong key cannot recover plaintext
- **Input**: Encrypt with "key-a", attempt decrypt with "key-b"
- **Action**: `Security.decrypt_to(encrypted, "key-b")`
- **Expected**: Output != original plaintext
- **Traces**: AC: Key-dependent encryption

## ST-ENC-03: Model encryption key is deterministic
- **Input**: Call `Security.get_model_encryption_key()` twice
- **Action**: Compare results
- **Expected**: Identical strings
- **Traces**: AC: Static model encryption key

## ST-HSH-01: Hardware hash is deterministic for same input
- **Input**: Same hardware info string
- **Action**: `Security.get_hw_hash()` called twice
- **Expected**: Identical output
- **Traces**: AC: Hardware fingerprinting determinism

## ST-HSH-02: Different hardware produces different hash
- **Input**: Two different hardware info strings
- **Action**: `Security.get_hw_hash()` on each
- **Expected**: Different outputs
- **Traces**: AC: Hardware-bound uniqueness

## ST-HSH-03: API encryption key depends on credentials + hardware
- **Input**: Same credentials with different hardware hashes
- **Action**: `Security.get_api_encryption_key()` for each
- **Expected**: Different keys
- **Traces**: AC: Hardware-bound API encryption

## ST-HSH-04: API encryption key depends on credentials
- **Input**: Different credentials with same hardware hash
- **Action**: `Security.get_api_encryption_key()` for each
- **Expected**: Different keys
- **Traces**: AC: Credential-dependent API encryption
@@ -0,0 +1,26 @@
# Test Data Management

## Fixture Sources

| ID | Data Item | Source | Format | Preparation |
|----|-----------|--------|--------|-------------|
| FD-01 | Annotated images (100) | `_docs/00_problem/input_data/dataset/images/` | JPEG | Copy subset to tmp_path at test start |
| FD-02 | YOLO labels (100) | `_docs/00_problem/input_data/dataset/labels/` | TXT | Copy subset to tmp_path at test start |
| FD-03 | ONNX model | `_docs/00_problem/input_data/azaion.onnx` | ONNX | Read bytes at test start |
| FD-04 | Class definitions | `classes.json` (project root) | JSON | Copy to tmp_path at test start |
| FD-05 | Corrupted labels | Generated at test time | TXT | Create labels with coords > 1.0 |
| FD-06 | Edge-case bboxes | Generated at test time | In-memory | Construct bboxes near image boundaries |
| FD-07 | Detection objects | Generated at test time | In-memory | Construct Detection instances for NMS tests |
| FD-08 | Msgpack messages | Generated at test time | bytes | Construct AnnotationMessage-compatible msgpack |
| FD-09 | Random binary data | Generated at test time | bytes | `os.urandom(N)` for encryption tests |
| FD-10 | Empty label file | Generated at test time | TXT | Empty file for augmentation edge case |
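FD-05 and FD-10 are synthesized at test time rather than stored. A sketch of how such fixtures might be generated (the filenames are illustrative; the `class x_center y_center width height` line layout is the standard YOLO label convention):

```python
from pathlib import Path
import tempfile

def write_corrupted_label(labels_dir: Path) -> Path:
    # FD-05: YOLO label whose normalized coords exceed 1.0,
    # which the corrupted-label filter is expected to reject.
    path = labels_dir / "corrupted.txt"
    path.write_text("0 1.5 0.5 2.0 0.3\n")  # x_center and width out of [0, 1]
    return path

def write_empty_label(labels_dir: Path) -> Path:
    # FD-10: empty label file for the augmentation edge case.
    path = labels_dir / "empty.txt"
    path.write_text("")
    return path

with tempfile.TemporaryDirectory() as tmp:
    bad = write_corrupted_label(Path(tmp))
    coords = [float(v) for v in bad.read_text().split()[1:]]
    assert any(c > 1.0 for c in coords)  # at least one coordinate is invalid
    assert write_empty_label(Path(tmp)).read_text() == ""
```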
## Data Lifecycle

1. **Setup**: pytest `conftest.py` copies fixture files to `tmp_path`
2. **Execution**: Tests operate on copied data in isolation
3. **Teardown**: `tmp_path` is automatically cleaned by pytest
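This lifecycle maps directly onto a pytest fixture. A sketch of what the `conftest.py` setup might look like — the `dataset_dir` fixture name, the helper, and the source path are assumptions, not the repository's actual code:

```python
# conftest.py sketch -- names and source paths are assumptions,
# not the repository's actual fixture code.
import shutil
from pathlib import Path
import pytest

SOURCE = Path("_docs/00_problem/input_data/dataset")  # assumed fixture source

def copy_dataset_subset(src_root: Path, dst_root: Path, limit: int = 100) -> Path:
    # Setup step: copy up to `limit` images/labels (FD-01, FD-02)
    # into an isolated directory so tests never touch the originals.
    for sub in ("images", "labels"):
        dst = dst_root / sub
        dst.mkdir(exist_ok=True)
        src = src_root / sub
        if src.exists():
            for f in sorted(src.iterdir())[:limit]:
                shutil.copy(f, dst / f.name)
    return dst_root

@pytest.fixture
def dataset_dir(tmp_path: Path) -> Path:
    # Execution happens in the test body; teardown is implicit --
    # pytest removes tmp_path automatically after each test.
    return copy_dataset_subset(SOURCE, tmp_path)
```

Keeping the copy logic in a plain function makes the setup step testable on its own, outside the fixture machinery.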
## Expected Results Location

All expected results are defined in `_docs/00_problem/input_data/expected_results/results_report.md` (37 test scenarios mapped).
@@ -0,0 +1,67 @@
# Traceability Matrix

## Acceptance Criteria Coverage

| AC / Restriction | Test IDs | Coverage |
|------------------|----------|----------|
| 8× augmentation ratio | BT-AUG-01, BT-AUG-06, BT-AUG-07, RL-AUG-01 | Full |
| Augmentation naming convention | BT-AUG-02 | Full |
| Bounding boxes clipped to [0,1] | BT-AUG-03, BT-AUG-04 | Full |
| Tiny bboxes (< 0.01) discarded | BT-AUG-05 | Full |
| Augmentation skips already-processed | BT-AUG-08 | Full |
| Augmentation parallelized | PT-AUG-02 | Full |
| Augmentation handles corrupted images | RT-AUG-01 | Full |
| Augmentation handles missing labels | RT-AUG-02 | Full |
| Transform failure graceful | RT-AUG-03 | Full |
| Dataset split 70/20/10 | BT-DSF-01, RL-DSF-01 | Full |
| Dataset directory structure | BT-DSF-02 | Full |
| Dataset integrity (no data loss) | BT-DSF-03, RL-DSF-02 | Full |
| Corrupted label filtering | BT-DSF-04, BT-LBL-01 to BT-LBL-05 | Full |
| AES-256-CBC encryption | BT-ENC-01 to BT-ENC-06, ST-ENC-01, ST-ENC-02 | Full |
| Model encryption roundtrip | BT-ENC-02 | Full |
| Model split ≤3KB or 20% | BT-SPL-01, BT-SPL-02 | Full |
| 17 base classes | BT-CLS-01 | Full |
| 3 weather modes (Norm/Wint/Night) | BT-CLS-02 | Full |
| 80 total class slots | BT-CLS-03, RL-CLS-01 | Full |
| YAML generation (nc: 80) | BT-CLS-03 | Full |
| Hardware hash determinism | BT-HSH-01 to BT-HSH-03, ST-HSH-01, ST-HSH-02 | Full |
| Hardware-bound API encryption | ST-HSH-03, ST-HSH-04 | Full |
| ONNX inference loads model | BT-INF-01 | Full |
| ONNX inference returns detections | BT-INF-02, BT-INF-03 | Full |
| NMS overlap removal (IoU 0.3) | BT-NMS-01, BT-NMS-02, BT-NMS-03 | Full |
| Annotation message parsing | BT-AQM-01 to BT-AQM-04, RT-AQM-01 | Full |
| Encryption size overhead bounded | RL-ENC-01 | Full |
| Static model encryption key | ST-ENC-03 | Full |
| Random IV per encryption | ST-ENC-01 | Full |
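Two of the geometric ACs in the table — bounding boxes clipped to [0,1] with tiny boxes (< 0.01) discarded, and NMS overlap removal at IoU 0.3 — can be illustrated with a stdlib-only sketch. Boxes here are normalized `(x1, y1, x2, y2)` tuples; the project's actual Detection type and box layout live in the inference code and may differ:

```python
def clip_and_filter(boxes, min_size=0.01):
    # Clip normalized (x1, y1, x2, y2) boxes to [0, 1] and drop
    # boxes whose clipped width or height falls below min_size.
    out = []
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(0.0, x1), max(0.0, y1)
        x2, y2 = min(1.0, x2), min(1.0, y2)
        if x2 - x1 >= min_size and y2 - y1 >= min_size:
            out.append((x1, y1, x2, y2))
    return out

def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(dets, iou_thr=0.3):
    # dets: (score, box) pairs; greedy NMS keeps the highest-scoring
    # box and suppresses remaining boxes overlapping it above iou_thr.
    kept = []
    for score, box in sorted(dets, reverse=True):
        if all(iou(box, k) <= iou_thr for _, k in kept):
            kept.append((score, box))
    return kept
```

BT-NMS-01 to BT-NMS-03 would then assert on `nms` outputs for overlapping, non-overlapping, and boundary-IoU detections.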
## Uncovered (Require External Services)

| AC / Restriction | Reason |
|------------------|--------|
| TensorRT inference (54s for 200s video) | Requires NVIDIA GPU + TensorRT runtime |
| API upload/download with JWT auth | Requires live Azaion API |
| CDN upload/download (S3) | Requires live S3-compatible CDN |
| Queue offset persistence | Requires live RabbitMQ Streams |
| Auto-relogin on 401/403 | Requires live Azaion API |
| Frame sampling every 4th frame | Requires video file (fixture not provided) |
| Confidence threshold 0.3 filtering | Partially covered by BT-INF-03 (validates range, not exact threshold) |
## Summary

| Metric | Value |
|--------|-------|
| Total AC + Restrictions | 36 |
| Covered by tests | 29 |
| Uncovered (external deps) | 7 |
| **Coverage** | **80.6%** |
## Test Count Summary

| Category | Count |
|----------|-------|
| Blackbox tests | 32 |
| Performance tests | 5 |
| Resilience tests | 6 |
| Security tests | 7 |
| Resource limit tests | 5 |
| **Total** | **55** |