mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-22 21:06:35 +00:00
Refactor constants management to use Pydantic BaseModel for configuration
- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
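
To illustrate the change described in this message, a Pydantic-based configuration might look like the minimal sketch below. The field names (`root`, `data_dir`, `models_dir`), the `/azaion` default, and the `load_config` helper are assumptions made for the example; they are not taken from constants.py.

```python
from pathlib import Path
from typing import Optional

from pydantic import BaseModel, Field


class DirsConfig(BaseModel):
    """Hypothetical directory settings; the field names are illustrative only."""

    root: Path = Path("/azaion")

    @property
    def data_dir(self) -> Path:
        # Derived paths hang off one root instead of separate module-level variables.
        return self.root / "data"

    @property
    def models_dir(self) -> Path:
        return self.root / "models"


class Config(BaseModel):
    """Top-level config object (assumed shape) that groups related settings."""

    dirs: DirsConfig = Field(default_factory=DirsConfig)


def load_config(overrides: Optional[dict] = None) -> Config:
    # Pydantic validates and coerces the values (for example str -> Path).
    return Config(**(overrides or {}))


config = load_config({"dirs": {"root": "/tmp/azaion"}})
print(config.dirs.data_dir)
```

A caller would then read `config.dirs.data_dir` where it previously imported a module-level path variable, which keeps every path derived from a single validated root.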
@@ -0,0 +1,205 @@
# Codebase Discovery

## Directory Tree

```
ai-training/
├── annotation-queue/ # Separate sub-service: annotation message queue consumer
│ ├── annotation_queue_dto.py
│ ├── annotation_queue_handler.py
│ ├── classes.json
│ ├── config.yaml
│ ├── offset.yaml
│ ├── requirements.txt
│ └── run.sh
├── dto/ # Data transfer objects for the training pipeline
│ ├── annotationClass.py
│ ├── annotation_bulk_message.py (empty)
│ ├── annotation_message.py (empty)
│ └── imageLabel.py
├── inference/ # Inference engine subsystem (ONNX + TensorRT)
│ ├── __init__.py (empty)
│ ├── dto.py
│ ├── inference.py
│ ├── onnx_engine.py
│ └── tensorrt_engine.py
├── orangepi5/ # Setup scripts for OrangePi5 edge device
│ ├── 01 install.sh
│ ├── 02 install-inference.sh
│ └── 03 run_inference.sh
├── scripts/
│ └── init-sftp.sh
├── tests/
│ ├── data.yaml
│ ├── imagelabel_visualize_test.py
│ ├── libomp140.x86_64.dll (binary workaround for Windows)
│ └── security_test.py
├── api_client.py # API client for Azaion backend + CDN resource management
├── augmentation.py # Image augmentation pipeline (albumentations)
├── cdn_manager.py # S3-compatible CDN upload/download via boto3
├── cdn.yaml # CDN credentials config
├── checkpoint.txt # Last training checkpoint timestamp
├── classes.json # Annotation class definitions (17 classes + weather modes)
├── config.yaml # Main config (API url, queue, directories)
├── constants.py # Shared path constants and config keys
├── convert-annotations.py # Annotation format converter (Pascal VOC / bbox → YOLO)
├── dataset-visualiser.py # Interactive dataset visualization tool
├── exports.py # Model export (ONNX, TensorRT, RKNN) and upload
├── hardware_service.py # Hardware fingerprinting (CPU/GPU/RAM/drive serial)
├── install.sh # Dependency installation script
├── manual_run.py # Manual training/export entry point
├── requirements.txt # Python dependencies
├── security.py # AES-256-CBC encryption/decryption + key derivation
├── start_inference.py # Inference entry point (downloads model, runs TensorRT)
├── train.py # Main training pipeline (dataset formation → YOLO training → export)
└── utils.py # Utility classes (Dotdict)
```

## Tech Stack Summary

| Category | Technology | Details |
|----------|-----------|---------|
| Language | Python 3.10+ | Match statements used (3.10 feature) |
| ML Framework | Ultralytics (YOLO) | YOLOv11 object detection model |
| Deep Learning | PyTorch 2.3.0 (CUDA 12.1) | GPU-accelerated training |
| Inference (Primary) | TensorRT | GPU inference with FP16/INT8 support |
| Inference (Fallback) | ONNX Runtime GPU | Cross-platform inference |
| Augmentation | Albumentations | Image augmentation pipeline |
| Computer Vision | OpenCV (cv2) | Image I/O, preprocessing, visualization |
| CDN/Storage | boto3 (S3-compatible) | Model artifact storage |
| Message Queue | RabbitMQ Streams (rstream) | Annotation message consumption |
| Serialization | msgpack | Queue message deserialization |
| Encryption | cryptography (AES-256-CBC) | Model encryption, API resource encryption |
| GPU Management | pycuda, pynvml | CUDA memory management, device queries |
| HTTP | requests | API communication |
| Config | PyYAML | Configuration files |
| Visualization | matplotlib, netron | Annotation display, model graph viewer |
| Edge Deployment | RKNN (RK3588) | OrangePi5 inference target |

## Dependency Graph

### Internal Module Dependencies (textual)

**Leaves (no internal dependencies):**
- `constants`: path constants, config keys
- `utils`: Dotdict helper
- `security`: encryption/decryption, key derivation
- `hardware_service`: hardware fingerprinting
- `cdn_manager`: S3-compatible CDN client
- `dto/annotationClass`: annotation class model + JSON reader
- `dto/imageLabel`: image+labels container with visualization
- `inference/dto`: Detection, Annotation, AnnotationClass (inference-specific)
- `inference/onnx_engine`: InferenceEngine ABC + OnnxEngine implementation
- `convert-annotations`: standalone annotation format converter
- `annotation-queue/annotation_queue_dto`: queue message DTOs

**Level 1 (depends on leaves):**
- `api_client` → constants, cdn_manager, hardware_service, security
- `augmentation` → constants, dto/imageLabel
- `inference/tensorrt_engine` → inference/onnx_engine (InferenceEngine ABC)
- `inference/inference` → inference/dto, inference/onnx_engine
- `annotation-queue/annotation_queue_handler` → annotation_queue_dto

**Level 2 (depends on level 1):**
- `exports` → constants, api_client, cdn_manager, security, utils

**Level 3 (depends on level 2):**
- `train` → constants, api_client, cdn_manager, dto/annotationClass, inference/onnx_engine, security, utils, exports
- `start_inference` → constants, api_client, cdn_manager, inference/inference, inference/tensorrt_engine, security, utils

**Level 4 (depends on level 3):**
- `manual_run` → constants, train, augmentation

**Broken dependency:**
- `dataset-visualiser` → constants, dto/annotationClass, dto/imageLabel, **preprocessing** (module not found in codebase)
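
A dependency list like the one above, including the unresolved `preprocessing` import, can be derived mechanically from import statements. The following is a minimal sketch using the standard-library `ast` module. The sample source, the `read_labels` name, and the `INTERNAL`/`EXTERNAL` sets are illustrative assumptions, not the actual contents of `dataset-visualiser.py`.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y"; relative imports are skipped.
            names.add(node.module.split(".")[0])
    return names


def missing_internal(source: str, internal: set[str], external: set[str]) -> set[str]:
    """Imports that match neither a known internal module nor a known external one."""
    return imported_modules(source) - internal - external


# Illustrative stand-in for a script's imports (hypothetical, not the real file).
SAMPLE = """
import cv2
import constants
from dto.annotationClass import AnnotationClass
from dto.imageLabel import ImageLabel
from preprocessing import read_labels
"""

INTERNAL = {"constants", "dto", "utils", "security"}
EXTERNAL = {"cv2"}

print(sorted(missing_internal(SAMPLE, INTERNAL, EXTERNAL)))  # ['preprocessing']
```

Because `ast.parse` only reads the source text, the check works even when the imported packages are not installed on the machine running it.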

### Dependency Graph (Mermaid)

```mermaid
graph TD
    constants --> api_client
    constants --> augmentation
    constants --> exports
    constants --> train
    constants --> manual_run
    constants --> start_inference
    constants --> dataset-visualiser

    utils --> exports
    utils --> train
    utils --> start_inference

    security --> api_client
    security --> exports
    security --> train
    security --> start_inference

    hardware_service --> api_client

    cdn_manager --> api_client
    cdn_manager --> exports
    cdn_manager --> train
    cdn_manager --> start_inference

    api_client --> exports
    api_client --> train
    api_client --> start_inference

    dto_annotationClass[dto/annotationClass] --> train
    dto_annotationClass --> dataset-visualiser

    dto_imageLabel[dto/imageLabel] --> augmentation
    dto_imageLabel --> dataset-visualiser

    inference_dto[inference/dto] --> inference_inference[inference/inference]
    inference_onnx[inference/onnx_engine] --> inference_inference
    inference_onnx --> inference_trt[inference/tensorrt_engine]
    inference_onnx --> train

    inference_inference --> start_inference
    inference_trt --> start_inference

    exports --> train
    train --> manual_run
    augmentation --> manual_run

    aq_dto[annotation-queue/annotation_queue_dto] --> aq_handler[annotation-queue/annotation_queue_handler]
```

## Topological Processing Order

| Batch | Modules |
|-------|---------|
| 1 (leaves) | constants, utils, security, hardware_service, cdn_manager |
| 2 (leaves) | dto/annotationClass, dto/imageLabel, inference/dto, inference/onnx_engine |
| 3 (level 1) | api_client, augmentation, inference/tensorrt_engine, inference/inference |
| 4 (level 2) | exports, convert-annotations, dataset-visualiser |
| 5 (level 3) | train, start_inference |
| 6 (level 4) | manual_run |
| 7 (separate) | annotation-queue/annotation_queue_dto, annotation-queue/annotation_queue_handler |
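
An ordering of this kind can be reproduced with the standard-library `graphlib` module. The sketch below transcribes the main-pipeline dependencies from the textual list in the Dependency Graph section (the `annotation-queue` sub-service, `convert-annotations`, and `dataset-visualiser` are left out for brevity) and groups modules into batches whose dependencies are all satisfied by earlier batches. It is one possible way to compute such batches, not necessarily how the table was produced.

```python
from graphlib import TopologicalSorter

# module -> internal modules it depends on (transcribed from the textual list).
DEPS = {
    "api_client": {"constants", "cdn_manager", "hardware_service", "security"},
    "augmentation": {"constants", "dto/imageLabel"},
    "inference/tensorrt_engine": {"inference/onnx_engine"},
    "inference/inference": {"inference/dto", "inference/onnx_engine"},
    "exports": {"constants", "api_client", "cdn_manager", "security", "utils"},
    "train": {"constants", "api_client", "cdn_manager", "dto/annotationClass",
              "inference/onnx_engine", "security", "utils", "exports"},
    "start_inference": {"constants", "api_client", "cdn_manager",
                        "inference/inference", "inference/tensorrt_engine",
                        "security", "utils"},
    "manual_run": {"constants", "train", "augmentation"},
}


def batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group modules into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


for depth, modules in enumerate(batches(DEPS)):
    print(depth, ", ".join(modules))
```

With this grouping all leaves land in a single batch, and `start_inference` becomes ready in the same batch as `exports`, because none of its listed dependencies sits at level 2. Scheduling it one batch later, as the table does, is still a valid order.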

## Entry Points

| Entry Point | Description |
|-------------|-------------|
| `train.py` (`__main__`) | Main pipeline: form dataset → train YOLO → export + upload ONNX model |
| `augmentation.py` (`__main__`) | Continuous augmentation loop (runs indefinitely) |
| `start_inference.py` (`__main__`) | Download encrypted TensorRT model → run video inference |
| `manual_run.py` (script) | Ad-hoc training/export commands |
| `convert-annotations.py` (`__main__`) | One-shot annotation format conversion |
| `dataset-visualiser.py` (`__main__`) | Interactive annotation visualization |
| `annotation-queue/annotation_queue_handler.py` (`__main__`) | Async queue consumer for annotation CRUD events |

## Leaf Modules

constants, utils, security, hardware_service, cdn_manager, dto/annotationClass, dto/imageLabel, inference/dto, inference/onnx_engine, convert-annotations, annotation-queue/annotation_queue_dto

## Observations

- **Security concern**: `config.yaml` and `cdn.yaml` contain hardcoded credentials (API passwords, S3 access keys). These should be moved to environment variables or a secrets manager.
- **Missing module**: `dataset-visualiser.py` imports from `preprocessing`, which does not exist in the codebase.
- **Duplicate code**: `AnnotationClass` and `WeatherMode` are defined in three separate locations: `dto/annotationClass.py`, `inference/dto.py`, and `annotation-queue/annotation_queue_dto.py`.
- **Empty files**: `dto/annotation_bulk_message.py`, `dto/annotation_message.py`, and `inference/__init__.py` are empty.
- **Separate sub-service**: `annotation-queue/` has its own `requirements.txt` and `config.yaml`, functioning as an independent service.
- **Hardcoded encryption key**: `security.py` has a hardcoded model encryption key string.
- **No formal test framework**: tests are script-based, not using pytest/unittest.
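
For the first observation, moving credentials out of the YAML files could look roughly like the sketch below. The variable names `CDN_ACCESS_KEY` and `CDN_SECRET_KEY`, the `CdnCredentials` class, and the `load_cdn_credentials` helper are hypothetical; the project may prefer different names or a secrets manager.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CdnCredentials:
    access_key: str
    secret_key: str


def load_cdn_credentials(env=os.environ) -> CdnCredentials:
    """Read S3-style credentials from the environment (hypothetical variable names)."""
    required = ("CDN_ACCESS_KEY", "CDN_SECRET_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail fast and report only the variable names, never their values.
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return CdnCredentials(env["CDN_ACCESS_KEY"], env["CDN_SECRET_KEY"])
```

Passing the mapping as a parameter keeps the function easy to exercise in tests with a plain dictionary, without touching the real process environment.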