Mirror of https://github.com/azaion/ai-training.git, synced 2026-04-22 13:26:35 +00:00
Refactor constants management to use Pydantic BaseModel for configuration
- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
# Component: Training Pipeline

## Overview

End-to-end YOLOv11 object detection training workflow: dataset formation from augmented annotations, model training, multi-format export (ONNX, TensorRT, RKNN), and encrypted model upload.

**Pattern**: Pipeline / orchestrator

**Upstream**: Core, Security, API & CDN, Data Models, Data Pipeline (augmented images)

**Downstream**: None (produces trained models consumed externally)
## Modules

- `train` — main pipeline: dataset formation → YOLO training → export → upload
- `exports` — model format conversion (ONNX, TensorRT, RKNN) + upload utilities
- `manual_run` — ad-hoc developer script for selective pipeline steps
## Internal Interfaces

### train

```python
form_dataset() -> None
copy_annotations(images, folder: str) -> None
check_label(label_path: str) -> bool
create_yaml() -> None
resume_training(last_pt_path: str) -> None
train_dataset() -> None
export_current_model() -> None
```
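Given the `check_label(label_path: str) -> bool` signature and the note under Implementation Details that it verifies all YOLO coordinates are ≤ 1.0, a plausible sketch is the following (assuming the standard YOLO label format of `class x_center y_center width height` per line, which the source does not confirm):

```python
def check_label(label_path: str) -> bool:
    """Return True if every normalized YOLO coordinate in the file is <= 1.0."""
    with open(label_path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # tolerate blank lines
            # parts[0] is the class id; the remaining fields are
            # normalized coordinates that must not exceed 1.0
            for value in parts[1:]:
                if float(value) > 1.0:
                    return False
    return True
```

Files failing a check like this are what the pipeline moves to `/azaion/data-corrupted/`.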
### exports

```python
export_rknn(model_path: str) -> None
export_onnx(model_path: str, batch_size: int = 4) -> None
export_tensorrt(model_path: str) -> None
form_data_sample(destination_path: str, size: int = 500, write_txt_log: bool = False) -> None
show_model(model: str = None) -> None
upload_model(model_path: str, filename: str, size_small_in_kb: int = 3) -> None
```
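`upload_model`'s `size_small_in_kb` parameter suggests the encrypted model is cut into a small head plus a big remainder before upload. A hedged sketch of that split (the exact scheme is an assumption; only the big/small terminology and the 3 KB default appear in this doc):

```python
def split_model_blob(data: bytes, size_small_in_kb: int = 3) -> tuple[bytes, bytes]:
    """Split an encrypted model blob into (small, big) parts.

    The small part holds the first `size_small_in_kb` kilobytes; the big
    part holds everything after it. Concatenating them restores the blob.
    """
    cut = size_small_in_kb * 1024
    return data[:cut], data[cut:]
```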
## Data Access Patterns

- **Input**: Reads augmented images from `/azaion/data-processed/images/` + labels
- **Dataset output**: Creates dated dataset at `/azaion/datasets/azaion-{YYYY-MM-DD}/` with train/valid/test splits
- **Model output**: Saves trained models to `/azaion/models/azaion-{YYYY-MM-DD}/`, copies `best.pt` to `/azaion/models/azaion.pt`
- **Upload**: Encrypted model uploaded as split big/small to CDN + API
- **Corrupted data**: Invalid labels moved to `/azaion/data-corrupted/`
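The dated dataset and model paths share one naming pattern, which a small helper can reproduce (a sketch; the exact format string is inferred from the `azaion-{YYYY-MM-DD}` placeholders above):

```python
from datetime import date

def dated_dir(base: str) -> str:
    """Build a dated directory path, e.g. /azaion/datasets/azaion-2026-04-22."""
    return f"{base}/azaion-{date.today():%Y-%m-%d}"
```

`dated_dir("/azaion/datasets")` and `dated_dir("/azaion/models")` would yield the dataset and model output locations respectively.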
## Implementation Details

- **Dataset split**: 70% train / 20% valid / 10% test (random shuffle)
- **Label validation**: `check_label()` verifies all YOLO coordinates are ≤ 1.0
- **YAML generation**: Writes `data.yaml` with 80 class names (17 actual from classes.json × 3 weather modes, rest as placeholders)
- **Training config**: YOLOv11 medium (`yolo11m.yaml`), epochs=120, batch=11 (tuned for 24 GB VRAM), imgsz=1280, save_period=1, workers=24
- **Post-training**: Removes intermediate epoch checkpoints, keeps only `best.pt`
- **Export chain**: `.pt` → ONNX (1280px, batch=4, NMS) → encrypted → split → upload
- **TensorRT export**: batch=4, FP16, NMS, simplify
- **RKNN export**: targets RK3588 SoC (OrangePi5)
- **Concurrent file copying**: ThreadPoolExecutor for parallel image/label copying during dataset formation
- **`__main__`** in `train.py`: `train_dataset()` → `export_current_model()`
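The 70/20/10 random split can be sketched as follows (illustrative only; the function name and the in-memory list of image stems are assumptions, and the real pipeline copies the corresponding files with a ThreadPoolExecutor afterwards):

```python
import random

def split_dataset(stems: list[str]) -> dict[str, list[str]]:
    """Shuffle image stems and split 70% / 20% / 10% into train/valid/test."""
    stems = stems[:]          # do not mutate the caller's list
    random.shuffle(stems)     # random shuffle, as in the pipeline
    n = len(stems)
    n_train = n * 7 // 10     # 70% train
    n_valid = n * 2 // 10     # 20% valid; the remainder (~10%) is test
    return {
        "train": stems[:n_train],
        "valid": stems[n_train:n_train + n_valid],
        "test": stems[n_train + n_valid:],
    }
```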
## Caveats

- Training hyperparameters are hardcoded (not configurable via config file)
- `old_images_percentage = 75` declared but unused
- `train.py` imports `subprocess` and `sleep` but doesn't use them
- `train.py` imports `OnnxEngine` but doesn't use it
- `exports.upload_model()` creates `ApiClient` with a different constructor signature than the one in `api_client.py` — likely stale code
- `copy_annotations` uses a global `total_files_copied` counter alongside a local `copied` variable that stays at 0 — reporting bug
- `resume_training` references `yaml` (the module) instead of a YAML file path in the `data` parameter
## Dependency Graph

```mermaid
graph TD
    constants --> train
    constants --> exports
    api_client --> train
    api_client --> exports
    cdn_manager --> train
    cdn_manager --> exports
    security --> train
    security --> exports
    utils --> train
    utils --> exports
    dto_annotationClass[dto/annotationClass] --> train
    inference_onnx[inference/onnx_engine] --> train
    exports --> train
    train --> manual_run
    augmentation --> manual_run
```
## Logging Strategy

Print statements for progress (file count, shuffling status, training results). No structured logging.