
Component: Training Pipeline

Overview

End-to-end YOLOv11 object detection training workflow: dataset formation from augmented annotations, model training, multi-format export (ONNX, TensorRT, RKNN), and encrypted model upload.

Pattern: Pipeline / orchestrator
Upstream: Core, Security, API & CDN, Data Models, Data Pipeline (augmented images)
Downstream: None (produces trained models consumed externally)

Modules

  • train — main pipeline: dataset formation → YOLO training → export → upload
  • exports — model format conversion (ONNX, TensorRT, RKNN) + upload utilities
  • manual_run — ad-hoc developer script for selective pipeline steps

Internal Interfaces

train

form_dataset() -> None
copy_annotations(images, folder: str) -> None
check_label(label_path: str) -> bool
create_yaml() -> None
resume_training(last_pt_path: str) -> None
train_dataset() -> None
export_current_model() -> None
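The label check is described later (Implementation Details) as verifying that all YOLO coordinates are ≤ 1.0. A minimal sketch of what check_label could look like — only the signature comes from this doc, the body is an assumption:

```python
from pathlib import Path

def check_label(label_path: str) -> bool:
    """Return True if every normalised YOLO coordinate lies in [0, 1].

    Sketch only: the doc states just that coordinates <= 1.0 are verified;
    the real implementation may differ.
    """
    for line in Path(label_path).read_text().splitlines():
        parts = line.split()
        if not parts:
            continue
        # YOLO label format: class_id cx cy w h (all values normalised)
        if any(not 0.0 <= float(v) <= 1.0 for v in parts[1:]):
            return False
    return True
```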

exports

export_rknn(model_path: str) -> None
export_onnx(model_path: str, batch_size: int = 4) -> None
export_tensorrt(model_path: str) -> None
form_data_sample(destination_path: str, size: int = 500, write_txt_log: bool = False) -> None
show_model(model: str = None) -> None
upload_model(model_path: str, filename: str, size_small_in_kb: int = 3) -> None
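upload_model() splits the encrypted model into a "small" part (size_small_in_kb, default 3) and a "big" remainder before upload. A hypothetical helper illustrating that split — the function name and return shape are assumptions, only the parameter comes from the interface above:

```python
from pathlib import Path

def split_for_upload(model_path: str, size_small_in_kb: int = 3) -> tuple[bytes, bytes]:
    """Split an encrypted model file into a small head and a big tail.

    Hypothetical sketch of upload_model()'s big/small split: the first
    size_small_in_kb KiB form the "small" part (sent to the API), the
    remainder the "big" part (sent to the CDN).
    """
    data = Path(model_path).read_bytes()
    cut = size_small_in_kb * 1024
    return data[:cut], data[cut:]
```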

Data Access Patterns

  • Input: Reads augmented images from /azaion/data-processed/images/ + labels
  • Dataset output: Creates dated dataset at /azaion/datasets/azaion-{YYYY-MM-DD}/ with train/valid/test splits
  • Model output: Saves trained models to /azaion/models/azaion-{YYYY-MM-DD}/, copies best.pt to /azaion/models/azaion.pt
  • Upload: Encrypted model uploaded as split big/small to CDN + API
  • Corrupted data: Invalid labels moved to /azaion/data-corrupted/

Implementation Details

  • Dataset split: 70% train / 20% valid / 10% test (random shuffle)
  • Label validation: check_label() verifies all YOLO coordinates are ≤ 1.0
  • YAML generation: Writes data.yaml with 80 class names (17 actual from classes.json × 3 weather modes, rest as placeholders)
  • Training config: YOLOv11 medium (yolo11m.yaml), epochs=120, batch=11 (tuned for 24GB VRAM), imgsz=1280, save_period=1, workers=24
  • Post-training: Removes intermediate epoch checkpoints, keeps only best.pt
  • Export chain: .pt → ONNX (1280px, batch=4, NMS) → encrypted → split → upload
  • TensorRT export: batch=4, FP16, NMS, simplify
  • RKNN export: targets RK3588 SoC (OrangePi5)
  • Concurrent file copying: ThreadPoolExecutor for parallel image/label copying during dataset formation
  • __main__ in train.py: train_dataset() → export_current_model()

Caveats

  • Training hyperparameters are hardcoded (not configurable via config file)
  • old_images_percentage = 75 declared but unused
  • train.py imports subprocess, sleep but doesn't use them
  • train.py imports OnnxEngine but doesn't use it
  • exports.upload_model() creates ApiClient with different constructor signature than the one in api_client.py — likely stale code
  • copy_annotations increments a global total_files_copied counter but reports a local copied variable that is never incremented and stays at 0 — reporting bug
  • resume_training passes yaml (the imported module) instead of a YAML file path as the data argument
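One way the copy-counter bug noted above could be fixed is to derive the count from completed copy tasks instead of mixing a global with a dead local. A hypothetical sketch (function name and worker count are assumptions):

```python
import shutil
from concurrent.futures import ThreadPoolExecutor

def copy_files(paths: list[str], dest_folder: str) -> int:
    """Copy files in parallel and return an accurate count of copies made.

    Hypothetical fix sketch: each completed task contributes 1 to the total,
    so no shared mutable counter is needed.
    """
    def _copy(src: str) -> int:
        shutil.copy(src, dest_folder)
        return 1

    with ThreadPoolExecutor(max_workers=8) as pool:
        return sum(pool.map(_copy, paths))
```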

Dependency Graph

graph TD
    constants --> train
    constants --> exports
    api_client --> train
    api_client --> exports
    cdn_manager --> train
    cdn_manager --> exports
    security --> train
    security --> exports
    utils --> train
    utils --> exports
    dto_annotationClass[dto/annotationClass] --> train
    inference_onnx[inference/onnx_engine] --> train
    exports --> train
    train --> manual_run
    augmentation --> manual_run

Logging Strategy

Print statements for progress (file count, shuffling status, training results). No structured logging.