Mirror of https://github.com/azaion/ai-training.git (synced 2026-04-22 21:46:35 +00:00)
Component: Training Pipeline
Overview
End-to-end YOLOv11 object detection training workflow: dataset formation from augmented annotations, model training, multi-format export (ONNX, TensorRT, RKNN), and encrypted model upload.
Pattern: Pipeline / orchestrator
Upstream: Core, Security, API & CDN, Data Models, Data Pipeline (augmented images)
Downstream: None (produces trained models consumed externally)
Modules
- `train`: main pipeline (dataset formation → YOLO training → export → upload)
- `exports`: model format conversion (ONNX, TensorRT, RKNN) plus upload utilities
- `manual_run`: ad-hoc developer script for running selective pipeline steps
Internal Interfaces
train
form_dataset() -> None
copy_annotations(images, folder: str) -> None
check_label(label_path: str) -> bool
create_yaml() -> None
resume_training(last_pt_path: str) -> None
train_dataset() -> None
export_current_model() -> None
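A minimal sketch of the label-validation step described below, assuming the YOLO text format (`class_id x_center y_center width height`, coordinates normalised to [0, 1]); the real `check_label` lives in `train.py` and may differ in detail:

```python
from pathlib import Path


def check_label(label_path: str) -> bool:
    """Return True if every YOLO coordinate in the label file is <= 1.0."""
    for line in Path(label_path).read_text().splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # parts[0] is the class id; the rest are normalised coordinates
        if any(float(value) > 1.0 for value in parts[1:]):
            return False
    return True
```

Files that fail this check are what the pipeline moves to the corrupted-data folder.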
exports
export_rknn(model_path: str) -> None
export_onnx(model_path: str, batch_size: int = 4) -> None
export_tensorrt(model_path: str) -> None
form_data_sample(destination_path: str, size: int = 500, write_txt_log: bool = False) -> None
show_model(model: str = None) -> None
upload_model(model_path: str, filename: str, size_small_in_kb: int = 3) -> None
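A hedged sketch of what `form_data_sample` plausibly does, based on its signature: copy a random sample of images into a destination folder, optionally writing a text log of the chosen files. The `source_path` parameter and `sample_log.txt` filename are assumptions added for a self-contained example; the real function presumably reads from the configured processed-images directory:

```python
import random
import shutil
from pathlib import Path


def form_data_sample(source_path: str, destination_path: str,
                     size: int = 500, write_txt_log: bool = False) -> None:
    """Copy a random sample of up to `size` images into destination_path."""
    images = sorted(Path(source_path).glob("*.jpg"))
    sample = random.sample(images, min(size, len(images)))
    dest = Path(destination_path)
    dest.mkdir(parents=True, exist_ok=True)
    for image in sample:
        shutil.copy2(image, dest / image.name)
    if write_txt_log:
        # record which files went into the sample (assumed log filename)
        (dest / "sample_log.txt").write_text(
            "\n".join(image.name for image in sample))
```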
Data Access Patterns
- Input: Reads augmented images from `/azaion/data-processed/images/` plus labels
- Dataset output: Creates a dated dataset at `/azaion/datasets/azaion-{YYYY-MM-DD}/` with train/valid/test splits
- Model output: Saves trained models to `/azaion/models/azaion-{YYYY-MM-DD}/`, copies `best.pt` to `/azaion/models/azaion.pt`
- Upload: Encrypted model uploaded as split big/small parts to CDN + API
- Corrupted data: Invalid labels moved to `/azaion/data-corrupted/`
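The dated dataset layout above can be sketched with a small helper (hypothetical name; the per-split `images/` and `labels/` subfolders are an assumption based on the standard YOLO layout):

```python
from datetime import date
from pathlib import Path


def make_dataset_dirs(root: str = "/azaion/datasets") -> Path:
    """Create the dated dataset skeleton with train/valid/test splits."""
    dataset = Path(root) / f"azaion-{date.today():%Y-%m-%d}"
    for split in ("train", "valid", "test"):
        # each split holds parallel images/ and labels/ folders (assumed layout)
        (dataset / split / "images").mkdir(parents=True, exist_ok=True)
        (dataset / split / "labels").mkdir(parents=True, exist_ok=True)
    return dataset
```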
Implementation Details
- Dataset split: 70% train / 20% valid / 10% test (random shuffle)
- Label validation: `check_label()` verifies all YOLO coordinates are ≤ 1.0
- YAML generation: Writes `data.yaml` with 80 class names (17 actual from `classes.json` × 3 weather modes, rest as placeholders)
- Training config: YOLOv11 medium (`yolo11m.yaml`), epochs=120, batch=11 (tuned for 24 GB VRAM), imgsz=1280, save_period=1, workers=24
- Post-training: Removes intermediate epoch checkpoints, keeps only `best.pt`
- Export chain: `.pt` → ONNX (1280 px, batch=4, NMS) → encrypted → split → upload
- TensorRT export: batch=4, FP16, NMS, simplify
- RKNN export: targets the RK3588 SoC (Orange Pi 5)
- Concurrent file copying: ThreadPoolExecutor for parallel image/label copying during dataset formation
- Entry point: `__main__` in `train.py` runs `train_dataset()` → `export_current_model()`
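The shuffle split and the concurrent file copy can be sketched as follows (hypothetical helper names; the real logic lives in `form_dataset` and `copy_annotations`, and the worker count here is illustrative):

```python
import random
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_dataset(images, seed=None):
    """Shuffle and split file paths into 70% train / 20% valid / 10% test."""
    images = list(images)
    random.Random(seed).shuffle(images)
    n = len(images)
    train_end = int(n * 0.7)
    valid_end = int(n * 0.9)
    return {
        "train": images[:train_end],
        "valid": images[train_end:valid_end],
        "test": images[valid_end:],
    }


def copy_split(split, dataset_root):
    """Copy each split's files in parallel with a thread pool."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        for name, files in split.items():
            folder = Path(dataset_root) / name
            folder.mkdir(parents=True, exist_ok=True)
            for src in files:
                # submit each copy; the with-block waits for all of them
                pool.submit(shutil.copy2, src, folder / Path(src).name)
```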
Caveats
- Training hyperparameters are hardcoded (not configurable via config file)
- `old_images_percentage = 75` is declared but unused
- `train.py` imports `subprocess` and `sleep` but doesn't use them
- `train.py` imports `OnnxEngine` but doesn't use it
- `exports.upload_model()` creates `ApiClient` with a different constructor signature than the one in `api_client.py` (likely stale code)
- `copy_annotations` uses a global `total_files_copied` counter alongside a local `copied` variable that stays at 0 (reporting bug)
- `resume_training` passes `yaml` (the module) instead of a YAML file path in the `data` parameter
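One way the counter reporting bug could be fixed is a single lock-protected counter shared by the copy workers, instead of a module-level global plus a dead local; a sketch under that assumption (class and variable names are hypothetical):

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CopyCounter:
    """Thread-safe counter for files copied by worker threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def increment(self) -> int:
        # take the lock so concurrent copy workers never lose an update
        with self._lock:
            self.value += 1
            return self.value


counter = CopyCounter()
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(100):
        pool.submit(counter.increment)
# after the with-block all submitted increments have completed
```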
Dependency Graph
graph TD
constants --> train
constants --> exports
api_client --> train
api_client --> exports
cdn_manager --> train
cdn_manager --> exports
security --> train
security --> exports
utils --> train
utils --> exports
dto_annotationClass[dto/annotationClass] --> train
inference_onnx[inference/onnx_engine] --> train
exports --> train
train --> manual_run
augmentation --> manual_run
Logging Strategy
Print statements for progress (file count, shuffling status, training results). No structured logging.