Mirror of https://github.com/azaion/ai-training.git, synced 2026-04-22 22:56:34 +00:00, commit 142c6c4de8
- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
Module: train
Purpose
Main training pipeline. Forms YOLO datasets from processed annotations, trains YOLOv11 models, and exports/uploads the trained model.
Public Interface
| Function | Signature | Returns | Description |
|---|---|---|---|
| `form_dataset` | `()` | - | Creates train/valid/test split from processed images |
| `copy_annotations` | `(images, folder: str)` | - | Copies image+label pairs to a dataset split folder (concurrent) |
| `check_label` | `(label_path: str) -> bool` | bool | Validates YOLO label file (all coords ≤ 1.0) |
| `create_yaml` | `()` | - | Generates YOLO data.yaml with class names from classes.json |
| `resume_training` | `(last_pt_path: str)` | - | Resumes training from a checkpoint |
| `train_dataset` | `()` | - | Full pipeline: form_dataset → create_yaml → train YOLOv11 → save model |
| `export_current_model` | `()` | - | Exports current .pt to ONNX, encrypts, uploads as split resource |
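
The label check described for `check_label` can be illustrated with a minimal sketch. It assumes the usual YOLO text format (one object per line: class id followed by normalized coordinates); the function name `label_coords_in_range` is hypothetical and this is not the module's actual implementation.

```python
def label_coords_in_range(label_path: str) -> bool:
    """Return True if no coordinate in a YOLO label file exceeds 1.0.

    Assumed line format: "<class_id> <x> <y> <w> <h>" with normalized values.
    """
    with open(label_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            # parts[0] is the class id; every remaining field is a coordinate
            if any(float(value) > 1.0 for value in parts[1:]):
                return False
    return True
```

A file for which this returns False would be treated as corrupted under the rule stated in the table.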
Internal Logic
- Dataset formation: Shuffles all processed images, splits 70/20/10 (train/valid/test). Copies in parallel via ThreadPoolExecutor. Corrupted labels (coords > 1.0) are moved to `/azaion/data-corrupted/`.
- YAML generation: Reads annotation classes from `classes.json`, builds `data.yaml` with 80 class names (17 actual + 63 placeholders "Class-N"), sets train/valid/test paths.
- Training: YOLOv11 medium (`yolo11m.yaml`), 120 epochs, batch=11 (tuned for 24GB VRAM), 1280px input, save every epoch, 24 workers.
- Post-training: Copies results to `/azaion/models/{date}/`, removes intermediate epoch checkpoints, copies `best.pt` to `CURRENT_PT_MODEL`.
- Export: Calls `export_onnx`, reads the ONNX file, encrypts with model key, uploads via `upload_big_small_resource`.
- Dataset naming: `azaion-{YYYY-MM-DD}` using current date.
- `__main__`: Runs `train_dataset()` then `export_current_model()`.
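
The dataset formation step (shuffle, 70/20/10 split, parallel copy) might look roughly like the sketch below. The helper names `split_items` and `copy_pairs`, the `images/` and `labels/` subfolder layout, the `.txt` label naming, and the worker count are all assumptions made for illustration, not details taken from the module.

```python
import random
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_items(items, train_pct=70, valid_pct=20, seed=None):
    """Shuffle items and split them into train/valid/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train_end = len(shuffled) * train_pct // 100
    valid_end = train_end + len(shuffled) * valid_pct // 100
    # Whatever remains after train and valid becomes the test set
    return shuffled[:train_end], shuffled[train_end:valid_end], shuffled[valid_end:]


def copy_pairs(image_paths, labels_dir, dest_dir, max_workers=8):
    """Copy each image and its same-named .txt label into dest_dir in parallel."""
    dest = Path(dest_dir)
    (dest / "images").mkdir(parents=True, exist_ok=True)
    (dest / "labels").mkdir(parents=True, exist_ok=True)

    def copy_one(image_path):
        image_path = Path(image_path)
        label_path = Path(labels_dir) / (image_path.stem + ".txt")
        shutil.copy(image_path, dest / "images" / image_path.name)
        shutil.copy(label_path, dest / "labels" / label_path.name)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so worker exceptions are raised here
        list(pool.map(copy_one, image_paths))
```

File copying is I/O-bound, which is why a thread pool is a reasonable fit here despite the GIL.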
Dependencies
- `constants`: all directory/path constants
- `api_client`: ApiClient for model upload
- `cdn_manager`: CDNCredentials, CDNManager (imported but CDN init done via api_client)
- `dto/annotationClass`: AnnotationClass for class name generation
- `inference/onnx_engine`: OnnxEngine (imported but unused in current code)
- `security`: model encryption key
- `utils`: Dotdict
- `exports`: export_tensorrt, upload_model, export_onnx
- `ultralytics` (external): YOLO training and export
- `yaml` (external, PyYAML): data.yaml generation
- `concurrent.futures`, `glob`, `os`, `random`, `shutil`, `subprocess`, `datetime`, `pathlib`, `time` (stdlib)
Consumers
manual_run
Data Models
Uses AnnotationClass for class definitions.
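
To make the class-name handling concrete, here is a hedged sketch of padding the real class names up to a fixed count and assembling the dictionary that would be written to `data.yaml`. The helper names, the zero-based numbering of the `Class-N` placeholders, and the `train/images`, `valid/images`, `test/images` subpaths are assumptions; the source only states that 80 names are produced, 17 of them real.

```python
def build_class_names(actual_names, total=80):
    """Pad the real class names with 'Class-N' placeholders up to `total` entries."""
    names = list(actual_names)
    # Assumption: N is the zero-based index of the placeholder slot
    names += [f"Class-{i}" for i in range(len(names), total)]
    return names


def build_data_config(dataset_root, names):
    """Assemble a YOLO-style dataset description as a plain dict."""
    return {
        "path": str(dataset_root),
        "train": "train/images",  # assumed split layout
        "val": "valid/images",
        "test": "test/images",
        "nc": len(names),
        "names": names,
    }
```

The resulting dict can be serialized with `yaml.safe_dump` from PyYAML to produce the file.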
Configuration
- Training hyperparameters hardcoded: epochs=120, batch=11, imgsz=1280, save_period=1, workers=24
- Dataset split ratios: train_set=70, valid_set=20, test_set=10
- old_images_percentage=75 (declared but unused)
- DEFAULT_CLASS_NUM=80
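
As an illustration of how the hardcoded hyperparameters above map onto an Ultralytics call, a minimal sketch follows. The function names `train_from_scratch` and `resume_from_checkpoint` and the `TRAIN_KWARGS` dict are hypothetical; only the values and the `yolo11m.yaml` model config come from this document. The `ultralytics` import is deferred so the constants can be inspected without the package installed.

```python
# Values taken from the Configuration list; the dict itself is illustrative
TRAIN_KWARGS = {
    "epochs": 120,
    "batch": 11,       # described as tuned for 24GB VRAM
    "imgsz": 1280,
    "save_period": 1,  # save a checkpoint every epoch
    "workers": 24,
}


def train_from_scratch(data_yaml: str):
    """Build YOLOv11 medium from its config file and train on data_yaml."""
    from ultralytics import YOLO  # external dependency, imported lazily

    model = YOLO("yolo11m.yaml")
    return model.train(data=data_yaml, **TRAIN_KWARGS)


def resume_from_checkpoint(last_pt_path: str):
    """Continue an interrupted run from a saved checkpoint."""
    from ultralytics import YOLO

    model = YOLO(last_pt_path)
    return model.train(resume=True)
```

Collecting the values in one mapping would also make it straightforward to move them into a configuration object later.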
External Integrations
- Ultralytics YOLOv11 training pipeline
- Azaion API + CDN for model upload
- Filesystem: `/azaion/datasets/`, `/azaion/models/`, `/azaion/data-processed/`, `/azaion/data-corrupted/`
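
The date-based dataset naming (`azaion-{YYYY-MM-DD}`) can be sketched as below. Placing the dataset folder directly under `/azaion/datasets/` is an assumption, as are the `DATASETS_ROOT` constant and the `dataset_dir` helper name.

```python
from datetime import date
from pathlib import Path
from typing import Optional

DATASETS_ROOT = Path("/azaion/datasets")  # assumed parent folder


def dataset_dir(today: Optional[date] = None) -> Path:
    """Return the dataset folder for the given day (defaults to the current date)."""
    today = today or date.today()
    return DATASETS_ROOT / f"azaion-{today:%Y-%m-%d}"
```

Passing the date explicitly keeps the function deterministic in tests.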
Security
- Trained models are encrypted before upload
- Uses `Security.get_model_encryption_key()` for encryption
Tests
None.