ai-training/_docs/02_document/modules/train.md
Oleksandr Bezdieniezhnykh 142c6c4de8 Refactor constants management to use Pydantic BaseModel for configuration
- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
2026-03-27 18:18:30 +02:00

Module: train

Purpose

Main training pipeline. Forms YOLO datasets from processed annotations, trains YOLOv11 models, and exports/uploads the trained model.

Public Interface

| Function | Signature | Returns | Description |
|---|---|---|---|
| form_dataset | () | | Creates train/valid/test split from processed images |
| copy_annotations | (images, folder: str) | | Copies image+label pairs to a dataset split folder (concurrent) |
| check_label | (label_path: str) -> bool | bool | Validates YOLO label file (all coords ≤ 1.0) |
| create_yaml | () | | Generates YOLO data.yaml with class names from classes.json |
| resume_training | (last_pt_path: str) | | Resumes training from a checkpoint |
| train_dataset | () | | Full pipeline: form_dataset → create_yaml → train YOLOv11 → save model |
| export_current_model | () | | Exports current .pt to ONNX, encrypts, uploads as split resource |
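
As an illustration of the check_label contract in the table above, here is a minimal sketch of a validator that rejects any label file containing a coordinate greater than 1.0. The name check_label_sketch is hypothetical, and the handling of missing files and non-numeric fields is an assumption; the actual implementation in train.py may differ.

```python
from pathlib import Path


def check_label_sketch(label_path: str) -> bool:
    """Return True if every coordinate in a YOLO label file is <= 1.0."""
    path = Path(label_path)
    if not path.is_file():
        # Assumption: a missing label file counts as invalid.
        return False
    for line in path.read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # YOLO detection format: class_id x_center y_center width height
        try:
            coords = [float(value) for value in fields[1:]]
        except ValueError:
            # Assumption: non-numeric fields make the label invalid.
            return False
        if any(coord > 1.0 for coord in coords):
            return False
    return True
```

A caller would typically run this before copying an image+label pair and route failing pairs to the corrupted-data folder.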

Internal Logic

  • Dataset formation: Shuffles all processed images, splits 70/20/10 (train/valid/test). Copies in parallel via ThreadPoolExecutor. Corrupted labels (coords > 1.0) are moved to /azaion/data-corrupted/.
  • YAML generation: Reads annotation classes from classes.json, builds data.yaml with 80 class names (17 actual + 63 placeholders "Class-N"), sets train/valid/test paths.
  • Training: YOLOv11 medium (yolo11m.yaml), 120 epochs, batch=11 (tuned for 24 GB VRAM), 1280px input, checkpoint saved every epoch (save_period=1), 24 workers.
  • Post-training: Copies results to /azaion/models/{date}/, removes intermediate epoch checkpoints, copies best.pt to CURRENT_PT_MODEL.
  • Export: Calls export_onnx, reads the ONNX file, encrypts with model key, uploads via upload_big_small_resource.
  • Dataset naming: azaion-{YYYY-MM-DD} using current date.
  • __main__: Runs train_dataset() then export_current_model().
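
The dataset formation and naming steps above can be sketched roughly as follows. This is an illustrative sketch, not the module's code: split_images, dataset_name, and the seed parameter are hypothetical names, and assigning the remainder to the test split is an assumption about how rounding is handled.

```python
import random
from datetime import date

# Percentages as listed under Configuration.
TRAIN_SET, VALID_SET, TEST_SET = 70, 20, 10


def split_images(images, seed=None):
    """Shuffle image paths and split them into train/valid/test lists."""
    shuffled = list(images)
    # seed is a hypothetical parameter, added only to make the sketch reproducible.
    random.Random(seed).shuffle(shuffled)
    train_end = len(shuffled) * TRAIN_SET // 100
    valid_end = train_end + len(shuffled) * VALID_SET // 100
    # Assumption: whatever remains after train and valid goes to test.
    return shuffled[:train_end], shuffled[train_end:valid_end], shuffled[valid_end:]


def dataset_name(today=None):
    """Build the azaion-{YYYY-MM-DD} dataset name from a date."""
    today = today or date.today()
    return f"azaion-{today:%Y-%m-%d}"
```

Each of the three lists would then be handed to copy_annotations together with its split folder.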

Dependencies

  • constants — all directory/path constants
  • api_client — ApiClient for model upload
  • cdn_manager — CDNCredentials, CDNManager (imported, but CDN initialization is done via api_client)
  • dto/annotationClass — AnnotationClass for class name generation
  • inference/onnx_engine — OnnxEngine (imported but unused in current code)
  • security — model encryption key
  • utils — Dotdict
  • exports — export_tensorrt, upload_model, export_onnx
  • ultralytics (external) — YOLO training and export
  • yaml, concurrent.futures, glob, os, random, shutil, subprocess, datetime, pathlib, time (stdlib)

Consumers

manual_run

Data Models

Uses AnnotationClass for class definitions.
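
To show how class definitions could feed data.yaml, the sketch below pads a list of real class names with "Class-N" placeholders up to 80 entries and builds the mapping a YOLO dataset file expects. The function names, the 1-based placeholder numbering, and the train/valid/test subfolder layout are assumptions for illustration; only the 80-name total and the "Class-N" pattern come from this document.

```python
DEFAULT_CLASS_NUM = 80  # value listed under Configuration


def build_class_names(actual_names, total=DEFAULT_CLASS_NUM):
    """Pad the real class names with "Class-N" placeholders up to `total`."""
    names = list(actual_names)[:total]
    # Assumption: N is the 1-based position of the placeholder slot.
    names += [f"Class-{i + 1}" for i in range(len(names), total)]
    return names


def build_data_yaml(dataset_dir, actual_names):
    """Return the mapping that would be dumped to data.yaml."""
    names = build_class_names(actual_names)
    return {
        "path": dataset_dir,
        # Assumption: each split keeps its images in an images/ subfolder.
        "train": "train/images",
        "val": "valid/images",
        "test": "test/images",
        "nc": len(names),
        "names": names,
    }
```

The resulting dictionary can be written out with a YAML serializer such as yaml.safe_dump.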

Configuration

  • Training hyperparameters are hardcoded: epochs=120, batch=11, imgsz=1280, save_period=1, workers=24
  • Dataset split ratios: train_set=70, valid_set=20, test_set=10
  • old_images_percentage=75 (declared but unused)
  • DEFAULT_CLASS_NUM=80
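
For reference, the hyperparameters listed above map onto an Ultralytics training call roughly as shown below. This is a hedged sketch: TRAIN_ARGS and run_training are hypothetical names, and the real train_dataset() may pass additional arguments.

```python
# Hyperparameters as listed under Configuration.
TRAIN_ARGS = {
    "epochs": 120,
    "batch": 11,
    "imgsz": 1280,
    "save_period": 1,
    "workers": 24,
}


def run_training(data_yaml_path, model_cfg="yolo11m.yaml"):
    """Train a model built from model_cfg using the hyperparameters above."""
    # Imported lazily so the constants can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(model_cfg)
    return model.train(data=data_yaml_path, **TRAIN_ARGS)
```

Keeping the values in a single dictionary makes it easier to move them into configuration later instead of leaving them hardcoded.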

External Integrations

  • Ultralytics YOLOv11 training pipeline
  • Azaion API + CDN for model upload
  • Filesystem: /azaion/datasets/, /azaion/models/, /azaion/data-processed/, /azaion/data-corrupted/
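
The post-training step described under Internal Logic touches the models directory listed above. A minimal sketch of that step, assuming per-epoch checkpoints are named epoch<N>.pt and sit in a weights/ subfolder next to best.pt, might look like this. finalize_run and its parameters are hypothetical, and the caller supplies the dated target folder.

```python
import shutil
from pathlib import Path


def finalize_run(run_dir, models_dir, current_pt_model):
    """Copy a finished run, drop per-epoch checkpoints, publish best.pt."""
    target = Path(models_dir)
    shutil.copytree(run_dir, target, dirs_exist_ok=True)
    weights = target / "weights"
    # Assumption: per-epoch checkpoints are named epoch<N>.pt next to best.pt.
    for checkpoint in weights.glob("epoch*.pt"):
        checkpoint.unlink()
    # Publish the best weights at the path the rest of the pipeline reads from.
    shutil.copy2(weights / "best.pt", current_pt_model)
    return target
```

Deleting the intermediate checkpoints only from the copied folder leaves the original run directory untouched, which is one possible design choice among several.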

Security

  • Trained models are encrypted before upload
  • Uses Security.get_model_encryption_key() for encryption

Tests

None.