mirror of https://github.com/azaion/ai-training.git synced 2026-04-22 10:26:36 +00:00




Data Model

Entity Overview

This system does not use a database. All data is held either in files on the filesystem or in in-memory data structures. The primary entities are annotation images, annotation labels, and ML models.

Entities

Annotation Image

Storage: JPEG files on filesystem
Naming: {uuid}.jpg (name assigned by Azaion platform)
Lifecycle: Created → Seed/Validated → Augmented → Dataset → Model Training

Annotation Label (YOLO format)

Storage: Text files on filesystem
Naming: {uuid}.txt (matches image name)
Format: One line per detection: {class_id} {center_x} {center_y} {width} {height}
Coordinates: All normalized to 0–1 range relative to image dimensions
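The label format above can be checked with a few lines of Python. This is a minimal sketch, not the project's parser: `YoloBox` and `parse_label_line` are hypothetical names introduced here, and rejecting any value outside 0–1 is an assumption based on the normalization rule stated above.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    """One detection from a YOLO label file (hypothetical helper type)."""
    class_id: int
    center_x: float
    center_y: float
    width: float
    height: float


def parse_label_line(line: str) -> YoloBox:
    """Parse one '{class_id} {center_x} {center_y} {width} {height}' line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    box = YoloBox(int(parts[0]), *(float(p) for p in parts[1:]))
    # All four coordinates must be normalized to the 0-1 range.
    if not all(0.0 <= value <= 1.0 for value in box[1:]):
        raise ValueError(f"coordinate outside 0-1 range: {line!r}")
    return box
```

A line such as `3 0.5 0.25 0.1 0.2` parses into a box with `class_id == 3`, while a line containing `1.5` raises `ValueError`.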

AnnotationClass

Storage: classes.json (static file, 17 entries)
Fields: Id (int), Name (str), ShortName (str), Color (hex str)
Weather expansion: Each class × 3 weather modes → IDs offset by 0/20/40
Total slots: 80 (51 used, 29 reserved as "Class-N" placeholders)
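The slot arithmetic above (17 classes × 3 weather modes = 51 used slots, 29 placeholders) can be illustrated with a short sketch. Several details are assumptions: that base class IDs run contiguously from 0, that placeholders are named `Class-{slot index}`, and the `#{mode}` suffix used to tell weather variants apart. `expand_class_names` is a hypothetical helper, not a function from the repository.

```python
WEATHER_OFFSETS = (0, 20, 40)  # one ID offset per weather mode, as listed above
TOTAL_SLOTS = 80


def expand_class_names(base_names: list[str]) -> list[str]:
    """Fill an 80-slot name table from the base classes and weather offsets."""
    # Start with placeholders everywhere (assumed naming: "Class-N").
    names = [f"Class-{slot}" for slot in range(TOTAL_SLOTS)]
    for mode_index, offset in enumerate(WEATHER_OFFSETS):
        for base_id, name in enumerate(base_names):
            # Assumed convention: mode 0 keeps the plain name,
            # later modes get a "#<mode>" suffix.
            label = name if mode_index == 0 else f"{name}#{mode_index}"
            names[offset + base_id] = label
    return names
```

With 17 base names this yields 51 named slots; slots 17–19, 37–39, and 57–79 keep their placeholders, which accounts for the 29 reserved entries.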

Detection (inference)

In-memory only: Created during inference postprocessing
Fields: x, y, w, h (normalized), cls (int), confidence (float)

Annotation (inference)

In-memory only: Groups detections per video frame
Fields: frame (image), time (ms), detections (list)
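The two in-memory inference entities could be modeled as dataclasses, roughly as follows. This is a sketch of the field lists above only; the concrete types (for example `Any` for the frame and `int` for milliseconds) are assumptions, and the real classes may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Detection:
    """One detected object; x, y, w, h are normalized to the image size."""
    x: float
    y: float
    w: float
    h: float
    cls: int
    confidence: float


@dataclass
class Annotation:
    """All detections found in a single video frame."""
    frame: Any  # the frame image; concrete type is an assumption
    time: int   # position in the video, in milliseconds
    detections: list[Detection] = field(default_factory=list)
```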

AnnotationMessage (queue)

Wire format: msgpack with positional integer keys
Fields: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (bytes), createdRole, createdEmail, source, status
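"Positional integer keys" means the serialized map is keyed by integers rather than by field names. The sketch below shows that mapping in plain Python before any msgpack encoding. It assumes, without confirmation from the source, that keys are numbered from 0 in the order the fields are listed above; `FIELD_ORDER`, `to_positional`, and `from_positional` are hypothetical names.

```python
# Assumed order: the field list above, numbered from 0.
FIELD_ORDER = (
    "createdDate", "name", "originalMediaName", "time", "imageExtension",
    "detections", "image", "createdRole", "createdEmail", "source", "status",
)


def to_positional(message: dict[str, object]) -> dict[int, object]:
    """Replace field names with their integer positions."""
    return {index: message[name] for index, name in enumerate(FIELD_ORDER)}


def from_positional(payload: dict[int, object]) -> dict[str, object]:
    """Restore field names from integer positions."""
    return {FIELD_ORDER[index]: value for index, value in payload.items()}
```

The resulting integer-keyed dict is what would be passed to a msgpack encoder; the actual key numbering should be checked against the queue producer.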

ML Model

Formats: .pt, .onnx, .engine, .rknn
Encryption: AES-256-CBC before upload
Split storage: .small part (API server) + .big part (CDN)
Naming: azaion.{ext} for current model; azaion.cc_{major}.{minor}_sm_{count}.engine for GPU-specific TensorRT
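The two naming patterns can be expressed as small helpers. This is an illustrative sketch: the function and parameter names are hypothetical, and the reading of `cc` as CUDA compute capability and `sm` as streaming multiprocessor count is an assumption, not something the source states.

```python
def model_filename(ext: str) -> str:
    """Name of the current model for a given format, e.g. 'azaion.onnx'."""
    return f"azaion.{ext.lstrip('.')}"


def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Name of a GPU-specific TensorRT engine file.

    Assumption: cc_major/cc_minor are the GPU's compute capability and
    sm_count is its number of streaming multiprocessors.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

For example, `engine_filename(8, 6, 82)` produces `azaion.cc_8.6_sm_82.engine`.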

Filesystem Entity Relationships

erDiagram
    ANNOTATION_IMAGE ||--|| ANNOTATION_LABEL : "matches by filename stem"
    ANNOTATION_CLASS ||--o{ ANNOTATION_LABEL : "class_id references"
    ANNOTATION_IMAGE }o--|| DATASET_SPLIT : "copied into"
    ANNOTATION_LABEL }o--|| DATASET_SPLIT : "copied into"
    DATASET_SPLIT ||--|| TRAINING_RUN : "input to"
    TRAINING_RUN ||--|| MODEL_PT : "produces"
    MODEL_PT ||--|| MODEL_ONNX : "exported to"
    MODEL_PT ||--|| MODEL_ENGINE : "exported to"
    MODEL_PT ||--|| MODEL_RKNN : "exported to"
    MODEL_ONNX ||--|| ENCRYPTED_MODEL : "encrypted"
    MODEL_ENGINE ||--|| ENCRYPTED_MODEL : "encrypted"
    ENCRYPTED_MODEL ||--|| MODEL_SMALL : "split part"
    ENCRYPTED_MODEL ||--|| MODEL_BIG : "split part"
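The "matches by filename stem" relationship between images and labels can be sketched with `pathlib`. `pair_by_stem` is a hypothetical helper; returning images without a label as a separate list is a choice made for this example, not documented behavior.

```python
from pathlib import Path


def pair_by_stem(images_dir: Path, labels_dir: Path) -> tuple[list[tuple[Path, Path]], list[Path]]:
    """Pair each {uuid}.jpg with its {uuid}.txt; also return unlabeled images."""
    pairs: list[tuple[Path, Path]] = []
    unlabeled: list[Path] = []
    for image in sorted(images_dir.glob("*.jpg")):
        # The label shares the image's stem and uses a .txt extension.
        label = labels_dir / f"{image.stem}.txt"
        if label.is_file():
            pairs.append((image, label))
        else:
            unlabeled.append(image)
    return pairs, unlabeled
```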

Directory Layout (Data Lifecycle)

/azaion/
├── data-seed/              ← Unvalidated annotations (from operators)
│   ├── images/
│   └── labels/
├── data/                   ← Validated annotations (from validators/admins)
│   ├── images/
│   └── labels/
├── data-processed/         ← Augmented data (8× expansion)
│   ├── images/
│   └── labels/
├── data-corrupted/         ← Invalid labels (coords > 1.0)
│   ├── images/
│   └── labels/
├── data_deleted/           ← Soft-deleted annotations
│   ├── images/
│   └── labels/
├── data-sample/            ← Random sample for review
├── datasets/               ← Training datasets (dated)
│   └── azaion-{YYYY-MM-DD}/
│       ├── train/images/ + labels/
│       ├── valid/images/ + labels/
│       ├── test/images/ + labels/
│       └── data.yaml
└── models/                 ← Trained model artifacts
    ├── azaion.pt           ← Current best model
    ├── azaion.onnx         ← Current ONNX export
    └── azaion-{YYYY-MM-DD}/← Per-training-run results
        └── weights/
            └── best.pt
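The dated dataset directories in the layout above follow a predictable pattern, sketched here with `pathlib`. The helper names are hypothetical and the root is passed in rather than read from `config.yaml`, so this stays independent of how the project actually loads its paths.

```python
from datetime import date
from pathlib import Path

SPLITS = ("train", "valid", "test")


def dataset_dir(root: Path, day: date) -> Path:
    """Path of the dataset built on a given day: <root>/datasets/azaion-YYYY-MM-DD."""
    return root / "datasets" / f"azaion-{day.isoformat()}"


def split_dirs(dataset: Path) -> dict[str, tuple[Path, Path]]:
    """Map each split name to its (images, labels) directories."""
    return {
        split: (dataset / split / "images", dataset / split / "labels")
        for split in SPLITS
    }
```

For a root of `/azaion` and the date 2026-03-27, `dataset_dir` returns `/azaion/datasets/azaion-2026-03-27`.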

Configuration Files

| File | Location | Contents |
|------|----------|----------|
| `config.yaml` | Project root | API credentials, queue config, directory paths |
| `cdn.yaml` | Project root | CDN endpoint + S3 access keys |
| `classes.json` | Project root | Annotation class definitions (17 classes) |
| `checkpoint.txt` | Project root | Last training checkpoint timestamp |
| `offset.yaml` | annotation-queue/ | Queue consumer offset |
| `data.yaml` | Per dataset | YOLO training config (class names, split paths) |
