# Data Model

## Entity Overview

This system does not use a database. All data is stored either as files on the filesystem or in in-memory data structures. The primary entities are annotation images, labels, and ML models.
## Entities

### Annotation Image

- Storage: JPEG files on the filesystem
- Naming: `{uuid}.jpg` (name assigned by the Azaion platform)
- Lifecycle: Created → Seed/Validated → Augmented → Dataset → Model Training
### Annotation Label (YOLO format)

- Storage: text files on the filesystem
- Naming: `{uuid}.txt` (matches the image name)
- Format: one line per detection: `{class_id} {center_x} {center_y} {width} {height}`
- Coordinates: all normalized to the 0–1 range relative to image dimensions
### AnnotationClass

- Storage: `classes.json` (static file, 17 entries)
- Fields: Id (int), Name (str), ShortName (str), Color (hex str)
- Weather expansion: each class × 3 weather modes, with IDs offset by 0, 20, or 40
- Total slots: 80 (51 used, i.e. 17 classes × 3 weather modes; 29 reserved as "Class-N" placeholders)
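The ID scheme can be sketched in a few lines. The sketch assumes that `classes.json` yields a list of objects whose `Id` values run from 0 to 16, and that weather modes are indexed 0, 1, 2 in the same order as the offsets. Two details are hypothetical because the real naming is not documented here: the `#{mode}` suffix for weather variants, and reading `N` in `Class-N` as the slot number.

```python
WEATHER_OFFSETS = (0, 20, 40)  # ID offset for weather modes 0, 1, 2
TOTAL_SLOTS = 80


def expanded_class_id(base_id: int, weather_mode: int) -> int:
    """Map a base class Id and a weather mode index to its slot ID."""
    if not 0 <= base_id < WEATHER_OFFSETS[1]:
        raise ValueError(f"base Id {base_id} would collide with the next weather band")
    return base_id + WEATHER_OFFSETS[weather_mode]


def build_slot_names(classes: list[dict]) -> list[str]:
    """Return all 80 slot names; slots without a class keep a 'Class-N' placeholder."""
    names = [f"Class-{slot}" for slot in range(TOTAL_SLOTS)]
    for mode in range(len(WEATHER_OFFSETS)):
        for cls in classes:
            # Hypothetical '#mode' suffix: the real variant naming is not documented.
            names[expanded_class_id(cls["Id"], mode)] = f"{cls['Name']}#{mode}"
    return names
```

Each offset step (20) is larger than the largest base Id (16), so the three weather bands never overlap. The bands occupy slots 0–16, 20–36, and 40–56, which leaves 17–19, 37–39, and 57–79 free: 29 placeholder slots.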
### Detection (inference)

- In-memory only: created during inference postprocessing
- Fields: x, y, w, h (normalized), cls (int), confidence (float)
### Annotation (inference)

- In-memory only: groups detections per video frame
- Fields: frame (image), time (ms), detections (list)
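The two in-memory structures map naturally onto Python dataclasses. This is a sketch, not the project's definition. The type of `frame` is left open, and the claim that `x`/`y` denote the box center is an assumption rather than something the document states. `filter_confident` is a hypothetical postprocessing helper.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Detection:
    """One detected object; box values are normalized to 0-1."""
    x: float  # assumed to be the box center, as in YOLO labels
    y: float
    w: float
    h: float
    cls: int
    confidence: float


@dataclass
class Annotation:
    """All detections found in a single video frame."""
    frame: Any  # decoded image; its concrete type is an assumption
    time: int   # position in the video, in milliseconds
    detections: list[Detection] = field(default_factory=list)


def filter_confident(ann: Annotation, min_confidence: float) -> Annotation:
    """Hypothetical helper: keep only detections at or above min_confidence."""
    kept = [d for d in ann.detections if d.confidence >= min_confidence]
    return Annotation(frame=ann.frame, time=ann.time, detections=kept)
```

Because `filter_confident` returns a new `Annotation`, the original frame's full detection list remains available for other consumers.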
### AnnotationMessage (queue)

- Wire format: msgpack map with positional integer keys in place of field names
- Fields: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (bytes), createdRole, createdEmail, source, status
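Here is a sketch of the positional-key mapping. It assumes the integer keys follow the field order listed above, from 0 = `createdDate` to 10 = `status`. Only `detections` (JSON string) and `image` (bytes) have documented types, so the other fields are typed loosely. Field names keep the wire's camelCase for easy cross-reference. `to_wire`, `from_wire`, and `detections_json` are hypothetical helpers.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class AnnotationMessage:
    # Declaration order defines the positional wire keys 0..10 (assumed).
    createdDate: Any
    name: str
    originalMediaName: str
    time: Any
    imageExtension: str
    detections: str  # JSON-encoded list of detections
    image: bytes     # encoded image bytes
    createdRole: Any
    createdEmail: str
    source: Any
    status: Any


def detections_json(detections: list[dict]) -> str:
    """Serialize detections to the JSON string carried in the message."""
    return json.dumps(detections)


def to_wire(msg: AnnotationMessage) -> dict[int, Any]:
    """Replace field names with their positions before msgpack encoding."""
    return {i: getattr(msg, f.name) for i, f in enumerate(fields(msg))}


def from_wire(payload: dict[int, Any]) -> AnnotationMessage:
    """Rebuild a message from a decoded positional-key map."""
    names = [f.name for f in fields(AnnotationMessage)]
    return AnnotationMessage(**{name: payload[i] for i, name in enumerate(names)})
```

The resulting dict is what you would pass to `msgpack.packb`. When decoding with `msgpack.unpackb`, pass `strict_map_key=False`. Since msgpack 1.0, the unpacker accepts only `str` and `bytes` map keys by default, so integer keys would otherwise raise an error.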
### ML Model

- Formats: `.pt`, `.onnx`, `.engine`, `.rknn`
- Encryption: AES-256-CBC, applied before upload
- Split storage: `.small` part (API server) + `.big` part (CDN)
- Naming: `azaion.{ext}` for the current model; `azaion.cc_{major}.{minor}_sm_{count}.engine` for GPU-specific TensorRT engines
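Two details of the model entity can be sketched in plain Python: building the GPU-specific engine name, and splitting an already-encrypted blob into its two parts. The sketch assumes that `cc_{major}.{minor}` is the CUDA compute capability and `sm_{count}` is the streaming-multiprocessor count. In PyTorch these values are exposed as `major`, `minor`, and `multi_processor_count` on `torch.cuda.get_device_properties(0)`. The `small_size` parameter and the prefix/remainder layout of the split are hypothetical. AES-256-CBC encryption itself is outside this sketch.

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """GPU-specific TensorRT engine name (compute capability + SM count assumed)."""
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"


def split_encrypted(blob: bytes, small_size: int) -> tuple[bytes, bytes]:
    """Split an encrypted model into a .small prefix and a .big remainder.

    small_size and the prefix/remainder layout are illustrative assumptions.
    """
    if not 0 < small_size < len(blob):
        raise ValueError("small_size must be between 1 and len(blob) - 1")
    return blob[:small_size], blob[small_size:]


def join_parts(small: bytes, big: bytes) -> bytes:
    """Reassemble the encrypted blob; decryption happens only after this step."""
    return small + big
```

Under this layout, the CDN copy alone is not a complete ciphertext. A client needs both the API-server part and the CDN part before it can decrypt the model.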
## Filesystem Entity Relationships

```mermaid
erDiagram
    ANNOTATION_IMAGE ||--|| ANNOTATION_LABEL : "matches by filename stem"
    ANNOTATION_CLASS ||--o{ ANNOTATION_LABEL : "class_id references"
    ANNOTATION_IMAGE }o--|| DATASET_SPLIT : "copied into"
    ANNOTATION_LABEL }o--|| DATASET_SPLIT : "copied into"
    DATASET_SPLIT ||--|| TRAINING_RUN : "input to"
    TRAINING_RUN ||--|| MODEL_PT : "produces"
    MODEL_PT ||--|| MODEL_ONNX : "exported to"
    MODEL_PT ||--|| MODEL_ENGINE : "exported to"
    MODEL_PT ||--|| MODEL_RKNN : "exported to"
    MODEL_ONNX ||--|| ENCRYPTED_MODEL : "encrypted"
    MODEL_ENGINE ||--|| ENCRYPTED_MODEL : "encrypted"
    ENCRYPTED_MODEL ||--|| MODEL_SMALL : "split part"
    ENCRYPTED_MODEL ||--|| MODEL_BIG : "split part"
```
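The one-to-one image/label relationship is keyed only by filename stem. A consistency check therefore reduces to set operations on stems. This sketch assumes the sibling `images/` + `labels/` layout shown in the directory tree; `pair_by_stem` is a hypothetical helper.

```python
from pathlib import Path


def pair_by_stem(images_dir: Path, labels_dir: Path) -> tuple[list[tuple[Path, Path]], list[str], list[str]]:
    """Match {uuid}.jpg files to {uuid}.txt files by filename stem.

    Returns (pairs, stems with no label, stems with no image).
    """
    images = {p.stem: p for p in images_dir.glob("*.jpg")}
    labels = {p.stem: p for p in labels_dir.glob("*.txt")}
    pairs = [(images[s], labels[s]) for s in sorted(images.keys() & labels.keys())]
    images_only = sorted(images.keys() - labels.keys())
    labels_only = sorted(labels.keys() - images.keys())
    return pairs, images_only, labels_only
```

Orphans on either side indicate a broken pairing. Such orphans should be resolved before the files are copied into a dataset split.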
## Directory Layout (Data Lifecycle)

```
/azaion/
├── data-seed/                ← Unvalidated annotations (from operators)
│   ├── images/
│   └── labels/
├── data/                     ← Validated annotations (from validators/admins)
│   ├── images/
│   └── labels/
├── data-processed/           ← Augmented data (8× expansion)
│   ├── images/
│   └── labels/
├── data-corrupted/           ← Invalid labels (coords > 1.0)
│   ├── images/
│   └── labels/
├── data_deleted/             ← Soft-deleted annotations
│   ├── images/
│   └── labels/
├── data-sample/              ← Random sample for review
├── datasets/                 ← Training datasets (dated)
│   └── azaion-{YYYY-MM-DD}/
│       ├── train/images/ + labels/
│       ├── valid/images/ + labels/
│       ├── test/images/ + labels/
│       └── data.yaml
└── models/                   ← Trained model artifacts
    ├── azaion.pt             ← Current best model
    ├── azaion.onnx           ← Current ONNX export
    └── azaion-{YYYY-MM-DD}/  ← Per-training-run results
        └── weights/
            └── best.pt
```
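This sketch shows how a dated dataset directory and its train/valid/test split could be produced. The 80/10/10 ratios, the seed, and the helper names are hypothetical; the document does not state which ratios the project uses. Sorting before shuffling makes the split reproducible regardless of the order in which the filesystem lists files.

```python
import random
from datetime import date
from pathlib import Path


def dataset_dir(root: Path, day: date) -> Path:
    """Return <root>/datasets/azaion-{YYYY-MM-DD}."""
    return root / "datasets" / f"azaion-{day.isoformat()}"


def split_stems(stems: list[str], train: float = 0.8, valid: float = 0.1,
                seed: int = 0) -> dict[str, list[str]]:
    """Shuffle stems reproducibly and cut them into train/valid/test lists.

    The default ratios and seed are illustrative assumptions.
    """
    shuffled = sorted(stems)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],  # remainder absorbs rounding
    }
```

Each stem in a split identifies one image/label pair. That pair would then be copied into the matching `train/`, `valid/`, or `test/` `images/` and `labels/` folders.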
## Configuration Files

| File | Location | Contents |
|---|---|---|
| `config.yaml` | Project root | API credentials, queue config, directory paths |
| `cdn.yaml` | Project root | CDN endpoint + S3 access keys |
| `classes.json` | Project root | Annotation class definitions (17 classes) |
| `checkpoint.txt` | Project root | Last training checkpoint timestamp |
| `offset.yaml` | `annotation-queue/` | Queue consumer offset |
| `data.yaml` | Per dataset | YOLO training config (class names, split paths) |
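This sketch renders a per-dataset `data.yaml` in the layout Ultralytics YOLO accepts (`path`, `train`, `val`, `test`, `names`). It assumes the split folders from the directory layout. Including `nc` and writing `names` as an index map are illustrative choices, not confirmed details of this project. Values are emitted with `json.dumps`: a JSON string literal is also a valid YAML double-quoted scalar, so names such as `yes` or `1.0` are not reinterpreted.

```python
import json
from pathlib import Path


def render_data_yaml(dataset_root: Path, names: list[str]) -> str:
    """Render a YOLO data.yaml for a dataset with train/valid/test folders."""
    lines = [
        f"path: {json.dumps(dataset_root.as_posix())}",
        "train: train/images",  # split paths are relative to `path`
        "val: valid/images",
        "test: test/images",
        f"nc: {len(names)}",
        "names:",
    ]
    # Quote each class name so YAML keeps it as a plain string.
    lines += [f"  {i}: {json.dumps(name)}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"
```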