mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-22 12:06:36 +00:00
142c6c4de8
- Replaced module-level path variables in constants.py with a structured Pydantic Config class. - Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure. - Fixed bugs related to image processing and model saving. - Enhanced test infrastructure to accommodate the new configuration approach. This refactor improves code maintainability and clarity by centralizing configuration management.
107 lines
4.3 KiB
Markdown
# Data Model

## Entity Overview

This system does not use a database. All data is either stored as files on the filesystem or held in in-memory data structures. The primary entities are annotation images, labels, and ML models.

## Entities

### Annotation Image
- **Storage**: JPEG files on filesystem
- **Naming**: `{uuid}.jpg` (name assigned by Azaion platform)
- **Lifecycle**: Created → Seed/Validated → Augmented → Dataset → Model Training

### Annotation Label (YOLO format)
- **Storage**: Text files on filesystem
- **Naming**: `{uuid}.txt` (matches image name)
- **Format**: One line per detection: `{class_id} {center_x} {center_y} {width} {height}`
- **Coordinates**: All normalized to 0–1 range relative to image dimensions

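
The label format above can be sketched as a small parser. This is a minimal illustration, not the project's own code: the names `YoloBox`, `parse_label_line`, and `to_pixel_corners` are hypothetical, and rejecting values outside 0–1 is an assumption based on the normalization rule stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YoloBox:
    """One detection line from a `{uuid}.txt` label file (hypothetical type)."""
    class_id: int
    center_x: float
    center_y: float
    width: float
    height: float


def parse_label_line(line: str) -> YoloBox:
    """Parse `{class_id} {center_x} {center_y} {width} {height}`."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(p) for p in parts[1:])
    # Assumption: any coordinate outside the normalized 0-1 range is invalid.
    for name, value in (("center_x", cx), ("center_y", cy), ("width", w), ("height", h)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside the normalized 0-1 range")
    return YoloBox(class_id, cx, cy, w, h)


def to_pixel_corners(box: YoloBox, image_w: int, image_h: int) -> tuple[float, float, float, float]:
    """Convert a normalized center/size box to pixel (x_min, y_min, x_max, y_max)."""
    x_min = (box.center_x - box.width / 2) * image_w
    y_min = (box.center_y - box.height / 2) * image_h
    x_max = (box.center_x + box.width / 2) * image_w
    y_max = (box.center_y + box.height / 2) * image_h
    return x_min, y_min, x_max, y_max
```

For example, `parse_label_line("3 0.5 0.5 0.25 0.5")` yields a box for class 3 centered in the image, and `to_pixel_corners` maps it onto a concrete image size.
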
### AnnotationClass
- **Storage**: `classes.json` (static file, 17 entries)
- **Fields**: Id (int), Name (str), ShortName (str), Color (hex str)
- **Weather expansion**: Each class × 3 weather modes → IDs offset by 0/20/40
- **Total slots**: 80 (51 used, 29 reserved as "Class-N" placeholders)

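
The slot arithmetic (17 classes × 3 offsets = 51 used, 80 − 51 = 29 placeholders) can be illustrated with a short sketch. It assumes the base `Id` values are below 20 so that the three offset ranges do not collide, and that `N` in `Class-N` is the slot index. The function name and the label suffix for the offset variants are hypothetical; the document does not say how the weather variants are named.

```python
WEATHER_OFFSETS = (0, 20, 40)  # ID offsets for the three weather modes
TOTAL_SLOTS = 80


def expand_class_names(base_classes: list[dict]) -> list[str]:
    """Build an 80-entry class-name list from entries shaped like classes.json."""
    # Start with placeholders everywhere; assumption: N is the slot index.
    names = [f"Class-{slot}" for slot in range(TOTAL_SLOTS)]
    for cls in base_classes:
        if not 0 <= cls["Id"] < 20:
            raise ValueError(f"base Id {cls['Id']} would collide with another offset range")
        for offset in WEATHER_OFFSETS:
            # Hypothetical naming: the base name for offset 0, a tagged name otherwise.
            suffix = "" if offset == 0 else f" [weather+{offset}]"
            names[cls["Id"] + offset] = cls["Name"] + suffix
    return names
```

With 17 base entries whose IDs run 0–16, slots 17–19, 37–39, and 57–79 keep their placeholder names, which gives the 29 reserved slots.
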
### Detection (inference)
- **In-memory only**: Created during inference postprocessing
- **Fields**: x, y, w, h (normalized), cls (int), confidence (float)

### Annotation (inference)
- **In-memory only**: Groups detections per video frame
- **Fields**: frame (image), time (ms), detections (list)

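
The two in-memory inference entities could be modeled roughly as follows. This is a sketch under assumptions: the field names come from the lists above, but the concrete types (for instance `Any` for the frame) and the `confident_detections` helper are hypothetical and may differ from the actual classes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Detection:
    x: float           # normalized
    y: float           # normalized
    w: float           # normalized
    h: float           # normalized
    cls: int           # class ID
    confidence: float


@dataclass
class Annotation:
    frame: Any         # the video frame image; concrete type is an assumption
    time: int          # position in the video, in milliseconds
    detections: list[Detection] = field(default_factory=list)


def confident_detections(annotation: Annotation, threshold: float) -> list[Detection]:
    """Hypothetical helper: keep detections at or above a confidence threshold."""
    return [d for d in annotation.detections if d.confidence >= threshold]
```
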
### AnnotationMessage (queue)
- **Wire format**: msgpack with positional integer keys
- **Fields**: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (bytes), createdRole, createdEmail, source, status

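
"Positional integer keys" means the serialized map uses small integers in place of field names. The sketch below shows the mapping in both directions. It assumes the keys start at 0 and follow the field order listed above, which the document does not confirm. `FIELD_ORDER`, `to_positional`, and `from_positional` are hypothetical names, and the msgpack encoding step is left out to keep the example dependency-free.

```python
# Assumption: key i corresponds to the i-th field in the documented list.
FIELD_ORDER = (
    "createdDate", "name", "originalMediaName", "time", "imageExtension",
    "detections", "image", "createdRole", "createdEmail", "source", "status",
)


def to_positional(message: dict) -> dict:
    """Replace field names with their positional integer keys."""
    return {i: message[name] for i, name in enumerate(FIELD_ORDER) if name in message}


def from_positional(payload: dict) -> dict:
    """Restore field names from positional integer keys."""
    return {FIELD_ORDER[key]: value for key, value in payload.items()}
```

In a real consumer, the dictionary with integer keys would be what the msgpack decoder returns for each queue message.
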
### ML Model
- **Formats**: .pt, .onnx, .engine, .rknn
- **Encryption**: AES-256-CBC before upload
- **Split storage**: .small part (API server) + .big part (CDN)
- **Naming**: `azaion.{ext}` for current model; `azaion.cc_{major}.{minor}_sm_{count}.engine` for GPU-specific TensorRT

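
The naming patterns and the two-part split can be sketched as below. Several things here are assumptions: the function names are hypothetical, reading `cc` as CUDA compute capability and `sm` as the multiprocessor count is a guess from the pattern, and the document does not say how the boundary between the .small and .big parts is chosen, so `small_size` is an illustrative parameter only.

```python
def model_filename(ext: str) -> str:
    """Name of the current model in a given format, e.g. 'azaion.onnx'."""
    return f"azaion.{ext.lstrip('.')}"


def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """GPU-specific TensorRT engine name following the documented pattern."""
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"


def split_blob(data: bytes, small_size: int) -> tuple[bytes, bytes]:
    """Split an (already encrypted) model into a small and a big part.

    Assumption: the small part is simply the first `small_size` bytes.
    """
    if small_size < 0:
        raise ValueError("small_size must be non-negative")
    return data[:small_size], data[small_size:]
```

Concatenating the two parts in order restores the original encrypted blob, so neither part is usable on its own.
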
## Filesystem Entity Relationships

```mermaid
erDiagram
    ANNOTATION_IMAGE ||--|| ANNOTATION_LABEL : "matches by filename stem"
    ANNOTATION_CLASS ||--o{ ANNOTATION_LABEL : "class_id references"
    ANNOTATION_IMAGE }o--|| DATASET_SPLIT : "copied into"
    ANNOTATION_LABEL }o--|| DATASET_SPLIT : "copied into"
    DATASET_SPLIT ||--|| TRAINING_RUN : "input to"
    TRAINING_RUN ||--|| MODEL_PT : "produces"
    MODEL_PT ||--|| MODEL_ONNX : "exported to"
    MODEL_PT ||--|| MODEL_ENGINE : "exported to"
    MODEL_PT ||--|| MODEL_RKNN : "exported to"
    MODEL_ONNX ||--|| ENCRYPTED_MODEL : "encrypted"
    MODEL_ENGINE ||--|| ENCRYPTED_MODEL : "encrypted"
    ENCRYPTED_MODEL ||--|| MODEL_SMALL : "split part"
    ENCRYPTED_MODEL ||--|| MODEL_BIG : "split part"
```

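
Because images and labels are related only by filename stem, a consistency check reduces to comparing two sets of stems. The following is a minimal sketch; `find_unpaired` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path


def find_unpaired(images_dir: Path, labels_dir: Path) -> tuple[set[str], set[str]]:
    """Return (stems of images without a label, stems of labels without an image)."""
    image_stems = {p.stem for p in images_dir.glob("*.jpg")}
    label_stems = {p.stem for p in labels_dir.glob("*.txt")}
    return image_stems - label_stems, label_stems - image_stems
```

Calling it on an `images/` and `labels/` pair returns two empty sets when every `{uuid}.jpg` has a matching `{uuid}.txt` and vice versa.
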
## Directory Layout (Data Lifecycle)

```
/azaion/
├── data-seed/                     ← Unvalidated annotations (from operators)
│   ├── images/
│   └── labels/
├── data/                          ← Validated annotations (from validators/admins)
│   ├── images/
│   └── labels/
├── data-processed/                ← Augmented data (8× expansion)
│   ├── images/
│   └── labels/
├── data-corrupted/                ← Invalid labels (coords > 1.0)
│   ├── images/
│   └── labels/
├── data_deleted/                  ← Soft-deleted annotations
│   ├── images/
│   └── labels/
├── data-sample/                   ← Random sample for review
├── datasets/                      ← Training datasets (dated)
│   └── azaion-{YYYY-MM-DD}/
│       ├── train/images/ + labels/
│       ├── valid/images/ + labels/
│       ├── test/images/ + labels/
│       └── data.yaml
└── models/                        ← Trained model artifacts
    ├── azaion.pt                  ← Current best model
    ├── azaion.onnx                ← Current ONNX export
    └── azaion-{YYYY-MM-DD}/       ← Per-training-run results
        └── weights/
            └── best.pt
```

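
The dated dataset paths in the tree can be derived with `pathlib`. This is a sketch only: `dataset_dir` and `split_dirs` are hypothetical helpers, and the root is passed in explicitly as a parameter.

```python
from datetime import date
from pathlib import Path

SPLITS = ("train", "valid", "test")


def dataset_dir(root: Path, day: date) -> Path:
    """Path of the dated dataset directory, e.g. <root>/datasets/azaion-2026-04-22."""
    return root / "datasets" / f"azaion-{day.isoformat()}"


def split_dirs(dataset: Path) -> dict[str, tuple[Path, Path]]:
    """Map each split name to its (images, labels) directories."""
    return {split: (dataset / split / "images", dataset / split / "labels") for split in SPLITS}
```
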
## Configuration Files

| File | Location | Contents |
|------|----------|---------|
| `config.yaml` | Project root | API credentials, queue config, directory paths |
| `cdn.yaml` | Project root | CDN endpoint + S3 access keys |
| `classes.json` | Project root | Annotation class definitions (17 classes) |
| `checkpoint.txt` | Project root | Last training checkpoint timestamp |
| `offset.yaml` | annotation-queue/ | Queue consumer offset |
| `data.yaml` | Per dataset | YOLO training config (class names, split paths) |