# Data Model
## Entity Overview
This system does not use a database. All data is either stored as files on the filesystem or held in in-memory data structures. The primary entities are annotation images, labels, and ML models.
## Entities
### Annotation Image
- **Storage**: JPEG files on filesystem
- **Naming**: `{uuid}.jpg` (name assigned by Azaion platform)
- **Lifecycle**: Created → Seed/Validated → Augmented → Dataset → Model Training
### Annotation Label (YOLO format)
- **Storage**: Text files on filesystem
- **Naming**: `{uuid}.txt` (matches image name)
- **Format**: One line per detection: `{class_id} {center_x} {center_y} {width} {height}`
- **Coordinates**: All normalized to the 0–1 range relative to image dimensions
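
As an illustration of the label format above, the following is a minimal sketch of parsing one label line and converting it back to pixel coordinates. The names `YoloBox`, `parse_label_line`, and `to_pixel_box` are hypothetical and are not taken from the project code; the range check simply mirrors the normalization rule stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YoloBox:
    """One detection from a label file (hypothetical helper type)."""
    class_id: int
    center_x: float
    center_y: float
    width: float
    height: float


def parse_label_line(line: str) -> YoloBox:
    """Parse one `{class_id} {center_x} {center_y} {width} {height}` line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(p) for p in parts[1:])
    # Every coordinate must be normalized relative to the image dimensions.
    for name, value in (("center_x", cx), ("center_y", cy), ("width", w), ("height", h)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside the normalized range")
    return YoloBox(class_id, cx, cy, w, h)


def to_pixel_box(box: YoloBox, image_width: int, image_height: int) -> tuple:
    """Convert a normalized box to (left, top, right, bottom) in pixels."""
    half_w = box.width * image_width / 2
    half_h = box.height * image_height / 2
    cx = box.center_x * image_width
    cy = box.center_y * image_height
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

A line that fails the range check would be a candidate for whatever handling the pipeline applies to invalid labels.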
### AnnotationClass
- **Storage**: `classes.json` (static file, 17 entries)
- **Fields**: Id (int), Name (str), ShortName (str), Color (hex str)
- **Weather expansion**: Each class × 3 weather modes → IDs offset by 0/20/40
- **Total slots**: 80 (51 used, 29 reserved as "Class-N" placeholders)
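
The slot arithmetic above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the 17 base class IDs run from 0 to 16, that the "N" in a placeholder name is the slot index, and it uses a made-up `name/modeK` suffix for the weather variants. None of these details are confirmed by this document.

```python
WEATHER_OFFSETS = (0, 20, 40)  # from the text: one ID offset per weather mode
BASE_CLASS_COUNT = 17          # entries in classes.json
TOTAL_SLOTS = 80


def expanded_class_id(base_id: int, weather_mode: int) -> int:
    """Return the class ID for a base class under a given weather mode index."""
    if not 0 <= base_id < BASE_CLASS_COUNT:  # assumption: base IDs are 0..16
        raise ValueError(f"base_id {base_id} out of range")
    if not 0 <= weather_mode < len(WEATHER_OFFSETS):
        raise ValueError(f"weather_mode {weather_mode} out of range")
    return base_id + WEATHER_OFFSETS[weather_mode]


def build_slot_names(base_names: list) -> list:
    """Fill all 80 slots: 51 real names, the rest "Class-N" placeholders."""
    if len(base_names) != BASE_CLASS_COUNT:
        raise ValueError(f"expected {BASE_CLASS_COUNT} base names")
    # Start with placeholders everywhere (assumption: N is the slot index).
    names = [f"Class-{slot}" for slot in range(TOTAL_SLOTS)]
    for base_id, name in enumerate(base_names):
        for mode in range(len(WEATHER_OFFSETS)):
            # Hypothetical naming scheme for weather variants.
            names[expanded_class_id(base_id, mode)] = f"{name}/mode{mode}"
    return names
```

With 17 base classes and three offsets, 17 × 3 = 51 slots are filled and 80 − 51 = 29 remain placeholders, which matches the counts listed above.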
### Detection (inference)
- **In-memory only**: Created during inference postprocessing
- **Fields**: x, y, w, h (normalized), cls (int), confidence (float)
### Annotation (inference)
- **In-memory only**: Groups detections per video frame
- **Fields**: frame (image), time (ms), detections (list)
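
The two in-memory inference entities above could be modeled roughly as below. This is a sketch, not the project's actual classes: the field names follow the lists above, the `frame` type is left as `Any` because the image representation is not specified here, and `keep_confident` is a hypothetical helper added only to show the structures in use.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Detection:
    x: float           # normalized
    y: float           # normalized
    w: float           # normalized
    h: float           # normalized
    cls: int
    confidence: float


@dataclass
class Annotation:
    frame: Any         # the frame image; concrete type is an assumption left open
    time: int          # position in the video, in milliseconds
    detections: list = field(default_factory=list)


def keep_confident(annotation: Annotation, threshold: float) -> Annotation:
    """Return a copy of the annotation with low-confidence detections dropped."""
    kept = [d for d in annotation.detections if d.confidence >= threshold]
    return Annotation(frame=annotation.frame, time=annotation.time, detections=kept)
```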
### AnnotationMessage (queue)
- **Wire format**: msgpack with positional integer keys
- **Fields**: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (bytes), createdRole, createdEmail, source, status
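
To illustrate what "positional integer keys" means in practice, here is a minimal sketch that maps an already-unpacked message (a map with integer keys) back to named fields. It assumes key `i` corresponds to the `i`-th field in the order listed above, which this document does not confirm; `FIELD_ORDER` and `decode_annotation_message` are hypothetical names. Unpacking the msgpack bytes themselves is left to a msgpack library and is not shown.

```python
import json

# Assumption: positional key i maps to the i-th field in the documented order.
FIELD_ORDER = (
    "createdDate", "name", "originalMediaName", "time", "imageExtension",
    "detections", "image", "createdRole", "createdEmail", "source", "status",
)


def decode_annotation_message(payload: dict) -> dict:
    """Turn {0: ..., 1: ...} into {"createdDate": ..., "name": ...}."""
    message = {}
    for key, value in payload.items():
        if not isinstance(key, int) or not 0 <= key < len(FIELD_ORDER):
            raise KeyError(f"unexpected positional key: {key!r}")
        message[FIELD_ORDER[key]] = value
    # The detections field travels as a JSON string, so decode it separately.
    if isinstance(message.get("detections"), str):
        message["detections"] = json.loads(message["detections"])
    return message
```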
### ML Model
- **Formats**: .pt, .onnx, .engine, .rknn
- **Encryption**: AES-256-CBC before upload
- **Split storage**: .small part (API server) + .big part (CDN)
- **Naming**: `azaion.{ext}` for current model; `azaion.cc_{major}.{minor}_sm_{count}.engine` for GPU-specific TensorRT
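
The naming patterns above can be sketched as small helpers. Reading `cc_{major}.{minor}` as the GPU compute capability and `sm_{count}` as the streaming multiprocessor count is an interpretation, not something this document states; the function names are hypothetical.

```python
import re
from typing import Optional

MODEL_EXTENSIONS = ("pt", "onnx", "engine", "rknn")
_ENGINE_NAME = re.compile(r"^azaion\.cc_(\d+)\.(\d+)_sm_(\d+)\.engine$")


def current_model_filename(ext: str) -> str:
    """Build the `azaion.{ext}` name for the current model."""
    if ext not in MODEL_EXTENSIONS:
        raise ValueError(f"unsupported model format: {ext!r}")
    return f"azaion.{ext}"


def engine_filename(major: int, minor: int, count: int) -> str:
    """Build the GPU-specific TensorRT engine name."""
    return f"azaion.cc_{major}.{minor}_sm_{count}.engine"


def parse_engine_filename(name: str) -> Optional[tuple]:
    """Return (major, minor, count), or None if the name does not match."""
    match = _ENGINE_NAME.match(name)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())
```

Parsing the name back lets a loader check whether a cached engine file was built for the GPU it is about to run on, if that is how the project uses the suffix.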
## Filesystem Entity Relationships
```mermaid
erDiagram
ANNOTATION_IMAGE ||--|| ANNOTATION_LABEL : "matches by filename stem"
ANNOTATION_CLASS ||--o{ ANNOTATION_LABEL : "class_id references"
ANNOTATION_IMAGE }o--|| DATASET_SPLIT : "copied into"
ANNOTATION_LABEL }o--|| DATASET_SPLIT : "copied into"
DATASET_SPLIT ||--|| TRAINING_RUN : "input to"
TRAINING_RUN ||--|| MODEL_PT : "produces"
MODEL_PT ||--|| MODEL_ONNX : "exported to"
MODEL_PT ||--|| MODEL_ENGINE : "exported to"
MODEL_PT ||--|| MODEL_RKNN : "exported to"
MODEL_ONNX ||--|| ENCRYPTED_MODEL : "encrypted"
MODEL_ENGINE ||--|| ENCRYPTED_MODEL : "encrypted"
ENCRYPTED_MODEL ||--|| MODEL_SMALL : "split part"
ENCRYPTED_MODEL ||--|| MODEL_BIG : "split part"
```
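
The "matches by filename stem" relationship in the diagram can be sketched as a pure function over file names. `pair_by_stem` is a hypothetical helper; it also reports files that have no counterpart, which is an addition for illustration rather than documented behavior.

```python
from pathlib import PurePath


def pair_by_stem(image_names, label_names):
    """Pair image and label file names that share the same stem.

    Returns (pairs, images_without_label, labels_without_image).
    """
    images = {PurePath(name).stem: name for name in image_names}
    labels = {PurePath(name).stem: name for name in label_names}
    shared = sorted(images.keys() & labels.keys())
    pairs = [(images[stem], labels[stem]) for stem in shared]
    images_without_label = sorted(images[s] for s in images.keys() - labels.keys())
    labels_without_image = sorted(labels[s] for s in labels.keys() - images.keys())
    return pairs, images_without_label, labels_without_image
```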
## Directory Layout (Data Lifecycle)
```
/azaion/
├── data-seed/              ← Unvalidated annotations (from operators)
│   ├── images/
│   └── labels/
├── data/                   ← Validated annotations (from validators/admins)
│   ├── images/
│   └── labels/
├── data-processed/         ← Augmented data (8× expansion)
│   ├── images/
│   └── labels/
├── data-corrupted/         ← Invalid labels (coords > 1.0)
│   ├── images/
│   └── labels/
├── data_deleted/           ← Soft-deleted annotations
│   ├── images/
│   └── labels/
├── data-sample/            ← Random sample for review
├── datasets/               ← Training datasets (dated)
│   └── azaion-{YYYY-MM-DD}/
│       ├── train/images/ + labels/
│       ├── valid/images/ + labels/
│       ├── test/images/ + labels/
│       └── data.yaml
└── models/                 ← Trained model artifacts
    ├── azaion.pt           ← Current best model
    ├── azaion.onnx         ← Current ONNX export
    └── azaion-{YYYY-MM-DD}/ ← Per-training-run results
        └── weights/
            └── best.pt
```
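
The dated dataset and model paths in the layout above can be derived as in the sketch below. The helper names are hypothetical, and the project's actual path handling lives in its own configuration, which this sketch does not attempt to reproduce.

```python
from datetime import date
from pathlib import PurePosixPath

ROOT = PurePosixPath("/azaion")
SPLITS = ("train", "valid", "test")


def dataset_layout(dataset_date: date, root: PurePosixPath = ROOT) -> dict:
    """Return the paths that make up one dated training dataset."""
    dataset = root / "datasets" / f"azaion-{dataset_date.isoformat()}"
    layout = {"data_yaml": dataset / "data.yaml"}
    for split in SPLITS:
        layout[f"{split}_images"] = dataset / split / "images"
        layout[f"{split}_labels"] = dataset / split / "labels"
    return layout


def run_weights_path(run_date: date, root: PurePosixPath = ROOT) -> PurePosixPath:
    """Return the best-weights path for one dated training run."""
    return root / "models" / f"azaion-{run_date.isoformat()}" / "weights" / "best.pt"
```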
## Configuration Files
| File | Location | Contents |
|------|----------|---------|
| `config.yaml` | Project root | API credentials, queue config, directory paths |
| `cdn.yaml` | Project root | CDN endpoint + S3 access keys |
| `classes.json` | Project root | Annotation class definitions (17 classes) |
| `checkpoint.txt` | Project root | Last training checkpoint timestamp |
| `offset.yaml` | annotation-queue/ | Queue consumer offset |
| `data.yaml` | Per dataset | YOLO training config (class names, split paths) |