Refactor constants management to use Pydantic BaseModel for configuration

- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
2026-06-21 16:01:11 +00:00 · 2026-03-27 18:18:30 +02:00
parent b68c07b540
commit 142c6c4de8
106 changed files with 5706 additions and 654 deletions
@@ -0,0 +1,106 @@
+# Data Model
+
+## Entity Overview
+
+This system does not use a database. All data lives either in files on the filesystem or in in-memory data structures. The primary entities are annotation images, labels, and ML models.
+
+## Entities
+
+### Annotation Image
+- **Storage**: JPEG files on filesystem
+- **Naming**: `{uuid}.jpg` (name assigned by Azaion platform)
+- **Lifecycle**: Created → Seed/Validated → Augmented → Dataset → Model Training
+
+### Annotation Label (YOLO format)
+- **Storage**: Text files on filesystem
+- **Naming**: `{uuid}.txt` (matches image name)
+- **Format**: One line per detection: `{class_id} {center_x} {center_y} {width} {height}`
+- **Coordinates**: All normalized to 0–1 range relative to image dimensions
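The label format above can be sketched as a small parser. This is a minimal illustration, not the project's own loader: the names `YoloBox`, `parse_label_line`, and `parse_label_text` are hypothetical, and rejecting values outside 0–1 is an assumption based on the normalization rule stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YoloBox:
    """One detection line of a YOLO label file (hypothetical helper type)."""
    class_id: int
    center_x: float
    center_y: float
    width: float
    height: float


def parse_label_line(line: str) -> YoloBox:
    """Parse one `{class_id} {center_x} {center_y} {width} {height}` line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    box = YoloBox(int(parts[0]), *(float(p) for p in parts[1:]))
    coords = (box.center_x, box.center_y, box.width, box.height)
    # Assumption: anything outside the normalized 0-1 range is treated as invalid.
    if not all(0.0 <= c <= 1.0 for c in coords):
        raise ValueError(f"coordinate outside the 0-1 range: {line!r}")
    return box


def parse_label_text(text: str) -> list[YoloBox]:
    """Parse the contents of a whole label file, skipping blank lines."""
    return [parse_label_line(ln) for ln in text.splitlines() if ln.strip()]
```

A caller could catch the `ValueError` to decide whether an image/label pair should be set aside rather than used for training.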
+
+### AnnotationClass
+- **Storage**: `classes.json` (static file, 17 entries)
+- **Fields**: Id (int), Name (str), ShortName (str), Color (hex str)
+- **Weather expansion**: Each class × 3 weather modes → IDs offset by 0/20/40
+- **Total slots**: 80 (17 classes × 3 weather modes = 51 used, 29 reserved as "Class-N" placeholders)
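The slot arithmetic above can be checked with a short sketch. It assumes the 17 base classes have Ids 0 through 16 and that a placeholder's N is its slot index; the function name `expand_classes` and the "(mode N)" suffix are hypothetical, since the weather mode names are not given here.

```python
TOTAL_SLOTS = 80
WEATHER_OFFSETS = (0, 20, 40)  # offsets from the data model; mode names are not specified


def expand_classes(base_names: list[str]) -> list[str]:
    """Return 80 slot names; slots no class maps to stay 'Class-N' placeholders."""
    slots = [f"Class-{i}" for i in range(TOTAL_SLOTS)]
    for mode_index, offset in enumerate(WEATHER_OFFSETS):
        for class_id, name in enumerate(base_names):
            # Hypothetical naming: suffix the non-default modes with their index.
            suffix = "" if mode_index == 0 else f" (mode {mode_index})"
            slots[offset + class_id] = f"{name}{suffix}"
    return slots


# Stand-in names for the 17 entries of classes.json (assumed Ids 0..16).
base = [f"base-{i}" for i in range(17)]
slots = expand_classes(base)
placeholders = [s for s in slots if s.startswith("Class-")]
used = len(slots) - len(placeholders)
```

With 17 base classes, the three ranges 0–16, 20–36, and 40–56 do not overlap, which is why the used count comes out at 51 and the remaining 29 slots keep their placeholder names.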
+
+### Detection (inference)
+- **In-memory only**: Created during inference postprocessing
+- **Fields**: x, y, w, h (normalized), cls (int), confidence (float)
+
+### Annotation (inference)
+- **In-memory only**: Groups detections per video frame
+- **Fields**: frame (image), time (ms), detections (list)
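The two in-memory inference entities above could be modelled roughly as follows. This is a sketch with the field names taken from the lists above; the type of `frame` and the helper `filter_confident` are assumptions added for illustration.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass
class Detection:
    x: float           # normalized
    y: float           # normalized
    w: float           # normalized
    h: float           # normalized
    cls: int
    confidence: float


@dataclass
class Annotation:
    frame: Any         # the frame image; its concrete type is an assumption left open here
    time: int          # milliseconds
    detections: list[Detection] = field(default_factory=list)


def filter_confident(annotation: Annotation, threshold: float) -> Annotation:
    """Hypothetical helper: keep only detections at or above a confidence threshold."""
    kept = [d for d in annotation.detections if d.confidence >= threshold]
    return replace(annotation, detections=kept)
```

Returning a new `Annotation` via `dataclasses.replace` leaves the original detection list untouched, which keeps postprocessing steps easy to test in isolation.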
+
+### AnnotationMessage (queue)
+- **Wire format**: msgpack with positional integer keys
+- **Fields**: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (bytes), createdRole, createdEmail, source, status
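To illustrate what positional integer keys imply for a consumer, the sketch below maps an already-unpacked payload back to named fields. It assumes, without confirmation from the source, that keys are 0-based and follow the field order listed above; `FIELD_ORDER` and `name_fields` are hypothetical names. With the `msgpack` Python package, unpacking maps that use integer keys requires `strict_map_key=False`.

```python
import json

# Assumption: key i corresponds to the i-th field in the list above (0-based).
FIELD_ORDER = (
    "createdDate", "name", "originalMediaName", "time", "imageExtension",
    "detections", "image", "createdRole", "createdEmail", "source", "status",
)


def name_fields(payload: dict) -> dict:
    """Turn {0: ..., 1: ...} into {'createdDate': ..., 'name': ...}."""
    named = {FIELD_ORDER[index]: value for index, value in payload.items()}
    detections = named.get("detections")
    if isinstance(detections, str):
        # The detections field travels as a JSON string, so decode it here.
        named["detections"] = json.loads(detections)
    return named
```

For example, a payload of `{1: "frame-a", 3: 1500}` would come back with the keys `name` and `time` under this assumed ordering.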
+
+### ML Model
+- **Formats**: .pt, .onnx, .engine, .rknn
+- **Encryption**: AES-256-CBC before upload
+- **Split storage**: .small part (API server) + .big part (CDN)
+- **Naming**: `azaion.{ext}` for the current model; `azaion.cc_{major}.{minor}_sm_{count}.engine` for GPU-specific TensorRT engines
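The naming pattern and the small/big split above can be sketched as two helpers. Both function names are hypothetical, the split size is an arbitrary parameter because the real split point is not stated here, and encryption is left out of the sketch entirely.

```python
def engine_filename(major: int, minor: int, count: int) -> str:
    """Build a GPU-specific engine name following the documented pattern."""
    return f"azaion.cc_{major}.{minor}_sm_{count}.engine"


def split_model(blob: bytes, small_size: int) -> tuple[bytes, bytes]:
    """Split an (already encrypted) model into a small part and a big part.

    Assumption: the small part is simply the first `small_size` bytes.
    """
    if small_size < 0:
        raise ValueError("small_size must be non-negative")
    return blob[:small_size], blob[small_size:]


def join_model(small: bytes, big: bytes) -> bytes:
    """Reassemble the two parts in the same order they were split."""
    return small + big
```

Keeping the split and join as inverse operations makes it easy to verify that a downloaded pair of parts reproduces the original encrypted file before decryption is attempted.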
+
+## Filesystem Entity Relationships
+
+```mermaid
+erDiagram
+    ANNOTATION_IMAGE ||--|| ANNOTATION_LABEL : "matches by filename stem"
+    ANNOTATION_CLASS ||--o{ ANNOTATION_LABEL : "class_id references"
+    ANNOTATION_IMAGE }o--|| DATASET_SPLIT : "copied into"
+    ANNOTATION_LABEL }o--|| DATASET_SPLIT : "copied into"
+    DATASET_SPLIT ||--|| TRAINING_RUN : "input to"
+    TRAINING_RUN ||--|| MODEL_PT : "produces"
+    MODEL_PT ||--|| MODEL_ONNX : "exported to"
+    MODEL_PT ||--|| MODEL_ENGINE : "exported to"
+    MODEL_PT ||--|| MODEL_RKNN : "exported to"
+    MODEL_ONNX ||--|| ENCRYPTED_MODEL : "encrypted"
+    MODEL_ENGINE ||--|| ENCRYPTED_MODEL : "encrypted"
+    ENCRYPTED_MODEL ||--|| MODEL_SMALL : "split part"
+    ENCRYPTED_MODEL ||--|| MODEL_BIG : "split part"
+```
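The "matches by filename stem" relationship in the diagram could be resolved on disk roughly as shown below. This is a sketch under the assumption that images use the `.jpg` extension and labels `.txt`, as described in the entity sections; `pair_by_stem` is a hypothetical helper.

```python
from pathlib import Path


def pair_by_stem(images_dir: Path, labels_dir: Path):
    """Return (pairs, orphans): image/label pairs and images with no label file."""
    labels = {p.stem: p for p in labels_dir.glob("*.txt")}
    pairs, orphans = [], []
    for image in sorted(images_dir.glob("*.jpg")):
        label = labels.get(image.stem)
        if label is not None:
            pairs.append((image, label))
        else:
            # No label shares this image's stem, so report it separately.
            orphans.append(image)
    return pairs, orphans
```

Reporting orphans instead of silently dropping them gives a dataset-building step the chance to log or quarantine incomplete annotations.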
+
+## Directory Layout (Data Lifecycle)
+
+```
+/azaion/
+├── data-seed/              ← Unvalidated annotations (from operators)
+│   ├── images/
+│   └── labels/
+├── data/                   ← Validated annotations (from validators/admins)
+│   ├── images/
+│   └── labels/
+├── data-processed/         ← Augmented data (8× expansion)
+│   ├── images/
+│   └── labels/
+├── data-corrupted/         ← Invalid labels (coords > 1.0)
+│   ├── images/
+│   └── labels/
+├── data_deleted/           ← Soft-deleted annotations
+│   ├── images/
+│   └── labels/
+├── data-sample/            ← Random sample for review
+├── datasets/               ← Training datasets (dated)
+│   └── azaion-{YYYY-MM-DD}/
+│       ├── train/images/ + labels/
+│       ├── valid/images/ + labels/
+│       ├── test/images/ + labels/
+│       └── data.yaml
+└── models/                 ← Trained model artifacts
+    ├── azaion.pt           ← Current best model
+    ├── azaion.onnx         ← Current ONNX export
+    └── azaion-{YYYY-MM-DD}/← Per-training-run results
+        └── weights/
+            └── best.pt
+```
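The layout above lends itself to a single object that derives every path from one root. The sketch below uses a standard-library dataclass as a stand-in; it is not the Pydantic Config class introduced by this commit, whose fields are not shown here, and the names `Layout`, `dataset_dir`, and `split_dirs` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class Layout:
    """Derives the documented directories from a single root (stand-in sketch)."""
    root: Path = Path("/azaion")

    @property
    def data_dir(self) -> Path:
        return self.root / "data"

    @property
    def processed_dir(self) -> Path:
        return self.root / "data-processed"

    @property
    def models_dir(self) -> Path:
        return self.root / "models"

    def dataset_dir(self, day: date) -> Path:
        # Dated dataset folder, e.g. datasets/azaion-2026-03-27
        return self.root / "datasets" / f"azaion-{day.isoformat()}"

    def split_dirs(self, day: date, split: str) -> tuple[Path, Path]:
        """Return the (images, labels) directories for train, valid, or test."""
        if split not in ("train", "valid", "test"):
            raise ValueError(f"unknown split: {split!r}")
        base = self.dataset_dir(day) / split
        return base / "images", base / "labels"
```

Centralizing the path logic this way means a test can point `root` at a temporary directory and exercise the same code paths as production.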
+
+## Configuration Files
+
+| File | Location | Contents |
+|------|----------|---------|
+| `config.yaml` | Project root | API credentials, queue config, directory paths |
+| `cdn.yaml` | Project root | CDN endpoint + S3 access keys |
+| `classes.json` | Project root | Annotation class definitions (17 classes) |
+| `checkpoint.txt` | Project root | Last training checkpoint timestamp |
+| `offset.yaml` | annotation-queue/ | Queue consumer offset |
+| `data.yaml` | Per dataset | YOLO training config (class names, split paths) |