# Data Model

## Entity Overview

This system does not use a database. All data is stored as files on the filesystem and in-memory data structures. The primary entities are annotation images, labels, and ML models.

## Entities

### Annotation Image

- **Storage**: JPEG files on filesystem
- **Naming**: `{uuid}.jpg` (name assigned by Azaion platform)
- **Lifecycle**: Created → Seed/Validated → Augmented → Dataset → Model Training

### Annotation Label (YOLO format)

- **Storage**: Text files on filesystem
- **Naming**: `{uuid}.txt` (matches image name)
- **Format**: One line per detection: `{class_id} {center_x} {center_y} {width} {height}`
- **Coordinates**: All normalized to 0–1 range relative to image dimensions

### AnnotationClass

- **Storage**: `classes.json` (static file, 17 entries)
- **Fields**: Id (int), Name (str), ShortName (str), Color (hex str)
- **Weather expansion**: Each class × 3 weather modes → IDs offset by 0/20/40
- **Total slots**: 80 (51 used, 29 reserved as "Class-N" placeholders)

### Detection (inference)

- **In-memory only**: Created during inference postprocessing
- **Fields**: x, y, w, h (normalized), cls (int), confidence (float)

### Annotation (inference)

- **In-memory only**: Groups detections per video frame
- **Fields**: frame (image), time (ms), detections (list)

### AnnotationMessage (queue)

- **Wire format**: msgpack with positional integer keys
- **Fields**: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (bytes), createdRole, createdEmail, source, status

### ML Model

- **Formats**: .pt, .onnx, .engine, .rknn
- **Encryption**: AES-256-CBC before upload
- **Split storage**: .small part (API server) + .big part (CDN)
- **Naming**: `azaion.{ext}` for current model; `azaion.cc_{major}.{minor}_sm_{count}.engine` for GPU-specific TensorRT

## Filesystem Entity Relationships

```mermaid
erDiagram
    ANNOTATION_IMAGE ||--|| ANNOTATION_LABEL : "matches by filename stem"
    ANNOTATION_CLASS ||--o{ ANNOTATION_LABEL : "class_id references"
    ANNOTATION_IMAGE }o--|| DATASET_SPLIT : "copied into"
    ANNOTATION_LABEL }o--|| DATASET_SPLIT : "copied into"
    DATASET_SPLIT ||--|| TRAINING_RUN : "input to"
    TRAINING_RUN ||--|| MODEL_PT : "produces"
    MODEL_PT ||--|| MODEL_ONNX : "exported to"
    MODEL_PT ||--|| MODEL_ENGINE : "exported to"
    MODEL_PT ||--|| MODEL_RKNN : "exported to"
    MODEL_ONNX ||--|| ENCRYPTED_MODEL : "encrypted"
    MODEL_ENGINE ||--|| ENCRYPTED_MODEL : "encrypted"
    ENCRYPTED_MODEL ||--|| MODEL_SMALL : "split part"
    ENCRYPTED_MODEL ||--|| MODEL_BIG : "split part"
```

## Directory Layout (Data Lifecycle)

```
/azaion/
├── data-seed/               ← Unvalidated annotations (from operators)
│   ├── images/
│   └── labels/
├── data/                    ← Validated annotations (from validators/admins)
│   ├── images/
│   └── labels/
├── data-processed/          ← Augmented data (8× expansion)
│   ├── images/
│   └── labels/
├── data-corrupted/          ← Invalid labels (coords > 1.0)
│   ├── images/
│   └── labels/
├── data_deleted/            ← Soft-deleted annotations
│   ├── images/
│   └── labels/
├── data-sample/             ← Random sample for review
├── datasets/                ← Training datasets (dated)
│   └── azaion-{YYYY-MM-DD}/
│       ├── train/images/ + labels/
│       ├── valid/images/ + labels/
│       ├── test/images/ + labels/
│       └── data.yaml
└── models/                  ← Trained model artifacts
    ├── azaion.pt            ← Current best model
    ├── azaion.onnx          ← Current ONNX export
    └── azaion-{YYYY-MM-DD}/ ← Per-training-run results
        └── weights/
            └── best.pt
```

## Configuration Files

| File | Location | Contents |
|------|----------|----------|
| `config.yaml` | Project root | API credentials, queue config, directory paths |
| `cdn.yaml` | Project root | CDN endpoint + S3 access keys |
| `classes.json` | Project root | Annotation class definitions (17 classes) |
| `checkpoint.txt` | Project root | Last training checkpoint timestamp |
| `offset.yaml` | annotation-queue/ | Queue consumer offset |
| `data.yaml` | Per dataset | YOLO training config (class names, split paths) |
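
To make the Annotation Label format and the weather-offset class IDs from the Entities section concrete, here is a minimal Python sketch. It is not taken from the project code: the names `LabelBox`, `parse_label_line`, `has_valid_coords`, and `split_class_id` are hypothetical, and it assumes a slot ID is formed as base class `Id` plus the weather offset (0, 20, or 40). The source only says labels with coordinates above 1.0 go to `data-corrupted/`; rejecting negative values as well is an added assumption.

```python
from dataclasses import dataclass
from typing import Tuple

# Offsets from the AnnotationClass section: each class x 3 weather modes.
WEATHER_OFFSETS = (0, 20, 40)


@dataclass(frozen=True)
class LabelBox:
    """One detection line from a `{uuid}.txt` label file (hypothetical type)."""
    class_id: int
    center_x: float
    center_y: float
    width: float
    height: float


def parse_label_line(line: str) -> LabelBox:
    """Parse `{class_id} {center_x} {center_y} {width} {height}`."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    cx, cy, w, h = (float(p) for p in parts[1:])
    return LabelBox(int(parts[0]), cx, cy, w, h)


def has_valid_coords(box: LabelBox) -> bool:
    """True when every coordinate lies in the normalized 0-1 range.

    Assumption: values below 0 are treated as invalid too, although the
    directory layout only mentions coords > 1.0.
    """
    coords = (box.center_x, box.center_y, box.width, box.height)
    return all(0.0 <= c <= 1.0 for c in coords)


def split_class_id(class_id: int) -> Tuple[int, int]:
    """Split a slot ID into (base_id, weather_offset).

    Assumption: slot ID = base class Id + weather offset.
    """
    if class_id < 0:
        raise ValueError(f"class_id must be non-negative, got {class_id}")
    offset = max(o for o in WEATHER_OFFSETS if o <= class_id)
    return class_id - offset, offset
```

Under these assumptions, `split_class_id(25)` returns `(5, 20)`, meaning base class 5 in the second weather mode, and a line such as `3 0.5 1.2 0.2 0.1` fails `has_valid_coords` because `center_y` exceeds 1.0.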