Refactor constants management to use Pydantic BaseModel for configuration

- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
2026-04-22 11:26:36 +00:00 · 2026-03-27 18:18:30 +02:00
parent b68c07b540
commit 142c6c4de8
106 changed files with 5706 additions and 654 deletions
@@ -0,0 +1,47 @@
+# Input Data Parameters
+
+## Annotation Images
+
+- **Format**: JPEG
+- **Naming**: UUID-based (`{uuid}.jpg`)
+- **Source**: Azaion annotation platform via RabbitMQ Streams
+- **Volume**: 360K+ annotations (largest figure mentioned in training comments)
+- **Delivery**: Real-time streaming via annotation queue consumer
+
+## Annotation Labels
+
+- **Format**: YOLO text format (one detection per line)
+- **Schema**: `{class_id} {center_x} {center_y} {width} {height}`
+- **Coordinate system**: All values normalized to 0–1 relative to image dimensions
+- **Constraints**: Coordinates must be in [0, 1]; labels with coords > 1.0 are treated as corrupted
+
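To make the label schema concrete, here is a minimal sketch that parses one label line and converts the normalized center/size values to pixel corners. The names `YoloBox`, `parse_label_line`, and `to_pixel_corners` are hypothetical and do not come from the repository; the sketch assumes `center_x`/`center_y` denote the box center, as is conventional for YOLO labels.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    """One detection from a YOLO label file (hypothetical helper type)."""
    class_id: int
    center_x: float
    center_y: float
    width: float
    height: float


def parse_label_line(line: str) -> YoloBox:
    """Parse one `{class_id} {center_x} {center_y} {width} {height}` line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return YoloBox(int(parts[0]), *(float(p) for p in parts[1:]))


def to_pixel_corners(box: YoloBox, img_w: int, img_h: int) -> tuple[float, float, float, float]:
    """Convert normalized center/size values to pixel (x_min, y_min, x_max, y_max)."""
    x_min = (box.center_x - box.width / 2) * img_w
    y_min = (box.center_y - box.height / 2) * img_h
    x_max = (box.center_x + box.width / 2) * img_w
    y_max = (box.center_y + box.height / 2) * img_h
    return x_min, y_min, x_max, y_max


box = parse_label_line("0 0.5 0.5 0.1 0.1")
corners = to_pixel_corners(box, img_w=1280, img_h=720)  # roughly (576, 324, 704, 396)
```

Because every value is relative to the image dimensions, the same label stays valid after the image is resized.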
+## Annotation Classes
+
+- **Source file**: `classes.json` (static, 17 entries)
+- **Schema per class**: `{ Id: int, Name: str, ShortName: str, Color: hex_str }`
+- **Classes**: ArmorVehicle, Truck, Vehicle, Artillery, Shadow, Trenches, MilitaryMan, TyreTracks, AdditArmoredTank, Smoke, Plane, Moto, CamouflageNet, CamouflageBranches, Roof, Building, Caponier
+- **Weather expansion**: Each class × 3 modes, each mode adding an offset to the class ID (Norm +0, Wint +20, Night +40)
+- **Total class IDs**: 80 slots (17 × 3 = 51 used, 29 reserved as placeholders)
+
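The slot arithmetic above can be checked with a short sketch. It assumes the 17 base class IDs are contiguous (0 through 16), which the text does not state; `expanded_class_id` is a hypothetical helper name.

```python
# Offsets per weather mode, taken from the list above.
WEATHER_OFFSETS = {"Norm": 0, "Wint": 20, "Night": 40}
BASE_CLASS_COUNT = 17   # entries in classes.json
TOTAL_SLOTS = 80        # total class ID slots


def expanded_class_id(base_id: int, mode: str) -> int:
    """Return the class ID for a base class in a given weather mode."""
    return base_id + WEATHER_OFFSETS[mode]


# Assumption: base IDs are 0..16, so the three ranges do not collide.
used_ids = {
    expanded_class_id(base_id, mode)
    for base_id in range(BASE_CLASS_COUNT)
    for mode in WEATHER_OFFSETS
}
reserved = TOTAL_SLOTS - len(used_ids)
print(len(used_ids), reserved)  # 51 29
```

With an offset step of 20 and only 17 base classes, IDs 17 to 19, 37 to 39, and 57 to 79 remain unused, which accounts for the 29 placeholder slots.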
+## Queue Messages
+
+- **Protocol**: AMQP via RabbitMQ Streams (rstream library)
+- **Serialization**: msgpack with positional integer keys
+- **Message types**: AnnotationMessage (single), AnnotationBulkMessage (batch validate/delete)
+- **Fields**: createdDate, name, originalMediaName, time, imageExtension, detections (JSON string), image (raw bytes), createdRole, createdEmail, source, status
+
+## Configuration Files
+
+| File | Format | Key Contents |
+|------|--------|-------------|
+| `config.yaml` | YAML | API URL, email, password, queue host/port/username/password, directory paths |
+| `cdn.yaml` | YAML | CDN endpoint, read access key/secret, write access key/secret, bucket name |
+| `classes.json` | JSON | Annotation class definitions array |
+| `checkpoint.txt` | Plain text | Last training run timestamp |
+| `offset.yaml` | YAML | Queue consumer offset for resume |
+
+## Video Input (Inference)
+
+- **Format**: Any OpenCV-supported video format
+- **Processing**: Every 4th frame sampled, batched in groups of 4
+- **Resolution**: Resized to model input size (1280×1280) during preprocessing
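The sampling and batching rule can be sketched without OpenCV by treating the video as any iterable of frames. This is an illustration only: `sample_and_batch` is a hypothetical name, and the sketch assumes sampling starts at the first frame and that a trailing partial batch is still emitted, neither of which the text specifies.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_and_batch(frames: Iterable[T], step: int = 4, batch_size: int = 4) -> Iterator[list[T]]:
    """Keep every `step`-th frame and yield the kept frames in lists of `batch_size`."""
    sampled = islice(frames, 0, None, step)  # frames 0, step, 2*step, ...
    while True:
        batch = list(islice(sampled, batch_size))
        if not batch:
            return
        yield batch


# Frame indices stand in for decoded frames here.
batches = list(sample_and_batch(range(40)))
# [[0, 4, 8, 12], [16, 20, 24, 28], [32, 36]]
```

Each batch would then be resized to the model input size before inference.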
@@ -0,0 +1,107 @@
+# Expected Results
+
+This document maps every input data item to a quantifiable expected result.
+
+## Result Format Legend
+
+| Result Type | When to Use | Example |
+|-------------|-------------|---------|
+| Exact value | Output must match precisely | `detection_count: 3`, `file_count: 8` |
+| Tolerance range | Numeric output with acceptable variance | `confidence: 0.92 ± 0.05` |
+| Threshold | Output must exceed or stay below a limit | `latency < 500ms`, `confidence ≥ 0.3` |
+| Pattern match | Output must match a string/regex pattern | `filename matches *_1.jpg` |
+| Set/count | Output must contain specific items or counts | `output_count == 8` |
+
+## Input → Expected Result Mapping
+
+### Augmentation
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 1 | 1 image + 1 label from `dataset/` | Single annotated image with valid bboxes | `output_count: 8` (1 original + 7 augmented) | exact | N/A | N/A |
+| 2 | 1 image + 1 label from `dataset/` | Same image, output filenames | Original keeps name; augmented named `{stem}_1` through `{stem}_7` | pattern | N/A | N/A |
+| 3 | 1 image + 1 label from `dataset/` | All output label bboxes | Every coordinate in [0, 1] range | range | [0.0, 1.0] | N/A |
+| 4 | 1 image + label with bbox near edge (x=0.99, w=0.1) | Bbox partially outside image | Bbox clipped: width reduced, tiny bboxes (area < 0.01) removed | threshold_min | width ≥ 0.01, height ≥ 0.01 | N/A |
+| 5 | 1 image + empty label file | Image with no detections | `output_count: 8`, all label files empty | exact | N/A | N/A |
+
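Row 4 can be illustrated with a minimal clipping sketch. It is not the repository's augmentation code: `clip_bbox` is a hypothetical helper, and the removal rule follows the Tolerance column (width and height each at least 0.01) rather than the area wording in the Expected Result column.

```python
from typing import Optional

MIN_SIDE = 0.01  # minimum width/height, taken from the Tolerance column of row 4


def clip_bbox(cx: float, cy: float, w: float, h: float,
              min_side: float = MIN_SIDE) -> Optional[tuple[float, float, float, float]]:
    """Clip a normalized center-format bbox to the image; return None if it becomes too small."""
    x_min = max(0.0, cx - w / 2)
    x_max = min(1.0, cx + w / 2)
    y_min = max(0.0, cy - h / 2)
    y_max = min(1.0, cy + h / 2)
    new_w = x_max - x_min
    new_h = y_max - y_min
    if new_w < min_side or new_h < min_side:
        return None  # tiny (or fully outside) boxes are dropped
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, new_w, new_h)


# The row 4 input: x=0.99, w=0.1 extends past the right edge.
clipped = clip_bbox(0.99, 0.5, 0.1, 0.1)  # width shrinks to about 0.06, center moves to about 0.97
```

After clipping, every coordinate satisfies the [0, 1] range check from row 3.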
+### Dataset Formation
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 6 | 100 images + 100 labels from `dataset/` | Full fixture dataset | 3 folders created: `train/`, `valid/`, `test/` | exact | N/A | N/A |
+| 7 | 100 images + 100 labels from `dataset/` | Split ratio | train: 70, valid: 20, test: 10 | exact | N/A | N/A |
+| 8 | 100 images + 100 labels from `dataset/` | Each split has images/ and labels/ subdirs | `train/images/`, `train/labels/`, `valid/images/`, `valid/labels/`, `test/images/`, `test/labels/` | exact | N/A | N/A |
+| 9 | 100 images + 100 labels from `dataset/` | Total files across all splits equals input count | `len(train) + len(valid) + len(test) == 100` | exact | N/A | N/A |
+
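The 70/20/10 split in rows 7 and 9 could be computed as follows. This is a sketch under assumptions: `split_dataset` is a hypothetical name, the shuffle is not confirmed by the text, and integer percentages are used so the counts are exact.

```python
import random
from typing import Optional, Sequence


def split_dataset(items: Sequence, train_pct: int = 70, valid_pct: int = 20,
                  seed: Optional[int] = None) -> tuple[list, list, list]:
    """Shuffle items and split them into train/valid/test; test receives the remainder."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumption: the split is randomized
    n_train = len(shuffled) * train_pct // 100
    n_valid = len(shuffled) * valid_pct // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


train, valid, test = split_dataset(range(100), seed=0)
print(len(train), len(valid), len(test))  # 70 20 10
```

Giving the test split the remainder guarantees that no item is lost to rounding, which is what row 9 checks.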
+### Label Validation
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 10 | Label file: `0 0.5 0.5 0.1 0.1` | Valid label (all coords ≤ 1.0) | `check_label` returns `True` | exact | N/A | N/A |
+| 11 | Label file: `0 1.5 0.5 0.1 0.1` | Corrupted label (x > 1.0) | `check_label` returns `False` | exact | N/A | N/A |
+| 12 | Label file: `0 0.5 0.5 0.1 1.2` | Corrupted label (h > 1.0) | `check_label` returns `False` | exact | N/A | N/A |
+| 13 | Non-existent label path | Missing label file | `check_label` returns `False` | exact | N/A | N/A |
+| 14 | Mix of 5 valid + 1 corrupted images/labels | Dataset formation with corrupted data | Corrupted image+label moved to `data-corrupted/`; valid ones in dataset splits | exact | corrupted_count: 1, valid_count: 5 | N/A |
+
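Rows 10 to 13 pin down the observable behavior of `check_label`. The body below is an assumed implementation that satisfies those rows, not the repository's code; it rejects only coordinates above 1.0, as the rows describe, and a stricter variant could also reject negative values.

```python
from pathlib import Path
from typing import Union


def check_label(label_path: Union[str, Path]) -> bool:
    """Return True if the label file exists and no coordinate exceeds 1.0."""
    path = Path(label_path)
    if not path.is_file():
        return False  # row 13: missing label file
    for line in path.read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            coords = [float(value) for value in fields[1:]]
        except ValueError:
            return False  # assumption: unparsable values count as corrupted
        if any(coord > 1.0 for coord in coords):
            return False  # rows 11 and 12: coordinate above 1.0
    return True
```

Under this sketch an empty label file counts as valid, which matches row 5 where images without detections keep empty label files.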
+### Encryption Roundtrip
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 15 | 1024 random bytes + key "test-key" | Arbitrary binary data | `decrypt(encrypt(data, key), key) == data` | exact | N/A | N/A |
+| 16 | `azaion.onnx` bytes + model encryption key | Full ONNX model file | `decrypt(encrypt(model_bytes, key), key) == model_bytes` | exact | N/A | N/A |
+| 17 | Empty bytes + key "test-key" | Edge case: zero-length input | `decrypt(encrypt(b"", key), key) == b""` | exact | N/A | N/A |
+| 18 | 1 byte + key "test-key" | Edge case: minimum-length input | `decrypt(encrypt(b"\x00", key), key) == b"\x00"` | exact | N/A | N/A |
+
+### Model Encryption + Split
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 19 | 10000 bytes, key | Model-like binary data | Encrypted bytes split: small ≤ 3KB or 20% of total, big = remainder | threshold_max | small ≤ max(3072, total*0.2) | N/A |
+| 20 | 10000 bytes, key | Same data, reassembled | `small + big == encrypted_total` | exact | N/A | N/A |
+
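One way to satisfy rows 19 and 20 is sketched below. The exact sizing rule is an assumption: the rows only bound the small part from above, so this sketch takes the smaller of 3 KB and 20% of the total, which stays within `max(3072, total*0.2)`. The name `split_encrypted` is hypothetical.

```python
SMALL_PART_MAX_BYTES = 3 * 1024  # the 3KB figure from row 19
SMALL_PART_MAX_PERCENT = 20      # the 20% figure from row 19


def split_encrypted(data: bytes) -> tuple[bytes, bytes]:
    """Split already-encrypted bytes into a small head and a big remainder."""
    # Assumption: the small part is the smaller of the two limits.
    small_size = min(SMALL_PART_MAX_BYTES, len(data) * SMALL_PART_MAX_PERCENT // 100)
    return data[:small_size], data[small_size:]


encrypted = bytes(10000)  # placeholder for real encrypted bytes
small, big = split_encrypted(encrypted)
print(len(small), len(big))  # 2000 8000
```

Because the two parts are plain slices of the same buffer, concatenating them always restores the encrypted bytes, which is the property row 20 checks.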
+### Annotation Class Loading
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 21 | `classes.json` | Standard class definitions | `len(classes) == 17` unique base classes | exact | N/A | N/A |
+| 22 | `classes.json` | Weather mode expansion | Class IDs: Norm offset 0, Wint offset 20, Night offset 40 | exact | N/A | N/A |
+| 23 | `classes.json` | Total class slots in data.yaml | `nc: 80` in generated YAML | exact | N/A | N/A |
+
+### Hardware Hash Determinism
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 24 | String "test-hardware-info" | Arbitrary hardware string | `get_hw_hash(s) == get_hw_hash(s)` for the same input `s` (deterministic) | exact | N/A | N/A |
+| 25 | Strings "hw-a" and "hw-b" | Different hardware strings | `get_hw_hash("hw-a") != get_hw_hash("hw-b")` | exact | N/A | N/A |
+| 26 | String "test-hardware-info" | Hash format | Result is base64-encoded string, length > 0 | pattern | matches `^[A-Za-z0-9+/]+=*$` | N/A |
+
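The three properties in rows 24 to 26 (deterministic, input-sensitive, base64 output) hold for any plain digest that is then base64-encoded. The sketch below uses SHA-384 purely as an assumption; the text does not say which hash or salt the real `get_hw_hash` uses.

```python
import base64
import hashlib


def get_hw_hash(hardware: str) -> str:
    """Return a base64-encoded digest of the hardware string (SHA-384 is an assumption)."""
    digest = hashlib.sha384(hardware.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


first = get_hw_hash("test-hardware-info")
second = get_hw_hash("test-hardware-info")
# first == second, and get_hw_hash("hw-a") != get_hw_hash("hw-b")
```

A digest with no random salt or timestamp is what makes the function deterministic across runs.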
+### ONNX Inference Smoke Test
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 27 | `azaion.onnx` + 1 image from `dataset/` | Model + annotated image (known to contain objects) | Engine loads without error; returns output array with shape [batch, N, 6] | exact (no exception) | N/A | N/A |
+| 28 | `azaion.onnx` + 1 image from `dataset/` | Inference postprocessing | Returns list of Detection objects (≥ 0 items); each Detection has x, y, w, h in [0,1], cls ≥ 0, confidence in [0,1] | range | x,y,w,h ∈ [0,1]; confidence ∈ [0,1]; cls ∈ [0,79] | N/A |
+
+### NMS / Overlap Removal
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 29 | 2 Detections: same position, conf 0.9 and 0.5, IoU > 0.3 | Overlapping detections, different confidence | 1 detection remaining (conf 0.9 kept) | exact | count: 1 | N/A |
+| 30 | 2 Detections: non-overlapping positions, IoU < 0.3 | Non-overlapping detections | 2 detections remaining (both kept) | exact | count: 2 | N/A |
+| 31 | 3 Detections: A overlaps B, B overlaps C, A doesn't overlap C | Chain overlap | ≤ 2 detections remaining; highest confidence per overlap pair kept | threshold_max | count ≤ 2 | N/A |
+
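The three overlap cases can be reproduced with a greedy, confidence-ordered suppression sketch. This is not the repository's implementation: the `Detection` fields come from row 28, but treating `x`, `y` as the box center, ignoring class when comparing boxes, and the name `remove_overlapping` are all assumptions. The 0.3 IoU threshold comes from rows 29 and 30.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    x: float           # assumed: normalized box center
    y: float
    w: float
    h: float
    cls: int
    confidence: float


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two center-format boxes."""
    inter_w = max(0.0, min(a.x + a.w / 2, b.x + b.w / 2) - max(a.x - a.w / 2, b.x - b.w / 2))
    inter_h = max(0.0, min(a.y + a.h / 2, b.y + b.h / 2) - max(a.y - a.h / 2, b.y - b.h / 2))
    inter = inter_w * inter_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def remove_overlapping(detections: list[Detection], iou_threshold: float = 0.3) -> list[Detection]:
    """Keep the highest-confidence detection of every overlapping group."""
    kept: list[Detection] = []
    for det in sorted(detections, key=lambda d: d.confidence, reverse=True):
        if all(iou(det, other) <= iou_threshold for other in kept):
            kept.append(det)
    return kept


# Row 31: A overlaps B, B overlaps C, A does not overlap C.
chain = [
    Detection(0.30, 0.5, 0.1, 0.1, 0, 0.9),  # A
    Detection(0.35, 0.5, 0.1, 0.1, 0, 0.8),  # B
    Detection(0.40, 0.5, 0.1, 0.1, 0, 0.7),  # C
]
survivors = remove_overlapping(chain)  # A and C survive; B is suppressed by A
```

In the chain case the result depends on the confidence ordering, which is why row 31 specifies an upper bound on the count rather than an exact value.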
+### Annotation Queue Message Parsing
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 32 | Constructed msgpack bytes matching AnnotationMessage schema | Valid Created annotation message | Parsed AnnotationMessage with correct fields: name, detections, image bytes, status == Created | exact | N/A | N/A |
+| 33 | Constructed msgpack bytes for bulk Validated message | Valid bulk validation message | Parsed with status == Validated, list of annotation names | exact | N/A | N/A |
+| 34 | Constructed msgpack bytes for bulk Deleted message | Valid bulk deletion message | Parsed with status == Deleted, list of annotation names | exact | N/A | N/A |
+| 35 | Malformed msgpack bytes | Invalid message format | Exception raised (caught by handler) | exact (exception type) | N/A | N/A |
+
+### YAML Generation
+
+| # | Input | Input Description | Expected Result | Comparison | Tolerance | Reference File |
+|---|-------|-------------------|-----------------|------------|-----------|---------------|
+| 36 | `classes.json` + dataset path | Generate data.yaml for training | YAML contains: `nc: 80`, `train: train/images`, `val: valid/images`, `test: test/images`, 80 class names | exact | N/A | N/A |
+| 37 | `classes.json` with 17 classes | Class name listing in YAML | 17 known class names present; 63 placeholder names as `Class-N` | exact | 17 named + 63 placeholder = 80 total | N/A |