Refactor constants management to use Pydantic BaseModel for configuration

- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
Author: Oleksandr Bezdieniezhnykh
Date: 2026-03-27 18:18:30 +02:00
Parent: b68c07b540
Commit: 142c6c4de8
106 changed files with 5706 additions and 654 deletions
# Test Infrastructure
**Task**: AZ-152_test_infrastructure
**Name**: Test Infrastructure
**Description**: Scaffold the test project — pytest configuration, fixtures, conftest, test data management, Docker test environment
**Complexity**: 3 points
**Dependencies**: None
**Component**: Blackbox Tests
**Jira**: AZ-152
**Epic**: AZ-151
## Test Project Folder Layout
```
tests/
├── conftest.py
├── test_augmentation.py
├── test_dataset_formation.py
├── test_label_validation.py
├── test_encryption.py
├── test_model_split.py
├── test_annotation_classes.py
├── test_hardware_hash.py
├── test_onnx_inference.py
├── test_nms.py
├── test_annotation_queue.py
├── performance/
│ ├── conftest.py
│ ├── test_augmentation_perf.py
│ ├── test_dataset_perf.py
│ ├── test_encryption_perf.py
│ └── test_inference_perf.py
├── resilience/
│ └── (resilience tests embedded in main test files via markers)
├── security/
│ └── (security tests embedded in main test files via markers)
└── resource_limits/
└── (resource limit tests embedded in main test files via markers)
```
### Layout Rationale
Flat test file structure per functional area matches the existing codebase module layout. Performance tests are separated into a subdirectory so they can be run independently (slower, threshold-based). Resilience, security, and resource limit tests use pytest markers (`@pytest.mark.resilience`, `@pytest.mark.security`, `@pytest.mark.resource_limit`) within the main test files to avoid unnecessary file proliferation while allowing selective execution.
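For the selective execution described above to work cleanly, the marker names must be registered so pytest does not warn about unknown marks. A minimal sketch of that registration (the marker descriptions are illustrative, not from the codebase):

```ini
; pytest.ini — marker names come from this spec; descriptions are illustrative
[pytest]
markers =
    performance: slow, threshold-based performance tests
    resilience: error-handling and fault-injection tests
    security: security property tests
    resource_limit: output-count and size-bound tests
```

Selective runs then become, for example, `pytest -m "not performance"` for the fast suite, or `pytest tests/performance -m performance` for thresholds only.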
## Mock Services
No mock services required. All 55 test scenarios operate offline against local code modules. External services (Azaion API, S3 CDN, RabbitMQ Streams, TensorRT) are excluded from the test scope per user decision.
## Docker Test Environment
### docker-compose.test.yml Structure
| Service | Image / Build | Purpose | Depends On |
|---------|--------------|---------|------------|
| test-runner | Build from `Dockerfile.test` | Runs pytest suite | — |
Single-container setup: the system under test is a Python library (not a service), so tests import modules directly. No network services required.
### Volumes
| Volume Mount | Purpose |
|-------------|---------|
| `./test-results:/app/test-results` | JUnit XML output for CI parsing |
| `./_docs/00_problem/input_data:/app/_docs/00_problem/input_data:ro` | Fixture images, labels, ONNX model (read-only) |
## Test Runner Configuration
**Framework**: pytest
**Plugins**: none required (JUnit XML via pytest's built-in `--junitxml`)
**Entry point (local)**: `scripts/run-tests-local.sh`
**Entry point (Docker)**: `docker compose -f docker-compose.test.yml up --build --abort-on-container-exit`
### Fixture Strategy
| Fixture | Scope | Purpose |
|---------|-------|---------|
| `fixture_images_dir` | session | Path to 100 JPEG images from `_docs/00_problem/input_data/dataset/images/` |
| `fixture_labels_dir` | session | Path to 100 YOLO labels from `_docs/00_problem/input_data/dataset/labels/` |
| `fixture_onnx_model` | session | Bytes of `_docs/00_problem/input_data/azaion.onnx` |
| `fixture_classes_json` | session | Path to `classes.json` |
| `work_dir` | function | `tmp_path` based working directory for filesystem tests |
| `sample_image_label` | function | Copies 1 image + label to `tmp_path` |
| `sample_images_labels` | function | Copies N images + labels to `tmp_path` (parameterizable) |
| `corrupted_label` | function | Generates a label file with coords > 1.0 in `tmp_path` |
| `edge_bbox_label` | function | Generates a label with bbox near image edge in `tmp_path` |
| `empty_label` | function | Generates an empty label file in `tmp_path` |
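The fixture names below come from the table above; the bodies are a minimal conftest.py sketch under assumed paths, not the project's actual implementation:

```python
# conftest.py sketch — fixture names match the fixture strategy table;
# bodies and the FIXTURE_ROOT layout are illustrative assumptions.
import shutil
from pathlib import Path

import pytest

FIXTURE_ROOT = Path("_docs/00_problem/input_data")  # assumed layout


def write_corrupted_label(target_dir: Path) -> Path:
    """Write a YOLO label whose x coordinate exceeds 1.0 (invalid)."""
    path = target_dir / "corrupted.txt"
    path.write_text("0 1.5 0.5 0.1 0.1\n")
    return path


@pytest.fixture(scope="session")
def fixture_images_dir() -> Path:
    # session scope: read-only reference, never copied or modified
    return FIXTURE_ROOT / "dataset" / "images"


@pytest.fixture
def corrupted_label(tmp_path: Path) -> Path:
    # function scope: each test gets its own invalid label in tmp_path
    return write_corrupted_label(tmp_path)


@pytest.fixture
def sample_image_label(tmp_path: Path, fixture_images_dir) -> tuple[Path, Path]:
    # copy (not symlink) one image + label into the isolated tmp_path
    src_img = next(fixture_images_dir.glob("*.jpg"))
    src_lbl = FIXTURE_ROOT / "dataset" / "labels" / (src_img.stem + ".txt")
    img, lbl = tmp_path / src_img.name, tmp_path / src_lbl.name
    shutil.copy(src_img, img)
    shutil.copy(src_lbl, lbl)
    return img, lbl
```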
## Test Data Fixtures
| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| 100 annotated images | `_docs/00_problem/input_data/dataset/images/` | JPEG | Augmentation, dataset formation, inference |
| 100 YOLO labels | `_docs/00_problem/input_data/dataset/labels/` | TXT | Augmentation, dataset formation, label validation |
| ONNX model (77MB) | `_docs/00_problem/input_data/azaion.onnx` | ONNX | Encryption roundtrip, inference |
| Class definitions | `classes.json` (project root) | JSON | Annotation class loading, YAML generation |
| Corrupted labels | Generated at test time | TXT | Label validation, dataset formation |
| Edge-case bboxes | Generated at test time | In-memory | Augmentation bbox correction |
| Detection objects | Generated at test time | In-memory | NMS overlap removal |
| Msgpack messages | Generated at test time | bytes | Annotation queue parsing |
| Random binary data | Generated at test time (`os.urandom`) | bytes | Encryption tests |
| Empty label files | Generated at test time | TXT | Augmentation edge case |
### Data Isolation
Each test function receives an isolated `tmp_path` directory. Fixture files are copied (not symlinked) to `tmp_path` to prevent cross-test interference. Session-scoped fixtures (image dir, model bytes) are read-only references. No test modifies the source fixture data.
## Test Reporting
**Format**: JUnit XML
**Output path**: `test-results/test-results.xml` (local), `/app/test-results/test-results.xml` (Docker)
**CI integration**: Standard JUnit XML parseable by GitHub Actions, Azure Pipelines, GitLab CI
## Constants Patching Strategy
The production code uses hardcoded paths from `constants.py` (e.g., `/azaion/data/`). Tests must override these paths to point to `tmp_path` directories. Strategy: use `monkeypatch` or `unittest.mock.patch` to override `constants.*` module attributes at test function scope.
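A stdlib sketch of that strategy using `unittest.mock.patch.object` (the attribute names on `constants` are the ones this spec assumes; real tests would use pytest's `monkeypatch` fixture to the same effect):

```python
# Sketch of the constants-patching strategy. `constants` here is a
# SimpleNamespace stand-in for the real `import constants`, so the
# example is self-contained; attribute names are assumptions.
import types
from unittest import mock

constants = types.SimpleNamespace(
    data_dir="/azaion/data/",
    processed_dir="/azaion/data/processed/",
)


def run_with_patched_paths(tmp_path: str):
    # every hardcoded path is redirected for the duration of the `with`
    with mock.patch.object(constants, "data_dir", tmp_path), \
         mock.patch.object(constants, "processed_dir", tmp_path + "/processed"):
        return constants.data_dir, constants.processed_dir


before = constants.data_dir
patched = run_with_patched_paths("/tmp/pytest-0")
assert patched == ("/tmp/pytest-0", "/tmp/pytest-0/processed")
assert constants.data_dir == before  # patch is reverted on exit
```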
## Acceptance Criteria
**AC-1: Local test runner works**
Given requirements-test.txt is installed
When `scripts/run-tests-local.sh` is executed
Then pytest discovers and runs tests, produces JUnit XML in `test-results/`
**AC-2: Docker test runner works**
Given Dockerfile.test and docker-compose.test.yml exist
When `docker compose -f docker-compose.test.yml up --build` is executed
Then test-runner container runs all tests, JUnit XML is written to mounted `test-results/` volume
**AC-3: Fixtures provide test data**
Given conftest.py defines session and function-scoped fixtures
When a test requests `fixture_images_dir`
Then it receives a valid path to 100 JPEG images
**AC-4: Constants are properly patched**
Given a test patches `constants.data_dir` to `tmp_path`
When the test runs augmentation or dataset formation
Then all file operations target `tmp_path`, not `/azaion/`
# Augmentation Blackbox Tests
**Task**: AZ-153_test_augmentation
**Name**: Augmentation Blackbox Tests
**Description**: Implement 8 blackbox tests for the augmentation pipeline — output count, naming, bbox validation, edge cases, filesystem integration
**Complexity**: 3 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-153
**Epic**: AZ-151
## Problem
The augmentation pipeline transforms annotated images into 8 variants each. Tests must verify output count, naming conventions, bounding box validity, edge cases, and filesystem integration without referencing internals.
## Outcome
- 8 passing pytest tests in `tests/test_augmentation.py`
- Covers: single-image augmentation, naming convention, bbox range, bbox clipping, tiny bbox removal, empty labels, full pipeline, skip-already-processed
## Scope
### Included
- BT-AUG-01: Single image → 8 outputs
- BT-AUG-02: Augmented filenames follow naming convention
- BT-AUG-03: All output bounding boxes in valid range [0,1]
- BT-AUG-04: Bounding box correction clips edge bboxes
- BT-AUG-05: Tiny bounding boxes removed after correction
- BT-AUG-06: Empty label produces 8 outputs with empty labels
- BT-AUG-07: Full augmentation pipeline (filesystem, 5 images → 40 outputs)
- BT-AUG-08: Augmentation skips already-processed images
### Excluded
- Performance tests (separate task)
- Resilience tests (separate task)
## Acceptance Criteria
**AC-1: Output count**
Given 1 image + 1 valid label
When augment_inner() runs
Then exactly 8 ImageLabel objects are returned
**AC-2: Naming convention**
Given image with stem "test_image"
When augment_inner() runs
Then outputs named test_image.jpg, test_image_1.jpg through test_image_7.jpg with matching .txt labels
**AC-3: Bbox validity**
Given 1 image + label with multiple bboxes
When augment_inner() runs
Then every bbox coordinate in every output is in [0.0, 1.0]
**AC-4: Edge bbox clipping**
Given label with bbox near edge (x=0.99, w=0.2)
When correct_bboxes() runs
Then width is reduced to fit within bounds; all coordinates stay within [margin, 1-margin]
**AC-5: Tiny bbox removal**
Given label with bbox that becomes < 0.01 area after clipping
When correct_bboxes() runs
Then bbox is removed from output
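The clipping and removal behavior that ACs 3-5 describe can be sketched as a reference implementation (the margin value and exact clipping rule are assumptions; only the 0.01 area threshold comes from this spec):

```python
# Reference sketch of the correct_bboxes() contract — not the project's code.
MARGIN = 0.001      # assumed edge margin
MIN_AREA = 0.01     # tiny-bbox threshold from AC-5


def clip_and_filter(bboxes):
    """bboxes: list of (cls, cx, cy, w, h) in normalized YOLO coordinates."""
    kept = []
    for cls, cx, cy, w, h in bboxes:
        # clip the box edges into [MARGIN, 1 - MARGIN]
        x1 = max(cx - w / 2, MARGIN)
        y1 = max(cy - h / 2, MARGIN)
        x2 = min(cx + w / 2, 1 - MARGIN)
        y2 = min(cy + h / 2, 1 - MARGIN)
        nw, nh = x2 - x1, y2 - y1
        if nw <= 0 or nh <= 0 or nw * nh < MIN_AREA:
            continue  # AC-5: drop boxes that became tiny after clipping
        kept.append((cls, (x1 + x2) / 2, (y1 + y2) / 2, nw, nh))
    return kept
```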
**AC-6: Empty label**
Given 1 image + empty label file
When augment_inner() runs
Then 8 ImageLabel objects returned, all with empty labels lists
**AC-7: Full pipeline**
Given 5 images + labels in data/ directory
When augment_annotations() runs with patched paths
Then 40 images in processed images dir, 40 matching labels
**AC-8: Skip already-processed**
Given 5 images in data/, 3 already in processed/
When augment_annotations() runs
Then only 2 new images processed (16 new outputs), existing 3 untouched
## Constraints
- Must patch constants.py paths to use tmp_path
- Fixture images from _docs/00_problem/input_data/dataset/
- Each test operates in isolated tmp_path
# Augmentation Performance, Resilience & Resource Tests
**Task**: AZ-154_test_augmentation_nonfunc
**Name**: Augmentation Non-Functional Tests
**Description**: Implement performance, resilience, and resource limit tests for augmentation — throughput, parallel speedup, error handling, output bounds
**Complexity**: 2 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-154
**Epic**: AZ-151
## Problem
Augmentation must perform within time thresholds, handle corrupted/missing inputs gracefully, and respect output count bounds.
## Outcome
- 6 passing pytest tests across performance and resilience categories
- Performance tests in `tests/performance/test_augmentation_perf.py`
- Resilience and resource limit tests in `tests/test_augmentation.py` with markers
## Scope
### Included
- PT-AUG-01: Augmentation throughput (10 images ≤ 60s)
- PT-AUG-02: Parallel augmentation speedup (≥ 1.5× faster)
- RT-AUG-01: Handles corrupted image gracefully
- RT-AUG-02: Handles missing label file
- RT-AUG-03: Transform failure produces fewer variants (no crash)
- RL-AUG-01: Output count bounded to exactly 8
### Excluded
- Blackbox functional tests (separate task 02)
## Acceptance Criteria
**AC-1: Throughput**
Given 10 images from fixture dataset
When augment_annotations() runs
Then completes within 60 seconds
**AC-2: Parallel speedup**
Given 10 images from fixture dataset
When run with ThreadPoolExecutor vs sequential
Then parallel is ≥ 1.5× faster
**AC-3: Corrupted image**
Given 1 valid + 1 corrupted image (truncated JPEG)
When augment_annotations() runs
Then valid image produces 8 outputs, corrupted skipped, no crash
**AC-4: Missing label**
Given 1 image with no matching label file
When augment_annotation() runs on it
Then exception caught per-thread, pipeline continues
**AC-5: Transform failure**
Given 1 image + label with extremely narrow bbox
When augment_inner() runs
Then 1-8 ImageLabel objects returned, no crash
**AC-6: Output count bounded**
Given 1 image
When augment_inner() runs
Then exactly 8 outputs returned (never more)
## Constraints
- Performance tests require pytest markers: `@pytest.mark.performance`
- Resilience tests marked: `@pytest.mark.resilience`
- Resource limit tests marked: `@pytest.mark.resource_limit`
- Performance thresholds are generous (CPU-bound, no GPU requirement)
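The sequential-vs-parallel measurement that PT-AUG-02 implies can be sketched as below; `augment_one` is a hypothetical stand-in for the real per-image augmentation call, and the sketch asserts only output equality (the real test would additionally assert the ≥ 1.5× timing ratio):

```python
# Measurement-pattern sketch for PT-AUG-02 — `augment_one` is a placeholder.
import time
from concurrent.futures import ThreadPoolExecutor


def augment_one(item):
    # placeholder work; the real test would call the augmentation code
    return item * 2


def measure(images, workers=None):
    start = time.perf_counter()
    if workers:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(augment_one, images))
    else:
        results = [augment_one(i) for i in images]
    return results, time.perf_counter() - start


images = list(range(10))
seq_out, seq_t = measure(images)
par_out, par_t = measure(images, workers=4)
# outputs must match regardless of execution mode; the real test would
# then assert seq_t / par_t >= 1.5 on the actual CPU-bound workload
assert seq_out == par_out
```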
# Dataset Formation Tests
**Task**: AZ-155_test_dataset_formation
**Name**: Dataset Formation Tests
**Description**: Implement blackbox, performance, resilience, and resource tests for dataset split — 70/20/10 ratio, directory structure, integrity, corrupted filtering
**Complexity**: 2 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-155
**Epic**: AZ-151
## Problem
Dataset formation splits annotated images into train/valid/test sets. Tests must verify correct ratios, directory structure, data integrity, corrupted label filtering, and performance.
## Outcome
- 8 passing pytest tests covering dataset formation
- Blackbox tests in `tests/test_dataset_formation.py`
- Performance test in `tests/performance/test_dataset_perf.py`
## Scope
### Included
- BT-DSF-01: 70/20/10 split ratio (100 images → 70/20/10)
- BT-DSF-02: Split directory structure (6 subdirs created)
- BT-DSF-03: Total files preserved (sum == 100)
- BT-DSF-04: Corrupted labels moved to corrupted directory
- PT-DSF-01: Dataset formation throughput (100 images ≤ 30s)
- RT-DSF-01: Empty processed directory handled gracefully
- RL-DSF-01: Split ratios sum to 100%
- RL-DSF-02: No data duplication across splits
### Excluded
- Label validation (separate task)
## Acceptance Criteria
**AC-1: Split ratio**
Given 100 images + labels in processed/ dir
When form_dataset() runs with patched paths
Then train: 70, valid: 20, test: 10
**AC-2: Directory structure**
Given 100 images + labels
When form_dataset() runs
Then creates train/images/, train/labels/, valid/images/, valid/labels/, test/images/, test/labels/
**AC-3: Data integrity**
Given 100 valid images + labels
When form_dataset() runs
Then count(train) + count(valid) + count(test) == 100
**AC-4: Corrupted filtering**
Given 95 valid + 5 corrupted labels
When form_dataset() runs
Then 5 in data-corrupted/, 95 across splits
**AC-5: Throughput**
Given 100 images + labels
When form_dataset() runs
Then completes within 30 seconds
**AC-6: Empty directory**
Given empty processed images dir
When form_dataset() runs
Then empty dirs created, no crash
**AC-7: No duplication**
Given 100 images after form_dataset()
When collecting all filenames across train/valid/test
Then no filename appears in more than one split
## Constraints
- Must patch constants.py paths to use tmp_path
- Requires copying 100 fixture images to tmp_path (session fixture)
- Performance test marked: `@pytest.mark.performance`
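The split arithmetic behind ACs 1, 3, and 7 can be sketched as follows (boundaries are computed from the ratios in this spec; the real `form_dataset()` may shuffle before splitting):

```python
# Sketch of a 70/20/10 split that preserves every file and never
# duplicates one across splits — illustrative, not the project's code.
def split_dataset(filenames, ratios=(0.7, 0.2, 0.1)):
    n = len(filenames)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    train = filenames[:n_train]
    valid = filenames[n_train:n_train + n_valid]
    test = filenames[n_train + n_valid:]  # remainder, so no file is lost
    return train, valid, test
```

Because the three slices partition the input list, the integrity (sum == 100) and no-duplication properties hold by construction, and an empty input yields three empty splits without crashing (RT-DSF-01).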
# Label Validation Tests
**Task**: AZ-156_test_label_validation
**Name**: Label Validation Tests
**Description**: Implement 5 blackbox tests for YOLO label validation — valid labels, out-of-range coords, missing files, multi-line corruption
**Complexity**: 1 point
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-156
**Epic**: AZ-151
## Problem
Labels must be validated before dataset formation. Tests verify the check_label function correctly accepts valid labels and rejects corrupted ones.
## Outcome
- 5 passing pytest tests in `tests/test_label_validation.py`
## Scope
### Included
- BT-LBL-01: Valid label accepted (returns True)
- BT-LBL-02: Label with x > 1.0 rejected (returns False)
- BT-LBL-03: Label with height > 1.0 rejected (returns False)
- BT-LBL-04: Missing label file rejected (returns False)
- BT-LBL-05: Multi-line label with one corrupted line (returns False)
### Excluded
- Integration with dataset formation (separate task)
## Acceptance Criteria
**AC-1: Valid label**
Given label file with content `0 0.5 0.5 0.1 0.1`
When check_label(path) is called
Then returns True
**AC-2: x out of range**
Given label file with content `0 1.5 0.5 0.1 0.1`
When check_label(path) is called
Then returns False
**AC-3: height out of range**
Given label file with content `0 0.5 0.5 0.1 1.2`
When check_label(path) is called
Then returns False
**AC-4: Missing file**
Given non-existent file path
When check_label(path) is called
Then returns False
**AC-5: Multi-line corruption**
Given label with `0 0.5 0.5 0.1 0.1\n3 0.5 0.5 0.1 1.5`
When check_label(path) is called
Then returns False
## Constraints
- Label files are generated in tmp_path at test time
- No external fixtures needed
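The contract in ACs 1-5 is specific enough to sketch a reference `check_label` (the real implementation may differ; this encodes only the specified behavior):

```python
# Reference sketch of the check_label() contract from ACs 1-5.
from pathlib import Path


def check_label(path) -> bool:
    try:
        lines = Path(path).read_text().strip().splitlines()
    except OSError:
        return False  # AC-4: missing file is rejected
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            return False
        try:
            cls = int(parts[0])
            coords = [float(p) for p in parts[1:]]
        except ValueError:
            return False
        # AC-2/AC-3: every coordinate must lie in [0, 1];
        # one bad line rejects the whole file (AC-5)
        if cls < 0 or any(c < 0.0 or c > 1.0 for c in coords):
            return False
    return True
```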
# Encryption & Security Tests
**Task**: AZ-157_test_encryption
**Name**: Encryption & Security Tests
**Description**: Implement blackbox, security, performance, resilience, and resource tests for AES-256-CBC encryption — roundtrips, key behavior, IV randomness, throughput, size bounds
**Complexity**: 3 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-157
**Epic**: AZ-151
## Problem
The encryption module must correctly encrypt/decrypt data, produce key-dependent ciphertexts with random IVs, handle edge cases, and meet throughput requirements.
## Outcome
- 12 passing pytest tests in `tests/test_encryption.py`
- Performance test in `tests/performance/test_encryption_perf.py`
## Scope
### Included
- BT-ENC-01: Encrypt-decrypt roundtrip (1024 random bytes)
- BT-ENC-02: Encrypt-decrypt roundtrip (ONNX model)
- BT-ENC-03: Empty input roundtrip
- BT-ENC-04: Single byte roundtrip
- BT-ENC-05: Different keys produce different ciphertext
- BT-ENC-06: Wrong key fails decryption
- PT-ENC-01: Encryption throughput (10MB ≤ 5s)
- RT-ENC-01: Decrypt with corrupted ciphertext
- ST-ENC-01: Random IV (same data, same key → different ciphertexts)
- ST-ENC-02: Wrong key cannot recover plaintext
- ST-ENC-03: Model encryption key is deterministic
- RL-ENC-01: Encrypted output size bounded (≤ N + 32 bytes)
### Excluded
- Model split tests (separate task)
## Acceptance Criteria
**AC-1: Roundtrip**
Given 1024 random bytes and key "test-key"
When encrypt then decrypt
Then output equals input exactly
**AC-2: Model roundtrip**
Given azaion.onnx bytes and model encryption key
When encrypt then decrypt
Then output equals input exactly
**AC-3: Empty input**
Given b"" and key
When encrypt then decrypt
Then output equals b""
**AC-4: Single byte**
Given b"\x00" and key
When encrypt then decrypt
Then output equals b"\x00"
**AC-5: Key-dependent ciphertext**
Given same data, keys "key-a" and "key-b"
When encrypting with each key
Then ciphertexts differ
**AC-6: Wrong key failure**
Given encrypted with "key-a"
When decrypting with "key-b"
Then output does NOT equal original
**AC-7: Throughput**
Given 10MB random bytes
When encrypt + decrypt roundtrip
Then completes within 5 seconds
**AC-8: Corrupted ciphertext**
Given randomly modified ciphertext bytes
When decrypt_to is called
Then it either raises an exception or returns non-original bytes
**AC-9: Random IV**
Given same data, same key, encrypted twice
When comparing ciphertexts
Then they differ (random IV)
**AC-10: Model key deterministic**
Given two calls to get_model_encryption_key()
When comparing results
Then identical
**AC-11: Size bound**
Given N bytes plaintext
When encrypted
Then ciphertext size ≤ N + 32 bytes
## Constraints
- ONNX model fixture is session-scoped (77MB, read once)
- Security tests marked: `@pytest.mark.security`
- Performance test marked: `@pytest.mark.performance`
- Resource limit test marked: `@pytest.mark.resource_limit`
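The properties under test (roundtrip, random IV, key dependence, size bound) can be illustrated with a stdlib-only stand-in. To be clear: the real module uses AES-256-CBC; the SHA-256 keystream cipher below is NOT it and exists only so the property assertions are runnable without third-party packages:

```python
# Stdlib stand-in illustrating the encryption properties — NOT AES-256-CBC.
import hashlib
import os


def _keystream(key: bytes, iv: bytes, n: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def encrypt(data: bytes, key: str) -> bytes:
    iv = os.urandom(16)  # random IV: same input encrypts differently each time
    ks = _keystream(key.encode(), iv, len(data))
    return iv + bytes(a ^ b for a, b in zip(data, ks))


def decrypt(blob: bytes, key: str) -> bytes:
    iv, ct = blob[:16], blob[16:]
    ks = _keystream(key.encode(), iv, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))
```

Here the overhead is a fixed 16-byte IV, comfortably inside the spec's ≤ N + 32 bound (AES-CBC adds IV plus up to one block of padding, hence the 32).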
# Model Split Tests
**Task**: AZ-158_test_model_split
**Name**: Model Split Tests
**Description**: Implement 2 blackbox tests for model split storage — size constraint and reassembly integrity
**Complexity**: 1 point
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-158
**Epic**: AZ-151
## Problem
Encrypted models are split into small and big parts for CDN storage. Tests must verify the split respects size constraints and reassembly produces the original.
## Outcome
- 2 passing pytest tests in `tests/test_model_split.py`
## Scope
### Included
- BT-SPL-01: Split respects size constraint (small ≤ max(3072 bytes, 20% of total))
- BT-SPL-02: Reassembly produces original (small + big == encrypted bytes)
### Excluded
- CDN upload/download (requires external service)
## Acceptance Criteria
**AC-1: Size constraint**
Given 10000 encrypted bytes
When split into small + big
Then small ≤ max(3072, total × 0.2); big = remainder
**AC-2: Reassembly**
Given split parts from 10000 encrypted bytes
When small + big concatenated
Then equals original encrypted bytes
## Constraints
- Uses generated binary data (no fixture files needed)
- References SMALL_SIZE_KB constant from constants.py
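The split rule in AC-1/AC-2 can be sketched directly (the 3072-byte / 20% rule is taken from BT-SPL-01; the real code presumably derives the limit from `SMALL_SIZE_KB`):

```python
# Sketch of the small/big split contract from BT-SPL-01 and BT-SPL-02.
def split_model(encrypted: bytes) -> tuple[bytes, bytes]:
    limit = max(3072, int(len(encrypted) * 0.2))
    small = encrypted[:limit]
    big = encrypted[limit:]   # remainder; small + big reassembles the input
    return small, big
```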
# Annotation Class & YAML Tests
**Task**: AZ-159_test_annotation_classes
**Name**: Annotation Class & YAML Tests
**Description**: Implement 4 tests for annotation class loading, weather mode expansion, YAML generation, and total class count
**Complexity**: 2 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-159
**Epic**: AZ-151
## Problem
The system loads 17 base annotation classes, expands them across 3 weather modes, and generates a data.yaml with 80 class slots. Tests verify the class pipeline.
## Outcome
- 4 passing pytest tests in `tests/test_annotation_classes.py`
## Scope
### Included
- BT-CLS-01: Load 17 base classes from classes.json
- BT-CLS-02: Weather mode expansion (offsets 0, 20, 40)
- BT-CLS-03: YAML generation produces nc: 80 with 17 named + 63 placeholders
- RL-CLS-01: Total class count is exactly 80
### Excluded
- Training configuration (beyond scope)
## Acceptance Criteria
**AC-1: Base classes**
Given classes.json
When AnnotationClass.read_json() is called
Then returns dict with 17 unique base class entries
**AC-2: Weather expansion**
Given classes.json
When classes are read
Then same class exists at offset 0 (Norm), 20 (Wint), 40 (Night)
**AC-3: YAML generation**
Given classes.json + dataset path
When create_yaml() runs with patched paths
Then data.yaml contains nc: 80, 17 named classes + 63 Class-N placeholders
**AC-4: Total count**
Given classes.json
When generating class list
Then exactly 80 entries
## Constraints
- Uses classes.json from project root (fixture_classes_json)
- YAML output goes to tmp_path
- Resource limit test marked: `@pytest.mark.resource_limit`
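The expansion and placeholder scheme in ACs 2-4 can be sketched as follows; the offsets (0/20/40) and the `Class-N` placeholder naming come from this spec, while the example base names are invented:

```python
# Sketch of weather-mode expansion and 80-slot class-list construction.
WEATHER_OFFSETS = (0, 20, 40)  # Norm, Wint, Night
TOTAL_SLOTS = 80


def expand_weather_modes(base: dict[int, str]) -> dict[int, str]:
    """Place each base class at every weather-mode offset (AC-2)."""
    return {i + off: name for off in WEATHER_OFFSETS for i, name in base.items()}


def build_class_list(classes: dict[int, str]) -> list[str]:
    # every unnamed slot gets a Class-N placeholder so nc stays at 80
    return [classes.get(i, f"Class-{i}") for i in range(TOTAL_SLOTS)]
```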
# Hardware Hash & API Key Tests
**Task**: AZ-160_test_hardware_hash
**Name**: Hardware Hash & API Key Tests
**Description**: Implement 7 tests for hardware fingerprinting — determinism, uniqueness, base64 format, API key derivation from credentials and hardware
**Complexity**: 2 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-160
**Epic**: AZ-151
## Problem
Hardware hashing provides machine-bound security for model encryption and API authentication. Tests must verify determinism, uniqueness, format, and credential/hardware dependency.
## Outcome
- 7 passing pytest tests in `tests/test_hardware_hash.py`
## Scope
### Included
- BT-HSH-01: Deterministic output (same input → same hash)
- BT-HSH-02: Different inputs → different hashes
- BT-HSH-03: Output is valid base64
- ST-HSH-01: Hardware hash deterministic (duplicate of BT-HSH-01 for security coverage)
- ST-HSH-02: Different hardware → different hash
- ST-HSH-03: API encryption key depends on credentials + hardware
- ST-HSH-04: API encryption key depends on credentials
### Excluded
- Actual hardware info collection (may need mocking)
## Acceptance Criteria
**AC-1: Determinism**
Given "test-hardware-info"
When get_hw_hash() called twice
Then both calls return identical string
**AC-2: Uniqueness**
Given "hw-a" and "hw-b"
When get_hw_hash() called on each
Then results differ
**AC-3: Base64 format**
Given "test-hardware-info"
When get_hw_hash() called
Then result matches `^[A-Za-z0-9+/]+=*$`
**AC-4: API key depends on hardware**
Given same credentials, different hardware hashes
When get_api_encryption_key() called
Then different keys returned
**AC-5: API key depends on credentials**
Given different credentials, same hardware hash
When get_api_encryption_key() called
Then different keys returned
## Constraints
- Security tests marked: `@pytest.mark.security`
- May require mocking hardware info collection functions
- All inputs are generated strings (no external fixtures)
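A sketch satisfying the stated properties (the real `get_hw_hash()` may hash differently; SHA-256 plus base64 is an assumption that happens to meet the determinism, uniqueness, and format criteria):

```python
# Sketch matching ACs 1-5 — an assumed construction, not the real functions.
import base64
import hashlib


def get_hw_hash(hardware_info: str) -> str:
    digest = hashlib.sha256(hardware_info.encode()).digest()
    return base64.b64encode(digest).decode()  # matches ^[A-Za-z0-9+/]+=*$


def get_api_encryption_key(credentials: str, hw_hash: str) -> bytes:
    # the key depends on BOTH credentials and hardware (AC-4, AC-5)
    return hashlib.sha256((credentials + hw_hash).encode()).digest()
```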
# ONNX Inference Tests
**Task**: AZ-161_test_onnx_inference
**Name**: ONNX Inference Tests
**Description**: Implement 4 tests for ONNX model loading, inference execution, postprocessing, and CPU latency
**Complexity**: 3 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-161
**Epic**: AZ-151
## Problem
The ONNX inference engine loads a model, runs detection on images, and postprocesses results. Tests must verify the full pipeline works on CPU (smoke test — no precision validation).
## Outcome
- 4 passing pytest tests
- Blackbox tests in `tests/test_onnx_inference.py`
- Performance test in `tests/performance/test_inference_perf.py`
## Scope
### Included
- BT-INF-01: Model loads successfully (no exception, valid engine)
- BT-INF-02: Inference returns output (array shape [batch, N, 6+])
- BT-INF-03: Postprocessing returns valid detections (x,y,w,h ∈ [0,1], cls ∈ [0,79], conf ∈ [0,1])
- PT-INF-01: ONNX inference latency (single image ≤ 10s on CPU)
### Excluded
- TensorRT inference (requires NVIDIA GPU)
- Detection precision/recall validation (smoke-only per user decision)
## Acceptance Criteria
**AC-1: Model loads**
Given azaion.onnx bytes
When OnnxEngine(model_bytes) is constructed
Then no exception; engine has valid input_shape and batch_size
**AC-2: Inference output**
Given ONNX engine + 1 preprocessed image
When engine.run(input_blob) is called
Then returns list of numpy arrays; first array has shape [batch, N, 6+]
**AC-3: Valid detections**
Given ONNX engine output from real image
When Inference.postprocess() is called
Then returns list of Detection objects; each has x,y,w,h ∈ [0,1], cls ∈ [0,79], confidence ∈ [0,1]
**AC-4: CPU latency**
Given 1 preprocessed image + ONNX model
When single inference runs
Then completes within 10 seconds
## Constraints
- Uses onnxruntime (CPU) not onnxruntime-gpu
- ONNX model is 77MB, loaded once (session fixture)
- Image preprocessing must match model input size (1280×1280)
- Performance test marked: `@pytest.mark.performance`
- This is a smoke test — validates structure, not detection accuracy
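The structural check AC-3 describes can be sketched without onnxruntime; `Detection` below is a minimal stand-in for the project's class, and its field names are assumptions:

```python
# Sketch of the AC-3 validity predicate over postprocessed detections.
from dataclasses import dataclass


@dataclass
class Detection:
    x: float
    y: float
    w: float
    h: float
    cls: int
    confidence: float


def is_valid_detection(d: Detection, num_classes: int = 80) -> bool:
    coords_ok = all(0.0 <= v <= 1.0 for v in (d.x, d.y, d.w, d.h))
    return coords_ok and 0 <= d.cls < num_classes and 0.0 <= d.confidence <= 1.0
```

The smoke test would assert `is_valid_detection` over every object returned by `Inference.postprocess()` without checking detection accuracy.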
# NMS Overlap Removal Tests
**Task**: AZ-162_test_nms
**Name**: NMS Overlap Removal Tests
**Description**: Implement 3 tests for non-maximum suppression — overlapping kept by confidence, non-overlapping preserved, chain overlap resolution
**Complexity**: 1 point
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-162
**Epic**: AZ-151
## Problem
The NMS module removes overlapping detections based on IoU threshold (0.3), keeping the higher-confidence detection. Tests verify all overlap scenarios.
## Outcome
- 3 passing pytest tests in `tests/test_nms.py`
## Scope
### Included
- BT-NMS-01: Overlapping detections — keep higher confidence (IoU > 0.3 → 1 kept)
- BT-NMS-02: Non-overlapping detections — keep both (IoU < 0.3 → 2 kept)
- BT-NMS-03: Chain overlap resolution (A↔B, B↔C → ≤ 2 kept)
### Excluded
- Integration with inference pipeline (separate task)
## Acceptance Criteria
**AC-1: Overlap removal**
Given 2 Detections at same position, confidence 0.9 and 0.5, IoU > 0.3
When remove_overlapping_detections() runs
Then 1 detection returned (confidence 0.9)
**AC-2: Non-overlapping preserved**
Given 2 Detections at distant positions, IoU < 0.3
When remove_overlapping_detections() runs
Then 2 detections returned
**AC-3: Chain overlap**
Given 3 Detections: A overlaps B, B overlaps C, A doesn't overlap C
When remove_overlapping_detections() runs
Then ≤ 2 detections; highest confidence per overlapping pair kept
## Constraints
- Detection objects constructed in-memory (no fixture files)
- IoU threshold is 0.3 (from constants or hardcoded in NMS)
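The behavior in ACs 1-3 corresponds to greedy NMS, sketched below as a reference (boxes here are `(x1, y1, x2, y2, confidence)` tuples; only the 0.3 threshold comes from this spec):

```python
# Reference sketch of greedy NMS over corner-format boxes with confidence.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def remove_overlapping_detections(dets, threshold=0.3):
    # greedy NMS: keep the highest-confidence box, drop anything that
    # overlaps a kept box above the IoU threshold, repeat
    kept = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) <= threshold for k in kept):
            kept.append(d)
    return kept
```

On the chain case (A↔B, B↔C, A and C disjoint), greedy NMS keeps A (highest confidence), drops B against A, then keeps C, yielding 2 detections.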
# Annotation Queue Message Tests
**Task**: AZ-163_test_annotation_queue
**Name**: Annotation Queue Message Tests
**Description**: Implement 5 tests for annotation queue message parsing — Created, Validated bulk, Deleted bulk, malformed handling
**Complexity**: 2 points
**Dependencies**: AZ-152_test_infrastructure
**Component**: Blackbox Tests
**Jira**: AZ-163
**Epic**: AZ-151
## Problem
The annotation queue processes msgpack-encoded messages from RabbitMQ Streams. Tests must verify correct parsing of all message types and graceful handling of malformed input.
## Outcome
- 5 passing pytest tests in `tests/test_annotation_queue.py`
## Scope
### Included
- BT-AQM-01: Parse Created annotation message (all fields populated correctly)
- BT-AQM-02: Parse Validated bulk message (status == Validated, names list matches)
- BT-AQM-03: Parse Deleted bulk message (status == Deleted, names list matches)
- BT-AQM-04: Malformed message raises exception
- RT-AQM-01: Malformed msgpack bytes handled (exception caught, no crash)
### Excluded
- Live RabbitMQ Streams connection (requires external service)
- Queue offset persistence (requires live broker)
## Acceptance Criteria
**AC-1: Created message**
Given msgpack bytes matching AnnotationMessage schema (status=Created, role=Validator)
When decoded and constructed
Then all fields populated: name, detections, image bytes, status == "Created", role == "Validator"
**AC-2: Validated bulk**
Given msgpack bytes with status=Validated, list of names
When decoded and constructed
Then status == "Validated", names list matches input
**AC-3: Deleted bulk**
Given msgpack bytes with status=Deleted, list of names
When decoded and constructed
Then status == "Deleted", names list matches input
**AC-4: Malformed msgpack**
Given invalid msgpack bytes
When decode is attempted
Then exception raised
**AC-5: Resilient handling**
Given random bytes (not valid msgpack)
When passed to message handler
Then exception caught, handler doesn't crash
## Constraints
- Msgpack messages constructed in-memory at test time
- Must match the AnnotationMessage/AnnotationBulkMessage schemas from annotation-queue/
- Resilience test marked: `@pytest.mark.resilience`
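The resilient-handling pattern RT-AQM-01 and AC-5 describe can be sketched as below. The real handler decodes msgpack; `json.loads` is used here only so the sketch runs on the stdlib, and the injected `decode` callable is an assumption:

```python
# Sketch of a crash-proof message handler (RT-AQM-01 / AC-5).
import json
import logging

logger = logging.getLogger("annotation-queue")


def handle_message(raw: bytes, decode=json.loads):
    """Parse one queue message; malformed input is logged, never fatal."""
    try:
        payload = decode(raw)
    except Exception:
        # AC-5: the handler must survive random / invalid bytes
        logger.warning("skipping malformed message (%d bytes)", len(raw))
        return None
    return payload
```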
# Dependencies Table
**Date**: 2026-03-26
**Total Tasks**: 12
**Total Complexity Points**: 25
**Epic**: AZ-151
| Task | Name | Complexity | Dependencies | Epic | Test Scenarios |
|------|------|-----------|-------------|------|----------------|
| AZ-152 | test_infrastructure | 3 | None | AZ-151 | — |
| AZ-153 | test_augmentation | 3 | AZ-152 | AZ-151 | BT-AUG-01 to BT-AUG-08 (8) |
| AZ-154 | test_augmentation_nonfunc | 2 | AZ-152 | AZ-151 | PT-AUG-01, PT-AUG-02, RT-AUG-01 to RT-AUG-03, RL-AUG-01 (6) |
| AZ-155 | test_dataset_formation | 2 | AZ-152 | AZ-151 | BT-DSF-01 to BT-DSF-04, PT-DSF-01, RT-DSF-01, RL-DSF-01, RL-DSF-02 (8) |
| AZ-156 | test_label_validation | 1 | AZ-152 | AZ-151 | BT-LBL-01 to BT-LBL-05 (5) |
| AZ-157 | test_encryption | 3 | AZ-152 | AZ-151 | BT-ENC-01 to BT-ENC-06, PT-ENC-01, RT-ENC-01, ST-ENC-01 to ST-ENC-03, RL-ENC-01 (12) |
| AZ-158 | test_model_split | 1 | AZ-152 | AZ-151 | BT-SPL-01, BT-SPL-02 (2) |
| AZ-159 | test_annotation_classes | 2 | AZ-152 | AZ-151 | BT-CLS-01 to BT-CLS-03, RL-CLS-01 (4) |
| AZ-160 | test_hardware_hash | 2 | AZ-152 | AZ-151 | BT-HSH-01 to BT-HSH-03, ST-HSH-01 to ST-HSH-04 (7) |
| AZ-161 | test_onnx_inference | 3 | AZ-152 | AZ-151 | BT-INF-01 to BT-INF-03, PT-INF-01 (4) |
| AZ-162 | test_nms | 1 | AZ-152 | AZ-151 | BT-NMS-01 to BT-NMS-03 (3) |
| AZ-163 | test_annotation_queue | 2 | AZ-152 | AZ-151 | BT-AQM-01 to BT-AQM-04, RT-AQM-01 (5) |
## Dependency Graph
```
AZ-151 (Epic: Blackbox Tests)
└── AZ-152 test_infrastructure
├── AZ-153 test_augmentation
├── AZ-154 test_augmentation_nonfunc
├── AZ-155 test_dataset_formation
├── AZ-156 test_label_validation
├── AZ-157 test_encryption
├── AZ-158 test_model_split
├── AZ-159 test_annotation_classes
├── AZ-160 test_hardware_hash
├── AZ-161 test_onnx_inference
├── AZ-162 test_nms
└── AZ-163 test_annotation_queue
```
## Implementation Strategy
- **Batch 1**: AZ-152 (test infrastructure) — must be implemented first
- **Batch 2**: AZ-153 to AZ-163 (all test tasks) — can be implemented in parallel after infrastructure is ready
- **Estimated batches**: 2