Refactor constants management to use Pydantic BaseModel for configuration

- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.
Oleksandr Bezdieniezhnykh
2026-03-27 18:18:30 +02:00
parent b68c07b540
commit 142c6c4de8
106 changed files with 5706 additions and 654 deletions
@@ -0,0 +1,138 @@
# Verification Log
## Summary
| Metric | Count |
|--------|-------|
| Entities verified | 87 |
| Entities flagged | 0 |
| Corrections applied | 0 |
| Bugs found in code | 5 |
| Missing modules | 1 |
| Duplicated code | 1 pattern (3 locations) |
| Security issues | 5 (3 high, 1 medium, 1 low) |
| Completeness | 21/21 modules (100%) |
## Entity Verification
All class names, function names, method signatures, and module names referenced in documentation were verified against the actual source code. No hallucinated entities found.
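
As an illustration of how such a check can be automated, the sketch below uses Python's standard `ast` module to collect class, function, and method names from a source string and flags documented names that are absent. It is not necessarily how this log was produced; `list_entities`, `SAMPLE`, and the `Security.missing` entry are hypothetical and exist only for the example.

```python
import ast


def list_entities(source: str) -> dict[str, int]:
    """Map top-level functions, classes, and their methods to line numbers."""
    entities: dict[str, int] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entities[node.name] = node.lineno
        elif isinstance(node, ast.ClassDef):
            entities[node.name] = node.lineno
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    entities[f"{node.name}.{item.name}"] = item.lineno
    return entities


# Hypothetical source text standing in for a real module.
SAMPLE = '''
class Security:
    @staticmethod
    def encrypt_to(data, key):
        return data

def get_hardware_info():
    return ""
'''

found = list_entities(SAMPLE)
# Names as they might appear in documentation; the last one is deliberately wrong.
documented = ["Security.encrypt_to", "get_hardware_info", "Security.missing"]
flagged = [name for name in documented if name not in found]
print(flagged)
```

Any name left in `flagged` would be a candidate hallucinated entity that needs manual review.
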
### Verified Entities (key samples)
| Entity | Location | Doc Reference | Status |
|--------|----------|--------------|--------|
| `Security.encrypt_to` | security.py:14 | modules/security.md | OK |
| `Security.decrypt_to` | security.py:28 | modules/security.md | OK |
| `Security.get_model_encryption_key` | security.py:66 | modules/security.md | OK |
| `get_hardware_info` | hardware_service.py:5 | modules/hardware_service.md | OK |
| `CDNManager.upload` | cdn_manager.py:28 | modules/cdn_manager.md | OK |
| `CDNManager.download` | cdn_manager.py:37 | modules/cdn_manager.md | OK |
| `ApiClient.login` | api_client.py:43 | modules/api_client.md | OK |
| `ApiClient.load_bytes` | api_client.py:63 | modules/api_client.md | OK |
| `ApiClient.upload_big_small_resource` | api_client.py:113 | modules/api_client.md | OK |
| `Augmentator.augment_annotations` | augmentation.py:125 | modules/augmentation.md | OK |
| `Augmentator.augment_inner` | augmentation.py:55 | modules/augmentation.md | OK |
| `InferenceEngine` (ABC) | inference/onnx_engine.py:7 | modules/inference_onnx_engine.md | OK |
| `OnnxEngine` | inference/onnx_engine.py:25 | modules/inference_onnx_engine.md | OK |
| `TensorRTEngine` | inference/tensorrt_engine.py:16 | modules/inference_tensorrt_engine.md | OK |
| `TensorRTEngine.convert_from_onnx` | inference/tensorrt_engine.py:104 | modules/inference_tensorrt_engine.md | OK |
| `Inference.process` | inference/inference.py:83 | modules/inference_inference.md | OK |
| `Inference.remove_overlapping_detections` | inference/inference.py:120 | modules/inference_inference.md | OK |
| `AnnotationQueueHandler.on_message` | annotation-queue/annotation_queue_handler.py:87 | modules/annotation_queue_handler.md | OK |
| `AnnotationMessage` | annotation-queue/annotation_queue_dto.py:91 | modules/annotation_queue_dto.md | OK |
| `form_dataset` | train.py:42 | modules/train.md | OK |
| `train_dataset` | train.py:147 | modules/train.md | OK |
| `export_onnx` | exports.py:29 | modules/exports.md | OK |
| `export_rknn` | exports.py:19 | modules/exports.md | OK |
| `export_tensorrt` | exports.py:45 | modules/exports.md | OK |
| `upload_model` | exports.py:82 | modules/exports.md | OK |
| `WeatherMode` | dto/annotationClass.py:6 | modules/dto_annotationClass.md | OK |
| `AnnotationClass.read_json` | dto/annotationClass.py:18 | modules/dto_annotationClass.md | OK |
| `ImageLabel.visualize` | dto/imageLabel.py:12 | modules/dto_imageLabel.md | OK |
| `Dotdict` | utils.py:1 | modules/utils.md | OK |
## Code Bugs Found During Verification
### Bug 1: `augmentation.py` — undefined attribute `total_to_process`
- **Location**: augmentation.py, line 118
- **Issue**: References `self.total_to_process` but only `self.total_images_to_process` is defined in `__init__`
- **Impact**: AttributeError at runtime during progress logging
- **Documented in**: modules/augmentation.md, components/05_data_pipeline/description.md
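
A minimal reproduction of this failure mode, assuming a simplified stand-in class: only the two attribute names come from the finding above, while the `processed` counter and both `progress_*` methods are hypothetical.

```python
class Augmentator:
    """Stand-in class; not the real Augmentator from augmentation.py."""

    def __init__(self, total: int) -> None:
        self.total_images_to_process = total
        self.processed = 0

    def progress_buggy(self) -> str:
        # Reads an attribute that __init__ never defines.
        return f"{self.processed}/{self.total_to_process}"

    def progress_fixed(self) -> str:
        # Uses the attribute name that __init__ actually sets.
        return f"{self.processed}/{self.total_images_to_process}"


aug = Augmentator(total=10)
error_name = None
try:
    aug.progress_buggy()
except AttributeError as exc:
    error_name = type(exc).__name__

print(error_name)
print(aug.progress_fixed())
```
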
### Bug 2: `train.py` `copy_annotations` — reporting bug
- **Location**: train.py, lines 93 and 99
- **Issue**: `copied = 0` is declared but never incremented. The inner function increments the global `total_files_copied` instead, so the final message `f'Copied all {copied} annotations'` always reports 0.
- **Impact**: Incorrect progress reporting (cosmetic)
- **Documented in**: modules/train.md, components/06_training/description.md
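
One possible fix is to increment the local counter where each copy happens and report that value. The sketch below assumes a simplified signature (`sources`, `dest`) that is hypothetical. The real `copy_annotations` uses an inner function and a global counter, which this example does not reproduce.

```python
import shutil
import tempfile
from pathlib import Path


def copy_annotations(sources: list[Path], dest: Path) -> int:
    """Copy files into dest and return how many were copied."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for src in sources:
        shutil.copy(src, dest / src.name)
        copied += 1  # the increment missing from the reported code
    print(f"Copied all {copied} annotations")
    return copied


# Exercise the function on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    files = []
    for i in range(3):
        label = root / f"label_{i}.txt"
        label.write_text("0 0.5 0.5 0.1 0.1\n")
        files.append(label)
    count = copy_annotations(files, root / "out")
```
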
### Bug 3: `exports.py` `upload_model` — stale ApiClient constructor call
- **Location**: exports.py, line 97
- **Issue**: `ApiClient(ApiCredentials(api_c.url, api_c.user, api_c.pw, api_c.folder))` — but `ApiClient.__init__` takes no args, and `ApiCredentials.__init__` takes `(url, email, password)`, not `(url, user, pw, folder)`.
- **Impact**: `upload_model` function would fail at runtime. This function appears to be stale code — the actual upload flow in `train.py:export_current_model` uses the correct `ApiClient()` constructor.
- **Documented in**: modules/exports.md, components/06_training/description.md
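
The mismatch can be illustrated with two stand-in classes whose constructor signatures follow the description above. The class bodies, the `raises_type_error` helper, and all argument values are hypothetical.

```python
class ApiCredentials:
    """Stand-in with the three-argument constructor described above."""

    def __init__(self, url: str, email: str, password: str) -> None:
        self.url = url
        self.email = email
        self.password = password


class ApiClient:
    """Stand-in whose constructor takes no arguments, as described above."""

    def __init__(self) -> None:
        self.ready = True


def raises_type_error(fn) -> bool:
    """Return True if calling fn() raises TypeError."""
    try:
        fn()
    except TypeError:
        return True
    return False


# Four arguments where three are expected.
four_args_fail = raises_type_error(
    lambda: ApiCredentials("https://example.invalid", "user", "pw", "folder")
)
# Passing credentials to a constructor that accepts none.
client_arg_fails = raises_type_error(
    lambda: ApiClient(ApiCredentials("https://example.invalid", "e@example.invalid", "pw"))
)
# The no-argument form succeeds.
no_arg_ok = not raises_type_error(lambda: ApiClient())
print(four_args_fail, client_arg_fails, no_arg_ok)
```
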
### Bug 4: `inference/tensorrt_engine.py` — potential uninitialized `batch_size`
- **Location**: inference/tensorrt_engine.py, lines 43-44
- **Issue**: `self.batch_size` is only set if `engine_input_shape[0] != -1`. If the batch dimension is dynamic (-1), `self.batch_size` is never assigned before being used in `self.input_shape = [self.batch_size, ...]`.
- **Impact**: AttributeError at runtime for models with a dynamic batch size (unless `batch_size` is passed via kwargs or set elsewhere)
- **Documented in**: modules/inference_tensorrt_engine.md, components/07_inference/description.md
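
One way to guard against this is to assign `batch_size` on both branches. The sketch below assumes a stand-in class and a caller-supplied fallback. `EngineShape`, the `batch_size` parameter, and its default of 1 are hypothetical and are not taken from `tensorrt_engine.py`.

```python
class EngineShape:
    """Stand-in for the shape handling described above."""

    def __init__(self, engine_input_shape: tuple[int, ...], batch_size: int = 1) -> None:
        if engine_input_shape[0] != -1:
            # Static batch dimension: take it from the engine.
            self.batch_size = engine_input_shape[0]
        else:
            # Dynamic batch dimension: use the explicit fallback instead of
            # leaving the attribute unassigned.
            self.batch_size = batch_size
        self.input_shape = [self.batch_size, *engine_input_shape[1:]]


dynamic = EngineShape((-1, 3, 640, 640), batch_size=4)
static = EngineShape((2, 3, 640, 640))
print(dynamic.input_shape, static.input_shape)
```
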
### Bug 5: `dataset-visualiser.py` — missing import
- **Location**: dataset-visualiser.py, line 6
- **Issue**: `from preprocessing import read_labels` — the `preprocessing` module does not exist in the codebase.
- **Impact**: Script cannot run; ImportError at startup
- **Documented in**: modules/dataset_visualiser.md, components/05_data_pipeline/description.md
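
A missing top-level module like this can be detected without running the script. The sketch below uses `importlib.util.find_spec` from the standard library; the `module_available` helper is hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located on the import path."""
    return importlib.util.find_spec(name) is not None


# The result depends on the environment the check runs in.
print(module_available("preprocessing"))
```
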
## Missing Modules
| Module | Referenced By | Status |
|--------|-------------|--------|
| `preprocessing` | dataset-visualiser.py, tests/imagelabel_visualize_test.py | Not found in codebase |
## Duplicated Code
### AnnotationClass + WeatherMode (3 locations)
| Location | Differences |
|----------|-------------|
| `dto/annotationClass.py` | Standard version. `color_tuple` property strips first 3 chars. |
| `inference/dto.py` | Adds `opencv_color` BGR field. Same `read_json` logic. |
| `annotation-queue/annotation_queue_dto.py` | Adds `opencv_color`. Reads `classes.json` from CWD (not relative to package). |
## Security Issues
| Issue | Location | Severity |
|-------|----------|----------|
| Hardcoded API credentials | config.yaml (email, password) | High |
| Hardcoded CDN access keys | cdn.yaml (4 access keys) | High |
| Hardcoded encryption key | security.py:67 (`get_model_encryption_key`) | High |
| Queue credentials in plaintext | config.yaml, annotation-queue/config.yaml | Medium |
| No TLS cert validation in API calls | api_client.py | Low |
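
A common alternative to hardcoded secrets is to read them from the environment at startup. The sketch below is one possible approach and does not reflect how this codebase loads its configuration. The `require_env` helper and the `EXAMPLE_API_PASSWORD` variable name are hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing loudly if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Set here only so the example runs; in practice the deployment sets it.
os.environ["EXAMPLE_API_PASSWORD"] = "not-a-real-secret"
password = require_env("EXAMPLE_API_PASSWORD")
```
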
## Completeness Check
All 21 source modules are documented, and the 8 components cover them with no gaps.
| Component | Modules | Complete |
|-----------|---------|----------|
| 01 Core | constants, utils | Yes |
| 02 Security | security, hardware_service | Yes |
| 03 API & CDN | api_client, cdn_manager | Yes |
| 04 Data Models | dto/annotationClass, dto/imageLabel | Yes |
| 05 Data Pipeline | augmentation, convert-annotations, dataset-visualiser | Yes |
| 06 Training | train, exports, manual_run | Yes |
| 07 Inference | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Yes |
| 08 Annotation Queue | annotation_queue_dto, annotation_queue_handler | Yes |
## Consistency Check
- Component docs agree with architecture doc: Yes
- Flow diagrams match component interfaces: Yes
- Module dependency graph in discovery matches import analysis: Yes
- Data model doc matches filesystem layout in architecture: Yes
## Remaining Gaps / Uncertainties
- The `preprocessing` module may have existed previously and been deleted or renamed
- `exports.upload_model` may be intentionally deprecated in favor of the ApiClient-based flow in train.py
- `checkpoint.txt` content (`2024-06-27 20:51:35`) suggests training infrastructure was last used in mid-2024
- The `orangepi5/` shell scripts were not analyzed (bash, not Python) — they appear to be setup/run scripts for edge deployment