# Verification Log
## Summary

| Metric | Count |
|--------|-------|
| Entities verified | 87 |
| Entities flagged | 0 |
| Corrections applied | 0 |
| Bugs found in code | 5 |
| Missing modules | 1 |
| Duplicated code | 1 pattern (3 locations) |
| Security issues | 5 (3 High, 1 Medium, 1 Low) |
| Completeness | 21/21 modules (100%) |

## Entity Verification

All class names, function names, method signatures, and module names referenced in the documentation were verified against the actual source code. No hallucinated entities were found.

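
A check like the one described above can be approximated with Python's standard `ast` module. The following is a minimal sketch, not the procedure actually used for this log: the helper `collect_entities`, the sample source string, and the name `Security.missing_method` are all hypothetical and serve only as illustration.

```python
import ast


def collect_entities(source: str) -> dict[str, int]:
    """Map each top-level function, class, and method name to its line number."""
    entities: dict[str, int] = {}
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entities[node.name] = node.lineno
        elif isinstance(node, ast.ClassDef):
            entities[node.name] = node.lineno
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    entities[f"{node.name}.{item.name}"] = item.lineno
    return entities


# Hypothetical stand-in for a module such as security.py.
SAMPLE_SOURCE = '''
class Security:
    def encrypt_to(self, data, key):
        pass

def get_hardware_info():
    pass
'''

found = collect_entities(SAMPLE_SOURCE)
documented = ["Security.encrypt_to", "get_hardware_info", "Security.missing_method"]
# Names mentioned in the docs but absent from the source would be flagged.
flagged = [name for name in documented if name not in found]
print(flagged)
```

Comparing `found[name]` with the `file:line` values in the table below would additionally catch entities whose documented location has drifted.
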
### Verified Entities (key samples)

| Entity | Location | Doc Reference | Status |
|--------|----------|--------------|--------|
| `Security.encrypt_to` | security.py:14 | modules/security.md | OK |
| `Security.decrypt_to` | security.py:28 | modules/security.md | OK |
| `Security.get_model_encryption_key` | security.py:66 | modules/security.md | OK |
| `get_hardware_info` | hardware_service.py:5 | modules/hardware_service.md | OK |
| `CDNManager.upload` | cdn_manager.py:28 | modules/cdn_manager.md | OK |
| `CDNManager.download` | cdn_manager.py:37 | modules/cdn_manager.md | OK |
| `ApiClient.login` | api_client.py:43 | modules/api_client.md | OK |
| `ApiClient.load_bytes` | api_client.py:63 | modules/api_client.md | OK |
| `ApiClient.upload_big_small_resource` | api_client.py:113 | modules/api_client.md | OK |
| `Augmentator.augment_annotations` | augmentation.py:125 | modules/augmentation.md | OK |
| `Augmentator.augment_inner` | augmentation.py:55 | modules/augmentation.md | OK |
| `InferenceEngine` (ABC) | inference/onnx_engine.py:7 | modules/inference_onnx_engine.md | OK |
| `OnnxEngine` | inference/onnx_engine.py:25 | modules/inference_onnx_engine.md | OK |
| `TensorRTEngine` | inference/tensorrt_engine.py:16 | modules/inference_tensorrt_engine.md | OK |
| `TensorRTEngine.convert_from_onnx` | inference/tensorrt_engine.py:104 | modules/inference_tensorrt_engine.md | OK |
| `Inference.process` | inference/inference.py:83 | modules/inference_inference.md | OK |
| `Inference.remove_overlapping_detections` | inference/inference.py:120 | modules/inference_inference.md | OK |
| `AnnotationQueueHandler.on_message` | annotation-queue/annotation_queue_handler.py:87 | modules/annotation_queue_handler.md | OK |
| `AnnotationMessage` | annotation-queue/annotation_queue_dto.py:91 | modules/annotation_queue_dto.md | OK |
| `form_dataset` | train.py:42 | modules/train.md | OK |
| `train_dataset` | train.py:147 | modules/train.md | OK |
| `export_onnx` | exports.py:29 | modules/exports.md | OK |
| `export_rknn` | exports.py:19 | modules/exports.md | OK |
| `export_tensorrt` | exports.py:45 | modules/exports.md | OK |
| `upload_model` | exports.py:82 | modules/exports.md | OK |
| `WeatherMode` | dto/annotationClass.py:6 | modules/dto_annotationClass.md | OK |
| `AnnotationClass.read_json` | dto/annotationClass.py:18 | modules/dto_annotationClass.md | OK |
| `ImageLabel.visualize` | dto/imageLabel.py:12 | modules/dto_imageLabel.md | OK |
| `Dotdict` | utils.py:1 | modules/utils.md | OK |

## Code Bugs Found During Verification
### Bug 1: `augmentation.py` — undefined attribute `total_to_process`

- **Location**: augmentation.py, line 118
- **Issue**: References `self.total_to_process`, but `__init__` defines only `self.total_images_to_process`
- **Impact**: `AttributeError` at runtime during progress logging
- **Documented in**: modules/augmentation.md, components/05_data_pipeline/description.md

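
The failure mode can be reproduced in isolation. This is a minimal sketch: `AugmentatorSketch`, its `processed` counter, and both `progress_*` methods are hypothetical stand-ins, and only the two attribute names are taken from the bug description.

```python
class AugmentatorSketch:
    """Hypothetical stand-in; only the attribute names come from the bug report."""

    def __init__(self, total_images_to_process: int) -> None:
        self.total_images_to_process = total_images_to_process
        self.processed = 0

    def progress_buggy(self) -> str:
        # Mirrors the reported bug: this attribute is never assigned.
        return f"{self.processed}/{self.total_to_process}"

    def progress_fixed(self) -> str:
        # Uses the attribute that __init__ actually defines.
        return f"{self.processed}/{self.total_images_to_process}"


aug = AugmentatorSketch(total_images_to_process=10)
try:
    aug.progress_buggy()
    raised = False
except AttributeError:
    raised = True

print(raised, aug.progress_fixed())
```

Renaming the reference at the call site, as in `progress_fixed`, is likely the smallest fix, since it leaves `__init__` untouched.
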
### Bug 2: `train.py` `copy_annotations` — reporting bug

- **Location**: train.py, lines 93 and 99
- **Issue**: `copied = 0` is declared but never incremented. The inner function increments the global `total_files_copied` instead, so the final message `f'Copied all {copied} annotations'` always reports 0.
- **Impact**: Incorrect progress reporting (cosmetic)
- **Documented in**: modules/train.md, components/06_training/description.md

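
One way to avoid this class of bug is to derive the reported count from the workers' return values rather than from a separate counter. The sketch below assumes nothing about the real `copy_annotations` beyond the final message format; `copy_one`, `copy_annotations_sketch`, and the file names are hypothetical.

```python
def copy_one(name: str) -> int:
    """Hypothetical per-file worker: returns 1 if the file was handled, else 0."""
    # A real worker would copy the image and its label file here.
    return 1 if name.endswith(".jpg") else 0


def copy_annotations_sketch(names: list[str]) -> str:
    # Count from the workers' return values so the reported number
    # cannot drift from the work actually done.
    copied = sum(copy_one(name) for name in names)
    return f"Copied all {copied} annotations"


message = copy_annotations_sketch(["a.jpg", "b.jpg", "notes.txt"])
print(message)
```
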
### Bug 3: `exports.py` `upload_model` — stale ApiClient constructor call

- **Location**: exports.py, line 97
- **Issue**: The call is `ApiClient(ApiCredentials(api_c.url, api_c.user, api_c.pw, api_c.folder))`, but `ApiClient.__init__` takes no arguments and `ApiCredentials.__init__` takes `(url, email, password)`, not `(url, user, pw, folder)`.
- **Impact**: `upload_model` would fail at runtime. The function appears to be stale code: the actual upload flow in `train.py:export_current_model` uses the correct `ApiClient()` constructor.
- **Documented in**: modules/exports.md, components/06_training/description.md

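
A mismatch of this kind can be detected without executing the upload by binding the call's arguments against the constructor signature. In this sketch the two classes are stand-ins that reproduce only the signatures reported above, and `call_matches` is a hypothetical helper.

```python
import inspect


class ApiCredentials:
    """Stand-in with the signature reported above: (url, email, password)."""

    def __init__(self, url: str, email: str, password: str) -> None:
        self.url, self.email, self.password = url, email, password


class ApiClient:
    """Stand-in with the reported no-argument constructor."""

    def __init__(self) -> None:
        pass


def call_matches(cls: type, *args, **kwargs) -> bool:
    """Return True if the arguments bind to the class constructor's signature."""
    try:
        inspect.signature(cls).bind(*args, **kwargs)
        return True
    except TypeError:
        return False


stale_credentials_ok = call_matches(ApiCredentials, "url", "user", "pw", "folder")
stale_client_ok = call_matches(ApiClient, object())
current_client_ok = call_matches(ApiClient)
print(stale_credentials_ok, stale_client_ok, current_client_ok)
```
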
### Bug 4: `inference/tensorrt_engine.py` — potential uninitialized `batch_size`

- **Location**: inference/tensorrt_engine.py, lines 43–44
- **Issue**: `self.batch_size` is set only if `engine_input_shape[0] != -1`. If the batch dimension is dynamic (-1), `self.batch_size` is never assigned before it is used in `self.input_shape = [self.batch_size, ...]`.
- **Impact**: `AttributeError` at runtime for models with a dynamic batch size (unless `batch_size` is passed via kwargs or set elsewhere)
- **Documented in**: modules/inference_tensorrt_engine.md, components/07_inference/description.md

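
A possible guard is to resolve the batch size in one place, so that every branch assigns a value. This is a sketch under stated assumptions: `resolve_batch_size`, its `requested` and `default` parameters, and the fallback value of 1 are hypothetical and not taken from the engine code.

```python
from typing import Optional, Sequence


def resolve_batch_size(
    engine_input_shape: Sequence[int],
    requested: Optional[int] = None,
    default: int = 1,
) -> int:
    """Choose a batch size for a fixed or dynamic (-1) batch dimension."""
    engine_batch = engine_input_shape[0]
    if engine_batch != -1:
        # Fixed batch dimension: the engine dictates the size.
        return engine_batch
    # Dynamic batch dimension: fall back to the caller's value, then the default.
    return requested if requested is not None else default


fixed = resolve_batch_size((4, 3, 640, 640))
dynamic_requested = resolve_batch_size((-1, 3, 640, 640), requested=8)
dynamic_default = resolve_batch_size((-1, 3, 640, 640))
print(fixed, dynamic_requested, dynamic_default)
```
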
### Bug 5: `dataset-visualiser.py` — missing import

- **Location**: dataset-visualiser.py, line 6
- **Issue**: `from preprocessing import read_labels` fails because the `preprocessing` module does not exist in the codebase.
- **Impact**: Script cannot run; `ImportError` at startup
- **Documented in**: modules/dataset_visualiser.md, components/05_data_pipeline/description.md

## Missing Modules

| Module | Referenced By | Status |
|--------|-------------|--------|
| `preprocessing` | dataset-visualiser.py, tests/imagelabel_visualize_test.py | Not found in codebase |

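
Missing top-level modules can be detected without importing them by using `importlib.util.find_spec` from the standard library. A minimal sketch: the helper name `missing_modules` and the deliberately nonexistent module name are hypothetical.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be located on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is deliberately nonexistent.
result = missing_modules(["json", "module_that_does_not_exist_12345"])
print(result)
```

Run from the repository root, passing `["preprocessing"]` would be expected to report it as missing, provided no installed package happens to share that name.
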
## Duplicated Code
### AnnotationClass + WeatherMode (3 locations)

| Location | Differences |
|----------|-------------|
| `dto/annotationClass.py` | Standard version. `color_tuple` property strips first 3 chars. |
| `inference/dto.py` | Adds `opencv_color` BGR field. Same `read_json` logic. |
| `annotation-queue/annotation_queue_dto.py` | Adds `opencv_color`. Reads `classes.json` from CWD (not relative to package). |

## Security Issues

| Issue | Location | Severity |
|-------|----------|----------|
| Hardcoded API credentials | config.yaml (email, password) | High |
| Hardcoded CDN access keys | cdn.yaml (4 access keys) | High |
| Hardcoded encryption key | security.py:67 (`get_model_encryption_key`) | High |
| Queue credentials in plaintext | config.yaml, annotation-queue/config.yaml | Medium |
| No TLS cert validation in API calls | api_client.py | Low |

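
A common remedy for the hardcoded-secret findings is to read secrets from the environment at startup and fail loudly when one is absent. The sketch below is illustrative only: `load_secret` and the variable name `AZAION_API_PASSWORD` are hypothetical and do not exist in the codebase.

```python
import os


def load_secret(name: str) -> str:
    """Read a required secret from the environment, failing loudly if absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demonstration only: a real deployment would set this outside the code.
os.environ["AZAION_API_PASSWORD"] = "example-only"
password = load_secret("AZAION_API_PASSWORD")
print(len(password))
```
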
## Completeness Check

All 21 source modules are documented, and the 8 components cover every module with no gaps.

| Component | Modules | Complete |
|-----------|---------|----------|
| 01 Core | constants, utils | Yes |
| 02 Security | security, hardware_service | Yes |
| 03 API & CDN | api_client, cdn_manager | Yes |
| 04 Data Models | dto/annotationClass, dto/imageLabel | Yes |
| 05 Data Pipeline | augmentation, convert-annotations, dataset-visualiser | Yes |
| 06 Training | train, exports, manual_run | Yes |
| 07 Inference | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Yes |
| 08 Annotation Queue | annotation_queue_dto, annotation_queue_handler | Yes |

## Consistency Check

- Component docs agree with architecture doc: Yes
- Flow diagrams match component interfaces: Yes
- Module dependency graph in discovery matches import analysis: Yes
- Data model doc matches filesystem layout in architecture: Yes

## Remaining Gaps / Uncertainties

- The `preprocessing` module may have existed previously and been deleted or renamed
- `exports.upload_model` may be intentionally deprecated in favor of the ApiClient-based flow in train.py
- `checkpoint.txt` content (`2024-06-27 20:51:35`) suggests the training infrastructure was last used in mid-2024
- The `orangepi5/` shell scripts were not analyzed (bash, not Python); they appear to be setup/run scripts for edge deployment