ai-training/_docs/02_document/04_verification_log.md

# Verification Log

## Summary

| Metric | Count |
|--------|-------|
| Entities verified | 87 |
| Entities flagged | 0 |
| Corrections applied | 0 |
| Bugs found in code | 5 |
| Missing modules | 1 |
| Duplicated code | 1 pattern (3 locations) |
| Security issues | 5 (3 high, 1 medium, 1 low) |
| Completeness | 21/21 modules (100%) |

## Entity Verification

All class names, function names, method signatures, and module names referenced in documentation were verified against the actual source code. No hallucinated entities found.

### Verified Entities (key samples)

| Entity | Location | Doc Reference | Status |
|--------|----------|--------------|--------|
| `Security.encrypt_to` | security.py:14 | modules/security.md | OK |
| `Security.decrypt_to` | security.py:28 | modules/security.md | OK |
| `Security.get_model_encryption_key` | security.py:66 | modules/security.md | OK |
| `get_hardware_info` | hardware_service.py:5 | modules/hardware_service.md | OK |
| `CDNManager.upload` | cdn_manager.py:28 | modules/cdn_manager.md | OK |
| `CDNManager.download` | cdn_manager.py:37 | modules/cdn_manager.md | OK |
| `ApiClient.login` | api_client.py:43 | modules/api_client.md | OK |
| `ApiClient.load_bytes` | api_client.py:63 | modules/api_client.md | OK |
| `ApiClient.upload_big_small_resource` | api_client.py:113 | modules/api_client.md | OK |
| `Augmentator.augment_annotations` | augmentation.py:125 | modules/augmentation.md | OK |
| `Augmentator.augment_inner` | augmentation.py:55 | modules/augmentation.md | OK |
| `InferenceEngine` (ABC) | inference/onnx_engine.py:7 | modules/inference_onnx_engine.md | OK |
| `OnnxEngine` | inference/onnx_engine.py:25 | modules/inference_onnx_engine.md | OK |
| `TensorRTEngine` | inference/tensorrt_engine.py:16 | modules/inference_tensorrt_engine.md | OK |
| `TensorRTEngine.convert_from_onnx` | inference/tensorrt_engine.py:104 | modules/inference_tensorrt_engine.md | OK |
| `Inference.process` | inference/inference.py:83 | modules/inference_inference.md | OK |
| `Inference.remove_overlapping_detections` | inference/inference.py:120 | modules/inference_inference.md | OK |
| `AnnotationQueueHandler.on_message` | annotation-queue/annotation_queue_handler.py:87 | modules/annotation_queue_handler.md | OK |
| `AnnotationMessage` | annotation-queue/annotation_queue_dto.py:91 | modules/annotation_queue_dto.md | OK |
| `form_dataset` | train.py:42 | modules/train.md | OK |
| `train_dataset` | train.py:147 | modules/train.md | OK |
| `export_onnx` | exports.py:29 | modules/exports.md | OK |
| `export_rknn` | exports.py:19 | modules/exports.md | OK |
| `export_tensorrt` | exports.py:45 | modules/exports.md | OK |
| `upload_model` | exports.py:82 | modules/exports.md | OK |
| `WeatherMode` | dto/annotationClass.py:6 | modules/dto_annotationClass.md | OK |
| `AnnotationClass.read_json` | dto/annotationClass.py:18 | modules/dto_annotationClass.md | OK |
| `ImageLabel.visualize` | dto/imageLabel.py:12 | modules/dto_imageLabel.md | OK |
| `Dotdict` | utils.py:1 | modules/utils.md | OK |
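
Each row above pairs an entity with a file and a line number. A minimal sketch of how such a row might be checked with Python's `ast` module follows. `find_entity_line` is a hypothetical helper, not part of the codebase or of the tooling that produced this log.

```python
import ast


def find_entity_line(source: str, qualified_name: str):
    """Return the 1-based line where `function`, `Class`, or `Class.method`
    is defined in `source`, or None if it is not found.

    Hypothetical helper for illustration only.
    """
    nodes = ast.parse(source).body
    found = None
    for part in qualified_name.split("."):
        # Look for a class or function with this name at the current level.
        found = next(
            (
                node
                for node in nodes
                if isinstance(
                    node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
                )
                and node.name == part
            ),
            None,
        )
        if found is None:
            return None
        # Descend into the body so the next part is resolved inside it.
        nodes = found.body
    return found.lineno
```

Comparing the returned line with the `Location` column would flag both missing entities and stale line numbers.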

## Code Bugs Found During Verification

### Bug 1: `augmentation.py` — undefined attribute `total_to_process`
- **Location**: augmentation.py, line 118
- **Issue**: References `self.total_to_process` but only `self.total_images_to_process` is defined in `__init__`
- **Impact**: AttributeError at runtime during progress logging
- **Documented in**: modules/augmentation.md, components/05_data_pipeline/description.md
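
The mismatch can be reproduced in isolation. The sketch below uses a hypothetical stand-in class (`AugmentatorSketch`) that keeps only the two attribute names mentioned in this entry; the `processed` counter and both method names are assumptions made for illustration.

```python
class AugmentatorSketch:
    """Hypothetical stand-in for Augmentator; only the counters matter here."""

    def __init__(self, total_images: int) -> None:
        self.total_images_to_process = total_images
        self.processed = 0  # assumed progress counter

    def progress_buggy(self) -> str:
        # Mirrors the reported bug: `total_to_process` is never assigned.
        return f"{self.processed}/{self.total_to_process}"

    def progress_fixed(self) -> str:
        # Uses the attribute that __init__ actually defines.
        return f"{self.processed}/{self.total_images_to_process}"


def raises_attribute_error(func) -> bool:
    """Return True if calling `func` raises AttributeError."""
    try:
        func()
    except AttributeError:
        return True
    return False
```

Renaming the reference to `self.total_images_to_process` is the smallest fix consistent with the description above.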

### Bug 2: `train.py` `copy_annotations` — reporting bug
- **Location**: train.py, lines 93 and 99
- **Issue**: `copied = 0` is declared but never incremented. The inner function increments the global `total_files_copied` instead, so the final message `f'Copied all {copied} annotations'` always prints 0.
- **Impact**: Incorrect progress reporting (cosmetic)
- **Documented in**: modules/train.md, components/06_training/description.md
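
One possible fix is to increment the local counter from the inner function via `nonlocal`. The sketch below is a hypothetical reduction of `copy_annotations`: the actual file copying is elided, and `copy_one` is an assumed name for the inner function.

```python
total_files_copied = 0  # module-level counter, as described in this entry


def copy_annotations_sketch(files: list[str]) -> str:
    """Hypothetical reduction of copy_annotations; the copy itself is elided."""
    copied = 0

    def copy_one(name: str) -> None:
        nonlocal copied
        global total_files_copied
        # ... the real code would copy the file `name` here ...
        total_files_copied += 1
        copied += 1  # the increment this entry reports as missing

    for name in files:
        copy_one(name)
    return f"Copied all {copied} annotations"
```

Printing `total_files_copied` in the final message instead would be an alternative, at the cost of reporting a running total across calls.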

### Bug 3: `exports.py` `upload_model` — stale ApiClient constructor call
- **Location**: exports.py, line 97
- **Issue**: The call is `ApiClient(ApiCredentials(api_c.url, api_c.user, api_c.pw, api_c.folder))`, but `ApiClient.__init__` takes no arguments and `ApiCredentials.__init__` takes `(url, email, password)`, not `(url, user, pw, folder)`.
- **Impact**: `upload_model` would fail at runtime. The function appears to be stale code: the actual upload flow in `train.py:export_current_model` uses the correct `ApiClient()` constructor.
- **Documented in**: modules/exports.md, components/06_training/description.md
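
A call like this can be checked against a constructor signature without executing it, using `inspect.signature(...).bind`. In the sketch below, `ApiCredentials` and `ApiClient` are hypothetical stand-ins that carry only the signatures stated in this entry, and `call_matches_signature` is an assumed helper name.

```python
import inspect


class ApiCredentials:
    """Stand-in with the (url, email, password) signature named in this entry."""

    def __init__(self, url, email, password):
        self.url, self.email, self.password = url, email, password


class ApiClient:
    """Stand-in whose constructor takes no arguments."""

    def __init__(self):
        pass


def call_matches_signature(func, *args, **kwargs) -> bool:
    """Return True if func(*args, **kwargs) would bind without a TypeError."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True
```

For example, `call_matches_signature(ApiCredentials, "url", "user", "pw", "folder")` is `False`, because four positional arguments are passed to a three-parameter constructor.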

### Bug 4: `inference/tensorrt_engine.py` — potential uninitialized `batch_size`
- **Location**: inference/tensorrt_engine.py, lines 43–44
- **Issue**: `self.batch_size` is only set if `engine_input_shape[0] != -1`. If the batch dimension is dynamic (-1), `self.batch_size` is never assigned before being used in `self.input_shape = [self.batch_size, ...]`.
- **Impact**: `AttributeError` at runtime for models with a dynamic batch size, unless `batch_size` is passed via kwargs or set elsewhere. (Reading an unassigned instance attribute raises `AttributeError`, not `NameError`.)
- **Documented in**: modules/inference_tensorrt_engine.md, components/07_inference/description.md
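
A guard that always yields a value would avoid the unassigned attribute. The sketch below is one possible approach and not the engine's actual code: `resolve_batch_size`, the `requested` parameter, and the default of 1 are all assumptions.

```python
def resolve_batch_size(engine_input_shape, requested=None, default=1) -> int:
    """Pick a batch size for a static or dynamic (-1) batch dimension.

    Hypothetical helper: `requested` stands for a caller-supplied value
    (for example from kwargs), and `default` is an assumed fallback.
    """
    static = engine_input_shape[0]
    if static != -1:
        # The engine was built with a fixed batch dimension; use it.
        return static
    if requested is not None:
        return requested
    return default
```

With this in place, `self.batch_size = resolve_batch_size(engine_input_shape, kwargs.get("batch_size"))` would assign the attribute on every path.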

### Bug 5: `dataset-visualiser.py` — missing import
- **Location**: dataset-visualiser.py, line 6
- **Issue**: The script runs `from preprocessing import read_labels`, but no `preprocessing` module exists in the codebase.
- **Impact**: Script cannot run; ImportError at startup
- **Documented in**: modules/dataset_visualiser.md, components/05_data_pipeline/description.md

## Missing Modules

| Module | Referenced By | Status |
|--------|-------------|--------|
| `preprocessing` | dataset-visualiser.py, tests/imagelabel_visualize_test.py | Not found in codebase |
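
Whether a top-level module is importable can be checked without importing it, using `importlib.util.find_spec`. The helper name `module_available` below is an assumption for illustration, and the result depends on the `sys.path` of the interpreter that runs the check.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module `name` can be located on sys.path."""
    # find_spec returns None when no finder can locate the module.
    return importlib.util.find_spec(name) is not None
```

Run from the repository root, `module_available("preprocessing")` would be expected to return `False` given the finding above.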

## Duplicated Code

### AnnotationClass + WeatherMode (3 locations)
| Location | Differences |
|----------|-------------|
| `dto/annotationClass.py` | Standard version. `color_tuple` property strips first 3 chars. |
| `inference/dto.py` | Adds `opencv_color` BGR field. Same `read_json` logic. |
| `annotation-queue/annotation_queue_dto.py` | Adds `opencv_color`. Reads `classes.json` from CWD (not relative to package). |
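
The CWD dependence noted in the last row is commonly avoided by resolving the file relative to the module that reads it. The sketch below assumes `classes.json` sits next to that module; `classes_json_path` is a hypothetical helper, not something present in the codebase.

```python
from pathlib import Path


def classes_json_path(module_file: str) -> Path:
    """Return the path of classes.json located beside `module_file`.

    Hypothetical helper: callers would typically pass `__file__`.
    """
    return Path(module_file).resolve().parent / "classes.json"
```

A single shared definition of `AnnotationClass` and `WeatherMode` that uses such a path would also remove the need for the three copies.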

## Security Issues

| Issue | Location | Severity |
|-------|----------|----------|
| Hardcoded API credentials | config.yaml (email, password) | High |
| Hardcoded CDN access keys | cdn.yaml (4 access keys) | High |
| Hardcoded encryption key | security.py:67 (`get_model_encryption_key`) | High |
| Queue credentials in plaintext | config.yaml, annotation-queue/config.yaml | Medium |
| No TLS cert validation in API calls | api_client.py | Low |
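
A common mitigation for the hardcoded credentials is to read them from the environment at startup. The sketch below is one possible approach; the variable names `API_EMAIL` and `API_PASSWORD` and the function `load_credentials` are assumptions, not names used by the codebase.

```python
import os


def load_credentials(env=os.environ) -> dict:
    """Read API credentials from environment variables (assumed names)."""
    required = ("API_EMAIL", "API_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of using a baked-in default.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"email": env["API_EMAIL"], "password": env["API_PASSWORD"]}
```

The same pattern would apply to the CDN access keys and queue credentials. The encryption key likely needs a dedicated secret store, since any key that ships with the code is effectively public.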

## Completeness Check

All 21 source modules are documented. The 8 components together cover all 21 modules with no gaps.

| Component | Modules | Complete |
|-----------|---------|----------|
| 01 Core | constants, utils | Yes |
| 02 Security | security, hardware_service | Yes |
| 03 API & CDN | api_client, cdn_manager | Yes |
| 04 Data Models | dto/annotationClass, dto/imageLabel | Yes |
| 05 Data Pipeline | augmentation, convert-annotations, dataset-visualiser | Yes |
| 06 Training | train, exports, manual_run | Yes |
| 07 Inference | inference/dto, onnx_engine, tensorrt_engine, inference, start_inference | Yes |
| 08 Annotation Queue | annotation_queue_dto, annotation_queue_handler | Yes |

## Consistency Check

- Component docs agree with architecture doc: Yes
- Flow diagrams match component interfaces: Yes
- Module dependency graph in discovery matches import analysis: Yes
- Data model doc matches filesystem layout in architecture: Yes

## Remaining Gaps / Uncertainties

- The `preprocessing` module may have existed previously and been deleted or renamed
- `exports.upload_model` may be intentionally deprecated in favor of the ApiClient-based flow in train.py
- `checkpoint.txt` content (`2024-06-27 20:51:35`) suggests training infrastructure was last used in mid-2024
- The `orangepi5/` shell scripts were not analyzed because they are Bash rather than Python; they appear to be setup/run scripts for edge deployment