mirror of https://github.com/azaion/ai-training.git synced 2026-04-22 22:36:36 +00:00

Oleksandr Bezdieniezhnykh 142c6c4de8 Refactor constants management to use Pydantic BaseModel for configuration

- Replaced module-level path variables in constants.py with a structured Pydantic Config class.
- Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure.
- Fixed bugs related to image processing and model saving.
- Enhanced test infrastructure to accommodate the new configuration approach.

This refactor improves code maintainability and clarity by centralizing configuration management.

2026-03-27 18:18:30 +02:00

Verification Log

Summary

Metric	Count
Entities verified	87
Entities flagged	0
Corrections applied	0
Bugs found in code	5
Missing modules	1
Duplicated code	1 pattern (3 locations)
Security issues	5 (3 high, 1 medium, 1 low)
Completeness	21/21 modules (100%)

Entity Verification

All class names, function names, method signatures, and module names referenced in documentation were verified against the actual source code. No hallucinated entities found.

Verified Entities (key samples)

Entity	Location	Doc Reference	Status
`Security.encrypt_to`	security.py:14	modules/security.md	OK
`Security.decrypt_to`	security.py:28	modules/security.md	OK
`Security.get_model_encryption_key`	security.py:66	modules/security.md	OK
`get_hardware_info`	hardware_service.py:5	modules/hardware_service.md	OK
`CDNManager.upload`	cdn_manager.py:28	modules/cdn_manager.md	OK
`CDNManager.download`	cdn_manager.py:37	modules/cdn_manager.md	OK
`ApiClient.login`	api_client.py:43	modules/api_client.md	OK
`ApiClient.load_bytes`	api_client.py:63	modules/api_client.md	OK
`ApiClient.upload_big_small_resource`	api_client.py:113	modules/api_client.md	OK
`Augmentator.augment_annotations`	augmentation.py:125	modules/augmentation.md	OK
`Augmentator.augment_inner`	augmentation.py:55	modules/augmentation.md	OK
`InferenceEngine` (ABC)	inference/onnx_engine.py:7	modules/inference_onnx_engine.md	OK
`OnnxEngine`	inference/onnx_engine.py:25	modules/inference_onnx_engine.md	OK
`TensorRTEngine`	inference/tensorrt_engine.py:16	modules/inference_tensorrt_engine.md	OK
`TensorRTEngine.convert_from_onnx`	inference/tensorrt_engine.py:104	modules/inference_tensorrt_engine.md	OK
`Inference.process`	inference/inference.py:83	modules/inference_inference.md	OK
`Inference.remove_overlapping_detections`	inference/inference.py:120	modules/inference_inference.md	OK
`AnnotationQueueHandler.on_message`	annotation-queue/annotation_queue_handler.py:87	modules/annotation_queue_handler.md	OK
`AnnotationMessage`	annotation-queue/annotation_queue_dto.py:91	modules/annotation_queue_dto.md	OK
`form_dataset`	train.py:42	modules/train.md	OK
`train_dataset`	train.py:147	modules/train.md	OK
`export_onnx`	exports.py:29	modules/exports.md	OK
`export_rknn`	exports.py:19	modules/exports.md	OK
`export_tensorrt`	exports.py:45	modules/exports.md	OK
`upload_model`	exports.py:82	modules/exports.md	OK
`WeatherMode`	dto/annotationClass.py:6	modules/dto_annotationClass.md	OK
`AnnotationClass.read_json`	dto/annotationClass.py:18	modules/dto_annotationClass.md	OK
`ImageLabel.visualize`	dto/imageLabel.py:12	modules/dto_imageLabel.md	OK
`Dotdict`	utils.py:1	modules/utils.md	OK
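The checks in the table above could be reproduced mechanically. The sketch below is not the tool that produced this log; it is a minimal illustration, using only the standard `ast` module, of how documented names and line numbers might be compared against source. The helper name `find_definitions` and the sample source string are hypothetical.

```python
import ast


def find_definitions(source: str) -> dict[str, int]:
    """Map top-level 'name' and 'Class.method' entries to 1-based line numbers."""
    found: dict[str, int] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found[node.name] = node.lineno
        elif isinstance(node, ast.ClassDef):
            found[node.name] = node.lineno
            # Record methods defined directly in the class body.
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    found[f"{node.name}.{item.name}"] = item.lineno
    return found


# Hypothetical sample source, not the real security.py.
sample = (
    "class Security:\n"
    "    def encrypt_to(data, key):\n"
    "        return data\n"
)
definitions = find_definitions(sample)

# A documented entity is "OK" when its name resolves in the parsed source.
documented = ["Security.encrypt_to", "Security.decrypt_to"]
status = {name: name in definitions for name in documented}
```

Here `status` marks `Security.encrypt_to` as found and `Security.decrypt_to` as missing, because the sample source defines only the first.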

Code Bugs Found During Verification

Bug 1: `augmentation.py` — undefined attribute `total_to_process`

Location: augmentation.py, line 118
Issue: References self.total_to_process but only self.total_images_to_process is defined in __init__
Impact: AttributeError at runtime during progress logging
Documented in: modules/augmentation.md, components/05_data_pipeline/description.md
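A minimal sketch of the failure and one possible fix follows. The class is a stand-in that models only the counters; the `processed` attribute and the `progress_message` method are hypothetical names, not taken from augmentation.py.

```python
class Augmentator:
    """Stand-in for the real class; only the progress counters are modelled."""

    def __init__(self, total_images_to_process: int) -> None:
        self.total_images_to_process = total_images_to_process
        self.processed = 0  # hypothetical counter name

    def progress_message(self) -> str:
        # Use the attribute that __init__ actually defines.
        return f"Processed {self.processed}/{self.total_images_to_process} images"


aug = Augmentator(total_images_to_process=10)
aug.processed = 3
message = aug.progress_message()

# Reading the name used at line 118 reproduces the reported failure.
try:
    aug.total_to_process
    raised = False
except AttributeError:
    raised = True
```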

Bug 2: `train.py` `copy_annotations` — reporting bug

Location: train.py, lines 93 and 99
Issue: copied = 0 is declared but never incremented. The inner function increments the global total_files_copied instead, so the final message f'Copied all {copied} annotations' always prints 0.
Impact: Incorrect progress reporting (cosmetic)
Documented in: modules/train.md, components/06_training/description.md
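One way to avoid the mismatch is to report the same counter that is incremented. The sketch below is a simplified assumption about the function's shape: it drops the inner function and the global, and the `.txt` glob pattern and directory layout are illustrative only.

```python
import shutil
import tempfile
from pathlib import Path


def copy_annotations(src_dir: Path, dst_dir: Path) -> int:
    """Copy every .txt file from src_dir to dst_dir and return the number copied."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for src in sorted(src_dir.glob("*.txt")):
        shutil.copy(src, dst_dir / src.name)
        copied += 1  # increment the counter that the final message reports
    print(f"Copied all {copied} annotations")
    return copied


# Usage with throwaway directories and three dummy label files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    dst = Path(tmp) / "dst"
    src.mkdir()
    for i in range(3):
        (src / f"label_{i}.txt").write_text("0 0.5 0.5 0.1 0.1\n")
    copied_count = copy_annotations(src, dst)
```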

Bug 3: `exports.py` `upload_model` — stale ApiClient constructor call

Location: exports.py, line 97
Issue: The call ApiClient(ApiCredentials(api_c.url, api_c.user, api_c.pw, api_c.folder)) does not match the current signatures: ApiClient.__init__ takes no arguments, and ApiCredentials.__init__ takes (url, email, password), not (url, user, pw, folder).
Impact: upload_model would fail at runtime. The function appears to be stale code; the actual upload flow in train.py:export_current_model uses the correct ApiClient() constructor.
Documented in: modules/exports.md, components/06_training/description.md
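Signature mismatches of this kind can be caught without running the call. The sketch below uses `inspect.signature(...).bind` against stand-in classes that mirror only the constructor signatures reported above; the helper `call_matches` is hypothetical.

```python
import inspect


class ApiCredentials:
    """Stand-in with the reported signature (url, email, password)."""

    def __init__(self, url: str, email: str, password: str) -> None:
        self.url, self.email, self.password = url, email, password


class ApiClient:
    """Stand-in whose constructor takes no arguments, as reported."""

    def __init__(self) -> None:
        pass


def call_matches(cls: type, *args, **kwargs) -> bool:
    """Return True if cls(*args, **kwargs) would bind to the constructor signature."""
    try:
        inspect.signature(cls).bind(*args, **kwargs)
        return True
    except TypeError:
        return False


# The stale call passes four values where three are accepted.
stale_credentials_ok = call_matches(ApiCredentials, "url", "user", "pw", "folder")
# The stale call also passes one argument to a zero-argument constructor.
stale_client_ok = call_matches(ApiClient, object())
# The constructor form used in train.py binds cleanly.
current_client_ok = call_matches(ApiClient)
```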

Bug 4: `inference/tensorrt_engine.py` — potential uninitialized `batch_size`

Location: inference/tensorrt_engine.py, lines 43–44
Issue: self.batch_size is only set if engine_input_shape[0] != -1. If the batch dimension is dynamic (-1), self.batch_size is never assigned before being used in self.input_shape = [self.batch_size, ...].
Impact: AttributeError at runtime for models with a dynamic batch size, unless batch_size is set elsewhere (for example, passed via kwargs)
Documented in: modules/inference_tensorrt_engine.md, components/07_inference/description.md
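A possible guard is to resolve the batch size before building the input shape, so that a dynamic dimension always falls back to a known value. This is a sketch under assumptions: `DEFAULT_BATCH_SIZE`, `resolve_batch_size`, and `build_input_shape` are hypothetical names, and the real engine may derive the remaining dimensions differently.

```python
from typing import Optional, Sequence

DEFAULT_BATCH_SIZE = 1  # hypothetical fallback, not taken from the codebase


def resolve_batch_size(
    engine_input_shape: Sequence[int],
    requested: Optional[int] = None,
    default: int = DEFAULT_BATCH_SIZE,
) -> int:
    """Return the static batch dimension, or a fallback when it is dynamic (-1)."""
    static = engine_input_shape[0]
    if static != -1:
        return static
    return requested if requested is not None else default


def build_input_shape(
    engine_input_shape: Sequence[int], requested: Optional[int] = None
) -> list:
    """Replace the batch dimension with a concrete value; keep the other dimensions."""
    batch = resolve_batch_size(engine_input_shape, requested)
    return [batch, *engine_input_shape[1:]]


dynamic_default = build_input_shape((-1, 3, 640, 640))
dynamic_requested = build_input_shape((-1, 3, 640, 640), requested=4)
static_shape = build_input_shape((8, 3, 640, 640))
```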

Bug 5: `dataset-visualiser.py` — missing import

Location: dataset-visualiser.py, line 6
Issue: from preprocessing import read_labels — the preprocessing module does not exist in the codebase.
Impact: Script cannot run; ImportError at startup
Documented in: modules/dataset_visualiser.md, components/05_data_pipeline/description.md
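Missing top-level imports like this one can be detected before a script is run. The sketch below uses `importlib.util.find_spec` from the standard library; the helper `missing_modules` and the second module name are hypothetical, and the check is limited to top-level (non-dotted) names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" is in the standard library; the second name is deliberately nonexistent.
missing = missing_modules(["json", "module_that_should_not_exist_xyz"])
```

Run from the repository root, the same call with `["preprocessing"]` would be expected to report that module as missing, assuming no installed package happens to share the name.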

Missing Modules

Module	Referenced By	Status
`preprocessing`	dataset-visualiser.py, tests/imagelabel_visualize_test.py	Not found in codebase

Duplicated Code

AnnotationClass + WeatherMode (3 locations)

Location	Differences
`dto/annotationClass.py`	Standard version. `color_tuple` property strips first 3 chars.
`inference/dto.py`	Adds `opencv_color` BGR field. Same `read_json` logic.
`annotation-queue/annotation_queue_dto.py`	Adds `opencv_color`. Reads `classes.json` from CWD (not relative to package).

Security Issues

Issue	Location	Severity
Hardcoded API credentials	config.yaml (email, password)	High
Hardcoded CDN access keys	cdn.yaml (4 access keys)	High
Hardcoded encryption key	security.py:67 (`get_model_encryption_key`)	High
Queue credentials in plaintext	config.yaml, annotation-queue/config.yaml	Medium
No TLS cert validation in API calls	api_client.py	Low
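One common mitigation for the hardcoded credentials is to read them from the environment at startup and fail fast when any are absent. The sketch below is an assumption about how that could look, not the project's configuration code: the variable names `API_URL`, `API_EMAIL`, and `API_PASSWORD` and the `ApiSettings` class are hypothetical.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED_KEYS = ("API_URL", "API_EMAIL", "API_PASSWORD")  # hypothetical names


@dataclass(frozen=True)
class ApiSettings:
    url: str
    email: str
    password: str


def load_api_settings(env: Mapping = os.environ) -> ApiSettings:
    """Build settings from environment-style key/value pairs; fail if any are unset."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return ApiSettings(env["API_URL"], env["API_EMAIL"], env["API_PASSWORD"])


# Usage with an explicit mapping so the example does not depend on the real environment.
settings = load_api_settings(
    {
        "API_URL": "https://example.invalid",
        "API_EMAIL": "user@example.invalid",
        "API_PASSWORD": "not-a-real-secret",
    }
)
```

With this approach config.yaml would hold only non-secret values, and the secrets would be supplied by the deployment environment.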

Completeness Check

All 21 source modules are documented. The 8 components together cover every module with no gaps.

Component	Modules	Complete
01 Core	constants, utils	Yes
02 Security	security, hardware_service	Yes
03 API & CDN	api_client, cdn_manager	Yes
04 Data Models	dto/annotationClass, dto/imageLabel	Yes
05 Data Pipeline	augmentation, convert-annotations, dataset-visualiser	Yes
06 Training	train, exports, manual_run	Yes
07 Inference	inference/dto, onnx_engine, tensorrt_engine, inference, start_inference	Yes
08 Annotation Queue	annotation_queue_dto, annotation_queue_handler	Yes

Consistency Check

Component docs agree with architecture doc: Yes
Flow diagrams match component interfaces: Yes
Module dependency graph in discovery matches import analysis: Yes
Data model doc matches filesystem layout in architecture: Yes

Remaining Gaps / Uncertainties

The preprocessing module may have existed previously and been deleted or renamed
exports.upload_model may be intentionally deprecated in favor of the ApiClient-based flow in train.py
checkpoint.txt content (2024-06-27 20:51:35) suggests training infrastructure was last used in mid-2024
The orangepi5/ shell scripts were not analyzed (bash, not Python) — they appear to be setup/run scripts for edge deployment
