# Codebase Discovery

## Directory Tree

```
ai-training/
├── annotation-queue/       # Separate sub-service: annotation message queue consumer
│   ├── annotation_queue_dto.py
│   ├── annotation_queue_handler.py
│   ├── classes.json
│   ├── config.yaml
│   ├── offset.yaml
│   ├── requirements.txt
│   └── run.sh
├── dto/                    # Data transfer objects for the training pipeline
│   ├── annotationClass.py
│   ├── annotation_bulk_message.py (empty)
│   ├── annotation_message.py (empty)
│   └── imageLabel.py
├── inference/              # Inference engine subsystem (ONNX + TensorRT)
│   ├── __init__.py (empty)
│   ├── dto.py
│   ├── inference.py
│   ├── onnx_engine.py
│   └── tensorrt_engine.py
├── orangepi5/              # Setup scripts for OrangePi5 edge device
│   ├── 01 install.sh
│   ├── 02 install-inference.sh
│   └── 03 run_inference.sh
├── scripts/
│   └── init-sftp.sh
├── tests/
│   ├── data.yaml
│   ├── imagelabel_visualize_test.py
│   ├── libomp140.x86_64.dll (binary workaround for Windows)
│   └── security_test.py
├── api_client.py           # API client for Azaion backend + CDN resource management
├── augmentation.py         # Image augmentation pipeline (albumentations)
├── cdn_manager.py          # S3-compatible CDN upload/download via boto3
├── cdn.yaml                # CDN credentials config
├── checkpoint.txt          # Last training checkpoint timestamp
├── classes.json            # Annotation class definitions (17 classes + weather modes)
├── config.yaml             # Main config (API url, queue, directories)
├── constants.py            # Shared path constants and config keys
├── convert-annotations.py  # Annotation format converter (Pascal VOC / bbox → YOLO)
├── dataset-visualiser.py   # Interactive dataset visualization tool
├── exports.py              # Model export (ONNX, TensorRT, RKNN) and upload
├── hardware_service.py     # Hardware fingerprinting (CPU/GPU/RAM/drive serial)
├── install.sh              # Dependency installation script
├── manual_run.py           # Manual training/export entry point
├── requirements.txt        # Python dependencies
├── security.py             # AES-256-CBC encryption/decryption + key derivation
├── start_inference.py      # Inference entry point (downloads model, runs TensorRT)
├── train.py                # Main training pipeline (dataset formation → YOLO training → export)
└── utils.py                # Utility classes (Dotdict)
```

## Tech Stack Summary

| Category | Technology | Details |
|----------|-----------|---------|
| Language | Python 3.10+ | Match statements used (3.10 feature) |
| ML Framework | Ultralytics (YOLO) | YOLOv11 object detection model |
| Deep Learning | PyTorch 2.3.0 (CUDA 12.1) | GPU-accelerated training |
| Inference (Primary) | TensorRT | GPU inference with FP16/INT8 support |
| Inference (Fallback) | ONNX Runtime GPU | Cross-platform inference |
| Augmentation | Albumentations | Image augmentation pipeline |
| Computer Vision | OpenCV (cv2) | Image I/O, preprocessing, visualization |
| CDN/Storage | boto3 (S3-compatible) | Model artifact storage |
| Message Queue | RabbitMQ Streams (rstream) | Annotation message consumption |
| Serialization | msgpack | Queue message deserialization |
| Encryption | cryptography (AES-256-CBC) | Model encryption, API resource encryption |
| GPU Management | pycuda, pynvml | CUDA memory management, device queries |
| HTTP | requests | API communication |
| Config | PyYAML | Configuration files |
| Visualization | matplotlib, netron | Annotation display, model graph viewer |
| Edge Deployment | RKNN (RK3588) | OrangePi5 inference target |

## Dependency Graph

### Internal Module Dependencies (textual)

**Leaves (no internal dependencies):**

- `constants`: path constants, config keys
- `utils`: Dotdict helper
- `security`: encryption/decryption, key derivation
- `hardware_service`: hardware fingerprinting
- `cdn_manager`: S3-compatible CDN client
- `dto/annotationClass`: annotation class model + JSON reader
- `dto/imageLabel`: image+labels container with visualization
- `inference/dto`: Detection, Annotation, AnnotationClass (inference-specific)
- `inference/onnx_engine`: InferenceEngine ABC + OnnxEngine implementation
- `convert-annotations`: standalone annotation format converter
- `annotation-queue/annotation_queue_dto`: queue message DTOs

**Level 1 (depends on leaves):**

- `api_client` → constants, cdn_manager, hardware_service, security
- `augmentation` → constants, dto/imageLabel
- `inference/tensorrt_engine` → inference/onnx_engine (InferenceEngine ABC)
- `inference/inference` → inference/dto, inference/onnx_engine
- `annotation-queue/annotation_queue_handler` → annotation_queue_dto

**Level 2 (depends on level 1):**

- `exports` → constants, api_client, cdn_manager, security, utils

**Level 3 (depends on level 2):**

- `train` → constants, api_client, cdn_manager, dto/annotationClass, inference/onnx_engine, security, utils, exports
- `start_inference` → constants, api_client, cdn_manager, inference/inference, inference/tensorrt_engine, security, utils

**Level 4 (depends on level 3):**

- `manual_run` → constants, train, augmentation

**Broken dependency:**

- `dataset-visualiser` → constants, dto/annotationClass, dto/imageLabel, **preprocessing** (module not found in codebase)

### Dependency Graph (Mermaid)

```mermaid
graph TD
    constants --> api_client
    constants --> augmentation
    constants --> exports
    constants --> train
    constants --> manual_run
    constants --> start_inference
    constants --> dataset-visualiser
    utils --> exports
    utils --> train
    utils --> start_inference
    security --> api_client
    security --> exports
    security --> train
    security --> start_inference
    hardware_service --> api_client
    cdn_manager --> api_client
    cdn_manager --> exports
    cdn_manager --> train
    cdn_manager --> start_inference
    api_client --> exports
    api_client --> train
    api_client --> start_inference
    dto_annotationClass[dto/annotationClass] --> train
    dto_annotationClass --> dataset-visualiser
    dto_imageLabel[dto/imageLabel] --> augmentation
    dto_imageLabel --> dataset-visualiser
    inference_dto[inference/dto] --> inference_inference[inference/inference]
    inference_onnx[inference/onnx_engine] --> inference_inference
    inference_onnx --> inference_trt[inference/tensorrt_engine]
    inference_onnx --> train
    inference_inference --> start_inference
    inference_trt --> start_inference
    exports --> train
    train --> manual_run
    augmentation --> manual_run
    aq_dto[annotation-queue/annotation_queue_dto] --> aq_handler[annotation-queue/annotation_queue_handler]
```

## Topological Processing Order

| Batch | Modules |
|-------|---------|
| 1 (leaves) | constants, utils, security, hardware_service, cdn_manager |
| 2 (leaves) | dto/annotationClass, dto/imageLabel, inference/dto, inference/onnx_engine |
| 3 (level 1) | api_client, augmentation, inference/tensorrt_engine, inference/inference |
| 4 (level 2) | exports, convert-annotations, dataset-visualiser |
| 5 (level 3) | train, start_inference |
| 6 (level 4) | manual_run |
| 7 (separate) | annotation-queue/annotation_queue_dto, annotation-queue/annotation_queue_handler |

## Entry Points

| Entry Point | Description |
|-------------|-------------|
| `train.py` (`__main__`) | Main pipeline: form dataset → train YOLO → export + upload ONNX model |
| `augmentation.py` (`__main__`) | Continuous augmentation loop (runs indefinitely) |
| `start_inference.py` (`__main__`) | Download encrypted TensorRT model → run video inference |
| `manual_run.py` (script) | Ad-hoc training/export commands |
| `convert-annotations.py` (`__main__`) | One-shot annotation format conversion |
| `dataset-visualiser.py` (`__main__`) | Interactive annotation visualization |
| `annotation-queue/annotation_queue_handler.py` (`__main__`) | Async queue consumer for annotation CRUD events |

## Leaf Modules

constants, utils, security, hardware_service, cdn_manager, dto/annotationClass, dto/imageLabel, inference/dto, inference/onnx_engine, convert-annotations, annotation-queue/annotation_queue_dto

## Observations

- **Security concern**: `config.yaml` and `cdn.yaml` contain hardcoded credentials (API passwords, S3 access keys). These should be moved to environment variables or a secrets manager.
- **Missing module**: `dataset-visualiser.py` imports from `preprocessing` which does not exist in the codebase.
- **Duplicate code**: `AnnotationClass` and `WeatherMode` are defined in three separate locations: `dto/annotationClass.py`, `inference/dto.py`, and `annotation-queue/annotation_queue_dto.py`.
- **Empty files**: `dto/annotation_bulk_message.py`, `dto/annotation_message.py`, and `inference/__init__.py` are empty.
- **Separate sub-service**: `annotation-queue/` has its own `requirements.txt` and `config.yaml`, functioning as an independent service.
- **Hardcoded encryption key**: `security.py` has a hardcoded model encryption key string.
- **No formal test framework**: tests are script-based, not using pytest/unittest.
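
The two credential-related observations (secrets in `config.yaml` / `cdn.yaml` and the hardcoded key in `security.py`) could be addressed by reading secrets from the environment at startup. The following is a minimal sketch of that idea, not the project's actual code: the config key names, the environment variable names, and the `apply_secret_overrides` helper are all hypothetical and would need to be matched to the real files.

```python
import os
from typing import Mapping

# Hypothetical mapping from config keys to environment variable names.
# Neither side is taken from the real config.yaml, cdn.yaml, or security.py.
SECRET_ENV_VARS = {
    "api_password": "AZAION_API_PASSWORD",
    "cdn_access_key": "AZAION_CDN_ACCESS_KEY",
    "cdn_secret_key": "AZAION_CDN_SECRET_KEY",
    "model_encryption_key": "AZAION_MODEL_KEY",
}


def apply_secret_overrides(
    config: Mapping[str, object],
    environ: Mapping[str, str] = os.environ,
) -> dict:
    """Return a copy of `config` with every secret filled in from `environ`.

    Raises KeyError naming all missing variables, so a misconfigured
    deployment fails at startup instead of partway through a run.
    """
    merged = dict(config)
    missing = []
    for key, env_name in SECRET_ENV_VARS.items():
        value = environ.get(env_name)
        if value:
            merged[key] = value
        else:
            missing.append(env_name)
    if missing:
        raise KeyError("missing secret environment variables: " + ", ".join(missing))
    return merged


if __name__ == "__main__":
    # Non-secret settings would still come from the YAML file; a plain dict
    # stands in for the parsed file here.
    non_secret = {"api_url": "https://example.invalid/api"}
    fake_env = {name: "dummy-value" for name in SECRET_ENV_VARS.values()}
    print(sorted(apply_secret_overrides(non_secret, fake_env)))
```

With this shape, the YAML files keep only non-secret settings (URLs, queue names, directories), and the same merged dictionary can be handed to whatever currently consumes the parsed config.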