# List of Changes

**Run**: 01-code-improvements
**Mode**: guided
**Source**: `_docs/02_document/refactoring_notes.md`
**Date**: 2026-03-28

## Summary

Apply 5 improvements from the documentation review: update the YOLO model, switch to built-in augmentation, remove the processed directory, use hard links for dataset formation, and unify the configuration files.

## Changes

### C01: Update YOLO model to 26m variant

- **File(s)**: `src/constants.py`, `src/train.py`
- **Problem**: The current model config uses `yolo11m.yaml`, which trains from a YAML architecture definition rather than from pretrained weights
- **Change**: Update `TrainingConfig.model` to the YOLO 26m variant; ensure `train_dataset()` uses the updated model reference
- **Rationale**: Use the updated model version as requested; pretrained weights improve convergence
- **Risk**: medium
- **Dependencies**: None

### C02: Replace external augmentation with YOLO built-in

- **File(s)**: `src/train.py`, `src/augmentation.py`
- **Problem**: `augmentation.py` uses albumentations to augment images into a separate `processed_dir` before training, which adds complexity, disk usage, and a separate processing step
- **Change**: Remove the `augment_annotations()` call from the training pipeline; add YOLO built-in augmentation parameters (`hsv_h`, `hsv_s`, `hsv_v`, `degrees`, `translate`, `scale`, `shear`, `flipud`, `fliplr`, `mosaic`, `mixup`) to the `model.train()` call in `train_dataset()`, each on its own line with a descriptive comment; `augmentation.py` remains in the codebase but is no longer called during training
- **Rationale**: YOLO's built-in augmentation is applied on the fly during training, eliminating the pre-processing step and the processed directory
- **Risk**: medium
- **Dependencies**: C01

### C03: Remove processed directory; use data dir directly

- **File(s)**: `src/constants.py`, `src/train.py`, `src/exports.py`, `src/dataset-visualiser.py`
- **Problem**: The `processed_dir`, `processed_images_dir`, and `processed_labels_dir` properties in `Config` are no longer needed once built-in augmentation is used, yet `form_dataset()`, `form_data_sample()`, and `visualise_processed_folder()` all still read from the processed dir
- **Change**: Remove the `processed_dir`/`processed_images_dir`/`processed_labels_dir` properties from `Config`; update `form_dataset()`, `form_data_sample()`, and `visualise_processed_folder()` to read from `data_images_dir`/`data_labels_dir`
- **Rationale**: The processed directory is unnecessary without the external augmentation step
- **Risk**: medium
- **Dependencies**: C02

### C04: Use hard links instead of file copies for dataset

- **File(s)**: `src/train.py`
- **Problem**: `copy_annotations()` uses `shutil.copy()` to duplicate images and labels into the train/valid/test splits, which wastes disk space on large datasets
- **Change**: Replace `shutil.copy()` with `os.link()` to create hard links; add a fallback to `shutil.copy()` for cross-filesystem scenarios, where hard links cannot be created
- **Rationale**: Hard links share the same inode, saving disk space while maintaining independent directory entries
- **Risk**: low
- **Dependencies**: C03

### C05: Unify configuration; remove annotation-queue/config.yaml

- **File(s)**: `src/constants.py`, `src/annotation-queue/annotation_queue_handler.py`, `src/annotation-queue/config.yaml`
- **Problem**: `src/annotation-queue/config.yaml` duplicates the root `config.yaml` with different `dirs` values; `annotation_queue_handler.py` parses its config manually via `yaml.safe_load` instead of using the shared `Config` model
- **Change**: Extend `Config` in `constants.py` to include the queue and annotation-queue directory settings; refactor `annotation_queue_handler.py` to accept a `Config` instance (or import one from constants); delete `src/annotation-queue/config.yaml`
- **Rationale**: A single source of truth for configuration eliminates the risk of drift and inconsistency
- **Risk**: medium
- **Dependencies**: None
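
As an illustration of C02, the augmentation parameters could be collected as keyword arguments, one per line with a comment, before being passed to `model.train()`. This is a minimal sketch: the names `AUGMENTATION_KWARGS` and `build_train_kwargs` are hypothetical, and the numeric values are placeholders rather than the project's tuned settings.

```python
# Hypothetical container for the built-in augmentation arguments listed in C02.
# All values below are illustrative placeholders, not tuned settings.
AUGMENTATION_KWARGS = {
    "hsv_h": 0.015,    # hue jitter, as a fraction of the hue range
    "hsv_s": 0.7,      # saturation jitter, as a fraction
    "hsv_v": 0.4,      # value (brightness) jitter, as a fraction
    "degrees": 0.0,    # maximum random rotation, in degrees
    "translate": 0.1,  # maximum random translation, as a fraction of image size
    "scale": 0.5,      # random scaling gain
    "shear": 0.0,      # maximum random shear, in degrees
    "flipud": 0.0,     # probability of a vertical flip
    "fliplr": 0.5,     # probability of a horizontal flip
    "mosaic": 1.0,     # probability of mosaic augmentation
    "mixup": 0.0,      # probability of mixup augmentation
}


def build_train_kwargs(data_yaml: str, epochs: int) -> dict:
    """Merge the basic training arguments with the augmentation settings."""
    return {"data": data_yaml, "epochs": epochs, **AUGMENTATION_KWARGS}
```

Inside `train_dataset()`, the result would then be unpacked into the call, for example `model.train(**build_train_kwargs(...))`, assuming `model` is the YOLO model object the function already creates.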
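
The link-with-fallback behaviour described in C04 can be sketched as follows. The helper name `link_or_copy` is hypothetical and not taken from the codebase; the sketch assumes that any `OSError` from `os.link()` (for example, when source and destination sit on different filesystems) should trigger the copy fallback.

```python
import os
import shutil
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> str:
    """Hard-link src to dst, falling back to a copy. Returns 'link' or 'copy'."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.exists():
        dst.unlink()  # os.link() refuses to overwrite an existing entry
    try:
        os.link(src, dst)  # new directory entry pointing at the same inode
        return "link"
    except OSError:
        # Raised, for example, when src and dst are on different filesystems
        # or the filesystem does not support hard links.
        shutil.copy(src, dst)
        return "copy"
```

One design consequence worth keeping in mind: a hard-linked file shares its contents with the original, so an in-place edit through either path is visible through the other. That is harmless as long as the split directories are treated as read-only.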