# List of Changes

**Run**: 01-code-improvements
**Mode**: guided
**Source**: `_docs/02_document/refactoring_notes.md`
**Date**: 2026-03-28

## Summary

Apply 5 improvements from the documentation review: update the YOLO model, switch to YOLO's built-in augmentation, remove the processed directory, use hard links for dataset formation, and unify the configuration files.

## Changes

### C01: Update YOLO model to 26m variant
- **File(s)**: `src/constants.py`, `src/train.py`
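- **Problem**: Current model config uses `yolo11m.yaml`, which builds the network from a YAML architecture definition, so training starts from scratch rather than from pretrained weights
- **Change**: Update `TrainingConfig.model` to the YOLO26m variant, loaded from pretrained weights (`.pt`) rather than a YAML architecture definition; ensure `train_dataset()` uses the updated model reference
- **Rationale**: Use the updated model version as requested; starting from pretrained weights typically converges faster than training from scratch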
- **Risk**: medium
- **Dependencies**: None
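
A minimal sketch of the updated `TrainingConfig`, assuming it is a plain dataclass and that the YOLO26 medium weights follow the Ultralytics naming `yolo26m.pt`. The `epochs` field and the `uses_pretrained_weights` helper are hypothetical. It illustrates the distinction behind C01: a `.yaml` reference only defines the architecture, while a `.pt` reference carries pretrained weights.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TrainingConfig:
    # Assumed Ultralytics weight name for the YOLO26 medium model.
    # A ".pt" file carries pretrained weights; a ".yaml" file (such as the
    # previous "yolo11m.yaml") only describes the architecture.
    model: str = "yolo26m.pt"
    epochs: int = 100  # hypothetical field, shown only for context

    @property
    def uses_pretrained_weights(self) -> bool:
        """True when the model reference is a weights file, not a YAML definition."""
        return Path(self.model).suffix == ".pt"


if __name__ == "__main__":
    print(TrainingConfig().uses_pretrained_weights)  # True
    print(TrainingConfig(model="yolo11m.yaml").uses_pretrained_weights)  # False
```

Inside `train_dataset()`, the model would then be built with `YOLO(config.model)` (`from ultralytics import YOLO`), which requires an Ultralytics release that includes YOLO26.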

### C02: Replace external augmentation with YOLO built-in
- **File(s)**: `src/train.py`, `src/augmentation.py`
- **Problem**: `augmentation.py` uses albumentations to augment images into a separate `processed_dir` before training — adds complexity, disk usage, and a separate processing step
- **Change**: Remove the `augment_annotations()` call from the training pipeline; pass YOLO's built-in augmentation parameters (`hsv_h`, `hsv_s`, `hsv_v`, `degrees`, `translate`, `scale`, `shear`, `flipud`, `fliplr`, `mosaic`, `mixup`) to the `model.train()` call in `train_dataset()`, each on its own line with a descriptive comment; `augmentation.py` remains in the codebase but is no longer called during training
- **Rationale**: YOLO's built-in augmentation is applied on the fly as images are loaded during training, eliminating the pre-processing step and the processed directory
- **Risk**: medium
- **Dependencies**: C01
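
A sketch of the `model.train()` call described above, with each augmentation argument on its own commented line. The `train_dataset(model, data_yaml, epochs)` signature is hypothetical (the real function may read these from `Config`), and `model` is assumed to be an `ultralytics.YOLO` instance. The values are the Ultralytics defaults as I understand them; verify them against the installed version and tune per dataset.

```python
def train_dataset(model, data_yaml: str, epochs: int):
    """Train with YOLO's on-the-fly augmentation instead of a pre-augmented copy.

    `model` is assumed to be an ultralytics.YOLO instance; the signature is
    illustrative only.
    """
    return model.train(
        data=data_yaml,   # dataset YAML pointing at the train/valid/test splits
        epochs=epochs,
        hsv_h=0.015,      # hue jitter, as a fraction of the hue range
        hsv_s=0.7,        # saturation jitter, as a fraction
        hsv_v=0.4,        # brightness (value) jitter, as a fraction
        degrees=0.0,      # max random rotation, +/- degrees
        translate=0.1,    # max translation, +/- fraction of image size
        scale=0.5,        # random scale gain, +/- fraction
        shear=0.0,        # max shear, +/- degrees
        flipud=0.0,       # probability of a vertical flip
        fliplr=0.5,       # probability of a horizontal flip
        mosaic=1.0,       # probability of combining 4 images into a mosaic
        mixup=0.0,        # probability of blending two images (mixup)
    )
```

Spelling out every argument, even where it matches a default, keeps the augmentation policy documented in one place, at the cost of having to update it by hand if the defaults are ever meant to be followed.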

### C03: Remove processed directory — use data dir directly
- **File(s)**: `src/constants.py`, `src/train.py`, `src/exports.py`, `src/dataset-visualiser.py`
- **Problem**: `processed_dir`, `processed_images_dir`, `processed_labels_dir` properties in `Config` are no longer needed when built-in augmentation is used; `form_dataset()` reads from processed dir; `form_data_sample()` reads from processed dir; `visualise_processed_folder()` reads from processed dir
- **Change**: Remove `processed_dir`/`processed_images_dir`/`processed_labels_dir` properties from `Config`; update `form_dataset()` to read from `data_images_dir`/`data_labels_dir`; update `form_data_sample()` similarly; update `visualise_processed_folder()` similarly
- **Rationale**: Processed directory is unnecessary without external augmentation step
- **Risk**: medium
- **Dependencies**: C02
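
A sketch of how `form_dataset()`, `form_data_sample()` and `visualise_processed_folder()` could all source their files once the processed-directory properties are gone. It assumes `data_images_dir` and `data_labels_dir` are `images/` and `labels/` subfolders of a `data_dir` field, and that labels follow YOLO's same-stem `.txt` convention; `collect_pairs` and `IMAGE_SUFFIXES` are hypothetical helpers.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Config:
    # Hypothetical root field; the real Config in src/constants.py may differ.
    data_dir: Path

    @property
    def data_images_dir(self) -> Path:
        return self.data_dir / "images"

    @property
    def data_labels_dir(self) -> Path:
        return self.data_dir / "labels"

    # processed_dir / processed_images_dir / processed_labels_dir are removed.


IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image formats


def collect_pairs(config: Config) -> list[tuple[Path, Path]]:
    """Pair each image in data_images_dir with its label (same stem, .txt).

    Non-image files and images without a matching label are skipped.
    """
    pairs = []
    for image in sorted(config.data_images_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = config.data_labels_dir / f"{image.stem}.txt"
        if label.is_file():
            pairs.append((image, label))
    return pairs
```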

### C04: Use hard links instead of file copies for dataset splits
- **File(s)**: `src/train.py`
- **Problem**: `copy_annotations()` uses `shutil.copy()` to duplicate images and labels into train/valid/test splits — wastes disk space on large datasets
- **Change**: Replace `shutil.copy()` with `os.link()` to create hard links; fall back to `shutil.copy()` when `os.link()` raises `OSError` because source and destination are on different filesystems
- **Rationale**: Hard links share the same inode, saving disk space while maintaining independent directory entries
- **Risk**: low
- **Dependencies**: C03
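
A minimal sketch of the link-with-fallback step, assuming `copy_annotations()` copies one file at a time; `link_or_copy` is a hypothetical helper name. Rather than catching every `OSError`, it falls back only for errno values where a copy makes sense, so an existing destination (`FileExistsError`) is reported instead of silently overwritten.

```python
import errno
import os
import shutil
from pathlib import Path

# errno values for which copying is a reasonable fallback:
#   EXDEV  - source and destination are on different filesystems
#   EPERM  - the filesystem does not support hard links (e.g. FAT)
#   EMLINK - the source already has the maximum number of links
_FALLBACK_ERRNOS = {errno.EXDEV, errno.EPERM, errno.EMLINK}


def link_or_copy(src: Path, dst: Path) -> str:
    """Hard-link src to dst, falling back to shutil.copy() when linking is impossible.

    Returns "link" or "copy". Any other error, including FileExistsError
    when dst already exists, is re-raised.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.link(src, dst)
        return "link"
    except OSError as exc:
        if exc.errno not in _FALLBACK_ERRNOS:
            raise
    shutil.copy(src, dst)
    return "copy"
```

In `copy_annotations()`, each `shutil.copy(src, dst)` call would become `link_or_copy(src, dst)`. Because hard links share file contents, an in-place edit of a file under the train/valid/test splits also changes the original in the data directory; deleting a split's files, by contrast, leaves the originals intact.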

### C05: Unify configuration — remove annotation-queue/config.yaml
- **File(s)**: `src/constants.py`, `src/annotation-queue/annotation_queue_handler.py`, `src/annotation-queue/config.yaml`
- **Problem**: `src/annotation-queue/config.yaml` duplicates root `config.yaml` with different `dirs` values; `annotation_queue_handler.py` parses config manually via `yaml.safe_load` instead of using the shared `Config` model
- **Change**: Extend `Config` in `constants.py` to include queue and annotation-queue directory settings; refactor `annotation_queue_handler.py` to accept a `Config` instance (or import from constants); delete `src/annotation-queue/config.yaml`
- **Rationale**: A single source of truth for configuration removes the risk of the two files drifting out of sync
- **Risk**: medium
- **Dependencies**: None
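
A minimal sketch of the unified model with the handler receiving a `Config` instance. It assumes plain dataclasses (the real `Config` in `constants.py` may be a pydantic model). All field names under `dirs` and `queue`, the `from_dict` constructor, and the `AnnotationQueueHandler` class with its `describe()` method are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DirsConfig:
    data_dir: str = "data"                          # hypothetical default
    annotation_queue_dir: str = "annotation-queue"  # hypothetical default


@dataclass
class QueueConfig:
    host: str = "localhost"  # hypothetical queue connection settings
    name: str = "annotations"


@dataclass
class Config:
    dirs: DirsConfig = field(default_factory=DirsConfig)
    queue: QueueConfig = field(default_factory=QueueConfig)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "Config":
        """Build from the parsed root config.yaml (e.g. yaml.safe_load output).

        Unknown keys raise TypeError, which surfaces typos early.
        """
        return cls(
            dirs=DirsConfig(**raw.get("dirs", {})),
            queue=QueueConfig(**raw.get("queue", {})),
        )


class AnnotationQueueHandler:
    """Receives the shared Config instead of parsing its own config.yaml."""

    def __init__(self, config: Config) -> None:
        self.config = config

    def describe(self) -> str:
        q = self.config.queue
        return f"{q.host}/{q.name} -> {self.config.dirs.annotation_queue_dir}"
```

Passing the instance in, rather than having the handler import a module-level config, keeps the handler testable with an in-memory `Config`; the change description allows either approach.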