Update configuration and test structure for improved clarity and functionality

- Modified `.gitignore` to include test fixture data while excluding test results.
- Updated `config.yaml` to change the model from 'yolo11m.yaml' to 'yolo26m.pt'.
- Enhanced `.cursor/rules/coderule.mdc` with additional guidelines for test environment consistency and infrastructure handling.
- Revised autopilot state management in `_docs/_autopilot_state.md` to reflect current progress and tasks.
- Removed outdated augmentation tests and adjusted dataset formation tests to align with the new structure.

These changes streamline the configuration and testing processes, ensuring better organization and clarity in the project.
2026-06-22 15:11:11 +00:00 · 2026-03-28 06:11:55 +02:00
parent cdcd1f6ea7
commit a47fa135de
119 changed files with 824 additions and 774 deletions
@@ -0,0 +1,52 @@
+# List of Changes
+
+**Run**: 01-code-improvements
+**Mode**: guided
+**Source**: `_docs/02_document/refactoring_notes.md`
+**Date**: 2026-03-28
+
+## Summary
+
+Apply 5 improvements from documentation review: update YOLO model, switch to built-in augmentation, remove processed directory, use hard links for dataset formation, and unify configuration files.
+
+## Changes
+
+### C01: Update YOLO model to 26m variant
+- **File(s)**: `src/constants.py`, `src/train.py`
+- **Problem**: Current model config uses `yolo11m.yaml`, which builds the model from a YAML architecture definition rather than loading pretrained weights
+- **Change**: Update `TrainingConfig.model` to the YOLO 26m variant; ensure `train_dataset()` uses the updated model reference
+- **Rationale**: Use updated model version as requested; pretrained weights improve convergence
+- **Risk**: medium
+- **Dependencies**: None
+
+### C02: Replace external augmentation with YOLO built-in
+- **File(s)**: `src/train.py`, `src/augmentation.py`
+- **Problem**: `augmentation.py` uses albumentations to augment images into a separate `processed_dir` before training — adds complexity, disk usage, and a separate processing step
+- **Change**: Remove the `augment_annotations()` call from the training pipeline; add YOLO built-in augmentation parameters (`hsv_h`, `hsv_s`, `hsv_v`, `degrees`, `translate`, `scale`, `shear`, `flipud`, `fliplr`, `mosaic`, `mixup`) to the `model.train()` call in `train_dataset()`, each on its own line with a descriptive comment; `augmentation.py` remains in the codebase but is no longer called during training
+- **Rationale**: YOLO's built-in augmentation applies on-the-fly during training, eliminating the pre-processing step and processed directory
+- **Risk**: medium
+- **Dependencies**: C01
+
+### C03: Remove processed directory — use data dir directly
+- **File(s)**: `src/constants.py`, `src/train.py`, `src/exports.py`, `src/dataset-visualiser.py`
+- **Problem**: `processed_dir`, `processed_images_dir`, `processed_labels_dir` properties in `Config` are no longer needed when built-in augmentation is used; `form_dataset()` reads from processed dir; `form_data_sample()` reads from processed dir; `visualise_processed_folder()` reads from processed dir
+- **Change**: Remove `processed_dir`/`processed_images_dir`/`processed_labels_dir` properties from `Config`; update `form_dataset()` to read from `data_images_dir`/`data_labels_dir`; update `form_data_sample()` similarly; update `visualise_processed_folder()` similarly
+- **Rationale**: Processed directory is unnecessary without external augmentation step
+- **Risk**: medium
+- **Dependencies**: C02
+
+### C04: Use hard links instead of file copies for dataset
+- **File(s)**: `src/train.py`
+- **Problem**: `copy_annotations()` uses `shutil.copy()` to duplicate images and labels into train/valid/test splits — wastes disk space on large datasets
+- **Change**: Replace `shutil.copy()` with `os.link()` to create hard links; add fallback to `shutil.copy()` for cross-filesystem scenarios
+- **Rationale**: Hard links share the same inode, saving disk space while maintaining independent directory entries
+- **Risk**: low
+- **Dependencies**: C03
+
+### C05: Unify configuration — remove annotation-queue/config.yaml
+- **File(s)**: `src/constants.py`, `src/annotation-queue/annotation_queue_handler.py`, `src/annotation-queue/config.yaml`
+- **Problem**: `src/annotation-queue/config.yaml` duplicates root `config.yaml` with different `dirs` values; `annotation_queue_handler.py` parses config manually via `yaml.safe_load` instead of using the shared `Config` model
+- **Change**: Extend `Config` in `constants.py` to include queue and annotation-queue directory settings; refactor `annotation_queue_handler.py` to accept a `Config` instance (or import from constants); delete `src/annotation-queue/config.yaml`
+- **Rationale**: Single source of truth for configuration eliminates drift risk and inconsistency
+- **Risk**: medium
+- **Dependencies**: None
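
The link-with-fallback behaviour described in C04 could look roughly like the sketch below. It is not the code in `src/train.py`: the helper name `link_or_copy` and its return value are hypothetical, chosen only for illustration. The sketch assumes the destination may already exist from an earlier run and replaces it in that case.

```python
import os
import shutil
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> str:
    """Hard-link src to dst, falling back to a copy if linking fails.

    Hypothetical helper; returns "link" or "copy" so a caller can log
    which path was taken.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    # os.link() raises FileExistsError if dst is already present,
    # so remove a leftover file from a previous run first.
    dst.unlink(missing_ok=True)
    try:
        os.link(src, dst)
        return "link"
    except OSError:
        # Typical causes: src and dst are on different filesystems,
        # or the filesystem does not support hard links.
        shutil.copy(src, dst)
        return "copy"
```

One design point to keep in mind: a hard-linked file shares its data with the original, so anything that edits a label or image in place inside a train/valid/test split would also change the source file. A copy does not have that coupling.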