Update configuration and test structure for improved clarity and functionality

- Modified `.gitignore` to include test fixture data while excluding test results.
- Updated `config.yaml` to change the model from 'yolo11m.yaml' to 'yolo26m.pt'.
- Enhanced `.cursor/rules/coderule.mdc` with additional guidelines for test environment consistency and infrastructure handling.
- Revised autopilot state management in `_docs/_autopilot_state.md` to reflect current progress and tasks.
- Removed outdated augmentation tests and adjusted dataset formation tests to align with the new structure.

These changes streamline the configuration and testing processes, ensuring better organization and clarity in the project.
2026-06-21 23:31:12 +00:00 · 2026-03-28 06:11:55 +02:00
parent cdcd1f6ea7
commit a47fa135de
119 changed files with 824 additions and 774 deletions
@@ -0,0 +1,26 @@
+# Training Pipeline
+
+## Files
+- `src/train.py` (178 LOC)
+- `src/augmentation.py` (152 LOC)
+- `src/constants.py` (118 LOC)
+
+## Current Flow
+
+```mermaid
+graph TD
+    A[augmentation.py] -->|reads from| B[data_dir]
+    A -->|writes to| C[processed_dir]
+    D[train.py::form_dataset] -->|reads from| C
+    D -->|shutil.copy to| E[datasets_dir/today/train,valid,test]
+    F[train.py::train_dataset] -->|YOLO.train| E
+```
+
+## Issues
+- External augmentation (albumentations) runs as a separate step, writing to `processed_dir`
+- `form_dataset()` copies files from `processed_dir` to dataset splits using `shutil.copy`
+- YOLO has built-in augmentation that runs during training (mosaic, mixup, flips, etc.)
+- Using built-in augmentation eliminates the need for `processed_dir` and the full `augmentation.py` pipeline
+- `copy_annotations()` uses `shutil.copy`, duplicating every file on disk; wasteful for large datasets
+- Global mutable `total_files_copied` variable in `copy_annotations`
+- Model config `yolo11m.yaml` trains from scratch; likely should use pretrained weights or an updated variant
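A minimal sketch of moving augmentation into training, assuming `train_dataset` keeps calling Ultralytics' `YOLO.train()`: the albumentations step is replaced by augmentation hyperparameters passed directly to `train()`. The values are illustrative, not tuned. `BUILTIN_AUGMENTATION`, `build_train_args`, and `train` are hypothetical names; the `yolo26m.pt` weights come from the commit message.

```python
from typing import Any

# Hypothetical defaults; keys are Ultralytics train() augmentation settings.
# Values are illustrative and should be tuned per dataset.
BUILTIN_AUGMENTATION: dict[str, Any] = {
    "mosaic": 1.0,   # probability of combining four images into one
    "mixup": 0.1,    # probability of blending two images and their labels
    "fliplr": 0.5,   # horizontal flip probability
    "flipud": 0.0,   # vertical flip probability
    "hsv_h": 0.015,  # hue jitter
    "hsv_s": 0.7,    # saturation jitter
    "hsv_v": 0.4,    # brightness (value) jitter
    "degrees": 0.0,  # random rotation range in degrees
    "scale": 0.5,    # random scale gain
}


def build_train_args(data_yaml: str, epochs: int, imgsz: int = 640, **overrides: Any) -> dict[str, Any]:
    """Combine run settings with the augmentation defaults; overrides win."""
    args = {"data": data_yaml, "epochs": epochs, "imgsz": imgsz, **BUILTIN_AUGMENTATION}
    args.update(overrides)
    return args


def train(weights: str, data_yaml: str, epochs: int, **overrides: Any):
    # Imported lazily so build_train_args() is usable without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # a .pt file loads pretrained weights; a .yaml builds from scratch
    return model.train(**build_train_args(data_yaml, epochs, **overrides))
```

Usage would look like `train("yolo26m.pt", "path/to/data.yaml", epochs=100, mixup=0.0)` (the YAML path is a placeholder). Because augmentation is applied on the fly, each epoch sees freshly augmented variants instead of a fixed pre-generated set, and `form_dataset()` can read straight from `data_dir`.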
@@ -0,0 +1,18 @@
+# Configuration System
+
+## Files
+- `src/constants.py` (118 LOC)
+- `config.yaml` (root, 30 lines)
+- `src/annotation-queue/config.yaml` (21 lines)
+- `src/annotation-queue/annotation_queue_handler.py` (173 LOC)
+
+## Current State
+- `constants.py` defines `Config` (Pydantic model) loaded from root `config.yaml`
+- `annotation_queue_handler.py` reads its own `config.yaml` with raw `yaml.safe_load`
+- Both config files share `api`, `queue`, `dirs` sections but with different `dirs` values
+- Annotation queue config has `data: 'data-test'` vs root `data: 'data'`
+
+## Issues
+- Two config files with overlapping content — drift risk
+- `annotation_queue_handler.py` parses config manually instead of using `Config` model
+- `constants.py` still has `processed_dir` properties that become obsolete after removing external augmentation
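One way to remove the drift is to keep a single root `config.yaml` and store in `src/annotation-queue/config.yaml` only the keys that differ, layering them over the root values. A minimal sketch of that layering on plain dicts (the shape `yaml.safe_load` returns). `deep_merge` is a hypothetical helper, and the `api`/`queue` contents are placeholders, since only the `dirs.data` values appear in these notes. The merged dict could then be validated by the existing Pydantic `Config` model, assuming its fields match these sections.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` layered over `base`, recursing into nested dicts."""
    merged = deepcopy(base)  # never mutate the shared root config
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Placeholder sections (hypothetical); only dirs.data mirrors the real files.
ROOT_CONFIG = {
    "api": {"url": "http://localhost:8080"},
    "queue": {"name": "annotations"},
    "dirs": {"data": "data"},
}
# The annotation queue keeps only what differs from the root config.
QUEUE_OVERRIDE = {"dirs": {"data": "data-test"}}

queue_config = deep_merge(ROOT_CONFIG, QUEUE_OVERRIDE)
```

With this layout a change to `api` or `queue` is made once in the root file, and the queue file shrinks to its genuine differences.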
@@ -0,0 +1,10 @@
+# Data Utilities
+
+## Files
+- `src/exports.py` — `form_data_sample()` reads from `processed_images_dir`
+- `src/dataset-visualiser.py` — `visualise_processed_folder()` reads from `processed_images_dir`/`processed_labels_dir`
+
+## Impact
+- Both files reference `processed_dir` via `constants.config`
+- After removing `processed_dir`, these must switch to `data_images_dir`/`data_labels_dir`
+- `form_data_sample()` also uses `shutil.copy`; candidate for hard links
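Both `form_data_sample()` here and `copy_annotations()` in the training pipeline notes duplicate files with `shutil.copy`. A minimal sketch of the hard-link alternative, assuming source and destination usually sit on the same filesystem (otherwise it falls back to a real copy). `link_or_copy` and `link_files` are hypothetical helper names; returning a count stands in for the global `total_files_copied` counter.

```python
import os
import shutil
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> bool:
    """Hard-link src to dst, falling back to a copy if linking fails.

    Returns True if a hard link was created, False if the file was copied.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.unlink(missing_ok=True)  # os.link refuses to overwrite an existing name
    try:
        os.link(src, dst)  # both names share one inode, so no extra disk space
        return True
    except OSError:
        # e.g. different filesystems or a filesystem without hard-link support
        shutil.copy2(src, dst)
        return False


def link_files(names: list[str], src_dir: Path, dst_dir: Path) -> int:
    """Place each named file from src_dir into dst_dir and return how many were placed."""
    placed = 0
    for name in names:
        link_or_copy(src_dir / name, dst_dir / name)
        placed += 1
    return placed
```

Hard links share contents, so a step that edits a file in place inside a dataset split would also change the original in `data_dir`; writing to a new file and replacing the name avoids that.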