Mirror of https://github.com/azaion/ai-training.git
Synced 2026-04-23 01:36:34 +00:00

[AZ-165] [AZ-166] [AZ-167] [AZ-168] [AZ-169] Complete refactoring: delete dead augmentation.py, move tasks to done

- Delete src/augmentation.py (dead code with broken processed_dir refs after AZ-168)
- Remove dead Augmentator import from manual_run.py
- Move all 5 refactoring tasks from todo/ to done/
- Update autopilot state: Step 7 Refactor complete, advance to Step 8 New Task
- Strengthen tracker.mdc: NEVER use ADO MCP

Made-with: Cursor
@@ -1,54 +0,0 @@

# Unify Configuration

**Task**: AZ-165_refactor_unify_config

**Name**: Unify configuration — remove annotation-queue/config.yaml

**Description**: Consolidate two config files into one shared Config model

**Complexity**: 3 points

**Dependencies**: None

**Component**: Configuration

**Tracker**: AZ-165

**Epic**: AZ-164

## Problem

Two separate `config.yaml` files exist (root and `src/annotation-queue/`) with overlapping content but different `dirs` values. The annotation queue handler parses YAML manually instead of using the shared `Config` Pydantic model, creating drift risk.

## Outcome

- Single `Config` model in `constants.py` covers all configuration, including queue settings
- `annotation_queue_handler.py` uses the shared `Config` instead of parsing its own YAML
- `src/annotation-queue/config.yaml` is deleted

## Scope

### Included

- Add Pydantic models for `ApiConfig` and `QueueConfig`; extend `DirsConfig` with all directory fields (data, data_seed, data_processed, data_deleted, images, labels)
- Add these to the `Config` Pydantic model in `constants.py`
- Refactor the `annotation_queue_handler.py` constructor to accept/import the shared Pydantic `Config`
- Delete `src/annotation-queue/config.yaml`

### Excluded

- Changing queue connection logic or message handling
- Modifying the root `config.yaml` structure (it already contains the `queue` section)

## Acceptance Criteria

**AC-1: Single config source**

Given the root `config.yaml` contains queue and dirs settings
When `annotation_queue_handler.py` initializes
Then it reads configuration from the shared `Config` model, not a local YAML file

**AC-2: No duplicate config file**

Given the refactoring is complete
When listing `src/annotation-queue/`
Then `config.yaml` does not exist

**AC-3: Annotation queue behavior preserved**

Given the unified configuration
When the annotation queue handler processes messages
Then it uses the correct directory paths from configuration

## Constraints

- Root `config.yaml` already has the `queue` section — reuse it
- `annotation_queue_handler.py` runs as a separate process — the config import path must work from its working directory
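The unified model described in this task might look like the following sketch, assuming Pydantic. The directory fields come from the scope above; the individual fields of `ApiConfig` and `QueueConfig` (host, port, url, name) and all default values are illustrative assumptions, not taken from the codebase.

```python
from pathlib import Path

from pydantic import BaseModel


class ApiConfig(BaseModel):
    host: str = "localhost"  # illustrative field, not from the task spec
    port: int = 8000         # illustrative field, not from the task spec


class QueueConfig(BaseModel):
    url: str = "amqp://localhost"  # illustrative field, not from the task spec
    name: str = "annotations"      # illustrative field, not from the task spec


class DirsConfig(BaseModel):
    # Directory fields listed in the task scope; defaults are illustrative
    data: Path = Path("data")
    data_seed: Path = Path("data/seed")
    data_processed: Path = Path("data/processed")
    data_deleted: Path = Path("data/deleted")
    images: Path = Path("images")
    labels: Path = Path("labels")


class Config(BaseModel):
    # Single shared model replacing both config.yaml parsers
    api: ApiConfig = ApiConfig()
    queue: QueueConfig = QueueConfig()
    dirs: DirsConfig = DirsConfig()
```

With this in `constants.py`, the handler imports the shared instance instead of parsing its own YAML.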
@@ -1,56 +0,0 @@

# Update YOLO Model

**Task**: AZ-166_refactor_yolo_model

**Name**: Update YOLO model to 26m variant (supports both from-scratch and pretrained)

**Description**: Update model references from YOLO11m to YOLO26m; support both training from scratch (`.yaml`) and from pretrained weights (`.pt`)

**Complexity**: 2 points

**Dependencies**: None

**Component**: Training Pipeline

**Tracker**: AZ-166

**Epic**: AZ-164

## Problem

The current `TrainingConfig.model` is set to `yolo11m.yaml`, which defines a YOLO11 architecture. YOLO26m is the latest model variant. The system should support both training modes:

1. **From scratch** — using `yolo26m.yaml` (architecture definition, trains from random weights)
2. **From pretrained** — using `yolo26m.pt` (pretrained weights, faster convergence)

## Outcome

- `TrainingConfig` default model updated to `yolo26m.pt` (pretrained, recommended default)
- `config.yaml` updated to `yolo26m.pt`
- Both `yolo26m.pt` and `yolo26m.yaml` work when set in `config.yaml`
- `train_dataset()` and `resume_training()` work with either model reference

## Scope

### Included

- Update the `TrainingConfig.model` default from `yolo11m.yaml` to `yolo26m.pt`
- Update `training.model` in `config.yaml` from `yolo11m.yaml` to `yolo26m.pt`
- Verify `train_dataset()` works with both `.pt` and `.yaml` model values

### Excluded

- Changing training hyperparameters (epochs, batch, imgsz)
- Updating the ultralytics library version

## Acceptance Criteria

**AC-1: Default model config updated**

Given the training configuration
When reading `TrainingConfig.model`
Then the default value is `yolo26m.pt`

**AC-2: config.yaml updated**

Given the root `config.yaml`
When reading `training.model`
Then the value is `yolo26m.pt`

**AC-3: From-scratch training supported**

Given `config.yaml` sets `training.model: yolo26m.yaml`
When `YOLO(constants.config.training.model)` is called
Then a YOLO26m model is built from the architecture definition

**AC-4: Pretrained training supported**

Given `config.yaml` sets `training.model: yolo26m.pt`
When `YOLO(constants.config.training.model)` is called
Then a YOLO26m model is loaded from pretrained weights
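The two training modes hinge entirely on the file suffix handed to `YOLO(...)`: ultralytics builds the architecture from scratch for a `.yaml` reference and loads pretrained weights for a `.pt` reference. A small helper (hypothetical, not part of the codebase) makes that distinction explicit and could back a verification test for the "both values work" criterion:

```python
def training_mode(model_ref: str) -> str:
    """Classify a model reference the way ultralytics interprets it:
    a .yaml file builds the architecture from random weights,
    a .pt file loads pretrained weights."""
    if model_ref.endswith(".yaml"):
        return "from-scratch"
    if model_ref.endswith(".pt"):
        return "pretrained"
    raise ValueError(f"unrecognised model reference: {model_ref}")
```

For example, `training_mode("yolo26m.pt")` returns `"pretrained"` and `training_mode("yolo26m.yaml")` returns `"from-scratch"`.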
@@ -1,55 +0,0 @@

# Replace External Augmentation with YOLO Built-in

**Task**: AZ-167_refactor_builtin_augmentation

**Name**: Replace external augmentation with YOLO built-in

**Description**: Remove the albumentations pipeline and use the built-in augmentation parameters of YOLO `model.train()`

**Complexity**: 3 points

**Dependencies**: AZ-166_refactor_yolo_model

**Component**: Training Pipeline

**Tracker**: AZ-167

**Epic**: AZ-164

## Problem

`augmentation.py` uses the `albumentations` library to augment images into a `processed_dir` before training. This creates a separate processing step, uses extra disk space (8x original), and adds complexity. YOLO's built-in augmentation applies on-the-fly during training.

## Outcome

- `train_dataset()` passes augmentation parameters directly to `model.train()`
- Each augmentation parameter is on its own line with a descriptive comment
- The external augmentation step is removed from the training pipeline
- `augmentation.py` is no longer called during training

## Scope

### Included

- Add YOLO built-in augmentation parameters to the `model.train()` call in `train_dataset()`
- Parameters to add: hsv_h, hsv_s, hsv_v, degrees, translate, scale, shear, fliplr, mosaic (each with a comment)
- Remove the augmentation call from the training flow

### Excluded

- Deleting the `augmentation.py` file (may still be useful standalone)
- Changing training hyperparameters unrelated to augmentation

## Acceptance Criteria

**AC-1: Built-in augmentation parameters with comments**

Given the `train_dataset()` function
When `model.train()` is called
Then every parameter (augmentation: hsv_h, hsv_s, hsv_v, degrees, scale, shear, fliplr, mosaic; training: data, epochs, batch, imgsz, etc.) is on its own line with an inline comment explaining what the parameter controls

**AC-2: No external augmentation in training flow**

Given the training pipeline
When `train_dataset()` runs
Then it does not call `augment_annotations()` or any albumentations-based augmentation

## Constraints

- Every parameter row in the `model.train()` call MUST have an inline comment describing what it does (e.g. `hsv_h=0.015,  # hue shift fraction of the color wheel`)
- This applies to ALL parameters, not just augmentation — training params (data, epochs, batch, imgsz, save_period, workers) also need comments
- Augmentation parameter values should approximate the current albumentations settings:
  - fliplr=0.6 (was HorizontalFlip p=0.6)
  - degrees=35.0 (was Affine rotate=(-35,35))
  - shear=10.0 (was Affine shear=(-10,10))
  - hsv_h=0.015, hsv_s=0.7, hsv_v=0.4 (approximate HSV shifts)
  - mosaic=1.0 (YOLO built-in, recommended default)
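Under the comment constraint above, the keyword arguments for `model.train()` might be sketched as a dict. The augmentation values come from the mapping in the constraints; the training values (data, epochs, batch, imgsz) and the translate/scale values are illustrative assumptions, since the task does not specify them.

```python
# One parameter per line, each with an inline comment, per the constraint.
train_kwargs = dict(
    data="dataset.yaml",  # path to the dataset definition file (assumed value)
    epochs=100,           # number of training epochs (assumed value)
    batch=16,             # batch size (assumed value)
    imgsz=640,            # input image size in pixels (assumed value)
    hsv_h=0.015,          # hue shift, fraction of the color wheel
    hsv_s=0.7,            # saturation shift fraction
    hsv_v=0.4,            # value (brightness) shift fraction
    degrees=35.0,         # max rotation in degrees (was Affine rotate=(-35,35))
    translate=0.1,        # max translation fraction (assumed value)
    scale=0.5,            # max scaling gain (assumed value)
    shear=10.0,           # max shear in degrees (was Affine shear=(-10,10))
    fliplr=0.6,           # horizontal flip probability (was HorizontalFlip p=0.6)
    mosaic=1.0,           # mosaic augmentation probability (YOLO built-in default)
)
# train_dataset() would then call model.train(**train_kwargs),
# replacing the external albumentations step entirely.
```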
@@ -1,60 +0,0 @@

# Remove Processed Directory

**Task**: AZ-168_refactor_remove_processed_dir

**Name**: Remove processed directory — use data dir directly

**Description**: Eliminate the processed_dir concept from Config and all consumers; read from the data dir directly; update the e2e test fixture

**Complexity**: 3 points

**Dependencies**: AZ-167_refactor_builtin_augmentation

**Component**: Training Pipeline, Data Utilities

**Tracker**: AZ-168

**Epic**: AZ-164

## Problem

`Config` exposes `processed_dir`, `processed_images_dir`, `processed_labels_dir` properties. Multiple files read from the processed directory: `train.py::form_dataset()`, `exports.py::form_data_sample()`, `dataset-visualiser.py::visualise_processed_folder()`. With built-in augmentation, the processed directory is no longer populated.

The e2e test fixture (`tests/test_training_e2e.py`) currently copies images to both `data_images_dir` and `processed_images_dir` as a workaround — this needs cleanup once `form_dataset()` reads from data dirs.

## Outcome

- `Config` no longer has `processed_dir`/`processed_images_dir`/`processed_labels_dir` properties
- `form_dataset()` reads images/labels from `data_images_dir`/`data_labels_dir`
- `form_data_sample()` reads from `data_images_dir`
- `visualise_processed_folder()` reads from `data_images_dir`/`data_labels_dir`
- E2e test fixture copies images only to `data_images_dir`/`data_labels_dir` (no more processed dir population)

## Scope

### Included

- Remove `processed_dir`, `processed_images_dir`, `processed_labels_dir` from `Config`
- Update `form_dataset()` in `train.py` to use `data_images_dir` and `data_labels_dir`
- Update `copy_annotations()` in `train.py` to look up labels from `data_labels_dir` instead of `processed_labels_dir`
- Update `form_data_sample()` in `exports.py` to use `data_images_dir`
- Update `visualise_processed_folder()` in `dataset-visualiser.py`
- Update the `tests/test_training_e2e.py` e2e fixture: remove processed dir population (only copy to data dirs)

### Excluded

- Removing the `augmentation.py` file
- Changing `corrupted_dir` handling

## Acceptance Criteria

**AC-1: No processed dir in Config**

Given the `Config` class
When inspecting its properties
Then `processed_dir`, `processed_images_dir`, `processed_labels_dir` do not exist

**AC-2: Dataset formation reads data dir**

Given images and labels in `data_images_dir` / `data_labels_dir`
When `form_dataset()` runs
Then it reads from `data_images_dir` and validates labels from `data_labels_dir`

**AC-3: Data sample reads data dir**

Given images in `data_images_dir`
When `form_data_sample()` runs
Then it reads from `data_images_dir`

**AC-4: E2e test uses data dirs only**

Given the e2e test fixture
When setting up test data
Then it copies images/labels only to `data_images_dir`/`data_labels_dir` (no processed dir)
@@ -1,42 +0,0 @@

# Use Hard Links for Dataset

**Task**: AZ-169_refactor_hard_symlinks

**Name**: Use hard links instead of file copies for dataset formation

**Description**: Replace shutil.copy() with os.link() in dataset split creation to save disk space

**Complexity**: 2 points

**Dependencies**: AZ-168_refactor_remove_processed_dir

**Component**: Training Pipeline

**Tracker**: AZ-169

**Epic**: AZ-164

## Problem

`copy_annotations()` in `train.py` uses `shutil.copy()` to duplicate images and labels into train/valid/test splits. For large datasets this wastes significant disk space.

## Outcome

- Dataset formation uses `os.link()` (hard links) instead of `shutil.copy()`
- Fallback to `shutil.copy()` when hard links fail (cross-filesystem)
- No change in training behavior — YOLO reads hard-linked files identically

## Scope

### Included

- Replace `shutil.copy()` with `os.link()` in the inner `copy_image()` function of `copy_annotations()`
- Add a try/except fallback to `shutil.copy()` on `OSError` (cross-filesystem)

### Excluded

- Changing `form_data_sample()` in exports.py (separate utility, lower priority)
- Changing corrupted file handling

## Acceptance Criteria

**AC-1: Hard links used**

Given images and labels in the data directory
When `copy_annotations()` creates train/valid/test splits
Then files are hard-linked via `os.link()`, not copied

**AC-2: Fallback on failure**

Given a cross-filesystem scenario where `os.link()` raises `OSError`
When `copy_annotations()` encounters the error
Then it falls back to `shutil.copy()` transparently