Mirror of https://github.com/azaion/ai-training.git
Synced 2026-04-23 01:36:34 +00:00

[AZ-165] [AZ-166] [AZ-167] [AZ-168] [AZ-169] Complete refactoring: delete dead augmentation.py, move tasks to done

- Delete src/augmentation.py (dead code with broken processed_dir refs after AZ-168)
- Remove dead Augmentator import from manual_run.py
- Move all 5 refactoring tasks from todo/ to done/
- Update autopilot state: Step 7 Refactor complete, advance to Step 8 New Task
- Strengthen tracker.mdc: NEVER use ADO MCP

Made-with: Cursor
@@ -1,54 +0,0 @@

# Unify Configuration

**Task**: AZ-165_refactor_unify_config

**Name**: Unify configuration — remove annotation-queue/config.yaml

**Description**: Consolidate two config files into one shared Config model

**Complexity**: 3 points

**Dependencies**: None

**Component**: Configuration

**Tracker**: AZ-165

**Epic**: AZ-164

## Problem

Two separate `config.yaml` files exist (root and `src/annotation-queue/`) with overlapping content but different `dirs` values. The annotation queue handler parses YAML manually instead of using the shared `Config` Pydantic model, creating drift risk.

## Outcome

- Single `Config` model in `constants.py` covers all configuration, including queue settings
- `annotation_queue_handler.py` uses the shared `Config` instead of parsing its own YAML
- `src/annotation-queue/config.yaml` is deleted

## Scope

### Included

- Add Pydantic models for `ApiConfig` and `QueueConfig`; extend `DirsConfig` with all directory fields (data, data_seed, data_processed, data_deleted, images, labels)
- Add these to the `Config` Pydantic model in `constants.py`
- Refactor the `annotation_queue_handler.py` constructor to accept/import the shared Pydantic `Config`
- Delete `src/annotation-queue/config.yaml`

### Excluded

- Changing queue connection logic or message handling
- Modifying the root `config.yaml` structure (it already contains the `queue` section)

## Acceptance Criteria

**AC-1: Single config source**

Given the root `config.yaml` contains queue and dirs settings
When `annotation_queue_handler.py` initializes
Then it reads configuration from the shared `Config` model, not a local YAML file

**AC-2: No duplicate config file**

Given the refactoring is complete
When listing `src/annotation-queue/`
Then `config.yaml` does not exist

**AC-3: Annotation queue behavior preserved**

Given the unified configuration
When the annotation queue handler processes messages
Then it uses the correct directory paths from configuration

## Constraints

- Root `config.yaml` already has the `queue` section — reuse it
- `annotation_queue_handler.py` runs as a separate process — the config import path must work from its working directory
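The unified model described in this task might look like the following sketch, assuming Pydantic. The directory fields come from the scope above; the individual fields of `ApiConfig` and `QueueConfig` (host, port, url, name) and all default values are illustrative assumptions, not taken from the codebase.

```python
from pathlib import Path

from pydantic import BaseModel


class ApiConfig(BaseModel):
    host: str = "localhost"  # illustrative field, not from the task spec
    port: int = 8000         # illustrative field, not from the task spec


class QueueConfig(BaseModel):
    url: str = "amqp://localhost"  # illustrative field, not from the task spec
    name: str = "annotations"      # illustrative field, not from the task spec


class DirsConfig(BaseModel):
    # Directory fields listed in the task scope; defaults are illustrative
    data: Path = Path("data")
    data_seed: Path = Path("data/seed")
    data_processed: Path = Path("data/processed")
    data_deleted: Path = Path("data/deleted")
    images: Path = Path("images")
    labels: Path = Path("labels")


class Config(BaseModel):
    # Single shared model replacing both config.yaml parsers
    api: ApiConfig = ApiConfig()
    queue: QueueConfig = QueueConfig()
    dirs: DirsConfig = DirsConfig()
```

With this in `constants.py`, the handler imports the shared instance instead of parsing its own YAML.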
@@ -1,56 +0,0 @@

# Update YOLO Model

**Task**: AZ-166_refactor_yolo_model

**Name**: Update YOLO model to 26m variant (supports both from-scratch and pretrained)

**Description**: Update model references from YOLO11m to YOLO26m; support both training from scratch (`.yaml`) and from pretrained weights (`.pt`)

**Complexity**: 2 points

**Dependencies**: None

**Component**: Training Pipeline

**Tracker**: AZ-166

**Epic**: AZ-164

## Problem

The current `TrainingConfig.model` is set to `yolo11m.yaml`, which defines a YOLO11 architecture. YOLO26m is the latest model variant. The system should support both training modes:

1. **From scratch** — using `yolo26m.yaml` (architecture definition, trains from random weights)
2. **From pretrained** — using `yolo26m.pt` (pretrained weights, faster convergence)

## Outcome

- `TrainingConfig` default model updated to `yolo26m.pt` (pretrained, recommended default)
- `config.yaml` updated to `yolo26m.pt`
- Both `yolo26m.pt` and `yolo26m.yaml` work when set in `config.yaml`
- `train_dataset()` and `resume_training()` work with either model reference

## Scope

### Included

- Update the `TrainingConfig.model` default from `yolo11m.yaml` to `yolo26m.pt`
- Update `training.model` in `config.yaml` from `yolo11m.yaml` to `yolo26m.pt`
- Verify `train_dataset()` works with both `.pt` and `.yaml` model values

### Excluded

- Changing training hyperparameters (epochs, batch, imgsz)
- Updating the ultralytics library version

## Acceptance Criteria

**AC-1: Default model config updated**

Given the training configuration
When reading `TrainingConfig.model`
Then the default value is `yolo26m.pt`

**AC-2: config.yaml updated**

Given the root `config.yaml`
When reading `training.model`
Then the value is `yolo26m.pt`

**AC-3: From-scratch training supported**

Given `config.yaml` sets `training.model: yolo26m.yaml`
When `YOLO(constants.config.training.model)` is called
Then a YOLO26m model is built from the architecture definition

**AC-4: Pretrained training supported**

Given `config.yaml` sets `training.model: yolo26m.pt`
When `YOLO(constants.config.training.model)` is called
Then a YOLO26m model is loaded from pretrained weights
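The two training modes hinge entirely on the file suffix handed to `YOLO(...)`: ultralytics builds the architecture from scratch for a `.yaml` reference and loads pretrained weights for a `.pt` reference. A small helper (hypothetical, not part of the codebase) makes that distinction explicit and could back a verification test for the "both values work" criterion:

```python
def training_mode(model_ref: str) -> str:
    """Classify a model reference the way ultralytics interprets it:
    a .yaml file builds the architecture from random weights,
    a .pt file loads pretrained weights."""
    if model_ref.endswith(".yaml"):
        return "from-scratch"
    if model_ref.endswith(".pt"):
        return "pretrained"
    raise ValueError(f"unrecognised model reference: {model_ref}")
```

For example, `training_mode("yolo26m.pt")` returns `"pretrained"` and `training_mode("yolo26m.yaml")` returns `"from-scratch"`.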
@@ -1,55 +0,0 @@

# Replace External Augmentation with YOLO Built-in

**Task**: AZ-167_refactor_builtin_augmentation

**Name**: Replace external augmentation with YOLO built-in

**Description**: Remove the albumentations pipeline and use the built-in augmentation parameters of YOLO `model.train()`

**Complexity**: 3 points

**Dependencies**: AZ-166_refactor_yolo_model

**Component**: Training Pipeline

**Tracker**: AZ-167

**Epic**: AZ-164

## Problem

`augmentation.py` uses the `albumentations` library to augment images into a `processed_dir` before training. This creates a separate processing step, uses extra disk space (8x original), and adds complexity. YOLO's built-in augmentation applies on-the-fly during training.

## Outcome

- `train_dataset()` passes augmentation parameters directly to `model.train()`
- Each augmentation parameter is on its own line with a descriptive comment
- The external augmentation step is removed from the training pipeline
- `augmentation.py` is no longer called during training

## Scope

### Included

- Add YOLO built-in augmentation parameters to the `model.train()` call in `train_dataset()`
- Parameters to add: hsv_h, hsv_s, hsv_v, degrees, translate, scale, shear, fliplr, mosaic (each with a comment)
- Remove the augmentation call from the training flow

### Excluded

- Deleting the `augmentation.py` file (may still be useful standalone)
- Changing training hyperparameters unrelated to augmentation

## Acceptance Criteria

**AC-1: Built-in augmentation parameters with comments**

Given the `train_dataset()` function
When `model.train()` is called
Then every parameter (augmentation: hsv_h, hsv_s, hsv_v, degrees, scale, shear, fliplr, mosaic; training: data, epochs, batch, imgsz, etc.) is on its own line with an inline comment explaining what the parameter controls

**AC-2: No external augmentation in training flow**

Given the training pipeline
When `train_dataset()` runs
Then it does not call `augment_annotations()` or any albumentations-based augmentation

## Constraints

- Every parameter row in the `model.train()` call MUST have an inline comment describing what it does (e.g. `hsv_h=0.015,  # hue shift fraction of the color wheel`)
- This applies to ALL parameters, not just augmentation — training params (data, epochs, batch, imgsz, save_period, workers) also need comments
- Augmentation parameter values should approximate the current albumentations settings:
  - fliplr=0.6 (was HorizontalFlip p=0.6)
  - degrees=35.0 (was Affine rotate=(-35,35))
  - shear=10.0 (was Affine shear=(-10,10))
  - hsv_h=0.015, hsv_s=0.7, hsv_v=0.4 (approximate HSV shifts)
  - mosaic=1.0 (YOLO built-in, recommended default)
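Under the comment constraint above, the keyword arguments for `model.train()` might be sketched as a dict. The augmentation values come from the mapping in the constraints; the training values (data, epochs, batch, imgsz) and the translate/scale values are illustrative assumptions, since the task does not specify them.

```python
# One parameter per line, each with an inline comment, per the constraint.
train_kwargs = dict(
    data="dataset.yaml",  # path to the dataset definition file (assumed value)
    epochs=100,           # number of training epochs (assumed value)
    batch=16,             # batch size (assumed value)
    imgsz=640,            # input image size in pixels (assumed value)
    hsv_h=0.015,          # hue shift, fraction of the color wheel
    hsv_s=0.7,            # saturation shift fraction
    hsv_v=0.4,            # value (brightness) shift fraction
    degrees=35.0,         # max rotation in degrees (was Affine rotate=(-35,35))
    translate=0.1,        # max translation fraction (assumed value)
    scale=0.5,            # max scaling gain (assumed value)
    shear=10.0,           # max shear in degrees (was Affine shear=(-10,10))
    fliplr=0.6,           # horizontal flip probability (was HorizontalFlip p=0.6)
    mosaic=1.0,           # mosaic augmentation probability (YOLO built-in default)
)
# train_dataset() would then call model.train(**train_kwargs),
# replacing the external albumentations step entirely.
```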
@@ -1,60 +0,0 @@

# Remove Processed Directory

**Task**: AZ-168_refactor_remove_processed_dir

**Name**: Remove processed directory — use data dir directly

**Description**: Eliminate the processed_dir concept from Config and all consumers; read from the data dir directly; update the e2e test fixture

**Complexity**: 3 points

**Dependencies**: AZ-167_refactor_builtin_augmentation

**Component**: Training Pipeline, Data Utilities

**Tracker**: AZ-168

**Epic**: AZ-164

## Problem

`Config` exposes `processed_dir`, `processed_images_dir`, `processed_labels_dir` properties. Multiple files read from the processed directory: `train.py::form_dataset()`, `exports.py::form_data_sample()`, `dataset-visualiser.py::visualise_processed_folder()`. With built-in augmentation, the processed directory is no longer populated.

The e2e test fixture (`tests/test_training_e2e.py`) currently copies images to both `data_images_dir` and `processed_images_dir` as a workaround — this needs cleanup once `form_dataset()` reads from data dirs.

## Outcome

- `Config` no longer has `processed_dir`/`processed_images_dir`/`processed_labels_dir` properties
- `form_dataset()` reads images/labels from `data_images_dir`/`data_labels_dir`
- `form_data_sample()` reads from `data_images_dir`
- `visualise_processed_folder()` reads from `data_images_dir`/`data_labels_dir`
- E2e test fixture copies images only to `data_images_dir`/`data_labels_dir` (no more processed dir population)

## Scope

### Included

- Remove `processed_dir`, `processed_images_dir`, `processed_labels_dir` from `Config`
- Update `form_dataset()` in `train.py` to use `data_images_dir` and `data_labels_dir`
- Update `copy_annotations()` in `train.py` to look up labels from `data_labels_dir` instead of `processed_labels_dir`
- Update `form_data_sample()` in `exports.py` to use `data_images_dir`
- Update `visualise_processed_folder()` in `dataset-visualiser.py`
- Update the `tests/test_training_e2e.py` e2e fixture: remove processed dir population (only copy to data dirs)

### Excluded

- Removing the `augmentation.py` file
- Changing `corrupted_dir` handling

## Acceptance Criteria

**AC-1: No processed dir in Config**

Given the `Config` class
When inspecting its properties
Then `processed_dir`, `processed_images_dir`, `processed_labels_dir` do not exist

**AC-2: Dataset formation reads data dir**

Given images and labels in `data_images_dir` / `data_labels_dir`
When `form_dataset()` runs
Then it reads from `data_images_dir` and validates labels from `data_labels_dir`

**AC-3: Data sample reads data dir**

Given images in `data_images_dir`
When `form_data_sample()` runs
Then it reads from `data_images_dir`

**AC-4: E2e test uses data dirs only**

Given the e2e test fixture
When setting up test data
Then it copies images/labels only to `data_images_dir`/`data_labels_dir` (no processed dir)
@@ -1,42 +0,0 @@

# Use Hard Links for Dataset

**Task**: AZ-169_refactor_hard_symlinks

**Name**: Use hard links instead of file copies for dataset formation

**Description**: Replace shutil.copy() with os.link() in dataset split creation to save disk space

**Complexity**: 2 points

**Dependencies**: AZ-168_refactor_remove_processed_dir

**Component**: Training Pipeline

**Tracker**: AZ-169

**Epic**: AZ-164

## Problem

`copy_annotations()` in `train.py` uses `shutil.copy()` to duplicate images and labels into train/valid/test splits. For large datasets this wastes significant disk space.

## Outcome

- Dataset formation uses `os.link()` (hard links) instead of `shutil.copy()`
- Fallback to `shutil.copy()` when hard links fail (cross-filesystem)
- No change in training behavior — YOLO reads hard-linked files identically

## Scope

### Included

- Replace `shutil.copy()` with `os.link()` in the inner `copy_image()` function of `copy_annotations()`
- Add a try/except fallback to `shutil.copy()` on `OSError` (cross-filesystem)

### Excluded

- Changing `form_data_sample()` in exports.py (separate utility, lower priority)
- Changing corrupted file handling

## Acceptance Criteria

**AC-1: Hard links used**

Given images and labels in the data directory
When `copy_annotations()` creates train/valid/test splits
Then files are hard-linked via `os.link()`, not copied

**AC-2: Fallback on failure**

Given a cross-filesystem scenario where `os.link()` raises `OSError`
When `copy_annotations()` encounters the error
Then it falls back to `shutil.copy()` transparently