mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-23 00:26:35 +00:00
Update configuration and test structure for improved clarity and functionality

- Modified `.gitignore` to include test fixture data while excluding test results.
- Updated `config.yaml` to change the model from 'yolo11m.yaml' to 'yolo26m.pt'.
- Enhanced `.cursor/rules/coderule.mdc` with additional guidelines for test environment consistency and infrastructure handling.
- Revised autopilot state management in `_docs/_autopilot_state.md` to reflect current progress and tasks.
- Removed outdated augmentation tests and adjusted dataset formation tests to align with the new structure.

These changes streamline the configuration and testing processes, ensuring better organization and clarity in the project.
# Unify Configuration

**Task**: AZ-165_refactor_unify_config

**Name**: Unify configuration — remove annotation-queue/config.yaml

**Description**: Consolidate two config files into one shared Config model

**Complexity**: 3 points

**Dependencies**: None

**Component**: Configuration

**Tracker**: AZ-165

**Epic**: AZ-164

## Problem

Two separate `config.yaml` files exist (root and `src/annotation-queue/`) with overlapping content but different `dirs` values. The annotation queue handler parses YAML manually instead of using the shared `Config` Pydantic model, creating drift risk.

## Outcome

- Single `Config` model in `constants.py` covers all configuration including queue settings
- `annotation_queue_handler.py` uses the shared `Config` instead of parsing its own YAML
- `src/annotation-queue/config.yaml` is deleted

## Scope

### Included

- Add Pydantic models for `ApiConfig`, `QueueConfig`; extend `DirsConfig` with all directory fields (data, data_seed, data_processed, data_deleted, images, labels)
- Add these to the `Config` Pydantic model in `constants.py`
- Refactor `annotation_queue_handler.py` constructor to accept/import the shared Pydantic `Config`
- Delete `src/annotation-queue/config.yaml`

### Excluded

- Changing queue connection logic or message handling
- Modifying root `config.yaml` structure beyond adding queue section (it already has it)

## Acceptance Criteria

**AC-1: Single config source**

Given the root `config.yaml` contains queue and dirs settings

When `annotation_queue_handler.py` initializes

Then it reads configuration from the shared `Config` model, not a local YAML file

**AC-2: No duplicate config file**

Given the refactoring is complete

When listing `src/annotation-queue/`

Then `config.yaml` does not exist

**AC-3: Annotation queue behavior preserved**

Given the unified configuration

When the annotation queue handler processes messages

Then it uses the correct directory paths from configuration

## Constraints

- Root `config.yaml` already has the `queue` section — reuse it
- `annotation_queue_handler.py` runs as a separate process — config import path must work from its working directory
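The unified model described in the scope might look like the following sketch. The `DirsConfig` fields are the ones listed above; the fields inside `ApiConfig` and `QueueConfig` are placeholders, since the real fields live in the root `config.yaml`:

```python
from pathlib import Path
from pydantic import BaseModel


class ApiConfig(BaseModel):
    host: str  # placeholder field; the real fields come from the root config.yaml
    port: int  # placeholder field


class QueueConfig(BaseModel):
    url: str  # placeholder field for the queue connection settings


class DirsConfig(BaseModel):
    # all directory fields named in the scope
    data: Path
    data_seed: Path
    data_processed: Path
    data_deleted: Path
    images: Path
    labels: Path


class Config(BaseModel):
    api: ApiConfig
    queue: QueueConfig
    dirs: DirsConfig
```

`annotation_queue_handler.py` would then take this `Config` in its constructor (or import the module-level instance from `constants.py`) instead of parsing its own YAML file.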
# Update YOLO Model

**Task**: AZ-166_refactor_yolo_model

**Name**: Update YOLO model to 26m variant (supports both from-scratch and pretrained)

**Description**: Update model references from YOLO11m to YOLO26m; support both training from scratch (`.yaml`) and from pretrained weights (`.pt`)

**Complexity**: 2 points

**Dependencies**: None

**Component**: Training Pipeline

**Tracker**: AZ-166

**Epic**: AZ-164

## Problem

Current `TrainingConfig.model` is set to `yolo11m.yaml`, which defines a YOLO11 architecture. YOLO26m is the latest model variant. The system should support both training modes:

1. **From scratch** — using `yolo26m.yaml` (architecture definition, trains from random weights)
2. **From pretrained** — using `yolo26m.pt` (pretrained weights, faster convergence)
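The two modes hinge on nothing more than the file suffix — ultralytics' `YOLO()` constructor builds a fresh model from a `.yaml` architecture file and loads weights from a `.pt` checkpoint. A hypothetical helper (not part of the codebase) making the distinction explicit:

```python
from pathlib import Path


def model_mode(model_ref: str) -> str:
    """Classify a YOLO model reference the way YOLO() treats it."""
    suffix = Path(model_ref).suffix
    if suffix == ".yaml":
        return "from-scratch"  # architecture definition, random initial weights
    if suffix == ".pt":
        return "pretrained"    # serialized weights checkpoint
    raise ValueError(f"unsupported model reference: {model_ref!r}")
```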
## Outcome

- `TrainingConfig` default model updated to `yolo26m.pt` (pretrained, recommended default)
- `config.yaml` updated to `yolo26m.pt`
- Both `yolo26m.pt` and `yolo26m.yaml` work when set in `config.yaml`
- `train_dataset()` and `resume_training()` work with either model reference

## Scope

### Included

- Update `TrainingConfig.model` default from `yolo11m.yaml` to `yolo26m.pt`
- Update `config.yaml` training.model from `yolo11m.yaml` to `yolo26m.pt`
- Verify `train_dataset()` works with both `.pt` and `.yaml` model values

### Excluded

- Changing training hyperparameters (epochs, batch, imgsz)
- Updating ultralytics library version

## Acceptance Criteria

**AC-1: Default model config updated**

Given the training configuration

When reading `TrainingConfig.model`

Then the default value is `yolo26m.pt`

**AC-2: config.yaml updated**

Given the root `config.yaml`

When reading `training.model`

Then the value is `yolo26m.pt`

**AC-3: From-scratch training supported**

Given `config.yaml` sets `training.model: yolo26m.yaml`

When `YOLO(constants.config.training.model)` is called

Then a YOLO26m model is built from the architecture definition

**AC-4: Pretrained training supported**

Given `config.yaml` sets `training.model: yolo26m.pt`

When `YOLO(constants.config.training.model)` is called

Then a YOLO26m model is loaded from pretrained weights
# Replace External Augmentation with YOLO Built-in

**Task**: AZ-167_refactor_builtin_augmentation

**Name**: Replace external augmentation with YOLO built-in

**Description**: Remove albumentations pipeline and use YOLO model.train() built-in augmentation parameters

**Complexity**: 3 points

**Dependencies**: AZ-166_refactor_yolo_model

**Component**: Training Pipeline

**Tracker**: AZ-167

**Epic**: AZ-164

## Problem

`augmentation.py` uses the `albumentations` library to augment images into a `processed_dir` before training. This creates a separate processing step, uses extra disk space (8x original), and adds complexity. YOLO's built-in augmentation applies on-the-fly during training.

## Outcome

- `train_dataset()` passes augmentation parameters directly to `model.train()`
- Each augmentation parameter is on its own line with a descriptive comment
- The external augmentation step is removed from the training pipeline
- `augmentation.py` is no longer called during training

## Scope

### Included

- Add YOLO built-in augmentation parameters to `model.train()` call in `train_dataset()`
- Parameters to add: hsv_h, hsv_s, hsv_v, degrees, translate, scale, shear, fliplr, mosaic (each with comment)
- Remove call to augmentation from training flow

### Excluded

- Deleting `augmentation.py` file (may still be useful standalone)
- Changing training hyperparameters unrelated to augmentation

## Acceptance Criteria

**AC-1: Built-in augmentation parameters with comments**

Given the `train_dataset()` function

When `model.train()` is called

Then every parameter (including augmentation: hsv_h, hsv_s, hsv_v, degrees, translate, scale, shear, fliplr, mosaic, and training: data, epochs, batch, imgsz, etc.) is on its own line with an inline comment explaining what the parameter controls

**AC-2: No external augmentation in training flow**

Given the training pipeline

When `train_dataset()` runs

Then it does not call `augment_annotations()` or any albumentations-based augmentation

## Constraints

- Every parameter row in the `model.train()` call MUST have an inline comment describing what it does (e.g. `hsv_h=0.015,  # hue shift fraction of the color wheel`)
- This applies to ALL parameters, not just augmentation — training params (data, epochs, batch, imgsz, save_period, workers) also need comments
- Augmentation parameter values should approximate the current albumentations settings:
  - fliplr=0.6 (was HorizontalFlip p=0.6)
  - degrees=35.0 (was Affine rotate=(-35,35))
  - shear=10.0 (was Affine shear=(-10,10))
  - hsv_h=0.015, hsv_s=0.7, hsv_v=0.4 (approximate HSV shifts)
  - mosaic=1.0 (new YOLO built-in, recommended default)
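Putting the constraints together, the argument list might look like the sketch below. The augmentation values come from the constraint list; `data`, `epochs`, `batch`, and `imgsz` are placeholder values, not the project's actual settings, and `translate`/`scale` are omitted because the constraints give no values for them:

```python
# Hypothetical kwargs for model.train(**train_kwargs); every row commented per AC-1.
train_kwargs = dict(
    data="dataset.yaml",  # dataset definition file (placeholder path)
    epochs=100,           # total training epochs (placeholder value)
    batch=16,             # batch size (placeholder value)
    imgsz=640,            # input image size in pixels (placeholder value)
    hsv_h=0.015,          # hue shift as a fraction of the color wheel
    hsv_s=0.7,            # saturation shift fraction
    hsv_v=0.4,            # value (brightness) shift fraction
    degrees=35.0,         # max rotation in degrees (was Affine rotate=(-35, 35))
    shear=10.0,           # max shear in degrees (was Affine shear=(-10, 10))
    fliplr=0.6,           # horizontal flip probability (was HorizontalFlip p=0.6)
    mosaic=1.0,           # mosaic augmentation probability (YOLO built-in default)
)
```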
# Remove Processed Directory

**Task**: AZ-168_refactor_remove_processed_dir

**Name**: Remove processed directory — use data dir directly

**Description**: Eliminate processed_dir concept from Config and all consumers; read from data dir directly; update e2e test fixture

**Complexity**: 3 points

**Dependencies**: AZ-167_refactor_builtin_augmentation

**Component**: Training Pipeline, Data Utilities

**Tracker**: AZ-168

**Epic**: AZ-164

## Problem

`Config` exposes `processed_dir`, `processed_images_dir`, `processed_labels_dir` properties. Multiple files read from the processed directory: `train.py::form_dataset()`, `exports.py::form_data_sample()`, `dataset-visualiser.py::visualise_processed_folder()`. With built-in augmentation, the processed directory is no longer populated.

The e2e test fixture (`tests/test_training_e2e.py`) currently copies images to both `data_images_dir` and `processed_images_dir` as a workaround — this needs cleanup once `form_dataset()` reads from data dirs.

## Outcome

- `Config` no longer has `processed_dir`/`processed_images_dir`/`processed_labels_dir` properties
- `form_dataset()` reads images/labels from `data_images_dir`/`data_labels_dir`
- `form_data_sample()` reads from `data_images_dir`
- `visualise_processed_folder()` reads from `data_images_dir`/`data_labels_dir`
- E2e test fixture copies images only to `data_images_dir`/`data_labels_dir` (no more processed dir population)

## Scope

### Included

- Remove `processed_dir`, `processed_images_dir`, `processed_labels_dir` from `Config`
- Update `form_dataset()` in `train.py` to use `data_images_dir` and `data_labels_dir`
- Update `copy_annotations()` in `train.py` to look up labels from `data_labels_dir` instead of `processed_labels_dir`
- Update `form_data_sample()` in `exports.py` to use `data_images_dir`
- Update `visualise_processed_folder()` in `dataset-visualiser.py`
- Update `tests/test_training_e2e.py` e2e fixture: remove processed dir population (only copy to data dirs)

### Excluded

- Removing `augmentation.py` file
- Changing `corrupted_dir` handling

## Acceptance Criteria

**AC-1: No processed dir in Config**

Given the `Config` class

When inspecting its properties

Then `processed_dir`, `processed_images_dir`, `processed_labels_dir` do not exist

**AC-2: Dataset formation reads data dir**

Given images and labels in `data_images_dir` / `data_labels_dir`

When `form_dataset()` runs

Then it reads from `data_images_dir` and validates labels from `data_labels_dir`

**AC-3: Data sample reads data dir**

Given images in `data_images_dir`

When `form_data_sample()` runs

Then it reads from `data_images_dir`

**AC-4: E2e test uses data dirs only**

Given the e2e test fixture

When setting up test data

Then it copies images/labels only to `data_images_dir`/`data_labels_dir` (no processed dir)
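A sketch of the slimmed-down directory properties, assuming an `images`/`labels` layout under the data dir — the real property definitions and directory layout live in `constants.py`:

```python
from pathlib import Path


class Config:
    """Illustrative subset only; the real model is the Pydantic Config in constants.py."""

    def __init__(self, data_dir: Path):
        self.data_dir = data_dir

    @property
    def data_images_dir(self) -> Path:
        return self.data_dir / "images"  # assumed sub-layout

    @property
    def data_labels_dir(self) -> Path:
        return self.data_dir / "labels"  # assumed sub-layout

    # processed_dir / processed_images_dir / processed_labels_dir no longer exist
```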
# Use Hard Links for Dataset

**Task**: AZ-169_refactor_hard_symlinks

**Name**: Use hard links instead of file copies for dataset formation

**Description**: Replace shutil.copy() with os.link() in dataset split creation to save disk space

**Complexity**: 2 points

**Dependencies**: AZ-168_refactor_remove_processed_dir

**Component**: Training Pipeline

**Tracker**: AZ-169

**Epic**: AZ-164

## Problem

`copy_annotations()` in `train.py` uses `shutil.copy()` to duplicate images and labels into train/valid/test splits. For large datasets this wastes significant disk space.

## Outcome

- Dataset formation uses `os.link()` (hard links) instead of `shutil.copy()`
- Fallback to `shutil.copy()` when hard links fail (cross-filesystem)
- No change in training behavior — YOLO reads hard-linked files identically

## Scope

### Included

- Replace `shutil.copy()` with `os.link()` in `copy_annotations()` inner `copy_image()` function
- Add try/except fallback to `shutil.copy()` for `OSError` (cross-filesystem)

### Excluded

- Changing `form_data_sample()` in exports.py (separate utility, lower priority)
- Changing corrupted file handling

## Acceptance Criteria

**AC-1: Hard links used**

Given images and labels in the data directory

When `copy_annotations()` creates train/valid/test splits

Then files are hard-linked via `os.link()`, not copied

**AC-2: Fallback on failure**

Given a cross-filesystem scenario where `os.link()` raises `OSError`

When `copy_annotations()` encounters the error

Then it falls back to `shutil.copy()` transparently
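A minimal sketch of the replacement for the copy step inside `copy_annotations()` — the helper name is illustrative:

```python
import os
import shutil


def link_or_copy(src: str, dst: str) -> None:
    """Hard-link src to dst, falling back to a real copy when linking fails."""
    try:
        os.link(src, dst)      # hard link: same inode, no extra data blocks
    except OSError:
        shutil.copy(src, dst)  # cross-filesystem or unsupported FS: copy instead
```

Hard links point at the same inode, so the train/valid/test splits add no data blocks, and deleting a split later leaves the originals untouched.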