ai-training/_docs/02_document/modules/augmentation.md

# Module: augmentation

## Purpose
Image augmentation pipeline that reads raw annotated images and writes the original plus seven augmented variants of each to expand the training data. When run as a script, it repeats in an infinite loop with a 5-minute sleep between rounds.

## Public Interface

### Augmentator
| Method | Signature | Returns | Description |
|--------|-----------|---------|-------------|
| `__init__` | `()` | — | Initializes augmentation transforms and counters |
| `augment_annotations` | `(from_scratch: bool = False)` | — | Processes all unprocessed images from `data/images` → `data-processed/images` |
| `augment_annotation` | `(image_file)` | — | Processes a single image file: reads image + labels, augments, saves results |
| `augment_inner` | `(img_ann: ImageLabel) -> list[ImageLabel]` | List of augmented images | Generates 1 original + 7 augmented variants |
| `correct_bboxes` | `(labels) -> list` | Corrected labels | Clips bounding boxes to image boundaries, removes tiny boxes |
| `read_labels` | `(labels_path) -> list[list]` | Parsed YOLO labels | Reads a YOLO-format label file into a list of `[x, y, w, h, class_id]` rows |
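
The label parsing described in the table can be sketched as follows. This is a minimal illustration, not the module's actual code: it assumes each line of the label file uses the standard YOLO order `class_id x y w h` with whitespace-separated, normalized values, and that blank or malformed lines are skipped. Only the output row shape `[x, y, w, h, class_id]` comes from the table above.

```python
from pathlib import Path


def read_labels(labels_path):
    """Parse a YOLO label file into rows of [x, y, w, h, class_id]."""
    rows = []
    for line in Path(labels_path).read_text().splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # assumption: skip blank or malformed lines
        class_id = int(parts[0])  # assumption: class id comes first in the file
        x, y, w, h = (float(value) for value in parts[1:])
        rows.append([x, y, w, h, class_id])  # class id moved to the end
    return rows
```

Keeping `class_id` as the last element lets each row travel through the augmentation step as a single unit, so the class stays attached to its box.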

## Internal Logic
- **Augmentation pipeline** (albumentations Compose):
  1. HorizontalFlip (p=0.6)
  2. RandomBrightnessContrast (p=0.4)
  3. Affine: scale 0.8–1.2, rotate ±35°, shear ±10° (p=0.8)
  4. MotionBlur (p=0.1)
  5. HueSaturationValue (p=0.4)
- Each image produces **8 outputs**: 1 original copy + 7 augmented variants
- Naming: `{stem}_{1..7}.jpg` for augmented, original keeps its name
- **Bbox correction**: clips bounding boxes that extend past the image borders and removes boxes smaller than `correct_min_bbox_size` (0.01 of the image dimension)
- **Incremental processing**: skips images already present in `processed_images_dir`
- **Concurrent**: uses `ThreadPoolExecutor` for parallel processing
- **Continuous mode**: `__main__` runs augmentation in an infinite loop with 5-minute sleep between rounds
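
The bbox correction step above can be sketched roughly as follows. This is an assumption-laden illustration rather than the module's implementation: it treats labels as normalized YOLO rows `[x, y, w, h, class_id]` with `(x, y)` as the box center, clips each box to the unit square, and drops a box when either clipped side falls below the minimum size. The constant name `CORRECT_MIN_BBOX_SIZE` is hypothetical; only the 0.01 value comes from the description.

```python
CORRECT_MIN_BBOX_SIZE = 0.01  # hypothetical name; 0.01 of the image dimension


def correct_bboxes(labels, min_size=CORRECT_MIN_BBOX_SIZE):
    """Clip normalized center-format boxes to the image and drop tiny ones."""
    corrected = []
    for x, y, w, h, class_id in labels:
        # Convert center/size to edges and clamp them to the image borders.
        left = max(0.0, x - w / 2)
        right = min(1.0, x + w / 2)
        top = max(0.0, y - h / 2)
        bottom = min(1.0, y + h / 2)
        new_w = right - left
        new_h = bottom - top
        # Assumption: a box is removed if either side is below the threshold.
        # Boxes lying entirely outside the image get a negative size and are
        # dropped by the same check.
        if new_w < min_size or new_h < min_size:
            continue
        corrected.append(
            [(left + right) / 2, (top + bottom) / 2, new_w, new_h, class_id]
        )
    return corrected
```

For example, a box centered at `x = 0.875` with width `0.5` extends past the right border; after clipping it spans `0.625` to `1.0`, so its center moves to `0.8125` and its width shrinks to `0.375`.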

## Dependencies
- `constants` — directory paths (data_images_dir, data_labels_dir, processed_*)
- `dto/imageLabel` — ImageLabel container class
- `albumentations` (external) — augmentation transforms
- `cv2` (external) — image read/write
- `numpy` (external) — image array handling
- `concurrent.futures`, `os`, `shutil`, `time`, `datetime`, `pathlib` (stdlib)

## Consumers
manual_run

## Data Models
Uses `ImageLabel` from `dto/imageLabel`.

## Configuration
Augmentation parameters (probabilities, ranges) are hardcoded. Directory paths come from `constants`.

## External Integrations
Filesystem I/O: reads from `/azaion/data/`, writes to `/azaion/data-processed/`.
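
As a rough sketch of how the incremental skip and the output naming interact with these directories, the snippet below lists source images that have no same-named file in the processed directory and builds the eight output names for one image. The helper names `unprocessed_images` and `output_names` are hypothetical, and the sketch assumes all images are `.jpg` files sitting directly in the two directories.

```python
from pathlib import Path


def unprocessed_images(images_dir, processed_images_dir):
    """Return source images whose file name is not yet in the processed directory."""
    done = {p.name for p in Path(processed_images_dir).glob("*.jpg")}
    return sorted(p for p in Path(images_dir).glob("*.jpg") if p.name not in done)


def output_names(image_path, variants=7):
    """Names written for one source image: the original plus {stem}_{1..7}.jpg."""
    path = Path(image_path)
    return [path.name] + [f"{path.stem}_{i}.jpg" for i in range(1, variants + 1)]
```

Because the original keeps its name in the output directory, its presence there is enough to mark an image as already processed on the next round.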

## Security
None.

## Tests
None.