mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-22 11:26:36 +00:00
142c6c4de8
- Replaced module-level path variables in constants.py with a structured Pydantic Config class. - Updated all relevant modules (train.py, augmentation.py, exports.py, dataset-visualiser.py, manual_run.py) to access paths through the new config structure. - Fixed bugs related to image processing and model saving. - Enhanced test infrastructure to accommodate the new configuration approach. This refactor improves code maintainability and clarity by centralizing configuration management.
44 lines
2.0 KiB
Markdown
# Module: convert-annotations

## Purpose
Standalone script that converts annotation files from external formats (Pascal VOC XML, oriented bounding box text) to YOLO format.

## Public Interface

| Function | Signature | Returns | Description |
|----------|-----------|---------|-------------|
| `convert` | `(folder, dest_folder, read_annotations, ann_format)` | — | Generic converter: reads images and annotations from `folder`, writes YOLO-format labels to `dest_folder` |
| `minmax2yolo` | `(width, height, xmin, xmax, ymin, ymax) -> tuple` | (cx, cy, w, h) | Converts pixel min/max coordinates to the normalized YOLO center format |
| `read_pascal_voc` | `(width, height, s: str) -> list[str]` | YOLO label lines | Parses Pascal VOC XML, maps class names to IDs, outputs YOLO lines |
| `read_bbox_oriented` | `(width, height, s: str) -> list[str]` | YOLO label lines | Parses the 14-column oriented bbox format, outputs YOLO lines (hardcoded class 2) |
| `rename_images` | `(folder)` | — | Renames files by trimming the last 7 characters and replacing the extension with `.png` |

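To make the `minmax2yolo` row concrete, here is a minimal sketch of the min/max to center-format conversion. The argument order follows the signature in the table; the function body and the `in_bounds` helper are assumptions for illustration, not the script's actual code.

```python
def minmax2yolo(width, height, xmin, xmax, ymin, ymax):
    """Convert pixel min/max box edges to normalized YOLO (cx, cy, w, h)."""
    cx = (xmin + xmax) / 2 / width    # box center x, as a fraction of image width
    cy = (ymin + ymax) / 2 / height   # box center y, as a fraction of image height
    w = (xmax - xmin) / width         # box width, normalized
    h = (ymax - ymin) / height        # box height, normalized
    return cx, cy, w, h


def in_bounds(box):
    """Hypothetical helper: True when no normalized value exceeds 1.0."""
    return all(v <= 1.0 for v in box)


box = minmax2yolo(640, 480, 160, 480, 120, 360)
print(box)             # (0.5, 0.5, 0.5, 0.5)
print(in_bounds(box))  # True
```

A box that extends past the image edge yields a value above 1.0, which is the kind of label the script is documented to skip.
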
## Internal Logic
- **convert()**: Iterates over image files in the source folder, reads the corresponding annotation file, calls the format-specific reader, copies the image, and writes the YOLO label to the destination.
- **Pascal VOC**: Parses XML `<object>` elements, maps class names via `name_class_map` (Truck→1, Car/Taxi→2), and filters forbidden classes (Motorcycle). Default class = 1.
- **Oriented bbox**: 14-column space-separated format; extracts min/max from columns 6–13 and hardcodes the class to 2.
- **Validation**: Skips labels whose normalized coordinates exceed 1.0 (out of bounds).

## Dependencies
- `cv2` (external) — image reading for dimensions
- `xml.etree.cElementTree` (stdlib) — Pascal VOC XML parsing (this alias was removed in Python 3.9; `xml.etree.ElementTree` is its replacement)
- `os`, `shutil`, `pathlib` (stdlib)

## Consumers
None (standalone script).

## Data Models
None.

## Configuration
Hardcoded class mappings: `name_class_map = {'Truck': 1, 'Car': 2, 'Taxi': 2}`, `forbidden_classes = ['Motorcycle']`.

## External Integrations
Filesystem I/O only.

## Security
None.

## Tests
None.