mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-22 07:16:36 +00:00
[AZ-171] Enable dynamic batch size for ONNX, TensorRT, and CoreML exports
Made-with: Cursor
@@ -111,8 +111,8 @@ Raw annotations (Queue) → /azaion/data-seed/ (unvalidated)
 | Format | Use | Export Details |
 |--------|-----|---------------|
 | `.pt` | Training checkpoint | YOLOv11 PyTorch weights |
-| `.onnx` | Cross-platform inference | 1280px, batch=4, NMS baked in |
-| `.engine` | GPU inference (production) | TensorRT FP16, batch=4, per-GPU architecture |
+| `.onnx` | Cross-platform inference | 1280px, dynamic batch (1–8), NMS baked in |
+| `.engine` | GPU inference (production) | TensorRT FP16, dynamic batch max 8, per-GPU architecture |
 | `.rknn` | Edge inference | RK3588 target (OrangePi5) |
 
 ## Integration Points
@@ -71,3 +71,27 @@ AZ-164 (Epic: Code Improvements Refactoring)
 - **Batch 2**: AZ-167 (built-in aug) + AZ-168 (remove processed dir) — sequential chain
 - **Batch 3**: AZ-169 (hard symlinks) — depends on batch 2
 - **Estimated batches**: 3
+
+---
+
+## New Feature Tasks (Epic: AZ-164)
+
+**Date**: 2026-03-28
+**Total Tasks**: 1
+**Total Complexity Points**: 2
+
+| Task | Name | Complexity | Dependencies | Epic |
+|------|------|-----------|-------------|------|
+| AZ-171 | dynamic_batch_export | 2 | None | AZ-164 |
+
+### Dependency Graph
+
+```
+AZ-164 (Epic: Code Improvements)
+└── AZ-171 dynamic_batch_export (independent)
+```
+
+### Implementation Strategy
+
+- **Batch 1**: AZ-171 (dynamic batch export) — single independent task
+- **Estimated batches**: 1
@@ -0,0 +1,93 @@
# Dynamic Batch Size for Model Exports

**Task**: AZ-171_dynamic_batch_export
**Name**: Dynamic batch size for ONNX, TensorRT, and CoreML exports
**Description**: Enable exported models to accept a variable number of images per inference call instead of a fixed batch size
**Complexity**: 2 points
**Dependencies**: None
**Component**: exports
**Tracker**: AZ-171
**Epic**: AZ-164

## Problem

Exported models (ONNX, TensorRT, CoreML) are currently locked to a fixed batch size (4). Consumers that need to run inference on fewer or more images must pad or split their input to match the fixed batch, wasting GPU resources or adding unnecessary complexity.

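To make that cost concrete, this is the kind of workaround a fixed-batch consumer needs today (a minimal plain-Python sketch; the function name and list-of-images representation are illustrative, not taken from the codebase):

```python
def to_fixed_batches(images, batch_size=4):
    """Split images into fixed-size batches, padding the last chunk by
    repeating its final element. Dynamic batch export removes the need
    for this workaround entirely."""
    batches = []
    for start in range(0, len(images), batch_size):
        chunk = images[start:start + batch_size]
        pad = batch_size - len(chunk)  # slots wasted on padding
        batches.append(chunk + [chunk[-1]] * pad)
    return batches

# 6 images still cost two full batches of 4; 2 of the 8 slots are padding
print([len(b) for b in to_fixed_batches(list(range(6)))])  # [4, 4]
```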
## Outcome

- All three export formats (ONNX, TensorRT, CoreML) accept any number of input images from 1 to 8 per inference call
- No padding or input manipulation required by consumers
- Existing inference engine (OnnxEngine) continues to work without changes

## Scope

### Included

- Enable dynamic batch dimension in ONNX export
- Enable dynamic batch dimension in TensorRT export (max batch size: 8)
- Enable dynamic batch dimension in CoreML export
- Update architecture documentation to reflect dynamic batch support

### Excluded

- Dynamic image resolution (image size remains fixed at configured imgsz)
- Changes to the inference engine (OnnxEngine already supports dynamic batch)
- Changes to training pipeline
- RKNN export (edge device, different constraints)

## Acceptance Criteria

**AC-1: ONNX dynamic batch**
Given a model exported to ONNX format
When inference is run with 1 image, then with 4 images, then with 8 images
Then all three runs produce correct detection results without errors

**AC-2: TensorRT dynamic batch**
Given a model exported to TensorRT format with max batch size 8
When inference is run with 1 image, then with 4 images, then with 8 images
Then all three runs produce correct detection results without errors

**AC-3: CoreML dynamic batch**
Given a model exported to CoreML format
When inference is run with 1 image, then with 4 images
Then both runs produce correct detection results without errors

**AC-4: Backward compatibility**
Given the updated export functions
When called without explicit batch parameters
Then they use the configured defaults and produce working models

## Non-Functional Requirements

**Performance**
- Dynamic batch should not degrade inference latency by more than 5% compared to fixed batch at the same batch size

**Compatibility**
- ONNX export must remain compatible with ONNX Runtime (CPU and CUDA providers)
- TensorRT export must remain compatible with existing GPU inference pipeline

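As a sketch of how the 5% latency bound might be asserted in a benchmark script (the millisecond values below are placeholders, not measured results):

```python
def within_latency_budget(fixed_ms, dynamic_ms, max_regression=0.05):
    """True if the dynamic-batch latency stays within the allowed
    regression over the fixed-batch baseline at the same batch size."""
    return dynamic_ms <= fixed_ms * (1.0 + max_regression)

# Placeholder numbers: 20.8 ms vs a 20.0 ms baseline is a 4% regression
print(within_latency_budget(20.0, 20.8))  # True
print(within_latency_budget(20.0, 21.5))  # False: 7.5% exceeds the budget
```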
## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | export_onnx produces model with dynamic batch dim | ONNX model input shape has -1 as batch dimension |
| AC-2 | export_tensorrt with dynamic=True and batch=8 | TensorRT engine accepts batch sizes 1 through 8 |
| AC-3 | export_coreml with dynamic=True | CoreML model accepts variable batch input |
| AC-4 | export functions with default parameters | Models export successfully with dynamic enabled |

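For the AC-1 check, the usual approach is to load the exported model with the `onnx` package and inspect `model.graph.input[0].type.tensor_type.shape.dim`, where a dynamic dimension carries a symbolic `dim_param` instead of a numeric `dim_value`. The sketch below models the same rule over plain tuples so it stays self-contained (shape values are illustrative):

```python
def has_dynamic_batch(input_shape):
    """True if the batch (first) dimension is dynamic: encoded as -1
    or as a symbolic name rather than a fixed integer."""
    batch = input_shape[0]
    return batch == -1 or isinstance(batch, str)

print(has_dynamic_batch((-1, 3, 1280, 1280)))       # True: -1 marker
print(has_dynamic_batch(("batch", 3, 1280, 1280)))  # True: symbolic dim
print(has_dynamic_batch((4, 3, 1280, 1280)))        # False: fixed batch=4
```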
## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Trained .pt model, sample images | ONNX inference with batch=1,4,8 | Correct detections for all batch sizes | Performance |
| AC-2 | Trained .pt model, NVIDIA GPU | TensorRT inference with batch=1,4,8 | Correct detections for all batch sizes | Performance |
| AC-3 | Trained .pt model | CoreML inference with batch=1,4 | Correct detections for both batch sizes | Compatibility |

## Constraints

- TensorRT engines are GPU-architecture-specific; dynamic batch adds optimization profiles that increase engine build time
- CoreML export is not supported on Windows

## Risks & Mitigation

**Risk 1: TensorRT engine build time increase**
- *Risk*: Dynamic batch profiles may increase TensorRT engine build/optimization time
- *Mitigation*: Acceptable trade-off; engine build is a one-time operation during export
@@ -2,8 +2,8 @@
 
 ## Current Step
 flow: existing-code
-step: 8
-name: New Task
+step: 9
+name: Implement
 status: in_progress
-sub_step: 1 — Gather Feature Description
+sub_step: 0
 retry_count: 0
+4
-1
@@ -37,6 +37,7 @@ def export_onnx(model_path, batch_size=None):
         format="onnx",
         imgsz=constants.config.export.onnx_imgsz,
         batch=batch_size,
+        dynamic=True,
         simplify=True,
         nms=True,
     )
@@ -47,13 +48,15 @@ def export_coreml(model_path):
     model.export(
         format="coreml",
         imgsz=constants.config.export.onnx_imgsz,
+        dynamic=True,
     )
 
 
 def export_tensorrt(model_path):
     YOLO(model_path).export(
         format='engine',
-        batch=4,
+        batch=8,
+        dynamic=True,
         half=True,
         simplify=True,
         nms=True