mirror of
https://github.com/azaion/ai-training.git
synced 2026-04-22 23:46:35 +00:00
[AZ-171] Add TensorRT tests, AC coverage gate in implement skill, optimize test infrastructure
- Add TensorRT export tests with a graceful skip when no GPU is available
- Add AC test coverage verification step (Step 8) to the implement skill
- Add test coverage gap analysis to the new-task skill
- Move the exported_models fixture to conftest.py as session-scoped (shared across modules)
- Reorder tests: the e2e training run goes first so images/labels are available for all tests
- Consolidate teardown into a single session-level cleanup in conftest.py
- Fix infrastructure tests to count files dynamically instead of a hardcoded 20

Made-with: Cursor
# Dynamic Batch Size for Model Exports

**Task**: AZ-171_dynamic_batch_export

**Name**: Dynamic batch size for ONNX, TensorRT, and CoreML exports

**Description**: Enable exported models to accept a variable number of images per inference call instead of a fixed batch size

**Complexity**: 2 points

**Dependencies**: None

**Component**: exports

**Tracker**: AZ-171

**Epic**: AZ-164
## Problem

Exported models (ONNX, TensorRT, CoreML) are currently locked to a fixed batch size of 4. Consumers that need to run inference on fewer or more images must pad or split their input to match the fixed batch, wasting GPU resources or adding unnecessary complexity.
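The padding overhead described above can be sketched in a few lines. `pad_to_batch` is a hypothetical consumer-side helper, not part of this codebase; the fixed batch size of 4 matches the current exports.

```python
import numpy as np

FIXED_BATCH = 4  # fixed batch size of the current exported models

def pad_to_batch(images: np.ndarray, batch: int = FIXED_BATCH) -> np.ndarray:
    """Pad a (n, c, h, w) image stack with zero images up to the fixed batch.

    This is the workaround a consumer needs today; with dynamic batch
    the input could be passed through unchanged.
    """
    n = images.shape[0]
    if n > batch:
        raise ValueError(f"got {n} images, fixed batch is {batch}")
    if n == batch:
        return images
    pad = np.zeros((batch - n, *images.shape[1:]), dtype=images.dtype)
    return np.concatenate([images, pad], axis=0)

# A single image still costs a full batch-of-4 inference: 3 wasted slots.
one = np.zeros((1, 3, 640, 640), dtype=np.float32)
print(pad_to_batch(one).shape)  # (4, 3, 640, 640)
```

With dynamic batch enabled, the helper (and the wasted compute on the zero images) disappears entirely.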
## Outcome

- All three export formats (ONNX, TensorRT, CoreML) accept any number of input images, from 1 to 8, per inference call
- No padding or input manipulation required by consumers
- The existing inference engine (OnnxEngine) continues to work without changes
## Scope

### Included

- Enable dynamic batch dimension in ONNX export
- Enable dynamic batch dimension in TensorRT export (max batch size: 8)
- Enable dynamic batch dimension in CoreML export
- Update architecture documentation to reflect dynamic batch support

### Excluded

- Dynamic image resolution (image size remains fixed at the configured imgsz)
- Changes to the inference engine (OnnxEngine already supports dynamic batch)
- Changes to the training pipeline
- RKNN export (edge device, different constraints)
## Acceptance Criteria

**AC-1: ONNX dynamic batch**
Given a model exported to ONNX format
When inference is run with 1 image, then with 4 images, then with 8 images
Then all three runs produce correct detection results without errors

**AC-2: TensorRT dynamic batch**
Given a model exported to TensorRT format with max batch size 8
When inference is run with 1 image, then with 4 images, then with 8 images
Then all three runs produce correct detection results without errors

**AC-3: CoreML dynamic batch**
Given a model exported to CoreML format
When inference is run with 1 image, then with 4 images
Then both runs produce correct detection results without errors

**AC-4: Backward compatibility**
Given the updated export functions
When called without explicit batch parameters
Then they use the configured defaults and produce working models
## Non-Functional Requirements

**Performance**

- Dynamic batch should not degrade inference latency by more than 5% compared to fixed batch at the same batch size

**Compatibility**

- ONNX export must remain compatible with ONNX Runtime (CPU and CUDA execution providers)
- TensorRT export must remain compatible with the existing GPU inference pipeline
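The 5% latency budget above is straightforward to encode as a check. `within_latency_budget` is an illustrative name, not project API; the timings would come from benchmarking the two exports at the same batch size.

```python
def within_latency_budget(dynamic_ms: float, fixed_ms: float, tol: float = 0.05) -> bool:
    """NFR check: dynamic-batch latency may exceed fixed-batch latency by at most tol (5%)."""
    return dynamic_ms <= fixed_ms * (1.0 + tol)

# Hypothetical measurements, in milliseconds, at the same batch size:
print(within_latency_budget(10.4, 10.0))  # True  (4% slower, within budget)
print(within_latency_budget(10.6, 10.0))  # False (6% slower, budget exceeded)
```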
## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | `export_onnx` produces a model with a dynamic batch dim | ONNX model input shape has -1 as the batch dimension |
| AC-2 | `export_tensorrt` with `dynamic=True` and `batch=8` | TensorRT engine accepts batch sizes 1 through 8 |
| AC-3 | `export_coreml` with `dynamic=True` | CoreML model accepts variable batch input |
| AC-4 | Export functions with default parameters | Models export successfully with dynamic batch enabled |
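The AC-1 outcome can be checked without loading a real model. In an actual test the shape would be read from the exported ONNX graph input (ONNX represents a dynamic dimension either as -1 or as a symbolic name such as `"batch"`); `is_dynamic_batch` below is a simplified, hypothetical sketch of that assertion.

```python
def is_dynamic_batch(input_shape) -> bool:
    """True if the first (batch) dimension of an input shape is dynamic.

    A dynamic dim appears as -1 or as a symbolic string name;
    a fixed-batch export has a concrete positive integer instead.
    """
    batch_dim = input_shape[0]
    return batch_dim == -1 or isinstance(batch_dim, str)

print(is_dynamic_batch([-1, 3, 640, 640]))       # True  (dynamic export)
print(is_dynamic_batch(["batch", 3, 640, 640]))  # True  (symbolic dim name)
print(is_dynamic_batch([4, 3, 640, 640]))        # False (old fixed batch)
```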
## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | Trained `.pt` model, sample images | ONNX inference with batch=1,4,8 | Correct detections for all batch sizes | Performance |
| AC-2 | Trained `.pt` model, NVIDIA GPU | TensorRT inference with batch=1,4,8 | Correct detections for all batch sizes | Performance |
| AC-3 | Trained `.pt` model | CoreML inference with batch=1,4 | Correct detections for both batch sizes | Compatibility |
## Constraints

- TensorRT engines are GPU-architecture-specific; dynamic batch adds optimization profiles that increase engine build time
- CoreML export is not supported on Windows
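The optimization-profile constraint above means a dynamic-batch TensorRT engine only accepts batch sizes inside its registered (min, opt, max) range; batches outside it are rejected at inference time. The sketch below models that range check in plain Python; the actual check happens inside TensorRT, and the values here are illustrative, mirroring this task's 1-to-8 range.

```python
# One TensorRT optimization profile for the batch dimension, expressed as
# (min, opt, max). Illustrative values matching the max batch size of 8 in scope.
PROFILE = {"min": 1, "opt": 4, "max": 8}

def batch_in_profile(n: int, profile: dict = PROFILE) -> bool:
    """A dynamic-batch engine rejects batches outside its profile range."""
    return profile["min"] <= n <= profile["max"]

for n in (1, 4, 8, 9):
    print(n, batch_in_profile(n))  # 9 falls outside the profile and is rejected
```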
## Risks & Mitigation

**Risk 1: TensorRT engine build time increase**

- *Risk*: Dynamic batch profiles may increase TensorRT engine build/optimization time
- *Mitigation*: Acceptable trade-off; the engine build is a one-time operation during export