# Component: Inference Engines

## Overview

**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.

**Pattern**: Strategy pattern — `InferenceEngine` defines the contract; `OnnxEngine` and `TensorRTEngine` are interchangeable implementations.

**Upstream**: Domain (`constants_inf` for logging).
**Downstream**: Inference Pipeline (creates and uses engines).

## Modules

| Module | Role |
|--------|------|
| `inference_engine` | Abstract base class defining `get_input_shape`, `get_batch_size`, `run` |
| `onnx_engine` | ONNX Runtime implementation (CPU/CUDA) |
| `tensorrt_engine` | TensorRT implementation (GPU) plus ONNX→TensorRT converter |

## Internal Interfaces

### InferenceEngine (abstract)

```
cdef class InferenceEngine:
    __init__(bytes model_bytes, int batch_size=1, **kwargs)
    cdef tuple get_input_shape()   # -> (height, width)
    cdef int get_batch_size()      # -> batch_size
    cdef run(input_data)           # -> list of output tensors
```
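The same contract can be sketched in plain Python to make the strategy pattern concrete. This is an illustrative sketch, not the component's actual Cython code; `DummyEngine` is a hypothetical stand-in subclass used only to show interchangeability:

```python
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    """Pure-Python sketch of the abstract contract above."""

    def __init__(self, model_bytes: bytes, batch_size: int = 1, **kwargs):
        self.model_bytes = model_bytes
        self.batch_size = batch_size

    @abstractmethod
    def get_input_shape(self) -> tuple:
        """Return the (height, width) the model expects."""

    def get_batch_size(self) -> int:
        return self.batch_size

    @abstractmethod
    def run(self, input_data):
        """Return a list of output tensors for one batch."""


class DummyEngine(InferenceEngine):
    """Hypothetical implementation: any subclass satisfying the
    three methods can be swapped in by the Inference Pipeline."""

    def get_input_shape(self) -> tuple:
        return (1280, 1280)

    def run(self, input_data):
        return [input_data]
```

Because callers hold only an `InferenceEngine` reference, switching between ONNX Runtime and TensorRT is a construction-time decision with no changes downstream.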

### OnnxEngine

```
cdef class OnnxEngine(InferenceEngine):
    # Implements all base methods
    # Provider priority: CUDA > CPU
```
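The CUDA-over-CPU provider priority can be expressed as a small ordering helper. This is a hypothetical sketch, not code from the component; the provider names follow onnxruntime's convention (`CUDAExecutionProvider`, `CPUExecutionProvider`), and in practice the available list would come from `onnxruntime.get_available_providers()`:

```python
def select_providers(available):
    """Order execution providers by priority: CUDA first, CPU as fallback.

    `available` is a list of provider names reported by the runtime.
    """
    priority = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in priority if p in available]
```

On a CPU-only machine this yields only the CPU provider; when CUDA is present it is tried first and the session falls back to CPU automatically.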

### TensorRTEngine

```
cdef class TensorRTEngine(InferenceEngine):
    # Implements all base methods
    @staticmethod get_gpu_memory_bytes(int device_id) -> int
    @staticmethod get_engine_filename(int device_id) -> str
    @staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
```
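A sketch of how `get_engine_filename` could compose the GPU-specific name described under Implementation Details. The helper below is hypothetical; the real method reads the device's compute capability and streaming-multiprocessor count via CUDA rather than taking them as arguments:

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the GPU-specific engine filename:
    azaion.cc_{major}.{minor}_sm_{count}.engine
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

For example, a device with compute capability 8.6 and 28 SMs maps to `azaion.cc_8.6_sm_28.engine`, so each GPU architecture caches its own pre-built engine.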

## External API

None — internal component consumed by the Inference Pipeline.

## Data Access Patterns

- Model bytes are loaded in-memory (provided by the caller)
- TensorRT: CUDA device memory is allocated at init; async H2D/D2H transfers during inference
- ONNX: memory is managed internally by onnxruntime

## Implementation Details

- **OnnxEngine**: default batch_size=1; loads the model into an `onnxruntime.InferenceSession`
- **TensorRTEngine**: default batch_size=4; dynamic dimensions default to a 1280×1280 input and 300 max detections
- **Model conversion**: `convert_from_onnx` uses 90% of GPU memory as the builder workspace and enables FP16 if the hardware supports it
- **Engine filename**: GPU-specific (`azaion.cc_{major}.{minor}_sm_{count}.engine`) — allows pre-built engine caching per GPU architecture
- **Output format**: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`

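The output layout above can be consumed by a small post-processing helper. A hypothetical sketch (the field order is taken from the output-format bullet; the confidence threshold is an assumed parameter, not part of the component):

```python
def to_detections(output, conf_threshold=0.25):
    """Convert raw [batch][det][x1, y1, x2, y2, conf, cls] rows into
    per-image lists of dicts, dropping low-confidence detections."""
    results = []
    for image_rows in output:
        dets = []
        for x1, y1, x2, y2, conf, cls in image_rows:
            if conf >= conf_threshold:
                dets.append({
                    "box": (x1, y1, x2, y2),   # corner coordinates
                    "confidence": conf,
                    "class_id": int(cls),
                })
        results.append(dets)
    return results
```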
## Caveats

- TensorRT engine files are GPU-architecture-specific and not portable
- The `pycuda.autoinit` import is required for its side effect (it initializes the CUDA context)
- Dynamic shapes defaulting to 1280×1280 are hardcoded — not configurable

## Dependency Graph

```mermaid
graph TD
    onnx_engine --> inference_engine
    onnx_engine --> constants_inf
    tensorrt_engine --> inference_engine
    tensorrt_engine --> constants_inf
```

## Logging Strategy

Logs model metadata at init, and conversion progress and errors, via `constants_inf.log`/`logerror`.