Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,86 @@
# Component: Inference Engines
## Overview
**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.
**Pattern**: Strategy pattern — `InferenceEngine` defines the contract; `OnnxEngine` and `TensorRTEngine` are interchangeable implementations.
**Upstream**: Domain (constants_inf for logging).
**Downstream**: Inference Pipeline (creates and uses engines).
## Modules
| Module | Role |
|--------|------|
| `inference_engine` | Abstract base class defining `get_input_shape`, `get_batch_size`, `run` |
| `onnx_engine` | ONNX Runtime implementation (CPU/CUDA) |
| `tensorrt_engine` | TensorRT implementation (GPU) + ONNX→TensorRT converter |
## Internal Interfaces
### InferenceEngine (abstract)
```cython
cdef class InferenceEngine:
__init__(bytes model_bytes, int batch_size=1, **kwargs)
cdef tuple get_input_shape() # -> (height, width)
cdef int get_batch_size() # -> batch_size
cdef run(input_data) # -> list of output tensors
```
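The contract above can be illustrated with a pure-Python sketch of the Strategy pattern; the `DummyEngine` subclass is hypothetical and exists only to show that pipeline code depends solely on the interface:

```python
from abc import ABC, abstractmethod

class InferenceEngine(ABC):
    """Strategy-pattern contract mirrored from the Cython base class."""

    def __init__(self, model_bytes: bytes, batch_size: int = 1, **kwargs):
        self.model_bytes = model_bytes
        self.batch_size = batch_size

    @abstractmethod
    def get_input_shape(self) -> tuple:
        """Return the (height, width) the model expects."""

    def get_batch_size(self) -> int:
        return self.batch_size

    @abstractmethod
    def run(self, input_data):
        """Return a list of output tensors for input_data."""

# Hypothetical stand-in: because callers only use the contract,
# OnnxEngine and TensorRTEngine are interchangeable here.
class DummyEngine(InferenceEngine):
    def get_input_shape(self):
        return (1280, 1280)

    def run(self, input_data):
        return [input_data]
```

Any code written against `InferenceEngine` works unchanged with either concrete backend.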
### OnnxEngine
```cython
cdef class OnnxEngine(InferenceEngine):
# Implements all base methods
# Provider priority: CUDA > CPU
```
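The CUDA-over-CPU provider priority can be sketched as a small helper; the function name `select_providers` is an assumption, and in the real engine the resulting list would be passed to `onnxruntime.InferenceSession(..., providers=...)`:

```python
def select_providers(available):
    """Order execution providers by preference: CUDA first, CPU as fallback.

    `available` would normally come from onnxruntime.get_available_providers().
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]
```

On a CPU-only host this degrades gracefully to `["CPUExecutionProvider"]` rather than failing.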
### TensorRTEngine
```cython
cdef class TensorRTEngine(InferenceEngine):
# Implements all base methods
@staticmethod get_gpu_memory_bytes(int device_id) -> int
@staticmethod get_engine_filename(int device_id) -> str
@staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
```
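A minimal sketch of the GPU-specific cache filename scheme (`azaion.cc_{major}.{minor}_sm_{count}.engine`); the plain-integer parameters are an assumption for testability — the real static method would derive them from the CUDA device, e.g. via pycuda's `Device(device_id).compute_capability()`:

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the GPU-architecture-specific engine cache filename.

    cc_major/cc_minor: CUDA compute capability (e.g. 8, 6 for Ampere).
    sm_count: number of streaming multiprocessors on the device.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

Keying the filename on compute capability and SM count lets pre-built engines be cached per GPU architecture, matching the portability caveat below.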
## External API
None — internal component consumed by Inference Pipeline.
## Data Access Patterns
- Model bytes loaded in-memory (provided by caller)
- TensorRT: CUDA device memory allocated at init, async H2D/D2H transfers during inference
- ONNX: managed by onnxruntime internally
## Implementation Details
- **OnnxEngine**: default batch_size=1; loads model into `onnxruntime.InferenceSession`
- **TensorRTEngine**: default batch_size=4; dynamic dimensions default to 1280×1280 input, 300 max detections
- **Model conversion**: `convert_from_onnx` uses 90% of GPU memory as workspace, enables FP16 if hardware supports it
- **Engine filename**: GPU-specific (`azaion.cc_{major}.{minor}_sm_{count}.engine`) — allows pre-built engine caching per GPU architecture
- Output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
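The output layout above can be consumed as plain nested lists. A hypothetical post-processing sketch (the 0.5 threshold is an illustration, not a value taken from the engines):

```python
def filter_detections(outputs, conf_threshold=0.5):
    """Keep detections at or above conf_threshold, per batch item.

    outputs layout: [batch][detection_index][x1, y1, x2, y2, confidence, class_id]
    """
    return [
        [det for det in batch if det[4] >= conf_threshold]
        for batch in outputs
    ]
```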
## Caveats
- TensorRT engine files are GPU-architecture-specific and not portable
- `pycuda.autoinit` must be imported for its side effect of initializing the CUDA context
- The 1280×1280 default for dynamic input shapes is hardcoded, not configurable
## Dependency Graph
```mermaid
graph TD
onnx_engine --> inference_engine
onnx_engine --> constants_inf
tensorrt_engine --> inference_engine
tensorrt_engine --> constants_inf
```
## Logging Strategy
Logs model metadata at init and conversion progress/errors via `constants_inf.log`/`logerror`.