Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-22 16:15:49 +02:00
parent 60ebe686ff
commit 3165a88f0b
60 changed files with 6324 additions and 1550 deletions
@@ -0,0 +1,86 @@
# Component: Inference Engines
## Overview
**Purpose**: Provides pluggable inference backends (ONNX Runtime and TensorRT) behind a common abstract interface, including ONNX-to-TensorRT model conversion.
**Pattern**: Strategy pattern — `InferenceEngine` defines the contract; `OnnxEngine` and `TensorRTEngine` are interchangeable implementations.
**Upstream**: Domain (constants_inf for logging).
**Downstream**: Inference Pipeline (creates and uses engines).
## Modules
| Module | Role |
|--------|------|
| `inference_engine` | Abstract base class defining `get_input_shape`, `get_batch_size`, `run` |
| `onnx_engine` | ONNX Runtime implementation (CPU/CUDA) |
| `tensorrt_engine` | TensorRT implementation (GPU) + ONNX→TensorRT converter |
## Internal Interfaces
### InferenceEngine (abstract)
```cython
cdef class InferenceEngine:
__init__(bytes model_bytes, int batch_size=1, **kwargs)
cdef tuple get_input_shape() # -> (height, width)
cdef int get_batch_size() # -> batch_size
cdef run(input_data) # -> list of output tensors
```
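The contract above can be illustrated with a pure-Python sketch of the Strategy pattern; the `DummyEngine` subclass is hypothetical and exists only to show that pipeline code depends solely on the interface:

```python
from abc import ABC, abstractmethod

class InferenceEngine(ABC):
    """Strategy-pattern contract mirrored from the Cython base class."""

    def __init__(self, model_bytes: bytes, batch_size: int = 1, **kwargs):
        self.model_bytes = model_bytes
        self.batch_size = batch_size

    @abstractmethod
    def get_input_shape(self) -> tuple:
        """Return the (height, width) the model expects."""

    def get_batch_size(self) -> int:
        return self.batch_size

    @abstractmethod
    def run(self, input_data):
        """Return a list of output tensors for input_data."""

# Hypothetical stand-in: because callers only use the contract,
# OnnxEngine and TensorRTEngine are interchangeable here.
class DummyEngine(InferenceEngine):
    def get_input_shape(self):
        return (1280, 1280)

    def run(self, input_data):
        return [input_data]
```

Any code written against `InferenceEngine` works unchanged with either concrete backend.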
### OnnxEngine
```cython
cdef class OnnxEngine(InferenceEngine):
# Implements all base methods
# Provider priority: CUDA > CPU
```
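The CUDA-over-CPU provider priority can be sketched as a small helper; the function name `select_providers` is an assumption, and in the real engine the resulting list would be passed to `onnxruntime.InferenceSession(..., providers=...)`:

```python
def select_providers(available):
    """Order execution providers by preference: CUDA first, CPU as fallback.

    `available` would normally come from onnxruntime.get_available_providers().
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]
```

On a CPU-only host this degrades gracefully to `["CPUExecutionProvider"]` rather than failing.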
### TensorRTEngine
```cython
cdef class TensorRTEngine(InferenceEngine):
# Implements all base methods
@staticmethod get_gpu_memory_bytes(int device_id) -> int
@staticmethod get_engine_filename(int device_id) -> str
@staticmethod convert_from_onnx(bytes onnx_model) -> bytes or None
```
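A minimal sketch of the GPU-specific cache filename scheme (`azaion.cc_{major}.{minor}_sm_{count}.engine`); the plain-integer parameters are an assumption for testability — the real static method would derive them from the CUDA device, e.g. via pycuda's `Device(device_id).compute_capability()`:

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the GPU-architecture-specific engine cache filename.

    cc_major/cc_minor: CUDA compute capability (e.g. 8, 6 for Ampere).
    sm_count: number of streaming multiprocessors on the device.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"
```

Keying the filename on compute capability and SM count lets pre-built engines be cached per GPU architecture, matching the portability caveat below.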
## External API
None — internal component consumed by Inference Pipeline.
## Data Access Patterns
- Model bytes loaded in-memory (provided by caller)
- TensorRT: CUDA device memory allocated at init, async H2D/D2H transfers during inference
- ONNX: managed by onnxruntime internally
## Implementation Details
- **OnnxEngine**: default batch_size=1; loads model into `onnxruntime.InferenceSession`
- **TensorRTEngine**: default batch_size=4; dynamic dimensions default to 1280×1280 input, 300 max detections
- **Model conversion**: `convert_from_onnx` uses 90% of GPU memory as workspace, enables FP16 if hardware supports it
- **Engine filename**: GPU-specific (`azaion.cc_{major}.{minor}_sm_{count}.engine`) — allows pre-built engine caching per GPU architecture
- Output format: `[batch][detection_index][x1, y1, x2, y2, confidence, class_id]`
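The output layout above can be consumed as plain nested lists. A hypothetical post-processing sketch (the 0.5 threshold is an illustration, not a value taken from the engines):

```python
def filter_detections(outputs, conf_threshold=0.5):
    """Keep detections at or above conf_threshold, per batch item.

    outputs layout: [batch][detection_index][x1, y1, x2, y2, confidence, class_id]
    """
    return [
        [det for det in batch if det[4] >= conf_threshold]
        for batch in outputs
    ]
```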
## Caveats
- TensorRT engine files are GPU-architecture-specific and not portable
- `pycuda.autoinit` must be imported for its side effect of initializing the CUDA context
- The 1280×1280 default for dynamic input shapes is hardcoded, not configurable
## Dependency Graph
```mermaid
graph TD
onnx_engine --> inference_engine
onnx_engine --> constants_inf
tensorrt_engine --> inference_engine
tensorrt_engine --> constants_inf
```
## Logging Strategy
Logs model metadata at init and conversion progress/errors via `constants_inf.log`/`logerror`.