mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 14:56:31 +00:00
Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.
# Module: tensorrt_engine

## Purpose

TensorRT-based inference engine providing high-performance GPU inference with CUDA memory management and ONNX-to-TensorRT model conversion.
## Public Interface

### Class: TensorRTEngine (extends InferenceEngine)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(bytes model_bytes, int batch_size=4, **kwargs)` | Deserializes a TensorRT engine from bytes, allocates CUDA input/output memory, and creates an execution context and stream |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from the input tensor shape |
| `get_batch_size` | `() -> int` | Returns the batch size |
| `run` | `(input_data) -> list` | Async H2D copy → execute → D2H copy; returns output as a numpy array |
| `get_gpu_memory_bytes` | `(int device_id) -> int` | Static. Returns total GPU memory in bytes (defaults to 2 GB if unavailable) |
| `get_engine_filename` | `(int device_id) -> str` | Static. Returns an engine filename encoding compute capability and SM count: `azaion.cc_{major}.{minor}_sm_{count}.engine` |
| `convert_from_onnx` | `(bytes onnx_model) -> bytes or None` | Static. Converts an ONNX model to a TensorRT serialized engine. Uses 90% of GPU memory as workspace; enables FP16 if supported |
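To make the `get_engine_filename` naming scheme concrete, here is a minimal sketch of how such a filename could be assembled. The format string is taken from the table above; the helper name, its signature, and the device values shown are illustrative (the real module queries them from the CUDA device):

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the GPU-specific engine filename described above.

    In the real module, compute capability and SM count come from
    CUDA device queries; here they are plain parameters so the
    sketch runs anywhere.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"

# Hypothetical device: compute capability 8.6 with 68 SMs.
print(engine_filename(8, 6, 68))  # azaion.cc_8.6_sm_68.engine
```

Encoding the compute capability and SM count into the filename lets the consumer cache one serialized engine per distinct GPU model, since TensorRT engines are not portable across architectures.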
## Internal Logic

- Input shape defaults to 1280×1280 for dynamic dimensions.
- Output shape defaults to 300 max detections × 6 values (x1, y1, x2, y2, conf, cls) for dynamic dimensions.
- `run` uses async CUDA memory transfers with stream synchronization.
- `convert_from_onnx` uses explicit batch mode and configures FP16 precision when the GPU supports it.
- Default batch size is 4 (vs `OnnxEngine`'s 1).
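The dynamic-dimension defaults in the first two bullets can be sketched as a small helper. The fallback values (1280×1280 input, 300×6 output) are taken from the notes above; the helper name and signature are illustrative, not the module's actual API:

```python
def resolve_dynamic_shape(shape, defaults):
    """Replace dynamic dimensions (reported as -1 by TensorRT)
    with the module's documented fallback values."""
    return tuple(d if d > 0 else fallback for d, fallback in zip(shape, defaults))

# A dynamic NCHW input falls back to 1280x1280:
resolve_dynamic_shape((-1, 3, -1, -1), (4, 3, 1280, 1280))  # (4, 3, 1280, 1280)
# A dynamic output falls back to 300 detections x 6 values:
resolve_dynamic_shape((-1, -1), (300, 6))                   # (300, 6)
```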
## Dependencies

- **External**: `tensorrt`, `pycuda.driver`, `pycuda.autoinit`, `pynvml`, `numpy`
- **Internal**: `inference_engine` (base class), `constants_inf` (logging)
## Consumers

- `inference` — instantiated when a compatible NVIDIA GPU is found; also calls `convert_from_onnx` and `get_engine_filename`
## Data Models

None (wraps TensorRT runtime objects).
## Configuration

- Engine filename is GPU-specific (compute capability + SM count).
- Workspace memory is 90% of available GPU memory.
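A minimal sketch of the workspace-sizing rule above, assuming the 2 GB fallback from `get_gpu_memory_bytes` also feeds the workspace calculation (an inference from this document, not a confirmed detail); the helper itself is hypothetical:

```python
DEFAULT_GPU_MEMORY = 2 * 1024**3  # documented fallback: 2 GB

def workspace_limit(total_gpu_memory=None):
    """Return the TensorRT builder workspace size: 90% of the
    available GPU memory, falling back to the documented 2 GB
    default when the real total could not be queried."""
    total = total_gpu_memory or DEFAULT_GPU_MEMORY
    return int(total * 0.9)

print(workspace_limit(8 * 1024**3))  # 7730941132 (90% of 8 GB)
```

In current TensorRT Python APIs this value would be passed to the builder config as the workspace memory-pool limit; leaving 10% of GPU memory free avoids starving the inference buffers allocated alongside the builder.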
## External Integrations

None directly — model bytes are provided by the caller.
## Security

None.
## Tests

None found.