mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 14:56:31 +00:00
Add detailed file index and enhance skill documentation for autopilot, decompose, deploy, plan, and research skills. Introduce tests-only mode in decompose skill, clarify required files for deploy and plan skills, and improve prerequisite checks across skills for better user guidance and workflow efficiency.
# Module: tensorrt_engine

## Purpose

TensorRT-based inference engine providing high-performance GPU inference with CUDA memory management and ONNX-to-TensorRT model conversion.
## Public Interface

### Class: TensorRTEngine (extends InferenceEngine)

| Method | Signature | Description |
|--------|-----------|-------------|
| `__init__` | `(bytes model_bytes, int batch_size=4, **kwargs)` | Deserializes a TensorRT engine from bytes, allocates CUDA input/output memory, and creates an execution context and stream |
| `get_input_shape` | `() -> tuple` | Returns `(height, width)` from the input tensor shape |
| `get_batch_size` | `() -> int` | Returns the batch size |
| `run` | `(input_data) -> list` | Async H2D copy → execute → D2H copy; returns output as a numpy array |
| `get_gpu_memory_bytes` | `(int device_id) -> int` | Static. Returns total GPU memory in bytes (defaults to 2 GB if unavailable) |
| `get_engine_filename` | `(int device_id) -> str` | Static. Returns an engine filename encoding compute capability and SM count: `azaion.cc_{major}.{minor}_sm_{count}.engine` |
| `convert_from_onnx` | `(bytes onnx_model) -> bytes or None` | Static. Converts an ONNX model to a TensorRT serialized engine. Uses 90% of GPU memory as workspace; enables FP16 if supported |
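To make the `get_engine_filename` naming scheme concrete, here is a minimal sketch of how such a filename could be assembled. The format string is taken from the table above; the helper name, its signature, and the device values shown are illustrative (the real module queries them from the CUDA device):

```python
def engine_filename(cc_major: int, cc_minor: int, sm_count: int) -> str:
    """Build the GPU-specific engine filename described above.

    In the real module, compute capability and SM count come from
    CUDA device queries; here they are plain parameters so the
    sketch runs anywhere.
    """
    return f"azaion.cc_{cc_major}.{cc_minor}_sm_{sm_count}.engine"

# Hypothetical device: compute capability 8.6 with 68 SMs.
print(engine_filename(8, 6, 68))  # azaion.cc_8.6_sm_68.engine
```

Encoding the compute capability and SM count into the filename lets the consumer cache one serialized engine per distinct GPU model, since TensorRT engines are not portable across architectures.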
## Internal Logic

- Input shape defaults to 1280×1280 for dynamic dimensions.
- Output shape defaults to 300 max detections × 6 values (x1, y1, x2, y2, conf, cls) for dynamic dimensions.
- `run` uses async CUDA memory transfers with stream synchronization.
- `convert_from_onnx` uses explicit batch mode and configures FP16 precision when the GPU supports it.
- Default batch size is 4 (vs `OnnxEngine`'s 1).
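The dynamic-dimension defaults in the first two bullets can be sketched as a small helper. The fallback values (1280×1280 input, 300×6 output) are taken from the notes above; the helper name and signature are illustrative, not the module's actual API:

```python
def resolve_dynamic_shape(shape, defaults):
    """Replace dynamic dimensions (reported as -1 by TensorRT)
    with the module's documented fallback values."""
    return tuple(d if d > 0 else fallback for d, fallback in zip(shape, defaults))

# A dynamic NCHW input falls back to 1280x1280:
resolve_dynamic_shape((-1, 3, -1, -1), (4, 3, 1280, 1280))  # (4, 3, 1280, 1280)
# A dynamic output falls back to 300 detections x 6 values:
resolve_dynamic_shape((-1, -1), (300, 6))                   # (300, 6)
```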
## Dependencies

- **External**: `tensorrt`, `pycuda.driver`, `pycuda.autoinit`, `pynvml`, `numpy`
- **Internal**: `inference_engine` (base class), `constants_inf` (logging)
## Consumers

- `inference` — instantiated when a compatible NVIDIA GPU is found; also calls `convert_from_onnx` and `get_engine_filename`
## Data Models

None (wraps TensorRT runtime objects).
## Configuration

- Engine filename is GPU-specific (compute capability + SM count).
- Workspace memory is 90% of available GPU memory.
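A minimal sketch of the workspace-sizing rule above, assuming the 2 GB fallback from `get_gpu_memory_bytes` also feeds the workspace calculation (an inference from this document, not a confirmed detail); the helper itself is hypothetical:

```python
DEFAULT_GPU_MEMORY = 2 * 1024**3  # documented fallback: 2 GB

def workspace_limit(total_gpu_memory=None):
    """Return the TensorRT builder workspace size: 90% of the
    available GPU memory, falling back to the documented 2 GB
    default when the real total could not be queried."""
    total = total_gpu_memory or DEFAULT_GPU_MEMORY
    return int(total * 0.9)

print(workspace_limit(8 * 1024**3))  # 7730941132 (90% of 8 GB)
```

In current TensorRT Python APIs this value would be passed to the builder config as the workspace memory-pool limit; leaving 10% of GPU memory free avoids starving the inference buffers allocated alongside the builder.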
## External Integrations

None directly — model bytes are provided by the caller.
## Security

None.
## Tests

None found.