mirror of
https://github.com/azaion/gps-denied-desktop.git
synced 2026-04-23 00:46:37 +00:00
add features
@@ -0,0 +1,63 @@
# Feature: Model Lifecycle Management

## Description

Manages the complete lifecycle of ML models, including loading, caching, warmup, and unloading. Handles all four models (SuperPoint, LightGlue, DINOv2, LiteSAM) with support for multiple formats (TensorRT, ONNX, PyTorch).

## Component APIs Implemented

### `load_model(model_name: str, model_format: str) -> bool`

- Loads model in specified format
- Checks if model already loaded (cache hit)
- Initializes inference engine
- Triggers warmup
- Caches for reuse

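The steps listed for `load_model` can be sketched as follows. This is a minimal illustration under stated assumptions, not the component's actual implementation: the `ModelManager` class, the `loaders` mapping, and the `warmup` callback are hypothetical names introduced here, and the real format-specific loaders (`_load_tensorrt_engine`, `_load_onnx_model`) are replaced by injected callables so the flow runs without a GPU.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("model_manager")

# Model names come from the feature description above.
SUPPORTED_MODELS = {"SuperPoint", "LightGlue", "DINOv2", "LiteSAM"}


class ModelManager:
    """Hypothetical sketch of the load -> warmup -> cache flow."""

    def __init__(
        self,
        loaders: Dict[str, Callable[[str], Any]],
        warmup: Callable[[Any], bool],
    ) -> None:
        self._loaders = loaders            # format name -> loader(model_name)
        self._warmup = warmup              # stand-in for the warmup step
        self._cache: Dict[str, Any] = {}   # model name -> loaded engine

    def load_model(self, model_name: str, model_format: str) -> bool:
        # Cache hit: return immediately without reloading.
        if model_name in self._cache:
            return True
        if model_name not in SUPPORTED_MODELS:
            logger.error("Unknown model name: %s", model_name)
            return False
        loader = self._loaders.get(model_format)
        if loader is None:
            logger.error("Unsupported model format: %s", model_format)
            return False
        try:
            engine = loader(model_name)    # initialize the inference engine
        except Exception as exc:           # e.g. missing or corrupt model file
            logger.error("Failed to load %s (%s): %s", model_name, model_format, exc)
            return False
        self._warmup(engine)               # trigger warmup before first real use
        self._cache[model_name] = engine   # cache for reuse
        return True
```

A second call with the same `model_name` returns `True` from the cache without invoking the loader again, which is the behavior UT-16.01-03 expects.
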
### `warmup_model(model_name: str) -> bool`

- Warms up model with dummy input
- Initializes CUDA kernels
- Pre-allocates GPU memory
- Ensures first real inference is fast

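Warmup amounts to running a few throwaway inferences so that one-time setup cost is paid before real traffic arrives. The sketch below assumes a loaded engine exposes an `infer()` method; the explicit `cache` and `make_dummy_input` parameters and the default of three runs are illustrative choices, not part of the documented signature.

```python
from typing import Any, Callable, Dict


def warmup_model(
    cache: Dict[str, Any],
    model_name: str,
    make_dummy_input: Callable[[str], Any],
    runs: int = 3,
) -> bool:
    """Run a few throwaway inferences so one-time setup cost is paid up front."""
    engine = cache.get(model_name)
    if engine is None:
        return False                      # model was never loaded
    dummy = make_dummy_input(model_name)  # stand-in for _create_dummy_input
    for _ in range(runs):
        engine.infer(dummy)               # output is discarded on purpose
    return True
```

Returning `False` for a model that is not in the cache matches the expectation in UT-16.01-10.
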
## External Tools and Services

- **TensorRT**: Loading TensorRT engine files
- **ONNX Runtime**: Loading ONNX models
- **PyTorch**: Loading PyTorch model weights (optional)
- **CUDA**: GPU memory allocation

## Internal Methods

| Method | Purpose |
|--------|---------|
| `_check_model_cache(model_name)` | Check if model already loaded |
| `_load_tensorrt_engine(path)` | Load TensorRT engine from file |
| `_load_onnx_model(path)` | Load ONNX model from file |
| `_allocate_gpu_memory(model)` | Allocate GPU memory for model |
| `_create_dummy_input(model_name)` | Create appropriate dummy input for warmup |
| `_cache_model(model_name, engine)` | Store loaded model in cache |

## Unit Tests

| Test | Description | Expected Result |
|------|-------------|-----------------|
| UT-16.01-01 | Load TensorRT model | Model loaded, returns True |
| UT-16.01-02 | Load ONNX model | Model loaded, returns True |
| UT-16.01-03 | Load already cached model | Returns True immediately (no reload) |
| UT-16.01-04 | Load invalid model name | Returns False, logs error |
| UT-16.01-05 | Load invalid model path | Returns False, logs error |
| UT-16.01-06 | Warmup SuperPoint | CUDA kernels initialized |
| UT-16.01-07 | Warmup LightGlue | CUDA kernels initialized |
| UT-16.01-08 | Warmup DINOv2 | CUDA kernels initialized |
| UT-16.01-09 | Warmup LiteSAM | CUDA kernels initialized |
| UT-16.01-10 | Warmup unloaded model | Returns False |

## Integration Tests

| Test | Description | Expected Result |
|------|-------------|-----------------|
| IT-16.01-01 | Load all 4 models sequentially | All models loaded successfully |
| IT-16.01-02 | Load + warmup cycle for each model | All models ready for inference |
| IT-16.01-03 | GPU memory allocation after loading all models | ~4 GB GPU memory used |
| IT-16.01-04 | First inference after warmup | Latency within target range |

@@ -0,0 +1,68 @@
# Feature: Inference Engine Provisioning

## Description

Provides inference engines to consuming components and handles TensorRT optimization with automatic ONNX fallback. Ensures a consistent inference interface regardless of the underlying format, and relies on TensorRT acceleration to meet the <5 s processing requirement.

## Component APIs Implemented

### `get_inference_engine(model_name: str) -> InferenceEngine`

- Returns inference engine for specified model
- Engine provides unified `infer(input: np.ndarray) -> np.ndarray` interface
- Consumers: F07 Sequential VO (SuperPoint, LightGlue), F08 GPR (DINOv2), F09 Metric Refinement (LiteSAM)

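One way to give consumers the same `infer()` call regardless of backend is a thin wrapper around a backend-specific callable. The sketch below is an assumption about how `InferenceEngine` might look, not its actual definition: the `run` field and the explicit `cache` argument are hypothetical, and the array type is left as `Any` to keep the example free of third-party imports (the spec types it as `np.ndarray`). Raising `KeyError` for an unloaded model is one of the two behaviors UT-16.02-05 allows.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class InferenceEngine:
    """Hypothetical wrapper exposing one infer() call for every backend."""

    model_name: str
    model_format: str            # "tensorrt" or "onnx"
    run: Callable[[Any], Any]    # backend-specific callable

    def infer(self, input: Any) -> Any:
        # The spec types input and output as np.ndarray; Any is used here
        # only so the sketch needs no third-party packages.
        return self.run(input)


def get_inference_engine(
    cache: Dict[str, InferenceEngine], model_name: str
) -> InferenceEngine:
    """Return the cached engine, or raise if the model was never loaded."""
    try:
        return cache[model_name]
    except KeyError:
        raise KeyError(f"Model not loaded: {model_name}") from None
```

Consumers such as F07, F08, and F09 would then call `get_inference_engine(...).infer(...)` without knowing whether TensorRT or ONNX Runtime sits behind `run`.
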
### `optimize_to_tensorrt(model_name: str, onnx_path: str) -> str`

- Converts ONNX model to TensorRT engine
- Applies FP16 precision (2-3x speedup)
- Performs graph fusion and kernel optimization
- One-time conversion, result cached

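The "one-time conversion, result cached" behavior can be sketched independently of TensorRT itself. In the example below the actual ONNX-to-TensorRT build is an injected `build_engine` callable (standing in for `_build_tensorrt_engine`), and the `cache_dir` parameter and the `<model_name>_fp16.engine` file name are assumptions made for illustration; the documented signature takes only `model_name` and `onnx_path`.

```python
from pathlib import Path
from typing import Callable


def optimize_to_tensorrt(
    model_name: str,
    onnx_path: str,
    build_engine: Callable[[str], bytes],
    cache_dir: str,
) -> str:
    """Build a serialized engine once and reuse the file on later calls."""
    engine_path = Path(cache_dir) / f"{model_name}_fp16.engine"  # hypothetical naming
    if engine_path.exists():
        return str(engine_path)            # conversion already done

    serialized = build_engine(onnx_path)   # the expensive step (FP16 build)

    # Write to a temporary file first so an interrupted build never leaves
    # a truncated engine file behind.
    engine_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = engine_path.with_suffix(".tmp")
    tmp_path.write_bytes(serialized)
    tmp_path.replace(engine_path)
    return str(engine_path)
```

Because the returned path is stable for a given `model_name`, a second call skips the build entirely and returns the existing file.
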
### `fallback_to_onnx(model_name: str) -> bool`

- Detects TensorRT failure
- Loads ONNX model as fallback
- Logs warning for monitoring
- Ensures system continues functioning

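The fallback path described above is essentially a try/except around the TensorRT loader. The following sketch assumes two injected loader callables (`load_tensorrt`, `load_onnx`) and a hypothetical helper name, `load_engine_with_fallback`; it treats any exception from the TensorRT path as a reason to fall back, whereas the real `_detect_tensorrt_failure` may be more selective.

```python
import logging
from typing import Any, Callable, Tuple

logger = logging.getLogger("inference.fallback")


def load_engine_with_fallback(
    model_name: str,
    load_tensorrt: Callable[[str], Any],
    load_onnx: Callable[[str], Any],
) -> Tuple[Any, str]:
    """Try TensorRT first; on failure, log a warning and load ONNX instead."""
    try:
        return load_tensorrt(model_name), "tensorrt"
    except Exception as exc:  # assumption: every TensorRT error triggers fallback
        logger.warning(
            "TensorRT unavailable for %s (%s); falling back to ONNX",
            model_name,
            exc,
        )
        return load_onnx(model_name), "onnx"
```

Returning the format alongside the engine lets the caller record which backend is active, which is useful when comparing measured latency against the TensorRT and ONNX targets.
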
## External Tools and Services

- **TensorRT**: Model optimization and inference
- **ONNX Runtime**: Fallback inference
- **CUDA**: GPU execution

## Internal Methods

| Method | Purpose |
|--------|---------|
| `_get_cached_engine(model_name)` | Retrieve engine from cache |
| `_build_tensorrt_engine(onnx_path)` | Build TensorRT engine from ONNX |
| `_apply_fp16_optimization(builder)` | Enable FP16 precision in TensorRT |
| `_cache_tensorrt_engine(model_name, path)` | Save TensorRT engine to disk |
| `_detect_tensorrt_failure(error)` | Determine if error requires ONNX fallback |
| `_create_inference_wrapper(engine, format)` | Create unified InferenceEngine interface |

## Unit Tests

| Test | Description | Expected Result |
|------|-------------|-----------------|
| UT-16.02-01 | Get SuperPoint engine | Returns valid InferenceEngine |
| UT-16.02-02 | Get LightGlue engine | Returns valid InferenceEngine |
| UT-16.02-03 | Get DINOv2 engine | Returns valid InferenceEngine |
| UT-16.02-04 | Get LiteSAM engine | Returns valid InferenceEngine |
| UT-16.02-05 | Get unloaded model engine | Raises error or returns None |
| UT-16.02-06 | InferenceEngine.infer() with valid input | Returns features array |
| UT-16.02-07 | Optimize ONNX to TensorRT | TensorRT engine file created |
| UT-16.02-08 | TensorRT optimization with FP16 | Engine uses FP16 precision |
| UT-16.02-09 | Fallback to ONNX on TensorRT failure | ONNX model loaded, returns True |
| UT-16.02-10 | Fallback logs warning | Warning logged |

## Integration Tests

| Test | Description | Expected Result |
|------|-------------|-----------------|
| IT-16.02-01 | SuperPoint inference 100 iterations | Avg latency ~15 ms (TensorRT) or ~50 ms (ONNX) |
| IT-16.02-02 | LightGlue inference 100 iterations | Avg latency ~50 ms (TensorRT) or ~150 ms (ONNX) |
| IT-16.02-03 | DINOv2 inference 100 iterations | Avg latency ~150 ms (TensorRT) or ~500 ms (ONNX) |
| IT-16.02-04 | LiteSAM inference 100 iterations | Avg latency ~60 ms (TensorRT) or ~200 ms (ONNX) |
| IT-16.02-05 | Simulate TensorRT failure → ONNX fallback | System continues with ONNX |
| IT-16.02-06 | Full pipeline: optimize → load → infer | End-to-end works |

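The 100-iteration latency tests above could be driven by a small timing helper along these lines. This is a sketch only: `average_latency_ms` and its `warmup_runs` parameter are hypothetical, and it assumes `infer` blocks until the output is ready (with asynchronous GPU execution, the caller would need to synchronize before the timer stops).

```python
import time
from typing import Any, Callable


def average_latency_ms(
    infer: Callable[[Any], Any],
    sample: Any,
    iterations: int = 100,
    warmup_runs: int = 5,
) -> float:
    """Return the mean wall-clock latency of infer(sample) in milliseconds."""
    for _ in range(warmup_runs):   # untimed runs, so setup cost is excluded
        infer(sample)
    start = time.perf_counter()
    for _ in range(iterations):
        infer(sample)
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / iterations
```

The result can then be compared against the per-model targets in the table, using the TensorRT or ONNX figure depending on which backend is active.
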