mirror of
https://github.com/azaion/gps-denied-desktop.git
synced 2026-04-22 22:46:36 +00:00
add chunking
@@ -0,0 +1,224 @@
# Model Manager

## Interface Definition

**Interface Name**: `IModelManager`

### Interface Methods

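```python
from abc import ABC, abstractmethod


class IModelManager(ABC):
    """Loads models and hands out inference engines (TensorRT first, ONNX fallback)."""

    @abstractmethod
    def load_model(self, model_name: str, model_format: str) -> bool:
        """Load a model in the given format; return True if it is loaded."""

    # "InferenceEngine" is quoted because it is defined under Data Models.
    @abstractmethod
    def get_inference_engine(self, model_name: str) -> "InferenceEngine":
        """Return the inference engine for a model."""

    @abstractmethod
    def optimize_to_tensorrt(self, model_name: str, onnx_path: str) -> str:
        """Convert an ONNX model to TensorRT; return the engine path."""

    @abstractmethod
    def fallback_to_onnx(self, model_name: str) -> bool:
        """Switch the model to ONNX after a TensorRT failure."""

    @abstractmethod
    def warmup_model(self, model_name: str) -> bool:
        """Run dummy input through the model; return True on success."""
```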

## Component Description

### Responsibilities
- Load ML models (TensorRT primary, ONNX fallback)
- Manage model lifecycle (loading, unloading, warmup)
- Provide inference engines for:
  - SuperPoint (feature extraction)
  - LightGlue (feature matching)
  - DINOv2 (global descriptors)
  - LiteSAM (cross-view matching)
- Handle TensorRT optimization and ONNX fallback
- Help meet the <5 s processing requirement through GPU acceleration

### Scope
- Model loading and caching
- TensorRT optimization
- ONNX fallback handling
- Inference engine abstraction
- GPU memory management

## API Methods

### `load_model(model_name: str, model_format: str) -> bool`

**Description**: Loads the named model in the specified format.

**Called By**: F02 Flight Manager (during initialization)

**Input**:
```python
model_name: str  # "SuperPoint", "LightGlue", "DINOv2", "LiteSAM"
model_format: str  # "tensorrt", "onnx", "pytorch"
```

**Output**: `bool` - True if the model is loaded

**Processing Flow**:
1. Check whether the model is already loaded
2. Load the model file
3. Initialize the inference engine
4. Warm up the model
5. Cache the engine for reuse

**Test Cases**:
1. Load TensorRT model → succeeds
2. TensorRT unavailable → falls back to ONNX
3. Load all 4 models → all succeed
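
The processing flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `SketchModelManager` and `EchoEngine` are hypothetical names, and reading a real TensorRT or ONNX file is replaced by a stub factory.

```python
from typing import Callable, Dict


class EchoEngine:
    """Hypothetical stand-in for a real inference engine."""

    def __init__(self, model_name: str, model_format: str) -> None:
        self.model_name = model_name
        self.format = model_format
        self.calls = 0

    def infer(self, data):
        self.calls += 1
        return data


class SketchModelManager:
    """Illustrates the five load_model steps with an in-memory cache."""

    def __init__(self,
                 engine_factory: Callable[[str, str], EchoEngine] = EchoEngine,
                 warmup_iterations: int = 3) -> None:
        self._factory = engine_factory
        self._warmup_iterations = warmup_iterations
        self._engines: Dict[str, EchoEngine] = {}

    def load_model(self, model_name: str, model_format: str) -> bool:
        if model_name in self._engines:  # 1. already loaded
            return True
        try:
            # 2-3. load the model file and initialize the engine (stubbed)
            engine = self._factory(model_name, model_format)
        except Exception:
            return False
        for _ in range(self._warmup_iterations):  # 4. warm up
            engine.infer([0.0])
        self._engines[model_name] = engine  # 5. cache for reuse
        return True


manager = SketchModelManager()
loaded = [manager.load_model(name, "tensorrt")
          for name in ("SuperPoint", "LightGlue", "DINOv2", "LiteSAM")]
```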

---

### `get_inference_engine(model_name: str) -> InferenceEngine`

**Description**: Returns the inference engine for the given model.

**Called By**:
- F07 Sequential VO (SuperPoint, LightGlue)
- F08 Global Place Recognition (DINOv2)
- F09 Metric Refinement (LiteSAM)

**Output**:
```python
# Shape of the returned object; the full definition is under Data Models.
class InferenceEngine:
    model_name: str
    format: str

    def infer(self, input: np.ndarray) -> np.ndarray: ...
```

**Test Cases**:
1. Get SuperPoint engine → returns engine
2. Call infer() → returns features
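
A caller-side sketch of the two test cases above. `EngineRegistry` and `StubEngine` are hypothetical, the doubling in `infer` is a placeholder computation, and raising `KeyError` for a model that was never loaded is an assumption: the spec does not define that case.

```python
from typing import Dict, List


class StubEngine:
    """Hypothetical engine; a real one would wrap TensorRT or ONNX Runtime."""

    def __init__(self, model_name: str, model_format: str) -> None:
        self.model_name = model_name
        self.format = model_format

    def infer(self, data: List[float]) -> List[float]:
        return [2.0 * x for x in data]  # placeholder computation


class EngineRegistry:
    """Keeps loaded engines keyed by model name."""

    def __init__(self) -> None:
        self._engines: Dict[str, StubEngine] = {}

    def register(self, engine: StubEngine) -> None:
        self._engines[engine.model_name] = engine

    def get_inference_engine(self, model_name: str) -> StubEngine:
        try:
            return self._engines[model_name]
        except KeyError:
            # Assumption: behaviour for an unloaded model is not specified.
            raise KeyError(f"model {model_name!r} is not loaded") from None


registry = EngineRegistry()
registry.register(StubEngine("SuperPoint", "tensorrt"))
engine = registry.get_inference_engine("SuperPoint")
features = engine.infer([1.0, 2.0])
```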

---

### `optimize_to_tensorrt(model_name: str, onnx_path: str) -> str`

**Description**: Converts an ONNX model to a TensorRT engine for acceleration.

**Called By**: System initialization (one-time)

**Input**:
```python
model_name: str
onnx_path: str  # Path to ONNX model
```

**Output**: `str` - Path to TensorRT engine

**Processing Details**:
- FP16 precision (2-3x speedup)
- Graph fusion and kernel optimization
- One-time conversion, cached for reuse

**Test Cases**:
1. Convert ONNX to TensorRT → engine created
2. Load TensorRT engine → inference faster than ONNX
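
The "one-time conversion, cached for reuse" behaviour could look roughly like the sketch below. The build step is injected as a hypothetical `build_engine` callable so the caching logic runs without a GPU; in practice it might wrap TensorRT's `trtexec` tool (for example `trtexec --onnx=model.onnx --saveEngine=model.engine --fp16`). The `<model_name>.fp16.engine` file name is an assumed convention. TensorRT engines are generally tied to the GPU and TensorRT version that built them, so a real cache key would likely need to include both.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def optimize_to_tensorrt(model_name: str, onnx_path: str,
                         build_engine: Callable[[Path, Path], None]) -> str:
    """Return the engine path, building it only if no cached file exists."""
    onnx_file = Path(onnx_path)
    if not onnx_file.is_file():
        raise FileNotFoundError(onnx_path)
    # Assumed naming convention: <model_name>.fp16.engine next to the ONNX file.
    engine_file = onnx_file.with_name(f"{model_name}.fp16.engine")
    if not engine_file.exists():
        build_engine(onnx_file, engine_file)
    return str(engine_file)


build_calls: List[str] = []


def fake_build(onnx_file: Path, engine_file: Path) -> None:
    """Stand-in for the real TensorRT build step."""
    build_calls.append(onnx_file.name)
    engine_file.write_bytes(b"engine")


with tempfile.TemporaryDirectory() as tmp:
    onnx = Path(tmp) / "superpoint.onnx"
    onnx.write_bytes(b"onnx")
    first = optimize_to_tensorrt("SuperPoint", str(onnx), fake_build)
    second = optimize_to_tensorrt("SuperPoint", str(onnx), fake_build)
```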

---

### `fallback_to_onnx(model_name: str) -> bool`

**Description**: Falls back to the ONNX model if TensorRT fails.

**Called By**: Internal (during load_model)

**Processing Flow**:
1. Detect TensorRT failure
2. Load ONNX model
3. Log warning
4. Continue with ONNX

**Test Cases**:
1. TensorRT fails → ONNX loaded automatically
2. System continues functioning
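
A minimal sketch of the fallback flow above, assuming both loaders are passed in as callables. `load_with_fallback`, `broken_tensorrt`, and `working_onnx` are hypothetical names, and treating every exception from the TensorRT loader as a reason to fall back is an assumption.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("model_manager")


def load_with_fallback(model_name: str,
                       load_tensorrt: Callable[[str], object],
                       load_onnx: Callable[[str], object]) -> Tuple[object, str]:
    """Try TensorRT first; on failure log a warning and use ONNX instead."""
    try:
        return load_tensorrt(model_name), "tensorrt"
    except Exception as exc:  # 1. detect TensorRT failure
        logger.warning("TensorRT failed for %s (%s); falling back to ONNX",
                       model_name, exc)  # 3. log warning
        return load_onnx(model_name), "onnx"  # 2 and 4. load ONNX, continue


def broken_tensorrt(name: str) -> object:
    """Simulates a TensorRT load failure."""
    raise RuntimeError("engine deserialization failed")


def working_onnx(name: str) -> object:
    """Simulates a successful ONNX load."""
    return f"onnx-session:{name}"


engine, used_format = load_with_fallback("LiteSAM", broken_tensorrt, working_onnx)
```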

---

### `warmup_model(model_name: str) -> bool`

**Description**: Warms up the model with dummy input.

**Called By**: Internal (during `load_model`, after the engine is initialized)

**Purpose**: Initialize CUDA kernels, allocate GPU memory

**Test Cases**:
1. Warmup → first real inference is fast
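
The effect of warmup can be illustrated with a toy engine whose first call pays a one-time setup cost. `LazyEngine` is hypothetical and its 50 ms sleep only simulates kernel initialization and memory allocation; the default of three iterations mirrors `warmup_iterations` in `ModelConfig`. This variant takes the `infer` callable directly rather than a model name, purely to keep the sketch self-contained.

```python
import time
from typing import Callable, List


def warmup_model(infer: Callable[[List[float]], object],
                 dummy_input: List[float],
                 iterations: int = 3) -> bool:
    """Run a few throwaway inferences; return False if any of them fails."""
    try:
        for _ in range(iterations):
            infer(dummy_input)
    except Exception:
        return False
    return True


class LazyEngine:
    """Toy engine whose first call pays a one-time setup cost."""

    def __init__(self) -> None:
        self.initialized = False

    def infer(self, data: List[float]) -> float:
        if not self.initialized:
            time.sleep(0.05)  # simulates kernel setup and memory allocation
            self.initialized = True
        return sum(data)


engine = LazyEngine()
ok = warmup_model(engine.infer, [0.0] * 16)

# The first real inference no longer pays the setup cost.
start = time.perf_counter()
engine.infer([1.0, 2.0])
first_real_latency_s = time.perf_counter() - start
```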

## Integration Tests

### Test 1: Model Loading
1. `load_model("SuperPoint", "tensorrt")`
2. `load_model("LightGlue", "tensorrt")`
3. `load_model("DINOv2", "tensorrt")`
4. `load_model("LiteSAM", "tensorrt")`
5. Verify all loaded

### Test 2: Inference Performance
1. Get inference engine
2. Run inference 100 times
3. Measure average latency
4. Verify meets performance targets
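
Test 2 could be structured along these lines. `measure_latency_ms` and `fake_infer` are hypothetical, and `fake_infer` is a CPU placeholder for a real engine call. On a GPU, execution is usually asynchronous, so a real measurement would need to wait for the device to finish before stopping the timer.

```python
import statistics
import time
from typing import Callable, List


def measure_latency_ms(infer: Callable[[], object], runs: int = 100) -> List[float]:
    """Time each call with a monotonic clock and return per-run latencies in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def fake_infer() -> int:
    """Placeholder workload standing in for engine.infer(...)."""
    return sum(range(1000))


samples = measure_latency_ms(fake_infer, runs=100)
average_ms = statistics.mean(samples)
target_ms = 15.0  # SuperPoint TensorRT target from the Performance section
meets_target = average_ms <= target_ms
```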

### Test 3: Fallback Scenario
1. Simulate TensorRT failure
2. Verify fallback to ONNX
3. Verify inference still works

## Non-Functional Requirements

### Performance
- **SuperPoint**: ~15ms (TensorRT), ~50ms (ONNX)
- **LightGlue**: ~50ms (TensorRT), ~150ms (ONNX)
- **DINOv2**: ~150ms (TensorRT), ~500ms (ONNX)
- **LiteSAM**: ~60ms (TensorRT), ~200ms (ONNX)
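
As a rough check of these targets against the <5 s requirement, the figures can be summed per backend. This assumes each model runs exactly once per frame, sequentially, which the spec does not state, so the totals are only an illustration.

```python
# Latency targets from the list above, in milliseconds.
latency_ms = {
    "SuperPoint": {"tensorrt": 15, "onnx": 50},
    "LightGlue": {"tensorrt": 50, "onnx": 150},
    "DINOv2": {"tensorrt": 150, "onnx": 500},
    "LiteSAM": {"tensorrt": 60, "onnx": 200},
}
budget_ms = 5000  # the <5 s requirement from Responsibilities

# Assumption: one sequential call per model per frame.
total_ms = {
    backend: sum(model[backend] for model in latency_ms.values())
    for backend in ("tensorrt", "onnx")
}
headroom_ms = {backend: budget_ms - total for backend, total in total_ms.items()}

# Implied TensorRT-over-ONNX speedup for each model.
speedup = {name: m["onnx"] / m["tensorrt"] for name, m in latency_ms.items()}
```

Under that assumption the model time is 275 ms with TensorRT and 900 ms with ONNX, and the implied per-model speedup is between 3.0x and about 3.3x.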

### Memory
- GPU memory: ~4GB for all 4 models

### Reliability
- Graceful fallback to ONNX
- Automatic retry on transient errors
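
"Automatic retry on transient errors" might be implemented as a small wrapper like the one below. `TransientError`, the three attempts, and the exponential backoff delays are all assumptions for illustration; the spec does not say which errors count as transient or how many retries are allowed.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(RuntimeError):
    """Hypothetical marker for errors worth retrying."""


def retry(operation: Callable[[], T],
          attempts: int = 3,
          base_delay_s: float = 0.01,
          retry_on: Tuple[Type[BaseException], ...] = (TransientError,)) -> T:
    """Call operation, retrying with exponential backoff on transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("attempts must be >= 1")


calls = {"count": 0}


def flaky_infer() -> str:
    """Fails twice, then succeeds, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary failure")
    return "features"


result = retry(flaky_infer)
```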

## Dependencies

### External Dependencies
- **TensorRT**: NVIDIA inference optimization
- **ONNX Runtime**: ONNX inference
- **PyTorch**: Model weights (optional)
- **CUDA**: GPU acceleration

## Data Models

### InferenceEngine
```python
from abc import ABC, abstractmethod

import numpy as np


class InferenceEngine(ABC):
    model_name: str
    format: str  # "tensorrt", "onnx", "pytorch"

    @abstractmethod
    def infer(self, input: np.ndarray) -> np.ndarray:
        """Run one forward pass and return the model output."""
```

### ModelConfig
```python
from pydantic import BaseModel  # assumed: BaseModel is pydantic's


class ModelConfig(BaseModel):
    model_name: str
    model_path: str
    format: str  # "tensorrt", "onnx", "pytorch"
    precision: str  # "fp16", "fp32"
    warmup_iterations: int = 3
```