mirror of https://github.com/azaion/gps-denied-desktop.git (synced 2026-04-22 22:36:36 +00:00)
# Model Manager

## Interface Definition

**Interface Name:** `IModelManager`

### Interface Methods
```python
from abc import ABC, abstractmethod


class IModelManager(ABC):
    @abstractmethod
    def load_model(self, model_name: str, model_format: str) -> bool:
        pass

    @abstractmethod
    def get_inference_engine(self, model_name: str) -> "InferenceEngine":
        pass

    @abstractmethod
    def optimize_to_tensorrt(self, model_name: str, onnx_path: str) -> str:
        pass

    @abstractmethod
    def fallback_to_onnx(self, model_name: str) -> bool:
        pass

    @abstractmethod
    def warmup_model(self, model_name: str) -> bool:
        pass
```
## Component Description

### Responsibilities

- Load ML models (TensorRT primary, ONNX fallback)
- Manage the model lifecycle (loading, unloading, warmup)
- Provide inference engines for:
  - SuperPoint (feature extraction)
  - LightGlue (feature matching)
  - DINOv2 (global descriptors)
  - LiteSAM (cross-view matching)
- Handle TensorRT optimization and ONNX fallback
- Ensure the <5 s processing requirement is met through GPU acceleration

### Scope

- Model loading and caching
- TensorRT optimization
- ONNX fallback handling
- Inference engine abstraction
- GPU memory management
## API Methods
### `load_model(model_name: str, model_format: str) -> bool`

**Description:** Loads a model in the specified format.

**Called By:** F02.1 Flight Lifecycle Manager (during system initialization)

**Input:**

```python
model_name: str    # "SuperPoint", "LightGlue", "DINOv2", "LiteSAM"
model_format: str  # "tensorrt", "onnx", "pytorch"
```

**Output:** `bool` (True if the model loaded successfully)

**Processing Flow:**

1. Check whether the model is already loaded
2. Load the model file
3. Initialize the inference engine
4. Warm up the model
5. Cache the engine for reuse

**Test Cases:**

- Load TensorRT model → succeeds
- TensorRT unavailable → fallback to ONNX
- Load all 4 models → all succeed
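The processing flow above can be sketched in a few lines. The `ModelManager` class, loader callables, and cache dict below are hypothetical stand-ins for the real TensorRT/ONNX backends, shown only to illustrate the cache-then-load-then-fallback policy:

```python
from typing import Callable, Dict


class ModelManager:
    """Illustrative sketch of the load_model flow, not the real implementation."""

    def __init__(self, loaders: Dict[str, Callable[[str], object]]):
        self._loaders = loaders                 # format -> loader (hypothetical)
        self._engines: Dict[str, object] = {}   # model_name -> cached engine

    def load_model(self, model_name: str, model_format: str) -> bool:
        # 1. Check whether the model is already loaded (cache hit)
        if model_name in self._engines:
            return True
        loader = self._loaders.get(model_format)
        if loader is None:
            return False
        try:
            # 2-3. Load the model file and initialize the inference engine
            engine = loader(model_name)
        except Exception:
            # TensorRT failed: fall back to ONNX if an ONNX loader exists
            if model_format == "tensorrt" and "onnx" in self._loaders:
                return self.load_model(model_name, "onnx")
            return False
        # 4-5. Cache the engine for reuse (warmup elided in this sketch)
        self._engines[model_name] = engine
        return True
```

A cache hit returns immediately, so repeated `load_model` calls for the same model are cheap; the recursive call is what realizes the TensorRT → ONNX fallback test case.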
### `get_inference_engine(model_name: str) -> InferenceEngine`

**Description:** Returns the inference engine for a loaded model.

**Called By:**

- F07 Sequential VO (SuperPoint, LightGlue)
- F08 Global Place Recognition (DINOv2)
- F09 Metric Refinement (LiteSAM)

**Output:**

```python
InferenceEngine:
    model_name: str
    format: str
    infer(input: np.ndarray) -> np.ndarray
```

**Test Cases:**

- Get SuperPoint engine → returns engine
- Call infer() → returns features
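As an illustration of the engine contract, here is a dummy engine satisfying the `InferenceEngine` interface from the Data Models section. The concrete class name, output shape, and zero-filled descriptors are made up for this sketch:

```python
import numpy as np
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    model_name: str
    format: str

    @abstractmethod
    def infer(self, input: np.ndarray) -> np.ndarray:
        pass


class DummySuperPointEngine(InferenceEngine):
    """Hypothetical stand-in mimicking a feature-extraction engine."""

    def __init__(self):
        self.model_name = "SuperPoint"
        self.format = "onnx"

    def infer(self, input: np.ndarray) -> np.ndarray:
        # Pretend to extract 128 keypoint descriptors of dimension 256
        return np.zeros((128, 256), dtype=np.float32)
```

Callers such as F07 only depend on `infer()`, so TensorRT and ONNX engines are interchangeable behind this interface.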
### `optimize_to_tensorrt(model_name: str, onnx_path: str) -> str`

**Description:** Converts an ONNX model to a TensorRT engine for acceleration.

**Called By:** System initialization (one-time)

**Input:**

```python
model_name: str
onnx_path: str  # Path to the ONNX model
```

**Output:** `str` (path to the serialized TensorRT engine)

**Processing Details:**

- FP16 precision (2-3x speedup)
- Graph fusion and kernel optimization
- One-time conversion, cached for reuse

**Test Cases:**

- Convert ONNX to TensorRT → engine created
- Load TensorRT engine → inference faster than ONNX
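The "one-time conversion, cached for reuse" policy can be sketched independently of TensorRT itself. The `convert` callable below is a hypothetical stand-in for the real, expensive ONNX → TensorRT build (e.g. the TensorRT builder API or the stock `trtexec` tool):

```python
from pathlib import Path
from typing import Callable


def optimize_to_tensorrt(onnx_path: str,
                         convert: Callable[[str, str], None],
                         cache_dir: str = ".") -> str:
    """Sketch of the cache-then-convert policy; `convert` stands in for
    the real FP16 TensorRT build step."""
    engine_path = Path(cache_dir) / (Path(onnx_path).stem + ".engine")
    if engine_path.exists():
        # One-time conversion: a cached engine is reused on later runs
        return str(engine_path)
    convert(onnx_path, str(engine_path))  # expensive build happens here
    return str(engine_path)
```

Keeping the build behind a path-existence check means system initialization only pays the conversion cost once per model per machine.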
### `fallback_to_onnx(model_name: str) -> bool`

**Description:** Falls back to ONNX Runtime when TensorRT fails.

**Called By:** Internal (during `load_model`)

**Processing Flow:**

1. Detect the TensorRT failure
2. Load the ONNX model
3. Log a warning
4. Continue inference with ONNX

**Test Cases:**

- TensorRT fails → ONNX loaded automatically
- System continues functioning
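A minimal sketch of the fallback path, assuming a hypothetical `load_onnx` callable standing in for the ONNX Runtime load:

```python
import logging

logger = logging.getLogger("model_manager")


def fallback_to_onnx(model_name: str, load_onnx) -> bool:
    """Sketch: invoked after a TensorRT failure has been detected.
    `load_onnx` is a hypothetical callable loading the ONNX variant."""
    logger.warning("TensorRT unavailable for %s; falling back to ONNX",
                   model_name)
    try:
        load_onnx(model_name)
    except Exception:
        logger.error("ONNX fallback also failed for %s", model_name)
        return False
    # System continues functioning on the slower ONNX path
    return True
```

Logging at warning level (rather than raising) is what lets the system keep running when only the accelerated path is broken.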
### `warmup_model(model_name: str) -> bool`

**Description:** Warms up a loaded model with a dummy input.

**Called By:** Internal (after `load_model`)

**Purpose:** Initialize CUDA kernels and allocate GPU memory so the first real inference is not penalized.

**Test Cases:**

- Warmup → first real inference fast
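A sketch of the warmup loop, assuming an engine object exposing `infer()`. The dummy input shape is illustrative, and the default iteration count mirrors `ModelConfig.warmup_iterations`:

```python
import numpy as np


def warmup_model(engine, input_shape=(1, 1, 480, 640), iterations=3) -> bool:
    """Sketch: run a few dummy inferences so CUDA kernels are compiled and
    GPU buffers are allocated before the first real frame arrives."""
    dummy = np.zeros(input_shape, dtype=np.float32)
    try:
        for _ in range(iterations):
            engine.infer(dummy)
    except Exception:
        return False
    return True
```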
## Integration Tests

**Test 1: Model Loading**

- `load_model("SuperPoint", "tensorrt")`
- `load_model("LightGlue", "tensorrt")`
- `load_model("DINOv2", "tensorrt")`
- `load_model("LiteSAM", "tensorrt")`
- Verify all four models are loaded

**Test 2: Inference Performance**

- Get an inference engine
- Run inference 100 times
- Measure the average latency
- Verify it meets the performance targets

**Test 3: Fallback Scenario**

- Simulate a TensorRT failure
- Verify fallback to ONNX
- Verify inference still works
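Test 2's latency measurement can be sketched as a small helper; the function name and parameters are illustrative, and `infer` is any callable inference entry point:

```python
import time


def measure_mean_latency(infer, input_data, iterations=100) -> float:
    """Run `infer` repeatedly and return the mean per-call latency in ms."""
    start = time.perf_counter()
    for _ in range(iterations):
        infer(input_data)
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1000.0
```

Averaging over many calls (after warmup) smooths out scheduling jitter, which matters when the targets are in the tens of milliseconds.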
## Non-Functional Requirements

### Performance

| Model      | TensorRT | ONNX    |
|------------|----------|---------|
| SuperPoint | ~15 ms   | ~50 ms  |
| LightGlue  | ~50 ms   | ~150 ms |
| DINOv2     | ~150 ms  | ~500 ms |
| LiteSAM    | ~60 ms   | ~200 ms |

### Memory

- GPU memory: ~4 GB for all four models

### Reliability

- Graceful fallback to ONNX
- Automatic retry on transient errors
## Dependencies

### External Dependencies

- TensorRT: NVIDIA inference optimization
- ONNX Runtime: ONNX inference
- PyTorch: model weights (optional)
- CUDA: GPU acceleration
## Data Models

### InferenceEngine

```python
import numpy as np
from abc import ABC, abstractmethod


class InferenceEngine(ABC):
    model_name: str
    format: str

    @abstractmethod
    def infer(self, input: np.ndarray) -> np.ndarray:
        pass
```

### ModelConfig

```python
from pydantic import BaseModel


class ModelConfig(BaseModel):
    model_name: str
    model_path: str
    format: str
    precision: str  # "fp16", "fp32"
    warmup_iterations: int = 3
```
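For illustration, here is how such a config might be constructed and consumed. This sketch uses a plain dataclass rather than pydantic's `BaseModel` so it stays dependency-free, but the fields match the definition above; the path value is a placeholder:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    model_name: str
    model_path: str
    format: str
    precision: str           # "fp16" or "fp32"
    warmup_iterations: int = 3


# Hypothetical config for one of the four models
cfg = ModelConfig(
    model_name="SuperPoint",
    model_path="models/superpoint.engine",  # placeholder path
    format="tensorrt",
    precision="fp16",
)
```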