# Model Manager

## Interface Definition

**Interface Name**: `IModelManager`

### Interface Methods

```python
from __future__ import annotations  # lets the InferenceEngine annotation stay a forward reference

from abc import ABC, abstractmethod


class IModelManager(ABC):
    @abstractmethod
    def load_model(self, model_name: str, model_format: str) -> bool:
        pass

    @abstractmethod
    def get_inference_engine(self, model_name: str) -> InferenceEngine:  # defined under Data Models
        pass

    @abstractmethod
    def optimize_to_tensorrt(self, model_name: str, onnx_path: str) -> str:
        pass

    @abstractmethod
    def fallback_to_onnx(self, model_name: str) -> bool:
        pass

    @abstractmethod
    def warmup_model(self, model_name: str) -> bool:
        pass
```

## Component Description

### Responsibilities

- Load ML models (TensorRT primary, ONNX fallback)
- Manage the model lifecycle (loading, unloading, warmup)
- Provide inference engines for:
  - SuperPoint (feature extraction)
  - LightGlue (feature matching)
  - DINOv2 (global descriptors)
  - LiteSAM (cross-view matching)
- Handle TensorRT optimization and ONNX fallback
- Meet the <5s processing requirement through GPU acceleration

### Scope

- Model loading and caching
- TensorRT optimization
- ONNX fallback handling
- Inference engine abstraction
- GPU memory management

## API Methods

### `load_model(model_name: str, model_format: str) -> bool`

**Description**: Loads a model in the specified format.

**Called By**: F02.1 Flight Lifecycle Manager (during system initialization)

**Input**:

```python
model_name: str    # "SuperPoint", "LightGlue", "DINOv2", "LiteSAM"
model_format: str  # "tensorrt", "onnx", "pytorch"
```

**Output**: `bool` - True if the model loaded successfully

**Processing Flow**:

1. Check whether the model is already loaded
2. Load the model file
3. Initialize the inference engine
4. Warm up the model
5. Cache the engine for reuse

**Test Cases**:

1. Load TensorRT model → succeeds
2. TensorRT unavailable → falls back to ONNX
3. Load all 4 models → all succeed

---

### `get_inference_engine(model_name: str) -> InferenceEngine`

**Description**: Returns the inference engine for a model.
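The load-once, cache-by-name behavior behind `load_model` and `get_inference_engine`, together with the TensorRT-to-ONNX fallback, can be sketched as a small in-memory registry. This is a minimal sketch under stated assumptions, not the actual implementation: `SketchModelManager`, `StubEngine`, and the injected `loaders` mapping are hypothetical stand-ins for real TensorRT / ONNX Runtime loading code, the warmup input shape is illustrative, and raising `KeyError` for a model that was never loaded is an assumption (the spec does not say how a missing model is reported).

```python
import logging
from typing import Callable, Dict

import numpy as np

logger = logging.getLogger("model_manager")


class StubEngine:
    """Hypothetical engine that echoes its input; stands in for a real TensorRT/ONNX engine."""

    def __init__(self, model_name: str, format: str) -> None:
        self.model_name = model_name
        self.format = format

    def infer(self, input: np.ndarray) -> np.ndarray:
        return input


# Assumption: a loader takes a model name and returns an engine, or raises on failure.
Loader = Callable[[str], StubEngine]


class SketchModelManager:
    """Minimal registry: load once, cache by name, fall back from TensorRT to ONNX."""

    def __init__(self, loaders: Dict[str, Loader]) -> None:
        self._loaders = loaders  # format name -> loader (hypothetical)
        self._engines: Dict[str, StubEngine] = {}

    def load_model(self, model_name: str, model_format: str) -> bool:
        if model_name in self._engines:  # step 1: already loaded, reuse the cached engine
            return True
        try:
            engine = self._loaders[model_format](model_name)  # steps 2-3
        except Exception as exc:
            if model_format != "tensorrt":
                return False  # only TensorRT failures trigger the ONNX fallback
            logger.warning("TensorRT load failed for %s (%s); falling back to ONNX", model_name, exc)
            try:
                engine = self._loaders["onnx"](model_name)
            except Exception:
                return False
        # Step 4: warm up with a dummy input (illustrative shape; real models need their own).
        engine.infer(np.zeros((1, 1, 8, 8), dtype=np.float32))
        self._engines[model_name] = engine  # step 5: cache for reuse
        return True

    def get_inference_engine(self, model_name: str) -> StubEngine:
        return self._engines[model_name]  # KeyError if never loaded (assumption)


def failing_trt(name: str) -> StubEngine:
    raise RuntimeError("no TensorRT engine")  # simulate a TensorRT failure


manager = SketchModelManager({
    "tensorrt": failing_trt,
    "onnx": lambda name: StubEngine(name, "onnx"),
})
assert manager.load_model("SuperPoint", "tensorrt")
print(manager.get_inference_engine("SuperPoint").format)  # onnx
```

Injecting the loaders keeps the caching and fallback logic testable without a GPU, which is one way the "TensorRT unavailable → falls back to ONNX" test case could be exercised.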
**Called By**:

- F07 Sequential VO (SuperPoint, LightGlue)
- F08 Global Place Recognition (DINOv2)
- F09 Metric Refinement (LiteSAM)

**Output**:

```python
InferenceEngine:
    model_name: str
    format: str
    infer(input: np.ndarray) -> np.ndarray
```

**Test Cases**:

1. Get SuperPoint engine → returns engine
2. Call `infer()` → returns features

---

### `optimize_to_tensorrt(model_name: str, onnx_path: str) -> str`

**Description**: Converts an ONNX model to a TensorRT engine for acceleration.

**Called By**: System initialization (one-time)

**Input**:

```python
model_name: str
onnx_path: str  # Path to the ONNX model
```

**Output**: `str` - Path to the TensorRT engine

**Processing Details**:

- FP16 precision (2-3x speedup)
- Graph fusion and kernel optimization
- One-time conversion, cached for reuse

**Test Cases**:

1. Convert ONNX to TensorRT → engine created
2. Load TensorRT engine → inference faster than ONNX

---

### `fallback_to_onnx(model_name: str) -> bool`

**Description**: Falls back to ONNX if TensorRT fails.

**Called By**: Internal (during `load_model`)

**Processing Flow**:

1. Detect the TensorRT failure
2. Load the ONNX model
3. Log a warning
4. Continue with ONNX

**Test Cases**:

1. TensorRT fails → ONNX model loaded automatically
2. System continues functioning

---

### `warmup_model(model_name: str) -> bool`

**Description**: Warms up the model with dummy input.

**Called By**: Internal (during `load_model`, once the model is loaded)

**Purpose**: Initialize CUDA kernels and allocate GPU memory before the first real inference

**Test Cases**:

1. After warmup → first real inference runs without a cold-start delay

## Integration Tests

### Test 1: Model Loading

1. `load_model("SuperPoint", "tensorrt")`
2. `load_model("LightGlue", "tensorrt")`
3. `load_model("DINOv2", "tensorrt")`
4. `load_model("LiteSAM", "tensorrt")`
5. Verify all four models are loaded

### Test 2: Inference Performance

1. Get the inference engine
2. Run inference 100 times
3. Measure the average latency
4. Verify it meets the performance targets

### Test 3: Fallback Scenario

1. Simulate a TensorRT failure
2. Verify fallback to ONNX
3. Verify inference still works

## Non-Functional Requirements

### Performance

- **SuperPoint**: ~15 ms (TensorRT), ~50 ms (ONNX)
- **LightGlue**: ~50 ms (TensorRT), ~150 ms (ONNX)
- **DINOv2**: ~150 ms (TensorRT), ~500 ms (ONNX)
- **LiteSAM**: ~60 ms (TensorRT), ~200 ms (ONNX)

### Memory

- GPU memory: ~4 GB for all 4 models

### Reliability

- Graceful fallback to ONNX
- Automatic retry on transient errors

## Dependencies

### External Dependencies

- **TensorRT**: NVIDIA inference optimization
- **ONNX Runtime**: ONNX inference
- **PyTorch**: Model weights (optional)
- **CUDA**: GPU acceleration

## Data Models

### InferenceEngine

```python
from abc import ABC, abstractmethod

import numpy as np


class InferenceEngine(ABC):
    model_name: str
    format: str  # e.g. "tensorrt", "onnx"

    @abstractmethod
    def infer(self, input: np.ndarray) -> np.ndarray:
        pass
```

### ModelConfig

```python
from pydantic import BaseModel


class ModelConfig(BaseModel):
    model_name: str
    model_path: str
    format: str
    precision: str  # "fp16", "fp32"
    warmup_iterations: int = 3
```
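Integration Test 2 and the `warmup_iterations` field of `ModelConfig` fit together in a small benchmarking helper: run the untimed warmup iterations, then time 100 inferences and compare the mean against the Performance targets. A minimal sketch, assuming any object with an `infer(np.ndarray) -> np.ndarray` method; `measure_latency_ms`, `EchoEngine`, and `TARGETS_MS` are hypothetical names, the input shape is illustrative only, and treating the approximate targets as upper bounds is an assumption. Timing uses `time.perf_counter`, so it is only meaningful if `infer` blocks until its result is ready.

```python
import time
from statistics import mean

import numpy as np


class EchoEngine:
    """Hypothetical CPU-only engine used to exercise the helper without a GPU."""

    model_name = "SuperPoint"
    format = "onnx"

    def infer(self, input: np.ndarray) -> np.ndarray:
        return input * 2.0


def measure_latency_ms(engine, sample: np.ndarray, warmup_iterations: int = 3, runs: int = 100) -> float:
    """Warm up (untimed), then return the mean latency of `runs` inferences in milliseconds."""
    for _ in range(warmup_iterations):  # first calls pay one-off setup costs
        engine.infer(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        engine.infer(sample)  # assumes infer() blocks until the result is ready
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


# Approximate targets from the Performance section, in ms, per model and format.
TARGETS_MS = {
    "SuperPoint": {"tensorrt": 15, "onnx": 50},
    "LightGlue": {"tensorrt": 50, "onnx": 150},
    "DINOv2": {"tensorrt": 150, "onnx": 500},
    "LiteSAM": {"tensorrt": 60, "onnx": 200},
}

engine = EchoEngine()
sample = np.zeros((1, 1, 480, 640), dtype=np.float32)  # illustrative shape only
avg_ms = measure_latency_ms(engine, sample, warmup_iterations=3, runs=100)
target = TARGETS_MS[engine.model_name][engine.format]
meets_target = avg_ms <= target  # assumption: "~" targets treated as upper bounds
print(f"{engine.model_name} ({engine.format}): {avg_ms:.3f} ms, target ~{target} ms, ok={meets_target}")
```

Keeping warmup outside the timed loop mirrors the purpose of `warmup_model`: one-off kernel initialization and memory allocation should not inflate the steady-state average that Test 2 checks.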