# Model Manager

## Interface Definition

**Interface Name**: `IModelManager`

### Interface Methods

```python
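from abc import ABC, abstractmethod


class IModelManager(ABC):
    @abstractmethod
    def load_model(self, model_name: str, model_format: str) -> bool:
        """Load a model in the given format; return True if loaded."""

    @abstractmethod
    def get_inference_engine(self, model_name: str) -> "InferenceEngine":
        """Return the inference engine for a model (type defined under Data Models)."""

    @abstractmethod
    def optimize_to_tensorrt(self, model_name: str, onnx_path: str) -> str:
        """Convert an ONNX model to TensorRT; return the engine path."""

    @abstractmethod
    def fallback_to_onnx(self, model_name: str) -> bool:
        """Fall back to ONNX if TensorRT fails."""

    @abstractmethod
    def warmup_model(self, model_name: str) -> bool:
        """Warm up a model with dummy input."""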
```

## Component Description

### Responsibilities
- Load ML models (TensorRT primary, ONNX fallback)
- Manage model lifecycle (loading, unloading, warmup)
- Provide inference engines for:
  - SuperPoint (feature extraction)
  - LightGlue (feature matching)
  - DINOv2 (global descriptors)
  - LiteSAM (cross-view matching)
- Handle TensorRT optimization and ONNX fallback
- Meet the <5s processing requirement through TensorRT acceleration

### Scope
- Model loading and caching
- TensorRT optimization
- ONNX fallback handling
- Inference engine abstraction
- GPU memory management

## API Methods

### `load_model(model_name: str, model_format: str) -> bool`

**Description**: Loads the named model in the specified format.

**Called By**: F02.1 Flight Lifecycle Manager (during system initialization)

**Input**:
```python
model_name: str  # "SuperPoint", "LightGlue", "DINOv2", "LiteSAM"
model_format: str  # "tensorrt", "onnx", "pytorch"
```

**Output**: `bool` - `True` if the model loaded successfully, `False` otherwise

**Processing Flow**:
1. Check if model already loaded
2. Load model file
3. Initialize inference engine
4. Warm up model
5. Cache for reuse

**Test Cases**:
1. Load TensorRT model → succeeds
2. TensorRT unavailable → falls back to ONNX
3. Load all 4 models → all succeed

---

### `get_inference_engine(model_name: str) -> InferenceEngine`

**Description**: Returns the inference engine for a loaded model.

**Called By**:
- F07 Sequential VO (SuperPoint, LightGlue)
- F08 Global Place Recognition (DINOv2)
- F09 Metric Refinement (LiteSAM)

**Output**:
```python
import numpy as np

class InferenceEngine:  # abstract definition is under Data Models
    model_name: str
    format: str

    def infer(self, input: np.ndarray) -> np.ndarray: ...
```

**Test Cases**:
1. Get SuperPoint engine → returns engine
2. Call infer() → returns features

---

### `optimize_to_tensorrt(model_name: str, onnx_path: str) -> str`

**Description**: Converts ONNX model to TensorRT for acceleration.

**Called By**: System initialization (one-time)

**Input**:
```python
model_name: str
onnx_path: str  # Path to ONNX model
```

**Output**: `str` - Path to TensorRT engine

**Processing Details**:
- FP16 precision (2-3x speedup)
- Graph fusion and kernel optimization
- One-time conversion, cached for reuse
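
The "one-time conversion, cached for reuse" behaviour might look like the sketch below. The actual TensorRT build is deliberately left out and injected as `build_engine`, a hypothetical callable; the `cache_dir` parameter, the `.fp16.engine` file suffix, and the mtime-based staleness check are likewise assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def optimize_to_tensorrt_cached(
    model_name: str,
    onnx_path: str,
    cache_dir: str,
    build_engine: Callable[[Path, Path], None],
) -> str:
    """Return the path of a cached engine, building it only when needed.

    build_engine(onnx_file, engine_file) is a hypothetical hook that would
    wrap the real TensorRT conversion (FP16, graph fusion) and write the
    serialized engine to engine_file.
    """
    onnx_file = Path(onnx_path)
    if not onnx_file.is_file():
        raise FileNotFoundError(f"ONNX model not found: {onnx_file}")

    engine_file = Path(cache_dir) / f"{model_name}.fp16.engine"

    # Rebuild only if there is no cached engine or the ONNX file is newer.
    is_stale = (
        not engine_file.exists()
        or engine_file.stat().st_mtime < onnx_file.stat().st_mtime
    )
    if is_stale:
        engine_file.parent.mkdir(parents=True, exist_ok=True)
        build_engine(onnx_file, engine_file)

    return str(engine_file)
```

Keying the cache on the ONNX file's modification time is only one option; a content hash would be more robust if model files are copied between machines.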

**Test Cases**:
1. Convert ONNX to TensorRT → engine created
2. Load TensorRT engine → inference faster than ONNX

---

### `fallback_to_onnx(model_name: str) -> bool`

**Description**: Falls back to the ONNX model when loading the TensorRT engine fails.

**Called By**: Internal (during `load_model`)

**Processing Flow**:
1. Detect TensorRT failure
2. Load ONNX model
3. Log warning
4. Continue with ONNX
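
A minimal sketch of this flow, assuming the two loaders are passed in as callables. `load_with_fallback`, `load_tensorrt`, `load_onnx`, and the logger name are hypothetical names for illustration; catching every `Exception` is also an assumption, since the document does not say which TensorRT failures should trigger the fallback.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("model_manager")


def load_with_fallback(
    model_name: str,
    load_tensorrt: Callable[[str], Any],
    load_onnx: Callable[[str], Any],
) -> Any:
    """Try TensorRT first; on failure log a warning and load ONNX instead."""
    try:
        return load_tensorrt(model_name)
    except Exception as exc:  # 1. Detect TensorRT failure
        # 3. Log the warning first, so the cause is recorded even if the
        #    ONNX load below also fails.
        logger.warning(
            "TensorRT failed for %s (%s); falling back to ONNX",
            model_name, exc,
        )
        # 2 and 4. Load the ONNX model and continue with it.
        return load_onnx(model_name)
```

If the ONNX load fails as well, its exception propagates to the caller, which keeps the two failure modes distinguishable.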

**Test Cases**:
1. TensorRT fails → ONNX loaded automatically
2. System continues functioning

---

### `warmup_model(model_name: str) -> bool`

**Description**: Warms up the model by running inference on dummy input.

**Called By**: Internal (after `load_model`)

**Purpose**: Initialize CUDA kernels, allocate GPU memory
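
A warmup loop can be sketched as below. `warmup` and `SupportsInfer` are hypothetical names, and the function returns per-iteration latencies rather than the `bool` of `warmup_model`, purely to make the effect observable; the default of 3 iterations mirrors `warmup_iterations` in `ModelConfig`.

```python
import time
from typing import Any, List, Protocol


class SupportsInfer(Protocol):
    """Anything exposing infer(), such as an InferenceEngine."""

    def infer(self, input: Any) -> Any: ...


def warmup(engine: SupportsInfer, dummy_input: Any,
           iterations: int = 3) -> List[float]:
    """Run throwaway inferences and return each one's latency in ms."""
    latencies_ms: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        engine.infer(dummy_input)  # output is discarded on purpose
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```

With a real GPU-backed engine, the first returned latency is typically the largest, since that call absorbs the one-time initialization work.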

**Test Cases**:
1. Warmup → first real inference runs at steady-state latency, with no one-time initialization cost

## Integration Tests

### Test 1: Model Loading
1. load_model("SuperPoint", "tensorrt")
2. load_model("LightGlue", "tensorrt")
3. load_model("DINOv2", "tensorrt")
4. load_model("LiteSAM", "tensorrt")
5. Verify all loaded

### Test 2: Inference Performance
1. Get inference engine
2. Run inference 100 times
3. Measure average latency
4. Verify average latency meets the targets listed under Non-Functional Requirements (Performance)
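
The measurement in steps 2 to 4 could be sketched like this. `average_latency_ms` and `meets_target` are hypothetical helpers, and the 20% `tolerance` is an assumption, chosen only because the performance targets are stated as approximate ("~") values.

```python
import statistics
import time
from typing import Any, Callable


def average_latency_ms(infer: Callable[[Any], Any], sample: Any,
                       runs: int = 100, warmup_runs: int = 3) -> float:
    """Average wall-clock latency of infer(sample) over `runs` calls."""
    for _ in range(warmup_runs):  # excluded from the measurement
        infer(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(timings)


def meets_target(measured_ms: float, target_ms: float,
                 tolerance: float = 0.2) -> bool:
    """True if the measured latency is within target * (1 + tolerance)."""
    return measured_ms <= target_ms * (1.0 + tolerance)
```

For GPU engines that launch work asynchronously, `infer` has to block until the result is ready; otherwise the timings measure only the launch overhead.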

### Test 3: Fallback Scenario
1. Simulate TensorRT failure
2. Verify fallback to ONNX
3. Verify inference still works

## Non-Functional Requirements

### Performance
- **SuperPoint**: ~15ms (TensorRT), ~50ms (ONNX)
- **LightGlue**: ~50ms (TensorRT), ~150ms (ONNX)
- **DINOv2**: ~150ms (TensorRT), ~500ms (ONNX)
- **LiteSAM**: ~60ms (TensorRT), ~200ms (ONNX)
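
As a rough sanity check against the <5s processing requirement, the figures above can be summed per backend. This assumes each model runs exactly once per processed frame and that the runs are sequential, which this document does not state, and it ignores all non-inference work.

```python
# Approximate per-inference latencies in ms, copied from the list above.
LATENCY_MS = {
    "SuperPoint": {"tensorrt": 15, "onnx": 50},
    "LightGlue": {"tensorrt": 50, "onnx": 150},
    "DINOv2": {"tensorrt": 150, "onnx": 500},
    "LiteSAM": {"tensorrt": 60, "onnx": 200},
}

BUDGET_MS = 5000  # the <5s processing requirement


def pipeline_latency_ms(fmt: str) -> int:
    """Sum of one inference per model (assumed, see lead-in)."""
    return sum(per_model[fmt] for per_model in LATENCY_MS.values())


print(pipeline_latency_ms("tensorrt"))  # 275
print(pipeline_latency_ms("onnx"))      # 900
```

Under these assumptions both backends fit within the budget, with the ONNX fallback using about 18% of it (900 of 5000 ms) for inference.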

### Memory
- GPU memory: ~4GB total with all 4 models loaded

### Reliability
- Graceful fallback to ONNX
- Automatic retry on transient errors
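
Automatic retry could take the form of a small helper with exponential backoff, sketched below. `retry_transient` is a hypothetical name, and which exceptions count as transient, the number of attempts, and the delays are all assumptions, since the document does not define them.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_transient(
    operation: Callable[[], T],
    transient: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    attempts: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying only on the listed transient errors."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except transient:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Errors outside `transient` propagate immediately, so a permanent failure such as a missing model file is not retried.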

## Dependencies

### External Dependencies
- **TensorRT**: NVIDIA inference optimization
- **ONNX Runtime**: ONNX inference
- **PyTorch**: Model weights (optional)
- **CUDA**: GPU acceleration

## Data Models

### InferenceEngine
```python
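from abc import ABC, abstractmethod

import numpy as np


class InferenceEngine(ABC):
    model_name: str
    format: str

    @abstractmethod
    def infer(self, input: np.ndarray) -> np.ndarray:
        """Run one forward pass on `input` and return the model output."""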
```

### ModelConfig
```python
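from pydantic import BaseModel


# Note: some Pydantic v2 releases warn about field names starting with
# "model_"; setting model_config = ConfigDict(protected_namespaces=())
# silences that warning if it appears.
class ModelConfig(BaseModel):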
    model_name: str
    model_path: str
    format: str
    precision: str  # "fp16", "fp32"
    warmup_iterations: int = 3
```