gps-denied-desktop/docs/02_components/16_model_manager/model_manager_spec.md
2025-11-30 16:09:31 +02:00

Model Manager

Interface Definition

Interface Name: IModelManager

Interface Methods

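from abc import ABC, abstractmethod


class IModelManager(ABC):
    @abstractmethod
    def load_model(self, model_name: str, model_format: str) -> bool:
        """Load a model in the given format; return True if loaded."""

    @abstractmethod
    def get_inference_engine(self, model_name: str) -> "InferenceEngine":
        """Return the inference engine for a model (see Data Models)."""

    @abstractmethod
    def optimize_to_tensorrt(self, model_name: str, onnx_path: str) -> str:
        """Convert an ONNX model to TensorRT; return the engine path."""

    @abstractmethod
    def fallback_to_onnx(self, model_name: str) -> bool:
        """Fall back to ONNX if TensorRT fails."""

    @abstractmethod
    def warmup_model(self, model_name: str) -> bool:
        """Warm up the model with dummy input."""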

Component Description

Responsibilities

  • Load ML models (TensorRT primary, ONNX fallback)
  • Manage model lifecycle (loading, unloading, warmup)
  • Provide inference engines for:
    • SuperPoint (feature extraction)
    • LightGlue (feature matching)
    • DINOv2 (global descriptors)
    • LiteSAM (cross-view matching)
  • Handle TensorRT optimization and ONNX fallback
  • Help meet the <5s processing requirement through accelerated inference

Scope

  • Model loading and caching
  • TensorRT optimization
  • ONNX fallback handling
  • Inference engine abstraction
  • GPU memory management

API Methods

load_model(model_name: str, model_format: str) -> bool

Description: Loads a model in the specified format.

Called By: F02.1 Flight Lifecycle Manager (during system initialization)

Input:

model_name: str  # "SuperPoint", "LightGlue", "DINOv2", "LiteSAM"
model_format: str  # "tensorrt", "onnx", "pytorch"

Output: bool - True if loaded

Processing Flow:

  1. Check if model already loaded
  2. Load model file
  3. Initialize inference engine
  4. Warm up model
  5. Cache for reuse
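
The processing flow above could be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `SketchModelManager` and `_StubEngine` are hypothetical names, and the stub stands in for a real TensorRT or ONNX Runtime session.

```python
from typing import Dict


class _StubEngine:
    """Hypothetical stand-in for a real inference engine."""

    def __init__(self, model_name: str, model_format: str) -> None:
        self.model_name = model_name
        self.format = model_format
        self.warmup_calls = 0

    def infer(self, data):
        return data  # identity; a real engine would run the network


class SketchModelManager:
    """Illustrates the five load_model steps with an in-memory cache."""

    def __init__(self, warmup_iterations: int = 3) -> None:
        self._engines: Dict[str, _StubEngine] = {}
        self._warmup_iterations = warmup_iterations

    def load_model(self, model_name: str, model_format: str) -> bool:
        # 1. Check if the model is already loaded
        if model_name in self._engines:
            return True
        try:
            # 2. Load the model file (stubbed here) and
            # 3. initialize the inference engine
            engine = _StubEngine(model_name, model_format)
            # 4. Warm up the model
            for _ in range(self._warmup_iterations):
                engine.infer([0.0])
                engine.warmup_calls += 1
        except Exception:
            return False
        # 5. Cache for reuse
        self._engines[model_name] = engine
        return True
```

Calling `load_model` a second time with the same name returns `True` without creating a new engine, which is the behavior step 1 implies.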

Test Cases:

  1. Load TensorRT model → succeeds
  2. TensorRT unavailable → falls back to ONNX
  3. Load all 4 models → all succeed

get_inference_engine(model_name: str) -> InferenceEngine

Description: Gets inference engine for a model.

Called By:

  • F07 Sequential VO (SuperPoint, LightGlue)
  • F08 Global Place Recognition (DINOv2)
  • F09 Metric Refinement (LiteSAM)

Output:

InferenceEngine:
    model_name: str
    format: str
    infer(input: np.ndarray) -> np.ndarray

Test Cases:

  1. Get SuperPoint engine → returns engine
  2. Call infer() → returns features

optimize_to_tensorrt(model_name: str, onnx_path: str) -> str

Description: Converts an ONNX model to a TensorRT engine for acceleration.

Called By: System initialization (one-time)

Input:

model_name: str
onnx_path: str  # Path to ONNX model

Output: str - Path to TensorRT engine

Processing Details:

  • FP16 precision (2-3x speedup)
  • Graph fusion and kernel optimization
  • One-time conversion, cached for reuse
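
One way to get the "one-time conversion, cached for reuse" behavior is to derive the engine path from the model name and skip conversion when that file already exists. The sketch below assumes the conversion is delegated to NVIDIA's `trtexec` command-line tool (using its `--onnx`, `--saveEngine`, and `--fp16` options). The `cache_dir` and `run` parameters, the function names, and the `.fp16.engine` file naming are illustrative assumptions and not part of the interface above.

```python
from pathlib import Path
from typing import Callable, List


def trtexec_command(onnx_path: str, engine_path: str) -> List[str]:
    """Build a trtexec invocation that converts ONNX to an FP16 engine."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",
    ]


def optimize_to_tensorrt_sketch(
    model_name: str,
    onnx_path: str,
    cache_dir: str,
    run: Callable[[List[str]], None],
) -> str:
    """Return the engine path, converting only if no cached engine exists."""
    # Hypothetical naming scheme for the cached engine file.
    engine_path = Path(cache_dir) / f"{model_name}.fp16.engine"
    if not engine_path.exists():
        engine_path.parent.mkdir(parents=True, exist_ok=True)
        run(trtexec_command(onnx_path, str(engine_path)))
    return str(engine_path)
```

In a real setup `run` might be `lambda cmd: subprocess.run(cmd, check=True)`. TensorRT engines are generally tied to the GPU and TensorRT version they were built with, so a cache like this would likely need to be invalidated when either changes.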

Test Cases:

  1. Convert ONNX to TensorRT → engine created
  2. Load TensorRT engine → inference faster than ONNX

fallback_to_onnx(model_name: str) -> bool

Description: Falls back to ONNX if TensorRT fails.

Called By: Internal (during load_model)

Processing Flow:

  1. Detect TensorRT failure
  2. Load ONNX model
  3. Log warning
  4. Continue with ONNX
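
The fallback flow above amounts to a try/except around the TensorRT load. A minimal sketch, assuming the two loaders are passed in as callables (`load_tensorrt` and `load_onnx` are hypothetical names, as is the `model_manager` logger):

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("model_manager")


def load_with_fallback(
    model_name: str,
    load_tensorrt: Callable[[str], object],
    load_onnx: Callable[[str], object],
) -> Tuple[object, str]:
    """Try TensorRT first; on failure, log a warning and load ONNX."""
    try:
        return load_tensorrt(model_name), "tensorrt"
    except Exception as exc:  # 1. Detect TensorRT failure
        # 3. Log warning
        logger.warning(
            "TensorRT load failed for %s (%s); falling back to ONNX",
            model_name,
            exc,
        )
        # 2. Load the ONNX model and 4. continue with it
        return load_onnx(model_name), "onnx"
```

Returning the format alongside the engine lets callers and tests confirm which backend is actually in use. If the ONNX load also fails, the exception propagates to the caller.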

Test Cases:

  1. TensorRT fails → ONNX loaded automatically
  2. System continues functioning

warmup_model(model_name: str) -> bool

Description: Warms up model with dummy input.

Called By: Internal (during load_model, after the inference engine is initialized)

Purpose: Initialize CUDA kernels, allocate GPU memory

Test Cases:

  1. Warmup → first real inference is fast (no one-time initialization cost)
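
A warmup routine can be as simple as running a few throwaway inferences so that one-time costs are paid before real data arrives. The sketch below is an assumption about how this might look: `warmup` takes the engine's `infer` callable and a caller-supplied dummy input, and the default of 3 iterations mirrors `warmup_iterations` in ModelConfig.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("model_manager")


def warmup(infer: Callable[[Any], Any], dummy_input: Any, iterations: int = 3) -> bool:
    """Run `infer` on a dummy input `iterations` times; False if any call fails."""
    try:
        for _ in range(iterations):
            infer(dummy_input)  # result is discarded; only side effects matter
    except Exception:
        logger.exception("Warmup failed")
        return False
    return True
```

For a real engine the dummy input would typically be an array with the same shape and dtype as production inputs, so the same kernels and buffers are exercised.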

Integration Tests

Test 1: Model Loading

  1. load_model("SuperPoint", "tensorrt")
  2. load_model("LightGlue", "tensorrt")
  3. load_model("DINOv2", "tensorrt")
  4. load_model("LiteSAM", "tensorrt")
  5. Verify all loaded

Test 2: Inference Performance

  1. Get inference engine
  2. Run inference 100 times
  3. Measure average latency
  4. Verify that the average latency meets the targets listed under Non-Functional Requirements (Performance)
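
The measurement steps in Test 2 could be implemented with a small timing helper. This is a sketch only: `average_latency_ms` and `meets_target` are hypothetical helpers, and the 20% tolerance is an assumed allowance for the approximate ("~") targets, not a value from this spec.

```python
import time
from typing import Any, Callable


def average_latency_ms(infer: Callable[[Any], Any], sample: Any, runs: int = 100) -> float:
    """Average wall-clock latency of `infer(sample)` over `runs` calls, in ms."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    start = time.perf_counter()
    for _ in range(runs):
        infer(sample)
    return (time.perf_counter() - start) * 1000.0 / runs


def meets_target(avg_ms: float, target_ms: float, tolerance: float = 0.2) -> bool:
    """True if the average is at most `tolerance` (fractional) above the target."""
    return avg_ms <= target_ms * (1.0 + tolerance)
```

For GPU-backed engines the timing is only meaningful if `infer` blocks until the result is ready. Otherwise the loop may measure kernel launch time and not inference time. Running the warmup before measuring keeps one-time initialization out of the average.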

Test 3: Fallback Scenario

  1. Simulate TensorRT failure
  2. Verify fallback to ONNX
  3. Verify inference still works

Non-Functional Requirements

Performance

  • SuperPoint: ~15ms (TensorRT), ~50ms (ONNX)
  • LightGlue: ~50ms (TensorRT), ~150ms (ONNX)
  • DINOv2: ~150ms (TensorRT), ~500ms (ONNX)
  • LiteSAM: ~60ms (TensorRT), ~200ms (ONNX)
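
As a rough sanity check against the <5s processing requirement, the figures above can be summed per format. The sketch assumes exactly one inference per model per processed frame, which this spec does not state, and it ignores all non-model processing.

```python
# Approximate per-inference latencies in milliseconds, copied from the list above.
LATENCY_MS = {
    "SuperPoint": {"tensorrt": 15, "onnx": 50},
    "LightGlue": {"tensorrt": 50, "onnx": 150},
    "DINOv2": {"tensorrt": 150, "onnx": 500},
    "LiteSAM": {"tensorrt": 60, "onnx": 200},
}
BUDGET_MS = 5000  # the <5s processing requirement


def total_latency_ms(fmt: str) -> int:
    """Sum one inference per model for the given format (an assumption)."""
    return sum(per_model[fmt] for per_model in LATENCY_MS.values())
```

Under that assumption the totals are 275 ms for TensorRT and 900 ms for ONNX, both well inside the 5000 ms budget. The actual per-frame cost depends on how many inferences each calling component runs.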

Memory

  • GPU memory: ~4GB for all 4 models

Reliability

  • Graceful fallback to ONNX
  • Automatic retry on transient errors
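
"Automatic retry on transient errors" is not specified further here. One common approach is a bounded retry with exponential backoff, sketched below. Which exceptions count as transient, the attempt count, and the delays are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.1,
    transient: Tuple[Type[BaseException], ...] = (OSError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying with exponential backoff on transient errors."""
    for attempt in range(attempts):
        try:
            return operation()
        except transient:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * (2 ** attempt))
    raise ValueError("attempts must be >= 1")
```

Errors outside `transient` propagate immediately, so a permanent failure such as a missing model file of the wrong format can go straight to the ONNX fallback path without waiting out the retries. The injectable `sleep` keeps the helper testable without real delays.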

Dependencies

External Dependencies

  • TensorRT: NVIDIA inference optimization
  • ONNX Runtime: ONNX inference
  • PyTorch: Model weights (optional)
  • CUDA: GPU acceleration

Data Models

InferenceEngine

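from abc import ABC, abstractmethod

import numpy as np


class InferenceEngine(ABC):
    model_name: str  # e.g. "SuperPoint"
    format: str  # "tensorrt", "onnx", "pytorch"

    @abstractmethod
    def infer(self, input: np.ndarray) -> np.ndarray:
        """Run one inference on the input array and return the output array."""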

ModelConfig

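from pydantic import BaseModel  # assumed source of BaseModel


# Note: some pydantic v2 releases warn about field names starting with "model_".
class ModelConfig(BaseModel):
    model_name: str
    model_path: str
    format: str  # "tensorrt", "onnx", "pytorch"
    precision: str  # "fp16", "fp32"
    warmup_iterations: int = 3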