azaion/gps-denied-desktop

mirror of https://github.com/azaion/gps-denied-desktop.git synced 2026-04-22 22:36:36 +00:00

Oleksandr Bezdieniezhnykh ce9760fcbe component assesment and fixes done

2025-11-30 16:09:31 +02:00

Model Manager

Interface Definition

Interface Name: IModelManager

Interface Methods

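from __future__ import annotations  # lets InferenceEngine be referenced before it is defined

from abc import ABC, abstractmethod

# InferenceEngine is defined under Data Models.
class IModelManager(ABC):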
    @abstractmethod
    def load_model(self, model_name: str, model_format: str) -> bool:
        pass
    
    @abstractmethod
    def get_inference_engine(self, model_name: str) -> InferenceEngine:
        pass
    
    @abstractmethod
    def optimize_to_tensorrt(self, model_name: str, onnx_path: str) -> str:
        pass
    
    @abstractmethod
    def fallback_to_onnx(self, model_name: str) -> bool:
        pass
    
    @abstractmethod
    def warmup_model(self, model_name: str) -> bool:
        pass

Component Description

Responsibilities

Load ML models (TensorRT primary, ONNX fallback)
Manage model lifecycle (loading, unloading, warmup)
Provide inference engines for:
- SuperPoint (feature extraction)
- LightGlue (feature matching)
- DINOv2 (global descriptors)
- LiteSAM (cross-view matching)
Handle TensorRT optimization and ONNX fallback
Help meet the <5 s processing requirement through GPU-accelerated inference

Scope

Model loading and caching
TensorRT optimization
ONNX fallback handling
Inference engine abstraction
GPU memory management

API Methods

`load_model(model_name: str, model_format: str) -> bool`

Description: Loads the named model in the specified format and caches it for reuse.

Called By: F02.1 Flight Lifecycle Manager (during system initialization)

Input:

model_name: str  # "SuperPoint", "LightGlue", "DINOv2", "LiteSAM"
model_format: str  # "tensorrt", "onnx", "pytorch"

Output: bool - True if loaded

Processing Flow:

Check whether the model is already loaded (if so, reuse the cached engine)
Load model file
Initialize inference engine
Warm up model
Cache for reuse
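
The five steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `SketchModelManager`, the `loaders` mapping, and the `dummy_input` argument are hypothetical names, and an engine is modelled as a plain callable so the example runs without a GPU.

```python
from typing import Callable, Dict

# Hypothetical stand-in: an "engine" is any callable that accepts one input.
Engine = Callable[[object], object]


class SketchModelManager:
    """Illustrative only: follows the load_model processing flow."""

    def __init__(self, loaders: Dict[str, Callable[[str], Engine]],
                 dummy_input: object = None) -> None:
        # loaders maps a format name ("tensorrt", "onnx", ...) to a function
        # that takes a model name and returns an initialized engine.
        self._loaders = loaders
        self._dummy_input = dummy_input
        self._engines: Dict[str, Engine] = {}

    def load_model(self, model_name: str, model_format: str) -> bool:
        # 1. Check if the model is already loaded.
        if model_name in self._engines:
            return True
        loader = self._loaders.get(model_format)
        if loader is None:
            return False  # assumption: an unknown format is reported as failure
        try:
            # 2-3. Load the model file and initialize the inference engine.
            engine = loader(model_name)
            # 4. Warm up the model with one dummy call.
            engine(self._dummy_input)
        except Exception:
            return False
        # 5. Cache for reuse.
        self._engines[model_name] = engine
        return True
```

With a fake loader, a second `load_model("SuperPoint", "tensorrt")` call returns `True` without invoking the loader again, which is the caching behaviour the flow describes.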

Test Cases:

Load TensorRT model → succeeds
TensorRT unavailable → falls back to ONNX
Load all 4 models → all succeed

`get_inference_engine(model_name: str) -> InferenceEngine`

Description: Returns the inference engine for a previously loaded model.

Called By:

F07 Sequential VO (SuperPoint, LightGlue)
F08 Global Place Recognition (DINOv2)
F09 Metric Refinement (LiteSAM)

Output:

InferenceEngine:
    model_name: str
    format: str
    infer(input: np.ndarray) -> np.ndarray

Test Cases:

Get SuperPoint engine → returns engine
Call infer() → returns features
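
As a rough illustration of how a caller such as F07 might obtain and use an engine, the sketch below keeps engines in a dictionary keyed by model name. `StubEngine` and `EngineRegistry` are hypothetical names, plain Python lists stand in for `np.ndarray`, and raising `KeyError` for a model that was never loaded is an assumption, since the specification does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StubEngine:
    """Stand-in for InferenceEngine; infer_fn is a hypothetical backend call."""
    model_name: str
    format: str
    infer_fn: Callable[[List[float]], List[float]]

    def infer(self, input: List[float]) -> List[float]:
        return self.infer_fn(input)


class EngineRegistry:
    """Holds loaded engines and hands them out by model name."""

    def __init__(self) -> None:
        self._engines: Dict[str, StubEngine] = {}

    def register(self, engine: StubEngine) -> None:
        self._engines[engine.model_name] = engine

    def get_inference_engine(self, model_name: str) -> StubEngine:
        # Assumption: requesting a model that was never loaded is a caller error.
        try:
            return self._engines[model_name]
        except KeyError:
            raise KeyError(f"model {model_name!r} is not loaded") from None
```

A caller would then write `registry.get_inference_engine("SuperPoint").infer(frame)` and receive the features produced by whichever backend was registered.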

`optimize_to_tensorrt(model_name: str, onnx_path: str) -> str`

Description: Converts ONNX model to TensorRT for acceleration.

Called By: System initialization (one-time)

Input:

model_name: str
onnx_path: str  # Path to ONNX model

Output: str - Path to TensorRT engine

Processing Details:

FP16 precision (expected speedup of roughly 2-3x)
Graph fusion and kernel optimization
One-time conversion, cached for reuse
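
The "one-time conversion, cached for reuse" behaviour could look something like the sketch below. The actual TensorRT build is deliberately left out: `build_engine` is a hypothetical callable that the caller supplies, and `cache_dir` and the `.fp16.engine` file naming are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def optimize_to_tensorrt(
    model_name: str,
    onnx_path: str,
    build_engine: Callable[[Path, Path], None],
    cache_dir: str = "engines",
) -> str:
    """Return the path to the engine file, building it only if it is missing."""
    onnx = Path(onnx_path)
    if not onnx.is_file():
        raise FileNotFoundError(f"ONNX model not found: {onnx}")
    # Hypothetical naming scheme: one FP16 engine file per model.
    engine = Path(cache_dir) / f"{model_name}.fp16.engine"
    if not engine.exists():
        engine.parent.mkdir(parents=True, exist_ok=True)
        # The injected builder performs the real ONNX -> TensorRT conversion.
        build_engine(onnx, engine)
    return str(engine)
```

Calling the function twice with the same arguments runs the builder once and returns the same path both times. A cached engine is generally only valid for the GPU and TensorRT version it was built with, so a real cache key would likely need to include those.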

Test Cases:

Convert ONNX to TensorRT → engine created
Load TensorRT engine → inference faster than ONNX

`fallback_to_onnx(model_name: str) -> bool`

Description: Falls back to the ONNX version of a model when its TensorRT engine fails to load.

Called By: Internal (during load_model)

Processing Flow:

Detect TensorRT failure
Load ONNX model
Log warning
Continue with ONNX
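
One way to express this flow is a try/except around the TensorRT loader. The sketch below is an assumption-laden illustration: `load_with_fallback`, `load_tensorrt`, and `load_onnx` are hypothetical names, and treating any exception from the TensorRT loader as a "TensorRT failure" is a simplification.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("model_manager")  # hypothetical logger name


def load_with_fallback(
    model_name: str,
    load_tensorrt: Callable[[str], object],
    load_onnx: Callable[[str], object],
) -> Tuple[object, str]:
    """Try TensorRT first; on failure, load ONNX, log a warning, and continue."""
    try:
        return load_tensorrt(model_name), "tensorrt"
    except Exception as exc:
        # 1. TensorRT failure detected.
        # 2. Load the ONNX model instead (errors here propagate to the caller).
        engine = load_onnx(model_name)
        # 3. Log a warning so the slower path is visible to operators.
        logger.warning("TensorRT failed for %s (%s); using ONNX", model_name, exc)
        # 4. Continue with ONNX.
        return engine, "onnx"
```

Returning the format alongside the engine lets the caller record which backend is actually in use, which matters because the ONNX latencies listed under Performance are roughly three times the TensorRT ones.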

Test Cases:

TensorRT fails → ONNX loaded automatically
System continues functioning

`warmup_model(model_name: str) -> bool`

Description: Warms up the model by running inference on dummy input.

Called By: Internal (from load_model, after the inference engine is initialized)

Purpose: Initialize CUDA kernels, allocate GPU memory

Test Cases:

Warmup → first real inference runs without one-time initialization delay
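
A warmup routine can be as small as a loop of throwaway inference calls. In the sketch below, `infer` and `make_dummy_input` are hypothetical callables supplied by the caller, and the default of three iterations simply mirrors `warmup_iterations` in `ModelConfig`.

```python
from typing import Callable


def warmup_model(
    infer: Callable[[object], object],
    make_dummy_input: Callable[[], object],
    iterations: int = 3,
) -> bool:
    """Run a few throwaway inferences; return False if any of them raises."""
    try:
        for _ in range(iterations):
            # Results are discarded; the calls exist only to trigger
            # one-time setup such as kernel initialization and allocation.
            infer(make_dummy_input())
    except Exception:
        return False
    return True
```

In practice the dummy input would need the same shape and dtype as real input so that the same code paths are exercised.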

Integration Tests

Test 1: Model Loading

load_model("SuperPoint", "tensorrt")
load_model("LightGlue", "tensorrt")
load_model("DINOv2", "tensorrt")
load_model("LiteSAM", "tensorrt")
Verify all loaded

Test 2: Inference Performance

Get inference engine
Run inference 100 times
Measure average latency
Verify the average meets the performance targets listed under Non-Functional Requirements
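
The measurement in Test 2 might be sketched like this. `average_latency_ms` and `meets_target` are hypothetical helpers, and the 20% tolerance is an assumption introduced because the targets are given only as approximate ("~") figures.

```python
import statistics
import time
from typing import Callable


def average_latency_ms(infer: Callable[[object], object], sample: object,
                       runs: int = 100) -> float:
    """Call infer(sample) `runs` times and return the mean latency in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


def meets_target(avg_ms: float, target_ms: float, tolerance: float = 0.2) -> bool:
    """Assumption: a target is met if the mean is within `tolerance` above it."""
    return avg_ms <= target_ms * (1.0 + tolerance)
```

For GPU backends the timed call has to block until the result is ready, otherwise the loop measures only the time to enqueue work. Running the warmup before timing keeps one-time initialization out of the average.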

Test 3: Fallback Scenario

Simulate TensorRT failure
Verify fallback to ONNX
Verify inference still works

Non-Functional Requirements

Performance

SuperPoint: ~15ms (TensorRT), ~50ms (ONNX)
LightGlue: ~50ms (TensorRT), ~150ms (ONNX)
DINOv2: ~150ms (TensorRT), ~500ms (ONNX)
LiteSAM: ~60ms (TensorRT), ~200ms (ONNX)
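
To relate these figures to the <5s requirement, the arithmetic below sums the per-model targets for each format. It assumes, purely for illustration, that each model runs once and sequentially; the real pipeline may call a model several times and also spends time outside inference, so the result is an upper bound on headroom rather than a prediction.

```python
# Per-model latency targets in milliseconds, copied from the list above.
TARGET_MS = {
    "SuperPoint": {"tensorrt": 15, "onnx": 50},
    "LightGlue": {"tensorrt": 50, "onnx": 150},
    "DINOv2": {"tensorrt": 150, "onnx": 500},
    "LiteSAM": {"tensorrt": 60, "onnx": 200},
}
BUDGET_MS = 5000  # the <5s processing requirement


def total_inference_ms(fmt: str) -> int:
    """Sum of one call per model (assumption: sequential, one call each)."""
    return sum(per_format[fmt] for per_format in TARGET_MS.values())


def headroom_ms(fmt: str) -> int:
    """Budget left for everything other than model inference."""
    return BUDGET_MS - total_inference_ms(fmt)
```

Under these assumptions the four model calls total 275 ms with TensorRT and 900 ms with ONNX, leaving 4725 ms and 4100 ms of the budget respectively.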

Memory

GPU memory: ~4 GB for all 4 models

Reliability

Graceful fallback to ONNX
Automatic retry on transient errors
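
The specification does not say which errors count as transient or how retries are scheduled. As one possible reading, the sketch below retries a callable with exponential backoff; the `retry` helper, the choice of `TimeoutError` and `ConnectionError` as transient, and the delay values are all assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.1,
    transient: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a transient error, wait and try again up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except transient:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * (2 ** attempt))
    raise AssertionError("unreachable")
```

Errors outside the `transient` tuple propagate immediately, so a persistent problem such as a missing model file is not retried. The `sleep` parameter exists so tests can substitute a recorder instead of actually waiting.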

Dependencies

External Dependencies

TensorRT: NVIDIA inference optimization
ONNX Runtime: ONNX inference
PyTorch: Model weights (optional)
CUDA: GPU acceleration

Data Models

InferenceEngine

from abc import ABC, abstractmethod

import numpy as np

class InferenceEngine(ABC):
    model_name: str
    format: str
    
    @abstractmethod
    def infer(self, input: np.ndarray) -> np.ndarray:
        pass

ModelConfig

from pydantic import BaseModel  # assumption: BaseModel is pydantic's

class ModelConfig(BaseModel):
    model_name: str
    model_path: str
    format: str
    precision: str  # "fp16", "fp32"
    warmup_iterations: int = 3
