initial structure implemented

docs -> _docs
2026-04-22 23:46:36 +00:00 · 2025-12-01 14:20:56 +02:00
parent 9134c5db06
commit abc26d5c20
360 changed files with 3881 additions and 101 deletions
@@ -0,0 +1,63 @@
+# Feature: Model Lifecycle Management
+
+## Description
+
+Manages the complete lifecycle of ML models including loading, caching, warmup, and unloading. Handles all four models (SuperPoint, LightGlue, DINOv2, LiteSAM) with support for multiple formats (TensorRT, ONNX, PyTorch).
+
+## Component APIs Implemented
+
+### `load_model(model_name: str, model_format: str) -> bool`
+- Checks whether the model is already loaded (a cache hit returns immediately)
+- Loads the model in the specified format
+- Initializes the inference engine
+- Triggers warmup
+- Caches the engine for reuse
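The load flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the component's implementation: `SketchModelManager`, the `loaders` mapping, and the name-to-path dictionary are hypothetical stand-ins, and warmup is reduced to a no-op hook.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("model_manager_sketch")

# Hypothetical: a loader takes a file path and returns an engine object.
Loader = Callable[[str], Any]


class SketchModelManager:
    KNOWN_MODELS = {"SuperPoint", "LightGlue", "DINOv2", "LiteSAM"}

    def __init__(self, loaders: Dict[str, Loader], paths: Dict[str, str]) -> None:
        self._loaders = loaders          # format name -> loader function
        self._paths = paths              # model name -> file path (assumed layout)
        self._cache: Dict[str, Any] = {}

    def load_model(self, model_name: str, model_format: str) -> bool:
        if model_name in self._cache:    # cache hit: no reload
            return True
        if model_name not in self.KNOWN_MODELS or model_format not in self._loaders:
            logger.error("Unknown model or format: %s / %s", model_name, model_format)
            return False
        try:
            # Look up the path and build the engine; both can fail.
            engine = self._loaders[model_format](self._paths[model_name])
        except (KeyError, OSError) as exc:
            logger.error("Failed to load %s: %s", model_name, exc)
            return False
        self._warmup(engine)             # warm up before the first real inference
        self._cache[model_name] = engine # keep the engine for later calls
        return True

    def _warmup(self, engine: Any) -> None:
        """Placeholder; a real warmup would run dummy inferences."""
```

Checking the cache first keeps repeated `load_model` calls cheap, which matches the expectation in UT-16.01-03 that a cached model returns True without reloading.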
+
+### `warmup_model(model_name: str) -> bool`
+- Warms up model with dummy input
+- Initializes CUDA kernels
+- Pre-allocates GPU memory
+- Ensures first real inference is fast
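A warmup pass along these lines might look like the sketch below. The `warmup` helper, the `Engine` protocol, and `StubEngine` are hypothetical names used only for illustration. Plain Python lists stand in for the `np.ndarray` inputs a real engine would take, and the default of three iterations mirrors `warmup_iterations` in `ModelConfig`.

```python
from typing import Callable, List, Protocol


class Engine(Protocol):
    def infer(self, input: List[float]) -> List[float]: ...


def warmup(engine: Engine, make_dummy_input: Callable[[], List[float]],
           iterations: int = 3) -> bool:
    """Run a few throwaway inferences so one-time setup cost is paid up front."""
    if iterations < 1:
        return False
    dummy = make_dummy_input()
    try:
        for _ in range(iterations):
            engine.infer(dummy)  # outputs are discarded on purpose
    except RuntimeError:
        return False
    return True


class StubEngine:
    """Stand-in engine that only counts how often it was called."""

    def __init__(self) -> None:
        self.calls = 0

    def infer(self, input: List[float]) -> List[float]:
        self.calls += 1
        return [x * 2 for x in input]
```

With a real GPU backend, the first call is typically where kernel setup and memory allocation happen, which is why the dummy runs are kept out of any latency measurement.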
+
+## External Tools and Services
+
+- **TensorRT**: Loading TensorRT engine files
+- **ONNX Runtime**: Loading ONNX models
+- **PyTorch**: Loading PyTorch model weights (optional)
+- **CUDA**: GPU memory allocation
+
+## Internal Methods
+
+| Method | Purpose |
+|--------|---------|
+| `_check_model_cache(model_name)` | Check if model already loaded |
+| `_load_tensorrt_engine(path)` | Load TensorRT engine from file |
+| `_load_onnx_model(path)` | Load ONNX model from file |
+| `_allocate_gpu_memory(model)` | Allocate GPU memory for model |
+| `_create_dummy_input(model_name)` | Create appropriate dummy input for warmup |
+| `_cache_model(model_name, engine)` | Store loaded model in cache |
+
+## Unit Tests
+
+| Test | Description | Expected Result |
+|------|-------------|-----------------|
+| UT-16.01-01 | Load TensorRT model | Model loaded, returns True |
+| UT-16.01-02 | Load ONNX model | Model loaded, returns True |
+| UT-16.01-03 | Load already cached model | Returns True immediately (no reload) |
+| UT-16.01-04 | Load invalid model name | Returns False, logs error |
+| UT-16.01-05 | Load invalid model path | Returns False, logs error |
+| UT-16.01-06 | Warmup SuperPoint | CUDA kernels initialized |
+| UT-16.01-07 | Warmup LightGlue | CUDA kernels initialized |
+| UT-16.01-08 | Warmup DINOv2 | CUDA kernels initialized |
+| UT-16.01-09 | Warmup LiteSAM | CUDA kernels initialized |
+| UT-16.01-10 | Warmup unloaded model | Returns False |
+
+## Integration Tests
+
+| Test | Description | Expected Result |
+|------|-------------|-----------------|
+| IT-16.01-01 | Load all 4 models sequentially | All models loaded successfully |
+| IT-16.01-02 | Load + warmup cycle for each model | All models ready for inference |
+| IT-16.01-03 | GPU memory allocation after loading all models | ~4 GB GPU memory used |
+| IT-16.01-04 | First inference after warmup | Latency within target range |
+
@@ -0,0 +1,68 @@
+# Feature: Inference Engine Provisioning
+
+## Description
+
+Provides inference engines to consuming components and handles TensorRT optimization with automatic ONNX fallback. Ensures a consistent inference interface regardless of the underlying format and meets the <5s processing requirement through acceleration.
+
+## Component APIs Implemented
+
+### `get_inference_engine(model_name: str) -> InferenceEngine`
+- Returns inference engine for specified model
+- Engine provides unified `infer(input: np.ndarray) -> np.ndarray` interface
+- Consumers: F07 Sequential VO (SuperPoint, LightGlue), F08 Global Place Recognition (DINOv2), F09 Metric Refinement (LiteSAM)
+
+### `optimize_to_tensorrt(model_name: str, onnx_path: str) -> str`
+- Converts ONNX model to TensorRT engine
+- Applies FP16 precision (2-3x speedup)
+- Performs graph fusion and kernel optimization
+- One-time conversion, result cached
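The "one-time conversion, result cached" behaviour can be sketched as below. The actual TensorRT build step is deliberately left out: `build_engine` is a hypothetical callable supplied by the caller, and the `.engine` suffix next to the ONNX file is an assumed naming convention, not something the spec defines.

```python
from pathlib import Path
from typing import Callable


def optimize_once(onnx_path: str,
                  build_engine: Callable[[Path, Path], None]) -> str:
    """Return the engine path, building the engine only if no cached file exists."""
    onnx = Path(onnx_path)
    engine = onnx.with_suffix(".engine")  # assumed naming convention
    if not engine.exists():
        build_engine(onnx, engine)        # expensive step, done at most once
    return str(engine)
```

A cached TensorRT engine is generally tied to the TensorRT version and GPU it was built on, so a production version would probably also need a way to invalidate the cached file after an upgrade or a changed ONNX model.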
+
+### `fallback_to_onnx(model_name: str) -> bool`
+- Detects TensorRT failure
+- Loads ONNX model as fallback
+- Logs warning for monitoring
+- Ensures system continues functioning
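The fallback path can be illustrated with a small sketch. `load_with_fallback` and the two loader callables are hypothetical; the broad `except Exception` is a simplification, since the spec leaves the decision of which errors qualify to `_detect_tensorrt_failure`.

```python
import logging
from typing import Any, Callable, Tuple

logger = logging.getLogger("model_manager_sketch")


def load_with_fallback(model_name: str,
                       load_tensorrt: Callable[[str], Any],
                       load_onnx: Callable[[str], Any]) -> Tuple[Any, str]:
    """Try TensorRT first; on failure log a warning and load the ONNX model."""
    try:
        return load_tensorrt(model_name), "tensorrt"
    except Exception as exc:  # a real implementation would narrow this
        logger.warning("TensorRT failed for %s (%s); falling back to ONNX",
                       model_name, exc)
        return load_onnx(model_name), "onnx"
```

Returning the format alongside the engine lets callers and monitoring see which backend is actually in use. If the ONNX load also fails, the exception propagates, which is one possible design choice among several.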
+
+## External Tools and Services
+
+- **TensorRT**: Model optimization and inference
+- **ONNX Runtime**: Fallback inference
+- **CUDA**: GPU execution
+
+## Internal Methods
+
+| Method | Purpose |
+|--------|---------|
+| `_get_cached_engine(model_name)` | Retrieve engine from cache |
+| `_build_tensorrt_engine(onnx_path)` | Build TensorRT engine from ONNX |
+| `_apply_fp16_optimization(builder)` | Enable FP16 precision in TensorRT |
+| `_cache_tensorrt_engine(model_name, path)` | Save TensorRT engine to disk |
+| `_detect_tensorrt_failure(error)` | Determine if error requires ONNX fallback |
+| `_create_inference_wrapper(engine, format)` | Create unified InferenceEngine interface |
+
+## Unit Tests
+
+| Test | Description | Expected Result |
+|------|-------------|-----------------|
+| UT-16.02-01 | Get SuperPoint engine | Returns valid InferenceEngine |
+| UT-16.02-02 | Get LightGlue engine | Returns valid InferenceEngine |
+| UT-16.02-03 | Get DINOv2 engine | Returns valid InferenceEngine |
+| UT-16.02-04 | Get LiteSAM engine | Returns valid InferenceEngine |
+| UT-16.02-05 | Get unloaded model engine | Raises error or returns None |
+| UT-16.02-06 | InferenceEngine.infer() with valid input | Returns features array |
+| UT-16.02-07 | Optimize ONNX to TensorRT | TensorRT engine file created |
+| UT-16.02-08 | TensorRT optimization with FP16 | Engine uses FP16 precision |
+| UT-16.02-09 | Fallback to ONNX on TensorRT failure | ONNX model loaded, returns True |
+| UT-16.02-10 | Fallback logs warning | Warning logged |
+
+## Integration Tests
+
+| Test | Description | Expected Result |
+|------|-------------|-----------------|
+| IT-16.02-01 | SuperPoint inference 100 iterations | Avg latency ~15ms (TensorRT) or ~50ms (ONNX) |
+| IT-16.02-02 | LightGlue inference 100 iterations | Avg latency ~50ms (TensorRT) or ~150ms (ONNX) |
+| IT-16.02-03 | DINOv2 inference 100 iterations | Avg latency ~150ms (TensorRT) or ~500ms (ONNX) |
+| IT-16.02-04 | LiteSAM inference 100 iterations | Avg latency ~60ms (TensorRT) or ~200ms (ONNX) |
+| IT-16.02-05 | Simulate TensorRT failure → ONNX fallback | System continues with ONNX |
+| IT-16.02-06 | Full pipeline: optimize → load → infer | End-to-end works |
+
@@ -0,0 +1,224 @@
+# Model Manager
+
+## Interface Definition
+
+**Interface Name**: `IModelManager`
+
+### Interface Methods
+
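+```python
+from abc import ABC, abstractmethod
+
+
+class IModelManager(ABC):
+    # "InferenceEngine" is a forward reference; see Data Models.
+    @abstractmethod
+    def load_model(self, model_name: str, model_format: str) -> bool:
+        """Load a model in the given format; return True if it is loaded."""
+
+    @abstractmethod
+    def get_inference_engine(self, model_name: str) -> "InferenceEngine":
+        """Return the inference engine for a loaded model."""
+
+    @abstractmethod
+    def optimize_to_tensorrt(self, model_name: str, onnx_path: str) -> str:
+        """Convert an ONNX model to TensorRT; return the engine path."""
+
+    @abstractmethod
+    def fallback_to_onnx(self, model_name: str) -> bool:
+        """Load the ONNX model after a TensorRT failure."""
+
+    @abstractmethod
+    def warmup_model(self, model_name: str) -> bool:
+        """Run the model on dummy input to warm it up."""
+```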
+
+## Component Description
+
+### Responsibilities
+- Load ML models (TensorRT primary, ONNX fallback)
+- Manage model lifecycle (loading, unloading, warmup)
+- Provide inference engines for:
+  - SuperPoint (feature extraction)
+  - LightGlue (feature matching)
+  - DINOv2 (global descriptors)
+  - LiteSAM (cross-view matching)
+- Handle TensorRT optimization and ONNX fallback
+- Ensure the <5s processing requirement is met through acceleration
+
+### Scope
+- Model loading and caching
+- TensorRT optimization
+- ONNX fallback handling
+- Inference engine abstraction
+- GPU memory management
+
+## API Methods
+
+### `load_model(model_name: str, model_format: str) -> bool`
+
+**Description**: Loads model in specified format.
+
+**Called By**: F02.1 Flight Lifecycle Manager (during system initialization)
+
+**Input**:
+```python
+model_name: str  # "SuperPoint", "LightGlue", "DINOv2", "LiteSAM"
+model_format: str  # "tensorrt", "onnx", "pytorch"
+```
+
+**Output**: `bool` - True if the model is loaded (including a cache hit), False on failure
+
+**Processing Flow**:
+1. Check if model already loaded (return True on cache hit)
+2. Load model file
+3. Initialize inference engine
+4. Warm up model
+5. Cache for reuse
+
+**Test Cases**:
+1. Load TensorRT model → succeeds
+2. TensorRT unavailable → fallback to ONNX
+3. Load all 4 models → all succeed
+
+---
+
+### `get_inference_engine(model_name: str) -> InferenceEngine`
+
+**Description**: Gets inference engine for a model.
+
+**Called By**:
+- F07 Sequential VO (SuperPoint, LightGlue)
+- F08 Global Place Recognition (DINOv2)
+- F09 Metric Refinement (LiteSAM)
+
+**Output**:
+```python
+import numpy as np
+
+class InferenceEngine:  # abstract base; full definition under Data Models
+    model_name: str
+    format: str
+
+    def infer(self, input: np.ndarray) -> np.ndarray: ...
+```
+
+**Test Cases**:
+1. Get SuperPoint engine → returns engine
+2. Call infer() → returns features
+
+---
+
+### `optimize_to_tensorrt(model_name: str, onnx_path: str) -> str`
+
+**Description**: Converts ONNX model to TensorRT for acceleration.
+
+**Called By**: System initialization (one-time)
+
+**Input**:
+```python
+model_name: str
+onnx_path: str  # Path to ONNX model
+```
+
+**Output**: `str` - Path to TensorRT engine
+
+**Processing Details**:
+- FP16 precision (2-3x speedup)
+- Graph fusion and kernel optimization
+- One-time conversion, cached for reuse
+
+**Test Cases**:
+1. Convert ONNX to TensorRT → engine created
+2. Load TensorRT engine → inference faster than ONNX
+
+---
+
+### `fallback_to_onnx(model_name: str) -> bool`
+
+**Description**: Falls back to ONNX if TensorRT fails.
+
+**Called By**: Internal (during load_model)
+
+**Processing Flow**:
+1. Detect TensorRT failure
+2. Load ONNX model
+3. Log warning
+4. Continue with ONNX
+
+**Test Cases**:
+1. TensorRT fails → ONNX loaded automatically
+2. System continues functioning
+
+---
+
+### `warmup_model(model_name: str) -> bool`
+
+**Description**: Warms up model with dummy input.
+
+**Called By**: Internal (after load_model)
+
+**Purpose**: Initialize CUDA kernels, allocate GPU memory
+
+**Test Cases**:
+1. Warmup → first real inference fast
+
+## Integration Tests
+
+### Test 1: Model Loading
+1. load_model("SuperPoint", "tensorrt")
+2. load_model("LightGlue", "tensorrt")
+3. load_model("DINOv2", "tensorrt")
+4. load_model("LiteSAM", "tensorrt")
+5. Verify all loaded
+
+### Test 2: Inference Performance
+1. Get inference engine
+2. Run inference 100 times
+3. Measure average latency
+4. Verify the average meets the performance targets
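The measurement in Test 2 could be sketched as below. `average_latency_ms` is a hypothetical helper; `run_once` stands for a single call to an engine's `infer()` with a fixed input.

```python
import time
from typing import Callable


def average_latency_ms(run_once: Callable[[], object], iterations: int = 100,
                       warmup_iterations: int = 3) -> float:
    """Average wall-clock latency of run_once in milliseconds."""
    for _ in range(warmup_iterations):  # excluded from the measurement
        run_once()
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iterations
```

For GPU backends that execute asynchronously, `run_once` would need to wait for the result (for example by copying the output back to the host) before returning. Otherwise the timer measures only how long it takes to queue the work.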
+
+### Test 3: Fallback Scenario
+1. Simulate TensorRT failure
+2. Verify fallback to ONNX
+3. Verify inference still works
+
+## Non-Functional Requirements
+
+### Performance
+- **SuperPoint**: ~15ms (TensorRT), ~50ms (ONNX)
+- **LightGlue**: ~50ms (TensorRT), ~150ms (ONNX)
+- **DINOv2**: ~150ms (TensorRT), ~500ms (ONNX)
+- **LiteSAM**: ~60ms (TensorRT), ~200ms (ONNX)
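As a rough sanity check, the targets above can be added up against the <5s requirement. The sketch assumes each model runs once per processed frame and ignores all non-inference work. Neither assumption comes from the spec, so the totals are only a lower bound on real processing time.

```python
# Latency targets in milliseconds, copied from the Performance list above.
TARGETS_MS = {
    "SuperPoint": {"tensorrt": 15, "onnx": 50},
    "LightGlue": {"tensorrt": 50, "onnx": 150},
    "DINOv2": {"tensorrt": 150, "onnx": 500},
    "LiteSAM": {"tensorrt": 60, "onnx": 200},
}
BUDGET_MS = 5000  # the <5s processing requirement


def total_ms(fmt: str) -> int:
    """Sum of the per-model targets for one format (assumes one call per model)."""
    return sum(targets[fmt] for targets in TARGETS_MS.values())


for fmt in ("tensorrt", "onnx"):
    used = total_ms(fmt)
    print(f"{fmt}: {used} ms of {BUDGET_MS} ms ({used / BUDGET_MS:.1%})")
```

Under these assumptions the totals are 275 ms for TensorRT and 900 ms for ONNX, so inference alone stays well inside the budget even on the fallback path.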
+
+### Memory
+- GPU memory: ~4 GB for all 4 models
+
+### Reliability
+- Graceful fallback to ONNX
+- Automatic retry on transient errors
+
+## Dependencies
+
+### External Dependencies
+- **TensorRT**: NVIDIA inference optimization
+- **ONNX Runtime**: ONNX inference
+- **PyTorch**: Model weights (optional)
+- **CUDA**: GPU acceleration
+
+## Data Models
+
+### InferenceEngine
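+```python
+from abc import ABC, abstractmethod
+
+import numpy as np
+
+
+class InferenceEngine(ABC):
+    model_name: str
+    format: str  # "tensorrt", "onnx", "pytorch"
+
+    @abstractmethod
+    def infer(self, input: np.ndarray) -> np.ndarray:
+        """Run inference on one input array and return the output array."""
+```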
+
+### ModelConfig
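+```python
+from pydantic import BaseModel  # assumed: BaseModel is pydantic's
+
+
+class ModelConfig(BaseModel):
+    model_name: str
+    model_path: str
+    format: str  # "tensorrt", "onnx", "pytorch"
+    precision: str  # "fp16", "fp32"
+    warmup_iterations: int = 3
+```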
+