# Integration Test: Model Manager

## Summary

Validate the Model Manager component responsible for loading, managing, and providing access to TensorRT-optimized deep learning models used by the vision pipeline.

## Component Under Test

**Component**: Model Manager
**Location**: `gps_denied_15_model_manager`
**Dependencies**:
- TensorRT runtime
- ONNX model files
- GPU (NVIDIA RTX 2060/3070)
- Configuration Manager
- File system access

## Detailed Description

This test validates that the Model Manager can:

1. Load TensorRT engines for SuperPoint, LightGlue, DINOv2, and LiteSAM
2. Convert ONNX models to TensorRT engines if needed
3. Manage the model lifecycle (load, warm-up, unload)
4. Provide thread-safe access to models for concurrent requests
5. Handle GPU memory allocation efficiently
6. Support FP16 precision for performance
7. Cache compiled engines for fast startup
8. Detect and adapt to available GPU capabilities
9. Handle model loading failures gracefully
10. Monitor GPU utilization and memory

Per the AC-7 requirement of <5 seconds per image, models must be optimized with TensorRT and loaded efficiently.
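The lifecycle and concurrency requirements above can be sketched as a minimal, framework-agnostic manager. This is an illustrative sketch only: the `ModelManager` class and its `load_engine` callback are hypothetical stand-ins for the component's real API and for TensorRT engine deserialization, not the actual implementation.

```python
import threading
from typing import Any, Callable, Dict

class ModelManager:
    """Minimal sketch of a thread-safe model registry (hypothetical API)."""

    def __init__(self, load_engine: Callable[[str], Any]):
        # load_engine stands in for real TensorRT engine deserialization
        self._load_engine = load_engine
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()  # guards load/unload against concurrent access

    def load(self, name: str) -> None:
        """Load a model once; repeated calls are no-ops (idempotent)."""
        with self._lock:
            if name not in self._models:
                self._models[name] = self._load_engine(name)

    def get(self, name: str) -> Any:
        """Return a loaded model, failing clearly if it was never loaded."""
        with self._lock:
            if name not in self._models:
                raise KeyError(f"model '{name}' is not loaded")
            return self._models[name]

    def unload(self, name: str) -> None:
        """Drop a model; real code would free its GPU memory here."""
        with self._lock:
            self._models.pop(name, None)

# Usage with a fake backend standing in for TensorRT:
mgr = ModelManager(load_engine=lambda name: f"engine:{name}")
mgr.load("superpoint")
print(mgr.get("superpoint"))  # engine:superpoint
```

A single lock serializing load/get/unload is the simplest way to satisfy the thread-safety requirement; a production version would likely add per-model reference counting so a model cannot be unloaded while an inference request still holds it.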
## Input Data

### Test Case 1: Load SuperPoint Model
- **Model**: SuperPoint feature detector
- **Input Size**: 1024x683 (FullHD downscaled)
- **Format**: TensorRT engine or ONNX → TensorRT
- **Expected**: Model loaded, ready for inference

### Test Case 2: Load LightGlue Model
- **Model**: LightGlue feature matcher
- **Input**: Two sets of 256 features each
- **Format**: TensorRT engine
- **Expected**: Model loaded with correct input/output bindings

### Test Case 3: Load DINOv2 Model
- **Model**: DINOv2-Small for AnyLoc
- **Input Size**: 512x512
- **Format**: TensorRT engine
- **Expected**: Model loaded, optimized for batch processing

### Test Case 4: Load LiteSAM Model
- **Model**: LiteSAM for cross-view matching
- **Input**: UAV image + satellite tile
- **Format**: TensorRT engine
- **Expected**: Model loaded with multi-input support

### Test Case 5: Cold Start (All Models)
- **Scenario**: Load all 4 models from scratch
- **Expected**: All models ready within 10 seconds

### Test Case 6: Warm Start (Cached Engines)
- **Scenario**: Load pre-compiled TensorRT engines
- **Expected**: All models ready within 2 seconds

### Test Case 7: Model Inference (SuperPoint)
- **Input**: Test image AD000001.jpg (1024x683)
- **Expected Output**: Keypoints and descriptors
- **Expected**: Inference time <15ms on RTX 3070

### Test Case 8: Concurrent Inference
- **Scenario**: 5 simultaneous inference requests to SuperPoint
- **Expected**: All complete successfully, no crashes

### Test Case 9: FP16 Precision
- **Model**: SuperPoint in FP16 mode
- **Expected**: 2-3x speedup vs FP32, minimal accuracy loss

### Test Case 10: GPU Memory Management
- **Scenario**: Load all models, monitor GPU memory
- **Expected**: Total GPU memory < 6GB (fits on RTX 2060)

### Test Case 11: Model Unload and Reload
- **Scenario**: Unload SuperPoint, reload it
- **Expected**: Successful reload, no memory leak

### Test Case 12: Handle Missing Model File
- **Scenario**: Attempt to load a non-existent model
- **Expected**: Clear error message, graceful failure

### Test Case 13: Handle Incompatible GPU
- **Scenario**: Simulate a GPU without the required compute capability
- **Expected**: Incompatibility detected and reported

### Test Case 14: ONNX to TensorRT Conversion
- **Model**: ONNX model file
- **Expected**: Automatic conversion to TensorRT, with caching

### Test Case 15: Model Warm-up
- **Scenario**: Run warm-up inference after loading
- **Expected**: First real inference is fast (no CUDA init overhead)

## Expected Output

For each test case (numeric fields shown with placeholder values):

```json
{
  "model_name": "superpoint|lightglue|dinov2|litesam",
  "load_status": "success|failed",
  "load_time_ms": 0.0,
  "engine_path": "path/to/tensorrt/engine",
  "input_shapes": [
    {"name": "input", "shape": [1, 3, 1024, 683]}
  ],
  "output_shapes": [
    {"name": "keypoints", "shape": [1, 256, 2]},
    {"name": "descriptors", "shape": [1, 256, 256]}
  ],
  "precision": "fp32|fp16",
  "gpu_memory_mb": 0,
  "inference_time_ms": 0.0,
  "error": "string|null"
}
```

## Success Criteria

**Test Case 1 (SuperPoint)**:
- load_status = "success"
- load_time_ms < 3000
- Model ready for inference

**Test Case 2 (LightGlue)**:
- load_status = "success"
- load_time_ms < 3000
- Correct input/output bindings

**Test Case 3 (DINOv2)**:
- load_status = "success"
- load_time_ms < 5000
- Supports batch processing

**Test Case 4 (LiteSAM)**:
- load_status = "success"
- load_time_ms < 5000
- Multi-input configuration correct

**Test Case 5 (Cold Start)**:
- All 4 models loaded
- Total time < 10 seconds
- All models functional

**Test Case 6 (Warm Start)**:
- All 4 models loaded from cache
- Total time < 2 seconds
- Faster than cold start

**Test Case 7 (Inference)**:
- Inference successful
- Output shapes correct
- inference_time_ms < 15 (RTX 3070) or < 25 (RTX 2060)

**Test Case 8 (Concurrent)**:
- All 5 requests complete
- No crashes or errors
- Average inference time < 50ms

**Test Case 9 (FP16)**:
- FP16 engine loads successfully
- inference_time_ms < 8 (roughly 2x speedup)
- Output quality acceptable
**Test Case 10 (GPU Memory)**:
- gpu_memory_mb < 6000 (fits RTX 2060 with 6GB VRAM)
- No out-of-memory errors
- Memory usage stable

**Test Case 11 (Unload/Reload)**:
- Unload successful
- Reload successful
- Memory freed after unload
- No memory leak (< 100MB difference)

**Test Case 12 (Missing File)**:
- load_status = "failed"
- Error message clear and specific
- No crash

**Test Case 13 (Incompatible GPU)**:
- Incompatibility detected
- Error message informative
- Suggests a compatible GPU or CPU fallback

**Test Case 14 (ONNX Conversion)**:
- Conversion successful
- TensorRT engine cached
- Next load uses the cache (faster)

**Test Case 15 (Warm-up)**:
- Warm-up completes in < 1 second
- First real inference fast (< 20ms)
- No CUDA initialization delays

## Maximum Expected Time

- **Load single model (cold)**: < 3 seconds
- **Load single model (warm)**: < 500ms
- **Load all models (cold)**: < 10 seconds
- **Load all models (warm)**: < 2 seconds
- **Single inference**: < 15ms (RTX 3070), < 25ms (RTX 2060)
- **ONNX conversion**: < 60 seconds (one-time cost)
- **Total test suite**: < 180 seconds

## Test Execution Steps

1. **Setup Phase**:
   a. Verify GPU availability and capabilities
   b. Check the TensorRT installation
   c. Prepare model files (ONNX or pre-compiled engines)
   d. Initialize the Model Manager

2. **Test Cases 1-4 - Load Individual Models** (for each model):
   a. Call load_model(model_name)
   b. Measure load time
   c. Verify the model is ready
   d. Check input/output shapes
   e. Monitor GPU memory

3. **Test Case 5 - Cold Start**:
   a. Clear any cached engines
   b. Load all 4 models
   c. Measure total time
   d. Verify all are functional

4. **Test Case 6 - Warm Start**:
   a. Unload all models
   b. Load all 4 models again (engines cached)
   c. Measure total time
   d. Compare with cold start

5. **Test Case 7 - Inference**:
   a. Load the test image
   b. Run SuperPoint inference
   c. Measure inference time
   d. Validate the output format

6. **Test Case 8 - Concurrent**:
   a. Prepare 5 inference requests
   b. Submit all simultaneously
   c. Wait for all completions
   d. Check for errors

7. **Test Case 9 - FP16**:
   a. Load SuperPoint in FP16 mode
   b. Run inference
   c. Compare speed with FP32
   d. Validate output quality

8. **Test Case 10 - GPU Memory**:
   a. Query GPU memory before loading
   b. Load all models
   c. Query GPU memory after loading
   d. Calculate usage

9. **Test Case 11 - Unload/Reload**:
   a. Unload SuperPoint
   b. Check memory is freed
   c. Reload SuperPoint
   d. Verify no memory leak

10. **Test Case 12 - Missing File**:
    a. Attempt to load a non-existent model
    b. Catch the error
    c. Verify the error message
    d. Check there is no crash

11. **Test Case 13 - Incompatible GPU**:
    a. Check GPU compute capability
    b. If incompatible, verify detection
    c. Check error handling

12. **Test Case 14 - ONNX Conversion**:
    a. Provide an ONNX model file
    b. Trigger conversion
    c. Verify the TensorRT engine is created
    d. Check caching works

13. **Test Case 15 - Warm-up**:
    a. Load the model
    b. Run warm-up inference(s)
    c. Run a real inference
    d. Verify fast execution

## Pass/Fail Criteria

**Overall Test Passes If**:
- All models load successfully (cold and warm start)
- Total load time < 10 seconds (cold), < 2 seconds (warm)
- Inference times meet the AC-7 requirement (contribute to < 5s per image)
- GPU memory usage < 6GB
- No memory leaks
- Concurrent inference works correctly
- Error handling is robust

**Test Fails If**:
- Any model fails to load (with valid files)
- Load time exceeds 15 seconds
- Inference time > 50ms on RTX 3070 or > 100ms on RTX 2060
- GPU memory exceeds 8GB
- Memory leaks detected (> 500MB growth)
- Concurrent inference causes crashes
- Error handling is invalid

## Additional Validation

**TensorRT Optimization** - verify:
- Layer fusion
- Kernel auto-tuning
- Dynamic tensor memory allocation
- FP16/INT8 quantization support

**Model Configuration**:
- Input preprocessing (normalization, resizing)
- Output postprocessing (NMS, thresholding)
- Batch size configuration
- Dynamic vs static shapes

**Performance Benchmarks**:
**SuperPoint**:
- FP32: ~15ms on RTX 3070, ~25ms on RTX 2060
- FP16: ~8ms on RTX 3070, ~12ms on RTX 2060
- Keypoints extracted: 200-500 per image

**LightGlue**:
- FP32: ~50ms on RTX 3070, ~100ms on RTX 2060
- FP16: ~30ms on RTX 3070, ~60ms on RTX 2060
- Matches: 50-200 per image pair

**DINOv2**:
- FP32: ~150ms on RTX 3070, ~250ms on RTX 2060
- FP16: ~80ms on RTX 3070, ~130ms on RTX 2060
- Feature dimension: 384 (DINOv2-Small)

**LiteSAM**:
- FP32: ~60ms on RTX 3070, ~120ms on RTX 2060
- FP16: ~40ms on RTX 3070, ~70ms on RTX 2060
- Correspondence points: 100-500

**Total Vision Pipeline** (L1 + L2 or L3):
- Target: < 500ms per image (to meet AC-7's < 5s total)
- Current estimate: 15 + 50 = 65ms (L1) or 60ms (L3)
- Well within budget

**Error Scenarios**:
1. **CUDA Out of Memory**: Provide a clear error with suggestions
2. **Model File Corrupted**: Detect and report
3. **TensorRT Version Mismatch**: Handle gracefully
4. **GPU Driver Issues**: Detect and suggest an update
5. **Insufficient Compute Capability**: Reject with a clear message

**Thread Safety**:
- Multiple threads calling inference simultaneously
- Model loading during inference
- Thread-safe reference counting
- No race conditions

**Resource Cleanup**:
- GPU memory freed on model unload
- CUDA contexts released properly
- File handles closed
- No resource leaks

**Monitoring and Logging**:
- Log model load times
- Track inference times (min/max/avg)
- Monitor GPU utilization
- Alert on performance degradation
- Track memory usage trends

**Compatibility Matrix**:

| GPU Model | VRAM | Compute Capability | FP16 Support | Expected Performance |
|-----------|------|--------------------|--------------|----------------------|
| RTX 2060 | 6GB | 7.5 | Yes | Baseline (25ms SuperPoint) |
| RTX 3070 | 8GB | 8.6 | Yes | ~1.5x faster (15ms) |
| RTX 4070 | 12GB | 8.9 | Yes | ~2x faster (10ms) |

**Model Versions and Updates**:
- Support multiple model versions
- Graceful migration to new versions
- A/B testing capability
- Rollback on performance regression

**Configuration Options**:
- Model path configuration
- Precision selection (FP32/FP16)
- Batch size tuning
- Workspace size for the TensorRT builder
- Engine cache location
- Warm-up settings
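The configuration options listed above could be captured in a small validated settings object. This is a minimal sketch only: every field name and default value below is an assumption for illustration, not a confirmed option of the actual component.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ModelManagerConfig:
    """Illustrative configuration sketch; all field names are assumptions."""
    model_dir: Path = Path("models")             # where ONNX files / engines live
    precision: str = "fp16"                      # "fp32" or "fp16"
    batch_size: int = 1
    workspace_size_mb: int = 1024                # TensorRT builder workspace budget
    engine_cache_dir: Path = Path("engine_cache")
    warmup_iterations: int = 3                   # warm-up passes run after load

    def validate(self) -> None:
        """Reject unsupported settings early, before any engine is built."""
        if self.precision not in ("fp32", "fp16"):
            raise ValueError(f"unsupported precision: {self.precision}")
        if self.batch_size < 1:
            raise ValueError("batch_size must be >= 1")

# Usage:
cfg = ModelManagerConfig(precision="fp16")
cfg.validate()
print(cfg.precision)  # fp16
```

Validating at construction time keeps failures like an unsupported precision in Test Case 12/13 territory (clear error, graceful failure) rather than surfacing mid-pipeline as a TensorRT build error.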