mirror of https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-23 05:06:38 +00:00
initial structure implemented (docs -> _docs)

# Integration Test: Model Manager

## Summary

Validate the Model Manager component responsible for loading, managing, and providing access to the TensorRT-optimized deep learning models used by the vision pipeline.

## Component Under Test

**Component**: Model Manager

**Location**: `gps_denied_15_model_manager`

**Dependencies**:

- TensorRT runtime
- ONNX model files
- GPU (NVIDIA RTX 2060/3070)
- Configuration Manager
- File system access

## Detailed Description

This test validates that the Model Manager can:

1. Load TensorRT engines for SuperPoint, LightGlue, DINOv2, and LiteSAM
2. Convert ONNX models to TensorRT engines if needed
3. Manage the model lifecycle (load, warm-up, unload)
4. Provide thread-safe access to models for concurrent requests
5. Handle GPU memory allocation efficiently
6. Support FP16 precision for performance
7. Cache compiled engines for fast startup
8. Detect and adapt to available GPU capabilities
9. Handle model loading failures gracefully
10. Monitor GPU utilization and memory

Per the AC-7 requirement of <5 seconds per image, models must be optimized with TensorRT and loaded efficiently.
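
The lifecycle in the list above (load, warm-up, infer, unload, with thread-safe access) can be sketched as follows. This is a minimal stub, not the component's actual API: all class and method names are assumptions, and the "engine" here is a placeholder where the real code would hold a deserialized TensorRT engine.

```python
# Hypothetical sketch of the Model Manager lifecycle: load -> warm-up ->
# infer -> unload. All names are assumptions; the real component wraps
# TensorRT engines rather than this dictionary stub.
import threading

class ModelManager:
    def __init__(self):
        self._models = {}                 # name -> loaded "engine" stub
        self._lock = threading.Lock()     # thread-safe access (item 4 above)

    def load_model(self, name: str) -> None:
        with self._lock:
            if name not in self._models:
                # Real code would deserialize a cached TensorRT engine here,
                # or build one from ONNX if no cache exists (item 2).
                self._models[name] = {"name": name, "warmed_up": False}

    def warm_up(self, name: str) -> None:
        with self._lock:
            self._models[name]["warmed_up"] = True  # dummy warm-up pass

    def infer(self, name: str, data):
        with self._lock:
            engine = self._models[name]   # KeyError if never loaded
        return {"model": engine["name"], "input": data}

    def unload_model(self, name: str) -> None:
        with self._lock:
            self._models.pop(name, None)  # frees engine + GPU memory in real code
```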

## Input Data

### Test Case 1: Load SuperPoint Model
- **Model**: SuperPoint feature detector
- **Input Size**: 1024x683 (downscaled from FullHD)
- **Format**: TensorRT engine or ONNX → TensorRT
- **Expected**: Model loaded, ready for inference

### Test Case 2: Load LightGlue Model
- **Model**: LightGlue feature matcher
- **Input**: Two sets of 256 features each
- **Format**: TensorRT engine
- **Expected**: Model loaded with correct input/output bindings

### Test Case 3: Load DINOv2 Model
- **Model**: DINOv2-Small for AnyLoc
- **Input Size**: 512x512
- **Format**: TensorRT engine
- **Expected**: Model loaded, optimized for batch processing

### Test Case 4: Load LiteSAM Model
- **Model**: LiteSAM for cross-view matching
- **Input**: UAV image + satellite tile
- **Format**: TensorRT engine
- **Expected**: Model loaded with multi-input support

### Test Case 5: Cold Start (All Models)
- **Scenario**: Load all 4 models from scratch
- **Expected**: All models ready within 10 seconds

### Test Case 6: Warm Start (Cached Engines)
- **Scenario**: Load pre-compiled TensorRT engines
- **Expected**: All models ready within 2 seconds
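
Whether a start is warm or cold comes down to whether a compiled engine is already on disk. One way to make the cache safe is to key the engine filename on the ONNX file contents, the requested precision, and the TensorRT version, so a changed model or upgraded runtime forces a rebuild. The path scheme and function names below are assumptions of this sketch.

```python
# Sketch of an engine-cache lookup distinguishing warm vs cold start.
# The cache key mixes the ONNX bytes, precision, and TensorRT version;
# a stale engine is then simply a cache miss.
import hashlib
from pathlib import Path

def engine_cache_path(onnx_path: Path, precision: str,
                      trt_version: str, cache_dir: Path) -> Path:
    digest = hashlib.sha256(onnx_path.read_bytes()).hexdigest()[:16]
    return cache_dir / f"{onnx_path.stem}_{precision}_trt{trt_version}_{digest}.engine"

def is_warm_start(onnx_path: Path, precision: str,
                  trt_version: str, cache_dir: Path) -> bool:
    return engine_cache_path(onnx_path, precision, trt_version, cache_dir).exists()
```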

### Test Case 7: Model Inference (SuperPoint)
- **Input**: Test image AD000001.jpg (1024x683)
- **Expected Output**: Keypoints and descriptors
- **Expected**: Inference time <15ms on RTX 3070

### Test Case 8: Concurrent Inference
- **Scenario**: 5 simultaneous inference requests to SuperPoint
- **Expected**: All complete successfully, no crashes
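
A TensorRT execution context is not safe to share across threads, so a simple way to satisfy Test Case 8 is a per-model lock (a pool of execution contexts would be the higher-throughput alternative). The sketch below exercises that pattern against a stub model; all names are hypothetical.

```python
# Sketch of the concurrent-inference scenario: 5 threads share one model,
# serialized by a per-model lock standing in for an exclusive TensorRT
# execution context.
import threading

class GuardedModel:
    def __init__(self, name: str):
        self.name = name
        self._lock = threading.Lock()   # one in-flight inference at a time
        self.calls = 0

    def infer(self, image):
        with self._lock:
            self.calls += 1             # stand-in for enqueueing on the GPU
            return {"model": self.name, "n_pixels": len(image)}

def run_concurrent(model: GuardedModel, images):
    results, errors = [], []
    def worker(img):
        try:
            results.append(model.infer(img))
        except Exception as exc:        # Test Case 8: no crashes allowed
            errors.append(exc)
    threads = [threading.Thread(target=worker, args=(img,)) for img in images]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors
```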

### Test Case 9: FP16 Precision
- **Model**: SuperPoint in FP16 mode
- **Expected**: 2-3x speedup vs FP32, minimal accuracy loss

### Test Case 10: GPU Memory Management
- **Scenario**: Load all models, monitor GPU memory
- **Expected**: Total GPU memory < 6GB (fits on RTX 2060)

### Test Case 11: Model Unload and Reload
- **Scenario**: Unload SuperPoint, reload it
- **Expected**: Successful reload, no memory leak
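
The leak check in Test Case 11 generalizes to: sample memory, run several load/unload cycles, and flag residual growth over a tolerance. In the real test the sampler would query the GPU (e.g. via NVML); making it a caller-supplied function, as in this sketch, is an assumption for illustration.

```python
# Sketch of a load/unload leak check: residual memory growth after N
# cycles beyond a tolerance indicates a leak. sample_mb/load/unload are
# injected callables so the logic is testable without a GPU.
def leak_detected(sample_mb, load, unload, cycles=5, tolerance_mb=100.0):
    baseline = sample_mb()
    for _ in range(cycles):
        load()
        unload()
    growth = sample_mb() - baseline      # residual memory after all cycles
    return growth > tolerance_mb, growth
```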

### Test Case 12: Handle Missing Model File
- **Scenario**: Attempt to load non-existent model
- **Expected**: Clear error message, graceful failure
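
Graceful failure here means the loader returns a failed status record rather than propagating an unhandled exception. A minimal sketch, with hypothetical names:

```python
# Sketch of the graceful-failure path for a missing model file: the
# result is a status dict with load_status "failed" and a specific error
# message, never an uncaught exception.
from pathlib import Path

def try_load(model_name: str, engine_path: Path) -> dict:
    if not engine_path.exists():
        return {
            "model_name": model_name,
            "load_status": "failed",
            "error": f"engine file not found: {engine_path}",
        }
    # Real code would deserialize the engine here.
    return {"model_name": model_name, "load_status": "success", "error": None}
```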

### Test Case 13: Handle Incompatible GPU
- **Scenario**: Simulate GPU without required compute capability
- **Expected**: Detect and report incompatibility
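
The detection itself is a version comparison against a minimum compute capability. The minimum used below (7.0, i.e. Volta-class tensor cores for fast FP16) is an assumption of this sketch, not a figure from this document.

```python
# Sketch of the compute-capability check: parse "major.minor", compare
# against an assumed minimum, and produce an informative rejection
# message (Test Case 13's expected behavior).
def check_compute_capability(cc: str, minimum=(7, 0)):
    major, minor = (int(part) for part in cc.split("."))
    if (major, minor) < minimum:
        return (False,
                f"GPU compute capability {cc} is below the required "
                f"{minimum[0]}.{minimum[1]}; use a supported GPU or a CPU fallback")
    return True, None
```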

### Test Case 14: ONNX to TensorRT Conversion
- **Model**: ONNX model file
- **Expected**: Automatic conversion to TensorRT, caching
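
One common way to implement this conversion is to shell out to NVIDIA's `trtexec` tool (building via the TensorRT Python API is the in-process alternative). The sketch only constructs the command line; actually running it requires a GPU and a TensorRT installation, so it is not executed here.

```python
# Sketch of building a trtexec invocation for ONNX -> TensorRT engine
# conversion. --onnx, --saveEngine, and --fp16 are standard trtexec
# flags; the function name is an assumption.
def trtexec_command(onnx_path: str, engine_path: str, fp16: bool = True):
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")            # build a half-precision engine
    return cmd
```

The resulting engine file would then be written to the engine cache so the next load skips conversion entirely.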

### Test Case 15: Model Warm-up
- **Scenario**: Run warm-up inference after loading
- **Expected**: First real inference is fast (no CUDA init overhead)

## Expected Output

For each test case:

```json
{
  "model_name": "superpoint|lightglue|dinov2|litesam",
  "load_status": "success|failed",
  "load_time_ms": <float>,
  "engine_path": "path/to/tensorrt/engine",
  "input_shapes": [
    {"name": "input", "shape": [1, 3, 1024, 683]}
  ],
  "output_shapes": [
    {"name": "keypoints", "shape": [1, 256, 2]},
    {"name": "descriptors", "shape": [1, 256, 256]}
  ],
  "precision": "fp32|fp16",
  "gpu_memory_mb": <float>,
  "inference_time_ms": <float>,
  "error": "string|null"
}
```
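
A harness can validate each emitted record against this schema mechanically: required keys present, enumerated fields restricted to the listed values, numeric fields numeric. A minimal sketch (the helper name and the exact set of checks are assumptions):

```python
# Sketch of per-record schema validation for the output format above.
# Returns a list of problems; an empty list means the record conforms.
REQUIRED = {"model_name", "load_status", "load_time_ms", "precision", "error"}
MODELS = {"superpoint", "lightglue", "dinov2", "litesam"}

def validate_record(rec: dict) -> list:
    problems = []
    for key in REQUIRED - rec.keys():
        problems.append(f"missing key: {key}")
    if rec.get("model_name") not in MODELS:
        problems.append("model_name not one of the four pipeline models")
    if rec.get("load_status") not in {"success", "failed"}:
        problems.append("load_status must be 'success' or 'failed'")
    if rec.get("precision") not in {"fp32", "fp16", None}:
        problems.append("precision must be 'fp32' or 'fp16'")
    if rec.get("load_status") == "success" and \
            not isinstance(rec.get("load_time_ms"), (int, float)):
        problems.append("load_time_ms must be numeric")
    return problems
```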

## Success Criteria

**Test Case 1 (SuperPoint)**:
- load_status = "success"
- load_time_ms < 3000
- Model ready for inference

**Test Case 2 (LightGlue)**:
- load_status = "success"
- load_time_ms < 3000
- Correct input/output bindings

**Test Case 3 (DINOv2)**:
- load_status = "success"
- load_time_ms < 5000
- Supports batch processing

**Test Case 4 (LiteSAM)**:
- load_status = "success"
- load_time_ms < 5000
- Multi-input configuration correct

**Test Case 5 (Cold Start)**:
- All 4 models loaded
- Total time < 10 seconds
- All models functional

**Test Case 6 (Warm Start)**:
- All 4 models loaded from cache
- Total time < 2 seconds
- Faster than cold start

**Test Case 7 (Inference)**:
- Inference successful
- Output shapes correct
- inference_time_ms < 15 (RTX 3070) or < 25 (RTX 2060)

**Test Case 8 (Concurrent)**:
- All 5 requests complete
- No crashes or errors
- Average inference time < 50ms

**Test Case 9 (FP16)**:
- FP16 engine loads successfully
- inference_time_ms < 8 (roughly a 2x speedup over FP32)
- Output quality acceptable

**Test Case 10 (GPU Memory)**:
- gpu_memory_mb < 6000 (fits RTX 2060 with 6GB VRAM)
- No out-of-memory errors
- Memory usage stable

**Test Case 11 (Unload/Reload)**:
- Unload successful
- Reload successful
- Memory freed after unload
- No memory leak (< 100MB difference)

**Test Case 12 (Missing File)**:
- load_status = "failed"
- Error message clear and specific
- No crash

**Test Case 13 (Incompatible GPU)**:
- Incompatibility detected
- Error message informative
- Suggests compatible GPU or CPU fallback

**Test Case 14 (ONNX Conversion)**:
- Conversion successful
- TensorRT engine cached
- Next load uses cache (faster)

**Test Case 15 (Warm-up)**:
- Warm-up completes in < 1 second
- First real inference fast (< 20ms)
- No CUDA initialization delays

## Maximum Expected Time

- **Load single model (cold)**: < 3 seconds
- **Load single model (warm)**: < 500ms
- **Load all models (cold)**: < 10 seconds
- **Load all models (warm)**: < 2 seconds
- **Single inference**: < 15ms (RTX 3070), < 25ms (RTX 2060)
- **ONNX conversion**: < 60 seconds (one-time cost)
- **Total test suite**: < 180 seconds
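
The budgets above can be collected into a single table that a test runner checks measured timings against. The keys and helper below are assumptions of this sketch; the millisecond values come directly from the list above.

```python
# The timing budgets above as a lookup table. over_budget() returns the
# subset of measurements that violate their budget (empty dict == pass).
BUDGET_MS = {
    "load_single_cold": 3000,
    "load_single_warm": 500,
    "load_all_cold": 10_000,
    "load_all_warm": 2000,
    "inference_rtx3070": 15,
    "inference_rtx2060": 25,
    "onnx_conversion": 60_000,
    "total_suite": 180_000,
}

def over_budget(measured_ms: dict) -> dict:
    return {k: v for k, v in measured_ms.items() if v >= BUDGET_MS[k]}
```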

## Test Execution Steps

1. **Setup Phase**:
   a. Verify GPU availability and capabilities
   b. Check TensorRT installation
   c. Prepare model files (ONNX or pre-compiled engines)
   d. Initialize Model Manager

2. **Test Cases 1-4 - Load Individual Models**:
   For each model:
   a. Call load_model(model_name)
   b. Measure load time
   c. Verify model ready
   d. Check input/output shapes
   e. Monitor GPU memory

3. **Test Case 5 - Cold Start**:
   a. Clear any cached engines
   b. Load all 4 models
   c. Measure total time
   d. Verify all functional

4. **Test Case 6 - Warm Start**:
   a. Unload all models
   b. Load all 4 models again (engines cached)
   c. Measure total time
   d. Compare with cold start

5. **Test Case 7 - Inference**:
   a. Load test image
   b. Run SuperPoint inference
   c. Measure inference time
   d. Validate output format

6. **Test Case 8 - Concurrent**:
   a. Prepare 5 inference requests
   b. Submit all simultaneously
   c. Wait for all completions
   d. Check for errors

7. **Test Case 9 - FP16**:
   a. Load SuperPoint in FP16 mode
   b. Run inference
   c. Compare speed with FP32
   d. Validate output quality

8. **Test Case 10 - GPU Memory**:
   a. Query GPU memory before loading
   b. Load all models
   c. Query GPU memory after loading
   d. Calculate usage

9. **Test Case 11 - Unload/Reload**:
   a. Unload SuperPoint
   b. Check memory freed
   c. Reload SuperPoint
   d. Verify no memory leak

10. **Test Case 12 - Missing File**:
    a. Attempt to load non-existent model
    b. Catch error
    c. Verify error message
    d. Check no crash

11. **Test Case 13 - Incompatible GPU**:
    a. Check GPU compute capability
    b. If incompatible, verify detection
    c. Check error handling

12. **Test Case 14 - ONNX Conversion**:
    a. Provide ONNX model file
    b. Trigger conversion
    c. Verify TensorRT engine created
    d. Check caching works

13. **Test Case 15 - Warm-up**:
    a. Load model
    b. Run warm-up inference(s)
    c. Run real inference
    d. Verify fast execution
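
The warm-up measurement in step 13 amounts to timing the first call (which pays one-time initialization) against a later call. The stub below simulates CUDA initialization cost with a sleep behind a flag; in the real test the cost comes from the driver, so the numbers here are purely illustrative.

```python
# Sketch of the warm-up timing harness: the first inference absorbs a
# simulated one-time init cost, so the second is measurably faster.
import time

class StubModel:
    def __init__(self):
        self._initialized = False

    def infer(self):
        if not self._initialized:       # stand-in for lazy CUDA init
            time.sleep(0.05)
            self._initialized = True
        return "ok"

def timed_ms(fn):
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0

model = StubModel()
warmup_ms = timed_ms(model.infer)   # absorbs the one-time cost
real_ms = timed_ms(model.infer)     # should now be fast
```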

## Pass/Fail Criteria

**Overall Test Passes If**:
- All models load successfully (cold and warm start)
- Total load time < 10 seconds (cold), < 2 seconds (warm)
- Inference times meet the AC-7 requirement (contribute to < 5s per image)
- GPU memory usage < 6GB
- No memory leaks
- Concurrent inference works correctly
- Error handling is robust

**Test Fails If**:
- Any model fails to load (with valid files)
- Load time exceeds 15 seconds
- Inference time > 50ms on RTX 3070 or > 100ms on RTX 2060
- GPU memory exceeds 8GB
- Memory leaks detected (> 500MB growth)
- Concurrent inference causes crashes
- Error handling is invalid (crashes, silent failures, or unclear messages)

## Additional Validation

**TensorRT Optimization**:
Verify optimizations:
- Layer fusion
- Kernel auto-tuning
- Dynamic tensor memory allocation
- FP16/INT8 quantization support

**Model Configuration**:
- Input preprocessing (normalization, resizing)
- Output postprocessing (NMS, thresholding)
- Batch size configuration
- Dynamic vs static shapes

**Performance Benchmarks**:

**SuperPoint**:
- FP32: ~15ms on RTX 3070, ~25ms on RTX 2060
- FP16: ~8ms on RTX 3070, ~12ms on RTX 2060
- Keypoints extracted: 200-500 per image

**LightGlue**:
- FP32: ~50ms on RTX 3070, ~100ms on RTX 2060
- FP16: ~30ms on RTX 3070, ~60ms on RTX 2060
- Matches: 50-200 per image pair

**DINOv2**:
- FP32: ~150ms on RTX 3070, ~250ms on RTX 2060
- FP16: ~80ms on RTX 3070, ~130ms on RTX 2060
- Feature dimension: 384 (DINOv2-Small)

**LiteSAM**:
- FP32: ~60ms on RTX 3070, ~120ms on RTX 2060
- FP16: ~40ms on RTX 3070, ~70ms on RTX 2060
- Correspondence points: 100-500

**Total Vision Pipeline** (L1 + L2 or L3):
- Target: < 500ms per image (to meet AC-7 < 5s total)
- Current estimate: 15 + 50 = 65ms (L1) or 60ms (L3)
- Well within budget
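
The latency arithmetic above, written out: per-stage FP32 estimates on the RTX 3070 summed against the 500 ms per-image budget. The stage groupings (L1 = SuperPoint + LightGlue, L3 = LiteSAM) follow the text; the numbers are the document's own estimates, collected into a checkable form.

```python
# Per-stage FP32 latency estimates (RTX 3070, from the benchmarks above)
# summed against the 500 ms per-image pipeline budget.
STAGE_MS = {"superpoint": 15, "lightglue": 50, "dinov2": 150, "litesam": 60}
PIPELINE_BUDGET_MS = 500

l1_ms = STAGE_MS["superpoint"] + STAGE_MS["lightglue"]   # feature path
l3_ms = STAGE_MS["litesam"]                              # cross-view path
```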

**Error Scenarios**:
1. **CUDA Out of Memory**: Provide clear error with suggestions
2. **Model File Corrupted**: Detect and report
3. **TensorRT Version Mismatch**: Handle gracefully
4. **GPU Driver Issues**: Detect and suggest update
5. **Insufficient Compute Capability**: Reject with clear message

**Thread Safety**:
- Multiple threads calling inference simultaneously
- Model loading during inference
- Thread-safe reference counting
- No race conditions

**Resource Cleanup**:
- GPU memory freed on model unload
- CUDA contexts released properly
- File handles closed
- No resource leaks

**Monitoring and Logging**:
- Log model load times
- Track inference times (min/max/avg)
- Monitor GPU utilization
- Alert on performance degradation
- Memory usage trends

**Compatibility Matrix**:

| GPU Model | VRAM | Compute Capability | FP16 Support | Expected Performance |
|-----------|------|--------------------|--------------|----------------------|
| RTX 2060  | 6GB  | 7.5                | Yes          | Baseline (25ms SuperPoint) |
| RTX 3070  | 8GB  | 8.6                | Yes          | ~1.5x faster (15ms)  |
| RTX 4070  | 12GB | 8.9                | Yes          | ~2x faster (10ms)    |
**Model Versions and Updates**:
|
||||
- Support multiple model versions
|
||||
- Graceful migration to new versions
|
||||
- A/B testing capability
|
||||
- Rollback on performance regression
|
||||
|
||||
**Configuration Options**:
|
||||
- Model path configuration
|
||||
- Precision selection (FP32/FP16)
|
||||
- Batch size tuning
|
||||
- Workspace size for TensorRT builder
|
||||
- Engine cache location
|
||||
- Warm-up settings
|
||||
|
||||
Reference in New Issue
Block a user