Integration Test: Model Manager
Summary
Validate the Model Manager component responsible for loading, managing, and providing access to TensorRT-optimized deep learning models used by the vision pipeline.
Component Under Test
Component: Model Manager
Location: gps_denied_15_model_manager
Dependencies:
- TensorRT runtime
- ONNX model files
- GPU (NVIDIA RTX 2060/3070)
- Configuration Manager
- File system access
Detailed Description
This test validates that the Model Manager can:
- Load TensorRT engines for SuperPoint, LightGlue, DINOv2, and LiteSAM
- Convert ONNX models to TensorRT engines if needed
- Manage model lifecycle (load, warm-up, unload)
- Provide thread-safe access to models for concurrent requests
- Handle GPU memory allocation efficiently
- Support FP16 precision for performance
- Cache compiled engines for fast startup
- Detect and adapt to available GPU capabilities
- Handle model loading failures gracefully
- Monitor GPU utilization and memory
Per the AC-7 requirement of <5 seconds per image, models must be optimized with TensorRT and loaded efficiently.
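The lifecycle responsibilities above can be sketched as a small thread-safe registry. Everything here (the class name, method names, and the loader callable) is a hypothetical stand-in for the real TensorRT-backed component, so the flow runs anywhere:

```python
import threading
import time

class ModelManager:
    """Sketch of the load / reuse / unload lifecycle (hypothetical API).

    A loader callable stands in for TensorRT engine deserialization.
    """

    def __init__(self):
        self._models = {}              # name -> loaded model object
        self._lock = threading.Lock()  # registry guard for concurrent callers
        self.load_times_ms = {}        # name -> measured load time

    def load_model(self, name, loader):
        with self._lock:
            if name in self._models:
                return self._models[name]   # already loaded: reuse
            start = time.perf_counter()
            model = loader()                # deserialize / build the engine
            self.load_times_ms[name] = (time.perf_counter() - start) * 1000.0
            self._models[name] = model
            return model

    def is_loaded(self, name):
        with self._lock:
            return name in self._models

    def unload_model(self, name):
        with self._lock:
            # Drop the reference; the real component would free GPU memory here.
            self._models.pop(name, None)

mgr = ModelManager()
mgr.load_model("superpoint", lambda: object())
print(mgr.is_loaded("superpoint"))  # True
```

A second `load_model` call for the same name returns the cached object instead of loading again, which is the behavior Test Case 11 exercises around unload/reload.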
Input Data
Test Case 1: Load SuperPoint Model
- Model: SuperPoint feature detector
- Input Size: 1024x683 (for FullHD downscaled)
- Format: TensorRT engine or ONNX → TensorRT
- Expected: Model loaded, ready for inference
Test Case 2: Load LightGlue Model
- Model: LightGlue feature matcher
- Input: Two sets of 256 features each
- Format: TensorRT engine
- Expected: Model loaded with correct input/output bindings
Test Case 3: Load DINOv2 Model
- Model: DINOv2-Small for AnyLoc
- Input Size: 512x512
- Format: TensorRT engine
- Expected: Model loaded, optimized for batch processing
Test Case 4: Load LiteSAM Model
- Model: LiteSAM for cross-view matching
- Input: UAV image + satellite tile
- Format: TensorRT engine
- Expected: Model loaded with multi-input support
Test Case 5: Cold Start (All Models)
- Scenario: Load all 4 models from scratch
- Expected: All models ready within 10 seconds
Test Case 6: Warm Start (Cached Engines)
- Scenario: Load pre-compiled TensorRT engines
- Expected: All models ready within 2 seconds
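The cold/warm distinction hinges on the engine cache. One way to key cached engines (the key layout here is an assumption, not the component's actual scheme) is to hash everything an engine is built against, since a TensorRT engine is only valid for the exact model, precision, input shape, and TensorRT version used at build time:

```python
import hashlib
from pathlib import Path

def engine_cache_path(cache_dir, model_name, precision, input_shape, trt_version):
    """Derive a deterministic cache filename for a compiled engine.

    All build-affecting parameters go into the key so a stale engine is
    never reused after a precision, shape, or TensorRT upgrade.
    """
    key = f"{model_name}|{precision}|{input_shape}|{trt_version}"
    digest = hashlib.sha256(key.encode()).hexdigest()[:16]
    return Path(cache_dir) / f"{model_name}_{precision}_{digest}.engine"

path = engine_cache_path("engine_cache", "superpoint", "fp16",
                         (1, 3, 1024, 683), "8.6.1")
print(path.suffix)  # .engine
```

A warm start then reduces to a file-existence check on this path before falling back to ONNX conversion.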
Test Case 7: Model Inference (SuperPoint)
- Input: Test image AD000001.jpg (1024x683)
- Expected Output: Keypoints and descriptors
- Expected: Inference time <15ms on RTX 3070
Test Case 8: Concurrent Inference
- Scenario: 5 simultaneous inference requests to SuperPoint
- Expected: All complete successfully, no crashes
Test Case 9: FP16 Precision
- Model: SuperPoint in FP16 mode
- Expected: 2-3x speedup vs FP32, minimal accuracy loss
Test Case 10: GPU Memory Management
- Scenario: Load all models, monitor GPU memory
- Expected: Total GPU memory < 6GB (fits on RTX 2060)
Test Case 11: Model Unload and Reload
- Scenario: Unload SuperPoint, reload it
- Expected: Successful reload, no memory leak
Test Case 12: Handle Missing Model File
- Scenario: Attempt to load non-existent model
- Expected: Clear error message, graceful failure
Test Case 13: Handle Incompatible GPU
- Scenario: Simulate GPU without required compute capability
- Expected: Detect and report incompatibility
Test Case 14: ONNX to TensorRT Conversion
- Model: ONNX model file
- Expected: Automatic conversion to TensorRT, caching
Test Case 15: Model Warm-up
- Scenario: Run warm-up inference after loading
- Expected: First real inference is fast (no CUDA init overhead)
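The ONNX-to-TensorRT conversion in Test Case 14 is typically driven by `trtexec`, which ships with TensorRT. A sketch that only assembles the command line (flag names are per recent TensorRT releases; older versions use `--workspace` instead of `--memPoolSize`):

```python
def trtexec_command(onnx_path, engine_path, fp16=True, workspace_mb=1024):
    """Build a trtexec invocation for ONNX -> TensorRT engine conversion."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # enable half-precision kernels
    # Bound the builder workspace so conversion fits alongside other GPU users.
    cmd.append(f"--memPoolSize=workspace:{workspace_mb}M")
    return cmd

print(" ".join(trtexec_command("superpoint.onnx", "superpoint.engine")))
# trtexec --onnx=superpoint.onnx --saveEngine=superpoint.engine --fp16 --memPoolSize=workspace:1024M
```

The test would run this once, verify the engine file appears, and confirm the next load bypasses conversion.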
Expected Output
For each test case:
```json
{
  "model_name": "superpoint|lightglue|dinov2|litesam",
  "load_status": "success|failed",
  "load_time_ms": <float>,
  "engine_path": "path/to/tensorrt/engine",
  "input_shapes": [
    {"name": "input", "shape": [1, 3, 1024, 683]}
  ],
  "output_shapes": [
    {"name": "keypoints", "shape": [1, 256, 2]},
    {"name": "descriptors", "shape": [1, 256, 256]}
  ],
  "precision": "fp32|fp16",
  "gpu_memory_mb": <float>,
  "inference_time_ms": <float>,
  "error": "string|null"
}
```
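A small validator can enforce this record shape across all test cases. The field list mirrors the schema above; the helper itself is hypothetical:

```python
# Required fields and accepted types for a per-test result record.
REQUIRED = {
    "model_name": str, "load_status": str, "load_time_ms": (int, float),
    "engine_path": str, "input_shapes": list, "output_shapes": list,
    "precision": str, "gpu_memory_mb": (int, float),
    "inference_time_ms": (int, float),
}

def validate_result(rec):
    """Return a list of problems; an empty list means the record is well-formed."""
    problems = [f"missing field: {k}" for k in REQUIRED if k not in rec]
    for k, t in REQUIRED.items():
        if k in rec and not isinstance(rec[k], t):
            problems.append(f"{k}: unexpected type")
    if rec.get("load_status") not in ("success", "failed"):
        problems.append("load_status must be 'success' or 'failed'")
    if rec.get("load_status") == "failed" and not rec.get("error"):
        problems.append("failed load must carry an error message")
    return problems
```

Running every test case's output through one validator keeps the pass/fail evaluation mechanical.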
Success Criteria
Test Case 1 (SuperPoint):
- load_status = "success"
- load_time_ms < 3000
- Model ready for inference
Test Case 2 (LightGlue):
- load_status = "success"
- load_time_ms < 3000
- Correct input/output bindings
Test Case 3 (DINOv2):
- load_status = "success"
- load_time_ms < 5000
- Supports batch processing
Test Case 4 (LiteSAM):
- load_status = "success"
- load_time_ms < 5000
- Multi-input configuration correct
Test Case 5 (Cold Start):
- All 4 models loaded
- Total time < 10 seconds
- All models functional
Test Case 6 (Warm Start):
- All 4 models loaded from cache
- Total time < 2 seconds
- Faster than cold start
Test Case 7 (Inference):
- Inference successful
- Output shapes correct
- inference_time_ms < 15ms (RTX 3070) or < 25ms (RTX 2060)
Test Case 8 (Concurrent):
- All 5 requests complete
- No crashes or errors
- Average inference time < 50ms
Test Case 9 (FP16):
- FP16 engine loads successfully
- inference_time_ms < 8ms (2x speedup)
- Output quality acceptable
Test Case 10 (GPU Memory):
- gpu_memory_mb < 6000 (fits RTX 2060 with 6GB VRAM)
- No out-of-memory errors
- Memory usage stable
Test Case 11 (Unload/Reload):
- Unload successful
- Reload successful
- Memory freed after unload
- No memory leak (< 100MB difference)
Test Case 12 (Missing File):
- load_status = "failed"
- error message clear and specific
- No crash
Test Case 13 (Incompatible GPU):
- Incompatibility detected
- Error message informative
- Suggests compatible GPU or CPU fallback
Test Case 14 (ONNX Conversion):
- Conversion successful
- TensorRT engine cached
- Next load uses cache (faster)
Test Case 15 (Warm-up):
- Warm-up completes < 1 second
- First real inference fast (< 20ms)
- No CUDA initialization delays
Maximum Expected Time
- Load single model (cold): < 3 seconds
- Load single model (warm): < 500ms
- Load all models (cold): < 10 seconds
- Load all models (warm): < 2 seconds
- Single inference: < 15ms (RTX 3070), < 25ms (RTX 2060)
- ONNX conversion: < 60 seconds (one-time cost)
- Total test suite: < 180 seconds
Test Execution Steps
1. Setup Phase:
   a. Verify GPU availability and capabilities
   b. Check TensorRT installation
   c. Prepare model files (ONNX or pre-compiled engines)
   d. Initialize Model Manager
2. Test Cases 1-4 - Load Individual Models (for each model):
   a. Call load_model(model_name)
   b. Measure load time
   c. Verify model ready
   d. Check input/output shapes
   e. Monitor GPU memory
3. Test Case 5 - Cold Start:
   a. Clear any cached engines
   b. Load all 4 models
   c. Measure total time
   d. Verify all functional
4. Test Case 6 - Warm Start:
   a. Unload all models
   b. Load all 4 models again (engines cached)
   c. Measure total time
   d. Compare with cold start
5. Test Case 7 - Inference:
   a. Load test image
   b. Run SuperPoint inference
   c. Measure inference time
   d. Validate output format
6. Test Case 8 - Concurrent:
   a. Prepare 5 inference requests
   b. Submit all simultaneously
   c. Wait for all completions
   d. Check for errors
7. Test Case 9 - FP16:
   a. Load SuperPoint in FP16 mode
   b. Run inference
   c. Compare speed with FP32
   d. Validate output quality
8. Test Case 10 - GPU Memory:
   a. Query GPU memory before loading
   b. Load all models
   c. Query GPU memory after loading
   d. Calculate usage
9. Test Case 11 - Unload/Reload:
   a. Unload SuperPoint
   b. Check memory freed
   c. Reload SuperPoint
   d. Verify no memory leak
10. Test Case 12 - Missing File:
    a. Attempt to load non-existent model
    b. Catch error
    c. Verify error message
    d. Check no crash
11. Test Case 13 - Incompatible GPU:
    a. Check GPU compute capability
    b. If incompatible, verify detection
    c. Check error handling
12. Test Case 14 - ONNX Conversion:
    a. Provide ONNX model file
    b. Trigger conversion
    c. Verify TensorRT engine created
    d. Check caching works
13. Test Case 15 - Warm-up:
    a. Load model
    b. Run warm-up inference(s)
    c. Run real inference
    d. Verify fast execution
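The concurrent-inference step (Test Case 8) can be driven by a small harness. The dummy `infer` callable below is a placeholder for a real SuperPoint call; everything else is plain standard-library Python:

```python
from concurrent.futures import ThreadPoolExecutor
import statistics
import time

def run_concurrent(infer, n_requests=5):
    """Fire n_requests inference calls at once and collect per-call latency."""
    def timed():
        start = time.perf_counter()
        infer()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        futures = [pool.submit(timed) for _ in range(n_requests)]
        times_ms = [f.result() for f in futures]  # re-raises any worker error
    return {
        "count": len(times_ms),
        "avg_ms": statistics.mean(times_ms),
        "max_ms": max(times_ms),
    }

# A 5 ms sleep stands in for inference; a real run would call the engine.
stats = run_concurrent(lambda: time.sleep(0.005))
print(stats["count"])  # 5
```

Because `f.result()` re-raises exceptions from worker threads, any crash inside an inference call fails the harness rather than being silently dropped.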
Pass/Fail Criteria
Overall Test Passes If:
- All models load successfully (cold and warm start)
- Total load time < 10 seconds (cold), < 2 seconds (warm)
- Inference times meet AC-7 requirements (contribute to < 5s per image)
- GPU memory usage < 6GB
- No memory leaks
- Concurrent inference works correctly
- Error handling robust
Test Fails If:
- Any model fails to load (with valid files)
- Load time exceeds 15 seconds
- Inference time > 50ms on RTX 3070 or > 100ms on RTX 2060
- GPU memory exceeds 8GB
- Memory leaks detected (> 500MB growth)
- Concurrent inference causes crashes
- Error handling is missing or incorrect
Additional Validation
TensorRT Optimization: Verify optimizations:
- Layer fusion
- Kernel auto-tuning
- Dynamic tensor memory allocation
- FP16/INT8 quantization support
Model Configuration:
- Input preprocessing (normalization, resizing)
- Output postprocessing (NMS, thresholding)
- Batch size configuration
- Dynamic vs static shapes
Performance Benchmarks:
SuperPoint:
- FP32: ~15ms on RTX 3070, ~25ms on RTX 2060
- FP16: ~8ms on RTX 3070, ~12ms on RTX 2060
- Keypoints extracted: 200-500 per image
LightGlue:
- FP32: ~50ms on RTX 3070, ~100ms on RTX 2060
- FP16: ~30ms on RTX 3070, ~60ms on RTX 2060
- Matches: 50-200 per image pair
DINOv2:
- FP32: ~150ms on RTX 3070, ~250ms on RTX 2060
- FP16: ~80ms on RTX 3070, ~130ms on RTX 2060
- Feature dimension: 384 (DINOv2-Small)
LiteSAM:
- FP32: ~60ms on RTX 3070, ~120ms on RTX 2060
- FP16: ~40ms on RTX 3070, ~70ms on RTX 2060
- Correspondence points: 100-500
Total Vision Pipeline (L1 + L2 or L3):
- Target: < 500ms per image (to meet AC-7 < 5s total)
- Current estimate: 15 + 50 = 65ms (L1 + L2) or 60ms (L3)
- Well within budget
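The budget arithmetic above can be checked mechanically. Stage names and estimates come from the benchmark tables; the helper itself is illustrative:

```python
PIPELINE_BUDGET_MS = 500.0  # per-image budget toward the AC-7 < 5 s total

def pipeline_total_ms(stage_times_ms):
    """Sum per-stage latencies and compare against the pipeline budget."""
    total = sum(stage_times_ms.values())
    return total, total < PIPELINE_BUDGET_MS

# L1 + L2 path: SuperPoint + LightGlue (FP32, RTX 3070 estimates from above)
total, ok = pipeline_total_ms({"superpoint": 15.0, "lightglue": 50.0})
print(total, ok)  # 65.0 True
```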
Error Scenarios:
- CUDA Out of Memory: Provide clear error with suggestions
- Model File Corrupted: Detect and report
- TensorRT Version Mismatch: Handle gracefully
- GPU Driver Issues: Detect and suggest update
- Insufficient Compute Capability: Reject with clear message
Thread Safety:
- Multiple threads calling inference simultaneously
- Model loading during inference
- Thread-safe reference counting
- No race conditions
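Thread-safe reference counting can be sketched with a condition variable: inference acquires a reference, and unload waits until the count drops to zero so an engine is never destroyed mid-inference. This is a hypothetical design, not necessarily the real implementation:

```python
import threading

class RefCountedModel:
    """Guard a model object so unload cannot race with in-flight inference."""

    def __init__(self, model):
        self._model = model
        self._refs = 0
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            self._refs += 1       # one more in-flight inference
            return self._model

    def release(self):
        with self._cond:
            self._refs -= 1
            if self._refs == 0:
                self._cond.notify_all()  # wake any waiting unloader

    def wait_idle(self, timeout=None):
        """Block until no inference holds a reference; True if idle."""
        with self._cond:
            return self._cond.wait_for(lambda: self._refs == 0, timeout)
```

An unload path would call `wait_idle()` before releasing GPU resources, which directly exercises the "model loading during inference" scenario.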
Resource Cleanup:
- GPU memory freed on model unload
- CUDA contexts released properly
- File handles closed
- No resource leaks
Monitoring and Logging:
- Log model load times
- Track inference times (min/max/avg)
- Monitor GPU utilization
- Alert on performance degradation
- Memory usage trends
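The min/max/avg tracking called for above reduces to a small accumulator (class and method names are illustrative):

```python
class InferenceStats:
    """Track per-model inference latencies for the min/max/avg log lines."""

    def __init__(self):
        self._times_ms = []

    def record(self, ms):
        self._times_ms.append(ms)

    def summary(self):
        if not self._times_ms:
            return None  # nothing recorded yet
        t = self._times_ms
        return {"count": len(t), "min_ms": min(t), "max_ms": max(t),
                "avg_ms": sum(t) / len(t)}

stats = InferenceStats()
for ms in (12.0, 14.0, 10.0):
    stats.record(ms)
print(stats.summary())  # {'count': 3, 'min_ms': 10.0, 'max_ms': 14.0, 'avg_ms': 12.0}
```

A degradation alert would compare `avg_ms` over a recent window against the benchmark baselines listed earlier.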
Compatibility Matrix:
| GPU Model | VRAM | Compute Capability | FP16 Support | Expected Performance |
|---|---|---|---|---|
| RTX 2060 | 6GB | 7.5 | Yes | Baseline (25ms SuperPoint) |
| RTX 3070 | 8GB | 8.6 | Yes | ~1.5x faster (15ms) |
| RTX 4070 | 12GB | 8.9 | Yes | ~2x faster (10ms) |
Model Versions and Updates:
- Support multiple model versions
- Graceful migration to new versions
- A/B testing capability
- Rollback on performance regression
Configuration Options:
- Model path configuration
- Precision selection (FP32/FP16)
- Batch size tuning
- Workspace size for TensorRT builder
- Engine cache location
- Warm-up settings
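The configuration surface above can be captured in a single dataclass. Field names and defaults here are assumptions for illustration, not the component's real schema:

```python
from dataclasses import dataclass

@dataclass
class ModelManagerConfig:
    """Hypothetical configuration mirroring the options listed above."""
    model_dir: str = "models"            # where ONNX / engine files live
    engine_cache_dir: str = "engine_cache"
    precision: str = "fp16"              # "fp32" or "fp16"
    batch_size: int = 1
    workspace_size_mb: int = 1024        # TensorRT builder workspace
    warmup_iterations: int = 3

    def __post_init__(self):
        if self.precision not in ("fp32", "fp16"):
            raise ValueError(f"unsupported precision: {self.precision}")

cfg = ModelManagerConfig(precision="fp32")
print(cfg.precision)  # fp32
```

Validating in `__post_init__` rejects a bad precision at construction time rather than at the first engine build.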