mirror of https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-23 05:06:38 +00:00
initial structure implemented (docs -> _docs)

# Integration Test: Model Manager

## Summary

Validate the Model Manager component responsible for loading, managing, and providing access to the TensorRT-optimized deep learning models used by the vision pipeline.

## Component Under Test

**Component**: Model Manager

**Location**: `gps_denied_15_model_manager`

**Dependencies**:

- TensorRT runtime
- ONNX model files
- GPU (NVIDIA RTX 2060/3070)
- Configuration Manager
- File system access

## Detailed Description

This test validates that the Model Manager can:

1. Load TensorRT engines for SuperPoint, LightGlue, DINOv2, and LiteSAM
2. Convert ONNX models to TensorRT engines if needed
3. Manage the model lifecycle (load, warm-up, unload)
4. Provide thread-safe access to models for concurrent requests
5. Handle GPU memory allocation efficiently
6. Support FP16 precision for performance
7. Cache compiled engines for fast startup
8. Detect and adapt to available GPU capabilities
9. Handle model loading failures gracefully
10. Monitor GPU utilization and memory

Per the AC-7 requirement of <5 seconds per image, models must be optimized with TensorRT and loaded efficiently.
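
The lifecycle in the list above (load, warm-up, infer, unload, with thread-safe access) can be sketched as follows. This is a minimal stub, not the component's actual API: all class and method names are assumptions, and the "engine" here is a placeholder where the real code would hold a deserialized TensorRT engine.

```python
# Hypothetical sketch of the Model Manager lifecycle: load -> warm-up ->
# infer -> unload. All names are assumptions; the real component wraps
# TensorRT engines rather than this dictionary stub.
import threading

class ModelManager:
    def __init__(self):
        self._models = {}                 # name -> loaded "engine" stub
        self._lock = threading.Lock()     # thread-safe access (item 4 above)

    def load_model(self, name: str) -> None:
        with self._lock:
            if name not in self._models:
                # Real code would deserialize a cached TensorRT engine here,
                # or build one from ONNX if no cache exists (item 2).
                self._models[name] = {"name": name, "warmed_up": False}

    def warm_up(self, name: str) -> None:
        with self._lock:
            self._models[name]["warmed_up"] = True  # dummy warm-up pass

    def infer(self, name: str, data):
        with self._lock:
            engine = self._models[name]   # KeyError if never loaded
        return {"model": engine["name"], "input": data}

    def unload_model(self, name: str) -> None:
        with self._lock:
            self._models.pop(name, None)  # frees engine + GPU memory in real code
```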

## Input Data

### Test Case 1: Load SuperPoint Model
- **Model**: SuperPoint feature detector
- **Input Size**: 1024x683 (downscaled from FullHD)
- **Format**: TensorRT engine or ONNX → TensorRT
- **Expected**: Model loaded, ready for inference

### Test Case 2: Load LightGlue Model
- **Model**: LightGlue feature matcher
- **Input**: Two sets of 256 features each
- **Format**: TensorRT engine
- **Expected**: Model loaded with correct input/output bindings

### Test Case 3: Load DINOv2 Model
- **Model**: DINOv2-Small for AnyLoc
- **Input Size**: 512x512
- **Format**: TensorRT engine
- **Expected**: Model loaded, optimized for batch processing

### Test Case 4: Load LiteSAM Model
- **Model**: LiteSAM for cross-view matching
- **Input**: UAV image + satellite tile
- **Format**: TensorRT engine
- **Expected**: Model loaded with multi-input support

### Test Case 5: Cold Start (All Models)
- **Scenario**: Load all 4 models from scratch
- **Expected**: All models ready within 10 seconds

### Test Case 6: Warm Start (Cached Engines)
- **Scenario**: Load pre-compiled TensorRT engines
- **Expected**: All models ready within 2 seconds
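
Whether a start is warm or cold comes down to whether a compiled engine is already on disk. One way to make the cache safe is to key the engine filename on the ONNX file contents, the requested precision, and the TensorRT version, so a changed model or upgraded runtime forces a rebuild. The path scheme and function names below are assumptions of this sketch.

```python
# Sketch of an engine-cache lookup distinguishing warm vs cold start.
# The cache key mixes the ONNX bytes, precision, and TensorRT version;
# a stale engine is then simply a cache miss.
import hashlib
from pathlib import Path

def engine_cache_path(onnx_path: Path, precision: str,
                      trt_version: str, cache_dir: Path) -> Path:
    digest = hashlib.sha256(onnx_path.read_bytes()).hexdigest()[:16]
    return cache_dir / f"{onnx_path.stem}_{precision}_trt{trt_version}_{digest}.engine"

def is_warm_start(onnx_path: Path, precision: str,
                  trt_version: str, cache_dir: Path) -> bool:
    return engine_cache_path(onnx_path, precision, trt_version, cache_dir).exists()
```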

### Test Case 7: Model Inference (SuperPoint)
- **Input**: Test image AD000001.jpg (1024x683)
- **Expected Output**: Keypoints and descriptors
- **Expected**: Inference time <15ms on RTX 3070

### Test Case 8: Concurrent Inference
- **Scenario**: 5 simultaneous inference requests to SuperPoint
- **Expected**: All complete successfully, no crashes
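
A TensorRT execution context is not safe to share across threads, so a simple way to satisfy Test Case 8 is a per-model lock (a pool of execution contexts would be the higher-throughput alternative). The sketch below exercises that pattern against a stub model; all names are hypothetical.

```python
# Sketch of the concurrent-inference scenario: 5 threads share one model,
# serialized by a per-model lock standing in for an exclusive TensorRT
# execution context.
import threading

class GuardedModel:
    def __init__(self, name: str):
        self.name = name
        self._lock = threading.Lock()   # one in-flight inference at a time
        self.calls = 0

    def infer(self, image):
        with self._lock:
            self.calls += 1             # stand-in for enqueueing on the GPU
            return {"model": self.name, "n_pixels": len(image)}

def run_concurrent(model: GuardedModel, images):
    results, errors = [], []
    def worker(img):
        try:
            results.append(model.infer(img))
        except Exception as exc:        # Test Case 8: no crashes allowed
            errors.append(exc)
    threads = [threading.Thread(target=worker, args=(img,)) for img in images]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors
```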

### Test Case 9: FP16 Precision
- **Model**: SuperPoint in FP16 mode
- **Expected**: 2-3x speedup vs FP32, minimal accuracy loss

### Test Case 10: GPU Memory Management
- **Scenario**: Load all models, monitor GPU memory
- **Expected**: Total GPU memory < 6GB (fits on RTX 2060)

### Test Case 11: Model Unload and Reload
- **Scenario**: Unload SuperPoint, reload it
- **Expected**: Successful reload, no memory leak
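
The leak check in Test Case 11 generalizes to: sample memory, run several load/unload cycles, and flag residual growth over a tolerance. In the real test the sampler would query the GPU (e.g. via NVML); making it a caller-supplied function, as in this sketch, is an assumption for illustration.

```python
# Sketch of a load/unload leak check: residual memory growth after N
# cycles beyond a tolerance indicates a leak. sample_mb/load/unload are
# injected callables so the logic is testable without a GPU.
def leak_detected(sample_mb, load, unload, cycles=5, tolerance_mb=100.0):
    baseline = sample_mb()
    for _ in range(cycles):
        load()
        unload()
    growth = sample_mb() - baseline      # residual memory after all cycles
    return growth > tolerance_mb, growth
```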

### Test Case 12: Handle Missing Model File
- **Scenario**: Attempt to load non-existent model
- **Expected**: Clear error message, graceful failure
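
Graceful failure here means the loader returns a failed status record rather than propagating an unhandled exception. A minimal sketch, with hypothetical names:

```python
# Sketch of the graceful-failure path for a missing model file: the
# result is a status dict with load_status "failed" and a specific error
# message, never an uncaught exception.
from pathlib import Path

def try_load(model_name: str, engine_path: Path) -> dict:
    if not engine_path.exists():
        return {
            "model_name": model_name,
            "load_status": "failed",
            "error": f"engine file not found: {engine_path}",
        }
    # Real code would deserialize the engine here.
    return {"model_name": model_name, "load_status": "success", "error": None}
```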

### Test Case 13: Handle Incompatible GPU
- **Scenario**: Simulate GPU without required compute capability
- **Expected**: Detect and report incompatibility
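
The detection itself is a version comparison against a minimum compute capability. The minimum used below (7.0, i.e. Volta-class tensor cores for fast FP16) is an assumption of this sketch, not a figure from this document.

```python
# Sketch of the compute-capability check: parse "major.minor", compare
# against an assumed minimum, and produce an informative rejection
# message (Test Case 13's expected behavior).
def check_compute_capability(cc: str, minimum=(7, 0)):
    major, minor = (int(part) for part in cc.split("."))
    if (major, minor) < minimum:
        return (False,
                f"GPU compute capability {cc} is below the required "
                f"{minimum[0]}.{minimum[1]}; use a supported GPU or a CPU fallback")
    return True, None
```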

### Test Case 14: ONNX to TensorRT Conversion
- **Model**: ONNX model file
- **Expected**: Automatic conversion to TensorRT, caching
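
One common way to implement this conversion is to shell out to NVIDIA's `trtexec` tool (building via the TensorRT Python API is the in-process alternative). The sketch only constructs the command line; actually running it requires a GPU and a TensorRT installation, so it is not executed here.

```python
# Sketch of building a trtexec invocation for ONNX -> TensorRT engine
# conversion. --onnx, --saveEngine, and --fp16 are standard trtexec
# flags; the function name is an assumption.
def trtexec_command(onnx_path: str, engine_path: str, fp16: bool = True):
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")            # build a half-precision engine
    return cmd
```

The resulting engine file would then be written to the engine cache so the next load skips conversion entirely.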

### Test Case 15: Model Warm-up
- **Scenario**: Run warm-up inference after loading
- **Expected**: First real inference is fast (no CUDA init overhead)

## Expected Output

For each test case:

```json
{
  "model_name": "superpoint|lightglue|dinov2|litesam",
  "load_status": "success|failed",
  "load_time_ms": <float>,
  "engine_path": "path/to/tensorrt/engine",
  "input_shapes": [
    {"name": "input", "shape": [1, 3, 1024, 683]}
  ],
  "output_shapes": [
    {"name": "keypoints", "shape": [1, 256, 2]},
    {"name": "descriptors", "shape": [1, 256, 256]}
  ],
  "precision": "fp32|fp16",
  "gpu_memory_mb": <float>,
  "inference_time_ms": <float>,
  "error": "string|null"
}
```
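
A harness can validate each emitted record against this schema mechanically: required keys present, enumerated fields restricted to the listed values, numeric fields numeric. A minimal sketch (the helper name and the exact set of checks are assumptions):

```python
# Sketch of per-record schema validation for the output format above.
# Returns a list of problems; an empty list means the record conforms.
REQUIRED = {"model_name", "load_status", "load_time_ms", "precision", "error"}
MODELS = {"superpoint", "lightglue", "dinov2", "litesam"}

def validate_record(rec: dict) -> list:
    problems = []
    for key in REQUIRED - rec.keys():
        problems.append(f"missing key: {key}")
    if rec.get("model_name") not in MODELS:
        problems.append("model_name not one of the four pipeline models")
    if rec.get("load_status") not in {"success", "failed"}:
        problems.append("load_status must be 'success' or 'failed'")
    if rec.get("precision") not in {"fp32", "fp16", None}:
        problems.append("precision must be 'fp32' or 'fp16'")
    if rec.get("load_status") == "success" and \
            not isinstance(rec.get("load_time_ms"), (int, float)):
        problems.append("load_time_ms must be numeric")
    return problems
```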

## Success Criteria

**Test Case 1 (SuperPoint)**:
- load_status = "success"
- load_time_ms < 3000
- Model ready for inference

**Test Case 2 (LightGlue)**:
- load_status = "success"
- load_time_ms < 3000
- Correct input/output bindings

**Test Case 3 (DINOv2)**:
- load_status = "success"
- load_time_ms < 5000
- Supports batch processing

**Test Case 4 (LiteSAM)**:
- load_status = "success"
- load_time_ms < 5000
- Multi-input configuration correct

**Test Case 5 (Cold Start)**:
- All 4 models loaded
- Total time < 10 seconds
- All models functional

**Test Case 6 (Warm Start)**:
- All 4 models loaded from cache
- Total time < 2 seconds
- Faster than cold start

**Test Case 7 (Inference)**:
- Inference successful
- Output shapes correct
- inference_time_ms < 15 (RTX 3070) or < 25 (RTX 2060)

**Test Case 8 (Concurrent)**:
- All 5 requests complete
- No crashes or errors
- Average inference time < 50ms

**Test Case 9 (FP16)**:
- FP16 engine loads successfully
- inference_time_ms < 8 (roughly a 2x speedup over FP32)
- Output quality acceptable

**Test Case 10 (GPU Memory)**:
- gpu_memory_mb < 6000 (fits RTX 2060 with 6GB VRAM)
- No out-of-memory errors
- Memory usage stable

**Test Case 11 (Unload/Reload)**:
- Unload successful
- Reload successful
- Memory freed after unload
- No memory leak (< 100MB difference)

**Test Case 12 (Missing File)**:
- load_status = "failed"
- Error message clear and specific
- No crash

**Test Case 13 (Incompatible GPU)**:
- Incompatibility detected
- Error message informative
- Suggests compatible GPU or CPU fallback

**Test Case 14 (ONNX Conversion)**:
- Conversion successful
- TensorRT engine cached
- Next load uses cache (faster)

**Test Case 15 (Warm-up)**:
- Warm-up completes in < 1 second
- First real inference fast (< 20ms)
- No CUDA initialization delays

## Maximum Expected Time

- **Load single model (cold)**: < 3 seconds
- **Load single model (warm)**: < 500ms
- **Load all models (cold)**: < 10 seconds
- **Load all models (warm)**: < 2 seconds
- **Single inference**: < 15ms (RTX 3070), < 25ms (RTX 2060)
- **ONNX conversion**: < 60 seconds (one-time cost)
- **Total test suite**: < 180 seconds
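
The budgets above can be collected into a single table that a test runner checks measured timings against. The keys and helper below are assumptions of this sketch; the millisecond values come directly from the list above.

```python
# The timing budgets above as a lookup table. over_budget() returns the
# subset of measurements that violate their budget (empty dict == pass).
BUDGET_MS = {
    "load_single_cold": 3000,
    "load_single_warm": 500,
    "load_all_cold": 10_000,
    "load_all_warm": 2000,
    "inference_rtx3070": 15,
    "inference_rtx2060": 25,
    "onnx_conversion": 60_000,
    "total_suite": 180_000,
}

def over_budget(measured_ms: dict) -> dict:
    return {k: v for k, v in measured_ms.items() if v >= BUDGET_MS[k]}
```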

## Test Execution Steps

1. **Setup Phase**:
   a. Verify GPU availability and capabilities
   b. Check TensorRT installation
   c. Prepare model files (ONNX or pre-compiled engines)
   d. Initialize Model Manager

2. **Test Cases 1-4 - Load Individual Models**:
   For each model:
   a. Call load_model(model_name)
   b. Measure load time
   c. Verify model ready
   d. Check input/output shapes
   e. Monitor GPU memory

3. **Test Case 5 - Cold Start**:
   a. Clear any cached engines
   b. Load all 4 models
   c. Measure total time
   d. Verify all functional

4. **Test Case 6 - Warm Start**:
   a. Unload all models
   b. Load all 4 models again (engines cached)
   c. Measure total time
   d. Compare with cold start

5. **Test Case 7 - Inference**:
   a. Load test image
   b. Run SuperPoint inference
   c. Measure inference time
   d. Validate output format

6. **Test Case 8 - Concurrent**:
   a. Prepare 5 inference requests
   b. Submit all simultaneously
   c. Wait for all completions
   d. Check for errors

7. **Test Case 9 - FP16**:
   a. Load SuperPoint in FP16 mode
   b. Run inference
   c. Compare speed with FP32
   d. Validate output quality

8. **Test Case 10 - GPU Memory**:
   a. Query GPU memory before loading
   b. Load all models
   c. Query GPU memory after loading
   d. Calculate usage

9. **Test Case 11 - Unload/Reload**:
   a. Unload SuperPoint
   b. Check memory freed
   c. Reload SuperPoint
   d. Verify no memory leak

10. **Test Case 12 - Missing File**:
    a. Attempt to load non-existent model
    b. Catch error
    c. Verify error message
    d. Check no crash

11. **Test Case 13 - Incompatible GPU**:
    a. Check GPU compute capability
    b. If incompatible, verify detection
    c. Check error handling

12. **Test Case 14 - ONNX Conversion**:
    a. Provide ONNX model file
    b. Trigger conversion
    c. Verify TensorRT engine created
    d. Check caching works

13. **Test Case 15 - Warm-up**:
    a. Load model
    b. Run warm-up inference(s)
    c. Run real inference
    d. Verify fast execution
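
The warm-up measurement in step 13 amounts to timing the first call (which pays one-time initialization) against a later call. The stub below simulates CUDA initialization cost with a sleep behind a flag; in the real test the cost comes from the driver, so the numbers here are purely illustrative.

```python
# Sketch of the warm-up timing harness: the first inference absorbs a
# simulated one-time init cost, so the second is measurably faster.
import time

class StubModel:
    def __init__(self):
        self._initialized = False

    def infer(self):
        if not self._initialized:       # stand-in for lazy CUDA init
            time.sleep(0.05)
            self._initialized = True
        return "ok"

def timed_ms(fn):
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0

model = StubModel()
warmup_ms = timed_ms(model.infer)   # absorbs the one-time cost
real_ms = timed_ms(model.infer)     # should now be fast
```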

## Pass/Fail Criteria

**Overall Test Passes If**:
- All models load successfully (cold and warm start)
- Total load time < 10 seconds (cold), < 2 seconds (warm)
- Inference times meet the AC-7 requirement (contribute to < 5s per image)
- GPU memory usage < 6GB
- No memory leaks
- Concurrent inference works correctly
- Error handling is robust

**Test Fails If**:
- Any model fails to load (with valid files)
- Load time exceeds 15 seconds
- Inference time > 50ms on RTX 3070 or > 100ms on RTX 2060
- GPU memory exceeds 8GB
- Memory leaks detected (> 500MB growth)
- Concurrent inference causes crashes
- Error handling is invalid (crashes, silent failures, or unclear messages)

## Additional Validation

**TensorRT Optimization**:
Verify optimizations:
- Layer fusion
- Kernel auto-tuning
- Dynamic tensor memory allocation
- FP16/INT8 quantization support

**Model Configuration**:
- Input preprocessing (normalization, resizing)
- Output postprocessing (NMS, thresholding)
- Batch size configuration
- Dynamic vs static shapes

**Performance Benchmarks**:

**SuperPoint**:
- FP32: ~15ms on RTX 3070, ~25ms on RTX 2060
- FP16: ~8ms on RTX 3070, ~12ms on RTX 2060
- Keypoints extracted: 200-500 per image

**LightGlue**:
- FP32: ~50ms on RTX 3070, ~100ms on RTX 2060
- FP16: ~30ms on RTX 3070, ~60ms on RTX 2060
- Matches: 50-200 per image pair

**DINOv2**:
- FP32: ~150ms on RTX 3070, ~250ms on RTX 2060
- FP16: ~80ms on RTX 3070, ~130ms on RTX 2060
- Feature dimension: 384 (DINOv2-Small)

**LiteSAM**:
- FP32: ~60ms on RTX 3070, ~120ms on RTX 2060
- FP16: ~40ms on RTX 3070, ~70ms on RTX 2060
- Correspondence points: 100-500

**Total Vision Pipeline** (L1 + L2 or L3):
- Target: < 500ms per image (to meet AC-7 < 5s total)
- Current estimate: 15 + 50 = 65ms (L1) or 60ms (L3)
- Well within budget
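
The latency arithmetic above, written out: per-stage FP32 estimates on the RTX 3070 summed against the 500 ms per-image budget. The stage groupings (L1 = SuperPoint + LightGlue, L3 = LiteSAM) follow the text; the numbers are the document's own estimates, collected into a checkable form.

```python
# Per-stage FP32 latency estimates (RTX 3070, from the benchmarks above)
# summed against the 500 ms per-image pipeline budget.
STAGE_MS = {"superpoint": 15, "lightglue": 50, "dinov2": 150, "litesam": 60}
PIPELINE_BUDGET_MS = 500

l1_ms = STAGE_MS["superpoint"] + STAGE_MS["lightglue"]   # feature path
l3_ms = STAGE_MS["litesam"]                              # cross-view path
```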

**Error Scenarios**:
1. **CUDA Out of Memory**: Provide clear error with suggestions
2. **Model File Corrupted**: Detect and report
3. **TensorRT Version Mismatch**: Handle gracefully
4. **GPU Driver Issues**: Detect and suggest update
5. **Insufficient Compute Capability**: Reject with clear message

**Thread Safety**:
- Multiple threads calling inference simultaneously
- Model loading during inference
- Thread-safe reference counting
- No race conditions

**Resource Cleanup**:
- GPU memory freed on model unload
- CUDA contexts released properly
- File handles closed
- No resource leaks

**Monitoring and Logging**:
- Log model load times
- Track inference times (min/max/avg)
- Monitor GPU utilization
- Alert on performance degradation
- Memory usage trends

**Compatibility Matrix**:

| GPU Model | VRAM | Compute Capability | FP16 Support | Expected Performance |
|-----------|------|--------------------|--------------|----------------------|
| RTX 2060  | 6GB  | 7.5                | Yes          | Baseline (25ms SuperPoint) |
| RTX 3070  | 8GB  | 8.6                | Yes          | ~1.5x faster (15ms)  |
| RTX 4070  | 12GB | 8.9                | Yes          | ~2x faster (10ms)    |
**Model Versions and Updates**:
|
||||
- Support multiple model versions
|
||||
- Graceful migration to new versions
|
||||
- A/B testing capability
|
||||
- Rollback on performance regression
|
||||
|
||||
**Configuration Options**:
|
||||
- Model path configuration
|
||||
- Precision selection (FP32/FP16)
|
||||
- Batch size tuning
|
||||
- Workspace size for TensorRT builder
|
||||
- Engine cache location
|
||||
- Warm-up settings
|
||||
|
||||
Reference in New Issue
Block a user