gps-denied-onboard/docs/03_tests/13_model_manager_integration_spec.md
Oleksandr Bezdieniezhnykh, 2025-11-24

Integration Test: Model Manager

Summary

Validate the Model Manager component responsible for loading, managing, and providing access to TensorRT-optimized deep learning models used by the vision pipeline.

Component Under Test

Component: Model Manager

Location: gps_denied_15_model_manager

Dependencies:

  • TensorRT runtime
  • ONNX model files
  • GPU (NVIDIA RTX 2060/3070)
  • Configuration Manager
  • File system access

Detailed Description

This test validates that the Model Manager can:

  1. Load TensorRT engines for SuperPoint, LightGlue, DINOv2, and LiteSAM
  2. Convert ONNX models to TensorRT engines if needed
  3. Manage model lifecycle (load, warm-up, unload)
  4. Provide thread-safe access to models for concurrent requests
  5. Handle GPU memory allocation efficiently
  6. Support FP16 precision for performance
  7. Cache compiled engines for fast startup
  8. Detect and adapt to available GPU capabilities
  9. Handle model loading failures gracefully
  10. Monitor GPU utilization and memory

Per the AC-7 requirement of < 5 seconds per image, models must be optimized with TensorRT and loaded efficiently.
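The lifecycle requirements above (load, thread-safe access, unload) can be sketched as a minimal manager. The class and method names here are illustrative assumptions, not the component's actual interface:

```python
import threading

class ModelManager:
    """Minimal sketch of the Model Manager lifecycle API.
    Names and structure are illustrative, not the project's real interface."""

    def __init__(self, loader):
        self._loader = loader          # callable: model name -> engine object
        self._models = {}
        self._lock = threading.Lock()  # guards the model table for concurrent access

    def load_model(self, name):
        with self._lock:
            if name not in self._models:          # idempotent load
                self._models[name] = self._loader(name)
            return self._models[name]

    def get_model(self, name):
        with self._lock:
            return self._models[name]             # KeyError if never loaded

    def unload_model(self, name):
        with self._lock:
            self._models.pop(name, None)          # engine freed when refs drop
```

In the real component the `loader` callable would deserialize a TensorRT engine; here it is any factory, which is what makes the lifecycle logic testable without a GPU.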

Input Data

Test Case 1: Load SuperPoint Model

  • Model: SuperPoint feature detector
  • Input Size: 1024x683 (downscaled from Full HD)
  • Format: TensorRT engine or ONNX → TensorRT
  • Expected: Model loaded, ready for inference

Test Case 2: Load LightGlue Model

  • Model: LightGlue feature matcher
  • Input: Two sets of 256 features each
  • Format: TensorRT engine
  • Expected: Model loaded with correct input/output bindings

Test Case 3: Load DINOv2 Model

  • Model: DINOv2-Small for AnyLoc
  • Input Size: 512x512
  • Format: TensorRT engine
  • Expected: Model loaded, optimized for batch processing

Test Case 4: Load LiteSAM Model

  • Model: LiteSAM for cross-view matching
  • Input: UAV image + satellite tile
  • Format: TensorRT engine
  • Expected: Model loaded with multi-input support

Test Case 5: Cold Start (All Models)

  • Scenario: Load all 4 models from scratch
  • Expected: All models ready within 10 seconds

Test Case 6: Warm Start (Cached Engines)

  • Scenario: Load pre-compiled TensorRT engines
  • Expected: All models ready within 2 seconds
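Warm start depends on reusing cached engines only when they are still valid. One way to key the cache is to hash the model file together with precision and TensorRT version; the function name and file layout below are assumptions:

```python
import hashlib
from pathlib import Path

def engine_cache_path(onnx_path, precision, trt_version, cache_dir="engine_cache"):
    """Derive a cache file name so a compiled engine is reused only when the
    ONNX file contents, precision, and TensorRT version all match (sketch)."""
    digest = hashlib.sha256()
    digest.update(Path(onnx_path).read_bytes())            # model contents
    digest.update(f"{precision}:{trt_version}".encode())   # build parameters
    stem = Path(onnx_path).stem
    return Path(cache_dir) / f"{stem}_{precision}_{digest.hexdigest()[:16]}.engine"
```

Any change to the model file or build settings yields a different path, so a stale engine is never silently reused.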

Test Case 7: Model Inference (SuperPoint)

  • Input: Test image AD000001.jpg (1024x683)
  • Expected Output: Keypoints and descriptors
  • Expected: Inference time <15ms on RTX 3070

Test Case 8: Concurrent Inference

  • Scenario: 5 simultaneous inference requests to SuperPoint
  • Expected: All complete successfully, no crashes

Test Case 9: FP16 Precision

  • Model: SuperPoint in FP16 mode
  • Expected: 2-3x speedup vs FP32, minimal accuracy loss

Test Case 10: GPU Memory Management

  • Scenario: Load all models, monitor GPU memory
  • Expected: Total GPU memory < 6GB (fits on RTX 2060)

Test Case 11: Model Unload and Reload

  • Scenario: Unload SuperPoint, reload it
  • Expected: Successful reload, no memory leak

Test Case 12: Handle Missing Model File

  • Scenario: Attempt to load non-existent model
  • Expected: Clear error message, graceful failure

Test Case 13: Handle Incompatible GPU

  • Scenario: Simulate GPU without required compute capability
  • Expected: Detect and report incompatibility

Test Case 14: ONNX to TensorRT Conversion

  • Model: ONNX model file
  • Expected: Automatic conversion to TensorRT, caching

Test Case 15: Model Warm-up

  • Scenario: Run warm-up inference after loading
  • Expected: First real inference is fast (no CUDA init overhead)

Expected Output

For each test case:

{
  "model_name": "superpoint|lightglue|dinov2|litesam",
  "load_status": "success|failed",
  "load_time_ms": <float>,
  "engine_path": "path/to/tensorrt/engine",
  "input_shapes": [
    {"name": "input", "shape": [1, 3, 1024, 683]}
  ],
  "output_shapes": [
    {"name": "keypoints", "shape": [1, 256, 2]},
    {"name": "descriptors", "shape": [1, 256, 256]}
  ],
  "precision": "fp32|fp16",
  "gpu_memory_mb": <float>,
  "inference_time_ms": <float>,
  "error": "string|null"
}
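The report schema above can be checked mechanically by the harness. A hypothetical validator (field names are taken from the JSON above; the helper name is an assumption):

```python
# Required fields and their expected Python types, mirroring the JSON schema.
REQUIRED_FIELDS = {
    "model_name": str, "load_status": str, "load_time_ms": float,
    "engine_path": str, "input_shapes": list, "output_shapes": list,
    "precision": str, "gpu_memory_mb": float, "inference_time_ms": float,
}

def validate_report(report):
    """Return a list of problems with a per-model report (empty == valid)."""
    problems = []
    for name, ftype in REQUIRED_FIELDS.items():
        if name not in report:
            problems.append(f"missing field: {name}")
        elif not isinstance(report[name], ftype):
            problems.append(f"wrong type for {name}")
    if report.get("load_status") not in ("success", "failed"):
        problems.append("load_status must be 'success' or 'failed'")
    if report.get("precision") not in ("fp32", "fp16"):
        problems.append("precision must be 'fp32' or 'fp16'")
    if "error" not in report:           # may be null, but must be present
        problems.append("missing field: error")
    return problems
```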

Success Criteria

Test Case 1 (SuperPoint):

  • load_status = "success"
  • load_time_ms < 3000
  • Model ready for inference

Test Case 2 (LightGlue):

  • load_status = "success"
  • load_time_ms < 3000
  • Correct input/output bindings

Test Case 3 (DINOv2):

  • load_status = "success"
  • load_time_ms < 5000
  • Supports batch processing

Test Case 4 (LiteSAM):

  • load_status = "success"
  • load_time_ms < 5000
  • Multi-input configuration correct

Test Case 5 (Cold Start):

  • All 4 models loaded
  • Total time < 10 seconds
  • All models functional

Test Case 6 (Warm Start):

  • All 4 models loaded from cache
  • Total time < 2 seconds
  • Faster than cold start

Test Case 7 (Inference):

  • Inference successful
  • Output shapes correct
  • inference_time_ms < 15ms (RTX 3070) or < 25ms (RTX 2060)

Test Case 8 (Concurrent):

  • All 5 requests complete
  • No crashes or errors
  • Average inference time < 50ms

Test Case 9 (FP16):

  • FP16 engine loads successfully
  • inference_time_ms < 8ms (2x speedup)
  • Output quality acceptable

Test Case 10 (GPU Memory):

  • gpu_memory_mb < 6000 (fits RTX 2060 with 6GB VRAM)
  • No out-of-memory errors
  • Memory usage stable

Test Case 11 (Unload/Reload):

  • Unload successful
  • Reload successful
  • Memory freed after unload
  • No memory leak (< 100MB difference)
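The < 100MB leak criterion can be tested by running several unload/reload cycles and comparing memory against a baseline. A sketch where the memory query and lifecycle actions are harness-supplied callables (all names illustrative):

```python
def check_unload_reload(query_mem_mb, unload, reload_, cycles=3, tolerance_mb=100.0):
    """Run unload/reload cycles and verify GPU memory returns to within
    tolerance_mb of the starting point (TC11's < 100 MB criterion).
    Returns (ok, growth_mb)."""
    baseline = query_mem_mb()
    for _ in range(cycles):
        unload()
        reload_()
    growth = query_mem_mb() - baseline
    return growth <= tolerance_mb, growth
```

In the real test, `query_mem_mb` would query the driver (e.g. via NVML); running multiple cycles makes a slow per-cycle leak visible that a single cycle might hide.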

Test Case 12 (Missing File):

  • load_status = "failed"
  • error message clear and specific
  • No crash

Test Case 13 (Incompatible GPU):

  • Incompatibility detected
  • Error message informative
  • Suggests compatible GPU or CPU fallback

Test Case 14 (ONNX Conversion):

  • Conversion successful
  • TensorRT engine cached
  • Next load uses cache (faster)

Test Case 15 (Warm-up):

  • Warm-up completes < 1 second
  • First real inference fast (< 20ms)
  • No CUDA initialization delays
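One way to structure the warm-up check: run a few throwaway inferences, then time the first "real" call. `infer` is any callable; in the real test it would wrap the TensorRT execution context (sketch, names assumed):

```python
import time

def warm_up_and_time(infer, dummy_input, warmup_runs=3):
    """Run warm-up inferences, then time the first 'real' inference in ms."""
    for _ in range(warmup_runs):
        infer(dummy_input)                      # absorbs one-time CUDA/init cost
    start = time.perf_counter()
    result = infer(dummy_input)                 # this one is measured
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```

The harness would then assert `elapsed_ms < 20` per the criterion above.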

Maximum Expected Time

  • Load single model (cold): < 3 seconds
  • Load single model (warm): < 500ms
  • Load all models (cold): < 10 seconds
  • Load all models (warm): < 2 seconds
  • Single inference: < 15ms (RTX 3070), < 25ms (RTX 2060)
  • ONNX conversion: < 60 seconds (one-time cost)
  • Total test suite: < 180 seconds
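The one-time ONNX conversion can be done with the TensorRT 8.x Python builder API. A sketch (the import is deferred because this requires TensorRT and a CUDA-capable GPU; error handling is kept minimal):

```python
def build_engine_from_onnx(onnx_path, engine_path, fp16=True):
    """Sketch of ONNX -> TensorRT conversion using the TensorRT 8.x Python API.
    Requires the tensorrt package and an NVIDIA GPU at call time."""
    import tensorrt as trt
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(f"ONNX parse failed: {parser.get_error(0)}")
    config = builder.create_builder_config()
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)   # TC9: FP16 precision mode
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")
    with open(engine_path, "wb") as f:          # cache for warm starts
        f.write(serialized)
```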

Test Execution Steps

  1. Setup Phase:
     a. Verify GPU availability and capabilities
     b. Check TensorRT installation
     c. Prepare model files (ONNX or pre-compiled engines)
     d. Initialize Model Manager

  2. Test Cases 1-4 - Load Individual Models (for each model):
     a. Call load_model(model_name)
     b. Measure load time
     c. Verify model ready
     d. Check input/output shapes
     e. Monitor GPU memory

  3. Test Case 5 - Cold Start:
     a. Clear any cached engines
     b. Load all 4 models
     c. Measure total time
     d. Verify all functional

  4. Test Case 6 - Warm Start:
     a. Unload all models
     b. Load all 4 models again (engines cached)
     c. Measure total time
     d. Compare with cold start

  5. Test Case 7 - Inference:
     a. Load test image
     b. Run SuperPoint inference
     c. Measure inference time
     d. Validate output format

  6. Test Case 8 - Concurrent:
     a. Prepare 5 inference requests
     b. Submit all simultaneously
     c. Wait for all completions
     d. Check for errors

  7. Test Case 9 - FP16:
     a. Load SuperPoint in FP16 mode
     b. Run inference
     c. Compare speed with FP32
     d. Validate output quality

  8. Test Case 10 - GPU Memory:
     a. Query GPU memory before loading
     b. Load all models
     c. Query GPU memory after loading
     d. Calculate usage

  9. Test Case 11 - Unload/Reload:
     a. Unload SuperPoint
     b. Check memory freed
     c. Reload SuperPoint
     d. Verify no memory leak

  10. Test Case 12 - Missing File:
      a. Attempt to load non-existent model
      b. Catch error
      c. Verify error message
      d. Check no crash

  11. Test Case 13 - Incompatible GPU:
      a. Check GPU compute capability
      b. If incompatible, verify detection
      c. Check error handling

  12. Test Case 14 - ONNX Conversion:
      a. Provide ONNX model file
      b. Trigger conversion
      c. Verify TensorRT engine created
      d. Check caching works

  13. Test Case 15 - Warm-up:
      a. Load model
      b. Run warm-up inference(s)
      c. Run real inference
      d. Verify fast execution
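Test Case 8's submit-and-collect step can be driven by a small harness: submit all requests at once to a thread pool and separate results from exceptions. The harness shape and names below are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def run_concurrent_inference(infer, inputs, workers=5):
    """Submit several inference requests simultaneously and collect
    results and exceptions separately (sketch of the TC8 harness)."""
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(infer, x) for x in inputs]
        for fut in futures:
            try:
                results.append(fut.result())
            except Exception as exc:        # a crash in one request must not
                errors.append(exc)          # take down the whole batch
    return results, errors
```

The test then asserts `len(results) == 5` and `errors == []` against the real SuperPoint inference callable.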

Pass/Fail Criteria

Overall Test Passes If:

  • All models load successfully (cold and warm start)
  • Total load time < 10 seconds (cold), < 2 seconds (warm)
  • Inference times meet AC-7 requirements (contribute to < 5s per image)
  • GPU memory usage < 6GB
  • No memory leaks
  • Concurrent inference works correctly
  • Error handling robust

Test Fails If:

  • Any model fails to load (with valid files)
  • Load time exceeds 15 seconds
  • Inference time > 50ms on RTX 3070 or > 100ms on RTX 2060
  • GPU memory exceeds 8GB
  • Memory leaks detected (> 500MB growth)
  • Concurrent inference causes crashes
  • Invalid error handling

Additional Validation

TensorRT Optimization: Verify that the following optimizations are applied:

  • Layer fusion
  • Kernel auto-tuning
  • Dynamic tensor memory allocation
  • FP16/INT8 quantization support

Model Configuration:

  • Input preprocessing (normalization, resizing)
  • Output postprocessing (NMS, thresholding)
  • Batch size configuration
  • Dynamic vs static shapes

Performance Benchmarks:

SuperPoint:

  • FP32: ~15ms on RTX 3070, ~25ms on RTX 2060
  • FP16: ~8ms on RTX 3070, ~12ms on RTX 2060
  • Keypoints extracted: 200-500 per image

LightGlue:

  • FP32: ~50ms on RTX 3070, ~100ms on RTX 2060
  • FP16: ~30ms on RTX 3070, ~60ms on RTX 2060
  • Matches: 50-200 per image pair

DINOv2:

  • FP32: ~150ms on RTX 3070, ~250ms on RTX 2060
  • FP16: ~80ms on RTX 3070, ~130ms on RTX 2060
  • Feature dimension: 384 (DINOv2-Small)

LiteSAM:

  • FP32: ~60ms on RTX 3070, ~120ms on RTX 2060
  • FP16: ~40ms on RTX 3070, ~70ms on RTX 2060
  • Correspondence points: 100-500

Total Vision Pipeline (L1 + L2 or L3):

  • Target: < 500ms per image (to meet AC-7 < 5s total)
  • Current estimate: 15ms (SuperPoint) + 50ms (LightGlue) = 65ms for L1, or 60ms (LiteSAM) for L3
  • Well within budget
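The budget arithmetic above can be kept as an executable check. Latency numbers are copied from the FP32 RTX 3070 benchmarks; the dictionary layout is an assumption:

```python
# Per-model FP32 latencies on RTX 3070 (ms), from the benchmarks above.
LATENCY_MS = {"superpoint": 15, "lightglue": 50, "dinov2": 150, "litesam": 60}

def pipeline_latency(models):
    """Sum per-model latencies for one pipeline configuration."""
    return sum(LATENCY_MS[m] for m in models)

l1 = pipeline_latency(["superpoint", "lightglue"])  # L1 path: 65 ms
l3 = pipeline_latency(["litesam"])                  # L3 path: 60 ms
assert l1 < 500 and l3 < 500                        # both within the 500 ms budget
```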

Error Scenarios:

  1. CUDA Out of Memory: Provide clear error with suggestions
  2. Model File Corrupted: Detect and report
  3. TensorRT Version Mismatch: Handle gracefully
  4. GPU Driver Issues: Detect and suggest update
  5. Insufficient Compute Capability: Reject with clear message

Thread Safety:

  • Multiple threads calling inference simultaneously
  • Model loading during inference
  • Thread-safe reference counting
  • No race conditions

Resource Cleanup:

  • GPU memory freed on model unload
  • CUDA contexts released properly
  • File handles closed
  • No resource leaks
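Cleanup on every exit path, including test failures, is easiest to guarantee with a context manager. A sketch against the hypothetical load_model/unload_model interface:

```python
from contextlib import contextmanager

@contextmanager
def managed_model(manager, name):
    """Guarantee unload (and thus GPU memory release) even if the
    test body raises. `manager` is any object with load_model/unload_model."""
    model = manager.load_model(name)
    try:
        yield model
    finally:
        manager.unload_model(name)   # runs on success and on exception
```

Usage: `with managed_model(mgr, "superpoint") as model: run_inference(model)` leaves no model loaded afterward, whatever happens inside the block.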

Monitoring and Logging:

  • Log model load times
  • Track inference times (min/max/avg)
  • Monitor GPU utilization
  • Alert on performance degradation
  • Memory usage trends

Compatibility Matrix:

| GPU Model | VRAM | Compute Capability | FP16 Support | Expected Performance       |
|-----------|------|--------------------|--------------|----------------------------|
| RTX 2060  | 6GB  | 7.5                | Yes          | Baseline (25ms SuperPoint) |
| RTX 3070  | 8GB  | 8.6                | Yes          | ~1.5x faster (15ms)        |
| RTX 4070  | 12GB | 8.9                | Yes          | ~2x faster (10ms)          |
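The matrix's 7.5 baseline can be enforced by a small compatibility check that also produces the suggested-fallback message required by Test Case 13. Function name and message wording are assumptions; the capability tuple would come from the CUDA device query in the real harness:

```python
MIN_COMPUTE_CAPABILITY = (7, 5)  # RTX 2060 baseline from the matrix above

def check_gpu_compatibility(compute_capability):
    """Return (ok, message) for a (major, minor) compute capability tuple."""
    if compute_capability >= MIN_COMPUTE_CAPABILITY:
        return True, "GPU supported"
    return False, (f"Compute capability {compute_capability} < required "
                   f"{MIN_COMPUTE_CAPABILITY}; use an RTX 2060 or newer, "
                   "or fall back to CPU inference")
```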

Model Versions and Updates:

  • Support multiple model versions
  • Graceful migration to new versions
  • A/B testing capability
  • Rollback on performance regression

Configuration Options:

  • Model path configuration
  • Precision selection (FP32/FP16)
  • Batch size tuning
  • Workspace size for TensorRT builder
  • Engine cache location
  • Warm-up settings
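These options could be grouped into a single validated config object; the field names and defaults below are assumptions, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ModelManagerConfig:
    """Illustrative grouping of the configuration options listed above."""
    model_dir: str = "models"            # model path configuration
    engine_cache_dir: str = "engine_cache"
    precision: str = "fp16"              # "fp32" or "fp16"
    batch_size: int = 1
    workspace_size_mb: int = 2048        # TensorRT builder workspace
    warmup_runs: int = 3                 # warm-up settings

    def __post_init__(self):
        if self.precision not in ("fp32", "fp16"):
            raise ValueError(f"unsupported precision: {self.precision}")
```

Validating in `__post_init__` turns a bad config into an immediate, clear error instead of a failed engine build minutes later.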