Integration Test: Model Manager
Summary
Validate the Model Manager component responsible for loading, managing, and providing access to TensorRT-optimized deep learning models used by the vision pipeline.
Component Under Test
Component: Model Manager
Location: gps_denied_15_model_manager
Dependencies:
- TensorRT runtime
- ONNX model files
- GPU (NVIDIA RTX 2060/3070)
- Configuration Manager
- File system access
Detailed Description
This test validates that the Model Manager can:
- Load TensorRT engines for SuperPoint, LightGlue, DINOv2, and LiteSAM
- Convert ONNX models to TensorRT engines if needed
- Manage model lifecycle (load, warm-up, unload)
- Provide thread-safe access to models for concurrent requests
- Handle GPU memory allocation efficiently
- Support FP16 precision for performance
- Cache compiled engines for fast startup
- Detect and adapt to available GPU capabilities
- Handle model loading failures gracefully
- Monitor GPU utilization and memory
Per the AC-7 requirement of <5 seconds per image, models must be optimized with TensorRT and loaded efficiently.
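The lifecycle responsibilities above can be sketched as a small thread-safe registry. Everything here (the class name, method names, and the loader callable) is a hypothetical stand-in for the real TensorRT-backed component, so the flow runs anywhere:

```python
import threading
import time

class ModelManager:
    """Sketch of the load / reuse / unload lifecycle (hypothetical API).

    A loader callable stands in for TensorRT engine deserialization.
    """

    def __init__(self):
        self._models = {}              # name -> loaded model object
        self._lock = threading.Lock()  # registry guard for concurrent callers
        self.load_times_ms = {}        # name -> measured load time

    def load_model(self, name, loader):
        with self._lock:
            if name in self._models:
                return self._models[name]   # already loaded: reuse
            start = time.perf_counter()
            model = loader()                # deserialize / build the engine
            self.load_times_ms[name] = (time.perf_counter() - start) * 1000.0
            self._models[name] = model
            return model

    def is_loaded(self, name):
        with self._lock:
            return name in self._models

    def unload_model(self, name):
        with self._lock:
            # Drop the reference; the real component would free GPU memory here.
            self._models.pop(name, None)

mgr = ModelManager()
mgr.load_model("superpoint", lambda: object())
print(mgr.is_loaded("superpoint"))  # True
```

A second `load_model` call for the same name returns the cached object instead of loading again, which is the behavior Test Case 11 exercises around unload/reload.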
Input Data
Test Case 1: Load SuperPoint Model
- Model: SuperPoint feature detector
- Input Size: 1024x683 (for FullHD downscaled)
- Format: TensorRT engine or ONNX → TensorRT
- Expected: Model loaded, ready for inference
Test Case 2: Load LightGlue Model
- Model: LightGlue feature matcher
- Input: Two sets of 256 features each
- Format: TensorRT engine
- Expected: Model loaded with correct input/output bindings
Test Case 3: Load DINOv2 Model
- Model: DINOv2-Small for AnyLoc
- Input Size: 512x512
- Format: TensorRT engine
- Expected: Model loaded, optimized for batch processing
Test Case 4: Load LiteSAM Model
- Model: LiteSAM for cross-view matching
- Input: UAV image + satellite tile
- Format: TensorRT engine
- Expected: Model loaded with multi-input support
Test Case 5: Cold Start (All Models)
- Scenario: Load all 4 models from scratch
- Expected: All models ready within 10 seconds
Test Case 6: Warm Start (Cached Engines)
- Scenario: Load pre-compiled TensorRT engines
- Expected: All models ready within 2 seconds
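The cold/warm distinction hinges on the engine cache. One way to key cached engines (the key layout here is an assumption, not the component's actual scheme) is to hash everything an engine is built against, since a TensorRT engine is only valid for the exact model, precision, input shape, and TensorRT version used at build time:

```python
import hashlib
from pathlib import Path

def engine_cache_path(cache_dir, model_name, precision, input_shape, trt_version):
    """Derive a deterministic cache filename for a compiled engine.

    All build-affecting parameters go into the key so a stale engine is
    never reused after a precision, shape, or TensorRT upgrade.
    """
    key = f"{model_name}|{precision}|{input_shape}|{trt_version}"
    digest = hashlib.sha256(key.encode()).hexdigest()[:16]
    return Path(cache_dir) / f"{model_name}_{precision}_{digest}.engine"

path = engine_cache_path("engine_cache", "superpoint", "fp16",
                         (1, 3, 1024, 683), "8.6.1")
print(path.suffix)  # .engine
```

A warm start then reduces to a file-existence check on this path before falling back to ONNX conversion.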
Test Case 7: Model Inference (SuperPoint)
- Input: Test image AD000001.jpg (1024x683)
- Expected Output: Keypoints and descriptors
- Expected: Inference time <15ms on RTX 3070
Test Case 8: Concurrent Inference
- Scenario: 5 simultaneous inference requests to SuperPoint
- Expected: All complete successfully, no crashes
Test Case 9: FP16 Precision
- Model: SuperPoint in FP16 mode
- Expected: 2-3x speedup vs FP32, minimal accuracy loss
Test Case 10: GPU Memory Management
- Scenario: Load all models, monitor GPU memory
- Expected: Total GPU memory < 6GB (fits on RTX 2060)
Test Case 11: Model Unload and Reload
- Scenario: Unload SuperPoint, reload it
- Expected: Successful reload, no memory leak
Test Case 12: Handle Missing Model File
- Scenario: Attempt to load non-existent model
- Expected: Clear error message, graceful failure
Test Case 13: Handle Incompatible GPU
- Scenario: Simulate GPU without required compute capability
- Expected: Detect and report incompatibility
Test Case 14: ONNX to TensorRT Conversion
- Model: ONNX model file
- Expected: Automatic conversion to TensorRT, caching
Test Case 15: Model Warm-up
- Scenario: Run warm-up inference after loading
- Expected: First real inference is fast (no CUDA init overhead)
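The ONNX-to-TensorRT conversion in Test Case 14 is typically driven by `trtexec`, which ships with TensorRT. A sketch that only assembles the command line (flag names are per recent TensorRT releases; older versions use `--workspace` instead of `--memPoolSize`):

```python
def trtexec_command(onnx_path, engine_path, fp16=True, workspace_mb=1024):
    """Build a trtexec invocation for ONNX -> TensorRT engine conversion."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # enable half-precision kernels
    # Bound the builder workspace so conversion fits alongside other GPU users.
    cmd.append(f"--memPoolSize=workspace:{workspace_mb}M")
    return cmd

print(" ".join(trtexec_command("superpoint.onnx", "superpoint.engine")))
# trtexec --onnx=superpoint.onnx --saveEngine=superpoint.engine --fp16 --memPoolSize=workspace:1024M
```

The test would run this once, verify the engine file appears, and confirm the next load bypasses conversion.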
Expected Output
For each test case:
```json
{
  "model_name": "superpoint|lightglue|dinov2|litesam",
  "load_status": "success|failed",
  "load_time_ms": <float>,
  "engine_path": "path/to/tensorrt/engine",
  "input_shapes": [
    {"name": "input", "shape": [1, 3, 1024, 683]}
  ],
  "output_shapes": [
    {"name": "keypoints", "shape": [1, 256, 2]},
    {"name": "descriptors", "shape": [1, 256, 256]}
  ],
  "precision": "fp32|fp16",
  "gpu_memory_mb": <float>,
  "inference_time_ms": <float>,
  "error": "string|null"
}
```
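A small validator can enforce this record shape across all test cases. The field list mirrors the schema above; the helper itself is hypothetical:

```python
# Required fields and accepted types for a per-test result record.
REQUIRED = {
    "model_name": str, "load_status": str, "load_time_ms": (int, float),
    "engine_path": str, "input_shapes": list, "output_shapes": list,
    "precision": str, "gpu_memory_mb": (int, float),
    "inference_time_ms": (int, float),
}

def validate_result(rec):
    """Return a list of problems; an empty list means the record is well-formed."""
    problems = [f"missing field: {k}" for k in REQUIRED if k not in rec]
    for k, t in REQUIRED.items():
        if k in rec and not isinstance(rec[k], t):
            problems.append(f"{k}: unexpected type")
    if rec.get("load_status") not in ("success", "failed"):
        problems.append("load_status must be 'success' or 'failed'")
    if rec.get("load_status") == "failed" and not rec.get("error"):
        problems.append("failed load must carry an error message")
    return problems
```

Running every test case's output through one validator keeps the pass/fail evaluation mechanical.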
Success Criteria
Test Case 1 (SuperPoint):
- load_status = "success"
- load_time_ms < 3000
- Model ready for inference
Test Case 2 (LightGlue):
- load_status = "success"
- load_time_ms < 3000
- Correct input/output bindings
Test Case 3 (DINOv2):
- load_status = "success"
- load_time_ms < 5000
- Supports batch processing
Test Case 4 (LiteSAM):
- load_status = "success"
- load_time_ms < 5000
- Multi-input configuration correct
Test Case 5 (Cold Start):
- All 4 models loaded
- Total time < 10 seconds
- All models functional
Test Case 6 (Warm Start):
- All 4 models loaded from cache
- Total time < 2 seconds
- Faster than cold start
Test Case 7 (Inference):
- Inference successful
- Output shapes correct
- inference_time_ms < 15ms (RTX 3070) or < 25ms (RTX 2060)
Test Case 8 (Concurrent):
- All 5 requests complete
- No crashes or errors
- Average inference time < 50ms
Test Case 9 (FP16):
- FP16 engine loads successfully
- inference_time_ms < 8ms (2x speedup)
- Output quality acceptable
Test Case 10 (GPU Memory):
- gpu_memory_mb < 6000 (fits RTX 2060 with 6GB VRAM)
- No out-of-memory errors
- Memory usage stable
Test Case 11 (Unload/Reload):
- Unload successful
- Reload successful
- Memory freed after unload
- No memory leak (< 100MB difference)
Test Case 12 (Missing File):
- load_status = "failed"
- error message clear and specific
- No crash
Test Case 13 (Incompatible GPU):
- Incompatibility detected
- Error message informative
- Suggests compatible GPU or CPU fallback
Test Case 14 (ONNX Conversion):
- Conversion successful
- TensorRT engine cached
- Next load uses cache (faster)
Test Case 15 (Warm-up):
- Warm-up completes < 1 second
- First real inference fast (< 20ms)
- No CUDA initialization delays
Maximum Expected Time
- Load single model (cold): < 3 seconds
- Load single model (warm): < 500ms
- Load all models (cold): < 10 seconds
- Load all models (warm): < 2 seconds
- Single inference: < 15ms (RTX 3070), < 25ms (RTX 2060)
- ONNX conversion: < 60 seconds (one-time cost)
- Total test suite: < 180 seconds
Test Execution Steps
1. Setup Phase:
   a. Verify GPU availability and capabilities
   b. Check TensorRT installation
   c. Prepare model files (ONNX or pre-compiled engines)
   d. Initialize Model Manager
2. Test Cases 1-4 - Load Individual Models (for each model):
   a. Call load_model(model_name)
   b. Measure load time
   c. Verify model ready
   d. Check input/output shapes
   e. Monitor GPU memory
3. Test Case 5 - Cold Start:
   a. Clear any cached engines
   b. Load all 4 models
   c. Measure total time
   d. Verify all functional
4. Test Case 6 - Warm Start:
   a. Unload all models
   b. Load all 4 models again (engines cached)
   c. Measure total time
   d. Compare with cold start
5. Test Case 7 - Inference:
   a. Load test image
   b. Run SuperPoint inference
   c. Measure inference time
   d. Validate output format
6. Test Case 8 - Concurrent:
   a. Prepare 5 inference requests
   b. Submit all simultaneously
   c. Wait for all completions
   d. Check for errors
7. Test Case 9 - FP16:
   a. Load SuperPoint in FP16 mode
   b. Run inference
   c. Compare speed with FP32
   d. Validate output quality
8. Test Case 10 - GPU Memory:
   a. Query GPU memory before loading
   b. Load all models
   c. Query GPU memory after loading
   d. Calculate usage
9. Test Case 11 - Unload/Reload:
   a. Unload SuperPoint
   b. Check memory freed
   c. Reload SuperPoint
   d. Verify no memory leak
10. Test Case 12 - Missing File:
    a. Attempt to load non-existent model
    b. Catch error
    c. Verify error message
    d. Check no crash
11. Test Case 13 - Incompatible GPU:
    a. Check GPU compute capability
    b. If incompatible, verify detection
    c. Check error handling
12. Test Case 14 - ONNX Conversion:
    a. Provide ONNX model file
    b. Trigger conversion
    c. Verify TensorRT engine created
    d. Check caching works
13. Test Case 15 - Warm-up:
    a. Load model
    b. Run warm-up inference(s)
    c. Run real inference
    d. Verify fast execution
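The concurrent-inference step (Test Case 8) can be driven by a small harness. The dummy `infer` callable below is a placeholder for a real SuperPoint call; everything else is plain standard-library Python:

```python
from concurrent.futures import ThreadPoolExecutor
import statistics
import time

def run_concurrent(infer, n_requests=5):
    """Fire n_requests inference calls at once and collect per-call latency."""
    def timed():
        start = time.perf_counter()
        infer()
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        futures = [pool.submit(timed) for _ in range(n_requests)]
        times_ms = [f.result() for f in futures]  # re-raises any worker error
    return {
        "count": len(times_ms),
        "avg_ms": statistics.mean(times_ms),
        "max_ms": max(times_ms),
    }

# A 5 ms sleep stands in for inference; a real run would call the engine.
stats = run_concurrent(lambda: time.sleep(0.005))
print(stats["count"])  # 5
```

Because `f.result()` re-raises exceptions from worker threads, any crash inside an inference call fails the harness rather than being silently dropped.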
Pass/Fail Criteria
Overall Test Passes If:
- All models load successfully (cold and warm start)
- Total load time < 10 seconds (cold), < 2 seconds (warm)
- Inference times meet AC-7 requirements (contribute to < 5s per image)
- GPU memory usage < 6GB
- No memory leaks
- Concurrent inference works correctly
- Error handling robust
Test Fails If:
- Any model fails to load (with valid files)
- Load time exceeds 15 seconds
- Inference time > 50ms on RTX 3070 or > 100ms on RTX 2060
- GPU memory exceeds 8GB
- Memory leaks detected (> 500MB growth)
- Concurrent inference causes crashes
- Error handling is missing or incorrect
Additional Validation
TensorRT Optimization: Verify optimizations:
- Layer fusion
- Kernel auto-tuning
- Dynamic tensor memory allocation
- FP16/INT8 quantization support
Model Configuration:
- Input preprocessing (normalization, resizing)
- Output postprocessing (NMS, thresholding)
- Batch size configuration
- Dynamic vs static shapes
Performance Benchmarks:
SuperPoint:
- FP32: ~15ms on RTX 3070, ~25ms on RTX 2060
- FP16: ~8ms on RTX 3070, ~12ms on RTX 2060
- Keypoints extracted: 200-500 per image
LightGlue:
- FP32: ~50ms on RTX 3070, ~100ms on RTX 2060
- FP16: ~30ms on RTX 3070, ~60ms on RTX 2060
- Matches: 50-200 per image pair
DINOv2:
- FP32: ~150ms on RTX 3070, ~250ms on RTX 2060
- FP16: ~80ms on RTX 3070, ~130ms on RTX 2060
- Feature dimension: 384 (DINOv2-Small)
LiteSAM:
- FP32: ~60ms on RTX 3070, ~120ms on RTX 2060
- FP16: ~40ms on RTX 3070, ~70ms on RTX 2060
- Correspondence points: 100-500
Total Vision Pipeline (L1 + L2 or L3):
- Target: < 500ms per image (to meet AC-7 < 5s total)
- Current estimate: 15 + 50 = 65ms (L1 + L2) or 60ms (L3)
- Well within budget
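The budget arithmetic above can be checked mechanically. Stage names and estimates come from the benchmark tables; the helper itself is illustrative:

```python
PIPELINE_BUDGET_MS = 500.0  # per-image budget toward the AC-7 < 5 s total

def pipeline_total_ms(stage_times_ms):
    """Sum per-stage latencies and compare against the pipeline budget."""
    total = sum(stage_times_ms.values())
    return total, total < PIPELINE_BUDGET_MS

# L1 + L2 path: SuperPoint + LightGlue (FP32, RTX 3070 estimates from above)
total, ok = pipeline_total_ms({"superpoint": 15.0, "lightglue": 50.0})
print(total, ok)  # 65.0 True
```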
Error Scenarios:
- CUDA Out of Memory: Provide clear error with suggestions
- Model File Corrupted: Detect and report
- TensorRT Version Mismatch: Handle gracefully
- GPU Driver Issues: Detect and suggest update
- Insufficient Compute Capability: Reject with clear message
Thread Safety:
- Multiple threads calling inference simultaneously
- Model loading during inference
- Thread-safe reference counting
- No race conditions
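Thread-safe reference counting can be sketched with a condition variable: inference acquires a reference, and unload waits until the count drops to zero so an engine is never destroyed mid-inference. This is a hypothetical design, not necessarily the real implementation:

```python
import threading

class RefCountedModel:
    """Guard a model object so unload cannot race with in-flight inference."""

    def __init__(self, model):
        self._model = model
        self._refs = 0
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            self._refs += 1       # one more in-flight inference
            return self._model

    def release(self):
        with self._cond:
            self._refs -= 1
            if self._refs == 0:
                self._cond.notify_all()  # wake any waiting unloader

    def wait_idle(self, timeout=None):
        """Block until no inference holds a reference; True if idle."""
        with self._cond:
            return self._cond.wait_for(lambda: self._refs == 0, timeout)
```

An unload path would call `wait_idle()` before releasing GPU resources, which directly exercises the "model loading during inference" scenario.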
Resource Cleanup:
- GPU memory freed on model unload
- CUDA contexts released properly
- File handles closed
- No resource leaks
Monitoring and Logging:
- Log model load times
- Track inference times (min/max/avg)
- Monitor GPU utilization
- Alert on performance degradation
- Memory usage trends
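The min/max/avg tracking called for above reduces to a small accumulator (class and method names are illustrative):

```python
class InferenceStats:
    """Track per-model inference latencies for the min/max/avg log lines."""

    def __init__(self):
        self._times_ms = []

    def record(self, ms):
        self._times_ms.append(ms)

    def summary(self):
        if not self._times_ms:
            return None  # nothing recorded yet
        t = self._times_ms
        return {"count": len(t), "min_ms": min(t), "max_ms": max(t),
                "avg_ms": sum(t) / len(t)}

stats = InferenceStats()
for ms in (12.0, 14.0, 10.0):
    stats.record(ms)
print(stats.summary())  # {'count': 3, 'min_ms': 10.0, 'max_ms': 14.0, 'avg_ms': 12.0}
```

A degradation alert would compare `avg_ms` over a recent window against the benchmark baselines listed earlier.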
Compatibility Matrix:
| GPU Model | VRAM | Compute Capability | FP16 Support | Expected Performance |
|---|---|---|---|---|
| RTX 2060 | 6GB | 7.5 | Yes | Baseline (25ms SuperPoint) |
| RTX 3070 | 8GB | 8.6 | Yes | ~1.5x faster (15ms) |
| RTX 4070 | 12GB | 8.9 | Yes | ~2x faster (10ms) |
Model Versions and Updates:
- Support multiple model versions
- Graceful migration to new versions
- A/B testing capability
- Rollback on performance regression
Configuration Options:
- Model path configuration
- Precision selection (FP32/FP16)
- Batch size tuning
- Workspace size for TensorRT builder
- Engine cache location
- Warm-up settings
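The configuration surface above can be captured in a single dataclass. Field names and defaults here are assumptions for illustration, not the component's real schema:

```python
from dataclasses import dataclass

@dataclass
class ModelManagerConfig:
    """Hypothetical configuration mirroring the options listed above."""
    model_dir: str = "models"            # where ONNX / engine files live
    engine_cache_dir: str = "engine_cache"
    precision: str = "fp16"              # "fp32" or "fp16"
    batch_size: int = 1
    workspace_size_mb: int = 1024        # TensorRT builder workspace
    warmup_iterations: int = 3

    def __post_init__(self):
        if self.precision not in ("fp32", "fp16"):
            raise ValueError(f"unsupported precision: {self.precision}")

cfg = ModelManagerConfig(precision="fp32")
print(cfg.precision)  # fp32
```

Validating in `__post_init__` rejects a bad precision at construction time rather than at the first engine build.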