Integration Test: Metric Refinement
Summary
Validate the Layer 3 (L3) Metric Refinement component, which uses LiteSAM for precise cross-view geo-localization between UAV images and satellite tiles.
Component Under Test
Component: Metric Refinement (L3)
Location: gps_denied_09_metric_refinement
Dependencies:
- Model Manager (TensorRT engine for LiteSAM)
- Global Place Recognition (provides candidate satellite tiles)
- Coordinate Transformer (pixel-to-GPS conversion)
- Satellite Data Manager
Detailed Description
This test validates that the Metric Refinement component can:
- Accept a UAV image and a candidate satellite tile from L2
- Compute a dense correspondence field using LiteSAM
- Estimate the homography between the two images
- Extract precise GPS coordinates from the homography
- Achieve the target accuracy: error <50m for at least 80% of images and <20m for at least 60% of images
- Handle scale variations due to altitude changes
The component is critical for meeting the accuracy requirements (AC-1, AC-2) because it provides absolute GPS anchors that reset accumulated drift in the factor graph.
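The step from a homography to GPS coordinates can be sketched as below. This is a minimal illustration, not the component's actual implementation. It assumes that the homography maps UAV pixels to satellite-tile pixels, that the tile is north-up with known corner coordinates, and that the tile is small enough for linear interpolation between its corners to be acceptable. `TileGeoref`, `apply_homography`, `tile_pixel_to_gps` and `uav_center_to_gps` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileGeoref:
    """Hypothetical georeference of a north-up satellite tile (assumption)."""
    lat_top: float
    lon_left: float
    lat_bottom: float
    lon_right: float
    width_px: int
    height_px: int


def apply_homography(h, x, y):
    """Map pixel (x, y) through a 3x3 homography given as nested lists."""
    xw = h[0][0] * x + h[0][1] * y + h[0][2]
    yw = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return xw / w, yw / w


def tile_pixel_to_gps(georef, px, py):
    """Linearly interpolate lat/lon across the tile (small-tile assumption)."""
    lat = georef.lat_top + (py / georef.height_px) * (georef.lat_bottom - georef.lat_top)
    lon = georef.lon_left + (px / georef.width_px) * (georef.lon_right - georef.lon_left)
    return lat, lon


def uav_center_to_gps(h, uav_width, uav_height, georef):
    """Project the UAV image center into the tile, then convert it to GPS."""
    tx, ty = apply_homography(h, uav_width / 2.0, uav_height / 2.0)
    return tile_pixel_to_gps(georef, tx, ty)
```

Using the image center as the estimated position assumes a nadir-pointing camera. A production Coordinate Transformer would likely use the tile's real map projection instead of linear interpolation.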
Input Data
Test Case 1: High-Quality Reference Match
- UAV Image: AD000001.jpg
- Satellite Tile: AD000001_gmaps.png (reference image available)
- Ground truth GPS: 48.275292, 37.385220
- Expected accuracy: < 10m (best case scenario)
Test Case 2: Standard Flight Point
- UAV Image: AD000015.jpg
- Satellite Tile: Retrieved via L2 for this location
- Ground truth GPS: 48.268291, 37.369815
- Expected accuracy: < 20m
Test Case 3: After Sharp Turn
- UAV Image: AD000033.jpg (after 220m jump)
- Satellite Tile: Retrieved via L2
- Ground truth GPS: 48.258653, 37.347004
- Expected accuracy: < 50m
Test Case 4: Near Outlier Region
- UAV Image: AD000047.jpg
- Satellite Tile: Retrieved via L2
- Ground truth GPS: 48.249414, 37.343296
- Expected accuracy: < 50m
Test Case 5: End of Route
- UAV Image: AD000060.jpg
- Satellite Tile: Retrieved via L2
- Ground truth GPS: 48.256246, 37.357485
- Expected accuracy: < 20m
Test Case 6: Multi-Scale Test
- UAV Images: AD000010.jpg, AD000020.jpg, AD000030.jpg
- Context: Test consistency across different parts of route
- Expected: All should achieve < 50m accuracy
Expected Output
For each test case:
{
  "success": true/false,
  "uav_image": "AD000001.jpg",
  "satellite_tile_id": "tile_xyz",
  "estimated_gps": {
    "lat": <float>,
    "lon": <float>
  },
  "ground_truth_gps": {
    "lat": <float>,
    "lon": <float>
  },
  "error_meters": <float>,
  "confidence": <float 0-1>,
  "num_correspondences": <integer>,
  "homography_matrix": [[h11, h12, h13], [h21, h22, h23], [h31, h32, h33]],
  "processing_time_ms": <float>
}
Success Criteria
Per Test Case:
- success = true
- num_correspondences > 50
- confidence > 0.6
- processing_time_ms < 100ms (RTX 3070) or < 150ms (RTX 2060)
Test Case Specific Accuracy:
- Test Case 1: error_meters < 10m
- Test Case 2: error_meters < 20m
- Test Case 3: error_meters < 50m
- Test Case 4: error_meters < 50m
- Test Case 5: error_meters < 20m
- Test Case 6: All three < 50m
Overall Accuracy Targets (aligned with AC-1, AC-2):
- At least 80% of test cases achieve error < 50m
- At least 60% of test cases achieve error < 20m
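The per-case criteria above could be checked with a small helper like the one below. It is a sketch that assumes each result is a dictionary shaped like the Expected Output, and `check_case` and `ACCURACY_LIMIT_M` are hypothetical names. For Test Case 6 the helper would be called once per image.

```python
# Accuracy limit in meters per test case, taken from the list above.
ACCURACY_LIMIT_M = {1: 10.0, 2: 20.0, 3: 50.0, 4: 50.0, 5: 20.0, 6: 50.0}


def check_case(result, case_id, time_limit_ms=100.0):
    """Return a list of violated criteria; an empty list means the case passes.

    time_limit_ms defaults to the RTX 3070 budget; pass 150.0 for an RTX 2060.
    """
    failures = []
    if not result["success"]:
        failures.append("success is false")
    if result["num_correspondences"] <= 50:
        failures.append("num_correspondences <= 50")
    if result["confidence"] <= 0.6:
        failures.append("confidence <= 0.6")
    if result["processing_time_ms"] >= time_limit_ms:
        failures.append("processing_time_ms over budget")
    if result["error_meters"] >= ACCURACY_LIMIT_M[case_id]:
        failures.append("error_meters over case limit")
    return failures
```

Returning the list of violated criteria, rather than a single boolean, makes the per-image results table easier to fill in.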
Maximum Expected Time
- Per image pair: < 100ms (on RTX 3070)
- Per image pair: < 150ms (on RTX 2060)
- Model loading: < 5 seconds
- Total test suite: < 15 seconds
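One way to measure the per-pair time against these budgets is sketched below. `measure_ms`, `within_budget` and `TIME_BUDGET_MS` are hypothetical names, and taking the median over a few repeats after a warm-up run is an assumption rather than something this specification prescribes.

```python
import statistics
import time

# Per-image-pair budgets in milliseconds, from the list above.
TIME_BUDGET_MS = {"RTX 3070": 100.0, "RTX 2060": 150.0}


def measure_ms(fn, *args, warmup=1, repeats=5):
    """Return the median wall-clock time of fn(*args) in milliseconds."""
    for _ in range(warmup):
        fn(*args)  # warm-up runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def within_budget(elapsed_ms, gpu):
    """Check a measured time against the budget for the named GPU."""
    return elapsed_ms < TIME_BUDGET_MS[gpu]
```

For GPU inference, `fn` must block until the result is actually available. Otherwise only the kernel launch is timed.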
Test Execution Steps
- Setup Phase:
  a. Initialize Model Manager and load LiteSAM TensorRT engine
  b. Initialize Satellite Data Manager with pre-cached tiles
  c. Initialize Coordinate Transformer for GPS calculations
  d. Verify satellite tiles are georeferenced correctly
- For Each Test Case:
  a. Load UAV image from test data
  b. Retrieve appropriate satellite tile (use L2 or pre-specified reference)
  c. Run LiteSAM to compute correspondence field
  d. Estimate homography from correspondences
  e. Extract GPS coordinates using homography and satellite tile georeference
  f. Calculate haversine distance to ground truth
  g. Record all metrics
- Validation Phase:
  a. Calculate percentage achieving <50m accuracy
  b. Calculate percentage achieving <20m accuracy
  c. Verify processing times meet constraints
  d. Check for outliers (errors >100m)
  e. Validate confidence scores correlate with accuracy
- Report Generation:
  a. Per-image results table
  b. Accuracy distribution histogram
  c. Timing statistics
  d. Pass/fail determination
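The haversine distance used to compute `error_meters` can be sketched as follows. The function name `haversine_m` and the mean Earth radius of 6,371,000 m are assumptions; the specification does not say which radius the test harness uses.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
    # Clamp to guard against rounding pushing sqrt(a) slightly above 1.
    return 2.0 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
```

For example, `haversine_m(48.275292, 37.385220, 48.268291, 37.369815)` gives the distance between the ground-truth points of Test Case 1 and Test Case 2.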
Pass/Fail Criteria
Overall Test Passes If:
- ≥80% of test cases achieve error <50m (meets AC-1)
- ≥60% of test cases achieve error <20m (meets AC-2)
- Average processing time <100ms
- No test case exceeds 200m error
- Success rate >90%
Test Fails If:
- <80% achieve error <50m
- <60% achieve error <20m
- Any test case exceeds 500m error (catastrophic failure)
- More than 1 test case fails completely (success=false)
- Average processing time >150ms
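The two lists above can be combined into a single verdict, as in the sketch below. It assumes every result dictionary carries `error_meters`, `processing_time_ms` and `success`, and `evaluate_suite` is a hypothetical name. The lists leave a gap (for example a maximum error between 200m and 500m, or an average time between 100ms and 150ms). The sketch reports such runs as "inconclusive", which is an assumption, not something the criteria state.

```python
def evaluate_suite(results):
    """Return "pass", "fail" or "inconclusive" for a list of result dicts."""
    if not results:
        raise ValueError("no results to evaluate")
    n = len(results)
    errors = [r["error_meters"] for r in results]
    frac_under_50 = sum(e < 50.0 for e in errors) / n
    frac_under_20 = sum(e < 20.0 for e in errors) / n
    avg_time_ms = sum(r["processing_time_ms"] for r in results) / n
    num_failed = sum(not r["success"] for r in results)
    success_rate = 1.0 - num_failed / n
    max_error = max(errors)

    # Any single fail condition fails the whole suite.
    if (frac_under_50 < 0.8 or frac_under_20 < 0.6 or max_error > 500.0
            or num_failed > 1 or avg_time_ms > 150.0):
        return "fail"

    # Every pass condition must hold for the suite to pass.
    if (frac_under_50 >= 0.8 and frac_under_20 >= 0.6 and avg_time_ms < 100.0
            and max_error <= 200.0 and success_rate > 0.9):
        return "pass"

    return "inconclusive"
```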
Additional Validation
Robustness Tests:
- Scale Variation: Test with artificially scaled UAV images (0.8x, 1.2x) - should maintain accuracy
- Rotation: Test with rotated UAV images (±15°) - the rotation should be detected and handled via the rotation manager
- Seasonal Difference: If available, test with satellite imagery from a different season - should maintain <100m accuracy
- Low Contrast: Test with brightness/contrast adjusted images - should degrade gracefully
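The scaled and rotated variants can be described by a similarity transform about the image center, sketched below. The transform could be handed to an image-warping routine to synthesize the inputs, and it can also be used to map known points into the perturbed image. `similarity_about_center` and `ROBUSTNESS_VARIANTS` are hypothetical names, and the variant list simply restates the scale and rotation values above.

```python
import math

# (scale, rotation in degrees) for the scale and rotation robustness tests.
ROBUSTNESS_VARIANTS = [(0.8, 0.0), (1.2, 0.0), (1.0, -15.0), (1.0, 15.0)]


def similarity_about_center(width, height, scale=1.0, angle_deg=0.0):
    """3x3 matrix that scales and rotates pixel coordinates about the center.

    Implements p' = s * R(angle) * (p - c) + c with c the image center.
    With the y axis pointing down, a positive angle appears clockwise.
    """
    cx, cy = width / 2.0, height / 2.0
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [
        [c, -s, cx - c * cx + s * cy],
        [s, c, cy - s * cx - c * cy],
        [0.0, 0.0, 1.0],
    ]


def transform_point(m, x, y):
    """Apply the affine part of a 3x3 similarity matrix to a point."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```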
Quality Metrics:
- RMSE (Root Mean Square Error): Overall RMSE should be <30m
- Median Error: Should be <25m
- 90th Percentile Error: Should be <60m
- Correspondence Quality: Average num_correspondences should be >100
- Confidence Correlation: Correlation between confidence and accuracy should be >0.5 (higher confidence should go with lower error_meters)
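The quality metrics above can be computed from the recorded results as sketched below. Several choices here are assumptions: the 90th percentile uses linear interpolation between ranks, the correlation is Pearson's, and "accuracy" is taken as the negative of `error_meters` so that a positive correlation means higher confidence goes with lower error. The function names are hypothetical.

```python
import math
import statistics


def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def quality_metrics(results):
    """Summarize a list of result dicts shaped like the Expected Output."""
    errors = [r["error_meters"] for r in results]
    confidences = [r["confidence"] for r in results]
    return {
        "rmse_m": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "median_m": statistics.median(errors),
        "p90_m": percentile(errors, 90),
        "mean_correspondences": statistics.mean(r["num_correspondences"] for r in results),
        # Accuracy proxy is -error (assumption), so positive means "as desired".
        "confidence_accuracy_corr": pearson(confidences, [-e for e in errors]),
    }
```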
Error Analysis
If the test fails, analyze:
- Distribution of errors across test cases
- Correlation between num_correspondences and accuracy
- Relationship between GPS distance jumps and accuracy degradation
- Impact of terrain features (fields vs roads) on accuracy
- Processing time variance across test cases