add tests

gen_tests updated solution.md updated
2026-04-23 01:46:38 +00:00 · 2025-11-24 22:57:46 +02:00
parent f50006d100
commit 4f8c18a066
49 changed files with 7209 additions and 3 deletions
@@ -0,0 +1,181 @@
+# Integration Test: Metric Refinement
+
+## Summary
+Validate the Layer 3 (L3) Metric Refinement component using LiteSAM for precise cross-view geo-localization between UAV images and satellite tiles.
+
+## Component Under Test
+**Component**: Metric Refinement (L3)
+**Location**: `gps_denied_09_metric_refinement`
+**Dependencies**:
+- Model Manager (TensorRT engine for LiteSAM)
+- Global Place Recognition (provides candidate satellite tiles)
+- Coordinate Transformer (pixel-to-GPS conversion)
+- Satellite Data Manager
+
+## Detailed Description
+This test validates that the Metric Refinement component can:
+1. Accept a UAV image and a candidate satellite tile from L2 (Global Place Recognition)
+2. Compute a dense correspondence field using LiteSAM
+3. Estimate the homography between the two images
+4. Extract precise GPS coordinates from the homography
+5. Achieve target accuracy of <50m for 80% of images and <20m for 60% of images
+6. Handle scale variations due to altitude changes
+
+The component is critical for meeting the accuracy requirements (AC-1, AC-2) by providing absolute GPS anchors to reset drift in the factor graph.
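Steps 3 and 4 can be illustrated with a minimal, stdlib-only sketch. It is not the component's implementation, and it rests on three assumptions:

- The estimated homography maps UAV pixel coordinates into satellite-tile pixel coordinates.
- The UAV position corresponds to the projected image center (a nadir camera).
- Satellite tiles are north-up Web Mercator (XYZ) tiles of 256 px, addressed by `(tile_x, tile_y, zoom)`.

All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helpers for illustration; not the component's API.
Matrix3 = Sequence[Sequence[float]]


def apply_homography(h: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map a UAV pixel (x, y) to satellite-tile pixel coordinates via H."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w  # dehomogenize


def tile_pixel_to_gps(tile_x: int, tile_y: int, zoom: int,
                      px: float, py: float, tile_size: int = 256) -> Tuple[float, float]:
    """Convert a pixel inside a Web Mercator XYZ tile to (lat, lon) in degrees."""
    world = tile_size * 2 ** zoom      # world width in pixels at this zoom
    gx = tile_x * tile_size + px       # global pixel x
    gy = tile_y * tile_size + py       # global pixel y (grows southwards)
    lon = gx / world * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * gy / world))))
    return lat, lon


def estimate_gps(h: Matrix3, uav_width: int, uav_height: int,
                 tile_x: int, tile_y: int, zoom: int,
                 tile_size: int = 256) -> Tuple[float, float]:
    """Assume a nadir camera: the UAV position is the projected image center."""
    px, py = apply_homography(h, uav_width / 2.0, uav_height / 2.0)
    return tile_pixel_to_gps(tile_x, tile_y, zoom, px, py, tile_size)
```

If the tiles use a different georeference (for example, an affine geotransform), only `tile_pixel_to_gps` would need to change.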
+
+## Input Data
+
+### Test Case 1: High-Quality Reference Match
+- **UAV Image**: AD000001.jpg
+- **Satellite Tile**: AD000001_gmaps.png (reference image available)
+- **Ground truth GPS**: 48.275292, 37.385220
+- **Expected accuracy**: < 10m (best-case scenario)
+
+### Test Case 2: Standard Flight Point
+- **UAV Image**: AD000015.jpg
+- **Satellite Tile**: Retrieved via L2 for this location
+- **Ground truth GPS**: 48.268291, 37.369815
+- **Expected accuracy**: < 20m
+
+### Test Case 3: After Sharp Turn
+- **UAV Image**: AD000033.jpg (after a 220m jump)
+- **Satellite Tile**: Retrieved via L2
+- **Ground truth GPS**: 48.258653, 37.347004
+- **Expected accuracy**: < 50m
+
+### Test Case 4: Near Outlier Region
+- **UAV Image**: AD000047.jpg
+- **Satellite Tile**: Retrieved via L2
+- **Ground truth GPS**: 48.249414, 37.343296
+- **Expected accuracy**: < 50m
+
+### Test Case 5: End of Route
+- **UAV Image**: AD000060.jpg
+- **Satellite Tile**: Retrieved via L2
+- **Ground truth GPS**: 48.256246, 37.357485
+- **Expected accuracy**: < 20m
+
+### Test Case 6: Multi-Scale Test
+- **UAV Images**: AD000010.jpg, AD000020.jpg, AD000030.jpg
+- **Context**: Test consistency across different parts of the route
+- **Expected**: All should achieve < 50m accuracy
+
+## Expected Output
+
+For each test case:
+```json
+{
+  "success": <bool>,
+  "uav_image": "AD000001.jpg",
+  "satellite_tile_id": "tile_xyz",
+  "estimated_gps": {
+    "lat": <float>,
+    "lon": <float>
+  },
+  "ground_truth_gps": {
+    "lat": <float>,
+    "lon": <float>
+  },
+  "error_meters": <float>,
+  "confidence": <float 0-1>,
+  "num_correspondences": <integer>,
+  "homography_matrix": [[h11, h12, h13], [h21, h22, h23], [h31, h32, h33]],
+  "processing_time_ms": <float>
+}
+```
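Before any metrics are computed, each record can be checked against this shape. The validator below is a hypothetical stdlib-only sketch: `validate_result` is not part of the component. It reads the placeholders above as types and assumes `homography_matrix` is serialized as a nested 3x3 list.

```python
from typing import Any, Dict, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_result(record: Dict[str, Any]) -> List[str]:
    """Return schema problems; an empty list means the record looks well-formed."""
    problems: List[str] = []
    if not isinstance(record.get("success"), bool):
        problems.append("success must be a boolean")
    for key in ("uav_image", "satellite_tile_id"):
        if not isinstance(record.get(key), str):
            problems.append(f"{key} must be a string")
    for key in ("estimated_gps", "ground_truth_gps"):
        gps = record.get(key)
        if not (isinstance(gps, dict) and _is_number(gps.get("lat"))
                and _is_number(gps.get("lon"))):
            problems.append(f"{key} must contain numeric lat and lon")
    for key in ("error_meters", "processing_time_ms"):
        if not _is_number(record.get(key)):
            problems.append(f"{key} must be a number")
    conf = record.get("confidence")
    if not (_is_number(conf) and 0.0 <= conf <= 1.0):
        problems.append("confidence must be a number in [0, 1]")
    n = record.get("num_correspondences")
    if not (isinstance(n, int) and not isinstance(n, bool)):
        problems.append("num_correspondences must be an integer")
    h = record.get("homography_matrix")
    # Expect exactly three rows of three numbers each
    if not (isinstance(h, list) and len(h) == 3
            and all(isinstance(row, list) and len(row) == 3
                    and all(_is_number(v) for v in row) for row in h)):
        problems.append("homography_matrix must be a 3x3 numeric matrix")
    return problems
```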
+
+## Success Criteria
+
+**Per Test Case**:
+- success = true
+- num_correspondences > 50
+- confidence > 0.6
+- processing_time_ms < 100 (RTX 3070) or < 150 (RTX 2060)
+
+**Test Case Specific Accuracy**:
+- **Test Case 1**: error_meters < 10m
+- **Test Case 2**: error_meters < 20m
+- **Test Case 3**: error_meters < 50m
+- **Test Case 4**: error_meters < 50m
+- **Test Case 5**: error_meters < 20m
+- **Test Case 6**: All three < 50m
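The per-case criteria and accuracy limits above can be encoded as a small lookup plus one check. This is a sketch under three assumptions:

- Results are dicts shaped like the Expected Output record.
- The caller passes in the GPU model.
- Test Case 6 is scored as three separate images that share the 50m limit.

`check_case` and the constant names are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical encoding of the per-case accuracy limits (meters).
MAX_ERROR_M = {
    "AD000001.jpg": 10.0,  # Test Case 1
    "AD000015.jpg": 20.0,  # Test Case 2
    "AD000033.jpg": 50.0,  # Test Case 3
    "AD000047.jpg": 50.0,  # Test Case 4
    "AD000060.jpg": 20.0,  # Test Case 5
    "AD000010.jpg": 50.0,  # Test Case 6
    "AD000020.jpg": 50.0,  # Test Case 6
    "AD000030.jpg": 50.0,  # Test Case 6
}

# Per-pair latency budgets from the per-case criteria (milliseconds).
LATENCY_BUDGET_MS = {"RTX 3070": 100.0, "RTX 2060": 150.0}


def check_case(result: Dict[str, Any], gpu: str) -> List[str]:
    """Return the list of violated criteria; empty means the case passes."""
    failures: List[str] = []
    if not result["success"]:
        failures.append("success is false")
    if result["num_correspondences"] <= 50:
        failures.append("too few correspondences (need > 50)")
    if result["confidence"] <= 0.6:
        failures.append("confidence too low (need > 0.6)")
    budget = LATENCY_BUDGET_MS[gpu]
    if result["processing_time_ms"] >= budget:
        failures.append(f"too slow (need < {budget:g} ms on {gpu})")
    limit = MAX_ERROR_M[result["uav_image"]]
    if result["error_meters"] >= limit:
        failures.append(f"error too large (need < {limit:g} m)")
    return failures
```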
+
+**Overall Accuracy Targets** (aligned with AC-1, AC-2):
+- At least 80% of test cases achieve error < 50m
+- At least 60% of test cases achieve error < 20m
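The percentages translate into minimum counts. With the six test cases as listed:

- 80% of 6 is 4.8, so at least 5 cases must be under 50m.
- 60% of 6 is 3.6, so at least 4 cases must be under 20m.

If Test Case 6 is instead scored as three separate images (eight results in total), the minimums become 7 and 5. The hypothetical helpers below compute both quantities. They use integer arithmetic for the count to avoid floating-point rounding at the boundary.

```python
from typing import Sequence


def fraction_below(errors_m: Sequence[float], threshold_m: float) -> float:
    """Share of results whose error is strictly below threshold_m."""
    if not errors_m:
        raise ValueError("no results to score")
    return sum(1 for e in errors_m if e < threshold_m) / len(errors_m)


def minimum_passing(count: int, percent: int) -> int:
    """Smallest number of results that reaches at least `percent`% of `count`."""
    # Ceiling division in integers: ceil(count * percent / 100)
    return -(-count * percent // 100)
```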
+
+## Maximum Expected Time
+- **Per image pair**: < 100ms (on RTX 3070)
+- **Per image pair**: < 150ms (on RTX 2060)
+- **Model loading**: < 5 seconds
+- **Total test suite**: < 15 seconds
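Per-pair latency can be measured with a wall-clock timer around one refinement call. In this sketch, `fn` is a hypothetical zero-argument callable that wraps a single image-pair refinement. Warm-up calls are excluded so that one-off costs, such as loading the engine (budgeted separately above), do not inflate the per-pair figures. If the inference call returns before GPU work completes, `fn` is assumed to synchronize before returning; otherwise the timer would under-report.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_calls(fn: Callable[[], object], repeats: int = 10,
               warmup: int = 2) -> Tuple[float, float]:
    """Return (mean_ms, max_ms) over `repeats` timed calls after `warmup` untimed calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as lazy initialization
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(samples), max(samples)
```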
+
+## Test Execution Steps
+
+1. **Setup Phase**:
+   1. Initialize the Model Manager and load the LiteSAM TensorRT engine
+   2. Initialize the Satellite Data Manager with pre-cached tiles
+   3. Initialize the Coordinate Transformer for GPS calculations
+   4. Verify that the satellite tiles are georeferenced correctly
+
+2. **For Each Test Case**:
+   1. Load the UAV image from the test data
+   2. Retrieve the appropriate satellite tile (via L2, or the pre-specified reference tile as in Test Case 1)
+   3. Run LiteSAM to compute the correspondence field
+   4. Estimate the homography from the correspondences
+   5. Extract GPS coordinates using the homography and the satellite tile's georeference
+   6. Calculate the haversine distance to the ground-truth GPS
+   7. Record all metrics
+
+3. **Validation Phase**:
+   1. Calculate the percentage of results achieving <50m accuracy
+   2. Calculate the percentage of results achieving <20m accuracy
+   3. Verify that processing times meet the constraints
+   4. Check for outliers (errors >100m)
+   5. Validate that confidence scores correlate with accuracy
+
+4. **Report Generation**:
+   1. Per-image results table
+   2. Accuracy distribution histogram
+   3. Timing statistics
+   4. Pass/fail determination
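The haversine distance used to score each estimate against ground truth is short enough to show in full. The Earth radius below (the IUGG mean radius, 6,371,008.8 m) is an assumption, since the spec does not name one. A spherical model differs from ellipsoidal distances by roughly 0.5% at most, which is negligible against the 10-50m thresholds used here. `haversine_m` is a hypothetical name.

```python
import math

# Assumed Earth model: a sphere with the IUGG mean radius, in meters.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2)
    # min() guards against a value slightly above 1 caused by rounding
    return 2.0 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
```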
+
+## Pass/Fail Criteria
+
+**Overall Test Passes If**:
+- ≥80% of test cases achieve error <50m (meets AC-1)
+- ≥60% of test cases achieve error <20m (meets AC-2)
+- Average processing time <100ms
+- No test case exceeds 200m error
+- Success rate (share of test cases with success = true) >90%
+
+**Test Fails If**:
+- <80% achieve error <50m
+- <60% achieve error <20m
+- Any test case exceeds 500m error (catastrophic failure)
+- More than 1 test case fails completely (success=false)
+- Average processing time >150ms
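The pass and fail lists are not exact complements. With six test cases, any of the following meets neither list:

- a mean processing time between 100 and 150ms;
- a worst-case error between 200 and 500m;
- exactly one result with success = false (5 of 6 is about 83%, below the >90% bar).

The hypothetical `verdict` function below reports such runs as "inconclusive". That third state is an assumption of this sketch, not something the spec defines. The sketch also assumes each test case contributes one `(error_meters, success)` pair.

```python
from typing import Sequence, Tuple


def verdict(results: Sequence[Tuple[float, bool]], mean_time_ms: float) -> str:
    """results: one (error_meters, success) pair per test case."""
    if not results:
        raise ValueError("no results")
    n = len(results)
    errors = [e for e, _ in results]
    failed = sum(1 for _, ok in results if not ok)
    share_50 = sum(1 for e in errors if e < 50.0) / n
    share_20 = sum(1 for e in errors if e < 20.0) / n
    worst = max(errors)

    # "Test Fails If": any single condition fails the run
    if (share_50 < 0.8 or share_20 < 0.6 or worst > 500.0
            or failed > 1 or mean_time_ms > 150.0):
        return "fail"
    # "Overall Test Passes If": every condition must hold
    if (share_50 >= 0.8 and share_20 >= 0.6 and mean_time_ms < 100.0
            and worst <= 200.0 and (n - failed) / n > 0.9):
        return "pass"
    return "inconclusive"  # meets neither list (assumption of this sketch)
```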
+
+## Additional Validation
+
+**Robustness Tests**:
+1. **Scale Variation**: Test with artificially scaled UAV images (0.8x, 1.2x) - should maintain accuracy
+2. **Rotation**: Test with rotated UAV images (±15°) - the rotation should be detected by the rotation manager
+3. **Seasonal Difference**: If available, test with satellite imagery from a different season - should maintain <100m accuracy
+4. **Low Contrast**: Test with brightness/contrast adjusted images - should degrade gracefully
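For the synthetic scale and rotation tests, the expected result can be derived rather than re-measured. Suppose the UAV image is warped by a known similarity S about its center. The homography from the warped image to the same tile should then be H times the inverse of S, so the projected image center, and therefore the GPS estimate, should not move.

The sketch below builds S and the expected homography in pure Python; the function names are hypothetical. The top two rows of S could be handed to an affine warping routine such as OpenCV's `cv2.warpAffine`, which by default treats the matrix as the forward source-to-destination map.

```python
import math
from typing import List, Tuple

Mat = List[List[float]]  # 3x3 row-major matrix


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m: Mat, x: float, y: float) -> Tuple[float, float]:
    """Apply a 3x3 projective matrix to a point and dehomogenize."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


def similarity_about(cx: float, cy: float, scale: float, angle_deg: float) -> Mat:
    """Scale and rotate about (cx, cy): p' = A (p - c) + c, A = scale * R(angle)."""
    t = math.radians(angle_deg)
    c, s = scale * math.cos(t), scale * math.sin(t)
    return [[c, -s, cx - c * cx + s * cy],
            [s, c, cy - s * cx - c * cy],
            [0.0, 0.0, 1.0]]


def expected_homography(h: Mat, cx: float, cy: float,
                        scale: float, angle_deg: float) -> Mat:
    """Homography from the augmented UAV image to the tile: H @ inverse(S)."""
    # The inverse of a similarity about c is the reciprocal scale, negated angle
    s_inv = similarity_about(cx, cy, 1.0 / scale, -angle_deg)
    return matmul(h, s_inv)
```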
+
+**Quality Metrics**:
+- **RMSE (Root Mean Square Error)**: Overall RMSE should be <30m
+- **Median Error**: Should be <25m
+- **90th Percentile Error**: Should be <60m
+- **Correspondence Quality**: Average num_correspondences should be >100
+- **Confidence Correlation**: Correlation between confidence and accuracy should be >0.5
+
+## Error Analysis
+If the test fails, analyze:
+- Distribution of errors across test cases
+- Correlation between num_correspondences and accuracy
+- Relationship between GPS distance jumps and accuracy degradation
+- Impact of terrain features (fields vs roads) on accuracy
+- Processing time variance across test cases
+