# Testing Strategy & Acceptance Test Plan (ATP)

## Overview

This document details the comprehensive testing strategy for the UAV Aerial Image Geolocalization System, covering unit tests, integration tests, system tests, field validation, and acceptance criteria.

---

## 1. Test Pyramid Architecture

```
                ACCEPTANCE TESTS (5%)
            ┌─────────────────────────┐
            │       Field Trials      │
            │    Real UAV Flights     │
            └─────────────────────────┘

              SYSTEM TESTS (15%)
        ┌────────────────────────────────┐
        │     End-to-End Performance     │
        │  Accuracy, Speed, Robustness   │
        └────────────────────────────────┘

           INTEGRATION TESTS (30%)
      ┌─────────────────────────────────┐
      │    Multi-Component Workflows    │
      │  FeatureMatcher → Triangulator  │
      │   Bundle Adjustment Refinement  │
      └─────────────────────────────────┘

              UNIT TESTS (50%)
        ┌──────────────────────────┐
        │  Individual Components   │
        │  AKAZE, Essential Matrix │
        │  Triangulation, BA...    │
        └──────────────────────────┘
```

---

## 2. Detailed Test Categories

### 2.1 Unit Tests (Level 1)

#### UT-1: Feature Extraction (AKAZE)

```
Purpose: Verify keypoint detection and descriptor computation
Test Data: Synthetic images with known features (checkerboard patterns)

Test Cases:
├─ UT-1.1: Basic feature detection
│    Input: 1024×768 synthetic image with checkerboard
│    Expected: ≥500 keypoints detected
│    Pass: count ≥ 500
│
├─ UT-1.2: Scale invariance
│    Input: Same scene at 2x scale
│    Expected: Keypoints at proportional positions
│    Pass: correlation of positions > 0.9
│
├─ UT-1.3: Rotation robustness
│    Input: Image rotated ±30°
│    Expected: Descriptors match original + rotated
│    Pass: match rate > 80%
│
├─ UT-1.4: Multi-scale handling
│    Input: Image with features at multiple scales
│    Expected: Features detected at all scales (pyramid)
│    Pass: ratio of scales [1:1.2:1.44:...] verified
│
└─ UT-1.5: Performance constraint
     Input: FullHD image (1920×1080)
     Expected: <500ms feature extraction
     Pass: 95th percentile < 500ms
```
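UT-1.1's checkerboard fixture can be generated without any imaging library. The sketch below (pure NumPy; `make_checkerboard` and `ground_truth_corners` are hypothetical helpers, not part of the system) builds the 1024×768 test image and enumerates the interior corner lattice a detector would be scored against — 713 ground-truth corners, comfortably above the ≥500 keypoint threshold.

```python
import numpy as np

def make_checkerboard(w=1024, h=768, square=32):
    """Synthetic checkerboard test image (hypothetical fixture helper)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return (((ys // square + xs // square) % 2) * 255).astype(np.uint8)

def ground_truth_corners(w=1024, h=768, square=32):
    """Interior lattice points where four squares meet: the expected keypoints."""
    xs = np.arange(square, w, square)
    ys = np.arange(square, h, square)
    return [(x, y) for y in ys for x in xs]

img = make_checkerboard()
corners = ground_truth_corners()
print(img.shape, len(corners))  # (768, 1024) 713
```

A real UT-1.1 run would feed `img` to `cv2.AKAZE_create().detectAndCompute` and compare detections against `corners`.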
#### UT-2: Feature Matching

```
Purpose: Verify robust feature correspondence
Test Data: Pairs of synthetic/real images with known correspondence

Test Cases:
├─ UT-2.1: Basic matching
│    Input: Two images from synthetic scene (90% overlap)
│    Expected: ≥95% of ground-truth features matched
│    Pass: match_rate ≥ 0.95
│
├─ UT-2.2: Outlier rejection (Lowe's ratio test)
│    Input: Synthetic pair + 50% false features
│    Expected: False matches rejected
│    Pass: false_match_rate < 0.1
│
├─ UT-2.3: Low overlap scenario
│    Input: Two images with 20% overlap
│    Expected: Still matches ≥20 points
│    Pass: min_matches ≥ 20
│
└─ UT-2.4: Performance
     Input: FullHD images, 1000 features each
     Expected: <300ms matching time
     Pass: 95th percentile < 300ms
```

#### UT-3: Essential Matrix Estimation

```
Purpose: Verify 5-point/8-point algorithms for camera geometry
Test Data: Synthetic correspondences with known relative pose

Test Cases:
├─ UT-3.1: 8-point algorithm
│    Input: 8+ point correspondences
│    Expected: Essential matrix E with rank 2
│    Pass: min_singular_value(E) < 1e-6
│
├─ UT-3.2: 5-point algorithm
│    Input: 5 point correspondences
│    Expected: Up to 10 candidate essential matrices (Nistér)
│    Pass: num_solutions ∈ [1, 10]
│
├─ UT-3.3: RANSAC convergence
│    Input: 100 correspondences, 30% outliers
│    Expected: Essential matrix recovery despite outliers
│    Pass: inlier_ratio ≥ 0.6
│
└─ UT-3.4: Chirality constraint
     Input: Multiple (R,t) solutions from decomposition
     Expected: Only solution with points in front of cameras selected
     Pass: selected_solution verified via triangulation
```
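UT-3.1's rank-2 check can be exercised against a minimal 8-point implementation. The sketch below (pure NumPy, synthetic calibrated correspondences; an illustrative stand-in, not the system's actual estimator) recovers E for a pure sideways translation and projects it onto the essential manifold, so its singular values come out as (1, 1, 0).

```python
import numpy as np

def eight_point(x1, x2):
    """Unnormalized 8-point estimate of E from calibrated
    image coordinates x1 <-> x2, each of shape (N, 2)."""
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)
    # Project onto the essential manifold: two equal singular values, rank 2.
    u, s, vt = np.linalg.svd(E)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt

# Synthetic rig: identity first camera, second camera translated along x.
rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(-1, 1, 20),
                     rng.uniform(-1, 1, 20),
                     rng.uniform(4, 8, 20)])   # non-coplanar 3D points
t = np.array([1.0, 0.0, 0.0])
x1 = X[:, :2] / X[:, 2:3]
Xc2 = X - t                                    # R = I, so just shift frames
x2 = Xc2[:, :2] / Xc2[:, 2:3]

E = eight_point(x1, x2)
print(np.linalg.svd(E, compute_uv=False))  # smallest singular value ~ 0
```

In practice UT-3 would call the production estimator (e.g. OpenCV's `cv2.findEssentialMat`) and apply the same singular-value assertion.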
#### UT-4: Triangulation (DLT)

```
Purpose: Verify 3D point reconstruction from image correspondences
Test Data: Synthetic scenes with known 3D geometry

Test Cases:
├─ UT-4.1: Accuracy
│    Input: Noise-free point correspondences
│    Expected: Reconstructed X matches ground truth
│    Pass: RMSE < 0.1cm on 1m scene
│
├─ UT-4.2: Outlier handling
│    Input: 10 valid + 2 invalid correspondences
│    Expected: Invalid points detected (behind camera/far)
│    Pass: valid_mask accuracy > 95%
│
├─ UT-4.3: Altitude constraint
│    Input: Points with z < 50m (below aircraft)
│    Expected: Points rejected
│    Pass: altitude_filter works correctly
│
└─ UT-4.4: Batch performance
     Input: 500 point triangulations
     Expected: <100ms total
     Pass: 95th percentile < 100ms
```

#### UT-5: Bundle Adjustment

```
Purpose: Verify pose and 3D point optimization
Test Data: Synthetic multi-view scenes

Test Cases:
├─ UT-5.1: Convergence
│    Input: 5 frames with noisy initial poses
│    Expected: Residual decreases monotonically
│    Pass: final_residual < 0.001 * initial_residual
│
├─ UT-5.2: Covariance computation
│    Input: Optimized poses and points
│    Expected: Covariance matrix positive-definite
│    Pass: all_eigenvalues > 0
│
├─ UT-5.3: Window size effect
│    Input: Same problem with window sizes [3, 5, 10]
│    Expected: Larger windows → better residuals
│    Pass: residual_5 < residual_3, residual_10 < residual_5
│
└─ UT-5.4: Performance scaling
     Input: Window size [5, 10, 15, 20]
     Expected: Time ~= O(w^3)
     Pass: cubic fit accurate (R² > 0.95)
```

---

### 2.2 Integration Tests (Level 2)

#### IT-1: Sequential Pipeline

```
Purpose: Verify image-to-image processing chain
Test Data: Real aerial image sequences (5-20 images)

Test Cases:
├─ IT-1.1: Feature flow
│    Features extracted from img₁ → tracked to img₂ → matched
│    Expected: Consistent tracking across images
│    Pass: ≥70% features tracked end-to-end
│
├─ IT-1.2: Pose chain consistency
│    Poses P₁, P₂, P₃ computed sequentially
│    Expected: P₁₃ = P₂₃ ∘ P₁₂ (composition consistency)
│    Pass: pose_error < 0.1° rotation, 5cm translation
│
├─ IT-1.3: Trajectory smoothness
│    Velocity computed between poses
│    Expected: Smooth velocity profile (no jumps)
│    Pass: velocity_std_dev < 20% mean_velocity
│
└─ IT-1.4: Memory usage
     Process 100-image sequence
     Expected: Constant memory (windowed processing)
     Pass: peak_memory < 2GB
```
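The DLT triangulation that UT-4.1 validates can be sketched in a few lines. The camera matrices and the test point below are illustrative assumptions, not values from the system; with noise-free input the reconstruction is exact.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2 are 3x4 projection matrices; x1, x2 are pixel coordinates."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

K = np.array([[800.0, 0, 512], [0, 800.0, 384], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])               # reference camera
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])   # 1 m baseline

X_true = np.array([0.3, -0.2, 5.0])
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1); x2 = x2[:2] / x2[2]

X_est = triangulate_dlt(P1, P2, x1, x2)
print(np.linalg.norm(X_est - X_true))  # ~0 for noise-free input
```

UT-4.1's pass criterion corresponds to asserting this reconstruction error stays below 1 mm on a metre-scale scene.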
#### IT-2: Satellite Georeferencing

```
Purpose: Verify local-to-global coordinate transformation
Test Data: Synthetic/real images with known satellite reference

Test Cases:
├─ IT-2.1: Feature matching with satellite
│    Input: Aerial image + satellite reference
│    Expected: ≥10 matched features between viewpoints
│    Pass: match_count ≥ 10
│
├─ IT-2.2: Homography estimation
│    Matched features → homography matrix
│    Expected: Valid transformation (3×3 matrix)
│    Pass: det(H) ≠ 0, condition_number < 100
│
├─ IT-2.3: GPS transformation accuracy
│    Apply homography to image corners
│    Expected: Computed GPS ≈ known reference GPS
│    Pass: error < 100m (on test data)
│
└─ IT-2.4: Confidence scoring
     Compute inlier_ratio and MI (mutual information)
     Expected: score = inlier_ratio × MI ∈ [0, 1]
     Pass: high_confidence for obvious matches
```

#### IT-3: Outlier Detection Chain

```
Purpose: Verify multi-stage outlier detection
Test Data: Synthetic trajectory with injected outliers

Test Cases:
├─ IT-3.1: Velocity anomaly detection
│    Inject 350m jump at frame N
│    Expected: Detected as outlier
│    Pass: outlier_flag = True
│
├─ IT-3.2: Recovery mechanism
│    After outlier detection
│    Expected: System attempts skip-frame matching (N→N+2)
│    Pass: recovery_successful = True
│
├─ IT-3.3: False positive rate
│    Normal sequence with small perturbations
│    Expected: <5% false outlier flagging
│    Pass: false_positive_rate < 0.05
│
└─ IT-3.4: Consistency across stages
     Multiple detection stages should agree
     Pass: agreement_score > 0.8
```
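IT-2.3's corner-to-GPS step reduces to applying the estimated homography in homogeneous coordinates. A minimal sketch, with a deliberately simple hypothetical image-to-lon/lat homography standing in for a real estimate:

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 points through a 3x3 homography (homogeneous divide)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = (H @ pts_h.T).T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical image->geo homography: pure scale + translation, where one
# pixel spans 1e-5 degrees (illustrative values, not a real calibration).
H = np.array([
    [1e-5,  0.0, 30.5],   # x -> longitude
    [0.0, -1e-5, 50.2],   # y -> latitude (image y grows downward)
    [0.0,   0.0,  1.0],
])
corners = np.array([[0, 0], [1920, 0], [1920, 1080], [0, 1080]], float)
geo = apply_homography(H, corners)
print(geo[0], geo[2])  # [30.5 50.2] [30.5192 50.1892]
```

IT-2.3's check then compares each mapped corner against the reference GPS and asserts the discrepancy is under the 100 m budget.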
---

### 2.3 System Tests (Level 3)

#### ST-1: Accuracy Criteria

```
Purpose: Verify system meets ±50m and ±20m accuracy targets
Test Data: Real aerial image sequences with ground-truth GPS

Test Cases:
├─ ST-1.1: 50m accuracy target
│    Input: 500-image flight
│    Compute: % images within 50m of ground truth
│    Expected: ≥80%
│    Pass: accuracy_50m ≥ 0.80
│
├─ ST-1.2: 20m accuracy target
│    Same flight data
│    Expected: ≥60% within 20m
│    Pass: accuracy_20m ≥ 0.60
│
├─ ST-1.3: Mean absolute error
│    Compute: MAE over all images
│    Expected: <40m typical
│    Pass: MAE < 50m
│
└─ ST-1.4: Error distribution
     Expected: Error approximately Gaussian
     Pass: K-S test p-value > 0.05
```

#### ST-2: Registration Rate

```
Purpose: Verify ≥95% of images successfully registered
Test Data: Real flights with various conditions

Test Cases:
├─ ST-2.1: Baseline registration
│    Good overlap, clear features
│    Expected: >98% registration rate
│    Pass: registration_rate ≥ 0.98
│
├─ ST-2.2: Challenging conditions
│    Low texture, variable lighting
│    Expected: ≥95% registration rate
│    Pass: registration_rate ≥ 0.95
│
├─ ST-2.3: Sharp turns scenario
│    Images with <10% overlap
│    Expected: Fallback mechanisms trigger, ≥90% success
│    Pass: fallback_success_rate ≥ 0.90
│
└─ ST-2.4: Consecutive failures
     Track max consecutive unregistered images
     Expected: <3 consecutive failures
     Pass: max_consecutive_failures < 3
```

#### ST-3: Reprojection Error

```
Purpose: Verify <1.0 pixel mean reprojection error
Test Data: Real flight data after bundle adjustment

Test Cases:
├─ ST-3.1: Mean reprojection error
│    After BA optimization
│    Expected: <1.0 pixel
│    Pass: mean_reproj_error < 1.0
│
├─ ST-3.2: Error distribution
│    Histogram of per-point errors
│    Expected: Tightly concentrated <2 pixels
│    Pass: 95th_percentile < 2.0 px
│
├─ ST-3.3: Per-frame consistency
│    Error should not vary dramatically
│    Expected: Consistent across frames
│    Pass: frame_error_std_dev < 0.3 px
│
└─ ST-3.4: Outlier points
     Very large reprojection errors
     Expected: <1% of points with error >3 px
     Pass: outlier_rate < 0.01
```
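ST-1.1 through ST-1.3 reduce to simple statistics over per-image position errors. A sketch of the metric computation (the error samples below are synthetic, generated only to illustrate the report shape):

```python
import numpy as np

def accuracy_report(errors_m, thresholds=(50.0, 20.0)):
    """Fraction of images within each error threshold, plus MAE."""
    errors_m = np.asarray(errors_m, float)
    report = {f"within_{int(t)}m": float(np.mean(errors_m <= t))
              for t in thresholds}
    report["MAE"] = float(np.mean(errors_m))
    return report

# Synthetic per-image localization errors for a 500-image flight.
rng = np.random.default_rng(1)
errors = np.abs(rng.normal(10.0, 10.0, size=500))

r = accuracy_report(errors)
print(r)
```

The ST-1 gates are then `r["within_50m"] >= 0.80`, `r["within_20m"] >= 0.60`, and `r["MAE"] < 50`.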
#### ST-4: Processing Speed

```
Purpose: Verify <2 seconds per image
Test Data: Full flight sequences on target hardware

Test Cases:
├─ ST-4.1: Average latency
│    Mean processing time per image
│    Expected: <2 seconds
│    Pass: mean_latency < 2.0 sec
│
├─ ST-4.2: 95th percentile latency
│    Worst-case images (complex scenes)
│    Expected: <2.5 seconds
│    Pass: p95_latency < 2.5 sec
│
├─ ST-4.3: Component breakdown
│    Feature extraction: <0.5s
│    Matching: <0.3s
│    RANSAC: <0.2s
│    BA: <0.8s
│    Satellite: <0.3s
│    Pass: Each component within budget
│
└─ ST-4.4: Scaling with problem size
     Memory usage, CPU usage vs. image resolution
     Expected: Linear scaling
     Pass: O(n) complexity verified
```

#### ST-5: Robustness - Outlier Handling

```
Purpose: Verify graceful handling of 350m outlier drifts
Test Data: Synthetic/real data with injected outliers

Test Cases:
├─ ST-5.1: Single 350m outlier
│    Inject outlier at frame N
│    Expected: Detected, trajectory continues
│    Pass: system_continues = True
│
├─ ST-5.2: Multiple outliers
│    3-5 outliers scattered in sequence
│    Expected: All detected, recovery attempted
│    Pass: detection_rate ≥ 0.8
│
├─ ST-5.3: False positive rate
│    Normal trajectory, no outliers
│    Expected: <5% false flagging
│    Pass: false_positive_rate < 0.05
│
└─ ST-5.4: Recovery latency
     Time to recover after outlier
     Expected: ≤3 frames
     Pass: recovery_latency ≤ 3 frames
```

#### ST-6: Robustness - Sharp Turns

```
Purpose: Verify handling of <5% image overlap scenarios
Test Data: Synthetic sequences with sharp angles

Test Cases:
├─ ST-6.1: 5% overlap matching
│    Two images with 5% overlap
│    Expected: Minimal matches or skip-frame
│    Pass: system_handles_gracefully = True
│
├─ ST-6.2: Skip-frame fallback
│    Direct N→N+1 fails, tries N→N+2
│    Expected: Succeeds with N→N+2
│    Pass: skip_frame_success_rate ≥ 0.8
│
├─ ST-6.3: 90° turn handling
│    Images at near-orthogonal angles
│    Expected: Degeneracy detected, logged
│    Pass: degeneracy_detection = True
│
└─ ST-6.4: Trajectory consistency
     Consecutive turns: check velocity smoothness
     Expected: No velocity jumps > 50%
     Pass: velocity_consistency verified
```
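The 350 m jump detection exercised by ST-5.1 (and IT-3.1) can be prototyped as a per-frame step-length gate. A minimal sketch with a hypothetical 100 m threshold; the track geometry is synthetic:

```python
import numpy as np

def flag_position_jumps(positions, max_jump_m=100.0):
    """Flag frames whose step from the previous frame exceeds max_jump_m."""
    positions = np.asarray(positions, float)
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    flags = np.zeros(len(positions), bool)
    flags[1:] = steps > max_jump_m
    return flags

# Straight 50 m/s track sampled at 1 Hz, with a 350 m lateral jump at frame 10.
track = np.column_stack([np.arange(30) * 50.0, np.zeros(30)])
track[10:, 1] += 350.0  # sustained offset: one large step at frame 10

flags = flag_position_jumps(track)
print(np.flatnonzero(flags))  # [10]
```

On detection, the pipeline's recovery path (IT-3.2) would retry matching with the skip-frame pairing N→N+2 rather than accepting the flagged pose.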
---

### 2.4 Field Acceptance Tests (Level 4)

#### FAT-1: Real UAV Flight Trial #1 (Baseline)

```
Scenario: Nominal flight over agricultural field

┌────────────────────────────────────────┐
│ Conditions:                            │
│  • Clear weather, good sunlight        │
│  • Flat terrain, sparse trees          │
│  • 300m altitude, 50m/s speed          │
│  • 800 images, ~15 min flight          │
└────────────────────────────────────────┘

Pass Criteria:
✓ Accuracy: ≥80% within 50m
✓ Accuracy: ≥60% within 20m
✓ Registration rate: ≥95%
✓ Processing time: <2s/image
✓ Satellite validation: <10% outliers
✓ Reprojection error: <1.0px mean

Success Metrics:
• MAE (mean absolute error): <40m
• RMS error: <45m
• Max error: <200m
• Trajectory coherence: smooth (no jumps)
```

#### FAT-2: Real UAV Flight Trial #2 (Challenging)

```
Scenario: Flight with more complex terrain

┌────────────────────────────────────────┐
│ Conditions:                            │
│  • Mixed urban/agricultural            │
│  • Buildings, vegetation, water bodies │
│  • Variable altitude (250-400m)        │
│  • Includes 1-2 sharp turns            │
│  • 1200 images, ~25 min flight         │
└────────────────────────────────────────┘

Pass Criteria:
✓ Accuracy: ≥75% within 50m (relaxed from 80%)
✓ Accuracy: ≥50% within 20m (relaxed from 60%)
✓ Registration rate: ≥92% (relaxed from 95%)
✓ Processing time: <2.5s/image avg
✓ Outliers detected: <15% (relaxed from 10%)

Fallback Validation:
✓ User corrected <20% of uncertain images
✓ After correction, accuracy meets FAT-1 targets
```

#### FAT-3: Real UAV Flight Trial #3 (Edge Case)

```
Scenario: Low-texture flight (challenging for features)

┌────────────────────────────────────────┐
│ Conditions:                            │
│  • Sandy/desert terrain or water       │
│  • Minimal features                    │
│  • Overcast/variable lighting          │
│  • 500-600 images, ~12 min flight      │
└────────────────────────────────────────┘

Pass Criteria:
✓ System continues (no crash): YES
✓ Graceful degradation: Flags uncertainty
✓ User can correct and improve: YES
✓ Satellite anchor helps recovery: YES

Success Metrics:
• >80% of images tagged "uncertain"
• After user correction: meets standard targets
• Demonstrates fallback mechanisms working
```

---
## 3. Test Environment Setup

### Hardware Requirements

```
CPU:     16+ cores (Intel Xeon / AMD Ryzen)
RAM:     64GB minimum (32GB acceptable for <1500 images)
Storage: 1TB SSD (for raw images + processing)
GPU:     Optional (CUDA 11.8+ for 5-10x acceleration)
Network: For satellite API queries (can be cached)
```

### Software Requirements

```
OS:           Ubuntu 20.04 LTS or macOS 12+
Build:        CMake 3.20+, GCC 9+ or Clang 11+
Dependencies: OpenCV 4.8+, Eigen 3.4+, GDAL 3.0+
Testing:      GoogleTest, Pytest
CI/CD:        GitHub Actions or Jenkins
```

### Test Data Management

```
Synthetic Data: Generated via Blender (checked into repo)
Real Data:      External dataset storage (S3/local SSD)
Ground Truth:   Maintained in CSV format with metadata
Versioning:     Git-LFS for binary image data
```

---

## 4. Test Execution Plan

### Phase 1: Unit Testing (Weeks 1-6)

```
Sprint 1-2: UT-1 (Feature detection) - 2 weeks
Sprint 3-4: UT-2 (Feature matching) - 2 weeks
Sprint 5-6: UT-3, UT-4, UT-5 (Geometry) - 2 weeks

Continuous: Run full unit test suite every commit
Coverage target: >90% code coverage
```

### Phase 2: Integration Testing (Weeks 7-12)

```
Sprint 7-9:   IT-1 (Sequential pipeline) - 3 weeks
Sprint 10-11: IT-2, IT-3 (Georef, Outliers) - 2 weeks
Sprint 12:    System integration - 1 week

Continuous: Integration tests run nightly
```

### Phase 3: System Testing (Weeks 13-18)

```
Sprint 13-14: ST-1, ST-2 (Accuracy, Registration) - 2 weeks
Sprint 15-16: ST-3, ST-4 (Error, Speed) - 2 weeks
Sprint 17-18: ST-5, ST-6 (Robustness) - 2 weeks

Load testing:   1000-3000 image sequences
Stress testing: Edge cases, memory limits
```

### Phase 4: Field Acceptance (Weeks 19-30)

```
Week 19-22: FAT-1 (Baseline trial)
  • Coordinate 1-2 baseline flights
  • Validate system on real data
  • Adjust parameters as needed

Week 23-26: FAT-2 (Challenging trial)
  • More complex scenarios
  • Test fallback mechanisms
  • Refine user interface

Week 27-30: FAT-3 (Edge case trial)
  • Low-texture scenarios
  • Validate robustness
  • Final adjustments

Post-trial: Generate comprehensive report
```
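Section 3 lists Pytest among the test frameworks, and several budgets in this plan are phrased as 95th-percentile limits. A pytest-style sketch of how a latency budget like UT-1.5's might be asserted; the recorded timings are hypothetical placeholders for real pipeline measurements:

```python
import numpy as np

def p95(samples):
    """95th percentile, the statistic the plan's latency budgets use."""
    return float(np.percentile(samples, 95))

# Hypothetical per-image feature-extraction timings in seconds; a real test
# would time the actual pipeline on fixture images.
TIMINGS = np.array([0.21, 0.25, 0.19, 0.31, 0.28, 0.45, 0.22, 0.26, 0.30, 0.24])

def test_feature_extraction_latency_budget():
    # UT-1.5 budget: <500 ms at the 95th percentile.
    assert p95(TIMINGS) < 0.5

test_feature_extraction_latency_budget()
print("latency budget check passed")
```

Under CI (Section 3's GitHub Actions/Jenkins), such checks run on every commit per the Phase 1 plan.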
---

## 5. Acceptance Criteria Summary

| Criterion | Target | Test | Pass/Fail |
|-----------|--------|------|-----------|
| **Accuracy@50m** | ≥80% | FAT-1 | ≥80% pass |
| **Accuracy@20m** | ≥60% | FAT-1 | ≥60% pass |
| **Registration Rate** | ≥95% | ST-2 | ≥95% pass |
| **Reprojection Error** | <1.0px mean | ST-3 | <1.0px pass |
| **Processing Speed** | <2.0s/image | ST-4 | p95<2.5s pass |
| **Robustness (350m outlier)** | Handled | ST-5 | Continue pass |
| **Sharp turns (<5% overlap)** | Handled | ST-6 | Skip-frame pass |
| **Satellite validation** | <10% outliers | FAT-1-3 | <10% pass |

---

## 6. Success Metrics

**Green Light Criteria** (Ready for production):

- ✅ All unit tests pass (100%)
- ✅ All integration tests pass (100%)
- ✅ All system tests pass (100%)
- ✅ FAT-1 and FAT-2 pass acceptance criteria
- ✅ FAT-3 shows graceful degradation
- ✅ <10% code defects discovered in field trials
- ✅ Performance meets SLA consistently

**Yellow Light Criteria** (Conditional deployment):

- ⚠ 85-89% of acceptance criteria met
- ⚠ Minor issues in edge cases
- ⚠ Requires workaround documentation
- ⚠ Re-test after fixes

**Red Light Criteria** (Do not deploy):

- ❌ <85% of acceptance criteria met
- ❌ Critical failures in core functionality
- ❌ Safety/security concerns
- ❌ Cannot meet latency or accuracy targets

---

## Conclusion

This comprehensive testing strategy ensures the UAV Image Geolocalization System is robust, accurate, and reliable before production deployment. The multi-level approach (unit → integration → system → field) progressively validates system behavior from components to end-to-end scenarios, with field trials providing real-world validation of the acceptance criteria.
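The Section 5 acceptance table can be mechanized as a release gate in CI. A sketch, with the hard thresholds taken from the table; the metric names and the `measured` values are illustrative:

```python
def green_light(metrics):
    """True when all hard acceptance targets from Section 5 are met
    (metric key names are hypothetical, not a defined system API)."""
    return (
        metrics["accuracy_50m"] >= 0.80
        and metrics["accuracy_20m"] >= 0.60
        and metrics["registration_rate"] >= 0.95
        and metrics["mean_reproj_error_px"] < 1.0
        and metrics["p95_latency_s"] < 2.5
    )

# Example measurements from a hypothetical FAT-1 run.
measured = {
    "accuracy_50m": 0.86,
    "accuracy_20m": 0.64,
    "registration_rate": 0.97,
    "mean_reproj_error_px": 0.8,
    "p95_latency_s": 2.1,
}
print(green_light(measured))  # True
```

The yellow/red bands of Section 6 would wrap this with a count of how many individual criteria passed.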