Testing Strategy & Acceptance Test Plan (ATP)

Overview

This document details the comprehensive testing strategy for the UAV Aerial Image Geolocalization System, covering unit tests, integration tests, system tests, field validation, and acceptance criteria.


1. Test Pyramid Architecture

                        ▲
                 ACCEPTANCE TESTS (5%)
              Field trials, real UAV flights
            ─────────────────────────────────
                  SYSTEM TESTS (15%)
         End-to-end performance: accuracy, speed,
                      robustness
        ───────────────────────────────────────
               INTEGRATION TESTS (30%)
       Multi-component workflows: FeatureMatcher →
        Triangulator, bundle adjustment refinement
      ─────────────────────────────────────────────
                   UNIT TESTS (50%)
      Individual components: AKAZE, essential matrix,
                 triangulation, BA, ...

2. Detailed Test Categories

2.1 Unit Tests (Level 1)

UT-1: Feature Extraction (AKAZE)

Purpose: Verify keypoint detection and descriptor computation
Test Data: Synthetic images with known features (checkerboard patterns)
Test Cases:
  ├─ UT-1.1: Basic feature detection
  │   Input: 1024×768 synthetic image with checkerboard
  │   Expected: ≥500 keypoints detected
  │   Pass: count ≥ 500
  │
  ├─ UT-1.2: Scale invariance
  │   Input: Same scene at 2x scale
  │   Expected: Keypoints at proportional positions
  │   Pass: correlation of positions > 0.9
  │
  ├─ UT-1.3: Rotation robustness
  │   Input: Image rotated ±30°
  │   Expected: Descriptors match original + rotated
  │   Pass: match rate > 80%
  │
  ├─ UT-1.4: Multi-scale handling
  │   Input: Image with features at multiple scales
  │   Expected: Features detected at all scales (pyramid)
  │   Pass: ratio of scales [1:1.2:1.44:...] verified
  │
  └─ UT-1.5: Performance constraint
      Input: FullHD image (1920×1080)
      Expected: <500ms feature extraction
      Pass: 95th percentile < 500ms

UT-2: Feature Matching

Purpose: Verify robust feature correspondence
Test Data: Pairs of synthetic/real images with known correspondence
Test Cases:
  ├─ UT-2.1: Basic matching
  │   Input: Two images from synthetic scene (90% overlap)
  │   Expected: ≥95% of ground-truth features matched
  │   Pass: match_rate ≥ 0.95
  │
  ├─ UT-2.2: Outlier rejection (Lowe's ratio test)
  │   Input: Synthetic pair + 50% false features
  │   Expected: False matches rejected
  │   Pass: false_match_rate < 0.1
  │
  ├─ UT-2.3: Low overlap scenario
  │   Input: Two images with 20% overlap
  │   Expected: Still matches ≥20 points
  │   Pass: min_matches ≥ 20
  │
  └─ UT-2.4: Performance
      Input: FullHD images, 1000 features each
      Expected: <300ms matching time
      Pass: 95th percentile < 300ms
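The outlier-rejection step in UT-2.2 (Lowe's ratio test) reduces to comparing each feature's nearest and second-nearest descriptor distances. A minimal pure-NumPy sketch (the helper name `lowe_ratio_filter` is hypothetical; the production code would operate on OpenCV descriptor matches):

```python
import numpy as np

def lowe_ratio_filter(dists, ratio=0.7):
    """Keep query features whose nearest-neighbour descriptor distance is
    clearly smaller than the second nearest (Lowe's ratio test).
    dists: (N, M) matrix of descriptor distances, one row per query feature.
    Returns a boolean mask of accepted matches."""
    # Partial sort pulls the two smallest distances into the first two columns.
    two_best = np.partition(dists, 1, axis=1)[:, :2]
    return two_best[:, 0] < ratio * two_best[:, 1]

# Unambiguous match (0.2 vs 1.0) passes; ambiguous one (0.9 vs 1.0) is rejected.
dists = np.array([[0.2, 1.0, 1.3],
                  [0.9, 1.0, 1.1]])
mask = lowe_ratio_filter(dists)
print(mask.tolist())  # [True, False]
```

The ratio of 0.7 is a typical default, not a value mandated by this plan; UT-2.2's false-match budget (<0.1) is what the tuned value must satisfy.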

UT-3: Essential Matrix Estimation

Purpose: Verify 5-point/8-point algorithms for camera geometry
Test Data: Synthetic correspondences with known relative pose
Test Cases:
  ├─ UT-3.1: 8-point algorithm
  │   Input: 8+ point correspondences
  │   Expected: Essential matrix E with rank 2
  │   Pass: min_singular_value(E) < 1e-6
  │
  ├─ UT-3.2: 5-point algorithm
  │   Input: 5 point correspondences
  │   Expected: Up to 4 solutions generated
  │   Pass: num_solutions ∈ [1, 4]
  │
  ├─ UT-3.3: RANSAC convergence
  │   Input: 100 correspondences, 30% outliers
  │   Expected: Essential matrix recovery despite outliers
  │   Pass: inlier_ratio ≥ 0.6
  │
  └─ UT-3.4: Chirality constraint
      Input: Multiple (R,t) solutions from decomposition
      Expected: Only solution with points in front of cameras selected
      Pass: selected_solution verified via triangulation
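UT-3.1's rank-2 check follows from the structure of a valid essential matrix, E = [t]ₓR, whose singular values are (σ, σ, 0). A small NumPy sketch of the assertion, using a synthetic known pose (illustrative values, not from the test data set):

```python
import numpy as np

def skew(t):
    """Cross-product (skew-symmetric) matrix of a 3-vector."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Known relative pose: 10 deg rotation about z, unit translation along x.
a = np.deg2rad(10.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.0, 0.0])
E = skew(t) @ R

s = np.linalg.svd(E, compute_uv=False)
print(s[2] < 1e-6)               # rank 2: smallest singular value ~ 0
print(abs(s[0] - s[1]) < 1e-6)   # two equal nonzero singular values
```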

UT-4: Triangulation (DLT)

Purpose: Verify 3D point reconstruction from image correspondences
Test Data: Synthetic scenes with known 3D geometry
Test Cases:
  ├─ UT-4.1: Accuracy
  │   Input: Noise-free point correspondences
  │   Expected: Reconstructed X matches ground truth
  │   Pass: RMSE < 0.1 cm on a 1 m scene
  │
  ├─ UT-4.2: Outlier handling
  │   Input: 10 valid + 2 invalid correspondences
  │   Expected: Invalid points detected (behind camera/far)
  │   Pass: valid_mask accuracy > 95%
  │
  ├─ UT-4.3: Altitude constraint
  │   Input: Points with z < 50m (below aircraft)
  │   Expected: Points rejected
  │   Pass: altitude_filter works correctly
  │
  └─ UT-4.4: Batch performance
      Input: 500 point triangulations
      Expected: <100ms total
      Pass: 95th percentile < 100ms
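UT-4.1's noise-free accuracy check can be sketched with the standard linear (DLT) triangulation: stack the cross-product constraints from both views and take the null-space via SVD. A self-contained NumPy version (hypothetical helper names; the production code presumably wraps an OpenCV or Eigen equivalent):

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two pixel
    observations x1, x2 under 3x4 projection matrices P1, P2."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null-space of A, homogeneous 3D point
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Camera 1 at the origin, camera 2 translated 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 5.0])

X_hat = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.allclose(X_hat, X_true, atol=1e-6))  # True
```

With noise-free correspondences the reconstruction is exact to numerical precision, which is the baseline UT-4.1 asserts before noise and outliers are added.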

UT-5: Bundle Adjustment

Purpose: Verify pose and 3D point optimization
Test Data: Synthetic multi-view scenes
Test Cases:
  ├─ UT-5.1: Convergence
  │   Input: 5 frames with noisy initial poses
  │   Expected: Residual decreases monotonically
  │   Pass: final_residual < 0.001 * initial_residual
  │
  ├─ UT-5.2: Covariance computation
  │   Input: Optimized poses and points
  │   Expected: Covariance matrix positive-definite
  │   Pass: all_eigenvalues > 0
  │
  ├─ UT-5.3: Window size effect
  │   Input: Same problem with window sizes [3, 5, 10]
  │   Expected: Larger windows → better residuals
  │   Pass: residual_5 < residual_3, residual_10 < residual_5
  │
  └─ UT-5.4: Performance scaling
      Input: Window size [5, 10, 15, 20]
      Expected: Time ~= O(w^3)
      Pass: cubic fit accurate (R² > 0.95)
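UT-5.4's scaling law can be verified by fitting the slope of log(time) against log(window size), which should be close to 3 for O(w³) behaviour. A minimal sketch with synthetic placeholder timings (in the real test these come from profiling the BA solver):

```python
import numpy as np

windows = np.array([5.0, 10.0, 15.0, 20.0])
# Placeholder timings following the expected cubic law; real data is measured.
times = 2e-4 * windows**3

# On a log-log plot, t = c * w^3 becomes a line with slope 3.
slope, _ = np.polyfit(np.log(windows), np.log(times), 1)
print(round(slope, 2))  # 3.0
```

With measured (noisy) timings the pass condition would be a tolerance band around 3 plus the R² > 0.95 goodness-of-fit threshold, rather than exact equality.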

2.2 Integration Tests (Level 2)

IT-1: Sequential Pipeline

Purpose: Verify image-to-image processing chain
Test Data: Real aerial image sequences (5-20 images)
Test Cases:
  ├─ IT-1.1: Feature flow
  │   Features extracted from img₁ → tracked to img₂ → matched
  │   Expected: Consistent tracking across images
  │   Pass: ≥70% features tracked end-to-end
  │
  ├─ IT-1.2: Pose chain consistency
  │   Poses P₁, P₂, P₃ computed sequentially
  │   Expected: P₃ = P₂₃ ∘ P₂, with P₂₃ the relative pose 2→3 (composition consistency)
  │   Pass: pose_error < 0.1° rotation, 5cm translation
  │
  ├─ IT-1.3: Trajectory smoothness
  │   Velocity computed between poses
  │   Expected: Smooth velocity profile (no jumps)
  │   Pass: velocity_std_dev < 20% mean_velocity
  │
  └─ IT-1.4: Memory usage
      Process 100-image sequence
      Expected: Constant memory (windowed processing)
      Pass: peak_memory < 2GB
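IT-1.3's smoothness criterion is directly computable from the recovered positions: per-frame speeds should have a standard deviation below 20% of the mean. A minimal NumPy sketch (hypothetical helper name; a straight constant-speed trajectory as toy input):

```python
import numpy as np

def velocity_smoothness_ok(positions, dt=1.0, max_rel_std=0.2):
    """IT-1.3 style check: speeds between consecutive poses should form a
    smooth profile (std dev < 20% of the mean speed)."""
    v = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt
    return np.std(v) < max_rel_std * np.mean(v)

# Straight trajectory at a constant 50 m/s, 300 m altitude.
t = np.arange(10.0)
positions = np.stack([50.0 * t, np.zeros_like(t), np.full_like(t, 300.0)], axis=1)
print(velocity_smoothness_ok(positions))  # True
```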

IT-2: Satellite Georeferencing

Purpose: Verify local-to-global coordinate transformation
Test Data: Synthetic/real images with known satellite reference
Test Cases:
  ├─ IT-2.1: Feature matching with satellite
  │   Input: Aerial image + satellite reference
  │   Expected: ≥10 matched features between viewpoints
  │   Pass: match_count ≥ 10
  │
  ├─ IT-2.2: Homography estimation
  │   Matched features → homography matrix
  │   Expected: Valid transformation (3×3 matrix)
  │   Pass: det(H) ≠ 0, condition_number < 100
  │
  ├─ IT-2.3: GPS transformation accuracy
  │   Apply homography to image corners
  │   Expected: Computed GPS ≈ known reference GPS
  │   Pass: error < 100m (on test data)
  │
  └─ IT-2.4: Confidence scoring
      Compute inlier_ratio and MI (mutual information)
      Expected: score = inlier_ratio × MI ∈ [0, 1]
      Pass: high_confidence for obvious matches
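IT-2.2's validity gate on the estimated homography (non-zero determinant, bounded condition number) is a two-line NumPy check. A sketch with one well-behaved and one degenerate matrix (toy values; the real H comes from the satellite matcher):

```python
import numpy as np

def homography_valid(H, max_cond=100.0):
    """IT-2.2 style sanity check on an estimated homography:
    non-degenerate (det != 0) and well conditioned."""
    return abs(np.linalg.det(H)) > 1e-12 and np.linalg.cond(H) < max_cond

# A mild similarity transform (scale 1.1, small translation) should pass.
H_good = np.array([[1.1, 0.0, 5.0],
                   [0.0, 1.1, -3.0],
                   [0.0, 0.0, 1.0]])
# A rank-deficient matrix (first two rows proportional) should fail.
H_bad = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],
                  [0.0, 0.0, 1.0]])
print(homography_valid(H_good), homography_valid(H_bad))  # True False
```

Note that the condition number of a homography is scale-dependent (pixel translations inflate it), so the <100 threshold implicitly assumes a normalized or image-centred coordinate frame.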

IT-3: Outlier Detection Chain

Purpose: Verify multi-stage outlier detection
Test Data: Synthetic trajectory with injected outliers
Test Cases:
  ├─ IT-3.1: Velocity anomaly detection
  │   Inject 350m jump at frame N
  │   Expected: Detected as outlier
  │   Pass: outlier_flag = True
  │
  ├─ IT-3.2: Recovery mechanism
  │   After outlier detection
  │   Expected: System attempts skip-frame matching (N→N+2)
  │   Pass: recovery_successful = True
  │
  ├─ IT-3.3: False positive rate
  │   Normal sequence with small perturbations
  │   Expected: <5% false outlier flagging
  │   Pass: false_positive_rate < 0.05
  │
  └─ IT-3.4: Consistency across stages
      Multiple detection stages should agree
      Pass: agreement_score > 0.8
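The injection-and-detection loop of IT-3.1 can be sketched as a simple step-length threshold over consecutive positions. The 100 m threshold below is a placeholder for illustration; the real system would derive it from the recent velocity profile:

```python
import numpy as np

def flag_jumps(positions, max_step=100.0):
    """IT-3.1 style velocity-anomaly check: flag any frame whose step from
    the previous position exceeds max_step metres."""
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    return np.flatnonzero(steps > max_step) + 1  # index of offending frame

# Nominal 50 m steps, with a 350 m jump injected at frame 5.
positions = np.array([[50.0 * i, 0.0] for i in range(10)])
positions[5:] += np.array([350.0, 0.0])
print(flag_jumps(positions).tolist())  # [5]
```

Shifting all frames from 5 onward (rather than only frame 5) models a persistent drift, so exactly one step exceeds the threshold and only frame 5 is flagged.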

2.3 System Tests (Level 3)

ST-1: Accuracy Criteria

Purpose: Verify system meets ±50m and ±20m accuracy targets
Test Data: Real aerial image sequences with ground-truth GPS
Test Cases:
  ├─ ST-1.1: 50m accuracy target
  │   Input: 500-image flight
  │   Compute: % images within 50m of ground truth
  │   Expected: ≥80%
  │   Pass: accuracy_50m ≥ 0.80
  │
  ├─ ST-1.2: 20m accuracy target
  │   Same flight data
  │   Expected: ≥60% within 20m
  │   Pass: accuracy_20m ≥ 0.60
  │
  ├─ ST-1.3: Mean absolute error
  │   Compute: MAE over all images
  │   Expected: <40m typical
  │   Pass: MAE < 50m
  │
  └─ ST-1.4: Error distribution
      Expected: Error approximately Gaussian
      Pass: K-S test p-value > 0.05
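The ST-1 accuracy gates reduce to vector arithmetic over per-image position errors. A minimal NumPy sketch with toy data (hypothetical `accuracy_report` helper; the real test compares against ground-truth GPS logs):

```python
import numpy as np

def accuracy_report(est, gt):
    """ST-1 style metrics: fraction of images within 50 m / 20 m of
    ground truth, plus mean absolute error (all in metres)."""
    err = np.linalg.norm(est - gt, axis=1)
    return {"within_50m": np.mean(err <= 50.0),
            "within_20m": np.mean(err <= 20.0),
            "mae": np.mean(err)}

# Toy data: errors of 5, 15, 18, 40, 80 m -> 4/5 within 50 m, 3/5 within 20 m.
gt  = np.zeros((5, 2))
est = np.array([[5.0, 0.0], [15.0, 0.0], [18.0, 0.0],
                [40.0, 0.0], [80.0, 0.0]])
r = accuracy_report(est, gt)
print(r["within_50m"], r["within_20m"], r["mae"])  # 0.8 0.6 31.6
```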

ST-2: Registration Rate

Purpose: Verify ≥95% of images successfully registered
Test Data: Real flights with various conditions
Test Cases:
  ├─ ST-2.1: Baseline registration
  │   Good overlap, clear features
  │   Expected: >98% registration rate
  │   Pass: registration_rate ≥ 0.98
  │
  ├─ ST-2.2: Challenging conditions
  │   Low texture, variable lighting
  │   Expected: ≥95% registration rate
  │   Pass: registration_rate ≥ 0.95
  │
  ├─ ST-2.3: Sharp turns scenario
  │   Images with <10% overlap
  │   Expected: Fallback mechanisms trigger, ≥90% success
  │   Pass: fallback_success_rate ≥ 0.90
  │
  └─ ST-2.4: Consecutive failures
      Track max consecutive unregistered images
      Expected: <3 consecutive failures
      Pass: max_consecutive_failures ≤ 3

ST-3: Reprojection Error

Purpose: Verify <1.0 pixel mean reprojection error
Test Data: Real flight data after bundle adjustment
Test Cases:
  ├─ ST-3.1: Mean reprojection error
  │   After BA optimization
  │   Expected: <1.0 pixel
  │   Pass: mean_reproj_error < 1.0
  │
  ├─ ST-3.2: Error distribution
  │   Histogram of per-point errors
  │   Expected: Tightly concentrated <2 pixels
  │   Pass: 95th_percentile < 2.0 px
  │
  ├─ ST-3.3: Per-frame consistency
  │   Error should not vary dramatically
  │   Expected: Consistent across frames
  │   Pass: frame_error_std_dev < 0.3 px
  │
  └─ ST-3.4: Outlier points
      Very large reprojection errors
      Expected: <1% of points with error >3 px
      Pass: outlier_rate < 0.01

ST-4: Processing Speed

Purpose: Verify <2 seconds per image
Test Data: Full flight sequences on target hardware
Test Cases:
  ├─ ST-4.1: Average latency
  │   Mean processing time per image
  │   Expected: <2 seconds
  │   Pass: mean_latency < 2.0 sec
  │
  ├─ ST-4.2: 95th percentile latency
  │   Worst-case images (complex scenes)
  │   Expected: <2.5 seconds
  │   Pass: p95_latency < 2.5 sec
  │
  ├─ ST-4.3: Component breakdown
  │   Feature extraction: <0.5s
  │   Matching: <0.3s
  │   RANSAC: <0.2s
  │   BA: <0.8s
  │   Satellite: <0.3s
  │   Pass: Each component within budget
  │
  └─ ST-4.4: Scaling with problem size
      Memory usage, CPU usage vs. image resolution
      Expected: Linear scaling
      Pass: O(n) complexity verified
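The ST-4 latency gates (mean <2.0 s, p95 <2.5 s) can be evaluated with a percentile over the measured per-image times. A sketch with synthetic placeholder latencies (real values come from profiling the full pipeline on target hardware):

```python
import numpy as np

def latency_ok(latencies, mean_budget=2.0, p95_budget=2.5):
    """ST-4 style gate: mean per-image latency under 2 s and the
    95th percentile under 2.5 s."""
    return (np.mean(latencies) < mean_budget
            and np.percentile(latencies, 95) < p95_budget)

# 97 nominal images at ~1.5 s plus a few slow outliers on complex scenes.
lat = np.concatenate([np.full(97, 1.5), [2.3, 2.4, 2.4]])
print(latency_ok(lat))  # True
```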

ST-5: Robustness - Outlier Handling

Purpose: Verify graceful handling of 350m outlier drifts
Test Data: Synthetic/real data with injected outliers
Test Cases:
  ├─ ST-5.1: Single 350m outlier
  │   Inject outlier at frame N
  │   Expected: Detected, trajectory continues
  │   Pass: system_continues = True
  │
  ├─ ST-5.2: Multiple outliers
  │   3-5 outliers scattered in sequence
  │   Expected: All detected, recovery attempted
  │   Pass: detection_rate ≥ 0.8
  │
  ├─ ST-5.3: False positive rate
  │   Normal trajectory, no outliers
  │   Expected: <5% false flagging
  │   Pass: false_positive_rate < 0.05
  │
  └─ ST-5.4: Recovery latency
      Time to recover after outlier
      Expected: ≤3 frames
      Pass: recovery_latency ≤ 3 frames

ST-6: Robustness - Sharp Turns

Purpose: Verify handling of <5% image overlap scenarios
Test Data: Synthetic sequences with sharp angles
Test Cases:
  ├─ ST-6.1: 5% overlap matching
  │   Two images with 5% overlap
  │   Expected: Minimal matches or skip-frame
  │   Pass: system_handles_gracefully = True
  │
  ├─ ST-6.2: Skip-frame fallback
  │   Direct N→N+1 fails, tries N→N+2
  │   Expected: Succeeds with N→N+2
  │   Pass: skip_frame_success_rate ≥ 0.8
  │
  ├─ ST-6.3: 90° turn handling
  │   Images at near-orthogonal angles
  │   Expected: Degeneracy detected, logged
  │   Pass: degeneracy_detection = True
  │
  └─ ST-6.4: Trajectory consistency
      Consecutive turns: check velocity smoothness
      Expected: No velocity jumps > 50%
      Pass: velocity_consistency verified

2.4 Field Acceptance Tests (Level 4)

FAT-1: Real UAV Flight Trial #1 (Baseline)

Scenario: Nominal flight over agricultural field
┌────────────────────────────────────────┐
│ Conditions:                             │
│ • Clear weather, good sunlight          │
│ • Flat terrain, sparse trees            │
│ • 300m altitude, 50m/s speed            │
│ • 800 images, ~15 min flight            │
└────────────────────────────────────────┘

Pass Criteria:
  ✓ Accuracy: ≥80% within 50m
  ✓ Accuracy: ≥60% within 20m
  ✓ Registration rate: ≥95%
  ✓ Processing time: <2s/image
  ✓ Satellite validation: <10% outliers
  ✓ Reprojection error: <1.0px mean

Success Metrics:
  • MAE (mean absolute error): <40m
  • RMS error: <45m
  • Max error: <200m
  • Trajectory coherence: smooth (no jumps)

FAT-2: Real UAV Flight Trial #2 (Challenging)

Scenario: Flight with more complex terrain
┌────────────────────────────────────────┐
│ Conditions:                             │
│ • Mixed urban/agricultural              │
│ • Buildings, vegetation, water bodies   │
│ • Variable altitude (250-400m)          │
│ • Includes 1-2 sharp turns              │
│ • 1200 images, ~25 min flight           │
└────────────────────────────────────────┘

Pass Criteria:
  ✓ Accuracy: ≥75% within 50m (relaxed from 80%)
  ✓ Accuracy: ≥50% within 20m (relaxed from 60%)
  ✓ Registration rate: ≥92% (relaxed from 95%)
  ✓ Processing time: <2.5s/image avg
  ✓ Outliers detected: <15% (relaxed from 10%)
  
Fallback Validation:
  ✓ User corrected <20% of uncertain images
  ✓ After correction, accuracy meets FAT-1 targets

FAT-3: Real UAV Flight Trial #3 (Edge Case)

Scenario: Low-texture flight (challenging for features)
┌────────────────────────────────────────┐
│ Conditions:                             │
│ • Sandy/desert terrain or water         │
│ • Minimal features                      │
│ • Overcast/variable lighting            │
│ • 500-600 images, ~12 min flight        │
└────────────────────────────────────────┘

Pass Criteria:
  ✓ System continues (no crash): YES
  ✓ Graceful degradation: Flags uncertainty
  ✓ User can correct and improve: YES
  ✓ Satellite anchor helps recovery: YES

Success Metrics:
  • >80% of images tagged "uncertain"
  • After user correction: meets standard targets
  • Demonstrates fallback mechanisms working

3. Test Environment Setup

Hardware Requirements

CPU: 16+ cores (Intel Xeon / AMD Ryzen)
RAM: 64GB minimum (32GB acceptable for <1500 images)
Storage: 1TB SSD (for raw images + processing)
GPU: Optional (CUDA 11.8+ for 5-10x acceleration)
Network: For satellite API queries (can be cached)

Software Requirements

OS: Ubuntu 20.04 LTS or macOS 12+
Build: CMake 3.20+, GCC 9+ or Clang 11+
Dependencies: OpenCV 4.8+, Eigen 3.4+, GDAL 3.0+
Testing: GoogleTest, Pytest
CI/CD: GitHub Actions or Jenkins

Test Data Management

Synthetic Data: Generated via Blender (checked into repo)
Real Data: External dataset storage (S3/local SSD)
Ground Truth: Maintained in CSV format with metadata
Versioning: Git-LFS for binary image data

4. Test Execution Plan

Phase 1: Unit Testing (Weeks 1-6)

Sprint 1-2: UT-1 (Feature detection) - 2 weeks
Sprint 3-4: UT-2 (Feature matching) - 2 weeks
Sprint 5-6: UT-3, UT-4, UT-5 (Geometry) - 2 weeks

Continuous: Run full unit test suite every commit
Coverage target: >90% code coverage

Phase 2: Integration Testing (Weeks 7-12)

Sprint 7-9: IT-1 (Sequential pipeline) - 3 weeks
Sprint 10-11: IT-2, IT-3 (Georef, Outliers) - 2 weeks
Sprint 12: System integration - 1 week

Continuous: Integration tests run nightly

Phase 3: System Testing (Weeks 13-18)

Sprint 13-14: ST-1, ST-2 (Accuracy, Registration) - 2 weeks
Sprint 15-16: ST-3, ST-4 (Error, Speed) - 2 weeks
Sprint 17-18: ST-5, ST-6 (Robustness) - 2 weeks

Load testing: 1000-3000 image sequences
Stress testing: Edge cases, memory limits

Phase 4: Field Acceptance (Weeks 19-30)

Week 19-22: FAT-1 (Baseline trial)
  • Coordinate 1-2 baseline flights
  • Validate system on real data
  • Adjust parameters as needed

Week 23-26: FAT-2 (Challenging trial)
  • More complex scenarios
  • Test fallback mechanisms
  • Refine user interface

Week 27-30: FAT-3 (Edge case trial)
  • Low-texture scenarios
  • Validate robustness
  • Final adjustments

Post-trial: Generate comprehensive report

5. Acceptance Criteria Summary

| Criterion                  | Target        | Test    | Pass/Fail              |
|----------------------------|---------------|---------|------------------------|
| Accuracy@50m               | ≥80%          | FAT-1   | ≥80% → pass            |
| Accuracy@20m               | ≥60%          | FAT-1   | ≥60% → pass            |
| Registration rate          | ≥95%          | ST-2    | ≥95% → pass            |
| Reprojection error         | <1.0px mean   | ST-3    | <1.0px → pass          |
| Processing speed           | <2.0s/image   | ST-4    | p95 <2.5s → pass       |
| Robustness (350m outlier)  | Handled       | ST-5    | Continues → pass       |
| Sharp turns (<5% overlap)  | Handled       | ST-6    | Skip-frame → pass      |
| Satellite validation       | <10% outliers | FAT-1–3 | <10% → pass            |

6. Success Metrics

Green Light Criteria (Ready for production):

  • All unit tests pass (100%)
  • All integration tests pass (100%)
  • All system tests pass (100%)
  • FAT-1 and FAT-2 pass acceptance criteria
  • FAT-3 shows graceful degradation
  • <10% code defects discovered in field trials
  • Performance meets SLA consistently

Yellow Light Criteria (Conditional deployment):

  • ⚠ 85-89% of acceptance criteria met
  • ⚠ Minor issues in edge cases
  • ⚠ Requires workaround documentation
  • ⚠ Re-test after fixes

Red Light Criteria (Do not deploy):

  • <85% of acceptance criteria met
  • Critical failures in core functionality
  • Safety/security concerns
  • Cannot meet latency or accuracy targets

Conclusion

This comprehensive testing strategy ensures the UAV Image Geolocalization System is robust, accurate, and reliable before production deployment. The multi-level approach (unit → integration → system → field) progressively validates system behavior from components to end-to-end scenarios, with field trials providing real-world validation of the acceptance criteria.