# Testing Strategy & Acceptance Test Plan (ATP)

## Overview

This document details the comprehensive testing strategy for the UAV Aerial Image Geolocalization System, covering unit tests, integration tests, system tests, field validation, and acceptance criteria.

---

## 1. Test Pyramid Architecture

```
                ACCEPTANCE TESTS (5%)
            ┌─────────────────────────┐
            │       Field Trials      │
            │    Real UAV Flights     │
            └─────────────────────────┘

              SYSTEM TESTS (15%)
        ┌────────────────────────────────┐
        │     End-to-End Performance     │
        │  Accuracy, Speed, Robustness   │
        └────────────────────────────────┘

           INTEGRATION TESTS (30%)
      ┌─────────────────────────────────┐
      │    Multi-Component Workflows    │
      │  FeatureMatcher → Triangulator  │
      │   Bundle Adjustment Refinement  │
      └─────────────────────────────────┘

              UNIT TESTS (50%)
        ┌──────────────────────────┐
        │  Individual Components   │
        │  AKAZE, Essential Matrix │
        │  Triangulation, BA...    │
        └──────────────────────────┘
```

---

## 2. Detailed Test Categories

### 2.1 Unit Tests (Level 1)

#### UT-1: Feature Extraction (AKAZE)

```
Purpose: Verify keypoint detection and descriptor computation
Test Data: Synthetic images with known features (checkerboard patterns)

Test Cases:
├─ UT-1.1: Basic feature detection
│    Input: 1024×768 synthetic image with checkerboard
│    Expected: ≥500 keypoints detected
│    Pass: count ≥ 500
│
├─ UT-1.2: Scale invariance
│    Input: Same scene at 2x scale
│    Expected: Keypoints at proportional positions
│    Pass: correlation of positions > 0.9
│
├─ UT-1.3: Rotation robustness
│    Input: Image rotated ±30°
│    Expected: Descriptors match original + rotated
│    Pass: match rate > 80%
│
├─ UT-1.4: Multi-scale handling
│    Input: Image with features at multiple scales
│    Expected: Features detected at all scales (pyramid)
│    Pass: ratio of scales [1:1.2:1.44:...] verified
│
└─ UT-1.5: Performance constraint
     Input: FullHD image (1920×1080)
     Expected: <500ms feature extraction
     Pass: 95th percentile < 500ms
```
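UT-1.1's checkerboard fixture can be generated without any imaging library. The sketch below (pure NumPy; `make_checkerboard` and `ground_truth_corners` are hypothetical helpers, not part of the system) builds the 1024×768 test image and enumerates the interior corner lattice a detector would be scored against — 713 ground-truth corners, comfortably above the ≥500 keypoint threshold.

```python
import numpy as np

def make_checkerboard(w=1024, h=768, square=32):
    """Synthetic checkerboard test image (hypothetical fixture helper)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return (((ys // square + xs // square) % 2) * 255).astype(np.uint8)

def ground_truth_corners(w=1024, h=768, square=32):
    """Interior lattice points where four squares meet: the expected keypoints."""
    xs = np.arange(square, w, square)
    ys = np.arange(square, h, square)
    return [(x, y) for y in ys for x in xs]

img = make_checkerboard()
corners = ground_truth_corners()
print(img.shape, len(corners))  # (768, 1024) 713
```

A real UT-1.1 run would feed `img` to `cv2.AKAZE_create().detectAndCompute` and compare detections against `corners`.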
#### UT-2: Feature Matching

```
Purpose: Verify robust feature correspondence
Test Data: Pairs of synthetic/real images with known correspondence

Test Cases:
├─ UT-2.1: Basic matching
│    Input: Two images from synthetic scene (90% overlap)
│    Expected: ≥95% of ground-truth features matched
│    Pass: match_rate ≥ 0.95
│
├─ UT-2.2: Outlier rejection (Lowe's ratio test)
│    Input: Synthetic pair + 50% false features
│    Expected: False matches rejected
│    Pass: false_match_rate < 0.1
│
├─ UT-2.3: Low overlap scenario
│    Input: Two images with 20% overlap
│    Expected: Still matches ≥20 points
│    Pass: min_matches ≥ 20
│
└─ UT-2.4: Performance
     Input: FullHD images, 1000 features each
     Expected: <300ms matching time
     Pass: 95th percentile < 300ms
```

#### UT-3: Essential Matrix Estimation

```
Purpose: Verify 5-point/8-point algorithms for camera geometry
Test Data: Synthetic correspondences with known relative pose

Test Cases:
├─ UT-3.1: 8-point algorithm
│    Input: 8+ point correspondences
│    Expected: Essential matrix E with rank 2
│    Pass: min_singular_value(E) < 1e-6
│
├─ UT-3.2: 5-point algorithm
│    Input: 5 point correspondences
│    Expected: Up to 10 candidate essential matrices (Nistér)
│    Pass: num_solutions ∈ [1, 10]
│
├─ UT-3.3: RANSAC convergence
│    Input: 100 correspondences, 30% outliers
│    Expected: Essential matrix recovery despite outliers
│    Pass: inlier_ratio ≥ 0.6
│
└─ UT-3.4: Chirality constraint
     Input: Multiple (R,t) solutions from decomposition
     Expected: Only solution with points in front of cameras selected
     Pass: selected_solution verified via triangulation
```
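UT-3.1's rank-2 check can be exercised against a minimal 8-point implementation. The sketch below (pure NumPy, synthetic calibrated correspondences; an illustrative stand-in, not the system's actual estimator) recovers E for a pure sideways translation and projects it onto the essential manifold, so its singular values come out as (1, 1, 0).

```python
import numpy as np

def eight_point(x1, x2):
    """Unnormalized 8-point estimate of E from calibrated
    image coordinates x1 <-> x2, each of shape (N, 2)."""
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)
    # Project onto the essential manifold: two equal singular values, rank 2.
    u, s, vt = np.linalg.svd(E)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt

# Synthetic rig: identity first camera, second camera translated along x.
rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(-1, 1, 20),
                     rng.uniform(-1, 1, 20),
                     rng.uniform(4, 8, 20)])   # non-coplanar 3D points
t = np.array([1.0, 0.0, 0.0])
x1 = X[:, :2] / X[:, 2:3]
Xc2 = X - t                                    # R = I, so just shift frames
x2 = Xc2[:, :2] / Xc2[:, 2:3]

E = eight_point(x1, x2)
print(np.linalg.svd(E, compute_uv=False))  # smallest singular value ~ 0
```

In practice UT-3 would call the production estimator (e.g. OpenCV's `cv2.findEssentialMat`) and apply the same singular-value assertion.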
#### UT-4: Triangulation (DLT)

```
Purpose: Verify 3D point reconstruction from image correspondences
Test Data: Synthetic scenes with known 3D geometry

Test Cases:
├─ UT-4.1: Accuracy
│    Input: Noise-free point correspondences
│    Expected: Reconstructed X matches ground truth
│    Pass: RMSE < 0.1cm on 1m scene
│
├─ UT-4.2: Outlier handling
│    Input: 10 valid + 2 invalid correspondences
│    Expected: Invalid points detected (behind camera/far)
│    Pass: valid_mask accuracy > 95%
│
├─ UT-4.3: Altitude constraint
│    Input: Points with z < 50m (below aircraft)
│    Expected: Points rejected
│    Pass: altitude_filter works correctly
│
└─ UT-4.4: Batch performance
     Input: 500 point triangulations
     Expected: <100ms total
     Pass: 95th percentile < 100ms
```

#### UT-5: Bundle Adjustment

```
Purpose: Verify pose and 3D point optimization
Test Data: Synthetic multi-view scenes

Test Cases:
├─ UT-5.1: Convergence
│    Input: 5 frames with noisy initial poses
│    Expected: Residual decreases monotonically
│    Pass: final_residual < 0.001 * initial_residual
│
├─ UT-5.2: Covariance computation
│    Input: Optimized poses and points
│    Expected: Covariance matrix positive-definite
│    Pass: all_eigenvalues > 0
│
├─ UT-5.3: Window size effect
│    Input: Same problem with window sizes [3, 5, 10]
│    Expected: Larger windows → better residuals
│    Pass: residual_5 < residual_3, residual_10 < residual_5
│
└─ UT-5.4: Performance scaling
     Input: Window size [5, 10, 15, 20]
     Expected: Time ~= O(w^3)
     Pass: cubic fit accurate (R² > 0.95)
```

---

### 2.2 Integration Tests (Level 2)

#### IT-1: Sequential Pipeline

```
Purpose: Verify image-to-image processing chain
Test Data: Real aerial image sequences (5-20 images)

Test Cases:
├─ IT-1.1: Feature flow
│    Features extracted from img₁ → tracked to img₂ → matched
│    Expected: Consistent tracking across images
│    Pass: ≥70% features tracked end-to-end
│
├─ IT-1.2: Pose chain consistency
│    Poses P₁, P₂, P₃ computed sequentially
│    Expected: P₁₃ = P₂₃ ∘ P₁₂ (composition consistency)
│    Pass: pose_error < 0.1° rotation, 5cm translation
│
├─ IT-1.3: Trajectory smoothness
│    Velocity computed between poses
│    Expected: Smooth velocity profile (no jumps)
│    Pass: velocity_std_dev < 20% mean_velocity
│
└─ IT-1.4: Memory usage
     Process 100-image sequence
     Expected: Constant memory (windowed processing)
     Pass: peak_memory < 2GB
```
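The DLT triangulation that UT-4.1 validates can be sketched in a few lines. The camera matrices and the test point below are illustrative assumptions, not values from the system; with noise-free input the reconstruction is exact.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2 are 3x4 projection matrices; x1, x2 are pixel coordinates."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

K = np.array([[800.0, 0, 512], [0, 800.0, 384], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])               # reference camera
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])   # 1 m baseline

X_true = np.array([0.3, -0.2, 5.0])
x1 = P1 @ np.append(X_true, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1); x2 = x2[:2] / x2[2]

X_est = triangulate_dlt(P1, P2, x1, x2)
print(np.linalg.norm(X_est - X_true))  # ~0 for noise-free input
```

UT-4.1's pass criterion corresponds to asserting this reconstruction error stays below 1 mm on a metre-scale scene.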
#### IT-2: Satellite Georeferencing

```
Purpose: Verify local-to-global coordinate transformation
Test Data: Synthetic/real images with known satellite reference

Test Cases:
├─ IT-2.1: Feature matching with satellite
│    Input: Aerial image + satellite reference
│    Expected: ≥10 matched features between viewpoints
│    Pass: match_count ≥ 10
│
├─ IT-2.2: Homography estimation
│    Matched features → homography matrix
│    Expected: Valid transformation (3×3 matrix)
│    Pass: det(H) ≠ 0, condition_number < 100
│
├─ IT-2.3: GPS transformation accuracy
│    Apply homography to image corners
│    Expected: Computed GPS ≈ known reference GPS
│    Pass: error < 100m (on test data)
│
└─ IT-2.4: Confidence scoring
     Compute inlier_ratio and MI (mutual information)
     Expected: score = inlier_ratio × MI ∈ [0, 1]
     Pass: high_confidence for obvious matches
```

#### IT-3: Outlier Detection Chain

```
Purpose: Verify multi-stage outlier detection
Test Data: Synthetic trajectory with injected outliers

Test Cases:
├─ IT-3.1: Velocity anomaly detection
│    Inject 350m jump at frame N
│    Expected: Detected as outlier
│    Pass: outlier_flag = True
│
├─ IT-3.2: Recovery mechanism
│    After outlier detection
│    Expected: System attempts skip-frame matching (N→N+2)
│    Pass: recovery_successful = True
│
├─ IT-3.3: False positive rate
│    Normal sequence with small perturbations
│    Expected: <5% false outlier flagging
│    Pass: false_positive_rate < 0.05
│
└─ IT-3.4: Consistency across stages
     Multiple detection stages should agree
     Pass: agreement_score > 0.8
```
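IT-2.3's corner-to-GPS step reduces to applying the estimated homography in homogeneous coordinates. A minimal sketch, with a deliberately simple hypothetical image-to-lon/lat homography standing in for a real estimate:

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 points through a 3x3 homography (homogeneous divide)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = (H @ pts_h.T).T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical image->geo homography: pure scale + translation, where one
# pixel spans 1e-5 degrees (illustrative values, not a real calibration).
H = np.array([
    [1e-5,  0.0, 30.5],   # x -> longitude
    [0.0, -1e-5, 50.2],   # y -> latitude (image y grows downward)
    [0.0,   0.0,  1.0],
])
corners = np.array([[0, 0], [1920, 0], [1920, 1080], [0, 1080]], float)
geo = apply_homography(H, corners)
print(geo[0], geo[2])  # [30.5 50.2] [30.5192 50.1892]
```

IT-2.3's check then compares each mapped corner against the reference GPS and asserts the discrepancy is under the 100 m budget.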
---

### 2.3 System Tests (Level 3)

#### ST-1: Accuracy Criteria

```
Purpose: Verify system meets ±50m and ±20m accuracy targets
Test Data: Real aerial image sequences with ground-truth GPS

Test Cases:
├─ ST-1.1: 50m accuracy target
│    Input: 500-image flight
│    Compute: % images within 50m of ground truth
│    Expected: ≥80%
│    Pass: accuracy_50m ≥ 0.80
│
├─ ST-1.2: 20m accuracy target
│    Same flight data
│    Expected: ≥60% within 20m
│    Pass: accuracy_20m ≥ 0.60
│
├─ ST-1.3: Mean absolute error
│    Compute: MAE over all images
│    Expected: <40m typical
│    Pass: MAE < 50m
│
└─ ST-1.4: Error distribution
     Expected: Error approximately Gaussian
     Pass: K-S test p-value > 0.05
```

#### ST-2: Registration Rate

```
Purpose: Verify ≥95% of images successfully registered
Test Data: Real flights with various conditions

Test Cases:
├─ ST-2.1: Baseline registration
│    Good overlap, clear features
│    Expected: >98% registration rate
│    Pass: registration_rate ≥ 0.98
│
├─ ST-2.2: Challenging conditions
│    Low texture, variable lighting
│    Expected: ≥95% registration rate
│    Pass: registration_rate ≥ 0.95
│
├─ ST-2.3: Sharp turns scenario
│    Images with <10% overlap
│    Expected: Fallback mechanisms trigger, ≥90% success
│    Pass: fallback_success_rate ≥ 0.90
│
└─ ST-2.4: Consecutive failures
     Track max consecutive unregistered images
     Expected: <3 consecutive failures
     Pass: max_consecutive_failures < 3
```

#### ST-3: Reprojection Error

```
Purpose: Verify <1.0 pixel mean reprojection error
Test Data: Real flight data after bundle adjustment

Test Cases:
├─ ST-3.1: Mean reprojection error
│    After BA optimization
│    Expected: <1.0 pixel
│    Pass: mean_reproj_error < 1.0
│
├─ ST-3.2: Error distribution
│    Histogram of per-point errors
│    Expected: Tightly concentrated <2 pixels
│    Pass: 95th_percentile < 2.0 px
│
├─ ST-3.3: Per-frame consistency
│    Error should not vary dramatically
│    Expected: Consistent across frames
│    Pass: frame_error_std_dev < 0.3 px
│
└─ ST-3.4: Outlier points
     Very large reprojection errors
     Expected: <1% of points with error >3 px
     Pass: outlier_rate < 0.01
```
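ST-1.1 through ST-1.3 reduce to simple statistics over per-image position errors. A sketch of the metric computation (the error samples below are synthetic, generated only to illustrate the report shape):

```python
import numpy as np

def accuracy_report(errors_m, thresholds=(50.0, 20.0)):
    """Fraction of images within each error threshold, plus MAE."""
    errors_m = np.asarray(errors_m, float)
    report = {f"within_{int(t)}m": float(np.mean(errors_m <= t))
              for t in thresholds}
    report["MAE"] = float(np.mean(errors_m))
    return report

# Synthetic per-image localization errors for a 500-image flight.
rng = np.random.default_rng(1)
errors = np.abs(rng.normal(10.0, 10.0, size=500))

r = accuracy_report(errors)
print(r)
```

The ST-1 gates are then `r["within_50m"] >= 0.80`, `r["within_20m"] >= 0.60`, and `r["MAE"] < 50`.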
#### ST-4: Processing Speed

```
Purpose: Verify <2 seconds per image
Test Data: Full flight sequences on target hardware

Test Cases:
├─ ST-4.1: Average latency
│    Mean processing time per image
│    Expected: <2 seconds
│    Pass: mean_latency < 2.0 sec
│
├─ ST-4.2: 95th percentile latency
│    Worst-case images (complex scenes)
│    Expected: <2.5 seconds
│    Pass: p95_latency < 2.5 sec
│
├─ ST-4.3: Component breakdown
│    Feature extraction: <0.5s
│    Matching: <0.3s
│    RANSAC: <0.2s
│    BA: <0.8s
│    Satellite: <0.3s
│    Pass: Each component within budget
│
└─ ST-4.4: Scaling with problem size
     Memory usage, CPU usage vs. image resolution
     Expected: Linear scaling
     Pass: O(n) complexity verified
```

#### ST-5: Robustness - Outlier Handling

```
Purpose: Verify graceful handling of 350m outlier drifts
Test Data: Synthetic/real data with injected outliers

Test Cases:
├─ ST-5.1: Single 350m outlier
│    Inject outlier at frame N
│    Expected: Detected, trajectory continues
│    Pass: system_continues = True
│
├─ ST-5.2: Multiple outliers
│    3-5 outliers scattered in sequence
│    Expected: All detected, recovery attempted
│    Pass: detection_rate ≥ 0.8
│
├─ ST-5.3: False positive rate
│    Normal trajectory, no outliers
│    Expected: <5% false flagging
│    Pass: false_positive_rate < 0.05
│
└─ ST-5.4: Recovery latency
     Time to recover after outlier
     Expected: ≤3 frames
     Pass: recovery_latency ≤ 3 frames
```

#### ST-6: Robustness - Sharp Turns

```
Purpose: Verify handling of <5% image overlap scenarios
Test Data: Synthetic sequences with sharp angles

Test Cases:
├─ ST-6.1: 5% overlap matching
│    Two images with 5% overlap
│    Expected: Minimal matches or skip-frame
│    Pass: system_handles_gracefully = True
│
├─ ST-6.2: Skip-frame fallback
│    Direct N→N+1 fails, tries N→N+2
│    Expected: Succeeds with N→N+2
│    Pass: skip_frame_success_rate ≥ 0.8
│
├─ ST-6.3: 90° turn handling
│    Images at near-orthogonal angles
│    Expected: Degeneracy detected, logged
│    Pass: degeneracy_detection = True
│
└─ ST-6.4: Trajectory consistency
     Consecutive turns: check velocity smoothness
     Expected: No velocity jumps > 50%
     Pass: velocity_consistency verified
```
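The 350 m jump detection exercised by ST-5.1 (and IT-3.1) can be prototyped as a per-frame step-length gate. A minimal sketch with a hypothetical 100 m threshold; the track geometry is synthetic:

```python
import numpy as np

def flag_position_jumps(positions, max_jump_m=100.0):
    """Flag frames whose step from the previous frame exceeds max_jump_m."""
    positions = np.asarray(positions, float)
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    flags = np.zeros(len(positions), bool)
    flags[1:] = steps > max_jump_m
    return flags

# Straight 50 m/s track sampled at 1 Hz, with a 350 m lateral jump at frame 10.
track = np.column_stack([np.arange(30) * 50.0, np.zeros(30)])
track[10:, 1] += 350.0  # sustained offset: one large step at frame 10

flags = flag_position_jumps(track)
print(np.flatnonzero(flags))  # [10]
```

On detection, the pipeline's recovery path (IT-3.2) would retry matching with the skip-frame pairing N→N+2 rather than accepting the flagged pose.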
---

### 2.4 Field Acceptance Tests (Level 4)

#### FAT-1: Real UAV Flight Trial #1 (Baseline)

```
Scenario: Nominal flight over agricultural field

┌────────────────────────────────────────┐
│ Conditions:                            │
│  • Clear weather, good sunlight        │
│  • Flat terrain, sparse trees          │
│  • 300m altitude, 50m/s speed          │
│  • 800 images, ~15 min flight          │
└────────────────────────────────────────┘

Pass Criteria:
✓ Accuracy: ≥80% within 50m
✓ Accuracy: ≥60% within 20m
✓ Registration rate: ≥95%
✓ Processing time: <2s/image
✓ Satellite validation: <10% outliers
✓ Reprojection error: <1.0px mean

Success Metrics:
• MAE (mean absolute error): <40m
• RMS error: <45m
• Max error: <200m
• Trajectory coherence: smooth (no jumps)
```

#### FAT-2: Real UAV Flight Trial #2 (Challenging)

```
Scenario: Flight with more complex terrain

┌────────────────────────────────────────┐
│ Conditions:                            │
│  • Mixed urban/agricultural            │
│  • Buildings, vegetation, water bodies │
│  • Variable altitude (250-400m)        │
│  • Includes 1-2 sharp turns            │
│  • 1200 images, ~25 min flight         │
└────────────────────────────────────────┘

Pass Criteria:
✓ Accuracy: ≥75% within 50m (relaxed from 80%)
✓ Accuracy: ≥50% within 20m (relaxed from 60%)
✓ Registration rate: ≥92% (relaxed from 95%)
✓ Processing time: <2.5s/image avg
✓ Outliers detected: <15% (relaxed from 10%)

Fallback Validation:
✓ User corrected <20% of uncertain images
✓ After correction, accuracy meets FAT-1 targets
```

#### FAT-3: Real UAV Flight Trial #3 (Edge Case)

```
Scenario: Low-texture flight (challenging for features)

┌────────────────────────────────────────┐
│ Conditions:                            │
│  • Sandy/desert terrain or water       │
│  • Minimal features                    │
│  • Overcast/variable lighting          │
│  • 500-600 images, ~12 min flight      │
└────────────────────────────────────────┘

Pass Criteria:
✓ System continues (no crash): YES
✓ Graceful degradation: Flags uncertainty
✓ User can correct and improve: YES
✓ Satellite anchor helps recovery: YES

Success Metrics:
• >80% of images tagged "uncertain"
• After user correction: meets standard targets
• Demonstrates fallback mechanisms working
```

---
## 3. Test Environment Setup

### Hardware Requirements

```
CPU:     16+ cores (Intel Xeon / AMD Ryzen)
RAM:     64GB minimum (32GB acceptable for <1500 images)
Storage: 1TB SSD (for raw images + processing)
GPU:     Optional (CUDA 11.8+ for 5-10x acceleration)
Network: For satellite API queries (can be cached)
```

### Software Requirements

```
OS:           Ubuntu 20.04 LTS or macOS 12+
Build:        CMake 3.20+, GCC 9+ or Clang 11+
Dependencies: OpenCV 4.8+, Eigen 3.4+, GDAL 3.0+
Testing:      GoogleTest, Pytest
CI/CD:        GitHub Actions or Jenkins
```

### Test Data Management

```
Synthetic Data: Generated via Blender (checked into repo)
Real Data:      External dataset storage (S3/local SSD)
Ground Truth:   Maintained in CSV format with metadata
Versioning:     Git-LFS for binary image data
```

---

## 4. Test Execution Plan

### Phase 1: Unit Testing (Weeks 1-6)

```
Sprint 1-2: UT-1 (Feature detection) - 2 weeks
Sprint 3-4: UT-2 (Feature matching) - 2 weeks
Sprint 5-6: UT-3, UT-4, UT-5 (Geometry) - 2 weeks

Continuous: Run full unit test suite every commit
Coverage target: >90% code coverage
```

### Phase 2: Integration Testing (Weeks 7-12)

```
Sprint 7-9:   IT-1 (Sequential pipeline) - 3 weeks
Sprint 10-11: IT-2, IT-3 (Georef, Outliers) - 2 weeks
Sprint 12:    System integration - 1 week

Continuous: Integration tests run nightly
```

### Phase 3: System Testing (Weeks 13-18)

```
Sprint 13-14: ST-1, ST-2 (Accuracy, Registration) - 2 weeks
Sprint 15-16: ST-3, ST-4 (Error, Speed) - 2 weeks
Sprint 17-18: ST-5, ST-6 (Robustness) - 2 weeks

Load testing:   1000-3000 image sequences
Stress testing: Edge cases, memory limits
```

### Phase 4: Field Acceptance (Weeks 19-30)

```
Week 19-22: FAT-1 (Baseline trial)
  • Coordinate 1-2 baseline flights
  • Validate system on real data
  • Adjust parameters as needed

Week 23-26: FAT-2 (Challenging trial)
  • More complex scenarios
  • Test fallback mechanisms
  • Refine user interface

Week 27-30: FAT-3 (Edge case trial)
  • Low-texture scenarios
  • Validate robustness
  • Final adjustments

Post-trial: Generate comprehensive report
```
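Section 3 lists Pytest among the test frameworks, and several budgets in this plan are phrased as 95th-percentile limits. A pytest-style sketch of how a latency budget like UT-1.5's might be asserted; the recorded timings are hypothetical placeholders for real pipeline measurements:

```python
import numpy as np

def p95(samples):
    """95th percentile, the statistic the plan's latency budgets use."""
    return float(np.percentile(samples, 95))

# Hypothetical per-image feature-extraction timings in seconds; a real test
# would time the actual pipeline on fixture images.
TIMINGS = np.array([0.21, 0.25, 0.19, 0.31, 0.28, 0.45, 0.22, 0.26, 0.30, 0.24])

def test_feature_extraction_latency_budget():
    # UT-1.5 budget: <500 ms at the 95th percentile.
    assert p95(TIMINGS) < 0.5

test_feature_extraction_latency_budget()
print("latency budget check passed")
```

Under CI (Section 3's GitHub Actions/Jenkins), such checks run on every commit per the Phase 1 plan.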
---

## 5. Acceptance Criteria Summary

| Criterion | Target | Test | Pass/Fail |
|-----------|--------|------|-----------|
| **Accuracy@50m** | ≥80% | FAT-1 | ≥80% pass |
| **Accuracy@20m** | ≥60% | FAT-1 | ≥60% pass |
| **Registration Rate** | ≥95% | ST-2 | ≥95% pass |
| **Reprojection Error** | <1.0px mean | ST-3 | <1.0px pass |
| **Processing Speed** | <2.0s/image | ST-4 | p95<2.5s pass |
| **Robustness (350m outlier)** | Handled | ST-5 | Continue pass |
| **Sharp turns (<5% overlap)** | Handled | ST-6 | Skip-frame pass |
| **Satellite validation** | <10% outliers | FAT-1-3 | <10% pass |

---

## 6. Success Metrics

**Green Light Criteria** (Ready for production):

- ✅ All unit tests pass (100%)
- ✅ All integration tests pass (100%)
- ✅ All system tests pass (100%)
- ✅ FAT-1 and FAT-2 pass acceptance criteria
- ✅ FAT-3 shows graceful degradation
- ✅ <10% code defects discovered in field trials
- ✅ Performance meets SLA consistently

**Yellow Light Criteria** (Conditional deployment):

- ⚠ 85-89% of acceptance criteria met
- ⚠ Minor issues in edge cases
- ⚠ Requires workaround documentation
- ⚠ Re-test after fixes

**Red Light Criteria** (Do not deploy):

- ❌ <85% of acceptance criteria met
- ❌ Critical failures in core functionality
- ❌ Safety/security concerns
- ❌ Cannot meet latency or accuracy targets

---

## Conclusion

This comprehensive testing strategy ensures the UAV Image Geolocalization System is robust, accurate, and reliable before production deployment. The multi-level approach (unit → integration → system → field) progressively validates system behavior from components to end-to-end scenarios, with field trials providing real-world validation of the acceptance criteria.
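The Section 5 acceptance table can be mechanized as a release gate in CI. A sketch, with the hard thresholds taken from the table; the metric names and the `measured` values are illustrative:

```python
def green_light(metrics):
    """True when all hard acceptance targets from Section 5 are met
    (metric key names are hypothetical, not a defined system API)."""
    return (
        metrics["accuracy_50m"] >= 0.80
        and metrics["accuracy_20m"] >= 0.60
        and metrics["registration_rate"] >= 0.95
        and metrics["mean_reproj_error_px"] < 1.0
        and metrics["p95_latency_s"] < 2.5
    )

# Example measurements from a hypothetical FAT-1 run.
measured = {
    "accuracy_50m": 0.86,
    "accuracy_20m": 0.64,
    "registration_rate": 0.97,
    "mean_reproj_error_px": 0.8,
    "p95_latency_s": 2.1,
}
print(green_light(measured))  # True
```

The yellow/red bands of Section 6 would wrap this with a count of how many individual criteria passed.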