mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-22 08:56:37 +00:00
Testing Strategy & Acceptance Test Plan (ATP)
Overview
This document details the comprehensive testing strategy for the UAV Aerial Image Geolocalization System, covering unit tests, integration tests, system tests, field validation, and acceptance criteria.
1. Test Pyramid Architecture
From the top (fewest, most expensive tests) to the base (most numerous, cheapest):

ACCEPTANCE TESTS (5%)
  • Field trials
  • Real UAV flights

SYSTEM TESTS (15%)
  • End-to-end performance
  • Accuracy, speed, robustness

INTEGRATION TESTS (30%)
  • Multi-component workflows
  • FeatureMatcher → Triangulator
  • Bundle adjustment refinement

UNIT TESTS (50%)
  • Individual components
  • AKAZE, essential matrix
  • Triangulation, BA...
2. Detailed Test Categories
2.1 Unit Tests (Level 1)
UT-1: Feature Extraction (AKAZE)
Purpose: Verify keypoint detection and descriptor computation
Test Data: Synthetic images with known features (checkerboard patterns)
Test Cases:
├─ UT-1.1: Basic feature detection
│ Input: 1024×768 synthetic image with checkerboard
│ Expected: ≥500 keypoints detected
│ Pass: count ≥ 500
│
├─ UT-1.2: Scale invariance
│ Input: Same scene at 2x scale
│ Expected: Keypoints at proportional positions
│ Pass: correlation of positions > 0.9
│
├─ UT-1.3: Rotation robustness
│ Input: Image rotated ±30°
│ Expected: Descriptors match original + rotated
│ Pass: match rate > 80%
│
├─ UT-1.4: Multi-scale handling
│ Input: Image with features at multiple scales
│ Expected: Features detected at all scales (pyramid)
│ Pass: ratio of scales [1:1.2:1.44:...] verified
│
└─ UT-1.5: Performance constraint
Input: FullHD image (1920×1080)
Expected: <500ms feature extraction
Pass: 95th percentile < 500ms
UT-2: Feature Matching
Purpose: Verify robust feature correspondence
Test Data: Pairs of synthetic/real images with known correspondence
Test Cases:
├─ UT-2.1: Basic matching
│ Input: Two images from synthetic scene (90% overlap)
│ Expected: ≥95% of ground-truth features matched
│ Pass: match_rate ≥ 0.95
│
├─ UT-2.2: Outlier rejection (Lowe's ratio test)
│ Input: Synthetic pair + 50% false features
│ Expected: False matches rejected
│ Pass: false_match_rate < 0.1
│
├─ UT-2.3: Low overlap scenario
│ Input: Two images with 20% overlap
│ Expected: Still matches ≥20 points
│ Pass: min_matches ≥ 20
│
└─ UT-2.4: Performance
Input: FullHD images, 1000 features each
Expected: <300ms matching time
Pass: 95th percentile < 300ms
UT-3: Essential Matrix Estimation
Purpose: Verify 5-point/8-point algorithms for camera geometry
Test Data: Synthetic correspondences with known relative pose
Test Cases:
├─ UT-3.1: 8-point algorithm
│ Input: 8+ point correspondences
│ Expected: Essential matrix E with rank 2
│ Pass: min_singular_value(E) < 1e-6
│
├─ UT-3.2: 5-point algorithm
│ Input: 5 point correspondences
│ Expected: Up to 4 solutions generated
│ Pass: num_solutions ∈ [1, 4]
│
├─ UT-3.3: RANSAC convergence
│ Input: 100 correspondences, 30% outliers
│ Expected: Essential matrix recovery despite outliers
│ Pass: inlier_ratio ≥ 0.6
│
└─ UT-3.4: Chirality constraint
Input: Multiple (R,t) solutions from decomposition
Expected: Only solution with points in front of cameras selected
Pass: selected_solution verified via triangulation
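The rank-2 pass condition in UT-3.1 can be sketched in pure NumPy; here E is constructed from an arbitrary known pose (E = [t]×R) rather than estimated from correspondences, so only the singular-value check itself is exercised:

```python
import numpy as np

def skew(t):
    """Cross-product (skew-symmetric) matrix of a 3-vector."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def essential_from_pose(R, t):
    """E = [t]x R for relative rotation R and translation t."""
    return skew(t) @ R

def check_rank2(E, tol=1e-6):
    """Verify the essential-matrix singular value pattern (s, s, 0)."""
    s = np.linalg.svd(E, compute_uv=False)
    return bool(s[2] < tol * s[0] and abs(s[0] - s[1]) < tol * s[0])

# Example pose: 10 degree rotation about z, unit-norm baseline.
theta = np.deg2rad(10)
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
t = np.array([1.0, 0.2, 0.1])
E = essential_from_pose(R, t / np.linalg.norm(t))
print(check_rank2(E))  # True for a valid essential matrix
```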
UT-4: Triangulation (DLT)
Purpose: Verify 3D point reconstruction from image correspondences
Test Data: Synthetic scenes with known 3D geometry
Test Cases:
├─ UT-4.1: Accuracy
│ Input: Noise-free point correspondences
│ Expected: Reconstructed X matches ground truth
│ Pass: RMSE < 0.1cm on 1m scene
│
├─ UT-4.2: Outlier handling
│ Input: 10 valid + 2 invalid correspondences
│ Expected: Invalid points detected (behind camera/far)
│ Pass: valid_mask accuracy > 95%
│
├─ UT-4.3: Altitude constraint
│ Input: Points with z < 50m (below aircraft)
│ Expected: Points rejected
│ Pass: altitude_filter works correctly
│
└─ UT-4.4: Batch performance
Input: 500 point triangulations
Expected: <100ms total
Pass: 95th percentile < 100ms
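UT-4.1's noise-free accuracy case can be sketched with a two-view linear (DLT) triangulation; camera matrices and the ground-truth point below are illustrative:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate one 3D point from two projections via the DLT system."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]              # null vector of A, homogeneous 3D point
    return X[:3] / X[3]

def project(P, X):
    """Pinhole projection of a 3D point to normalized image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two cameras: identity pose and a 1 m baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 5.0])

X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.allclose(X_est, X_true))  # noise-free input reconstructs exactly
```

The outlier and altitude cases (UT-4.2, UT-4.3) would then filter `X_est` by depth sign and z-range before accepting the point.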
UT-5: Bundle Adjustment
Purpose: Verify pose and 3D point optimization
Test Data: Synthetic multi-view scenes
Test Cases:
├─ UT-5.1: Convergence
│ Input: 5 frames with noisy initial poses
│ Expected: Residual decreases monotonically
│ Pass: final_residual < 0.001 * initial_residual
│
├─ UT-5.2: Covariance computation
│ Input: Optimized poses and points
│ Expected: Covariance matrix positive-definite
│ Pass: all_eigenvalues > 0
│
├─ UT-5.3: Window size effect
│ Input: Same problem with window sizes [3, 5, 10]
│ Expected: Larger windows → better residuals
│ Pass: residual_5 < residual_3, residual_10 < residual_5
│
└─ UT-5.4: Performance scaling
Input: Window size [5, 10, 15, 20]
Expected: Time ~= O(w^3)
Pass: cubic fit accurate (R² > 0.95)
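UT-5.2's pass condition can be sketched independently of the optimizer: a recovered covariance must be symmetric with strictly positive eigenvalues. The matrix below is a stand-in built from a random Jacobian, not bundle-adjustment output:

```python
import numpy as np

def is_positive_definite(C, tol=0.0):
    """Check symmetry and strictly positive eigenvalues of a covariance."""
    if not np.allclose(C, C.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(C) > tol))

# A valid covariance: inverse of a damped normal matrix J^T J.
rng = np.random.default_rng(42)
J = rng.standard_normal((20, 6))   # 20 residuals, 6 pose parameters
C = np.linalg.inv(J.T @ J + 1e-6 * np.eye(6))
print(is_positive_definite(C))  # True
```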
2.2 Integration Tests (Level 2)
IT-1: Sequential Pipeline
Purpose: Verify image-to-image processing chain
Test Data: Real aerial image sequences (5-20 images)
Test Cases:
├─ IT-1.1: Feature flow
│ Features extracted from img₁ → tracked to img₂ → matched
│ Expected: Consistent tracking across images
│ Pass: ≥70% features tracked end-to-end
│
├─ IT-1.2: Pose chain consistency
│ Poses P₁, P₂, P₃ computed sequentially
│  Expected: P₃ = ΔP₂→₃ ∘ P₂ (composition with the relative pose is consistent)
│ Pass: pose_error < 0.1° rotation, 5cm translation
│
├─ IT-1.3: Trajectory smoothness
│ Velocity computed between poses
│ Expected: Smooth velocity profile (no jumps)
│ Pass: velocity_std_dev < 20% mean_velocity
│
└─ IT-1.4: Memory usage
Process 100-image sequence
Expected: Constant memory (windowed processing)
Pass: peak_memory < 2GB
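A sketch of IT-1.2's consistency check, assuming poses are 4×4 homogeneous transforms composed left-to-right; the tolerances mirror the stated pass thresholds (0.1° rotation, 5 cm translation), and the example poses are arbitrary:

```python
import numpy as np

def rotation_error_deg(R1, R2):
    """Angle of the relative rotation between R1 and R2, in degrees."""
    cosang = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def check_chain(P2, delta_23, P3, rot_tol_deg=0.1, trans_tol_m=0.05):
    """Verify P3 against the composition of P2 with the relative pose 2->3."""
    P3_composed = delta_23 @ P2
    rot_err = rotation_error_deg(P3_composed[:3, :3], P3[:3, :3])
    trans_err = np.linalg.norm(P3_composed[:3, 3] - P3[:3, 3])
    return bool(rot_err < rot_tol_deg and trans_err < trans_tol_m)

def rot_z(deg, t):
    """4x4 transform: rotation about z plus translation t."""
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    T[:3, 3] = t
    return T

P2 = rot_z(3.0, [10.0, 0.0, 0.0])
delta_23 = rot_z(5.0, [2.0, 0.5, 0.0])
P3 = delta_23 @ P2                    # exact composition
print(check_chain(P2, delta_23, P3))  # True
```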
IT-2: Satellite Georeferencing
Purpose: Verify local-to-global coordinate transformation
Test Data: Synthetic/real images with known satellite reference
Test Cases:
├─ IT-2.1: Feature matching with satellite
│ Input: Aerial image + satellite reference
│ Expected: ≥10 matched features between viewpoints
│ Pass: match_count ≥ 10
│
├─ IT-2.2: Homography estimation
│ Matched features → homography matrix
│ Expected: Valid transformation (3×3 matrix)
│ Pass: det(H) ≠ 0, condition_number < 100
│
├─ IT-2.3: GPS transformation accuracy
│ Apply homography to image corners
│ Expected: Computed GPS ≈ known reference GPS
│ Pass: error < 100m (on test data)
│
└─ IT-2.4: Confidence scoring
Compute inlier_ratio and MI (mutual information)
Expected: score = inlier_ratio × MI ∈ [0, 1]
Pass: high_confidence for obvious matches
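The validity checks in IT-2.2 and the corner transform in IT-2.3 can be sketched as below. Note the condition-number threshold is assumed to apply to a homography in normalized image coordinates (a raw pixel-coordinate homography can have a large condition number from the translation terms alone); the H and corners used here are synthetic:

```python
import numpy as np

def homography_is_valid(H, max_condition=100.0):
    """Reject degenerate or ill-conditioned 3x3 homographies."""
    if abs(np.linalg.det(H)) < 1e-9:
        return False
    return bool(np.linalg.cond(H) < max_condition)

def warp_points(H, pts):
    """Apply a homography to Nx2 point coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = (H @ pts_h.T).T
    return out[:, :2] / out[:, 2:3]

# Synthetic similarity transform in normalized coordinates.
H = np.array([[1.1, 0.0,  0.3],
              [0.0, 1.1, -0.2],
              [0.0, 0.0,  1.0]])
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
print(homography_is_valid(H), warp_points(H, corners)[0])
```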
IT-3: Outlier Detection Chain
Purpose: Verify multi-stage outlier detection
Test Data: Synthetic trajectory with injected outliers
Test Cases:
├─ IT-3.1: Velocity anomaly detection
│ Inject 350m jump at frame N
│ Expected: Detected as outlier
│ Pass: outlier_flag = True
│
├─ IT-3.2: Recovery mechanism
│ After outlier detection
│ Expected: System attempts skip-frame matching (N→N+2)
│ Pass: recovery_successful = True
│
├─ IT-3.3: False positive rate
│ Normal sequence with small perturbations
│ Expected: <5% false outlier flagging
│ Pass: false_positive_rate < 0.05
│
└─ IT-3.4: Consistency across stages
Multiple detection stages should agree
Pass: agreement_score > 0.8
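IT-3.1's velocity anomaly detector can be sketched as a step-distance threshold against the median step; the 350 m jump, trajectory shape, and factor of 5 are all illustrative:

```python
import numpy as np

def flag_velocity_outliers(positions, factor=5.0):
    """Flag frames whose step distance greatly exceeds the median step."""
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    median_step = np.median(steps)
    flags = np.zeros(len(positions), dtype=bool)
    flags[1:] = steps > factor * median_step
    return flags

# Straight trajectory at 50 m/frame with a 350 m lateral jump at frame 10.
positions = np.column_stack([np.arange(20) * 50.0, np.zeros(20)])
positions[10:, 1] += 350.0
flags = flag_velocity_outliers(positions)
print(np.flatnonzero(flags))  # frame 10 flagged
```

The recovery mechanism in IT-3.2 would then retry matching across the flagged frame (N→N+2) rather than discard the trajectory.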
2.3 System Tests (Level 3)
ST-1: Accuracy Criteria
Purpose: Verify system meets ±50m and ±20m accuracy targets
Test Data: Real aerial image sequences with ground-truth GPS
Test Cases:
├─ ST-1.1: 50m accuracy target
│ Input: 500-image flight
│ Compute: % images within 50m of ground truth
│ Expected: ≥80%
│ Pass: accuracy_50m ≥ 0.80
│
├─ ST-1.2: 20m accuracy target
│ Same flight data
│ Expected: ≥60% within 20m
│ Pass: accuracy_20m ≥ 0.60
│
├─ ST-1.3: Mean absolute error
│ Compute: MAE over all images
│ Expected: <40m typical
│ Pass: MAE < 50m
│
└─ ST-1.4: Error distribution
Expected: Error approximately Gaussian
Pass: K-S test p-value > 0.05
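The ST-1 metrics reduce to simple statistics over per-image position errors; a sketch with synthetic errors (the half-normal distribution here is illustrative, not a claim about the real error model):

```python
import numpy as np

def accuracy_metrics(errors_m):
    """Summarize per-image geolocation errors (meters) against ground truth."""
    e = np.asarray(errors_m, dtype=float)
    return {
        "accuracy_50m": float(np.mean(e <= 50.0)),  # fraction within 50 m
        "accuracy_20m": float(np.mean(e <= 20.0)),  # fraction within 20 m
        "mae": float(e.mean()),                     # mean absolute error
    }

rng = np.random.default_rng(0)
errors = np.abs(rng.normal(0.0, 25.0, size=500))  # synthetic half-normal errors
m = accuracy_metrics(errors)
print(m["accuracy_50m"] >= m["accuracy_20m"])  # True by construction
```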
ST-2: Registration Rate
Purpose: Verify ≥95% of images successfully registered
Test Data: Real flights with various conditions
Test Cases:
├─ ST-2.1: Baseline registration
│ Good overlap, clear features
│ Expected: >98% registration rate
│ Pass: registration_rate ≥ 0.98
│
├─ ST-2.2: Challenging conditions
│ Low texture, variable lighting
│ Expected: ≥95% registration rate
│ Pass: registration_rate ≥ 0.95
│
├─ ST-2.3: Sharp turns scenario
│ Images with <10% overlap
│ Expected: Fallback mechanisms trigger, ≥90% success
│ Pass: fallback_success_rate ≥ 0.90
│
└─ ST-2.4: Consecutive failures
Track max consecutive unregistered images
Expected: <3 consecutive failures
Pass: max_consecutive_failures ≤ 3
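ST-2.4's metric is a longest-run computation over a per-frame registration mask; the mask below is illustrative:

```python
def max_consecutive_failures(registered):
    """Length of the longest run of False in a boolean success sequence."""
    longest = current = 0
    for ok in registered:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest

mask = [True] * 10 + [False, False] + [True] * 5 + [False] + [True] * 3
print(max_consecutive_failures(mask))  # 2
```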
ST-3: Reprojection Error
Purpose: Verify <1.0 pixel mean reprojection error
Test Data: Real flight data after bundle adjustment
Test Cases:
├─ ST-3.1: Mean reprojection error
│ After BA optimization
│ Expected: <1.0 pixel
│ Pass: mean_reproj_error < 1.0
│
├─ ST-3.2: Error distribution
│ Histogram of per-point errors
│ Expected: Tightly concentrated <2 pixels
│ Pass: 95th_percentile < 2.0 px
│
├─ ST-3.3: Per-frame consistency
│ Error should not vary dramatically
│ Expected: Consistent across frames
│ Pass: frame_error_std_dev < 0.3 px
│
└─ ST-3.4: Outlier points
Very large reprojection errors
Expected: <1% of points with error >3 px
Pass: outlier_rate < 0.01
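The ST-3 pass checks likewise reduce to statistics over per-point reprojection errors (pixels); the error sample below is synthetic:

```python
import numpy as np

def reprojection_stats(errors_px):
    """Summarize per-point reprojection errors in pixels."""
    e = np.asarray(errors_px, dtype=float)
    return {
        "mean": float(e.mean()),                   # ST-3.1: < 1.0 px
        "p95": float(np.percentile(e, 95)),        # ST-3.2: < 2.0 px
        "outlier_rate": float(np.mean(e > 3.0)),   # ST-3.4: < 1%
    }

rng = np.random.default_rng(7)
errors = np.abs(rng.normal(0.0, 0.5, size=2000))  # synthetic post-BA errors
s = reprojection_stats(errors)
print(s["mean"] < 1.0, s["p95"] < 2.0)  # True True for this sample
```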
ST-4: Processing Speed
Purpose: Verify <2 seconds per image
Test Data: Full flight sequences on target hardware
Test Cases:
├─ ST-4.1: Average latency
│ Mean processing time per image
│ Expected: <2 seconds
│ Pass: mean_latency < 2.0 sec
│
├─ ST-4.2: 95th percentile latency
│ Worst-case images (complex scenes)
│ Expected: <2.5 seconds
│ Pass: p95_latency < 2.5 sec
│
├─ ST-4.3: Component breakdown
│ Feature extraction: <0.5s
│ Matching: <0.3s
│ RANSAC: <0.2s
│ BA: <0.8s
│ Satellite: <0.3s
│ Pass: Each component within budget
│
└─ ST-4.4: Scaling with problem size
Memory usage, CPU usage vs. image resolution
Expected: Linear scaling
Pass: O(n) complexity verified
ST-5: Robustness - Outlier Handling
Purpose: Verify graceful handling of 350m outlier drifts
Test Data: Synthetic/real data with injected outliers
Test Cases:
├─ ST-5.1: Single 350m outlier
│ Inject outlier at frame N
│ Expected: Detected, trajectory continues
│ Pass: system_continues = True
│
├─ ST-5.2: Multiple outliers
│ 3-5 outliers scattered in sequence
│ Expected: All detected, recovery attempted
│ Pass: detection_rate ≥ 0.8
│
├─ ST-5.3: False positive rate
│ Normal trajectory, no outliers
│ Expected: <5% false flagging
│ Pass: false_positive_rate < 0.05
│
└─ ST-5.4: Recovery latency
Time to recover after outlier
Expected: ≤3 frames
Pass: recovery_latency ≤ 3 frames
ST-6: Robustness - Sharp Turns
Purpose: Verify handling of <5% image overlap scenarios
Test Data: Synthetic sequences with sharp angles
Test Cases:
├─ ST-6.1: 5% overlap matching
│ Two images with 5% overlap
│ Expected: Minimal matches or skip-frame
│ Pass: system_handles_gracefully = True
│
├─ ST-6.2: Skip-frame fallback
│ Direct N→N+1 fails, tries N→N+2
│ Expected: Succeeds with N→N+2
│ Pass: skip_frame_success_rate ≥ 0.8
│
├─ ST-6.3: 90° turn handling
│ Images at near-orthogonal angles
│ Expected: Degeneracy detected, logged
│ Pass: degeneracy_detection = True
│
└─ ST-6.4: Trajectory consistency
Consecutive turns: check velocity smoothness
Expected: No velocity jumps > 50%
Pass: velocity_consistency verified
2.4 Field Acceptance Tests (Level 4)
FAT-1: Real UAV Flight Trial #1 (Baseline)
Scenario: Nominal flight over agricultural field
┌────────────────────────────────────────┐
│ Conditions: │
│ • Clear weather, good sunlight │
│ • Flat terrain, sparse trees │
│ • 300m altitude, 50m/s speed │
│ • 800 images, ~15 min flight │
└────────────────────────────────────────┘
Pass Criteria:
✓ Accuracy: ≥80% within 50m
✓ Accuracy: ≥60% within 20m
✓ Registration rate: ≥95%
✓ Processing time: <2s/image
✓ Satellite validation: <10% outliers
✓ Reprojection error: <1.0px mean
Success Metrics:
• MAE (mean absolute error): <40m
• RMS error: <45m
• Max error: <200m
• Trajectory coherence: smooth (no jumps)
FAT-2: Real UAV Flight Trial #2 (Challenging)
Scenario: Flight with more complex terrain
┌────────────────────────────────────────┐
│ Conditions: │
│ • Mixed urban/agricultural │
│ • Buildings, vegetation, water bodies │
│ • Variable altitude (250-400m) │
│ • Includes 1-2 sharp turns │
│ • 1200 images, ~25 min flight │
└────────────────────────────────────────┘
Pass Criteria:
✓ Accuracy: ≥75% within 50m (relaxed from 80%)
✓ Accuracy: ≥50% within 20m (relaxed from 60%)
✓ Registration rate: ≥92% (relaxed from 95%)
✓ Processing time: <2.5s/image avg
✓ Outliers detected: <15% (relaxed from 10%)
Fallback Validation:
✓ User corrected <20% of uncertain images
✓ After correction, accuracy meets FAT-1 targets
FAT-3: Real UAV Flight Trial #3 (Edge Case)
Scenario: Low-texture flight (challenging for features)
┌────────────────────────────────────────┐
│ Conditions: │
│ • Sandy/desert terrain or water │
│ • Minimal features │
│ • Overcast/variable lighting │
│ • 500-600 images, ~12 min flight │
└────────────────────────────────────────┘
Pass Criteria:
✓ System continues (no crash): YES
✓ Graceful degradation: Flags uncertainty
✓ User can correct and improve: YES
✓ Satellite anchor helps recovery: YES
Success Metrics:
• >80% of images tagged "uncertain"
• After user correction: meets standard targets
• Demonstrates fallback mechanisms working
3. Test Environment Setup
Hardware Requirements
CPU: 16+ cores (Intel Xeon / AMD Ryzen)
RAM: 64GB minimum (32GB acceptable for <1500 images)
Storage: 1TB SSD (for raw images + processing)
GPU: Optional (CUDA 11.8+ for 5-10x acceleration)
Network: For satellite API queries (can be cached)
Software Requirements
OS: Ubuntu 20.04 LTS or macOS 12+
Build: CMake 3.20+, GCC 9+ or Clang 11+
Dependencies: OpenCV 4.8+, Eigen 3.4+, GDAL 3.0+
Testing: GoogleTest, Pytest
CI/CD: GitHub Actions or Jenkins
Test Data Management
Synthetic Data: Generated via Blender (checked into repo)
Real Data: External dataset storage (S3/local SSD)
Ground Truth: Maintained in CSV format with metadata
Versioning: Git-LFS for binary image data
4. Test Execution Plan
Phase 1: Unit Testing (Weeks 1-6)
Sprint 1-2: UT-1 (Feature detection) - 2 weeks
Sprint 3-4: UT-2 (Feature matching) - 2 weeks
Sprint 5-6: UT-3, UT-4, UT-5 (Geometry) - 2 weeks
Continuous: Run full unit test suite every commit
Coverage target: >90% code coverage
Phase 2: Integration Testing (Weeks 7-12)
Sprint 7-9: IT-1 (Sequential pipeline) - 3 weeks
Sprint 10-11: IT-2, IT-3 (Georef, Outliers) - 2 weeks
Sprint 12: System integration - 1 week
Continuous: Integration tests run nightly
Phase 3: System Testing (Weeks 13-18)
Sprint 13-14: ST-1, ST-2 (Accuracy, Registration) - 2 weeks
Sprint 15-16: ST-3, ST-4 (Error, Speed) - 2 weeks
Sprint 17-18: ST-5, ST-6 (Robustness) - 2 weeks
Load testing: 1000-3000 image sequences
Stress testing: Edge cases, memory limits
Phase 4: Field Acceptance (Weeks 19-30)
Week 19-22: FAT-1 (Baseline trial)
• Coordinate 1-2 baseline flights
• Validate system on real data
• Adjust parameters as needed
Week 23-26: FAT-2 (Challenging trial)
• More complex scenarios
• Test fallback mechanisms
• Refine user interface
Week 27-30: FAT-3 (Edge case trial)
• Low-texture scenarios
• Validate robustness
• Final adjustments
Post-trial: Generate comprehensive report
5. Acceptance Criteria Summary
| Criterion | Target | Test | Pass/Fail |
|---|---|---|---|
| Accuracy@50m | ≥80% | FAT-1 | ≥80% pass |
| Accuracy@20m | ≥60% | FAT-1 | ≥60% pass |
| Registration Rate | ≥95% | ST-2 | ≥95% pass |
| Reprojection Error | <1.0px mean | ST-3 | <1.0px pass |
| Processing Speed | <2.0s/image | ST-4 | p95<2.5s pass |
| Robustness (350m outlier) | Handled | ST-5 | Continue pass |
| Sharp turns (<5% overlap) | Handled | ST-6 | Skip-frame pass |
| Satellite validation | <10% outliers | FAT-1-3 | <10% pass |
6. Success Metrics
Green Light Criteria (Ready for production):
- ✅ All unit tests pass (100%)
- ✅ All integration tests pass (100%)
- ✅ All system tests pass (100%)
- ✅ FAT-1 and FAT-2 pass acceptance criteria
- ✅ FAT-3 shows graceful degradation
- ✅ <10% code defects discovered in field trials
- ✅ Performance meets SLA consistently
Yellow Light Criteria (Conditional deployment):
- ⚠ 85-89% of acceptance criteria met
- ⚠ Minor issues in edge cases
- ⚠ Requires workaround documentation
- ⚠ Re-test after fixes
Red Light Criteria (Do not deploy):
- ❌ <85% of acceptance criteria met
- ❌ Critical failures in core functionality
- ❌ Safety/security concerns
- ❌ Cannot meet latency or accuracy targets
Conclusion
This comprehensive testing strategy ensures the UAV Image Geolocalization System is robust, accurate, and reliable before production deployment. The multi-level approach (unit → integration → system → field) progressively validates system behavior from components to end-to-end scenarios, with field trials providing real-world validation of the acceptance criteria.