Added Perplexity 01_solution_draft
Denys Zaitsev, 2025-11-03

# Testing Strategy & Acceptance Test Plan (ATP)
## Overview
This document details the comprehensive testing strategy for the UAV Aerial Image Geolocalization System, covering unit tests, integration tests, system tests, field validation, and acceptance criteria.
---
## 1. Test Pyramid Architecture
```
            ┌────────────────────────────────┐
            │      ACCEPTANCE TESTS (5%)     │
            │ Field trials, real UAV flights │
            └────────────────────────────────┘
        ┌────────────────────────────────────────┐
        │           SYSTEM TESTS (15%)           │
        │ End-to-end accuracy, speed, robustness │
        └────────────────────────────────────────┘
    ┌────────────────────────────────────────────────┐
    │            INTEGRATION TESTS (30%)             │
    │ Multi-component workflows:                     │
    │ FeatureMatcher → Triangulator,                 │
    │ bundle adjustment refinement                   │
    └────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────┐
│                    UNIT TESTS (50%)                    │
│ Individual components: AKAZE, essential matrix,        │
│ triangulation, bundle adjustment, ...                  │
└────────────────────────────────────────────────────────┘
```
---
## 2. Detailed Test Categories
### 2.1 Unit Tests (Level 1)
#### UT-1: Feature Extraction (AKAZE)
```
Purpose: Verify keypoint detection and descriptor computation
Test Data: Synthetic images with known features (checkerboard patterns)
Test Cases:
├─ UT-1.1: Basic feature detection
│ Input: 1024×768 synthetic image with checkerboard
│ Expected: ≥500 keypoints detected
│ Pass: count ≥ 500
├─ UT-1.2: Scale invariance
│ Input: Same scene at 2x scale
│ Expected: Keypoints at proportional positions
│ Pass: correlation of positions > 0.9
├─ UT-1.3: Rotation robustness
│ Input: Image rotated ±30°
│ Expected: Descriptors match original + rotated
│ Pass: match rate > 80%
├─ UT-1.4: Multi-scale handling
│ Input: Image with features at multiple scales
│ Expected: Features detected at all scales (pyramid)
│ Pass: ratio of scales [1:1.2:1.44:...] verified
└─ UT-1.5: Performance constraint
Input: FullHD image (1920×1080)
Expected: <500ms feature extraction
Pass: 95th percentile < 500ms
```
#### UT-2: Feature Matching
```
Purpose: Verify robust feature correspondence
Test Data: Pairs of synthetic/real images with known correspondence
Test Cases:
├─ UT-2.1: Basic matching
│ Input: Two images from synthetic scene (90% overlap)
│ Expected: ≥95% of ground-truth features matched
│ Pass: match_rate ≥ 0.95
├─ UT-2.2: Outlier rejection (Lowe's ratio test)
│ Input: Synthetic pair + 50% false features
│ Expected: False matches rejected
│ Pass: false_match_rate < 0.1
├─ UT-2.3: Low overlap scenario
│ Input: Two images with 20% overlap
│ Expected: Still matches ≥20 points
│ Pass: min_matches ≥ 20
└─ UT-2.4: Performance
Input: FullHD images, 1000 features each
Expected: <300ms matching time
Pass: 95th percentile < 300ms
```
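The ratio-test gating in UT-2.2 can be sketched in plain NumPy so the rejection logic is explicit; a production test would use `cv2.BFMatcher.knnMatch` instead, and the descriptor size and 0.75 ratio below are illustrative:

```python
# NumPy-only sketch of Lowe's ratio test (UT-2.2): keep a match only if the
# best neighbour is clearly better than the runner-up.
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.75):
    """Return (i, j) index pairs surviving the ratio test."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches

rng = np.random.default_rng(0)
desc1 = rng.normal(size=(20, 61))
# True counterparts (slightly perturbed) plus 50% random distractors
desc2 = np.vstack([desc1 + 0.01 * rng.normal(size=desc1.shape),
                   rng.normal(size=(20, 61))])
matches = ratio_test_match(desc1, desc2)
```

With this fixture every true pair should survive and every distractor should be rejected, mirroring the `false_match_rate < 0.1` gate.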
#### UT-3: Essential Matrix Estimation
```
Purpose: Verify 5-point/8-point algorithms for camera geometry
Test Data: Synthetic correspondences with known relative pose
Test Cases:
├─ UT-3.1: 8-point algorithm
│ Input: 8+ point correspondences
│ Expected: Essential matrix E with rank 2
│ Pass: min_singular_value(E) < 1e-6
├─ UT-3.2: 5-point algorithm
│ Input: 5 point correspondences
│ Expected: Up to 4 solutions generated
│ Pass: num_solutions ∈ [1, 4]
├─ UT-3.3: RANSAC convergence
│ Input: 100 correspondences, 30% outliers
│ Expected: Essential matrix recovery despite outliers
│ Pass: inlier_ratio ≥ 0.6
└─ UT-3.4: Chirality constraint
Input: Multiple (R,t) solutions from decomposition
Expected: Only solution with points in front of cameras selected
Pass: selected_solution verified via triangulation
```
#### UT-4: Triangulation (DLT)
```
Purpose: Verify 3D point reconstruction from image correspondences
Test Data: Synthetic scenes with known 3D geometry
Test Cases:
├─ UT-4.1: Accuracy
│ Input: Noise-free point correspondences
│ Expected: Reconstructed X matches ground truth
│ Pass: RMSE < 0.1cm on 1m scene
├─ UT-4.2: Outlier handling
│ Input: 10 valid + 2 invalid correspondences
│ Expected: Invalid points detected (behind camera/far)
│ Pass: valid_mask accuracy > 95%
├─ UT-4.3: Altitude constraint
│ Input: Points with z < 50m (below aircraft)
│ Expected: Points rejected
│ Pass: altitude_filter works correctly
└─ UT-4.4: Batch performance
Input: 500 point triangulations
Expected: <100ms total
Pass: 95th percentile < 100ms
```
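A minimal DLT triangulator for UT-4.1 can be written directly from the linear system; the projection matrices and test point below are illustrative fixtures, not project data:

```python
# Direct DLT triangulation for UT-4.1, solving the 4x4 homogeneous system
# via SVD; noise-free input should reconstruct the point exactly.
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear two-view triangulation of a single point (pixel coords)."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = np.deg2rad(3.0)
R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
t = np.array([[1.0], [0.0], [0.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])

X_gt = np.array([0.2, -0.1, 5.0])
def pix(P, X):
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]

X_rec = triangulate_dlt(P1, P2, pix(P1, X_gt), pix(P2, X_gt))
```

UT-4.2/UT-4.3 then reduce to filtering the returned points on depth sign and the altitude bound.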
#### UT-5: Bundle Adjustment
```
Purpose: Verify pose and 3D point optimization
Test Data: Synthetic multi-view scenes
Test Cases:
├─ UT-5.1: Convergence
│ Input: 5 frames with noisy initial poses
│ Expected: Residual decreases monotonically
│ Pass: final_residual < 0.001 * initial_residual
├─ UT-5.2: Covariance computation
│ Input: Optimized poses and points
│ Expected: Covariance matrix positive-definite
│ Pass: all_eigenvalues > 0
├─ UT-5.3: Window size effect
│ Input: Same problem with window sizes [3, 5, 10]
│ Expected: Larger windows → better residuals
│ Pass: residual_5 < residual_3, residual_10 < residual_5
└─ UT-5.4: Performance scaling
Input: Window size [5, 10, 15, 20]
Expected: Time ~= O(w^3)
Pass: cubic fit accurate (R² > 0.95)
```
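UT-5.1 and UT-5.2 can be prototyped on a deliberately tiny problem, one camera translation against fixed 3D points, using `scipy.optimize.least_squares`. This is a stand-in sketch, not the project's BA implementation; all fixture values are made up:

```python
# Tiny BA stand-in: optimise a single camera translation, then check
# residual reduction (UT-5.1) and positive-definiteness of the
# Gauss-Newton normal matrix used for covariance (UT-5.2).
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
X = rng.uniform([-1, -1, 4], [1, 1, 6], size=(50, 3))
t_true = np.array([0.3, -0.2, 0.1])

def project(t):
    cam = X + t
    uv = cam @ K.T
    return (uv[:, :2] / uv[:, 2:]).ravel()

obs = project(t_true)                 # noise-free observations

def residuals(t):
    return project(t) - obs

t0 = t_true + 0.5                     # "noisy initial pose"
initial = np.linalg.norm(residuals(t0))
res = least_squares(residuals, t0)
final = np.linalg.norm(res.fun)

# Covariance ∝ (JᵀJ)⁻¹ exists iff JᵀJ is positive-definite
JTJ = res.jac.T @ res.jac
eigvals = np.linalg.eigvalsh(JTJ)
```

The real UT-5 suite would extend this to full 6-DoF poses plus 3D points over the sliding window.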
---
### 2.2 Integration Tests (Level 2)
#### IT-1: Sequential Pipeline
```
Purpose: Verify image-to-image processing chain
Test Data: Real aerial image sequences (5-20 images)
Test Cases:
├─ IT-1.1: Feature flow
│ Features extracted from img₁ → tracked to img₂ → matched
│ Expected: Consistent tracking across images
│ Pass: ≥70% features tracked end-to-end
├─ IT-1.2: Pose chain consistency
│ Poses P₁, P₂, P₃ computed sequentially
│ Expected: P₃ = P₂→₃ ∘ P₂ (composition consistency)
│ Pass: pose_error < 0.1° rotation, 5cm translation
├─ IT-1.3: Trajectory smoothness
│ Velocity computed between poses
│ Expected: Smooth velocity profile (no jumps)
│ Pass: velocity_std_dev < 20% mean_velocity
└─ IT-1.4: Memory usage
Process 100-image sequence
Expected: Constant memory (windowed processing)
Pass: peak_memory < 2GB
```
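The pose-chain check in IT-1.2 reduces to composing relative (R, t) pairs and measuring the rotation residual against the directly computed pose. A sketch with hypothetical yaw-only poses (the angles and translations are made-up values):

```python
# IT-1.2 sketch: chain relative poses and measure rotation error.
import numpy as np

def rot_z(deg):
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

def compose(pose_ab, pose_bc):
    """Chain relative poses (R, t): frame a→b then b→c gives a→c."""
    R_ab, t_ab = pose_ab
    R_bc, t_bc = pose_bc
    return R_bc @ R_ab, R_bc @ t_ab + t_bc

def rotation_error_deg(R1, R2):
    c = (np.trace(R1 @ R2.T) - 1.0) / 2.0
    return np.rad2deg(np.arccos(np.clip(c, -1.0, 1.0)))

pose_12 = (rot_z(2.0), np.array([1.0, 0.0, 0.0]))
pose_23 = (rot_z(3.0), np.array([0.9, 0.1, 0.0]))
R_13, t_13 = compose(pose_12, pose_23)
err = rotation_error_deg(R_13, rot_z(5.0))   # should be ~0 (pass: < 0.1°)
```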
#### IT-2: Satellite Georeferencing
```
Purpose: Verify local-to-global coordinate transformation
Test Data: Synthetic/real images with known satellite reference
Test Cases:
├─ IT-2.1: Feature matching with satellite
│ Input: Aerial image + satellite reference
│ Expected: ≥10 matched features between viewpoints
│ Pass: match_count ≥ 10
├─ IT-2.2: Homography estimation
│ Matched features → homography matrix
│ Expected: Valid transformation (3×3 matrix)
│ Pass: det(H) ≠ 0, condition_number < 100
├─ IT-2.3: GPS transformation accuracy
│ Apply homography to image corners
│ Expected: Computed GPS ≈ known reference GPS
│ Pass: error < 100m (on test data)
└─ IT-2.4: Confidence scoring
Compute inlier_ratio and MI (mutual information)
Expected: score = inlier_ratio × MI ∈ [0, 1]
Pass: high_confidence for obvious matches
```
#### IT-3: Outlier Detection Chain
```
Purpose: Verify multi-stage outlier detection
Test Data: Synthetic trajectory with injected outliers
Test Cases:
├─ IT-3.1: Velocity anomaly detection
│ Inject 350m jump at frame N
│ Expected: Detected as outlier
│ Pass: outlier_flag = True
├─ IT-3.2: Recovery mechanism
│ After outlier detection
│ Expected: System attempts skip-frame matching (N→N+2)
│ Pass: recovery_successful = True
├─ IT-3.3: False positive rate
│ Normal sequence with small perturbations
│ Expected: <5% false outlier flagging
│ Pass: false_positive_rate < 0.05
└─ IT-3.4: Consistency across stages
Multiple detection stages should agree
Pass: agreement_score > 0.8
```
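The velocity-anomaly gate in IT-3.1 might be as simple as thresholding frame-to-frame displacement. The 100 m threshold and the 50 m/frame straight track below are illustrative values, not tuned parameters:

```python
# IT-3.1 sketch: flag frames whose implied step exceeds a jump threshold.
import numpy as np

def flag_velocity_outliers(positions, max_jump=100.0):
    """Boolean flags per frame; frame k is flagged if the step from
    frame k-1 exceeds max_jump (metres). Frame 0 is never flagged."""
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    return np.concatenate([[False], steps > max_jump])

# Straight track at 50 m/frame with a 350 m drift injected at frame 10
track = np.cumsum(np.tile([50.0, 0.0], (30, 1)), axis=0)
track[10:] += [350.0, 0.0]
flags = flag_velocity_outliers(track)   # only frame 10 should be flagged
```

The IT-3.3 false-positive check is the same function run on an unperturbed track, asserting that nothing is flagged.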
---
### 2.3 System Tests (Level 3)
#### ST-1: Accuracy Criteria
```
Purpose: Verify system meets ±50m and ±20m accuracy targets
Test Data: Real aerial image sequences with ground-truth GPS
Test Cases:
├─ ST-1.1: 50m accuracy target
│ Input: 500-image flight
│ Compute: % images within 50m of ground truth
│ Expected: ≥80%
│ Pass: accuracy_50m ≥ 0.80
├─ ST-1.2: 20m accuracy target
│ Same flight data
│ Expected: ≥60% within 20m
│ Pass: accuracy_20m ≥ 0.60
├─ ST-1.3: Mean absolute error
│ Compute: MAE over all images
│ Expected: <40m typical
│ Pass: MAE < 50m
└─ ST-1.4: Error distribution
Expected: Error approximately Gaussian
Pass: K-S test p-value > 0.05
```
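The ST-1 metrics reduce to simple aggregates over per-image error, plus scipy's K-S test for ST-1.4. The 20 m error spread below is a synthetic assumption purely to exercise the code, not a measured result:

```python
# ST-1 metrics as aggregates over per-image geolocation error.
import numpy as np
from scipy import stats

def accuracy_report(errors_m):
    """Fraction of images within the 50 m / 20 m targets, plus MAE."""
    e = np.abs(np.asarray(errors_m, dtype=float))
    return {
        "acc_50m": float(np.mean(e <= 50.0)),   # ST-1.1 target: ≥ 0.80
        "acc_20m": float(np.mean(e <= 20.0)),   # ST-1.2 target: ≥ 0.60
        "mae": float(e.mean()),                 # ST-1.3 target: < 50 m
    }

rng = np.random.default_rng(4)
signed_err = rng.normal(0.0, 20.0, size=500)    # synthetic per-image error
report = accuracy_report(signed_err)

# ST-1.4: K-S test of the signed error against a fitted normal
ks = stats.kstest(signed_err, "norm",
                  args=(signed_err.mean(), signed_err.std()))
```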
#### ST-2: Registration Rate
```
Purpose: Verify ≥95% of images successfully registered
Test Data: Real flights with various conditions
Test Cases:
├─ ST-2.1: Baseline registration
│ Good overlap, clear features
│ Expected: >98% registration rate
│ Pass: registration_rate ≥ 0.98
├─ ST-2.2: Challenging conditions
│ Low texture, variable lighting
│ Expected: ≥95% registration rate
│ Pass: registration_rate ≥ 0.95
├─ ST-2.3: Sharp turns scenario
│ Images with <10% overlap
│ Expected: Fallback mechanisms trigger, ≥90% success
│ Pass: fallback_success_rate ≥ 0.90
└─ ST-2.4: Consecutive failures
Track max consecutive unregistered images
Expected: <3 consecutive failures
Pass: max_consecutive_failures ≤ 3
```
#### ST-3: Reprojection Error
```
Purpose: Verify <1.0 pixel mean reprojection error
Test Data: Real flight data after bundle adjustment
Test Cases:
├─ ST-3.1: Mean reprojection error
│ After BA optimization
│ Expected: <1.0 pixel
│ Pass: mean_reproj_error < 1.0
├─ ST-3.2: Error distribution
│ Histogram of per-point errors
│ Expected: Tightly concentrated <2 pixels
│ Pass: 95th_percentile < 2.0 px
├─ ST-3.3: Per-frame consistency
│ Error should not vary dramatically
│ Expected: Consistent across frames
│ Pass: frame_error_std_dev < 0.3 px
└─ ST-3.4: Outlier points
Very large reprojection errors
Expected: <1% of points with error >3 px
Pass: outlier_rate < 0.01
```
#### ST-4: Processing Speed
```
Purpose: Verify <2 seconds per image
Test Data: Full flight sequences on target hardware
Test Cases:
├─ ST-4.1: Average latency
│ Mean processing time per image
│ Expected: <2 seconds
│ Pass: mean_latency < 2.0 sec
├─ ST-4.2: 95th percentile latency
│ Worst-case images (complex scenes)
│ Expected: <2.5 seconds
│ Pass: p95_latency < 2.5 sec
├─ ST-4.3: Component breakdown
│ Feature extraction: <0.5s
│ Matching: <0.3s
│ RANSAC: <0.2s
│ BA: <0.8s
│ Satellite: <0.3s
│ Pass: Each component within budget
└─ ST-4.4: Scaling with problem size
Memory usage, CPU usage vs. image resolution
Expected: Linear scaling
Pass: O(n) complexity verified
```
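The ST-4 gates can be scripted directly from the stated budgets. The per-image latency sample below is synthetic (gamma-distributed around ~1.6 s) purely to exercise the checks; real numbers would come from profiling on target hardware:

```python
# ST-4.1/ST-4.2 gates plus the ST-4.3 per-component budget table.
import numpy as np

COMPONENT_BUDGET_S = {       # ST-4.3 per-component budgets (seconds)
    "feature_extraction": 0.5,
    "matching": 0.3,
    "ransac": 0.2,
    "bundle_adjustment": 0.8,
    "satellite": 0.3,
}

def check_latency(samples_s, mean_limit=2.0, p95_limit=2.5):
    s = np.asarray(samples_s, dtype=float)
    return {
        "mean_ok": bool(s.mean() < mean_limit),            # ST-4.1
        "p95_ok": bool(np.percentile(s, 95) < p95_limit),  # ST-4.2
    }

rng = np.random.default_rng(5)
per_image_s = rng.gamma(shape=20.0, scale=0.08, size=1000)
result = check_latency(per_image_s)
```

The same harness extends to ST-4.3 by timing each component separately and asserting against `COMPONENT_BUDGET_S`.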
#### ST-5: Robustness - Outlier Handling
```
Purpose: Verify graceful handling of 350m outlier drifts
Test Data: Synthetic/real data with injected outliers
Test Cases:
├─ ST-5.1: Single 350m outlier
│ Inject outlier at frame N
│ Expected: Detected, trajectory continues
│ Pass: system_continues = True
├─ ST-5.2: Multiple outliers
│ 3-5 outliers scattered in sequence
│ Expected: All detected, recovery attempted
│ Pass: detection_rate ≥ 0.8
├─ ST-5.3: False positive rate
│ Normal trajectory, no outliers
│ Expected: <5% false flagging
│ Pass: false_positive_rate < 0.05
└─ ST-5.4: Recovery latency
Time to recover after outlier
Expected: ≤3 frames
Pass: recovery_latency ≤ 3 frames
```
#### ST-6: Robustness - Sharp Turns
```
Purpose: Verify handling of <5% image overlap scenarios
Test Data: Synthetic sequences with sharp angles
Test Cases:
├─ ST-6.1: 5% overlap matching
│ Two images with 5% overlap
│ Expected: Minimal matches or skip-frame
│ Pass: system_handles_gracefully = True
├─ ST-6.2: Skip-frame fallback
│ Direct N→N+1 fails, tries N→N+2
│ Expected: Succeeds with N→N+2
│ Pass: skip_frame_success_rate ≥ 0.8
├─ ST-6.3: 90° turn handling
│ Images at near-orthogonal angles
│ Expected: Degeneracy detected, logged
│ Pass: degeneracy_detection = True
└─ ST-6.4: Trajectory consistency
Consecutive turns: check velocity smoothness
Expected: No velocity jumps > 50%
Pass: velocity_consistency verified
```
---
### 2.4 Field Acceptance Tests (Level 4)
#### FAT-1: Real UAV Flight Trial #1 (Baseline)
```
Scenario: Nominal flight over agricultural field
┌────────────────────────────────────────┐
│ Conditions: │
│ • Clear weather, good sunlight │
│ • Flat terrain, sparse trees │
│ • 300m altitude, 50m/s speed │
│ • 800 images, ~15 min flight │
└────────────────────────────────────────┘
Pass Criteria:
✓ Accuracy: ≥80% within 50m
✓ Accuracy: ≥60% within 20m
✓ Registration rate: ≥95%
✓ Processing time: <2s/image
✓ Satellite validation: <10% outliers
✓ Reprojection error: <1.0px mean
Success Metrics:
• MAE (mean absolute error): <40m
• RMS error: <45m
• Max error: <200m
• Trajectory coherence: smooth (no jumps)
```
#### FAT-2: Real UAV Flight Trial #2 (Challenging)
```
Scenario: Flight with more complex terrain
┌────────────────────────────────────────┐
│ Conditions: │
│ • Mixed urban/agricultural │
│ • Buildings, vegetation, water bodies │
│ • Variable altitude (250-400m) │
│ • Includes 1-2 sharp turns │
│ • 1200 images, ~25 min flight │
└────────────────────────────────────────┘
Pass Criteria:
✓ Accuracy: ≥75% within 50m (relaxed from 80%)
✓ Accuracy: ≥50% within 20m (relaxed from 60%)
✓ Registration rate: ≥92% (relaxed from 95%)
✓ Processing time: <2.5s/image avg
✓ Outliers detected: <15% (relaxed from 10%)
Fallback Validation:
✓ User corrected <20% of uncertain images
✓ After correction, accuracy meets FAT-1 targets
```
#### FAT-3: Real UAV Flight Trial #3 (Edge Case)
```
Scenario: Low-texture flight (challenging for features)
┌────────────────────────────────────────┐
│ Conditions: │
│ • Sandy/desert terrain or water │
│ • Minimal features │
│ • Overcast/variable lighting │
│ • 500-600 images, ~12 min flight │
└────────────────────────────────────────┘
Pass Criteria:
✓ System continues (no crash): YES
✓ Graceful degradation: Flags uncertainty
✓ User can correct and improve: YES
✓ Satellite anchor helps recovery: YES
Success Metrics:
• >80% of images tagged "uncertain"
• After user correction: meets standard targets
• Demonstrates fallback mechanisms working
```
---
## 3. Test Environment Setup
### Hardware Requirements
```
CPU: 16+ cores (Intel Xeon / AMD Ryzen)
RAM: 64GB minimum (32GB acceptable for <1500 images)
Storage: 1TB SSD (for raw images + processing)
GPU: Optional (CUDA 11.8+ for 5-10x acceleration)
Network: For satellite API queries (can be cached)
```
### Software Requirements
```
OS: Ubuntu 20.04 LTS or macOS 12+
Build: CMake 3.20+, GCC 9+ or Clang 11+
Dependencies: OpenCV 4.8+, Eigen 3.4+, GDAL 3.0+
Testing: GoogleTest, Pytest
CI/CD: GitHub Actions or Jenkins
```
### Test Data Management
```
Synthetic Data: Generated via Blender (checked into repo)
Real Data: External dataset storage (S3/local SSD)
Ground Truth: Maintained in CSV format with metadata
Versioning: Git-LFS for binary image data
```
---
## 4. Test Execution Plan
### Phase 1: Unit Testing (Weeks 1-6)
```
Sprint 1-2: UT-1 (Feature detection) - 2 weeks
Sprint 3-4: UT-2 (Feature matching) - 2 weeks
Sprint 5-6: UT-3, UT-4, UT-5 (Geometry) - 2 weeks
Continuous: Run full unit test suite every commit
Coverage target: >90% code coverage
```
### Phase 2: Integration Testing (Weeks 7-12)
```
Sprint 7-9: IT-1 (Sequential pipeline) - 3 weeks
Sprint 10-11: IT-2, IT-3 (Georef, Outliers) - 2 weeks
Sprint 12: System integration - 1 week
Continuous: Integration tests run nightly
```
### Phase 3: System Testing (Weeks 13-18)
```
Sprint 13-14: ST-1, ST-2 (Accuracy, Registration) - 2 weeks
Sprint 15-16: ST-3, ST-4 (Error, Speed) - 2 weeks
Sprint 17-18: ST-5, ST-6 (Robustness) - 2 weeks
Load testing: 1000-3000 image sequences
Stress testing: Edge cases, memory limits
```
### Phase 4: Field Acceptance (Weeks 19-30)
```
Week 19-22: FAT-1 (Baseline trial)
• Coordinate 1-2 baseline flights
• Validate system on real data
• Adjust parameters as needed
Week 23-26: FAT-2 (Challenging trial)
• More complex scenarios
• Test fallback mechanisms
• Refine user interface
Week 27-30: FAT-3 (Edge case trial)
• Low-texture scenarios
• Validate robustness
• Final adjustments
Post-trial: Generate comprehensive report
```
---
## 5. Acceptance Criteria Summary
| Criterion | Target | Test | Pass/Fail |
|-----------|--------|------|-----------|
| **Accuracy@50m** | ≥80% | FAT-1 | ≥80% pass |
| **Accuracy@20m** | ≥60% | FAT-1 | ≥60% pass |
| **Registration Rate** | ≥95% | ST-2 | ≥95% pass |
| **Reprojection Error** | <1.0px mean | ST-3 | <1.0px pass |
| **Processing Speed** | <2.0s/image | ST-4 | p95<2.5s pass |
| **Robustness (350m outlier)** | Handled | ST-5 | Continue pass |
| **Sharp turns (<5% overlap)** | Handled | ST-6 | Skip-frame pass |
| **Satellite validation** | <10% outliers | FAT-1-3 | <10% pass |
---
## 6. Success Metrics
**Green Light Criteria** (Ready for production):
- ✅ All unit tests pass (100%)
- ✅ All integration tests pass (100%)
- ✅ All system tests pass (100%)
- ✅ FAT-1 and FAT-2 pass acceptance criteria
- ✅ FAT-3 shows graceful degradation
- ✅ <10% code defects discovered in field trials
- ✅ Performance meets SLA consistently
**Yellow Light Criteria** (Conditional deployment):
- ⚠ 85-89% of acceptance criteria met
- ⚠ Minor issues in edge cases
- ⚠ Requires workaround documentation
- ⚠ Re-test after fixes
**Red Light Criteria** (Do not deploy):
- ❌ <85% of acceptance criteria met
- ❌ Critical failures in core functionality
- ❌ Safety/security concerns
- ❌ Cannot meet latency or accuracy targets
---
## Conclusion
This comprehensive testing strategy ensures the UAV Image Geolocalization System is robust, accurate, and reliable before production deployment. The multi-level approach (unit → integration → system → field) progressively validates system behavior from components to end-to-end scenarios, with field trials providing real-world validation of the acceptance criteria.