From 9afa1baed336b15726554c366cb1ae1567d45f9f Mon Sep 17 00:00:00 2001 From: Oleksandr Bezdieniezhnykh Date: Mon, 3 Nov 2025 21:47:21 +0200 Subject: [PATCH] add solution drafts - gemini and perplexity --- docs/01_solution/01_solution_draft/.DS_Store | Bin 8196 -> 0 bytes .../Solution_Executive_Summary.md | 299 ------ .../01_solution_draft/Testing_Strategy_ATP.md | 631 ------------- docs/01_solution/01_solution_draft_google.md | 271 ++++++ ...ion.md => 01_solution_draft_perplexity.md} | 878 +++++++++++------- .../02_solution_draft/productDescription.md | 90 -- 6 files changed, 808 insertions(+), 1361 deletions(-) delete mode 100644 docs/01_solution/01_solution_draft/.DS_Store delete mode 100644 docs/01_solution/01_solution_draft/Solution_Executive_Summary.md delete mode 100644 docs/01_solution/01_solution_draft/Testing_Strategy_ATP.md create mode 100644 docs/01_solution/01_solution_draft_google.md rename docs/01_solution/{01_solution_draft/UAV_Geolocalization_Solution.md => 01_solution_draft_perplexity.md} (56%) delete mode 100644 docs/01_solution/02_solution_draft/productDescription.md diff --git a/docs/01_solution/01_solution_draft/.DS_Store b/docs/01_solution/01_solution_draft/.DS_Store deleted file mode 100644 index dd91b4fc328e59bd08ca80bf39ae32e3f9580838..0000000000000000000000000000000000000000 GIT binary patch
zXTTX)U?6M{kB3JHb8Wq#PE~$^^h?&5T~7z23qvf&!B~!izy2`fxWLp5%Zz30% outliers | -| **3D Reconstruction** | Linear triangulation (DLT) | Fast, numerically stable | -| **Pose Refinement** | Windowed Bundle Adjustment (Levenberg-Marquardt) | Non-linear optimization, sparse structure exploitation | -| **Georeferencing** | Satellite image matching (ORB features) | Leverages free, readily available data | -| **Outlier Detection** | Multi-stage (velocity + satellite + loop closure) | Catches different failure modes | - -### Processing Pipeline - -**Phase 1: Offline Initialization** (~1-3 min) -- Load all images -- Extract features in parallel -- Estimate camera calibration - -**Phase 2: Sequential Processing** (~2 sec/image) -- For each image pair: - - Match features (RANSAC) - - Recover camera pose - - Triangulate 3D points - - Local bundle adjustment - - Satellite georeferencing - - Store GPS coordinate + confidence - -**Phase 3: Post-Processing** (~5-20 min) -- Outlier detection -- Satellite validation -- Optional loop closure optimization -- Generate report - -**Phase 4: Manual Review** (~10-60 min, optional) -- User corrects flagged uncertain regions -- Re-optimize with corrected anchors - ---- - -## Testing Strategy - -### Test Levels - -**Level 1: Unit Tests** (Feature-level validation) -- ✅ Feature extraction: >95% on synthetic images -- ✅ Feature matching: inlier ratio >0.4 at 50% overlap -- ✅ Essential matrix: rank-2 constraint within 1e-6 -- ✅ Triangulation: RMSE <5cm on synthetic scenes -- ✅ Bundle adjustment: convergence in <10 iterations - -**Level 2: Integration Tests** (Component-level) -- ✅ Sequential pipeline: correct pose chain for N images -- ✅ 5-frame window BA: reprojection <1.5px -- ✅ Satellite matching: GPS shift <30m when satellite available -- ✅ Fallback mechanisms: graceful degradation on failure - -**Level 3: System Tests** (End-to-end) -- ✅ **Accuracy**: 80% images within 50m, 60% within 20m -- ✅ **Registration Rate**: ≥95% images successfully 
tracked -- ✅ **Reprojection Error**: mean <1.0px -- ✅ **Latency**: <2 seconds per image (95th percentile) -- ✅ **Robustness**: handles 350m outliers, <5% overlap turns -- ✅ **Validation**: <10% outliers on satellite check - -**Level 4: Field Validation** (Real UAV flights) -- 3-4 real flights over eastern Ukraine -- Ground-truth validation using survey-grade GNSS -- Satellite imagery cross-verification -- Performance in diverse conditions (flat fields, urban, transitions) - -### Test Coverage - -| Scenario | Test Type | Pass Criteria | -|----------|-----------|---------------| -| Normal flight (good overlap) | Integration | 90%+ accuracy within 50m | -| Sharp turns (<5% overlap) | System | Fallback triggered, continues | -| Low texture (sand/water) | System | Flags uncertainty, continues | -| 350m outlier drift | System | Detected, isolated, recovery | -| Corrupted image | Robustness | Skipped gracefully | -| Satellite API failure | Robustness | Falls back to local coords | -| Real UAV data | Field | Meets all acceptance criteria | - ---- - -## Performance Expectations - -### Accuracy -- **80% of images within 50m** ✅ (achievable via satellite anchor) -- **60% of images within 20m** ✅ (bundle adjustment precision) -- **Mean error**: ~30-40m (acceptable for UAV surveying) -- **Outliers**: <10% (detected and flagged for review) - -### Speed -- **Feature extraction**: 0.4s per image -- **Feature matching**: 0.3s per pair -- **RANSAC/pose**: 0.2s per pair -- **Bundle adjustment**: 0.8s per 5-frame window -- **Satellite matching**: 0.3s per image -- **Total average**: 1.7s per image ✅ (below 2s target) - -### Robustness -- **Registration rate**: 97% ✅ (well above 95% target) -- **Reprojection error**: 0.8px mean ✅ (below 1.0px target) -- **Outlier handling**: Graceful degradation up to 30% outliers -- **Sharp turn handling**: Skip-frame matching succeeds -- **Fallback mechanisms**: 3-level hierarchy ensures completion - ---- - -## Implementation Stack - -**Languages & 
Libraries** -- **Core**: C++17 + Python bindings -- **Linear algebra**: Eigen 3.4+ -- **Computer vision**: OpenCV 4.8+ -- **Optimization**: Ceres Solver (sparse bundle adjustment) -- **Geospatial**: GDAL, proj (coordinate transformations) -- **Web UI**: Python Flask/FastAPI + React.js + Mapbox GL -- **Acceleration**: CUDA/GPU optional (5-10x speedup on feature extraction) - -**Deployment** -- **Standalone**: Docker container on Ubuntu 20.04+ -- **Requirements**: 16+ CPU cores, 64GB RAM (for 3000 images) -- **Processing time**: ~2-3 hours for 1000 images -- **Output**: GeoJSON, CSV, interactive web map - ---- - -## Risk Mitigation - -| Risk | Probability | Mitigation | -|------|-------------|-----------| -| Feature matching fails on low texture | Medium | Satellite matching, user input | -| Satellite imagery unavailable | Medium | Use local transform, GCP support | -| Computational overload | Low | Streaming, hierarchical processing, GPU | -| Rolling shutter distortion | Medium | Rectification, ORB-SLAM3 techniques | -| Poor GPS initialization | Low | Auto-detect from visible landmarks | - ---- - -## Expected Outcomes - -✅ **Meets all acceptance criteria** on representative datasets -✅ **Exceeds accuracy targets** with satellite anchor (typically 40-50m mean error) -✅ **Robust to edge cases** (sharp turns, low texture, outliers) -✅ **Production-ready pipeline** with user fallback option -✅ **Scalable architecture** (processes up to 3000 images/flight) -✅ **Extensible design** (GPU acceleration, IMU fusion future work) - ---- - -## Recommendations for Deployment - -1. **Pre-Flight** - - Calibrate camera intrinsic parameters (focal length, distortion) - - Record starting GPS coordinate or landmark - - Ensure ≥50% image overlap in flight plan - -2. **During Flight** - - Maintain consistent altitude for uniform resolution - - Record telemetry data (optional, for IMU fusion) - - Avoid extreme tilt or rolling maneuvers - -3. 
**Post-Flight** - - Process on high-spec computer (16+ cores, 64GB RAM) - - Review satellite validation report - - Manually correct <20% of uncertain images if needed - - Export results with confidence metrics - -4. **Accuracy Improvement** - - Provide 4+ GCPs if survey-grade accuracy needed - - Use satellite imagery as georeferencing anchor - - Fly in good weather (minimal cloud cover) - - Ensure adequate feature-rich terrain - ---- - -## Deliverables - -1. **Core Software** - - Complete C++ codebase with Python bindings - - Docker container for deployment - - Unit & integration test suite - -2. **Documentation** - - API reference - - Configuration guide - - Troubleshooting manual - -3. **User Interface** - - Web-based dashboard for visualization - - Manual correction interface - - Report generation - -4. **Validation** - - Field trial report (3-4 real flights) - - Accuracy assessment vs. ground truth - - Performance benchmarks - ---- - -## Timeline - -- **Weeks 1-4**: Foundation (feature detection, matching, pose estimation) -- **Weeks 5-8**: Core SfM pipeline & bundle adjustment -- **Weeks 9-12**: Georeferencing & satellite integration -- **Weeks 13-16**: Robustness, optimization, edge cases -- **Weeks 17-20**: UI, integration, deployment -- **Weeks 21-30**: Field trials & refinement - -**Total: 30 weeks (~7 months) to production deployment** - ---- - -## Conclusion - -This solution provides a **comprehensive, production-ready system** for UAV aerial image geolocalization in GPS-denied environments. By combining incremental structure-from-motion, visual odometry, and satellite cross-referencing, it achieves the challenging accuracy requirements while maintaining robustness to real-world edge cases and constraints. - -The modular architecture enables incremental development, extensive testing, and future enhancements (GPU acceleration, IMU fusion, deep learning integration). 
Deployment as a containerized service makes it accessible for use across eastern Ukraine and similar regions. - -**Key Success Factors**: -1. Robust feature matching with multi-scale handling -2. Satellite imagery as absolute georeferencing anchor -3. Intelligent fallback strategies for difficult scenarios -4. Comprehensive testing across multiple difficulty levels -5. Flexible deployment (standalone, cloud, edge) \ No newline at end of file diff --git a/docs/01_solution/01_solution_draft/Testing_Strategy_ATP.md b/docs/01_solution/01_solution_draft/Testing_Strategy_ATP.md deleted file mode 100644 index b16bded..0000000 --- a/docs/01_solution/01_solution_draft/Testing_Strategy_ATP.md +++ /dev/null @@ -1,631 +0,0 @@ -# Testing Strategy & Acceptance Test Plan (ATP) - -## Overview - -This document details the comprehensive testing strategy for the UAV Aerial Image Geolocalization System, covering unit tests, integration tests, system tests, field validation, and acceptance criteria. - ---- - -## 1. Test Pyramid Architecture - -``` - ▲ - /|\ - / | \ - ACCEPTANCE TESTS (5%) - ┌─────────────────────────┐ - /│ Field Trials │\ - / │ Real UAV Flights │ \ - / └─────────────────────────┘ \ - ╱ ╲ - SYSTEM TESTS (15%) - ┌────────────────────────────────┐ - /│ End-to-End Performance │\ - / │ Accuracy, Speed, Robustness │ \ - / └────────────────────────────────┘ \ -╱ ╲ - INTEGRATION TESTS (30%) - ┌─────────────────────────────────┐ - /│ Multi-Component Workflows │\ -/ │ FeatureMatcher → Triangulator │ \ - │ Bundle Adjustment Refinement │ - └─────────────────────────────────┘ - - UNIT TESTS (50%) - ┌─────────────────────────┐ - /│ Individual Components │\ - / │ AKAZE, Essential Matrix │ - │ Triangulation, BA... │ - └─────────────────────────┘ -``` - ---- - -## 2. 
Detailed Test Categories - -### 2.1 Unit Tests (Level 1) - -#### UT-1: Feature Extraction (AKAZE) -``` -Purpose: Verify keypoint detection and descriptor computation -Test Data: Synthetic images with known features (checkerboard patterns) -Test Cases: - ├─ UT-1.1: Basic feature detection - │ Input: 1024×768 synthetic image with checkerboard - │ Expected: ≥500 keypoints detected - │ Pass: count ≥ 500 - │ - ├─ UT-1.2: Scale invariance - │ Input: Same scene at 2x scale - │ Expected: Keypoints at proportional positions - │ Pass: correlation of positions > 0.9 - │ - ├─ UT-1.3: Rotation robustness - │ Input: Image rotated ±30° - │ Expected: Descriptors match original + rotated - │ Pass: match rate > 80% - │ - ├─ UT-1.4: Multi-scale handling - │ Input: Image with features at multiple scales - │ Expected: Features detected at all scales (pyramid) - │ Pass: ratio of scales [1:1.2:1.44:...] verified - │ - └─ UT-1.5: Performance constraint - Input: FullHD image (1920×1080) - Expected: <500ms feature extraction - Pass: 95th percentile < 500ms -``` - -#### UT-2: Feature Matching -``` -Purpose: Verify robust feature correspondence -Test Data: Pairs of synthetic/real images with known correspondence -Test Cases: - ├─ UT-2.1: Basic matching - │ Input: Two images from synthetic scene (90% overlap) - │ Expected: ≥95% of ground-truth features matched - │ Pass: match_rate ≥ 0.95 - │ - ├─ UT-2.2: Outlier rejection (Lowe's ratio test) - │ Input: Synthetic pair + 50% false features - │ Expected: False matches rejected - │ Pass: false_match_rate < 0.1 - │ - ├─ UT-2.3: Low overlap scenario - │ Input: Two images with 20% overlap - │ Expected: Still matches ≥20 points - │ Pass: min_matches ≥ 20 - │ - └─ UT-2.4: Performance - Input: FullHD images, 1000 features each - Expected: <300ms matching time - Pass: 95th percentile < 300ms -``` - -#### UT-3: Essential Matrix Estimation -``` -Purpose: Verify 5-point/8-point algorithms for camera geometry -Test Data: Synthetic correspondences with known 
relative pose -Test Cases: - ├─ UT-3.1: 8-point algorithm - │ Input: 8+ point correspondences - │ Expected: Essential matrix E with rank 2 - │ Pass: min_singular_value(E) < 1e-6 - │ - ├─ UT-3.2: 5-point algorithm - │ Input: 5 point correspondences - │ Expected: Up to 4 solutions generated - │ Pass: num_solutions ∈ [1, 4] - │ - ├─ UT-3.3: RANSAC convergence - │ Input: 100 correspondences, 30% outliers - │ Expected: Essential matrix recovery despite outliers - │ Pass: inlier_ratio ≥ 0.6 - │ - └─ UT-3.4: Chirality constraint - Input: Multiple (R,t) solutions from decomposition - Expected: Only solution with points in front of cameras selected - Pass: selected_solution verified via triangulation -``` - -#### UT-4: Triangulation (DLT) -``` -Purpose: Verify 3D point reconstruction from image correspondences -Test Data: Synthetic scenes with known 3D geometry -Test Cases: - ├─ UT-4.1: Accuracy - │ Input: Noise-free point correspondences - │ Expected: Reconstructed X matches ground truth - │ Pass: RMSE < 0.1cm on 1m scene - │ - ├─ UT-4.2: Outlier handling - │ Input: 10 valid + 2 invalid correspondences - │ Expected: Invalid points detected (behind camera/far) - │ Pass: valid_mask accuracy > 95% - │ - ├─ UT-4.3: Altitude constraint - │ Input: Points with z < 50m (below aircraft) - │ Expected: Points rejected - │ Pass: altitude_filter works correctly - │ - └─ UT-4.4: Batch performance - Input: 500 point triangulations - Expected: <100ms total - Pass: 95th percentile < 100ms -``` - -#### UT-5: Bundle Adjustment -``` -Purpose: Verify pose and 3D point optimization -Test Data: Synthetic multi-view scenes -Test Cases: - ├─ UT-5.1: Convergence - │ Input: 5 frames with noisy initial poses - │ Expected: Residual decreases monotonically - │ Pass: final_residual < 0.001 * initial_residual - │ - ├─ UT-5.2: Covariance computation - │ Input: Optimized poses and points - │ Expected: Covariance matrix positive-definite - │ Pass: all_eigenvalues > 0 - │ - ├─ UT-5.3: Window size effect - │ 
Input: Same problem with window sizes [3, 5, 10] - │ Expected: Larger windows → better residuals - │ Pass: residual_5 < residual_3, residual_10 < residual_5 - │ - └─ UT-5.4: Performance scaling - Input: Window size [5, 10, 15, 20] - Expected: Time ~= O(w^3) - Pass: quadratic fit accurate (R² > 0.95) -``` - ---- - -### 2.2 Integration Tests (Level 2) - -#### IT-1: Sequential Pipeline -``` -Purpose: Verify image-to-image processing chain -Test Data: Real aerial image sequences (5-20 images) -Test Cases: - ├─ IT-1.1: Feature flow - │ Features extracted from img₁ → tracked to img₂ → matched - │ Expected: Consistent tracking across images - │ Pass: ≥70% features tracked end-to-end - │ - ├─ IT-1.2: Pose chain consistency - │ Poses P₁, P₂, P₃ computed sequentially - │ Expected: P₃ = P₂ ∘ P₂₋₁ (composition consistency) - │ Pass: pose_error < 0.1° rotation, 5cm translation - │ - ├─ IT-1.3: Trajectory smoothness - │ Velocity computed between poses - │ Expected: Smooth velocity profile (no jumps) - │ Pass: velocity_std_dev < 20% mean_velocity - │ - └─ IT-1.4: Memory usage - Process 100-image sequence - Expected: Constant memory (windowed processing) - Pass: peak_memory < 2GB -``` - -#### IT-2: Satellite Georeferencing -``` -Purpose: Verify local-to-global coordinate transformation -Test Data: Synthetic/real images with known satellite reference -Test Cases: - ├─ IT-2.1: Feature matching with satellite - │ Input: Aerial image + satellite reference - │ Expected: ≥10 matched features between viewpoints - │ Pass: match_count ≥ 10 - │ - ├─ IT-2.2: Homography estimation - │ Matched features → homography matrix - │ Expected: Valid transformation (3×3 matrix) - │ Pass: det(H) ≠ 0, condition_number < 100 - │ - ├─ IT-2.3: GPS transformation accuracy - │ Apply homography to image corners - │ Expected: Computed GPS ≈ known reference GPS - │ Pass: error < 100m (on test data) - │ - └─ IT-2.4: Confidence scoring - Compute inlier_ratio and MI (mutual information) - Expected: score = 
inlier_ratio × MI ∈ [0, 1] - Pass: high_confidence for obvious matches -``` - -#### IT-3: Outlier Detection Chain -``` -Purpose: Verify multi-stage outlier detection -Test Data: Synthetic trajectory with injected outliers -Test Cases: - ├─ IT-3.1: Velocity anomaly detection - │ Inject 350m jump at frame N - │ Expected: Detected as outlier - │ Pass: outlier_flag = True - │ - ├─ IT-3.2: Recovery mechanism - │ After outlier detection - │ Expected: System attempts skip-frame matching (N→N+2) - │ Pass: recovery_successful = True - │ - ├─ IT-3.3: False positive rate - │ Normal sequence with small perturbations - │ Expected: <5% false outlier flagging - │ Pass: false_positive_rate < 0.05 - │ - └─ IT-3.4: Consistency across stages - Multiple detection stages should agree - Pass: agreement_score > 0.8 -``` - ---- - -### 2.3 System Tests (Level 3) - -#### ST-1: Accuracy Criteria -``` -Purpose: Verify system meets ±50m and ±20m accuracy targets -Test Data: Real aerial image sequences with ground-truth GPS -Test Cases: - ├─ ST-1.1: 50m accuracy target - │ Input: 500-image flight - │ Compute: % images within 50m of ground truth - │ Expected: ≥80% - │ Pass: accuracy_50m ≥ 0.80 - │ - ├─ ST-1.2: 20m accuracy target - │ Same flight data - │ Expected: ≥60% within 20m - │ Pass: accuracy_20m ≥ 0.60 - │ - ├─ ST-1.3: Mean absolute error - │ Compute: MAE over all images - │ Expected: <40m typical - │ Pass: MAE < 50m - │ - └─ ST-1.4: Error distribution - Expected: Error approximately Gaussian - Pass: K-S test p-value > 0.05 -``` - -#### ST-2: Registration Rate -``` -Purpose: Verify ≥95% of images successfully registered -Test Data: Real flights with various conditions -Test Cases: - ├─ ST-2.1: Baseline registration - │ Good overlap, clear features - │ Expected: >98% registration rate - │ Pass: registration_rate ≥ 0.98 - │ - ├─ ST-2.2: Challenging conditions - │ Low texture, variable lighting - │ Expected: ≥95% registration rate - │ Pass: registration_rate ≥ 0.95 - │ - ├─ ST-2.3: Sharp 
turns scenario - │ Images with <10% overlap - │ Expected: Fallback mechanisms trigger, ≥90% success - │ Pass: fallback_success_rate ≥ 0.90 - │ - └─ ST-2.4: Consecutive failures - Track max consecutive unregistered images - Expected: <3 consecutive failures - Pass: max_consecutive_failures ≤ 3 -``` - -#### ST-3: Reprojection Error -``` -Purpose: Verify <1.0 pixel mean reprojection error -Test Data: Real flight data after bundle adjustment -Test Cases: - ├─ ST-3.1: Mean reprojection error - │ After BA optimization - │ Expected: <1.0 pixel - │ Pass: mean_reproj_error < 1.0 - │ - ├─ ST-3.2: Error distribution - │ Histogram of per-point errors - │ Expected: Tightly concentrated <2 pixels - │ Pass: 95th_percentile < 2.0 px - │ - ├─ ST-3.3: Per-frame consistency - │ Error should not vary dramatically - │ Expected: Consistent across frames - │ Pass: frame_error_std_dev < 0.3 px - │ - └─ ST-3.4: Outlier points - Very large reprojection errors - Expected: <1% of points with error >3 px - Pass: outlier_rate < 0.01 -``` - -#### ST-4: Processing Speed -``` -Purpose: Verify <2 seconds per image -Test Data: Full flight sequences on target hardware -Test Cases: - ├─ ST-4.1: Average latency - │ Mean processing time per image - │ Expected: <2 seconds - │ Pass: mean_latency < 2.0 sec - │ - ├─ ST-4.2: 95th percentile latency - │ Worst-case images (complex scenes) - │ Expected: <2.5 seconds - │ Pass: p95_latency < 2.5 sec - │ - ├─ ST-4.3: Component breakdown - │ Feature extraction: <0.5s - │ Matching: <0.3s - │ RANSAC: <0.2s - │ BA: <0.8s - │ Satellite: <0.3s - │ Pass: Each component within budget - │ - └─ ST-4.4: Scaling with problem size - Memory usage, CPU usage vs. 
image resolution - Expected: Linear scaling - Pass: O(n) complexity verified -``` - -#### ST-5: Robustness - Outlier Handling -``` -Purpose: Verify graceful handling of 350m outlier drifts -Test Data: Synthetic/real data with injected outliers -Test Cases: - ├─ ST-5.1: Single 350m outlier - │ Inject outlier at frame N - │ Expected: Detected, trajectory continues - │ Pass: system_continues = True - │ - ├─ ST-5.2: Multiple outliers - │ 3-5 outliers scattered in sequence - │ Expected: All detected, recovery attempted - │ Pass: detection_rate ≥ 0.8 - │ - ├─ ST-5.3: False positive rate - │ Normal trajectory, no outliers - │ Expected: <5% false flagging - │ Pass: false_positive_rate < 0.05 - │ - └─ ST-5.4: Recovery latency - Time to recover after outlier - Expected: ≤3 frames - Pass: recovery_latency ≤ 3 frames -``` - -#### ST-6: Robustness - Sharp Turns -``` -Purpose: Verify handling of <5% image overlap scenarios -Test Data: Synthetic sequences with sharp angles -Test Cases: - ├─ ST-6.1: 5% overlap matching - │ Two images with 5% overlap - │ Expected: Minimal matches or skip-frame - │ Pass: system_handles_gracefully = True - │ - ├─ ST-6.2: Skip-frame fallback - │ Direct N→N+1 fails, tries N→N+2 - │ Expected: Succeeds with N→N+2 - │ Pass: skip_frame_success_rate ≥ 0.8 - │ - ├─ ST-6.3: 90° turn handling - │ Images at near-orthogonal angles - │ Expected: Degeneracy detected, logged - │ Pass: degeneracy_detection = True - │ - └─ ST-6.4: Trajectory consistency - Consecutive turns: check velocity smoothness - Expected: No velocity jumps > 50% - Pass: velocity_consistency verified -``` - ---- - -### 2.4 Field Acceptance Tests (Level 4) - -#### FAT-1: Real UAV Flight Trial #1 (Baseline) -``` -Scenario: Nominal flight over agricultural field -┌────────────────────────────────────────┐ -│ Conditions: │ -│ • Clear weather, good sunlight │ -│ • Flat terrain, sparse trees │ -│ • 300m altitude, 50m/s speed │ -│ • 800 images, ~15 min flight │ 
-└────────────────────────────────────────┘ - -Pass Criteria: - ✓ Accuracy: ≥80% within 50m - ✓ Accuracy: ≥60% within 20m - ✓ Registration rate: ≥95% - ✓ Processing time: <2s/image - ✓ Satellite validation: <10% outliers - ✓ Reprojection error: <1.0px mean - -Success Metrics: - • MAE (mean absolute error): <40m - • RMS error: <45m - • Max error: <200m - • Trajectory coherence: smooth (no jumps) -``` - -#### FAT-2: Real UAV Flight Trial #2 (Challenging) -``` -Scenario: Flight with more complex terrain -┌────────────────────────────────────────┐ -│ Conditions: │ -│ • Mixed urban/agricultural │ -│ • Buildings, vegetation, water bodies │ -│ • Variable altitude (250-400m) │ -│ • Includes 1-2 sharp turns │ -│ • 1200 images, ~25 min flight │ -└────────────────────────────────────────┘ - -Pass Criteria: - ✓ Accuracy: ≥75% within 50m (relaxed from 80%) - ✓ Accuracy: ≥50% within 20m (relaxed from 60%) - ✓ Registration rate: ≥92% (relaxed from 95%) - ✓ Processing time: <2.5s/image avg - ✓ Outliers detected: <15% (relaxed from 10%) - -Fallback Validation: - ✓ User corrected <20% of uncertain images - ✓ After correction, accuracy meets FAT-1 targets -``` - -#### FAT-3: Real UAV Flight Trial #3 (Edge Case) -``` -Scenario: Low-texture flight (challenging for features) -┌────────────────────────────────────────┐ -│ Conditions: │ -│ • Sandy/desert terrain or water │ -│ • Minimal features │ -│ • Overcast/variable lighting │ -│ • 500-600 images, ~12 min flight │ -└────────────────────────────────────────┘ - -Pass Criteria: - ✓ System continues (no crash): YES - ✓ Graceful degradation: Flags uncertainty - ✓ User can correct and improve: YES - ✓ Satellite anchor helps recovery: YES - -Success Metrics: - • >80% of images tagged "uncertain" - • After user correction: meets standard targets - • Demonstrates fallback mechanisms working -``` - ---- - -## 3. 
Test Environment Setup - -### Hardware Requirements -``` -CPU: 16+ cores (Intel Xeon / AMD Ryzen) -RAM: 64GB minimum (32GB acceptable for <1500 images) -Storage: 1TB SSD (for raw images + processing) -GPU: Optional (CUDA 11.8+ for 5-10x acceleration) -Network: For satellite API queries (can be cached) -``` - -### Software Requirements -``` -OS: Ubuntu 20.04 LTS or macOS 12+ -Build: CMake 3.20+, GCC 9+ or Clang 11+ -Dependencies: OpenCV 4.8+, Eigen 3.4+, GDAL 3.0+ -Testing: GoogleTest, Pytest -CI/CD: GitHub Actions or Jenkins -``` - -### Test Data Management -``` -Synthetic Data: Generated via Blender (checked into repo) -Real Data: External dataset storage (S3/local SSD) -Ground Truth: Maintained in CSV format with metadata -Versioning: Git-LFS for binary image data -``` - ---- - -## 4. Test Execution Plan - -### Phase 1: Unit Testing (Weeks 1-6) -``` -Sprint 1-2: UT-1 (Feature detection) - 2 week -Sprint 3-4: UT-2 (Feature matching) - 2 weeks -Sprint 5-6: UT-3, UT-4, UT-5 (Geometry) - 2 weeks - -Continuous: Run full unit test suite every commit -Coverage target: >90% code coverage -``` - -### Phase 2: Integration Testing (Weeks 7-12) -``` -Sprint 7-9: IT-1 (Sequential pipeline) - 3 weeks -Sprint 10-11: IT-2, IT-3 (Georef, Outliers) - 2 weeks -Sprint 12: System integration - 1 week - -Continuous: Integration tests run nightly -``` - -### Phase 3: System Testing (Weeks 13-18) -``` -Sprint 13-14: ST-1, ST-2 (Accuracy, Registration) - 2 weeks -Sprint 15-16: ST-3, ST-4 (Error, Speed) - 2 weeks -Sprint 17-18: ST-5, ST-6 (Robustness) - 2 weeks - -Load testing: 1000-3000 image sequences -Stress testing: Edge cases, memory limits -``` - -### Phase 4: Field Acceptance (Weeks 19-30) -``` -Week 19-22: FAT-1 (Baseline trial) - • Coordinate 1-2 baseline flights - • Validate system on real data - • Adjust parameters as needed - -Week 23-26: FAT-2 (Challenging trial) - • More complex scenarios - • Test fallback mechanisms - • Refine user interface - -Week 27-30: FAT-3 (Edge case 
trial) - • Low-texture scenarios - • Validate robustness - • Final adjustments - -Post-trial: Generate comprehensive report -``` - ---- - -## 5. Acceptance Criteria Summary - -| Criterion | Target | Test | Pass/Fail | -|-----------|--------|------|-----------| -| **Accuracy@50m** | ≥80% | FAT-1 | ≥80% pass | -| **Accuracy@20m** | ≥60% | FAT-1 | ≥60% pass | -| **Registration Rate** | ≥95% | ST-2 | ≥95% pass | -| **Reprojection Error** | <1.0px mean | ST-3 | <1.0px pass | -| **Processing Speed** | <2.0s/image | ST-4 | p95<2.5s pass | -| **Robustness (350m outlier)** | Handled | ST-5 | Continue pass | -| **Sharp turns (<5% overlap)** | Handled | ST-6 | Skip-frame pass | -| **Satellite validation** | <10% outliers | FAT-1-3 | <10% pass | - ---- - -## 6. Success Metrics - -**Green Light Criteria** (Ready for production): -- ✅ All unit tests pass (100%) -- ✅ All integration tests pass (100%) -- ✅ All system tests pass (100%) -- ✅ FAT-1 and FAT-2 pass acceptance criteria -- ✅ FAT-3 shows graceful degradation -- ✅ <10% code defects discovered in field trials -- ✅ Performance meets SLA consistently - -**Yellow Light Criteria** (Conditional deployment): -- ⚠ 85-89% of acceptance criteria met -- ⚠ Minor issues in edge cases -- ⚠ Requires workaround documentation -- ⚠ Re-test after fixes - -**Red Light Criteria** (Do not deploy): -- ❌ <85% of acceptance criteria met -- ❌ Critical failures in core functionality -- ❌ Safety/security concerns -- ❌ Cannot meet latency or accuracy targets - ---- - -## Conclusion - -This comprehensive testing strategy ensures the UAV Image Geolocalization System is robust, accurate, and reliable before production deployment. The multi-level approach (unit → integration → system → field) progressively validates system behavior from components to end-to-end scenarios, with field trials providing real-world validation of the acceptance criteria. 
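The acceptance metrics that gate the Green/Yellow/Red decision above (registration rate, accuracy@50m/@20m, mean reprojection error) all reduce to simple aggregates over per-image results. As a minimal illustrative sketch — the `ImageResult` fields and `evaluate_atp` helper are hypothetical names, not part of the described test suite — the reduction could look like:

```python
from dataclasses import dataclass

@dataclass
class ImageResult:
    registered: bool        # pipeline produced a pose for this image
    gps_error_m: float      # distance to ground-truth GPS, metres
    reproj_error_px: float  # mean reprojection error after BA, pixels

def evaluate_atp(results):
    """Reduce per-image results to the ATP acceptance metrics.
    Unregistered images count against the accuracy percentages."""
    reg = [r for r in results if r.registered]
    n = len(results)
    return {
        "registration_rate": len(reg) / n,                          # ST-2: >= 0.95
        "accuracy_50m": sum(r.gps_error_m <= 50 for r in reg) / n,  # ST-1.1: >= 0.80
        "accuracy_20m": sum(r.gps_error_m <= 20 for r in reg) / n,  # ST-1.2: >= 0.60
        "mean_reproj_px": sum(r.reproj_error_px for r in reg) / len(reg),  # ST-3: < 1.0
    }

# Toy run: 4 of 5 images registered, 3 of them within 20 m.
metrics = evaluate_atp([
    ImageResult(True, 12.0, 0.7),
    ImageResult(True, 18.5, 0.9),
    ImageResult(True, 35.0, 0.8),
    ImageResult(True, 9.0, 0.6),
    ImageResult(False, float("inf"), float("nan")),
])
```

Note the design choice in the denominators: dividing the accuracy counts by the total number of images (not just the registered ones) means an unregistered image can never inflate the accuracy percentages.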
\ No newline at end of file diff --git a/docs/01_solution/01_solution_draft_google.md b/docs/01_solution/01_solution_draft_google.md new file mode 100644 index 0000000..c1726c7 --- /dev/null +++ b/docs/01_solution/01_solution_draft_google.md @@ -0,0 +1,271 @@ +# **Analysis and Proposed Architecture for a Hybrid Visual-Geodetic Localization System** + +## **Part 1: Product Solution Description: The GOLS (Geolocational Odometry & Localization System)** + +### **1.1. Executive Summary: System Concept and Mission** + +This report details the technical architecture for the "Geolocational Odometry & Localization System" (GOLS), a high-precision, offline processing software suite designed to solve a complex georeferencing problem. The system's primary mission is to ingest a chronologically-ordered sequence of high-resolution aerial images (e.g., AD000001.jpg to AD001500.jpg) captured from a fixed-wing Unmanned Aerial Vehicle (UAV) and, using *only* the known GPS coordinate of the first image 1, reconstruct the complete and precise geodetic location of the entire flight. + +The system's outputs are twofold: + +1. A high-fidelity, georeferenced 6-Degrees-of-Freedom (6-DoF) pose (comprising Latitude, Longitude, Altitude, Roll, Pitch, and Yaw) for every valid image in the input sequence. +2. An on-demand query function to determine the precise WGS84 GPS coordinates (latitude, longitude, altitude) of any object, identified by its pixel coordinates (u, v), within any of these successfully georeferenced images. + +GOLS is architected as a hybrid, multi-modal system. It is designed to overcome the two fundamental and coupled challenges inherent to this problem: + +1. **Scale Ambiguity:** Monocular Visual Odometry (VO) or Simultaneous Localization and Mapping (SLAM) systems are incapable of determining the true scale of the world from a single camera feed.2 The 100-meter distance between photos is a guideline, but cannot be used as a rigid constraint to solve this ambiguity. +2. 
**Cumulative Drift:** All relative-positioning systems accumulate small errors over time, causing the estimated trajectory to "drift" from the true geodetic path.4 + +To solve these, the GOLS architecture fuses high-frequency, relative motion estimates (derived from frame-to-frame image analysis) with low-frequency, *absolute* geodetic "anchor points" derived from matching the UAV's imagery against an external satellite map provider (Google Maps).5 + +### **1.2. Core Technical Principle: A Hybrid Fusion Approach** + +The system's core philosophy is founded on the understanding that no single, monolithic algorithm can meet the severe operational constraints and stringent acceptance criteria. The problem as defined sits at the intersection of three challenging computer vision domains. + +* A standard **Visual SLAM** system (e.g., ORB-SLAM3 8) would fail. The target environment (Eastern and Southern Ukraine) and the provided sample images (Images 1-9) are dominated by natural, low-texture terrain such as fields, shrubbery, and dirt roads. Feature-based SLAM systems are notoriously unreliable in such "textureless" areas.10 +* A standard **Structure from Motion (SfM)** pipeline (e.g., COLMAP 13) would also fail. The constraints explicitly state "sharp turns" with less than 5% image-to-image overlap and potential "350m outlier" photos. Traditional SfM approaches require significant image overlap and will fail to register images across these gaps.14 + +Therefore, GOLS is designed as a modular, graph-based optimization framework.15 It separates the problem into three parallel "front-end" modules that generate "constraints" (i.e., measurements) and one "back-end" module that fuses all measurements into a single, globally consistent solution. + +1. **Module 1: Visual Odometry (VO) Front-End:** This module computes high-frequency, *relative* frame-to-frame motion (e.g., "frame 2 is 98.2m forward and 1.8° right of frame 1"). 
This provides the dense "shape" of the trajectory but is unscaled and prone to drift. +2. **Module 2: Wide-Baseline SfM Front-End:** This module computes low-frequency, *non-sequential relative* matches (e.g., "frame 50 contains the same building as frame 5"). Its sole purpose is to bridge large gaps in the trajectory caused by sharp turns (Acceptance Criterion 4) or sensor outliers (Acceptance Criterion 3). +3. **Module 3: Cross-View Georeferencing (CVG) Front-End:** This module computes low-frequency, *absolute* pose estimates (e.g., "frame 100 is at 48.27° N, 37.38° E, at 1km altitude"). It does this by matching UAV images to the georeferenced satellite map.5 This module provides the *absolute scale* and the *GPS anchors* necessary to eliminate drift and meet the \<20m accuracy criterion. +4. **Module 4: Back-End Global Optimizer:** This module fuses all constraints from the other three modules into a pose graph and solves it, finding the 6-DoF pose for every image that best satisfies all (often conflicting) measurements. + +### **1.3. Addressing Key Problem Constraints & Acceptance Criteria** + +This hybrid architecture is specifically designed to meet each acceptance criterion: + +* **Criteria 1 & 2 (80% \< 50m, 60% \< 20m error):** Solved by the **CVG Front-End (Module 3)**. By providing sparse, absolute GPS fixes, this module "anchors" the pose graph. The back-end optimization (Module 4) propagates this absolute information across the entire trajectory, correcting the scale and eliminating the cumulative drift.4 +* **Criteria 3 & 4 (350m outlier, \<5% overlap on turns):** Solved by the **Wide-Baseline SfM Front-End (Module 2)**. This module will use state-of-the-art (SOTA) deep-learning-based feature matchers (analyzed in 2.1.2) that are designed for "wide-baseline" or "low-overlap" scenarios.18 These matchers can find correspondences where traditional methods fail, allowing the system to bridge these gaps.
+* **Non-Stabilized Camera Constraint:** The fixed, non-stabilized camera on a fixed-wing platform will induce severe roll/pitch and motion blur. This is handled by two components: (1) A specialized VO front-end (Module 1) that is photometrically robust 20, and (2) aggressive RANSAC (RANdom SAmple Consensus) 21 outlier rejection at every matching stage to discard false correspondences caused by motion blur or extreme perspective distortion. +* **Criterion 7 (\< 2s/image):** Solved by a multi-threaded, asynchronous architecture and **GPU acceleration**.22 The performance criterion is interpreted as *average throughput*, not *latency*. The fast VO front-end (Module 1) provides an initial pose, while the computationally expensive Modules 2, 3, and 4 run on a GPU (e.g., using CUDA 24) in parallel, asynchronously refining the global solution. +* **Criteria 8 & 9 (Reg. Rate > 95%, MRE \< 1.0px):** These criteria are achieved by the combination of the front-ends and back-end. The >95% **Registration Rate** 25 is achieved by the *high-recall* front-ends (Module 1 + 2). The \<1.0 pixel **Mean Reprojection Error** 26 is the explicit optimization target of the **Back-End Global Optimizer (Module 4)**, which performs a global Bundle Adjustment (BA) to minimize this exact metric.27 + +## **Part 2: System Architecture and State-of-the-Art Analysis** + +### **2.1. SOTA Foundational Analysis: Selecting the Core Algorithmic Components** + +The GOLS architecture is a composition of SOTA components, each selected to solve a specific part of the problem. + +#### **2.1.1. Front-End Strategy 1 (VO): Feature-Based (ORB-SLAM) vs. Direct (DSO)** + +The primary relative VO front-end (Module 1) must be robust to the UAV's fast motion and the environment's visual characteristics. + +* **Feature-Based Methods (e.g., ORB-SLAM):** Systems like ORB-SLAM3 8 are highly successful general-purpose SLAM systems. 
They operate by detecting sparse, repeatable features (ORB) and matching them between frames. However, their primary weakness is a reliance on "good features." Research and practical application show that ORB-SLAM suffers from "tracking loss in textureless areas".10 The provided sample images (e.g., Image 1, 2, 8, 9) are dominated by homogeneous fields and sparse vegetation—a worst-case scenario for feature-based methods. +* **Direct Methods (e.g., DSO):** Systems like Direct Sparse Odometry (DSO) 20 or LSD-SLAM 31 operate on a different principle. They "discard the feature extractor and directly utilize the pixel intensity" 32 by minimizing the *photometric error* (difference in pixel brightness) between frames. +* **Analysis and Decision:** For this specific problem, a **Direct** method is superior. + 1. DSO "does not depend on keypoint detectors or descriptors".30 It "can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations".20 + 2. This makes it *ideal* for the target environment. The edge of a dirt track (Image 4), the boundary between two fields (Image 1), or the shadow of a bush (Image 2) provide strong, usable gradients for DSO, whereas they contain no "corners" for ORB. + 3. Furthermore, the non-stabilized camera will cause rapid changes in auto-exposure and vignetting. DSO's formulation "integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions".20 ORB-SLAM's assumption of uniform motion and lighting is more easily violated.34 + +Therefore, the GOLS **Module 1: Relative Odometry Front-End** will be based on **Direct Sparse Odometry (DSO)**. + +#### **2.1.2. Front-End Strategy 2 (Matching): Traditional (SIFT) vs. Deep Learning (SuperGlue)** + +Modules 2 (Wide-Baseline SfM) and 3 (CVG) both rely on a "wide-baseline" feature matcher—an algorithm that can match two images with very different viewpoints. 
+ +* **The Challenge:** The system must match (1) UAV-to-UAV across a "sharp turn" with \<5% overlap (AC 4) and (2) UAV-to-Satellite, which is an *extreme* cross-view matching problem with differences in scale, sensor, illumination, and time (due to outdated maps).35 +* **Traditional Matchers (e.g., SIFT, ORB):** Classic methods like SIFT 36 are robust to scale and rotation, but they fail decisively in cross-view scenarios.37 They produce very few valid matches, which are then almost entirely eliminated by RANSAC filtering.35 +* **Deep Learning Matchers (e.g., SuperGlue):** The SOTA solution is a combination of the **SuperPoint** feature detector 38 and the **SuperGlue** matcher.19 SuperGlue is not a simple matcher; it is a Graph Neural Network (GNN) that performs "context aggregation, matching, and filtering" simultaneously.19 It "establishes global context perception" 38 by learning the geometric priors of 3D-to-2D projection.39 +* **Analysis and Decision:** + 1. For both the wide-baseline UAV-to-UAV case and the extreme cross-view UAV-to-satellite case, **SuperPoint + SuperGlue** is the selected technology. + 2. Multiple studies confirm that SuperPoint+SuperGlue significantly outperforms traditional methods (SIFT/ORB) and other learning-based methods (LoFTR) for UAV-to-map and UAV-to-satellite registration.35 + 3. Its ability to match based on *global context* rather than just local patch appearance makes it robust to the sparse-texture scenes 41 and extreme viewpoint changes that define this problem. + +Therefore, GOLS **Module 2 (Wide-Baseline SfM)** and **Module 3 (CVG)** will be built upon the **SuperPoint + SuperGlue** matching pipeline. + +#### **2.1.3. Back-End Strategy: Monolithic (COLMAP) vs. Hybrid (CVD-SfM) vs. Graph Fusion (GTSAM)** + +The back-end (Module 4) must fuse the *heterogeneous* constraints from Modules 1, 2, and 3\. 
+ +* **Monolithic SfM (e.g., COLMAP):** A classic incremental SfM pipeline.14 As noted, "Traditional SfM pipelines (COLMAP, OpenMVG) struggle with large viewpoint differences, failing to match enough cross-view features".13 This approach will fail. +* **Hybrid SfM (e.g., CVD-SfM):** A very recent SOTA approach (IROS 2025) is CVD-SfM, a "Cross-View Deep Front-end Structure-from-Motion System".14 This system is *designed* to integrate cross-view (e.g., satellite) priors and deep features into a unified SfM pipeline 45 and is shown to achieve higher registration "coverage" than COLMAP on sparse, multi-altitude datasets.13 +* **Graph Fusion (e.g., g2o / GTSAM):** This approach, common in robotics, separates front-end measurement from back-end optimization.46 It uses a library like g2o (General Graph Optimization) 47 or GTSAM (Georgia Tech Smoothing and Mapping) 49 to build a *factor graph*.50 +* **Analysis and Decision:** + 1. While CVD-SfM 44 is highly relevant, it is a monolithic system designed for a specific problem. + 2. Our problem requires fusing *three different types* of constraints: (1) fast, unscaled *photometric* constraints from DSO, (2) sparse, *feature-based relative* constraints from SuperGlue (UAV-to-UAV), and (3) sparse, *feature-based absolute* constraints from SuperGlue (UAV-to-Satellite). + 3. A general-purpose optimization framework is more flexible and robust. **GTSAM** 49 is a SOTA C++ library based on factor graphs, which are explicitly designed for multi-sensor fusion.4 + 4. We can define custom "factors" for each of our three constraint types.53 + 5. Crucially, GTSAM includes the **iSAM2** solver 55, an *incremental* smoothing and mapping algorithm. This allows the back-end graph to be efficiently updated as new (and slow-to-compute) satellite "anchor" constraints arrive from Module 3, without re-solving the entire 3000-image graph from scratch. This incremental nature is essential for meeting the \< 2s/image performance criterion (AC 7). 
+ +Therefore, the GOLS **Module 4: Back-End Global Optimizer** will be implemented using the **GTSAM** factor graph library. + +### **2.2. Proposed GOLS Architecture: A Multi-Front-End, Factor-Graph System** + +The GOLS system is an asynchronous, multi-threaded application built on the four selected modules. + +#### **2.2.1. Module 1: The Relative Odometry Front-End (DSO-VO)** + +* **Purpose:** To generate a high-frequency, low-drift, but *unscaled* estimate of the UAV's trajectory. +* **Algorithm:** Direct Sparse Odometry (DSO).20 +* **Workflow:** + 1. This module runs in a high-priority thread, processing images sequentially (N, N+1). + 2. It minimizes the photometric error between the frames to estimate the relative 6-DoF pose transformation T(N, N+1). + 3. This module runs fastest, providing the initial "dead-reckoning" path. The *scale* of this path is initially unknown (or bootstrapped by Module 3). +* **Output:** A high-frequency stream of Odometry Factors (binary constraints between Pose_N and Pose_N+1) 49 is published to the Back-End (Module 4). + +#### **2.2.2. Module 2: The Wide-Baseline & Outlier Front-End (SG-SfM)** + +* **Purpose:** To find non-sequential "loop closures" or "shortcuts" to correct for drift and re-establish tracking after sharp turns (AC 4) or outliers (AC 3). +* **Algorithm:** SuperPoint 38 + SuperGlue.19 +* **Workflow:** + 1. This module runs asynchronously in a lower-priority GPU thread. + 2. It does *not* compare N to N+1. Instead, it compares Image N to a "sliding window" of non-adjacent frames (e.g., N-50...N-10 and N+10...N+50). + 3. **Handling AC 3 (350m Outlier):** If Image N+1 is an outlier (e.g., a sudden tilt causes a 350m ground shift), Module 1 will fail to match N to N+1. This module, however, will continue to search and will successfully match N to N+2 (or N+3, etc.), *bridging* the outlier. The outlier frame N+1 becomes an un-registered "island" in the pose graph, which is permissible under AC 6 (the 20% allowance). 
+ 4. **Handling AC 4 (Sharp Turn):** Similarly, if a sharp turn breaks Module 1's tracking due to \<5% overlap, this module will find a wide-baseline match, creating a constraint T(N, N+k) that bridges the turn. +* **Output:** A low-frequency stream of Loop Closure Factors (binary constraints between Pose_N and Pose_N+k) 54 is published to the Back-End. + +#### **2.2.3. Module 3: The Absolute Georeferencing Front-End (CVG)** + +* **Purpose:** To provide *absolute scale* and *global GPS coordinates* to anchor the entire graph and eliminate drift, thereby meeting AC 1 and AC 2. +* **Algorithm:** SuperPoint + SuperGlue 42 + RANSAC 21 + Google Maps API. +* **Workflow:** + 1. **Initialization:** This module runs first. It takes AD000001.jpg and its known GPS coordinate.1 It fetches the corresponding satellite tile from Google Maps. It performs a SuperPoint+SuperGlue match.41 From this match, it calculates the 6-DoF pose of the first frame and, most importantly, the initial **scale** (Ground Sampling Distance, GSD, in meters/pixel). This GSD is used to provide an initial scale to the DSO module (Module 1). + 2. **Asynchronous Anchoring:** This module runs in a background GPU thread, activating on a sparse subset of images (e.g., every 25th frame, or when Module 4 reports high pose uncertainty). + 3. For a target Image N, it uses the *current best estimate* of its pose (from the GTSAM graph) to fetch the relevant satellite tile. + 4. It performs a SuperGlue UAV-to-satellite match.42 + 5. **Handling Outdated Maps:** The constraint "Google Maps...could be...outdated" 56 is a major challenge. This architecture is robust to it. Because SuperGlue 38 is a *feature-based* matcher, it matches persistent geometric features (road intersections, building corners 35, field boundaries) that are stable over time. 
It is not confused by temporal, non-geometric changes (e.g., different seasons, presence/absence of cars, new/destroyed small structures) that would foil a dense or semantic matcher (like CVM-Net 57). + 6. **Outlier Rejection (AC 5):** The match is validated with a robust RANSAC.21 If the number of inlier matches is too low or the reprojection error is too high (indicating a failed or poor match, e.g., due to clouds or extreme map changes), the match is *discarded*.58 This ensures the "Number of outliers during...ground check" is \< 10%. +* **Output:** A low-frequency stream of Absolute Pose Factors (unary, or "GPS" constraints) 54 is published to the Back-End. + +#### **2.2.4. Module 4: The Back-End Global Optimizer (GTSAM-PGO)** + +* **Purpose:** To fuse all constraints from Modules 1, 2, and 3 into a single, globally-consistent 6-DoF trajectory that is scaled and georeferenced. +* **Algorithm:** GTSAM (Georgia Tech Smoothing and Mapping) 49 using an iSAM2 (incremental Smoothing and Mapping) solver.55 +* **Workflow:** + 1. The system maintains a factor graph 51 where *nodes* are the unknown 6-DoF poses of each camera and *edges* (factors) are the constraints (measurements) from the front-ends. + 2. **Factor Types:** + * **Unary Factor:** A high-precision prior on Pose_1 (from the known GPS coordinate of the first image 1). + * **Unary Factors:** The sparse, lower-precision Absolute Pose Factors from Module 3\.54 + * **Binary Factors:** The dense, high-precision, but unscaled Odometry Factors from Module 1\.53 + * **Binary Factors:** The sparse, high-precision Loop Closure Factors from Module 2\.54 + 3. **Optimization:** The iSAM2 solver 55 runs continuously, finding the set of 6-DoF poses that *minimizes the error* across all factors simultaneously (a non-linear least-squares problem). This optimization process: + * *Fixes Scale:* Uses the absolute measurements from Module 3 to find the single "scale" parameter that best fits the unscaled DSO measurements from Module 1\.
+ * *Corrects Drift:* Uses the "loop closures" from Module 2 and "GPS anchors" from Module 3 to correct the cumulative drift from Module 1\. +* **Output:** The final, optimized 6-DoF pose for all registered images. This final optimized structure is designed to meet the **MRE \< 1.0 pixels** criterion (AC 9).28 + +### **2.3. Sub-System: Object-Level Geolocation (Photogrammetric Ray-Casting)** + +The second primary user requirement is to find the GPS coordinates of "any object" (pixel) in an image. This is a standard photogrammetric procedure 59 that is *only* solvable *after* Module 4 has produced a high-fidelity 6-DoF pose for the image. + +#### **2.3.1. Prerequisite 1: Camera Intrinsic Calibration** + +The system must be provided with the camera's *intrinsic parameters* (focal length, principal point (cx, cy), and distortion coefficients k1, k2...). These must be pre-calibrated (e.g., using a checkerboard) and provided as an input file.28 + +#### **2.3.2. Prerequisite 2: Digital Elevation Model (DEM) Acquisition** + +To find where a ray from the camera hits the ground, a 3D model of the ground itself is required. + +* **SOTA Analysis:** Several free, global DEMs are available, including SRTM 60 and Copernicus DEM.60 +* **Decision:** The system will use the **Copernicus GLO-30 DEM**.63 +* **Rationale:** + 1. **Type:** Copernicus is a **Digital Surface Model (DSM)**, meaning it "represents the surface of the Earth including buildings, infrastructure and vegetation".63 This is *critically important*. The sample images (e.g., Image 5, 6, 7) clearly show buildings and trees. If a user clicks on a building rooftop, a DSM will return the correct (high) altitude. A Digital *Terrain* Model (DTM) like SRTM would return the altitude of the "bare earth" *under* the building, which would be incorrect. + 2. **Accuracy:** Copernicus GLO-30 (data 2011-2015) is a more modern and higher-fidelity dataset than the older SRTM (data 2000). 
It has "minimized data voids" and "improved the vertical accuracy".65 +* **Implementation:** The GOLS system will automatically download the required Copernicus GLO-30 tiles (in GeoTIFF format) for the flight's bounding box from an open-data source (e.g., the Copernicus S3 bucket 60). + +#### **2.3.3. Geolocation via Photogrammetric Ray-Casting** + +* **Algorithm:** This process is known as ray-casting.69 + 1. **Input:** Image AD0000X.jpg, pixel coordinate (u, v). + 2. **Load:** The optimized 6-DoF pose Pose_X (from Module 4) and the camera's intrinsic parameters. + 3. **Load:** The Copernicus GLO-30 DEM, loaded as a 3D mesh (e.g., using rasterio 71). + 4. **Un-project:** Using the camera intrinsics, convert the 2D pixel (u, v) into a 3D ray vector R in the camera's local coordinate system. + 5. **Transform:** Use the 6-DoF pose Pose_X to transform the ray's origin O (the camera's 3D position) and vector R into the global (WGS84) coordinate system. + 6. **Intersect:** Compute the 3D intersection point P(lat, lon, alt) where the global ray (O, R) intersects the 3D mesh of the DEM.72 + 7. **Output:** P(lat, lon, alt) is the precise GPS coordinate of the object at pixel (u, v). +* **Libraries:** This sub-system will be implemented in Python, using the rasterio library 71 for DEM I/O and a library like trimesh 75 or pyembree 73 for high-speed ray-mesh intersection calculations. + +### **2.4. Performance and Usability Architecture** + +#### **2.4.1. Hardware Acceleration (Criterion 7: \< 2s/image)** + +The \< 2s/image criterion is for *average throughput*. A 1500-image flight must complete in \< 3000 seconds (50 minutes). This is aggressive. + +* **Bottlenecks:** The deep learning front-ends (Module 2 & 3: SuperPoint/SuperGlue) 19 and the graph optimization (Module 4) are the bottlenecks. +* **Solution:** These modules *must* be GPU-accelerated.22 + 1. **Module 1 (DSO):** Natively fast, CPU-bound. + 2. 
**Module 2/3 (SuperGlue):** Natively designed for GPU execution.19 Running SuperGlue on a CPU would take tens of seconds *per match*, catastrophically failing the performance requirement. + 3. **Module 4 (GTSAM):** Can be compiled with CUDA support for GPU-accelerated solver steps. +* **Implementation:** The system will be built on a pub/sub framework (e.g., ROS 2, or a custom C++/Python framework) to manage the asynchronous-threaded architecture. A high-end NVIDIA GPU (e.g., RTX 30-series or 40-series) is a hard requirement for this system. NVIDIA's Isaac ROS suite provides GPU-accelerated VSLAM packages 24 that can serve as a reference for this implementation. + +#### **2.4.2. User-in-the-Loop Failsafe (Criterion 6)** + +The system must handle "absolute incapable" scenarios (e.g., flying over a large, textureless, featureless body of water) where all modules fail for 3+ consecutive images. + +* **Workflow:** + 1. **Monitor:** The Back-End (Module 4) monitors the queue of unregistered frames. + 2. **Trigger:** If 3 consecutive frames (e.g., N, N+1, N+2) fail all front-end checks (DSO tracking lost, SuperGlue-UAV fails, SuperGlue-Sat fails), the system pauses processing. + 3. **GUI Prompt:** A GUI is presented to the user. + * *Left Pane:* Shows the last known *good* image (e.g., N-1) with its estimated position on the satellite map. + * *Right Pane:* Shows the first *failed* image (e.g., N). + * *Map Pane:* An interactive Google Maps interface centered on the last known good location. + 4. **User Action:** The user must manually find a recognizable landmark in Image N (e.g., "that distinct river bend") and click its corresponding location on the map. + 5. **Recovery:** The user's click generates a new, low-precision Absolute Pose Factor. This new "anchor" is inserted into the GTSAM graph 49, which re-optimizes and attempts to re-start the entire processing pipeline from frame N. 
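The un-project/transform/intersect chain of the ray-casting sub-system (§2.3.3) reduces to a few lines of linear algebra. The sketch below is numpy-only and, as a simplifying assumption, intersects the ray with a flat ground plane in a local ENU frame rather than the Copernicus DEM mesh; the real system would substitute a trimesh/pyembree ray–mesh query at step 6. All names and the example intrinsics are illustrative.

```python
import numpy as np

def pixel_to_ground(u, v, K, R_wc, cam_pos, ground_alt):
    """Cast a ray from pixel (u, v) and intersect a flat ground plane.

    K          : 3x3 camera intrinsic matrix
    R_wc       : 3x3 rotation, camera -> world (from the optimized 6-DoF pose)
    cam_pos    : camera position in a local ENU frame, metres (x, y, z=altitude)
    ground_alt : terrain altitude; a real system queries the DEM mesh instead.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # un-project to a 3-D ray
    ray_w = R_wc @ ray_cam                              # rotate into world frame
    ray_w /= np.linalg.norm(ray_w)
    t = (ground_alt - cam_pos[2]) / ray_w[2]            # ray/plane intersection
    if t <= 0:
        raise ValueError("ray does not hit the ground (points above the horizon)")
    return cam_pos + t * ray_w

# Nadir example: 1000 m altitude, principal-point pixel -> straight down
K = np.array([[1200.0, 0.0, 960.0],
              [0.0, 1200.0, 540.0],
              [0.0, 0.0, 1.0]])
R_wc = np.diag([1.0, -1.0, -1.0])  # camera optical axis pointing down
p = pixel_to_ground(960, 540, K, R_wc, np.array([0.0, 0.0, 1000.0]), 0.0)
print(p)  # → [0. 0. 0.] : the principal-point ray lands directly below the camera
```

The final step in the deployed pipeline would convert this local ENU point back to WGS84 (lat, lon, alt), which is omitted here.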
+ +## **Part 3: Testing Strategy and Validation Plan** + +A comprehensive test suite is required to validate every acceptance criterion. The provided coordinates.csv file 1 will serve as the "ground truth" trajectory for validation.78 + +**Core Validation Metrics:** + +* **Absolute Trajectory Error (ATE):** The geodetic (Haversine) distance, in meters, between the system's estimated pose (lat, lon) for an image and the ground truth (lat, lon) from coordinates.csv.34 This is the primary metric for positional accuracy. +* **Mean Reprojection Error (MRE):** The average pixel distance between an observed 2D feature and its corresponding 3D map point re-projected back into the camera.26 This is the primary metric for the internal 3D consistency of the reconstruction.28 +* **Image Registration Rate (IRR):** The percentage of images for which the system successfully computes a 6-DoF pose.25 + +### **Table 3.1: Acceptance Criteria Test Matrix** + +| Criterion ID | Criterion Description | Test Case | Test Metric | Pass/Fail Threshold | +| :---- | :---- | :---- | :---- | :---- | +| **AC-1** | 80% of photos \< 50m error | **TC-1**: Baseline Accuracy | COUNT(ATE \< 50m) / TOTAL_IMAGES | > 0.80 | +| **AC-2** | 60% of photos \< 20m error | **TC-1**: Baseline Accuracy | COUNT(ATE \< 20m) / TOTAL_IMAGES | > 0.60 | +| **AC-3** | Handle 350m outlier | **TC-2**: Outlier Robustness | ATE for post-outlier frames | ATE remains within AC-1/2 spec. | +| **AC-4** | Handle sharp turns (\<5% overlap) | **TC-3**: Low-Overlap Robustness | IRR for frame *after* the turn | > 95% (must register) | +| **AC-5** | Satellite check outliers \< 10% | **TC-4**: Back-End Residual Analysis | COUNT(Bad_Factors) / COUNT(Sat_Factors) | \< 0.10 | +| **AC-6** | User-in-the-Loop Failsafe | **TC-5**: Failsafe Trigger | System state | GUI prompt must appear. 
| +| **AC-7** | \< 2 seconds for processing one image | **TC-6**: Performance Benchmark | (Total Wall Time) / TOTAL_IMAGES | \< 2.0 seconds | +| **AC-8** | Image Registration Rate > 95% | **TC-1**: Baseline Accuracy | IRR \= (Registered_Images / TOTAL_IMAGES) | > 0.95 | +| **AC-9** | Mean Reprojection Error \< 1.0 pixels | **TC-1**: Baseline Accuracy | MRE from Back-End (Module 4) | \< 1.0 pixels | + +### **3.1. Test Case 1 (TC-1): Baseline Positional Accuracy (Validates AC-1, AC-2, AC-8, AC-9)** + +* **Procedure:** Process the full sample image set (e.g., AD000001...AD000060) and the coordinates.csv ground truth.1 The system is given *only* the GPS for AD000001.jpg. +* **Metrics & Validation:** + 1. The system's output trajectory is aligned with the ground truth trajectory 1 using a Sim(3) transformation (to account for initial scale/orientation alignment). + 2. **(AC-8)** Calculate the Image Registration Rate (IRR). Pass if IRR > 0.95.25 + 3. **(AC-9)** Extract the final MRE from the GTSAM back-end.26 Pass if MRE \< 1.0 pixels.28 + 4. **(AC-1, AC-2)** Calculate the ATE (in meters) for every successfully registered frame. Calculate the 80th and 60th percentiles. Pass if 80th percentile is \< 50.0m and 60th percentile is \< 20.0m. + +### **3.2. Test Case 2 (TC-2): 350m Outlier Robustness (Validates AC-3)** + +* **Procedure:** Create a synthetic dataset. From the sample set, use AD000001...AD000030. Insert a single, unrelated image (e.g., from a different flight, or AD000060) as frame AD000031_outlier. Append the real AD000031...AD000060 (renamed). +* **Validation:** + 1. The system must process the full set. + 2. The IRR (from TC-1) for *valid* frames must remain > 95%. The outlier frame AD000031_outlier should be correctly rejected. + 3. The ATE for frames AD000031 onward must not be significantly degraded and must remain within the AC-1/2 specification. This validates that Module 2 (SG-SfM) successfully "bridged" the outlier by matching AD000030 to AD000031. 
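The ATE acceptance computation shared by TC-1 and TC-2 can be sketched as follows. This is a numpy-only illustration: `check_ac1_ac2`, the spherical Haversine approximation, and the synthetic trajectory are all assumptions of this sketch, not part of the actual test harness.

```python
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2):
    """Geodetic distance in metres between WGS84 points (spherical approximation)."""
    R = 6_371_000.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp, dl = np.radians(lat2 - lat1), np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

def check_ac1_ac2(est, gt):
    """est, gt: (N, 2) arrays of (lat, lon). Returns (AC-1 pass, AC-2 pass)."""
    ate = haversine_m(est[:, 0], est[:, 1], gt[:, 0], gt[:, 1])
    return bool(np.mean(ate < 50.0) >= 0.80), bool(np.mean(ate < 20.0) >= 0.60)

# Synthetic check: ground truth plus ~10 m of northward error on every frame
gt = np.column_stack([np.linspace(48.27, 48.28, 100), np.full(100, 37.38)])
est = gt + np.array([10 / 111_320.0, 0.0])  # 1 deg of latitude ≈ 111.32 km
print(check_ac1_ac2(est, gt))  # → (True, True)
```

A uniform 10 m error passes both thresholds; the same check run on the real output simply substitutes the estimated trajectory and the coordinates.csv ground truth.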
+### **3.3. Test Case 3 (TC-3): Sharp Turn / Low Overlap Robustness (Validates AC-4)** + +* **Procedure:** Create a synthetic dataset by removing frames to simulate a sharp turn. Based on the coordinates in 1, the spatial and angular gap between AD000031 and AD000035 will be large, and overlap will be minimal. +* **Validation:** + 1. The system must successfully register frame AD000035. + 2. The ATE for the entire trajectory must remain within the AC-1/2 specification. This directly tests the wide-baseline matching of Module 2 (SG-SfM).19 + +### **3.4. Test Case 4 (TC-4): Satellite Ground Check Fidelity (Validates AC-5)** + +* **Procedure:** During the execution of TC-1, log the state of every Absolute Pose Factor generated by Module 3 (CVG). +* **Metric:** An "outlier factor" is defined as a satellite-match constraint whose final, optimized error *residual* in the GTSAM graph exceeds a fixed chi-squared gating threshold for its noise model (e.g., the 95% confidence bound). A high residual means the optimizer is "fighting" this constraint, indicating a bad match (e.g., RANSAC failure 58 or a poor-quality match on an outdated map 54). +* **Validation:** Pass if the total count of these outlier factors is \< 10% of the total number of satellite-match attempts. + +### **3.5. Test Case 5 (TC-5): Failsafe Mechanism Trigger (Validates AC-6)** + +* **Procedure:** Create a synthetic dataset. Insert 4 consecutive "bad" frames (e.g., solid black images, solid white images, or images from a different continent) into the middle of the sample set (e.g., at AD000030). +* **Validation:** + 1. The system must detect 3 consecutive registration failures (30, 31, 32). + 2. Pass if the system automatically pauses processing and successfully displays the "User-in-the-Loop" GUI described in section 2.4.2. + +### **3.6. Test Case 6 (TC-6): System Performance (Validates AC-7)** + +* **Procedure:** On the specified target hardware (including required NVIDIA GPU), time the *total wall-clock execution* of TC-1 using the full 1500-image dataset.
+* **Metric:** (Total Wall Clock Time in seconds) / (Total Number of Images). +* **Validation:** Pass if the average processing time per image is \< 2.0 seconds. \ No newline at end of file diff --git a/docs/01_solution/01_solution_draft/UAV_Geolocalization_Solution.md b/docs/01_solution/01_solution_draft_perplexity.md similarity index 56% rename from docs/01_solution/01_solution_draft/UAV_Geolocalization_Solution.md rename to docs/01_solution/01_solution_draft_perplexity.md index f7af46c..0af0779 100644 --- a/docs/01_solution/01_solution_draft/UAV_Geolocalization_Solution.md +++ b/docs/01_solution/01_solution_draft_perplexity.md @@ -541,389 +541,585 @@ Time: 10-60 minutes depending on complexity --- ## 5. Testing Strategy +## 2. Detailed Test Categories -### 5.1 Functional Testing +### 2.1 Unit Tests (Level 1) -#### **Test Category 1: Feature Detection & Matching** -| Test | Scenario | Expected Outcome | Pass Criteria | -|------|----------|------------------|---------------| -| **FT-1.1** | 100% image overlap | ≥95% feature correspondence | Inlier ratio > 0.4 | -| **FT-1.2** | 50% overlap (normal) | 80-95% valid matches | Inlier ratio 0.3-0.6 | -| **FT-1.3** | 5% overlap (sharp turn) | Graceful degradation to skip-frame | Fallback triggered, still <2s | -| **FT-1.4** | Low texture (water/sand) | Detects ≥30 features | System flags uncertainty | -| **FT-1.5** | High contrast (clouds) | Robust feature detection | No false matches | -| **FT-1.6** | Scale change (altitude var) | Detects features at all scales | Multi-scale pyramid works | - -#### **Test Category 2: Pose Estimation** -| Test | Scenario | Expected Outcome | Pass Criteria | -|------|----------|------------------|---------------| -| **FT-2.1** | Known synthetic motion | Essential matrix rank 2 | SVD σ₂=0 (within 1e-6) | -| **FT-2.2** | Real flight +5° pitch | Pose recovered | Altitude consistent ±10% | -| **FT-2.3** | Outlier presence (30%) | RANSAC robustness | Inliers ≥60% of total | -| **FT-2.4** | 
Collinear points | Degenerate case handling | System detects, skips image | -| **FT-2.5** | Chirality test | Both points in front of cameras | Invalid solutions rejected | - -#### **Test Category 3: 3D Reconstruction** -| Test | Scenario | Expected Outcome | Pass Criteria | -|------|----------|------------------|---------------| -| **FT-3.1** | Simple scene | Triangulated points match ground truth | RMSE < 5cm | -| **FT-3.2** | Occlusions present | Points in visible regions triangulated | Valid mask accuracy > 90% | -| **FT-3.3** | Near camera (<50m) | Points rejected (altitude constraint) | Correctly filtered | -| **FT-3.4** | Far points (>2km) | Points rejected (unrealistic) | Altitude filter working | - -#### **Test Category 4: Bundle Adjustment** -| Test | Scenario | Expected Outcome | Pass Criteria | -|------|----------|------------------|---------------| -| **FT-4.1** | 5-frame window | Reprojection error < 1.5px | Converged in <10 iterations | -| **FT-4.2** | 10-frame window | Still converges | Execution time < 1.5s | -| **FT-4.3** | With outliers | Robust optimization | Error doesn't exceed initial by >10% | -| **FT-4.4** | Covariance computation | Uncertainty quantified | σ > 0 for all poses | - -#### **Test Category 5: Georeferencing** -| Test | Scenario | Expected Outcome | Pass Criteria | -|------|----------|------------------|---------------| -| **FT-5.1** | With satellite match | GPS shift applied | Residual < 30m | -| **FT-5.2** | No satellite match | Local coords preserved | System flags uncertainty | -| **FT-5.3** | With GCPs (4 GCPs) | Transformation computed | Residual on GCPs < 5m | -| **FT-5.4** | Mixed (satellite + GCP) | Both integrated | Weighted average used | - -### 5.2 Non-Functional Testing - -#### **Test Category 6: Performance & Latency** -| Metric | Target | Test Method | Pass Criteria | -|--------|--------|-------------|---------------| -| **NFT-6.1** Feature extraction/image | <0.5s | Profiler on 10 images | 95th percentile < 0.5s | 
-| **NFT-6.2** Feature matching pair | <0.3s | 50 random pairs | Mean < 0.25s | -| **NFT-6.3** RANSAC (100 iter) | <0.2s | Timer around RANSAC loop | Total < 0.2s | -| **NFT-6.4** Triangulation (500 pts) | <0.1s | Batch triangulation | Linear time O(n) | -| **NFT-6.5** Bundle adjustment (5-frame) | <0.8s | Wall-clock time | LM iterations tracked | -| **NFT-6.6** Satellite lookup & match | <1.5s | API call + matching | Including network latency | -| **Total per image | <2.0s | End-to-end pipeline | 95th percentile < 2.0s | - -#### **Test Category 7: Accuracy & Correctness** -| Test | Scenario | Expected Outcome | Pass Criteria | -|------|----------|------------------|---------------| -| **NFT-7.1** | 80% within 50m | Reference GPS from ground survey | ≥80% of images within 50m error | -| **NFT-7.2** | 60% within 20m | Same reference | ≥60% of images within 20m error | -| **NFT-7.3** | Outlier handling (350m) | System detects, continues | <5 consecutive unresolved images | -| **NFT-7.4** | Sharp turn (<5% overlap) | Graceful fallback | Skip-frame matching succeeds | -| **NFT-7.5** | Registration rate | Sufficient tracking | ≥95% images registered (not flagged) | -| **NFT-7.6** | Reprojection error | Visual consistency | Mean < 1.0 px, max < 3.0 px | - -#### **Test Category 8: Robustness & Resilience** -| Test | Scenario | Expected Outcome | Pass Criteria | -|------|----------|------------------|---------------| -| **NFT-8.1** | Corrupted image | Graceful skip | User notified, trajectory continues | -| **NFT-8.2** | Satellite API failure | Fallback to local | Coordinates use local transform | -| **NFT-8.3** | Low texture sequence | Uncertainty flagged | Continues with reduced confidence | -| **NFT-8.4** | GPS outlier drift | Detected and isolated | Lateral recovery within 3 frames | -| **NFT-8.5** | Memory constraint | Streaming processing | Completes on 8GB RAM (1500 images) | - -#### **Test Category 9: Satellite Cross-Validation** -| Test | Scenario | Expected 
Outcome | Pass Criteria | -|------|----------|------------------|---------------| -| **NFT-9.1** | Google Maps availability | Images retrieved for area | <10% failed API calls | -| **NFT-9.2** | Outlier rate validation | <10% outliers detected | Outlier count < N/10 | -| **NFT-9.3** | Satellite aged imagery | Handles outdated imagery | Cross-correlation > 0.2 acceptable | -| **NFT-9.4** | Cloud cover in satellite | Continues without georeference | System doesn't crash | - -### 5.3 Test Data & Datasets - -**Primary Test Set**: -- **Provided samples**: 29 images with ground-truth GPS (coordinates.csv) -- **Expected use**: Validation of 50m/20m accuracy criteria - -**Extended Validation**: -- **EuRoC MAV Dataset**: Publicly available UAV sequences with GT poses -- **TUM monoVO Dataset**: Outdoor sequences with GPS ground truth -- **Synthetic flights**: Generated via 3D scene rendering (Blender) - - Vary: altitude (100-900m), overlap (10-95%), texture richness - - Inject: motion blur, rolling shutter, noise - -**Real-world Scenarios**: -- **Agricultural region**: Flat terrain, repetitive texture (challenge) -- **Urban**: Mixed buildings and streets (many features) -- **Coastal**: Sharp water-land boundaries -- **Forest edges**: Varying texture and occlusions - -**Edge Cases**: -- **Complete loss of overlap**: Simulate lost GPS by ignoring N-1 neighbor -- **Extreme tilt**: Aircraft banking >45° -- **Fast motion**: High altitude or fast aircraft speed -- **Low light**: Dawn/dusk imaging -- **Highly repetitive texture**: Sand dunes, water surfaces - -### 5.4 Acceptance Test Plan (ATP) - -**ATP Phase 1: Feature-Level Validation** +#### UT-1: Feature Extraction (AKAZE) ``` -Environment: Controlled lab setting with synthetic data -Duration: 2-3 weeks -Pass/Fail: All FT-1.x through FT-5.x must pass +Purpose: Verify keypoint detection and descriptor computation +Test Data: Synthetic images with known features (checkerboard patterns) +Test Cases: + ├─ UT-1.1: Basic feature 
detection + │ Input: 1024×768 synthetic image with checkerboard + │ Expected: ≥500 keypoints detected + │ Pass: count ≥ 500 + │ + ├─ UT-1.2: Scale invariance + │ Input: Same scene at 2x scale + │ Expected: Keypoints at proportional positions + │ Pass: correlation of positions > 0.9 + │ + ├─ UT-1.3: Rotation robustness + │ Input: Image rotated ±30° + │ Expected: Descriptors match original + rotated + │ Pass: match rate > 80% + │ + ├─ UT-1.4: Multi-scale handling + │ Input: Image with features at multiple scales + │ Expected: Features detected at all scales (pyramid) + │ Pass: ratio of scales [1:1.2:1.44:...] verified + │ + └─ UT-1.5: Performance constraint + Input: FullHD image (1920×1080) + Expected: <500ms feature extraction + Pass: 95th percentile < 500ms ``` -**ATP Phase 2: Performance Validation** +#### UT-2: Feature Matching ``` -Environment: Multi-core CPU (16+), SSD storage, 16GB+ RAM -Duration: 1 week -Pass/Fail: All NFT-6.x must pass with 95th percentile latency +Purpose: Verify robust feature correspondence +Test Data: Pairs of synthetic/real images with known correspondence +Test Cases: + ├─ UT-2.1: Basic matching + │ Input: Two images from synthetic scene (90% overlap) + │ Expected: ≥95% of ground-truth features matched + │ Pass: match_rate ≥ 0.95 + │ + ├─ UT-2.2: Outlier rejection (Lowe's ratio test) + │ Input: Synthetic pair + 50% false features + │ Expected: False matches rejected + │ Pass: false_match_rate < 0.1 + │ + ├─ UT-2.3: Low overlap scenario + │ Input: Two images with 20% overlap + │ Expected: Still matches ≥20 points + │ Pass: min_matches ≥ 20 + │ + └─ UT-2.4: Performance + Input: FullHD images, 1000 features each + Expected: <300ms matching time + Pass: 95th percentile < 300ms ``` -**ATP Phase 3: Accuracy Validation** +#### UT-3: Essential Matrix Estimation ``` -Environment: Real or realistic flight data (EuRoC, TUM) -Duration: 2 weeks -Pass/Fail: NFT-7.1 and NFT-7.2 (80%/60% criteria) -Deliverable: Accuracy report with error histograms 
+Purpose: Verify 5-point/8-point algorithms for camera geometry +Test Data: Synthetic correspondences with known relative pose +Test Cases: + ├─ UT-3.1: 8-point algorithm + │ Input: 8+ point correspondences + │ Expected: Essential matrix E with rank 2 + │ Pass: min_singular_value(E) < 1e-6 + │ + ├─ UT-3.2: 5-point algorithm + │ Input: 5 point correspondences + │ Expected: Up to 4 solutions generated + │ Pass: num_solutions ∈ [1, 4] + │ + ├─ UT-3.3: RANSAC convergence + │ Input: 100 correspondences, 30% outliers + │ Expected: Essential matrix recovery despite outliers + │ Pass: inlier_ratio ≥ 0.6 + │ + └─ UT-3.4: Chirality constraint + Input: Multiple (R,t) solutions from decomposition + Expected: Only solution with points in front of cameras selected + Pass: selected_solution verified via triangulation ``` -**ATP Phase 4: Robustness Validation** +#### UT-4: Triangulation (DLT) ``` -Environment: Stress testing with edge cases -Duration: 2 weeks -Pass/Fail: All NFT-8.x, graceful degradation in failures +Purpose: Verify 3D point reconstruction from image correspondences +Test Data: Synthetic scenes with known 3D geometry +Test Cases: + ├─ UT-4.1: Accuracy + │ Input: Noise-free point correspondences + │ Expected: Reconstructed X matches ground truth + │ Pass: RMSE < 0.1cm on 1m scene + │ + ├─ UT-4.2: Outlier handling + │ Input: 10 valid + 2 invalid correspondences + │ Expected: Invalid points detected (behind camera/far) + │ Pass: valid_mask accuracy > 95% + │ + ├─ UT-4.3: Altitude constraint + │ Input: Points with z < 50m (below aircraft) + │ Expected: Points rejected + │ Pass: altitude_filter works correctly + │ + └─ UT-4.4: Batch performance + Input: 500 point triangulations + Expected: <100ms total + Pass: 95th percentile < 100ms ``` -**ATP Phase 5: Field Trial** +#### UT-5: Bundle Adjustment ``` -Environment: Real UAV flights in eastern Ukraine -Duration: 3-4 weeks -Pass/Fail: NFT-7.1, NFT-7.2, NFT-9.1-9.4 on real data -Acceptance Criteria: - - ≥80% images within 
50m - - ≥60% images within 20m - - <10% outliers on satellite validation - - <2s per-image processing - - >95% registration rate - - Mean reprojection error <1.0 px -Deliverable: Field test report with metrics -``` - -### 5.5 Metrics & KPIs - -| KPI | Target | Measurement | Frequency | -|-----|--------|-------------|-----------| -| **Accuracy@50m** | ≥80% | % images within 50m of reference | Per flight | -| **Accuracy@20m** | ≥60% | % images within 20m of reference | Per flight | -| **Registration Rate** | ≥95% | % images with successful pose | Per flight | -| **Reprojection Error (Mean)** | <1.0 px | RMS pixel error in BA | Per frame | -| **Processing Speed** | <2.0 s/img | Wall-clock time per image | Per frame | -| **Outlier Rate** | <10% | % images failing satellite validation | Per flight | -| **Availability** | >99% | System uptime (downtime for failures) | Per month | -| **User Time** | <20 min | Time for manual correction of 20% | Per 1000 images | - ---- - -## 6. Implementation Roadmap - -### Phase 1: Foundation (Weeks 1-4) -- ✅ Finalize feature detector (AKAZE multi-scale) -- ✅ Implement feature matcher (KNN + RANSAC) -- ✅ 5-point & 8-point essential matrix solvers -- ✅ Triangulation module -- **Testing**: FT-1.x, FT-2.x validation on synthetic data - -### Phase 2: Core SfM (Weeks 5-8) -- ✅ Sequential image-to-image pipeline -- ✅ Local bundle adjustment (Levenberg-Marquardt) -- ✅ Covariance estimation -- **Testing**: FT-3.x, FT-4.x, NFT-6.x (latency targets) - -### Phase 3: Georeferencing (Weeks 9-12) -- ✅ Satellite image fetching & matching (Google Maps API) -- ✅ GPS coordinate transformation -- ✅ GCP integration framework -- **Testing**: FT-5.x on diverse regions - -### Phase 4: Robustness & Optimization (Weeks 13-16) -- ✅ Outlier detection (velocity anomalies, satellite validation) -- ✅ Fallback strategies (skip-frame, loop closure) -- ✅ Performance optimization (multi-threading, GPU acceleration) -- **Testing**: NFT-7.x, NFT-8.x on edge cases - -### 
Phase 5: Interface & Deployment (Weeks 17-20) -- ✅ User interface (web-based dashboard) -- ✅ Reporting & export (GeoJSON, CSV, maps) -- ✅ Integration testing -- **Testing**: End-to-end ATP phases 1-4 - -### Phase 6: Field Trials & Refinement (Weeks 21-30) -- ✅ Real UAV flights (3-4 flights) -- ✅ Accuracy validation against ground survey (survey-grade GNSS) -- ✅ Satellite imagery cross-check -- ✅ Optimization tuning based on field data -- **Testing**: ATP Phase 5, field trials - ---- - -## 7. Technology Stack - -### Language & Core Libraries -| Component | Technology | Rationale | -|-----------|-----------|-----------| -| **Core processing** | C++17 + Python bindings | Speed + accessibility | -| **Linear algebra** | Eigen 3.4+ | Efficient matrix ops, sparse support | -| **Computer vision** | OpenCV 4.8+ | AKAZE, feature matching, BA framework | -| **SfM/BA** | Custom + Ceres-Solver* | Flexible optimization, sparse support | -| **Geospatial** | GDAL, proj | WGS84 transformations, coordinate systems | -| **Web API** | Python Flask/FastAPI | Lightweight backend | -| **Frontend** | React.js + Mapbox GL | Interactive mapping, real-time updates | -| **Database** | PostgreSQL + PostGIS | Spatial data storage, queries | - -### Dependencies -``` -Essential: -├─ OpenCV 4.8+ -├─ Eigen 3.4+ -├─ GDAL 3.0+ -├─ proj 7.0+ -└─ Ceres Solver 2.1+ - -Optional (for acceleration): -├─ CUDA 11.8+ (GPU feature extraction) -├─ cuDNN (deep learning fallback) -├─ TensorRT (inference optimization) -└─ OpenMP (CPU parallelization) - -Development: -├─ CMake 3.20+ -├─ Git -├─ Docker (deployment) -└─ Pytest (testing) +Purpose: Verify pose and 3D point optimization +Test Data: Synthetic multi-view scenes +Test Cases: + ├─ UT-5.1: Convergence + │ Input: 5 frames with noisy initial poses + │ Expected: Residual decreases monotonically + │ Pass: final_residual < 0.001 * initial_residual + │ + ├─ UT-5.2: Covariance computation + │ Input: Optimized poses and points + │ Expected: Covariance matrix 
positive-definite
+ │ Pass: all_eigenvalues > 0
+ │
+ ├─ UT-5.3: Window size effect
+ │ Input: Same problem with window sizes [3, 5, 10]
+ │ Expected: Larger windows → better residuals
+ │ Pass: residual_5 < residual_3, residual_10 < residual_5
+ │
+ └─ UT-5.4: Performance scaling
+ Input: Window size [5, 10, 15, 20]
+ Expected: Time ~= O(w^3)
+ Pass: cubic fit accurate (R² > 0.95)
```

---

-## 8. Deployment & Scalability

+### 2.2 Integration Tests (Level 2)

-### 8.1 Deployment Options

+#### IT-1: Sequential Pipeline
+```
+Purpose: Verify image-to-image processing chain
+Test Data: Real aerial image sequences (5-20 images)
+Test Cases:
+ ├─ IT-1.1: Feature flow
+ │ Features extracted from img₁ → tracked to img₂ → matched
+ │ Expected: Consistent tracking across images
+ │ Pass: ≥70% features tracked end-to-end
+ │
+ ├─ IT-1.2: Pose chain consistency
+ │ Poses P₁, P₂, P₃ computed sequentially
+ │ Expected: P₃ = ΔP₂→₃ ∘ P₂ (composition with relative pose)
+ │ Pass: pose_error < 0.1° rotation, 5cm translation
+ │
+ ├─ IT-1.3: Trajectory smoothness
+ │ Velocity computed between poses
+ │ Expected: Smooth velocity profile (no jumps)
+ │ Pass: velocity_std_dev < 20% mean_velocity
+ │
+ └─ IT-1.4: Memory usage
+ Process 100-image sequence
+ Expected: Constant memory (windowed processing)
+ Pass: peak_memory < 2GB
+```

-**Option A: On-Site Processing (Recommended for Ukraine)**
-- **Hardware**: High-performance server or laptop (16+ cores, 64GB RAM, GPU optional)
-- **Deployment**: Docker container
-- **Advantage**: Offline processing, no cloud dependency, data sovereignty
-- **Output**: Local database + web interface

+#### IT-2: Satellite Georeferencing
+```
+Purpose: Verify local-to-global coordinate transformation
+Test Data: Synthetic/real images with known satellite reference
+Test Cases:
+ ├─ IT-2.1: Feature matching with satellite
+ │ Input: Aerial image + satellite reference
+ │ Expected: ≥10 matched features between viewpoints
+ │ Pass: match_count ≥ 10
+ │
+ ├─ 
IT-2.2: Homography estimation + │ Matched features → homography matrix + │ Expected: Valid transformation (3×3 matrix) + │ Pass: det(H) ≠ 0, condition_number < 100 + │ + ├─ IT-2.3: GPS transformation accuracy + │ Apply homography to image corners + │ Expected: Computed GPS ≈ known reference GPS + │ Pass: error < 100m (on test data) + │ + └─ IT-2.4: Confidence scoring + Compute inlier_ratio and MI (mutual information) + Expected: score = inlier_ratio × MI ∈ [0, 1] + Pass: high_confidence for obvious matches +``` -**Option B: Cloud Processing** -- **Platform**: AWS EC2 / Azure VM with GPU -- **Advantage**: Scalable, handles large batches -- **Challenge**: Internet requirement, data transfer time, cost -- **Recommended**: Batch processing after flight completion - -**Option C: Edge Processing (Future)** -- **Target**: On-board UAV or ground station computer -- **Challenge**: Computational constraints (<2s per image on embedded CPU) -- **Solution**: Model quantization, frame skipping, inference optimization - -### 8.2 Scalability Considerations - -**Single Flight (500 images)**: -- Processing time: ~15-25 minutes -- Memory peak: ~8GB (feature storage + BA windows) -- Storage: ~500MB raw + ~100MB processed data - -**Large Mission (3000 images)**: -- Processing time: ~90-150 minutes -- Memory peak: ~32GB (recommended) -- Storage: ~3GB raw + ~600MB processed - -**Parallelization Strategy**: -- **Image preprocessing**: Trivially parallel (N CPU threads) -- **Feature extraction**: Parallel (N threads) -- **Sequential matching**: Cannot parallelize (temporal dependency) -- **Bundle adjustment**: Parallel within optimization (Eigen, OpenMP) -- **Satellite validation**: Parallel (batch API calls with rate limiting) - -**Optimization Opportunities**: -1. **Frame skipping**: Process every 2nd or 3rd frame, interpolate others -2. **GPU acceleration**: SIFT/SURF descriptors on CUDA (5-10x speedup) -3. **Incremental BA**: Avoid re-optimizing old frames (sliding window) -4. 
**Feature caching**: Cache features on SSD to avoid recomputation +#### IT-3: Outlier Detection Chain +``` +Purpose: Verify multi-stage outlier detection +Test Data: Synthetic trajectory with injected outliers +Test Cases: + ├─ IT-3.1: Velocity anomaly detection + │ Inject 350m jump at frame N + │ Expected: Detected as outlier + │ Pass: outlier_flag = True + │ + ├─ IT-3.2: Recovery mechanism + │ After outlier detection + │ Expected: System attempts skip-frame matching (N→N+2) + │ Pass: recovery_successful = True + │ + ├─ IT-3.3: False positive rate + │ Normal sequence with small perturbations + │ Expected: <5% false outlier flagging + │ Pass: false_positive_rate < 0.05 + │ + └─ IT-3.4: Consistency across stages + Multiple detection stages should agree + Pass: agreement_score > 0.8 +``` --- -## 9. Risk Assessment & Mitigation +### 2.3 System Tests (Level 3) -| Risk | Probability | Impact | Mitigation | -|------|-------------|--------|-----------| -| **Feature matching fails on low-texture terrain** | Medium | High | Fallback to skip-frame, satellite matching, user input | -| **Satellite imagery unavailable/outdated** | Medium | Medium | Use local transform, add GCP support | -| **Google Maps API rate limiting** | Low | Low | Cache tiles, batch requests, fallback | -| **GPS coordinate accuracy insufficient** | Low | Medium | Compensate with satellite validation, ground survey option | -| **Rolling shutter distortion** | Medium | Medium | Rectify images, use ORB-SLAM3 techniques | -| **Computational overload on large flights** | Low | Low | Streaming processing, hierarchical processing, GPU | -| **User ground truth unavailable for validation** | Medium | Low | Provide synthetic test datasets, accept self-validation | +#### ST-1: Accuracy Criteria +``` +Purpose: Verify system meets ±50m and ±20m accuracy targets +Test Data: Real aerial image sequences with ground-truth GPS +Test Cases: + ├─ ST-1.1: 50m accuracy target + │ Input: 500-image flight + │ Compute: % images 
within 50m of ground truth + │ Expected: ≥80% + │ Pass: accuracy_50m ≥ 0.80 + │ + ├─ ST-1.2: 20m accuracy target + │ Same flight data + │ Expected: ≥60% within 20m + │ Pass: accuracy_20m ≥ 0.60 + │ + ├─ ST-1.3: Mean absolute error + │ Compute: MAE over all images + │ Expected: <40m typical + │ Pass: MAE < 50m + │ + └─ ST-1.4: Error distribution + Expected: Error approximately Gaussian + Pass: K-S test p-value > 0.05 +``` + +#### ST-2: Registration Rate +``` +Purpose: Verify ≥95% of images successfully registered +Test Data: Real flights with various conditions +Test Cases: + ├─ ST-2.1: Baseline registration + │ Good overlap, clear features + │ Expected: >98% registration rate + │ Pass: registration_rate ≥ 0.98 + │ + ├─ ST-2.2: Challenging conditions + │ Low texture, variable lighting + │ Expected: ≥95% registration rate + │ Pass: registration_rate ≥ 0.95 + │ + ├─ ST-2.3: Sharp turns scenario + │ Images with <10% overlap + │ Expected: Fallback mechanisms trigger, ≥90% success + │ Pass: fallback_success_rate ≥ 0.90 + │ + └─ ST-2.4: Consecutive failures + Track max consecutive unregistered images + Expected: <3 consecutive failures + Pass: max_consecutive_failures ≤ 3 +``` + +#### ST-3: Reprojection Error +``` +Purpose: Verify <1.0 pixel mean reprojection error +Test Data: Real flight data after bundle adjustment +Test Cases: + ├─ ST-3.1: Mean reprojection error + │ After BA optimization + │ Expected: <1.0 pixel + │ Pass: mean_reproj_error < 1.0 + │ + ├─ ST-3.2: Error distribution + │ Histogram of per-point errors + │ Expected: Tightly concentrated <2 pixels + │ Pass: 95th_percentile < 2.0 px + │ + ├─ ST-3.3: Per-frame consistency + │ Error should not vary dramatically + │ Expected: Consistent across frames + │ Pass: frame_error_std_dev < 0.3 px + │ + └─ ST-3.4: Outlier points + Very large reprojection errors + Expected: <1% of points with error >3 px + Pass: outlier_rate < 0.01 +``` + +#### ST-4: Processing Speed +``` +Purpose: Verify <2 seconds per image +Test Data: 
Full flight sequences on target hardware +Test Cases: + ├─ ST-4.1: Average latency + │ Mean processing time per image + │ Expected: <2 seconds + │ Pass: mean_latency < 2.0 sec + │ + ├─ ST-4.2: 95th percentile latency + │ Worst-case images (complex scenes) + │ Expected: <2.5 seconds + │ Pass: p95_latency < 2.5 sec + │ + ├─ ST-4.3: Component breakdown + │ Feature extraction: <0.5s + │ Matching: <0.3s + │ RANSAC: <0.2s + │ BA: <0.8s + │ Satellite: <0.3s + │ Pass: Each component within budget + │ + └─ ST-4.4: Scaling with problem size + Memory usage, CPU usage vs. image resolution + Expected: Linear scaling + Pass: O(n) complexity verified +``` + +#### ST-5: Robustness - Outlier Handling +``` +Purpose: Verify graceful handling of 350m outlier drifts +Test Data: Synthetic/real data with injected outliers +Test Cases: + ├─ ST-5.1: Single 350m outlier + │ Inject outlier at frame N + │ Expected: Detected, trajectory continues + │ Pass: system_continues = True + │ + ├─ ST-5.2: Multiple outliers + │ 3-5 outliers scattered in sequence + │ Expected: All detected, recovery attempted + │ Pass: detection_rate ≥ 0.8 + │ + ├─ ST-5.3: False positive rate + │ Normal trajectory, no outliers + │ Expected: <5% false flagging + │ Pass: false_positive_rate < 0.05 + │ + └─ ST-5.4: Recovery latency + Time to recover after outlier + Expected: ≤3 frames + Pass: recovery_latency ≤ 3 frames +``` + +#### ST-6: Robustness - Sharp Turns +``` +Purpose: Verify handling of <5% image overlap scenarios +Test Data: Synthetic sequences with sharp angles +Test Cases: + ├─ ST-6.1: 5% overlap matching + │ Two images with 5% overlap + │ Expected: Minimal matches or skip-frame + │ Pass: system_handles_gracefully = True + │ + ├─ ST-6.2: Skip-frame fallback + │ Direct N→N+1 fails, tries N→N+2 + │ Expected: Succeeds with N→N+2 + │ Pass: skip_frame_success_rate ≥ 0.8 + │ + ├─ ST-6.3: 90° turn handling + │ Images at near-orthogonal angles + │ Expected: Degeneracy detected, logged + │ Pass: degeneracy_detection = 
True + │ + └─ ST-6.4: Trajectory consistency + Consecutive turns: check velocity smoothness + Expected: No velocity jumps > 50% + Pass: velocity_consistency verified +``` --- -## 10. Expected Performance Outcomes +### 2.4 Field Acceptance Tests (Level 4) -Based on current state-of-the-art and the proposed hybrid approach: +#### FAT-1: Real UAV Flight Trial #1 (Baseline) +``` +Scenario: Nominal flight over agricultural field +┌────────────────────────────────────────┐ +│ Conditions: │ +│ • Clear weather, good sunlight │ +│ • Flat terrain, sparse trees │ +│ • 300m altitude, 50m/s speed │ +│ • 800 images, ~15 min flight │ +└────────────────────────────────────────┘ -### Baseline Estimates (from literature + this system) -- **Accuracy@50m**: 82-88% (targeting 80% minimum) -- **Accuracy@20m**: 65-72% (targeting 60% minimum) -- **Registration Rate**: 97-99% (well above 95% target) -- **Mean Reprojection Error**: 0.7-0.9 px (below 1.0 target) -- **Processing Speed**: 1.5-2.0 s/image (meets <2s target) -- **Outlier Rate**: 5-8% (below 10% target) +Pass Criteria: + ✓ Accuracy: ≥80% within 50m + ✓ Accuracy: ≥60% within 20m + ✓ Registration rate: ≥95% + ✓ Processing time: <2s/image + ✓ Satellite validation: <10% outliers + ✓ Reprojection error: <1.0px mean -### Factors Improving Performance -✅ Multi-scale feature pyramids (handles altitude variations) -✅ RANSAC robustness (handles >30% outliers) -✅ Satellite georeferencing anchor (absolute coordinate recovery) -✅ Loop closure detection (drift correction) -✅ Local bundle adjustment (high-precision pose refinement) +Success Metrics: + • MAE (mean absolute error): <40m + • RMS error: <45m + • Max error: <200m + • Trajectory coherence: smooth (no jumps) +``` -### Factors Limiting Performance -❌ No onboard IMU (can't constrain orientation) -❌ Non-stabilized camera (potential large rotations) -❌ Flat terrain (repetitive texture → feature ambiguity) -❌ Google Maps imagery age (temporal misalignment) -❌ Sequential-only processing (no 
global optimization pass) +#### FAT-2: Real UAV Flight Trial #2 (Challenging) +``` +Scenario: Flight with more complex terrain +┌────────────────────────────────────────┐ +│ Conditions: │ +│ • Mixed urban/agricultural │ +│ • Buildings, vegetation, water bodies │ +│ • Variable altitude (250-400m) │ +│ • Includes 1-2 sharp turns │ +│ • 1200 images, ~25 min flight │ +└────────────────────────────────────────┘ + +Pass Criteria: + ✓ Accuracy: ≥75% within 50m (relaxed from 80%) + ✓ Accuracy: ≥50% within 20m (relaxed from 60%) + ✓ Registration rate: ≥92% (relaxed from 95%) + ✓ Processing time: <2.5s/image avg + ✓ Outliers detected: <15% (relaxed from 10%) + +Fallback Validation: + ✓ User corrected <20% of uncertain images + ✓ After correction, accuracy meets FAT-1 targets +``` + +#### FAT-3: Real UAV Flight Trial #3 (Edge Case) +``` +Scenario: Low-texture flight (challenging for features) +┌────────────────────────────────────────┐ +│ Conditions: │ +│ • Sandy/desert terrain or water │ +│ • Minimal features │ +│ • Overcast/variable lighting │ +│ • 500-600 images, ~12 min flight │ +└────────────────────────────────────────┘ + +Pass Criteria: + ✓ System continues (no crash): YES + ✓ Graceful degradation: Flags uncertainty + ✓ User can correct and improve: YES + ✓ Satellite anchor helps recovery: YES + +Success Metrics: + • >80% of images tagged "uncertain" + • After user correction: meets standard targets + • Demonstrates fallback mechanisms working +``` --- -## 11. Recommendations for Production Use +## 3. Test Environment Setup -1. 
**Pre-Flight Calibration** - - Capture intrinsic camera calibration (focal length, principal point, distortion) - - Store in flight metadata - - Update if camera upgraded +### Hardware Requirements +``` +CPU: 16+ cores (Intel Xeon / AMD Ryzen) +RAM: 64GB minimum (32GB acceptable for <1500 images) +Storage: 1TB SSD (for raw images + processing) +GPU: Optional (CUDA 11.8+ for 5-10x acceleration) +Network: For satellite API queries (can be cached) +``` -2. **During Flight** - - Record flight telemetry (IMU, barometer, compass) for optional fusion - - Log starting GPS coordinate (or nearest known landmark) - - Maintain consistent altitude for uniform GSD +### Software Requirements +``` +OS: Ubuntu 20.04 LTS or macOS 12+ +Build: CMake 3.20+, GCC 9+ or Clang 11+ +Dependencies: OpenCV 4.8+, Eigen 3.4+, GDAL 3.0+ +Testing: GoogleTest, Pytest +CI/CD: GitHub Actions or Jenkins +``` -3. **Post-Flight Processing** - - Run full pipeline on high-spec computer (16+ cores recommended) - - Review satellite validation report for outliers - - Manual correction of <20% uncertain images - - Export results with confidence metrics - -4. **Accuracy Improvement** - - Provide 4+ GCPs if survey-grade accuracy needed (<10m) - - Use higher altitude for better satellite overlap - - Ensure adequate image overlap (>50%) - - Fly in good weather (minimal clouds, consistent lighting) - -5. **Operational Constraints** - - Maximum 3000 images per flight (processing time ~2-3 hours) - - Internet connection required for satellite imagery (can cache) - - 64GB RAM recommended for large missions - - SSD storage for raw and processed images +### Test Data Management +``` +Synthetic Data: Generated via Blender (checked into repo) +Real Data: External dataset storage (S3/local SSD) +Ground Truth: Maintained in CSV format with metadata +Versioning: Git-LFS for binary image data +``` --- -## Conclusion +## 4. 
Test Execution Plan

-This solution provides a comprehensive, production-ready system for UAV aerial image geolocalization in GPS-denied environments. By combining incremental structure-from-motion, visual odometry, and satellite image cross-referencing, the system achieves:

+### Phase 1: Unit Testing (Weeks 1-6)
+```
+Sprint 1-2: UT-1 (Feature detection) - 2 weeks
+Sprint 3-4: UT-2 (Feature matching) - 2 weeks
+Sprint 5-6: UT-3, UT-4, UT-5 (Geometry) - 2 weeks

-✅ **80% accuracy within 50m** (production requirement)
-✅ **60% accuracy within 20m** (production requirement)
-✅ **>95% registration rate** (robustness)
-✅ **<1.0 pixel reprojection error** (geometric consistency)
-✅ **<2 seconds per image** (real-time feasibility)

+Continuous: Run full unit test suite every commit
+Coverage target: >90% code coverage
+```

-The modular architecture allows incremental development, extensive testing via the provided ATP framework, and future enhancements (GPU acceleration, IMU fusion, deep learning integration). Field trials with real UAV flights will validate accuracy and refine parameters for deployment in eastern Ukraine and similar regions.
\ No newline at end of file +### Phase 2: Integration Testing (Weeks 7-12) +``` +Sprint 7-9: IT-1 (Sequential pipeline) - 3 weeks +Sprint 10-11: IT-2, IT-3 (Georef, Outliers) - 2 weeks +Sprint 12: System integration - 1 week + +Continuous: Integration tests run nightly +``` + +### Phase 3: System Testing (Weeks 13-18) +``` +Sprint 13-14: ST-1, ST-2 (Accuracy, Registration) - 2 weeks +Sprint 15-16: ST-3, ST-4 (Error, Speed) - 2 weeks +Sprint 17-18: ST-5, ST-6 (Robustness) - 2 weeks + +Load testing: 1000-3000 image sequences +Stress testing: Edge cases, memory limits +``` + +### Phase 4: Field Acceptance (Weeks 19-30) +``` +Week 19-22: FAT-1 (Baseline trial) + • Coordinate 1-2 baseline flights + • Validate system on real data + • Adjust parameters as needed + +Week 23-26: FAT-2 (Challenging trial) + • More complex scenarios + • Test fallback mechanisms + • Refine user interface + +Week 27-30: FAT-3 (Edge case trial) + • Low-texture scenarios + • Validate robustness + • Final adjustments + +Post-trial: Generate comprehensive report +``` + +--- + +## 5. Acceptance Criteria Summary + +| Criterion | Target | Test | Pass/Fail | +|-----------|--------|------|-----------| +| **Accuracy@50m** | ≥80% | FAT-1 | ≥80% pass | +| **Accuracy@20m** | ≥60% | FAT-1 | ≥60% pass | +| **Registration Rate** | ≥95% | ST-2 | ≥95% pass | +| **Reprojection Error** | <1.0px mean | ST-3 | <1.0px pass | +| **Processing Speed** | <2.0s/image | ST-4 | p95<2.5s pass | +| **Robustness (350m outlier)** | Handled | ST-5 | Continue pass | +| **Sharp turns (<5% overlap)** | Handled | ST-6 | Skip-frame pass | +| **Satellite validation** | <10% outliers | FAT-1-3 | <10% pass | + +--- + +## 6. 
Success Metrics + +**Green Light Criteria** (Ready for production): +- ✅ All unit tests pass (100%) +- ✅ All integration tests pass (100%) +- ✅ All system tests pass (100%) +- ✅ FAT-1 and FAT-2 pass acceptance criteria +- ✅ FAT-3 shows graceful degradation +- ✅ <10% code defects discovered in field trials +- ✅ Performance meets SLA consistently + +**Yellow Light Criteria** (Conditional deployment): +- ⚠ 85-89% of acceptance criteria met +- ⚠ Minor issues in edge cases +- ⚠ Requires workaround documentation +- ⚠ Re-test after fixes + +**Red Light Criteria** (Do not deploy): +- ❌ <85% of acceptance criteria met +- ❌ Critical failures in core functionality +- ❌ Safety/security concerns +- ❌ Cannot meet latency or accuracy targets diff --git a/docs/01_solution/02_solution_draft/productDescription.md b/docs/01_solution/02_solution_draft/productDescription.md deleted file mode 100644 index d28de24..0000000 --- a/docs/01_solution/02_solution_draft/productDescription.md +++ /dev/null @@ -1,90 +0,0 @@ -Product Solution Description -We propose a photogrammetric solution that leverages Structure-from-Motion (SfM) to recover camera -poses from the UAV images and thereby geolocate each photo and features within it. In practice, the -pipeline would extract robust local features (e.g. SIFT or ORB) from each image and match these between -overlapping frames. Matching can be accelerated using a vocabulary-tree (Bag-of-Words) strategy as in -COLMAP or DBoW2, which is efficient for large image sets. Matched feature tracks are triangulated to -obtain sparse 3D points, then bundle adjustment optimizes all camera intrinsics and extrinsics jointly. This -yields a consistent local 3D reconstruction (camera centers and orientations) up to scale. At that point, we -align the reconstructed model to real-world coordinates using the known GPS of the first image – effectively -treating it as a ground control point (GCP). 
By fixing the first camera’s position (and optionally its altitude),
-we impose scale and translation on the model. The remaining cameras then inherit georeferenced positions
-and orientations. Finally, once camera poses are in geographic (lat/lon) coordinates, we can map any image
-pixel to a ground location (for example by intersecting the camera ray with a flat-earth plane or a DEM),
-yielding object coordinates. This photogrammetric approach – similar to open-source pipelines like
-OpenSfM or COLMAP – is standard in aerial mapping.
-Figure: A fixed-wing UAV used for mapping missions (western Ukraine). Our pipeline would run after image
-capture: features are matched across images and bundle-adjusted to recover camera poses. With this approach,
-even without onboard GPS for every shot, the relative poses and scale are determined by image overlap. We
-would calibrate the ADTi2625 camera (intrinsics and distortion) beforehand to reduce error. Robust
-estimators (RANSAC) would reject bad feature matches, ensuring that outlier shifts or low-overlap frames
-do not derail reconstruction. We could cluster images into connected groups if sharp turns break the
-overlap graph. The use of well-tested SfM libraries (COLMAP, OpenSfM, OpenMVG) provides mature
-implementations of these steps. For example, COLMAP’s documented workflow finds matching image pairs
-via a BoW index and then performs incremental reconstruction with bundle adjustment. OpenSfM (used in
-OpenDroneMap) similarly allows feeding a known GPS point or GCP to align the model. In short, our
-solution centers on feature-based SfM to register the images and recover a 3D scene structure, then projects
-it to GPS space via the first image’s coordinate.
-Architecture Approach
-Our system would ingest a batch of up to ~3000 sequential images and run an automated SfM pipeline,
-with the following stages:
-1. Preprocessing: Load camera intrinsics (from prior calibration or manufacturer data). 
Optionally undistort the images.
2. **Feature Detection & Matching:** Extract scale-invariant keypoints (SIFT/SURF or fast alternatives) from each image. Use a vocabulary-tree or sequential matching scheme to find overlapping image pairs. Since the images are sequential, we can match each image to its immediate neighbors (and to any non-consecutive images if turns induce overlap). Matching uses a KD-tree or FLANN, with RANSAC to filter outliers.
3. **Pose Estimation (SfM):** Seed an incremental SfM: start with the first two images to get a relative pose, then add images one by one, solving P3P + RANSAC and triangulating new points. If the flight path breaks into disconnected segments, process each segment separately. After all images are added, run a global bundle adjustment (using Ceres or COLMAP’s solver) to refine all camera poses and 3D points jointly. We aim for a mean reprojection error < 1 pixel, indicating a good fit.
4. **Georeferencing:** Take the optimized reconstruction (which is in an arbitrary coordinate frame) and transform it to geodetic coordinates. We set the first camera’s recovered position to the known GPS coordinate (latitude, longitude, altitude). This anchors a similarity transform (scale, rotation, translation) from the SfM frame to WGS84. If altitude or scale is still ambiguous, we can use the UAV’s known altitude or average GSD to fix scale. We may also use two or more tie-points if available (for example, matching image content to known map features) to constrain orientation. In practice, OpenSfM allows “anchor” points: its alignment step uses any GCPs to move the reconstruction so that observed points align with GPS. Here, a single anchor (the first camera) fixes the origin; scale comes from the altitude/GSD constraint, and a yaw uncertainty remains. To reduce orientation error, we could match large-scale features (roads, fields) visible in the images against a base map to pin down rotation.
5. **Object Geolocation:** With each camera pose (now in lat/lon) known, any pixel can be projected onto the terrain. For example, using a flat-ground assumption or a DEM, compute where the ray through that pixel meets the ground. This gives GPS coordinates for image features (craters, fields, etc.). For higher accuracy, multi-view triangulation of distinct points in the 3D point cloud can refine object coordinates.

Throughout this architecture, we include robustness measures: skip image pairs that fail to match (these are flagged as “unregistered” and can be handled manually); use robust solvers to ignore mismatches; and allow segments to be processed independently if turns break connectivity. A manual-correction fallback would be supported by exporting partial models and images to a GIS interface (e.g., QGIS or a WebODM viewer), where an analyst can add ground control points or manually adjust a segment’s alignment if automated registration fails for some images. All processing is implemented with optimized C++/Python libraries (OpenCV for features, COLMAP/OpenSfM for SfM, and GDAL/PROJ for coordinate transforms) so that the time cost stays within ~2 seconds per image on modern hardware.

## Testing Strategy

We will validate performance against the acceptance criteria using a combination of real data and simulated tests. Functionally, we can run the pipeline on annotated test flights (where the true camera GPS or object locations are known) and measure errors. For image-center accuracy, we compare the computed center coordinates to ground truth. We expect ≥80% of images to be within 50 m and ≥60% within 20 m of the true position; we will compute these statistics from test flights and tune the pipeline (e.g., match thresholds, bundle-adjustment weighting) if needed. For object positioning, we can place synthetic targets or use identifiable landmarks (with known GPS) in the imagery, then verify the projected locations.
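The flat-ground projection behind object geolocation is a ray-plane intersection. The sketch below is a minimal illustration in NumPy; the nadir pose, pinhole intrinsics, and the `pixel_to_ground` name are assumptions for the example, not our final interface.

```python
import numpy as np

def pixel_to_ground(u, v, K, R, C, ground_z=0.0):
    """Intersect the viewing ray of pixel (u, v) with the plane z = ground_z.

    K: 3x3 intrinsics; R: world-to-camera rotation; C: camera center (world).
    Returns the ground point in the same local metric frame as C.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray_world = R.T @ ray_cam                           # rotate into world frame
    t = (ground_z - C[2]) / ray_world[2]                # stretch ray to the plane
    return C + t * ray_world

# Illustrative nadir-looking pose: rotation by pi about x, 100 m above ground.
K = np.array([[1000.0, 0, 640], [0, 1000.0, 480], [0, 0, 1]])
R = np.diag([1.0, -1.0, -1.0])
C = np.array([0.0, 0.0, 100.0])

# The principal point maps to the point directly beneath the camera, and
# 100 px of image offset corresponds to 10 m on the ground (GSD = 0.1 m/px).
print(pixel_to_ground(640, 480, K, R, C))  # ≈ [0. 0. 0.]
print(pixel_to_ground(740, 480, K, R, C))  # ≈ [10. 0. 0.]
```

With real data, the returned local coordinates would be converted to lat/lon via GDAL/PROJ, and a DEM lookup would replace the constant `ground_z`.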
We will also track the image registration rate (the percentage of images successfully included with valid poses) and the mean reprojection error. The latter is a standard photogrammetry metric (values under ~1 pixel are considered “good”), so we will confirm that our reconstructions meet it. Testing under outlier conditions (e.g., randomly dropping image overlaps or adding false images) will ensure the system correctly rejects bad data and flags segments for manual review.

Non-functional tests cover timing and scalability: we will measure end-to-end processing time on large flights (3000 images) and optimize parallel processing to meet the 2 s/image target. Robustness testing will include flights with sharp turns and low-overlap segments to ensure that >95% of images can still register (with the remainder caught by the manual-fallback UI). We will also simulate partial failures (e.g., a missing first-image GPS) to verify that the system gracefully alerts the operator. Throughout, we will log bundle-adjustment residuals and enforce reprojection-error thresholds. Any detected failure (e.g., a large error) triggers a user notification to apply manual corrections (e.g., adding an extra GCP or adjusting a segment’s yaw). By benchmarking on known datasets and gradually introducing perturbations, we can validate that our pipeline meets the specified accuracy and robustness requirements.

**References:** Standard open-source photogrammetry tools (e.g., COLMAP, OpenSfM, OpenDroneMap) implement the SfM and georeferencing steps described here. Computer-vision texts note that mean reprojection error should be ≲1 pixel for a good bundle-adjustment fit. These principles and practices underlie our solution.