# UAV Aerial Image Geolocalization System: Solution Draft

## Executive Summary

This document presents a comprehensive solution for determining the GPS coordinates of aerial image centers, and of objects within those images, captured by fixed-wing UAVs flying at altitudes up to 1 km over eastern/southern Ukraine. The system combines structure-from-motion (SfM), visual odometry, and satellite image cross-referencing to achieve sub-50-meter accuracy for 80% of images while maintaining a registration rate above 95%.

---

## 1. Problem Analysis

### 1.1 Key Constraints & Challenges

- **No onboard GPS/GNSS receiver** (the system must infer coordinates)
- **Fixed downward-pointing camera** (non-stabilized, subject to aircraft pitch/roll)
- **Up to 3000 images per flight** at 100 m nominal spacing (variable due to aircraft dynamics)
- **Altitude ≤ 1 km** with resolution up to 6252×4168 pixels
- **Sharp turns possible**, causing image overlap <5% or complete loss of overlap
- **Outliers possible**: 350 m drift between consecutive images (aircraft tilt)
- **Time constraint**: <2 seconds processing per image
- **Real-world requirement**: Google Maps validation with <10% outliers

### 1.2 Reference Dataset Analysis

The provided 29 sample images show:

- **Flight distance**: ~2.26 km ground path
- **Image spacing**: 66-202 m (mean 119 m), consistent with ~100-200 m altitude
- **Coverage area**: ~1.1 km × 1.6 km
- **Geographic region**: Eastern Ukraine (east of the Dnipro; Kherson/Zaporozhye area)
- **Terrain**: Mix of agricultural fields and scattered vegetation

### 1.3 Acceptance Criteria Summary

| Criterion | Target |
|-----------|--------|
| 80% of images within 50 m error | Required |
| 60% of images within 20 m error | Required |
| Handling of 350 m outlier drift | Graceful degradation |
| Image registration rate | >95% |
| Mean reprojection error | <1.0 pixels |
| Processing time per image | <2 seconds |
| Outlier rate (satellite check) | <10% |
| User interaction fallback | For the unresolvable ~20% |

---

## 2. State-of-the-Art Solutions

### 2.1 Current Industry Standards

#### **A. OpenDroneMap (ODM)**
- **Strengths**: Open-source, parallelizable, proven at scale (2500+ images)
- **Pipeline**: OpenSfM (feature matching/tracking) → OpenMVS (dense reconstruction) → GDAL (georeferencing)
- **Weaknesses**: Requires GCPs for absolute georeferencing; high computational cost (128 GB RAM recommended); does not handle GPS-denied scenarios without external anchors
- **Typical accuracy**: Meter-level without GCPs; cm-level with GCPs

#### **B. COLMAP**
- **Strengths**: Incremental SfM with robust bundle adjustment; excellent reprojection error (typically <0.5 px)
- **Application**: Academic gold standard; proven on large multi-view datasets
- **Limitations**: Requires a good initial seed pair; can fail with low overlap; computationally costly for online processing
- **Relevance**: Core algorithm suitable as the backbone for this application

#### **C. AliceVision/Meshroom**
- **Strengths**: Modular photogrammetry framework; feature-rich; GPU-accelerated
- **Features**: Robust feature matching, multi-view stereo, camera tracking
- **Challenge**: Designed for batch processing, not real-time streaming

#### **D. ORB-SLAM3**
- **Strengths**: Real-time monocular SLAM; handles rolling-shutter distortion; extremely fast
- **Relevant to**: Aerial video streams; can operate at frame rate
- **Limitation**: No absolute georeferencing without external anchors; drifts over long sequences

#### **E. GPS-Denied Visual Localization (GNSS-Denied Methods)**
- **Deep learning approaches**: CLIP-based satellite-aerial image matching achieving 39 m location error and 15.9° heading error at 100 m altitude
- **Hierarchical methods**: Coarse semantic matching + fine-grained feature refinement; tolerates oblique views
- **Advantage**: Works with satellite imagery as the reference

### 2.2 Feature Detector/Descriptor Comparison

| Algorithm | Detection Speed | Matching Speed | Features | Robustness | Best For |
|-----------|-----------------|----------------|----------|------------|----------|
| **SIFT** | Slow | Medium | Scattered | Excellent | Reference, small scale |
| **AKAZE** | Fast | Fast | Moderate | Very good | Real-time, scale variance |
| **ORB** | Very fast | Very fast | High | Good | Real-time, embedded systems |
| **SuperPoint** | Medium | Fast | Learned | Excellent | Modern DL pipelines |

**Recommendation**: Hybrid approach using AKAZE for speed, plus SuperPoint for robustness in difficult scenes.

---

## 3. Proposed Architecture Solution

### 3.1 High-Level System Design

```
                     UAV IMAGE STREAM
         (sequential, ≤100 m spacing, 100-200 m altitude)
                 │
     ┌───────────┴─────────────┐
     ▼                         ▼
 FEATURE EXTRACTION       INITIALIZATION MODULE
 • AKAZE keypoint detect  • Assumed starting GPS
 • Multi-scale pyramids   • Initial camera params
 • Descriptor computation • Seed pair selection
     └───────────┬─────────────┘
                 ▼
         SEQUENTIAL MATCHING
         • N-to-N+1 matching     • Epipolar constraint
         • RANSAC outlier reject • Essential matrix est.
                 │
         ┌───────┴─────────┐
         ▼ success         ▼ failure / difficult
    COMPUTE POSE        FALLBACK
    • 5-pt algorithm    • Try N→N+2
    • Triangulate       • Try global match
    • BA update         • Try satellite / ask user
         └───────┬─────────┘
                 ▼
      BUNDLE ADJUSTMENT (local)
      • Windowed optimization    • Levenberg-Marquardt
      • Refine poses + 3D points • Covariance estimation
                 ▼
           GEOREFERENCING
      • Satellite image matching • GCP integration (if avail.)
      • WGS84 transformation     • Accuracy assessment
                 ▼
         OUTPUT & VALIDATION
      • Image center GPS coords  • Object/feature coords
      • Confidence intervals     • Outlier flagging
      • Google Maps cross-check
```

### 3.2 Core Algorithmic Components

#### **3.2.1 Initialization Phase**

**Input**: Starting GPS coordinate (or estimated from the first visible landmarks)

**Process**:
1. Load the first image and extract AKAZE features at multiple scales
2. Establish camera intrinsic parameters:
   - If known: use factory calibration or pre-computed values
   - If unknown: assume a standard pinhole model with the principal point at the image center
   - Estimate focal length from image resolution: ~2.5-3.0 × image width in pixels (typical aerial lens)
3. Define the initial local coordinate system:
   - Origin at the starting GPS coordinate
   - Z-axis up, XY horizontal
   - Project all results to WGS84 at the end

**Output**: Camera matrix K, initial camera pose (R₀, t₀)

#### **3.2.2 Sequential Image-to-Image Matching**

**Algorithm**: Incremental SfM with a temporal ordering constraint

```
For image N in sequence:
  1. Extract AKAZE features from image N
  2. Match features with image N-1 using KNN with Lowe's ratio test
  3. RANSAC with the 5-point essential matrix solver:
     - Iterate: sample 5 point correspondences
     - Solve: essential matrix E via the minimal solver
     - Score: inlier count (epipolar constraint |p'ᵀEp| < ε)
     - Keep: best E with >50 inliers
  4. If registration fails (inliers <50 or insufficient quality):
     - Attempt N to N+2 matching (skip frame)
     - If still failing: request user input or flag as uncertain
  5. Decompose E into camera pose (R, t) with triangulation validation
  6. Triangulate 3D points from the matched features
  7. Perform local windowed bundle adjustment (last 5 images)
  8. Compute image center GPS via the local-to-global transformation
```

**Key Parameters**:
- AKAZE threshold: adaptive, based on image quality
- Matching distance ratio: 0.7 (Lowe's test)
- RANSAC inlier threshold: 1.0 pixels
- Minimum inliers for success: 50 points
- Maximum reprojection error in BA: 1.5 pixels

#### **3.2.3 Pose Estimation & Triangulation**

**5-Point Algorithm** (Stewénius et al.):
- Minimal solver for 5 point correspondences
- Returns up to 10 candidate essential matrices; each E decomposes into 4 (R, t) candidates
- Selects the solution with the maximum number of triangulated points in front of both cameras
- Minimal sample size of 5 vs. 8 points means far fewer RANSAC iterations at a given outlier ratio

**Triangulation**:
- Linear triangulation via the DLT (Direct Linear Transform)
- For each matched feature pair: solve a 4×4 system via SVD
- Filtering: reject points with
  - reprojection error > 1.5 pixels,
  - a position behind either camera, or
  - an altitude inconsistent with the flight dynamics

#### **3.2.4 Bundle Adjustment (Windowed)**

**Formulation**:

```
minimize  Σ ||p_i^(img) - π(X_i, P_cam)||² + λ·||ΔP_cam||²

where:
- p_i^(img): observed pixel position
- X_i: 3D point coordinate
- P_cam: camera pose parameters
- π(): projection function
- λ: regularization weight
```

**Algorithm**: Sparse Levenberg-Marquardt with Schur complement
- Window size: 5-10 consecutive images (trade-off between accuracy and speed)
- Iteration limit: 10 (convergence typically in 3-5)
- Damping: adaptive μ (starting at 10⁻⁶)
- Covariance computation: from the inverse of the information matrix

**Complexity**: O(w³) where w = window size → ~0.3 s for w = 10 on a modern CPU

#### **3.2.5 Georeferencing Module**

**Challenge**: Converting the local 3D structure to WGS84 coordinates

**Approach 1 - Satellite Image Matching** (primary):
1. Query the Google Maps Static API for the area around the estimated location
2. Scale the downloaded satellite imagery to the expected ground resolution
3. Extract ORB/SIFT features from the satellite image
4. Match features between the UAV nadir image and the satellite image
5. Compute a homography transformation (if there is sufficient overlap)
6. Estimate the camera center GPS from the homography
7. Validate: check consistency with neighboring images

**Approach 2 - GCP Integration** (when available):
1. If the user provides 4+ manually identified GCPs in images with known coordinates:
   - Use the GCPs to establish the local-to-global transformation
   - 7-DOF similarity transformation, since the local reconstruction has unknown scale (3 non-collinear GCPs minimum; 4+ for redundancy)
   - Refine with all available GCPs using least squares
2. Transform all local coordinates via this transformation

**Approach 3 - IMU/INS Integration** (if available):
1. If the UAV provides gyro/accelerometer data:
   - Integrate IMU measurements to constrain camera orientation
   - Use the IMU to detect anomalies (sharp turns, tilt)
   - Fuse with visual odometry using an Extended Kalman Filter (EKF)
   - Improves robustness during low-texture sequences

**Uncertainty Quantification**:
- Covariance σ² from bundle adjustment
- Project uncertainty to GPS coordinates via the Jacobian
- Compute a 95% confidence ellipse for each image center
- Typical values: σ ≈ 20-50 m initially, improving with each satellite anchor

#### **3.2.6 Fallback & Outlier Detection**

**Outlier Detection Strategy**:
1. **Local consistency check**:
   - Compute the velocity between consecutive images
   - Flag if the velocity changes by >50% between successive intervals
   - Expected velocity: ~10-15 m/s ground speed
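The local consistency check above can be sketched in a few lines. This is a minimal illustration, not the system implementation: `positions_m` as a list of image-center positions in a local metric frame, and the fixed `dt_s`, are assumed inputs; the thresholds come from the text.

```python
import math

def flag_velocity_anomalies(positions_m, dt_s=1.0, v_max=30.0, v_min=1.0, rel_jump=0.5):
    """Flag frame indices whose implied ground speed is anomalous.

    positions_m: list of (x, y) image-center positions in a local metric frame.
    Flags a frame if its implied speed leaves [v_min, v_max], or if the speed
    changes by more than rel_jump (50%) relative to the previous interval.
    """
    flags = []
    prev_v = None
    for i in range(1, len(positions_m)):
        dx = positions_m[i][0] - positions_m[i - 1][0]
        dy = positions_m[i][1] - positions_m[i - 1][1]
        v = math.hypot(dx, dy) / dt_s  # implied ground speed, m/s
        jump = prev_v is not None and abs(v - prev_v) > rel_jump * max(prev_v, 1e-9)
        if v > v_max or v < v_min or jump:
            flags.append(i)
        prev_v = v
    return flags
```

A 350 m jump between frames at ~1 s spacing implies 350 m/s and is flagged immediately; the frame after it is also flagged by the relative-jump rule, which is the cue to attempt skip-frame recovery.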
2. **Satellite validation**:
   - After full-flight processing: retrieve satellite imagery
   - Compare each UAV image against the satellite image at its claimed coordinates
   - Compute the cross-correlation; flag if <0.3
3. **Loop closure detection**:
   - If imagery from later in the flight matches earlier imagery: flag a potential error
   - Use place recognition (ORB vocabulary tree) to detect revisits
4. **User feedback loop**:
   - Display flagged uncertain frames to the operator
   - Allow manual refinement for <20% of images
   - Re-optimize the trajectory using the corrected anchor points

**Graceful Degradation** (350 m outlier scenario):
- Detect the outlier via the velocity threshold
- Attempt skip-frame matching (N to N+2, N+3)
- If that fails, insert an "uncertainty zone" marker
- Continue from the next successfully matched pair
- Later satellite validation will flag this region for manual review

---

## 4. Architecture: Detailed Module Specifications

### 4.1 System Components

#### **Component 1: Image Preprocessor**
```
Input:  Raw JPEG/PNG from UAV
Output: Normalized, undistorted image ready for feature extraction

Operations:
├─ Load image (max 6252×4168)
├─ Apply lens distortion correction (if calibration available)
├─ Normalize histogram (CLAHE for uniform feature detection)
├─ Optional: downsample for <2 s latency (e.g., 3000×2000 if >4000×3000)
├─ Compute image metadata (filename, timestamp)
└─ Cache for access by subsequent modules
```

#### **Component 2: Feature Detector**
```
Input:  Preprocessed image
Output: Keypoints + descriptors

Algorithm: AKAZE with multi-scale pyramids
├─ Pyramid levels: 4-6 (scale factor 1.2)
├─ Detector response threshold: adaptive (target 500-1000 keypoints)
├─ M-LDB binary descriptor: rotation-aware (486 bits)
├─ Feature filtering:
│  ├─ Remove features in low-texture regions (variance <10)
│  ├─ Enforce minimum separation (8 px) to avoid clustering
│  └─ Sort by keypoint strength (use top 2000)
└─ Output: vector<KeyPoint>, Mat descriptors (N×61 uint8)
```

#### **Component 3: Feature Matcher**
```
Input:  Features from image N-1, features from image N
Output: Vector of matched point pairs (inliers only)

Algorithm: KNN matching with Lowe's ratio test + RANSAC
├─ BruteForceMatcher (Hamming distance for AKAZE)
├─ KNN search: k=2
├─ Lowe's ratio test: d1/d2 < 0.7
├─ RANSAC 5-point algorithm:
│  ├─ Iterations: min(4000, 10000 - 100·inlier_count)
│  ├─ Inlier threshold: 1.0 pixels
│  ├─ Minimum inliers: 50 (lowered to 30 for skip-frame matching)
│  └─ Success: inlier_ratio > 0.4
├─ Triangulation validation (reject points behind camera)
└─ Output: vector<DMatch>, Mat points3D (M×3)
```

#### **Component 4: Pose Solver**
```
Input:  Essential matrix E from RANSAC, matched points
Output: Rotation matrix R, translation vector t

Algorithm: E decomposition
├─ SVD decomposition of E
├─ Extract 4 candidate (R, t) pairs
├─ Triangulate points for each candidate
├─ Select the candidate with the most points in front of both cameras
├─ Recover metric scale from the altitude constraint (calibrated focal length + known flight altitude)
└─ Output: 4×4 transformation matrix T = [R t; 0 1]
```

#### **Component 5: Triangulator**
```
Input:  Keypoints from images 1 and 2; poses P1, P2; calibration K
Output: 3D point positions, mask of valid points

Algorithm: Linear triangulation (DLT)
├─ For each point correspondence (p1, p2):
│  ├─ Build the 4×4 DLT matrix from the two projection equations
│  ├─ SVD → solve for 3D point X
│  ├─ Validate: |p1 - π(X,P1)| < 1.5 px AND |p2 - π(X,P2)| < 1.5 px
│  ├─ Validate: X_z > 50 m (minimum safe altitude above ground)
│  └─ Validate: X_z < 1500 m (maximum altitude constraint)
└─ Output: Mat points3D (M×3 float32), Mat validMask (M×1 uchar)
```

#### **Component 6: Bundle Adjuster**
```
Input:  Poses [P0...Pn], 3D points [X0...Xm], observations
Output: Refined poses, 3D points, covariance matrices

Algorithm: Sparse Levenberg-Marquardt with windowing
├─ Window size: 5 images (or fewer at flight start)
├─ Optimization variables:
│  ├─ Camera poses: 6 DOF per image (Rodrigues rotation + translation)
│  └─ 3D points: 3 coordinates per point
├─ Residuals: reprojection error in both images
├─ Iterations: max 10 (typically converges in 3-5)
├─ Covariance:
│  ├─ Compute the Hessian inverse (information matrix)
│  ├─ Extract the diagonal for per-parameter variances
│  └─ Per-image uncertainty: sqrt(diag(Cov[t]))
└─ Output: refined poses, points, Mat covariance (per image)
```
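The essential-matrix decomposition at the heart of Component 4 can be sketched as follows. This is a minimal reference version under the usual SVD recipe, not the system implementation; in practice an OpenCV call such as `cv2.recoverPose` would bundle this with the chirality check.

```python
import numpy as np

def decompose_essential(E):
    """Return the four candidate (R, t) pairs encoded by an essential matrix.

    The true relative pose is among the four; the chirality check
    (triangulated points in front of both cameras) selects it.
    """
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations (det = +1) on both orthogonal factors.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]  # translation direction (scale is unobservable)
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

For a synthetic E built from R = I and t = (1, 0, 0), one of the four candidates recovers the identity rotation and ±t, which is what the chirality check then confirms by triangulation.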
#### **Component 7: Satellite Georeferencer**
```
Input:  Current image, estimated center GPS (rough), local trajectory
Output: Refined GPS coordinates, confidence score

Algorithm: Satellite image matching
├─ Query Google Maps API:
│  ├─ Coordinates: estimated_gps ± 200 m
│  ├─ Resolution: match UAV image resolution (1-2 m GSD)
│  └─ Zoom level: 18-20
├─ Image preprocessing:
│  ├─ Scale the satellite image to ~the same resolution as the UAV image
│  ├─ Convert to grayscale
│  └─ Equalize histogram
├─ Feature matching:
│  ├─ Extract ORB features from both images
│  ├─ Match with BruteForceMatcher
│  ├─ Apply RANSAC homography (min 10 inliers)
│  └─ Compute inlier ratio
├─ Homography analysis:
│  ├─ If inlier_ratio > 0.2:
│  │  ├─ Extract the 4 corners of the UAV image via the inverse homography
│  │  ├─ Map them to satellite image coordinates
│  │  ├─ Compute the implied GPS shift
│  │  └─ Apply the shift to the current pose estimate
│  └─ Else: keep the local estimate, flag as uncertain
├─ Confidence scoring:
│  ├─ score = inlier_ratio × mutual_information_normalized
│  └─ Threshold: score > 0.3 for "high confidence"
└─ Output: refined_gps, confidence (0.0-1.0), residual_px
```

#### **Component 8: Outlier Detector**
```
Input:  Trajectory sequence [GPS_0, GPS_1, ..., GPS_n]
Output: Outlier flags, re-processed trajectory

Algorithm: Multi-stage detection
├─ Stage 1 - Velocity anomaly:
│  ├─ Compute inter-image distances: d_i = |GPS_i - GPS_{i-1}|
│  ├─ Compute velocity: v_i = d_i / Δt (Δt typically 0.5-2 s)
│  ├─ Expected: 10-20 m/s for a typical UAV
│  ├─ Flag if: v_i > 30 m/s OR v_i < 1 m/s
│  └─ Acceleration anomaly: |v_i - v_{i-1}| > 15 m/s
├─ Stage 2 - Satellite consistency:
│  ├─ For each flagged image:
│  │  ├─ Retrieve the satellite image at the claimed GPS
│  │  ├─ Compute the cross-correlation with the UAV image
│  │  └─ If corr < 0.25: mark as outlier
│  └─ Reprocess each outlier image:
│     ├─ Try skip-frame matching (to N±2, N±3)
│     ├─ Try global place recognition
│     └─ Request user input if all fail
├─ Stage 3 - Loop closure:
│  ├─ Check whether the image matches any earlier image (Hamming dist <50)
│  └─ If a match is detected: assess whether it is consistent with the trajectory
└─ Output: flags, corrected_trajectory, uncertain_regions
```

#### **Component 9: User Interface Module**
```
Input:  Flight trajectory, flagged uncertain regions
Output: User corrections, refined trajectory

Features:
├─ Web interface or desktop app
├─ Map display (Google Maps embedded):
│  ├─ Show the computed trajectory
│  ├─ Overlay satellite imagery
│  ├─ Highlight uncertain regions (red)
│  ├─ Show confidence intervals (error ellipses)
│  └─ Display reprojection errors
├─ Image preview:
│  ├─ Click a trajectory point to view the corresponding image
│  ├─ Show matched keypoints and epipolar lines
│  ├─ Display feature-matching quality metrics
│  └─ Show neighboring images in the sequence
├─ Manual correction:
│  ├─ Drag a trajectory point to the correct location (via map click)
│  ├─ Mark GCPs manually (click a point in the image, enter GPS)
│  ├─ Re-run the optimization with the corrected anchors
│  └─ Export the corrected trajectory as GeoJSON/CSV
└─ Reporting:
   ├─ Summary statistics (% within 50 m, 20 m, etc.)
   ├─ Outlier report with reasons
   ├─ Satellite validation results
   └─ Export the georeferenced image list with coordinates
```
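Once a homography H between the UAV image and the satellite tile is available (Component 7), the implied GPS shift of the image center reduces to projecting the center pixel and comparing it with the tile center. A minimal sketch, assuming H, the tile geometry, and a meters-per-pixel scale are given (in the full system H would come from RANSAC fitting over the ORB matches):

```python
def project(H, x, y):
    """Apply a 3x3 homography (nested lists) to a pixel coordinate."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def center_shift_m(H, img_w, img_h, tile_w, tile_h, m_per_px):
    """Shift (east_m, north_m) between the UAV image center projected into
    the satellite tile and the tile's own center (north = up in the tile)."""
    u, v = project(H, img_w / 2.0, img_h / 2.0)
    east_m = (u - tile_w / 2.0) * m_per_px
    north_m = (tile_h / 2.0 - v) * m_per_px  # pixel y grows downward
    return east_m, north_m
```

The resulting (east, north) offset is what Component 7 applies to the current pose estimate when the inlier ratio clears its 0.2 threshold.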
### 4.2 Data Flow & Processing Pipeline

**Phase 1: Offline Initialization** (before flight or post-download)
```
Input: Full set of N images, starting GPS coordinate
├─ Load all images into memory/fast storage (SSD)
├─ Detect features in all images (parallelizable: N CPU threads)
├─ Store features on disk for quick access
└─ Estimate camera calibration (if not known)

Time: ~1-3 minutes for 1000 images on a 16-core CPU
```

**Phase 2: Sequential Processing** (online or batch)
```
For i = 1 to N-1:
├─ Load images[i] and images[i+1]
├─ Match features
├─ RANSAC pose estimation
├─ Triangulate 3D points
├─ Local bundle adjustment (last 5 frames)
├─ Satellite georeferencing
├─ Store: GPS[i+1], confidence[i+1], covariance[i+1]
└─ [<2 seconds per iteration]

Time: 2N seconds ≈ 30-60 minutes for 1000 images
```

**Phase 3: Post-Processing** (after the full trajectory)
```
├─ Global bundle adjustment (optional: full flight with key-frame selection)
├─ Loop-closure optimization (if detected)
├─ Outlier detection and flagging
├─ Satellite validation (batch-retrieve imagery, compare)
├─ Export results with metadata
└─ Generate a report with accuracy metrics

Time: ~5-20 minutes
```

**Phase 4: Manual Review & Correction** (if needed)
```
├─ User reviews the flagged uncertain regions
├─ Manually corrects up to 20% of the trajectory as needed
├─ Re-optimizes with the corrected anchors
└─ Final export

Time: 10-60 minutes depending on complexity
```

---

## 5. Testing Strategy

### 5.1 Unit Tests (Level 1)

#### UT-1: Feature Extraction (AKAZE)
```
Purpose: Verify keypoint detection and descriptor computation
Test Data: Synthetic images with known features (checkerboard patterns)

Test Cases:
├─ UT-1.1: Basic feature detection
│   Input: 1024×768 synthetic image with checkerboard
│   Expected: ≥500 keypoints detected
│   Pass: count ≥ 500
├─ UT-1.2: Scale invariance
│   Input: Same scene at 2× scale
│   Expected: Keypoints at proportional positions
│   Pass: correlation of positions > 0.9
├─ UT-1.3: Rotation robustness
│   Input: Image rotated ±30°
│   Expected: Descriptors match between original and rotated
│   Pass: match rate > 80%
├─ UT-1.4: Multi-scale handling
│   Input: Image with features at multiple scales
│   Expected: Features detected at all pyramid scales
│   Pass: ratio of scales [1 : 1.2 : 1.44 : ...] verified
└─ UT-1.5: Performance constraint
    Input: FullHD image (1920×1080)
    Expected: <500 ms feature extraction
    Pass: 95th percentile < 500 ms
```

#### UT-2: Feature Matching
```
Purpose: Verify robust feature correspondence
Test Data: Pairs of synthetic/real images with known correspondences

Test Cases:
├─ UT-2.1: Basic matching
│   Input: Two images of a synthetic scene (90% overlap)
│   Expected: ≥95% of ground-truth features matched
│   Pass: match_rate ≥ 0.95
├─ UT-2.2: Outlier rejection (Lowe's ratio test)
│   Input: Synthetic pair + 50% false features
│   Expected: False matches rejected
│   Pass: false_match_rate < 0.1
├─ UT-2.3: Low-overlap scenario
│   Input: Two images with 20% overlap
│   Expected: Still matches ≥20 points
│   Pass: min_matches ≥ 20
└─ UT-2.4: Performance
    Input: FullHD images, 1000 features each
    Expected: <300 ms matching time
    Pass: 95th percentile < 300 ms
```

#### UT-3: Essential Matrix Estimation
```
Purpose: Verify the 5-point/8-point algorithms for camera geometry
Test Data: Synthetic correspondences with a known relative pose

Test Cases:
├─ UT-3.1: 8-point algorithm
│   Input: 8+ point correspondences
│   Expected: Essential matrix E with rank 2
│   Pass: min_singular_value(E) < 1e-6
├─ UT-3.2: 5-point algorithm
│   Input: 5 point correspondences
│   Expected: Up to 10 candidate solutions generated
│   Pass: num_solutions ∈ [1, 10]
├─ UT-3.3: RANSAC convergence
│   Input: 100 correspondences, 30% outliers
│   Expected: Essential matrix recovered despite outliers
│   Pass: inlier_ratio ≥ 0.6
└─ UT-3.4: Chirality constraint
    Input: Multiple (R,t) solutions from decomposition
    Expected: Only the solution with points in front of both cameras selected
    Pass: selected solution verified via triangulation
```

#### UT-4: Triangulation (DLT)
```
Purpose: Verify 3D point reconstruction from image correspondences
Test Data: Synthetic scenes with known 3D geometry

Test Cases:
├─ UT-4.1: Accuracy
│   Input: Noise-free point correspondences
│   Expected: Reconstructed X matches ground truth
│   Pass: RMSE < 0.1 cm on a 1 m scene
├─ UT-4.2: Outlier handling
│   Input: 10 valid + 2 invalid correspondences
│   Expected: Invalid points detected (behind camera / too far)
│   Pass: valid_mask accuracy > 95%
├─ UT-4.3: Altitude constraint
│   Input: Points with z < 50 m (below the minimum safe altitude)
│   Expected: Points rejected
│   Pass: altitude filter works correctly
└─ UT-4.4: Batch performance
    Input: 500 point triangulations
    Expected: <100 ms total
    Pass: 95th percentile < 100 ms
```

#### UT-5: Bundle Adjustment
```
Purpose: Verify pose and 3D point optimization
Test Data: Synthetic multi-view scenes

Test Cases:
├─ UT-5.1: Convergence
│   Input: 5 frames with noisy initial poses
│   Expected: Residual decreases monotonically
│   Pass: final_residual < 0.001 × initial_residual
├─ UT-5.2: Covariance computation
│   Input: Optimized poses and points
│   Expected: Covariance matrix positive-definite
│   Pass: all eigenvalues > 0
├─ UT-5.3: Window-size effect
│   Input: Same problem with window sizes [3, 5, 10]
│   Expected: Larger windows → lower residuals
│   Pass: residual_5 < residual_3, residual_10 < residual_5
└─ UT-5.4: Performance scaling
    Input: Window sizes [5, 10, 15, 20]
    Expected: Time ≈ O(w³)
    Pass: cubic fit accurate (R² > 0.95)
```
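As a concrete illustration of the UT-4.1 style, a noise-free DLT triangulation test could look like the pytest case below. This is a sketch: `triangulate_dlt` stands in for the implementation under test, so a minimal reference version is included to keep the example self-contained.

```python
import numpy as np

def triangulate_dlt(P1, P2, p1, p2):
    """Linear (DLT) triangulation of one correspondence from two 3x4 cameras."""
    A = np.vstack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector of A, homogeneous 3D point
    return X[:3] / X[3]

def test_noise_free_triangulation():
    # Two cameras with a 10 m baseline, both looking down the z-axis.
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([np.eye(3), np.array([[-10.0], [0.0], [0.0]])])
    X_true = np.array([3.0, -2.0, 120.0])
    p1 = P1 @ np.append(X_true, 1.0); p1 = p1[:2] / p1[2]
    p2 = P2 @ np.append(X_true, 1.0); p2 = p2[:2] / p2[2]
    X = triangulate_dlt(P1, P2, p1, p2)
    assert np.allclose(X, X_true, atol=1e-6)
```

With noise-free projections the SVD null vector recovers the ground-truth point exactly (up to floating-point error), which is the UT-4.1 pass condition.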
---

### 5.2 Integration Tests (Level 2)

#### IT-1: Sequential Pipeline
```
Purpose: Verify the image-to-image processing chain
Test Data: Real aerial image sequences (5-20 images)

Test Cases:
├─ IT-1.1: Feature flow
│   Features extracted from img₁ → tracked to img₂ → matched
│   Expected: Consistent tracking across images
│   Pass: ≥70% of features tracked end-to-end
├─ IT-1.2: Pose chain consistency
│   Poses P₁, P₂, P₃ computed sequentially
│   Expected: P₁→₃ = P₂→₃ ∘ P₁→₂ (composition consistency)
│   Pass: pose error < 0.1° rotation, 5 cm translation
├─ IT-1.3: Trajectory smoothness
│   Velocity computed between poses
│   Expected: Smooth velocity profile (no jumps)
│   Pass: velocity_std_dev < 20% of mean velocity
└─ IT-1.4: Memory usage
    Process a 100-image sequence
    Expected: Constant memory (windowed processing)
    Pass: peak_memory < 2 GB
```

#### IT-2: Satellite Georeferencing
```
Purpose: Verify the local-to-global coordinate transformation
Test Data: Synthetic/real images with a known satellite reference

Test Cases:
├─ IT-2.1: Feature matching with satellite imagery
│   Input: Aerial image + satellite reference
│   Expected: ≥10 features matched across viewpoints
│   Pass: match_count ≥ 10
├─ IT-2.2: Homography estimation
│   Matched features → homography matrix
│   Expected: Valid transformation (3×3 matrix)
│   Pass: det(H) ≠ 0, condition number < 100
├─ IT-2.3: GPS transformation accuracy
│   Apply the homography to the image corners
│   Expected: Computed GPS ≈ known reference GPS
│   Pass: error < 100 m (on test data)
└─ IT-2.4: Confidence scoring
    Compute inlier_ratio and MI (mutual information)
    Expected: score = inlier_ratio × MI ∈ [0, 1]
    Pass: high confidence for obvious matches
```

#### IT-3: Outlier Detection Chain
```
Purpose: Verify multi-stage outlier detection
Test Data: Synthetic trajectory with injected outliers

Test Cases:
├─ IT-3.1: Velocity anomaly detection
│   Inject a 350 m jump at frame N
│   Expected: Detected as an outlier
│   Pass: outlier_flag = True
├─ IT-3.2: Recovery mechanism
│   After outlier detection
│   Expected: System attempts skip-frame matching (N→N+2)
│   Pass: recovery_successful = True
├─ IT-3.3: False positive rate
│   Normal sequence with small perturbations
│   Expected: <5% false outlier flagging
│   Pass: false_positive_rate < 0.05
└─ IT-3.4: Consistency across stages
    Multiple detection stages should agree
    Pass: agreement_score > 0.8
```

---

### 5.3 System Tests (Level 3)

#### ST-1: Accuracy Criteria
```
Purpose: Verify the system meets the ±50 m and ±20 m accuracy targets
Test Data: Real aerial image sequences with ground-truth GPS

Test Cases:
├─ ST-1.1: 50 m accuracy target
│   Input: 500-image flight
│   Compute: % of images within 50 m of ground truth
│   Expected: ≥80%
│   Pass: accuracy_50m ≥ 0.80
├─ ST-1.2: 20 m accuracy target
│   Same flight data
│   Expected: ≥60% within 20 m
│   Pass: accuracy_20m ≥ 0.60
├─ ST-1.3: Mean absolute error
│   Compute: MAE over all images
│   Expected: <40 m typical
│   Pass: MAE < 50 m
└─ ST-1.4: Error distribution
    Expected: Error approximately Gaussian
    Pass: K-S test p-value > 0.05
```

#### ST-2: Registration Rate
```
Purpose: Verify that ≥95% of images are successfully registered
Test Data: Real flights under various conditions

Test Cases:
├─ ST-2.1: Baseline registration
│   Good overlap, clear features
│   Expected: >98% registration rate
│   Pass: registration_rate ≥ 0.98
├─ ST-2.2: Challenging conditions
│   Low texture, variable lighting
│   Expected: ≥95% registration rate
│   Pass: registration_rate ≥ 0.95
├─ ST-2.3: Sharp-turn scenario
│   Images with <10% overlap
│   Expected: Fallback mechanisms trigger; ≥90% success
│   Pass: fallback_success_rate ≥ 0.90
└─ ST-2.4: Consecutive failures
    Track the maximum run of consecutive unregistered images
    Expected: ≤3 consecutive failures
    Pass: max_consecutive_failures ≤ 3
```

#### ST-3: Reprojection Error
```
Purpose: Verify <1.0 pixel mean reprojection error
Test Data: Real flight data after bundle adjustment

Test Cases:
├─ ST-3.1: Mean reprojection error
│   After BA optimization
│   Expected: <1.0 pixel
│   Pass: mean_reproj_error < 1.0
├─ ST-3.2: Error distribution
│   Histogram of per-point errors
│   Expected: Tightly concentrated below 2 pixels
│   Pass: 95th percentile < 2.0 px
├─ ST-3.3: Per-frame consistency
│   Error should not vary dramatically between frames
│   Pass: frame_error_std_dev < 0.3 px
└─ ST-3.4: Outlier points
    Very large reprojection errors
    Expected: <1% of points with error >3 px
    Pass: outlier_rate < 0.01
```

#### ST-4: Processing Speed
```
Purpose: Verify <2 seconds per image
Test Data: Full flight sequences on target hardware

Test Cases:
├─ ST-4.1: Average latency
│   Mean processing time per image
│   Expected: <2 seconds
│   Pass: mean_latency < 2.0 s
├─ ST-4.2: 95th-percentile latency
│   Worst-case images (complex scenes)
│   Expected: <2.5 seconds
│   Pass: p95_latency < 2.5 s
├─ ST-4.3: Component breakdown
│   Feature extraction <0.5 s, matching <0.3 s, RANSAC <0.2 s,
│   BA <0.8 s, satellite <0.3 s
│   Pass: each component within budget
└─ ST-4.4: Scaling with problem size
    Memory and CPU usage vs. image resolution
    Expected: Linear scaling
    Pass: O(n) complexity verified
```
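The ST-1 accuracy metrics reduce to simple per-image error statistics over the estimated and ground-truth coordinates. A sketch using the haversine great-circle distance (the 50 m / 20 m thresholds come from the acceptance criteria; the input format of (lat, lon) pairs is an assumption):

```python
import math

EARTH_R = 6371000.0  # mean Earth radius, m

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_R * math.asin(math.sqrt(a))

def accuracy_report(estimated, truth):
    """Fraction of images within 50 m / 20 m, plus MAE, per ST-1."""
    errs = [haversine_m(a[0], a[1], b[0], b[1]) for a, b in zip(estimated, truth)]
    n = len(errs)
    return {
        "within_50m": sum(e <= 50.0 for e in errs) / n,
        "within_20m": sum(e <= 20.0 for e in errs) / n,
        "mae_m": sum(errs) / n,
    }
```

The same helper can back the reporting in the UI module (percentage-within summaries) and the ST-1.1/ST-1.2 pass checks.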
image resolution Expected: Linear scaling Pass: O(n) complexity verified ``` #### ST-5: Robustness - Outlier Handling ``` Purpose: Verify graceful handling of 350m outlier drifts Test Data: Synthetic/real data with injected outliers Test Cases: ├─ ST-5.1: Single 350m outlier │ Inject outlier at frame N │ Expected: Detected, trajectory continues │ Pass: system_continues = True │ ├─ ST-5.2: Multiple outliers │ 3-5 outliers scattered in sequence │ Expected: All detected, recovery attempted │ Pass: detection_rate ≥ 0.8 │ ├─ ST-5.3: False positive rate │ Normal trajectory, no outliers │ Expected: <5% false flagging │ Pass: false_positive_rate < 0.05 │ └─ ST-5.4: Recovery latency Time to recover after outlier Expected: ≤3 frames Pass: recovery_latency ≤ 3 frames ``` #### ST-6: Robustness - Sharp Turns ``` Purpose: Verify handling of <5% image overlap scenarios Test Data: Synthetic sequences with sharp angles Test Cases: ├─ ST-6.1: 5% overlap matching │ Two images with 5% overlap │ Expected: Minimal matches or skip-frame │ Pass: system_handles_gracefully = True │ ├─ ST-6.2: Skip-frame fallback │ Direct N→N+1 fails, tries N→N+2 │ Expected: Succeeds with N→N+2 │ Pass: skip_frame_success_rate ≥ 0.8 │ ├─ ST-6.3: 90° turn handling │ Images at near-orthogonal angles │ Expected: Degeneracy detected, logged │ Pass: degeneracy_detection = True │ └─ ST-6.4: Trajectory consistency Consecutive turns: check velocity smoothness Expected: No velocity jumps > 50% Pass: velocity_consistency verified ``` --- ### 2.4 Field Acceptance Tests (Level 4) #### FAT-1: Real UAV Flight Trial #1 (Baseline) ``` Scenario: Nominal flight over agricultural field ┌────────────────────────────────────────┐ │ Conditions: │ │ • Clear weather, good sunlight │ │ • Flat terrain, sparse trees │ │ • 300m altitude, 50m/s speed │ │ • 800 images, ~15 min flight │ └────────────────────────────────────────┘ Pass Criteria: ✓ Accuracy: ≥80% within 50m ✓ Accuracy: ≥60% within 20m ✓ Registration rate: ≥95% ✓ Processing 
time: <2s/image ✓ Satellite validation: <10% outliers ✓ Reprojection error: <1.0px mean Success Metrics: • MAE (mean absolute error): <40m • RMS error: <45m • Max error: <200m • Trajectory coherence: smooth (no jumps) ``` #### FAT-2: Real UAV Flight Trial #2 (Challenging) ``` Scenario: Flight with more complex terrain ┌────────────────────────────────────────┐ │ Conditions: │ │ • Mixed urban/agricultural │ │ • Buildings, vegetation, water bodies │ │ • Variable altitude (250-400m) │ │ • Includes 1-2 sharp turns │ │ • 1200 images, ~25 min flight │ └────────────────────────────────────────┘ Pass Criteria: ✓ Accuracy: ≥75% within 50m (relaxed from 80%) ✓ Accuracy: ≥50% within 20m (relaxed from 60%) ✓ Registration rate: ≥92% (relaxed from 95%) ✓ Processing time: <2.5s/image avg ✓ Outliers detected: <15% (relaxed from 10%) Fallback Validation: ✓ User corrected <20% of uncertain images ✓ After correction, accuracy meets FAT-1 targets ``` #### FAT-3: Real UAV Flight Trial #3 (Edge Case) ``` Scenario: Low-texture flight (challenging for features) ┌────────────────────────────────────────┐ │ Conditions: │ │ • Sandy/desert terrain or water │ │ • Minimal features │ │ • Overcast/variable lighting │ │ • 500-600 images, ~12 min flight │ └────────────────────────────────────────┘ Pass Criteria: ✓ System continues (no crash): YES ✓ Graceful degradation: Flags uncertainty ✓ User can correct and improve: YES ✓ Satellite anchor helps recovery: YES Success Metrics: • >80% of images tagged "uncertain" • After user correction: meets standard targets • Demonstrates fallback mechanisms working ``` --- ## 3. 
Test Environment Setup ### Hardware Requirements ``` CPU: 16+ cores (Intel Xeon / AMD Ryzen) RAM: 64GB minimum (32GB acceptable for <1500 images) Storage: 1TB SSD (for raw images + processing) GPU: Optional (CUDA 11.8+ for 5-10x acceleration) Network: For satellite API queries (can be cached) ``` ### Software Requirements ``` OS: Ubuntu 20.04 LTS or macOS 12+ Build: CMake 3.20+, GCC 9+ or Clang 11+ Dependencies: OpenCV 4.8+, Eigen 3.4+, GDAL 3.0+ Testing: GoogleTest, Pytest CI/CD: GitHub Actions or Jenkins ``` ### Test Data Management ``` Synthetic Data: Generated via Blender (checked into repo) Real Data: External dataset storage (S3/local SSD) Ground Truth: Maintained in CSV format with metadata Versioning: Git-LFS for binary image data ``` --- ## 4. Test Execution Plan ### Phase 1: Unit Testing (Weeks 1-6) ``` Sprint 1-2: UT-1 (Feature detection) - 2 weeks Sprint 3-4: UT-2 (Feature matching) - 2 weeks Sprint 5-6: UT-3, UT-4, UT-5 (Geometry) - 2 weeks Continuous: Run full unit test suite on every commit Coverage target: >90% code coverage ``` ### Phase 2: Integration Testing (Weeks 7-12) ``` Sprint 7-9: IT-1 (Sequential pipeline) - 3 weeks Sprint 10-11: IT-2, IT-3 (Georef, Outliers) - 2 weeks Sprint 12: System integration - 1 week Continuous: Integration tests run nightly ``` ### Phase 3: System Testing (Weeks 13-18) ``` Sprint 13-14: ST-1, ST-2 (Accuracy, Registration) - 2 weeks Sprint 15-16: ST-3, ST-4 (Error, Speed) - 2 weeks Sprint 17-18: ST-5, ST-6 (Robustness) - 2 weeks Load testing: 1000-3000 image sequences Stress testing: Edge cases, memory limits ``` ### Phase 4: Field Acceptance (Weeks 19-30) ``` Week 19-22: FAT-1 (Baseline trial) • Coordinate 1-2 baseline flights • Validate system on real data • Adjust parameters as needed Week 23-26: FAT-2 (Challenging trial) • More complex scenarios • Test fallback mechanisms • Refine user interface Week 27-30: FAT-3 (Edge case trial) • Low-texture scenarios • Validate robustness • Final adjustments Post-trial: Generate
comprehensive report ``` --- ## 5. Acceptance Criteria Summary | Criterion | Target | Test | Pass/Fail | |-----------|--------|------|-----------| | **Accuracy@50m** | ≥80% | FAT-1 | ≥80% pass | | **Accuracy@20m** | ≥60% | FAT-1 | ≥60% pass | | **Registration Rate** | ≥95% | ST-2 | ≥95% pass | | **Reprojection Error** | <1.0px mean | ST-3 | <1.0px pass | | **Processing Speed** | <2.0s/image | ST-4 | p95<2.5s pass | | **Robustness (350m outlier)** | Handled | ST-5 | Continue pass | | **Sharp turns (<5% overlap)** | Handled | ST-6 | Skip-frame pass | | **Satellite validation** | <10% outliers | FAT-1-3 | <10% pass | --- ## 6. Success Metrics **Green Light Criteria** (Ready for production): - ✅ All unit tests pass (100%) - ✅ All integration tests pass (100%) - ✅ All system tests pass (100%) - ✅ FAT-1 and FAT-2 pass acceptance criteria - ✅ FAT-3 shows graceful degradation - ✅ <10% code defects discovered in field trials - ✅ Performance meets SLA consistently **Yellow Light Criteria** (Conditional deployment): - ⚠ 85-89% of acceptance criteria met - ⚠ Minor issues in edge cases - ⚠ Requires workaround documentation - ⚠ Re-test after fixes **Red Light Criteria** (Do not deploy): - ❌ <85% of acceptance criteria met - ❌ Critical failures in core functionality - ❌ Safety/security concerns - ❌ Cannot meet latency or accuracy targets
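The acceptance gate in Sections 5-6 reduces to a handful of threshold checks over per-image results. The sketch below shows one way that gate could be automated in the Pytest harness named above; the function name `evaluate_acceptance` and its argument layout are illustrative assumptions, not part of the project codebase. Per-image localization errors are given in metres, with `None` marking an unregistered image (so unregistered images count against the accuracy percentages, as the Section 5 targets are fractions of all images); thresholds follow the Section 5 table, with the p95 latency bound taken from the ST-4 pass column.

```python
def evaluate_acceptance(errors_m, reproj_err_px, p95_latency_s):
    """Compute the Section 5 acceptance metrics and an overall pass flag.

    errors_m      -- per-image error in metres; None = image not registered
    reproj_err_px -- mean reprojection error in pixels (from ST-3)
    p95_latency_s -- 95th-percentile per-image processing time (from ST-4)
    """
    n_total = len(errors_m)
    registered = [e for e in errors_m if e is not None]

    metrics = {
        # Accuracy fractions are taken over ALL images, not just registered ones.
        "accuracy@50m": sum(e <= 50 for e in registered) / n_total,
        "accuracy@20m": sum(e <= 20 for e in registered) / n_total,
        "registration_rate": len(registered) / n_total,
        "reprojection_px": reproj_err_px,
        "p95_latency_s": p95_latency_s,
    }
    # Thresholds from the Section 5 acceptance table.
    metrics["pass"] = (
        metrics["accuracy@50m"] >= 0.80
        and metrics["accuracy@20m"] >= 0.60
        and metrics["registration_rate"] >= 0.95
        and reproj_err_px < 1.0
        and p95_latency_s < 2.5
    )
    return metrics
```

A FAT report could then call this once per flight and attach the returned dict; for example, a 100-image flight with 5 unregistered images passes only if the remaining errors still put 80 of the 100 images within 50 m.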