UAV Aerial Image Geolocalization System: Solution Draft
Executive Summary
This document presents a comprehensive solution for determining GPS coordinates of aerial image centers and objects within images captured by fixed-wing UAVs flying at altitudes up to 1km over eastern/southern Ukraine. The system leverages structure-from-motion (SfM), visual odometry, and satellite image cross-referencing to achieve sub-50-meter accuracy for 80% of images while maintaining registration rates above 95%.
1. Problem Analysis
1.1 Key Constraints & Challenges
- No onboard GPS/GNSS receiver (system must infer coordinates)
- Fixed downward-pointing camera (non-stabilized, subject to aircraft pitch/roll)
- Up to 3000 images per flight at 100m nominal spacing (variable due to aircraft dynamics)
- Altitude ≤ 1km with resolution up to 6252×4168 pixels
- Sharp turns can reduce overlap between consecutive images below 5% or eliminate it entirely
- Outliers possible: up to 350m of drift between consecutive image footprints (caused by aircraft tilt)
- Time constraint: <2 seconds processing per image
- Real-world requirement: Google Maps validation with <10% outliers
1.2 Reference Dataset Analysis
The provided 29 sample images show:
- Flight distance: ~2.26 km ground path
- Image spacing: 66-202m (mean 119m), indicating ~100-200m altitude
- Coverage area: ~1.1 km × 1.6 km
- Geographic region: eastern Ukraine (east of the Dnipro, Kherson/Zaporizhzhia area)
- Terrain: Mix of agricultural fields and scattered vegetation
1.3 Acceptance Criteria Summary
| Criterion | Target |
|---|---|
| 80% of images within 50m error | Required |
| 60% of images within 20m error | Required |
| Handle 350m outlier drift | Graceful degradation |
| Image Registration Rate | >95% |
| Mean Reprojection Error | <1.0 pixels |
| Processing time/image | <2 seconds |
| Outlier rate (satellite check) | <10% |
| User interaction fallback | For unresolvable 20% |
2. State-of-the-Art Solutions
2.1 Current Industry Standards
A. OpenDroneMap (ODM)
- Strengths: Open-source, parallelizable, proven at scale (2500+ images)
- Pipeline: OpenSfM (feature matching/tracking) → OpenMVS (dense reconstruction) → GDAL (georeferencing)
- Weaknesses: Requires GCPs for absolute georeferencing; computational cost (recommends 128GB RAM); doesn't handle GPS-denied scenarios without external anchors
- Typical accuracy: Meter-level without GCPs; cm-level with GCPs
B. COLMAP
- Strengths: Incremental SfM with robust bundle adjustment; excellent reprojection error (typically <0.5px)
- Application: Academic gold standard; proven on large multi-view datasets
- Limitations: Requires good initial seed pair; can fail with low overlap; computational cost for online processing
- Relevance: Core algorithm suitable as backbone for this application
C. AliceVision/Meshroom
- Strengths: Modular photogrammetry framework; feature-rich; GPU-accelerated
- Features: Robust feature matching, multi-view stereo, camera tracking
- Challenge: Designed for batch processing, not real-time streaming
D. ORB-SLAM3
- Strengths: Real-time monocular SLAM; handles rolling-shutter distortions; extremely fast
- Relevant to: Aerial video streams; can operate at frame rates
- Limitation: No absolute georeferencing without external anchors; drifts over long sequences
E. GPS-Denied Visual Localization (GNSS-Denied Methods)
- Deep Learning Approaches: CLIP-based satellite-aerial image matching achieving 39m location error, 15.9° heading error at 100m altitude
- Hierarchical Methods: Coarse semantic matching + fine-grained feature refinement; tolerates oblique views
- Advantage: Works with satellite imagery as reference
2.2 Feature Detector/Descriptor Comparison
| Algorithm | Detection Speed | Matching Speed | Features | Robustness | Best For |
|---|---|---|---|---|---|
| SIFT | Slow | Medium | Scattered | Excellent | Reference, small scale |
| AKAZE | Fast | Fast | Moderate | Very Good | Real-time, scale variance |
| ORB | Very Fast | Very Fast | High | Good | Real-time, embedded systems |
| SuperPoint | Medium | Fast | Learned | Excellent | Modern DL pipelines |
Recommendation: Hybrid approach using AKAZE for speed + SuperPoint for robustness in difficult scenes
3. Proposed Architecture Solution
3.1 High-Level System Design
┌─────────────────────────────────────────────────────────────────┐
│ UAV IMAGE STREAM │
│ (Sequential, ≤100m spacing, 100-200m alt) │
└──────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────┴──────────────────┐
│ │
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────┐
│ FEATURE EXTRACTION │ │ INITIALIZATION MODULE │
│ ──────────────────── │ │ ────────────────── │
│ • AKAZE keypoint detect │ │ • Assume starting GPS │
│ • Multi-scale pyramids │ │ • Initial camera params │
│ • Descriptor computation│ │ • Seed pair selection │
└──────────────┬───────────┘ └──────────────┬───────────┘
│ │
│ ┌────────────────────────┘
│ │
▼ ▼
┌──────────────────────────┐
│ SEQUENTIAL MATCHING │
│ ──────────────────── │
│ • N-to-N+1 matching │
│ • Epipolar constraint │
│ • RANSAC outlier reject │
│ • Essential matrix est. │
└──────────────┬───────────┘
│
              ┌──────────┴──────────┐
          YES │                     │ NO/DIFFICULT
              ▼                     ▼
      ┌──────────────┐     ┌─────────────────┐
      │ COMPUTE POSE │     │ FALLBACK:       │
      │ ──────────── │     │ • Try N→N+2     │
      │ • 5-pt alg   │     │ • Try global    │
      │ • Triangulate│     │ • Try satellite │
      │ • BA update  │     │ • Ask user      │
      └──────┬───────┘     └────────┬────────┘
             │                      │
             └──────────┬───────────┘
                        │
▼
┌──────────────────────────────┐
│ BUNDLE ADJUSTMENT (Local) │
│ ────────────────────────── │
│ • Windowed optimization │
│ • Levenberg-Marquardt │
│ • Refine poses + 3D points │
│ • Covariance estimation │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ GEOREFERENCING │
│ ──────────────────────── │
│ • Satellite image matching │
│ • GCP integration (if avail)│
│ • WGS84 transformation │
│ • Accuracy assessment │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ OUTPUT & VALIDATION │
│ ──────────────────────── │
│ • Image center GPS coords │
│ • Object/feature coords │
│ • Confidence intervals │
│ • Outlier flagging │
│ • Google Maps cross-check │
└──────────────────────────────┘
3.2 Core Algorithmic Components
3.2.1 Initialization Phase
Input: Starting GPS coordinate (or estimated from first visible landmarks)
Process:
- Load first image, extract AKAZE features at multiple scales
- Establish camera intrinsic parameters:
- If known: use factory calibration or pre-computed values
- If unknown: assume standard pinhole model with principal point at image center
- Estimate focal length from image resolution: ~2.5-3.0 × image width (typical aerial lens)
- Define initial local coordinate system:
- Origin at starting GPS coordinate
- Z-axis up, XY horizontal
- Project all future calculations to WGS84 at end
Output: Camera matrix K, initial camera pose (R₀, t₀)
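The intrinsic setup above can be sketched as follows. This is a minimal illustration of the unknown-calibration case; `make_intrinsics` is a hypothetical helper, and the 2.75 factor is simply the midpoint of the document's 2.5-3.0 heuristic, to be replaced by calibrated values when available.

```python
import numpy as np

def make_intrinsics(width: int, height: int, focal_scale: float = 2.75) -> np.ndarray:
    """Pinhole camera matrix with the principal point at the image center.

    focal_scale follows the document's ~2.5-3.0 x image-width heuristic for
    a typical narrow-FOV aerial lens; swap in factory calibration if known.
    """
    f = focal_scale * width            # focal length in pixels
    cx, cy = width / 2.0, height / 2.0  # principal point at image center
    return np.array([[f,   0.0, cx],
                     [0.0, f,   cy],
                     [0.0, 0.0, 1.0]])

# Intrinsics for the maximum-resolution case from the constraints:
K = make_intrinsics(6252, 4168)
```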
3.2.2 Sequential Image-to-Image Matching
Algorithm: Incremental SfM with temporal ordering constraint
For image N in sequence:
1. Extract AKAZE features from image N
2. Match features with image N-1 using KNN with Lowe's ratio test
3. RANSAC with 5-point essential matrix estimation (consistent with 3.2.3 and Component 3):
- Iterate: sample 5 point correspondences
- Solve: Nistér minimal solver (up to 10 candidate E matrices per sample)
- Score: inlier count (epipolar constraint |p'ᵀEp| < ε)
- Keep: best E with >50 inliers
4. If registration fails (inliers <50 or insufficient quality):
- Attempt N to N+2 matching (skip frame)
- If still failing: request user input or flag as uncertain
5. Decompose E to camera pose (R, t) with triangulation validation
6. Triangulate 3D points from matched features
7. Perform local windowed bundle adjustment (last 5 images)
8. Compute image center GPS via local-to-global transformation
Key Parameters:
- AKAZE threshold: adaptive based on image quality
- Matching distance ratio: 0.7 (Lowe's test)
- RANSAC inlier threshold: 1.0 pixels
- Minimum inliers for success: 50 points
- Maximum reprojection error in BA: 1.5 pixels
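Step 2 above (KNN matching with Lowe's ratio test) can be sketched in a few lines. `ratio_test_matches` is a hypothetical helper, and L2 distance on float descriptors stands in for the Hamming distance a binary AKAZE pipeline would actually use:

```python
import numpy as np

def ratio_test_matches(desc1: np.ndarray, desc2: np.ndarray, ratio: float = 0.7):
    """Brute-force 2-NN matching with Lowe's ratio test.

    desc1, desc2: float descriptor arrays (N1 x D), (N2 x D).
    Returns (i, j) index pairs whose best match is distinctly better
    than the second-best (d1/d2 < ratio), per the 0.7 parameter above.
    """
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)  # distance to every candidate
        j1, j2 = np.argsort(dists)[:2]             # two nearest neighbours
        if dists[j1] < ratio * dists[j2]:          # Lowe's ratio test
            matches.append((i, int(j1)))
    return matches
```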
3.2.3 Pose Estimation & Triangulation
5-Point Algorithm (Stewenius et al.):
- Minimal solver for 5 point correspondences
- Returns up to 4 solutions for essential matrix
- Selects solution with maximum triangulated points in front of cameras
- Sample size: 5 correspondences per hypothesis vs 8 for the 8-point algorithm, so RANSAC needs far fewer iterations at a given outlier rate
Triangulation:
- Linear triangulation using DLT (Direct Linear Transform)
- For each matched feature pair: solve 4×4 system via SVD
- Filter: reject points with:
- Reprojection error > 1.5 pixels
- Behind either camera
- Altitude inconsistent with flight dynamics
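The DLT step above can be illustrated as follows; this is a minimal single-point sketch, and a production version would vectorize over all correspondences and then apply the reprojection/chirality/altitude filters listed above.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence.

    P1, P2: 3x4 projection matrices; x1, x2: image coordinates (u, v).
    Builds the 4x4 homogeneous system from the two projections and
    solves it via SVD, as described above.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],   # u1 * row3 - row1 of P1
        x1[1] * P1[2] - P1[1],   # v1 * row3 - row2 of P1
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                   # null-space solution
    return X[:3] / X[3]          # dehomogenize
```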
3.2.4 Bundle Adjustment (Windowed)
Formulation:
minimize Σ ||p_i^(img) - π(X_i, P_cam)||² + λ·||ΔP_cam||²
where:
- p_i^(img): observed pixel position
- X_i: 3D point coordinate
- P_cam: camera pose parameters
- π(): projection function
- λ: regularization weight
Algorithm: Sparse Levenberg-Marquardt with Schur complement
- Window size: 5-10 consecutive images (trade-off between accuracy and speed)
- Iteration limit: 10 (convergence typically in 3-5)
- Damping: adaptive μ (starts at 10⁻⁶)
- Covariance computation: from information matrix inverse
Complexity: O(w³) where w = window size → ~0.3s for w=10 on modern CPU
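A toy version of the damped optimization above, refining a single 3D point with the camera poses held fixed and a finite-difference Jacobian. Full windowed BA also optimizes the poses and uses analytic sparse Jacobians with the Schur complement; `refine_point_lm` is illustrative only.

```python
import numpy as np

def reproj_residuals(X, cams, obs):
    """Stacked reprojection residuals of one 3D point X across cameras.
    cams: list of 3x4 projection matrices; obs: list of observed (u, v)."""
    r = []
    for P, (u, v) in zip(cams, obs):
        x = P @ np.append(X, 1.0)
        r += [x[0] / x[2] - u, x[1] / x[2] - v]
    return np.array(r)

def refine_point_lm(X0, cams, obs, iters=10, mu=1e-6):
    """Damped Gauss-Newton (Levenberg-Marquardt) on a single point."""
    X = np.asarray(X0, dtype=float)
    for _ in range(iters):
        r = reproj_residuals(X, cams, obs)
        J = np.empty((len(r), 3))
        for k in range(3):                       # finite-difference Jacobian
            dX = np.zeros(3); dX[k] = 1e-6
            J[:, k] = (reproj_residuals(X + dX, cams, obs) - r) / 1e-6
        H = J.T @ J + mu * np.eye(3)             # damped normal equations
        X = X + np.linalg.solve(H, -J.T @ r)     # LM update step
    return X
```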
3.2.5 Georeferencing Module
Challenge: Converting local 3D structure to WGS84 coordinates
Approach 1 - Satellite Image Matching (Primary):
- Query Google Maps Static API for area around estimated location
- Scale downloaded satellite imagery to match expected ground resolution
- Extract ORB/SIFT features from satellite image
- Match features between UAV nadir image and satellite image
- Compute homography transformation (if sufficient overlap)
- Estimate camera center GPS from homography
- Validate: check consistency with neighboring images
Approach 2 - GCP Integration (When available):
- If user provides 4+ manually-identified GCPs in images with known coords:
- Use GCPs to establish local-to-global transformation
- 6-DOF rigid transformation (4 GCPs minimum)
- Refine with all available GCPs using least-squares
- Transform all local coordinates via this transformation
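A least-squares local-to-global fit from GCP correspondences might look like the sketch below (Umeyama's closed-form method). A scale term is included here because monocular SfM recovers structure only up to scale, slightly extending the pure 6-DOF rigid case in the text; `similarity_from_gcps` is a hypothetical helper.

```python
import numpy as np

def similarity_from_gcps(local_pts, gps_pts):
    """Least-squares similarity transform (s, R, t) with gps ≈ s·R·local + t.

    local_pts, gps_pts: Nx3 corresponding points (N >= 3, non-collinear).
    Closed-form Umeyama solution via SVD of the cross-covariance.
    """
    mu_l, mu_g = local_pts.mean(axis=0), gps_pts.mean(axis=0)
    L, G = local_pts - mu_l, gps_pts - mu_g        # centered point sets
    cov = G.T @ L / len(local_pts)                 # cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # guard against reflections
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / L.var(axis=0).sum()
    t = mu_g - s * R @ mu_l
    return s, R, t
```

With 4+ GCPs the same fit simply becomes overdetermined and averages out per-point noise, matching the refinement step above.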
Approach 3 - IMU/INS Integration (If available):
- If UAV provides gyro/accelerometer data:
- Integrate IMU measurements to constrain camera orientation
- Use IMU to detect anomalies (sharp turns, tilt)
- Fuse with visual odometry using Extended Kalman Filter (EKF)
- Improves robustness during low-texture sequences
Uncertainty Quantification:
- Covariance matrix σ² from bundle adjustment
- Project uncertainty to GPS coordinates via Jacobian
- Compute 95% confidence ellipse for each image center
- Typical values: σ ≈ 20-50m initially, improves with satellite anchor
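Projecting a local metric offset onto WGS84 can be approximated as below. This is the small-angle spherical approximation, adequate at single-flight scales of a few kilometers; a geodesy library such as pyproj is preferable for anything larger. `enu_offset_to_wgs84` is an illustrative helper.

```python
import math

def enu_offset_to_wgs84(lat0_deg, lon0_deg, east_m, north_m):
    """Convert a small east/north offset (meters) from a known anchor
    into WGS84 latitude/longitude degrees."""
    m_per_deg_lat = 111_320.0                                    # ~meters per degree latitude
    m_per_deg_lon = 111_320.0 * math.cos(math.radians(lat0_deg)) # shrinks with latitude
    return (lat0_deg + north_m / m_per_deg_lat,
            lon0_deg + east_m / m_per_deg_lon)
```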
3.2.6 Fallback & Outlier Detection
Outlier Detection Strategy:
- Local consistency check:
- Compute velocity between consecutive images
- Flag if velocity changes >50% between successive intervals
- Expected velocity: ~10-15 m/s ground speed
- Satellite validation:
- After full flight processing: retrieve satellite imagery
- Compare UAV image against satellite image at claimed coordinates
- Compute cross-correlation; flag if <0.3
- Loop closure detection:
- If imagery from later in flight matches earlier imagery: flag potential error
- Use place recognition (ORB vocabulary tree) to detect revisits
- User feedback loop:
- Display flagged uncertain frames to operator
- Allow manual refinement for <20% of images
- Re-optimize trajectory using corrected anchor points
Graceful Degradation (350m outlier scenario):
- Detect outlier via velocity threshold
- Attempt skip-frame matching (N to N+2, N+3)
- If fails, insert "uncertainty zone" marker
- Continue from next successfully matched pair
- Later satellite validation will flag this region for manual review
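The velocity-threshold check underlying both the outlier strategy and the 350m scenario can be sketched as follows (thresholds taken from the strategy above; the sample track and `flag_velocity_outliers` are illustrative):

```python
import numpy as np

def flag_velocity_outliers(positions, dt=1.0, vmax=30.0, vmin=1.0):
    """Flag frame transitions whose implied ground speed is implausible.

    positions: Nx2 local coordinates in meters; dt: inter-frame interval
    in seconds.  Returns a boolean array of length N-1, True where the
    step should be treated as a candidate outlier.
    """
    d = np.linalg.norm(np.diff(positions, axis=0), axis=1)  # step lengths
    v = d / dt                                              # implied speed
    return (v > vmax) | (v < vmin)

# A nominal 15 m/s track with one injected 350 m jump (the outlier scenario):
track = np.array([[0.0, 0], [15, 0], [30, 0], [380, 0], [395, 0]])
flags = flag_velocity_outliers(track)
```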
4. Architecture: Detailed Module Specifications
4.1 System Components
Component 1: Image Preprocessor
Input: Raw JPEG/PNG from UAV
Output: Normalized, undistorted image ready for feature extraction
Operations:
├─ Load image (max 6252×4168)
├─ Apply lens distortion correction (if calibration available)
├─ Normalize histogram (CLAHE for uniform feature detection)
├─ Optional: Downsample for <2s latency (e.g., 3000×2000 if >4000×3000)
├─ Compute image metadata (filename, timestamp)
└─ Cache for access by subsequent modules
Component 2: Feature Detector
Input: Preprocessed image
Output: Keypoints + descriptors
Algorithm: AKAZE on a nonlinear diffusion scale space
├─ Octaves: 4, sublevels per octave: 4
├─ Detector response threshold: adaptive (target 500-1000 keypoints)
├─ M-LDB binary descriptor: rotation-invariant, 61 bytes (OpenCV default size)
├─ Feature filtering:
│ ├─ Remove features in low-texture regions (intensity variance <10)
│ ├─ Enforce min separation (8px) to avoid clustering
│ └─ Sort by detector response (keep top 2000)
└─ Output: vector<KeyPoint>, Mat descriptors (N×61 uint8)
Component 3: Feature Matcher
Input: Features from Image N-1, Features from Image N
Output: Vector of matched point pairs (inliers only)
Algorithm: KNN matching with Lowe's ratio test + RANSAC
├─ BruteForceMatcher (Hamming distance for AKAZE)
├─ KNN search: k=2
├─ Lowe's ratio test: d1/d2 < 0.7
├─ RANSAC 5-point algorithm:
│ ├─ Iterations: adaptive termination N = log(1−p)/log(1−wⁿ) with p = 0.99, n = 5, capped at 4000
│ ├─ Inlier threshold: 1.0 pixels
│ ├─ Minimum inliers: 50 (lower to 30 for skip-frame matching)
│ └─ Success: inlier_ratio > 0.4
├─ Triangulation validation (reject behind camera)
└─ Output: vector<DMatch>, Mat points3D (Mx3)
Component 4: Pose Solver
Input: Essential matrix E from RANSAC, matched points
Output: Rotation matrix R, translation vector t
Algorithm: E decomposition
├─ SVD decomposition of E
├─ Extract 4 candidate (R, t) pairs
├─ Triangulate points for each candidate
├─ Select candidate with max points in front of both cameras
├─ Recover scale using calibration (altitude constraint)
└─ Output: 4×4 transformation matrix T = [R t; 0 1]
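The SVD decomposition into four candidates might be implemented as below. This is the standard textbook construction; selecting the correct candidate by the chirality (points-in-front) test is left to the triangulation step described above.

```python
import numpy as np

def decompose_essential(E):
    """Decompose an essential matrix into its four (R, t) candidates.

    Returns [(R1, t), (R1, -t), (R2, t), (R2, -t)]; t is recovered only
    up to sign and scale (scale comes later from the altitude constraint).
    """
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:  U = -U             # keep proper rotations
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])  # 90° rotation about z
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt            # the two rotation candidates
    t = U[:, 2]                                  # translation direction (null space)
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Example: pure sideways translation t = [1, 0, 0] with R = I gives
# E = [t]_x = [[0, 0, 0], [0, 0, -1], [0, 1, 0]].
E = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
cands = decompose_essential(E)
```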
Component 5: Triangulator
Input: Keypoints from image 1, image 2; poses P1, P2; calib K
Output: 3D point positions, mask of valid points
Algorithm: Linear triangulation (DLT)
├─ For each point correspondence (p1, p2):
│ ├─ Build 4×4 matrix from epipolar lines
│ ├─ SVD → solve for 3D point X
│ ├─ Validate: |p1 - π(X,P1)| < 1.5px AND |p2 - π(X,P2)| < 1.5px
│ ├─ Validate: X_z > 50m (min safe altitude above ground)
│ └─ Validate: X_z < 1500m (max altitude constraint)
└─ Output: Mat points3D (Mx3 float32), Mat validMask (Mx1 uchar)
Component 6: Bundle Adjuster
Input: Poses [P0...Pn], 3D points [X0...Xm], observations
Output: Refined poses, 3D points, covariance matrices
Algorithm: Sparse Levenberg-Marquardt with windowing
├─ Window size: 5 images (or fewer at flight start)
├─ Optimization variables:
│ ├─ Camera poses: 6 DOF per image (Rodrigues rotation + translation)
│ └─ 3D points: 3 coordinates per point
├─ Residuals: reprojection error in both images
├─ Iterations: max 10 (typically converges in 3-5)
├─ Covariance:
│ ├─ Compute Hessian inverse (information matrix)
│ ├─ Extract diagonal for per-parameter variances
│ └─ Per-image uncertainty: sqrt(diag(Cov[t]))
└─ Output: refined poses, points, Mat covariance (per image)
Component 7: Satellite Georeferencer
Input: Current image, estimated center GPS (rough), local trajectory
Output: Refined GPS coordinates, confidence score
Algorithm: Satellite image matching
├─ Query Google Maps API:
│ ├─ Coordinates: estimated_gps ± 200m
│ ├─ Resolution: match UAV image resolution (1-2m GSD)
│ └─ Zoom level: 18-20
├─ Image preprocessing:
│ ├─ Scale satellite image to ~same resolution as UAV image
│ ├─ Convert to grayscale
│ └─ Equalize histogram
├─ Feature matching:
│ ├─ Extract ORB features from both images
│ ├─ Match with BruteForceMatcher
│ ├─ Apply RANSAC homography (min 10 inliers)
│ └─ Compute inlier ratio
├─ Homography analysis:
│ ├─ If inlier_ratio > 0.2:
│ │ ├─ Extract 4 corners from UAV image via inverse homography
│ │ ├─ Map to satellite image coordinates
│ │ ├─ Compute implied GPS shift
│ │ └─ Apply shift to current pose estimate
│ └─ else: keep local estimate, flag as uncertain
├─ Confidence scoring:
│ ├─ score = inlier_ratio × mutual_information_normalized
│ └─ Threshold: score > 0.3 for "high confidence"
└─ Output: refined_gps, confidence (0.0-1.0), residual_px
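The corner-transfer step through the estimated homography reduces to a perspective divide; `map_corners` is a hypothetical helper illustrating it:

```python
import numpy as np

def map_corners(H, width, height):
    """Map the four UAV image corners through homography H into the
    satellite image's pixel coordinates (the corner-transfer step above)."""
    corners = np.array([[0, 0], [width, 0], [width, height], [0, height]], float)
    hom = np.hstack([corners, np.ones((4, 1))])    # homogeneous coordinates
    mapped = (H @ hom.T).T
    return mapped[:, :2] / mapped[:, 2:3]          # perspective divide
```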
Component 8: Outlier Detector
Input: Trajectory sequence [GPS_0, GPS_1, ..., GPS_n]
Output: Outlier flags, re-processed trajectory
Algorithm: Multi-stage detection
├─ Stage 1 - Velocity anomaly:
│ ├─ Compute inter-image distances: d_i = |GPS_i - GPS_{i-1}|
│ ├─ Compute velocity: v_i = d_i / Δt (Δt typically 0.5-2s)
│ ├─ Expected: 10-20 m/s for typical UAV
│ ├─ Flag if: v_i > 30 m/s OR v_i < 1 m/s
│ └─ Acceleration anomaly: |v_i - v_{i-1}| > 15 m/s
├─ Stage 2 - Satellite consistency:
│ ├─ For each flagged image:
│ │ ├─ Retrieve satellite image at claimed GPS
│ │ ├─ Compute cross-correlation with UAV image
│ │ └─ If corr < 0.25: mark as outlier
│ └─ Reprocess outlier image:
│ ├─ Try skip-frame matching (to N±2, N±3)
│ ├─ Try global place recognition
│ └─ Request user input if all fail
├─ Stage 3 - Loop closure:
│ ├─ Check if image matches any earlier image (Hamming dist <50)
│ └─ If match detected: assess if consistent with trajectory
└─ Output: flags, corrected_trajectory, uncertain_regions
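The Stage 2 consistency score can use zero-mean normalized cross-correlation, sketched below; real patches would first be resampled to a common ground resolution and aligned before comparison.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation between two equally sized
    patches.  Returns a value in [-1, 1]; Stage 2 above marks the frame
    as an outlier when the score falls below ~0.25."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```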
Component 9: User Interface Module
Input: Flight trajectory, flagged uncertain regions
Output: User corrections, refined trajectory
Features:
├─ Web interface or desktop app
├─ Map display (Google Maps embedded):
│ ├─ Show computed trajectory
│ ├─ Overlay satellite imagery
│ ├─ Highlight uncertain regions (red)
│ ├─ Show confidence intervals (error ellipses)
│ └─ Display reprojection errors
├─ Image preview:
│ ├─ Click trajectory point to view corresponding image
│ ├─ Show matched keypoints and epipolar lines
│ ├─ Display feature matching quality metrics
│ └─ Show neighboring images in sequence
├─ Manual correction:
│ ├─ Drag trajectory point to correct location (via map click)
│ ├─ Mark GCPs manually (click point in image, enter GPS)
│ ├─ Re-run optimization with corrected anchors
│ └─ Export corrected trajectory as GeoJSON/CSV
└─ Reporting:
├─ Summary statistics (% within 50m, 20m, etc.)
├─ Outlier report with reasons
├─ Satellite validation results
└─ Export georeferenced image list with coordinates
4.2 Data Flow & Processing Pipeline
Phase 1: Offline Initialization (before flight or post-download)
Input: Full set of N images, starting GPS coordinate
├─ Load all images into memory/fast storage (SSD)
├─ Detect features in all images (parallelizable: N CPU threads)
├─ Store features on disk for quick access
└─ Estimate camera calibration (if not known)
Time: ~1-3 minutes for 1000 images on 16-core CPU
Phase 2: Sequential Processing (online or batch)
For i = 1 to N-1:
├─ Load images[i] and images[i+1]
├─ Match features
├─ RANSAC pose estimation
├─ Triangulate 3D points
├─ Local bundle adjustment (last 5 frames)
├─ Satellite georeferencing
├─ Store: GPS[i+1], confidence[i+1], covariance[i+1]
└─ [< 2 seconds per iteration]
Time: 2N seconds = ~30-60 minutes for 1000 images
Phase 3: Post-Processing (after full trajectory)
├─ Global bundle adjustment (optional: full flight with key-frame selection)
├─ Loop closure optimization (if detected)
├─ Outlier detection and flagging
├─ Satellite validation (batch retrieve imagery, compare)
├─ Export results with metadata
└─ Generate report with accuracy metrics
Time: ~5-20 minutes
Phase 4: Manual Review & Correction (if needed)
├─ User reviews flagged uncertain regions
├─ Manually corrects up to 20% of trajectory as needed
├─ Re-optimizes with corrected anchors
└─ Final export
Time: 10-60 minutes depending on complexity
5. Testing Strategy
5.1 Detailed Test Categories
5.1.1 Unit Tests (Level 1)
UT-1: Feature Extraction (AKAZE)
Purpose: Verify keypoint detection and descriptor computation
Test Data: Synthetic images with known features (checkerboard patterns)
Test Cases:
├─ UT-1.1: Basic feature detection
│ Input: 1024×768 synthetic image with checkerboard
│ Expected: ≥500 keypoints detected
│ Pass: count ≥ 500
│
├─ UT-1.2: Scale invariance
│ Input: Same scene at 2x scale
│ Expected: Keypoints at proportional positions
│ Pass: correlation of positions > 0.9
│
├─ UT-1.3: Rotation robustness
│ Input: Image rotated ±30°
│ Expected: Descriptors match original + rotated
│ Pass: match rate > 80%
│
├─ UT-1.4: Multi-scale handling
│ Input: Image with features at multiple scales
│ Expected: Features detected at all scales (pyramid)
│ Pass: ratio of scales [1:1.2:1.44:...] verified
│
└─ UT-1.5: Performance constraint
Input: FullHD image (1920×1080)
Expected: <500ms feature extraction
Pass: 95th percentile < 500ms
UT-2: Feature Matching
Purpose: Verify robust feature correspondence
Test Data: Pairs of synthetic/real images with known correspondence
Test Cases:
├─ UT-2.1: Basic matching
│ Input: Two images from synthetic scene (90% overlap)
│ Expected: ≥95% of ground-truth features matched
│ Pass: match_rate ≥ 0.95
│
├─ UT-2.2: Outlier rejection (Lowe's ratio test)
│ Input: Synthetic pair + 50% false features
│ Expected: False matches rejected
│ Pass: false_match_rate < 0.1
│
├─ UT-2.3: Low overlap scenario
│ Input: Two images with 20% overlap
│ Expected: Still matches ≥20 points
│ Pass: min_matches ≥ 20
│
└─ UT-2.4: Performance
Input: FullHD images, 1000 features each
Expected: <300ms matching time
Pass: 95th percentile < 300ms
UT-3: Essential Matrix Estimation
Purpose: Verify 5-point/8-point algorithms for camera geometry
Test Data: Synthetic correspondences with known relative pose
Test Cases:
├─ UT-3.1: 8-point algorithm
│ Input: 8+ point correspondences
│ Expected: Essential matrix E with rank 2
│ Pass: min_singular_value(E) < 1e-6
│
├─ UT-3.2: 5-point algorithm
│ Input: 5 point correspondences
│ Expected: Up to 4 solutions generated
│ Pass: num_solutions ∈ [1, 4]
│
├─ UT-3.3: RANSAC convergence
│ Input: 100 correspondences, 30% outliers
│ Expected: Essential matrix recovery despite outliers
│ Pass: inlier_ratio ≥ 0.6
│
└─ UT-3.4: Chirality constraint
Input: Multiple (R,t) solutions from decomposition
Expected: Only solution with points in front of cameras selected
Pass: selected_solution verified via triangulation
UT-4: Triangulation (DLT)
Purpose: Verify 3D point reconstruction from image correspondences
Test Data: Synthetic scenes with known 3D geometry
Test Cases:
├─ UT-4.1: Accuracy
│ Input: Noise-free point correspondences
│ Expected: Reconstructed X matches ground truth
│ Pass: RMSE < 0.1cm on 1m scene
│
├─ UT-4.2: Outlier handling
│ Input: 10 valid + 2 invalid correspondences
│ Expected: Invalid points detected (behind camera/far)
│ Pass: valid_mask accuracy > 95%
│
├─ UT-4.3: Altitude constraint
│ Input: Points with z < 50m (below aircraft)
│ Expected: Points rejected
│ Pass: altitude_filter works correctly
│
└─ UT-4.4: Batch performance
Input: 500 point triangulations
Expected: <100ms total
Pass: 95th percentile < 100ms
UT-5: Bundle Adjustment
Purpose: Verify pose and 3D point optimization
Test Data: Synthetic multi-view scenes
Test Cases:
├─ UT-5.1: Convergence
│ Input: 5 frames with noisy initial poses
│ Expected: Residual decreases monotonically
│ Pass: final_residual < 0.001 * initial_residual
│
├─ UT-5.2: Covariance computation
│ Input: Optimized poses and points
│ Expected: Covariance matrix positive-definite
│ Pass: all_eigenvalues > 0
│
├─ UT-5.3: Window size effect
│ Input: Same problem with window sizes [3, 5, 10]
│ Expected: Larger windows → better residuals
│ Pass: residual_5 < residual_3, residual_10 < residual_5
│
└─ UT-5.4: Performance scaling
Input: Window size [5, 10, 15, 20]
Expected: Time ~= O(w^3)
Pass: cubic fit accurate (R² > 0.95)
5.1.2 Integration Tests (Level 2)
IT-1: Sequential Pipeline
Purpose: Verify image-to-image processing chain
Test Data: Real aerial image sequences (5-20 images)
Test Cases:
├─ IT-1.1: Feature flow
│ Features extracted from img₁ → tracked to img₂ → matched
│ Expected: Consistent tracking across images
│ Pass: ≥70% features tracked end-to-end
│
├─ IT-1.2: Pose chain consistency
│ Poses P₁, P₂, P₃ computed sequentially
│ Expected: P₃ = ΔP₂→₃ ∘ P₂ (chained relative poses compose consistently)
│ Pass: pose_error < 0.1° rotation, 5cm translation
│
├─ IT-1.3: Trajectory smoothness
│ Velocity computed between poses
│ Expected: Smooth velocity profile (no jumps)
│ Pass: velocity_std_dev < 20% mean_velocity
│
└─ IT-1.4: Memory usage
Process 100-image sequence
Expected: Constant memory (windowed processing)
Pass: peak_memory < 2GB
IT-2: Satellite Georeferencing
Purpose: Verify local-to-global coordinate transformation
Test Data: Synthetic/real images with known satellite reference
Test Cases:
├─ IT-2.1: Feature matching with satellite
│ Input: Aerial image + satellite reference
│ Expected: ≥10 matched features between viewpoints
│ Pass: match_count ≥ 10
│
├─ IT-2.2: Homography estimation
│ Matched features → homography matrix
│ Expected: Valid transformation (3×3 matrix)
│ Pass: det(H) ≠ 0, condition_number < 100
│
├─ IT-2.3: GPS transformation accuracy
│ Apply homography to image corners
│ Expected: Computed GPS ≈ known reference GPS
│ Pass: error < 100m (on test data)
│
└─ IT-2.4: Confidence scoring
Compute inlier_ratio and MI (mutual information)
Expected: score = inlier_ratio × MI ∈ [0, 1]
Pass: high_confidence for obvious matches
IT-3: Outlier Detection Chain
Purpose: Verify multi-stage outlier detection
Test Data: Synthetic trajectory with injected outliers
Test Cases:
├─ IT-3.1: Velocity anomaly detection
│ Inject 350m jump at frame N
│ Expected: Detected as outlier
│ Pass: outlier_flag = True
│
├─ IT-3.2: Recovery mechanism
│ After outlier detection
│ Expected: System attempts skip-frame matching (N→N+2)
│ Pass: recovery_successful = True
│
├─ IT-3.3: False positive rate
│ Normal sequence with small perturbations
│ Expected: <5% false outlier flagging
│ Pass: false_positive_rate < 0.05
│
└─ IT-3.4: Consistency across stages
Multiple detection stages should agree
Pass: agreement_score > 0.8
5.1.3 System Tests (Level 3)
ST-1: Accuracy Criteria
Purpose: Verify system meets ±50m and ±20m accuracy targets
Test Data: Real aerial image sequences with ground-truth GPS
Test Cases:
├─ ST-1.1: 50m accuracy target
│ Input: 500-image flight
│ Compute: % images within 50m of ground truth
│ Expected: ≥80%
│ Pass: accuracy_50m ≥ 0.80
│
├─ ST-1.2: 20m accuracy target
│ Same flight data
│ Expected: ≥60% within 20m
│ Pass: accuracy_20m ≥ 0.60
│
├─ ST-1.3: Mean absolute error
│ Compute: MAE over all images
│ Expected: <40m typical
│ Pass: MAE < 50m
│
└─ ST-1.4: Error distribution
Expected: Error approximately Gaussian
Pass: K-S test p-value > 0.05
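A small harness for the ST-1 pass criteria might look like this; `accuracy_report` and the sample error list are illustrative, not part of the test suite.

```python
def accuracy_report(errors_m):
    """Summary statistics for ST-1: fraction of images within 50m / 20m
    of ground truth, plus mean absolute error.

    errors_m: per-image horizontal error in meters.
    """
    n = len(errors_m)
    return {
        "within_50m": sum(e <= 50.0 for e in errors_m) / n,  # ST-1.1 target >= 0.80
        "within_20m": sum(e <= 20.0 for e in errors_m) / n,  # ST-1.2 target >= 0.60
        "mae": sum(errors_m) / n,                            # ST-1.3 target < 50m
    }

# Illustrative per-image errors from a hypothetical 10-image run:
report = accuracy_report([5.0, 12.0, 18.0, 30.0, 45.0, 60.0, 70.0, 15.0, 22.0, 8.0])
```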
ST-2: Registration Rate
Purpose: Verify ≥95% of images successfully registered
Test Data: Real flights with various conditions
Test Cases:
├─ ST-2.1: Baseline registration
│ Good overlap, clear features
│ Expected: >98% registration rate
│ Pass: registration_rate ≥ 0.98
│
├─ ST-2.2: Challenging conditions
│ Low texture, variable lighting
│ Expected: ≥95% registration rate
│ Pass: registration_rate ≥ 0.95
│
├─ ST-2.3: Sharp turns scenario
│ Images with <10% overlap
│ Expected: Fallback mechanisms trigger, ≥90% success
│ Pass: fallback_success_rate ≥ 0.90
│
└─ ST-2.4: Consecutive failures
Track max consecutive unregistered images
Expected: <3 consecutive failures
Pass: max_consecutive_failures ≤ 3
ST-3: Reprojection Error
Purpose: Verify <1.0 pixel mean reprojection error
Test Data: Real flight data after bundle adjustment
Test Cases:
├─ ST-3.1: Mean reprojection error
│ After BA optimization
│ Expected: <1.0 pixel
│ Pass: mean_reproj_error < 1.0
│
├─ ST-3.2: Error distribution
│ Histogram of per-point errors
│ Expected: Tightly concentrated <2 pixels
│ Pass: 95th_percentile < 2.0 px
│
├─ ST-3.3: Per-frame consistency
│ Error should not vary dramatically
│ Expected: Consistent across frames
│ Pass: frame_error_std_dev < 0.3 px
│
└─ ST-3.4: Outlier points
Very large reprojection errors
Expected: <1% of points with error >3 px
Pass: outlier_rate < 0.01
ST-4: Processing Speed
Purpose: Verify <2 seconds per image
Test Data: Full flight sequences on target hardware
Test Cases:
├─ ST-4.1: Average latency
│ Mean processing time per image
│ Expected: <2 seconds
│ Pass: mean_latency < 2.0 sec
│
├─ ST-4.2: 95th percentile latency
│ Worst-case images (complex scenes)
│ Expected: <2.5 seconds
│ Pass: p95_latency < 2.5 sec
│
├─ ST-4.3: Component breakdown
│ Feature extraction: <0.5s
│ Matching: <0.3s
│ RANSAC: <0.2s
│ BA: <0.8s
│ Satellite: <0.3s
│ Pass: Each component within budget
│
└─ ST-4.4: Scaling with problem size
Memory usage, CPU usage vs. image resolution
Expected: Linear scaling
Pass: O(n) complexity verified
ST-5: Robustness - Outlier Handling
Purpose: Verify graceful handling of 350m outlier drifts
Test Data: Synthetic/real data with injected outliers
Test Cases:
├─ ST-5.1: Single 350m outlier
│ Inject outlier at frame N
│ Expected: Detected, trajectory continues
│ Pass: system_continues = True
│
├─ ST-5.2: Multiple outliers
│ 3-5 outliers scattered in sequence
│ Expected: All detected, recovery attempted
│ Pass: detection_rate ≥ 0.8
│
├─ ST-5.3: False positive rate
│ Normal trajectory, no outliers
│ Expected: <5% false flagging
│ Pass: false_positive_rate < 0.05
│
└─ ST-5.4: Recovery latency
Time to recover after outlier
Expected: ≤3 frames
Pass: recovery_latency ≤ 3 frames
ST-6: Robustness - Sharp Turns
Purpose: Verify handling of <5% image overlap scenarios
Test Data: Synthetic sequences with sharp angles
Test Cases:
├─ ST-6.1: 5% overlap matching
│ Two images with 5% overlap
│ Expected: Minimal matches or skip-frame
│ Pass: system_handles_gracefully = True
│
├─ ST-6.2: Skip-frame fallback
│ Direct N→N+1 fails, tries N→N+2
│ Expected: Succeeds with N→N+2
│ Pass: skip_frame_success_rate ≥ 0.8
│
├─ ST-6.3: 90° turn handling
│ Images at near-orthogonal angles
│ Expected: Degeneracy detected, logged
│ Pass: degeneracy_detection = True
│
└─ ST-6.4: Trajectory consistency
Consecutive turns: check velocity smoothness
Expected: No velocity jumps > 50%
Pass: velocity_consistency verified
5.1.4 Field Acceptance Tests (Level 4)
FAT-1: Real UAV Flight Trial #1 (Baseline)
Scenario: Nominal flight over agricultural field
┌────────────────────────────────────────┐
│ Conditions: │
│ • Clear weather, good sunlight │
│ • Flat terrain, sparse trees │
│ • 300m altitude, 50m/s speed │
│ • 800 images, ~15 min flight │
└────────────────────────────────────────┘
Pass Criteria:
✓ Accuracy: ≥80% within 50m
✓ Accuracy: ≥60% within 20m
✓ Registration rate: ≥95%
✓ Processing time: <2s/image
✓ Satellite validation: <10% outliers
✓ Reprojection error: <1.0px mean
Success Metrics:
• MAE (mean absolute error): <40m
• RMS error: <45m
• Max error: <200m
• Trajectory coherence: smooth (no jumps)
FAT-2: Real UAV Flight Trial #2 (Challenging)
Scenario: Flight with more complex terrain
┌────────────────────────────────────────┐
│ Conditions: │
│ • Mixed urban/agricultural │
│ • Buildings, vegetation, water bodies │
│ • Variable altitude (250-400m) │
│ • Includes 1-2 sharp turns │
│ • 1200 images, ~25 min flight │
└────────────────────────────────────────┘
Pass Criteria:
✓ Accuracy: ≥75% within 50m (relaxed from 80%)
✓ Accuracy: ≥50% within 20m (relaxed from 60%)
✓ Registration rate: ≥92% (relaxed from 95%)
✓ Processing time: <2.5s/image avg
✓ Outliers detected: <15% (relaxed from 10%)
Fallback Validation:
✓ User corrected <20% of uncertain images
✓ After correction, accuracy meets FAT-1 targets
FAT-3: Real UAV Flight Trial #3 (Edge Case)
Scenario: Low-texture flight (challenging for features)
┌────────────────────────────────────────┐
│ Conditions: │
│ • Sandy/desert terrain or water │
│ • Minimal features │
│ • Overcast/variable lighting │
│ • 500-600 images, ~12 min flight │
└────────────────────────────────────────┘
Pass Criteria:
✓ System continues (no crash): YES
✓ Graceful degradation: Flags uncertainty
✓ User can correct and improve: YES
✓ Satellite anchor helps recovery: YES
Success Metrics:
• A large share of images (potentially >80%) correctly tagged "uncertain" rather than silently mislocated
• After user correction: meets standard targets
• Demonstrates fallback mechanisms working
5.2 Test Environment Setup
Hardware Requirements
CPU: 16+ cores (Intel Xeon / AMD Ryzen)
RAM: 64GB minimum (32GB acceptable for <1500 images)
Storage: 1TB SSD (for raw images + processing)
GPU: Optional (CUDA 11.8+ for 5-10x acceleration)
Network: For satellite API queries (can be cached)
Software Requirements
OS: Ubuntu 20.04 LTS or macOS 12+
Build: CMake 3.20+, GCC 9+ or Clang 11+
Dependencies: OpenCV 4.8+, Eigen 3.4+, GDAL 3.0+
Testing: GoogleTest, Pytest
CI/CD: GitHub Actions or Jenkins
Test Data Management
Synthetic Data: Generated via Blender (checked into repo)
Real Data: External dataset storage (S3/local SSD)
Ground Truth: Maintained in CSV format with metadata
Versioning: Git-LFS for binary image data
5.3 Test Execution Plan
Phase 1: Unit Testing (Weeks 1-6)
Sprint 1-2: UT-1 (Feature detection) - 2 weeks
Sprint 3-4: UT-2 (Feature matching) - 2 weeks
Sprint 5-6: UT-3, UT-4, UT-5 (Geometry) - 2 weeks
Continuous: Run full unit test suite every commit
Coverage target: >90% code coverage
Phase 2: Integration Testing (Weeks 7-12)
Sprint 7-9: IT-1 (Sequential pipeline) - 3 weeks
Sprint 10-11: IT-2, IT-3 (Georef, Outliers) - 2 weeks
Sprint 12: System integration - 1 week
Continuous: Integration tests run nightly
Phase 3: System Testing (Weeks 13-18)
Sprint 13-14: ST-1, ST-2 (Accuracy, Registration) - 2 weeks
Sprint 15-16: ST-3, ST-4 (Error, Speed) - 2 weeks
Sprint 17-18: ST-5, ST-6 (Robustness) - 2 weeks
Load testing: 1000-3000 image sequences
Stress testing: Edge cases, memory limits
Phase 4: Field Acceptance (Weeks 19-30)
Week 19-22: FAT-1 (Baseline trial)
• Coordinate 1-2 baseline flights
• Validate system on real data
• Adjust parameters as needed
Week 23-26: FAT-2 (Challenging trial)
• More complex scenarios
• Test fallback mechanisms
• Refine user interface
Week 27-30: FAT-3 (Edge case trial)
• Low-texture scenarios
• Validate robustness
• Final adjustments
Post-trial: Generate comprehensive report
5.4 Acceptance Criteria Summary
| Criterion | Target | Test | Pass/Fail |
|---|---|---|---|
| Accuracy@50m | ≥80% | FAT-1 | ≥80% pass |
| Accuracy@20m | ≥60% | FAT-1 | ≥60% pass |
| Registration Rate | ≥95% | ST-2 | ≥95% pass |
| Reprojection Error | <1.0px mean | ST-3 | <1.0px pass |
| Processing Speed | <2.0s/image | ST-4 | p95<2.5s pass |
| Robustness (350m outlier) | Handled | ST-5 | Continue pass |
| Sharp turns (<5% overlap) | Handled | ST-6 | Skip-frame pass |
| Satellite validation | <10% outliers | FAT-1-3 | <10% pass |
5.5 Success Metrics
Green Light Criteria (Ready for production):
- ✅ All unit tests pass (100%)
- ✅ All integration tests pass (100%)
- ✅ All system tests pass (100%)
- ✅ FAT-1 and FAT-2 pass acceptance criteria
- ✅ FAT-3 shows graceful degradation
- ✅ <10% code defects discovered in field trials
- ✅ Performance meets SLA consistently
Yellow Light Criteria (Conditional deployment):
- ⚠ 85-89% of acceptance criteria met
- ⚠ Minor issues in edge cases
- ⚠ Requires workaround documentation
- ⚠ Re-test after fixes
Red Light Criteria (Do not deploy):
- ❌ <85% of acceptance criteria met
- ❌ Critical failures in core functionality
- ❌ Safety/security concerns
- ❌ Cannot meet latency or accuracy targets