gps-denied-desktop/docs/01_solution/01_solution_draft_google.md
2025-11-03 21:47:21 +02:00

Analysis and Proposed Architecture for a Hybrid Visual-Geodetic Localization System

Part 1: Product Solution Description: The GOLS (Geolocational Odometry & Localization System)

1.1. Executive Summary: System Concept and Mission

This report details the technical architecture for the "Geolocational Odometry & Localization System" (GOLS), a high-precision, offline processing software suite designed to solve a complex georeferencing problem. The system's primary mission is to ingest a chronologically ordered sequence of high-resolution aerial images (e.g., AD000001.jpg to AD001500.jpg) captured from a fixed-wing Unmanned Aerial Vehicle (UAV) and, using only the known GPS coordinate of the first image 1, reconstruct the complete and precise geodetic trajectory of the entire flight.

The system's outputs are twofold:

  1. A high-fidelity, georeferenced 6-Degrees-of-Freedom (6-DoF) pose (comprising Latitude, Longitude, Altitude, Roll, Pitch, and Yaw) for every valid image in the input sequence.
  2. An on-demand query function to determine the precise WGS84 GPS coordinates (latitude, longitude, altitude) of any object, identified by its pixel coordinates (u, v), within any of these successfully georeferenced images.

GOLS is architected as a hybrid, multi-modal system. It is designed to overcome the two fundamental and coupled challenges inherent to this problem:

  1. Scale Ambiguity: Monocular Visual Odometry (VO) or Simultaneous Localization and Mapping (SLAM) systems are incapable of determining the true scale of the world from a single camera feed.2 The 100-meter distance between photos is a guideline, but cannot be used as a rigid constraint to solve this ambiguity.
  2. Cumulative Drift: All relative-positioning systems accumulate small errors over time, causing the estimated trajectory to "drift" from the true geodetic path.4

To solve these, the GOLS architecture fuses high-frequency, relative motion estimates (derived from frame-to-frame image analysis) with low-frequency, absolute geodetic "anchor points" derived from matching the UAV's imagery against an external satellite map provider (Google Maps).5

1.2. Core Technical Principle: A Hybrid Fusion Approach

The system's core philosophy is founded on the understanding that no single, monolithic algorithm can meet the severe operational constraints and stringent acceptance criteria. The problem as defined sits at the intersection of three challenging computer vision domains.

  • A standard Visual SLAM system (e.g., ORB-SLAM3 8) would fail. The target environment (Eastern and Southern Ukraine) and the provided sample images (Images 1-9) are dominated by natural, low-texture terrain such as fields, shrubbery, and dirt roads. Feature-based SLAM systems are notoriously unreliable in such "textureless" areas.10
  • A standard Structure from Motion (SfM) pipeline (e.g., COLMAP 13) would also fail. The constraints explicitly state "sharp turns" with less than 5% image-to-image overlap and potential "350m outlier" photos. Traditional SfM approaches require significant image overlap and will fail to register images across these gaps.14

Therefore, GOLS is designed as a modular, graph-based optimization framework.15 It separates the problem into three parallel "front-end" modules that generate "constraints" (i.e., measurements) and one "back-end" module that fuses all measurements into a single, globally consistent solution.

  1. Module 1: Visual Odometry (VO) Front-End: This module computes high-frequency, relative frame-to-frame motion (e.g., "frame 2 is 98.2m forward and 1.8° right of frame 1"). This provides the dense "shape" of the trajectory but is unscaled and prone to drift.
  2. Module 2: Wide-Baseline SfM Front-End: This module computes low-frequency, non-sequential relative matches (e.g., "frame 50 contains the same building as frame 5"). Its sole purpose is to bridge large gaps in the trajectory caused by sharp turns (Acceptance Criterion 4) or sensor outliers (Acceptance Criterion 3).
  3. Module 3: Cross-View Georeferencing (CVG) Front-End: This module computes low-frequency, absolute pose estimates (e.g., "frame 100 is at 48.27° N, 37.38° E, at 1km altitude"). It does this by matching UAV images to the georeferenced satellite map.5 This module provides the absolute scale and the GPS anchors necessary to eliminate drift and meet the <20m accuracy criteria.
  4. Module 4: Back-End Global Optimizer: This module fuses all constraints from the other three modules into a pose graph and solves it, finding the 6-DoF pose for every image that best satisfies all (often conflicting) measurements.

1.3. Addressing Key Problem Constraints & Acceptance Criteria

This hybrid architecture is specifically designed to meet each acceptance criterion:

  • Criteria 1 & 2 (80% < 50m, 60% < 20m error): Solved by the CVG Front-End (Module 3). By providing sparse, absolute GPS fixes, this module "anchors" the pose graph. The back-end optimization (Module 4) propagates this absolute information across the entire trajectory, correcting the scale and eliminating the cumulative drift.4
  • Criteria 3 & 4 (350m outlier, <5% overlap on turns): Solved by the Wide-Baseline SfM Front-End (Module 2). This module will use state-of-the-art (SOTA) deep-learning-based feature matchers (analyzed in 2.1.2) that are designed for "wide-baseline" or "low-overlap" scenarios.18 These matchers can find correspondences where traditional methods fail, allowing the system to bridge these gaps.
  • Non-Stabilized Camera Constraint: The fixed, non-stabilized camera on a fixed-wing platform will induce severe roll/pitch and motion blur. This is handled by two components: (1) A specialized VO front-end (Module 1) that is photometrically robust 20, and (2) aggressive RANSAC (RANdom SAmple Consensus) 21 outlier rejection at every matching stage to discard false correspondences caused by motion blur or extreme perspective distortion.
  • Criterion 7 (< 2s/image): Solved by a multi-threaded, asynchronous architecture and GPU acceleration.22 The performance criterion is interpreted as average throughput, not latency. The fast VO front-end (Module 1) provides an initial pose, while the computationally expensive Modules 2, 3, and 4 run on a GPU (e.g., using CUDA 24) in parallel, asynchronously refining the global solution.
  • Criteria 8 & 9 (Reg. Rate > 95%, MRE < 1.0px): These criteria are achieved by the combination of the front-ends and back-end. The >95% Registration Rate 25 is achieved by the high-recall front-ends (Module 1 + 2). The <1.0 pixel Mean Reprojection Error 26 is the explicit optimization target of the Back-End Global Optimizer (Module 4), which performs a global Bundle Adjustment (BA) to minimize this exact metric.27

Part 2: System Architecture and State-of-the-Art Analysis

2.1. SOTA Foundational Analysis: Selecting the Core Algorithmic Components

The GOLS architecture is a composition of SOTA components, each selected to solve a specific part of the problem.

2.1.1. Front-End Strategy 1 (VO): Feature-Based (ORB-SLAM) vs. Direct (DSO)

The primary relative VO front-end (Module 1) must be robust to the UAV's fast motion and the environment's visual characteristics.

  • Feature-Based Methods (e.g., ORB-SLAM): Systems like ORB-SLAM3 8 are highly successful general-purpose SLAM systems. They operate by detecting sparse, repeatable features (ORB) and matching them between frames. However, their primary weakness is a reliance on "good features." Research and practical application show that ORB-SLAM suffers from "tracking loss in textureless areas".10 The provided sample images (e.g., Image 1, 2, 8, 9) are dominated by homogeneous fields and sparse vegetation—a worst-case scenario for feature-based methods.
  • Direct Methods (e.g., DSO): Systems like Direct Sparse Odometry (DSO) 20 or LSD-SLAM 31 operate on a different principle. They "discard the feature extractor and directly utilize the pixel intensity" 32 by minimizing the photometric error (difference in pixel brightness) between frames.
  • Analysis and Decision: For this specific problem, a Direct method is superior.
    1. DSO "does not depend on keypoint detectors or descriptors".30 It "can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations".20
    2. This makes it ideal for the target environment. The edge of a dirt track (Image 4), the boundary between two fields (Image 1), or the shadow of a bush (Image 2) provide strong, usable gradients for DSO, whereas they contain no "corners" for ORB.
    3. Furthermore, the non-stabilized camera will cause rapid changes in auto-exposure and vignetting. DSO's formulation "integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions".20 ORB-SLAM's assumption of uniform motion and lighting is more easily violated.34

Therefore, the GOLS Module 1: Relative Odometry Front-End will be based on Direct Sparse Odometry (DSO).
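The photometric objective above can be illustrated with a minimal, dependency-free sketch. This is not DSO itself — DSO warps points through a full 6-DoF pose with inverse depth and estimates affine brightness parameters — but it shows why a gradient-only scene (such as a field boundary with no corners) is sufficient for direct alignment:

```python
import numpy as np

def photometric_error(img_ref, img_cur, points, shift, a=1.0, b=0.0):
    """Sum of squared photometric residuals for a candidate 2-D pixel shift.

    Simplified stand-in for DSO's objective: the real system warps points
    through a 6-DoF pose and inverse depth and estimates affine brightness
    (a, b); here we only slide sample points by an integer shift.
    """
    err = 0.0
    for (r, c) in points:
        r2, c2 = r + shift[0], c + shift[1]
        if 0 <= r2 < img_cur.shape[0] and 0 <= c2 < img_cur.shape[1]:
            residual = img_cur[r2, c2] - (a * img_ref[r, c] + b)
            err += residual ** 2
    return err

# Synthetic scene: a smooth horizontal gradient (edges/gradients, no corners).
ref = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
cur = np.roll(ref, 3, axis=1)            # true motion: 3 px to the right
pts = [(r, c) for r in range(8, 56, 8) for c in range(8, 56, 8)]

best = min(range(-5, 6), key=lambda s: photometric_error(ref, cur, pts, (0, s)))
print(best)  # -> 3: the true shift minimizes the photometric error
```

Even with zero "corners" in the image, the intensity gradient alone pins down the motion, which is the core reason DSO was selected over a feature-based front-end for this terrain.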

2.1.2. Front-End Strategy 2 (Matching): Traditional (SIFT) vs. Deep Learning (SuperGlue)

Modules 2 (Wide-Baseline SfM) and 3 (CVG) both rely on a "wide-baseline" feature matcher—an algorithm that can match two images with very different viewpoints.

  • The Challenge: The system must match (1) UAV-to-UAV across a "sharp turn" with <5% overlap (AC 4) and (2) UAV-to-Satellite, which is an extreme cross-view matching problem with differences in scale, sensor, illumination, and time (due to outdated maps).35
  • Traditional Matchers (e.g., SIFT, ORB): Classic methods like SIFT 36 are robust to scale and rotation, but they fail decisively in cross-view scenarios.37 They produce very few valid matches, which are then almost entirely eliminated by RANSAC filtering.35
  • Deep Learning Matchers (e.g., SuperGlue): The SOTA solution is a combination of the SuperPoint feature detector 38 and the SuperGlue matcher.19 SuperGlue is not a simple matcher; it is a Graph Neural Network (GNN) that performs "context aggregation, matching, and filtering" simultaneously.19 It "establishes global context perception" 38 by learning the geometric priors of 3D-to-2D projection.39
  • Analysis and Decision:
    1. For both the wide-baseline UAV-to-UAV case and the extreme cross-view UAV-to-satellite case, SuperPoint + SuperGlue is the selected technology.
    2. Multiple studies confirm that SuperPoint+SuperGlue significantly outperforms traditional methods (SIFT/ORB) and other learning-based methods (LoFTR) for UAV-to-map and UAV-to-satellite registration.35
    3. Its ability to match based on global context rather than just local patch appearance makes it robust to the sparse-texture scenes 41 and extreme viewpoint changes that define this problem.

Therefore, GOLS Module 2 (Wide-Baseline SfM) and Module 3 (CVG) will be built upon the SuperPoint + SuperGlue matching pipeline.
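As a simplified illustration of the matching stage's contract (keypoints and descriptors in, filtered correspondences out), the sketch below uses classical mutual-nearest-neighbour matching with a ratio test. In the real pipeline this stage is replaced by the SuperPoint and SuperGlue networks; their learned global-context reasoning is precisely what this naive, local-appearance-only version lacks:

```python
import numpy as np

def mutual_nn_matches(desc0, desc1, ratio=0.8):
    """Mutual nearest-neighbour descriptor matching with Lowe's ratio test.

    Classical stand-in for the matching stage only. Descriptors are assumed
    L2-normalised with shapes (N, D) and (M, D); similarity is cosine.
    """
    sim = desc0 @ desc1.T                      # similarity matrix
    nn01 = sim.argmax(axis=1)                  # best match in image 1 per kp in image 0
    nn10 = sim.argmax(axis=0)                  # best match in image 0 per kp in image 1
    matches = []
    for i, j in enumerate(nn01):
        if nn10[j] != i:                       # keep only mutual agreements
            continue
        row = np.sort(sim[i])[::-1]
        if len(row) > 1 and row[1] / max(row[0], 1e-9) > ratio:
            continue                           # ambiguous: second-best too close
        matches.append((i, int(j)))
    return matches

# Toy check: four orthogonal descriptors, permuted in the second image.
d0 = np.eye(4)
d1 = d0[[2, 0, 3, 1]]
print(mutual_nn_matches(d0, d1))   # [(0, 1), (1, 3), (2, 0), (3, 2)]
```

The interface (index pairs out) is the same one the downstream RANSAC and factor-generation stages consume, so the learned matcher is a drop-in replacement for this function.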

2.1.3. Back-End Strategy: Monolithic (COLMAP) vs. Hybrid (CVD-SfM) vs. Graph Fusion (GTSAM)

The back-end (Module 4) must fuse the heterogeneous constraints from Modules 1, 2, and 3.

  • Monolithic SfM (e.g., COLMAP): A classic incremental SfM pipeline.14 As noted, "Traditional SfM pipelines (COLMAP, OpenMVG) struggle with large viewpoint differences, failing to match enough cross-view features".13 This approach will fail.
  • Hybrid SfM (e.g., CVD-SfM): A very recent SOTA approach (IROS 2025) is CVD-SfM, a "Cross-View Deep Front-end Structure-from-Motion System".14 This system is designed to integrate cross-view (e.g., satellite) priors and deep features into a unified SfM pipeline 45 and is shown to achieve higher registration "coverage" than COLMAP on sparse, multi-altitude datasets.13
  • Graph Fusion (e.g., g2o / GTSAM): This approach, common in robotics, separates front-end measurement from back-end optimization.46 It uses a library like g2o (General Graph Optimization) 47 or GTSAM (Georgia Tech Smoothing and Mapping) 49 to build a factor graph.50
  • Analysis and Decision:
    1. While CVD-SfM 44 is highly relevant, it is a monolithic system designed for a specific problem.
    2. Our problem requires fusing three different types of constraints: (1) fast, unscaled photometric constraints from DSO, (2) sparse, feature-based relative constraints from SuperGlue (UAV-to-UAV), and (3) sparse, feature-based absolute constraints from SuperGlue (UAV-to-Satellite).
    3. A general-purpose optimization framework is more flexible and robust. GTSAM 49 is a SOTA C++ library based on factor graphs, which are explicitly designed for multi-sensor fusion.4
    4. We can define custom "factors" for each of our three constraint types.53
    5. Crucially, GTSAM includes the iSAM2 solver 55, an incremental smoothing and mapping algorithm. This allows the back-end graph to be efficiently updated as new (and slow-to-compute) satellite "anchor" constraints arrive from Module 3, without re-solving the entire 3000-image graph from scratch. This incremental nature is essential for meeting the < 2s/image performance criterion (AC 7).

Therefore, the GOLS Module 4: Back-End Global Optimizer will be implemented using the GTSAM factor graph library.

2.2. Proposed GOLS Architecture: A Multi-Front-End, Factor-Graph System

The GOLS system is an asynchronous, multi-threaded application built on the four selected modules.

2.2.1. Module 1: The Relative Odometry Front-End (DSO-VO)

  • Purpose: To generate a high-frequency, low-drift, but unscaled estimate of the UAV's trajectory.
  • Algorithm: Direct Sparse Odometry (DSO).20
  • Workflow:
    1. This module runs in a high-priority thread, processing images sequentially (N, N+1).
    2. It minimizes the photometric error between the frames to estimate the relative 6-DoF pose transformation T(N, N+1).
    3. This module runs fastest, providing the initial "dead-reckoning" path. The scale of this path is initially unknown (or bootstrapped by Module 3).
  • Output: A high-frequency stream of Odometry Factors (binary constraints between Pose_N and Pose_N+1) 49 is published to the Back-End (Module 4).

2.2.2. Module 2: The Wide-Baseline & Outlier Front-End (SG-SfM)

  • Purpose: To find non-sequential "loop closures" or "shortcuts" to correct for drift and re-establish tracking after sharp turns (AC 4) or outliers (AC 3).
  • Algorithm: SuperPoint 38 + SuperGlue.19
  • Workflow:
    1. This module runs asynchronously in a lower-priority GPU thread.
    2. It does not compare N to N+1. Instead, it compares Image N to a "sliding window" of non-adjacent frames (e.g., N-50...N-10 and N+10...N+50).
    3. Handling AC 3 (350m Outlier): If Image N+1 is an outlier (e.g., a sudden tilt causes a 350m ground shift), Module 1 will fail to match N to N+1. This module, however, will continue to search and will successfully match N to N+2 (or N+3, etc.), bridging the outlier. The outlier frame N+1 becomes an un-registered "island" in the pose graph, which is permissible under AC 6 (the 20% allowance).
    4. Handling AC 4 (Sharp Turn): Similarly, if a sharp turn breaks Module 1's tracking due to <5% overlap, this module will find a wide-baseline match, creating a constraint T(N, N+k) that bridges the turn.
  • Output: A low-frequency stream of Loop Closure Factors (binary constraints between Pose_N and Pose_N+k) 54 is published to the Back-End.
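The sliding-window candidate policy from step 2 reduces to a simple index generator; the `near`/`far` bounds below are the example values from the text (10 and 50 frames), not tuned parameters:

```python
def wide_baseline_candidates(n, total, near=10, far=50):
    """Frames to match against image n under the Module 2 policy: skip the
    immediate neighbours (already handled by DSO) and search a ring of
    non-adjacent frames on both sides, clamped to the sequence bounds."""
    before = range(max(0, n - far), max(0, n - near + 1))
    after = range(min(total, n + near), min(total, n + far + 1))
    return [k for k in list(before) + list(after) if k != n]

c = wide_baseline_candidates(100, 1500)
print(c[:3], len(c))   # [50, 51, 52] 82
```

Each candidate pair is then handed to the SuperGlue matcher; only pairs that survive RANSAC become Loop Closure Factors.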

2.2.3. Module 3: The Absolute Georeferencing Front-End (CVG)

  • Purpose: To provide absolute scale and global GPS coordinates to anchor the entire graph and eliminate drift, thereby meeting AC 1 and AC 2.
  • Algorithm: SuperPoint + SuperGlue 42 + RANSAC 21 + Google Maps API.
  • Workflow:
    1. Initialization: This module runs first. It takes AD000001.jpg and its known GPS coordinate.1 It fetches the corresponding satellite tile from Google Maps. It performs a SuperPoint+SuperGlue match.41 From this match, it calculates the 6-DoF pose of the first frame and, most importantly, the initial scale (Ground Sampling Distance, GSD, in meters/pixel). This GSD is used to provide an initial scale to the DSO module (Module 1).
    2. Asynchronous Anchoring: This module runs in a background GPU thread, activating on a sparse subset of images (e.g., every 25th frame, or when Module 4 reports high pose uncertainty).
    3. For a target Image N, it uses the current best estimate of its pose (from the GTSAM graph) to fetch the relevant satellite tile.
    4. It performs a SuperGlue UAV-to-satellite match.42
    5. Handling Outdated Maps: The constraint "Google Maps...could be...outdated" 56 is a major challenge. This architecture is robust to it. Because SuperGlue 38 is a feature-based matcher, it matches persistent geometric features (road intersections, building corners 35, field boundaries) that are stable over time. It is not confused by temporal, non-geometric changes (e.g., different seasons, presence/absence of cars, new/destroyed small structures) that would foil a dense or semantic matcher (like CVM-Net 57).
    6. Outlier Rejection (AC 5): The match is validated with a robust RANSAC.21 If the number of inlier matches is too low or the reprojection error is too high (indicating a failed or poor match, e.g., due to clouds or extreme map changes), the match is discarded.58 This ensures the "Number of outliers during...ground check" is < 10%.
  • Output: A low-frequency stream of Absolute Pose Factors (unary, or "GPS" constraints) 54 is published to the Back-End.
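The GSD computed during initialization (step 1) follows the standard Web Mercator metres-per-pixel formula, sketched below under the assumption of 256-pixel tiles as used by standard slippy-map tile schemes:

```python
import math

def webmercator_gsd(lat_deg, zoom, tile_px=256):
    """Ground sampling distance (m/px) of a Web Mercator tile at a latitude.

    Standard formula: Earth's equatorial circumference divided by the number
    of pixels around the globe at this zoom, scaled by cos(latitude).
    """
    circumference = 2 * math.pi * 6378137.0   # WGS84 equatorial radius, metres
    return circumference * math.cos(math.radians(lat_deg)) / (tile_px * 2 ** zoom)

# At zoom 0 on the equator, one 256-px tile spans the whole globe:
print(round(webmercator_gsd(0.0, 0), 2))   # ~156543.03 m/px
```

Combining this tile GSD with the RANSAC-validated homography between the UAV image and the tile yields the UAV image's own GSD, which bootstraps the scale of the DSO trajectory.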

2.2.4. Module 4: The Back-End Global Optimizer (GTSAM-PGO)

  • Purpose: To fuse all constraints from Modules 1, 2, and 3 into a single, globally-consistent 6-DoF trajectory that is scaled and georeferenced.
  • Algorithm: GTSAM (Georgia Tech Smoothing and Mapping) 49 using an iSAM2 (incremental Smoothing and Mapping) solver.55
  • Workflow:
    1. The system maintains a factor graph 51 where nodes are the unknown 6-DoF poses of each camera and edges (factors) are the constraints (measurements) from the front-ends.
    2. Factor Types:
      • Unary Factor: A high-precision prior on Pose_1 (from the known GPS coordinate of the first image 1).
      • Unary Factors: The sparse, lower-precision Absolute Pose Factors from Module 3.54
      • Binary Factors: The dense, high-precision, but unscaled Odometry Factors from Module 1.53
      • Binary Factors: The sparse, high-precision Loop Closure Factors from Module 2.54
    3. Optimization: The iSAM2 solver 55 runs continuously, finding the set of 6-DoF poses that minimizes the error across all factors simultaneously (a non-linear least-squares problem). This optimization process:
      • Fixes Scale: Uses the absolute measurements from Module 3 to find the single "scale" parameter that best fits the unscaled DSO measurements from Module 1.
      • Corrects Drift: Uses the "loop closures" from Module 2 and "GPS anchors" from Module 3 to correct the cumulative drift from Module 1.
  • Output: The final, optimized 6-DoF pose for all registered images. This final optimized structure is designed to meet the MRE < 1.0 pixels criterion (AC 9).28
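The scale-fixing and drift-correcting behaviour of the joint optimization can be demonstrated on a toy 1-D analogue, solved here with plain least squares rather than GTSAM (all numbers are illustrative): unscaled odometry deltas plus two absolute anchors are fused by stacking one linear residual row per factor.

```python
import numpy as np

# Unknowns z = [x0..x4, s]: five 1-D poses and one global scale applied to
# the unscaled odometry deltas.
odom = [1.0, 1.0, 1.0, 1.0]          # Module 1: unscaled deltas (true scale: 100)
anchors = {0: 0.0, 4: 400.0}         # Module 3: absolute fixes, metres

rows, rhs = [], []
for i, d in enumerate(odom):         # odometry factor: x_{i+1} - x_i - s*d = 0
    r = np.zeros(6)
    r[i + 1], r[i], r[5] = 1.0, -1.0, -d
    rows.append(r); rhs.append(0.0)
for k, g in anchors.items():         # anchor factor: x_k = g
    r = np.zeros(6)
    r[k] = 1.0
    rows.append(r); rhs.append(g)

z, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print(z[:5])   # optimized poses, approximately [0, 100, 200, 300, 400]
print(z[5])    # recovered scale, approximately 100
```

The real back-end does the same thing over SE(3) with nonlinear factors and robust noise models, but the mechanism is identical: sparse absolute anchors determine the scale that the dense relative chain cannot observe, and every intermediate pose is corrected through the chain.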

2.3. Sub-System: Object-Level Geolocation (Photogrammetric Ray-Casting)

The second primary user requirement is to find the GPS coordinates of "any object" (pixel) in an image. This is a standard photogrammetric procedure 59 that is only solvable after Module 4 has produced a high-fidelity 6-DoF pose for the image.

2.3.1. Prerequisite 1: Camera Intrinsic Calibration

The system must be provided with the camera's intrinsic parameters (focal length, principal point (cx, cy), and distortion coefficients k1, k2...). These must be pre-calibrated (e.g., using a checkerboard) and provided as an input file.28

2.3.2. Prerequisite 2: Digital Elevation Model (DEM) Acquisition

To find where a ray from the camera hits the ground, a 3D model of the ground itself is required.

  • SOTA Analysis: Several free, global DEMs are available, including SRTM 60 and Copernicus DEM.60
  • Decision: The system will use the Copernicus GLO-30 DEM.63
  • Rationale:
    1. Type: Copernicus is a Digital Surface Model (DSM), meaning it "represents the surface of the Earth including buildings, infrastructure and vegetation".63 This is critically important. The sample images (e.g., Image 5, 6, 7) clearly show buildings and trees. If a user clicks on a building rooftop, a DSM will return the correct (high) altitude. A Digital Terrain Model (DTM) like SRTM would return the altitude of the "bare earth" under the building, which would be incorrect.
    2. Accuracy: Copernicus GLO-30 (data 2011-2015) is a more modern and higher-fidelity dataset than the older SRTM (data 2000). It has "minimized data voids" and "improved the vertical accuracy".65
  • Implementation: The GOLS system will automatically download the required Copernicus GLO-30 tiles (in GeoTIFF format) for the flight's bounding box from an open-data source (e.g., the Copernicus S3 bucket 60).

2.3.3. Geolocation via Photogrammetric Ray-Casting

  • Algorithm: This process is known as ray-casting.69
    1. Input: Image AD0000X.jpg, pixel coordinate (u, v).
    2. Load: The optimized 6-DoF pose Pose_X (from Module 4) and the camera's intrinsic parameters.
    3. Load: The Copernicus GLO-30 DEM, loaded as a 3D mesh (e.g., using rasterio 71).
    4. Un-project: Using the camera intrinsics, convert the 2D pixel (u, v) into a 3D ray vector R in the camera's local coordinate system.
    5. Transform: Use the 6-DoF pose Pose_X to transform the ray's origin O (the camera's 3D position) and vector R into the global (WGS84) coordinate system.
    6. Intersect: Compute the 3D intersection point P(lat, lon, alt) where the global ray (O, R) intersects the 3D mesh of the DEM.72
    7. Output: P(lat, lon, alt) is the precise GPS coordinate of the object at pixel (u, v).
  • Libraries: This sub-system will be implemented in Python, using the rasterio library 71 for DEM I/O and a library like trimesh 75 or pyembree 73 for high-speed ray-mesh intersection calculations.
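A minimal sketch of steps 4-6 follows, with two stated simplifications: it works in a local metric frame rather than WGS84, and it uses fixed-step ray marching against a height function in place of a mesh intersector such as trimesh. The intrinsic matrix, rotation, and camera position below are illustrative values, not calibration data.

```python
import numpy as np

def pixel_to_ground(u, v, K, R, cam_pos, dem_height, step=1.0, max_range=5000.0):
    """Cast a ray from pixel (u, v) and march it until it crosses the DEM.

    K is the 3x3 intrinsic matrix, R rotates camera axes into the world
    frame, cam_pos is the camera centre, and dem_height(x, y) samples the
    surface model at a ground location.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # un-project the pixel
    ray = R @ ray_cam
    ray /= np.linalg.norm(ray)
    t = 0.0
    while t < max_range:
        p = cam_pos + t * ray
        if p[2] <= dem_height(p[0], p[1]):               # crossed the surface
            return p
        t += step
    return None                                          # ray never hit ground

K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
R = np.diag([1.0, 1.0, -1.0])            # nadir view: camera z-axis points down
cam = np.array([0.0, 0.0, 1000.0])       # 1 km above a flat DEM at 0 m
hit = pixel_to_ground(740.0, 360.0, K, R, cam, lambda x, y: 0.0)
print(hit)   # ~[100, 0, 0]: 100 px off-centre maps to ~100 m on the ground
```

In the production sub-system the same ray is expressed in an Earth-fixed frame, the height function is the Copernicus GLO-30 raster read via rasterio, and the march is replaced by an accelerated ray-mesh intersection.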

2.4. Performance and Usability Architecture

2.4.1. Hardware Acceleration (Criterion 7: < 2s/image)

The < 2s/image criterion is for average throughput. A 1500-image flight must complete in < 3000 seconds (50 minutes). This is aggressive.

  • Bottlenecks: The deep learning front-ends (Module 2 & 3: SuperPoint/SuperGlue) 19 and the graph optimization (Module 4) are the bottlenecks.
  • Solution: These modules must be GPU-accelerated.22
    1. Module 1 (DSO): Natively fast, CPU-bound.
    2. Module 2/3 (SuperGlue): Natively designed for GPU execution.19 Running SuperGlue on a CPU would take tens of seconds per match, catastrophically failing the performance requirement.
    3. Module 4 (GTSAM): Can be compiled with CUDA support for GPU-accelerated solver steps.
  • Implementation: The system will be built on a pub/sub framework (e.g., ROS 2, or a custom C++/Python framework) to manage the asynchronous-threaded architecture. A high-end NVIDIA GPU (e.g., RTX 30-series or 40-series) is a hard requirement for this system. NVIDIA's Isaac ROS suite provides GPU-accelerated VSLAM packages 24 that can serve as a reference for this implementation.
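The asynchronous publish/subscribe dataflow can be sketched with a plain thread-and-queue toy; the module internals are stubbed out here, and the frame counts are illustrative, not the real cadence:

```python
import queue
import threading

factor_queue = queue.Queue()   # all front-ends publish factors here

def odometry_frontend(n_frames):
    # Fast thread: one relative-pose factor per consecutive frame pair.
    for i in range(1, n_frames):
        factor_queue.put(("odom", i - 1, i))

def anchor_frontend(n_frames, every=25):
    # Slow GPU thread: sparse absolute fixes on every 25th frame.
    for i in range(0, n_frames, every):
        factor_queue.put(("anchor", i))

threads = [threading.Thread(target=odometry_frontend, args=(100,)),
           threading.Thread(target=anchor_frontend, args=(100,))]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Back-end drains the queue and updates the graph incrementally (iSAM2-style).
factors = []
while not factor_queue.empty():
    factors.append(factor_queue.get())
print(len([f for f in factors if f[0] == "odom"]),
      len([f for f in factors if f[0] == "anchor"]))   # 99 4
```

The essential property is that the slow producers never block the fast one: the back-end consumes whatever factors have arrived, which is what makes the < 2s/image average achievable even though individual satellite matches take much longer.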

2.4.2. User-in-the-Loop Failsafe (Criterion 6)

The system must handle "absolute incapable" scenarios (e.g., flying over a large, textureless, featureless body of water) where all modules fail for 3+ consecutive images.

  • Workflow:
    1. Monitor: The Back-End (Module 4) monitors the queue of unregistered frames.
    2. Trigger: If 3 consecutive frames (e.g., N, N+1, N+2) fail all front-end checks (DSO tracking lost, SuperGlue-UAV fails, SuperGlue-Sat fails), the system pauses processing.
    3. GUI Prompt: A GUI is presented to the user.
      • Left Pane: Shows the last known good image (e.g., N-1) with its estimated position on the satellite map.
      • Right Pane: Shows the first failed image (e.g., N).
      • Map Pane: An interactive Google Maps interface centered on the last known good location.
    4. User Action: The user must manually find a recognizable landmark in Image N (e.g., "that distinct river bend") and click its corresponding location on the map.
    5. Recovery: The user's click generates a new, low-precision Absolute Pose Factor. This new "anchor" is inserted into the GTSAM graph 49, which re-optimizes and attempts to re-start the entire processing pipeline from frame N.
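The trigger condition in steps 1-2 reduces to a consecutive-failure counter over the per-frame registration results, sketched below:

```python
def failsafe_triggered(registration_ok, window=3):
    """True as soon as `window` consecutive frames fail all front-end checks
    (DSO tracking lost, SuperGlue-UAV failed, SuperGlue-Sat failed)."""
    streak = 0
    for ok in registration_ok:
        streak = 0 if ok else streak + 1   # any success resets the streak
        if streak >= window:
            return True
    return False

print(failsafe_triggered([True, False, False, True, False]))    # False
print(failsafe_triggered([True, False, False, False, True]))    # True
```

Isolated failures (such as the single 350m outlier of AC 3) never reach the threshold, so the GUI only appears for genuinely sustained tracking loss.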

Part 3: Testing Strategy and Validation Plan

A comprehensive test suite is required to validate every acceptance criterion. The provided coordinates.csv file 1 will serve as the "ground truth" trajectory for validation.78

Core Validation Metrics:

  • Absolute Trajectory Error (ATE): The geodetic (Haversine) distance, in meters, between the system's estimated pose (lat, lon) for an image and the ground truth (lat, lon) from coordinates.csv.34 This is the primary metric for positional accuracy.
  • Mean Reprojection Error (MRE): The average pixel distance between an observed 2D feature and its corresponding 3D map point re-projected back into the camera.26 This is the primary metric for the internal 3D consistency of the reconstruction.28
  • Image Registration Rate (IRR): The percentage of images for which the system successfully computes a 6-DoF pose.25
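A minimal sketch of the ATE computation (Haversine distance) and the AC-1/AC-2 pass-rate check; the example coordinates below are synthetic:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 (lat, lon) points."""
    R = 6371000.0                        # mean Earth radius, metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def ate_pass_rates(est, truth):
    """Fraction of frames within 50 m and within 20 m of ground truth."""
    errs = [haversine_m(a[0], a[1], b[0], b[1]) for a, b in zip(est, truth)]
    n = len(errs)
    return sum(e < 50 for e in errs) / n, sum(e < 20 for e in errs) / n

truth = [(48.27, 37.38)] * 3
est = [(48.2701, 37.38), (48.2703, 37.38), (48.271, 37.38)]
print(ate_pass_rates(est, truth))   # 2/3 within 50 m, 1/3 within 20 m
```

AC-1 passes when the first rate exceeds 0.80 and AC-2 when the second exceeds 0.60.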

Table 3.1: Acceptance Criteria Test Matrix

| Criterion ID | Criterion Description | Test Case | Test Metric | Pass/Fail Threshold |
| --- | --- | --- | --- | --- |
| AC-1 | 80% of photos < 50m error | TC-1: Baseline Accuracy | COUNT(ATE < 50m) / TOTAL_IMAGES | > 0.80 |
| AC-2 | 60% of photos < 20m error | TC-1: Baseline Accuracy | COUNT(ATE < 20m) / TOTAL_IMAGES | > 0.60 |
| AC-3 | Handle 350m outlier | TC-2: Outlier Robustness | ATE for post-outlier frames | ATE remains within AC-1/2 spec |
| AC-4 | Handle sharp turns (<5% overlap) | TC-3: Low-Overlap Robustness | IRR for frame after the turn | > 95% (must register) |
| AC-5 | Satellite check outliers < 10% | TC-4: Back-End Residual Analysis | COUNT(Bad_Factors) / COUNT(Sat_Factors) | < 0.10 |
| AC-6 | User-in-the-Loop Failsafe | TC-5: Failsafe Trigger | System state | GUI prompt must appear |
| AC-7 | < 2 seconds for processing one image | TC-6: Performance Benchmark | (Total Wall Time) / TOTAL_IMAGES | < 2.0 seconds |
| AC-8 | Image Registration Rate > 95% | TC-1: Baseline Accuracy | IRR = Registered_Images / TOTAL_IMAGES | > 0.95 |
| AC-9 | Mean Reprojection Error < 1.0 pixels | TC-1: Baseline Accuracy | MRE from Back-End (Module 4) | < 1.0 pixels |

3.1. Test Case 1 (TC-1): Baseline Positional Accuracy (Validates AC-1, AC-2, AC-8, AC-9)

  • Procedure: Process the full sample image set (e.g., AD000001...AD000060) and the coordinates.csv ground truth.1 The system is given only the GPS for AD000001.jpg.
  • Metrics & Validation:
    1. The system's output trajectory is compared against the ground truth trajectory 1 directly in WGS84 coordinates, without a prior Sim(3) alignment: because GOLS outputs absolute georeferenced poses, aligning the trajectories first would mask exactly the scale and orientation errors that AC-1/AC-2 are meant to measure.
    2. (AC-8) Calculate the Image Registration Rate (IRR). Pass if IRR > 0.95.25
    3. (AC-9) Extract the final MRE from the GTSAM back-end.26 Pass if MRE < 1.0 pixels.28
    4. (AC-1, AC-2) Calculate the ATE (in meters) for every successfully registered frame. Calculate the 80th and 60th percentiles. Pass if 80th percentile is < 50.0m and 60th percentile is < 20.0m.

3.2. Test Case 2 (TC-2): 350m Outlier Robustness (Validates AC-3)

  • Procedure: Create a synthetic dataset. From the sample set, use AD000001...AD000030. Insert a single, unrelated image (e.g., from a different flight, or AD000060) as frame AD000031_outlier. Append the real AD000031...AD000060 (renamed).
  • Validation:
    1. The system must process the full set.
    2. The IRR (from TC-1) for valid frames must remain > 95%. The outlier frame AD000031_outlier should be correctly rejected.
    3. The ATE for frames AD000031 onward must not be significantly degraded and must remain within the AC-1/2 specification. This validates that Module 2 (SG-SfM) successfully "bridged" the outlier by matching AD000030 directly to the first valid frame after it.

3.3. Test Case 3 (TC-3): Sharp Turn / Low Overlap Robustness (Validates AC-4)

  • Procedure: Create a synthetic dataset by removing frames (e.g., AD000032 through AD000034) to simulate a sharp turn. Based on the coordinates in 1, the spatial and angular gap between AD000031 and AD000035 will be large, and overlap will be minimal.
  • Validation:
    1. The system must successfully register frame AD000035.
    2. The ATE for the entire trajectory must remain within AC-1/2 specification. This directly tests the wide-baseline matching of Module 2 (SG-SfM).19

3.4. Test Case 4 (TC-4): Satellite Ground Check Fidelity (Validates AC-5)

  • Procedure: During the execution of TC-1, log the state of every Absolute Pose Factor generated by Module 3 (CVG).
  • Metric: An "outlier factor" is defined as a satellite-match constraint whose final, optimized error residual in the GTSAM graph exceeds a fixed gate for its noise model (e.g., a chi-square confidence bound), rather than a relative "top 10%" cut, which would flag 10% of factors by construction and make the criterion untestable. A high residual means the optimizer is "fighting" this constraint, indicating a bad match (e.g., RANSAC failure 58 or a poor-quality match on an outdated map 54).
  • Validation: Pass if the total count of these outlier factors is < 10% of the total number of satellite-match attempts.

3.5. Test Case 5 (TC-5): Failsafe Mechanism Trigger (Validates AC-6)

  • Procedure: Create a synthetic dataset. Insert 4 consecutive "bad" frames (e.g., solid black images, solid white images, or images from a different continent) into the middle of the sample set (e.g., at AD000030).
  • Validation:
    1. The system must detect 3 consecutive registration failures (30, 31, 32).
    2. Pass if the system automatically pauses processing and successfully displays the "User-in-the-Loop" GUI described in section 2.4.2.

3.6. Test Case 6 (TC-6): System Performance (Validates AC-7)

  • Procedure: On the specified target hardware (including required NVIDIA GPU), time the total wall-clock execution of TC-1 using the full 1500-image dataset.
  • Metric: (Total Wall Clock Time in seconds) / (Total Number of Images).
  • Validation: Pass if the average processing time per image is < 2.0 seconds.