The ATLAS-GEOFUSE System Architecture
ATLAS-GEOFUSE is a multi-component architecture designed for high-performance, real-time geolocalization in IMU-denied, high-drift environments. The architecture is built explicitly around pre-flight data caching and multi-map robustness.
2.1 Core Design Principles
- Pre-Flight Caching: To meet the <5s (AC-7) real-time requirement, all network latency must be eliminated. The system mandates a "Pre-Flight" step (Section 3.0) where all geospatial data (satellite tiles, DEMs, vector data) for the Area of Interest (AOI) is downloaded from a viable open-source provider (e.g., Copernicus 6) and stored in a local database on the processing laptop. All real-time queries are made against this local cache.
- Decoupled Multi-Map SLAM: The system separates relative motion from absolute scale. A Visual SLAM (V-SLAM) "Atlas" Front-End (Section 4.0) computes high-frequency, robust, but unscaled relative motion. A Local Geospatial Anchoring Back-End (GAB) (Section 5.0) provides sparse, high-confidence, absolute metric anchors by querying the local cache. A Trajectory Optimization Hub (TOH) (Section 6.0) fuses these two streams in a Sim(3) pose-graph to solve for the global 7-DoF trajectory (pose + scale).
- Multi-Map Robustness (Atlas): To solve the "sharp turn" (AC-4) and "tracking loss" (AC-6) requirements, the V-SLAM front-end is based on an "Atlas" architecture.14 Tracking loss initiates a new, independent map fragment.13 The TOH is responsible for anchoring and merging all fragments geodetically 19 into a single, globally-consistent trajectory.
2.2 Component Interaction and Data Flow
- Component 1: Pre-Flight Caching Module (PCM) (Offline)
- Input: User-defined Area of Interest (AOI) (e.g., a KML polygon).
- Action: Queries Copernicus 6 and OpenStreetMap APIs. Downloads and builds a local geospatial database (GeoPackage/SpatiaLite) containing satellite tiles, DEM tiles, and road/river vectors for the AOI.
- Output: A single, self-contained Local Geo-Database file.
- Component 2: Image Ingestion & Pre-processing (Real-time)
- Input: Image_N (up to 6.2K), Camera Intrinsics (K).
- Action: Creates two copies:
- Image_N_LR (Low-Resolution, e.g., 1536x1024): Dispatched immediately to the V-SLAM Front-End.
- Image_N_HR (High-Resolution, 6.2K): Stored for asynchronous use by the GAB.
- Component 3: V-SLAM "Atlas" Front-End (High-Frequency Thread)
- Input: Image_N_LR.
- Action: Tracks Image_N_LR against its active map fragment. Manages keyframes, local bundle adjustment 38, and the co-visibility graph. If tracking is lost (e.g., AC-4 sharp turn), it initializes a new map fragment 14 and continues tracking.
- Output: Relative_Unscaled_Pose and Local_Point_Cloud data, sent to the TOH.
- Component 4: Local Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread)
- Input: A keyframe (Image_N_HR) and its unscaled pose, triggered by the TOH.
- Action: Performs a visual-only, coarse-to-fine search 34 against the Local Geo-Database.
- Output: An Absolute_Metric_Anchor (a high-confidence [Lat, Lon, Alt] pose) for that keyframe, sent to the TOH.
- Component 5: Trajectory Optimization Hub (TOH) (Central Hub Thread)
- Input: (1) High-frequency Relative_Unscaled_Pose stream. (2) Low-frequency Absolute_Metric_Anchor stream.
- Action: Manages the complete flight trajectory as a Sim(3) pose graph 39 using Ceres Solver.19 Continuously fuses all data.
- Output 1 (Real-time): Pose_N_Est (unscaled) sent to UI (meets AC-7, AC-8).
- Output 2 (Refined): Pose_N_Refined (metric-scale, globally-optimized) sent to UI (meets AC-1, AC-2, AC-8).
2.3 System Inputs
- Image Sequence: Consecutively named images (FullHD to 6252x4168).
- Start Coordinate (Image 0): A single, absolute GPS coordinate (Latitude, Longitude).
- Camera Intrinsics (K): Pre-calibrated camera intrinsic matrix.
- Local Geo-Database File: The single file generated by the Pre-Flight Caching Module (Section 3.0).
2.4 Streaming Outputs (Meets AC-7, AC-8)
- Initial Pose (Pose_N^{Est}): An unscaled pose estimate. This is sent immediately (<5s, AC-7) to the UI for real-time visualization of the UAV's path shape.
- Refined Pose (Pose_N^{Refined}) [Asynchronous]: A globally-optimized, metric-scale 7-DoF pose (X, Y, Z, Qx, Qy, Qz, Qw) and its corresponding [Lat, Lon, Alt] coordinate. This is sent to the user whenever the TOH re-converges (e.g., after a new GAB anchor or map-merge), updating all past poses (AC-1, AC-2, AC-8 refinement met).
3.0 Pre-Flight Component: The Geospatial Caching Module (PCM)
This component is a new, mandatory, pre-flight utility that solves the fatal flaws (Section 1.1, 1.2) of the GEORTEX-R design. It eliminates all real-time network latency (AC-7) and all ToS violations (AC-5), ensuring the project is both performant and legally viable.
3.1 Defining the Area of Interest (AOI)
The system is designed for long-range flights. Given 3000 photos at 100m intervals, the maximum linear track is 300km. The user must provide a coarse "bounding box" or polygon (e.g., KML/GeoJSON format) of the intended flight area. The PCM will automatically add a generous buffer (e.g., 20km) to this AOI to account for navigational drift and ensure all necessary reference data is captured.
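The AOI buffering described above can be sketched as a small helper. This is an illustrative sketch, not the PCM implementation: it expands a lat/lon bounding box by a fixed kilometre buffer using the approximate 111.32 km-per-degree latitude conversion, scaling longitude by the cosine of the most poleward edge.

```python
import math

def buffer_bbox(lat_min, lon_min, lat_max, lon_max, buffer_km=20.0):
    """Expand a lat/lon bounding box by buffer_km on every side.

    Sketch only: uses a flat-earth degree approximation, which is
    adequate for a coarse AOI buffer at mid-latitudes."""
    dlat = buffer_km / 111.32  # ~km per degree of latitude
    # Longitude degrees shrink with latitude; use the most poleward edge
    # so the buffer is never too small.
    widest_lat = max(abs(lat_min), abs(lat_max))
    dlon = buffer_km / (111.32 * math.cos(math.radians(widest_lat)))
    return (lat_min - dlat, lon_min - dlon, lat_max + dlat, lon_max + dlon)

# A 1-degree AOI near 50N, buffered by the default 20 km.
aoi = buffer_bbox(50.0, 10.0, 51.0, 11.0)
```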
3.2 Legal & Viable Data Sources (Copernicus & OpenStreetMap)
As established in 1.1, the system must use open-data providers. The PCM is architected to use the following:
- Visual/Terrain Data (Primary): The Copernicus Data Space Ecosystem 6 is the primary source. The PCM will use the Copernicus Processing and Catalogue APIs 6 to query, process, and download two key products for the buffered AOI:
- Sentinel-2 Satellite Imagery: High-resolution (10m) visual tiles.
- Copernicus GLO-30 DEM: A 30m-resolution Digital Elevation Model.7 This DEM is not used for high-accuracy object localization (see 1.4), but as a coarse altitude prior for the TOH and for the critical dynamic-warping step (Section 5.3).
- Semantic Data (Secondary): OpenStreetMap (OSM) data 40 for the AOI will be downloaded. This provides temporally-invariant vector data (roads, rivers, building footprints) which can be used as a secondary, optional verification layer for the GAB, especially in cases of extreme temporal divergence (e.g., new construction).42
3.3 Building the Local Geo-Database
The PCM utility will process all downloaded data into a single, efficient, compressed file. A modern GeoPackage or SpatiaLite database is the ideal format. This database will contain the satellite tiles, DEM tiles, and vector features, all indexed by a common spatial grid (e.g., UTM).
This single file is then loaded by the main ATLAS-GEOFUSE application at runtime. The GAB's (Section 5.0) "API calls" are thus transformed from high-latency, unreliable HTTP requests 9 into high-speed, zero-latency local SQL queries, guaranteeing that data I/O is never the bottleneck for meeting the AC-7 performance requirement.
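The "HTTP request becomes a local SQL query" transformation can be illustrated with plain sqlite3. The `sat_tiles` schema and UTM grid indices below are hypothetical, chosen for the sketch; the real GeoPackage/SpatiaLite layout will differ.

```python
import sqlite3

# Hypothetical tile table; a real GeoPackage defines its own tile schema.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE sat_tiles (
    tile_e INTEGER,   -- UTM grid column (illustrative)
    tile_n INTEGER,   -- UTM grid row (illustrative)
    png    BLOB)""")
db.execute("CREATE INDEX idx_grid ON sat_tiles(tile_e, tile_n)")
db.execute("INSERT INTO sat_tiles VALUES (412, 5578, x'89504e47')")

def fetch_tile(tile_e, tile_n):
    """Zero-latency local lookup replacing a remote HTTP tile request."""
    row = db.execute(
        "SELECT png FROM sat_tiles WHERE tile_e=? AND tile_n=?",
        (tile_e, tile_n)).fetchone()
    return row[0] if row else None
```

With the spatial index in place, each lookup is a sub-millisecond B-tree probe, which is what guarantees data I/O never threatens the AC-7 budget.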
4.0 Core Component: The Multi-Map V-SLAM "Atlas" Front-End
This component's sole task is to robustly and accurately compute the unscaled 6-DoF relative motion of the UAV and build a geometrically-consistent map of keyframes. It is explicitly designed to be more robust than simple frame-to-frame odometry and to handle catastrophic tracking loss (AC-4) gracefully.
4.1 Rationale: ORB-SLAM3 "Atlas" Architecture
The system will implement a V-SLAM front-end based on the "Atlas" multi-map paradigm, as seen in SOTA systems like ORB-SLAM3.14 This is the industry-standard solution for robust, long-term navigation in environments where tracking loss is possible.13
The mechanism is as follows:
- The system initializes and begins tracking on Map_Fragment_0, using the known start GPS as a metadata tag.
- It tracks all new frames (Image_N_LR) against this active map.
- If tracking is lost (e.g., a sharp turn (AC-4) or a persistent 350m outlier (AC-3)):
- The "Atlas" architecture does not fail. It declares Map_Fragment_0 "inactive," stores it, and immediately initializes Map_Fragment_1 from the current frame.14
- Tracking resumes instantly on this new map fragment, ensuring the system "correctly continues the work" (AC-4).
This architecture converts the "sharp turn" failure case into a standard operating procedure. The system never "fails"; it simply fragments. The burden of stitching these fragments together is correctly moved from the V-SLAM front-end (which has no global context) to the TOH (Section 6.0), which can solve it using global-metric anchors.
4.2 Feature Matching Sub-System: SuperPoint + LightGlue
The V-SLAM front-end's success depends entirely on high-quality feature matches, especially in the sparse, low-texture agricultural terrain seen in the user's images. The selected approach is SuperPoint + LightGlue.
- SuperPoint: A SOTA feature detector proven to find robust, repeatable keypoints in challenging, low-texture conditions.43
- LightGlue: A highly optimized GNN-based matcher that is the successor to SuperGlue.44
The choice of LightGlue over SuperGlue is a deliberate performance optimization. LightGlue is adaptive.46 The user query states sharp turns (AC-4) are "rather an exception." This implies ~95% of image pairs are "easy" (high-overlap, straight flight) and 5% are "hard" (low-overlap, turns). LightGlue's adaptive-depth GNN exits early on "easy" pairs, returning a high-confidence match in a fraction of the time. This saves enormous computational budget on the 95% of normal frames, ensuring the system always meets the <5s budget (AC-7) and reserving that compute for the GAB and TOH. This component will run on Image_N_LR (low-res) to guarantee performance, and will be accelerated via TensorRT (Section 7.0).
4.3 Keyframe Management and Local 3D Cloud
The front-end will maintain a co-visibility graph of keyframes for its active map fragment. It will perform local Bundle Adjustment 38 continuously over a sliding window of recent keyframes to minimize drift within that fragment.
Crucially, it will triangulate features to create a local, high-density 3D point cloud for its map fragment.28 This point cloud is essential for two reasons:
- It provides robust tracking (tracking against a 3D map, not just a 2D frame).
- It serves as the high-accuracy source for the object localization output (Section 9.1), as established in 1.4, allowing the system to bypass the high-error external DEM.
Table 1: Analysis of State-of-the-Art Feature Matchers (For V-SLAM Front-End)
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|---|---|---|---|---|
| SuperPoint + SuperGlue | - SOTA robustness in low-texture, high-blur conditions. - GNN reasons about 3D scene context. - Proven in real-time SLAM systems. | - Computationally heavy (fixed-depth GNN). - Slower than LightGlue. | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT. | Good. A solid, baseline choice. Meets robustness needs but will heavily tax the <5s time budget (AC-7). |
| SuperPoint + LightGlue 44 | - Adaptive Depth: Faster on "easy" pairs, more accurate on "hard" pairs.46 - Faster & Lighter: Outperforms SuperGlue on speed and accuracy. - SOTA "in practice" choice for large-scale matching. | - Newer, but rapidly being adopted and proven.48 | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT. | Excellent (Selected). The adaptive nature is perfect for this problem. It saves compute on the 95% of easy (straight) frames, maximizing our ability to meet AC-7. |
5.0 Core Component: The Local Geospatial Anchoring Back-End (GAB)
This asynchronous component is the system's "anchor to reality." Its sole purpose is to find a high-confidence, absolute-metric pose for a given V-SLAM keyframe by matching it against the local, pre-cached geo-database (from Section 3.0). This component is a full replacement for the high-risk, high-latency GAB from the GEORTEX-R draft (see 1.2, 1.5).
5.1 Rationale: Local-First Query vs. On-Demand API
As established in 1.2, all queries are made to the local SSD. This guarantees zero-latency I/O, which is a hard requirement for a real-time system, as external network latency is unacceptably high and variable.9 The GAB itself runs asynchronously and can take longer than 5s (e.g., 10-15s), but it must not be blocked by network I/O, which would stall the entire processing pipeline.
5.2 SOTA Visual-Only Coarse-to-Fine Localization
This component implements a state-of-the-art, two-stage visual-only pipeline, which is lower-risk and more performant (see 1.5) than the GEORTEX-R's semantic-hybrid model. This approach is well-supported by SOTA research in aerial localization.34
- Stage 1 (Coarse): Global Descriptor Retrieval.
- Action: When the TOH requests an anchor for Keyframe_k, the GAB first computes a global descriptor (a compact vector representation) for the nadir-warped (see 5.3) low-resolution Image_k_LR.
- Technology: A SOTA Visual Place Recognition (VPR) model like SALAD 49, TransVLAD 50, or NetVLAD 33 will be used. These are designed for this "image retrieval" task.45
- Result: This descriptor is used to perform a fast FAISS/vector search against the descriptors of the local satellite tiles (which were pre-computed and stored in the Geo-Database). This returns the Top-K (e.g., K=5) most likely satellite tiles in milliseconds.
- Stage 2 (Fine): Local Feature Matching.
- Action: The system runs SuperPoint+LightGlue 43 to find pixel-level correspondences.
- Performance: This is not run on the full UAV image against the full satellite map. It is run only between high-resolution patches (from Image_k_HR) and the Top-K satellite tiles identified in Stage 1.
- Result: This produces a set of 2D-2D (image-to-map) feature matches. A PnP/RANSAC solver then computes a high-confidence 6-DoF pose. This pose is the Absolute_Metric_Anchor that is sent to the TOH.
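Stage 1 above can be sketched with a brute-force cosine search standing in for FAISS; the descriptor dimension (256), corpus size, and K=5 are illustrative, and a real deployment would use a FAISS index over the pre-computed tile descriptors.

```python
import numpy as np

# Pre-computed, L2-normalized global descriptors for the cached satellite tiles.
rng = np.random.default_rng(0)
tile_desc = rng.normal(size=(1000, 256)).astype(np.float32)
tile_desc /= np.linalg.norm(tile_desc, axis=1, keepdims=True)

def top_k_tiles(query_desc, k=5):
    """Brute-force cosine retrieval; FAISS replaces this at scale."""
    q = query_desc / np.linalg.norm(query_desc)
    sims = tile_desc @ q            # cosine similarity against every tile
    return np.argsort(-sims)[:k]    # indices of the K best tiles

# A query descriptor near tile 42 should retrieve tile 42 first.
query = tile_desc[42] + 0.01 * rng.normal(size=256).astype(np.float32)
candidates = top_k_tiles(query)
```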
5.3 Solving the Viewpoint Gap: Dynamic Feature Warping
The GAB must solve the "viewpoint gap" 33: the UAV image is oblique (due to roll/pitch), while the satellite tiles are nadir (top-down).
The GEORTEX-R draft proposed a complex, high-risk deep learning solution. The ATLAS-GEOFUSE solution is far more elegant and requires zero R&D:
- The V-SLAM Front-End (Section 4.0) already knows the camera's relative 6-DoF pose, including its roll and pitch orientation relative to the local map's ground plane.
- The Local Geo-Database (Section 3.0) contains a 30m-resolution DEM for the AOI.
- When the GAB processes Keyframe_k, it first performs a dynamic homography warp. It projects the V-SLAM ground plane onto the coarse DEM, and then uses the known camera roll/pitch to calculate the perspective transform (homography) needed to un-distort the oblique UAV image into a synthetic nadir-view.
This nadir-warped UAV image is then used in the Coarse-to-Fine pipeline (5.2). It will now match the nadir satellite tiles with extremely high-fidelity. This method eliminates the viewpoint gap without training any new neural networks, leveraging the inherent synergy between the V-SLAM component and the GAB's pre-cached DEM.
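A minimal sketch of the dynamic warp, under a simplifying assumption: the correction is modeled as a pure camera rotation over locally planar ground, giving the standard rotation-induced homography H = K·R⁻¹·K⁻¹. The real system additionally projects the V-SLAM ground plane onto the DEM; intrinsics below are illustrative.

```python
import numpy as np

def nadir_homography(K, roll, pitch):
    """Homography that maps an oblique view toward a synthetic nadir view.

    Simplified sketch: treats the correction as a pure camera rotation
    (H = K @ R^-1 @ K^-1) about the V-SLAM-estimated roll/pitch axes,
    ignoring translation over the locally planar ground."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])   # roll
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])   # pitch
    R = Ry @ Rx
    return K @ np.linalg.inv(R) @ np.linalg.inv(K)

# Illustrative intrinsics for the 1536x1024 LR image.
K = np.array([[1500.0, 0, 768], [0, 1500.0, 512], [0, 0, 1]])
H = nadir_homography(K, roll=0.05, pitch=0.10)
```

The resulting H would be passed to an image-warping routine (e.g., OpenCV's `warpPerspective`) before descriptor extraction.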
6.0 Core Component: The Multi-Map Trajectory Optimization Hub (TOH)
This component is the system's central "brain." It runs continuously, fusing all measurements (high-frequency/unscaled V-SLAM, low-frequency/metric-scale GAB anchors) from all map fragments into a single, globally consistent trajectory.
6.1 Incremental Sim(3) Pose-Graph Optimization
The central challenge of monocular, IMU-denied SLAM is scale-drift. The V-SLAM front-end produces unscaled 6-DoF (SE(3)) relative poses.37 The GAB produces metric-scale 6-DoF (SE(3)) absolute poses. These cannot be directly combined.
The solution is to optimize the graph in Sim(3) (7-DoF).39 This adds a single global scale factor $s$ as an optimizable variable to each V-SLAM map fragment. The TOH will maintain a pose-graph using Ceres Solver 19, a SOTA optimization library.
The graph is constructed as follows:
- Nodes: Each keyframe pose (7-DoF: X, Y, Z, Qx, Qy, Qz, s).
- Edge 1 (V-SLAM): A relative pose constraint between Keyframe_i and Keyframe_j within the same map fragment. The error is computed in Sim(3).29
- Edge 2 (GAB): An absolute pose constraint on Keyframe_k. This constraint fixes Keyframe_k's pose to the metric GPS coordinate from the GAB anchor and fixes its scale s to 1.0.
The GAB's s=1.0 anchor creates "tension" in the graph. The Ceres optimizer 20 resolves this tension by finding the one global scale s for all other V-SLAM nodes in that fragment that minimizes the total error. This effectively "stretches" or "shrinks" the entire unscaled V-SLAM fragment to fit the metric anchors, which is the core of monocular SLAM scale-drift correction.29
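The "stretching" effect can be shown in a toy form, stripped down to just the scale variable: given an unscaled fragment trajectory and two metric anchors (with rotation and translation already aligned), least squares recovers the fragment scale in closed form. A real TOH optimizes the full 7-DoF Sim(3) jointly in Ceres; all values here are illustrative.

```python
import numpy as np

# Unscaled V-SLAM keyframe positions (fragment-local frame).
p = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])

# Metric GAB anchors for keyframes 0 and 3 (true scale: 50 m per unit).
anchors = {0: np.array([0.0, 0, 0]), 3: np.array([150.0, 0, 0])}

# Least-squares scale: minimize sum ||s * p_k - a_k||^2.
# (Rotation/translation are pre-aligned in this toy; Ceres solves the
# full Sim(3) version with quaternions and robust losses.)
num = sum(anchors[k] @ p[k] for k in anchors)
den = sum(p[k] @ p[k] for k in anchors)
s = num / den

scaled = s * p   # the whole fragment "stretched" to fit the metric anchors
```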
6.2 Geodetic Map-Merging via Absolute Anchors
This is the robust solution to the "sharp turn" (AC-4) problem, replacing the flawed "relocalization" model from the original draft.
- Scenario: The UAV makes a sharp turn (AC-4). The V-SLAM front-end loses tracking on Map_Fragment_0 and creates Map_Fragment_1 (per Section 4.1). The TOH's pose graph now contains two disconnected components.
- Mechanism (Geodetic Merging):
- The GAB (Section 5.0) is queued to find anchors for keyframes in both fragments.
- The GAB returns Anchor_A for Keyframe_10 (in Map_Fragment_0) with GPS [Lat_A, Lon_A].
- The GAB returns Anchor_B for Keyframe_50 (in Map_Fragment_1) with GPS [Lat_B, Lon_B].
- The TOH adds both of these as absolute, metric constraints (Edge 2) to the global pose-graph.
- The graph optimizer 20 now has all the information it needs. It will solve for the 7-DoF pose of both fragments, placing them in their correct, globally-consistent metric positions. The two fragments are merged geodetically (i.e., by their global coordinates) even if they never visually overlap. This is a vastly more robust and modern solution than simple visual loop closure.19
6.3 Automatic Outlier Rejection (AC-3, AC-5)
The system must be robust to 350m outliers (AC-3) and <10% bad GAB matches (AC-5). A standard least-squares optimizer (like Ceres 20) would be catastrophically corrupted by a 350m error.
This is a solved problem in modern graph optimization.19 The solution is to wrap all constraints (V-SLAM and GAB) in a Robust Loss Function (e.g., HuberLoss, CauchyLoss) within Ceres Solver.
A robust loss function mathematically down-weights the influence of constraints with large errors (high residuals). When the TOH "sees" the 350m error from a V-SLAM relative pose (AC-3) or a bad GAB anchor (AC-5), the robust loss function effectively acknowledges the measurement but refuses to pull the entire 3000-image trajectory to fit this one "insane" data point. It automatically and gracefully ignores the outlier, optimizing the 99.9% of "sane" measurements, thus meeting AC-3 and AC-5.
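The down-weighting effect is easy to demonstrate. The sketch below uses SciPy's Huber loss as a stand-in for Ceres's HuberLoss (the section's actual solver): estimating a position from 20 sane measurements plus one 350 m outlier, the plain least-squares estimate is dragged far off, while the robust estimate stays near the truth. All values are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

# 20 "sane" measurements near 100 m, plus one 350-m-off outlier (AC-3 scenario).
rng = np.random.default_rng(1)
z = np.concatenate([100.0 + rng.normal(0, 1.0, 20), [450.0]])

def residuals(x):
    # Residual of a constant-position model against all measurements.
    return x[0] - z

# Plain L2: the outlier pulls the estimate toward itself.
naive = least_squares(residuals, x0=[0.0]).x[0]

# Huber loss: residuals beyond f_scale contribute only linearly,
# so the outlier's influence is bounded.
robust = least_squares(residuals, x0=[0.0], loss='huber', f_scale=5.0).x[0]
```

The same principle, applied per-edge inside the Ceres pose-graph, is what keeps a single 350 m constraint from corrupting 3000 poses.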
Table 2: Analysis of Trajectory Optimization Strategies
| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
|---|---|---|---|---|
| Incremental SLAM (Pose-Graph Optimization) (Ceres Solver 19, g2o, GTSAM) | - Real-time / Online: Provides immediate pose estimates (AC-7). - Supports Refinement: Explicitly designed to refine past poses when new "loop closure" (GAB) data arrives (AC-8). - Robust: Can handle outliers via robust kernels.19 | - Initial estimate is unscaled until a GAB anchor arrives. - Can drift if not anchored. | - A graph optimization library (Ceres). - A robust cost function (Huber). | Excellent (Selected). This is the only architecture that satisfies all user requirements for real-time streaming (AC-7) and asynchronous refinement (AC-8). |
| Batch Structure from Motion (Global Bundle Adjustment) (COLMAP, Agisoft Metashape) | - Globally Optimal Accuracy: Produces the most accurate possible 3D reconstruction. | - Offline: Cannot run in real-time or stream results. - High computational cost (minutes to hours). - Fails AC-7 and AC-8 completely. | - All images must be available before processing starts. - High RAM and CPU. | Good (as an Optional Post-Processing Step). Unsuitable as the primary online system, but could be offered as an optional, high-accuracy "Finalize Trajectory" batch process. |
7.0 High-Performance Compute & Deployment
The system must run on an RTX 2060 (AC-7) while processing 6.2K images. These are opposing constraints that require a deliberate compute strategy to balance speed and accuracy.
7.1 Multi-Scale, Coarse-to-Fine Processing Pipeline
The system must balance the conflicting demands of real-time speed (AC-7) and high accuracy (AC-2). This is achieved by running different components at different resolutions.
- V-SLAM Front-End (Real-time, <5s): This component (Section 4.0) runs only on the Image_N_LR (e.g., 1536x1024) copy. This is fast enough to meet the AC-7 budget.46
- GAB (Asynchronous, High-Accuracy): This component (Section 5.0) uses the full-resolution Image_N_HR selectively to meet the 20m accuracy (AC-2).
- Stage 1 (Coarse) runs on the low-res, nadir-warped image.
- Stage 2 (Fine) runs SuperPoint on the full 6.2K image to find the most confident keypoints. It then extracts small, 256x256 patches from the full-resolution image, centered on these keypoints.
- It matches these small, full-resolution patches against the high-res satellite tile.
This hybrid, multi-scale method provides the fine-grained matching accuracy of the 6.2K image (needed for AC-2) without the catastrophic CUDA Out-of-Memory errors (an RTX 2060 has only 6GB VRAM 30) or performance penalties that full-resolution processing would entail.
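The patch-extraction step in Stage 2 can be sketched as follows; patch size (256) and image dimensions come from the text, while the clamping behavior at image borders is an implementation assumption.

```python
import numpy as np

def extract_patches(image_hr, keypoints, size=256):
    """Cut size x size patches from the full-res image around keypoints,
    clamping the window so patches never leave the image bounds."""
    h, w = image_hr.shape[:2]
    half = size // 2
    patches = []
    for (u, v) in keypoints:
        x0 = int(np.clip(u - half, 0, w - size))
        y0 = int(np.clip(v - half, 0, h - size))
        patches.append(image_hr[y0:y0 + size, x0:x0 + size])
    return patches

# A 6.2K frame; one central keypoint and one near the corner.
img = np.zeros((4168, 6252), dtype=np.uint8)
patches = extract_patches(img, [(3000, 2000), (6240, 4160)])
```

Matching a handful of 256x256 patches keeps peak VRAM far below the 6 GB ceiling of the RTX 2060, unlike a full 6.2K inference pass.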
7.2 Mandatory Deployment: NVIDIA TensorRT Acceleration
The deep learning models (SuperPoint, LightGlue, NetVLAD) will be too slow in their native PyTorch framework to meet AC-7 on an RTX 2060.
This is not an "optional" optimization; it is a mandatory deployment step. The key neural networks must be converted from PyTorch into a highly-optimized NVIDIA TensorRT engine.
Research specifically on accelerating LightGlue with TensorRT shows "2x-4x speed gains over compiled PyTorch".48 Other benchmarks confirm TensorRT provides 30-70% speedups for deep learning inference.52 This conversion (which applies layer fusion, graph optimization, and FP16/INT8 precision) is what makes achieving the <5s (AC-7) performance possible on the specified RTX 2060 hardware.
8.0 System Robustness: Failure Mode Escalation Logic
This logic defines the system's behavior during real-world failures, ensuring it meets criteria AC-3, AC-4, AC-6, and AC-9, and is built upon the new "Atlas" multi-map architecture.
8.1 Stage 1: Normal Operation (Tracking)
- Condition: V-SLAM front-end (Section 4.0) is healthy.
- Logic:
- V-SLAM successfully tracks Image_N_LR against its active map fragment.
- A new Relative_Unscaled_Pose is sent to the TOH (Section 6.0).
- TOH sends Pose_N_Est (unscaled) to the user (AC-7, AC-8 met).
- If Image_N is selected as a keyframe, the GAB (Section 5.0) is queued to find an anchor for it, which will trigger a Pose_N_Refined update later.
8.2 Stage 2: Transient VO Failure (Outlier Rejection)
- Condition: Image_N is unusable (e.g., severe blur, sun-glare, or the 350m outlier from AC-3).
- Logic (Frame Skipping):
- V-SLAM front-end fails to track Image_N_LR against the active map.
- The system discards Image_N (marking it as a rejected outlier, AC-5).
- When Image_N+1 arrives, the V-SLAM front-end attempts to track it against the same local keyframe map (from Image_N-1).
- If successful: Tracking resumes. Image_N is officially an outlier. The system "correctly continues the work" (AC-3 met).
- If fails: The system repeats for Image_N+2, N+3. If this fails for ~5 consecutive frames, it escalates to Stage 3.
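The Stage 2 → Stage 3 escalation can be sketched as a small counter-based state machine. The threshold of 5 consecutive failures comes from the text; the class and state names are illustrative.

```python
MAX_CONSECUTIVE_FAILURES = 5  # ~5 frames, per the escalation logic above

class TrackingMonitor:
    """Tracks consecutive V-SLAM failures and decides the failure stage."""

    def __init__(self):
        self.failures = 0

    def report(self, tracked: bool) -> str:
        if tracked:
            self.failures = 0            # any success resets the counter
            return "STAGE_1_TRACKING"
        self.failures += 1
        if self.failures >= MAX_CONSECUTIVE_FAILURES:
            self.failures = 0
            return "STAGE_3_NEW_MAP"     # spawn a new Atlas map fragment
        return "STAGE_2_SKIP_FRAME"      # discard frame as an outlier (AC-3/AC-5)

m = TrackingMonitor()
states = [m.report(ok)
          for ok in [True, False, True, False, False, False, False, False]]
```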
8.3 Stage 3: Persistent VO Failure (New Map Initialization)
- Condition: Tracking is lost for multiple frames. This is the "sharp turn" (AC-4) or "low overlap" (AC-4) scenario.
- Logic (Atlas Multi-Map):
- The V-SLAM front-end (Section 4.0) declares "Tracking Lost."
- It marks the current Map_Fragment_k as "inactive".13
- It immediately initializes a new Map_Fragment_k+1 using the current frame (Image_N+5).
- Tracking resumes instantly on this new, unscaled, un-anchored map fragment.
- This "registering" of a new map ensures the system "correctly continues the work" (AC-4 met) and maintains the >95% registration rate (AC-9) by not counting this as a failure.
8.4 Stage 4: Map-Merging & Global Relocalization (GAB-Assisted)
- Condition: The system is now tracking on Map_Fragment_k+1, while Map_Fragment_k is inactive. The TOH pose-graph (Section 6.0) is disconnected.
- Logic (Geodetic Merging):
- The TOH queues the GAB (Section 5.0) to find anchors for both map fragments.
- The GAB finds anchors for keyframes in both fragments.
- The TOH (Section 6.2) receives these metric anchors, adds them to the graph, and the Ceres optimizer 20 finds the global 7-DoF pose for both fragments, merging them into a single, metrically-consistent trajectory.
8.5 Stage 5: Catastrophic Failure (User Intervention)
- Condition: The system is in Stage 3 (Lost), and the GAB (Section 5.0) has also failed to find any global anchors for a new Map_Fragment_k+1 for a prolonged period (e.g., 20% of the route). This is the "absolutely incapable" scenario (AC-6), (e.g., flying over a large, featureless body of water or dense, uniform fog).
- Logic:
- The system has an unscaled, un-anchored map fragment (Map_Fragment_k+1) and zero idea where it is in the world.
- The TOH triggers the AC-6 flag.
- Resolution (User-Aided Prior):
- The UI prompts the user: "Tracking lost. Please provide a coarse location for the current image."
- The user clicks one point on a map.
- This [Lat, Lon] is not taken as ground truth. It is fed to the GAB (Section 5.0) as a strong spatial prior for its local database query (Section 5.2).
- This narrows the GAB's Stage 1 search area from "the entire AOI" to "a 5km radius around the user's click." This guarantees the GAB will find the correct satellite tile, find a high-confidence Absolute_Metric_Anchor, and allow the TOH (Stage 4) to re-scale 29 and geodetically-merge 20 this lost fragment, re-localizing the entire trajectory.
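The search-narrowing step above can be sketched as a Haversine radius filter over the cached tile centers; the tile representation (dicts with `lat`/`lon` keys) and the coordinates are illustrative.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def tiles_near_click(tiles, click_lat, click_lon, radius_km=5.0):
    """Restrict the Stage 1 search from the whole AOI to tiles near the click."""
    return [t for t in tiles
            if haversine_km(t["lat"], t["lon"], click_lat, click_lon) <= radius_km]

# One tile next to the click, one ~55 km away: only the first survives.
tiles = [{"lat": 50.00, "lon": 10.00}, {"lat": 50.50, "lon": 10.00}]
near = tiles_near_click(tiles, 50.001, 10.001)
```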
9.0 High-Accuracy Output Generation and Validation Strategy
This section details how the final user-facing outputs are generated, specifically replacing the flawed "Ray-DEM" method (see 1.4) with a high-accuracy "Ray-Cloud" method to meet the 20m accuracy (AC-2).
9.1 High-Accuracy Object Geolocalization via Ray-Cloud Intersection
As established in 1.4, using an external 30m DEM 21 for object localization introduces uncontrollable errors (up to 4m+ 22) that make meeting the 20m (AC-2) accuracy goal impossible. The system must use its own, internally-generated 3D map, which is locally far more accurate.25
- Inputs:
  - User clicks pixel coordinate (u,v) on Image_N.
  - The system retrieves the final, refined, metric 7-DoF Sim(3) pose P_{sim(3)} = (s, R, T) for the map fragment that Image_N belongs to. This transform P_{sim(3)} maps the local V-SLAM coordinate system to the global metric coordinate system.
  - The system retrieves the local, unscaled V-SLAM 3D point cloud (P_{local_cloud}) generated by the Front-End (Section 4.3).
  - The known camera intrinsic matrix K.
- Algorithm (Ray-Cloud Intersection):
  - Un-project Pixel: The 2D pixel (u,v) is un-projected into a 3D ray direction vector d_{cam} in the camera's local coordinate system: d_{cam} = K^{-1} \cdot [u, v, 1]^T.
  - Transform Ray (Local): This ray is transformed using the local V-SLAM pose of Image_N to get a ray in the local map fragment's coordinate system.
  - Intersect (Local): The system performs a numerical ray-mesh intersection (or nearest-neighbor search) to find the 3D point P_{local} where this local ray intersects the local V-SLAM point cloud (P_{local_cloud}).25 This P_{local} is highly accurate relative to the V-SLAM map.26
  - Transform (Global): This local 3D point P_{local} is now transformed to the global, metric coordinate system using the 7-DoF Sim(3) transform from the TOH: P_{metric} = s \cdot (R \cdot P_{local}) + T.
  - Result: This 3D intersection point P_{metric} is the metric world coordinate of the object.
  - Convert: This (X, Y, Z) world coordinate is converted to a [Latitude, Longitude, Altitude] GPS coordinate.55
This method correctly isolates the error. The object's accuracy is now only dependent on the V-SLAM's geometric fidelity (AC-10 MRE < 1.0px) and the GAB's global anchoring (AC-1, AC-2). It completely eliminates the external 30m DEM error 22 from this critical, high-accuracy calculation.
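The un-project → intersect → Sim(3) chain can be sketched as below. Two simplifications are assumed for brevity: the camera sits at the fragment origin (a real system first applies the keyframe's local pose to the ray), and "intersection" is approximated by picking the cloud point best aligned with the ray rather than a true ray-mesh test.

```python
import numpy as np

def localize_click(u, v, K, cloud_local, s, R, T):
    """Ray-Cloud sketch: un-project a pixel, pick the cloud point best
    aligned with the viewing ray (a stand-in for the numerical ray-mesh
    intersection), then map it to metric world coordinates via Sim(3)."""
    # Un-project pixel into a unit ray direction in the camera frame.
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    d_cam /= np.linalg.norm(d_cam)
    # Camera assumed at the local origin here; choose the cloud point
    # whose direction best matches the ray.
    dirs = cloud_local / np.linalg.norm(cloud_local, axis=1, keepdims=True)
    p_local = cloud_local[np.argmax(dirs @ d_cam)]
    # P_metric = s * (R @ P_local) + T, per the Transform (Global) step.
    return s * (R @ p_local) + T

# Illustrative intrinsics and a two-point cloud; the center pixel's ray
# points straight down the optical axis.
K = np.array([[1000.0, 0, 500], [0, 1000.0, 500], [0, 0, 1]])
cloud = np.array([[0.0, 0, 10], [5.0, 0, 10]])
p_metric = localize_click(500, 500, K, cloud,
                          s=2.0, R=np.eye(3), T=np.array([1.0, 0, 0]))
```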
9.2 Rigorous Validation Methodology
A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a Ground-Truth Test Harness (e.g., using the provided coordinates.csv data).
- Test Harness:
- Ground-Truth Data: coordinates.csv provides ground-truth [Lat, Lon] for a set of images.
- Test Datasets:
- Test_Baseline: The ground-truth images and coordinates.
- Test_Outlier_350m (AC-3): Test_Baseline with a single, unrelated image inserted.
- Test_Sharp_Turn_5pct (AC-4): A sequence where several frames are manually deleted to simulate <5% overlap.
- Test_Long_Route (AC-9): A 1500-image sequence.
- Test Cases:
- Test_Accuracy (AC-1, AC-2, AC-5, AC-9):
- Run: Execute ATLAS-GEOFUSE on Test_Baseline, providing the first image's coordinate as the Start Coordinate.
- Script: A validation script will compute the Haversine distance error between the system's refined GPS output (Pose_N^{Refined}) for each image and the ground-truth GPS.
- ASSERT (count(errors < 50m) / total_images) >= 0.80 (AC-1 Met)
- ASSERT (count(errors < 20m) / total_images) >= 0.60 (AC-2 Met)
- ASSERT (count(un-localized_images) / total_images) < 0.10 (AC-5 Met)
- ASSERT (count(localized_images) / total_images) > 0.95 (AC-9 Met)
- Test_MRE (AC-10):
- Run: After Test_Baseline completes.
- ASSERT TOH.final_Mean_Reprojection_Error < 1.0 (AC-10 Met)
- Test_Performance (AC-7, AC-8):
- Run: Execute on Test_Long_Route on the minimum-spec RTX 2060.
- Log: Log timestamps for "Image In" -> "Initial Pose Out" (Pose_N^{Est}).
- ASSERT average_time < 5.0s (AC-7 Met)
- Log: Log the output stream.
- ASSERT >80% of images receive two poses: an "Initial" and a "Refined" (AC-8 Met)
- Test_Robustness (AC-3, AC-4, AC-6):
- Run: Execute Test_Outlier_350m.
- ASSERT System logs "Stage 2: Discarding Outlier" or "Stage 3: New Map" and the final trajectory error for the next frame is < 50m (AC-3 Met).
- Run: Execute Test_Sharp_Turn_5pct.
- ASSERT System logs "Stage 3: New Map Initialization" and "Stage 4: Geodetic Map-Merge," and the final trajectory is complete and accurate (AC-4 Met).
- Run: Execute on a sequence with no GAB anchors possible for 20% of the route.
- ASSERT System logs "Stage 5: User Intervention Requested" (AC-6 Met).
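The AC-1/AC-2 accuracy gates from Test_Accuracy can be expressed as a small pass-rate check; the error values below are illustrative Haversine distances in metres, not real measurements.

```python
def pass_rates(errors_m):
    """Fraction of images under the AC-1 (50 m) and AC-2 (20 m) thresholds."""
    n = len(errors_m)
    ac1 = sum(e < 50 for e in errors_m) / n   # AC-1: >= 0.80 required
    ac2 = sum(e < 20 for e in errors_m) / n   # AC-2: >= 0.60 required
    return ac1, ac2

# Illustrative per-image Haversine errors (metres) from a Test_Baseline run.
errors = [5, 12, 18, 19, 40, 60, 8, 15, 30, 14]
ac1, ac2 = pass_rates(errors)
```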