The ATLAS-GEOFUSE System Architecture

ATLAS-GEOFUSE is a multi-component architecture for high-performance, real-time geolocalization in IMU-denied, high-drift environments. It is explicitly designed around pre-flight data caching and multi-map robustness.

2.1 Core Design Principles

  1. Pre-Flight Caching: To meet the <5s (AC-7) real-time requirement, all network latency must be eliminated. The system mandates a "Pre-Flight" step (Section 3.0) where all geospatial data (satellite tiles, DEMs, vector data) for the Area of Interest (AOI) is downloaded from a viable open-source provider (e.g., Copernicus 6) and stored in a local database on the processing laptop. All real-time queries are made against this local cache.
  2. Decoupled Multi-Map SLAM: The system separates relative motion from absolute scale. A Visual SLAM (V-SLAM) "Atlas" Front-End (Section 4.0) computes high-frequency, robust, but unscaled relative motion. A Local Geospatial Anchoring Back-End (GAB) (Section 5.0) provides sparse, high-confidence, absolute metric anchors by querying the local cache. A Trajectory Optimization Hub (TOH) (Section 6.0) fuses these two streams in a Sim(3) pose-graph to solve for the global 7-DoF trajectory (pose + scale).
  3. Multi-Map Robustness (Atlas): To solve the "sharp turn" (AC-4) and "tracking loss" (AC-6) requirements, the V-SLAM front-end is based on an "Atlas" architecture.14 Tracking loss initiates a new, independent map fragment.13 The TOH is responsible for anchoring and merging all fragments geodetically 19 into a single, globally-consistent trajectory.

2.2 Component Interaction and Data Flow

  • Component 1: Pre-Flight Caching Module (PCM) (Offline)
    • Input: User-defined Area of Interest (AOI) (e.g., a KML polygon).
    • Action: Queries Copernicus 6 and OpenStreetMap APIs. Downloads and builds a local geospatial database (GeoPackage/SpatiaLite) containing satellite tiles, DEM tiles, and road/river vectors for the AOI.
    • Output: A single, self-contained Local Geo-Database file.
  • Component 2: Image Ingestion & Pre-processing (Real-time)
    • Input: Image_N (up to 6.2K), Camera Intrinsics (K).
    • Action: Creates two copies:
      • Image_N_LR (Low-Resolution, e.g., 1536x1024): Dispatched immediately to the V-SLAM Front-End.
      • Image_N_HR (High-Resolution, 6.2K): Stored for asynchronous use by the GAB.
  • Component 3: V-SLAM "Atlas" Front-End (High-Frequency Thread)
    • Input: Image_N_LR.
    • Action: Tracks Image_N_LR against its active map fragment. Manages keyframes, local bundle adjustment 38, and the co-visibility graph. If tracking is lost (e.g., AC-4 sharp turn), it initializes a new map fragment 14 and continues tracking.
    • Output: Relative_Unscaled_Pose and Local_Point_Cloud data, sent to the TOH.
  • Component 4: Local Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread)
    • Input: A keyframe (Image_N_HR) and its unscaled pose, triggered by the TOH.
    • Action: Performs a visual-only, coarse-to-fine search 34 against the Local Geo-Database.
    • Output: An Absolute_Metric_Anchor (a high-confidence [Lat, Lon, Alt] pose) for that keyframe, sent to the TOH.
  • Component 5: Trajectory Optimization Hub (TOH) (Central Hub Thread)
    • Input: (1) High-frequency Relative_Unscaled_Pose stream. (2) Low-frequency Absolute_Metric_Anchor stream.
    • Action: Manages the complete flight trajectory as a Sim(3) pose graph 39 using Ceres Solver.19 Continuously fuses all data.
    • Output 1 (Real-time): Pose_N_Est (unscaled) sent to UI (meets AC-7, AC-8).
    • Output 2 (Refined): Pose_N_Refined (metric-scale, globally-optimized) sent to UI (meets AC-1, AC-2, AC-8).
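
One detail implied by Component 2 is that the intrinsics K describe the full-resolution image, so the low-resolution copy needs a correspondingly scaled K. The sketch below illustrates this with NumPy. The focal length and principal point are hypothetical values, `scale_intrinsics` is an illustrative helper rather than part of the design, and the half-pixel offset convention is ignored for simplicity.

```python
import numpy as np

def scale_intrinsics(K, src_size, dst_size):
    """Return intrinsics for an image resized from src_size to dst_size.

    Sizes are (width, height). Half-pixel center conventions are ignored.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    # Scaling the first two rows of K scales fx, cx by sx and fy, cy by sy.
    return np.diag([sx, sy, 1.0]) @ K

# Hypothetical full-resolution intrinsics for a 6252x4168 image.
K_hr = np.array([[5000.0, 0.0, 3126.0],
                 [0.0, 5000.0, 2084.0],
                 [0.0, 0.0, 1.0]])

# Intrinsics for the Image_N_LR copy (1536x1024, same 3:2 aspect ratio).
K_lr = scale_intrinsics(K_hr, (6252, 4168), (1536, 1024))
```

Because 6252x4168 and 1536x1024 share the same aspect ratio, both focal lengths scale by the same factor and the principal point stays at the image center.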

2.3 System Inputs

  1. Image Sequence: Consecutively named images (FullHD to 6252x4168).
  2. Start Coordinate (Image 0): A single, absolute GPS coordinate (Latitude, Longitude).
  3. Camera Intrinsics (K): Pre-calibrated camera intrinsic matrix.
  4. Local Geo-Database File: The single file generated by the Pre-Flight Caching Module (Section 3.0).

2.4 Streaming Outputs (Meets AC-7, AC-8)

  1. Initial Pose (Pose_N^{Est}): An unscaled pose estimate. This is sent immediately (<5s, AC-7) to the UI for real-time visualization of the UAV's path shape.
  2. Refined Pose (Pose_N^{Refined}) [Asynchronous]: A globally-optimized, metric-scale pose expressed as seven parameters (X, Y, Z, Qx, Qy, Qz, Qw: position plus unit quaternion) and its corresponding [Lat, Lon, Alt] coordinate. This is sent to the user whenever the TOH re-converges (e.g., after a new GAB anchor or map-merge), updating all past poses (AC-1, AC-2, AC-8 refinement met).

3.0 Pre-Flight Component: The Geospatial Caching Module (PCM)

This component is a new, mandatory, pre-flight utility that solves the fatal flaws (Section 1.1, 1.2) of the GEORTEX-R design. It eliminates all real-time network latency (AC-7) and all ToS violations (AC-5), ensuring the project is both performant and legally viable.

3.1 Defining the Area of Interest (AOI)

The system is designed for long-range flights. Given 3000 photos at 100m intervals, the maximum linear track is 300km. The user must provide a coarse "bounding box" or polygon (e.g., KML/GeoJSON format) of the intended flight area. The PCM will automatically add a generous buffer (e.g., 20km) to this AOI to account for navigational drift and ensure all necessary reference data is captured.
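
The track-length arithmetic and the AOI buffering can be sketched as follows. This is a minimal illustration, assuming a roughly constant 111.32 km per degree of latitude and a hypothetical bounding box. `buffer_bbox` is an illustrative name, and the sketch does not handle the poles or the antimeridian.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximation, treated as constant in this sketch

def buffer_bbox(min_lat, min_lon, max_lat, max_lon, buffer_km=20.0):
    """Expand a lat/lon bounding box by buffer_km on every side."""
    dlat = buffer_km / KM_PER_DEG_LAT
    # Use the latitude furthest from the equator so the longitude
    # margin is not under-estimated anywhere in the box.
    widest_lat = max(abs(min_lat), abs(max_lat))
    dlon = buffer_km / (KM_PER_DEG_LAT * math.cos(math.radians(widest_lat)))
    return (min_lat - dlat, min_lon - dlon, max_lat + dlat, max_lon + dlon)

# 3000 photos at 100 m spacing give the maximum linear track length.
track_km = 3000 * 100 / 1000

# Hypothetical user-supplied AOI, expanded by the 20 km example buffer.
aoi = buffer_bbox(48.0, 30.0, 49.0, 33.0, buffer_km=20.0)
```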

As established in 1.1, the system must use open-data providers. The PCM is architected to use the following:

  1. Visual/Terrain Data (Primary): The Copernicus Data Space Ecosystem 6 is the primary source. The PCM will use the Copernicus Processing and Catalogue APIs 6 to query, process, and download two key products for the buffered AOI:
    • Sentinel-2 Satellite Imagery: High-resolution (10m) visual tiles.
    • Copernicus GLO-30 DEM: A 30m-resolution Digital Elevation Model.7 This DEM is not used for high-accuracy object localization (see 1.4), but as a coarse altitude prior for the TOH and for the critical dynamic-warping step (Section 5.3).
  2. Semantic Data (Secondary): OpenStreetMap (OSM) data 40 for the AOI will be downloaded. This provides temporally-invariant vector data (roads, rivers, building footprints) which can be used as a secondary, optional verification layer for the GAB, especially in cases of extreme temporal divergence (e.g., new construction).42

3.3 Building the Local Geo-Database

The PCM utility will process all downloaded data into a single, efficient, compressed file. A modern GeoPackage or SpatiaLite database is the ideal format. This database will contain the satellite tiles, DEM tiles, and vector features, all indexed by a common spatial grid (e.g., UTM).

This single file is then loaded by the main ATLAS-GEOFUSE application at runtime. The GAB's (Section 5.0) "API calls" are thus transformed from high-latency, unreliable HTTP requests 9 into fast local SQL queries with no network round trip, so that data I/O is not the bottleneck for meeting the AC-7 performance requirement.
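
To make the "local SQL query" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sat_tiles` table and its columns are a hypothetical stand-in, not the GeoPackage or SpatiaLite schema, and a real database would normally use a proper spatial index instead of the plain bounding-box filter shown here.

```python
import sqlite3

def open_demo_cache():
    """Build an in-memory stand-in for the Local Geo-Database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE sat_tiles ("
        " tile_id INTEGER PRIMARY KEY,"
        " min_lat REAL, min_lon REAL, max_lat REAL, max_lon REAL,"
        " image BLOB)"
    )
    return db

def tiles_covering(db, lat, lon):
    """Return ids of all cached tiles whose bounding box contains (lat, lon)."""
    rows = db.execute(
        "SELECT tile_id FROM sat_tiles"
        " WHERE min_lat <= ? AND max_lat >= ? AND min_lon <= ? AND max_lon >= ?"
        " ORDER BY tile_id",
        (lat, lat, lon, lon),
    )
    return [row[0] for row in rows]

db = open_demo_cache()
db.executemany(
    "INSERT INTO sat_tiles VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 48.0, 30.0, 48.1, 30.1, b""),  # empty blobs stand in for imagery
        (2, 48.1, 30.0, 48.2, 30.1, b""),
    ],
)
hits = tiles_covering(db, 48.05, 30.05)
```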

4.0 Core Component: The Multi-Map V-SLAM "Atlas" Front-End

This component's sole task is to robustly and accurately compute the unscaled 6-DoF relative motion of the UAV and build a geometrically-consistent map of keyframes. It is explicitly designed to be more robust than simple frame-to-frame odometry and to handle catastrophic tracking loss (AC-4) gracefully.

4.1 Rationale: ORB-SLAM3 "Atlas" Architecture

The system will implement a V-SLAM front-end based on the "Atlas" multi-map paradigm, as seen in SOTA systems like ORB-SLAM3.14 This is the industry-standard solution for robust, long-term navigation in environments where tracking loss is possible.13

The mechanism is as follows:

  1. The system initializes and begins tracking on Map_Fragment_0, using the known start GPS as a metadata tag.
  2. It tracks all new frames (Image_N_LR) against this active map.
  3. If tracking is lost (e.g., a sharp turn (AC-4) or a persistent 350m outlier (AC-3)):
    • The "Atlas" architecture does not fail. It declares Map_Fragment_0 "inactive," stores it, and immediately initializes Map_Fragment_1 from the current frame.14
    • Tracking resumes instantly on this new map fragment, ensuring the system "correctly continues the work" (AC-4).

This architecture converts the "sharp turn" failure case into a standard operating procedure. The system never "fails"; it simply fragments. The burden of stitching these fragments together is correctly moved from the V-SLAM front-end (which has no global context) to the TOH (Section 6.0), which can solve it using global-metric anchors.

4.2 Feature Matching Sub-System: SuperPoint + LightGlue

The V-SLAM front-end's success depends entirely on high-quality feature matches, especially in the sparse, low-texture agricultural terrain seen in the user's images. The selected approach is SuperPoint + LightGlue.

  • SuperPoint: A SOTA feature detector proven to find robust, repeatable keypoints in challenging, low-texture conditions.43
  • LightGlue: A highly optimized GNN-based matcher that is the successor to SuperGlue.44

The choice of LightGlue over SuperGlue is a deliberate performance optimization. LightGlue is adaptive.46 The user query states sharp turns (AC-4) are "rather an exception." This implies ~95% of image pairs are "easy" (high-overlap, straight flight) and 5% are "hard" (low-overlap, turns). LightGlue's adaptive-depth GNN exits early on "easy" pairs, returning a high-confidence match in a fraction of the time. This saves substantial compute on the ~95% of normal frames, which helps the system stay within the <5s budget (AC-7) and frees that compute for the GAB and TOH. This component will run on Image_N_LR (low-res) to keep latency predictable, and will be accelerated via TensorRT (Section 7.0).

4.3 Keyframe Management and Local 3D Cloud

The front-end will maintain a co-visibility graph of keyframes for its active map fragment. It will perform local Bundle Adjustment 38 continuously over a sliding window of recent keyframes to minimize drift within that fragment.

Crucially, it will triangulate features to create a local, high-density 3D point cloud for its map fragment.28 This point cloud is essential for two reasons:

  1. It provides robust tracking (tracking against a 3D map, not just a 2D frame).
  2. It serves as the high-accuracy source for the object localization output (Section 9.1), as established in 1.4, allowing the system to bypass the high-error external DEM.

Table 1: Analysis of State-of-the-Art Feature Matchers (For V-SLAM Front-End)

| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
| --- | --- | --- | --- | --- |
| SuperPoint + SuperGlue | SOTA robustness in low-texture, high-blur conditions; GNN reasons about 3D scene context; proven in real-time SLAM systems. | Computationally heavy (fixed-depth GNN); slower than LightGlue. | NVIDIA GPU (RTX 2060+); PyTorch or TensorRT. | Good. A solid, baseline choice. Meets robustness needs but will heavily tax the <5s time budget (AC-7). |
| SuperPoint + LightGlue 44 | Adaptive Depth: faster on "easy" pairs, more accurate on "hard" pairs.46 Faster & Lighter: outperforms SuperGlue on speed and accuracy. SOTA "in practice" choice for large-scale matching. | Newer, but rapidly being adopted and proven.48 | NVIDIA GPU (RTX 2060+); PyTorch or TensorRT. | Excellent (Selected). The adaptive nature is perfect for this problem. It saves compute on the 95% of easy (straight) frames, maximizing our ability to meet AC-7. |

5.0 Core Component: The Local Geospatial Anchoring Back-End (GAB)

This asynchronous component is the system's "anchor to reality." Its sole purpose is to find a high-confidence, absolute-metric pose for a given V-SLAM keyframe by matching it against the local, pre-cached geo-database (from Section 3.0). This component is a full replacement for the high-risk, high-latency GAB from the GEORTEX-R draft (see 1.2, 1.5).

5.1 Rationale: Local-First Query vs. On-Demand API

As established in 1.2, all queries are made against the local SSD. This removes network round trips from the I/O path, which is a hard requirement for a real-time system, as external network latency is unacceptably high and variable.9 The GAB itself runs asynchronously and can take longer than 5s (e.g., 10-15s), but it must not be blocked by network I/O, which would stall the entire processing pipeline.

5.2 SOTA Visual-Only Coarse-to-Fine Localization

This component implements a state-of-the-art, two-stage visual-only pipeline, which is lower-risk and more performant (see 1.5) than the GEORTEX-R's semantic-hybrid model. This approach is well-supported by SOTA research in aerial localization.34

  1. Stage 1 (Coarse): Global Descriptor Retrieval.
    • Action: When the TOH requests an anchor for Keyframe_k, the GAB first computes a global descriptor (a compact vector representation) for the nadir-warped (see 5.3) low-resolution Image_k_LR.
    • Technology: A SOTA Visual Place Recognition (VPR) model like SALAD 49, TransVLAD 50, or NetVLAD 33 will be used. These are designed for this "image retrieval" task.45
    • Result: This descriptor is used to perform a fast FAISS/vector search against the descriptors of the local satellite tiles (which were pre-computed and stored in the Geo-Database). This returns the Top-K (e.g., K=5) most likely satellite tiles in milliseconds.
  2. Stage 2 (Fine): Local Feature Matching.
    • Action: The system runs SuperPoint+LightGlue 43 to find pixel-level correspondences.
    • Performance: This is not run on the full UAV image against the full satellite map. It is run only between high-resolution patches (from Image_k_HR) and the Top-K satellite tiles identified in Stage 1.
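    • Result: This produces a set of 2D-2D (image-to-map) feature matches. Because PnP requires 2D-3D correspondences, each matched satellite pixel must first be lifted to a 3D world point (its georeferenced position, with elevation presumably taken from the cached DEM). A PnP/RANSAC solver then computes a high-confidence 6-DoF pose. This pose is the Absolute_Metric_Anchor that is sent to the TOH.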
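
The Stage 1 retrieval step can be sketched as a nearest-neighbor search over global descriptors. The example below uses brute-force cosine similarity in NumPy as a stand-in for a FAISS index. The descriptors are random placeholders rather than the output of a VPR model, and `top_k_tiles` is an illustrative name.

```python
import numpy as np

def top_k_tiles(query_desc, tile_descs, k=5):
    """Return (indices, similarities) of the k tiles most similar to the query."""
    q = query_desc / np.linalg.norm(query_desc)
    t = tile_descs / np.linalg.norm(tile_descs, axis=1, keepdims=True)
    sims = t @ q                      # cosine similarity per tile
    order = np.argsort(-sims)[:k]     # best match first
    return order, sims[order]

rng = np.random.default_rng(0)

# Placeholder descriptors for 100 cached satellite tiles (256-D).
tile_descs = rng.normal(size=(100, 256))

# Simulated query: a slightly perturbed copy of tile 42's descriptor.
query = tile_descs[42] + 0.05 * rng.normal(size=256)

idx, scores = top_k_tiles(query, tile_descs, k=5)
```

Only the tiles in `idx` would then be passed to the more expensive Stage 2 matcher.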

5.3 Solving the Viewpoint Gap: Dynamic Feature Warping

The GAB must solve the "viewpoint gap" 33: the UAV image is oblique (due to roll/pitch), while the satellite tiles are nadir (top-down).

The GEORTEX-R draft proposed a complex, high-risk deep learning solution. The ATLAS-GEOFUSE solution is simpler and requires no new model training:

  1. The V-SLAM Front-End (Section 4.0) already knows the camera's relative 6-DoF pose, including its roll and pitch orientation relative to the local map's ground plane.
  2. The Local Geo-Database (Section 3.0) contains a 30m-resolution DEM for the AOI.
  3. When the GAB processes Keyframe_k, it first performs a dynamic homography warp. It projects the V-SLAM ground plane onto the coarse DEM, and then uses the known camera roll/pitch to calculate the perspective transform (homography) needed to un-distort the oblique UAV image into a synthetic nadir-view.

This nadir-warped UAV image is then used in the Coarse-to-Fine pipeline (5.2). It will now match the nadir satellite tiles with extremely high-fidelity. This method eliminates the viewpoint gap without training any new neural networks, leveraging the inherent synergy between the V-SLAM component and the GAB's pre-cached DEM.
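
A minimal sketch of the rotational part of this warp is shown below. It assumes a pinhole camera and a pitch/roll angle convention chosen purely for illustration, and it omits the DEM projection step. For two views that share a camera center, the pixel mapping is the homography H = K R K^{-1}, which is what `nadir_homography` (a hypothetical helper) computes.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def nadir_homography(K, pitch, roll):
    """Homography mapping oblique-image pixels to a synthetic nadir view.

    R_nc rotates vectors from the oblique camera frame into the nadir
    camera frame (assumed convention: pitch about x, roll about y).
    """
    R_nc = rot_x(pitch) @ rot_y(roll)
    return K @ R_nc @ np.linalg.inv(K), R_nc

K = np.array([[1200.0, 0.0, 768.0],
              [0.0, 1200.0, 512.0],
              [0.0, 0.0, 1.0]])
H, R_nc = nadir_homography(K, np.radians(8.0), np.radians(-3.0))

# Sanity check: the ground point directly below the camera lies along +z
# of the nadir frame, so it should land on the principal point after warping.
p_oblique = K @ (R_nc.T @ np.array([0.0, 0.0, 1.0]))
p_nadir = H @ p_oblique
p_nadir = p_nadir / p_nadir[2]
```

In practice a matrix like `H` would be applied to the image with a perspective-warp routine such as OpenCV's `cv2.warpPerspective`.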

6.0 Core Component: The Multi-Map Trajectory Optimization Hub (TOH)

This component is the system's central "brain." It runs continuously, fusing all measurements (high-frequency/unscaled V-SLAM, low-frequency/metric-scale GAB anchors) from all map fragments into a single, globally consistent trajectory.

6.1 Incremental Sim(3) Pose-Graph Optimization

The central challenge of monocular, IMU-denied SLAM is scale-drift. The V-SLAM front-end produces unscaled 6-DoF (SE(3)) relative poses.37 The GAB produces metric-scale 6-DoF (SE(3)) absolute poses. These cannot be directly combined.

The solution is that the graph must be optimized in Sim(3) (7-DoF).39 This adds a single global scale factor $s$ as an optimizable variable to each V-SLAM map fragment. The TOH will maintain a pose-graph using Ceres Solver 19, a SOTA optimization library.

The graph is constructed as follows:

  1. Nodes: Each keyframe pose (7-DoF Sim(3): position X, Y, Z; rotation as a unit quaternion Qx, Qy, Qz, Qw; scale s).
  2. Edge 1 (V-SLAM): A relative pose constraint between Keyframe_i and Keyframe_j within the same map fragment. The error is computed in Sim(3).29
  3. Edge 2 (GAB): An absolute pose constraint on Keyframe_k. This constraint fixes Keyframe_k's pose to the metric GPS coordinate from the GAB anchor and fixes its scale s to 1.0.

The GAB's s=1.0 anchor creates "tension" in the graph. The Ceres optimizer 20 resolves this tension by finding the one global scale s for all other V-SLAM nodes in that fragment that minimizes the total error. This effectively "stretches" or "shrinks" the entire unscaled V-SLAM fragment to fit the metric anchors, which is the core of monocular SLAM scale-drift correction.29
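
The "stretch or shrink" effect can be illustrated with a closed-form similarity alignment (Umeyama's method), used here as a simplified stand-in for the iterative Ceres pose-graph optimization. Given at least three non-collinear keyframe positions in the unscaled fragment frame and their anchored metric positions, it recovers the scale, rotation, and translation. The data below is synthetic and `umeyama_sim3` is an illustrative name.

```python
import numpy as np

def umeyama_sim3(local_pts, metric_pts):
    """Least-squares (s, R, t) such that metric ~= s * R @ local + t."""
    n = local_pts.shape[0]
    mu_l = local_pts.mean(axis=0)
    mu_m = metric_pts.mean(axis=0)
    xl = local_pts - mu_l
    xm = metric_pts - mu_m
    cov = xm.T @ xl / n
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                 # avoid returning a reflection
    R = U @ S @ Vt
    var_l = (xl ** 2).sum() / n
    s = float((D * np.diag(S)).sum() / var_l)
    t = mu_m - s * R @ mu_l
    return s, R, t

rng = np.random.default_rng(1)
local_anchors = rng.uniform(-1.0, 1.0, size=(5, 3))  # synthetic, unscaled

# Synthetic ground-truth similarity transform used to generate "metric" anchors.
theta = np.radians(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
s_true = 37.5
t_true = np.array([1000.0, -500.0, 120.0])
metric_anchors = s_true * local_anchors @ R_true.T + t_true

s, R, t = umeyama_sim3(local_anchors, metric_anchors)
```

Unlike this one-shot alignment, the pose graph also uses the relative V-SLAM edges and can be re-solved incrementally as new anchors arrive.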

6.2 Geodetic Map-Merging via Absolute Anchors

This is the robust solution to the "sharp turn" (AC-4) problem, replacing the flawed "relocalization" model from the original draft.

  • Scenario: The UAV makes a sharp turn (AC-4). The V-SLAM front-end loses tracking on Map_Fragment_0 and creates Map_Fragment_1 (per Section 4.1). The TOH's pose graph now contains two disconnected components.
  • Mechanism (Geodetic Merging):
    1. The GAB (Section 5.0) is queued to find anchors for keyframes in both fragments.
    2. The GAB returns Anchor_A for Keyframe_10 (in Map_Fragment_0) with GPS [Lat_A, Lon_A].
    3. The GAB returns Anchor_B for Keyframe_50 (in Map_Fragment_1) with GPS [Lat_B, Lon_B].
    4. The TOH adds both of these as absolute, metric constraints (Edge 2) to the global pose-graph.
  • The graph optimizer 20 now has all the information it needs. It will solve for the 7-DoF pose of both fragments, placing them in their correct, globally-consistent metric positions. The two fragments are merged geodetically (i.e., by their global coordinates) even if they never visually overlap. This is a vastly more robust and modern solution than simple visual loop closure.19

6.3 Automatic Outlier Rejection (AC-3, AC-5)

The system must be robust to 350m outliers (AC-3) and <10% bad GAB matches (AC-5). A plain least-squares objective (e.g., Ceres 20 with no loss function attached) would be catastrophically corrupted by a 350m error.

This is a solved problem in modern graph optimization.19 The solution is to wrap all constraints (V-SLAM and GAB) in a Robust Loss Function (e.g., HuberLoss, CauchyLoss) within Ceres Solver.

A robust loss function mathematically down-weights the influence of constraints with large errors (high residuals). When the TOH "sees" the 350m error from a V-SLAM relative pose (AC-3) or a bad GAB anchor (AC-5), the robust loss function effectively acknowledges the measurement but refuses to pull the entire 3000-image trajectory to fit this one "insane" data point. The outlier's influence is capped automatically while the optimizer fits the remaining "sane" measurements, thus meeting AC-3 and AC-5.
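
The down-weighting effect can be demonstrated on a one-dimensional toy problem. The sketch below estimates a single value from five synthetic measurements, one of which is a 350m outlier, using iteratively reweighted least squares with Huber weights. It is a NumPy illustration of the principle, not the Ceres implementation, and the threshold `delta` is an arbitrary example value.

```python
import numpy as np

def huber_weight(residual, delta):
    """IRLS weight for the Huber loss: 1 inside delta, delta/|r| outside."""
    a = np.abs(residual)
    return np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-12))

def robust_location(samples, delta=1.0, iters=100):
    """Estimate a location parameter by iteratively reweighted least squares."""
    est = float(np.mean(samples))      # start from the corrupted plain mean
    for _ in range(iters):
        w = huber_weight(samples - est, delta)
        est = float(np.sum(w * samples) / np.sum(w))
    return est

# Synthetic measurements in metres; the last one is the 350m outlier.
errors_m = np.array([0.0, 1.0, -1.0, 0.5, 350.0])

plain = float(np.mean(errors_m))       # dragged far away by the outlier
robust = robust_location(errors_m)     # stays close to the inliers
```

Note that the Huber loss bounds the outlier's pull rather than removing it. A loss such as Cauchy suppresses large residuals more aggressively.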

Table 2: Analysis of Trajectory Optimization Strategies

| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
| --- | --- | --- | --- | --- |
| Incremental SLAM (Pose-Graph Optimization) (Ceres Solver 19, g2o, GTSAM) | Real-time / Online: provides immediate pose estimates (AC-7). Supports Refinement: explicitly designed to refine past poses when new "loop closure" (GAB) data arrives (AC-8). Robust: can handle outliers via robust kernels.19 | Initial estimate is unscaled until a GAB anchor arrives; can drift if not anchored. | A graph optimization library (Ceres); a robust cost function (Huber). | Excellent (Selected). This is the only architecture that satisfies all user requirements for real-time streaming (AC-7) and asynchronous refinement (AC-8). |
| Batch Structure from Motion (Global Bundle Adjustment) (COLMAP, Agisoft Metashape) | Globally Optimal Accuracy: produces the most accurate possible 3D reconstruction. | Offline: cannot run in real-time or stream results; high computational cost (minutes to hours); fails AC-7 and AC-8 completely. | All images must be available before processing starts; high RAM and CPU. | Good (as an Optional Post-Processing Step). Unsuitable as the primary online system, but could be offered as an optional, high-accuracy "Finalize Trajectory" batch process. |

7.0 High-Performance Compute & Deployment

The system must run on an RTX 2060 (AC-7) while processing 6.2K images. These are opposing constraints that require a deliberate compute strategy to balance speed and accuracy.

7.1 Multi-Scale, Coarse-to-Fine Processing Pipeline

The system must balance the conflicting demands of real-time speed (AC-7) and high accuracy (AC-2). This is achieved by running different components at different resolutions.

  • V-SLAM Front-End (Real-time, <5s): This component (Section 4.0) runs only on the Image_N_LR (e.g., 1536x1024) copy. This is fast enough to meet the AC-7 budget.46
  • GAB (Asynchronous, High-Accuracy): This component (Section 5.0) uses the full-resolution Image_N_HR selectively to meet the 20m accuracy (AC-2).
    1. Stage 1 (Coarse) runs on the low-res, nadir-warped image.
    2. Stage 2 (Fine) runs SuperPoint on the full 6.2K image to find the most confident keypoints. It then extracts small, 256x256 patches from the full-resolution image, centered on these keypoints.
    3. It matches these small, full-resolution patches against the high-res satellite tile.

This hybrid, multi-scale method provides the fine-grained matching accuracy of the 6.2K image (needed for AC-2) without the catastrophic CUDA Out-of-Memory errors (an RTX 2060 has only 6GB VRAM 30) or performance penalties that full-resolution processing would entail.
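
The patch-extraction step in Stage 2 might look like the following sketch, which cuts fixed-size windows around keypoints and shifts windows near the border so that every patch keeps the full 256x256 size. The keypoints are hypothetical values, and `extract_patches` is an illustrative helper, not the project's API.

```python
import numpy as np

def extract_patches(image, keypoints_xy, size=256):
    """Cut size x size patches centered on keypoints, shifted to stay in bounds.

    Returns the patches and their top-left (x0, y0) origins, which are needed
    to map patch-level matches back to full-image pixel coordinates.
    """
    h, w = image.shape[:2]
    half = size // 2
    patches, origins = [], []
    for x, y in keypoints_xy:
        x0 = int(np.clip(int(round(x)) - half, 0, w - size))
        y0 = int(np.clip(int(round(y)) - half, 0, h - size))
        patches.append(image[y0:y0 + size, x0:x0 + size])
        origins.append((x0, y0))
    return patches, origins

# Blank stand-in for a 6252x4168 grayscale Image_N_HR.
image_hr = np.zeros((4168, 6252), dtype=np.uint8)

# Hypothetical keypoints: near the top-left corner, mid-image, near bottom-right.
kps = [(10.0, 10.0), (3000.0, 2000.0), (6250.0, 4160.0)]
patches, origins = extract_patches(image_hr, kps)
```

The patches are NumPy views into the original array, so extraction itself does not copy the 6.2K image.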

7.2 Mandatory Deployment: NVIDIA TensorRT Acceleration

The deep learning models (SuperPoint, LightGlue, NetVLAD) will be too slow in their native PyTorch framework to meet AC-7 on an RTX 2060.

This is not an "optional" optimization; it is a mandatory deployment step. The key neural networks must be converted from PyTorch into a highly-optimized NVIDIA TensorRT engine.

Research specifically on accelerating LightGlue with TensorRT shows "2x-4x speed gains over compiled PyTorch".48 Other benchmarks confirm TensorRT provides 30-70% speedups for deep learning inference.52 This conversion (which applies layer fusion, graph optimization, and FP16/INT8 precision) is what makes achieving the <5s (AC-7) performance possible on the specified RTX 2060 hardware.

8.0 System Robustness: Failure Mode Escalation Logic

This logic defines the system's behavior during real-world failures, ensuring it meets criteria AC-3, AC-4, AC-6, and AC-9, and is built upon the new "Atlas" multi-map architecture.

8.1 Stage 1: Normal Operation (Tracking)

  • Condition: V-SLAM front-end (Section 4.0) is healthy.
  • Logic:
    1. V-SLAM successfully tracks Image_N_LR against its active map fragment.
    2. A new Relative_Unscaled_Pose is sent to the TOH (Section 6.0).
    3. TOH sends Pose_N_Est (unscaled) to the user (AC-7, AC-8 met).
    4. If Image_N is selected as a keyframe, the GAB (Section 5.0) is queued to find an anchor for it, which will trigger a Pose_N_Refined update later.

8.2 Stage 2: Transient VO Failure (Outlier Rejection)

  • Condition: Image_N is unusable (e.g., severe blur, sun-glare, or the 350m outlier from AC-3).
  • Logic (Frame Skipping):
    1. V-SLAM front-end fails to track Image_N_LR against the active map.
    2. The system discards Image_N (marking it as a rejected outlier, AC-5).
    3. When Image_N+1 arrives, the V-SLAM front-end attempts to track it against the same local keyframe map (from Image_N-1).
    4. If successful: Tracking resumes. Image_N is officially an outlier. The system "correctly continues the work" (AC-3 met).
    5. If fails: The system repeats for Image_N+2, N+3. If this fails for ~5 consecutive frames, it escalates to Stage 3.

8.3 Stage 3: Persistent VO Failure (New Map Initialization)

  • Condition: Tracking is lost for multiple frames. This is the "sharp turn" (AC-4) or "low overlap" (AC-4) scenario.
  • Logic (Atlas Multi-Map):
    1. The V-SLAM front-end (Section 4.0) declares "Tracking Lost."
    2. It marks the current Map_Fragment_k as "inactive".13
    3. It immediately initializes a new Map_Fragment_k+1 using the current frame (Image_N+5).
    4. Tracking resumes instantly on this new, unscaled, un-anchored map fragment.
    5. This "registering" of a new map ensures the system "correctly continues the work" (AC-4 met) and maintains the >95% registration rate (AC-9) by not counting this as a failure.
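
The escalation from Stage 1 through Stage 3 can be summarized as a small state machine. The sketch below is a simplified illustration: the threshold of 5 frames follows the "~5 consecutive frames" figure above, while the class name, function name, and stage labels are hypothetical.

```python
from dataclasses import dataclass

MAX_SKIPPED_FRAMES = 5  # from "~5 consecutive frames"; exact value is an assumption

@dataclass
class TrackerState:
    fragment_id: int = 0
    consecutive_failures: int = 0

def on_frame(state, tracked_ok):
    """Update the state for one frame and return the stage that handled it."""
    if tracked_ok:
        state.consecutive_failures = 0
        return "stage1_tracking"
    state.consecutive_failures += 1
    if state.consecutive_failures < MAX_SKIPPED_FRAMES:
        # Transient failure: discard the frame and keep the active map.
        return "stage2_discard_outlier"
    # Persistent failure: retire the active fragment and start a new one.
    state.fragment_id += 1
    state.consecutive_failures = 0
    return "stage3_new_map"

state = TrackerState()
# Simulated tracking results: one isolated outlier, then a sustained loss.
results = [True, False, True, False, False, False, False, False, True]
log = [on_frame(state, ok) for ok in results]
```

In this run the isolated failure is absorbed by Stage 2, while the fifth consecutive failure triggers a new map fragment.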

8.4 Stage 4: Map-Merging & Global Relocalization (GAB-Assisted)

  • Condition: The system is now tracking on Map_Fragment_k+1, while Map_Fragment_k is inactive. The TOH pose-graph (Section 6.0) is disconnected.
  • Logic (Geodetic Merging):
    1. The TOH queues the GAB (Section 5.0) to find anchors for both map fragments.
    2. The GAB finds anchors for keyframes in both fragments.
    3. The TOH (Section 6.2) receives these metric anchors, adds them to the graph, and the Ceres optimizer 20 finds the global 7-DoF pose for both fragments, merging them into a single, metrically-consistent trajectory.

8.5 Stage 5: Catastrophic Failure (User Intervention)

  • Condition: The system is in Stage 3 (Lost), and the GAB (Section 5.0) has also failed to find any global anchors for a new Map_Fragment_k+1 for a prolonged period (e.g., 20% of the route). This is the "absolutely incapable" scenario (AC-6), e.g., flying over a large, featureless body of water or through dense, uniform fog.
  • Logic:
    1. The system has an unscaled, un-anchored map fragment (Map_Fragment_k+1) and zero idea where it is in the world.
    2. The TOH triggers the AC-6 flag.
  • Resolution (User-Aided Prior):
    1. The UI prompts the user: "Tracking lost. Please provide a coarse location for the current image."
    2. The user clicks one point on a map.
    3. This [Lat, Lon] is not taken as ground truth. It is fed to the GAB (Section 5.0) as a strong spatial prior for its local database query (Section 5.2).
    4. This narrows the GAB's Stage 1 search area from "the entire AOI" to "a 5km radius around the user's click." This makes it far more likely that the GAB will find the correct satellite tile and a high-confidence Absolute_Metric_Anchor, allowing the TOH (Stage 4) to re-scale 29 and geodetically-merge 20 this lost fragment, re-localizing the entire trajectory.

9.0 High-Accuracy Output Generation and Validation Strategy

This section details how the final user-facing outputs are generated, specifically replacing the flawed "Ray-DEM" method (see 1.4) with a high-accuracy "Ray-Cloud" method to meet the 20m accuracy (AC-2).

9.1 High-Accuracy Object Geolocalization via Ray-Cloud Intersection

As established in 1.4, using an external 30m DEM 21 for object localization introduces uncontrollable errors (4m or more 22) that make meeting the 20m (AC-2) accuracy goal impossible. The system must use its own, internally-generated 3D map, which is locally far more accurate.25

  • Inputs:
    1. User clicks pixel coordinate (u,v) on Image_N.
    2. The system retrieves the final, refined, metric 7-DoF Sim(3) pose P_{sim(3)} = (s, R, T) for the map fragment that Image_N belongs to. This transform P_{sim(3)} maps the local V-SLAM coordinate system to the global metric coordinate system.
    3. The system retrieves the local, unscaled V-SLAM 3D point cloud (P_{local_cloud}) generated by the Front-End (Section 4.3).
    4. The known camera intrinsic matrix K.
  • Algorithm (Ray-Cloud Intersection):
    1. Un-project Pixel: The 2D pixel (u,v) is un-projected into a 3D ray direction vector d_{cam} in the camera's local coordinate system: $d_{cam} = K^{-1} \cdot [u, v, 1]^T$.
    2. Transform Ray (Local): This ray is transformed using the local V-SLAM pose of Image_N to get a ray in the local map fragment's coordinate system.
    3. Intersect (Local): The system performs a numerical ray-mesh intersection (or nearest-neighbor search) to find the 3D point P_{local} where this local ray intersects the local V-SLAM point cloud (P_{local_cloud}).25 This P_{local} is highly accurate relative to the V-SLAM map.26
    4. Transform (Global): This local 3D point P_{local} is now transformed to the global, metric coordinate system using the 7-DoF Sim(3) transform from the TOH: $P_{metric} = s \cdot (R \cdot P_{local}) + T$.
    5. Result: This 3D intersection point P_{metric} is the metric world coordinate of the object.
    6. Convert: This (X, Y, Z) world coordinate is converted to a [Latitude, Longitude, Altitude] GPS coordinate.55

This method correctly isolates the error. The object's accuracy is now only dependent on the V-SLAM's geometric fidelity (AC-10 MRE < 1.0px) and the GAB's global anchoring (AC-1, AC-2). It completely eliminates the external 30m DEM error 22 from this critical, high-accuracy calculation.
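
The Ray-Cloud steps can be sketched end to end in NumPy. This is a minimal illustration under simplifying assumptions: the camera sits at the origin of the local map with identity rotation, the "intersection" is approximated by the cloud point closest to the ray, and all numeric values (K, the cloud, and the Sim(3) parameters) are synthetic. The function names are illustrative.

```python
import numpy as np

def unproject(K, u, v):
    """Unit ray direction in the camera frame: d_cam = K^-1 [u, v, 1]^T."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return d / np.linalg.norm(d)

def nearest_cloud_point_on_ray(origin, direction, cloud):
    """Cloud point with the smallest perpendicular distance to the ray."""
    rel = cloud - origin
    along = rel @ direction                       # distance along the ray
    perp = np.linalg.norm(rel - np.outer(along, direction), axis=1)
    perp[along <= 0] = np.inf                     # ignore points behind the camera
    return cloud[np.argmin(perp)]

def to_metric(p_local, s, R, T):
    """Apply the fragment's Sim(3) transform: P_metric = s * (R @ P_local) + T."""
    return s * (R @ p_local) + T

K = np.array([[1200.0, 0.0, 768.0],
              [0.0, 1200.0, 512.0],
              [0.0, 0.0, 1.0]])
cloud = np.array([[0.1, -0.05, 2.0],
                  [1.0, 1.0, 3.0],
                  [-0.5, 0.2, 1.5]])              # synthetic local point cloud

# Pixel that observes cloud[0] from a camera at the local origin (R_wc = I).
u, v = 828.0, 482.0
R_wc, cam_center = np.eye(3), np.zeros(3)

d_local = R_wc @ unproject(K, u, v)
P_local = nearest_cloud_point_on_ray(cam_center, d_local, cloud)
P_metric = to_metric(P_local, s=40.0, R=np.eye(3), T=np.array([500.0, 200.0, 0.0]))
```

A production version would likely use a spatial index or a meshed surface instead of a brute-force scan over the cloud.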

9.2 Rigorous Validation Methodology

A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a Ground-Truth Test Harness (e.g., using the provided coordinates.csv data).

  • Test Harness:
    1. Ground-Truth Data: coordinates.csv provides ground-truth [Lat, Lon] for a set of images.
    2. Test Datasets:
      • Test_Baseline: The ground-truth images and coordinates.
      • Test_Outlier_350m (AC-3): Test_Baseline with a single, unrelated image inserted.
      • Test_Sharp_Turn_5pct (AC-4): A sequence where several frames are manually deleted to simulate <5% overlap.
      • Test_Long_Route (AC-9): A 1500-image sequence.
  • Test Cases:
    • Test_Accuracy (AC-1, AC-2, AC-5, AC-9):
      • Run: Execute ATLAS-GEOFUSE on Test_Baseline, providing the first image's coordinate as the Start Coordinate.
      • Script: A validation script will compute the Haversine distance error between the system's refined GPS output (Pose_N^{Refined}) for each image and the ground-truth GPS.
      • ASSERT (count(errors < 50m) / total_images) >= 0.80 (AC-1 Met)
      • ASSERT (count(errors < 20m) / total_images) >= 0.60 (AC-2 Met)
      • ASSERT (count(un-localized_images) / total_images) < 0.10 (AC-5 Met)
      • ASSERT (count(localized_images) / total_images) > 0.95 (AC-9 Met)
    • Test_MRE (AC-10):
      • Run: After Test_Baseline completes.
      • ASSERT TOH.final_Mean_Reprojection_Error < 1.0 (AC-10 Met)
    • Test_Performance (AC-7, AC-8):
      • Run: Execute on Test_Long_Route on the minimum-spec RTX 2060.
      • Log: Log timestamps for "Image In" -> "Initial Pose Out" (Pose_N^{Est}).
      • ASSERT average_time < 5.0s (AC-7 Met)
      • Log: Log the output stream.
      • ASSERT >80% of images receive two poses: an "Initial" and a "Refined" (AC-8 Met)
    • Test_Robustness (AC-3, AC-4, AC-6):
      • Run: Execute Test_Outlier_350m.
      • ASSERT System logs "Stage 2: Discarding Outlier" or "Stage 3: New Map" and the final trajectory error for the next frame is < 50m (AC-3 Met).
      • Run: Execute Test_Sharp_Turn_5pct.
      • ASSERT System logs "Stage 3: New Map Initialization" and "Stage 4: Geodetic Map-Merge," and the final trajectory is complete and accurate (AC-4 Met).
      • Run: Execute on a sequence with no GAB anchors possible for 20% of the route.
      • ASSERT System logs "Stage 5: User Intervention Requested" (AC-6 Met).
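
The accuracy assertions in Test_Accuracy could be computed by a script along these lines. The Haversine formula and the 50m/20m thresholds come from the test plan above. The coordinates are synthetic, the function names are illustrative, and reading coordinates.csv is omitted because its column layout is not specified here.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def accuracy_report(estimates, truth):
    """Fractions of images whose error is under the AC-1 and AC-2 thresholds."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimates, truth)]
    n = len(errors)
    return {
        "frac_under_50m": sum(err < 50.0 for err in errors) / n,
        "frac_under_20m": sum(err < 20.0 for err in errors) / n,
    }

# Synthetic example: four images with latitude errors of roughly 11 m, 11 m, 33 m, 111 m.
truth = [(48.0, 30.0)] * 4
estimates = [(48.0001, 30.0), (48.0001, 30.0), (48.0003, 30.0), (48.001, 30.0)]
report = accuracy_report(estimates, truth)
```

The real harness would then compare `frac_under_50m` against 0.80 (AC-1) and `frac_under_20m` against 0.60 (AC-2).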