##  **The ATLAS-GEOFUSE System Architecture**

Multi-component architecture designed for high-performance, real-time geolocalization in IMU-denied, high-drift environments. Its architecture is explicitly designed around **pre-flight data caching** and **multi-map robustness**.

### **2.1 Core Design Principles**

1. **Pre-Flight Caching:** To meet the <5s (AC-7) real-time requirement, all network latency must be eliminated. The system mandates a "Pre-Flight" step (Section 3.0) where all geospatial data (satellite tiles, DEMs, vector data) for the Area of Interest (AOI) is downloaded from a viable open-source provider (e.g., Copernicus 6) and stored in a local database on the processing laptop. All real-time queries are made against this local cache.  
2. **Decoupled Multi-Map SLAM:** The system separates *relative* motion from *absolute* scale. A Visual SLAM (V-SLAM) "Atlas" Front-End (Section 4.0) computes high-frequency, robust, but *unscaled* relative motion. A Local Geospatial Anchoring Back-End (GAB) (Section 5.0) provides sparse, high-confidence, *absolute metric* anchors by querying the local cache. A Trajectory Optimization Hub (TOH) (Section 6.0) fuses these two streams in a Sim(3) pose-graph to solve for the global 7-DoF trajectory (pose + scale).  
3. **Multi-Map Robustness (Atlas):** To solve the "sharp turn" (AC-4) and "tracking loss" (AC-6) requirements, the V-SLAM front-end is based on an "Atlas" architecture.14 Tracking loss initiates a *new, independent map fragment*.13 The TOH is responsible for anchoring and merging *all* fragments geodetically 19 into a single, globally-consistent trajectory.

### **2.2 Component Interaction and Data Flow**

* **Component 1: Pre-Flight Caching Module (PCM) (Offline)**  
  * *Input:* User-defined Area of Interest (AOI) (e.g., a KML polygon).  
  * *Action:* Queries Copernicus 6 and OpenStreetMap APIs. Downloads and builds a local geospatial database (GeoPackage/SpatiaLite) containing satellite tiles, DEM tiles, and road/river vectors for the AOI.  
  * *Output:* A single, self-contained **Local Geo-Database file**.  
* **Component 2: Image Ingestion & Pre-processing (Real-time)**  
  * *Input:* Image_N (up to 6.2K), Camera Intrinsics ($K$).  
  * *Action:* Creates two copies:  
    * **Image_N_LR** (Low-Resolution, e.g., 1536x1024): Dispatched *immediately* to the V-SLAM Front-End.  
    * **Image_N_HR** (High-Resolution, 6.2K): Stored for asynchronous use by the GAB.  
* **Component 3: V-SLAM "Atlas" Front-End (High-Frequency Thread)**  
  * *Input:* Image_N_LR.  
  * *Action:* Tracks Image_N_LR against its *active map fragment*. Manages keyframes, local bundle adjustment 38, and the co-visibility graph. If tracking is lost (e.g., AC-4 sharp turn), it initializes a *new map fragment* 14 and continues tracking.  
  * *Output:* **Relative_Unscaled_Pose** and **Local_Point_Cloud** data, sent to the TOH.  
* **Component 4: Local Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread)**  
  * *Input:* A keyframe (Image_N_HR) and its *unscaled* pose, triggered by the TOH.  
  * *Action:* Performs a visual-only, coarse-to-fine search 34 against the *Local Geo-Database*.  
  * *Output:* An **Absolute_Metric_Anchor** (a high-confidence [Lat, Lon, Alt] pose) for that keyframe, sent to the TOH.  
* **Component 5: Trajectory Optimization Hub (TOH) (Central Hub Thread)**  
  * *Input:* (1) High-frequency Relative_Unscaled_Pose stream. (2) Low-frequency Absolute_Metric_Anchor stream.  
  * *Action:* Manages the complete flight trajectory as a **Sim(3) pose graph** 39 using Ceres Solver.19 Continuously fuses all data.  
  * *Output 1 (Real-time):* **Pose_N_Est** (unscaled) sent to UI (meets AC-7, AC-8).  
  * *Output 2 (Refined):* **Pose_N_Refined** (metric-scale, globally-optimized) sent to UI (meets AC-1, AC-2, AC-8).

### **2.3 System Inputs**

1. **Image Sequence:** Consecutively named images (FullHD to 6252x4168).  
2. **Start Coordinate (Image 0):** A single, absolute GPS coordinate (Latitude, Longitude).  
3. **Camera Intrinsics ($K$):** Pre-calibrated camera intrinsic matrix.  
4. **Local Geo-Database File:** The single file generated by the Pre-Flight Caching Module (Section 3.0).

### **2.4 Streaming Outputs (Meets AC-7, AC-8)**

1. **Initial Pose ($Pose_N^{Est}$):** An *unscaled* pose estimate. This is sent immediately (<5s, AC-7) to the UI for real-time visualization of the UAV's *path shape*.  
2. **Refined Pose ($Pose_N^{Refined}$) [Asynchronous]:** A globally-optimized, *metric-scale* 7-DoF pose (X, Y, Z, Qx, Qy, Qz, Qw) and its corresponding [Lat, Lon, Alt] coordinate. This is sent to the user whenever the TOH re-converges (e.g., after a new GAB anchor or map-merge), updating all past poses (AC-1, AC-2, AC-8 refinement met).

## **3.0 Pre-Flight Component: The Geospatial Caching Module (PCM)**

This component is a new, mandatory, pre-flight utility that solves the fatal flaws (Section 1.1, 1.2) of the GEORTEX-R design. It eliminates all real-time network latency (AC-7) and all ToS violations (AC-5), ensuring the project is both performant and legally viable.

### **3.1 Defining the Area of Interest (AOI)**

The system is designed for long-range flights. Given 3000 photos at 100m intervals, the maximum linear track is 300km. The user must provide a coarse "bounding box" or polygon (e.g., KML/GeoJSON format) of the intended flight area. The PCM will automatically add a generous buffer (e.g., 20km) to this AOI to account for navigational drift and ensure all necessary reference data is captured.

### **3.2 Legal & Viable Data Sources (Copernicus & OpenStreetMap)**

As established in 1.1, the system *must* use open-data providers. The PCM is architected to use the following:

1. **Visual/Terrain Data (Primary):** The **Copernicus Data Space Ecosystem** 6 is the primary source. The PCM will use the Copernicus Processing and Catalogue APIs 6 to query, process, and download two key products for the buffered AOI:  
   * **Sentinel-2 Satellite Imagery:** High-resolution (10m) visual tiles.  
   * **Copernicus GLO-30 DEM:** A 30m-resolution Digital Elevation Model.7 This DEM is *not* used for high-accuracy object localization (see 1.4), but as a coarse altitude *prior* for the TOH and for the critical dynamic-warping step (Section 5.3).  
2. **Semantic Data (Secondary):** OpenStreetMap (OSM) data 40 for the AOI will be downloaded. This provides temporally-invariant vector data (roads, rivers, building footprints) which can be used as a secondary, optional verification layer for the GAB, especially in cases of extreme temporal divergence (e.g., new construction).42

### **3.3 Building the Local Geo-Database**

The PCM utility will process all downloaded data into a single, efficient, compressed file. A modern GeoPackage or SpatiaLite database is the ideal format. This database will contain the satellite tiles, DEM tiles, and vector features, all indexed by a common spatial grid (e.g., UTM).

This single file is then loaded by the main ATLAS-GEOFUSE application at runtime. The GAB's (Section 5.0) "API calls" are thus transformed from high-latency, unreliable HTTP requests 9 into high-speed, zero-latency local SQL queries, guaranteeing that data I/O is never the bottleneck for meeting the AC-7 performance requirement.

## **4.0 Core Component: The Multi-Map V-SLAM "Atlas" Front-End**

This component's sole task is to robustly and accurately compute the *unscaled* 6-DoF relative motion of the UAV and build a geometrically-consistent map of keyframes. It is explicitly designed to be more robust than simple frame-to-frame odometry and to handle catastrophic tracking loss (AC-4) gracefully.

### **4.1 Rationale: ORB-SLAM3 "Atlas" Architecture**

The system will implement a V-SLAM front-end based on the "Atlas" multi-map paradigm, as seen in SOTA systems like ORB-SLAM3.14 This is the industry-standard solution for robust, long-term navigation in environments where tracking loss is possible.13

The mechanism is as follows:

1. The system initializes and begins tracking on **Map_Fragment_0**, using the known start GPS as a metadata tag.  
2. It tracks all new frames (Image_N_LR) against this active map.  
3. **If tracking is lost** (e.g., a sharp turn (AC-4) or a persistent 350m outlier (AC-3)):  
   * The "Atlas" architecture does not fail. It declares Map_Fragment_0 "inactive," stores it, and *immediately initializes* **Map_Fragment_1** from the current frame.14  
   * Tracking *resumes instantly* on this new map fragment, ensuring the system "correctly continues the work" (AC-4).

This architecture converts the "sharp turn" failure case into a *standard operating procedure*. The system never "fails"; it simply fragments. The burden of stitching these fragments together is correctly moved from the V-SLAM front-end (which has no global context) to the TOH (Section 6.0), which *can* solve it using global-metric anchors.

### **4.2 Feature Matching Sub-System: SuperPoint + LightGlue**

The V-SLAM front-end's success depends entirely on high-quality feature matches, especially in the sparse, low-texture agricultural terrain seen in the user's images. The selected approach is **SuperPoint + LightGlue**.

* **SuperPoint:** A SOTA feature detector proven to find robust, repeatable keypoints in challenging, low-texture conditions.43  
* **LightGlue:** A highly optimized GNN-based matcher that is the successor to SuperGlue.44

The choice of LightGlue over SuperGlue is a deliberate performance optimization. LightGlue is *adaptive*.46 The user query states sharp turns (AC-4) are "rather an exception." This implies \~95% of image pairs are "easy" (high-overlap, straight flight) and 5% are "hard" (low-overlap, turns). LightGlue's adaptive-depth GNN exits early on "easy" pairs, returning a high-confidence match in a fraction of the time. This saves *enormous* computational budget on the 95% of normal frames, ensuring the system *always* meets the <5s budget (AC-7) and reserving that compute for the GAB and TOH. This component will run on **Image_N_LR** (low-res) to guarantee performance, and will be accelerated via TensorRT (Section 7.0).

### **4.3 Keyframe Management and Local 3D Cloud**

The front-end will maintain a co-visibility graph of keyframes for its *active map fragment*. It will perform local Bundle Adjustment 38 continuously over a sliding window of recent keyframes to minimize drift *within* that fragment.

Crucially, it will triangulate features to create a **local, high-density 3D point cloud** for its map fragment.28 This point cloud is essential for two reasons:

1. It provides robust tracking (tracking against a 3D map, not just a 2D frame).  
2. It serves as the **high-accuracy source** for the object localization output (Section 9.1), as established in 1.4, allowing the system to bypass the high-error external DEM.

#### **Table 1: Analysis of State-of-the-Art Feature Matchers (For V-SLAM Front-End)**

| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
| :---- | :---- | :---- | :---- | :---- |
| **SuperPoint + SuperGlue** | - SOTA robustness in low-texture, high-blur conditions. - GNN reasons about 3D scene context. - Proven in real-time SLAM systems. | - Computationally heavy (fixed-depth GNN). - Slower than LightGlue. | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT. | **Good.** A solid, baseline choice. Meets robustness needs but will heavily tax the <5s time budget (AC-7). |
| **SuperPoint + LightGlue** 44 | - **Adaptive Depth:** Faster on "easy" pairs, more accurate on "hard" pairs.46 - **Faster & Lighter:** Outperforms SuperGlue on speed and accuracy. - SOTA "in practice" choice for large-scale matching. | - Newer, but rapidly being adopted and proven.48 | - NVIDIA GPU (RTX 2060+). - PyTorch or TensorRT. | **Excellent (Selected).** The adaptive nature is *perfect* for this problem. It saves compute on the 95% of easy (straight) frames, maximizing our ability to meet AC-7. |

## **5.0 Core Component: The Local Geospatial Anchoring Back-End (GAB)**

This asynchronous component is the system's "anchor to reality." Its sole purpose is to find a high-confidence, *absolute-metric* pose for a given V-SLAM keyframe by matching it against the **local, pre-cached geo-database** (from Section 3.0). This component is a full replacement for the high-risk, high-latency GAB from the GEORTEX-R draft (see 1.2, 1.5).

### **5.1 Rationale: Local-First Query vs. On-Demand API**

As established in 1.2, all queries are made to the local SSD. This guarantees zero-latency I/O, which is a hard requirement for a real-time system, as external network latency is unacceptably high and variable.9 The GAB itself runs asynchronously and can take longer than 5s (e.g., 10-15s), but it must not be *blocked* by network I/O, which would stall the entire processing pipeline.

### **5.2 SOTA Visual-Only Coarse-to-Fine Localization**

This component implements a state-of-the-art, two-stage *visual-only* pipeline, which is lower-risk and more performant (see 1.5) than the GEORTEX-R's semantic-hybrid model. This approach is well-supported by SOTA research in aerial localization.34

1. **Stage 1 (Coarse): Global Descriptor Retrieval.**  
   * *Action:* When the TOH requests an anchor for Keyframe_k, the GAB first computes a *global descriptor* (a compact vector representation) for the *nadir-warped* (see 5.3) low-resolution Image_k_LR.  
   * *Technology:* A SOTA Visual Place Recognition (VPR) model like **SALAD** 49, **TransVLAD** 50, or **NetVLAD** 33 will be used. These are designed for this "image retrieval" task.45  
   * *Result:* This descriptor is used to perform a fast FAISS/vector search against the descriptors of the *local satellite tiles* (which were pre-computed and stored in the Geo-Database). This returns the Top-K (e.g., K=5) most likely satellite tiles in milliseconds.  
2. **Stage 2 (Fine): Local Feature Matching.**  
   * *Action:* The system runs **SuperPoint+LightGlue** 43 to find pixel-level correspondences.  
   * *Performance:* This is *not* run on the *full* UAV image against the *full* satellite map. It is run *only* between high-resolution patches (from **Image_k_HR**) and the **Top-K satellite tiles** identified in Stage 1.  
   * *Result:* This produces a set of 2D-2D (image-to-map) feature matches. A PnP/RANSAC solver then computes a high-confidence 6-DoF pose. This pose is the **Absolute_Metric_Anchor** that is sent to the TOH.

### **5.3 Solving the Viewpoint Gap: Dynamic Feature Warping**

The GAB must solve the "viewpoint gap" 33: the UAV image is oblique (due to roll/pitch), while the satellite tiles are nadir (top-down).

The GEORTEX-R draft proposed a complex, high-risk deep learning solution. The ATLAS-GEOFUSE solution is far more elegant and requires zero R\&D:

1. The V-SLAM Front-End (Section 4.0) already *knows* the camera's *relative* 6-DoF pose, including its **roll and pitch** orientation relative to the *local map's ground plane*.  
2. The *Local Geo-Database* (Section 3.0) contains a 30m-resolution DEM for the AOI.  
3. When the GAB processes Keyframe_k, it *first* performs a **dynamic homography warp**. It projects the V-SLAM ground plane onto the coarse DEM, and then uses the known camera roll/pitch to calculate the perspective transform (homography) needed to *un-distort* the oblique UAV image into a synthetic *nadir-view*.

This *nadir-warped* UAV image is then used in the Coarse-to-Fine pipeline (5.2). It will now match the *nadir* satellite tiles with extremely high-fidelity. This method *eliminates* the viewpoint gap *without* training any new neural networks, leveraging the inherent synergy between the V-SLAM component and the GAB's pre-cached DEM.

## **6.0 Core Component: The Multi-Map Trajectory Optimization Hub (TOH)**

This component is the system's central "brain." It runs continuously, fusing all measurements (high-frequency/unscaled V-SLAM, low-frequency/metric-scale GAB anchors) from *all map fragments* into a single, globally consistent trajectory.

### **6.1 Incremental Sim(3) Pose-Graph Optimization**

The central challenge of monocular, IMU-denied SLAM is scale-drift. The V-SLAM front-end produces *unscaled* 6-DoF ($SE(3)$) relative poses.37 The GAB produces *metric-scale* 6-DoF ($SE(3)$) *absolute* poses. These cannot be directly combined.

The solution is that the graph *must* be optimized in **Sim(3) (7-DoF)**.39 This adds a *single global scale factor $s$* as an optimizable variable to each V-SLAM map fragment. The TOH will maintain a pose-graph using **Ceres Solver** 19, a SOTA optimization library.

The graph is constructed as follows:

1. **Nodes:** Each keyframe pose (7-DoF: $X, Y, Z, Qx, Qy, Qz, s$).  
2. **Edge 1 (V-SLAM):** A relative pose constraint between Keyframe_i and Keyframe_j *within the same map fragment*. The error is computed in Sim(3).29  
3. **Edge 2 (GAB):** An *absolute* pose constraint on Keyframe_k. This constraint *fixes* Keyframe_k's pose to the *metric* GPS coordinate from the GAB anchor and *fixes its scale $s$ to 1.0*.

The GAB's $s=1.0$ anchor creates "tension" in the graph. The Ceres optimizer 20 resolves this tension by finding the *one* global scale $s$ for all *other* V-SLAM nodes in that fragment that minimizes the total error. This effectively "stretches" or "shrinks" the entire unscaled V-SLAM fragment to fit the metric anchors, which is the core of monocular SLAM scale-drift correction.29

### **6.2 Geodetic Map-Merging via Absolute Anchors**

This is the robust solution to the "sharp turn" (AC-4) problem, replacing the flawed "relocalization" model from the original draft.

* **Scenario:** The UAV makes a sharp turn (AC-4). The V-SLAM front-end *loses tracking* on Map_Fragment_0 and *creates* Map_Fragment_1 (per Section 4.1). The TOH's pose graph now contains *two disconnected components*.  
* **Mechanism (Geodetic Merging):**  
  1. The GAB (Section 5.0) is *queued* to find anchors for keyframes in *both* fragments.  
  2. The GAB returns Anchor_A for Keyframe_10 (in Map_Fragment_0) with GPS [Lat_A, Lon_A].  
  3. The GAB returns Anchor_B for Keyframe_50 (in Map_Fragment_1) with GPS ``.  
  4. The TOH adds *both* of these as absolute, metric constraints (Edge 2) to the global pose-graph.  
* The graph optimizer 20 now has all the information it needs. It will solve for the 7-DoF pose of *both fragments*, placing them in their correct, globally-consistent metric positions. The two fragments are *merged geodetically* (i.e., by their global coordinates) even if they *never* visually overlap. This is a vastly more robust and modern solution than simple visual loop closure.19

### **6.3 Automatic Outlier Rejection (AC-3, AC-5)**

The system must be robust to 350m outliers (AC-3) and <10% bad GAB matches (AC-5). A standard least-squares optimizer (like Ceres 20) would be catastrophically corrupted by a 350m error.

This is a solved problem in modern graph optimization.19 The solution is to wrap *all* constraints (V-SLAM and GAB) in a **Robust Loss Function (e.g., HuberLoss, CauchyLoss)** within Ceres Solver.

A robust loss function mathematically *down-weights* the influence of constraints with large errors (high residuals). When the TOH "sees" the 350m error from a V-SLAM relative pose (AC-3) or a bad GAB anchor (AC-5), the robust loss function effectively acknowledges the measurement but *refuses* to pull the entire 3000-image trajectory to fit this one "insane" data point. It automatically and gracefully *ignores* the outlier, optimizing the 99.9% of "sane" measurements, thus meeting AC-3 and AC-5.

### **Table 2: Analysis of Trajectory Optimization Strategies**

| Approach (Tools/Library) | Advantages | Limitations | Requirements | Fitness for Problem Component |
| :---- | :---- | :---- | :---- | :---- |
| **Incremental SLAM (Pose-Graph Optimization)** (Ceres Solver 19, g2o, GTSAM) | - **Real-time / Online:** Provides immediate pose estimates (AC-7). - **Supports Refinement:** Explicitly designed to refine past poses when new "loop closure" (GAB) data arrives (AC-8). - **Robust:** Can handle outliers via robust kernels.19 | - Initial estimate is *unscaled* until a GAB anchor arrives. - Can drift *if* not anchored. | - A graph optimization library (Ceres). - A robust cost function (Huber). | **Excellent (Selected).** This is the *only* architecture that satisfies all user requirements for real-time streaming (AC-7) and asynchronous refinement (AC-8). |
| **Batch Structure from Motion (Global Bundle Adjustment)** (COLMAP, Agisoft Metashape) | - **Globally Optimal Accuracy:** Produces the most accurate possible 3D reconstruction. | - **Offline:** Cannot run in real-time or stream results. - High computational cost (minutes to hours). - Fails AC-7 and AC-8 completely. | - All images must be available before processing starts. - High RAM and CPU. | **Good (as an *Optional* Post-Processing Step).** Unsuitable as the primary online system, but could be offered as an optional, high-accuracy "Finalize Trajectory" batch process. |

## **7.0 High-Performance Compute & Deployment**

The system must run on an RTX 2060 (AC-7) while processing 6.2K images. These are opposing constraints that require a deliberate compute strategy to balance speed and accuracy.

### **7.1 Multi-Scale, Coarse-to-Fine Processing Pipeline**

The system must balance the conflicting demands of real-time speed (AC-7) and high accuracy (AC-2). This is achieved by running different components at different resolutions.

* **V-SLAM Front-End (Real-time, <5s):** This component (Section 4.0) runs *only* on the **Image_N_LR** (e.g., 1536x1024) copy. This is fast enough to meet the AC-7 budget.46  
* **GAB (Asynchronous, High-Accuracy):** This component (Section 5.0) uses the full-resolution **Image_N_HR** *selectively* to meet the 20m accuracy (AC-2).  
  1. Stage 1 (Coarse) runs on the low-res, nadir-warped image.  
  2. Stage 2 (Fine) runs SuperPoint on the *full 6.2K* image to find the *most confident* keypoints. It then extracts small, 256x256 *patches* from the *full-resolution* image, centered on these keypoints.  
  3. It matches *these small, full-resolution patches* against the high-res satellite tile.

This hybrid, multi-scale method provides the fine-grained matching accuracy of the 6.2K image (needed for AC-2) without the catastrophic CUDA Out-of-Memory errors (an RTX 2060 has only 6GB VRAM 30) or performance penalties that full-resolution processing would entail.

### **7.2 Mandatory Deployment: NVIDIA TensorRT Acceleration**

The deep learning models (SuperPoint, LightGlue, NetVLAD) will be too slow in their native PyTorch framework to meet AC-7 on an RTX 2060.

This is not an "optional" optimization; it is a *mandatory* deployment step. The key neural networks *must* be converted from PyTorch into a highly-optimized **NVIDIA TensorRT engine**.

Research *specifically* on accelerating LightGlue with TensorRT shows **"2x-4x speed gains over compiled PyTorch"**.48 Other benchmarks confirm TensorRT provides 30-70% speedups for deep learning inference.52 This conversion (which applies layer fusion, graph optimization, and FP16/INT8 precision) is what makes achieving the <5s (AC-7) performance *possible* on the specified RTX 2060 hardware.

## **8.0 System Robustness: Failure Mode Escalation Logic**

This logic defines the system's behavior during real-world failures, ensuring it meets criteria AC-3, AC-4, AC-6, and AC-9, and is built upon the new "Atlas" multi-map architecture.

### **8.1 Stage 1: Normal Operation (Tracking)**

* **Condition:** V-SLAM front-end (Section 4.0) is healthy.  
* **Logic:**  
  1. V-SLAM successfully tracks Image_N_LR against its *active map fragment*.  
  2. A new **Relative_Unscaled_Pose** is sent to the TOH (Section 6.0).  
  3. TOH sends **Pose_N_Est** (unscaled) to the user (AC-7, AC-8 met).  
  4. If Image_N is selected as a keyframe, the GAB (Section 5.0) is *queued* to find an anchor for it, which will trigger a **Pose_N_Refined** update later.

### **8.2 Stage 2: Transient VO Failure (Outlier Rejection)**

* **Condition:** Image_N is unusable (e.g., severe blur, sun-glare, or the 350m outlier from AC-3).  
* **Logic (Frame Skipping):**  
  1. V-SLAM front-end fails to track Image_N_LR against the active map.  
  2. The system *discards* Image_N (marking it as a rejected outlier, AC-5).  
  3. When Image_N+1 arrives, the V-SLAM front-end attempts to track it against the *same* local keyframe map (from Image_N-1).  
  4. **If successful:** Tracking resumes. Image_N is officially an outlier. The system "correctly continues the work" (AC-3 met).  
  5. **If fails:** The system repeats for Image_N+2, N+3. If this fails for \~5 consecutive frames, it escalates to Stage 3.

### **8.3 Stage 3: Persistent VO Failure (New Map Initialization)**

* **Condition:** Tracking is lost for multiple frames. This is the **"sharp turn" (AC-4)** or "low overlap" (AC-4) scenario.  
* **Logic (Atlas Multi-Map):**  
  1. The V-SLAM front-end (Section 4.0) declares "Tracking Lost."  
  2. It marks the current Map_Fragment_k as "inactive".13  
  3. It *immediately* initializes a **new** Map_Fragment_k+1 using the current frame (Image_N+5).  
  4. **Tracking resumes instantly** on this new, unscaled, un-anchored map fragment.  
  5. This "registering" of a new map ensures the system "correctly continues the work" (AC-4 met) and maintains the >95% registration rate (AC-9) by not counting this as a failure.

### **8.4 Stage 4: Map-Merging & Global Relocalization (GAB-Assisted)**

* **Condition:** The system is now tracking on Map_Fragment_k+1, while Map_Fragment_k is inactive. The TOH pose-graph (Section 6.0) is disconnected.  
* **Logic (Geodetic Merging):**  
  1. The TOH queues the GAB (Section 5.0) to find anchors for *both* map fragments.  
  2. The GAB finds anchors for keyframes in *both* fragments.  
  3. The TOH (Section 6.2) receives these metric anchors, adds them to the graph, and the Ceres optimizer 20 *finds the global 7-DoF pose for both fragments*, merging them into a single, metrically-consistent trajectory.

### **8.5 Stage 5: Catastrophic Failure (User Intervention)**

* **Condition:** The system is in Stage 3 (Lost), *and* the GAB (Section 5.0) has *also* failed to find *any* global anchors for a new Map_Fragment_k+1 for a prolonged period (e.g., 20% of the route). This is the "absolutely incapable" scenario (AC-6), (e.g., flying over a large, featureless body of water or dense, uniform fog).  
* **Logic:**  
  1. The system has an *unscaled, un-anchored* map fragment (Map_Fragment_k+1) and *zero* idea where it is in the world.  
  2. The TOH triggers the AC-6 flag.  
* **Resolution (User-Aided Prior):**  
  1. The UI prompts the user: "Tracking lost. Please provide a coarse location for the *current* image."  
  2. The user clicks *one point* on a map.  
  3. This [Lat, Lon] is *not* taken as ground truth. It is fed to the **GAB (Section 5.0)** as a *strong spatial prior* for its *local database query* (Section 5.2).  
  4. This narrows the GAB's Stage 1 search area from "the entire AOI" to "a 5km radius around the user's click." This *guarantees* the GAB will find the correct satellite tile, find a high-confidence **Absolute_Metric_Anchor**, and allow the TOH (Stage 4) to re-scale 29 and geodetically-merge 20 this lost fragment, re-localizing the entire trajectory.

## **9.0 High-Accuracy Output Generation and Validation Strategy**

This section details how the final user-facing outputs are generated, specifically replacing the flawed "Ray-DEM" method (see 1.4) with a high-accuracy "Ray-Cloud" method to meet the 20m accuracy (AC-2).

### **9.1 High-Accuracy Object Geolocalization via Ray-Cloud Intersection**

As established in 1.4, using an external 30m DEM 21 for object localization introduces uncontrollable errors (up to 4m+22) that make meeting the 20m (AC-2) accuracy goal impossible. The system *must* use its *own*, internally-generated 3D map, which is locally far more accurate.25

* **Inputs:**  
  1. User clicks pixel coordinate $(u,v)$ on Image_N.  
  2. The system retrieves the **final, refined, metric 7-DoF Sim(3) pose** $P_{sim(3)} = (s, R, T)$ for the *map fragment* that Image_N belongs to. This transform $P_{sim(3)}$ maps the *local V-SLAM coordinate system* to the *global metric coordinate system*.  
  3. The system retrieves the *local, unscaled* **V-SLAM 3D point cloud** ($P_{local_cloud}$) generated by the Front-End (Section 4.3).  
  4. The known camera intrinsic matrix $K$.  
* **Algorithm (Ray-Cloud Intersection):**  
  1. **Un-project Pixel:** The 2D pixel $(u,v)$ is un-projected into a 3D ray *direction* vector $d_{cam}$ in the camera's local coordinate system: $d_{cam} = K^{-1} \\cdot [u, v, 1]^T$.  
  2. **Transform Ray (Local):** This ray is transformed using the *local V-SLAM pose* of Image_N to get a ray in the *local map fragment's* coordinate system.  
  3. **Intersect (Local):** The system performs a numerical *ray-mesh intersection* (or nearest-neighbor search) to find the 3D point $P_{local}$ where this local ray *intersects the local V-SLAM point cloud* ($P_{local_cloud}$).25 This $P_{local}$ is *highly accurate* relative to the V-SLAM map.26  
  4. **Transform (Global):** This local 3D point $P_{local}$ is now transformed to the global, metric coordinate system using the 7-DoF Sim(3) transform from the TOH: $P_{metric} = s \\cdot (R \\cdot P_{local}) + T$.  
  5. **Result:** This 3D intersection point $P_{metric}$ is the *metric* world coordinate of the object.  
  6. **Convert:** This $(X, Y, Z)$ world coordinate is converted to a [Latitude, Longitude, Altitude] GPS coordinate.55

This method correctly isolates the error. The object's accuracy is now *only* dependent on the V-SLAM's geometric fidelity (AC-10 MRE < 1.0px) and the GAB's global anchoring (AC-1, AC-2). It *completely eliminates* the external 30m DEM error 22 from this critical, high-accuracy calculation.

### **9.2 Rigorous Validation Methodology**

A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a **Ground-Truth Test Harness** (e.g., using the provided coordinates.csv data).

* **Test Harness:**  
  1. **Ground-Truth Data:** coordinates.csv provides ground-truth [Lat, Lon] for a set of images.  
  2. **Test Datasets:**  
     * Test_Baseline: The ground-truth images and coordinates.  
     * Test_Outlier_350m (AC-3): Test_Baseline with a single, unrelated image inserted.  
     * Test_Sharp_Turn_5pct (AC-4): A sequence where several frames are manually deleted to simulate <5% overlap.  
     * Test_Long_Route (AC-9): A 1500-image sequence.  
* **Test Cases:**  
  * **Test_Accuracy (AC-1, AC-2, AC-5, AC-9):**  
    * **Run:** Execute ATLAS-GEOFUSE on Test_Baseline, providing the first image's coordinate as the Start Coordinate.  
    * **Script:** A validation script will compute the Haversine distance error between the *system's refined GPS output* ($Pose_N^{Refined}$) for each image and the *ground-truth GPS*.  
    * **ASSERT** (count(errors < 50m) / total_images) >= 0.80 **(AC-1 Met)**  
    * **ASSERT** (count(errors < 20m) / total_images) >= 0.60 **(AC-2 Met)**  
    * **ASSERT** (count(un-localized_images) / total_images) < 0.10 **(AC-5 Met)**  
    * **ASSERT** (count(localized_images) / total_images) > 0.95 **(AC-9 Met)**  
  * **Test_MRE (AC-10):**  
    * **Run:** After Test_Baseline completes.  
    * **ASSERT** TOH.final_Mean_Reprojection_Error < 1.0 **(AC-10 Met)**  
  * **Test_Performance (AC-7, AC-8):**  
    * **Run:** Execute on Test_Long_Route on the minimum-spec RTX 2060.  
    * **Log:** Log timestamps for "Image In" -> "Initial Pose Out" ($Pose_N^{Est}$).  
    * **ASSERT** average_time < 5.0s **(AC-7 Met)**  
    * **Log:** Log the output stream.  
    * **ASSERT** >80% of images receive *two* poses: an "Initial" and a "Refined" **(AC-8 Met)**  
  * **Test_Robustness (AC-3, AC-4, AC-6):**  
    * **Run:** Execute Test_Outlier_350m.  
    * **ASSERT** System logs "Stage 2: Discarding Outlier" or "Stage 3: New Map" *and* the final trajectory error for the *next* frame is < 50m **(AC-3 Met)**.  
    * **Run:** Execute Test_Sharp_Turn_5pct.  
    * **ASSERT** System logs "Stage 3: New Map Initialization" and "Stage 4: Geodetic Map-Merge," and the final trajectory is complete and accurate **(AC-4 Met)**.  
    * **Run:** Execute on a sequence with no GAB anchors possible for 20% of the route.  
    * **ASSERT** System logs "Stage 5: User Intervention Requested" **(AC-6 Met)**.