mirror of
https://github.com/azaion/gps-denied-desktop.git
synced 2026-04-22 22:36:36 +00:00
bed8e6d52a
Separate tutorial.md for developers from commands for AI WIP
Read carefully about the problem:

We have a large set of images taken from a fixed-wing UAV using a camera with at least Full HD resolution. All photos within one flight share the same resolution, which can be as high as 6200×4100; other flights may be Full HD.

Photos are taken and named consecutively, within 100 meters of each other.

We know only the starting GPS coordinates. We need to determine the GPS coordinates of the center of each image, as well as the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos.
The system has the following restrictions and conditions:

- Photos are taken only by airplane-type (fixed-wing) UAVs.
- Photos are taken by a fixed, downward-pointing camera that is not auto-stabilized.
- The flying range is restricted to the eastern and southern parts of Ukraine (to the left of the Dnipro River).
- The image resolution ranges from Full HD to 6252×4168. Camera parameters are known: focal length, sensor width, resolution, and so on.
- Altitude is predefined and no more than 1 km. The height of the terrain can be neglected.
- There is NO data from an IMU.
- Flights are done mostly in sunny weather.
- We can use satellite providers, but we are currently limited to Google Maps, which may be outdated for some regions.
- The number of photos can be up to 3000, usually in the 500-1500 range.
- During the flight, the UAV can make sharp turns, so the next photo may be completely different from the previous one (no shared objects); this is the exception rather than the rule.
- Processing is done on a stationary computer or laptop with an NVIDIA GPU, at least an RTX 2060, preferably an RTX 3070. (For an on-UAV solution a Jetson Orin Nano would be used, but that is out of scope.)
The output of the system should satisfy the following acceptance criteria:

- The system should determine the GPS coordinates of the centers of 80% of the photos from the flight with an error of no more than 50 meters relative to the true GPS.

- The system should determine the GPS coordinates of the centers of 60% of the photos from the flight with an error of no more than 20 meters relative to the true GPS.

- The system should continue working correctly even in the presence of an outlier photo displaced by up to 350 meters between two consecutive pictures en route. This can happen due to tilt of the plane.

- The system should continue working correctly through sharp turns, where the next photo does not overlap at all, or overlaps by less than 5%. The next photo will be within 200 m of drift and at an angle of less than 70 degrees.

- The system should keep operating when the UAV makes a sharp turn and all subsequent photos share no common points with the previous route. In that situation the system should try to determine the location of the new piece of the route and connect it to the previous route. There may be more than two such separate chunks, so this strategy should be at the core of the system.

- If the system is absolutely incapable of determining the GPS of the next, second-next, and third-next images by any means (these make up 20% of the route), it should ask the user to specify the location of the next image.

- Processing should take less than 5 seconds per image.

- Results of image processing should appear to the user immediately, so the user does not have to wait for the whole route to complete before analyzing first results. The system may also refine already-calculated results and send the refined results to the user again.

- Image Registration Rate > 95%: the system can find enough matching features to confidently calculate the camera's 6-DoF pose (position and orientation) and "stitch" that image into the final trajectory.

- Mean Reprojection Error (MRE) < 1.0 pixels: the distance, in pixels, between the original pixel location of an object and its re-projected pixel location.

- The whole system should run as a background service. Interaction is done over ZeroMQ. The service should be up and running, awaiting the initial input message. On the input message, processing should start, and the system should deliver the first results to the client as soon as they are available.
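As a rough illustration of the background-service requirement, a minimal ZeroMQ skeleton might look like the following. The socket pattern (REP for the initial message, PUB for streaming results), the ports, the message schema, and the `process_flight` pipeline function are all assumptions, not part of the specification:

```python
import json

def encode_pose_result(image_id, lat, lon, refined=False):
    """Serialize one geolocation result as a ZeroMQ message body (JSON bytes)."""
    return json.dumps({
        "image_id": image_id,
        "lat": lat,
        "lon": lon,
        "refined": refined,  # False for an initial estimate, True for a refined one
    }).encode("utf-8")

def serve(ctrl_addr="tcp://*:5555", pub_addr="tcp://*:5556"):
    """Block until a start message arrives, then stream results as they appear."""
    import zmq  # pyzmq; imported here so the helper above stays dependency-free
    ctx = zmq.Context()
    ctrl = ctx.socket(zmq.REP)   # client sends the initial input message here
    ctrl.bind(ctrl_addr)
    pub = ctx.socket(zmq.PUB)    # results (initial and refined) stream out here
    pub.bind(pub_addr)

    start_msg = ctrl.recv_json()  # e.g. {"images_dir": ..., "start_gps": [lat, lon]}
    ctrl.send_json({"status": "processing_started"})

    # process_flight is a hypothetical generator yielding results incrementally.
    for image_id, (lat, lon) in process_flight(start_msg):
        pub.send_multipart([b"pose", encode_pose_result(image_id, lat, lon)])

# serve() would be called from the service entry point.
```

A PUB socket lets refined results for already-published images simply be re-sent with `refined=True`, matching the "refine and send again" requirement.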
Here is a solution draft:
# **ASTRAL System Architecture: A High-Fidelity Geopositioning Framework for IMU-Denied Aerial Operations**

## **2.0 The ASTRAL (Advanced Scale-Aware Trajectory-Refinement and Localization) System Architecture**

The ASTRAL architecture is a multi-map, loosely coupled system designed to solve the flaws identified in Section 1.0 and meet all 10 acceptance criteria.
### **2.1 Core Principles**

The ASTRAL architecture is built on three principles:

1. **Tiered Geospatial Database:** The system *cannot* rely on a single data source. It is architected around a *tiered* local database.
   * **Tier-1 (Baseline):** Google Maps data. This is used to meet the 50 m (AC-1) requirement and provide coarse geolocalization.
   * **Tier-2 (High-Accuracy):** A framework for ingesting *commercial, sub-meter* data (visual [4] and DEM [5]). This tier is *required* to meet the 20 m (AC-2) accuracy. The system will *run* on Tier-1 but *achieve* AC-2 only when "fueled" with Tier-2 data.
2. **Viewpoint-Invariant Anchoring:** The system *rejects* geometric warping. The GAB (Section 5.0) is built on SOTA Visual Place Recognition (VPR) models that are *inherently* invariant to the oblique-to-nadir viewpoint change, decoupling it from the V-SLAM's unstable orientation estimate.
3. **Continuously-Scaled Trajectory:** The system *rejects* the "single-scale-per-fragment" model. The TOH (Section 6.0) is a Sim(3) pose-graph optimizer [11] that models scale as a *per-keyframe optimizable parameter* [15]. This allows the trajectory to "stretch" and "shrink" elastically to absorb continuous monocular scale drift [12].
### **2.2 Component Interaction and Data Flow**

The system is multi-threaded and asynchronous, designed for real-time streaming (AC-7) and refinement (AC-8).
* **Component 1: Tiered GDB (Pre-Flight):**
  * *Input:* User-defined Area of Interest (AOI).
  * *Action:* Downloads and builds a local SpatiaLite/GeoPackage database.
  * *Output:* A single **Local-Geo-Database file** containing:
    * Tier-1 (Google Maps) imagery + GLO-30 DSM,
    * Tier-2 (commercial) satellite tiles + WorldDEM DTM elevation tiles,
    * a *pre-computed FAISS vector index* of global descriptors (e.g., SALAD [8]) for *all* satellite tiles (see 3.4).
* **Component 2: Image Ingestion (Real-Time):**
  * *Input:* Image_N (up to 6.2K), camera intrinsics ($K$).
  * *Action:* Creates Image_N_LR (low-res, e.g., 1536x1024) and Image_N_HR (high-res, 6.2K).
  * *Dispatch:* Image_N_LR -> V-SLAM; Image_N_HR -> GAB (for patches).
* **Component 3: "Atlas" V-SLAM Front-End (High-Frequency Thread):**
  * *Input:* Image_N_LR.
  * *Action:* Tracks Image_N_LR against the *active map fragment*. Manages keyframes and local BA. If tracking is lost (AC-4, AC-6), it *initializes a new map fragment*.
  * *Output:* Relative_Unscaled_Pose, Local_Point_Cloud, and Map_Fragment_ID -> TOH.
* **Component 4: VPR Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread):**
  * *Input:* A keyframe (Image_N_LR, Image_N_HR) and its Map_Fragment_ID.
  * *Action:* Performs SOTA two-stage VPR (Section 5.0) against the **Local-Geo-Database file**.
  * *Output:* Absolute_Metric_Anchor ([Lat, Lon, Alt] pose) and its Map_Fragment_ID -> TOH.
* **Component 5: Scale-Aware Trajectory Optimization Hub (TOH) (Central Hub Thread):**
  * *Input 1:* High-frequency Relative_Unscaled_Pose stream.
  * *Input 2:* Low-frequency Absolute_Metric_Anchor stream.
  * *Action:* Manages the *global Sim(3) pose-graph* [13] with *per-keyframe scale* [15].
  * *Output 1 (Real-Time):* Pose_N_Est (unscaled) -> UI (meets AC-7).
  * *Output 2 (Refined):* Pose_N_Refined (metric-scale) -> UI (meets AC-1, AC-2, AC-8).
### **2.3 System Inputs**

1. **Image Sequence:** Consecutively named images (Full HD to 6252x4168).
2. **Start Coordinate (Image 0):** A single, absolute GPS coordinate [Lat, Lon].
3. **Camera Intrinsics (K):** Pre-calibrated camera intrinsic matrix.
4. **Local-Geo-Database File:** The single file generated by Component 1.
### **2.4 Streaming Outputs (Meets AC-7, AC-8)**

1. **Initial Pose (Pose_N^{Est}):** An *unscaled* pose. This is the raw output from the V-SLAM Front-End, transformed by the *current best estimate* of the trajectory. It is sent immediately (<5 s, AC-7) to the UI for real-time visualization of the UAV's *path shape*.
2. **Refined Pose (Pose_N^{Refined}) [Asynchronous]:** A globally optimized, *metric-scale* 7-DoF pose. It is sent to the user *whenever the TOH re-converges* (e.g., after a new GAB anchor or a map merge). This *rewrites* the history of poses (e.g., Pose_{N-100} through Pose_N), meeting the refinement (AC-8) and accuracy (AC-1, AC-2) requirements.
## **3.0 Component 1: The Tiered Pre-Flight Geospatial Database (GDB)**

This component implements the "Tiered Geospatial Database" principle. It is a mandatory pre-flight utility that solves both the *legal* problem (Flaw 1.4) and the *accuracy* problem (Flaw 1.1).
### **3.2 Tier-1 (Baseline): Google Maps and GLO-30 DEM**

This tier provides the baseline capability and satisfies AC-1.

* **Visual Data:** Google Maps (coarse Maxar).
  * *Resolution:* 10 m.
  * *Geodetic Accuracy:* ~1 m to 20 m.
  * *Purpose:* Meets AC-1 (80% < 50 m error). Provides a robust baseline for coarse geolocalization.
* **Elevation Data:** Copernicus GLO-30 DEM.
  * *Resolution:* 30 m.
  * *Type:* DSM (Digital Surface Model) [2]. This is a *weakness*, as it includes buildings and trees.
  * *Purpose:* Provides a coarse altitude prior for the TOH and the initial GAB search.
### **3.3 Tier-2 (High-Accuracy): Ingestion Framework for Commercial Data**

This is the *procurement and integration framework* required to meet AC-2.

* **Visual Data:** Commercial providers, e.g., Maxar (30-50 cm) or Satellogic (70 cm).
  * *Resolution:* < 1 m.
  * *Geodetic Accuracy:* typically < 5 m.
  * *Purpose:* Provides the high-resolution, high-accuracy reference needed for the GAB to achieve a sub-20 m total error.
* **Elevation Data:** Commercial providers, e.g., WorldDEM Neo [5] or Elevation10 [32].
  * *Resolution:* 5-12 m.
  * *Vertical Accuracy:* < 4 m [32].
  * *Type:* DTM (Digital Terrain Model) [32].

The use of a DTM (bare-earth) in Tier-2 is a critical advantage over the Tier-1 DSM (surface). The V-SLAM Front-End (Section 4.0) triangulates a 3D point cloud of what it *sees*: the *ground* in fields or the *tree-tops* in forests. The Tier-1 GLO-30 DSM [2] represents the *top* of the canopy and buildings. If the V-SLAM maps the *ground* (e.g., altitude 100 m) and the GAB tries to anchor it to a DSM *prior* that shows a forest (e.g., altitude 120 m), the 20 m altitude discrepancy will introduce significant error into the TOH. The Tier-2 DTM (bare-earth) [5] provides a *vastly* superior altitude anchor, as it represents the same ground plane the V-SLAM is tracking, significantly improving the entire 7-DoF pose solution.
### **3.4 Local Database Generation: Pre-Computing Global Descriptors**

This is the key performance optimization for the GAB. During the pre-flight caching step, the GDB utility does not just *store* tiles; it *processes* them.

For *every* satellite tile (e.g., 256x256 m) in the AOI, the utility loads the tile into the VPR model (e.g., SALAD [8]), computes its global descriptor (a compact feature vector), and stores this vector in a high-speed vector index (e.g., FAISS).

This step moves ~99% of the GAB's "Stage 1" (Coarse Retrieval) workload into an offline, pre-flight step. The *real-time* GAB query (Section 5.2) is then reduced to: (1) compute *one* descriptor for the UAV image, and (2) perform a very fast k-nearest-neighbor search on the pre-computed index. This is what makes a SOTA deep-learning GAB [6] fast enough to support the real-time refinement loop.
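A rough sketch of this build-then-query flow, with a brute-force NumPy cosine search standing in for FAISS and random vectors standing in for real SALAD descriptors:

```python
import numpy as np

def build_descriptor_index(tile_descriptors):
    """Pre-flight: L2-normalize and stack tile descriptors into one matrix.
    With FAISS this would be an IndexFlatIP plus index.add(...)."""
    index = np.asarray(tile_descriptors, dtype=np.float32)
    return index / np.linalg.norm(index, axis=1, keepdims=True)

def query_top_k(index, query_descriptor, k=5):
    """Real-time: cosine-similarity k-NN over the pre-computed index."""
    q = np.asarray(query_descriptor, dtype=np.float32)
    q = q / np.linalg.norm(q)
    similarities = index @ q
    top_k = np.argsort(-similarities)[:k]  # indices of the k best tiles
    return top_k, similarities[top_k]

# Toy usage: 1000 tiles with 64-D descriptors; tile 42 is the true match.
rng = np.random.default_rng(0)
tiles = rng.normal(size=(1000, 64))
index = build_descriptor_index(tiles)
uav_descriptor = tiles[42] + 0.05 * rng.normal(size=64)  # noisy view of tile 42
best_tiles, scores = query_top_k(index, uav_descriptor, k=5)
```

In production the brute-force matrix product would be replaced by the FAISS index built offline, but the query shape (one descriptor in, Top-K tile IDs out) is the same.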
#### **Table 1: Geospatial Reference Data Analysis (Decision Matrix)**

| Data Product | Data Type | Resolution | Geodetic Accuracy (Horiz.) | Elevation Model | Cost | AC-2 (20m) Compliant? |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Google Maps | Visual | 1 m | 1 m - 10 m | N/A | Free | **Depends on the location** |
| Copernicus GLO-30 | Elevation | 30 m | ~10-30 m | **DSM** (Surface) | Free | **No (fails error budget)** |
| **Tier-2: Maxar/Satellogic** | Visual | 0.3-0.7 m | < 5 m (est.) | N/A | Commercial | **Yes** |
| **Tier-2: WorldDEM Neo** | Elevation | 5 m | < 4 m | **DTM** (Bare-Earth) | Commercial | **Yes** |
## **4.0 Component 3: The "Atlas" Relative Motion Front-End**

This component's sole task is to robustly compute *unscaled* 6-DoF relative motion and handle tracking failures (AC-3, AC-4).

### **4.1 Feature Matching Sub-System: SuperPoint + LightGlue**

The system will use **SuperPoint** for feature detection and **LightGlue** for matching. This choice is driven by the project's specific constraints:

* **Rationale (Robustness):** The UAV flies over the "eastern and southern parts of Ukraine," which include large, low-texture agricultural areas. SuperPoint is a SOTA deep-learning detector renowned for its robustness and repeatability in these challenging, low-texture environments.
* **Rationale (Performance):** The RTX 2060 (AC-7) is a *hard* constraint with only 6 GB of VRAM [34]; performance is paramount. LightGlue is a SOTA matcher that provides a 4-10x speedup over its predecessor, SuperGlue. Its "adaptive" nature is a key optimization: it exits early on "easy" pairs (high overlap, straight flight) and spends more compute only on "hard" pairs (turns). This saves critical GPU budget on the ~95% of normal frames, ensuring the <5 s (AC-7) budget is met.

This sub-system runs on the Image_N_LR (low-res) copy to guarantee it fits in VRAM and meets the real-time budget.
#### **Table 2: Analysis of State-of-the-Art Feature Matchers (V-SLAM Front-End)**

| Approach (Tools/Library) | Robustness (Low-Texture) | Speed (RTX 2060) | Fitness for Problem |
| :---- | :---- | :---- | :---- |
| ORB [33] (e.g., ORB-SLAM3) | Poor. Fails on low texture. | Excellent (CPU/GPU) | **Poor.** Fails robustness in the target environment. |
| SuperPoint + SuperGlue | Excellent. | Good, but heavy. Fixed-depth GNN; 4-10x slower than LightGlue [35]. | **Good.** Robust, but risks the AC-7 budget. |
| **SuperPoint + LightGlue** [35] | Excellent. | **Excellent.** Adaptive depth [35] saves budget; 4-10x faster. | **Excellent (Selected).** Balances robustness and performance. |
### **4.2 The "Atlas" Multi-Map Paradigm (Solution for AC-3, AC-4, AC-6)**

This architecture is the industry-standard solution for IMU-denied, long-term SLAM and is critical for robustness.

* **Mechanism (AC-4, Sharp Turn):**
  1. The system is tracking on Map_Fragment_0.
  2. The UAV makes a sharp turn (AC-4, <5% overlap). The V-SLAM *loses tracking*.
  3. Instead of failing, the Atlas architecture *initializes a new map*: Map_Fragment_1.
  4. Tracking *resumes instantly* on this new, unanchored map.
* **Mechanism (AC-3, 350 m Outlier):**
  1. The system is tracking. A 350 m outlier Image_N arrives.
  2. The V-SLAM fails to match Image_N (a "Transient VO Failure," see 7.3). It is *discarded*.
  3. Image_N+1 arrives (back on track). The V-SLAM re-acquires its location on Map_Fragment_0.
  4. The system "correctly continues the work" (AC-3) by simply rejecting the outlier.

This design turns "catastrophic failure" (AC-3, AC-4) into a *standard operating procedure*. The "problem" of stitching the fragments (Map_0, Map_1) together is moved from the V-SLAM (which has no global context) to the TOH (which *can* solve it using GAB anchors; see 6.4).
### **4.3 Local Bundle Adjustment and High-Fidelity 3D Cloud**

The V-SLAM front-end continuously runs Local Bundle Adjustment (BA) over a sliding window of recent keyframes to minimize drift *within* each fragment. It also triangulates a sparse but high-fidelity 3D point cloud for its *local map fragment*.

This 3D cloud serves a critical dual function:

1. It provides a robust 3D map for frame-to-map tracking, which is more stable than frame-to-frame odometry.
2. It serves as the **high-accuracy data source** for the object localization output (Section 7.2). This is the key to decoupling object-pointing accuracy from external DEM accuracy [19], a critical flaw in simpler designs.
## **5.0 Component 4: The Viewpoint-Invariant Geospatial Anchoring Back-End (GAB)**

This component *replaces* the draft's "Dynamic Warping" (its Section 5.0) and implements the "Viewpoint-Invariant Anchoring" principle (Section 2.1).

### **5.1 Rationale: Viewpoint-Invariant VPR vs. Geometric Warping (Solves Flaw 1.2)**

As established in 1.2, geometrically warping the image using the V-SLAM's *drifting* roll/pitch estimate creates a *brittle*, high-risk failure spiral. The ASTRAL GAB *decouples* from the V-SLAM's orientation. It uses a SOTA VPR pipeline that *learns* to match oblique UAV images to nadir satellite images *directly*, at the feature level [6].
### **5.2 Stage 1 (Coarse Retrieval): SOTA Global Descriptors**

When triggered by the TOH, the GAB takes Image_N_LR and computes a *global descriptor* (a single feature vector) using a SOTA VPR model such as **SALAD** [6] or **MixVPR** [7].

This choice is driven by two factors:

1. **Viewpoint Invariance:** These models are SOTA for this exact task.
2. **Inference Speed:** They are extremely fast. SALAD reports < 3 ms per image [8], and MixVPR is also noted for its "fastest inference speed" [37]. This low overhead is essential for the AC-7 (<5 s) budget.

This vector is used to query the *pre-computed FAISS vector index* (from 3.4), which returns the Top-K (e.g., K=5) most likely satellite tiles from the *entire AOI* in milliseconds.
#### **Table 3: Analysis of VPR Global Descriptors (GAB Back-End)**

| Model (Backbone) | Key Feature | Viewpoint Invariance | Inference Speed (ms) | Fitness for GAB |
| :---- | :---- | :---- | :---- | :---- |
| NetVLAD [7] (CNN) | Baseline | Poor. Not designed for oblique-to-nadir. | Moderate (~20-50 ms) | **Poor.** Fails robustness. |
| **SALAD** [8] (DINOv2) | Foundation model [6] | **Excellent.** Designed for this. | **< 3 ms** [8]. Extremely fast. | **Excellent (Selected).** |
| **MixVPR** [36] (ResNet) | All-MLP aggregator [36] | **Very Good** [7]. | **Very fast** [37]. | **Excellent.** Strong alternative. |
### **5.3 Stage 2 (Fine): Local Feature Matching and Pose Refinement**

The system runs **SuperPoint + LightGlue** [35] to find pixel-level matches, but *only* between the UAV image and the **Top-K satellite tiles** identified in Stage 1.

A **multi-resolution strategy** is employed to solve the VRAM bottleneck:

1. Stage 1 (coarse) runs on Image_N_LR.
2. Stage 2 (fine) runs SuperPoint *selectively* on Image_N_HR (6.2K) to get high-accuracy keypoints.
3. It then matches small, full-resolution *patches* from the full-res image, *not* the full image.

This hybrid approach is the *only* way to meet both AC-7 (speed) and AC-2 (accuracy). The 6.2K image *cannot* be processed in <5 s on an RTX 2060 (6 GB VRAM [34]), but its high-resolution *pixels* are needed for the 20 m *accuracy*. Using full-res *patches* provides pixel-level accuracy without the VRAM/compute cost.
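A minimal sketch of the patch step: keypoints found on the low-res copy are scaled up, and a small full-resolution window is cropped around each one. The 64-pixel patch size and the 4x scale factor are illustrative assumptions:

```python
import numpy as np

def extract_fullres_patches(image_hr, keypoints_lr, scale, patch=64):
    """Crop full-resolution patches around keypoints detected on the low-res copy.

    image_hr:     (H, W) high-res image array
    keypoints_lr: (N, 2) array of (x, y) pixel coordinates in the low-res image
    scale:        ratio of high-res to low-res size (e.g., 6144 / 1536 = 4)
    """
    half = patch // 2
    patches, centers = [], []
    for x_lr, y_lr in keypoints_lr:
        x, y = int(round(x_lr * scale)), int(round(y_lr * scale))
        # Skip keypoints whose window would fall outside the high-res frame.
        if half <= x < image_hr.shape[1] - half and half <= y < image_hr.shape[0] - half:
            patches.append(image_hr[y - half:y + half, x - half:x + half])
            centers.append((x, y))
    return np.stack(patches), np.array(centers)

# Toy usage: a blank 4096x6144 "high-res" frame, keypoints from a 1024x1536 copy.
hr = np.zeros((4096, 6144), dtype=np.uint8)
kps = np.array([[100.0, 200.0], [760.0, 500.0]])
patches, centers = extract_fullres_patches(hr, kps, scale=4.0)
```

Only these small windows ever touch the GPU, which is what keeps the 6.2K source image out of VRAM.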
A PnP + RANSAC solver then computes a high-confidence 6-DoF pose. This pose, converted to [Lat, Lon, Alt], is the **Absolute_Metric_Anchor** sent to the TOH.
## **6.0 Component 5: The Scale-Aware Trajectory Optimization Hub (TOH)**

This component is the system's "brain" and implements the "Continuously-Scaled Trajectory" principle (Section 2.1). It *replaces* the draft's flawed "single scale" optimizer.

### **6.1 The Sim(3) Pose-Graph as the Optimization Backbone**

The central challenge of IMU-denied monocular SLAM is *scale drift* [11]. The V-SLAM (Component 3) produces 6-DoF poses, but they are *unscaled* (SE(3)). The GAB (Component 4) produces *metric* 6-DoF poses (SE(3)).

The solution is to optimize the *entire graph* in the 7-DoF similarity group, **Sim(3)** [11]. This adds a 7th degree of freedom (scale, $s$) to each pose. The optimization backbone will be **Ceres Solver** [14], a SOTA C++ library for large, complex non-linear least-squares problems.
### **6.2 Advanced Scale-Drift Correction: Modeling Scale as a Per-Keyframe Parameter (Solves Flaw 1.3)**

This is the *core* of the ASTRAL optimizer, solving Flaw 1.3. The draft's flawed model, `Pose_Graph(Fragment_i) = {Pose_1 ... Pose_n, s_i}` (one scale per fragment), is replaced by ASTRAL's model: `Pose_Graph = {(Pose_1, s_1), (Pose_2, s_2), ..., (Pose_N, s_N)}`.

The graph is constructed as follows:

* **Nodes:** Each keyframe pose is a 7-DoF Sim(3) variable $\{s_k, R_k, t_k\}$.
* **Edge 1 (V-SLAM):** A *relative* Sim(3) constraint between $Pose_k$ and $Pose_{k+1}$ from the V-SLAM Front-End.
* **Edge 2 (GAB):** An *absolute* SE(3) constraint on $Pose_j$ from a GAB anchor. This constraint *fixes* the 6-DoF pose $(R_j, t_j)$ to the metric GAB value and *fixes its scale* at $s_j = 1.0$.

This "per-keyframe scale" model [15] enables "elastic" trajectory refinement. When the graph is a long, unscaled "chain" of V-SLAM constraints, a GAB anchor (Edge 2) arrives at $Pose_{100}$, "nailing" it to the metric map and setting $s_{100} = 1.0$. As the V-SLAM continues, scale drifts. When a second anchor arrives at $Pose_{200}$ (setting $s_{200} = 1.0$), the Ceres optimizer [14] faces a problem: the V-SLAM data *between* the anchors has drifted.

The ASTRAL model *allows* the optimizer to solve for all intermediate scales ($s_{101}, s_{102}, \dots, s_{199}$) as variables. The optimizer finds a *smooth, continuous* scale correction [15] that "elastically" stretches and shrinks the 100-frame sub-segment to fit both metric anchors. This correctly models the physics of scale drift [12] and is essential to achieving the 20 m accuracy (AC-2) and 1.0 px MRE (AC-10).
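A 1-D toy version of this elastic behavior (pure NumPy, not Ceres; unit-length unscaled steps, distances only, no rotations; anchor spacing and weights are illustrative) shows how per-keyframe scales are recovered from metric anchors plus a smoothness prior:

```python
import numpy as np

# 100 unscaled V-SLAM steps of unit length; the true monocular scale
# drifts from 1.0 to 1.2 over the route (unknown to the solver).
N = 100
s_true = np.linspace(1.0, 1.2, N)   # per-keyframe scale
metric_pos = np.cumsum(s_true)      # true metric distance along the route

# GAB anchors pin the metric position every 25 keyframes (Edge 2).
anchor_idx = [24, 49, 74, 99]

rows, rhs = [], []
w_anchor = 100.0  # anchors are trusted far more than the smoothness prior
for j in anchor_idx:
    r = np.zeros(N)
    r[: j + 1] = w_anchor           # sum of scaled unit steps up to j = metric position
    rows.append(r)
    rhs.append(w_anchor * metric_pos[j])
for k in range(N - 1):              # smoothness prior: s_{k+1} - s_k ~ 0
    r = np.zeros(N)
    r[k], r[k + 1] = -1.0, 1.0
    rows.append(r)
    rhs.append(0.0)

# Linear least squares stands in for the Ceres Sim(3) solve.
s_est, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
```

The recovered `s_est` stretches smoothly between anchors instead of jumping at fragment boundaries, which is exactly the behavior the single-scale-per-fragment model cannot express.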
### **6.3 Robust M-Estimation (Solution for AC-3, AC-5)**

A 350 m outlier (AC-3) or a bad GAB match (AC-5) adds a constraint with a *massive* error. A standard least-squares optimizer [14] would be *catastrophically* corrupted, pulling the *entire* 3000-image trajectory toward this one bad point.

This is a solved problem. All constraints (V-SLAM and GAB) *must* be wrapped in a **robust loss function** (e.g., HuberLoss, CauchyLoss) within Ceres Solver. Such a loss mathematically *down-weights* constraints with large residuals, effectively telling the optimizer: "This measurement is implausible; ignore it." This provides automatic, graceful outlier rejection, meeting AC-3 and AC-5.
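To illustrate the down-weighting effect, here is a toy iteratively-reweighted least-squares fit with a Huber weight, standing in for Ceres' HuberLoss (the measurement values and delta are illustrative):

```python
import numpy as np

def huber_weight(residual, delta=1.0):
    """Huber IRLS weight: 1 inside the delta band, delta/|r| outside it."""
    r = np.abs(residual)
    return np.where(r <= delta, 1.0, delta / r)

def robust_mean(measurements, delta=1.0, iters=20):
    """Estimate a location parameter while down-weighting outliers."""
    estimate = np.median(measurements)  # robust starting point
    for _ in range(iters):
        w = huber_weight(measurements - estimate, delta)
        estimate = np.sum(w * measurements) / np.sum(w)
    return estimate

# Nine good position measurements near 100 m, plus one wild outlier (AC-3 style).
data = np.array([99.8, 100.1, 100.0, 99.9, 100.2, 100.0, 99.7, 100.3, 100.0, 450.0])
plain_mean = data.mean()              # dragged far off by the single outlier
robust = robust_mean(data, delta=1.0) # stays near the true 100 m
```

The plain mean lands near 135 m; the Huber-weighted estimate stays within a fraction of a meter of 100 m, which is the behavior the pose-graph needs.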
### **6.4 Geodetic Map-Merging (Solution for AC-4, AC-6)**

This mechanism is the robust solution to the "sharp turn" (AC-4) problem.

* **Scenario:** The UAV makes a sharp turn (AC-4). The V-SLAM (4.2) creates Map_Fragment_0 and Map_Fragment_1. The TOH's graph now has two *disconnected* components.
* **Mechanism (Geodetic Merging):**
  1. The TOH queues the GAB (Section 5.0) to find anchors for *both* fragments.
  2. The GAB returns Anchor_A for Map_Fragment_0 and Anchor_B for Map_Fragment_1.
  3. The TOH adds *both* of these as absolute, metric constraints (Edge 2) to the *single global pose-graph*.
  4. The Ceres optimizer [14] now has all the information it needs. It solves for the 7-DoF poses of *both fragments*, placing them in their correct, globally consistent metric positions.

The two fragments are *merged geodetically* (by their global coordinates [11]) even if they *never* visually overlap. This is a vastly more robust solution to AC-4 and AC-6 than simple visual loop closure.
## **7.0 Performance, Deployment, and High-Accuracy Outputs**

### **7.1 Meeting the <5 s Budget (AC-7): Mandatory Acceleration with NVIDIA TensorRT**

The system must run on an RTX 2060 (AC-7). This is a low-end, 6 GB VRAM card [34], which is a *severe* constraint. Running three deep-learning models (SuperPoint, LightGlue, SALAD/MixVPR) plus a Ceres optimizer [38] will saturate this hardware.

* **Solution 1: Multi-Scale Pipeline.** As defined in 5.3, the system *never* processes a full 6.2K image on the GPU. It uses low-res images for V-SLAM and GAB-Coarse and high-res *patches* for GAB-Fine.
* **Solution 2: Mandatory TensorRT Deployment.** Running these models in their native PyTorch framework will be too slow. All neural networks (SuperPoint, LightGlue, SALAD/MixVPR) *must* be converted from PyTorch into optimized **NVIDIA TensorRT engines**. Research specifically on accelerating LightGlue reports **"2x-4x speed gains over compiled PyTorch"** [35]. This speedup is *not* an optimization; it is a *mandatory deployment step* to make the <5 s (AC-7) budget *possible* on an RTX 2060.
### **7.2 High-Accuracy Object Geolocalization via Ray-Cloud Intersection (Solves AC-2/AC-10)**

The user must be able to find the GPS coordinates of an *object* in a photo. The simple approach of casting a ray from the camera and intersecting it with the 30 m GLO-30 DEM [2] is fatally flawed: the DEM error itself can be up to 30 m [19], making AC-2 impossible.

The ASTRAL system uses a **Ray-Cloud Intersection** method that *decouples* object accuracy from external DEM accuracy.

* **Algorithm:**
  1. The user clicks pixel (u, v) on Image_N.
  2. The system retrieves the *final, refined, metric 7-DoF pose* P_Sim(3) = (s, R, T) for Image_N from the TOH.
  3. It also retrieves the V-SLAM's *local, high-fidelity 3D point cloud* (P_local_cloud) from Component 3 (Section 4.3).
  4. **Step 1 (Local):** The pixel (u, v) is un-projected into a ray. This ray is intersected with the *local* P_local_cloud, yielding the 3D point P_local *relative to the V-SLAM map*. The accuracy of this step is defined by AC-10 (MRE < 1.0 px).
  5. **Step 2 (Global):** This *highly accurate* local point P_local is transformed into the global metric coordinate system using the *highly accurate* refined pose from the TOH: P_metric = s * (R * P_local) + T.
  6. **Step 3 (Convert):** P_metric (an X, Y, Z world coordinate) is converted to [Latitude, Longitude, Altitude].

This method correctly isolates error. The object's accuracy now depends *only* on the V-SLAM's internal geometry (AC-10) and the TOH's global pose accuracy (AC-1, AC-2). It *completely eliminates* the external 30 m DEM error [2] from this critical, high-accuracy calculation.
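Steps 1-2 of the algorithm can be sketched as a pinhole un-projection followed by a nearest-point-to-ray lookup over the sparse cloud. The intrinsics, the tiny three-point cloud, and the Sim(3) values are toy placeholders, and the nearest-point lookup stands in for a real intersection test:

```python
import numpy as np

def pixel_to_ray(u, v, K):
    """Un-project pixel (u, v) into a unit direction ray in the camera frame."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return d / np.linalg.norm(d)

def ray_cloud_intersect(origin, direction, cloud):
    """Return the cloud point closest to the ray (a stand-in for intersecting
    the V-SLAM's triangulated local map)."""
    rel = cloud - origin
    t = rel @ direction                     # projection of each point onto the ray
    foot = origin + np.outer(t, direction)  # closest point on the ray per cloud point
    dist = np.linalg.norm(cloud - foot, axis=1)
    return cloud[np.argmin(dist)]

def local_to_metric(p_local, s, R, T):
    """Step 2 (Global): apply the refined Sim(3) pose, P_metric = s*(R*p) + T."""
    return s * (R @ p_local) + T

# Toy setup: camera at the local origin looking down +Z; three cloud points.
K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]])
cloud = np.array([[0.0, 0.0, 100.0], [10.0, 0.0, 100.0], [0.0, 10.0, 100.0]])
ray = pixel_to_ray(960.0, 540.0, K)  # principal point -> straight ahead
p_local = ray_cloud_intersect(np.zeros(3), ray, cloud)
p_metric = local_to_metric(p_local, s=2.0, R=np.eye(3), T=np.array([1.0, 2.0, 3.0]))
```

Step 3 (converting the metric X, Y, Z to latitude/longitude/altitude) would use a geodetic library such as pyproj and is omitted here.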
### **7.3 Failure Mode Escalation Logic (Meets AC-3, AC-4, AC-6, AC-9)**

The system is built on a robust state machine to handle real-world failures.

* **Stage 1: Normal Operation (Tracking):** V-SLAM tracks; TOH optimizes.
* **Stage 2: Transient VO Failure (Outlier Rejection):**
  * *Condition:* Image_N is a 350 m outlier (AC-3) or severely blurred.
  * *Logic:* V-SLAM fails to track Image_N. The system *discards* it (AC-5). Image_N+1 arrives; V-SLAM re-tracks.
  * *Result:* **AC-3 met.**
* **Stage 3: Persistent VO Failure (New Map Initialization):**
  * *Condition:* "Sharp turn" (AC-4) or >5 frames of tracking loss.
  * *Logic:* V-SLAM (Section 4.2) declares "Tracking Lost" and initializes a *new* Map_Fragment_k+1. Tracking *resumes instantly*.
  * *Result:* **AC-4 met.** The system "correctly continues the work." The >95% registration rate (AC-9) is preserved because this is *not* a failure; it is a *new registration*.
* **Stage 4: Map-Merging & Global Relocalization (GAB-Assisted):**
  * *Condition:* The system is on Map_Fragment_k+1; Map_Fragment_k is "lost."
  * *Logic:* The TOH (Section 6.4) receives GAB anchors for *both* fragments and *geodetically merges* them in the global optimizer [14].
  * *Result:* **AC-6 met** (strategy to connect separate chunks).
* **Stage 5: Catastrophic Failure (User Intervention):**
  * *Condition:* The system is in Stage 3 (lost) *and* the GAB has failed for 20% of the route: the "absolutely incapable" scenario (AC-6).
  * *Logic:* The TOH raises the AC-6 flag. The UI prompts the user: "Please provide a coarse location for the *current* image."
  * *Action:* This user click is *not* taken as ground truth. It is fed to the **GAB (Section 5.0)** as a *strong spatial prior*, narrowing its Stage 1 [8] search from "the entire AOI" to "a 5 km radius." This makes a GAB match highly likely, which triggers Stage 4 and re-localizes the system.
  * *Result:* **AC-6 met** (user input).
## **8.0 ASTRAL Validation Plan and Acceptance Criteria Matrix**

A comprehensive test plan is required to validate compliance with all 10 acceptance criteria. Its foundation is a **Ground-Truth Test Harness** built on project-provided ground-truth data.
### **Table 4: ASTRAL Component vs. Acceptance Criteria Compliance Matrix**

| ID | Requirement | ASTRAL Solution (Component) | Key Technology / Justification |
| :---- | :---- | :---- | :---- |
| **AC-1** | 80% of photos < 50 m error | GDB (C-1) + GAB (C-4) + TOH (C-5) | **Tier-1** data [1] is sufficient. SOTA VPR [8] + Sim(3) graph [13] can achieve this. |
| **AC-2** | 60% of photos < 20 m error | GDB (C-1) + GAB (C-4) + TOH (C-5) | **Requires Tier-2 (commercial) data** [4], mitigating reference error [3]. The **per-keyframe scale** model [15] in the TOH minimizes drift error. |
| **AC-3** | Robust to 350 m outlier | V-SLAM (C-3) + TOH (C-5) | **Stage 2 failure logic** (7.3) discards the frame. **Robust M-estimation** (6.3) in Ceres [14] automatically rejects the constraint. |
| **AC-4** | Robust to sharp turns (<5% overlap) | V-SLAM (C-3) + TOH (C-5) | **"Atlas" multi-map** (4.2) initializes a new map (Map_Fragment_k+1). **Geodetic map-merging** (6.4) in the TOH reconnects fragments via GAB anchors. |
| **AC-5** | < 10% outlier anchors | TOH (C-5) | **Robust M-estimation (Huber loss)** (6.3) in Ceres [14] automatically down-weights and ignores high-residual (bad) GAB anchors. |
| **AC-6** | Connect route chunks; user input | V-SLAM (C-3) + TOH (C-5) + UI | **Geodetic map-merging** (6.4) connects chunks. **Stage 5 failure logic** (7.3) provides the user-input-as-prior mechanism. |
| **AC-7** | < 5 seconds processing/image | All components | **Multi-scale pipeline** (5.3): low-res V-SLAM, high-res GAB patches. **Mandatory TensorRT acceleration** (7.1) for a 2-4x speedup [35]. |
| **AC-8** | Real-time stream + async refinement | TOH (C-5) + Outputs (2.4) | The decoupled architecture provides Pose_N_Est (V-SLAM) in real time and Pose_N_Refined (TOH) asynchronously as GAB anchors arrive. |
| **AC-9** | Image Registration Rate > 95% | V-SLAM (C-3) | **"Atlas" multi-map** (4.2): a "lost track" (AC-4) is *not* a registration failure; it is a *new map registration*. This keeps the rate > 95%. |
| **AC-10** | Mean Reprojection Error (MRE) < 1.0 px | V-SLAM (C-3) + TOH (C-5) | Local BA (4.3) + global BA (TOH [14]) + **per-keyframe scale** (6.2) minimize internal graph tension (Flaw 1.3), allowing the optimizer to converge to a low MRE. |
### **8.1 Rigorous Validation Methodology**

* **Test Harness:** A validation script compares the system's Pose_N^{Refined} output against a ground-truth coordinates.csv file, computing Haversine distance errors.
* **Test Datasets:**
  * Test_Baseline: a standard flight.
  * Test_Outlier_350m (AC-3): a single, unrelated image inserted.
  * Test_Sharp_Turn_5pct (AC-4): a sequence with a 10-frame gap.
  * Test_Long_Route (AC-9, AC-7): a 2000-image sequence.
* **Test Cases:**
  * Test_Accuracy: run Test_Baseline. ASSERT (count(errors < 50m) / total) >= 0.80 (AC-1). ASSERT (count(errors < 20m) / total) >= 0.60 (AC-2).
  * Test_Robustness: run Test_Outlier_350m and Test_Sharp_Turn_5pct. ASSERT the system completes the run and the Test_Accuracy assertions still pass on the valid frames.
  * Test_Performance: run Test_Long_Route on a min-spec RTX 2060. ASSERT average_time(Pose_N^{Est} output) < 5.0 s (AC-7).
  * Test_MRE: ASSERT TOH.final_MRE < 1.0 (AC-10).
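The harness's error metric might be implemented as follows (the pass-fraction helper mirrors the AC-1/AC-2 assertions; the sample coordinates are toy values):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    R = 6371000.0  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def accuracy_fractions(estimated, ground_truth):
    """Return the AC-1 (<50 m) and AC-2 (<20 m) pass fractions."""
    errors = [haversine_m(e[0], e[1], g[0], g[1])
              for e, g in zip(estimated, ground_truth)]
    n = len(errors)
    return (sum(err < 50.0 for err in errors) / n,
            sum(err < 20.0 for err in errors) / n)

# Toy check: three estimates, roughly 7 m, 22 m, and 1.1 km off respectively.
gt  = [(47.85, 35.10), (47.85, 35.11), (47.85, 35.12)]
est = [(47.85, 35.1001), (47.8502, 35.11), (47.86, 35.12)]
frac_50, frac_20 = accuracy_fractions(est, gt)
```

The AC-1/AC-2 assertions then reduce to `assert frac_50 >= 0.80` and `assert frac_20 >= 0.60` over the full flight.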
Identify all potential weak points and problems. Address them and find ways to solve them. Based on your findings, form a new solution draft in the same format.

If your findings require a complete reorganization of the flow and different components, state it.

Put all the findings regarding what was weak and poor at the beginning of the report.

At the very beginning of the report, list the most profound changes you have made to the previous solution.

Then form a new solution design without referencing the previous system. Remove Poor and Very Poor component choices from the component analysis tables, but keep Good and Excellent ones.

In the updated report, do not add "new" marks and do not compare to the previous solution draft; just present a new solution as if from scratch.