ASTRAL System Architecture: A High-Fidelity Geopositioning Framework for IMU-Denied Aerial Operations

2.0 The ASTRAL (Advanced Scale-Aware Trajectory-Refinement and Localization) System Architecture

The ASTRAL architecture is a multi-map, loosely-coupled, asynchronous system designed to resolve the flaws identified in Section 1.0 and meet all 10 Acceptance Criteria.

2.1 Core Principles

The ASTRAL architecture is built on three principles:

  1. Tiered Geospatial Database: The system cannot rely on a single data source. It is architected around a tiered local database.
    • Tier-1 (Baseline): Google Maps data. This is used to meet the 50m (AC-1) requirement and provide geolocalization.
    • Tier-2 (High-Accuracy): A framework for ingesting commercial, sub-meter data (visual 4 and DEM 5). This tier is required to meet the 20m (AC-2) accuracy. The system will run on Tier-1 but achieve AC-2 when "fueled" with Tier-2 data.
  2. Viewpoint-Invariant Anchoring: The system rejects geometric warping. The GAB (Section 5.0) is built on SOTA Visual Place Recognition (VPR) models that are inherently invariant to the oblique-to-nadir viewpoint change, decoupling it from the V-SLAM's unstable orientation.
  3. Continuously-Scaled Trajectory: The system rejects the "single-scale-per-fragment" model. The TOH (Section 6.0) is a Sim(3) pose-graph optimizer 11 that models scale as a per-keyframe optimizable parameter.15 This allows the trajectory to "stretch" and "shrink" elastically to absorb continuous monocular scale drift.12

2.2 Component Interaction and Data Flow

The system is multi-threaded and asynchronous, designed for real-time streaming (AC-7) and refinement (AC-8).

  • Component 1: Tiered GDB (Pre-Flight):
    • Input: User-defined Area of Interest (AOI).
    • Action: Downloads and builds a local SpatiaLite/GeoPackage.
    • Output: A single Local-Geo-Database file containing:
      • Tier-1 (Google Maps) satellite tiles + GLO-30 DSM elevation tiles.
      • Tier-2 (Commercial) satellite tiles + WorldDEM DTM elevation tiles.
      • A pre-computed FAISS vector index of global descriptors (e.g., SALAD 8) for all satellite tiles (see 3.4).
  • Component 2: Image Ingestion (Real-time):
    • Input: Image_N (up to 6.2K), Camera Intrinsics (K).
    • Action: Creates Image_N_LR (Low-Res, e.g., 1536x1024) and Image_N_HR (High-Res, 6.2K).
    • Dispatch: Image_N_LR -> V-SLAM. Image_N_HR -> GAB (for patches).
  • Component 3: "Atlas" V-SLAM Front-End (High-Frequency Thread):
    • Input: Image_N_LR.
    • Action: Tracks Image_N_LR against the active map fragment. Manages keyframes and local BA. If tracking lost (AC-4, AC-6), it initializes a new map fragment.
    • Output: Relative_Unscaled_Pose, Local_Point_Cloud, and Map_Fragment_ID -> TOH.
  • Component 4: VPR Geospatial Anchoring Back-End (GAB) (Low-Frequency, Asynchronous Thread):
    • Input: A keyframe (Image_N_LR, Image_N_HR) and its Map_Fragment_ID.
    • Action: Performs SOTA two-stage VPR (Section 5.0) against the Local-Geo-Database file.
    • Output: Absolute_Metric_Anchor ([Lat, Lon, Alt] pose) and its Map_Fragment_ID -> TOH.
  • Component 5: Scale-Aware Trajectory Optimization Hub (TOH) (Central Hub Thread):
    • Input 1: High-frequency Relative_Unscaled_Pose stream.
    • Input 2: Low-frequency Absolute_Metric_Anchor stream.
    • Action: Manages the global Sim(3) pose-graph 13 with per-keyframe scale.15
    • Output 1 (Real-time): Pose_N_Est (unscaled) -> UI (Meets AC-7).
    • Output 2 (Refined): Pose_N_Refined (metric-scale) -> UI (Meets AC-1, AC-2, AC-8).
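
The decoupled data flow above can be sketched with standard-library queues. This is a minimal, hypothetical wiring (component logic reduced to stand-ins; the message fields rel_pose/anchor are illustrative, not the system's actual format): the V-SLAM front-end feeds a high-frequency pose stream, the GAB feeds a low-frequency anchor stream, and the TOH drains both.

```python
import queue
import threading

pose_queue = queue.Queue()    # Relative_Unscaled_Pose stream (high-frequency)
anchor_queue = queue.Queue()  # Absolute_Metric_Anchor stream (low-frequency)

def vslam_frontend(frames):
    """Stand-in for Component 3: emit one relative pose per frame."""
    for i, _ in enumerate(frames):
        pose_queue.put({"frame": i, "rel_pose": (0.0, 1.0, 0.0)})
    pose_queue.put(None)  # end-of-stream sentinel

def toh_hub(results):
    """Stand-in for Component 5: drain poses, fold in anchors when present."""
    while True:
        item = pose_queue.get()
        if item is None:
            break
        try:  # non-blocking: anchors arrive asynchronously, if at all
            item["anchor"] = anchor_queue.get_nowait()
        except queue.Empty:
            item["anchor"] = None
        results.append(item)

results = []
anchor_queue.put({"frame": 0, "lat": 48.5, "lon": 35.0, "alt": 120.0})
producer = threading.Thread(target=vslam_frontend, args=([b""] * 5,))
consumer = threading.Thread(target=toh_hub, args=(results,))
producer.start()
consumer.start()
producer.join()
consumer.join()
```

The key property this models is that the high-frequency stream is never blocked by the slow anchoring path, which is what makes the real-time output (AC-7) and asynchronous refinement (AC-8) compatible.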

2.3 System Inputs

  1. Image Sequence: Consecutively named images (FullHD to 6252x4168).
  2. Start Coordinate (Image 0): A single, absolute GPS coordinate [Lat, Lon].
  3. Camera Intrinsics (K): Pre-calibrated camera intrinsic matrix.
  4. Local-Geo-Database File: The single file generated by Component 1.

2.4 Streaming Outputs (Meets AC-7, AC-8)

  1. Initial Pose (Pose_N^{Est}): An unscaled pose. This is the raw output from the V-SLAM Front-End, transformed by the current best estimate of the trajectory. It is sent immediately (<5s, AC-7) to the UI for real-time visualization of the UAV's path shape.
  2. Refined Pose (Pose_N^{Refined}) [Asynchronous]: A globally-optimized, metric-scale 7-DoF pose. This is sent to the user whenever the TOH re-converges (e.g., after a new GAB anchor or a map-merge). This re-writes the history of poses (e.g., Pose_{N-100} to Pose_N), meeting the refinement (AC-8) and accuracy (AC-1, AC-2) requirements.

3.0 Component 1: The Tiered Pre-Flight Geospatial Database (GDB)

This component implements the "Tiered Geospatial Database" principle (Section 2.1). It is a mandatory pre-flight utility that solves both the legal problem (Flaw 1.4) and the accuracy problem (Flaw 1.1).

3.2 Tier-1 (Baseline): Google Maps and GLO-30 DEM

This tier provides the baseline capability and satisfies AC-1.

  • Visual Data: Google Maps (coarse Maxar imagery)
    • Resolution: 1m - 10m (varies by region).
    • Geodetic Accuracy: ~1m - 20m.
    • Purpose: Meets AC-1 (80% < 50m error). Provides a robust baseline for coarse geolocalization.
  • Elevation Data: Copernicus GLO-30 DEM
    • Resolution: 30m.
    • Type: DSM (Digital Surface Model).2 This is a weakness, as it includes buildings/trees.
    • Purpose: Provides a coarse altitude prior for the TOH and the initial GAB search.

3.3 Tier-2 (High-Accuracy): Ingestion Framework for Commercial Data

This is the procurement and integration framework required to meet AC-2.

  • Visual Data: Commercial providers, e.g., Maxar (30-50cm) or Satellogic (70cm)
    • Resolution: < 1m.
    • Geodetic Accuracy: Typically < 5m.
    • Purpose: Provides the high-resolution, high-accuracy reference needed for the GAB to achieve a sub-20m total error.
  • Elevation Data: Commercial providers, e.g., WorldDEM Neo 5 or Elevation10.32
    • Resolution: 5m-12m.
    • Vertical Accuracy: < 4m.32
    • Type: DTM (Digital Terrain Model).32

The use of a DTM (bare-earth) in Tier-2 is a critical advantage over the Tier-1 DSM (surface). The V-SLAM Front-End (Section 4.0) will triangulate a 3D point cloud of what it sees, which is the ground in fields or tree-tops in forests. The Tier-1 GLO-30 DSM 2 represents the top of the canopy/buildings. If the V-SLAM maps the ground (e.g., altitude 100m) and the GAB tries to anchor it to a DSM prior that shows a forest (e.g., altitude 120m), the 20m altitude discrepancy will introduce significant error into the TOH. The Tier-2 DTM (bare-earth) 5 provides a vastly superior altitude anchor, as it represents the same ground plane the V-SLAM is tracking, significantly improving the entire 7-DoF pose solution.

3.4 Local Database Generation: Pre-computing Global Descriptors

This is the key performance optimization for the GAB. During the pre-flight caching step, the GDB utility does not just store tiles; it processes them.

For every satellite tile (e.g., 256x256m) in the AOI, the utility will load the tile into the VPR model (e.g., SALAD 8), compute its global descriptor (a compact feature vector), and store this vector in a high-speed vector index (e.g., FAISS).

This step moves 99% of the GAB's "Stage 1" (Coarse Retrieval) workload into an offline, pre-flight step. The real-time GAB query (Section 5.2) is now reduced to: (1) Compute one vector for the UAV image, and (2) Perform a very fast K-Nearest-Neighbor search on the pre-computed FAISS index. This is what makes a SOTA deep-learning GAB 6 fast enough to support the real-time refinement loop.
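
The precompute-then-query flow can be sketched as follows. This is a hedged sketch: random vectors stand in for SALAD descriptors, a brute-force NumPy cosine search stands in for the FAISS index, and the descriptor dimension and tile count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline (pre-flight): one global descriptor per satellite tile.
# A real system would run the VPR model per tile and build a FAISS index.
N_TILES, DIM = 1000, 256
tile_descriptors = rng.standard_normal((N_TILES, DIM)).astype(np.float32)
tile_descriptors /= np.linalg.norm(tile_descriptors, axis=1, keepdims=True)

def knn_search(index, query, k=5):
    """Brute-force cosine KNN; stands in for faiss.IndexFlatIP.search."""
    scores = index @ query
    top_k = np.argsort(-scores)[:k]
    return top_k, scores[top_k]

# Online (per keyframe): one descriptor for the UAV image, one fast search.
query = tile_descriptors[42] + 0.01 * rng.standard_normal(DIM)
query /= np.linalg.norm(query)
tile_ids, scores = knn_search(tile_descriptors, query, k=5)
```

Because the heavy per-tile inference happened offline, the online cost is one forward pass plus one vector search, which is what keeps Stage 1 inside the AC-7 budget.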

Table 1: Geospatial Reference Data Analysis (Decision Matrix)

| Data Product | Data Type | Resolution | Geodetic Accuracy (Horiz.) | Model Type | Cost | AC-2 (20m) Compliant? |
|---|---|---|---|---|---|---|
| Google Maps | Visual | 1m - 10m (varies) | 1m - 20m | N/A | Free | Depends on location |
| Copernicus GLO-30 | Elevation | 30m | ~10-30m | DSM (Surface) | Free | No (fails error budget) |
| Tier-2: Maxar/Satellogic | Visual | 0.3m - 0.7m | < 5m (est.) | N/A | Commercial | Yes |
| Tier-2: WorldDEM Neo | Elevation | 5m | < 4m | DTM (Bare-Earth) | Commercial | Yes |

4.0 Component 3: The "Atlas" Relative Motion Front-End

This component's sole task is to robustly compute unscaled 6-DoF relative motion and handle tracking failures (AC-3, AC-4).

4.1 Feature Matching Sub-System: SuperPoint + LightGlue

The system will use SuperPoint for feature detection and LightGlue for matching. This choice is driven by the project's specific constraints:

  • Rationale (Robustness): The UAV flies over "eastern and southern parts of Ukraine," which includes large, low-texture agricultural areas. SuperPoint is a SOTA deep-learning detector renowned for its robustness and repeatability in these challenging, low-texture environments.
  • Rationale (Performance): The RTX 2060 (AC-7) is a hard constraint with only 6GB VRAM.34 Performance is paramount. LightGlue is a SOTA matcher that provides a 4-10x speedup over its predecessor, SuperGlue. Its "adaptive" nature is a key optimization: it exits early on "easy" pairs (high-overlap, straight flight) and spends more compute only on "hard" pairs (turns). This saves critical GPU budget on the ~95% of frames that are routine, ensuring the <5s (AC-7) budget is met.

This subsystem will run on the Image_N_LR (low-res) copy to guarantee it fits in VRAM and meets the real-time budget.

Table 2: Analysis of State-of-the-Art Feature Matchers (V-SLAM Front-End)

| Approach (Tools/Library) | Robustness (Low-Texture) | Speed (RTX 2060) | Fitness for Problem |
|---|---|---|---|
| ORB 33 (e.g., ORB-SLAM3) | Poor. Fails on low-texture. | Excellent (CPU/GPU). | Poor. Fails robustness in target environment. |
| SuperPoint + SuperGlue | Excellent. | Good, but heavy. Fixed-depth GNN; 4-10x slower than LightGlue.35 | Good. Robust, but risks AC-7 budget. |
| SuperPoint + LightGlue 35 | Excellent. | Excellent. Adaptive depth 35 saves budget; 4-10x faster. | Excellent (Selected). Balances robustness and performance. |

4.2 The "Atlas" Multi-Map Paradigm (Solution for AC-3, AC-4, AC-6)

This architecture is the industry-standard solution for IMU-denied, long-term SLAM and is critical for robustness.

  • Mechanism (AC-4, Sharp Turn):
    1. The system is tracking on Map_Fragment_0.
    2. The UAV makes a sharp turn (AC-4, <5% overlap). The V-SLAM loses tracking.
    3. Instead of failing, the Atlas architecture initializes a new map: Map_Fragment_1.
    4. Tracking resumes instantly on this new, unanchored map.
  • Mechanism (AC-3, 350m Outlier):
    1. The system is tracking. A 350m outlier Image_N arrives.
    2. The V-SLAM fails to match Image_N (a "Transient VO Failure," see 7.3). It is discarded.
    3. Image_N+1 arrives (back on track). V-SLAM re-acquires its location on Map_Fragment_0.
    4. The system "correctly continues the work" (AC-3) by simply rejecting the outlier.

This design turns "catastrophic failure" (AC-3, AC-4) into a standard operating procedure. The "problem" of stitching the fragments (Map_0, Map_1) together is moved from the V-SLAM (which has no global context) to the TOH (which can solve it using GAB anchors, see 6.4).
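
The transient-vs-persistent failure behaviour described above reduces to a small state machine. This is an illustrative sketch only: the 5-frame persistence threshold is an assumption, and the real Atlas logic lives inside the V-SLAM tracker, not a standalone class.

```python
class AtlasTracker:
    """Minimal sketch of the 'Atlas' multi-map behaviour (Section 4.2):
    a transient failure discards the frame; a persistent failure spawns
    a new map fragment instead of aborting the mission."""

    def __init__(self, persistent_after=5):  # threshold is illustrative
        self.fragment_id = 0
        self.consecutive_failures = 0
        self.persistent_after = persistent_after

    def process(self, tracked_ok):
        if tracked_ok:
            self.consecutive_failures = 0
            return ("TRACKED", self.fragment_id)
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.persistent_after:
            # Persistent VO failure: initialize a new, unanchored fragment.
            self.fragment_id += 1
            self.consecutive_failures = 0
            return ("NEW_FRAGMENT", self.fragment_id)
        # Transient VO failure (e.g., the 350m outlier): discard the frame.
        return ("DISCARDED", self.fragment_id)

tracker = AtlasTracker()
# AC-3: a single 350m outlier frame is discarded, tracking continues.
events = [tracker.process(ok) for ok in [True, False, True]]
# AC-4: a sharp turn causes sustained loss; Map_Fragment_1 is created.
turn = [tracker.process(ok) for ok in [False] * 5 + [True]]
```

Note how neither scenario is a terminal state: every frame either tracks, is discarded, or opens a new fragment, which is why AC-9's >95% registration rate survives a lost track.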

4.3 Local Bundle Adjustment and High-Fidelity 3D Cloud

The V-SLAM front-end will continuously run Local Bundle Adjustment (BA) over a sliding window of recent keyframes to minimize drift within that fragment. It will also triangulate a sparse, but high-fidelity, 3D point cloud for its local map fragment.

This 3D cloud serves a critical dual function:

  1. It provides a robust 3D map for frame-to-map tracking, which is more stable than frame-to-frame odometry.
  2. It serves as the high-accuracy data source for the object localization output (Section 7.2). This is the key to decoupling object-pointing accuracy from external DEM accuracy 19, a critical flaw in simpler designs.

5.0 Component 4: The Viewpoint-Invariant Geospatial Anchoring Back-End (GAB)

This component replaces the earlier draft's "Dynamic Warping" approach and implements the "Viewpoint-Invariant Anchoring" principle (Section 2.1).

5.1 Rationale: Viewpoint-Invariant VPR vs. Geometric Warping (Solves Flaw 1.2)

As established in Section 1.2, geometrically warping the UAV image using the V-SLAM's drifting roll/pitch estimate creates a brittle, high-risk failure spiral. The ASTRAL GAB decouples from the V-SLAM's orientation. It uses a SOTA VPR pipeline that learns to match oblique UAV images to nadir satellite images directly, at the feature level.6

5.2 Stage 1 (Coarse Retrieval): SOTA Global Descriptors

When triggered by the TOH, the GAB takes Image_N_LR. It computes a global descriptor (a single feature vector) using a SOTA VPR model like SALAD 6 or MixVPR.7

This choice is driven by two factors:

  1. Viewpoint Invariance: These models are SOTA for this exact task.
  2. Inference Speed: They are extremely fast. SALAD reports < 3ms per image inference 8, and MixVPR is also noted for "fastest inference speed".37 This low overhead is essential for the AC-7 (<5s) budget.

This vector is used to query the pre-computed FAISS vector index (from 3.4), which returns the Top-K (e.g., K=5) most likely satellite tiles from the entire AOI in milliseconds.

Table 3: Analysis of VPR Global Descriptors (GAB Back-End)

| Model (Backbone) | Key Feature | Viewpoint Invariance | Inference Speed (ms) | Fitness for GAB |
|---|---|---|---|---|
| NetVLAD 7 (CNN) | Baseline. | Poor. Not designed for oblique-to-nadir. | Moderate (~20-50ms). | Poor. Fails robustness. |
| SALAD 8 (DINOv2) | Foundation-model backbone.6 | Excellent. Designed for this task. | < 3ms.8 Extremely fast. | Excellent (Selected). |
| MixVPR 36 (ResNet) | All-MLP aggregator.36 | Very good.7 | Very fast.37 | Excellent (Alternate). |

5.3 Stage 2 (Fine): Local Feature Matching and Pose Refinement

The system runs SuperPoint+LightGlue 35 to find pixel-level matches, but only between the UAV image and the Top-K satellite tiles identified in Stage 1.

A Multi-Resolution Strategy is employed to solve the VRAM bottleneck.

  1. Stage 1 (Coarse) runs on the Image_N_LR.
  2. Stage 2 (Fine) runs SuperPoint selectively on the Image_N_HR (6.2K) to get high-accuracy keypoints.
  3. It then matches small, full-resolution patches from the full-res image, not the full image.

This hybrid approach is the only way to meet both AC-7 (speed) and AC-2 (accuracy). The 6.2K image cannot be processed in <5s on an RTX 2060 (6GB VRAM 34). But its high-resolution pixels are needed for the 20m accuracy. Using full-res patches provides the pixel-level accuracy without the VRAM/compute cost.
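
The patch-crop step of this multi-resolution strategy can be sketched as below. The 1536-wide working copy, the 256px patch size, and the border-clamping policy are all assumptions for illustration; only the 6252x4168 full-res dimensions come from the text.

```python
import numpy as np

def extract_fullres_patch(image_hr, kp_lr, scale, patch=256):
    """Map a keypoint found on the low-res image into the 6.2K image and
    crop a small full-resolution patch around it (clamped to bounds)."""
    x_hr, y_hr = int(kp_lr[0] * scale), int(kp_lr[1] * scale)
    h, w = image_hr.shape[:2]
    half = patch // 2
    x0 = min(max(x_hr - half, 0), w - patch)
    y0 = min(max(y_hr - half, 0), h - patch)
    return image_hr[y0:y0 + patch, x0:x0 + patch], (x0, y0)

# 6252x4168 full-res frame vs an assumed 1536-wide working copy.
image_hr = np.zeros((4168, 6252), dtype=np.uint8)
scale = 6252 / 1536
patch, origin = extract_fullres_patch(image_hr, kp_lr=(100.0, 50.0), scale=scale)
```

Only these small crops ever touch the GPU at full resolution, which is the whole point: pixel-level accuracy from the 6.2K frame without its VRAM footprint.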

A PnP/RANSAC solver then computes a high-confidence 6-DoF pose. This pose, converted to [Lat, Lon, Alt], is the Absolute_Metric_Anchor sent to the TOH.
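
The geometry of this pose solve can be illustrated with a plain DLT estimator on synthetic, outlier-free 2D-3D correspondences. This is a sketch only: a production system would use a calibrated PnP solver inside RANSAC (e.g., OpenCV's solvePnPRansac), and all numbers below are made up.

```python
import numpy as np

def dlt_pnp(points_3d, points_2d):
    """Direct Linear Transform: estimate a 3x4 projection matrix from
    >= 6 correspondences by solving the homogeneous system via SVD."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        Xh = [X, Y, Z, 1.0]
        rows.append([*Xh, 0, 0, 0, 0, *[-u * c for c in Xh]])
        rows.append([0, 0, 0, 0, *Xh, *[-v * c for c in Xh]])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)  # null-space vector, up to scale

def project(P, pts):
    ph = (P @ np.c_[pts, np.ones(len(pts))].T).T
    return ph[:, :2] / ph[:, 2:3]

rng = np.random.default_rng(1)
K = np.array([[1200.0, 0, 768], [0, 1200.0, 512], [0, 0, 1]])
Rt = np.c_[np.eye(3), np.array([5.0, -2.0, 30.0])]  # synthetic camera pose
P_true = K @ Rt
pts3d = rng.uniform([-50, -50, 80], [50, 50, 120], size=(20, 3))
pts2d = project(P_true, pts3d)
P_est = dlt_pnp(pts3d, pts2d)
err = np.abs(project(P_est, pts3d) - pts2d).max()
```

With clean correspondences the reprojection residual is numerically zero; in the real pipeline, RANSAC supplies the inlier set this solve assumes.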

6.0 Component 5: The Scale-Aware Trajectory Optimization Hub (TOH)

This component is the system's "brain" and implements the "Continuously-Scaled Trajectory" principle (Section 2.1). It replaces the draft's flawed "Single Scale" optimizer.

6.1 The Sim(3) Pose-Graph as the Optimization Backbone

The central challenge of IMU-denied monocular SLAM is scale drift.11 The V-SLAM (Component 3) produces 6-DoF poses, but they are unscaled (SE(3)). The GAB (Component 4) produces metric 6-DoF poses (SE(3)).

The solution is to optimize the entire graph in the 7-DoF "Similarity" group, $Sim(3)$.11 This adds a 7th degree of freedom (scale, s) to the poses. The optimization backbone will be Ceres Solver 14, a SOTA C++ library for large, complex non-linear least-squares problems.

6.2 Advanced Scale-Drift Correction: Modeling Scale as a Per-Keyframe Parameter (Solves Flaw 1.3)

This is the core of the ASTRAL optimizer, solving Flaw 1.3. The draft's flawed model (Pose_Graph(Fragment_i) = {Pose_1, ..., Pose_n, s_i}) is replaced by ASTRAL's correct model: Pose_Graph = {(Pose_1, s_1), (Pose_2, s_2), ..., (Pose_N, s_N)}.

The graph is constructed as follows:

  • Nodes: Each keyframe pose is a 7-DoF Sim(3) variable {s_k, R_k, t_k}.
  • Edge 1 (V-SLAM): A relative Sim(3) constraint between Pose_k and Pose_{k+1} from the V-SLAM Front-End.
  • Edge 2 (GAB): An absolute SE(3) constraint on Pose_j from a GAB anchor. This constraint fixes the 6-DoF pose (R_j, t_j) to the metric GAB value and fixes its scale s_j = 1.0.

This "per-keyframe scale" model 15 enables "elastic" trajectory refinement. When the graph is a long, unscaled "chain" of V-SLAM constraints, a GAB anchor (Edge 2) arrives at Pose_{100}, "nailing" it to the metric map and setting s_{100} = 1.0. As the V-SLAM continues, scale drifts. When a second anchor arrives at Pose_{200} (setting s_{200} = 1.0), the Ceres optimizer 14 has a problem: the V-SLAM data between them has drifted.

The ASTRAL model allows the optimizer to solve for all intermediate scales (s_{101}, s_{102},..., s_{199}) as variables. The optimizer will find a smooth, continuous scale correction 15 that "elastically" stretches/shrinks the 100-frame sub-segment to perfectly fit both metric anchors. This correctly models the physics of scale drift 12 and is the only way to achieve the 20m accuracy (AC-2) and 1.0px MRE (AC-10).

6.3 Robust M-Estimation (Solution for AC-3, AC-5)

A 350m outlier (AC-3) or a bad GAB match (AC-5) will add a constraint with a massive error. A standard least-squares optimizer 14 would be catastrophically corrupted, pulling the entire 3000-image trajectory off course to fit this single bad point.

This is a solved problem. All constraints (V-SLAM and GAB) must be wrapped in a Robust Loss Function (e.g., HuberLoss, CauchyLoss) within Ceres Solver. This function mathematically down-weights the influence of constraints with large errors (high residuals). It effectively tells the optimizer: "This measurement is insane. Ignore it." This provides automatic, graceful outlier rejection, meeting AC-3 and AC-5.
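
The down-weighting mechanism can be demonstrated with a small NumPy IRLS loop, the iterative equivalent of what Ceres applies per-constraint via HuberLoss. The anchor values are synthetic, and estimating a scalar "mean position" stands in for the full pose-graph solve.

```python
import numpy as np

def huber_weight(r, delta=1.0):
    """Huber M-estimator weight: 1 near zero, delta/|r| in the tails,
    so high-residual constraints are automatically down-weighted."""
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / a)

def robust_mean(measurements, iters=50, delta=1.0):
    """Iteratively Reweighted Least Squares with a Huber kernel."""
    est = np.median(measurements)          # cheap robust initialization
    for _ in range(iters):
        w = huber_weight(measurements - est, delta)
        est = np.sum(w * measurements) / np.sum(w)
    return est

# Ten good GAB anchors near 100 m, plus one insane 350 m outlier (AC-3).
anchors = np.array([99.0, 100.5, 100.1, 99.8, 100.2, 99.9,
                    100.0, 100.3, 99.7, 100.4, 350.0])
naive = anchors.mean()      # the outlier drags the plain mean ~23 m off
robust = robust_mean(anchors)
```

The outlier's weight collapses to roughly delta/250, so it contributes almost nothing; no manual gating or thresholding is required, which is the "automatic, graceful" rejection claimed above.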

6.4 Geodetic Map-Merging (Solution for AC-4, AC-6)

This mechanism is the robust solution to the "sharp turn" (AC-4) problem.

  • Scenario: The UAV makes a sharp turn (AC-4). The V-SLAM (4.2) creates Map_Fragment_0 and Map_Fragment_1. The TOH's graph now has two disconnected components.
  • Mechanism (Geodetic Merging):
    1. The TOH queues the GAB (Section 5.0) to find anchors for both fragments.
    2. GAB returns Anchor_A for Map_Fragment_0 and Anchor_B for Map_Fragment_1.
    3. The TOH adds both of these as absolute, metric constraints (Edge 2) to the single global pose-graph.
    4. The Ceres optimizer 14 now has all the information it needs. It solves for the 7-DoF poses of both fragments, placing them in their correct, globally-consistent metric positions.

The two fragments are merged geodetically (by their global coordinates 11) even if they never visually overlap. This is a vastly more robust solution to AC-4 and AC-6 than simple visual loop closure.
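
The geodetic placement of an unanchored fragment can be illustrated in closed form with Umeyama's similarity alignment, given the metric anchor positions of a few of its keyframes. In ASTRAL this happens inside the pose-graph optimization rather than as a separate solve, so treat this as a sketch of the underlying geometry only; all coordinates are synthetic.

```python
import numpy as np

def umeyama_sim3(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ~ s*R@src + t
    (Umeyama's method): aligns an unscaled fragment to metric anchors."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / xs.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Four keyframes of a fragment in its arbitrary local frame, and the
# metric positions the GAB reports for them (synthetic ground truth).
rng = np.random.default_rng(2)
local = rng.uniform(-1, 1, (4, 3))
s_true, t_true = 25.0, np.array([500.0, -300.0, 120.0])
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
anchors = (s_true * (R_true @ local.T)).T + t_true
s, R, t = umeyama_sim3(local, anchors)
```

Note that the fragment is placed purely from global coordinates; no visual overlap with any other fragment is needed, which is the point of geodetic merging.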

7.0 Performance, Deployment, and High-Accuracy Outputs

7.1 Meeting the <5s Budget (AC-7): Mandatory Acceleration with NVIDIA TensorRT

The system must run on an RTX 2060 (AC-7). This is a low-end, 6GB VRAM card 34, which is a severe constraint. Running three deep-learning models (SuperPoint, LightGlue, SALAD/MixVPR) plus a Ceres optimizer 38 will saturate this hardware.

  • Solution 1: Multi-Scale Pipeline. As defined in 5.3, the system never processes a full 6.2K image on the GPU. It uses low-res for V-SLAM/GAB-Coarse and high-res patches for GAB-Fine.
  • Solution 2: Mandatory TensorRT Deployment. Running these models in their native PyTorch framework will be too slow. All neural networks (SuperPoint, LightGlue, SALAD/MixVPR) must be converted from PyTorch into optimized NVIDIA TensorRT engines. Research specifically on accelerating LightGlue shows this provides "2x-4x speed gains over compiled PyTorch".35 This 2-4x speedup is not an optimization; it is a mandatory deployment step to make the <5s (AC-7) budget possible on an RTX 2060.

7.2 High-Accuracy Object Geolocalization via Ray-Cloud Intersection (Solves AC-2/AC-10)

The user must be able to find the GPS of an object in a photo. A simple approach of ray-casting from the camera and intersecting with the 30m GLO-30 DEM 2 is fatally flawed. The DEM error itself can be up to 30m 19, making AC-2 impossible.

The ASTRAL system uses a Ray-Cloud Intersection method that decouples object accuracy from external DEM accuracy.

  • Algorithm:
    1. The user clicks pixel (u,v) on Image_N.
    2. The system retrieves the final, refined, metric 7-DoF pose P_{sim(3)} = (s, R, T) for Image_N from the TOH.
    3. It also retrieves the V-SLAM's local, high-fidelity 3D point cloud (P_{local_cloud}) from Component 3 (Section 4.3).
    4. Step 1 (Local): The pixel (u,v) is un-projected into a ray. This ray is intersected with the local P_{local_cloud}. This finds the 3D point P_{local} relative to the V-SLAM map. The accuracy of this step is defined by AC-10 (MRE < 1.0px).
    5. Step 2 (Global): This highly-accurate local point P_{local} is transformed into the global metric coordinate system using the highly-accurate refined pose from the TOH: P_{metric} = s * (R * P_{local}) + T.
    6. Step 3 (Convert): P_{metric} (an X,Y,Z world coordinate) is converted to [Latitude, Longitude, Altitude].
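
The three steps can be sketched as follows. The intrinsics, the (camera-frame) point cloud, and the world offset T are hypothetical numbers; the nearest-point heuristic stands in for a proper intersection (k-d tree, depth ordering), and the final [Lat, Lon, Alt] conversion of Step 3 is omitted for brevity.

```python
import numpy as np

def pixel_to_ray(u, v, K):
    """Step 1a: un-project a pixel into a unit viewing ray (camera frame)."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return d / np.linalg.norm(d)

def ray_cloud_intersect(ray, cloud, max_perp=0.5):
    """Step 1b: pick the cloud point closest (perpendicularly) to the ray."""
    t = cloud @ ray                       # projection of each point onto ray
    perp = np.linalg.norm(cloud - np.outer(t, ray), axis=1)
    perp[t <= 0] = np.inf                 # ignore points behind the camera
    idx = np.argmin(perp)
    return cloud[idx] if perp[idx] <= max_perp else None

K = np.array([[1000.0, 0, 960], [0, 1000.0, 640], [0, 0, 1]])
cloud = np.array([[0.0, 0.0, 50.0],      # sparse V-SLAM cloud, camera frame
                  [10.0, 5.0, 60.0],
                  [-8.0, 2.0, 55.0]])
ray = pixel_to_ray(960, 640, K)          # click at the principal point
p_local = ray_cloud_intersect(ray, cloud)

# Step 2 (Global): P_metric = s * (R @ P_local) + T, using the refined
# Sim(3) pose from the TOH (identity rotation / unit scale here).
s, R = 1.0, np.eye(3)
T = np.array([3700.0, 5100.0, 150.0])    # illustrative world offset
p_metric = s * (R @ p_local) + T
```

Because p_local comes from the V-SLAM's own cloud, its accuracy is governed by AC-10, not by any external DEM, which is the isolation property claimed below.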

This method correctly isolates error. The object's accuracy is now only dependent on the V-SLAM's internal geometry (AC-10) and the TOH's global pose accuracy (AC-1, AC-2). It completely eliminates the external 30m DEM error 2 from this critical, high-accuracy calculation.

7.3 Failure Mode Escalation Logic (Meets AC-3, AC-4, AC-6, AC-9)

The system is built on a robust state machine to handle real-world failures.

  • Stage 1: Normal Operation (Tracking): V-SLAM tracks, TOH optimizes.
  • Stage 2: Transient VO Failure (Outlier Rejection):
    • Condition: Image_N is a 350m outlier (AC-3) or severe blur.
    • Logic: V-SLAM fails to track Image_N. System discards it (AC-5). Image_N+1 arrives, V-SLAM re-tracks.
    • Result: AC-3 Met.
  • Stage 3: Persistent VO Failure (New Map Initialization):
    • Condition: "Sharp turn" (AC-4) or >5 frames of tracking loss.
    • Logic: V-SLAM (Section 4.2) declares "Tracking Lost." Initializes new Map_Fragment_k+1. Tracking resumes instantly.
    • Result: AC-4 Met. System "correctly continues the work." The >95% registration rate (AC-9) is met because this is not a failure, it's a new registration.
  • Stage 4: Map-Merging & Global Relocalization (GAB-Assisted):
    • Condition: System is on Map_Fragment_k+1, Map_Fragment_k is "lost."
    • Logic: TOH (Section 6.4) receives GAB anchors for both fragments and geodetically merges them in the global optimizer.14
    • Result: AC-6 Met (strategy to connect separate chunks).
  • Stage 5: Catastrophic Failure (User Intervention):
    • Condition: System is in Stage 3 (Lost) and the GAB has failed for 20% of the route. The "absolutely incapable" scenario (AC-6).
    • Logic: TOH triggers the AC-6 flag. UI prompts user: "Please provide a coarse location for the current image."
    • Action: This user-click is not taken as ground-truth. It is fed to the GAB (Section 5.0) as a strong spatial prior, narrowing its Stage 1 8 search from "the entire AOI" to "a 5km radius." This guarantees the GAB finds a match, which triggers Stage 4, re-localizing the system.
    • Result: AC-6 Met (user input).
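
The prior-narrowing action of Stage 5 can be sketched with synthetic tile centers in a local planar frame (meters). Only the 5km radius follows the text; the AOI size, tile count, and coordinates are illustrative, and a real implementation would apply this mask to the FAISS candidate set.

```python
import numpy as np

rng = np.random.default_rng(3)
tile_xy = rng.uniform(0, 50_000, (2000, 2))      # tile centers in a 50x50 km AOI
user_click = np.array([12_000.0, 34_000.0])      # coarse user-provided location

# The click is a prior, not ground truth: it only restricts the Stage 1
# retrieval from "the entire AOI" to tiles within a 5 km radius.
dist = np.linalg.norm(tile_xy - user_click, axis=1)
candidate_ids = np.flatnonzero(dist <= 5_000.0)
```

Shrinking the search set by roughly 30x both speeds up retrieval and removes the ambiguous far-away tiles that caused the GAB to fail in the first place.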

8.0 ASTRAL Validation Plan and Acceptance Criteria Matrix

A comprehensive test plan is required to validate compliance with all 10 Acceptance Criteria. The foundation is a Ground-Truth Test Harness using project-provided ground-truth data.

Table 4: ASTRAL Component vs. Acceptance Criteria Compliance Matrix

| ID | Requirement | ASTRAL Solution (Component) | Key Technology / Justification |
|---|---|---|---|
| AC-1 | 80% of photos < 50m error | GDB (C-1) + GAB (C-4) + TOH (C-5) | Tier-1 (Copernicus) data 1 is sufficient. SOTA VPR 8 + Sim(3) graph 13 can achieve this. |
| AC-2 | 60% of photos < 20m error | GDB (C-1) + GAB (C-4) + TOH (C-5) | Requires Tier-2 (Commercial) data.4 Mitigates reference error.3 Per-Keyframe Scale 15 model in TOH minimizes drift error. |
| AC-3 | Robust to 350m outlier | V-SLAM (C-3) + TOH (C-5) | Stage 2 Failure Logic (7.3) discards the frame. Robust M-Estimation (6.3) in Ceres 14 automatically rejects the constraint. |
| AC-4 | Robust to sharp turns (<5% overlap) | V-SLAM (C-3) + TOH (C-5) | "Atlas" Multi-Map (4.2) initializes a new map (Map_Fragment_k+1). Geodetic Map-Merging (6.4) in TOH re-connects fragments via GAB anchors. |
| AC-5 | < 10% outlier anchors | TOH (C-5) | Robust M-Estimation (Huber Loss) (6.3) in Ceres 14 automatically down-weights and ignores high-residual (bad) GAB anchors. |
| AC-6 | Connect route chunks; user input | V-SLAM (C-3) + TOH (C-5) + UI | Geodetic Map-Merging (6.4) connects chunks. Stage 5 Failure Logic (7.3) provides the user-input-as-prior mechanism. |
| AC-7 | < 5 seconds processing/image | All components | Multi-Scale Pipeline (5.3) (low-res V-SLAM, hi-res GAB patches). Mandatory TensorRT acceleration (7.1) for a 2-4x speedup.35 |
| AC-8 | Real-time stream + async refinement | TOH (C-5) + Outputs (Section 2.4) | Decoupled architecture provides Pose_N_Est (V-SLAM) in real-time and Pose_N_Refined (TOH) asynchronously as GAB anchors arrive. |
| AC-9 | Image registration rate > 95% | V-SLAM (C-3) | "Atlas" Multi-Map (4.2). A "lost track" (AC-4) is not a registration failure; it is a new map registration. This keeps the rate > 95%. |
| AC-10 | Mean Reprojection Error (MRE) < 1.0px | V-SLAM (C-3) + TOH (C-5) | Local BA (4.3) + Global BA (TOH) 14 + Per-Keyframe Scale (6.2) minimizes internal graph tension (Flaw 1.3), allowing the optimizer to converge to a low MRE. |

8.1 Rigorous Validation Methodology

  • Test Harness: A validation script will be created to compare the system's Pose_N^{Refined} output against a ground-truth coordinates.csv file, computing Haversine distance errors.
  • Test Datasets:
    • Test_Baseline: Standard flight.
    • Test_Outlier_350m (AC-3): A single, unrelated image inserted.
    • Test_Sharp_Turn_5pct (AC-4): A sequence with a 10-frame gap.
    • Test_Long_Route (AC-9, AC-7): A 2000-image sequence.
  • Test Cases:
    • Test_Accuracy: Run Test_Baseline. ASSERT (count(errors < 50m) / total) >= 0.80 (AC-1). ASSERT (count(errors < 20m) / total) >= 0.60 (AC-2).
    • Test_Robustness: Run Test_Outlier_350m and Test_Sharp_Turn_5pct. ASSERT system completes the run and Test_Accuracy assertions still pass on the valid frames.
    • Test_Performance: Run Test_Long_Route on min-spec RTX 2060. ASSERT average_time(Pose_N^{Est} output) < 5.0s (AC-7).
    • Test_MRE: ASSERT TOH.final_MRE < 1.0 (AC-10).
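
The scoring core of the harness can be sketched directly: a Haversine distance (spherical Earth assumed, R = 6371 km) and the AC-1/AC-2 threshold checks from Test_Accuracy. The sample error list is synthetic; real runs would compute one error per image from coordinates.csv.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two [Lat, Lon] points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def accuracy_checks(errors_m):
    """AC-1 / AC-2 pass-rates on a list of per-image errors (meters)."""
    n = len(errors_m)
    ac1 = sum(e < 50.0 for e in errors_m) / n >= 0.80
    ac2 = sum(e < 20.0 for e in errors_m) / n >= 0.60
    return ac1, ac2

# Sanity check: 0.001 deg of latitude is ~111 m anywhere on the sphere.
d = haversine_m(48.0, 35.0, 48.001, 35.0)
ok1, ok2 = accuracy_checks([5.0, 12.0, 18.0, 30.0, 45.0, 70.0, 8.0, 15.0, 25.0, 10.0])
```

The same two functions back all four test cases: Test_Accuracy asserts on the pass-rates directly, and the robustness tests rerun them on the valid frames only.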