mirror of
https://github.com/azaion/gps-denied-desktop.git
synced 2026-04-22 22:16:36 +00:00
615 lines
51 KiB
Markdown
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point | New Solution |
| --- | --- | --- |
| No explicit lens undistortion in preprocessing | **Functional (Moderate)**: Draft05 "rectify" step doesn't include undistortion using camera calibration (K, distortion coefficients). Lens distortion causes 5-20px errors at image edges for wide-angle UAV cameras, degrading feature matching and homography estimation. | Add cv2.undistort() after image loading, before downscaling. Uses K matrix and distortion coefficients from camera_params. Cost: ~5-10ms. |
| GSD assumes nadir camera (no tilt correction) | **Functional (Moderate)**: Camera is "not autostabilized." During turns (10-30° bank), GSD error is 1.5-15.5%. At 18° tilt, >5% error. Propagates directly to VO position estimates. | Extract tilt angle θ from existing homography decomposition R matrix. Apply GSD_corrected = GSD_nadir / cos(θ). Zero additional computation cost. |
| DINOv2 coarse retrieval uses average pooling | **Performance (Moderate)**: Average pooling is the weakest aggregation method. GeM pooling adds +20pp R@1 on VPR benchmarks. SALAD adds another +12pp. Directly impacts satellite tile retrieval success rate. | Replace with GeM (Generalized Mean) pooling: a one-line change with zero overhead. Document SALAD as future enhancement if needed. |
| Pipeline claims VO/satellite "overlap" on single GPU | **Functional (Low)**: Compute-bound DNN models saturate the GPU; CUDA streams cannot achieve true parallelism on a single GPU (confirmed by PyTorch/CUDA docs). | Clarify: sequential GPU execution (VO first, then satellite). Async Python delivers satellite results while next frame's data is prepared on CPU. Honest throughput: ~450ms/frame. |
| python-jose for JWT | **Security (Critical)**: Unmaintained ~2 years. Multiple CVEs (DER confusion, timing side-channels). Okta and community recommend migration. | Replace with PyJWT ≥2.10.0. Drop-in replacement for JWT verification/signing. |
| Pillow ≥11.3.0 | **Security (High)**: CVE-2026-25990 (PSD out-of-bounds write) affects versions <12.1.1. | Upgrade pin to Pillow ≥12.1.1. |
| aiohttp unversioned | **Security (High)**: 7 CVEs (zip bomb DoS, large payload DoS, request smuggling). | Pin aiohttp ≥3.13.3. |
| h11 unversioned (uvicorn dependency) | **Security (Critical)**: CVE-2025-43859 (CVSS 9.1, HTTP request smuggling via h11). | Pin h11 ≥0.16.0. |
| ONNX Runtime unversioned | **Security (High)**: AIKIDO-2026-10185 (path traversal in external data loading). | Pin ONNX Runtime ≥1.24.1. |
| ENU coordinates centered on starting GPS | **Functional (Low)**: ENU flat-Earth approximation accurate only within ~4km. Flights cover 50-150km. At 50km, error ~12.5m. | Replace with UTM coordinates via pyproj. Auto-select UTM zone from starting GPS. Accurate for flights up to 360km. |
| No explicit memory management for features | **Performance (Low)**: SuperPoint features from all frames could accumulate to ~6GB RAM for 3000 images if not freed. | Rolling window: discard frame N-1 features after VO matching with frame N. Constant ~2MB feature memory. |
| safetensors header not validated | **Security (Low)**: Metadata RCE under review (Feb 2026). Polyglot/header-bomb attacks possible. | Validate safetensors header size < 10MB before parsing. |

## Product Solution Description

The product is a Python-based GPS-denied visual navigation service that determines GPS coordinates of consecutive UAV photo centers using a hierarchical localization approach: fast visual odometry for frame-to-frame motion, two-stage satellite geo-referencing (coarse retrieval + fine matching) for absolute positioning, and factor graph optimization for trajectory refinement. The system operates as a background REST API service with real-time SSE streaming.

**Core approach**: Consecutive images are undistorted using camera calibration parameters, then matched using SuperPoint+LightGlue (learned features with contextual attention matching, MIR 0.92) to estimate relative motion (visual odometry). This pair was chosen for maximum reliability on low-texture terrain. Camera tilt is extracted from homography decomposition to correct GSD during turns. Each image is geo-referenced against satellite imagery through a two-stage process: DINOv2 ViT-S/14 with GeM-pooled coarse retrieval selects the best-matching satellite tile, then LiteSAM (lightweight semi-dense matcher, 6.31M params) refines the alignment to subpixel precision. LiteSAM achieves 61.65% hit rate in Hard conditions on UAV-VisLoc and 77.3% on the authors' self-made dataset. EfficientLoFTR (CVPR 2024) serves as a proven fallback if LiteSAM is unavailable. A GTSAM iSAM2 factor graph fuses VO constraints (BetweenFactorPose2) and satellite anchors (PriorFactorPose2) in UTM coordinates to produce an optimized trajectory. The system handles route disconnections by treating each continuous VO chain as an independent segment, geo-referenced through satellite matching and connected via the shared WGS84 coordinate frame.

```
┌─────────────────────────────────────────────────────────────────────┐
│  Client (Desktop App)                                               │
│  POST /jobs (start GPS, camera params, image folder)                │
│  GET /jobs/{id}/stream (SSE)                                        │
│  POST /jobs/{id}/anchor (user manual GPS input)                     │
│  POST /jobs/{id}/batch-anchor (batch manual GPS input)              │
│  GET /jobs/{id}/point-to-gps (image_id, pixel_x, pixel_y)           │
└──────────────────────┬──────────────────────────────────────────────┘
                       │ HTTP/SSE (JWT auth via PyJWT)
┌──────────────────────▼──────────────────────────────────────────────┐
│  FastAPI Service Layer                                              │
│  Job Manager → Pipeline Orchestrator → SSE Event Publisher          │
│  (asyncio.Queue-based publisher, heartbeat, Last-Event-ID)          │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────────────┐
│  Processing Pipeline                                                │
│                                                                     │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────────┐    │
│ │ Image        │ │ Visual       │ │ Satellite Geo-Ref          │    │
│ │ Preprocessor │→│ Odometry     │→│ Stage 1: DINOv2-S GeM      │    │
│ │ (undistort,  │ │ (SuperPoint  │ │   retrieval (CPU faiss)    │    │
│ │  downscale,  │ │  + LightGlue │ │ Stage 2: LiteSAM fine      │    │
│ │  rectify)    │ │  ONNX FP16)  │ │   matching (subpixel)      │    │
│ │              │ │              │ │ [fallback: EfficientLoFTR] │    │
│ └──────────────┘ └──────────────┘ └────────────────────────────┘    │
│        │                │                       │                   │
│        ▼                ▼                       ▼                   │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │  GTSAM iSAM2 Factor Graph Optimizer                             │ │
│ │  Pose2 + BetweenFactorPose2 (VO) + PriorFactorPose2 (sat)       │ │
│ │  UTM coordinates → WGS84 output                                 │ │
│ │  [IndeterminantLinearSystemException handling]                  │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│                                  │                                  │
│ ┌────────────────────────────────▼────────────────────────────────┐ │
│ │  Segment Manager                                                │ │
│ │  (drift thresholds, confidence decay, user input triggers)      │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │  Multi-Provider Satellite Tile Cache                            │ │
│ │  (Google Maps + Mapbox + user tiles, session tokens,            │ │
│ │   DEM cache, request budgeting)                                 │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │  Model Weight Manager                                           │ │
│ │  (SHA256 verification, startup validation, fallback chain)      │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```

## Architecture

### Component: Model Weight Manager

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SHA256 checksum + startup validation + fallback chain | hashlib, safetensors, torch | Prevents supply chain attacks. Detects corruption. Auto-fallback on failure. | Adds ~2-5s startup time. | PyTorch ≥2.10.0 | SHA256 per weight file, safetensors where available, header validation | One-time at startup | **Best** |

**Selected**: SHA256 checksum verification with startup validation and safetensors header validation.

**Weight manifest** (stored in config):

| Model | Source | Format | SHA256 | Fallback |
| --- | --- | --- | --- | --- |
| SuperPoint | Official repo | PyTorch | [from repo] | SIFT (OpenCV, no weights) |
| LightGlue ONNX | GitHub release | ONNX | [from release] | LightGlue PyTorch |
| DINOv2 ViT-S/14 | torch.hub / HuggingFace | safetensors (preferred) | [from HuggingFace] | None (required) |
| LiteSAM | Google Drive (pinned link) | .ckpt (pickle) | [compute on first download] | EfficientLoFTR |
| EfficientLoFTR | HuggingFace | PyTorch | [from HuggingFace] | SuperPoint+LightGlue |
| SIFT | OpenCV built-in | N/A | N/A | None |

**Startup sequence**:

1. Verify PyTorch version ≥2.10.0; refuse to start if older
2. For each model in manifest: check file exists → verify SHA256 → load with `weights_only=True` → run inference on reference input → confirm output shape
3. For safetensors files: validate header size < 10MB before parsing
4. If LiteSAM fails: load EfficientLoFTR, log warning
5. If EfficientLoFTR fails: load SuperPoint+LightGlue for satellite matching, log warning
6. If ALL satellite matchers fail: start in VO-only mode, emit `model_degraded` SSE event
7. SuperPoint, LightGlue, and DINOv2 are required; refuse to start without them

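The file-level checks in the startup sequence (existence, SHA256, safetensors header size) could be sketched roughly as follows, using only the standard library. The function names and the `verify_weight_file` helper are hypothetical, not part of the draft. The header check relies on the safetensors layout of an 8-byte little-endian header length followed by a JSON header; model loading and the reference-inference step are deliberately left out.

```python
import hashlib
import json
import struct
from pathlib import Path

MAX_SAFETENSORS_HEADER_BYTES = 10 * 1024 * 1024  # 10MB limit from the startup sequence


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA256 so large weight files never sit fully in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safetensors_header_ok(path: Path, limit: int = MAX_SAFETENSORS_HEADER_BYTES) -> bool:
    """Check the declared header length before reading or parsing the header."""
    with path.open("rb") as fh:
        prefix = fh.read(8)
        if len(prefix) != 8:
            return False
        (header_len,) = struct.unpack("<Q", prefix)  # little-endian u64
        if header_len == 0 or header_len >= limit:
            return False
        header = fh.read(header_len)
    if len(header) != header_len:
        return False
    try:
        return isinstance(json.loads(header), dict)
    except ValueError:  # covers invalid UTF-8 and invalid JSON
        return False


def verify_weight_file(path: Path, expected_sha256: str) -> bool:
    """Hypothetical manifest check: file exists, hash matches, header is sane."""
    if not path.is_file():
        return False
    if sha256_of_file(path) != expected_sha256.lower():
        return False
    if path.suffix == ".safetensors":
        return safetensors_header_ok(path)
    return True
```

A caller would run `verify_weight_file` for each manifest entry and move to the listed fallback when it returns `False`.
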
### Component: Image Preprocessor

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Undistort + downscale + rectify + validate | OpenCV undistort/resize, NumPy | Corrects lens distortion. Normalizes input. Consistent memory. Validates before loading. | Loses fine detail in downscaled images. | OpenCV, NumPy, camera calibration params | Magic byte validation, dimension check before load | <20ms per image | **Best** |

**Selected**: Undistort + downscale + rectify + validate pipeline.

**Preprocessing per image**:

1. Validate file: check magic bytes (JPEG/PNG/TIFF), reject unknown formats
2. Read image header only: check dimensions, reject if either > 10,000px
3. Load image via OpenCV (cv2.imread)
4. **Undistort**: apply cv2.undistort() using camera intrinsic matrix K and distortion coefficients (provided in camera_params). Corrects radial and tangential distortion.
5. Downscale to max 1600 pixels on longest edge (preserving aspect ratio)
6. Store original resolution for GSD: `GSD = (effective_altitude × sensor_width) / (focal_length × original_width)` where `effective_altitude = flight_altitude - terrain_elevation` (terrain from Copernicus DEM if available, otherwise flight_altitude directly since terrain can be neglected per restrictions)
7. If estimated heading is available: rotate to approximate north-up for satellite matching
8. If no heading (segment start): pass unrotated
9. Convert to grayscale for feature extraction
10. Output: undistorted, downscaled grayscale image + metadata (original dims, GSD, heading if known, K_undistorted)

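As a small illustration of steps 5 and 6, the sketch below computes the GSD from the formula above and the downscaled image size. The function names are hypothetical, and sensor width and focal length only need to share a unit (millimetres here) because the unit cancels.

```python
def ground_sample_distance(flight_altitude_m, sensor_width_mm, focal_length_mm,
                           original_width_px, terrain_elevation_m=0.0):
    """GSD in metres per pixel, following the formula in step 6."""
    effective_altitude_m = flight_altitude_m - terrain_elevation_m
    if effective_altitude_m <= 0:
        raise ValueError("camera must be above the terrain")
    return (effective_altitude_m * sensor_width_mm) / (focal_length_mm * original_width_px)


def downscaled_size(width_px, height_px, max_edge_px=1600):
    """Target size with the longest edge capped at max_edge_px, aspect ratio preserved."""
    longest = max(width_px, height_px)
    if longest <= max_edge_px:
        return width_px, height_px
    scale = max_edge_px / longest
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# Illustrative values only (not from the draft): 400m altitude, 23.5mm sensor,
# 25mm lens, 6000px-wide image.
gsd = ground_sample_distance(400.0, 23.5, 25.0, 6000)
size = downscaled_size(6000, 4000)
```

The GSD is computed against the original width, which is why step 6 stores the original resolution before the image is downscaled.
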
### Component: Feature Extraction

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SuperPoint (for VO) | superpoint (PyTorch) | Learned features, robust to viewpoint/illumination. 256-dim descriptors. High MIR (0.92 with LightGlue), giving the best reliability on low-texture terrain. | Not rotation-invariant. Slower than XFeat. | NVIDIA GPU, PyTorch, CUDA | Model weights from official source | ~80ms GPU | **Best for VO** |
| LiteSAM (for satellite matching) | LiteSAM (PyTorch) | Best hit rate on satellite-aerial benchmarks. 6.31M params. Subpixel refinement via MinGRU. End-to-end semi-dense matcher. UAV-VisLoc Hard: 61.65%. Self-made: 77.3%. | Not rotation-invariant. No ONNX. Immature repo (5 stars). | PyTorch, NVIDIA GPU | Model weights from Google Drive (SHA256 verified) | ~140-210ms on RTX 2060 (est.) | **Best for satellite** |
| EfficientLoFTR (satellite fallback) | EfficientLoFTR (PyTorch) | CVPR 2024, 964 stars. HuggingFace integration. Proven semi-dense matcher. | 15.05M params (about 2.4x LiteSAM's 6.31M). Slightly lower hit rate. | PyTorch, NVIDIA GPU | HuggingFace | ~150-250ms on RTX 2060 (est.) | **Satellite fallback** |
| SIFT (rotation fallback) | OpenCV cv2.SIFT | Rotation-invariant. Scale-invariant. Proven SIFT+LightGlue hybrid for UAV mosaicking (ISPRS 2025). | Slower. Less discriminative in low-texture. | OpenCV | N/A | ~200ms CPU | **Rotation fallback** |

**Selected**: SuperPoint+LightGlue ONNX FP16 for VO (maximum reliability), LiteSAM for satellite fine matching (EfficientLoFTR fallback), SIFT+LightGlue as rotation-heavy fallback.

**VRAM budget**:

| Model | VRAM | Loaded When |
| --- | --- | --- |
| SuperPoint | ~400MB | Always (VO every frame) |
| LightGlue ONNX FP16 | ~500MB | Always (VO every frame) |
| DINOv2 ViT-S/14 | ~300MB | Satellite coarse retrieval |
| LiteSAM (6.31M params) | ~400MB | Satellite fine matching |
| **Peak total** | **~1.6GB** | Satellite matching phase |
| EfficientLoFTR (if fallback) | ~600MB | Replaces LiteSAM slot |
| **Peak with fallback** | **~1.8GB** | Satellite matching phase |

**Memory management**: After VO matching between frame N and frame N-1, discard frame N-1's SuperPoint keypoints and descriptors from GPU and CPU memory. Only the current frame's features are retained for the next VO iteration. This keeps feature memory constant at ~2MB regardless of flight length.

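The rolling-window rule above can be illustrated with a minimal sketch. The class name and the `extract` / `match` callables are hypothetical stand-ins for SuperPoint extraction and LightGlue matching; the point is only that a single reference to the previous frame's features is held and then overwritten.

```python
from typing import Any, Callable, Optional


class RollingFeatureWindow:
    """Keeps features for the previous frame only."""

    def __init__(self, extract: Callable[[Any], Any], match: Callable[[Any, Any], Any]):
        self._extract = extract
        self._match = match
        self._previous: Optional[Any] = None

    def process(self, image: Any) -> Optional[Any]:
        current = self._extract(image)
        matches = None
        if self._previous is not None:
            matches = self._match(self._previous, current)
        # Overwriting the only reference lets frame N-1's features be freed.
        self._previous = current
        return matches
```

With real GPU tensors, releasing the Python reference is what makes the memory reclaimable; whether the allocator returns it to the device immediately depends on the framework.
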
### Component: Feature Matching

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SuperPoint+LightGlue ONNX FP16 (VO) | SuperPoint + LightGlue-ONNX | Highest match quality (MIR 0.92). LightGlue attention disambiguates repetitive patterns. Best reliability on low-texture terrain. FP16 on Turing. | Not rotation-invariant. ~130-180ms total (extraction + matching). | PyTorch, ONNX Runtime ≥1.24.1, NVIDIA GPU | Model weights from official source | ~130-180ms on RTX 2060 | **Best for VO** |
| LiteSAM (satellite fine matching) | LiteSAM (PyTorch) | Best hit rate on satellite-aerial benchmarks (61.65% Hard on UAV-VisLoc, 77.3% on self-made). 6.31M params. Subpixel refinement. | Not rotation-invariant. No ONNX. | PyTorch, NVIDIA GPU | SHA256 verified weights | ~140-210ms on RTX 2060 (est.) | **Best for satellite** |
| EfficientLoFTR (satellite fallback) | EfficientLoFTR (PyTorch) | Proven base architecture. CVPR 2024. Reliable. | Slightly lower hit rate than LiteSAM. More params. | PyTorch, NVIDIA GPU | HuggingFace | ~150-250ms on RTX 2060 (est.) | **Satellite fallback** |
| SIFT+LightGlue (rotation fallback) | OpenCV SIFT + LightGlue | SIFT rotation invariance + LightGlue contextual matching. Proven superior for high-rotation UAV (ISPRS 2025). | Slower than XFeat. | OpenCV + ONNX Runtime | N/A | ~250ms total | **Rotation fallback** |

**Selected**: SuperPoint+LightGlue ONNX FP16 for VO, LiteSAM for satellite fine matching (EfficientLoFTR fallback), SIFT+LightGlue as rotation fallback.

**Satellite fine matcher fallback chain**: LiteSAM → EfficientLoFTR → SIFT+LightGlue

### Component: Visual Odometry (Consecutive Frame Matching)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Homography VO with essential matrix fallback + tilt-corrected GSD | OpenCV findHomography (USAC_MAGSAC), findEssentialMat, decomposeHomographyMat | Homography: optimal for flat terrain. Essential matrix: non-planar fallback. Known altitude resolves scale. Tilt extracted from R corrects GSD during turns. | Homography assumes planar. 4-way decomposition ambiguity. | OpenCV, NumPy | N/A | ~5ms for estimation | **Best** |

**Selected**: Homography VO with essential matrix fallback, tilt-corrected GSD, and DEM terrain correction.

**VO Pipeline per frame**:

1. Extract SuperPoint features from current image (~80ms)
2. Match with previous image using LightGlue ONNX FP16 (~50-100ms)
3. **Discard previous frame's SuperPoint features** from memory (rolling window)
4. **Triple failure check**: match count ≥ 30 AND RANSAC inlier ratio ≥ 0.4 AND motion magnitude consistent with the expected inter-frame distance (nominally 100m with a ±250m tolerance, so displacements above 350m are rejected)
5. If checks pass → estimate homography (cv2.findHomography with USAC_MAGSAC, confidence 0.999, max iterations 2000)
6. If RANSAC inlier ratio < 0.6 → additionally estimate essential matrix as quality check
7. **Decomposition disambiguation** (4 solutions from decomposeHomographyMat):
   a. Filter by positive depth: triangulate 5 matched points, reject if behind camera
   b. Filter by plane normal: normal z-component > 0.5 (downward camera → ground plane normal points up)
   c. If previous direction available: prefer solution consistent with expected motion
   d. Orthogonality check: verify R^T R ≈ I (Frobenius norm < 0.01). If failed, re-orthogonalize via SVD: U,S,V = svd(R), R_clean = U @ V^T
   e. First frame pair in segment: use filters a+b only
8. **Tilt-corrected GSD**: Extract camera tilt angle θ from rotation matrix R (pitch/roll relative to nadir). Correction: `GSD_corrected = GSD_nadir / cos(θ)`. For first frame in segment (no R available), use GSD_nadir (assume straight flight). Terrain correction: if Copernicus DEM available at estimated position → `effective_altitude = flight_altitude - terrain_elevation` → `GSD_nadir = (effective_altitude × sensor_width) / (focal_length × original_image_width)`. If DEM unavailable, use flight_altitude directly (terrain negligible per restrictions).
9. Convert pixel displacement to meters: `displacement_m = displacement_px × GSD_corrected`
10. Update position: `new_pos = prev_pos + rotation @ displacement_m`
11. Track cumulative heading for image rectification
12. If triple failure check fails → trigger segment break

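Steps 4, 8 and 9 of the VO pipeline reduce to a few lines of arithmetic, sketched below under stated assumptions. The function names are hypothetical, and the motion check reads the tolerance as an absolute deviation from the 100m nominal distance, which is one plausible interpretation of the draft.

```python
import math


def passes_triple_check(match_count, inlier_ratio, displacement_m,
                        min_matches=30, min_inlier_ratio=0.4,
                        expected_m=100.0, tolerance_m=250.0):
    """Step 4: all three conditions must hold, otherwise the segment breaks."""
    plausible_motion = abs(displacement_m - expected_m) <= tolerance_m
    return (match_count >= min_matches
            and inlier_ratio >= min_inlier_ratio
            and plausible_motion)


def tilt_corrected_gsd(gsd_nadir, tilt_deg):
    """Step 8: GSD_corrected = GSD_nadir / cos(theta)."""
    if not 0.0 <= tilt_deg < 90.0:
        raise ValueError("tilt must be in [0, 90) degrees")
    return gsd_nadir / math.cos(math.radians(tilt_deg))


def displacement_meters(dx_px, dy_px, gsd_corrected):
    """Step 9: convert a pixel displacement to metres."""
    return dx_px * gsd_corrected, dy_px * gsd_corrected
```

For example, `tilt_corrected_gsd(1.0, 18.0)` is about 1.05, matching the ">5% error at 18° tilt" figure in the assessment table.
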
### Component: Satellite Image Geo-Referencing (Two-Stage)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Stage 1: DINOv2 ViT-S/14 GeM retrieval | dinov2 ViT-S/14 (PyTorch), faiss (CPU) | Fast (50ms). 300MB VRAM. GeM pooling captures spatial layout better than average pooling (+20pp retrieval recall on VPR benchmarks). Semantic matching robust to seasonal change. | Coarse only (~tile-level). Lower precision than ViT-B/ViT-L. | PyTorch, faiss-cpu | Model weights from official source | ~50ms extract + <1ms search | **Best coarse** |
| Stage 2: LiteSAM fine matching | LiteSAM (PyTorch) | Best satellite-aerial hit rate (61.65% Hard on UAV-VisLoc, 77.3% on self-made). Subpixel accuracy via MinGRU. 6.31M params, ~400MB VRAM. End-to-end semi-dense matching. | Not rotation-invariant. No ONNX. Immature codebase. | PyTorch, NVIDIA GPU | SHA256 verified weights | ~140-210ms on RTX 2060 (est.) | **Best fine** |
| Stage 2 fallback: EfficientLoFTR | EfficientLoFTR (PyTorch) | CVPR 2024. Mature. HuggingFace. LiteSAM's base architecture. | 15.05M params. ~600MB VRAM. | PyTorch, NVIDIA GPU | HuggingFace weights | ~150-250ms on RTX 2060 (est.) | **Fine fallback** |

**Selected**: Two-stage hierarchical matching: DINOv2 GeM-pooled coarse retrieval + LiteSAM fine matching (EfficientLoFTR fallback).

**DINOv2 aggregation**: Replace spatial average pooling with Generalized Mean (GeM) pooling: `gem(x, p=3) = (mean(x^p))^(1/p)`. GeM emphasizes discriminative patch activations, providing +20pp retrieval recall over average pooling on VPR benchmarks. Zero additional latency. Future enhancement: add SALAD optimal transport aggregation (additional +12pp, <3ms overhead) if GeM retrieval recall proves insufficient.

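To make the GeM formula concrete, here is a dependency-free sketch that pools a list of patch-token vectors into one descriptor. It is written in plain Python for readability; a real pipeline would apply the same operation to a tensor. Clamping activations to a small `eps` before raising them to the power `p` is a common GeM convention and is an assumption here, as are the function names.

```python
def gem_pool(patch_tokens, p=3.0, eps=1e-6):
    """patch_tokens: list of per-patch feature vectors of equal length.
    Returns one descriptor with (mean(x^p))^(1/p) applied per channel."""
    if not patch_tokens:
        raise ValueError("need at least one patch token")
    dim = len(patch_tokens[0])
    pooled = []
    for c in range(dim):
        # Clamp so fractional powers stay real; activations can be negative.
        powered = [max(tok[c], eps) ** p for tok in patch_tokens]
        pooled.append((sum(powered) / len(powered)) ** (1.0 / p))
    return pooled


def l2_normalize(vec):
    """Unit-normalize so inner product equals cosine similarity."""
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm > 0 else list(vec)
```

With `p=1` this reduces to average pooling; larger `p` weights strong activations more heavily, which is the behavior the paragraph above describes.
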
**Satellite Matching Pipeline**:

1. Estimate approximate position from VO
2. **Stage 1: Coarse retrieval**:
   a. Define search area: 500m radius around VO estimate (expand to 1km if segment just started or drift > 100m)
   b. Pre-compute DINOv2 ViT-S/14 GeM-pooled embeddings for all satellite tiles in search area. Method: extract patch tokens (not CLS), apply GeM pooling to get a single descriptor per tile. Cache embeddings.
   c. Extract DINOv2 ViT-S/14 GeM-pooled embedding from UAV image (same pooling)
   d. Find top-5 most similar satellite tiles using faiss (CPU) cosine similarity
3. **Stage 2: Fine matching** (on top-5 tiles, stop on first good match):
   a. Warp UAV image to approximate nadir view using estimated camera pose
   b. **Rotation handling**:
      - If heading known: single attempt with rectified image
      - If no heading (segment start): try 4 rotations {0°, 90°, 180°, 270°}
   c. Run LiteSAM (or EfficientLoFTR fallback) on (uav_warped, sat_tile) → semi-dense correspondences with subpixel accuracy
   d. **Geometric validation**: require ≥15 inliers, inlier ratio ≥ 0.3, reprojection error < 3px
   e. If valid: estimate homography → transform image center → satellite pixel → WGS84
   f. Report: absolute position anchor with confidence based on match quality
4. If all 5 tiles fail Stage 2 with LiteSAM/EfficientLoFTR:
   a. Try SIFT+LightGlue on top-3 tiles (rotation-invariant). Trigger: best LiteSAM inlier ratio was < 0.15.
   b. Try zoom level 17 (wider view)
5. If still fails: mark frame as VO-only, reduce confidence, continue

**Satellite matching frequency**: Every frame, executed sequentially on GPU after VO completes. Satellite result for frame N is added to the factor graph before frame N+1's VO begins, enabling immediate backward correction.

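The decision rules in the pipeline above (search radius, rotation attempts, geometric validation, stop on first good tile) can be sketched as plain functions. All names are hypothetical, and the candidate tuples stand in for the output of a real matcher.

```python
def search_radius_m(drift_m, segment_just_started):
    """Stage 1a: 500m by default, 1km after a segment start or when drift > 100m."""
    return 1000.0 if segment_just_started or drift_m > 100.0 else 500.0


def rotation_candidates_deg(heading_known):
    """Stage 2b: one attempt with a known heading, four otherwise."""
    return [0] if heading_known else [0, 90, 180, 270]


def is_valid_match(inliers, total_matches, reprojection_error_px):
    """Stage 2d: >= 15 inliers, inlier ratio >= 0.3, reprojection error < 3px."""
    if total_matches <= 0:
        return False
    ratio = inliers / total_matches
    return inliers >= 15 and ratio >= 0.3 and reprojection_error_px < 3.0


def first_valid_tile(candidates):
    """candidates: iterable of (tile_id, inliers, total, reproj_px) in retrieval order.
    Stops at the first tile that passes validation, as Stage 2 specifies."""
    for tile_id, inliers, total, reproj_px in candidates:
        if is_valid_match(inliers, total, reproj_px):
            return tile_id
    return None
```

Returning `None` from `first_valid_tile` corresponds to step 4, where the SIFT+LightGlue and zoom-17 fallbacks are tried.
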
### Component: GTSAM Factor Graph Optimizer

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GTSAM iSAM2 factor graph (Pose2) in UTM | gtsam==4.2 (pip), pyproj | Incremental smoothing. Proper uncertainty propagation. Native BetweenFactorPose2 and PriorFactorPose2. Backward smoothing on new evidence. Python bindings. UTM accurate for any realistic flight range. | C++ backend (pip binary). Learning curve. | gtsam==4.2, NumPy, pyproj | N/A | ~5-10ms incremental update | **Best** |

**Selected**: GTSAM iSAM2 with Pose2 variables in UTM coordinates.

**Coordinate system**: UTM (Universal Transverse Mercator) with zone auto-selected from starting GPS via pyproj. All internal positions computed in UTM meters. Converted to WGS84 for output. UTM is accurate for flights up to 360km within a zone, which covers any realistic flight. Starting GPS → `pyproj.Proj(proj='utm', zone=<zone>, ellps='WGS84')`, where `<zone>` is the UTM zone number derived from the starting longitude.

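Zone auto-selection comes down to the standard 6-degree rule, sketched here without any dependency. The helper names are hypothetical, and the sketch ignores the Norway and Svalbard zone exceptions, which is an assumption that should be checked for the operating area.

```python
def utm_zone(lon_deg):
    """Standard 6-degree UTM zone number in 1..60."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude out of range")
    return min(int((lon_deg + 180.0) // 6) + 1, 60)


def utm_epsg(lat_deg, lon_deg):
    """EPSG code for WGS84 / UTM: 326xx in the northern hemisphere, 327xx in the southern."""
    base = 32600 if lat_deg >= 0 else 32700
    return base + utm_zone(lon_deg)
```

The resulting code can be passed to pyproj, for example `pyproj.Transformer.from_crs("EPSG:4326", f"EPSG:{utm_epsg(lat, lon)}", always_xy=True)`, to convert between WGS84 and UTM meters.
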
**Factor graph structure**:

- **Variables**: Pose2 (utm_easting, utm_northing, heading) per image
- **Prior Factor** (PriorFactorPose2): first frame anchored at its UTM position with tight noise (sigma_xy = 5m if GPS accurate, sigma_theta = 0.1 rad)
- **VO Factor** (BetweenFactorPose2): relative motion between consecutive frames. Noise model: `Diagonal.Sigmas([sigma_x, sigma_y, sigma_theta])` where sigma scales inversely with RANSAC inlier ratio. High inlier ratio (0.8) → sigma 2m. Low inlier ratio (0.4) → sigma 10m. Sigma_theta proportional to displacement magnitude.
- **Satellite Anchor Factor** (PriorFactorPose2): absolute position from satellite matching. Position noise: `sigma = reprojection_error × GSD × scale_factor`. Good match (0.5px × 0.4m/px × 3) = 0.6m. Poor match = 5-10m. Heading component: loose (sigma = 1.0 rad) unless estimated from satellite alignment.
- **Satellite age adjustment**: For tiles known to be >1 year old (conflict zones), multiply satellite anchor noise sigma by 2.0 to reduce their influence on optimization.

**Optimizer behavior**:

- On each new frame: add VO factor, run iSAM2.update() → ~5ms
- On satellite match arrival: add PriorFactorPose2, run iSAM2.update() → backward correction
- Emit updated positions via SSE after each update
- Refinement events: when backward correction moves positions by >1m, emit "refined" SSE event
- No custom Python factors: all factors use native GTSAM C++ implementations for speed

**Error handling**:

- Wrap every iSAM2.update() in try/except for `gtsam.IndeterminantLinearSystemException`
- On exception: log error with factor details, skip the problematic factor, retry with 10x noise sigma
- If initial prior factor fails: re-initialize graph with relaxed noise (sigma_xy = 50m, sigma_theta = 0.5 rad)
- If persistent failures (>3 consecutive): reset graph from last known-good state, re-add factors incrementally
- Never crash the pipeline: degrade to VO-only positioning if optimizer is unusable

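The noise-sigma rules for the VO and satellite factors can be sketched as below. The draft gives only two reference points for the VO sigma (inlier ratio 0.8 → 2m, 0.4 → 10m); the linear interpolation between them, the clamping outside that range, and the function names are assumptions made for illustration.

```python
def vo_sigma_xy_m(inlier_ratio, hi=(0.8, 2.0), lo=(0.4, 10.0)):
    """Position sigma for a VO factor, interpolated between the two documented
    (inlier_ratio, sigma) points and clamped outside them."""
    (r_hi, s_hi), (r_lo, s_lo) = hi, lo
    r = min(max(inlier_ratio, r_lo), r_hi)
    t = (r - r_lo) / (r_hi - r_lo)
    return s_lo + t * (s_hi - s_lo)


def satellite_sigma_m(reprojection_error_px, gsd_m_per_px, scale_factor=3.0,
                      stale_tiles=False):
    """sigma = reprojection_error x GSD x scale_factor, doubled for tiles > 1 year old."""
    sigma = reprojection_error_px * gsd_m_per_px * scale_factor
    return sigma * 2.0 if stale_tiles else sigma
```

The returned values would feed the x and y entries of a diagonal noise model; the heading sigma follows the separate rules listed under the factor graph structure.
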
### Component: Segment Manager

The segment manager tracks independent VO chains, manages drift thresholds, and handles reconnection.

**Segment lifecycle**:

1. **Start condition**: First image, OR VO triple failure check fails
2. **Active tracking**: VO provides frame-to-frame motion within segment
3. **Anchoring**: Satellite two-stage matching provides absolute position
4. **End condition**: VO failure (sharp turn, outlier >350m, occlusion)
5. **New segment**: Starts, attempts satellite anchor immediately

**Segment states**:

- `ANCHORED`: At least one satellite match → HIGH confidence
- `FLOATING`: No satellite match yet → positioned relative to segment start → LOW confidence
- `USER_ANCHORED`: User provided manual GPS → MEDIUM confidence

**Drift monitoring**:

- Track cumulative VO displacement since last satellite anchor per segment
- **100m threshold**: emit warning SSE event, expand satellite search radius to 1km, increase matching attempts per frame
- **200m threshold**: emit `user_input_needed` SSE event with configurable timeout (default: 30s)
- **500m threshold**: mark all subsequent positions as VERY LOW confidence, continue processing
- **Confidence formula**: `confidence = base_confidence × exp(-drift / decay_constant)` where base_confidence is from satellite match quality, drift is distance from nearest anchor, decay_constant = 100m

**Segment reconnection**:

- When a segment becomes ANCHORED, check for nearby FLOATING segments (within 500m of any anchored position)
- Attempt satellite-based position matching between FLOATING segment images and tiles near the ANCHORED segment
- **Reconnection order** (for 5+ segments): process by proximity to nearest ANCHORED segment first (greedy nearest-neighbor)
- **Reconnection validation**: require geometric consistency (heading continuity) and DEM elevation profile consistency between adjacent segments before merging
- If no match after all frames tried: request user input, auto-continue after timeout

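The confidence formula and drift thresholds above translate directly into code. In this sketch the function names and the `very_low_confidence` label are hypothetical, and treating each threshold as inclusive (`>=`) is an assumption, since the draft does not say whether the boundary value itself triggers the action.

```python
import math


def position_confidence(base_confidence, drift_m, decay_constant_m=100.0):
    """confidence = base_confidence * exp(-drift / decay_constant)."""
    return base_confidence * math.exp(-drift_m / decay_constant_m)


def drift_actions(drift_m):
    """Actions triggered by cumulative VO drift since the last satellite anchor."""
    actions = []
    if drift_m >= 100.0:
        actions.append("drift_warning")        # also widens the search radius to 1km
    if drift_m >= 200.0:
        actions.append("user_input_needed")
    if drift_m >= 500.0:
        actions.append("very_low_confidence")
    return actions
```

With the default decay constant, confidence falls to about 37% of its base value at 100m of drift and to about 14% at 200m.
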
### Component: Multi-Provider Satellite Tile Cache

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Multi-provider progressive cache with DEM | aiohttp ≥3.13.3, aiofiles, sqlite3, faiss-cpu | Multiple providers. Async download. DINOv2/GeM pre-computation. DEM cached. Session token management. | Needs internet. Provider API differences. | Google Maps Tiles API + Mapbox API keys | API keys in env vars only. Session tokens managed internally. | Async, non-blocking | **Best** |

**Selected**: Multi-provider progressive cache.

**Provider priority**:

1. User-provided tiles (highest priority: custom/recent imagery)
2. Google Maps (zoom 18, ~0.4m/px): 100K free requests/month, 15K/day
3. Mapbox Satellite (zoom 16-18, ~0.6-0.3m/px): 200K free requests/month

**Conflict zone awareness**: For eastern Ukraine (configurable region polygon), Google Maps imagery is known to be 1-3 years old. System should: 1) log a warning at job start, 2) increase satellite anchor noise sigma by 2.0×, 3) prioritize user-provided tiles if available, 4) lower satellite matching confidence threshold by 0.3.

**Google Maps session management**:

1. On job start: POST to `/v1/createSession` with API key → receive session token
2. Use session token in all subsequent tile requests for this job
3. Token has finite lifetime; handle expiry by creating new session
4. Track request count per day per provider

**Cache strategy**:

1. On job start: download tiles in 1km radius around starting GPS from primary provider
2. Pre-compute DINOv2 ViT-S/14 GeM-pooled embeddings for all cached tiles
3. As route extends: download tiles 500m ahead of estimated position
4. **Request budgeting**: track daily API requests per provider. At 80% daily limit (12,000 for Google): switch to Mapbox. Log budget status.
5. Cache structure on disk:

   ```
   cache/
   ├── tiles/{provider}/{zoom}/{x}/{y}.jpg
   ├── embeddings/{provider}/{zoom}/{x}/{y}_dino_gem.npy   (DINOv2 GeM embedding)
   └── dem/{lat}_{lon}.tif                                 (Copernicus DEM tiles)
   ```

6. Cache is persistent across jobs: tiles and features are reused for overlapping areas
7. **DEM cache**: Copernicus DEM GLO-30 tiles from AWS S3 (free, no auth). `s3://copernicus-dem-30m/`. Cloud Optimized GeoTIFFs, 30m resolution. Downloaded via HTTPS (no AWS SDK needed): `https://copernicus-dem-30m.s3.amazonaws.com/Copernicus_DSM_COG_10_{N|S}{lat}_00_{E|W}{lon}_DEM/...`

**Tile download budget**:

- Google Maps: 100,000/month, 15,000/day → ~7 flights/day from cache misses, ~50 flights/month
- Mapbox: 200,000/month → additional ~100 flights/month
- Per flight: ~2000 satellite tiles (~80MB) + ~200 DEM tiles (~10MB)

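The request-budgeting rule and the flights-per-quota arithmetic can be sketched as follows. `ProviderBudget`, `pick_provider` and `flights_supported` are hypothetical names; the 80% switch point and the ~2000 tiles per flight come from the figures above, and integer arithmetic is used so the threshold comparison is exact.

```python
from dataclasses import dataclass


@dataclass
class ProviderBudget:
    name: str
    limit: int                # request limit for the tracked period
    used: int = 0
    switch_percent: int = 80  # switch providers at 80% of the limit

    def exhausted(self) -> bool:
        return self.used * 100 >= self.limit * self.switch_percent


def pick_provider(budgets):
    """Return the first provider (in priority order) still under its switch threshold."""
    for budget in budgets:
        if not budget.exhausted():
            return budget.name
    return None


def flights_supported(request_limit, tiles_per_flight=2000):
    """How many full flights a request quota covers if every tile is a cache miss."""
    return request_limit // tiles_per_flight
```

`flights_supported(15_000)` gives 7 and `flights_supported(100_000)` gives 50, which matches the Google Maps figures in the tile download budget.
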
### Component: API & Real-Time Streaming

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FastAPI + SSE (Queue-based) + PyJWT | FastAPI ≥0.135.0, asyncio.Queue, uvicorn, PyJWT ≥2.10.0 | Native SSE. Queue-based publisher avoids generator cleanup issues. PyJWT actively maintained. OpenAPI auto-generated. | Python GIL (mitigated with asyncio). | Python 3.11+, uvicorn | JWT, CORS, rate limiting, CSP headers | Async, non-blocking | **Best** |

**Selected**: FastAPI + Queue-based SSE + PyJWT authentication.

**SSE implementation**:

- Use one `asyncio.Queue` per client connection (not bare async generators)
- The server pushes events to the queue; the response handler reads from the queue and writes to the client
- On disconnect: the server drops its reference to the queue so it is garbage collected; no lingering generators
- SSE heartbeat: send `event: heartbeat` every 15 seconds to detect stale connections
- Support the `Last-Event-ID` header for reconnection: include a monotonic event ID in each SSE message. On reconnect, replay missed events from an in-memory ring buffer (last 1000 events per job).

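The queue-per-client pattern with `Last-Event-ID` replay can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: `JobEventBus`, `format_sse`, and their parameters are hypothetical names, and the FastAPI response wiring is left out.

```python
import asyncio
import json
from collections import deque


class JobEventBus:
    """Hypothetical per-job publisher: one asyncio.Queue per client plus a replay buffer."""

    def __init__(self, buffer_size=1000):
        self._next_id = 1                        # monotonic event ID
        self._ring = deque(maxlen=buffer_size)   # last N events as (id, event, data)
        self._subscribers = set()                # one queue per connected client

    def publish(self, event, data):
        """Record the event in the ring buffer and push it to every client queue."""
        event_id = self._next_id
        self._next_id += 1
        item = (event_id, event, data)
        self._ring.append(item)
        for queue in self._subscribers:
            queue.put_nowait(item)
        return event_id

    def subscribe(self, last_event_id=0):
        """Create a client queue, pre-filled with buffered events newer than last_event_id."""
        queue = asyncio.Queue()
        for item in self._ring:
            if item[0] > last_event_id:
                queue.put_nowait(item)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        """Drop the queue on disconnect so it can be garbage collected."""
        self._subscribers.discard(queue)


def format_sse(event_id, event, data):
    """Serialize one event in the SSE wire format (id / event / data, blank-line terminated)."""
    return f"id: {event_id}\nevent: {event}\ndata: {json.dumps(data)}\n\n"
```

In a request handler, the response body would repeatedly `await queue.get()` and write `format_sse(...)` output, calling `unsubscribe` when the client disconnects. The heartbeat would be one more periodic `publish("heartbeat", ...)` call.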
**API Endpoints**:

```
POST /auth/token
  Body: { api_key }
  Returns: { access_token, token_type, expires_in }

POST /jobs
  Headers: Authorization: Bearer <token>
  Body: { start_lat, start_lon, altitude, camera_params, image_folder }
  Returns: { job_id }

GET /jobs/{job_id}/stream
  Headers: Authorization: Bearer <token>
  SSE stream of:
    - { event: "position", id: "42", data: { image_id, lat, lon, confidence, segment_id } }
    - { event: "refined", id: "43", data: { image_id, lat, lon, confidence, delta_m } }
    - { event: "segment_start", id: "44", data: { segment_id, reason } }
    - { event: "drift_warning", id: "45", data: { segment_id, cumulative_drift_m } }
    - { event: "user_input_needed", id: "46", data: { image_id, reason, timeout_s } }
    - { event: "model_degraded", id: "47", data: { model, fallback, reason } }
    - { event: "heartbeat", id: "48", data: { timestamp } }
    - { event: "complete", id: "49", data: { summary } }

POST /jobs/{job_id}/anchor
  Headers: Authorization: Bearer <token>
  Body: { image_id, lat, lon }

POST /jobs/{job_id}/batch-anchor
  Headers: Authorization: Bearer <token>
  Body: { anchors: [{ image_id, lat, lon }, ...] }

GET /jobs/{job_id}/point-to-gps?image_id=X&px=100&py=200
  Headers: Authorization: Bearer <token>
  Returns: { lat, lon, confidence }

GET /jobs/{job_id}/results?format=geojson
  Headers: Authorization: Bearer <token>
  Returns: full results as GeoJSON or CSV (WGS84)
```

**Security measures**:

- JWT authentication via PyJWT ≥2.10.0 on all endpoints (short-lived tokens, 1h expiry)
- Image folder whitelist: resolve to canonical path (os.path.realpath), verify under configured base directories
- Image validation: magic byte check (JPEG FFD8, PNG 89504E47, TIFF 4949/4D4D), dimension check (<10,000px per side), reject others
- **Pin Pillow ≥12.1.1** (CVE-2026-25990 mitigation)
- **Pin PyTorch ≥2.10.0** (CVE-2025-32434 and CVE-2026-24747 mitigation)
- **Pin aiohttp ≥3.13.3** (7 CVEs mitigated)
- **Pin h11 ≥0.16.0** (CVE-2025-43859, CVSS 9.1 request smuggling)
- **Pin ONNX Runtime ≥1.24.1** (AIKIDO-2026-10185 path traversal)
- **SHA256 checksum verification for all model weights** (especially LiteSAM from Google Drive)
- **Use `weights_only=True` for all torch.load() calls** (defense-in-depth; not sole protection)
- **Prefer safetensors format** where available (DINOv2 from HuggingFace). Validate header size < 10MB.
- Max concurrent SSE connections per client: 5
- Rate limiting: 100 requests/minute per client
- All provider API keys in environment variables, never logged or returned
- CORS configured for known client origins only
- Content-Security-Policy headers
- SSE heartbeat prevents stale connections accumulating

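The image folder whitelist and magic byte check listed above could look roughly like the sketch below. It assumes a POSIX filesystem. The base directory `/data/flights` and both function names are placeholders, not values from the project configuration.

```python
import os

# Hypothetical configured base directories (assumption for illustration).
ALLOWED_BASE_DIRS = ("/data/flights",)

# Leading bytes accepted by the image validation step.
MAGIC_PREFIXES = (
    b"\xff\xd8",   # JPEG (FF D8)
    b"\x89PNG",    # PNG (89 50 4E 47)
    b"II",         # TIFF, little-endian (49 49)
    b"MM",         # TIFF, big-endian (4D 4D)
)


def is_allowed_folder(image_folder, base_dirs=ALLOWED_BASE_DIRS):
    """Resolve symlinks and '..' first, then require the result to sit under a base dir."""
    real = os.path.realpath(image_folder)
    for base in base_dirs:
        base_real = os.path.realpath(base)
        # commonpath compares whole path components, so "/data/flights_evil"
        # is not mistaken for a child of "/data/flights".
        if os.path.commonpath([real, base_real]) == base_real:
            return True
    return False


def has_image_magic(header):
    """Check the first bytes of a file against the accepted image signatures."""
    return header.startswith(MAGIC_PREFIXES)
```

Comparing canonical paths by component, rather than with a plain string prefix, is what blocks both `../` traversal and sibling directories that merely share a prefix.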
### Component: Interactive Point-to-GPS Lookup

For each processed image, the system stores the estimated camera-to-ground transformation. Given pixel coordinates (px, py):

1. **Undistort the pixel**: apply cv2.undistortPoints() on (px, py) using camera K and distortion coefficients to get corrected coordinates.
2. If image has satellite match: use computed homography to project undistorted (px, py) → satellite tile coordinates → WGS84. HIGH confidence.
3. If image has only VO pose: use camera intrinsics + DEM-corrected altitude + estimated heading to ray-cast undistorted (px, py) to ground plane → WGS84. MEDIUM confidence.
4. Confidence score derived from underlying position estimate quality.

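The VO-only ray-cast in step 3 can be illustrated with a simplified model. The sketch below assumes an already undistorted pixel, a nadir-pointing camera whose image "up" direction is aligned with the heading, flat ground at the DEM-corrected height, and a small-offset spherical approximation in place of the UTM conversion used elsewhere in the pipeline. All names are illustrative.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis, used for a small-offset approximation


def pixel_to_gps(px, py, fx, fy, cx, cy, cam_lat, cam_lon, height_m, heading_deg):
    """Project an undistorted pixel to the ground plane and return (lat, lon)."""
    # Normalized image coordinates (pinhole model).
    x_n = (px - cx) / fx
    y_n = (py - cy) / fy
    # Ground offsets in the camera's horizontal frame; image y grows downward,
    # so a pixel above the principal point lies ahead of the aircraft.
    right_m = x_n * height_m
    forward_m = -y_n * height_m
    # Rotate by heading (clockwise from north) into east/north offsets.
    psi = math.radians(heading_deg)
    east_m = forward_m * math.sin(psi) + right_m * math.cos(psi)
    north_m = forward_m * math.cos(psi) - right_m * math.sin(psi)
    # Convert metre offsets to degrees around the camera position.
    lat = cam_lat + math.degrees(north_m / EARTH_RADIUS_M)
    lon = cam_lon + math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(cam_lat))))
    return lat, lon
```

With tilt available from the homography decomposition, the ray direction would be rotated by the full camera rotation before intersecting the ground plane. That step is omitted here.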
### Component: GPU Scheduling

Single-GPU sequential execution model:

1. **VO phase** (latency-critical): SuperPoint extraction → LightGlue matching → homography estimation → GTSAM update → SSE position emit. Total: ~200ms GPU + ~10ms CPU.
2. **Satellite phase** (correction): DINOv2 embedding → faiss search → LiteSAM fine matching → geometric validation → GTSAM satellite anchor update → SSE refined emit. Total: ~250ms GPU + ~10ms CPU.
3. **CPU overlap**: While GPU executes satellite matching, CPU prepares next frame (image loading, undistortion, validation). When GPU is in VO phase, CPU processes GTSAM updates from previous satellite results.
4. Total GPU time per frame: ~450ms. Total wall-clock per frame: ~450ms (GPU-bound). Well under 5s budget.
5. Use pinned host memory (pin_memory()) together with non_blocking=True for CPU→GPU transfers so data movement overlaps with computation. GPU→CPU copies issued with non_blocking=True need an explicit synchronization before the result is read on the CPU.

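The scheduling order above (GPU phases strictly sequential, CPU preparation of the next frame overlapped) can be sketched with asyncio. The three stage functions are hypothetical stubs standing in for the real models; only the ordering and prefetch logic is the point.

```python
import asyncio


def load_and_undistort(index: int) -> str:
    """CPU stage stub (hypothetical): image loading, validation, undistortion."""
    return f"frame-{index}"


def run_vo(frame: str) -> str:
    """GPU stage stub (hypothetical): SuperPoint, LightGlue, homography, GTSAM update."""
    return f"position:{frame}"


def run_satellite(frame: str) -> str:
    """GPU stage stub (hypothetical): DINOv2, faiss search, LiteSAM, GTSAM anchor."""
    return f"refined:{frame}"


async def process_flight(frame_count: int) -> list[str]:
    """Run VO then satellite per frame, prefetching the next frame on a worker thread."""
    events: list[str] = []
    if frame_count <= 0:
        return events
    prefetch = asyncio.create_task(asyncio.to_thread(load_and_undistort, 0))
    for index in range(frame_count):
        frame = await prefetch
        if index + 1 < frame_count:
            # Start CPU preparation of the next frame before the GPU phases begin.
            prefetch = asyncio.create_task(asyncio.to_thread(load_and_undistort, index + 1))
        # GPU phases are awaited one after the other: no concurrent GPU work.
        events.append(await asyncio.to_thread(run_vo, frame))
        events.append(await asyncio.to_thread(run_satellite, frame))
    return events


if __name__ == "__main__":
    print(asyncio.run(process_flight(2)))
```

Because each GPU phase is awaited before the next starts, the emitted events keep the position-then-refined order per frame, while image loading for frame N+1 proceeds in parallel on the CPU.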
## Processing Time Budget

| Step | Component | Time | GPU/CPU | Notes |
| --- | --- | --- | --- | --- |
| 1 | Image load + validate + undistort + downscale | <20ms | CPU | OpenCV (undistort adds ~5-10ms) |
| 2 | SuperPoint feature extraction | ~80ms | GPU | 256-dim descriptors |
| 3 | LightGlue ONNX FP16 matching | ~50-100ms | GPU | Contextual matcher |
| 4 | Homography estimation + decomposition + tilt extraction | ~5ms | CPU | USAC_MAGSAC |
| 5 | GTSAM iSAM2 update (VO factor) | ~5ms | CPU | Incremental |
| 6 | SSE position emit | <1ms | CPU | Queue push |
| **VO subtotal** | | **~160-210ms** | | **Per-frame critical path** |
| 7 | DINOv2 ViT-S/14 GeM extract (UAV image) | ~50ms | GPU | GeM-pooled patch tokens |
| 8 | faiss cosine search (top-5 tiles) | <1ms | CPU | ~2000 vectors |
| 9 | LiteSAM fine matching (per tile, up to 5) | ~140-210ms | GPU | End-to-end semi-dense, est. RTX 2060 |
| 10 | Geometric validation + homography | ~5ms | CPU | |
| 11 | GTSAM iSAM2 update (satellite factor) | ~5ms | CPU | Backward correction |
| **Satellite subtotal** | | **~201-271ms** | | **Sequential after VO** |
| **Total per frame** | | **~361-481ms** | | **Well under 5s budget** |

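Steps 7 and 8 (GeM pooling of patch tokens, then a cosine top-k search) can be illustrated in plain Python. This is a sketch under assumptions: the exponent `p=3` is a commonly used default rather than a value stated in this draft, and the brute-force search stands in for a faiss inner-product index over L2-normalized vectors.

```python
import math


def gem_pool(patch_tokens, p=3.0, eps=1e-6):
    """Generalized Mean pooling: (mean(x^p))^(1/p) per channel, values clamped to eps."""
    dim = len(patch_tokens[0])
    pooled = []
    for d in range(dim):
        mean_pow = sum(max(tok[d], eps) ** p for tok in patch_tokens) / len(patch_tokens)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


def l2_normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def top_k_cosine(query, tile_embeddings, k=5):
    """Return the ids of the k tiles whose embeddings are most similar to the query."""
    q = l2_normalize(query)
    scored = []
    for tile_id, emb in tile_embeddings.items():
        e = l2_normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), tile_id))
    scored.sort(reverse=True)
    return [tile_id for _, tile_id in scored[:k]]
```

With `p=1` GeM reduces to average pooling. Larger `p` weights strong activations more heavily, which is the motivation for preferring it over plain averaging in retrieval.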
## Memory Budget

| Component | Memory | Growth | Notes |
| --- | --- | --- | --- |
| SuperPoint features (rolling window) | ~2MB | Constant | Only current frame retained |
| GTSAM factor graph (3000 nodes) | ~300KB | Linear, slow | Pose2: 24 bytes/node + factors |
| GTSAM internal Bayes tree | ~10MB | Linear, slow | iSAM2 working memory |
| Satellite tile cache (2000 tiles) | ~393MB | Per flight | Persistent across jobs |
| DINOv2 GeM embeddings (2000 tiles) | ~3MB | Per flight | Cached alongside tiles |
| SSE ring buffer (1000 events) | ~1MB | Constant | Per job |
| Model weights (GPU) | ~1.6GB VRAM | Constant | All models loaded at startup |
| PyTorch + CUDA overhead | ~500MB VRAM | Constant | Framework overhead |
| **Total RAM peak** | **~500MB** | | Excluding tile cache (~400MB) |
| **Total VRAM peak** | **~2.1GB** | | Well under 6GB |

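Several of the figures above can be reproduced with simple arithmetic. The sketch below assumes decoded 256×256 RGB tiles and 384-dimensional float32 embeddings (the ViT-S feature width). The tile size is an assumption that happens to match the ~393MB figure and is not stated in the draft.

```python
TILE_COUNT = 2000
POSE_NODES = 3000

# Assumed decoded tile layout: 256 x 256 pixels, 3 bytes per pixel (RGB).
tile_cache_bytes = TILE_COUNT * 256 * 256 * 3

# DINOv2 ViT-S embedding: 384 float32 values per tile.
embedding_bytes = TILE_COUNT * 384 * 4

# Pose2 variables only (24 bytes each); factors account for the rest of the ~300KB.
pose_bytes = POSE_NODES * 24

# Model weights plus framework overhead, in GB.
vram_gb = 1.6 + 0.5

print(f"tile cache  ~{tile_cache_bytes / 1e6:.0f} MB")
print(f"embeddings  ~{embedding_bytes / 1e6:.1f} MB")
print(f"Pose2 nodes ~{pose_bytes / 1e3:.0f} KB")
print(f"VRAM peak   ~{vram_gb:.1f} GB")
```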
## Testing Strategy

### Integration / Functional Tests

- End-to-end pipeline test using provided 60-image sample dataset with ground truth GPS
- Verify 80% of positions within 50m of ground truth
- Verify 60% of positions within 20m of ground truth
- Test sharp turn handling: simulate 90° turn with non-overlapping images
- Test segment creation, satellite anchoring, and cross-segment reconnection
- Test segment reconnection ordering with 5+ disconnected segments
- Test user manual anchor injection via POST endpoint
- Test batch anchor endpoint with multiple anchors for multi-segment scenarios
- Test point-to-GPS lookup accuracy against known ground coordinates
- Test SSE streaming delivers results within 1s of processing completion
- Test with FullHD resolution images (pipeline must not fail)
- Test with 6252×4168 images (verify downscaling and memory usage)
- Test DINOv2 ViT-S/14 GeM-pooled coarse retrieval finds correct satellite tile with 100m VO drift
- Test GeM retrieval vs average pooling: compare recall on test satellite tile set
- Test multi-provider fallback: block Google Maps, verify Mapbox takes over
- Test with outdated satellite imagery: verify confidence scores reflect match quality
- Test outlier handling: 350m gap between consecutive photos
- Test image rotation handling: apply 45° and 90° rotation, verify 4-rotation retry works
- Test SIFT+LightGlue fallback triggers when LiteSAM inlier ratio < 0.15
- Test GTSAM PriorFactorPose2 satellite anchoring produces backward correction
- Test drift warning at 100m cumulative displacement without satellite anchor
- Test user_input_needed event at 200m cumulative displacement
- Test SSE heartbeat arrives every 15s during long processing
- Test SSE reconnection with Last-Event-ID replays missed events
- Test homography decomposition disambiguation for first frame pair (no previous direction)
- Test LiteSAM fine matching produces valid correspondences on satellite-aerial pair
- Test LiteSAM subpixel accuracy improves homography estimation vs pixel-level only
- Test EfficientLoFTR fallback activates when LiteSAM fails startup validation
- Test VO-only mode when all satellite matchers fail to load
- Test model_degraded SSE event is emitted on fallback activation
- Test iSAM2 IndeterminantLinearSystemException recovery (skip factor + retry with relaxed noise)
- Test iSAM2 initial prior factor failure recovery (relaxed re-initialization)
- Test conflict zone imagery staleness: verify increased noise sigma for satellite anchors
- Test lens undistortion: compare feature matching accuracy with/without cv2.undistort() on edge features
- Test tilt-corrected GSD: simulate 20° camera tilt, verify GSD correction reduces position error
- Test UTM coordinate conversion: verify WGS84→UTM→WGS84 round-trip accuracy
- Test UTM for long flight: process 100km simulated track, verify no coordinate drift
- Test memory stability: process 3000 images, verify SuperPoint feature memory stays constant (~2MB)
- Test safetensors header validation: reject files with oversized headers

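The two accuracy criteria at the top of this list (80% within 50m, 60% within 20m) could be checked with a helper along these lines. It is a sketch: the function names are illustrative, and the haversine distance on a spherical Earth is an assumption that is adequate at these error scales.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius for the haversine formula


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(estimates, truths, threshold_m):
    """Share of estimated positions whose error is at most threshold_m."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimates, truths)]
    return sum(err <= threshold_m for err in errors) / len(errors)


def meets_accuracy_targets(estimates, truths):
    """True when both acceptance thresholds from the test list are satisfied."""
    return (
        fraction_within(estimates, truths, 50.0) >= 0.80
        and fraction_within(estimates, truths, 20.0) >= 0.60
    )
```

Both inputs are sequences of `(lat, lon)` tuples in the same image order, for example the pipeline output and the ground truth of the 60-image sample dataset.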
### Non-Functional Tests

- Processing speed: <5s per image on RTX 2060 (target: ≤481ms for sequential VO + satellite)
- Memory: peak RAM <16GB, VRAM <6GB during 3000-image flight at max resolution
- VRAM: verify peak stays under 2.1GB (all models loaded + framework overhead)
- Memory stability: process 3000 images, verify no memory leak (stable RSS over time, feature rolling window working)
- Concurrent jobs: 2 simultaneous flights, verify isolation and resource sharing
- Tile cache: verify tiles and DINOv2 GeM embeddings cached and reused
- API: load test SSE connections (10 simultaneous clients)
- Recovery: kill and restart service mid-job, verify job can resume from last processed image
- DEM download: verify Copernicus DEM tiles fetched from AWS S3 and cached correctly
- GTSAM optimizer: verify backward correction produces "refined" events
- Session token lifecycle: verify Google Maps session creation, usage, and expiry handling
- Model startup validation: verify all weight checksums pass in under 10s total

### Security Tests

- JWT authentication enforcement on all endpoints (PyJWT)
- Expired/invalid token rejection
- Provider API keys not exposed in responses, logs, or error messages
- Image folder path traversal prevention (attempt to access /etc/passwd via image_folder)
- Image folder whitelist enforcement (canonical path resolution)
- Image magic byte validation: reject non-image files renamed to .jpg
- Image dimension validation: reject >10,000px images
- Input validation: invalid GPS coordinates, negative altitude, malformed camera params
- Rate limiting: verify 429 response after exceeding limit
- Max SSE connection enforcement
- CORS enforcement: reject requests from unknown origins
- Content-Security-Policy header presence
- **Pillow version ≥12.1.1 verified in requirements**
- **PyTorch version ≥2.10.0 verified in requirements**
- **aiohttp version ≥3.13.3 verified in requirements**
- **h11 version ≥0.16.0 verified in requirements**
- **ONNX Runtime version ≥1.24.1 verified in requirements**
- **SHA256 checksum verification for all model weight files**
- **Verify weights_only=True used in all torch.load() calls**
- **Verify safetensors format used for DINOv2 (no pickle deserialization)**
- **Verify safetensors header size validation (< 10MB)**
- **LiteSAM weight integrity: verify SHA256 matches config on every load**
- **Verify python-jose is NOT in requirements (replaced by PyJWT)**

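The version-pin checks above lend themselves to one small automated test over the requirements file. The sketch below is an assumption-laden illustration: it handles only simple `name>=x.y.z` or `name==x.y.z` lines, and it assumes the PyPI distribution names `torch` and `onnxruntime` for PyTorch and ONNX Runtime. A real check would more likely use a requirements parser.

```python
import re

# Minimum versions from the security measures; distribution names are assumptions.
MIN_VERSIONS = {
    "pillow": (12, 1, 1),
    "torch": (2, 10, 0),
    "aiohttp": (3, 13, 3),
    "h11": (0, 16, 0),
    "onnxruntime": (1, 24, 1),
    "pyjwt": (2, 10, 0),
}
FORBIDDEN = ("python-jose",)

_LINE = re.compile(r"^([A-Za-z0-9_.\-]+)\s*(?:(?:>=|==)\s*([0-9]+(?:\.[0-9]+)*))?")


def parse_requirements(text):
    """Map lower-cased package names to a 3-part version tuple (None if unpinned)."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        match = _LINE.match(line)
        if not line or not match:
            continue
        version = None
        if match.group(2):
            parts = [int(p) for p in match.group(2).split(".")]
            version = tuple((parts + [0, 0, 0])[:3])
        pins[match.group(1).lower()] = version
    return pins


def check_requirements(text):
    """Return a sorted list of human-readable problems; empty when all pins are satisfied."""
    pins = parse_requirements(text)
    problems = [f"{name} must not be listed" for name in FORBIDDEN if name in pins]
    for name, minimum in MIN_VERSIONS.items():
        version = pins.get(name)
        if version is None or version < minimum:
            problems.append(f"{name} must be pinned to >= {'.'.join(map(str, minimum))}")
    return sorted(problems)
```

Transitive dependencies such as h11 normally do not appear in a top-level requirements file, so this check presumes they are pinned explicitly there or in a constraints file.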
## References

- [LiteSAM (Remote Sensing, Oct 2025)](https://www.mdpi.com/2072-4292/17/19/3349) — Lightweight satellite-aerial feature matching, 6.31M params, UAV-VisLoc Hard HR 61.65%, RMSE@30=17.86m; self-made dataset Hard HR 77.3%
- [LiteSAM GitHub](https://github.com/boyagesmile/LiteSAM) — Official code, pretrained weights on Google Drive, 5 stars, built upon EfficientLoFTR
- [EfficientLoFTR (CVPR 2024)](https://github.com/zju3dv/EfficientLoFTR) — LiteSAM's base architecture, 15.05M params, 964 stars, HuggingFace integration
- [XFeat (CVPR 2024)](https://github.com/verlab/accelerated_features) — 5x faster than SuperPoint, AUC@10° 65.4 (MNN) vs SuperPoint+LightGlue AUC@10° 75.0 (MIR 0.92). SP+LG more reliable on low-texture.
- [SatLoc-Fusion (2025)](https://www.mdpi.com/2072-4292/17/17/3048) — hierarchical DINOv2+XFeat+optical flow, <15m on edge hardware
- [YFS90/GNSS-Denied-UAV-Geolocalization](https://github.com/YFS90/GNSS-Denied-UAV-Geolocalization) — <7m MAE with terrain-weighted constraint optimization
- [CEUSP (2025)](https://arxiv.org/abs/2502.11408) — DINOv2-based cross-view UAV self-positioning
- [DINOv2 UAV Self-Localization (2025)](https://ui.adsabs.harvard.edu/abs/2025IRAL...10.2080Y/) — 86.27 R@1 on DenseUAV
- [DINOv2 ViT-S vs ViT-B comparison (Nature Scientific Reports 2024)](https://www.nature.com/articles/s41598-024-83358-8) — ViT-B +2.54pp recall over ViT-S, but 3-4x VRAM
- [SALAD: DINOv2 Optimal Transport Aggregation (CVPR 2024)](https://arxiv.org/abs/2311.15937) — +12.4pp R@1 over GeM on MSLS Challenge. <3ms overhead. Future enhancement candidate.
- [LightGlue-ONNX](https://github.com/fabio-sim/LightGlue-ONNX) — 2-4x speedup via ONNX/TensorRT, FP16 on Turing
- [SIFT+LightGlue UAV Mosaicking (ISPRS 2025)](https://isprs-archives.copernicus.org/articles/XLVIII-2-W11-2025/169/2025/) — SIFT superior for high-rotation conditions
- [LightGlue rotation issue #64](https://github.com/cvg/LightGlue/issues/64) — confirmed not rotation-invariant
- [DALGlue (2025)](https://www.nature.com/articles/s41598-025-21602-5) — 11.8% MMA improvement over LightGlue for UAV
- [NaviLoc (2025)](https://www.mdpi.com/2504-446X/10/2/97) — trajectory-level optimization, 19.5m MLE, 16x improvement
- [GTSAM v4.2](https://github.com/borglab/gtsam) — factor graph optimization with Python bindings
- [GTSAM GPSFactor docs](https://gtsam.org/doxygen/a04084.html) — GPSFactor works with Pose3 only
- [GTSAM Pose2 SLAM Example](https://gtbook.github.io/gtsam-examples/Pose2SLAMExample.html) — BetweenFactorPose2 + PriorFactorPose2
- [GTSAM IndeterminantLinearSystemException](https://github.com/borglab/gtsam/issues/561) — known failure mode, needs error handling
- [OpenCV decomposeHomographyMat issue #23282](https://github.com/opencv/opencv/issues/23282) — non-orthogonal matrices, 4-solution ambiguity
- [OpenCV undistort vs undistortPoints](https://stackoverflow.com/questions/30919957/undistort-vs-undistortpoints-for-feature-matching-of-calibrated-images) — full-image undistortion preferred for feature matching
- [Lens Distortion Correction for UAV Cameras (JGGS 2025)](https://www.sciopen.com/article/10.11947/j.JGGS.2025.0105) — critical for non-metric UAV cameras
- [CVE-2025-32434 PyTorch](https://nvd.nist.gov/vuln/detail/CVE-2025-32434) — RCE with weights_only=True, fixed in PyTorch 2.6+
- [CVE-2026-24747 PyTorch](https://nvd.nist.gov/vuln/detail/CVE-2026-24747) — memory corruption in weights_only unpickler, fixed in 2.10.0+
- [CVE-2026-25990 Pillow](https://nvd.nist.gov/vuln/detail/CVE-2026-25990) — PSD out-of-bounds write, fixed in 12.1.1
- [CVE-2025-43859 h11](https://nvd.nist.gov/vuln/detail/CVE-2025-43859) — HTTP request smuggling, CVSS 9.1, fixed in h11 0.16.0
- [AIKIDO-2026-10185 ONNX Runtime](https://nvd.nist.gov/vuln/detail/AIKIDO-2026-10185) — path traversal in external data, fixed in 1.24.1
- [python-jose maintenance status](https://github.com/mpdavis/python-jose) — unmaintained, recommend PyJWT
- [Copernicus DEM GLO-30 on AWS](https://registry.opendata.aws/copernicus-dem/) — free 30m global DEM, no auth via S3
- [Google Maps Tiles API](https://developers.google.com/maps/documentation/tile/satellite) — satellite tiles, 100K free/month, session tokens required
- [Google Maps Tiles API billing](https://developers.google.com/maps/documentation/tile/usage-and-billing) — 15K/day, 6K/min rate limits
- [Google Maps Ukraine imagery policy](https://en.ain.ua/2024/05/10/google-maps-shows-mariupol-irpin-and-other-cities-destroyed-by-russia/) — intentionally 1-3 years old for conflict zones
- [Maxar Ukraine imagery restored (2025)](https://en.defence-ua.com/news/maxar_satellite_imagery_is_still_available_in_ukraine_but_its_paid_only_now-13758.html) — paid-only, 31-50cm
- [Mapbox Satellite](https://docs.mapbox.com/data/tilesets/reference/mapbox-satellite/) — alternative tile provider, up to 0.3m/px regional
- [FastAPI SSE](https://fastapi.tiangolo.com/tutorial/server-sent-events/) — EventSourceResponse
- [SSE-Starlette cleanup issue #99](https://github.com/sysid/sse-starlette/issues/99) — async generator cleanup, Queue pattern recommended
- [FAISS GPU wiki](https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU) — ~2GB scratch space default, CPU recommended for small datasets
- [Oblique-Robust AVL (IEEE TGRS 2024)](https://ieeexplore.ieee.org/iel7/36/10354519/10356107.pdf) — rotation-equivariant features for UAV-satellite matching
- [ENU Coordinates Limitation](https://dirsig.cis.rit.edu/docs/new/coordinates.html) — flat-Earth approximation accurate within ~4km
- [ENU to ECEF Transformations (ESA Navipedia)](https://gssc.esa.int/navipedia/index.php/Transformations_between_ECEF_and_ENU_coordinates)

## Related Artifacts

- Previous assessment research: `_docs/00_research/gps_denied_nav_assessment/`
- Draft02 assessment research: `_docs/00_research/gps_denied_draft02_assessment/`
- Draft03 assessment (LiteSAM): `_docs/00_research/litesam_satellite_assessment/`
- Draft04 assessment: `_docs/00_research/draft04_assessment/`
- This assessment research: `_docs/00_research/draft05_assessment/`
- Previous AC assessment: `_docs/00_research/gps_denied_visual_nav/00_ac_assessment.md`