mirror of https://github.com/azaion/gps-denied-desktop.git synced 2026-04-22 21:56:37 +00:00

Oleksandr Bezdieniezhnykh b419e2c04a add clarification to research methodology by including a step for solution comparison and user consultation

2026-03-17 18:43:57 +02:00

Solution Draft

Assessment Findings

Old Component Solution	Weak Point	New Solution
SuperPoint+LightGlue for VO (150-200ms)	No change: SuperPoint+LightGlue provides highest match quality (MIR 0.92 vs XFeat 0.74 on Megadepth) and best reliability on low-texture terrain. 150-200ms is well within 5s budget. VO reliability prioritized over speed.	Retain SuperPoint+LightGlue ONNX FP16 for VO.
LiteSAM "77.3% Hard hit rate" claim	Functional (Moderate): 77.3% is on the self-made dataset only. UAV-VisLoc Hard: 61.65%. Still better than SP+LG (~54-58%) but the gap is ~4-8pp, not ~19pp.	Correct hit rate reporting. LiteSAM remains best option but with accurate expectations.
LiteSAM as sole satellite fine matcher	Functional (Moderate): 5 GitHub stars, 0 forks, no license, no independent reproduction, 4 commits. Single-point-of-failure weight hosting on Google Drive.	Add EfficientLoFTR (CVPR 2024, 964 stars) as proven fallback. Startup validation: checksum verify, test inference, auto-switch on failure.
No PyTorch version pinning	Security (Critical): CVE-2025-32434 (RCE with weights_only=True, PyTorch ≤2.5.1). CVE-2026-24747 (memory corruption, before 2.10.0).	Pin PyTorch ≥2.10.0. SHA256 checksums for all model weights. Prefer safetensors format where available.
LiteSAM weights from Google Drive	Security (Moderate): No checksum, no mirror, no alternative source. Mutable link. Pickle-based .ckpt format.	Download once, compute SHA256, store in config. Verify on every load. Convert to safetensors if feasible.
No iSAM2 error handling	Functional (Moderate): IndeterminantLinearSystemException can crash pipeline (GTSAM #561). No handling for initial factor failure.	Try/except around iSAM2.update(). On failure: skip factor, retry with 10x noise. Special handling for initial prior.
Google Maps imagery assumed "possibly outdated"	Functional (Low): Google intentionally keeps conflict zone imagery 1-3 years old. Eastern Ukraine matching will degrade significantly.	Add imagery staleness awareness: increase match noise sigma for outdated areas, lower confidence, prioritize user-provided tiles and Maxar for conflict zones.
No graceful degradation for model load failures	Functional (Low): If LiteSAM AND fallback fail to load, system has no degraded mode.	Add VO-only startup mode: if all satellite matchers fail to load, system runs VO + user anchoring only. Emit warning via SSE.

Product Solution Description

A Python-based GPS-denied visual navigation service that determines GPS coordinates of consecutive UAV photo centers using a hierarchical localization approach: fast visual odometry for frame-to-frame motion, two-stage satellite geo-referencing (coarse retrieval + fine matching) for absolute positioning, and factor graph optimization for trajectory refinement. The system operates as a background REST API service with real-time SSE streaming.

Core approach: Consecutive images are matched using SuperPoint+LightGlue (learned features with contextual attention matching, MIR 0.92) to estimate relative motion (visual odometry) — chosen for maximum reliability on low-texture terrain. Each image is geo-referenced against satellite imagery through a two-stage process: DINOv2 ViT-S/14 coarse retrieval selects the best-matching satellite tile using patch-level features, then LiteSAM (lightweight semi-dense matcher, 6.31M params) refines the alignment to subpixel precision. LiteSAM achieves 61.65% hit rate in Hard conditions on UAV-VisLoc and 77.3% on the authors' self-made dataset. EfficientLoFTR (CVPR 2024) serves as a proven fallback if LiteSAM is unavailable. A GTSAM iSAM2 factor graph fuses VO constraints (BetweenFactorPose2) and satellite anchors (PriorFactorPose2) in local ENU coordinates to produce an optimized trajectory. The system handles route disconnections by treating each continuous VO chain as an independent segment, geo-referenced through satellite matching and connected via the shared WGS84 coordinate frame.

┌─────────────────────────────────────────────────────────────────────┐
│                        Client (Desktop App)                         │
│   POST /jobs (start GPS, camera params, image folder)               │
│   GET  /jobs/{id}/stream (SSE)                                      │
│   POST /jobs/{id}/anchor (user manual GPS input)                    │
│   POST /jobs/{id}/batch-anchor (batch manual GPS input)             │
│   GET  /jobs/{id}/point-to-gps (image_id, px, py)                   │
└──────────────────────┬──────────────────────────────────────────────┘
                       │ HTTP/SSE (JWT auth)
┌──────────────────────▼──────────────────────────────────────────────┐
│                     FastAPI Service Layer                            │
│   Job Manager → Pipeline Orchestrator → SSE Event Publisher         │
│   (asyncio.Queue-based publisher, heartbeat, Last-Event-ID)         │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────────────┐
│                     Processing Pipeline                              │
│                                                                      │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────────────┐   │
│  │ Image        │  │  Visual      │  │ Satellite Geo-Ref       │   │
│  │ Preprocessor │→│  Odometry    │→│ Stage 1: DINOv2-S patch  │   │
│  │ (downscale,  │  │ (SuperPoint │  │   retrieval (CPU faiss)  │   │
│  │  rectify)    │  │ + LightGlue │  │ Stage 2: LiteSAM fine    │   │
│  │              │  │  ONNX FP16) │  │   matching (subpixel)    │   │
│  │              │  │             │  │   [fallback: EfficientLoFTR] │
│  └──────────────┘  └──────────────┘  └─────────────────────────┘   │
│         │                │                       │                   │
│         ▼                ▼                       ▼                   │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │       GTSAM iSAM2 Factor Graph Optimizer                    │   │
│  │  Pose2 + BetweenFactorPose2 (VO) + PriorFactorPose2 (sat)  │   │
│  │  Local ENU coordinates → WGS84 output                       │   │
│  │  [IndeterminantLinearSystemException handling]               │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                              │                                       │
│  ┌───────────────────────────▼──────────────────────────────────┐   │
│  │                  Segment Manager                              │   │
│  │   (drift thresholds, confidence decay, user input triggers)   │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │          Multi-Provider Satellite Tile Cache                  │   │
│  │  (Google Maps + Mapbox + user tiles, session tokens,          │   │
│  │   DEM cache, request budgeting)                               │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │          Model Weight Manager                                 │   │
│  │  (SHA256 verification, startup validation, fallback chain)    │   │
│  └──────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘

Architecture

Component: Model Weight Manager (NEW)

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
SHA256 checksum + startup validation + fallback chain	hashlib, safetensors, torch	Prevents supply chain attacks. Detects corruption. Auto-fallback on failure.	Adds ~2-5s startup time.	PyTorch ≥2.10.0	SHA256 per weight file, safetensors where available	One-time at startup	Best

Selected: SHA256 checksum verification with startup validation.

Weight manifest (stored in config):

Model	Source	Format	SHA256	Fallback
SuperPoint	Official repo	PyTorch	[from repo]	SIFT (OpenCV, no weights)
LightGlue ONNX	GitHub release	ONNX	[from release]	LightGlue PyTorch
DINOv2 ViT-S/14	torch.hub / HuggingFace	safetensors (preferred)	[from HuggingFace]	None (required)
LiteSAM	Google Drive (pinned link)	.ckpt (pickle)	[compute on first download]	EfficientLoFTR
EfficientLoFTR	HuggingFace	PyTorch	[from HuggingFace]	SuperPoint+LightGlue
SIFT	OpenCV built-in	N/A	N/A	None

Startup sequence:

Verify PyTorch version ≥2.10.0 — refuse to start if older
For each model in manifest: check file exists → verify SHA256 → load with weights_only=True → run inference on reference input → confirm output shape
If LiteSAM fails: load EfficientLoFTR, log warning
If EfficientLoFTR fails: load SuperPoint+LightGlue for satellite matching, log warning
If ALL satellite matchers fail: start in VO-only mode, emit model_degraded SSE event
SuperPoint, LightGlue, and DINOv2 are required — refuse to start without them
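
The checksum and fallback steps above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the manifest tuple layout, the `loader` callables, and the function names are hypothetical, and each loader is assumed to perform the weights_only=True load plus the reference inference itself.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional, Sequence, Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large checkpoints are never held in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: Path, expected_sha256: str) -> bool:
    """True only if the file exists and its digest matches the manifest entry."""
    return path.is_file() and sha256_of(path) == expected_sha256.lower()


# Hypothetical manifest entry: (name, weight path, expected digest, loader).
Candidate = Tuple[str, Path, str, Callable[[Path], object]]


def load_first_valid(chain: Sequence[Candidate]) -> Tuple[Optional[str], Optional[object]]:
    """Walk the fallback chain in order.

    Returns (name, model) for the first candidate that verifies and loads,
    or (None, None) when every candidate fails (the VO-only case).
    """
    for name, path, digest, loader in chain:
        if not verify_weights(path, digest):
            continue  # missing file or checksum mismatch: try the next matcher
        try:
            return name, loader(path)
        except Exception:
            continue  # load or test inference failed: try the next matcher
    return None, None
```

A caller would pass the satellite matchers in fallback order and emit the model_degraded event whenever the returned name is not the first entry.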

Component: Image Preprocessor

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
Downscale + rectify + validate	OpenCV resize, NumPy	Normalizes input. Consistent memory. Validates before loading.	Loses fine detail in downscaled images.	OpenCV, NumPy	Magic byte validation, dimension check before load	<10ms per image	Best

Selected: Downscale + rectify + validate pipeline.

Preprocessing per image:

Validate file: check magic bytes (JPEG/PNG/TIFF), reject unknown formats
Read image header only: check dimensions, reject if either > 10,000px
Load image via OpenCV (cv2.imread)
Downscale to max 1600 pixels on longest edge (preserving aspect ratio)
Store original resolution for GSD: GSD = (effective_altitude × sensor_width) / (focal_length × original_width) where effective_altitude = flight_altitude - terrain_elevation (terrain from Copernicus DEM)
If estimated heading is available: rotate to approximate north-up for satellite matching
If no heading (segment start): pass unrotated
Convert to grayscale for feature extraction
Output: downscaled grayscale image + metadata (original dims, GSD, heading if known)
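
The validation and downscale arithmetic above can be illustrated without OpenCV. The sketch below covers only the magic-byte check, the dimension limit, and the target size computation. Function names are hypothetical, and the choice not to upscale images that are already smaller than 1600px is an assumption.

```python
from typing import Optional, Tuple

# Magic-byte prefixes for the accepted formats (JPEG FFD8, PNG 89504E47, TIFF 4949/4D4D).
MAGIC_PREFIXES = {
    "jpeg": (b"\xff\xd8",),
    "png": (b"\x89PNG",),
    "tiff": (b"II", b"MM"),
}
MAX_SIDE_PX = 10_000
TARGET_LONG_EDGE_PX = 1600


def sniff_format(header: bytes) -> Optional[str]:
    """Return the detected format name, or None for unknown formats."""
    for fmt, prefixes in MAGIC_PREFIXES.items():
        if header.startswith(prefixes):
            return fmt
    return None


def dimensions_ok(width: int, height: int) -> bool:
    """Reject images where either side exceeds the 10,000px limit."""
    return 0 < width <= MAX_SIDE_PX and 0 < height <= MAX_SIDE_PX


def downscaled_size(width: int, height: int,
                    long_edge: int = TARGET_LONG_EDGE_PX) -> Tuple[int, int]:
    """Target size with the longest edge capped at long_edge, aspect ratio kept."""
    scale = long_edge / max(width, height)
    if scale >= 1.0:
        return width, height  # assumption: never upscale small images
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(sniff_format(b"\xff\xd8\xff\xe0"))
    print(downscaled_size(6252, 4168))
```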

Component: Feature Extraction

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
SuperPoint (for VO)	superpoint (PyTorch)	Learned features, robust to viewpoint/illumination. 256-dim descriptors. High MIR (0.92 with LightGlue) — best reliability on low-texture terrain.	Not rotation-invariant. Slower than XFeat.	NVIDIA GPU, PyTorch, CUDA	Model weights from official source	~80ms GPU	Best for VO
LiteSAM (for satellite matching)	LiteSAM (PyTorch)	Best hit rate on satellite-aerial benchmarks. 6.31M params. Subpixel refinement via MinGRU. End-to-end semi-dense matcher. UAV-VisLoc Hard: 61.65%. Self-made: 77.3%.	Not rotation-invariant. No ONNX. Immature repo (5 stars).	PyTorch, NVIDIA GPU	Model weights from Google Drive (SHA256 verified)	~140-210ms on RTX 2060 (est.)	Best for satellite
EfficientLoFTR (satellite fallback)	EfficientLoFTR (PyTorch)	CVPR 2024, 964 stars. HuggingFace integration. Proven semi-dense matcher.	15.05M params (about 2.4x LiteSAM's 6.31M). Slightly lower hit rate.	PyTorch, NVIDIA GPU	HuggingFace	~150-250ms on RTX 2060 (est.)	Satellite fallback
SIFT (rotation fallback)	OpenCV cv2.SIFT	Rotation-invariant. Scale-invariant. Proven SIFT+LightGlue hybrid for UAV mosaicking (ISPRS 2025).	Slower. Less discriminative in low-texture.	OpenCV	N/A	~200ms CPU	Rotation fallback

Selected: SuperPoint+LightGlue ONNX FP16 for VO (maximum reliability), LiteSAM for satellite fine matching (EfficientLoFTR fallback), SIFT+LightGlue as rotation-heavy fallback.

VRAM budget:

Model	VRAM	Loaded When
SuperPoint	~400MB	Always (VO every frame)
LightGlue ONNX FP16	~500MB	Always (VO every frame)
DINOv2 ViT-S/14	~300MB	Satellite coarse retrieval
LiteSAM (6.31M params)	~400MB	Satellite fine matching
Peak total	~1.6GB	Satellite matching phase
EfficientLoFTR (if fallback)	~600MB	Replaces LiteSAM slot
Peak with fallback	~1.8GB	Satellite matching phase

Component: Feature Matching

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
SuperPoint+LightGlue ONNX FP16 (VO)	SuperPoint + LightGlue-ONNX	Highest match quality (MIR 0.92). LightGlue attention disambiguates repetitive patterns. Best reliability on low-texture terrain. FP16 on Turing.	Not rotation-invariant. ~150-200ms for the full VO path.	PyTorch, ONNX Runtime, NVIDIA GPU	Model weights from official source	~130-180ms on RTX 2060 (extraction + matching)	Best for VO
LiteSAM (satellite fine matching)	LiteSAM (PyTorch)	Best hit rate on satellite-aerial benchmarks (61.65% Hard on UAV-VisLoc, 77.3% on self-made). 6.31M params. Subpixel refinement.	Not rotation-invariant. No ONNX.	PyTorch, NVIDIA GPU	SHA256 verified weights	~140-210ms on RTX 2060 (est.)	Best for satellite
EfficientLoFTR (satellite fallback)	EfficientLoFTR (PyTorch)	Proven base architecture. CVPR 2024. Reliable.	Slightly lower hit rate than LiteSAM. More params.	PyTorch, NVIDIA GPU	HuggingFace	~150-250ms on RTX 2060 (est.)	Satellite fallback
SIFT+LightGlue (rotation fallback)	OpenCV SIFT + LightGlue	SIFT rotation invariance + LightGlue contextual matching. Proven superior for high-rotation UAV (ISPRS 2025).	Slower than XFeat.	OpenCV + ONNX Runtime	N/A	~250ms total	Rotation fallback

Selected: SuperPoint+LightGlue ONNX FP16 for VO, LiteSAM for satellite fine matching (EfficientLoFTR fallback), SIFT+LightGlue as rotation fallback.

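Satellite fine matcher fallback chain at match time: LiteSAM → EfficientLoFTR → SIFT+LightGlue. (The startup load-failure chain in the Model Weight Manager ends in SuperPoint+LightGlue instead.)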

Component: Visual Odometry (Consecutive Frame Matching)

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
Homography VO with essential matrix fallback	OpenCV findHomography (USAC_MAGSAC), findEssentialMat, decomposeHomographyMat	Homography: optimal for flat terrain. Essential matrix: non-planar fallback. Known altitude resolves scale.	Homography assumes planar. 4-way decomposition ambiguity.	OpenCV, NumPy	N/A	~5ms for estimation	Best

Selected: Homography VO with essential matrix fallback and DEM terrain-corrected GSD.

VO Pipeline per frame:

Extract SuperPoint features from current image (~80ms)
Match with previous image using LightGlue ONNX FP16 (~50-100ms)
Triple failure check: match count ≥ 30 AND RANSAC inlier ratio ≥ 0.4 AND motion magnitude consistent with expected inter-frame distance (nominally ~100m; displacements above 350m are treated as outliers)
If checks pass → estimate homography (cv2.findHomography with USAC_MAGSAC, confidence 0.999, max iterations 2000)
If RANSAC inlier ratio < 0.6 → additionally estimate essential matrix as quality check
Decomposition disambiguation (4 solutions from decomposeHomographyMat):
a. Filter by positive depth: triangulate 5 matched points, reject if behind camera
b. Filter by plane normal: normal z-component > 0.5 (downward camera → ground plane normal points up)
c. If previous direction available: prefer solution consistent with expected motion
d. Orthogonality check: verify R^T R ≈ I (Frobenius norm of R^T R - I < 0.01). If failed, re-orthogonalize via SVD: U,S,V = svd(R), R_clean = U @ V^T
e. First frame pair in segment: use filters a+b only
Terrain-corrected GSD: query Copernicus DEM at estimated position → effective_altitude = flight_altitude - terrain_elevation → GSD = (effective_altitude × sensor_width) / (focal_length × original_image_width)
Convert pixel displacement to meters: displacement_m = displacement_px × GSD
Update position: new_pos = prev_pos + rotation @ displacement_m
Track cumulative heading for image rectification
If triple failure check fails → trigger segment break
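
The gating and scale steps of the VO pipeline can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: function names are hypothetical, the 350m upper bound is one reading of the displacement check, and displacement_px is assumed to be expressed in original-resolution pixels (a displacement measured on the downscaled image would first need rescaling by original_width / downscaled_width, since GSD is defined on the original width).

```python
import math
from typing import Tuple


def ground_sample_distance(flight_altitude_m: float, terrain_elevation_m: float,
                           sensor_width_mm: float, focal_length_mm: float,
                           original_width_px: int) -> float:
    """Terrain-corrected GSD in meters per original-resolution pixel."""
    effective_altitude_m = flight_altitude_m - terrain_elevation_m
    return (effective_altitude_m * sensor_width_mm) / (focal_length_mm * original_width_px)


def vo_frame_ok(match_count: int, inlier_ratio: float, displacement_m: float,
                max_displacement_m: float = 350.0) -> bool:
    """Triple failure check: all three conditions must hold."""
    return (match_count >= 30
            and inlier_ratio >= 0.4
            and 0.0 <= displacement_m <= max_displacement_m)


def advance(prev_xy: Tuple[float, float], heading_rad: float,
            displacement_px: Tuple[float, float], gsd: float) -> Tuple[float, float]:
    """new_pos = prev_pos + rotation @ displacement_m, written out for 2D."""
    dx_m = displacement_px[0] * gsd
    dy_m = displacement_px[1] * gsd
    c, s = math.cos(heading_rad), math.sin(heading_rad)
    return (prev_xy[0] + c * dx_m - s * dy_m,
            prev_xy[1] + s * dx_m + c * dy_m)


if __name__ == "__main__":
    gsd = ground_sample_distance(500.0, 100.0, 10.0, 10.0, 1000)
    print(gsd, advance((0.0, 0.0), 0.0, (10.0, 0.0), gsd))
```

A frame for which vo_frame_ok returns False would trigger the segment break described in the last step.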

Component: Satellite Image Geo-Referencing (Two-Stage)

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
Stage 1: DINOv2 ViT-S/14 patch retrieval	dinov2 ViT-S/14 (PyTorch), faiss (CPU)	Fast (50ms). 300MB VRAM. Patch tokens capture spatial layout better than CLS alone. Semantic matching robust to seasonal change.	Coarse only (~tile-level). Lower precision than ViT-B/ViT-L.	PyTorch, faiss-cpu	Model weights from official source	~50ms extract + <1ms search	Best coarse
Stage 2: LiteSAM fine matching	LiteSAM (PyTorch)	Best satellite-aerial hit rate (61.65% Hard on UAV-VisLoc, 77.3% on self-made). Subpixel accuracy via MinGRU. 6.31M params, ~400MB VRAM. End-to-end semi-dense matching.	Not rotation-invariant. No ONNX. Immature codebase.	PyTorch, NVIDIA GPU	SHA256 verified weights	~140-210ms on RTX 2060 (est.)	Best fine
Stage 2 fallback: EfficientLoFTR	EfficientLoFTR (PyTorch)	CVPR 2024. Mature. HuggingFace. LiteSAM's base architecture.	15.05M params. ~600MB VRAM.	PyTorch, NVIDIA GPU	HuggingFace weights	~150-250ms on RTX 2060 (est.)	Fine fallback

Selected: Two-stage hierarchical matching — DINOv2 coarse retrieval + LiteSAM fine matching (EfficientLoFTR fallback).

Satellite Matching Pipeline:

Estimate approximate position from VO
Stage 1 (coarse retrieval):
a. Define search area: 500m radius around VO estimate (expand to 1km if segment just started or drift > 100m)
b. Pre-compute DINOv2 ViT-S/14 patch embeddings for all satellite tiles in search area. Method: extract patch tokens (not CLS), apply spatial average pooling to get a single descriptor per tile. Cache embeddings.
c. Extract DINOv2 ViT-S/14 patch embedding from UAV image (same pooling)
d. Find top-5 most similar satellite tiles using faiss (CPU) cosine similarity
Stage 2 (fine matching on top-5 tiles, stop on first good match):
a. Warp UAV image to approximate nadir view using estimated camera pose
b. Rotation handling:
- If heading known: single attempt with rectified image
- If no heading (segment start): try 4 rotations {0°, 90°, 180°, 270°}
c. Run LiteSAM (or EfficientLoFTR fallback) on (uav_warped, sat_tile) → semi-dense correspondences with subpixel accuracy
d. Geometric validation: require ≥15 inliers, inlier ratio ≥ 0.3, reprojection error < 3px
e. If valid: estimate homography → transform image center → satellite pixel → WGS84
f. Report: absolute position anchor with confidence based on match quality
If all 5 tiles fail Stage 2 with LiteSAM/EfficientLoFTR:
a. Try SIFT+LightGlue on top-3 tiles (rotation-invariant). Trigger: best LiteSAM inlier ratio was < 0.15.
b. Try zoom level 17 (wider view)
If still fails: mark frame as VO-only, reduce confidence, continue
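
The Stage 2 control flow (try candidate tiles in order, stop on the first geometrically valid match, remember the best inlier ratio for the SIFT trigger) might look like the sketch below. The MatchResult fields and the match_fn callable are hypothetical stand-ins for whatever the real matcher returns.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class MatchResult:
    inliers: int
    inlier_ratio: float
    reprojection_error_px: float


def is_valid_match(m: MatchResult) -> bool:
    """Geometric validation thresholds from the pipeline description."""
    return m.inliers >= 15 and m.inlier_ratio >= 0.3 and m.reprojection_error_px < 3.0


def first_valid_tile(
    tiles: Sequence[str],
    rotations_deg: Sequence[int],
    match_fn: Callable[[str, int], MatchResult],
) -> Tuple[Optional[str], Optional[int], float]:
    """Return (tile, rotation, best_inlier_ratio); tile is None if nothing validated."""
    best_ratio = 0.0
    for tile in tiles:
        for rot in rotations_deg:
            result = match_fn(tile, rot)
            best_ratio = max(best_ratio, result.inlier_ratio)
            if is_valid_match(result):
                return tile, rot, best_ratio  # stop on first good match
    return None, None, best_ratio


def should_try_sift(best_ratio: float) -> bool:
    """SIFT+LightGlue trigger: best semi-dense inlier ratio stayed below 0.15."""
    return best_ratio < 0.15
```

With a known heading, rotations_deg would be a single value; at segment start it would be (0, 90, 180, 270).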

Satellite matching frequency: Every frame when available, but async — satellite matching for frame N overlaps with VO processing for frame N+1. Satellite result arrives and gets added to factor graph retroactively via iSAM2 update.

Component: GTSAM Factor Graph Optimizer

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
GTSAM iSAM2 factor graph (Pose2)	gtsam==4.2 (pip)	Incremental smoothing. Proper uncertainty propagation. Native BetweenFactorPose2 and PriorFactorPose2. Backward smoothing on new evidence. Python bindings.	C++ backend (pip binary). Learning curve.	gtsam==4.2, NumPy	N/A	~5-10ms incremental update	Best

Selected: GTSAM iSAM2 with Pose2 variables.

Coordinate system: Local East-North-Up (ENU) centered on starting GPS. All positions computed in ENU meters, converted to WGS84 for output. Conversion: pyproj or manual geodetic math (WGS84 ellipsoid).
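
For the "manual geodetic math" option, a first-order local tangent-plane conversion is sketched below. It uses the WGS84 radii of curvature at the origin and is an approximation suited to flight-scale areas (a few kilometers); it is not a substitute for pyproj over larger extents. Function names are hypothetical.

```python
import math
from typing import Tuple

WGS84_A = 6378137.0           # semi-major axis, meters
WGS84_E2 = 6.69437999014e-3   # first eccentricity squared


def _radii(lat_rad: float) -> Tuple[float, float]:
    """Meridional (M) and prime-vertical (N) radii of curvature at a latitude."""
    w = math.sqrt(1.0 - WGS84_E2 * math.sin(lat_rad) ** 2)
    return WGS84_A * (1.0 - WGS84_E2) / w ** 3, WGS84_A / w


def enu_to_wgs84(east_m: float, north_m: float,
                 lat0_deg: float, lon0_deg: float) -> Tuple[float, float]:
    """Local ENU offset (meters) to latitude/longitude in degrees."""
    lat0 = math.radians(lat0_deg)
    m, n = _radii(lat0)
    lat = lat0_deg + math.degrees(north_m / m)
    lon = lon0_deg + math.degrees(east_m / (n * math.cos(lat0)))
    return lat, lon


def wgs84_to_enu(lat_deg: float, lon_deg: float,
                 lat0_deg: float, lon0_deg: float) -> Tuple[float, float]:
    """Inverse of enu_to_wgs84 under the same linearization."""
    lat0 = math.radians(lat0_deg)
    m, n = _radii(lat0)
    north = math.radians(lat_deg - lat0_deg) * m
    east = math.radians(lon_deg - lon0_deg) * n * math.cos(lat0)
    return east, north
```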

Factor graph structure:

Variables: Pose2 (x_enu, y_enu, heading) per image
Prior Factor (PriorFactorPose2): first frame anchored at ENU origin (0, 0, initial_heading) with tight noise (sigma_xy = 5m if GPS accurate, sigma_theta = 0.1 rad)
VO Factor (BetweenFactorPose2): relative motion between consecutive frames. Noise model: Diagonal.Sigmas([sigma_x, sigma_y, sigma_theta]) where sigma grows as the RANSAC inlier ratio falls. High inlier ratio (0.8) → sigma 2m. Low inlier ratio (0.4) → sigma 10m. Sigma_theta proportional to displacement magnitude.
Satellite Anchor Factor (PriorFactorPose2): absolute position from satellite matching. Position noise: sigma = reprojection_error × GSD × scale_factor. Good match (0.5px × 0.4m/px × 3) = 0.6m. Poor match = 5-10m. Heading component: loose (sigma = 1.0 rad) unless estimated from satellite alignment.
Satellite age adjustment: For tiles known to be >1 year old (conflict zones), multiply satellite anchor noise sigma by 2.0 to reduce their influence on optimization.
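
The noise figures above can be reproduced with two small helpers. The source gives only two anchor points for the VO sigma (0.8 → 2m, 0.4 → 10m); the linear interpolation and clamping between them are assumptions made for this sketch, as are the function names.

```python
from typing import Optional


def vo_sigma_xy(inlier_ratio: float) -> float:
    """VO position sigma in meters, interpolated between (0.4, 10m) and (0.8, 2m).

    Assumption: linear interpolation, clamped outside the [0.4, 0.8] range.
    """
    r = min(max(inlier_ratio, 0.4), 0.8)
    return 10.0 + (r - 0.4) * (2.0 - 10.0) / (0.8 - 0.4)


def satellite_sigma_xy(reprojection_error_px: float, gsd_m_per_px: float,
                       scale_factor: float = 3.0,
                       tile_age_years: Optional[float] = None) -> float:
    """sigma = reprojection_error x GSD x scale_factor, doubled for tiles older than 1 year."""
    sigma = reprojection_error_px * gsd_m_per_px * scale_factor
    if tile_age_years is not None and tile_age_years > 1.0:
        sigma *= 2.0
    return sigma


if __name__ == "__main__":
    print(vo_sigma_xy(0.6))
    print(satellite_sigma_xy(0.5, 0.4))
    print(satellite_sigma_xy(0.5, 0.4, tile_age_years=2.0))
```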

Optimizer behavior:

On each new frame: add VO factor, run iSAM2.update() → ~5ms
On satellite match arrival: add PriorFactorPose2, run iSAM2.update() → backward correction
Emit updated positions via SSE after each update
Refinement events: when backward correction moves positions by >1m, emit "refined" SSE event
No custom Python factors — all factors use native GTSAM C++ implementations for speed

Error handling:

Wrap every iSAM2.update() in try/except for gtsam.IndeterminantLinearSystemException
On exception: log error with factor details, skip the problematic factor, retry with 10x noise sigma
If initial prior factor fails: re-initialize graph with relaxed noise (sigma_xy = 50m, sigma_theta = 0.5 rad)
If persistent failures (>3 consecutive): reset graph from last known-good state, re-add factors incrementally
Never crash the pipeline — degrade to VO-only positioning if optimizer is unusable
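
One way to structure the retry policy above is a small guard around the update call, sketched here without importing GTSAM so it stays runnable. The try_update callable is hypothetical: it is assumed to add the factor with the given sigma and call iSAM2.update(). The exception type is a parameter because the exact Python exception class raised by the bindings should be confirmed against the installed gtsam version. The sketch also does not address whether a failed update leaves the optimizer in a usable state, which the real implementation would need to verify.

```python
import logging
from typing import Callable, Type

log = logging.getLogger("optimizer")


class GuardedOptimizer:
    """Applies the retry policy: normal sigma, then relaxed sigma, then skip."""

    def __init__(self, error_type: Type[BaseException] = RuntimeError,
                 relax_factor: float = 10.0, max_consecutive_failures: int = 3):
        self.error_type = error_type
        self.relax_factor = relax_factor
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def add_factor(self, try_update: Callable[[float], None], sigma: float) -> str:
        attempts = ((sigma, "applied"), (sigma * self.relax_factor, "applied_relaxed"))
        for attempt_sigma, outcome in attempts:
            try:
                try_update(attempt_sigma)
            except self.error_type as exc:
                log.error("update failed at sigma=%.2f: %s", attempt_sigma, exc)
                continue
            self.consecutive_failures = 0
            return outcome
        # Both attempts failed: the factor is skipped.
        self.consecutive_failures += 1
        if self.consecutive_failures > self.max_consecutive_failures:
            return "reset_needed"  # caller rebuilds from the last known-good state
        return "skipped"
```

The caller would treat "reset_needed" as the signal to rebuild the graph and fall back to VO-only positioning if the rebuild also fails.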

Component: Segment Manager

The segment manager tracks independent VO chains, manages drift thresholds, and handles reconnection.

Segment lifecycle:

Start condition: First image, OR VO triple failure check fails
Active tracking: VO provides frame-to-frame motion within segment
Anchoring: Satellite two-stage matching provides absolute position
End condition: VO failure (sharp turn, outlier >350m, occlusion)
New segment: Starts, attempts satellite anchor immediately

Segment states:

ANCHORED: At least one satellite match → HIGH confidence
FLOATING: No satellite match yet → positioned relative to segment start → LOW confidence
USER_ANCHORED: User provided manual GPS → MEDIUM confidence

Drift monitoring:

Track cumulative VO displacement since last satellite anchor per segment
100m threshold: emit warning SSE event, expand satellite search radius to 1km, increase matching attempts per frame
200m threshold: emit user_input_needed SSE event with configurable timeout (default: 30s)
500m threshold: mark all subsequent positions as VERY LOW confidence, continue processing
Confidence formula: confidence = base_confidence × exp(-drift / decay_constant) where base_confidence is from satellite match quality, drift is distance from nearest anchor, decay_constant = 100m
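
As a worked instance of the confidence formula and thresholds above: with base_confidence = 1.0 and drift equal to the 100m decay constant, confidence falls to exp(-1), roughly 0.37. The sketch below encodes that; the function names and the returned action labels are illustrative only.

```python
import math


def position_confidence(base_confidence: float, drift_m: float,
                        decay_constant_m: float = 100.0) -> float:
    """confidence = base_confidence * exp(-drift / decay_constant)."""
    return base_confidence * math.exp(-drift_m / decay_constant_m)


def drift_action(cumulative_drift_m: float) -> str:
    """Map cumulative VO drift since the last anchor to the strongest applicable action."""
    if cumulative_drift_m >= 500.0:
        return "very_low_confidence"
    if cumulative_drift_m >= 200.0:
        return "user_input_needed"
    if cumulative_drift_m >= 100.0:
        return "drift_warning"
    return "ok"


if __name__ == "__main__":
    for drift in (0.0, 100.0, 200.0, 500.0):
        print(drift, round(position_confidence(1.0, drift), 4), drift_action(drift))
```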

Segment reconnection:

When a segment becomes ANCHORED, check for nearby FLOATING segments (within 500m of any anchored position)
Attempt satellite-based position matching between FLOATING segment images and tiles near the ANCHORED segment
Reconnection order (for 5+ segments): process by proximity to nearest ANCHORED segment first (greedy nearest-neighbor)
Reconnection validation: require geometric consistency (heading continuity) and DEM elevation profile consistency between adjacent segments before merging
If no match after all frames tried: request user input, auto-continue after timeout

Component: Multi-Provider Satellite Tile Cache

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
Multi-provider progressive cache with DEM	aiohttp, aiofiles, sqlite3, faiss-cpu	Multiple providers. Async download. DINOv2/feature pre-computation. DEM cached. Session token management.	Needs internet. Provider API differences.	Google Maps Tiles API + Mapbox API keys	API keys in env vars only. Session tokens managed internally.	Async, non-blocking	Best

Selected: Multi-provider progressive cache.

Provider priority:

User-provided tiles (highest priority — custom/recent imagery)
Google Maps (zoom 18, ~0.4m/px) — 100K free requests/month, 15K/day
Mapbox Satellite (zoom 16-18, ~0.6-0.3m/px) — 200K free requests/month

Conflict zone awareness: For eastern Ukraine (configurable region polygon), Google Maps imagery is known to be 1-3 years old. The system should: 1) log a warning at job start, 2) multiply satellite anchor noise sigma by 2.0 (the same factor as the satellite age adjustment in the factor graph), 3) prioritize user-provided tiles if available, 4) lower the satellite matching confidence threshold by 0.3.

Google Maps session management:

On job start: POST to /v1/createSession with API key → receive session token
Use session token in all subsequent tile requests for this job
Token has finite lifetime — handle expiry by creating new session
Track request count per day per provider

Cache strategy:

On job start: download tiles in 1km radius around starting GPS from primary provider
Pre-compute DINOv2 ViT-S/14 patch embeddings for all cached tiles
As route extends: download tiles 500m ahead of estimated position
Request budgeting: track daily API requests per provider. At 80% daily limit (12,000 for Google): switch to Mapbox. Log budget status.
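
The 80% switch-over rule can be sketched as a small per-provider counter. The class and function names are hypothetical, and the Mapbox daily limit used in the usage example is a placeholder, since the source states only a monthly figure for Mapbox.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class ProviderBudget:
    daily_limit: int
    used_today: int = 0
    switch_fraction: float = 0.8  # switch providers at 80% of the daily limit

    def record(self, n: int = 1) -> None:
        self.used_today += n

    def over_threshold(self) -> bool:
        return self.used_today >= self.daily_limit * self.switch_fraction


def pick_provider(budgets: Dict[str, ProviderBudget],
                  priority: Sequence[str]) -> Optional[str]:
    """First provider in priority order that is still under its threshold."""
    for name in priority:
        if not budgets[name].over_threshold():
            return name
    return None  # all budgets spent: serve from cache only


if __name__ == "__main__":
    budgets = {
        "google": ProviderBudget(daily_limit=15_000),
        "mapbox": ProviderBudget(daily_limit=6_000),  # placeholder daily limit
    }
    budgets["google"].record(12_000)
    print(pick_provider(budgets, ["google", "mapbox"]))
```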

Cache structure on disk:

cache/
├── tiles/{provider}/{zoom}/{x}/{y}.jpg
├── embeddings/{provider}/{zoom}/{x}/{y}_dino.npy  (DINOv2 patch embedding)
└── dem/{lat}_{lon}.tif                            (Copernicus DEM tiles)

Cache persistent across jobs — tiles and features reused for overlapping areas
DEM cache: Copernicus DEM GLO-30 tiles from AWS S3 (free, no auth). s3://copernicus-dem-30m/. Cloud Optimized GeoTIFFs, 30m resolution. Downloaded via HTTPS (no AWS SDK needed): https://copernicus-dem-30m.s3.amazonaws.com/Copernicus_DSM_COG_10_{N|S}{lat}_00_{E|W}{lon}_DEM/...
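
To make the {zoom}/{x}/{y} layout concrete, the sketch below maps a coordinate to a tile index and cache path. It assumes standard Web Mercator XYZ tile indexing with 256px tiles, which the source does not state explicitly; function names are hypothetical. Under that assumption, zoom 18 at about 48°N works out to roughly 0.4 m/px, in line with the figure quoted for Google Maps.

```python
import math
from typing import Tuple

MAX_MERCATOR_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Web Mercator XYZ tile index containing the given coordinate."""
    n = 2 ** zoom
    lat = math.radians(max(-MAX_MERCATOR_LAT, min(MAX_MERCATOR_LAT, lat_deg)))
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return min(x, n - 1), min(y, n - 1)


def tile_cache_path(provider: str, zoom: int, x: int, y: int) -> str:
    """Relative path following the cache layout above."""
    return f"cache/tiles/{provider}/{zoom}/{x}/{y}.jpg"


def ground_resolution_m_per_px(lat_deg: float, zoom: int, tile_size: int = 256) -> float:
    """Approximate ground resolution of a Web Mercator tile at a latitude."""
    equator_circumference_m = 2.0 * math.pi * 6378137.0
    return equator_circumference_m * math.cos(math.radians(lat_deg)) / (tile_size * 2 ** zoom)


if __name__ == "__main__":
    x, y = latlon_to_tile(48.0, 37.0, 18)
    print(tile_cache_path("google", 18, x, y))
    print(round(ground_resolution_m_per_px(48.0, 18), 3))
```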

Tile download budget:

Google Maps: 100,000/month, 15,000/day → ~7 flights/day from cache misses, ~50 flights/month
Mapbox: 200,000/month → additional ~100 flights/month
Per flight: ~2000 satellite tiles (~80MB) + ~200 DEM tiles (~10MB)

Component: API & Real-Time Streaming

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
FastAPI + SSE (Queue-based) + JWT	FastAPI ≥0.135.0, asyncio.Queue, uvicorn, python-jose	Native SSE. Queue-based publisher avoids generator cleanup issues. JWT auth. OpenAPI auto-generated.	Python GIL (mitigated with asyncio).	Python 3.11+, uvicorn	JWT, CORS, rate limiting, CSP headers	Async, non-blocking	Best

Selected: FastAPI + Queue-based SSE + JWT authentication.

SSE implementation:

Use asyncio.Queue per client connection (not bare async generators)
Server pushes events to queue; client reads from queue
On disconnect: queue is garbage collected, no lingering generators
SSE heartbeat: send event: heartbeat every 15 seconds to detect stale connections
Support Last-Event-ID header for reconnection: include monotonic event ID in each SSE message. On reconnect, replay missed events from in-memory ring buffer (last 1000 events per job).
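
A queue-per-client publisher with a replay buffer might be structured as below. This is a framework-agnostic sketch: the JobEventBus class and its methods are hypothetical, and wiring it into a FastAPI streaming response (including the 15-second heartbeat timer) is left out.

```python
import asyncio
from collections import deque
from typing import Deque, List, Optional, Tuple

Event = Tuple[int, str, str]  # (monotonic id, event name, serialized data)


class JobEventBus:
    """Per-job publisher: one asyncio.Queue per client plus a replay ring buffer."""

    def __init__(self, buffer_size: int = 1000):
        self._next_id = 1
        self._buffer: Deque[Event] = deque(maxlen=buffer_size)
        self._subscribers: List[asyncio.Queue] = []

    def publish(self, name: str, data: str) -> int:
        event = (self._next_id, name, data)
        self._next_id += 1
        self._buffer.append(event)  # oldest events fall out once the buffer is full
        for queue in self._subscribers:
            queue.put_nowait(event)
        return event[0]

    def subscribe(self, last_event_id: Optional[int] = None) -> asyncio.Queue:
        """New client queue; replays buffered events newer than last_event_id."""
        queue: asyncio.Queue = asyncio.Queue()
        if last_event_id is not None:
            for event in self._buffer:
                if event[0] > last_event_id:
                    queue.put_nowait(event)
        self._subscribers.append(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.remove(queue)


def format_sse(event: Event) -> str:
    """Serialize one event in the text/event-stream wire format."""
    event_id, name, data = event
    return f"id: {event_id}\nevent: {name}\ndata: {data}\n\n"
```

A stream handler would call subscribe() with the parsed Last-Event-ID header, await queue.get() in a loop, and call unsubscribe() when the client disconnects.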

API Endpoints:

POST /auth/token
  Body: { api_key }
  Returns: { access_token, token_type, expires_in }

POST /jobs
  Headers: Authorization: Bearer <token>
  Body: { start_lat, start_lon, altitude, camera_params, image_folder }
  Returns: { job_id }

GET /jobs/{job_id}/stream
  Headers: Authorization: Bearer <token>
  SSE stream of:
    - { event: "position", id: "42", data: { image_id, lat, lon, confidence, segment_id } }
    - { event: "refined", id: "43", data: { image_id, lat, lon, confidence, delta_m } }
    - { event: "segment_start", id: "44", data: { segment_id, reason } }
    - { event: "drift_warning", id: "45", data: { segment_id, cumulative_drift_m } }
    - { event: "user_input_needed", id: "46", data: { image_id, reason, timeout_s } }
    - { event: "model_degraded", id: "47", data: { model, fallback, reason } }
    - { event: "heartbeat", id: "48", data: { timestamp } }
    - { event: "complete", id: "49", data: { summary } }

POST /jobs/{job_id}/anchor
  Headers: Authorization: Bearer <token>
  Body: { image_id, lat, lon }

POST /jobs/{job_id}/batch-anchor
  Headers: Authorization: Bearer <token>
  Body: { anchors: [{ image_id, lat, lon }, ...] }

GET /jobs/{job_id}/point-to-gps?image_id=X&px=100&py=200
  Headers: Authorization: Bearer <token>
  Returns: { lat, lon, confidence }

GET /jobs/{job_id}/results?format=geojson
  Headers: Authorization: Bearer <token>
  Returns: full results as GeoJSON or CSV (WGS84)

Security measures:

JWT authentication on all endpoints (short-lived tokens, 1h expiry)
Image folder whitelist: resolve to canonical path (os.path.realpath), verify under configured base directories
Image validation: magic byte check (JPEG FFD8, PNG 89504E47, TIFF 4949/4D4D), dimension check (<10,000px per side), reject others
Pin Pillow ≥11.3.0 (CVE-2025-48379 mitigation)
Pin PyTorch ≥2.10.0 (CVE-2025-32434 and CVE-2026-24747 mitigation)
SHA256 checksum verification for all model weights (especially LiteSAM from Google Drive)
Use weights_only=True for all torch.load() calls (defense-in-depth; not sole protection)
Prefer safetensors format where available (DINOv2 from HuggingFace)
Max concurrent SSE connections per client: 5
Rate limiting: 100 requests/minute per client
All provider API keys in environment variables, never logged or returned
CORS configured for known client origins only
Content-Security-Policy headers
SSE heartbeat prevents stale connections accumulating
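
The image folder whitelist check could be sketched as follows. The function name and the choice of PermissionError are illustrative. Comparing with os.path.commonpath rather than a string prefix avoids accepting a sibling such as /data/images_evil when /data/images is allowed.

```python
import os
from typing import Sequence


def resolve_image_folder(requested: str, allowed_bases: Sequence[str]) -> str:
    """Return the canonical path if it lies under an allowed base, else raise."""
    real = os.path.realpath(requested)  # resolves symlinks and ".." segments
    for base in allowed_bases:
        base_real = os.path.realpath(base)
        try:
            inside = os.path.commonpath([real, base_real]) == base_real
        except ValueError:
            inside = False  # e.g. paths on different drives on Windows
        if inside:
            return real
    raise PermissionError(f"image folder is outside the allowed directories: {requested}")
```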

Component: Interactive Point-to-GPS Lookup

For each processed image, the system stores the estimated camera-to-ground transformation. Given pixel coordinates (px, py):

If image has satellite match: use computed homography to project (px, py) → satellite tile coordinates → WGS84. HIGH confidence.
If image has only VO pose: use camera intrinsics + DEM-corrected altitude + estimated heading to ray-cast (px, py) to ground plane → WGS84. MEDIUM confidence.
Confidence score derived from underlying position estimate quality.
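
For the satellite-matched case, the projection chain (image pixel → satellite tile pixel → WGS84) can be sketched with plain arithmetic. The sketch assumes a 3×3 homography given as nested lists and standard Web Mercator XYZ tiles of 256px, neither of which the source specifies; function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def apply_homography(h: Matrix3, px: float, py: float) -> Tuple[float, float]:
    """Project a pixel through a 3x3 homography (homogeneous divide included)."""
    x = h[0][0] * px + h[0][1] * py + h[0][2]
    y = h[1][0] * px + h[1][1] * py + h[1][2]
    w = h[2][0] * px + h[2][1] * py + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return x / w, y / w


def tile_pixel_to_wgs84(tile_x: int, tile_y: int, pixel_x: float, pixel_y: float,
                        zoom: int, tile_size: int = 256) -> Tuple[float, float]:
    """Pixel inside a Web Mercator tile to (lat, lon) in degrees."""
    n = 2 ** zoom
    gx = tile_x + pixel_x / tile_size  # fractional tile coordinates
    gy = tile_y + pixel_y / tile_size
    lon = gx / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * gy / n))))
    return lat, lon


def point_to_gps(h_image_to_tile: Matrix3, px: float, py: float,
                 tile_x: int, tile_y: int, zoom: int) -> Tuple[float, float]:
    """Image pixel to WGS84 via the stored image-to-tile homography."""
    sx, sy = apply_homography(h_image_to_tile, px, py)
    return tile_pixel_to_wgs84(tile_x, tile_y, sx, sy, zoom)
```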

Processing Time Budget

Step	Component	Time	GPU/CPU	Notes
1	Image load + validate + downscale	<10ms	CPU	OpenCV
2	SuperPoint feature extraction	~80ms	GPU	256-dim descriptors
3	LightGlue ONNX FP16 matching	~50-100ms	GPU	Contextual matcher
4	Homography estimation + decomposition	~5ms	CPU	USAC_MAGSAC
5	GTSAM iSAM2 update (VO factor)	~5ms	CPU	Incremental
6	SSE position emit	<1ms	CPU	Queue push
VO subtotal		~150-200ms		Per-frame critical path
7	DINOv2 ViT-S/14 extract (UAV image)	~50ms	GPU	Patch tokens
8	faiss cosine search (top-5 tiles)	<1ms	CPU	~2000 vectors
9	LiteSAM fine matching (per tile, up to 5)	~140-210ms	GPU	End-to-end semi-dense, est. RTX 2060
10	Geometric validation + homography	~5ms	CPU
11	GTSAM iSAM2 update (satellite factor)	~5ms	CPU	Backward correction
Satellite subtotal		~201-271ms		Overlapped with next frame's VO
Total per frame		~350-470ms		Well under 5s budget

Testing Strategy

Integration / Functional Tests

End-to-end pipeline test using provided 60-image sample dataset with ground truth GPS
Verify at least 80% of positions are within 50m of ground truth
Verify at least 60% of positions are within 20m of ground truth
Test sharp turn handling: simulate 90° turn with non-overlapping images
Test segment creation, satellite anchoring, and cross-segment reconnection
Test segment reconnection ordering with 5+ disconnected segments
Test user manual anchor injection via POST endpoint
Test batch anchor endpoint with multiple anchors for multi-segment scenarios
Test point-to-GPS lookup accuracy against known ground coordinates
Test SSE streaming delivers results within 1s of processing completion
Test with FullHD resolution images (pipeline must not fail)
Test with 6252×4168 images (verify downscaling and memory usage)
Test DINOv2 ViT-S/14 coarse retrieval finds correct satellite tile with 100m VO drift
Test multi-provider fallback: block Google Maps, verify Mapbox takes over
Test with outdated satellite imagery: verify confidence scores reflect match quality
Test outlier handling: 350m gap between consecutive photos
Test image rotation handling: apply 45° and 90° rotation, verify 4-rotation retry works
Test SIFT+LightGlue fallback triggers when LiteSAM inlier ratio < 0.15
Test GTSAM PriorFactorPose2 satellite anchoring produces backward correction
Test drift warning at 100m cumulative displacement without satellite anchor
Test user_input_needed event at 200m cumulative displacement
Test SSE heartbeat arrives every 15s during long processing
Test SSE reconnection with Last-Event-ID replays missed events
Test homography decomposition disambiguation for first frame pair (no previous direction)
Test LiteSAM fine matching produces valid correspondences on satellite-aerial pair
Test LiteSAM subpixel accuracy improves homography estimation vs pixel-level only
Test EfficientLoFTR fallback activates when LiteSAM fails startup validation
Test VO-only mode when all satellite matchers fail to load
Test model_degraded SSE event is emitted on fallback activation
Test iSAM2 IndeterminantLinearSystemException recovery (skip factor + retry with relaxed noise)
Test iSAM2 initial prior factor failure recovery (relaxed re-initialization)
Test conflict zone imagery staleness: verify increased noise sigma for satellite anchors
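The two accuracy criteria at the top of this list (80% within 50m, 60% within 20m) can be checked with a small helper. This is a minimal sketch, assuming estimates and ground truth are available as aligned lists of (latitude, longitude) pairs in degrees. The function names are hypothetical and the haversine formula on a spherical Earth is an approximation that is adequate at these distances.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(estimates, truths, radius_m):
    """Share of estimates whose error is at most radius_m metres."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("estimates and truths must be non-empty and aligned")
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimates, truths)]
    return sum(err <= radius_m for err in errors) / len(errors)


def meets_accuracy_criteria(estimates, truths):
    """True when >=80% of positions are within 50m and >=60% within 20m."""
    return (fraction_within(estimates, truths, 50.0) >= 0.80
            and fraction_within(estimates, truths, 20.0) >= 0.60)
```

In an end-to-end test, `estimates` would come from the pipeline's final (refined) positions for the 60-image sample and `truths` from its ground-truth GPS file.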

Non-Functional Tests

Processing speed: <5s per image on RTX 2060 (target <470ms with SuperPoint+LightGlue VO)
Memory: peak RAM <16GB, VRAM <6GB during 3000-image flight at max resolution
VRAM: verify peak stays under 1.6GB during satellite matching phase (LiteSAM) or 1.8GB (EfficientLoFTR fallback)
Memory stability: process 3000 images, verify no memory leak (stable RSS over time)
Concurrent jobs: 2 simultaneous flights, verify isolation and resource sharing
Tile cache: verify tiles and DINOv2 embeddings cached and reused
API: load test SSE connections (10 simultaneous clients)
Recovery: kill and restart service mid-job, verify job can resume from last processed image
DEM download: verify Copernicus DEM tiles fetched from AWS S3 and cached correctly
GTSAM optimizer: verify backward correction produces "refined" events
Session token lifecycle: verify Google Maps session creation, usage, and expiry handling
Model startup validation: verify all weight checksums pass in under 10s total
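One way to make the "stable RSS over time" check concrete is to fit a straight line to RSS samples taken after each image and bound its slope. The sketch below assumes samples in MB, one per processed image (for example collected with `psutil.Process().memory_info().rss`, which is not shown here). The warm-up length and slope threshold are hypothetical values that would need tuning against a real run.

```python
def rss_slope_mb_per_image(rss_mb):
    """Least-squares slope of RSS (MB) against image index."""
    n = len(rss_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(rss_mb) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(rss_mb))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def is_memory_stable(rss_mb, warmup=100, max_slope_mb_per_image=0.05):
    """Ignore the warm-up samples (model loading, cache fill), then bound the growth rate.

    Both defaults are illustrative assumptions, not values from the design.
    """
    steady = rss_mb[warmup:]
    return rss_slope_mb_per_image(steady) <= max_slope_mb_per_image
```

A slope bound is less sensitive to one-off spikes than comparing the first and last sample, at the cost of missing leaks that only appear late in a run.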

Security Tests

JWT authentication enforcement on all endpoints
Expired/invalid token rejection
Provider API keys not exposed in responses, logs, or error messages
Image folder path traversal prevention (attempt to access /etc/passwd via image_folder)
Image folder whitelist enforcement (canonical path resolution)
Image magic byte validation: reject non-image files renamed to .jpg
Image dimension validation: reject >10,000px images
Input validation: invalid GPS coordinates, negative altitude, malformed camera params
Rate limiting: verify 429 response after exceeding limit
Max SSE connection enforcement
CORS enforcement: reject requests from unknown origins
Content-Security-Policy header presence
Pillow version ≥11.3.0 verified in requirements
PyTorch version ≥2.10.0 verified in requirements
SHA256 checksum verification for all model weight files
Verify weights_only=True used in all torch.load() calls
Verify safetensors format used for DINOv2 (no pickle deserialization)
LiteSAM weight integrity: verify SHA256 matches config on every load
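The checksum items above amount to hashing each weight file and comparing against a value pinned in config before the file is handed to any loader. A minimal sketch using only the standard library follows. The function names and the idea of passing the expected digest as a hex string are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large weight files are not read into RAM at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path, expected_sha256):
    """Raise ValueError if the file's SHA256 differs from the pinned value; return the digest otherwise."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(
            f"checksum mismatch for {Path(path).name}: expected {expected_sha256}, got {actual}"
        )
    return actual
```

The check should run before deserialization, so that a tampered pickle-based checkpoint is rejected without ever reaching `torch.load()`.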

References

LiteSAM (Remote Sensing, Oct 2025) — Lightweight satellite-aerial feature matching, 6.31M params, UAV-VisLoc Hard HR 61.65%, RMSE@30=17.86m; self-made dataset Hard HR 77.3%
LiteSAM GitHub — Official code, pretrained weights on Google Drive, 5 stars, built upon EfficientLoFTR
EfficientLoFTR (CVPR 2024) — LiteSAM's base architecture, 15.05M params, 964 stars, HuggingFace integration
XFeat (CVPR 2024) — 5x faster than SuperPoint, AUC@10° 65.4 (MNN) vs SuperPoint+LightGlue AUC@10° 75.0 (MIR 0.92). SP+LG more reliable on low-texture.
SatLoc-Fusion (2025) — hierarchical DINOv2+XFeat+optical flow, <15m on edge hardware
YFS90/GNSS-Denied-UAV-Geolocalization — <7m MAE with terrain-weighted constraint optimization
CEUSP (2025) — DINOv2-based cross-view UAV self-positioning
DINOv2 UAV Self-Localization (2025) — 86.27 R@1 on DenseUAV
DINOv2 ViT-S vs ViT-B comparison (Nature Scientific Reports 2024) — ViT-B +2.54pp recall over ViT-S, but 3-4x VRAM
LightGlue-ONNX — 2-4x speedup via ONNX/TensorRT, FP16 on Turing
SIFT+LightGlue UAV Mosaicking (ISPRS 2025) — SIFT superior for high-rotation conditions
LightGlue rotation issue #64 — confirmed not rotation-invariant
DALGlue (2025) — 11.8% MMA improvement over LightGlue for UAV
SALAD: DINOv2 Optimal Transport Aggregation (2024) — improved visual place recognition
NaviLoc (2025) — trajectory-level optimization, 19.5m MLE, 16x improvement
GTSAM v4.2 — factor graph optimization with Python bindings
GTSAM GPSFactor docs — GPSFactor works with Pose3 only
GTSAM Pose2 SLAM Example — BetweenFactorPose2 + PriorFactorPose2
GTSAM IndeterminantLinearSystemException — known failure mode, needs error handling
OpenCV decomposeHomographyMat issue #23282 — non-orthogonal matrices, 4-solution ambiguity
CVE-2025-32434 PyTorch — RCE with weights_only=True, fixed in PyTorch 2.6+
CVE-2026-24747 PyTorch — memory corruption in weights_only unpickler, fixed in 2.10.0+
Copernicus DEM GLO-30 on AWS — free 30m global DEM, no auth via S3
Google Maps Tiles API — satellite tiles, 100K free/month, session tokens required
Google Maps Tiles API billing — 15K/day, 6K/min rate limits
Google Maps Ukraine imagery policy — intentionally 1-3 years old for conflict zones
Maxar Ukraine imagery restored (2025) — paid-only, 31-50cm
Mapbox Satellite — alternative tile provider, up to 0.3m/px regional
FastAPI SSE — EventSourceResponse
SSE-Starlette cleanup issue #99 — async generator cleanup, Queue pattern recommended
CVE-2025-48379 Pillow — heap buffer overflow, fixed in 11.3.0
FAISS GPU wiki — ~2GB scratch space default, CPU recommended for small datasets
Oblique-Robust AVL (IEEE TGRS 2024) — rotation-equivariant features for UAV-satellite matching

Previous assessment research: _docs/00_research/gps_denied_nav_assessment/
Draft02 assessment research: _docs/00_research/gps_denied_draft02_assessment/
Draft03 assessment (LiteSAM): _docs/00_research/litesam_satellite_assessment/
This assessment research: _docs/00_research/draft04_assessment/
Previous AC assessment: _docs/00_research/gps_denied_visual_nav/00_ac_assessment.md
