azaion/gps-denied-desktop

mirror of https://github.com/azaion/gps-denied-desktop.git synced 2026-04-22 21:56:37 +00:00

Oleksandr Bezdieniezhnykh d764250f9a add solution drafts 3 times, used research skill, expand acceptance criteria

2026-03-14 20:38:00 +02:00

Solution Draft

Assessment Findings

Old Component Solution	Weak Point	New Solution
Direct SuperPoint+LightGlue satellite matching	Functional: No coarse localization stage. Fails when VO drift is large or satellite tile is wrong. Not rotation-invariant (GitHub issue #64). Returns false matches on non-overlapping pairs (issue #13).	Two-stage: DINOv2 coarse retrieval → SuperPoint+LightGlue fine alignment. Image rotation normalization. Geometric consistency check for match validation.
SuperPoint for all feature extraction	Performance: Unified pipeline is simpler but suboptimal. SuperPoint ~80ms per image is slower than needed for every-frame VO.	Dual-extractor: XFeat for VO (5x faster, ~15ms), SuperPoint for satellite matching (higher accuracy).
scipy.optimize sliding window	Functional: Generic optimizer. No proper uncertainty modeling per measurement. No terrain constraints. Reinvents what GTSAM already provides.	GTSAM iSAM2 factor graph: BetweenFactor (VO), GPSFactor (satellite anchors), terrain constraints from Copernicus DEM.
Google Maps as sole satellite provider	Functional: Eastern Ukraine imagery 3-5+ years old. $200/month free credit expired Feb 2025. 15K/day rate limit tight for large flights.	Multi-provider: Google Maps primary + Mapbox fallback + user-provided tiles. Request budgeting.
No image downscaling strategy	Performance/Memory: 6252×4168 images cannot fit in 6GB VRAM for feature extraction. No memory budget specified.	Downscale to 1600 long edge for feature extraction. Streaming one-at-a-time processing. Explicit memory budgets.
No camera rotation handling	Functional: Non-stabilized camera produces rotated images. SuperPoint/LightGlue fail at 90° rotation.	Estimate heading from VO chain. Rectify images before satellite matching. SIFT fallback for rotation-heavy cases.
Homography VO without terrain correction	Functional: GSD assumes constant altitude. No fallback for non-planar scenes. Decomposition can be unstable.	Integrate Copernicus DEM for terrain-corrected GSD. Essential matrix fallback when RANSAC inlier ratio is low.
No non-match detection	Functional: VO failure detection relies on match count only. Misses geometrically inconsistent matches.	Triple check: match count + RANSAC inlier ratio + motion consistency with previous frames.
API key authentication only	Security: API keys in URLs persist in logs and browser history. No SSE connection limits. No DoS protection.	JWT authentication. Short-lived SSE tokens. Rate limiting. Connection pool limits. Image size validation.
Segment reconnection via satellite only	Functional: Floating segments with no satellite match stay permanently unresolved.	Cross-segment matching when new anchors arrive. DEM constraints. Configurable user-input timeout with auto-continue.

Product Solution Description

A Python-based GPS-denied visual navigation service that determines GPS coordinates of consecutive UAV photo centers using a hierarchical localization approach: fast visual odometry for frame-to-frame motion, two-stage satellite geo-referencing (coarse retrieval + fine matching) for absolute positioning, and factor graph optimization for trajectory refinement. The system operates as a background REST API service with real-time SSE streaming.

Core approach: Consecutive images are matched using XFeat (fast learned features) to estimate relative motion (visual odometry). Periodically, each image is geo-referenced against satellite imagery through a two-stage process: DINOv2 global retrieval selects the best-matching satellite tile, then SuperPoint+LightGlue refines the alignment to pixel precision. A GTSAM iSAM2 factor graph fuses VO constraints, satellite anchors, and DEM terrain constraints to produce an optimized trajectory. The system handles route disconnections by treating each continuous VO chain as an independent segment, geo-referenced through satellite matching and connected via the shared WGS84 coordinate frame.

┌─────────────────────────────────────────────────────────────────────┐
│                        Client (Desktop App)                         │
│   POST /jobs (start GPS, camera params, image folder)               │
│   GET  /jobs/{id}/stream (SSE)                                      │
│   POST /jobs/{id}/anchor (user manual GPS input)                    │
│   GET  /jobs/{id}/point-to-gps (image_id, px, py)                   │
└──────────────────────┬──────────────────────────────────────────────┘
                       │ HTTP/SSE (JWT auth)
┌──────────────────────▼──────────────────────────────────────────────┐
│                     FastAPI Service Layer                            │
│   Job Manager → Pipeline Orchestrator → SSE Event Emitter           │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────────────┐
│                     Processing Pipeline                              │
│                                                                      │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────────────┐   │
│  │ Image        │  │  Visual      │  │ Satellite Geo-Ref       │   │
│  │ Preprocessor │→│  Odometry    │→│ Stage 1: DINOv2 retrieval│   │
│  │ (downscale,  │  │ (XFeat +    │  │ Stage 2: SuperPoint +   │   │
│  │  rectify)    │  │  LightGlue) │  │   LightGlue refinement  │   │
│  └──────────────┘  └──────────────┘  └─────────────────────────┘   │
│         │                │                       │                   │
│         ▼                ▼                       ▼                   │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │       GTSAM iSAM2 Factor Graph Optimizer                    │   │
│  │  (VO factors + satellite anchors + DEM terrain constraints)  │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                              │                                       │
│  ┌───────────────────────────▼──────────────────────────────────┐   │
│  │                  Segment Manager                              │   │
│  │   (independent segments, cross-segment reconnection)          │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │          Multi-Provider Satellite Tile Cache                  │   │
│  │  (Google Maps + Mapbox + user tiles, disk cache, DEM cache)   │   │
│  └──────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘

Architecture

Component: Image Preprocessor

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
Downscale + rectify pipeline	OpenCV resize, NumPy	Normalizes input for all downstream components. Consistent memory usage. Preserves full-res metadata for GSD.	Loses fine detail in downscaled images.	OpenCV, NumPy	Input validation on image files	<10ms per image	Best

Selected: Downscale + rectify pipeline.

Preprocessing per image:

Load image, validate format and dimensions
Downscale to max 1600 pixels on longest edge (preserving aspect ratio) for feature extraction
Store original resolution for GSD calculation: GSD = (altitude × sensor_width) / (focal_length × original_width)
If estimated heading is available (from previous VO): rotate image to approximate north-up orientation for satellite matching
Convert to grayscale for feature extraction
Output: downscaled grayscale image + metadata (original dims, GSD, estimated heading)
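
The size and GSD arithmetic in the steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names and the example camera values (400 m above ground, 23.5 mm sensor, 25 mm lens) are assumptions. It also shows that the GSD of the downscaled image is the full-resolution GSD multiplied by original_width / downscaled_width.

```python
def downscaled_size(width: int, height: int, max_edge: int = 1600) -> tuple[int, int]:
    """Return (width, height) scaled so the longest edge is at most max_edge."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # never upscale small images
    scale = max_edge / longest
    return round(width * scale), round(height * scale)


def gsd_m_per_px(altitude_m: float, sensor_width_mm: float,
                 focal_length_mm: float, image_width_px: int) -> float:
    """Ground sampling distance in meters per pixel for a nadir-looking camera."""
    return (altitude_m * sensor_width_mm) / (focal_length_mm * image_width_px)


# Hypothetical camera parameters, chosen only for illustration.
orig_w, orig_h = 6252, 4168
small_w, small_h = downscaled_size(orig_w, orig_h)
gsd_full = gsd_m_per_px(400.0, 23.5, 25.0, orig_w)
# One downscaled pixel covers more ground than one original pixel.
gsd_small = gsd_full * orig_w / small_w
```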

Component: Feature Extraction

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
XFeat (for VO)	accelerated_features (PyTorch)	5x faster than SuperPoint. CPU-capable. Sparse + semi-dense matching. Used in SatLoc-Fusion.	Fewer keypoints than SuperPoint in some scenes.	PyTorch	Model weights from official source	~15ms GPU, ~50ms CPU	Best for VO
SuperPoint (for satellite matching)	superpoint (PyTorch)	Learned features, robust to viewpoint/illumination. Proven for satellite matching (ISPRS 2025). 256-dim descriptors.	Slower than XFeat. Not rotation-invariant.	NVIDIA GPU, PyTorch, CUDA	Model weights from official source	~80ms GPU	Best for satellite
SIFT (fallback)	OpenCV cv2.SIFT	Rotation-invariant. Scale-invariant. Better for high-rotation scenarios.	Slower. Less discriminative in low-texture.	OpenCV	N/A	~200ms CPU	Rotation fallback

Selected: XFeat for VO, SuperPoint for satellite matching, SIFT as rotation-heavy fallback.

Component: Feature Matching

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
LightGlue + ONNX/TensorRT	lightglue + LightGlue-ONNX	2-4x faster than PyTorch via ONNX. Best balance of speed/accuracy. FlashAttention-2 + TopK-trick for additional 30%.	FP8 not available on RTX 2060 (Turing). Not rotation-invariant.	NVIDIA GPU, ONNX Runtime / TensorRT	Model weights from official source	~50-100ms ONNX on RTX 2060	Best
SuperGlue	superglue (PyTorch)	Strong spatial context. 93% match rate.	2x slower than LightGlue. Non-commercial license.	NVIDIA GPU, PyTorch	Model weights from official source	~100-200ms	Backup
DALGlue	dalglue (PyTorch)	11.8% MMA improvement over LightGlue. UAV-optimized wavelet preprocessing.	Very new (2025). Limited production validation.	NVIDIA GPU, PyTorch	Model weights from official source	Comparable to LightGlue	Monitor for future

Selected: LightGlue with ONNX optimization. DALGlue to evaluate when mature.

Component: Visual Odometry (Consecutive Frame Matching)

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
Homography VO with essential matrix fallback	OpenCV findHomography, findEssentialMat, decomposeHomographyMat	Homography: optimal for flat terrain. Essential matrix: handles non-planar. Known altitude resolves scale.	Homography assumes planar. Essential matrix: more complex, scale ambiguity.	OpenCV, NumPy	N/A	~5ms for estimation	Best
ORB-SLAM3 monocular	ORB-SLAM3	Full SLAM with loop closure.	Heavy. Map building unnecessary. C++ dependency.	ROS (optional), C++	N/A	—	Over-engineered

Selected: Homography VO with essential matrix fallback and DEM terrain correction.

VO Pipeline per frame:

Extract XFeat features from current image (~15ms)
Match with previous image using LightGlue ONNX (~50ms)
Triple failure check: match count ≥ 30 AND RANSAC inlier ratio ≥ 0.4 AND motion magnitude consistent with the expected inter-frame distance (nominally ~100m; displacements up to 350m are tolerated as outliers)
If checks pass → estimate homography (cv2.findHomography with USAC_MAGSAC)
If RANSAC inlier ratio < 0.6 → additionally estimate essential matrix (cv2.findEssentialMat) as quality check
Decompose homography → rotation + translation
Select correct decomposition (motion consistent with previous direction + positive depth)
Terrain-corrected GSD: query Copernicus DEM at estimated position → effective_altitude = flight_altitude - terrain_elevation → GSD = (effective_altitude × sensor_width) / (focal_length × original_image_width)
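Convert pixel displacement to meters: displacement_m = displacement_px × GSD, where displacement_px is expressed in original-resolution pixels (a displacement measured on the downscaled image must first be multiplied by original_image_width / downscaled_image_width)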
Update position: new_pos = prev_pos + rotation @ displacement_m
Track cumulative heading for image rectification
If triple failure check fails → trigger segment break
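
A minimal sketch of the triple failure check, the terrain-corrected GSD, and the position update from the steps above. The thresholds come from the list; the function names and the coordinate conventions (position as east/north meters, heading clockwise from north, displacement as right/forward pixels) are assumptions made for illustration. The motion check is simplified here to an upper bound of 350m.

```python
import math

MIN_MATCHES = 30
MIN_INLIER_RATIO = 0.4
MAX_STEP_M = 350.0  # largest inter-frame displacement still accepted


def vo_checks_pass(match_count: int, inlier_ratio: float, step_m: float) -> bool:
    """Triple failure check: all three conditions must hold."""
    return (match_count >= MIN_MATCHES
            and inlier_ratio >= MIN_INLIER_RATIO
            and 0.0 <= step_m <= MAX_STEP_M)


def terrain_corrected_gsd(flight_alt_m, terrain_elev_m, sensor_width_mm,
                          focal_length_mm, image_width_px):
    """GSD using height above ground instead of height above sea level."""
    effective_alt = flight_alt_m - terrain_elev_m
    if effective_alt <= 0:
        raise ValueError("flight altitude must be above terrain")
    return effective_alt * sensor_width_mm / (focal_length_mm * image_width_px)


def update_position(prev_xy, heading_rad, disp_px, gsd):
    """new_pos = prev_pos + R(heading) @ (displacement_px * GSD).

    prev_xy is (east, north) in meters; disp_px is (right, forward) in
    pixels at the resolution the GSD refers to.
    """
    right_m, fwd_m = disp_px[0] * gsd, disp_px[1] * gsd
    east = right_m * math.cos(heading_rad) + fwd_m * math.sin(heading_rad)
    north = -right_m * math.sin(heading_rad) + fwd_m * math.cos(heading_rad)
    return prev_xy[0] + east, prev_xy[1] + north
```

When `vo_checks_pass` returns False, the caller would trigger a segment break instead of calling `update_position`.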

Component: Satellite Image Geo-Referencing (Two-Stage)

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
Stage 1: DINOv2 coarse retrieval	dinov2 (PyTorch), faiss	Handles large viewpoint/domain gap. Finds correct area even with 100-200m VO drift. Semantic matching robust to seasonal change.	Coarse only (~tile-level). Needs pre-computed satellite tile embeddings.	PyTorch, faiss	Model weights from official source	~50ms per query	Best coarse
Stage 2: SuperPoint+LightGlue fine matching with perspective warping	SuperPoint, LightGlue-ONNX, OpenCV warpPerspective	Precise alignment. Pixel-level accuracy. Proven on satellite benchmarks.	Needs rough pose for warping. Fails without Stage 1 on large drift.	PyTorch, ONNX Runtime, OpenCV	API keys secured	~150ms total	Best fine

Selected: Two-stage hierarchical matching.

Satellite Matching Pipeline:

Estimate approximate position from VO
Stage 1 (coarse retrieval):
  a. Define search area: 500m radius around VO estimate (expand to 1km if segment just started)
  b. Pre-compute DINOv2 embeddings for all satellite tiles in search area (cached)
  c. Extract DINOv2 embedding from rectified UAV image
  d. Find top-5 most similar satellite tiles using faiss cosine similarity
Stage 2 (fine matching on top-5 tiles, stop on first good match):
  a. Warp UAV image to approximate nadir view using estimated camera pose
  b. Extract SuperPoint features from warped UAV image
  c. Extract SuperPoint features from satellite tile (pre-computed and cached)
  d. Match with LightGlue ONNX
  e. Geometric validation: require ≥15 inliers, inlier ratio ≥ 0.3, reprojection error < 3px
  f. If valid: estimate homography → transform image center → satellite pixel → WGS84
  g. Report: absolute position anchor with confidence based on match quality
If all 5 tiles fail Stage 2: try SIFT+LightGlue (rotation-invariant), try zoom level 17 (wider view)
If still fails: mark frame as VO-only, reduce confidence, continue
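
The control flow of the two stages can be sketched in plain Python. This is an illustration only: a brute-force cosine ranking stands in for the faiss index, and `fine_match` is a hypothetical callable that returns (inliers, total_matches, reprojection_error_px) for a tile. The validation thresholds are the ones listed above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_k_tiles(query, tile_embeddings, k=5):
    """Return tile ids sorted by descending cosine similarity to the query."""
    scored = sorted(tile_embeddings.items(),
                    key=lambda item: cosine(query, item[1]), reverse=True)
    return [tile_id for tile_id, _ in scored[:k]]


def match_is_valid(inliers, total_matches, reproj_err_px):
    """Stage 2 gate: >=15 inliers, inlier ratio >=0.3, reprojection error <3 px."""
    if total_matches == 0:
        return False
    return (inliers >= 15
            and inliers / total_matches >= 0.3
            and reproj_err_px < 3.0)


def first_valid_tile(candidates, fine_match):
    """Try candidates in rank order and stop on the first tile that validates."""
    for tile_id in candidates:
        inliers, total, err = fine_match(tile_id)
        if match_is_valid(inliers, total, err):
            return tile_id
    return None  # caller falls back to SIFT, zoom 17, or VO-only
```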

Satellite matching frequency: attempted on every frame, but asynchronously, so it never blocks the VO pipeline. When a satellite result arrives, it is added to the factor graph retroactively.
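
One way to keep satellite matching off the VO critical path is to run it as background asyncio tasks. The sketch below is an assumption about how this could be structured, not the service's actual design; `satellite_match` just sleeps in place of the real retrieval and matching work.

```python
import asyncio


async def satellite_match(frame_id: int) -> tuple[int, str]:
    """Stand-in for retrieval + fine matching; sleeps instead of using the GPU."""
    await asyncio.sleep(0.05)
    return frame_id, "anchor"


async def run_pipeline(frame_ids):
    events = []
    pending = []
    for frame_id in frame_ids:
        events.append(("vo", frame_id))      # VO result is emitted immediately
        pending.append(asyncio.create_task(satellite_match(frame_id)))
        await asyncio.sleep(0)               # let background tasks make progress
    for finished in asyncio.as_completed(pending):
        frame_id, kind = await finished
        events.append((kind, frame_id))      # added to the graph retroactively
    return events


events = asyncio.run(run_pipeline([1, 2, 3]))
```

In this toy run all VO events are recorded before any anchor arrives, which mirrors the intended behavior: anchors correct poses that have already been reported.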

Component: GTSAM Factor Graph Optimizer

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
GTSAM iSAM2 factor graph	gtsam 4.2 (pip)	Incremental smoothing. Proper uncertainty propagation. Built-in GPSFactor. Backward smoothing on new evidence. Python bindings. Production-proven.	C++ backend (pip binary handles this). Learning curve for factor graph API.	gtsam==4.2, NumPy	N/A	~5-10ms incremental update	Best
scipy.optimize sliding window	scipy, NumPy	Simple. No external dependency.	Generic optimizer. No uncertainty. No incremental update.	SciPy, NumPy	N/A	~10-50ms per window	Baseline only

Selected: GTSAM iSAM2.

Factor graph structure:

Variables: Pose2 (x, y, heading) per image
VO Factor (BetweenFactorPose2): relative motion between consecutive frames. Noise model: diagonal with sigma proportional to 1 / inlier_ratio. Higher inlier ratio = lower uncertainty.
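Satellite Anchor Factor (GPSFactor or PriorFactorPoint2): absolute position from satellite matching. Note that GTSAM's built-in GPSFactor is defined on Pose3 variables and PriorFactorPoint2 on Point2 variables, so with Pose2 variables the anchor needs a position-only prior on the pose (for example a pose prior with a very loose heading sigma, or a custom factor). Noise model: sigma proportional to reprojection_error × GSD. Good match (~0.5px × 0.4m/px) = 0.2m sigma. Poor match = 5-10m sigma.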
DEM Terrain Factor (custom): constrains altitude to be consistent with Copernicus DEM at estimated position. Soft constraint, sigma = 5m.
Drift Limit Factor (custom): penalizes cumulative VO displacement between satellite anchors exceeding 100m. Activated only when the VO path between two consecutive anchors exceeds this threshold.
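
The two noise rules above reduce to simple arithmetic. In this sketch the function names, `base_sigma_m` and `floor_m` are assumptions added for illustration; the resulting sigmas would typically be passed to a diagonal noise model such as `gtsam.noiseModel.Diagonal.Sigmas`.

```python
def vo_sigma_m(inlier_ratio: float, base_sigma_m: float = 1.0) -> float:
    """VO translation sigma, proportional to 1 / inlier_ratio."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    return base_sigma_m / inlier_ratio


def anchor_sigma_m(reproj_err_px: float, gsd_m_per_px: float,
                   floor_m: float = 0.1) -> float:
    """Satellite anchor sigma: reprojection error converted to meters.

    floor_m keeps a near-zero reprojection error from producing an
    overconfident anchor (an assumption, not taken from the design).
    """
    return max(floor_m, reproj_err_px * gsd_m_per_px)
```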

Optimizer behavior:

On each new frame: add VO factor, run iSAM2.update() → ~5ms
On satellite match arrival: add anchor factor, run iSAM2.update() → triggers backward correction of recent poses
Emit updated positions via SSE after each update
Refinement events: when backward correction moves positions by >1m, emit "refined" SSE event
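
Detecting which poses warrant a "refined" event is a comparison of estimates before and after an optimizer update. A minimal sketch, assuming poses are available as (x, y) meters in a local metric frame keyed by image_id; the function name and event dict layout are illustrative.

```python
import math

REFINE_THRESHOLD_M = 1.0


def refined_events(before: dict, after: dict, threshold_m: float = REFINE_THRESHOLD_M):
    """Compare poses before and after an optimizer update.

    Both dicts map image_id -> (x_m, y_m) in a local metric frame.
    """
    events = []
    for image_id, (x_new, y_new) in after.items():
        if image_id not in before:
            continue  # new pose, reported as a normal "position" event
        x_old, y_old = before[image_id]
        delta_m = math.hypot(x_new - x_old, y_new - y_old)
        if delta_m > threshold_m:
            events.append({"event": "refined", "image_id": image_id,
                           "delta_m": round(delta_m, 2)})
    return events
```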

Component: Segment Manager

The segment manager tracks independent VO chains and manages their lifecycle and interconnection.

Segment lifecycle:

Start condition: First image, OR VO triple failure check fails
Active tracking: VO provides frame-to-frame motion within segment
Anchoring: Satellite two-stage matching provides absolute position
End condition: VO failure (sharp turn, outlier >350m, occlusion)
New segment: Starts from satellite anchor or user GPS

Segment states:

ANCHORED: At least one satellite match → HIGH confidence
FLOATING: No satellite match yet → positioned relative to start point → LOW confidence
USER_ANCHORED: User provided manual GPS → MEDIUM confidence
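
The three states and their confidence levels map naturally onto an enum. The class and method names below are assumptions, as is the rule that a satellite anchor is not downgraded by later user input.

```python
from dataclasses import dataclass, field
from enum import Enum


class SegmentState(Enum):
    ANCHORED = "anchored"
    FLOATING = "floating"
    USER_ANCHORED = "user_anchored"


CONFIDENCE = {
    SegmentState.ANCHORED: "HIGH",
    SegmentState.USER_ANCHORED: "MEDIUM",
    SegmentState.FLOATING: "LOW",
}


@dataclass
class Segment:
    segment_id: int
    state: SegmentState = SegmentState.FLOATING  # every segment starts unanchored
    image_ids: list = field(default_factory=list)

    def on_satellite_match(self):
        self.state = SegmentState.ANCHORED

    def on_user_anchor(self):
        # Assumption: user input only upgrades a FLOATING segment.
        if self.state is SegmentState.FLOATING:
            self.state = SegmentState.USER_ANCHORED

    @property
    def confidence(self) -> str:
        return CONFIDENCE[self.state]
```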

Enhanced segment reconnection:

When a segment becomes ANCHORED, check for nearby FLOATING segments (within 500m of any anchored position in the new segment)
Attempt satellite-based position matching between FLOATING segment images and satellite tiles near the ANCHORED segment
If match found: anchor the floating segment and connect to the trajectory
DEM consistency check: ensure segment positions are consistent with terrain elevation
If no match after all frames in floating segment are tried: request user input with configurable timeout (default: 30s), then continue with best VO estimate
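
The first reconnection step, finding FLOATING segments near a newly anchored one, can be sketched as a radius search. This assumes positions are already expressed as (x, y) meters in a shared local frame; the function name and data layout are hypothetical.

```python
import math


def nearby_floating(anchored_positions, floating_segments, radius_m=500.0):
    """Return ids of floating segments with any pose within radius_m of any anchored pose.

    anchored_positions: list of (x_m, y_m) in a local metric frame.
    floating_segments: dict of segment_id -> list of (x_m, y_m).
    """
    hits = []
    for seg_id, poses in floating_segments.items():
        if any(math.dist(p, a) <= radius_m
               for p in poses for a in anchored_positions):
            hits.append(seg_id)
    return hits
```

Only the segments returned here would go on to the more expensive satellite-based matching step.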

Component: Multi-Provider Satellite Tile Cache

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
Multi-provider progressive cache with DEM	aiohttp, aiofiles, sqlite3, faiss	Multiple providers for coverage. Async download. DINOv2 embeddings pre-computed. DEM cached alongside tiles.	Needs internet. Provider API differences.	Google Maps Tiles API + Mapbox API keys	API keys in env vars only. Never logged.	Async, non-blocking	Best

Selected: Multi-provider progressive cache.

Provider priority:

User-provided tiles (highest priority — custom/recent imagery for the area)
Google Maps (zoom 18, ~0.4m/px) — 100K free tiles/month
Mapbox Satellite (zoom 16+, up to 0.3m/px) — 200K free requests/month
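
The ~0.4m/px figure for zoom 18 follows from the Web Mercator ground resolution formula. The sketch below assumes 256-pixel tiles; the latitude of 48° is only an example value.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius used by Web Mercator


def meters_per_pixel(lat_deg: float, zoom: int, tile_size: int = 256) -> float:
    """Ground resolution of a Web Mercator tile pixel at the given latitude."""
    circumference = 2 * math.pi * EARTH_RADIUS_M
    return circumference * math.cos(math.radians(lat_deg)) / (tile_size * 2 ** zoom)


res_z18 = meters_per_pixel(48.0, 18)  # roughly 0.4 m/px
res_z17 = meters_per_pixel(48.0, 17)  # twice as coarse, wider view per tile
```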

Cache strategy:

On job start: download tiles in 1km radius around starting GPS from primary provider
Pre-compute SuperPoint features AND DINOv2 embeddings for all cached tiles
As route extends: download tiles 500m ahead of estimated position
Request budgeting: track daily API requests, switch to secondary provider at 80% of daily limit
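
Request budgeting with an 80% switch-over can be sketched as a small counter per provider. The class, the function name, and the limits used in the usage example are hypothetical; real limits depend on the provider plan.

```python
from dataclasses import dataclass


@dataclass
class ProviderBudget:
    name: str
    daily_limit: int
    used_today: int = 0

    def exhausted(self, threshold: float = 0.8) -> bool:
        """True once usage reaches the given fraction of the daily limit."""
        return self.used_today >= threshold * self.daily_limit


def pick_provider(providers):
    """Return the first provider (in priority order) still under its threshold."""
    for provider in providers:
        if not provider.exhausted():
            provider.used_today += 1  # count the request we are about to make
            return provider.name
    return None  # all budgets spent: serve from cache only


google = ProviderBudget("google", daily_limit=10)
mapbox = ProviderBudget("mapbox", daily_limit=10)
picks = [pick_provider([google, mapbox]) for _ in range(10)]
```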

Cache structure on disk:

cache/
├── tiles/{provider}/{zoom}/{x}/{y}.jpg
├── features/{provider}/{zoom}/{x}/{y}_sp.npz     (SuperPoint features)
├── embeddings/{provider}/{zoom}/{x}/{y}_dino.npz  (DINOv2 embedding)
└── dem/{lat}_{lon}.tif                            (Copernicus DEM tiles)

Cache is persistent across jobs — tiles and features reused for overlapping areas
DEM cache: download Copernicus DEM GLO-30 tiles alongside satellite tiles. 30m resolution is sufficient for terrain correction at flight altitude.
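
Mapping a WGS84 position to the cache layout above requires tile indices. The sketch below uses the common XYZ (slippy map) Web Mercator scheme and assumes the providers follow it; the function names are illustrative.

```python
import math
from pathlib import PurePosixPath


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Web Mercator (slippy map) tile indices for a WGS84 point."""
    n = 2 ** zoom
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y


def cache_paths(provider: str, zoom: int, x: int, y: int, root: str = "cache") -> dict:
    """Paths of the tile image and its derived artifacts in the on-disk cache."""
    base = PurePosixPath(root)
    tile_dir = PurePosixPath(provider) / str(zoom) / str(x)
    return {
        "tile": base / "tiles" / tile_dir / f"{y}.jpg",
        "features": base / "features" / tile_dir / f"{y}_sp.npz",
        "embedding": base / "embeddings" / tile_dir / f"{y}_dino.npz",
    }
```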

Tile download budget (revised):

Google Maps: 100,000 tiles/month free → ~50 flights at 2000 tiles each
Mapbox: 200,000 requests/month free → additional capacity
Per flight: ~2000 satellite tiles (~80MB) + ~500 DEM tiles (~20MB)
Combined free tier handles operational volume

Component: API & Real-Time Streaming

Solution	Tools	Advantages	Limitations	Requirements	Security	Performance	Fit
FastAPI + SSE + JWT	FastAPI ≥0.135.0, EventSourceResponse, uvicorn, python-jose	Native SSE. Async pipeline. OpenAPI auto-generated. JWT for proper auth.	Python GIL (mitigated with asyncio + GPU-bound ops).	Python 3.11+, uvicorn	JWT auth, CORS, rate limiting	Async, non-blocking	Best

Selected: FastAPI + SSE + JWT authentication.

API Endpoints:

POST /auth/token
  Body: { api_key }
  Returns: { access_token, token_type, expires_in }

POST /jobs
  Headers: Authorization: Bearer <token>
  Body: { start_lat, start_lon, altitude, camera_params, image_folder }
  Returns: { job_id }

GET /jobs/{job_id}/stream
  Headers: Authorization: Bearer <token>
  SSE stream of:
    - { event: "position", data: { image_id, lat, lon, confidence, segment_id } }
    - { event: "refined", data: { image_id, lat, lon, confidence, delta_m } }
    - { event: "segment_start", data: { segment_id, reason } }
    - { event: "user_input_needed", data: { image_id, reason, timeout_s } }
    - { event: "complete", data: { summary } }

POST /jobs/{job_id}/anchor
  Headers: Authorization: Bearer <token>
  Body: { image_id, lat, lon }
  Manual user GPS input for an image

GET /jobs/{job_id}/point-to-gps?image_id=X&px=100&py=200
  Headers: Authorization: Bearer <token>
  Returns: { lat, lon, confidence }

GET /jobs/{job_id}/results?format=geojson
  Headers: Authorization: Bearer <token>
  Returns: full results as GeoJSON or CSV (WGS84)
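
On the wire, each SSE event listed above is an `event:` line, a `data:` line and a blank line. A minimal serializer, independent of any web framework, might look like this; the function name and the sample payload values are illustrative.

```python
import json


def sse_message(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events message (event name + single data line)."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


sample = {"image_id": "img_0001", "lat": 48.1, "lon": 37.5,
          "confidence": "HIGH", "segment_id": 1}
message = sse_message("position", sample)
```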

Security measures:

JWT authentication on all endpoints (short-lived tokens, 1h expiry)
Image folder whitelist: only paths under configured base directories allowed
Max image dimensions: 8000×8000 pixels
Max concurrent SSE connections per client: 5
Rate limiting: 100 requests/minute per client
All provider API keys in environment variables, never logged or returned in responses
CORS configured for known client origins only
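
The image folder whitelist can be enforced by resolving the requested path and checking that it stays under a configured base directory. This is a sketch under assumptions: `/srv/uav_images` is a hypothetical base directory, and `Path.is_relative_to` requires Python 3.9 or newer.

```python
from pathlib import Path

ALLOWED_BASE_DIRS = [Path("/srv/uav_images")]  # hypothetical configured base directory


def is_allowed_folder(image_folder: str, bases=ALLOWED_BASE_DIRS) -> bool:
    """Accept only folders that resolve to a location under an allowed base."""
    resolved = Path(image_folder).resolve()  # collapses ".." and follows symlinks
    return any(resolved.is_relative_to(base.resolve()) for base in bases)
```

Resolving before comparing is what defeats traversal attempts such as `/srv/uav_images/../../etc/passwd`.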

Component: Interactive Point-to-GPS Lookup

For each processed image, the system stores the estimated camera-to-ground homography. Given pixel coordinates (px, py):

If image has satellite match: use computed homography to project (px, py) → satellite tile coordinates → WGS84. HIGH confidence.
If image has only VO pose: use camera intrinsics + DEM-corrected altitude + estimated heading to ray-cast (px, py) to ground plane → WGS84. MEDIUM confidence.
Confidence score derived from underlying position estimate quality.
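
For the VO-only case, the projection can be sketched under simplifying assumptions: a nadir-looking camera, locally flat ground, heading measured clockwise from north with the image top pointing along the heading, and a spherical approximation for converting meters to degrees. The function name and constant are illustrative.

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # spherical approximation, adequate over one image footprint


def pixel_to_gps(px, py, image_w, image_h, center_lat, center_lon,
                 heading_rad, gsd_m_per_px):
    """Project a pixel to WGS84 assuming a nadir camera over locally flat ground."""
    right_m = (px - image_w / 2) * gsd_m_per_px
    fwd_m = (image_h / 2 - py) * gsd_m_per_px  # image y grows downward
    east = right_m * math.cos(heading_rad) + fwd_m * math.sin(heading_rad)
    north = -right_m * math.sin(heading_rad) + fwd_m * math.cos(heading_rad)
    lat = center_lat + north / METERS_PER_DEG_LAT
    lon = center_lon + east / (METERS_PER_DEG_LAT * math.cos(math.radians(center_lat)))
    return lat, lon
```

The pixel coordinates and GSD must refer to the same image resolution, either both original or both downscaled.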

Testing Strategy

Integration / Functional Tests

End-to-end pipeline test using provided 60-image sample dataset with ground truth GPS
Verify at least 80% of positions are within 50m of ground truth
Verify at least 60% of positions are within 20m of ground truth
Test sharp turn handling: simulate 90° turn with non-overlapping images
Test segment creation, satellite anchoring, and cross-segment reconnection
Test user manual anchor injection via POST endpoint
Test point-to-GPS lookup accuracy against known ground coordinates
Test SSE streaming delivers results within 1s of processing completion
Test with FullHD resolution images (pipeline must not fail)
Test with 6252×4168 images (verify downscaling and memory usage)
Test DINOv2 coarse retrieval finds correct satellite tile with 100m VO drift
Test multi-provider fallback: block Google Maps, verify Mapbox takes over
Test with outdated satellite imagery: verify confidence scores reflect match quality
Test outlier handling: 350m gap between consecutive photos
Test image rotation handling: apply 45° rotation to images, verify pipeline handles it
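
The two accuracy criteria (80% within 50m, 60% within 20m) can be checked with a small helper in the test suite. A minimal sketch, assuming estimates and ground truth are dicts of image_id -> (lat, lon); the function names are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def fraction_within(estimates, truth, threshold_m):
    """Share of ground-truth images whose estimate lies within threshold_m."""
    errors = [haversine_m(*estimates[k], *truth[k]) for k in truth]
    return sum(e <= threshold_m for e in errors) / len(errors)


def meets_accuracy_criteria(estimates, truth):
    return (fraction_within(estimates, truth, 50.0) >= 0.80
            and fraction_within(estimates, truth, 20.0) >= 0.60)
```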

Non-Functional Tests

Processing speed: <5s per image on RTX 2060 (target <2s with ONNX optimization)
Memory: peak RAM <16GB, VRAM <6GB during 3000-image flight at max resolution
Memory stability: process 3000 images, verify no memory leak (stable RSS over time)
Concurrent jobs: 2 simultaneous flights, verify isolation and resource sharing
Tile cache: verify tiles and features are cached and reused across jobs
API: load test SSE connections (10 simultaneous clients)
Recovery: kill and restart service mid-job, verify job can resume from last processed image
DEM download: verify Copernicus DEM tiles fetched and cached correctly
GTSAM optimizer: verify backward correction produces "refined" events with improved positions

Security Tests

JWT authentication enforcement on all endpoints
Expired/invalid token rejection
Provider API keys not exposed in responses, logs, or error messages
Image folder path traversal prevention (attempt to access /etc/passwd via image_folder)
Image folder whitelist enforcement
Input validation: invalid GPS coordinates, negative altitude, malformed camera params
Rate limiting: verify 429 response after exceeding limit
Max SSE connection enforcement
Max image size validation (reject >8000px)
CORS enforcement: reject requests from unknown origins

References

YFS90/GNSS-Denied-UAV-Geolocalization — <7m MAE with terrain-weighted constraint optimization
SatLoc-Fusion (2025) — hierarchical DINOv2+XFeat+optical flow, <15m on edge hardware
CEUSP (2025) — DINOv2-based cross-view UAV self-positioning
Oblique-Robust AVL (IEEE TGRS 2024) — rotation-equivariant features for UAV-satellite matching
XFeat (CVPR 2024) — 5x faster than SuperPoint
LightGlue-ONNX — 2-4x speedup via ONNX/TensorRT
DALGlue (2025) — 11.8% MMA improvement over LightGlue for UAV
SIFT+LightGlue UAV Mosaicking (ISPRS 2025) — SIFT superior for high-rotation conditions
LightGlue rotation issue #64 — confirmed not rotation-invariant
LightGlue no-match issue #13 — false matches on non-overlapping pairs
GTSAM v4.2 — factor graph optimization with Python bindings
Copernicus DEM GLO-30 — free 30m global DEM
Google Maps Tiles API — satellite tiles, 100K free/month
Mapbox Satellite — alternative tile provider, up to 0.3m/px
DINOv2 UAV Self-Localization (2025) — 86.27 R@1 on DenseUAV
FastAPI SSE
Homography Decomposition Revisited (IJCV 2025)
Sliding Window Factor Graph Optimization (2020)

Assessment research: _docs/00_research/gps_denied_nav_assessment/
Previous AC assessment: _docs/00_research/gps_denied_visual_nav/00_ac_assessment.md
Previous comparison framework: _docs/00_research/gps_denied_visual_nav/03_comparison_framework.md
