mirror of
https://github.com/azaion/gps-denied-desktop.git
synced 2026-04-22 22:16:36 +00:00
# Solution Draft

## Product Solution Description

A Python-based GPS-denied visual navigation service that determines GPS coordinates of consecutive UAV aerial photo centers using visual odometry, satellite image geo-referencing, and sliding window position optimization. The system operates as a background REST API service with real-time SSE streaming.

**Core approach**: Consecutive images are matched using learned features (SuperPoint + LightGlue) to estimate relative motion (visual odometry). Periodically, each image is matched against pre-cached Google Maps satellite tiles to obtain absolute position anchors. A sliding window optimizer fuses VO estimates with satellite anchors, constraining drift. The system handles route disconnections by treating each continuous VO chain as an independent segment, geo-referenced through satellite matching.

```
┌──────────────────────────────────────────────────────────────────────┐
│  Client (Desktop App)                                                │
│    POST /jobs (start GPS, camera params, image folder)              │
│    GET  /jobs/{id}/stream (SSE)                                      │
│    POST /jobs/{id}/anchor (user manual GPS input)                    │
│    GET  /jobs/{id}/point-to-gps (image_id, pixel_x, pixel_y)         │
└──────────────────────┬───────────────────────────────────────────────┘
                       │ HTTP/SSE
┌──────────────────────▼───────────────────────────────────────────────┐
│  FastAPI Service Layer                                               │
│    Job Manager → Pipeline Orchestrator → SSE Event Emitter           │
└──────────────────────┬───────────────────────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────────────────────┐
│  Processing Pipeline                                                 │
│                                                                      │
│  ┌──────────────┐ ┌──────────────┐ ┌────────────────────────────┐    │
│  │ Feature      │ │ Visual       │ │ Satellite Geo-Referencing  │    │
│  │ Extractor    │→│ Odometry     │→│ (cross-view matching)      │    │
│  │ (SuperPoint) │ │ (homography) │ │ (SuperPoint+LightGlue)     │    │
│  └──────────────┘ └──────────────┘ └────────────────────────────┘    │
│         │                │                       │                   │
│         ▼                ▼                       ▼                   │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  Sliding Window Position Optimizer                             │  │
│  │  (VO estimates + satellite anchors + drift constraints)       │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                              │                                       │
│  ┌───────────────────────────▼────────────────────────────────────┐  │
│  │  Segment Manager                                               │  │
│  │  (independent segments, satellite-anchored stitching)          │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  Satellite Tile Cache Manager                                  │  │
│  │  (progressive download, Google Maps Tiles API, disk cache)     │  │
│  └────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
```

## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | IMU Required | Open Source | Relevance |
|----------|----------|----------|-------------|-------------|-----------|
| YFS90/GNSS-Denied-UAV-Geolocalization | VO + satellite matching + terrain-weighted constraint optimization | <7m MAE | No | Yes (GitHub, 69★) | **Highest**: same constraints, best results |
| AerialPositioning (hamitbugrabayram) | Multi-provider tile engine + deep matchers + perspective warping | Not reported | Simulated INS | Yes (GitHub, 52★) | High: tile engine and perspective warping reference |
| NaviLoc | Trajectory-level VPR + VIO fusion | 19.5m MLE | Yes (VIO) | Partial | Medium: uses IMU, different altitude range |
| ITU Thesis (Öztürk 2025) | ORB-SLAM3 + SuperPoint/SuperGlue/GIM SIM | GPS-level | No | No | High: architecture reference |
| Mateos-Ramirez et al. (2024) | ORB VO + AKAZE satellite + Kalman filter | 143m mean (17km) | Yes | No | Medium: higher altitude, uses IMU |
| VisionUAV-Navigation | Multi-algorithm feature detection + satellite matching | Not reported | Not stated | Yes (GitHub) | Low: early stage |

**Key insight from competitor analysis**: YFS90 achieves <7m without IMU using terrain-weighted constraint optimization. This validates that our target accuracy (20-50m) is realistic and possibly conservative. The sliding window optimization approach is the critical differentiator from simpler VO+satellite systems.

## Architecture

### Component: Feature Extraction

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| SuperPoint | superpoint (PyTorch) | Learned features, robust to viewpoint/illumination changes. Repeatability in aerial scenes. GPU-accelerated. ~80ms per image. | Requires GPU. Fixed descriptor dimension (256). | NVIDIA GPU, PyTorch, CUDA | Model weights from official source only | Free (MIT license) | **Best** |
| SIFT | OpenCV cv2.SIFT | Classical, well-understood. Scale/rotation invariant. Good satellite matching (SIFT+LightGlue top on ISPRS 2025). | Slower than SuperPoint. Less robust to extreme viewpoint changes. | OpenCV | N/A | Free | Good fallback |
| ORB | OpenCV cv2.ORB | Very fast. Many keypoints. | Not scale-invariant. Poor for cross-view matching. | OpenCV | N/A | Free | Only for fast VO |

**Selected**: SuperPoint as primary (used for both VO and satellite matching, giving a unified pipeline). SIFT as fallback for satellite matching where SuperPoint struggles.

### Component: Feature Matching

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| LightGlue | lightglue (PyTorch) | Fastest learned matcher (~20-50ms). Adaptive pruning. Best on satellite benchmarks. ONNX/TensorRT support for 2-4x speedup. | Requires GPU for best performance. | NVIDIA GPU, PyTorch | Model weights from official source only | Free (Apache 2.0) | **Best** |
| SuperGlue | superglue (PyTorch) | Graph neural network, strong spatial context. 93% match rate (ITU thesis). | Slower than LightGlue (~2x). Non-commercial license. | NVIDIA GPU, PyTorch | Model weights from official source only | Non-commercial license | Backup |
| GIM | gim (PyTorch) | Best generalization for challenging cross-domain scenes. | Additional model complexity. | NVIDIA GPU, PyTorch | Model weights from official source only | Free | Supplementary for difficult matches |

**Selected**: LightGlue as primary. GIM as supplementary for difficult satellite matches.

### Component: Visual Odometry (Consecutive Frame Matching)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| Homography-based VO | OpenCV findHomography, decomposeHomographyMat | Perfect for downward camera + flat terrain. Cleanly gives rotation + translation. Known altitude resolves scale. Simple, fast. | Assumes planar ground (valid for steppe at 400m). Fails at sharp turns (by design). | OpenCV, NumPy | N/A | Free | **Best** |
| Essential matrix VO | OpenCV findEssentialMat, recoverPose | More general than homography. Works for non-planar scenes. | Scale ambiguity harder to resolve. More complex. Unnecessary for our flat terrain case. | OpenCV | N/A | Free | Overengineered |
| ORB-SLAM3 monocular | ORB-SLAM3 | Full SLAM with map management, loop closure. | Heavy dependency. Map building unnecessary. Scale ambiguity. | ROS (optional), C++ | N/A | Free (GPL) | Too complex |

**Selected**: Homography-based VO with SuperPoint+LightGlue features.

**VO Pipeline per frame**:
1. Extract SuperPoint features from current image
2. Match with previous image using LightGlue
3. Estimate homography (`cv2.findHomography` with RANSAC)
4. Decompose homography → rotation + translation (`cv2.decomposeHomographyMat`)
5. Select correct decomposition (motion must be consistent with previous direction)
6. Compute the ground sample distance: `GSD = (altitude × sensor_width) / (focal_length × image_width)`, which is in meters per pixel when altitude is in meters and sensor_width and focal_length share a unit
7. Convert pixel displacement to meters: `displacement_m = displacement_px × GSD`
8. Update position: `new_pos = prev_pos + rotation × displacement_m`
9. Report inlier ratio as match quality metric

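The scale and position-update steps of the VO pipeline can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the local east/north frame, the heading convention (clockwise from north, image top pointing along the heading), and the camera values in the example are all assumptions. Only the 400 m altitude comes from this document.

```python
import math


def ground_sample_distance(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Meters of ground covered by one pixel for a downward-looking camera."""
    return (altitude_m * sensor_width_mm) / (focal_length_mm * image_width_px)


def update_position(prev_east_m, prev_north_m, dx_px, dy_px, heading_rad, gsd_m_per_px):
    """Dead-reckon the next photo center in a local east/north frame (meters).

    dx_px: displacement toward the right edge of the image (assumed convention).
    dy_px: displacement toward the top edge of the image, i.e. forward.
    heading_rad: direction of the image top, clockwise from north.
    """
    right_m = dx_px * gsd_m_per_px
    forward_m = dy_px * gsd_m_per_px
    # Rotate the camera-frame displacement into the east/north frame.
    east = prev_east_m + forward_m * math.sin(heading_rad) + right_m * math.cos(heading_rad)
    north = prev_north_m + forward_m * math.cos(heading_rad) - right_m * math.sin(heading_rad)
    return east, north


# Hypothetical camera: 23.5 mm sensor width, 25 mm lens, 6000 px wide image, 400 m altitude.
gsd = ground_sample_distance(400.0, 23.5, 25.0, 6000)
east, north = update_position(0.0, 0.0, 0.0, 1000.0, 0.0, gsd)
print(f"GSD = {gsd:.4f} m/px, new position = ({east:.1f} m E, {north:.1f} m N)")
```

In the real pipeline the pixel displacement and rotation would come from the decomposed homography rather than being passed in directly.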
### Component: Satellite Image Geo-Referencing

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| Direct cross-view matching with perspective warping | SuperPoint+LightGlue, OpenCV warpPerspective | Pre-warp UAV image to approximate nadir view. Reduces viewpoint gap. Proven approach. | Needs rough camera pose estimate (from VO) for warping. | PyTorch, OpenCV | API key secured | Google Maps API cost | **Best** |
| Template matching (normalized cross-correlation) | OpenCV matchTemplate | Simple, no learning required. | Very sensitive to scale/rotation/illumination differences. Poor for cross-view. | OpenCV | N/A | Free | Poor for cross-view |
| VPR retrieval + refinement (NetVLAD/CosPlace) | torchvision, faiss | Handles large search areas. | Coarse localization only (tile-level). Needs fine-grained refinement step. | PyTorch, faiss | N/A | Free | Supplementary: coarse search |

**Selected**: Direct cross-view matching with perspective warping using SuperPoint + LightGlue.

**Satellite Matching Pipeline per frame**:
1. Estimate approximate position from VO
2. Fetch satellite tile(s) from cache at estimated position (zoom 18, ~0.4m/px)
3. Crop satellite region matching UAV image footprint (with margin)
4. Warp UAV image to approximate nadir view using estimated camera pose
5. Extract SuperPoint features from warped UAV image
6. Extract SuperPoint features from satellite crop (can be pre-computed and cached)
7. Match with LightGlue
8. If insufficient matches: try GIM, try wider search area, try zoom 17
9. If sufficient matches (≥15 inliers):
   a. Estimate homography from matches
   b. Transform image center through homography → satellite pixel coordinates
   c. Convert satellite pixel coordinates to WGS84 using tile geo-referencing
   d. This is the absolute position anchor
10. Report match count and inlier ratio as confidence metrics

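The conversion from satellite pixel coordinates to WGS84 can be illustrated with the standard Web Mercator (XYZ tile) formulas. The sketch below assumes 256-pixel tiles and the usual XYZ tile indexing; the function names are hypothetical and the tile provider's actual tile size should be checked before relying on the numbers.

```python
import math

TILE_SIZE = 256  # assumption: 256 px Web Mercator tiles
EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius used by Web Mercator


def tile_pixel_to_wgs84(zoom, tile_x, tile_y, px, py, tile_size=TILE_SIZE):
    """Convert a pixel inside tile (tile_x, tile_y) to (lat, lon) in degrees."""
    n = 2 ** zoom
    # Fractional tile coordinates of the pixel.
    x = tile_x + px / tile_size
    y = tile_y + py / tile_size
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n))))
    return lat, lon


def meters_per_pixel(lat_deg, zoom, tile_size=TILE_SIZE):
    """Approximate ground resolution of a tile pixel at the given latitude."""
    circumference = 2.0 * math.pi * EARTH_RADIUS_M
    return circumference * math.cos(math.radians(lat_deg)) / (tile_size * 2 ** zoom)


lat, lon = tile_pixel_to_wgs84(18, 158000, 91000, 128, 128)
print(f"anchor at lat={lat:.6f}, lon={lon:.6f}")
print(f"resolution at 48°N, zoom 18: {meters_per_pixel(48.0, 18):.3f} m/px")
```

Under these assumptions the resolution at zoom 18 depends on latitude, so the ~0.4m/px figure holds around 48°N and is coarser closer to the equator.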
### Component: Sliding Window Position Optimizer

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| Constrained sliding window optimization | scipy.optimize, NumPy | Fuses VO + satellite anchors. Constrains maximum drift. Smooths trajectory. Inspired by YFS90 (<7m). | Window size tuning needed. | SciPy, NumPy | N/A | Free | **Best** |
| Extended Kalman Filter | filterpy | Standard, well-understood. Online fusion. | Linearization approximation. Single-pass, no backward smoothing. | filterpy | N/A | Free | Good simpler alternative |
| Pose Graph Optimization | g2o or GTSAM (Python bindings) | Globally optimal. Handles complex factor graphs. | Heavy C++ dependency. Overkill for sequential processing. | g2o/GTSAM, C++ | N/A | Free | Over-engineered |

**Selected**: Constrained sliding window optimization (primary), with EKF as simpler initial implementation.

**Optimizer behavior**:
- Maintains a sliding window of last N positions (N=20-50)
- VO estimates provide relative motion constraints between consecutive positions
- Satellite matches provide absolute position anchors (hard/soft constraints)
- Maximum drift constraint: cumulative VO displacement between anchors < 100m
- Optimization minimizes: sum of VO residuals + anchor residuals + smoothness penalty
- On each new frame: add to window, re-optimize, emit updated positions
- Enables refinement: earlier positions improve as new anchors arrive

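To make the fusion idea concrete, here is a deliberately simplified sketch of the window optimization. It minimizes only the VO residuals and soft anchor residuals in a local metric frame, using plain coordinate descent instead of `scipy.optimize`; the smoothness penalty and the 100m drift constraint are omitted. The function name, weights, and sweep count are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


def optimize_window(vo_steps: List[Vec], anchors: Dict[int, Vec], start: Vec,
                    w_vo: float = 1.0, w_anchor: float = 100.0,
                    sweeps: int = 200) -> List[Vec]:
    """Minimize sum w_vo*|p[i+1]-p[i]-d[i]|^2 + sum w_anchor*|p[k]-a[k]|^2.

    vo_steps[i] is the VO displacement from frame i to frame i+1 (meters).
    anchors maps a frame index to its satellite-derived position (meters).
    """
    n = len(vo_steps) + 1
    # Initialize by dead reckoning from the start position.
    pos = [[start[0], start[1]]]
    for dx, dy in vo_steps:
        pos.append([pos[-1][0] + dx, pos[-1][1] + dy])

    for _ in range(sweeps):
        for i in range(n):
            acc, w_sum = [0.0, 0.0], 0.0
            if i > 0:  # VO edge from the previous frame
                for k in (0, 1):
                    acc[k] += w_vo * (pos[i - 1][k] + vo_steps[i - 1][k])
                w_sum += w_vo
            if i < n - 1:  # VO edge to the next frame
                for k in (0, 1):
                    acc[k] += w_vo * (pos[i + 1][k] - vo_steps[i][k])
                w_sum += w_vo
            if i in anchors:  # soft absolute constraint
                for k in (0, 1):
                    acc[k] += w_anchor * anchors[i][k]
                w_sum += w_anchor
            if w_sum > 0.0:
                # Closed-form minimizer for frame i with its neighbors held fixed.
                pos[i] = [acc[0] / w_sum, acc[1] / w_sum]
    return [(x, y) for x, y in pos]


# VO overestimates each 10 m step as 11 m; anchors at both ends pull the chain back.
steps = [(11.0, 0.0)] * 4
refined = optimize_window(steps, {0: (0.0, 0.0), 4: (40.0, 0.0)}, start=(0.0, 0.0))
print([round(x, 2) for x, _ in refined])
```

Because every frame in the window is re-estimated whenever an anchor is added, earlier positions shift as later anchors arrive, which is the refinement behavior described in the list above.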
### Component: Segment Manager

The segment manager is the core architectural pattern, not an edge case handler.

**Segment lifecycle**:
1. **Start condition**: First image of flight, or VO failure (feature match count < threshold)
2. **Active tracking**: VO provides frame-to-frame motion within segment
3. **Anchoring**: Satellite matching provides absolute position for segment's images
4. **End condition**: VO failure (sharp turn, outlier, occlusion)
5. **New segment**: Starts from satellite anchor or user-provided GPS

**Segment states**:
- `ANCHORED`: At least one satellite match provides absolute position → HIGH confidence
- `FLOATING`: No satellite match yet → positioned relative to start point only → LOW confidence
- `USER_ANCHORED`: User provided manual GPS → MEDIUM confidence (human error possible)

**Segment stitching**:
- All segments share the WGS84 coordinate frame via satellite matching
- No direct inter-segment matching needed
- A segment without any satellite anchor remains "floating" and is flagged for user input

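One possible way to model the lifecycle and states is sketched below. The class names, the `min_matches` threshold value, and the rule that a satellite anchor outranks a user anchor are assumptions made for illustration; only the three state names and their confidence levels come from this document.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SegmentState(Enum):
    FLOATING = "FLOATING"
    USER_ANCHORED = "USER_ANCHORED"
    ANCHORED = "ANCHORED"


CONFIDENCE = {
    SegmentState.FLOATING: "LOW",
    SegmentState.USER_ANCHORED: "MEDIUM",
    SegmentState.ANCHORED: "HIGH",
}


@dataclass
class Segment:
    segment_id: int
    state: SegmentState = SegmentState.FLOATING
    image_ids: List[str] = field(default_factory=list)

    def add_satellite_anchor(self) -> None:
        self.state = SegmentState.ANCHORED

    def add_user_anchor(self) -> None:
        # Assumption: a manual anchor never downgrades a satellite-anchored segment.
        if self.state is SegmentState.FLOATING:
            self.state = SegmentState.USER_ANCHORED

    @property
    def confidence(self) -> str:
        return CONFIDENCE[self.state]


class SegmentManager:
    def __init__(self, min_matches: int = 30):  # hypothetical VO failure threshold
        self.min_matches = min_matches
        self.segments: List[Segment] = []

    def add_frame(self, image_id: str, vo_match_count: int) -> Segment:
        """Start a new segment on the first frame or on VO failure, then track."""
        if not self.segments or vo_match_count < self.min_matches:
            self.segments.append(Segment(segment_id=len(self.segments)))
        self.segments[-1].image_ids.append(image_id)
        return self.segments[-1]

    def floating_segments(self) -> List[Segment]:
        """Segments that still need a satellite match or user input."""
        return [s for s in self.segments if s.state is SegmentState.FLOATING]


manager = SegmentManager()
manager.add_frame("img_001", vo_match_count=0).add_satellite_anchor()
manager.add_frame("img_002", vo_match_count=250)
manager.add_frame("img_003", vo_match_count=4)  # sharp turn: VO fails, new segment
print([(s.segment_id, s.state.value, s.image_ids) for s in manager.segments])
```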
### Component: Satellite Tile Cache Manager

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| Progressive download with disk cache | aiohttp, aiofiles, sqlite3 | Async download doesn't block pipeline. Tiles cached to disk. Progressive expansion follows route. | Needs internet during processing. First few images may wait for tiles. | Google Maps Tiles API key | API key in env var, not in code | Google Maps API: $200/month free credit covers ~40K tiles | **Best** |
| Pre-download entire area | requests, sqlite3 | All tiles available at start. No download latency during processing. | Requires known bounding box. Large download for unknown routes. Wasteful. | Same | Same | Higher cost if area is large | For known routes |

**Selected**: Progressive download with disk cache.

**Strategy**:
1. On job start: download tiles in radius R=1km around starting GPS at zoom 18
2. As route extends: download tiles ahead of estimated position (radius 500m)
3. Cache tiles on disk in `{zoom}/{x}/{y}.jpg` directory structure
4. Cache is persistent across jobs, so tiles are reused for overlapping areas
5. Pre-compute SuperPoint features for cached tiles (saved alongside tile images)
6. If tile download fails or is unavailable: log warning, mark position as VO-only

**Tile download budget**:
- Initial 1km radius at zoom 18: ~300 tiles (~12MB)
- Per-frame expansion: 5-20 new tiles (~0.2-0.8MB)
- Full 20km flight: ~2000 tiles (~80MB) over the course of processing
- Well within $200/month Google Maps free credit

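The tile selection and cache layout can be sketched with the standard XYZ (slippy map) tile formulas. This is an illustration under assumptions: tiles are selected when their center lies within the radius, distances use a flat-earth approximation around the query point, and the function names and example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0


def latlon_to_tile(lat, lon, zoom):
    """XYZ tile indices containing the given WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def tile_center_latlon(x, y, zoom):
    """WGS84 coordinates of the center of tile (x, y)."""
    n = 2 ** zoom
    lon = (x + 0.5) / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * (y + 0.5) / n))))
    return lat, lon


def tiles_within_radius(lat, lon, radius_m, zoom):
    """Tiles whose centers lie within radius_m of (lat, lon)."""
    cos_lat = math.cos(math.radians(lat))
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = math.degrees(radius_m / (EARTH_RADIUS_M * cos_lat))
    # Tile y grows southward, so the north-west corner gives the minimum indices.
    x_min, y_min = latlon_to_tile(lat + dlat, lon - dlon, zoom)
    x_max, y_max = latlon_to_tile(lat - dlat, lon + dlon, zoom)
    selected = []
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            t_lat, t_lon = tile_center_latlon(x, y, zoom)
            north = math.radians(t_lat - lat) * EARTH_RADIUS_M
            east = math.radians(t_lon - lon) * EARTH_RADIUS_M * cos_lat
            if math.hypot(east, north) <= radius_m:
                selected.append((x, y))
    return selected


def cache_path(zoom, x, y):
    """Relative cache path in the {zoom}/{x}/{y}.jpg layout."""
    return f"{zoom}/{x}/{y}.jpg"


tiles = tiles_within_radius(48.0, 37.0, 1000.0, 18)  # hypothetical start point
print(len(tiles), "tiles, first:", cache_path(18, *tiles[0]))
```

At mid latitudes this selection yields a count on the order of the ~300 tiles budgeted above; the count grows toward the poles because Web Mercator tiles cover less ground there.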
### Component: API & Real-Time Streaming

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| FastAPI + SSE | FastAPI ≥0.135.0, EventSourceResponse, uvicorn | Native SSE support. Async pipeline. Excellent for ML workloads. OpenAPI docs auto-generated. | Python GIL (mitigated with asyncio + GPU-bound ops). | Python 3.11+, uvicorn | CORS configuration, API key auth | Free | **Best** |

**Selected**: FastAPI + SSE.

**API Endpoints**:
```
POST /jobs
  Body: { start_lat, start_lon, altitude, camera_params, image_folder }
  Returns: { job_id }

GET /jobs/{job_id}/stream
  SSE stream of:
    - { event: "position", data: { image_id, lat, lon, confidence, segment_id } }
    - { event: "refined", data: { image_id, lat, lon, confidence } }
    - { event: "segment_start", data: { segment_id, reason } }
    - { event: "user_input_needed", data: { image_id, reason } }
    - { event: "complete", data: { summary } }

POST /jobs/{job_id}/anchor
  Body: { image_id, lat, lon }
  Manual user GPS input for an image

GET /jobs/{job_id}/point-to-gps?image_id=X&px=100&py=200
  Returns: { lat, lon, confidence }
  Interactive point-to-GPS lookup

GET /jobs/{job_id}/results
  Returns: full results as GeoJSON or CSV
```

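For reference, the events listed for the stream endpoint map onto the Server-Sent Events wire format as `event:` and `data:` fields terminated by a blank line. The helper below is a framework-independent sketch with hypothetical function names and example values; a real service would normally let the web framework do this serialization.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one event in the text/event-stream wire format."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


def parse_sse(frame: str) -> tuple:
    """Parse a single-frame message produced by format_sse (client-side view)."""
    event, data_lines = "message", []
    for line in frame.strip().split("\n"):
        name, _, value = line.partition(": ")
        if name == "event":
            event = value
        elif name == "data":
            data_lines.append(value)
    return event, json.loads("\n".join(data_lines))


frame = format_sse("position", {
    "image_id": "img_001", "lat": 48.0, "lon": 37.0,
    "confidence": "HIGH", "segment_id": 0,
})
print(frame, end="")
print(parse_sse(frame))
```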
### Component: Interactive Point-to-GPS Lookup

For each processed image, the system stores the estimated camera-to-ground homography (from either satellite matching or VO+estimated pose). Given a pixel coordinate (px, py) in an image:

1. If image has satellite match: use the computed homography to project (px, py) → satellite tile coordinates → WGS84. High confidence.
2. If image has only VO pose: use camera intrinsics + estimated altitude + estimated heading to ray-cast (px, py) to the ground plane → WGS84. Medium confidence.
3. Both methods return confidence score based on the underlying position estimate quality.

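For the VO-only case, the ray-cast reduces to a scaled and rotated pixel offset when the camera is assumed to look straight down over flat ground. The sketch below makes that assumption explicitly, along with a heading measured clockwise from north with the image top pointing along it; the function name, parameters, and example values are hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0


def pixel_to_gps(px, py, image_w, image_h, center_lat, center_lon, heading_deg, gsd_m_per_px):
    """Project an image pixel to WGS84 for a nadir camera over flat ground.

    (center_lat, center_lon) is the estimated position of the image center.
    Pixel x grows to the right, pixel y grows downward.
    """
    right_m = (px - image_w / 2.0) * gsd_m_per_px
    forward_m = (image_h / 2.0 - py) * gsd_m_per_px
    h = math.radians(heading_deg)
    # Rotate the camera-frame offset into east/north components.
    east = forward_m * math.sin(h) + right_m * math.cos(h)
    north = forward_m * math.cos(h) - right_m * math.sin(h)
    # Small-offset approximation from meters to degrees.
    lat = center_lat + math.degrees(north / EARTH_RADIUS_M)
    lon = center_lon + math.degrees(east / (EARTH_RADIUS_M * math.cos(math.radians(center_lat))))
    return lat, lon


# Hypothetical 6000x4000 image centered at 48°N 37°E, flying due north, 0.06 m/px.
print(pixel_to_gps(4000, 2000, 6000, 4000, 48.0, 37.0, 0.0, 0.06))
```

A tilted camera would need the full intrinsics and pose to intersect the pixel ray with the ground plane, which this simplification does not attempt.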
## Testing Strategy

### Integration / Functional Tests
- End-to-end pipeline test using provided 60-image sample dataset with ground truth GPS
- Verify 80% of positions within 50m of ground truth
- Verify 60% of positions within 20m of ground truth
- Test sharp turn handling: simulate turn by reordering/skipping images
- Test segment creation and reconnection
- Test user manual anchor injection
- Test point-to-GPS lookup accuracy against known coordinates
- Test SSE streaming delivers results within 1s of processing completion
- Test with FullHD resolution images (degraded accuracy expected, but pipeline must not fail)

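The two accuracy criteria can be checked with a small helper that compares estimated and ground-truth coordinates by great-circle distance. This is a sketch with hypothetical function names; it assumes both lists are in the same image order and uses a spherical Earth, which is more than precise enough at the 20-50m scale.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a spherical Earth."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def fraction_within(estimates, truth, threshold_m):
    """Share of estimates that fall within threshold_m of ground truth."""
    hits = sum(
        1 for (e_lat, e_lon), (t_lat, t_lon) in zip(estimates, truth)
        if haversine_m(e_lat, e_lon, t_lat, t_lon) <= threshold_m
    )
    return hits / len(truth)


def meets_accuracy_targets(estimates, truth):
    """80% of positions within 50m and 60% within 20m."""
    return (fraction_within(estimates, truth, 50.0) >= 0.80
            and fraction_within(estimates, truth, 20.0) >= 0.60)


truth = [(48.0, 37.0)] * 10
# Seven exact estimates, two about 30 m north, one about 200 m north.
estimates = [(48.0, 37.0)] * 7 + [(48.00027, 37.0)] * 2 + [(48.0018, 37.0)]
print(fraction_within(estimates, truth, 50.0), fraction_within(estimates, truth, 20.0))
print(meets_accuracy_targets(estimates, truth))
```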
### Non-Functional Tests
- Processing speed: <5s per image on RTX 2060 (target <2s)
- Memory: peak RAM <16GB, VRAM <6GB during 3000-image flight
- Memory leak test: process 3000 images, verify stable memory
- Concurrent jobs: 2 simultaneous flights, verify isolation
- Tile cache: verify tiles are cached and reused across jobs
- API: load test SSE connections (10 simultaneous clients)
- Recovery: kill and restart service mid-job, verify job can resume

### Security Tests
- API key authentication enforcement
- Google Maps API key not exposed in responses or logs
- Image folder path traversal prevention
- Input validation (GPS coordinates, camera parameters)
- Rate limiting on API endpoints

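As an example of what the path traversal test would exercise, the sketch below confines a client-supplied `image_folder` to a configured base directory. The base directory `/srv/flights` and the function names are hypothetical; `Path.is_relative_to` requires Python 3.9 or newer, which the Python 3.11+ requirement above satisfies.

```python
from pathlib import Path


def is_inside(base_dir: str, requested: str) -> bool:
    """True if the requested folder resolves to a location under base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute path discards base, so that case is rejected too.
    candidate = (base / requested).resolve()
    return candidate.is_relative_to(base)


def resolve_image_folder(base_dir: str, requested: str) -> Path:
    """Return the resolved folder or raise ValueError on a traversal attempt."""
    if not is_inside(base_dir, requested):
        raise ValueError(f"image_folder escapes the allowed base directory: {requested!r}")
    return (Path(base_dir).resolve() / requested).resolve()


for folder in ("flight_01", "flight_01/../flight_02", "../outside", "/etc"):
    print(folder, "->", is_inside("/srv/flights", folder))
```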
## References
- [YFS90/GNSS-Denied-UAV-Geolocalization](https://github.com/YFS90/GNSS-Denied-UAV-Geolocalization): <7m MAE without IMU
- [AerialPositioning](https://github.com/hamitbugrabayram/AerialPositioning): tile engine and deep matcher integration reference
- [NaviLoc (2025)](https://www.mdpi.com/2504-446X/10/2/97): trajectory-level visual localization, 19.5m MLE
- [ITU Thesis (2025)](https://polen.itu.edu.tr/items/1fe1e872-7cea-44d8-a8de-339e4587bee6): ORB-SLAM3 + SIM integration
- [Mateos-Ramirez et al. (2024)](https://www.mdpi.com/2076-3417/14/16/7420): VO + satellite correction for fixed-wing UAV
- [LightGlue (ICCV 2023)](https://github.com/cvg/LightGlue): feature matching
- [SuperPoint](https://github.com/magicleap/SuperPointPretrainedNetwork): feature extraction
- [DALGlue (2025)](https://www.nature.com/articles/s41598-025-21602-5): 11.8% improvement over LightGlue
- [SCAR (2026)](https://arxiv.org/html/2602.16349v1): satellite-based aerial calibration
- [DUSt3R/MASt3R evaluation (2025)](https://arxiv.org/abs/2507.14798): extreme low-overlap matching
- [FastAPI SSE docs](https://fastapi.tiangolo.com/tutorial/server-sent-events/)
- [Google Maps Tiles API](https://developers.google.com/maps/documentation/tile/satellite)

## Related Artifacts
- AC assessment: `_docs/00_research/gps_denied_visual_nav/00_ac_assessment.md`
- Comparison framework: `_docs/00_research/gps_denied_visual_nav/03_comparison_framework.md`
- Reasoning chain: `_docs/00_research/gps_denied_visual_nav/04_reasoning_chain.md`