Solution Draft

Assessment Findings

| Old Component Solution | Weak Point | New Solution |
|---|---|---|
| Pose2 + GPSFactor for satellite anchors | Functional (Critical): GPSFactor works with Pose3, not Pose2. Code would fail at runtime. Custom Python DEM/drift factors are slow. | Pose2 + BetweenFactorPose2 (VO) + PriorFactorPose2 (satellite anchors). Remove DEM terrain factor and drift limit factor from graph; handle them in GSD calculation and Segment Manager respectively. |
| DINOv2 model variant unspecified + faiss GPU | Performance/Memory: No model variant chosen. faiss GPU uses ~2GB scratch. Combined VRAM could exceed 6GB. | DINOv2 ViT-S/14 (300MB VRAM, 50ms/img). faiss on CPU only (<1ms for ~2000 vectors). Explicit VRAM budget per model. |
| Heading-based rotation rectification + SIFT fallback | Functional: No heading at segment start. No trigger criteria for SIFT. Multi-rotation matching not specified. | Three-tier: (1) segment start → 4-rotation retry {0°,90°,180°,270°}, (2) heading available → rectify, (3) all fail → SIFT+LightGlue. Trigger: SuperPoint inlier ratio < 0.15. |
| "Motion consistent with previous direction" homography selection | Functional: Underspecified for 4 decomposition solutions. Non-orthogonal R possible. No strategy for first frame pair. | Four-step disambiguation: positive depth → plane normal up → motion consistency → orthogonality check via SVD. First pair: depth + normal only. |
| Raw DINOv2 CLS token + faiss cosine | Performance: Raw CLS token suboptimal for retrieval. Patch-level features capture more spatial information. | DINOv2 ViT-S/14 patch tokens with spatial average pooling (not just CLS). Cosine similarity via CPU faiss. |
| "Async satellite matching - don't block VO" | Functional: No concrete concurrency model. Single GPU can't run two models simultaneously. | Sequential GPU pipeline: VO first (~40ms), satellite matching overlapped with next frame's VO (~205ms). asyncio for I/O. CPU for faiss + RANSAC. |
| JWT + rate limiting + CORS | Security: No image format validation. No Pillow CVE-2025-48379 mitigation. No SSE heartbeat. No memory-limited image loading. | Pin Pillow ≥11.3.0. Validate magic bytes. Reject images >10,000px. SSE heartbeat 15s. asyncio.Queue event publisher. CSP headers. |
| Custom GTSAM drift limit factor | Functional: Python callback per optimization step (slow). If no anchors for 50+ frames, nothing to constrain. | Replace with Segment Manager drift thresholds: 100m → warning, 200m → user input request, 500m → LOW confidence. Exponential confidence decay from last anchor. |
| Google Maps tile download (API key only) | Functional: Google Maps requires session tokens via createSession API, not just API key. 15K/day limit not managed. | Implement session token lifecycle: createSession → use token → handle expiry. Request budget tracking per provider per day. |
| FastAPI EventSourceResponse | Stability: Async generator cleanup issues on shutdown. No heartbeat. No reconnection support. | asyncio.Queue-based EventPublisher pattern. SSE heartbeat every 15s. Last-Event-ID support for reconnection. |

Product Solution Description

A Python-based GPS-denied visual navigation service that determines GPS coordinates of consecutive UAV photo centers using a hierarchical localization approach: fast visual odometry for frame-to-frame motion, two-stage satellite geo-referencing (coarse retrieval + fine matching) for absolute positioning, and factor graph optimization for trajectory refinement. The system operates as a background REST API service with real-time SSE streaming.

Core approach: Consecutive images are matched using XFeat (fast learned features) to estimate relative motion (visual odometry). Each image is geo-referenced against satellite imagery through a two-stage process: DINOv2 ViT-S/14 coarse retrieval selects the best-matching satellite tile using patch-level features, then SuperPoint+LightGlue refines the alignment to pixel precision. A GTSAM iSAM2 factor graph fuses VO constraints (BetweenFactorPose2) and satellite anchors (PriorFactorPose2) in local ENU coordinates to produce an optimized trajectory. The system handles route disconnections by treating each continuous VO chain as an independent segment, geo-referenced through satellite matching and connected via the shared WGS84 coordinate frame.

┌─────────────────────────────────────────────────────────────────────┐
│                        Client (Desktop App)                         │
│   POST /jobs (start GPS, camera params, image folder)               │
│   GET  /jobs/{id}/stream (SSE)                                      │
│   POST /jobs/{id}/anchor (user manual GPS input)                    │
│   GET  /jobs/{id}/point-to-gps (image_id, pixel_x, pixel_y)        │
└──────────────────────┬──────────────────────────────────────────────┘
                       │ HTTP/SSE (JWT auth)
┌──────────────────────▼──────────────────────────────────────────────┐
│                     FastAPI Service Layer                            │
│   Job Manager → Pipeline Orchestrator → SSE Event Publisher         │
│   (asyncio.Queue-based publisher, heartbeat, Last-Event-ID)         │
└──────────────────────┬──────────────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────────────┐
│                     Processing Pipeline                              │
│                                                                      │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────────────┐   │
│  │ Image        │  │  Visual      │  │ Satellite Geo-Ref       │   │
│  │ Preprocessor │→│  Odometry    │→│ Stage 1: DINOv2-S patch  │   │
│  │ (downscale,  │  │ (XFeat +    │  │   retrieval (CPU faiss)  │   │
│  │  rectify)    │  │  XFeat      │  │ Stage 2: SuperPoint +   │   │
│  │              │  │  matcher)   │  │   LightGlue-ONNX refine │   │
│  └──────────────┘  └──────────────┘  └─────────────────────────┘   │
│         │                │                       │                   │
│         ▼                ▼                       ▼                   │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │       GTSAM iSAM2 Factor Graph Optimizer                    │   │
│  │  Pose2 + BetweenFactorPose2 (VO) + PriorFactorPose2 (sat)  │   │
│  │  Local ENU coordinates → WGS84 output                       │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                              │                                       │
│  ┌───────────────────────────▼──────────────────────────────────┐   │
│  │                  Segment Manager                              │   │
│  │   (drift thresholds, confidence decay, user input triggers)   │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │          Multi-Provider Satellite Tile Cache                  │   │
│  │  (Google Maps + Mapbox + user tiles, session tokens,          │   │
│  │   DEM cache, request budgeting)                               │   │
│  └──────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘

Architecture

Component: Image Preprocessor

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| Downscale + rectify + validate | OpenCV resize, NumPy | Normalizes input. Consistent memory. Validates before loading. | Loses fine detail in downscaled images. | OpenCV, NumPy | Magic byte validation, dimension check before load | <10ms per image | Best |

Selected: Downscale + rectify + validate pipeline.

Preprocessing per image:

  1. Validate file: check magic bytes (JPEG/PNG/TIFF), reject unknown formats
  2. Read image header only: check dimensions, reject if either > 10,000px
  3. Load image via OpenCV (cv2.imread)
  4. Downscale to max 1600 pixels on longest edge (preserving aspect ratio)
  5. Store original resolution for GSD: GSD = (effective_altitude × sensor_width) / (focal_length × original_width) where effective_altitude = flight_altitude - terrain_elevation (terrain from Copernicus DEM)
  6. If estimated heading is available: rotate to approximate north-up for satellite matching
  7. If no heading (segment start): pass unrotated
  8. Convert to grayscale for feature extraction
  9. Output: downscaled grayscale image + metadata (original dims, GSD, heading if known)

Component: Feature Extraction

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| XFeat (for VO) | accelerated_features (PyTorch) | 5x faster than SuperPoint. CPU-capable. Has built-in matcher. Semi-dense matching. Used in SatLoc-Fusion. | Fewer keypoints than SuperPoint in some scenes. Not rotation-invariant. | PyTorch | Model weights from official source | ~15ms GPU, ~50ms CPU | Best for VO |
| SuperPoint (for satellite matching) | superpoint (PyTorch) | Learned features, robust to viewpoint/illumination. Proven for satellite matching (ISPRS 2025). 256-dim descriptors. | Slower than XFeat. Not rotation-invariant. | NVIDIA GPU, PyTorch, CUDA | Model weights from official source | ~80ms GPU | Best for satellite |
| SIFT (rotation fallback) | OpenCV cv2.SIFT | Rotation-invariant. Scale-invariant. Proven SIFT+LightGlue hybrid for UAV mosaicking (ISPRS 2025). | Slower. Less discriminative in low-texture. | OpenCV | N/A | ~200ms CPU | Rotation fallback |

Selected: XFeat with built-in matcher for VO, SuperPoint for satellite matching, SIFT+LightGlue as rotation-heavy fallback.

VRAM budget:

| Model | VRAM | Loaded When |
|---|---|---|
| XFeat | ~200MB | Always (VO every frame) |
| DINOv2 ViT-S/14 | ~300MB | Satellite coarse retrieval |
| SuperPoint | ~400MB | Satellite fine matching |
| LightGlue ONNX FP16 | ~500MB | Satellite fine matching |
| ONNX Runtime overhead | ~200MB | When ONNX models active |
| Peak total | ~1.6GB | Satellite matching phase |

Component: Feature Matching

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| XFeat built-in matcher (VO) | accelerated_features | Fastest option (~15ms extract+match). Paired with XFeat extraction. | Lower quality than LightGlue. | PyTorch | N/A | ~15ms total | Best for VO |
| LightGlue ONNX FP16 (satellite) | LightGlue-ONNX | 2-4x faster than PyTorch via ONNX. FP16 works on Turing (RTX 2060). | FP8 not available on Turing. Not rotation-invariant. | ONNX Runtime, NVIDIA GPU | Model weights from official source | ~50-100ms on RTX 2060 | Best for satellite |
| SIFT+LightGlue (rotation fallback) | OpenCV SIFT + LightGlue | SIFT rotation invariance + LightGlue contextual matching. Proven superior for high-rotation UAV (ISPRS 2025). | Slower than SuperPoint+LightGlue. | OpenCV + ONNX Runtime | N/A | ~250ms total | Rotation fallback |

Selected: XFeat matcher for VO, LightGlue ONNX FP16 for satellite matching, SIFT+LightGlue as rotation fallback.

Component: Visual Odometry (Consecutive Frame Matching)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| Homography VO with essential matrix fallback | OpenCV findHomography (USAC_MAGSAC), findEssentialMat, decomposeHomographyMat | Homography: optimal for flat terrain. Essential matrix: non-planar fallback. Known altitude resolves scale. | Homography assumes planar. 4-way decomposition ambiguity. | OpenCV, NumPy | N/A | ~5ms for estimation | Best |

Selected: Homography VO with essential matrix fallback and DEM terrain-corrected GSD.

VO Pipeline per frame:

  1. Extract XFeat features from current image (~15ms)
  2. Match with previous image using XFeat built-in matcher (included in extraction time)
  3. Triple failure check: match count ≥ 30 AND RANSAC inlier ratio ≥ 0.4 AND motion magnitude consistent with the expected inter-frame distance (nominally 100m with a 250m tolerance, i.e. at most 350m)
  4. If checks pass → estimate homography (cv2.findHomography with USAC_MAGSAC, confidence 0.999, max iterations 2000)
  5. If RANSAC inlier ratio < 0.6 → additionally estimate essential matrix as quality check
  6. Decomposition disambiguation (4 solutions from decomposeHomographyMat):
     a. Filter by positive depth: triangulate 5 matched points, reject if behind camera
     b. Filter by plane normal: normal z-component > 0.5 (downward camera → ground plane normal points up)
     c. If previous direction available: prefer solution consistent with expected motion
     d. Orthogonality check: verify R^T R ≈ I (Frobenius norm < 0.01). If failed, re-orthogonalize via SVD: U,S,V = svd(R), R_clean = U @ V^T
     e. First frame pair in segment: use filters a+b only
  7. Terrain-corrected GSD: query Copernicus DEM at estimated position → effective_altitude = flight_altitude - terrain_elevation; GSD = (effective_altitude × sensor_width) / (focal_length × original_image_width)
  8. Convert pixel displacement to meters: displacement_m = displacement_px × GSD
  9. Update position: new_pos = prev_pos + rotation @ displacement_m
  10. Track cumulative heading for image rectification
  11. If triple failure check fails → trigger segment break
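
The gating in step 3 and the position update in steps 8 and 9 can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the 350m cap is one reading of "100m ± 250m", and the heading is taken as a counter-clockwise angle in the ENU plane (the Pose2 convention), with the displacement already expressed in the previous frame's axes.

```python
import math

MIN_MATCHES = 30
MIN_INLIER_RATIO = 0.4
MAX_STEP_M = 350.0  # assumption: upper end of "100m ± 250m"


def vo_checks_pass(match_count, inlier_ratio, step_m):
    """Triple check from step 3; a False result should trigger a segment break."""
    return (match_count >= MIN_MATCHES
            and inlier_ratio >= MIN_INLIER_RATIO
            and 0.0 <= step_m <= MAX_STEP_M)


def update_position(prev_xy, heading_rad, displacement_px, gsd_m_per_px):
    """Steps 8-9: scale the pixel displacement to metres, rotate it, add it."""
    dx_m = displacement_px[0] * gsd_m_per_px
    dy_m = displacement_px[1] * gsd_m_per_px
    cos_h, sin_h = math.cos(heading_rad), math.sin(heading_rad)
    east = prev_xy[0] + cos_h * dx_m - sin_h * dy_m
    north = prev_xy[1] + sin_h * dx_m + cos_h * dy_m
    return east, north


# Illustrative values: 250 px of forward motion at 0.4 m/px is a 100 m step.
step_px = (0.0, 250.0)
gsd = 0.4
step_m = math.hypot(*step_px) * gsd
position = None
if vo_checks_pass(match_count=120, inlier_ratio=0.7, step_m=step_m):
    position = update_position((0.0, 0.0), 0.0, step_px, gsd)
```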

Component: Satellite Image Geo-Referencing (Two-Stage)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| Stage 1: DINOv2 ViT-S/14 patch retrieval | dinov2 ViT-S/14 (PyTorch), faiss (CPU) | Fast (50ms). 300MB VRAM. Patch tokens capture spatial layout better than CLS alone. Semantic matching robust to seasonal change. | Coarse only (~tile-level). Lower precision than ViT-B/ViT-L. | PyTorch, faiss-cpu | Model weights from official source | ~50ms extract + <1ms search | Best coarse |
| Stage 2: SuperPoint+LightGlue ONNX FP16 | SuperPoint, LightGlue-ONNX, OpenCV | Precise pixel-level alignment. Proven on satellite benchmarks. FP16 on RTX 2060. | Needs rough pose for warping. Not rotation-invariant. | PyTorch, ONNX Runtime | N/A | ~150ms total | Best fine |

Selected: Two-stage hierarchical matching.

Satellite Matching Pipeline:

  1. Estimate approximate position from VO
  2. Stage 1 (coarse retrieval):
     a. Define search area: 500m radius around VO estimate (expand to 1km if segment just started or drift > 100m)
     b. Pre-compute DINOv2 ViT-S/14 patch embeddings for all satellite tiles in search area. Method: extract patch tokens (not CLS), apply spatial average pooling to get a single descriptor per tile. Cache embeddings.
     c. Extract DINOv2 ViT-S/14 patch embedding from UAV image (same pooling)
     d. Find top-5 most similar satellite tiles using faiss (CPU) cosine similarity
  3. Stage 2 (fine matching on top-5 tiles, stop on first good match):
     a. Warp UAV image to approximate nadir view using estimated camera pose
     b. Rotation handling:
        • If heading known: single attempt with rectified image
        • If no heading (segment start): try 4 rotations {0°, 90°, 180°, 270°}
     c. Extract SuperPoint features from warped UAV image
     d. Extract SuperPoint features from satellite tile (pre-computed and cached)
     e. Match with LightGlue ONNX FP16
     f. Geometric validation: require ≥15 inliers, inlier ratio ≥ 0.3, reprojection error < 3px
     g. If valid: estimate homography → transform image center → satellite pixel → WGS84
     h. Report: absolute position anchor with confidence based on match quality
  4. If all 5 tiles fail Stage 2 with SuperPoint:
     a. Try SIFT+LightGlue on top-3 tiles (rotation-invariant). Trigger: best SuperPoint inlier ratio was < 0.15.
     b. Try zoom level 17 (wider view)
  5. If still fails: mark frame as VO-only, reduce confidence, continue
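
The Stage 2 control flow (rotation retry, geometric validation, SIFT trigger) can be sketched without any model code. Everything named here is hypothetical: `match_fn` stands in for the SuperPoint+LightGlue call and is assumed to return `(num_inliers, num_matches, reprojection_error_px)` for one tile at one rotation.

```python
ROTATIONS_NO_HEADING = (0, 90, 180, 270)


def rotations_to_try(heading_known):
    """One attempt when the image is already rectified, four otherwise."""
    return (0,) if heading_known else ROTATIONS_NO_HEADING


def is_valid_match(num_inliers, num_matches, reproj_error_px):
    """Geometric validation thresholds from step 3f."""
    if num_matches == 0:
        return False
    return (num_inliers >= 15
            and num_inliers / num_matches >= 0.3
            and reproj_error_px < 3.0)


def fine_match(candidate_tiles, heading_known, match_fn):
    """Try up to 5 tiles, stop on the first valid match, else flag the SIFT fallback."""
    best_ratio = 0.0
    for tile in candidate_tiles[:5]:
        for rotation_deg in rotations_to_try(heading_known):
            inliers, matches, error_px = match_fn(tile, rotation_deg)
            if matches:
                best_ratio = max(best_ratio, inliers / matches)
            if is_valid_match(inliers, matches, error_px):
                return {"tile": tile, "rotation_deg": rotation_deg,
                        "fallback_to_sift": False}
    return {"tile": None, "rotation_deg": None,
            "fallback_to_sift": best_ratio < 0.15}


def fake_match(tile, rotation_deg):
    """Stand-in matcher: only tile 't3' rotated by 180 degrees aligns."""
    if tile == "t3" and rotation_deg == 180:
        return 40, 100, 1.2
    return 5, 100, 6.0


result = fine_match(["t1", "t2", "t3", "t4", "t5"], heading_known=False,
                    match_fn=fake_match)
```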

Satellite matching frequency: Every frame when available, but async — satellite matching for frame N overlaps with VO processing for frame N+1. Satellite result arrives and gets added to factor graph retroactively via iSAM2 update.
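
The overlap described above can be pictured with a small asyncio scheduling sketch. It only models ordering: `asyncio.sleep` stands in for the real work, the function names are hypothetical, and an actual implementation would still need to keep blocking GPU calls off the event loop and serialize GPU access.

```python
import asyncio


async def run_vo(frame_id, log):
    await asyncio.sleep(0.005)   # stand-in for the VO critical path
    log.append(("vo", frame_id))


async def match_satellite(frame_id, log):
    await asyncio.sleep(0.05)    # stand-in for the slower satellite matching
    log.append(("sat", frame_id))  # here the anchor would be added to the graph


async def process(frame_ids):
    log, pending = [], []
    for frame_id in frame_ids:
        await run_vo(frame_id, log)  # VO is never blocked by satellite work
        pending.append(asyncio.create_task(match_satellite(frame_id, log)))
    await asyncio.gather(*pending)   # late anchors arrive retroactively
    return log


events = asyncio.run(process([0, 1, 2]))
```

In this sketch the VO result for frame 1 is logged before the satellite result for frame 0, which is the retroactive-update situation the factor graph has to handle.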

Component: GTSAM Factor Graph Optimizer

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| GTSAM iSAM2 factor graph (Pose2) | gtsam==4.2 (pip) | Incremental smoothing. Proper uncertainty propagation. Native BetweenFactorPose2 and PriorFactorPose2. Backward smoothing on new evidence. Python bindings. | C++ backend (pip binary). Learning curve. | gtsam==4.2, NumPy | N/A | ~5-10ms incremental update | Best |

Selected: GTSAM iSAM2 with Pose2 variables.

Coordinate system: Local East-North-Up (ENU) centered on starting GPS. All positions computed in ENU meters, converted to WGS84 for output. Conversion: pyproj or manual geodetic math (WGS84 ellipsoid).
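
For the "manual geodetic math" option, a small-area tangent-plane approximation is often enough. The sketch below is an assumption about how that could look, not the project's implementation: it uses the WGS84 radii of curvature at the origin and ignores the Up component, so its error grows with distance from the origin (pyproj would be the more rigorous choice).

```python
import math

WGS84_A = 6378137.0           # semi-major axis, metres
WGS84_E2 = 6.69437999014e-3   # first eccentricity squared


def _radii(lat0_rad):
    """Meridional and prime-vertical radii of curvature at the origin latitude."""
    w = 1.0 - WGS84_E2 * math.sin(lat0_rad) ** 2
    meridional = WGS84_A * (1.0 - WGS84_E2) / w ** 1.5
    prime_vertical = WGS84_A / math.sqrt(w)
    return meridional, prime_vertical


def wgs84_to_enu(lat, lon, lat0, lon0):
    """Degrees in, (east, north) metres out, relative to (lat0, lon0)."""
    m, n = _radii(math.radians(lat0))
    east = math.radians(lon - lon0) * n * math.cos(math.radians(lat0))
    north = math.radians(lat - lat0) * m
    return east, north


def enu_to_wgs84(east, north, lat0, lon0):
    """Inverse of wgs84_to_enu under the same local approximation."""
    m, n = _radii(math.radians(lat0))
    lat = lat0 + math.degrees(north / m)
    lon = lon0 + math.degrees(east / (n * math.cos(math.radians(lat0))))
    return lat, lon
```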

Factor graph structure:

  • Variables: Pose2 (x_enu, y_enu, heading) per image
  • Prior Factor (PriorFactorPose2): first frame anchored at ENU origin (0, 0, initial_heading) with tight noise (sigma_xy = 5m if GPS accurate, sigma_theta = 0.1 rad)
  • VO Factor (BetweenFactorPose2): relative motion between consecutive frames. Noise model: Diagonal.Sigmas([sigma_x, sigma_y, sigma_theta]), where the position sigma shrinks as the RANSAC inlier ratio rises: high inlier ratio (0.8) → sigma 2m; low inlier ratio (0.4) → sigma 10m. sigma_theta is proportional to displacement magnitude.
  • Satellite Anchor Factor (PriorFactorPose2): absolute position from satellite matching. Position noise: sigma = reprojection_error × GSD × scale_factor. Good match (0.5px × 0.4m/px × 3) = 0.6m. Poor match = 5-10m. Heading component: loose (sigma = 1.0 rad) unless estimated from satellite alignment.
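
The two noise rules above can be written as plain functions. The text gives only two reference points for the VO sigma, so the linear interpolation (with clamping) between them is an assumption; the satellite sigma follows the stated product directly. The resulting values would be passed to the GTSAM diagonal noise model.

```python
def vo_sigma_xy(inlier_ratio):
    """Position sigma in metres: 10 m at ratio 0.4, 2 m at ratio 0.8.

    Linear interpolation between the two quoted points is an assumption.
    """
    lo_ratio, hi_ratio = 0.4, 0.8
    lo_sigma, hi_sigma = 10.0, 2.0
    r = min(max(inlier_ratio, lo_ratio), hi_ratio)
    t = (r - lo_ratio) / (hi_ratio - lo_ratio)
    return lo_sigma + t * (hi_sigma - lo_sigma)


def satellite_sigma_xy(reproj_error_px, gsd_m_per_px, scale_factor=3.0):
    """sigma = reprojection_error × GSD × scale_factor, in metres."""
    return reproj_error_px * gsd_m_per_px * scale_factor


good_anchor_sigma = satellite_sigma_xy(0.5, 0.4)  # the "good match" example
```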

Optimizer behavior:

  • On each new frame: add VO factor, run iSAM2.update() → ~5ms
  • On satellite match arrival: add PriorFactorPose2, run iSAM2.update() → backward correction
  • Emit updated positions via SSE after each update
  • Refinement events: when backward correction moves positions by >1m, emit "refined" SSE event
  • No custom Python factors — all factors use native GTSAM C++ implementations for speed
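
The "refined" rule can be sketched as a comparison of ENU positions before and after an optimizer update. The function name and the dictionary layout are hypothetical; only the 1m threshold and the event fields `image_id` and `delta_m` come from this document.

```python
import math


def refined_events(before, after, threshold_m=1.0):
    """Return one 'refined' event per image whose position moved by more than threshold_m.

    before / after map image_id -> (east_m, north_m) in the local ENU frame.
    """
    events = []
    for image_id, (east_new, north_new) in after.items():
        if image_id not in before:
            continue  # newly added pose, reported as a normal position event
        east_old, north_old = before[image_id]
        delta_m = math.hypot(east_new - east_old, north_new - north_old)
        if delta_m > threshold_m:
            events.append({"event": "refined", "image_id": image_id,
                           "delta_m": delta_m})
    return events


before = {"img_001": (0.0, 0.0), "img_002": (0.0, 100.0)}
after = {"img_001": (0.2, 0.1), "img_002": (3.0, 104.0), "img_003": (3.0, 205.0)}
events = refined_events(before, after)
```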

Component: Segment Manager

The segment manager tracks independent VO chains, manages drift thresholds, and handles reconnection.

Segment lifecycle:

  1. Start condition: First image, OR VO triple failure check fails
  2. Active tracking: VO provides frame-to-frame motion within segment
  3. Anchoring: Satellite two-stage matching provides absolute position
  4. End condition: VO failure (sharp turn, outlier >350m, occlusion)
  5. New segment: Starts, attempts satellite anchor immediately

Segment states:

  • ANCHORED: At least one satellite match → HIGH confidence
  • FLOATING: No satellite match yet → positioned relative to segment start → LOW confidence
  • USER_ANCHORED: User provided manual GPS → MEDIUM confidence

Drift monitoring (replaces GTSAM custom drift factor):

  • Track cumulative VO displacement since last satellite anchor per segment
  • 100m threshold: emit warning SSE event, expand satellite search radius to 1km, increase matching attempts per frame
  • 200m threshold: emit user_input_needed SSE event with configurable timeout (default: 30s)
  • 500m threshold: mark all subsequent positions as VERY LOW confidence, continue processing
  • Confidence formula: confidence = base_confidence × exp(-drift / decay_constant) where base_confidence is from satellite match quality, drift is distance from nearest anchor, decay_constant = 100m
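
A minimal sketch of the threshold ladder and the decay formula follows. The event names `drift_warning` and `user_input_needed` match the SSE events defined in this document; the `very_low_confidence` label and the function names are assumptions.

```python
import math


def drift_action(drift_m):
    """Map cumulative VO drift since the last anchor to the action it triggers."""
    if drift_m >= 500.0:
        return "very_low_confidence"   # hypothetical label for the 500m tier
    if drift_m >= 200.0:
        return "user_input_needed"
    if drift_m >= 100.0:
        return "drift_warning"
    return None


def confidence(base_confidence, drift_m, decay_constant_m=100.0):
    """confidence = base_confidence × exp(-drift / decay_constant)."""
    return base_confidence * math.exp(-drift_m / decay_constant_m)


# With the default decay constant, confidence drops to about 37% of its
# base value after 100m of drift, since exp(-1) ≈ 0.368.
at_100m = confidence(0.9, 100.0)
```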

Segment reconnection:

  • When a segment becomes ANCHORED, check for nearby FLOATING segments (within 500m of any anchored position)
  • Attempt satellite-based position matching between FLOATING segment images and tiles near the ANCHORED segment
  • DEM consistency: verify segment elevation profile is consistent with terrain
  • If no match after all frames tried: request user input, auto-continue after timeout

Component: Multi-Provider Satellite Tile Cache

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| Multi-provider progressive cache with DEM | aiohttp, aiofiles, sqlite3, faiss-cpu | Multiple providers. Async download. DINOv2/SuperPoint features pre-computed. DEM cached. Session token management. | Needs internet. Provider API differences. | Google Maps Tiles API + Mapbox API keys | API keys in env vars only. Session tokens managed internally. | Async, non-blocking | Best |

Selected: Multi-provider progressive cache.

Provider priority:

  1. User-provided tiles (highest priority — custom/recent imagery)
  2. Google Maps (zoom 18, ~0.4m/px) — 100K free requests/month, 15K/day
  3. Mapbox Satellite (zoom 16-18, ~0.6-0.3m/px) — 200K free requests/month

Google Maps session management:

  1. On job start: POST to /v1/createSession with API key → receive session token
  2. Use session token in all subsequent tile requests for this job
  3. Token has finite lifetime — handle expiry by creating new session
  4. Track request count per day per provider

Cache strategy:

  1. On job start: download tiles in 1km radius around starting GPS from primary provider
  2. Pre-compute SuperPoint features AND DINOv2 ViT-S/14 patch embeddings for all cached tiles
  3. As route extends: download tiles 500m ahead of estimated position
  4. Request budgeting: track daily API requests per provider. At 80% daily limit (12,000 for Google): switch to Mapbox. Log budget status.
  5. Cache structure on disk:
 cache/
 ├── tiles/{provider}/{zoom}/{x}/{y}.jpg
 ├── features/{provider}/{zoom}/{x}/{y}_sp.npz     (SuperPoint features)
 ├── embeddings/{provider}/{zoom}/{x}/{y}_dino.npy  (DINOv2 patch embedding)
 └── dem/{lat}_{lon}.tif                            (Copernicus DEM tiles)
  6. Cache persistent across jobs: tiles and features are reused for overlapping areas
  7. DEM cache: Copernicus DEM GLO-30 tiles from AWS S3 (free, no auth). s3://copernicus-dem-30m/. Cloud Optimized GeoTIFFs, 30m resolution. Downloaded via HTTPS (no AWS SDK needed): https://copernicus-dem-30m.s3.amazonaws.com/Copernicus_DSM_COG_10_{N|S}{lat}_00_{E|W}{lon}_DEM/...
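
The on-disk layout above maps naturally onto tile indices. The sketch below assumes the providers use the standard XYZ Web Mercator tiling scheme; the function names are hypothetical, and the DEM path is left out because its naming is not fully specified here.

```python
import math
from pathlib import PurePosixPath


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Standard Web Mercator (XYZ) tile indices for a WGS84 coordinate."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def cache_paths(provider, zoom, x, y, root="cache"):
    """Paths for a tile and its derived artifacts, following the layout above."""
    base = PurePosixPath(root)
    key = PurePosixPath(provider) / str(zoom) / str(x)
    return {
        "tile": base / "tiles" / key / f"{y}.jpg",
        "features": base / "features" / key / f"{y}_sp.npz",
        "embedding": base / "embeddings" / key / f"{y}_dino.npy",
    }


x, y = latlon_to_tile(50.45, 30.52, 18)   # illustrative coordinate
paths = cache_paths("google", 18, x, y)
```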

Tile download budget:

  • Google Maps: 100,000/month, 15,000/day → ~7 flights/day from cache misses, ~50 flights/month
  • Mapbox: 200,000/month → additional ~100 flights/month
  • Per flight: ~2000 satellite tiles (~80MB) + ~200 DEM tiles (~10MB)
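
The flight estimates follow from dividing each quota by roughly 2000 tiles per flight, and the 80% switch rule is easy to express as a per-day counter. The class below is a hypothetical sketch; treating Mapbox as having no daily cap is an assumption, since only its monthly quota is stated.

```python
TILES_PER_FLIGHT = 2000

google_flights_per_day = 15_000 // TILES_PER_FLIGHT      # 7
google_flights_per_month = 100_000 // TILES_PER_FLIGHT   # 50
mapbox_flights_per_month = 200_000 // TILES_PER_FLIGHT   # 100


class RequestBudget:
    """Tracks daily requests per provider and skips providers near their limit."""

    def __init__(self, daily_limits, switch_percent=80):
        self._limits = dict(daily_limits)   # provider -> daily limit or None
        self._switch_percent = switch_percent
        self._used = {}                     # (provider, day) -> request count

    def record(self, provider, day, count=1):
        key = (provider, day)
        self._used[key] = self._used.get(key, 0) + count

    def used(self, provider, day):
        return self._used.get((provider, day), 0)

    def pick_provider(self, priority, day):
        """First provider in priority order that is still below its switch point."""
        for provider in priority:
            limit = self._limits.get(provider)
            if limit is None or self.used(provider, day) * 100 < self._switch_percent * limit:
                return provider
        return None


budget = RequestBudget({"google": 15_000, "mapbox": None})
budget.record("google", "2025-01-01", 11_999)
before_switch = budget.pick_provider(["google", "mapbox"], "2025-01-01")
budget.record("google", "2025-01-01")
after_switch = budget.pick_provider(["google", "mapbox"], "2025-01-01")
```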

Component: API & Real-Time Streaming

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| FastAPI + SSE (Queue-based) + JWT | FastAPI ≥0.135.0, asyncio.Queue, uvicorn, python-jose | Native SSE. Queue-based publisher avoids generator cleanup issues. JWT auth. OpenAPI auto-generated. | Python GIL (mitigated with asyncio). | Python 3.11+, uvicorn | JWT, CORS, rate limiting, CSP headers | Async, non-blocking | Best |

Selected: FastAPI + Queue-based SSE + JWT authentication.

SSE implementation:

  • Use asyncio.Queue per client connection (not bare async generators)
  • Server pushes events to queue; client reads from queue
  • On disconnect: queue is garbage collected, no lingering generators
  • SSE heartbeat: send event: heartbeat every 15 seconds to detect stale connections
  • Support Last-Event-ID header for reconnection: include monotonic event ID in each SSE message. On reconnect, replay missed events from in-memory ring buffer (last 1000 events per job).
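
The queue-based publisher with replay can be sketched with the standard library alone. The class and function names are hypothetical, and the FastAPI wiring (the endpoint that drains a subscriber queue and writes the formatted frames) is deliberately left out.

```python
import asyncio
import json
import time
from collections import deque


class EventPublisher:
    """Per-job publisher: one asyncio.Queue per client plus a replay ring buffer."""

    def __init__(self, buffer_size=1000):
        self._next_id = 1
        self._buffer = deque(maxlen=buffer_size)
        self._subscribers = set()

    def publish(self, event, data):
        message = (self._next_id, event, data)
        self._next_id += 1
        self._buffer.append(message)
        for queue in self._subscribers:
            queue.put_nowait(message)
        return message[0]

    def subscribe(self, last_event_id=None):
        """Create a client queue; replay buffered events newer than last_event_id."""
        queue = asyncio.Queue()
        if last_event_id is not None:
            for message in self._buffer:
                if message[0] > last_event_id:
                    queue.put_nowait(message)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)


def format_sse(message):
    """Serialize one message as an SSE frame with id, event and data fields."""
    event_id, event, data = message
    return f"id: {event_id}\nevent: {event}\ndata: {json.dumps(data)}\n\n"


async def heartbeat(publisher, interval_s=15.0):
    """Background task that publishes a heartbeat event every interval_s seconds."""
    while True:
        await asyncio.sleep(interval_s)
        publisher.publish("heartbeat", {"timestamp": time.time()})
```

A reconnecting client would pass the integer from its Last-Event-ID header to `subscribe`; events older than the ring buffer cannot be replayed and would need a full results fetch instead.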

API Endpoints:

POST /auth/token
  Body: { api_key }
  Returns: { access_token, token_type, expires_in }

POST /jobs
  Headers: Authorization: Bearer <token>
  Body: { start_lat, start_lon, altitude, camera_params, image_folder }
  Returns: { job_id }

GET /jobs/{job_id}/stream
  Headers: Authorization: Bearer <token>
  SSE stream of:
    - { event: "position", id: "42", data: { image_id, lat, lon, confidence, segment_id } }
    - { event: "refined", id: "43", data: { image_id, lat, lon, confidence, delta_m } }
    - { event: "segment_start", id: "44", data: { segment_id, reason } }
    - { event: "drift_warning", id: "45", data: { segment_id, cumulative_drift_m } }
    - { event: "user_input_needed", id: "46", data: { image_id, reason, timeout_s } }
    - { event: "heartbeat", id: "47", data: { timestamp } }
    - { event: "complete", id: "48", data: { summary } }

POST /jobs/{job_id}/anchor
  Headers: Authorization: Bearer <token>
  Body: { image_id, lat, lon }

GET /jobs/{job_id}/point-to-gps?image_id=X&px=100&py=200
  Headers: Authorization: Bearer <token>
  Returns: { lat, lon, confidence }

GET /jobs/{job_id}/results?format=geojson
  Headers: Authorization: Bearer <token>
  Returns: full results as GeoJSON or CSV (WGS84)

Security measures:

  • JWT authentication on all endpoints (short-lived tokens, 1h expiry)
  • Image folder whitelist: resolve to canonical path (os.path.realpath), verify under configured base directories
  • Image validation: magic byte check (JPEG FFD8, PNG 89504E47, TIFF 4949/4D4D), dimension check (<10,000px per side), reject others
  • Pin Pillow ≥11.3.0 (CVE-2025-48379 mitigation)
  • Max concurrent SSE connections per client: 5
  • Rate limiting: 100 requests/minute per client
  • All provider API keys in environment variables, never logged or returned
  • CORS configured for known client origins only
  • Content-Security-Policy headers
  • SSE heartbeat prevents stale connections accumulating
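
Two of these checks, the magic byte sniffing and the folder whitelist, can be sketched with the standard library. The function names are hypothetical; the prefixes are the ones listed above, and the dimension limit follows the "reject if either > 10,000px" wording from the preprocessing steps.

```python
import os

MAGIC_PREFIXES = (
    (b"\xff\xd8", "jpeg"),   # FFD8
    (b"\x89PNG", "png"),     # 89504E47
    (b"II", "tiff"),         # 4949 (little-endian TIFF)
    (b"MM", "tiff"),         # 4D4D (big-endian TIFF)
)
MAX_SIDE_PX = 10_000


def sniff_format(header):
    """Return the format name for the first bytes of a file, or None if unknown."""
    for prefix, name in MAGIC_PREFIXES:
        if header.startswith(prefix):
            return name
    return None


def dimensions_ok(width_px, height_px):
    """Accept only images whose sides are both within the configured limit."""
    return 0 < width_px <= MAX_SIDE_PX and 0 < height_px <= MAX_SIDE_PX


def is_under_allowed_base(path, allowed_bases):
    """Resolve symlinks and '..' first, then require the path to sit under a base."""
    real = os.path.realpath(path)
    for base in allowed_bases:
        real_base = os.path.realpath(base)
        try:
            if os.path.commonpath([real, real_base]) == real_base:
                return True
        except ValueError:   # e.g. paths on different drives on Windows
            continue
    return False
```

Using `os.path.commonpath` rather than a string prefix test avoids accepting sibling directories such as `/data/images_evil` when the base is `/data/images`.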

Component: Interactive Point-to-GPS Lookup

For each processed image, the system stores the estimated camera-to-ground transformation. Given pixel coordinates (px, py):

  1. If image has satellite match: use computed homography to project (px, py) → satellite tile coordinates → WGS84. HIGH confidence.
  2. If image has only VO pose: use camera intrinsics + DEM-corrected altitude + estimated heading to ray-cast (px, py) to ground plane → WGS84. MEDIUM confidence.
  3. Confidence score derived from underlying position estimate quality.
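
For the VO-only case, a simplified version of the ray-cast reduces to scaling and rotating the pixel offset from the image center. The sketch assumes a nadir-pointing camera over locally flat ground, a GSD that matches the image the pixel coordinates refer to, and an angle `theta_rad` that is zero for a north-up image and positive counter-clockwise; all names are hypothetical.

```python
import math


def pixel_to_enu(px, py, image_size_px, camera_enu, theta_rad, gsd_m_per_px):
    """Project a pixel to ENU ground coordinates (nadir camera, flat ground)."""
    width, height = image_size_px
    right_m = (px - width / 2.0) * gsd_m_per_px
    up_m = (height / 2.0 - py) * gsd_m_per_px   # image y grows downward
    cos_t, sin_t = math.cos(theta_rad), math.sin(theta_rad)
    east = camera_enu[0] + cos_t * right_m - sin_t * up_m
    north = camera_enu[1] + sin_t * right_m + cos_t * up_m
    return east, north


# Illustrative values: a pixel 200 px right of center at 0.5 m/px lies
# 100 m east of the camera position when the image is north-up.
point = pixel_to_enu(1000, 600, (1600, 1200), (10.0, 20.0), 0.0, 0.5)
```

The resulting ENU point would then go through the same ENU → WGS84 conversion as the trajectory itself.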

Processing Time Budget

| Step | Component | Time | GPU/CPU | Notes |
|---|---|---|---|---|
| 1 | Image load + validate + downscale | <10ms | CPU | OpenCV |
| 2 | XFeat extract + match (VO) | ~15ms | GPU | Built-in matcher |
| 3 | Homography estimation + decomposition | ~5ms | CPU | USAC_MAGSAC |
| 4 | GTSAM iSAM2 update (VO factor) | ~5ms | CPU | Incremental |
| 5 | SSE position emit | <1ms | CPU | Queue push |
| | VO subtotal | ~36ms | | Per-frame critical path |
| 6 | DINOv2 ViT-S/14 extract (UAV image) | ~50ms | GPU | Patch tokens |
| 7 | faiss cosine search (top-5 tiles) | <1ms | CPU | ~2000 vectors |
| 8 | SuperPoint extract (UAV warped) | ~80ms | GPU | |
| 9 | LightGlue ONNX match (per tile, up to 5) | ~50-100ms | GPU | Stop on first good match |
| 10 | Geometric validation + homography | ~5ms | CPU | |
| 11 | GTSAM iSAM2 update (satellite factor) | ~5ms | CPU | Backward correction |
| | Satellite subtotal (one tile matched) | ~191-241ms | | Overlapped with next frame's VO |
| | Total per frame | ~230-280ms | | Well under 5s budget |

Testing Strategy

Integration / Functional Tests

  • End-to-end pipeline test using provided 60-image sample dataset with ground truth GPS
  • Verify 80% of positions within 50m of ground truth
  • Verify 60% of positions within 20m of ground truth
  • Test sharp turn handling: simulate 90° turn with non-overlapping images
  • Test segment creation, satellite anchoring, and cross-segment reconnection
  • Test user manual anchor injection via POST endpoint
  • Test point-to-GPS lookup accuracy against known ground coordinates
  • Test SSE streaming delivers results within 1s of processing completion
  • Test with FullHD resolution images (pipeline must not fail)
  • Test with 6252×4168 images (verify downscaling and memory usage)
  • Test DINOv2 ViT-S/14 coarse retrieval finds correct satellite tile with 100m VO drift
  • Test multi-provider fallback: block Google Maps, verify Mapbox takes over
  • Test with outdated satellite imagery: verify confidence scores reflect match quality
  • Test outlier handling: 350m gap between consecutive photos
  • Test image rotation handling: apply 45° and 90° rotation, verify 4-rotation retry works
  • Test SIFT+LightGlue fallback triggers when SuperPoint inlier ratio < 0.15
  • Test GTSAM PriorFactorPose2 satellite anchoring produces backward correction
  • Test drift warning at 100m cumulative displacement without satellite anchor
  • Test user_input_needed event at 200m cumulative displacement
  • Test SSE heartbeat arrives every 15s during long processing
  • Test SSE reconnection with Last-Event-ID replays missed events
  • Test homography decomposition disambiguation for first frame pair (no previous direction)

Non-Functional Tests

  • Processing speed: <5s per image on RTX 2060 (target <300ms with ONNX optimization)
  • Memory: peak RAM <16GB, VRAM <6GB during 3000-image flight at max resolution
  • VRAM: verify peak stays under 2GB during satellite matching phase
  • Memory stability: process 3000 images, verify no memory leak (stable RSS over time)
  • Concurrent jobs: 2 simultaneous flights, verify isolation and resource sharing
  • Tile cache: verify tiles, SuperPoint features, and DINOv2 embeddings cached and reused
  • API: load test SSE connections (10 simultaneous clients)
  • Recovery: kill and restart service mid-job, verify job can resume from last processed image
  • DEM download: verify Copernicus DEM tiles fetched from AWS S3 and cached correctly
  • GTSAM optimizer: verify backward correction produces "refined" events
  • Session token lifecycle: verify Google Maps session creation, usage, and expiry handling

Security Tests

  • JWT authentication enforcement on all endpoints
  • Expired/invalid token rejection
  • Provider API keys not exposed in responses, logs, or error messages
  • Image folder path traversal prevention (attempt to access /etc/passwd via image_folder)
  • Image folder whitelist enforcement (canonical path resolution)
  • Image magic byte validation: reject non-image files renamed to .jpg
  • Image dimension validation: reject >10,000px images
  • Input validation: invalid GPS coordinates, negative altitude, malformed camera params
  • Rate limiting: verify 429 response after exceeding limit
  • Max SSE connection enforcement
  • CORS enforcement: reject requests from unknown origins
  • Content-Security-Policy header presence
  • Pillow version ≥11.3.0 verified in requirements

References

  • Previous assessment research: _docs/00_research/gps_denied_nav_assessment/
  • This assessment research: _docs/00_research/gps_denied_draft02_assessment/
  • Previous AC assessment: _docs/00_research/gps_denied_visual_nav/00_ac_assessment.md