Solution Draft

Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

Hard constraint: the camera captures at ~3 fps (333-400 ms between frames), so the full pipeline must complete within 400 ms per frame.

Satellite matching strategy: Benchmark LiteSAM on actual Orin Nano Super hardware as a day-one priority. If LiteSAM cannot achieve ≤400ms at 480px resolution, abandon it entirely and use XFeat semi-dense matching as the primary satellite matcher. Speed is non-negotiable.

Core architectural principles:

  1. cuVSLAM handles VO — NVIDIA's CUDA-accelerated library achieves 90fps on Jetson Orin Nano, giving VO essentially "for free" (~11ms/frame).
  2. Keyframe-based satellite matching — satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
  3. Every keyframe independently attempts satellite-based geo-localization — this handles disconnected segments natively.
  4. Pipeline parallelism — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
┌─────────────────────────────────────────────────────────────────┐
│                    OFFLINE (Before Flight)                       │
│  Satellite Tiles → Download & Crop → Store as tile pairs        │
│  (Google Maps)     (per flight plan)   (disk, GeoHash indexed)  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    ONLINE (During Flight)                        │
│                                                                  │
│  EVERY FRAME (400ms budget):                                     │
│  ┌────────────────────────────────┐                              │
│  │ Camera → Downsample (CUDA 2ms)│                               │
│  │       → cuVSLAM VO+IMU (~11ms)│──→ ESKF Update → SSE Emit   │
│  └────────────────────────────────┘         ↑                    │
│                                             │                    │
│  KEYFRAMES ONLY (every 3-10 frames):        │                    │
│  ┌────────────────────────────────────┐     │                    │
│  │ Satellite match (async CUDA stream)│─────┘                    │
│  │ LiteSAM or XFeat (see benchmark)  │                           │
│  │ (does NOT block VO output)         │                           │
│  └────────────────────────────────────┘                          │
│                                                                  │
│  IMU: 100+Hz continuous → ESKF prediction                        │
└─────────────────────────────────────────────────────────────────┘

Speed Optimization Techniques

1. cuVSLAM for Visual Odometry (~11ms/frame)

NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 90fps on Jetson Orin Nano. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs. This eliminates custom VO entirely.

2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching. Strategy:

  • cuVSLAM provides VO at every frame (high-rate, low-latency)
  • Satellite matching triggers on keyframes selected by:
    • Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
    • Confidence drop: when ESKF covariance exceeds threshold
    • VO failure: when cuVSLAM reports tracking loss (sharp turn)
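These three triggers combine into a single predicate. A minimal sketch; the threshold values and argument names here are illustrative assumptions, not fixed by this design:

```python
# Keyframe trigger sketch. Thresholds are illustrative placeholders.
MAX_INTERVAL = 10        # force a keyframe at least every 10 frames
MIN_INTERVAL = 3         # never match more often than every 3 frames
COV_THRESHOLD = 400.0    # ESKF position covariance trace, m^2 (assumed)

def is_keyframe(frames_since_kf: int, cov_trace: float, tracking_lost: bool) -> bool:
    """Decide whether the current frame triggers satellite matching."""
    if tracking_lost:                    # VO failure: re-localize immediately
        return True
    if frames_since_kf < MIN_INTERVAL:   # amortize satellite-matcher cost
        return False
    if frames_since_kf >= MAX_INTERVAL:  # fixed-interval fallback
        return True
    return cov_trace > COV_THRESHOLD     # confidence-drop trigger
```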

3. Satellite Matcher Selection (Benchmark-Driven)

Candidate A: LiteSAM (opt) — Best accuracy for satellite-aerial matching (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne + TAIFormer + MinGRU. Benchmarked at 497ms on Jetson AGX Orin at 1184px. AGX Orin is 3-4x more powerful than Orin Nano Super (275 TOPS vs 67 TOPS, $2000+ vs $249).

Realistic Orin Nano Super estimates:

  • At 1184px: ~1.5-2.0s (unusable)
  • At 640px: ~500-800ms (borderline)
  • At 480px: ~300-500ms (best case)

Candidate B: XFeat semi-dense — ~50-100ms on Orin Nano Super. Proven on Jetson. Not specifically designed for cross-view satellite-aerial, but fast and reliable.

Decision rule: Benchmark LiteSAM TensorRT FP16 at 480px on Orin Nano Super. If ≤400ms → use LiteSAM. If >400ms → abandon LiteSAM, use XFeat as primary. No hybrid compromises — pick one and optimize it.
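The gate itself is a few lines of timing code. A sketch, where `run_inference` stands in for one TensorRT execution of LiteSAM at 480px; the harness is generic and nothing in it is LiteSAM-specific:

```python
import time
from statistics import median

BUDGET_MS = 400.0  # hard per-frame deadline from the design

def benchmark(run_inference, warmup: int = 10, iters: int = 50) -> float:
    """Median wall-clock latency of one inference call, in milliseconds."""
    for _ in range(warmup):              # let clocks/kernels settle
        run_inference()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return median(samples)

def go_no_go(latency_ms: float) -> str:
    """Decision rule: <=400 ms keeps LiteSAM, otherwise switch to XFeat."""
    return "LiteSAM" if latency_ms <= BUDGET_MS else "XFeat"
```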

4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — its multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. INT8 is possible for the MobileOne backbone, but the transformer components may degrade under INT8 quantization, so FP16 is the safer default.

5. CUDA Stream Pipelining

Overlap operations across consecutive frames:

  • Stream A: cuVSLAM VO for current frame (~11ms) + ESKF fusion (~1ms)
  • Stream B: Satellite matching for previous keyframe (async)
  • CPU: SSE emission, tile management, keyframe selection logic
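On the Jetson this overlap runs on CUDA streams; the host-side control flow can be sketched with a worker thread, where `satellite_match` is a placeholder for the real matcher and the returned payload is an assumption:

```python
from concurrent.futures import ThreadPoolExecutor, Future
from typing import Optional

# Thread-pool stand-in for the CUDA-stream overlap: the satellite match for
# a keyframe runs in the background while per-frame VO keeps emitting.
executor = ThreadPoolExecutor(max_workers=1)
pending: Optional[Future] = None

def satellite_match(frame_id: int) -> tuple:
    # Placeholder for LiteSAM/XFeat matching (returns frame id, lat, lon).
    return frame_id, 50.45, 30.52

def process_frame(frame_id: int, keyframe: bool) -> str:
    global pending
    if pending is not None and pending.done():
        matched_id, lat, lon = pending.result()   # delayed ESKF correction
        pending = None
    if keyframe and pending is None:              # never queue a second match
        pending = executor.submit(satellite_match, frame_id)
    return f"frame {frame_id}: VO pose emitted"   # never blocks on the match
```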

6. Pre-cropped Satellite Tiles

Offline: for each satellite tile, store both the raw image and a pre-resized version matching the satellite matcher's input resolution. Runtime avoids resize cost.

Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|---|---|---|---|---|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (90fps) | VO only, no satellite matching |

Key insight: Combine cuVSLAM (best-in-class VO for Jetson) with the fastest viable satellite-aerial matcher via ESKF fusion. LiteSAM is the accuracy leader but unproven on Orin Nano Super — benchmark first, abandon for XFeat if too slow.

Architecture

Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Fit |
|---|---|---|---|---|
| cuVSLAM (mono+IMU) | PyCuVSLAM / C++ API | 90fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ⚠️ Slower |

Selected: cuVSLAM (mono+IMU mode) — purpose-built by NVIDIA for Jetson. ~11ms/frame leaves 389ms for everything else. Auto-fallback to IMU when visual tracking fails.

Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Fit |
|---|---|---|---|---|
| LiteSAM (opt) | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | 497ms on AGX Orin at 1184px; AGX Orin is 3-4x more powerful than Orin Nano Super | If benchmark passes |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven | Not designed for cross-view satellite-aerial | If LiteSAM fails benchmark |
| EfficientLoFTR | TensorRT | Good accuracy, semi-dense | 15.05M params (2.4x LiteSAM), slower | ⚠️ Heavier |
| SuperPoint + LightGlue | TensorRT C++ | Good general matching | Sparse only, worse on satellite-aerial | ⚠️ Not specialized |

Selection: Benchmark-driven. Day-one test on Orin Nano Super:

  1. Export LiteSAM (opt) to TensorRT FP16
  2. Measure at 480px, 640px, 800px
  3. If ≤400ms at 480px → LiteSAM
  4. If >400ms at any viable resolution → XFeat semi-dense (primary, no hybrid)

Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Fit |
|---|---|---|---|---|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | Too heavy |

Selected: ESKF with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.

Measurement sources and rates:

  • IMU prediction: 100+Hz
  • cuVSLAM VO update: ~3Hz (every frame)
  • Satellite update: ~0.3-1Hz (keyframes only, delayed via async pipeline)
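The multi-rate predict/correct pattern can be illustrated with a single-axis position/velocity filter (a deliberate reduction: the real filter carries the full 16-dimensional state and error-state mechanics; the noise values below are placeholders):

```python
import numpy as np

# Reduced 1-axis Kalman illustration of the multi-rate pattern:
# predict at IMU rate, correct at VO / satellite rate.
class MiniESKF:
    def __init__(self):
        self.x = np.zeros(2)              # [position, velocity]
        self.P = np.eye(2) * 100.0        # state covariance (assumed prior)

    def predict(self, accel: float, dt: float):
        """IMU-rate propagation (called at 100+ Hz)."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt**2, dt]) * accel
        self.P = F @ self.P @ F.T + np.eye(2) * 0.01 * dt  # process noise

    def update_position(self, z: float, r: float):
        """VO/satellite-rate correction with measurement variance r."""
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + r          # innovation covariance
        K = (self.P @ H.T) / S            # Kalman gain (2x1)
        self.x = self.x + (K * (z - self.x[0])).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```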

Component: Satellite Tile Preprocessing (Offline)

Selected: GeoHash-indexed tile pairs on disk.

Pipeline:

  1. Define operational area from flight plan
  2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
  3. Pre-resize each tile to matcher input resolution
  4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
  5. Copy to Jetson storage before flight
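The GeoHash key for the directory structure can be computed with a stdlib-only encoder (this is the standard GeoHash algorithm; precision 6 gives roughly 1.2 km cells):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash base32 alphabet

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Standard GeoHash encoding, used here as the tile directory key."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    code, bits, bit_count, even = [], 0, 0, True
    while len(code) < precision:
        if even:                              # even bits refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:                                 # odd bits refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        even = not even
        bit_count += 1
        if bit_count == 5:                    # 5 bits -> one base32 character
            code.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(code)
```

A tile for a given GPS bound then lives under, e.g., `tiles/<geohash>/`, so the runtime can locate candidate tiles by prefix without a database.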

Component: Re-localization (Disconnected Segments)

Selected: Keyframe satellite matching is always active + expanded search on VO failure.

When cuVSLAM reports tracking loss (sharp turn, no features):

  1. Immediately flag next frame as keyframe → trigger satellite matching
  2. Expand tile search radius (from ±200m to ±1km based on IMU dead-reckoning uncertainty)
  3. If match found: position recovered, new segment begins
  4. If 3+ consecutive keyframe failures: request user input via API
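The radius expansion in step 2 can be tied directly to dead-reckoning time; a sketch where the 5%-of-distance drift rate and the default airspeed are illustrative assumptions:

```python
def search_radius_m(seconds_since_fix: float, speed_mps: float = 25.0,
                    base_radius_m: float = 200.0, cap_m: float = 1000.0) -> float:
    """Grow the tile search radius with dead-reckoning uncertainty.

    Assumes drift accumulates at ~5% of distance travelled (placeholder
    rate); clamps between the ±200m default and the ±1km cap above.
    """
    drift = 0.05 * speed_mps * seconds_since_fix
    return min(cap_m, base_radius_m + drift)
```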

Component: Object Center Coordinates

Geometric calculation once frame-center GPS is known:

  1. Pixel offset from center: (dx_px, dy_px)
  2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
  3. Rotate by IMU yaw heading
  4. Convert meter offset to lat/lon and add to frame-center GPS
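The four steps above, as one function. The sign conventions (image +x right, +y down; yaw clockwise from north) and the flat-earth approximation are assumptions that must be checked against the actual camera mounting:

```python
import math

M_PER_DEG = 111_320.0  # approximate metres per degree of latitude

def pixel_to_gps(center_lat: float, center_lon: float,
                 dx_px: float, dy_px: float,
                 gsd_m: float, yaw_rad: float):
    """Pixel offset -> metres -> rotate by yaw -> lat/lon (steps 1-4)."""
    e_cam = dx_px * gsd_m            # steps 1-2: offset in metres, camera frame
    n_cam = -dy_px * gsd_m           # image y grows downward
    east = e_cam * math.cos(yaw_rad) + n_cam * math.sin(yaw_rad)    # step 3
    north = -e_cam * math.sin(yaw_rad) + n_cam * math.cos(yaw_rad)
    lat = center_lat + north / M_PER_DEG                            # step 4
    lon = center_lon + east / (M_PER_DEG * math.cos(math.radians(center_lat)))
    return lat, lon
```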

Component: API & Streaming

Selected: FastAPI + sse-starlette. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
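Independent of the framework, each SSE message is plain text on the wire; a minimal stdlib formatter for the position event, where the JSON field names are an illustrative assumption rather than a fixed schema:

```python
import json

def sse_position_event(lat: float, lon: float, confidence: str,
                       event: str = "position") -> str:
    """Format one Server-Sent Events message (what the SSE stream carries).

    Payload fields (lat/lon/confidence) are illustrative placeholders.
    """
    data = json.dumps({"lat": lat, "lon": lon, "confidence": confidence})
    return f"event: {event}\ndata: {data}\n\n"
```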

Processing Time Budget (per frame, 400ms budget)

Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|---|---|---|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~11ms | NVIDIA CUDA-optimized, 90fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| Total | ~25ms | Well within 400ms |

Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.

Path A — LiteSAM (if benchmark passes):

| Step | Time | Notes |
|---|---|---|
| Downsample to ~480px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~5ms | Pre-resized, from storage |
| LiteSAM (opt) matching | ~300-500ms | TensorRT FP16, 480px, Orin Nano Super estimate |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| Total | ~310-510ms | Async, does not block VO |

Path B — XFeat (if LiteSAM abandoned):

| Step | Time | Notes |
|---|---|---|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| Total | ~50-80ms | Comfortably within budget |

Per-Frame Wall-Clock Latency

Every frame:

  • VO result emitted in ~25ms (cuVSLAM + ESKF + SSE)
  • Satellite correction arrives asynchronously on keyframes
  • Client gets immediate position, then refined position when satellite match completes

Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|---|---|---|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-300MB | NVIDIA CUDA library + internal state |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| Satellite tile (pre-resized) | ~1MB | Single active tile |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| Total | ~1.9-2.4GB | ~25-30% of 8GB — comfortable margin |

Confidence Scoring

| Level | Condition | Expected Accuracy |
|---|---|---|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
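The scoring logic maps directly to a small function; the function and argument names here are illustrative:

```python
def confidence_level(sat_match_ok: bool, vo_ok: bool,
                     metres_since_sat_fix: float, manual: bool = False) -> str:
    """Map the fusion state to the confidence levels in the table above."""
    if manual:
        return "MANUAL"
    if not vo_ok:
        return "VERY_LOW"                # IMU dead-reckoning only
    if sat_match_ok:
        return "HIGH"
    if metres_since_sat_fix < 500.0:     # recent satellite correction
        return "MEDIUM"
    return "LOW"
```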

Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| LiteSAM too slow on Orin Nano Super | HIGH | Misses 400ms deadline | Abandon LiteSAM, use XFeat. Day-one benchmark is the go/no-go gate |
| cuVSLAM handles nadir-only camera poorly | MEDIUM | VO accuracy degrades | Fall back to XFeat frame-to-frame matching |
| Poor Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| XFeat cross-view accuracy insufficient | MEDIUM | Position corrections less accurate than LiteSAM | Increase keyframe frequency; multi-tile consensus voting; strict RANSAC geometric verification |
| cuVSLAM is closed-source | LOW | Hard to debug | Fall back to XFeat VO; cuVSLAM exposes Python+C++ APIs |

Testing Strategy

Integration / Functional Tests

  • End-to-end pipeline test with real flight data (60 images from input_data/)
  • Compare computed positions against ground truth GPS from coordinates.csv
  • Measure: percentage within 50m, percentage within 20m
  • Test sharp-turn handling: introduce 90-degree heading change in sequence
  • Test user-input fallback: simulate 3+ consecutive failures
  • Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
  • Test session management: start/stop/restart flight sessions via REST API
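The accuracy metrics above reduce to two small stdlib helpers: great-circle error against the coordinates.csv ground truth, then the within-threshold percentage:

```python
import math

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two GPS fixes."""
    r = 6_371_000.0                       # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pct_within(errors_m, threshold_m: float) -> float:
    """Share of frames (in %) with position error <= threshold_m."""
    return 100.0 * sum(e <= threshold_m for e in errors_m) / len(errors_m)
```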

Non-Functional Tests

  • Day-one benchmark: LiteSAM TensorRT FP16 at 480/640/800px on Orin Nano Super → go/no-go for LiteSAM
  • cuVSLAM benchmark: verify 90fps monocular+IMU on Orin Nano Super
  • Performance: measure per-frame processing time (must be <400ms)
  • Memory: monitor peak usage during 1000-frame session (must stay <8GB)
  • Stress: process 3000 frames without memory leak
  • Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff

References

  • AC Assessment: _docs/00_research/gps_denied_nav/00_ac_assessment.md
  • Tech stack evaluation: _docs/01_solution/tech_stack.md