Tech Stack Evaluation

Requirements Summary

Functional

  • GPS-denied visual navigation for fixed-wing UAV
  • Frame-center GPS estimation via VO + satellite matching + IMU fusion
  • Object-center GPS via geometric projection
  • Real-time streaming via REST API + SSE
  • Disconnected route segment handling
  • User-input fallback for unresolvable frames

Non-Functional

  • <400ms per-frame processing (camera @ ~3fps)
  • <50m accuracy for 80% of frames, <20m for 60%
  • <8GB total memory (CPU+GPU shared pool)
  • Up to 3000 frames per flight session
  • Image Registration Rate >95% (normal segments)

Hardware Constraints

  • Jetson Orin Nano Super (8GB LPDDR5, 1024 CUDA cores, 67 TOPS INT8)
  • JetPack 6.2.2: CUDA 12.6.10, TensorRT 10.3.0, cuDNN 9.3
  • ARM64 (aarch64) architecture
  • No internet connectivity during flight

Technology Evaluation

Platform & OS

| Option | Version | Score (1-5) | Notes |
|---|---|---|---|
| JetPack 6.2.2 (L4T) | Ubuntu 22.04 based | 5 | Only supported OS for Orin Nano Super. Includes CUDA 12.6, TensorRT 10.3, cuDNN 9.3 |

Selected: JetPack 6.2.2 — no alternative.

Primary Language

| Option | Fitness | Maturity | Perf on Jetson | Ecosystem | Score |
|---|---|---|---|---|---|
| Python 3.10+ | 5 | 5 | 4 | 5 | 4.8 |
| C++ | 5 | 5 | 5 | 3 | 4.5 |
| Rust | 3 | 3 | 5 | 2 | 3.3 |

Selected: Python 3.10+ as primary language.

  • cuVSLAM provides Python bindings (PyCuVSLAM v15.0.0)
  • TensorRT has Python API
  • FastAPI is Python-native
  • OpenCV has full Python+CUDA bindings
  • Performance-critical paths offloaded to CUDA via cuVSLAM/TensorRT — Python is glue code only
  • C++ for custom ESKF if NumPy proves too slow (unlikely for 16-state EKF at 100Hz)

Visual Odometry

| Option | Version | FPS on Orin Nano | Memory | License | Score |
|---|---|---|---|---|---|
| cuVSLAM (PyCuVSLAM) | v15.0.0 (Mar 2026) | 116fps @ 720p | ~200-300MB | Free (NVIDIA, closed-source) | 5 |
| XFeat frame-to-frame | TensorRT engine | ~30-50ms/frame | ~50MB | MIT | 3.5 |
| ORB-SLAM3 | v1.0 | ~30fps | ~300MB | GPLv3 | 2.5 |

Selected: PyCuVSLAM v15.0.0

  • 116fps on Orin Nano 8G at 720p (verified via Intermodalics benchmark)
  • Mono + IMU mode natively supported
  • Auto IMU fallback on tracking loss
  • Pre-built aarch64 wheel: `pip install -e bin/aarch64`
  • Loop closure built-in

Risk: Closed-source; nadir-only camera not explicitly tested. Fallback: XFeat frame-to-frame matching.

Satellite Image Matching (Benchmark-Driven Selection)

Day-one benchmark decides between two candidates:

| Option | Params | Accuracy (UAV-VisLoc) | Est. Time on Orin Nano | License | Score |
|---|---|---|---|---|---|
| LiteSAM (opt) | 6.31M | RMSE@30 = 17.86m | ~300-500ms @ 480px (estimated) | Open-source | 4 (if fast enough) |
| XFeat semi-dense | ~5M | Not benchmarked on UAV-VisLoc | ~50-100ms | MIT | 4 (if LiteSAM too slow) |

Decision rule:

  1. Export LiteSAM (opt) to TensorRT FP16 on Orin Nano Super
  2. Benchmark at 480px, 640px, 800px
  3. If ≤400ms at 480px → LiteSAM
  4. If >400ms → abandon LiteSAM; XFeat becomes primary

| Requirement | LiteSAM (opt) | XFeat semi-dense |
|---|---|---|
| PyTorch → ONNX → TensorRT export | Required | Required |
| TensorRT FP16 engine | 6.31M params, ~25MB engine | ~5M params, ~20MB engine |
| Input preprocessing | Resize to 480px, normalize | Resize to 640px, normalize |
| Matching pipeline | End-to-end (detect + match + refine) | Detect → KNN match → geometric verify |
| Cross-view robustness | Designed for satellite-aerial gap | General-purpose, less robust |
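
The go/no-go timing in the decision rule can be scripted with a small harness. The sketch below is illustrative (function and variable names are not from the codebase): it measures median latency of any callable and applies the 400ms budget.

```python
import statistics
import time


def median_latency_ms(fn, arg, warmup=3, iters=20):
    """Median wall-clock latency of fn(arg) in milliseconds.

    Warm-up runs are discarded so one-time costs (CUDA context creation,
    TensorRT engine deserialization) do not skew the decision.
    """
    for _ in range(warmup):
        fn(arg)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(arg)
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)


def choose_matcher(litesam_ms_at_480px, budget_ms=400.0):
    """Day-one go/no-go: keep LiteSAM only if it fits the frame budget."""
    return "litesam" if litesam_ms_at_480px <= budget_ms else "xfeat"
```

In practice `fn` would be the TensorRT inference call, run once per candidate resolution (480px, 640px, 800px).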

Sensor Fusion

| Option | Complexity | Accuracy | Compute @ 100Hz | Score |
|---|---|---|---|---|
| ESKF (custom) | Low | Good | <1ms/step | 5 |
| Hybrid ESKF/UKF | Medium | 49% better | ~2-3ms/step | 3.5 |
| GTSAM Factor Graph | High | Best | ~10-50ms/step | 2 |

Selected: Custom ESKF in Python (NumPy/SciPy)

  • 16-state vector, well within NumPy capability
  • FilterPy (v1.4.5, MIT) as reference/fallback, but custom implementation preferred for tighter control
  • If 100Hz IMU prediction step proves slow in Python: rewrite as Cython or C extension (~1 day effort)
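
As a sanity check on the NumPy claim: the covariance propagation that dominates the 100Hz prediction step is a handful of small matrix products. A minimal sketch, with 9 error states shown for brevity (the real filter carries 16) and placeholder noise tuning:

```python
import numpy as np


def make_F(dt):
    """Linearized error-state transition for [dp, dv, dtheta] (small-angle)."""
    F = np.eye(9)
    F[0:3, 3:6] = dt * np.eye(3)  # position error integrates velocity error
    return F


def eskf_predict(P, F, Q):
    """One prediction step of the error-state covariance: P <- F P F^T + Q."""
    return F @ P @ F.T + Q


dt = 0.01                 # 100Hz IMU step
P = np.eye(9) * 1e-2      # initial error covariance
Q = np.eye(9) * 1e-6      # process noise (placeholder tuning)
P = eskf_predict(P, make_F(dt), Q)
```

A 16x16 version of this product is microseconds in NumPy, which is why the <1ms/step estimate above is comfortable.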

Image Preprocessing

| Option | Tool | Time on Orin Nano | Notes | Score |
|---|---|---|---|---|
| OpenCV CUDA resize | cv2.cuda.resize | ~2-3ms (pre-allocated) | Must build OpenCV with CUDA from source; pre-allocate GPU mats to avoid allocation overhead | 4 |
| NVIDIA VPI resize | VPI 3.2 | ~1-2ms | Part of JetPack, potentially faster | 4 |
| CPU resize (OpenCV) | cv2.resize | ~5-10ms | No GPU needed, simpler | 3 |

Selected: OpenCV CUDA (pre-allocated GPU memory) or VPI 3.2 (whichever is faster in benchmark). Both available in JetPack 6.2.

  • Must build OpenCV from source with `CUDA_ARCH_BIN=8.7` for the Orin Nano's Ampere architecture
  • Alternative: VPI 3.2 is pre-installed in JetPack 6.2, no build step needed
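
The pre-allocation point generalizes: compute index maps and output buffers once at startup, then reuse them every frame. A plain-NumPy stand-in for illustration only (on the Jetson the same pattern applies to `cv2.cuda` GpuMats or VPI buffers; this nearest-neighbour resizer is not the production path):

```python
import numpy as np


class PreallocResizer:
    """Nearest-neighbour resize into a buffer allocated once at startup."""

    def __init__(self, src_hw, dst_hw):
        sh, sw = src_hw
        dh, dw = dst_hw
        # Index maps computed once, reused for every frame.
        self.rows = np.arange(dh) * sh // dh
        self.cols = np.arange(dw) * sw // dw
        # Output buffer allocated once -- no per-frame allocation.
        self.out = np.empty((dh, dw, 3), dtype=np.uint8)

    def __call__(self, frame):
        self.out[:] = frame[self.rows[:, None], self.cols]
        return self.out


resize = PreallocResizer((720, 1280), (480, 480))
```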

API & Streaming Framework

| Option | Version | Async Support | SSE Support | Score |
|---|---|---|---|---|
| FastAPI + sse-starlette | FastAPI 0.115+, sse-starlette 3.3.2 | Native async/await | EventSourceResponse with auto-disconnect | 5 |
| Flask + flask-sse | Flask 3.x | Limited | Requires Redis | 2 |
| Raw aiohttp | aiohttp 3.x | Full | Manual SSE implementation | 3 |

Selected: FastAPI + sse-starlette v3.3.2

  • sse-starlette: 108M downloads/month, BSD-3 license, production-stable
  • Auto-generated OpenAPI docs
  • Native async for non-blocking VO + satellite pipeline
  • Uvicorn as ASGI server
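
sse-starlette's `EventSourceResponse` handles the wire framing, but it is worth knowing what goes over the socket. A stdlib sketch of one event in the SSE format (the `frame` event name and payload fields are examples, not a fixed API):

```python
import json


def sse_event(data, event="frame", event_id=None):
    """Serialize one Server-Sent Event: optional id, event name, JSON payload.

    A blank line terminates each event; EventSource clients reconnect
    automatically and send the last seen id in Last-Event-ID.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"
```

With sse-starlette, a dict of the same `id`/`event`/`data` fields is yielded to `EventSourceResponse` instead of hand-formatting the text.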

Satellite Tile Storage & Indexing

| Option | Complexity | Lookup Speed | Score |
|---|---|---|---|
| GeoHash-indexed directory | Low | O(1) hash lookup | 5 |
| SQLite + spatial index | Medium | O(log n) | 4 |
| PostGIS | High | O(log n) | 2 (overkill) |

Selected: GeoHash-indexed directory structure

  • Pre-flight: download tiles, store as `{geohash}/{zoom}_{x}_{y}.jpg` + `{geohash}/{zoom}_{x}_{y}_resized.jpg`
  • Runtime: compute geohash from ESKF position → direct directory lookup
  • Metadata in JSON sidecar files
  • No database dependency on the Jetson during flight
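
The runtime lookup can be sketched end-to-end with the standard geohash encoding (the same scheme pygeohash computes) plus web-mercator tile math. The directory layout matches the scheme above; everything else is illustrative:

```python
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=6):
    """Standard geohash: interleave lon/lat bisection bits, base32-encode."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    code, bits, nbits, even = [], 0, 0, True
    while len(code) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        bit = int(val >= mid)
        rng[1 - bit] = mid  # keep the half-interval containing val
        bits, nbits = (bits << 1) | bit, nbits + 1
        if nbits == 5:
            code.append(_BASE32[bits])
            bits, nbits = 0, 0
        even = not even
    return "".join(code)


def tile_path(lat, lon, zoom):
    """Geohash bucket + web-mercator tile indices -> on-disk tile path."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return f"{geohash(lat, lon)}/{zoom}_{x}_{y}.jpg"
```

At runtime the ESKF position estimate feeds `tile_path` directly; no index or database is consulted.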

Satellite Tile Provider

| Provider | Max Zoom | GSD | Pricing | Eastern Ukraine Coverage | Score |
|---|---|---|---|---|---|
| Google Maps Tile API | 18-19 | ~0.3-0.5 m/px | 100K tiles free/month, then $0.48/1K | Partial (conflict-zone gaps) | 4 |
| Bing Maps | 18-19 | ~0.3-0.5 m/px | 125K free/year (basic) | Similar | 3.5 |
| Mapbox Satellite | 18-19 | ~0.5 m/px | 200K free/month | Similar | 3.5 |

Selected: Google Maps Tile API (per restrictions.md). 100K free tiles/month covers ~25km² at zoom 19. For larger operational areas, costs are manageable at $0.48/1K tiles.

Output Format

| Format | Standard | Tooling | Score |
|---|---|---|---|
| GeoJSON | RFC 7946 | Universal GIS support | 5 |
| CSV (lat, lon, confidence) | De facto | Simple, lightweight | 4 |

Selected: GeoJSON as primary, CSV as export option. Per acceptance criteria: WGS84 coordinates.
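
A frame-center estimate takes a few lines with the stdlib (the property names here are illustrative). One detail worth pinning down early: GeoJSON mandates [longitude, latitude] order, the reverse of the usual lat/lon convention.

```python
import json


def frame_feature(lat, lon, confidence, frame_id):
    """One frame-center GPS estimate as an RFC 7946 Point Feature (WGS84)."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [lon, lat],  # GeoJSON order: longitude first
        },
        "properties": {"frame_id": frame_id, "confidence": confidence},
    }


doc = json.dumps(frame_feature(48.5, 37.9, 0.82, 1204))
```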

Tech Stack Summary

```
┌───────────────────────────────────────────────────────┐
│  HARDWARE: Jetson Orin Nano Super 8GB                 │
│  OS: JetPack 6.2.2 (L4T / Ubuntu 22.04)               │
│  CUDA 12.6.10 / TensorRT 10.3.0 / cuDNN 9.3           │
├───────────────────────────────────────────────────────┤
│  LANGUAGE: Python 3.10+                               │
│  FRAMEWORK: FastAPI + sse-starlette 3.3.2             │
│  SERVER: Uvicorn (ASGI)                               │
├───────────────────────────────────────────────────────┤
│  VISUAL ODOMETRY: PyCuVSLAM v15.0.0                   │
│  SATELLITE MATCH: LiteSAM (opt) or XFeat (benchmark)  │
│  SENSOR FUSION: Custom ESKF (NumPy/SciPy)             │
│  PREPROCESSING: OpenCV CUDA or VPI 3.2                │
│  INFERENCE: TensorRT 10.3.0 (FP16)                    │
├───────────────────────────────────────────────────────┤
│  TILE PROVIDER: Google Maps Tile API                  │
│  TILE STORAGE: GeoHash-indexed directory              │
│  OUTPUT: GeoJSON (WGS84) via SSE stream               │
└───────────────────────────────────────────────────────┘
```

Dependency List

Python Packages (pip)

| Package | Version | Purpose |
|---|---|---|
| pycuvslam | v15.0.0 (aarch64 wheel) | Visual odometry |
| fastapi | >=0.115 | REST API framework |
| sse-starlette | >=3.3.2 | SSE streaming |
| uvicorn | >=0.30 | ASGI server |
| numpy | >=1.26 | ESKF math, array ops |
| scipy | >=1.12 | Rotation matrices, spatial transforms |
| opencv-python (CUDA build) | >=4.8 | Image preprocessing (must build from source with CUDA) |
| torch (aarch64) | >=2.3 (JetPack-compatible) | LiteSAM model loading (if selected) |
| tensorrt | 10.3.0 (JetPack bundled) | Inference engine |
| pycuda | >=2024.1 | CUDA stream management |
| geojson | >=3.1 | GeoJSON output formatting |
| pygeohash | >=1.2 | GeoHash tile indexing |

System Dependencies (JetPack 6.2.2)

| Component | Version | Notes |
|---|---|---|
| CUDA Toolkit | 12.6.10 | Pre-installed |
| TensorRT | 10.3.0 | Pre-installed |
| cuDNN | 9.3 | Pre-installed |
| VPI | 3.2 | Pre-installed; alternative to OpenCV CUDA for resize |
| cuVSLAM runtime | — | Bundled with the PyCuVSLAM wheel |

Offline Preprocessing Tools (developer machine, not Jetson)

| Tool | Purpose |
|---|---|
| Python 3.10+ | Tile download script |
| Google Maps Tile API key | Satellite tile access |
| torch + LiteSAM weights | Feature pre-extraction (if LiteSAM selected) |
| trtexec (TensorRT) | Model export to TensorRT engine |

Risk Assessment

| Technology | Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|---|
| cuVSLAM | Closed-source; nadir camera untested | Medium | High | XFeat frame-to-frame as open-source fallback |
| LiteSAM | May exceed 400ms on Orin Nano Super | High | High | Abandon for XFeat; day-one benchmark is go/no-go |
| OpenCV CUDA build | Build complexity on Jetson, CUDA arch compatibility | Medium | Low | VPI 3.2 as drop-in alternative (pre-installed) |
| Google Maps Tile API | Conflict-zone coverage gaps, EEA restrictions | Medium | Medium | Test tile availability for the operational area pre-flight; alternative providers (Bing, Mapbox) |
| Custom ESKF | Implementation bugs, tuning effort | Low | Medium | FilterPy v1.4.5 as reference; well-understood algorithm |
| Python GIL | Concurrent VO + satellite matching contention | Low | Low | CUDA operations release the GIL; use asyncio + threading for I/O |
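
The GIL mitigation can be made concrete: blocking inference runs in worker threads via `asyncio.to_thread`, keeping the event loop free to serve SSE clients, and since the heavy CUDA calls release the GIL the threads overlap for real. All names below are placeholders:

```python
import asyncio


def blocking_inference(frame_id):
    """Placeholder for a GIL-releasing CUDA call (VO or satellite match)."""
    return {"frame_id": frame_id, "lat": 48.5, "lon": 37.9}


async def process_frame(frame_id):
    # Runs in a worker thread; the event loop keeps serving SSE clients.
    return await asyncio.to_thread(blocking_inference, frame_id)


async def main():
    # Several frames in flight concurrently; gather preserves order.
    return await asyncio.gather(*(process_frame(i) for i in range(3)))


results = asyncio.run(main())
```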

Learning Requirements

| Technology | Team Expertise Needed | Ramp-up Time |
|---|---|---|
| PyCuVSLAM | SLAM concepts, Python API, camera calibration | 2-3 days |
| TensorRT model export | ONNX export, trtexec, FP16 optimization | 2-3 days |
| LiteSAM architecture | Transformer-based matching (if selected) | 1-2 days |
| XFeat | Feature detection/matching concepts | 1 day |
| ESKF | Kalman filtering, quaternion math, multi-rate fusion | 3-5 days |
| FastAPI + SSE | Async Python, ASGI, SSE protocol | 1 day |
| GeoHash spatial indexing | Geospatial concepts | 0.5 days |
| Jetson deployment | JetPack, power modes, thermal management | 2-3 days |

Development Environment

| Environment | Purpose | Setup |
|---|---|---|
| Developer machine (x86_64, GPU) | Development, unit testing, model export | Docker with CUDA + TensorRT |
| Jetson Orin Nano Super | Integration testing, benchmarking, deployment | JetPack 6.2.2 flashed, SSH access |

Code should be developed and unit-tested on x86_64, then deployed to the Jetson for integration and performance testing. cuVSLAM and TensorRT engines are aarch64-only; mock them in x86_64 tests.
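
Mocking the aarch64-only wheels on x86_64 can be done by registering stub modules before the code under test imports them. The `pycuvslam` module name matches the dependency list; the `Tracker` class and its `track` method are assumptions for illustration, not the actual PyCuVSLAM API:

```python
import sys
import types
from unittest import mock

# Register a stub before anything imports the real (aarch64-only) wheel.
fake = types.ModuleType("pycuvslam")
fake.Tracker = mock.MagicMock(name="pycuvslam.Tracker")
sys.modules["pycuvslam"] = fake

import pycuvslam  # resolves to the stub on x86_64

tracker = pycuvslam.Tracker()
tracker.track.return_value = {"pose": [0.0] * 7}  # canned VO output
```

The same pattern covers `tensorrt` and `pycuda` in unit tests, so only the Jetson integration run exercises the real libraries.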