mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 22:01:13 +00:00
Merge branch 'try02' into dev
+90
-582
@@ -1,622 +1,130 @@
|
||||
# Solution Draft
|
||||
|
||||
## Assessment Findings
|
||||
|
||||
|
||||
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|
||||
| ------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| ESKF described as "16-state vector, ~10MB" with no mathematical specification | **Functional**: No state vector, no process model (F,Q), no measurement models (H for VO, H for satellite), no noise parameters, no scale observability analysis. Impossible to implement or validate accuracy claims. | **Define complete ESKF specification**: 15-state error vector, IMU-driven prediction, dual measurement models (VO relative pose, satellite absolute position), initial Q/R values, scale constraint via altitude + satellite corrections. |
|
||||
| GPS_INPUT at 5-10Hz via pymavlink — no field mapping | **Functional**: GPS_INPUT requires 15+ fields (velocity, accuracy, hdop, fix_type, GPS time). No specification of how ESKF state maps to these fields. ArduPilot requires minimum 5Hz. | **Define GPS_INPUT population spec**: velocity from ESKF, accuracy from covariance, fix_type from confidence tier, GPS time from system clock conversion, synthesized hdop/vdop. |
|
||||
| Confidence scoring "unchanged from draft03" — not in draft05 | **Functional**: Draft05 is supposed to be self-contained. Confidence scoring determines GPS_INPUT accuracy fields and fix_type — directly affects how ArduPilot EKF weights the position data. | **Define confidence scoring inline**: 3 tiers (satellite-anchored, VO-tracked, IMU-only) mapping to fix_type + accuracy values. |
|
||||
| Coordinate transformations not defined | **Functional**: No pixel→camera→body→NED→WGS84 chain. Camera is not autostabilized, so body attitude matters. Satellite match → WGS84 conversion undefined. Object localization impossible without these transforms. | **Define coordinate transformation chain**: camera intrinsics K, camera-to-body extrinsic T_cam_body, body-to-NED from ESKF attitude, NED origin at mission start point. |
|
||||
| Disconnected route segments — "satellite re-localization" mentioned but no algorithm | **Functional**: AC requires handling as "core to the system." Multiple disconnected segments expected. No tracking-loss detection, no re-localization trigger, no ESKF re-initialization, no cuVSLAM restart procedure. | **Define re-localization pipeline**: detect cuVSLAM tracking loss → IMU-only ESKF prediction → trigger satellite match on every frame → on match success: ESKF position reset + cuVSLAM restart → on 3 consecutive failures: operator re-localization request. |
|
||||
| No startup handoff from GPS to GPS-denied | **Functional**: System reads GLOBAL_POSITION_INT at startup but no protocol for when GPS is lost/spoofed vs system start. No validation of initial position. | **Define handoff protocol**: system runs continuously, FC receives both real GPS and GPS_INPUT. GPS-denied system always provides its estimate; FC selects best source. Initial position validated against first satellite match. |
|
||||
| No mid-flight reboot recovery | **Functional**: AC requires: "re-initialize from flight controller's current IMU-extrapolated position." No procedure defined. Recovery time estimation missing. | **Define reboot recovery sequence**: read FC position → init ESKF with high uncertainty → load TRT engines → start cuVSLAM → immediate satellite match. Estimated recovery: ~35-70s. Document as known limitation. |
|
||||
| 3-consecutive-failure re-localization request undefined | **Functional**: AC requires ground station re-localization request. No message format, no operator workflow, no system behavior while waiting. | **Define re-localization protocol**: detect 3 failures → send custom MAVLink message with last known position + uncertainty → operator provides approximate coordinates → system uses as ESKF measurement with high covariance. |
|
||||
| Object localization — "trigonometric calculation" with no details | **Functional**: No math, no API, no Viewpro gimbal integration, no accuracy propagation. Other onboard systems cannot use this component as specified. | **Define object localization**: pixel→ray using Viewpro intrinsics + gimbal angles → body frame → NED → ray-ground intersection → WGS84. FastAPI endpoint: POST /objects/locate. Accuracy propagated from UAV position + gimbal uncertainty. |
|
||||
| Satellite matching — GSD normalization and tile selection unspecified | **Functional**: Camera GSD ~15.9 cm/px at 600m vs satellite ~0.3 m/px at zoom 19. The "pre-resize" step is mentioned but not specified. Tile selection radius based on ESKF uncertainty not defined. | **Define GSD handling**: downsample camera frame to match satellite GSD. Define tile selection: ESKF position ± 3σ_horizontal → select tiles covering that area. Assemble tile mosaic for matching. |
|
||||
| Satellite tile storage requirements not calculated | **Functional**: "±2km" preload mentioned but no storage estimate. At zoom 19: a 200km path with ±2km buffer requires ~130K tiles (~2.5GB). | **Calculate tile storage**: specify zoom level (18 preferred: 0.6m/px, 4× fewer tiles), estimate storage per mission profile, define maximum mission area by storage limit. |
|
||||
| FastAPI endpoints not in solution draft | **Functional**: Endpoints only in security_analysis.md. No request/response schemas. No SSE event format. No object localization endpoint. | **Consolidate API spec in solution**: define all endpoints, SSE event schema, object localization endpoint. Reference security_analysis.md for auth. |
|
||||
| cuVSLAM configuration missing (calibration, IMU params, mode) | **Functional**: No camera calibration procedure, no IMU noise parameters, no T_imu_rig extrinsic, no mode selection (Mono vs Inertial). | **Define cuVSLAM configuration**: use Inertial mode, specify required calibration data (camera intrinsics, distortion, IMU noise params from datasheet, T_imu_rig from physical measurement), define calibration procedure. |
|
||||
| tech_stack.md inconsistent with draft05 | **Functional**: tech_stack.md says 3fps (should be 0.7fps), LiteSAM at 480px (should be 1280px), missing EfficientLoFTR. | **Flag for update**: tech_stack.md must be synchronized with draft05 corrections. Not addressed in this draft — separate task. |
|
||||
|
||||
|
||||
## Overall Maturity Assessment
|
||||
|
||||
|
||||
| Category | Maturity (1-5) | Assessment |
| ----------------------------- | -------------- | ---------- |
| Hardware & Platform Selection | 3.5 | UAV airframe, cameras, Jetson, batteries: well-researched with specs, weight budget, endurance calculations. Ready for procurement. |
| Core Algorithm Selection | 3.0 | cuVSLAM, LiteSAM/XFeat, ESKF: components selected with comparison tables, fallback chains, decision trees. Day-one benchmarks defined. |
| AI Inference Runtime | 3.5 | TRT Engine migration thoroughly analyzed. Conversion workflows, memory savings, performance estimates. Code wrapper provided. |
| Sensor Fusion (ESKF) | 1.5 | Mentioned but not specified. No implementable detail. Blocker for coding. |
| System Integration | 1.5 | GPS_INPUT, coordinate transforms, inter-component data flow: all under-specified. |
| Edge Cases & Resilience | 1.0 | Disconnected segments, reboot recovery, re-localization: acknowledged but no algorithms. |
| Operational Readiness | 0.5 | No pre-flight procedures, no in-flight monitoring, no failure response. |
| Security | 3.0 | Comprehensive threat model, OP-TEE analysis, LUKS, secure boot. Well-researched. |
| **Overall TRL** | **~2.5** | **Technology concept formulated + some component validation. Not implementation-ready.** |
|
||||
|
||||
|
||||
The solution is at approximately **TRL 3** (proof of concept) for hardware/algorithm selection and **TRL 1-2** (basic concept) for system integration, ESKF, and operational procedures.
|
||||
# Solution
|
||||
|
||||
## Product Solution Description
|
||||
|
||||
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses native TensorRT Engine files. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.
|
||||
Build an onboard GPS-denied localization service that runs on the Jetson companion computer, uses the fixed downward navigation camera and flight-controller inertial telemetry, and emits ArduPilot `GPS_INPUT` estimates with calibrated covariance and source labels.
|
||||
|
||||
Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM in Inertial mode) from ADTI 20L V1 at 0.7 fps sustained, (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16) using keyframes from the same ADTI image stream, and (3) IMU data from the flight controller via ESKF. Viewpro A40 Pro is reserved for AI object detection only.
|
||||
|
||||
The ESKF is the central state estimator with 15-state error vector. It fuses:
|
||||
|
||||
- **IMU prediction** at 5-10Hz (high-frequency pose propagation)
|
||||
- **cuVSLAM VO measurement** at 0.7Hz (relative pose correction)
|
||||
- **Satellite matching measurement** at ~0.07-0.14Hz (absolute position correction)
|
||||
|
||||
GPS_INPUT messages carry position, velocity, and accuracy derived from the ESKF state and covariance.
|
||||
|
||||
**Hard constraint**: ADTI 20L V1 shoots at 0.7 fps sustained (1430ms interval). The full VO+ESKF pipeline must finish within 400ms per frame. Satellite matching runs asynchronously on keyframes (every 5-10 camera frames). GPS_INPUT is emitted at 5-10Hz (ESKF IMU prediction fills the gaps between camera frames).
|
||||
The production architecture is a trigger-based hybrid estimator:
|
||||
|
||||
```text
Nav camera + FC telemetry
      |
      v
Image quality + calibration + orthorectification
      |
      +--> Hot path: OpenCV geometry + BASALT VIO --> safety/anchor wrapper --> GPS_INPUT + QGC + FDR
      |
      +--> Reference path: OpenVINS replay benchmark for VIO drift/covariance tests; Kimera backup replay
      |
      +--> Trigger path: DINOv2-VLAD query --> CPU FAISS top-K --> ALIKED/DISK+LightGlue --> OpenCV RANSAC --> safety/anchor wrapper
      |
      +--> Tile path: new COG tile + quality/provenance sidecar --> manifest update --> post-flight Satellite Service sync
```
|
||||
```
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ OFFLINE (Before Flight) │
|
||||
│ 1. Satellite Tiles → Download & Validate → Pre-resize → Store │
|
||||
│ (Google Maps) (≥0.5m/px, <2yr) (matcher res) (GeoHash)│
|
||||
│ 2. TRT Engine Build (one-time per model version): │
|
||||
│ PyTorch model → reparameterize → ONNX export → trtexec --fp16 │
|
||||
│ Output: litesam.engine, xfeat.engine │
|
||||
│ 3. Camera + IMU calibration (one-time per hardware unit) │
|
||||
│ 4. Copy tiles + engines + calibration to Jetson storage │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ ONLINE (During Flight) │
|
||||
│ │
|
||||
│ STARTUP: │
|
||||
│ 1. pymavlink → read GLOBAL_POSITION_INT → init ESKF state │
|
||||
│ 2. Load TRT engines + allocate GPU buffers │
|
||||
│ 3. Load camera calibration + IMU calibration │
|
||||
│ 4. Start cuVSLAM (Inertial mode) with ADTI 20L V1 │
|
||||
│ 5. Preload satellite tiles ±2km into RAM │
|
||||
│ 6. First satellite match → validate initial position │
|
||||
│ 7. Begin GPS_INPUT output loop at 5-10Hz │
|
||||
│ │
|
||||
│ EVERY CAMERA FRAME (0.7fps from ADTI 20L V1): │
|
||||
│ ┌──────────────────────────────────────┐ │
|
||||
│ │ ADTI 20L V1 → Downsample (CUDA) │ │
|
||||
│ │ → cuVSLAM VO+IMU (~9ms) │ ← CUDA Stream A │
|
||||
│ │ → ESKF VO measurement │ │
|
||||
│ └──────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ 5-10Hz CONTINUOUS (IMU-driven between camera frames): │
|
||||
│ ┌──────────────────────────────────────┐ │
|
||||
│ │ IMU data → ESKF prediction │ │
|
||||
│ │ ESKF state → GPS_INPUT fields │ │
|
||||
│ │ GPS_INPUT → Flight Controller (UART) │ │
|
||||
│ └──────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ KEYFRAMES (every 5-10 camera frames, async): │
|
||||
│ ┌──────────────────────────────────────┐ │
|
||||
│ │ Camera frame → GSD downsample │ │
|
||||
│ │ Select satellite tile (ESKF pos±3σ) │ │
|
||||
│ │ TRT inference (Stream B): LiteSAM/ │ │
|
||||
│ │ XFeat → correspondences │ │
|
||||
│ │ RANSAC → homography → WGS84 position │ │
|
||||
│ │ ESKF satellite measurement update │──→ Position correction │
|
||||
│ └──────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ TRACKING LOSS (cuVSLAM fails — sharp turn / featureless): │
|
||||
│ ┌──────────────────────────────────────┐ │
|
||||
│ │ ESKF → IMU-only prediction (growing │ │
|
||||
│ │ uncertainty) │ │
|
||||
│ │ Satellite match on EVERY frame │ │
|
||||
│ │ On match success → ESKF reset + │ │
|
||||
│ │ cuVSLAM restart │ │
|
||||
│ │ 3 consecutive failures → operator │ │
|
||||
│ │ re-localization request │ │
|
||||
│ └──────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ TELEMETRY (1Hz): │
|
||||
│ ┌──────────────────────────────────────┐ │
|
||||
│ │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station │
|
||||
│ └──────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
Heavy local retrieval and local matching are not steady-state per-frame dependencies. They run on cold start, VO failure, sharp turns, disconnected segments, covariance growth, stale-anchor age, or operator-assisted relocalization, using only preloaded cache/index data during flight.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Component: ESKF Sensor Fusion (NEW — previously unspecified)
|
||||
### Camera Ingest, Calibration, And Geometry
|
||||
|
||||
**Error-State Kalman Filter** fusing IMU, visual odometry, and satellite matching.
|
||||
| Solution | Tools | Pinned Mode/Config | Fit |
|
||||
|----------|-------|--------------------|-----|
|
||||
| OpenCV geometry utility layer | OpenCV 4.x | Calibration, undistortion, homography, RANSAC/USAC, MRE measurement | Selected. Mature, permissive, exact utility fit; not a full estimator. |
|
||||
|
||||
**Nominal state vector** (propagated by IMU):
|
||||
### VO / IMU Propagation And Estimator
|
||||
|
||||
| Solution | Tools | Pinned Mode/Config | Fit |
|
||||
|----------|-------|--------------------|-----|
|
||||
| BASALT + safety/anchor wrapper | BASALT, OpenCV, custom wrapper | BASALT consumes calibrated nav-camera frames + FC IMU; wrapper fuses satellite anchors, calibrates uncertainty, emits source labels and `GPS_INPUT` fields | Selected. Best production VIO candidate found: permissive license, strong benchmark evidence, avoids custom VIO from scratch. |
|
||||
| OpenVINS | OpenVINS | Monocular camera + IMU EKF/MSCKF reference runs with covariance extraction | Reference only. Strong VIO and covariance baseline, but GPLv3 and generic VIO ownership make it unsuitable as default shipped dependency. |
|
||||
| Kimera-VIO | Kimera-VIO | Mono/stereo camera + IMU VIO/SLAM backup replay | Backup candidate. BSD-friendly but heavier/stereo-oriented; mono-inertial path has documented caveats. |
|
||||
| ORB-SLAM3 | ORB-SLAM3 | Monocular-inertial SLAM | Rejected for production. GPLv3 and heavier SLAM/map lifecycle. |
|
||||
|
||||
| State | Symbol | Size | Description |
| ---------- | ------ | ---- | ------------------------------------------------ |
| Position | p | 3 | NED position relative to mission origin (meters) |
| Velocity | v | 3 | NED velocity (m/s) |
| Attitude | q | 4 | Unit quaternion (body-to-NED rotation) |
| Accel bias | b_a | 3 | Accelerometer bias (m/s²) |
| Gyro bias | b_g | 3 | Gyroscope bias (rad/s) |
|
||||
BASALT does not replace the project-owned safety logic. The wrapper remains responsible for satellite anchor acceptance, confidence calibration, source labels, blackout/spoofing modes, tile-write eligibility, and MAVLink `GPS_INPUT` semantics.
|
||||
|
||||
### Satellite Service And Anchor Verification
|
||||
|
||||
**Error-state vector** (estimated by ESKF): δx = [δp, δv, δθ, δb_a, δb_g]ᵀ ∈ ℝ¹⁵
|
||||
where δθ ∈ so(3) is the 3D rotation error.
|
||||
| Solution | Tools | Pinned Mode/Config | Fit |
|
||||
|----------|-------|--------------------|-----|
|
||||
| DINOv2-VLAD + CPU FAISS + ALIKED/DISK+LightGlue | DINOv2/AnyLoc-style descriptors, FAISS CPU, LightGlue, OpenCV RANSAC | Offline VPR chunk descriptors; conditional query descriptor; CPU FAISS top-K; learned local match on bounded candidates; TensorRT only after fidelity check | Selected with runtime/fidelity gates. |
|
||||
| SuperPoint+LightGlue | SuperPoint, LightGlue | Same matcher with SuperPoint features | License-gated benchmark/fallback only. |
|
||||
| Classical SIFT/ORB | OpenCV | Handcrafted features + homography | Regression/fallback baseline. |
|
||||
|
||||
**Prediction step** (IMU at 5-10Hz from flight controller):
|
||||
The Satellite Service component imports mission cache/index packages before flight, uploads generated-tile packages after landing, and serves local VPR queries during flight. The VPR index is built over ground-footprint-sized chunks with overlap and a multi-scale descriptor set. VPR is invoked only on relocalization triggers or covariance/anchor-age growth; normal flight uses BASALT VIO plus wrapper propagation. No satellite-provider or Satellite Service network calls are allowed mid-flight.
|
||||
|
||||
- Input: accelerometer a_m, gyroscope ω_m, dt
|
||||
- Propagate nominal state: p += v·dt, v += (R(q)·(a_m - b_a) + g)·dt with g = [0, 0, 9.81]ᵀ m/s² in NED, q ← q ⊗ Exp((ω_m - b_g)·dt)
|
||||
- Propagate error covariance: P = F·P·Fᵀ + Q
|
||||
- F is the 15×15 error-state transition matrix (standard ESKF formulation)
|
||||
- Q: process noise diagonal, initial values from IMU datasheet noise densities
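The prediction step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the project's implementation: quaternions are stored as `[w, x, y, z]` (Hamilton convention, body-to-NED), gravity is taken as +9.81 m/s² on the Down axis and added to the rotated specific force, F is discretized to first order, and the function names (`eskf_predict`, `quat_exp`, and so on) are hypothetical.

```python
import numpy as np

GRAVITY_NED = np.array([0.0, 0.0, 9.81])  # assumption: gravity along +Down, m/s^2


def skew(v):
    """Cross-product matrix [v]x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def quat_mul(a, b):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])


def quat_exp(rot_vec):
    """Quaternion for a rotation vector (axis * angle, radians)."""
    angle = np.linalg.norm(rot_vec)
    if angle < 1e-12:  # small-angle fallback; caller re-normalizes
        return np.concatenate(([1.0], 0.5 * rot_vec))
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * rot_vec / angle))


def quat_to_rot(q):
    """Rotation matrix R(q) mapping body vectors into NED."""
    w, x, y, z = q
    return np.array([[1 - 2*(y*y + z*z), 2*(x*y - w*z), 2*(x*z + w*y)],
                     [2*(x*y + w*z), 1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                     [2*(x*z - w*y), 2*(y*z + w*x), 1 - 2*(x*x + y*y)]])


def eskf_predict(p, v, q, b_a, b_g, P, a_m, w_m, dt, Q):
    """One IMU step: propagate nominal state and the 15x15 error covariance."""
    R = quat_to_rot(q)
    acc = a_m - b_a          # bias-corrected specific force (body frame)
    omega = w_m - b_g        # bias-corrected angular rate (body frame)
    p_new = p + v * dt
    v_new = v + (R @ acc + GRAVITY_NED) * dt
    q_new = quat_mul(q, quat_exp(omega * dt))
    q_new = q_new / np.linalg.norm(q_new)
    # Error-state order: [dp, dv, dtheta, db_a, db_g]
    F = np.eye(15)
    F[0:3, 3:6] = np.eye(3) * dt
    F[3:6, 6:9] = -R @ skew(acc) * dt
    F[3:6, 9:12] = -R * dt
    F[6:9, 6:9] = np.eye(3) - skew(omega) * dt
    F[6:9, 12:15] = -np.eye(3) * dt
    P_new = F @ P @ F.T + Q
    return p_new, v_new, q_new, P_new
```

With a level, non-accelerating vehicle the accelerometer reads roughly `[0, 0, -9.81]` in the body frame, so the velocity should stay constant while the position covariance grows.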
|
||||
### Tile Manager
|
||||
|
||||
**VO measurement update** (0.7Hz from cuVSLAM):
|
||||
| Solution | Tools | Pinned Mode/Config | Fit |
|
||||
|----------|-------|--------------------|-----|
|
||||
| COG tile objects + PostgreSQL/PostGIS manifest + signed JSON sidecars | GDAL COG, PostgreSQL/PostGIS, signed JSON sidecars, FAISS index files | Service tiles and generated tiles are write-new COG objects; active version selected by PostGIS-backed manifest | Selected. Fits geospatial raster access, provenance, spatial/freshness queries, and write-new tile lifecycle. |
|
||||
| PMTiles | PMTiles | Read-only archive snapshot | Rejected for live cache because in-flight tile generation needs mutable write-new objects. |
|
||||
|
||||
- cuVSLAM outputs relative pose: ΔR, Δt (camera frame)
|
||||
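- Transform to NED: Δp_ned = R_body_ned · R_cam_body · Δt, where Δt is the cuVSLAM translation increment (not the time step dt) and R_cam_body is the rotation part of T_cam_body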
|
||||
- Innovation: z = Δp_ned_measured - Δp_ned_predicted
|
||||
- Observation matrix H_vo maps error state to relative position change
|
||||
- R_vo: measurement noise, initial ~0.1-0.5m (from cuVSLAM precision at 600m+ altitude)
|
||||
- Kalman update: K = P·Hᵀ·(H·P·Hᵀ + R)⁻¹, δx = K·z, P = (I - K·H)·P
|
||||
Service-source tiles and generated tiles carry CRS, capture date, source, m/px, freshness, quality score, sidecar hashes, and descriptor references. The Tile Manager also orthorectifies eligible nadir frames into generated COG tiles. Stale tiles are rejected or down-confidence weighted.
|
||||
|
||||
**Satellite measurement update** (0.07-0.14Hz, async):
|
||||
### MAVLink Integration
|
||||
|
||||
- Satellite matching outputs absolute position: lat_sat, lon_sat in WGS84
|
||||
- Convert to NED relative to mission origin
|
||||
- Innovation: z = p_satellite - p_predicted
|
||||
- H_sat = [I₃, 0, 0, 0, 0] (directly observes position)
|
||||
- R_sat: measurement noise, from matching confidence (~5-20m based on RANSAC inlier ratio)
|
||||
- Provides absolute position correction — bounds drift accumulation
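A minimal NumPy sketch of the satellite update, assuming the error-state order [δp, δv, δθ, δb_a, δb_g] and an isotropic R_sat built from a single standard deviation. The function name `satellite_update` is hypothetical, and the covariance uses the Joseph form, which is algebraically equal to P = (I - K·H)·P for the optimal gain but keeps P symmetric under rounding.

```python
import numpy as np


def satellite_update(P, p_pred, p_sat, sigma_sat_m):
    """Absolute-position update; returns the error-state correction and new P."""
    H = np.zeros((3, 15))
    H[:, 0:3] = np.eye(3)                 # H_sat = [I3, 0, 0, 0, 0]
    R = np.eye(3) * sigma_sat_m ** 2      # assumed isotropic measurement noise
    z = p_sat - p_pred                    # innovation in NED metres
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    dx = K @ z                            # 15-element error-state correction
    I_KH = np.eye(15) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T
    return dx, P_new
```

The returned `dx` still has to be injected into the nominal state (position, velocity, attitude, biases) by the caller, after which the error state is reset to zero.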
|
||||
| Solution | Tools | Pinned Mode/Config | Fit |
|
||||
|----------|-------|--------------------|-----|
|
||||
| MAVSDK telemetry + pymavlink `GPS_INPUT` | MAVSDK, pymavlink | MAVSDK subscriptions; pymavlink emits `GPS_INPUT`; v1 emits GPS_INPUT only; Plane SITL validates `GPS1_TYPE=14`, velocity source params, ignore flags, fix types, accuracy fields | Selected. Exact output control with good telemetry ergonomics. |
|
||||
|
||||
**Scale observability**:
|
||||
The system emits per-frame estimates locally and downsampled status to QGroundControl. `GPS_INPUT.horiz_accuracy` must not under-report the calibrated 95% covariance semi-major axis.
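One way to compute the 95% semi-major axis from the horizontal 2×2 covariance block is sketched below. It assumes a Gaussian position error, so the 95% ellipse scales the largest eigenvalue by the chi-square quantile for 2 degrees of freedom (5.991); the function name is hypothetical.

```python
import math

CHI2_95_2DOF = 5.991  # chi-square 95% quantile for 2 degrees of freedom


def horiz_accuracy_95(p_nn, p_ne, p_ee):
    """95% error-ellipse semi-major axis (m) from the N/E covariance entries (m^2)."""
    mean = 0.5 * (p_nn + p_ee)
    half_diff = 0.5 * (p_nn - p_ee)
    lam_max = mean + math.hypot(half_diff, p_ne)  # largest eigenvalue of the 2x2 block
    return math.sqrt(CHI2_95_2DOF * lam_max)
```

For example, variances of 100 m² north and 25 m² east with no correlation give a value of about 24.5 m, larger than the 10 m one-sigma figure on the north axis.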
|
||||
|
||||
- Monocular cuVSLAM has scale ambiguity during constant-velocity flight
|
||||
- Scale is constrained by: (1) satellite matching absolute positions (primary), (2) known flight altitude from barometer + predefined mission altitude, (3) IMU accelerometer during maneuvers
|
||||
- During long straight segments without satellite correction, scale drift is possible. Satellite corrections every ~7-14s re-anchor scale.
|
||||
### Security And Safety Controls
|
||||
|
||||
**Tuning approach**: Start with IMU datasheet noise values for Q. Start with conservative R values (high measurement noise). Tune on flight test data by comparing ESKF output to known GPS ground truth.
|
||||
| Solution | Tools | Pinned Mode/Config | Fit |
|
||||
|----------|-------|--------------------|-----|
|
||||
| Consistency-gated anchor acceptance | Safety/anchor wrapper, cache manifest verification | Anchor accepted only if freshness, provenance, RANSAC, covariance, Mahalanobis, and temporal consistency pass | Selected. Prevents confident false fixes. |
|
||||
| FDR audit trail | PostgreSQL event index + CBOR payload segments + hashes | Logs estimates, inputs, emitted GPS_INPUT, health, tile writes, anchor decisions | Selected. Supports incident analysis, indexed queries, and cache-poisoning audits. |
|
||||
|
||||
## Runtime Modes
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|
||||
| -------------------------- | --------------- | ------------------------------------------------------------- | -------------------------------------- | ------------- | ----------- |
|
||||
| Custom ESKF (Python/NumPy) | NumPy, SciPy | Full control, minimal dependencies, well-understood algorithm | Implementation effort, tuning required | <1ms per step | ✅ Selected |
|
||||
| FilterPy ESKF | FilterPy v1.4.5 | Reference implementation, less code | Less flexible for multi-rate fusion | <1ms per step | ⚠️ Fallback |
|
||||
|
||||
|
||||
### Component: Coordinate System & Transformations (NEW — previously undefined)
|
||||
|
||||
**Reference frames**:
|
||||
|
||||
- **Camera frame (C)**: origin at camera optical center, Z forward, X right, Y down (OpenCV convention)
|
||||
- **Body frame (B)**: origin at UAV CG, X forward (nose), Y right (starboard), Z down
|
||||
- **NED frame (N)**: North-East-Down, origin at mission start point
|
||||
- **WGS84**: latitude, longitude, altitude (output format)
|
||||
|
||||
**Transformation chain**:
|
||||
|
||||
1. **Pixel → Camera ray**: p_cam = K⁻¹ · [u, v, 1]ᵀ where K = camera intrinsic matrix (ADTI 20L V1: fx, fy from 16mm lens + APS-C sensor)
|
||||
2. **Camera → Body**: p_body = T_cam_body · p_cam where T_cam_body is the fixed mounting rotation (camera points nadir: 90° pitch rotation from body X-forward to camera Z-down)
|
||||
3. **Body → NED**: p_ned = R_body_ned(q) · p_body where q is the ESKF quaternion attitude estimate
|
||||
4. **NED → WGS84**: lat = lat_origin + p_north / R_earth, lon = lon_origin + p_east / (R_earth · cos(lat_origin)), where the added terms are in radians and (lat_origin, lon_origin) is the mission start GPS position
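Step 4 of the chain can be sketched as below. This is an illustration only: it assumes a spherical earth with the WGS84 semi-major axis as radius, degrees at the interface, and a hypothetical function name.

```python
import math

R_EARTH_M = 6378137.0  # assumption: WGS84 semi-major axis used as a spherical radius


def ned_to_wgs84(p_north, p_east, p_down, lat0_deg, lon0_deg, alt0_m):
    """Local-tangent-plane NED offset (m) to latitude/longitude (deg) and altitude (m)."""
    lat0 = math.radians(lat0_deg)
    dlat = p_north / R_EARTH_M                       # radians
    dlon = p_east / (R_EARTH_M * math.cos(lat0))     # radians
    return (lat0_deg + math.degrees(dlat),
            lon0_deg + math.degrees(dlon),
            alt0_m - p_down)
```

The small-offset approximation degrades as the distance from the origin grows, so long missions may need a geodetic library or a periodically re-centred origin.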
|
||||
|
||||
**Camera intrinsic matrix K** (ADTI 20L V1 + 16mm lens):
|
||||
|
||||
- Sensor: 23.2 × 15.4 mm, Resolution: 5456 × 3632
|
||||
- fx = fy = focal_mm × width_px / sensor_width_mm = 16 × 5456 / 23.2 = 3763 pixels
|
||||
- cx = 2728, cy = 1816 (sensor center)
|
||||
- Distortion: Brown model (k1, k2, p1, p2 from calibration)
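The intrinsics arithmetic and step 1 of the transformation chain can be checked with a few lines of Python. The helper names are hypothetical, and the principal point is assumed to sit at the sensor centre until calibration says otherwise.

```python
def focal_length_px(focal_mm, width_px, sensor_width_mm):
    """Focal length in pixels from lens focal length and sensor geometry."""
    return focal_mm * width_px / sensor_width_mm


def pixel_to_ray(u, v, fx, fy, cx, cy):
    """K^-1 * [u, v, 1]^T for a zero-skew pinhole camera (distortion ignored)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


fx = focal_length_px(16.0, 5456, 23.2)   # about 3763 px for the ADTI 20L V1 + 16mm lens
ray_center = pixel_to_ray(2728, 1816, fx, fx, 2728.0, 1816.0)
```

The ray through the principal point comes out as (0, 0, 1), the camera optical axis.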
|
||||
|
||||
**T_cam_body** (camera mount):
|
||||
|
||||
- Navigation camera is fixed, pointing nadir (downward), not autostabilized
|
||||
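- R_cam_body = I for a nadir mount with camera X along body X: the camera Z-axis (optical axis) then aligns with body +Z (down) and camera Y with body Y; verify the in-plane (yaw) mounting angle during calibration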
|
||||
- Translation: offset from CG to camera mount (measured during assembly, typically <0.3m)
|
||||
|
||||
**Satellite match → WGS84**:
|
||||
|
||||
- Feature correspondences between camera frame and geo-referenced satellite tile
|
||||
- Homography H maps camera pixels to satellite tile pixels
|
||||
- Satellite tile pixel → WGS84 via tile's known georeference (zoom level + tile x,y → lat,lon)
|
||||
- Camera center projects to satellite pixel (cx_sat, cy_sat) via H
|
||||
- Convert (cx_sat, cy_sat) to WGS84 using tile georeference
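The last two steps can be sketched as follows, assuming the tiles follow the common XYZ / Web Mercator scheme with 256-pixel tiles (an assumption about the tile source, not something the steps above fix). The homography is a plain 3×3 nested list here, and both function names are hypothetical.

```python
import math


def apply_homography(H, u, v):
    """Map a camera pixel (u, v) to satellite-tile pixels with a 3x3 homography."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w


def tile_pixel_to_wgs84(tile_x, tile_y, zoom, px, py, tile_size=256):
    """Pixel (px, py) inside XYZ tile (tile_x, tile_y) to latitude/longitude in degrees."""
    n = 2 ** zoom
    x = tile_x + px / tile_size
    y = tile_y + py / tile_size
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n))))
    return lat, lon
```

In use, the camera principal point would be passed through `apply_homography` and the resulting tile pixel through `tile_pixel_to_wgs84`; for a tile mosaic, the pixel offset of each tile within the mosaic has to be subtracted first.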
|
||||
|
||||
### Component: GPS_INPUT Message Population (NEW — previously undefined)
|
||||
|
||||
|
||||
| GPS_INPUT Field | Source | Computation |
|
||||
| ----------------------- | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
|
||||
| lat, lon | ESKF position (NED) | NED → WGS84 conversion using mission origin |
|
||||
| alt | ESKF position (Down) + mission origin altitude | alt = alt_origin - p_down |
|
||||
| vn, ve, vd | ESKF velocity state | Direct from ESKF v[0], v[1], v[2] |
|
||||
| fix_type | Confidence tier | 3 (3D fix) for the HIGH and MEDIUM tiers, 2 (2D fix) for LOW (IMU-only), 0 (no fix) for FAILED; see the confidence tiers table |
|
||||
| hdop | ESKF horizontal covariance | hdop = sqrt(P[0,0] + P[1,1]) / 5.0 (approximate CEP→HDOP mapping) |
|
||||
| vdop | ESKF vertical covariance | vdop = sqrt(P[2,2]) / 5.0 |
|
||||
| horiz_accuracy | ESKF horizontal covariance | horiz_accuracy = sqrt(P[0,0] + P[1,1]) meters |
|
||||
| vert_accuracy | ESKF vertical covariance | vert_accuracy = sqrt(P[2,2]) meters |
|
||||
| speed_accuracy | ESKF velocity covariance | speed_accuracy = sqrt(P[3,3] + P[4,4]) m/s |
|
||||
| time_week, time_week_ms | System time | Convert Unix time to GPS time (GPS epoch = 1980-01-06; add the GPS-UTC leap-second offset, 18 s since 2017), then split into week number and milliseconds of week |
|
||||
| satellites_visible | Constant | 10 (synthetic — prevents satellite-count failsafes in ArduPilot) |
|
||||
| gps_id | Constant | 0 |
|
||||
| ignore_flags | Constant | 0 (provide all fields) |
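The table can be turned into a small field-population routine. The sketch below only builds a dictionary keyed by the MAVLink `GPS_INPUT` field names; sending it through pymavlink is left out. The function names are hypothetical, and the leap-second offset is a constant that must be updated if a new leap second is announced. GPS time runs ahead of UTC, so the offset is added.

```python
import math

GPS_EPOCH_UNIX_S = 315964800   # 1980-01-06T00:00:00Z expressed as Unix time
GPS_UTC_OFFSET_S = 18          # GPS minus UTC, valid since 2017 (assumed constant here)
SECONDS_PER_WEEK = 604800


def unix_to_gps_week(unix_time_s):
    """Unix time (UTC seconds) to (GPS week number, milliseconds into the week)."""
    gps_s = unix_time_s - GPS_EPOCH_UNIX_S + GPS_UTC_OFFSET_S
    week = int(gps_s // SECONDS_PER_WEEK)
    week_ms = int(round((gps_s - week * SECONDS_PER_WEEK) * 1000))
    return week, week_ms


def gps_input_fields(lat_deg, lon_deg, alt_m, v_ned, pos_var, vel_var,
                     fix_type, unix_time_s):
    """pos_var / vel_var are the (N, E, D) variances, i.e. P[0..2] and P[3..5] diagonals."""
    horiz_sigma = math.sqrt(pos_var[0] + pos_var[1])
    vert_sigma = math.sqrt(pos_var[2])
    week, week_ms = unix_to_gps_week(unix_time_s)
    return {
        "time_usec": int(unix_time_s * 1e6),
        "gps_id": 0,
        "ignore_flags": 0,
        "time_week_ms": week_ms,
        "time_week": week,
        "fix_type": fix_type,
        "lat": int(round(lat_deg * 1e7)),   # degE7
        "lon": int(round(lon_deg * 1e7)),   # degE7
        "alt": alt_m,
        "hdop": horiz_sigma / 5.0,          # synthetic mapping from the table
        "vdop": vert_sigma / 5.0,
        "vn": v_ned[0], "ve": v_ned[1], "vd": v_ned[2],
        "speed_accuracy": math.sqrt(vel_var[0] + vel_var[1]),
        "horiz_accuracy": horiz_sigma,
        "vert_accuracy": vert_sigma,
        "satellites_visible": 10,
    }
```

Keeping this as a pure function makes it easy to replay logged ESKF states through it in SITL before wiring it to the UART link.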
|
||||
|
||||
|
||||
**Confidence tiers** mapping to GPS_INPUT:
|
||||
|
||||
|
||||
| Tier | Condition | fix_type | horiz_accuracy | Rationale |
|
||||
| ------ | ------------------------------------------------- | ---------- | ------------------------------- | -------------------------------------- |
|
||||
| HIGH | Satellite match <30s ago, ESKF covariance < 400m² | 3 (3D fix) | From ESKF P (typically 5-20m) | Absolute position anchor recent |
|
||||
| MEDIUM | cuVSLAM tracking OK, no recent satellite match | 3 (3D fix) | From ESKF P (typically 20-50m) | Relative tracking valid, drift growing |
|
||||
| LOW | cuVSLAM lost, IMU-only | 2 (2D fix) | From ESKF P (50-200m+, growing) | Only IMU dead reckoning, rapid drift |
|
||||
| FAILED | 3+ consecutive total failures | 0 (no fix) | 999.0 | System cannot determine position |
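The tier table maps onto a short decision function. The sketch below evaluates the rows in the order FAILED, HIGH, MEDIUM, LOW, which is one reasonable reading of the table; the argument names are hypothetical.

```python
def confidence_tier(sat_age_s, horiz_cov_m2, vo_tracking_ok, consecutive_failures):
    """Return (tier name, GPS_INPUT fix_type) for the current estimator status.

    sat_age_s is the time since the last accepted satellite match, or None if
    there has been none yet.
    """
    if consecutive_failures >= 3:
        return "FAILED", 0
    if sat_age_s is not None and sat_age_s < 30.0 and horiz_cov_m2 < 400.0:
        return "HIGH", 3
    if vo_tracking_ok:
        return "MEDIUM", 3
    return "LOW", 2
```

In the FAILED tier the caller would also pin `horiz_accuracy` to 999.0 rather than reading it from the covariance.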
|
||||
|
||||
|
||||
### Component: Disconnected Route Segment Handling (NEW — previously undefined)
|
||||
|
||||
**Trigger**: cuVSLAM reports tracking_lost OR tracking confidence drops below threshold
|
||||
|
||||
**Algorithm**:
|
||||
|
||||
```
STATE: TRACKING_NORMAL
  cuVSLAM provides relative pose
  ESKF VO measurement updates at 0.7Hz
  Satellite matching on keyframes (every 5-10 frames)

STATE: TRACKING_LOST (enter when cuVSLAM reports loss)
  1. ESKF continues with IMU-only prediction (no VO updates)
     → uncertainty grows rapidly (~1-5 m/s drift with consumer IMU)
  2. Switch satellite matching to EVERY frame (not just keyframes)
     → maximize chances of getting absolute correction
  3. For each camera frame:
     a. Attempt satellite match using ESKF predicted position ± 3σ for tile selection
     b. If match succeeds (RANSAC inlier ratio > 30%):
        → ESKF measurement update with satellite position
        → Restart cuVSLAM with current frame as new origin
        → Transition to TRACKING_NORMAL
        → Reset failure counter
     c. If match fails:
        → Increment failure_counter
        → Continue IMU-only ESKF prediction
  4. If failure_counter >= 3:
     → Send re-localization request to ground station
     → GPS_INPUT fix_type = 0 (no fix), horiz_accuracy = 999.0
     → Continue attempting satellite matching on each frame
  5. If operator sends re-localization hint (approximate lat,lon):
     → Use as ESKF measurement with high covariance (~500m)
     → Attempt satellite match in that area
     → On success: transition to TRACKING_NORMAL

STATE: SEGMENT_DISCONNECT
  After re-localization following tracking loss:
  → New cuVSLAM track is independent of previous track
  → ESKF maintains global NED position continuity via satellite anchor
  → No need to "connect" segments at the cuVSLAM level
  → ESKF already handles this: satellite corrections keep global position consistent
```
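The state listing above can be exercised as a small state machine. This sketch covers only the TRACKING_NORMAL / TRACKING_LOST transitions and the failure counter; it returns action names as strings instead of calling real ESKF, cuVSLAM, or MAVLink code, and every class, method, and action name is hypothetical. The operator hint path (step 5) is left out.

```python
from enum import Enum


class Mode(Enum):
    TRACKING_NORMAL = "tracking_normal"
    TRACKING_LOST = "tracking_lost"


class RelocalizationFsm:
    MAX_FAILURES = 3

    def __init__(self):
        self.mode = Mode.TRACKING_NORMAL
        self.failure_counter = 0
        self.operator_requested = False

    def on_frame(self, vo_tracking_ok, sat_match_ok):
        """Process one camera frame and return the actions to perform."""
        if self.mode is Mode.TRACKING_NORMAL:
            if vo_tracking_ok:
                return ["eskf_vo_update"]
            # Tracking just dropped: enter TRACKING_LOST on this frame.
            self.mode = Mode.TRACKING_LOST
            self.failure_counter = 0
        # TRACKING_LOST: IMU-only prediction plus a satellite match every frame.
        actions = ["eskf_imu_only_prediction", "attempt_satellite_match"]
        if sat_match_ok:
            actions += ["eskf_satellite_update", "restart_vo"]
            self.mode = Mode.TRACKING_NORMAL
            self.failure_counter = 0
            self.operator_requested = False
        else:
            self.failure_counter += 1
            if (self.failure_counter >= self.MAX_FAILURES
                    and not self.operator_requested):
                actions.append("request_operator_relocalization")
                self.operator_requested = True
        return actions
```

Returning actions instead of performing them keeps the transition logic testable in replay without hardware, at the cost of a thin dispatch layer in the caller.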
|
||||
|
||||
### Component: Satellite Image Matching Pipeline (UPDATED — added GSD + tile selection details)
|
||||
|
||||
**GSD normalization**:
|
||||
|
||||
- Camera GSD at 600m: ~15.9 cm/pixel (ADTI 20L V1 + 16mm)
|
||||
- Satellite tile GSD at zoom 18: ~0.6 m/pixel
|
||||
- Scale ratio: ~3.8:1
|
||||
- Downsample camera image to satellite GSD before matching: resize from 5456×3632 to ~1440×960 (matching zoom 18 GSD)
|
||||
- This is close to LiteSAM's 1280px input, so use 1280px and accept the minor GSD mismatch during matching
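The GSD figures can be reproduced from the camera geometry with a short calculation. The helper names are hypothetical, and the formula assumes a nadir view over flat terrain.

```python
def ground_sample_distance_m(altitude_m, focal_mm, sensor_width_mm, width_px):
    """Ground footprint of one pixel (m) for a nadir pinhole camera."""
    pixel_pitch_mm = sensor_width_mm / width_px
    return altitude_m * pixel_pitch_mm / focal_mm


def resize_for_gsd(width_px, height_px, camera_gsd_m, target_gsd_m):
    """Image size after downsampling so that one pixel covers target_gsd_m."""
    scale = camera_gsd_m / target_gsd_m
    return round(width_px * scale), round(height_px * scale)


camera_gsd = ground_sample_distance_m(600.0, 16.0, 23.2, 5456)   # about 0.159 m/px
target_size = resize_for_gsd(5456, 3632, camera_gsd, 0.6)        # close to 1450 x 965
```

The unrounded ratio is about 3.76:1, so the exact target is slightly larger than the ~1440×960 obtained from the rounded 3.8:1 figure; either is within the mismatch that the 1280px matcher input already accepts.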
|
||||
|
||||
**Tile selection**:
|
||||
|
||||
- Input: ESKF position estimate (lat, lon) + horizontal covariance σ_h
|
||||
- Search radius: max(3·σ_h, 500m) — at least 500m to handle initial uncertainty
|
||||
- Compute geohash for center position → load tiles covering the search area
|
||||
- Assemble tile mosaic if needed (typically 2×2 to 4×4 tiles for adequate coverage)
|
||||
- If ESKF uncertainty > 2km: tile selection unreliable, fall back to wider search or request operator input
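A sketch of the tile selection follows. It swaps the geohash lookup for XYZ (slippy-map) tile indices, which is an assumption about how tiles are keyed, and it uses a spherical-earth conversion from metres to degrees. Function names are hypothetical.

```python
import math

R_EARTH_M = 6378137.0  # assumption: spherical earth with WGS84 semi-major axis


def search_radius_m(sigma_h_m, min_radius_m=500.0):
    """max(3 * sigma_h, 500 m), as in the rule above."""
    return max(3.0 * sigma_h_m, min_radius_m)


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """XYZ tile indices containing a WGS84 point (Web Mercator tiling)."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y


def tiles_for_search(lat_deg, lon_deg, sigma_h_m, zoom=18):
    """All tiles intersecting the square that bounds the search radius."""
    r = search_radius_m(sigma_h_m)
    dlat = math.degrees(r / R_EARTH_M)
    dlon = math.degrees(r / (R_EARTH_M * math.cos(math.radians(lat_deg))))
    x0, y0 = latlon_to_tile(lat_deg + dlat, lon_deg - dlon, zoom)  # north-west corner
    x1, y1 = latlon_to_tile(lat_deg - dlat, lon_deg + dlon, zoom)  # south-east corner
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]
```

The returned list grows with the square of the search radius, so the 2km uncertainty cut-off above also acts as a bound on how many tiles must be mosaicked per match attempt.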
|
||||
|
||||
**Tile storage calculation** (zoom 18 — 0.6 m/pixel):
|
||||
|
||||
- Each 256×256 tile covers ~153m × 153m
|
||||
- Flight path 200km with ±2km buffer: area ≈ 200km × 4km = 800 km²
|
||||
- Tiles needed: 800,000,000 / (153 × 153) ≈ 34,200 tiles
|
||||
- Storage: ~10-15KB per JPEG tile → ~340-510 MB
|
||||
- With zoom 19 overlap tiles for higher precision: ×4 = ~1.4-2.0 GB
|
||||
- Recommended: zoom 18 primary + zoom 19 for ±500m along flight path → ~500-800 MB total
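The storage arithmetic above can be reproduced with a few lines, which also makes it easy to re-run for other mission profiles. The function names are hypothetical, and the per-tile size is the 10-15KB JPEG estimate from the list.

```python
def tile_count(path_km, buffer_km, tile_side_m):
    """Tiles needed to cover a corridor of length path_km and half-width buffer_km."""
    area_m2 = (path_km * 1000.0) * (2.0 * buffer_km * 1000.0)
    return area_m2 / (tile_side_m ** 2)


def storage_mb(n_tiles, kb_per_tile):
    """Total size in MB, using 1 MB = 1000 KB."""
    return n_tiles * kb_per_tile / 1000.0


n_zoom18 = tile_count(200.0, 2.0, 153.0)   # roughly 34,200 tiles
low_mb = storage_mb(n_zoom18, 10.0)        # roughly 340 MB
high_mb = storage_mb(n_zoom18, 15.0)       # roughly 510 MB
```

The corridor model ignores turns and overlap between legs, so it should be read as an upper-level estimate rather than an exact tile list.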
|
||||
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Performance (est. Orin Nano Super TRT FP16) | Params | Fit |
| --- | --- | --- | --- | --- | --- | --- |
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30=17.86m UAV-VisLoc), 6.31M params | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT). Semi-dense. CVPR 2024. | 2.4x the parameter count of LiteSAM. | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python | Fastest. Proven TRT implementation. | General-purpose, not designed for cross-view gap. | Est. ~50-100ms | <5M | ✅ Speed fallback |
|
||||
|
||||
|
||||
### Component: cuVSLAM Configuration (NEW — previously undefined)
|
||||
|
||||
**Mode**: Inertial (mono camera + IMU)
|
||||
|
||||
**Camera configuration** (ADTI 20L V1 + 16mm lens):
|
||||
|
||||
- Model: Brown distortion
|
||||
- fx = fy = 3763 px (16mm on 23.2mm sensor at 5456px width)
|
||||
- cx = 2728 px, cy = 1816 px
|
||||
- Distortion coefficients: from calibration (k1, k2, p1, p2)
|
||||
- Border: 50px (ignore lens edge distortion)
|
||||
|
||||
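The nominal intrinsics above follow directly from the lens and sensor geometry. This is a sketch of the derivation under an ideal pinhole assumption (square pixels, principal point at the image center); calibrated values should replace it on real hardware.

```python
import math

def pinhole_intrinsics(focal_mm, sensor_width_mm, width_px, height_px):
    """Nominal pinhole intrinsics from lens focal length and sensor geometry."""
    fx = focal_mm / sensor_width_mm * width_px
    return {"fx": fx, "fy": fx, "cx": width_px / 2.0, "cy": height_px / 2.0}

def horizontal_fov_deg(fx, width_px):
    """Horizontal field of view implied by the focal length in pixels."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * fx)))

k = pinhole_intrinsics(16.0, 23.2, 5456, 3632)
fov = horizontal_fov_deg(k["fx"], 5456)
print(k, f"hfov={fov:.1f} deg")
```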
**IMU configuration** (Pixhawk 6x IMU — ICM-42688-P):
|
||||
|
||||
- Gyroscope noise density: 3.0 × 10⁻³ °/s/√Hz
|
||||
- Gyroscope random walk: 5.0 × 10⁻⁵ °/s²/√Hz
|
||||
- Accelerometer noise density: 70 µg/√Hz
|
||||
- Accelerometer random walk: ~2.0 × 10⁻³ m/s³/√Hz
|
||||
- IMU frequency: 200 Hz (from flight controller via MAVLink)
|
||||
- T_imu_rig: measured transformation from Pixhawk IMU to camera center (translation + rotation)
|
||||
|
||||
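VIO libraries commonly expect IMU noise densities in SI units (rad/s/√Hz and m/s²/√Hz), so the datasheet-style values above usually need converting. The sketch below assumes that convention; which units cuVSLAM actually expects should be confirmed against its documentation.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2

def gyro_density_to_rad(deg_per_s_per_rt_hz):
    """Gyro noise density: deg/s/sqrt(Hz) -> rad/s/sqrt(Hz)."""
    return math.radians(deg_per_s_per_rt_hz)

def accel_density_to_ms2(micro_g_per_rt_hz):
    """Accelerometer noise density: micro-g/sqrt(Hz) -> m/s^2/sqrt(Hz)."""
    return micro_g_per_rt_hz * 1e-6 * STANDARD_GRAVITY

def discrete_sigma(density, rate_hz):
    """Per-sample standard deviation for white noise sampled at rate_hz."""
    return density * math.sqrt(rate_hz)

gyro = gyro_density_to_rad(3.0e-3)
accel = accel_density_to_ms2(70.0)
print(f"gyro:  {gyro:.3e} rad/s/sqrt(Hz), sigma@200Hz = {discrete_sigma(gyro, 200):.3e} rad/s")
print(f"accel: {accel:.3e} m/s^2/sqrt(Hz), sigma@200Hz = {discrete_sigma(accel, 200):.3e} m/s^2")
```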
**cuVSLAM settings**:
|
||||
|
||||
- OdometryMode: INERTIAL
|
||||
- MulticameraMode: PRECISION (favor accuracy over speed — we have 1430ms budget)
|
||||
- Input resolution: downsample to 1280×852 (or 720p) for processing speed
|
||||
- async_bundle_adjustment: True
|
||||
|
||||
**Initialization**:
|
||||
|
||||
- cuVSLAM initializes automatically when it receives the first camera frame + IMU data
|
||||
- First few frames used for feature initialization and scale estimation
|
||||
- First satellite match validates and corrects the initial position
|
||||
|
||||
**Calibration procedure** (one-time per hardware unit):
|
||||
|
||||
1. Camera intrinsics: checkerboard calibration with OpenCV (or use manufacturer data if available)
|
||||
2. Camera-IMU extrinsic (T_imu_rig): Kalibr tool with checkerboard + IMU data
|
||||
3. IMU noise parameters: Allan variance analysis or use datasheet values
|
||||
4. Store calibration files on Jetson storage
|
||||
|
||||
### Component: AI Model Inference Runtime (UNCHANGED)
|
||||
|
||||
Native TRT Engine — optimal performance and memory on fixed NVIDIA hardware. See draft05 for full comparison table and conversion workflow.
|
||||
|
||||
### Component: Visual Odometry (UNCHANGED)
|
||||
|
||||
cuVSLAM in Inertial mode, fed by ADTI 20L V1 at 0.7 fps sustained. See draft05 for feasibility analysis at 0.7fps.
|
||||
|
||||
### Component: Flight Controller Integration (UPDATED — added GPS_INPUT field spec)
|
||||
|
||||
pymavlink over UART at 5-10Hz. GPS_INPUT field population defined above.
|
||||
|
||||
ArduPilot configuration:
|
||||
|
||||
- GPS1_TYPE = 14 (MAVLink)
|
||||
- GPS_RATE = 5 (minimum, matching our 5-10Hz output)
|
||||
- EK3_SRC1_POSXY = 3 (GPS), EK3_SRC1_VELXY = 3 (GPS), so the EKF uses GPS_INPUT as its position/velocity source
|
||||
|
||||
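GPS_INPUT carries GPS time as `time_week` and `time_week_ms`, so the system clock has to be converted before each message is sent. A minimal sketch of that conversion, assuming the companion clock is UTC-synchronized and the GPS-UTC offset is the 18 s in force since 2017; the function name is illustrative.

```python
GPS_EPOCH_UNIX_S = 315964800   # 1980-01-06T00:00:00Z expressed as Unix time
LEAP_SECONDS = 18              # assumption: GPS-UTC offset, valid since 2017-01-01
MS_PER_WEEK = 7 * 24 * 3600 * 1000

def unix_to_gps_week_ms(unix_s):
    """Convert a Unix UTC timestamp to (GPS week, milliseconds into the week)."""
    gps_ms = round((unix_s - GPS_EPOCH_UNIX_S + LEAP_SECONDS) * 1000)
    week, week_ms = divmod(gps_ms, MS_PER_WEEK)
    return week, week_ms

print(unix_to_gps_week_ms(1483228800))  # 2017-01-01T00:00:00Z
```

The leap-second constant is the fragile part: it must be updated if a new leap second is announced, or read from a source that tracks it.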
### Component: Object Localization (NEW — previously undefined)
|
||||
|
||||
**Input**: pixel coordinates (u, v) in Viewpro A40 Pro image, current gimbal angles (pan_deg, tilt_deg), zoom factor, UAV position from GPS-denied system, UAV altitude
|
||||
|
||||
**Process**:
|
||||
|
||||
1. Pixel → camera ray: ray_cam = K_viewpro⁻¹(zoom) · [u, v, 1]ᵀ
|
||||
2. Camera → gimbal frame: ray_gimbal = R_gimbal(pan, tilt) · ray_cam
|
||||
3. Gimbal → body: ray_body = R_gimbal_body · ray_gimbal (the rotation part of T_gimbal_body; a direction vector is unaffected by the translation)
|
||||
4. Body → NED: ray_ned = R_body_ned(q) · ray_body
|
||||
5. Ray-ground intersection: assuming flat terrain with the UAV at height h above ground (NED z is positive down, so ray_ned[2] > 0 for a downward ray): t = h / ray_ned[2], p_ground_ned = p_uav_ned + t · ray_ned
|
||||
6. NED → WGS84: convert to lat, lon
|
||||
|
||||
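The projection chain can be sketched in plain Python. This example assumes the three rotations (gimbal, gimbal-to-body, body-to-NED) have already been composed into one camera-to-NED matrix, takes h as positive height above flat ground with NED z pointing down, and uses a flat-earth conversion for the final step. The intrinsics and the matrix below are hypothetical values for a nadir-pointing camera.

```python
import math

EARTH_RADIUS_M = 6378137.0

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project a pixel to a ray in the camera frame (x right, y down, z forward)."""
    return [(u - cx) / fx, (v - cy) / fy, 1.0]

def rotate(R, v):
    """Apply a 3x3 rotation matrix (nested lists, row-major) to a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def ground_offset_ne(ray_ned, height_agl_m):
    """North/east offset where the ray meets flat ground; None if it never does."""
    if ray_ned[2] <= 1e-6:  # level or upward ray (NED z points down)
        return None
    t = height_agl_m / ray_ned[2]
    return t * ray_ned[0], t * ray_ned[1]

def offset_to_latlon(lat_deg, lon_deg, north_m, east_m):
    """Flat-earth conversion of a small north/east offset to WGS84 degrees."""
    lat = lat_deg + math.degrees(north_m / EARTH_RADIUS_M)
    lon = lon_deg + math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return lat, lon

# Hypothetical nadir camera with image top towards north:
# camera x -> east, camera y -> south, camera z -> down.
R_CAM_TO_NED = [[0.0, -1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]]

ray = rotate(R_CAM_TO_NED, pixel_to_ray(2000.0, 540.0, 1000.0, 1000.0, 1000.0, 540.0))
north, east = ground_offset_ne(ray, 600.0)
print(north, east, offset_to_latlon(48.0, 37.0, north, east))
```

Returning `None` for rays at or above the horizon matters in practice: with a tilted gimbal, pixels near the top of the frame may never intersect the ground plane.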
**Output**: { lat, lon, accuracy_m, confidence }
|
||||
|
||||
- accuracy_m propagated from: UAV position accuracy (from ESKF) + gimbal angle uncertainty + altitude uncertainty
|
||||
|
||||
**API endpoint**: POST /objects/locate
|
||||
|
||||
- Request: { pixel_x, pixel_y, gimbal_pan_deg, gimbal_tilt_deg, zoom_factor }
|
||||
- Response: { lat, lon, alt, accuracy_m, confidence, uav_position: {lat, lon, alt}, timestamp }
|
||||
|
||||
### Component: Startup, Handoff & Failsafe (UPDATED — added handoff + reboot + re-localization)
|
||||
|
||||
**GPS-denied handoff protocol**:
|
||||
|
||||
- GPS-denied system runs continuously from companion computer boot
|
||||
- Reads initial position from FC (GLOBAL_POSITION_INT); this may come from real GPS or be the FC's last known position
|
||||
- First satellite match validates the initial position
|
||||
- FC receives both real GPS (if available) and GPS_INPUT; FC EKF selects best source based on accuracy
|
||||
- No explicit "switch" — the GPS-denied system is a secondary GPS source
|
||||
|
||||
**Startup sequence** (expanded from draft05):
|
||||
|
||||
1. Boot Jetson → start GPS-Denied service (systemd)
|
||||
2. Connect to flight controller via pymavlink on UART
|
||||
3. Wait for heartbeat
|
||||
4. Initialize PyCUDA context
|
||||
5. Load TRT engines: litesam.engine + xfeat.engine (~1-3s each)
|
||||
6. Allocate GPU I/O buffers
|
||||
7. Create CUDA streams: Stream A (cuVSLAM), Stream B (satellite matching)
|
||||
8. Load camera calibration + IMU calibration files
|
||||
9. Read GLOBAL_POSITION_INT → set mission origin (NED reference point) → init ESKF
|
||||
10. Start cuVSLAM (Inertial mode) with ADTI 20L V1 camera stream
|
||||
11. Preload satellite tiles within ±2km into RAM
|
||||
12. Trigger first satellite match → validate initial position
|
||||
13. Begin GPS_INPUT output loop at 5-10Hz
|
||||
14. System ready
|
||||
|
||||
**Mid-flight reboot recovery**:
|
||||
|
||||
1. Jetson boots (~30-60s)
|
||||
2. GPS-Denied service starts, connects to FC
|
||||
3. Read GLOBAL_POSITION_INT (FC's current IMU-extrapolated position)
|
||||
4. Init ESKF with this position + HIGH uncertainty covariance (σ = 200m)
|
||||
5. Load TRT engines (~2-6s total)
|
||||
6. Start cuVSLAM (fresh, no prior map)
|
||||
7. Immediate satellite matching on first camera frame
|
||||
8. On satellite match success: ESKF corrected, uncertainty drops
|
||||
9. Estimated total recovery: ~35-70s
|
||||
10. During recovery: FC uses IMU-only dead reckoning (at 70 km/h the UAV covers ~700-1400m without external position corrections)
|
||||
11. **Known limitation**: recovery time is dominated by Jetson boot time
|
||||
|
||||
**3-consecutive-failure re-localization**:
|
||||
|
||||
- Trigger: VO tracking lost and satellite matching failed on 3 consecutive camera frames
|
||||
- Action: send re-localization request via MAVLink STATUSTEXT or custom message
|
||||
- Message content: "RELOC_REQ: last_lat={lat} last_lon={lon} uncertainty={σ}m"
|
||||
- Operator response: MAVLink COMMAND_LONG with approximate lat/lon
|
||||
- System: use operator position as ESKF measurement with R = diag(500², 500², 100²) meters²
|
||||
- System continues satellite matching with updated search area
|
||||
- While waiting: GPS_INPUT fix_type=0, IMU-only ESKF prediction continues
|
||||
|
||||
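The trigger and request message can be sketched as below. One practical constraint: the MAVLink STATUSTEXT `text` field holds 50 characters, so the verbose "RELOC_REQ: last_lat=... last_lon=... uncertainty=..." form would need chunking or truncation. The compact format and the class name here are assumptions for illustration only.

```python
STATUSTEXT_MAX_CHARS = 50  # MAVLink STATUSTEXT text field is char[50]

class RelocTrigger:
    """Counts consecutive frames on which both VO and satellite matching failed."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.count = 0

    def update(self, vo_ok, sat_ok):
        """Feed one frame's status; returns True when a request should be sent."""
        self.count = 0 if (vo_ok or sat_ok) else self.count + 1
        return self.count >= self.threshold

def format_reloc_request(lat_deg, lon_deg, sigma_m):
    """Compact re-localization request that fits a single STATUSTEXT message."""
    text = f"RELOC_REQ:{lat_deg:.5f},{lon_deg:.5f},{sigma_m:.0f}m"
    if len(text) > STATUSTEXT_MAX_CHARS:
        raise ValueError("message exceeds STATUSTEXT text length")
    return text

trigger = RelocTrigger()
for frame in range(3):
    if trigger.update(vo_ok=False, sat_ok=False):
        print(format_reloc_request(48.123456, 37.654321, 200.0))
```

Five decimal places of latitude/longitude resolve about 1 m, far finer than the 500 m measurement noise assigned to the operator hint.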
### Component: Ground Station Telemetry (UPDATED — added re-localization)
|
||||
|
||||
MAVLink messages to ground station:
|
||||
|
||||
|
||||
| Message | Rate | Content |
| --- | --- | --- |
| NAMED_VALUE_FLOAT "gps_conf" | 1Hz | Confidence score (0.0-1.0) |
| NAMED_VALUE_FLOAT "gps_drift" | 1Hz | Estimated drift from last satellite anchor (meters) |
| NAMED_VALUE_FLOAT "gps_hacc" | 1Hz | Horizontal accuracy (meters, from ESKF) |
| STATUSTEXT | On event | "RELOC_REQ: ..." for re-localization request |
| STATUSTEXT | On event | Tracking loss / recovery notifications |
|
||||
|
||||
|
||||
### Component: Thermal Management (UNCHANGED)
|
||||
|
||||
Same adaptive pipeline from draft05. Active cooling required at 25W. Throttling at 80°C SoC junction.
|
||||
|
||||
### Component: API & Inter-System Communication (NEW — consolidated)
|
||||
|
||||
FastAPI (Uvicorn) running locally on Jetson for inter-process communication with other onboard systems.
|
||||
|
||||
|
||||
| Endpoint | Method | Purpose | Auth |
| --- | --- | --- | --- |
| /sessions | POST | Start GPS-denied session | JWT |
| /sessions/{id}/stream | GET (SSE) | Real-time position + confidence stream | JWT |
| /sessions/{id}/anchor | POST | Operator re-localization hint | JWT |
| /sessions/{id} | DELETE | End session | JWT |
| /objects/locate | POST | Object GPS from pixel coordinates | JWT |
| /health | GET | System health + memory + thermal | None |
|
||||
|
||||
|
||||
**SSE event schema** (1Hz):
|
||||
|
||||
```json
{
  "type": "position",
  "timestamp": "2026-03-17T12:00:00.000Z",
  "lat": 48.123456,
  "lon": 37.654321,
  "alt": 600.0,
  "accuracy_h": 15.2,
  "accuracy_v": 8.1,
  "confidence": "HIGH",
  "drift_from_anchor": 12.5,
  "vo_status": "tracking",
  "last_satellite_match_age_s": 8.3
}
```
|
||||
|
||||
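On the wire, each SSE event is a small text frame: an optional `event:` line, a `data:` line, and a blank line as terminator. The helper below is a sketch of how the schema above could be serialized; using the `type` field as the SSE event name is an assumption, not something the design specifies.

```python
import json

def sse_frame(event):
    """Serialize one event dict as a Server-Sent Events frame."""
    payload = json.dumps(event, separators=(",", ":"))  # single-line JSON
    return f"event: {event['type']}\ndata: {payload}\n\n"

example = {
    "type": "position",
    "lat": 48.123456,
    "lon": 37.654321,
    "accuracy_h": 15.2,
    "confidence": "HIGH",
}
print(sse_frame(example), end="")
```

Keeping the JSON on one line matters because a newline inside `data:` would split the payload across multiple SSE fields.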
## UAV Platform
|
||||
|
||||
Unchanged from draft05. See draft05 for: airframe configuration (3.5m S-2 composite, 12.5kg AUW), flight performance (3.4h endurance at 50 km/h), camera specifications (ADTI 20L V1 + 16mm, Viewpro A40 Pro), ground coverage calculations.
|
||||
|
||||
## Speed Optimization Techniques
|
||||
|
||||
Unchanged from draft05. Key points: cuVSLAM ~9ms/frame, native TRT Engine (no ONNX RT), dual CUDA streams, 5-10Hz GPS_INPUT from ESKF IMU prediction.
|
||||
|
||||
## Processing Time Budget
|
||||
|
||||
Unchanged from draft05. VO frame: ~17-22ms. Satellite matching: ≤210ms async. Well within 1430ms frame interval.
|
||||
|
||||
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
|
||||
|
||||
|
||||
| Component | Memory | Notes |
| --- | --- | --- |
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map |
| LiteSAM TRT engine | ~50-80MB | If LiteSAM fails: EfficientLoFTR ~100-150MB |
| XFeat TRT engine | ~30-50MB | |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | |
| FastAPI (local IPC) | ~50MB | |
| ESKF + buffers | ~10MB | |
| **Total** | **~2.1-2.4GB** | **26-30% of 8GB** |
|
||||
|
||||
|
||||
## Key Risks and Mitigations
|
||||
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| LiteSAM MinGRU ops unsupported in TRT 10.3 | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification. Fallback: EfficientLoFTR TRT → XFeat TRT. |
| cuVSLAM fails on low-texture terrain at 0.7fps | HIGH | Frequent tracking loss | Satellite matching corrections bound drift. Re-localization pipeline handles tracking loss. IMU bridges short gaps. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails, outdated imagery | Pre-flight tile validation. Consider alternative providers (Bing, Mapbox). Robust to seasonal appearance changes via feature-based matching. |
| ESKF scale drift during long constant-velocity segments | MEDIUM | Position error exceeds 100m between satellite anchors | Satellite corrections every 7-14s re-anchor. Altitude constraint from barometer. Monitor drift rate: if >50m between corrections, increase satellite matching frequency. |
| Monocular scale ambiguity | MEDIUM | Metric scale lost during constant-velocity flight | Satellite absolute corrections provide scale. Known altitude constrains vertical scale. IMU acceleration during turns provides observability. |
| AUW exceeds AT4125 recommended range | MEDIUM | Reduced endurance, motor thermal stress | 12.5 kg vs 8-10 kg recommended. Monitor motor temps. Weight optimization. |
| ADTI mechanical shutter lifespan | MEDIUM | Replacement needed periodically | ~8,800 actuations/flight at 0.7fps. Estimated 11-57 flights before replacement. Budget as consumable. |
| Mid-flight companion computer failure | LOW | ~35-70s position gap | Reboot recovery procedure defined. FC uses IMU dead reckoning during gap. Known limitation. |
| Thermal throttling on Jetson | MEDIUM | Satellite matching latency increases | Active cooling required. Monitor SoC temp. Throttling at 80°C. Our workload is ~8-15W typical, well under the 25W TDP. |
| Engine incompatibility after JetPack update | MEDIUM | Must rebuild engines | Include engine rebuild in update procedure. |
| TRT engine build OOM on 8GB | LOW | Cannot build on target | Models small (6.31M, <5M). Reduce --memPoolSize if needed. |
|
||||
|
||||
| Mode | Trigger | Behavior | `GPS_INPUT` / Telemetry |
|------|---------|----------|--------------------------|
| `satellite_anchored` | VPR + local match passes all gates | Wrapper absolute update; tile write eligible only if sigma gate passes | 3D fix, `horiz_accuracy` >= 95% covariance semi-major axis |
| `vo_extrapolated` | BASALT VIO healthy and anchor age/covariance within bounds | BASALT VIO + wrapper propagation; covariance grows | 3D/2D depending on covariance threshold |
| `dead_reckoned` | visual blackout or no accepted anchor | IMU-only propagation, monotonic covariance growth | degraded fix type; QGC `VISUAL_BLACKOUT_IMU_ONLY` |
| failsafe/no-fix | covariance >500 m or blackout >30 s | stop reporting the position as valid | `fix_type=0`, `horiz_accuracy=999.0`, QGC `VISUAL_BLACKOUT_FAILSAFE` |
|
||||
|
||||
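The mode table can be read as a priority-ordered decision function, with the failsafe check evaluated first. The sketch below is illustrative: the 500 m and 30 s limits come from the table, while the 50 m 3D/2D threshold and the `fix_type=2` choice for dead reckoning are placeholders that the table leaves unspecified.

```python
def select_mode(anchor_accepted, vo_healthy, sigma_h_m, blackout_s,
                vo_3d_sigma_m=50.0):
    """Map estimator health to a mode plus GPS_INPUT fix_type / horiz_accuracy.

    sigma_h_m is assumed to already be the 95% covariance semi-major axis.
    """
    if sigma_h_m > 500.0 or blackout_s > 30.0:
        return {"mode": "failsafe", "fix_type": 0, "horiz_accuracy": 999.0}
    if anchor_accepted:
        return {"mode": "satellite_anchored", "fix_type": 3, "horiz_accuracy": sigma_h_m}
    if vo_healthy:
        fix_type = 3 if sigma_h_m <= vo_3d_sigma_m else 2  # placeholder threshold
        return {"mode": "vo_extrapolated", "fix_type": fix_type, "horiz_accuracy": sigma_h_m}
    # Placeholder: "degraded fix type" is interpreted here as a 2D fix.
    return {"mode": "dead_reckoned", "fix_type": 2, "horiz_accuracy": sigma_h_m}

print(select_mode(anchor_accepted=True, vo_healthy=True, sigma_h_m=12.0, blackout_s=0.0))
print(select_mode(anchor_accepted=False, vo_healthy=False, sigma_h_m=80.0, blackout_s=45.0))
```

Evaluating the failsafe condition before everything else ensures a stale anchor flag can never mask an oversized covariance.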
## Testing Strategy
|
||||
|
||||
### Integration / Functional Tests
|
||||
|
||||
- **ESKF correctness**: Feed recorded IMU + synthetic VO/satellite data → verify output matches reference ESKF implementation
|
||||
- **GPS_INPUT field validation**: Send GPS_INPUT to SITL ArduPilot → verify EKF accepts and uses the data correctly
|
||||
- **Coordinate transform chain**: Known GPS → NED → pixel → back to GPS — verify round-trip error <0.1m
|
||||
- **Disconnected segment handling**: Simulate tracking loss → verify satellite re-localization triggers → verify cuVSLAM restarts → verify ESKF position continuity
|
||||
- **3-consecutive-failure**: Simulate VO + satellite failures → verify re-localization request sent → verify operator hint accepted
|
||||
- **Object localization**: Known object at known GPS → verify computed GPS matches within camera accuracy
|
||||
- **Mid-flight reboot**: Kill GPS-denied process → restart → verify recovery within expected time → verify position accuracy after recovery
|
||||
- **TRT engine load test**: Verify engines load successfully on Jetson
|
||||
- **TRT inference correctness**: Compare TRT output vs PyTorch reference (max L1 error < 0.01)
|
||||
- **CUDA Stream pipelining**: Verify Stream B satellite matching does not block Stream A VO
|
||||
- **ADTI sustained capture rate**: Verify 0.7fps sustained >30 min without buffer overflow
|
||||
- **Confidence tier transitions**: Verify fix_type and accuracy change correctly across HIGH → MEDIUM → LOW → FAILED transitions
|
||||
- BASALT replay: assert AC-2.1a and AC-2.2 VO MRE on overlapping frame pairs, completion rate, latency, and wrapper-calibrated covariance.
|
||||
- OpenVINS reference replay: compare VIO drift, failure cases, and covariance against BASALT + wrapper.
|
||||
- Kimera-VIO backup replay: keep a second permissive candidate benchmark in case BASALT fails project replay/runtime gates.
|
||||
- Satellite anchor replay: assert AC-1.1/1.2, AC-2.2 cross-domain MRE, freshness rejection, and source labels.
|
||||
- DINOv2 descriptor fidelity: compare PyTorch/ONNX/TensorRT embeddings and retrieval rankings before accepting optimized engines.
|
||||
- FAISS CPU index tests: top-K recall, query latency, index size, save/load behavior on Jetson ARM64.
|
||||
- LightGlue extractor matrix: ALIKED vs DISK vs SIFT/ORB vs SuperPoint benchmark; SuperPoint excluded from production unless legal approves.
|
||||
- Tile Manager: orthorectify eligible nadir frames into write-new generated tiles, update manifest, verify active version and rollback.
|
||||
- `GPS_INPUT` SITL: validate fix type, `horiz_accuracy`, velocity fields, ignore flags, `EK3_SRC1_*` parameters, QGC behavior.
|
||||
- Security gates: stale tile, mismatched tile hash, low inlier ratio, impossible velocity jump, and spoofed GPS during blackout.
|
||||
|
||||
### Non-Functional Tests
|
||||
|
||||
- **End-to-end accuracy** (primary validation): Fly with real GPS recording → run GPS-denied system in parallel → compare estimated vs real positions → verify 80% within 50m, 60% within 20m
|
||||
- **VO drift rate**: Measure cuVSLAM drift over 1km straight segment without satellite correction
|
||||
- **Satellite matching accuracy**: Compare satellite-matched position vs real GPS at known locations
|
||||
- **Processing time**: Verify end-to-end per-frame <400ms
|
||||
- **Memory usage**: Monitor over 30-min session → verify <8GB, no leaks
|
||||
- **Thermal**: Sustained 30-min run → verify no throttling
|
||||
- **GPS_INPUT rate**: Verify consistent 5-10Hz delivery to FC
|
||||
- **Tile storage**: Validate calculated storage matches actual for test mission area
|
||||
- **MinGRU TRT compatibility** (day-one blocker): Clone LiteSAM → ONNX export → polygraphy → trtexec
|
||||
- **Flight endurance**: Ground-test full system power draw against 267W estimate
|
||||
- Jetson latency and memory: <400 ms p95, <8 GB shared memory, no 25 W thermal throttle.
|
||||
- Cache budget: 400 km² imagery + manifests + descriptors fits budget or reports explicit split budget.
|
||||
- FDR 8-hour load: <=64 GB, rollover logged, no silent payload loss.
|
||||
- Monte Carlo false-position and cache-poisoning tests for AC-NEW-4 and AC-NEW-7.
|
||||
- Cold boot: first valid `GPS_INPUT` <30 s p95 across 50 runs.
|
||||
|
||||
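The end-to-end accuracy criterion (80% within 50m, 60% within 20m) can be evaluated offline from paired estimated and GPS-truth positions. A minimal sketch, assuming both tracks are already time-aligned lists of (lat, lon) tuples; the function names are illustrative.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2.0 * radius_m * math.asin(math.sqrt(a))

def accuracy_report(estimated, truth):
    """Fraction of frames within 50 m and 20 m, plus the pass/fail verdict."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimated, truth)]

    def fraction_within(limit_m):
        return sum(err <= limit_m for err in errors) / len(errors)

    w50, w20 = fraction_within(50.0), fraction_within(20.0)
    return {"within_50m": w50, "within_20m": w20, "passed": w50 >= 0.8 and w20 >= 0.6}

M_PER_DEG_LAT = 111194.9  # meters per degree of latitude for the radius above
truth = [(48.0, 37.0)] * 5
estimated = [(48.0 + d / M_PER_DEG_LAT, 37.0) for d in (10, 10, 10, 30, 100)]
print(accuracy_report(estimated, truth))
```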
## References
|
||||
|
||||
- ArduPilot GPS_RATE parameter: [https://github.com/ArduPilot/ardupilot/pull/15980](https://github.com/ArduPilot/ardupilot/pull/15980)
|
||||
- MAVProxy GPSInput module (GPS_INPUT message): [https://ardupilot.org/mavproxy/docs/modules/GPSInput.html](https://ardupilot.org/mavproxy/docs/modules/GPSInput.html)
|
||||
- pymavlink GPS_INPUT example: [https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py](https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py)
|
||||
- ESKF reference (fixed-wing UAV): [https://github.com/ludvigls/ESKF](https://github.com/ludvigls/ESKF)
|
||||
- ROS ESKF multi-sensor: [https://github.com/EliaTarasov/ESKF](https://github.com/EliaTarasov/ESKF)
|
||||
- Range-VIO scale observability: [https://arxiv.org/abs/2103.15215](https://arxiv.org/abs/2103.15215)
|
||||
- NaviLoc trajectory-level localization: [https://www.mdpi.com/2504-446X/10/2/97](https://www.mdpi.com/2504-446X/10/2/97)
|
||||
- SatLoc-Fusion hierarchical framework: [https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f](https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f)
|
||||
- Auterion GPS-denied workflow: [https://docs.auterion.com/vehicle-operation/auterion-mission-control/useful-resources/operations/gps-denied-workflow](https://docs.auterion.com/vehicle-operation/auterion-mission-control/useful-resources/operations/gps-denied-workflow)
|
||||
- PX4 GNSS-denied flight: [https://docs.px4.io/main/en/advanced_config/gnss_degraded_or_denied_flight.html](https://docs.px4.io/main/en/advanced_config/gnss_degraded_or_denied_flight.html)
|
||||
- ArduPilot GPS_INPUT advanced usage: [https://discuss.ardupilot.org/t/advanced-usage-of-gps-type-mav-14/99406](https://discuss.ardupilot.org/t/advanced-usage-of-gps-type-mav-14/99406)
|
||||
- Google Maps Ukraine imagery: [https://newsukraine.rbc.ua/news/google-maps-has-surprise-for-satellite-imagery-1727182380.html](https://newsukraine.rbc.ua/news/google-maps-has-surprise-for-satellite-imagery-1727182380.html)
|
||||
- Jetson Orin Nano Super thermal: [https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/](https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/)
|
||||
- GSD matching research: [https://www.kjrs.org/journal/view.html?pn=related&uid=756&vmd=Full](https://www.kjrs.org/journal/view.html?pn=related&uid=756&vmd=Full)
|
||||
- VO+satellite matching pipeline: [https://polen.itu.edu.tr/items/1fe1e872-7cea-44d8-a8de-339e4587bee6](https://polen.itu.edu.tr/items/1fe1e872-7cea-44d8-a8de-339e4587bee6)
|
||||
- PyCuVSLAM docs: [https://wiki.seeedstudio.com/pycuvslam_recomputer_robotics/](https://wiki.seeedstudio.com/pycuvslam_recomputer_robotics/)
|
||||
- Pixhawk 6x IMU (ICM-42688-P) datasheet: [https://invensense.tdk.com/products/motion-tracking/6-axis/icm-42688-p/](https://invensense.tdk.com/products/motion-tracking/6-axis/icm-42688-p/)
|
||||
- All references from solution_draft05.md
|
||||
Detailed source registry: `_docs/00_research/01_source_registry.md`.
|
||||
|
||||
Key sources:
|
||||
- BASALT repository: https://github.com/VladyslavUsenko/basalt
|
||||
- HybVIO benchmark paper: https://arxiv.org/pdf/2106.11857
|
||||
- OpenVINS docs: https://docs.openvins.com/index.html
|
||||
- OpenVINS ATE/RTE comparison: https://github.com/rpng/open_vins/issues/402
|
||||
- Kimera mono-inertial caveat: https://github.com/MIT-SPARK/Kimera-VIO/issues/254
|
||||
- AnyLoc paper: https://arxiv.org/html/2308.00688
|
||||
- Aerial VPR survey: https://arxiv.org/abs/2406.00885
|
||||
- ALIKED-LightGlue-ONNX: https://github.com/ikeboo/ALIKED-LightGlue-ONNX
|
||||
- DINOv2 TensorRT issue: https://github.com/NVIDIA/TensorRT/issues/4348
|
||||
|
||||
## Related Artifacts
|
||||
|
||||
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
|
||||
- Completeness assessment research: `_docs/00_research/solution_completeness_assessment/`
|
||||
- Previous research: `_docs/00_research/trt_engine_migration/`
|
||||
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (needs sync with draft05 corrections)
|
||||
- Security analysis: `_docs/01_solution/security_analysis.md`
|
||||
- Previous draft: `_docs/01_solution/solution_draft05.md`
|
||||
|
||||
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
|
||||
- Component fit matrix: `_docs/00_research/06_component_fit_matrix.md`
|
||||
- Fact cards: `_docs/00_research/02_fact_cards.md`
|
||||
|
||||
@@ -2,282 +2,148 @@
|
||||
|
||||
## Product Solution Description
|
||||
|
||||
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.
|
||||
Build an onboard GPS-denied localization service that runs on the companion Jetson, consumes the fixed nadir navigation camera plus flight-controller IMU/attitude/altitude, and emits ArduPilot-compatible `GPS_INPUT` estimates with honest covariance.
|
||||
|
||||
**Hard constraint**: Camera shoots at ~2.5-3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.
|
||||
The solution is a hybrid estimator:
|
||||
|
||||
**Satellite matching strategy**: Benchmark LiteSAM on actual Orin Nano Super hardware as a day-one priority. If LiteSAM cannot achieve ≤400ms at 480px resolution, **abandon it entirely** and use XFeat semi-dense matching as the primary satellite matcher. Speed is non-negotiable.
|
||||
|
||||
**Core architectural principles**:
|
||||
1. **cuVSLAM handles VO** — NVIDIA's CUDA-accelerated library achieves 90fps on Jetson Orin Nano, giving VO essentially "for free" (~11ms/frame).
|
||||
2. **Keyframe-based satellite matching** — satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
|
||||
3. **Every keyframe independently attempts satellite-based geo-localization** — this handles disconnected segments natively.
|
||||
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ OFFLINE (Before Flight) │
|
||||
│ Satellite Tiles → Download & Crop → Store as tile pairs │
|
||||
│ (Google Maps) (per flight plan) (disk, GeoHash indexed) │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ ONLINE (During Flight) │
|
||||
│ │
|
||||
│ EVERY FRAME (400ms budget): │
|
||||
│ ┌────────────────────────────────┐ │
|
||||
│ │ Camera → Downsample (CUDA 2ms)│ │
|
||||
│ │ → cuVSLAM VO+IMU (~11ms)│──→ ESKF Update → SSE Emit │
|
||||
│ └────────────────────────────────┘ ↑ │
|
||||
│ │ │
|
||||
│ KEYFRAMES ONLY (every 3-10 frames): │ │
|
||||
│ ┌────────────────────────────────────┐ │ │
|
||||
│ │ Satellite match (async CUDA stream)│─────┘ │
|
||||
│ │ LiteSAM or XFeat (see benchmark) │ │
|
||||
│ │ (does NOT block VO output) │ │
|
||||
│ └────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ IMU: 100+Hz continuous → ESKF prediction │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```text
|
||||
Nav camera + FC telemetry
|
||||
|
|
||||
v
|
||||
Image quality + calibration + orthorectification
|
||||
|
|
||||
+--> Steady state: VO/IMU propagation --> ESKF --> GPS_INPUT + QGC + FDR
|
||||
|
|
||||
+--> Re-anchor triggers: VPR top-K --> local match/RANSAC --> ESKF anchor update
|
||||
|
|
||||
+--> Tile generation: ortho tile + quality sidecar --> local cache --> post-flight Satellite Service upload
|
||||
```
|
||||
|
||||
## Speed Optimization Techniques
|
||||
The steady-state path is intentionally lightweight. Heavy global retrieval and cross-domain local matching run only on cold start, VO failure, sharp turns, disconnected segments, covariance growth, or stale-anchor age.
|
||||
|
||||
### 1. cuVSLAM for Visual Odometry (~11ms/frame)
|
||||
NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 90fps on Jetson Orin Nano. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs. This eliminates custom VO entirely.
|
||||
## Existing / Competitor Solutions Analysis
|
||||
|
||||
### 2. Keyframe-Based Satellite Matching
|
||||
Not every frame needs satellite matching. Strategy:
|
||||
- cuVSLAM provides VO at every frame (high-rate, low-latency)
|
||||
- Satellite matching triggers on **keyframes** selected by:
|
||||
- Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
|
||||
- Confidence drop: when ESKF covariance exceeds threshold
|
||||
- VO failure: when cuVSLAM reports tracking loss (sharp turn)
|
||||
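The three keyframe triggers listed above reduce to a small predicate evaluated once per frame. This is a sketch with placeholder thresholds (a 5-frame interval and a 30 m covariance limit), neither of which is fixed by the text.

```python
def satellite_match_reason(frames_since_match, sigma_h_m, vo_tracking,
                           interval_frames=5, sigma_threshold_m=30.0):
    """Return why this frame should trigger satellite matching, or None."""
    if not vo_tracking:
        return "vo_failure"          # cuVSLAM reported tracking loss
    if sigma_h_m > sigma_threshold_m:
        return "confidence_drop"     # ESKF covariance exceeded the threshold
    if frames_since_match >= interval_frames:
        return "fixed_interval"      # periodic keyframe
    return None

for args in [(1, 5.0, True), (5, 5.0, True), (1, 45.0, True), (1, 5.0, False)]:
    print(args, "->", satellite_match_reason(*args))
```

Returning the reason rather than a bare boolean makes it straightforward to log which trigger dominates in flight.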
Recent fixed-wing GPS-denied research supports the same high-level mechanism: monocular visual odometry alone accumulates scale/drift error, while satellite-image comparison can periodically correct it. Aerial VPR surveys also show why the implementation cannot be naive: weather, season, scale mismatch, repetitive fields, tile overlap, and re-ranking runtime all matter.
|
||||
|
||||
### 3. Satellite Matcher Selection (Benchmark-Driven)
|
||||
|
||||
**Candidate A: LiteSAM (opt)** — Best accuracy for satellite-aerial matching (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne + TAIFormer + MinGRU. Benchmarked at 497ms on Jetson AGX Orin at 1184px. AGX Orin is 3-4x more powerful than Orin Nano Super (275 TOPS vs 67 TOPS, $2000+ vs $249).
|
||||
|
||||
Realistic Orin Nano Super estimates:
|
||||
- At 1184px: ~1.5-2.0s (unusable)
|
||||
- At 640px: ~500-800ms (borderline)
|
||||
- At 480px: ~300-500ms (best case)
|
||||
|
||||
**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. Not specifically designed for cross-view satellite-aerial, but fast and reliable.
|
||||
|
||||
**Decision rule**: Benchmark LiteSAM TensorRT FP16 at 480px on Orin Nano Super. If ≤400ms → use LiteSAM. If >400ms → **abandon LiteSAM, use XFeat as primary**. No hybrid compromises — pick one and optimize it.
|
||||
|
||||
### 4. TensorRT FP16 Optimization
|
||||
LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. INT8 is possible for MobileOne backbone but ViT/transformer components may degrade with INT8.
|
||||
|
||||
### 5. CUDA Stream Pipelining
|
||||
Overlap operations across consecutive frames:
|
||||
- Stream A: cuVSLAM VO for current frame (~11ms) + ESKF fusion (~1ms)
|
||||
- Stream B: Satellite matching for previous keyframe (async)
|
||||
- CPU: SSE emission, tile management, keyframe selection logic
|
||||
|
||||
### 6. Pre-cropped Satellite Tiles
|
||||
Offline: for each satellite tile, store both the raw image and a pre-resized version matching the satellite matcher's input resolution. Runtime avoids resize cost.
|
||||
|
||||
## Existing/Competitor Solutions Analysis
|
||||
|
||||
| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (90fps) | VO only, no satellite matching |
|
||||
|
||||
**Key insight**: Combine cuVSLAM (best-in-class VO for Jetson) with the fastest viable satellite-aerial matcher via ESKF fusion. LiteSAM is the accuracy leader but unproven on Orin Nano Super — benchmark first, abandon for XFeat if too slow.
|
||||
The selected design borrows the proven structure but rejects an all-in-one SLAM dependency as the product core. OpenVINS and ORB-SLAM3 are useful benchmark/reference implementations, but their GPL-family licensing and broader SLAM lifecycle do not fit a default production dependency for this project.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Component: Visual Odometry
|
||||
### Component: Camera Ingest, Calibration, And Geometry
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM / C++ API | 90fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ⚠️ Slower |
|
||||
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|
||||
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
|
||||
| OpenCV geometry utility layer | OpenCV 4.x | Camera calibration, undistortion, RANSAC homography, reprojection-error measurement | Mature, local, fast, exact API fit | Not a full estimator | Calibration target, fixed intrinsics/extrinsics, lens/FOV selection | Local-only, no network | Low | MVE: `_docs/00_research/02_fact_cards.md`; Source #5 | Selected |
|
||||
|
||||
**Selected**: **cuVSLAM (mono+IMU mode)** — purpose-built by NVIDIA for Jetson. ~11ms/frame leaves 389ms for everything else. Auto-fallback to IMU when visual tracking fails.
|
||||
**Exact-fit evidence**:
|
||||
- Project constraints checked: fixed nadir camera, calibration, homography, MRE gates.
|
||||
- Disqualifiers: none.
|
||||
- Restrictions × AC matrix: `_docs/00_research/06_component_fit_matrix.md`.
|
||||
|
||||
### Component: Satellite Image Matching

### Component: VO / IMU Propagation And Estimator

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| LiteSAM (opt) | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | 497ms on AGX Orin at 1184px; AGX Orin is 3-4x more powerful than Orin Nano Super | ✅ If benchmark passes |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven | Not designed for cross-view satellite-aerial | ✅ If LiteSAM fails benchmark |
| EfficientLoFTR | TensorRT | Good accuracy, semi-dense | 15.05M params (2.4x LiteSAM), slower | ⚠️ Heavier |
| SuperPoint + LightGlue | TensorRT C++ | Good general matching | Sparse only, worse on satellite-aerial | ⚠️ Not specialized |

| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| Custom VO/IMU ESKF | OpenCV + custom estimator | Frame-to-frame homography/features + FC IMU/attitude/altitude fused in a custom ESKF with source modes | Owns covariance, source labels, degraded modes, ArduPilot output semantics | More implementation work than adopting VIO library | Synchronized frames/IMU, calibration, replay tests | No third-party cloud; deterministic local logs | Medium | Facts #1, #4, #5, #16 | Selected |
| OpenVINS reference | OpenVINS | Monocular camera + IMU EKF/MSCKF reference runs | Strong VIO reference and evaluation tools | GPL-3 production dependency risk | Dataset/replay adapter | Local only | Low for benchmark | Source #3 | Reference only |
| ORB-SLAM3 alternative | ORB-SLAM3 | Monocular-inertial SLAM | Mature SLAM benchmark | GPLv3, heavier map lifecycle, initialization complexity | Calibration, vocabulary, runtime tuning | Local only | Medium | Source #4 | Rejected for production |

**Selection**: Benchmark-driven. Day-one test on Orin Nano Super:
1. Export LiteSAM (opt) to TensorRT FP16
2. Measure at 480px, 640px, 800px
3. If ≤400ms at 480px → **LiteSAM**
4. If >400ms at any viable resolution → **XFeat semi-dense** (primary, no hybrid)
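
The go/no-go rule above can be sketched as a small selection function. This is a minimal sketch, not the project's benchmark harness: `select_matcher` and its arguments are hypothetical names, the latencies passed in are expected to come from the day-one measurements, and preferring the highest resolution that still fits the budget is an assumption the rule itself does not state.

```python
def select_matcher(latency_ms_by_px, budget_ms=400.0):
    """Pick the satellite matcher from measured LiteSAM latencies.

    latency_ms_by_px maps matcher input resolution (px) to measured
    latency (ms) on the target device.
    Returns (matcher_name, resolution_px or None).
    """
    # Resolutions at which LiteSAM fits inside the per-frame budget.
    viable = [px for px, ms in latency_ms_by_px.items() if ms <= budget_ms]
    if viable:
        # Assumption: prefer the highest resolution that still fits.
        return "LiteSAM", max(viable)
    # No resolution fits the budget: use XFeat semi-dense, no hybrid.
    return "XFeat semi-dense", None
```

Called with the three benchmark resolutions, for example `select_matcher({480: 350.0, 640: 520.0, 800: 900.0})`, the function keeps LiteSAM at 480px; if every entry exceeds the budget it returns the XFeat fallback.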

**Exact-fit evidence**:
- Project constraints checked: one camera + IMU, frame-by-frame output, covariance labels, blackout/spoofing modes.
- Runtime quality gate: validate drift and covariance on EuRoC-style and representative fixed-wing replay.

### Component: Sensor Fusion

### Component: Satellite Retrieval And Local Anchor

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ❌ Too heavy |

| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| DINOv2-VLAD + FAISS + LightGlue | DINOv2/AnyLoc-style descriptors, FAISS, DISK/ALIKED+LightGlue, OpenCV RANSAC | Offline precomputed VPR chunk descriptors; conditional query descriptor; top-K FAISS; local matching on candidates | Matches AC-8.6; scalable top-K; local geometry verifies anchors | Needs Jetson profiling and model-size pruning | Fresh satellite cache, descriptors, dynamic K, RANSAC gates | Cache is local; no in-flight provider calls | Medium-high | MVE blocks in `02_fact_cards.md`; Sources #6-#9 | Selected with runtime gate |
| SuperPoint + LightGlue | SuperPoint, LightGlue | Same local matching with SuperPoint features | Strong technical baseline | SuperPoint license is restrictive | Legal review | Local only | Medium | Source #6 | Needs user decision |
| Classical SIFT/ORB-only | OpenCV | Handcrafted features and homography | Simple fallback, low compute | Poor cross-domain robustness | Feature-rich scenes | Local only | Low | Source #5 | Fallback / regression baseline |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 nominal states (the corresponding error state has 15 dimensions, since the orientation error is a 3-vector).

**Exact-fit evidence**:
- Project constraints checked: offline cache, top-K dynamic retrieval, cross-domain local match, <400 ms hot-path constraint.
- Runtime gate: heavy VPR/re-ranking is trigger-based, not per-frame.
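
The top-K retrieval step of the trigger path can be illustrated without any index library. The sketch below uses a brute-force cosine search from the standard library as a stand-in for the FAISS index named above; it is not the FAISS API. `top_k_chunks`, the chunk ids, and the toy 2-D descriptors are hypothetical, and real DINOv2-VLAD descriptors are far higher-dimensional.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query, chunk_descriptors, k=5):
    """Return the k (chunk_id, score) pairs most similar to the query.

    chunk_descriptors maps a chunk id to its precomputed descriptor.
    Brute force is O(N) per query; an index would replace this loop.
    """
    scored = ((cosine(query, desc), cid) for cid, desc in chunk_descriptors.items())
    best = heapq.nlargest(k, scored)  # highest similarity first
    return [(cid, score) for score, cid in best]
```

Only the returned candidates would then go through local matching and RANSAC, which is what keeps the expensive stage bounded by K rather than by cache size.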

Measurement sources and rates:
- IMU prediction: 100+Hz
- cuVSLAM VO update: ~3Hz (every frame)
- Satellite update: ~0.3-1Hz (keyframes only, delayed via async pipeline)

### Component: Satellite Cache And Tile Write-Back

### Component: Satellite Tile Preprocessing (Offline)

| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| COG + PostgreSQL/PostGIS manifest + descriptor sidecars | GDAL COG, PostgreSQL/PostGIS manifest, FAISS sidecars | Service tiles and generated candidate tiles stored as tiled compressed GeoTIFFs with CRS/date/source/meter-per-pixel metadata | Geospatial standard, supports write-new-tile workflow, descriptor accounting | Needs careful 10 GB budget | STAC-like manifest, freshness gates, descriptor pruning | Local signed manifests recommended | Medium | Source #18 | Selected |
| PMTiles archive | PMTiles | Single-file read archive | Efficient map reads | Read-only; cannot update in place | Archive rebuild for updates | Local file integrity | Low | Source #17 | Rejected for live mutable cache |

**Selected**: **GeoHash-indexed tile pairs on disk**.

**Exact-fit evidence**:
- Project constraints checked: offline-only, no raw photo storage, mid-flight generated tiles, Satellite Service ingest metadata.
- External dependency: Satellite Service owns the promotion/voting layer for trusted basemap updates.

Pipeline:
1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight

### Component: MAVLink Integration And Telemetry

### Component: Re-localization (Disconnected Segments)

| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| MAVSDK telemetry + pymavlink output | MAVSDK, pymavlink | MAVSDK subscribes to telemetry; pymavlink emits `GPS_INPUT` to ArduPilot with `GPS1_TYPE=14` | Exact `GPS_INPUT` field control while keeping high-level telemetry APIs | Plane-specific failsafe/spoof triggers need SITL proof | ArduPilot Plane params, QGC status, FDR tlog | Validate MAVLink source and message rate | Medium | Sources #10-#12 | Selected |

**Selected**: **Keyframe satellite matching is always active + expanded search on VO failure**.

**Exact-fit evidence**:
- Project constraints checked: ArduPilot only, v1 `GPS_INPUT` only, WGS84 output, covariance to `horiz_accuracy`.
- Validation gate: Plane SITL with production parameters.
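
A sketch of how an estimator state might be mapped onto the `GPS_INPUT` fields is shown below. The keys follow the field names and order of the MAVLink common `GPS_INPUT` message; everything else is an assumption for illustration: the function name, the `hdop`/`vdop` and `satellites_visible` placeholders, the use of the root-sum-square of the north/east sigmas as `horiz_accuracy`, and a fixed 18 s GPS-UTC offset.

```python
import math

GPS_EPOCH_UNIX_S = 315964800      # 1980-01-06T00:00:00Z expressed as Unix time
GPS_UTC_LEAP_S = 18               # GPS-UTC offset since 2017; assumed constant here
SECONDS_PER_WEEK = 7 * 24 * 3600


def build_gps_input(unix_time_s, lat_deg, lon_deg, alt_m, vel_ned_mps,
                    sigma_n_m, sigma_e_m, sigma_d_m, sigma_v_mps, fix_type):
    """Return GPS_INPUT fields as a dict, in message field order."""
    gps_s = unix_time_s - GPS_EPOCH_UNIX_S + GPS_UTC_LEAP_S
    vn, ve, vd = vel_ned_mps
    return {
        "time_usec": int(round(unix_time_s * 1e6)),
        "gps_id": 0,
        "ignore_flags": 0,                       # no fields ignored in this sketch
        "time_week_ms": int(round((gps_s % SECONDS_PER_WEEK) * 1000)),
        "time_week": int(gps_s // SECONDS_PER_WEEK),
        "fix_type": fix_type,
        "lat": int(round(lat_deg * 1e7)),        # WGS84, degrees * 1e7
        "lon": int(round(lon_deg * 1e7)),
        "alt": float(alt_m),                     # metres
        "hdop": 1.0,                             # placeholder, to be synthesized
        "vdop": 1.0,                             # placeholder, to be synthesized
        "vn": float(vn), "ve": float(ve), "vd": float(vd),
        "speed_accuracy": float(sigma_v_mps),
        # Assumption: combine the horizontal 1-sigma values into one radius.
        "horiz_accuracy": math.hypot(sigma_n_m, sigma_e_m),
        "vert_accuracy": float(sigma_d_m),
        "satellites_visible": 10,                # placeholder value
    }
```

With pymavlink, these values would typically be passed positionally in this order to the generated `gps_input_send` method of the connection's `mav` object; that call and the ignore-flag handling should be checked against the installed dialect and validated in Plane SITL.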

When cuVSLAM reports tracking loss (sharp turn, no features):
1. Immediately flag next frame as keyframe → trigger satellite matching
2. Expand tile search radius (from ±200m to ±1km based on IMU dead-reckoning uncertainty)
3. If match found: position recovered, new segment begins
4. If 3+ consecutive keyframe failures: request user input via API
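
The recovery steps above amount to a small state machine. The sketch below is one way to express it; `Relocalizer`, its method names, and the 3-sigma gain that maps dead-reckoning uncertainty to a search radius are assumptions, while the 200 m / 1 km bounds and the three-failure limit come from the steps themselves.

```python
from dataclasses import dataclass


@dataclass
class Relocalizer:
    min_radius_m: float = 200.0
    max_radius_m: float = 1000.0
    sigma_gain: float = 3.0   # assumption: search out to 3 sigma
    max_failures: int = 3
    failures: int = 0

    def search_radius_m(self, dr_sigma_m):
        """Scale the tile search radius with dead-reckoning uncertainty."""
        radius = self.sigma_gain * dr_sigma_m
        return min(self.max_radius_m, max(self.min_radius_m, radius))

    def on_keyframe_result(self, matched):
        """Update the failure counter and return the next action."""
        if matched:
            self.failures = 0
            return "new_segment"          # position recovered
        self.failures += 1
        if self.failures >= self.max_failures:
            return "request_user_input"   # escalate via the API
        return "retry"                    # next frame is a keyframe again
```

Keeping the failure counter inside the object means a single successful match resets the escalation path, which matches the "new segment begins" behaviour in step 3.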

### Component: Flight Data Recorder

### Component: Object Center Coordinates

| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| Segmented FDR | PostgreSQL event index + binary/CBOR payloads with Parquet export post-flight | Fixed-size segment files for per-frame estimates, IMU, MAVLink, health, emitted GPS_INPUT, generated-tile metadata | Replayable, bounded, no raw frame retention | Exact format selected during implementation | Rollover policy, monotonic timestamps | Integrity hash per segment recommended | Medium | AC-NEW-3 | Selected pattern |

Geometric calculation once frame-center GPS is known:
1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
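
The four steps above can be sketched as follows. The sketch assumes level flight with a nadir camera, the image top aligned with the aircraft nose, yaw measured clockwise from true north, and a small-offset spherical approximation for the metre-to-degree conversion; `object_latlon` is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius


def object_latlon(center_lat, center_lon, dx_px, dy_px, gsd_m, yaw_deg):
    """Project a pixel offset from the frame center to WGS84 lat/lon."""
    # Steps 1-2: pixel offset to metres in the camera frame.
    right_m = dx_px * gsd_m
    forward_m = -dy_px * gsd_m  # image y grows downward, forward is up
    # Step 3: rotate the camera-frame offset into north/east by yaw.
    yaw = math.radians(yaw_deg)
    north_m = forward_m * math.cos(yaw) - right_m * math.sin(yaw)
    east_m = forward_m * math.sin(yaw) + right_m * math.cos(yaw)
    # Step 4: metres to degrees, then add to the frame-center position.
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(center_lat)))
    )
    return center_lat + dlat, center_lon + dlon
```

For example, an object 100 px above the image center at a GSD of 0.5 m/px lies 50 m ahead of the aircraft: with yaw 0 that moves only the latitude, with yaw 90 only the longitude.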

## Runtime Modes

### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.

## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~11ms | NVIDIA CUDA-optimized, 90fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~25ms** | Well within 400ms |

### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream and does NOT block per-frame VO output.

**Path A: LiteSAM (if benchmark passes)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to ~480px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~5ms | Pre-resized, from storage |
| LiteSAM (opt) matching | ~300-500ms | TensorRT FP16, 480px, Orin Nano Super estimate |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **~310-510ms** | Async, does not block VO |

**Path B: XFeat (if LiteSAM abandoned)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |

### Per-Frame Wall-Clock Latency

Every frame:
- **VO result emitted in ~25ms** (cuVSLAM + ESKF + SSE)
- Satellite correction arrives asynchronously on keyframes
- Client gets immediate position, then refined position when satellite match completes

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-300MB | NVIDIA CUDA library + internal state |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| Satellite tile (pre-resized) | ~1MB | Single active tile |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~1.9-2.4GB** | ~25-30% of 8GB, comfortable margin |

## Confidence Scoring

| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
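
The table above reads as a simple decision ladder. A minimal sketch follows; the function and argument names are hypothetical, `vo_ok` is taken to mean that cuVSLAM is tracking and consistent, and treating "satellite matched but VO failed" as VERY LOW is a conservative assumption, since the table does not cover that case.

```python
def confidence_level(satellite_ok, vo_ok, travel_since_fix_m=None, manual=False):
    """Map the current estimator inputs to a confidence level.

    travel_since_fix_m is the distance flown since the last accepted
    satellite correction, or None if there has been none.
    """
    if manual:
        return "MANUAL"
    if satellite_ok and vo_ok:
        return "HIGH"
    if vo_ok:
        # VO only: recent correction means less accumulated drift.
        if travel_since_fix_m is not None and travel_since_fix_m < 500.0:
            return "MEDIUM"
        return "LOW"
    # Assumption: without healthy VO, fall back to the IMU-only tier.
    return "VERY LOW"
```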

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| LiteSAM too slow on Orin Nano Super | HIGH | Misses 400ms deadline | **Abandon LiteSAM, use XFeat**. Day-one benchmark is the go/no-go gate |
| cuVSLAM not supporting nadir-only camera well | MEDIUM | VO accuracy degrades | Fall back to XFeat frame-to-frame matching |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| XFeat cross-view accuracy insufficient | MEDIUM | Position corrections less accurate than LiteSAM | Increase keyframe frequency; multi-tile consensus voting; geometric verification with strict RANSAC |
| cuVSLAM is closed-source | LOW | Hard to debug | Fallback to XFeat VO; cuVSLAM has Python+C++ APIs |

| Mode | Trigger | Behavior | Output Label |
|------|---------|----------|--------------|
| Satellite anchored | VPR + local match passes freshness, RANSAC, covariance, and Mahalanobis gates | ESKF absolute update, low covariance, tile generation eligible if sigma gate passes | `satellite_anchored` |
| VO extrapolated | Last anchor fresh enough, VO healthy, normal overlap | VO/IMU propagation, covariance grows with anchor age | `vo_extrapolated` |
| Dead reckoned | Visual blackout, VO failure without anchor, spoofing while no visual signal | IMU-only propagation, monotonic covariance growth, degraded fix type thresholds | `dead_reckoned` |
| Failsafe / no fix | Blackout >30 s or covariance >500 m | `GPS_INPUT.fix_type=0`, `horiz_accuracy=999.0`, QGC status | `dead_reckoned` with failsafe status |
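
The mode table can be sketched as a selection function. The failsafe thresholds and the `fix_type=0` / `horiz_accuracy=999.0` override are taken from the table; the function name, the boolean inputs (`anchor_accepted` standing for all anchor gates passing, `vo_usable` for healthy VO with a fresh enough anchor), and leaving `fix_type` unset outside failsafe are assumptions, because the table does not give the non-failsafe fix-type mapping.

```python
def select_mode(anchor_accepted, vo_usable, blackout_s, sigma_h_m):
    """Return the output label and any GPS_INPUT overrides for this frame."""
    # Failsafe gate is checked first: it overrides every other mode.
    if blackout_s > 30.0 or sigma_h_m > 500.0:
        return {
            "label": "dead_reckoned",
            "failsafe": True,
            "fix_type": 0,
            "horiz_accuracy": 999.0,
        }
    if anchor_accepted:
        label = "satellite_anchored"
    elif vo_usable:
        label = "vo_extrapolated"
    else:
        label = "dead_reckoned"
    return {
        "label": label,
        "failsafe": False,
        "fix_type": None,             # mapping not specified here; left open
        "horiz_accuracy": sigma_h_m,  # covariance-derived accuracy
    }
```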

## Testing Strategy

### Integration / Functional Tests
- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API

- Replay normal overlapping frames and assert AC-2.1a VO registration rate and AC-2.2 VO MRE.
- Replay satellite-anchor cases and assert AC-1.1/1.2, AC-2.2 cross-domain MRE, freshness gates, and source labels.
- Inject stale tiles and assert no `satellite_anchored` output.
- Inject sharp turns and disconnected segments and assert VPR relocalization.
- Run ArduPilot Plane SITL with `GPS1_TYPE=14` and assert valid `GPS_INPUT` fields, fix type degradation, and QGC statuses.
- Inject visual blackout + spoofed GPS and assert spoofed GPS is ignored, covariance grows, and thresholds match AC-NEW-8.
- Request AI-camera object coordinates and assert level-flight projection plus maneuver error bound.

### Non-Functional Tests
- **Day-one benchmark**: LiteSAM TensorRT FP16 at 480/640/800px on Orin Nano Super → go/no-go for LiteSAM
- cuVSLAM benchmark: verify 90fps monocular+IMU on Orin Nano Super
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 1000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff

- Jetson Orin Nano Super profiling: <400 ms p95, <8 GB shared memory, 25 W no-throttle hot-soak.
- Cache build test for 400 km²: imagery + manifests + descriptors fit within budget or fail with explicit budget report.
- 8-hour FDR load test: <=64 GB, rollover logged, no silent data loss.
- Monte Carlo false-anchor and over-confidence tests for AC-NEW-4 and AC-NEW-7.
- Cold boot 50x: first valid `GPS_INPUT` <30 s p95.

## References
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite

Detailed source registry: `_docs/00_research/01_source_registry.md`.

Key sources:
- Fixed-wing satellite-aided VO: https://www.mdpi.com/2076-3417/14/16/7420
- Aerial VPR survey: https://arxiv.org/abs/2406.00885
- OpenVINS docs: https://docs.openvins.com/
- ORB-SLAM3 README: https://raw.githubusercontent.com/UZ-SLAMLab/ORB_SLAM3/master/README.md
- LightGlue README: https://raw.githubusercontent.com/cvg/LightGlue/main/README.md
- FAISS docs: https://faiss.ai/index.html
- ArduPilot GPSInput: https://ardupilot.org/mavproxy/docs/modules/GPSInput.html
- MAVLink GPS_INPUT: https://mavlink.io/en/messages/common.html#GPS_INPUT
- Jetson Orin Nano Super: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/nano-super-developer-kit/
- PMTiles: https://docs.protomaps.com/pmtiles/
- GDAL COG: https://gdal.org/en/stable/drivers/raster/cog.html

## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`

- AC assessment: `_docs/00_research/00_ac_assessment.md`
- Question decomposition: `_docs/00_research/00_question_decomposition.md`
- Source registry: `_docs/00_research/01_source_registry.md`
- Fact cards: `_docs/00_research/02_fact_cards.md`
- Comparison framework: `_docs/00_research/03_comparison_framework.md`
- Reasoning chain: `_docs/00_research/04_reasoning_chain.md`
- Validation log: `_docs/00_research/05_validation_log.md`
- Component fit matrix: `_docs/00_research/06_component_fit_matrix.md`

@@ -4,353 +4,122 @@

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| LiteSAM at 480px as satellite matcher | **Performance**: 497ms on AGX Orin at 1184px. Orin Nano Super is ~3-4x slower. At 480px estimated ~270-360ms, which is borderline. Paper uses PyTorch AMP, not TensorRT FP16. TensorRT could bring 2-3x improvement. | Add TensorRT FP16 as mandatory optimization step. Revised estimate at 480px with TensorRT: ~90-180ms. Still benchmark-driven: abandon if >400ms. |
| XFeat as LiteSAM fallback for satellite matching | **Functional**: XFeat is a general-purpose feature matcher, NOT designed for cross-view satellite-aerial gap. May fail on season/lighting differences between UAV and satellite imagery. | **Expand fallback options**: benchmark EfficientLoFTR (designed for weak-texture aerial) alongside XFeat. Consider STHN-style deep homography as third option. See detailed satellite matcher comparison below. |
| SP+LG considered as "sparse only, worse on satellite-aerial" | **Functional**: LiteSAM paper confirms "SP+LG achieves fastest inference speed but at expense of accuracy." Sparse matcher fails on texture-scarce regions. ~180-360ms on Orin Nano Super. | **Reject SP+LG** for both VO and satellite matching. cuVSLAM is 15-33x faster for VO. |
| cuVSLAM on low-texture terrain | **Functional**: cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade tracking. On uniform agricultural fields/water bodies, features will be sparse → frequent tracking loss. IMU fallback lasts only ~1s. No published benchmarks for nadir agricultural terrain. Does NOT guarantee pose recovery after tracking loss. | **CRITICAL RISK**: cuVSLAM will likely fail frequently over low-texture terrain. Mitigation: (1) increase satellite matching frequency in low-texture areas, (2) use IMU dead-reckoning bridge, (3) accept higher drift in featureless segments, (4) XFeat VO as secondary fallback may also struggle on same terrain. |
| cuVSLAM memory estimate ~200-300MB | **Performance**: Map grows over time. For 3000-frame flights (~16min at 3fps), map could reach 500MB-1GB without pruning. | Configure cuVSLAM map pruning. Set max keyframes. Monitor memory. |
| Tile search on VO failure: "expand to ±1km" | **Functional**: Underspecified. Loading 10-20 tiles from disk is slow. | Preload tiles within ±2km of flight plan into RAM. Ranked search by IMU dead-reckoning position. |
| LiteSAM resolution | **Performance**: Paper benchmarked at 1184px on AGX Orin (497ms AMP). TensorRT FP16 with reparameterized MobileOne expected 2-3x faster. | Benchmark LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM at 1280px. If >200ms → use XFeat. |
| SP+LG proposed for VO by user | **Performance**: ~130-280ms/frame on Orin Nano. cuVSLAM ~8.6ms/frame. No IMU, no loop closure. | **Reject SP+LG for VO.** cuVSLAM 15-33x faster. XFeat frame-to-frame remains fallback. |
| DINOv2-VLAD with possible TensorRT optimization | TensorRT conversion may produce limited speedup and can alter embedding distances on Jetson-class deployments. | Keep DINOv2-VLAD, but require descriptor-fidelity tests against PyTorch/ONNX before TensorRT descriptors are accepted. |
| FAISS CPU/GPU optional | FAISS GPU is not a safe default on Jetson ARM64/aarch64 packaging. | Pin FAISS as CPU-first on Jetson; use PQ/IVF and top-K caps before considering custom GPU builds. |
| LightGlue local matcher | SuperPoint path has license risk and community confusion. | Keep DISK/ALIKED+LightGlue as production default; SuperPoint remains license-gated benchmark/fallback only. |
| COG cache | "COG cache" could be misread as mutable in-place raster updates. | Use write-new COG tile objects plus manifest versioning and sidecars; never mutate COGs in place. |
| `GPS_INPUT` output | ArduPilot velocity ignore flags have reported EKF3 pitfalls. | SITL must validate velocity source parameters, ignore flags, and whether zero velocity is ever fused accidentally. |
| Visual/satellite anchoring | Draft did not emphasize adversarial/cache integrity enough. | Add signed cache manifests, tile provenance, freshness gates, anchor consistency checks, and FDR audit trail. |

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

Build an onboard GPS-denied localization service that runs on the Jetson companion computer, uses the fixed downward navigation camera and flight-controller inertial telemetry, and emits ArduPilot `GPS_INPUT` estimates with calibrated covariance and source labels.

**Hard constraint**: Camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.

The production architecture is a trigger-based hybrid estimator:

**Satellite matching strategy**: Benchmark LiteSAM TensorRT FP16 at **1280px** on Orin Nano Super as a day-one priority. The paper's AGX Orin benchmark used PyTorch AMP; TensorRT FP16 with reparameterized MobileOne should yield 2-3x additional speedup. **Decision rule: if LiteSAM TRT FP16 at 1280px ≤200ms → use LiteSAM. If >200ms → use XFeat.**

**Core architectural principles**:
1. **cuVSLAM handles VO**: 116fps on Orin Nano 8GB, ~8.6ms/frame. SuperPoint+LightGlue was evaluated and rejected (15-33x slower, no IMU integration).
2. **Keyframe-based satellite matching**: satellite matcher runs on keyframes only (every 3-10 frames), amortizing cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization**: handles disconnected segments natively.
4. **Pipeline parallelism**: satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
5. **Proactive tile loading**: preload tiles within ±2km of flight plan into RAM for fast lookup during expanded search.

```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ OFFLINE (Before Flight) │
|
||||
│ Satellite Tiles → Download & Crop → Store as tile pairs │
|
||||
│ (Google Maps) (per flight plan) (disk, GeoHash indexed) │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ ONLINE (During Flight) │
|
||||
│ │
|
||||
│ EVERY FRAME (400ms budget): │
|
||||
│ ┌────────────────────────────────┐ │
|
||||
│ │ Camera → Downsample (CUDA 2ms)│ │
|
||||
│ │ → cuVSLAM VO+IMU (~9ms) │──→ ESKF Update → SSE Emit │
|
||||
│ └────────────────────────────────┘ ↑ │
|
||||
│ │ │
|
||||
│ KEYFRAMES ONLY (every 3-10 frames): │ │
|
||||
│ ┌────────────────────────────────────┐ │ │
|
||||
│ │ Satellite match (async CUDA stream)│─────┘ │
|
||||
│ │ LiteSAM TRT FP16 or XFeat │ │
|
||||
│ │ (does NOT block VO output) │ │
|
||||
│ └────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ IMU: 100+Hz continuous → ESKF prediction │
|
||||
│ TILES: ±2km preloaded in RAM from flight plan │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```text
|
||||
Nav camera + FC telemetry
|
||||
|
|
||||
v
|
||||
Image quality + calibration + orthorectification
|
||||
|
|
||||
+--> Hot path: VO/IMU propagation --> custom ESKF --> GPS_INPUT + QGC + FDR
|
||||
|
|
||||
+--> Trigger path: DINOv2-VLAD query --> CPU FAISS top-K --> DISK/ALIKED+LightGlue --> RANSAC --> ESKF anchor
|
||||
|
|
||||
+--> Tile path: new COG tile + quality/provenance sidecar --> manifest update --> post-flight Satellite Service sync
|
||||
```

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)
NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs.

**Why not SuperPoint+LightGlue for VO**: SP+LG is 15-33x slower (~130-280ms vs ~9ms). Lacks IMU integration, loop closure, auto-fallback.

**CRITICAL: cuVSLAM on difficult/uniform terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corner detection + Lucas-Kanade optical flow tracking (classical features, not learned). On uniform agricultural terrain or water bodies:
- Very few corners will be detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1 second) → constant-velocity integrator (~0.5s more)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
- All published benchmarks (KITTI: urban/suburban, EuRoC: indoor) do NOT include nadir agricultural terrain
- Multi-stereo mode helps with featureless surfaces, but we have mono camera only

**Mitigation strategy for low-texture terrain**:
1. **Increase satellite matching frequency**: In low-texture areas (detected by cuVSLAM's keypoint count dropping), switch from every 3-10 frames to every frame
2. **IMU dead-reckoning bridge**: When cuVSLAM reports tracking loss, ESKF continues with IMU prediction. At 3fps with ~1.5s IMU bridge, that covers ~4-5 frames
3. **Accept higher drift**: In featureless segments, position accuracy degrades to IMU-only level (50-100m+ over ~10s). Satellite matching must recover absolute position when texture returns
4. **Keypoint density monitoring**: Track cuVSLAM's number of tracked features per frame. When below threshold (e.g., <50), proactively trigger satellite matching
5. **XFeat frame-to-frame as VO fallback**: XFeat uses learned features that may detect texture invisible to Shi-Tomasi corners. But XFeat may also struggle on truly uniform terrain

### 2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when ESKF covariance exceeds threshold
  - VO failure: when cuVSLAM reports tracking loss (sharp turn)
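
The three keyframe triggers can be sketched as a single predicate. This is a minimal sketch: `is_keyframe`, the default interval of 5 frames (inside the 3-10 range), and the 50 m horizontal-sigma threshold are hypothetical values chosen for illustration, not tuned parameters.

```python
def is_keyframe(frames_since_keyframe, sigma_h_m, vo_tracking_ok,
                interval=5, sigma_threshold_m=50.0):
    """Decide whether this frame should trigger satellite matching.

    Returns (is_keyframe, reason); reason is None for normal frames.
    """
    # VO failure has priority: recovery needs an absolute fix.
    if not vo_tracking_ok:
        return True, "vo_failure"
    # Confidence drop: ESKF horizontal uncertainty above threshold.
    if sigma_h_m > sigma_threshold_m:
        return True, "confidence_drop"
    # Fixed interval: amortize matcher cost over several frames.
    if frames_since_keyframe >= interval:
        return True, "fixed_interval"
    return False, None
```

Returning the reason alongside the decision makes it straightforward to log why a keyframe fired and to vary the interval when measuring the accuracy versus latency tradeoff.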
|
||||
|
||||
### 3. Satellite Matcher Selection (Benchmark-Driven)
|
||||
|
||||
**Important context**: Our UAV-to-satellite matching is EASIER than typical cross-view geo-localization problems. Both the UAV camera and satellite imagery are approximately nadir (top-down). The main challenges are season/lighting differences, resolution mismatch, and temporal changes — not the extreme viewpoint gap seen in ground-to-satellite matching. This means even general-purpose matchers may perform well.
|
||||
|
||||
**Candidate A: LiteSAM (opt) with TensorRT FP16 at 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne reparameterizable for TensorRT. Paper benchmarked at 497ms on AGX Orin using AMP at 1184px. TensorRT FP16 with reparameterized MobileOne expected 2-3x faster than AMP. At 1280px (close to paper's 1184px benchmark resolution), accuracy should match published results.
|
||||
|
||||
Orin Nano Super TensorRT FP16 estimate at 1280px:
|
||||
- AGX Orin AMP @ 1184px: 497ms
|
||||
- TRT FP16 speedup over AMP: ~2-3x → AGX Orin TRT estimate: ~165-250ms
|
||||
- Orin Nano Super is ~3-4x slower → estimate: ~500-1000ms without TRT
|
||||
- With TRT FP16: **~165-330ms** (realistic range)
|
||||
- Go/no-go threshold: **≤200ms**
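
The estimate chain above can be checked by multiplying the quoted factors through as (low, high) ranges. This is only arithmetic on the numbers quoted in this section (497 ms, 2-3x, 3-4x). The helper names are illustrative, and none of the outputs are measurements.

```python
def scale_range(rng, factor_rng):
    """Multiply a (low, high) latency range by a (low, high) slowdown range."""
    return (rng[0] * factor_rng[0], rng[1] * factor_rng[1])


def divide_range(rng, speedup_rng):
    """Divide a (low, high) latency range by a (low, high) speedup range."""
    return (rng[0] / speedup_rng[1], rng[1] / speedup_rng[0])


AGX_AMP_MS = (497.0, 497.0)   # paper figure: AGX Orin, AMP, 1184px
TRT_SPEEDUP = (2.0, 3.0)      # expected TensorRT FP16 gain over AMP
NANO_SLOWDOWN = (3.0, 4.0)    # Orin Nano Super relative to AGX Orin

agx_trt_ms = divide_range(AGX_AMP_MS, TRT_SPEEDUP)     # about (166, 249)
nano_amp_ms = scale_range(AGX_AMP_MS, NANO_SLOWDOWN)   # about (1491, 1988)
nano_trt_ms = scale_range(agx_trt_ms, NANO_SLOWDOWN)   # about (497, 994)
```

Read literally, the factors give roughly 500-1000 ms for TensorRT FP16 on the Orin Nano Super and roughly 1500-2000 ms without it, so the listed ~165-330 ms range appears to assume a smaller AGX-to-Nano gap than 3-4x. The day-one benchmark against the 200 ms threshold remains the deciding measurement.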
|
||||
|
||||
**Candidate B (fallback): XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. General-purpose, not designed for cross-view gap. FASTEST option. Since our cross-view gap is small (both nadir), XFeat may work adequately for this specific use case.
|
||||
|
||||
**Other evaluated options (not selected)**:
|
||||
|
||||
- **EfficientLoFTR**: Semi-dense, 15.05M params, handles weak-texture well. ~20% slower than LiteSAM. Strong option if LiteSAM codebase proves difficult to export to TRT, but larger model footprint.
|
||||
- **Deep Homography (STHN-style)**: End-to-end homography estimation, no feature/RANSAC pipeline. 4.24m at 50m range. Interesting future option but needs RGB retraining — higher implementation risk.
|
||||
- **PFED and retrieval-based methods**: Image RETRIEVAL only (identifies which tile matches), not pixel-level matching. We already know which tile to use from ESKF position.
|
||||
- **SuperPoint+LightGlue**: Sparse matcher. LiteSAM paper confirms worse satellite-aerial accuracy. Slower than XFeat.
|
||||
|
||||
**Decision rule** (day-one on Orin Nano Super):
|
||||
1. Export LiteSAM (opt) to TensorRT FP16
|
||||
2. Benchmark at **1280px**
|
||||
3. **If ≤200ms → use LiteSAM at 1280px**
|
||||
4. **If >200ms → use XFeat**
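
The decision rule above amounts to one latency measurement and one comparison. A minimal sketch, assuming the exported engine is wrapped in a zero-argument callable `run_once` (a hypothetical name) that performs one full match at 1280px and blocks until the result is ready:

```python
import statistics
import time
from typing import Callable


def median_latency_ms(run_once: Callable[[], None],
                      warmup: int = 5, iters: int = 20) -> float:
    """Median wall-clock latency of run_once in milliseconds.

    GPU inference is asynchronous, so run_once must synchronize
    (wait for the output) before returning, or the timing is meaningless.
    """
    for _ in range(warmup):
        run_once()  # warm caches and lazy initialization
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def select_matcher(litesam_latency_ms: float,
                   threshold_ms: float = 200.0) -> str:
    """Apply the go/no-go rule: LiteSAM if at or under the threshold, else XFeat."""
    return "litesam_1280" if litesam_latency_ms <= threshold_ms else "xfeat"
```

Using the median rather than the mean keeps a single thermal or scheduling spike from flipping the decision. A p95 figure could be recorded alongside it.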
|
||||
|
||||
### 4. TensorRT FP16 Optimization
|
||||
LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. **Do NOT use INT8 on transformer components** (TAIFormer) — accuracy degrades. INT8 is safe only for the MobileOne backbone CNN layers.
|
||||
|
||||
### 5. CUDA Stream Pipelining
|
||||
Overlap operations across consecutive frames:
|
||||
- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
|
||||
- Stream B: Satellite matching for previous keyframe (async)
|
||||
- CPU: SSE emission, tile management, keyframe selection logic
|
||||
|
||||
### 6. Proactive Tile Loading
|
||||
**Change from draft01**: Instead of loading tiles on-demand from disk, preload tiles within ±2km of the flight plan into RAM at session start. This eliminates disk I/O latency during flight. For a 50km flight path, ~2000 tiles at zoom 19 ≈ ~200MB RAM — well within budget.
|
||||
|
||||
On VO failure / expanded search:
|
||||
1. Compute IMU dead-reckoning position
|
||||
2. Rank preloaded tiles by distance to predicted position
|
||||
3. Try top 3 tiles (not all tiles in ±1km radius)
|
||||
4. If no match in top 3, expand to next 3
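
The ranked search above can be sketched as follows. The tile tuple layout, the `try_match` callback, and the `max_batches` cap are assumptions made for illustration. Distance is computed to tile centers with the haversine formula.

```python
import math
from typing import Callable, Iterator, List, Optional, Tuple

Tile = Tuple[str, float, float]  # (tile_id, center_lat_deg, center_lon_deg)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a spherical Earth."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def ranked_batches(tiles: List[Tile], lat: float, lon: float,
                   batch_size: int = 3) -> Iterator[List[Tile]]:
    """Yield tiles nearest-first in batches of batch_size."""
    ranked = sorted(tiles, key=lambda t: haversine_m(lat, lon, t[1], t[2]))
    for i in range(0, len(ranked), batch_size):
        yield ranked[i:i + batch_size]


def relocalize(tiles: List[Tile], lat: float, lon: float,
               try_match: Callable[[Tile], bool],
               max_batches: int = 2) -> Optional[Tile]:
    """Try the top 3 tiles, then the next 3, and stop after max_batches."""
    for n, batch in enumerate(ranked_batches(tiles, lat, lon)):
        if n >= max_batches:
            break
        for tile in batch:
            if try_match(tile):
                return tile
    return None
```

Sorting a few thousand preloaded tiles per failure event is cheap compared with a single matcher call, so no spatial index is needed for this step.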
|
||||
|
||||
## Existing/Competitor Solutions Analysis
|
||||
|
||||
| Solution | Approach | Accuracy | Hardware | Limitations |
|
||||
|----------|----------|----------|----------|-------------|
|
||||
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
|
||||
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
|
||||
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
|
||||
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
|
||||
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
|
||||
| VRLM (2024) | FocalNet backbone + multi-scale feature fusion | 83.35% MA@20 | Desktop | Not edge-optimized |
|
||||
| Scale-Aware UAV-to-Satellite (2026) | Semantic geometric + metric scale recovery | N/A | Desktop | Addresses scale ambiguity problem |
|
||||
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection, semi-dense | Competitive with LiteSAM | 2.5x faster than LoFTR, TRT available | 15.05M params, heavier than LiteSAM |
|
||||
| PFED (2025) | Knowledge distillation + multi-view refinement, retrieval | 97.15% Recall@1 (University-1652) | AGX Orin (251.5 FPS) | Retrieval only, not pixel-level matching |
|
||||
| STHN (IEEE RA-L 2024) | Deep homography estimation, coarse-to-fine | 4.24m at 50m range | Open-source, lightweight | Trained on thermal, needs RGB retraining |
|
||||
| Hierarchical AVL (2025) | DINOv2 retrieval + SuperPoint matching | 64.5-95% success rate | ROS, IMU integration | Two-stage complexity |
|
||||
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Designed for Mars/planetary, needs adaptation |
|
||||
Heavy retrieval and local matching are not steady-state per-frame dependencies. They run on cold start, VO failure, sharp turns, disconnected segments, covariance growth, or stale-anchor age.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Component: Visual Odometry
|
||||
### Component: Camera Ingest, Calibration, And Geometry
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|
||||
|----------|-------|-----------|-------------|------------|-----|
|
||||
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ~9ms/frame | ✅ Best |
|
||||
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ~30-50ms/frame | ⚠️ Fallback |
|
||||
| SuperPoint+LightGlue | LightGlue-ONNX TRT | Good accuracy, adaptive pruning | ~130-280ms, no IMU, no loop closure | ~130-280ms/frame | ❌ Rejected |
|
||||
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ~33ms/frame | ⚠️ Slower |
|
||||
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|
||||
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
|
||||
| OpenCV geometry utility layer | OpenCV 4.x | Calibration, undistortion, RANSAC homography, MRE measurement | Mature, exact fit, permissive | Not a full estimator | Checkerboard calibration, fixed extrinsics, lens/FOV selection | Local-only | Fast enough for hot-path utility use | MVE in `02_fact_cards.md`; Source #5 | Selected |
|
||||
|
||||
**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built by NVIDIA for Jetson. Auto-fallback to IMU when visual tracking fails.
|
||||
### Component: VO / IMU Propagation And Estimator
|
||||
|
||||
**SP+LG rejection rationale**: 15-33x slower than cuVSLAM. No built-in IMU fusion, loop closure, or tracking failure detection. Building these features around SP+LG would take significant development time and still be slower. XFeat at ~30-50ms is a better fallback for VO if cuVSLAM fails on nadir camera.
|
||||
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|
||||
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
|
||||
| Custom VO/IMU ESKF | OpenCV + custom estimator | Nadir VO/homography + FC IMU/attitude/altitude fused in ESKF with mode labels | Owns covariance, source labels, blackout/spoofing behavior | More implementation effort | Synchronized frames/IMU, calibration, replay tests | No network dependency | Hot path is lightweight | Facts #1, #16 | Selected |
|
||||
| OpenVINS | OpenVINS | Monocular+IMU reference runs | Strong EKF/MSCKF reference | GPL-3 production risk | Replay adapter | Local only | Benchmark only | Source #3 | Reference only |
|
||||
| ORB-SLAM3 | ORB-SLAM3 | Monocular-inertial SLAM | Mature benchmark | GPLv3 and heavier SLAM lifecycle | Calibration/vocabulary/runtime tuning | Local only | Riskier on embedded | Source #4 | Rejected for production |
|
||||
|
||||
### Component: Satellite Image Matching
|
||||
### Component: Satellite Retrieval And Anchor Verification
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|
||||
|----------|-------|-----------|-------------|------------|-----|
|
||||
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | Untested on Orin Nano Super with TensorRT | Est. ~165-330ms @ 1280px TRT FP16 | ✅ If ≤200ms |
|
||||
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven, fastest | General-purpose, not designed for cross-view. Our nadir-nadir gap is small → may work. | ~50-100ms | ✅ Fallback if LiteSAM >200ms |
|
||||
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|
||||
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
|
||||
| DINOv2-VLAD + CPU FAISS + DISK/ALIKED+LightGlue | DINOv2/AnyLoc-style descriptors, FAISS CPU, LightGlue, OpenCV RANSAC | Offline VPR chunk descriptors; conditional query descriptor; CPU FAISS top-K; local match on candidates; TensorRT only after fidelity check | Strong retrieval+geometry structure; avoids per-frame map search | Requires profiling and representative data | Descriptor cache, dynamic K, freshness, RANSAC, Mahalanobis gates | Signed manifests, provenance, stale-tile rejection | Trigger path only; top-K capped | MVE blocks in `02_fact_cards.md`; Sources #6-#9, #21-#25 | Selected with runtime/fidelity gates |
|
||||
| SuperPoint+LightGlue | SuperPoint, LightGlue | Same matcher with SuperPoint features | Strong technical baseline | SuperPoint license risk | Legal review | Local only | Benchmark only | Sources #6, #23 | Needs user decision |
|
||||
| Classical SIFT/ORB | OpenCV | Handcrafted features + homography | Simple and cheap | Weak cross-domain robustness | Feature-rich scenes | Local only | Fast | Source #5 | Regression baseline |
|
||||
|
||||
**Selection**: Day-one benchmark on Orin Nano Super:
|
||||
1. Export LiteSAM (opt) to TensorRT FP16
|
||||
2. Benchmark at **1280px**
|
||||
3. **If ≤200ms → LiteSAM at 1280px**
|
||||
4. **If >200ms → XFeat**
|
||||
### Component: Cache And Tile Lifecycle
|
||||
|
||||
### Component: Sensor Fusion
|
||||
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|
||||
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
|
||||
| COG tile objects + manifest + sidecars | GDAL COG, manifest DB/JSON, FAISS index files | Service tiles and generated tiles are write-new COG objects; active version selected by manifest | Geospatial standard, supports provenance and quality metadata | Descriptor budget pressure | CRS/date/source/m/px/freshness, sidecar hashes | Signed manifests, tile provenance, hash verification | Efficient local reads | Source #18; Facts #21, #29 | Selected |
|
||||
| PMTiles | PMTiles | Read-only archive snapshot | Compact read package | Cannot update in place | Archive rebuild | Hash archive | Good for read-only export | Source #17 | Rejected for live cache |
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|
||||
|----------|-------|-----------|-------------|------------|-----|
|
||||
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
|
||||
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
|
||||
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ~10-50ms/step | ❌ Too heavy |
|
||||
### Component: MAVLink Integration
|
||||
|
||||
**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
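
The 16 entries listed above form the nominal state. In an error-state filter the covariance is normally carried over a 15-dimensional error state, because the attitude error is parameterized with 3 components instead of a 4-component quaternion. A minimal sketch of both layouts follows. The block names and ordering are assumptions for illustration, not a fixed specification.

```python
NOMINAL_LAYOUT = {
    "position": 3,
    "velocity": 3,
    "orientation_quat": 4,
    "accel_bias": 3,
    "gyro_bias": 3,
}

# Same blocks, but attitude error is a 3-vector (small-angle rotation).
ERROR_LAYOUT = {
    "position": 3,
    "velocity": 3,
    "orientation_error": 3,
    "accel_bias": 3,
    "gyro_bias": 3,
}


def build_slices(layout):
    """Map each block name to its slice in the stacked vector; return (slices, size)."""
    slices, start = {}, 0
    for name, size in layout.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


NOMINAL_SLICES, NOMINAL_DIM = build_slices(NOMINAL_LAYOUT)  # NOMINAL_DIM == 16
ERROR_SLICES, ERROR_DIM = build_slices(ERROR_LAYOUT)        # ERROR_DIM == 15
```

With this layout the covariance matrix is 15×15, and measurement Jacobians index into it through `ERROR_SLICES`.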
|
||||
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|
||||
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
|
||||
| MAVSDK telemetry + pymavlink `GPS_INPUT` | MAVSDK, pymavlink | MAVSDK subscriptions; pymavlink emits `GPS_INPUT`; Plane SITL validates `GPS1_TYPE=14`, velocity source params, ignore flags, fix types, accuracy fields | Exact output control with good telemetry ergonomics | SITL required to prove Plane behavior | ArduPilot Plane params, QGC, tlog/FDR | Link/source validation, status audit | Light CPU load | Sources #10-#12, #24 | Selected |
|
||||
|
||||
### Component: Satellite Tile Preprocessing (Offline)
|
||||
### Component: Security And Safety Controls
|
||||
|
||||
**Selected**: **GeoHash-indexed tile pairs on disk + RAM preloading**.
|
||||
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|
||||
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
|
||||
| Consistency-gated anchor acceptance | Custom ESKF gates, cache manifest verification | Anchor accepted only if freshness, provenance, RANSAC, covariance, Mahalanobis, and temporal consistency pass | Prevents confident false fixes | Needs calibrated thresholds | Representative replay and Monte Carlo | Rejects stale/poisoned/low-confidence anchors | Lightweight after candidate generation | Facts #16, #17, #28 | Selected |
|
||||
| FDR audit trail | Segmented logs + hashes | Logs estimates, inputs, emitted GPS_INPUT, health, tile writes, anchor decisions | Supports incident analysis and cache-poisoning audits | Schema work | 64 GB rollover | Tamper-evident hashes recommended | Sequential writes | AC-NEW-3 | Selected |
|
||||
|
||||
Pipeline:
|
||||
1. Define operational area from flight plan
|
||||
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
|
||||
3. Pre-resize each tile to matcher input resolution
|
||||
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
|
||||
5. Copy to Jetson storage before flight
|
||||
6. **At session start**: preload tiles within ±2km of flight plan into RAM (~200MB for 50km route)
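
For steps 1-2, tile addresses and ground sampling distance can be derived with the standard Web Mercator formulas. The sketch assumes 256 px tiles and z/x/y addressing for the download step. The function names are illustrative.

```python
import math

WGS84_RADIUS_M = 6378137.0


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int):
    """Web Mercator (slippy map) tile x, y containing the given point."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y


def ground_resolution_m_per_px(lat_deg: float, zoom: int, tile_px: int = 256) -> float:
    """Meters per pixel at the given latitude and zoom level."""
    circumference = 2.0 * math.pi * WGS84_RADIUS_M
    return circumference * math.cos(math.radians(lat_deg)) / (tile_px * 2 ** zoom)
```

At zoom 19 a 256 px tile spans roughly 76 m at the equator (about 0.30 m/px) and less at higher latitudes, so the tile count for a ±2 km corridor can be computed from the flight plan and checked against the RAM preload estimate before the flight.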
|
||||
## Runtime Modes
|
||||
|
||||
### Component: Re-localization (Disconnected Segments)
|
||||
|
||||
**Selected**: **Keyframe satellite matching is always active + ranked tile search on VO failure**.
|
||||
|
||||
When cuVSLAM reports tracking loss (sharp turn, no features):
|
||||
1. Immediately flag next frame as keyframe → trigger satellite matching
|
||||
2. Compute IMU dead-reckoning position since last known position
|
||||
3. Rank preloaded tiles by distance to dead-reckoning position
|
||||
4. Try top 3 tiles sequentially (not all tiles in radius)
|
||||
5. If match found: position recovered, new segment begins
|
||||
6. If 3 consecutive keyframe failures across top tiles: expand to next 3 tiles
|
||||
7. If still no match after 3+ full attempts: request user input via API
|
||||
|
||||
### Component: Object Center Coordinates
|
||||
|
||||
Geometric calculation once frame-center GPS is known:
|
||||
1. Pixel offset from center: (dx_px, dy_px)
|
||||
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
|
||||
3. Rotate by IMU yaw heading
|
||||
4. Convert meter offset to lat/lon and add to frame-center GPS
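
The four steps above can be sketched as one function. Several conventions here are assumptions, since the text does not fix them: image x points to the aircraft's right, image y points backward (so the top of the frame is the nose direction), yaw is measured clockwise from north, the camera is exactly nadir, and a local flat-earth approximation is used for the final conversion.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius, flat-earth approximation


def object_gps(center_lat: float, center_lon: float,
               dx_px: float, dy_px: float,
               gsd_m_per_px: float, yaw_deg: float):
    """Return (lat, lon) of a pixel offset (dx_px, dy_px) from the frame center."""
    # Step 2: pixels to meters in the body frame.
    right_m = dx_px * gsd_m_per_px
    forward_m = -dy_px * gsd_m_per_px  # image y grows downward

    # Step 3: rotate body-frame offset into north/east using yaw.
    yaw = math.radians(yaw_deg)
    north_m = forward_m * math.cos(yaw) - right_m * math.sin(yaw)
    east_m = forward_m * math.sin(yaw) + right_m * math.cos(yaw)

    # Step 4: meters to degrees around the frame-center latitude.
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(center_lat))))
    return center_lat + dlat, center_lon + dlon
```

For example, with yaw 0 and a GSD of 0.5 m/px, a pixel 100 px above the center maps to a point 50 m north of the frame-center position.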
|
||||
|
||||
### Component: API & Streaming
|
||||
|
||||
**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
|
||||
|
||||
## Processing Time Budget (per frame, 400ms budget)
|
||||
|
||||
### Normal Frame (non-keyframe, ~60-80% of frames)
|
||||
|
||||
| Step | Time | Notes |
|
||||
|------|------|-------|
|
||||
| Image capture + transfer | ~10ms | CSI/USB3 |
|
||||
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
|
||||
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps capable |
|
||||
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
|
||||
| SSE emit | ~1ms | Async |
|
||||
| **Total** | **~23ms** | Well within 400ms |
|
||||
|
||||
### Keyframe Satellite Matching (async, every 3-10 frames)
|
||||
|
||||
Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.
|
||||
|
||||
**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:
|
||||
|
||||
| Step | Time | Notes |
|
||||
|------|------|-------|
|
||||
| Downsample to 1280px | ~1ms | OpenCV CUDA |
|
||||
| Load satellite tile | ~1ms | Pre-loaded in RAM |
|
||||
| LiteSAM (opt) TRT FP16 matching | ≤200ms | TensorRT FP16, 1280px, go/no-go threshold |
|
||||
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
|
||||
| ESKF satellite update | ~1ms | Delayed measurement |
|
||||
| **Total** | **≤210ms** | Async, within budget |
|
||||
|
||||
**Path B — XFeat (if LiteSAM >200ms)**:
|
||||
|
||||
| Step | Time | Notes |
|
||||
|------|------|-------|
|
||||
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
|
||||
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
|
||||
| Geometric verification (RANSAC) | ~5ms | |
|
||||
| ESKF satellite update | ~1ms | |
|
||||
| **Total** | **~50-80ms** | Comfortably within budget |
|
||||
|
||||
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
|
||||
|
||||
| Component | Memory | Notes |
|
||||
|-----------|--------|-------|
|
||||
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
|
||||
| cuVSLAM | ~200-500MB | CUDA library + map state. **Configure map pruning for 3000-frame flights** |
|
||||
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
|
||||
| Preloaded satellite tiles | ~200MB | ±2km of flight plan, pre-resized |
|
||||
| Current frame (downsampled) | ~2MB | 640×480×3 |
|
||||
| ESKF state + buffers | ~10MB | |
|
||||
| FastAPI + SSE runtime | ~100MB | |
|
||||
| **Total** | **~2.1-2.4GB** | Sum of the rows above; ~26-30% of 8GB, comfortable margin |
|
||||
|
||||
## Confidence Scoring
|
||||
|
||||
| Level | Condition | Expected Accuracy |
|
||||
|-------|-----------|-------------------|
|
||||
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
|
||||
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
|
||||
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
|
||||
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
|
||||
| MANUAL | User-provided position | As provided |
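
The table above can be read as a small classification function. This is a sketch under assumptions: the argument names are hypothetical, and the case the table does not cover (satellite match succeeded while cuVSLAM tracking is lost) is resolved here by falling through to the travel-distance rule.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"
    VERY_LOW = "VERY LOW"
    MANUAL = "MANUAL"


def classify_confidence(satellite_match_ok: bool,
                        vo_tracking_ok: bool,
                        vo_consistent: bool,
                        travel_since_anchor_m: float,
                        manual_position: bool = False,
                        recent_anchor_m: float = 500.0) -> Confidence:
    """Map the current estimator status to a confidence level."""
    if manual_position:
        return Confidence.MANUAL
    if not vo_tracking_ok and not satellite_match_ok:
        return Confidence.VERY_LOW  # IMU dead-reckoning only
    if satellite_match_ok and vo_tracking_ok and vo_consistent:
        return Confidence.HIGH
    # Remaining cases are ranked by distance travelled since the last
    # accepted satellite correction (500 m boundary from the table).
    if travel_since_anchor_m < recent_anchor_m:
        return Confidence.MEDIUM
    return Confidence.LOW
```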
|
||||
|
||||
## Key Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|------|-----------|--------|------------|
|
||||
| **cuVSLAM fails on low-texture agricultural terrain** | **HIGH** | Frequent tracking loss, degraded VO | Increase satellite matching frequency when keypoint count drops. IMU dead-reckoning bridge (~1.5s). Accept higher drift in featureless segments. Satellite matching recovers position when texture returns. |
|
||||
| LiteSAM TRT FP16 >200ms at 1280px on Orin Nano Super | MEDIUM | Must use XFeat instead (less accurate for cross-view) | Day-one TRT FP16 benchmark. If >200ms → XFeat. Since our nadir-nadir gap is small, XFeat may still perform adequately. |
|
||||
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Benchmark XFeat on actual operational area satellite-aerial pairs. Increase keyframe frequency; multi-tile consensus; strict RANSAC. |
|
||||
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, set max keyframes. Monitor memory. |
|
||||
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
|
||||
| cuVSLAM is closed-source, no nadir benchmarks | MEDIUM | Unknown failure modes over farmland | Extensive testing with real nadir UAV imagery before deployment. XFeat VO as fallback (uses learned features rather than Shi-Tomasi corners). |
|
||||
| Tile I/O bottleneck during expanded search | LOW | Delayed re-localization | Preload ±2km tiles in RAM; ranked search instead of exhaustive |
|
||||
| Mode | Trigger | Behavior | `GPS_INPUT` / Telemetry |
|
||||
|------|---------|----------|--------------------------|
|
||||
| `satellite_anchored` | VPR + local match passes all gates | ESKF absolute update; tile write eligible only if sigma gate passes | 3D fix, `horiz_accuracy` >= 95% covariance semi-major axis |
|
||||
| `vo_extrapolated` | VO healthy and anchor age/covariance within bounds | VO/IMU propagation; covariance grows | 3D or 2D fix, depending on covariance threshold |
|
||||
| `dead_reckoned` | visual blackout or no accepted anchor | IMU-only propagation, monotonic covariance growth | degraded fix type; QGC `VISUAL_BLACKOUT_IMU_ONLY` |
|
||||
| failsafe/no-fix | covariance >500 m or blackout >30 s | stop pretending position is valid | `fix_type=0`, `horiz_accuracy=999.0`, QGC `VISUAL_BLACKOUT_FAILSAFE` |
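
The mode table above can be turned into a mapping from estimator status to the `fix_type` and `horiz_accuracy` fields. The sketch below makes several assumptions: the "covariance >500 m" trigger is interpreted as the 95% semi-major axis, `dead_reckoned` is reported as a 2D fix, the 100 m boundary between 3D and 2D is a placeholder, and the function only builds a dictionary rather than calling pymavlink. The `fix_type` values follow the MAVLink convention (0-1 no fix, 2 2D fix, 3 3D fix).

```python
import math

CHI2_95_2DOF = 5.991  # 95% quantile of chi-square with 2 degrees of freedom


def semi_major_95_m(cov_nn: float, cov_ne: float, cov_ee: float) -> float:
    """95% error-ellipse semi-major axis of a 2x2 horizontal covariance (m^2)."""
    mean = 0.5 * (cov_nn + cov_ee)
    half_diff = 0.5 * (cov_nn - cov_ee)
    lam_max = mean + math.hypot(half_diff, cov_ne)  # largest eigenvalue
    return math.sqrt(CHI2_95_2DOF * lam_max)


def gps_input_fields(mode: str, accuracy_m: float, blackout_s: float,
                     fix_2d_above_m: float = 100.0) -> dict:
    """Return fix_type, horiz_accuracy and status text for the given mode."""
    if accuracy_m > 500.0 or blackout_s > 30.0:
        return {"fix_type": 0, "horiz_accuracy": 999.0,
                "status": "VISUAL_BLACKOUT_FAILSAFE"}
    if mode == "satellite_anchored":
        return {"fix_type": 3, "horiz_accuracy": accuracy_m, "status": None}
    if mode == "vo_extrapolated":
        fix = 3 if accuracy_m <= fix_2d_above_m else 2
        return {"fix_type": fix, "horiz_accuracy": accuracy_m, "status": None}
    if mode == "dead_reckoned":
        return {"fix_type": 2, "horiz_accuracy": accuracy_m,
                "status": "VISUAL_BLACKOUT_IMU_ONLY"}
    raise ValueError("unknown mode: %s" % mode)
```

Checking the failsafe condition first ensures that no mode label can keep a stale position marked as valid once the covariance or blackout limit is exceeded.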
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Integration / Functional Tests
|
||||
- End-to-end pipeline test with real flight data (60 images from input_data/)
|
||||
- Compare computed positions against ground truth GPS from coordinates.csv
|
||||
- Measure: percentage within 50m, percentage within 20m
|
||||
- Test sharp-turn handling: introduce 90-degree heading change in sequence
|
||||
- Test user-input fallback: simulate 3+ consecutive failures
|
||||
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
|
||||
- Test session management: start/stop/restart flight sessions via REST API
|
||||
- Test cuVSLAM map memory: run 3000-frame session, monitor memory growth
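
The accuracy measurement in the list above (percentage within 50m and within 20m) can be sketched as follows. The input format is an assumption: both arguments are lists of (lat, lon) pairs in matching order, with the parsing of `coordinates.csv` left out because its column layout is not given here.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a spherical Earth."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def accuracy_report(estimates, truth, thresholds_m=(20.0, 50.0)):
    """Percentage of frames whose horizontal error is within each threshold."""
    if not estimates or len(estimates) != len(truth):
        raise ValueError("estimates and truth must be non-empty and equal length")
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimates, truth)]
    return {thr: 100.0 * sum(err <= thr for err in errors) / len(errors)
            for thr in thresholds_m}
```

The returned percentages can be compared directly with the targets of 80% within 50 m and 60% within 20 m.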
|
||||
|
||||
- VO replay: assert AC-2.1a and AC-2.2 VO MRE on overlapping frame pairs.
|
||||
- Satellite anchor replay: assert AC-1.1/1.2, AC-2.2 cross-domain MRE, freshness rejection, and source labels.
|
||||
- DINOv2 descriptor fidelity: compare PyTorch/ONNX/TensorRT embeddings and retrieval rankings before accepting optimized engines.
|
||||
- FAISS CPU index tests: top-K recall, query latency, index size, save/load behavior on Jetson ARM64.
|
||||
- LightGlue extractor matrix: DISK vs ALIKED vs SuperPoint benchmark; SuperPoint output excluded from production unless license approved.
|
||||
- COG cache lifecycle: write-new generated tile, update manifest, verify active version and rollback.
|
||||
- `GPS_INPUT` SITL: validate fix type, `horiz_accuracy`, velocity fields, ignore flags, `EK3_SRC1_*` parameters, QGC behavior.
|
||||
- Security gates: stale tile, mismatched tile hash, low inlier ratio, impossible velocity jump, and spoofed GPS during blackout.
|
||||
|
||||
### Non-Functional Tests
|
||||
- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM. If >200ms → use XFeat. Also measure accuracy on test satellite-aerial pairs for both.
|
||||
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
|
||||
- **cuVSLAM terrain stress test**: test with nadir camera over (a) urban/structured terrain, (b) agricultural fields, (c) water/uniform terrain, (d) forest. Measure: keypoint count, tracking success rate, drift per 100 frames, IMU fallback frequency
|
||||
- cuVSLAM keypoint monitoring: verify that low-keypoint detection triggers increased satellite matching
|
||||
- Performance: measure per-frame processing time (must be <400ms)
|
||||
- Memory: monitor peak usage during 3000-frame session (must stay <8GB)
|
||||
- Stress: process 3000 frames without memory leak
|
||||
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
|
||||
- Tile preloading: verify RAM usage of preloaded tiles for 50km flight plan
|
||||
|
||||
- Jetson latency and memory: <400 ms p95, <8 GB shared memory, no 25 W thermal throttle.
|
||||
- Cache budget: 400 km² imagery + manifests + descriptors fits budget or reports explicit split budget.
|
||||
- FDR 8-hour load: <=64 GB, rollover logged, no silent payload loss.
|
||||
- Monte Carlo false-position and cache-poisoning tests for AC-NEW-4 and AC-NEW-7.
|
||||
- Cold boot: first valid `GPS_INPUT` <30 s p95 across 50 runs.
|
||||
|
||||
## References
|
||||
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
|
||||
- EfficientLoFTR paper: https://zju3dv.github.io/efficientloftr/
|
||||
- LoFTR TensorRT adaptation: https://github.com/Kolkir/LoFTR_TRT
|
||||
- PFED (2025): https://github.com/SkyEyeLoc/PFED
|
||||
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
|
||||
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
|
||||
- Hierarchical AVL (MDPI 2025): https://www.mdpi.com/2072-4292/17/20/3470
|
||||
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
|
||||
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
|
||||
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
|
||||
- cuVSLAM paper: https://arxiv.org/abs/2506.04359
|
||||
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
|
||||
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
|
||||
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
|
||||
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
|
||||
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
|
||||
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
|
||||
- LightGlue (ICCV 2023): https://github.com/cvg/LightGlue
|
||||
- LightGlue TensorRT: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
|
||||
- LightGlue TRT Jetson: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
|
||||
- ForestVO / SP+LG VO: https://arxiv.org/html/2504.01261v1
|
||||
- vo_lightglue (SP+LG VO): https://github.com/himadrir/vo_lightglue
|
||||
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
|
||||
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
|
||||
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
|
||||
|
||||
Detailed source registry: `_docs/00_research/01_source_registry.md`.
|
||||
|
||||
Key added Mode B sources:
|
||||
- DINOv2 TensorRT issue: https://github.com/NVIDIA/TensorRT/issues/4348
|
||||
- DINOv2 Jetson forum issue: https://forums.developer.nvidia.com/t/dinov2-tensorrt-model-performance-issue/312251
|
||||
- LightGlue license discussion: https://github.com/cvg/LightGlue/issues/120
|
||||
- ArduPilot GPS_INPUT velocity issue: https://github.com/ArduPilot/ardupilot/issues/19633
|
||||
- FAISS install docs: https://github.com/facebookresearch/faiss/blob/main/INSTALL.md
|
||||
- Orthophoto visual geolocalization: https://ar5iv.labs.arxiv.org/html/2103.14381
|
||||
|
||||
## Related Artifacts
|
||||
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
|
||||
|
||||
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
|
||||
- Security analysis: `_docs/01_solution/security_analysis.md`
|
||||
- Component fit matrix: `_docs/00_research/06_component_fit_matrix.md`
|
||||
- Fact cards: `_docs/00_research/02_fact_cards.md`
|
||||
|
||||
+45
-243
@@ -1,257 +1,59 @@
|
||||
# Tech Stack Evaluation
|
||||
|
||||
## Requirements Summary
|
||||
## Requirements Analysis
|
||||
|
||||
### Functional
|
||||
- GPS-denied visual navigation for fixed-wing UAV
|
||||
- Frame-center GPS estimation via VO + satellite matching + IMU fusion
|
||||
- Object-center GPS via geometric projection
|
||||
- Real-time streaming via REST API + SSE
|
||||
- Disconnected route segment handling
|
||||
- User-input fallback for unresolvable frames
|
||||
|
||||
### Non-Functional
|
||||
- <400ms per-frame processing (camera @ ~3fps)
|
||||
- <50m accuracy for 80% of frames, <20m for 60%
|
||||
- <8GB total memory (CPU+GPU shared pool)
|
||||
- Up to 3000 frames per flight session
|
||||
- Image Registration Rate >95% (normal segments)
|
||||
|
||||
### Hardware Constraints
|
||||
- **Jetson Orin Nano Super** (8GB LPDDR5, 1024 CUDA cores, 67 TOPS INT8)
|
||||
- **JetPack 6.2.2**: CUDA 12.6.10, TensorRT 10.3.0, cuDNN 9.3
|
||||
- ARM64 (aarch64) architecture
|
||||
- No internet connectivity during flight
|
||||
| Area | Requirement |
|
||||
|------|-------------|
|
||||
| Runtime | Jetson Orin Nano Super, Ubuntu/JetPack, CUDA/TensorRT available, 8 GB shared memory, 25 W thermal envelope. |
|
||||
| Language | Python is acceptable for orchestration/prototyping; C++/TensorRT paths likely needed for hot vision loops. |
|
||||
| Vision | Calibration, undistortion, homography, VPR descriptors, local matching, RANSAC verification. |
|
||||
| Estimation | BASALT VIO for relative camera+IMU propagation, wrapped by project-owned safety/anchor estimator logic for covariance calibration, source labels, blackout/spoofing modes, and MAVLink output mapping. |
|
||||
| Storage | Offline satellite cache, descriptor index, FDR with no raw frame retention. |
|
||||
| Autopilot | ArduPilot Plane, `GPS_INPUT` through pymavlink, MAVSDK for telemetry. |
|
||||
|
||||
## Technology Evaluation
|
||||
|
||||
### Platform & OS
|
||||
|
||||
| Option | Version | Score (1-5) | Notes |
|
||||
|--------|---------|-------------|-------|
|
||||
| **JetPack 6.2.2 (L4T)** | Ubuntu 22.04 based | **5** | Only supported OS for Orin Nano Super. Includes CUDA 12.6, TensorRT 10.3, cuDNN 9.3 |
|
||||
|
||||
**Selected**: JetPack 6.2.2 — no alternative.
|
||||
|
||||
### Primary Language
|
||||
|
||||
| Option | Fitness | Maturity | Perf on Jetson | Ecosystem | Score |
|
||||
|--------|---------|----------|----------------|-----------|-------|
|
||||
| **Python 3.10+** | 5 | 5 | 4 | 5 | **4.8** |
|
||||
| C++ | 5 | 5 | 5 | 3 | 4.5 |
|
||||
| Rust | 3 | 3 | 5 | 2 | 3.3 |
|
||||
|
||||
**Selected**: **Python 3.10+** as primary language.
|
||||
- cuVSLAM provides Python bindings (PyCuVSLAM v15.0.0)
|
||||
- TensorRT has Python API
|
||||
- FastAPI is Python-native
|
||||
- OpenCV has full Python+CUDA bindings
|
||||
- Performance-critical paths offloaded to CUDA via cuVSLAM/TensorRT — Python is glue code only
|
||||
- C++ for custom ESKF if NumPy proves too slow (unlikely for 16-state EKF at 100Hz)

### Visual Odometry

| Option | Version | Speed on Orin Nano | Memory | License | Score |
|--------|---------|--------------------|--------|---------|-------|
| **cuVSLAM (PyCuVSLAM)** | v15.0.0 (Mar 2026) | 116 fps @ 720p | ~200-300 MB | Free (NVIDIA, closed-source) | **5** |
| XFeat frame-to-frame | TensorRT engine | ~30-50 ms/frame | ~50 MB | MIT | 3.5 |
| ORB-SLAM3 | v1.0 | ~30 fps | ~300 MB | GPLv3 | 2.5 |

**Selected**: **PyCuVSLAM v15.0.0**
- 116 fps on Orin Nano 8G at 720p (verified via Intermodalics benchmark)
- Mono + IMU mode natively supported
- Auto IMU fallback on tracking loss
- Pre-built aarch64 wheel: `pip install -e bin/aarch64`
- Loop closure built-in

**Risk**: Closed-source; nadir-only camera not explicitly tested. **Fallback**: XFeat frame-to-frame matching.

### Satellite Image Matching (Benchmark-Driven Selection)

**Day-one benchmark decides between two candidates:**

| Option | Params | Accuracy (UAV-VisLoc) | Est. Time on Orin Nano | License | Score |
|--------|--------|----------------------|----------------------|---------|-------|
| **LiteSAM (opt)** | 6.31M | RMSE@30 = 17.86 m | ~300-500 ms @ 480px (estimated) | Open-source | **4** (if fast enough) |
| **XFeat semi-dense** | ~5M | Not benchmarked on UAV-VisLoc | ~50-100 ms | MIT | **4** (if LiteSAM too slow) |

**Decision rule**:
1. Export LiteSAM (opt) to TensorRT FP16 on Orin Nano Super
2. Benchmark at 480px, 640px, 800px
3. If ≤400 ms at 480px → LiteSAM
4. If >400 ms → **abandon LiteSAM, XFeat is primary**
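
The decision rule above can be sketched as a small selection function. This is a minimal sketch, assuming the day-one benchmark reports one latency figure per input size; the name `select_matcher` and the dictionary layout are hypothetical, and only the 400 ms budget at 480px comes from the rule itself.

```python
LITESAM_BUDGET_MS = 400.0  # go/no-go threshold taken from the decision rule
DECISION_SIZE_PX = 480     # input size at which the rule is evaluated


def select_matcher(latency_ms_by_size: dict[int, float]) -> str:
    """Pick the satellite matcher from LiteSAM benchmark latencies.

    `latency_ms_by_size` maps input size in pixels to measured latency in
    milliseconds (hypothetical layout, e.g. {480: 350.0, 640: 520.0}).
    """
    latency = latency_ms_by_size.get(DECISION_SIZE_PX)
    if latency is None:
        raise ValueError("benchmark must include the 480px measurement")
    # At or under budget keeps LiteSAM; anything slower falls back to XFeat.
    return "LiteSAM" if latency <= LITESAM_BUDGET_MS else "XFeat"
```

The 640px and 800px measurements are carried in the same mapping for reporting, but under the rule as written only the 480px figure drives the choice.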

| Requirement | LiteSAM (opt) | XFeat semi-dense |
|-------------|---------------|------------------|
| PyTorch → ONNX → TensorRT export | Required | Required |
| TensorRT FP16 engine | 6.31M params, ~25 MB engine | ~5M params, ~20 MB engine |
| Input preprocessing | Resize to 480px, normalize | Resize to 640px, normalize |
| Matching pipeline | End-to-end (detect + match + refine) | Detect → KNN match → geometric verify |
| Cross-view robustness | Designed for satellite-aerial gap | General-purpose, less robust |

### Sensor Fusion

| Option | Complexity | Accuracy | Compute @ 100 Hz | Score |
|--------|-----------|----------|-----------------|-------|
| **ESKF (custom)** | Low | Good | <1 ms/step | **5** |
| Hybrid ESKF/UKF | Medium | 49% better | ~2-3 ms/step | 3.5 |
| GTSAM Factor Graph | High | Best | ~10-50 ms/step | 2 |

**Selected**: **Custom ESKF in Python (NumPy/SciPy)**
- 16-state vector, well within NumPy capability
- FilterPy (v1.4.5, MIT) as reference/fallback, but custom implementation preferred for tighter control
- If the 100 Hz IMU prediction step proves slow in Python: rewrite as Cython or C extension (~1 day effort)

### Image Preprocessing

| Option | Tool | Time on Orin Nano | Notes | Score |
|--------|------|-------------------|-------|-------|
| **OpenCV CUDA resize** | cv2.cuda.resize | ~2-3 ms (pre-allocated) | Must build OpenCV with CUDA from source. Pre-allocate GPU mats to avoid allocation overhead | **4** |
| NVIDIA VPI resize | VPI 3.2 | ~1-2 ms | Part of JetPack, potentially faster | 4 |
| CPU resize (OpenCV) | cv2.resize | ~5-10 ms | No GPU needed, simpler | 3 |

**Selected**: **OpenCV CUDA** (pre-allocated GPU memory) or **VPI 3.2**, whichever is faster in the benchmark. Both run on JetPack 6.2, with different setup costs:
- OpenCV must be built from source with `CUDA_ARCH_BIN=8.7` for the Orin Nano Ampere architecture
- VPI 3.2 is pre-installed in JetPack 6.2, so no build step is needed

### API & Streaming Framework

| Option | Version | Async Support | SSE Support | Score |
|--------|---------|--------------|-------------|-------|
| **FastAPI + sse-starlette** | FastAPI 0.115+, sse-starlette 3.3.2 | Native async/await | EventSourceResponse with auto-disconnect | **5** |
| Flask + flask-sse | Flask 3.x | Limited | Redis dependency | 2 |
| Raw aiohttp | aiohttp 3.x | Full | Manual SSE implementation | 3 |

**Selected**: **FastAPI + sse-starlette v3.3.2**
- sse-starlette: 108M downloads/month, BSD-3 license, production-stable
- Auto-generated OpenAPI docs
- Native async for non-blocking VO + satellite pipeline
- Uvicorn as ASGI server

### Satellite Tile Storage & Indexing

| Option | Complexity | Lookup Speed | Score |
|--------|-----------|-------------|-------|
| **GeoHash-indexed directory** | Low | O(1) hash lookup | **5** |
| SQLite + spatial index | Medium | O(log n) | 4 |
| PostGIS | High | O(log n) | 2 (overkill) |

**Selected**: **GeoHash-indexed directory structure**
- Pre-flight: download tiles, store as `{geohash}/{zoom}_{x}_{y}.jpg` + `{geohash}/{zoom}_{x}_{y}_resized.jpg`
- Runtime: compute geohash from ESKF position → direct directory lookup
- Metadata in JSON sidecar files
- No database dependency on the Jetson during flight
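
The directory lookup described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the geohash encoder is the standard algorithm written out by hand so the block is self-contained (a library such as `pygeohash` could stand in for it), and the helper name `tile_paths` and the default precision of 6 are assumptions.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a WGS84 position as a geohash string of `precision` characters."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value, use_lon = [], 0, 0, True  # bits alternate lon, lat, lon, ...
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def tile_paths(lat, lon, zoom, x, y, precision=6):
    """Return (original, resized) relative paths in the layout listed above."""
    stem = f"{geohash_encode(lat, lon, precision)}/{zoom}_{x}_{y}"
    return f"{stem}.jpg", f"{stem}_resized.jpg"
```

A shorter geohash puts more tiles in each directory, so the precision would need to be chosen against the tile footprint at the cached zoom level.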

### Satellite Tile Provider

| Provider | Max Zoom | GSD | Pricing | Eastern Ukraine Coverage | Score |
|----------|----------|-----|---------|--------------------------|-------|
| **Google Maps Tile API** | 18-19 | ~0.3-0.5 m/px | 100K tiles free/month, then $0.48/1K | Partial (conflict zone gaps) | **4** |
| Bing Maps | 18-19 | ~0.3-0.5 m/px | 125K free/year (basic) | Similar | 3.5 |
| Mapbox Satellite | 18-19 | ~0.5 m/px | 200K free/month | Similar | 3.5 |

**Selected**: **Google Maps Tile API** (per restrictions.md). 100K free tiles/month covers ~25km² at zoom 19. For larger operational areas, costs are manageable at $0.48/1K tiles.
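
The pricing quoted above translates into a simple monthly estimate. The sketch below uses only the two figures from the table (100K free tiles per month, then $0.48 per 1K tiles); the function name is hypothetical, and actual billing terms should be checked against the provider before budgeting.

```python
FREE_TILES_PER_MONTH = 100_000  # free monthly quota quoted in the table
USD_PER_1K_TILES = 0.48         # price per 1,000 tiles beyond the quota


def monthly_tile_cost_usd(tiles_requested: int) -> float:
    """Estimate the monthly cost of downloading `tiles_requested` tiles."""
    billable = max(0, tiles_requested - FREE_TILES_PER_MONTH)
    return billable / 1000 * USD_PER_1K_TILES
```

For example, a pre-flight download of 150,000 tiles in one month leaves 50,000 billable tiles, which this estimate prices at about $24.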

### Output Format

| Format | Standard | Tooling | Score |
|--------|----------|---------|-------|
| **GeoJSON** | RFC 7946 | Universal GIS support | **5** |
| CSV (lat, lon, confidence) | De facto | Simple, lightweight | 4 |

**Selected**: **GeoJSON** as primary, CSV as export option. Per AC: WGS84 coordinates.
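
A position fix in the selected format could look like the sketch below. The `confidence` property mirrors the CSV columns listed above; the function name and the exact property set are assumptions. Note that RFC 7946 orders coordinates as longitude first, then latitude, which is the reverse of the CSV column order.

```python
import json


def position_feature(lat: float, lon: float, confidence: float) -> dict:
    """Build a GeoJSON Feature for one WGS84 position estimate."""
    return {
        "type": "Feature",
        # RFC 7946: coordinates are [longitude, latitude]
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"confidence": confidence},
    }


if __name__ == "__main__":
    # Serialize one illustrative fix as it might appear in an SSE payload.
    print(json.dumps(position_feature(48.5, 37.9, 0.8)))
```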

| Layer | Selected | Alternatives Considered | Rationale | Risk |
|-------|----------|-------------------------|-----------|------|
| OS / GPU stack | JetPack Ubuntu + CUDA + TensorRT | Plain Ubuntu without JetPack | Required for Jetson acceleration and profiling. | Thermal/performance tuning. |
| Calibration / geometry | OpenCV 4.x | Custom NumPy-only geometry | Mature APIs for calibration, undistortion, homography, RANSAC. | Version pin and calibration quality. |
| VO / estimator | BASALT + project-owned safety/anchor wrapper | OpenVINS, Kimera-VIO, custom OpenCV/ESKF, ORB-SLAM3, VINS-Fusion | BASALT is the selected production VIO candidate; wrapper owns source labels, covariance gates, degraded modes, satellite-anchor acceptance, and MAVLink semantics. | BASALT confidence/covariance must be calibrated; nadir fixed-wing replay required. |
| VPR descriptors | DINOv2-VLAD / AnyLoc-style, model-size profiled, TensorRT only after fidelity check | MixVPR, SALAD, NetVLAD, classical BoW | Strong retrieval evidence; good offline descriptor model. | Memory/latency and embedding drift if optimized incorrectly. |
| Vector search | FAISS CPU-first on Jetson ARM64 | HNSWLIB, PostgreSQL/pgvector metadata-assisted search, brute-force NumPy, custom FAISS GPU build | Mature top-K search, save/load, PQ compression; PostgreSQL stores spatial/mission metadata around descriptor files. | CPU query latency must be profiled; GPU FAISS is not default on aarch64. |
| Local matching | DISK/ALIKED + LightGlue | SuperPoint+LightGlue, SIFT/ORB, LoFTR | Exact match outputs; Apache/BSD-friendly path; adaptive speed knobs. | Jetson profiling needed. |
| Raster cache | COG + PostgreSQL/PostGIS manifest + signed JSON sidecars | PMTiles, MBTiles, loose tile folders | COG fits geospatial raster and write-new-tile workflow; PostGIS supports spatial/freshness queries. | 10 GB budget pressure and local DB availability. |
| MAVLink | MAVSDK telemetry + pymavlink `GPS_INPUT` | MAVSDK-only, MAVProxy bridge | MAVSDK does telemetry well; `GPS_INPUT` needs raw field control. | Plane SITL validation. |
| FDR | PostgreSQL event index + CBOR/binary payload segments with optional Parquet export | Raw images, plain CSV | Queryable event metadata, bounded payload segments, no raw frame retention. | Schema and local DB availability. |
| Testing | ArduPilot Plane SITL + AerialVL/VPAir + EuRoC + representative flight replay | Public datasets only | No public dataset covers all ACs. | Representative data collection required. |

## Tech Stack Summary

```
┌────────────────────────────────────────────────────┐
│ HARDWARE: Jetson Orin Nano Super 8GB               │
│ OS: JetPack 6.2.2 (L4T / Ubuntu 22.04)             │
│ CUDA 12.6.10 / TensorRT 10.3.0 / cuDNN 9.3         │
├────────────────────────────────────────────────────┤
│ LANGUAGE: Python 3.10+                             │
│ FRAMEWORK: FastAPI + sse-starlette 3.3.2           │
│ SERVER: Uvicorn (ASGI)                             │
├────────────────────────────────────────────────────┤
│ VISUAL ODOMETRY: PyCuVSLAM v15.0.0                 │
│ SATELLITE MATCH: LiteSAM(opt) or XFeat (benchmark) │
│ SENSOR FUSION: Custom ESKF (NumPy/SciPy)           │
│ PREPROCESSING: OpenCV CUDA or VPI 3.2              │
│ INFERENCE: TensorRT 10.3.0 (FP16)                  │
├────────────────────────────────────────────────────┤
│ TILE PROVIDER: Google Maps Tile API                │
│ TILE STORAGE: GeoHash-indexed directory            │
│ OUTPUT: GeoJSON (WGS84) via SSE stream             │
└────────────────────────────────────────────────────┘
```

## Dependency List

### Python Packages (pip)

| Package | Version | Purpose |
|---------|---------|---------|
| pycuvslam | v15.0.0 (aarch64 wheel) | Visual odometry |
| fastapi | >=0.115 | REST API framework |
| sse-starlette | >=3.3.2 | SSE streaming |
| uvicorn | >=0.30 | ASGI server |
| numpy | >=1.26 | ESKF math, array ops |
| scipy | >=1.12 | Rotation matrices, spatial transforms |
| opencv-python (CUDA build) | >=4.8 | Image preprocessing (must build from source with CUDA) |
| torch (aarch64) | >=2.3 (JetPack-compatible) | LiteSAM model loading (if selected) |
| tensorrt | 10.3.0 (JetPack bundled) | Inference engine |
| pycuda | >=2024.1 | CUDA stream management |
| geojson | >=3.1 | GeoJSON output formatting |
| pygeohash | >=1.2 | GeoHash tile indexing |

### System Dependencies (JetPack 6.2.2)

| Component | Version | Notes |
|-----------|---------|-------|
| CUDA Toolkit | 12.6.10 | Pre-installed |
| TensorRT | 10.3.0 | Pre-installed |
| cuDNN | 9.3 | Pre-installed |
| VPI | 3.2 | Pre-installed, alternative to OpenCV CUDA for resize |
| cuVSLAM runtime | Bundled with PyCuVSLAM wheel | |

### Offline Preprocessing Tools (developer machine, not Jetson)

| Tool | Purpose |
|------|---------|
| Python 3.10+ | Tile download script |
| Google Maps Tile API key | Satellite tile access |
| torch + LiteSAM weights | Feature pre-extraction (if LiteSAM selected) |
| trtexec (TensorRT) | Model export to TensorRT engine |

- **Primary implementation**: Python for orchestration, test harness, cache tooling, and MAVLink integration; C++/TensorRT for hot-path vision if profiling requires it.
- **Vision utilities**: OpenCV 4.x.
- **Estimator**: BASALT VIO plus project-owned safety/anchor wrapper and mode machine.
- **Global retrieval**: DINOv2-VLAD style descriptors with CPU-first FAISS top-K search.
- **Local matching**: DISK/ALIKED + LightGlue; SuperPoint only after license review.
- **Cache**: COG imagery, PostgreSQL/PostGIS manifest metadata, signed JSON sidecars, FAISS index files.
- **Autopilot**: MAVSDK subscriptions plus pymavlink `GPS_INPUT`.
- **Validation**: public datasets for component de-risking, Plane SITL for integration, representative flight/replay data for acceptance.
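
For the Autopilot item above, two of the conversions a `GPS_INPUT` message needs can be sketched in plain Python: the message carries GPS week and milliseconds-of-week (`time_week`, `time_week_ms`) and latitude/longitude as integers in degrees × 1e7. The helper names are hypothetical, and the 18-second GPS-UTC offset is an assumption that holds only until another leap second is introduced.

```python
GPS_EPOCH_UNIX_S = 315_964_800  # 1980-01-06T00:00:00Z expressed as Unix time
SECONDS_PER_WEEK = 604_800
GPS_UTC_LEAP_S = 18             # assumption: GPS-UTC offset in effect since 2017


def unix_to_gps_week(unix_s: float, leap_s: int = GPS_UTC_LEAP_S):
    """Convert a Unix timestamp to (GPS week, milliseconds into the week)."""
    gps_s = unix_s - GPS_EPOCH_UNIX_S + leap_s
    week, seconds_of_week = divmod(gps_s, SECONDS_PER_WEEK)
    # Truncate so the result always stays below one full week.
    return int(week), int(seconds_of_week * 1000)


def to_deg_e7(deg: float) -> int:
    """Scale degrees to the degE7 integer form used for lat/lon fields."""
    return int(round(deg * 1e7))
```

With pymavlink, these values would typically be passed to the generated `gps_input_send` call together with the fix type, velocity, and accuracy fields; that wiring is left out here because it depends on the estimator outputs and should be validated in Plane SITL.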

## Risk Assessment

| Technology | Risk | Likelihood | Impact | Mitigation |
|-----------|------|-----------|--------|------------|
| cuVSLAM | Closed-source, nadir camera untested | Medium | High | XFeat frame-to-frame as open-source fallback |
| LiteSAM | May exceed 400 ms on Orin Nano Super | High | High | **Abandon for XFeat**; the day-one benchmark is go/no-go |
| OpenCV CUDA build | Build complexity on Jetson, CUDA arch compatibility | Medium | Low | VPI 3.2 as drop-in alternative (pre-installed) |
| Google Maps Tile API | Conflict zone coverage gaps, EEA restrictions | Medium | Medium | Test tile availability for operational area pre-flight; alternative providers (Bing, Mapbox) |
| Custom ESKF | Implementation bugs, tuning effort | Low | Medium | FilterPy v1.4.5 as reference; well-understood algorithm |
| Python GIL | Concurrent VO + satellite matching contention | Low | Low | CUDA operations release GIL; use asyncio + threading for I/O |

| Risk | Impact | Mitigation |
|------|--------|------------|
| VPR/local matching exceeds Jetson latency | AC-4.1 failure | Conditional VPR, top-K caps, downsampled descriptors, CPU FAISS profiling, TensorRT only after embedding-fidelity checks. |
| Descriptor cache exceeds 10 GB | AC-8.3/storage failure | PQ/compression, multi-scale budget report, split descriptor budget if needed. |
| GPL library accidentally becomes production dependency | Licensing issue | Keep OpenVINS/ORB-SLAM3 as reference only unless legal approves; BASALT is the production VIO candidate. |
| BASALT covariance under-reports real error | False-position safety budget failure | Calibrate wrapper covariance against OpenVINS covariance, ground truth replay, and satellite anchor residuals. |
| Plane failsafe differs from Copter docs | Safety behavior mismatch | Production-parameter ArduPilot Plane SITL gate. |
| `GPS_INPUT` velocity ignore flags behave unexpectedly | EKF drift or false velocity fusion | SITL tests for velocity fields, ignore flags, and `EK3_SRC1_*` source parameters. |
| Public datasets fail to represent Ukrainian agricultural terrain | False confidence | Require representative synchronized flight/replay data before AC signoff. |
| Thermal throttling | Latency regression | Hot-soak test and throttle logging per AC-NEW-5. |

## Learning Requirements
## Learning / Implementation Requirements

| Technology | Team Expertise Needed | Ramp-up Time |
|-----------|----------------------|--------------|
| PyCuVSLAM | SLAM concepts, Python API, camera calibration | 2-3 days |
| TensorRT model export | ONNX export, trtexec, FP16 optimization | 2-3 days |
| LiteSAM architecture | Transformer-based matching (if selected) | 1-2 days |
| XFeat | Feature detection/matching concepts | 1 day |
| ESKF | Kalman filtering, quaternion math, multi-rate fusion | 3-5 days |
| FastAPI + SSE | Async Python, ASGI, SSE protocol | 1 day |
| GeoHash spatial indexing | Geospatial concepts | 0.5 days |
| Jetson deployment | JetPack, power modes, thermal management | 2-3 days |

## Development Environment

| Environment | Purpose | Setup |
|-------------|---------|-------|
| **Developer machine** (x86_64, GPU) | Development, unit testing, model export | Docker with CUDA + TensorRT |
| **Jetson Orin Nano Super** | Integration testing, benchmarking, deployment | JetPack 6.2.2 flashed, SSH access |

Code should be developed and unit-tested on x86_64, then deployed to the Jetson for integration and performance testing. cuVSLAM and TensorRT engines are aarch64-only, so mock them in x86_64 tests.

- Jetson profiling with CUDA/TensorRT and memory instrumentation.
- ArduPilot Plane SITL, `GPS_INPUT`, GPS spoof/failsafe parameters.
- Geospatial raster formats: COG, CRS, tile matrices, m/px metadata, manifests.
- BASALT integration, covariance calibration, and Mahalanobis rejection in the safety/anchor wrapper.
- Aerial VPR benchmark methodology and georeference recall.
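
The Mahalanobis rejection mentioned in the list above can be illustrated for a 2D horizontal position residual. This is a minimal sketch, assuming the wrapper gates a satellite anchor on the squared Mahalanobis distance of its innovation; the function names and the choice of a 99% chi-square gate are assumptions, not project decisions.

```python
CHI2_2DOF_99 = 9.21  # 99% quantile of the chi-square distribution, 2 degrees of freedom


def mahalanobis_sq_2d(residual, cov):
    """Squared Mahalanobis distance r^T S^-1 r for a 2x2 covariance S."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    rx, ry = residual
    # Closed form using S^-1 = [[d, -b], [-c, a]] / det
    return (d * rx * rx - (b + c) * rx * ry + a * ry * ry) / det


def accept_anchor(residual, cov, gate=CHI2_2DOF_99):
    """Accept the measurement only if it falls inside the chi-square gate."""
    return mahalanobis_sq_2d(residual, cov) <= gate
```

With a 5 m standard deviation per axis (covariance 25 m² on the diagonal), a residual of (3, 4) m gives a squared distance of 1 and passes, while a residual of (30, 0) m gives 36 and is rejected. The gate is only meaningful if the reported covariance is calibrated, which is why the covariance calibration item sits next to it in the list.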