start over again

Oleksandr Bezdieniezhnykh
2026-05-07 04:08:03 +03:00
parent ee6606a9c2
commit 8382cdae10
351 changed files with 0 additions and 30337 deletions
-346
@@ -1,346 +0,0 @@
# Security Analysis
## Operational Context
This system runs on a UAV operating in a **conflict zone** (eastern Ukraine). The UAV could be shot down and physically captured. GPS denial/spoofing is the premise. The Jetson Orin Nano stores satellite imagery, flight plans, and captured photos. Security must assume the worst case: **physical access by an adversary**.
## Threat Model
### Asset Inventory
| Asset | Sensitivity | Location | Notes |
|-------|------------|----------|-------|
| Captured camera imagery | HIGH | Jetson storage | Reconnaissance data — reveals what was surveyed |
| Satellite tile cache | MEDIUM | Jetson storage | Reveals operational area and areas of interest |
| Flight plan / route | HIGH | Jetson memory + storage | Reveals mission objectives and launch/landing sites |
| Computed GPS positions | HIGH | Jetson memory, SSE stream | Real-time position data of UAV and surveyed targets |
| Google Maps API key | MEDIUM | Offline prep machine only | Used pre-flight, NOT stored on Jetson |
| TensorRT model weights | LOW | Jetson storage | LiteSAM/XFeat — publicly available models |
| cuVSLAM binary | LOW | Jetson storage | NVIDIA proprietary but freely distributed |
| IMU calibration data | LOW | Jetson storage | Device-specific calibration |
| System configuration | MEDIUM | Jetson storage | API endpoints, tile paths, fusion parameters |
### Threat Actors
| Actor | Capability | Motivation | Likelihood |
|-------|-----------|------------|------------|
| **Adversary military (physical capture)** | Full physical access after UAV loss | Extract intelligence: imagery, flight plans, operational area | HIGH |
| **Electronic warfare unit** | GPS spoofing/jamming, RF jamming | Disrupt navigation, force UAV off course | HIGH (GPS denial is the premise) |
| **Network attacker (ground station link)** | Intercept/inject on UAV-to-ground comms | Steal position data, inject false commands | MEDIUM |
| **Insider / rogue operator** | Authorized access to system | Data exfiltration, mission sabotage | LOW |
| **Supply chain attacker** | Tampered satellite tiles or model weights | Feed corrupted reference data → position errors | LOW |
### Attack Vectors
| Vector | Target Asset | Actor | Impact | Likelihood |
|--------|-------------|-------|--------|------------|
| **Physical extraction of storage** | All stored data | Adversary (capture) | Full intelligence compromise | HIGH |
| **GPS spoofing** | Position estimate | EW unit | Already mitigated — system is GPS-denied by design | N/A |
| **IMU acoustic injection** | IMU data → ESKF | EW unit | Drift injection, subtle position errors | LOW |
| **Camera blinding/spoofing** | VO + satellite matching | EW unit | VO failure, incorrect satellite matches | LOW |
| **Adversarial ground patterns** | Satellite matching | Adversary | Physical patches on ground fool feature matching | VERY LOW |
| **SSE stream interception** | Position data | Network attacker | Real-time position leak | MEDIUM |
| **API command injection** | Flight session control | Network attacker | Start/stop/manipulate sessions | MEDIUM |
| **Corrupted satellite tiles** | Satellite matching | Supply chain | Systematic position errors | LOW |
| **Model weight tampering** | Matching accuracy | Supply chain | Degraded matching → higher drift | LOW |
## Per-Component Security Requirements and Controls
### 1. Data at Rest (Jetson Storage)
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Protect captured imagery from extraction after capture | CRITICAL | Full-disk encryption (LUKS) | JetPack LUKS support with `ENC_ROOTFS=1`. Use OP-TEE Trusted Application for key management. Modify `luks-srv` to NOT auto-decrypt — require hardware token or secure erase trigger |
| Protect satellite tiles and flight plans | HIGH | Same LUKS encryption | Included in full-disk encryption scope |
| Enable rapid secure erase on capture/crash | CRITICAL | Tamper-triggered wipe | Hardware dead-man switch: if UAV telemetry lost for N seconds OR accelerometer detects crash impact → trigger `cryptsetup luksErase` on all LUKS volumes. Destroys key material in <1 second — data becomes unrecoverable |
| Prevent cold-boot extraction of keys and mission data | HIGH | Minimize residency of sensitive data in RAM | ESKF state and position history cleared from memory when session ends. Avoid writing position logs to disk unless encrypted |
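The dead-man switch logic in the table above can be sketched as a small state holder. This is a minimal sketch, not the project's implementation: the class name `DeadManSwitch`, the `erase` callback, and both threshold values are hypothetical, and the real erase action (the LUKS key erase) is injected rather than called here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeadManSwitch:
    """Decides when to fire the secure-erase action (hypothetical helper)."""

    erase: Callable[[], None]          # injected action, e.g. a LUKS key-erase wrapper
    link_timeout_s: float = 30.0       # the "N seconds" from the table; placeholder value
    impact_threshold_g: float = 20.0   # crash-impact threshold; placeholder value
    last_telemetry_s: float = 0.0
    fired: bool = False

    def on_telemetry(self, now_s: float) -> None:
        """Record a heartbeat from the UAV telemetry link."""
        self.last_telemetry_s = now_s

    def check(self, now_s: float, accel_g: float) -> bool:
        """Fire the erase action once if the link timed out or an impact was seen."""
        if self.fired:
            return True
        link_lost = (now_s - self.last_telemetry_s) > self.link_timeout_s
        impact = accel_g > self.impact_threshold_g
        if link_lost or impact:
            self.fired = True
            self.erase()
        return self.fired
```

Because the erase is irreversible, the timeout and impact thresholds would need tuning against temporary signal loss before any such logic is armed on hardware.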
### 2. Secure Boot
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Prevent unauthorized code execution | HIGH | NVIDIA Secure Boot with PKC fuse burning | Burn `SecurityMode` fuse (odm_production_mode=0x1) on production Jetsons. Sign all boot images with PKC key pair. Generate keys via HSM |
| Prevent firmware rollback | MEDIUM | Ratchet fuses | Configure anti-rollback fuses in fuse configuration XML |
| Debug port lockdown | HIGH | Disable JTAG/debug after production | Burn debug-disable fuses. Irreversible — production units only |
### 3. API & Communication (FastAPI + SSE)
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Authenticate API clients | HIGH | JWT bearer token | Pre-shared secret between ground station and Jetson. Generate JWT at session start. Short expiry (flight duration). `HTTPBearer` scheme in FastAPI |
| Encrypt SSE stream | HIGH | TLS 1.3 | Uvicorn with TLS certificate (self-signed for field use, pre-installed on ground station). All SSE position data encrypted in transit |
| Prevent unauthorized session control | HIGH | JWT + endpoint authorization | Session start/stop/anchor endpoints require valid JWT. Rate-limit via `slowapi` |
| Prevent replay attacks | MEDIUM | JWT `exp` + `jti` claims | Token expiry per-flight. Unique token ID (`jti`) tracked to prevent reuse |
| Limit API surface | MEDIUM | Minimal endpoint exposure | Only expose: POST /sessions, GET /sessions/{id}/stream (SSE), POST /sessions/{id}/anchor, DELETE /sessions/{id}. No admin/debug endpoints in production |
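The `exp` and `jti` checks listed above can be illustrated with a standard-library HS256 sketch. The function names `issue_token` and `verify_token` are hypothetical, the in-memory `seen_jti` set is an assumption about how token IDs are tracked, and a production service would more likely use a vetted JWT library behind FastAPI's `HTTPBearer` scheme. Note that this sketch treats each token as single-use.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid
from typing import Optional, Set


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, lifetime_s: int, now: Optional[float] = None) -> str:
    """Build an HS256 JWT carrying only `exp` and a random `jti`."""
    now = time.time() if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = _b64url(json.dumps({"exp": int(now) + lifetime_s,
                                 "jti": uuid.uuid4().hex}).encode())
    signing_input = f"{header}.{claims}"
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_token(token: str, secret: bytes, seen_jti: Set[str],
                 now: Optional[float] = None) -> dict:
    """Check signature, expiry, and one-time use; raise ValueError on failure."""
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    expected = hmac.new(secret, f"{header_b64}.{claims_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims.get("exp", 0) <= now:
        raise ValueError("expired token")
    jti = claims.get("jti")
    if jti is None or jti in seen_jti:
        raise ValueError("replayed token")
    seen_jti.add(jti)  # remember the ID so the same token cannot be reused
    return claims
```

If one token is meant to authorize a whole flight session, the `jti` set would instead track revoked or superseded tokens; the source does not say which model is intended.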
### 4. Visual Odometry (cuVSLAM)
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Detect camera feed tampering | LOW | Sanity checks on frame consistency | If consecutive frames show implausible motion (more than 500 m displacement between frames at 3 fps), flag the frame as suspicious. An ESKF covariance spike triggers satellite re-localization |
| Protect against VO poisoning | LOW | Cross-validate VO with IMU | ESKF fusion inherently cross-validates: IMU and VO disagreement raises covariance, triggers satellite matching. No single sensor can silently corrupt position |
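As a rough illustration of the frame-consistency check above: a 500 m step between frames at 3 fps corresponds to 1500 m/s, far beyond any plausible UAV speed. The sketch below assumes positions are already expressed in a local metric frame; the function name and argument layout are hypothetical.

```python
import math
from typing import Tuple

MAX_STEP_M = 500.0  # per-frame displacement limit taken from the table above


def is_implausible_step(prev_xy_m: Tuple[float, float],
                        curr_xy_m: Tuple[float, float],
                        max_step_m: float = MAX_STEP_M) -> bool:
    """Return True when the frame-to-frame displacement exceeds the limit."""
    dx = curr_xy_m[0] - prev_xy_m[0]
    dy = curr_xy_m[1] - prev_xy_m[1]
    return math.hypot(dx, dy) > max_step_m
```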
### 5. Satellite Image Matching
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Verify tile integrity | MEDIUM | SHA-256 checksums per tile | During offline preprocessing: compute SHA-256 for each tile pair. Store checksums in signed manifest. At runtime: verify checksum on tile load |
| Prevent adversarial tile injection | MEDIUM | Signed tile manifest | Offline tool signs manifest with private key. Jetson verifies signature with embedded public key before accepting tile set |
| Detect satellite match outliers | MEDIUM | RANSAC inlier ratio threshold | If RANSAC inlier ratio <30%, reject match as unreliable. ESKF treats as no-measurement rather than bad measurement |
| Protect against ground-based adversarial patterns | VERY LOW | Multi-tile consensus | Match against multiple overlapping tiles. Physical adversarial patches affect local area — consensus voting across tiles detects anomalies |
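The per-tile checksum check described above might look like the following sketch. It assumes the manifest has already passed signature verification and has been parsed into a mapping from file name to hex SHA-256 digest; that manifest layout and the function names are assumptions, not the project's format.

```python
import hashlib
from pathlib import Path
from typing import Mapping


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tiles are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tile(path: Path, manifest: Mapping[str, str]) -> bool:
    """Accept a tile only if it is listed in the manifest and its digest matches."""
    expected = manifest.get(path.name)
    if expected is None:
        return False  # unknown tiles are rejected without being read
    return sha256_file(path) == expected.lower()
```

A tile that fails this check would be skipped at load time, in the same way a rejected match is treated as no measurement.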
### 6. Sensor Fusion (ESKF)
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Prevent single-sensor corruption of position | HIGH | Adaptive noise + outlier rejection | Mahalanobis distance test on each measurement. Reject updates >5σ from predicted state. No single measurement can cause >50m position jump |
| Detect systematic drift | MEDIUM | Satellite matching rate monitoring | If satellite matches consistently disagree with VO by >100m, flag integrity warning to operator |
| Protect fusion state | LOW | In-memory only, no persistence | ESKF state never written to disk. Lost on power-off — no forensic recovery |
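The 5σ outlier gate above can be sketched for a 2D position measurement. This assumes the usual filter convention that the test is applied to the innovation (measurement minus prediction) with its innovation covariance; the function names are hypothetical and the matrix is restricted to 2x2 to keep the example dependency-free.

```python
import math
from typing import Sequence


def mahalanobis_2d(innovation: Sequence[float],
                   cov: Sequence[Sequence[float]]) -> float:
    """Mahalanobis distance of a 2D innovation under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    y0, y1 = innovation
    # Inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    quad = (y0 * (d * y0 - b * y1) + y1 * (-c * y0 + a * y1)) / det
    return math.sqrt(quad)


def accept_measurement(innovation: Sequence[float],
                       innovation_cov: Sequence[Sequence[float]],
                       gate_sigma: float = 5.0) -> bool:
    """Accept the update only if it lies within the gate (5 sigma by default)."""
    return mahalanobis_2d(innovation, innovation_cov) <= gate_sigma
```

With a covariance of 4 m² on each axis (σ = 2 m), a 6 m innovation sits at 3σ and is accepted, while a 12 m innovation sits at 6σ and is rejected.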
### 7. Offline Preprocessing (Developer Machine)
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Protect Google Maps API key | MEDIUM | Environment variable, never in code | `.env` file excluded from version control. API key used only on developer machine, never deployed to Jetson |
| Validate downloaded tiles | LOW | Source verification | Download only from Google Maps Tile API via HTTPS. Verify TLS certificate chain |
| Secure tile transfer to Jetson | MEDIUM | Signed + encrypted transfer | Transfer tile set + signed manifest via encrypted channel (SCP/SFTP). Verify manifest signature on Jetson before accepting |
## Security Controls Summary
### Authentication & Authorization
- **Mechanism**: Pre-shared JWT secret between ground station and Jetson
- **Scope**: All API endpoints require valid JWT bearer token
- **Session model**: One JWT per flight session, expires at session end
- **No user management on Jetson** — single-operator system, auth is device-to-device
### Data Protection
| State | Protection | Tool |
|-------|-----------|------|
| At rest (Jetson storage) | LUKS full-disk encryption | JetPack LUKS + OP-TEE |
| In transit (SSE stream) | TLS 1.3 | Uvicorn SSL |
| In memory (ESKF state) | No persistence, cleared on session end | Application logic |
| On capture (emergency) | Tamper-triggered LUKS key erase | Hardware dead-man switch + `cryptsetup luksErase` |
### Secure Communication
- Ground station ↔ Jetson: TLS 1.3 (self-signed cert, pre-installed)
- No internet connectivity during flight, so no internet-facing attack surface; the ground-station link remains the only network exposure
- RF link security is out of scope (handled by UAV communication system)
### Logging & Monitoring
| What | Where | Retention |
|------|-------|-----------|
| API access logs (request count, errors) | In-memory ring buffer | Current session only, not persisted |
| Security events (auth failures, integrity warnings) | In-memory + SSE alert to operator | Current session only |
| Position history | In-memory for refinement, SSE to ground station | NOT persisted on Jetson after session end |
| Crash/tamper events | Trigger secure erase, no logging | N/A — priority is data destruction |
**Design principle**: Minimize data persistence on Jetson. The ground station is the system of record. Jetson stores only what's needed for the current flight — satellite tiles (encrypted at rest) and transient processing state (memory only).
## Protected Code Execution (OP-TEE / ARM TrustZone)
### Overview
The Jetson Orin Nano Super supports hardware-enforced protected code execution via **ARM TrustZone** and **OP-TEE v4.2.0** (included in Jetson Linux 36.3+). TrustZone partitions the processor into two isolated worlds:
- **Secure World** (TEE): Runs at ARMv8 S-EL1 (OS) and S-EL0 (apps). Code here cannot be read or tampered with from the normal world. OP-TEE is the secure OS.
- **Normal World**: Standard Linux (JetPack). Our Python application, cuVSLAM, FastAPI all run here.
Trusted Applications (TAs) execute inside the secure world and are invoked by Client Applications (CAs) in the normal world via the GlobalPlatform TEE Client API.
### Architecture on Jetson Orin Nano
```
┌─────────────────────────────────────────────────┐
│ NORMAL WORLD (Linux / JetPack) │
│ │
│ Client Application (CA) │
│ ↕ libteec.so (TEE Client API) │
│ ↕ OP-TEE Linux Kernel Driver │
│ ↕ ARM Trusted Firmware (ATF) / Monitor │
├─────────────────────────────────────────────────┤
│ SECURE WORLD (OP-TEE v4.2.0) │
│ │
│ OP-TEE OS (ARMv8 S-EL1) │
│ ├── jetson-user-key PTA (key management) │
│ ├── luks TA (disk encryption passphrase) │
│ ├── hwkey-agent TA (encrypt/decrypt data) │
│ ├── PKCS #11 TA (crypto token interface) │
│ └── Custom TAs (our application-specific TAs) │
│ │
│ Hardware: Security Engine (SE), HW RNG, Fuses │
│ TZ-DRAM: Dedicated memory carveout │
└─────────────────────────────────────────────────┘
```
### Key Hierarchy (Hardware-Backed)
The Jetson Orin Nano provides a hardware-rooted key hierarchy via the Security Engine (SE) and Encrypted Key Blob (EKB):
```
OEM_K1 fuse (256-bit AES, burned into hardware, cannot be read by software)
├── EKB_RK (EKB Root Key, derived via AES-128-ECB from OEM_K1 + FV)
│ ├── EKB_EK (encryption key for EKB content)
│ └── EKB_AK (authentication key for EKB content)
├── HUK (Hardware Unique Key, per-device, derived via NIST-SP-800-108)
│ └── SSK (Secure Storage Key, per-device, generated at OP-TEE boot)
│ ├── TSK (TA Storage Key, per-TA)
│ └── FEK (File Encryption Key, per-file)
└── LUKS passphrase (derived from disk encryption key stored in EKB)
```
Fuse keys are loaded into SE keyslots during early boot (before OP-TEE starts). Software cannot read keys from keyslots — only derive new keys through the SE. After use, keyslots should be cleared via `tegra_se_clear_aes_keyslots()`.
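The idea of deriving purpose-specific keys from a root key, rather than reading the root key, can be illustrated with a NIST SP 800-108 counter-mode KDF. This is a sketch only: HMAC-SHA256 is swapped in as the PRF because it is available in the Python standard library, whereas on the Jetson the derivation is performed by the Security Engine inside the secure world. The function name and the label strings are hypothetical.

```python
import hashlib
import hmac


def kdf_counter_hmac_sha256(key_in: bytes, label: bytes,
                            context: bytes, out_len: int) -> bytes:
    """SP 800-108 counter-mode KDF: K(i) = PRF(K_in, [i] || label || 0x00 || context || [L])."""
    if out_len <= 0:
        raise ValueError("out_len must be positive")
    block_len = hashlib.sha256().digest_size          # 32 bytes per PRF output
    n_blocks = -(-out_len // block_len)               # ceiling division
    length_bits = (out_len * 8).to_bytes(4, "big")    # [L]: output length in bits
    output = b""
    for i in range(1, n_blocks + 1):
        message = i.to_bytes(4, "big") + label + b"\x00" + context + length_bits
        output += hmac.new(key_in, message, hashlib.sha256).digest()
    return output[:out_len]
```

Different labels yield unrelated keys from the same root, which is what lets one root key back several purposes without any derived key revealing another.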
### What to Run in the Secure World (Our Use Cases)
| Use Case | TA Type | Purpose |
|----------|---------|---------|
| **LUKS disk encryption** | Built-in `luks` TA | Generate one-time passphrase at boot to unlock encrypted rootfs. Keys never leave secure world |
| **Tile manifest verification** | Custom User TA | Verify the signature on the satellite tile manifest, which lists per-tile SHA-256 checksums. Verification key stored in EKB, accessible only in secure world |
| **JWT secret storage** | Custom User TA or `hwkey-agent` TA | Store JWT signing secret in EKB. Sign/verify JWTs inside secure world — secret never exposed to Linux |
| **Secure erase trigger** | Custom User TA | Receive tamper signal → invoke `cryptsetup luksErase` via CA. Key erase logic runs in secure world to prevent normal-world interference |
| **TLS private key protection** | PKCS #11 TA | Store TLS private key in OP-TEE secure storage. Uvicorn uses PKCS #11 interface to perform TLS handshake without key leaving secure world |
### How to Enable Protected Code Execution
#### Step 1: Burn OEM_K1 Fuse (One-Time, Irreversible)
```bash
# Generate 256-bit OEM_K1 key (use HSM in production)
openssl rand -hex 32 > oem_k1_key.txt
# Create fuse configuration XML with OEM_K1
# Burn fuse via odmfuse.sh (IRREVERSIBLE)
sudo ./odmfuse.sh -X <fuse_config.xml> -i 0x23 <board_name>
```
After burning `SecurityMode` fuse (`odm_production_mode=0x1`), all further fuse writes are blocked. OEM_K1 becomes permanently embedded in hardware.
#### Step 2: Generate and Flash EKB
```bash
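# Generate user keys (disk encryption key, JWT secret, tile signing key)
openssl rand -hex 16 > disk_enc_key.txt
openssl rand -hex 32 > jwt_secret.txt
openssl rand -hex 32 > tile_signing_key.txt
# Generate the UEFI keys consumed by gen_ekb.py below
# (key sizes are assumed; check them against the gen_ekb documentation)
openssl rand -hex 32 > uefi_enc_key.txt
openssl rand -hex 16 > uefi_var_auth_key.txt
# Note: the gen_ekb.py flags below do not take jwt_secret.txt or tile_signing_key.txt;
# adding them to the EKB requires extending the EKB layout (not shown here)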
# Generate EKB binary
python3 gen_ekb.py -chip t234 \
-oem_k1_key oem_k1_key.txt \
-in_sym_key uefi_enc_key.txt \
-in_sym_key2 disk_enc_key.txt \
-in_auth_key uefi_var_auth_key.txt \
-out eks_t234.img
# Flash EKB to EKS partition
# (part of the normal flash process with secure boot enabled)
```
#### Step 3: Enable LUKS Disk Encryption
```bash
# During flash, set ENC_ROOTFS=1 to encrypt rootfs.
# Pass the variable on the sudo command line: a plain `export` is dropped by sudo's default env_reset
sudo ENC_ROOTFS=1 ./flash.sh <board_name> <storage_device>
```
The `luks` TA in OP-TEE derives a passphrase from the disk encryption key in EKB at boot. The passphrase is generated inside the secure world and passed to `cryptsetup` — it never exists in persistent storage.
#### Step 4: Develop Custom Trusted Applications
Cross-compile TAs for aarch64 using the Jetson OP-TEE source package:
```bash
# Build custom TA (e.g., tile manifest verifier)
make -C <ta_source_dir> \
CROSS_COMPILE="<toolchain>/bin/aarch64-buildroot-linux-gnu-" \
TA_DEV_KIT_DIR="<optee_src>/optee/build/t234/export-ta_arm64/" \
OPTEE_CLIENT_EXPORT="<optee_src>/optee/install/t234/usr" \
TEEC_EXPORT="<optee_src>/optee/install/t234/usr" \
-j"$(nproc)"
# Deploy: copy TA to /lib/optee_armtz/ on Jetson
# Deploy: copy CA to /usr/sbin/ on Jetson
```
TAs conform to the GlobalPlatform TEE Internal Core API. Use the `hello_world` example from `optee_examples` as a starting template.
#### Step 5: Enable Secure Boot
```bash
# Generate PKC key pair (use HSM for production)
openssl genrsa -out pkc_key.pem 3072
# Sign and flash secured images
sudo ./flash.sh -u pkc_key.pem <board_name> <storage_device>
# After verification, burn SecurityMode fuse (IRREVERSIBLE)
```
### Available Crypto Services in Secure World
| Service | Provider | Notes |
|---------|----------|-------|
| AES-128/256 encryption/decryption | SE hardware | Via keyslot-derived keys, never leaves SE |
| Key derivation (NIST-SP-800-108) | `jetson-user-key` PTA | Derive purpose-specific keys from EKB keys |
| Hardware RNG | SE hardware | `TEE_GenerateRandom()` or PTA command |
| PKCS #11 crypto tokens | PKCS #11 TA | Standard crypto interface for TLS, signing |
| SHA-256, HMAC | MbedTLS (bundled in optee_os) | Software crypto in secure world |
| RSA/ECC signing | GlobalPlatform TEE Crypto API | For manifest signature verification |
### Limitations on Orin Nano
| Limitation | Impact | Workaround |
|-----------|--------|------------|
| No RPMB support (only AGX Orin has RPMB) | Secure storage uses REE FS instead of replay-protected memory | Acceptable — LUKS encryption protects data at rest. REE FS secure storage is encrypted by SSK |
| EKB can only be updated via OTA, not at runtime | Cannot rotate keys in flight | Pre-provision per-device unique keys at manufacturing time |
| OP-TEE hello_world reported issues on some Orin Nano units | Some users report initialization failures | Use JetPack 6.2.2+ which includes fixes. Test thoroughly on target hardware |
| TZ-DRAM is a fixed carveout | Limits secure world memory | Keep TAs lightweight — only crypto operations and key management, not data processing |
### Security Hardening Checklist
- [ ] Burn OEM_K1 fuse with unique per-device key (via HSM)
- [ ] Generate and flash EKB with disk encryption key, JWT secret, tile signing key
- [ ] Enable LUKS full-disk encryption (`ENC_ROOTFS=1`)
- [ ] Modify `luks-srv` to NOT auto-decrypt (require explicit trigger or dead-man switch)
- [ ] Burn SecurityMode fuse (`odm_production_mode=0x1`) — enables secure boot chain
- [ ] Burn debug-disable fuses — disables JTAG
- [ ] Configure anti-rollback ratchet fuses
- [ ] Clear SE keyslots after EKB extraction via `tegra_se_clear_aes_keyslots()`
- [ ] Deploy custom TAs for JWT signing and tile manifest verification
- [ ] Use PKCS #11 for TLS private key protection
- [ ] Test secure erase trigger end-to-end
- [ ] Run `xtest` (OP-TEE test suite) on production Jetson to validate TEE
## Key Security Risks
| Risk | Severity | Mitigation Status |
|------|---------|-------------------|
| Physical capture → data extraction | CRITICAL | Mitigated by LUKS + secure erase. Residual risk: attacker extracts RAM before erase triggers |
| Default `luks-srv` auto-decrypts LUKS volumes at boot | HIGH | Requires custom `luks-srv` modification; development effort needed |
| Self-signed TLS certificates | MEDIUM | Acceptable for field deployment. Certificate pinning on ground station prevents MITM |
| cuVSLAM is closed-source | LOW | Cannot audit for vulnerabilities. Mitigated by running in sandboxed environment, input validation on camera frames |
| Dead-man switch reliability | HIGH | Hardware integration required. False triggers (temporary signal loss) must NOT cause premature erase. Needs careful threshold tuning |
## References
- xT-STRIDE threat model for UAVs (2025): https://link.springer.com/article/10.1007/s10207-025-01082-4
- NVIDIA Jetson OP-TEE Documentation (r36.4.4): https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/OpTee.html
- NVIDIA Jetson Security Overview (r36.4.3): https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security.html
- NVIDIA Jetson LUKS Disk Encryption: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/DiskEncryption.html
- NVIDIA Jetson Secure Boot: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/SecureBoot.html
- NVIDIA Jetson Secure Storage: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/SecureStorage.html
- NVIDIA Jetson Firmware TPM: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/FirmwareTPM.html
- NVIDIA Jetson Rollback Protection: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/RollbackProtection.html
- Jetson Orin Fuse Specification: https://developer.nvidia.com/downloads/jetson-agx-orin-series-fuse-specification
- OP-TEE Official Documentation: https://optee.readthedocs.io/en/latest/
- OP-TEE Trusted Application Examples: https://github.com/linaro-swg/optee_examples
- RidgeRun OP-TEE on Jetson Guide: https://developer.ridgerun.com/wiki/index.php/RidgeRun_Platform_Security_Manual/Getting_Started/TEE/NVIDA-Jetson
- GlobalPlatform TEE Specifications: https://globalplatform.org/specs-library/?filter-committee=tee
- Model Agnostic Defense against Adversarial Patches on UAVs (2024): https://arxiv.org/html/2405.19179v1
- FastAPI Security Best Practices (2026): https://fastlaunchapi.dev/blog/fastapi-best-practices-production-2026
-130
@@ -1,130 +0,0 @@
# Solution
## Product Solution Description
Build an onboard GPS-denied localization service that runs on the Jetson companion computer, uses the fixed downward navigation camera and flight-controller inertial telemetry, and emits ArduPilot `GPS_INPUT` estimates with calibrated covariance and source labels.
The production architecture is a trigger-based hybrid estimator:
```text
Nav camera + FC telemetry
|
v
Image quality + calibration + orthorectification
|
+--> Hot path: OpenCV geometry + BASALT VIO --> safety/anchor wrapper --> GPS_INPUT + QGC + FDR
|
+--> Reference path: OpenVINS replay benchmark for VIO drift/covariance tests; Kimera backup replay
|
+--> Trigger path: DINOv2-VLAD query --> CPU FAISS top-K --> ALIKED/DISK+LightGlue --> OpenCV RANSAC --> safety/anchor wrapper
|
+--> Tile path: new COG tile + quality/provenance sidecar --> manifest update --> post-flight Satellite Service sync
```
Heavy local retrieval and local matching are not steady-state per-frame dependencies. They run on cold start, VO failure, sharp turns, disconnected segments, covariance growth, stale-anchor age, or operator-assisted relocalization, using only preloaded cache/index data during flight.
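The trigger list above can be expressed as a single predicate that returns the reasons for invoking retrieval and matching. This is a minimal sketch: the field names and all numeric limits are hypothetical, since the source names the triggers but not their thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EstimatorStatus:
    """Snapshot of estimator health (hypothetical field names)."""
    cold_start: bool
    vo_failed: bool
    turn_rate_dps: float
    segment_disconnected: bool
    horiz_sigma_m: float
    anchor_age_s: float
    operator_request: bool = False


@dataclass(frozen=True)
class TriggerLimits:
    """Placeholder limits; the source does not give numeric values."""
    max_turn_rate_dps: float = 30.0
    max_horiz_sigma_m: float = 50.0
    max_anchor_age_s: float = 60.0


def relocalization_reasons(status: EstimatorStatus,
                           limits: TriggerLimits = TriggerLimits()) -> List[str]:
    """Return every trigger that currently fires; an empty list means stay on the hot path."""
    reasons = []
    if status.cold_start:
        reasons.append("cold_start")
    if status.vo_failed:
        reasons.append("vo_failure")
    if abs(status.turn_rate_dps) > limits.max_turn_rate_dps:
        reasons.append("sharp_turn")
    if status.segment_disconnected:
        reasons.append("disconnected_segment")
    if status.horiz_sigma_m > limits.max_horiz_sigma_m:
        reasons.append("covariance_growth")
    if status.anchor_age_s > limits.max_anchor_age_s:
        reasons.append("stale_anchor")
    if status.operator_request:
        reasons.append("operator_request")
    return reasons
```

Returning the reasons rather than a bare boolean keeps the decision auditable in the flight data record.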
## Architecture
### Camera Ingest, Calibration, And Geometry
| Solution | Tools | Pinned Mode/Config | Fit |
|----------|-------|--------------------|-----|
| OpenCV geometry utility layer | OpenCV 4.x | Calibration, undistortion, homography, RANSAC/USAC, MRE measurement | Selected. Mature, permissive, exact utility fit; not a full estimator. |
### VO / IMU Propagation And Estimator
| Solution | Tools | Pinned Mode/Config | Fit |
|----------|-------|--------------------|-----|
| BASALT + safety/anchor wrapper | BASALT, OpenCV, custom wrapper | BASALT consumes calibrated nav-camera frames + FC IMU; wrapper fuses satellite anchors, calibrates uncertainty, emits source labels and `GPS_INPUT` fields | Selected. Best production VIO candidate found: permissive license, strong benchmark evidence, avoids custom VIO from scratch. |
| OpenVINS | OpenVINS | Monocular camera + IMU EKF/MSCKF reference runs with covariance extraction | Reference only. Strong VIO and covariance baseline, but GPLv3 and generic VIO ownership make it unsuitable as default shipped dependency. |
| Kimera-VIO | Kimera-VIO | Mono/stereo camera + IMU VIO/SLAM backup replay | Backup candidate. BSD-friendly but heavier/stereo-oriented; mono-inertial path has documented caveats. |
| ORB-SLAM3 | ORB-SLAM3 | Monocular-inertial SLAM | Rejected for production. GPLv3 and heavier SLAM/map lifecycle. |
BASALT does not replace the project-owned safety logic. The wrapper remains responsible for satellite anchor acceptance, confidence calibration, source labels, blackout/spoofing modes, tile-write eligibility, and MAVLink `GPS_INPUT` semantics.
### Satellite Service And Anchor Verification
| Solution | Tools | Pinned Mode/Config | Fit |
|----------|-------|--------------------|-----|
| DINOv2-VLAD + CPU FAISS + ALIKED/DISK+LightGlue | DINOv2/AnyLoc-style descriptors, FAISS CPU, LightGlue, OpenCV RANSAC | Offline VPR chunk descriptors; conditional query descriptor; CPU FAISS top-K; learned local match on bounded candidates; TensorRT only after fidelity check | Selected with runtime/fidelity gates. |
| SuperPoint+LightGlue | SuperPoint, LightGlue | Same matcher with SuperPoint features | License-gated benchmark/fallback only. |
| Classical SIFT/ORB | OpenCV | Handcrafted features + homography | Regression/fallback baseline. |
The Satellite Service component imports mission cache/index packages before flight, uploads generated-tile packages after landing, and serves local VPR queries during flight. The VPR index is built over ground-footprint-sized chunks with overlap and a multi-scale descriptor set. VPR is invoked only on relocalization triggers or covariance/anchor-age growth; normal flight uses BASALT VIO plus wrapper propagation. No satellite-provider or Satellite Service network calls are allowed mid-flight.
### Tile Manager
| Solution | Tools | Pinned Mode/Config | Fit |
|----------|-------|--------------------|-----|
| COG tile objects + PostgreSQL/PostGIS manifest + signed JSON sidecars | GDAL COG, PostgreSQL/PostGIS, signed JSON sidecars, FAISS index files | Service tiles and generated tiles are write-new COG objects; active version selected by PostGIS-backed manifest | Selected. Fits geospatial raster access, provenance, spatial/freshness queries, and write-new tile lifecycle. |
| PMTiles | PMTiles | Read-only archive snapshot | Rejected for live cache because in-flight tile generation needs mutable write-new objects. |
Service-source tiles and generated tiles carry CRS, capture date, source, m/px, freshness, quality score, sidecar hashes, and descriptor references. The Tile Manager also orthorectifies eligible nadir frames into generated COG tiles. Stale tiles are rejected or down-confidence weighted.
### MAVLink Integration
| Solution | Tools | Pinned Mode/Config | Fit |
|----------|-------|--------------------|-----|
| MAVSDK telemetry + pymavlink `GPS_INPUT` | MAVSDK, pymavlink | MAVSDK subscriptions; pymavlink emits `GPS_INPUT`; v1 emits GPS_INPUT only; Plane SITL validates `GPS1_TYPE=14`, velocity source params, ignore flags, fix types, accuracy fields | Selected. Exact output control with good telemetry ergonomics. |
The system emits per-frame estimates locally and downsampled status to QGroundControl. `GPS_INPUT.horiz_accuracy` must not under-report the calibrated 95% covariance semi-major axis.
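The 95% semi-major axis can be computed from the 2x2 horizontal covariance: for a 2D Gaussian, the confidence ellipse at probability p scales the largest eigenvalue by -2 ln(1 - p), about 5.99 at p = 0.95. The sketch below assumes the covariance is given in m² in a local east/north frame; the function names are hypothetical, and rounding up to the next centimetre is one possible way to guarantee the emitted value never under-reports.

```python
import math


def semi_major_axis_95(cov_xx: float, cov_xy: float, cov_yy: float,
                       confidence: float = 0.95) -> float:
    """Semi-major axis (m) of the confidence ellipse of a 2x2 covariance (m^2)."""
    mean = 0.5 * (cov_xx + cov_yy)
    half_diff = 0.5 * (cov_xx - cov_yy)
    lam_max = mean + math.hypot(half_diff, cov_xy)   # largest eigenvalue
    scale = -2.0 * math.log(1.0 - confidence)        # chi-square quantile, 2 dof
    return math.sqrt(scale * lam_max)


def horiz_accuracy_m(cov_xx: float, cov_xy: float, cov_yy: float) -> float:
    """Value for the horiz_accuracy field, rounded up to the next centimetre."""
    return math.ceil(semi_major_axis_95(cov_xx, cov_xy, cov_yy) * 100.0) / 100.0
```

For example, a covariance with 4 m² east and 1 m² north variance and no correlation gives a semi-major axis of about 4.90 m.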
### Security And Safety Controls
| Solution | Tools | Pinned Mode/Config | Fit |
|----------|-------|--------------------|-----|
| Consistency-gated anchor acceptance | Safety/anchor wrapper, cache manifest verification | Anchor accepted only if freshness, provenance, RANSAC, covariance, Mahalanobis, and temporal consistency pass | Selected. Prevents confident false fixes. |
| FDR audit trail | PostgreSQL event index + CBOR payload segments + hashes | Logs estimates, inputs, emitted GPS_INPUT, health, tile writes, anchor decisions | Selected. Supports incident analysis, indexed queries, and cache-poisoning audits. |
## Runtime Modes
| Mode | Trigger | Behavior | `GPS_INPUT` / Telemetry |
|------|---------|----------|--------------------------|
| `satellite_anchored` | VPR + local match passes all gates | Wrapper absolute update; tile write eligible only if sigma gate passes | 3D fix, `horiz_accuracy` >= 95% covariance semi-major axis |
| `vo_extrapolated` | BASALT VIO healthy and anchor age/covariance within bounds | BASALT VIO + wrapper propagation; covariance grows | 3D or 2D fix, depending on covariance threshold |
| `dead_reckoned` | visual blackout or no accepted anchor | IMU-only propagation, monotonic covariance growth | degraded fix type; QGC `VISUAL_BLACKOUT_IMU_ONLY` |
| failsafe/no-fix | covariance >500 m or blackout >30 s | stop pretending position is valid | `fix_type=0`, `horiz_accuracy=999.0`, QGC `VISUAL_BLACKOUT_FAILSAFE` |
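The mode table above can be sketched as a selection function. The failsafe limits (500 m, 30 s) come from the table; the precedence order, the function name, and the boolean inputs are assumptions made for illustration.

```python
FAILSAFE_SIGMA_M = 500.0     # covariance limit from the table
FAILSAFE_BLACKOUT_S = 30.0   # blackout limit from the table


def select_mode(anchor_accepted: bool, vio_healthy: bool,
                anchor_within_bounds: bool, horiz_sigma_m: float,
                blackout_s: float) -> str:
    """Pick a runtime mode; failsafe is checked first so it can never be masked."""
    if horiz_sigma_m > FAILSAFE_SIGMA_M or blackout_s > FAILSAFE_BLACKOUT_S:
        return "failsafe"
    if anchor_accepted:
        return "satellite_anchored"
    if vio_healthy and anchor_within_bounds:
        return "vo_extrapolated"
    return "dead_reckoned"
```

Checking the failsafe condition before anything else reflects the table's intent that the system stops reporting a fix once uncertainty is too large, regardless of other inputs.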
## Testing Strategy
### Integration / Functional Tests
- BASALT replay: assert AC-2.1a and AC-2.2 VO MRE on overlapping frame pairs, completion rate, latency, and wrapper-calibrated covariance.
- OpenVINS reference replay: compare VIO drift, failure cases, and covariance against BASALT + wrapper.
- Kimera-VIO backup replay: keep a second permissive candidate benchmark in case BASALT fails project replay/runtime gates.
- Satellite anchor replay: assert AC-1.1/1.2, AC-2.2 cross-domain MRE, freshness rejection, and source labels.
- DINOv2 descriptor fidelity: compare PyTorch/ONNX/TensorRT embeddings and retrieval rankings before accepting optimized engines.
- FAISS CPU index tests: top-K recall, query latency, index size, save/load behavior on Jetson ARM64.
- LightGlue extractor matrix: ALIKED vs DISK vs SIFT/ORB vs SuperPoint benchmark; SuperPoint excluded from production unless legal approves.
- Tile Manager: orthorectify eligible nadir frames into write-new generated tiles, update manifest, verify active version and rollback.
- `GPS_INPUT` SITL: validate fix type, `horiz_accuracy`, velocity fields, ignore flags, `EK3_SRC1_*` parameters, QGC behavior.
- Security gates: stale tile, mismatched tile hash, low inlier ratio, impossible velocity jump, and spoofed GPS during blackout.
### Non-Functional Tests
- Jetson latency and memory: <400 ms p95, <8 GB shared memory, no thermal throttling in the 25 W power mode.
- Cache budget: 400 km² imagery + manifests + descriptors fits budget or reports explicit split budget.
- FDR 8-hour load: <=64 GB, rollover logged, no silent payload loss.
- Monte Carlo false-position and cache-poisoning tests for AC-NEW-4 and AC-NEW-7.
- Cold boot: first valid `GPS_INPUT` <30 s p95 across 50 runs.
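The p95 gates in this list (latency and cold boot) need an agreed percentile definition. The sketch below uses the nearest-rank method, which is one common choice and an assumption here; the function name is hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0.0 < pct <= 100.0:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[rank - 1]
```

With 50 cold-boot runs, the nearest-rank p95 is the 48th smallest time, so at most two runs may exceed the 30 s limit.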
## References
Detailed source registry: `_docs/00_research/01_source_registry.md`.
Key sources:
- BASALT repository: https://github.com/VladyslavUsenko/basalt
- HybVIO benchmark paper: https://arxiv.org/pdf/2106.11857
- OpenVINS docs: https://docs.openvins.com/index.html
- OpenVINS ATE/RTE comparison: https://github.com/rpng/open_vins/issues/402
- Kimera mono-inertial caveat: https://github.com/MIT-SPARK/Kimera-VIO/issues/254
- AnyLoc paper: https://arxiv.org/html/2308.00688
- Aerial VPR survey: https://arxiv.org/abs/2406.00885
- ALIKED-LightGlue-ONNX: https://github.com/ikeboo/ALIKED-LightGlue-ONNX
- DINOv2 TensorRT issue: https://github.com/NVIDIA/TensorRT/issues/4348
## Related Artifacts
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Component fit matrix: `_docs/00_research/06_component_fit_matrix.md`
- Fact cards: `_docs/00_research/02_fact_cards.md`
-149
@@ -1,149 +0,0 @@
# Solution Draft
## Product Solution Description
Build an onboard GPS-denied localization service that runs on the companion Jetson, consumes the fixed nadir navigation camera plus flight-controller IMU/attitude/altitude, and emits ArduPilot-compatible `GPS_INPUT` estimates with honest covariance.
The solution is a hybrid estimator:
```text
Nav camera + FC telemetry
|
v
Image quality + calibration + orthorectification
|
+--> Steady state: VO/IMU propagation --> ESKF --> GPS_INPUT + QGC + FDR
|
+--> Re-anchor triggers: VPR top-K --> local match/RANSAC --> ESKF anchor update
|
+--> Tile generation: ortho tile + quality sidecar --> local cache --> post-flight Satellite Service upload
```
The steady-state path is intentionally lightweight. Heavy global retrieval and cross-domain local matching run only on cold start, VO failure, sharp turns, disconnected segments, covariance growth, or stale-anchor age.
## Existing / Competitor Solutions Analysis
Recent fixed-wing GPS-denied research supports the same high-level mechanism: monocular visual odometry alone accumulates scale/drift error, while satellite-image comparison can periodically correct it. Aerial VPR surveys also show why the implementation cannot be naive: weather, season, scale mismatch, repetitive fields, tile overlap, and re-ranking runtime all matter.
The selected design borrows the proven structure but rejects an all-in-one SLAM dependency as the product core. OpenVINS and ORB-SLAM3 are useful benchmark/reference implementations, but their GPL-family licensing and broader SLAM lifecycle do not fit a default production dependency for this project.
## Architecture
### Component: Camera Ingest, Calibration, And Geometry
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| OpenCV geometry utility layer | OpenCV 4.x | Camera calibration, undistortion, RANSAC homography, reprojection-error measurement | Mature, local, fast, exact API fit | Not a full estimator | Calibration target, fixed intrinsics/extrinsics, lens/FOV selection | Local-only, no network | Low | MVE: `_docs/00_research/02_fact_cards.md`; Source #5 | Selected |
**Exact-fit evidence**:
- Project constraints checked: fixed nadir camera, calibration, homography, MRE gates.
- Disqualifiers: none.
- Restrictions × AC matrix: `_docs/00_research/06_component_fit_matrix.md`.
### Component: VO / IMU Propagation And Estimator
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| Custom VO/IMU ESKF | OpenCV + custom estimator | Frame-to-frame homography/features + FC IMU/attitude/altitude fused in a custom ESKF with source modes | Owns covariance, source labels, degraded modes, ArduPilot output semantics | More implementation work than adopting VIO library | Synchronized frames/IMU, calibration, replay tests | No third-party cloud; deterministic local logs | Medium | Facts #1, #4, #5, #16 | Selected |
| OpenVINS reference | OpenVINS | Monocular camera + IMU EKF/MSCKF reference runs | Strong VIO reference and evaluation tools | GPL-3 production dependency risk | Dataset/replay adapter | Local only | Low for benchmark | Source #3 | Reference only |
| ORB-SLAM3 alternative | ORB-SLAM3 | Monocular-inertial SLAM | Mature SLAM benchmark | GPLv3, heavier map lifecycle, initialization complexity | Calibration, vocabulary, runtime tuning | Local only | Medium | Source #4 | Rejected for production |
**Exact-fit evidence**:
- Project constraints checked: one camera + IMU, frame-by-frame output, covariance labels, blackout/spoofing modes.
- Runtime quality gate: validate drift and covariance on EuRoC-style and representative fixed-wing replay.
### Component: Satellite Retrieval And Local Anchor
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| DINOv2-VLAD + FAISS + LightGlue | DINOv2/AnyLoc-style descriptors, FAISS, DISK/ALIKED+LightGlue, OpenCV RANSAC | Offline precomputed VPR chunk descriptors; conditional query descriptor; top-K FAISS; local matching on candidates | Matches AC-8.6; scalable top-K; local geometry verifies anchors | Needs Jetson profiling and model-size pruning | Fresh satellite cache, descriptors, dynamic K, RANSAC gates | Cache is local; no in-flight provider calls | Medium-high | MVE blocks in `02_fact_cards.md`; Sources #6-#9 | Selected with runtime gate |
| SuperPoint + LightGlue | SuperPoint, LightGlue | Same local matching with SuperPoint features | Strong technical baseline | SuperPoint license is restrictive | Legal review | Local only | Medium | Source #6 | Needs user decision |
| Classical SIFT/ORB-only | OpenCV | Handcrafted features and homography | Simple fallback, low compute | Poor cross-domain robustness | Feature-rich scenes | Local only | Low | Source #5 | Fallback / regression baseline |
**Exact-fit evidence**:
- Project constraints checked: offline cache, top-K dynamic retrieval, cross-domain local match, <400 ms hot-path constraint.
- Runtime gate: heavy VPR/re-ranking is trigger-based, not per-frame.
### Component: Satellite Cache And Tile Write-Back
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| COG + PostgreSQL/PostGIS manifest + descriptor sidecars | GDAL COG, PostgreSQL/PostGIS manifest, FAISS sidecars | Service tiles and generated candidate tiles stored as tiled compressed GeoTIFFs with CRS/date/source/meter-per-pixel metadata | Geospatial standard, supports write-new-tile workflow, descriptor accounting | Needs careful 10 GB budget | STAC-like manifest, freshness gates, descriptor pruning | Local signed manifests recommended | Medium | Source #18 | Selected |
| PMTiles archive | PMTiles | Single-file read archive | Efficient map reads | Read-only; cannot update in place | Archive rebuild for updates | Local file integrity | Low | Source #17 | Rejected for live mutable cache |
**Exact-fit evidence**:
- Project constraints checked: offline-only, no raw photo storage, mid-flight generated tiles, Satellite Service ingest metadata.
- External dependency: Satellite Service owns the promotion/voting layer for trusted basemap updates.
### Component: MAVLink Integration And Telemetry
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| MAVSDK telemetry + pymavlink output | MAVSDK, pymavlink | MAVSDK subscribes to telemetry; pymavlink emits `GPS_INPUT` to ArduPilot with `GPS1_TYPE=14` | Exact `GPS_INPUT` field control while keeping high-level telemetry APIs | Plane-specific failsafe/spoof triggers need SITL proof | ArduPilot Plane params, QGC status, FDR tlog | Validate MAVLink source and message rate | Medium | Sources #10-#12 | Selected |
**Exact-fit evidence**:
- Project constraints checked: ArduPilot only, v1 `GPS_INPUT` only, WGS84 output, covariance to `horiz_accuracy`.
- Validation gate: Plane SITL with production parameters.
### Component: Flight Data Recorder
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Cost | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|------|-------------------------|-----|
| Segmented FDR | PostgreSQL event index + binary/CBOR payloads with Parquet export post-flight | Fixed-size segment files for per-frame estimates, IMU, MAVLink, health, emitted GPS_INPUT, generated-tile metadata | Replayable, bounded, no raw frame retention | Exact format selected during implementation | Rollover policy, monotonic timestamps | Integrity hash per segment recommended | Medium | AC-NEW-3 | Selected pattern |
## Runtime Modes
| Mode | Trigger | Behavior | Output Label |
|------|---------|----------|--------------|
| Satellite anchored | VPR + local match passes freshness, RANSAC, covariance, and Mahalanobis gates | ESKF absolute update, low covariance, tile generation eligible if sigma gate passes | `satellite_anchored` |
| VO extrapolated | Last anchor fresh enough, VO healthy, normal overlap | VO/IMU propagation, covariance grows with anchor age | `vo_extrapolated` |
| Dead reckoned | Visual blackout, VO failure without anchor, spoofing while no visual signal | IMU-only propagation, monotonic covariance growth, degraded fix type thresholds | `dead_reckoned` |
| Failsafe / no fix | Blackout >30 s or covariance >500 m | `GPS_INPUT.fix_type=0`, `horiz_accuracy=999.0`, QGC status | `dead_reckoned` with failsafe status |
## Testing Strategy
### Integration / Functional Tests
- Replay normal overlapping frames and assert AC-2.1a VO registration rate and AC-2.2 VO MRE.
- Replay satellite-anchor cases and assert AC-1.1/1.2, AC-2.2 cross-domain MRE, freshness gates, and source labels.
- Inject stale tiles and assert no `satellite_anchored` output.
- Inject sharp turns and disconnected segments and assert VPR relocalization.
- Run ArduPilot Plane SITL with `GPS1_TYPE=14` and assert valid `GPS_INPUT` fields, fix type degradation, and QGC statuses.
- Inject visual blackout + spoofed GPS and assert spoofed GPS is ignored, covariance grows, and thresholds match AC-NEW-8.
- Request AI-camera object coordinates and assert level-flight projection plus maneuver error bound.
### Non-Functional Tests
- Jetson Orin Nano Super profiling: <400 ms p95, <8 GB shared memory, 25 W no-throttle hot-soak.
- Cache build test for 400 km²: imagery + manifests + descriptors fit within budget or fail with explicit budget report.
- 8-hour FDR load test: <=64 GB, rollover logged, no silent data loss.
- Monte Carlo false-anchor and over-confidence tests for AC-NEW-4 and AC-NEW-7.
- Cold boot 50x: first valid `GPS_INPUT` <30 s p95.
## References
Detailed source registry: `_docs/00_research/01_source_registry.md`.
Key sources:
- Fixed-wing satellite-aided VO: https://www.mdpi.com/2076-3417/14/16/7420
- Aerial VPR survey: https://arxiv.org/abs/2406.00885
- OpenVINS docs: https://docs.openvins.com/
- ORB-SLAM3 README: https://raw.githubusercontent.com/UZ-SLAMLab/ORB_SLAM3/master/README.md
- LightGlue README: https://raw.githubusercontent.com/cvg/LightGlue/main/README.md
- FAISS docs: https://faiss.ai/index.html
- ArduPilot GPSInput: https://ardupilot.org/mavproxy/docs/modules/GPSInput.html
- MAVLink GPS_INPUT: https://mavlink.io/en/messages/common.html#GPS_INPUT
- Jetson Orin Nano Super: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/nano-super-developer-kit/
- PMTiles: https://docs.protomaps.com/pmtiles/
- GDAL COG: https://gdal.org/en/stable/drivers/raster/cog.html
## Related Artifacts
- AC assessment: `_docs/00_research/00_ac_assessment.md`
- Question decomposition: `_docs/00_research/00_question_decomposition.md`
- Source registry: `_docs/00_research/01_source_registry.md`
- Fact cards: `_docs/00_research/02_fact_cards.md`
- Comparison framework: `_docs/00_research/03_comparison_framework.md`
- Reasoning chain: `_docs/00_research/04_reasoning_chain.md`
- Validation log: `_docs/00_research/05_validation_log.md`
- Component fit matrix: `_docs/00_research/06_component_fit_matrix.md`
-125
@@ -1,125 +0,0 @@
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| DINOv2-VLAD with possible TensorRT optimization | TensorRT conversion may produce limited speedup and can alter embedding distances on Jetson-class deployments. | Keep DINOv2-VLAD, but require descriptor-fidelity tests against PyTorch/ONNX before TensorRT descriptors are accepted. |
| FAISS CPU/GPU optional | FAISS GPU is not a safe default on Jetson ARM64/aarch64 packaging. | Pin FAISS as CPU-first on Jetson; use PQ/IVF and top-K caps before considering custom GPU builds. |
| LightGlue local matcher | SuperPoint path has license risk and community confusion. | Keep DISK/ALIKED+LightGlue as production default; SuperPoint remains license-gated benchmark/fallback only. |
| COG cache | "COG cache" could be misread as mutable in-place raster updates. | Use write-new COG tile objects plus manifest versioning and sidecars; never mutate COGs in place. |
| `GPS_INPUT` output | ArduPilot velocity ignore flags have reported EKF3 pitfalls. | SITL must validate velocity source parameters, ignore flags, and whether zero velocity is ever fused accidentally. |
| Visual/satellite anchoring | Draft did not emphasize adversarial/cache integrity enough. | Add signed cache manifests, tile provenance, freshness gates, anchor consistency checks, and FDR audit trail. |
## Product Solution Description
Build an onboard GPS-denied localization service that runs on the Jetson companion computer, uses the fixed downward navigation camera and flight-controller inertial telemetry, and emits ArduPilot `GPS_INPUT` estimates with calibrated covariance and source labels.
The production architecture is a trigger-based hybrid estimator:
```text
Nav camera + FC telemetry
|
v
Image quality + calibration + orthorectification
|
+--> Hot path: VO/IMU propagation --> custom ESKF --> GPS_INPUT + QGC + FDR
|
+--> Trigger path: DINOv2-VLAD query --> CPU FAISS top-K --> DISK/ALIKED+LightGlue --> RANSAC --> ESKF anchor
|
+--> Tile path: new COG tile + quality/provenance sidecar --> manifest update --> post-flight Satellite Service sync
```
Heavy retrieval and local matching are not steady-state per-frame dependencies. They run on cold start, VO failure, sharp turns, disconnected segments, covariance growth, or stale-anchor age.
## Architecture
### Component: Camera Ingest, Calibration, And Geometry
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
| OpenCV geometry utility layer | OpenCV 4.x | Calibration, undistortion, RANSAC homography, MRE measurement | Mature, exact fit, permissive | Not a full estimator | Checkerboard calibration, fixed extrinsics, lens/FOV selection | Local-only | Fast enough for hot-path utility use | MVE in `02_fact_cards.md`; Source #5 | Selected |
### Component: VO / IMU Propagation And Estimator
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
| Custom VO/IMU ESKF | OpenCV + custom estimator | Nadir VO/homography + FC IMU/attitude/altitude fused in ESKF with mode labels | Owns covariance, source labels, blackout/spoofing behavior | More implementation effort | Synchronized frames/IMU, calibration, replay tests | No network dependency | Hot path is lightweight | Facts #1, #16 | Selected |
| OpenVINS | OpenVINS | Monocular+IMU reference runs | Strong EKF/MSCKF reference | GPL-3 production risk | Replay adapter | Local only | Benchmark only | Source #3 | Reference only |
| ORB-SLAM3 | ORB-SLAM3 | Monocular-inertial SLAM | Mature benchmark | GPLv3 and heavier SLAM lifecycle | Calibration/vocabulary/runtime tuning | Local only | Riskier on embedded | Source #4 | Rejected for production |
### Component: Satellite Retrieval And Anchor Verification
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
| DINOv2-VLAD + CPU FAISS + DISK/ALIKED+LightGlue | DINOv2/AnyLoc-style descriptors, FAISS CPU, LightGlue, OpenCV RANSAC | Offline VPR chunk descriptors; conditional query descriptor; CPU FAISS top-K; local match on candidates; TensorRT only after fidelity check | Strong retrieval+geometry structure; avoids per-frame map search | Requires profiling and representative data | Descriptor cache, dynamic K, freshness, RANSAC, Mahalanobis gates | Signed manifests, provenance, stale-tile rejection | Trigger path only; top-K capped | MVE blocks in `02_fact_cards.md`; Sources #6-#9, #21-#25 | Selected with runtime/fidelity gates |
| SuperPoint+LightGlue | SuperPoint, LightGlue | Same matcher with SuperPoint features | Strong technical baseline | SuperPoint license risk | Legal review | Local only | Benchmark only | Sources #6, #23 | Needs user decision |
| Classical SIFT/ORB | OpenCV | Handcrafted features + homography | Simple and cheap | Weak cross-domain robustness | Feature-rich scenes | Local only | Fast | Source #5 | Regression baseline |
### Component: Cache And Tile Lifecycle
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
| COG tile objects + manifest + sidecars | GDAL COG, manifest DB/JSON, FAISS index files | Service tiles and generated tiles are write-new COG objects; active version selected by manifest | Geospatial standard, supports provenance and quality metadata | Descriptor budget pressure | CRS/date/source/m/px/freshness, sidecar hashes | Signed manifests, tile provenance, hash verification | Efficient local reads | Source #18; Facts #21, #29 | Selected |
| PMTiles | PMTiles | Read-only archive snapshot | Compact read package | Cannot update in place | Archive rebuild | Hash archive | Good for read-only export | Source #17 | Rejected for live cache |
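The write-new tile lifecycle and manifest verification in the table above can be sketched as follows. This is a minimal illustration, not the project's cache format: the manifest layout, the function names, and the use of HMAC-SHA256 as the "signature" are all assumptions made for the example.

```python
import hashlib
import hmac
import json


def sign_manifest(manifest, key):
    """Return an HMAC-SHA256 tag over a canonical JSON encoding (assumed scheme)."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest, signature, key):
    """Constant-time comparison of the stored tag against a recomputed one."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


def add_tile_version(manifest, tile_id, tile_bytes):
    """Record a new tile object; earlier versions are kept, never overwritten."""
    entry = manifest.setdefault(tile_id, {"versions": [], "active": None})
    digest = hashlib.sha256(tile_bytes).hexdigest()
    entry["versions"].append(digest)
    entry["active"] = len(entry["versions"]) - 1  # manifest selects the active version
    return digest


def verify_tile(manifest, tile_id, tile_bytes):
    """Check tile bytes against the hash of the active version in the manifest."""
    entry = manifest[tile_id]
    expected = entry["versions"][entry["active"]]
    return hmac.compare_digest(hashlib.sha256(tile_bytes).hexdigest(), expected)


def rollback(manifest, tile_id):
    """Point the manifest back at the previous version, if one exists."""
    entry = manifest[tile_id]
    if entry["active"] > 0:
        entry["active"] -= 1
```

A shared-key HMAC is used here only to keep the sketch within the standard library. On hardware that can be captured, an asymmetric signature whose private key never leaves the preparation machine would likely be the better fit.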
### Component: MAVLink Integration
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
| MAVSDK telemetry + pymavlink `GPS_INPUT` | MAVSDK, pymavlink | MAVSDK subscriptions; pymavlink emits `GPS_INPUT`; Plane SITL validates `GPS1_TYPE=14`, velocity source params, ignore flags, fix types, accuracy fields | Exact output control with good telemetry ergonomics | SITL required to prove Plane behavior | ArduPilot Plane params, QGC, tlog/FDR | Link/source validation, status audit | Light CPU load | Sources #10-#12, #24 | Selected |
### Component: Security And Safety Controls
| Solution | Tools | Pinned Mode/Config | Advantages | Limitations | Requirements | Security | Performance | API Capability Evidence | Fit |
|----------|-------|--------------------|------------|-------------|--------------|----------|-------------|-------------------------|-----|
| Consistency-gated anchor acceptance | Custom ESKF gates, cache manifest verification | Anchor accepted only if freshness, provenance, RANSAC, covariance, Mahalanobis, and temporal consistency pass | Prevents confident false fixes | Needs calibrated thresholds | Representative replay and Monte Carlo | Rejects stale/poisoned/low-confidence anchors | Lightweight after candidate generation | Facts #16, #17, #28 | Selected |
| FDR audit trail | Segmented logs + hashes | Logs estimates, inputs, emitted GPS_INPUT, health, tile writes, anchor decisions | Supports incident analysis and cache-poisoning audits | Schema work | 64 GB rollover | Tamper-evident hashes recommended | Sequential writes | AC-NEW-3 | Selected |
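The consistency gate in the table above can be illustrated with a small sketch. It assumes a 2D horizontal innovation (anchor position minus predicted position) and its 2x2 innovation covariance; the inlier-ratio threshold and the function names are hypothetical, and only a subset of the listed gates is shown.

```python
CHI2_2DOF_99 = 9.21  # 99% chi-square quantile for 2 degrees of freedom


def mahalanobis_sq(innovation, cov):
    """Squared Mahalanobis distance for a 2D innovation and 2x2 covariance."""
    dx, dy = innovation
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("innovation covariance must be positive definite")
    # x^T S^-1 x with S^-1 = (1/det) * [[d, -b], [-c, a]]
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def accept_anchor(innovation, cov, inlier_ratio, tile_fresh, hash_ok,
                  min_inlier_ratio=0.3):  # hypothetical threshold
    """Return (accepted, reason); every gate must pass before the ESKF update."""
    if not hash_ok:
        return False, "tile_hash_mismatch"
    if not tile_fresh:
        return False, "stale_tile"
    if inlier_ratio < min_inlier_ratio:
        return False, "low_inlier_ratio"
    if mahalanobis_sq(innovation, cov) > CHI2_2DOF_99:
        return False, "mahalanobis_gate"
    return True, "accepted"
```

Returning the rejection reason alongside the decision makes it straightforward to log each anchor decision to the FDR audit trail.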
## Runtime Modes
| Mode | Trigger | Behavior | `GPS_INPUT` / Telemetry |
|------|---------|----------|--------------------------|
| `satellite_anchored` | VPR + local match passes all gates | ESKF absolute update; tile write eligible only if sigma gate passes | 3D fix, `horiz_accuracy` >= 95% covariance semi-major axis |
| `vo_extrapolated` | VO healthy and anchor age/covariance within bounds | VO/IMU propagation; covariance grows | 3D/2D depending on covariance threshold |
| `dead_reckoned` | visual blackout or no accepted anchor | IMU-only propagation, monotonic covariance growth | degraded fix type; QGC `VISUAL_BLACKOUT_IMU_ONLY` |
| failsafe/no-fix | covariance >500 m or blackout >30 s | stop pretending position is valid | `fix_type=0`, `horiz_accuracy=999.0`, QGC `VISUAL_BLACKOUT_FAILSAFE` |
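The mode table above maps onto a small selection function. The sketch below takes the failsafe limits (500 m, 30 s) and the failsafe field values from the table; the 3D/2D boundary, the 2D fix for `dead_reckoned`, and all names are assumptions, since the table does not pin them.

```python
from dataclasses import dataclass

FAILSAFE_SIGMA_M = 500.0     # from the table: covariance >500 m
FAILSAFE_BLACKOUT_S = 30.0   # from the table: blackout >30 s
FIX_3D_MAX_SIGMA_M = 50.0    # hypothetical 3D/2D boundary, not from the source


@dataclass
class EstimatorHealth:
    sigma_m: float          # horizontal 1-sigma position uncertainty
    blackout_s: float       # time since the last usable image
    vo_healthy: bool
    anchor_accepted: bool   # an anchor passed all gates on this update


def select_output(h):
    """Pick the runtime mode and the GPS_INPUT fields that depend on it."""
    if h.sigma_m > FAILSAFE_SIGMA_M or h.blackout_s > FAILSAFE_BLACKOUT_S:
        return {"mode": "failsafe", "fix_type": 0, "horiz_accuracy": 999.0}
    if h.anchor_accepted:
        mode, fix_type = "satellite_anchored", 3
    elif h.vo_healthy:
        mode = "vo_extrapolated"
        fix_type = 3 if h.sigma_m <= FIX_3D_MAX_SIGMA_M else 2
    else:
        mode, fix_type = "dead_reckoned", 2  # assumed value for "degraded fix type"
    return {"mode": mode, "fix_type": fix_type, "horiz_accuracy": h.sigma_m}
```

Checking the failsafe condition first means no combination of flags can report a valid fix once either limit is exceeded.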
## Testing Strategy
### Integration / Functional Tests
- VO replay: assert AC-2.1a and AC-2.2 VO MRE on overlapping frame pairs.
- Satellite anchor replay: assert AC-1.1/1.2, AC-2.2 cross-domain MRE, freshness rejection, and source labels.
- DINOv2 descriptor fidelity: compare PyTorch/ONNX/TensorRT embeddings and retrieval rankings before accepting optimized engines.
- FAISS CPU index tests: top-K recall, query latency, index size, save/load behavior on Jetson ARM64.
- LightGlue extractor matrix: DISK vs ALIKED vs SuperPoint benchmark; SuperPoint output excluded from production unless license approved.
- COG cache lifecycle: write-new generated tile, update manifest, verify active version and rollback.
- `GPS_INPUT` SITL: validate fix type, `horiz_accuracy`, velocity fields, ignore flags, `EK3_SRC1_*` parameters, QGC behavior.
- Security gates: stale tile, mismatched tile hash, low inlier ratio, impossible velocity jump, and spoofed GPS during blackout.
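The descriptor-fidelity test listed above compares embeddings and retrieval rankings between a reference runtime and an optimized engine. A minimal sketch of the two metrics, using plain Python lists in place of real descriptors (function names are illustrative, and the acceptance thresholds are left to the test plan):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, database, k):
    """Indices of the k database descriptors most similar to the query."""
    order = sorted(range(len(database)),
                   key=lambda i: cosine_similarity(query, database[i]),
                   reverse=True)
    return order[:k]


def top_k_overlap(ref_query, test_query, database, k):
    """Fraction of the reference top-k that the optimized descriptor also retrieves."""
    ref = set(top_k(ref_query, database, k))
    test = set(top_k(test_query, database, k))
    return len(ref & test) / k
```

Embedding similarity alone can hide ranking changes, so reporting both the per-query cosine similarity and the top-K overlap gives a fuller picture of whether an optimized engine is safe to accept.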
### Non-Functional Tests
- Jetson latency and memory: <400 ms p95, <8 GB shared memory, no 25 W thermal throttle.
- Cache budget: 400 km² imagery + manifests + descriptors fits budget or reports explicit split budget.
- FDR 8-hour load: <=64 GB, rollover logged, no silent payload loss.
- Monte Carlo false-position and cache-poisoning tests for AC-NEW-4 and AC-NEW-7.
- Cold boot: first valid `GPS_INPUT` <30 s p95 across 50 runs.
## References
Detailed source registry: `_docs/00_research/01_source_registry.md`.
Key added Mode B sources:
- DINOv2 TensorRT issue: https://github.com/NVIDIA/TensorRT/issues/4348
- DINOv2 Jetson forum issue: https://forums.developer.nvidia.com/t/dinov2-tensorrt-model-performance-issue/312251
- LightGlue license discussion: https://github.com/cvg/LightGlue/issues/120
- ArduPilot GPS_INPUT velocity issue: https://github.com/ArduPilot/ardupilot/issues/19633
- FAISS install docs: https://github.com/facebookresearch/faiss/blob/main/INSTALL.md
- Orthophoto visual geolocalization: https://ar5iv.labs.arxiv.org/html/2103.14381
## Related Artifacts
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Component fit matrix: `_docs/00_research/06_component_fit_matrix.md`
- Fact cards: `_docs/00_research/02_fact_cards.md`
-491
@@ -1,491 +0,0 @@
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| FastAPI + SSE as primary output | **Functional**: New AC requires MAVLink GPS_INPUT to flight controller, not REST/SSE. The system must act as a GPS replacement module. SSE is the wrong output channel. | **Replace with pymavlink GPS_INPUT sender**. Send GPS_INPUT at 5-10Hz to flight controller via UART. Retain minimal FastAPI only for local IPC (object localization API). |
| No ground station integration | **Functional**: New AC requires streaming position+confidence to ground station and receiving re-localization commands via telemetry. Draft02 had no telemetry. | **MAVLink telemetry integration**: GPS data forwarded automatically by flight controller. Custom data via NAMED_VALUE_FLOAT (confidence, drift). Re-localization hints via COMMAND_LONG listener. |
| MAVSDK library (per restriction) | **Functional**: MAVSDK-Python v3.15.3 cannot send GPS_INPUT messages. Feature requested since 2021, still unresolved. This is a blocking limitation for the core output function. | **Use pymavlink** for all MAVLink communication. pymavlink provides `gps_input_send()` and full MAVLink v2 access. Note conflict with restriction — pymavlink is the only viable option. |
| 3fps camera → ~3Hz output | **Performance**: ArduPilot GPS_RATE_MS minimum is 5Hz (200ms). 3Hz camera output is below minimum. Flight controller EKF may not fuse properly. | **IMU-interpolated 5-10Hz GPS_INPUT**: ESKF prediction runs at 100+Hz internally. Emit predicted state as GPS_INPUT at 5-10Hz. Camera corrections arrive at 3Hz within this stream. |
| No startup/failsafe procedures | **Functional**: New AC requires init from last GPS, reboot recovery, IMU-only fallback. Draft02 assumed position was already known. | **Full lifecycle management**: (1) Boot → read GPS from flight controller → init ESKF. (2) Reboot → read IMU-extrapolated position → re-init. (3) N-second failure → stop GPS_INPUT → autopilot falls back to IMU. |
| Basic object localization (nadir only) | **Functional**: New AC adds AI camera with configurable angle and zoom. Nadir pixel-to-GPS is insufficient. | **Trigonometric projection for oblique camera**: ground_distance = alt × tan(tilt), bearing = heading + pan + pixel offset. Local API for AI system requests. |
| No thermal management | **Performance**: Jetson Orin Nano Super throttles at 80°C (GPU drops 1GHz→300MHz = 3x slowdown). Could blow 400ms budget. | **Thermal monitoring + adaptive pipeline**: Use 25W mode. Monitor via tegrastats. If temp >75°C → reduce satellite matching frequency. If >80°C → VO+IMU only. |
| ESKF covariance without explicit drift budget | **Functional**: New AC requires max 100m cumulative VO drift between satellite anchors. Draft02 uses covariance for keyframe selection but no explicit budget. | **Drift budget tracker**: √(σ_x² + σ_y²) from ESKF as drift estimate. When approaching 100m → force every-frame satellite matching. Report via horiz_accuracy in GPS_INPUT. |
| No satellite imagery validation | **Functional**: New AC requires ≥0.5 m/pixel, <2 years old. Draft02 didn't validate. | **Preprocessing validation step**: Check zoom 19 availability (0.3 m/pixel). Fall back to zoom 18 (0.6 m/pixel). Flag stale tiles. |
| "Ask user via API" for re-localization | **Functional**: New AC says send re-localization request to ground station via telemetry link, not REST API. Operator sends hint via telemetry. | **MAVLink re-localization protocol**: On 3 consecutive failures → send STATUSTEXT alert to ground station. Operator sends COMMAND_LONG with approximate lat/lon. System uses hint to constrain tile search. |
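The oblique-camera projection named in the table above (ground_distance = alt × tan(tilt), bearing = heading + pan) can be sketched as follows. The sketch assumes tilt is measured from nadir, flat terrain, level flight, and a small-offset spherical-Earth conversion to latitude/longitude; the pixel-offset term is omitted, and the function name is illustrative.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def project_object(lat_deg, lon_deg, alt_agl_m, heading_deg, pan_deg,
                   tilt_from_nadir_deg):
    """Project the camera boresight onto flat ground; returns (lat, lon, distance_m)."""
    ground_distance = alt_agl_m * math.tan(math.radians(tilt_from_nadir_deg))
    bearing = math.radians((heading_deg + pan_deg) % 360.0)
    north = ground_distance * math.cos(bearing)
    east = ground_distance * math.sin(bearing)
    # Small-offset conversion from metres to degrees
    dlat = math.degrees(north / EARTH_RADIUS_M)
    dlon = math.degrees(east / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return lat_deg + dlat, lon_deg + dlon, ground_distance
```

Because the distance grows with tan(tilt), the error from a given attitude uncertainty increases quickly at shallow look angles, which is worth reflecting in any accuracy reported to the caller.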
## Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). The system replaces the GPS module for the flight controller by sending MAVLink GPS_INPUT messages via pymavlink over UART. Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU data from the flight controller. GPS_INPUT is sent at 5-10Hz, with camera-based corrections at 3Hz and IMU prediction filling the gaps.
**Hard constraint**: Camera shoots at ~3fps (333ms interval). The full VO+ESKF pipeline must complete within 400ms per frame. GPS_INPUT output rate: 5-10Hz minimum (ArduPilot EKF requirement).
**Output architecture**:
- **Primary**: pymavlink → GPS_INPUT to flight controller via UART (replaces GPS module)
- **Telemetry**: Flight controller auto-forwards GPS data to ground station. Custom NAMED_VALUE_FLOAT for confidence/drift at 1Hz
- **Commands**: Ground station → COMMAND_LONG → flight controller → pymavlink listener on companion computer
- **Local IPC**: Minimal FastAPI on localhost for object localization requests from AI systems
```
┌─────────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ Satellite Tiles → Download & Validate → Pre-resize → Store │
│ (Google Maps) (≥0.5m/px, <2yr) (matcher res) (GeoHash) │
│ Copy to Jetson storage │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ STARTUP: │
│ pymavlink → read GLOBAL_POSITION_INT → init ESKF → start cuVSLAM │
│ │
│ EVERY FRAME (3fps, 333ms interval): │
│ ┌──────────────────────────────────────┐ │
│ │ Nav Camera → Downsample (CUDA ~2ms) │ │
│ │ → cuVSLAM VO+IMU (~9ms) │ │
│ │ → ESKF measurement update │ │
│ └──────────────────────────────────────┘ │
│ │
│ 5-10Hz CONTINUOUS (between camera frames): │
│ ┌──────────────────────────────────────┐ │
│ │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller │
│ │ (pymavlink, every 100-200ms) │ (GPS1_TYPE=14) │
│ └──────────────────────────────────────┘ │
│ │
│ KEYFRAMES (every 3-10 frames, async): │
│ ┌──────────────────────────────────────┐ │
│ │ Satellite match (CUDA stream B) │──→ ESKF correction │
│ │ LiteSAM TRT FP16 or XFeat │ │
│ └──────────────────────────────────────┘ │
│ │
│ TELEMETRY (1Hz): │
│ ┌──────────────────────────────────────┐ │
│ │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station │
│ │ STATUSTEXT: alerts, re-loc requests │ (via telemetry radio) │
│ └──────────────────────────────────────┘ │
│ │
│ COMMANDS (from ground station): │
│ ┌──────────────────────────────────────┐ │
│ │ Listen COMMAND_LONG: re-loc hint │←── Ground Station │
│ │ (lat/lon from operator) │ (via telemetry radio) │
│ └──────────────────────────────────────┘ │
│ │
│ LOCAL IPC: │
│ ┌──────────────────────────────────────┐ │
│ │ FastAPI localhost:8000 │←── AI Detection System │
│ │ POST /localize (object GPS calc) │ │
│ │ GET /status (system health) │ │
│ └──────────────────────────────────────┘ │
│ │
│ IMU: 100+Hz from flight controller → ESKF prediction │
│ TILES: ±2km preloaded in RAM from flight plan │
│ THERMAL: Monitor via tegrastats, adaptive pipeline throttling │
└─────────────────────────────────────────────────────────────────────┘
```
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~9ms/frame)
NVIDIA's CUDA-accelerated VO library (PyCuVSLAM v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. It supports monocular camera + IMU natively and provides automatic fallback to IMU when visual tracking fails, loop closure, and Python and C++ APIs.
**CRITICAL: cuVSLAM on low-texture terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade optical flow (classical features). On uniform agricultural terrain:
- Few corners detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1s) → constant-velocity integrator (~0.5s)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
**Mitigation**:
1. Increase satellite matching frequency when cuVSLAM keypoint count drops
2. IMU dead-reckoning bridge via ESKF (continues GPS_INPUT output during tracking loss)
3. Accept higher drift in featureless segments — report via horiz_accuracy
4. Keypoint density monitoring triggers adaptive satellite matching
### 2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching:
- cuVSLAM provides VO at every frame (~9ms)
- Satellite matching triggers on keyframes selected by:
- Fixed interval: every 3-10 frames
- ESKF covariance exceeds threshold (drift approaching budget)
- VO failure: cuVSLAM reports tracking loss
- Thermal: reduce frequency if temperature high
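The keyframe triggers above can be combined into one decision function. This is a sketch under assumptions: the 75°C and 80°C limits come from the thermal row of the assessment table, the 3 and 10 frame intervals from the fixed-interval range, and the 80 m covariance trigger is a hypothetical value chosen below the 100 m drift budget.

```python
def should_match_satellite(frames_since_match, sigma_m, vo_tracking_ok, temp_c, *,
                           base_interval=3, hot_interval=10,
                           sigma_trigger_m=80.0,      # hypothetical, below 100 m budget
                           throttle_temp_c=75.0, cutoff_temp_c=80.0):
    """Decide whether the current frame should be sent to the satellite matcher."""
    if temp_c > cutoff_temp_c:
        return False  # above the cutoff the pipeline runs VO+IMU only
    if not vo_tracking_ok:
        return True   # VO failure: request an absolute fix immediately
    if sigma_m >= sigma_trigger_m:
        return True   # drift estimate is approaching the budget
    # Otherwise fall back to the fixed interval, stretched when running hot
    interval = hot_interval if temp_c > throttle_temp_c else base_interval
    return frames_since_match >= interval
```

The thermal cutoff is checked first so that an overheating board is never asked to run the matcher, even after tracking loss; whether that ordering is acceptable is a design decision to confirm in hot-soak testing.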
### 3. Satellite Matcher Selection (Benchmark-Driven)
**Context**: Our UAV-to-satellite matching is nadir-to-nadir (both top-down). Challenges are season/lighting differences and temporal changes, not extreme viewpoint gaps.
**Candidate A: LiteSAM (opt) TRT FP16 @ 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params. TensorRT FP16 with reparameterized MobileOne. Estimated ~165-330ms on Orin Nano Super with TRT FP16.
**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Fastest option. It is a general-purpose matcher, but our nadir-to-nadir viewpoint gap is small.
**Decision rule** (day-one on Orin Nano Super):
1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at 1280px
3. If ≤200ms → LiteSAM at 1280px
4. If >200ms → XFeat
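The decision rule above reduces to a timing harness plus a threshold check. In the sketch below, the choice of the median as the latency statistic, the warm-up count, and the function names are assumptions; `run_once` stands in for one complete matcher inference and must block until the result is ready.

```python
import statistics
import time

LATENCY_BUDGET_MS = 200.0  # threshold from the decision rule


def median_latency_ms(run_once, runs=20, warmup=3):
    """Median wall-clock latency of run_once in milliseconds."""
    for _ in range(warmup):
        run_once()  # discard warm-up runs (engine load, caches)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def pick_matcher(litesam_latency_ms):
    """Apply the rule: LiteSAM at 1280px if within budget, otherwise XFeat."""
    if litesam_latency_ms <= LATENCY_BUDGET_MS:
        return "LiteSAM@1280px"
    return "XFeat"
```

For GPU inference, `run_once` would need to include device synchronization; otherwise the timer measures only the kernel launch and understates latency.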
### 4. TensorRT FP16 Optimization
LiteSAM's MobileOne backbone is reparameterizable — multi-branch collapses to single feed-forward at inference. INT8 safe only for MobileOne CNN layers, NOT for TAIFormer transformer components.
### 5. CUDA Stream Pipelining
- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async, does not block VO)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management
### 6. Proactive Tile Loading
Preload tiles within ±2km of flight plan into RAM at startup. For a 50km route, ~2000 tiles at zoom 19 ≈ ~200MB. Eliminates disk I/O during flight.
On VO failure / expanded search:
1. Compute IMU dead-reckoning position
2. Rank preloaded tiles by distance to predicted position
3. Try top 3 tiles, then expand
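Ranking preloaded tiles by distance to the predicted position could look like the sketch below. The `Tile` record (with a tile-center latitude and longitude) and the spherical-Earth haversine distance are illustrative assumptions; the actual tile index is GeoHash-based and may rank differently.

```python
import math
from typing import NamedTuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


class Tile(NamedTuple):
    tile_id: str
    lat: float  # tile center latitude, degrees
    lon: float  # tile center longitude, degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def rank_tiles(tiles: list[Tile], lat: float, lon: float, top_k: int = 3) -> list[Tile]:
    """Return the top_k tiles closest to the dead-reckoned position."""
    return sorted(tiles, key=lambda t: haversine_m(lat, lon, t.lat, t.lon))[:top_k]
```

If the first `top_k` candidates fail to match, the same function can be called again with a larger `top_k` to expand the search.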
### 7. 5-10Hz GPS_INPUT Output Loop
Dedicated thread/coroutine sends GPS_INPUT at fixed rate (5-10Hz):
1. Read current ESKF state (position, velocity, covariance)
2. Compute horiz_accuracy from √(σ_x² + σ_y²)
3. Set fix_type based on last correction type (3=satellite-corrected, 2=VO-only, 1=IMU-only)
4. Send via `mav.gps_input_send()`
5. Sleep until next interval
This decouples camera frame rate (3fps) from GPS_INPUT rate (5-10Hz).
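Steps 1-3 of the loop amount to converting the ESKF state into GPS_INPUT field values. The sketch below builds only a subset of the fields as a plain dictionary so it runs without pymavlink; the function name, the `source` labels, and the dictionary layout are hypothetical. The unit conventions (lat/lon as integer degrees × 1e7, altitude and accuracy in meters, NED velocity in m/s) follow the MAVLink GPS_INPUT definition.

```python
import math

# fix_type per last correction type, as listed in step 3
FIX_TYPE_BY_SOURCE = {"satellite": 3, "vo": 2, "imu": 1}


def gps_input_fields(lat_deg, lon_deg, alt_m, vel_ned, sigma_x_m, sigma_y_m,
                     source, time_usec):
    """Build a subset of GPS_INPUT fields from the current ESKF state."""
    return {
        "time_usec": time_usec,
        "fix_type": FIX_TYPE_BY_SOURCE[source],
        "lat": int(round(lat_deg * 1e7)),   # degE7
        "lon": int(round(lon_deg * 1e7)),   # degE7
        "alt": alt_m,                       # meters
        "vn": vel_ned[0],                   # m/s north
        "ve": vel_ned[1],                   # m/s east
        "vd": vel_ned[2],                   # m/s down
        "horiz_accuracy": math.hypot(sigma_x_m, sigma_y_m),  # sqrt(sx^2 + sy^2)
    }
```

The remaining GPS_INPUT fields (for example `gps_id`, `ignore_flags`, `satellites_visible`) would have to be filled in before the values are handed to `mav.gps_input_send()`.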
## Existing/Competitor Solutions Analysis
| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection | Competitive with LiteSAM | TRT available | 15.05M params, heavier |
| STHN (IEEE RA-L 2024) | Deep homography estimation | 4.24m at 50m range | Lightweight | Needs RGB retraining |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Planetary, needs adaptation |
## Architecture
### Component: Flight Controller Integration (NEW)
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| pymavlink GPS_INPUT | pymavlink | Full MAVLink v2 access, GPS_INPUT support, pure Python, aarch64 compatible | Lower-level API, manual message handling | ~1ms per send | ✅ Best |
| MAVSDK-Python TelemetryServer | MAVSDK v3.15.3 | Higher-level API, aarch64 wheels | NO GPS_INPUT support, no custom messages | N/A — missing feature | ❌ Blocked |
| MAVSDK C++ MavlinkDirect | MAVSDK v4 (future) | Custom message support planned | Not available in Python wrapper yet | N/A — not released | ❌ Not available |
| MAVROS (ROS) | ROS + MAVROS | Full GPS_INPUT support, ROS ecosystem | Heavy ROS dependency, complex setup, unnecessary overhead | ~5ms overhead | ⚠️ Overkill |
**Selected**: **pymavlink** — only viable Python library for GPS_INPUT. Pure Python, works on aarch64, full MAVLink v2 message set.
**Restriction note**: restrictions.md specifies "MAVSDK library" but MAVSDK-Python cannot send GPS_INPUT (confirmed: Issue #320, open since 2021). pymavlink is the necessary alternative.
Configuration:
- Connection: UART (`/dev/ttyTHS0` or `/dev/ttyTHS1` on Jetson, 115200-921600 baud)
- Flight controller: GPS1_TYPE=14, SERIAL2_PROTOCOL=2 (MAVLink2)
- GPS_INPUT rate: 5-10Hz (dedicated output thread)
- Heartbeat: 1Hz to maintain connection
### Component: Visual Odometry
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source, low-texture terrain risk | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | Open-source, learned features | No IMU integration, ~30-50ms | ~30-50ms/frame | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps | ~33ms/frame | ⚠️ Slower |
**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built for Jetson.
### Component: Satellite Image Matching
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy, 6.31M params | Untested on Orin Nano Super TRT | Est. ~165-330ms TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, Jetson-proven, fastest | General-purpose | ~50-100ms | ✅ Fallback |
**Selection**: Day-one benchmark. LiteSAM TRT FP16 at 1280px → if ≤200ms → LiteSAM. If >200ms → XFeat.
### Component: Sensor Fusion
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| ESKF (custom) | Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
**Output rates**:
- IMU prediction: 100+Hz (from flight controller IMU via pymavlink)
- cuVSLAM VO update: ~3Hz
- Satellite update: ~0.3-1Hz (keyframes, async)
- GPS_INPUT output: 5-10Hz (ESKF predicted state)
**Drift budget**: Track √(σ_x² + σ_y²) from ESKF covariance. When approaching 100m → force every-frame satellite matching.
### Component: Ground Station Telemetry (NEW)
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| MAVLink auto-forwarding + NAMED_VALUE_FLOAT | pymavlink | Standard MAVLink, no custom protocol, works with all GCS (Mission Planner, QGC) | Limited bandwidth (~12kbit/s), NAMED_VALUE_FLOAT name limited to 10 chars | ~50 bytes/msg | ✅ Best |
| Custom MAVLink dialect messages | pymavlink + custom XML | Full flexibility | Requires custom GCS plugin, non-standard | ~50 bytes/msg | ⚠️ Complex |
| Separate telemetry channel | TCP/UDP over separate radio | Full bandwidth | Extra hardware, extra radio | N/A | ❌ Not available |
**Selected**: **Standard MAVLink forwarding + NAMED_VALUE_FLOAT**
Telemetry data sent to ground station:
- GPS position: auto-forwarded by flight controller from GPS_INPUT data
- Confidence score: NAMED_VALUE_FLOAT `"gps_conf"` at 1Hz (values: 1=HIGH, 2=MEDIUM, 3=LOW, 4=VERY_LOW)
- Drift estimate: NAMED_VALUE_FLOAT `"gps_drift"` at 1Hz (meters)
- Matching status: NAMED_VALUE_FLOAT `"sat_match"` at 1Hz (0=inactive, 1=matching, 2=failed)
- Alerts: STATUSTEXT for critical events (re-localization request, system failure)
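The three NAMED_VALUE_FLOAT values listed above can be assembled in one place, which also makes it easy to enforce the 10-character name limit. This is a sketch only: the function name and the string labels for match state are assumptions, and sending the values over MAVLink is left out.

```python
CONFIDENCE_CODES = {"HIGH": 1.0, "MEDIUM": 2.0, "LOW": 3.0, "VERY_LOW": 4.0}
MATCH_CODES = {"inactive": 0.0, "matching": 1.0, "failed": 2.0}
MAX_NAME_LEN = 10  # NAMED_VALUE_FLOAT name field length


def telemetry_values(confidence: str, drift_m: float, match_state: str):
    """Return (name, value) pairs to send as NAMED_VALUE_FLOAT at 1Hz."""
    values = [
        ("gps_conf", CONFIDENCE_CODES[confidence]),
        ("gps_drift", float(drift_m)),
        ("sat_match", MATCH_CODES[match_state]),
    ]
    for name, _ in values:
        if len(name) > MAX_NAME_LEN:
            raise ValueError(f"name too long for NAMED_VALUE_FLOAT: {name}")
    return values
```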
Re-localization from ground station:
- Operator sees drift/failure alert in GCS
- Sends COMMAND_LONG (MAV_CMD_USER_1) with lat/lon in param5/param6
- Companion computer listens for COMMAND_LONG with target component ID
- Receives hint → constrains tile search → attempts satellite matching near hint coordinates
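The command listener's filtering step might look like the following sketch. `CommandLong` is a hypothetical stand-in for a decoded COMMAND_LONG message, and the component ID 191 used in the usage note (MAV_COMP_ID_ONBOARD_COMPUTER) is an assumption about how the companion computer identifies itself. `MAV_CMD_USER_1` has the value 31010 in the MAVLink common message set.

```python
from dataclasses import dataclass
from typing import Optional

MAV_CMD_USER_1 = 31010


@dataclass
class CommandLong:
    """Hypothetical stand-in for a decoded COMMAND_LONG message."""
    command: int
    target_component: int
    param5: float  # latitude hint, degrees
    param6: float  # longitude hint, degrees


def parse_reloc_hint(msg: CommandLong, own_component_id: int) -> Optional[tuple[float, float]]:
    """Return (lat, lon) if the message is a valid re-localization hint, else None."""
    if msg.command != MAV_CMD_USER_1 or msg.target_component != own_component_id:
        return None
    lat, lon = msg.param5, msg.param6
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None  # reject out-of-range coordinates
    return (lat, lon)
```

For example, `parse_reloc_hint(CommandLong(31010, 191, 48.5, 37.5), 191)` returns `(48.5, 37.5)`. COMMAND_LONG parameters are 32-bit floats, so a hint near 48° latitude is quantized to a few millionths of a degree, which is coarse but adequate for constraining a tile search.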
### Component: Startup & Lifecycle (NEW)
**Startup sequence**:
1. Boot Jetson → start GPS-Denied service (systemd)
2. Connect to flight controller via pymavlink on UART
3. Wait for heartbeat from flight controller
4. Read GLOBAL_POSITION_INT → extract lat, lon, alt
5. Initialize ESKF state with this position (high confidence if real GPS available)
6. Start cuVSLAM with first camera frames
7. Begin GPS_INPUT output loop at 5-10Hz
8. Preload satellite tiles within ±2km of flight plan into RAM
9. System ready — GPS-Denied active
**GPS denial detection**:
Not required: the system always outputs GPS_INPUT. If real GPS is available, the flight controller uses whichever GPS source has better accuracy (configurable GPS blending or priority). When real GPS degrades or is lost, the flight controller seamlessly switches to our GPS_INPUT.
**Failsafe**:
- If no valid position estimate for N seconds (configurable, e.g., 10s): stop sending GPS_INPUT
- Flight controller detects GPS timeout → falls back to IMU-only dead reckoning
- System logs failure, continues attempting recovery (VO + satellite matching)
- When recovery succeeds: resume GPS_INPUT output
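The failsafe reduces to a timer that gates the GPS_INPUT output loop. A minimal sketch, assuming a monotonic clock in seconds is passed in by the caller and using the 10 s example timeout; the class and method names are hypothetical.

```python
from typing import Optional


class GpsInputFailsafe:
    """Gate GPS_INPUT output on the age of the last valid position estimate."""

    def __init__(self, timeout_s: float = 10.0):
        self.timeout_s = timeout_s
        self.last_valid_s: Optional[float] = None

    def mark_valid(self, now_s: float) -> None:
        """Record that a valid position estimate was produced at now_s."""
        self.last_valid_s = now_s

    def should_send(self, now_s: float) -> bool:
        """True while the last valid estimate is no older than the timeout."""
        if self.last_valid_s is None:
            return False
        return (now_s - self.last_valid_s) <= self.timeout_s
```

Because `mark_valid()` can be called again at any time, output resumes automatically once VO or satellite matching recovers.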
**Reboot recovery**:
1. Jetson reboots → re-establish pymavlink connection
2. Read GLOBAL_POSITION_INT (the flight controller's fused position estimate, IMU-extrapolated since GPS_INPUT stopped)
3. Initialize ESKF with this position (low confidence, horiz_accuracy=100m+)
4. Resume cuVSLAM + satellite matching → accuracy improves over time
5. Resume GPS_INPUT output
### Component: Object Localization (UPDATED)
**Two modes**:
**Mode 1: Navigation camera (nadir)**
Frame-center GPS from ESKF. Any object in navigation camera frame:
1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by heading (yaw from IMU)
4. Convert meter offset to lat/lon delta, add to frame-center GPS
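The four steps of Mode 1 can be sketched as below. Several conventions are assumptions made for the example: image x grows to the right and y grows downward, the top of the image points along the UAV heading, heading is measured clockwise from north, and a spherical Earth is used for the meter-to-degree conversion.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # spherical approximation


def pixel_to_latlon(center_lat, center_lon, dx_px, dy_px, gsd_m, heading_deg):
    """Convert a pixel offset from the frame center into lat/lon (nadir camera)."""
    # Steps 1-2: pixel offset to meters in the camera frame.
    right_m = dx_px * gsd_m
    forward_m = -dy_px * gsd_m          # image y grows downward
    # Step 3: rotate by heading into north/east.
    psi = math.radians(heading_deg)
    north_m = forward_m * math.cos(psi) - right_m * math.sin(psi)
    east_m = forward_m * math.sin(psi) + right_m * math.cos(psi)
    # Step 4: meters to degrees, added to the frame-center position.
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(center_lat))))
    return center_lat + dlat, center_lon + dlon
```

With a GSD of 0.1 m/pixel, an object 100 pixels above the frame center at heading 0° lands about 10 m north of the frame-center position.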
**Mode 2: AI camera (configurable angle and zoom)**
1. Get current UAV position from ESKF
2. Get AI camera params: tilt_angle (from vertical), pan_angle (from heading), zoom (effective focal length)
3. Get pixel coordinates of detected object in AI camera frame
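4. Compute bearing: bearing = heading + pan_angle + atan2(dx_px × pixel_pitch, focal_eff), where pixel_pitch = sensor_width / image_width_px
5. Compute ground distance: for flat terrain, slant_range = altitude / cos(tilt_angle + dy_angle), ground_range = slant_range × sin(tilt_angle + dy_angle), where dy_angle = atan2(dy_px × pixel_pitch, focal_eff) is the vertical angular offset of the pixel from the optical axis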
6. Convert bearing + ground_range to lat/lon offset
7. Return GPS coordinates with accuracy estimate
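The flat-terrain geometry of Mode 2 can be sketched as follows. This is an approximation under stated assumptions, not a full camera model: pan and tilt offsets are treated as separable, `pixel_pitch_mm` (sensor width divided by image width in pixels) is a parameter introduced for the example, `dy_up_px` counts pixels above the image center (toward the horizon), and lines of sight beyond an arbitrary 85° off nadir are rejected.

```python
import math


def ai_camera_ground_offset(altitude_m, heading_deg, pan_deg, tilt_deg,
                            dx_px, dy_up_px, pixel_pitch_mm, focal_eff_mm):
    """Return the (north_m, east_m) offset from the UAV to the object, flat terrain."""
    # Angular offsets of the pixel from the optical axis.
    dx_angle = math.atan2(dx_px * pixel_pitch_mm, focal_eff_mm)
    dy_angle = math.atan2(dy_up_px * pixel_pitch_mm, focal_eff_mm)
    bearing = math.radians(heading_deg + pan_deg) + dx_angle
    off_nadir = math.radians(tilt_deg) + dy_angle   # angle from vertical
    if not 0.0 <= off_nadir < math.radians(85.0):
        raise ValueError("line of sight too close to the horizon for a flat-terrain estimate")
    slant_range = altitude_m / math.cos(off_nadir)
    ground_range = slant_range * math.sin(off_nadir)
    return ground_range * math.cos(bearing), ground_range * math.sin(bearing)
```

The returned north/east offset can then be converted to a lat/lon delta in the same way as for the navigation camera. Accuracy degrades quickly as the off-nadir angle grows, since ground range scales with the tangent of that angle.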
**Local API** (FastAPI on localhost:8000):
- `POST /localize` — accepts: pixel_x, pixel_y, camera_id ("nav" or "ai"), ai_camera_params (tilt, pan, zoom) → returns: lat, lon, accuracy_m
- `GET /status` — returns: system state, confidence, drift, uptime
### Component: Satellite Tile Preprocessing (Offline)
**Selected**: GeoHash-indexed tile pairs on disk + RAM preloading.
Pipeline:
1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at zoom 19 (0.3 m/pixel at the equator)
3. If zoom 19 unavailable: fall back to zoom 18 (0.6 m/pixel at the equator, about 0.4 m/pixel at ~48°N, which meets the ≥0.5 m/pixel requirement)
4. Validate: resolution ≥0.5 m/pixel, check imagery staleness where possible
5. Pre-resize each tile to matcher input resolution
6. Store: original + resized + metadata (GPS bounds, zoom, GSD, download date) in GeoHash-indexed structure
7. Copy to Jetson storage before flight
8. At startup: preload tiles within ±2km of flight plan into RAM
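The resolution check in step 4 depends on latitude as well as zoom. A small sketch of the standard Web Mercator ground-resolution formula, assuming 256-pixel tiles; the function names and the 0.5 m/pixel default are illustrative.

```python
import math

# Ground resolution at the equator for zoom 0 with 256-pixel Web Mercator tiles.
EQUATOR_RES_ZOOM0_M = 156543.03392


def ground_resolution_m_per_px(zoom: int, lat_deg: float) -> float:
    """Meters per pixel of a Web Mercator tile at the given zoom and latitude."""
    return EQUATOR_RES_ZOOM0_M * math.cos(math.radians(lat_deg)) / (2 ** zoom)


def meets_resolution(zoom: int, lat_deg: float, required_m_per_px: float = 0.5) -> bool:
    """True if the tile's ground resolution is at least as fine as required."""
    return ground_resolution_m_per_px(zoom, lat_deg) <= required_m_per_px
```

At 48°N this gives roughly 0.2 m/pixel for zoom 19 and 0.4 m/pixel for zoom 18, so the per-tile GSD stored in the metadata should be computed from the tile's latitude rather than taken from the equatorial figure.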
### Component: Re-localization (Disconnected Segments)
When cuVSLAM reports tracking loss (sharp turn, no features):
1. Flag next frame as keyframe → trigger satellite matching
2. Compute IMU dead-reckoning position since last known position
3. Rank preloaded tiles by distance to dead-reckoning position
4. Try top 3 tiles sequentially
5. If match found: position recovered, new segment begins
6. If 3 consecutive keyframe failures: send STATUSTEXT alert to ground station ("RE-LOC REQUEST: position uncertain, drift Xm")
7. While waiting for operator hint: continue VO/IMU dead reckoning, report low confidence via horiz_accuracy
8. If operator sends COMMAND_LONG with lat/lon hint: constrain tile search to ±500m of hint
9. If still no match after operator hint: continue dead reckoning, log failure
### Component: Thermal Management (NEW)
**Power mode**: 25W (stable sustained performance)
**Monitoring**: Read GPU/CPU temperature via tegrastats or sysfs thermal zones at 1Hz.
**Adaptive pipeline**:
- Normal (<70°C): Full pipeline — cuVSLAM every frame + satellite match every 3-10 frames
- Warm (70-75°C): Reduce satellite matching to every 5-10 frames
- Hot (75-80°C): Reduce satellite matching to every 10-15 frames
- Throttling (>80°C): Disable satellite matching entirely, VO+IMU only (cuVSLAM ~9ms is very light). Report LOW confidence. Resume satellite matching when temp drops below 75°C
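The four temperature bands, including the resume-below-75°C hysteresis, can be expressed as a small policy function. This is a sketch: the function name and return shape are hypothetical, and treating exactly 75°C as "Hot" is an arbitrary choice where the bands overlap.

```python
def satellite_match_policy(temp_c: float, currently_disabled: bool):
    """Return (frame_interval_range or None, disabled) for the current temperature."""
    if temp_c > 80.0:
        return None, True                 # Throttling: VO+IMU only
    if currently_disabled and temp_c >= 75.0:
        return None, True                 # stay disabled until below 75°C
    if temp_c >= 75.0:
        return (10, 15), False            # Hot
    if temp_c >= 70.0:
        return (5, 10), False             # Warm
    return (3, 10), False                 # Normal
```

The caller keeps the returned `disabled` flag and passes it back on the next 1Hz temperature reading, which is what prevents rapid toggling around 80°C.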
**Hardware requirement**: Active cooling fan (5V) mandatory for UAV companion computer enclosure.
## Processing Time Budget (per frame, 333ms interval)
### Normal Frame (non-keyframe, ~60-80% of frames)
| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps |
| ESKF measurement update | ~1ms | NumPy |
| **Total per camera frame** | **~22ms** | Well within 333ms |
GPS_INPUT output runs independently at 5-10Hz (every 100-200ms):
| Step | Time | Notes |
|------|------|-------|
| Read ESKF state | <0.1ms | Shared state |
| Compute horiz_accuracy | <0.1ms | √(σ²) |
| pymavlink gps_input_send | ~1ms | UART write |
| **Total per GPS_INPUT** | **~1ms** | Negligible overhead |
### Keyframe Satellite Matching (async, every 3-10 frames)
Runs on separate CUDA stream — does NOT block VO or GPS_INPUT.
**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:
| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 | ≤200ms | Go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async |
**Path B — XFeat (if LiteSAM >200ms)**:
| Step | Time | Notes |
|------|------|-------|
| XFeat extraction + matching | ~50-80ms | TensorRT FP16 |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async |
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state (configure pruning for 3000 frames) |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink runtime | ~20MB | Lightweight |
| FastAPI (local IPC) | ~50MB | Minimal, localhost only |
| Current frame buffer | ~2MB | |
| ESKF state + buffers | ~10MB | |
| **Total** | **~2.0-2.4GB** | ~25-30% of 8GB, comfortable |
## Confidence Scoring → GPS_INPUT Mapping
| Level | Condition | horiz_accuracy (m) | fix_type | GPS_INPUT satellites_visible |
|-------|-----------|---------------------|----------|------------------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | 10-20 | 3 (3D) | 12 |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50 | 3 (3D) | 8 |
| LOW | cuVSLAM VO only, no recent correction, OR high thermal throttling | 50-100 | 2 (2D) | 4 |
| VERY LOW | IMU dead-reckoning only | 100-500 | 1 (no fix) | 1 |
| MANUAL | Operator-provided re-localization hint | 200 | 3 (3D) | 6 |
Note: `satellites_visible` is synthetic — used to influence EKF weighting. ArduPilot gives more weight to GPS with higher satellite count and lower horiz_accuracy.
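The mapping table can be held as a lookup so the GPS_INPUT output loop stays free of branching. The sketch below assumes that the ESKF-derived horizontal sigma is clamped into the band for the current level; that clamping rule, like the names used, is an illustration rather than something the table itself specifies.

```python
from typing import NamedTuple


class GpsQuality(NamedTuple):
    horiz_accuracy_range_m: tuple  # (min, max) reported horiz_accuracy
    fix_type: int
    satellites_visible: int        # synthetic value


CONFIDENCE_TABLE = {
    "HIGH": GpsQuality((10.0, 20.0), 3, 12),
    "MEDIUM": GpsQuality((20.0, 50.0), 3, 8),
    "LOW": GpsQuality((50.0, 100.0), 2, 4),
    "VERY_LOW": GpsQuality((100.0, 500.0), 1, 1),
    "MANUAL": GpsQuality((200.0, 200.0), 3, 6),
}


def reported_accuracy_m(level: str, eskf_sigma_m: float) -> float:
    """Clamp the ESKF horizontal sigma into the band for the confidence level."""
    lo, hi = CONFIDENCE_TABLE[level].horiz_accuracy_range_m
    return min(max(eskf_sigma_m, lo), hi)
```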
## Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink (conflicts with restriction) | Use pymavlink. Document restriction conflict. No alternative in Python. |
| **cuVSLAM fails on low-texture agricultural terrain** | HIGH | Frequent tracking loss, degraded VO | Increase satellite matching frequency. IMU dead-reckoning bridge. Accept higher drift. |
| **Jetson UART instability with ArduPilot** | MEDIUM | MAVLink connection drops | Test thoroughly. Use USB serial adapter if UART unreliable. Add watchdog reconnect. |
| **Thermal throttling blows satellite matching budget** | MEDIUM | Miss keyframe windows | Adaptive pipeline: reduce/skip satellite matching at high temp. Active cooling mandatory. |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use XFeat | Day-one benchmark. XFeat fallback. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Multi-tile consensus, strict RANSAC, increase keyframe frequency. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, max keyframes. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift. Alternative providers. |
| GPS_INPUT at 3Hz too slow for ArduPilot EKF | HIGH | Poor EKF fusion, position jumps | 5-10Hz output with IMU interpolation between camera frames. |
| Companion computer reboot mid-flight | LOW | ~30-60s GPS gap | Flight controller IMU fallback. Automatic recovery on restart. |
| Telemetry bandwidth saturation | LOW | Custom messages compete with autopilot telemetry | Limit NAMED_VALUE_FLOAT to 1Hz. Keep messages compact. |
## Testing Strategy
### Integration / Functional Tests
- End-to-end: camera → cuVSLAM → ESKF → GPS_INPUT → verify flight controller receives valid position
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m (target: 80%), percentage within 20m (target: 60%)
- Test GPS_INPUT rate: verify 5-10Hz output to flight controller
- Test sharp-turn handling: verify satellite re-localization after 90-degree heading change
- Test disconnected segments: simulate 3+ route breaks, verify all segments connected
- Test re-localization: simulate 3 consecutive failures → verify STATUSTEXT sent → inject COMMAND_LONG hint → verify recovery
- Test object localization: send POST /localize with known AI camera params → verify GPS accuracy
- Test startup: verify ESKF initializes from flight controller GPS
- Test reboot recovery: kill process → restart → verify reconnection and position recovery
- Test failsafe: simulate total failure → verify GPS_INPUT stops → verify flight controller IMU fallback
- Test cuVSLAM map memory: run 3000-frame session, monitor memory growth
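The accuracy targets in the list above (80% within 50m, 60% within 20m) can be checked with a few lines once per-frame position errors have been computed against ground truth. A minimal sketch, assuming the errors are already available as a list of distances in meters; the function names are hypothetical.

```python
def fraction_within(errors_m: list[float], threshold_m: float) -> float:
    """Fraction of position errors at or below the threshold."""
    if not errors_m:
        raise ValueError("no error samples")
    return sum(e <= threshold_m for e in errors_m) / len(errors_m)


def meets_accuracy_targets(errors_m: list[float]) -> bool:
    """Check the 50m/80% and 20m/60% acceptance targets."""
    return (fraction_within(errors_m, 50.0) >= 0.80
            and fraction_within(errors_m, 20.0) >= 0.60)
```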
### Non-Functional Tests
- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at 1280px on Orin Nano Super
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- cuVSLAM terrain stress test: urban, agricultural, water, forest
- **UART reliability test**: sustained pymavlink communication over 1+ hour
- **Thermal endurance test**: run full pipeline for 30+ minutes, measure GPU temp, verify no throttling with active cooling
- Per-frame latency: must be <400ms for VO pipeline
- GPS_INPUT latency: measure time from camera capture to GPS_INPUT send
- Memory: peak usage during 3000-frame session (must stay <8GB)
- Drift budget: verify ESKF covariance tracks cumulative drift, triggers satellite matching before 100m
- Telemetry bandwidth: measure total MAVLink bandwidth used by companion computer
## References
- pymavlink GPS_INPUT example: https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py
- pymavlink mavgps.py: https://github.com/ArduPilot/pymavlink/blob/master/examples/mavgps.py
- ArduPilot GPS Input module: https://ardupilot.org/mavproxy/docs/modules/GPSInput.html
- MAVLink GPS_INPUT message spec: https://mavlink.io/en/messages/common.html#GPS_INPUT
- MAVSDK-Python GPS_INPUT limitation: https://github.com/mavlink/MAVSDK-Python/issues/320
- MAVSDK-Python custom message limitation: https://github.com/mavlink/MAVSDK-Python/issues/739
- ArduPilot companion computer setup: https://ardupilot.org/dev/docs/raspberry-pi-via-mavlink.html
- Jetson Orin UART with ArduPilot: https://forums.developer.nvidia.com/t/uart-connection-between-jetson-nano-orin-and-ardupilot/325416
- MAVLink NAMED_VALUE_FLOAT: https://mavlink.io/en/messages/common.html#NAMED_VALUE_FLOAT
- MAVLink STATUSTEXT: https://mavlink.io/en/messages/common.html#STATUSTEXT
- MAVLink telemetry bandwidth: https://github.com/mavlink/mavlink/issues/1605
- JetPack 6.2 Super Mode: https://developer.nvidia.com/blog/nvidia-jetpack-6-2-brings-super-mode-to-nvidia-jetson-orin-nano-and-jetson-orin-nx-modules/
- Jetson Orin Nano power consumption: https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/
- UAV target geolocation: https://www.mdpi.com/1424-8220/22/5/1903
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
- ArduPilot EKF Source Selection: https://ardupilot.org/copter/docs/common-ekf-sources.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
@@ -1,385 +0,0 @@
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| ONNX Runtime as potential inference runtime for AI models | **Performance**: ONNX Runtime CUDA EP on Jetson Orin Nano is 7-8x slower than TRT standalone with default settings (tensor cores not utilized). Even TRT-EP shows up to 3x overhead on some models. | **Use native TRT Engine for all AI models**. Convert PyTorch → ONNX → trtexec → .engine. Load with tensorrt Python module. Eliminates ONNX Runtime dependency entirely. |
| ONNX Runtime TRT-EP memory overhead | **Performance**: ONNX RT TRT-EP keeps serialized engine in memory (~420-440MB vs 130-140MB native TRT). Delta ~280-300MB PER MODEL. On 8GB shared memory, this wastes ~560-600MB for two models. | **Native TRT releases serialized blob after deserialization** → saves ~280-300MB per model. Total savings ~560-600MB — 7% of total memory. Critical given cuVSLAM map growth risk. |
| No explicit TRT engine build step in offline pipeline | **Functional**: Draft03 mentions TRT FP16 but doesn't define the build workflow. When/where are engines built? | **Add TRT engine build to offline preparation pipeline**: After satellite tile download, run trtexec on Jetson to build .engine files. Store alongside tiles. One-time cost per model version. |
| Cross-platform portability via ONNX Runtime | **Functional**: ONNX Runtime's primary value is cross-platform support. Our deployment is Jetson-only — this value is zero. We pay the performance/memory tax for unused portability. | **Drop ONNX Runtime**. Jetson Orin Nano Super is fixed deployment hardware. TRT Engine is the optimal runtime for NVIDIA-only deployment. |
| No DLA offloading considered | **Performance**: Draft03 doesn't mention DLA. Jetson Orin Nano has NO DLA cores; only Orin NX (1-2) and AGX Orin (2) have DLA. | **Confirm: DLA offloading is NOT available on Orin Nano**. All inference must run on GPU (1024 CUDA cores, 32 tensor cores on the 8GB module). This makes maximizing GPU efficiency via native TRT even more critical. |
| LiteSAM MinGRU TRT compatibility risk | **Functional**: LiteSAM's subpixel refinement uses 4 stacked MinGRU layers over a 3×3 candidate window (seq_len=9). MinGRU gates depend only on input C_f (not h_{t-1}), so z_t/h̃_t are pre-computable. Ops are standard: Linear, Sigmoid, Mul, Add, ReLU, Tanh. Risk is LOW-MEDIUM — depends on whether implementation uses logcumsumexp (problematic) or simple loop (fine). Seq_len=9 makes this trivially rewritable. | **Day-one verification**: clone LiteSAM repo → torch.onnx.export → polygraphy inspect → trtexec --fp16. If export fails on MinGRU: rewrite forward() as unrolled loop (9 steps). **If LiteSAM cannot be made TRT-compatible: replace with EfficientLoFTR TRT** (proven TRT path via Coarse_LoFTR_TRT, 15.05M params, semi-dense matching). |
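The "unrolled loop" rewrite mentioned in the last row relies on the gates being independent of the previous hidden state. The sketch below shows the minGRU recurrence as it is commonly defined, h_t = (1 − z_t)·h_{t−1} + z_t·h̃_t, on scalar values in plain Python. It is not LiteSAM's code: the function name, the scalar hidden state, and the zero initial state are assumptions for illustration.

```python
def min_gru_unrolled(z_gates, h_candidates, h0=0.0):
    """Run the minGRU recurrence over precomputed gates and candidate states.

    z_gates and h_candidates hold one value per step, both computed from the
    inputs alone, so only this short sequential blend remains at inference.
    """
    if len(z_gates) != len(h_candidates):
        raise ValueError("gates and candidates must have the same length")
    h = h0
    states = []
    for z, h_tilde in zip(z_gates, h_candidates):
        h = (1.0 - z) * h + z * h_tilde   # h_t = (1 - z_t) h_{t-1} + z_t h~_t
        states.append(h)
    return states
```

With seq_len=9 the loop body can be written out nine times in the model's `forward()`, leaving only element-wise multiply and add operations in the exported graph.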
## Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses **native TensorRT Engine files** — no ONNX Runtime dependency. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.
Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM — native CUDA), (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16), and (3) IMU data from the flight controller via ESKF.
**Inference runtime decision**: Native TRT Engine over ONNX Runtime because:
1. ONNX RT CUDA EP is 7-8x slower on Orin Nano (tensor core bug)
2. ONNX RT TRT-EP wastes ~280-300MB per model (serialized engine retained in memory)
3. Cross-platform portability has zero value — deployment is Jetson-only
4. Native TRT provides direct CUDA stream control for pipelining with cuVSLAM
**Hard constraint**: Camera shoots at ~3fps (333ms interval). Full VO+ESKF pipeline within 400ms. GPS_INPUT at 5-10Hz.
**AI Model Runtime Summary**:
| Model | Runtime | Precision | Memory | Integration |
|-------|---------|-----------|--------|-------------|
| cuVSLAM | Native CUDA (PyCuVSLAM) | N/A (closed-source) | ~200-500MB | CUDA Stream A |
| LiteSAM | TRT Engine | FP16 | ~50-80MB | CUDA Stream B |
| XFeat | TRT Engine | FP16 | ~30-50MB | CUDA Stream B (fallback) |
| ESKF | CPU (Python/C++) | FP64 | ~10MB | CPU thread |
**Offline Preparation Pipeline** (before flight):
1. Download satellite tiles → validate → pre-resize → store (existing)
2. **NEW: Build TRT engines on Jetson** (one-time per model version)
   - `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
   - `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
3. Copy tiles + engines to Jetson storage
4. At startup: load engines + preload tiles into RAM
```
┌─────────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ 1. Satellite Tiles → Download & Validate → Pre-resize → Store │
│ (Google Maps) (≥0.5m/px, <2yr) (matcher res) (GeoHash)│
│ 2. TRT Engine Build (one-time per model version): │
│ PyTorch model → reparameterize → ONNX export → trtexec --fp16 │
│ Output: litesam.engine, xfeat.engine │
│ 3. Copy tiles + engines to Jetson storage │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ STARTUP: │
│ 1. pymavlink → read GLOBAL_POSITION_INT → init ESKF │
│ 2. Load TRT engines: litesam.engine + xfeat.engine │
│ (tensorrt.Runtime → deserialize_cuda_engine → create_context) │
│ 3. Allocate GPU buffers for TRT input/output (PyCUDA) │
│ 4. Start cuVSLAM with first camera frames │
│ 5. Preload satellite tiles ±2km into RAM │
│ 6. Begin GPS_INPUT output loop at 5-10Hz │
│ │
│ EVERY FRAME (3fps, 333ms interval): │
│ ┌──────────────────────────────────────┐ │
│ │ Nav Camera → Downsample (CUDA ~2ms) │ │
│ │ → cuVSLAM VO+IMU (~9ms) │ ← CUDA Stream A │
│ │ → ESKF measurement update │ │
│ └──────────────────────────────────────┘ │
│ │
│ 5-10Hz CONTINUOUS: │
│ ┌──────────────────────────────────────┐ │
│ │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller │
│ └──────────────────────────────────────┘ │
│ │
│ KEYFRAMES (every 3-10 frames, async): │
│ ┌──────────────────────────────────────┐ │
│ │ TRT Engine inference (Stream B): │ │
│ │ context.execute_async_v3(stream_B) │──→ ESKF correction │
│ │ LiteSAM FP16 or XFeat FP16 │ │
│ └──────────────────────────────────────┘ │
│ │
│ TELEMETRY (1Hz): │
│ ┌──────────────────────────────────────┐ │
│ │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station │
│ └──────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
## Architecture
### Component: AI Model Inference Runtime
| Solution | Tools | Advantages | Limitations | Performance | Memory | Fit |
|----------|-------|-----------|-------------|------------|--------|-----|
| Native TRT Engine | tensorrt Python + PyCUDA + trtexec | Optimal latency, minimal memory, full tensor core usage, direct CUDA stream control | Hardware-specific engines, manual buffer management, rebuild per TRT version | Optimal | ~50-130MB total (both models) | ✅ Best |
| ONNX Runtime TRT-EP | onnxruntime + TensorRT EP | Auto-fallback for unsupported ops, simpler API, auto engine caching | +280-300MB per model, wrapper overhead, first-run latency spike | Near-parity (claimed), up to 3x slower (observed) | ~640-690MB total (both models) | ❌ Memory overhead unacceptable |
| ONNX Runtime CUDA EP | onnxruntime + CUDA EP | Simplest API, broadest op support | 7-8x slower on Orin Nano (tensor core bug), no TRT optimizations | 7-8x slower | Standard | ❌ Performance unacceptable |
| Torch-TensorRT | torch_tensorrt | AOT compilation, PyTorch-native, handles mixed TRT/PyTorch | Newer on Jetson, requires PyTorch runtime at inference | Near native TRT | PyTorch runtime ~500MB+ | ⚠️ Viable alternative if TRT export fails |
**Selected**: **Native TRT Engine** — optimal performance and memory on our fixed NVIDIA hardware.
**Fallback**: If any model has unsupported TRT ops (e.g., MinGRU in LiteSAM), use **Torch-TensorRT** for that specific model. Torch-TensorRT handles mixed TRT/PyTorch execution but requires PyTorch runtime in memory.
### Component: TRT Engine Conversion Workflow
**LiteSAM conversion**:
1. Load PyTorch model with trained weights
2. Reparameterize MobileOne backbone (collapse multi-branch → single Conv2d with BatchNorm folded in)
3. Export to ONNX: `torch.onnx.export(model, dummy_input, "litesam.onnx", opset_version=17)`
4. Verify with polygraphy: `polygraphy inspect model litesam.onnx`
5. Build engine on Jetson: `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16 --memPoolSize=workspace:2048`
6. Verify engine: `trtexec --loadEngine=litesam.engine --fp16`
**XFeat conversion**:
1. Load PyTorch model
2. Export to ONNX: `torch.onnx.export(model, dummy_input, "xfeat.onnx", opset_version=17)`
3. Build engine on Jetson: `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
4. Alternative: use XFeatTensorRT C++ implementation directly
**INT8 quantization strategy** (optional, future optimization):
- MobileOne backbone (CNN): INT8 safe with calibration data
- TAIFormer (transformer attention): FP16 only — INT8 degrades accuracy
- XFeat: evaluate INT8 on actual UAV-satellite pairs before deploying
- Use nvidia-modelopt for calibration: `from modelopt.onnx.quantization import quantize`
### Component: TRT Python Inference Wrapper
Minimal wrapper class for TRT engine inference:
```python
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt


class TRTInference:
    """Thin wrapper around a deserialized TensorRT engine bound to one CUDA stream."""

    def __init__(self, engine_path, stream):
        # A CUDA context must already be active (e.g. via pycuda.autoinit).
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        with open(engine_path, 'rb') as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        self.stream = stream
        self._host_inputs = {}
        self._allocate_buffers()

    def _allocate_buffers(self):
        # One device buffer per I/O tensor (assumes static shapes).
        self.inputs = {}
        self.outputs = {}
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            shape = tuple(self.engine.get_tensor_shape(name))
            dtype = trt.nptype(self.engine.get_tensor_dtype(name))
            size = trt.volume(shape)
            device_mem = cuda.mem_alloc(size * np.dtype(dtype).itemsize)
            self.context.set_tensor_address(name, int(device_mem))
            mode = self.engine.get_tensor_mode(name)
            if mode == trt.TensorIOMode.INPUT:
                self.inputs[name] = (device_mem, shape, dtype)
            else:
                self.outputs[name] = (device_mem, shape, dtype)

    def infer_async(self, input_data):
        # Queue host-to-device copies and inference; returns without waiting.
        for name, data in input_data.items():
            device_mem, _, dtype = self.inputs[name]
            host = np.ascontiguousarray(data, dtype=dtype)
            self._host_inputs[name] = host  # keep alive until the copy completes
            cuda.memcpy_htod_async(device_mem, host, self.stream)
        # The Python API names this execute_async_v3 (enqueueV3 is the C++ name).
        self.context.execute_async_v3(self.stream.handle)

    def get_output(self):
        # Queue device-to-host copies, then block until the stream has finished.
        results = {}
        for name, (dev_mem, shape, dtype) in self.outputs.items():
            host_mem = cuda.pagelocked_empty(shape, dtype)
            cuda.memcpy_dtoh_async(host_mem, dev_mem, self.stream)
            results[name] = host_mem
        self.stream.synchronize()
        return results
```
Key design: `infer_async()` + `get_output()` split enables pipelining with cuVSLAM on Stream A while satellite matching runs on Stream B.
### Component: Visual Odometry (UNCHANGED)
cuVSLAM — native CUDA library, not affected by TRT migration. Already optimal.
### Component: Satellite Image Matching (UPDATED runtime + fallback chain)
| Solution | Tools | Advantages | Limitations | Performance (est. Orin Nano Super TRT FP16) | Params | Fit |
|----------|-------|-----------|-------------|----------------------------------------------|--------|-----|
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30=17.86m UAV-VisLoc), 6.31M params, smallest model | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary (if TRT export succeeds AND ≤200ms) |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT repo, 138 stars). Semi-dense. CVPR 2024. High accuracy. | 2.4x more params than LiteSAM. Requires einsum→elementary ops rewrite for TRT (documented in Coarse_LoFTR_TRT paper). | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python (or XFeatTensorRT C++) | Fastest. Proven TRT implementation. Lightweight. | General-purpose, not designed for cross-view satellite-aerial gap (but nadir-nadir gap is small). | Est. ~50-100ms | <5M | ✅ Speed fallback |
**Decision tree (day-one on Orin Nano Super)**:
1. Clone LiteSAM repo → reparameterize MobileOne → `torch.onnx.export()` → `polygraphy inspect`
2. If ONNX export succeeds → `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
3. If MinGRU causes ONNX/TRT failure → rewrite MinGRU forward() as unrolled 9-step loop → retry
4. If rewrite fails or accuracy degrades → **switch to EfficientLoFTR TRT**:
- Apply Coarse_LoFTR_TRT TRT-adaptation techniques (einsum replacement, etc.)
- Export to ONNX → trtexec --fp16
- Benchmark at 640×480 and 1280px
5. Benchmark winner: **if ≤200ms → use it. If >200ms but ≤300ms → acceptable (async on Stream B). If >300ms → use XFeat TRT**
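The latency thresholds in step 5 can be written as a small rule. This is a minimal sketch; the function name `choose_matcher` and the returned labels are hypothetical, and the input is assumed to be the measured per-keyframe latency of whichever matcher (LiteSAM or EfficientLoFTR) survived steps 1-4.

```python
def choose_matcher(latency_ms):
    """Map a benchmarked keyframe-matching latency (ms) to the step-5 decision."""
    if latency_ms <= 200:
        return "use benchmarked matcher"
    if latency_ms <= 300:
        # Still acceptable because matching runs asynchronously on Stream B.
        return "use benchmarked matcher (async on Stream B)"
    return "use XFeat TRT"


for measured in (180, 250, 340):
    print(measured, "ms:", choose_matcher(measured))
```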
**EfficientLoFTR TRT adaptation** (from Coarse_LoFTR_TRT paper, proven workflow):
- Replace `torch.einsum()` with elementary ops (view, bmm, reshape, sum)
- Replace any TRT-incompatible high-level PyTorch functions
- Use ONNX export path (less memory required than Torch-TensorRT on 8GB device)
- Knowledge distillation available for further parameter reduction if needed
### Component: Sensor Fusion (UNCHANGED)
ESKF — CPU-based mathematical filter, not affected.
### Component: Flight Controller Integration (UNCHANGED)
pymavlink — not affected by TRT migration.
### Component: Ground Station Telemetry (UNCHANGED)
MAVLink NAMED_VALUE_FLOAT — not affected.
### Component: Startup & Lifecycle (UPDATED)
**Updated startup sequence**:
1. Boot Jetson → start GPS-Denied service (systemd)
2. Connect to flight controller via pymavlink on UART
3. Wait for heartbeat from flight controller
4. **Initialize PyCUDA context**
5. **Load TRT engines**: litesam.engine + xfeat.engine via tensorrt.Runtime.deserialize_cuda_engine()
6. **Allocate GPU I/O buffers** for both models
7. **Create CUDA streams**: Stream A (cuVSLAM), Stream B (satellite matching)
8. Read GLOBAL_POSITION_INT → init ESKF
9. Start cuVSLAM with first camera frames
10. Begin GPS_INPUT output loop at 5-10Hz
11. Preload satellite tiles within ±2km into RAM
12. System ready
**Engine load time**: ~1-3 seconds per engine (deserialization from .engine file). One-time cost at startup.
### Component: Thermal Management (UNCHANGED)
Same adaptive pipeline. TRT engines are slightly more power-efficient than ONNX Runtime, but the difference is within noise.
### Component: Object Localization (UNCHANGED)
Not affected — trigonometric calculation, no AI inference.
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~9ms/frame)
Unchanged from draft03. Native CUDA, not part of TRT migration.
### 2. Native TRT Engine Inference (NEW)
All AI models run as pre-compiled TRT FP16 engines:
- Engine files built offline with trtexec (one-time per model version)
- Loaded at startup (~1-3s per engine)
- Inference via context.execute_async_v3() (the Python binding of enqueueV3) on dedicated CUDA Stream B
- GPU buffers pre-allocated — zero runtime allocation during flight
- No ONNX Runtime dependency — no framework overhead
Memory advantage over ONNX Runtime TRT-EP: ~560-600MB saved (both models combined).
Latency advantage: eliminates ONNX wrapper overhead, guaranteed tensor core utilization.
### 3. CUDA Stream Pipelining (REFINED)
- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: TRT engine inference for satellite matching (LiteSAM or XFeat, async)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management
- **NEW**: Both cuVSLAM and TRT engines use CUDA streams natively — no framework abstraction layer. Direct GPU scheduling.
### 4-7. (UNCHANGED from draft03)
Keyframe-based satellite matching, TensorRT FP16 optimization, proactive tile loading, 5-10Hz GPS_INPUT output — all unchanged.
## Processing Time Budget (per frame, 333ms interval)
### Normal Frame (non-keyframe)
Unchanged from draft03 — cuVSLAM dominates at ~22ms total.
### Keyframe Satellite Matching (async, CUDA Stream B)
**Path A — LiteSAM TRT Engine FP16 at 1280px**:
| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| Copy input to GPU buffer | <0.5ms | PyCUDA memcpy_htod_async |
| LiteSAM TRT Engine FP16 | ≤200ms | context.execute_async_v3(stream_B) |
| Copy output from GPU | <0.5ms | PyCUDA memcpy_dtoh_async |
| Geometric pose (RANSAC) | ~5ms | Homography |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async on Stream B |
**Path B — XFeat TRT Engine FP16**:
| Step | Time | Notes |
|------|------|-------|
| XFeat TRT Engine inference | ~50-80ms | context.execute_async_v3(stream_B) |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async on Stream B |
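The totals in both tables are plain sums of the step times. The sketch below re-adds them, assuming "<0.5ms" is counted as 0.5 ms and "≤200ms" as 200 ms, and compares the result against the 333 ms frame interval from the section heading. The dictionary and function names are hypothetical.

```python
# Upper-bound step times in milliseconds, copied from the Path A table.
PATH_A_MS = {
    "downsample": 1.0,
    "load_tile": 1.0,
    "copy_to_gpu": 0.5,
    "litesam_trt": 200.0,
    "copy_from_gpu": 0.5,
    "ransac": 5.0,
    "eskf_update": 1.0,
}

# (low, high) step times in milliseconds, copied from the Path B table.
PATH_B_MS = {
    "xfeat_trt": (50.0, 80.0),
    "ransac": (5.0, 5.0),
    "eskf_update": (1.0, 1.0),
}

FRAME_INTERVAL_MS = 333.0


def total_ms(steps):
    """Sum single-valued step times."""
    return sum(steps.values())


def total_range_ms(steps):
    """Sum (low, high) step times into a (low, high) total."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high


print("Path A total:", total_ms(PATH_A_MS), "ms")
print("Path B total:", total_range_ms(PATH_B_MS), "ms")
print("Path A fits in one frame interval:", total_ms(PATH_A_MS) < FRAME_INTERVAL_MS)
```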
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory (Native TRT) | Memory (ONNX RT TRT-EP) | Notes |
|-----------|---------------------|--------------------------|-------|
| OS + runtime | ~1.5GB | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | ~200-500MB | CUDA library + map |
| **LiteSAM TRT engine** | **~50-80MB** | **~330-360MB** | Native TRT vs TRT-EP. If LiteSAM fails: EfficientLoFTR ~100-150MB |
| **XFeat TRT engine** | **~30-50MB** | **~310-330MB** | Native TRT vs TRT-EP |
| Preloaded satellite tiles | ~200MB | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | ~20MB | |
| FastAPI (local IPC) | ~50MB | ~50MB | |
| ESKF + buffers | ~10MB | ~10MB | |
| ONNX Runtime framework | **0MB** | **~150MB** | Eliminated with native TRT |
| **Total** | **~2.1-2.4GB** | **~2.8-3.1GB** | Sum of the rows above |
| **% of 8GB** | **26-30%** | **35-39%** | |
| **Savings** | — | — | **~700MB saved with native TRT** |
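The ~700MB figure follows from the three rows where the two columns differ. The sketch below recomputes it from the table values (in MB); the variable and function names are hypothetical.

```python
# (native TRT, ONNX RT TRT-EP) memory ranges in MB, copied from the table above.
DIFFERING_ROWS_MB = {
    "LiteSAM TRT engine": ((50, 80), (330, 360)),
    "XFeat TRT engine": ((30, 50), (310, 330)),
    "ONNX Runtime framework": ((0, 0), (150, 150)),
}


def savings_mb(rows):
    """Return (low, high) savings of native TRT over ONNX RT TRT-EP."""
    low = sum(ort[0] - native[0] for native, ort in rows.values())
    high = sum(ort[1] - native[1] for native, ort in rows.values())
    return low, high


print(savings_mb(DIFFERING_ROWS_MB))
```

Both ends of the range come out at 710 MB (280 MB per engine plus 150 MB for the framework), which the table rounds to ~700MB.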
## Confidence Scoring → GPS_INPUT Mapping
Unchanged from draft03.
## Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **LiteSAM MinGRU ops unsupported in TRT 10.3** | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification: ONNX export → polygraphy → trtexec. If MinGRU fails: (1) rewrite as unrolled 9-step loop, (2) if still fails: **switch to EfficientLoFTR TRT** (proven TRT path, Coarse_LoFTR_TRT, 15.05M params). XFeat TRT as speed fallback. |
| **TRT engine build OOM on 8GB Jetson** | LOW | Cannot build engines on target device | Our models are small (6.31M LiteSAM, <5M XFeat). OOM unlikely. If occurs: reduce --memPoolSize, or build on identical Orin Nano module with more headroom |
| **Engine incompatibility after JetPack update** | MEDIUM | Must rebuild engines | Include engine rebuild in JetPack update procedure. Takes minutes per model. |
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink | Unchanged from draft03 |
| **cuVSLAM fails on low-texture terrain** | HIGH | Frequent tracking loss | Unchanged from draft03 |
| **Thermal throttling** | MEDIUM | Satellite matching budget blown | Unchanged from draft03 |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use fallback matcher | Day-one benchmark. Fallback chain: EfficientLoFTR TRT (if ≤300ms) → XFeat TRT (if all >300ms) |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Unchanged from draft03 |
## Testing Strategy
### Integration / Functional Tests
All tests from draft03 unchanged, plus:
- **TRT engine load test**: Verify litesam.engine and xfeat.engine load successfully on Jetson Orin Nano Super
- **TRT inference correctness**: Compare TRT engine output vs PyTorch reference output (max L1 error < 0.01)
- **CUDA Stream B pipelining**: Verify satellite matching on Stream B does not block cuVSLAM on Stream A
- **Engine pre-built validation**: Verify engine files from offline preparation work without rebuild at runtime
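The correctness criterion above can be checked with a few lines once both outputs are flattened to sequences of floats. This is a minimal sketch, assuming "max L1 error" means the largest element-wise absolute difference; that reading and the function names are assumptions, not taken from the test plan.

```python
def max_abs_error(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def outputs_match(reference, candidate, tol=0.01):
    """True if the TRT output stays within tol of the PyTorch reference."""
    return max_abs_error(reference, candidate) < tol


pytorch_out = [0.12, 0.50, 0.91]   # illustrative values only
trt_out = [0.121, 0.497, 0.912]    # illustrative values only
print(outputs_match(pytorch_out, trt_out))
```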
### Non-Functional Tests
All tests from draft03 unchanged, plus:
- **TRT engine build time**: Measure trtexec build time for LiteSAM and XFeat on Orin Nano Super (expected: 1-5 minutes each)
- **TRT engine load time**: Measure deserialization time (expected: 1-3 seconds each)
- **Memory comparison**: Measure actual GPU memory with native TRT vs ONNX RT TRT-EP for both models
- **MinGRU TRT compatibility** (day-one blocker):
1. Clone LiteSAM repo, load pretrained weights
2. Reparameterize MobileOne backbone
3. `torch.onnx.export(model, dummy, "litesam.onnx", opset_version=17)`
4. `polygraphy inspect model litesam.onnx` — check for unsupported ops
5. `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
6. If step 3 or 5 fails on MinGRU: rewrite MinGRU forward() as unrolled loop, retry
7. If still fails: switch to EfficientLoFTR, apply Coarse_LoFTR_TRT adaptation
8. Compare TRT output vs PyTorch reference (max L1 error < 0.01)
- **EfficientLoFTR TRT fallback benchmark** (if LiteSAM fails): apply TRT adaptation from Coarse_LoFTR_TRT → ONNX → trtexec → measure latency at 640×480 and 1280px
- **Tensor core utilization**: Verify with Nsight that TRT engines use tensor cores (unlike ONNX RT CUDA EP)
## References
- ONNX Runtime Issue #24085 (Jetson Orin Nano tensor core bug): https://github.com/microsoft/onnxruntime/issues/24085
- ONNX Runtime Issue #20457 (TRT-EP memory overhead): https://github.com/microsoft/onnxruntime/issues/20457
- ONNX Runtime Issue #12083 (TRT-EP vs native TRT): https://github.com/microsoft/onnxruntime/issues/12083
- NVIDIA TensorRT 10 Python API: https://docs.nvidia.com/deeplearning/tensorrt/10.15.1/inference-library/python-api-docs.html
- TensorRT Best Practices: https://docs.nvidia.com/deeplearning/tensorrt/latest/performance/best-practices.html
- TensorRT engine hardware specificity: https://github.com/NVIDIA/TensorRT/issues/1920
- trtexec ONNX conversion: https://nvidia-jetson.piveral.com/jetson-orin-nano/how-to-convert-onnx-to-engine-on-jetson-orin-nano-dev-board/
- Torch-TensorRT JetPack 6.2: https://docs.pytorch.org/TensorRT/v2.10.0/getting_started/jetpack.html
- XFeatTensorRT: https://github.com/PranavNedunghat/XFeatTensorRT
- JetPack 6.2 Release Notes: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/index.html
- Jetson Orin Nano Super: https://developer.nvidia.com/blog/nvidia-jetson-orin-nano-developer-kit-gets-a-super-boost/
- DLA on Jetson Orin: https://developer.nvidia.com/blog/maximizing-deep-learning-performance-on-nvidia-jetson-orin-with-dla/
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR HuggingFace: https://huggingface.co/docs/transformers/en/model_doc/efficientloftr
- Coarse_LoFTR_TRT (TRT for embedded): https://github.com/Kolkir/Coarse_LoFTR_TRT
- Coarse_LoFTR_TRT paper: https://ar5iv.labs.arxiv.org/html/2202.00770
- LoFTR_TRT: https://github.com/Kolkir/LoFTR_TRT
- minGRU ("Were RNNs All We Needed?"): https://huggingface.co/papers/2410.01201
- minGRU PyTorch implementation: https://github.com/lucidrains/minGRU-pytorch
- LiteSAM paper (MinGRU details, Eqs 12-16): https://www.mdpi.com/2072-4292/17/19/3349
- DALGlue (UAV feature matching, 2025): https://www.nature.com/articles/s41598-025-21602-5
- All references from solution_draft03.md
## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts (this assessment): `_docs/00_research/trt_engine_migration/`
- Previous research: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
-562
View File
@@ -1,562 +0,0 @@
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| ONNX Runtime as potential inference runtime for AI models | **Performance**: ONNX Runtime CUDA EP on Jetson Orin Nano is 7-8x slower than TRT standalone with default settings (tensor cores not utilized). Even TRT-EP shows up to 3x overhead on some models. | **Use native TRT Engine for all AI models**. Convert PyTorch → ONNX → trtexec → .engine. Load with tensorrt Python module. Eliminates ONNX Runtime dependency entirely. |
| ONNX Runtime TRT-EP memory overhead | **Performance**: ONNX RT TRT-EP keeps serialized engine in memory (~420-440MB vs 130-140MB native TRT). Delta ~280-300MB PER MODEL. On 8GB shared memory, this wastes ~560-600MB for two models. | **Native TRT releases serialized blob after deserialization** → saves ~280-300MB per model. Total savings ~560-600MB — 7% of total memory. Critical given cuVSLAM map growth risk. |
| No explicit TRT engine build step in offline pipeline | **Functional**: Draft03 mentions TRT FP16 but doesn't define the build workflow. When/where are engines built? | **Add TRT engine build to offline preparation pipeline**: After satellite tile download, run trtexec on Jetson to build .engine files. Store alongside tiles. One-time cost per model version. |
| Cross-platform portability via ONNX Runtime | **Functional**: ONNX Runtime's primary value is cross-platform support. Our deployment is Jetson-only — this value is zero. We pay the performance/memory tax for unused portability. | **Drop ONNX Runtime**. Jetson Orin Nano Super is fixed deployment hardware. TRT Engine is the optimal runtime for NVIDIA-only deployment. |
| No DLA offloading considered | **Performance**: Draft03 doesn't mention DLA. Jetson Orin Nano has NO DLA cores; only Orin NX (1-2) and AGX Orin (2) have DLA. | **Confirm: DLA offloading is NOT available on Orin Nano**. All inference must run on GPU (1024 CUDA cores, 32 tensor cores). This makes maximizing GPU efficiency via native TRT even more critical. |
| LiteSAM MinGRU TRT compatibility risk | **Functional**: LiteSAM's subpixel refinement uses 4 stacked MinGRU layers over a 3×3 candidate window (seq_len=9). MinGRU gates depend only on input C_f (not h_{t-1}), so z_t/h̃_t are pre-computable. Ops are standard: Linear, Sigmoid, Mul, Add, ReLU, Tanh. Risk is LOW-MEDIUM — depends on whether implementation uses logcumsumexp (problematic) or simple loop (fine). Seq_len=9 makes this trivially rewritable. | **Day-one verification**: clone LiteSAM repo → torch.onnx.export → polygraphy inspect → trtexec --fp16. If export fails on MinGRU: rewrite forward() as unrolled loop (9 steps). **If LiteSAM cannot be made TRT-compatible: replace with EfficientLoFTR TRT** (proven TRT path via Coarse_LoFTR_TRT, 15.05M params, semi-dense matching). |
| Camera shoots at ~3fps (draft03/04 hard constraint) | **Functional**: ADTI 20L V1 max continuous rate is **2.0 fps** (burst only, buffer-limited). ADTi recommends 1.5s per capture (**0.7 fps sustained**). 3fps is physically impossible. 2fps is not sustainable for multi-hour flights (buffer saturation + mechanical shutter wear). | **Revised to 0.7 fps sustained**. ADTI 20L V1 is the sole navigation camera — used for both cuVSLAM VO and satellite matching. At 70 km/h cruise and 600m altitude: 27.8m inter-frame displacement, ~175px pixel shift, **95.2% frame overlap** — within pyramid-assisted LK optical flow range. ESKF IMU prediction at 5-10Hz bridges 1.43s gaps between frames. Satellite matching triggered on keyframes from the same stream. Viewpro A40 Pro reserved for AI object detection only. |
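The MinGRU row above argues that the gates depend only on the input, so the 9-step recurrence can be unrolled into a plain loop. The scalar sketch below illustrates that idea, assuming the standard minGRU update from the referenced minGRU paper: h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t, with z_t = sigmoid(linear(x_t)) and h̃_t = linear(x_t). LiteSAM's actual layer (hidden size, extra activations) may differ. The function name and scalar weights are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def min_gru_unrolled(xs, w_z, b_z, w_h, b_h, h0=0.0):
    """Run a scalar minGRU over xs with an explicit per-step loop."""
    # Gates and candidates depend only on the inputs, so compute them up front.
    zs = [sigmoid(w_z * x + b_z) for x in xs]
    h_tildes = [w_h * x + b_h for x in xs]

    # Sequential part: one multiply-add pair per step, no scan primitive needed.
    h = h0
    hs = []
    for z, h_tilde in zip(zs, h_tildes):
        h = (1.0 - z) * h + z * h_tilde
        hs.append(h)
    return hs


# Nine inputs, matching the 3x3 candidate window (seq_len=9).
states = min_gru_unrolled([1.0] * 9, w_z=0.0, b_z=0.0, w_h=1.0, b_h=0.0)
print(len(states), states[-1])
```

With a fixed sequence length of 9, the loop traces to a short chain of Mul/Add nodes on export, which is why the rewrite avoids any dependence on a cumulative-scan operator.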
## UAV Platform
### Airframe Configuration
| Component | Specification | Weight |
|-----------|--------------|--------|
| Airframe | Custom 3.5m S-2 Glass Composite, Eppler 423 airfoil | ~4.5 kg |
| Battery | 2x VANT Semi-Solid State 6S 30Ah (22.2V, 666Wh each) | 5.30 kg (2.65 kg each) |
| Motor | T-Motor AT4125 KV540 (2000W peak, 5.5 kg thrust w/ APC 15x8) | 0.36 kg |
| Propulsion Acc. | ESC, 15x8 Folding Propeller, Servos, Cables | ~0.50 kg |
| Avionics | Pixhawk 6x + GPS | ~0.10 kg |
| Computing | NVIDIA Jetson Orin Nano Super Dev Kit | ~0.30 kg |
| Camera 1 | ADTI 20L V1 APS-C Camera + 16mm Lens | ~0.22 kg |
| Camera 2 | Viewpro A40 Pro (A40TPro) AI Gimbal | ~0.85 kg |
| Misc | Mounts, wiring, connectors | ~0.35 kg |
| **Total AUW** | | **~12.5 kg** |
T-Motor AT4125 KV540 is spec'd for 8-10 kg fixed-wing. At 12.5 kg AUW, static thrust-to-weight is ~0.44. Flyable but margins are tight — weight optimization should be monitored.
### Flight Performance (Max Endurance)
Assumptions: wingspan 3.5m, mean chord ~0.30m, wing area S ~1.05 m², AR ~11.7, Eppler 423 Cl_max ~1.8, cruise altitude 800-1000m (ρ ~1.10 kg/m³).
| Parameter | Value |
|-----------|-------|
| Stall speed (900m altitude) | 10.6 m/s (38 km/h) |
| Min-power speed (theoretical) | 12.0 m/s (43 km/h) — only 13% above stall, impractical |
| **Max endurance cruise (1.3× stall margin)** | **14 m/s (50 km/h)** |
| Best range speed | ~18 m/s (65 km/h) |
| Energy Budget | Value |
|---------------|-------|
| Total battery energy | 1332 Wh (2 × 666 Wh) |
| Usable (80% DoD) | 1066 Wh |
| Climb to 900m (~5 min at 3 m/s) | 57 Wh |
| 10% reserve | ×0.9 |
| **Available for cruise** | **~908 Wh** |
| Power Budget | Value |
|--------------|-------|
| Propulsion (L/D ~15, η_prop 0.65, η_motor 0.80) | ~212 W |
| Electronics (Pixhawk + Jetson + cameras + gimbal + servos) | ~55 W |
| **Total cruise power** | **~267 W** |
| Endurance | Value |
|-----------|-------|
| **Max endurance (at 50 km/h)** | **~3.4 hours** |
| Total mission (incl. climb + reserve) | ~3.5 hours |
| Max range (at 65 km/h best-range speed) | ~209 km |
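The endurance figure can be reproduced from the energy and power tables. The sketch below assumes the 10% reserve is applied after subtracting the climb energy, which is the order that reproduces the ~908 Wh figure; the function and parameter names are hypothetical.

```python
def cruise_endurance(
    battery_wh=1332.0,
    usable_fraction=0.80,
    climb_wh=57.0,
    reserve_fraction=0.10,
    cruise_power_w=267.0,
):
    """Return (energy available for cruise in Wh, endurance in hours)."""
    usable_wh = battery_wh * usable_fraction
    cruise_wh = (usable_wh - climb_wh) * (1.0 - reserve_fraction)
    return cruise_wh, cruise_wh / cruise_power_w


cruise_wh, hours = cruise_endurance()
print(f"Available for cruise: {cruise_wh:.0f} Wh, endurance: {hours:.1f} h")
```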
### Camera 1: ADTI 20L V1 + 16mm Lens
| Spec | Value |
|------|-------|
| Sensor | Sony CMOS APS-C, 23.2 × 15.4 mm |
| Resolution | 5456 × 3632 (20 MP) |
| Focal length | 16 mm |
| Shutter | Mechanical global inter-mirror shutter (ADTI product line) |
| Max continuous fps | **2.0 fps** (spec — burst rate, buffer-limited) |
| Sustained capture rate | **~0.7 fps** (ADTi recommended 1.5s per capture) |
| File formats | JPEG, RAW, RAW+JPEG |
| HDMI video output | 1080p 24p/30p, 1440×1080 30p |
| Weight | 118g body + ~100g lens |
| ISP | Socionext Milbeaut |
| Cooling | Active fan |
**2.0 fps is a burst rate, not sustained.** The 2.0 fps spec is limited by the internal buffer (estimated 3-5 frames). Once the buffer fills, the camera throttles to the write pipeline speed of ~0.7 fps. The bottleneck chain is mechanical shutter actuation (~100-300ms), then ISP processing (demosaic, NR, JPEG compression), then the storage write (~5-10 MB/frame JPEG). The 1.5s/capture recommendation ensures the buffer never fills and accounts for thermal margin over multi-hour flights.
**Mechanical shutter wear at sustained rates:**
| Rate | Actuations per 3.5h flight | Est. flights before 150K shutter life | Est. flights before 500K shutter life |
|------|---------------------------|---------------------------------------|---------------------------------------|
| 2.0 fps (burst, unsustainable) | 25,200 | ~6 | ~20 |
| 1.0 fps | 12,600 | ~12 | ~40 |
| 0.7 fps (recommended) | 8,820 | ~17 | ~57 |
The 20L V1 shutter lifespan is not documented. The higher-end 102PRO is rated at 500K actuations; the 20L, as an entry-level model, is likely rated for 100K-150K. At 0.7 fps sustained, that gives ~11-17 flights at a 100K-150K rating and ~57 flights if the shutter matched the 500K rating.
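The shutter-wear table is straightforward arithmetic on capture rate, flight time, and rated shutter life. A minimal sketch, with hypothetical function names and the 3.5 h flight duration taken from the table header:

```python
def actuations_per_flight(fps, flight_hours=3.5):
    """Shutter actuations in one flight at a constant capture rate."""
    return fps * flight_hours * 3600


def flights_before_wearout(fps, shutter_life, flight_hours=3.5):
    """Number of flights until the rated shutter life is reached."""
    return shutter_life / actuations_per_flight(fps, flight_hours)


for fps in (2.0, 1.0, 0.7):
    print(
        fps,
        round(actuations_per_flight(fps)),
        round(flights_before_wearout(fps, 150_000)),
        round(flights_before_wearout(fps, 500_000)),
    )
```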
**Confirmed operational rate: 0.7 fps (JPEG mode) for both VO and satellite matching.**
### Camera 1: Ground Coverage at Mission Altitude
| Parameter | H = 600 m | H = 800 m | H = 1000 m |
|-----------|-----------|-----------|------------|
| Along-track footprint (15.4mm side) | 577 m | 770 m | 962 m |
| Cross-track footprint (23.2mm side) | 870 m | 1160 m | 1450 m |
| GSD | 15.9 cm/pixel | 21.3 cm/pixel | 26.6 cm/pixel |
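The footprint and GSD values follow from similar triangles. The sketch below assumes a nadir-pointing pinhole camera over flat terrain; the function names are hypothetical, and the sensor and lens values come from the Camera 1 spec table.

```python
def footprint_m(sensor_mm, focal_mm, altitude_m):
    """Ground footprint along one sensor side for a nadir-pointing camera."""
    return sensor_mm / focal_mm * altitude_m


def gsd_cm(sensor_mm, pixels, focal_mm, altitude_m):
    """Ground sample distance in cm/pixel along one sensor side."""
    return footprint_m(sensor_mm, focal_mm, altitude_m) / pixels * 100.0


for altitude in (600, 800, 1000):
    along = footprint_m(15.4, 16.0, altitude)
    cross = footprint_m(23.2, 16.0, altitude)
    gsd = gsd_cm(23.2, 5456, 16.0, altitude)
    print(f"H={altitude} m: along {along:.1f} m, cross {cross:.1f} m, GSD {gsd:.1f} cm/px")
```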
### Camera 1: Forward Overlap at 0.7 fps
At 0.7 fps, distance between shots = V / 0.7.
**At 70 km/h (19.4 m/s) — realistic cruise speed:**
| Altitude | Along-track footprint | Shot gap (27.8m) | Forward overlap | Pixel shift |
|----------|-----------------------|------------------|-----------------|-------------|
| 600 m | 577 m | 27.8 m | **95.2%** | ~175 px |
| 800 m | 770 m | 27.8 m | **96.4%** | ~131 px |
| 1000 m | 962 m | 27.8 m | **97.1%** | ~105 px |
**Across speed range (at 600m altitude, 0.7 fps):**
| Speed | Frame gap | Pixel shift | Forward overlap |
|-------|-----------|-------------|-----------------|
| 50 km/h (14 m/s) | 20.0 m | ~126 px | 96.5% |
| 70 km/h (19.4 m/s) | 27.8 m | ~175 px | 95.2% |
| 90 km/h (25 m/s) | 35.7 m | ~224 px | 93.8% |
Even at 90 km/h and the lowest altitude (600m), overlap remains >93%. The 16mm lens on APS-C at these altitudes produces a footprint so large that 0.7 fps provides massive redundancy for both VO and satellite matching.
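The overlap and pixel-shift columns can be derived from speed, capture rate, and the along-track footprint. A minimal sketch with hypothetical function names, using the 577.5 m footprint at 600 m and the 3632-pixel along-track image dimension; small differences from the tables come from their rounded speeds and GSD.

```python
def frame_gap_m(speed_kmh, fps):
    """Distance flown between consecutive shots."""
    return speed_kmh / 3.6 / fps


def forward_overlap(gap_m, footprint_m):
    """Fraction of the along-track footprint shared by consecutive frames."""
    return 1.0 - gap_m / footprint_m


def pixel_shift(gap_m, footprint_m, pixels_along_track=3632):
    """Inter-frame image displacement in full-resolution pixels."""
    return gap_m / footprint_m * pixels_along_track


FOOTPRINT_600M = 577.5  # along-track footprint at 600 m altitude, in metres
for speed in (50, 70, 90):
    gap = frame_gap_m(speed, 0.7)
    print(
        f"{speed} km/h: gap {gap:.1f} m, "
        f"shift {pixel_shift(gap, FOOTPRINT_600M):.0f} px, "
        f"overlap {forward_overlap(gap, FOOTPRINT_600M) * 100:.1f}%"
    )
```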
**For satellite matching specifically** (keyframe-based, every 5-10 camera frames):
| Target overlap | Required gap (600m) | Time between shots (70 km/h) | Capture rate |
|----------------|---------------------|------------------------------|--------------|
| 80% | 115 m | 5.9 s | 0.17 fps |
| 70% | 173 m | 8.9 s | 0.11 fps |
| 60% | 231 m | 11.9 s | 0.084 fps |
Even at the lowest altitude (600m) and 70 km/h cruise, one satellite matching keyframe every 6-12 seconds gives 60-80% overlap.
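The keyframe spacing in the table above can be computed by inverting the overlap formula. A minimal sketch, assuming the 577.5 m along-track footprint at 600 m and a 70 km/h ground speed; the function name is hypothetical.

```python
def keyframe_spacing(target_overlap, footprint_m=577.5, speed_kmh=70.0):
    """Return (gap in m, seconds between keyframes, keyframes per second)."""
    gap_m = (1.0 - target_overlap) * footprint_m
    seconds = gap_m / (speed_kmh / 3.6)
    return gap_m, seconds, 1.0 / seconds


for overlap in (0.8, 0.7, 0.6):
    gap, seconds, rate = keyframe_spacing(overlap)
    print(f"{overlap:.0%}: gap {gap:.1f} m, every {seconds:.1f} s, {rate:.3f} fps")
```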
### Camera 2: Viewpro A40 Pro (A40TPro) AI Gimbal
Dual EO/IR gimbal with AI tracking. Reserved for **AI object detection and tracking only** — not used for navigation. Operates independently from the navigation pipeline.
### Camera Role Assignment
| Role | Camera | Rate | Notes |
|------|--------|------|-------|
| Visual Odometry (cuVSLAM) | ADTI 20L V1 + 16mm | 0.7 fps (sustained) | Sole navigation camera. At 70 km/h: 27.8m/~175px displacement at 600m alt. 95%+ overlap. |
| Satellite Image Matching | ADTI 20L V1 + 16mm | Keyframes from VO stream (~every 5-10 frames) | Same image stream as VO. Subset routed to satellite matcher on Stream B. |
| AI Object Detection | Viewpro A40 Pro | Independent | Not part of navigation pipeline. |
### cuVSLAM at 0.7 fps — Feasibility
At 0.7 fps and 70 km/h cruise, inter-frame displacement is 27.8m. In pixel terms:
| Altitude | Displacement | Pixel shift | % of image height | Overlap |
|----------|-------------|-------------|-------------------|---------|
| 600 m | 27.8 m | ~175 px | 4.8% | 95.2% |
| 800 m | 27.8 m | ~131 px | 3.6% | 96.4% |
| 1000 m | 27.8 m | ~105 px | 2.9% | 97.1% |
cuVSLAM uses Lucas-Kanade optical flow with image pyramids. Standard LK handles 30-50px displacements on the base level; with 3-4 pyramid levels, effective search range extends to ~150-200px. At 600m altitude, the 175px shift is within this pyramid-assisted range. At 800-1000m, the shift drops to 105-131px — well within range.
Key factors that make 0.7 fps viable at high altitude:
- **Large footprint**: The 16mm lens on APS-C at 600-1000m produces 577-962m along-track coverage. The aircraft moves only 4-5% of the frame between shots.
- **High texture from altitude**: At 600-1000m, each frame covers a large area with diverse terrain features (roads, field boundaries, structures) even in agricultural regions.
- **IMU bridging**: cuVSLAM's built-in IMU integrator provides pose prediction during the 1.43s gap between frames. ESKF IMU prediction runs at 5-10Hz for continuous GPS_INPUT output.
- **95%+ overlap**: Consecutive frames share >95% content — abundant features for matching.
**Risk**: Over completely uniform terrain (e.g., single crop field filling entire 577m+ footprint), feature tracking may still fail. cuVSLAM falls back to IMU-only (~1s acceptable) then constant-velocity (~0.5s) before tracking loss. Satellite matching corrections every 5-10 frames bound accumulated drift.
## Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses **native TensorRT Engine files** — no ONNX Runtime dependency. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.
Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM — native CUDA) from ADTI 20L V1 at 0.7 fps sustained, (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16) using keyframes from the same ADTI image stream, and (3) IMU data from the flight controller via ESKF. Viewpro A40 Pro is reserved for AI object detection only.
**Inference runtime decision**: Native TRT Engine over ONNX Runtime because:
1. ONNX RT CUDA EP is 7-8x slower on Orin Nano (tensor core bug)
2. ONNX RT TRT-EP wastes ~280-300MB per model (serialized engine retained in memory)
3. Cross-platform portability has zero value — deployment is Jetson-only
4. Native TRT provides direct CUDA stream control for pipelining with cuVSLAM
**Hard constraint**: ADTI 20L V1 shoots at 0.7 fps sustained (1430ms interval). Full VO+ESKF pipeline within 400ms per frame. Satellite matching async on keyframes (every 5-10 camera frames). GPS_INPUT at 5-10Hz (ESKF IMU prediction fills gaps between camera frames).
**AI Model Runtime Summary**:
| Model | Runtime | Precision | Memory | Integration |
|-------|---------|-----------|--------|-------------|
| cuVSLAM | Native CUDA (PyCuVSLAM) | N/A (closed-source) | ~200-500MB | CUDA Stream A |
| LiteSAM | TRT Engine | FP16 | ~50-80MB | CUDA Stream B |
| XFeat | TRT Engine | FP16 | ~30-50MB | CUDA Stream B (fallback) |
| ESKF | CPU (Python/C++) | FP64 | ~10MB | CPU thread |
**Offline Preparation Pipeline** (before flight):
1. Download satellite tiles → validate → pre-resize → store (existing)
2. **NEW: Build TRT engines on Jetson** (one-time per model version)
- `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
- `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
3. Copy tiles + engines to Jetson storage
4. At startup: load engines + preload tiles into RAM
```
┌─────────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ 1. Satellite Tiles → Download & Validate → Pre-resize → Store │
│ (Google Maps) (≥0.5m/px, <2yr) (matcher res) (GeoHash)│
│ 2. TRT Engine Build (one-time per model version): │
│ PyTorch model → reparameterize → ONNX export → trtexec --fp16 │
│ Output: litesam.engine, xfeat.engine │
│ 3. Copy tiles + engines to Jetson storage │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ STARTUP: │
│ 1. pymavlink → read GLOBAL_POSITION_INT → init ESKF │
│ 2. Load TRT engines: litesam.engine + xfeat.engine │
│ (tensorrt.Runtime → deserialize_cuda_engine → create_context) │
│ 3. Allocate GPU buffers for TRT input/output (PyCUDA) │
│ 4. Start cuVSLAM with ADTI 20L V1 camera stream │
│ 5. Preload satellite tiles ±2km into RAM │
│ 6. Begin GPS_INPUT output loop at 5-10Hz │
│ │
│ EVERY CAMERA FRAME (0.7fps sustained from ADTI 20L V1): │
│ ┌──────────────────────────────────────┐ │
│ │ ADTI 20L V1 → Downsample (CUDA) │ │
│ │ → cuVSLAM VO+IMU (~9ms) │ ← CUDA Stream A │
│ │ → ESKF measurement │ │
│ └──────────────────────────────────────┘ │
│ │
│ 5-10Hz CONTINUOUS (IMU-driven between camera frames): │
│ ┌──────────────────────────────────────┐ │
│ │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller │
│ └──────────────────────────────────────┘ │
│ │
│ KEYFRAMES (every 5-10 camera frames, async): │
│ ┌──────────────────────────────────────┐ │
│ │ Same ADTI frame → TRT inference (B): │ │
│ │ context.execute_async_v3(stream_B) │──→ ESKF correction │
│ │ LiteSAM FP16 or XFeat FP16 │ │
│ └──────────────────────────────────────┘ │
│ │
│ TELEMETRY (1Hz): │
│ ┌──────────────────────────────────────┐ │
│ │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station │
│ └──────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
## Architecture
### Component: AI Model Inference Runtime
| Solution | Tools | Advantages | Limitations | Performance | Memory | Fit |
|----------|-------|-----------|-------------|------------|--------|-----|
| Native TRT Engine | tensorrt Python + PyCUDA + trtexec | Optimal latency, minimal memory, full tensor core usage, direct CUDA stream control | Hardware-specific engines, manual buffer management, rebuild per TRT version | Optimal | ~50-130MB total (both models) | ✅ Best |
| ONNX Runtime TRT-EP | onnxruntime + TensorRT EP | Auto-fallback for unsupported ops, simpler API, auto engine caching | +280-300MB per model, wrapper overhead, first-run latency spike | Near-parity (claimed), up to 3x slower (observed) | ~640-690MB total (both models) | ❌ Memory overhead unacceptable |
| ONNX Runtime CUDA EP | onnxruntime + CUDA EP | Simplest API, broadest op support | 7-8x slower on Orin Nano (tensor core bug), no TRT optimizations | 7-8x slower | Standard | ❌ Performance unacceptable |
| Torch-TensorRT | torch_tensorrt | AOT compilation, PyTorch-native, handles mixed TRT/PyTorch | Newer on Jetson, requires PyTorch runtime at inference | Near native TRT | PyTorch runtime ~500MB+ | ⚠️ Viable alternative if TRT export fails |
**Selected**: **Native TRT Engine** — optimal performance and memory on our fixed NVIDIA hardware.
**Fallback**: If any model has unsupported TRT ops (e.g., MinGRU in LiteSAM), use **Torch-TensorRT** for that specific model. Torch-TensorRT handles mixed TRT/PyTorch execution but requires PyTorch runtime in memory.
### Component: TRT Engine Conversion Workflow
**LiteSAM conversion**:
1. Load PyTorch model with trained weights
2. Reparameterize MobileOne backbone (collapse multi-branch → single Conv2d+BN)
3. Export to ONNX: `torch.onnx.export(model, dummy_input, "litesam.onnx", opset_version=17)`
4. Verify with polygraphy: `polygraphy inspect model litesam.onnx`
5. Build engine on Jetson: `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16 --memPoolSize=workspace:2048`
6. Verify engine: `trtexec --loadEngine=litesam.engine --fp16`
**XFeat conversion**:
1. Load PyTorch model
2. Export to ONNX: `torch.onnx.export(model, dummy_input, "xfeat.onnx", opset_version=17)`
3. Build engine on Jetson: `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
4. Alternative: use XFeatTensorRT C++ implementation directly
**INT8 quantization strategy** (optional, future optimization):
- MobileOne backbone (CNN): INT8 safe with calibration data
- TAIFormer (transformer attention): FP16 only — INT8 degrades accuracy
- XFeat: evaluate INT8 on actual UAV-satellite pairs before deploying
- Use nvidia-modelopt for calibration: `from modelopt.onnx.quantization import quantize`
### Component: TRT Python Inference Wrapper
Minimal wrapper class for TRT engine inference:
```python
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt


class TRTInference:
    """Minimal TensorRT engine wrapper with pre-allocated device buffers."""

    def __init__(self, engine_path, stream):
        # A CUDA context must already be active (see startup step 4).
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        with open(engine_path, 'rb') as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        self.stream = stream
        self._allocate_buffers()

    def _allocate_buffers(self):
        self.inputs = {}
        self.outputs = {}
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            # Static shapes assumed; dynamic (-1) dimensions are not handled here.
            shape = tuple(self.engine.get_tensor_shape(name))
            dtype = trt.nptype(self.engine.get_tensor_dtype(name))
            size = trt.volume(shape)
            device_mem = cuda.mem_alloc(size * np.dtype(dtype).itemsize)
            self.context.set_tensor_address(name, int(device_mem))
            mode = self.engine.get_tensor_mode(name)
            if mode == trt.TensorIOMode.INPUT:
                self.inputs[name] = (device_mem, shape, dtype)
            else:
                self.outputs[name] = (device_mem, shape, dtype)

    def infer_async(self, input_data):
        self._host_inputs = {}  # keep host arrays alive until the copies finish
        for name, data in input_data.items():
            device_mem, _, dtype = self.inputs[name]
            host = np.ascontiguousarray(data, dtype=dtype)
            self._host_inputs[name] = host
            cuda.memcpy_htod_async(device_mem, host, self.stream)
        # Python binding of the C++ enqueueV3 call
        self.context.execute_async_v3(self.stream.handle)

    def get_output(self):
        results = {}
        for name, (dev_mem, shape, dtype) in self.outputs.items():
            host_mem = np.empty(shape, dtype=dtype)
            cuda.memcpy_dtoh_async(host_mem, dev_mem, self.stream)
            results[name] = host_mem
        # Block until inference and all device-to-host copies are complete
        self.stream.synchronize()
        return results
```
Key design: `infer_async()` + `get_output()` split enables pipelining with cuVSLAM on Stream A while satellite matching runs on Stream B.
### Component: Visual Odometry (UPDATED — camera rate corrected)
cuVSLAM — native CUDA library. Fed by **ADTI 20L V1 at 0.7 fps sustained** (previously assumed 3fps which exceeds camera hardware limit; 2.0 fps spec is burst-only, not sustainable). At 70 km/h cruise the inter-frame displacement is 27.8m — at 600m altitude this translates to ~175px (4.8% of frame), within pyramid-assisted LK optical flow range. At 800-1000m altitude the pixel shift drops to 105-131px. 95%+ frame overlap ensures abundant features for matching. ESKF IMU prediction at 5-10Hz fills the position output between sparse camera frames.
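The displacement figures above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes a ground sample distance of 15.9 cm/px at 600m that scales linearly with altitude, and a 3648px along-track frame dimension (inferred from the 4.8% figure). The function name `pixel_shift` and the constants are illustrative, not part of the system.

```python
SPEED_KMH = 70.0             # cruise speed from the text
FRAME_RATE_HZ = 0.7          # sustained ADTI 20L V1 rate from the text
GSD_AT_600M = 0.159          # m/px at 600 m (assumed to scale linearly with altitude)
FRAME_ALONG_TRACK_PX = 3648  # assumed along-track frame dimension in pixels


def pixel_shift(altitude_m):
    """Return (displacement_m, shift_px, overlap_fraction) between consecutive frames."""
    displacement_m = (SPEED_KMH / 3.6) / FRAME_RATE_HZ
    gsd = GSD_AT_600M * altitude_m / 600.0
    shift_px = displacement_m / gsd
    overlap = 1.0 - shift_px / FRAME_ALONG_TRACK_PX
    return displacement_m, shift_px, overlap


for altitude in (600, 800, 1000):
    d, px, ov = pixel_shift(altitude)
    print(f"{altitude} m: {d:.1f} m -> {px:.0f} px, overlap {ov:.1%}")
```

Under these assumptions the script gives roughly 175px at 600m, 131px at 800m and 105px at 1000m, matching the values quoted above.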
### Component: Satellite Image Matching (UPDATED runtime + fallback chain)
| Solution | Tools | Advantages | Limitations | Performance (est. Orin Nano Super TRT FP16) | Params | Fit |
|----------|-------|-----------|-------------|----------------------------------------------|--------|-----|
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30=17.86m UAV-VisLoc), 6.31M params, smallest model | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary (if TRT export succeeds AND ≤200ms) |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT repo, 138 stars). Semi-dense. CVPR 2024. High accuracy. | 2.4x more params than LiteSAM. Requires einsum→elementary ops rewrite for TRT (documented in Coarse_LoFTR_TRT paper). | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python (or XFeatTensorRT C++) | Fastest. Proven TRT implementation. Lightweight. | General-purpose, not designed for cross-view satellite-aerial gap (but nadir-nadir gap is small). | Est. ~50-100ms | <5M | ✅ Speed fallback |
**Decision tree (day-one on Orin Nano Super)**:
1. Clone LiteSAM repo → reparameterize MobileOne → `torch.onnx.export()` → `polygraphy inspect`
2. If ONNX export succeeds → `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
3. If MinGRU causes ONNX/TRT failure → rewrite MinGRU forward() as unrolled 9-step loop → retry
4. If rewrite fails or accuracy degrades → **switch to EfficientLoFTR TRT**:
- Apply Coarse_LoFTR_TRT TRT-adaptation techniques (einsum replacement, etc.)
- Export to ONNX → trtexec --fp16
- Benchmark at 640×480 and 1280px
5. Benchmark winner: **if ≤200ms → use it. If >200ms but ≤300ms → acceptable (async on Stream B). If >300ms → use XFeat TRT**
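The latency thresholds in step 5 can be written as a small selection function. This is a minimal sketch; `select_matcher` and the returned role labels are hypothetical names, and the candidate is whichever matcher survived steps 1-4.

```python
PRIMARY_BUDGET_MS = 200.0     # "use it" threshold from step 5
ACCEPTABLE_BUDGET_MS = 300.0  # still acceptable when run async on Stream B


def select_matcher(candidate, latency_ms):
    """Map a benchmarked latency to the decision in step 5.

    `candidate` is the matcher that survived steps 1-4,
    e.g. "LiteSAM" or "EfficientLoFTR".
    """
    if latency_ms <= PRIMARY_BUDGET_MS:
        return candidate, "primary"
    if latency_ms <= ACCEPTABLE_BUDGET_MS:
        return candidate, "acceptable (async on Stream B)"
    return "XFeat", "speed fallback"


print(select_matcher("LiteSAM", 180.0))
print(select_matcher("EfficientLoFTR", 340.0))
```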
**EfficientLoFTR TRT adaptation** (from Coarse_LoFTR_TRT paper, proven workflow):
- Replace `torch.einsum()` with elementary ops (view, bmm, reshape, sum)
- Replace any TRT-incompatible high-level PyTorch functions
- Use ONNX export path (less memory required than Torch-TensorRT on 8GB device)
- Knowledge distillation available for further parameter reduction if needed
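To illustrate the einsum replacement, the sketch below rewrites one contraction as a transpose followed by a batched matrix multiply. NumPy stands in for PyTorch here so the check runs without a GPU; the einsum pattern and tensor shapes are illustrative of linear-attention style contractions and are not taken from the EfficientLoFTR code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative shapes: batch n, sequence s, heads h, key dim d, value dim v
K = rng.standard_normal((2, 16, 8, 32))
V = rng.standard_normal((2, 16, 8, 24))

# Einsum form: contract over the sequence axis s
kv_einsum = np.einsum("nshd,nshv->nhdv", K, V)

# Elementary-op form: move heads forward, then a batched matrix multiply
K_t = K.transpose(0, 2, 3, 1)    # (n, h, d, s)
V_t = V.transpose(0, 2, 1, 3)    # (n, h, s, v)
kv_matmul = np.matmul(K_t, V_t)  # (n, h, d, v)

print(np.allclose(kv_einsum, kv_matmul))
```

In PyTorch the same rewrite would use `permute` and `matmul` (or `bmm` after flattening the batch and head axes), which export to plain ONNX ops.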
**Satellite matching cadence**: Keyframes selected from the ADTI VO stream every 5-10 frames (~every 2.5-14s depending on camera fps setting). At 800-1000m altitude and 14 m/s cruise, this yields 60-97% forward overlap between satellite match frames. Matching runs async on Stream B — does not block VO on Stream A.
### Component: Sensor Fusion (UNCHANGED)
ESKF — CPU-based mathematical filter, not affected.
### Component: Flight Controller Integration (UNCHANGED)
pymavlink — not affected by TRT migration.
### Component: Ground Station Telemetry (UNCHANGED)
MAVLink NAMED_VALUE_FLOAT — not affected.
### Component: Startup & Lifecycle (UPDATED)
**Updated startup sequence**:
1. Boot Jetson → start GPS-Denied service (systemd)
2. Connect to flight controller via pymavlink on UART
3. Wait for heartbeat from flight controller
4. **Initialize PyCUDA context**
5. **Load TRT engines**: litesam.engine + xfeat.engine via tensorrt.Runtime.deserialize_cuda_engine()
6. **Allocate GPU I/O buffers** for both models
7. **Create CUDA streams**: Stream A (cuVSLAM), Stream B (satellite matching)
8. Read GLOBAL_POSITION_INT → init ESKF
9. Start cuVSLAM with ADTI 20L V1 camera frames
10. Begin GPS_INPUT output loop at 5-10Hz
11. Preload satellite tiles within ±2km into RAM
12. System ready
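Steps 2, 3 and 8 of the sequence above could look roughly like the following with pymavlink. This is a sketch under assumptions: the serial device and baud rate are placeholders, and the helper names are hypothetical. GLOBAL_POSITION_INT carries latitude and longitude in degrees × 1e7 and altitude in millimetres, so the values are scaled before they seed the ESKF.

```python
def decode_global_position_int(lat_e7, lon_e7, alt_mm):
    """Convert GLOBAL_POSITION_INT integer fields to degrees and metres."""
    return lat_e7 / 1e7, lon_e7 / 1e7, alt_mm / 1000.0


def read_initial_position(device="/dev/ttyTHS1", baud=921600):
    """Wait for a heartbeat, then read one GLOBAL_POSITION_INT message.

    `device` and `baud` are placeholder values, not taken from the design.
    """
    # Imported lazily so the decoder above stays usable without pymavlink
    from pymavlink import mavutil

    link = mavutil.mavlink_connection(device, baud=baud)       # step 2
    link.wait_heartbeat()                                      # step 3
    msg = link.recv_match(type="GLOBAL_POSITION_INT", blocking=True)  # step 8
    return decode_global_position_int(msg.lat, msg.lon, msg.alt)
```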
**Engine load time**: ~1-3 seconds per engine (deserialization from .engine file). One-time cost at startup.
### Component: Thermal Management (UNCHANGED)
Same adaptive pipeline. TRT engines are slightly more power-efficient than ONNX Runtime, but the difference is within noise.
### Component: Object Localization (UNCHANGED)
Not affected — trigonometric calculation, no AI inference.
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~9ms/frame)
Fed by ADTI 20L V1 at 0.7 fps sustained. At 70 km/h cruise and 600m altitude, inter-frame displacement is 27.8m (~175px, 4.8% of frame). With pyramid-based LK optical flow (3-4 levels), effective search range is ~150-200px — 175px is within range. At 800-1000m altitude, pixel shift drops to 105-131px. 95%+ overlap between consecutive frames.
### 2. Native TRT Engine Inference (NEW)
All AI models run as pre-compiled TRT FP16 engines:
- Engine files built offline with trtexec (one-time per model version)
- Loaded at startup (~1-3s per engine)
- Inference via `context.execute_async_v3()` (the Python binding of `enqueueV3`) on dedicated CUDA Stream B
- GPU buffers pre-allocated — zero runtime allocation during flight
- No ONNX Runtime dependency — no framework overhead
Memory advantage over ONNX Runtime TRT-EP: ~560-600MB saved (both models combined).
Latency advantage: eliminates ONNX wrapper overhead, guaranteed tensor core utilization.
### 3. CUDA Stream Pipelining (REFINED)
- Stream A: cuVSLAM VO from ADTI 20L V1 (~9ms) + ESKF fusion (~1ms)
- Stream B: TRT engine inference for satellite matching (LiteSAM or XFeat, async, triggered on keyframe from same ADTI stream)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management
- **NEW**: Both cuVSLAM and TRT engines use CUDA streams natively — no framework abstraction layer. Direct GPU scheduling.
### 4-7. (UNCHANGED from draft03)
Keyframe-based satellite matching, TensorRT FP16 optimization, proactive tile loading, 5-10Hz GPS_INPUT output — all unchanged.
## Processing Time Budget
### VO Frame (every ~1430ms from ADTI 20L V1 at 0.7 fps)
| Step | Time | Notes |
|------|------|-------|
| ADTI image transfer | ~5-10ms | Trigger + readout |
| Downsample (CUDA) | ~2ms | To cuVSLAM input resolution |
| cuVSLAM VO+IMU | ~9ms | CUDA Stream A |
| ESKF measurement update | ~1ms | CPU |
| **Total** | **~17-22ms** | Well within 1430ms budget |
Between camera frames, ESKF IMU prediction runs at 5-10Hz to maintain continuous GPS_INPUT output. The ~1.4s gap between frames is bridged entirely by IMU integration.
### Keyframe Satellite Matching (every 5-10 camera frames, async CUDA Stream B)
**Path A — LiteSAM TRT Engine FP16 at 1280px**:
| Step | Time | Notes |
|------|------|-------|
| Image already in GPU (from VO) | ~0ms | Same frame used for VO and matching |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| Copy input to GPU buffer | <0.5ms | PyCUDA memcpy_htod_async |
| LiteSAM TRT Engine FP16 | ≤200ms | context.execute_async_v3(stream_B) |
| Copy output from GPU | <0.5ms | PyCUDA memcpy_dtoh_async |
| Geometric pose (RANSAC) | ~5ms | Homography |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async on Stream B, does not block VO |
**Path B — XFeat TRT Engine FP16**:
| Step | Time | Notes |
|------|------|-------|
| XFeat TRT Engine inference | ~50-80ms | context.execute_async_v3(stream_B) |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async on Stream B |
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory (Native TRT) | Memory (ONNX RT TRT-EP) | Notes |
|-----------|---------------------|--------------------------|-------|
| OS + runtime | ~1.5GB | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | ~200-500MB | CUDA library + map |
| **LiteSAM TRT engine** | **~50-80MB** | **~330-360MB** | Native TRT vs TRT-EP. If LiteSAM fails: EfficientLoFTR ~100-150MB |
| **XFeat TRT engine** | **~30-50MB** | **~310-330MB** | Native TRT vs TRT-EP |
| Preloaded satellite tiles | ~200MB | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | ~20MB | |
| FastAPI (local IPC) | ~50MB | ~50MB | |
| ESKF + buffers | ~10MB | ~10MB | |
| ONNX Runtime framework | **0MB** | **~150MB** | Eliminated with native TRT |
| **Total** | **~2.1-2.9GB** | **~2.8-3.6GB** | |
| **% of 8GB** | **26-36%** | **35-45%** | |
| **Savings** | — | — | **~700MB saved with native TRT** |
## Confidence Scoring → GPS_INPUT Mapping
Unchanged from draft03.
## Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **LiteSAM MinGRU ops unsupported in TRT 10.3** | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification: ONNX export → polygraphy → trtexec. If MinGRU fails: (1) rewrite as unrolled 9-step loop, (2) if still fails: **switch to EfficientLoFTR TRT** (proven TRT path, Coarse_LoFTR_TRT, 15.05M params). XFeat TRT as speed fallback. |
| **TRT engine build OOM on 8GB Jetson** | LOW | Cannot build engines on target device | Our models are small (6.31M LiteSAM, <5M XFeat). OOM unlikely. If occurs: reduce --memPoolSize, or build on identical Orin Nano module with more headroom |
| **Engine incompatibility after JetPack update** | MEDIUM | Must rebuild engines | Include engine rebuild in JetPack update procedure. Takes minutes per model. |
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink | Unchanged from draft03 |
| **cuVSLAM fails on low-texture terrain** | HIGH | Frequent tracking loss | ADTI at 0.7 fps means 27.8m inter-frame displacement at 70 km/h. At 600m+ altitude, pixel shift is 105-175px with 95%+ overlap — within pyramid-assisted LK range. HIGH risk remains over completely uniform terrain (single crop covering 577m+ footprint). IMU bridging + satellite matching corrections bound drift. |
| **Thermal throttling** | MEDIUM | Satellite matching budget blown | Unchanged from draft03 |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use fallback matcher | Day-one benchmark. Fallback chain: EfficientLoFTR TRT (if ≤300ms) → XFeat TRT (if all >300ms) |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Unchanged from draft03 |
| **AUW exceeds AT4125 recommended range** | MEDIUM | Reduced endurance, motor thermal stress | 12.5 kg AUW vs 8-10 kg recommended. Monitor motor temps. Consider weight reduction (lighter gimbal, single battery for shorter missions). |
| **cuVSLAM at 0.7 fps — inter-frame displacement** | MEDIUM | VO tracking loss on uniform terrain | At 0.7 fps and 70 km/h: ~175px displacement at 600m (4.8% of frame, 95.2% overlap). Within pyramid-assisted LK range (150-200px). At 800m+: drops to 105-131px. Mitigations: (1) cuVSLAM IMU integrator bridges 1.43s frame gaps, (2) ESKF IMU prediction at 5-10Hz fills position gaps, (3) satellite matching corrections every 5-10 frames bound drift. |
| **ADTI mechanical shutter lifespan** | MEDIUM | Shutter replacement needed periodically | At 0.7 fps sustained over 3.5h flights: ~8,800 actuations/flight. Shutter life unknown for 20L (102PRO is 500K, entry-level likely 100-150K). Estimated 11-57 flights before replacement. Budget for shutter replacement as consumable. |
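The shutter-lifespan estimate in the last row of the table follows from simple arithmetic, sketched here. The frame rate and flight duration come from the table; the shutter-life values are the speculative ones quoted there, not manufacturer figures.

```python
FRAME_RATE_HZ = 0.7
FLIGHT_HOURS = 3.5


def actuations_per_flight():
    """Shutter actuations in one flight at the sustained frame rate."""
    return FRAME_RATE_HZ * FLIGHT_HOURS * 3600


def flights_until_replacement(shutter_life):
    """Number of flights a shutter rated for `shutter_life` actuations would last."""
    return shutter_life / actuations_per_flight()


for life in (100_000, 150_000, 500_000):
    print(f"{life} actuations -> {flights_until_replacement(life):.1f} flights")
```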
## Testing Strategy
### Integration / Functional Tests
All tests from draft03 unchanged, plus:
- **TRT engine load test**: Verify litesam.engine and xfeat.engine load successfully on Jetson Orin Nano Super
- **TRT inference correctness**: Compare TRT engine output vs PyTorch reference output (max L1 error < 0.01)
- **CUDA Stream B pipelining**: Verify satellite matching on Stream B does not block cuVSLAM on Stream A
- **Engine pre-built validation**: Verify engine files from offline preparation work without rebuild at runtime
- **ADTI 20L V1 sustained capture rate**: Verify camera sustains 0.7 fps in JPEG mode over extended periods (>30 min) without buffer overflow or overheating. Also test 1.0 fps to determine if higher sustained rate is achievable.
- **ADTI trigger timing**: Verify camera trigger and image transfer pipeline delivers frames to cuVSLAM within acceptable latency (<50ms from trigger to GPU buffer)
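The TRT inference correctness check in the list above could be implemented along these lines. This sketch interprets "max L1 error" as the largest element-wise absolute difference, which is an assumption, and operates on flattened outputs; the function names are hypothetical.

```python
def max_abs_error(reference, candidate):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def outputs_match(reference, candidate, tolerance=0.01):
    """True when the TRT output stays within `tolerance` of the reference."""
    return max_abs_error(reference, candidate) < tolerance


print(outputs_match([0.5, 1.0], [0.505, 0.995]))
```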
### Non-Functional Tests
All tests from draft03 unchanged, plus:
- **TRT engine build time**: Measure trtexec build time for LiteSAM and XFeat on Orin Nano Super (expected: 1-5 minutes each)
- **TRT engine load time**: Measure deserialization time (expected: 1-3 seconds each)
- **Memory comparison**: Measure actual GPU memory with native TRT vs ONNX RT TRT-EP for both models
- **MinGRU TRT compatibility** (day-one blocker):
1. Clone LiteSAM repo, load pretrained weights
2. Reparameterize MobileOne backbone
3. `torch.onnx.export(model, dummy, "litesam.onnx", opset_version=17)`
4. `polygraphy inspect model litesam.onnx` — check for unsupported ops
5. `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
6. If step 3 or 5 fails on MinGRU: rewrite MinGRU forward() as unrolled loop, retry
7. If still fails: switch to EfficientLoFTR, apply Coarse_LoFTR_TRT adaptation
8. Compare TRT output vs PyTorch reference (max L1 error < 0.01)
- **EfficientLoFTR TRT fallback benchmark** (if LiteSAM fails): apply TRT adaptation from Coarse_LoFTR_TRT → ONNX → trtexec → measure latency at 640×480 and 1280px
- **Tensor core utilization**: Verify with NSight that TRT engines use tensor cores (unlike ONNX RT CUDA EP)
- **Flight endurance validation**: Ground-test full system power draw (propulsion + electronics) against 267W estimate. Verify ~3.4h endurance target.
- **cuVSLAM at 0.7 fps**: Benchmark VO tracking quality, drift rate, and tracking loss frequency at 0.7 fps with ADTI 20L V1. Measure IMU integrator effectiveness for bridging 1.43s inter-frame gaps. Test at 600m and 800m altitude, over both textured and low-texture terrain.
- **ADTI shutter durability**: Track shutter actuation count across flights. Monitor for shutter failure symptoms (missed frames, inconsistent exposure).
## References
- ONNX Runtime Issue #24085 (Jetson Orin Nano tensor core bug): https://github.com/microsoft/onnxruntime/issues/24085
- ONNX Runtime Issue #20457 (TRT-EP memory overhead): https://github.com/microsoft/onnxruntime/issues/20457
- ONNX Runtime Issue #12083 (TRT-EP vs native TRT): https://github.com/microsoft/onnxruntime/issues/12083
- NVIDIA TensorRT 10 Python API: https://docs.nvidia.com/deeplearning/tensorrt/10.15.1/inference-library/python-api-docs.html
- TensorRT Best Practices: https://docs.nvidia.com/deeplearning/tensorrt/latest/performance/best-practices.html
- TensorRT engine hardware specificity: https://github.com/NVIDIA/TensorRT/issues/1920
- trtexec ONNX conversion: https://nvidia-jetson.piveral.com/jetson-orin-nano/how-to-convert-onnx-to-engine-on-jetson-orin-nano-dev-board/
- Torch-TensorRT JetPack 6.2: https://docs.pytorch.org/TensorRT/v2.10.0/getting_started/jetpack.html
- XFeatTensorRT: https://github.com/PranavNedunghat/XFeatTensorRT
- JetPack 6.2 Release Notes: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/index.html
- Jetson Orin Nano Super: https://developer.nvidia.com/blog/nvidia-jetson-orin-nano-developer-kit-gets-a-super-boost/
- DLA on Jetson Orin: https://developer.nvidia.com/blog/maximizing-deep-learning-performance-on-nvidia-jetson-orin-with-dla/
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR HuggingFace: https://huggingface.co/docs/transformers/en/model_doc/efficientloftr
- Coarse_LoFTR_TRT (TRT for embedded): https://github.com/Kolkir/Coarse_LoFTR_TRT
- Coarse_LoFTR_TRT paper: https://ar5iv.labs.arxiv.org/html/2202.00770
- LoFTR_TRT: https://github.com/Kolkir/LoFTR_TRT
- minGRU ("Were RNNs All We Needed?"): https://huggingface.co/papers/2410.01201
- minGRU PyTorch implementation: https://github.com/lucidrains/minGRU-pytorch
- LiteSAM paper (MinGRU details, Eqs 12-16): https://www.mdpi.com/2072-4292/17/19/3349
- DALGlue (UAV feature matching, 2025): https://www.nature.com/articles/s41598-025-21602-5
- ADTI 20L V1 specs: https://unmannedrc.com/products/adti-20l-v1-mapping-camera
- ADTI 20L V1 user manual: https://docs.adti.camera/adti-20l-and-24l-v1-quick-start-guide/
- T-Motor AT4125 KV540: https://uav-en.tmotor.com/2019/Motors_0429/247.html
- VANT Semi-Solid State 6S 30Ah battery: https://www.xtbattery.com/370wh/kg-42v-high-energy-density-6s-12s-14s-18s-30ah-semi-solid-state-drone-battery/
- All references from solution_draft03.md
## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts (this assessment): `_docs/00_research/trt_engine_migration/`
- Previous research: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
- Previous draft: `_docs/01_solution/solution_draft04.md`
-622
View File
@@ -1,622 +0,0 @@
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
| ------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ESKF described as "16-state vector, ~10MB" with no mathematical specification | **Functional**: No state vector, no process model (F,Q), no measurement models (H for VO, H for satellite), no noise parameters, no scale observability analysis. Impossible to implement or validate accuracy claims. | **Define complete ESKF specification**: 15-state error vector, IMU-driven prediction, dual measurement models (VO relative pose, satellite absolute position), initial Q/R values, scale constraint via altitude + satellite corrections. |
| GPS_INPUT at 5-10Hz via pymavlink — no field mapping | **Functional**: GPS_INPUT requires 15+ fields (velocity, accuracy, hdop, fix_type, GPS time). No specification of how ESKF state maps to these fields. ArduPilot requires minimum 5Hz. | **Define GPS_INPUT population spec**: velocity from ESKF, accuracy from covariance, fix_type from confidence tier, GPS time from system clock conversion, synthesized hdop/vdop. |
| Confidence scoring "unchanged from draft03" — not in draft05 | **Functional**: Draft05 is supposed to be self-contained. Confidence scoring determines GPS_INPUT accuracy fields and fix_type — directly affects how ArduPilot EKF weights the position data. | **Define confidence scoring inline**: 3 tiers (satellite-anchored, VO-tracked, IMU-only) mapping to fix_type + accuracy values. |
| Coordinate transformations not defined | **Functional**: No pixel→camera→body→NED→WGS84 chain. Camera is not autostabilized, so body attitude matters. Satellite match → WGS84 conversion undefined. Object localization impossible without these transforms. | **Define coordinate transformation chain**: camera intrinsics K, camera-to-body extrinsic T_cam_body, body-to-NED from ESKF attitude, NED origin at mission start point. |
| Disconnected route segments — "satellite re-localization" mentioned but no algorithm | **Functional**: AC requires handling as "core to the system." Multiple disconnected segments expected. No tracking-loss detection, no re-localization trigger, no ESKF re-initialization, no cuVSLAM restart procedure. | **Define re-localization pipeline**: detect cuVSLAM tracking loss → IMU-only ESKF prediction → trigger satellite match on every frame → on match success: ESKF position reset + cuVSLAM restart → on 3 consecutive failures: operator re-localization request. |
| No startup handoff from GPS to GPS-denied | **Functional**: System reads GLOBAL_POSITION_INT at startup but no protocol for when GPS is lost/spoofed vs system start. No validation of initial position. | **Define handoff protocol**: system runs continuously, FC receives both real GPS and GPS_INPUT. GPS-denied system always provides its estimate; FC selects best source. Initial position validated against first satellite match. |
| No mid-flight reboot recovery | **Functional**: AC requires: "re-initialize from flight controller's current IMU-extrapolated position." No procedure defined. Recovery time estimation missing. | **Define reboot recovery sequence**: read FC position → init ESKF with high uncertainty → load TRT engines → start cuVSLAM → immediate satellite match. Estimated recovery: ~35-70s. Document as known limitation. |
| 3-consecutive-failure re-localization request undefined | **Functional**: AC requires ground station re-localization request. No message format, no operator workflow, no system behavior while waiting. | **Define re-localization protocol**: detect 3 failures → send custom MAVLink message with last known position + uncertainty → operator provides approximate coordinates → system uses as ESKF measurement with high covariance. |
| Object localization — "trigonometric calculation" with no details | **Functional**: No math, no API, no Viewpro gimbal integration, no accuracy propagation. Other onboard systems cannot use this component as specified. | **Define object localization**: pixel→ray using Viewpro intrinsics + gimbal angles → body frame → NED → ray-ground intersection → WGS84. FastAPI endpoint: POST /objects/locate. Accuracy propagated from UAV position + gimbal uncertainty. |
| Satellite matching — GSD normalization and tile selection unspecified | **Functional**: Camera GSD ~15.9 cm/px at 600m vs satellite ~0.3 m/px at zoom 19. The "pre-resize" step is mentioned but not specified. Tile selection radius based on ESKF uncertainty not defined. | **Define GSD handling**: downsample camera frame to match satellite GSD. Define tile selection: ESKF position ± 3σ_horizontal → select tiles covering that area. Assemble tile mosaic for matching. |
| Satellite tile storage requirements not calculated                                   | **Functional**: "±2km" preload mentioned but no storage estimate. At zoom 19: a 200km path with ±2km buffer requires ~130K tiles (~2.5GB).                                                                                | **Calculate tile storage**: specify zoom level (18 preferred: 0.6m/px, 4× fewer tiles), estimate storage per mission profile, define maximum mission area by storage limit.                                                                                   |
| FastAPI endpoints not in solution draft | **Functional**: Endpoints only in security_analysis.md. No request/response schemas. No SSE event format. No object localization endpoint. | **Consolidate API spec in solution**: define all endpoints, SSE event schema, object localization endpoint. Reference security_analysis.md for auth. |
| cuVSLAM configuration missing (calibration, IMU params, mode) | **Functional**: No camera calibration procedure, no IMU noise parameters, no T_imu_rig extrinsic, no mode selection (Mono vs Inertial). | **Define cuVSLAM configuration**: use Inertial mode, specify required calibration data (camera intrinsics, distortion, IMU noise params from datasheet, T_imu_rig from physical measurement), define calibration procedure. |
| tech_stack.md inconsistent with draft05 | **Functional**: tech_stack.md says 3fps (should be 0.7fps), LiteSAM at 480px (should be 1280px), missing EfficientLoFTR. | **Flag for update**: tech_stack.md must be synchronized with draft05 corrections. Not addressed in this draft — separate task. |
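The tile-storage estimate in the table above can be reproduced approximately as follows. This is a sketch under stated assumptions: 256px tiles, the 0.3 m/px and 0.6 m/px resolutions quoted in the table, a straight corridor, and an average tile size of about 19KB backed out of the ~2.5GB / ~130K figures. The real tile count also depends on latitude and route shape.

```python
import math

TILE_PX = 256                  # assumed standard web-map tile size
PATH_LENGTH_M = 200_000        # 200 km path from the table
BUFFER_M = 2_000               # ±2 km preload buffer from the table
M_PER_PX = {19: 0.3, 18: 0.6}  # resolutions quoted in the table


def tile_count(zoom):
    """Approximate number of tiles covering the buffered corridor."""
    tile_side_m = TILE_PX * M_PER_PX[zoom]
    corridor_area_m2 = PATH_LENGTH_M * (2 * BUFFER_M)
    return math.ceil(corridor_area_m2 / tile_side_m ** 2)


def storage_gb(zoom, bytes_per_tile=19_000):
    """Storage estimate; bytes_per_tile is an assumption, not a measured value."""
    return tile_count(zoom) * bytes_per_tile / 1e9


for zoom in (19, 18):
    print(f"zoom {zoom}: {tile_count(zoom)} tiles, ~{storage_gb(zoom):.2f} GB")
```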
## Overall Maturity Assessment
| Category | Maturity (1-5) | Assessment |
| ----------------------------- | -------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| Hardware & Platform Selection | 3.5 | UAV airframe, cameras, Jetson, batteries — well-researched with specs, weight budget, endurance calculations. Ready for procurement. |
| Core Algorithm Selection | 3.0 | cuVSLAM, LiteSAM/XFeat, ESKF — components selected with comparison tables, fallback chains, decision trees. Day-one benchmarks defined. |
| AI Inference Runtime | 3.5 | TRT Engine migration thoroughly analyzed. Conversion workflows, memory savings, performance estimates. Code wrapper provided. |
| Sensor Fusion (ESKF)          | 1.5            | Mentioned but not specified. No implementable detail. Blocker for coding.                                                               |
| System Integration | 1.5 | GPS_INPUT, coordinate transforms, inter-component data flow — all under-specified. |
| Edge Cases & Resilience | 1.0 | Disconnected segments, reboot recovery, re-localization — acknowledged but no algorithms. |
| Operational Readiness | 0.5 | No pre-flight procedures, no in-flight monitoring, no failure response. |
| Security | 3.0 | Comprehensive threat model, OP-TEE analysis, LUKS, secure boot. Well-researched. |
| **Overall TRL** | **~2.5** | **Technology concept formulated + some component validation. Not implementation-ready.** |
The solution is at approximately **TRL 3** (proof of concept) for hardware/algorithm selection and **TRL 1-2** (basic concept) for system integration, ESKF, and operational procedures.
## Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses native TensorRT Engine files. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.
Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM in Inertial mode) from ADTI 20L V1 at 0.7 fps sustained, (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16) using keyframes from the same ADTI image stream, and (3) IMU data from the flight controller via ESKF. Viewpro A40 Pro is reserved for AI object detection only.
The ESKF is the central state estimator with 15-state error vector. It fuses:
- **IMU prediction** at 5-10Hz (high-frequency pose propagation)
- **cuVSLAM VO measurement** at 0.7Hz (relative pose correction)
- **Satellite matching measurement** at ~0.07-0.14Hz (absolute position correction)
GPS_INPUT messages carry position, velocity, and accuracy derived from the ESKF state and covariance.
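GPS_INPUT also carries GPS time as `time_week` and `time_week_ms`, which must be derived from the system clock. A minimal sketch of that conversion follows, assuming the system clock is UTC and a GPS-UTC offset of 18 leap seconds (the value in effect since 2017, which would need updating if a new leap second is introduced). The function name is hypothetical.

```python
from datetime import datetime, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
SECONDS_PER_WEEK = 7 * 24 * 3600
GPS_UTC_LEAP_SECONDS = 18  # assumed current GPS-UTC offset


def utc_to_gps_week(utc, leap_seconds=GPS_UTC_LEAP_SECONDS):
    """Return (time_week, time_week_ms) for a timezone-aware UTC datetime."""
    gps_seconds = (utc - GPS_EPOCH).total_seconds() + leap_seconds
    week = int(gps_seconds // SECONDS_PER_WEEK)
    week_ms = int((gps_seconds % SECONDS_PER_WEEK) * 1000)
    return week, week_ms


print(utc_to_gps_week(datetime.now(timezone.utc)))
```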
**Hard constraint**: ADTI 20L V1 shoots at 0.7 fps sustained (1430ms interval). Full VO+ESKF pipeline within 400ms per frame. Satellite matching async on keyframes (every 5-10 camera frames). GPS_INPUT at 5-10Hz (ESKF IMU prediction fills gaps between camera frames).
```
┌─────────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ 1. Satellite Tiles → Download & Validate → Pre-resize → Store │
│ (Google Maps) (≥0.5m/px, <2yr) (matcher res) (GeoHash)│
│ 2. TRT Engine Build (one-time per model version): │
│ PyTorch model → reparameterize → ONNX export → trtexec --fp16 │
│ Output: litesam.engine, xfeat.engine │
│ 3. Camera + IMU calibration (one-time per hardware unit) │
│ 4. Copy tiles + engines + calibration to Jetson storage │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ STARTUP: │
│ 1. pymavlink → read GLOBAL_POSITION_INT → init ESKF state │
│ 2. Load TRT engines + allocate GPU buffers │
│ 3. Load camera calibration + IMU calibration │
│ 4. Start cuVSLAM (Inertial mode) with ADTI 20L V1 │
│ 5. Preload satellite tiles ±2km into RAM │
│ 6. First satellite match → validate initial position │
│ 7. Begin GPS_INPUT output loop at 5-10Hz │
│ │
│ EVERY CAMERA FRAME (0.7fps from ADTI 20L V1): │
│ ┌──────────────────────────────────────┐ │
│ │ ADTI 20L V1 → Downsample (CUDA) │ │
│ │ → cuVSLAM VO+IMU (~9ms) │ ← CUDA Stream A │
│ │ → ESKF VO measurement │ │
│ └──────────────────────────────────────┘ │
│ │
│ 5-10Hz CONTINUOUS (IMU-driven between camera frames): │
│ ┌──────────────────────────────────────┐ │
│ │ IMU data → ESKF prediction │ │
│ │ ESKF state → GPS_INPUT fields │ │
│ │ GPS_INPUT → Flight Controller (UART) │ │
│ └──────────────────────────────────────┘ │
│ │
│ KEYFRAMES (every 5-10 camera frames, async): │
│ ┌──────────────────────────────────────┐ │
│ │ Camera frame → GSD downsample │ │
│ │ Select satellite tile (ESKF pos±3σ) │ │
│ │ TRT inference (Stream B): LiteSAM/ │ │
│ │ XFeat → correspondences │ │
│ │ RANSAC → homography → WGS84 position │ │
│ │ ESKF satellite measurement update │──→ Position correction │
│ └──────────────────────────────────────┘ │
│ │
│ TRACKING LOSS (cuVSLAM fails — sharp turn / featureless): │
│ ┌──────────────────────────────────────┐ │
│ │ ESKF → IMU-only prediction (growing │ │
│ │ uncertainty) │ │
│ │ Satellite match on EVERY frame │ │
│ │ On match success → ESKF reset + │ │
│ │ cuVSLAM restart │ │
│ │ 3 consecutive failures → operator │ │
│ │ re-localization request │ │
│ └──────────────────────────────────────┘ │
│ │
│ TELEMETRY (1Hz): │
│ ┌──────────────────────────────────────┐ │
│ │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station │
│ └──────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
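The TRACKING LOSS branch of the diagram can be read as a small state machine. The sketch below captures only the transitions; the class and state names are hypothetical, and the ESKF reset, cuVSLAM restart and operator message are left as comments. It also assumes matching continues while waiting for the operator, which the design does not yet specify.

```python
from enum import Enum


class NavState(Enum):
    TRACKING = "tracking"                  # cuVSLAM healthy
    LOST = "lost"                          # IMU-only prediction, match on every frame
    OPERATOR_REQUEST = "operator_request"  # waiting for operator re-localization


MAX_CONSECUTIVE_FAILURES = 3


class Relocalizer:
    def __init__(self):
        self.state = NavState.TRACKING
        self.failures = 0

    def on_tracking_lost(self):
        """cuVSLAM reported tracking loss (sharp turn or featureless terrain)."""
        if self.state is NavState.TRACKING:
            self.state = NavState.LOST
            self.failures = 0

    def on_satellite_match(self, success):
        """Feed the result of one satellite match attempt while not tracking."""
        if self.state is NavState.TRACKING:
            return self.state
        if success:
            # Design: ESKF position reset + cuVSLAM restart happen here
            self.state = NavState.TRACKING
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= MAX_CONSECUTIVE_FAILURES:
                # Design: send the operator re-localization request here
                self.state = NavState.OPERATOR_REQUEST
        return self.state
```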
## Architecture
### Component: ESKF Sensor Fusion (NEW — previously unspecified)
**Error-State Kalman Filter** fusing IMU, visual odometry, and satellite matching.
**Nominal state vector** (propagated by IMU):
| State | Symbol | Size | Description |
| ---------- | ------ | ---- | ------------------------------------------------ |
| Position | p | 3 | NED position relative to mission origin (meters) |
| Velocity | v | 3 | NED velocity (m/s) |
| Attitude | q | 4 | Unit quaternion (body-to-NED rotation) |
| Accel bias | b_a | 3 | Accelerometer bias (m/s²) |
| Gyro bias | b_g | 3 | Gyroscope bias (rad/s) |
**Error-state vector** (estimated by ESKF): δx = [δp, δv, δθ, δb_a, δb_g]ᵀ ∈ ℝ¹⁵
where δθ ∈ so(3) is the 3D rotation error.
**Prediction step** (IMU at 5-10Hz from flight controller):
- Input: accelerometer a_m, gyroscope ω_m, dt
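- Propagate nominal state: p += v·dt, v += (R(q)·(a_m - b_a) + g)·dt, q ← q ⊗ Exp((ω_m - b_g)·dt), where g = [0, 0, +9.81] m/s² in NED (the accelerometer measures specific force, so gravity is added back)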
- Propagate error covariance: P = F·P·Fᵀ + Q
- F is the 15×15 error-state transition matrix (standard ESKF formulation)
- Q: process noise diagonal, initial values from IMU datasheet noise densities
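
The nominal-state propagation above can be sketched in NumPy as follows. This is a minimal illustration, not the project's implementation: the helper names (`quat_mul`, `quat_exp`, `quat_to_rot`, `predict_nominal`) are hypothetical, quaternions are assumed to be in [w, x, y, z] order, and gravity is assumed to be [0, 0, +9.81] m/s² in NED. Covariance propagation (P = F·P·Fᵀ + Q) is omitted.

```python
import numpy as np

GRAVITY_NED = np.array([0.0, 0.0, 9.81])  # assumption: NED frame, +Z points down

def quat_mul(q, r):
    """Hamilton product of two quaternions in [w, x, y, z] order."""
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([
        w0 * w1 - x0 * x1 - y0 * y1 - z0 * z1,
        w0 * x1 + x0 * w1 + y0 * z1 - z0 * y1,
        w0 * y1 - x0 * z1 + y0 * w1 + z0 * x1,
        w0 * z1 + x0 * y1 - y0 * x1 + z0 * w1,
    ])

def quat_exp(rot_vec):
    """Map a rotation vector (radians) to a unit quaternion."""
    angle = np.linalg.norm(rot_vec)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = rot_vec / angle
    return np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))

def quat_to_rot(q):
    """Body-to-NED rotation matrix from a unit quaternion [w, x, y, z]."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def predict_nominal(p, v, q, b_a, b_g, a_m, w_m, dt):
    """One IMU step: integrate position, velocity, and attitude."""
    # Rotate bias-corrected specific force into NED, then add gravity back.
    accel_ned = quat_to_rot(q) @ (a_m - b_a) + GRAVITY_NED
    p_new = p + v * dt
    v_new = v + accel_ned * dt
    q_new = quat_mul(q, quat_exp((w_m - b_g) * dt))
    return p_new, v_new, q_new / np.linalg.norm(q_new)
```

In level, unaccelerated flight the accelerometer reads roughly [0, 0, -9.81] in the body frame, so `accel_ned` is zero and the velocity stays constant, which is a quick sanity check for the sign convention.
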
**VO measurement update** (0.7Hz from cuVSLAM):
- cuVSLAM outputs relative pose: ΔR, Δt (camera frame)
- Transform to NED: Δp_ned = R_body_ned · T_cam_body · Δt
- Innovation: z = Δp_ned_measured - Δp_ned_predicted
- Observation matrix H_vo maps error state to relative position change
- R_vo: measurement noise, initial ~0.1-0.5m (from cuVSLAM precision at 600m+ altitude)
- Kalman update: K = P·Hᵀ·(H·P·Hᵀ + R)⁻¹, δx = K·z, P = (I - K·H)·P
**Satellite measurement update** (0.07-0.14Hz, async):
- Satellite matching outputs absolute position: lat_sat, lon_sat in WGS84
- Convert to NED relative to mission origin
- Innovation: z = p_satellite - p_predicted
- H_sat = [I₃, 0, 0, 0, 0] (directly observes position)
- R_sat: measurement noise, from matching confidence (~5-20m based on RANSAC inlier ratio)
- Provides absolute position correction — bounds drift accumulation
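
A minimal sketch of the satellite position update, assuming the 15-dimensional error-state ordering [δp, δv, δθ, δb_a, δb_g] given earlier. The function name `eskf_position_update` is hypothetical. The covariance update uses the Joseph form rather than P = (I - K·H)·P; the two are algebraically equivalent for the optimal gain, and the Joseph form is chosen here only because it keeps P symmetric under rounding. Injecting δx into the nominal state and resetting the error state are not shown.

```python
import numpy as np

def eskf_position_update(P, p_pred, p_meas, R_meas):
    """Kalman update for a direct NED position measurement.

    P      : 15x15 error-state covariance
    p_pred : predicted NED position (3,)
    p_meas : satellite-matched NED position (3,)
    R_meas : 3x3 measurement noise covariance
    Returns the error-state correction dx (15,) and the updated covariance.
    """
    H = np.zeros((3, 15))
    H[:, 0:3] = np.eye(3)                # H_sat = [I3, 0, 0, 0, 0]
    z = p_meas - p_pred                  # innovation
    S = H @ P @ H.T + R_meas             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    dx = K @ z
    I_KH = np.eye(15) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R_meas @ K.T   # Joseph form
    return dx, P_new
```

With equal prior and measurement variances (for example 100 m² each), the correction moves the position halfway toward the measurement and halves the position variance.
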
**Scale observability**:
- Monocular cuVSLAM has scale ambiguity during constant-velocity flight
- Scale is constrained by: (1) satellite matching absolute positions (primary), (2) known flight altitude from barometer + predefined mission altitude, (3) IMU accelerometer during maneuvers
- During long straight segments without satellite correction, scale drift is possible. Satellite corrections every ~7-14s re-anchor scale.
**Tuning approach**: Start with IMU datasheet noise values for Q. Start with conservative R values (high measurement noise). Tune on flight test data by comparing ESKF output to known GPS ground truth.
| Solution | Tools | Advantages | Limitations | Performance | Fit |
| -------------------------- | --------------- | ------------------------------------------------------------- | -------------------------------------- | ------------- | ----------- |
| Custom ESKF (Python/NumPy) | NumPy, SciPy | Full control, minimal dependencies, well-understood algorithm | Implementation effort, tuning required | <1ms per step | ✅ Selected |
| FilterPy ESKF | FilterPy v1.4.5 | Reference implementation, less code | Less flexible for multi-rate fusion | <1ms per step | ⚠️ Fallback |
### Component: Coordinate System & Transformations (NEW — previously undefined)
**Reference frames**:
- **Camera frame (C)**: origin at camera optical center, Z forward, X right, Y down (OpenCV convention)
- **Body frame (B)**: origin at UAV CG, X forward (nose), Y right (starboard), Z down
- **NED frame (N)**: North-East-Down, origin at mission start point
- **WGS84**: latitude, longitude, altitude (output format)
**Transformation chain**:
1. **Pixel → Camera ray**: p_cam = K⁻¹ · [u, v, 1]ᵀ where K = camera intrinsic matrix (ADTI 20L V1: fx, fy from 16mm lens + APS-C sensor)
2. **Camera → Body**: p_body = T_cam_body · p_cam where T_cam_body is the fixed mounting rotation (camera points nadir: 90° pitch rotation from body X-forward to camera Z-down)
3. **Body → NED**: p_ned = R_body_ned(q) · p_body where q is the ESKF quaternion attitude estimate
4. **NED → WGS84**: lat = lat_origin + p_north / R_earth, lon = lon_origin + p_east / (R_earth · cos(lat_origin)) where (lat_origin, lon_origin) is the mission start GPS position
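
Step 4 of the chain can be sketched as below. This is a flat-earth approximation under stated assumptions: `R_EARTH` is taken as the WGS84 semi-major axis, angles are converted between degrees and radians explicitly (the formula itself yields radians), and the function name `ned_to_wgs84` is hypothetical.

```python
import math

R_EARTH = 6378137.0  # assumption: WGS84 semi-major axis in meters

def ned_to_wgs84(p_north, p_east, lat_origin_deg, lon_origin_deg):
    """Convert a local NED offset (meters) to latitude/longitude in degrees."""
    lat0 = math.radians(lat_origin_deg)
    dlat = p_north / R_EARTH                      # radians
    dlon = p_east / (R_EARTH * math.cos(lat0))    # radians
    return (lat_origin_deg + math.degrees(dlat),
            lon_origin_deg + math.degrees(dlon))
```

The approximation error grows with distance from the origin, so it is worth checking against a geodetic library over the full mission length before relying on it.
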
**Camera intrinsic matrix K** (ADTI 20L V1 + 16mm lens):
- Sensor: 23.2 × 15.4 mm, Resolution: 5456 × 3632
- fx = fy = focal_mm × width_px / sensor_width_mm = 16 × 5456 / 23.2 = 3763 pixels
- cx = 2728, cy = 1816 (sensor center)
- Distortion: Brown model (k1, k2, p1, p2 from calibration)
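
The intrinsic matrix follows directly from the numbers above. A short sketch, assuming square pixels and a principal point at the image center; `intrinsic_matrix` is a hypothetical helper, and calibrated values should replace these nominal ones once available.

```python
import numpy as np

def intrinsic_matrix(focal_mm, sensor_width_mm, width_px, height_px):
    """Nominal pinhole intrinsics from lens and sensor geometry."""
    f_px = focal_mm * width_px / sensor_width_mm
    return np.array([
        [f_px, 0.0, width_px / 2.0],
        [0.0, f_px, height_px / 2.0],
        [0.0, 0.0, 1.0],
    ])

# ADTI 20L V1 with the 16mm lens, values from the list above.
K = intrinsic_matrix(16.0, 23.2, 5456, 3632)

# Pixel -> camera ray: the principal point maps to the optical axis.
ray_center = np.linalg.inv(K) @ np.array([2728.0, 1816.0, 1.0])
```
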
**T_cam_body** (camera mount):
- Navigation camera is fixed, pointing nadir (downward), not autostabilized
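- R_cam_body: for a nadir mount the optical axis (camera Z) is aligned with body +Z (down), since the body frame defined above has Z down; with camera X aligned with body X this reduces to the identity, otherwise it is a fixed yaw offset about Z taken from calibration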
- Translation: offset from CG to camera mount (measured during assembly, typically <0.3m)
**Satellite match → WGS84**:
- Feature correspondences between camera frame and geo-referenced satellite tile
- Homography H maps camera pixels to satellite tile pixels
- Satellite tile pixel → WGS84 via tile's known georeference (zoom level + tile x,y → lat,lon)
- Camera center projects to satellite pixel (cx_sat, cy_sat) via H
- Convert (cx_sat, cy_sat) to WGS84 using tile georeference
### Component: GPS_INPUT Message Population (NEW — previously undefined)
| GPS_INPUT Field | Source | Computation |
| ----------------------- | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| lat, lon | ESKF position (NED) | NED → WGS84 conversion using mission origin |
| alt | ESKF position (Down) + mission origin altitude | alt = alt_origin - p_down |
| vn, ve, vd | ESKF velocity state | Direct from ESKF v[0], v[1], v[2] |
| fix_type                | Confidence tier                                | 3 (3D fix) for HIGH and MEDIUM, 2 (2D fix) for LOW (IMU-only), 0 (no fix) for FAILED; see the confidence tiers table |
| hdop | ESKF horizontal covariance | hdop = sqrt(P[0,0] + P[1,1]) / 5.0 (approximate CEP→HDOP mapping) |
| vdop | ESKF vertical covariance | vdop = sqrt(P[2,2]) / 5.0 |
| horiz_accuracy | ESKF horizontal covariance | horiz_accuracy = sqrt(P[0,0] + P[1,1]) meters |
| vert_accuracy | ESKF vertical covariance | vert_accuracy = sqrt(P[2,2]) meters |
| speed_accuracy | ESKF velocity covariance | speed_accuracy = sqrt(P[3,3] + P[4,4]) m/s |
| time_week, time_week_ms | System time                                    | Convert Unix time to GPS time (GPS epoch = 1980-01-06; add the GPS-UTC leap-second offset, 18 s since 2017)  |
| satellites_visible | Constant | 10 (synthetic — prevents satellite-count failsafes in ArduPilot) |
| gps_id | Constant | 0 |
| ignore_flags | Constant | 0 (provide all fields) |
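
The time_week / time_week_ms computation can be sketched as below. GPS time does not apply leap seconds and therefore runs ahead of UTC, so the offset is added when starting from Unix time. The 18 s constant is the offset in effect since 2017-01-01 and must be updated if a new leap second is announced; `unix_to_gps_week` is a hypothetical helper name.

```python
GPS_EPOCH_UNIX = 315964800      # 1980-01-06T00:00:00Z expressed as Unix seconds
GPS_UTC_LEAP_SECONDS = 18       # GPS minus UTC, valid since 2017-01-01
SECONDS_PER_WEEK = 604800

def unix_to_gps_week(unix_time_s):
    """Return (time_week, time_week_ms) for a Unix timestamp in seconds."""
    gps_seconds = unix_time_s - GPS_EPOCH_UNIX + GPS_UTC_LEAP_SECONDS
    week = int(gps_seconds // SECONDS_PER_WEEK)
    week_ms = int((gps_seconds - week * SECONDS_PER_WEEK) * 1000)
    return week, week_ms
```
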
**Confidence tiers** mapping to GPS_INPUT:
| Tier | Condition | fix_type | horiz_accuracy | Rationale |
| ------ | ------------------------------------------------- | ---------- | ------------------------------- | -------------------------------------- |
| HIGH | Satellite match <30s ago, ESKF covariance < 400m² | 3 (3D fix) | From ESKF P (typically 5-20m) | Absolute position anchor recent |
| MEDIUM | cuVSLAM tracking OK, no recent satellite match | 3 (3D fix) | From ESKF P (typically 20-50m) | Relative tracking valid, drift growing |
| LOW | cuVSLAM lost, IMU-only | 2 (2D fix) | From ESKF P (50-200m+, growing) | Only IMU dead reckoning, rapid drift |
| FAILED | 3+ consecutive total failures | 0 (no fix) | 999.0 | System cannot determine position |
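
The tier table can be expressed as a small decision function. This is a sketch of one possible evaluation order (FAILED is checked first, then HIGH, MEDIUM, LOW); the function name and argument names are hypothetical, and the thresholds are the ones from the table.

```python
def confidence_tier(sat_match_age_s, horiz_var_m2, vo_tracking, consecutive_failures):
    """Map system status to (tier, GPS_INPUT fix_type) per the tier table."""
    if consecutive_failures >= 3:
        return "FAILED", 0          # no fix, horiz_accuracy reported as 999.0
    if sat_match_age_s < 30.0 and horiz_var_m2 < 400.0:
        return "HIGH", 3            # recent absolute anchor
    if vo_tracking:
        return "MEDIUM", 3          # relative tracking valid, drift growing
    return "LOW", 2                 # IMU-only dead reckoning
```
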
### Component: Disconnected Route Segment Handling (NEW — previously undefined)
**Trigger**: cuVSLAM reports tracking_lost OR tracking confidence drops below threshold
**Algorithm**:
```
STATE: TRACKING_NORMAL
cuVSLAM provides relative pose
ESKF VO measurement updates at 0.7Hz
Satellite matching on keyframes (every 5-10 frames)
STATE: TRACKING_LOST (enter when cuVSLAM reports loss)
1. ESKF continues with IMU-only prediction (no VO updates)
→ uncertainty grows rapidly (~1-5 m/s drift with consumer IMU)
2. Switch satellite matching to EVERY frame (not just keyframes)
→ maximize chances of getting absolute correction
3. For each camera frame:
a. Attempt satellite match using ESKF predicted position ± 3σ for tile selection
b. If match succeeds (RANSAC inlier ratio > 30%):
→ ESKF measurement update with satellite position
→ Restart cuVSLAM with current frame as new origin
→ Transition to TRACKING_NORMAL
→ Reset failure counter
c. If match fails:
→ Increment failure_counter
→ Continue IMU-only ESKF prediction
4. If failure_counter >= 3:
→ Send re-localization request to ground station
→ GPS_INPUT fix_type = 0 (no fix), horiz_accuracy = 999.0
→ Continue attempting satellite matching on each frame
5. If operator sends re-localization hint (approximate lat,lon):
→ Use as ESKF measurement with high covariance (~500m)
→ Attempt satellite match in that area
→ On success: transition to TRACKING_NORMAL
STATE: SEGMENT_DISCONNECT
After re-localization following tracking loss:
→ New cuVSLAM track is independent of previous track
→ ESKF maintains global NED position continuity via satellite anchor
→ No need to "connect" segments at the cuVSLAM level
→ ESKF already handles this: satellite corrections keep global position consistent
```
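
The transitions in the TRACKING_LOST state can be sketched as a small state holder. This only models the counter and state logic from the pseudocode; the ESKF update, cuVSLAM restart, and MAVLink messaging are left out, and the class and method names are hypothetical.

```python
from dataclasses import dataclass

INLIER_RATIO_MIN = 0.30     # RANSAC inlier ratio threshold from the algorithm
MAX_FAILURES = 3            # failures before a re-localization request

@dataclass
class TrackingMonitor:
    state: str = "TRACKING_NORMAL"
    failure_counter: int = 0
    reloc_requested: bool = False

    def on_vo_lost(self):
        """cuVSLAM reported tracking loss: start matching on every frame."""
        self.state = "TRACKING_LOST"
        self.failure_counter = 0

    def on_frame(self, inlier_ratio):
        """Handle one satellite match attempt; inlier_ratio is None on no match."""
        if self.state != "TRACKING_LOST":
            return self.state
        if inlier_ratio is not None and inlier_ratio > INLIER_RATIO_MIN:
            # Match accepted: caller applies the ESKF update and restarts cuVSLAM.
            self.state = "TRACKING_NORMAL"
            self.failure_counter = 0
            self.reloc_requested = False
        else:
            self.failure_counter += 1
            if self.failure_counter >= MAX_FAILURES:
                # Caller sends RELOC_REQ and reports fix_type = 0.
                self.reloc_requested = True
        return self.state
```

Keeping the state logic separate from the side effects makes the "3-consecutive-failure" path easy to unit test without a flight controller in the loop.
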
### Component: Satellite Image Matching Pipeline (UPDATED — added GSD + tile selection details)
**GSD normalization**:
- Camera GSD at 600m: ~15.9 cm/pixel (ADTI 20L V1 + 16mm)
- Satellite tile GSD at zoom 18: ~0.6 m/pixel (equatorial value; Web Mercator GSD scales with cos(latitude), giving ~0.4 m/pixel at 48°N)
- Scale ratio: ~3.8:1
- Downsample camera image to satellite GSD before matching: resize from 5456×3632 to ~1440×960 (matching zoom 18 GSD)
- This is close to LiteSAM's 1280px input, so feed 1280px images; the resulting minor GSD mismatch is acceptable for matching
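
The GSD figures can be reproduced with a few lines of arithmetic. A sketch using the camera parameters from this document and the 0.6 m/pixel tile GSD quoted above; `camera_gsd_m` is a hypothetical helper. The unrounded result is roughly 1450×965, in line with the ~1440×960 figure obtained from the rounded 3.8:1 ratio.

```python
def camera_gsd_m(altitude_m, focal_mm, sensor_width_mm, width_px):
    """Ground sample distance (meters/pixel) for a nadir pinhole camera."""
    pixel_pitch_mm = sensor_width_mm / width_px
    return altitude_m * pixel_pitch_mm / focal_mm

gsd_cam = camera_gsd_m(600.0, 16.0, 23.2, 5456)   # about 0.159 m/pixel
gsd_sat = 0.6                                      # zoom-18 value used in this document
scale = gsd_sat / gsd_cam                          # about 3.8

# Camera image size after downsampling to the satellite GSD.
target_w = round(5456 / scale)
target_h = round(3632 / scale)
```
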
**Tile selection**:
- Input: ESKF position estimate (lat, lon) + horizontal covariance σ_h
- Search radius: max(3·σ_h, 500m) — at least 500m to handle initial uncertainty
- Compute geohash for center position → load tiles covering the search area
- Assemble tile mosaic if needed (typically 2×2 to 4×4 tiles for adequate coverage)
- If ESKF uncertainty > 2km: tile selection unreliable, fall back to wider search or request operator input
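
A sketch of the tile lookup, under one assumption that differs from the text: instead of a geohash it uses the standard Web Mercator XYZ ("slippy map") tile indices, which match the "zoom level + tile x,y" georeference used elsewhere in this document. The function names are hypothetical.

```python
import math

R_EARTH = 6378137.0  # assumption: WGS84 semi-major axis in meters

def search_radius_m(sigma_h_m):
    """Search radius: 3 sigma, but never less than 500 m."""
    return max(3.0 * sigma_h_m, 500.0)

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Web Mercator XYZ tile indices containing the given point."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y

def tiles_covering(lat_deg, lon_deg, radius_m, zoom):
    """All tile indices intersecting a square of half-width radius_m."""
    dlat = math.degrees(radius_m / R_EARTH)
    dlon = dlat / math.cos(math.radians(lat_deg))
    # Tile y grows southward, so the northern edge gives the smallest y.
    x_min, y_min = latlon_to_tile(lat_deg + dlat, lon_deg - dlon, zoom)
    x_max, y_max = latlon_to_tile(lat_deg - dlat, lon_deg + dlon, zoom)
    return [(x, y) for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]
```

Calling `tiles_covering(lat, lon, search_radius_m(sigma_h), 18)` with the ESKF estimate yields the candidate tiles to load or assemble into a mosaic.
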
**Tile storage calculation** (zoom 18 — 0.6 m/pixel):
- Each 256×256 tile covers ~153m × 153m
- Flight path 200km with ±2km buffer: area ≈ 200km × 4km = 800 km²
- Tiles needed: 800,000,000 / (153 × 153) ≈ 34,200 tiles
- Storage: ~10-15KB per JPEG tile → ~340-510 MB
- With zoom 19 overlap tiles for higher precision: ×4 = ~1.4-2.0 GB
- Recommended: zoom 18 primary + zoom 19 for ±500m along flight path → ~500-800 MB total
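
The zoom-18 storage estimate can be checked with a short calculation. The sketch below uses the document's own inputs (153 m tile side, 10-15 KB per JPEG tile, 1 MB = 1000 KB); `tile_storage` is a hypothetical helper.

```python
def tile_storage(path_km, buffer_km, tile_side_m=153.0, kb_per_tile=(10.0, 15.0)):
    """Return (tile count, (min MB, max MB)) for a buffered flight corridor."""
    area_m2 = (path_km * 1000.0) * (2.0 * buffer_km * 1000.0)
    tiles = area_m2 / (tile_side_m ** 2)
    size_mb = tuple(tiles * kb / 1000.0 for kb in kb_per_tile)
    return tiles, size_mb

tiles, (mb_min, mb_max) = tile_storage(200.0, 2.0)
```
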
| Solution | Tools | Advantages | Limitations | Performance (est. Orin Nano Super TRT FP16) | Params | Fit |
| -------------------------------------- | ------------------------- | ------------------------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------- | ------ | ------------------------------- |
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30=17.86m UAV-VisLoc), 6.31M params | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT). Semi-dense. CVPR 2024. | 2.4x more params than LiteSAM. | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python | Fastest. Proven TRT implementation. | General-purpose, not designed for cross-view gap. | Est. ~50-100ms | <5M | ✅ Speed fallback |
### Component: cuVSLAM Configuration (NEW — previously undefined)
**Mode**: Inertial (mono camera + IMU)
**Camera configuration** (ADTI 20L V1 + 16mm lens):
- Model: Brown distortion
- fx = fy = 3763 px (16mm on 23.2mm sensor at 5456px width)
- cx = 2728 px, cy = 1816 px
- Distortion coefficients: from calibration (k1, k2, p1, p2)
- Border: 50px (ignore lens edge distortion)
**IMU configuration** (Pixhawk 6x IMU — ICM-42688-P):
- Gyroscope noise density: 3.0 × 10⁻³ °/s/√Hz
- Gyroscope random walk: 5.0 × 10⁻⁵ °/s²/√Hz
- Accelerometer noise density: 70 µg/√Hz
- Accelerometer random walk: ~2.0 × 10⁻³ m/s³/√Hz
- IMU frequency: 200 Hz (from flight controller via MAVLink)
- T_imu_rig: measured transformation from Pixhawk IMU to camera center (translation + rotation)
**cuVSLAM settings**:
- OdometryMode: INERTIAL
- MulticameraMode: PRECISION (favor accuracy over speed — we have 1430ms budget)
- Input resolution: downsample to 1280×852 (or 720p) for processing speed
- async_bundle_adjustment: True
**Initialization**:
- cuVSLAM initializes automatically when it receives the first camera frame + IMU data
- First few frames used for feature initialization and scale estimation
- First satellite match validates and corrects the initial position
**Calibration procedure** (one-time per hardware unit):
1. Camera intrinsics: checkerboard calibration with OpenCV (or use manufacturer data if available)
2. Camera-IMU extrinsic (T_imu_rig): Kalibr tool with checkerboard + IMU data
3. IMU noise parameters: Allan variance analysis or use datasheet values
4. Store calibration files on Jetson storage
### Component: AI Model Inference Runtime (UNCHANGED)
Native TRT Engine — optimal performance and memory on fixed NVIDIA hardware. See draft05 for full comparison table and conversion workflow.
### Component: Visual Odometry (UNCHANGED)
cuVSLAM in Inertial mode, fed by ADTI 20L V1 at 0.7 fps sustained. See draft05 for feasibility analysis at 0.7fps.
### Component: Flight Controller Integration (UPDATED — added GPS_INPUT field spec)
pymavlink over UART at 5-10Hz. GPS_INPUT field population defined above.
ArduPilot configuration:
- GPS1_TYPE = 14 (MAVLink)
- GPS_RATE = 5 (minimum, matching our 5-10Hz output)
- EK3_SRC1_POSXY = 3 (GPS), EK3_SRC1_VELXY = 3 (GPS): the EKF uses GPS_INPUT as its position/velocity source
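
A sketch of how the GPS_INPUT fields could be assembled before sending. The dictionary keys follow the field order of the MAVLink GPS_INPUT message; note that lat and lon are transmitted as integers in units of 1e-7 degrees. The helper name `gps_input_fields` and the `P_diag` argument (the first five diagonal entries of the ESKF covariance) are assumptions for illustration, and no MAVLink connection is opened here.

```python
import math

def gps_input_fields(lat_deg, lon_deg, alt_m, v_ned, P_diag,
                     fix_type, time_week, time_week_ms, time_usec):
    """Build GPS_INPUT field values in message order from ESKF outputs."""
    horiz = math.sqrt(P_diag[0] + P_diag[1])      # horizontal 1-sigma, meters
    vert = math.sqrt(P_diag[2])                   # vertical 1-sigma, meters
    return {
        "time_usec": time_usec,
        "gps_id": 0,
        "ignore_flags": 0,
        "time_week_ms": time_week_ms,
        "time_week": time_week,
        "fix_type": fix_type,
        "lat": int(round(lat_deg * 1e7)),         # degE7
        "lon": int(round(lon_deg * 1e7)),         # degE7
        "alt": alt_m,
        "hdop": horiz / 5.0,
        "vdop": vert / 5.0,
        "vn": v_ned[0],
        "ve": v_ned[1],
        "vd": v_ned[2],
        "speed_accuracy": math.sqrt(P_diag[3] + P_diag[4]),
        "horiz_accuracy": horiz,
        "vert_accuracy": vert,
        "satellites_visible": 10,
    }
```

Because the dictionary preserves insertion order, the values could plausibly be passed positionally to pymavlink's generated sender, for example `master.mav.gps_input_send(*fields.values())`; this should be verified against the installed pymavlink dialect and in SITL.
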
### Component: Object Localization (NEW — previously undefined)
**Input**: pixel coordinates (u, v) in Viewpro A40 Pro image, current gimbal angles (pan_deg, tilt_deg), zoom factor, UAV position from GPS-denied system, UAV altitude
**Process**:
1. Pixel → camera ray: ray_cam = K_viewpro⁻¹(zoom) · [u, v, 1]ᵀ
2. Camera → gimbal frame: ray_gimbal = R_gimbal(pan, tilt) · ray_cam
3. Gimbal → body: ray_body = T_gimbal_body · ray_gimbal
4. Body → NED: ray_ned = R_body_ned(q) · ray_body
5. Ray-ground intersection: assuming flat terrain at height h below the UAV, and with NED Down positive (a downward ray has ray_ned[2] > 0): t = h / ray_ned[2], p_ground_ned = p_uav_ned + t · ray_ned
6. NED → WGS84: convert to lat, lon
**Output**: { lat, lon, accuracy_m, confidence }
- accuracy_m propagated from: UAV position accuracy (from ESKF) + gimbal angle uncertainty + altitude uncertainty
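
Step 5 (ray-ground intersection) can be sketched as follows, assuming flat terrain at height h below the UAV and NED coordinates with Down positive. A ray that points at or above the horizon never reaches the ground, so the function returns None in that case. `intersect_ground` is a hypothetical helper name.

```python
import numpy as np

def intersect_ground(p_uav_ned, ray_ned, height_agl_m):
    """Intersect a ray from the UAV with a flat ground plane.

    p_uav_ned    : UAV position in NED (meters)
    ray_ned      : ray direction in NED (need not be normalized)
    height_agl_m : UAV height above the ground plane (meters, positive)
    """
    if ray_ned[2] <= 1e-9:
        return None                       # ray does not descend toward the ground
    t = height_agl_m / ray_ned[2]         # Down is positive, so t > 0
    return p_uav_ned + t * ray_ned
```

For a UAV 600 m above ground and a ray tilted 45° forward, the intersection lies 600 m ahead of the UAV, which shows how strongly gimbal angle errors are amplified at shallow look angles.
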
**API endpoint**: POST /objects/locate
- Request: { pixel_x, pixel_y, gimbal_pan_deg, gimbal_tilt_deg, zoom_factor }
- Response: { lat, lon, alt, accuracy_m, confidence, uav_position: {lat, lon, alt}, timestamp }
### Component: Startup, Handoff & Failsafe (UPDATED — added handoff + reboot + re-localization)
**GPS-denied handoff protocol**:
- GPS-denied system runs continuously from companion computer boot
- Reads initial position from FC (GLOBAL_POSITION_INT) — this may be real GPS or last known
- First satellite match validates the initial position
- FC receives both real GPS (if available) and GPS_INPUT; FC EKF selects best source based on accuracy
- No explicit "switch" — the GPS-denied system is a secondary GPS source
**Startup sequence** (expanded from draft05):
1. Boot Jetson → start GPS-Denied service (systemd)
2. Connect to flight controller via pymavlink on UART
3. Wait for heartbeat
4. Initialize PyCUDA context
5. Load TRT engines: litesam.engine + xfeat.engine (~1-3s each)
6. Allocate GPU I/O buffers
7. Create CUDA streams: Stream A (cuVSLAM), Stream B (satellite matching)
8. Load camera calibration + IMU calibration files
9. Read GLOBAL_POSITION_INT → set mission origin (NED reference point) → init ESKF
10. Start cuVSLAM (Inertial mode) with ADTI 20L V1 camera stream
11. Preload satellite tiles within ±2km into RAM
12. Trigger first satellite match → validate initial position
13. Begin GPS_INPUT output loop at 5-10Hz
14. System ready
**Mid-flight reboot recovery**:
1. Jetson boots (~30-60s)
2. GPS-Denied service starts, connects to FC
3. Read GLOBAL_POSITION_INT (FC's current IMU-extrapolated position)
4. Init ESKF with this position + HIGH uncertainty covariance (σ = 200m)
5. Load TRT engines (~2-6s total)
6. Start cuVSLAM (fresh, no prior map)
7. Immediate satellite matching on first camera frame
8. On satellite match success: ESKF corrected, uncertainty drops
9. Estimated total recovery: ~35-70s
10. During recovery: FC uses IMU-only dead reckoning (at 70 km/h the UAV flies ~700-1400m without position corrections)
11. **Known limitation**: recovery time is dominated by Jetson boot time
**3-consecutive-failure re-localization**:
- Trigger: VO lost + satellite match failed × 3 consecutive camera frames
- Action: send re-localization request via MAVLink STATUSTEXT or custom message
- Message content: "RELOC_REQ: last_lat={lat} last_lon={lon} uncertainty={σ}m"
- Operator response: MAVLink COMMAND_LONG with approximate lat/lon
- System: use operator position as ESKF measurement with R = diag(500², 500², 100²) meters²
- System continues satellite matching with updated search area
- While waiting: GPS_INPUT fix_type=0, IMU-only ESKF prediction continues
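
One practical detail for the STATUSTEXT option: the message's text field holds 50 characters, and the RELOC_REQ string above is typically longer than that. A minimal sketch that formats the request and splits it into 50-byte chunks is shown below; the function name and the chosen number formatting are assumptions, and the actual send (including MAVLink 2 chunk sequencing via the `id` and `chunk_seq` fields) is not shown.

```python
STATUSTEXT_MAX_LEN = 50   # size of the STATUSTEXT text field

def reloc_request_chunks(last_lat, last_lon, sigma_m):
    """Format a RELOC_REQ message and split it into STATUSTEXT-sized chunks."""
    text = (f"RELOC_REQ: last_lat={last_lat:.5f} "
            f"last_lon={last_lon:.5f} uncertainty={sigma_m:.0f}m")
    data = text.encode("ascii")
    return [data[i:i + STATUSTEXT_MAX_LEN]
            for i in range(0, len(data), STATUSTEXT_MAX_LEN)]
```

If chunking is undesirable, shortening the keys (or using a custom message, as the text already allows) would keep the request within a single STATUSTEXT.
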
### Component: Ground Station Telemetry (UPDATED — added re-localization)
MAVLink messages to ground station:
| Message | Rate | Content |
| ----------------------------- | -------- | --------------------------------------------------- |
| NAMED_VALUE_FLOAT "gps_conf" | 1Hz | Confidence score (0.0-1.0) |
| NAMED_VALUE_FLOAT "gps_drift" | 1Hz | Estimated drift from last satellite anchor (meters) |
| NAMED_VALUE_FLOAT "gps_hacc" | 1Hz | Horizontal accuracy (meters, from ESKF) |
| STATUSTEXT | On event | "RELOC_REQ: ..." for re-localization request |
| STATUSTEXT | On event | Tracking loss / recovery notifications |
### Component: Thermal Management (UNCHANGED)
Same adaptive pipeline from draft05. Active cooling required at 25W. Throttling at 80°C SoC junction.
### Component: API & Inter-System Communication (NEW — consolidated)
FastAPI (Uvicorn) running locally on Jetson for inter-process communication with other onboard systems.
| Endpoint | Method | Purpose | Auth |
| --------------------- | --------- | -------------------------------------- | ---- |
| /sessions | POST | Start GPS-denied session | JWT |
| /sessions/{id}/stream | GET (SSE) | Real-time position + confidence stream | JWT |
| /sessions/{id}/anchor | POST | Operator re-localization hint | JWT |
| /sessions/{id} | DELETE | End session | JWT |
| /objects/locate | POST | Object GPS from pixel coordinates | JWT |
| /health | GET | System health + memory + thermal | None |
**SSE event schema** (1Hz):
```json
{
"type": "position",
"timestamp": "2026-03-17T12:00:00.000Z",
"lat": 48.123456,
"lon": 37.654321,
"alt": 600.0,
"accuracy_h": 15.2,
"accuracy_v": 8.1,
"confidence": "HIGH",
"drift_from_anchor": 12.5,
"vo_status": "tracking",
"last_satellite_match_age_s": 8.3
}
```
## UAV Platform
Unchanged from draft05. See draft05 for: airframe configuration (3.5m S-2 composite, 12.5kg AUW), flight performance (3.4h endurance at 50 km/h), camera specifications (ADTI 20L V1 + 16mm, Viewpro A40 Pro), ground coverage calculations.
## Speed Optimization Techniques
Unchanged from draft05. Key points: cuVSLAM ~9ms/frame, native TRT Engine (no ONNX RT), dual CUDA streams, 5-10Hz GPS_INPUT from ESKF IMU prediction.
## Processing Time Budget
Unchanged from draft05. VO frame: ~17-22ms. Satellite matching: ≤210ms async. Well within 1430ms frame interval.
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory | Notes |
| ------------------------- | -------------- | ------------------------------------------- |
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map |
| LiteSAM TRT engine | ~50-80MB | If LiteSAM fails: EfficientLoFTR ~100-150MB |
| XFeat TRT engine | ~30-50MB | |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | |
| FastAPI (local IPC) | ~50MB | |
| ESKF + buffers | ~10MB | |
| **Total**                 | **~2.1-2.4GB** | **26-30% of 8GB**                           |
## Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
| ------------------------------------------------------- | ---------- | ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| LiteSAM MinGRU ops unsupported in TRT 10.3 | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification. Fallback: EfficientLoFTR TRT → XFeat TRT. |
| cuVSLAM fails on low-texture terrain at 0.7fps | HIGH | Frequent tracking loss | Satellite matching corrections bound drift. Re-localization pipeline handles tracking loss. IMU bridges short gaps. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails, outdated imagery | Pre-flight tile validation. Consider alternative providers (Bing, Mapbox). Robust to seasonal appearance changes via feature-based matching. |
| ESKF scale drift during long constant-velocity segments | MEDIUM | Position error exceeds 100m between satellite anchors | Satellite corrections every 7-14s re-anchor. Altitude constraint from barometer. Monitor drift rate — if >50m between corrections, increase satellite matching frequency. |
| Monocular scale ambiguity | MEDIUM | Metric scale lost during constant-velocity flight | Satellite absolute corrections provide scale. Known altitude constrains vertical scale. IMU acceleration during turns provides observability. |
| AUW exceeds AT4125 recommended range | MEDIUM | Reduced endurance, motor thermal stress | 12.5 kg vs 8-10 kg recommended. Monitor motor temps. Weight optimization. |
| ADTI mechanical shutter lifespan | MEDIUM | Replacement needed periodically | ~8,800 actuations/flight at 0.7fps. Estimated 11-57 flights before replacement. Budget as consumable. |
| Mid-flight companion computer failure | LOW | ~35-70s position gap | Reboot recovery procedure defined. FC uses IMU dead reckoning during gap. Known limitation. |
| Thermal throttling on Jetson | MEDIUM | Satellite matching latency increases | Active cooling required. Monitor SoC temp. Throttling at 80°C. Our workload ~8-15W typical — well under 25W TDP. |
| Engine incompatibility after JetPack update | MEDIUM | Must rebuild engines | Include engine rebuild in update procedure. |
| TRT engine build OOM on 8GB | LOW | Cannot build on target | Models small (6.31M, <5M). Reduce --memPoolSize if needed. |
## Testing Strategy
### Integration / Functional Tests
- **ESKF correctness**: Feed recorded IMU + synthetic VO/satellite data → verify output matches reference ESKF implementation
- **GPS_INPUT field validation**: Send GPS_INPUT to SITL ArduPilot → verify EKF accepts and uses the data correctly
- **Coordinate transform chain**: Known GPS → NED → pixel → back to GPS — verify round-trip error <0.1m
- **Disconnected segment handling**: Simulate tracking loss → verify satellite re-localization triggers → verify cuVSLAM restarts → verify ESKF position continuity
- **3-consecutive-failure**: Simulate VO + satellite failures → verify re-localization request sent → verify operator hint accepted
- **Object localization**: Known object at known GPS → verify computed GPS matches within camera accuracy
- **Mid-flight reboot**: Kill GPS-denied process → restart → verify recovery within expected time → verify position accuracy after recovery
- **TRT engine load test**: Verify engines load successfully on Jetson
- **TRT inference correctness**: Compare TRT output vs PyTorch reference (max L1 error < 0.01)
- **CUDA Stream pipelining**: Verify Stream B satellite matching does not block Stream A VO
- **ADTI sustained capture rate**: Verify 0.7fps sustained >30 min without buffer overflow
- **Confidence tier transitions**: Verify fix_type and accuracy change correctly across HIGH → MEDIUM → LOW → FAILED transitions
### Non-Functional Tests
- **End-to-end accuracy** (primary validation): Fly with real GPS recording → run GPS-denied system in parallel → compare estimated vs real positions → verify 80% within 50m, 60% within 20m
- **VO drift rate**: Measure cuVSLAM drift over 1km straight segment without satellite correction
- **Satellite matching accuracy**: Compare satellite-matched position vs real GPS at known locations
- **Processing time**: Verify end-to-end per-frame <400ms
- **Memory usage**: Monitor over 30-min session → verify <8GB, no leaks
- **Thermal**: Sustained 30-min run → verify no throttling
- **GPS_INPUT rate**: Verify consistent 5-10Hz delivery to FC
- **Tile storage**: Validate calculated storage matches actual for test mission area
- **MinGRU TRT compatibility** (day-one blocker): Clone LiteSAM → ONNX export → polygraphy → trtexec
- **Flight endurance**: Ground-test full system power draw against 267W estimate
## References
- ArduPilot GPS_RATE parameter: [https://github.com/ArduPilot/ardupilot/pull/15980](https://github.com/ArduPilot/ardupilot/pull/15980)
- MAVLink GPS_INPUT message: [https://ardupilot.org/mavproxy/docs/modules/GPSInput.html](https://ardupilot.org/mavproxy/docs/modules/GPSInput.html)
- pymavlink GPS_INPUT example: [https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py](https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py)
- ESKF reference (fixed-wing UAV): [https://github.com/ludvigls/ESKF](https://github.com/ludvigls/ESKF)
- ROS ESKF multi-sensor: [https://github.com/EliaTarasov/ESKF](https://github.com/EliaTarasov/ESKF)
- Range-VIO scale observability: [https://arxiv.org/abs/2103.15215](https://arxiv.org/abs/2103.15215)
- NaviLoc trajectory-level localization: [https://www.mdpi.com/2504-446X/10/2/97](https://www.mdpi.com/2504-446X/10/2/97)
- SatLoc-Fusion hierarchical framework: [https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f](https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f)
- Auterion GPS-denied workflow: [https://docs.auterion.com/vehicle-operation/auterion-mission-control/useful-resources/operations/gps-denied-workflow](https://docs.auterion.com/vehicle-operation/auterion-mission-control/useful-resources/operations/gps-denied-workflow)
- PX4 GNSS-denied flight: [https://docs.px4.io/main/en/advanced_config/gnss_degraded_or_denied_flight.html](https://docs.px4.io/main/en/advanced_config/gnss_degraded_or_denied_flight.html)
- ArduPilot GPS_INPUT advanced usage: [https://discuss.ardupilot.org/t/advanced-usage-of-gps-type-mav-14/99406](https://discuss.ardupilot.org/t/advanced-usage-of-gps-type-mav-14/99406)
- Google Maps Ukraine imagery: [https://newsukraine.rbc.ua/news/google-maps-has-surprise-for-satellite-imagery-1727182380.html](https://newsukraine.rbc.ua/news/google-maps-has-surprise-for-satellite-imagery-1727182380.html)
- Jetson Orin Nano Super thermal: [https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/](https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/)
- GSD matching research: [https://www.kjrs.org/journal/view.html?pn=related&uid=756&vmd=Full](https://www.kjrs.org/journal/view.html?pn=related&uid=756&vmd=Full)
- VO+satellite matching pipeline: [https://polen.itu.edu.tr/items/1fe1e872-7cea-44d8-a8de-339e4587bee6](https://polen.itu.edu.tr/items/1fe1e872-7cea-44d8-a8de-339e4587bee6)
- PyCuVSLAM docs: [https://wiki.seeedstudio.com/pycuvslam_recomputer_robotics/](https://wiki.seeedstudio.com/pycuvslam_recomputer_robotics/)
- Pixhawk 6x IMU (ICM-42688-P) datasheet: [https://invensense.tdk.com/products/motion-tracking/6-axis/icm-42688-p/](https://invensense.tdk.com/products/motion-tracking/6-axis/icm-42688-p/)
- All references from solution_draft05.md
## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Completeness assessment research: `_docs/00_research/solution_completeness_assessment/`
- Previous research: `_docs/00_research/trt_engine_migration/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (needs sync with draft05 corrections)
- Security analysis: `_docs/01_solution/security_analysis.md`
- Previous draft: `_docs/01_solution/solution_draft05.md`
@@ -1,59 +0,0 @@
# Tech Stack Evaluation
## Requirements Analysis
| Area | Requirement |
|------|-------------|
| Runtime | Jetson Orin Nano Super, Ubuntu/JetPack, CUDA/TensorRT available, 8 GB shared memory, 25 W thermal envelope. |
| Language | Python is acceptable for orchestration/prototyping; C++/TensorRT paths likely needed for hot vision loops. |
| Vision | Calibration, undistortion, homography, VPR descriptors, local matching, RANSAC verification. |
| Estimation | BASALT VIO for relative camera+IMU propagation, wrapped by project-owned safety/anchor estimator logic for covariance calibration, source labels, blackout/spoofing modes, and MAVLink output mapping. |
| Storage | Offline satellite cache, descriptor index, FDR with no raw frame retention. |
| Autopilot | ArduPilot Plane, `GPS_INPUT` through pymavlink, MAVSDK for telemetry. |
## Technology Evaluation
| Layer | Selected | Alternatives Considered | Rationale | Risk |
|-------|----------|-------------------------|-----------|------|
| OS / GPU stack | JetPack Ubuntu + CUDA + TensorRT | Plain Ubuntu without JetPack | Required for Jetson acceleration and profiling. | Thermal/performance tuning. |
| Calibration / geometry | OpenCV 4.x | Custom NumPy-only geometry | Mature APIs for calibration, undistortion, homography, RANSAC. | Version pin and calibration quality. |
| VO / estimator | BASALT + project-owned safety/anchor wrapper | OpenVINS, Kimera-VIO, custom OpenCV/ESKF, ORB-SLAM3, VINS-Fusion | BASALT is the selected production VIO candidate; wrapper owns source labels, covariance gates, degraded modes, satellite-anchor acceptance, and MAVLink semantics. | BASALT confidence/covariance must be calibrated; nadir fixed-wing replay required. |
| VPR descriptors | DINOv2-VLAD / AnyLoc-style, model-size profiled, TensorRT only after fidelity check | MixVPR, SALAD, NetVLAD, classical BoW | Strong retrieval evidence; good offline descriptor model. | Memory/latency and embedding drift if optimized incorrectly. |
| Vector search | FAISS CPU-first on Jetson ARM64 | HNSWLIB, PostgreSQL/pgvector metadata-assisted search, brute-force NumPy, custom FAISS GPU build | Mature top-K search, save/load, PQ compression; PostgreSQL stores spatial/mission metadata around descriptor files. | CPU query latency must be profiled; GPU FAISS is not default on aarch64. |
| Local matching | DISK/ALIKED + LightGlue | SuperPoint+LightGlue, SIFT/ORB, LoFTR | Exact match outputs; Apache/BSD-friendly path; adaptive speed knobs. | Jetson profiling needed. |
| Raster cache | COG + PostgreSQL/PostGIS manifest + signed JSON sidecars | PMTiles, MBTiles, loose tile folders | COG fits geospatial raster and write-new-tile workflow; PostGIS supports spatial/freshness queries. | 10 GB budget pressure and local DB availability. |
| MAVLink | MAVSDK telemetry + pymavlink `GPS_INPUT` | MAVSDK-only, MAVProxy bridge | MAVSDK does telemetry well; `GPS_INPUT` needs raw field control. | Plane SITL validation. |
| FDR | PostgreSQL event index + CBOR/binary payload segments with optional Parquet export | Raw images, plain CSV | Queryable event metadata, bounded payload segments, no raw frame retention. | Schema and local DB availability. |
| Testing | ArduPilot Plane SITL + AerialVL/VPAir + EuRoC + representative flight replay | Public datasets only | No public dataset covers all ACs. | Representative data collection required. |
## Tech Stack Summary
- **Primary implementation**: Python for orchestration, test harness, cache tooling, and MAVLink integration; C++/TensorRT for hot-path vision if profiling requires it.
- **Vision utilities**: OpenCV 4.x.
- **Estimator**: BASALT VIO plus project-owned safety/anchor wrapper and mode machine.
- **Global retrieval**: DINOv2-VLAD style descriptors with CPU-first FAISS top-K search.
- **Local matching**: DISK/ALIKED + LightGlue; SuperPoint only after license review.
- **Cache**: COG imagery, PostgreSQL/PostGIS manifest metadata, signed JSON sidecars, FAISS index files.
- **Autopilot**: MAVSDK subscriptions plus pymavlink `GPS_INPUT`.
- **Validation**: public datasets for component de-risking, Plane SITL for integration, representative flight/replay data for acceptance.
## Risk Assessment
| Risk | Impact | Mitigation |
|------|--------|------------|
| VPR/local matching exceeds Jetson latency | AC-4.1 failure | Conditional VPR, top-K caps, downsampled descriptors, CPU FAISS profiling, TensorRT only after embedding-fidelity checks. |
| Descriptor cache exceeds 10 GB | AC-8.3/storage failure | PQ/compression, multi-scale budget report, split descriptor budget if needed. |
| GPL library accidentally becomes production dependency | Licensing issue | Keep OpenVINS/ORB-SLAM3 as reference only unless legal approves; BASALT is the production VIO candidate. |
| BASALT covariance under-reports real error | False-position safety budget failure | Calibrate wrapper covariance against OpenVINS covariance, ground truth replay, and satellite anchor residuals. |
| Plane failsafe differs from Copter docs | Safety behavior mismatch | Production-parameter ArduPilot Plane SITL gate. |
| `GPS_INPUT` velocity ignore flags behave unexpectedly | EKF drift or false velocity fusion | SITL tests for velocity fields, ignore flags, and `EK3_SRC1_*` source parameters. |
| Public datasets fail to represent Ukrainian agricultural terrain | False confidence | Require representative synchronized flight/replay data before AC signoff. |
| Thermal throttling | Latency regression | Hot-soak test and throttle logging per AC-NEW-5. |
## Learning / Implementation Requirements
- Jetson profiling with CUDA/TensorRT and memory instrumentation.
- ArduPilot Plane SITL, `GPS_INPUT`, GPS spoof/failsafe parameters.
- Geospatial raster formats: COG, CRS, tile matrices, m/px metadata, manifests.
- BASALT integration, covariance calibration, and Mahalanobis rejection in the safety/anchor wrapper.
- Aerial VPR benchmark methodology and georeference recall.