mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-22 21:06:37 +00:00
# Security Analysis

## Operational Context

This system runs on a UAV operating in a **conflict zone** (eastern Ukraine). The UAV could be shot down and physically captured. GPS denial/spoofing is the premise. The Jetson Orin Nano stores satellite imagery, flight plans, and captured photos. Security must assume the worst case: **physical access by an adversary**.

## Threat Model

### Asset Inventory

| Asset | Sensitivity | Location | Notes |
|-------|------------|----------|-------|
| Captured camera imagery | HIGH | Jetson storage | Reconnaissance data — reveals what was surveyed |
| Satellite tile cache | MEDIUM | Jetson storage | Reveals operational area and areas of interest |
| Flight plan / route | HIGH | Jetson memory + storage | Reveals mission objectives and launch/landing sites |
| Computed GPS positions | HIGH | Jetson memory, SSE stream | Real-time position data of UAV and surveyed targets |
| Google Maps API key | MEDIUM | Offline prep machine only | Used pre-flight, NOT stored on Jetson |
| TensorRT model weights | LOW | Jetson storage | LiteSAM/XFeat — publicly available models |
| cuVSLAM binary | LOW | Jetson storage | NVIDIA proprietary but freely distributed |
| IMU calibration data | LOW | Jetson storage | Device-specific calibration |
| System configuration | MEDIUM | Jetson storage | API endpoints, tile paths, fusion parameters |

### Threat Actors

| Actor | Capability | Motivation | Likelihood |
|-------|-----------|------------|------------|
| **Adversary military (physical capture)** | Full physical access after UAV loss | Extract intelligence: imagery, flight plans, operational area | HIGH |
| **Electronic warfare unit** | GPS spoofing/jamming, RF jamming | Disrupt navigation, force UAV off course | HIGH (GPS denial is the premise) |
| **Network attacker (ground station link)** | Intercept/inject on UAV-to-ground comms | Steal position data, inject false commands | MEDIUM |
| **Insider / rogue operator** | Authorized access to system | Data exfiltration, mission sabotage | LOW |
| **Supply chain attacker** | Tampered satellite tiles or model weights | Feed corrupted reference data → position errors | LOW |

### Attack Vectors

| Vector | Target Asset | Actor | Impact | Likelihood |
|--------|-------------|-------|--------|------------|
| **Physical extraction of storage** | All stored data | Adversary (capture) | Full intelligence compromise | HIGH |
| **GPS spoofing** | Position estimate | EW unit | Already mitigated — system is GPS-denied by design | N/A |
| **IMU acoustic injection** | IMU data → ESKF | EW unit | Drift injection, subtle position errors | LOW |
| **Camera blinding/spoofing** | VO + satellite matching | EW unit | VO failure, incorrect satellite matches | LOW |
| **Adversarial ground patterns** | Satellite matching | Adversary | Physical patches on ground fool feature matching | VERY LOW |
| **SSE stream interception** | Position data | Network attacker | Real-time position leak | MEDIUM |
| **API command injection** | Flight session control | Network attacker | Start/stop/manipulate sessions | MEDIUM |
| **Corrupted satellite tiles** | Satellite matching | Supply chain | Systematic position errors | LOW |
| **Model weight tampering** | Matching accuracy | Supply chain | Degraded matching → higher drift | LOW |

## Per-Component Security Requirements and Controls

### 1. Data at Rest (Jetson Storage)

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Protect captured imagery from extraction after capture | CRITICAL | Full-disk encryption (LUKS) | JetPack LUKS support with `ENC_ROOTFS=1`. Use an OP-TEE Trusted Application for key management. Modify `luks-srv` to NOT auto-decrypt — require a hardware token or secure erase trigger |
| Protect satellite tiles and flight plans | HIGH | Same LUKS encryption | Included in full-disk encryption scope |
| Enable rapid secure erase on capture/crash | CRITICAL | Tamper-triggered wipe | Hardware dead-man switch: if UAV telemetry is lost for N seconds OR the accelerometer detects crash impact → trigger `cryptsetup luksErase` on all LUKS volumes. Destroys key material in <1 second — data becomes unrecoverable |
| Prevent cold-boot key extraction | HIGH | Minimize key residency in RAM | ESKF state and position history cleared from memory when the session ends. Avoid writing position logs to disk unless encrypted |
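
The dead-man-switch condition above can be sketched as a small decision function. The thresholds (`LINK_LOSS_S`, `CRASH_G`) and names are illustrative assumptions, not values from this document, and the actual `cryptsetup luksErase` invocation is left as a comment:

```python
# Sketch of the tamper-trigger decision logic: prolonged loss of ground-
# station telemetry OR a crash-level impact arms the secure erase.
# Threshold values below are assumed for illustration.
LINK_LOSS_S = 30.0   # seconds without a telemetry heartbeat
CRASH_G = 16.0       # accelerometer magnitude (g) treated as crash impact

def should_erase(now: float, last_heartbeat: float, accel_g: float) -> bool:
    """Return True when the tamper condition is met. The caller would then
    run `cryptsetup luksErase <device>` on all LUKS volumes to destroy
    the key material."""
    link_lost = (now - last_heartbeat) > LINK_LOSS_S
    crashed = accel_g > CRASH_G
    return link_lost or crashed
```

Tuning these two thresholds is exactly the "false triggers must NOT cause premature erase" concern noted in the risk table at the end of this document.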
### 2. Secure Boot

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Prevent unauthorized code execution | HIGH | NVIDIA Secure Boot with PKC fuse burning | Burn the `SecurityMode` fuse (`odm_production_mode=0x1`) on production Jetsons. Sign all boot images with the PKC key pair. Generate keys via HSM |
| Prevent firmware rollback | MEDIUM | Ratchet fuses | Configure anti-rollback fuses in the fuse configuration XML |
| Debug port lockdown | HIGH | Disable JTAG/debug after production | Burn debug-disable fuses. Irreversible — production units only |

### 3. API & Communication (FastAPI + SSE)

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Authenticate API clients | HIGH | JWT bearer token | Pre-shared secret between ground station and Jetson. Generate JWT at session start. Short expiry (flight duration). `HTTPBearer` scheme in FastAPI |
| Encrypt SSE stream | HIGH | TLS 1.3 | Uvicorn with a TLS certificate (self-signed for field use, pre-installed on the ground station). All SSE position data encrypted in transit |
| Prevent unauthorized session control | HIGH | JWT + endpoint authorization | Session start/stop/anchor endpoints require a valid JWT. Rate-limit via `slowapi` |
| Prevent replay attacks | MEDIUM | JWT `exp` + `jti` claims | Token expiry per flight. Unique token ID (`jti`) tracked to prevent reuse |
| Limit API surface | MEDIUM | Minimal endpoint exposure | Only expose: `POST /sessions`, `GET /sessions/{id}/stream` (SSE), `POST /sessions/{id}/anchor`, `DELETE /sessions/{id}`. No admin/debug endpoints in production |
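
The per-flight token scheme (pre-shared secret, `exp` expiry, `jti` replay tracking) can be sketched without any framework; a real deployment would use a JWT library behind FastAPI's `HTTPBearer`, and the secret and field names here are illustrative:

```python
import base64, hashlib, hmac, json, time, uuid

SECRET = b"pre-shared-flight-secret"   # provisioned on both sides pre-flight
_seen_jti = set()                      # jti values already accepted

def issue_token(flight_seconds: int = 3600) -> str:
    """HMAC-sign a claims blob carrying `exp` and a unique `jti`."""
    claims = {"exp": int(time.time()) + flight_seconds, "jti": uuid.uuid4().hex}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def verify_token(token: str) -> bool:
    """Reject tampered, expired, or replayed tokens."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False                   # signature mismatch: tampered
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time() or claims["jti"] in _seen_jti:
        return False                   # expired or replayed
    _seen_jti.add(claims["jti"])
    return True
```

Note that verifying the same token twice fails by design: the tracked `jti` implements the replay prevention listed in the table.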
### 4. Visual Odometry (cuVSLAM)

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Detect camera feed tampering | LOW | Sanity checks on frame consistency | If consecutive frames show implausible motion (>500m displacement at 3fps), flag as suspicious. An ESKF covariance spike triggers satellite re-localization |
| Protect against VO poisoning | LOW | Cross-validate VO with IMU | ESKF fusion inherently cross-validates: IMU/VO disagreement raises covariance and triggers satellite matching. No single sensor can silently corrupt position |
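
The frame-consistency check above reduces to a displacement-plausibility test between consecutive VO poses. The 500m threshold is from the table; the function name is illustrative:

```python
# Sketch of the VO sanity check: at ~3 fps, a displacement above 500 m
# between consecutive frames is physically implausible for this airframe
# and is flagged rather than fused.
MAX_DISPLACEMENT_M = 500.0

def plausible_motion(prev_pos, new_pos) -> bool:
    """prev_pos/new_pos: (x, y, z) in metres from consecutive VO frames."""
    dx, dy, dz = (n - p for n, p in zip(new_pos, prev_pos))
    return (dx * dx + dy * dy + dz * dz) ** 0.5 <= MAX_DISPLACEMENT_M
```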
### 5. Satellite Image Matching

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Verify tile integrity | MEDIUM | SHA-256 checksums per tile | During offline preprocessing: compute SHA-256 for each tile pair. Store checksums in a signed manifest. At runtime: verify the checksum on tile load |
| Prevent adversarial tile injection | MEDIUM | Signed tile manifest | The offline tool signs the manifest with a private key. The Jetson verifies the signature with an embedded public key before accepting the tile set |
| Detect satellite match outliers | MEDIUM | RANSAC inlier ratio threshold | If the RANSAC inlier ratio is <30%, reject the match as unreliable. The ESKF treats it as no measurement rather than a bad measurement |
| Protect against ground-based adversarial patterns | VERY LOW | Multi-tile consensus | Match against multiple overlapping tiles. Physical adversarial patches affect a local area — consensus voting across tiles detects anomalies |
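
The per-tile checksum plus authenticated manifest flow can be sketched as follows. HMAC stands in for the public-key signature described above (the real system would verify with the embedded public key, e.g. inside the OP-TEE TA); the key and field names are illustrative:

```python
import hashlib, hmac, json

# Sketch of tile integrity: SHA-256 per tile recorded in a manifest,
# manifest authenticated before any checksum is trusted.
MANIFEST_KEY = b"embedded-verification-key"   # assumed, for illustration

def build_manifest(tiles: dict) -> dict:
    """tiles: tile_id -> raw bytes. Run offline during preprocessing."""
    sums = {tid: hashlib.sha256(data).hexdigest() for tid, data in tiles.items()}
    payload = json.dumps(sums, sort_keys=True).encode()
    return {"sha256": sums,
            "sig": hmac.new(MANIFEST_KEY, payload, hashlib.sha256).hexdigest()}

def verify_tile(manifest: dict, tile_id: str, data: bytes) -> bool:
    """Run at tile-load time on the Jetson."""
    payload = json.dumps(manifest["sha256"], sort_keys=True).encode()
    good_sig = hmac.new(MANIFEST_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(manifest["sig"], good_sig):
        return False                          # manifest itself was tampered
    return manifest["sha256"].get(tile_id) == hashlib.sha256(data).hexdigest()
```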
### 6. Sensor Fusion (ESKF)

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Prevent single-sensor corruption of position | HIGH | Adaptive noise + outlier rejection | Mahalanobis distance test on each measurement. Reject updates >5σ from the predicted state. No single measurement can cause a >50m position jump |
| Detect systematic drift | MEDIUM | Satellite matching rate monitoring | If satellite matches consistently disagree with VO by >100m, flag an integrity warning to the operator |
| Protect fusion state | LOW | In-memory only, no persistence | ESKF state never written to disk. Lost on power-off — no forensic recovery |
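
The Mahalanobis gate in the first row can be sketched for a position measurement. For clarity this uses a diagonal covariance; the real ESKF would use the full innovation covariance S = H P Hᵀ + R. The 5σ threshold is from the table:

```python
import math

# Sketch of the per-measurement outlier gate: reject any update whose
# normalized innovation exceeds 5 sigma of the predicted state.
GATE_SIGMA = 5.0

def accept_measurement(pred, meas, var) -> bool:
    """pred/meas: (x, y, z) position in metres; var: per-axis variance
    (diagonal innovation covariance, an illustrative simplification)."""
    d2 = sum((m - p) ** 2 / v for p, m, v in zip(pred, meas, var))
    return math.sqrt(d2) <= GATE_SIGMA
```

A rejected measurement is simply dropped (treated as "no measurement", matching the RANSAC-rejection policy in the satellite matching section), so no single corrupted sensor reading can yank the state.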
### 7. Offline Preprocessing (Developer Machine)

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Protect Google Maps API key | MEDIUM | Environment variable, never in code | `.env` file excluded from version control. API key used only on the developer machine, never deployed to the Jetson |
| Validate downloaded tiles | LOW | Source verification | Download only from the Google Maps Tile API via HTTPS. Verify the TLS certificate chain |
| Secure tile transfer to Jetson | MEDIUM | Signed + encrypted transfer | Transfer the tile set + signed manifest via an encrypted channel (SCP/SFTP). Verify the manifest signature on the Jetson before accepting |

## Security Controls Summary

### Authentication & Authorization

- **Mechanism**: Pre-shared JWT secret between ground station and Jetson
- **Scope**: All API endpoints require a valid JWT bearer token
- **Session model**: One JWT per flight session; expires at session end
- **No user management on Jetson** — single-operator system; auth is device-to-device

### Data Protection

| State | Protection | Tool |
|-------|-----------|------|
| At rest (Jetson storage) | LUKS full-disk encryption | JetPack LUKS + OP-TEE |
| In transit (SSE stream) | TLS 1.3 | Uvicorn SSL |
| In memory (ESKF state) | No persistence, cleared on session end | Application logic |
| On capture (emergency) | Tamper-triggered LUKS key erase | Hardware dead-man switch + `cryptsetup luksErase` |

### Secure Communication

- Ground station ↔ Jetson: TLS 1.3 (self-signed cert, pre-installed)
- No internet connectivity during flight — no external attack surface
- RF link security is out of scope (handled by the UAV communication system)

### Logging & Monitoring

| What | Where | Retention |
|------|-------|-----------|
| API access logs (request count, errors) | In-memory ring buffer | Current session only, not persisted |
| Security events (auth failures, integrity warnings) | In-memory + SSE alert to operator | Current session only |
| Position history | In-memory for refinement, SSE to ground station | NOT persisted on Jetson after session end |
| Crash/tamper events | Trigger secure erase, no logging | N/A — priority is data destruction |

**Design principle**: Minimize data persistence on Jetson. The ground station is the system of record. Jetson stores only what's needed for the current flight — satellite tiles (encrypted at rest) and transient processing state (memory only).
## Protected Code Execution (OP-TEE / ARM TrustZone)

### Overview

The Jetson Orin Nano Super supports hardware-enforced protected code execution via **ARM TrustZone** and **OP-TEE v4.2.0** (included in Jetson Linux 36.3+). TrustZone partitions the processor into two isolated worlds:

- **Secure World** (TEE): Runs at ARMv8 S-EL1 (the secure OS) and S-EL0 (trusted apps). Code here cannot be read or tampered with from the normal world. OP-TEE is the secure OS.
- **Normal World**: Standard Linux (JetPack). Our Python application, cuVSLAM, and FastAPI all run here.

Trusted Applications (TAs) execute inside the secure world and are invoked by Client Applications (CAs) in the normal world via the GlobalPlatform TEE Client API.

### Architecture on Jetson Orin Nano

```
┌─────────────────────────────────────────────────┐
│  NORMAL WORLD (Linux / JetPack)                 │
│                                                 │
│  Client Application (CA)                        │
│    ↕ libteec.so (TEE Client API)                │
│    ↕ OP-TEE Linux Kernel Driver                 │
│    ↕ ARM Trusted Firmware (ATF) / Monitor       │
├─────────────────────────────────────────────────┤
│  SECURE WORLD (OP-TEE v4.2.0)                   │
│                                                 │
│  OP-TEE OS (ARMv8 S-EL1)                        │
│   ├── jetson-user-key PTA (key management)      │
│   ├── luks TA (disk encryption passphrase)      │
│   ├── hwkey-agent TA (encrypt/decrypt data)     │
│   ├── PKCS #11 TA (crypto token interface)      │
│   └── Custom TAs (our application-specific TAs) │
│                                                 │
│  Hardware: Security Engine (SE), HW RNG, Fuses  │
│  TZ-DRAM: Dedicated memory carveout             │
└─────────────────────────────────────────────────┘
```

### Key Hierarchy (Hardware-Backed)

The Jetson Orin Nano provides a hardware-rooted key hierarchy via the Security Engine (SE) and Encrypted Key Blob (EKB):

```
OEM_K1 fuse (256-bit AES, burned into hardware, cannot be read by software)
│
├── EKB_RK (EKB Root Key, derived via AES-128-ECB from OEM_K1 + FV)
│     ├── EKB_EK (encryption key for EKB content)
│     └── EKB_AK (authentication key for EKB content)
│
├── HUK (Hardware Unique Key, per-device, derived via NIST-SP-800-108)
│     └── SSK (Secure Storage Key, per-device, generated at OP-TEE boot)
│           ├── TSK (TA Storage Key, per-TA)
│           └── FEK (File Encryption Key, per-file)
│
└── LUKS passphrase (derived from disk encryption key stored in EKB)
```

Fuse keys are loaded into SE keyslots during early boot (before OP-TEE starts). Software cannot read keys from keyslots — only derive new keys through the SE. After use, keyslots should be cleared via `tegra_se_clear_aes_keyslots()`.
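
The NIST-SP-800-108 derivation named in the hierarchy above is a counter-mode KDF. The sketch below shows the construction with HMAC-SHA256 for illustration only; the real derivation runs inside the SE/OP-TEE, and the exact label/context encoding is device- and firmware-specific:

```python
import hashlib, hmac

# Sketch of NIST SP 800-108 KDF in counter mode (HMAC-SHA256 PRF):
# out = HMAC(key, counter || label || 0x00 || context || output_bits)
# concatenated until out_len bytes are produced.
def kdf_ctr(key: bytes, label: bytes, context: bytes, out_len: int = 32) -> bytes:
    out = b""
    counter = 1
    while len(out) < out_len:
        msg = (counter.to_bytes(4, "big") + label + b"\x00" + context
               + (out_len * 8).to_bytes(4, "big"))
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return out[:out_len]
```

Distinct labels (e.g. per-TA) yield independent keys from the same parent, which is how one HUK fans out into SSK/TSK/FEK without any derived key revealing another.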
### What to Run in the Secure World (Our Use Cases)

| Use Case | TA Type | Purpose |
|----------|---------|---------|
| **LUKS disk encryption** | Built-in `luks` TA | Generate a one-time passphrase at boot to unlock the encrypted rootfs. Keys never leave the secure world |
| **Tile manifest verification** | Custom User TA | Verify SHA-256 signatures of satellite tile manifests. Signing key stored in EKB, accessible only in the secure world |
| **JWT secret storage** | Custom User TA or `hwkey-agent` TA | Store the JWT signing secret in EKB. Sign/verify JWTs inside the secure world — the secret is never exposed to Linux |
| **Secure erase trigger** | Custom User TA | Receive the tamper signal → invoke `cryptsetup luksErase` via the CA. Key-erase logic runs in the secure world to prevent normal-world interference |
| **TLS private key protection** | PKCS #11 TA | Store the TLS private key in OP-TEE secure storage. Uvicorn uses the PKCS #11 interface to perform the TLS handshake without the key leaving the secure world |

### How to Enable Protected Code Execution

#### Step 1: Burn OEM_K1 Fuse (One-Time, Irreversible)

```bash
# Generate a 256-bit OEM_K1 key (use an HSM in production)
openssl rand -hex 32 > oem_k1_key.txt

# Create the fuse configuration XML with OEM_K1
# Burn the fuse via odmfuse.sh (IRREVERSIBLE)
sudo ./odmfuse.sh -i <fuse_config.xml> <board_name>
```
After burning `SecurityMode` fuse (`odm_production_mode=0x1`), all further fuse writes are blocked. OEM_K1 becomes permanently embedded in hardware.

#### Step 2: Generate and Flash EKB

```bash
# Generate user keys (disk encryption key, JWT secret, tile signing key)
openssl rand -hex 16 > disk_enc_key.txt
openssl rand -hex 32 > jwt_secret.txt
openssl rand -hex 32 > tile_signing_key.txt

# Generate the EKB binary
python3 gen_ekb.py -chip t234 \
    -oem_k1_key oem_k1_key.txt \
    -in_sym_key uefi_enc_key.txt \
    -in_sym_key2 disk_enc_key.txt \
    -in_auth_key uefi_var_auth_key.txt \
    -out eks_t234.img

# Flash the EKB to the EKS partition
# (part of the normal flash process with secure boot enabled)
```
#### Step 3: Enable LUKS Disk Encryption

```bash
# During flash, set ENC_ROOTFS=1 to encrypt the rootfs
export ENC_ROOTFS=1
sudo ./flash.sh <board_name> <storage_device>
```
The `luks` TA in OP-TEE derives a passphrase from the disk encryption key in EKB at boot. The passphrase is generated inside the secure world and passed to `cryptsetup` — it never exists in persistent storage.
#### Step 4: Develop Custom Trusted Applications
Cross-compile TAs for aarch64 using the Jetson OP-TEE source package:

```bash
# Build a custom TA (e.g., the tile manifest verifier)
make -C <ta_source_dir> \
    CROSS_COMPILE="<toolchain>/bin/aarch64-buildroot-linux-gnu-" \
    TA_DEV_KIT_DIR="<optee_src>/optee/build/t234/export-ta_arm64/" \
    OPTEE_CLIENT_EXPORT="<optee_src>/optee/install/t234/usr" \
    TEEC_EXPORT="<optee_src>/optee/install/t234/usr" \
    -j"$(nproc)"

# Deploy: copy the TA to /lib/optee_armtz/ on the Jetson
# Deploy: copy the CA to /usr/sbin/ on the Jetson
```
TAs conform to the GlobalPlatform TEE Internal Core API. Use the `hello_world` example from `optee_examples` as a starting template.
#### Step 5: Enable Secure Boot

```bash
# Generate the PKC key pair (use an HSM for production)
openssl genrsa -out pkc_key.pem 3072

# Sign and flash the secured images
sudo ./flash.sh --sign pkc_key.pem <board_name> <storage_device>

# After verification, burn the SecurityMode fuse (IRREVERSIBLE)
```

### Available Crypto Services in Secure World

| Service | Provider | Notes |
|---------|----------|-------|
| AES-128/256 encryption/decryption | SE hardware | Via keyslot-derived keys; keys never leave the SE |
| Key derivation (NIST-SP-800-108) | `jetson-user-key` PTA | Derive purpose-specific keys from EKB keys |
| Hardware RNG | SE hardware | `TEE_GenerateRandom()` or a PTA command |
| PKCS #11 crypto tokens | PKCS #11 TA | Standard crypto interface for TLS, signing |
| SHA-256, HMAC | MbedTLS (bundled in optee_os) | Software crypto in the secure world |
| RSA/ECC signing | GlobalPlatform TEE Crypto API | For manifest signature verification |

### Limitations on Orin Nano

| Limitation | Impact | Workaround |
|-----------|--------|------------|
| No RPMB support (only AGX Orin has RPMB) | Secure storage uses the REE FS instead of replay-protected memory | Acceptable — LUKS encryption protects data at rest, and REE FS secure storage is encrypted by the SSK |
| EKB can only be updated via OTA, not at runtime | Cannot rotate keys in flight | Pre-provision per-device unique keys at manufacturing time |
| OP-TEE hello_world reported issues on some Orin Nano units | Some users report initialization failures | Use JetPack 6.2.2+, which includes fixes. Test thoroughly on the target hardware |
| TZ-DRAM is a fixed carveout | Limits secure world memory | Keep TAs lightweight — only crypto operations and key management, not data processing |

### Security Hardening Checklist

- [ ] Burn the OEM_K1 fuse with a unique per-device key (via HSM)
- [ ] Generate and flash the EKB with the disk encryption key, JWT secret, and tile signing key
- [ ] Enable LUKS full-disk encryption (`ENC_ROOTFS=1`)
- [ ] Modify `luks-srv` to NOT auto-decrypt (require an explicit trigger or the dead-man switch)
- [ ] Burn the SecurityMode fuse (`odm_production_mode=0x1`) — enables the secure boot chain
- [ ] Burn the debug-disable fuses — disables JTAG
- [ ] Configure the anti-rollback ratchet fuses
- [ ] Clear SE keyslots after EKB extraction via `tegra_se_clear_aes_keyslots()`
- [ ] Deploy custom TAs for JWT signing and tile manifest verification
- [ ] Use PKCS #11 for TLS private key protection
- [ ] Test the secure erase trigger end-to-end
- [ ] Run `xtest` (the OP-TEE test suite) on a production Jetson to validate the TEE

## Key Security Risks

| Risk | Severity | Mitigation Status |
|------|---------|-------------------|
| Physical capture → data extraction | CRITICAL | Mitigated by LUKS + secure erase. Residual risk: attacker extracts RAM before the erase triggers |
| LUKS auto-decrypt not yet disabled | HIGH | Requires a custom `luks-srv` modification — development effort needed |
| Self-signed TLS certificates | MEDIUM | Acceptable for field deployment. Certificate pinning on the ground station prevents MITM |
| cuVSLAM is closed-source | LOW | Cannot audit for vulnerabilities. Mitigated by sandboxed execution and input validation on camera frames |
| Dead-man switch reliability | HIGH | Hardware integration required. False triggers (temporary signal loss) must NOT cause premature erase. Needs careful threshold tuning |

## References

- xT-STRIDE threat model for UAVs (2025): https://link.springer.com/article/10.1007/s10207-025-01082-4
- NVIDIA Jetson OP-TEE Documentation (r36.4.4): https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/OpTee.html
- NVIDIA Jetson Security Overview (r36.4.3): https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security.html
- NVIDIA Jetson LUKS Disk Encryption: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/DiskEncryption.html
- NVIDIA Jetson Secure Boot: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/SecureBoot.html
- NVIDIA Jetson Secure Storage: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/SecureStorage.html
- NVIDIA Jetson Firmware TPM: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/FirmwareTPM.html
- NVIDIA Jetson Rollback Protection: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/RollbackProtection.html
- Jetson Orin Fuse Specification: https://developer.nvidia.com/downloads/jetson-agx-orin-series-fuse-specification
- OP-TEE Official Documentation: https://optee.readthedocs.io/en/latest/
- OP-TEE Trusted Application Examples: https://github.com/linaro-swg/optee_examples
- RidgeRun OP-TEE on Jetson Guide: https://developer.ridgerun.com/wiki/index.php/RidgeRun_Platform_Security_Manual/Getting_Started/TEE/NVIDA-Jetson
- GlobalPlatform TEE Specifications: https://globalplatform.org/specs-library/?filter-committee=tee
- Model Agnostic Defense against Adversarial Patches on UAVs (2024): https://arxiv.org/html/2405.19179v1
- FastAPI Security Best Practices (2026): https://fastlaunchapi.dev/blog/fastapi-best-practices-production-2026

# Solution Draft

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

**Hard constraint**: the camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.

**Satellite matching strategy**: benchmark LiteSAM on actual Orin Nano Super hardware as a day-one priority. If LiteSAM cannot achieve ≤400ms at 480px resolution, **abandon it entirely** and use XFeat semi-dense matching as the primary satellite matcher. Speed is non-negotiable.

**Core architectural principles**:

1. **cuVSLAM handles VO** — NVIDIA's CUDA-accelerated library achieves 90fps on Jetson Orin Nano, giving VO essentially "for free" (~11ms/frame).
2. **Keyframe-based satellite matching** — the satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — this handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.

```
┌───────────────────────────────────────────────────────────────┐
│                   OFFLINE (Before Flight)                     │
│  Satellite Tiles → Download & Crop → Store as tile pairs      │
│  (Google Maps)    (per flight plan)  (disk, GeoHash indexed)  │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                   ONLINE (During Flight)                      │
│                                                               │
│  EVERY FRAME (400ms budget):                                  │
│  ┌───────────────────────────────┐                            │
│  │ Camera → Downsample (CUDA 2ms)│                            │
│  │       → cuVSLAM VO+IMU (~11ms)│──→ ESKF Update → SSE Emit  │
│  └───────────────────────────────┘          ↑                 │
│                                             │                 │
│  KEYFRAMES ONLY (every 3-10 frames):        │                 │
│  ┌────────────────────────────────────┐     │                 │
│  │ Satellite match (async CUDA stream)│─────┘                 │
│  │ LiteSAM or XFeat (see benchmark)   │                       │
│  │ (does NOT block VO output)         │                       │
│  └────────────────────────────────────┘                       │
│                                                               │
│  IMU: 100+Hz continuous → ESKF prediction                     │
└───────────────────────────────────────────────────────────────┘
```
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~11ms/frame)
NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 90fps on Jetson Orin Nano. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs. This eliminates custom VO entirely.
### 2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes**, selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when the ESKF covariance exceeds a threshold
  - VO failure: when cuVSLAM reports tracking loss (e.g., a sharp turn)
### 3. Satellite Matcher Selection (Benchmark-Driven)

**Candidate A: LiteSAM (opt)** — best accuracy for satellite-aerial matching (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne + TAIFormer + MinGRU. Benchmarked at 497ms on Jetson AGX Orin at 1184px. AGX Orin is 3-4x more powerful than the Orin Nano Super (275 TOPS vs 67 TOPS, $2000+ vs $249).

Realistic Orin Nano Super estimates:

- At 1184px: ~1.5-2.0s (unusable)
- At 640px: ~500-800ms (borderline)
- At 480px: ~300-500ms (best case)

**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. Not specifically designed for cross-view satellite-aerial matching, but fast and reliable.

**Decision rule**: Benchmark LiteSAM TensorRT FP16 at 480px on Orin Nano Super. If ≤400ms → use LiteSAM. If >400ms → **abandon LiteSAM, use XFeat as primary**. No hybrid compromises — pick one and optimize it.
### 4. TensorRT FP16 Optimization
LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. INT8 is possible for MobileOne backbone but ViT/transformer components may degrade with INT8.
### 5. CUDA Stream Pipelining
Overlap operations across consecutive frames:
- Stream A: cuVSLAM VO for the current frame (~11ms) + ESKF fusion (~1ms)
- Stream B: satellite matching for the previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic

### 6. Pre-cropped Satellite Tiles
Offline: for each satellite tile, store both the raw image and a pre-resized version matching the satellite matcher's input resolution. Runtime avoids resize cost.
## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (90fps) | VO only, no satellite matching |

**Key insight**: Combine cuVSLAM (best-in-class VO for Jetson) with the fastest viable satellite-aerial matcher via ESKF fusion. LiteSAM is the accuracy leader but unproven on Orin Nano Super — benchmark first, abandon for XFeat if too slow.
## Architecture
### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM / C++ API | 90fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — purpose-built by NVIDIA for Jetson. ~11ms/frame leaves 389ms for everything else. Auto-fallback to IMU when visual tracking fails.
### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| LiteSAM (opt) | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | 497ms on AGX Orin at 1184px; AGX Orin is 3-4x more powerful than Orin Nano Super | ✅ If benchmark passes |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven | Not designed for cross-view satellite-aerial | ✅ If LiteSAM fails benchmark |
| EfficientLoFTR | TensorRT | Good accuracy, semi-dense | 15.05M params (2.4x LiteSAM), slower | ⚠️ Heavier |
| SuperPoint + LightGlue | TensorRT C++ | Good general matching | Sparse only, worse on satellite-aerial | ⚠️ Not specialized |

**Selection**: Benchmark-driven. Day-one test on Orin Nano Super:
|
||||
1. Export LiteSAM (opt) to TensorRT FP16
|
||||
2. Measure at 480px, 640px, 800px
|
||||
3. If ≤400ms at 480px → **LiteSAM**
|
||||
4. If >400ms at any viable resolution → **XFeat semi-dense** (primary, no hybrid)
### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ❌ Too heavy |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.

Measurement sources and rates:

- IMU prediction: 100+Hz
- cuVSLAM VO update: ~3Hz (every frame)
- Satellite update: ~0.3-1Hz (keyframes only, delayed via async pipeline)
### Component: Satellite Tile Preprocessing (Offline)

**Selected**: **GeoHash-indexed tile pairs on disk**.

Pipeline:

1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
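To make step 4 concrete, here is a minimal sketch of a GeoHash-keyed tile path. The `tile_path` layout (prefix directory + 6-character hash) is an illustrative assumption, not the project's actual on-disk format; the `geohash` function itself is the standard bit-interleaved base32 encoding.

```python
# Standard geohash alphabet (base32 without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Standard geohash: interleave lon/lat range-halving bits, base32-encode."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    code, bits, ch, even = [], 0, 0, True
    while len(code) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val > mid:
            ch = ch * 2 + 1   # upper half -> bit 1
            rng[0] = mid
        else:
            ch = ch * 2       # lower half -> bit 0
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:         # 5 bits per base32 character
            code.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(code)

def tile_path(lat: float, lon: float, zoom: int) -> str:
    """Directory key for a tile pair; the geohash prefix groups nearby tiles."""
    gh = geohash(lat, lon, 6)
    return f"tiles/{gh[:3]}/{gh}/z{zoom}"
```

Nearby tiles share a hash prefix, so a radius lookup reduces to listing a handful of prefix directories.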
### Component: Re-localization (Disconnected Segments)

**Selected**: **Keyframe satellite matching is always active + expanded search on VO failure**.

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Immediately flag next frame as keyframe → trigger satellite matching
2. Expand tile search radius (from ±200m to ±1km based on IMU dead-reckoning uncertainty)
3. If match found: position recovered, new segment begins
4. If 3+ consecutive keyframe failures: request user input via API
### Component: Object Center Coordinates

Geometric calculation once frame-center GPS is known:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
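The four steps above can be sketched as follows. The axis conventions (x right, y up in the image; yaw measured clockwise from north) and the equirectangular metres-per-degree approximation are assumptions for illustration; the real implementation must match the camera and IMU frame definitions.

```python
import math

def pixel_to_gps(center_lat: float, center_lon: float,
                 dx_px: float, dy_px: float,
                 gsd_m: float, yaw_rad: float) -> tuple[float, float]:
    """Convert a pixel offset from the frame center to lat/lon.

    dx_px: +right, dy_px: +up in the image (assumed convention).
    yaw_rad: heading, 0 = north, clockwise positive (assumed convention).
    gsd_m: ground sample distance in metres per pixel.
    """
    dx_m = dx_px * gsd_m                      # step 2: pixels -> metres
    dy_m = dy_px * gsd_m
    # step 3: rotate camera-frame offset into east/north by the UAV heading
    east = dx_m * math.cos(yaw_rad) + dy_m * math.sin(yaw_rad)
    north = -dx_m * math.sin(yaw_rad) + dy_m * math.cos(yaw_rad)
    # step 4: metres -> degrees (equirectangular approximation, fine at <1km)
    dlat = north / 111_320.0
    dlon = east / (111_320.0 * math.cos(math.radians(center_lat)))
    return center_lat + dlat, center_lon + dlon
```

With yaw = 0 the rotation is the identity, so a purely vertical pixel offset moves the point due north.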
### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~11ms | NVIDIA CUDA-optimized, 90fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~25ms** | Well within 400ms |
### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.

**Path A — LiteSAM (if benchmark passes)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to ~480px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~5ms | Pre-resized, from storage |
| LiteSAM (opt) matching | ~300-500ms | TensorRT FP16, 480px, Orin Nano Super estimate |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **~310-510ms** | Async, does not block VO |
**Path B — XFeat (if LiteSAM abandoned)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |
### Per-Frame Wall-Clock Latency

Every frame:

- **VO result emitted in ~25ms** (cuVSLAM + ESKF + SSE)
- Satellite correction arrives asynchronously on keyframes
- Client gets an immediate position, then a refined position when the satellite match completes
## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-300MB | NVIDIA CUDA library + internal state |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| Satellite tile (pre-resized) | ~1MB | Single active tile |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~1.9-2.4GB** | ~25-30% of 8GB — comfortable margin |
## Confidence Scoring

| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| LiteSAM too slow on Orin Nano Super | HIGH | Misses 400ms deadline | **Abandon LiteSAM, use XFeat**. Day-one benchmark is the go/no-go gate |
| cuVSLAM not supporting nadir-only camera well | MEDIUM | VO accuracy degrades | Fall back to XFeat frame-to-frame matching |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| XFeat cross-view accuracy insufficient | MEDIUM | Position corrections less accurate than LiteSAM | Increase keyframe frequency; multi-tile consensus voting; geometric verification with strict RANSAC |
| cuVSLAM is closed-source | LOW | Hard to debug | Fallback to XFeat VO; cuVSLAM has Python+C++ APIs |
## Testing Strategy

### Integration / Functional Tests

- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API

### Non-Functional Tests

- **Day-one benchmark**: LiteSAM TensorRT FP16 at 480/640/800px on Orin Nano Super → go/no-go for LiteSAM
- cuVSLAM benchmark: verify 90fps monocular+IMU on Orin Nano Super
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 1000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
## References

- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
@@ -0,0 +1,356 @@
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| LiteSAM at 480px as satellite matcher | **Performance**: 497ms on AGX Orin at 1184px. Orin Nano Super is ~3-4x slower. At 480px estimated ~270-360ms — borderline. Paper uses PyTorch AMP, not TensorRT FP16. TensorRT could bring 2-3x improvement. | Add TensorRT FP16 as mandatory optimization step. Revised estimate at 480px with TensorRT: ~90-180ms. Still benchmark-driven: abandon if >400ms. |
| XFeat as LiteSAM fallback for satellite matching | **Functional**: XFeat is a general-purpose feature matcher, NOT designed for the cross-view satellite-aerial gap. May fail on season/lighting differences between UAV and satellite imagery. | **Expand fallback options**: benchmark EfficientLoFTR (designed for weak-texture aerial) alongside XFeat. Consider STHN-style deep homography as a third option. See detailed satellite matcher comparison below. |
| SP+LG considered as "sparse only, worse on satellite-aerial" | **Functional**: LiteSAM paper confirms "SP+LG achieves fastest inference speed but at the expense of accuracy." Sparse matching fails on texture-scarce regions. ~180-360ms on Orin Nano Super. | **Reject SP+LG** for both VO and satellite matching. cuVSLAM is 15-33x faster for VO. |
| cuVSLAM on low-texture terrain | **Functional**: cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade tracking. On uniform agricultural fields/water bodies, features will be sparse → frequent tracking loss. IMU fallback lasts only ~1s. No published benchmarks for nadir agricultural terrain. Does NOT guarantee pose recovery after tracking loss. | **CRITICAL RISK**: cuVSLAM will likely fail frequently over low-texture terrain. Mitigation: (1) increase satellite matching frequency in low-texture areas, (2) use IMU dead-reckoning bridge, (3) accept higher drift in featureless segments, (4) XFeat VO as secondary fallback may also struggle on the same terrain. |
| cuVSLAM memory estimate ~200-300MB | **Performance**: Map grows over time. For 3000-frame flights (~16min at 3fps), map could reach 500MB-1GB without pruning. | Configure cuVSLAM map pruning. Set max keyframes. Monitor memory. |
| Tile search on VO failure: "expand to ±1km" | **Functional**: Underspecified. Loading 10-20 tiles from disk is slow. | Preload tiles within ±2km of flight plan into RAM. Ranked search by IMU dead-reckoning position. |
| LiteSAM resolution | **Performance**: Paper benchmarked at 1184px on AGX Orin (497ms AMP). TensorRT FP16 with reparameterized MobileOne expected 2-3x faster. | Benchmark LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM at 1280px. If >200ms → use XFeat. |
| SP+LG proposed for VO by user | **Performance**: ~130-280ms/frame on Orin Nano. cuVSLAM ~8.6ms/frame. No IMU, no loop closure. | **Reject SP+LG for VO.** cuVSLAM is 15-33x faster. XFeat frame-to-frame remains the fallback. |
## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

**Hard constraint**: The camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.

**Satellite matching strategy**: Benchmark LiteSAM TensorRT FP16 at **1280px** on Orin Nano Super as a day-one priority. The paper's AGX Orin benchmark used PyTorch AMP — TensorRT FP16 with reparameterized MobileOne should yield a 2-3x additional speedup. **Decision rule: if LiteSAM TRT FP16 at 1280px runs in ≤200ms → use LiteSAM. If >200ms → use XFeat.**

**Core architectural principles**:

1. **cuVSLAM handles VO** — 116fps on Orin Nano 8GB, ~8.6ms/frame. SuperPoint+LightGlue was evaluated and rejected (15-33x slower, no IMU integration).
2. **Keyframe-based satellite matching** — the satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
5. **Proactive tile loading** — preload tiles within ±2km of the flight plan into RAM for fast lookup during expanded search.
```
┌─────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight)                                         │
│  Satellite Tiles → Download & Crop → Store as tile pairs        │
│  (Google Maps)     (per flight plan) (disk, GeoHash indexed)    │
└─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight)                                          │
│                                                                 │
│  EVERY FRAME (400ms budget):                                    │
│  ┌────────────────────────────────┐                             │
│  │ Camera → Downsample (CUDA 2ms) │                             │
│  │ → cuVSLAM VO+IMU (~9ms)        │──→ ESKF Update → SSE Emit   │
│  └────────────────────────────────┘             ↑               │
│                                                 │               │
│  KEYFRAMES ONLY (every 3-10 frames):            │               │
│  ┌────────────────────────────────────┐         │               │
│  │ Satellite match (async CUDA stream)│─────────┘               │
│  │ LiteSAM TRT FP16 or XFeat          │                         │
│  │ (does NOT block VO output)         │                         │
│  └────────────────────────────────────┘                         │
│                                                                 │
│  IMU: 100+Hz continuous → ESKF prediction                       │
│  TILES: ±2km preloaded in RAM from flight plan                  │
└─────────────────────────────────────────────────────────────────┘
```
## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs.

**Why not SuperPoint+LightGlue for VO**: SP+LG is 15-33x slower (~130-280ms vs ~9ms) and lacks IMU integration, loop closure, and auto-fallback.

**CRITICAL: cuVSLAM on low-texture terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corner detection + Lucas-Kanade optical flow tracking (classical features, not learned). On uniform agricultural terrain or water bodies:

- Very few corners will be detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1 second) → constant-velocity integrator (~0.5s more)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
- All published benchmarks (KITTI: urban/suburban, EuRoC: indoor) exclude nadir agricultural terrain
- Multi-stereo mode helps with featureless surfaces, but we have a mono camera only

**Mitigation strategy for low-texture terrain**:

1. **Increase satellite matching frequency**: In low-texture areas (detected by cuVSLAM's keypoint count dropping), switch from every 3-10 frames to every frame
2. **IMU dead-reckoning bridge**: When cuVSLAM reports tracking loss, the ESKF continues with IMU prediction. At 3fps, a ~1.5s IMU bridge covers ~4-5 frames
3. **Accept higher drift**: In featureless segments, position accuracy degrades to IMU-only level (50-100m+ over ~10s). Satellite matching must recover absolute position when texture returns
4. **Keypoint density monitoring**: Track cuVSLAM's number of tracked features per frame. When below a threshold (e.g., <50), proactively trigger satellite matching
5. **XFeat frame-to-frame as VO fallback**: XFeat's learned features may detect texture invisible to Shi-Tomasi corners, though XFeat may also struggle on truly uniform terrain
### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching. Strategy:

- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when ESKF covariance exceeds threshold
  - VO failure: when cuVSLAM reports tracking loss (sharp turn)
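The three triggers above compose into one predicate. A minimal sketch, with placeholder threshold values that would have to be tuned on real flight data:

```python
def is_keyframe(frame_idx: int, last_kf_idx: int, tracking_lost: bool,
                cov_trace: float, interval: int = 5,
                cov_threshold: float = 25.0) -> bool:
    """Keyframe trigger combining the three criteria above.

    cov_trace: trace of the ESKF position covariance (m^2).
    interval and cov_threshold are illustrative, not tuned values.
    """
    if tracking_lost:                 # VO failure -> immediate keyframe
        return True
    if cov_trace > cov_threshold:     # confidence drop
        return True
    return frame_idx - last_kf_idx >= interval   # fixed interval
```

Ordering matters only for readability: any single satisfied criterion forces a keyframe.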
### 3. Satellite Matcher Selection (Benchmark-Driven)

**Important context**: Our UAV-to-satellite matching is EASIER than typical cross-view geo-localization problems. Both the UAV camera and the satellite imagery are approximately nadir (top-down). The main challenges are season/lighting differences, resolution mismatch, and temporal changes — not the extreme viewpoint gap seen in ground-to-satellite matching. Even general-purpose matchers may therefore perform well.

**Candidate A: LiteSAM (opt) with TensorRT FP16 at 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne reparameterizable for TensorRT. Paper benchmarked at 497ms on AGX Orin using AMP at 1184px. TensorRT FP16 with reparameterized MobileOne is expected to be 2-3x faster than AMP. At 1280px (close to the paper's 1184px benchmark resolution), accuracy should match published results.

Orin Nano Super TensorRT FP16 estimate at 1280px:

- AGX Orin AMP @ 1184px: 497ms
- TRT FP16 speedup over AMP: ~2-3x → AGX Orin TRT estimate: ~165-250ms
- Orin Nano Super is ~3-4x slower → ~1500-2000ms without TRT, ~500-1000ms with TRT FP16
- Best case, if the reparameterization and FP16 gains fully compound: ~165-330ms
- Go/no-go threshold: **≤200ms**
**Candidate B (fallback): XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. General-purpose, not designed for the cross-view gap, but the fastest option. Since our cross-view gap is small (both nadir), XFeat may work adequately for this specific use case.

**Other evaluated options (not selected)**:

- **EfficientLoFTR**: Semi-dense, 15.05M params, handles weak texture well. ~20% slower than LiteSAM. Strong option if the LiteSAM codebase proves difficult to export to TRT, but a larger model footprint.
- **Deep Homography (STHN-style)**: End-to-end homography estimation, no feature/RANSAC pipeline. 4.24m at 50m range. Interesting future option but needs RGB retraining — higher implementation risk.
- **PFED and retrieval-based methods**: Image RETRIEVAL only (identifies which tile matches), not pixel-level matching. We already know which tile to use from the ESKF position.
- **SuperPoint+LightGlue**: Sparse matcher. LiteSAM paper confirms worse satellite-aerial accuracy. Slower than XFeat.

**Decision rule** (day-one on Orin Nano Super):

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → use LiteSAM at 1280px**
4. **If >200ms → use XFeat**
### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — the multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. **Do NOT use INT8 on transformer components** (TAIFormer) — accuracy degrades. INT8 is safe only for the MobileOne backbone CNN layers.
### 5. CUDA Stream Pipelining

Overlap operations across consecutive frames:

- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
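The overlap pattern can be illustrated with a toy two-stage pipeline. Here a worker thread stands in for the second CUDA stream; the function names (`vo_fn`, `sat_fn`, `is_kf`) are placeholders, not project APIs. The point is structural: per-frame VO results are emitted immediately, while keyframe satellite work is queued and never blocks them.

```python
import queue
import threading

def run_pipeline(frames, vo_fn, sat_fn, is_kf):
    """Toy pipeline: VO runs inline per frame; satellite matching for
    keyframes runs on a worker thread (stand-in for a second CUDA stream)."""
    sat_q: queue.Queue = queue.Queue()
    results = []

    def sat_worker():
        while True:
            item = sat_q.get()
            if item is None:          # sentinel: no more keyframes
                return
            results.append(("sat", sat_fn(item)))

    t = threading.Thread(target=sat_worker)
    t.start()
    for i, frame in enumerate(frames):
        results.append(("vo", vo_fn(frame)))   # emitted every frame
        if is_kf(i):
            sat_q.put(frame)                   # async correction, non-blocking
    sat_q.put(None)
    t.join()
    return results
```

Satellite results interleave nondeterministically with VO results, which is exactly the "delayed measurement" the ESKF must handle.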
### 6. Proactive Tile Loading

**Change from draft01**: Instead of loading tiles on demand from disk, preload tiles within ±2km of the flight plan into RAM at session start. This eliminates disk I/O latency during flight. For a 50km flight path, ~2000 tiles at zoom 19 ≈ ~200MB RAM — well within budget.

On VO failure / expanded search:

1. Compute IMU dead-reckoning position
2. Rank preloaded tiles by distance to predicted position
3. Try top 3 tiles (not all tiles in ±1km radius)
4. If no match in top 3, expand to next 3
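Steps 2-4 amount to a batched nearest-first search. A minimal sketch, assuming tiles are held as a `tile_id -> (lat, lon)` centre map (an illustrative structure, not the project's actual tile index):

```python
import math

def ranked_tiles(tiles: dict, pred_lat: float, pred_lon: float, batch: int = 3):
    """Yield preloaded tile ids in batches, nearest-first to the
    IMU dead-reckoned position (pred_lat, pred_lon)."""
    def dist_m(center):
        # Equirectangular approximation: adequate for ranking within a few km.
        dlat = (center[0] - pred_lat) * 111_320.0
        dlon = (center[1] - pred_lon) * 111_320.0 * math.cos(math.radians(pred_lat))
        return math.hypot(dlat, dlon)

    order = sorted(tiles, key=lambda t: dist_m(tiles[t]))
    for i in range(0, len(order), batch):
        yield order[i:i + batch]   # try one batch; expand only on failure
```

The matcher tries each batch in turn and stops at the first successful geometric verification, so the common case touches only three tiles.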
## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| VRLM (2024) | FocalNet backbone + multi-scale feature fusion | 83.35% MA@20 | Desktop | Not edge-optimized |
| Scale-Aware UAV-to-Satellite (2026) | Semantic geometric + metric scale recovery | N/A | Desktop | Addresses scale ambiguity problem |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection, semi-dense | Competitive with LiteSAM | 2.5x faster than LoFTR, TRT available | 15.05M params, heavier than LiteSAM |
| PFED (2025) | Knowledge distillation + multi-view refinement, retrieval | 97.15% Recall@1 (University-1652) | AGX Orin (251.5 FPS) | Retrieval only, not pixel-level matching |
| STHN (IEEE RA-L 2024) | Deep homography estimation, coarse-to-fine | 4.24m at 50m range | Open-source, lightweight | Trained on thermal, needs RGB retraining |
| Hierarchical AVL (2025) | DINOv2 retrieval + SuperPoint matching | 64.5-95% success rate | ROS, IMU integration | Two-stage complexity |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Designed for Mars/planetary, needs adaptation |
## Architecture

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ~30-50ms/frame | ⚠️ Fallback |
| SuperPoint+LightGlue | LightGlue-ONNX TRT | Good accuracy, adaptive pruning | ~130-280ms, no IMU, no loop closure | ~130-280ms/frame | ❌ Rejected |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ~33ms/frame | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built by NVIDIA for Jetson. Auto-fallback to IMU when visual tracking fails.

**SP+LG rejection rationale**: 15-33x slower than cuVSLAM. No built-in IMU fusion, loop closure, or tracking failure detection. Building these features around SP+LG would take significant development time and still be slower. XFeat at ~30-50ms is a better fallback for VO if cuVSLAM fails on the nadir camera.
### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | Untested on Orin Nano Super with TensorRT | Est. ~165-330ms @ 1280px TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven, fastest | General-purpose, not designed for cross-view. Our nadir-nadir gap is small → may work. | ~50-100ms | ✅ Fallback if LiteSAM >200ms |

**Selection**: Day-one benchmark on Orin Nano Super:

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → LiteSAM at 1280px**
4. **If >200ms → XFeat**
### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ~10-50ms/step | ❌ Too heavy |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
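The multi-rate pattern (high-rate IMU prediction, low-rate absolute position updates) is the essential mechanic. A deliberately reduced 2-state sketch, NOT the 16-state filter above — just enough to show the predict/update split that the real ESKF performs:

```python
import numpy as np

class ToyFilter:
    """2-state (position, velocity) Kalman filter illustrating the
    multi-rate pattern: 100+Hz IMU predicts, ~0.3-1Hz satellite updates.
    The production filter carries the full 16-state error-state vector."""

    def __init__(self):
        self.x = np.zeros(2)           # [position_m, velocity_m_s]
        self.P = np.eye(2)             # state covariance

    def predict(self, accel: float, dt: float):
        """IMU-driven prediction step (runs at IMU rate)."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt**2, dt]) * accel
        Q = np.eye(2) * 0.01 * dt      # illustrative process noise
        self.P = F @ self.P @ F.T + Q

    def update_position(self, z: float, r: float):
        """Absolute position update (satellite match, runs on keyframes)."""
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + r       # innovation covariance
        K = self.P @ H.T / S           # Kalman gain
        self.x = self.x + (K * (z - self.x[0])).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```

Between satellite updates the covariance grows, which is exactly what drives the confidence levels and the covariance-based keyframe trigger.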
### Component: Satellite Tile Preprocessing (Offline)

**Selected**: **GeoHash-indexed tile pairs on disk + RAM preloading**.

Pipeline:

1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
6. **At session start**: preload tiles within ±2km of flight plan into RAM (~200MB for 50km route)
### Component: Re-localization (Disconnected Segments)

**Selected**: **Keyframe satellite matching is always active + ranked tile search on VO failure**.

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Immediately flag next frame as keyframe → trigger satellite matching
2. Compute IMU dead-reckoning position since last known position
3. Rank preloaded tiles by distance to dead-reckoning position
4. Try top 3 tiles sequentially (not all tiles in radius)
5. If match found: position recovered, new segment begins
6. If 3 consecutive keyframe failures across top tiles: expand to next 3 tiles
7. If still no match after 3+ full attempts: request user input via API
### Component: Object Center Coordinates

Geometric calculation once frame-center GPS is known:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
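The position stream reduces to an async generator producing SSE-framed events. A minimal sketch using only the standard library: in the real service the generator would be wrapped in sse-starlette's `EventSourceResponse` inside a FastAPI route; the event name and JSON fields here are illustrative, not a fixed wire schema.

```python
import asyncio
import json

async def position_events(q: asyncio.Queue):
    """Yield SSE-formatted position events from a queue fed by the ESKF.

    A `None` item is the session-end sentinel (an illustrative convention)."""
    while True:
        fix = await q.get()
        if fix is None:
            return
        # SSE wire format: optional "event:" line, "data:" line, blank line.
        yield f"event: position\ndata: {json.dumps(fix)}\n\n"

async def demo():
    q: asyncio.Queue = asyncio.Queue()
    q.put_nowait({"lat": 48.5, "lon": 37.9, "confidence": "HIGH"})
    q.put_nowait(None)
    return [e async for e in position_events(q)]

events = asyncio.run(demo())
```

Because the generator only awaits the queue, VO fixes and delayed satellite corrections can both be pushed into the same stream as they become available.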
## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~23ms** | Well within 400ms |
### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.

**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 matching | ≤200ms | TensorRT FP16, 1280px, go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async, within budget |
**Path B — XFeat (if LiteSAM >200ms)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |
## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state. **Configure map pruning for 3000-frame flights** |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan, pre-resized |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~2.1-2.9GB** | ~26-36% of 8GB — comfortable margin |
|
||||
|
||||
## Confidence Scoring
|
||||
|
||||
| Level | Condition | Expected Accuracy |
|
||||
|-------|-----------|-------------------|
|
||||
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
|
||||
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
|
||||
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
|
||||
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
|
||||
| MANUAL | User-provided position | As provided |
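The tiers above can be expressed as a small classifier. The sketch below is illustrative, not the implementation: the flag names and the 500m threshold mirror the table, everything else (function shape, string return values) is an assumption.

```python
def confidence_level(sat_match_ok: bool, vo_ok: bool,
                     dist_since_correction_m: float,
                     manual_hint: bool = False) -> str:
    """Map fusion state to the confidence tiers in the table above."""
    if manual_hint:
        return "MANUAL"
    if sat_match_ok and vo_ok:
        return "HIGH"
    if vo_ok and dist_since_correction_m < 500.0:
        return "MEDIUM"
    if vo_ok:
        return "LOW"
    return "VERY_LOW"  # IMU dead-reckoning only: both camera sources failed
```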

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **cuVSLAM fails on low-texture agricultural terrain** | **HIGH** | Frequent tracking loss, degraded VO | Increase satellite matching frequency when keypoint count drops. IMU dead-reckoning bridge (~1.5s). Accept higher drift in featureless segments. Satellite matching recovers position when texture returns. |
| LiteSAM TRT FP16 >200ms at 1280px on Orin Nano Super | MEDIUM | Must use XFeat instead (less accurate for cross-view) | Day-one TRT FP16 benchmark. If >200ms → XFeat. Since our nadir-nadir gap is small, XFeat may still perform adequately. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Benchmark XFeat on satellite-aerial pairs from the actual operational area. Increase keyframe frequency; multi-tile consensus; strict RANSAC. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, set max keyframes. Monitor memory. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| cuVSLAM is closed-source, no nadir benchmarks | MEDIUM | Unknown failure modes over farmland | Extensive testing with real nadir UAV imagery before deployment. XFeat VO as fallback (also uses learned features). |
| Tile I/O bottleneck during expanded search | LOW | Delayed re-localization | Preload ±2km tiles in RAM; ranked search instead of exhaustive |

## Testing Strategy

### Integration / Functional Tests

- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground-truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce a 90-degree heading change in the sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify the client receives the VO result within 50ms and the satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
- Test cuVSLAM map memory: run a 3000-frame session, monitor memory growth

### Non-Functional Tests

- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM. If >200ms → use XFeat. Also measure accuracy on test satellite-aerial pairs for both.
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- **cuVSLAM terrain stress test**: test with nadir camera over (a) urban/structured terrain, (b) agricultural fields, (c) water/uniform terrain, (d) forest. Measure: keypoint count, tracking success rate, drift per 100 frames, IMU fallback frequency
- cuVSLAM keypoint monitoring: verify that low-keypoint detection triggers increased satellite matching
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during a 3000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leaks
- Keyframe strategy: vary the interval (2, 3, 5, 10) and measure the accuracy vs latency tradeoff
- Tile preloading: verify RAM usage of preloaded tiles for a 50km flight plan

## References

- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR paper: https://zju3dv.github.io/efficientloftr/
- LoFTR TensorRT adaptation: https://github.com/Kolkir/LoFTR_TRT
- PFED (2025): https://github.com/SkyEyeLoc/PFED
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hierarchical AVL (MDPI 2025): https://www.mdpi.com/2072-4292/17/20/3470
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- cuVSLAM paper: https://arxiv.org/abs/2506.04359
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- LightGlue (ICCV 2023): https://github.com/cvg/LightGlue
- LightGlue TensorRT: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- LightGlue TRT Jetson: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- ForestVO / SP+LG VO: https://arxiv.org/html/2504.01261v1
- vo_lightglue (SP+LG VO): https://github.com/himadrir/vo_lightglue
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
@@ -0,0 +1,491 @@

# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| FastAPI + SSE as primary output | **Functional**: New AC requires MAVLink GPS_INPUT to the flight controller, not REST/SSE. The system must act as a GPS replacement module. SSE is the wrong output channel. | **Replace with pymavlink GPS_INPUT sender**. Send GPS_INPUT at 5-10Hz to the flight controller via UART. Retain minimal FastAPI only for local IPC (object localization API). |
| No ground station integration | **Functional**: New AC requires streaming position+confidence to the ground station and receiving re-localization commands via the telemetry link. Draft02 had no telemetry. | **MAVLink telemetry integration**: GPS data forwarded automatically by the flight controller. Custom data via NAMED_VALUE_FLOAT (confidence, drift). Re-localization hints via a COMMAND_LONG listener. |
| MAVSDK library (per restriction) | **Functional**: MAVSDK-Python v3.15.3 cannot send GPS_INPUT messages. The feature has been requested since 2021 and remains unresolved. This is a blocking limitation for the core output function. | **Use pymavlink** for all MAVLink communication. pymavlink provides `gps_input_send()` and full MAVLink v2 access. Note the conflict with the restriction — pymavlink is the only viable option. |
| 3fps camera → ~3Hz output | **Performance**: The ArduPilot GPS_RATE_MS minimum corresponds to 5Hz (200ms). 3Hz camera output is below this minimum, so the flight controller EKF may not fuse properly. | **IMU-interpolated 5-10Hz GPS_INPUT**: ESKF prediction runs at 100+Hz internally. Emit the predicted state as GPS_INPUT at 5-10Hz. Camera corrections arrive at 3Hz within this stream. |
| No startup/failsafe procedures | **Functional**: New AC requires init from last GPS, reboot recovery, and IMU-only fallback. Draft02 assumed the position was already known. | **Full lifecycle management**: (1) Boot → read GPS from flight controller → init ESKF. (2) Reboot → read IMU-extrapolated position → re-init. (3) N-second failure → stop GPS_INPUT → autopilot falls back to IMU. |
| Basic object localization (nadir only) | **Functional**: New AC adds an AI camera with configurable angle and zoom. Nadir pixel-to-GPS is insufficient. | **Trigonometric projection for oblique camera**: ground_distance = alt × tan(tilt), bearing = heading + pan + pixel offset. Local API for AI system requests. |
| No thermal management | **Performance**: The Jetson Orin Nano Super throttles at 80°C (GPU drops 1GHz→300MHz = 3x slowdown). This could blow the 400ms budget. | **Thermal monitoring + adaptive pipeline**: Use 25W mode. Monitor via tegrastats. If temp >75°C → reduce satellite matching frequency. If >80°C → VO+IMU only. |
| ESKF covariance without explicit drift budget | **Functional**: New AC requires max 100m cumulative VO drift between satellite anchors. Draft02 uses covariance for keyframe selection but has no explicit budget. | **Drift budget tracker**: √(σ_x² + σ_y²) from the ESKF as the drift estimate. When approaching 100m → force every-frame satellite matching. Report via horiz_accuracy in GPS_INPUT. |
| No satellite imagery validation | **Functional**: New AC requires ≥0.5 m/pixel, <2 years old. Draft02 didn't validate. | **Preprocessing validation step**: Check zoom 19 availability (0.3 m/pixel). Fall back to zoom 18 (0.6 m/pixel). Flag stale tiles. |
| "Ask user via API" for re-localization | **Functional**: New AC says send the re-localization request to the ground station via the telemetry link, not a REST API. The operator sends the hint via telemetry. | **MAVLink re-localization protocol**: On 3 consecutive failures → send a STATUSTEXT alert to the ground station. The operator sends COMMAND_LONG with approximate lat/lon. The system uses the hint to constrain the tile search. |

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). The system replaces the GPS module for the flight controller by sending MAVLink GPS_INPUT messages via pymavlink over UART. Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU data from the flight controller. GPS_INPUT is sent at 5-10Hz, with camera-based corrections at 3Hz and IMU prediction filling the gaps.

**Hard constraint**: The camera shoots at ~3fps (333ms interval). The full VO+ESKF pipeline must complete within 400ms per frame. GPS_INPUT output rate: 5-10Hz minimum (ArduPilot EKF requirement).

**Output architecture**:

- **Primary**: pymavlink → GPS_INPUT to flight controller via UART (replaces GPS module)
- **Telemetry**: Flight controller auto-forwards GPS data to the ground station. Custom NAMED_VALUE_FLOAT for confidence/drift at 1Hz
- **Commands**: Ground station → COMMAND_LONG → flight controller → pymavlink listener on companion computer
- **Local IPC**: Minimal FastAPI on localhost for object localization requests from AI systems

```
OFFLINE (Before Flight)
  Satellite Tiles → Download & Validate → Pre-resize → Store
  (Google Maps)     (≥0.5m/px, <2yr)      (matcher res)  (GeoHash)
  Copy to Jetson storage
            │
            ▼
ONLINE (During Flight)

  STARTUP:
    pymavlink → read GLOBAL_POSITION_INT → init ESKF → start cuVSLAM

  EVERY FRAME (3fps, 333ms interval):
    ┌──────────────────────────────────────┐
    │ Nav Camera → Downsample (CUDA ~2ms)  │
    │ → cuVSLAM VO+IMU (~9ms)              │
    │ → ESKF measurement update            │
    └──────────────────────────────────────┘

  5-10Hz CONTINUOUS (between camera frames):
    ┌──────────────────────────────────────┐
    │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller
    │ (pymavlink, every 100-200ms)         │    (GPS1_TYPE=14)
    └──────────────────────────────────────┘

  KEYFRAMES (every 3-10 frames, async):
    ┌──────────────────────────────────────┐
    │ Satellite match (CUDA stream B)      │──→ ESKF correction
    │ LiteSAM TRT FP16 or XFeat            │
    └──────────────────────────────────────┘

  TELEMETRY (1Hz):
    ┌──────────────────────────────────────┐
    │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station
    │ STATUSTEXT: alerts, re-loc requests  │    (via telemetry radio)
    └──────────────────────────────────────┘

  COMMANDS (from ground station):
    ┌──────────────────────────────────────┐
    │ Listen COMMAND_LONG: re-loc hint     │←── Ground Station
    │ (lat/lon from operator)              │    (via telemetry radio)
    └──────────────────────────────────────┘

  LOCAL IPC:
    ┌──────────────────────────────────────┐
    │ FastAPI localhost:8000               │←── AI Detection System
    │ POST /localize (object GPS calc)     │
    │ GET /status (system health)          │
    └──────────────────────────────────────┘

  IMU: 100+Hz from flight controller → ESKF prediction
  TILES: ±2km preloaded in RAM from flight plan
  THERMAL: Monitor via tegrastats, adaptive pipeline throttling
```
## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

NVIDIA's CUDA-accelerated VO library (PyCuVSLAM v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. It supports monocular camera + IMU natively, auto-falls back to IMU when visual tracking fails, and provides loop closure plus Python and C++ APIs.

**CRITICAL: cuVSLAM on low-texture terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade optical flow (classical features). On uniform agricultural terrain:

- Few corners detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1s) → constant-velocity integrator (~0.5s)
- cuVSLAM does NOT guarantee pose recovery after tracking loss

**Mitigation**:

1. Increase satellite matching frequency when the cuVSLAM keypoint count drops
2. IMU dead-reckoning bridge via ESKF (continues GPS_INPUT output during tracking loss)
3. Accept higher drift in featureless segments — report via horiz_accuracy
4. Keypoint-density monitoring triggers adaptive satellite matching

### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching:

- cuVSLAM provides VO at every frame (~9ms)
- Satellite matching triggers on keyframes selected by:
  - Fixed interval: every 3-10 frames
  - ESKF covariance exceeds threshold (drift approaching budget)
  - VO failure: cuVSLAM reports tracking loss
  - Thermal: reduce frequency if temperature is high

### 3. Satellite Matcher Selection (Benchmark-Driven)

**Context**: Our UAV-to-satellite matching is nadir-to-nadir (both top-down). The challenges are season/lighting differences and temporal changes, not extreme viewpoint gaps.

**Candidate A: LiteSAM (opt) TRT FP16 @ 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params. TensorRT FP16 with reparameterized MobileOne. Estimated ~165-330ms on Orin Nano Super with TRT FP16.

**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Fastest option. General-purpose, but our nadir-nadir gap is small.

**Decision rule** (day-one on Orin Nano Super):

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at 1280px
3. If ≤200ms → LiteSAM at 1280px
4. If >200ms → XFeat

### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — the multi-branch training structure collapses to a single feed-forward path at inference. INT8 is safe only for the MobileOne CNN layers, NOT for the TAIFormer transformer components.

### 5. CUDA Stream Pipelining

- Stream A: cuVSLAM VO for the current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for the previous keyframe (async, does not block VO)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management

### 6. Proactive Tile Loading

Preload tiles within ±2km of the flight plan into RAM at startup. For a 50km route, ~2000 tiles at zoom 19 ≈ ~200MB. This eliminates disk I/O during flight.

On VO failure / expanded search:

1. Compute the IMU dead-reckoning position
2. Rank preloaded tiles by distance to the predicted position
3. Try the top 3 tiles, then expand

### 7. 5-10Hz GPS_INPUT Output Loop

A dedicated thread/coroutine sends GPS_INPUT at a fixed rate (5-10Hz):

1. Read the current ESKF state (position, velocity, covariance)
2. Compute horiz_accuracy from √(σ_x² + σ_y²)
3. Set fix_type based on the last correction type (3=satellite-corrected, 2=VO-only, 1=IMU-only)
4. Send via `mav.gps_input_send()`
5. Sleep until the next interval

This decouples the camera frame rate (3fps) from the GPS_INPUT rate (5-10Hz).

## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection | Competitive with LiteSAM | TRT available | 15.05M params, heavier |
| STHN (IEEE RA-L 2024) | Deep homography estimation | 4.24m at 50m range | Lightweight | Needs RGB retraining |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Planetary, needs adaptation |

## Architecture

### Component: Flight Controller Integration (NEW)

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| pymavlink GPS_INPUT | pymavlink | Full MAVLink v2 access, GPS_INPUT support, pure Python, aarch64 compatible | Lower-level API, manual message handling | ~1ms per send | ✅ Best |
| MAVSDK-Python TelemetryServer | MAVSDK v3.15.3 | Higher-level API, aarch64 wheels | NO GPS_INPUT support, no custom messages | N/A — missing feature | ❌ Blocked |
| MAVSDK C++ MavlinkDirect | MAVSDK v4 (future) | Custom message support planned | Not available in the Python wrapper yet | N/A — not released | ❌ Not available |
| MAVROS (ROS) | ROS + MAVROS | Full GPS_INPUT support, ROS ecosystem | Heavy ROS dependency, complex setup, unnecessary overhead | ~5ms overhead | ⚠️ Overkill |

**Selected**: **pymavlink** — the only viable Python library for GPS_INPUT. Pure Python, works on aarch64, full MAVLink v2 message set.

**Restriction note**: restrictions.md specifies "MAVSDK library", but MAVSDK-Python cannot send GPS_INPUT (confirmed: Issue #320, open since 2021). pymavlink is the necessary alternative.

Configuration:

- Connection: UART (`/dev/ttyTHS0` or `/dev/ttyTHS1` on Jetson, 115200-921600 baud)
- Flight controller: GPS1_TYPE=14, SERIAL2_PROTOCOL=2 (MAVLink2)
- GPS_INPUT rate: 5-10Hz (dedicated output thread)
- Heartbeat: 1Hz to maintain the connection

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source, low-texture terrain risk | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | Open-source, learned features | No IMU integration, ~30-50ms | ~30-50ms/frame | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps | ~33ms/frame | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built for Jetson.

### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy, 6.31M params | Untested on Orin Nano Super TRT | Est. ~165-330ms TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, Jetson-proven, fastest | General-purpose | ~50-100ms | ✅ Fallback |

**Selection**: Day-one benchmark. LiteSAM TRT FP16 at 1280px → if ≤200ms → LiteSAM. If >200ms → XFeat.

### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| ESKF (custom) | Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.

**Output rates**:

- IMU prediction: 100+Hz (from flight controller IMU via pymavlink)
- cuVSLAM VO update: ~3Hz
- Satellite update: ~0.3-1Hz (keyframes, async)
- GPS_INPUT output: 5-10Hz (ESKF predicted state)

**Drift budget**: Track √(σ_x² + σ_y²) from the ESKF covariance. When approaching 100m → force every-frame satellite matching.
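The drift budget check is a one-liner over the ESKF position covariance diagonal. A minimal sketch, assuming the covariance entries are variances (σ²) in m² and an 80%-of-budget trigger margin (the margin value is an assumption, not from the AC):

```python
import math

DRIFT_BUDGET_M = 100.0   # max cumulative VO drift between satellite anchors

def drift_estimate_m(cov_xx: float, cov_yy: float) -> float:
    """1-sigma horizontal drift estimate: sqrt(sigma_x^2 + sigma_y^2)."""
    return math.sqrt(cov_xx + cov_yy)

def satellite_match_every_frame(cov_xx: float, cov_yy: float,
                                margin: float = 0.8) -> bool:
    """Force per-frame satellite matching once drift nears the 100m budget."""
    return drift_estimate_m(cov_xx, cov_yy) >= margin * DRIFT_BUDGET_M
```

The same `drift_estimate_m` value doubles as the `horiz_accuracy` field reported in GPS_INPUT.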

### Component: Ground Station Telemetry (NEW)

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| MAVLink auto-forwarding + NAMED_VALUE_FLOAT | pymavlink | Standard MAVLink, no custom protocol, works with all GCS (Mission Planner, QGC) | Limited bandwidth (~12kbit/s); NAMED_VALUE_FLOAT name limited to 10 chars | ~50 bytes/msg | ✅ Best |
| Custom MAVLink dialect messages | pymavlink + custom XML | Full flexibility | Requires custom GCS plugin, non-standard | ~50 bytes/msg | ⚠️ Complex |
| Separate telemetry channel | TCP/UDP over separate radio | Full bandwidth | Extra hardware, extra radio | N/A | ❌ Not available |

**Selected**: **Standard MAVLink forwarding + NAMED_VALUE_FLOAT**

Telemetry data sent to the ground station:

- GPS position: auto-forwarded by the flight controller from GPS_INPUT data
- Confidence score: NAMED_VALUE_FLOAT `"gps_conf"` at 1Hz (values: 1=HIGH, 2=MEDIUM, 3=LOW, 4=VERY_LOW)
- Drift estimate: NAMED_VALUE_FLOAT `"gps_drift"` at 1Hz (meters)
- Matching status: NAMED_VALUE_FLOAT `"sat_match"` at 1Hz (0=inactive, 1=matching, 2=failed)
- Alerts: STATUSTEXT for critical events (re-localization request, system failure)

Re-localization from the ground station:

- The operator sees a drift/failure alert in the GCS
- The operator sends COMMAND_LONG (MAV_CMD_USER_1) with lat/lon in param5/param6
- The companion computer listens for COMMAND_LONG addressed to its component ID
- On receiving the hint → constrain the tile search → attempt satellite matching near the hint coordinates

### Component: Startup & Lifecycle (NEW)

**Startup sequence**:

1. Boot Jetson → start the GPS-Denied service (systemd)
2. Connect to the flight controller via pymavlink on UART
3. Wait for a heartbeat from the flight controller
4. Read GLOBAL_POSITION_INT → extract lat, lon, alt
5. Initialize the ESKF state with this position (high confidence if real GPS is available)
6. Start cuVSLAM with the first camera frames
7. Begin the GPS_INPUT output loop at 5-10Hz
8. Preload satellite tiles within ±2km of the flight plan into RAM
9. System ready — GPS-Denied active
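Steps 4-5 hinge on GLOBAL_POSITION_INT field scaling: per the MAVLink spec, lat/lon arrive as degrees × 1e7 and altitude in millimetres. A sketch of the conversion (`EskfInit` is an assumed holder type, not part of the real ESKF interface):

```python
from dataclasses import dataclass

@dataclass
class EskfInit:
    lat_deg: float
    lon_deg: float
    alt_m: float

def init_from_global_position_int(lat_e7: int, lon_e7: int, alt_mm: int) -> EskfInit:
    """Convert GLOBAL_POSITION_INT integer fields to the units the ESKF uses."""
    return EskfInit(lat_e7 / 1e7, lon_e7 / 1e7, alt_mm / 1000.0)
```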

**GPS denial detection**:

Not required — the system always outputs GPS_INPUT. If real GPS is available, the flight controller uses whichever GPS source has better accuracy (configurable GPS blending or priority). When real GPS degrades or is lost, the flight controller seamlessly switches to our GPS_INPUT.

**Failsafe**:

- If there is no valid position estimate for N seconds (configurable, e.g., 10s): stop sending GPS_INPUT
- The flight controller detects the GPS timeout → falls back to IMU-only dead reckoning
- The system logs the failure and continues attempting recovery (VO + satellite matching)
- When recovery succeeds: resume GPS_INPUT output

**Reboot recovery**:

1. Jetson reboots → re-establish the pymavlink connection
2. Read GPS_RAW_INT (now IMU-extrapolated by the flight controller since GPS_INPUT stopped)
3. Initialize the ESKF with this position (low confidence, horiz_accuracy=100m+)
4. Resume cuVSLAM + satellite matching → accuracy improves over time
5. Resume GPS_INPUT output

### Component: Object Localization (UPDATED)

**Two modes**:

**Mode 1: Navigation camera (nadir)**
Frame-center GPS comes from the ESKF. For any object in the navigation camera frame:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by heading (yaw from IMU)
4. Convert the meter offset to a lat/lon delta and add it to the frame-center GPS
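The four steps above can be sketched with a local tangent-plane approximation, which is adequate for offsets of a few hundred metres. Assumptions: image +x is right, +y is forward along heading, and the spherical-earth radius is used for the metre-to-degree conversion.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS-84 equatorial radius

def nadir_pixel_to_gps(center_lat: float, center_lon: float,
                       dx_px: float, dy_px: float,
                       gsd_m: float, heading_rad: float):
    """Steps 1-4: pixel offset -> meters -> rotate by heading -> lat/lon."""
    dx_m = dx_px * gsd_m                                   # step 2
    dy_m = dy_px * gsd_m
    # step 3: rotate the image-frame offset into north/east by the UAV heading
    north = dy_m * math.cos(heading_rad) - dx_m * math.sin(heading_rad)
    east = dy_m * math.sin(heading_rad) + dx_m * math.cos(heading_rad)
    # step 4: small-offset conversion to a lat/lon delta
    lat = center_lat + math.degrees(north / EARTH_RADIUS_M)
    lon = center_lon + math.degrees(
        east / (EARTH_RADIUS_M * math.cos(math.radians(center_lat))))
    return lat, lon
```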

**Mode 2: AI camera (configurable angle and zoom)**

1. Get the current UAV position from the ESKF
2. Get the AI camera params: tilt_angle (from vertical), pan_angle (from heading), zoom (effective focal length)
3. Get the pixel coordinates of the detected object in the AI camera frame
4. Compute bearing: bearing = heading + pan_angle + atan2(dx_px × pixel_size, focal_eff), where pixel_size = sensor_width / image_width
5. Compute ground distance: for flat terrain, slant_range = altitude / cos(tilt_angle + dy_angle), ground_range = slant_range × sin(tilt_angle + dy_angle)
6. Convert bearing + ground_range to a lat/lon offset
7. Return GPS coordinates with an accuracy estimate
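Steps 4-6 reduce to two angle corrections and one tangent. A minimal flat-terrain sketch (parameter names and units are assumptions; converting the returned bearing and ground range to lat/lon works exactly as in the nadir mode):

```python
import math

def oblique_object_offset(altitude_m: float, heading_rad: float,
                          pan_rad: float, tilt_rad: float,
                          dx_px: float, dy_px: float,
                          focal_eff_m: float, pixel_size_m: float):
    """Project a detection from an oblique camera onto flat terrain.
    tilt is measured from vertical (nadir = 0).
    Returns (bearing_rad, ground_range_m)."""
    dx_ang = math.atan2(dx_px * pixel_size_m, focal_eff_m)   # step 4 pixel term
    dy_ang = math.atan2(dy_px * pixel_size_m, focal_eff_m)
    bearing = heading_rad + pan_rad + dx_ang                 # step 4
    elev = tilt_rad + dy_ang          # angle from vertical to the object ray
    # step 5: slant = alt / cos(elev); ground = slant * sin(elev) = alt * tan(elev)
    ground_range = altitude_m * math.tan(elev)
    return bearing, ground_range
```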

**Local API** (FastAPI on localhost:8000):

- `POST /localize` — accepts: pixel_x, pixel_y, camera_id ("nav" or "ai"), ai_camera_params (tilt, pan, zoom) → returns: lat, lon, accuracy_m
- `GET /status` — returns: system state, confidence, drift, uptime

### Component: Satellite Tile Preprocessing (Offline)

**Selected**: GeoHash-indexed tile pairs on disk + RAM preloading.

Pipeline:

1. Define the operational area from the flight plan
2. Download satellite tiles from the Google Maps Tile API at zoom 19 (0.3 m/pixel)
3. If zoom 19 is unavailable: fall back to zoom 18 (0.6 m/pixel — meets the ≥0.5 m/pixel requirement)
4. Validate: resolution ≥0.5 m/pixel, check imagery staleness where possible
5. Pre-resize each tile to the matcher input resolution
6. Store: original + resized + metadata (GPS bounds, zoom, GSD, download date) in a GeoHash-indexed structure
7. Copy to Jetson storage before flight
8. At startup: preload tiles within ±2km of the flight plan into RAM
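The zoom-to-GSD figures in steps 2-3 follow the standard Web Mercator tiling scheme (the 0.3/0.6 m/pixel values quoted above are the equatorial numbers; at Ukrainian latitudes the actual GSD is finer by cos(lat)). A sketch of the relationship, with `pick_zoom` as an illustrative helper:

```python
import math

def web_mercator_gsd(zoom: int, lat_deg: float) -> float:
    """Ground sample distance (m/pixel) of a 256px Web Mercator tile.
    156543.03392 m/pixel is the zoom-0 equatorial value."""
    return 156543.03392 * math.cos(math.radians(lat_deg)) / (2 ** zoom)

def pick_zoom(lat_deg: float, required_gsd: float = 0.5) -> int:
    """Coarsest zoom whose GSD is at least as fine as `required_gsd`."""
    zoom = 0
    while web_mercator_gsd(zoom, lat_deg) > required_gsd:
        zoom += 1
    return zoom
```

Note that `pick_zoom(48.5)` returns 18: at that latitude zoom 18 already resolves ~0.40 m/pixel, which is why the zoom-18 fallback is acceptable.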

### Component: Re-localization (Disconnected Segments)

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Flag the next frame as a keyframe → trigger satellite matching
2. Compute the IMU dead-reckoning position since the last known position
3. Rank preloaded tiles by distance to the dead-reckoning position
4. Try the top 3 tiles sequentially
5. If a match is found: position recovered, a new segment begins
6. After 3 consecutive keyframe failures: send a STATUSTEXT alert to the ground station ("RE-LOC REQUEST: position uncertain, drift Xm")
7. While waiting for the operator hint: continue VO/IMU dead reckoning, report low confidence via horiz_accuracy
8. If the operator sends COMMAND_LONG with a lat/lon hint: constrain the tile search to ±500m of the hint
9. If there is still no match after the operator hint: continue dead reckoning, log the failure
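Steps 3-4 above amount to a nearest-first ordering of the preloaded tiles. A sketch, assuming tiles are `(tile_id, center_lat, center_lon)` tuples; a flat squared-degree metric is sufficient for ranking within a ±2km preload area (longitude scaling by cos(lat) would matter only for much larger search regions):

```python
def rank_tiles(tiles, pred_lat: float, pred_lon: float, top_n: int = 3):
    """Order preloaded tiles by distance from the dead-reckoning position
    and return the ids of the closest candidates to try first."""
    def dist2(tile):
        _, lat, lon = tile
        return (lat - pred_lat) ** 2 + (lon - pred_lon) ** 2
    return [t[0] for t in sorted(tiles, key=dist2)[:top_n]]
```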

### Component: Thermal Management (NEW)

**Power mode**: 25W (stable sustained performance)

**Monitoring**: Read the GPU/CPU temperature via tegrastats or sysfs thermal zones at 1Hz.

**Adaptive pipeline**:

- Normal (<70°C): Full pipeline — cuVSLAM every frame + satellite match every 3-10 frames
- Warm (70-75°C): Reduce satellite matching to every 5-10 frames
- Hot (75-80°C): Reduce satellite matching to every 10-15 frames
- Throttling (>80°C): Disable satellite matching entirely, VO+IMU only (cuVSLAM ~9ms is very light). Report LOW confidence. Resume satellite matching when the temperature drops below 75°C

**Hardware requirement**: An active cooling fan (5V) is mandatory for the UAV companion computer enclosure.
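The temperature tiers above map to a keyframe interval with hysteresis on the disable/resume path. A sketch; the specific interval chosen inside each band (5/10/15) is an assumed tuning choice within the ranges stated above.

```python
def satellite_match_interval(temp_c: float, prev_disabled: bool = False):
    """Map die temperature to a satellite-matching keyframe interval.
    Returns None when matching is disabled; the 75 C resume threshold
    implements the hysteresis described for the >80 C tier."""
    if temp_c > 80.0:
        return None                  # throttling: VO+IMU only
    if prev_disabled and temp_c >= 75.0:
        return None                  # stay disabled until below 75 C
    if temp_c >= 75.0:
        return 15                    # hot: every 10-15 frames
    if temp_c >= 70.0:
        return 10                    # warm: every 5-10 frames
    return 5                         # normal: every 3-10 frames
```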
|
||||
|
||||
## Processing Time Budget (per frame, 333ms interval)
|
||||
|
||||
### Normal Frame (non-keyframe, ~60-80% of frames)
|
||||
|
||||
| Step | Time | Notes |
|
||||
|------|------|-------|
|
||||
| Image capture + transfer | ~10ms | CSI/USB3 |
|
||||
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
|
||||
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps |
|
||||
| ESKF measurement update | ~1ms | NumPy |
|
||||
| **Total per camera frame** | **~22ms** | Well within 333ms |
|
||||
|
||||
GPS_INPUT output runs independently at 5-10Hz (every 100-200ms):
|
||||
| Step | Time | Notes |
|
||||
|------|------|-------|
|
||||
| Read ESKF state | <0.1ms | Shared state |
|
||||
| Compute horiz_accuracy | <0.1ms | √(σ²) |
|
||||
| pymavlink gps_input_send | ~1ms | UART write |
|
||||
| **Total per GPS_INPUT** | **~1ms** | Negligible overhead |
|
||||
|
||||
### Keyframe Satellite Matching (async, every 3-10 frames)
|
||||
|
||||
Runs on separate CUDA stream — does NOT block VO or GPS_INPUT.
|
||||
|
||||
**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:
|
||||
|
||||
| Step | Time | Notes |
|
||||
|------|------|-------|
|
||||
| Downsample to 1280px | ~1ms | OpenCV CUDA |
|
||||
| Load satellite tile | ~1ms | Pre-loaded in RAM |
|
||||
| LiteSAM (opt) TRT FP16 | ≤200ms | Go/no-go threshold |
|
||||
| Geometric pose (RANSAC) | ~5ms | Homography |
|
||||
| ESKF satellite update | ~1ms | Delayed measurement |
|
||||
| **Total** | **≤210ms** | Async |

**Path B — XFeat (if LiteSAM >200ms)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat extraction + matching | ~50-80ms | TensorRT FP16 |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state (configure pruning for 3000 frames) |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink runtime | ~20MB | Lightweight |
| FastAPI (local IPC) | ~50MB | Minimal, localhost only |
| Current frame buffer | ~2MB | |
| ESKF state + buffers | ~10MB | |
| **Total** | **~2.1-2.9GB** | ~26-36% of 8GB — comfortable |

## Confidence Scoring → GPS_INPUT Mapping

| Level | Condition | horiz_accuracy (m) | fix_type | GPS_INPUT satellites_visible |
|-------|-----------|--------------------|----------|------------------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | 10-20 | 3 (3D) | 12 |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50 | 3 (3D) | 8 |
| LOW | cuVSLAM VO only, no recent correction, OR thermal throttling | 50-100 | 2 (2D) | 4 |
| VERY LOW | IMU dead-reckoning only | 100-500 | 1 (no fix) | 1 |
| MANUAL | Operator-provided re-localization hint | 200 | 3 (3D) | 6 |

Note: `satellites_visible` is synthetic — it is used to influence EKF weighting, since ArduPilot gives more weight to a GPS source with a higher satellite count and lower horiz_accuracy.
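
The mapping above can be captured directly as a lookup table (a sketch: the midpoint `horiz_accuracy` values and the underscore key spellings are implementation choices, not system-defined constants):

```python
# Confidence level → GPS_INPUT weighting parameters, from the mapping table.
# horiz_accuracy uses the midpoint of each band (illustrative choice).
CONFIDENCE_MAP = {
    "HIGH":     {"horiz_accuracy": 15.0,  "fix_type": 3, "satellites_visible": 12},
    "MEDIUM":   {"horiz_accuracy": 35.0,  "fix_type": 3, "satellites_visible": 8},
    "LOW":      {"horiz_accuracy": 75.0,  "fix_type": 2, "satellites_visible": 4},
    "VERY_LOW": {"horiz_accuracy": 300.0, "fix_type": 1, "satellites_visible": 1},
    "MANUAL":   {"horiz_accuracy": 200.0, "fix_type": 3, "satellites_visible": 6},
}

def gps_input_params(confidence):
    """Return the GPS_INPUT weighting parameters for a confidence level."""
    return CONFIDENCE_MAP[confidence]
```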

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink (conflicts with restriction) | Use pymavlink and document the restriction conflict. No Python alternative exists. |
| **cuVSLAM fails on low-texture agricultural terrain** | HIGH | Frequent tracking loss, degraded VO | Increase satellite matching frequency. Bridge gaps with IMU dead-reckoning. Accept higher drift. |
| **Jetson UART instability with ArduPilot** | MEDIUM | MAVLink connection drops | Test thoroughly. Use a USB serial adapter if the UART is unreliable. Add a watchdog reconnect. |
| **Thermal throttling blows the satellite matching budget** | MEDIUM | Missed keyframe windows | Adaptive pipeline: reduce or skip satellite matching at high temperature. Active cooling mandatory. |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use XFeat | Day-one benchmark. XFeat fallback. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Multi-tile consensus, strict RANSAC, increased keyframe frequency. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning and a maximum keyframe count. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift. Evaluate alternative imagery providers. |
| GPS_INPUT at 3Hz too slow for ArduPilot EKF | HIGH | Poor EKF fusion, position jumps | 5-10Hz output with IMU interpolation between camera frames. |
| Companion computer reboot mid-flight | LOW | ~30-60s GPS gap | Flight controller IMU fallback. Automatic recovery on restart. |
| Telemetry bandwidth saturation | LOW | Custom messages compete with autopilot telemetry | Limit NAMED_VALUE_FLOAT to 1Hz. Keep messages compact. |

## Testing Strategy

### Integration / Functional Tests

- End-to-end: camera → cuVSLAM → ESKF → GPS_INPUT → verify the flight controller receives a valid position
- Compare computed positions against ground-truth GPS from coordinates.csv
- Measure: percentage within 50m (target: 80%) and percentage within 20m (target: 60%)
- Test GPS_INPUT rate: verify 5-10Hz output to the flight controller
- Test sharp-turn handling: verify satellite re-localization after a 90-degree heading change
- Test disconnected segments: simulate 3+ route breaks, verify all segments are connected
- Test re-localization: simulate 3 consecutive failures → verify STATUSTEXT is sent → inject a COMMAND_LONG hint → verify recovery
- Test object localization: send POST /localize with known AI camera parameters → verify GPS accuracy
- Test startup: verify the ESKF initializes from flight controller GPS
- Test reboot recovery: kill the process → restart → verify reconnection and position recovery
- Test failsafe: simulate total failure → verify GPS_INPUT stops → verify flight controller IMU fallback
- Test cuVSLAM map memory: run a 3000-frame session and monitor memory growth

### Non-Functional Tests

- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at 1280px on the Orin Nano Super
- cuVSLAM benchmark: verify 116fps monocular+IMU on the Orin Nano Super
- cuVSLAM terrain stress test: urban, agricultural, water, forest
- **UART reliability test**: sustained pymavlink communication over 1+ hour
- **Thermal endurance test**: run the full pipeline for 30+ minutes, measure GPU temperature, verify no throttling with active cooling
- Per-frame latency: must be <400ms for the VO pipeline
- GPS_INPUT latency: measure time from camera capture to GPS_INPUT send
- Memory: peak usage during a 3000-frame session (must stay <8GB)
- Drift budget: verify ESKF covariance tracks cumulative drift and triggers satellite matching before 100m
- Telemetry bandwidth: measure total MAVLink bandwidth used by the companion computer
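
The drift-budget check above can be sketched as a scalar variance propagation (illustrative: the flight code propagates the full ESKF covariance, and the 0.5 trigger margin is a tunable assumption, not a system value):

```python
def should_trigger_satellite_match(sigma2_m2, drift_rate_m_per_s, dt_s,
                                   limit_m=100.0, margin=0.5):
    """Propagate horizontal drift variance one step and decide whether a
    satellite correction is needed before the drift budget is spent.

    Scalar random-walk sketch: variance grows by (rate * dt)^2 per step,
    and matching is triggered when 1-sigma drift crosses margin * limit.
    """
    sigma2_m2 += (drift_rate_m_per_s * dt_s) ** 2   # random-walk growth per step
    trigger = sigma2_m2 ** 0.5 > margin * limit_m   # act well before the 100m limit
    return sigma2_m2, trigger
```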

## References

- pymavlink GPS_INPUT example: https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py
- pymavlink mavgps.py: https://github.com/ArduPilot/pymavlink/blob/master/examples/mavgps.py
- ArduPilot GPS Input module: https://ardupilot.org/mavproxy/docs/modules/GPSInput.html
- MAVLink GPS_INPUT message spec: https://mavlink.io/en/messages/common.html#GPS_INPUT
- MAVSDK-Python GPS_INPUT limitation: https://github.com/mavlink/MAVSDK-Python/issues/320
- MAVSDK-Python custom message limitation: https://github.com/mavlink/MAVSDK-Python/issues/739
- ArduPilot companion computer setup: https://ardupilot.org/dev/docs/raspberry-pi-via-mavlink.html
- Jetson Orin UART with ArduPilot: https://forums.developer.nvidia.com/t/uart-connection-between-jetson-nano-orin-and-ardupilot/325416
- MAVLink NAMED_VALUE_FLOAT: https://mavlink.io/en/messages/common.html#NAMED_VALUE_FLOAT
- MAVLink STATUSTEXT: https://mavlink.io/en/messages/common.html#STATUSTEXT
- MAVLink telemetry bandwidth: https://github.com/mavlink/mavlink/issues/1605
- JetPack 6.2 Super Mode: https://developer.nvidia.com/blog/nvidia-jetpack-6-2-brings-super-mode-to-nvidia-jetson-orin-nano-and-jetson-orin-nx-modules/
- Jetson Orin Nano power consumption: https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/
- UAV target geolocation: https://www.mdpi.com/1424-8220/22/5/1903
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
- ArduPilot EKF Source Selection: https://ardupilot.org/copter/docs/common-ekf-sources.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|--------------|
| ONNX Runtime as potential inference runtime for AI models | **Performance**: ONNX Runtime CUDA EP on Jetson Orin Nano is 7-8x slower than TRT standalone with default settings (tensor cores not utilized). Even TRT-EP shows up to 3x overhead on some models. | **Use native TRT Engine for all AI models**. Convert PyTorch → ONNX → trtexec → .engine. Load with the tensorrt Python module. Eliminates the ONNX Runtime dependency entirely. |
| ONNX Runtime TRT-EP memory overhead | **Performance**: ONNX RT TRT-EP keeps the serialized engine in memory (~420-440MB vs 130-140MB native TRT). Delta ~280-300MB PER MODEL. On 8GB shared memory, this wastes ~560-600MB for two models. | **Native TRT releases the serialized blob after deserialization** → saves ~280-300MB per model. Total savings ~560-600MB — 7% of total memory. Critical given the cuVSLAM map growth risk. |
| No explicit TRT engine build step in offline pipeline | **Functional**: Draft03 mentions TRT FP16 but doesn't define the build workflow. When and where are engines built? | **Add TRT engine builds to the offline preparation pipeline**: after satellite tile download, run trtexec on the Jetson to build .engine files. Store them alongside the tiles. One-time cost per model version. |
| Cross-platform portability via ONNX Runtime | **Functional**: ONNX Runtime's primary value is cross-platform support. Our deployment is Jetson-only — this value is zero. We pay the performance/memory tax for unused portability. | **Drop ONNX Runtime**. The Jetson Orin Nano Super is fixed deployment hardware. TRT Engine is the optimal runtime for NVIDIA-only deployment. |
| No DLA offloading considered | **Performance**: Draft03 doesn't mention DLA. The Jetson Orin Nano has NO DLA cores — only Orin NX (1-2) and AGX Orin (2) have DLA. | **Confirm: DLA offloading is NOT available on the Orin Nano**. All inference must run on the GPU (1024 CUDA cores, 32 tensor cores). This makes maximizing GPU efficiency via native TRT even more critical. |
| LiteSAM MinGRU TRT compatibility risk | **Functional**: LiteSAM's subpixel refinement uses 4 stacked MinGRU layers over a 3×3 candidate window (seq_len=9). MinGRU gates depend only on the input C_f (not h_{t-1}), so z_t/h̃_t are pre-computable. Ops are standard: Linear, Sigmoid, Mul, Add, ReLU, Tanh. Risk is LOW-MEDIUM — it depends on whether the implementation uses logcumsumexp (problematic) or a simple loop (fine). Seq_len=9 makes this trivially rewritable. | **Day-one verification**: clone the LiteSAM repo → torch.onnx.export → polygraphy inspect → trtexec --fp16. If export fails on MinGRU: rewrite forward() as an unrolled loop (9 steps). **If LiteSAM cannot be made TRT-compatible: replace with EfficientLoFTR TRT** (proven TRT path via Coarse_LoFTR_TRT, 15.05M params, semi-dense matching). |

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses **native TensorRT Engine files** — no ONNX Runtime dependency. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.

Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM — native CUDA), (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16), and (3) IMU data from the flight controller, combined in an ESKF.
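
The ESKF correction step for an absolute position fix follows the standard Kalman measurement update. A minimal numpy sketch (illustrative: the flight code runs the full error-state formulation, and the 4-state layout here is an assumption):

```python
import numpy as np

def eskf_position_update(x, P, z, R):
    """Kalman measurement update for an absolute horizontal position fix.

    x: (n,) state with horizontal position in the first two components (m)
    P: (n, n) covariance; z: (2,) measured position; R: (2, 2) measurement noise
    """
    H = np.zeros((2, x.shape[0]))
    H[0, 0] = H[1, 1] = 1.0                      # observe horizontal position directly
    S = H @ P @ H.T + R                          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ (z - H @ x)                      # state correction
    P = (np.eye(x.shape[0]) - K @ H) @ P         # covariance update
    return x, P
```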

**Inference runtime decision**: native TRT Engine over ONNX Runtime because:

1. ONNX RT CUDA EP is 7-8x slower on the Orin Nano (tensor core bug)
2. ONNX RT TRT-EP wastes ~280-300MB per model (serialized engine retained in memory)
3. Cross-platform portability has zero value — deployment is Jetson-only
4. Native TRT provides direct CUDA stream control for pipelining with cuVSLAM

**Hard constraint**: the camera shoots at ~3fps (333ms interval). The full VO+ESKF pipeline must complete within 400ms. GPS_INPUT is sent at 5-10Hz.

**AI Model Runtime Summary**:

| Model | Runtime | Precision | Memory | Integration |
|-------|---------|-----------|--------|-------------|
| cuVSLAM | Native CUDA (PyCuVSLAM) | N/A (closed-source) | ~200-500MB | CUDA Stream A |
| LiteSAM | TRT Engine | FP16 | ~50-80MB | CUDA Stream B |
| XFeat | TRT Engine | FP16 | ~30-50MB | CUDA Stream B (fallback) |
| ESKF | CPU (Python/C++) | FP64 | ~10MB | CPU thread |

**Offline Preparation Pipeline** (before flight):

1. Download satellite tiles → validate → pre-resize → store (existing)
2. **NEW: build TRT engines on the Jetson** (one-time per model version)
   - `trtexec --onnx=litesam_fp16.onnx --saveEngine=litesam.engine --fp16`
   - `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
3. Copy tiles + engines to Jetson storage
4. At startup: load engines + preload tiles into RAM

```
┌─────────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight)                                             │
│ 1. Satellite Tiles → Download & Validate → Pre-resize → Store       │
│    (Google Maps)     (≥0.5m/px, <2yr)     (matcher res)  (GeoHash)  │
│ 2. TRT Engine Build (one-time per model version):                   │
│    PyTorch model → reparameterize → ONNX export → trtexec --fp16    │
│    Output: litesam.engine, xfeat.engine                             │
│ 3. Copy tiles + engines to Jetson storage                           │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight)                                              │
│                                                                     │
│ STARTUP:                                                            │
│ 1. pymavlink → read GLOBAL_POSITION_INT → init ESKF                 │
│ 2. Load TRT engines: litesam.engine + xfeat.engine                  │
│    (tensorrt.Runtime → deserialize_cuda_engine → create_context)    │
│ 3. Allocate GPU buffers for TRT input/output (PyCUDA)               │
│ 4. Start cuVSLAM with first camera frames                           │
│ 5. Preload satellite tiles ±2km into RAM                            │
│ 6. Begin GPS_INPUT output loop at 5-10Hz                            │
│                                                                     │
│ EVERY FRAME (3fps, 333ms interval):                                 │
│ ┌──────────────────────────────────────┐                            │
│ │ Nav Camera → Downsample (CUDA ~2ms)  │                            │
│ │ → cuVSLAM VO+IMU (~9ms)              │ ← CUDA Stream A            │
│ │ → ESKF measurement update            │                            │
│ └──────────────────────────────────────┘                            │
│                                                                     │
│ 5-10Hz CONTINUOUS:                                                  │
│ ┌──────────────────────────────────────┐                            │
│ │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller       │
│ └──────────────────────────────────────┘                            │
│                                                                     │
│ KEYFRAMES (every 3-10 frames, async):                               │
│ ┌──────────────────────────────────────┐                            │
│ │ TRT Engine inference (Stream B):     │                            │
│ │ context.enqueue_v3(stream_B)         │──→ ESKF correction         │
│ │ LiteSAM FP16 or XFeat FP16           │                            │
│ └──────────────────────────────────────┘                            │
│                                                                     │
│ TELEMETRY (1Hz):                                                    │
│ ┌──────────────────────────────────────┐                            │
│ │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station          │
│ └──────────────────────────────────────┘                            │
└─────────────────────────────────────────────────────────────────────┘
```

## Architecture

### Component: AI Model Inference Runtime

| Solution | Tools | Advantages | Limitations | Performance | Memory | Fit |
|----------|-------|------------|-------------|-------------|--------|-----|
| Native TRT Engine | tensorrt Python + PyCUDA + trtexec | Optimal latency, minimal memory, full tensor core usage, direct CUDA stream control | Hardware-specific engines, manual buffer management, rebuild per TRT version | Optimal | ~50-130MB total (both models) | ✅ Best |
| ONNX Runtime TRT-EP | onnxruntime + TensorRT EP | Auto-fallback for unsupported ops, simpler API, automatic engine caching | +280-300MB per model, wrapper overhead, first-run latency spike | Near-parity (claimed), up to 3x slower (observed) | ~640-690MB total (both models) | ❌ Memory overhead unacceptable |
| ONNX Runtime CUDA EP | onnxruntime + CUDA EP | Simplest API, broadest op support | 7-8x slower on Orin Nano (tensor core bug), no TRT optimizations | 7-8x slower | Standard | ❌ Performance unacceptable |
| Torch-TensorRT | torch_tensorrt | AOT compilation, PyTorch-native, handles mixed TRT/PyTorch | Newer on Jetson, requires the PyTorch runtime at inference | Near native TRT | PyTorch runtime ~500MB+ | ⚠️ Viable alternative if TRT export fails |

**Selected**: **Native TRT Engine** — optimal performance and memory on our fixed NVIDIA hardware.

**Fallback**: if any model has unsupported TRT ops (e.g., MinGRU in LiteSAM), use **Torch-TensorRT** for that specific model. Torch-TensorRT handles mixed TRT/PyTorch execution but requires the PyTorch runtime in memory.

### Component: TRT Engine Conversion Workflow

**LiteSAM conversion**:

1. Load the PyTorch model with trained weights
2. Reparameterize the MobileOne backbone (collapse multi-branch → single Conv2d+BN)
3. Export to ONNX: `torch.onnx.export(model, dummy_input, "litesam.onnx", opset_version=17)`
4. Verify with polygraphy: `polygraphy inspect model litesam.onnx`
5. Build the engine on the Jetson: `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16 --memPoolSize=workspace:2048`
6. Verify the engine: `trtexec --loadEngine=litesam.engine --fp16`

**XFeat conversion**:

1. Load the PyTorch model
2. Export to ONNX: `torch.onnx.export(model, dummy_input, "xfeat.onnx", opset_version=17)`
3. Build the engine on the Jetson: `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
4. Alternative: use the XFeatTensorRT C++ implementation directly

**INT8 quantization strategy** (optional, future optimization):

- MobileOne backbone (CNN): INT8-safe with calibration data
- TAIFormer (transformer attention): FP16 only — INT8 degrades accuracy
- XFeat: evaluate INT8 on actual UAV-satellite pairs before deploying
- Use nvidia-modelopt for calibration: `from modelopt.onnx.quantization import quantize`

### Component: TRT Python Inference Wrapper

Minimal wrapper class for TRT engine inference:

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates the CUDA context
import pycuda.driver as cuda


class TRTInference:
    def __init__(self, engine_path, stream):
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        with open(engine_path, 'rb') as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        self.stream = stream
        self._allocate_buffers()

    def _allocate_buffers(self):
        # Pre-allocate all GPU I/O buffers once — zero allocation during flight
        self.inputs = {}
        self.outputs = {}
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            shape = self.engine.get_tensor_shape(name)
            dtype = trt.nptype(self.engine.get_tensor_dtype(name))
            size = trt.volume(shape)
            device_mem = cuda.mem_alloc(size * np.dtype(dtype).itemsize)
            self.context.set_tensor_address(name, int(device_mem))
            mode = self.engine.get_tensor_mode(name)
            if mode == trt.TensorIOMode.INPUT:
                self.inputs[name] = (device_mem, shape, dtype)
            else:
                self.outputs[name] = (device_mem, shape, dtype)

    def infer_async(self, input_data):
        # Copy inputs and enqueue inference on the dedicated stream (Stream B)
        for name, data in input_data.items():
            cuda.memcpy_htod_async(self.inputs[name][0], data, self.stream)
        self.context.execute_async_v3(self.stream.handle)  # Python name for C++ enqueueV3

    def get_output(self):
        results = {}
        for name, (dev_mem, shape, dtype) in self.outputs.items():
            host_mem = np.empty(shape, dtype=dtype)
            cuda.memcpy_dtoh_async(host_mem, dev_mem, self.stream)
            results[name] = host_mem
        self.stream.synchronize()
        return results
```

Key design: the `infer_async()` + `get_output()` split enables pipelining — cuVSLAM runs on Stream A while satellite matching runs on Stream B.

### Component: Visual Odometry (UNCHANGED)

cuVSLAM — a native CUDA library, not affected by the TRT migration. Already optimal.

### Component: Satellite Image Matching (UPDATED runtime + fallback chain)

| Solution | Tools | Advantages | Limitations | Performance (est. Orin Nano Super TRT FP16) | Params | Fit |
|----------|-------|------------|-------------|---------------------------------------------|--------|-----|
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30=17.86m UAV-VisLoc), 6.31M params, smallest model | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary (if TRT export succeeds AND ≤200ms) |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT repo, 138 stars). Semi-dense. CVPR 2024. High accuracy. | 2.4x more params than LiteSAM. Requires an einsum→elementary-ops rewrite for TRT (documented in Coarse_LoFTR_TRT). | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python (or XFeatTensorRT C++) | Fastest. Proven TRT implementation. Lightweight. | General-purpose, not designed for the cross-view satellite-aerial gap (though the nadir-nadir gap is small). | Est. ~50-100ms | <5M | ✅ Speed fallback |

**Decision tree (day one, on the Orin Nano Super)**:

1. Clone the LiteSAM repo → reparameterize MobileOne → `torch.onnx.export()` → `polygraphy inspect`
2. If the ONNX export succeeds → `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
3. If MinGRU causes an ONNX/TRT failure → rewrite the MinGRU forward() as an unrolled 9-step loop → retry
4. If the rewrite fails or accuracy degrades → **switch to EfficientLoFTR TRT**:
   - Apply the Coarse_LoFTR_TRT adaptation techniques (einsum replacement, etc.)
   - Export to ONNX → trtexec --fp16
   - Benchmark at 640×480 and 1280px
5. Benchmark the winner: **if ≤200ms → use it. If >200ms but ≤300ms → acceptable (async on Stream B). If >300ms → use XFeat TRT**
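
Step 3's unrolled rewrite works because the MinGRU gates depend only on the input, so `z_t` and the candidate `h̃_t` are precomputed and the recurrence reduces to a plain 9-step loop of Mul/Add — all TRT-exportable ops. A minimal numpy sketch (the weight shapes are illustrative, not LiteSAM's actual layer sizes):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mingru_unrolled(x, Wz, bz, Wh, bh, seq_len=9):
    """Unrolled MinGRU forward over the 3x3 candidate window (seq_len=9).

    x: (seq_len, d_in) window features; Wz/Wh: (d_in, d_h); bz/bh: (d_h,)
    """
    z = sigmoid(x @ Wz + bz)        # (seq_len, d_h) gates — input-only
    h_tilde = x @ Wh + bh           # (seq_len, d_h) candidates — input-only
    h = np.zeros(Wz.shape[1])
    for t in range(seq_len):        # fixed-length loop, unrolled at export time
        h = (1.0 - z[t]) * h + z[t] * h_tilde[t]
    return h
```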

**EfficientLoFTR TRT adaptation** (from the Coarse_LoFTR_TRT repo, a proven workflow):

- Replace `torch.einsum()` with elementary ops (view, bmm, reshape, sum)
- Replace any TRT-incompatible high-level PyTorch functions
- Use the ONNX export path (requires less memory than Torch-TensorRT on an 8GB device)
- Knowledge distillation is available for further parameter reduction if needed
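
The einsum replacement in the first bullet can be illustrated in numpy. The `nlhd,nshd->nlsh` contraction below is an assumed LoFTR-style attention shape, not the exact expression from the repo:

```python
import numpy as np

def attn_scores_einsum(q, k):
    # TRT-unfriendly form: a single einsum over batch, query, key, head, dim
    return np.einsum("nlhd,nshd->nlsh", q, k)

def attn_scores_elementary(q, k):
    # Equivalent elementary-op form: transpose + reshape + batched matmul,
    # which exports cleanly to ONNX/TRT
    n, l, h, d = q.shape
    s = k.shape[1]
    qh = q.transpose(0, 2, 1, 3).reshape(n * h, l, d)        # (n*h, l, d)
    kh = k.transpose(0, 2, 1, 3).reshape(n * h, s, d)        # (n*h, s, d)
    scores = qh @ kh.transpose(0, 2, 1)                      # bmm → (n*h, l, s)
    return scores.reshape(n, h, l, s).transpose(0, 2, 3, 1)  # back to (n, l, s, h)
```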

### Component: Sensor Fusion (UNCHANGED)

ESKF — a CPU-based mathematical filter, not affected.

### Component: Flight Controller Integration (UNCHANGED)

pymavlink — not affected by the TRT migration.

### Component: Ground Station Telemetry (UNCHANGED)

MAVLink NAMED_VALUE_FLOAT — not affected.

### Component: Startup & Lifecycle (UPDATED)

**Updated startup sequence**:

1. Boot the Jetson → start the GPS-Denied service (systemd)
2. Connect to the flight controller via pymavlink on UART
3. Wait for a heartbeat from the flight controller
4. **Initialize the PyCUDA context**
5. **Load TRT engines**: litesam.engine + xfeat.engine via tensorrt.Runtime.deserialize_cuda_engine()
6. **Allocate GPU I/O buffers** for both models
7. **Create CUDA streams**: Stream A (cuVSLAM), Stream B (satellite matching)
8. Read GLOBAL_POSITION_INT → init the ESKF
9. Start cuVSLAM with the first camera frames
10. Begin the GPS_INPUT output loop at 5-10Hz
11. Preload satellite tiles within ±2km into RAM
12. System ready

**Engine load time**: ~1-3 seconds per engine (deserialization from the .engine file). One-time cost at startup.

### Component: Thermal Management (UNCHANGED)

Same adaptive pipeline. TRT engines are slightly more power-efficient than ONNX Runtime, but the difference is within noise.

### Component: Object Localization (UNCHANGED)

Not affected — a trigonometric calculation, no AI inference.

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

Unchanged from draft03. Native CUDA, not part of the TRT migration.

### 2. Native TRT Engine Inference (NEW)

All AI models run as pre-compiled TRT FP16 engines:

- Engine files built offline with trtexec (one-time per model version)
- Loaded at startup (~1-3s per engine)
- Inference via context.enqueue_v3() on the dedicated CUDA Stream B
- GPU buffers pre-allocated — zero runtime allocation during flight
- No ONNX Runtime dependency — no framework overhead

Memory advantage over ONNX Runtime TRT-EP: ~560-600MB saved (both models combined).
Latency advantage: eliminates the ONNX wrapper overhead and guarantees tensor core utilization.

### 3. CUDA Stream Pipelining (REFINED)

- Stream A: cuVSLAM VO for the current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: TRT engine inference for satellite matching (LiteSAM or XFeat, async)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management
- **NEW**: both cuVSLAM and the TRT engines use CUDA streams natively — no framework abstraction layer, direct GPU scheduling

### 4-7. (UNCHANGED from draft03)

Keyframe-based satellite matching, TensorRT FP16 optimization, proactive tile loading, and 5-10Hz GPS_INPUT output — all unchanged.

## Processing Time Budget (per frame, 333ms interval)

### Normal Frame (non-keyframe)

Unchanged from draft03 — cuVSLAM dominates at ~22ms total.

### Keyframe Satellite Matching (async, CUDA Stream B)

**Path A — LiteSAM TRT Engine FP16 at 1280px**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| Copy input to GPU buffer | <0.5ms | PyCUDA memcpy_htod_async |
| LiteSAM TRT Engine FP16 | ≤200ms | context.enqueue_v3(stream_B) |
| Copy output from GPU | <0.5ms | PyCUDA memcpy_dtoh_async |
| Geometric pose (RANSAC) | ~5ms | Homography |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async on Stream B |

**Path B — XFeat TRT Engine FP16**:

| Step | Time | Notes |
|------|------|-------|
| XFeat TRT Engine inference | ~50-80ms | context.enqueue_v3(stream_B) |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async on Stream B |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory (Native TRT) | Memory (ONNX RT TRT-EP) | Notes |
|-----------|---------------------|-------------------------|-------|
| OS + runtime | ~1.5GB | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | ~200-500MB | CUDA library + map |
| **LiteSAM TRT engine** | **~50-80MB** | **~330-360MB** | Native TRT vs TRT-EP. If LiteSAM fails: EfficientLoFTR ~100-150MB |
| **XFeat TRT engine** | **~30-50MB** | **~310-330MB** | Native TRT vs TRT-EP |
| Preloaded satellite tiles | ~200MB | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | ~20MB | |
| FastAPI (local IPC) | ~50MB | ~50MB | |
| ESKF + buffers | ~10MB | ~10MB | |
| ONNX Runtime framework | **0MB** | **~150MB** | Eliminated with native TRT |
| **Total** | **~2.1-2.9GB** | **~2.8-3.6GB** | |
| **% of 8GB** | **26-36%** | **35-45%** | |
| **Savings** | — | — | **~700MB saved with native TRT** |

## Confidence Scoring → GPS_INPUT Mapping

Unchanged from draft03.

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **LiteSAM MinGRU ops unsupported in TRT 10.3** | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification: ONNX export → polygraphy → trtexec. If MinGRU fails: (1) rewrite as an unrolled 9-step loop; (2) if it still fails: **switch to EfficientLoFTR TRT** (proven TRT path, Coarse_LoFTR_TRT, 15.05M params). XFeat TRT as speed fallback. |
| **TRT engine build OOM on 8GB Jetson** | LOW | Cannot build engines on the target device | Our models are small (6.31M LiteSAM, <5M XFeat), so OOM is unlikely. If it occurs: reduce --memPoolSize, or build on an identical Orin Nano module with more headroom. |
| **Engine incompatibility after a JetPack update** | MEDIUM | Must rebuild engines | Include an engine rebuild in the JetPack update procedure. Takes minutes per model. |
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink | Unchanged from draft03 |
| **cuVSLAM fails on low-texture terrain** | HIGH | Frequent tracking loss | Unchanged from draft03 |
| **Thermal throttling** | MEDIUM | Satellite matching budget blown | Unchanged from draft03 |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use a fallback matcher | Day-one benchmark. Fallback chain: EfficientLoFTR TRT (if ≤300ms) → XFeat TRT (if all >300ms) |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Unchanged from draft03 |

## Testing Strategy

### Integration / Functional Tests

All tests from draft03 unchanged, plus:

- **TRT engine load test**: verify litesam.engine and xfeat.engine load successfully on the Jetson Orin Nano Super
- **TRT inference correctness**: compare TRT engine output vs the PyTorch reference output (max L1 error < 0.01)
- **CUDA Stream B pipelining**: verify satellite matching on Stream B does not block cuVSLAM on Stream A
- **Pre-built engine validation**: verify engine files from the offline preparation work without a rebuild at runtime
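
The correctness gate above reduces to a one-line numpy comparison once both outputs are on the host. A sketch (the helper name is hypothetical; both arrays are assumed already copied from the GPU / exported from PyTorch):

```python
import numpy as np

def outputs_match(trt_out, torch_out, max_l1=0.01):
    """Pass when the maximum element-wise L1 error between a TRT engine's
    output and its PyTorch reference is below the threshold."""
    err = np.abs(trt_out.astype(np.float32) - torch_out.astype(np.float32))
    return float(err.max()) < max_l1
```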

### Non-Functional Tests

All tests from draft03 unchanged, plus:

- **TRT engine build time**: measure trtexec build time for LiteSAM and XFeat on the Orin Nano Super (expected: 1-5 minutes each)
- **TRT engine load time**: measure deserialization time (expected: 1-3 seconds each)
- **Memory comparison**: measure actual GPU memory with native TRT vs ONNX RT TRT-EP for both models
- **MinGRU TRT compatibility** (day-one blocker):
  1. Clone the LiteSAM repo, load pretrained weights
  2. Reparameterize the MobileOne backbone
  3. `torch.onnx.export(model, dummy, "litesam.onnx", opset_version=17)`
  4. `polygraphy inspect model litesam.onnx` — check for unsupported ops
  5. `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
  6. If step 3 or 5 fails on MinGRU: rewrite the MinGRU forward() as an unrolled loop and retry
  7. If it still fails: switch to EfficientLoFTR and apply the Coarse_LoFTR_TRT adaptation
  8. Compare TRT output vs the PyTorch reference (max L1 error < 0.01)
- **EfficientLoFTR TRT fallback benchmark** (if LiteSAM fails): apply the TRT adaptation from Coarse_LoFTR_TRT → ONNX → trtexec → measure latency at 640×480 and 1280px
- **Tensor core utilization**: verify with Nsight that the TRT engines use tensor cores (unlike ONNX RT CUDA EP)
|
||||
|
||||
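For step 6 of the MinGRU check: assuming the standard minGRU formulation (z_t = σ(W_z x_t), h̃_t = W_h x_t, h_t = (1 − z_t)·h_{t−1} + z_t·h̃_t; biases omitted here), writing the recurrence as an explicit fixed-length loop is what lets ONNX export trace it into plain ops instead of a parallel scan. A NumPy sketch of the same computation (the real rewrite would be the PyTorch forward()):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def min_gru_unrolled(x, w_z, w_h):
    """minGRU recurrence as an explicit per-step loop (x: (T, D); w_z, w_h: (D, D)).
    A fixed-length loop like this unrolls at export time into plain matmul,
    sigmoid, and elementwise ops, avoiding scan constructs TRT may reject."""
    h = np.zeros(x.shape[1])
    outs = []
    for x_t in x:
        z = sigmoid(x_t @ w_z)        # update gate (depends only on x_t)
        h_tilde = x_t @ w_h           # candidate state (no h-dependence in minGRU)
        h = (1.0 - z) * h + z * h_tilde
        outs.append(h)
    return np.stack(outs)
```

The unrolled form fixes the sequence length at export time, which matches the fixed image resolution the matcher runs at.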
## References

- ONNX Runtime Issue #24085 (Jetson Orin Nano tensor core bug): https://github.com/microsoft/onnxruntime/issues/24085
- ONNX Runtime Issue #20457 (TRT-EP memory overhead): https://github.com/microsoft/onnxruntime/issues/20457
- ONNX Runtime Issue #12083 (TRT-EP vs native TRT): https://github.com/microsoft/onnxruntime/issues/12083
- NVIDIA TensorRT 10 Python API: https://docs.nvidia.com/deeplearning/tensorrt/10.15.1/inference-library/python-api-docs.html
- TensorRT Best Practices: https://docs.nvidia.com/deeplearning/tensorrt/latest/performance/best-practices.html
- TensorRT engine hardware specificity: https://github.com/NVIDIA/TensorRT/issues/1920
- trtexec ONNX conversion: https://nvidia-jetson.piveral.com/jetson-orin-nano/how-to-convert-onnx-to-engine-on-jetson-orin-nano-dev-board/
- Torch-TensorRT JetPack 6.2: https://docs.pytorch.org/TensorRT/v2.10.0/getting_started/jetpack.html
- XFeatTensorRT: https://github.com/PranavNedunghat/XFeatTensorRT
- JetPack 6.2 Release Notes: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/index.html
- Jetson Orin Nano Super: https://developer.nvidia.com/blog/nvidia-jetson-orin-nano-developer-kit-gets-a-super-boost/
- DLA on Jetson Orin: https://developer.nvidia.com/blog/maximizing-deep-learning-performance-on-nvidia-jetson-orin-with-dla/
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR HuggingFace: https://huggingface.co/docs/transformers/en/model_doc/efficientloftr
- Coarse_LoFTR_TRT (TRT for embedded): https://github.com/Kolkir/Coarse_LoFTR_TRT
- Coarse_LoFTR_TRT paper: https://ar5iv.labs.arxiv.org/html/2202.00770
- LoFTR_TRT: https://github.com/Kolkir/LoFTR_TRT
- minGRU ("Were RNNs All We Needed?"): https://huggingface.co/papers/2410.01201
- minGRU PyTorch implementation: https://github.com/lucidrains/minGRU-pytorch
- LiteSAM paper (MinGRU details, Eqs 12-16): https://www.mdpi.com/2072-4292/17/19/3349
- DALGlue (UAV feature matching, 2025): https://www.nature.com/articles/s41598-025-21602-5
- All references from solution_draft03.md

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts (this assessment): `_docs/00_research/trt_engine_migration/`
- Previous research: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`

@@ -0,0 +1,257 @@

# Tech Stack Evaluation

## Requirements Summary

### Functional
- GPS-denied visual navigation for fixed-wing UAV
- Frame-center GPS estimation via VO + satellite matching + IMU fusion
- Object-center GPS via geometric projection
- Real-time streaming via REST API + SSE
- Disconnected route segment handling
- User-input fallback for unresolvable frames

### Non-Functional
- <400ms per-frame processing (camera @ ~3fps)
- <50m accuracy for 80% of frames, <20m for 60%
- <8GB total memory (CPU+GPU shared pool)
- Up to 3000 frames per flight session
- Image Registration Rate >95% (normal segments)

### Hardware Constraints
- **Jetson Orin Nano Super** (8GB LPDDR5, 1024 CUDA cores, 67 TOPS INT8)
- **JetPack 6.2.2**: CUDA 12.6.10, TensorRT 10.3.0, cuDNN 9.3
- ARM64 (aarch64) architecture
- No internet connectivity during flight

## Technology Evaluation

### Platform & OS

| Option | Version | Score (1-5) | Notes |
|--------|---------|-------------|-------|
| **JetPack 6.2.2 (L4T)** | Ubuntu 22.04 based | **5** | Only supported OS for Orin Nano Super. Includes CUDA 12.6, TensorRT 10.3, cuDNN 9.3 |

**Selected**: JetPack 6.2.2 — no alternative.

### Primary Language

| Option | Fitness | Maturity | Perf on Jetson | Ecosystem | Score |
|--------|---------|----------|----------------|-----------|-------|
| **Python 3.10+** | 5 | 5 | 4 | 5 | **4.8** |
| C++ | 5 | 5 | 5 | 3 | 4.5 |
| Rust | 3 | 3 | 5 | 2 | 3.3 |

**Selected**: **Python 3.10+** as primary language.
- cuVSLAM provides Python bindings (PyCuVSLAM v15.0.0)
- TensorRT has a Python API
- FastAPI is Python-native
- OpenCV has full Python+CUDA bindings
- Performance-critical paths offloaded to CUDA via cuVSLAM/TensorRT — Python is glue code only
- C++ reserved for the custom ESKF if NumPy proves too slow (unlikely for a 16-state filter at 100Hz)

### Visual Odometry

| Option | Version | FPS on Orin Nano | Memory | License | Score |
|--------|---------|------------------|--------|---------|-------|
| **cuVSLAM (PyCuVSLAM)** | v15.0.0 (Mar 2026) | 116fps @ 720p | ~200-300MB | Free (NVIDIA, closed-source) | **5** |
| XFeat frame-to-frame | TensorRT engine | ~30-50ms/frame | ~50MB | MIT | 3.5 |
| ORB-SLAM3 | v1.0 | ~30fps | ~300MB | GPLv3 | 2.5 |

**Selected**: **PyCuVSLAM v15.0.0**
- 116fps on Orin Nano 8G at 720p (verified via Intermodalics benchmark)
- Mono + IMU mode natively supported
- Auto IMU fallback on tracking loss
- Pre-built aarch64 wheel: `pip install -e bin/aarch64`
- Loop closure built-in

**Risk**: Closed-source; nadir-only camera not explicitly tested. **Fallback**: XFeat frame-to-frame matching.

### Satellite Image Matching (Benchmark-Driven Selection)

**Day-one benchmark decides between two candidates:**

| Option | Params | Accuracy (UAV-VisLoc) | Est. Time on Orin Nano | License | Score |
|--------|--------|----------------------|------------------------|---------|-------|
| **LiteSAM (opt)** | 6.31M | RMSE@30 = 17.86m | ~300-500ms @ 480px (estimated) | Open-source | **4** (if fast enough) |
| **XFeat semi-dense** | ~5M | Not benchmarked on UAV-VisLoc | ~50-100ms | MIT | **4** (if LiteSAM too slow) |

**Decision rule**:
1. Export LiteSAM (opt) to TensorRT FP16 on Orin Nano Super
2. Benchmark at 480px, 640px, 800px
3. If ≤400ms at 480px → LiteSAM
4. If >400ms → **abandon LiteSAM, XFeat is primary**

| Requirement | LiteSAM (opt) | XFeat semi-dense |
|-------------|---------------|------------------|
| PyTorch → ONNX → TensorRT export | Required | Required |
| TensorRT FP16 engine | 6.31M params, ~25MB engine | ~5M params, ~20MB engine |
| Input preprocessing | Resize to 480px, normalize | Resize to 640px, normalize |
| Matching pipeline | End-to-end (detect + match + refine) | Detect → KNN match → geometric verify |
| Cross-view robustness | Designed for satellite-aerial gap | General-purpose, less robust |

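The decision rule above is mechanical enough to encode directly, so the benchmark harness can emit the choice without a human in the loop. A sketch (the function name and threshold default are illustrative):

```python
def select_matcher(litesam_ms_at_480px: float, budget_ms: float = 400.0) -> str:
    """Day-one go/no-go: keep LiteSAM only if its TensorRT FP16 latency at
    480px fits the per-frame budget; otherwise XFeat becomes primary."""
    return "litesam" if litesam_ms_at_480px <= budget_ms else "xfeat"

# e.g. a measured 520 ms at 480px forces the XFeat path
choice = select_matcher(520.0)
```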
### Sensor Fusion

| Option | Complexity | Accuracy | Compute @ 100Hz | Score |
|--------|-----------|----------|-----------------|-------|
| **ESKF (custom)** | Low | Good | <1ms/step | **5** |
| Hybrid ESKF/UKF | Medium | ~49% lower error (reported) | ~2-3ms/step | 3.5 |
| GTSAM Factor Graph | High | Best | ~10-50ms/step | 2 |

**Selected**: **Custom ESKF in Python (NumPy/SciPy)**
- 16-state vector, well within NumPy capability
- FilterPy (v1.4.5, MIT) as reference/fallback, but a custom implementation is preferred for tighter control
- If the 100Hz IMU prediction step proves slow in Python: rewrite it as a Cython or C extension (~1 day effort)

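To ground the <1ms/step claim: the dominant per-IMU-sample cost is the covariance propagation P ← F P Fᵀ + Q, a pair of 16×16 matrix products for a 16-state filter. A minimal NumPy sketch (matrices are placeholders, not the real IMU Jacobians):

```python
import numpy as np

def eskf_predict_cov(P: np.ndarray, F: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """One ESKF covariance prediction step: P <- F P F^T + Q."""
    return F @ P @ F.T + Q

n = 16                       # error-state dimension from the text above
P = np.eye(n) * 0.01         # error covariance
F = np.eye(n)                # state-transition Jacobian (identity placeholder)
Q = np.eye(n) * 1e-6         # process-noise covariance
for _ in range(100):         # one second of IMU prediction at 100 Hz
    P = eskf_predict_cov(P, F, Q)
```

At this size the matmuls are microsecond-scale in NumPy, which is why the Cython/C fallback is considered unlikely to be needed.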
### Image Preprocessing

| Option | Tool | Time on Orin Nano | Notes | Score |
|--------|------|-------------------|-------|-------|
| **OpenCV CUDA resize** | cv2.cuda.resize | ~2-3ms (pre-allocated) | Must build OpenCV with CUDA from source. Pre-allocate GPU mats to avoid allocation overhead | **4** |
| NVIDIA VPI resize | VPI 3.2 | ~1-2ms | Part of JetPack, potentially faster | 4 |
| CPU resize (OpenCV) | cv2.resize | ~5-10ms | No GPU needed, simpler | 3 |

**Selected**: **OpenCV CUDA** (pre-allocated GPU memory) or **VPI 3.2** (whichever is faster in benchmark). Both available in JetPack 6.2.
- Must build OpenCV from source with `CUDA_ARCH_BIN=8.7` for the Orin Nano Ampere architecture
- Alternative: VPI 3.2 is pre-installed in JetPack 6.2, no build step needed

### API & Streaming Framework

| Option | Version | Async Support | SSE Support | Score |
|--------|---------|--------------|-------------|-------|
| **FastAPI + sse-starlette** | FastAPI 0.115+, sse-starlette 3.3.2 | Native async/await | EventSourceResponse with auto-disconnect | **5** |
| Flask + flask-sse | Flask 3.x | Limited | Redis dependency | 2 |
| Raw aiohttp | aiohttp 3.x | Full | Manual SSE implementation | 3 |

**Selected**: **FastAPI + sse-starlette v3.3.2**
- sse-starlette: 108M downloads/month, BSD-3 license, production-stable
- Auto-generated OpenAPI docs
- Native async for non-blocking VO + satellite pipeline
- Uvicorn as ASGI server

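For reference, the wire format that EventSourceResponse produces from yielded events is plain `text/event-stream` framing. A sketch of one message (the event name and payload fields are illustrative, not the project's final schema):

```python
import json

def sse_event(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Events message as it appears on the wire:
    an 'event:' line, a 'data:' line, and a blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

msg = sse_event("position", {"lat": 48.5, "lon": 37.9, "confidence": 0.82})
```

A browser `EventSource` (or any SSE client) subscribed to the stream dispatches this as a `position` event whose data field is the JSON string.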
### Satellite Tile Storage & Indexing

| Option | Complexity | Lookup Speed | Score |
|--------|-----------|-------------|-------|
| **GeoHash-indexed directory** | Low | O(1) hash lookup | **5** |
| SQLite + spatial index | Medium | O(log n) | 4 |
| PostGIS | High | O(log n) | 2 (overkill) |

**Selected**: **GeoHash-indexed directory structure**
- Pre-flight: download tiles, store as `{geohash}/{zoom}_{x}_{y}.jpg` + `{geohash}/{zoom}_{x}_{y}_resized.jpg`
- Runtime: compute geohash from ESKF position → direct directory lookup
- Metadata in JSON sidecar files
- No database dependency on the Jetson during flight

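At runtime the pygeohash dependency handles encoding (`pygeohash.encode(lat, lon, precision)`); a pure-Python sketch of the standard algorithm and of the path layout above, for illustration (the precision and `tile_path` helper are assumptions, not settled design):

```python
from pathlib import Path

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 5) -> str:
    """Standard geohash: alternately bisect lon/lat, 5 bits per base32 char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, nbits, even, out = 0, 0, True, []
    while len(out) < precision:
        if even:                       # even bit: longitude
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:                          # odd bit: latitude
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        nbits += 1
        even = not even
        if nbits == 5:                 # flush a base32 character
            out.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(out)

def tile_path(root: str, lat: float, lon: float, zoom: int, x: int, y: int) -> Path:
    """Directory lookup matching the {geohash}/{zoom}_{x}_{y}.jpg scheme."""
    return Path(root) / geohash(lat, lon) / f"{zoom}_{x}_{y}.jpg"
```

The O(1) claim follows from the lookup being a single hash computation plus one directory access, with no spatial query.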
### Satellite Tile Provider

| Provider | Max Zoom | GSD | Pricing | Eastern Ukraine Coverage | Score |
|----------|----------|-----|---------|--------------------------|-------|
| **Google Maps Tile API** | 18-19 | ~0.3-0.5 m/px | 100K tiles free/month, then $0.48/1K | Partial (conflict zone gaps) | **4** |
| Bing Maps | 18-19 | ~0.3-0.5 m/px | 125K free/year (basic) | Similar | 3.5 |
| Mapbox Satellite | 18-19 | ~0.5 m/px | 200K free/month | Similar | 3.5 |

**Selected**: **Google Maps Tile API** (per restrictions.md). The 100K free tiles/month cover a substantial operational area at zoom 19 (a 256 px zoom-19 tile spans roughly 50 m per side at ~48°N). For larger operational areas, costs are manageable at $0.48/1K tiles.

### Output Format

| Format | Standard | Tooling | Score |
|--------|----------|---------|-------|
| **GeoJSON** | RFC 7946 | Universal GIS support | **5** |
| CSV (lat, lon, confidence) | De facto | Simple, lightweight | 4 |

**Selected**: **GeoJSON** as primary, CSV as export option. Per AC: WGS84 coordinates.

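One detail worth pinning down early: RFC 7946 orders coordinates as [lon, lat], the reverse of the lat/lon convention used elsewhere in this document. A sketch of a frame-center estimate as a GeoJSON Feature (property names are illustrative, not the final schema; the `geojson` package from the dependency list can replace the plain dicts):

```python
import json

def frame_feature(lat: float, lon: float, confidence: float, frame_id: int) -> dict:
    """One frame-center GPS estimate as an RFC 7946 Feature (WGS84)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},  # lon first!
        "properties": {"confidence": confidence, "frame_id": frame_id},
    }

fc = {"type": "FeatureCollection", "features": [frame_feature(48.5, 37.9, 0.82, 17)]}
doc = json.dumps(fc)
```

The same Feature objects can be yielded over the SSE stream and collected into a FeatureCollection for the end-of-flight export.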
## Tech Stack Summary

```
┌────────────────────────────────────────────────────┐
│ HARDWARE: Jetson Orin Nano Super 8GB               │
│ OS: JetPack 6.2.2 (L4T / Ubuntu 22.04)             │
│ CUDA 12.6.10 / TensorRT 10.3.0 / cuDNN 9.3         │
├────────────────────────────────────────────────────┤
│ LANGUAGE: Python 3.10+                             │
│ FRAMEWORK: FastAPI + sse-starlette 3.3.2           │
│ SERVER: Uvicorn (ASGI)                             │
├────────────────────────────────────────────────────┤
│ VISUAL ODOMETRY: PyCuVSLAM v15.0.0                 │
│ SATELLITE MATCH: LiteSAM(opt) or XFeat (benchmark) │
│ SENSOR FUSION: Custom ESKF (NumPy/SciPy)           │
│ PREPROCESSING: OpenCV CUDA or VPI 3.2              │
│ INFERENCE: TensorRT 10.3.0 (FP16)                  │
├────────────────────────────────────────────────────┤
│ TILE PROVIDER: Google Maps Tile API                │
│ TILE STORAGE: GeoHash-indexed directory            │
│ OUTPUT: GeoJSON (WGS84) via SSE stream             │
└────────────────────────────────────────────────────┘
```

## Dependency List

### Python Packages (pip)

| Package | Version | Purpose |
|---------|---------|---------|
| pycuvslam | v15.0.0 (aarch64 wheel) | Visual odometry |
| fastapi | >=0.115 | REST API framework |
| sse-starlette | >=3.3.2 | SSE streaming |
| uvicorn | >=0.30 | ASGI server |
| numpy | >=1.26 | ESKF math, array ops |
| scipy | >=1.12 | Rotation matrices, spatial transforms |
| opencv-python (CUDA build) | >=4.8 | Image preprocessing (must build from source with CUDA) |
| torch (aarch64) | >=2.3 (JetPack-compatible) | LiteSAM model loading (if selected) |
| tensorrt | 10.3.0 (JetPack bundled) | Inference engine |
| pycuda | >=2024.1 | CUDA stream management |
| geojson | >=3.1 | GeoJSON output formatting |
| pygeohash | >=1.2 | GeoHash tile indexing |

### System Dependencies (JetPack 6.2.2)

| Component | Version | Notes |
|-----------|---------|-------|
| CUDA Toolkit | 12.6.10 | Pre-installed |
| TensorRT | 10.3.0 | Pre-installed |
| cuDNN | 9.3 | Pre-installed |
| VPI | 3.2 | Pre-installed, alternative to OpenCV CUDA for resize |
| cuVSLAM runtime | Bundled with PyCuVSLAM wheel | |

### Offline Preprocessing Tools (developer machine, not Jetson)

| Tool | Purpose |
|------|---------|
| Python 3.10+ | Tile download script |
| Google Maps Tile API key | Satellite tile access |
| torch + LiteSAM weights | Feature pre-extraction (if LiteSAM selected) |
| trtexec (TensorRT) | Model export to TensorRT engine |

## Risk Assessment

| Technology | Risk | Likelihood | Impact | Mitigation |
|-----------|------|-----------|--------|------------|
| cuVSLAM | Closed-source, nadir camera untested | Medium | High | XFeat frame-to-frame as open-source fallback |
| LiteSAM | May exceed 400ms on Orin Nano Super | High | High | **Abandon for XFeat** — day-one benchmark is go/no-go |
| OpenCV CUDA build | Build complexity on Jetson, CUDA arch compatibility | Medium | Low | VPI 3.2 as drop-in alternative (pre-installed) |
| Google Maps Tile API | Conflict zone coverage gaps, EEA restrictions | Medium | Medium | Test tile availability for operational area pre-flight; alternative providers (Bing, Mapbox) |
| Custom ESKF | Implementation bugs, tuning effort | Low | Medium | FilterPy v1.4.5 as reference; well-understood algorithm |
| Python GIL | Concurrent VO + satellite matching contention | Low | Low | CUDA operations release GIL; use asyncio + threading for I/O |

## Learning Requirements

| Technology | Team Expertise Needed | Ramp-up Time |
|-----------|----------------------|--------------|
| PyCuVSLAM | SLAM concepts, Python API, camera calibration | 2-3 days |
| TensorRT model export | ONNX export, trtexec, FP16 optimization | 2-3 days |
| LiteSAM architecture | Transformer-based matching (if selected) | 1-2 days |
| XFeat | Feature detection/matching concepts | 1 day |
| ESKF | Kalman filtering, quaternion math, multi-rate fusion | 3-5 days |
| FastAPI + SSE | Async Python, ASGI, SSE protocol | 1 day |
| GeoHash spatial indexing | Geospatial concepts | 0.5 days |
| Jetson deployment | JetPack, power modes, thermal management | 2-3 days |

## Development Environment

| Environment | Purpose | Setup |
|-------------|---------|-------|
| **Developer machine** (x86_64, GPU) | Development, unit testing, model export | Docker with CUDA + TensorRT |
| **Jetson Orin Nano Super** | Integration testing, benchmarking, deployment | JetPack 6.2.2 flashed, SSH access |

Code should be developed and unit-tested on x86_64, then deployed to the Jetson for integration and performance testing. cuVSLAM ships aarch64-only, and TensorRT engines are built per-device, so both must be mocked in x86_64 tests.

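Mocking the aarch64-only modules on x86_64 can be done by registering a stub before the code under test imports them. A sketch (`pycuvslam` names the wheel from the dependency list; the `Tracker` attribute is illustrative, not the real API surface):

```python
import sys
import types
from unittest import mock

# Register a stand-in for the aarch64-only wheel so x86_64 unit tests can
# import modules that do `import pycuvslam` at the top level.
stub = types.ModuleType("pycuvslam")
stub.Tracker = mock.MagicMock(name="pycuvslam.Tracker")  # hypothetical class
sys.modules["pycuvslam"] = stub

import pycuvslam  # resolves to the stub on any architecture

tracker = pycuvslam.Tracker()  # tests can now assert on calls made to it
```

In a pytest suite the same registration would typically live in `conftest.py` so it runs before any test module is collected.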