mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-27 16:46:36 +00:00
# Security Analysis

## Operational Context

This system runs on a UAV operating in a **conflict zone** (eastern Ukraine). The UAV could be shot down and physically captured. GPS denial/spoofing is the premise. The Jetson Orin Nano stores satellite imagery, flight plans, and captured photos. Security must assume the worst case: **physical access by an adversary**.

## Threat Model

### Asset Inventory

| Asset | Sensitivity | Location | Notes |
|-------|-------------|----------|-------|
| Captured camera imagery | HIGH | Jetson storage | Reconnaissance data — reveals what was surveyed |
| Satellite tile cache | MEDIUM | Jetson storage | Reveals operational area and areas of interest |
| Flight plan / route | HIGH | Jetson memory + storage | Reveals mission objectives and launch/landing sites |
| Computed GPS positions | HIGH | Jetson memory, SSE stream | Real-time position data of UAV and surveyed targets |
| Google Maps API key | MEDIUM | Offline prep machine only | Used pre-flight, NOT stored on Jetson |
| TensorRT model weights | LOW | Jetson storage | LiteSAM/XFeat — publicly available models |
| cuVSLAM binary | LOW | Jetson storage | NVIDIA proprietary but freely distributed |
| IMU calibration data | LOW | Jetson storage | Device-specific calibration |
| System configuration | MEDIUM | Jetson storage | API endpoints, tile paths, fusion parameters |

### Threat Actors

| Actor | Capability | Motivation | Likelihood |
|-------|------------|------------|------------|
| **Adversary military (physical capture)** | Full physical access after UAV loss | Extract intelligence: imagery, flight plans, operational area | HIGH |
| **Electronic warfare unit** | GPS spoofing/jamming, RF jamming | Disrupt navigation, force UAV off course | HIGH (GPS denial is the premise) |
| **Network attacker (ground station link)** | Intercept/inject on UAV-to-ground comms | Steal position data, inject false commands | MEDIUM |
| **Insider / rogue operator** | Authorized access to system | Data exfiltration, mission sabotage | LOW |
| **Supply chain attacker** | Tampered satellite tiles or model weights | Feed corrupted reference data → position errors | LOW |

### Attack Vectors

| Vector | Target Asset | Actor | Impact | Likelihood |
|--------|--------------|-------|--------|------------|
| **Physical extraction of storage** | All stored data | Adversary (capture) | Full intelligence compromise | HIGH |
| **GPS spoofing** | Position estimate | EW unit | Already mitigated — system is GPS-denied by design | N/A |
| **IMU acoustic injection** | IMU data → ESKF | EW unit | Drift injection, subtle position errors | LOW |
| **Camera blinding/spoofing** | VO + satellite matching | EW unit | VO failure, incorrect satellite matches | LOW |
| **Adversarial ground patterns** | Satellite matching | Adversary | Physical patches on ground fool feature matching | VERY LOW |
| **SSE stream interception** | Position data | Network attacker | Real-time position leak | MEDIUM |
| **API command injection** | Flight session control | Network attacker | Start/stop/manipulate sessions | MEDIUM |
| **Corrupted satellite tiles** | Satellite matching | Supply chain | Systematic position errors | LOW |
| **Model weight tampering** | Matching accuracy | Supply chain | Degraded matching → higher drift | LOW |

## Per-Component Security Requirements and Controls

### 1. Data at Rest (Jetson Storage)

| Requirement | Risk Level | Control | Implementation |
|-------------|------------|---------|----------------|
| Protect captured imagery from extraction after capture | CRITICAL | Full-disk encryption (LUKS) | JetPack LUKS support with `ENC_ROOTFS=1`. Use OP-TEE Trusted Application for key management. Modify `luks-srv` to NOT auto-decrypt — require hardware token or secure erase trigger |
| Protect satellite tiles and flight plans | HIGH | Same LUKS encryption | Included in full-disk encryption scope |
| Enable rapid secure erase on capture/crash | CRITICAL | Tamper-triggered wipe | Hardware dead-man switch: if UAV telemetry lost for N seconds OR accelerometer detects crash impact → trigger `cryptsetup luksErase` on all LUKS volumes. Destroys key material in <1 second — data becomes unrecoverable |
| Prevent cold-boot key extraction | HIGH | Minimize key residency in RAM | ESKF state and position history cleared from memory when session ends. Avoid writing position logs to disk unless encrypted |

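The tamper-triggered wipe above leaves the trigger policy implicit. Below is a minimal sketch of that decision logic; the class name and thresholds (`TELEMETRY_TIMEOUT_S`, `IMPACT_THRESHOLD_G`) are assumptions that would need tuning against the false-trigger risk noted under Key Security Risks.

```python
import subprocess
import time

# Assumed thresholds -- real values need flight testing to avoid
# premature erase on temporary signal loss.
TELEMETRY_TIMEOUT_S = 30.0   # seconds without a ground-station heartbeat
IMPACT_THRESHOLD_G = 16.0    # accelerometer magnitude indicating a crash

class DeadManSwitch:
    """Decides when to trigger key erase on the LUKS volumes."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._last_heartbeat = now()
        self._triggered = False

    def heartbeat(self):
        """Called whenever a ground-station telemetry packet arrives."""
        self._last_heartbeat = self._now()

    def check(self, accel_g):
        """Returns True (and latches) once the erase condition is met."""
        if self._triggered:
            return True
        telemetry_lost = self._now() - self._last_heartbeat > TELEMETRY_TIMEOUT_S
        impact = accel_g > IMPACT_THRESHOLD_G
        if telemetry_lost or impact:
            self._triggered = True
        return self._triggered

def secure_erase(devices):
    """Destroy LUKS key material; data becomes unrecoverable."""
    for dev in devices:
        subprocess.run(["cryptsetup", "luksErase", "--batch-mode", dev],
                       check=True)
```

The latch is deliberate: once the condition is met, a later heartbeat must not cancel the erase, since an adversary controls the RF environment after capture.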
### 2. Secure Boot

| Requirement | Risk Level | Control | Implementation |
|-------------|------------|---------|----------------|
| Prevent unauthorized code execution | HIGH | NVIDIA Secure Boot with PKC fuse burning | Burn `SecurityMode` fuse (odm_production_mode=0x1) on production Jetsons. Sign all boot images with PKC key pair. Generate keys via HSM |
| Prevent firmware rollback | MEDIUM | Ratchet fuses | Configure anti-rollback fuses in fuse configuration XML |
| Debug port lockdown | HIGH | Disable JTAG/debug after production | Burn debug-disable fuses. Irreversible — production units only |

### 3. API & Communication (FastAPI + SSE)

| Requirement | Risk Level | Control | Implementation |
|-------------|------------|---------|----------------|
| Authenticate API clients | HIGH | JWT bearer token | Pre-shared secret between ground station and Jetson. Generate JWT at session start. Short expiry (flight duration). `HTTPBearer` scheme in FastAPI |
| Encrypt SSE stream | HIGH | TLS 1.3 | Uvicorn with TLS certificate (self-signed for field use, pre-installed on ground station). All SSE position data encrypted in transit |
| Prevent unauthorized session control | HIGH | JWT + endpoint authorization | Session start/stop/anchor endpoints require valid JWT. Rate-limit via `slowapi` |
| Prevent replay attacks | MEDIUM | JWT `exp` + `jti` claims | Token expiry per-flight. Unique token ID (`jti`) tracked to prevent reuse |
| Limit API surface | MEDIUM | Minimal endpoint exposure | Only expose: POST /sessions, GET /sessions/{id}/stream (SSE), POST /sessions/{id}/anchor, DELETE /sessions/{id}. No admin/debug endpoints in production |

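The token scheme above can be sketched with the standard library alone. This is an illustration of the `exp`/`jti` mechanics, not the production code: a real deployment would use PyJWT behind FastAPI's `HTTPBearer` dependency, and the function names here (`issue_token`, `verify_token`, `accept_session_token`) are ours.

```python
import base64
import hashlib
import hmac
import json
import os
import time

def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def issue_token(secret: bytes, flight_duration_s: int, now=time.time) -> str:
    """Issue one HS256 JWT per flight session with exp and a unique jti."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({
        "exp": int(now()) + flight_duration_s,
        "jti": os.urandom(8).hex(),          # unique token ID
    }).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64(sig)}"

def verify_token(secret: bytes, token: str, now=time.time) -> bool:
    """Check the HMAC signature and expiry."""
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return False
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64(expected), sig):
        return False
    return json.loads(_b64decode(payload))["exp"] >= now()

_used_jti = set()

def accept_session_token(secret: bytes, token: str, now=time.time) -> bool:
    """Session-start check: valid token AND a jti that never started a session."""
    if not verify_token(secret, token, now):
        return False
    jti = json.loads(_b64decode(token.split(".")[1]))["jti"]
    if jti in _used_jti:
        return False
    _used_jti.add(jti)
    return True
```

Tracking `jti` at session start (POST /sessions) matches the one-JWT-per-flight model: the same token can authenticate many requests within its flight, but can never start a second session.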
### 4. Visual Odometry (cuVSLAM)

| Requirement | Risk Level | Control | Implementation |
|-------------|------------|---------|----------------|
| Detect camera feed tampering | LOW | Sanity checks on frame consistency | If consecutive frames show implausible motion (>500m displacement at 3fps), flag as suspicious. ESKF covariance spike triggers satellite re-localization |
| Protect against VO poisoning | LOW | Cross-validate VO with IMU | ESKF fusion inherently cross-validates: IMU and VO disagreement raises covariance, triggers satellite matching. No single sensor can silently corrupt position |

### 5. Satellite Image Matching

| Requirement | Risk Level | Control | Implementation |
|-------------|------------|---------|----------------|
| Verify tile integrity | MEDIUM | SHA-256 checksums per tile | During offline preprocessing: compute SHA-256 for each tile pair. Store checksums in signed manifest. At runtime: verify checksum on tile load |
| Prevent adversarial tile injection | MEDIUM | Signed tile manifest | Offline tool signs manifest with private key. Jetson verifies signature with embedded public key before accepting tile set |
| Detect satellite match outliers | MEDIUM | RANSAC inlier ratio threshold | If RANSAC inlier ratio <30%, reject match as unreliable. ESKF treats as no-measurement rather than bad measurement |
| Protect against ground-based adversarial patterns | VERY LOW | Multi-tile consensus | Match against multiple overlapping tiles. Physical adversarial patches affect local area — consensus voting across tiles detects anomalies |

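The manifest workflow above (offline build, boot-time signature check, per-tile check on load) can be sketched as follows. This uses an HMAC tag where the table specifies an asymmetric signature with an embedded public key; treat it as a stand-in for the signing step, and the function names as assumptions.

```python
import hashlib
import hmac
import json

def build_manifest(tiles: dict, signing_key: bytes) -> str:
    """Offline step: SHA-256 per tile, then authenticate the whole manifest.
    `tiles` maps tile ID -> raw tile bytes."""
    checksums = {tid: hashlib.sha256(data).hexdigest()
                 for tid, data in sorted(tiles.items())}
    body = json.dumps(checksums, sort_keys=True)
    tag = hmac.new(signing_key, body.encode(), hashlib.sha256).hexdigest()
    return json.dumps({"tiles": checksums, "sig": tag})

def verify_manifest(manifest_json: str, signing_key: bytes) -> dict:
    """On-Jetson step: check the manifest tag before trusting any checksum."""
    doc = json.loads(manifest_json)
    body = json.dumps(doc["tiles"], sort_keys=True)
    expected = hmac.new(signing_key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, doc["sig"]):
        raise ValueError("manifest signature mismatch -- reject tile set")
    return doc["tiles"]

def verify_tile(tile_id: str, data: bytes, checksums: dict) -> bool:
    """Runtime step: verify a tile's checksum when it is loaded."""
    return hmac.compare_digest(checksums.get(tile_id, ""),
                               hashlib.sha256(data).hexdigest())
```

In the OP-TEE design described later, the signing/verification key would live in the EKB and the check would run inside a custom TA rather than in normal-world Python.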
### 6. Sensor Fusion (ESKF)

| Requirement | Risk Level | Control | Implementation |
|-------------|------------|---------|----------------|
| Prevent single-sensor corruption of position | HIGH | Adaptive noise + outlier rejection | Mahalanobis distance test on each measurement. Reject updates >5σ from predicted state. No single measurement can cause >50m position jump |
| Detect systematic drift | MEDIUM | Satellite matching rate monitoring | If satellite matches consistently disagree with VO by >100m, flag integrity warning to operator |
| Protect fusion state | LOW | In-memory only, no persistence | ESKF state never written to disk. Lost on power-off — no forensic recovery |

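The 5σ Mahalanobis gate in the table reduces, for a scalar measurement, to a one-line test on the normalized innovation. A minimal sketch (scalar Kalman update, function names ours) showing how a gated measurement is treated as "no measurement" rather than "bad measurement":

```python
GATE_SIGMA = 5.0   # reject measurements more than 5 sigma from prediction

def mahalanobis_gate(innovation, innovation_var, gate=GATE_SIGMA):
    """Accept a scalar measurement only if its normalized innovation
    (Mahalanobis distance) is within the gate.

    innovation     = z - H x_predicted   (measurement residual)
    innovation_var = H P H^T + R         (predicted residual variance)
    """
    d = abs(innovation) / innovation_var ** 0.5
    return d <= gate

def fuse_position(x, P, z, R):
    """One scalar Kalman update with outlier rejection (H = 1 for a
    direct position measurement)."""
    innovation = z - x
    S = P + R
    if not mahalanobis_gate(innovation, S):
        return x, P                 # reject: state and covariance unchanged
    K = P / S
    return x + K * innovation, (1.0 - K) * P
```

The real ESKF applies the same test on the full innovation covariance matrix, but the behavior on rejection (skip the update entirely) is the point being specified here.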
### 7. Offline Preprocessing (Developer Machine)

| Requirement | Risk Level | Control | Implementation |
|-------------|------------|---------|----------------|
| Protect Google Maps API key | MEDIUM | Environment variable, never in code | `.env` file excluded from version control. API key used only on developer machine, never deployed to Jetson |
| Validate downloaded tiles | LOW | Source verification | Download only from Google Maps Tile API via HTTPS. Verify TLS certificate chain |
| Secure tile transfer to Jetson | MEDIUM | Signed + encrypted transfer | Transfer tile set + signed manifest via encrypted channel (SCP/SFTP). Verify manifest signature on Jetson before accepting |

## Security Controls Summary

### Authentication & Authorization

- **Mechanism**: Pre-shared JWT secret between ground station and Jetson
- **Scope**: All API endpoints require valid JWT bearer token
- **Session model**: One JWT per flight session, expires at session end
- **No user management on Jetson** — single-operator system, auth is device-to-device

### Data Protection

| State | Protection | Tool |
|-------|------------|------|
| At rest (Jetson storage) | LUKS full-disk encryption | JetPack LUKS + OP-TEE |
| In transit (SSE stream) | TLS 1.3 | Uvicorn SSL |
| In memory (ESKF state) | No persistence, cleared on session end | Application logic |
| On capture (emergency) | Tamper-triggered LUKS key erase | Hardware dead-man switch + `cryptsetup luksErase` |

### Secure Communication

- Ground station ↔ Jetson: TLS 1.3 (self-signed cert, pre-installed)
- No internet connectivity during flight — no external attack surface
- RF link security is out of scope (handled by UAV communication system)

### Logging & Monitoring

| What | Where | Retention |
|------|-------|-----------|
| API access logs (request count, errors) | In-memory ring buffer | Current session only, not persisted |
| Security events (auth failures, integrity warnings) | In-memory + SSE alert to operator | Current session only |
| Position history | In-memory for refinement, SSE to ground station | NOT persisted on Jetson after session end |
| Crash/tamper events | Trigger secure erase, no logging | N/A — priority is data destruction |

**Design principle**: Minimize data persistence on Jetson. The ground station is the system of record. Jetson stores only what's needed for the current flight — satellite tiles (encrypted at rest) and transient processing state (memory only).

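The in-memory ring buffer in the table maps directly onto a bounded `deque`. A small sketch (class and method names are ours) showing the intended retention behavior:

```python
import time
from collections import deque

class SessionLog:
    """Session-scoped event log. Lives only in RAM: entries are overwritten
    when the buffer is full and everything vanishes on power-off or session
    end -- the ground station is the system of record."""

    def __init__(self, capacity=1000):
        self._events = deque(maxlen=capacity)   # oldest entries evicted first

    def record(self, kind, message, now=time.time):
        self._events.append({"t": now(), "kind": kind, "msg": message})

    def security_events(self):
        """Events that should also be pushed to the operator over SSE."""
        return [e for e in self._events if e["kind"] == "security"]

    def clear(self):
        """Called at session end."""
        self._events.clear()
```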
## Protected Code Execution (OP-TEE / ARM TrustZone)

### Overview

The Jetson Orin Nano Super supports hardware-enforced protected code execution via **ARM TrustZone** and **OP-TEE v4.2.0** (included in Jetson Linux 36.3+). TrustZone partitions the processor into two isolated worlds:

- **Secure World** (TEE): Runs at ARMv8 S-EL1 (OS) and S-EL0 (apps). Code here cannot be read or tampered with from the normal world. OP-TEE is the secure OS.
- **Normal World**: Standard Linux (JetPack). Our Python application, cuVSLAM, and FastAPI all run here.

Trusted Applications (TAs) execute inside the secure world and are invoked by Client Applications (CAs) in the normal world via the GlobalPlatform TEE Client API.

### Architecture on Jetson Orin Nano

```
┌─────────────────────────────────────────────────┐
│  NORMAL WORLD (Linux / JetPack)                 │
│                                                 │
│  Client Application (CA)                        │
│    ↕ libteec.so (TEE Client API)                │
│    ↕ OP-TEE Linux Kernel Driver                 │
│    ↕ ARM Trusted Firmware (ATF) / Monitor       │
├─────────────────────────────────────────────────┤
│  SECURE WORLD (OP-TEE v4.2.0)                   │
│                                                 │
│  OP-TEE OS (ARMv8 S-EL1)                        │
│   ├── jetson-user-key PTA (key management)      │
│   ├── luks TA (disk encryption passphrase)      │
│   ├── hwkey-agent TA (encrypt/decrypt data)     │
│   ├── PKCS #11 TA (crypto token interface)      │
│   └── Custom TAs (our application-specific TAs) │
│                                                 │
│  Hardware: Security Engine (SE), HW RNG, Fuses  │
│  TZ-DRAM: Dedicated memory carveout             │
└─────────────────────────────────────────────────┘
```

### Key Hierarchy (Hardware-Backed)

The Jetson Orin Nano provides a hardware-rooted key hierarchy via the Security Engine (SE) and Encrypted Key Blob (EKB):

```
OEM_K1 fuse (256-bit AES, burned into hardware, cannot be read by software)
│
├── EKB_RK (EKB Root Key, derived via AES-128-ECB from OEM_K1 + FV)
│   ├── EKB_EK (encryption key for EKB content)
│   └── EKB_AK (authentication key for EKB content)
│
├── HUK (Hardware Unique Key, per-device, derived via NIST SP 800-108)
│   └── SSK (Secure Storage Key, per-device, generated at OP-TEE boot)
│       ├── TSK (TA Storage Key, per-TA)
│       └── FEK (File Encryption Key, per-file)
│
└── LUKS passphrase (derived from disk encryption key stored in EKB)
```

Fuse keys are loaded into SE keyslots during early boot (before OP-TEE starts). Software cannot read keys from keyslots — only derive new keys through the SE. After use, keyslots should be cleared via `tegra_se_clear_aes_keyslots()`.

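The HUK → SSK → TSK/FEK branch of the hierarchy uses NIST SP 800-108 key derivation. For intuition, here is a counter-mode KDF with HMAC-SHA256 as the PRF; the real derivation runs inside the SE/OP-TEE with NVIDIA-defined label and context encodings, so this sketch only illustrates the construction.

```python
import hashlib
import hmac

def kdf_ctr_hmac_sha256(key: bytes, label: bytes, context: bytes,
                        out_len: int) -> bytes:
    """NIST SP 800-108 KDF in counter mode, HMAC-SHA256 as the PRF.

    Illustration only: the label/context encoding used on the Jetson is
    defined by NVIDIA's firmware, not by this sketch.
    """
    out = b""
    n_blocks = -(-out_len // 32)          # ceil(out_len / 32)
    for counter in range(1, n_blocks + 1):
        # Fixed input data: [i]_4 || Label || 0x00 || Context || [L]_4
        data = (counter.to_bytes(4, "big") + label + b"\x00" +
                context + (out_len * 8).to_bytes(4, "big"))
        out += hmac.new(key, data, hashlib.sha256).digest()
    return out[:out_len]
```

The important property is visible in the signature: different labels ("SSK" vs "TSK") and contexts (device ID, TA UUID) yield independent keys from one parent, so compromising a per-TA key reveals nothing about its siblings.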
### What to Run in the Secure World (Our Use Cases)

| Use Case | TA Type | Purpose |
|----------|---------|---------|
| **LUKS disk encryption** | Built-in `luks` TA | Generate one-time passphrase at boot to unlock encrypted rootfs. Keys never leave secure world |
| **Tile manifest verification** | Custom User TA | Verify SHA-256 signatures of satellite tile manifests. Signing key stored in EKB, accessible only in secure world |
| **JWT secret storage** | Custom User TA or `hwkey-agent` TA | Store JWT signing secret in EKB. Sign/verify JWTs inside secure world — secret never exposed to Linux |
| **Secure erase trigger** | Custom User TA | Receive tamper signal → invoke `cryptsetup luksErase` via CA. Key erase logic runs in secure world to prevent normal-world interference |
| **TLS private key protection** | PKCS #11 TA | Store TLS private key in OP-TEE secure storage. Uvicorn uses PKCS #11 interface to perform TLS handshake without key leaving secure world |

### How to Enable Protected Code Execution

#### Step 1: Burn OEM_K1 Fuse (One-Time, Irreversible)

```bash
# Generate 256-bit OEM_K1 key (use HSM in production)
openssl rand -hex 32 > oem_k1_key.txt

# Create fuse configuration XML with OEM_K1
# Burn fuse via odmfuse.sh (IRREVERSIBLE)
sudo ./odmfuse.sh -i <fuse_config.xml> <board_name>
```

After burning the `SecurityMode` fuse (`odm_production_mode=0x1`), all further fuse writes are blocked. OEM_K1 becomes permanently embedded in hardware.

#### Step 2: Generate and Flash EKB

```bash
# Generate user keys (disk encryption key, JWT secret, tile signing key)
openssl rand -hex 16 > disk_enc_key.txt
openssl rand -hex 32 > jwt_secret.txt
openssl rand -hex 32 > tile_signing_key.txt

# Generate EKB binary
python3 gen_ekb.py -chip t234 \
    -oem_k1_key oem_k1_key.txt \
    -in_sym_key uefi_enc_key.txt \
    -in_sym_key2 disk_enc_key.txt \
    -in_auth_key uefi_var_auth_key.txt \
    -out eks_t234.img

# Flash EKB to EKS partition
# (part of the normal flash process with secure boot enabled)
```

#### Step 3: Enable LUKS Disk Encryption

```bash
# During flash, set ENC_ROOTFS=1 to encrypt rootfs
export ENC_ROOTFS=1
sudo ./flash.sh <board_name> <storage_device>
```

The `luks` TA in OP-TEE derives a passphrase from the disk encryption key in EKB at boot. The passphrase is generated inside the secure world and passed to `cryptsetup` — it never exists in persistent storage.

#### Step 4: Develop Custom Trusted Applications

Cross-compile TAs for aarch64 using the Jetson OP-TEE source package:

```bash
# Build custom TA (e.g., tile manifest verifier)
make -C <ta_source_dir> \
    CROSS_COMPILE="<toolchain>/bin/aarch64-buildroot-linux-gnu-" \
    TA_DEV_KIT_DIR="<optee_src>/optee/build/t234/export-ta_arm64/" \
    OPTEE_CLIENT_EXPORT="<optee_src>/optee/install/t234/usr" \
    TEEC_EXPORT="<optee_src>/optee/install/t234/usr" \
    -j"$(nproc)"

# Deploy: copy TA to /lib/optee_armtz/ on Jetson
# Deploy: copy CA to /usr/sbin/ on Jetson
```

TAs conform to the GlobalPlatform TEE Internal Core API. Use the `hello_world` example from `optee_examples` as a starting template.

#### Step 5: Enable Secure Boot

```bash
# Generate PKC key pair (use HSM for production)
openssl genrsa -out pkc_key.pem 3072

# Sign and flash secured images
sudo ./flash.sh --sign pkc_key.pem <board_name> <storage_device>

# After verification, burn SecurityMode fuse (IRREVERSIBLE)
```

### Available Crypto Services in Secure World

| Service | Provider | Notes |
|---------|----------|-------|
| AES-128/256 encryption/decryption | SE hardware | Via keyslot-derived keys, never leaves SE |
| Key derivation (NIST SP 800-108) | `jetson-user-key` PTA | Derive purpose-specific keys from EKB keys |
| Hardware RNG | SE hardware | `TEE_GenerateRandom()` or PTA command |
| PKCS #11 crypto tokens | PKCS #11 TA | Standard crypto interface for TLS, signing |
| SHA-256, HMAC | MbedTLS (bundled in optee_os) | Software crypto in secure world |
| RSA/ECC signing | GlobalPlatform TEE Crypto API | For manifest signature verification |

### Limitations on Orin Nano

| Limitation | Impact | Workaround |
|------------|--------|------------|
| No RPMB support (only AGX Orin has RPMB) | Secure storage uses REE FS instead of replay-protected memory | Acceptable — LUKS encryption protects data at rest. REE FS secure storage is encrypted by SSK |
| EKB can only be updated via OTA, not at runtime | Cannot rotate keys in flight | Pre-provision per-device unique keys at manufacturing time |
| OP-TEE hello_world reported issues on some Orin Nano units | Some users report initialization failures | Use JetPack 6.2.2+ which includes fixes. Test thoroughly on target hardware |
| TZ-DRAM is a fixed carveout | Limits secure world memory | Keep TAs lightweight — only crypto operations and key management, not data processing |

### Security Hardening Checklist

- [ ] Burn OEM_K1 fuse with unique per-device key (via HSM)
- [ ] Generate and flash EKB with disk encryption key, JWT secret, tile signing key
- [ ] Enable LUKS full-disk encryption (`ENC_ROOTFS=1`)
- [ ] Modify `luks-srv` to NOT auto-decrypt (require explicit trigger or dead-man switch)
- [ ] Burn SecurityMode fuse (`odm_production_mode=0x1`) — enables secure boot chain
- [ ] Burn debug-disable fuses — disables JTAG
- [ ] Configure anti-rollback ratchet fuses
- [ ] Clear SE keyslots after EKB extraction via `tegra_se_clear_aes_keyslots()`
- [ ] Deploy custom TAs for JWT signing and tile manifest verification
- [ ] Use PKCS #11 for TLS private key protection
- [ ] Test secure erase trigger end-to-end
- [ ] Run `xtest` (OP-TEE test suite) on production Jetson to validate TEE

## Key Security Risks

| Risk | Severity | Mitigation Status |
|------|----------|-------------------|
| Physical capture → data extraction | CRITICAL | Mitigated by LUKS + secure erase. Residual risk: attacker extracts RAM before erase triggers |
| No auto-decrypt bypass for LUKS | HIGH | Requires custom `luks-srv` modification — development effort needed |
| Self-signed TLS certificates | MEDIUM | Acceptable for field deployment. Certificate pinning on ground station prevents MITM |
| cuVSLAM is closed-source | LOW | Cannot audit for vulnerabilities. Mitigated by running in sandboxed environment, input validation on camera frames |
| Dead-man switch reliability | HIGH | Hardware integration required. False triggers (temporary signal loss) must NOT cause premature erase. Needs careful threshold tuning |

## References

- xT-STRIDE threat model for UAVs (2025): https://link.springer.com/article/10.1007/s10207-025-01082-4
- NVIDIA Jetson OP-TEE Documentation (r36.4.4): https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/OpTee.html
- NVIDIA Jetson Security Overview (r36.4.3): https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security.html
- NVIDIA Jetson LUKS Disk Encryption: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/DiskEncryption.html
- NVIDIA Jetson Secure Boot: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/SecureBoot.html
- NVIDIA Jetson Secure Storage: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/SecureStorage.html
- NVIDIA Jetson Firmware TPM: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/FirmwareTPM.html
- NVIDIA Jetson Rollback Protection: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/RollbackProtection.html
- Jetson Orin Fuse Specification: https://developer.nvidia.com/downloads/jetson-agx-orin-series-fuse-specification
- OP-TEE Official Documentation: https://optee.readthedocs.io/en/latest/
- OP-TEE Trusted Application Examples: https://github.com/linaro-swg/optee_examples
- RidgeRun OP-TEE on Jetson Guide: https://developer.ridgerun.com/wiki/index.php/RidgeRun_Platform_Security_Manual/Getting_Started/TEE/NVIDA-Jetson
- GlobalPlatform TEE Specifications: https://globalplatform.org/specs-library/?filter-committee=tee
- Model Agnostic Defense against Adversarial Patches on UAVs (2024): https://arxiv.org/html/2405.19179v1
- FastAPI Security Best Practices (2026): https://fastlaunchapi.dev/blog/fastapi-best-practices-production-2026

# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|--------------|
| ESKF described as "16-state vector, ~10MB" with no mathematical specification | **Functional**: No state vector, no process model (F, Q), no measurement models (H for VO, H for satellite), no noise parameters, no scale observability analysis. Impossible to implement or validate accuracy claims. | **Define complete ESKF specification**: 15-state error vector, IMU-driven prediction, dual measurement models (VO relative pose, satellite absolute position), initial Q/R values, scale constraint via altitude + satellite corrections. |
| GPS_INPUT at 5-10Hz via pymavlink — no field mapping | **Functional**: GPS_INPUT requires 15+ fields (velocity, accuracy, hdop, fix_type, GPS time). No specification of how ESKF state maps to these fields. ArduPilot requires minimum 5Hz. | **Define GPS_INPUT population spec**: velocity from ESKF, accuracy from covariance, fix_type from confidence tier, GPS time from system clock conversion, synthesized hdop/vdop. |
| Confidence scoring "unchanged from draft03" — not in draft05 | **Functional**: Draft05 is supposed to be self-contained. Confidence scoring determines GPS_INPUT accuracy fields and fix_type — directly affects how ArduPilot EKF weights the position data. | **Define confidence scoring inline**: 3 tiers (satellite-anchored, VO-tracked, IMU-only) mapping to fix_type + accuracy values. |
| Coordinate transformations not defined | **Functional**: No pixel→camera→body→NED→WGS84 chain. Camera is not autostabilized, so body attitude matters. Satellite match → WGS84 conversion undefined. Object localization impossible without these transforms. | **Define coordinate transformation chain**: camera intrinsics K, camera-to-body extrinsic T_cam_body, body-to-NED from ESKF attitude, NED origin at mission start point. |
| Disconnected route segments — "satellite re-localization" mentioned but no algorithm | **Functional**: AC requires handling as "core to the system." Multiple disconnected segments expected. No tracking-loss detection, no re-localization trigger, no ESKF re-initialization, no cuVSLAM restart procedure. | **Define re-localization pipeline**: detect cuVSLAM tracking loss → IMU-only ESKF prediction → trigger satellite match on every frame → on match success: ESKF position reset + cuVSLAM restart → on 3 consecutive failures: operator re-localization request. |
| No startup handoff from GPS to GPS-denied | **Functional**: System reads GLOBAL_POSITION_INT at startup but no protocol for when GPS is lost/spoofed vs system start. No validation of initial position. | **Define handoff protocol**: system runs continuously, FC receives both real GPS and GPS_INPUT. GPS-denied system always provides its estimate; FC selects best source. Initial position validated against first satellite match. |
| No mid-flight reboot recovery | **Functional**: AC requires: "re-initialize from flight controller's current IMU-extrapolated position." No procedure defined. Recovery time estimation missing. | **Define reboot recovery sequence**: read FC position → init ESKF with high uncertainty → load TRT engines → start cuVSLAM → immediate satellite match. Estimated recovery: ~35-70s. Document as known limitation. |
| 3-consecutive-failure re-localization request undefined | **Functional**: AC requires ground station re-localization request. No message format, no operator workflow, no system behavior while waiting. | **Define re-localization protocol**: detect 3 failures → send custom MAVLink message with last known position + uncertainty → operator provides approximate coordinates → system uses as ESKF measurement with high covariance. |
| Object localization — "trigonometric calculation" with no details | **Functional**: No math, no API, no Viewpro gimbal integration, no accuracy propagation. Other onboard systems cannot use this component as specified. | **Define object localization**: pixel→ray using Viewpro intrinsics + gimbal angles → body frame → NED → ray-ground intersection → WGS84. FastAPI endpoint: POST /objects/locate. Accuracy propagated from UAV position + gimbal uncertainty. |
| Satellite matching — GSD normalization and tile selection unspecified | **Functional**: Camera GSD ~15.9 cm/px at 600m vs satellite ~0.3 m/px at zoom 19. The "pre-resize" step is mentioned but not specified. Tile selection radius based on ESKF uncertainty not defined. | **Define GSD handling**: downsample camera frame to match satellite GSD. Define tile selection: ESKF position ± 3σ_horizontal → select tiles covering that area. Assemble tile mosaic for matching. |
| Satellite tile storage requirements not calculated | **Functional**: "±2km" preload mentioned but no storage estimate. At zoom 19: a 200km path with ±2km buffer requires ~130K tiles (~2.5GB). | **Calculate tile storage**: specify zoom level (18 preferred — 0.6m/px, 4× fewer tiles), estimate storage per mission profile, define maximum mission area by storage limit. |
| FastAPI endpoints not in solution draft | **Functional**: Endpoints only in security_analysis.md. No request/response schemas. No SSE event format. No object localization endpoint. | **Consolidate API spec in solution**: define all endpoints, SSE event schema, object localization endpoint. Reference security_analysis.md for auth. |
| cuVSLAM configuration missing (calibration, IMU params, mode) | **Functional**: No camera calibration procedure, no IMU noise parameters, no T_imu_rig extrinsic, no mode selection (Mono vs Inertial). | **Define cuVSLAM configuration**: use Inertial mode, specify required calibration data (camera intrinsics, distortion, IMU noise params from datasheet, T_imu_rig from physical measurement), define calibration procedure. |
| tech_stack.md inconsistent with draft05 | **Functional**: tech_stack.md says 3fps (should be 0.7fps), LiteSAM at 480px (should be 1280px), missing EfficientLoFTR. | **Flag for update**: tech_stack.md must be synchronized with draft05 corrections. Not addressed in this draft — separate task. |

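The object-localization row above calls for a ray-ground intersection as its final geometric step. A flat-earth sketch of that step in NED coordinates (the function name and conventions are ours; a production version would replace the flat-ground plane with a DEM lookup and then convert the NED point to WGS84):

```python
def ray_ground_intersection(p_ned, ray_ned, ground_d=0.0):
    """Intersect a camera viewing ray with the ground plane in NED.

    p_ned    : UAV position (north, east, down), down positive
    ray_ned  : unit direction of the viewing ray in NED
    ground_d : 'down' coordinate of the ground plane (0 = NED origin altitude)

    Returns (north, east) of the ground point, or None if the ray does
    not point toward the ground. Flat-earth assumption.
    """
    pn, pe, pd = p_ned
    rn, re, rd = ray_ned
    if rd <= 1e-9:                 # ray parallel to or away from the ground
        return None
    t = (ground_d - pd) / rd       # distance along the ray to the ground
    if t < 0:
        return None
    return (pn + t * rn, pe + t * re)
```

Accuracy propagation then follows from the same formula: errors in UAV position and in the ray direction (gimbal angle uncertainty) scale with the slant distance `t`.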
## Overall Maturity Assessment

| Category | Maturity (1-5) | Assessment |
|----------|----------------|------------|
| Hardware & Platform Selection | 3.5 | UAV airframe, cameras, Jetson, batteries — well-researched with specs, weight budget, endurance calculations. Ready for procurement. |
| Core Algorithm Selection | 3.0 | cuVSLAM, LiteSAM/XFeat, ESKF — components selected with comparison tables, fallback chains, decision trees. Day-one benchmarks defined. |
| AI Inference Runtime | 3.5 | TRT Engine migration thoroughly analyzed. Conversion workflows, memory savings, performance estimates. Code wrapper provided. |
| Sensor Fusion (ESKF) | 1.5 | Mentioned but not specified. No implementable detail. Blocker for coding. |
| System Integration | 1.5 | GPS_INPUT, coordinate transforms, inter-component data flow — all under-specified. |
| Edge Cases & Resilience | 1.0 | Disconnected segments, reboot recovery, re-localization — acknowledged but no algorithms. |
| Operational Readiness | 0.5 | No pre-flight procedures, no in-flight monitoring, no failure response. |
| Security | 3.0 | Comprehensive threat model, OP-TEE analysis, LUKS, secure boot. Well-researched. |
| **Overall TRL** | **~2.5** | **Technology concept formulated + some component validation. Not implementation-ready.** |

The solution is at approximately **TRL 3** (proof of concept) for hardware/algorithm selection and **TRL 1-2** (basic concept) for system integration, ESKF, and operational procedures.

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses native TensorRT Engine files. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.

Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM in Inertial mode) from the ADTI 20L V1 at 0.7 fps sustained, (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16) using keyframes from the same ADTI image stream, and (3) IMU data from the flight controller via ESKF. The Viewpro A40 Pro is reserved for AI object detection only.

The ESKF is the central state estimator with a 15-state error vector. It fuses:

- **IMU prediction** at 5-10Hz (high-frequency pose propagation)
- **cuVSLAM VO measurement** at 0.7Hz (relative pose correction)
- **Satellite matching measurement** at ~0.07-0.14Hz (absolute position correction)

GPS_INPUT messages carry position, velocity, and accuracy derived from the ESKF state and covariance.

**Hard constraint**: the ADTI 20L V1 shoots at 0.7 fps sustained (1430ms interval). The full VO+ESKF pipeline must complete within 400ms per frame. Satellite matching runs asynchronously on keyframes (every 5-10 camera frames). GPS_INPUT runs at 5-10Hz (ESKF IMU prediction fills the gaps between camera frames).
```
┌─────────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight)                                             │
│ 1. Satellite Tiles → Download & Validate → Pre-resize → Store       │
│    (Google Maps)     (≥0.5m/px, <2yr)     (matcher res)  (GeoHash)  │
│ 2. TRT Engine Build (one-time per model version):                   │
│    PyTorch model → reparameterize → ONNX export → trtexec --fp16    │
│    Output: litesam.engine, xfeat.engine                             │
│ 3. Camera + IMU calibration (one-time per hardware unit)            │
│ 4. Copy tiles + engines + calibration to Jetson storage             │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight)                                              │
│                                                                     │
│ STARTUP:                                                            │
│ 1. pymavlink → read GLOBAL_POSITION_INT → init ESKF state           │
│ 2. Load TRT engines + allocate GPU buffers                          │
│ 3. Load camera calibration + IMU calibration                        │
│ 4. Start cuVSLAM (Inertial mode) with ADTI 20L V1                   │
│ 5. Preload satellite tiles ±2km into RAM                            │
│ 6. First satellite match → validate initial position                │
│ 7. Begin GPS_INPUT output loop at 5-10Hz                            │
│                                                                     │
│ EVERY CAMERA FRAME (0.7fps from ADTI 20L V1):                       │
│ ┌──────────────────────────────────────┐                            │
│ │ ADTI 20L V1 → Downsample (CUDA)      │                            │
│ │ → cuVSLAM VO+IMU (~9ms)              │ ← CUDA Stream A            │
│ │ → ESKF VO measurement                │                            │
│ └──────────────────────────────────────┘                            │
│                                                                     │
│ 5-10Hz CONTINUOUS (IMU-driven between camera frames):               │
│ ┌──────────────────────────────────────┐                            │
│ │ IMU data → ESKF prediction           │                            │
│ │ ESKF state → GPS_INPUT fields        │                            │
│ │ GPS_INPUT → Flight Controller (UART) │                            │
│ └──────────────────────────────────────┘                            │
│                                                                     │
│ KEYFRAMES (every 5-10 camera frames, async):                        │
│ ┌──────────────────────────────────────┐                            │
│ │ Camera frame → GSD downsample        │                            │
│ │ Select satellite tile (ESKF pos±3σ)  │                            │
│ │ TRT inference (Stream B): LiteSAM/   │                            │
│ │ XFeat → correspondences              │                            │
│ │ RANSAC → homography → WGS84 position │                            │
│ │ ESKF satellite measurement update    │──→ Position correction     │
│ └──────────────────────────────────────┘                            │
│                                                                     │
│ TRACKING LOSS (cuVSLAM fails — sharp turn / featureless):           │
│ ┌──────────────────────────────────────┐                            │
│ │ ESKF → IMU-only prediction (growing  │                            │
│ │ uncertainty)                         │                            │
│ │ Satellite match on EVERY frame       │                            │
│ │ On match success → ESKF reset +      │                            │
│ │ cuVSLAM restart                      │                            │
│ │ 3 consecutive failures → operator    │                            │
│ │ re-localization request              │                            │
│ └──────────────────────────────────────┘                            │
│                                                                     │
│ TELEMETRY (1Hz):                                                    │
│ ┌──────────────────────────────────────┐                            │
│ │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station          │
│ └──────────────────────────────────────┘                            │
└─────────────────────────────────────────────────────────────────────┘
```
## Architecture

### Component: ESKF Sensor Fusion (NEW — previously unspecified)

**Error-State Kalman Filter** fusing IMU, visual odometry, and satellite matching.

**Nominal state vector** (propagated by IMU):

| State | Symbol | Size | Description |
| ---------- | ------ | ---- | ------------------------------------------------ |
| Position | p | 3 | NED position relative to mission origin (meters) |
| Velocity | v | 3 | NED velocity (m/s) |
| Attitude | q | 4 | Unit quaternion (body-to-NED rotation) |
| Accel bias | b_a | 3 | Accelerometer bias (m/s²) |
| Gyro bias | b_g | 3 | Gyroscope bias (rad/s) |

**Error-state vector** (estimated by ESKF): δx = [δp, δv, δθ, δb_a, δb_g]ᵀ ∈ ℝ¹⁵, where δθ ∈ so(3) is the 3D rotation error.

**Prediction step** (IMU at 5-10Hz from flight controller):

- Input: accelerometer a_m, gyroscope ω_m, dt
- Propagate nominal state: p += v·dt, v += (R(q)·(a_m - b_a) - g)·dt, q ⊗= Exp((ω_m - b_g)·dt)
- Propagate error covariance: P = F·P·Fᵀ + Q
- F is the 15×15 error-state transition matrix (standard ESKF formulation)
- Q: process noise diagonal, initial values from IMU datasheet noise densities
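The nominal-state propagation above can be sketched as follows — a minimal NumPy illustration, not flight code. The `[w, x, y, z]` quaternion layout, the helper names, and the sign convention for `g` (chosen so a stationary accelerometer reading of `[0, 0, -9.81]` yields zero net acceleration, matching the `- g` term above) are assumptions of this sketch:

```python
import numpy as np

def quat_mul(q, r):
    # Hamilton product of two quaternions [w, x, y, z].
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([
        w0*w1 - x0*x1 - y0*y1 - z0*z1,
        w0*x1 + x0*w1 + y0*z1 - z0*y1,
        w0*y1 - x0*z1 + y0*w1 + z0*x1,
        w0*z1 + x0*y1 - y0*x1 + z0*w1,
    ])

def quat_to_rot(q):
    # Rotation matrix from a unit quaternion [w, x, y, z].
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def small_angle_quat(phi):
    # Exp map for a small rotation vector phi (rad), as a unit quaternion.
    half = 0.5 * np.asarray(phi)
    return np.concatenate(([1.0], half)) / np.sqrt(1.0 + half @ half)

def predict(p, v, q, b_a, b_g, a_m, w_m, dt):
    # One IMU prediction step: p += v*dt, v += (R(q)(a_m - b_a) - g)*dt,
    # q = q (x) Exp((w_m - b_g)*dt). Gravity sign follows the document's formula.
    g = np.array([0.0, 0.0, -9.81])
    R = quat_to_rot(q)
    p = p + v * dt
    v = v + (R @ (a_m - b_a) - g) * dt
    q = quat_mul(q, small_angle_quat((w_m - b_g) * dt))
    return p, v, q / np.linalg.norm(q)
```

For a stationary, level vehicle (accelerometer reading `[0, 0, -9.81]`, zero gyro), position and velocity stay at zero, which is a cheap sanity check for the sign conventions.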
**VO measurement update** (0.7Hz from cuVSLAM):

- cuVSLAM outputs relative pose: ΔR, Δt (camera frame)
- Transform to NED: Δp_ned = R_body_ned · T_cam_body · Δt
- Innovation: z = Δp_ned_measured - Δp_ned_predicted
- Observation matrix H_vo maps error state to relative position change
- R_vo: measurement noise, initial ~0.1-0.5m (from cuVSLAM precision at 600m+ altitude)
- Kalman update: K = P·Hᵀ·(H·P·Hᵀ + R)⁻¹, δx = K·z, P = (I - K·H)·P

**Satellite measurement update** (0.07-0.14Hz, async):

- Satellite matching outputs absolute position: lat_sat, lon_sat in WGS84
- Convert to NED relative to mission origin
- Innovation: z = p_satellite - p_predicted
- H_sat = [I₃, 0, 0, 0, 0] (directly observes position)
- R_sat: measurement noise, from matching confidence (~5-20m based on RANSAC inlier ratio)
- Provides absolute position correction — bounds drift accumulation
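With H_sat = [I₃ 0 … 0], the satellite update reduces to a standard Kalman correction on the 15-state covariance. A minimal sketch (illustrative only — a full ESKF would also inject δx into the nominal state and reset the error state afterwards):

```python
import numpy as np

def satellite_update(P, z, R_sat):
    # H_sat observes position directly: the first 3 of the 15 error states.
    H = np.hstack([np.eye(3), np.zeros((3, 12))])
    S = H @ P @ H.T + R_sat            # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain (15x3)
    dx = K @ z                         # error-state correction
    P = (np.eye(15) - K @ H) @ P       # covariance update
    return dx, P
```

With P = 100·I and R_sat = 25·I, the position gain is 100/125 = 0.8, so a 10 m innovation corrects 8 m and the position variance drops to 20 m².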
**Scale observability**:

- Monocular cuVSLAM has scale ambiguity during constant-velocity flight
- Scale is constrained by: (1) satellite matching absolute positions (primary), (2) known flight altitude from barometer + predefined mission altitude, (3) IMU accelerometer during maneuvers
- During long straight segments without satellite correction, scale drift is possible. Satellite corrections every ~7-14s re-anchor scale.

**Tuning approach**: Start with IMU datasheet noise values for Q. Start with conservative R values (high measurement noise). Tune on flight test data by comparing ESKF output to known GPS ground truth.

| Solution | Tools | Advantages | Limitations | Performance | Fit |
| -------------------------- | --------------- | ------------------------------------------------------------- | -------------------------------------- | ------------- | ----------- |
| Custom ESKF (Python/NumPy) | NumPy, SciPy | Full control, minimal dependencies, well-understood algorithm | Implementation effort, tuning required | <1ms per step | ✅ Selected |
| FilterPy ESKF | FilterPy v1.4.5 | Reference implementation, less code | Less flexible for multi-rate fusion | <1ms per step | ⚠️ Fallback |
### Component: Coordinate System & Transformations (NEW — previously undefined)

**Reference frames**:

- **Camera frame (C)**: origin at camera optical center, Z forward, X right, Y down (OpenCV convention)
- **Body frame (B)**: origin at UAV CG, X forward (nose), Y right (starboard), Z down
- **NED frame (N)**: North-East-Down, origin at mission start point
- **WGS84**: latitude, longitude, altitude (output format)

**Transformation chain**:

1. **Pixel → Camera ray**: p_cam = K⁻¹ · [u, v, 1]ᵀ where K = camera intrinsic matrix (ADTI 20L V1: fx, fy from 16mm lens + APS-C sensor)
2. **Camera → Body**: p_body = T_cam_body · p_cam where T_cam_body is the fixed mounting rotation (camera points nadir: 90° pitch rotation from body X-forward to camera Z-down)
3. **Body → NED**: p_ned = R_body_ned(q) · p_body where q is the ESKF quaternion attitude estimate
4. **NED → WGS84**: lat = lat_origin + p_north / R_earth, lon = lon_origin + p_east / (R_earth · cos(lat_origin)) where (lat_origin, lon_origin) is the mission start GPS position
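Step 4 in code, using the same spherical-Earth approximation as the formula above (adequate over the short NED baselines of a single mission; the function name and mean-radius constant are choices of this sketch):

```python
import math

R_EARTH = 6371000.0  # mean Earth radius in metres (spherical approximation)

def ned_to_wgs84(p_north, p_east, lat_origin_deg, lon_origin_deg):
    # lat = lat_origin + p_north / R_earth; longitude scales with cos(latitude).
    lat0 = math.radians(lat_origin_deg)
    lat = lat_origin_deg + math.degrees(p_north / R_EARTH)
    lon = lon_origin_deg + math.degrees(p_east / (R_EARTH * math.cos(lat0)))
    return lat, lon
```

As a sanity check, 1000 m north corresponds to roughly 0.009° of latitude anywhere on the sphere.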
**Camera intrinsic matrix K** (ADTI 20L V1 + 16mm lens):

- Sensor: 23.2 × 15.4 mm, Resolution: 5456 × 3632
- fx = fy = focal_mm × width_px / sensor_width_mm = 16 × 5456 / 23.2 = 3763 pixels
- cx = 2728, cy = 1816 (sensor center)
- Distortion: Brown model (k1, k2, p1, p2 from calibration)
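Step 1 of the transformation chain with these nominal intrinsics — a sketch only: real values come from calibration, and Brown distortion is ignored here:

```python
import numpy as np

# Nominal intrinsics for the ADTI 20L V1 + 16 mm lens (pre-calibration values above).
K = np.array([[3763.0,    0.0, 2728.0],
              [   0.0, 3763.0, 1816.0],
              [   0.0,    0.0,    1.0]])

def pixel_to_ray(u, v):
    # Back-project a pixel to a unit ray in the camera frame: p_cam = K^-1 [u, v, 1]^T.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)
```

A ray through the principal point comes out along the optical axis, `[0, 0, 1]`.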
**T_cam_body** (camera mount):

- Navigation camera is fixed, pointing nadir (downward), not autostabilized
- R_cam_body = R_x(180°) · R_z(0°) (camera Z-axis aligned with body -Z, camera X with body X)
- Translation: offset from CG to camera mount (measured during assembly, typically <0.3m)

**Satellite match → WGS84**:

- Feature correspondences between camera frame and geo-referenced satellite tile
- Homography H maps camera pixels to satellite tile pixels
- Satellite tile pixel → WGS84 via tile's known georeference (zoom level + tile x,y → lat,lon)
- Camera center projects to satellite pixel (cx_sat, cy_sat) via H
- Convert (cx_sat, cy_sat) to WGS84 using tile georeference
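The tile georeference step (zoom level + tile x,y → lat,lon) is the standard Web Mercator tiling scheme used for the cached map tiles. A sketch of the forward conversion:

```python
import math

def tile_to_latlon(x, y, zoom):
    # North-west corner of Web Mercator tile (x, y) at the given zoom level.
    n = 2 ** zoom
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n))))
    return lat, lon
```

Sub-tile positions such as (cx_sat, cy_sat) interpolate linearly in x and through the same Mercator formula in y between neighbouring tile corners.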
### Component: GPS_INPUT Message Population (NEW — previously undefined)

| GPS_INPUT Field | Source | Computation |
| ----------------------- | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| lat, lon | ESKF position (NED) | NED → WGS84 conversion using mission origin |
| alt | ESKF position (Down) + mission origin altitude | alt = alt_origin - p_down |
| vn, ve, vd | ESKF velocity state | Direct from ESKF v[0], v[1], v[2] |
| fix_type | Confidence tier | 3 (3D fix) when satellite-anchored or VO tracking. 2 (2D fix) when IMU-only. 0 (no fix) after 3 consecutive failures. See confidence tiers below |
| hdop | ESKF horizontal covariance | hdop = sqrt(P[0,0] + P[1,1]) / 5.0 (approximate CEP→HDOP mapping) |
| vdop | ESKF vertical covariance | vdop = sqrt(P[2,2]) / 5.0 |
| horiz_accuracy | ESKF horizontal covariance | horiz_accuracy = sqrt(P[0,0] + P[1,1]) meters |
| vert_accuracy | ESKF vertical covariance | vert_accuracy = sqrt(P[2,2]) meters |
| speed_accuracy | ESKF velocity covariance | speed_accuracy = sqrt(P[3,3] + P[4,4]) m/s |
| time_week, time_week_ms | System time | Convert Unix time to GPS time (GPS epoch = 1980-01-06; GPS runs ahead of UTC by the accumulated leap seconds) |
| satellites_visible | Constant | 10 (synthetic — prevents satellite-count failsafes in ArduPilot) |
| gps_id | Constant | 0 |
| ignore_flags | Constant | 0 (provide all fields) |
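The time and accuracy computations from the table can be sketched as plain functions (the 18 s leap-second offset is correct as of 2017 but must be verified before deployment, and the covariance-diagonal layout is an assumption matching the state ordering above):

```python
import datetime
import math

GPS_EPOCH = datetime.datetime(1980, 1, 6, tzinfo=datetime.timezone.utc)
GPS_UTC_LEAP_SECONDS = 18  # GPS-UTC offset as of 2017; verify before deployment

def gps_week_time(utc_now):
    # GPS time runs ahead of UTC by the accumulated leap seconds.
    elapsed = (utc_now - GPS_EPOCH).total_seconds() + GPS_UTC_LEAP_SECONDS
    week = int(elapsed // 604800)               # 604800 seconds per week
    week_ms = int((elapsed % 604800) * 1000.0)
    return week, week_ms

def accuracy_fields(P_diag):
    # Map the ESKF covariance diagonal [p_n, p_e, p_d, v_n, v_e, ...] (m², (m/s)²)
    # to the GPS_INPUT accuracy fields per the table above.
    horiz = math.sqrt(P_diag[0] + P_diag[1])
    vert = math.sqrt(P_diag[2])
    return {
        "hdop": horiz / 5.0,   # approximate CEP -> HDOP mapping from the table
        "vdop": vert / 5.0,
        "horiz_accuracy": horiz,
        "vert_accuracy": vert,
        "speed_accuracy": math.sqrt(P_diag[3] + P_diag[4]),
    }
```

At the GPS epoch itself, the GPS clock already reads the leap-second offset, so `gps_week_time(GPS_EPOCH)` returns week 0 at 18 000 ms.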
**Confidence tiers** mapping to GPS_INPUT:

| Tier | Condition | fix_type | horiz_accuracy | Rationale |
| ------ | ------------------------------------------------- | ---------- | ------------------------------- | -------------------------------------- |
| HIGH | Satellite match <30s ago, ESKF covariance < 400m² | 3 (3D fix) | From ESKF P (typically 5-20m) | Absolute position anchor recent |
| MEDIUM | cuVSLAM tracking OK, no recent satellite match | 3 (3D fix) | From ESKF P (typically 20-50m) | Relative tracking valid, drift growing |
| LOW | cuVSLAM lost, IMU-only | 2 (2D fix) | From ESKF P (50-200m+, growing) | Only IMU dead reckoning, rapid drift |
| FAILED | 3+ consecutive total failures | 0 (no fix) | 999.0 | System cannot determine position |
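A direct encoding of the tier table (a sketch — the thresholds are the ones stated above; the function returns the tier name and the fix_type to report, while horiz_accuracy itself comes from the ESKF covariance):

```python
def confidence_tier(sat_match_age_s, horiz_cov_m2, vo_tracking, consecutive_failures):
    # Tier selection following the table above, checked from worst to best.
    if consecutive_failures >= 3:
        return "FAILED", 0
    if not vo_tracking:
        return "LOW", 2
    if sat_match_age_s < 30.0 and horiz_cov_m2 < 400.0:
        return "HIGH", 3
    return "MEDIUM", 3
```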
### Component: Disconnected Route Segment Handling (NEW — previously undefined)

**Trigger**: cuVSLAM reports tracking_lost OR tracking confidence drops below threshold

**Algorithm**:

```
STATE: TRACKING_NORMAL
    cuVSLAM provides relative pose
    ESKF VO measurement updates at 0.7Hz
    Satellite matching on keyframes (every 5-10 frames)

STATE: TRACKING_LOST (enter when cuVSLAM reports loss)
    1. ESKF continues with IMU-only prediction (no VO updates)
       → uncertainty grows rapidly (~1-5 m/s drift with consumer IMU)
    2. Switch satellite matching to EVERY frame (not just keyframes)
       → maximize chances of getting absolute correction
    3. For each camera frame:
       a. Attempt satellite match using ESKF predicted position ± 3σ for tile selection
       b. If match succeeds (RANSAC inlier ratio > 30%):
          → ESKF measurement update with satellite position
          → Restart cuVSLAM with current frame as new origin
          → Transition to TRACKING_NORMAL
          → Reset failure counter
       c. If match fails:
          → Increment failure_counter
          → Continue IMU-only ESKF prediction
    4. If failure_counter >= 3:
       → Send re-localization request to ground station
       → GPS_INPUT fix_type = 0 (no fix), horiz_accuracy = 999.0
       → Continue attempting satellite matching on each frame
    5. If operator sends re-localization hint (approximate lat,lon):
       → Use as ESKF measurement with high covariance (~500m)
       → Attempt satellite match in that area
       → On success: transition to TRACKING_NORMAL

STATE: SEGMENT_DISCONNECT
    After re-localization following tracking loss:
    → New cuVSLAM track is independent of previous track
    → ESKF maintains global NED position continuity via satellite anchor
    → No need to "connect" segments at the cuVSLAM level
    → ESKF already handles this: satellite corrections keep global position consistent
```
### Component: Satellite Image Matching Pipeline (UPDATED — added GSD + tile selection details)

**GSD normalization**:

- Camera GSD at 600m: ~15.9 cm/pixel (ADTI 20L V1 + 16mm)
- Satellite tile GSD at zoom 18: ~0.6 m/pixel
- Scale ratio: ~3.8:1
- Downsample camera image to satellite GSD before matching: resize from 5456×3632 to ~1440×960 (matching zoom 18 GSD)
- This is close to LiteSAM's 1280px input — use 1280px; the minor GSD mismatch is acceptable for matching
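The GSD figures follow directly from the intrinsics; as a worked check (for a nadir camera, GSD ≈ altitude / focal length in pixels — variable names are illustrative):

```python
# Worked GSD check for the values quoted above.
altitude_m = 600.0
fx_px = 3763.0                               # 16 mm lens, 5456 px across a 23.2 mm sensor
camera_gsd = altitude_m / fx_px              # ~0.159 m/px (~15.9 cm/px)
sat_gsd = 0.6                                # zoom-18 satellite tile GSD, m/px
scale_ratio = sat_gsd / camera_gsd           # ~3.8:1
target_width = 5456 * camera_gsd / sat_gsd   # ~1450 px after GSD-matched downsampling
```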
**Tile selection**:

- Input: ESKF position estimate (lat, lon) + horizontal covariance σ_h
- Search radius: max(3·σ_h, 500m) — at least 500m to handle initial uncertainty
- Compute geohash for center position → load tiles covering the search area
- Assemble tile mosaic if needed (typically 2×2 to 4×4 tiles for adequate coverage)
- If ESKF uncertainty > 2km: tile selection unreliable, fall back to wider search or request operator input
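Finding the tiles that cover the search area uses the standard Web Mercator tile indexing (a sketch; the geohash key mentioned above is an alternative spatial index over the same stored tiles):

```python
import math

def latlon_to_tile(lat_deg, lon_deg, zoom):
    # Web Mercator (slippy map) indices of the tile containing the coordinate.
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return x, y
```

Covering a search radius then amounts to computing the tile indices of the bounding box corners and loading every tile in that index range.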
**Tile storage calculation** (zoom 18 — 0.6 m/pixel):

- Each 256×256 tile covers ~153m × 153m
- Flight path 200km with ±2km buffer: area ≈ 200km × 4km = 800 km²
- Tiles needed: 800,000,000 / (153 × 153) ≈ 34,200 tiles
- Storage: ~10-15KB per JPEG tile → ~340-510 MB
- With zoom 19 overlap tiles for higher precision: ×4 = ~1.4-2.0 GB
- Recommended: zoom 18 primary + zoom 19 for ±500m along flight path → ~500-800 MB total
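The same budget as a quick calculation (the 12.5 KB/tile midpoint is an assumption within the 10-15 KB range above):

```python
# Tile budget for a 200 km route with a ±2 km buffer at zoom 18 (~0.6 m/px).
tile_side_m = 256 * 0.6                  # ~153.6 m of ground per 256 px tile
area_m2 = 200_000 * 4_000                # 800 km²
tiles = area_m2 / tile_side_m ** 2       # ~34,000 tiles
storage_mb = tiles * 12.5 / 1024         # ~415 MB at ~12.5 KB per JPEG tile
```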
| Solution | Tools | Advantages | Limitations | Performance (est. Orin Nano Super TRT FP16) | Params | Fit |
| -------------------------------------- | ------------------------- | ------------------------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------- | ------ | ------------------------------- |
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30=17.86m UAV-VisLoc), 6.31M params | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT). Semi-dense. CVPR 2024. | 2.4x more params than LiteSAM. | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python | Fastest. Proven TRT implementation. | General-purpose, not designed for cross-view gap. | Est. ~50-100ms | <5M | ✅ Speed fallback |

### Component: cuVSLAM Configuration (NEW — previously undefined)

**Mode**: Inertial (mono camera + IMU)

**Camera configuration** (ADTI 20L V1 + 16mm lens):

- Model: Brown distortion
- fx = fy = 3763 px (16mm on 23.2mm sensor at 5456px width)
- cx = 2728 px, cy = 1816 px
- Distortion coefficients: from calibration (k1, k2, p1, p2)
- Border: 50px (ignore lens edge distortion)

**IMU configuration** (Pixhawk 6x IMU — ICM-42688-P):

- Gyroscope noise density: 3.0 × 10⁻³ °/s/√Hz
- Gyroscope random walk: 5.0 × 10⁻⁵ °/s²/√Hz
- Accelerometer noise density: 70 µg/√Hz
- Accelerometer random walk: ~2.0 × 10⁻³ m/s³/√Hz
- IMU frequency: 200 Hz (from flight controller via MAVLink)
- T_imu_rig: measured transformation from Pixhawk IMU to camera center (translation + rotation)

**cuVSLAM settings**:

- OdometryMode: INERTIAL
- MulticameraMode: PRECISION (favor accuracy over speed — we have a 1430ms budget)
- Input resolution: downsample to 1280×852 (or 720p) for processing speed
- async_bundle_adjustment: True

**Initialization**:

- cuVSLAM initializes automatically when it receives the first camera frame + IMU data
- First few frames used for feature initialization and scale estimation
- First satellite match validates and corrects the initial position

**Calibration procedure** (one-time per hardware unit):

1. Camera intrinsics: checkerboard calibration with OpenCV (or use manufacturer data if available)
2. Camera-IMU extrinsic (T_imu_rig): Kalibr tool with checkerboard + IMU data
3. IMU noise parameters: Allan variance analysis or use datasheet values
4. Store calibration files on Jetson storage
### Component: AI Model Inference Runtime (UNCHANGED)

Native TRT Engine — optimal performance and memory on fixed NVIDIA hardware. See draft05 for the full comparison table and conversion workflow.

### Component: Visual Odometry (UNCHANGED)

cuVSLAM in Inertial mode, fed by the ADTI 20L V1 at 0.7 fps sustained. See draft05 for the feasibility analysis at 0.7fps.

### Component: Flight Controller Integration (UPDATED — added GPS_INPUT field spec)

pymavlink over UART at 5-10Hz. GPS_INPUT field population is defined above.

ArduPilot configuration:

- GPS1_TYPE = 14 (MAVLink)
- GPS_RATE = 5 (minimum, matching our 5-10Hz output)
- EK3_SRC1_POSXY = 1 (GPS), EK3_SRC1_VELXY = 1 (GPS) — EKF uses GPS_INPUT as position/velocity source

### Component: Object Localization (NEW — previously undefined)

**Input**: pixel coordinates (u, v) in the Viewpro A40 Pro image, current gimbal angles (pan_deg, tilt_deg), zoom factor, UAV position from the GPS-denied system, UAV altitude
**Process**:

1. Pixel → camera ray: ray_cam = K_viewpro⁻¹(zoom) · [u, v, 1]ᵀ
2. Camera → gimbal frame: ray_gimbal = R_gimbal(pan, tilt) · ray_cam
3. Gimbal → body: ray_body = T_gimbal_body · ray_gimbal
4. Body → NED: ray_ned = R_body_ned(q) · ray_body
5. Ray-ground intersection: assuming flat terrain, with the UAV at NED down-coordinate -h (altitude h above ground): t = h / ray_ned[2], p_ground_ned = p_uav_ned + t · ray_ned
6. NED → WGS84: convert to lat, lon
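Step 5 as code — a sketch assuming flat terrain; the ray must point toward the ground, i.e. have a positive NED down component:

```python
import numpy as np

def ray_ground_point(p_uav_ned, ray_ned):
    # Intersect a viewing ray with the flat ground plane at NED down = 0.
    ray = np.asarray(ray_ned, dtype=float)
    if ray[2] <= 1e-9:
        return None  # ray is parallel to or pointing away from the ground
    t = -p_uav_ned[2] / ray[2]  # UAV down-coordinate is -altitude, so t = h / ray_down
    return np.asarray(p_uav_ned) + t * ray
```

Pointing straight down from 600 m hits the ground directly beneath the UAV; a 45° forward ray hits it 600 m ahead.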
**Output**: { lat, lon, accuracy_m, confidence }

- accuracy_m propagated from: UAV position accuracy (from ESKF) + gimbal angle uncertainty + altitude uncertainty

**API endpoint**: POST /objects/locate

- Request: { pixel_x, pixel_y, gimbal_pan_deg, gimbal_tilt_deg, zoom_factor }
- Response: { lat, lon, alt, accuracy_m, confidence, uav_position: {lat, lon, alt}, timestamp }

### Component: Startup, Handoff & Failsafe (UPDATED — added handoff + reboot + re-localization)

**GPS-denied handoff protocol**:

- GPS-denied system runs continuously from companion computer boot
- Reads initial position from FC (GLOBAL_POSITION_INT) — this may be real GPS or last known
- First satellite match validates the initial position
- FC receives both real GPS (if available) and GPS_INPUT; the FC EKF selects the best source based on accuracy
- No explicit "switch" — the GPS-denied system is a secondary GPS source

**Startup sequence** (expanded from draft05):

1. Boot Jetson → start GPS-Denied service (systemd)
2. Connect to flight controller via pymavlink on UART
3. Wait for heartbeat
4. Initialize PyCUDA context
5. Load TRT engines: litesam.engine + xfeat.engine (~1-3s each)
6. Allocate GPU I/O buffers
7. Create CUDA streams: Stream A (cuVSLAM), Stream B (satellite matching)
8. Load camera calibration + IMU calibration files
9. Read GLOBAL_POSITION_INT → set mission origin (NED reference point) → init ESKF
10. Start cuVSLAM (Inertial mode) with ADTI 20L V1 camera stream
11. Preload satellite tiles within ±2km into RAM
12. Trigger first satellite match → validate initial position
13. Begin GPS_INPUT output loop at 5-10Hz
14. System ready

**Mid-flight reboot recovery**:

1. Jetson boots (~30-60s)
2. GPS-Denied service starts, connects to FC
3. Read GLOBAL_POSITION_INT (FC's current IMU-extrapolated position)
4. Init ESKF with this position + HIGH uncertainty covariance (σ = 200m)
5. Load TRT engines (~2-6s total)
6. Start cuVSLAM (fresh, no prior map)
7. Immediate satellite matching on first camera frame
8. On satellite match success: ESKF corrected, uncertainty drops
9. Estimated total recovery: ~35-70s
10. During recovery: FC uses IMU-only dead reckoning (at 70 km/h: ~700-1400m uncontrolled drift)
11. **Known limitation**: recovery time is dominated by Jetson boot time

**3-consecutive-failure re-localization**:

- Trigger: VO lost + satellite match failed × 3 consecutive camera frames
- Action: send re-localization request via MAVLink STATUSTEXT or custom message
- Message content: "RELOC_REQ: last_lat={lat} last_lon={lon} uncertainty={σ}m"
- Operator response: MAVLink COMMAND_LONG with approximate lat/lon
- System: use operator position as ESKF measurement with R = diag(500², 500², 100²) meters²
- System continues satellite matching with updated search area
- While waiting: GPS_INPUT fix_type=0, IMU-only ESKF prediction continues
### Component: Ground Station Telemetry (UPDATED — added re-localization)

MAVLink messages to ground station:

| Message | Rate | Content |
| ----------------------------- | -------- | --------------------------------------------------- |
| NAMED_VALUE_FLOAT "gps_conf" | 1Hz | Confidence score (0.0-1.0) |
| NAMED_VALUE_FLOAT "gps_drift" | 1Hz | Estimated drift from last satellite anchor (meters) |
| NAMED_VALUE_FLOAT "gps_hacc" | 1Hz | Horizontal accuracy (meters, from ESKF) |
| STATUSTEXT | On event | "RELOC_REQ: ..." for re-localization request |
| STATUSTEXT | On event | Tracking loss / recovery notifications |

### Component: Thermal Management (UNCHANGED)

Same adaptive pipeline from draft05. Active cooling required at 25W. Throttling at 80°C SoC junction.

### Component: API & Inter-System Communication (NEW — consolidated)

FastAPI (Uvicorn) running locally on the Jetson for inter-process communication with other onboard systems.

| Endpoint | Method | Purpose | Auth |
| --------------------- | --------- | -------------------------------------- | ---- |
| /sessions | POST | Start GPS-denied session | JWT |
| /sessions/{id}/stream | GET (SSE) | Real-time position + confidence stream | JWT |
| /sessions/{id}/anchor | POST | Operator re-localization hint | JWT |
| /sessions/{id} | DELETE | End session | JWT |
| /objects/locate | POST | Object GPS from pixel coordinates | JWT |
| /health | GET | System health + memory + thermal | None |
**SSE event schema** (1Hz):

```json
{
  "type": "position",
  "timestamp": "2026-03-17T12:00:00.000Z",
  "lat": 48.123456,
  "lon": 37.654321,
  "alt": 600.0,
  "accuracy_h": 15.2,
  "accuracy_v": 8.1,
  "confidence": "HIGH",
  "drift_from_anchor": 12.5,
  "vo_status": "tracking",
  "last_satellite_match_age_s": 8.3
}
```
## UAV Platform

Unchanged from draft05. See draft05 for: airframe configuration (3.5m S-2 composite, 12.5kg AUW), flight performance (3.4h endurance at 50 km/h), camera specifications (ADTI 20L V1 + 16mm, Viewpro A40 Pro), ground coverage calculations.

## Speed Optimization Techniques

Unchanged from draft05. Key points: cuVSLAM ~9ms/frame, native TRT Engine (no ONNX RT), dual CUDA streams, 5-10Hz GPS_INPUT from ESKF IMU prediction.

## Processing Time Budget

Unchanged from draft05. VO frame: ~17-22ms. Satellite matching: ≤210ms async. Well within the 1430ms frame interval.

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
| ------------------------- | -------------- | ------------------------------------------- |
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map |
| LiteSAM TRT engine | ~50-80MB | If LiteSAM fails: EfficientLoFTR ~100-150MB |
| XFeat TRT engine | ~30-50MB | |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | |
| FastAPI (local IPC) | ~50MB | |
| ESKF + buffers | ~10MB | |
| **Total** | **~2.1-2.9GB** | **26-36% of 8GB** |

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
| ------------------------------------------------------- | ---------- | ----------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| LiteSAM MinGRU ops unsupported in TRT 10.3 | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification. Fallback: EfficientLoFTR TRT → XFeat TRT. |
| cuVSLAM fails on low-texture terrain at 0.7fps | HIGH | Frequent tracking loss | Satellite matching corrections bound drift. Re-localization pipeline handles tracking loss. IMU bridges short gaps. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails, outdated imagery | Pre-flight tile validation. Consider alternative providers (Bing, Mapbox). Robust to seasonal appearance changes via feature-based matching. |
| ESKF scale drift during long constant-velocity segments | MEDIUM | Position error exceeds 100m between satellite anchors | Satellite corrections every 7-14s re-anchor. Altitude constraint from barometer. Monitor drift rate — if >50m between corrections, increase satellite matching frequency. |
| Monocular scale ambiguity | MEDIUM | Metric scale lost during constant-velocity flight | Satellite absolute corrections provide scale. Known altitude constrains vertical scale. IMU acceleration during turns provides observability. |
| AUW exceeds AT4125 recommended range | MEDIUM | Reduced endurance, motor thermal stress | 12.5 kg vs 8-10 kg recommended. Monitor motor temps. Weight optimization. |
| ADTI mechanical shutter lifespan | MEDIUM | Replacement needed periodically | ~8,800 actuations/flight at 0.7fps. Estimated 11-57 flights before replacement. Budget as consumable. |
| Mid-flight companion computer failure | LOW | ~35-70s position gap | Reboot recovery procedure defined. FC uses IMU dead reckoning during gap. Known limitation. |
| Thermal throttling on Jetson | MEDIUM | Satellite matching latency increases | Active cooling required. Monitor SoC temp. Throttling at 80°C. Our workload ~8-15W typical — well under 25W TDP. |
| Engine incompatibility after JetPack update | MEDIUM | Must rebuild engines | Include engine rebuild in update procedure. |
| TRT engine build OOM on 8GB | LOW | Cannot build on target | Models small (6.31M, <5M). Reduce --memPoolSize if needed. |
## Testing Strategy

### Integration / Functional Tests

- **ESKF correctness**: Feed recorded IMU + synthetic VO/satellite data → verify the output matches a reference ESKF implementation
- **GPS_INPUT field validation**: Send GPS_INPUT to SITL ArduPilot → verify the EKF accepts and uses the data correctly
- **Coordinate transform chain**: Known GPS → NED → pixel → back to GPS — verify round-trip error <0.1m
- **Disconnected segment handling**: Simulate tracking loss → verify satellite re-localization triggers → verify cuVSLAM restarts → verify ESKF position continuity
- **3-consecutive-failure**: Simulate VO + satellite failures → verify the re-localization request is sent → verify the operator hint is accepted
- **Object localization**: Known object at known GPS → verify the computed GPS matches within camera accuracy
- **Mid-flight reboot**: Kill the GPS-denied process → restart → verify recovery within the expected time → verify position accuracy after recovery
- **TRT engine load test**: Verify engines load successfully on the Jetson
- **TRT inference correctness**: Compare TRT output vs PyTorch reference (max L1 error < 0.01)
- **CUDA stream pipelining**: Verify Stream B satellite matching does not block Stream A VO
- **ADTI sustained capture rate**: Verify 0.7fps sustained for >30 min without buffer overflow
- **Confidence tier transitions**: Verify fix_type and accuracy change correctly across HIGH → MEDIUM → LOW → FAILED transitions
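The GPS_INPUT validation test above needs the message fields packed the way ArduPilot expects. A minimal sketch of the field packing, assuming the MAVLink GPS_INPUT layout (degE7 lat/lon, ignore-flag bitmask); the helper name and the constant values for hdop/satellites are illustrative, and a real harness would hand this dict to pymavlink's `gps_input_send`:

```python
# Hedged sketch: packing GPS_INPUT fields for ArduPilot. The flag values come
# from the MAVLink GPS_INPUT_IGNORE_FLAGS enum; everything else is assumed.
GPS_INPUT_IGNORE_FLAG_VEL_HORIZ = 8
GPS_INPUT_IGNORE_FLAG_VEL_VERT = 16

def pack_gps_input(lat_deg, lon_deg, alt_m, horiz_acc_m, fix_type=3):
    """Return GPS_INPUT fields as a dict; velocities are flagged as ignored."""
    return {
        "ignore_flags": GPS_INPUT_IGNORE_FLAG_VEL_HORIZ | GPS_INPUT_IGNORE_FLAG_VEL_VERT,
        "fix_type": fix_type,              # 3 = 3D fix
        "lat": int(round(lat_deg * 1e7)),  # degE7, per MAVLink convention
        "lon": int(round(lon_deg * 1e7)),
        "alt": alt_m,                      # metres, MSL
        "hdop": 1.0,
        "vdop": 1.5,
        "horiz_accuracy": horiz_acc_m,     # metres (from the confidence tier)
        "vert_accuracy": horiz_acc_m * 1.5,
        "satellites_visible": 12,          # plausible constant for a synthetic fix
    }

fields = pack_gps_input(48.5000001, 37.25, 120.0, horiz_acc_m=15.0)
```

The SITL test then checks that ArduPilot's EKF innovation stays bounded while these synthetic fixes stream in at 5-10Hz.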

### Non-Functional Tests

- **End-to-end accuracy** (primary validation): Fly with real GPS recording → run the GPS-denied system in parallel → compare estimated vs real positions → verify 80% within 50m, 60% within 20m
- **VO drift rate**: Measure cuVSLAM drift over a 1km straight segment without satellite correction
- **Satellite matching accuracy**: Compare the satellite-matched position vs real GPS at known locations
- **Processing time**: Verify end-to-end per-frame latency <400ms
- **Memory usage**: Monitor over a 30-min session → verify <8GB, no leaks
- **Thermal**: Sustained 30-min run → verify no throttling
- **GPS_INPUT rate**: Verify consistent 5-10Hz delivery to the FC
- **Tile storage**: Validate that the calculated storage matches actual usage for a test mission area
- **MinGRU TRT compatibility** (day-one blocker): Clone LiteSAM → ONNX export → polygraphy → trtexec
- **Flight endurance**: Ground-test full-system power draw against the 267W estimate

## References

- ArduPilot GPS_RATE parameter: [https://github.com/ArduPilot/ardupilot/pull/15980](https://github.com/ArduPilot/ardupilot/pull/15980)
- MAVLink GPS_INPUT message: [https://ardupilot.org/mavproxy/docs/modules/GPSInput.html](https://ardupilot.org/mavproxy/docs/modules/GPSInput.html)
- pymavlink GPS_INPUT example: [https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py](https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py)
- ESKF reference (fixed-wing UAV): [https://github.com/ludvigls/ESKF](https://github.com/ludvigls/ESKF)
- ROS ESKF multi-sensor: [https://github.com/EliaTarasov/ESKF](https://github.com/EliaTarasov/ESKF)
- Range-VIO scale observability: [https://arxiv.org/abs/2103.15215](https://arxiv.org/abs/2103.15215)
- NaviLoc trajectory-level localization: [https://www.mdpi.com/2504-446X/10/2/97](https://www.mdpi.com/2504-446X/10/2/97)
- SatLoc-Fusion hierarchical framework: [https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f](https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f)
- Auterion GPS-denied workflow: [https://docs.auterion.com/vehicle-operation/auterion-mission-control/useful-resources/operations/gps-denied-workflow](https://docs.auterion.com/vehicle-operation/auterion-mission-control/useful-resources/operations/gps-denied-workflow)
- PX4 GNSS-denied flight: [https://docs.px4.io/main/en/advanced_config/gnss_degraded_or_denied_flight.html](https://docs.px4.io/main/en/advanced_config/gnss_degraded_or_denied_flight.html)
- ArduPilot GPS_INPUT advanced usage: [https://discuss.ardupilot.org/t/advanced-usage-of-gps-type-mav-14/99406](https://discuss.ardupilot.org/t/advanced-usage-of-gps-type-mav-14/99406)
- Google Maps Ukraine imagery: [https://newsukraine.rbc.ua/news/google-maps-has-surprise-for-satellite-imagery-1727182380.html](https://newsukraine.rbc.ua/news/google-maps-has-surprise-for-satellite-imagery-1727182380.html)
- Jetson Orin Nano Super thermal: [https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/](https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/)
- GSD matching research: [https://www.kjrs.org/journal/view.html?pn=related&uid=756&vmd=Full](https://www.kjrs.org/journal/view.html?pn=related&uid=756&vmd=Full)
- VO+satellite matching pipeline: [https://polen.itu.edu.tr/items/1fe1e872-7cea-44d8-a8de-339e4587bee6](https://polen.itu.edu.tr/items/1fe1e872-7cea-44d8-a8de-339e4587bee6)
- PyCuVSLAM docs: [https://wiki.seeedstudio.com/pycuvslam_recomputer_robotics/](https://wiki.seeedstudio.com/pycuvslam_recomputer_robotics/)
- Pixhawk 6x IMU (ICM-42688-P) datasheet: [https://invensense.tdk.com/products/motion-tracking/6-axis/icm-42688-p/](https://invensense.tdk.com/products/motion-tracking/6-axis/icm-42688-p/)
- All references from solution_draft05.md

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Completeness assessment research: `_docs/00_research/solution_completeness_assessment/`
- Previous research: `_docs/00_research/trt_engine_migration/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (needs sync with draft05 corrections)
- Security analysis: `_docs/01_solution/security_analysis.md`
- Previous draft: `_docs/01_solution/solution_draft05.md`
# Solution Draft

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

**Hard constraint**: The camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.

**Satellite matching strategy**: Benchmark LiteSAM on actual Orin Nano Super hardware as a day-one priority. If LiteSAM cannot achieve ≤400ms at 480px resolution, **abandon it entirely** and use XFeat semi-dense matching as the primary satellite matcher. Speed is non-negotiable.

**Core architectural principles**:
1. **cuVSLAM handles VO** — NVIDIA's CUDA-accelerated library achieves 90fps on Jetson Orin Nano, giving VO essentially "for free" (~11ms/frame).
2. **Keyframe-based satellite matching** — the satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — this handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
```
┌─────────────────────────────────────────────────────────────────┐
│                     OFFLINE (Before Flight)                     │
│  Satellite Tiles → Download & Crop → Store as tile pairs        │
│  (Google Maps)     (per flight plan)  (disk, GeoHash indexed)   │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                     ONLINE (During Flight)                      │
│                                                                 │
│  EVERY FRAME (400ms budget):                                    │
│  ┌────────────────────────────────┐                             │
│  │ Camera → Downsample (CUDA 2ms) │                             │
│  │     → cuVSLAM VO+IMU (~11ms)   │──→ ESKF Update → SSE Emit   │
│  └────────────────────────────────┘                    ↑        │
│                                                        │        │
│  KEYFRAMES ONLY (every 3-10 frames):                   │        │
│  ┌────────────────────────────────────┐                │        │
│  │ Satellite match (async CUDA stream)│────────────────┘        │
│  │ LiteSAM or XFeat (see benchmark)   │                         │
│  │ (does NOT block VO output)         │                         │
│  └────────────────────────────────────┘                         │
│                                                                 │
│  IMU: 100+Hz continuous → ESKF prediction                       │
└─────────────────────────────────────────────────────────────────┘
```

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~11ms/frame)

NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 90fps on Jetson Orin Nano. It supports a monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs. This eliminates custom VO entirely.

### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when the ESKF covariance exceeds a threshold
  - VO failure: when cuVSLAM reports tracking loss (e.g. a sharp turn)
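
The three keyframe triggers above can be sketched as a single predicate. A minimal sketch: the function name, the thresholds, and the `FrameState` fields are illustrative assumptions, not the real pipeline's API:

```python
# Hedged sketch of the keyframe-selection logic; thresholds are placeholders.
from dataclasses import dataclass

@dataclass
class FrameState:
    frame_idx: int
    last_keyframe_idx: int
    position_cov_trace: float   # trace of ESKF position covariance (m^2)
    tracking_lost: bool         # reported by cuVSLAM

KEYFRAME_INTERVAL = 5        # frames, within the 3-10 range above
COV_TRACE_THRESHOLD = 400.0  # m^2, i.e. roughly 20m 1-sigma spread

def is_keyframe(s: FrameState) -> bool:
    if s.tracking_lost:                                  # VO failure trigger
        return True
    if s.position_cov_trace > COV_TRACE_THRESHOLD:       # confidence-drop trigger
        return True
    return s.frame_idx - s.last_keyframe_idx >= KEYFRAME_INTERVAL  # fixed interval
```

Any one trigger suffices, so a tracking loss forces a satellite match even if the interval counter has not elapsed.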

### 3. Satellite Matcher Selection (Benchmark-Driven)

**Candidate A: LiteSAM (opt)** — Best accuracy for satellite-aerial matching (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne + TAIFormer + MinGRU. Benchmarked at 497ms on Jetson AGX Orin at 1184px. The AGX Orin is 3-4x more powerful than the Orin Nano Super (275 TOPS vs 67 TOPS, $2000+ vs $249).

Realistic Orin Nano Super estimates:
- At 1184px: ~1.5-2.0s (unusable)
- At 640px: ~500-800ms (borderline)
- At 480px: ~300-500ms (best case)

**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. Not specifically designed for cross-view satellite-aerial matching, but fast and reliable.

**Decision rule**: Benchmark LiteSAM TensorRT FP16 at 480px on the Orin Nano Super. If ≤400ms → use LiteSAM. If >400ms → **abandon LiteSAM, use XFeat as primary**. No hybrid compromises — pick one and optimize it.

### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — the multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. INT8 is possible for the MobileOne backbone, but the ViT/transformer components may degrade with INT8.

### 5. CUDA Stream Pipelining

Overlap operations across consecutive frames:
- Stream A: cuVSLAM VO for the current frame (~11ms) + ESKF fusion (~1ms)
- Stream B: satellite matching for the previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic

### 6. Pre-cropped Satellite Tiles

Offline: for each satellite tile, store both the raw image and a pre-resized version matching the satellite matcher's input resolution. The runtime avoids the resize cost.
## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (90fps) | VO only, no satellite matching |

**Key insight**: Combine cuVSLAM (best-in-class VO for Jetson) with the fastest viable satellite-aerial matcher via ESKF fusion. LiteSAM is the accuracy leader but unproven on the Orin Nano Super — benchmark first, abandon for XFeat if too slow.

## Architecture

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM / C++ API | 90fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — purpose-built by NVIDIA for Jetson. ~11ms/frame leaves 389ms for everything else. Auto-fallback to IMU when visual tracking fails.

### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| LiteSAM (opt) | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | 497ms on AGX Orin at 1184px; AGX Orin is 3-4x more powerful than Orin Nano Super | ✅ If benchmark passes |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven | Not designed for cross-view satellite-aerial | ✅ If LiteSAM fails benchmark |
| EfficientLoFTR | TensorRT | Good accuracy, semi-dense | 15.05M params (2.4x LiteSAM), slower | ⚠️ Heavier |
| SuperPoint + LightGlue | TensorRT C++ | Good general matching | Sparse only, worse on satellite-aerial | ⚠️ Not specialized |

**Selection**: Benchmark-driven. Day-one test on the Orin Nano Super:
1. Export LiteSAM (opt) to TensorRT FP16
2. Measure at 480px, 640px, 800px
3. If ≤400ms at 480px → **LiteSAM**
4. If >400ms at any viable resolution → **XFeat semi-dense** (primary, no hybrid)

### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | ✅ Best |
| Hybrid ESKF/UKF | Custom | Reported 49% better accuracy | More complex | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ❌ Too heavy |

**Selected**: **ESKF** with adaptive measurement noise. Nominal state vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states (the error state is 15-dimensional, since the orientation error is a 3-vector).

Measurement sources and rates:
- IMU prediction: 100+Hz
- cuVSLAM VO update: ~3Hz (every frame)
- Satellite update: ~0.3-1Hz (keyframes only, delayed via the async pipeline)
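
The multi-rate structure above (predict at IMU rate, correct at VO or satellite rate) can be shown on a deliberately stripped-down filter. This is a heavily simplified sketch: a linear Kalman filter over [position, velocity] for one axis, whereas the real ESKF carries the full 16-dim nominal state with quaternion and biases; class and parameter names are ours:

```python
# Hedged sketch: multi-rate predict/correct skeleton, one axis only.
import numpy as np

class MultiRateKF:
    def __init__(self):
        self.x = np.zeros(2)             # [pos (m), vel (m/s)]
        self.P = np.eye(2) * 100.0       # large initial uncertainty

    def predict(self, accel, dt):
        """IMU-rate step (100+Hz): constant-acceleration propagation."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt**2, dt]) * accel
        self.P = F @ self.P @ F.T + np.eye(2) * 0.01 * dt

    def update_position(self, z, r):
        """Position correction: VO (~3Hz, relative, small r) or
        satellite match (~0.3-1Hz, absolute, r from match confidence)."""
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + r
        K = (self.P @ H.T) / S
        self.x = self.x + (K * (z - self.x[0])).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```

Both measurement types reuse the same `update_position`; only the noise `r` differs, which is exactly how the adaptive measurement noise mentioned above enters.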

### Component: Satellite Tile Preprocessing (Offline)

**Selected**: **GeoHash-indexed tile pairs on disk**.

Pipeline:
1. Define the operational area from the flight plan
2. Download satellite tiles from the Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to the matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in a GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
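
The GeoHash index in step 4 can be sketched as follows. The geohash algorithm itself is the standard one (interleave longitude/latitude bisection bits, base32-encode); the directory layout, precision, and function names are illustrative assumptions:

```python
# Hedged sketch: standard geohash encoder + an assumed two-level tile layout.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Standard geohash: alternate lon/lat bisection bits, 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, even = [], True                      # longitude bit comes first
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            bits.append(1 if lon >= mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
    return "".join(
        BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, precision * 5, 5)
    )

def tile_dir(lat: float, lon: float) -> str:
    gh = geohash_encode(lat, lon, 6)           # precision 6 ≈ 1.2km × 0.6km cell
    return f"tiles/{gh[:3]}/{gh}"              # assumed two-level directory
```

Prefix lookups then come for free: all tiles sharing the first characters of a hash are spatially adjacent, which is what makes the expanded-radius search during re-localization a cheap directory scan.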

### Component: Re-localization (Disconnected Segments)

**Selected**: **Keyframe satellite matching is always active + expanded search on VO failure**.

When cuVSLAM reports tracking loss (sharp turn, no features):
1. Immediately flag the next frame as a keyframe → trigger satellite matching
2. Expand the tile search radius (from ±200m to ±1km, based on IMU dead-reckoning uncertainty)
3. If a match is found: position recovered, a new segment begins
4. If 3+ consecutive keyframe failures: request user input via the API

### Component: Object Center Coordinates

Geometric calculation once the frame-center GPS is known:
1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by the IMU yaw heading
4. Convert the meter offset to lat/lon and add to the frame-center GPS

### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for the real-time position stream. OpenAPI auto-documentation.
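
On the wire, each position fix becomes one `text/event-stream` frame, which sse-starlette emits from an async generator. A minimal sketch of that framing; the event name and payload fields are illustrative, not a documented schema:

```python
# Hedged sketch: the SSE frame the position stream emits per fix.
import json

def sse_event(event: str, data: dict) -> str:
    """Serialise one Server-Sent Event frame (text/event-stream)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

frame = sse_event("position", {"lat": 48.5, "lon": 37.25, "confidence": "HIGH"})
```

In the FastAPI endpoint, yielding `{"event": ..., "data": ...}` dicts to sse-starlette's `EventSourceResponse` produces exactly this framing, so clients receive the VO fix immediately and a second `position` event when the satellite correction lands.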

## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~11ms | NVIDIA CUDA-optimized, 90fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~25ms** | Well within 400ms |

### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — it does NOT block the per-frame VO output.

**Path A — LiteSAM (if benchmark passes)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to ~480px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~5ms | Pre-resized, from storage |
| LiteSAM (opt) matching | ~300-500ms | TensorRT FP16, 480px, Orin Nano Super estimate |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **~310-510ms** | Async, does not block VO |

**Path B — XFeat (if LiteSAM abandoned)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |

### Per-Frame Wall-Clock Latency

Every frame:
- **VO result emitted in ~25ms** (cuVSLAM + ESKF + SSE)
- Satellite correction arrives asynchronously on keyframes
- The client gets an immediate position, then a refined position when the satellite match completes

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-300MB | NVIDIA CUDA library + internal state |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| Satellite tile (pre-resized) | ~1MB | Single active tile |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~1.9-2.4GB** | ~25-30% of 8GB — comfortable margin |

## Confidence Scoring

| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
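
Downstream, each tier has to translate into the fix_type and accuracy fields reported to the FC (the testing strategy checks exactly these transitions). A minimal sketch of that mapping; the specific fix_type choices and accuracy numbers are our assumptions, loosely derived from the accuracy column above:

```python
# Hedged sketch: confidence tier -> GPS_INPUT-style reporting fields.
TIERS = {
    # tier: (fix_type, horiz_accuracy_m) -- values are assumptions
    "HIGH":     (3, 15.0),   # 3 = 3D fix
    "MEDIUM":   (3, 35.0),
    "LOW":      (2, 75.0),   # downgrade to 2D fix to reduce EKF trust
    "VERY LOW": (1, 150.0),  # 1 = no usable fix
    "MANUAL":   (3, 50.0),
}

def gps_input_fields(tier: str) -> dict:
    fix_type, hacc = TIERS[tier]
    return {"fix_type": fix_type, "horiz_accuracy": hacc}
```

Keeping the mapping in one table makes the HIGH → MEDIUM → LOW transitions testable in isolation from the estimator.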

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| LiteSAM too slow on Orin Nano Super | HIGH | Misses the 400ms deadline | **Abandon LiteSAM, use XFeat**. The day-one benchmark is the go/no-go gate |
| cuVSLAM not supporting a nadir-only camera well | MEDIUM | VO accuracy degrades | Fall back to XFeat frame-to-frame matching |
| Google Maps satellite quality in the conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| XFeat cross-view accuracy insufficient | MEDIUM | Position corrections less accurate than LiteSAM | Increase keyframe frequency; multi-tile consensus voting; geometric verification with strict RANSAC |
| cuVSLAM is closed-source | LOW | Hard to debug | Fall back to XFeat VO; cuVSLAM has Python and C++ APIs |

## Testing Strategy

### Integration / Functional Tests
- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground-truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce a 90-degree heading change in the sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify the client receives the VO result within 50ms and the satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via the REST API

### Non-Functional Tests
- **Day-one benchmark**: LiteSAM TensorRT FP16 at 480/640/800px on Orin Nano Super → go/no-go for LiteSAM
- cuVSLAM benchmark: verify 90fps monocular+IMU on Orin Nano Super
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during a 1000-frame session (must stay <8GB)
- Stress: process 3000 frames without a memory leak
- Keyframe strategy: vary the interval (2, 3, 5, 10) and measure the accuracy vs latency tradeoff

## References
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite

## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| LiteSAM at 480px as satellite matcher | **Performance**: 497ms on AGX Orin at 1184px. The Orin Nano Super is ~3-4x slower. At 480px, estimated ~270-360ms — borderline. The paper uses PyTorch AMP, not TensorRT FP16; TensorRT could bring a 2-3x improvement. | Add TensorRT FP16 as a mandatory optimization step. Revised estimate at 480px with TensorRT: ~90-180ms. Still benchmark-driven: abandon if >400ms. |
| XFeat as LiteSAM fallback for satellite matching | **Functional**: XFeat is a general-purpose feature matcher, NOT designed for the cross-view satellite-aerial gap. It may fail on season/lighting differences between UAV and satellite imagery. | **Expand fallback options**: benchmark EfficientLoFTR (designed for weak-texture aerial imagery) alongside XFeat. Consider STHN-style deep homography as a third option. See the detailed satellite matcher comparison below. |
| SP+LG considered as "sparse only, worse on satellite-aerial" | **Functional**: The LiteSAM paper confirms "SP+LG achieves fastest inference speed but at expense of accuracy." A sparse matcher fails on texture-scarce regions. ~180-360ms on Orin Nano Super. | **Reject SP+LG** for both VO and satellite matching. cuVSLAM is 15-33x faster for VO. |
| cuVSLAM on low-texture terrain | **Functional**: cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade tracking. On uniform agricultural fields/water bodies, features will be sparse → frequent tracking loss. The IMU fallback lasts only ~1s. No published benchmarks cover nadir agricultural terrain. Pose recovery after tracking loss is NOT guaranteed. | **CRITICAL RISK**: cuVSLAM will likely fail frequently over low-texture terrain. Mitigation: (1) increase satellite matching frequency in low-texture areas, (2) use an IMU dead-reckoning bridge, (3) accept higher drift in featureless segments, (4) XFeat VO as a secondary fallback may also struggle on the same terrain. |
| cuVSLAM memory estimate ~200-300MB | **Performance**: The map grows over time. For 3000-frame flights (~16min at 3fps), the map could reach 500MB-1GB without pruning. | Configure cuVSLAM map pruning. Set max keyframes. Monitor memory. |
| Tile search on VO failure: "expand to ±1km" | **Functional**: Underspecified. Loading 10-20 tiles from disk is slow. | Preload tiles within ±2km of the flight plan into RAM. Ranked search by IMU dead-reckoning position. |
| LiteSAM resolution | **Performance**: The paper benchmarked at 1184px on AGX Orin (497ms, AMP). TensorRT FP16 with a reparameterized MobileOne is expected 2-3x faster. | Benchmark LiteSAM TRT FP16 at **1280px** on the Orin Nano Super. If ≤200ms → use LiteSAM at 1280px. If >200ms → use XFeat. |
| SP+LG proposed for VO by user | **Performance**: ~130-280ms/frame on Orin Nano. cuVSLAM: ~8.6ms/frame. No IMU, no loop closure. | **Reject SP+LG for VO.** cuVSLAM is 15-33x faster. XFeat frame-to-frame remains the fallback. |

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

**Hard constraint**: The camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.

**Satellite matching strategy**: Benchmark LiteSAM TensorRT FP16 at **1280px** on the Orin Nano Super as a day-one priority. The paper's AGX Orin benchmark used PyTorch AMP — TensorRT FP16 with a reparameterized MobileOne should yield a 2-3x additional speedup. **Decision rule: if LiteSAM TRT FP16 at 1280px ≤200ms → use LiteSAM. If >200ms → use XFeat.**

**Core architectural principles**:
1. **cuVSLAM handles VO** — 116fps on Orin Nano 8GB, ~8.6ms/frame. SuperPoint+LightGlue was evaluated and rejected (15-33x slower, no IMU integration).
2. **Keyframe-based satellite matching** — the satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — this handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
5. **Proactive tile loading** — preload tiles within ±2km of the flight plan into RAM for fast lookup during expanded search.

```
┌─────────────────────────────────────────────────────────────────┐
│                     OFFLINE (Before Flight)                     │
│  Satellite Tiles → Download & Crop → Store as tile pairs        │
│  (Google Maps)     (per flight plan)  (disk, GeoHash indexed)   │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                     ONLINE (During Flight)                      │
│                                                                 │
│  EVERY FRAME (400ms budget):                                    │
│  ┌────────────────────────────────┐                             │
│  │ Camera → Downsample (CUDA 2ms) │                             │
│  │     → cuVSLAM VO+IMU (~9ms)    │──→ ESKF Update → SSE Emit   │
│  └────────────────────────────────┘                    ↑        │
│                                                        │        │
│  KEYFRAMES ONLY (every 3-10 frames):                   │        │
│  ┌────────────────────────────────────┐                │        │
│  │ Satellite match (async CUDA stream)│────────────────┘        │
│  │ LiteSAM TRT FP16 or XFeat          │                         │
│  │ (does NOT block VO output)         │                         │
│  └────────────────────────────────────┘                         │
│                                                                 │
│  IMU: 100+Hz continuous → ESKF prediction                       │
│  TILES: ±2km preloaded in RAM from flight plan                  │
└─────────────────────────────────────────────────────────────────┘
```

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 116fps on a Jetson Orin Nano 8GB at 720p. It supports a monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs.

**Why not SuperPoint+LightGlue for VO**: SP+LG is 15-33x slower (~130-280ms vs ~9ms) and lacks IMU integration, loop closure, and auto-fallback.

**CRITICAL: cuVSLAM on difficult/uniform terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corner detection + Lucas-Kanade optical-flow tracking (classical features, not learned). On uniform agricultural terrain or water bodies:
- Very few corners will be detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1 second) → constant-velocity integrator (~0.5s more)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
- All published benchmarks (KITTI: urban/suburban, EuRoC: indoor) do NOT include nadir agricultural terrain
- Multi-stereo mode helps with featureless surfaces, but we have a mono camera only

**Mitigation strategy for low-texture terrain**:
1. **Increase satellite matching frequency**: In low-texture areas (detected by cuVSLAM's keypoint count dropping), switch from every 3-10 frames to every frame
2. **IMU dead-reckoning bridge**: When cuVSLAM reports tracking loss, the ESKF continues with IMU prediction. At 3fps, a ~1.5s IMU bridge covers ~4-5 frames
3. **Accept higher drift**: In featureless segments, position accuracy degrades to IMU-only level (50-100m+ over ~10s). Satellite matching must recover the absolute position when texture returns
4. **Keypoint density monitoring**: Track cuVSLAM's number of tracked features per frame. When below a threshold (e.g. <50), proactively trigger satellite matching
5. **XFeat frame-to-frame as VO fallback**: XFeat uses learned features that may detect texture invisible to Shi-Tomasi corners. But XFeat may also struggle on truly uniform terrain

### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when the ESKF covariance exceeds a threshold
  - VO failure: when cuVSLAM reports tracking loss (e.g. a sharp turn)

### 3. Satellite Matcher Selection (Benchmark-Driven)

**Important context**: Our UAV-to-satellite matching is EASIER than typical cross-view geo-localization problems. Both the UAV camera and the satellite imagery are approximately nadir (top-down). The main challenges are season/lighting differences, resolution mismatch, and temporal changes — not the extreme viewpoint gap seen in ground-to-satellite matching. This means even general-purpose matchers may perform well.

**Candidate A: LiteSAM (opt) with TensorRT FP16 at 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params; MobileOne is reparameterizable for TensorRT. The paper benchmarked 497ms on AGX Orin using AMP at 1184px. TensorRT FP16 with a reparameterized MobileOne is expected to be 2-3x faster than AMP. At 1280px (close to the paper's 1184px benchmark resolution), accuracy should match the published results.

Orin Nano Super TensorRT FP16 estimate at 1280px:
- AGX Orin AMP @ 1184px: 497ms
- TRT FP16 speedup over AMP: ~2-3x → AGX Orin TRT estimate: ~165-250ms
- The Orin Nano Super has ~3-4x less raw compute → naive scaling of the TRT estimate gives ~500-1000ms
- If the workload is partly memory-bound rather than compute-bound, the effective gap shrinks; optimistic range: **~165-330ms**
- Go/no-go threshold: **≤200ms** — only the on-device benchmark decides

**Candidate B (fallback): XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. General-purpose, not designed for the cross-view gap. The FASTEST option. Since our cross-view gap is small (both nadir), XFeat may work adequately for this specific use case.

**Other evaluated options (not selected)**:

- **EfficientLoFTR**: Semi-dense, 15.05M params, handles weak texture well. ~20% slower than LiteSAM. A strong option if the LiteSAM codebase proves difficult to export to TRT, but a larger model footprint.
- **Deep Homography (STHN-style)**: End-to-end homography estimation, no feature/RANSAC pipeline. 4.24m at 50m range. An interesting future option but needs RGB retraining — higher implementation risk.
- **PFED and retrieval-based methods**: Image RETRIEVAL only (identifies which tile matches), not pixel-level matching. We already know which tile to use from the ESKF position.
- **SuperPoint+LightGlue**: Sparse matcher. The LiteSAM paper confirms worse satellite-aerial accuracy. Slower than XFeat.

**Decision rule** (day-one on Orin Nano Super):
1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → use LiteSAM at 1280px**
4. **If >200ms → use XFeat**

### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. **Do NOT use INT8 on transformer components** (TAIFormer) — accuracy degrades. INT8 is safe only for the MobileOne backbone CNN layers.
### 5. CUDA Stream Pipelining

Overlap operations across consecutive frames:

- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic

### 6. Proactive Tile Loading

**Change from draft01**: Instead of loading tiles on-demand from disk, preload tiles within ±2km of the flight plan into RAM at session start. This eliminates disk I/O latency during flight. For a 50km flight path, ~2000 tiles at zoom 19 ≈ ~200MB RAM — well within budget.

On VO failure / expanded search:

1. Compute IMU dead-reckoning position
2. Rank preloaded tiles by distance to predicted position
3. Try top 3 tiles (not all tiles in ±1km radius)
4. If no match in top 3, expand to next 3

## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DINOv2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| VRLM (2024) | FocalNet backbone + multi-scale feature fusion | 83.35% MA@20 | Desktop | Not edge-optimized |
| Scale-Aware UAV-to-Satellite (2026) | Semantic geometric + metric scale recovery | N/A | Desktop | Addresses scale ambiguity problem |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection, semi-dense | Competitive with LiteSAM | 2.5x faster than LoFTR, TRT available | 15.05M params, heavier than LiteSAM |
| PFED (2025) | Knowledge distillation + multi-view refinement, retrieval | 97.15% Recall@1 (University-1652) | AGX Orin (251.5 FPS) | Retrieval only, not pixel-level matching |
| STHN (IEEE RA-L 2024) | Deep homography estimation, coarse-to-fine | 4.24m at 50m range | Open-source, lightweight | Trained on thermal, needs RGB retraining |
| Hierarchical AVL (2025) | DINOv2 retrieval + SuperPoint matching | 64.5-95% success rate | ROS, IMU integration | Two-stage complexity |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Designed for Mars/planetary, needs adaptation |

## Architecture

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ~30-50ms/frame | ⚠️ Fallback |
| SuperPoint+LightGlue | LightGlue-ONNX TRT | Good accuracy, adaptive pruning | ~130-280ms, no IMU, no loop closure | ~130-280ms/frame | ❌ Rejected |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ~33ms/frame | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built by NVIDIA for Jetson. Auto-fallback to IMU when visual tracking fails.

**SP+LG rejection rationale**: 15-33x slower than cuVSLAM. No built-in IMU fusion, loop closure, or tracking failure detection. Building these features around SP+LG would take significant development time and still be slower. XFeat at ~30-50ms is a better fallback for VO if cuVSLAM fails on the nadir camera.

### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | Untested on Orin Nano Super with TensorRT | Est. ~165-330ms @ 1280px TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven, fastest | General-purpose, not designed for cross-view. Our nadir-nadir gap is small → may work. | ~50-100ms | ✅ Fallback if LiteSAM >200ms |

**Selection**: Day-one benchmark on Orin Nano Super:

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → LiteSAM at 1280px**
4. **If >200ms → XFeat**

### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ~10-50ms/step | ❌ Too heavy |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
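
The nominal state and its IMU prediction step can be sketched as follows. This is a minimal illustration of the 16-state layout above, not the flight filter: the frame conventions (z-up gravity, body assumed aligned with the nav frame) and the small-angle quaternion update are simplifying assumptions, and the error-state covariance propagation is omitted.

```python
# Minimal sketch of the selected ESKF nominal state and IMU prediction.
# Simplified: no frame rotation of the accelerometer, no covariance update.
import numpy as np

def make_state():
    """Nominal state: position(3), velocity(3), quaternion(4, wxyz),
    accel bias(3), gyro bias(3) = 16 states, as in the table above."""
    x = np.zeros(16)
    x[6] = 1.0  # identity quaternion
    return x

def quat_mul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def predict(x, accel, gyro, dt, g=np.array([0.0, 0.0, -9.81])):
    """Propagate the nominal state with bias-corrected IMU measurements."""
    p, v, q = x[0:3], x[3:6], x[6:10]
    ba, bg = x[10:13], x[13:16]
    a = accel - ba          # bias-corrected specific force
    w = gyro - bg           # bias-corrected angular rate
    p_new = p + v * dt + 0.5 * (a + g) * dt**2
    v_new = v + (a + g) * dt
    dq = np.concatenate(([1.0], 0.5 * w * dt))   # small-angle increment
    q_new = quat_mul(q, dq)
    q_new /= np.linalg.norm(q_new)
    out = x.copy()
    out[0:3], out[3:6], out[6:10] = p_new, v_new, q_new
    return out
```

A stationary sanity check: with the accelerometer reading +9.81 m/s² (cancelling gravity) and zero gyro, the predicted position and velocity stay at zero.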

### Component: Satellite Tile Preprocessing (Offline)

**Selected**: **GeoHash-indexed tile pairs on disk + RAM preloading**.

Pipeline:

1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
6. **At session start**: preload tiles within ±2km of flight plan into RAM (~200MB for 50km route)
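
As an illustration of step 2's tile addressing, the standard Web-Mercator (slippy-map) conversion maps a GPS point to integer tile indices at a given zoom. The directory layout here is hypothetical (the real pipeline indexes by GeoHash); the lat/lon→tile formula itself is the standard one.

```python
# Standard slippy-map tile math: lat/lon -> (x, y) tile indices at a zoom
# level. The tile_path() layout is illustrative, not the project's scheme.
import math

def deg2tile(lat_deg, lon_deg, zoom):
    """Convert WGS84 degrees to Web-Mercator tile indices."""
    lat = math.radians(lat_deg)
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y

def tile_path(lat_deg, lon_deg, zoom=19):
    """Hypothetical on-disk key for the tile covering a point."""
    x, y = deg2tile(lat_deg, lon_deg, zoom)
    return f"tiles/z{zoom}/{x}/{y}.jpg"
```

At zoom 19 one 256px tile covers roughly 75m of ground at mid latitudes, which is consistent with the ~0.3 m/pixel figure used elsewhere in this document.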

### Component: Re-localization (Disconnected Segments)

**Selected**: **Keyframe satellite matching always active + ranked tile search on VO failure**.

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Immediately flag next frame as keyframe → trigger satellite matching
2. Compute IMU dead-reckoning position since last known position
3. Rank preloaded tiles by distance to dead-reckoning position
4. Try top 3 tiles sequentially (not all tiles in radius)
5. If match found: position recovered, new segment begins
6. If 3 consecutive keyframe failures across top tiles: expand to next 3 tiles
7. If still no match after 3+ full attempts: request user input via API
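
Steps 3-6 above amount to a ranked, batched search over the preloaded tiles. The sketch below assumes tiles carry lat/lon centers and that a `match()` callback returns a pose or `None`; both are illustrative, not the project API.

```python
# Sketch of the ranked re-localization search: order preloaded tiles by
# distance to the dead-reckoned position, try them in batches of 3.
import math

def rank_tiles(tiles, pred_lat, pred_lon):
    """tiles: list of dicts with 'lat'/'lon' centers. Nearest first."""
    def dist_m(t):
        # equirectangular approximation is fine at tile-search scale
        dx = (t["lon"] - pred_lon) * 111_320 * math.cos(math.radians(pred_lat))
        dy = (t["lat"] - pred_lat) * 111_320
        return math.hypot(dx, dy)
    return sorted(tiles, key=dist_m)

def relocalize(tiles, pred_lat, pred_lon, match, batch=3, max_batches=3):
    """Try top-3 tiles, expand to the next 3 on failure (steps 3-6)."""
    ranked = rank_tiles(tiles, pred_lat, pred_lon)
    for i in range(0, min(len(ranked), batch * max_batches), batch):
        for tile in ranked[i:i + batch]:
            pose = match(tile)
            if pose is not None:
                return pose        # position recovered, new segment begins
    return None                    # step 7: escalate to user input
```

Returning `None` corresponds to step 7, where the system requests operator input.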

### Component: Object Center Coordinates

Geometric calculation once frame-center GPS is known:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
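
The four steps can be worked through in a few lines. This is a sketch with assumed axis conventions (image x→east, y→north at zero yaw; real camera images have y pointing down, so a sign flip would be needed) and an equirectangular small-offset approximation for step 4.

```python
# Worked sketch of the four steps above for a nadir camera.
# Axis conventions are illustrative; flip dy_px for image-down y axes.
import math

def pixel_to_gps(center_lat, center_lon, dx_px, dy_px, gsd_m, yaw_deg):
    # steps 1-2: pixel offset from frame center -> metres
    dx_m, dy_m = dx_px * gsd_m, dy_px * gsd_m
    # step 3: rotate image-frame offset into east/north by IMU yaw
    yaw = math.radians(yaw_deg)
    east = dx_m * math.cos(yaw) - dy_m * math.sin(yaw)
    north = dx_m * math.sin(yaw) + dy_m * math.cos(yaw)
    # step 4: metre offset -> lat/lon delta (small-offset approximation)
    dlat = north / 111_320
    dlon = east / (111_320 * math.cos(math.radians(center_lat)))
    return center_lat + dlat, center_lon + dlon
```

For example, at GSD 0.3 m/px and zero yaw, an object 100px "up" from center lands ~30m north of the frame-center GPS.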

### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~23ms** | Well within 400ms |

### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.

**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 matching | ≤200ms | TensorRT FP16, 1280px, go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async, within budget |

**Path B — XFeat (if LiteSAM >200ms)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state. **Configure map pruning for 3000-frame flights** |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan, pre-resized |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~2.1-2.9GB** | ~26-36% of 8GB — comfortable margin |

## Confidence Scoring

| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **cuVSLAM fails on low-texture agricultural terrain** | **HIGH** | Frequent tracking loss, degraded VO | Increase satellite matching frequency when keypoint count drops. IMU dead-reckoning bridge (~1.5s). Accept higher drift in featureless segments. Satellite matching recovers position when texture returns. |
| LiteSAM TRT FP16 >200ms at 1280px on Orin Nano Super | MEDIUM | Must use XFeat instead (less accurate for cross-view) | Day-one TRT FP16 benchmark. If >200ms → XFeat. Since our nadir-nadir gap is small, XFeat may still perform adequately. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Benchmark XFeat on actual operational area satellite-aerial pairs. Increase keyframe frequency; multi-tile consensus; strict RANSAC. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, set max keyframes. Monitor memory. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| cuVSLAM is closed-source, no nadir benchmarks | MEDIUM | Unknown failure modes over farmland | Extensive testing with real nadir UAV imagery before deployment. XFeat VO as fallback (also uses learned features). |
| Tile I/O bottleneck during expanded search | LOW | Delayed re-localization | Preload ±2km tiles in RAM; ranked search instead of exhaustive |

## Testing Strategy

### Integration / Functional Tests

- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
- Test cuVSLAM map memory: run 3000-frame session, monitor memory growth

### Non-Functional Tests

- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM. If >200ms → use XFeat. Also measure accuracy on test satellite-aerial pairs for both.
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- **cuVSLAM terrain stress test**: test with nadir camera over (a) urban/structured terrain, (b) agricultural fields, (c) water/uniform terrain, (d) forest. Measure: keypoint count, tracking success rate, drift per 100 frames, IMU fallback frequency
- cuVSLAM keypoint monitoring: verify that low-keypoint detection triggers increased satellite matching
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 3000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
- Tile preloading: verify RAM usage of preloaded tiles for 50km flight plan

## References

- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR paper: https://zju3dv.github.io/efficientloftr/
- LoFTR TensorRT adaptation: https://github.com/Kolkir/LoFTR_TRT
- PFED (2025): https://github.com/SkyEyeLoc/PFED
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hierarchical AVL (MDPI 2025): https://www.mdpi.com/2072-4292/17/20/3470
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- cuVSLAM paper: https://arxiv.org/abs/2506.04359
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- LightGlue (ICCV 2023): https://github.com/cvg/LightGlue
- LightGlue TensorRT: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- LightGlue TRT Jetson: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- ForestVO / SP+LG VO: https://arxiv.org/html/2504.01261v1
- vo_lightglue (SP+LG VO): https://github.com/himadrir/vo_lightglue
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
@@ -1,491 +0,0 @@

# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| FastAPI + SSE as primary output | **Functional**: New AC requires MAVLink GPS_INPUT to flight controller, not REST/SSE. The system must act as a GPS replacement module. SSE is the wrong output channel. | **Replace with pymavlink GPS_INPUT sender**. Send GPS_INPUT at 5-10Hz to flight controller via UART. Retain minimal FastAPI only for local IPC (object localization API). |
| No ground station integration | **Functional**: New AC requires streaming position+confidence to ground station and receiving re-localization commands via telemetry. Draft02 had no telemetry. | **MAVLink telemetry integration**: GPS data forwarded automatically by flight controller. Custom data via NAMED_VALUE_FLOAT (confidence, drift). Re-localization hints via COMMAND_LONG listener. |
| MAVSDK library (per restriction) | **Functional**: MAVSDK-Python v3.15.3 cannot send GPS_INPUT messages. Feature requested since 2021, still unresolved. This is a blocking limitation for the core output function. | **Use pymavlink** for all MAVLink communication. pymavlink provides `gps_input_send()` and full MAVLink v2 access. Note conflict with restriction — pymavlink is the only viable option. |
| 3fps camera → ~3Hz output | **Performance**: ArduPilot's minimum GPS rate is 5Hz (GPS_RATE_MS = 200ms). 3Hz camera output is below minimum; the flight controller EKF may not fuse properly. | **IMU-interpolated 5-10Hz GPS_INPUT**: ESKF prediction runs at 100+Hz internally. Emit predicted state as GPS_INPUT at 5-10Hz. Camera corrections arrive at 3Hz within this stream. |
| No startup/failsafe procedures | **Functional**: New AC requires init from last GPS, reboot recovery, IMU-only fallback. Draft02 assumed position was already known. | **Full lifecycle management**: (1) Boot → read GPS from flight controller → init ESKF. (2) Reboot → read IMU-extrapolated position → re-init. (3) N-second failure → stop GPS_INPUT → autopilot falls back to IMU. |
| Basic object localization (nadir only) | **Functional**: New AC adds AI camera with configurable angle and zoom. Nadir pixel-to-GPS is insufficient. | **Trigonometric projection for oblique camera**: ground_distance = alt × tan(tilt), bearing = heading + pan + pixel offset. Local API for AI system requests. |
| No thermal management | **Performance**: Jetson Orin Nano Super throttles at 80°C (GPU drops 1GHz→300MHz = 3x slowdown). Could blow the 400ms budget. | **Thermal monitoring + adaptive pipeline**: Use 25W mode. Monitor via tegrastats. If temp >75°C → reduce satellite matching frequency. If >80°C → VO+IMU only. |
| ESKF covariance without explicit drift budget | **Functional**: New AC requires max 100m cumulative VO drift between satellite anchors. Draft02 uses covariance for keyframe selection but no explicit budget. | **Drift budget tracker**: √(σ_x² + σ_y²) from ESKF as drift estimate. When approaching 100m → force every-frame satellite matching. Report via horiz_accuracy in GPS_INPUT. |
| No satellite imagery validation | **Functional**: New AC requires ≥0.5 m/pixel, <2 years old. Draft02 didn't validate. | **Preprocessing validation step**: Check zoom 19 availability (0.3 m/pixel). Fall back to zoom 18 (0.6 m/pixel). Flag stale tiles. |
| "Ask user via API" for re-localization | **Functional**: New AC says send re-localization request to ground station via telemetry link, not REST API. Operator sends hint via telemetry. | **MAVLink re-localization protocol**: On 3 consecutive failures → send STATUSTEXT alert to ground station. Operator sends COMMAND_LONG with approximate lat/lon. System uses hint to constrain tile search. |
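
The thermal-management row above can be sketched as a small monitor. Reading `/sys/devices/virtual/thermal/thermal_zone*/temp` is standard Linux thermal sysfs (what tegrastats itself reports from); the mode names are illustrative, and the thresholds mirror the table (>75°C reduce satellite matching, >80°C VO+IMU only).

```python
# Sketch of the adaptive thermal mitigation: read thermal zones from sysfs
# and select a pipeline mode. Mode names are illustrative assumptions.
import glob

def max_temp_c(pattern="/sys/devices/virtual/thermal/thermal_zone*/temp"):
    """Highest zone temperature in degrees C (sysfs reports millidegrees)."""
    temps = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                temps.append(int(f.read().strip()) / 1000.0)
        except (OSError, ValueError):
            continue
    return max(temps, default=0.0)

def pipeline_mode(temp_c):
    if temp_c > 80.0:
        return "vo_imu_only"           # throttling imminent: drop sat matching
    if temp_c > 75.0:
        return "reduced_sat_matching"  # stretch the keyframe interval
    return "normal"
```

In flight this would be polled on the CPU side of the pipeline, alongside the GPS_INPUT output loop.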

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). The system replaces the GPS module for the flight controller by sending MAVLink GPS_INPUT messages via pymavlink over UART. Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU data from the flight controller. GPS_INPUT is sent at 5-10Hz, with camera-based corrections at 3Hz and IMU prediction filling the gaps.

**Hard constraint**: Camera shoots at ~3fps (333ms interval). The full VO+ESKF pipeline must complete within 400ms per frame. GPS_INPUT output rate: 5-10Hz minimum (ArduPilot EKF requirement).

**Output architecture**:

- **Primary**: pymavlink → GPS_INPUT to flight controller via UART (replaces GPS module)
- **Telemetry**: Flight controller auto-forwards GPS data to ground station. Custom NAMED_VALUE_FLOAT for confidence/drift at 1Hz
- **Commands**: Ground station → COMMAND_LONG → flight controller → pymavlink listener on companion computer
- **Local IPC**: Minimal FastAPI on localhost for object localization requests from AI systems

```
┌─────────────────────────────────────────────────────────────────────┐
│                       OFFLINE (Before Flight)                       │
│  Satellite Tiles → Download & Validate → Pre-resize → Store         │
│  (Google Maps)     (≥0.5m/px, <2yr)      (matcher res)  (GeoHash)   │
│                     Copy to Jetson storage                          │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       ONLINE (During Flight)                        │
│                                                                     │
│  STARTUP:                                                           │
│  pymavlink → read GLOBAL_POSITION_INT → init ESKF → start cuVSLAM   │
│                                                                     │
│  EVERY FRAME (3fps, 333ms interval):                                │
│  ┌──────────────────────────────────────┐                           │
│  │ Nav Camera → Downsample (CUDA ~2ms)  │                           │
│  │ → cuVSLAM VO+IMU (~9ms)              │                           │
│  │ → ESKF measurement update            │                           │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  5-10Hz CONTINUOUS (between camera frames):                         │
│  ┌──────────────────────────────────────┐                           │
│  │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller      │
│  │ (pymavlink, every 100-200ms)         │    (GPS1_TYPE=14)         │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  KEYFRAMES (every 3-10 frames, async):                              │
│  ┌──────────────────────────────────────┐                           │
│  │ Satellite match (CUDA stream B)      │──→ ESKF correction        │
│  │ LiteSAM TRT FP16 or XFeat            │                           │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  TELEMETRY (1Hz):                                                   │
│  ┌──────────────────────────────────────┐                           │
│  │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station         │
│  │ STATUSTEXT: alerts, re-loc requests  │    (via telemetry radio)  │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  COMMANDS (from ground station):                                    │
│  ┌──────────────────────────────────────┐                           │
│  │ Listen COMMAND_LONG: re-loc hint     │←── Ground Station         │
│  │ (lat/lon from operator)              │    (via telemetry radio)  │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  LOCAL IPC:                                                         │
│  ┌──────────────────────────────────────┐                           │
│  │ FastAPI localhost:8000               │←── AI Detection System    │
│  │ POST /localize (object GPS calc)     │                           │
│  │ GET /status (system health)          │                           │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  IMU: 100+Hz from flight controller → ESKF prediction               │
│  TILES: ±2km preloaded in RAM from flight plan                      │
│  THERMAL: Monitor via tegrastats, adaptive pipeline throttling      │
└─────────────────────────────────────────────────────────────────────┘
```

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

NVIDIA's CUDA-accelerated VO library (PyCuVSLAM v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. Supports monocular camera + IMU natively. Auto-fallback to IMU when visual tracking fails, loop closure, Python and C++ APIs.

**CRITICAL: cuVSLAM on low-texture terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade optical flow (classical features). On uniform agricultural terrain:

- Few corners detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1s) → constant-velocity integrator (~0.5s)
- cuVSLAM does NOT guarantee pose recovery after tracking loss

**Mitigation**:

1. Increase satellite matching frequency when cuVSLAM keypoint count drops
2. IMU dead-reckoning bridge via ESKF (continues GPS_INPUT output during tracking loss)
3. Accept higher drift in featureless segments — report via horiz_accuracy
4. Keypoint density monitoring triggers adaptive satellite matching

### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching:

- cuVSLAM provides VO at every frame (~9ms)
- Satellite matching triggers on keyframes selected by:
  - Fixed interval: every 3-10 frames
  - ESKF covariance exceeds threshold (drift approaching budget)
  - VO failure: cuVSLAM reports tracking loss
  - Thermal: reduce frequency if temperature high

### 3. Satellite Matcher Selection (Benchmark-Driven)

**Context**: Our UAV-to-satellite matching is nadir-to-nadir (both top-down). Challenges are season/lighting differences and temporal changes, not extreme viewpoint gaps.

**Candidate A: LiteSAM (opt) TRT FP16 @ 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params. TensorRT FP16 with reparameterized MobileOne. Estimated ~165-330ms on Orin Nano Super with TRT FP16.

**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Fastest option. General-purpose, but our nadir-nadir gap is small.

**Decision rule** (day-one on Orin Nano Super):

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at 1280px
3. If ≤200ms → LiteSAM at 1280px
4. If >200ms → XFeat

### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — the multi-branch structure collapses to a single feed-forward path at inference. INT8 is safe only for MobileOne CNN layers, NOT for TAIFormer transformer components.

### 5. CUDA Stream Pipelining

- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async, does not block VO)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management

### 6. Proactive Tile Loading

Preload tiles within ±2km of the flight plan into RAM at startup. For a 50km route, ~2000 tiles at zoom 19 ≈ ~200MB. Eliminates disk I/O during flight.

On VO failure / expanded search:

1. Compute IMU dead-reckoning position
2. Rank preloaded tiles by distance to predicted position
3. Try top 3 tiles, then expand

### 7. 5-10Hz GPS_INPUT Output Loop

A dedicated thread/coroutine sends GPS_INPUT at a fixed rate (5-10Hz):

1. Read current ESKF state (position, velocity, covariance)
2. Compute horiz_accuracy from √(σ_x² + σ_y²)
3. Set fix_type based on the last correction type (3=satellite-corrected, 2=VO-only, 1=IMU-only)
4. Send via `mav.gps_input_send()`
5. Sleep until next interval

This decouples the camera frame rate (3fps) from the GPS_INPUT rate (5-10Hz).

## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DINOv2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection | Competitive with LiteSAM | TRT available | 15.05M params, heavier |
| STHN (IEEE RA-L 2024) | Deep homography estimation | 4.24m at 50m range | Lightweight | Needs RGB retraining |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Planetary, needs adaptation |

## Architecture
|
||||
|
||||
### Component: Flight Controller Integration (NEW)
|
||||
|
||||
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|
||||
|----------|-------|-----------|-------------|------------|-----|
|
||||
| pymavlink GPS_INPUT | pymavlink | Full MAVLink v2 access, GPS_INPUT support, pure Python, aarch64 compatible | Lower-level API, manual message handling | ~1ms per send | ✅ Best |
|
||||
| MAVSDK-Python TelemetryServer | MAVSDK v3.15.3 | Higher-level API, aarch64 wheels | NO GPS_INPUT support, no custom messages | N/A — missing feature | ❌ Blocked |
|
||||
| MAVSDK C++ MavlinkDirect | MAVSDK v4 (future) | Custom message support planned | Not available in Python wrapper yet | N/A — not released | ❌ Not available |
|
||||
| MAVROS (ROS) | ROS + MAVROS | Full GPS_INPUT support, ROS ecosystem | Heavy ROS dependency, complex setup, unnecessary overhead | ~5ms overhead | ⚠️ Overkill |
|
||||
|
||||
**Selected**: **pymavlink** — only viable Python library for GPS_INPUT. Pure Python, works on aarch64, full MAVLink v2 message set.
|
||||
|
||||
**Restriction note**: restrictions.md specifies "MAVSDK library" but MAVSDK-Python cannot send GPS_INPUT (confirmed: Issue #320, open since 2021). pymavlink is the necessary alternative.

Configuration:

- Connection: UART (`/dev/ttyTHS0` or `/dev/ttyTHS1` on Jetson, 115200-921600 baud)
- Flight controller: GPS1_TYPE=14, SERIAL2_PROTOCOL=2 (MAVLink2)
- GPS_INPUT rate: 5-10Hz (dedicated output thread)
- Heartbeat: 1Hz to maintain connection

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source, low-texture terrain risk | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | Open-source, learned features | No IMU integration, ~30-50ms | ~30-50ms/frame | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps | ~33ms/frame | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built for Jetson.

### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy, 6.31M params | Untested on Orin Nano Super TRT | Est. ~165-330ms TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, Jetson-proven, fastest | General-purpose | ~50-100ms | ✅ Fallback |

**Selection**: decided by a day-one benchmark of LiteSAM TRT FP16 at 1280px — if it runs in ≤200ms, use LiteSAM; if >200ms, use XFeat.

### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| ESKF (custom) | Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.

**Output rates**:

- IMU prediction: 100+Hz (from flight controller IMU via pymavlink)
- cuVSLAM VO update: ~3Hz
- Satellite update: ~0.3-1Hz (keyframes, async)
- GPS_INPUT output: 5-10Hz (ESKF predicted state)

**Drift budget**: Track √(σ_x² + σ_y²) from ESKF covariance. When approaching 100m → force every-frame satellite matching.
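
A minimal sketch of the budget check, assuming x/y position occupy the first two diagonal entries of the covariance and an 80% margin before the 100m limit (both assumptions, not fixed by the design above):

```python
import math

DRIFT_LIMIT_M = 100.0       # drift budget from the requirements
FORCE_MATCH_FRACTION = 0.8  # assumed margin: act before the limit is hit

def should_force_satellite_match(P):
    """P is the ESKF covariance; indices 0,1 are x,y position (assumed layout).

    Returns (force_flag, horizontal 1-sigma error in meters).
    """
    horiz_sigma = math.sqrt(P[0][0] + P[1][1])
    return horiz_sigma >= FORCE_MATCH_FRACTION * DRIFT_LIMIT_M, horiz_sigma
```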

### Component: Ground Station Telemetry (NEW)

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| MAVLink auto-forwarding + NAMED_VALUE_FLOAT | pymavlink | Standard MAVLink, no custom protocol, works with all GCS (Mission Planner, QGC) | Limited bandwidth (~12kbit/s), NAMED_VALUE_FLOAT name limited to 10 chars | ~50 bytes/msg | ✅ Best |
| Custom MAVLink dialect messages | pymavlink + custom XML | Full flexibility | Requires custom GCS plugin, non-standard | ~50 bytes/msg | ⚠️ Complex |
| Separate telemetry channel | TCP/UDP over separate radio | Full bandwidth | Extra hardware, extra radio | N/A | ❌ Not available |

**Selected**: **Standard MAVLink forwarding + NAMED_VALUE_FLOAT**

Telemetry data sent to ground station:

- GPS position: auto-forwarded by flight controller from GPS_INPUT data
- Confidence score: NAMED_VALUE_FLOAT `"gps_conf"` at 1Hz (values: 1=HIGH, 2=MEDIUM, 3=LOW, 4=VERY_LOW)
- Drift estimate: NAMED_VALUE_FLOAT `"gps_drift"` at 1Hz (meters)
- Matching status: NAMED_VALUE_FLOAT `"sat_match"` at 1Hz (0=inactive, 1=matching, 2=failed)
- Alerts: STATUSTEXT for critical events (re-localization request, system failure)
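
A sketch of the 1Hz telemetry sender; `named_value_float_send` is pymavlink's API for NAMED_VALUE_FLOAT, the 10-character truncation guards the protocol's name limit, and the numeric encodings mirror the list above:

```python
# Confidence encoding from the telemetry scheme above.
CONFIDENCE_CODE = {"HIGH": 1.0, "MEDIUM": 2.0, "LOW": 3.0, "VERY_LOW": 4.0}

def send_named_float(mav, time_boot_ms, name, value):
    """NAMED_VALUE_FLOAT names are limited to 10 characters; truncate defensively."""
    mav.named_value_float_send(time_boot_ms, name[:10].encode("ascii"), float(value))

def send_telemetry(mav, time_boot_ms, confidence, drift_m, match_status):
    """One 1Hz tick: confidence, drift estimate, and matcher status."""
    send_named_float(mav, time_boot_ms, "gps_conf", CONFIDENCE_CODE[confidence])
    send_named_float(mav, time_boot_ms, "gps_drift", drift_m)
    send_named_float(mav, time_boot_ms, "sat_match", match_status)
```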

Re-localization from ground station:

- Operator sees drift/failure alert in GCS
- Sends COMMAND_LONG (MAV_CMD_USER_1) with lat/lon in param5/param6
- Companion computer listens for COMMAND_LONG with target component ID
- Receives hint → constrains tile search → attempts satellite matching near hint coordinates
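
The companion-side hint listener reduces to a small filter over incoming messages; `MAV_CMD_USER_1` is 31010 in the MAVLink common set, and the message object here is duck-typed after pymavlink's COMMAND_LONG (a sketch, not the full receive loop):

```python
MAV_CMD_USER_1 = 31010  # MAVLink common message set

def extract_reloc_hint(msg):
    """Return (lat, lon) if msg is an operator re-localization hint, else None.

    Per the scheme above, lat/lon ride in param5/param6 of MAV_CMD_USER_1.
    """
    if msg.get_type() != "COMMAND_LONG" or msg.command != MAV_CMD_USER_1:
        return None
    return (msg.param5, msg.param6)
```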

### Component: Startup & Lifecycle (NEW)

**Startup sequence**:

1. Boot Jetson → start GPS-Denied service (systemd)
2. Connect to flight controller via pymavlink on UART
3. Wait for heartbeat from flight controller
4. Read GLOBAL_POSITION_INT → extract lat, lon, alt
5. Initialize ESKF state with this position (high confidence if real GPS available)
6. Start cuVSLAM with first camera frames
7. Begin GPS_INPUT output loop at 5-10Hz
8. Preload satellite tiles within ±2km of flight plan into RAM
9. System ready — GPS-Denied active

**GPS denial detection**:

Not required — the system always outputs GPS_INPUT. If real GPS is available, the flight controller uses whichever GPS source has better accuracy (configurable GPS blending or priority). When real GPS degrades/lost, flight controller seamlessly uses our GPS_INPUT.

**Failsafe**:

- If no valid position estimate for N seconds (configurable, e.g., 10s): stop sending GPS_INPUT
- Flight controller detects GPS timeout → falls back to IMU-only dead reckoning
- System logs failure, continues attempting recovery (VO + satellite matching)
- When recovery succeeds: resume GPS_INPUT output

**Reboot recovery**:

1. Jetson reboots → re-establish pymavlink connection
2. Read GPS_RAW_INT (now IMU-extrapolated by flight controller since GPS_INPUT stopped)
3. Initialize ESKF with this position (low confidence, horiz_accuracy=100m+)
4. Resume cuVSLAM + satellite matching → accuracy improves over time
5. Resume GPS_INPUT output

### Component: Object Localization (UPDATED)

**Two modes**:

**Mode 1: Navigation camera (nadir)**

The frame-center GPS comes from the ESKF. For any object in the navigation camera frame:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by heading (yaw from IMU)
4. Convert meter offset to lat/lon delta, add to frame-center GPS
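
Steps 1-4 as a sketch. The pixel-axis and rotation conventions here are illustrative assumptions that must be matched to the actual camera mounting, and the meters-to-degrees conversion uses the small-offset approximation:

```python
import math

M_PER_DEG_LAT = 111_320.0  # approximate meters per degree of latitude

def nadir_pixel_to_gps(center_lat, center_lon, dx_px, dy_px, gsd_m, heading_rad):
    """Pixel offset -> meters -> rotate by heading -> lat/lon delta.

    Assumed convention: +x pixels point right, +y pixels point down in the
    image, with the image top aligned to the UAV heading.
    """
    dx_m = dx_px * gsd_m
    dy_m = dy_px * gsd_m
    # Rotate the image-frame offset into north/east by the UAV heading.
    east = dx_m * math.cos(heading_rad) + dy_m * math.sin(heading_rad)
    north = -dy_m * math.cos(heading_rad) + dx_m * math.sin(heading_rad)
    lat = center_lat + north / M_PER_DEG_LAT
    lon = center_lon + east / (M_PER_DEG_LAT * math.cos(math.radians(center_lat)))
    return lat, lon
```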

**Mode 2: AI camera (configurable angle and zoom)**

1. Get current UAV position from ESKF
2. Get AI camera params: tilt_angle (from vertical), pan_angle (from heading), zoom (effective focal length)
3. Get pixel coordinates of detected object in AI camera frame
4. Compute bearing: bearing = heading + pan_angle + atan2(dx_px × pixel_pitch, focal_eff), where pixel_pitch = sensor_width / image_width_px
5. Compute ground distance: for flat terrain, slant_range = altitude / cos(tilt_angle + dy_angle), ground_range = slant_range × sin(tilt_angle + dy_angle)
6. Convert bearing + ground_range to lat/lon offset
7. Return GPS coordinates with accuracy estimate
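
Steps 5-6 under the flat-terrain assumption (tilt measured from vertical, small-offset lat/lon conversion):

```python
import math

def ai_camera_ground_range(altitude_m, tilt_rad, dy_angle_rad=0.0):
    """Flat-terrain geometry from step 5: tilt is measured from vertical."""
    angle = tilt_rad + dy_angle_rad
    slant = altitude_m / math.cos(angle)
    return slant * math.sin(angle)   # equals altitude * tan(angle)

def offset_latlon(lat, lon, bearing_rad, ground_range_m):
    """Step 6: small-offset conversion of bearing + range to lat/lon."""
    north = ground_range_m * math.cos(bearing_rad)
    east = ground_range_m * math.sin(bearing_rad)
    dlat = north / 111_320.0
    dlon = east / (111_320.0 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```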

**Local API** (FastAPI on localhost:8000):

- `POST /localize` — accepts: pixel_x, pixel_y, camera_id ("nav" or "ai"), ai_camera_params (tilt, pan, zoom) → returns: lat, lon, accuracy_m
- `GET /status` — returns: system state, confidence, drift, uptime

### Component: Satellite Tile Preprocessing (Offline)

**Selected**: GeoHash-indexed tile pairs on disk + RAM preloading.

Pipeline:

1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at zoom 19 (0.3 m/pixel)
3. If zoom 19 unavailable: fall back to zoom 18 (0.6 m/pixel — meets ≥0.5 m/pixel requirement)
4. Validate: resolution ≥0.5 m/pixel, check imagery staleness where possible
5. Pre-resize each tile to matcher input resolution
6. Store: original + resized + metadata (GPS bounds, zoom, GSD, download date) in GeoHash-indexed structure
7. Copy to Jetson storage before flight
8. At startup: preload tiles within ±2km of flight plan into RAM
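
The GeoHash index in step 6 can use the standard encoder; a minimal sketch (tiles sharing a hash prefix are spatial neighbors, which is what makes prefix lookup cheap):

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash base-32 alphabet

def geohash_encode(lat, lon, precision=7):
    """Standard GeoHash: interleave longitude/latitude bisection bits, base32-encode."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # even bit positions refine longitude
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid
            else:
                bits.append(0); lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(_BASE32[idx])
    return "".join(chars)
```

At zoom-19 tile scale a precision of 6-7 characters is a reasonable index key; the exact precision is a tuning choice.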

### Component: Re-localization (Disconnected Segments)

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Flag next frame as keyframe → trigger satellite matching
2. Compute IMU dead-reckoning position since last known position
3. Rank preloaded tiles by distance to dead-reckoning position
4. Try top 3 tiles sequentially
5. If match found: position recovered, new segment begins
6. If 3 consecutive keyframe failures: send STATUSTEXT alert to ground station ("RE-LOC REQUEST: position uncertain, drift Xm")
7. While waiting for operator hint: continue VO/IMU dead reckoning, report low confidence via horiz_accuracy
8. If operator sends COMMAND_LONG with lat/lon hint: constrain tile search to ±500m of hint
9. If still no match after operator hint: continue dead reckoning, log failure
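
Steps 3-4 as a sketch; the tile records and the equirectangular distance are illustrative but adequate at tile-cache scale:

```python
import math

def rank_tiles_by_distance(tiles, dr_lat, dr_lon, top_n=3):
    """Order preloaded tiles by distance to the dead-reckoning fix.

    `tiles` is an assumed list of dicts with a `center` (lat, lon) entry.
    """
    def dist_m(tile):
        lat, lon = tile["center"]
        dlat = (lat - dr_lat) * 111_320.0
        dlon = (lon - dr_lon) * 111_320.0 * math.cos(math.radians(dr_lat))
        return math.hypot(dlat, dlon)
    return sorted(tiles, key=dist_m)[:top_n]
```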

### Component: Thermal Management (NEW)

**Power mode**: 25W (stable sustained performance)

**Monitoring**: Read GPU/CPU temperature via tegrastats or sysfs thermal zones at 1Hz.

**Adaptive pipeline**:

- Normal (<70°C): Full pipeline — cuVSLAM every frame + satellite match every 3-10 frames
- Warm (70-75°C): Reduce satellite matching to every 5-10 frames
- Hot (75-80°C): Reduce satellite matching to every 10-15 frames
- Throttling (>80°C): Disable satellite matching entirely, VO+IMU only (cuVSLAM ~9ms is very light). Report LOW confidence. Resume satellite matching when temp drops below 75°C
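
The schedule above as a sketch, with hysteresis so matching stays off until the temperature falls back below 75°C; the specific interval chosen inside each band is an illustrative assumption:

```python
def satellite_match_interval(temp_c, currently_disabled=False):
    """Return frames-between-matches, or None when matching is disabled.

    Hysteresis: once disabled (>80 C), matching stays off until temp < 75 C.
    """
    if currently_disabled and temp_c >= 75.0:
        return None                  # stay disabled until below 75 C
    if temp_c > 80.0:
        return None                  # throttling: VO+IMU only
    if temp_c > 75.0:
        return 12                    # hot: every 10-15 frames
    if temp_c > 70.0:
        return 7                     # warm: every 5-10 frames
    return 3                         # normal: every 3-10 frames
```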

**Hardware requirement**: Active cooling fan (5V) mandatory for UAV companion computer enclosure.

## Processing Time Budget (per frame, 333ms interval)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps |
| ESKF measurement update | ~1ms | NumPy |
| **Total per camera frame** | **~22ms** | Well within 333ms |

GPS_INPUT output runs independently at 5-10Hz (every 100-200ms):

| Step | Time | Notes |
|------|------|-------|
| Read ESKF state | <0.1ms | Shared state |
| Compute horiz_accuracy | <0.1ms | √(σ²) |
| pymavlink gps_input_send | ~1ms | UART write |
| **Total per GPS_INPUT** | **~1ms** | Negligible overhead |

### Keyframe Satellite Matching (async, every 3-10 frames)

Runs on a separate CUDA stream — does NOT block VO or GPS_INPUT.

**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 | ≤200ms | Go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async |

**Path B — XFeat (if LiteSAM >200ms)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat extraction + matching | ~50-80ms | TensorRT FP16 |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state (configure pruning for 3000 frames) |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink runtime | ~20MB | Lightweight |
| FastAPI (local IPC) | ~50MB | Minimal, localhost only |
| Current frame buffer | ~2MB | |
| ESKF state + buffers | ~10MB | |
| **Total** | **~2.1-2.9GB** | ~26-36% of 8GB — comfortable |

## Confidence Scoring → GPS_INPUT Mapping

| Level | Condition | horiz_accuracy (m) | fix_type | GPS_INPUT satellites_visible |
|-------|-----------|---------------------|----------|------------------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | 10-20 | 3 (3D) | 12 |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50 | 3 (3D) | 8 |
| LOW | cuVSLAM VO only, no recent correction, OR high thermal throttling | 50-100 | 2 (2D) | 4 |
| VERY LOW | IMU dead-reckoning only | 100-500 | 1 (no fix) | 1 |
| MANUAL | Operator-provided re-localization hint | 200 | 3 (3D) | 6 |

Note: `satellites_visible` is synthetic — used to influence EKF weighting. ArduPilot gives more weight to a GPS with a higher satellite count and lower horiz_accuracy.
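
The table reduces to a small lookup; clamping the ESKF 1σ horizontal error into the band for the active level is an assumption about how the range columns are applied:

```python
# (horiz_accuracy band in m, fix_type, satellites_visible) per the table above.
CONFIDENCE_MAP = {
    "HIGH":     ((10, 20),   3, 12),
    "MEDIUM":   ((20, 50),   3, 8),
    "LOW":      ((50, 100),  2, 4),
    "VERY_LOW": ((100, 500), 1, 1),
    "MANUAL":   ((200, 200), 3, 6),
}

def gps_input_fields(level, horiz_sigma_m):
    """Clamp the ESKF horizontal error into this level's band and return
    (horiz_accuracy, fix_type, satellites_visible) for GPS_INPUT."""
    (lo, hi), fix_type, sats = CONFIDENCE_MAP[level]
    return max(lo, min(hi, horiz_sigma_m)), fix_type, sats
```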

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink (conflicts with restriction) | Use pymavlink. Document restriction conflict. No alternative in Python. |
| **cuVSLAM fails on low-texture agricultural terrain** | HIGH | Frequent tracking loss, degraded VO | Increase satellite matching frequency. IMU dead-reckoning bridge. Accept higher drift. |
| **Jetson UART instability with ArduPilot** | MEDIUM | MAVLink connection drops | Test thoroughly. Use USB serial adapter if UART unreliable. Add watchdog reconnect. |
| **Thermal throttling blows satellite matching budget** | MEDIUM | Miss keyframe windows | Adaptive pipeline: reduce/skip satellite matching at high temp. Active cooling mandatory. |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use XFeat | Day-one benchmark. XFeat fallback. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Multi-tile consensus, strict RANSAC, increase keyframe frequency. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, max keyframes. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift. Alternative providers. |
| GPS_INPUT at 3Hz too slow for ArduPilot EKF | HIGH | Poor EKF fusion, position jumps | 5-10Hz output with IMU interpolation between camera frames. |
| Companion computer reboot mid-flight | LOW | ~30-60s GPS gap | Flight controller IMU fallback. Automatic recovery on restart. |
| Telemetry bandwidth saturation | LOW | Custom messages compete with autopilot telemetry | Limit NAMED_VALUE_FLOAT to 1Hz. Keep messages compact. |

## Testing Strategy

### Integration / Functional Tests

- End-to-end: camera → cuVSLAM → ESKF → GPS_INPUT → verify flight controller receives valid position
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m (target: 80%), percentage within 20m (target: 60%)
- Test GPS_INPUT rate: verify 5-10Hz output to flight controller
- Test sharp-turn handling: verify satellite re-localization after 90-degree heading change
- Test disconnected segments: simulate 3+ route breaks, verify all segments connected
- Test re-localization: simulate 3 consecutive failures → verify STATUSTEXT sent → inject COMMAND_LONG hint → verify recovery
- Test object localization: send POST /localize with known AI camera params → verify GPS accuracy
- Test startup: verify ESKF initializes from flight controller GPS
- Test reboot recovery: kill process → restart → verify reconnection and position recovery
- Test failsafe: simulate total failure → verify GPS_INPUT stops → verify flight controller IMU fallback
- Test cuVSLAM map memory: run 3000-frame session, monitor memory growth

### Non-Functional Tests

- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at 1280px on Orin Nano Super
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- cuVSLAM terrain stress test: urban, agricultural, water, forest
- **UART reliability test**: sustained pymavlink communication over 1+ hour
- **Thermal endurance test**: run full pipeline for 30+ minutes, measure GPU temp, verify no throttling with active cooling
- Per-frame latency: must be <400ms for VO pipeline
- GPS_INPUT latency: measure time from camera capture to GPS_INPUT send
- Memory: peak usage during 3000-frame session (must stay <8GB)
- Drift budget: verify ESKF covariance tracks cumulative drift, triggers satellite matching before 100m
- Telemetry bandwidth: measure total MAVLink bandwidth used by companion computer

## References

- pymavlink GPS_INPUT example: https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py
- pymavlink mavgps.py: https://github.com/ArduPilot/pymavlink/blob/master/examples/mavgps.py
- ArduPilot GPS Input module: https://ardupilot.org/mavproxy/docs/modules/GPSInput.html
- MAVLink GPS_INPUT message spec: https://mavlink.io/en/messages/common.html#GPS_INPUT
- MAVSDK-Python GPS_INPUT limitation: https://github.com/mavlink/MAVSDK-Python/issues/320
- MAVSDK-Python custom message limitation: https://github.com/mavlink/MAVSDK-Python/issues/739
- ArduPilot companion computer setup: https://ardupilot.org/dev/docs/raspberry-pi-via-mavlink.html
- Jetson Orin UART with ArduPilot: https://forums.developer.nvidia.com/t/uart-connection-between-jetson-nano-orin-and-ardupilot/325416
- MAVLink NAMED_VALUE_FLOAT: https://mavlink.io/en/messages/common.html#NAMED_VALUE_FLOAT
- MAVLink STATUSTEXT: https://mavlink.io/en/messages/common.html#STATUSTEXT
- MAVLink telemetry bandwidth: https://github.com/mavlink/mavlink/issues/1605
- JetPack 6.2 Super Mode: https://developer.nvidia.com/blog/nvidia-jetpack-6-2-brings-super-mode-to-nvidia-jetson-orin-nano-and-jetson-orin-nx-modules/
- Jetson Orin Nano power consumption: https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/
- UAV target geolocation: https://www.mdpi.com/1424-8220/22/5/1903
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
- ArduPilot EKF Source Selection: https://ardupilot.org/copter/docs/common-ekf-sources.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| ONNX Runtime as potential inference runtime for AI models | **Performance**: ONNX Runtime CUDA EP on Jetson Orin Nano is 7-8x slower than TRT standalone with default settings (tensor cores not utilized). Even TRT-EP shows up to 3x overhead on some models. | **Use native TRT Engine for all AI models**. Convert PyTorch → ONNX → trtexec → .engine. Load with tensorrt Python module. Eliminates ONNX Runtime dependency entirely. |
| ONNX Runtime TRT-EP memory overhead | **Performance**: ONNX RT TRT-EP keeps serialized engine in memory (~420-440MB vs 130-140MB native TRT). Delta ~280-300MB PER MODEL. On 8GB shared memory, this wastes ~560-600MB for two models. | **Native TRT releases serialized blob after deserialization** → saves ~280-300MB per model. Total savings ~560-600MB — 7% of total memory. Critical given cuVSLAM map growth risk. |
| No explicit TRT engine build step in offline pipeline | **Functional**: Draft03 mentions TRT FP16 but doesn't define the build workflow. When/where are engines built? | **Add TRT engine build to offline preparation pipeline**: After satellite tile download, run trtexec on Jetson to build .engine files. Store alongside tiles. One-time cost per model version. |
| Cross-platform portability via ONNX Runtime | **Functional**: ONNX Runtime's primary value is cross-platform support. Our deployment is Jetson-only — this value is zero. We pay the performance/memory tax for unused portability. | **Drop ONNX Runtime**. Jetson Orin Nano Super is fixed deployment hardware. TRT Engine is the optimal runtime for NVIDIA-only deployment. |
| No DLA offloading considered | **Performance**: Draft03 doesn't mention DLA. Jetson Orin Nano has NO DLA cores — only Orin NX (1-2) and AGX Orin (2) have DLA. | **Confirm: DLA offloading is NOT available on Orin Nano**. All inference must run on GPU (1024 CUDA cores, 16 tensor cores). This makes maximizing GPU efficiency via native TRT even more critical. |
| LiteSAM MinGRU TRT compatibility risk | **Functional**: LiteSAM's subpixel refinement uses 4 stacked MinGRU layers over a 3×3 candidate window (seq_len=9). MinGRU gates depend only on input C_f (not h_{t-1}), so z_t/h̃_t are pre-computable. Ops are standard: Linear, Sigmoid, Mul, Add, ReLU, Tanh. Risk is LOW-MEDIUM — depends on whether implementation uses logcumsumexp (problematic) or simple loop (fine). Seq_len=9 makes this trivially rewritable. | **Day-one verification**: clone LiteSAM repo → torch.onnx.export → polygraphy inspect → trtexec --fp16. If export fails on MinGRU: rewrite forward() as unrolled loop (9 steps). **If LiteSAM cannot be made TRT-compatible: replace with EfficientLoFTR TRT** (proven TRT path via Coarse_LoFTR_TRT, 15.05M params, semi-dense matching). |

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses **native TensorRT Engine files** — no ONNX Runtime dependency. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.

Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM — native CUDA), (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16), and (3) IMU data from the flight controller via ESKF.

**Inference runtime decision**: Native TRT Engine over ONNX Runtime because:

1. ONNX RT CUDA EP is 7-8x slower on Orin Nano (tensor core bug)
2. ONNX RT TRT-EP wastes ~280-300MB per model (serialized engine retained in memory)
3. Cross-platform portability has zero value — deployment is Jetson-only
4. Native TRT provides direct CUDA stream control for pipelining with cuVSLAM

**Hard constraint**: Camera shoots at ~3fps (333ms interval). Full VO+ESKF pipeline within 400ms. GPS_INPUT at 5-10Hz.

**AI Model Runtime Summary**:

| Model | Runtime | Precision | Memory | Integration |
|-------|---------|-----------|--------|-------------|
| cuVSLAM | Native CUDA (PyCuVSLAM) | N/A (closed-source) | ~200-500MB | CUDA Stream A |
| LiteSAM | TRT Engine | FP16 | ~50-80MB | CUDA Stream B |
| XFeat | TRT Engine | FP16 | ~30-50MB | CUDA Stream B (fallback) |
| ESKF | CPU (Python/C++) | FP64 | ~10MB | CPU thread |

**Offline Preparation Pipeline** (before flight):

1. Download satellite tiles → validate → pre-resize → store (existing)
2. **NEW: Build TRT engines on Jetson** (one-time per model version)
   - `trtexec --onnx=litesam_fp16.onnx --saveEngine=litesam.engine --fp16`
   - `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
3. Copy tiles + engines to Jetson storage
4. At startup: load engines + preload tiles into RAM

```
┌─────────────────────────────────────────────────────────────────────┐
│                        OFFLINE (Before Flight)                      │
│  1. Satellite Tiles → Download & Validate → Pre-resize → Store      │
│     (Google Maps)     (≥0.5m/px, <2yr)     (matcher res)  (GeoHash) │
│  2. TRT Engine Build (one-time per model version):                  │
│     PyTorch model → reparameterize → ONNX export → trtexec --fp16   │
│     Output: litesam.engine, xfeat.engine                            │
│  3. Copy tiles + engines to Jetson storage                          │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        ONLINE (During Flight)                       │
│                                                                     │
│  STARTUP:                                                           │
│  1. pymavlink → read GLOBAL_POSITION_INT → init ESKF                │
│  2. Load TRT engines: litesam.engine + xfeat.engine                 │
│     (tensorrt.Runtime → deserialize_cuda_engine → create_context)   │
│  3. Allocate GPU buffers for TRT input/output (PyCUDA)              │
│  4. Start cuVSLAM with first camera frames                          │
│  5. Preload satellite tiles ±2km into RAM                           │
│  6. Begin GPS_INPUT output loop at 5-10Hz                           │
│                                                                     │
│  EVERY FRAME (3fps, 333ms interval):                                │
│  ┌──────────────────────────────────────┐                           │
│  │ Nav Camera → Downsample (CUDA ~2ms)  │                           │
│  │ → cuVSLAM VO+IMU (~9ms)              │  ← CUDA Stream A          │
│  │ → ESKF measurement update            │                           │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  5-10Hz CONTINUOUS:                                                 │
│  ┌──────────────────────────────────────┐                           │
│  │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller      │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  KEYFRAMES (every 3-10 frames, async):                              │
│  ┌──────────────────────────────────────┐                           │
│  │ TRT Engine inference (Stream B):     │                           │
│  │ context.execute_async_v3(stream_B)   │──→ ESKF correction        │
│  │ LiteSAM FP16 or XFeat FP16           │                           │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  TELEMETRY (1Hz):                                                   │
│  ┌──────────────────────────────────────┐                           │
│  │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station         │
│  └──────────────────────────────────────┘                           │
└─────────────────────────────────────────────────────────────────────┘
```

## Architecture

### Component: AI Model Inference Runtime

| Solution | Tools | Advantages | Limitations | Performance | Memory | Fit |
|----------|-------|-----------|-------------|------------|--------|-----|
| Native TRT Engine | tensorrt Python + PyCUDA + trtexec | Optimal latency, minimal memory, full tensor core usage, direct CUDA stream control | Hardware-specific engines, manual buffer management, rebuild per TRT version | Optimal | ~50-130MB total (both models) | ✅ Best |
| ONNX Runtime TRT-EP | onnxruntime + TensorRT EP | Auto-fallback for unsupported ops, simpler API, auto engine caching | +280-300MB per model, wrapper overhead, first-run latency spike | Near-parity (claimed), up to 3x slower (observed) | ~640-690MB total (both models) | ❌ Memory overhead unacceptable |
| ONNX Runtime CUDA EP | onnxruntime + CUDA EP | Simplest API, broadest op support | 7-8x slower on Orin Nano (tensor core bug), no TRT optimizations | 7-8x slower | Standard | ❌ Performance unacceptable |
| Torch-TensorRT | torch_tensorrt | AOT compilation, PyTorch-native, handles mixed TRT/PyTorch | Newer on Jetson, requires PyTorch runtime at inference | Near native TRT | PyTorch runtime ~500MB+ | ⚠️ Viable alternative if TRT export fails |

**Selected**: **Native TRT Engine** — optimal performance and memory on our fixed NVIDIA hardware.

**Fallback**: If any model has unsupported TRT ops (e.g., MinGRU in LiteSAM), use **Torch-TensorRT** for that specific model. Torch-TensorRT handles mixed TRT/PyTorch execution but requires the PyTorch runtime in memory.

### Component: TRT Engine Conversion Workflow

**LiteSAM conversion**:

1. Load PyTorch model with trained weights
2. Reparameterize MobileOne backbone (collapse multi-branch → single Conv2d+BN)
3. Export to ONNX: `torch.onnx.export(model, dummy_input, "litesam.onnx", opset_version=17)`
4. Verify with polygraphy: `polygraphy inspect model litesam.onnx`
5. Build engine on Jetson: `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16 --memPoolSize=workspace:2048`
6. Verify engine: `trtexec --loadEngine=litesam.engine --fp16`

**XFeat conversion**:

1. Load PyTorch model
2. Export to ONNX: `torch.onnx.export(model, dummy_input, "xfeat.onnx", opset_version=17)`
3. Build engine on Jetson: `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
4. Alternative: use XFeatTensorRT C++ implementation directly

**INT8 quantization strategy** (optional, future optimization):

- MobileOne backbone (CNN): INT8 safe with calibration data
- TAIFormer (transformer attention): FP16 only — INT8 degrades accuracy
- XFeat: evaluate INT8 on actual UAV-satellite pairs before deploying
- Use nvidia-modelopt for calibration: `from modelopt.onnx.quantization import quantize`

### Component: TRT Python Inference Wrapper

Minimal wrapper class for TRT engine inference:
|
||||
|
||||
```python
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda  # assumes a CUDA context is already initialized


class TRTInference:
    def __init__(self, engine_path, stream):
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        with open(engine_path, 'rb') as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        self.stream = stream
        self._allocate_buffers()

    def _allocate_buffers(self):
        # Pre-allocate one device buffer per I/O tensor — no runtime allocation.
        self.inputs = {}
        self.outputs = {}
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            shape = self.engine.get_tensor_shape(name)
            dtype = trt.nptype(self.engine.get_tensor_dtype(name))
            size = trt.volume(shape)
            device_mem = cuda.mem_alloc(size * np.dtype(dtype).itemsize)
            self.context.set_tensor_address(name, int(device_mem))
            if self.engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
                self.inputs[name] = (device_mem, shape, dtype)
            else:
                self.outputs[name] = (device_mem, shape, dtype)

    def infer_async(self, input_data):
        for name, data in input_data.items():
            cuda.memcpy_htod_async(self.inputs[name][0], data, self.stream)
        # TRT 10 Python API: execute_async_v3 (named enqueueV3 in the C++ API)
        self.context.execute_async_v3(self.stream.handle)

    def get_output(self):
        results = {}
        for name, (dev_mem, shape, dtype) in self.outputs.items():
            host_mem = np.empty(shape, dtype=dtype)
            cuda.memcpy_dtoh_async(host_mem, dev_mem, self.stream)
            results[name] = host_mem  # collect each output buffer
        self.stream.synchronize()
        return results
```

Key design: the `infer_async()` + `get_output()` split enables pipelining — cuVSLAM runs on Stream A while satellite matching runs on Stream B.

### Component: Visual Odometry (UNCHANGED)

cuVSLAM — native CUDA library, not affected by the TRT migration. Already optimal.

### Component: Satellite Image Matching (UPDATED runtime + fallback chain)

| Solution | Tools | Advantages | Limitations | Performance (est., Orin Nano Super, TRT FP16) | Params | Fit |
|----------|-------|-----------|-------------|-----------------------------------------------|--------|-----|
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc), 6.31M params, smallest model | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary (if TRT export succeeds AND ≤200ms) |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT repo, 138 stars). Semi-dense. CVPR 2024. High accuracy. | 2.4x more params than LiteSAM. Requires an einsum → elementary-ops rewrite for TRT (documented in the Coarse_LoFTR_TRT paper). | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python (or XFeatTensorRT C++) | Fastest. Proven TRT implementation. Lightweight. | General-purpose, not designed for the cross-view satellite-aerial gap (but the nadir-nadir gap is small). | Est. ~50-100ms | <5M | ✅ Speed fallback |

**Decision tree (day one on the Orin Nano Super)**:

1. Clone the LiteSAM repo → reparameterize MobileOne → `torch.onnx.export()` → `polygraphy inspect`
2. If ONNX export succeeds → `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
3. If MinGRU causes an ONNX/TRT failure → rewrite the MinGRU forward() as an unrolled 9-step loop → retry
4. If the rewrite fails or accuracy degrades → **switch to EfficientLoFTR TRT**:
   - Apply the Coarse_LoFTR_TRT adaptation techniques (einsum replacement, etc.)
   - Export to ONNX → `trtexec --fp16`
   - Benchmark at 640×480 and 1280px
5. Benchmark the winner: **if ≤200ms → use it. If >200ms but ≤300ms → acceptable (async on Stream B). If >300ms → use XFeat TRT**
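
The unrolled-loop rewrite in step 3 is viable because MinGRU's gates depend only on the current input, not the previous hidden state, so only standard TRT-friendly ops remain. A minimal NumPy sketch of that forward pass — shapes and weight names are illustrative, not LiteSAM's actual parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mingru_unrolled(x, W_z, W_h, h0, steps=9):
    """MinGRU forward as a plain unrolled loop (only matmul, sigmoid, mul, add).

    Because z_t and h~_t depend only on x_t (never on h_{t-1}), both can be
    precomputed for all 9 steps of the 3x3 candidate window before the loop.
    """
    z = sigmoid(x @ W_z)       # (steps, hidden) gates, precomputable
    h_tilde = x @ W_h          # (steps, hidden) candidates, precomputable
    h = h0
    for t in range(steps):     # trivially unrollable: seq_len is fixed at 9
        h = (1.0 - z[t]) * h + z[t] * h_tilde[t]
    return h
```

With the recurrence reduced to nine fixed iterations of elementwise ops, ONNX export produces a static graph with no loop operator for TRT to reject.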

**EfficientLoFTR TRT adaptation** (from the Coarse_LoFTR_TRT paper — a proven workflow):

- Replace `torch.einsum()` with elementary ops (view, bmm, reshape, sum)
- Replace any TRT-incompatible high-level PyTorch functions
- Use the ONNX export path (requires less memory than Torch-TensorRT on an 8GB device)
- Knowledge distillation is available for further parameter reduction if needed
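
The einsum replacement in the first bullet is mechanical. A NumPy illustration of the LoFTR-style similarity rewrite — the shapes are illustrative, and `np.einsum` stands in for `torch.einsum`:

```python
import numpy as np

# LoFTR-style attention similarity is often written as
# einsum('nlc,nsc->nls', feat0, feat1); TRT parsers choke on some einsum
# patterns from ONNX, so it is rewritten with a batched matmul.
rng = np.random.default_rng(0)
feat0 = rng.standard_normal((2, 5, 8))   # (batch, L, C)
feat1 = rng.standard_normal((2, 7, 8))   # (batch, S, C)

sim_einsum = np.einsum('nlc,nsc->nls', feat0, feat1)
# Equivalent elementary-ops form: matmul against the transposed second operand
sim_bmm = feat0 @ feat1.transpose(0, 2, 1)

assert np.allclose(sim_einsum, sim_bmm)
```

The same pattern (transpose + matmul, or view/reshape + bmm in PyTorch) covers the einsum contractions used in the coarse matching stage.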

### Component: Sensor Fusion (UNCHANGED)

ESKF — a CPU-based mathematical filter, not affected.

### Component: Flight Controller Integration (UNCHANGED)

pymavlink — not affected by the TRT migration.

### Component: Ground Station Telemetry (UNCHANGED)

MAVLink NAMED_VALUE_FLOAT — not affected.

### Component: Startup & Lifecycle (UPDATED)

**Updated startup sequence**:

1. Boot the Jetson → start the GPS-Denied service (systemd)
2. Connect to the flight controller via pymavlink on UART
3. Wait for a heartbeat from the flight controller
4. **Initialize the PyCUDA context**
5. **Load TRT engines**: litesam.engine + xfeat.engine via `tensorrt.Runtime.deserialize_cuda_engine()`
6. **Allocate GPU I/O buffers** for both models
7. **Create CUDA streams**: Stream A (cuVSLAM), Stream B (satellite matching)
8. Read GLOBAL_POSITION_INT → initialize the ESKF
9. Start cuVSLAM with the first camera frames
10. Begin the GPS_INPUT output loop at 5-10Hz
11. Preload satellite tiles within ±2km into RAM
12. System ready

**Engine load time**: ~1-3 seconds per engine (deserialization from the .engine file). A one-time cost at startup.

### Component: Thermal Management (UNCHANGED)

Same adaptive pipeline. TRT engines are slightly more power-efficient than ONNX Runtime, but the difference is within noise.

### Component: Object Localization (UNCHANGED)

Not affected — a trigonometric calculation, no AI inference.

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

Unchanged from draft03. Native CUDA, not part of the TRT migration.

### 2. Native TRT Engine Inference (NEW)

All AI models run as pre-compiled TRT FP16 engines:

- Engine files built offline with trtexec (a one-time cost per model version)
- Loaded at startup (~1-3s per engine)
- Inference via `context.execute_async_v3()` on dedicated CUDA Stream B
- GPU buffers pre-allocated — zero runtime allocation during flight
- No ONNX Runtime dependency — no framework overhead

Memory advantage over ONNX Runtime TRT-EP: ~560-600MB saved (both models combined).
Latency advantage: eliminates the ONNX wrapper overhead and guarantees tensor core utilization.

### 3. CUDA Stream Pipelining (REFINED)

- Stream A: cuVSLAM VO for the current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: TRT engine inference for satellite matching (LiteSAM or XFeat, async)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management
- **NEW**: Both cuVSLAM and the TRT engines use CUDA streams natively — no framework abstraction layer, direct GPU scheduling.

### 4-7. (UNCHANGED from draft03)

Keyframe-based satellite matching, TensorRT FP16 optimization, proactive tile loading, 5-10Hz GPS_INPUT output — all unchanged.

## Processing Time Budget (per frame, 333ms interval)

### Normal Frame (non-keyframe)

Unchanged from draft03 — cuVSLAM dominates at ~22ms total.

### Keyframe Satellite Matching (async, CUDA Stream B)

**Path A — LiteSAM TRT Engine FP16 at 1280px**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| Copy input to GPU buffer | <0.5ms | PyCUDA memcpy_htod_async |
| LiteSAM TRT Engine FP16 | ≤200ms | context.execute_async_v3(stream_B) |
| Copy output from GPU | <0.5ms | PyCUDA memcpy_dtoh_async |
| Geometric pose (RANSAC) | ~5ms | Homography |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async on Stream B |

**Path B — XFeat TRT Engine FP16**:

| Step | Time | Notes |
|------|------|-------|
| XFeat TRT Engine inference | ~50-80ms | context.execute_async_v3(stream_B) |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async on Stream B |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory (Native TRT) | Memory (ONNX RT TRT-EP) | Notes |
|-----------|---------------------|--------------------------|-------|
| OS + runtime | ~1.5GB | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | ~200-500MB | CUDA library + map |
| **LiteSAM TRT engine** | **~50-80MB** | **~330-360MB** | Native TRT vs TRT-EP. If LiteSAM fails: EfficientLoFTR ~100-150MB |
| **XFeat TRT engine** | **~30-50MB** | **~310-330MB** | Native TRT vs TRT-EP |
| Preloaded satellite tiles | ~200MB | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | ~20MB | |
| FastAPI (local IPC) | ~50MB | ~50MB | |
| ESKF + buffers | ~10MB | ~10MB | |
| ONNX Runtime framework | **0MB** | **~150MB** | Eliminated with native TRT |
| **Total** | **~2.1-2.9GB** | **~2.8-3.6GB** | |
| **% of 8GB** | **26-36%** | **35-45%** | |
| **Savings** | — | — | **~700MB saved with native TRT** |
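
A quick sanity check on the ~700MB bottom line, using midpoints of the per-model ranges above (illustrative figures, not measurements):

```python
# Midpoints of the memory-budget table's per-model ranges, in MB
native_mb = {"litesam": 65, "xfeat": 40, "onnxruntime_framework": 0}
trt_ep_mb = {"litesam": 345, "xfeat": 320, "onnxruntime_framework": 150}

savings_mb = sum(trt_ep_mb.values()) - sum(native_mb.values())
print(savings_mb)  # 710 — consistent with the table's "~700MB saved"
```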

## Confidence Scoring → GPS_INPUT Mapping

Unchanged from draft03.

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **LiteSAM MinGRU ops unsupported in TRT 10.3** | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification: ONNX export → polygraphy → trtexec. If MinGRU fails: (1) rewrite as an unrolled 9-step loop; (2) if it still fails: **switch to EfficientLoFTR TRT** (proven TRT path via Coarse_LoFTR_TRT, 15.05M params). XFeat TRT as the speed fallback. |
| **TRT engine build OOM on the 8GB Jetson** | LOW | Cannot build engines on the target device | Our models are small (6.31M LiteSAM, <5M XFeat), so OOM is unlikely. If it occurs: reduce --memPoolSize, or build on an identical Orin Nano module with more headroom |
| **Engine incompatibility after a JetPack update** | MEDIUM | Must rebuild engines | Include an engine rebuild in the JetPack update procedure. Takes minutes per model. |
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink | Unchanged from draft03 |
| **cuVSLAM fails on low-texture terrain** | HIGH | Frequent tracking loss | Unchanged from draft03 |
| **Thermal throttling** | MEDIUM | Satellite matching budget blown | Unchanged from draft03 |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use a fallback matcher | Day-one benchmark. Fallback chain: EfficientLoFTR TRT (if ≤300ms) → XFeat TRT (if all >300ms) |
| Google Maps satellite quality in the conflict zone | HIGH | Satellite matching fails | Unchanged from draft03 |

## Testing Strategy

### Integration / Functional Tests

All tests from draft03 unchanged, plus:

- **TRT engine load test**: Verify litesam.engine and xfeat.engine load successfully on the Jetson Orin Nano Super
- **TRT inference correctness**: Compare TRT engine output against the PyTorch reference output (max L1 error < 0.01)
- **CUDA Stream B pipelining**: Verify satellite matching on Stream B does not block cuVSLAM on Stream A
- **Pre-built engine validation**: Verify the engine files from offline preparation work without a rebuild at runtime
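
The max-L1 correctness gate above can be expressed as a small helper; the sample values here are illustrative, not real engine outputs:

```python
import numpy as np

def max_l1_error(ref, out):
    """Element-wise max absolute difference between reference and engine output."""
    ref = np.asarray(ref, dtype=np.float64)
    out = np.asarray(out, dtype=np.float64)
    return float(np.max(np.abs(ref - out)))

ref = np.array([0.10, -0.50, 0.25])      # hypothetical PyTorch reference
out = np.array([0.104, -0.498, 0.247])   # hypothetical FP16-engine output
assert max_l1_error(ref, out) < 0.01     # passes the < 0.01 gate
```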

### Non-Functional Tests

All tests from draft03 unchanged, plus:

- **TRT engine build time**: Measure trtexec build time for LiteSAM and XFeat on the Orin Nano Super (expected: 1-5 minutes each)
- **TRT engine load time**: Measure deserialization time (expected: 1-3 seconds each)
- **Memory comparison**: Measure actual GPU memory with native TRT vs ONNX RT TRT-EP for both models
- **MinGRU TRT compatibility** (day-one blocker):
  1. Clone the LiteSAM repo, load pretrained weights
  2. Reparameterize the MobileOne backbone
  3. `torch.onnx.export(model, dummy, "litesam.onnx", opset_version=17)`
  4. `polygraphy inspect model litesam.onnx` — check for unsupported ops
  5. `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
  6. If step 3 or 5 fails on MinGRU: rewrite the MinGRU forward() as an unrolled loop, retry
  7. If it still fails: switch to EfficientLoFTR and apply the Coarse_LoFTR_TRT adaptation
  8. Compare TRT output against the PyTorch reference (max L1 error < 0.01)
- **EfficientLoFTR TRT fallback benchmark** (if LiteSAM fails): apply the TRT adaptation from Coarse_LoFTR_TRT → ONNX → trtexec → measure latency at 640×480 and 1280px
- **Tensor core utilization**: Verify with Nsight that the TRT engines use tensor cores (unlike the ONNX RT CUDA EP)

## References

- ONNX Runtime Issue #24085 (Jetson Orin Nano tensor core bug): https://github.com/microsoft/onnxruntime/issues/24085
- ONNX Runtime Issue #20457 (TRT-EP memory overhead): https://github.com/microsoft/onnxruntime/issues/20457
- ONNX Runtime Issue #12083 (TRT-EP vs native TRT): https://github.com/microsoft/onnxruntime/issues/12083
- NVIDIA TensorRT 10 Python API: https://docs.nvidia.com/deeplearning/tensorrt/10.15.1/inference-library/python-api-docs.html
- TensorRT Best Practices: https://docs.nvidia.com/deeplearning/tensorrt/latest/performance/best-practices.html
- TensorRT engine hardware specificity: https://github.com/NVIDIA/TensorRT/issues/1920
- trtexec ONNX conversion: https://nvidia-jetson.piveral.com/jetson-orin-nano/how-to-convert-onnx-to-engine-on-jetson-orin-nano-dev-board/
- Torch-TensorRT on JetPack 6.2: https://docs.pytorch.org/TensorRT/v2.10.0/getting_started/jetpack.html
- XFeatTensorRT: https://github.com/PranavNedunghat/XFeatTensorRT
- JetPack 6.2 Release Notes: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/index.html
- Jetson Orin Nano Super: https://developer.nvidia.com/blog/nvidia-jetson-orin-nano-developer-kit-gets-a-super-boost/
- DLA on Jetson Orin: https://developer.nvidia.com/blog/maximizing-deep-learning-performance-on-nvidia-jetson-orin-with-dla/
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR on Hugging Face: https://huggingface.co/docs/transformers/en/model_doc/efficientloftr
- Coarse_LoFTR_TRT (TRT for embedded): https://github.com/Kolkir/Coarse_LoFTR_TRT
- Coarse_LoFTR_TRT paper: https://ar5iv.labs.arxiv.org/html/2202.00770
- LoFTR_TRT: https://github.com/Kolkir/LoFTR_TRT
- minGRU ("Were RNNs All We Needed?"): https://huggingface.co/papers/2410.01201
- minGRU PyTorch implementation: https://github.com/lucidrains/minGRU-pytorch
- LiteSAM paper (MinGRU details, Eqs. 12-16): https://www.mdpi.com/2072-4292/17/19/3349
- DALGlue (UAV feature matching, 2025): https://www.nature.com/articles/s41598-025-21602-5
- All references from solution_draft03.md

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts (this assessment): `_docs/00_research/trt_engine_migration/`
- Previous research: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`

# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| ONNX Runtime as potential inference runtime for AI models | **Performance**: ONNX Runtime CUDA EP on the Jetson Orin Nano is 7-8x slower than standalone TRT with default settings (tensor cores not utilized). Even TRT-EP shows up to 3x overhead on some models. | **Use native TRT Engine for all AI models**. Convert PyTorch → ONNX → trtexec → .engine. Load with the tensorrt Python module. Eliminates the ONNX Runtime dependency entirely. |
| ONNX Runtime TRT-EP memory overhead | **Performance**: ONNX RT TRT-EP keeps the serialized engine in memory (~420-440MB vs 130-140MB native TRT). Delta ~280-300MB PER MODEL. On 8GB shared memory, this wastes ~560-600MB for two models. | **Native TRT releases the serialized blob after deserialization** → saves ~280-300MB per model. Total savings ~560-600MB — 7% of total memory. Critical given the cuVSLAM map-growth risk. |
| No explicit TRT engine build step in the offline pipeline | **Functional**: Draft03 mentions TRT FP16 but doesn't define the build workflow. When and where are engines built? | **Add the TRT engine build to the offline preparation pipeline**: after the satellite tile download, run trtexec on the Jetson to build .engine files. Store them alongside the tiles. One-time cost per model version. |
| Cross-platform portability via ONNX Runtime | **Functional**: ONNX Runtime's primary value is cross-platform support. Our deployment is Jetson-only — this value is zero. We pay the performance/memory tax for unused portability. | **Drop ONNX Runtime**. The Jetson Orin Nano Super is fixed deployment hardware. TRT Engine is the optimal runtime for NVIDIA-only deployment. |
| No DLA offloading considered | **Performance**: Draft03 doesn't mention DLA. The Jetson Orin Nano has NO DLA cores — only the Orin NX (1-2) and AGX Orin (2) have DLA. | **Confirm: DLA offloading is NOT available on the Orin Nano**. All inference must run on the GPU (1024 CUDA cores, 16 tensor cores). This makes maximizing GPU efficiency via native TRT even more critical. |
| LiteSAM MinGRU TRT compatibility risk | **Functional**: LiteSAM's subpixel refinement uses 4 stacked MinGRU layers over a 3×3 candidate window (seq_len=9). MinGRU gates depend only on the input C_f (not h_{t-1}), so z_t/h̃_t are pre-computable. Ops are standard: Linear, Sigmoid, Mul, Add, ReLU, Tanh. Risk is LOW-MEDIUM — it depends on whether the implementation uses logcumsumexp (problematic) or a simple loop (fine). Seq_len=9 makes this trivially rewritable. | **Day-one verification**: clone the LiteSAM repo → torch.onnx.export → polygraphy inspect → trtexec --fp16. If export fails on MinGRU: rewrite forward() as an unrolled loop (9 steps). **If LiteSAM cannot be made TRT-compatible: replace it with EfficientLoFTR TRT** (proven TRT path via Coarse_LoFTR_TRT, 15.05M params, semi-dense matching). |
| Camera shoots at ~3fps (draft03/04 hard constraint) | **Functional**: The ADTI 20L V1's max continuous rate is **2.0 fps** (burst only, buffer-limited). ADTi recommends 1.5s per capture (**0.7 fps sustained**). 3fps is physically impossible; 2fps is not sustainable for multi-hour flights (buffer saturation + mechanical shutter wear). | **Revised to 0.7 fps sustained**. The ADTI 20L V1 is the sole navigation camera — used for both cuVSLAM VO and satellite matching. At 70 km/h cruise and 600m altitude: 27.8m inter-frame displacement, ~175px pixel shift, **95.2% frame overlap** — within pyramid-assisted LK optical flow range. ESKF IMU prediction at 5-10Hz bridges the 1.43s gaps between frames. Satellite matching is triggered on keyframes from the same stream. The Viewpro A40 Pro is reserved for AI object detection only. |


## UAV Platform

### Airframe Configuration

| Component | Specification | Weight |
|-----------|--------------|--------|
| Airframe | Custom 3.5m S-2 glass composite, Eppler 423 airfoil | ~4.5 kg |
| Battery | 2x VANT semi-solid-state 6S 30Ah (22.2V, 666Wh each) | 5.30 kg (2.65 kg each) |
| Motor | T-Motor AT4125 KV540 (2000W peak, 5.5 kg thrust w/ APC 15x8) | 0.36 kg |
| Propulsion acc. | ESC, 15x8 folding propeller, servos, cables | ~0.50 kg |
| Avionics | Pixhawk 6X + GPS | ~0.10 kg |
| Computing | NVIDIA Jetson Orin Nano Super Dev Kit | ~0.30 kg |
| Camera 1 | ADTI 20L V1 APS-C camera + 16mm lens | ~0.22 kg |
| Camera 2 | Viewpro A40 Pro (A40TPro) AI gimbal | ~0.85 kg |
| Misc | Mounts, wiring, connectors | ~0.35 kg |
| **Total AUW** | | **~12.5 kg** |

The T-Motor AT4125 KV540 is spec'd for 8-10 kg fixed-wing aircraft. At 12.5 kg AUW, static thrust-to-weight is ~0.44. Flyable, but margins are tight — weight optimization should be monitored.

### Flight Performance (Max Endurance)

Assumptions: wingspan 3.5m, mean chord ~0.30m, wing area S ~1.05 m², AR ~11.7, Eppler 423 Cl_max ~1.8, cruise altitude 800-1000m (ρ ~1.10 kg/m³).

| Parameter | Value |
|-----------|-------|
| Stall speed (900m altitude) | 10.6 m/s (38 km/h) |
| Min-power speed (theoretical) | 12.0 m/s (43 km/h) — only 13% above stall, impractical |
| **Max endurance cruise (1.3× stall margin)** | **14 m/s (50 km/h)** |
| Best-range speed | ~18 m/s (65 km/h) |
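
The stall and cruise figures follow from the standard stall-speed formula applied to the assumptions above; this sketch yields ~10.9 m/s, a hair above the table's 10.6 m/s, which suggests slightly different rounding in the original inputs:

```python
import math

# V_stall = sqrt(2 * W / (rho * S * Cl_max)), with the table's assumptions
m, g = 12.5, 9.81            # AUW [kg], gravity [m/s^2]
rho, S, cl_max = 1.10, 1.05, 1.8

v_stall = math.sqrt(2 * m * g / (rho * S * cl_max))   # ~10.9 m/s
v_cruise = 1.3 * v_stall                              # ~14 m/s max-endurance cruise
```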

| Energy Budget | Value |
|---------------|-------|
| Total battery energy | 1332 Wh (2 × 666 Wh) |
| Usable (80% DoD) | 1066 Wh |
| Climb to 900m (~5 min at 3 m/s) | −57 Wh |
| 10% reserve | ×0.9 |
| **Available for cruise** | **~908 Wh** |

| Power Budget | Value |
|--------------|-------|
| Propulsion (L/D ~15, η_prop 0.65, η_motor 0.80) | ~212 W |
| Electronics (Pixhawk + Jetson + cameras + gimbal + servos) | ~55 W |
| **Total cruise power** | **~267 W** |

| Endurance | Value |
|-----------|-------|
| **Max endurance (at 50 km/h)** | **~3.4 hours** |
| Total mission (incl. climb + reserve) | ~3.5 hours |
| Max range (at 65 km/h best-range speed) | ~209 km |
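
The endurance number is just the three tables above chained together; a quick check of the arithmetic:

```python
# Energy budget → power budget → endurance, per the tables above
total_wh = 2 * 666                    # two 666 Wh packs
usable_wh = total_wh * 0.80           # 80% depth of discharge
cruise_wh = (usable_wh - 57) * 0.9    # minus climb energy, times 10% reserve

cruise_power_w = 212 + 55             # propulsion + electronics
endurance_h = cruise_wh / cruise_power_w
print(round(cruise_wh), round(endurance_h, 1))  # 908 Wh available, ~3.4 h
```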

### Camera 1: ADTI 20L V1 + 16mm Lens

| Spec | Value |
|------|-------|
| Sensor | Sony CMOS APS-C, 23.2 × 15.4 mm |
| Resolution | 5456 × 3632 (20 MP) |
| Focal length | 16 mm |
| Shutter | Mechanical global inter-mirror shutter (ADTI product line) |
| Max continuous fps | **2.0 fps** (spec — burst rate, buffer-limited) |
| Sustained capture rate | **~0.7 fps** (ADTi-recommended 1.5s per capture) |
| File formats | JPEG, RAW, RAW+JPEG |
| HDMI video output | 1080p 24p/30p, 1440×1080 30p |
| Weight | 118g body + ~100g lens |
| ISP | Socionext Milbeaut |
| Cooling | Active fan |

**2.0 fps is a burst rate, not sustained.** The 2.0 fps spec is limited by the internal buffer (estimated 3-5 frames). Once the buffer fills, the camera throttles to the write pipeline speed of ~0.7 fps. The bottleneck chain: mechanical shutter actuation (~100-300ms) + ISP processing (demosaic, NR, JPEG compress) + storage write (~5-10 MB/frame JPEG). The 1.5s/capture recommendation guarantees the buffer never fills and accounts for thermal margin over multi-hour flights.
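
A back-of-envelope check on how long a 2.0 fps burst can last before throttling — the buffer size is our estimate from the text (3-5 frames), not a published spec:

```python
def burst_duration_s(buffer_frames, burst_fps=2.0, write_fps=0.7):
    """Seconds of burst capture before the buffer fills.

    The buffer fills at the net rate (burst - write) frames per second,
    assuming the ~0.7 fps write pipeline drains it continuously.
    """
    return buffer_frames / (burst_fps - write_fps)
```

With the estimated 3-5 frame buffer, a 2 fps burst lasts only about 2-4 seconds before the camera throttles to the write-pipeline rate.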

**Mechanical shutter wear at sustained rates:**

| Rate | Actuations per 3.5h flight | Est. flights before 150K shutter life | Est. flights before 500K shutter life |
|------|---------------------------|---------------------------------------|---------------------------------------|
| 2.0 fps (burst, unsustainable) | 25,200 | ~6 | ~20 |
| 1.0 fps | 12,600 | ~12 | ~40 |
| 0.7 fps (recommended) | 8,820 | ~17 | ~57 |

The 20L V1 shutter lifespan is not documented. The higher-end 102PRO is rated at 500K actuations; the 20L, as an entry-level model, is likely 100K-150K. At 0.7 fps sustained, this gives ~11-57 flights depending on the shutter rating.
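
The actuation figures in the table are straightforward to reproduce:

```python
def actuations_per_flight(fps, flight_hours=3.5):
    """Shutter actuations accumulated over one flight at a given capture rate."""
    return fps * flight_hours * 3600

# At 0.7 fps: 8,820 actuations per 3.5h flight; a 150K-rated shutter
# therefore survives roughly 17 flights.
```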

**Confirmed operational rate: 0.7 fps (JPEG mode) for both VO and satellite matching.**

### Camera 1: Ground Coverage at Mission Altitude

| Parameter | H = 600 m | H = 800 m | H = 1000 m |
|-----------|-----------|-----------|------------|
| Along-track footprint (15.4mm side) | 577 m | 770 m | 962 m |
| Cross-track footprint (23.2mm side) | 870 m | 1160 m | 1450 m |
| GSD | 15.9 cm/pixel | 21.3 cm/pixel | 26.6 cm/pixel |
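
These figures follow from the pinhole model (footprint = sensor side × altitude / focal length; GSD = footprint / pixel count); a sketch that reproduces the table:

```python
# Sensor 23.2 x 15.4 mm, 5456 x 3632 px, 16 mm lens (from the spec table)
def footprint_m(sensor_side_mm, altitude_m, focal_mm=16.0):
    """Ground footprint of one sensor side at a given altitude."""
    return sensor_side_mm * altitude_m / focal_mm

def gsd_cm(sensor_side_mm, pixels, altitude_m, focal_mm=16.0):
    """Ground sample distance in cm/pixel along that side."""
    return footprint_m(sensor_side_mm, altitude_m, focal_mm) / pixels * 100
```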

### Camera 1: Forward Overlap at 0.7 fps

At 0.7 fps, the distance between shots is V / 0.7.

**At 70 km/h (19.4 m/s) — realistic cruise speed:**

| Altitude | Along-track footprint | Shot gap (27.8m) | Forward overlap | Pixel shift |
|----------|-----------------------|------------------|-----------------|-------------|
| 600 m | 577 m | 27.8 m | **95.2%** | ~175 px |
| 800 m | 770 m | 27.8 m | **96.4%** | ~131 px |
| 1000 m | 962 m | 27.8 m | **97.1%** | ~105 px |

**Across the speed range (at 600m altitude, 0.7 fps):**

| Speed | Frame gap | Pixel shift | Forward overlap |
|-------|-----------|-------------|-----------------|
| 50 km/h (14 m/s) | 20.0 m | ~126 px | 96.5% |
| 70 km/h (19.4 m/s) | 27.8 m | ~175 px | 95.2% |
| 90 km/h (25 m/s) | 35.7 m | ~224 px | 93.8% |

Even at 90 km/h and the lowest altitude (600m), overlap remains >93%. The 16mm lens on APS-C at these altitudes produces a footprint so large that 0.7 fps provides massive redundancy for both VO and satellite matching.

**For satellite matching specifically** (keyframe-based, every 5-10 camera frames):

| Target overlap | Required gap (600m) | Time between shots (70 km/h) | Capture rate |
|----------------|---------------------|------------------------------|--------------|
| 80% | 115 m | 5.9 s | 0.17 fps |
| 70% | 173 m | 8.9 s | 0.11 fps |
| 60% | 231 m | 11.9 s | 0.084 fps |

Even at the lowest altitude and highest speed, one satellite-matching keyframe every 6-12 seconds gives 60-80% overlap.
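
The overlap and pixel-shift columns in the tables above reduce to two one-line formulas:

```python
# overlap = 1 - gap / footprint; pixel shift = gap / ground-size-per-pixel
def overlap_pct(gap_m, footprint_m):
    """Forward overlap (%) between consecutive frames."""
    return (1 - gap_m / footprint_m) * 100

def pixel_shift(gap_m, footprint_m, image_px=3632):
    """Inter-frame displacement in pixels along the 3632-px image side."""
    return gap_m / (footprint_m / image_px)

gap = 19.4 / 0.7   # ~27.7 m between shots at 70 km/h and 0.7 fps
```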

### Camera 2: Viewpro A40 Pro (A40TPro) AI Gimbal

A dual EO/IR gimbal with AI tracking. Reserved for **AI object detection and tracking only** — not used for navigation. It operates independently from the navigation pipeline.

### Camera Role Assignment

| Role | Camera | Rate | Notes |
|------|--------|------|-------|
| Visual Odometry (cuVSLAM) | ADTI 20L V1 + 16mm | 0.7 fps (sustained) | Sole navigation camera. At 70 km/h: 27.8m / ~175px displacement at 600m altitude. 95%+ overlap. |
| Satellite Image Matching | ADTI 20L V1 + 16mm | Keyframes from the VO stream (~every 5-10 frames) | Same image stream as VO. A subset is routed to the satellite matcher on Stream B. |
| AI Object Detection | Viewpro A40 Pro | Independent | Not part of the navigation pipeline. |

### cuVSLAM at 0.7 fps — Feasibility

At 0.7 fps and 70 km/h cruise, inter-frame displacement is 27.8m. In pixel terms:

| Altitude | Displacement | Pixel shift | % of image height | Overlap |
|----------|-------------|-------------|-------------------|---------|
| 600 m | 27.8 m | ~175 px | 4.8% | 95.2% |
| 800 m | 27.8 m | ~131 px | 3.6% | 96.4% |
| 1000 m | 27.8 m | ~105 px | 2.9% | 97.1% |

cuVSLAM uses Lucas-Kanade optical flow with image pyramids. Standard LK handles 30-50px displacements at the base level; with 3-4 pyramid levels, the effective search range extends to ~150-200px. At 600m altitude, the 175px shift is within this pyramid-assisted range. At 800-1000m, the shift drops to 105-131px — well within range.

Key factors that make 0.7 fps viable at high altitude:

- **Large footprint**: The 16mm lens on APS-C at 600-1000m produces 577-962m of along-track coverage. The aircraft moves only 4-5% of the frame between shots.
- **High texture from altitude**: At 600-1000m, each frame covers a large area with diverse terrain features (roads, field boundaries, structures), even in agricultural regions.
- **IMU bridging**: cuVSLAM's built-in IMU integrator provides pose prediction during the 1.43s gap between frames. ESKF IMU prediction runs at 5-10Hz for continuous GPS_INPUT output.
- **95%+ overlap**: Consecutive frames share more than 95% of their content — abundant features for matching.

**Risk**: Over completely uniform terrain (e.g., a single crop field filling the entire 577m+ footprint), feature tracking may still fail. cuVSLAM falls back to IMU-only (~1s acceptable), then constant-velocity (~0.5s), before tracking loss. Satellite matching corrections every 5-10 frames bound the accumulated drift.

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses **native TensorRT Engine files** — no ONNX Runtime dependency. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.

Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM — native CUDA) from the ADTI 20L V1 at 0.7 fps sustained, (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16) using keyframes from the same ADTI image stream, and (3) IMU data from the flight controller, combined in an ESKF. The Viewpro A40 Pro is reserved for AI object detection only.

**Inference runtime decision**: Native TRT Engine over ONNX Runtime because:

1. ONNX RT CUDA EP is 7-8x slower on the Orin Nano (tensor core bug)
2. ONNX RT TRT-EP wastes ~280-300MB per model (the serialized engine is retained in memory)
3. Cross-platform portability has zero value — deployment is Jetson-only
4. Native TRT provides direct CUDA stream control for pipelining with cuVSLAM

**Hard constraint**: The ADTI 20L V1 shoots at 0.7 fps sustained (1430ms interval). The full VO+ESKF pipeline must complete within 400ms per frame. Satellite matching runs async on keyframes (every 5-10 camera frames). GPS_INPUT is sent at 5-10Hz (ESKF IMU prediction fills the gaps between camera frames).

**AI Model Runtime Summary**:

| Model | Runtime | Precision | Memory | Integration |
|-------|---------|-----------|--------|-------------|
| cuVSLAM | Native CUDA (PyCuVSLAM) | N/A (closed-source) | ~200-500MB | CUDA Stream A |
| LiteSAM | TRT Engine | FP16 | ~50-80MB | CUDA Stream B |
| XFeat | TRT Engine | FP16 | ~30-50MB | CUDA Stream B (fallback) |
| ESKF | CPU (Python/C++) | FP64 | ~10MB | CPU thread |

**Offline Preparation Pipeline** (before flight):

1. Download satellite tiles → validate → pre-resize → store (existing)
2. **NEW: Build TRT engines on the Jetson** (one-time per model version)
   - `trtexec --onnx=litesam_fp16.onnx --saveEngine=litesam.engine --fp16`
   - `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
3. Copy tiles + engines to Jetson storage
4. At startup: load engines + preload tiles into RAM

```
┌─────────────────────────────────────────────────────────────────────┐
│                       OFFLINE (Before Flight)                       │
│  1. Satellite Tiles → Download & Validate → Pre-resize → Store      │
│     (Google Maps)    (≥0.5m/px, <2yr)      (matcher res)  (GeoHash) │
│  2. TRT Engine Build (one-time per model version):                  │
│     PyTorch model → reparameterize → ONNX export → trtexec --fp16   │
│     Output: litesam.engine, xfeat.engine                            │
│  3. Copy tiles + engines to Jetson storage                          │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       ONLINE (During Flight)                        │
│                                                                     │
│  STARTUP:                                                           │
│  1. pymavlink → read GLOBAL_POSITION_INT → init ESKF                │
│  2. Load TRT engines: litesam.engine + xfeat.engine                 │
│     (tensorrt.Runtime → deserialize_cuda_engine → create_context)   │
│  3. Allocate GPU buffers for TRT input/output (PyCUDA)              │
│  4. Start cuVSLAM with ADTI 20L V1 camera stream                    │
│  5. Preload satellite tiles ±2km into RAM                           │
│  6. Begin GPS_INPUT output loop at 5-10Hz                           │
│                                                                     │
│  EVERY CAMERA FRAME (0.7fps sustained from ADTI 20L V1):            │
│  ┌──────────────────────────────────────┐                           │
│  │ ADTI 20L V1 → Downsample (CUDA)      │                           │
│  │ → cuVSLAM VO+IMU (~9ms)              │  ← CUDA Stream A          │
│  │ → ESKF measurement                   │                           │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  5-10Hz CONTINUOUS (IMU-driven between camera frames):              │
│  ┌──────────────────────────────────────┐                           │
│  │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller      │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  KEYFRAMES (every 5-10 camera frames, async):                       │
│  ┌──────────────────────────────────────┐                           │
│  │ Same ADTI frame → TRT inference (B): │                           │
│  │ context.execute_async_v3(stream_B)   │──→ ESKF correction        │
│  │ LiteSAM FP16 or XFeat FP16           │                           │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  TELEMETRY (1Hz):                                                   │
│  ┌──────────────────────────────────────┐                           │
│  │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station         │
│  └──────────────────────────────────────┘                           │
└─────────────────────────────────────────────────────────────────────┘
```
|
||||
|
||||

## Architecture

### Component: AI Model Inference Runtime

| Solution | Tools | Advantages | Limitations | Performance | Memory | Fit |
|----------|-------|-----------|-------------|------------|--------|-----|
| Native TRT Engine | tensorrt Python + PyCUDA + trtexec | Optimal latency, minimal memory, full tensor core usage, direct CUDA stream control | Hardware-specific engines, manual buffer management, rebuild per TRT version | Optimal | ~50-130MB total (both models) | ✅ Best |
| ONNX Runtime TRT-EP | onnxruntime + TensorRT EP | Auto-fallback for unsupported ops, simpler API, auto engine caching | +280-300MB per model, wrapper overhead, first-run latency spike | Near-parity (claimed), up to 3x slower (observed) | ~640-690MB total (both models) | ❌ Memory overhead unacceptable |
| ONNX Runtime CUDA EP | onnxruntime + CUDA EP | Simplest API, broadest op support | 7-8x slower on Orin Nano (tensor core bug), no TRT optimizations | 7-8x slower | Standard | ❌ Performance unacceptable |
| Torch-TensorRT | torch_tensorrt | AOT compilation, PyTorch-native, handles mixed TRT/PyTorch | Newer on Jetson, requires PyTorch runtime at inference | Near native TRT | PyTorch runtime ~500MB+ | ⚠️ Viable alternative if TRT export fails |

**Selected**: **Native TRT Engine** — optimal performance and memory on our fixed NVIDIA hardware.

**Fallback**: If any model has unsupported TRT ops (e.g., MinGRU in LiteSAM), use **Torch-TensorRT** for that specific model. Torch-TensorRT handles mixed TRT/PyTorch execution but requires PyTorch runtime in memory.

### Component: TRT Engine Conversion Workflow

**LiteSAM conversion**:
1. Load PyTorch model with trained weights
2. Reparameterize MobileOne backbone (collapse multi-branch → single Conv2d+BN)
3. Export to ONNX: `torch.onnx.export(model, dummy_input, "litesam.onnx", opset_version=17)`
4. Verify with polygraphy: `polygraphy inspect model litesam.onnx`
5. Build engine on Jetson: `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16 --memPoolSize=workspace:2048`
6. Verify engine: `trtexec --loadEngine=litesam.engine --fp16`

**XFeat conversion**:
1. Load PyTorch model
2. Export to ONNX: `torch.onnx.export(model, dummy_input, "xfeat.onnx", opset_version=17)`
3. Build engine on Jetson: `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
4. Alternative: use XFeatTensorRT C++ implementation directly

**INT8 quantization strategy** (optional, future optimization):
- MobileOne backbone (CNN): INT8 safe with calibration data
- TAIFormer (transformer attention): FP16 only — INT8 degrades accuracy
- XFeat: evaluate INT8 on actual UAV-satellite pairs before deploying
- Use nvidia-modelopt for calibration: `from modelopt.onnx.quantization import quantize`

### Component: TRT Python Inference Wrapper

Minimal wrapper class for TRT engine inference:

```python
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda


class TRTInference:
    def __init__(self, engine_path, stream):
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        with open(engine_path, 'rb') as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        self.stream = stream
        self._allocate_buffers()

    def _allocate_buffers(self):
        # Assumes static input shapes baked into the engine
        self.inputs = {}
        self.outputs = {}
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            shape = self.engine.get_tensor_shape(name)
            dtype = trt.nptype(self.engine.get_tensor_dtype(name))
            size = trt.volume(shape)
            device_mem = cuda.mem_alloc(size * np.dtype(dtype).itemsize)
            self.context.set_tensor_address(name, int(device_mem))
            mode = self.engine.get_tensor_mode(name)
            if mode == trt.TensorIOMode.INPUT:
                self.inputs[name] = (device_mem, shape, dtype)
            else:
                self.outputs[name] = (device_mem, shape, dtype)

    def infer_async(self, input_data):
        # Copy inputs and enqueue inference without blocking the host thread
        for name, data in input_data.items():
            cuda.memcpy_htod_async(self.inputs[name][0], data, self.stream)
        self.context.execute_async_v3(self.stream.handle)  # TRT 10 Python API

    def get_output(self):
        # Queue all device-to-host copies, then synchronize once
        results = {}
        for name, (dev_mem, shape, dtype) in self.outputs.items():
            host_mem = np.empty(shape, dtype=dtype)
            cuda.memcpy_dtoh_async(host_mem, dev_mem, self.stream)
            results[name] = host_mem
        self.stream.synchronize()
        return results
```

Key design: the `infer_async()` + `get_output()` split enables pipelining with cuVSLAM on Stream A while satellite matching runs on Stream B.

### Component: Visual Odometry (UPDATED — camera rate corrected)

cuVSLAM — native CUDA library. Fed by the **ADTI 20L V1 at 0.7 fps sustained** (previously assumed 3fps, which exceeds the camera hardware limit; the 2.0 fps spec is burst-only, not sustainable). At 70 km/h cruise the inter-frame displacement is 27.8m — at 600m altitude this translates to ~175px (4.8% of frame), within pyramid-assisted LK optical flow range. At 800-1000m altitude the pixel shift drops to 105-131px. 95%+ frame overlap ensures abundant features for matching. ESKF IMU prediction at 5-10Hz fills the position output between sparse camera frames.
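
These displacement figures can be reproduced from the camera geometry (a sketch; the ~15.9 cm/px GSD at 600m comes from this project's tile-matching analysis, and linear GSD-altitude scaling for a fixed nadir camera is an assumption):

```python
def gsd_m_per_px(alt_m: float, gsd_at_600m: float = 0.159) -> float:
    # Ground sample distance scales linearly with altitude for a fixed lens
    return gsd_at_600m * alt_m / 600.0

def inter_frame_shift_px(speed_kmh: float, interval_s: float, alt_m: float) -> float:
    displacement_m = speed_kmh / 3.6 * interval_s   # 70 km/h * 1.43 s ≈ 27.8 m
    return displacement_m / gsd_m_per_px(alt_m)

# 600 m -> ~175 px, 800 m -> ~131 px, 1000 m -> ~105 px, matching the text
```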

### Component: Satellite Image Matching (UPDATED runtime + fallback chain)

| Solution | Tools | Advantages | Limitations | Performance (est. Orin Nano Super TRT FP16) | Params | Fit |
|----------|-------|-----------|-------------|----------------------------------------------|--------|-----|
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30=17.86m UAV-VisLoc), 6.31M params, smallest model | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary (if TRT export succeeds AND ≤200ms) |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT repo, 138 stars). Semi-dense. CVPR 2024. High accuracy. | 2.4x more params than LiteSAM. Requires einsum→elementary ops rewrite for TRT (documented in Coarse_LoFTR_TRT paper). | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python (or XFeatTensorRT C++) | Fastest. Proven TRT implementation. Lightweight. | General-purpose, not designed for cross-view satellite-aerial gap (but nadir-nadir gap is small). | Est. ~50-100ms | <5M | ✅ Speed fallback |

**Decision tree (day-one on Orin Nano Super)**:
1. Clone LiteSAM repo → reparameterize MobileOne → `torch.onnx.export()` → `polygraphy inspect`
2. If ONNX export succeeds → `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
3. If MinGRU causes ONNX/TRT failure → rewrite MinGRU forward() as unrolled 9-step loop → retry
4. If rewrite fails or accuracy degrades → **switch to EfficientLoFTR TRT**:
   - Apply Coarse_LoFTR_TRT TRT-adaptation techniques (einsum replacement, etc.)
   - Export to ONNX → trtexec --fp16
   - Benchmark at 640×480 and 1280px
5. Benchmark winner: **if ≤200ms → use it. If >200ms but ≤300ms → acceptable (async on Stream B). If >300ms → use XFeat TRT**
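
The unrolled-loop rewrite in step 3 works because minGRU's recurrence is a simple input-gated blend, h_t = (1 − z_t)·h_{t−1} + z_t·h̃_t, where both the gate z_t and the candidate h̃_t depend only on the input (per the minGRU paper). A scalar Python sketch; the 9-step length and scalar weights are illustrative assumptions, the real LiteSAM layer uses learned matrices:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def min_gru_unrolled(xs, wz, bz, wh, bh, h0=0.0):
    """minGRU recurrence over a FIXED-length sequence (e.g. 9 steps).
    A fixed trip count lets the ONNX exporter unroll the loop into a
    static graph instead of emitting a data-dependent Loop op that
    TRT may not support."""
    h = h0
    for x in xs:
        z = sigmoid(wz * x + bz)          # update gate: input-only, no h dependency
        h_tilde = wh * x + bh             # candidate state: input-only
        h = (1.0 - z) * h + z * h_tilde   # convex blend with previous state
    return h
```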

**EfficientLoFTR TRT adaptation** (from Coarse_LoFTR_TRT paper, proven workflow):
- Replace `torch.einsum()` with elementary ops (view, bmm, reshape, sum)
- Replace any TRT-incompatible high-level PyTorch functions
- Use ONNX export path (less memory required than Torch-TensorRT on 8GB device)
- Knowledge distillation available for further parameter reduction if needed
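
The latency thresholds from step 5 of the decision tree above can be encoded as a small selection helper (a sketch; the return labels are illustrative):

```python
def matcher_decision(benchmark_ms: float) -> str:
    """Apply the day-one benchmark thresholds to a candidate matcher."""
    if benchmark_ms <= 200:
        return "use"               # fits the keyframe budget outright
    if benchmark_ms <= 300:
        return "acceptable-async"  # tolerable because matching runs on Stream B
    return "fall-back-to-xfeat"    # too slow even asynchronously
```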

**Satellite matching cadence**: Keyframes selected from the ADTI VO stream every 5-10 frames (~7-14s at the 0.7 fps sustained rate; down to ~2.5s at higher burst fps settings). At 800-1000m altitude and 14 m/s cruise, this yields 60-97% forward overlap between satellite match frames. Matching runs async on Stream B — does not block VO on Stream A.
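
The forward-overlap range can be checked with simple footprint arithmetic (a sketch; the 577m footprint at 600m altitude comes from the VO analysis, and linear footprint-altitude scaling is assumed):

```python
def footprint_m(alt_m: float, footprint_at_600m: float = 577.0) -> float:
    # Ground footprint of one frame, scaling linearly with altitude
    return footprint_at_600m * alt_m / 600.0

def keyframe_overlap(speed_ms: float, keyframe_interval_s: float, alt_m: float) -> float:
    spacing_m = speed_ms * keyframe_interval_s
    return 1.0 - spacing_m / footprint_m(alt_m)

# 14 m/s, keyframe every 5 frames at 0.7 fps (~7.1 s), 800 m -> ~87% overlap;
# every 10 frames at 1000 m -> ~79% overlap
```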

### Component: Sensor Fusion (UNCHANGED)

ESKF — CPU-based mathematical filter, not affected.

### Component: Flight Controller Integration (UNCHANGED)

pymavlink — not affected by TRT migration.

### Component: Ground Station Telemetry (UNCHANGED)

MAVLink NAMED_VALUE_FLOAT — not affected.

### Component: Startup & Lifecycle (UPDATED)

**Updated startup sequence**:
1. Boot Jetson → start GPS-Denied service (systemd)
2. Connect to flight controller via pymavlink on UART
3. Wait for heartbeat from flight controller
4. **Initialize PyCUDA context**
5. **Load TRT engines**: litesam.engine + xfeat.engine via tensorrt.Runtime.deserialize_cuda_engine()
6. **Allocate GPU I/O buffers** for both models
7. **Create CUDA streams**: Stream A (cuVSLAM), Stream B (satellite matching)
8. Read GLOBAL_POSITION_INT → init ESKF
9. Start cuVSLAM with ADTI 20L V1 camera frames
10. Begin GPS_INPUT output loop at 5-10Hz
11. Preload satellite tiles within ±2km into RAM
12. System ready

**Engine load time**: ~1-3 seconds per engine (deserialization from .engine file). One-time cost at startup.

### Component: Thermal Management (UNCHANGED)

Same adaptive pipeline. TRT engines are slightly more power-efficient than ONNX Runtime, but the difference is within noise.

### Component: Object Localization (UNCHANGED)

Not affected — trigonometric calculation, no AI inference.

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

Fed by ADTI 20L V1 at 0.7 fps sustained. At 70 km/h cruise and 600m altitude, inter-frame displacement is 27.8m (~175px, 4.8% of frame). With pyramid-based LK optical flow (3-4 levels), effective search range is ~150-200px — 175px is within range. At 800-1000m altitude, pixel shift drops to 105-131px. 95%+ overlap between consecutive frames.

### 2. Native TRT Engine Inference (NEW)

All AI models run as pre-compiled TRT FP16 engines:
- Engine files built offline with trtexec (one-time per model version)
- Loaded at startup (~1-3s per engine)
- Inference via context.execute_async_v3() on dedicated CUDA Stream B
- GPU buffers pre-allocated — zero runtime allocation during flight
- No ONNX Runtime dependency — no framework overhead

Memory advantage over ONNX Runtime TRT-EP: ~560-600MB saved (both models combined).
Latency advantage: eliminates ONNX wrapper overhead, guaranteed tensor core utilization.

### 3. CUDA Stream Pipelining (REFINED)

- Stream A: cuVSLAM VO from ADTI 20L V1 (~9ms) + ESKF fusion (~1ms)
- Stream B: TRT engine inference for satellite matching (LiteSAM or XFeat, async, triggered on keyframe from same ADTI stream)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management
- **NEW**: Both cuVSLAM and TRT engines use CUDA streams natively — no framework abstraction layer. Direct GPU scheduling.

### 4-7. (UNCHANGED from draft03)

Keyframe-based satellite matching, TensorRT FP16 optimization, proactive tile loading, 5-10Hz GPS_INPUT output — all unchanged.

## Processing Time Budget

### VO Frame (every ~1430ms from ADTI 20L V1 at 0.7 fps)

| Step | Time | Notes |
|------|------|-------|
| ADTI image transfer | ~5-10ms | Trigger + readout |
| Downsample (CUDA) | ~2ms | To cuVSLAM input resolution |
| cuVSLAM VO+IMU | ~9ms | CUDA Stream A |
| ESKF measurement update | ~1ms | CPU |
| **Total** | **~17-22ms** | Well within 1430ms budget |

Between camera frames, ESKF IMU prediction runs at 5-10Hz to maintain continuous GPS_INPUT output. The ~1.4s gap between frames is bridged entirely by IMU integration.

### Keyframe Satellite Matching (every 5-10 camera frames, async CUDA Stream B)

**Path A — LiteSAM TRT Engine FP16 at 1280px**:

| Step | Time | Notes |
|------|------|-------|
| Image already in GPU (from VO) | ~0ms | Same frame used for VO and matching |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| Copy input to GPU buffer | <0.5ms | PyCUDA memcpy_htod_async |
| LiteSAM TRT Engine FP16 | ≤200ms | context.execute_async_v3(stream_B) |
| Copy output from GPU | <0.5ms | PyCUDA memcpy_dtoh_async |
| Geometric pose (RANSAC) | ~5ms | Homography |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async on Stream B, does not block VO |

**Path B — XFeat TRT Engine FP16**:

| Step | Time | Notes |
|------|------|-------|
| XFeat TRT Engine inference | ~50-80ms | context.execute_async_v3(stream_B) |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async on Stream B |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory (Native TRT) | Memory (ONNX RT TRT-EP) | Notes |
|-----------|---------------------|--------------------------|-------|
| OS + runtime | ~1.5GB | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | ~200-500MB | CUDA library + map |
| **LiteSAM TRT engine** | **~50-80MB** | **~330-360MB** | Native TRT vs TRT-EP. If LiteSAM fails: EfficientLoFTR ~100-150MB |
| **XFeat TRT engine** | **~30-50MB** | **~310-330MB** | Native TRT vs TRT-EP |
| Preloaded satellite tiles | ~200MB | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | ~20MB | |
| FastAPI (local IPC) | ~50MB | ~50MB | |
| ESKF + buffers | ~10MB | ~10MB | |
| ONNX Runtime framework | **0MB** | **~150MB** | Eliminated with native TRT |
| **Total** | **~2.1-2.9GB** | **~2.8-3.6GB** | |
| **% of 8GB** | **26-36%** | **35-45%** | |
| **Savings** | — | — | **~700MB saved with native TRT** |
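
The headline saving can be reproduced from the table's per-component deltas (midpoint arithmetic; a sketch using only the ranges above):

```python
# (native TRT MB range, ONNX RT TRT-EP MB range) from the table above
LITESAM = ((50, 80), (330, 360))
XFEAT = ((30, 50), (310, 330))
FRAMEWORK = ((0, 0), (150, 150))   # ONNX Runtime framework itself

def midpoint(rng):
    return sum(rng) / 2

saving_mb = sum(midpoint(ep) - midpoint(native)
                for native, ep in (LITESAM, XFEAT, FRAMEWORK))
# -> 280 + 280 + 150 = 710 MB, consistent with the "~700MB saved" row
```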

## Confidence Scoring → GPS_INPUT Mapping

Unchanged from draft03.

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **LiteSAM MinGRU ops unsupported in TRT 10.3** | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification: ONNX export → polygraphy → trtexec. If MinGRU fails: (1) rewrite as unrolled 9-step loop, (2) if still fails: **switch to EfficientLoFTR TRT** (proven TRT path, Coarse_LoFTR_TRT, 15.05M params). XFeat TRT as speed fallback. |
| **TRT engine build OOM on 8GB Jetson** | LOW | Cannot build engines on target device | Our models are small (6.31M LiteSAM, <5M XFeat). OOM unlikely. If occurs: reduce --memPoolSize, or build on identical Orin Nano module with more headroom |
| **Engine incompatibility after JetPack update** | MEDIUM | Must rebuild engines | Include engine rebuild in JetPack update procedure. Takes minutes per model. |
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink | Unchanged from draft03 |
| **cuVSLAM fails on low-texture terrain** | HIGH | Frequent tracking loss | ADTI at 0.7 fps means 27.8m inter-frame displacement at 70 km/h. At 600m+ altitude, pixel shift is 105-175px with 95%+ overlap — within pyramid-assisted LK range. HIGH risk remains over completely uniform terrain (single crop covering 577m+ footprint). IMU bridging + satellite matching corrections bound drift. |
| **Thermal throttling** | MEDIUM | Satellite matching budget blown | Unchanged from draft03 |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use fallback matcher | Day-one benchmark. Fallback chain: EfficientLoFTR TRT (if ≤300ms) → XFeat TRT (if all >300ms) |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Unchanged from draft03 |
| **AUW exceeds AT4125 recommended range** | MEDIUM | Reduced endurance, motor thermal stress | 12.5 kg AUW vs 8-10 kg recommended. Monitor motor temps. Consider weight reduction (lighter gimbal, single battery for shorter missions). |
| **cuVSLAM at 0.7 fps — inter-frame displacement** | MEDIUM | VO tracking loss on uniform terrain | At 0.7 fps and 70 km/h: ~175px displacement at 600m (4.8% of frame, 95.2% overlap). Within pyramid-assisted LK range (150-200px). At 800m+: drops to 105-131px. Mitigations: (1) cuVSLAM IMU integrator bridges 1.43s frame gaps, (2) ESKF IMU prediction at 5-10Hz fills position gaps, (3) satellite matching corrections every 5-10 frames bound drift. |
| **ADTI mechanical shutter lifespan** | MEDIUM | Shutter replacement needed periodically | At 0.7 fps sustained over 3.5h flights: ~8,800 actuations/flight. Shutter life unknown for 20L (102PRO is 500K, entry-level likely 100-150K). Estimated 11-57 flights before replacement. Budget for shutter replacement as consumable. |
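
The shutter-life estimate in the last row follows from straightforward actuation arithmetic (a check; the 100K-500K actuation ratings are the document's own assumptions about the unpublished 20L rating):

```python
FPS = 0.7
FLIGHT_HOURS = 3.5

actuations_per_flight = FPS * 3600 * FLIGHT_HOURS   # 8820 shutter actuations

def flights_until_replacement(rated_actuations: int) -> int:
    return int(rated_actuations / actuations_per_flight)

# 100K rating -> ~11 flights; 500K rating -> ~56-57 flights
```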

## Testing Strategy

### Integration / Functional Tests

All tests from draft03 unchanged, plus:
- **TRT engine load test**: Verify litesam.engine and xfeat.engine load successfully on Jetson Orin Nano Super
- **TRT inference correctness**: Compare TRT engine output vs PyTorch reference output (max L1 error < 0.01)
- **CUDA Stream B pipelining**: Verify satellite matching on Stream B does not block cuVSLAM on Stream A
- **Engine pre-built validation**: Verify engine files from offline preparation work without rebuild at runtime
- **ADTI 20L V1 sustained capture rate**: Verify camera sustains 0.7 fps in JPEG mode over extended periods (>30 min) without buffer overflow or overheating. Also test 1.0 fps to determine if a higher sustained rate is achievable.
- **ADTI trigger timing**: Verify camera trigger and image transfer pipeline delivers frames to cuVSLAM within acceptable latency (<50ms from trigger to GPU buffer)

### Non-Functional Tests

All tests from draft03 unchanged, plus:
- **TRT engine build time**: Measure trtexec build time for LiteSAM and XFeat on Orin Nano Super (expected: 1-5 minutes each)
- **TRT engine load time**: Measure deserialization time (expected: 1-3 seconds each)
- **Memory comparison**: Measure actual GPU memory with native TRT vs ONNX RT TRT-EP for both models
- **MinGRU TRT compatibility** (day-one blocker):
  1. Clone LiteSAM repo, load pretrained weights
  2. Reparameterize MobileOne backbone
  3. `torch.onnx.export(model, dummy, "litesam.onnx", opset_version=17)`
  4. `polygraphy inspect model litesam.onnx` — check for unsupported ops
  5. `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
  6. If step 3 or 5 fails on MinGRU: rewrite MinGRU forward() as unrolled loop, retry
  7. If still fails: switch to EfficientLoFTR, apply Coarse_LoFTR_TRT adaptation
  8. Compare TRT output vs PyTorch reference (max L1 error < 0.01)
- **EfficientLoFTR TRT fallback benchmark** (if LiteSAM fails): apply TRT adaptation from Coarse_LoFTR_TRT → ONNX → trtexec → measure latency at 640×480 and 1280px
- **Tensor core utilization**: Verify with NSight that TRT engines use tensor cores (unlike ONNX RT CUDA EP)
- **Flight endurance validation**: Ground-test full system power draw (propulsion + electronics) against 267W estimate. Verify ~3.4h endurance target.
- **cuVSLAM at 0.7 fps**: Benchmark VO tracking quality, drift rate, and tracking loss frequency at 0.7 fps with ADTI 20L V1. Measure IMU integrator effectiveness for bridging 1.43s inter-frame gaps. Test at 600m and 800m altitude, over both textured and low-texture terrain.
- **ADTI shutter durability**: Track shutter actuation count across flights. Monitor for shutter failure symptoms (missed frames, inconsistent exposure).

## References

- ONNX Runtime Issue #24085 (Jetson Orin Nano tensor core bug): https://github.com/microsoft/onnxruntime/issues/24085
- ONNX Runtime Issue #20457 (TRT-EP memory overhead): https://github.com/microsoft/onnxruntime/issues/20457
- ONNX Runtime Issue #12083 (TRT-EP vs native TRT): https://github.com/microsoft/onnxruntime/issues/12083
- NVIDIA TensorRT 10 Python API: https://docs.nvidia.com/deeplearning/tensorrt/10.15.1/inference-library/python-api-docs.html
- TensorRT Best Practices: https://docs.nvidia.com/deeplearning/tensorrt/latest/performance/best-practices.html
- TensorRT engine hardware specificity: https://github.com/NVIDIA/TensorRT/issues/1920
- trtexec ONNX conversion: https://nvidia-jetson.piveral.com/jetson-orin-nano/how-to-convert-onnx-to-engine-on-jetson-orin-nano-dev-board/
- Torch-TensorRT JetPack 6.2: https://docs.pytorch.org/TensorRT/v2.10.0/getting_started/jetpack.html
- XFeatTensorRT: https://github.com/PranavNedunghat/XFeatTensorRT
- JetPack 6.2 Release Notes: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/index.html
- Jetson Orin Nano Super: https://developer.nvidia.com/blog/nvidia-jetson-orin-nano-developer-kit-gets-a-super-boost/
- DLA on Jetson Orin: https://developer.nvidia.com/blog/maximizing-deep-learning-performance-on-nvidia-jetson-orin-with-dla/
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR HuggingFace: https://huggingface.co/docs/transformers/en/model_doc/efficientloftr
- Coarse_LoFTR_TRT (TRT for embedded): https://github.com/Kolkir/Coarse_LoFTR_TRT
- Coarse_LoFTR_TRT paper: https://ar5iv.labs.arxiv.org/html/2202.00770
- LoFTR_TRT: https://github.com/Kolkir/LoFTR_TRT
- minGRU ("Were RNNs All We Needed?"): https://huggingface.co/papers/2410.01201
- minGRU PyTorch implementation: https://github.com/lucidrains/minGRU-pytorch
- LiteSAM paper (MinGRU details, Eqs 12-16): https://www.mdpi.com/2072-4292/17/19/3349
- DALGlue (UAV feature matching, 2025): https://www.nature.com/articles/s41598-025-21602-5
- ADTI 20L V1 specs: https://unmannedrc.com/products/adti-20l-v1-mapping-camera
- ADTI 20L V1 user manual: https://docs.adti.camera/adti-20l-and-24l-v1-quick-start-guide/
- T-Motor AT4125 KV540: https://uav-en.tmotor.com/2019/Motors_0429/247.html
- VANT Semi-Solid State 6S 30Ah battery: https://www.xtbattery.com/370wh/kg-42v-high-energy-density-6s-12s-14s-18s-30ah-semi-solid-state-drone-battery/
- All references from solution_draft03.md

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts (this assessment): `_docs/00_research/trt_engine_migration/`
- Previous research: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
- Previous draft: `_docs/01_solution/solution_draft04.md`

@@ -1,622 +0,0 @@
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
| ------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ESKF described as "16-state vector, ~10MB" with no mathematical specification | **Functional**: No state vector, no process model (F,Q), no measurement models (H for VO, H for satellite), no noise parameters, no scale observability analysis. Impossible to implement or validate accuracy claims. | **Define complete ESKF specification**: 15-state error vector, IMU-driven prediction, dual measurement models (VO relative pose, satellite absolute position), initial Q/R values, scale constraint via altitude + satellite corrections. |
| GPS_INPUT at 5-10Hz via pymavlink — no field mapping | **Functional**: GPS_INPUT requires 15+ fields (velocity, accuracy, hdop, fix_type, GPS time). No specification of how ESKF state maps to these fields. ArduPilot requires minimum 5Hz. | **Define GPS_INPUT population spec**: velocity from ESKF, accuracy from covariance, fix_type from confidence tier, GPS time from system clock conversion, synthesized hdop/vdop. |
| Confidence scoring "unchanged from draft03" — not in draft05 | **Functional**: Draft05 is supposed to be self-contained. Confidence scoring determines GPS_INPUT accuracy fields and fix_type — directly affects how ArduPilot EKF weights the position data. | **Define confidence scoring inline**: 3 tiers (satellite-anchored, VO-tracked, IMU-only) mapping to fix_type + accuracy values. |
| Coordinate transformations not defined | **Functional**: No pixel→camera→body→NED→WGS84 chain. Camera is not autostabilized, so body attitude matters. Satellite match → WGS84 conversion undefined. Object localization impossible without these transforms. | **Define coordinate transformation chain**: camera intrinsics K, camera-to-body extrinsic T_cam_body, body-to-NED from ESKF attitude, NED origin at mission start point. |
| Disconnected route segments — "satellite re-localization" mentioned but no algorithm | **Functional**: AC requires handling as "core to the system." Multiple disconnected segments expected. No tracking-loss detection, no re-localization trigger, no ESKF re-initialization, no cuVSLAM restart procedure. | **Define re-localization pipeline**: detect cuVSLAM tracking loss → IMU-only ESKF prediction → trigger satellite match on every frame → on match success: ESKF position reset + cuVSLAM restart → on 3 consecutive failures: operator re-localization request. |
| No startup handoff from GPS to GPS-denied | **Functional**: System reads GLOBAL_POSITION_INT at startup but no protocol for when GPS is lost/spoofed vs system start. No validation of initial position. | **Define handoff protocol**: system runs continuously, FC receives both real GPS and GPS_INPUT. GPS-denied system always provides its estimate; FC selects best source. Initial position validated against first satellite match. |
| No mid-flight reboot recovery | **Functional**: AC requires: "re-initialize from flight controller's current IMU-extrapolated position." No procedure defined. Recovery time estimation missing. | **Define reboot recovery sequence**: read FC position → init ESKF with high uncertainty → load TRT engines → start cuVSLAM → immediate satellite match. Estimated recovery: ~35-70s. Document as known limitation. |
| 3-consecutive-failure re-localization request undefined | **Functional**: AC requires ground station re-localization request. No message format, no operator workflow, no system behavior while waiting. | **Define re-localization protocol**: detect 3 failures → send custom MAVLink message with last known position + uncertainty → operator provides approximate coordinates → system uses as ESKF measurement with high covariance. |
| Object localization — "trigonometric calculation" with no details | **Functional**: No math, no API, no Viewpro gimbal integration, no accuracy propagation. Other onboard systems cannot use this component as specified. | **Define object localization**: pixel→ray using Viewpro intrinsics + gimbal angles → body frame → NED → ray-ground intersection → WGS84. FastAPI endpoint: POST /objects/locate. Accuracy propagated from UAV position + gimbal uncertainty. |
| Satellite matching — GSD normalization and tile selection unspecified | **Functional**: Camera GSD ~15.9 cm/px at 600m vs satellite ~0.3 m/px at zoom 19. The "pre-resize" step is mentioned but not specified. Tile selection radius based on ESKF uncertainty not defined. | **Define GSD handling**: downsample camera frame to match satellite GSD. Define tile selection: ESKF position ± 3σ_horizontal → select tiles covering that area. Assemble tile mosaic for matching. |
| Satellite tile storage requirements not calculated | **Functional**: "±2km" preload mentioned but no storage estimate. At zoom 19: a 200km path with ±2km buffer requires ~130K tiles (~2.5GB). | **Calculate tile storage**: specify zoom level (18 preferred — 0.6m/px, 4× fewer tiles), estimate storage per mission profile, define maximum mission area by storage limit. |
| FastAPI endpoints not in solution draft | **Functional**: Endpoints only in security_analysis.md. No request/response schemas. No SSE event format. No object localization endpoint. | **Consolidate API spec in solution**: define all endpoints, SSE event schema, object localization endpoint. Reference security_analysis.md for auth. |
| cuVSLAM configuration missing (calibration, IMU params, mode) | **Functional**: No camera calibration procedure, no IMU noise parameters, no T_imu_rig extrinsic, no mode selection (Mono vs Inertial). | **Define cuVSLAM configuration**: use Inertial mode, specify required calibration data (camera intrinsics, distortion, IMU noise params from datasheet, T_imu_rig from physical measurement), define calibration procedure. |
| tech_stack.md inconsistent with draft05 | **Functional**: tech_stack.md says 3fps (should be 0.7fps), LiteSAM at 480px (should be 1280px), missing EfficientLoFTR. | **Flag for update**: tech_stack.md must be synchronized with draft05 corrections. Not addressed in this draft — separate task. |
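
The ~130K-tile figure in the storage row above can be reproduced with standard Web Mercator tile math (a sketch; equatorial latitude and a uniform rectangular corridor are simplifying assumptions):

```python
import math

def tile_edge_m(zoom: int, lat_deg: float = 0.0) -> float:
    """Ground size of one 256px Web Mercator tile at a given zoom level."""
    m_per_px = 156543.03392 * math.cos(math.radians(lat_deg)) / 2 ** zoom
    return 256 * m_per_px

def corridor_tiles(path_km: float, buffer_km: float, zoom: int) -> float:
    """Tile count for a flight corridor of path_km with +/- buffer_km width."""
    area_m2 = path_km * 1000 * (2 * buffer_km * 1000)
    return area_m2 / tile_edge_m(zoom) ** 2

# 200 km path, +/-2 km buffer: ~137K tiles at zoom 19; zoom 18 needs 4x fewer
```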
|
||||
|
||||
|
||||
## Overall Maturity Assessment

| Category | Maturity (1-5) | Assessment |
| ----------------------------- | -------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| Hardware & Platform Selection | 3.5 | UAV airframe, cameras, Jetson, batteries — well-researched with specs, weight budget, endurance calculations. Ready for procurement. |
| Core Algorithm Selection | 3.0 | cuVSLAM, LiteSAM/XFeat, ESKF — components selected with comparison tables, fallback chains, decision trees. Day-one benchmarks defined. |
| AI Inference Runtime | 3.5 | TRT Engine migration thoroughly analyzed. Conversion workflows, memory savings, performance estimates. Code wrapper provided. |
| Sensor Fusion (ESKF) | 1.5 | Mentioned but not specified. No implementable detail. Blocker for coding. |
| System Integration | 1.5 | GPS_INPUT, coordinate transforms, inter-component data flow — all under-specified. |
| Edge Cases & Resilience | 1.0 | Disconnected segments, reboot recovery, re-localization — acknowledged but no algorithms. |
| Operational Readiness | 0.5 | No pre-flight procedures, no in-flight monitoring, no failure response. |
| Security | 3.0 | Comprehensive threat model, OP-TEE analysis, LUKS, secure boot. Well-researched. |
| **Overall TRL** | **~2.5** | **Technology concept formulated + some component validation. Not implementation-ready.** |

The solution is at approximately **TRL 3** (proof of concept) for hardware/algorithm selection and **TRL 1-2** (basic concept) for system integration, ESKF, and operational procedures.

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses native TensorRT Engine files. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.

Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM in Inertial mode) from the ADTI 20L V1 at 0.7 fps sustained, (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16) using keyframes from the same ADTI image stream, and (3) IMU data from the flight controller, fused in an ESKF. The Viewpro A40 Pro is reserved for AI object detection only.

The ESKF is the central state estimator with a 15-state error vector. It fuses:

- **IMU prediction** at 5-10Hz (high-frequency pose propagation)
- **cuVSLAM VO measurement** at 0.7Hz (relative pose correction)
- **Satellite matching measurement** at ~0.07-0.14Hz (absolute position correction)

GPS_INPUT messages carry position, velocity, and accuracy derived from the ESKF state and covariance.

**Hard constraint**: the ADTI 20L V1 shoots at 0.7 fps sustained (1430ms interval). The full VO+ESKF pipeline must complete within 400ms per frame. Satellite matching runs async on keyframes (every 5-10 camera frames). GPS_INPUT is emitted at 5-10Hz (ESKF IMU prediction fills the gaps between camera frames).

```
┌─────────────────────────────────────────────────────────────────────┐
│                        OFFLINE (Before Flight)                      │
│  1. Satellite Tiles → Download & Validate → Pre-resize → Store      │
│     (Google Maps)    (≥0.5m/px, <2yr)      (matcher res) (GeoHash)  │
│  2. TRT Engine Build (one-time per model version):                  │
│     PyTorch model → reparameterize → ONNX export → trtexec --fp16   │
│     Output: litesam.engine, xfeat.engine                            │
│  3. Camera + IMU calibration (one-time per hardware unit)           │
│  4. Copy tiles + engines + calibration to Jetson storage            │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        ONLINE (During Flight)                       │
│                                                                     │
│  STARTUP:                                                           │
│  1. pymavlink → read GLOBAL_POSITION_INT → init ESKF state          │
│  2. Load TRT engines + allocate GPU buffers                         │
│  3. Load camera calibration + IMU calibration                       │
│  4. Start cuVSLAM (Inertial mode) with ADTI 20L V1                  │
│  5. Preload satellite tiles ±2km into RAM                           │
│  6. First satellite match → validate initial position               │
│  7. Begin GPS_INPUT output loop at 5-10Hz                           │
│                                                                     │
│  EVERY CAMERA FRAME (0.7fps from ADTI 20L V1):                      │
│  ┌──────────────────────────────────────┐                           │
│  │ ADTI 20L V1 → Downsample (CUDA)      │                           │
│  │ → cuVSLAM VO+IMU (~9ms)              │  ← CUDA Stream A          │
│  │ → ESKF VO measurement                │                           │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  5-10Hz CONTINUOUS (IMU-driven between camera frames):              │
│  ┌──────────────────────────────────────┐                           │
│  │ IMU data → ESKF prediction           │                           │
│  │ ESKF state → GPS_INPUT fields        │                           │
│  │ GPS_INPUT → Flight Controller (UART) │                           │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  KEYFRAMES (every 5-10 camera frames, async):                       │
│  ┌──────────────────────────────────────┐                           │
│  │ Camera frame → GSD downsample        │                           │
│  │ Select satellite tile (ESKF pos±3σ)  │                           │
│  │ TRT inference (Stream B): LiteSAM/   │                           │
│  │ XFeat → correspondences              │                           │
│  │ RANSAC → homography → WGS84 position │                           │
│  │ ESKF satellite measurement update    │──→ Position correction    │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  TRACKING LOSS (cuVSLAM fails — sharp turn / featureless):          │
│  ┌──────────────────────────────────────┐                           │
│  │ ESKF → IMU-only prediction (growing  │                           │
│  │ uncertainty)                         │                           │
│  │ Satellite match on EVERY frame       │                           │
│  │ On match success → ESKF reset +      │                           │
│  │ cuVSLAM restart                      │                           │
│  │ 3 consecutive failures → operator    │                           │
│  │ re-localization request              │                           │
│  └──────────────────────────────────────┘                           │
│                                                                     │
│  TELEMETRY (1Hz):                                                   │
│  ┌──────────────────────────────────────┐                           │
│  │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station         │
│  └──────────────────────────────────────┘                           │
└─────────────────────────────────────────────────────────────────────┘
```

## Architecture

### Component: ESKF Sensor Fusion (NEW — previously unspecified)

**Error-State Kalman Filter** fusing IMU, visual odometry, and satellite matching.

**Nominal state vector** (propagated by IMU):

| State | Symbol | Size | Description |
| ---------- | ------ | ---- | ------------------------------------------------ |
| Position | p | 3 | NED position relative to mission origin (meters) |
| Velocity | v | 3 | NED velocity (m/s) |
| Attitude | q | 4 | Unit quaternion (body-to-NED rotation) |
| Accel bias | b_a | 3 | Accelerometer bias (m/s²) |
| Gyro bias | b_g | 3 | Gyroscope bias (rad/s) |

**Error-state vector** (estimated by ESKF): δx = [δp, δv, δθ, δb_a, δb_g]ᵀ ∈ ℝ¹⁵, where δθ ∈ so(3) is the 3D rotation error.

**Prediction step** (IMU at 5-10Hz from flight controller):

- Input: accelerometer a_m, gyroscope ω_m, dt
- Propagate nominal state: p += v·dt, v += (R(q)·(a_m − b_a) + g)·dt with gravity g = [0, 0, 9.81]ᵀ in NED, q ← q ⊗ Exp((ω_m − b_g)·dt)
- Propagate error covariance: P = F·P·Fᵀ + Q
- F is the 15×15 error-state transition matrix (standard ESKF formulation)
- Q: process noise diagonal, initial values from IMU datasheet noise densities

**VO measurement update** (0.7Hz from cuVSLAM):

- cuVSLAM outputs relative pose: ΔR, Δt (camera frame)
- Transform to NED: Δp_ned = R_body_ned · T_cam_body · Δt
- Innovation: z = Δp_ned_measured − Δp_ned_predicted
- Observation matrix H_vo maps the error state to the relative position change
- R_vo: measurement noise, initially ~0.1-0.5m (from cuVSLAM precision at 600m+ altitude)
- Kalman update: K = P·Hᵀ·(H·P·Hᵀ + R)⁻¹, δx = K·z, P = (I − K·H)·P

**Satellite measurement update** (0.07-0.14Hz, async):

- Satellite matching outputs absolute position: lat_sat, lon_sat in WGS84
- Convert to NED relative to mission origin
- Innovation: z = p_satellite − p_predicted
- H_sat = [I₃, 0, 0, 0, 0] (directly observes position)
- R_sat: measurement noise, from matching confidence (~5-20m based on RANSAC inlier ratio)
- Provides absolute position correction — bounds drift accumulation

**Scale observability**:

- Monocular cuVSLAM has scale ambiguity during constant-velocity flight
- Scale is constrained by: (1) satellite matching absolute positions (primary), (2) known flight altitude from barometer + predefined mission altitude, (3) IMU accelerometer during maneuvers
- During long straight segments without satellite correction, scale drift is possible. Satellite corrections every ~7-14s re-anchor scale.

**Tuning approach**: start with IMU datasheet noise values for Q and conservative (high) R values, then tune on flight test data by comparing ESKF output to known GPS ground truth.

| Solution | Tools | Advantages | Limitations | Performance | Fit |
| -------------------------- | --------------- | ------------------------------------------------------------- | -------------------------------------- | ------------- | ----------- |
| Custom ESKF (Python/NumPy) | NumPy, SciPy | Full control, minimal dependencies, well-understood algorithm | Implementation effort, tuning required | <1ms per step | ✅ Selected |
| FilterPy ESKF | FilterPy v1.4.5 | Reference implementation, less code | Less flexible for multi-rate fusion | <1ms per step | ⚠️ Fallback |

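The prediction and satellite-update equations above can be sketched with NumPy. This is a hedged illustration, not flight code: the simplified F keeps only the position-velocity coupling and omits the attitude and bias terms of the full 15×15 Jacobian.

```python
import numpy as np

N = 15  # error states: [δp(3), δv(3), δθ(3), δb_a(3), δb_g(3)]

def predict(P, Q, dt):
    # Covariance propagation P = F·P·Fᵀ + Q·dt. Simplified F: position
    # error driven by velocity error only (full ESKF F also couples
    # attitude and bias errors).
    F = np.eye(N)
    F[0:3, 3:6] = np.eye(3) * dt
    return F @ P @ F.T + Q * dt

def satellite_update(P, dx, p_pred_ned, p_sat_ned, sigma_sat):
    # Absolute-position update: H = [I₃ 0 ... 0], R from match confidence.
    H = np.zeros((3, N))
    H[:, 0:3] = np.eye(3)
    R = np.eye(3) * sigma_sat ** 2
    z = p_sat_ned - p_pred_ned                      # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                  # Kalman gain
    dx = dx + K @ (z - H @ dx)
    P = (np.eye(N) - K @ H) @ P
    return dx, P

# 10 s of IMU-only prediction at 10 Hz, then one satellite fix (σ = 10 m)
P, Q = np.eye(N), np.eye(N) * 0.01
for _ in range(100):
    P = predict(P, Q, 0.1)
P_prior = P[0, 0]                                   # grown horizontal variance
dx, P = satellite_update(P, np.zeros(N), np.zeros(3),
                         np.array([5.0, -3.0, 0.0]), sigma_sat=10.0)
```

The toy run shows the qualitative behavior the text describes: horizontal variance grows during IMU-only prediction and collapses toward the satellite fix on update.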
### Component: Coordinate System & Transformations (NEW — previously undefined)

**Reference frames**:

- **Camera frame (C)**: origin at camera optical center, Z forward, X right, Y down (OpenCV convention)
- **Body frame (B)**: origin at UAV CG, X forward (nose), Y right (starboard), Z down
- **NED frame (N)**: North-East-Down, origin at mission start point
- **WGS84**: latitude, longitude, altitude (output format)

**Transformation chain**:

1. **Pixel → Camera ray**: p_cam = K⁻¹ · [u, v, 1]ᵀ where K = camera intrinsic matrix (ADTI 20L V1: fx, fy from 16mm lens + APS-C sensor)
2. **Camera → Body**: p_body = T_cam_body · p_cam where T_cam_body is the fixed mounting rotation of the nadir-pointing camera
3. **Body → NED**: p_ned = R_body_ned(q) · p_body where q is the ESKF quaternion attitude estimate
4. **NED → WGS84**: lat = lat_origin + (p_north / R_earth)·(180/π), lon = lon_origin + (p_east / (R_earth · cos(lat_origin)))·(180/π), where (lat_origin, lon_origin) is the mission start GPS position in degrees

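Step 4 is the small-angle NED ↔ WGS84 conversion. A sketch under a spherical-Earth approximation, adequate within a few tens of kilometers of the mission origin:

```python
import math

R_EARTH = 6_378_137.0  # WGS84 equatorial radius, meters

def ned_to_wgs84(p_north, p_east, p_down, lat0_deg, lon0_deg, alt0_m):
    # Small-angle conversion around the mission origin (step 4 above)
    lat = lat0_deg + math.degrees(p_north / R_EARTH)
    lon = lon0_deg + math.degrees(
        p_east / (R_EARTH * math.cos(math.radians(lat0_deg))))
    return lat, lon, alt0_m - p_down

def wgs84_to_ned(lat_deg, lon_deg, alt_m, lat0_deg, lon0_deg, alt0_m):
    # Inverse mapping, used e.g. to convert satellite fixes into ESKF NED
    p_north = math.radians(lat_deg - lat0_deg) * R_EARTH
    p_east = (math.radians(lon_deg - lon0_deg)
              * R_EARTH * math.cos(math.radians(lat0_deg)))
    return p_north, p_east, alt0_m - alt_m

# Round trip: 1 km north, 500 m east, 600 m above a 48°N origin
lat, lon, alt = ned_to_wgs84(1000.0, 500.0, -600.0, 48.0, 37.5, 0.0)
n, e, d = wgs84_to_ned(lat, lon, alt, 48.0, 37.5, 0.0)
```

The round trip recovers the NED offsets to sub-millimeter, which also matches the <0.1 m round-trip criterion in the testing strategy.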
**Camera intrinsic matrix K** (ADTI 20L V1 + 16mm lens):

- Sensor: 23.2 × 15.4 mm, Resolution: 5456 × 3632
- fx = fy = focal_mm × width_px / sensor_width_mm = 16 × 5456 / 23.2 = 3763 pixels
- cx = 2728, cy = 1816 (sensor center)
- Distortion: Brown model (k1, k2, p1, p2 from calibration)

**T_cam_body** (camera mount):

- Navigation camera is fixed, pointing nadir (downward), not autostabilized
- R_cam_body: the camera Z axis (optical axis) aligns with body +Z, i.e. pointing down; the roll about the optical axis depends on mount orientation and is fixed at calibration
- Translation: offset from CG to camera mount (measured during assembly, typically <0.3m)

**Satellite match → WGS84**:

- Feature correspondences between camera frame and geo-referenced satellite tile
- Homography H maps camera pixels to satellite tile pixels
- Satellite tile pixel → WGS84 via the tile's known georeference (zoom level + tile x,y → lat,lon)
- Camera center projects to satellite pixel (cx_sat, cy_sat) via H
- Convert (cx_sat, cy_sat) to WGS84 using tile georeference

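The tile georeference in the last step uses standard Web Mercator (slippy-map) tile math; a sketch:

```python
import math

def tile_pixel_to_wgs84(zoom, tile_x, tile_y, px, py, tile_size=256):
    # Map a pixel inside tile (zoom, x, y) to WGS84 (standard slippy-map math)
    n = 2 ** zoom
    x = (tile_x + px / tile_size) / n
    y = (tile_y + py / tile_size) / n
    lon = x * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y))))
    return lat, lon

def wgs84_to_tile(zoom, lat_deg, lon_deg):
    # Inverse: which tile contains a WGS84 point (used for tile selection)
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2 * n)
    return x, y

tx, ty = wgs84_to_tile(18, 48.0, 37.5)
lat, lon = tile_pixel_to_wgs84(18, tx, ty, 0, 0)   # tile's NW corner
```

At zoom 18 a tile spans ~0.0014° of longitude, so the corner of the containing tile lands within a few hundred meters of the query point.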
### Component: GPS_INPUT Message Population (NEW — previously undefined)

| GPS_INPUT Field | Source | Computation |
| ----------------------- | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| lat, lon | ESKF position (NED) | NED → WGS84 conversion using mission origin |
| alt | ESKF position (Down) + mission origin altitude | alt = alt_origin - p_down |
| vn, ve, vd | ESKF velocity state | Direct from ESKF v[0], v[1], v[2] |
| fix_type | Confidence tier | 3 (3D fix) when satellite-anchored (last match <30s). 2 (2D) when VO-only. 0 (no fix) when IMU-only >5s |
| hdop | ESKF horizontal covariance | hdop = sqrt(P[0,0] + P[1,1]) / 5.0 (approximate CEP→HDOP mapping) |
| vdop | ESKF vertical covariance | vdop = sqrt(P[2,2]) / 5.0 |
| horiz_accuracy | ESKF horizontal covariance | horiz_accuracy = sqrt(P[0,0] + P[1,1]) meters |
| vert_accuracy | ESKF vertical covariance | vert_accuracy = sqrt(P[2,2]) meters |
| speed_accuracy | ESKF velocity covariance | speed_accuracy = sqrt(P[3,3] + P[4,4]) m/s |
| time_week, time_week_ms | System time | Convert Unix time to GPS time (GPS epoch = 1980-01-06; GPS runs ahead of UTC, so add the leap-second offset — 18 s since 2017) |
| satellites_visible | Constant | 10 (synthetic — prevents satellite-count failsafes in ArduPilot) |
| gps_id | Constant | 0 |
| ignore_flags | Constant | 0 (provide all fields) |

**Confidence tiers** mapping to GPS_INPUT:

| Tier | Condition | fix_type | horiz_accuracy | Rationale |
| ------ | ------------------------------------------------- | ---------- | ------------------------------- | -------------------------------------- |
| HIGH | Satellite match <30s ago, ESKF covariance < 400m² | 3 (3D fix) | From ESKF P (typically 5-20m) | Absolute position anchor recent |
| MEDIUM | cuVSLAM tracking OK, no recent satellite match | 3 (3D fix) | From ESKF P (typically 20-50m) | Relative tracking valid, drift growing |
| LOW | cuVSLAM lost, IMU-only | 2 (2D fix) | From ESKF P (50-200m+, growing) | Only IMU dead reckoning, rapid drift |
| FAILED | 3+ consecutive total failures | 0 (no fix) | 999.0 | System cannot determine position |

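The field mapping above can be sketched as a pure function from ESKF state to GPS_INPUT fields. Field names follow the MAVLink GPS_INPUT (#232) message; the actual transmission would hand these values to pymavlink's `gps_input_send`, and the fixed HIGH-tier `fix_type` here is a simplification of the tier table.

```python
import math

GPS_UNIX_EPOCH_OFFSET = 315_964_800   # 1980-01-06 in Unix seconds
GPS_UTC_LEAP_SECONDS = 18             # UTC → GPS offset since 2017

def gps_week_time(unix_s):
    # GPS runs ahead of UTC: add the leap-second offset, then split into weeks
    gps_s = unix_s - GPS_UNIX_EPOCH_OFFSET + GPS_UTC_LEAP_SECONDS
    return int(gps_s // 604_800), int((gps_s % 604_800) * 1000)

def eskf_to_gps_input(lat_deg, lon_deg, alt_m, v_ned, P, unix_s):
    # P is the ESKF error covariance; indices follow [p(3), v(3), ...]
    week, week_ms = gps_week_time(unix_s)
    h_acc = math.sqrt(P[0][0] + P[1][1])
    v_acc = math.sqrt(P[2][2])
    return {
        "time_week": week, "time_week_ms": week_ms,
        "fix_type": 3,                                  # HIGH tier assumed
        "lat": int(round(lat_deg * 1e7)),               # degE7 per MAVLink
        "lon": int(round(lon_deg * 1e7)),
        "alt": alt_m,
        "vn": v_ned[0], "ve": v_ned[1], "vd": v_ned[2],
        "hdop": h_acc / 5.0, "vdop": v_acc / 5.0,       # CEP→DOP approximation
        "horiz_accuracy": h_acc, "vert_accuracy": v_acc,
        "speed_accuracy": math.sqrt(P[3][3] + P[4][4]),
        "satellites_visible": 10, "gps_id": 0, "ignore_flags": 0,
    }
```
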
### Component: Disconnected Route Segment Handling (NEW — previously undefined)

**Trigger**: cuVSLAM reports tracking_lost OR tracking confidence drops below threshold

**Algorithm**:

```
STATE: TRACKING_NORMAL
  cuVSLAM provides relative pose
  ESKF VO measurement updates at 0.7Hz
  Satellite matching on keyframes (every 5-10 frames)

STATE: TRACKING_LOST (enter when cuVSLAM reports loss)
  1. ESKF continues with IMU-only prediction (no VO updates)
     → uncertainty grows rapidly (~1-5 m/s drift with consumer IMU)
  2. Switch satellite matching to EVERY frame (not just keyframes)
     → maximize chances of getting absolute correction
  3. For each camera frame:
     a. Attempt satellite match using ESKF predicted position ± 3σ for tile selection
     b. If match succeeds (RANSAC inlier ratio > 30%):
        → ESKF measurement update with satellite position
        → Restart cuVSLAM with current frame as new origin
        → Transition to TRACKING_NORMAL
        → Reset failure counter
     c. If match fails:
        → Increment failure_counter
        → Continue IMU-only ESKF prediction
  4. If failure_counter >= 3:
     → Send re-localization request to ground station
     → GPS_INPUT fix_type = 0 (no fix), horiz_accuracy = 999.0
     → Continue attempting satellite matching on each frame
  5. If operator sends re-localization hint (approximate lat,lon):
     → Use as ESKF measurement with high covariance (~500m)
     → Attempt satellite match in that area
     → On success: transition to TRACKING_NORMAL

STATE: SEGMENT_DISCONNECT
  After re-localization following tracking loss:
  → New cuVSLAM track is independent of previous track
  → ESKF maintains global NED position continuity via satellite anchor
  → No need to "connect" segments at the cuVSLAM level
  → ESKF already handles this: satellite corrections keep global position consistent
```

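The loss/recovery logic above reduces to a small supervisor state machine; a sketch (class and method names are illustrative, not the real cuVSLAM or matcher APIs — the ESKF update and cuVSLAM restart are stubbed out as comments):

```python
from enum import Enum, auto

class NavState(Enum):
    TRACKING_NORMAL = auto()
    TRACKING_LOST = auto()

class TrackingSupervisor:
    """Tracks VO loss, counts satellite-match failures, raises RELOC_REQ."""
    RELOC_FAILURE_LIMIT = 3

    def __init__(self):
        self.state = NavState.TRACKING_NORMAL
        self.failures = 0
        self.reloc_requested = False

    def on_vo_lost(self):
        # cuVSLAM reported tracking_lost: ESKF falls back to IMU-only
        self.state = NavState.TRACKING_LOST

    def on_satellite_match(self, success):
        if self.state is not NavState.TRACKING_LOST:
            return
        if success:
            # here: ESKF absolute update + cuVSLAM restart at current frame
            self.state = NavState.TRACKING_NORMAL
            self.failures = 0
            self.reloc_requested = False
        else:
            self.failures += 1
            if self.failures >= self.RELOC_FAILURE_LIMIT:
                # here: STATUSTEXT "RELOC_REQ: ..." to the ground station
                self.reloc_requested = True
```
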
### Component: Satellite Image Matching Pipeline (UPDATED — added GSD + tile selection details)

**GSD normalization**:

- Camera GSD at 600m: ~15.9 cm/pixel (ADTI 20L V1 + 16mm)
- Satellite tile GSD at zoom 18: ~0.6 m/pixel
- Scale ratio: ~3.8:1
- Downsample camera image to satellite GSD before matching: resize from 5456×3632 to ~1440×960 (matching zoom 18 GSD)
- This is close to LiteSAM's 1280px input — use 1280px; the minor GSD mismatch is acceptable for matching

**Tile selection**:

- Input: ESKF position estimate (lat, lon) + horizontal covariance σ_h
- Search radius: max(3·σ_h, 500m) — at least 500m to handle initial uncertainty
- Compute geohash for center position → load tiles covering the search area
- Assemble a tile mosaic if needed (typically 2×2 to 4×4 tiles for adequate coverage)
- If ESKF uncertainty > 2km: tile selection unreliable, fall back to a wider search or request operator input

**Tile storage calculation** (zoom 18 — 0.6 m/pixel):

- Each 256×256 tile covers ~153m × 153m
- Flight path 200km with ±2km buffer: area ≈ 200km × 4km = 800 km²
- Tiles needed: 800,000,000 / (153 × 153) ≈ 34,200 tiles
- Storage: ~10-15KB per JPEG tile → ~340-510 MB
- With zoom 19 overlap tiles for higher precision: ×4 = ~1.4-2.0 GB
- Recommended: zoom 18 primary + zoom 19 for ±500m along flight path → ~500-800 MB total

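The GSD and storage figures above follow from simple arithmetic; a sketch (equatorial Web Mercator resolutions — multiply by cos(latitude) for the operating area; the ~12.5 KB/tile figure is the midpoint of the 10-15 KB JPEG estimate):

```python
def camera_gsd_m(altitude_m, focal_mm=16.0, sensor_w_mm=23.2, width_px=5456):
    # ground footprint width / image width (ADTI 20L V1 + 16 mm lens)
    return (altitude_m * sensor_w_mm / focal_mm) / width_px

def tile_gsd_m(zoom):
    # Web Mercator meters per pixel at the equator
    return 156543.03392 / (2 ** zoom)

gsd_cam = camera_gsd_m(600.0)               # ~0.159 m/px at 600 m
gsd_sat = tile_gsd_m(18)                    # ~0.597 m/px at zoom 18
tile_side_m = gsd_sat * 256                 # ~153 m per 256-px tile
area_m2 = 200_000 * 4_000                   # 200 km path, ±2 km buffer
n_tiles = area_m2 / tile_side_m ** 2        # ~34,200 tiles
storage_mb = n_tiles * 12.5 / 1024          # ~420 MB at ~12.5 KB/tile
```
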
| Solution | Tools | Advantages | Limitations | Performance (est. Orin Nano Super TRT FP16) | Params | Fit |
| -------------------------------------- | ------------------------- | ------------------------------------------------------------------------ | ------------------------------------------------------ | ------------------------------------------- | ------ | ------------------------------- |
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30=17.86m UAV-VisLoc), 6.31M params | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT). Semi-dense. CVPR 2024. | 2.4× more params than LiteSAM. | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python | Fastest. Proven TRT implementation. | General-purpose, not designed for cross-view gap. | Est. ~50-100ms | <5M | ✅ Speed fallback |

### Component: cuVSLAM Configuration (NEW — previously undefined)

**Mode**: Inertial (mono camera + IMU)

**Camera configuration** (ADTI 20L V1 + 16mm lens):

- Model: Brown distortion
- fx = fy = 3763 px (16mm on 23.2mm sensor at 5456px width)
- cx = 2728 px, cy = 1816 px
- Distortion coefficients: from calibration (k1, k2, p1, p2)
- Border: 50px (ignore lens edge distortion)

**IMU configuration** (Pixhawk 6x IMU — ICM-42688-P):

- Gyroscope noise density: 3.0 × 10⁻³ °/s/√Hz
- Gyroscope random walk: 5.0 × 10⁻⁵ °/s²/√Hz
- Accelerometer noise density: 70 µg/√Hz
- Accelerometer random walk: ~2.0 × 10⁻³ m/s³/√Hz
- IMU frequency: 200 Hz (from flight controller via MAVLink)
- T_imu_rig: measured transformation from Pixhawk IMU to camera center (translation + rotation)

**cuVSLAM settings**:

- OdometryMode: INERTIAL
- MulticameraMode: PRECISION (favor accuracy over speed — we have a 1430ms budget)
- Input resolution: downsample to 1280×852 (or 720p) for processing speed
- async_bundle_adjustment: True

**Initialization**:

- cuVSLAM initializes automatically when it receives the first camera frame + IMU data
- The first few frames are used for feature initialization and scale estimation
- The first satellite match validates and corrects the initial position

**Calibration procedure** (one-time per hardware unit):

1. Camera intrinsics: checkerboard calibration with OpenCV (or use manufacturer data if available)
2. Camera-IMU extrinsic (T_imu_rig): Kalibr tool with checkerboard + IMU data
3. IMU noise parameters: Allan variance analysis or use datasheet values
4. Store calibration files on Jetson storage

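The parameters above can be bundled into one configuration object loaded at startup. A sketch — the key names are this document's shorthand, not the actual cuVSLAM API; they would be mapped onto whatever the tracker constructor expects:

```python
# Illustrative configuration bundle for the cuVSLAM setup described above.
# Values come from the ADTI 20L V1 optics and the ICM-42688-P datasheet.
CUVSLAM_CONFIG = {
    "odometry_mode": "INERTIAL",
    "multicamera_mode": "PRECISION",
    "async_bundle_adjustment": True,
    "input_resolution": (1280, 852),
    "camera": {
        "distortion_model": "brown",
        "fx": 3763.0, "fy": 3763.0,       # 16 mm lens, 23.2 mm sensor, 5456 px
        "cx": 2728.0, "cy": 1816.0,
        "distortion": [0.0, 0.0, 0.0, 0.0],  # k1, k2, p1, p2 from calibration
        "border_px": 50,
    },
    "imu": {
        "rate_hz": 200,
        "gyro_noise_density": 3.0e-3,     # °/s/√Hz
        "gyro_random_walk": 5.0e-5,       # °/s²/√Hz
        "accel_noise_density": 70e-6,     # g/√Hz
        "accel_random_walk": 2.0e-3,      # m/s³/√Hz
    },
}
```
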
### Component: AI Model Inference Runtime (UNCHANGED)

Native TRT Engine — optimal performance and memory on fixed NVIDIA hardware. See draft05 for the full comparison table and conversion workflow.

### Component: Visual Odometry (UNCHANGED)

cuVSLAM in Inertial mode, fed by ADTI 20L V1 at 0.7 fps sustained. See draft05 for the feasibility analysis at 0.7 fps.

### Component: Flight Controller Integration (UPDATED — added GPS_INPUT field spec)

pymavlink over UART at 5-10Hz. GPS_INPUT field population is defined above.

ArduPilot configuration:

- GPS1_TYPE = 14 (MAVLink)
- GPS_RATE = 5 (minimum, matching our 5-10Hz output)
- EK3_SRC1_POSXY = 3 (GPS), EK3_SRC1_VELXY = 3 (GPS) — EKF uses GPS_INPUT as position/velocity source

### Component: Object Localization (NEW — previously undefined)

**Input**: pixel coordinates (u, v) in the Viewpro A40 Pro image, current gimbal angles (pan_deg, tilt_deg), zoom factor, UAV position from the GPS-denied system, UAV altitude

**Process**:

1. Pixel → camera ray: ray_cam = K_viewpro⁻¹(zoom) · [u, v, 1]ᵀ
2. Camera → gimbal frame: ray_gimbal = R_gimbal(pan, tilt) · ray_cam
3. Gimbal → body: ray_body = T_gimbal_body · ray_gimbal
4. Body → NED: ray_ned = R_body_ned(q) · ray_body
5. Ray-ground intersection, assuming flat terrain with the UAV at height h above ground: t = h / ray_ned[2] (valid for a downward ray, ray_ned[2] > 0), p_ground_ned = p_uav_ned + t · ray_ned
6. NED → WGS84: convert to lat, lon

**Output**: { lat, lon, accuracy_m, confidence }

- accuracy_m propagated from: UAV position accuracy (from ESKF) + gimbal angle uncertainty + altitude uncertainty

**API endpoint**: POST /objects/locate

- Request: { pixel_x, pixel_y, gimbal_pan_deg, gimbal_tilt_deg, zoom_factor }
- Response: { lat, lon, alt, accuracy_m, confidence, uav_position: {lat, lon, alt}, timestamp }

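Steps 1, 4, and 5 above can be sketched as a single pixel-to-ground function. The intrinsic matrix and rotation below are illustrative placeholders (real values come from Viewpro calibration and the ESKF attitude), and the gimbal chain is folded into one camera-to-NED rotation:

```python
import numpy as np

def locate_object(pixel_uv, K, R_cam_to_ned, p_uav_ned, ground_down=0.0):
    """Pixel → NED ground point via flat-terrain ray intersection."""
    u, v = pixel_uv
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # pixel → camera ray
    ray_ned = R_cam_to_ned @ ray_cam                      # camera → NED
    if ray_ned[2] <= 0:
        return None                                       # ray never hits ground
    t = (ground_down - p_uav_ned[2]) / ray_ned[2]         # flat-terrain intersect
    return p_uav_ned + t * ray_ned

# Example: nadir-looking camera whose Z axis maps straight to NED down,
# UAV 600 m above a ground plane at down = 0 (hypothetical K values)
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                                             # camera axes == NED here
p_center = locate_object((640.0, 360.0), K, R, np.array([0.0, 0.0, -600.0]))
p_offset = locate_object((740.0, 360.0), K, R, np.array([0.0, 0.0, -600.0]))
```

The principal-point ray lands directly below the UAV; a 100-px offset at fx = 1000 and 600 m altitude displaces the ground point by 60 m, which matches the accuracy-propagation intuition in the text.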
### Component: Startup, Handoff & Failsafe (UPDATED — added handoff + reboot + re-localization)

**GPS-denied handoff protocol**:

- GPS-denied system runs continuously from companion computer boot
- Reads initial position from FC (GLOBAL_POSITION_INT) — this may be real GPS or last known
- First satellite match validates the initial position
- FC receives both real GPS (if available) and GPS_INPUT; FC EKF selects the best source based on accuracy
- No explicit "switch" — the GPS-denied system is a secondary GPS source

**Startup sequence** (expanded from draft05):

1. Boot Jetson → start GPS-Denied service (systemd)
2. Connect to flight controller via pymavlink on UART
3. Wait for heartbeat
4. Initialize PyCUDA context
5. Load TRT engines: litesam.engine + xfeat.engine (~1-3s each)
6. Allocate GPU I/O buffers
7. Create CUDA streams: Stream A (cuVSLAM), Stream B (satellite matching)
8. Load camera calibration + IMU calibration files
9. Read GLOBAL_POSITION_INT → set mission origin (NED reference point) → init ESKF
10. Start cuVSLAM (Inertial mode) with ADTI 20L V1 camera stream
11. Preload satellite tiles within ±2km into RAM
12. Trigger first satellite match → validate initial position
13. Begin GPS_INPUT output loop at 5-10Hz
14. System ready

**Mid-flight reboot recovery**:

1. Jetson boots (~30-60s)
2. GPS-Denied service starts, connects to FC
3. Read GLOBAL_POSITION_INT (FC's current IMU-extrapolated position)
4. Init ESKF with this position + HIGH uncertainty covariance (σ = 200m)
5. Load TRT engines (~2-6s total)
6. Start cuVSLAM (fresh, no prior map)
7. Immediate satellite matching on first camera frame
8. On satellite match success: ESKF corrected, uncertainty drops
9. Estimated total recovery: ~35-70s
10. During recovery: FC uses IMU-only dead reckoning (at 70 km/h: ~700-1400m uncontrolled drift)
11. **Known limitation**: recovery time is dominated by Jetson boot time

**3-consecutive-failure re-localization**:

- Trigger: VO lost + satellite match failed × 3 consecutive camera frames
- Action: send re-localization request via MAVLink STATUSTEXT or custom message
- Message content: "RELOC_REQ: last_lat={lat} last_lon={lon} uncertainty={σ}m"
- Operator response: MAVLink COMMAND_LONG with approximate lat/lon
- System: use operator position as ESKF measurement with R = diag(500², 500², 100²) meters²
- System continues satellite matching with the updated search area
- While waiting: GPS_INPUT fix_type=0, IMU-only ESKF prediction continues

### Component: Ground Station Telemetry (UPDATED — added re-localization)

MAVLink messages to ground station:

| Message | Rate | Content |
| ----------------------------- | -------- | --------------------------------------------------- |
| NAMED_VALUE_FLOAT "gps_conf" | 1Hz | Confidence score (0.0-1.0) |
| NAMED_VALUE_FLOAT "gps_drift" | 1Hz | Estimated drift from last satellite anchor (meters) |
| NAMED_VALUE_FLOAT "gps_hacc" | 1Hz | Horizontal accuracy (meters, from ESKF) |
| STATUSTEXT | On event | "RELOC_REQ: ..." for re-localization request |
| STATUSTEXT | On event | Tracking loss / recovery notifications |

### Component: Thermal Management (UNCHANGED)

Same adaptive pipeline from draft05. Active cooling required at 25W. Throttling at 80°C SoC junction.

### Component: API & Inter-System Communication (NEW — consolidated)

FastAPI (Uvicorn) running locally on the Jetson for inter-process communication with other onboard systems.

| Endpoint | Method | Purpose | Auth |
| --------------------- | --------- | -------------------------------------- | ---- |
| /sessions | POST | Start GPS-denied session | JWT |
| /sessions/{id}/stream | GET (SSE) | Real-time position + confidence stream | JWT |
| /sessions/{id}/anchor | POST | Operator re-localization hint | JWT |
| /sessions/{id} | DELETE | End session | JWT |
| /objects/locate | POST | Object GPS from pixel coordinates | JWT |
| /health | GET | System health + memory + thermal | None |

**SSE event schema** (1Hz):

```json
{
  "type": "position",
  "timestamp": "2026-03-17T12:00:00.000Z",
  "lat": 48.123456,
  "lon": 37.654321,
  "alt": 600.0,
  "accuracy_h": 15.2,
  "accuracy_v": 8.1,
  "confidence": "HIGH",
  "drift_from_anchor": 12.5,
  "vo_status": "tracking",
  "last_satellite_match_age_s": 8.3
}
```

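On the wire, each event above is a server-sent-events text frame (`data: <payload>` followed by a blank line). A minimal serializer sketch — `sse_frame` is an illustrative helper, not part of the codebase:

```python
import json

def sse_frame(event):
    # Serialize one event as an SSE "data:" frame; the double newline
    # terminates the frame per the SSE wire format.
    return f"data: {json.dumps(event, separators=(',', ':'))}\n\n"

event = {
    "type": "position",
    "timestamp": "2026-03-17T12:00:00.000Z",
    "lat": 48.123456, "lon": 37.654321, "alt": 600.0,
    "accuracy_h": 15.2, "accuracy_v": 8.1,
    "confidence": "HIGH",
    "drift_from_anchor": 12.5,
    "vo_status": "tracking",
    "last_satellite_match_age_s": 8.3,
}
frame = sse_frame(event)
```

In FastAPI this generator output would be returned via a `StreamingResponse` with `media_type="text/event-stream"` on the `/sessions/{id}/stream` endpoint.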
## UAV Platform

Unchanged from draft05. See draft05 for: airframe configuration (3.5m S-2 composite, 12.5kg AUW), flight performance (3.4h endurance at 50 km/h), camera specifications (ADTI 20L V1 + 16mm, Viewpro A40 Pro), ground coverage calculations.

## Speed Optimization Techniques

Unchanged from draft05. Key points: cuVSLAM ~9ms/frame, native TRT Engine (no ONNX RT), dual CUDA streams, 5-10Hz GPS_INPUT from ESKF IMU prediction.

## Processing Time Budget

Unchanged from draft05. VO frame: ~17-22ms. Satellite matching: ≤210ms async. Well within the 1430ms frame interval.

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
| ------------------------- | -------------- | ------------------------------------------- |
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map |
| LiteSAM TRT engine | ~50-80MB | If LiteSAM fails: EfficientLoFTR ~100-150MB |
| XFeat TRT engine | ~30-50MB | |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | |
| FastAPI (local IPC) | ~50MB | |
| ESKF + buffers | ~10MB | |
| **Total** | **~2.1-2.9GB** | **26-36% of 8GB** |

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
| ------------------------------------------------------- | ---------- | ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| LiteSAM MinGRU ops unsupported in TRT 10.3 | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification. Fallback: EfficientLoFTR TRT → XFeat TRT. |
| cuVSLAM fails on low-texture terrain at 0.7fps | HIGH | Frequent tracking loss | Satellite matching corrections bound drift. Re-localization pipeline handles tracking loss. IMU bridges short gaps. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails, outdated imagery | Pre-flight tile validation. Consider alternative providers (Bing, Mapbox). Robust to seasonal appearance changes via feature-based matching. |
| ESKF scale drift during long constant-velocity segments | MEDIUM | Position error exceeds 100m between satellite anchors | Satellite corrections every 7-14s re-anchor. Altitude constraint from barometer. Monitor drift rate — if >50m between corrections, increase satellite matching frequency. |
| Monocular scale ambiguity | MEDIUM | Metric scale lost during constant-velocity flight | Satellite absolute corrections provide scale. Known altitude constrains vertical scale. IMU acceleration during turns provides observability. |
| AUW exceeds AT4125 recommended range | MEDIUM | Reduced endurance, motor thermal stress | 12.5 kg vs 8-10 kg recommended. Monitor motor temps. Weight optimization. |
| ADTI mechanical shutter lifespan | MEDIUM | Replacement needed periodically | ~8,800 actuations/flight at 0.7fps. Estimated 11-57 flights before replacement. Budget as consumable. |
| Mid-flight companion computer failure | LOW | ~35-70s position gap | Reboot recovery procedure defined. FC uses IMU dead reckoning during gap. Known limitation. |
| Thermal throttling on Jetson | MEDIUM | Satellite matching latency increases | Active cooling required. Monitor SoC temp. Throttling at 80°C. Our workload ~8-15W typical — well under 25W TDP. |
| Engine incompatibility after JetPack update | MEDIUM | Must rebuild engines | Include engine rebuild in update procedure. |
| TRT engine build OOM on 8GB | LOW | Cannot build on target | Models small (6.31M, <5M). Reduce --memPoolSize if needed. |

## Testing Strategy

### Integration / Functional Tests

- **ESKF correctness**: Feed recorded IMU + synthetic VO/satellite data → verify output matches a reference ESKF implementation
- **GPS_INPUT field validation**: Send GPS_INPUT to SITL ArduPilot → verify the EKF accepts and uses the data correctly
- **Coordinate transform chain**: Known GPS → NED → pixel → back to GPS; verify round-trip error <0.1 m
- **Disconnected segment handling**: Simulate tracking loss → verify satellite re-localization triggers → verify cuVSLAM restarts → verify ESKF position continuity
- **3-consecutive-failure**: Simulate combined VO + satellite failures → verify a re-localization request is sent → verify the operator hint is accepted
- **Object localization**: Place a known object at a known GPS position → verify the computed GPS matches within camera accuracy
- **Mid-flight reboot**: Kill the GPS-denied process → restart → verify recovery within the expected time → verify position accuracy after recovery
- **TRT engine load test**: Verify engines load successfully on the Jetson
- **TRT inference correctness**: Compare TRT output against the PyTorch reference (max L1 error <0.01)
- **CUDA stream pipelining**: Verify Stream B satellite matching does not block Stream A VO
- **ADTI sustained capture rate**: Verify 0.7 fps sustained for >30 min without buffer overflow
- **Confidence tier transitions**: Verify fix_type and accuracy change correctly across HIGH → MEDIUM → LOW → FAILED transitions
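The coordinate round-trip check lends itself to a compact sketch. This is illustrative only: the flat-earth (equirectangular) approximation, function names, and the reference point are assumptions, and the real pipeline may use a full geodetic transform.

```python
import math

R_EARTH = 6378137.0  # WGS84 equatorial radius, metres

def gps_to_ned(lat, lon, alt, ref_lat, ref_lon, ref_alt):
    """Small-area flat-earth approximation: WGS84 degrees -> local NED metres."""
    north = math.radians(lat - ref_lat) * R_EARTH
    east = math.radians(lon - ref_lon) * R_EARTH * math.cos(math.radians(ref_lat))
    down = ref_alt - alt  # NED: down is positive below the reference
    return north, east, down

def ned_to_gps(north, east, down, ref_lat, ref_lon, ref_alt):
    """Exact inverse of gps_to_ned for the same reference point."""
    lat = ref_lat + math.degrees(north / R_EARTH)
    lon = ref_lon + math.degrees(east / (R_EARTH * math.cos(math.radians(ref_lat))))
    return lat, lon, ref_alt - down

# Hypothetical reference point and test coordinate
ref = (48.5, 37.9, 150.0)
n, e, d = gps_to_ned(48.51, 37.92, 200.0, *ref)
lat, lon, alt = ned_to_gps(n, e, d, *ref)

# Round-trip error in metres (coarse degrees->metres conversion for the check)
err_m = math.hypot((lat - 48.51) * 111_320,
                   (lon - 37.92) * 111_320 * math.cos(math.radians(48.5)))
assert err_m < 0.1 and abs(alt - 200.0) < 1e-6
```

The round trip is exact by construction here; the <0.1 m budget in the test above is for the full chain including the pixel projection, which this sketch omits.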
### Non-Functional Tests

- **End-to-end accuracy** (primary validation): Fly with real GPS recording → run the GPS-denied system in parallel → compare estimated vs. real positions → verify 80% within 50 m, 60% within 20 m
- **VO drift rate**: Measure cuVSLAM drift over a 1 km straight segment without satellite correction
- **Satellite matching accuracy**: Compare satellite-matched position vs. real GPS at known locations
- **Processing time**: Verify end-to-end per-frame <400 ms
- **Memory usage**: Monitor over a 30-min session → verify <8 GB, no leaks
- **Thermal**: Sustained 30-min run → verify no throttling
- **GPS_INPUT rate**: Verify consistent 5-10 Hz delivery to the FC
- **Tile storage**: Validate that calculated storage matches actual storage for the test mission area
- **MinGRU TRT compatibility** (day-one blocker): Clone LiteSAM → ONNX export → polygraphy → trtexec
- **Flight endurance**: Ground-test full-system power draw against the 267 W estimate

## References

- ArduPilot GPS_RATE parameter: [https://github.com/ArduPilot/ardupilot/pull/15980](https://github.com/ArduPilot/ardupilot/pull/15980)
- MAVLink GPS_INPUT message: [https://ardupilot.org/mavproxy/docs/modules/GPSInput.html](https://ardupilot.org/mavproxy/docs/modules/GPSInput.html)
- pymavlink GPS_INPUT example: [https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py](https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py)
- ESKF reference (fixed-wing UAV): [https://github.com/ludvigls/ESKF](https://github.com/ludvigls/ESKF)
- ROS ESKF multi-sensor: [https://github.com/EliaTarasov/ESKF](https://github.com/EliaTarasov/ESKF)
- Range-VIO scale observability: [https://arxiv.org/abs/2103.15215](https://arxiv.org/abs/2103.15215)
- NaviLoc trajectory-level localization: [https://www.mdpi.com/2504-446X/10/2/97](https://www.mdpi.com/2504-446X/10/2/97)
- SatLoc-Fusion hierarchical framework: [https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f](https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f)
- Auterion GPS-denied workflow: [https://docs.auterion.com/vehicle-operation/auterion-mission-control/useful-resources/operations/gps-denied-workflow](https://docs.auterion.com/vehicle-operation/auterion-mission-control/useful-resources/operations/gps-denied-workflow)
- PX4 GNSS-denied flight: [https://docs.px4.io/main/en/advanced_config/gnss_degraded_or_denied_flight.html](https://docs.px4.io/main/en/advanced_config/gnss_degraded_or_denied_flight.html)
- ArduPilot GPS_INPUT advanced usage: [https://discuss.ardupilot.org/t/advanced-usage-of-gps-type-mav-14/99406](https://discuss.ardupilot.org/t/advanced-usage-of-gps-type-mav-14/99406)
- Google Maps Ukraine imagery: [https://newsukraine.rbc.ua/news/google-maps-has-surprise-for-satellite-imagery-1727182380.html](https://newsukraine.rbc.ua/news/google-maps-has-surprise-for-satellite-imagery-1727182380.html)
- Jetson Orin Nano Super thermal: [https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/](https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/)
- GSD matching research: [https://www.kjrs.org/journal/view.html?pn=related&uid=756&vmd=Full](https://www.kjrs.org/journal/view.html?pn=related&uid=756&vmd=Full)
- VO+satellite matching pipeline: [https://polen.itu.edu.tr/items/1fe1e872-7cea-44d8-a8de-339e4587bee6](https://polen.itu.edu.tr/items/1fe1e872-7cea-44d8-a8de-339e4587bee6)
- PyCuVSLAM docs: [https://wiki.seeedstudio.com/pycuvslam_recomputer_robotics/](https://wiki.seeedstudio.com/pycuvslam_recomputer_robotics/)
- Pixhawk 6X IMU (ICM-42688-P) datasheet: [https://invensense.tdk.com/products/motion-tracking/6-axis/icm-42688-p/](https://invensense.tdk.com/products/motion-tracking/6-axis/icm-42688-p/)
- All references from solution_draft05.md

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Completeness assessment research: `_docs/00_research/solution_completeness_assessment/`
- Previous research: `_docs/00_research/trt_engine_migration/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (needs sync with draft05 corrections)
- Security analysis: `_docs/01_solution/security_analysis.md`
- Previous draft: `_docs/01_solution/solution_draft05.md`

# Tech Stack Evaluation

## Requirements Summary

### Functional

- GPS-denied visual navigation for a fixed-wing UAV
- Frame-center GPS estimation via VO + satellite matching + IMU fusion
- Object-center GPS via geometric projection
- Real-time streaming via REST API + SSE
- Disconnected route-segment handling
- User-input fallback for unresolvable frames

### Non-Functional

- <400 ms per-frame processing (camera @ ~3 fps)
- <50 m accuracy for 80% of frames, <20 m for 60%
- <8 GB total memory (shared CPU+GPU pool)
- Up to 3000 frames per flight session
- Image Registration Rate >95% (normal segments)

### Hardware Constraints

- **Jetson Orin Nano Super** (8 GB LPDDR5, 1024 CUDA cores, 67 TOPS INT8)
- **JetPack 6.2.2**: CUDA 12.6.10, TensorRT 10.3.0, cuDNN 9.3
- ARM64 (aarch64) architecture
- No internet connectivity during flight

## Technology Evaluation

### Platform & OS

| Option | Version | Score (1-5) | Notes |
|--------|---------|-------------|-------|
| **JetPack 6.2.2 (L4T)** | Ubuntu 22.04 based | **5** | Only supported OS for the Orin Nano Super. Includes CUDA 12.6, TensorRT 10.3, cuDNN 9.3 |

**Selected**: JetPack 6.2.2 — no alternative.

### Primary Language

| Option | Fitness | Maturity | Perf on Jetson | Ecosystem | Score |
|--------|---------|----------|----------------|-----------|-------|
| **Python 3.10+** | 5 | 5 | 4 | 5 | **4.8** |
| C++ | 5 | 5 | 5 | 3 | 4.5 |
| Rust | 3 | 3 | 5 | 2 | 3.3 |

**Selected**: **Python 3.10+** as the primary language.

- cuVSLAM provides Python bindings (PyCuVSLAM v15.0.0)
- TensorRT has a Python API
- FastAPI is Python-native
- OpenCV has full Python + CUDA bindings
- Performance-critical paths are offloaded to CUDA via cuVSLAM/TensorRT — Python is glue code only
- C++ for the custom ESKF if NumPy proves too slow (unlikely for a 16-state EKF at 100 Hz)

### Visual Odometry

| Option | Version | FPS on Orin Nano | Memory | License | Score |
|--------|---------|------------------|--------|---------|-------|
| **cuVSLAM (PyCuVSLAM)** | v15.0.0 (Mar 2026) | 116 fps @ 720p | ~200-300 MB | Free (NVIDIA, closed-source) | **5** |
| XFeat frame-to-frame | TensorRT engine | ~30-50 ms/frame | ~50 MB | MIT | 3.5 |
| ORB-SLAM3 | v1.0 | ~30 fps | ~300 MB | GPLv3 | 2.5 |

**Selected**: **PyCuVSLAM v15.0.0**

- 116 fps on the Orin Nano 8 GB at 720p (verified via Intermodalics benchmark)
- Mono + IMU mode natively supported
- Automatic IMU fallback on tracking loss
- Pre-built aarch64 wheel: `pip install -e bin/aarch64`
- Loop closure built in

**Risk**: Closed-source; nadir-only camera not explicitly tested. **Fallback**: XFeat frame-to-frame matching.

### Satellite Image Matching (Benchmark-Driven Selection)

**A day-one benchmark decides between the two candidates:**

| Option | Params | Accuracy (UAV-VisLoc) | Est. Time on Orin Nano | License | Score |
|--------|--------|-----------------------|------------------------|---------|-------|
| **LiteSAM (opt)** | 6.31M | RMSE@30 = 17.86 m | ~300-500 ms @ 480 px (estimated) | Open-source | **4** (if fast enough) |
| **XFeat semi-dense** | ~5M | Not benchmarked on UAV-VisLoc | ~50-100 ms | MIT | **4** (if LiteSAM too slow) |

**Decision rule**:

1. Export LiteSAM (opt) to TensorRT FP16 on the Orin Nano Super
2. Benchmark at 480 px, 640 px, and 800 px
3. If ≤400 ms at 480 px → LiteSAM
4. If >400 ms → **abandon LiteSAM; XFeat is primary**
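Under stated assumptions, the benchmark steps might look like the recipe below. The script and file names (`export_onnx.py`, `litesam_opt.*`) and the input tensor names/shapes are placeholders; only the polygraphy and trtexec invocations use standard flags.

```shell
# 1. Export LiteSAM (opt) to ONNX on the dev machine (file names are placeholders)
python export_onnx.py --weights litesam_opt.pth --out litesam_opt.onnx

# 2. Sanity-check ONNX vs. TensorRT numerics with polygraphy
polygraphy run litesam_opt.onnx --trt --onnxrt

# 3. Build an FP16 engine and time it on the Orin Nano Super
trtexec --onnx=litesam_opt.onnx --fp16 --saveEngine=litesam_opt.plan
trtexec --loadEngine=litesam_opt.plan --iterations=100 \
        --shapes=image0:1x1x480x480,image1:1x1x480x480  # tensor names/shapes assumed

# Decision: mean GPU compute time ≤400 ms at 480 px → LiteSAM; otherwise XFeat is primary.
```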

| Requirement | LiteSAM (opt) | XFeat semi-dense |
|-------------|---------------|------------------|
| PyTorch → ONNX → TensorRT export | Required | Required |
| TensorRT FP16 engine | 6.31M params, ~25 MB engine | ~5M params, ~20 MB engine |
| Input preprocessing | Resize to 480 px, normalize | Resize to 640 px, normalize |
| Matching pipeline | End-to-end (detect + match + refine) | Detect → KNN match → geometric verify |
| Cross-view robustness | Designed for the satellite-aerial gap | General-purpose, less robust |

### Sensor Fusion

| Option | Complexity | Accuracy | Compute @ 100 Hz | Score |
|--------|------------|----------|------------------|-------|
| **ESKF (custom)** | Low | Good | <1 ms/step | **5** |
| Hybrid ESKF/UKF | Medium | 49% better | ~2-3 ms/step | 3.5 |
| GTSAM factor graph | High | Best | ~10-50 ms/step | 2 |

**Selected**: **Custom ESKF in Python (NumPy/SciPy)**

- 16-state vector, well within NumPy's capability
- FilterPy (v1.4.5, MIT) as reference/fallback, but a custom implementation is preferred for tighter control
- If the 100 Hz IMU prediction step proves slow in Python: rewrite as a Cython or C extension (~1 day of effort)

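As a rough plausibility check for the <1 ms/step claim, here is a generic 16-state covariance propagation step in NumPy — a sketch, not the project's ESKF; the state layout (position coupled to velocity) and noise magnitudes are illustrative.

```python
import numpy as np

N = 16    # error-state dimension from the draft; internal layout here is an assumption
DT = 0.01 # 100 Hz IMU rate

def predict_covariance(P: np.ndarray, F: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """One ESKF covariance propagation step: P <- F P F^T + Q."""
    return F @ P @ F.T + Q

# Toy linearized transition: identity plus position<-velocity coupling
F = np.eye(N)
F[0:3, 3:6] = DT * np.eye(3)
Q = 1e-6 * np.eye(N)  # illustrative process noise
P = 1e-2 * np.eye(N)  # illustrative initial covariance

for _ in range(1000):  # 10 s of simulated 100 Hz prediction
    P = predict_covariance(P, F, Q)

# Covariance must stay symmetric positive definite
assert np.allclose(P, P.T)
assert np.all(np.linalg.eigvalsh(P) > 0)
```

On any modern CPU a dense 16×16 `F P Fᵀ + Q` is microseconds per step, which is why the table budgets <1 ms even with the measurement-update math added.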
### Image Preprocessing

| Option | Tool | Time on Orin Nano | Notes | Score |
|--------|------|-------------------|-------|-------|
| **OpenCV CUDA resize** | cv2.cuda.resize | ~2-3 ms (pre-allocated) | Must build OpenCV with CUDA from source. Pre-allocate GPU mats to avoid allocation overhead | **4** |
| NVIDIA VPI resize | VPI 3.2 | ~1-2 ms | Part of JetPack, potentially faster | 4 |
| CPU resize (OpenCV) | cv2.resize | ~5-10 ms | No GPU needed, simpler | 3 |

**Selected**: **OpenCV CUDA** (pre-allocated GPU memory) or **VPI 3.2**, whichever is faster in the benchmark. Both are available in JetPack 6.2.

- OpenCV must be built from source with `CUDA_ARCH_BIN=8.7` for the Orin Nano's Ampere architecture
- Alternative: VPI 3.2 is pre-installed in JetPack 6.2, no build step needed

### API & Streaming Framework

| Option | Version | Async Support | SSE Support | Score |
|--------|---------|---------------|-------------|-------|
| **FastAPI + sse-starlette** | FastAPI 0.115+, sse-starlette 3.3.2 | Native async/await | EventSourceResponse with auto-disconnect | **5** |
| Flask + flask-sse | Flask 3.x | Limited | Redis dependency | 2 |
| Raw aiohttp | aiohttp 3.x | Full | Manual SSE implementation | 3 |

**Selected**: **FastAPI + sse-starlette v3.3.2**

- sse-starlette: 108M downloads/month, BSD-3 license, production-stable
- Auto-generated OpenAPI docs
- Native async for the non-blocking VO + satellite pipeline
- Uvicorn as the ASGI server
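sse-starlette's `EventSourceResponse` emits the `text/event-stream` wire format; a minimal stdlib sketch of that format clarifies what clients receive. The event name and payload fields here are hypothetical, not the project's actual schema.

```python
import json
from typing import Optional

def sse_event(data: dict, event: Optional[str] = None,
              event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event frame (text/event-stream wire format)."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates each event

# Hypothetical position update as the client would see it on the stream
frame = sse_event({"lat": 48.5, "lon": 37.9, "tier": "HIGH"},
                  event="position", event_id="42")
assert frame.endswith("\n\n")
assert "event: position" in frame
```

With sse-starlette, an async generator yields dicts with the same `event`/`id`/`data` keys and the library does this framing, plus keep-alives and client-disconnect handling.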
### Satellite Tile Storage & Indexing

| Option | Complexity | Lookup Speed | Score |
|--------|------------|--------------|-------|
| **GeoHash-indexed directory** | Low | O(1) hash lookup | **5** |
| SQLite + spatial index | Medium | O(log n) | 4 |
| PostGIS | High | O(log n) | 2 (overkill) |

**Selected**: **GeoHash-indexed directory structure**

- Pre-flight: download tiles, store as `{geohash}/{zoom}_{x}_{y}.jpg` + `{geohash}/{zoom}_{x}_{y}_resized.jpg`
- Runtime: compute the geohash from the ESKF position → direct directory lookup
- Metadata in JSON sidecar files
- No database dependency on the Jetson during flight
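A minimal sketch of the geohash-based lookup, assuming the directory layout above. The encoder below is a textbook implementation rather than the pygeohash package, and `tile_path` is a hypothetical helper.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash base32 alphabet

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Minimal geohash encoder: interleave lon/lat bisection bits, base32-encode."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    even, out = True, []          # even bits refine longitude, odd bits latitude
    bit_count, ch = 0, 0
    while len(out) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:        # 5 bits per base32 character
            out.append(BASE32[ch])
            bit_count, ch = 0, 0
    return "".join(out)

def tile_path(lat, lon, zoom, x, y, root="tiles"):
    # Layout from the doc: {geohash}/{zoom}_{x}_{y}.jpg
    return f"{root}/{geohash_encode(lat, lon)}/{zoom}_{x}_{y}.jpg"

# Known reference value (Wikipedia's geohash example)
assert geohash_encode(57.64911, 10.40744, 11) == "u4pruydqqvj"
```

At runtime the ESKF position feeds `geohash_encode`, and the resulting prefix selects the directory; no index structure or database is consulted.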
### Satellite Tile Provider

| Provider | Max Zoom | GSD | Pricing | Eastern Ukraine Coverage | Score |
|----------|----------|-----|---------|--------------------------|-------|
| **Google Maps Tile API** | 18-19 | ~0.3-0.5 m/px | 100K tiles free/month, then $0.48/1K | Partial (conflict-zone gaps) | **4** |
| Bing Maps | 18-19 | ~0.3-0.5 m/px | 125K free/year (basic) | Similar | 3.5 |
| Mapbox Satellite | 18-19 | ~0.5 m/px | 200K free/month | Similar | 3.5 |

**Selected**: **Google Maps Tile API** (per restrictions.md). The 100K free tiles/month cover ~25 km² at zoom 19. For larger operational areas, costs remain manageable at $0.48 per 1K tiles.
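Tile indices and pre-flight tile counts follow the standard Web-Mercator (slippy map) formulas; a sketch for storage and cost estimates. The bounding box is hypothetical, and the real provider API may differ in indexing details.

```python
import math

def tile_xy(lat: float, lon: float, zoom: int) -> tuple:
    """Web-Mercator (slippy map) tile indices for a WGS84 coordinate."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def tile_count(lat_min, lon_min, lat_max, lon_max, zoom) -> int:
    """Tile count for a bounding box, for storage/cost estimates."""
    x0, y0 = tile_xy(lat_max, lon_min, zoom)  # NW corner (higher lat -> smaller y)
    x1, y1 = tile_xy(lat_min, lon_max, zoom)  # SE corner
    return (x1 - x0 + 1) * (y1 - y0 + 1)

assert tile_xy(0.0, 0.0, 1) == (1, 1)

# Hypothetical ~5 km x 5 km operational box at zoom 19
tiles = tile_count(48.00, 37.00, 48.045, 37.067, 19)
overage = max(0, tiles - 100_000)
cost_usd = overage * 0.48 / 1000  # past the free tier, at the doc's $0.48/1K rate
assert tiles > 0
```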
### Output Format

| Format | Standard | Tooling | Score |
|--------|----------|---------|-------|
| **GeoJSON** | RFC 7946 | Universal GIS support | **5** |
| CSV (lat, lon, confidence) | De facto | Simple, lightweight | 4 |

**Selected**: **GeoJSON** as primary, CSV as an export option. Per the AC: WGS84 coordinates.
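A minimal sketch of the GeoJSON output. The property names are assumptions; only the Feature/geometry structure and the `[lon, lat]` coordinate order come from RFC 7946.

```python
import json

def position_feature(lat: float, lon: float, accuracy_m: float, tier: str) -> dict:
    """RFC 7946 Point Feature; GeoJSON coordinates are [lon, lat] in WGS84."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"accuracy_m": accuracy_m, "tier": tier},  # names assumed
    }

collection = {
    "type": "FeatureCollection",
    "features": [position_feature(48.5, 37.9, 12.0, "HIGH")],
}
text = json.dumps(collection)

# Longitude first — a common source of swapped-axis bugs in GIS tooling
assert json.loads(text)["features"][0]["geometry"]["coordinates"] == [37.9, 48.5]
```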

## Tech Stack Summary

```
┌────────────────────────────────────────────────────┐
│ HARDWARE: Jetson Orin Nano Super 8GB               │
│ OS: JetPack 6.2.2 (L4T / Ubuntu 22.04)             │
│ CUDA 12.6.10 / TensorRT 10.3.0 / cuDNN 9.3         │
├────────────────────────────────────────────────────┤
│ LANGUAGE: Python 3.10+                             │
│ FRAMEWORK: FastAPI + sse-starlette 3.3.2           │
│ SERVER: Uvicorn (ASGI)                             │
├────────────────────────────────────────────────────┤
│ VISUAL ODOMETRY: PyCuVSLAM v15.0.0                 │
│ SATELLITE MATCH: LiteSAM(opt) or XFeat (benchmark) │
│ SENSOR FUSION: Custom ESKF (NumPy/SciPy)           │
│ PREPROCESSING: OpenCV CUDA or VPI 3.2              │
│ INFERENCE: TensorRT 10.3.0 (FP16)                  │
├────────────────────────────────────────────────────┤
│ TILE PROVIDER: Google Maps Tile API                │
│ TILE STORAGE: GeoHash-indexed directory            │
│ OUTPUT: GeoJSON (WGS84) via SSE stream             │
└────────────────────────────────────────────────────┘
```

## Dependency List

### Python Packages (pip)

| Package | Version | Purpose |
|---------|---------|---------|
| pycuvslam | v15.0.0 (aarch64 wheel) | Visual odometry |
| fastapi | >=0.115 | REST API framework |
| sse-starlette | >=3.3.2 | SSE streaming |
| uvicorn | >=0.30 | ASGI server |
| numpy | >=1.26 | ESKF math, array ops |
| scipy | >=1.12 | Rotation matrices, spatial transforms |
| opencv-python (CUDA build) | >=4.8 | Image preprocessing (must be built from source with CUDA) |
| torch (aarch64) | >=2.3 (JetPack-compatible) | LiteSAM model loading (if selected) |
| tensorrt | 10.3.0 (JetPack bundled) | Inference engine |
| pycuda | >=2024.1 | CUDA stream management |
| geojson | >=3.1 | GeoJSON output formatting |
| pygeohash | >=1.2 | GeoHash tile indexing |

### System Dependencies (JetPack 6.2.2)

| Component | Version | Notes |
|-----------|---------|-------|
| CUDA Toolkit | 12.6.10 | Pre-installed |
| TensorRT | 10.3.0 | Pre-installed |
| cuDNN | 9.3 | Pre-installed |
| VPI | 3.2 | Pre-installed; alternative to OpenCV CUDA for resize |
| cuVSLAM runtime | Bundled with the PyCuVSLAM wheel | |

### Offline Preprocessing Tools (developer machine, not Jetson)

| Tool | Purpose |
|------|---------|
| Python 3.10+ | Tile download script |
| Google Maps Tile API key | Satellite tile access |
| torch + LiteSAM weights | Feature pre-extraction (if LiteSAM selected) |
| trtexec (TensorRT) | Model export to a TensorRT engine |

## Risk Assessment

| Technology | Risk | Likelihood | Impact | Mitigation |
|------------|------|------------|--------|------------|
| cuVSLAM | Closed-source; nadir camera untested | Medium | High | XFeat frame-to-frame as an open-source fallback |
| LiteSAM | May exceed 400 ms on the Orin Nano Super | High | High | **Abandon for XFeat** — the day-one benchmark is go/no-go |
| OpenCV CUDA build | Build complexity on Jetson, CUDA arch compatibility | Medium | Low | VPI 3.2 as a drop-in alternative (pre-installed) |
| Google Maps Tile API | Conflict-zone coverage gaps, EEA restrictions | Medium | Medium | Test tile availability for the operational area pre-flight; alternative providers (Bing, Mapbox) |
| Custom ESKF | Implementation bugs, tuning effort | Low | Medium | FilterPy v1.4.5 as reference; well-understood algorithm |
| Python GIL | Concurrent VO + satellite-matching contention | Low | Low | CUDA operations release the GIL; use asyncio + threading for I/O |

## Learning Requirements

| Technology | Team Expertise Needed | Ramp-up Time |
|------------|-----------------------|--------------|
| PyCuVSLAM | SLAM concepts, Python API, camera calibration | 2-3 days |
| TensorRT model export | ONNX export, trtexec, FP16 optimization | 2-3 days |
| LiteSAM architecture | Transformer-based matching (if selected) | 1-2 days |
| XFeat | Feature detection/matching concepts | 1 day |
| ESKF | Kalman filtering, quaternion math, multi-rate fusion | 3-5 days |
| FastAPI + SSE | Async Python, ASGI, SSE protocol | 1 day |
| GeoHash spatial indexing | Geospatial concepts | 0.5 days |
| Jetson deployment | JetPack, power modes, thermal management | 2-3 days |

## Development Environment

| Environment | Purpose | Setup |
|-------------|---------|-------|
| **Developer machine** (x86_64, GPU) | Development, unit testing, model export | Docker with CUDA + TensorRT |
| **Jetson Orin Nano Super** | Integration testing, benchmarking, deployment | JetPack 6.2.2 flashed, SSH access |

Code should be developed and unit-tested on x86_64, then deployed to the Jetson for integration and performance testing. cuVSLAM and the TensorRT engines are aarch64-only — mock these in x86_64 tests.
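One way to mock the aarch64-only modules in x86_64 unit tests is to patch `sys.modules` before import — a sketch using stdlib `unittest.mock`; the module name `cuvslam` and the `Tracker.track` call are stand-ins for whatever the real bindings expose.

```python
import sys
import types
from unittest import mock

# Fake module standing in for the aarch64-only binding (names are assumptions)
fake_cuvslam = types.ModuleType("cuvslam")
fake_cuvslam.Tracker = mock.MagicMock(name="Tracker")

with mock.patch.dict(sys.modules,
                     {"cuvslam": fake_cuvslam, "tensorrt": mock.MagicMock()}):
    import cuvslam  # resolves to the fake on x86_64

    tracker = cuvslam.Tracker()
    tracker.track(frame=None)           # code under test would call this
    tracker.track.assert_called_once()  # and the test can verify the interaction

# patch.dict restores sys.modules on exit, so the fakes do not leak between tests
assert "cuvslam" not in sys.modules
```

In a pytest suite the same patching would typically live in a fixture or `conftest.py` so every test module sees the fakes before importing the code under test.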