mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-04-22 21:06:37 +00:00
# Security Analysis

## Operational Context

This system runs on a UAV operating in a **conflict zone** (eastern Ukraine). The UAV could be shot down and physically captured. GPS denial/spoofing is the premise. The Jetson Orin Nano stores satellite imagery, flight plans, and captured photos. Security must assume the worst case: **physical access by an adversary**.

## Threat Model

### Asset Inventory

| Asset | Sensitivity | Location | Notes |
|-------|------------|----------|-------|
| Captured camera imagery | HIGH | Jetson storage | Reconnaissance data — reveals what was surveyed |
| Satellite tile cache | MEDIUM | Jetson storage | Reveals operational area and areas of interest |
| Flight plan / route | HIGH | Jetson memory + storage | Reveals mission objectives and launch/landing sites |
| Computed GPS positions | HIGH | Jetson memory, SSE stream | Real-time position data of UAV and surveyed targets |
| Google Maps API key | MEDIUM | Offline prep machine only | Used pre-flight, NOT stored on Jetson |
| TensorRT model weights | LOW | Jetson storage | LiteSAM/XFeat — publicly available models |
| cuVSLAM binary | LOW | Jetson storage | NVIDIA proprietary but freely distributed |
| IMU calibration data | LOW | Jetson storage | Device-specific calibration |
| System configuration | MEDIUM | Jetson storage | API endpoints, tile paths, fusion parameters |

### Threat Actors

| Actor | Capability | Motivation | Likelihood |
|-------|-----------|------------|------------|
| **Adversary military (physical capture)** | Full physical access after UAV loss | Extract intelligence: imagery, flight plans, operational area | HIGH |
| **Electronic warfare unit** | GPS spoofing/jamming, RF jamming | Disrupt navigation, force UAV off course | HIGH (GPS denial is the premise) |
| **Network attacker (ground station link)** | Intercept/inject on UAV-to-ground comms | Steal position data, inject false commands | MEDIUM |
| **Insider / rogue operator** | Authorized access to system | Data exfiltration, mission sabotage | LOW |
| **Supply chain attacker** | Tampered satellite tiles or model weights | Feed corrupted reference data → position errors | LOW |

### Attack Vectors

| Vector | Target Asset | Actor | Impact | Likelihood |
|--------|-------------|-------|--------|------------|
| **Physical extraction of storage** | All stored data | Adversary (capture) | Full intelligence compromise | HIGH |
| **GPS spoofing** | Position estimate | EW unit | Already mitigated — system is GPS-denied by design | N/A |
| **IMU acoustic injection** | IMU data → ESKF | EW unit | Drift injection, subtle position errors | LOW |
| **Camera blinding/spoofing** | VO + satellite matching | EW unit | VO failure, incorrect satellite matches | LOW |
| **Adversarial ground patterns** | Satellite matching | Adversary | Physical patches on ground fool feature matching | VERY LOW |
| **SSE stream interception** | Position data | Network attacker | Real-time position leak | MEDIUM |
| **API command injection** | Flight session control | Network attacker | Start/stop/manipulate sessions | MEDIUM |
| **Corrupted satellite tiles** | Satellite matching | Supply chain | Systematic position errors | LOW |
| **Model weight tampering** | Matching accuracy | Supply chain | Degraded matching → higher drift | LOW |

## Per-Component Security Requirements and Controls

### 1. Data at Rest (Jetson Storage)

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Protect captured imagery from extraction after capture | CRITICAL | Full-disk encryption (LUKS) | JetPack LUKS support with `ENC_ROOTFS=1`. Use an OP-TEE Trusted Application for key management. Modify `luks-srv` to NOT auto-decrypt — require a hardware token or secure erase trigger |
| Protect satellite tiles and flight plans | HIGH | Same LUKS encryption | Included in full-disk encryption scope |
| Enable rapid secure erase on capture/crash | CRITICAL | Tamper-triggered wipe | Hardware dead-man switch: if UAV telemetry is lost for N seconds OR the accelerometer detects crash impact → trigger `cryptsetup luksErase` on all LUKS volumes. Destroys key material in <1 second — data becomes unrecoverable |
| Prevent cold-boot key extraction | HIGH | Minimize key residency in RAM | ESKF state and position history cleared from memory when the session ends. Avoid writing position logs to disk unless encrypted |
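
The dead-man-switch condition above can be sketched as a small decision function. The thresholds (`LINK_LOSS_S`, `CRASH_G`) and names are illustrative assumptions, not values from this document, and the actual `cryptsetup luksErase` invocation is left as a comment:

```python
# Sketch of the tamper-trigger decision logic: prolonged loss of ground-
# station telemetry OR a crash-level impact arms the secure erase.
# Threshold values below are assumed for illustration.
LINK_LOSS_S = 30.0   # seconds without a telemetry heartbeat
CRASH_G = 16.0       # accelerometer magnitude (g) treated as crash impact

def should_erase(now: float, last_heartbeat: float, accel_g: float) -> bool:
    """Return True when the tamper condition is met. The caller would then
    run `cryptsetup luksErase <device>` on all LUKS volumes to destroy
    the key material."""
    link_lost = (now - last_heartbeat) > LINK_LOSS_S
    crashed = accel_g > CRASH_G
    return link_lost or crashed
```

Tuning these two thresholds is exactly the "false triggers must NOT cause premature erase" concern noted in the risk table at the end of this document.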
### 2. Secure Boot

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Prevent unauthorized code execution | HIGH | NVIDIA Secure Boot with PKC fuse burning | Burn the `SecurityMode` fuse (`odm_production_mode=0x1`) on production Jetsons. Sign all boot images with the PKC key pair. Generate keys via HSM |
| Prevent firmware rollback | MEDIUM | Ratchet fuses | Configure anti-rollback fuses in the fuse configuration XML |
| Debug port lockdown | HIGH | Disable JTAG/debug after production | Burn debug-disable fuses. Irreversible — production units only |

### 3. API & Communication (FastAPI + SSE)

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Authenticate API clients | HIGH | JWT bearer token | Pre-shared secret between ground station and Jetson. Generate JWT at session start. Short expiry (flight duration). `HTTPBearer` scheme in FastAPI |
| Encrypt SSE stream | HIGH | TLS 1.3 | Uvicorn with a TLS certificate (self-signed for field use, pre-installed on the ground station). All SSE position data encrypted in transit |
| Prevent unauthorized session control | HIGH | JWT + endpoint authorization | Session start/stop/anchor endpoints require a valid JWT. Rate-limit via `slowapi` |
| Prevent replay attacks | MEDIUM | JWT `exp` + `jti` claims | Token expiry per flight. Unique token ID (`jti`) tracked to prevent reuse |
| Limit API surface | MEDIUM | Minimal endpoint exposure | Only expose: `POST /sessions`, `GET /sessions/{id}/stream` (SSE), `POST /sessions/{id}/anchor`, `DELETE /sessions/{id}`. No admin/debug endpoints in production |
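
The per-flight token scheme (pre-shared secret, `exp` expiry, `jti` replay tracking) can be sketched without any framework; a real deployment would use a JWT library behind FastAPI's `HTTPBearer`, and the secret and field names here are illustrative:

```python
import base64, hashlib, hmac, json, time, uuid

SECRET = b"pre-shared-flight-secret"   # provisioned on both sides pre-flight
_seen_jti = set()                      # jti values already accepted

def issue_token(flight_seconds: int = 3600) -> str:
    """HMAC-sign a claims blob carrying `exp` and a unique `jti`."""
    claims = {"exp": int(time.time()) + flight_seconds, "jti": uuid.uuid4().hex}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def verify_token(token: str) -> bool:
    """Reject tampered, expired, or replayed tokens."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False                   # signature mismatch: tampered
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time() or claims["jti"] in _seen_jti:
        return False                   # expired or replayed
    _seen_jti.add(claims["jti"])
    return True
```

Note that verifying the same token twice fails by design: the tracked `jti` implements the replay prevention listed in the table.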
### 4. Visual Odometry (cuVSLAM)

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Detect camera feed tampering | LOW | Sanity checks on frame consistency | If consecutive frames show implausible motion (>500m displacement at 3fps), flag as suspicious. An ESKF covariance spike triggers satellite re-localization |
| Protect against VO poisoning | LOW | Cross-validate VO with IMU | ESKF fusion inherently cross-validates: IMU/VO disagreement raises covariance and triggers satellite matching. No single sensor can silently corrupt position |
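
The frame-consistency check above reduces to a displacement-plausibility test between consecutive VO poses. The 500m threshold is from the table; the function name is illustrative:

```python
# Sketch of the VO sanity check: at ~3 fps, a displacement above 500 m
# between consecutive frames is physically implausible for this airframe
# and is flagged rather than fused.
MAX_DISPLACEMENT_M = 500.0

def plausible_motion(prev_pos, new_pos) -> bool:
    """prev_pos/new_pos: (x, y, z) in metres from consecutive VO frames."""
    dx, dy, dz = (n - p for n, p in zip(new_pos, prev_pos))
    return (dx * dx + dy * dy + dz * dz) ** 0.5 <= MAX_DISPLACEMENT_M
```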
### 5. Satellite Image Matching

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Verify tile integrity | MEDIUM | SHA-256 checksums per tile | During offline preprocessing: compute SHA-256 for each tile pair. Store checksums in a signed manifest. At runtime: verify the checksum on tile load |
| Prevent adversarial tile injection | MEDIUM | Signed tile manifest | The offline tool signs the manifest with a private key. The Jetson verifies the signature with an embedded public key before accepting the tile set |
| Detect satellite match outliers | MEDIUM | RANSAC inlier ratio threshold | If the RANSAC inlier ratio is <30%, reject the match as unreliable. The ESKF treats it as no measurement rather than a bad measurement |
| Protect against ground-based adversarial patterns | VERY LOW | Multi-tile consensus | Match against multiple overlapping tiles. Physical adversarial patches affect a local area — consensus voting across tiles detects anomalies |
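
The per-tile checksum plus authenticated manifest flow can be sketched as follows. HMAC stands in for the public-key signature described above (the real system would verify with the embedded public key, e.g. inside the OP-TEE TA); the key and field names are illustrative:

```python
import hashlib, hmac, json

# Sketch of tile integrity: SHA-256 per tile recorded in a manifest,
# manifest authenticated before any checksum is trusted.
MANIFEST_KEY = b"embedded-verification-key"   # assumed, for illustration

def build_manifest(tiles: dict) -> dict:
    """tiles: tile_id -> raw bytes. Run offline during preprocessing."""
    sums = {tid: hashlib.sha256(data).hexdigest() for tid, data in tiles.items()}
    payload = json.dumps(sums, sort_keys=True).encode()
    return {"sha256": sums,
            "sig": hmac.new(MANIFEST_KEY, payload, hashlib.sha256).hexdigest()}

def verify_tile(manifest: dict, tile_id: str, data: bytes) -> bool:
    """Run at tile-load time on the Jetson."""
    payload = json.dumps(manifest["sha256"], sort_keys=True).encode()
    good_sig = hmac.new(MANIFEST_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(manifest["sig"], good_sig):
        return False                          # manifest itself was tampered
    return manifest["sha256"].get(tile_id) == hashlib.sha256(data).hexdigest()
```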
### 6. Sensor Fusion (ESKF)

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Prevent single-sensor corruption of position | HIGH | Adaptive noise + outlier rejection | Mahalanobis distance test on each measurement. Reject updates >5σ from the predicted state. No single measurement can cause a >50m position jump |
| Detect systematic drift | MEDIUM | Satellite matching rate monitoring | If satellite matches consistently disagree with VO by >100m, flag an integrity warning to the operator |
| Protect fusion state | LOW | In-memory only, no persistence | ESKF state never written to disk. Lost on power-off — no forensic recovery |
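
The Mahalanobis gate in the first row can be sketched for a position measurement. For clarity this uses a diagonal covariance; the real ESKF would use the full innovation covariance S = H P Hᵀ + R. The 5σ threshold is from the table:

```python
import math

# Sketch of the per-measurement outlier gate: reject any update whose
# normalized innovation exceeds 5 sigma of the predicted state.
GATE_SIGMA = 5.0

def accept_measurement(pred, meas, var) -> bool:
    """pred/meas: (x, y, z) position in metres; var: per-axis variance
    (diagonal innovation covariance, an illustrative simplification)."""
    d2 = sum((m - p) ** 2 / v for p, m, v in zip(pred, meas, var))
    return math.sqrt(d2) <= GATE_SIGMA
```

A rejected measurement is simply dropped (treated as "no measurement", matching the RANSAC-rejection policy in the satellite matching section), so no single corrupted sensor reading can yank the state.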
### 7. Offline Preprocessing (Developer Machine)

| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Protect Google Maps API key | MEDIUM | Environment variable, never in code | `.env` file excluded from version control. API key used only on the developer machine, never deployed to the Jetson |
| Validate downloaded tiles | LOW | Source verification | Download only from the Google Maps Tile API via HTTPS. Verify the TLS certificate chain |
| Secure tile transfer to Jetson | MEDIUM | Signed + encrypted transfer | Transfer the tile set + signed manifest via an encrypted channel (SCP/SFTP). Verify the manifest signature on the Jetson before accepting |

## Security Controls Summary

### Authentication & Authorization

- **Mechanism**: Pre-shared JWT secret between ground station and Jetson
- **Scope**: All API endpoints require a valid JWT bearer token
- **Session model**: One JWT per flight session; expires at session end
- **No user management on Jetson** — single-operator system; auth is device-to-device

### Data Protection

| State | Protection | Tool |
|-------|-----------|------|
| At rest (Jetson storage) | LUKS full-disk encryption | JetPack LUKS + OP-TEE |
| In transit (SSE stream) | TLS 1.3 | Uvicorn SSL |
| In memory (ESKF state) | No persistence, cleared on session end | Application logic |
| On capture (emergency) | Tamper-triggered LUKS key erase | Hardware dead-man switch + `cryptsetup luksErase` |

### Secure Communication

- Ground station ↔ Jetson: TLS 1.3 (self-signed cert, pre-installed)
- No internet connectivity during flight — no external attack surface
- RF link security is out of scope (handled by the UAV communication system)

### Logging & Monitoring

| What | Where | Retention |
|------|-------|-----------|
| API access logs (request count, errors) | In-memory ring buffer | Current session only, not persisted |
| Security events (auth failures, integrity warnings) | In-memory + SSE alert to operator | Current session only |
| Position history | In-memory for refinement, SSE to ground station | NOT persisted on Jetson after session end |
| Crash/tamper events | Trigger secure erase, no logging | N/A — priority is data destruction |

**Design principle**: Minimize data persistence on Jetson. The ground station is the system of record. Jetson stores only what's needed for the current flight — satellite tiles (encrypted at rest) and transient processing state (memory only).
## Protected Code Execution (OP-TEE / ARM TrustZone)

### Overview

The Jetson Orin Nano Super supports hardware-enforced protected code execution via **ARM TrustZone** and **OP-TEE v4.2.0** (included in Jetson Linux 36.3+). TrustZone partitions the processor into two isolated worlds:

- **Secure World** (TEE): Runs at ARMv8 S-EL1 (the secure OS) and S-EL0 (trusted apps). Code here cannot be read or tampered with from the normal world. OP-TEE is the secure OS.
- **Normal World**: Standard Linux (JetPack). Our Python application, cuVSLAM, and FastAPI all run here.

Trusted Applications (TAs) execute inside the secure world and are invoked by Client Applications (CAs) in the normal world via the GlobalPlatform TEE Client API.

### Architecture on Jetson Orin Nano

```
┌─────────────────────────────────────────────────┐
│  NORMAL WORLD (Linux / JetPack)                 │
│                                                 │
│  Client Application (CA)                        │
│    ↕ libteec.so (TEE Client API)                │
│    ↕ OP-TEE Linux Kernel Driver                 │
│    ↕ ARM Trusted Firmware (ATF) / Monitor       │
├─────────────────────────────────────────────────┤
│  SECURE WORLD (OP-TEE v4.2.0)                   │
│                                                 │
│  OP-TEE OS (ARMv8 S-EL1)                        │
│   ├── jetson-user-key PTA (key management)      │
│   ├── luks TA (disk encryption passphrase)      │
│   ├── hwkey-agent TA (encrypt/decrypt data)     │
│   ├── PKCS #11 TA (crypto token interface)      │
│   └── Custom TAs (our application-specific TAs) │
│                                                 │
│  Hardware: Security Engine (SE), HW RNG, Fuses  │
│  TZ-DRAM: Dedicated memory carveout             │
└─────────────────────────────────────────────────┘
```

### Key Hierarchy (Hardware-Backed)

The Jetson Orin Nano provides a hardware-rooted key hierarchy via the Security Engine (SE) and Encrypted Key Blob (EKB):

```
OEM_K1 fuse (256-bit AES, burned into hardware, cannot be read by software)
│
├── EKB_RK (EKB Root Key, derived via AES-128-ECB from OEM_K1 + FV)
│     ├── EKB_EK (encryption key for EKB content)
│     └── EKB_AK (authentication key for EKB content)
│
├── HUK (Hardware Unique Key, per-device, derived via NIST-SP-800-108)
│     └── SSK (Secure Storage Key, per-device, generated at OP-TEE boot)
│           ├── TSK (TA Storage Key, per-TA)
│           └── FEK (File Encryption Key, per-file)
│
└── LUKS passphrase (derived from disk encryption key stored in EKB)
```

Fuse keys are loaded into SE keyslots during early boot (before OP-TEE starts). Software cannot read keys from keyslots — only derive new keys through the SE. After use, keyslots should be cleared via `tegra_se_clear_aes_keyslots()`.
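
The NIST-SP-800-108 derivation named in the hierarchy above is a counter-mode KDF. The sketch below shows the construction with HMAC-SHA256 for illustration only; the real derivation runs inside the SE/OP-TEE, and the exact label/context encoding is device- and firmware-specific:

```python
import hashlib, hmac

# Sketch of NIST SP 800-108 KDF in counter mode (HMAC-SHA256 PRF):
# out = HMAC(key, counter || label || 0x00 || context || output_bits)
# concatenated until out_len bytes are produced.
def kdf_ctr(key: bytes, label: bytes, context: bytes, out_len: int = 32) -> bytes:
    out = b""
    counter = 1
    while len(out) < out_len:
        msg = (counter.to_bytes(4, "big") + label + b"\x00" + context
               + (out_len * 8).to_bytes(4, "big"))
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return out[:out_len]
```

Distinct labels (e.g. per-TA) yield independent keys from the same parent, which is how one HUK fans out into SSK/TSK/FEK without any derived key revealing another.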
### What to Run in the Secure World (Our Use Cases)

| Use Case | TA Type | Purpose |
|----------|---------|---------|
| **LUKS disk encryption** | Built-in `luks` TA | Generate a one-time passphrase at boot to unlock the encrypted rootfs. Keys never leave the secure world |
| **Tile manifest verification** | Custom User TA | Verify SHA-256 signatures of satellite tile manifests. Signing key stored in EKB, accessible only in the secure world |
| **JWT secret storage** | Custom User TA or `hwkey-agent` TA | Store the JWT signing secret in EKB. Sign/verify JWTs inside the secure world — the secret is never exposed to Linux |
| **Secure erase trigger** | Custom User TA | Receive the tamper signal → invoke `cryptsetup luksErase` via the CA. Key-erase logic runs in the secure world to prevent normal-world interference |
| **TLS private key protection** | PKCS #11 TA | Store the TLS private key in OP-TEE secure storage. Uvicorn uses the PKCS #11 interface to perform the TLS handshake without the key leaving the secure world |

### How to Enable Protected Code Execution

#### Step 1: Burn OEM_K1 Fuse (One-Time, Irreversible)

```bash
# Generate a 256-bit OEM_K1 key (use an HSM in production)
openssl rand -hex 32 > oem_k1_key.txt

# Create the fuse configuration XML with OEM_K1
# Burn the fuse via odmfuse.sh (IRREVERSIBLE)
sudo ./odmfuse.sh -i <fuse_config.xml> <board_name>
```
After burning `SecurityMode` fuse (`odm_production_mode=0x1`), all further fuse writes are blocked. OEM_K1 becomes permanently embedded in hardware.

#### Step 2: Generate and Flash EKB

```bash
# Generate user keys (disk encryption key, JWT secret, tile signing key)
openssl rand -hex 16 > disk_enc_key.txt
openssl rand -hex 32 > jwt_secret.txt
openssl rand -hex 32 > tile_signing_key.txt

# Generate the EKB binary
python3 gen_ekb.py -chip t234 \
    -oem_k1_key oem_k1_key.txt \
    -in_sym_key uefi_enc_key.txt \
    -in_sym_key2 disk_enc_key.txt \
    -in_auth_key uefi_var_auth_key.txt \
    -out eks_t234.img

# Flash the EKB to the EKS partition
# (part of the normal flash process with secure boot enabled)
```
#### Step 3: Enable LUKS Disk Encryption

```bash
# During flash, set ENC_ROOTFS=1 to encrypt the rootfs
export ENC_ROOTFS=1
sudo ./flash.sh <board_name> <storage_device>
```
The `luks` TA in OP-TEE derives a passphrase from the disk encryption key in EKB at boot. The passphrase is generated inside the secure world and passed to `cryptsetup` — it never exists in persistent storage.
#### Step 4: Develop Custom Trusted Applications
Cross-compile TAs for aarch64 using the Jetson OP-TEE source package:

```bash
# Build a custom TA (e.g., the tile manifest verifier)
make -C <ta_source_dir> \
    CROSS_COMPILE="<toolchain>/bin/aarch64-buildroot-linux-gnu-" \
    TA_DEV_KIT_DIR="<optee_src>/optee/build/t234/export-ta_arm64/" \
    OPTEE_CLIENT_EXPORT="<optee_src>/optee/install/t234/usr" \
    TEEC_EXPORT="<optee_src>/optee/install/t234/usr" \
    -j"$(nproc)"

# Deploy: copy the TA to /lib/optee_armtz/ on the Jetson
# Deploy: copy the CA to /usr/sbin/ on the Jetson
```
TAs conform to the GlobalPlatform TEE Internal Core API. Use the `hello_world` example from `optee_examples` as a starting template.
#### Step 5: Enable Secure Boot

```bash
# Generate the PKC key pair (use an HSM for production)
openssl genrsa -out pkc_key.pem 3072

# Sign and flash the secured images
sudo ./flash.sh --sign pkc_key.pem <board_name> <storage_device>

# After verification, burn the SecurityMode fuse (IRREVERSIBLE)
```

### Available Crypto Services in Secure World

| Service | Provider | Notes |
|---------|----------|-------|
| AES-128/256 encryption/decryption | SE hardware | Via keyslot-derived keys; keys never leave the SE |
| Key derivation (NIST-SP-800-108) | `jetson-user-key` PTA | Derive purpose-specific keys from EKB keys |
| Hardware RNG | SE hardware | `TEE_GenerateRandom()` or a PTA command |
| PKCS #11 crypto tokens | PKCS #11 TA | Standard crypto interface for TLS, signing |
| SHA-256, HMAC | MbedTLS (bundled in optee_os) | Software crypto in the secure world |
| RSA/ECC signing | GlobalPlatform TEE Crypto API | For manifest signature verification |

### Limitations on Orin Nano

| Limitation | Impact | Workaround |
|-----------|--------|------------|
| No RPMB support (only AGX Orin has RPMB) | Secure storage uses the REE FS instead of replay-protected memory | Acceptable — LUKS encryption protects data at rest, and REE FS secure storage is encrypted by the SSK |
| EKB can only be updated via OTA, not at runtime | Cannot rotate keys in flight | Pre-provision per-device unique keys at manufacturing time |
| OP-TEE hello_world reported issues on some Orin Nano units | Some users report initialization failures | Use JetPack 6.2.2+, which includes fixes. Test thoroughly on the target hardware |
| TZ-DRAM is a fixed carveout | Limits secure world memory | Keep TAs lightweight — only crypto operations and key management, not data processing |

### Security Hardening Checklist

- [ ] Burn the OEM_K1 fuse with a unique per-device key (via HSM)
- [ ] Generate and flash the EKB with the disk encryption key, JWT secret, and tile signing key
- [ ] Enable LUKS full-disk encryption (`ENC_ROOTFS=1`)
- [ ] Modify `luks-srv` to NOT auto-decrypt (require an explicit trigger or the dead-man switch)
- [ ] Burn the SecurityMode fuse (`odm_production_mode=0x1`) — enables the secure boot chain
- [ ] Burn the debug-disable fuses — disables JTAG
- [ ] Configure the anti-rollback ratchet fuses
- [ ] Clear SE keyslots after EKB extraction via `tegra_se_clear_aes_keyslots()`
- [ ] Deploy custom TAs for JWT signing and tile manifest verification
- [ ] Use PKCS #11 for TLS private key protection
- [ ] Test the secure erase trigger end-to-end
- [ ] Run `xtest` (the OP-TEE test suite) on a production Jetson to validate the TEE

## Key Security Risks

| Risk | Severity | Mitigation Status |
|------|---------|-------------------|
| Physical capture → data extraction | CRITICAL | Mitigated by LUKS + secure erase. Residual risk: attacker extracts RAM before the erase triggers |
| LUKS auto-decrypt not yet disabled | HIGH | Requires a custom `luks-srv` modification — development effort needed |
| Self-signed TLS certificates | MEDIUM | Acceptable for field deployment. Certificate pinning on the ground station prevents MITM |
| cuVSLAM is closed-source | LOW | Cannot audit for vulnerabilities. Mitigated by sandboxed execution and input validation on camera frames |
| Dead-man switch reliability | HIGH | Hardware integration required. False triggers (temporary signal loss) must NOT cause premature erase. Needs careful threshold tuning |

## References

- xT-STRIDE threat model for UAVs (2025): https://link.springer.com/article/10.1007/s10207-025-01082-4
- NVIDIA Jetson OP-TEE Documentation (r36.4.4): https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/OpTee.html
- NVIDIA Jetson Security Overview (r36.4.3): https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security.html
- NVIDIA Jetson LUKS Disk Encryption: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/DiskEncryption.html
- NVIDIA Jetson Secure Boot: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/SecureBoot.html
- NVIDIA Jetson Secure Storage: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/SecureStorage.html
- NVIDIA Jetson Firmware TPM: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/FirmwareTPM.html
- NVIDIA Jetson Rollback Protection: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/RollbackProtection.html
- Jetson Orin Fuse Specification: https://developer.nvidia.com/downloads/jetson-agx-orin-series-fuse-specification
- OP-TEE Official Documentation: https://optee.readthedocs.io/en/latest/
- OP-TEE Trusted Application Examples: https://github.com/linaro-swg/optee_examples
- RidgeRun OP-TEE on Jetson Guide: https://developer.ridgerun.com/wiki/index.php/RidgeRun_Platform_Security_Manual/Getting_Started/TEE/NVIDA-Jetson
- GlobalPlatform TEE Specifications: https://globalplatform.org/specs-library/?filter-committee=tee
- Model Agnostic Defense against Adversarial Patches on UAVs (2024): https://arxiv.org/html/2405.19179v1
- FastAPI Security Best Practices (2026): https://fastlaunchapi.dev/blog/fastapi-best-practices-production-2026

# Solution Draft

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

**Hard constraint**: the camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.

**Satellite matching strategy**: benchmark LiteSAM on actual Orin Nano Super hardware as a day-one priority. If LiteSAM cannot achieve ≤400ms at 480px resolution, **abandon it entirely** and use XFeat semi-dense matching as the primary satellite matcher. Speed is non-negotiable.

**Core architectural principles**:

1. **cuVSLAM handles VO** — NVIDIA's CUDA-accelerated library achieves 90fps on Jetson Orin Nano, giving VO essentially "for free" (~11ms/frame).
2. **Keyframe-based satellite matching** — the satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — this handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.

```
┌───────────────────────────────────────────────────────────────┐
│                   OFFLINE (Before Flight)                     │
│  Satellite Tiles → Download & Crop → Store as tile pairs      │
│  (Google Maps)    (per flight plan)  (disk, GeoHash indexed)  │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                   ONLINE (During Flight)                      │
│                                                               │
│  EVERY FRAME (400ms budget):                                  │
│  ┌───────────────────────────────┐                            │
│  │ Camera → Downsample (CUDA 2ms)│                            │
│  │       → cuVSLAM VO+IMU (~11ms)│──→ ESKF Update → SSE Emit  │
│  └───────────────────────────────┘          ↑                 │
│                                             │                 │
│  KEYFRAMES ONLY (every 3-10 frames):        │                 │
│  ┌────────────────────────────────────┐     │                 │
│  │ Satellite match (async CUDA stream)│─────┘                 │
│  │ LiteSAM or XFeat (see benchmark)   │                       │
│  │ (does NOT block VO output)         │                       │
│  └────────────────────────────────────┘                       │
│                                                               │
│  IMU: 100+Hz continuous → ESKF prediction                     │
└───────────────────────────────────────────────────────────────┘
```
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~11ms/frame)
NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 90fps on Jetson Orin Nano. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs. This eliminates custom VO entirely.
### 2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes**, selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when the ESKF covariance exceeds a threshold
  - VO failure: when cuVSLAM reports tracking loss (e.g., a sharp turn)
### 3. Satellite Matcher Selection (Benchmark-Driven)

**Candidate A: LiteSAM (opt)** — best accuracy for satellite-aerial matching (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne + TAIFormer + MinGRU. Benchmarked at 497ms on Jetson AGX Orin at 1184px. AGX Orin is 3-4x more powerful than the Orin Nano Super (275 TOPS vs 67 TOPS, $2000+ vs $249).

Realistic Orin Nano Super estimates:

- At 1184px: ~1.5-2.0s (unusable)
- At 640px: ~500-800ms (borderline)
- At 480px: ~300-500ms (best case)

**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. Not specifically designed for cross-view satellite-aerial matching, but fast and reliable.

**Decision rule**: Benchmark LiteSAM TensorRT FP16 at 480px on Orin Nano Super. If ≤400ms → use LiteSAM. If >400ms → **abandon LiteSAM, use XFeat as primary**. No hybrid compromises — pick one and optimize it.
### 4. TensorRT FP16 Optimization
LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. INT8 is possible for MobileOne backbone but ViT/transformer components may degrade with INT8.
### 5. CUDA Stream Pipelining
Overlap operations across consecutive frames:
- Stream A: cuVSLAM VO for the current frame (~11ms) + ESKF fusion (~1ms)
- Stream B: satellite matching for the previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic

### 6. Pre-cropped Satellite Tiles
Offline: for each satellite tile, store both the raw image and a pre-resized version matching the satellite matcher's input resolution. Runtime avoids resize cost.
## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (90fps) | VO only, no satellite matching |

**Key insight**: Combine cuVSLAM (best-in-class VO for Jetson) with the fastest viable satellite-aerial matcher via ESKF fusion. LiteSAM is the accuracy leader but unproven on Orin Nano Super — benchmark first, abandon for XFeat if too slow.
## Architecture
### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM / C++ API | 90fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — purpose-built by NVIDIA for Jetson. ~11ms/frame leaves 389ms for everything else. Auto-fallback to IMU when visual tracking fails.
### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| LiteSAM (opt) | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | 497ms on AGX Orin at 1184px; AGX Orin is 3-4x more powerful than Orin Nano Super | ✅ If benchmark passes |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven | Not designed for cross-view satellite-aerial | ✅ If LiteSAM fails benchmark |
| EfficientLoFTR | TensorRT | Good accuracy, semi-dense | 15.05M params (2.4x LiteSAM), slower | ⚠️ Heavier |
| SuperPoint + LightGlue | TensorRT C++ | Good general matching | Sparse only, worse on satellite-aerial | ⚠️ Not specialized |

**Selection**: Benchmark-driven. Day-one test on Orin Nano Super:
|
||||
1. Export LiteSAM (opt) to TensorRT FP16
|
||||
2. Measure at 480px, 640px, 800px
|
||||
3. If ≤400ms at 480px → **LiteSAM**
|
||||
4. If >400ms at any viable resolution → **XFeat semi-dense** (primary, no hybrid)
### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ❌ Too heavy |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.

Measurement sources and rates:

- IMU prediction: 100+Hz
- cuVSLAM VO update: ~3Hz (every frame)
- Satellite update: ~0.3-1Hz (keyframes only, delayed via async pipeline)
### Component: Satellite Tile Preprocessing (Offline)

**Selected**: **GeoHash-indexed tile pairs on disk**.

Pipeline:

1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
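To make step 4 concrete, here is a minimal sketch of a GeoHash-keyed tile path. The `tile_path` layout (prefix directory + 6-character hash) is an illustrative assumption, not the project's actual on-disk format; the `geohash` function itself is the standard bit-interleaved base32 encoding.

```python
# Standard geohash alphabet (base32 without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 6) -> str:
    """Standard geohash: interleave lon/lat range-halving bits, base32-encode."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    code, bits, ch, even = [], 0, 0, True
    while len(code) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val > mid:
            ch = ch * 2 + 1   # upper half -> bit 1
            rng[0] = mid
        else:
            ch = ch * 2       # lower half -> bit 0
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:         # 5 bits per base32 character
            code.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(code)

def tile_path(lat: float, lon: float, zoom: int) -> str:
    """Directory key for a tile pair; the geohash prefix groups nearby tiles."""
    gh = geohash(lat, lon, 6)
    return f"tiles/{gh[:3]}/{gh}/z{zoom}"
```

Nearby tiles share a hash prefix, so a radius lookup reduces to listing a handful of prefix directories.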
### Component: Re-localization (Disconnected Segments)

**Selected**: **Keyframe satellite matching is always active + expanded search on VO failure**.

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Immediately flag next frame as keyframe → trigger satellite matching
2. Expand tile search radius (from ±200m to ±1km based on IMU dead-reckoning uncertainty)
3. If match found: position recovered, new segment begins
4. If 3+ consecutive keyframe failures: request user input via API
### Component: Object Center Coordinates

Geometric calculation once frame-center GPS is known:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
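The four steps above can be sketched as follows. The axis conventions (x right, y up in the image; yaw measured clockwise from north) and the equirectangular metres-per-degree approximation are assumptions for illustration; the real implementation must match the camera and IMU frame definitions.

```python
import math

def pixel_to_gps(center_lat: float, center_lon: float,
                 dx_px: float, dy_px: float,
                 gsd_m: float, yaw_rad: float) -> tuple[float, float]:
    """Convert a pixel offset from the frame center to lat/lon.

    dx_px: +right, dy_px: +up in the image (assumed convention).
    yaw_rad: heading, 0 = north, clockwise positive (assumed convention).
    gsd_m: ground sample distance in metres per pixel.
    """
    dx_m = dx_px * gsd_m                      # step 2: pixels -> metres
    dy_m = dy_px * gsd_m
    # step 3: rotate camera-frame offset into east/north by the UAV heading
    east = dx_m * math.cos(yaw_rad) + dy_m * math.sin(yaw_rad)
    north = -dx_m * math.sin(yaw_rad) + dy_m * math.cos(yaw_rad)
    # step 4: metres -> degrees (equirectangular approximation, fine at <1km)
    dlat = north / 111_320.0
    dlon = east / (111_320.0 * math.cos(math.radians(center_lat)))
    return center_lat + dlat, center_lon + dlon
```

With yaw = 0 the rotation is the identity, so a purely vertical pixel offset moves the point due north.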
### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~11ms | NVIDIA CUDA-optimized, 90fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~25ms** | Well within 400ms |
### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.

**Path A — LiteSAM (if benchmark passes)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to ~480px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~5ms | Pre-resized, from storage |
| LiteSAM (opt) matching | ~300-500ms | TensorRT FP16, 480px, Orin Nano Super estimate |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **~310-510ms** | Async, does not block VO |
**Path B — XFeat (if LiteSAM abandoned)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |
### Per-Frame Wall-Clock Latency

Every frame:

- **VO result emitted in ~25ms** (cuVSLAM + ESKF + SSE)
- Satellite correction arrives asynchronously on keyframes
- Client gets an immediate position, then a refined position when the satellite match completes
## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-300MB | NVIDIA CUDA library + internal state |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| Satellite tile (pre-resized) | ~1MB | Single active tile |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~1.9-2.4GB** | ~25-30% of 8GB — comfortable margin |
## Confidence Scoring

| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| LiteSAM too slow on Orin Nano Super | HIGH | Misses 400ms deadline | **Abandon LiteSAM, use XFeat**. Day-one benchmark is the go/no-go gate |
| cuVSLAM not supporting nadir-only camera well | MEDIUM | VO accuracy degrades | Fall back to XFeat frame-to-frame matching |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| XFeat cross-view accuracy insufficient | MEDIUM | Position corrections less accurate than LiteSAM | Increase keyframe frequency; multi-tile consensus voting; geometric verification with strict RANSAC |
| cuVSLAM is closed-source | LOW | Hard to debug | Fallback to XFeat VO; cuVSLAM has Python+C++ APIs |
## Testing Strategy

### Integration / Functional Tests

- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API

### Non-Functional Tests

- **Day-one benchmark**: LiteSAM TensorRT FP16 at 480/640/800px on Orin Nano Super → go/no-go for LiteSAM
- cuVSLAM benchmark: verify 90fps monocular+IMU on Orin Nano Super
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 1000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
## References

- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
@@ -0,0 +1,356 @@
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| LiteSAM at 480px as satellite matcher | **Performance**: 497ms on AGX Orin at 1184px. Orin Nano Super is ~3-4x slower. At 480px estimated ~270-360ms — borderline. Paper uses PyTorch AMP, not TensorRT FP16. TensorRT could bring 2-3x improvement. | Add TensorRT FP16 as mandatory optimization step. Revised estimate at 480px with TensorRT: ~90-180ms. Still benchmark-driven: abandon if >400ms. |
| XFeat as LiteSAM fallback for satellite matching | **Functional**: XFeat is a general-purpose feature matcher, NOT designed for the cross-view satellite-aerial gap. May fail on season/lighting differences between UAV and satellite imagery. | **Expand fallback options**: benchmark EfficientLoFTR (designed for weak-texture aerial) alongside XFeat. Consider STHN-style deep homography as a third option. See detailed satellite matcher comparison below. |
| SP+LG considered as "sparse only, worse on satellite-aerial" | **Functional**: LiteSAM paper confirms "SP+LG achieves fastest inference speed but at the expense of accuracy." Sparse matching fails on texture-scarce regions. ~180-360ms on Orin Nano Super. | **Reject SP+LG** for both VO and satellite matching. cuVSLAM is 15-33x faster for VO. |
| cuVSLAM on low-texture terrain | **Functional**: cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade tracking. On uniform agricultural fields/water bodies, features will be sparse → frequent tracking loss. IMU fallback lasts only ~1s. No published benchmarks for nadir agricultural terrain. Does NOT guarantee pose recovery after tracking loss. | **CRITICAL RISK**: cuVSLAM will likely fail frequently over low-texture terrain. Mitigation: (1) increase satellite matching frequency in low-texture areas, (2) use IMU dead-reckoning bridge, (3) accept higher drift in featureless segments, (4) XFeat VO as secondary fallback may also struggle on the same terrain. |
| cuVSLAM memory estimate ~200-300MB | **Performance**: Map grows over time. For 3000-frame flights (~16min at 3fps), map could reach 500MB-1GB without pruning. | Configure cuVSLAM map pruning. Set max keyframes. Monitor memory. |
| Tile search on VO failure: "expand to ±1km" | **Functional**: Underspecified. Loading 10-20 tiles from disk is slow. | Preload tiles within ±2km of flight plan into RAM. Ranked search by IMU dead-reckoning position. |
| LiteSAM resolution | **Performance**: Paper benchmarked at 1184px on AGX Orin (497ms AMP). TensorRT FP16 with reparameterized MobileOne expected 2-3x faster. | Benchmark LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM at 1280px. If >200ms → use XFeat. |
| SP+LG proposed for VO by user | **Performance**: ~130-280ms/frame on Orin Nano. cuVSLAM ~8.6ms/frame. No IMU, no loop closure. | **Reject SP+LG for VO.** cuVSLAM is 15-33x faster. XFeat frame-to-frame remains the fallback. |
## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.

**Hard constraint**: The camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.

**Satellite matching strategy**: Benchmark LiteSAM TensorRT FP16 at **1280px** on Orin Nano Super as a day-one priority. The paper's AGX Orin benchmark used PyTorch AMP — TensorRT FP16 with reparameterized MobileOne should yield a 2-3x additional speedup. **Decision rule: if LiteSAM TRT FP16 at 1280px runs in ≤200ms → use LiteSAM. If >200ms → use XFeat.**

**Core architectural principles**:

1. **cuVSLAM handles VO** — 116fps on Orin Nano 8GB, ~8.6ms/frame. SuperPoint+LightGlue was evaluated and rejected (15-33x slower, no IMU integration).
2. **Keyframe-based satellite matching** — the satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
5. **Proactive tile loading** — preload tiles within ±2km of the flight plan into RAM for fast lookup during expanded search.
```
┌─────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight)                                         │
│  Satellite Tiles → Download & Crop → Store as tile pairs        │
│  (Google Maps)     (per flight plan) (disk, GeoHash indexed)    │
└─────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight)                                          │
│                                                                 │
│  EVERY FRAME (400ms budget):                                    │
│  ┌────────────────────────────────┐                             │
│  │ Camera → Downsample (CUDA 2ms) │                             │
│  │ → cuVSLAM VO+IMU (~9ms)        │──→ ESKF Update → SSE Emit   │
│  └────────────────────────────────┘             ↑               │
│                                                 │               │
│  KEYFRAMES ONLY (every 3-10 frames):            │               │
│  ┌────────────────────────────────────┐         │               │
│  │ Satellite match (async CUDA stream)│─────────┘               │
│  │ LiteSAM TRT FP16 or XFeat          │                         │
│  │ (does NOT block VO output)         │                         │
│  └────────────────────────────────────┘                         │
│                                                                 │
│  IMU: 100+Hz continuous → ESKF prediction                       │
│  TILES: ±2km preloaded in RAM from flight plan                  │
└─────────────────────────────────────────────────────────────────┘
```
## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs.

**Why not SuperPoint+LightGlue for VO**: SP+LG is 15-33x slower (~130-280ms vs ~9ms) and lacks IMU integration, loop closure, and auto-fallback.

**CRITICAL: cuVSLAM on low-texture terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corner detection + Lucas-Kanade optical flow tracking (classical features, not learned). On uniform agricultural terrain or water bodies:

- Very few corners will be detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1 second) → constant-velocity integrator (~0.5s more)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
- All published benchmarks (KITTI: urban/suburban, EuRoC: indoor) exclude nadir agricultural terrain
- Multi-stereo mode helps with featureless surfaces, but we have a mono camera only

**Mitigation strategy for low-texture terrain**:

1. **Increase satellite matching frequency**: In low-texture areas (detected by cuVSLAM's keypoint count dropping), switch from every 3-10 frames to every frame
2. **IMU dead-reckoning bridge**: When cuVSLAM reports tracking loss, the ESKF continues with IMU prediction. At 3fps, a ~1.5s IMU bridge covers ~4-5 frames
3. **Accept higher drift**: In featureless segments, position accuracy degrades to IMU-only level (50-100m+ over ~10s). Satellite matching must recover absolute position when texture returns
4. **Keypoint density monitoring**: Track cuVSLAM's number of tracked features per frame. When below a threshold (e.g., <50), proactively trigger satellite matching
5. **XFeat frame-to-frame as VO fallback**: XFeat's learned features may detect texture invisible to Shi-Tomasi corners, though XFeat may also struggle on truly uniform terrain
### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching. Strategy:

- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
  - Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
  - Confidence drop: when ESKF covariance exceeds threshold
  - VO failure: when cuVSLAM reports tracking loss (sharp turn)
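The three triggers above compose into one predicate. A minimal sketch, with placeholder threshold values that would have to be tuned on real flight data:

```python
def is_keyframe(frame_idx: int, last_kf_idx: int, tracking_lost: bool,
                cov_trace: float, interval: int = 5,
                cov_threshold: float = 25.0) -> bool:
    """Keyframe trigger combining the three criteria above.

    cov_trace: trace of the ESKF position covariance (m^2).
    interval and cov_threshold are illustrative, not tuned values.
    """
    if tracking_lost:                 # VO failure -> immediate keyframe
        return True
    if cov_trace > cov_threshold:     # confidence drop
        return True
    return frame_idx - last_kf_idx >= interval   # fixed interval
```

Ordering matters only for readability: any single satisfied criterion forces a keyframe.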
### 3. Satellite Matcher Selection (Benchmark-Driven)

**Important context**: Our UAV-to-satellite matching is EASIER than typical cross-view geo-localization problems. Both the UAV camera and the satellite imagery are approximately nadir (top-down). The main challenges are season/lighting differences, resolution mismatch, and temporal changes — not the extreme viewpoint gap seen in ground-to-satellite matching. Even general-purpose matchers may therefore perform well.

**Candidate A: LiteSAM (opt) with TensorRT FP16 at 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne reparameterizable for TensorRT. Paper benchmarked at 497ms on AGX Orin using AMP at 1184px. TensorRT FP16 with reparameterized MobileOne is expected to be 2-3x faster than AMP. At 1280px (close to the paper's 1184px benchmark resolution), accuracy should match published results.

Orin Nano Super TensorRT FP16 estimate at 1280px:

- AGX Orin AMP @ 1184px: 497ms
- TRT FP16 speedup over AMP: ~2-3x → AGX Orin TRT estimate: ~165-250ms
- Orin Nano Super is ~3-4x slower → ~1500-2000ms without TRT, ~500-1000ms with TRT FP16
- Best case, if the reparameterization and FP16 gains fully compound: ~165-330ms
- Go/no-go threshold: **≤200ms**
**Candidate B (fallback): XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. General-purpose, not designed for the cross-view gap, but the fastest option. Since our cross-view gap is small (both nadir), XFeat may work adequately for this specific use case.

**Other evaluated options (not selected)**:

- **EfficientLoFTR**: Semi-dense, 15.05M params, handles weak texture well. ~20% slower than LiteSAM. Strong option if the LiteSAM codebase proves difficult to export to TRT, but a larger model footprint.
- **Deep Homography (STHN-style)**: End-to-end homography estimation, no feature/RANSAC pipeline. 4.24m at 50m range. Interesting future option but needs RGB retraining — higher implementation risk.
- **PFED and retrieval-based methods**: Image RETRIEVAL only (identifies which tile matches), not pixel-level matching. We already know which tile to use from the ESKF position.
- **SuperPoint+LightGlue**: Sparse matcher. LiteSAM paper confirms worse satellite-aerial accuracy. Slower than XFeat.

**Decision rule** (day-one on Orin Nano Super):

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → use LiteSAM at 1280px**
4. **If >200ms → use XFeat**
### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — the multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. **Do NOT use INT8 on transformer components** (TAIFormer) — accuracy degrades. INT8 is safe only for the MobileOne backbone CNN layers.
### 5. CUDA Stream Pipelining

Overlap operations across consecutive frames:

- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
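The overlap pattern can be illustrated with a toy two-stage pipeline. Here a worker thread stands in for the second CUDA stream; the function names (`vo_fn`, `sat_fn`, `is_kf`) are placeholders, not project APIs. The point is structural: per-frame VO results are emitted immediately, while keyframe satellite work is queued and never blocks them.

```python
import queue
import threading

def run_pipeline(frames, vo_fn, sat_fn, is_kf):
    """Toy pipeline: VO runs inline per frame; satellite matching for
    keyframes runs on a worker thread (stand-in for a second CUDA stream)."""
    sat_q: queue.Queue = queue.Queue()
    results = []

    def sat_worker():
        while True:
            item = sat_q.get()
            if item is None:          # sentinel: no more keyframes
                return
            results.append(("sat", sat_fn(item)))

    t = threading.Thread(target=sat_worker)
    t.start()
    for i, frame in enumerate(frames):
        results.append(("vo", vo_fn(frame)))   # emitted every frame
        if is_kf(i):
            sat_q.put(frame)                   # async correction, non-blocking
    sat_q.put(None)
    t.join()
    return results
```

Satellite results interleave nondeterministically with VO results, which is exactly the "delayed measurement" the ESKF must handle.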
### 6. Proactive Tile Loading

**Change from draft01**: Instead of loading tiles on demand from disk, preload tiles within ±2km of the flight plan into RAM at session start. This eliminates disk I/O latency during flight. For a 50km flight path, ~2000 tiles at zoom 19 ≈ ~200MB RAM — well within budget.

On VO failure / expanded search:

1. Compute IMU dead-reckoning position
2. Rank preloaded tiles by distance to predicted position
3. Try top 3 tiles (not all tiles in ±1km radius)
4. If no match in top 3, expand to next 3
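Steps 2-4 amount to a batched nearest-first search. A minimal sketch, assuming tiles are held as a `tile_id -> (lat, lon)` centre map (an illustrative structure, not the project's actual tile index):

```python
import math

def ranked_tiles(tiles: dict, pred_lat: float, pred_lon: float, batch: int = 3):
    """Yield preloaded tile ids in batches, nearest-first to the
    IMU dead-reckoned position (pred_lat, pred_lon)."""
    def dist_m(center):
        # Equirectangular approximation: adequate for ranking within a few km.
        dlat = (center[0] - pred_lat) * 111_320.0
        dlon = (center[1] - pred_lon) * 111_320.0 * math.cos(math.radians(pred_lat))
        return math.hypot(dlat, dlon)

    order = sorted(tiles, key=lambda t: dist_m(tiles[t]))
    for i in range(0, len(order), batch):
        yield order[i:i + batch]   # try one batch; expand only on failure
```

The matcher tries each batch in turn and stops at the first successful geometric verification, so the common case touches only three tiles.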
## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| VRLM (2024) | FocalNet backbone + multi-scale feature fusion | 83.35% MA@20 | Desktop | Not edge-optimized |
| Scale-Aware UAV-to-Satellite (2026) | Semantic geometric + metric scale recovery | N/A | Desktop | Addresses scale ambiguity problem |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection, semi-dense | Competitive with LiteSAM | 2.5x faster than LoFTR, TRT available | 15.05M params, heavier than LiteSAM |
| PFED (2025) | Knowledge distillation + multi-view refinement, retrieval | 97.15% Recall@1 (University-1652) | AGX Orin (251.5 FPS) | Retrieval only, not pixel-level matching |
| STHN (IEEE RA-L 2024) | Deep homography estimation, coarse-to-fine | 4.24m at 50m range | Open-source, lightweight | Trained on thermal, needs RGB retraining |
| Hierarchical AVL (2025) | DINOv2 retrieval + SuperPoint matching | 64.5-95% success rate | ROS, IMU integration | Two-stage complexity |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Designed for Mars/planetary, needs adaptation |
## Architecture

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ~30-50ms/frame | ⚠️ Fallback |
| SuperPoint+LightGlue | LightGlue-ONNX TRT | Good accuracy, adaptive pruning | ~130-280ms, no IMU, no loop closure | ~130-280ms/frame | ❌ Rejected |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ~33ms/frame | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built by NVIDIA for Jetson. Auto-fallback to IMU when visual tracking fails.

**SP+LG rejection rationale**: 15-33x slower than cuVSLAM. No built-in IMU fusion, loop closure, or tracking failure detection. Building these features around SP+LG would take significant development time and still be slower. XFeat at ~30-50ms is a better fallback for VO if cuVSLAM fails on the nadir camera.
### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | Untested on Orin Nano Super with TensorRT | Est. ~165-330ms @ 1280px TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven, fastest | General-purpose, not designed for cross-view. Our nadir-nadir gap is small → may work. | ~50-100ms | ✅ Fallback if LiteSAM >200ms |

**Selection**: Day-one benchmark on Orin Nano Super:

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → LiteSAM at 1280px**
4. **If >200ms → XFeat**
### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ~10-50ms/step | ❌ Too heavy |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
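The multi-rate pattern (high-rate IMU prediction, low-rate absolute position updates) is the essential mechanic. A deliberately reduced 2-state sketch, NOT the 16-state filter above — just enough to show the predict/update split that the real ESKF performs:

```python
import numpy as np

class ToyFilter:
    """2-state (position, velocity) Kalman filter illustrating the
    multi-rate pattern: 100+Hz IMU predicts, ~0.3-1Hz satellite updates.
    The production filter carries the full 16-state error-state vector."""

    def __init__(self):
        self.x = np.zeros(2)           # [position_m, velocity_m_s]
        self.P = np.eye(2)             # state covariance

    def predict(self, accel: float, dt: float):
        """IMU-driven prediction step (runs at IMU rate)."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt**2, dt]) * accel
        Q = np.eye(2) * 0.01 * dt      # illustrative process noise
        self.P = F @ self.P @ F.T + Q

    def update_position(self, z: float, r: float):
        """Absolute position update (satellite match, runs on keyframes)."""
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + r       # innovation covariance
        K = self.P @ H.T / S           # Kalman gain
        self.x = self.x + (K * (z - self.x[0])).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```

Between satellite updates the covariance grows, which is exactly what drives the confidence levels and the covariance-based keyframe trigger.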
### Component: Satellite Tile Preprocessing (Offline)

**Selected**: **GeoHash-indexed tile pairs on disk + RAM preloading**.

Pipeline:

1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
6. **At session start**: preload tiles within ±2km of flight plan into RAM (~200MB for 50km route)
### Component: Re-localization (Disconnected Segments)

**Selected**: **Keyframe satellite matching is always active + ranked tile search on VO failure**.

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Immediately flag next frame as keyframe → trigger satellite matching
2. Compute IMU dead-reckoning position since last known position
3. Rank preloaded tiles by distance to dead-reckoning position
4. Try top 3 tiles sequentially (not all tiles in radius)
5. If match found: position recovered, new segment begins
6. If 3 consecutive keyframe failures across top tiles: expand to next 3 tiles
7. If still no match after 3+ full attempts: request user input via API
### Component: Object Center Coordinates

Geometric calculation once frame-center GPS is known:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
### Component: API & Streaming

**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
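The position stream reduces to an async generator producing SSE-framed events. A minimal sketch using only the standard library: in the real service the generator would be wrapped in sse-starlette's `EventSourceResponse` inside a FastAPI route; the event name and JSON fields here are illustrative, not a fixed wire schema.

```python
import asyncio
import json

async def position_events(q: asyncio.Queue):
    """Yield SSE-formatted position events from a queue fed by the ESKF.

    A `None` item is the session-end sentinel (an illustrative convention)."""
    while True:
        fix = await q.get()
        if fix is None:
            return
        # SSE wire format: optional "event:" line, "data:" line, blank line.
        yield f"event: position\ndata: {json.dumps(fix)}\n\n"

async def demo():
    q: asyncio.Queue = asyncio.Queue()
    q.put_nowait({"lat": 48.5, "lon": 37.9, "confidence": "HIGH"})
    q.put_nowait(None)
    return [e async for e in position_events(q)]

events = asyncio.run(demo())
```

Because the generator only awaits the queue, VO fixes and delayed satellite corrections can both be pushed into the same stream as they become available.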
## Processing Time Budget (per frame, 400ms budget)

### Normal Frame (non-keyframe, ~60-80% of frames)

| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~23ms** | Well within 400ms |
### Keyframe Satellite Matching (async, every 3-10 frames)

Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.

**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 matching | ≤200ms | TensorRT FP16, 1280px, go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async, within budget |
**Path B — XFeat (if LiteSAM >200ms)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |
## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state. **Configure map pruning for 3000-frame flights** |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan, pre-resized |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~2.1-2.9GB** | ~26-36% of 8GB — comfortable margin |
|
||||
|
||||
## Confidence Scoring
|
||||
|
||||
| Level | Condition | Expected Accuracy |
|
||||
|-------|-----------|-------------------|
|
||||
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
|
||||
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
|
||||
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
|
||||
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
|
||||
| MANUAL | User-provided position | As provided |
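The tiers above can be expressed as a small classifier. The sketch below is illustrative, not the implementation: the flag names and the 500m threshold mirror the table, everything else (function shape, string return values) is an assumption.

```python
def confidence_level(sat_match_ok: bool, vo_ok: bool,
                     dist_since_correction_m: float,
                     manual_hint: bool = False) -> str:
    """Map fusion state to the confidence tiers in the table above."""
    if manual_hint:
        return "MANUAL"
    if sat_match_ok and vo_ok:
        return "HIGH"
    if vo_ok and dist_since_correction_m < 500.0:
        return "MEDIUM"
    if vo_ok:
        return "LOW"
    return "VERY_LOW"  # IMU dead-reckoning only: both camera sources failed
```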

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **cuVSLAM fails on low-texture agricultural terrain** | **HIGH** | Frequent tracking loss, degraded VO | Increase satellite matching frequency when keypoint count drops. IMU dead-reckoning bridge (~1.5s). Accept higher drift in featureless segments. Satellite matching recovers position when texture returns. |
| LiteSAM TRT FP16 >200ms at 1280px on Orin Nano Super | MEDIUM | Must use XFeat instead (less accurate for cross-view) | Day-one TRT FP16 benchmark. If >200ms → XFeat. Since our nadir-nadir gap is small, XFeat may still perform adequately. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Benchmark XFeat on satellite-aerial pairs from the actual operational area. Increase keyframe frequency; multi-tile consensus; strict RANSAC. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, set max keyframes. Monitor memory. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| cuVSLAM is closed-source, no nadir benchmarks | MEDIUM | Unknown failure modes over farmland | Extensive testing with real nadir UAV imagery before deployment. XFeat VO as fallback (also uses learned features). |
| Tile I/O bottleneck during expanded search | LOW | Delayed re-localization | Preload ±2km tiles in RAM; ranked search instead of exhaustive |

## Testing Strategy

### Integration / Functional Tests

- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground-truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce a 90-degree heading change in the sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify the client receives the VO result within 50ms and the satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
- Test cuVSLAM map memory: run a 3000-frame session, monitor memory growth

### Non-Functional Tests

- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM. If >200ms → use XFeat. Also measure accuracy on test satellite-aerial pairs for both.
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- **cuVSLAM terrain stress test**: test with nadir camera over (a) urban/structured terrain, (b) agricultural fields, (c) water/uniform terrain, (d) forest. Measure: keypoint count, tracking success rate, drift per 100 frames, IMU fallback frequency
- cuVSLAM keypoint monitoring: verify that low-keypoint detection triggers increased satellite matching
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during a 3000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leaks
- Keyframe strategy: vary the interval (2, 3, 5, 10) and measure the accuracy vs latency tradeoff
- Tile preloading: verify RAM usage of preloaded tiles for a 50km flight plan

## References

- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR paper: https://zju3dv.github.io/efficientloftr/
- LoFTR TensorRT adaptation: https://github.com/Kolkir/LoFTR_TRT
- PFED (2025): https://github.com/SkyEyeLoc/PFED
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hierarchical AVL (MDPI 2025): https://www.mdpi.com/2072-4292/17/20/3470
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- cuVSLAM paper: https://arxiv.org/abs/2506.04359
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- LightGlue (ICCV 2023): https://github.com/cvg/LightGlue
- LightGlue TensorRT: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- LightGlue TRT Jetson: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- ForestVO / SP+LG VO: https://arxiv.org/html/2504.01261v1
- vo_lightglue (SP+LG VO): https://github.com/himadrir/vo_lightglue
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
@@ -0,0 +1,491 @@

# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| FastAPI + SSE as primary output | **Functional**: New AC requires MAVLink GPS_INPUT to the flight controller, not REST/SSE. The system must act as a GPS replacement module. SSE is the wrong output channel. | **Replace with pymavlink GPS_INPUT sender**. Send GPS_INPUT at 5-10Hz to the flight controller via UART. Retain minimal FastAPI only for local IPC (object localization API). |
| No ground station integration | **Functional**: New AC requires streaming position+confidence to the ground station and receiving re-localization commands via the telemetry link. Draft02 had no telemetry. | **MAVLink telemetry integration**: GPS data forwarded automatically by the flight controller. Custom data via NAMED_VALUE_FLOAT (confidence, drift). Re-localization hints via a COMMAND_LONG listener. |
| MAVSDK library (per restriction) | **Functional**: MAVSDK-Python v3.15.3 cannot send GPS_INPUT messages. The feature has been requested since 2021 and remains unresolved. This is a blocking limitation for the core output function. | **Use pymavlink** for all MAVLink communication. pymavlink provides `gps_input_send()` and full MAVLink v2 access. Note the conflict with the restriction — pymavlink is the only viable option. |
| 3fps camera → ~3Hz output | **Performance**: The ArduPilot GPS_RATE_MS minimum corresponds to 5Hz (200ms). 3Hz camera output is below this minimum, so the flight controller EKF may not fuse properly. | **IMU-interpolated 5-10Hz GPS_INPUT**: ESKF prediction runs at 100+Hz internally. Emit the predicted state as GPS_INPUT at 5-10Hz. Camera corrections arrive at 3Hz within this stream. |
| No startup/failsafe procedures | **Functional**: New AC requires init from last GPS, reboot recovery, and IMU-only fallback. Draft02 assumed the position was already known. | **Full lifecycle management**: (1) Boot → read GPS from flight controller → init ESKF. (2) Reboot → read IMU-extrapolated position → re-init. (3) N-second failure → stop GPS_INPUT → autopilot falls back to IMU. |
| Basic object localization (nadir only) | **Functional**: New AC adds an AI camera with configurable angle and zoom. Nadir pixel-to-GPS is insufficient. | **Trigonometric projection for oblique camera**: ground_distance = alt × tan(tilt), bearing = heading + pan + pixel offset. Local API for AI system requests. |
| No thermal management | **Performance**: The Jetson Orin Nano Super throttles at 80°C (GPU drops 1GHz→300MHz = 3x slowdown). This could blow the 400ms budget. | **Thermal monitoring + adaptive pipeline**: Use 25W mode. Monitor via tegrastats. If temp >75°C → reduce satellite matching frequency. If >80°C → VO+IMU only. |
| ESKF covariance without explicit drift budget | **Functional**: New AC requires max 100m cumulative VO drift between satellite anchors. Draft02 uses covariance for keyframe selection but has no explicit budget. | **Drift budget tracker**: √(σ_x² + σ_y²) from the ESKF as the drift estimate. When approaching 100m → force every-frame satellite matching. Report via horiz_accuracy in GPS_INPUT. |
| No satellite imagery validation | **Functional**: New AC requires ≥0.5 m/pixel, <2 years old. Draft02 didn't validate. | **Preprocessing validation step**: Check zoom 19 availability (0.3 m/pixel). Fall back to zoom 18 (0.6 m/pixel). Flag stale tiles. |
| "Ask user via API" for re-localization | **Functional**: New AC says send the re-localization request to the ground station via the telemetry link, not a REST API. The operator sends the hint via telemetry. | **MAVLink re-localization protocol**: On 3 consecutive failures → send a STATUSTEXT alert to the ground station. The operator sends COMMAND_LONG with approximate lat/lon. The system uses the hint to constrain the tile search. |

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). The system replaces the GPS module for the flight controller by sending MAVLink GPS_INPUT messages via pymavlink over UART. Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU data from the flight controller. GPS_INPUT is sent at 5-10Hz, with camera-based corrections at 3Hz and IMU prediction filling the gaps.

**Hard constraint**: The camera shoots at ~3fps (333ms interval). The full VO+ESKF pipeline must complete within 400ms per frame. GPS_INPUT output rate: 5-10Hz minimum (ArduPilot EKF requirement).

**Output architecture**:

- **Primary**: pymavlink → GPS_INPUT to flight controller via UART (replaces GPS module)
- **Telemetry**: Flight controller auto-forwards GPS data to the ground station. Custom NAMED_VALUE_FLOAT for confidence/drift at 1Hz
- **Commands**: Ground station → COMMAND_LONG → flight controller → pymavlink listener on companion computer
- **Local IPC**: Minimal FastAPI on localhost for object localization requests from AI systems

```
OFFLINE (Before Flight)
  Satellite Tiles → Download & Validate → Pre-resize → Store
  (Google Maps)     (≥0.5m/px, <2yr)      (matcher res)  (GeoHash)
  Copy to Jetson storage
            │
            ▼
ONLINE (During Flight)

  STARTUP:
    pymavlink → read GLOBAL_POSITION_INT → init ESKF → start cuVSLAM

  EVERY FRAME (3fps, 333ms interval):
    ┌──────────────────────────────────────┐
    │ Nav Camera → Downsample (CUDA ~2ms)  │
    │ → cuVSLAM VO+IMU (~9ms)              │
    │ → ESKF measurement update            │
    └──────────────────────────────────────┘

  5-10Hz CONTINUOUS (between camera frames):
    ┌──────────────────────────────────────┐
    │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller
    │ (pymavlink, every 100-200ms)         │    (GPS1_TYPE=14)
    └──────────────────────────────────────┘

  KEYFRAMES (every 3-10 frames, async):
    ┌──────────────────────────────────────┐
    │ Satellite match (CUDA stream B)      │──→ ESKF correction
    │ LiteSAM TRT FP16 or XFeat            │
    └──────────────────────────────────────┘

  TELEMETRY (1Hz):
    ┌──────────────────────────────────────┐
    │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station
    │ STATUSTEXT: alerts, re-loc requests  │    (via telemetry radio)
    └──────────────────────────────────────┘

  COMMANDS (from ground station):
    ┌──────────────────────────────────────┐
    │ Listen COMMAND_LONG: re-loc hint     │←── Ground Station
    │ (lat/lon from operator)              │    (via telemetry radio)
    └──────────────────────────────────────┘

  LOCAL IPC:
    ┌──────────────────────────────────────┐
    │ FastAPI localhost:8000               │←── AI Detection System
    │ POST /localize (object GPS calc)     │
    │ GET /status (system health)          │
    └──────────────────────────────────────┘

  IMU: 100+Hz from flight controller → ESKF prediction
  TILES: ±2km preloaded in RAM from flight plan
  THERMAL: Monitor via tegrastats, adaptive pipeline throttling
```
## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

NVIDIA's CUDA-accelerated VO library (PyCuVSLAM v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. It supports monocular camera + IMU natively, auto-falls back to IMU when visual tracking fails, and provides loop closure plus Python and C++ APIs.

**CRITICAL: cuVSLAM on low-texture terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade optical flow (classical features). On uniform agricultural terrain:

- Few corners detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1s) → constant-velocity integrator (~0.5s)
- cuVSLAM does NOT guarantee pose recovery after tracking loss

**Mitigation**:

1. Increase satellite matching frequency when the cuVSLAM keypoint count drops
2. IMU dead-reckoning bridge via ESKF (continues GPS_INPUT output during tracking loss)
3. Accept higher drift in featureless segments — report via horiz_accuracy
4. Keypoint-density monitoring triggers adaptive satellite matching

### 2. Keyframe-Based Satellite Matching

Not every frame needs satellite matching:

- cuVSLAM provides VO at every frame (~9ms)
- Satellite matching triggers on keyframes selected by:
  - Fixed interval: every 3-10 frames
  - ESKF covariance exceeds threshold (drift approaching budget)
  - VO failure: cuVSLAM reports tracking loss
  - Thermal: reduce frequency if temperature is high

### 3. Satellite Matcher Selection (Benchmark-Driven)

**Context**: Our UAV-to-satellite matching is nadir-to-nadir (both top-down). The challenges are season/lighting differences and temporal changes, not extreme viewpoint gaps.

**Candidate A: LiteSAM (opt) TRT FP16 @ 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params. TensorRT FP16 with reparameterized MobileOne. Estimated ~165-330ms on Orin Nano Super with TRT FP16.

**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Fastest option. General-purpose, but our nadir-nadir gap is small.

**Decision rule** (day-one on Orin Nano Super):

1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at 1280px
3. If ≤200ms → LiteSAM at 1280px
4. If >200ms → XFeat

### 4. TensorRT FP16 Optimization

LiteSAM's MobileOne backbone is reparameterizable — the multi-branch training structure collapses to a single feed-forward path at inference. INT8 is safe only for the MobileOne CNN layers, NOT for the TAIFormer transformer components.

### 5. CUDA Stream Pipelining

- Stream A: cuVSLAM VO for the current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for the previous keyframe (async, does not block VO)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management

### 6. Proactive Tile Loading

Preload tiles within ±2km of the flight plan into RAM at startup. For a 50km route, ~2000 tiles at zoom 19 ≈ ~200MB. This eliminates disk I/O during flight.

On VO failure / expanded search:

1. Compute the IMU dead-reckoning position
2. Rank preloaded tiles by distance to the predicted position
3. Try the top 3 tiles, then expand

### 7. 5-10Hz GPS_INPUT Output Loop

A dedicated thread/coroutine sends GPS_INPUT at a fixed rate (5-10Hz):

1. Read the current ESKF state (position, velocity, covariance)
2. Compute horiz_accuracy from √(σ_x² + σ_y²)
3. Set fix_type based on the last correction type (3=satellite-corrected, 2=VO-only, 1=IMU-only)
4. Send via `mav.gps_input_send()`
5. Sleep until the next interval

This decouples the camera frame rate (3fps) from the GPS_INPUT rate (5-10Hz).

## Existing/Competitor Solutions Analysis

| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection | Competitive with LiteSAM | TRT available | 15.05M params, heavier |
| STHN (IEEE RA-L 2024) | Deep homography estimation | 4.24m at 50m range | Lightweight | Needs RGB retraining |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Planetary, needs adaptation |

## Architecture

### Component: Flight Controller Integration (NEW)

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| pymavlink GPS_INPUT | pymavlink | Full MAVLink v2 access, GPS_INPUT support, pure Python, aarch64 compatible | Lower-level API, manual message handling | ~1ms per send | ✅ Best |
| MAVSDK-Python TelemetryServer | MAVSDK v3.15.3 | Higher-level API, aarch64 wheels | NO GPS_INPUT support, no custom messages | N/A — missing feature | ❌ Blocked |
| MAVSDK C++ MavlinkDirect | MAVSDK v4 (future) | Custom message support planned | Not available in the Python wrapper yet | N/A — not released | ❌ Not available |
| MAVROS (ROS) | ROS + MAVROS | Full GPS_INPUT support, ROS ecosystem | Heavy ROS dependency, complex setup, unnecessary overhead | ~5ms overhead | ⚠️ Overkill |

**Selected**: **pymavlink** — the only viable Python library for GPS_INPUT. Pure Python, works on aarch64, full MAVLink v2 message set.

**Restriction note**: restrictions.md specifies "MAVSDK library", but MAVSDK-Python cannot send GPS_INPUT (confirmed: Issue #320, open since 2021). pymavlink is the necessary alternative.

Configuration:

- Connection: UART (`/dev/ttyTHS0` or `/dev/ttyTHS1` on Jetson, 115200-921600 baud)
- Flight controller: GPS1_TYPE=14, SERIAL2_PROTOCOL=2 (MAVLink2)
- GPS_INPUT rate: 5-10Hz (dedicated output thread)
- Heartbeat: 1Hz to maintain the connection

### Component: Visual Odometry

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source, low-texture terrain risk | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | Open-source, learned features | No IMU integration, ~30-50ms | ~30-50ms/frame | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps | ~33ms/frame | ⚠️ Slower |

**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built for Jetson.

### Component: Satellite Image Matching

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy, 6.31M params | Untested on Orin Nano Super TRT | Est. ~165-330ms TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, Jetson-proven, fastest | General-purpose | ~50-100ms | ✅ Fallback |

**Selection**: Day-one benchmark. LiteSAM TRT FP16 at 1280px → if ≤200ms → LiteSAM. If >200ms → XFeat.

### Component: Sensor Fusion

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| ESKF (custom) | Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |

**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.

**Output rates**:

- IMU prediction: 100+Hz (from flight controller IMU via pymavlink)
- cuVSLAM VO update: ~3Hz
- Satellite update: ~0.3-1Hz (keyframes, async)
- GPS_INPUT output: 5-10Hz (ESKF predicted state)

**Drift budget**: Track √(σ_x² + σ_y²) from the ESKF covariance. When approaching 100m → force every-frame satellite matching.
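The drift budget check is a one-liner over the ESKF position covariance diagonal. A minimal sketch, assuming the covariance entries are variances (σ²) in m² and an 80%-of-budget trigger margin (the margin value is an assumption, not from the AC):

```python
import math

DRIFT_BUDGET_M = 100.0   # max cumulative VO drift between satellite anchors

def drift_estimate_m(cov_xx: float, cov_yy: float) -> float:
    """1-sigma horizontal drift estimate: sqrt(sigma_x^2 + sigma_y^2)."""
    return math.sqrt(cov_xx + cov_yy)

def satellite_match_every_frame(cov_xx: float, cov_yy: float,
                                margin: float = 0.8) -> bool:
    """Force per-frame satellite matching once drift nears the 100m budget."""
    return drift_estimate_m(cov_xx, cov_yy) >= margin * DRIFT_BUDGET_M
```

The same `drift_estimate_m` value doubles as the `horiz_accuracy` field reported in GPS_INPUT.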

### Component: Ground Station Telemetry (NEW)

| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| MAVLink auto-forwarding + NAMED_VALUE_FLOAT | pymavlink | Standard MAVLink, no custom protocol, works with all GCS (Mission Planner, QGC) | Limited bandwidth (~12kbit/s); NAMED_VALUE_FLOAT name limited to 10 chars | ~50 bytes/msg | ✅ Best |
| Custom MAVLink dialect messages | pymavlink + custom XML | Full flexibility | Requires custom GCS plugin, non-standard | ~50 bytes/msg | ⚠️ Complex |
| Separate telemetry channel | TCP/UDP over separate radio | Full bandwidth | Extra hardware, extra radio | N/A | ❌ Not available |

**Selected**: **Standard MAVLink forwarding + NAMED_VALUE_FLOAT**

Telemetry data sent to the ground station:

- GPS position: auto-forwarded by the flight controller from GPS_INPUT data
- Confidence score: NAMED_VALUE_FLOAT `"gps_conf"` at 1Hz (values: 1=HIGH, 2=MEDIUM, 3=LOW, 4=VERY_LOW)
- Drift estimate: NAMED_VALUE_FLOAT `"gps_drift"` at 1Hz (meters)
- Matching status: NAMED_VALUE_FLOAT `"sat_match"` at 1Hz (0=inactive, 1=matching, 2=failed)
- Alerts: STATUSTEXT for critical events (re-localization request, system failure)

Re-localization from the ground station:

- The operator sees a drift/failure alert in the GCS
- The operator sends COMMAND_LONG (MAV_CMD_USER_1) with lat/lon in param5/param6
- The companion computer listens for COMMAND_LONG addressed to its component ID
- On receiving the hint → constrain the tile search → attempt satellite matching near the hint coordinates

### Component: Startup & Lifecycle (NEW)

**Startup sequence**:

1. Boot Jetson → start the GPS-Denied service (systemd)
2. Connect to the flight controller via pymavlink on UART
3. Wait for a heartbeat from the flight controller
4. Read GLOBAL_POSITION_INT → extract lat, lon, alt
5. Initialize the ESKF state with this position (high confidence if real GPS is available)
6. Start cuVSLAM with the first camera frames
7. Begin the GPS_INPUT output loop at 5-10Hz
8. Preload satellite tiles within ±2km of the flight plan into RAM
9. System ready — GPS-Denied active
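Steps 4-5 hinge on GLOBAL_POSITION_INT field scaling: per the MAVLink spec, lat/lon arrive as degrees × 1e7 and altitude in millimetres. A sketch of the conversion (`EskfInit` is an assumed holder type, not part of the real ESKF interface):

```python
from dataclasses import dataclass

@dataclass
class EskfInit:
    lat_deg: float
    lon_deg: float
    alt_m: float

def init_from_global_position_int(lat_e7: int, lon_e7: int, alt_mm: int) -> EskfInit:
    """Convert GLOBAL_POSITION_INT integer fields to the units the ESKF uses."""
    return EskfInit(lat_e7 / 1e7, lon_e7 / 1e7, alt_mm / 1000.0)
```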

**GPS denial detection**:

Not required — the system always outputs GPS_INPUT. If real GPS is available, the flight controller uses whichever GPS source has better accuracy (configurable GPS blending or priority). When real GPS degrades or is lost, the flight controller seamlessly switches to our GPS_INPUT.

**Failsafe**:

- If there is no valid position estimate for N seconds (configurable, e.g., 10s): stop sending GPS_INPUT
- The flight controller detects the GPS timeout → falls back to IMU-only dead reckoning
- The system logs the failure and continues attempting recovery (VO + satellite matching)
- When recovery succeeds: resume GPS_INPUT output

**Reboot recovery**:

1. Jetson reboots → re-establish the pymavlink connection
2. Read GPS_RAW_INT (now IMU-extrapolated by the flight controller since GPS_INPUT stopped)
3. Initialize the ESKF with this position (low confidence, horiz_accuracy=100m+)
4. Resume cuVSLAM + satellite matching → accuracy improves over time
5. Resume GPS_INPUT output

### Component: Object Localization (UPDATED)

**Two modes**:

**Mode 1: Navigation camera (nadir)**
Frame-center GPS comes from the ESKF. For any object in the navigation camera frame:

1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by heading (yaw from IMU)
4. Convert the meter offset to a lat/lon delta and add it to the frame-center GPS
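The four steps above can be sketched with a local tangent-plane approximation, which is adequate for offsets of a few hundred metres. Assumptions: image +x is right, +y is forward along heading, and the spherical-earth radius is used for the metre-to-degree conversion.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS-84 equatorial radius

def nadir_pixel_to_gps(center_lat: float, center_lon: float,
                       dx_px: float, dy_px: float,
                       gsd_m: float, heading_rad: float):
    """Steps 1-4: pixel offset -> meters -> rotate by heading -> lat/lon."""
    dx_m = dx_px * gsd_m                                   # step 2
    dy_m = dy_px * gsd_m
    # step 3: rotate the image-frame offset into north/east by the UAV heading
    north = dy_m * math.cos(heading_rad) - dx_m * math.sin(heading_rad)
    east = dy_m * math.sin(heading_rad) + dx_m * math.cos(heading_rad)
    # step 4: small-offset conversion to a lat/lon delta
    lat = center_lat + math.degrees(north / EARTH_RADIUS_M)
    lon = center_lon + math.degrees(
        east / (EARTH_RADIUS_M * math.cos(math.radians(center_lat))))
    return lat, lon
```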

**Mode 2: AI camera (configurable angle and zoom)**

1. Get the current UAV position from the ESKF
2. Get the AI camera params: tilt_angle (from vertical), pan_angle (from heading), zoom (effective focal length)
3. Get the pixel coordinates of the detected object in the AI camera frame
4. Compute bearing: bearing = heading + pan_angle + atan2(dx_px × pixel_size, focal_eff), where pixel_size = sensor_width / image_width
5. Compute ground distance: for flat terrain, slant_range = altitude / cos(tilt_angle + dy_angle), ground_range = slant_range × sin(tilt_angle + dy_angle)
6. Convert bearing + ground_range to a lat/lon offset
7. Return GPS coordinates with an accuracy estimate
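Steps 4-6 reduce to two angle corrections and one tangent. A minimal flat-terrain sketch (parameter names and units are assumptions; converting the returned bearing and ground range to lat/lon works exactly as in the nadir mode):

```python
import math

def oblique_object_offset(altitude_m: float, heading_rad: float,
                          pan_rad: float, tilt_rad: float,
                          dx_px: float, dy_px: float,
                          focal_eff_m: float, pixel_size_m: float):
    """Project a detection from an oblique camera onto flat terrain.
    tilt is measured from vertical (nadir = 0).
    Returns (bearing_rad, ground_range_m)."""
    dx_ang = math.atan2(dx_px * pixel_size_m, focal_eff_m)   # step 4 pixel term
    dy_ang = math.atan2(dy_px * pixel_size_m, focal_eff_m)
    bearing = heading_rad + pan_rad + dx_ang                 # step 4
    elev = tilt_rad + dy_ang          # angle from vertical to the object ray
    # step 5: slant = alt / cos(elev); ground = slant * sin(elev) = alt * tan(elev)
    ground_range = altitude_m * math.tan(elev)
    return bearing, ground_range
```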

**Local API** (FastAPI on localhost:8000):

- `POST /localize` — accepts: pixel_x, pixel_y, camera_id ("nav" or "ai"), ai_camera_params (tilt, pan, zoom) → returns: lat, lon, accuracy_m
- `GET /status` — returns: system state, confidence, drift, uptime

### Component: Satellite Tile Preprocessing (Offline)

**Selected**: GeoHash-indexed tile pairs on disk + RAM preloading.

Pipeline:

1. Define the operational area from the flight plan
2. Download satellite tiles from the Google Maps Tile API at zoom 19 (0.3 m/pixel)
3. If zoom 19 is unavailable: fall back to zoom 18 (0.6 m/pixel — meets the ≥0.5 m/pixel requirement)
4. Validate: resolution ≥0.5 m/pixel, check imagery staleness where possible
5. Pre-resize each tile to the matcher input resolution
6. Store: original + resized + metadata (GPS bounds, zoom, GSD, download date) in a GeoHash-indexed structure
7. Copy to Jetson storage before flight
8. At startup: preload tiles within ±2km of the flight plan into RAM
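The zoom-to-GSD figures in steps 2-3 follow the standard Web Mercator tiling scheme (the 0.3/0.6 m/pixel values quoted above are the equatorial numbers; at Ukrainian latitudes the actual GSD is finer by cos(lat)). A sketch of the relationship, with `pick_zoom` as an illustrative helper:

```python
import math

def web_mercator_gsd(zoom: int, lat_deg: float) -> float:
    """Ground sample distance (m/pixel) of a 256px Web Mercator tile.
    156543.03392 m/pixel is the zoom-0 equatorial value."""
    return 156543.03392 * math.cos(math.radians(lat_deg)) / (2 ** zoom)

def pick_zoom(lat_deg: float, required_gsd: float = 0.5) -> int:
    """Coarsest zoom whose GSD is at least as fine as `required_gsd`."""
    zoom = 0
    while web_mercator_gsd(zoom, lat_deg) > required_gsd:
        zoom += 1
    return zoom
```

Note that `pick_zoom(48.5)` returns 18: at that latitude zoom 18 already resolves ~0.40 m/pixel, which is why the zoom-18 fallback is acceptable.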

### Component: Re-localization (Disconnected Segments)

When cuVSLAM reports tracking loss (sharp turn, no features):

1. Flag the next frame as a keyframe → trigger satellite matching
2. Compute the IMU dead-reckoning position since the last known position
3. Rank preloaded tiles by distance to the dead-reckoning position
4. Try the top 3 tiles sequentially
5. If a match is found: position recovered, a new segment begins
6. After 3 consecutive keyframe failures: send a STATUSTEXT alert to the ground station ("RE-LOC REQUEST: position uncertain, drift Xm")
7. While waiting for the operator hint: continue VO/IMU dead reckoning, report low confidence via horiz_accuracy
8. If the operator sends COMMAND_LONG with a lat/lon hint: constrain the tile search to ±500m of the hint
9. If there is still no match after the operator hint: continue dead reckoning, log the failure
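Steps 3-4 above amount to a nearest-first ordering of the preloaded tiles. A sketch, assuming tiles are `(tile_id, center_lat, center_lon)` tuples; a flat squared-degree metric is sufficient for ranking within a ±2km preload area (longitude scaling by cos(lat) would matter only for much larger search regions):

```python
def rank_tiles(tiles, pred_lat: float, pred_lon: float, top_n: int = 3):
    """Order preloaded tiles by distance from the dead-reckoning position
    and return the ids of the closest candidates to try first."""
    def dist2(tile):
        _, lat, lon = tile
        return (lat - pred_lat) ** 2 + (lon - pred_lon) ** 2
    return [t[0] for t in sorted(tiles, key=dist2)[:top_n]]
```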

### Component: Thermal Management (NEW)

**Power mode**: 25W (stable sustained performance)

**Monitoring**: Read the GPU/CPU temperature via tegrastats or sysfs thermal zones at 1Hz.

**Adaptive pipeline**:

- Normal (<70°C): Full pipeline — cuVSLAM every frame + satellite match every 3-10 frames
- Warm (70-75°C): Reduce satellite matching to every 5-10 frames
- Hot (75-80°C): Reduce satellite matching to every 10-15 frames
- Throttling (>80°C): Disable satellite matching entirely, VO+IMU only (cuVSLAM ~9ms is very light). Report LOW confidence. Resume satellite matching when the temperature drops below 75°C

**Hardware requirement**: An active cooling fan (5V) is mandatory for the UAV companion computer enclosure.
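The temperature tiers above map to a keyframe interval with hysteresis on the disable/resume path. A sketch; the specific interval chosen inside each band (5/10/15) is an assumed tuning choice within the ranges stated above.

```python
def satellite_match_interval(temp_c: float, prev_disabled: bool = False):
    """Map die temperature to a satellite-matching keyframe interval.
    Returns None when matching is disabled; the 75 C resume threshold
    implements the hysteresis described for the >80 C tier."""
    if temp_c > 80.0:
        return None                  # throttling: VO+IMU only
    if prev_disabled and temp_c >= 75.0:
        return None                  # stay disabled until below 75 C
    if temp_c >= 75.0:
        return 15                    # hot: every 10-15 frames
    if temp_c >= 70.0:
        return 10                    # warm: every 5-10 frames
    return 5                         # normal: every 3-10 frames
```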
|
||||
|
||||
## Processing Time Budget (per frame, 333ms interval)
|
||||
|
||||
### Normal Frame (non-keyframe, ~60-80% of frames)
|
||||
|
||||
| Step | Time | Notes |
|
||||
|------|------|-------|
|
||||
| Image capture + transfer | ~10ms | CSI/USB3 |
|
||||
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
|
||||
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps |
|
||||
| ESKF measurement update | ~1ms | NumPy |
|
||||
| **Total per camera frame** | **~22ms** | Well within 333ms |
|
||||
|
||||
GPS_INPUT output runs independently at 5-10Hz (every 100-200ms):
|
||||
| Step | Time | Notes |
|
||||
|------|------|-------|
|
||||
| Read ESKF state | <0.1ms | Shared state |
|
||||
| Compute horiz_accuracy | <0.1ms | √(σ²) |
|
||||
| pymavlink gps_input_send | ~1ms | UART write |
|
||||
| **Total per GPS_INPUT** | **~1ms** | Negligible overhead |
|
||||
|
||||
### Keyframe Satellite Matching (async, every 3-10 frames)
|
||||
|
||||
Runs on separate CUDA stream — does NOT block VO or GPS_INPUT.
|
||||
|
||||
**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:
|
||||
|
||||
| Step | Time | Notes |
|
||||
|------|------|-------|
|
||||
| Downsample to 1280px | ~1ms | OpenCV CUDA |
|
||||
| Load satellite tile | ~1ms | Pre-loaded in RAM |
|
||||
| LiteSAM (opt) TRT FP16 | ≤200ms | Go/no-go threshold |
|
||||
| Geometric pose (RANSAC) | ~5ms | Homography |
|
||||
| ESKF satellite update | ~1ms | Delayed measurement |
|
||||
| **Total** | **≤210ms** | Async |

**Path B — XFeat (if LiteSAM >200ms)**:

| Step | Time | Notes |
|------|------|-------|
| XFeat extraction + matching | ~50-80ms | TensorRT FP16 |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state (configure pruning for 3000 frames) |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink runtime | ~20MB | Lightweight |
| FastAPI (local IPC) | ~50MB | Minimal, localhost only |
| Current frame buffer | ~2MB | |
| ESKF state + buffers | ~10MB | |
| **Total** | **~2.1-2.9GB** | ~26-36% of 8GB — comfortable |

## Confidence Scoring → GPS_INPUT Mapping

| Level | Condition | horiz_accuracy (m) | fix_type | GPS_INPUT satellites_visible |
|-------|-----------|--------------------|----------|------------------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | 10-20 | 3 (3D) | 12 |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50 | 3 (3D) | 8 |
| LOW | cuVSLAM VO only, no recent correction, OR thermal throttling | 50-100 | 2 (2D) | 4 |
| VERY LOW | IMU dead-reckoning only | 100-500 | 1 (no fix) | 1 |
| MANUAL | Operator-provided re-localization hint | 200 | 3 (3D) | 6 |

Note: `satellites_visible` is synthetic — it is used to influence EKF weighting, since ArduPilot gives more weight to a GPS source with a higher satellite count and lower horiz_accuracy.
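
The mapping above can be captured directly as a lookup table (a sketch: the midpoint `horiz_accuracy` values and the underscore key spellings are implementation choices, not system-defined constants):

```python
# Confidence level → GPS_INPUT weighting parameters, from the mapping table.
# horiz_accuracy uses the midpoint of each band (illustrative choice).
CONFIDENCE_MAP = {
    "HIGH":     {"horiz_accuracy": 15.0,  "fix_type": 3, "satellites_visible": 12},
    "MEDIUM":   {"horiz_accuracy": 35.0,  "fix_type": 3, "satellites_visible": 8},
    "LOW":      {"horiz_accuracy": 75.0,  "fix_type": 2, "satellites_visible": 4},
    "VERY_LOW": {"horiz_accuracy": 300.0, "fix_type": 1, "satellites_visible": 1},
    "MANUAL":   {"horiz_accuracy": 200.0, "fix_type": 3, "satellites_visible": 6},
}

def gps_input_params(confidence):
    """Return the GPS_INPUT weighting parameters for a confidence level."""
    return CONFIDENCE_MAP[confidence]
```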

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink (conflicts with restriction) | Use pymavlink and document the restriction conflict. No Python alternative exists. |
| **cuVSLAM fails on low-texture agricultural terrain** | HIGH | Frequent tracking loss, degraded VO | Increase satellite matching frequency. Bridge gaps with IMU dead-reckoning. Accept higher drift. |
| **Jetson UART instability with ArduPilot** | MEDIUM | MAVLink connection drops | Test thoroughly. Use a USB serial adapter if the UART is unreliable. Add a watchdog reconnect. |
| **Thermal throttling blows the satellite matching budget** | MEDIUM | Missed keyframe windows | Adaptive pipeline: reduce or skip satellite matching at high temperature. Active cooling mandatory. |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use XFeat | Day-one benchmark. XFeat fallback. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Multi-tile consensus, strict RANSAC, increased keyframe frequency. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning and a maximum keyframe count. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift. Evaluate alternative imagery providers. |
| GPS_INPUT at 3Hz too slow for ArduPilot EKF | HIGH | Poor EKF fusion, position jumps | 5-10Hz output with IMU interpolation between camera frames. |
| Companion computer reboot mid-flight | LOW | ~30-60s GPS gap | Flight controller IMU fallback. Automatic recovery on restart. |
| Telemetry bandwidth saturation | LOW | Custom messages compete with autopilot telemetry | Limit NAMED_VALUE_FLOAT to 1Hz. Keep messages compact. |

## Testing Strategy

### Integration / Functional Tests

- End-to-end: camera → cuVSLAM → ESKF → GPS_INPUT → verify the flight controller receives a valid position
- Compare computed positions against ground-truth GPS from coordinates.csv
- Measure: percentage within 50m (target: 80%) and percentage within 20m (target: 60%)
- Test GPS_INPUT rate: verify 5-10Hz output to the flight controller
- Test sharp-turn handling: verify satellite re-localization after a 90-degree heading change
- Test disconnected segments: simulate 3+ route breaks, verify all segments are connected
- Test re-localization: simulate 3 consecutive failures → verify STATUSTEXT is sent → inject a COMMAND_LONG hint → verify recovery
- Test object localization: send POST /localize with known AI camera parameters → verify GPS accuracy
- Test startup: verify the ESKF initializes from flight controller GPS
- Test reboot recovery: kill the process → restart → verify reconnection and position recovery
- Test failsafe: simulate total failure → verify GPS_INPUT stops → verify flight controller IMU fallback
- Test cuVSLAM map memory: run a 3000-frame session and monitor memory growth

### Non-Functional Tests

- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at 1280px on the Orin Nano Super
- cuVSLAM benchmark: verify 116fps monocular+IMU on the Orin Nano Super
- cuVSLAM terrain stress test: urban, agricultural, water, forest
- **UART reliability test**: sustained pymavlink communication over 1+ hour
- **Thermal endurance test**: run the full pipeline for 30+ minutes, measure GPU temperature, verify no throttling with active cooling
- Per-frame latency: must be <400ms for the VO pipeline
- GPS_INPUT latency: measure time from camera capture to GPS_INPUT send
- Memory: peak usage during a 3000-frame session (must stay <8GB)
- Drift budget: verify ESKF covariance tracks cumulative drift and triggers satellite matching before 100m
- Telemetry bandwidth: measure total MAVLink bandwidth used by the companion computer
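
The drift-budget check above can be sketched as a scalar variance propagation (illustrative: the flight code propagates the full ESKF covariance, and the 0.5 trigger margin is a tunable assumption, not a system value):

```python
def should_trigger_satellite_match(sigma2_m2, drift_rate_m_per_s, dt_s,
                                   limit_m=100.0, margin=0.5):
    """Propagate horizontal drift variance one step and decide whether a
    satellite correction is needed before the drift budget is spent.

    Scalar random-walk sketch: variance grows by (rate * dt)^2 per step,
    and matching is triggered when 1-sigma drift crosses margin * limit.
    """
    sigma2_m2 += (drift_rate_m_per_s * dt_s) ** 2   # random-walk growth per step
    trigger = sigma2_m2 ** 0.5 > margin * limit_m   # act well before the 100m limit
    return sigma2_m2, trigger
```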

## References

- pymavlink GPS_INPUT example: https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py
- pymavlink mavgps.py: https://github.com/ArduPilot/pymavlink/blob/master/examples/mavgps.py
- ArduPilot GPS Input module: https://ardupilot.org/mavproxy/docs/modules/GPSInput.html
- MAVLink GPS_INPUT message spec: https://mavlink.io/en/messages/common.html#GPS_INPUT
- MAVSDK-Python GPS_INPUT limitation: https://github.com/mavlink/MAVSDK-Python/issues/320
- MAVSDK-Python custom message limitation: https://github.com/mavlink/MAVSDK-Python/issues/739
- ArduPilot companion computer setup: https://ardupilot.org/dev/docs/raspberry-pi-via-mavlink.html
- Jetson Orin UART with ArduPilot: https://forums.developer.nvidia.com/t/uart-connection-between-jetson-nano-orin-and-ardupilot/325416
- MAVLink NAMED_VALUE_FLOAT: https://mavlink.io/en/messages/common.html#NAMED_VALUE_FLOAT
- MAVLink STATUSTEXT: https://mavlink.io/en/messages/common.html#STATUSTEXT
- MAVLink telemetry bandwidth: https://github.com/mavlink/mavlink/issues/1605
- JetPack 6.2 Super Mode: https://developer.nvidia.com/blog/nvidia-jetpack-6-2-brings-super-mode-to-nvidia-jetson-orin-nano-and-jetson-orin-nx-modules/
- Jetson Orin Nano power consumption: https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/
- UAV target geolocation: https://www.mdpi.com/1424-8220/22/5/1903
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
- ArduPilot EKF Source Selection: https://ardupilot.org/copter/docs/common-ekf-sources.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|--------------|
| ONNX Runtime as potential inference runtime for AI models | **Performance**: ONNX Runtime CUDA EP on Jetson Orin Nano is 7-8x slower than TRT standalone with default settings (tensor cores not utilized). Even TRT-EP shows up to 3x overhead on some models. | **Use native TRT Engine for all AI models**. Convert PyTorch → ONNX → trtexec → .engine. Load with the tensorrt Python module. Eliminates the ONNX Runtime dependency entirely. |
| ONNX Runtime TRT-EP memory overhead | **Performance**: ONNX RT TRT-EP keeps the serialized engine in memory (~420-440MB vs 130-140MB native TRT). Delta ~280-300MB PER MODEL. On 8GB shared memory, this wastes ~560-600MB for two models. | **Native TRT releases the serialized blob after deserialization** → saves ~280-300MB per model. Total savings ~560-600MB — 7% of total memory. Critical given the cuVSLAM map growth risk. |
| No explicit TRT engine build step in offline pipeline | **Functional**: Draft03 mentions TRT FP16 but doesn't define the build workflow. When and where are engines built? | **Add TRT engine builds to the offline preparation pipeline**: after satellite tile download, run trtexec on the Jetson to build .engine files. Store them alongside the tiles. One-time cost per model version. |
| Cross-platform portability via ONNX Runtime | **Functional**: ONNX Runtime's primary value is cross-platform support. Our deployment is Jetson-only — this value is zero. We pay the performance/memory tax for unused portability. | **Drop ONNX Runtime**. The Jetson Orin Nano Super is fixed deployment hardware. TRT Engine is the optimal runtime for NVIDIA-only deployment. |
| No DLA offloading considered | **Performance**: Draft03 doesn't mention DLA. The Jetson Orin Nano has NO DLA cores — only Orin NX (1-2) and AGX Orin (2) have DLA. | **Confirm: DLA offloading is NOT available on the Orin Nano**. All inference must run on the GPU (1024 CUDA cores, 32 tensor cores). This makes maximizing GPU efficiency via native TRT even more critical. |
| LiteSAM MinGRU TRT compatibility risk | **Functional**: LiteSAM's subpixel refinement uses 4 stacked MinGRU layers over a 3×3 candidate window (seq_len=9). MinGRU gates depend only on the input C_f (not h_{t-1}), so z_t/h̃_t are pre-computable. Ops are standard: Linear, Sigmoid, Mul, Add, ReLU, Tanh. Risk is LOW-MEDIUM — it depends on whether the implementation uses logcumsumexp (problematic) or a simple loop (fine). Seq_len=9 makes this trivially rewritable. | **Day-one verification**: clone the LiteSAM repo → torch.onnx.export → polygraphy inspect → trtexec --fp16. If export fails on MinGRU: rewrite forward() as an unrolled loop (9 steps). **If LiteSAM cannot be made TRT-compatible: replace with EfficientLoFTR TRT** (proven TRT path via Coarse_LoFTR_TRT, 15.05M params, semi-dense matching). |

## Product Solution Description

A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses **native TensorRT Engine files** — no ONNX Runtime dependency. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.

Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM — native CUDA), (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16), and (3) IMU data from the flight controller, combined in an ESKF.
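
The ESKF correction step for an absolute position fix follows the standard Kalman measurement update. A minimal numpy sketch (illustrative: the flight code runs the full error-state formulation, and the 4-state layout here is an assumption):

```python
import numpy as np

def eskf_position_update(x, P, z, R):
    """Kalman measurement update for an absolute horizontal position fix.

    x: (n,) state with horizontal position in the first two components (m)
    P: (n, n) covariance; z: (2,) measured position; R: (2, 2) measurement noise
    """
    H = np.zeros((2, x.shape[0]))
    H[0, 0] = H[1, 1] = 1.0                      # observe horizontal position directly
    S = H @ P @ H.T + R                          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ (z - H @ x)                      # state correction
    P = (np.eye(x.shape[0]) - K @ H) @ P         # covariance update
    return x, P
```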

**Inference runtime decision**: native TRT Engine over ONNX Runtime because:

1. ONNX RT CUDA EP is 7-8x slower on the Orin Nano (tensor core bug)
2. ONNX RT TRT-EP wastes ~280-300MB per model (serialized engine retained in memory)
3. Cross-platform portability has zero value — deployment is Jetson-only
4. Native TRT provides direct CUDA stream control for pipelining with cuVSLAM

**Hard constraint**: the camera shoots at ~3fps (333ms interval). The full VO+ESKF pipeline must complete within 400ms. GPS_INPUT is sent at 5-10Hz.

**AI Model Runtime Summary**:

| Model | Runtime | Precision | Memory | Integration |
|-------|---------|-----------|--------|-------------|
| cuVSLAM | Native CUDA (PyCuVSLAM) | N/A (closed-source) | ~200-500MB | CUDA Stream A |
| LiteSAM | TRT Engine | FP16 | ~50-80MB | CUDA Stream B |
| XFeat | TRT Engine | FP16 | ~30-50MB | CUDA Stream B (fallback) |
| ESKF | CPU (Python/C++) | FP64 | ~10MB | CPU thread |

**Offline Preparation Pipeline** (before flight):

1. Download satellite tiles → validate → pre-resize → store (existing)
2. **NEW: build TRT engines on the Jetson** (one-time per model version)
   - `trtexec --onnx=litesam_fp16.onnx --saveEngine=litesam.engine --fp16`
   - `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
3. Copy tiles + engines to Jetson storage
4. At startup: load engines + preload tiles into RAM

```
┌─────────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight)                                             │
│ 1. Satellite Tiles → Download & Validate → Pre-resize → Store       │
│    (Google Maps)     (≥0.5m/px, <2yr)     (matcher res)  (GeoHash)  │
│ 2. TRT Engine Build (one-time per model version):                   │
│    PyTorch model → reparameterize → ONNX export → trtexec --fp16    │
│    Output: litesam.engine, xfeat.engine                             │
│ 3. Copy tiles + engines to Jetson storage                           │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight)                                              │
│                                                                     │
│ STARTUP:                                                            │
│ 1. pymavlink → read GLOBAL_POSITION_INT → init ESKF                 │
│ 2. Load TRT engines: litesam.engine + xfeat.engine                  │
│    (tensorrt.Runtime → deserialize_cuda_engine → create_context)    │
│ 3. Allocate GPU buffers for TRT input/output (PyCUDA)               │
│ 4. Start cuVSLAM with first camera frames                           │
│ 5. Preload satellite tiles ±2km into RAM                            │
│ 6. Begin GPS_INPUT output loop at 5-10Hz                            │
│                                                                     │
│ EVERY FRAME (3fps, 333ms interval):                                 │
│ ┌──────────────────────────────────────┐                            │
│ │ Nav Camera → Downsample (CUDA ~2ms)  │                            │
│ │ → cuVSLAM VO+IMU (~9ms)              │ ← CUDA Stream A            │
│ │ → ESKF measurement update            │                            │
│ └──────────────────────────────────────┘                            │
│                                                                     │
│ 5-10Hz CONTINUOUS:                                                  │
│ ┌──────────────────────────────────────┐                            │
│ │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller       │
│ └──────────────────────────────────────┘                            │
│                                                                     │
│ KEYFRAMES (every 3-10 frames, async):                               │
│ ┌──────────────────────────────────────┐                            │
│ │ TRT Engine inference (Stream B):     │                            │
│ │ context.enqueue_v3(stream_B)         │──→ ESKF correction         │
│ │ LiteSAM FP16 or XFeat FP16           │                            │
│ └──────────────────────────────────────┘                            │
│                                                                     │
│ TELEMETRY (1Hz):                                                    │
│ ┌──────────────────────────────────────┐                            │
│ │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station          │
│ └──────────────────────────────────────┘                            │
└─────────────────────────────────────────────────────────────────────┘
```

## Architecture

### Component: AI Model Inference Runtime

| Solution | Tools | Advantages | Limitations | Performance | Memory | Fit |
|----------|-------|------------|-------------|-------------|--------|-----|
| Native TRT Engine | tensorrt Python + PyCUDA + trtexec | Optimal latency, minimal memory, full tensor core usage, direct CUDA stream control | Hardware-specific engines, manual buffer management, rebuild per TRT version | Optimal | ~50-130MB total (both models) | ✅ Best |
| ONNX Runtime TRT-EP | onnxruntime + TensorRT EP | Auto-fallback for unsupported ops, simpler API, automatic engine caching | +280-300MB per model, wrapper overhead, first-run latency spike | Near-parity (claimed), up to 3x slower (observed) | ~640-690MB total (both models) | ❌ Memory overhead unacceptable |
| ONNX Runtime CUDA EP | onnxruntime + CUDA EP | Simplest API, broadest op support | 7-8x slower on Orin Nano (tensor core bug), no TRT optimizations | 7-8x slower | Standard | ❌ Performance unacceptable |
| Torch-TensorRT | torch_tensorrt | AOT compilation, PyTorch-native, handles mixed TRT/PyTorch | Newer on Jetson, requires the PyTorch runtime at inference | Near native TRT | PyTorch runtime ~500MB+ | ⚠️ Viable alternative if TRT export fails |

**Selected**: **Native TRT Engine** — optimal performance and memory on our fixed NVIDIA hardware.

**Fallback**: if any model has unsupported TRT ops (e.g., MinGRU in LiteSAM), use **Torch-TensorRT** for that specific model. Torch-TensorRT handles mixed TRT/PyTorch execution but requires the PyTorch runtime in memory.

### Component: TRT Engine Conversion Workflow

**LiteSAM conversion**:

1. Load the PyTorch model with trained weights
2. Reparameterize the MobileOne backbone (collapse multi-branch → single Conv2d+BN)
3. Export to ONNX: `torch.onnx.export(model, dummy_input, "litesam.onnx", opset_version=17)`
4. Verify with polygraphy: `polygraphy inspect model litesam.onnx`
5. Build the engine on the Jetson: `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16 --memPoolSize=workspace:2048`
6. Verify the engine: `trtexec --loadEngine=litesam.engine --fp16`

**XFeat conversion**:

1. Load the PyTorch model
2. Export to ONNX: `torch.onnx.export(model, dummy_input, "xfeat.onnx", opset_version=17)`
3. Build the engine on the Jetson: `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
4. Alternative: use the XFeatTensorRT C++ implementation directly

**INT8 quantization strategy** (optional, future optimization):

- MobileOne backbone (CNN): INT8-safe with calibration data
- TAIFormer (transformer attention): FP16 only — INT8 degrades accuracy
- XFeat: evaluate INT8 on actual UAV-satellite pairs before deploying
- Use nvidia-modelopt for calibration: `from modelopt.onnx.quantization import quantize`

### Component: TRT Python Inference Wrapper

Minimal wrapper class for TRT engine inference:

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates the CUDA context
import pycuda.driver as cuda


class TRTInference:
    def __init__(self, engine_path, stream):
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        with open(engine_path, 'rb') as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        self.stream = stream
        self._allocate_buffers()

    def _allocate_buffers(self):
        # Pre-allocate all GPU I/O buffers once — zero allocation during flight
        self.inputs = {}
        self.outputs = {}
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            shape = self.engine.get_tensor_shape(name)
            dtype = trt.nptype(self.engine.get_tensor_dtype(name))
            size = trt.volume(shape)
            device_mem = cuda.mem_alloc(size * np.dtype(dtype).itemsize)
            self.context.set_tensor_address(name, int(device_mem))
            mode = self.engine.get_tensor_mode(name)
            if mode == trt.TensorIOMode.INPUT:
                self.inputs[name] = (device_mem, shape, dtype)
            else:
                self.outputs[name] = (device_mem, shape, dtype)

    def infer_async(self, input_data):
        # Copy inputs and enqueue inference on the dedicated stream (Stream B)
        for name, data in input_data.items():
            cuda.memcpy_htod_async(self.inputs[name][0], data, self.stream)
        self.context.execute_async_v3(self.stream.handle)  # Python name for C++ enqueueV3

    def get_output(self):
        results = {}
        for name, (dev_mem, shape, dtype) in self.outputs.items():
            host_mem = np.empty(shape, dtype=dtype)
            cuda.memcpy_dtoh_async(host_mem, dev_mem, self.stream)
            results[name] = host_mem
        self.stream.synchronize()
        return results
```

Key design: the `infer_async()` + `get_output()` split enables pipelining — cuVSLAM runs on Stream A while satellite matching runs on Stream B.

### Component: Visual Odometry (UNCHANGED)

cuVSLAM — a native CUDA library, not affected by the TRT migration. Already optimal.

### Component: Satellite Image Matching (UPDATED runtime + fallback chain)

| Solution | Tools | Advantages | Limitations | Performance (est. Orin Nano Super TRT FP16) | Params | Fit |
|----------|-------|------------|-------------|---------------------------------------------|--------|-----|
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30=17.86m UAV-VisLoc), 6.31M params, smallest model | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary (if TRT export succeeds AND ≤200ms) |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT repo, 138 stars). Semi-dense. CVPR 2024. High accuracy. | 2.4x more params than LiteSAM. Requires an einsum→elementary-ops rewrite for TRT (documented in Coarse_LoFTR_TRT). | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python (or XFeatTensorRT C++) | Fastest. Proven TRT implementation. Lightweight. | General-purpose, not designed for the cross-view satellite-aerial gap (though the nadir-nadir gap is small). | Est. ~50-100ms | <5M | ✅ Speed fallback |

**Decision tree (day one, on the Orin Nano Super)**:

1. Clone the LiteSAM repo → reparameterize MobileOne → `torch.onnx.export()` → `polygraphy inspect`
2. If the ONNX export succeeds → `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
3. If MinGRU causes an ONNX/TRT failure → rewrite the MinGRU forward() as an unrolled 9-step loop → retry
4. If the rewrite fails or accuracy degrades → **switch to EfficientLoFTR TRT**:
   - Apply the Coarse_LoFTR_TRT adaptation techniques (einsum replacement, etc.)
   - Export to ONNX → trtexec --fp16
   - Benchmark at 640×480 and 1280px
5. Benchmark the winner: **if ≤200ms → use it. If >200ms but ≤300ms → acceptable (async on Stream B). If >300ms → use XFeat TRT**
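
Step 3's unrolled rewrite works because the MinGRU gates depend only on the input, so `z_t` and the candidate `h̃_t` are precomputed and the recurrence reduces to a plain 9-step loop of Mul/Add — all TRT-exportable ops. A minimal numpy sketch (the weight shapes are illustrative, not LiteSAM's actual layer sizes):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mingru_unrolled(x, Wz, bz, Wh, bh, seq_len=9):
    """Unrolled MinGRU forward over the 3x3 candidate window (seq_len=9).

    x: (seq_len, d_in) window features; Wz/Wh: (d_in, d_h); bz/bh: (d_h,)
    """
    z = sigmoid(x @ Wz + bz)        # (seq_len, d_h) gates — input-only
    h_tilde = x @ Wh + bh           # (seq_len, d_h) candidates — input-only
    h = np.zeros(Wz.shape[1])
    for t in range(seq_len):        # fixed-length loop, unrolled at export time
        h = (1.0 - z[t]) * h + z[t] * h_tilde[t]
    return h
```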

**EfficientLoFTR TRT adaptation** (from the Coarse_LoFTR_TRT repo, a proven workflow):

- Replace `torch.einsum()` with elementary ops (view, bmm, reshape, sum)
- Replace any TRT-incompatible high-level PyTorch functions
- Use the ONNX export path (requires less memory than Torch-TensorRT on an 8GB device)
- Knowledge distillation is available for further parameter reduction if needed
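
The einsum replacement in the first bullet can be illustrated in numpy. The `nlhd,nshd->nlsh` contraction below is an assumed LoFTR-style attention shape, not the exact expression from the repo:

```python
import numpy as np

def attn_scores_einsum(q, k):
    # TRT-unfriendly form: a single einsum over batch, query, key, head, dim
    return np.einsum("nlhd,nshd->nlsh", q, k)

def attn_scores_elementary(q, k):
    # Equivalent elementary-op form: transpose + reshape + batched matmul,
    # which exports cleanly to ONNX/TRT
    n, l, h, d = q.shape
    s = k.shape[1]
    qh = q.transpose(0, 2, 1, 3).reshape(n * h, l, d)        # (n*h, l, d)
    kh = k.transpose(0, 2, 1, 3).reshape(n * h, s, d)        # (n*h, s, d)
    scores = qh @ kh.transpose(0, 2, 1)                      # bmm → (n*h, l, s)
    return scores.reshape(n, h, l, s).transpose(0, 2, 3, 1)  # back to (n, l, s, h)
```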

### Component: Sensor Fusion (UNCHANGED)

ESKF — a CPU-based mathematical filter, not affected.

### Component: Flight Controller Integration (UNCHANGED)

pymavlink — not affected by the TRT migration.

### Component: Ground Station Telemetry (UNCHANGED)

MAVLink NAMED_VALUE_FLOAT — not affected.

### Component: Startup & Lifecycle (UPDATED)

**Updated startup sequence**:

1. Boot the Jetson → start the GPS-Denied service (systemd)
2. Connect to the flight controller via pymavlink on UART
3. Wait for a heartbeat from the flight controller
4. **Initialize the PyCUDA context**
5. **Load TRT engines**: litesam.engine + xfeat.engine via tensorrt.Runtime.deserialize_cuda_engine()
6. **Allocate GPU I/O buffers** for both models
7. **Create CUDA streams**: Stream A (cuVSLAM), Stream B (satellite matching)
8. Read GLOBAL_POSITION_INT → init the ESKF
9. Start cuVSLAM with the first camera frames
10. Begin the GPS_INPUT output loop at 5-10Hz
11. Preload satellite tiles within ±2km into RAM
12. System ready

**Engine load time**: ~1-3 seconds per engine (deserialization from the .engine file). One-time cost at startup.

### Component: Thermal Management (UNCHANGED)

Same adaptive pipeline. TRT engines are slightly more power-efficient than ONNX Runtime, but the difference is within noise.

### Component: Object Localization (UNCHANGED)

Not affected — a trigonometric calculation, no AI inference.

## Speed Optimization Techniques

### 1. cuVSLAM for Visual Odometry (~9ms/frame)

Unchanged from draft03. Native CUDA, not part of the TRT migration.

### 2. Native TRT Engine Inference (NEW)

All AI models run as pre-compiled TRT FP16 engines:

- Engine files built offline with trtexec (one-time per model version)
- Loaded at startup (~1-3s per engine)
- Inference via context.enqueue_v3() on the dedicated CUDA Stream B
- GPU buffers pre-allocated — zero runtime allocation during flight
- No ONNX Runtime dependency — no framework overhead

Memory advantage over ONNX Runtime TRT-EP: ~560-600MB saved (both models combined).
Latency advantage: eliminates the ONNX wrapper overhead and guarantees tensor core utilization.

### 3. CUDA Stream Pipelining (REFINED)

- Stream A: cuVSLAM VO for the current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: TRT engine inference for satellite matching (LiteSAM or XFeat, async)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management
- **NEW**: both cuVSLAM and the TRT engines use CUDA streams natively — no framework abstraction layer, direct GPU scheduling

### 4-7. (UNCHANGED from draft03)

Keyframe-based satellite matching, TensorRT FP16 optimization, proactive tile loading, and 5-10Hz GPS_INPUT output — all unchanged.

## Processing Time Budget (per frame, 333ms interval)

### Normal Frame (non-keyframe)

Unchanged from draft03 — cuVSLAM dominates at ~22ms total.

### Keyframe Satellite Matching (async, CUDA Stream B)

**Path A — LiteSAM TRT Engine FP16 at 1280px**:

| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| Copy input to GPU buffer | <0.5ms | PyCUDA memcpy_htod_async |
| LiteSAM TRT Engine FP16 | ≤200ms | context.enqueue_v3(stream_B) |
| Copy output from GPU | <0.5ms | PyCUDA memcpy_dtoh_async |
| Geometric pose (RANSAC) | ~5ms | Homography |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async on Stream B |

**Path B — XFeat TRT Engine FP16**:

| Step | Time | Notes |
|------|------|-------|
| XFeat TRT Engine inference | ~50-80ms | context.enqueue_v3(stream_B) |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async on Stream B |

## Memory Budget (Jetson Orin Nano Super, 8GB shared)

| Component | Memory (Native TRT) | Memory (ONNX RT TRT-EP) | Notes |
|-----------|---------------------|-------------------------|-------|
| OS + runtime | ~1.5GB | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | ~200-500MB | CUDA library + map |
| **LiteSAM TRT engine** | **~50-80MB** | **~330-360MB** | Native TRT vs TRT-EP. If LiteSAM fails: EfficientLoFTR ~100-150MB |
| **XFeat TRT engine** | **~30-50MB** | **~310-330MB** | Native TRT vs TRT-EP |
| Preloaded satellite tiles | ~200MB | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | ~20MB | |
| FastAPI (local IPC) | ~50MB | ~50MB | |
| ESKF + buffers | ~10MB | ~10MB | |
| ONNX Runtime framework | **0MB** | **~150MB** | Eliminated with native TRT |
| **Total** | **~2.1-2.9GB** | **~2.8-3.6GB** | |
| **% of 8GB** | **26-36%** | **35-45%** | |
| **Savings** | — | — | **~700MB saved with native TRT** |

## Confidence Scoring → GPS_INPUT Mapping

Unchanged from draft03.

## Key Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **LiteSAM MinGRU ops unsupported in TRT 10.3** | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification: ONNX export → polygraphy → trtexec. If MinGRU fails: (1) rewrite as an unrolled 9-step loop; (2) if it still fails: **switch to EfficientLoFTR TRT** (proven TRT path, Coarse_LoFTR_TRT, 15.05M params). XFeat TRT as speed fallback. |
| **TRT engine build OOM on 8GB Jetson** | LOW | Cannot build engines on the target device | Our models are small (6.31M LiteSAM, <5M XFeat), so OOM is unlikely. If it occurs: reduce --memPoolSize, or build on an identical Orin Nano module with more headroom. |
| **Engine incompatibility after a JetPack update** | MEDIUM | Must rebuild engines | Include an engine rebuild in the JetPack update procedure. Takes minutes per model. |
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink | Unchanged from draft03 |
| **cuVSLAM fails on low-texture terrain** | HIGH | Frequent tracking loss | Unchanged from draft03 |
| **Thermal throttling** | MEDIUM | Satellite matching budget blown | Unchanged from draft03 |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use a fallback matcher | Day-one benchmark. Fallback chain: EfficientLoFTR TRT (if ≤300ms) → XFeat TRT (if all >300ms) |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Unchanged from draft03 |

## Testing Strategy

### Integration / Functional Tests

All tests from draft03 unchanged, plus:

- **TRT engine load test**: verify litesam.engine and xfeat.engine load successfully on the Jetson Orin Nano Super
- **TRT inference correctness**: compare TRT engine output vs the PyTorch reference output (max L1 error < 0.01)
- **CUDA Stream B pipelining**: verify satellite matching on Stream B does not block cuVSLAM on Stream A
- **Pre-built engine validation**: verify engine files from the offline preparation work without a rebuild at runtime
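
The correctness gate above reduces to a one-line numpy comparison once both outputs are on the host. A sketch (the helper name is hypothetical; both arrays are assumed already copied from the GPU / exported from PyTorch):

```python
import numpy as np

def outputs_match(trt_out, torch_out, max_l1=0.01):
    """Pass when the maximum element-wise L1 error between a TRT engine's
    output and its PyTorch reference is below the threshold."""
    err = np.abs(trt_out.astype(np.float32) - torch_out.astype(np.float32))
    return float(err.max()) < max_l1
```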

### Non-Functional Tests

All tests from draft03 unchanged, plus:

- **TRT engine build time**: measure trtexec build time for LiteSAM and XFeat on the Orin Nano Super (expected: 1-5 minutes each)
- **TRT engine load time**: measure deserialization time (expected: 1-3 seconds each)
- **Memory comparison**: measure actual GPU memory with native TRT vs ONNX RT TRT-EP for both models
- **MinGRU TRT compatibility** (day-one blocker):
  1. Clone the LiteSAM repo, load pretrained weights
  2. Reparameterize the MobileOne backbone
  3. `torch.onnx.export(model, dummy, "litesam.onnx", opset_version=17)`
  4. `polygraphy inspect model litesam.onnx` — check for unsupported ops
  5. `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
  6. If step 3 or 5 fails on MinGRU: rewrite the MinGRU forward() as an unrolled loop and retry
  7. If it still fails: switch to EfficientLoFTR and apply the Coarse_LoFTR_TRT adaptation
  8. Compare TRT output vs the PyTorch reference (max L1 error < 0.01)
- **EfficientLoFTR TRT fallback benchmark** (if LiteSAM fails): apply the TRT adaptation from Coarse_LoFTR_TRT → ONNX → trtexec → measure latency at 640×480 and 1280px
- **Tensor core utilization**: verify with Nsight that the TRT engines use tensor cores (unlike ONNX RT CUDA EP)
|
||||
|
||||
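For step 6 of the MinGRU check: assuming the standard minGRU formulation (z_t = σ(W_z x_t), h̃_t = W_h x_t, h_t = (1 − z_t)·h_{t−1} + z_t·h̃_t; biases omitted here), writing the recurrence as an explicit fixed-length loop is what lets ONNX export trace it into plain ops instead of a parallel scan. A NumPy sketch of the same computation (the real rewrite would be the PyTorch forward()):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def min_gru_unrolled(x, w_z, w_h):
    """minGRU recurrence as an explicit per-step loop (x: (T, D); w_z, w_h: (D, D)).
    A fixed-length loop like this unrolls at export time into plain matmul,
    sigmoid, and elementwise ops, avoiding scan constructs TRT may reject."""
    h = np.zeros(x.shape[1])
    outs = []
    for x_t in x:
        z = sigmoid(x_t @ w_z)        # update gate (depends only on x_t)
        h_tilde = x_t @ w_h           # candidate state (no h-dependence in minGRU)
        h = (1.0 - z) * h + z * h_tilde
        outs.append(h)
    return np.stack(outs)
```

The unrolled form fixes the sequence length at export time, which matches the fixed image resolution the matcher runs at.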
## References

- ONNX Runtime Issue #24085 (Jetson Orin Nano tensor core bug): https://github.com/microsoft/onnxruntime/issues/24085
- ONNX Runtime Issue #20457 (TRT-EP memory overhead): https://github.com/microsoft/onnxruntime/issues/20457
- ONNX Runtime Issue #12083 (TRT-EP vs native TRT): https://github.com/microsoft/onnxruntime/issues/12083
- NVIDIA TensorRT 10 Python API: https://docs.nvidia.com/deeplearning/tensorrt/10.15.1/inference-library/python-api-docs.html
- TensorRT Best Practices: https://docs.nvidia.com/deeplearning/tensorrt/latest/performance/best-practices.html
- TensorRT engine hardware specificity: https://github.com/NVIDIA/TensorRT/issues/1920
- trtexec ONNX conversion: https://nvidia-jetson.piveral.com/jetson-orin-nano/how-to-convert-onnx-to-engine-on-jetson-orin-nano-dev-board/
- Torch-TensorRT JetPack 6.2: https://docs.pytorch.org/TensorRT/v2.10.0/getting_started/jetpack.html
- XFeatTensorRT: https://github.com/PranavNedunghat/XFeatTensorRT
- JetPack 6.2 Release Notes: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/index.html
- Jetson Orin Nano Super: https://developer.nvidia.com/blog/nvidia-jetson-orin-nano-developer-kit-gets-a-super-boost/
- DLA on Jetson Orin: https://developer.nvidia.com/blog/maximizing-deep-learning-performance-on-nvidia-jetson-orin-with-dla/
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR HuggingFace: https://huggingface.co/docs/transformers/en/model_doc/efficientloftr
- Coarse_LoFTR_TRT (TRT for embedded): https://github.com/Kolkir/Coarse_LoFTR_TRT
- Coarse_LoFTR_TRT paper: https://ar5iv.labs.arxiv.org/html/2202.00770
- LoFTR_TRT: https://github.com/Kolkir/LoFTR_TRT
- minGRU ("Were RNNs All We Needed?"): https://huggingface.co/papers/2410.01201
- minGRU PyTorch implementation: https://github.com/lucidrains/minGRU-pytorch
- LiteSAM paper (MinGRU details, Eqs 12-16): https://www.mdpi.com/2072-4292/17/19/3349
- DALGlue (UAV feature matching, 2025): https://www.nature.com/articles/s41598-025-21602-5
- All references from solution_draft03.md

## Related Artifacts

- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts (this assessment): `_docs/00_research/trt_engine_migration/`
- Previous research: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`

@@ -0,0 +1,257 @@

# Tech Stack Evaluation

## Requirements Summary

### Functional
- GPS-denied visual navigation for fixed-wing UAV
- Frame-center GPS estimation via VO + satellite matching + IMU fusion
- Object-center GPS via geometric projection
- Real-time streaming via REST API + SSE
- Disconnected route segment handling
- User-input fallback for unresolvable frames

### Non-Functional
- <400ms per-frame processing (camera @ ~3fps)
- <50m accuracy for 80% of frames, <20m for 60%
- <8GB total memory (CPU+GPU shared pool)
- Up to 3000 frames per flight session
- Image Registration Rate >95% (normal segments)

### Hardware Constraints
- **Jetson Orin Nano Super** (8GB LPDDR5, 1024 CUDA cores, 67 TOPS INT8)
- **JetPack 6.2.2**: CUDA 12.6.10, TensorRT 10.3.0, cuDNN 9.3
- ARM64 (aarch64) architecture
- No internet connectivity during flight

## Technology Evaluation

### Platform & OS

| Option | Version | Score (1-5) | Notes |
|--------|---------|-------------|-------|
| **JetPack 6.2.2 (L4T)** | Ubuntu 22.04 based | **5** | Only supported OS for Orin Nano Super. Includes CUDA 12.6, TensorRT 10.3, cuDNN 9.3 |

**Selected**: JetPack 6.2.2 — no alternative.

### Primary Language

| Option | Fitness | Maturity | Perf on Jetson | Ecosystem | Score |
|--------|---------|----------|----------------|-----------|-------|
| **Python 3.10+** | 5 | 5 | 4 | 5 | **4.8** |
| C++ | 5 | 5 | 5 | 3 | 4.5 |
| Rust | 3 | 3 | 5 | 2 | 3.3 |

**Selected**: **Python 3.10+** as primary language.
- cuVSLAM provides Python bindings (PyCuVSLAM v15.0.0)
- TensorRT has a Python API
- FastAPI is Python-native
- OpenCV has full Python+CUDA bindings
- Performance-critical paths offloaded to CUDA via cuVSLAM/TensorRT — Python is glue code only
- C++ reserved for the custom ESKF if NumPy proves too slow (unlikely for a 16-state filter at 100Hz)

### Visual Odometry

| Option | Version | FPS on Orin Nano | Memory | License | Score |
|--------|---------|------------------|--------|---------|-------|
| **cuVSLAM (PyCuVSLAM)** | v15.0.0 (Mar 2026) | 116fps @ 720p | ~200-300MB | Free (NVIDIA, closed-source) | **5** |
| XFeat frame-to-frame | TensorRT engine | ~30-50ms/frame | ~50MB | MIT | 3.5 |
| ORB-SLAM3 | v1.0 | ~30fps | ~300MB | GPLv3 | 2.5 |

**Selected**: **PyCuVSLAM v15.0.0**
- 116fps on Orin Nano 8G at 720p (verified via Intermodalics benchmark)
- Mono + IMU mode natively supported
- Auto IMU fallback on tracking loss
- Pre-built aarch64 wheel: `pip install -e bin/aarch64`
- Loop closure built-in

**Risk**: Closed-source; nadir-only camera not explicitly tested. **Fallback**: XFeat frame-to-frame matching.

### Satellite Image Matching (Benchmark-Driven Selection)

**Day-one benchmark decides between two candidates:**

| Option | Params | Accuracy (UAV-VisLoc) | Est. Time on Orin Nano | License | Score |
|--------|--------|----------------------|------------------------|---------|-------|
| **LiteSAM (opt)** | 6.31M | RMSE@30 = 17.86m | ~300-500ms @ 480px (estimated) | Open-source | **4** (if fast enough) |
| **XFeat semi-dense** | ~5M | Not benchmarked on UAV-VisLoc | ~50-100ms | MIT | **4** (if LiteSAM too slow) |

**Decision rule**:
1. Export LiteSAM (opt) to TensorRT FP16 on Orin Nano Super
2. Benchmark at 480px, 640px, 800px
3. If ≤400ms at 480px → LiteSAM
4. If >400ms → **abandon LiteSAM, XFeat is primary**

| Requirement | LiteSAM (opt) | XFeat semi-dense |
|-------------|---------------|------------------|
| PyTorch → ONNX → TensorRT export | Required | Required |
| TensorRT FP16 engine | 6.31M params, ~25MB engine | ~5M params, ~20MB engine |
| Input preprocessing | Resize to 480px, normalize | Resize to 640px, normalize |
| Matching pipeline | End-to-end (detect + match + refine) | Detect → KNN match → geometric verify |
| Cross-view robustness | Designed for satellite-aerial gap | General-purpose, less robust |

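The decision rule above is mechanical enough to encode directly, so the benchmark harness can emit the choice without a human in the loop. A sketch (the function name and threshold default are illustrative):

```python
def select_matcher(litesam_ms_at_480px: float, budget_ms: float = 400.0) -> str:
    """Day-one go/no-go: keep LiteSAM only if its TensorRT FP16 latency at
    480px fits the per-frame budget; otherwise XFeat becomes primary."""
    return "litesam" if litesam_ms_at_480px <= budget_ms else "xfeat"

# e.g. a measured 520 ms at 480px forces the XFeat path
choice = select_matcher(520.0)
```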
### Sensor Fusion

| Option | Complexity | Accuracy | Compute @ 100Hz | Score |
|--------|-----------|----------|-----------------|-------|
| **ESKF (custom)** | Low | Good | <1ms/step | **5** |
| Hybrid ESKF/UKF | Medium | ~49% lower error (reported) | ~2-3ms/step | 3.5 |
| GTSAM Factor Graph | High | Best | ~10-50ms/step | 2 |

**Selected**: **Custom ESKF in Python (NumPy/SciPy)**
- 16-state vector, well within NumPy capability
- FilterPy (v1.4.5, MIT) as reference/fallback, but a custom implementation is preferred for tighter control
- If the 100Hz IMU prediction step proves slow in Python: rewrite it as a Cython or C extension (~1 day effort)

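To ground the <1ms/step claim: the dominant per-IMU-sample cost is the covariance propagation P ← F P Fᵀ + Q, a pair of 16×16 matrix products for a 16-state filter. A minimal NumPy sketch (matrices are placeholders, not the real IMU Jacobians):

```python
import numpy as np

def eskf_predict_cov(P: np.ndarray, F: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """One ESKF covariance prediction step: P <- F P F^T + Q."""
    return F @ P @ F.T + Q

n = 16                       # error-state dimension from the text above
P = np.eye(n) * 0.01         # error covariance
F = np.eye(n)                # state-transition Jacobian (identity placeholder)
Q = np.eye(n) * 1e-6         # process-noise covariance
for _ in range(100):         # one second of IMU prediction at 100 Hz
    P = eskf_predict_cov(P, F, Q)
```

At this size the matmuls are microsecond-scale in NumPy, which is why the Cython/C fallback is considered unlikely to be needed.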
### Image Preprocessing

| Option | Tool | Time on Orin Nano | Notes | Score |
|--------|------|-------------------|-------|-------|
| **OpenCV CUDA resize** | cv2.cuda.resize | ~2-3ms (pre-allocated) | Must build OpenCV with CUDA from source. Pre-allocate GPU mats to avoid allocation overhead | **4** |
| NVIDIA VPI resize | VPI 3.2 | ~1-2ms | Part of JetPack, potentially faster | 4 |
| CPU resize (OpenCV) | cv2.resize | ~5-10ms | No GPU needed, simpler | 3 |

**Selected**: **OpenCV CUDA** (pre-allocated GPU memory) or **VPI 3.2** (whichever is faster in benchmark). Both available in JetPack 6.2.
- Must build OpenCV from source with `CUDA_ARCH_BIN=8.7` for the Orin Nano Ampere architecture
- Alternative: VPI 3.2 is pre-installed in JetPack 6.2, no build step needed

### API & Streaming Framework

| Option | Version | Async Support | SSE Support | Score |
|--------|---------|--------------|-------------|-------|
| **FastAPI + sse-starlette** | FastAPI 0.115+, sse-starlette 3.3.2 | Native async/await | EventSourceResponse with auto-disconnect | **5** |
| Flask + flask-sse | Flask 3.x | Limited | Redis dependency | 2 |
| Raw aiohttp | aiohttp 3.x | Full | Manual SSE implementation | 3 |

**Selected**: **FastAPI + sse-starlette v3.3.2**
- sse-starlette: 108M downloads/month, BSD-3 license, production-stable
- Auto-generated OpenAPI docs
- Native async for non-blocking VO + satellite pipeline
- Uvicorn as ASGI server

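For reference, the wire format that EventSourceResponse produces from yielded events is plain `text/event-stream` framing. A sketch of one message (the event name and payload fields are illustrative, not the project's final schema):

```python
import json

def sse_event(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Events message as it appears on the wire:
    an 'event:' line, a 'data:' line, and a blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

msg = sse_event("position", {"lat": 48.5, "lon": 37.9, "confidence": 0.82})
```

A browser `EventSource` (or any SSE client) subscribed to the stream dispatches this as a `position` event whose data field is the JSON string.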
### Satellite Tile Storage & Indexing

| Option | Complexity | Lookup Speed | Score |
|--------|-----------|-------------|-------|
| **GeoHash-indexed directory** | Low | O(1) hash lookup | **5** |
| SQLite + spatial index | Medium | O(log n) | 4 |
| PostGIS | High | O(log n) | 2 (overkill) |

**Selected**: **GeoHash-indexed directory structure**
- Pre-flight: download tiles, store as `{geohash}/{zoom}_{x}_{y}.jpg` + `{geohash}/{zoom}_{x}_{y}_resized.jpg`
- Runtime: compute geohash from ESKF position → direct directory lookup
- Metadata in JSON sidecar files
- No database dependency on the Jetson during flight

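At runtime the pygeohash dependency handles encoding (`pygeohash.encode(lat, lon, precision)`); a pure-Python sketch of the standard algorithm and of the path layout above, for illustration (the precision and `tile_path` helper are assumptions, not settled design):

```python
from pathlib import Path

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 5) -> str:
    """Standard geohash: alternately bisect lon/lat, 5 bits per base32 char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, nbits, even, out = 0, 0, True, []
    while len(out) < precision:
        if even:                       # even bit: longitude
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:                          # odd bit: latitude
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        nbits += 1
        even = not even
        if nbits == 5:                 # flush a base32 character
            out.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(out)

def tile_path(root: str, lat: float, lon: float, zoom: int, x: int, y: int) -> Path:
    """Directory lookup matching the {geohash}/{zoom}_{x}_{y}.jpg scheme."""
    return Path(root) / geohash(lat, lon) / f"{zoom}_{x}_{y}.jpg"
```

The O(1) claim follows from the lookup being a single hash computation plus one directory access, with no spatial query.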
### Satellite Tile Provider

| Provider | Max Zoom | GSD | Pricing | Eastern Ukraine Coverage | Score |
|----------|----------|-----|---------|--------------------------|-------|
| **Google Maps Tile API** | 18-19 | ~0.3-0.5 m/px | 100K tiles free/month, then $0.48/1K | Partial (conflict zone gaps) | **4** |
| Bing Maps | 18-19 | ~0.3-0.5 m/px | 125K free/year (basic) | Similar | 3.5 |
| Mapbox Satellite | 18-19 | ~0.5 m/px | 200K free/month | Similar | 3.5 |

**Selected**: **Google Maps Tile API** (per restrictions.md). The 100K free tiles/month cover a substantial operational area at zoom 19 (a 256 px zoom-19 tile spans roughly 50 m per side at ~48°N). For larger operational areas, costs are manageable at $0.48/1K tiles.

### Output Format

| Format | Standard | Tooling | Score |
|--------|----------|---------|-------|
| **GeoJSON** | RFC 7946 | Universal GIS support | **5** |
| CSV (lat, lon, confidence) | De facto | Simple, lightweight | 4 |

**Selected**: **GeoJSON** as primary, CSV as export option. Per AC: WGS84 coordinates.

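One detail worth pinning down early: RFC 7946 orders coordinates as [lon, lat], the reverse of the lat/lon convention used elsewhere in this document. A sketch of a frame-center estimate as a GeoJSON Feature (property names are illustrative, not the final schema; the `geojson` package from the dependency list can replace the plain dicts):

```python
import json

def frame_feature(lat: float, lon: float, confidence: float, frame_id: int) -> dict:
    """One frame-center GPS estimate as an RFC 7946 Feature (WGS84)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},  # lon first!
        "properties": {"confidence": confidence, "frame_id": frame_id},
    }

fc = {"type": "FeatureCollection", "features": [frame_feature(48.5, 37.9, 0.82, 17)]}
doc = json.dumps(fc)
```

The same Feature objects can be yielded over the SSE stream and collected into a FeatureCollection for the end-of-flight export.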
## Tech Stack Summary

```
┌────────────────────────────────────────────────────┐
│ HARDWARE: Jetson Orin Nano Super 8GB               │
│ OS: JetPack 6.2.2 (L4T / Ubuntu 22.04)             │
│ CUDA 12.6.10 / TensorRT 10.3.0 / cuDNN 9.3         │
├────────────────────────────────────────────────────┤
│ LANGUAGE: Python 3.10+                             │
│ FRAMEWORK: FastAPI + sse-starlette 3.3.2           │
│ SERVER: Uvicorn (ASGI)                             │
├────────────────────────────────────────────────────┤
│ VISUAL ODOMETRY: PyCuVSLAM v15.0.0                 │
│ SATELLITE MATCH: LiteSAM(opt) or XFeat (benchmark) │
│ SENSOR FUSION: Custom ESKF (NumPy/SciPy)           │
│ PREPROCESSING: OpenCV CUDA or VPI 3.2              │
│ INFERENCE: TensorRT 10.3.0 (FP16)                  │
├────────────────────────────────────────────────────┤
│ TILE PROVIDER: Google Maps Tile API                │
│ TILE STORAGE: GeoHash-indexed directory            │
│ OUTPUT: GeoJSON (WGS84) via SSE stream             │
└────────────────────────────────────────────────────┘
```

## Dependency List

### Python Packages (pip)

| Package | Version | Purpose |
|---------|---------|---------|
| pycuvslam | v15.0.0 (aarch64 wheel) | Visual odometry |
| fastapi | >=0.115 | REST API framework |
| sse-starlette | >=3.3.2 | SSE streaming |
| uvicorn | >=0.30 | ASGI server |
| numpy | >=1.26 | ESKF math, array ops |
| scipy | >=1.12 | Rotation matrices, spatial transforms |
| opencv-python (CUDA build) | >=4.8 | Image preprocessing (must build from source with CUDA) |
| torch (aarch64) | >=2.3 (JetPack-compatible) | LiteSAM model loading (if selected) |
| tensorrt | 10.3.0 (JetPack bundled) | Inference engine |
| pycuda | >=2024.1 | CUDA stream management |
| geojson | >=3.1 | GeoJSON output formatting |
| pygeohash | >=1.2 | GeoHash tile indexing |

### System Dependencies (JetPack 6.2.2)

| Component | Version | Notes |
|-----------|---------|-------|
| CUDA Toolkit | 12.6.10 | Pre-installed |
| TensorRT | 10.3.0 | Pre-installed |
| cuDNN | 9.3 | Pre-installed |
| VPI | 3.2 | Pre-installed, alternative to OpenCV CUDA for resize |
| cuVSLAM runtime | Bundled with PyCuVSLAM wheel | |

### Offline Preprocessing Tools (developer machine, not Jetson)

| Tool | Purpose |
|------|---------|
| Python 3.10+ | Tile download script |
| Google Maps Tile API key | Satellite tile access |
| torch + LiteSAM weights | Feature pre-extraction (if LiteSAM selected) |
| trtexec (TensorRT) | Model export to TensorRT engine |

## Risk Assessment

| Technology | Risk | Likelihood | Impact | Mitigation |
|-----------|------|-----------|--------|------------|
| cuVSLAM | Closed-source, nadir camera untested | Medium | High | XFeat frame-to-frame as open-source fallback |
| LiteSAM | May exceed 400ms on Orin Nano Super | High | High | **Abandon for XFeat** — day-one benchmark is go/no-go |
| OpenCV CUDA build | Build complexity on Jetson, CUDA arch compatibility | Medium | Low | VPI 3.2 as drop-in alternative (pre-installed) |
| Google Maps Tile API | Conflict zone coverage gaps, EEA restrictions | Medium | Medium | Test tile availability for operational area pre-flight; alternative providers (Bing, Mapbox) |
| Custom ESKF | Implementation bugs, tuning effort | Low | Medium | FilterPy v1.4.5 as reference; well-understood algorithm |
| Python GIL | Concurrent VO + satellite matching contention | Low | Low | CUDA operations release GIL; use asyncio + threading for I/O |

## Learning Requirements

| Technology | Team Expertise Needed | Ramp-up Time |
|-----------|----------------------|--------------|
| PyCuVSLAM | SLAM concepts, Python API, camera calibration | 2-3 days |
| TensorRT model export | ONNX export, trtexec, FP16 optimization | 2-3 days |
| LiteSAM architecture | Transformer-based matching (if selected) | 1-2 days |
| XFeat | Feature detection/matching concepts | 1 day |
| ESKF | Kalman filtering, quaternion math, multi-rate fusion | 3-5 days |
| FastAPI + SSE | Async Python, ASGI, SSE protocol | 1 day |
| GeoHash spatial indexing | Geospatial concepts | 0.5 days |
| Jetson deployment | JetPack, power modes, thermal management | 2-3 days |

## Development Environment

| Environment | Purpose | Setup |
|-------------|---------|-------|
| **Developer machine** (x86_64, GPU) | Development, unit testing, model export | Docker with CUDA + TensorRT |
| **Jetson Orin Nano Super** | Integration testing, benchmarking, deployment | JetPack 6.2.2 flashed, SSH access |

Code should be developed and unit-tested on x86_64, then deployed to the Jetson for integration and performance testing. cuVSLAM ships aarch64-only, and TensorRT engines are built per-device, so both must be mocked in x86_64 tests.

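Mocking the aarch64-only modules on x86_64 can be done by registering a stub before the code under test imports them. A sketch (`pycuvslam` names the wheel from the dependency list; the `Tracker` attribute is illustrative, not the real API surface):

```python
import sys
import types
from unittest import mock

# Register a stand-in for the aarch64-only wheel so x86_64 unit tests can
# import modules that do `import pycuvslam` at the top level.
stub = types.ModuleType("pycuvslam")
stub.Tracker = mock.MagicMock(name="pycuvslam.Tracker")  # hypothetical class
sys.modules["pycuvslam"] = stub

import pycuvslam  # resolves to the stub on any architecture

tracker = pycuvslam.Tracker()  # tests can now assert on calls made to it
```

In a pytest suite the same registration would typically live in `conftest.py` so it runs before any test module is collected.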