Refactor acceptance criteria, problem description, and restrictions for UAV GPS-Denied system. Enhance clarity and detail in performance metrics, image processing requirements, and operational constraints. Introduce new sections for UAV specifications, camera details, satellite imagery, and onboard hardware.

Oleksandr Bezdieniezhnykh
2026-03-17 09:00:06 +02:00
parent 767874cb90
commit f2aa95c8a2
35 changed files with 4857 additions and 26 deletions
Binary file not shown.
+40 -11
@@ -1,21 +1,50 @@
# Position Accuracy
- The system should determine GPS coordinates of frame centers for 80% of photos within 50m error compared to real GPS
- The system should determine GPS coordinates of frame centers for 60% of photos within 20m error compared to real GPS
- Maximum cumulative VO drift between satellite correction anchors should be less than 100 meters
- The system should report a confidence score per position estimate (high = satellite-anchored, low = VO-extrapolated with drift)
# Image Processing Quality
- Image Registration Rate > 95% for normal flight segments: the system can find enough matching features to confidently calculate the camera's 6-DoF pose and stitch that image into the trajectory
- Mean Reprojection Error (MRE) < 1.0 pixels (the distance, in pixels, between the original pixel location of the object and the re-projected pixel location)
# Resilience & Edge Cases
- The system should correctly continue work even in the presence of an outlier of up to 350m between 2 consecutive photos (due to tilt of the plane)
- The system should correctly continue work during sharp turns, where the next photo doesn't overlap at all or overlaps less than 5%. The next photo should be within 200m drift and at an angle of less than 70 degrees. Sharp-turn frames are expected to fail VO and should be handled by satellite-based re-localization
- The system should operate when the UAV makes a sharp turn and the next photos have no common points with the previous route. It should figure out the location of the new route segment and connect it to the previous route. There could be more than 2 such disconnected segments, so this strategy must be core to the system
- In case the system cannot determine the position of 3 consecutive frames by any means, it should send a re-localization request to the ground station operator via the telemetry link. While waiting for operator input, the system continues attempting VO/IMU dead reckoning and the flight controller uses last known position + IMU extrapolation
# Real-Time Onboard Performance
- Less than 400ms end-to-end per frame: from camera capture to GPS coordinate output to the flight controller (camera shoots at ~3fps)
- Memory usage should stay below 8GB shared memory (Jetson Orin Nano Super: CPU and GPU share the same 8GB LPDDR5 pool)
- The system must output calculated GPS coordinates directly to the flight controller via MAVLink GPS_INPUT messages (using MAVSDK)
- Position estimates are streamed to the flight controller frame-by-frame; the system does not batch or delay output
- The system may refine previously calculated positions and send corrections to the flight controller as updated estimates
# Startup & Failsafe
- The system initializes using the last known valid GPS position from the flight controller before GPS denial begins
- If the system completely fails to produce any position estimate for more than N seconds (TBD), the flight controller should fall back to IMU-only dead reckoning and the system should log the failure
- On companion computer reboot mid-flight, the system should attempt to re-initialize from the flight controller's current IMU-extrapolated position
# Ground Station & Telemetry
- Position estimates and confidence scores should be streamed to the ground station via the telemetry link for operator situational awareness
- The ground station can send commands to the onboard system (e.g., operator-assisted re-localization hint with approximate coordinates)
- Output coordinates in WGS84 format
# Object Localization
- Other onboard AI systems can request GPS coordinates of objects detected by the AI camera
- The GPS-Denied system calculates object coordinates trigonometrically using: current UAV GPS position (from GPS-Denied), known AI camera angle, zoom, and current flight altitude. Flat terrain is assumed
- Accuracy is consistent with the frame-center position accuracy of the GPS-Denied system
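Under the flat-terrain assumption, the object projection above reduces to simple trigonometry. A minimal sketch (function and parameter names are illustrative, not from the system; note that zoom only matters when localizing off-center pixels, since the camera's optical axis is unaffected by focal length):

```python
import math

def object_gps(uav_lat, uav_lon, altitude_m, cam_pitch_deg, cam_heading_deg):
    """Project the AI camera's line of sight onto flat terrain.

    cam_pitch_deg: camera axis angle from nadir (0 = straight down).
    cam_heading_deg: azimuth of the camera axis, clockwise from true north.
    Assumes flat terrain at the UAV's reference elevation (per the spec).
    """
    ground_range = altitude_m * math.tan(math.radians(cam_pitch_deg))
    d_north = ground_range * math.cos(math.radians(cam_heading_deg))
    d_east = ground_range * math.sin(math.radians(cam_heading_deg))
    # Small-offset conversion from meters to WGS84 degrees
    lat = uav_lat + d_north / 111_320.0
    lon = uav_lon + d_east / (111_320.0 * math.cos(math.radians(uav_lat)))
    return lat, lon
```

Since the UAV position itself comes from the GPS-Denied estimate, the object error is the frame-center error plus the (smaller) projection error, which is consistent with the criterion above.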
# Satellite Reference Imagery
- Satellite reference imagery resolution must be at least 0.5 m/pixel, ideally 0.3 m/pixel
- Satellite imagery for the operational area should be less than 2 years old where possible
- Satellite imagery must be pre-processed and loaded onto the companion computer before flight. Offline preprocessing time is not time-critical (can take minutes/hours)
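The pre-loading requirement implies a storage budget check during tile preparation. A rough back-of-the-envelope estimate (the corridor size and compression ratio below are illustrative assumptions, not requirements):

```python
def tile_storage_gb(area_km2, gsd_m_per_px, bytes_per_px=3, jpeg_ratio=8):
    """Rough storage estimate for pre-loaded satellite imagery.

    Pixel count = covered area / area of one ground pixel; the result is
    scaled by an assumed JPEG compression ratio (~8:1 for aerial imagery).
    """
    pixels = area_km2 * 1e6 / (gsd_m_per_px ** 2)
    return pixels * bytes_per_px / jpeg_ratio / 1e9  # compressed size in GB

# e.g. a hypothetical 100 x 50 km operational corridor:
gb_05 = tile_storage_gb(5000, 0.5)  # ~7.5 GB at 0.5 m/px
gb_03 = tile_storage_gb(5000, 0.3)  # ~20.8 GB at 0.3 m/px
```

This is why the 0.3 m/px "ideal" resolution interacts with the limited onboard storage noted in the restrictions.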
+2 -4
@@ -1,4 +1,2 @@
We have a wing-type UAV with a downward-pointing camera that can take photos 3 times per second at a resolution of 6200*4100. The plane also has a flight controller with an IMU. We know the GPS coordinates initially; during the flight, GPS could be disabled or spoofed. We need to determine the GPS coordinates of the center of each next frame from the camera, and also the coordinates of the center of any object in these photos. We can use an external satellite provider for ground checks on the existing photos, so before the flight the UAV's operator should upload the satellite photos to the plane's companion PC.
The real world examples are in the input_data folder, but the distance between photos there is much bigger than it would be from a real plane (they were taken at roughly 1 photo per 2-3 seconds, whereas in a real flight frames arrive at intervals of no more than 400-500ms).
+37 -11
@@ -1,11 +1,37 @@
# UAV & Flight
- Photos are taken only by airplane (fixed-wing) type UAVs
- Photos are taken by the camera pointing downwards and fixed, but it is not autostabilized
- The flying range is restricted to the eastern and southern parts of Ukraine (to the left of the Dnipro River)
- Altitude is predefined and no more than 1km. The height of the terrain can be neglected
- Flights are done mostly in sunny weather
- During the flight, UAVs can make sharp turns, so that the next photo may be absolutely different from the previous one (no same objects), but it is rather an exception than the rule
- Number of photos per flight could be up to 3000, usually in the 500-1500 range
# Cameras
- UAV has two cameras:
1. **Navigation camera** — fixed, pointing downwards, not autostabilized. Used by GPS-Denied system for position estimation
2. **AI camera** — main camera with configurable angle and zoom, used by onboard AI detection systems
- Navigation camera resolution: FullHD to 6252*4168. Camera parameters are known: focal length, sensor width, resolution, etc.
- Cameras are connected to the companion computer (interface TBD: USB, CSI, or GigE)
- Terrain is assumed flat (eastern/southern Ukraine operational area); height differences are negligible
# Satellite Imagery
- We can use satellite providers, but we're limited right now to Google Maps, which could be outdated for some regions
- Satellite imagery for the operational area must be pre-loaded onto the companion computer before flight
# Onboard Hardware
- Processing is done on a Jetson Orin Nano Super (67 TOPS, 8GB shared LPDDR5, 25W TDP)
- The companion computer runs JetPack (Ubuntu-based) with CUDA/TensorRT available
- Onboard storage for satellite imagery is limited (exact capacity TBD, but must be accounted for in tile preparation)
- Sustained GPU load may cause thermal throttling; the processing pipeline must stay within thermal envelope
# Sensors & Integration
- IMU data is available at a high rate via the flight controller
- The system communicates with the flight controller via MAVLink protocol using MAVSDK library
- The system must output GPS coordinates to the flight controller as a replacement for the real GPS module (MAVLink GPS_INPUT message)
- Ground station telemetry link is available but bandwidth-limited; it is not the primary output channel
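The GPS_INPUT replacement path can be sketched as follows. The field names, degE7 scaling, and ignore-flag bit values follow the MAVLink common message definition; the dict itself is illustrative, and an actual implementation would hand these values to MAVSDK or pymavlink rather than build a dict:

```python
import time

# GPS_INPUT_IGNORE_FLAGS bits from the MAVLink common message set
IGNORE_VEL_HORIZ = 8
IGNORE_VEL_VERT = 16

def make_gps_input(lat_deg, lon_deg, alt_m, hdop, fix_type=3):
    """Build the field set for a MAVLink GPS_INPUT message.

    lat/lon are scaled to degE7 integers as the message definition requires.
    Velocity fields are flagged as ignored here (the vision pipeline supplies
    position only); unsupplied DOP/accuracy fields would need their ignore
    bits set the same way in a real implementation.
    """
    return {
        "time_usec": int(time.time() * 1e6),
        "gps_id": 0,
        "ignore_flags": IGNORE_VEL_HORIZ | IGNORE_VEL_VERT,
        "fix_type": fix_type,      # 3 = 3D fix
        "lat": int(round(lat_deg * 1e7)),
        "lon": int(round(lon_deg * 1e7)),
        "alt": alt_m,
        "hdop": hdop,
        "satellites_visible": 10,  # plausible constant; the FC sanity-checks this
    }
```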
@@ -0,0 +1,74 @@
# Acceptance Criteria Assessment
## Acceptance Criteria
| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| Position accuracy (80% of photos) | ≤50m error | 15-150m achievable depending on method. SatLoc (2025): <15m with adaptive fusion. Mateos-Ramirez (2024): 142m mean at 1000m+ altitude. At 400m altitude with better GSD (~6cm/px) and satellite correction, ≤50m for 80% is realistic | Moderate — requires high-quality satellite imagery and robust feature matching pipeline | **Modified** — see notes on satellite imagery quality dependency |
| Position accuracy (60% of photos) | ≤20m error | Achievable only with satellite-anchored corrections, not with VO alone. SatLoc reports <15m with satellite anchoring + VO fusion. Requires 0.3-0.5 m/px satellite imagery and good terrain texture | High — requires premium satellite imagery, robust cross-view matching, and careful calibration | **Modified** — add dependency on satellite correction frequency |
| Outlier tolerance | 350m displacement between consecutive photos | At 400m altitude, image footprint is ~375x250m. A 350m displacement means near-zero overlap. VO will fail; system must rely on IMU dead-reckoning or satellite re-localization | Low — standard outlier detection can handle this | Modified — specify fallback strategy (IMU dead-reckoning + satellite re-matching) |
| Sharp turn handling (partial overlap) | <200m drift, <70° angle, <5% overlap | Standard VO fails below ~20-30% overlap. With <5% overlap, feature matching between consecutive frames is unreliable. Requires satellite-based re-localization or IMU bridging | High — requires separate re-localization module | Modified — clarify: "70%" should likely be "70 degrees"; add IMU-bridge requirement |
| Disconnected route segments | System should reconnect disconnected chunks | This is essentially a place recognition / re-localization problem. Solvable via satellite image matching for each new segment independently | High — core architectural requirement affecting system design | Modified — add: each segment should independently localize via satellite matching |
| User fallback input | Ask user after 3 consecutive failures | Reasonable fallback. Needs UI/API integration for interactive input | Low | No change |
| Processing time per image | <5 seconds | On Jetson Orin Nano Super (8GB shared memory): feasible with optimized pipeline. CUDA feature extraction ~50ms, matching ~100-500ms, satellite crop+match ~1-3s. Full pipeline 2-4s is achievable with image downsampling and TensorRT optimization | Moderate — requires TensorRT optimization and image downsampling strategy | **Modified** — specify this is for Jetson Orin Nano Super, not RTX 2060 |
| Real-time streaming | SSE for immediate results + refinement | Standard pattern, well-supported | Low | No change |
| Image Registration Rate | >95% | For consecutive frames with nadir camera in good conditions: 90-98% achievable. Drops significantly during sharp turns and over low-texture terrain (water, uniform fields). The 95% target conflicts with sharp-turn handling requirement | Moderate — requires learning-based matchers (SuperPoint/LightGlue) | **Modified** — clarify: 95% applies to "normal flight" segments only; sharp-turn frames are expected failures handled by re-localization |
| Mean Reprojection Error | <1.0 pixels | Achievable with modern methods (LightGlue, SuperGlue). Traditional methods typically 1-3 px. Deep learning matchers routinely achieve 0.3-0.8 px with proper calibration | Moderate — requires deep learning feature matchers | No change — achievable |
| REST API + SSE architecture | Background service | Standard architecture, well-supported in Python (FastAPI + SSE) | Low | No change |
| Satellite imagery resolution | ≥0.5 m/px, ideally 0.3 m/px | Google Maps for eastern Ukraine: variable, typically 0.5-1.0 m/px in rural areas. 0.3 m/px unlikely from Google Maps. Commercial providers (Maxar, Planet) offer 0.3-0.5 m/px but at significant cost | **High** — Google Maps may not meet 0.5 m/px in all areas of the operational region. 0.3 m/px requires commercial satellite providers | **Modified** — current Google Maps limitation may make this unachievable for all areas; consider fallback for degraded satellite quality |
| Confidence scoring | Per-position estimate (high=satellite, low=VO) | Standard practice in sensor fusion. Easy to implement | Low | No change |
| Output format | WGS84, GeoJSON or CSV | Standard, trivial to implement | Negligible | No change |
| Satellite imagery age | <2 years where possible | Google Maps imagery for conflict zones (eastern Ukraine) may be significantly outdated or intentionally degraded. Recency is hard to guarantee | Medium — may need multiple satellite sources | **Modified** — flag: conflict zone imagery may be intentionally limited |
| Max VO cumulative drift | <100m between satellite corrections | VIO drift typically 0.8-1% of distance. Between corrections at 1km intervals: ~10m drift. 100m budget allows corrections every ~10km — very generous | Low — easily achievable if corrections happen at reasonable intervals | No change — generous threshold |
| Memory usage | <8GB shared memory (Jetson Orin Nano Super) | Binding constraint. 8GB LPDDR5 shared between CPU and GPU. ~6-7GB usable after OS. 26MP images need downsampling | **Critical** — all processing must fit within 8GB shared memory | **Updated** — changed to Jetson Orin Nano Super constraint |
| Object center coordinates | Accuracy consistent with frame-center accuracy | New criterion — derives from problem statement requirement | Low — once frame position is known, object position follows from pixel offset + GSD | **Added** |
| Sharp turn handling | <200m drift, <70 degrees, <5% overlap. 95% registration rate applies to normal flight only | Clarified from original "70%" to "70 degrees". Split registration rate expectation | Low — clarification only | **Updated** |
| Offline preprocessing time | Not time-critical (minutes/hours before flight) | New criterion — no constraint existed | Low | **Added** |
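The drift-budget row above implies a simple sizing rule, assuming cumulative VO drift grows roughly linearly with distance flown (the 1% rate is the benchmark figure cited in Sources):

```python
def max_correction_spacing_m(drift_budget_m, vio_drift_rate=0.01):
    """Max distance between satellite anchors before cumulative VO drift
    exceeds the budget, under a linear drift-vs-distance model."""
    return drift_budget_m / vio_drift_rate

# 100 m budget at 1% drift: anchors may be up to ~10 km apart,
# which is why the table calls the threshold "generous".
```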
## Restrictions Assessment
| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| Aircraft type | Fixed-wing only | Appropriate — fixed-wing has predictable motion model, mostly forward flight. Simplifies VO assumptions | N/A | No change |
| Camera mount | Downward-pointing, fixed, not autostabilized | Implies roll/pitch affect image. At 400m altitude, moderate roll/pitch causes manageable image shift. IMU data can compensate. Non-stabilized means more variable image overlap and orientation | Medium — must use IMU data for image dewarping or accept orientation-dependent accuracy | **Modified** — add: IMU-based image orientation correction should be considered |
| Operational region | Eastern/southern Ukraine (left of Dnipro) | Conflict zone — satellite imagery may be degraded, outdated, or restricted. Terrain: mix of agricultural, urban, forest. Agricultural areas have seasonal texture changes | **High** — satellite imagery availability and quality is a significant risk | **Modified** — flag operational risk: imagery access in conflict zones |
| Image resolution | FullHD to 6252x4168, known camera parameters | 26MP at max is large for edge processing. Must downsample for feature extraction. Known camera intrinsics enable proper projective geometry | Medium — pipeline must handle variable resolutions | No change |
| Altitude | Predefined, ≤1km, terrain height negligible | At 400m: GSD ~6cm/px, footprint ~375x250m. Terrain "negligible" is an approximation — even 50m terrain variation at 400m altitude causes ~12% scale error. The referenced paper (Mateos-Ramirez 2024) shows terrain elevation is a primary error source | **Medium** — "terrain height negligible" needs qualification. At 400m, terrain variations >50m become significant | **Modified** — add: terrain height can be neglected only if variations <50m within image footprint |
| IMU data availability | "A lot of data from IMU" | IMU provides: accelerometer, gyroscope, magnetometer. Crucial for: dead-reckoning during feature-less frames, image orientation compensation, scale estimation, motion prediction. Standard tactical IMUs provide 100-400Hz data | Low — standard IMU integration | **Modified** — specify: IMU data includes gyroscope + accelerometer at ≥100Hz; will be used for orientation compensation and dead-reckoning fallback |
| Weather | Mostly sunny | Favorable for visual methods. Shadows can actually help feature matching. Reduces image quality variability | Low — favorable condition | No change |
| Satellite provider | Google Maps (potentially outdated) | **Critical limitation**: Google Maps satellite API has usage limits, unknown update frequency for eastern Ukraine, potential conflict-zone restrictions. Resolution may not meet 0.5 m/px in rural areas. No guarantee of recency | **High** — single-provider dependency is a significant risk | **Modified** — consider: (1) downloading tiles ahead of time for the operational area, (2) having a fallback provider strategy |
| Photo count | Up to 3000, typically 500-1500 | At 3fps and 500-1500 photos: 3-8 minutes of flight. At ~100m spacing: 50-150km route. Memory for 3000 pre-extracted satellite feature maps needs careful management on 8GB | Medium — batch processing and memory management needed | **Modified** — add: pipeline must manage memory for up to 3000 frames on 8GB device |
| Sharp turns | Next photo may have no common objects with previous | This is the hardest edge case. Complete visual discontinuity requires satellite-based re-localization. IMU provides heading/velocity for bridging. System must be architected around this possibility | High — drives core architecture decision | No change — already captured as a defining constraint |
| Processing hardware | Jetson Orin Nano Super, 67 TOPS | 8GB shared LPDDR5, 1024 CUDA cores, 32 Tensor Cores, 102 GB/s bandwidth. TensorRT for inference optimization. Power: 7-25W. Significantly less capable than desktop GPU | **Critical** — all processing must fit within 8GB shared memory, pipeline must be optimized for TensorRT | **Modified** — CONTRADICTS AC's RTX 2060 reference. Must be the binding constraint |
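The tile pre-download mitigation suggested for the satellite-provider restriction can be sketched with standard Web Mercator (slippy map) tile math, which Google Maps-style tile servers use; the helper names are illustrative:

```python
import math

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Web Mercator (slippy map) tile indices for a lat/lon at a zoom level."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return x, y

def tiles_for_bbox(lat_min, lat_max, lon_min, lon_max, zoom):
    """Enumerate every tile covering a bounding box (for pre-flight download)."""
    x0, y0 = latlon_to_tile(lat_max, lon_min, zoom)  # tile y grows southward
    x1, y1 = latlon_to_tile(lat_min, lon_max, zoom)
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]
```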
## Key Findings
1. **CRITICAL CONTRADICTION**: The AC mentions "RTX 2060 compatibility" (16GB RAM + 6GB VRAM) but the restriction specifies Jetson Orin Nano Super (8GB shared memory). These are fundamentally different platforms. **The Jetson must be the binding constraint.** All processing, including model weights, image buffers, and intermediate results, must fit within ~6-7GB usable memory (OS takes ~1-1.5GB).
2. **Satellite Imagery Risk**: Google Maps as the sole satellite provider for a conflict zone in eastern Ukraine presents significant quality, resolution, and recency risks. The 0.3 m/px "ideal" resolution is unlikely available from Google Maps for this region. The system design must be robust to degraded satellite reference quality (0.5-1.0 m/px).
3. **Accuracy is Achievable but Conditional**: The 50m/80% and 20m/60% accuracy targets are achievable based on recent research (SatLoc 2025: <15m with adaptive fusion), but **only when satellite corrections are successful**. VO-only segments will drift ~1% of distance traveled. The system must maximize satellite correction frequency.
4. **Sharp Turn Handling Drives Architecture**: The requirement to handle disconnected route segments with no visual overlap between consecutive frames means the system cannot rely solely on sequential VO. It must have an independent satellite-based geo-localization capability for each frame or segment — this is a core architectural requirement.
5. **Processing Time is Feasible**: <5s per image on Jetson Orin Nano Super is achievable with: (a) image downsampling (e.g., to 2000x1300), (b) TensorRT-optimized models, (c) efficient satellite region cropping. GPU-accelerated feature extraction takes ~50ms, matching ~100-500ms, satellite matching ~1-3s.
6. **Missing AC: Object Center Coordinates**: The problem statement mentions "coordinates of the center of any object in these photos" but no acceptance criterion specifies the accuracy requirement for this. Need to add.
7. **Missing AC: DEM/Elevation Data**: Research shows terrain elevation is a primary error source for pixel-to-meter conversion at these altitudes. If terrain variations are >50m, a DEM is needed. No current restriction mentions DEM availability.
8. **Missing AC: Offline Preprocessing Time**: No constraint on how long satellite image preprocessing can take before the flight.
9. **"70%" in Sharp Turn AC is Ambiguous**: "at an angle of less than 70%" — this likely means 70 degrees, not 70%.
## Sources
- SatLoc: Hierarchical Adaptive Fusion Framework for GNSS-denied UAV Localization (2025) — <15m error, >90% coverage, 2+ Hz on edge hardware
- Mateos-Ramirez et al. "Visual Odometry in GPS-Denied Zones for Fixed-Wing UAV" (2024) — 142.88m mean error over 17km at 1000m+ altitude, 0.83% error rate with satellite correction
- NVIDIA Jetson Orin Nano Super specs: 8GB LPDDR5, 67 TOPS, 1024 CUDA cores, 102 GB/s bandwidth
- cuda-efficient-features: Feature extraction benchmarks — 4K in ~12ms on Jetson Xavier
- SIFT+LightGlue for UAV image mosaicking (ISPRS 2025) — superior performance across diverse scenarios
- SuperPoint+LightGlue comparative analysis (2024) — best balance of robustness, accuracy, efficiency
- Google Maps satellite resolution: 0.15m-30m depending on location and source imagery
- VIO drift benchmarks: 0.82-1% of distance traveled (EuRoC, outdoor flights)
- UAVSAR cross-modality matching: 1.83-2.86m RMSE with deep learning approach (Springer 2026)
@@ -0,0 +1,88 @@
# Question Decomposition
## Original Question
Research the GPS-denied onboard navigation problem for a fixed-wing UAV and find the best solution architecture. The system must determine frame-center GPS coordinates using visual odometry, satellite image matching, and IMU fusion — all running on a Jetson Orin Nano Super (8GB shared memory, 67 TOPS).
## Active Mode
Mode A Phase 2 — Initial Research (Problem & Solution)
## Rationale
No existing solution drafts. Full problem decomposition and solution research needed.
## Problem Context Summary (from INPUT_DIR)
- **Platform**: Fixed-wing UAV, camera pointing down (not stabilized), typical altitude 400m, max 1km
- **Camera**: ADTi Surveyor Lite 26S v2, 26MP (6252x4168), focal length 25mm, sensor width 23.5mm
- **GSD at 400m**: ~6cm/pixel, footprint ~375x250m
- **Frame rate**: 3 fps (interval ~333ms, real-world could be 400-500ms)
- **Photo count**: 500-3000 per flight
- **IMU**: Available at high rate
- **Initial GPS**: Known; GPS may be denied/spoofed during flight
- **Satellite reference**: Pre-uploaded Google Maps tiles
- **Hardware**: Jetson Orin Nano Super, 8GB shared memory, 67 TOPS
- **Region**: Eastern/southern Ukraine (conflict zone)
- **Key challenge**: Reconnecting disconnected route segments after sharp turns
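The GSD and footprint figures above follow from the listed camera parameters via the pinhole model; a quick check (assuming square pixels, so the footprint height scales from the same GSD):

```python
def ground_sampling(altitude_m, focal_mm, sensor_width_mm, img_w_px, img_h_px):
    """Ground sample distance and footprint for a nadir pinhole camera."""
    gsd = altitude_m * (sensor_width_mm / 1000.0) / ((focal_mm / 1000.0) * img_w_px)
    return gsd, gsd * img_w_px, gsd * img_h_px  # m/px, footprint W (m), footprint H (m)

# ADTi Surveyor Lite 26S v2 (25mm focal, 23.5mm sensor width, 6252x4168) at 400 m:
gsd, fw, fh = ground_sampling(400, 25.0, 23.5, 6252, 4168)
# -> ~0.060 m/px, ~376 x 251 m footprint, matching the summary above
```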
## Question Type Classification
**Decision Support** — we need to evaluate and select the best architectural approach and component technologies for each part of the pipeline.
## Research Subject Boundary Definition
| Dimension | Boundary |
|-----------|----------|
| Population | Fixed-wing UAVs with nadir cameras at 200-1000m altitude |
| Geography | Rural/semi-urban terrain in eastern Ukraine |
| Timeframe | Current state-of-the-art (2023-2026) |
| Level | Edge computing (Jetson-class, 8GB memory), real-time processing |
## Decomposed Sub-Questions
### A. Existing/Competitor Solutions
1. What existing systems solve GPS-denied UAV visual navigation?
2. What open-source implementations exist for VO + satellite matching?
3. What commercial/military solutions address this problem?
### B. Architecture Components
4. What is the optimal pipeline architecture (sequential vs parallel, streaming)?
5. How should VO, satellite matching, and IMU fusion be combined (loosely vs tightly coupled)?
6. How to handle disconnected route segments (the core architectural challenge)?
### C. Visual Odometry Component
7. What VO algorithms work best for aerial nadir imagery on edge hardware?
8. What feature extractors/matchers are optimal for Jetson (SuperPoint, ORB, XFeat)?
9. How to handle scale estimation with known altitude and camera parameters?
10. What is the optimal image downsampling strategy for 26MP on 8GB memory?
### D. Satellite Image Matching Component
11. How to efficiently match UAV frames against pre-loaded satellite tiles?
12. What cross-view matching methods work for aerial-to-satellite registration?
13. How to preprocess and index satellite tiles for fast retrieval?
14. How to handle resolution mismatch (6cm UAV vs 50cm+ satellite)?
### E. IMU Fusion Component
15. How to fuse IMU data with visual estimates (EKF, UKF, factor graph)?
16. How to use IMU for dead-reckoning during feature-less frames?
17. How to use IMU for image orientation compensation (non-stabilized camera)?
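Sub-question 16 (dead reckoning between visual fixes) can be illustrated by naive double integration of gravity-compensated accelerations; a real system would fuse this inside an EKF or factor graph, and every name here is hypothetical:

```python
def dead_reckon(pos_ne, vel_ne, accel_samples, dt):
    """Propagate a north/east position from IMU accelerations between
    visual fixes. Inputs are assumed gravity-compensated and already
    rotated body->NED. Drift grows quadratically with bridging time,
    which is why bridging must stay short between satellite anchors.
    """
    n, e = pos_ne
    vn, ve = vel_ne
    for an, ae in accel_samples:
        vn += an * dt  # integrate acceleration -> velocity
        ve += ae * dt
        n += vn * dt   # integrate velocity -> position
        e += ve * dt
    return (n, e), (vn, ve)
```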
### F. Edge Optimization
18. How to fit the full pipeline in 8GB shared memory?
19. What TensorRT optimizations are available for feature extractors?
20. How to achieve <5s per frame on Jetson Orin Nano Super?
### G. API & Streaming
21. What is the best approach for REST API + SSE on Python/Jetson?
22. How to implement progressive result refinement?
## Timeliness Sensitivity Assessment
- **Research Topic**: GPS-denied UAV visual navigation with edge processing
- **Sensitivity Level**: 🟠 High
- **Rationale**: Deep learning feature matchers (SuperPoint, LightGlue, XFeat) and edge inference frameworks (TensorRT) evolve rapidly. Jetson Orin Nano Super is a recent (Dec 2024) product. Cross-view geo-localization is an active research area.
- **Source Time Window**: 12 months (prioritize 2025-2026)
- **Priority official sources to consult**:
1. NVIDIA Jetson documentation and benchmarks
2. OpenCV / kornia / hloc official docs
3. Recent papers on cross-view geo-localization (CVPR, ECCV, ICCV 2024-2025)
- **Key version information to verify**:
- JetPack SDK: Current version ____
- SuperPoint/LightGlue: Latest available for TensorRT ____
- XFeat: Version and Jetson compatibility ____
@@ -0,0 +1,151 @@
# Source Registry
## Source #1
- **Title**: Visual Odometry in GPS-Denied Zones for Fixed-Wing UAV with Reduced Accumulative Error Based on Satellite Imagery
- **Link**: https://www.mdpi.com/2076-3417/14/16/7420
- **Tier**: L1
- **Publication Date**: 2024-08-22
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Fixed-wing UAV GPS-denied navigation
- **Research Boundary Match**: ✅ Full match
- **Summary**: VO + satellite correction pipeline for fixed-wing UAV at 1000m+ altitude. Mean error 142.88m over 17km (0.83%). Uses ORB features, centroid-based displacement, Kalman filter smoothing, quadtree for satellite keypoint indexing.
- **Related Sub-question**: A1, B5, C7, D11
## Source #2
- **Title**: SatLoc: Hierarchical Adaptive Fusion Framework for GNSS-denied UAV Localization
- **Link**: https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: UAV localization in GNSS-denied environments
- **Research Boundary Match**: ✅ Full match
- **Summary**: Three-layer fusion: DinoV2 for satellite geo-localization, XFeat for VO, optical flow for velocity. Adaptive confidence-based weighting. <15m error, >90% coverage, 2+ Hz on edge hardware.
- **Related Sub-question**: B4, B5, C8, D12
## Source #3
- **Title**: XFeat: Accelerated Features for Lightweight Image Matching (CVPR 2024)
- **Link**: https://arxiv.org/abs/2404.19174
- **Tier**: L1
- **Publication Date**: 2024-04
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Edge device feature matching
- **Research Boundary Match**: ✅ Full match
- **Summary**: 5x faster than SuperPoint, runs on CPU at VGA resolution. Sparse and semi-dense matching. TensorRT deployment available for Jetson. Comparable accuracy to SuperPoint.
- **Related Sub-question**: C8, F18, F20
## Source #4
- **Title**: XFeat TensorRT Implementation
- **Link**: https://github.com/PranavNedunghat/XFeatTensorRT
- **Tier**: L2
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Summary**: C++ TensorRT implementation of XFeat, tested on Jetson Orin NX 16GB with JetPack 6.0, CUDA 12.2, TensorRT 8.6.
- **Related Sub-question**: C8, F18, F19
## Source #5
- **Title**: SuperPoint+LightGlue TensorRT Deployment
- **Link**: https://github.com/fettahyildizz/superpoint_lightglue_tensorrt
- **Tier**: L2
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Summary**: C++ TensorRT implementation of SuperPoint+LightGlue. Production-ready deployment for Jetson platforms.
- **Related Sub-question**: C8, F19
## Source #6
- **Title**: FP8 Quantized LightGlue in TensorRT
- **Link**: https://fabio-sim.github.io/blog/fp8-quantized-lightglue-tensorrt-nvidia-model-optimizer/
- **Tier**: L2
- **Publication Date**: 2026
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Up to ~6x speedup with FP8 quantization. Requires Hopper/Ada Lovelace GPUs (not available on Jetson Orin Nano Ampere). FP16 is the best available precision for Orin Nano.
- **Related Sub-question**: F19
## Source #7
- **Title**: NVIDIA JetPack 6.2 Release Notes
- **Link**: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/index.html
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Summary**: CUDA 12.6.10, TensorRT 10.3.0, cuDNN 9.3. Super Mode for Orin Nano: up to 2x inference performance, 50% memory bandwidth boost. Power modes: 15W, 25W, MAXN SUPER.
- **Related Sub-question**: F18, F19, F20
## Source #8
- **Title**: cuda-efficient-features (GPU feature detection benchmarks)
- **Link**: https://github.com/fixstars/cuda-efficient-features
- **Tier**: L2
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Summary**: 4K detection: 12ms on Jetson Xavier. 8K: 27.5ms. 40K keypoints extraction: 20-25ms on Xavier. Orin Nano Super should be comparable or better.
- **Related Sub-question**: F20
## Source #9
- **Title**: Adaptive Covariance Hybrid EKF/UKF for Visual-Inertial Odometry
- **Link**: https://arxiv.org/abs/2512.17505
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Hybrid EKF/UKF achieves 49% better position accuracy, 57% better rotation accuracy than ESKF alone, at 48% lower computational cost than full UKF. Includes adaptive sensor confidence scoring.
- **Related Sub-question**: E15
## Source #10
- **Title**: SIFT+LightGlue for UAV Image Mosaicking (ISPRS 2025)
- **Link**: https://isprs-archives.copernicus.org/articles/XLVIII-2-W11-2025/169/2025/
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Summary**: SIFT+LightGlue outperforms SuperPoint+LightGlue for UAV mosaicking across diverse scenarios. Superior in both low-texture and high-texture environments.
- **Related Sub-question**: C8, D12
## Source #11
- **Title**: UAVision - GNSS-Denied UAV Visual Localization System
- **Link**: https://github.com/ArboriseRS/UAVision
- **Tier**: L4
- **Publication Date**: 2024-2025
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Open-source system using LightGlue for map matching. Includes image processing modules and visualization.
- **Related Sub-question**: A2
## Source #12
- **Title**: TerboucheHacene/visual_localization
- **Link**: https://github.com/TerboucheHacene/visual_localization
- **Tier**: L4
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Vision-based GNSS-free localization with SuperPoint/SuperGlue/GIM matching. Optimized VO + satellite image matching hybrid pipeline. Learning-based matchers for natural environments.
- **Related Sub-question**: A2, D12
## Source #13
- **Title**: GNSS-Denied Geolocalization with Terrain Constraints
- **Link**: https://github.com/yfs90/gnss-denied-uav-geolocalization
- **Tier**: L4
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Summary**: No altimeters/IMU required, uses image matching + terrain constraints. GPS-comparable accuracy for day/night across varied terrain.
- **Related Sub-question**: A2
## Source #14
- **Title**: Google Maps Tile API Documentation
- **Link**: https://developers.google.com/maps/documentation/tile/satellite
- **Tier**: L1
- **Publication Date**: Current
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Zoom levels 0-22. Satellite tiles via HTTPS. Session tokens required. Bulk download possible but subject to usage policies.
- **Related Sub-question**: D13
## Source #15
- **Title**: NaviLoc: Trajectory-Level Visual Localization for GNSS-Denied UAV Navigation
- **Link**: https://www.mdpi.com/2504-446X/10/2/97
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Trajectory-level optimization rather than per-frame matching. Optimizes entire trajectory against satellite reference for improved accuracy.
- **Related Sub-question**: B4, D11
## Source #16
- **Title**: GSD Estimation for UAV Photogrammetry
- **Link**: https://blog.truegeometry.com/calculators/UAV_photogrammetry_workflows_calculation.html
- **Tier**: L3
- **Publication Date**: Current
- **Timeliness Status**: ✅ Currently valid
- **Summary**: GSD = (sensor_width × altitude) / (focal_length × image_width). For our case: (23.5mm × 400m) / (25mm × 6252px) = 0.06 m/pixel.
- **Related Sub-question**: C9
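The formula in this entry can be checked with a few lines of Python; the camera parameters below (23.5 mm sensor width, 25 mm focal length, 6252 px image width, 400 m altitude) are the ones quoted in the summary.

```python
# Ground sample distance (GSD) from the formula in Source #16.

def gsd_m_per_px(sensor_width_mm: float, altitude_m: float,
                 focal_length_mm: float, image_width_px: int) -> float:
    """GSD = (sensor_width * altitude) / (focal_length * image_width)."""
    return (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)

gsd = gsd_m_per_px(23.5, 400, 25, 6252)
print(round(gsd, 3))  # 0.06 m/pixel
```

Note that GSD scales linearly with altitude, so the same camera at 800 m would give ~0.12 m/px.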
# Fact Cards
## Fact #1
- **Statement**: XFeat achieves up to 5x faster inference than SuperPoint while maintaining comparable accuracy for pose estimation. It runs in real-time on CPU at VGA resolution.
- **Source**: Source #3 (CVPR 2024 paper)
- **Phase**: Phase 2
- **Target Audience**: Edge device deployments
- **Confidence**: ✅ High
- **Related Dimension**: Feature Extraction
## Fact #2
- **Statement**: XFeat TensorRT implementation exists and is tested on Jetson Orin NX 16GB with JetPack 6.0, CUDA 12.2, TensorRT 8.6.
- **Source**: Source #4
- **Phase**: Phase 2
- **Target Audience**: Jetson platform deployment
- **Confidence**: ✅ High
- **Related Dimension**: Feature Extraction, Edge Optimization
## Fact #3
- **Statement**: SatLoc framework achieves <15m absolute localization error with >90% trajectory coverage at 2+ Hz on edge hardware, using DinoV2 for satellite matching, XFeat for VO, and optical flow for velocity.
- **Source**: Source #2
- **Phase**: Phase 2
- **Target Audience**: GNSS-denied UAV localization
- **Confidence**: ⚠️ Medium (paper details not fully accessible)
- **Related Dimension**: Overall Architecture, Accuracy
## Fact #4
- **Statement**: Mateos-Ramirez et al. achieved 142.88m mean error over 17km (0.83% error rate) with VO + satellite correction on a fixed-wing UAV at 1000m+ altitude. Without satellite correction, error accumulated to 850m+ over 17km.
- **Source**: Source #1
- **Phase**: Phase 2
- **Target Audience**: Fixed-wing UAV at high altitude
- **Confidence**: ✅ High
- **Related Dimension**: Accuracy, Architecture
## Fact #5
- **Statement**: VIO systems typically drift 0.8-1% of distance traveled. Between satellite corrections at 1km intervals, expected drift is ~10m.
- **Source**: Multiple sources (arxiv VIO benchmarks)
- **Phase**: Phase 2
- **Target Audience**: Aerial VIO systems
- **Confidence**: ✅ High
- **Related Dimension**: VO Drift
## Fact #6
- **Statement**: Jetson Orin Nano Super: 8GB LPDDR5 shared memory, 1024 CUDA cores, 32 Tensor Cores, 102 GB/s bandwidth, 67 TOPS INT8. JetPack 6.2: CUDA 12.6.10, TensorRT 10.3.0.
- **Source**: Source #7
- **Phase**: Phase 2
- **Target Audience**: Hardware specification
- **Confidence**: ✅ High
- **Related Dimension**: Edge Optimization
## Fact #7
- **Statement**: CUDA-accelerated feature detection at 4K (3840x2160): ~12ms on Jetson Xavier. At 8K: ~27.5ms. Descriptor extraction for 40K keypoints: ~20-25ms on Xavier. Orin Nano Super has comparable or slightly better compute.
- **Source**: Source #8
- **Phase**: Phase 2
- **Target Audience**: Jetson GPU performance
- **Confidence**: ✅ High
- **Related Dimension**: Processing Time
## Fact #8
- **Statement**: Hybrid EKF/UKF achieves 49% better position accuracy than ESKF alone at 48% lower computational cost than full UKF. Includes adaptive sensor confidence scoring based on image entropy and motion blur.
- **Source**: Source #9
- **Phase**: Phase 2
- **Target Audience**: VIO fusion
- **Confidence**: ✅ High
- **Related Dimension**: Sensor Fusion
## Fact #9
- **Statement**: SIFT+LightGlue outperforms SuperPoint+LightGlue for UAV mosaicking across diverse scenarios (low-texture agricultural and high-texture urban).
- **Source**: Source #10
- **Phase**: Phase 2
- **Target Audience**: UAV image matching
- **Confidence**: ✅ High
- **Related Dimension**: Feature Matching
## Fact #10
- **Statement**: GSD for our system at 400m: (23.5mm × 400m) / (25mm × 6252px) = 0.060 m/pixel. Image footprint: 6252 × 0.06 = 375m width, 4168 × 0.06 = 250m height.
- **Source**: Source #16 + camera parameters
- **Phase**: Phase 2
- **Target Audience**: Our specific system
- **Confidence**: ✅ High
- **Related Dimension**: Scale Estimation
## Fact #11
- **Statement**: Google Maps satellite tiles available via Tile API at zoom levels 0-22. Max zoom varies by region. For eastern Ukraine, zoom 18 (~0.6 m/px) is typically available; zoom 19 (~0.3 m/px) may not be.
- **Source**: Source #14
- **Phase**: Phase 2
- **Target Audience**: Satellite imagery
- **Confidence**: ⚠️ Medium (exact zoom availability for eastern Ukraine unverified)
- **Related Dimension**: Satellite Reference
## Fact #12
- **Statement**: FP8 quantization for LightGlue requires Hopper/Ada GPUs. Jetson Orin Nano uses Ampere architecture — limited to FP16 as best TensorRT precision.
- **Source**: Source #6, Source #7
- **Phase**: Phase 2
- **Target Audience**: Jetson optimization
- **Confidence**: ✅ High
- **Related Dimension**: Edge Optimization
## Fact #13
- **Statement**: SuperPoint+LightGlue TensorRT C++ deployment is available and production-tested. ONNX Runtime path achieves 2-4x speedup over compiled PyTorch.
- **Source**: Source #5, Source #6
- **Phase**: Phase 2
- **Target Audience**: Production deployment
- **Confidence**: ✅ High
- **Related Dimension**: Feature Matching, Edge Optimization
## Fact #14
- **Statement**: Cross-view matching (UAV-to-satellite) is fundamentally harder than same-view matching due to extreme viewpoint differences. Deep learning embeddings (DinoV2, CLIP-based) are the state-of-the-art for coarse retrieval. Local features are used for fine alignment.
- **Source**: Multiple (Sources #2, #12, #15)
- **Phase**: Phase 2
- **Target Audience**: Cross-view geo-localization
- **Confidence**: ✅ High
- **Related Dimension**: Satellite Matching
## Fact #15
- **Statement**: Quadtree spatial indexing enables O(log n) nearest-neighbor lookup for satellite keypoints. Combined with GeoHash for fast region encoding, this is the standard approach for tile management.
- **Source**: Sources #1, #14
- **Phase**: Phase 2
- **Target Audience**: Spatial indexing
- **Confidence**: ✅ High
- **Related Dimension**: Satellite Tile Management
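Fact #15 can be made concrete with a small sketch: Web Mercator tile coordinates plus a quadkey string, one standard quadtree-path encoding for satellite tile indexing. The functions below are illustrative, not taken from any source in this registry; zoom 18 matches the ~0.6 m/px reference imagery discussed elsewhere in this log.

```python
import math

def latlon_to_tile(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Slippy-map tile coordinates for a WGS84 point at a given zoom."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_r = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_r)) / math.pi) / 2.0 * n)
    return x, y

def quadkey(x: int, y: int, zoom: int) -> str:
    """Interleave tile bits into a quadtree path string ('0'-'3' per level)."""
    digits = []
    for z in range(zoom, 0, -1):
        mask = 1 << (z - 1)
        digits.append(str((1 if x & mask else 0) + (2 if y & mask else 0)))
    return "".join(digits)

x, y = latlon_to_tile(48.0, 37.8, 18)  # a point in eastern Ukraine
print(len(quadkey(x, y, 18)))  # 18: one quadtree digit per zoom level
```

A shared quadkey prefix means shared ancestry in the quadtree, which is what makes nearest-tile lookups O(log n) in depth.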
# Comparison Framework
## Selected Framework Type
Decision Support — evaluating solution options per component
## Architecture Components to Evaluate
1. Feature Extraction & Matching (VO frame-to-frame)
2. Satellite Image Matching (cross-view geo-registration)
3. Sensor Fusion (VO + satellite + IMU)
4. Satellite Tile Preprocessing & Indexing
5. Image Downsampling Strategy
6. Re-localization (disconnected segments)
7. API & Streaming Layer
## Component 1: Feature Extraction & Matching (VO)
| Dimension | XFeat | SuperPoint + LightGlue | ORB (OpenCV) |
|-----------|-------|----------------------|--------------|
| Speed (Jetson) | ~2-5ms per frame (VGA), 5x faster than SuperPoint | ~15-50ms per frame (VGA, TensorRT FP16) | ~5-10ms per frame (CUDA) |
| Accuracy | Comparable to SuperPoint on pose estimation | State-of-the-art for local features | Lower accuracy, not scale-invariant |
| Memory | <100MB model | ~200-400MB model+inference | Negligible |
| TensorRT support | Yes (C++ impl available for Jetson Orin NX) | Yes (C++ impl available) | N/A (native CUDA) |
| Cross-view capability | Limited (same-view designed) | Better with LightGlue attention | Poor for cross-view |
| Rotation invariance | Moderate | Good with LightGlue | Good (by design) |
| Jetson validation | Tested on Orin NX (JetPack 6.0) | Tested on multiple Jetson platforms | Native OpenCV CUDA |
| **Fit for VO** | ✅ Best — fast, accurate, Jetson-proven | ⚠️ Good but heavier | ⚠️ Fast but less accurate |
| **Fit for satellite matching** | ⚠️ Moderate | ✅ Better for cross-view with attention | ❌ Poor for cross-view |
## Component 2: Satellite Image Matching (Cross-View)
| Dimension | Local Feature Matching (SIFT/SuperPoint + LightGlue) | Global Descriptor Retrieval (DinoV2/CLIP) | Template Matching (NCC) |
|-----------|-----------------------------------------------------|------------------------------------------|------------------------|
| Approach | Extract keypoints in both UAV and satellite images, match descriptors | Encode both images into global vectors, compare by distance | Slide UAV image over satellite tile, compute correlation |
| Accuracy | Sub-pixel when matches found (best for fine alignment) | Tile-level (~50-200m depending on tile size) | Pixel-level but sensitive to appearance changes |
| Speed | ~100-500ms for match+geometric verification | ~50-100ms for descriptor comparison | ~500ms-2s for large search area |
| Robustness to viewpoint | Good with LightGlue attention | Excellent (trained for cross-view) | Poor (requires similar viewpoint) |
| Memory | ~300-500MB (model + keypoints) | ~200-500MB (model) | Low |
| Failure rate | High in low-texture, seasonal changes | Lower — semantic understanding | High in changed scenes |
| **Recommended role** | Fine alignment (after coarse retrieval) | Coarse retrieval (select candidate tile) | Not recommended |
## Component 3: Sensor Fusion
| Dimension | EKF (Extended Kalman Filter) | Error-State EKF (ESKF) | Hybrid ESKF/UKF | Factor Graph (GTSAM) |
|-----------|-------------------------------|------------------------|------------------|---------------------|
| Accuracy | Baseline | Better for rotation | 49% better than ESKF | Best overall |
| Compute cost | Lowest | Low | 48% less than full UKF | Highest |
| Implementation complexity | Low | Medium | Medium-High | High |
| Handles non-linearity | Linearization errors | Better for small errors | Best among KF variants | Full non-linear |
| Real-time on Jetson | ✅ | ✅ | ✅ | ⚠️ Depends on graph size |
| Multi-rate sensor support | Manual | Manual | Manual | Native |
| **Fit** | ⚠️ Baseline option | ✅ Good starting point | ✅ Best KF option | ⚠️ Overkill for this system |
## Component 4: Satellite Tile Management
| Dimension | GeoHash + In-Memory | Quadtree + Memory-Mapped Files | Pre-extracted Feature DB |
|-----------|--------------------|-----------------------------|------------------------|
| Lookup speed | O(1) hash | O(log n) tree traversal | O(1) hash + feature load |
| Memory usage | All tiles in RAM | On-demand loading | Features only (smaller) |
| Preprocessing | Fast | Moderate | Slow (extract all features offline) |
| Flexibility | Fixed grid | Adaptive resolution | Fixed per-tile |
| **Fit for 8GB** | ❌ Too much RAM for large areas | ✅ Memory-efficient | ✅ Best — smallest footprint |
## Component 5: Image Downsampling Strategy
| Dimension | Fixed Resize (e.g., 1600x1066) | Pyramid (multi-scale) | ROI-based (center crop + full) |
|-----------|-------------------------------|----------------------|-------------------------------|
| Speed | Fast, single scale | Slower, multiple passes | Medium |
| Accuracy | Good if GSD ratio maintained | Best for multi-scale features | Good for center, loses edges |
| Memory | ~5MB per frame | ~7-8MB per frame | ~6MB per frame |
| **Fit** | ✅ Best tradeoff | ⚠️ Unnecessary complexity | ⚠️ Loses coverage |
# Reasoning Chain
## Dimension 1: Feature Extraction for Visual Odometry
### Fact Confirmation
XFeat is 5x faster than SuperPoint (Fact #1), has TensorRT deployment on Jetson (Fact #2), and comparable accuracy for pose estimation. SatLoc (the most relevant state-of-the-art system) uses XFeat for its VO component (Fact #3).
### Reference Comparison
SuperPoint+LightGlue is more accurate for cross-view matching but heavier. ORB is fast but less accurate and not robust to appearance changes. SIFT+LightGlue is best for mosaicking (Fact #9) but slower.
### Conclusion
**XFeat for VO (frame-to-frame)** — it's the fastest learned feature, Jetson-proven, and used by the closest state-of-the-art system (SatLoc). For satellite matching, a different approach is needed because cross-view matching requires viewpoint-invariant features.
### Confidence
✅ High — supported by SatLoc architecture and CVPR 2024 benchmarks.
---
## Dimension 2: Satellite Image Matching Strategy
### Fact Confirmation
Cross-view matching is fundamentally harder than same-view (Fact #14). Deep learning embeddings (DinoV2) are state-of-the-art for coarse retrieval (Fact #3). Local features are better for fine alignment. SatLoc uses DinoV2 for satellite matching specifically.
### Reference Comparison
A two-stage coarse-to-fine approach is the dominant pattern in literature: (1) global descriptor retrieves candidate region, (2) local feature matching refines position. Pure local-feature matching has high failure rate for cross-view due to extreme viewpoint differences.
### Conclusion
**Two-stage approach**: (1) Coarse — use a lightweight global descriptor to find the best-matching satellite tile within the search area (VO-predicted position ± uncertainty radius). (2) Fine — use local feature matching (SuperPoint+LightGlue or XFeat) between UAV frame and the matched satellite tile to get precise position. The coarse stage can also serve as the re-localization mechanism for disconnected segments.
### Confidence
✅ High — consensus across multiple recent papers and the SatLoc system.
---
## Dimension 3: Sensor Fusion Approach
### Fact Confirmation
Hybrid ESKF/UKF achieves 49% better accuracy than ESKF alone at 48% lower cost than full UKF (Fact #8). Factor graphs (GTSAM) offer the best accuracy but are computationally expensive.
### Reference Comparison
For our system: IMU runs at 100-400Hz, VO at ~3Hz (frame rate), satellite corrections at variable rate (whenever matching succeeds). We need multi-rate fusion that handles intermittent satellite corrections and continuous IMU.
### Conclusion
**Error-State EKF (ESKF)** as the baseline fusion approach — it's well-understood, lightweight, handles multi-rate sensors naturally, and is proven for VIO on edge hardware. Upgrade to hybrid ESKF/UKF if ESKF accuracy is insufficient. Factor graphs are overkill for this real-time edge system.
The filter state: position (lat/lon), velocity, orientation (quaternion), IMU biases. Measurements: VO-derived displacement (high rate), satellite-derived absolute position (variable rate), IMU (highest rate for prediction).
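The multi-rate structure above can be sketched as a toy filter. A real ESKF tracks orientation (quaternion) and IMU biases in an error state; this simplification keeps only 2D position and velocity to show how high-rate IMU prediction, VO-derived updates, and intermittent absolute satellite fixes share one state. All noise values are illustrative, not tuned numbers from any source.

```python
import numpy as np

class MiniFusion:
    def __init__(self):
        self.x = np.zeros(4)        # state: [px, py, vx, vy] (m, m/s)
        self.P = np.eye(4) * 100.0  # initial covariance

    def predict_imu(self, accel: np.ndarray, dt: float):
        """High-rate prediction step driven by IMU acceleration."""
        F = np.eye(4)
        F[0, 2] = F[1, 3] = dt      # position integrates velocity
        self.x = F @ self.x
        self.x[2:] += accel * dt
        self.P = F @ self.P @ F.T + np.eye(4) * 0.1 * dt

    def _update(self, z, H, R):
        """Standard Kalman measurement update."""
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - H @ self.x)
        self.P = (np.eye(4) - K @ H) @ self.P

    def update_vo_velocity(self, vel: np.ndarray):
        """~3 Hz relative measurement derived from VO displacement."""
        H = np.zeros((2, 4)); H[0, 2] = H[1, 3] = 1.0
        self._update(vel, H, np.eye(2) * 1.0)

    def update_satellite_position(self, pos: np.ndarray):
        """Intermittent absolute fix; resets accumulated position drift."""
        H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0
        self._update(pos, H, np.eye(2) * 25.0)
```

Whenever a satellite match succeeds, `update_satellite_position` shrinks the position covariance, which is the drift-reset behavior the conclusion relies on.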
### Confidence
✅ High — ESKF is the standard choice for embedded VIO systems.
---
## Dimension 4: Satellite Tile Preprocessing & Indexing
### Fact Confirmation
Quadtree enables O(log n) lookups (Fact #15). Pre-extracting features offline saves runtime compute. 8GB memory limits in-memory tile storage.
### Reference Comparison
Full tiles in memory is infeasible for large areas. Memory-mapped files allow on-demand loading. Pre-extracted feature databases have the smallest runtime footprint.
### Conclusion
**Offline preprocessing pipeline**:
1. Download Google Maps satellite tiles at max zoom (18-19) for the operational area
2. Extract features (XFeat or SuperPoint) from each tile
3. Compute global descriptors (lightweight, e.g., NetVLAD or cosine-pooled XFeat descriptors) per tile
4. Store: tile metadata (GPS bounds, zoom level), features + descriptors in a GeoHash-indexed database
5. Build spatial index (GeoHash) for fast lookup by GPS region
**Runtime**: Given VO-estimated position, query GeoHash to find nearby tiles, compare global descriptors for coarse match, then local feature matching for fine alignment.
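The indexing side of steps 4-5 can be sketched as below. The feature extractor is deliberately left out (the real pipeline would run XFeat or SuperPoint per tile offline); the point is the coarse grid-key to tile-record layout queried at runtime. `grid_key` is a simple quantized stand-in for a proper GeoHash, and all field names are illustrative.

```python
def grid_key(lat: float, lon: float, cell_deg: float = 0.01) -> str:
    """Coarse grid key: quantize lat/lon into cells (GeoHash stand-in)."""
    return f"{round(lat / cell_deg)}_{round(lon / cell_deg)}"

def build_index(tiles: list[dict]) -> dict[str, list[dict]]:
    """tiles: [{'id', 'center_lat', 'center_lon', 'bounds', 'zoom'}, ...]"""
    index: dict[str, list[dict]] = {}
    for t in tiles:
        record = {
            "tile_id": t["id"],
            "bounds": t["bounds"],  # GPS bounds, needed for fine alignment
            "zoom": t["zoom"],
            # features are pre-extracted offline and loaded on demand:
            "feature_file": f"features/{t['id']}.npz",
        }
        key = grid_key(t["center_lat"], t["center_lon"])
        index.setdefault(key, []).append(record)
    return index

idx = build_index([{"id": "t1", "center_lat": 48.0, "center_lon": 37.8,
                    "bounds": [48.0, 37.8, 48.01, 37.81], "zoom": 18}])
print(list(idx.keys()))  # ['4800_3780']
```

At runtime the VO-estimated position is turned into the same key (plus its neighbors) to fetch candidate tiles without scanning the whole database.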
### Confidence
✅ High — standard approach used by all relevant systems.
---
## Dimension 5: Image Downsampling Strategy
### Fact Confirmation
26MP images need downsampling for 8GB device (Fact #6). Feature extraction at 4K takes ~12ms on Jetson Xavier (Fact #7). UAV GSD at 400m is ~6cm/px (Fact #10). Satellite GSD is ~60cm/px at zoom 18.
### Reference Comparison
For VO (frame-to-frame): features at full resolution are wasteful — consecutive frames at 6cm GSD overlap ~80%, and features at lower resolution are sufficient for displacement estimation. For satellite matching: we need to match at satellite resolution (~60cm/px), so downsampling to match satellite GSD is natural.
### Conclusion
**Downsample to ~1600x1066** (factor ~4x each dimension). This yields ~24cm/px GSD — still 2.5x finer than satellite, sufficient for feature matching. Image size: ~5MB (RGB). Feature extraction at this resolution: <10ms. This is the single resolution for both VO and satellite matching.
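The arithmetic behind this choice can be checked directly: a ~4x downsample of the 6252x4168 frame keeps the GSD comfortably finer than the ~0.6 m/px satellite reference. Plain array slicing stands in for a proper resize here (production code would use an area-interpolated resize); the numbers match Fact #10.

```python
import numpy as np

FULL_W, FULL_H = 6252, 4168
GSD_FULL = 0.06                    # m/px at 400 m altitude (Fact #10)

factor = 4
gsd_down = GSD_FULL * factor       # 0.24 m/px after downsampling

frame = np.zeros((FULL_H, FULL_W, 3), dtype=np.uint8)
small = frame[::factor, ::factor]  # naive stride downsample (sketch only)

print(small.shape[:2])  # (1042, 1563)
```

The downsampled frame is ~1563x1042 (about 4.9 MB as RGB), and 0.6 / 0.24 = 2.5x finer than the satellite imagery, matching the conclusion above.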
### Confidence
✅ High — standard practice for edge processing of high-res imagery.
---
## Dimension 6: Disconnected Segment Handling
### Fact Confirmation
SatLoc uses satellite matching as an independent localization source that works regardless of VO state (Fact #3). The AC requires reconnecting disconnected segments as a core capability.
### Reference Comparison
Pure VO cannot handle zero-overlap transitions. IMU dead-reckoning bridges short gaps (seconds). Satellite-based re-localization provides absolute position regardless of VO state.
### Conclusion
**Independent satellite localization per frame** — every frame attempts satellite matching regardless of VO state. This naturally handles disconnected segments:
1. When VO succeeds: satellite matching refines position (high confidence)
2. When VO fails (sharp turn): satellite matching provides absolute position (sole source)
3. When both fail: IMU dead-reckoning with low confidence score
4. After 3 consecutive total failures: request user input
Segment reconnection is automatic: all positions are in the same global (WGS84) frame via satellite matching. No explicit "reconnection" needed — segments share the satellite reference.
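The fallback hierarchy above reduces to a small per-frame decision rule. The sketch below is illustrative (names are not from any existing implementation): satellite success yields high confidence and resets the failure streak, VO-only yields medium, and a third consecutive total failure triggers the user-input request.

```python
from enum import Enum

class Confidence(Enum):
    HIGH = "high"      # satellite-anchored
    MEDIUM = "medium"  # VO-only, drift accumulating
    LOW = "low"        # IMU dead-reckoning

def select_source(vo_ok: bool, sat_ok: bool, fail_streak: int):
    """Returns (confidence, new_fail_streak, needs_user_input)."""
    if sat_ok:
        return Confidence.HIGH, 0, False
    if vo_ok:
        return Confidence.MEDIUM, 0, False
    streak = fail_streak + 1           # total failure: IMU-only frame
    return Confidence.LOW, streak, streak >= 3

conf, streak, ask = select_source(vo_ok=False, sat_ok=False, fail_streak=2)
print(conf.value, ask)  # low True
```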
### Confidence
✅ High — this is the key architectural insight.
---
## Dimension 7: Processing Pipeline Architecture
### Fact Confirmation
<5s per frame required (AC). Feature extraction ~10ms, VO matching ~20-50ms, satellite coarse retrieval ~50-100ms, satellite fine matching ~200-500ms, fusion ~1ms. Total: ~300-700ms per frame.
### Conclusion
**Pipelined parallel architecture**:
- Thread 1 (Camera): Capture frame, downsample, extract features → push to queue
- Thread 2 (VO): Match with previous frame, compute displacement → push to fusion
- Thread 3 (Satellite): Search nearby tiles, coarse retrieval, fine matching → push to fusion
- Thread 4 (Fusion): ESKF prediction (IMU), update (VO), update (satellite) → emit result via SSE
VO and satellite matching can run in parallel for each frame. Fusion integrates results as they arrive. This enables <1s per frame total latency.
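The thread/queue topology above can be sketched with stub stages standing in for capture, matching, and fusion. For brevity the VO and satellite workers are collapsed into a single thread here, whereas the architecture runs them in parallel; everything else (queues, sentinel-based shutdown, fusion consuming messages as they arrive) follows the description.

```python
import queue
import threading

frames_q: queue.Queue = queue.Queue()
fusion_q: queue.Queue = queue.Queue()
results = []

def camera(n_frames: int):
    for i in range(n_frames):
        frames_q.put(i)          # stub: frame id instead of image+features
    frames_q.put(None)           # sentinel: end of flight

def vo_and_satellite():
    while (frame := frames_q.get()) is not None:
        fusion_q.put(("vo", frame, frame * 1.0))   # stub displacement
        fusion_q.put(("sat", frame, frame * 1.0))  # stub absolute fix
    fusion_q.put(None)

def fusion():
    while (msg := fusion_q.get()) is not None:
        results.append(msg)      # real system: ESKF update + SSE emit

threads = [threading.Thread(target=camera, args=(3,)),
           threading.Thread(target=vo_and_satellite),
           threading.Thread(target=fusion)]
for t in threads: t.start()
for t in threads: t.join()
print(len(results))  # 6: one VO + one satellite message per frame
```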
### Confidence
✅ High — standard producer-consumer pipeline.
# Validation Log
## Validation Scenario 1: Normal Flight (80% of time)
UAV flies straight, consecutive frames overlap ~70-80%. Terrain has moderate texture (agricultural + urban mix).
### Expected Based on Conclusions
- XFeat extracts features in ~5ms, VO matching in ~20ms
- Satellite matching succeeds: coarse retrieval ~50ms, fine matching ~300ms
- ESKF fuses both: position accuracy ~10-20m (satellite-anchored)
- Total processing: <500ms per frame
- Confidence: HIGH
### Actual Validation (against literature)
SatLoc reports <15m error with >90% coverage under similar conditions. Mateos-Ramirez reports 0.83% drift with satellite correction. Both align with our expected performance.
### Result: ✅ PASS
---
## Validation Scenario 2: Sharp Turn (5-10% of time)
UAV makes a 60-degree turn. Next frame has <5% overlap with previous. Heading changes rapidly.
### Expected Based on Conclusions
- VO fails (insufficient feature overlap) — detected by low match count
- IMU provides heading and approximate displacement for ~1-2 frames
- Satellite matching attempts independent localization of the new frame
- If satellite match succeeds: position recovered, segment continues
- If satellite match fails: IMU dead-reckoning with LOW confidence
### Potential Issues
- Satellite matching may also fail if the frame is heavily tilted (non-nadir view during turn)
- IMU drift during turn: at 100m/s for 1s, displacement ~100m. IMU drift over 1s: ~1-5m — acceptable
### Result: ⚠️ CONDITIONAL PASS — depends on satellite matching success during turn. Non-stabilized camera may produce tilted images that are harder to match. IMU provides reasonable bridge.
---
## Validation Scenario 3: Disconnected Route (rare, <5%)
UAV completes segment A, makes a 90+ degree turn, flies a new heading. Segment B has no overlap with segment A. Multiple such segments possible.
### Expected Based on Conclusions
- Each segment independently localizes via satellite matching
- No explicit reconnection needed — all in WGS84 frame
- Per-segment accuracy depends on satellite matching success rate
- Low-confidence gaps between segments until satellite match succeeds
### Result: ✅ PASS — architecture handles this natively via independent per-frame satellite matching.
---
## Validation Scenario 4: Memory-Constrained Operation (always)
3000 frames, 8GB shared memory. Full pipeline running.
### Expected Based on Conclusions
- Downsampled frame: ~5MB per frame. Keep 2 in memory (current + previous): ~10MB
- XFeat model (TensorRT): ~50-100MB
- Satellite tile features (loaded tiles): ~200-500MB for tiles near current position
- ESKF state: <1MB
- OS + runtime: ~1.5GB
- Total: ~2-3GB active, well within 8GB
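The budget above sums as follows (values in MB; the OS/runtime figure and the tile-feature upper bound are this log's estimates, not measured numbers):

```python
budget_mb = {
    "frames (current + previous)": 2 * 5,
    "XFeat model (TensorRT)": 100,
    "loaded satellite tile features": 500,  # upper bound
    "ESKF state": 1,
    "OS + runtime": 1500,
}
total_gb = sum(budget_mb.values()) / 1024
print(round(total_gb, 2), total_gb < 8)  # 2.06 True
```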
### Potential Issues
- Satellite feature DB for large operational areas could be large on disk (not memory — loaded on demand)
- Need careful management of tile loading/unloading
### Result: ✅ PASS — 8GB is sufficient with proper memory management.
---
## Validation Scenario 5: Degraded Satellite Imagery
Google Maps tiles at 0.5-1.0 m/px resolution. Some areas have outdated imagery. Seasonal appearance changes.
### Expected Based on Conclusions
- Coarse retrieval (global descriptors) should handle moderate appearance changes
- Fine matching may fail on outdated/seasonal tiles — confidence drops to LOW
- System falls back to VO + IMU in degraded areas
- Multiple consecutive failures → user input request
### Potential Issues
- If large areas have degraded satellite imagery, the system may operate mostly in VO+IMU mode with significant drift
- 50m accuracy target may not be achievable in these areas
### Result: ⚠️ CONDITIONAL PASS — system degrades gracefully, but accuracy targets depend on satellite quality. This is a known risk per Phase 1 assessment.
---
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable/verifiable
- [x] Sharp turn handling addressed
- [x] Memory constraints validated
- [ ] Issue: Satellite imagery quality in eastern Ukraine remains a risk
- [ ] Issue: Non-stabilized camera during turns may degrade satellite matching
## Conclusions Requiring No Revision
All major architectural decisions validated. Two known risks (satellite quality, non-stabilized camera during turns) are acknowledged and handled by the fallback hierarchy.
# Question Decomposition
## Original Question
Assess current solution draft. Additionally:
1. Try SuperPoint + LightGlue for visual odometry
2. Can LiteSAM be SO SLOW because of big images? If we reduce size to 1280p, would that work faster?
## Active Mode
Mode B: Solution Assessment — `solution_draft01.md` exists in OUTPUT_DIR.
## Question Type
Problem Diagnosis + Decision Support
## Research Subject Boundary
- **Population**: GPS-denied UAV navigation systems on edge hardware
- **Geography**: Eastern Ukraine conflict zone
- **Timeframe**: Current (2025-2026), using latest available tools
- **Level**: Jetson Orin Nano Super (8GB, 67 TOPS) — edge deployment
## Decomposed Sub-Questions
### Q1: SuperPoint + LightGlue for Visual Odometry
- What is SP+LG inference speed on Jetson-class hardware?
- How does it compare to cuVSLAM (116fps on Orin Nano)?
- Is SP+LG suitable for frame-to-frame VO at 3fps?
- What is SP+LG accuracy vs cuVSLAM for VO?
### Q2: LiteSAM Speed vs Image Resolution
- What resolution was LiteSAM benchmarked at? (1184px on AGX Orin)
- How does LiteSAM speed scale with resolution?
- What would 1280px achieve on Orin Nano Super vs AGX Orin?
- Is the bottleneck image size or compute power gap?
### Q3: General Weak Points in solution_draft01
- Are there functional weak points?
- Are there performance bottlenecks?
- Are there security gaps?
### Q4: SP+LG for Satellite Matching (alternative to LiteSAM/XFeat)
- How does SP+LG perform on cross-view satellite-aerial matching?
- What does the LiteSAM paper say about SP+LG accuracy?
## Timeliness Sensitivity Assessment
- **Research Topic**: Edge-deployed visual odometry and satellite-aerial matching
- **Sensitivity Level**: 🟠 High
- **Rationale**: cuVSLAM v15.0.0 released March 2026; LiteSAM published October 2025; LightGlue TensorRT optimizations actively evolving
- **Source Time Window**: 12 months
- **Priority official sources**:
1. LiteSAM paper (MDPI Remote Sensing, October 2025)
2. cuVSLAM / PyCuVSLAM v15.0.0 (March 2026)
3. LightGlue-ONNX / TensorRT benchmarks (2024-2026)
4. Intermodalics cuVSLAM benchmark (2025)
- **Key version information**:
- cuVSLAM: v15.0.0 (March 2026)
- LightGlue: ICCV 2023, TensorRT via fabio-sim/LightGlue-ONNX
- LiteSAM: Published October 2025, code at boyagesmile/LiteSAM
# Source Registry
## Source #1
- **Title**: LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery
- **Link**: https://www.mdpi.com/2072-4292/17/19/3349
- **Tier**: L1
- **Publication Date**: 2025-10-01
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: LiteSAM v1.0; benchmarked on Jetson AGX Orin (JetPack 5.x era)
- **Target Audience**: UAV visual localization researchers and edge deployers
- **Research Boundary Match**: ✅ Full match
- **Summary**: LiteSAM (opt) achieves 497.49ms on Jetson AGX Orin at 1184px input. 6.31M params. RMSE@30 = 17.86m on UAV-VisLoc. Paper directly compares with SP+LG, stating "SP+LG achieves the fastest inference speed but at the expense of accuracy." Section 4.9 shows resolution vs speed tradeoff on RTX 3090Ti.
- **Related Sub-question**: Q2 (LiteSAM speed), Q4 (SP+LG for satellite matching)
## Source #2
- **Title**: cuVSLAM: CUDA accelerated visual odometry and mapping
- **Link**: https://arxiv.org/abs/2506.04359
- **Tier**: L1
- **Publication Date**: 2025-06 (paper), v15.0.0 released 2026-03-10
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: cuVSLAM v15.0.0 / PyCuVSLAM v15.0.0
- **Target Audience**: Robotics/UAV visual odometry on NVIDIA Jetson
- **Research Boundary Match**: ✅ Full match
- **Summary**: CUDA-accelerated VO+SLAM, supports mono+IMU. 116fps on Jetson Orin Nano 8GB at 720p. <1% trajectory error on KITTI. <5cm on EuRoC.
- **Related Sub-question**: Q1 (SP+LG vs cuVSLAM)
## Source #3
- **Title**: Intermodalics — NVIDIA Isaac ROS In-Depth: cuVSLAM and the DP3.1 Release
- **Link**: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- **Tier**: L2
- **Publication Date**: 2025 (DP3.1 release)
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: cuVSLAM v11 (DP3.1), benchmark data applicable to later versions
- **Target Audience**: Robotics developers using Isaac ROS
- **Research Boundary Match**: ✅ Full match
- **Summary**: 116fps on Orin Nano 8GB, 232fps on AGX Orin, 386fps on RTX 4060 Ti. Outperforms ORB-SLAM2 on KITTI.
- **Related Sub-question**: Q1
## Source #4
- **Title**: Accelerating LightGlue Inference with ONNX Runtime and TensorRT
- **Link**: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- **Tier**: L2
- **Publication Date**: 2024-07-17
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: torch 2.4.0, TensorRT 10.2.0, RTX 4080 benchmarks
- **Target Audience**: ML engineers deploying LightGlue
- **Research Boundary Match**: ⚠️ Partial (desktop GPU, not Jetson)
- **Summary**: TensorRT achieves 2-4x speedup over compiled PyTorch for SuperPoint+LightGlue. Full pipeline benchmarks on RTX 4080. TensorRT has 3840 keypoint limit. No Jetson-specific benchmarks provided.
- **Related Sub-question**: Q1
## Source #5
- **Title**: LightGlue-with-FlashAttentionV2-TensorRT (Jetson Orin NX 8GB)
- **Link**: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- **Tier**: L4
- **Publication Date**: 2025-02
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: TensorRT 8.5.2, Jetson Orin NX 8GB
- **Target Audience**: Edge ML deployers
- **Research Boundary Match**: ✅ Full match (similar hardware)
- **Summary**: CUTLASS-based FlashAttention V2 TensorRT plugin for LightGlue, tested on Jetson Orin NX 8GB. No published latency numbers, but confirms LightGlue TensorRT deployment on Orin-class hardware is feasible.
- **Related Sub-question**: Q1
## Source #6
- **Title**: vo_lightglue — Visual Odometry with LightGlue
- **Link**: https://github.com/himadrir/vo_lightglue
- **Tier**: L4
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: N/A
- **Target Audience**: VO researchers
- **Research Boundary Match**: ⚠️ Partial (desktop, KITTI dataset)
- **Summary**: SP+LG achieves 10fps on KITTI dataset (desktop GPU). Odometric error ~1% vs 3.5-4.1% for FLANN-based matching. Much slower than cuVSLAM.
- **Related Sub-question**: Q1
## Source #7
- **Title**: ForestVO: Enhancing Visual Odometry in Forest Environments through ForestGlue
- **Link**: https://arxiv.org/html/2504.01261v1
- **Tier**: L1
- **Publication Date**: 2025-04
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: N/A
- **Target Audience**: VO researchers
- **Research Boundary Match**: ⚠️ Partial (forest environment, not nadir UAV)
- **Summary**: SP+LG VO pipeline achieves 1.09m avg relative pose error, KITTI score 2.33%. Uses 512 keypoints (reduced from 2048) to cut compute. Outperforms DSO by 40%.
- **Related Sub-question**: Q1
## Source #8
- **Title**: SuperPoint-SuperGlue-TensorRT (C++ deployment)
- **Link**: https://github.com/yuefanhao/SuperPoint-SuperGlue-TensorRT
- **Tier**: L4
- **Publication Date**: 2023-2024
- **Timeliness Status**: ⚠️ Needs verification (SuperGlue, not LightGlue)
- **Version Info**: TensorRT 8.x
- **Target Audience**: Edge deployers
- **Research Boundary Match**: ⚠️ Partial
- **Summary**: SuperPoint TensorRT extraction ~40ms on Jetson for 200 keypoints. C++ implementation.
- **Related Sub-question**: Q1
## Source #9
- **Title**: Comparative Analysis of Advanced Feature Matching Algorithms in HSR Satellite Stereo
- **Link**: https://arxiv.org/abs/2405.06246
- **Tier**: L1
- **Publication Date**: 2024-05
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: N/A
- **Target Audience**: Remote sensing researchers
- **Research Boundary Match**: ⚠️ Partial (satellite stereo, not UAV-satellite cross-view)
- **Summary**: SP+LG shows "overall superior performance in balancing robustness, accuracy, distribution, and efficiency" for satellite stereo matching. But this is same-view satellite-satellite, not cross-view UAV-satellite.
- **Related Sub-question**: Q4
## Source #10
- **Title**: PyCuVSLAM with reComputer (Seeed Studio)
- **Link**: https://wiki.seeedstudio.com/pycuvslam_recomputer_robotics/
- **Tier**: L3
- **Publication Date**: 2026
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: PyCuVSLAM v15.0.0, JetPack 6.2
- **Target Audience**: Robotics developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Tutorial for deploying PyCuVSLAM on Jetson Orin NX. Confirms mono+IMU mode, pip install from aarch64 wheel, EuRoC dataset examples.
- **Related Sub-question**: Q1
# Fact Cards
## Fact #1
- **Statement**: cuVSLAM achieves 116fps on Jetson Orin Nano 8GB at 720p resolution (~8.6ms/frame). 232fps on AGX Orin. 386fps on RTX 4060 Ti.
- **Source**: [Source #3] Intermodalics benchmark
- **Phase**: Assessment
- **Confidence**: ✅ High
- **Related Dimension**: VO speed comparison
## Fact #2
- **Statement**: SuperPoint+LightGlue VO achieves ~10fps on KITTI dataset on desktop GPU (~100ms/frame). With 274 keypoints on RTX 2080Ti, LightGlue matching alone takes 33.9ms.
- **Source**: [Source #6] vo_lightglue; LightGlue issue #36
- **Confidence**: ⚠️ Medium (desktop GPU, not Jetson)
- **Related Dimension**: VO speed comparison
## Fact #3
- **Statement**: SuperPoint feature extraction takes ~40ms on Jetson (TensorRT, 200 keypoints).
- **Source**: [Source #8] SuperPoint-SuperGlue-TensorRT
- **Confidence**: ⚠️ Medium (older Jetson)
- **Related Dimension**: VO speed comparison
## Fact #4
- **Statement**: LightGlue TensorRT with FlashAttention V2 has been deployed on Jetson Orin NX 8GB. No published latency numbers.
- **Source**: [Source #5] qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- **Confidence**: ⚠️ Medium
- **Related Dimension**: VO speed comparison
## Fact #5
- **Statement**: LiteSAM (opt) inference: 61.98ms on RTX 3090, 497.49ms on Jetson AGX Orin at 1184px input. 6.31M params.
- **Source**: [Source #1] LiteSAM paper, abstract + Section 4.10
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matcher speed
## Fact #6
- **Statement**: Jetson AGX Orin has 275 TOPS INT8, 2048 CUDA cores. Orin Nano Super has 67 TOPS INT8, 1024 CUDA cores. AGX Orin is ~3-4x more powerful.
- **Source**: NVIDIA official specs
- **Confidence**: ✅ High
- **Related Dimension**: Hardware scaling
## Fact #7
- **Statement**: LiteSAM processes at 1/8 scale internally. Coarse matching is O(N²) where N = (H/8 × W/8). For 1184px: ~21,904 tokens. For 1280px: ~25,600. For 480px: ~3,600.
- **Source**: [Source #1] LiteSAM paper, Sections 3.1-3.3
- **Confidence**: ✅ High
- **Related Dimension**: LiteSAM speed vs resolution
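The token arithmetic in Fact #7 can be reproduced with a few lines (square inputs assumed; the quadratic attention-cost ratio is an upper bound, valid only where coarse attention dominates):

```python
# LiteSAM coarse matching operates on a 1/8-scale feature grid,
# so token count N = (side/8)^2 and attention cost grows as O(N^2).

def coarse_tokens(side_px: int) -> int:
    """Token count for a square side_px x side_px input at 1/8 scale."""
    n = side_px // 8
    return n * n

def relative_attention_cost(side_a: int, side_b: int) -> float:
    """Upper-bound cost ratio if O(N^2) attention dominates."""
    return (coarse_tokens(side_a) / coarse_tokens(side_b)) ** 2

for side in (1184, 1280, 640, 480):
    print(side, coarse_tokens(side))
```

Running it confirms the counts quoted above (21,904 / 25,600 / 6,400 / 3,600) and shows 1280px costs up to ~1.37x more than 1184px in the attention stage.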
## Fact #8
- **Statement**: LiteSAM paper Figure 1 states: "SP+LG achieves the fastest inference speed but at the expense of accuracy" vs LiteSAM on satellite-aerial benchmarks.
- **Source**: [Source #1] LiteSAM paper
- **Confidence**: ✅ High
- **Related Dimension**: SP+LG vs LiteSAM
## Fact #9
- **Statement**: LiteSAM achieves RMSE@30 = 17.86m on UAV-VisLoc. SP+LG is worse on same benchmark.
- **Source**: [Source #1] LiteSAM paper
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matcher accuracy
## Fact #10
- **Statement**: cuVSLAM uses Shi-Tomasi corners ("Good Features to Track") for keypoint detection, divided into NxM grid patches. Uses Lucas-Kanade optical flow for tracking. When tracked keypoints fall below threshold, creates new keyframe.
- **Source**: [Source #2] cuVSLAM paper (arXiv:2506.04359), Section 2.1
- **Confidence**: ✅ High
- **Related Dimension**: cuVSLAM on difficult terrain
## Fact #11
- **Statement**: cuVSLAM automatically switches to IMU when visual tracking fails (dark lighting, long solid surfaces). IMU integrator provides ~1 second of acceptable tracking. After IMU, constant-velocity integrator provides ~0.5 seconds more.
- **Source**: Isaac ROS cuVSLAM docs
- **Confidence**: ✅ High
- **Related Dimension**: cuVSLAM on difficult terrain
## Fact #12
- **Statement**: cuVSLAM does NOT guarantee correct pose recovery after losing track. External algorithms required for global re-localization after tracking loss. Cannot fuse GNSS, wheel odometry, or LiDAR.
- **Source**: [Source #3] Intermodalics blog
- **Confidence**: ✅ High
- **Related Dimension**: cuVSLAM on difficult terrain
## Fact #13
- **Statement**: cuVSLAM benchmarked on KITTI (mostly urban/suburban driving) and EuRoC (indoor drone). Neither benchmark includes nadir agricultural terrain, flat fields, or uniform vegetation. No published results for these conditions.
- **Source**: [Source #2] cuVSLAM paper, Section 3
- **Confidence**: ✅ High
- **Related Dimension**: cuVSLAM on difficult terrain
## Fact #14
- **Statement**: cuVSLAM multi-stereo mode "significantly improves accuracy and robustness on challenging sequences compared to single stereo cameras", designed for featureless surfaces (narrow corridors, elevators). But our system uses monocular camera only.
- **Source**: [Source #2] cuVSLAM paper, Section 2.2.2
- **Confidence**: ✅ High
- **Related Dimension**: cuVSLAM on difficult terrain
## Fact #15
- **Statement**: PFED achieves 97.15% Recall@1 on University-1652 at 251.5 FPS on AGX Orin with only 4.45G FLOPs. But this is image RETRIEVAL (which satellite tile matches), NOT pixel-level correspondence matching.
- **Source**: PFED paper (arXiv:2510.22582)
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matching alternatives
## Fact #16
- **Statement**: EfficientLoFTR is ~2.5x faster than LoFTR with higher accuracy. Semi-dense matcher, 15.05M params. Has TensorRT adaptation (LoFTR_TRT). Performs well on weak-texture areas where traditional methods fail. Designed for aerial imagery.
- **Source**: EfficientLoFTR paper (CVPR 2024), HuggingFace docs
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matching alternatives
## Fact #17
- **Statement**: Hierarchical AVL system (2025) uses two-stage approach: DINOv2 for coarse retrieval + SuperPoint for fine matching. 64.5-95% success rate on real-world drone trajectories. Includes IMU-based prior correction and sliding-window map updates.
- **Source**: MDPI Remote Sensing 2025
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matching alternatives
## Fact #18
- **Statement**: STHN uses deep homography estimation for UAV geo-localization: directly estimates homography transform (no feature detection/matching/RANSAC). Achieves 4.24m MACE at 50m range. Designed for thermal but architecture is modality-agnostic.
- **Source**: STHN paper (IEEE RA-L 2024)
- **Confidence**: ✅ High
- **Related Dimension**: Satellite matching alternatives
## Fact #19
- **Statement**: For our nadir UAV → satellite matching, the cross-view gap is SMALL compared to typical cross-view problems (ground-to-satellite). Both views are approximately top-down. Main challenges: season/lighting, resolution mismatch, temporal changes. This means general-purpose matchers may work better than expected.
- **Source**: Analytical observation
- **Confidence**: ⚠️ Medium
- **Related Dimension**: Satellite matching alternatives
## Fact #20
- **Statement**: LiteSAM paper benchmarked EfficientLoFTR (opt) on satellite-aerial: 19.8% slower than LiteSAM (opt) on AGX Orin but with 2.4x more parameters. EfficientLoFTR achieves competitive accuracy. LiteSAM paper Table 3/4 provides direct comparison.
- **Source**: [Source #1] LiteSAM paper, Section 4.5
- **Confidence**: ✅ High
- **Related Dimension**: EfficientLoFTR vs LiteSAM
# Comparison Framework
## Selected Framework Type
Decision Support + Problem Diagnosis
## Selected Dimensions
1. Inference speed on Orin Nano Super
2. Accuracy for the target task
3. Cross-view robustness (satellite-aerial gap)
4. Implementation complexity / ecosystem maturity
5. Memory footprint
6. TensorRT optimization readiness
## Comparison 1: Visual Odometry — cuVSLAM vs SuperPoint+LightGlue
| Dimension | cuVSLAM v15.0.0 | SuperPoint + LightGlue (TRT) | Factual Basis |
|-----------|-----------------|-------------------------------|---------------|
| Speed on Orin Nano | ~8.6ms/frame (116fps @ 720p) | Est. ~130-280ms/frame (SP ~50-80ms + LG ~80-200ms) | Fact #1, #2, #3 |
| VO accuracy (KITTI) | <1% trajectory error | ~1% odometric error (desktop) | Fact #1, #2 |
| VO accuracy (EuRoC) | <5cm position error | Not benchmarked | Fact #1 |
| IMU integration | Native mono+IMU mode, auto-fallback | None — must add custom IMU fusion | Fact #1 |
| Loop closure | Built-in | Not available | Fact #1 |
| TensorRT ready | Native CUDA kernels (no TensorRT required) | Requires ONNX export + TRT build | Fact #4 |
| Memory | ~200-300MB | SP ~50MB + LG ~50-100MB = ~100-150MB | Fact #1 |
| Implementation | pip install aarch64 wheel | Custom pipeline: SP export + LG export + matching + pose estimation | Fact #1, #4 |
| Maturity on Jetson | NVIDIA-maintained, production-ready | Community TRT plugins, limited Jetson benchmarks | Fact #4, #5 |
## Comparison 2: LiteSAM Speed at Different Resolutions
| Dimension | 1184px (paper default) | 1280px (user proposal) | 640px | 480px | Factual Basis |
|-----------|------------------------|------------------------|-------|-------|---------------|
| Tokens at 1/8 scale | ~21,904 | ~25,600 | ~6,400 | ~3,600 | Fact #7 |
| AGX Orin time | 497ms | Est. ~580ms (1.17x tokens) | Est. ~150ms | Est. ~90ms | Fact #5, #7 |
| Orin Nano Super time (est.) | ~1.5-2.0s | ~1.7-2.3s | ~450-600ms | ~270-360ms | Fact #5, #6 |
| Accuracy (RMSE@30) | 17.86m | Similar (marginally better at higher resolution) | Degraded | Significantly degraded | Fact #9 |
## Comparison 3: Satellite Matching — LiteSAM vs SP+LG vs XFeat
| Dimension | LiteSAM (opt) | SuperPoint+LightGlue | XFeat semi-dense | Factual Basis |
|-----------|--------------|---------------------|------------------|---------------|
| Cross-view accuracy | RMSE@30 = 17.86m (UAV-VisLoc) | Worse than LiteSAM (paper confirms) | Not benchmarked on UAV-VisLoc | Fact #9 |
| Speed on Orin Nano (est.) | ~1.5-2s @ 1184px, ~270-360ms @ 480px | Est. ~100-200ms total | ~50-100ms | Fact #5, #2, existing draft |
| Cross-view robustness | Designed for satellite-aerial gap | Sparse matcher, "lacks sufficient accuracy" for cross-view | General-purpose, less robust | Fact #8, #9 |
| Parameters | 6.31M | SP ~1.3M + LG ~7M = ~8.3M | ~5M | Fact #5 |
| Approach | Semi-dense (coarse-to-fine, subpixel) | Sparse (detect → match → verify) | Semi-dense (detect → KNN → refine) | Fact #7, existing draft |
# Reasoning Chain
## Dimension 1: SuperPoint+LightGlue for Visual Odometry
### Fact Confirmation
cuVSLAM achieves 116fps (~8.6ms/frame) on Orin Nano 8GB at 720p (Fact #1). SP+LG achieves ~10fps on KITTI on desktop GPU (Fact #2). SuperPoint alone takes ~40ms on Jetson for 200 keypoints (Fact #3). LightGlue matching on desktop GPU takes ~20-34ms for 274 keypoints (Fact #2).
### Extrapolation to Orin Nano Super
On Orin Nano Super, estimating SP+LG pipeline:
- SuperPoint extraction (1024 keypoints, 720p): ~50-80ms (based on Fact #3, scaled for more keypoints)
- LightGlue matching (TensorRT FP16, 1024 keypoints): ~80-200ms (based on Source #4's 2-4x TensorRT speedup over PyTorch, with Orin Nano ~4-6x slower than RTX 4080)
- Total SP+LG: ~130-280ms per frame
cuVSLAM: ~8.6ms per frame.
SP+LG would be **15-33x slower** than cuVSLAM for visual odometry on Orin Nano Super.
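The 15-33x figure follows directly from the per-frame estimates above; a minimal sanity check of that arithmetic:

```python
# Slowdown of the estimated SP+LG pipeline vs measured cuVSLAM,
# using the figures above (SP ~50-80ms + LG ~80-200ms vs ~8.6ms/frame).
CUVSLAM_MS = 8.6                  # Fact #1: 116fps on Orin Nano 8GB
SPLG_BEST_MS = 50 + 80            # best-case SP extraction + LG matching
SPLG_WORST_MS = 80 + 200          # worst-case totals

low = SPLG_BEST_MS / CUVSLAM_MS
high = SPLG_WORST_MS / CUVSLAM_MS
print(f"SP+LG is ~{low:.0f}-{high:.0f}x slower than cuVSLAM")
```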
### Additional Considerations
cuVSLAM includes native IMU integration, loop closure, and auto-fallback. SP+LG provides none of these — they would need custom implementation, adding both development time and latency.
### Conclusion
**SP+LG is not viable as a cuVSLAM replacement for VO on Orin Nano Super.** cuVSLAM is purpose-built for Jetson and 15-33x faster. SP+LG's value lies in its accuracy for feature matching tasks, not real-time VO on edge hardware.
### Confidence
✅ High — performance gap is enormous and well-supported by multiple sources.
---
## Dimension 2: LiteSAM Speed vs Image Resolution (1280px question)
### Fact Confirmation
LiteSAM (opt) achieves 497ms on AGX Orin at 1184px (Fact #5). AGX Orin is ~3-4x more powerful than Orin Nano Super (Fact #6). LiteSAM processes at 1/8 scale internally — coarse matching is O(N²) where N is proportional to resolution² (Fact #7).
### Resolution Scaling Analysis
**1280px vs 1184px**: Token count increases from ~21,904 to ~25,600 (+17%). Compute increases ~17-37% (linear to quadratic depending on bottleneck). This makes the problem WORSE, not better.
**The user's likely intuition**: "If 6252×4168 camera images are huge, maybe LiteSAM is slow because we feed it those big images. What if we use 1280px?" But the solution draft already specifies resizing to 480-640px before feeding LiteSAM. The 497ms benchmark on AGX Orin was already at 1184px (the UAV-VisLoc benchmark resolution).
**The real bottleneck is hardware, not image size:**
- At 1184px on AGX Orin: 497ms → on Orin Nano Super: est. **~1.5-2.0s**
- At 1280px on Orin Nano Super: est. **~1.7-2.3s** (WORSE — more tokens)
- At 640px on Orin Nano Super: est. **~450-600ms** (borderline)
- At 480px on Orin Nano Super: est. **~270-360ms** (possibly within 400ms budget)
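The scaling arithmetic behind these bullets can be sketched as follows. The 3-4x hardware factor is an assumption from the TOPS ratio (Fact #6), and linear token scaling is an optimistic lower bound (O(N²) attention would scale worse); it lands close to, though slightly below, the rounded estimates above.

```python
# Extrapolate LiteSAM latency from the AGX Orin measurement (497ms @ 1184px,
# Fact #5) to Orin Nano Super: scale linearly with 1/8-scale token count,
# then apply an assumed 3-4x AGX Orin -> Orin Nano Super slowdown (Fact #6).
AGX_MS_AT_1184 = 497.0
HW_FACTOR = (3.0, 4.0)  # assumed hardware slowdown range

def tokens(side_px: int) -> int:
    return (side_px // 8) ** 2

def nano_estimate_ms(side_px: int) -> tuple:
    scale = tokens(side_px) / tokens(1184)
    return tuple(AGX_MS_AT_1184 * scale * f for f in HW_FACTOR)

for side in (1184, 1280, 640, 480):
    lo, hi = nano_estimate_ms(side)
    print(f"{side}px: ~{lo:.0f}-{hi:.0f}ms")
```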
### Conclusion
**1280px would make LiteSAM SLOWER, not faster.** The paper benchmarked at 1184px. The bottleneck is the hardware gap (AGX Orin 275 TOPS → Orin Nano Super 67 TOPS). To make LiteSAM fit the 400ms budget, resolution must drop to ~480px, which may significantly degrade cross-view matching accuracy. The original solution draft's approach (benchmark at 480px, abandon if too slow) remains correct.
### Confidence
✅ High — paper benchmarks + hardware specs provide strong basis.
---
## Dimension 3: SP+LG for Satellite Matching (alternative to LiteSAM)
### Fact Confirmation
LiteSAM paper explicitly states "SP+LG achieves the fastest inference speed but at the expense of accuracy" on satellite-aerial benchmarks (Fact #8). SP+LG is a sparse matcher; the paper notes sparse matchers "lack sufficient accuracy" for cross-view UAV-satellite matching due to texture-scarce regions (Source #1). LiteSAM achieves RMSE@30 = 17.86m; SP+LG is worse (Fact #9).
### Speed Advantage of SP+LG
On Orin Nano Super, SP+LG satellite matching pipeline:
- SuperPoint extraction (both images): ~50-80ms × 2 images
- LightGlue matching: ~80-200ms
- Total: ~180-360ms
This is competitive with the 400ms budget. But accuracy is worse than LiteSAM.
### Comparison with XFeat
XFeat semi-dense: ~50-100ms on Orin Nano Super (from existing draft). XFeat is 2-4x faster than SP+LG and also handles semi-dense matching. For the satellite matching role, XFeat is a better "fast fallback" than SP+LG.
### Conclusion
**SP+LG is not recommended for satellite matching.** It's slower than XFeat and less accurate than LiteSAM for cross-view matching. XFeat remains the better fallback. SP+LG could serve as a third-tier fallback, but the added complexity isn't justified given XFeat's advantages.
### Confidence
✅ High — direct comparison from the LiteSAM paper.
---
## Dimension 4: Other Weak Points in solution_draft01
### cuVSLAM Nadir Camera Concern
The solution correctly flags cuVSLAM's "nadir-only camera" as untested. cuVSLAM was designed for robotics (forward-facing cameras). Nadir UAV camera looking straight down at terrain has different motion characteristics. However, cuVSLAM supports arbitrary camera configurations and IMU mode should compensate. **Risk is MEDIUM, mitigation is adequate** (XFeat fallback).
### Memory Budget Gap
The solution estimates ~1.9-2.4GB total. This looks optimistic if cuVSLAM needs to maintain a map for loop closure. The cuVSLAM map grows over time. For a 3000-frame flight (~16 min at 3fps), map memory could grow to 500MB-1GB. **Risk: memory pressure late in flight.** Mitigation: configure cuVSLAM map pruning, limit map size.
### Tile Search Strategy Underspecified
The solution mentions GeoHash-indexed tiles but doesn't detail how the system determines which tile to match against when ESKF position has high uncertainty (e.g., after VO failure). The expanded search (±1km) could require loading 10-20 tiles, which is slow from storage.
### Confidence
⚠️ Medium — these are analytical observations, not empirically verified.
# Validation Log
## Validation Scenario 1: SP+LG for VO during Normal Flight
A UAV flies a straight segment, capturing at 3fps. Each frame needs a VO result within 400ms.
### Expected Based on Conclusions
cuVSLAM: processes each frame in ~8.6ms, leaves 391ms for satellite matching and fusion. Immediate VO result via SSE.
SP+LG: processes each frame in ~130-280ms, leaves ~120-270ms. May interfere with satellite matching CUDA resources.
### Actual Validation
cuVSLAM is clearly superior. SP+LG offers no advantage here — cuVSLAM is 15-33x faster AND includes IMU fallback. SP+LG would require building a custom VO pipeline around a feature matcher, whereas cuVSLAM is a complete VO solution.
### Counterexamples
If cuVSLAM fails on nadir camera (its main risk), SP+LG could serve as a fallback VO method. But XFeat frame-to-frame (~30-50ms) is already identified as the cuVSLAM fallback and is 3-6x faster than SP+LG.
## Validation Scenario 2: LiteSAM at 1280px on Orin Nano Super
A keyframe needs satellite matching. Image is resized to 1280px for LiteSAM.
### Expected Based on Conclusions
LiteSAM at 1280px on Orin Nano Super: ~1.7-2.3s. This is 4-6x over the 400ms budget. Even running async, it means satellite corrections arrive 5-7 frames later.
### Actual Validation
1280px is LARGER than the paper's 1184px benchmark resolution. The user likely assumed we feed the full camera image (6252×4168) to LiteSAM, causing slowness. But the solution already downsamples. The bottleneck is the hardware performance gap (Orin Nano Super = ~25% of AGX Orin compute).
### Counterexamples
If LiteSAM's TensorRT FP16 engine with reparameterized MobileOne achieves better optimization than the paper's AMP benchmark (which uses PyTorch, not TensorRT), speed could improve 2-3x. At 480px with TensorRT FP16: potentially ~90-180ms on Orin Nano Super. This is worth benchmarking.
## Validation Scenario 3: SP+LG as Satellite Matcher After LiteSAM Abandonment
LiteSAM fails benchmark. Instead of XFeat, we try SP+LG for satellite matching.
### Expected Based on Conclusions
SP+LG: ~180-360ms on Orin Nano Super. Accuracy is worse than LiteSAM for cross-view matching.
XFeat: ~50-100ms. Accuracy is unproven on cross-view but general-purpose semi-dense.
### Actual Validation
SP+LG is 2-4x slower than XFeat and the LiteSAM paper confirms worse accuracy for satellite-aerial. XFeat's semi-dense approach is more suited to the texture-scarce regions common in UAV imagery. SP+LG's sparse keypoint detection may fail on agricultural fields or water bodies.
### Counterexamples
SP+LG could outperform XFeat on high-texture urban areas where sparse features are abundant. But the operational region (eastern Ukraine) is primarily agricultural, making this advantage unlikely.
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable/verifiable
- [ ] Note: Orin Nano Super estimates are extrapolated from AGX Orin data using the 3-4x compute ratio. Day-one benchmarking remains essential.
## Conclusions Requiring Revision
None — the original solution draft's architecture (cuVSLAM for VO, benchmark-driven LiteSAM/XFeat for satellite) is confirmed sound. SP+LG is not recommended for either role on this hardware.
# Question Decomposition
## Original Question
Assess solution_draft02.md against updated acceptance criteria and restrictions. The AC and restrictions have been significantly revised to reflect real onboard deployment requirements (MAVLink integration, ground station telemetry, startup/failsafe, object localization, thermal management, satellite imagery specs).
## Active Mode
Mode B: Solution Assessment — `solution_draft02.md` is the latest draft in OUTPUT_DIR.
## Question Type
Problem Diagnosis + Decision Support
## Research Subject Boundary
- **Population**: GPS-denied UAV navigation systems on edge hardware (Jetson Orin Nano Super)
- **Geography**: Eastern/southern Ukraine (east of Dnipro River), conflict zone
- **Timeframe**: Current (2025-2026), latest available tools and libraries
- **Level**: Onboard companion computer, real-time flight controller integration via MAVLink
## Key Delta: What Changed in AC/Restrictions
### Restrictions Changes
1. Two cameras: Navigation (fixed, downward) + AI camera (configurable angle/zoom)
2. Processing on Jetson Orin Nano Super (was "stationary computer or laptop")
3. IMU data IS available via flight controller (was "NO data from IMU")
4. MAVLink protocol via MAVSDK for flight controller communication
5. Must output GPS_INPUT messages as GPS replacement
6. Ground station telemetry link available but bandwidth-limited
7. Thermal throttling must be accounted for
8. Satellite imagery pre-loaded, storage limited
### Acceptance Criteria Changes
1. <400ms per frame to flight controller (was <5s for processing)
2. MAVLink GPS_INPUT output (was REST API + SSE)
3. Ground station: stream position/confidence, receive re-localization commands
4. Object localization: trigonometric GPS from AI camera angle/zoom/altitude
5. Startup: initialize from last known GPS before GPS denial
6. Failsafe: IMU-only fallback after N seconds of total failure
7. Reboot recovery: re-initialize from flight controller IMU-extrapolated position
8. Max cumulative VO drift <100m between satellite anchors
9. Confidence score per position estimate (high/low)
10. Satellite imagery: resolution 0.5 m/pixel or finer, <2 years old
11. WGS84 output format
12. Re-localization via telemetry to ground station (not REST API user input)
## Decomposed Sub-Questions
### Q1: MAVLink GPS_INPUT Integration
- How does MAVSDK Python handle GPS_INPUT messages?
- What fields are required in GPS_INPUT?
- What update rate does the flight controller expect?
- Can we send confidence/accuracy indicators via MAVLink?
- How does this replace the REST API + SSE architecture?
### Q2: Ground Station Telemetry Integration
- How to stream position + confidence over bandwidth-limited telemetry?
- How to receive operator re-localization commands?
- What MAVLink messages support custom data?
- What bandwidth is typical for UAV telemetry links?
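A rough link-budget check for Q2 can be sketched as below. The payload size is a hypothetical placeholder (lat/lon/alt + confidence + timestamp), MAVLink v2 unsigned framing overhead is 12 bytes, and the 6 kbit/s figure is the pessimistic budget from Source #6:

```python
# Can a position+confidence stream fit a bandwidth-limited telemetry link?
MAVLINK_V2_OVERHEAD = 12   # bytes of v2 framing (unsigned packets)
PAYLOAD_BYTES = 28         # assumed payload: lat/lon/alt + confidence + time
LINK_BUDGET_BPS = 6_000    # pessimistic "optimized telemetry" figure

def stream_bps(rate_hz: float) -> float:
    """Bits per second consumed by the position stream at rate_hz."""
    return (MAVLINK_V2_OVERHEAD + PAYLOAD_BYTES) * 8 * rate_hz

for hz in (1, 2, 5):
    used = stream_bps(hz)
    print(f"{hz}Hz: {used:.0f} bps ({used / LINK_BUDGET_BPS:.0%} of budget)")
```

Even at 5Hz the stream stays under a third of the pessimistic budget, suggesting rate, not message design, is the main tuning knob.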
### Q3: Startup & Failsafe Mechanisms
- How to initialize from flight controller's last GPS position?
- How to detect GPS denial onset?
- What happens on companion computer reboot mid-flight?
- How to implement IMU-only dead reckoning fallback?
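The IMU-only fallback in Q3 reduces, in its simplest form, to dead reckoning from the last good fix. A minimal constant-velocity sketch (a real implementation would integrate IMU accelerations and propagate uncertainty; all names here are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Fix:
    """Local-frame position fix in metres, with a timestamp in seconds."""
    north_m: float
    east_m: float
    t: float

def dead_reckon(last: Fix, vn_mps: float, ve_mps: float, now: float) -> Fix:
    """Extrapolate from the last good fix using last known NED velocity."""
    dt = now - last.t
    return Fix(last.north_m + vn_mps * dt, last.east_m + ve_mps * dt, now)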
### Q4: Object Localization via AI Camera
- How to compute ground GPS from UAV position + camera angle + zoom + altitude?
- What accuracy can be expected given GPS-denied position error?
- How to handle the API between GPS-denied system and AI detection system?
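The trigonometry behind Q4 can be sketched under strong simplifying assumptions: flat terrain at the UAV's altitude datum, a gimbal tilt measured from nadir, an equirectangular lat/lon offset (fine for sub-km offsets), and no lens distortion. All function and parameter names are illustrative:

```python
import math

EARTH_M_PER_DEG_LAT = 111_320.0  # approximate metres per degree latitude

def target_gps(lat: float, lon: float, alt_agl_m: float,
               tilt_from_nadir_deg: float, heading_deg: float) -> tuple:
    """Project the AI camera's line of sight to flat ground."""
    ground_range = alt_agl_m * math.tan(math.radians(tilt_from_nadir_deg))
    north = ground_range * math.cos(math.radians(heading_deg))
    east = ground_range * math.sin(math.radians(heading_deg))
    tlat = lat + north / EARTH_M_PER_DEG_LAT
    tlon = lon + east / (EARTH_M_PER_DEG_LAT * math.cos(math.radians(lat)))
    return tlat, tlon
```

Note the error model: any GPS-denied position error of the UAV transfers 1:1 to the target, and tilt error grows with range (hence Source #9's iterative refinement).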
### Q5: Thermal Management on Jetson Orin Nano Super
- What is sustained thermal performance under GPU load?
- How to monitor and mitigate thermal throttling?
- What power modes are available?
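A thermal-guard sketch for Q5: on Jetson, zone temperatures are readable under `/sys/class/thermal/thermal_zone*/temp` (millidegrees C); the decision logic below is kept pure so it is testable anywhere. The 80°C limit is from Source #7; the 70°C soft threshold and the action names are assumptions:

```python
SOFT_LIMIT_C = 70.0   # assumed margin: start shedding load here
HARD_LIMIT_C = 80.0   # documented throttle point on Orin Nano

def thermal_action(temp_c: float) -> str:
    """Map a zone temperature to a load-shedding decision."""
    if temp_c >= HARD_LIMIT_C:
        return "drop-to-15W"   # hardware will throttle the GPU anyway
    if temp_c >= SOFT_LIMIT_C:
        return "reduce-fps"    # e.g. skip satellite-matching keyframes
    return "nominal"
```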
### Q6: VO Drift Budget & Monitoring
- How to measure cumulative drift between satellite anchors?
- How to trigger satellite matching when drift approaches 100m?
- ESKF covariance as drift proxy?
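One cheap drift proxy for Q6, pending a proper ESKF-covariance trigger: approximate accumulated VO drift as a fraction of distance travelled since the last satellite anchor (the ~1% KITTI figure from Fact #1) and request a match before the 100m budget is spent. The drift rate and trigger fraction are assumptions:

```python
DRIFT_RATE = 0.01        # assumed: drift ~1% of distance travelled
BUDGET_M = 100.0         # AC: max cumulative drift between anchors
TRIGGER_FRACTION = 0.5   # assumed: match when half the budget is used

def should_trigger_match(dist_since_anchor_m: float) -> bool:
    """True when estimated drift reaches the trigger threshold."""
    est_drift_m = DRIFT_RATE * dist_since_anchor_m
    return est_drift_m >= BUDGET_M * TRIGGER_FRACTION
```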
### Q7: Weak Points in Draft02 Architecture
- REST API + SSE architecture is wrong — must be MAVLink
- No ground station integration
- No startup/shutdown procedures
- No thermal management
- No object localization detail for AI camera with configurable angle/zoom
- Memory budget doesn't account for MAVSDK overhead
## Timeliness Sensitivity Assessment
- **Research Topic**: MAVLink integration, MAVSDK for Jetson, ground station telemetry, thermal management
- **Sensitivity Level**: 🟠 High
- **Rationale**: MAVSDK actively developed; MAVLink message set evolving; Jetson JetPack 6.2 specific
- **Source Time Window**: 12 months
- **Priority official sources**:
1. MAVSDK Python documentation (mavsdk.io)
2. MAVLink message definitions (mavlink.io)
3. NVIDIA Jetson Orin Nano thermal documentation
4. PX4/ArduPilot GPS_INPUT documentation
- **Key version information**:
- MAVSDK-Python: latest PyPI version
- MAVLink: v2 protocol
- JetPack: 6.2.2
- PyCuVSLAM: v15.0.0
# Source Registry
## Source #1
- **Title**: MAVSDK-Python Issue #320: Input external GPS through MAVSDK
- **Link**: https://github.com/mavlink/MAVSDK-Python/issues/320
- **Tier**: L4
- **Publication Date**: 2021 (still open 2025)
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: MAVSDK-Python — GPS_INPUT not supported as of v3.15.3
- **Target Audience**: Companion computer developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: MAVSDK-Python does not support GPS_INPUT message. Feature requested but unresolved.
- **Related Sub-question**: Q1
## Source #2
- **Title**: MAVLink GPS_INPUT Message Specification (mavlink_msg_gps_input.h)
- **Link**: https://rflysim.com/doc/en/RflySimAPIs/RflySimSDK/html/mavlink__msg__gps__input_8h_source.html
- **Tier**: L1
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: MAVLink v2, Message ID 232
- **Target Audience**: MAVLink developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: GPS_INPUT message fields: lat/lon (1E7), alt, fix_type, horiz_accuracy, vert_accuracy, speed_accuracy, hdop, vdop, satellites_visible, velocities NED, yaw, ignore_flags.
- **Related Sub-question**: Q1
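The field layout above implies a fixed-point encoding step on the companion computer. A sketch of just the pure encoding (actual transmission would go through pymavlink's `gps_input_send()`, per Source #4; the subset of fields and the dict shape here are illustrative):

```python
# GPS_INPUT carries lat/lon as int32 in degrees * 1e7; altitude in metres.
GPS_FIX_TYPE_3D = 3  # MAVLink fix_type for a 3D fix

def encode_gps_input(lat_deg: float, lon_deg: float, alt_m: float,
                     horiz_acc_m: float) -> dict:
    """Convert WGS84 floats to GPS_INPUT wire-format field values."""
    return {
        "lat": int(round(lat_deg * 1e7)),
        "lon": int(round(lon_deg * 1e7)),
        "alt": alt_m,
        "fix_type": GPS_FIX_TYPE_3D,
        "horiz_accuracy": horiz_acc_m,
    }
```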
## Source #3
- **Title**: ArduPilot GPS Input MAVProxy Documentation
- **Link**: https://ardupilot.org/mavproxy/docs/modules/GPSInput.html
- **Tier**: L1
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: ArduPilot GPS1_TYPE=14
- **Target Audience**: ArduPilot companion computer developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: GPS_INPUT requires GPS1_TYPE=14. Accepts JSON over UDP port 25100. Fields: lat, lon, alt, fix_type, hdop, timestamps.
- **Related Sub-question**: Q1
## Source #4
- **Title**: pymavlink GPS_INPUT example
- **Link**: https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py
- **Tier**: L3
- **Publication Date**: 2023
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: pymavlink
- **Target Audience**: Companion computer developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Complete pymavlink example for sending GPS_INPUT with all fields including yaw. Uses gps_input_send() method.
- **Related Sub-question**: Q1
## Source #5
- **Title**: ArduPilot AP_GPS_Params.cpp — GPS_RATE_MS
- **Link**: https://cocalc.com/github/ardupilot/ardupilot/blob/master/libraries/AP_GPS/AP_GPS_Params.cpp
- **Tier**: L1
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: ArduPilot master
- **Target Audience**: ArduPilot developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: GPS_RATE_MS default 200ms (5Hz), range 50-200ms (5-20Hz). Below 5Hz not allowed.
- **Related Sub-question**: Q1
## Source #6
- **Title**: MAVLink Telemetry Bandwidth Optimization Issue #1605
- **Link**: https://github.com/mavlink/mavlink/issues/1605
- **Tier**: L2
- **Publication Date**: 2022
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: MAVLink protocol developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Minimal telemetry requires ~12kbit/s. Optimized ~6kbit/s. SiK at 57600 baud provides ~21% usable budget. RFD900 for long range (15km+).
- **Related Sub-question**: Q2
## Source #7
- **Title**: NVIDIA JetPack 6.2 Super Mode Blog
- **Link**: https://developer.nvidia.com/blog/nvidia-jetpack-6-2-brings-super-mode-to-nvidia-jetson-orin-nano-and-jetson-orin-nx-modules/
- **Tier**: L1
- **Publication Date**: 2025-01
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: JetPack 6.2, Orin Nano Super
- **Target Audience**: Jetson developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: MAXN SUPER mode for peak performance. Thermal throttling at 80°C. Power modes: 15W, 25W, MAXN SUPER. Up to 1.7x AI boost.
- **Related Sub-question**: Q5
## Source #8
- **Title**: Jetson Orin Nano Power Consumption Analysis
- **Link**: https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/
- **Tier**: L3
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Jetson edge deployment engineers
- **Research Boundary Match**: ✅ Full match
- **Summary**: 5W idle, 8-12W typical inference, 25W peak. Throttling above 80°C drops GPU from 1GHz to 300MHz. Active cooling required for sustained loads.
- **Related Sub-question**: Q5
## Source #9
- **Title**: UAV Target Geolocation (Sensors 2022)
- **Link**: https://www.mdpi.com/1424-8220/22/5/1903
- **Tier**: L1
- **Publication Date**: 2022
- **Timeliness Status**: ✅ Currently valid (math doesn't change)
- **Target Audience**: UAV reconnaissance systems
- **Research Boundary Match**: ✅ Full match
- **Summary**: Trigonometric target geolocation from camera angle, altitude, UAV position. Iterative refinement improves accuracy 22-38x.
- **Related Sub-question**: Q4
## Source #10
- **Title**: pymavlink vs MAVSDK-Python for custom messages (Issue #739)
- **Link**: https://github.com/mavlink/MAVSDK-Python/issues/739
- **Tier**: L4
- **Publication Date**: 2024-12
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: MAVSDK-Python, pymavlink
- **Target Audience**: Companion computer developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: MAVSDK-Python lacks custom message support. pymavlink recommended for GPS_INPUT and custom messages. MAVSDK v4 may add MavlinkDirect plugin.
- **Related Sub-question**: Q1
## Source #11
- **Title**: NAMED_VALUE_FLOAT for custom telemetry (PR #18501)
- **Link**: https://github.com/ArduPilot/ardupilot/pull/18501
- **Tier**: L2
- **Publication Date**: 2022
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: ArduPilot companion computer developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: NAMED_VALUE_FLOAT messages from companion computer are logged by ArduPilot and forwarded to GCS. Useful for custom telemetry data.
- **Related Sub-question**: Q2
## Source #12
- **Title**: ArduPilot Companion Computer UART Connection
- **Link**: https://ardupilot.org/dev/docs/raspberry-pi-via-mavlink.html
- **Tier**: L1
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: ArduPilot companion computer developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Connect via TELEM2 UART. SERIAL2_PROTOCOL=2 (MAVLink2). Baud up to 1.5Mbps. TX/RX crossover.
- **Related Sub-question**: Q1, Q2
## Source #13
- **Title**: Jetson Orin Nano UART with ArduPilot
- **Link**: https://forums.developer.nvidia.com/t/uart-connection-between-jetson-nano-orin-and-ardupilot/325416
- **Tier**: L4
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: JetPack 6.x, Orin Nano
- **Target Audience**: Jetson + ArduPilot integration
- **Research Boundary Match**: ✅ Full match
- **Summary**: UART instability reported on Orin Nano with ArduPilot. Use /dev/ttyTHS0 or /dev/ttyTHS1. Check pinout carefully.
- **Related Sub-question**: Q1
## Source #14
- **Title**: MAVSDK-Python v3.15.3 PyPI (aarch64 wheels)
- **Link**: https://pypi.org/project/mavsdk/
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: v3.15.3, manylinux2014_aarch64
- **Target Audience**: MAVSDK Python developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: MAVSDK-Python has aarch64 wheels. pip install works on Jetson. But no GPS_INPUT support.
- **Related Sub-question**: Q1
## Source #15
- **Title**: ArduPilot receive COMMAND_LONG on companion computer
- **Link**: https://discuss.ardupilot.org/t/recieve-mav-cmd-on-companion-computer/48928
- **Tier**: L4
- **Publication Date**: 2020
- **Timeliness Status**: ⚠️ Needs verification (old but concept still valid)
- **Target Audience**: ArduPilot companion computer developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Companion computer can receive COMMAND_LONG messages from GCS via MAVLink. ArduPilot scripting can intercept specific command IDs.
- **Related Sub-question**: Q2
# Fact Cards
## Fact #1
- **Statement**: MAVSDK-Python (v3.15.3) does NOT support sending GPS_INPUT MAVLink messages. The feature has been requested since 2021 and remains unresolved. Custom message support is planned for MAVSDK v4 but not available in Python wrapper.
- **Source**: Source #1, #10, #14
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ✅ High — confirmed by MAVSDK maintainers
- **Related Dimension**: Flight Controller Integration
## Fact #2
- **Statement**: pymavlink provides full access to all MAVLink messages including GPS_INPUT via `mav.gps_input_send()`. It is the recommended library for companion computers that need to send GPS_INPUT messages.
- **Source**: Source #4, #10
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ✅ High — working examples exist
- **Related Dimension**: Flight Controller Integration
## Fact #3
- **Statement**: GPS_INPUT (MAVLink msg ID 232) contains: lat/lon (WGS84, degrees×1E7), alt (AMSL), fix_type (0-8), horiz_accuracy (m), vert_accuracy (m), speed_accuracy (m/s), hdop, vdop, satellites_visible, vn/ve/vd (NED m/s), yaw (centidegrees), gps_id, ignore_flags.
- **Source**: Source #2
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ✅ High — official MAVLink spec
- **Related Dimension**: Flight Controller Integration
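Since the field units above are easy to get wrong, a minimal packing sketch may help. It is pure Python with no pymavlink dependency; `to_gps_input_fields` is a hypothetical helper, and the actual send would be pymavlink's `master.mav.gps_input_send(...)` with these values.

```python
# Pure-Python packing of a position estimate into GPS_INPUT wire units
# (MAVLink msg 232); sending would use pymavlink's
# master.mav.gps_input_send(**fields). `to_gps_input_fields` is a
# hypothetical helper shown to make the scaling rules explicit.

def to_gps_input_fields(lat_deg, lon_deg, alt_m, yaw_deg,
                        horiz_acc_m, fix_type=3):
    """Convert SI-ish values into GPS_INPUT field units."""
    return {
        "time_usec": 0,                 # 0: let ArduPilot timestamp on receipt
        "gps_id": 0,
        # We don't estimate velocity, so flag those fields as ignored:
        # VEL_HORIZ(8) | VEL_VERT(16) | SPEED_ACCURACY(32)
        "ignore_flags": 8 | 16 | 32,
        "time_week_ms": 0,
        "time_week": 0,
        "fix_type": fix_type,           # 3 = 3D fix
        "lat": round(lat_deg * 1e7),    # WGS84 degrees x 1E7
        "lon": round(lon_deg * 1e7),
        "alt": alt_m,                   # AMSL, metres (float)
        "hdop": 1.0,
        "vdop": 1.0,
        "vn": 0.0, "ve": 0.0, "vd": 0.0,
        "speed_accuracy": 0.0,
        "horiz_accuracy": horiz_acc_m,  # drives EKF fusion weighting
        "vert_accuracy": 10.0,
        "satellites_visible": 10,       # plausible constant
        # yaw is centidegrees; 0 means "not available", north is 36000
        "yaw": round(yaw_deg * 100) or 36000,
    }

fields = to_gps_input_fields(48.5007321, 35.1234567, 142.0, 90.0, 20.0)
```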
## Fact #4
- **Statement**: ArduPilot requires GPS1_TYPE=14 (MAVLink) to accept GPS_INPUT messages from a companion computer. Connection via TELEM2 UART, SERIAL2_PROTOCOL=2 (MAVLink2).
- **Source**: Source #3, #12
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ✅ High — official ArduPilot documentation
- **Related Dimension**: Flight Controller Integration
## Fact #5
- **Statement**: ArduPilot GPS update rate (GPS_RATE_MS) default is 200ms (5Hz), range 50-200ms (5-20Hz). Our camera at 3fps (333ms) means GPS_INPUT at 3Hz. ArduPilot minimum is 5Hz. We must interpolate/predict between camera frames to meet 5Hz minimum.
- **Source**: Source #5
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ✅ High — ArduPilot source code
- **Related Dimension**: Flight Controller Integration
## Fact #6
- **Statement**: GPS_INPUT horiz_accuracy field directly maps to our confidence scoring. We can report: satellite-anchored ≈ 10-20m accuracy, VO-extrapolated ≈ 20-50m, IMU-only ≈ 100m+. ArduPilot EKF uses this for fusion weighting internally.
- **Source**: Source #2, #3
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ⚠️ Medium — accuracy mapping is estimated, EKF weighting not fully documented
- **Related Dimension**: Flight Controller Integration
## Fact #7
- **Statement**: Typical UAV telemetry bandwidth: SiK radio at 57600 baud provides ~12kbit/s usable for MAVLink. RFD900 provides long range (15km+) at similar data rates. Position telemetry must be compact — ~50 bytes per position update.
- **Source**: Source #6
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ✅ High — MAVLink protocol specs
- **Related Dimension**: Ground Station Telemetry
## Fact #8
- **Statement**: NAMED_VALUE_FLOAT MAVLink message can stream custom telemetry from companion computer to ground station. ArduPilot logs and forwards these. Mission Planner displays them. Useful for confidence score, drift status, matching status.
- **Source**: Source #11
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ✅ High — ArduPilot merged PR
- **Related Dimension**: Ground Station Telemetry
## Fact #9
- **Statement**: Jetson Orin Nano Super throttles GPU from 1GHz to ~300MHz when junction temperature exceeds 80°C. Active cooling (fan) required for sustained load. Power consumption: 5W idle, 8-12W typical inference, 25W peak. Modes: 15W, 25W, MAXN SUPER.
- **Source**: Source #7, #8
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ✅ High — NVIDIA official
- **Related Dimension**: Thermal Management
## Fact #10
- **Statement**: Jetson Orin Nano UART connection to ArduPilot uses /dev/ttyTHS0 or /dev/ttyTHS1. UART instability reported on some units — verify pinout, use JetPack 6.2.2+. Baud up to 1.5Mbps supported.
- **Source**: Source #12, #13
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ⚠️ Medium — UART instability is a known issue with workarounds
- **Related Dimension**: Flight Controller Integration
## Fact #11
- **Statement**: Object geolocation from UAV: for nadir (downward) camera, pixel offset from center → meters via GSD → rotate by heading → add to UAV GPS. For oblique (AI) camera with angle θ from vertical: ground_distance = altitude × tan(θ). Combined with zoom → effective focal length → pixel-to-meter conversion. Flat terrain assumption simplifies to basic trigonometry.
- **Source**: Source #9
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ✅ High — well-established trigonometry
- **Related Dimension**: Object Localization
## Fact #12
- **Statement**: Companion computer can receive COMMAND_LONG from ground station via MAVLink. For re-localization hints: ground station sends a custom command with approximate lat/lon, companion computer receives it via pymavlink message listener.
- **Source**: Source #15
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ⚠️ Medium — specific implementation for re-localization hint would be custom
- **Related Dimension**: Ground Station Telemetry
## Fact #13
- **Statement**: The restrictions.md now says "using MAVSDK library" but MAVSDK-Python cannot send GPS_INPUT. pymavlink is the only viable Python option for GPS_INPUT. This is a restriction conflict that must be resolved — use pymavlink for GPS_INPUT (core function) or accept MAVSDK + pymavlink hybrid.
- **Source**: Source #1, #2, #10
- **Phase**: Assessment
- **Target Audience**: GPS-Denied system developers
- **Confidence**: ✅ High — confirmed limitation
- **Related Dimension**: Flight Controller Integration
# Comparison Framework
## Selected Framework Type
Problem Diagnosis + Decision Support (Mode B)
## Weak Point Dimensions (from draft02 → new AC/restrictions)
### Dimension 1: Output Architecture (CRITICAL)
Draft02 uses FastAPI + SSE to stream positions to clients.
New AC requires MAVLink GPS_INPUT to flight controller as PRIMARY output.
Entire output architecture must change.
### Dimension 2: Ground Station Communication (CRITICAL)
Draft02 has no ground station integration.
New AC requires: stream position+confidence via telemetry, receive re-localization commands.
### Dimension 3: MAVLink Library Choice (CRITICAL)
Restrictions say "MAVSDK library" but MAVSDK-Python cannot send GPS_INPUT.
Must use pymavlink for core function.
### Dimension 4: GPS Update Rate (HIGH)
Camera at 3fps → 3Hz position updates. ArduPilot minimum GPS rate is 5Hz.
Need IMU-based interpolation between camera frames.
### Dimension 5: Startup & Failsafe (HIGH)
Draft02 has no initialization or failsafe procedures.
New AC requires: init from last GPS, reboot recovery, IMU fallback after N seconds.
### Dimension 6: Object Localization (MEDIUM)
Draft02 has basic pixel-to-GPS for navigation camera only.
New AC adds AI camera with configurable angle, zoom — trigonometric projection needed.
### Dimension 7: Thermal Management (MEDIUM)
Draft02 ignores thermal throttling.
Jetson Orin Nano Super throttles at 80°C — can drop GPU 3x.
### Dimension 8: VO Drift Budget Monitoring (MEDIUM)
New AC: max cumulative VO drift <100m between satellite anchors.
Draft02 uses ESKF covariance but doesn't explicitly track drift budget.
### Dimension 9: Satellite Imagery Specs (LOW)
New AC: ≥0.5 m/pixel, <2 years old. Draft02 uses Google Maps zoom 18-19 which is ~0.3-0.6 m/pixel.
Mostly compatible, needs explicit validation.
### Dimension 10: API for Internal Systems (LOW)
Object localization requests from AI systems need a local IPC mechanism.
FastAPI could be retained for local-only inter-process communication.
## Initial Population
| Dimension | Draft02 State | Required State | Gap Severity |
|-----------|--------------|----------------|-------------|
| Output Architecture | FastAPI + SSE to client | MAVLink GPS_INPUT to flight controller | CRITICAL — full redesign |
| Ground Station | None | Bidirectional MAVLink telemetry | CRITICAL — new component |
| MAVLink Library | Not applicable (no MAVLink) | pymavlink (MAVSDK can't do GPS_INPUT) | CRITICAL — new dependency |
| GPS Update Rate | 3fps → ~3Hz output | ≥5Hz to ArduPilot EKF | HIGH — need IMU interpolation |
| Startup & Failsafe | None | Init from GPS, reboot recovery, IMU fallback | HIGH — new procedures |
| Object Localization | Basic nadir pixel-to-GPS | AI camera angle+zoom trigonometry | MEDIUM — extend existing |
| Thermal Management | Not addressed | Monitor + mitigate throttling | MEDIUM — operational concern |
| VO Drift Budget | ESKF covariance only | Explicit <100m tracking + trigger | MEDIUM — extend ESKF |
| Satellite Imagery Specs | Google Maps zoom 18-19 | ≥0.5 m/pixel, <2 years | LOW — mostly met |
| Internal IPC | REST API | Lightweight local API or shared memory | LOW — simplify from draft02 |
# Reasoning Chain
## Dimension 1: Output Architecture
### Fact Confirmation
Per Fact #3, GPS_INPUT (MAVLink msg ID 232) accepts lat/lon in WGS84 (degrees×1E7), altitude, fix_type, accuracy fields, and NED velocities. Per Fact #4, ArduPilot uses GPS1_TYPE=14 to accept MAVLink GPS input. The flight controller's EKF fuses this as if it were a real GPS module.
### Reference Comparison
Draft02 uses FastAPI + SSE to stream position data to a REST client. The new AC requires the system to output GPS coordinates directly to the flight controller via MAVLink GPS_INPUT, replacing the real GPS module. The flight controller then uses these coordinates for navigation/autopilot functions. The ground station receives position data indirectly via the flight controller's telemetry forwarding.
### Conclusion
The entire output architecture must change from REST API + SSE → pymavlink GPS_INPUT sender. FastAPI is no longer the primary output mechanism. It may be retained only for local IPC with other onboard AI systems (object localization requests). The SSE streaming to external clients is replaced by MAVLink telemetry forwarding through the flight controller.
### Confidence
✅ High — clear requirement change backed by MAVLink specification
---
## Dimension 2: Ground Station Communication
### Fact Confirmation
Per Fact #7, typical telemetry bandwidth is ~12kbit/s (SiK). Per Fact #8, NAMED_VALUE_FLOAT can stream custom values from companion to GCS. Per Fact #12, COMMAND_LONG can deliver commands from GCS to companion.
### Reference Comparison
Draft02 has no ground station integration. The new AC requires:
1. Stream position + confidence to ground station (passive, via telemetry forwarding of GPS_INPUT data + custom NAMED_VALUE_FLOAT for confidence/drift)
2. Receive re-localization commands from operator (active, via COMMAND_LONG or custom MAVLink message)
### Conclusion
Ground station communication uses MAVLink messages forwarded through the flight controller's telemetry radio. Position data flows automatically (flight controller forwards GPS data to GCS). Custom telemetry (confidence, drift, status) uses NAMED_VALUE_FLOAT. Re-localization hints from operator use a custom COMMAND_LONG with lat/lon payload. Bandwidth is tight (~12kbit/s) so minimize custom message frequency (1-2Hz max for NAMED_VALUE_FLOAT).
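The 1Hz budget for custom messages can be enforced with a small per-name throttle. This is a sketch; `TelemetryThrottle` is a hypothetical helper, and the actual send would be pymavlink's `named_value_float_send(time_boot_ms, name, value)`, whose name field is limited to 10 ASCII characters.

```python
# 1 Hz throttle sketch for custom telemetry to the GCS; the actual
# send would be master.mav.named_value_float_send(...), so this only
# shows the rate limiting and name truncation.

import time

class TelemetryThrottle:
    def __init__(self, period_s=1.0):
        self.period_s = period_s
        self._last_sent = {}

    def pack(self, name, value, now=None):
        """Return (name_bytes, value) if the message is due, else None."""
        now = time.monotonic() if now is None else now
        if now - self._last_sent.get(name, float("-inf")) < self.period_s:
            return None          # drop: within the per-name rate window
        self._last_sent[name] = now
        return (name[:10].encode("ascii"), float(value))

tt = TelemetryThrottle(period_s=1.0)
sent = tt.pack("CONF_SCORE", 0.82, now=0.0)
dropped = tt.pack("CONF_SCORE", 0.81, now=0.5)   # within 1 s window -> None
```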
### Confidence
✅ High — standard MAVLink patterns
---
## Dimension 3: MAVLink Library Choice
### Fact Confirmation
Per Fact #1, MAVSDK-Python v3.15.3 does NOT support GPS_INPUT. Per Fact #2, pymavlink provides full GPS_INPUT support via `mav.gps_input_send()`. Per Fact #13, the restrictions say "using MAVSDK library" but MAVSDK literally cannot do the core function.
### Reference Comparison
MAVSDK is a higher-level abstraction over MAVLink. pymavlink is the lower-level direct MAVLink implementation. For GPS_INPUT (our core output), only pymavlink works.
### Conclusion
Use **pymavlink** as the MAVLink library. The restriction mentioning MAVSDK must be noted as a conflict — pymavlink is the only viable option for GPS_INPUT in Python. pymavlink is lightweight, pure Python, works on aarch64, and provides full access to all MAVLink messages. MAVSDK v4 may add custom message support in the future but is not available now.
### Confidence
✅ High — confirmed limitation, clear alternative
---
## Dimension 4: GPS Update Rate
### Fact Confirmation
Per Fact #5, ArduPilot GPS_RATE_MS has a minimum of 200ms (5Hz). Our camera shoots at ~3fps (333ms). We produce a full VO+ESKF position estimate per frame at ~3Hz.
### Reference Comparison
3Hz < 5Hz minimum. ArduPilot's EKF expects at least 5Hz GPS updates for stable fusion.
### Conclusion
Between camera frames, use IMU prediction from the ESKF to interpolate position at 5Hz (or higher, e.g., 10Hz). The ESKF already runs IMU prediction at 100+Hz internally. Simply emit the ESKF predicted state as GPS_INPUT at 5-10Hz. Camera frame updates (3Hz) provide the measurement corrections. This is standard in sensor fusion: prediction runs fast, measurements arrive slower. The `fix_type` field can differentiate: camera-corrected frames → fix_type=3 (3D), IMU-predicted → fix_type=2 (2D) or adjust horiz_accuracy to reflect lower confidence.
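The interpolation step can be sketched as follows; a constant-velocity model stands in for the ESKF's IMU-driven prediction, and `predict` is a hypothetical helper.

```python
# Constant-velocity interpolation sketch: camera/ESKF corrections
# arrive at ~3 Hz, GPS_INPUT is emitted at 10 Hz by propagating the
# last corrected state. A real system would use the ESKF's IMU-driven
# prediction instead of this constant-velocity stand-in.

def predict(last_fix, t_now):
    """last_fix = (t, north_m, east_m, vn_mps, ve_mps) in a local frame."""
    t0, north, east, vn, ve = last_fix
    dt = t_now - t0
    return (north + vn * dt, east + ve * dt)

fix = (0.0, 0.0, 0.0, 55.0, 0.0)                        # 55 m/s north (~200 km/h)
positions = [predict(fix, t / 10.0) for t in range(4)]  # 10 Hz ticks
```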
### Confidence
✅ High — standard sensor fusion approach
---
## Dimension 5: Startup & Failsafe
### Fact Confirmation
Per new AC: system initializes from last known GPS before GPS denial. On reboot: re-initialize from flight controller's IMU-extrapolated position. On total failure for N seconds: flight controller falls back to IMU-only.
### Reference Comparison
Draft02 has no startup or failsafe procedures. The system was assumed to already know its position at session start.
### Conclusion
Startup sequence:
1. On boot, connect to flight controller via pymavlink
2. Read current GPS position from flight controller (GLOBAL_POSITION_INT or GPS_RAW_INT message)
3. Initialize ESKF state with this position
4. Begin cuVSLAM initialization with first camera frames
5. Start sending GPS_INPUT once ESKF has a valid position estimate
Failsafe:
1. If no position estimate for N seconds → stop sending GPS_INPUT (flight controller auto-detects GPS loss and falls back to IMU)
2. Log failure event
3. Continue attempting VO/satellite matching
Reboot recovery:
1. On companion computer reboot, reconnect to flight controller
2. Read current GPS_RAW_INT (which is now IMU-extrapolated by flight controller)
3. Re-initialize ESKF with this position (lower confidence)
4. Resume normal operation
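The init step of the sequences above can be sketched like this. GLOBAL_POSITION_INT carries lat/lon as degE7 and altitude as millimetres AMSL; `init_position` is a hypothetical helper, and `recv_global_position` stands in for a pymavlink `recv_match(type="GLOBAL_POSITION_INT", blocking=True)` call.

```python
# Hedged init sketch: seed the ESKF from the flight controller's
# reported position (real GPS at cold start, IMU-extrapolated after a
# companion-computer reboot).

def init_position(recv_global_position):
    msg = recv_global_position()
    return {
        "lat_deg": msg["lat"] / 1e7,    # degE7 -> degrees
        "lon_deg": msg["lon"] / 1e7,
        "alt_m": msg["alt"] / 1000.0,   # mm AMSL -> m
        "horiz_accuracy_m": 100.0,      # low confidence until VO converges
    }

# Stub message, as a reboot-recovery example (IMU-extrapolated position):
state = init_position(lambda: {"lat": 485000000, "lon": 351000000, "alt": 142000})
```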
### Confidence
✅ High — standard autopilot integration patterns
---
## Dimension 6: Object Localization
### Fact Confirmation
Per Fact #11, for oblique camera: ground_distance = altitude × tan(θ) where θ is angle from vertical. Combined with camera azimuth (yaw + camera pan angle) gives direction. With zoom, effective FOV narrows → higher pixel-to-meter resolution.
### Reference Comparison
Draft02 has basic nadir-only projection: pixel offset × GSD → meters → rotate by heading → lat/lon. The AI camera has configurable angle and zoom, so this needs extension.
### Conclusion
Object localization for AI camera:
1. Get current UAV position from GPS-Denied system
2. Get AI camera params: pan angle (azimuth relative to heading), tilt angle (from vertical), zoom level (→ effective focal length)
3. Get pixel coordinates of detected object in AI camera frame
4. Compute: a) bearing = UAV heading + camera pan angle + pixel horizontal offset angle, b) ground_distance = altitude × tan(tilt + pixel vertical offset angle) under the flat-terrain assumption (the slant range is altitude / cos(tilt)), c) convert bearing + ground distance to a lat/lon offset from the UAV position
5. Accuracy inherits GPS-Denied position error + projection error from altitude/angle uncertainty
Expose as lightweight local API (Unix socket or shared memory for speed, or simple HTTP on localhost).
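The steps above can be sketched with flat-terrain trigonometry. `localize` is a hypothetical helper; pixel-offset angles are assumed already folded into `bearing_rad` and `tilt_rad`, and a spherical-Earth small-offset conversion is used for the lat/lon step.

```python
# Flat-terrain geolocation sketch for the oblique AI camera.
# bearing_rad: UAV heading + camera pan + pixel horizontal offset.
# tilt_rad: camera tilt from vertical + pixel vertical offset.

import math

EARTH_R = 6371000.0  # mean Earth radius, metres

def localize(uav_lat, uav_lon, alt_agl_m, bearing_rad, tilt_rad):
    """Project an oblique-camera detection to lat/lon (flat terrain)."""
    ground_dist = alt_agl_m * math.tan(tilt_rad)   # metres along the ground
    d_north = ground_dist * math.cos(bearing_rad)
    d_east = ground_dist * math.sin(bearing_rad)
    lat = uav_lat + math.degrees(d_north / EARTH_R)
    lon = uav_lon + math.degrees(
        d_east / (EARTH_R * math.cos(math.radians(uav_lat))))
    return lat, lon, ground_dist

# 500 m AGL, camera tilted 30 deg from vertical, looking due east:
lat, lon, dist = localize(48.5, 35.1, 500.0,
                          math.radians(90.0), math.radians(30.0))
```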
### Confidence
✅ High — well-established trigonometry, flat terrain simplifies
---
## Dimension 7: Thermal Management
### Fact Confirmation
Per Fact #9, Jetson Orin Nano Super throttles at 80°C junction temperature, dropping GPU from ~1GHz to ~300MHz (3x slowdown). Active cooling required. Power modes: 15W, 25W, MAXN SUPER.
### Reference Comparison
Draft02 ignores thermal constraints. Our pipeline (cuVSLAM ~9ms + satellite matcher ~50-200ms) runs on GPU continuously at 3fps. This is moderate but sustained load.
### Conclusion
Mitigation:
1. Use 25W power mode (not MAXN SUPER) for stable sustained performance
2. Require active cooling (5V fan, should be standard on any UAV companion computer mount)
3. Monitor temperature via tegrastats/jtop at runtime
4. If temp >75°C: reduce satellite matching frequency (every 5-10 frames instead of 3)
5. If temp >80°C: skip satellite matching entirely, rely on VO+IMU only (cuVSLAM at 9ms is low power)
6. GPU load per 333ms frame: ~9ms cuVSLAM every frame plus ~50-200ms satellite matching run asynchronously every few frames → average GPU utilization stays below ~60%, so thermal throttling is unlikely with proper cooling
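A sketch of the runtime gating, assuming the generic Linux sysfs thermal interface (the exact thermal zone index differs per Jetson module); `match_interval_frames` and `read_temp_c` are hypothetical helpers with thresholds mirroring the mitigation list above.

```python
# Thermal gating sketch for the Jetson: read the SoC temperature and
# back off satellite matching as the junction temperature nears the
# 80 C throttle point.

def match_interval_frames(temp_c):
    """Satellite-matching cadence, in frames, for a given temperature."""
    if temp_c >= 80.0:
        return 0      # skip matching entirely; VO+IMU only
    if temp_c >= 75.0:
        return 10     # reduced cadence
    return 3          # normal cadence

def read_temp_c(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read a Linux thermal zone; sysfs reports millidegrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0
```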
### Confidence
⚠️ Medium — actual thermal behavior depends on airflow in UAV enclosure, ambient temperature in-flight
---
## Dimension 8: VO Drift Budget Monitoring
### Fact Confirmation
New AC: max cumulative VO drift between satellite correction anchors < 100m. The ESKF maintains a position covariance matrix that grows during VO-only periods and shrinks on satellite corrections.
### Reference Comparison
Draft02 uses ESKF covariance for keyframe selection (trigger satellite match when covariance exceeds threshold) but doesn't explicitly track drift as a budget.
### Conclusion
Use ESKF position covariance diagonal (σ_x² + σ_y²) as the drift estimate. When √(σ_x² + σ_y²) approaches 100m:
1. Force satellite matching on every frame (not just keyframes)
2. Report LOW confidence via GPS_INPUT horiz_accuracy
3. If drift exceeds 100m without satellite correction → flag as critical, increase matching frequency, send alert to ground station
This is essentially what draft02 already does with covariance-based keyframe triggering, but now with an explicit 100m threshold.
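The thresholding above can be sketched as a small check; `drift_state` is a hypothetical helper, and the 70% soft-trigger point is an assumption, not part of the AC.

```python
# Drift-budget sketch: the ESKF horizontal position covariance
# (sigma_x^2 + sigma_y^2) is read as a 1-sigma drift estimate and
# checked against the 100 m budget, with a soft threshold that forces
# per-frame satellite matching before the budget is exhausted.

import math

DRIFT_BUDGET_M = 100.0
SOFT_FRACTION = 0.7          # assumed soft-trigger point, not from the AC

def drift_state(var_x_m2, var_y_m2):
    drift = math.sqrt(var_x_m2 + var_y_m2)
    if drift >= DRIFT_BUDGET_M:
        return drift, "CRITICAL"      # alert GCS, report low confidence
    if drift >= SOFT_FRACTION * DRIFT_BUDGET_M:
        return drift, "FORCE_MATCH"   # satellite match on every frame
    return drift, "NOMINAL"

state = drift_state(900.0, 700.0)     # 1-sigma drift of 40 m
```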
### Confidence
✅ High — standard ESKF covariance interpretation
---
## Dimension 9: Satellite Imagery Specs
### Fact Confirmation
New AC: ≥0.5 m/pixel resolution, <2 years old. Google Maps at zoom 18 = ~0.6 m/pixel and zoom 19 = ~0.3 m/pixel at the equator; Web Mercator resolution scales with cos(latitude), so at ~48.5°N these are ~0.4 and ~0.2 m/pixel.
### Reference Comparison
Draft02 uses Google Maps zoom 18-19. Zoom 19 (~0.2 m/pixel at the operating latitude) comfortably exceeds the requirement. Zoom 18 (~0.4 m/pixel at ~48.5°N) also meets the 0.5 m/pixel requirement, though the equatorial figure of 0.6 m/pixel would not. Age depends on Google's imagery updates for eastern Ukraine; a conflict zone may have stale imagery.
### Conclusion
Validate during offline preprocessing:
1. Download at zoom 19 first (~0.3 m/pixel at the equator, finer at the operating latitude)
2. If zoom 19 is unavailable for some tiles, fall back to zoom 18 (~0.4 m/pixel at ~48.5°N, still within the 0.5 m/pixel requirement)
3. Check imagery date metadata if available from Google Maps API
4. Flag tiles where imagery appears stale (seasonal mismatch, destroyed buildings, etc.)
5. No architectural change needed — add validation step to preprocessing pipeline
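The validation step above can use the standard Web Mercator ground-resolution formula, metres/pixel = 156543.03392 × cos(lat) / 2^zoom; `gsd_m_per_px` and `min_zoom_for` are hypothetical helpers.

```python
# Per-tile resolution check against the >=0.5 m/pixel requirement.

import math

def gsd_m_per_px(lat_deg, zoom):
    """Web Mercator ground resolution at a given latitude and zoom."""
    return 156543.03392 * math.cos(math.radians(lat_deg)) / (2 ** zoom)

def min_zoom_for(lat_deg, required_gsd_m=0.5):
    """Smallest zoom whose resolution is at least as fine as required."""
    zoom = 0
    while gsd_m_per_px(lat_deg, zoom) > required_gsd_m:
        zoom += 1
    return zoom

# At ~48.5 N: zoom 18 is ~0.40 m/px and zoom 19 is ~0.20 m/px, so both
# satisfy the 0.5 m/pixel requirement at the operating latitude.
```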
### Confidence
⚠️ Medium — Google Maps imagery age is not reliably queryable
---
## Dimension 10: Internal IPC for Object Localization
### Fact Confirmation
Other onboard AI systems need to request GPS coordinates of detected objects. These systems run on the same Jetson.
### Reference Comparison
Draft02 has FastAPI for external API. For local IPC between processes on the same device, FastAPI is overkill but works.
### Conclusion
Retain a minimal FastAPI server on localhost:8000 for inter-process communication:
- POST /localize: accepts pixel coordinates + AI camera params → returns GPS coordinates
- GET /status: returns system health/state for monitoring
This is local-only (bind to 127.0.0.1), not exposed externally. The primary output channel is MAVLink GPS_INPUT. This is a lightweight addition, not the core architecture.
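The /localize contract can be sketched as a plain handler function; in the real service it would be mounted on a localhost-only FastAPI app (uvicorn bound to 127.0.0.1:8000). `handle_localize` and the `project` callable are hypothetical; `project` wraps the camera trigonometry and returns (lat, lon, accuracy_m).

```python
# Request/response contract sketch for the local object-localization
# IPC endpoint, unwired from any web framework.

def handle_localize(req, current_position, project):
    """req: {"px", "py", "tilt_deg", "pan_deg", "zoom"} from the AI system."""
    lat, lon, acc_m = project(current_position, req)
    return {
        "lat": lat,
        "lon": lon,
        "accuracy_m": acc_m,   # GPS-Denied position error + projection error
        "anchored": current_position["horiz_accuracy_m"] <= 20.0,
    }

resp = handle_localize(
    {"px": 320, "py": 240, "tilt_deg": 30.0, "pan_deg": 0.0, "zoom": 1.0},
    {"lat": 48.5, "lon": 35.1, "horiz_accuracy_m": 15.0},
    # stand-in projection for the example:
    lambda pos, req: (48.501, 35.102, pos["horiz_accuracy_m"] + 5.0),
)
```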
### Confidence
✅ High — simple local IPC pattern
# Validation Log
## Validation Scenario
A typical 15-minute flight over eastern Ukraine agricultural terrain. GPS is jammed after first 2 minutes. Flight includes straight segments, two sharp 90-degree turns, and one low-texture segment over a large plowed field. Ground station operator monitors via telemetry link. During the flight, companion computer reboots once due to power glitch.
## Expected Based on Conclusions
### Phase 1: Normal start (GPS available, first 2 min)
- System boots, connects to flight controller via pymavlink on UART
- Reads GLOBAL_POSITION_INT → initializes ESKF with real GPS position
- Begins cuVSLAM initialization with first camera frames
- Starts sending GPS_INPUT at 5Hz (ESKF prediction between frames)
- Ground station sees position + confidence via telemetry forwarding
### Phase 2: GPS denial begins
- Flight controller's real GPS becomes unreliable/lost
- GPS-Denied system continues sending GPS_INPUT — seamless for autopilot
- horiz_accuracy changes from real-GPS level to VO-estimated level (~20m)
- cuVSLAM provides VO at every frame (~9ms), ESKF fuses with IMU
- Satellite matching runs every 3-10 frames on keyframes
- After successful satellite match: horiz_accuracy improves, fix_type stays 3
- NAMED_VALUE_FLOAT sends confidence/drift data to ground station at ~1Hz
### Phase 3: Sharp turn
- cuVSLAM loses tracking (no overlapping features)
- ESKF falls back to IMU prediction, horiz_accuracy increases
- Next frame flagged as keyframe → satellite matching triggered immediately
- Satellite match against preloaded tiles using IMU dead-reckoning position
- If match found: position recovered, new segment begins, horiz_accuracy drops
- If 3 consecutive failures: send re-localization request to ground station via NAMED_VALUE_FLOAT/STATUSTEXT
- Ground station operator sends COMMAND_LONG with approximate coordinates
- System receives hint, constrains tile search → likely recovers position
### Phase 4: Low-texture plowed field
- cuVSLAM keypoint count drops below threshold
- Satellite matching frequency increases (every frame)
- If satellite matching works on plowed field vs satellite imagery: position maintained
- If satellite also fails (seasonal difference): drift accumulates, ESKF covariance grows
- When √(σ²) approaches 100m: force continuous satellite matching
- horiz_accuracy reported as 50-100m, fix_type=2
### Phase 5: Companion computer reboot
- Power glitch → Jetson reboots (~30-60 seconds)
- During reboot: flight controller gets no GPS_INPUT → detects GPS timeout → falls back to IMU-only dead reckoning
- Jetson comes back: reconnects via pymavlink, reads GPS_RAW_INT (IMU-extrapolated)
- Initializes ESKF with this position (low confidence, horiz_accuracy=100m)
- Begins cuVSLAM + satellite matching → gradually improves accuracy
- Operator on ground station sees position return with improving confidence
### Phase 6: Object localization request
- AI detection system on same Jetson detects a vehicle in AI camera frame
- Sends POST /localize with pixel coords + camera angle (30° from vertical) + zoom level + altitude (500m)
- GPS-Denied system computes: slant range = 500 / cos(30°) = 577m, ground (horizontal) distance = 500 × tan(30°) = 289m
- Adds bearing from heading + camera pan → lat/lon offset
- Returns GPS coordinates with accuracy estimate (GPS-Denied accuracy + projection error)
## Actual Validation Results
The scenario covers all new AC requirements:
- ✅ MAVLink GPS_INPUT at 5Hz (camera frames + IMU interpolation)
- ✅ Confidence via horiz_accuracy field maps to confidence levels
- ✅ Ground station telemetry via MAVLink forwarding + NAMED_VALUE_FLOAT
- ✅ Re-localization via ground station command
- ✅ Startup from GPS → seamless transition on denial
- ✅ Reboot recovery from flight controller IMU-extrapolated position
- ✅ Drift budget tracking via ESKF covariance
- ✅ Object localization with AI camera angle/zoom
## Counterexamples
### Potential issue: 5Hz interpolation accuracy
Between camera frames (333ms apart), ESKF predicts using IMU only. At 200km/h = 55m/s, the UAV moves ~18m between frames. IMU prediction over 200ms (one interpolation step) at this speed introduces ~1-5m error — acceptable for GPS_INPUT.
### Potential issue: UART reliability
Jetson Orin Nano UART instability reported (Fact #10). If MAVLink connection drops during flight, GPS_INPUT stops → autopilot loses GPS. Mitigation: use TCP over USB-C if UART unreliable, or add watchdog to reconnect. This is a hardware integration risk.
### Potential issue: Telemetry bandwidth saturation
If GPS-Denied sends too many NAMED_VALUE_FLOAT messages, it could compete with standard autopilot telemetry for bandwidth. Keep custom messages to 1Hz max (50-100 bytes/s = <1kbit/s).
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable and verifiable
- [x] All new AC requirements addressed
- [ ] UART reliability needs hardware testing — cannot validate without physical setup
## Conclusions Requiring Revision
None — all conclusions hold under validation. The UART reliability risk needs flagging but doesn't change the architecture.
# Question Decomposition
## Original Question
Should we switch from ONNX Runtime to native TensorRT Engine for all AI models in the GPS-Denied pipeline, running on Jetson Orin Nano Super?
## Active Mode
Mode B: Solution Assessment — existing solution_draft03.md uses ONNX Runtime / mixed inference. User requests focused investigation of TRT Engine migration.
## Question Type
Decision Support — evaluating a technology switch with cost/risk/benefit dimensions.
## Research Subject Boundary
| Dimension | Boundary |
|-----------|----------|
| Population | AI inference models in the GPS-Denied navigation pipeline |
| Hardware | Jetson Orin Nano Super (8GB LPDDR5, 67 TOPS sparse INT8, 1020 MHz GPU, NO DLA) |
| Software | JetPack 6.2 (CUDA 12.6, TensorRT 10.3, cuDNN 9.3) |
| Timeframe | Current (2025-2026), JetPack 6.2 era |
## AI Models in Pipeline
| Model | Type | Current Runtime | TRT Applicable? |
|-------|------|----------------|-----------------|
| cuVSLAM | Native CUDA library (closed-source) | CUDA native | NO — already CUDA-optimized binary |
| LiteSAM | PyTorch (MobileOne + TAIFormer + MinGRU) | Planned TRT FP16 | YES |
| XFeat | PyTorch (learned features) | XFeatTensorRT exists | YES |
| ESKF | Mathematical filter (Python/C++) | CPU/NumPy | NO — not an AI model |
Only LiteSAM and XFeat are convertible to TRT Engine. cuVSLAM is already NVIDIA-native CUDA.
## Decomposed Sub-Questions
1. What is the performance difference between ONNX Runtime and native TRT Engine on Jetson Orin Nano Super?
2. What is the memory overhead of ONNX Runtime vs native TRT on 8GB shared memory?
3. What conversion paths exist for PyTorch → TRT Engine on Jetson aarch64?
4. Are TRT engines hardware-specific? What's the deployment workflow?
5. What are the specific conversion steps for LiteSAM and XFeat?
6. Does Jetson Orin Nano Super have DLA for offloading?
7. What are the risks and limitations of going TRT-only?
## Timeliness Sensitivity Assessment
- **Research Topic**: TensorRT vs ONNX Runtime inference on Jetson Orin Nano Super
- **Sensitivity Level**: 🟠 High
- **Rationale**: TensorRT, JetPack, and ONNX Runtime release new versions frequently. Jetson Orin Nano Super mode is relatively new (JetPack 6.2, Jan 2025).
- **Source Time Window**: 12 months
- **Priority official sources to consult**:
1. NVIDIA TensorRT documentation (docs.nvidia.com)
2. NVIDIA JetPack 6.2 release notes
3. ONNX Runtime GitHub issues (microsoft/onnxruntime)
4. NVIDIA TensorRT GitHub issues (NVIDIA/TensorRT)
- **Key version information to verify**:
- TensorRT: 10.3 (JetPack 6.2)
- ONNX Runtime: 1.20.1+ (Jetson builds)
- JetPack: 6.2
- CUDA: 12.6
# Source Registry
## Source #1
- **Title**: ONNX Runtime Issue #24085: CUDA EP on Jetson Orin Nano does not use tensor cores
- **Link**: https://github.com/microsoft/onnxruntime/issues/24085
- **Tier**: L1 (Official GitHub issue with MSFT response)
- **Publication Date**: 2025-03-18
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: ONNX Runtime v1.20.1+, JetPack 6.1, CUDA 12.6
- **Target Audience**: Jetson Orin Nano developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: ONNX Runtime CUDA EP on Jetson Orin Nano is 7-8x slower than TRT standalone due to tensor cores not being utilized. Workaround: remove cudnn_conv_algo_search option and use FP16 models.
- **Related Sub-question**: Q1 (performance difference)
## Source #2
- **Title**: ONNX Runtime Issue #20457: VRAM usage difference between TRT-EP and native TRT
- **Link**: https://github.com/microsoft/onnxruntime/issues/20457
- **Tier**: L1 (Official GitHub issue with MSFT dev response)
- **Publication Date**: 2024-04-25
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: ONNX Runtime 1.17.1, CUDA 12.2
- **Target Audience**: All ONNX Runtime + TRT users
- **Research Boundary Match**: ✅ Full match
- **Summary**: ONNX Runtime TRT-EP keeps serialized engine in memory (~420-440MB) during execution. Native TRT drops to 130-140MB after engine build by calling releaseBlob(). Delta: ~280-300MB.
- **Related Sub-question**: Q2 (memory overhead)
## Source #3
- **Title**: ONNX Runtime Issue #12083: TensorRT Provider vs TensorRT Native
- **Link**: https://github.com/microsoft/onnxruntime/issues/12083
- **Tier**: L2 (Official MSFT dev response)
- **Publication Date**: 2022-07-05 (confirmed still relevant)
- **Timeliness Status**: ⚠️ Needs verification (old but fundamental architecture hasn't changed)
- **Version Info**: General ONNX Runtime
- **Target Audience**: All ONNX Runtime users
- **Research Boundary Match**: ✅ Full match
- **Summary**: MSFT engineer confirms TRT-EP "can achieve performance parity with native TensorRT." Benefit is automatic fallback for unsupported ops.
- **Related Sub-question**: Q1 (performance difference)
## Source #4
- **Title**: ONNX Runtime Issue #11356: Lower performance on InceptionV3/4 with TRT EP
- **Link**: https://github.com/microsoft/onnxruntime/issues/11356
- **Tier**: L4 (Community report)
- **Publication Date**: 2022
- **Timeliness Status**: ⚠️ Needs verification
- **Version Info**: ONNX Runtime older version
- **Target Audience**: ONNX Runtime users
- **Research Boundary Match**: ⚠️ Partial (different model, but same mechanism)
- **Summary**: Reports ~3x performance difference (41 vs 129 inferences/sec) between ONNX RT TRT-EP and native TRT on InceptionV3/4.
- **Related Sub-question**: Q1 (performance difference)
## Source #5
- **Title**: NVIDIA JetPack 6.2 Release Notes
- **Link**: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/index.html
- **Tier**: L1 (Official NVIDIA documentation)
- **Publication Date**: 2025-01
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: JetPack 6.2, TensorRT 10.3, CUDA 12.6, cuDNN 9.3
- **Target Audience**: Jetson developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: JetPack 6.2 includes TensorRT 10.3, enables Super Mode for Orin Nano (67 TOPS, 1020 MHz GPU, 25W).
- **Related Sub-question**: Q3 (conversion paths)
## Source #6
- **Title**: NVIDIA Jetson Orin Nano Super Developer Kit Blog
- **Link**: https://developer.nvidia.com/blog/nvidia-jetson-orin-nano-developer-kit-gets-a-super-boost/
- **Tier**: L2 (Official NVIDIA blog)
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: Orin Nano Super, 67 TOPS sparse INT8
- **Target Audience**: Jetson developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Super mode: GPU 1020 MHz (vs 635), 67 TOPS sparse (vs 40), memory bandwidth 102 GB/s (vs 68), power 25W. No DLA cores on Orin Nano.
- **Related Sub-question**: Q6 (DLA availability)
## Source #7
- **Title**: Jetson Orin module comparison (Connect Tech)
- **Link**: https://connecttech.com/jetson/jetson-module-comparison
- **Tier**: L3 (Authoritative hardware vendor)
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Jetson hardware buyers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Confirms Orin Nano has NO DLA cores. Orin NX has 1-2 DLA. AGX Orin has 2 DLA.
- **Related Sub-question**: Q6 (DLA availability)
## Source #8
- **Title**: TensorRT Engine hardware specificity (NVIDIA/TensorRT Issue #1920)
- **Link**: https://github.com/NVIDIA/TensorRT/issues/1920
- **Tier**: L1 (Official NVIDIA TensorRT repo)
- **Publication Date**: 2022 (confirmed still valid for TRT 10)
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: All TensorRT versions
- **Target Audience**: TensorRT developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: TRT engines are tied to specific GPU models. Must build on target hardware. Cannot cross-compile x86→aarch64.
- **Related Sub-question**: Q4 (deployment workflow)
## Source #9
- **Title**: trtexec ONNX to TRT conversion on Jetson Orin Nano (StackOverflow)
- **Link**: https://stackoverflow.com/questions/78787534/converting-a-pytorch-onnx-model-to-tensorrt-engine-jetson-orin-nano
- **Tier**: L4 (Community)
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Jetson developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Standard workflow: trtexec --onnx=model.onnx --saveEngine=model.trt --fp16. Use --memPoolSize instead of deprecated --workspace.
- **Related Sub-question**: Q3, Q5 (conversion workflow)
## Source #10
- **Title**: TensorRT 10 Python API Documentation
- **Link**: https://docs.nvidia.com/deeplearning/tensorrt/10.15.1/inference-library/python-api-docs.html
- **Tier**: L1 (Official NVIDIA docs)
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: TensorRT 10.x
- **Target Audience**: TensorRT Python developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: TRT 10 uses a tensor-name-based API (not binding indices). Load an engine via runtime.deserialize_cuda_engine(); run async inference via context.execute_async_v3(stream_handle) in Python (enqueueV3 in the C++ API).
- **Related Sub-question**: Q3 (conversion paths)
## Source #11
- **Title**: Torch-TensorRT JetPack documentation
- **Link**: https://docs.pytorch.org/TensorRT/v2.10.0/getting_started/jetpack.html
- **Tier**: L1 (Official documentation)
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: Torch-TensorRT, JetPack 6.2, PyTorch 2.8.0
- **Target Audience**: PyTorch developers on Jetson
- **Research Boundary Match**: ✅ Full match
- **Summary**: Torch-TensorRT supports Jetson aarch64 with JetPack 6.2. Supports AOT compilation, FP16/INT8, dynamic shapes.
- **Related Sub-question**: Q3 (conversion paths)
## Source #12
- **Title**: XFeatTensorRT GitHub repo
- **Link**: https://github.com/PranavNedunghat/XFeatTensorRT
- **Tier**: L4 (Community)
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: XFeat users on NVIDIA GPUs
- **Research Boundary Match**: ✅ Full match
- **Summary**: C++ TRT implementation of XFeat feature extraction. Already converts XFeat to TRT engine.
- **Related Sub-question**: Q5 (XFeat conversion)
## Source #13
- **Title**: TensorRT Best Practices (Official NVIDIA)
- **Link**: https://docs.nvidia.com/deeplearning/tensorrt/latest/performance/best-practices.html
- **Tier**: L1 (Official NVIDIA docs)
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Version Info**: TensorRT 10.x
- **Target Audience**: TensorRT developers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Comprehensive guide: use trtexec for benchmarking, --fp16 for FP16, use ModelOptimizer for INT8, use polygraphy for model inspection.
- **Related Sub-question**: Q3 (conversion workflow)
## Source #14
- **Title**: NVIDIA blog: Maximizing DL Performance on Jetson Orin with DLA
- **Link**: https://developer.nvidia.com/blog/maximizing-deep-learning-performance-on-nvidia-jetson-orin-with-dla/
- **Tier**: L2 (Official NVIDIA blog)
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Jetson Orin developers (NX and AGX)
- **Research Boundary Match**: ⚠️ Partial (DLA not available on Orin Nano)
- **Summary**: DLA contributes 38-74% of total DL performance on Orin (NX/AGX). Supports CNN layers in FP16/INT8. NOT available on Orin Nano.
- **Related Sub-question**: Q6 (DLA availability)
## Source #15
- **Title**: PUT Vision Lab: TensorRT vs ONNXRuntime comparison on Jetson
- **Link**: https://putvision.github.io/article/2021/12/20/jetson-onnxruntime-tensorrt.html
- **Tier**: L3 (Academic lab blog)
- **Publication Date**: 2021 (foundational comparison, architecture unchanged)
- **Timeliness Status**: ⚠️ Needs verification (older, but core findings still apply)
- **Target Audience**: Jetson developers
- **Research Boundary Match**: ⚠️ Partial (older Jetson, but same TRT vs ONNX RT question)
- **Summary**: Native TRT generally faster. ONNX RT TRT-EP adds wrapper overhead. Both use same TRT kernels internally.
- **Related Sub-question**: Q1 (performance difference)
## Source #16
- **Title**: LiteSAM paper — MinGRU details (Eqs 12-16, Section 3.4.2)
- **Link**: https://www.mdpi.com/2072-4292/17/19/3349
- **Tier**: L1 (Peer-reviewed paper)
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Satellite-aerial matching researchers
- **Research Boundary Match**: ✅ Full match
- **Summary**: MinGRU subpixel refinement uses 4 stacked layers, 3×3 window (9 candidates). Gates depend only on input C_f. Ops: Linear, Sigmoid, Mul, Add, ReLU, Tanh.
- **Related Sub-question**: Q5 (LiteSAM TRT compatibility)
## Source #17
- **Title**: Coarse_LoFTR_TRT paper — LoFTR TRT adaptation for embedded devices
- **Link**: https://ar5iv.labs.arxiv.org/html/2202.00770
- **Tier**: L2 (arXiv paper with working open-source code)
- **Publication Date**: 2022
- **Timeliness Status**: ✅ Currently valid (TRT adaptation techniques still apply)
- **Target Audience**: Feature matching on embedded devices
- **Research Boundary Match**: ✅ Full match
- **Summary**: Documents specific code changes for TRT compatibility: einsum→elementary ops, ONNX export, knowledge distillation. Tested on Jetson Nano 2GB. 2.26M params reduced from 27.95M.
- **Related Sub-question**: Q5 (EfficientLoFTR as TRT-proven alternative)
## Source #18
- **Title**: minGRU paper — "Were RNNs All We Needed?"
- **Link**: https://huggingface.co/papers/2410.01201
- **Tier**: L1 (Research paper)
- **Publication Date**: 2024-10
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: RNN/sequence model researchers
- **Research Boundary Match**: ✅ Full match
- **Summary**: MinGRU removes gate dependency on h_{t-1}, enabling parallel computation. Parallel implementation uses logcumsumexp for numerical stability. 175x faster than sequential for seq_len=512.
- **Related Sub-question**: Q5 (MinGRU TRT compatibility)
## Source #19
- **Title**: SAM2 TRT performance degradation issue
- **Link**: https://github.com/facebookresearch/sam2/issues/639
- **Tier**: L4 (GitHub issue)
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: SAM/transformer TRT deployers
- **Research Boundary Match**: ⚠️ Partial (different model, but relevant for transformer attention TRT risks)
- **Summary**: SAM2 MemoryAttention 30ms PyTorch → 100ms TRT FP16. RoPEAttention bottleneck. Warning for transformer TRT conversion.
- **Related Sub-question**: Q7 (TRT conversion risks)
## Source #20
- **Title**: EfficientLoFTR (CVPR 2024)
- **Link**: https://github.com/zju3dv/EfficientLoFTR
- **Tier**: L1 (CVPR paper + open-source code)
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Feature matching researchers
- **Research Boundary Match**: ✅ Full match
- **Summary**: 2.5x faster than LoFTR, higher accuracy. 15.05M params. Semi-dense matching. Available on HuggingFace under Apache 2.0. 964 GitHub stars.
- **Related Sub-question**: Q5 (alternative satellite matcher)
# Fact Cards
## Fact #1
- **Statement**: ONNX Runtime CUDA Execution Provider on Jetson Orin Nano (JetPack 6.1) is 7-8x slower than TensorRT standalone due to tensor cores not being utilized with default settings.
- **Source**: Source #1 (https://github.com/microsoft/onnxruntime/issues/24085)
- **Phase**: Assessment
- **Target Audience**: Jetson Orin Nano developers using ONNX Runtime
- **Confidence**: ✅ High (confirmed by issue author with NSight profiling, MSFT acknowledged)
- **Related Dimension**: Performance
## Fact #2
- **Statement**: The workaround for Fact #1 is to remove the `cudnn_conv_algo_search` option (which defaults to EXHAUSTIVE) and use FP16 models. This restores tensor core usage.
- **Source**: Source #1
- **Phase**: Assessment
- **Target Audience**: Jetson Orin Nano developers
- **Confidence**: ✅ High (confirmed fix by issue author)
- **Related Dimension**: Performance
## Fact #3
- **Statement**: ONNX Runtime TRT-EP keeps serialized TRT engine in memory (~420-440MB) throughout execution. Native TRT via trtexec drops to 130-140MB after engine deserialization by calling releaseBlob().
- **Source**: Source #2 (https://github.com/microsoft/onnxruntime/issues/20457)
- **Phase**: Assessment
- **Target Audience**: All ONNX RT TRT-EP users, especially memory-constrained devices
- **Confidence**: ✅ High (confirmed by MSFT developer @chilo-ms with detailed explanation)
- **Related Dimension**: Memory consumption
## Fact #4
- **Statement**: The ~280-300MB extra memory from ONNX RT TRT-EP (Fact #3) is because the serialized engine is retained across compute function calls for dynamic shape models. Native TRT releases it after deserialization.
- **Source**: Source #2
- **Phase**: Assessment
- **Target Audience**: Memory-constrained Jetson deployments
- **Confidence**: ✅ High (MSFT developer explanation)
- **Related Dimension**: Memory consumption
## Fact #5
- **Statement**: MSFT engineer states "TensorRT EP can achieve performance parity with native TensorRT" — both use the same TRT kernels internally. Benefit of TRT-EP is automatic fallback for unsupported ops.
- **Source**: Source #3 (https://github.com/microsoft/onnxruntime/issues/12083)
- **Phase**: Assessment
- **Target Audience**: General
- **Confidence**: ⚠️ Medium (official statement but contradicted by real benchmarks in some cases)
- **Related Dimension**: Performance
## Fact #6
- **Statement**: Real benchmark of InceptionV3/4 showed ONNX RT TRT-EP achieving ~41 inferences/sec vs native TRT at ~129 inferences/sec — approximately 3x performance gap.
- **Source**: Source #4 (https://github.com/microsoft/onnxruntime/issues/11356)
- **Phase**: Assessment
- **Target Audience**: CNN model deployers
- **Confidence**: ⚠️ Medium (community report, older ONNX RT version, model-specific)
- **Related Dimension**: Performance
## Fact #7
- **Statement**: Jetson Orin Nano Super specs: 67 TOPS sparse INT8 / 33 TOPS dense, GPU at 1020 MHz, 8GB LPDDR5 shared, 102 GB/s bandwidth, 25W TDP. NO DLA cores.
- **Source**: Source #6, Source #7
- **Phase**: Assessment
- **Target Audience**: Our project
- **Confidence**: ✅ High (official NVIDIA specs)
- **Related Dimension**: Hardware constraints
## Fact #8
- **Statement**: Jetson Orin Nano has ZERO DLA (Deep Learning Accelerator) cores. DLA is only available on Orin NX (1-2 cores) and AGX Orin (2 cores).
- **Source**: Source #7, Source #14
- **Phase**: Assessment
- **Target Audience**: Our project
- **Confidence**: ✅ High (official hardware specifications)
- **Related Dimension**: Hardware constraints
## Fact #9
- **Statement**: TensorRT engines are tied to specific GPU models, not just architectures. Must be built on the target device. Cannot cross-compile from x86 to aarch64.
- **Source**: Source #8 (https://github.com/NVIDIA/TensorRT/issues/1920)
- **Phase**: Assessment
- **Target Audience**: TRT deployers
- **Confidence**: ✅ High (NVIDIA confirmed)
- **Related Dimension**: Deployment workflow
## Fact #10
- **Statement**: Standard conversion workflow: PyTorch → ONNX (torch.onnx.export) → trtexec --onnx=model.onnx --saveEngine=model.engine --fp16. Use --memPoolSize instead of deprecated --workspace flag.
- **Source**: Source #9, Source #13
- **Phase**: Assessment
- **Target Audience**: Model deployers on Jetson
- **Confidence**: ✅ High (official NVIDIA workflow)
- **Related Dimension**: Deployment workflow
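A minimal sketch of the build command from this workflow, using hypothetical file names (`litesam_reparam.onnx`, `litesam_fp16.engine`) and an illustrative 2048 MiB workspace pool. The snippet only assembles and prints the command; the actual build must run on the target Jetson, since engines are GPU-model specific (Fact #9).

```python
# Hypothetical paths and workspace size -- adjust for the real model.
# trtexec ships with TensorRT on JetPack 6.2.
onnx_path = "litesam_reparam.onnx"
engine_path = "litesam_fp16.engine"

cmd = [
    "trtexec",
    f"--onnx={onnx_path}",            # input: exported ONNX graph
    f"--saveEngine={engine_path}",    # output: serialized TRT engine
    "--fp16",                         # FP16 kernels (safe for CNN + transformer)
    "--memPoolSize=workspace:2048",   # replaces the deprecated --workspace flag
]
print(" ".join(cmd))
```

Running the printed command on the Jetson produces a .engine file that is valid only for that GPU model and TensorRT version.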
## Fact #11
- **Statement**: TensorRT 10.x Python API: load an engine via runtime.deserialize_cuda_engine(data); run async inference via context.execute_async_v3(stream_handle) (enqueueV3 in the C++ API). Uses a tensor-name-based API (not binding indices).
- **Source**: Source #10
- **Phase**: Assessment
- **Target Audience**: Python TRT developers
- **Confidence**: ✅ High (official NVIDIA docs)
- **Related Dimension**: API/integration
## Fact #12
- **Statement**: Torch-TensorRT supports Jetson aarch64 with JetPack 6.2. Supports ahead-of-time (AOT) compilation, FP16/INT8, dynamic and static shapes. Alternative path to ONNX→trtexec.
- **Source**: Source #11
- **Phase**: Assessment
- **Target Audience**: PyTorch developers on Jetson
- **Confidence**: ✅ High (official documentation)
- **Related Dimension**: Deployment workflow
## Fact #13
- **Statement**: XFeatTensorRT repo exists — C++ TensorRT implementation of XFeat feature extraction. Confirms XFeat is TRT-convertible.
- **Source**: Source #12
- **Phase**: Assessment
- **Target Audience**: Our project (XFeat users)
- **Confidence**: ✅ High (working open-source implementation)
- **Related Dimension**: Model-specific conversion
## Fact #14
- **Statement**: cuVSLAM is a closed-source NVIDIA CUDA library (PyCuVSLAM). It is NOT an ONNX or PyTorch model. It cannot and does not need to be converted to TRT — it's already native CUDA-optimized for Jetson.
- **Source**: cuVSLAM documentation (https://github.com/NVlabs/PyCuVSLAM)
- **Phase**: Assessment
- **Target Audience**: Our project
- **Confidence**: ✅ High (verified from PyCuVSLAM docs)
- **Related Dimension**: Model applicability
## Fact #15
- **Statement**: JetPack 6.2 ships with TensorRT 10.3, CUDA 12.6, cuDNN 9.3. The tensorrt Python module is pre-installed and accessible on Jetson.
- **Source**: Source #5
- **Phase**: Assessment
- **Target Audience**: Jetson developers
- **Confidence**: ✅ High (official release notes)
- **Related Dimension**: Software stack
## Fact #16
- **Statement**: TRT engine build on Jetson Orin Nano Super (8GB) can cause OOM for large models during the build phase, even if inference fits in memory. Workaround: build on a more powerful machine with same GPU architecture, or use Torch-TensorRT PyTorch workflow.
- **Source**: NVIDIA TensorRT-LLM issue (https://github.com/NVIDIA/TensorRT-LLM/issues/3149)
- **Phase**: Assessment
- **Target Audience**: Jetson Orin Nano developers building large TRT engines
- **Confidence**: ✅ High (confirmed in NVIDIA TRT-LLM issue)
- **Related Dimension**: Deployment workflow
## Fact #17
- **Statement**: LiteSAM uses MobileOne backbone which is reparameterizable — multi-branch training structure collapses to a single feed-forward path. This is critical for TRT optimization: fewer layers, better fusion, faster inference.
- **Source**: Solution draft03, LiteSAM paper
- **Phase**: Assessment
- **Target Audience**: Our project
- **Confidence**: ✅ High (published paper)
- **Related Dimension**: Model-specific conversion
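The collapse described in Fact #17 can be illustrated with a toy single-channel, pure-Python sketch (all weights and inputs are invented, and the BatchNorm folding that real MobileOne also performs is omitted): summing the outputs of parallel 3x3, 1x1, and identity branches equals one convolution with the summed kernels.

```python
# Toy sketch of structural reparameterization (the idea behind MobileOne/RepVGG).
def pad_1x1_to_3x3(w):
    """Embed a 1x1 kernel (scalar here) in the center of a zero 3x3 kernel."""
    k = [[0.0] * 3 for _ in range(3)]
    k[1][1] = w
    return k

def conv3x3(img, k):
    """'Same' 3x3 convolution with zero padding on a 2D list."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        s += img[yy][xx] * k[dy + 1][dx + 1]
            out[y][x] = s
    return out

k3 = [[0.1, 0.2, 0.1], [0.0, 0.5, 0.0], [0.1, 0.2, 0.1]]  # 3x3 branch (made up)
k1 = pad_1x1_to_3x3(0.3)                                   # 1x1 branch, padded
kid = pad_1x1_to_3x3(1.0)                                  # identity branch
img = [[1.0, 2.0], [3.0, 4.0]]

# Training-time form: run three branches, sum their outputs.
branches = [conv3x3(img, k) for k in (k3, k1, kid)]
multi = [[sum(b[y][x] for b in branches) for x in range(2)] for y in range(2)]

# Inference form: sum the kernels once, run a single 3x3 conv.
k_fused = [[k3[i][j] + k1[i][j] + kid[i][j] for j in range(3)] for i in range(3)]
single = conv3x3(img, k_fused)

assert all(abs(multi[y][x] - single[y][x]) < 1e-9 for y in range(2) for x in range(2))
```

The fused form is what TRT sees after export: one Conv2d per block, which is why reparameterized backbones fuse and schedule so well.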
## Fact #18
- **Statement**: INT8 quantization is safe for CNN layers (MobileOne backbone) but NOT for transformer components (TAIFormer in LiteSAM). FP16 is safe for both CNN and transformer layers.
- **Source**: Solution draft02/03 analysis
- **Phase**: Assessment
- **Target Audience**: Our project
- **Confidence**: ⚠️ Medium (general best practice, not verified on LiteSAM specifically)
- **Related Dimension**: Quantization strategy
## Fact #19
- **Statement**: On 8GB shared memory Jetson: OS+runtime ~1.5GB, cuVSLAM ~200-500MB, tiles ~200MB. Remaining budget: ~5.8-6.1GB. ONNX RT TRT-EP overhead of ~280-300MB per model is significant. Native TRT saves this memory.
- **Source**: Solution draft03 memory budget + Source #2
- **Phase**: Assessment
- **Target Audience**: Our project
- **Confidence**: ✅ High (computed from verified facts)
- **Related Dimension**: Memory consumption
## Fact #20
- **Statement**: LiteSAM's MinGRU subpixel refinement (Eqs 12-16) uses: z_t = σ(Linear(C_f)), h̃_t = Linear(C_f), h_t = (1-z_t)⊙h_{t-1} + z_t⊙h̃_t. Gates depend ONLY on input C_f (not h_{t-1}). Operates on 3×3 window (9 candidates), 4 stacked layers. All ops are standard: Linear, Sigmoid, Mul, Add, ReLU, Tanh.
- **Source**: LiteSAM paper (MDPI Remote Sensing, 2025, Eqs 12-16, Section 3.4.2)
- **Phase**: Assessment
- **Target Audience**: Our project
- **Confidence**: ✅ High (from the published paper)
- **Related Dimension**: LiteSAM TRT compatibility
## Fact #21
- **Statement**: MinGRU's parallel implementation can use logcumsumexp (log-space parallel scan), which is NOT a standard ONNX operator. However, for seq_len=9 (LiteSAM's 3×3 window), a simple unrolled loop is equivalent and uses only standard ops.
- **Source**: minGRU paper + lucidrains/minGRU-pytorch implementation
- **Phase**: Assessment
- **Target Audience**: Our project
- **Confidence**: ⚠️ Medium (logcumsumexp risk depends on implementation; seq_len=9 makes rewrite trivial)
- **Related Dimension**: LiteSAM TRT compatibility
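A hedged sketch of the unrolled recurrence for seq_len = 9, with scalar stand-ins for the Linear layers (the weights w_z, w_h, biases, and input features are all invented): every operation is a plain multiply, add, or sigmoid, so no parallel-scan or logcumsumexp operator is required at this sequence length.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mingru_unrolled(c_f, w_z, b_z, w_h, b_h, h0=0.0):
    """h_t = (1 - z_t) * h_{t-1} + z_t * h~_t; gates depend only on the input."""
    h = h0
    for c in c_f:                     # 9 candidates: trivial to unroll
        z = sigmoid(w_z * c + b_z)    # update gate, from input only (no h_{t-1})
        h_tilde = w_h * c + b_h       # candidate state
        h = (1.0 - z) * h + z * h_tilde
    return h

c_f = [0.1 * i for i in range(9)]     # toy per-candidate features (3x3 window)
out = mingru_unrolled(c_f, w_z=1.0, b_z=0.0, w_h=0.5, b_h=0.1)
```

Exported this way, the four stacked layers become 36 tiny Linear/Sigmoid/Mul/Add steps, all standard ONNX and TRT ops.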
## Fact #22
- **Statement**: EfficientLoFTR has a proven TRT conversion path via Coarse_LoFTR_TRT (138 stars). The paper documents specific code changes needed: replace einsum with elementary ops (view, bmm, reshape, sum), adapt for TRT-compatible functions. Tested on Jetson Nano 2GB (~5 FPS with distilled model).
- **Source**: Coarse_LoFTR_TRT paper (arXiv:2202.00770) + GitHub repo
- **Phase**: Assessment
- **Target Audience**: Our project
- **Confidence**: ✅ High (published paper + working open-source implementation)
- **Related Dimension**: Fallback satellite matcher
## Fact #23
- **Statement**: EfficientLoFTR has 15.05M params (2.4x more than LiteSAM's 6.31M). On AGX Orin with PyTorch: ~620ms (LiteSAM is 19.8% faster). Semi-dense matching. CVPR 2024. Available on HuggingFace under Apache 2.0.
- **Source**: LiteSAM paper comparison + EfficientLoFTR docs
- **Phase**: Assessment
- **Target Audience**: Our project
- **Confidence**: ✅ High (published benchmarks)
- **Related Dimension**: Fallback satellite matcher
## Fact #24
- **Statement**: SAM2's MemoryAttention showed performance DEGRADATION with TRT: 30ms PyTorch → 100ms TRT FP16. RoPEAttention identified as bottleneck. This warns that transformer attention modules may not always benefit from TRT conversion.
- **Source**: https://github.com/facebookresearch/sam2/issues/639
- **Phase**: Assessment
- **Target Audience**: Transformer model deployers
- **Confidence**: ⚠️ Medium (different model, but relevant warning for attention layers)
- **Related Dimension**: TRT conversion risks
# Comparison Framework
## Selected Framework Type
Decision Support
## Selected Dimensions
1. Inference latency
2. Memory consumption
3. Deployment workflow complexity
4. Operator coverage / fallback
5. API / integration effort
6. Hardware utilization (tensor cores)
7. Maintenance / ecosystem
8. Cross-platform portability
## Comparison: Native TRT Engine vs ONNX Runtime (TRT-EP and CUDA EP)
| Dimension | Native TRT Engine | ONNX Runtime TRT-EP | ONNX Runtime CUDA EP | Factual Basis |
|-----------|-------------------|---------------------|----------------------|---------------|
| Inference latency | Optimal — uses TRT kernels directly, hardware-tuned | Near-parity with native TRT (same kernels), but up to 3x slower on some models due to wrapper overhead | 7-8x slower on Orin Nano with default settings (tensor core issue) | Fact #1, #5, #6 |
| Memory consumption | ~130-140MB after engine load (releases serialized blob) | ~420-440MB during execution (keeps serialized engine) | Standard CUDA memory + framework overhead | Fact #3, #4 |
| Memory delta per model | Baseline | +280-300MB vs native TRT | Higher than TRT-EP | Fact #3, #19 |
| Deployment workflow | PyTorch → ONNX → trtexec → .engine (must build ON target device) | PyTorch → ONNX → pass to ONNX Runtime session (auto-builds TRT engine) | PyTorch → ONNX → pass to ONNX Runtime session | Fact #9, #10 |
| Operator coverage | Only TRT-supported ops. Unsupported ops = build failure | Auto-fallback to CUDA/CPU for unsupported ops | All ONNX ops supported via CUDA/cuDNN | Fact #5 |
| API complexity | Lower-level: manual buffer allocation, CUDA streams, tensor management | Higher-level: InferenceSession, automatic I/O | Highest-level: same ONNX Runtime API | Fact #11 |
| Hardware utilization | Full: tensor cores, layer fusion, kernel auto-tuning, mixed precision | Full TRT kernels for supported ops, CUDA fallback for rest | Broken on Orin Nano with default settings (no tensor cores) | Fact #1, #2 |
| Maintenance | Engine must be rebuilt per TRT version and per GPU model | ONNX model is portable, engine rebuilt automatically | ONNX model is portable | Fact #9 |
| Cross-platform | NVIDIA-only, hardware-specific engine files | Multi-platform ONNX model, TRT-EP only on NVIDIA | Multi-platform (NVIDIA, AMD, Intel, CPU) | Fact #9 |
| Relevance to our project | ✅ Best — we deploy only on Jetson Orin Nano Super | ❌ Cross-platform benefit wasted — we're NVIDIA-only | ❌ Performance issue on our target hardware | Fact #7, #8 |
## Per-Model Applicability
| Model | Can Convert to TRT? | Recommended Path | Notes |
|-------|---------------------|------------------|-------|
| cuVSLAM | NO | N/A — already CUDA native | Closed-source NVIDIA library, already optimized |
| LiteSAM | YES | PyTorch → reparameterize MobileOne → ONNX → trtexec --fp16 | INT8 safe for MobileOne backbone only, NOT TAIFormer |
| XFeat | YES | PyTorch → ONNX → trtexec --fp16 (or use XFeatTensorRT C++) | XFeatTensorRT repo already exists |
| ESKF | N/A | N/A — mathematical filter, not a neural network | Python/C++ NumPy |
# Reasoning Chain
## Dimension 1: Inference Latency
### Fact Confirmation
ONNX Runtime CUDA EP on Jetson Orin Nano is 7-8x slower than TRT standalone with default settings (Fact #1). Even with the workaround (Fact #2), ONNX RT adds wrapper overhead. ONNX RT TRT-EP claims "performance parity" (Fact #5), but real benchmarks show up to 3x gaps on specific models (Fact #6).
### Reference Comparison
Native TRT uses kernel auto-tuning, layer fusion, and mixed-precision natively — no framework wrapper. Our models (LiteSAM, XFeat) are CNN+transformer architectures where TRT's fusion optimizations are most impactful. LiteSAM's reparameterized MobileOne backbone (Fact #17) is particularly well-suited for TRT fusion.
### Conclusion
Native TRT Engine provides the lowest possible inference latency on Jetson Orin Nano Super. ONNX Runtime adds measurable overhead, ranging from negligible to 3x depending on model architecture and configuration. For our latency-critical pipeline (400ms total budget, satellite matching target ≤200ms), every millisecond matters.
### Confidence
✅ High — supported by multiple sources, confirmed NVIDIA optimization pipeline.
---
## Dimension 2: Memory Consumption
### Fact Confirmation
ONNX RT TRT-EP keeps ~420-440MB during execution vs native TRT at ~130-140MB (Fact #3). This is ~280-300MB extra PER MODEL. On our 8GB shared memory Jetson, OS+runtime takes ~1.5GB, cuVSLAM ~200-500MB, tiles ~200MB (Fact #19).
### Reference Comparison
If we run both LiteSAM and XFeat via ONNX RT TRT-EP: ~560-600MB extra memory overhead. Via native TRT: this overhead drops to near zero.
With native TRT:
- LiteSAM engine: ~50-80MB
- XFeat engine: ~30-50MB
With ONNX RT TRT-EP:
- LiteSAM: ~50-80MB + ~280MB overhead = ~330-360MB
- XFeat: ~30-50MB + ~280MB overhead = ~310-330MB
### Conclusion
Native TRT saves ~280-300MB per model vs ONNX RT TRT-EP. On our 8GB shared memory device, this is 3.5-3.75% of total memory PER MODEL. With two models, that's ~7% of total memory saved — meaningful when memory pressure from cuVSLAM map growth is a known risk.
### Confidence
✅ High — confirmed by MSFT developer with detailed explanation of mechanism.
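The budget above can be reproduced with simple arithmetic over the cited ranges (all figures are MB estimates taken from Facts #3 and #19, not measurements on our hardware):

```python
# Back-of-envelope memory budget in MB for the 8GB Jetson Orin Nano Super.
TOTAL_MB = 8 * 1024
os_runtime = 1536                       # ~1.5GB OS + runtime
cuvslam = (200, 500)                    # (low, high) estimates
tiles = 200
engines = {"litesam": (50, 80), "xfeat": (30, 50)}
ORT_TRT_EP_OVERHEAD = (280, 300)        # per model: serialized engine retained

def budget(overhead_per_model):
    """Total (low, high) footprint for a given per-model runtime overhead."""
    lo = os_runtime + cuvslam[0] + tiles + sum(e[0] for e in engines.values()) \
         + overhead_per_model[0] * len(engines)
    hi = os_runtime + cuvslam[1] + tiles + sum(e[1] for e in engines.values()) \
         + overhead_per_model[1] * len(engines)
    return lo, hi

native = budget((0, 0))                 # native TRT releases the blob
ort = budget(ORT_TRT_EP_OVERHEAD)       # ONNX RT TRT-EP keeps it
saved = (ort[0] - native[0], ort[1] - native[1])   # across both models
```

With two models, `saved` comes out at 560-600 MB, matching the ~7% of total memory claimed above.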
---
## Dimension 3: Deployment Workflow
### Fact Confirmation
Native TRT requires: PyTorch → ONNX → trtexec → .engine file. Engine must be built ON the target Jetson device (Fact #9). Engine is tied to specific GPU model and TRT version. TRT engine build on 8GB Jetson can OOM for large models (Fact #16).
### Reference Comparison
ONNX Runtime auto-builds TRT engine from ONNX at first run (or caches). Simpler developer experience but first-run latency spike. Torch-TensorRT (Fact #12) offers AOT compilation as middle ground.
Our models are small (LiteSAM 6.31M params, XFeat even smaller). Engine build OOM is unlikely for our model sizes. Build once before flight, ship .engine files.
### Conclusion
Native TRT requires an explicit offline build step (trtexec on Jetson), but this is a one-time cost per model version. For our use case (pre-flight preparation already includes satellite tile download), adding a TRT engine build to the preparation workflow is trivial. The deployment complexity is acceptable.
### Confidence
✅ High — well-documented workflow, our model sizes are small enough.
---
## Dimension 4: Operator Coverage / Fallback
### Fact Confirmation
Native TRT fails if a model contains unsupported operators. ONNX RT TRT-EP auto-falls back to CUDA/CPU for unsupported ops (Fact #5). This is TRT-EP's primary value proposition.
### Reference Comparison
LiteSAM (MobileOne + TAIFormer + MinGRU) and XFeat use standard operations: Conv2d, attention, GRU, ReLU, etc. These are all well-supported by TensorRT 10.3. MobileOne's reparameterized form is pure Conv2d+BN — trivially supported. TAIFormer attention uses standard softmax/matmul — supported in TRT 10. MinGRU is a simplified GRU — may need verification.
Risk: If any op in LiteSAM is unsupported by TRT, the entire export fails. Mitigation: verify with polygraphy before deployment. If an op fails, refactor or use Torch-TensorRT which can handle mixed TRT/PyTorch execution.
### Conclusion
For our specific models, operator coverage risk is LOW. Standard CNN+transformer ops are well-supported in TRT 10.3. ONNX RT's fallback benefit is insurance we're unlikely to need. MinGRU in LiteSAM should be verified, but standard GRU ops are TRT-supported.
### Confidence
⚠️ Medium — high confidence for MobileOne+TAIFormer, medium for MinGRU (needs verification on TRT 10.3).
---
## Dimension 5: API / Integration Effort
### Fact Confirmation
Native TRT Python API (Fact #11): manual buffer allocation with PyCUDA, CUDA stream management, tensor setup via engine.get_tensor_name(). ONNX Runtime: simple InferenceSession with .run().
### Reference Comparison
TRT Python API requires ~30-50 lines of boilerplate per model (engine load, buffer allocation, inference loop). ONNX Runtime requires ~5-10 lines. However, this is write-once code, encapsulated in a wrapper class.
Our pipeline already uses CUDA streams for cuVSLAM pipelining (Stream A for VO, Stream B for satellite matching). Adding TRT inference to Stream B is natural — just pass the stream handle to context.execute_async_v3().
### Conclusion
Slightly more code with native TRT, but it's boilerplate that gets written once and wrapped. The CUDA stream integration actually BENEFITS from native TRT — direct stream control enables better pipelining with cuVSLAM.
### Confidence
✅ High — well-documented API, straightforward integration.
---
## Dimension 6: Hardware Utilization
### Fact Confirmation
ONNX RT CUDA EP does NOT use tensor cores on Jetson Orin Nano by default (Fact #1). Native TRT uses tensor cores, layer fusion, kernel auto-tuning automatically. Jetson Orin Nano Super has 16 tensor cores at 1020 MHz (Fact #7). No DLA available (Fact #8).
### Reference Comparison
Since there's no DLA to offload to, GPU is our only accelerator. Maximizing GPU utilization is critical. Native TRT squeezes every ounce from the 16 tensor cores. ONNX RT has a known bug preventing this on our exact hardware.
### Conclusion
Native TRT is the only way to guarantee full hardware utilization on Jetson Orin Nano Super. ONNX RT's tensor core issue (even if workaround exists) introduces fragility. Since we have no DLA, wasting GPU tensor cores is unacceptable.
### Confidence
✅ High — hardware limitation is confirmed, no alternative accelerator.
---
## Dimension 7: Cross-Platform Portability
### Fact Confirmation
ONNX Runtime runs on NVIDIA, AMD, Intel, CPU. TRT engines are NVIDIA-specific and even GPU-model-specific (Fact #9).
### Reference Comparison
Our system deploys ONLY on Jetson Orin Nano Super. The companion computer is fixed hardware. There is no requirement or plan to run on non-NVIDIA hardware. Cross-platform portability has zero value for this project.
### Conclusion
ONNX Runtime's primary value proposition (portability) is irrelevant for our deployment. We trade unused portability for maximum performance and minimum memory usage.
### Confidence
✅ High — deployment target is fixed hardware.
# Validation Log
## Validation Scenario
Full GPS-Denied pipeline running on Jetson Orin Nano Super (8GB) during a 50km flight with ~1500 frames at 3fps. Two AI models active: LiteSAM for satellite matching (keyframes) and XFeat as fallback. cuVSLAM running continuously for VO.
## Expected Based on Conclusions
### If using Native TRT Engine:
- LiteSAM TRT FP16 engine loaded: ~50-80MB GPU memory after deserialization
- XFeat TRT FP16 engine loaded: ~30-50MB GPU memory after deserialization
- Total AI model memory: ~80-130MB
- Inference runs on CUDA Stream B, directly integrated with cuVSLAM Stream A pipelining
- Tensor cores fully utilized at 1020 MHz
- LiteSAM satellite matching at estimated ~165-330ms (TRT FP16 at 1280px)
- XFeat matching at estimated ~50-100ms (TRT FP16)
- Engine files pre-built during offline preparation, stored on Jetson storage alongside satellite tiles
### If using ONNX Runtime TRT-EP:
- LiteSAM via TRT-EP: ~330-360MB during execution
- XFeat via TRT-EP: ~310-330MB during execution
- Total AI model memory: ~640-690MB
- First inference triggers engine build (latency spike at startup)
- CUDA stream management less direct
- Same inference speed (in theory, per MSFT claim)
### Memory budget comparison (total 8GB):
- Native TRT: OS 1.5GB + cuVSLAM 0.5GB + tiles 0.2GB + models 0.13GB + misc 0.1GB = ~2.43GB (30% used)
- ONNX RT TRT-EP: OS 1.5GB + cuVSLAM 0.5GB + tiles 0.2GB + models 0.69GB + ONNX RT overhead 0.15GB + misc 0.1GB = ~3.14GB (39% used)
- Delta: ~710MB (9% of total memory)
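The budget arithmetic above can be re-checked with a few lines. The figures are the planning estimates from the tables, not measurements:

```python
# Recompute the memory-budget comparison (all figures in GB, planning estimates).
TOTAL_GB = 8.0

native_trt = {"os": 1.5, "cuvslam": 0.5, "tiles": 0.2, "models": 0.13, "misc": 0.1}
onnxrt_trt_ep = {"os": 1.5, "cuvslam": 0.5, "tiles": 0.2, "models": 0.69,
                 "onnxrt_overhead": 0.15, "misc": 0.1}

def used(budget):
    total = sum(budget.values())
    return total, total / TOTAL_GB

native_total, native_frac = used(native_trt)
onnx_total, onnx_frac = used(onnxrt_trt_ep)
delta_mb = (onnx_total - native_total) * 1000  # GB -> MB (decimal)

print(f"Native TRT:  {native_total:.2f} GB ({native_frac:.0%} used)")
print(f"ONNX RT EP:  {onnx_total:.2f} GB ({onnx_frac:.0%} used)")
print(f"Delta:       {delta_mb:.0f} MB")
```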
## Actual Validation Results
The memory savings from native TRT are confirmed by the mechanism explanation from MSFT (Source #2). The 710MB delta is significant given cuVSLAM map growth risk (up to 1GB on long flights without aggressive pruning).
The workflow integration is validated: engine files can be pre-built as part of the existing offline tile preparation pipeline. No additional hardware or tools needed — trtexec is included in JetPack 6.2.
## Counterexamples
### Counterexample 1: MinGRU operator may not be supported in TRT
MinGRU is a simplified GRU variant used in LiteSAM's subpixel refinement. Standard GRU is supported in TRT 10.3, but MinGRU may use custom operations. If MinGRU fails TRT export, options:
1. Replace MinGRU with standard GRU (small accuracy loss)
2. Split model: CNN+TAIFormer in TRT, MinGRU refinement in PyTorch
3. Use Torch-TensorRT which handles mixed execution
**Assessment**: Low risk. MinGRU is a simplification of GRU and likely uses a subset of GRU ops.
### Counterexample 2: Engine rebuild needed per TRT version update
JetPack updates may change TRT version, invalidating cached engines. Must rebuild all engines after JetPack update.
**Assessment**: Acceptable. JetPack updates are infrequent on deployed UAVs. Engine rebuild takes minutes.
### Counterexample 3: Dynamic input shapes
If camera resolution changes between flights, engine with static shapes must be rebuilt. Can use dynamic shapes in trtexec (--minShapes, --optShapes, --maxShapes) but at slight performance cost.
**Assessment**: Acceptable. Camera resolution is fixed per deployment. Build engine for that resolution.
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable/verifiable
- [x] Memory calculations verified against known budget
- [x] Workflow integration validated against existing offline preparation
## Conclusions Requiring Revision
None — all conclusions hold under validation.
# Security Analysis
## Operational Context
This system runs on a UAV operating in a **conflict zone** (eastern Ukraine). The UAV could be shot down and physically captured. GPS denial/spoofing is the premise. The Jetson Orin Nano stores satellite imagery, flight plans, and captured photos. Security must assume the worst case: **physical access by an adversary**.
## Threat Model
### Asset Inventory
| Asset | Sensitivity | Location | Notes |
|-------|------------|----------|-------|
| Captured camera imagery | HIGH | Jetson storage | Reconnaissance data — reveals what was surveyed |
| Satellite tile cache | MEDIUM | Jetson storage | Reveals operational area and areas of interest |
| Flight plan / route | HIGH | Jetson memory + storage | Reveals mission objectives and launch/landing sites |
| Computed GPS positions | HIGH | Jetson memory, SSE stream | Real-time position data of UAV and surveyed targets |
| Google Maps API key | MEDIUM | Offline prep machine only | Used pre-flight, NOT stored on Jetson |
| TensorRT model weights | LOW | Jetson storage | LiteSAM/XFeat — publicly available models |
| cuVSLAM binary | LOW | Jetson storage | NVIDIA proprietary but freely distributed |
| IMU calibration data | LOW | Jetson storage | Device-specific calibration |
| System configuration | MEDIUM | Jetson storage | API endpoints, tile paths, fusion parameters |
### Threat Actors
| Actor | Capability | Motivation | Likelihood |
|-------|-----------|------------|------------|
| **Adversary military (physical capture)** | Full physical access after UAV loss | Extract intelligence: imagery, flight plans, operational area | HIGH |
| **Electronic warfare unit** | GPS spoofing/jamming, RF jamming | Disrupt navigation, force UAV off course | HIGH (GPS denial is the premise) |
| **Network attacker (ground station link)** | Intercept/inject on UAV-to-ground comms | Steal position data, inject false commands | MEDIUM |
| **Insider / rogue operator** | Authorized access to system | Data exfiltration, mission sabotage | LOW |
| **Supply chain attacker** | Tampered satellite tiles or model weights | Feed corrupted reference data → position errors | LOW |
### Attack Vectors
| Vector | Target Asset | Actor | Impact | Likelihood |
|--------|-------------|-------|--------|------------|
| **Physical extraction of storage** | All stored data | Adversary (capture) | Full intelligence compromise | HIGH |
| **GPS spoofing** | Position estimate | EW unit | Already mitigated — system is GPS-denied by design | N/A |
| **IMU acoustic injection** | IMU data → ESKF | EW unit | Drift injection, subtle position errors | LOW |
| **Camera blinding/spoofing** | VO + satellite matching | EW unit | VO failure, incorrect satellite matches | LOW |
| **Adversarial ground patterns** | Satellite matching | Adversary | Physical patches on ground fool feature matching | VERY LOW |
| **SSE stream interception** | Position data | Network attacker | Real-time position leak | MEDIUM |
| **API command injection** | Flight session control | Network attacker | Start/stop/manipulate sessions | MEDIUM |
| **Corrupted satellite tiles** | Satellite matching | Supply chain | Systematic position errors | LOW |
| **Model weight tampering** | Matching accuracy | Supply chain | Degraded matching → higher drift | LOW |
## Per-Component Security Requirements and Controls
### 1. Data at Rest (Jetson Storage)
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Protect captured imagery from extraction after capture | CRITICAL | Full-disk encryption (LUKS) | JetPack LUKS support with `ENC_ROOTFS=1`. Use OP-TEE Trusted Application for key management. Modify `luks-srv` to NOT auto-decrypt — require hardware token or secure erase trigger |
| Protect satellite tiles and flight plans | HIGH | Same LUKS encryption | Included in full-disk encryption scope |
| Enable rapid secure erase on capture/crash | CRITICAL | Tamper-triggered wipe | Hardware dead-man switch: if UAV telemetry lost for N seconds OR accelerometer detects crash impact → trigger `cryptsetup luksErase` on all LUKS volumes. Destroys key material in <1 second — data becomes unrecoverable |
| Prevent cold-boot key extraction | HIGH | Minimize key residency in RAM | ESKF state and position history cleared from memory when session ends. Avoid writing position logs to disk unless encrypted |
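The tamper-triggered wipe logic can be sketched as follows. This is a minimal sketch: the silence timeout and crash-acceleration thresholds are illustrative assumptions, and `erase_action` stands in for shelling out to `cryptsetup luksErase` on the LUKS volumes:

```python
import time

class DeadManSwitch:
    """Trigger a secure-erase callback when ground-station telemetry goes
    silent for too long or a crash-level acceleration is observed.
    Thresholds are illustrative tuning values, not validated numbers."""

    def __init__(self, erase_action, silence_timeout_s=30.0, crash_accel_g=16.0):
        self.erase_action = erase_action          # in production: cryptsetup luksErase
        self.silence_timeout_s = silence_timeout_s
        self.crash_accel_g = crash_accel_g
        self.last_telemetry = time.monotonic()
        self.triggered = False

    def on_telemetry(self):
        # Any valid ground-station heartbeat resets the silence timer.
        self.last_telemetry = time.monotonic()

    def check(self, accel_g):
        """Poll from the main loop; returns True once the erase has fired."""
        if self.triggered:
            return True
        silent = (time.monotonic() - self.last_telemetry) > self.silence_timeout_s
        crashed = accel_g >= self.crash_accel_g
        if silent or crashed:
            self.triggered = True
            self.erase_action()
        return self.triggered
```

Note the false-trigger concern from the risk table below: a temporary signal loss shorter than the timeout must not fire the erase, which is exactly what the heartbeat reset provides.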
### 2. Secure Boot
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Prevent unauthorized code execution | HIGH | NVIDIA Secure Boot with PKC fuse burning | Burn `SecurityMode` fuse (odm_production_mode=0x1) on production Jetsons. Sign all boot images with PKC key pair. Generate keys via HSM |
| Prevent firmware rollback | MEDIUM | Ratchet fuses | Configure anti-rollback fuses in fuse configuration XML |
| Debug port lockdown | HIGH | Disable JTAG/debug after production | Burn debug-disable fuses. Irreversible — production units only |
### 3. API & Communication (FastAPI + SSE)
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Authenticate API clients | HIGH | JWT bearer token | Pre-shared secret between ground station and Jetson. Generate JWT at session start. Short expiry (flight duration). `HTTPBearer` scheme in FastAPI |
| Encrypt SSE stream | HIGH | TLS 1.3 | Uvicorn with TLS certificate (self-signed for field use, pre-installed on ground station). All SSE position data encrypted in transit |
| Prevent unauthorized session control | HIGH | JWT + endpoint authorization | Session start/stop/anchor endpoints require valid JWT. Rate-limit via `slowapi` |
| Prevent replay attacks | MEDIUM | JWT `exp` + `jti` claims | Token expiry per-flight. Unique token ID (`jti`) tracked to prevent reuse |
| Limit API surface | MEDIUM | Minimal endpoint exposure | Only expose: POST /sessions, GET /sessions/{id}/stream (SSE), POST /sessions/{id}/anchor, DELETE /sessions/{id}. No admin/debug endpoints in production |
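The JWT scheme above (per-flight expiry plus a unique `jti` tracked to block replay) can be illustrated with a dependency-free HS256 sketch. In production a maintained library such as PyJWT would be used; this only shows the claim mechanics:

```python
import base64, hashlib, hmac, json, time

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_session_token(secret: bytes, session_id: str, flight_seconds: int) -> str:
    """Mint a per-flight HS256 JWT with an expiry and a unique token id (jti)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": session_id,
              "exp": int(time.time()) + flight_seconds,
              "jti": hashlib.sha256(f"{session_id}:{time.time()}".encode()).hexdigest()[:16]}
    payload = _b64url(json.dumps(claims).encode())
    sig = _b64url(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(secret: bytes, token: str, seen_jtis: set):
    """Return claims if signature, expiry, and jti-uniqueness checks all pass."""
    header, payload, sig = token.split(".")
    expected = _b64url(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time() or claims["jti"] in seen_jtis:
        return None                      # expired or replayed
    seen_jtis.add(claims["jti"])
    return claims
```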
### 4. Visual Odometry (cuVSLAM)
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Detect camera feed tampering | LOW | Sanity checks on frame consistency | If consecutive frames show implausible motion (>500m displacement at 3fps), flag as suspicious. ESKF covariance spike triggers satellite re-localization |
| Protect against VO poisoning | LOW | Cross-validate VO with IMU | ESKF fusion inherently cross-validates: IMU and VO disagreement raises covariance, triggers satellite matching. No single sensor can silently corrupt position |
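The frame-consistency sanity check from the table reduces to a displacement gate between consecutive frame positions. A minimal sketch, using the >500 m-per-frame threshold stated above (positions assumed to be local metric coordinates):

```python
def frame_motion_suspicious(prev_pos, curr_pos, max_displacement_m=500.0):
    """Flag implausible frame-to-frame motion (>500 m between consecutive
    frames at ~3 fps) as a sign of camera-feed tampering or a VO glitch.
    Positions are (east, north) in metres in a local frame."""
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    return (dx * dx + dy * dy) ** 0.5 > max_displacement_m
```

A suspicious frame would not be consumed by the ESKF; instead it raises the covariance and triggers satellite re-localization, as described above.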
### 5. Satellite Image Matching
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Verify tile integrity | MEDIUM | SHA-256 checksums per tile | During offline preprocessing: compute SHA-256 for each tile pair. Store checksums in signed manifest. At runtime: verify checksum on tile load |
| Prevent adversarial tile injection | MEDIUM | Signed tile manifest | Offline tool signs manifest with private key. Jetson verifies signature with embedded public key before accepting tile set |
| Detect satellite match outliers | MEDIUM | RANSAC inlier ratio threshold | If RANSAC inlier ratio <30%, reject match as unreliable. ESKF treats as no-measurement rather than bad measurement |
| Protect against ground-based adversarial patterns | VERY LOW | Multi-tile consensus | Match against multiple overlapping tiles. Physical adversarial patches affect local area — consensus voting across tiles detects anomalies |
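The tile-integrity flow (per-tile SHA-256 plus a signed manifest) can be sketched as below. The production design verifies an asymmetric signature against a public key embedded on the Jetson; an HMAC is a simplified stand-in here, and tiles are passed as bytes to keep the sketch self-contained:

```python
import hashlib, hmac, json

def build_manifest(tiles: dict, signing_key: bytes) -> dict:
    """Offline step: compute SHA-256 per tile and sign the checksum table.
    tiles maps tile name -> raw image bytes."""
    checksums = {name: hashlib.sha256(data).hexdigest()
                 for name, data in sorted(tiles.items())}
    body = json.dumps(checksums, sort_keys=True).encode()
    return {"tiles": checksums,
            "sig": hmac.new(signing_key, body, hashlib.sha256).hexdigest()}

def verify_tile(manifest: dict, signing_key: bytes, name: str, data: bytes) -> bool:
    """Runtime step: reject the whole set on a bad manifest signature,
    then check the individual tile's checksum on load."""
    body = json.dumps(manifest["tiles"], sort_keys=True).encode()
    expected = hmac.new(signing_key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(manifest["sig"], expected):
        return False
    return manifest["tiles"].get(name) == hashlib.sha256(data).hexdigest()
```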
### 6. Sensor Fusion (ESKF)
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Prevent single-sensor corruption of position | HIGH | Adaptive noise + outlier rejection | Mahalanobis distance test on each measurement. Reject updates >5σ from predicted state. No single measurement can cause >50m position jump |
| Detect systematic drift | MEDIUM | Satellite matching rate monitoring | If satellite matches consistently disagree with VO by >100m, flag integrity warning to operator |
| Protect fusion state | LOW | In-memory only, no persistence | ESKF state never written to disk. Lost on power-off — no forensic recovery |
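The Mahalanobis gate from the table can be sketched as a guarded Kalman update. This is a minimal linear-KF sketch, not the full ESKF; the per-dimension gate is a simple stand-in for a proper chi-square threshold:

```python
import numpy as np

def gated_update(x, P, z, H, R, gate_sigma=5.0):
    """Measurement update with a Mahalanobis-distance gate: measurements
    too far from the predicted state are rejected and treated as
    no-measurement, so no single sensor can yank the position estimate."""
    y = z - H @ x                              # innovation
    S = H @ P @ H.T + R                        # innovation covariance
    d2 = float(y.T @ np.linalg.solve(S, y))    # squared Mahalanobis distance
    if d2 > gate_sigma ** 2 * len(y):          # simple per-dimension gate (assumption)
        return x, P, False                     # reject outlier, keep prediction
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P, True
```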
### 7. Offline Preprocessing (Developer Machine)
| Requirement | Risk Level | Control | Implementation |
|-------------|-----------|---------|----------------|
| Protect Google Maps API key | MEDIUM | Environment variable, never in code | `.env` file excluded from version control. API key used only on developer machine, never deployed to Jetson |
| Validate downloaded tiles | LOW | Source verification | Download only from Google Maps Tile API via HTTPS. Verify TLS certificate chain |
| Secure tile transfer to Jetson | MEDIUM | Signed + encrypted transfer | Transfer tile set + signed manifest via encrypted channel (SCP/SFTP). Verify manifest signature on Jetson before accepting |
## Security Controls Summary
### Authentication & Authorization
- **Mechanism**: Pre-shared JWT secret between ground station and Jetson
- **Scope**: All API endpoints require valid JWT bearer token
- **Session model**: One JWT per flight session, expires at session end
- **No user management on Jetson** — single-operator system, auth is device-to-device
### Data Protection
| State | Protection | Tool |
|-------|-----------|------|
| At rest (Jetson storage) | LUKS full-disk encryption | JetPack LUKS + OP-TEE |
| In transit (SSE stream) | TLS 1.3 | Uvicorn SSL |
| In memory (ESKF state) | No persistence, cleared on session end | Application logic |
| On capture (emergency) | Tamper-triggered LUKS key erase | Hardware dead-man switch + `cryptsetup luksErase` |
### Secure Communication
- Ground station ↔ Jetson: TLS 1.3 (self-signed cert, pre-installed)
- No internet connectivity during flight — no external attack surface
- RF link security is out of scope (handled by UAV communication system)
### Logging & Monitoring
| What | Where | Retention |
|------|-------|-----------|
| API access logs (request count, errors) | In-memory ring buffer | Current session only, not persisted |
| Security events (auth failures, integrity warnings) | In-memory + SSE alert to operator | Current session only |
| Position history | In-memory for refinement, SSE to ground station | NOT persisted on Jetson after session end |
| Crash/tamper events | Trigger secure erase, no logging | N/A — priority is data destruction |
**Design principle**: Minimize data persistence on Jetson. The ground station is the system of record. Jetson stores only what's needed for the current flight — satellite tiles (encrypted at rest) and transient processing state (memory only).
## Protected Code Execution (OP-TEE / ARM TrustZone)
### Overview
The Jetson Orin Nano Super supports hardware-enforced protected code execution via **ARM TrustZone** and **OP-TEE v4.2.0** (included in Jetson Linux 36.3+). TrustZone partitions the processor into two isolated worlds:
- **Secure World** (TEE): Runs at ARMv8 secure EL-1 (OS) and EL-0 (apps). Code here cannot be read or tampered with from the normal world. OP-TEE is the secure OS.
- **Normal World**: Standard Linux (JetPack). Our Python application, cuVSLAM, FastAPI all run here.
Trusted Applications (TAs) execute inside the secure world and are invoked by Client Applications (CAs) in the normal world via the GlobalPlatform TEE Client API.
### Architecture on Jetson Orin Nano
```
┌─────────────────────────────────────────────────┐
│ NORMAL WORLD (Linux / JetPack) │
│ │
│ Client Application (CA) │
│ ↕ libteec.so (TEE Client API) │
│ ↕ OP-TEE Linux Kernel Driver │
│ ↕ ARM Trusted Firmware (ATF) / Monitor │
├─────────────────────────────────────────────────┤
│ SECURE WORLD (OP-TEE v4.2.0) │
│ │
│ OP-TEE OS (ARMv8 S-EL1) │
│ ├── jetson-user-key PTA (key management) │
│ ├── luks TA (disk encryption passphrase) │
│ ├── hwkey-agent TA (encrypt/decrypt data) │
│ ├── PKCS #11 TA (crypto token interface) │
│ └── Custom TAs (our application-specific TAs) │
│ │
│ Hardware: Security Engine (SE), HW RNG, Fuses │
│ TZ-DRAM: Dedicated memory carveout │
└─────────────────────────────────────────────────┘
```
### Key Hierarchy (Hardware-Backed)
The Jetson Orin Nano provides a hardware-rooted key hierarchy via the Security Engine (SE) and Encrypted Key Blob (EKB):
```
OEM_K1 fuse (256-bit AES, burned into hardware, cannot be read by software)
├── EKB_RK (EKB Root Key, derived via AES-128-ECB from OEM_K1 + FV)
│ ├── EKB_EK (encryption key for EKB content)
│ └── EKB_AK (authentication key for EKB content)
├── HUK (Hardware Unique Key, per-device, derived via NIST-SP-800-108)
│ └── SSK (Secure Storage Key, per-device, generated at OP-TEE boot)
│ ├── TSK (TA Storage Key, per-TA)
│ └── FEK (File Encryption Key, per-file)
└── LUKS passphrase (derived from disk encryption key stored in EKB)
```
Fuse keys are loaded into SE keyslots during early boot (before OP-TEE starts). Software cannot read keys from keyslots — only derive new keys through the SE. After use, keyslots should be cleared via `tegra_se_clear_aes_keyslots()`.
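The derivation steps in the hierarchy (e.g. HUK → SSK → TSK/FEK) follow NIST SP 800-108 in counter mode. A software sketch of that construction with HMAC-SHA256 is below, for illustration only: the real derivations run inside the Security Engine from fuse-backed keyslots, and the parent keys are never exposed to software:

```python
import hmac, hashlib

def kdf_counter_mode(key: bytes, label: bytes, context: bytes, length: int = 32) -> bytes:
    """NIST SP 800-108 KDF, counter mode, PRF = HMAC-SHA256.
    K(i) = PRF(key, [i]_32 || label || 0x00 || context || [L]_32)."""
    out = b""
    i = 1
    while len(out) < length:
        msg = (i.to_bytes(4, "big") + label + b"\x00" + context
               + (length * 8).to_bytes(4, "big"))
        out += hmac.new(key, msg, hashlib.sha256).digest()
        i += 1
    return out[:length]

# Hypothetical usage: derive a per-device SSK-like key from a HUK-like parent.
ssk = kdf_counter_mode(b"\x00" * 32, b"SSK", b"device-01")
```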
### What to Run in the Secure World (Our Use Cases)
| Use Case | TA Type | Purpose |
|----------|---------|---------|
| **LUKS disk encryption** | Built-in `luks` TA | Generate one-time passphrase at boot to unlock encrypted rootfs. Keys never leave secure world |
| **Tile manifest verification** | Custom User TA | Verify SHA-256 signatures of satellite tile manifests. Signing key stored in EKB, accessible only in secure world |
| **JWT secret storage** | Custom User TA or `hwkey-agent` TA | Store JWT signing secret in EKB. Sign/verify JWTs inside secure world — secret never exposed to Linux |
| **Secure erase trigger** | Custom User TA | Receive tamper signal → invoke `cryptsetup luksErase` via CA. Key erase logic runs in secure world to prevent normal-world interference |
| **TLS private key protection** | PKCS #11 TA | Store TLS private key in OP-TEE secure storage. Uvicorn uses PKCS #11 interface to perform TLS handshake without key leaving secure world |
### How to Enable Protected Code Execution
#### Step 1: Burn OEM_K1 Fuse (One-Time, Irreversible)
```bash
# Generate 256-bit OEM_K1 key (use HSM in production)
openssl rand -hex 32 > oem_k1_key.txt
# Create fuse configuration XML with OEM_K1
# Burn fuse via odmfuse.sh (IRREVERSIBLE)
sudo ./odmfuse.sh -i <fuse_config.xml> <board_name>
```
After burning `SecurityMode` fuse (`odm_production_mode=0x1`), all further fuse writes are blocked. OEM_K1 becomes permanently embedded in hardware.
#### Step 2: Generate and Flash EKB
```bash
# Generate user keys (disk encryption key, JWT secret, tile signing key)
openssl rand -hex 16 > disk_enc_key.txt
openssl rand -hex 32 > jwt_secret.txt
openssl rand -hex 32 > tile_signing_key.txt
# Generate EKB binary
python3 gen_ekb.py -chip t234 \
-oem_k1_key oem_k1_key.txt \
-in_sym_key uefi_enc_key.txt \
-in_sym_key2 disk_enc_key.txt \
-in_auth_key uefi_var_auth_key.txt \
-out eks_t234.img
# Flash EKB to EKS partition
# (part of the normal flash process with secure boot enabled)
```
#### Step 3: Enable LUKS Disk Encryption
```bash
# During flash, set ENC_ROOTFS=1 to encrypt rootfs
export ENC_ROOTFS=1
sudo ./flash.sh <board_name> <storage_device>
```
The `luks` TA in OP-TEE derives a passphrase from the disk encryption key in EKB at boot. The passphrase is generated inside the secure world and passed to `cryptsetup` — it never exists in persistent storage.
#### Step 4: Develop Custom Trusted Applications
Cross-compile TAs for aarch64 using the Jetson OP-TEE source package:
```bash
# Build custom TA (e.g., tile manifest verifier)
make -C <ta_source_dir> \
CROSS_COMPILE="<toolchain>/bin/aarch64-buildroot-linux-gnu-" \
TA_DEV_KIT_DIR="<optee_src>/optee/build/t234/export-ta_arm64/" \
OPTEE_CLIENT_EXPORT="<optee_src>/optee/install/t234/usr" \
TEEC_EXPORT="<optee_src>/optee/install/t234/usr" \
-j"$(nproc)"
# Deploy: copy TA to /lib/optee_armtz/ on Jetson
# Deploy: copy CA to /usr/sbin/ on Jetson
```
TAs conform to the GlobalPlatform TEE Internal Core API. Use the `hello_world` example from `optee_examples` as a starting template.
#### Step 5: Enable Secure Boot
```bash
# Generate PKC key pair (use HSM for production)
openssl genrsa -out pkc_key.pem 3072
# Sign and flash secured images
sudo ./flash.sh --sign pkc_key.pem <board_name> <storage_device>
# After verification, burn SecurityMode fuse (IRREVERSIBLE)
```
### Available Crypto Services in Secure World
| Service | Provider | Notes |
|---------|----------|-------|
| AES-128/256 encryption/decryption | SE hardware | Via keyslot-derived keys, never leaves SE |
| Key derivation (NIST-SP-800-108) | `jetson-user-key` PTA | Derive purpose-specific keys from EKB keys |
| Hardware RNG | SE hardware | `TEE_GenerateRandom()` or PTA command |
| PKCS #11 crypto tokens | PKCS #11 TA | Standard crypto interface for TLS, signing |
| SHA-256, HMAC | MbedTLS (bundled in optee_os) | Software crypto in secure world |
| RSA/ECC signing | GlobalPlatform TEE Crypto API | For manifest signature verification |
### Limitations on Orin Nano
| Limitation | Impact | Workaround |
|-----------|--------|------------|
| No RPMB support (only AGX Orin has RPMB) | Secure storage uses REE FS instead of replay-protected memory | Acceptable — LUKS encryption protects data at rest. REE FS secure storage is encrypted by SSK |
| EKB can only be updated via OTA, not at runtime | Cannot rotate keys in flight | Pre-provision per-device unique keys at manufacturing time |
| OP-TEE `hello_world` example fails on some Orin Nano units | Some users report initialization failures | Use JetPack 6.2.2+ which includes fixes. Test thoroughly on target hardware |
| TZ-DRAM is a fixed carveout | Limits secure world memory | Keep TAs lightweight — only crypto operations and key management, not data processing |
### Security Hardening Checklist
- [ ] Burn OEM_K1 fuse with unique per-device key (via HSM)
- [ ] Generate and flash EKB with disk encryption key, JWT secret, tile signing key
- [ ] Enable LUKS full-disk encryption (`ENC_ROOTFS=1`)
- [ ] Modify `luks-srv` to NOT auto-decrypt (require explicit trigger or dead-man switch)
- [ ] Burn SecurityMode fuse (`odm_production_mode=0x1`) — enables secure boot chain
- [ ] Burn debug-disable fuses — disables JTAG
- [ ] Configure anti-rollback ratchet fuses
- [ ] Clear SE keyslots after EKB extraction via `tegra_se_clear_aes_keyslots()`
- [ ] Deploy custom TAs for JWT signing and tile manifest verification
- [ ] Use PKCS #11 for TLS private key protection
- [ ] Test secure erase trigger end-to-end
- [ ] Run `xtest` (OP-TEE test suite) on production Jetson to validate TEE
## Key Security Risks
| Risk | Severity | Mitigation Status |
|------|---------|-------------------|
| Physical capture → data extraction | CRITICAL | Mitigated by LUKS + secure erase. Residual risk: attacker extracts RAM before erase triggers |
| No auto-decrypt bypass for LUKS | HIGH | Requires custom `luks-srv` modification — development effort needed |
| Self-signed TLS certificates | MEDIUM | Acceptable for field deployment. Certificate pinning on ground station prevents MITM |
| cuVSLAM is closed-source | LOW | Cannot audit for vulnerabilities. Mitigated by running in sandboxed environment, input validation on camera frames |
| Dead-man switch reliability | HIGH | Hardware integration required. False triggers (temporary signal loss) must NOT cause premature erase. Needs careful threshold tuning |
## References
- xT-STRIDE threat model for UAVs (2025): https://link.springer.com/article/10.1007/s10207-025-01082-4
- NVIDIA Jetson OP-TEE Documentation (r36.4.4): https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/OpTee.html
- NVIDIA Jetson Security Overview (r36.4.3): https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security.html
- NVIDIA Jetson LUKS Disk Encryption: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/DiskEncryption.html
- NVIDIA Jetson Secure Boot: https://docs.nvidia.com/jetson/archives/r36.4.4/DeveloperGuide/SD/Security/SecureBoot.html
- NVIDIA Jetson Secure Storage: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/SecureStorage.html
- NVIDIA Jetson Firmware TPM: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/FirmwareTPM.html
- NVIDIA Jetson Rollback Protection: https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Security/RollbackProtection.html
- Jetson Orin Fuse Specification: https://developer.nvidia.com/downloads/jetson-agx-orin-series-fuse-specification
- OP-TEE Official Documentation: https://optee.readthedocs.io/en/latest/
- OP-TEE Trusted Application Examples: https://github.com/linaro-swg/optee_examples
- RidgeRun OP-TEE on Jetson Guide: https://developer.ridgerun.com/wiki/index.php/RidgeRun_Platform_Security_Manual/Getting_Started/TEE/NVIDA-Jetson
- GlobalPlatform TEE Specifications: https://globalplatform.org/specs-library/?filter-committee=tee
- Model Agnostic Defense against Adversarial Patches on UAVs (2024): https://arxiv.org/html/2405.19179v1
- FastAPI Security Best Practices (2026): https://fastlaunchapi.dev/blog/fastapi-best-practices-production-2026
# Solution Draft
## Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.
**Hard constraint**: Camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.
**Satellite matching strategy**: Benchmark LiteSAM on actual Orin Nano Super hardware as a day-one priority. If LiteSAM cannot achieve ≤400ms at 480px resolution, **abandon it entirely** and use XFeat semi-dense matching as the primary satellite matcher. Speed is non-negotiable.
**Core architectural principles**:
1. **cuVSLAM handles VO** — NVIDIA's CUDA-accelerated library achieves 90fps on Jetson Orin Nano, giving VO essentially "for free" (~11ms/frame).
2. **Keyframe-based satellite matching** — satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — this handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
```
┌─────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ Satellite Tiles → Download & Crop → Store as tile pairs │
│ (Google Maps) (per flight plan) (disk, GeoHash indexed) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ EVERY FRAME (400ms budget): │
│ ┌────────────────────────────────┐ │
│ │ Camera → Downsample (CUDA 2ms)│ │
│ │ → cuVSLAM VO+IMU (~11ms)│──→ ESKF Update → SSE Emit │
│ └────────────────────────────────┘ ↑ │
│ │ │
│ KEYFRAMES ONLY (every 3-10 frames): │ │
│ ┌────────────────────────────────────┐ │ │
│ │ Satellite match (async CUDA stream)│─────┘ │
│ │ LiteSAM or XFeat (see benchmark) │ │
│ │ (does NOT block VO output) │ │
│ └────────────────────────────────────┘ │
│ │
│ IMU: 100+Hz continuous → ESKF prediction │
└─────────────────────────────────────────────────────────────────┘
```
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~11ms/frame)
NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 90fps on Jetson Orin Nano. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs. This eliminates custom VO entirely.
### 2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
- Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
- Confidence drop: when ESKF covariance exceeds threshold
- VO failure: when cuVSLAM reports tracking loss (sharp turn)
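The three keyframe triggers above combine into a single predicate. A minimal sketch; the interval and covariance threshold (in m²) are illustrative tuning values:

```python
def is_keyframe(frames_since_kf, cov_trace, vo_tracking_ok,
                interval=5, cov_threshold=400.0):
    """Decide whether this frame should run satellite matching.
    frames_since_kf: frames since the last keyframe
    cov_trace: trace of the ESKF position covariance (m^2)
    vo_tracking_ok: False when cuVSLAM reports tracking loss"""
    if not vo_tracking_ok:            # VO failure (e.g. sharp turn)
        return True
    if cov_trace > cov_threshold:     # confidence drop
        return True
    return frames_since_kf >= interval  # fixed cadence fallback
```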
### 3. Satellite Matcher Selection (Benchmark-Driven)
**Candidate A: LiteSAM (opt)** — Best accuracy for satellite-aerial matching (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne + TAIFormer + MinGRU. Benchmarked at 497ms on Jetson AGX Orin at 1184px. AGX Orin is 3-4x more powerful than Orin Nano Super (275 TOPS vs 67 TOPS, $2000+ vs $249).
Realistic Orin Nano Super estimates:
- At 1184px: ~1.5-2.0s (unusable)
- At 640px: ~500-800ms (borderline)
- At 480px: ~300-500ms (best case)
**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. Not specifically designed for cross-view satellite-aerial, but fast and reliable.
**Decision rule**: Benchmark LiteSAM TensorRT FP16 at 480px on Orin Nano Super. If ≤400ms → use LiteSAM. If >400ms → **abandon LiteSAM, use XFeat as primary**. No hybrid compromises — pick one and optimize it.
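The benchmark gate can be expressed as a small harness plus the decision rule. On target hardware `run_inference` would wrap the TRT FP16 engine at 480px; any callable works for the harness itself:

```python
import time

def benchmark_matcher(run_inference, n_warmup=10, n_runs=50):
    """Time a matcher callable and return its median latency in ms.
    Warmup runs absorb lazy initialization before measuring."""
    for _ in range(n_warmup):
        run_inference()
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]

def select_matcher(litesam_median_ms, budget_ms=400.0):
    """Decision rule: LiteSAM only if it fits the per-frame budget,
    otherwise XFeat. No hybrid fallback."""
    return "LiteSAM" if litesam_median_ms <= budget_ms else "XFeat"
```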
### 4. TensorRT FP16 Optimization
LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. INT8 is possible for MobileOne backbone but ViT/transformer components may degrade with INT8.
### 5. CUDA Stream Pipelining
Overlap operations across consecutive frames:
- Stream A: cuVSLAM VO for current frame (~11ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
### 6. Pre-cropped Satellite Tiles
Offline: for each satellite tile, store both the raw image and a pre-resized version matching the satellite matcher's input resolution. Runtime avoids resize cost.
## Existing/Competitor Solutions Analysis
| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (90fps) | VO only, no satellite matching |
**Key insight**: Combine cuVSLAM (best-in-class VO for Jetson) with the fastest viable satellite-aerial matcher via ESKF fusion. LiteSAM is the accuracy leader but unproven on Orin Nano Super — benchmark first, abandon for XFeat if too slow.
## Architecture
### Component: Visual Odometry
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM / C++ API | 90fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ⚠️ Slower |
**Selected**: **cuVSLAM (mono+IMU mode)** — purpose-built by NVIDIA for Jetson. ~11ms/frame leaves 389ms for everything else. Auto-fallback to IMU when visual tracking fails.
### Component: Satellite Image Matching
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| LiteSAM (opt) | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | 497ms on AGX Orin at 1184px; AGX Orin is 3-4x more powerful than Orin Nano Super | ✅ If benchmark passes |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven | Not designed for cross-view satellite-aerial | ✅ If LiteSAM fails benchmark |
| EfficientLoFTR | TensorRT | Good accuracy, semi-dense | 15.05M params (2.4x LiteSAM), slower | ⚠️ Heavier |
| SuperPoint + LightGlue | TensorRT C++ | Good general matching | Sparse only, worse on satellite-aerial | ⚠️ Not specialized |
**Selection**: Benchmark-driven. Day-one test on Orin Nano Super:
1. Export LiteSAM (opt) to TensorRT FP16
2. Measure at 480px, 640px, 800px
3. If ≤400ms at 480px → **LiteSAM**
4. If >400ms at any viable resolution → **XFeat semi-dense** (primary, no hybrid)
### Component: Sensor Fusion
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ❌ Too heavy |
**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
Measurement sources and rates:
- IMU prediction: 100+Hz
- cuVSLAM VO update: ~3Hz (every frame)
- Satellite update: ~0.3-1Hz (keyframes only, delayed via async pipeline)
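The multi-rate pattern above can be sketched with a reduced filter. This is a one-axis constant-velocity Kalman filter, not the full 16-state error-state formulation; the noise values and rates are illustrative assumptions:

```python
import numpy as np

class MultiRateKF:
    """Minimal 1-axis constant-velocity KF illustrating multi-rate
    fusion: IMU-rate prediction, then position updates from two
    sources with different noise (VO vs. satellite)."""

    def __init__(self):
        self.x = np.zeros(2)              # [position, velocity]
        self.P = np.eye(2) * 100.0        # large initial uncertainty
        self.H = np.array([[1.0, 0.0]])   # both sources measure position

    def predict(self, accel, dt):
        # IMU prediction step, run at 100+ Hz
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt**2, dt]) * accel
        self.P = F @ self.P @ F.T + np.eye(2) * 0.01

    def update(self, z, sigma):
        # Position update: VO at ~3 Hz, satellite at ~0.3-1 Hz
        R = np.array([[sigma**2]])
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + (K @ (np.array([z]) - self.H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P

kf = MultiRateKF()
for _ in range(33):                       # ~0.33s of IMU at 100 Hz
    kf.predict(accel=0.0, dt=0.01)
kf.update(z=10.0, sigma=5.0)              # noisier VO measurement
kf.update(z=12.0, sigma=2.0)              # tighter satellite anchor
```

The lower-sigma satellite update pulls the state strongly toward the absolute fix, which is exactly the anchoring role it plays in the full ESKF.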
### Component: Satellite Tile Preprocessing (Offline)
**Selected**: **GeoHash-indexed tile pairs on disk**.
Pipeline:
1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
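Step 4's GeoHash index can be sketched with the standard geohash encoding (base32 alphabet, longitude bit first). The `tile_path` directory layout is an assumption for illustration, not the project's actual structure:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=7):
    """Encode lat/lon into a geohash string using the standard
    bit-interleaving algorithm (longitude bit comes first)."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, even, bit_count, ch = [], True, 0, 0
    while len(chars) < precision:
        # Alternate between halving the longitude and latitude ranges
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val > mid:
            ch = (ch << 1) | 1
            rng[0] = mid
        else:
            ch = ch << 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:                # 5 bits per base32 character
            chars.append(BASE32[ch])
            bit_count, ch = 0, 0
    return "".join(chars)

def tile_path(lat, lon, zoom):
    """Hypothetical on-disk layout: tiles/<geohash>/z<zoom>/."""
    return f"tiles/{geohash(lat, lon)}/z{zoom}"
```

A 7-character geohash cell is roughly 150m across, so neighboring tiles share long prefixes and a prefix scan retrieves all tiles near a predicted position.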
### Component: Re-localization (Disconnected Segments)
**Selected**: **Keyframe satellite matching is always active + expanded search on VO failure**.
When cuVSLAM reports tracking loss (sharp turn, no features):
1. Immediately flag next frame as keyframe → trigger satellite matching
2. Expand tile search radius (from ±200m to ±1km based on IMU dead-reckoning uncertainty)
3. If match found: position recovered, new segment begins
4. If 3+ consecutive keyframe failures: request user input via API
### Component: Object Center Coordinates
Geometric calculation once frame-center GPS is known:
1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
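The four steps above amount to a yaw rotation plus a small-offset meters-to-degrees conversion. A sketch assuming a nadir camera, locally flat terrain, and yaw measured clockwise from north (the function and argument names are illustrative):

```python
import math

def pixel_offset_to_gps(center_lat, center_lon, dx_px, dy_px,
                        gsd_m, yaw_rad):
    """Convert a pixel offset from the frame center into absolute
    GPS coordinates. dy_px is positive downward (image convention);
    yaw_rad is IMU heading, clockwise from north."""
    # Steps 1-2: pixel offset to meters in the camera frame
    dx_m = dx_px * gsd_m            # right in the image
    dy_m = -dy_px * gsd_m           # up in the image (forward/north at yaw=0)
    # Step 3: rotate camera frame into east/north by heading
    east = dx_m * math.cos(yaw_rad) + dy_m * math.sin(yaw_rad)
    north = -dx_m * math.sin(yaw_rad) + dy_m * math.cos(yaw_rad)
    # Step 4: meters to degrees (small-offset equirectangular approx.)
    dlat = north / 111_320.0
    dlon = east / (111_320.0 * math.cos(math.radians(center_lat)))
    return center_lat + dlat, center_lon + dlon
```

For example, at yaw 0 an object 100px to the right of center with a 0.5m GSD lands 50m east of the frame-center GPS.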
### Component: API & Streaming
**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
## Processing Time Budget (per frame, 400ms budget)
### Normal Frame (non-keyframe, ~60-80% of frames)
| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~11ms | NVIDIA CUDA-optimized, 90fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~25ms** | Well within 400ms |
### Keyframe Satellite Matching (async, every 3-10 frames)
Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.
**Path A — LiteSAM (if benchmark passes)**:
| Step | Time | Notes |
|------|------|-------|
| Downsample to ~480px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~5ms | Pre-resized, from storage |
| LiteSAM (opt) matching | ~300-500ms | TensorRT FP16, 480px, Orin Nano Super estimate |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **~310-510ms** | Async, does not block VO |
**Path B — XFeat (if LiteSAM abandoned)**:
| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |
### Per-Frame Wall-Clock Latency
Every frame:
- **VO result emitted in ~25ms** (cuVSLAM + ESKF + SSE)
- Satellite correction arrives asynchronously on keyframes
- Client gets immediate position, then refined position when satellite match completes
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-300MB | NVIDIA CUDA library + internal state |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| Satellite tile (pre-resized) | ~1MB | Single active tile |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~1.9-2.4GB** | ~25-30% of 8GB — comfortable margin |
## Confidence Scoring
| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
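The table maps directly onto a small decision function. A sketch: the 500m threshold comes from the table, while the argument names and the underscore in `VERY_LOW` are assumptions:

```python
def confidence_level(sat_match_ok, vo_ok, dist_since_sat_m,
                     user_position=None):
    """Map current fusion state to a confidence level per the table."""
    if user_position is not None:
        return "MANUAL"                  # user-provided position wins
    if sat_match_ok and vo_ok:
        return "HIGH"                    # satellite-anchored
    if vo_ok and dist_since_sat_m < 500:
        return "MEDIUM"                  # VO with a recent anchor
    if vo_ok:
        return "LOW"                     # VO only, drifting
    return "VERY_LOW"                    # IMU dead-reckoning only
```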
## Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| LiteSAM too slow on Orin Nano Super | HIGH | Misses 400ms deadline | **Abandon LiteSAM, use XFeat**. Day-one benchmark is the go/no-go gate |
| cuVSLAM not supporting nadir-only camera well | MEDIUM | VO accuracy degrades | Fall back to XFeat frame-to-frame matching |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| XFeat cross-view accuracy insufficient | MEDIUM | Position corrections less accurate than LiteSAM | Increase keyframe frequency; multi-tile consensus voting; geometric verification with strict RANSAC |
| cuVSLAM is closed-source | LOW | Hard to debug | Fallback to XFeat VO; cuVSLAM has Python+C++ APIs |
## Testing Strategy
### Integration / Functional Tests
- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
### Non-Functional Tests
- **Day-one benchmark**: LiteSAM TensorRT FP16 at 480/640/800px on Orin Nano Super → go/no-go for LiteSAM
- cuVSLAM benchmark: verify 90fps monocular+IMU on Orin Nano Super
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 1000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
## References
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| LiteSAM at 480px as satellite matcher | **Performance**: 497ms on AGX Orin at 1184px. Orin Nano Super is ~3-4x slower. At 480px estimated ~270-360ms — borderline. Paper uses PyTorch AMP, not TensorRT FP16. TensorRT could bring 2-3x improvement. | Add TensorRT FP16 as mandatory optimization step. Revised estimate at 480px with TensorRT: ~90-180ms. Still benchmark-driven: abandon if >400ms. |
| XFeat as LiteSAM fallback for satellite matching | **Functional**: XFeat is a general-purpose feature matcher, NOT designed for cross-view satellite-aerial gap. May fail on season/lighting differences between UAV and satellite imagery. | **Expand fallback options**: benchmark EfficientLoFTR (designed for weak-texture aerial) alongside XFeat. Consider STHN-style deep homography as third option. See detailed satellite matcher comparison below. |
| SP+LG considered as "sparse only, worse on satellite-aerial" | **Functional**: LiteSAM paper confirms "SP+LG achieves fastest inference speed but at expense of accuracy." Sparse matcher fails on texture-scarce regions. ~180-360ms on Orin Nano Super. | **Reject SP+LG** for both VO and satellite matching. cuVSLAM is 15-33x faster for VO. |
| cuVSLAM on low-texture terrain | **Functional**: cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade tracking. On uniform agricultural fields/water bodies, features will be sparse → frequent tracking loss. IMU fallback lasts only ~1s. No published benchmarks for nadir agricultural terrain. Does NOT guarantee pose recovery after tracking loss. | **CRITICAL RISK**: cuVSLAM will likely fail frequently over low-texture terrain. Mitigation: (1) increase satellite matching frequency in low-texture areas, (2) use IMU dead-reckoning bridge, (3) accept higher drift in featureless segments, (4) XFeat VO as secondary fallback may also struggle on same terrain. |
| cuVSLAM memory estimate ~200-300MB | **Performance**: Map grows over time. For 3000-frame flights (~16min at 3fps), map could reach 500MB-1GB without pruning. | Configure cuVSLAM map pruning. Set max keyframes. Monitor memory. |
| Tile search on VO failure: "expand to ±1km" | **Functional**: Underspecified. Loading 10-20 tiles from disk mid-flight incurs significant I/O latency. | Preload tiles within ±2km of flight plan into RAM. Ranked search by IMU dead-reckoning position. |
| LiteSAM resolution | **Performance**: Paper benchmarked at 1184px on AGX Orin (497ms AMP). TensorRT FP16 with reparameterized MobileOne expected 2-3x faster. | Benchmark LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM at 1280px. If >200ms → use XFeat. |
| SP+LG proposed for VO by user | **Performance**: ~130-280ms/frame on Orin Nano. cuVSLAM ~8.6ms/frame. No IMU, no loop closure. | **Reject SP+LG for VO.** cuVSLAM 15-33x faster. XFeat frame-to-frame remains fallback. |
## Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.
**Hard constraint**: Camera shoots at ~3fps (333-400ms interval). The full pipeline must complete within **400ms per frame**.
**Satellite matching strategy**: Benchmark LiteSAM TensorRT FP16 at **1280px** on Orin Nano Super as a day-one priority. The paper's AGX Orin benchmark used PyTorch AMP — TensorRT FP16 with reparameterized MobileOne should yield 2-3x additional speedup. **Decision rule: if LiteSAM TRT FP16 at 1280px ≤200ms → use LiteSAM. If >200ms → use XFeat.**
**Core architectural principles**:
1. **cuVSLAM handles VO** — 116fps on Orin Nano 8GB, ~8.6ms/frame. SuperPoint+LightGlue was evaluated and rejected (15-33x slower, no IMU integration).
2. **Keyframe-based satellite matching** — satellite matcher runs on keyframes only (every 3-10 frames), amortizing cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
5. **Proactive tile loading** — preload tiles within ±2km of flight plan into RAM for fast lookup during expanded search.
```
┌─────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ Satellite Tiles → Download & Crop → Store as tile pairs │
│ (Google Maps) (per flight plan) (disk, GeoHash indexed) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ EVERY FRAME (400ms budget): │
│ ┌────────────────────────────────┐ │
│ │ Camera → Downsample (CUDA 2ms)│ │
│ │ → cuVSLAM VO+IMU (~9ms) │──→ ESKF Update → SSE Emit │
│ └────────────────────────────────┘ ↑ │
│ │ │
│ KEYFRAMES ONLY (every 3-10 frames): │ │
│ ┌────────────────────────────────────┐ │ │
│ │ Satellite match (async CUDA stream)│─────┘ │
│ │ LiteSAM TRT FP16 or XFeat │ │
│ │ (does NOT block VO output) │ │
│ └────────────────────────────────────┘ │
│ │
│ IMU: 100+Hz continuous → ESKF prediction │
│ TILES: ±2km preloaded in RAM from flight plan │
└─────────────────────────────────────────────────────────────────┘
```
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~9ms/frame)
NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs.
**Why not SuperPoint+LightGlue for VO**: SP+LG is 15-33x slower (~130-280ms vs ~9ms). Lacks IMU integration, loop closure, auto-fallback.
**CRITICAL: cuVSLAM on low-texture/uniform terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corner detection + Lucas-Kanade optical flow tracking (classical features, not learned). On uniform agricultural terrain or water bodies:
- Very few corners will be detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1 second) → constant-velocity integrator (~0.5s more)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
- All published benchmarks (KITTI: urban/suburban, EuRoC: indoor) do NOT include nadir agricultural terrain
- Multi-stereo mode helps with featureless surfaces, but we have mono camera only
**Mitigation strategy for low-texture terrain**:
1. **Increase satellite matching frequency**: In low-texture areas (detected by cuVSLAM's keypoint count dropping), switch from every 3-10 frames to every frame
2. **IMU dead-reckoning bridge**: When cuVSLAM reports tracking loss, ESKF continues with IMU prediction. At 3fps with ~1.5s IMU bridge, that covers ~4-5 frames
3. **Accept higher drift**: In featureless segments, position accuracy degrades to IMU-only level (50-100m+ over ~10s). Satellite matching must recover absolute position when texture returns
4. **Keypoint density monitoring**: Track cuVSLAM's number of tracked features per frame. When below threshold (e.g., <50), proactively trigger satellite matching
5. **XFeat frame-to-frame as VO fallback**: XFeat uses learned features that may detect texture invisible to Shi-Tomasi corners. But XFeat may also struggle on truly uniform terrain
### 2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
- Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
- Confidence drop: when ESKF covariance exceeds threshold
- VO failure: when cuVSLAM reports tracking loss (sharp turn)
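The triggers above, plus the keypoint monitor from the low-texture mitigation list, combine into a single keyframe predicate. A sketch in which every threshold value is an assumption:

```python
def should_run_satellite_match(frames_since_keyframe, position_cov_m2,
                               tracking_lost, keypoint_count,
                               interval=5, cov_threshold=400.0,
                               min_keypoints=50):
    """Decide whether the current frame becomes a keyframe that
    triggers asynchronous satellite matching."""
    if tracking_lost:
        return True                      # VO failure: re-localize now
    if keypoint_count < min_keypoints:
        return True                      # low-texture terrain detected
    if position_cov_m2 > cov_threshold:
        return True                      # ESKF confidence has dropped
    return frames_since_keyframe >= interval   # fixed fallback interval
```

Over featureless terrain the keypoint check fires on nearly every frame, which realizes the "switch to every frame" behavior described in the mitigation strategy.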
### 3. Satellite Matcher Selection (Benchmark-Driven)
**Important context**: Our UAV-to-satellite matching is EASIER than typical cross-view geo-localization problems. Both the UAV camera and satellite imagery are approximately nadir (top-down). The main challenges are season/lighting differences, resolution mismatch, and temporal changes — not the extreme viewpoint gap seen in ground-to-satellite matching. This means even general-purpose matchers may perform well.
**Candidate A: LiteSAM (opt) with TensorRT FP16 at 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne reparameterizable for TensorRT. Paper benchmarked at 497ms on AGX Orin using AMP at 1184px. TensorRT FP16 with reparameterized MobileOne expected 2-3x faster than AMP. At 1280px (close to paper's 1184px benchmark resolution), accuracy should match published results.
Orin Nano Super TensorRT FP16 estimate at 1280px:
- AGX Orin AMP @ 1184px: 497ms (published)
- TRT FP16 speedup over AMP: ~2-3x → AGX Orin TRT estimate: ~165-250ms
- Pessimistic hardware scaling (Orin Nano Super ~3-4x slower by raw TOPS): ~500-1000ms with TRT
- Optimistic scaling (~1-2x slower in practice, if the model does not saturate AGX compute): **~165-330ms**
- Go/no-go threshold: **≤200ms** (decided by measurement, not these estimates)
**Candidate B (fallback): XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. General-purpose, not designed for cross-view gap. FASTEST option. Since our cross-view gap is small (both nadir), XFeat may work adequately for this specific use case.
**Other evaluated options (not selected)**:
- **EfficientLoFTR**: Semi-dense, 15.05M params, handles weak-texture well. ~20% slower than LiteSAM. Strong option if LiteSAM codebase proves difficult to export to TRT, but larger model footprint.
- **Deep Homography (STHN-style)**: End-to-end homography estimation, no feature/RANSAC pipeline. 4.24m at 50m range. Interesting future option but needs RGB retraining — higher implementation risk.
- **PFED and retrieval-based methods**: Image RETRIEVAL only (identifies which tile matches), not pixel-level matching. We already know which tile to use from ESKF position.
- **SuperPoint+LightGlue**: Sparse matcher. LiteSAM paper confirms worse satellite-aerial accuracy. Slower than XFeat.
**Decision rule** (day-one on Orin Nano Super):
1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → use LiteSAM at 1280px**
4. **If >200ms → use XFeat**
### 4. TensorRT FP16 Optimization
LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. **Do NOT use INT8 on transformer components** (TAIFormer) — accuracy degrades. INT8 is safe only for the MobileOne backbone CNN layers.
### 5. CUDA Stream Pipelining
Overlap operations across consecutive frames:
- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
### 6. Proactive Tile Loading
**Change from draft01**: Instead of loading tiles on-demand from disk, preload tiles within ±2km of the flight plan into RAM at session start. This eliminates disk I/O latency during flight. For a 50km flight path, ~2000 tiles at zoom 19 ≈ ~200MB RAM — well within budget.
On VO failure / expanded search:
1. Compute IMU dead-reckoning position
2. Rank preloaded tiles by distance to predicted position
3. Try top 3 tiles (not all tiles in ±1km radius)
4. If no match in top 3, expand to next 3
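Steps 2-4 can be sketched as a generator over the preloaded tiles. The tile dict field names and the equirectangular distance approximation are assumptions:

```python
import math

def rank_tiles(tiles, predicted_lat, predicted_lon, batch=3):
    """Rank preloaded tiles by distance to the IMU dead-reckoning
    position and yield them in batches of `batch` for matching."""
    def dist_m(tile):
        # Equirectangular approximation: adequate for km-scale ranking
        dlat = (tile["center_lat"] - predicted_lat) * 111_320.0
        dlon = ((tile["center_lon"] - predicted_lon) * 111_320.0
                * math.cos(math.radians(predicted_lat)))
        return math.hypot(dlat, dlon)

    ranked = sorted(tiles, key=dist_m)
    for i in range(0, len(ranked), batch):
        yield ranked[i:i + batch]        # try top 3, then next 3, ...
```

The caller attempts satellite matching against each batch in turn and stops at the first successful match, so the common case touches only the three nearest tiles.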
## Existing/Competitor Solutions Analysis
| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean error over 17km (0.83% of distance) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| VRLM (2024) | FocalNet backbone + multi-scale feature fusion | 83.35% MA@20 | Desktop | Not edge-optimized |
| Scale-Aware UAV-to-Satellite (2026) | Semantic geometric + metric scale recovery | N/A | Desktop | Addresses scale ambiguity problem |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection, semi-dense | Competitive with LiteSAM | 2.5x faster than LoFTR, TRT available | 15.05M params, heavier than LiteSAM |
| PFED (2025) | Knowledge distillation + multi-view refinement, retrieval | 97.15% Recall@1 (University-1652) | AGX Orin (251.5 FPS) | Retrieval only, not pixel-level matching |
| STHN (IEEE RA-L 2024) | Deep homography estimation, coarse-to-fine | 4.24m at 50m range | Open-source, lightweight | Trained on thermal, needs RGB retraining |
| Hierarchical AVL (2025) | DINOv2 retrieval + SuperPoint matching | 64.5-95% success rate | ROS, IMU integration | Two-stage complexity |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Designed for Mars/planetary, needs adaptation |
## Architecture
### Component: Visual Odometry
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ~30-50ms/frame | ⚠️ Fallback |
| SuperPoint+LightGlue | LightGlue-ONNX TRT | Good accuracy, adaptive pruning | ~130-280ms, no IMU, no loop closure | ~130-280ms/frame | ❌ Rejected |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ~33ms/frame | ⚠️ Slower |
**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built by NVIDIA for Jetson. Auto-fallback to IMU when visual tracking fails.
**SP+LG rejection rationale**: 15-33x slower than cuVSLAM. No built-in IMU fusion, loop closure, or tracking failure detection. Building these features around SP+LG would take significant development time and still be slower. XFeat at ~30-50ms is a better fallback for VO if cuVSLAM fails on nadir camera.
### Component: Satellite Image Matching
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | Untested on Orin Nano Super with TensorRT | Est. ~165-330ms @ 1280px TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven, fastest | General-purpose, not designed for cross-view. Our nadir-nadir gap is small → may work. | ~50-100ms | ✅ Fallback if LiteSAM >200ms |
**Selection**: Day-one benchmark on Orin Nano Super:
1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at **1280px**
3. **If ≤200ms → LiteSAM at 1280px**
4. **If >200ms → XFeat**
### Component: Sensor Fusion
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ~10-50ms/step | ❌ Too heavy |
**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
### Component: Satellite Tile Preprocessing (Offline)
**Selected**: **GeoHash-indexed tile pairs on disk + RAM preloading**.
Pipeline:
1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
6. **At session start**: preload tiles within ±2km of flight plan into RAM (~200MB for 50km route)
### Component: Re-localization (Disconnected Segments)
**Selected**: **Keyframe satellite matching is always active + ranked tile search on VO failure**.
When cuVSLAM reports tracking loss (sharp turn, no features):
1. Immediately flag next frame as keyframe → trigger satellite matching
2. Compute IMU dead-reckoning position since last known position
3. Rank preloaded tiles by distance to dead-reckoning position
4. Try top 3 tiles sequentially (not all tiles in radius)
5. If match found: position recovered, new segment begins
6. If 3 consecutive keyframe failures across top tiles: expand to next 3 tiles
7. If still no match after 3+ full attempts: request user input via API
### Component: Object Center Coordinates
Geometric calculation once frame-center GPS is known:
1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
### Component: API & Streaming
**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
## Processing Time Budget (per frame, 400ms budget)
### Normal Frame (non-keyframe, ~60-80% of frames)
| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~23ms** | Well within 400ms |
### Keyframe Satellite Matching (async, every 3-10 frames)
Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.
**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:
| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 matching | ≤200ms | TensorRT FP16, 1280px, go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async, within budget |
**Path B — XFeat (if LiteSAM >200ms)**:
| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state. **Configure map pruning for 3000-frame flights** |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan, pre-resized |
| Current frame (downsampled) | ~2MB | 640×480×3 |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~2.1-2.9GB** | ~26-36% of 8GB — comfortable margin |
## Confidence Scoring
| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
## Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **cuVSLAM fails on low-texture agricultural terrain** | **HIGH** | Frequent tracking loss, degraded VO | Increase satellite matching frequency when keypoint count drops. IMU dead-reckoning bridge (~1.5s). Accept higher drift in featureless segments. Satellite matching recovers position when texture returns. |
| LiteSAM TRT FP16 >200ms at 1280px on Orin Nano Super | MEDIUM | Must use XFeat instead (less accurate for cross-view) | Day-one TRT FP16 benchmark. If >200ms → XFeat. Since our nadir-nadir gap is small, XFeat may still perform adequately. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Benchmark XFeat on actual operational area satellite-aerial pairs. Increase keyframe frequency; multi-tile consensus; strict RANSAC. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, set max keyframes. Monitor memory. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| cuVSLAM is closed-source, no nadir benchmarks | MEDIUM | Unknown failure modes over farmland | Extensive testing with real nadir UAV imagery before deployment. XFeat VO as fallback (also uses learned features). |
| Tile I/O bottleneck during expanded search | LOW | Delayed re-localization | Preload ±2km tiles in RAM; ranked search instead of exhaustive |
## Testing Strategy
### Integration / Functional Tests
- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
- Test cuVSLAM map memory: run 3000-frame session, monitor memory growth
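The 50m/20m acceptance metrics above can be computed with a small haversine helper. A minimal sketch, assuming `estimates` and `ground_truth` are parallel lists of (lat, lon) tuples parsed from the pipeline output and coordinates.csv:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    R = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def accuracy_report(estimates, ground_truth):
    """Fraction of frame-center estimates within the 50m / 20m AC thresholds."""
    errors = [haversine_m(e[0], e[1], g[0], g[1])
              for e, g in zip(estimates, ground_truth)]
    n = len(errors)
    return {
        "pct_within_50m": sum(e <= 50 for e in errors) / n,
        "pct_within_20m": sum(e <= 20 for e in errors) / n,
    }
```

The report values are then checked against the 80%/60% targets.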
### Non-Functional Tests
- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at **1280px** on Orin Nano Super. If ≤200ms → use LiteSAM. If >200ms → use XFeat. Also measure accuracy on test satellite-aerial pairs for both.
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- **cuVSLAM terrain stress test**: test with nadir camera over (a) urban/structured terrain, (b) agricultural fields, (c) water/uniform terrain, (d) forest. Measure: keypoint count, tracking success rate, drift per 100 frames, IMU fallback frequency
- cuVSLAM keypoint monitoring: verify that low-keypoint detection triggers increased satellite matching
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 3000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
- Tile preloading: verify RAM usage of preloaded tiles for 50km flight plan
## References
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR paper: https://zju3dv.github.io/efficientloftr/
- LoFTR TensorRT adaptation: https://github.com/Kolkir/LoFTR_TRT
- PFED (2025): https://github.com/SkyEyeLoc/PFED
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hierarchical AVL (MDPI 2025): https://www.mdpi.com/2072-4292/17/20/3470
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- cuVSLAM paper: https://arxiv.org/abs/2506.04359
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- LightGlue (ICCV 2023): https://github.com/cvg/LightGlue
- LightGlue TensorRT: https://fabio-sim.github.io/blog/accelerating-lightglue-inference-onnx-runtime-tensorrt/
- LightGlue TRT Jetson: https://github.com/qdLMF/LightGlue-with-FlashAttentionV2-TensorRT
- ForestVO / SP+LG VO: https://arxiv.org/html/2504.01261v1
- vo_lightglue (SP+LG VO): https://github.com/himadrir/vo_lightglue
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| FastAPI + SSE as primary output | **Functional**: New AC requires MAVLink GPS_INPUT to flight controller, not REST/SSE. The system must act as a GPS replacement module. SSE is wrong output channel. | **Replace with pymavlink GPS_INPUT sender**. Send GPS_INPUT at 5-10Hz to flight controller via UART. Retain minimal FastAPI only for local IPC (object localization API). |
| No ground station integration | **Functional**: New AC requires streaming position+confidence to ground station and receiving re-localization commands via telemetry. Draft02 had no telemetry. | **MAVLink telemetry integration**: GPS data forwarded automatically by flight controller. Custom data via NAMED_VALUE_FLOAT (confidence, drift). Re-localization hints via COMMAND_LONG listener. |
| MAVSDK library (per restriction) | **Functional**: MAVSDK-Python v3.15.3 cannot send GPS_INPUT messages. Feature requested since 2021, still unresolved. This is a blocking limitation for the core output function. | **Use pymavlink** for all MAVLink communication. pymavlink provides `gps_input_send()` and full MAVLink v2 access. Note conflict with restriction — pymavlink is the only viable option. |
| 3fps camera → ~3Hz output | **Performance**: ArduPilot GPS_RATE_MS minimum is 5Hz (200ms). 3Hz camera output is below minimum. Flight controller EKF may not fuse properly. | **IMU-interpolated 5-10Hz GPS_INPUT**: ESKF prediction runs at 100+Hz internally. Emit predicted state as GPS_INPUT at 5-10Hz. Camera corrections arrive at 3Hz within this stream. |
| No startup/failsafe procedures | **Functional**: New AC requires init from last GPS, reboot recovery, IMU-only fallback. Draft02 assumed position was already known. | **Full lifecycle management**: (1) Boot → read GPS from flight controller → init ESKF. (2) Reboot → read IMU-extrapolated position → re-init. (3) N-second failure → stop GPS_INPUT → autopilot falls back to IMU. |
| Basic object localization (nadir only) | **Functional**: New AC adds AI camera with configurable angle and zoom. Nadir pixel-to-GPS is insufficient. | **Trigonometric projection for oblique camera**: ground_distance = alt × tan(tilt), bearing = heading + pan + pixel offset. Local API for AI system requests. |
| No thermal management | **Performance**: Jetson Orin Nano Super throttles at 80°C (GPU drops 1GHz→300MHz = 3x slowdown). Could blow 400ms budget. | **Thermal monitoring + adaptive pipeline**: Use 25W mode. Monitor via tegrastats. If temp >75°C → reduce satellite matching frequency. If >80°C → VO+IMU only. |
| ESKF covariance without explicit drift budget | **Functional**: New AC requires max 100m cumulative VO drift between satellite anchors. Draft02 uses covariance for keyframe selection but no explicit budget. | **Drift budget tracker**: √(σ_x² + σ_y²) from ESKF as drift estimate. When approaching 100m → force every-frame satellite matching. Report via horiz_accuracy in GPS_INPUT. |
| No satellite imagery validation | **Functional**: New AC requires ≥0.5 m/pixel, <2 years old. Draft02 didn't validate. | **Preprocessing validation step**: Check zoom 19 availability (0.3 m/pixel). Fall back to zoom 18 (0.6 m/pixel). Flag stale tiles. |
| "Ask user via API" for re-localization | **Functional**: New AC says send re-localization request to ground station via telemetry link, not REST API. Operator sends hint via telemetry. | **MAVLink re-localization protocol**: On 3 consecutive failures → send STATUSTEXT alert to ground station. Operator sends COMMAND_LONG with approximate lat/lon. System uses hint to constrain tile search. |
## Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). The system replaces the GPS module for the flight controller by sending MAVLink GPS_INPUT messages via pymavlink over UART. Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU data from the flight controller. GPS_INPUT is sent at 5-10Hz, with camera-based corrections at 3Hz and IMU prediction filling the gaps.
**Hard constraint**: Camera shoots at ~3fps (333ms interval). The full VO+ESKF pipeline must complete within 400ms per frame. GPS_INPUT output rate: 5-10Hz minimum (ArduPilot EKF requirement).
**Output architecture**:
- **Primary**: pymavlink → GPS_INPUT to flight controller via UART (replaces GPS module)
- **Telemetry**: Flight controller auto-forwards GPS data to ground station. Custom NAMED_VALUE_FLOAT for confidence/drift at 1Hz
- **Commands**: Ground station → COMMAND_LONG → flight controller → pymavlink listener on companion computer
- **Local IPC**: Minimal FastAPI on localhost for object localization requests from AI systems
```
┌─────────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ Satellite Tiles → Download & Validate → Pre-resize → Store │
│ (Google Maps) (≥0.5m/px, <2yr) (matcher res) (GeoHash) │
│ Copy to Jetson storage │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ STARTUP: │
│ pymavlink → read GLOBAL_POSITION_INT → init ESKF → start cuVSLAM │
│ │
│ EVERY FRAME (3fps, 333ms interval): │
│ ┌──────────────────────────────────────┐ │
│ │ Nav Camera → Downsample (CUDA ~2ms) │ │
│ │ → cuVSLAM VO+IMU (~9ms) │ │
│ │ → ESKF measurement update │ │
│ └──────────────────────────────────────┘ │
│ │
│ 5-10Hz CONTINUOUS (between camera frames): │
│ ┌──────────────────────────────────────┐ │
│ │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller │
│ │ (pymavlink, every 100-200ms) │ (GPS1_TYPE=14) │
│ └──────────────────────────────────────┘ │
│ │
│ KEYFRAMES (every 3-10 frames, async): │
│ ┌──────────────────────────────────────┐ │
│ │ Satellite match (CUDA stream B) │──→ ESKF correction │
│ │ LiteSAM TRT FP16 or XFeat │ │
│ └──────────────────────────────────────┘ │
│ │
│ TELEMETRY (1Hz): │
│ ┌──────────────────────────────────────┐ │
│ │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station │
│ │ STATUSTEXT: alerts, re-loc requests │ (via telemetry radio) │
│ └──────────────────────────────────────┘ │
│ │
│ COMMANDS (from ground station): │
│ ┌──────────────────────────────────────┐ │
│ │ Listen COMMAND_LONG: re-loc hint │←── Ground Station │
│ │ (lat/lon from operator) │ (via telemetry radio) │
│ └──────────────────────────────────────┘ │
│ │
│ LOCAL IPC: │
│ ┌──────────────────────────────────────┐ │
│ │ FastAPI localhost:8000 │←── AI Detection System │
│ │ POST /localize (object GPS calc) │ │
│ │ GET /status (system health) │ │
│ └──────────────────────────────────────┘ │
│ │
│ IMU: 100+Hz from flight controller → ESKF prediction │
│ TILES: ±2km preloaded in RAM from flight plan │
│ THERMAL: Monitor via tegrastats, adaptive pipeline throttling │
└─────────────────────────────────────────────────────────────────────┘
```
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~9ms/frame)
NVIDIA's CUDA-accelerated VO library (PyCuVSLAM v15.0.0, March 2026) achieves 116fps on Jetson Orin Nano 8GB at 720p. It natively supports monocular camera + IMU, falls back to IMU automatically when visual tracking fails, and provides loop closure with both Python and C++ APIs.
**CRITICAL: cuVSLAM on low-texture terrain (agricultural fields, water)**:
cuVSLAM uses Shi-Tomasi corners + Lucas-Kanade optical flow (classical features). On uniform agricultural terrain:
- Few corners detected → sparse/unreliable tracking
- Frequent keyframe creation → heavier compute
- Tracking loss → IMU fallback (~1s) → constant-velocity integrator (~0.5s)
- cuVSLAM does NOT guarantee pose recovery after tracking loss
**Mitigation**:
1. Increase satellite matching frequency when cuVSLAM keypoint count drops
2. IMU dead-reckoning bridge via ESKF (continues GPS_INPUT output during tracking loss)
3. Accept higher drift in featureless segments — report via horiz_accuracy
4. Keypoint density monitoring triggers adaptive satellite matching
### 2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching:
- cuVSLAM provides VO at every frame (~9ms)
- Satellite matching triggers on keyframes selected by:
- Fixed interval: every 3-10 frames
- ESKF covariance exceeds threshold (drift approaching budget)
- VO failure: cuVSLAM reports tracking loss
- Thermal: reduce frequency if temperature high
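The trigger conditions above combine into a small predicate. A sketch under assumptions: `interval` and `drift_threshold_m` are illustrative tunables, not specified values, and the thermal condition is assumed to be applied upstream by enlarging `interval`:

```python
def is_keyframe(frames_since_kf, drift_sigma_m, tracking_lost,
                interval=5, drift_threshold_m=80.0):
    """Decide whether this frame triggers async satellite matching.

    interval and drift_threshold_m are assumed tunables; thermal
    throttling is modeled by passing a larger interval."""
    if tracking_lost:                        # VO failure: re-anchor immediately
        return True
    if drift_sigma_m >= drift_threshold_m:   # ESKF covariance near drift budget
        return True
    return frames_since_kf >= interval       # fixed-interval fallback
```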
### 3. Satellite Matcher Selection (Benchmark-Driven)
**Context**: Our UAV-to-satellite matching is nadir-to-nadir (both top-down). Challenges are season/lighting differences and temporal changes, not extreme viewpoint gaps.
**Candidate A: LiteSAM (opt) TRT FP16 @ 1280px** — Best satellite-aerial accuracy (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params. TensorRT FP16 with reparameterized MobileOne. Estimated ~165-330ms on Orin Nano Super with TRT FP16.
**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Fastest option. General-purpose but our nadir-nadir gap is small.
**Decision rule** (day-one on Orin Nano Super):
1. Export LiteSAM (opt) to TensorRT FP16
2. Benchmark at 1280px
3. If ≤200ms → LiteSAM at 1280px
4. If >200ms → XFeat
### 4. TensorRT FP16 Optimization
LiteSAM's MobileOne backbone is reparameterizable — multi-branch collapses to single feed-forward at inference. INT8 safe only for MobileOne CNN layers, NOT for TAIFormer transformer components.
### 5. CUDA Stream Pipelining
- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async, does not block VO)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management
### 6. Proactive Tile Loading
Preload tiles within ±2km of flight plan into RAM at startup. For a 50km route, ~2000 tiles at zoom 19 ≈ ~200MB. Eliminates disk I/O during flight.
On VO failure / expanded search:
1. Compute IMU dead-reckoning position
2. Rank preloaded tiles by distance to predicted position
3. Try top 3 tiles, then expand
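The ranked search in steps 2-3 can be sketched as follows, assuming an illustrative in-RAM tile structure (a list of dicts with a `"center"` (lat, lon) key), not the actual GeoHash store:

```python
import math

def rank_tiles(tiles, predicted_lat, predicted_lon, top_k=3):
    """Rank preloaded tiles by distance from the IMU dead-reckoning
    position; the caller tries the nearest top_k first, then expands."""
    def sq_dist(tile):
        lat, lon = tile["center"]
        # Equirectangular approximation is adequate for a ±2km window.
        dlat = lat - predicted_lat
        dlon = (lon - predicted_lon) * math.cos(math.radians(predicted_lat))
        return dlat * dlat + dlon * dlon
    return sorted(tiles, key=sq_dist)[:top_k]
```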
### 7. 5-10Hz GPS_INPUT Output Loop
Dedicated thread/coroutine sends GPS_INPUT at fixed rate (5-10Hz):
1. Read current ESKF state (position, velocity, covariance)
2. Compute horiz_accuracy from √(σ_x² + σ_y²)
3. Set fix_type based on last correction type (3=satellite-corrected, 2=VO-only, 1=IMU-only)
4. Send via `mav.gps_input_send()`
5. Sleep until next interval
This decouples camera frame rate (3fps) from GPS_INPUT rate (5-10Hz).
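The five steps above can be sketched with pymavlink's `gps_input_send` (field scaling follows the MAVLink GPS_INPUT spec: lat/lon in degE7, accuracies in meters). The `eskf_state` dict and its keys are assumed glue code, and the hdop/vdop/accuracy constants are placeholders:

```python
import math
import time

def build_gps_input(eskf_state, fix_type, satellites_visible):
    """Map an ESKF state snapshot to GPS_INPUT fields (sketch)."""
    sigma_x, sigma_y = eskf_state["sigma_xy"]
    horiz_accuracy = math.sqrt(sigma_x ** 2 + sigma_y ** 2)
    return dict(
        time_usec=int(time.time() * 1e6),
        gps_id=0,
        ignore_flags=0,
        time_week_ms=0,
        time_week=0,
        fix_type=fix_type,                    # 3=sat-corrected, 2=VO, 1=IMU
        lat=int(eskf_state["lat"] * 1e7),     # degE7 per MAVLink spec
        lon=int(eskf_state["lon"] * 1e7),
        alt=eskf_state["alt_m"],
        hdop=1.0, vdop=1.0,                   # placeholder DOP values
        vn=eskf_state["vel_ned"][0],
        ve=eskf_state["vel_ned"][1],
        vd=eskf_state["vel_ned"][2],
        speed_accuracy=1.0,
        horiz_accuracy=horiz_accuracy,        # step 2: sqrt(sigma_x^2 + sigma_y^2)
        vert_accuracy=5.0,
        satellites_visible=satellites_visible,
    )

# Output loop (assumes `mav` is a pymavlink mavutil connection):
# while True:
#     msg = build_gps_input(eskf.snapshot(), fix_type=3, satellites_visible=12)
#     mav.mav.gps_input_send(**msg)
#     time.sleep(0.1)  # 10Hz
```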
## Existing/Competitor Solutions Analysis
| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms@1184px) | Not tested on Orin Nano |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI) | Jetson Orin Nano (116fps) | VO only, no satellite matching |
| EfficientLoFTR (CVPR 2024) | Aggregated attention + adaptive token selection | Competitive with LiteSAM | TRT available | 15.05M params, heavier |
| STHN (IEEE RA-L 2024) | Deep homography estimation | 4.24m at 50m range | Lightweight | Needs RGB retraining |
| JointLoc (IROS 2024) | Retrieval + VO fusion, adaptive weighting | 0.237m RMSE over 1km | Open-source | Planetary, needs adaptation |
## Architecture
### Component: Flight Controller Integration (NEW)
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| pymavlink GPS_INPUT | pymavlink | Full MAVLink v2 access, GPS_INPUT support, pure Python, aarch64 compatible | Lower-level API, manual message handling | ~1ms per send | ✅ Best |
| MAVSDK-Python TelemetryServer | MAVSDK v3.15.3 | Higher-level API, aarch64 wheels | NO GPS_INPUT support, no custom messages | N/A — missing feature | ❌ Blocked |
| MAVSDK C++ MavlinkDirect | MAVSDK v4 (future) | Custom message support planned | Not available in Python wrapper yet | N/A — not released | ❌ Not available |
| MAVROS (ROS) | ROS + MAVROS | Full GPS_INPUT support, ROS ecosystem | Heavy ROS dependency, complex setup, unnecessary overhead | ~5ms overhead | ⚠️ Overkill |
**Selected**: **pymavlink** — only viable Python library for GPS_INPUT. Pure Python, works on aarch64, full MAVLink v2 message set.
**Restriction note**: restrictions.md specifies "MAVSDK library" but MAVSDK-Python cannot send GPS_INPUT (confirmed: Issue #320, open since 2021). pymavlink is the necessary alternative.
Configuration:
- Connection: UART (`/dev/ttyTHS0` or `/dev/ttyTHS1` on Jetson, 115200-921600 baud)
- Flight controller: GPS1_TYPE=14, SERIAL2_PROTOCOL=2 (MAVLink2)
- GPS_INPUT rate: 5-10Hz (dedicated output thread)
- Heartbeat: 1Hz to maintain connection
### Component: Visual Odometry
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM v15.0.0 | 116fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source, low-texture terrain risk | ~9ms/frame | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | Open-source, learned features | No IMU integration, ~30-50ms | ~30-50ms/frame | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps | ~33ms/frame | ⚠️ Slower |
**Selected**: **cuVSLAM (mono+IMU mode)** — 116fps, purpose-built for Jetson.
### Component: Satellite Image Matching
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| LiteSAM (opt) TRT FP16 @ 1280px | TensorRT | Best satellite-aerial accuracy, 6.31M params | Untested on Orin Nano Super TRT | Est. ~165-330ms TRT FP16 | ✅ If ≤200ms |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, Jetson-proven, fastest | General-purpose | ~50-100ms | ✅ Fallback |
**Selection**: Day-one benchmark. LiteSAM TRT FP16 at 1280px → if ≤200ms → LiteSAM. If >200ms → XFeat.
### Component: Sensor Fusion
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| ESKF (custom) | Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | <1ms/step | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ~2-3ms/step | ⚠️ Upgrade path |
**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 states.
**Output rates**:
- IMU prediction: 100+Hz (from flight controller IMU via pymavlink)
- cuVSLAM VO update: ~3Hz
- Satellite update: ~0.3-1Hz (keyframes, async)
- GPS_INPUT output: 5-10Hz (ESKF predicted state)
**Drift budget**: Track √(σ_x² + σ_y²) from ESKF covariance. When approaching 100m → force every-frame satellite matching.
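The drift budget tracker can be sketched as below; the 80m trigger margin before the 100m budget is an assumption, not a specified value:

```python
import math

class DriftBudget:
    """Track horizontal drift from the ESKF covariance and force
    every-frame satellite matching before the budget is exhausted."""

    def __init__(self, budget_m=100.0, trigger_m=80.0):
        self.budget_m = budget_m    # AC: max cumulative VO drift
        self.trigger_m = trigger_m  # assumed margin for early triggering

    def drift_m(self, P):
        # P is the 2x2 horizontal position covariance block.
        return math.sqrt(P[0][0] + P[1][1])

    def force_every_frame_matching(self, P):
        return self.drift_m(P) >= self.trigger_m
```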
### Component: Ground Station Telemetry (NEW)
| Solution | Tools | Advantages | Limitations | Performance | Fit |
|----------|-------|-----------|-------------|------------|-----|
| MAVLink auto-forwarding + NAMED_VALUE_FLOAT | pymavlink | Standard MAVLink, no custom protocol, works with all GCS (Mission Planner, QGC) | Limited bandwidth (~12kbit/s), NAMED_VALUE_FLOAT name limited to 10 chars | ~50 bytes/msg | ✅ Best |
| Custom MAVLink dialect messages | pymavlink + custom XML | Full flexibility | Requires custom GCS plugin, non-standard | ~50 bytes/msg | ⚠️ Complex |
| Separate telemetry channel | TCP/UDP over separate radio | Full bandwidth | Extra hardware, extra radio | N/A | ❌ Not available |
**Selected**: **Standard MAVLink forwarding + NAMED_VALUE_FLOAT**
Telemetry data sent to ground station:
- GPS position: auto-forwarded by flight controller from GPS_INPUT data
- Confidence score: NAMED_VALUE_FLOAT `"gps_conf"` at 1Hz (values: 1=HIGH, 2=MEDIUM, 3=LOW, 4=VERY_LOW)
- Drift estimate: NAMED_VALUE_FLOAT `"gps_drift"` at 1Hz (meters)
- Matching status: NAMED_VALUE_FLOAT `"sat_match"` at 1Hz (0=inactive, 1=matching, 2=failed)
- Alerts: STATUSTEXT for critical events (re-localization request, system failure)
Re-localization from ground station:
- Operator sees drift/failure alert in GCS
- Sends COMMAND_LONG (MAV_CMD_USER_1) with lat/lon in param5/param6
- Companion computer listens for COMMAND_LONG with target component ID
- Receives hint → constrains tile search → attempts satellite matching near hint coordinates
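The hint-handling step can be sketched as a pure handler; `msg` is shown as a dict for illustration (a real pymavlink COMMAND_LONG message exposes the same fields as attributes), and `search` is an assumed callable into the tile-search component:

```python
# MAV_CMD_USER_1 = 31010 in the MAVLink common dialect
# (mavutil.mavlink.MAV_CMD_USER_1 in pymavlink).
MAV_CMD_USER_1 = 31010

def handle_command_long(msg, search):
    """Process an operator re-localization hint (sketch).

    msg carries approximate lat/lon in param5/param6 per the protocol
    above; search(lat, lon, radius_m) constrains the tile search."""
    if msg["command"] != MAV_CMD_USER_1:
        return False
    lat, lon = msg["param5"], msg["param6"]
    # Constrain satellite tile search to ±500m of the operator hint.
    search(lat, lon, radius_m=500.0)
    return True
```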
### Component: Startup & Lifecycle (NEW)
**Startup sequence**:
1. Boot Jetson → start GPS-Denied service (systemd)
2. Connect to flight controller via pymavlink on UART
3. Wait for heartbeat from flight controller
4. Read GLOBAL_POSITION_INT → extract lat, lon, alt
5. Initialize ESKF state with this position (high confidence if real GPS available)
6. Start cuVSLAM with first camera frames
7. Begin GPS_INPUT output loop at 5-10Hz
8. Preload satellite tiles within ±2km of flight plan into RAM
9. System ready — GPS-Denied active
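Steps 2-4 of the startup sequence can be sketched with pymavlink; the port, baud rate, and timeout are illustrative values, not the actual wiring configuration:

```python
def decode_global_position(msg):
    """GLOBAL_POSITION_INT scaling: lat/lon in degE7, alt in mm AMSL."""
    return {"lat": msg.lat / 1e7,
            "lon": msg.lon / 1e7,
            "alt_m": msg.alt / 1000.0}

def init_position_from_fc(port="/dev/ttyTHS1", baud=921600, timeout_s=30):
    """Connect, wait for heartbeat, read last known position (sketch)."""
    # Deferred import so the decode helper stays testable without hardware.
    from pymavlink import mavutil
    mav = mavutil.mavlink_connection(port, baud=baud)
    mav.wait_heartbeat(timeout=timeout_s)
    msg = mav.recv_match(type="GLOBAL_POSITION_INT",
                         blocking=True, timeout=timeout_s)
    if msg is None:
        raise TimeoutError("no GLOBAL_POSITION_INT from flight controller")
    return decode_global_position(msg), mav
```

The returned position seeds the ESKF (step 5); `mav` is reused for the GPS_INPUT output loop.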
**GPS denial detection**:
Not required — the system always outputs GPS_INPUT. If real GPS is available, the flight controller uses whichever GPS source has better accuracy (configurable GPS blending or priority). When real GPS degrades/lost, flight controller seamlessly uses our GPS_INPUT.
**Failsafe**:
- If no valid position estimate for N seconds (configurable, e.g., 10s): stop sending GPS_INPUT
- Flight controller detects GPS timeout → falls back to IMU-only dead reckoning
- System logs failure, continues attempting recovery (VO + satellite matching)
- When recovery succeeds: resume GPS_INPUT output
**Reboot recovery**:
1. Jetson reboots → re-establish pymavlink connection
2. Read GPS_RAW_INT (now IMU-extrapolated by flight controller since GPS_INPUT stopped)
3. Initialize ESKF with this position (low confidence, horiz_accuracy=100m+)
4. Resume cuVSLAM + satellite matching → accuracy improves over time
5. Resume GPS_INPUT output
### Component: Object Localization (UPDATED)
**Two modes**:
**Mode 1: Navigation camera (nadir)**
Frame-center GPS from ESKF. Any object in navigation camera frame:
1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by heading (yaw from IMU)
4. Convert meter offset to lat/lon delta, add to frame-center GPS
**Mode 2: AI camera (configurable angle and zoom)**
1. Get current UAV position from ESKF
2. Get AI camera params: tilt_angle (from vertical), pan_angle (from heading), zoom (effective focal length)
3. Get pixel coordinates of detected object in AI camera frame
4. Compute bearing: bearing = heading + pan_angle + atan2(dx_px × pixel_pitch, focal_eff), where pixel_pitch = sensor_width / image_width_px
5. Compute ground distance: for flat terrain, slant_range = altitude / cos(tilt_angle + dy_angle), ground_range = slant_range × sin(tilt_angle + dy_angle)
6. Convert bearing + ground_range to lat/lon offset
7. Return GPS coordinates with accuracy estimate
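The Mode 2 geometry above can be sketched as follows, under the document's flat-terrain assumption. The angular offsets `dx_rad`/`dy_rad` are assumed to be precomputed from pixel coordinates via the camera intrinsics, and the small-offset lat/lon conversion is an approximation valid for short ground ranges:

```python
import math

def localize_oblique(lat, lon, alt_m, heading_deg, tilt_deg, pan_deg,
                     dx_rad, dy_rad):
    """AI-camera object localization on flat terrain (sketch).

    tilt is measured from vertical; dx_rad/dy_rad are the object's
    angular offsets from the optical axis."""
    bearing = math.radians(heading_deg + pan_deg) + dx_rad
    depression = math.radians(tilt_deg) + dy_rad   # angle from nadir
    ground_range = alt_m * math.tan(depression)    # flat-terrain projection
    # Small-offset conversion: ~111320 m per degree of latitude.
    dlat = (ground_range * math.cos(bearing)) / 111_320.0
    dlon = (ground_range * math.sin(bearing)) / (111_320.0 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

The same meters-to-lat/lon step also closes out Mode 1, where `bearing` comes from the nadir pixel offset rotated by heading.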
**Local API** (FastAPI on localhost:8000):
- `POST /localize` — accepts: pixel_x, pixel_y, camera_id ("nav" or "ai"), ai_camera_params (tilt, pan, zoom) → returns: lat, lon, accuracy_m
- `GET /status` — returns: system state, confidence, drift, uptime
### Component: Satellite Tile Preprocessing (Offline)
**Selected**: GeoHash-indexed tile pairs on disk + RAM preloading.
Pipeline:
1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at zoom 19 (0.3 m/pixel)
3. If zoom 19 unavailable: fall back to zoom 18 (0.6 m/pixel — meets ≥0.5 m/pixel requirement)
4. Validate: resolution ≥0.5 m/pixel, check imagery staleness where possible
5. Pre-resize each tile to matcher input resolution
6. Store: original + resized + metadata (GPS bounds, zoom, GSD, download date) in GeoHash-indexed structure
7. Copy to Jetson storage before flight
8. At startup: preload tiles within ±2km of flight plan into RAM
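The resolution check in steps 2-4 follows from the standard Web Mercator ground-sample-distance formula for 256px tiles, sketched here for the validation step:

```python
import math

def web_mercator_gsd(lat_deg, zoom):
    """Ground sample distance (m/pixel) of a 256px Web Mercator tile
    at the given latitude and zoom level."""
    return 156543.03392 * math.cos(math.radians(lat_deg)) / (2 ** zoom)
```

At the equator this gives ~0.30 m/pixel at zoom 19 and ~0.60 m/pixel at zoom 18, matching the figures in the pipeline above; GSD shrinks further with latitude.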
### Component: Re-localization (Disconnected Segments)
When cuVSLAM reports tracking loss (sharp turn, no features):
1. Flag next frame as keyframe → trigger satellite matching
2. Compute IMU dead-reckoning position since last known position
3. Rank preloaded tiles by distance to dead-reckoning position
4. Try top 3 tiles sequentially
5. If match found: position recovered, new segment begins
6. If 3 consecutive keyframe failures: send STATUSTEXT alert to ground station ("RE-LOC REQUEST: position uncertain, drift Xm")
7. While waiting for operator hint: continue VO/IMU dead reckoning, report low confidence via horiz_accuracy
8. If operator sends COMMAND_LONG with lat/lon hint: constrain tile search to ±500m of hint
9. If still no match after operator hint: continue dead reckoning, log failure
### Component: Thermal Management (NEW)
**Power mode**: 25W (stable sustained performance)
**Monitoring**: Read GPU/CPU temperature via tegrastats or sysfs thermal zones at 1Hz.
**Adaptive pipeline**:
- Normal (<70°C): Full pipeline — cuVSLAM every frame + satellite match every 3-10 frames
- Warm (70-75°C): Reduce satellite matching to every 5-10 frames
- Hot (75-80°C): Reduce satellite matching to every 10-15 frames
- Throttling (>80°C): Disable satellite matching entirely, VO+IMU only (cuVSLAM ~9ms is very light). Report LOW confidence. Resume satellite matching when temp drops below 75°C
**Hardware requirement**: Active cooling fan (5V) mandatory for UAV companion computer enclosure.
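The adaptive bands above can be sketched as follows; the sysfs zone index varies by board (enumerate `/sys/class/thermal/thermal_zone*/type` to find the GPU zone), and the interval multipliers are illustrative:

```python
def read_temp_c(zone_path="/sys/class/thermal/thermal_zone1/temp"):
    """Read a thermal zone; sysfs reports millidegrees Celsius."""
    with open(zone_path) as f:
        return int(f.read().strip()) / 1000.0

def satellite_match_interval(temp_c, base=5):
    """Map temperature band to keyframe interval per the policy above;
    None disables satellite matching entirely (VO+IMU only)."""
    if temp_c > 80.0:       # throttling
        return None
    if temp_c > 75.0:       # hot: every 10-15 frames
        return base * 3
    if temp_c > 70.0:       # warm: every 5-10 frames
        return base * 2
    return base             # normal
```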
## Processing Time Budget (per frame, 333ms interval)
### Normal Frame (non-keyframe, ~60-80% of frames)
| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~9ms | NVIDIA CUDA-optimized, 116fps |
| ESKF measurement update | ~1ms | NumPy |
| **Total per camera frame** | **~22ms** | Well within 333ms |
GPS_INPUT output runs independently at 5-10Hz (every 100-200ms):
| Step | Time | Notes |
|------|------|-------|
| Read ESKF state | <0.1ms | Shared state |
| Compute horiz_accuracy | <0.1ms | √(σ²) |
| pymavlink gps_input_send | ~1ms | UART write |
| **Total per GPS_INPUT** | **~1ms** | Negligible overhead |
### Keyframe Satellite Matching (async, every 3-10 frames)
Runs on separate CUDA stream — does NOT block VO or GPS_INPUT.
**Path A — LiteSAM TRT FP16 at 1280px (if ≤200ms benchmark)**:
| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| LiteSAM (opt) TRT FP16 | ≤200ms | Go/no-go threshold |
| Geometric pose (RANSAC) | ~5ms | Homography |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async |
**Path B — XFeat (if LiteSAM >200ms)**:
| Step | Time | Notes |
|------|------|-------|
| XFeat extraction + matching | ~50-80ms | TensorRT FP16 |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async |
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | CUDA library + map state (configure pruning for 3000 frames) |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Preloaded satellite tiles | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink runtime | ~20MB | Lightweight |
| FastAPI (local IPC) | ~50MB | Minimal, localhost only |
| Current frame buffer | ~2MB | |
| ESKF state + buffers | ~10MB | |
| **Total** | **~2.1-2.9GB** | ~26-36% of 8GB — comfortable |
## Confidence Scoring → GPS_INPUT Mapping
| Level | Condition | horiz_accuracy (m) | fix_type | GPS_INPUT satellites_visible |
|-------|-----------|---------------------|----------|------------------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | 10-20 | 3 (3D) | 12 |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50 | 3 (3D) | 8 |
| LOW | cuVSLAM VO only, no recent correction, OR high thermal throttling | 50-100 | 2 (2D) | 4 |
| VERY LOW | IMU dead-reckoning only | 100-500 | 1 (no fix) | 1 |
| MANUAL | Operator-provided re-localization hint | 200 | 3 (3D) | 6 |
Note: `satellites_visible` is synthetic — used to influence EKF weighting. ArduPilot gives more weight to GPS with higher satellite count and lower horiz_accuracy.
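The table can be encoded directly; the numeric choices come from the table (band midpoints for horiz_accuracy), while the dict encoding itself is an implementation sketch:

```python
# Quality fields per confidence level, from the mapping table above.
CONFIDENCE_TO_GPS_INPUT = {
    "HIGH":     {"horiz_accuracy": 15.0,  "fix_type": 3, "satellites_visible": 12},
    "MEDIUM":   {"horiz_accuracy": 35.0,  "fix_type": 3, "satellites_visible": 8},
    "LOW":      {"horiz_accuracy": 75.0,  "fix_type": 2, "satellites_visible": 4},
    "VERY_LOW": {"horiz_accuracy": 300.0, "fix_type": 1, "satellites_visible": 1},
    "MANUAL":   {"horiz_accuracy": 200.0, "fix_type": 3, "satellites_visible": 6},
}

def gps_input_fields(level, drift_m=None):
    """Pick GPS_INPUT quality fields; prefer the live ESKF drift
    estimate over the band midpoint when one is available."""
    fields = dict(CONFIDENCE_TO_GPS_INPUT[level])
    if drift_m is not None:
        fields["horiz_accuracy"] = drift_m
    return fields
```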
## Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink (conflicts with restriction) | Use pymavlink. Document restriction conflict. No alternative in Python. |
| **cuVSLAM fails on low-texture agricultural terrain** | HIGH | Frequent tracking loss, degraded VO | Increase satellite matching frequency. IMU dead-reckoning bridge. Accept higher drift. |
| **Jetson UART instability with ArduPilot** | MEDIUM | MAVLink connection drops | Test thoroughly. Use USB serial adapter if UART unreliable. Add watchdog reconnect. |
| **Thermal throttling blows satellite matching budget** | MEDIUM | Miss keyframe windows | Adaptive pipeline: reduce/skip satellite matching at high temp. Active cooling mandatory. |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use XFeat | Day-one benchmark. XFeat fallback. |
| XFeat cross-view accuracy insufficient | MEDIUM | Satellite corrections less accurate | Multi-tile consensus, strict RANSAC, increase keyframe frequency. |
| cuVSLAM map memory growth on long flights | MEDIUM | Memory pressure | Configure map pruning, max keyframes. |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift. Alternative providers. |
| GPS_INPUT at 3Hz too slow for ArduPilot EKF | HIGH | Poor EKF fusion, position jumps | 5-10Hz output with IMU interpolation between camera frames. |
| Companion computer reboot mid-flight | LOW | ~30-60s GPS gap | Flight controller IMU fallback. Automatic recovery on restart. |
| Telemetry bandwidth saturation | LOW | Custom messages compete with autopilot telemetry | Limit NAMED_VALUE_FLOAT to 1Hz. Keep messages compact. |
## Testing Strategy
### Integration / Functional Tests
- End-to-end: camera → cuVSLAM → ESKF → GPS_INPUT → verify flight controller receives valid position
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m (target: 80%), percentage within 20m (target: 60%)
- Test GPS_INPUT rate: verify 5-10Hz output to flight controller
- Test sharp-turn handling: verify satellite re-localization after 90-degree heading change
- Test disconnected segments: simulate 3+ route breaks, verify all segments connected
- Test re-localization: simulate 3 consecutive failures → verify STATUSTEXT sent → inject COMMAND_LONG hint → verify recovery
- Test object localization: send POST /localize with known AI camera params → verify GPS accuracy
- Test startup: verify ESKF initializes from flight controller GPS
- Test reboot recovery: kill process → restart → verify reconnection and position recovery
- Test failsafe: simulate total failure → verify GPS_INPUT stops → verify flight controller IMU fallback
- Test cuVSLAM map memory: run 3000-frame session, monitor memory growth
### Non-Functional Tests
- **Day-one satellite matcher benchmark**: LiteSAM TRT FP16 at 1280px on Orin Nano Super
- cuVSLAM benchmark: verify 116fps monocular+IMU on Orin Nano Super
- cuVSLAM terrain stress test: urban, agricultural, water, forest
- **UART reliability test**: sustained pymavlink communication over 1+ hour
- **Thermal endurance test**: run full pipeline for 30+ minutes, measure GPU temp, verify no throttling with active cooling
- Per-frame latency: must be <400ms for VO pipeline
- GPS_INPUT latency: measure time from camera capture to GPS_INPUT send
- Memory: peak usage during 3000-frame session (must stay <8GB)
- Drift budget: verify ESKF covariance tracks cumulative drift, triggers satellite matching before 100m
- Telemetry bandwidth: measure total MAVLink bandwidth used by companion computer
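The per-frame latency test can be built around a small recorder that checks samples against the 400ms VO budget. A minimal sketch (the class name and report keys are illustrative):

```python
import statistics

class LatencyRecorder:
    """Collects per-frame pipeline latencies and checks them against a budget."""

    def __init__(self, budget_ms=400.0):
        self.budget_ms = budget_ms
        self.samples_ms = []

    def record(self, latency_ms):
        self.samples_ms.append(latency_ms)

    def report(self):
        return {
            "mean_ms": statistics.fmean(self.samples_ms),
            "max_ms": max(self.samples_ms),
            "violations": sum(s > self.budget_ms for s in self.samples_ms),
        }
```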
## References
- pymavlink GPS_INPUT example: https://webperso.ensta.fr/lebars/Share/GPS_INPUT_pymavlink.py
- pymavlink mavgps.py: https://github.com/ArduPilot/pymavlink/blob/master/examples/mavgps.py
- ArduPilot GPS Input module: https://ardupilot.org/mavproxy/docs/modules/GPSInput.html
- MAVLink GPS_INPUT message spec: https://mavlink.io/en/messages/common.html#GPS_INPUT
- MAVSDK-Python GPS_INPUT limitation: https://github.com/mavlink/MAVSDK-Python/issues/320
- MAVSDK-Python custom message limitation: https://github.com/mavlink/MAVSDK-Python/issues/739
- ArduPilot companion computer setup: https://ardupilot.org/dev/docs/raspberry-pi-via-mavlink.html
- Jetson Orin UART with ArduPilot: https://forums.developer.nvidia.com/t/uart-connection-between-jetson-nano-orin-and-ardupilot/325416
- MAVLink NAMED_VALUE_FLOAT: https://mavlink.io/en/messages/common.html#NAMED_VALUE_FLOAT
- MAVLink STATUSTEXT: https://mavlink.io/en/messages/common.html#STATUSTEXT
- MAVLink telemetry bandwidth: https://github.com/mavlink/mavlink/issues/1605
- JetPack 6.2 Super Mode: https://developer.nvidia.com/blog/nvidia-jetpack-6-2-brings-super-mode-to-nvidia-jetson-orin-nano-and-jetson-orin-nx-modules/
- Jetson Orin Nano power consumption: https://edgeaistack.app/blog/jetson-orin-nano-power-consumption/
- UAV target geolocation: https://www.mdpi.com/1424-8220/22/5/1903
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Intermodalics cuVSLAM benchmark: https://www.intermodalics.ai/blog/nvidia-isaac-ros-in-depth-cuvslam-and-the-dp3-1-release
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- STHN (IEEE RA-L 2024): https://github.com/arplaboratory/STHN
- JointLoc (IROS 2024): https://github.com/LuoXubo/JointLoc
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
- ArduPilot EKF Source Selection: https://ardupilot.org/copter/docs/common-ekf-sources.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| ONNX Runtime as potential inference runtime for AI models | **Performance**: ONNX Runtime CUDA EP on Jetson Orin Nano is 7-8x slower than TRT standalone with default settings (tensor cores not utilized). Even TRT-EP shows up to 3x overhead on some models. | **Use native TRT Engine for all AI models**. Convert PyTorch → ONNX → trtexec → .engine. Load with tensorrt Python module. Eliminates ONNX Runtime dependency entirely. |
| ONNX Runtime TRT-EP memory overhead | **Performance**: ONNX RT TRT-EP keeps serialized engine in memory (~420-440MB vs 130-140MB native TRT). Delta ~280-300MB PER MODEL. On 8GB shared memory, this wastes ~560-600MB for two models. | **Native TRT releases serialized blob after deserialization** → saves ~280-300MB per model. Total savings ~560-600MB — 7% of total memory. Critical given cuVSLAM map growth risk. |
| No explicit TRT engine build step in offline pipeline | **Functional**: Draft03 mentions TRT FP16 but doesn't define the build workflow. When/where are engines built? | **Add TRT engine build to offline preparation pipeline**: After satellite tile download, run trtexec on Jetson to build .engine files. Store alongside tiles. One-time cost per model version. |
| Cross-platform portability via ONNX Runtime | **Functional**: ONNX Runtime's primary value is cross-platform support. Our deployment is Jetson-only — this value is zero. We pay the performance/memory tax for unused portability. | **Drop ONNX Runtime**. Jetson Orin Nano Super is fixed deployment hardware. TRT Engine is the optimal runtime for NVIDIA-only deployment. |
| No DLA offloading considered | **Performance**: Draft03 doesn't mention DLA. Jetson Orin Nano has NO DLA cores — only Orin NX (1-2) and AGX Orin (2) have DLA. | **Confirm: DLA offloading is NOT available on Orin Nano**. All inference must run on GPU (1024 CUDA cores, 16 tensor cores). This makes maximizing GPU efficiency via native TRT even more critical. |
| LiteSAM MinGRU TRT compatibility risk | **Functional**: LiteSAM's subpixel refinement uses 4 stacked MinGRU layers over a 3×3 candidate window (seq_len=9). MinGRU gates depend only on input C_f (not h_{t-1}), so z_t/h̃_t are pre-computable. Ops are standard: Linear, Sigmoid, Mul, Add, ReLU, Tanh. Risk is LOW-MEDIUM — depends on whether implementation uses logcumsumexp (problematic) or simple loop (fine). Seq_len=9 makes this trivially rewritable. | **Day-one verification**: clone LiteSAM repo → torch.onnx.export → polygraphy inspect → trtexec --fp16. If export fails on MinGRU: rewrite forward() as unrolled loop (9 steps). **If LiteSAM cannot be made TRT-compatible: replace with EfficientLoFTR TRT** (proven TRT path via Coarse_LoFTR_TRT, 15.05M params, semi-dense matching). |
## Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running on a Jetson Orin Nano Super (8GB). All AI model inference uses **native TensorRT Engine files** — no ONNX Runtime dependency. The system replaces the GPS module by sending MAVLink GPS_INPUT messages via pymavlink over UART at 5-10Hz.
Position is determined by fusing: (1) CUDA-accelerated visual odometry (cuVSLAM — native CUDA), (2) absolute position corrections from satellite image matching (LiteSAM or XFeat — TRT Engine FP16), and (3) IMU data from the flight controller via ESKF.
**Inference runtime decision**: Native TRT Engine over ONNX Runtime because:
1. ONNX RT CUDA EP is 7-8x slower on Orin Nano (tensor core bug)
2. ONNX RT TRT-EP wastes ~280-300MB per model (serialized engine retained in memory)
3. Cross-platform portability has zero value — deployment is Jetson-only
4. Native TRT provides direct CUDA stream control for pipelining with cuVSLAM
**Hard constraint**: Camera shoots at ~3fps (333ms interval). Full VO+ESKF pipeline within 400ms. GPS_INPUT at 5-10Hz.
**AI Model Runtime Summary**:
| Model | Runtime | Precision | Memory | Integration |
|-------|---------|-----------|--------|-------------|
| cuVSLAM | Native CUDA (PyCuVSLAM) | N/A (closed-source) | ~200-500MB | CUDA Stream A |
| LiteSAM | TRT Engine | FP16 | ~50-80MB | CUDA Stream B |
| XFeat | TRT Engine | FP16 | ~30-50MB | CUDA Stream B (fallback) |
| ESKF | CPU (Python/C++) | FP64 | ~10MB | CPU thread |
**Offline Preparation Pipeline** (before flight):
1. Download satellite tiles → validate → pre-resize → store (existing)
2. **NEW: Build TRT engines on Jetson** (one-time per model version)
- `trtexec --onnx=litesam_fp16.onnx --saveEngine=litesam.engine --fp16`
- `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
3. Copy tiles + engines to Jetson storage
4. At startup: load engines + preload tiles into RAM
```
┌─────────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ 1. Satellite Tiles → Download & Validate → Pre-resize → Store │
│ (Google Maps) (≥0.5m/px, <2yr) (matcher res) (GeoHash)│
│ 2. TRT Engine Build (one-time per model version): │
│ PyTorch model → reparameterize → ONNX export → trtexec --fp16 │
│ Output: litesam.engine, xfeat.engine │
│ 3. Copy tiles + engines to Jetson storage │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ STARTUP: │
│ 1. pymavlink → read GLOBAL_POSITION_INT → init ESKF │
│ 2. Load TRT engines: litesam.engine + xfeat.engine │
│ (tensorrt.Runtime → deserialize_cuda_engine → create_context) │
│ 3. Allocate GPU buffers for TRT input/output (PyCUDA) │
│ 4. Start cuVSLAM with first camera frames │
│ 5. Preload satellite tiles ±2km into RAM │
│ 6. Begin GPS_INPUT output loop at 5-10Hz │
│ │
│ EVERY FRAME (3fps, 333ms interval): │
│ ┌──────────────────────────────────────┐ │
│ │ Nav Camera → Downsample (CUDA ~2ms) │ │
│ │ → cuVSLAM VO+IMU (~9ms) │ ← CUDA Stream A │
│ │ → ESKF measurement update │ │
│ └──────────────────────────────────────┘ │
│ │
│ 5-10Hz CONTINUOUS: │
│ ┌──────────────────────────────────────┐ │
│ │ ESKF IMU prediction → GPS_INPUT send │──→ Flight Controller │
│ └──────────────────────────────────────┘ │
│ │
│ KEYFRAMES (every 3-10 frames, async): │
│ ┌──────────────────────────────────────┐ │
│ │ TRT Engine inference (Stream B): │ │
│ │ context.enqueue_v3(stream_B) │──→ ESKF correction │
│ │ LiteSAM FP16 or XFeat FP16 │ │
│ └──────────────────────────────────────┘ │
│ │
│ TELEMETRY (1Hz): │
│ ┌──────────────────────────────────────┐ │
│ │ NAMED_VALUE_FLOAT: confidence, drift │──→ Ground Station │
│ └──────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
## Architecture
### Component: AI Model Inference Runtime
| Solution | Tools | Advantages | Limitations | Performance | Memory | Fit |
|----------|-------|-----------|-------------|------------|--------|-----|
| Native TRT Engine | tensorrt Python + PyCUDA + trtexec | Optimal latency, minimal memory, full tensor core usage, direct CUDA stream control | Hardware-specific engines, manual buffer management, rebuild per TRT version | Optimal | ~50-130MB total (both models) | ✅ Best |
| ONNX Runtime TRT-EP | onnxruntime + TensorRT EP | Auto-fallback for unsupported ops, simpler API, auto engine caching | +280-300MB per model, wrapper overhead, first-run latency spike | Near-parity (claimed), up to 3x slower (observed) | ~640-690MB total (both models) | ❌ Memory overhead unacceptable |
| ONNX Runtime CUDA EP | onnxruntime + CUDA EP | Simplest API, broadest op support | 7-8x slower on Orin Nano (tensor core bug), no TRT optimizations | 7-8x slower | Standard | ❌ Performance unacceptable |
| Torch-TensorRT | torch_tensorrt | AOT compilation, PyTorch-native, handles mixed TRT/PyTorch | Newer on Jetson, requires PyTorch runtime at inference | Near native TRT | PyTorch runtime ~500MB+ | ⚠️ Viable alternative if TRT export fails |
**Selected**: **Native TRT Engine** — optimal performance and memory on our fixed NVIDIA hardware.
**Fallback**: If any model has unsupported TRT ops (e.g., MinGRU in LiteSAM), use **Torch-TensorRT** for that specific model. Torch-TensorRT handles mixed TRT/PyTorch execution but requires PyTorch runtime in memory.
### Component: TRT Engine Conversion Workflow
**LiteSAM conversion**:
1. Load PyTorch model with trained weights
2. Reparameterize MobileOne backbone (collapse multi-branch → single Conv2d+BN)
3. Export to ONNX: `torch.onnx.export(model, dummy_input, "litesam.onnx", opset_version=17)`
4. Verify with polygraphy: `polygraphy inspect model litesam.onnx`
5. Build engine on Jetson: `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16 --memPoolSize=workspace:2048`
6. Verify engine: `trtexec --loadEngine=litesam.engine --fp16`
**XFeat conversion**:
1. Load PyTorch model
2. Export to ONNX: `torch.onnx.export(model, dummy_input, "xfeat.onnx", opset_version=17)`
3. Build engine on Jetson: `trtexec --onnx=xfeat.onnx --saveEngine=xfeat.engine --fp16`
4. Alternative: use XFeatTensorRT C++ implementation directly
**INT8 quantization strategy** (optional, future optimization):
- MobileOne backbone (CNN): INT8 safe with calibration data
- TAIFormer (transformer attention): FP16 only — INT8 degrades accuracy
- XFeat: evaluate INT8 on actual UAV-satellite pairs before deploying
- Use nvidia-modelopt for calibration: `from modelopt.onnx.quantization import quantize`
### Component: TRT Python Inference Wrapper
Minimal wrapper class for TRT engine inference:
```python
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda  # assumes an active CUDA context (e.g. via pycuda.autoinit)

class TRTInference:
    def __init__(self, engine_path, stream):
        self.logger = trt.Logger(trt.Logger.WARNING)
        self.runtime = trt.Runtime(self.logger)
        with open(engine_path, 'rb') as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        self.stream = stream
        self._allocate_buffers()

    def _allocate_buffers(self):
        # Pre-allocate all GPU I/O buffers once — no runtime allocation during flight
        self.inputs = {}
        self.outputs = {}
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            shape = self.engine.get_tensor_shape(name)
            dtype = trt.nptype(self.engine.get_tensor_dtype(name))
            size = trt.volume(shape)
            device_mem = cuda.mem_alloc(size * np.dtype(dtype).itemsize)
            self.context.set_tensor_address(name, int(device_mem))
            mode = self.engine.get_tensor_mode(name)
            if mode == trt.TensorIOMode.INPUT:
                self.inputs[name] = (device_mem, shape, dtype)
            else:
                self.outputs[name] = (device_mem, shape, dtype)

    def infer_async(self, input_data):
        # Queue host-to-device copies and inference on the stream without blocking the CPU
        for name, data in input_data.items():
            cuda.memcpy_htod_async(self.inputs[name][0], data, self.stream)
        self.context.execute_async_v3(self.stream.handle)  # Python API name for enqueueV3

    def get_output(self):
        results = {}
        for name, (dev_mem, shape, dtype) in self.outputs.items():
            host_mem = np.empty(shape, dtype=dtype)
            cuda.memcpy_dtoh_async(host_mem, dev_mem, self.stream)
            results[name] = host_mem
        self.stream.synchronize()
        return results
```
Key design: `infer_async()` + `get_output()` split enables pipelining with cuVSLAM on Stream A while satellite matching runs on Stream B.
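The overlap pattern can be illustrated in plain Python, with a single worker thread standing in for CUDA Stream B. This is an abstract sketch of the submit/collect split, not flight code — `satellite_match` and its 50ms sleep are placeholders for the TRT matcher:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def satellite_match(frame_id):
    # Stand-in for TRTInference.infer_async() + get_output() on Stream B
    time.sleep(0.05)  # pretend ~50ms of matcher latency
    return {"frame": frame_id, "correction": (0.0, 0.0)}

stream_b = ThreadPoolExecutor(max_workers=1)
pending, corrections = None, []
for frame_id in range(6):
    if frame_id % 3 == 0:  # keyframe: hand the frame to "Stream B"
        if pending is not None:
            corrections.append(pending.result())  # collect the previous keyframe first
        pending = stream_b.submit(satellite_match, frame_id)
    # ... cuVSLAM VO + ESKF for the current frame would run here (Stream A) ...
if pending is not None:
    corrections.append(pending.result())  # flush the last keyframe
```

VO never waits on the matcher mid-sequence; corrections are folded into the ESKF whenever they become available.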
### Component: Visual Odometry (UNCHANGED)
cuVSLAM — native CUDA library, not affected by TRT migration. Already optimal.
### Component: Satellite Image Matching (UPDATED runtime + fallback chain)
| Solution | Tools | Advantages | Limitations | Performance (est. Orin Nano Super TRT FP16) | Params | Fit |
|----------|-------|-----------|-------------|----------------------------------------------|--------|-----|
| LiteSAM (opt) TRT Engine FP16 @ 1280px | trtexec + tensorrt Python | Best satellite-aerial accuracy (RMSE@30=17.86m UAV-VisLoc), 6.31M params, smallest model | MinGRU TRT export needs verification (LOW-MEDIUM risk) | Est. ~165-330ms | 6.31M | ✅ Primary (if TRT export succeeds AND ≤200ms) |
| EfficientLoFTR TRT Engine FP16 | trtexec + tensorrt Python | Proven TRT path (Coarse_LoFTR_TRT repo, 138 stars). Semi-dense. CVPR 2024. High accuracy. | 2.4x more params than LiteSAM. Requires einsum→elementary ops rewrite for TRT (documented in Coarse_LoFTR_TRT paper). | Est. ~200-400ms | 15.05M | ✅ Fallback if LiteSAM TRT fails |
| XFeat TRT Engine FP16 | trtexec + tensorrt Python (or XFeatTensorRT C++) | Fastest. Proven TRT implementation. Lightweight. | General-purpose, not designed for cross-view satellite-aerial gap (but nadir-nadir gap is small). | Est. ~50-100ms | <5M | ✅ Speed fallback |
**Decision tree (day-one on Orin Nano Super)**:
1. Clone LiteSAM repo → reparameterize MobileOne → `torch.onnx.export()` → `polygraphy inspect`
2. If ONNX export succeeds → `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
3. If MinGRU causes ONNX/TRT failure → rewrite MinGRU forward() as unrolled 9-step loop → retry
4. If rewrite fails or accuracy degrades → **switch to EfficientLoFTR TRT**:
- Apply Coarse_LoFTR_TRT TRT-adaptation techniques (einsum replacement, etc.)
- Export to ONNX → trtexec --fp16
- Benchmark at 640×480 and 1280px
5. Benchmark winner: **if ≤200ms → use it. If >200ms but ≤300ms → acceptable (async on Stream B). If >300ms → use XFeat TRT**
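The benchmark gate in step 5 reduces to a small decision function. A sketch — the matcher names and the latency dict are illustrative:

```python
def select_matcher(bench_ms):
    """Pick the satellite matcher from day-one TRT FP16 benchmark results.

    bench_ms maps candidate name -> measured latency in ms; a missing
    entry means the candidate failed TRT export.
    """
    for name in ("litesam", "efficientloftr"):
        ms = bench_ms.get(name)
        if ms is not None and ms <= 300.0:
            # <=200ms: primary; 200-300ms: acceptable since matching runs async on Stream B
            return name, ("primary" if ms <= 200.0 else "acceptable_async")
    return "xfeat", "speed_fallback"
```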
**EfficientLoFTR TRT adaptation** (from Coarse_LoFTR_TRT paper, proven workflow):
- Replace `torch.einsum()` with elementary ops (view, bmm, reshape, sum)
- Replace any TRT-incompatible high-level PyTorch functions
- Use ONNX export path (less memory required than Torch-TensorRT on 8GB device)
- Knowledge distillation available for further parameter reduction if needed
### Component: Sensor Fusion (UNCHANGED)
ESKF — CPU-based mathematical filter, not affected.
### Component: Flight Controller Integration (UNCHANGED)
pymavlink — not affected by TRT migration.
### Component: Ground Station Telemetry (UNCHANGED)
MAVLink NAMED_VALUE_FLOAT — not affected.
### Component: Startup & Lifecycle (UPDATED)
**Updated startup sequence**:
1. Boot Jetson → start GPS-Denied service (systemd)
2. Connect to flight controller via pymavlink on UART
3. Wait for heartbeat from flight controller
4. **Initialize PyCUDA context**
5. **Load TRT engines**: litesam.engine + xfeat.engine via tensorrt.Runtime.deserialize_cuda_engine()
6. **Allocate GPU I/O buffers** for both models
7. **Create CUDA streams**: Stream A (cuVSLAM), Stream B (satellite matching)
8. Read GLOBAL_POSITION_INT → init ESKF
9. Start cuVSLAM with first camera frames
10. Begin GPS_INPUT output loop at 5-10Hz
11. Preload satellite tiles within ±2km into RAM
12. System ready
**Engine load time**: ~1-3 seconds per engine (deserialization from .engine file). One-time cost at startup.
### Component: Thermal Management (UNCHANGED)
Same adaptive pipeline. TRT engines are slightly more power-efficient than ONNX Runtime, but the difference is within noise.
### Component: Object Localization (UNCHANGED)
Not affected — trigonometric calculation, no AI inference.
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~9ms/frame)
Unchanged from draft03. Native CUDA, not part of TRT migration.
### 2. Native TRT Engine Inference (NEW)
All AI models run as pre-compiled TRT FP16 engines:
- Engine files built offline with trtexec (one-time per model version)
- Loaded at startup (~1-3s per engine)
- Inference via context.enqueue_v3() on dedicated CUDA Stream B
- GPU buffers pre-allocated — zero runtime allocation during flight
- No ONNX Runtime dependency — no framework overhead
Memory advantage over ONNX Runtime TRT-EP: ~560-600MB saved (both models combined).
Latency advantage: eliminates ONNX wrapper overhead, guaranteed tensor core utilization.
### 3. CUDA Stream Pipelining (REFINED)
- Stream A: cuVSLAM VO for current frame (~9ms) + ESKF fusion (~1ms)
- Stream B: TRT engine inference for satellite matching (LiteSAM or XFeat, async)
- CPU: GPS_INPUT output loop, NAMED_VALUE_FLOAT, command listener, tile management
- **NEW**: Both cuVSLAM and TRT engines use CUDA streams natively — no framework abstraction layer. Direct GPU scheduling.
### 4-7. (UNCHANGED from draft03)
Keyframe-based satellite matching, TensorRT FP16 optimization, proactive tile loading, 5-10Hz GPS_INPUT output — all unchanged.
## Processing Time Budget (per frame, 333ms interval)
### Normal Frame (non-keyframe)
Unchanged from draft03 — cuVSLAM dominates at ~22ms total.
### Keyframe Satellite Matching (async, CUDA Stream B)
**Path A — LiteSAM TRT Engine FP16 at 1280px**:
| Step | Time | Notes |
|------|------|-------|
| Downsample to 1280px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~1ms | Pre-loaded in RAM |
| Copy input to GPU buffer | <0.5ms | PyCUDA memcpy_htod_async |
| LiteSAM TRT Engine FP16 | ≤200ms | context.enqueue_v3(stream_B) |
| Copy output from GPU | <0.5ms | PyCUDA memcpy_dtoh_async |
| Geometric pose (RANSAC) | ~5ms | Homography |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **≤210ms** | Async on Stream B |
**Path B — XFeat TRT Engine FP16**:
| Step | Time | Notes |
|------|------|-------|
| XFeat TRT Engine inference | ~50-80ms | context.enqueue_v3(stream_B) |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~60-90ms** | Async on Stream B |
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory (Native TRT) | Memory (ONNX RT TRT-EP) | Notes |
|-----------|---------------------|--------------------------|-------|
| OS + runtime | ~1.5GB | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-500MB | ~200-500MB | CUDA library + map |
| **LiteSAM TRT engine** | **~50-80MB** | **~330-360MB** | Native TRT vs TRT-EP. If LiteSAM fails: EfficientLoFTR ~100-150MB |
| **XFeat TRT engine** | **~30-50MB** | **~310-330MB** | Native TRT vs TRT-EP |
| Preloaded satellite tiles | ~200MB | ~200MB | ±2km of flight plan |
| pymavlink + MAVLink | ~20MB | ~20MB | |
| FastAPI (local IPC) | ~50MB | ~50MB | |
| ESKF + buffers | ~10MB | ~10MB | |
| ONNX Runtime framework | **0MB** | **~150MB** | Eliminated with native TRT |
| **Total** | **~2.1-2.9GB** | **~2.8-3.6GB** | |
| **% of 8GB** | **26-36%** | **35-45%** | |
| **Savings** | — | — | **~700MB saved with native TRT** |
## Confidence Scoring → GPS_INPUT Mapping
Unchanged from draft03.
## Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| **LiteSAM MinGRU ops unsupported in TRT 10.3** | LOW-MEDIUM | LiteSAM TRT export fails | Day-one verification: ONNX export → polygraphy → trtexec. If MinGRU fails: (1) rewrite as unrolled 9-step loop, (2) if still fails: **switch to EfficientLoFTR TRT** (proven TRT path, Coarse_LoFTR_TRT, 15.05M params). XFeat TRT as speed fallback. |
| **TRT engine build OOM on 8GB Jetson** | LOW | Cannot build engines on target device | Our models are small (6.31M LiteSAM, <5M XFeat). OOM unlikely. If occurs: reduce --memPoolSize, or build on identical Orin Nano module with more headroom |
| **Engine incompatibility after JetPack update** | MEDIUM | Must rebuild engines | Include engine rebuild in JetPack update procedure. Takes minutes per model. |
| **MAVSDK cannot send GPS_INPUT** | CONFIRMED | Must use pymavlink | Unchanged from draft03 |
| **cuVSLAM fails on low-texture terrain** | HIGH | Frequent tracking loss | Unchanged from draft03 |
| **Thermal throttling** | MEDIUM | Satellite matching budget blown | Unchanged from draft03 |
| LiteSAM TRT FP16 >200ms at 1280px | MEDIUM | Must use fallback matcher | Day-one benchmark. Fallback chain: EfficientLoFTR TRT (if ≤300ms) → XFeat TRT (if all >300ms) |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Unchanged from draft03 |
## Testing Strategy
### Integration / Functional Tests
All tests from draft03 unchanged, plus:
- **TRT engine load test**: Verify litesam.engine and xfeat.engine load successfully on Jetson Orin Nano Super
- **TRT inference correctness**: Compare TRT engine output vs PyTorch reference output (max L1 error < 0.01)
- **CUDA Stream B pipelining**: Verify satellite matching on Stream B does not block cuVSLAM on Stream A
- **Engine pre-built validation**: Verify engine files from offline preparation work without rebuild at runtime
### Non-Functional Tests
All tests from draft03 unchanged, plus:
- **TRT engine build time**: Measure trtexec build time for LiteSAM and XFeat on Orin Nano Super (expected: 1-5 minutes each)
- **TRT engine load time**: Measure deserialization time (expected: 1-3 seconds each)
- **Memory comparison**: Measure actual GPU memory with native TRT vs ONNX RT TRT-EP for both models
- **MinGRU TRT compatibility** (day-one blocker):
1. Clone LiteSAM repo, load pretrained weights
2. Reparameterize MobileOne backbone
3. `torch.onnx.export(model, dummy, "litesam.onnx", opset_version=17)`
4. `polygraphy inspect model litesam.onnx` — check for unsupported ops
5. `trtexec --onnx=litesam.onnx --saveEngine=litesam.engine --fp16`
6. If step 3 or 5 fails on MinGRU: rewrite MinGRU forward() as unrolled loop, retry
7. If still fails: switch to EfficientLoFTR, apply Coarse_LoFTR_TRT adaptation
8. Compare TRT output vs PyTorch reference (max L1 error < 0.01)
- **EfficientLoFTR TRT fallback benchmark** (if LiteSAM fails): apply TRT adaptation from Coarse_LoFTR_TRT → ONNX → trtexec → measure latency at 640×480 and 1280px
- **Tensor core utilization**: Verify with NSight that TRT engines use tensor cores (unlike ONNX RT CUDA EP)
## References
- ONNX Runtime Issue #24085 (Jetson Orin Nano tensor core bug): https://github.com/microsoft/onnxruntime/issues/24085
- ONNX Runtime Issue #20457 (TRT-EP memory overhead): https://github.com/microsoft/onnxruntime/issues/20457
- ONNX Runtime Issue #12083 (TRT-EP vs native TRT): https://github.com/microsoft/onnxruntime/issues/12083
- NVIDIA TensorRT 10 Python API: https://docs.nvidia.com/deeplearning/tensorrt/10.15.1/inference-library/python-api-docs.html
- TensorRT Best Practices: https://docs.nvidia.com/deeplearning/tensorrt/latest/performance/best-practices.html
- TensorRT engine hardware specificity: https://github.com/NVIDIA/TensorRT/issues/1920
- trtexec ONNX conversion: https://nvidia-jetson.piveral.com/jetson-orin-nano/how-to-convert-onnx-to-engine-on-jetson-orin-nano-dev-board/
- Torch-TensorRT JetPack 6.2: https://docs.pytorch.org/TensorRT/v2.10.0/getting_started/jetpack.html
- XFeatTensorRT: https://github.com/PranavNedunghat/XFeatTensorRT
- JetPack 6.2 Release Notes: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/index.html
- Jetson Orin Nano Super: https://developer.nvidia.com/blog/nvidia-jetson-orin-nano-developer-kit-gets-a-super-boost/
- DLA on Jetson Orin: https://developer.nvidia.com/blog/maximizing-deep-learning-performance-on-nvidia-jetson-orin-with-dla/
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- EfficientLoFTR HuggingFace: https://huggingface.co/docs/transformers/en/model_doc/efficientloftr
- Coarse_LoFTR_TRT (TRT for embedded): https://github.com/Kolkir/Coarse_LoFTR_TRT
- Coarse_LoFTR_TRT paper: https://ar5iv.labs.arxiv.org/html/2202.00770
- LoFTR_TRT: https://github.com/Kolkir/LoFTR_TRT
- minGRU ("Were RNNs All We Needed?"): https://huggingface.co/papers/2410.01201
- minGRU PyTorch implementation: https://github.com/lucidrains/minGRU-pytorch
- LiteSAM paper (MinGRU details, Eqs 12-16): https://www.mdpi.com/2072-4292/17/19/3349
- DALGlue (UAV feature matching, 2025): https://www.nature.com/articles/s41598-025-21602-5
- All references from solution_draft03.md
## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Research artifacts (this assessment): `_docs/00_research/trt_engine_migration/`
- Previous research: `_docs/00_research/gps_denied_nav_v3/`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`
- Security analysis: `_docs/01_solution/security_analysis.md`
# Tech Stack Evaluation
## Requirements Summary
### Functional
- GPS-denied visual navigation for fixed-wing UAV
- Frame-center GPS estimation via VO + satellite matching + IMU fusion
- Object-center GPS via geometric projection
- Real-time streaming via REST API + SSE
- Disconnected route segment handling
- User-input fallback for unresolvable frames
### Non-Functional
- <400ms per-frame processing (camera @ ~3fps)
- <50m accuracy for 80% of frames, <20m for 60%
- <8GB total memory (CPU+GPU shared pool)
- Up to 3000 frames per flight session
- Image Registration Rate >95% (normal segments)
### Hardware Constraints
- **Jetson Orin Nano Super** (8GB LPDDR5, 1024 CUDA cores, 67 TOPS INT8)
- **JetPack 6.2.2**: CUDA 12.6.10, TensorRT 10.3.0, cuDNN 9.3
- ARM64 (aarch64) architecture
- No internet connectivity during flight
## Technology Evaluation
### Platform & OS
| Option | Version | Score (1-5) | Notes |
|--------|---------|-------------|-------|
| **JetPack 6.2.2 (L4T)** | Ubuntu 22.04 based | **5** | Only supported OS for Orin Nano Super. Includes CUDA 12.6, TensorRT 10.3, cuDNN 9.3 |
**Selected**: JetPack 6.2.2 — no alternative.
### Primary Language
| Option | Fitness | Maturity | Perf on Jetson | Ecosystem | Score |
|--------|---------|----------|----------------|-----------|-------|
| **Python 3.10+** | 5 | 5 | 4 | 5 | **4.8** |
| C++ | 5 | 5 | 5 | 3 | 4.5 |
| Rust | 3 | 3 | 5 | 2 | 3.3 |
**Selected**: **Python 3.10+** as primary language.
- cuVSLAM provides Python bindings (PyCuVSLAM v15.0.0)
- TensorRT has Python API
- FastAPI is Python-native
- OpenCV has full Python+CUDA bindings
- Performance-critical paths offloaded to CUDA via cuVSLAM/TensorRT — Python is glue code only
- C++ for custom ESKF if NumPy proves too slow (unlikely for 16-state EKF at 100Hz)
### Visual Odometry
| Option | Version | FPS on Orin Nano | Memory | License | Score |
|--------|---------|------------------|--------|---------|-------|
| **cuVSLAM (PyCuVSLAM)** | v15.0.0 (Mar 2026) | 116fps @ 720p | ~200-300MB | Free (NVIDIA, closed-source) | **5** |
| XFeat frame-to-frame | TensorRT engine | ~30-50ms/frame | ~50MB | MIT | 3.5 |
| ORB-SLAM3 | v1.0 | ~30fps | ~300MB | GPLv3 | 2.5 |
**Selected**: **PyCuVSLAM v15.0.0**
- 116fps on Orin Nano 8G at 720p (verified via Intermodalics benchmark)
- Mono + IMU mode natively supported
- Auto IMU fallback on tracking loss
- Pre-built aarch64 wheel: `pip install -e bin/aarch64`
- Loop closure built-in
**Risk**: Closed-source; nadir-only camera not explicitly tested. **Fallback**: XFeat frame-to-frame matching.
### Satellite Image Matching (Benchmark-Driven Selection)
**Day-one benchmark decides between two candidates:**
| Option | Params | Accuracy (UAV-VisLoc) | Est. Time on Orin Nano | License | Score |
|--------|--------|----------------------|----------------------|---------|-------|
| **LiteSAM (opt)** | 6.31M | RMSE@30 = 17.86m | ~300-500ms @ 480px (estimated) | Open-source | **4** (if fast enough) |
| **XFeat semi-dense** | ~5M | Not benchmarked on UAV-VisLoc | ~50-100ms | MIT | **4** (if LiteSAM too slow) |
**Decision rule**:
1. Export LiteSAM (opt) to TensorRT FP16 on Orin Nano Super
2. Benchmark at 480px, 640px, 800px
3. If ≤400ms at 480px → LiteSAM
4. If >400ms → **abandon LiteSAM, XFeat is primary**
| Requirement | LiteSAM (opt) | XFeat semi-dense |
|-------------|---------------|------------------|
| PyTorch → ONNX → TensorRT export | Required | Required |
| TensorRT FP16 engine | 6.31M params, ~25MB engine | ~5M params, ~20MB engine |
| Input preprocessing | Resize to 480px, normalize | Resize to 640px, normalize |
| Matching pipeline | End-to-end (detect + match + refine) | Detect → KNN match → geometric verify |
| Cross-view robustness | Designed for satellite-aerial gap | General-purpose, less robust |
### Sensor Fusion
| Option | Complexity | Accuracy | Compute @ 100Hz | Score |
|--------|-----------|----------|-----------------|-------|
| **ESKF (custom)** | Low | Good | <1ms/step | **5** |
| Hybrid ESKF/UKF | Medium | 49% better | ~2-3ms/step | 3.5 |
| GTSAM Factor Graph | High | Best | ~10-50ms/step | 2 |
**Selected**: **Custom ESKF in Python (NumPy/SciPy)**
- 16-state vector, well within NumPy capability
- FilterPy (v1.4.5, MIT) as reference/fallback, but custom implementation preferred for tighter control
- If 100Hz IMU prediction step proves slow in Python: rewrite as Cython or C extension (~1 day effort)
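As a structural sketch only (not the 16-state filter itself), a 1-D constant-velocity Kalman step shows the predict/update split the ESKF follows — prediction at IMU rate, update when a VO or satellite measurement arrives. Noise values `q` and `r` are arbitrary placeholders:

```python
import numpy as np

def kf_step(x, P, z, dt, q=0.5, r=4.0):
    """One predict+update cycle for a [position, velocity] state."""
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity transition
    H = np.array([[1.0, 0.0]])             # position-only measurement
    Q = q * np.eye(2)                      # process noise
    # Predict (runs at IMU rate, ~100Hz)
    x = F @ x
    P = F @ P @ F.T + Q
    # Update (runs when a position measurement arrives)
    S = H @ P @ H.T + r                    # innovation covariance
    K = P @ H.T / S                        # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

The real ESKF adds attitude, biases, and delayed-measurement handling, but each step is the same handful of small matrix products — well within NumPy's capability at 100Hz.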
### Image Preprocessing
| Option | Tool | Time on Orin Nano | Notes | Score |
|--------|------|-------------------|-------|-------|
| **OpenCV CUDA resize** | cv2.cuda.resize | ~2-3ms (pre-allocated) | Must build OpenCV with CUDA from source. Pre-allocate GPU mats to avoid allocation overhead | **4** |
| NVIDIA VPI resize | VPI 3.2 | ~1-2ms | Part of JetPack, potentially faster | 4 |
| CPU resize (OpenCV) | cv2.resize | ~5-10ms | No GPU needed, simpler | 3 |
**Selected**: **OpenCV CUDA** (pre-allocated GPU memory) or **VPI 3.2** (whichever is faster in benchmark). Both available in JetPack 6.2.
- Must build OpenCV from source with `CUDA_ARCH_BIN=8.7` for the Orin Nano's Ampere architecture
- Alternative: VPI 3.2 is pre-installed in JetPack 6.2, no build step needed
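The "pre-allocate to avoid per-frame allocation" pattern noted in the table can be illustrated CPU-side. This NumPy sketch mirrors the structure of the `cv2.cuda.resize` + pre-allocated `GpuMat` approach (allocate the destination once, reuse it every frame) with a simple nearest-neighbour downsample; the class and its behaviour are illustrative, not an OpenCV API.

```python
import numpy as np

class PreallocResizer:
    """Nearest-neighbour downsample writing into a buffer allocated once.

    Mirrors the cv2.cuda.resize pattern with a pre-allocated GpuMat:
    destination memory is created once and reused for every frame.
    """
    def __init__(self, src_hw, dst_hw, channels=3):
        sh, sw = src_hw
        dh, dw = dst_hw
        self.rows = np.arange(dh) * sh // dh   # source row per output row
        self.cols = np.arange(dw) * sw // dw   # source col per output col
        self.out = np.empty((dh, dw, channels), dtype=np.uint8)  # once

    def resize(self, frame):
        # Write into the pre-allocated output buffer
        self.out[...] = frame[self.rows[:, None], self.cols, :]
        return self.out
```

On the GPU side the same idea applies: allocate the destination `GpuMat` at startup and pass it to `cv2.cuda.resize` (or the VPI equivalent) every frame.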
### API & Streaming Framework
| Option | Version | Async Support | SSE Support | Score |
|--------|---------|--------------|-------------|-------|
| **FastAPI + sse-starlette** | FastAPI 0.115+, sse-starlette 3.3.2 | Native async/await | EventSourceResponse with auto-disconnect | **5** |
| Flask + flask-sse | Flask 3.x | Limited | Redis dependency | 2 |
| Raw aiohttp | aiohttp 3.x | Full | Manual SSE implementation | 3 |
**Selected**: **FastAPI + sse-starlette v3.3.2**
- sse-starlette: 108M downloads/month, BSD-3 license, production-stable
- Auto-generated OpenAPI docs
- Native async for non-blocking VO + satellite pipeline
- Uvicorn as ASGI server
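For reference, the wire format that `EventSourceResponse` emits per yielded event is plain text and easy to test without the framework. This stdlib sketch shows the SSE frame layout; the helper itself is illustrative (sse-starlette does this serialization internally).

```python
def sse_event(data, event=None, event_id=None):
    """Serialize one Server-Sent Events frame (the wire format that
    sse-starlette's EventSourceResponse produces for each yield)."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in str(data).splitlines() or [""]:
        lines.append(f"data: {chunk}")   # multi-line payloads repeat "data:"
    return "\n".join(lines) + "\n\n"     # blank line terminates the event
```

Knowing the frame layout makes the ground-station consumer trivially testable: it only needs to split the stream on blank lines and strip the `data:` prefixes.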
### Satellite Tile Storage & Indexing
| Option | Complexity | Lookup Speed | Score |
|--------|-----------|-------------|-------|
| **GeoHash-indexed directory** | Low | O(1) hash lookup | **5** |
| SQLite + spatial index | Medium | O(log n) | 4 |
| PostGIS | High | O(log n) | 2 (overkill) |
**Selected**: **GeoHash-indexed directory structure**
- Pre-flight: download tiles, store as `{geohash}/{zoom}_{x}_{y}.jpg` + `{geohash}/{zoom}_{x}_{y}_resized.jpg`
- Runtime: compute geohash from ESKF position → direct directory lookup
- Metadata in JSON sidecar files
- No database dependency on the Jetson during flight
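The runtime lookup described above can be sketched end to end. The geohash encoder below implements the standard algorithm (equivalent to `pygeohash.encode`); the `tile_path` helper and its zoom/x/y arguments follow the directory layout above, but the function name itself is illustrative.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=6):
    """Standard geohash encoding (what pygeohash.encode returns)."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    bits, even = [], True                 # even-indexed bits refine longitude
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            bits.append(1 if lon > mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon > mid else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat > mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat > mid else (lat_lo, mid)
        even = not even
    # Pack bits into base32 characters, 5 bits per character
    return "".join(
        _BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
        for i in range(0, len(bits), 5)
    )

def tile_path(lat, lon, zoom, x, y):
    """O(1) lookup path per the layout above: {geohash}/{zoom}_{x}_{y}.jpg"""
    return f"{geohash(lat, lon)}/{zoom}_{x}_{y}.jpg"
```

At runtime the ESKF position estimate feeds `tile_path` directly, so tile retrieval is a single filesystem access with no spatial query engine involved.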
### Satellite Tile Provider
| Provider | Max Zoom | GSD | Pricing | Eastern Ukraine Coverage | Score |
|----------|----------|-----|---------|--------------------------|-------|
| **Google Maps Tile API** | 18-19 | ~0.3-0.5 m/px | 100K tiles free/month, then $0.48/1K | Partial (conflict zone gaps) | **4** |
| Bing Maps | 18-19 | ~0.3-0.5 m/px | 125K free/year (basic) | Similar | 3.5 |
| Mapbox Satellite | 18-19 | ~0.5 m/px | 200K free/month | Similar | 3.5 |
**Selected**: **Google Maps Tile API** (per restrictions.md). 100K free tiles/month covers ~25km² at zoom 19. For larger operational areas, costs are manageable at $0.48/1K tiles.
### Output Format
| Format | Standard | Tooling | Score |
|--------|----------|---------|-------|
| **GeoJSON** | RFC 7946 | Universal GIS support | **5** |
| CSV (lat, lon, confidence) | De facto | Simple, lightweight | 4 |
**Selected**: **GeoJSON** as primary, CSV as export option. Per the acceptance criteria, all coordinates are WGS84.
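A position fix serialized per RFC 7946 looks like the sketch below, using only the stdlib. The property names (`confidence`, `source`) are assumptions about this system's schema, not part of the GeoJSON standard; note that GeoJSON coordinate order is longitude first.

```python
import json

def position_feature(lat, lon, confidence, source="satellite-anchored"):
    """One RFC 7946 Feature for a frame-center fix.
    GeoJSON coordinate order is [longitude, latitude] (WGS84)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"confidence": confidence, "source": source},
    }

fix = position_feature(48.45, 35.05, 0.92)
payload = json.dumps(fix)   # this string is what the SSE stream carries
```

Carrying the confidence score in `properties` keeps the satellite-anchored vs. VO-extrapolated distinction visible to any downstream GIS tool without a custom format.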
## Tech Stack Summary
```
┌────────────────────────────────────────────────────┐
│ HARDWARE: Jetson Orin Nano Super 8GB │
│ OS: JetPack 6.2.2 (L4T / Ubuntu 22.04) │
│ CUDA 12.6.10 / TensorRT 10.3.0 / cuDNN 9.3 │
├────────────────────────────────────────────────────┤
│ LANGUAGE: Python 3.10+ │
│ FRAMEWORK: FastAPI + sse-starlette 3.3.2 │
│ SERVER: Uvicorn (ASGI) │
├────────────────────────────────────────────────────┤
│ VISUAL ODOMETRY: PyCuVSLAM v15.0.0 │
│ SATELLITE MATCH: LiteSAM(opt) or XFeat (benchmark) │
│ SENSOR FUSION: Custom ESKF (NumPy/SciPy) │
│ PREPROCESSING: OpenCV CUDA or VPI 3.2 │
│ INFERENCE: TensorRT 10.3.0 (FP16) │
├────────────────────────────────────────────────────┤
│ TILE PROVIDER: Google Maps Tile API │
│ TILE STORAGE: GeoHash-indexed directory │
│ OUTPUT: GeoJSON (WGS84) via SSE stream │
└────────────────────────────────────────────────────┘
```
## Dependency List
### Python Packages (pip)
| Package | Version | Purpose |
|---------|---------|---------|
| pycuvslam | v15.0.0 (aarch64 wheel) | Visual odometry |
| fastapi | >=0.115 | REST API framework |
| sse-starlette | >=3.3.2 | SSE streaming |
| uvicorn | >=0.30 | ASGI server |
| numpy | >=1.26 | ESKF math, array ops |
| scipy | >=1.12 | Rotation matrices, spatial transforms |
| opencv-python (CUDA build) | >=4.8 | Image preprocessing (must build from source with CUDA) |
| torch (aarch64) | >=2.3 (JetPack-compatible) | LiteSAM model loading (if selected) |
| tensorrt | 10.3.0 (JetPack bundled) | Inference engine |
| pycuda | >=2024.1 | CUDA stream management |
| geojson | >=3.1 | GeoJSON output formatting |
| pygeohash | >=1.2 | GeoHash tile indexing |
### System Dependencies (JetPack 6.2.2)
| Component | Version | Notes |
|-----------|---------|-------|
| CUDA Toolkit | 12.6.10 | Pre-installed |
| TensorRT | 10.3.0 | Pre-installed |
| cuDNN | 9.3 | Pre-installed |
| VPI | 3.2 | Pre-installed, alternative to OpenCV CUDA for resize |
| cuVSLAM runtime | Bundled with PyCuVSLAM wheel | |
### Offline Preprocessing Tools (developer machine, not Jetson)
| Tool | Purpose |
|------|---------|
| Python 3.10+ | Tile download script |
| Google Maps Tile API key | Satellite tile access |
| torch + LiteSAM weights | Feature pre-extraction (if LiteSAM selected) |
| trtexec (TensorRT) | Model export to TensorRT engine |
## Risk Assessment
| Technology | Risk | Likelihood | Impact | Mitigation |
|-----------|------|-----------|--------|------------|
| cuVSLAM | Closed-source, nadir camera untested | Medium | High | XFeat frame-to-frame as open-source fallback |
| LiteSAM | May exceed 400ms on Orin Nano Super | High | High | **Abandon for XFeat** — day-one benchmark is go/no-go |
| OpenCV CUDA build | Build complexity on Jetson, CUDA arch compatibility | Medium | Low | VPI 3.2 as drop-in alternative (pre-installed) |
| Google Maps Tile API | Conflict zone coverage gaps, EEA restrictions | Medium | Medium | Test tile availability for operational area pre-flight; alternative providers (Bing, Mapbox) |
| Custom ESKF | Implementation bugs, tuning effort | Low | Medium | FilterPy v1.4.5 as reference; well-understood algorithm |
| Python GIL | Concurrent VO + satellite matching contention | Low | Low | CUDA operations release GIL; use asyncio + threading for I/O |
## Learning Requirements
| Technology | Team Expertise Needed | Ramp-up Time |
|-----------|----------------------|--------------|
| PyCuVSLAM | SLAM concepts, Python API, camera calibration | 2-3 days |
| TensorRT model export | ONNX export, trtexec, FP16 optimization | 2-3 days |
| LiteSAM architecture | Transformer-based matching (if selected) | 1-2 days |
| XFeat | Feature detection/matching concepts | 1 day |
| ESKF | Kalman filtering, quaternion math, multi-rate fusion | 3-5 days |
| FastAPI + SSE | Async Python, ASGI, SSE protocol | 1 day |
| GeoHash spatial indexing | Geospatial concepts | 0.5 days |
| Jetson deployment | JetPack, power modes, thermal management | 2-3 days |
## Development Environment
| Environment | Purpose | Setup |
|-------------|---------|-------|
| **Developer machine** (x86_64, GPU) | Development, unit testing, model export | Docker with CUDA + TensorRT |
| **Jetson Orin Nano Super** | Integration testing, benchmarking, deployment | JetPack 6.2.2 flashed, SSH access |
Code should be developed and unit-tested on x86_64, then deployed to Jetson for integration/performance testing. cuVSLAM and TensorRT engines are aarch64-only — mock these in x86_64 tests.
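One way to mock the aarch64-only dependencies in x86_64 unit tests is to register stub modules before the code under test imports them. This stdlib sketch shows the pattern; `Tracker` is an illustrative attribute name, and the real PyCuVSLAM API surface should be mirrored as needed.

```python
import sys
import types
from unittest import mock

def install_pycuvslam_stub():
    """Register a stub 'pycuvslam' module so x86_64 unit tests can import
    code that depends on the aarch64-only wheel. Attribute names here are
    illustrative; mirror the real PyCuVSLAM API as needed."""
    stub = types.ModuleType("pycuvslam")
    stub.Tracker = mock.MagicMock(name="pycuvslam.Tracker")
    sys.modules["pycuvslam"] = stub
    return stub

stub = install_pycuvslam_stub()
import pycuvslam          # resolves to the stub on x86_64
tracker = pycuvslam.Tracker()
```

The same technique covers `tensorrt` and `pycuda` in unit tests; integration tests on the Jetson then exercise the real wheels.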