Refactor acceptance criteria, problem description, and restrictions for UAV GPS-Denied system. Enhance clarity and detail in performance metrics, image processing requirements, and operational constraints. Introduce new sections for UAV specifications, camera details, satellite imagery, and onboard hardware.

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-17 09:00:06 +02:00
parent 767874cb90
commit f2aa95c8a2
35 changed files with 4857 additions and 26 deletions
@@ -0,0 +1,74 @@
# Acceptance Criteria Assessment
## Acceptance Criteria
| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| Position accuracy (80% of photos) | ≤50m error | 15-150m achievable depending on method. SatLoc (2025): <15m with adaptive fusion. Mateos-Ramirez (2024): 142m mean at 1000m+ altitude. At 400m altitude with better GSD (~6cm/px) and satellite correction, ≤50m for 80% is realistic | Moderate — requires high-quality satellite imagery and robust feature matching pipeline | **Modified** — see notes on satellite imagery quality dependency |
| Position accuracy (60% of photos) | ≤20m error | Achievable only with satellite-anchored corrections, not with VO alone. SatLoc reports <15m with satellite anchoring + VO fusion. Requires 0.3-0.5 m/px satellite imagery and good terrain texture | High — requires premium satellite imagery, robust cross-view matching, and careful calibration | **Modified** — add dependency on satellite correction frequency |
| Outlier tolerance | 350m displacement between consecutive photos | At 400m altitude, image footprint is ~375x250m. A 350m displacement means near-zero overlap. VO will fail; system must rely on IMU dead-reckoning or satellite re-localization | Low — standard outlier detection can handle this | Modified — specify fallback strategy (IMU dead-reckoning + satellite re-matching) |
| Sharp turn handling (partial overlap) | <200m drift, <70° angle, <5% overlap | Standard VO fails below ~20-30% overlap. With <5% overlap, feature matching between consecutive frames is unreliable. Requires satellite-based re-localization or IMU bridging | High — requires separate re-localization module | Modified — clarify: "70%" should likely be "70 degrees"; add IMU-bridge requirement |
| Disconnected route segments | System should reconnect disconnected chunks | This is essentially a place recognition / re-localization problem. Solvable via satellite image matching for each new segment independently | High — core architectural requirement affecting system design | Modified — add: each segment should independently localize via satellite matching |
| User fallback input | Ask user after 3 consecutive failures | Reasonable fallback. Needs UI/API integration for interactive input | Low | No change |
| Processing time per image | <5 seconds | On Jetson Orin Nano Super (8GB shared memory): feasible with optimized pipeline. CUDA feature extraction ~50ms, matching ~100-500ms, satellite crop+match ~1-3s. Full pipeline 2-4s is achievable with image downsampling and TensorRT optimization | Moderate — requires TensorRT optimization and image downsampling strategy | **Modified** — specify this is for Jetson Orin Nano Super, not RTX 2060 |
| Real-time streaming | SSE for immediate results + refinement | Standard pattern, well-supported | Low | No change |
| Image Registration Rate | >95% | For consecutive frames with nadir camera in good conditions: 90-98% achievable. Drops significantly during sharp turns and over low-texture terrain (water, uniform fields). The 95% target conflicts with sharp-turn handling requirement | Moderate — requires learning-based matchers (SuperPoint/LightGlue) | **Modified** — clarify: 95% applies to "normal flight" segments only; sharp-turn frames are expected failures handled by re-localization |
| Mean Reprojection Error | <1.0 pixels | Achievable with modern methods (LightGlue, SuperGlue). Traditional methods typically 1-3 px. Deep learning matchers routinely achieve 0.3-0.8 px with proper calibration | Moderate — requires deep learning feature matchers | No change — achievable |
| REST API + SSE architecture | Background service | Standard architecture, well-supported in Python (FastAPI + SSE) | Low | No change |
| Satellite imagery resolution | ≥0.5 m/px, ideally 0.3 m/px | Google Maps for eastern Ukraine: variable, typically 0.5-1.0 m/px in rural areas. 0.3 m/px unlikely from Google Maps. Commercial providers (Maxar, Planet) offer 0.3-0.5 m/px but at significant cost | **High** — Google Maps may not meet 0.5 m/px in all areas of the operational region. 0.3 m/px requires commercial satellite providers | **Modified** — current Google Maps limitation may make this unachievable for all areas; consider fallback for degraded satellite quality |
| Confidence scoring | Per-position estimate (high=satellite, low=VO) | Standard practice in sensor fusion. Easy to implement | Low | No change |
| Output format | WGS84, GeoJSON or CSV | Standard, trivial to implement | Negligible | No change |
| Satellite imagery age | <2 years where possible | Google Maps imagery for conflict zones (eastern Ukraine) may be significantly outdated or intentionally degraded. Recency is hard to guarantee | Medium — may need multiple satellite sources | **Modified** — flag: conflict zone imagery may be intentionally limited |
| Max VO cumulative drift | <100m between satellite corrections | VIO drift typically 0.8-1% of distance. Between corrections at 1km intervals: ~10m drift. 100m budget allows corrections every ~10km — very generous | Low — easily achievable if corrections happen at reasonable intervals | No change — generous threshold |
| Memory usage | <8GB shared memory (Jetson Orin Nano Super) | Binding constraint. 8GB LPDDR5 shared between CPU and GPU. ~6-7GB usable after OS. 26MP images need downsampling | **Critical** — all processing must fit within 8GB shared memory | **Updated** — changed to Jetson Orin Nano Super constraint |
| Object center coordinates | Accuracy consistent with frame-center accuracy | New criterion — derives from problem statement requirement | Low — once frame position is known, object position follows from pixel offset + GSD | **Added** |
| Sharp turn handling | <200m drift, <70 degrees, <5% overlap. 95% registration rate applies to normal flight only | Clarified from original "70%" to "70 degrees". Split registration rate expectation | Low — clarification only | **Updated** |
| Offline preprocessing time | Not time-critical (minutes/hours before flight) | New criterion — no constraint existed | Low | **Added** |
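The drift-budget criterion above is simple arithmetic worth making explicit. A minimal sketch, assuming the ~1% VIO drift rate cited in the research column (constants and function names are illustrative):

```python
# Sanity check for the VO drift budget (assumes ~1% drift of distance traveled).
DRIFT_RATE = 0.01          # VIO drift as a fraction of distance traveled
DRIFT_BUDGET_M = 100.0     # max allowed cumulative drift between corrections

def drift_between_corrections(interval_m: float, drift_rate: float = DRIFT_RATE) -> float:
    """Expected drift (m) accumulated over one correction interval."""
    return interval_m * drift_rate

def max_correction_interval(budget_m: float = DRIFT_BUDGET_M,
                            drift_rate: float = DRIFT_RATE) -> float:
    """Longest correction spacing (m) that keeps drift within budget."""
    return budget_m / drift_rate

print(drift_between_corrections(1_000))   # -> 10.0 m drift at 1 km correction spacing
print(max_correction_interval())          # -> 10000.0 m, i.e. corrections every ~10 km
```

At 1 km correction spacing the expected drift is an order of magnitude below the 100m budget, which is why the threshold is marked generous.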
## Restrictions Assessment
| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| Aircraft type | Fixed-wing only | Appropriate — fixed-wing has predictable motion model, mostly forward flight. Simplifies VO assumptions | N/A | No change |
| Camera mount | Downward-pointing, fixed, not autostabilized | Implies roll/pitch affect image. At 400m altitude, moderate roll/pitch causes manageable image shift. IMU data can compensate. Non-stabilized means more variable image overlap and orientation | Medium — must use IMU data for image dewarping or accept orientation-dependent accuracy | **Modified** — add: IMU-based image orientation correction should be considered |
| Operational region | Eastern/southern Ukraine (left bank of the Dnipro) | Conflict zone — satellite imagery may be degraded, outdated, or restricted. Terrain: mix of agricultural, urban, forest. Agricultural areas have seasonal texture changes | **High** — satellite imagery availability and quality is a significant risk | **Modified** — flag operational risk: imagery access in conflict zones |
| Image resolution | FullHD to 6252x4168, known camera parameters | 26MP at max is large for edge processing. Must downsample for feature extraction. Known camera intrinsics enable proper projective geometry | Medium — pipeline must handle variable resolutions | No change |
| Altitude | Predefined, ≤1km, terrain height negligible | At 400m: GSD ~6cm/px, footprint ~375x250m. Terrain "negligible" is an approximation — even 50m terrain variation at 400m altitude causes ~12% scale error. The referenced paper (Mateos-Ramirez 2024) shows terrain elevation is a primary error source | **Medium** — "terrain height negligible" needs qualification. At 400m, terrain variations >50m become significant | **Modified** — add: terrain height can be neglected only if variations <50m within image footprint |
| IMU data availability | "A lot of data from IMU" | IMU provides: accelerometer, gyroscope, magnetometer. Crucial for: dead-reckoning during feature-less frames, image orientation compensation, scale estimation, motion prediction. Standard tactical IMUs provide 100-400Hz data | Low — standard IMU integration | **Modified** — specify: IMU data includes gyroscope + accelerometer at ≥100Hz; will be used for orientation compensation and dead-reckoning fallback |
| Weather | Mostly sunny | Favorable for visual methods. Shadows can actually help feature matching. Reduces image quality variability | Low — favorable condition | No change |
| Satellite provider | Google Maps (potentially outdated) | **Critical limitation**: Google Maps satellite API has usage limits, unknown update frequency for eastern Ukraine, potential conflict-zone restrictions. Resolution may not meet 0.5 m/px in rural areas. No guarantee of recency | **High** — single-provider dependency is a significant risk | **Modified** — consider: (1) downloading tiles ahead of time for the operational area, (2) having a fallback provider strategy |
| Photo count | Up to 3000, typically 500-1500 | At 3fps and 500-1500 photos: 3-8 minutes of flight. At ~100m spacing: 50-150km route. Memory for 3000 pre-extracted satellite feature maps needs careful management on 8GB | Medium — batch processing and memory management needed | **Modified** — add: pipeline must manage memory for up to 3000 frames on 8GB device |
| Sharp turns | Next photo may have no common objects with previous | This is the hardest edge case. Complete visual discontinuity requires satellite-based re-localization. IMU provides heading/velocity for bridging. System must be architected around this possibility | High — drives core architecture decision | No change — already captured as a defining constraint |
| Processing hardware | Jetson Orin Nano Super, 67 TOPS | 8GB shared LPDDR5, 1024 CUDA cores, 32 Tensor Cores, 102 GB/s bandwidth. TensorRT for inference optimization. Power: 7-25W. Significantly less capable than desktop GPU | **Critical** — all processing must fit within 8GB shared memory, pipeline must be optimized for TensorRT | **Modified** — CONTRADICTS AC's RTX 2060 reference. Must be the binding constraint |
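Several rows above lean on IMU dead-reckoning as the fallback when VO fails (outliers, sharp turns). A minimal 2D sketch of that bridging step, assuming per-interval heading and ground-speed samples; the sample format and function name are illustrative, and a real system would integrate raw gyro/accelerometer data inside the fusion filter:

```python
import math

def dead_reckon(x: float, y: float,
                samples: list[tuple[float, float, float]]) -> tuple[float, float]:
    """Integrate (heading_deg, speed_mps, dt_s) samples from a known (x, y) in meters.

    Heading is measured clockwise from north (the y axis), a common aviation convention.
    """
    for heading_deg, speed_mps, dt_s in samples:
        h = math.radians(heading_deg)
        x += speed_mps * dt_s * math.sin(h)  # east component
        y += speed_mps * dt_s * math.cos(h)  # north component
    return x, y

# Bridge three ~0.4 s frame intervals heading due east at 20 m/s:
print(dead_reckon(0.0, 0.0, [(90.0, 20.0, 0.4)] * 3))  # ~24 m east, ~0 m north
```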
## Key Findings
1. **CRITICAL CONTRADICTION**: The AC mentions "RTX 2060 compatibility" (16GB RAM + 6GB VRAM) but the restriction specifies Jetson Orin Nano Super (8GB shared memory). These are fundamentally different platforms. **The Jetson must be the binding constraint.** All processing, including model weights, image buffers, and intermediate results, must fit within ~6-7GB usable memory (OS takes ~1-1.5GB).
2. **Satellite Imagery Risk**: Google Maps as the sole satellite provider for a conflict zone in eastern Ukraine presents significant quality, resolution, and recency risks. The 0.3 m/px "ideal" resolution is unlikely available from Google Maps for this region. The system design must be robust to degraded satellite reference quality (0.5-1.0 m/px).
3. **Accuracy is Achievable but Conditional**: The 50m/80% and 20m/60% accuracy targets are achievable based on recent research (SatLoc 2025: <15m with adaptive fusion), but **only when satellite corrections are successful**. VO-only segments will drift ~1% of distance traveled. The system must maximize satellite correction frequency.
4. **Sharp Turn Handling Drives Architecture**: The requirement to handle disconnected route segments with no visual overlap between consecutive frames means the system cannot rely solely on sequential VO. It must have an independent satellite-based geo-localization capability for each frame or segment — this is a core architectural requirement.
5. **Processing Time is Feasible**: <5s per image on Jetson Orin Nano Super is achievable with: (a) image downsampling (e.g., to 2000x1300), (b) TensorRT-optimized models, (c) efficient satellite region cropping. GPU-accelerated feature extraction takes ~50ms, matching ~100-500ms, satellite matching ~1-3s.
6. **Missing AC: Object Center Coordinates**: The problem statement mentions "coordinates of the center of any object in these photos" but no acceptance criterion specifies the accuracy requirement for this. Need to add.
7. **Missing AC: DEM/Elevation Data**: Research shows terrain elevation is a primary error source for pixel-to-meter conversion at these altitudes. If terrain variations are >50m, a DEM is needed. No current restriction mentions DEM availability.
8. **Missing AC: Offline Preprocessing Time**: No constraint on how long satellite image preprocessing can take before the flight.
9. **"70%" in Sharp Turn AC is Ambiguous**: "at an angle of less than 70%" — this likely means 70 degrees, not 70%.
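Finding 6 notes that object-center coordinates follow from pixel offset plus GSD once the frame pose is known. A nadir-view sketch of that conversion; the function name and the clockwise-from-north yaw convention are assumptions:

```python
import math

def object_offset_m(px_x: float, px_y: float,
                    img_w: int, img_h: int,
                    gsd_m: float, yaw_deg: float) -> tuple[float, float]:
    """East/north offset (m) of a pixel from the frame center.

    Assumes a nadir view, known GSD, and yaw measured clockwise from north.
    Image x grows rightward, image y grows downward.
    """
    dx = (px_x - img_w / 2) * gsd_m   # right of image center, in meters
    dy = (img_h / 2 - px_y) * gsd_m   # above image center, in meters
    yaw = math.radians(yaw_deg)
    east = dx * math.cos(yaw) + dy * math.sin(yaw)
    north = -dx * math.sin(yaw) + dy * math.cos(yaw)
    return east, north

# 100 px right of center at ~6 cm/px, yaw 0 -> ~6 m east of the frame center:
print(object_offset_m(3226, 2084, 6252, 4168, 0.06, 0.0))
```

At the ~6cm/px GSD, a 100-pixel offset is only ~6m, so object-center accuracy is dominated by frame-position accuracy, consistent with the new criterion.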
## Sources
- SatLoc: Hierarchical Adaptive Fusion Framework for GNSS-denied UAV Localization (2025) — <15m error, >90% coverage, 2+ Hz on edge hardware
- Mateos-Ramirez et al. "Visual Odometry in GPS-Denied Zones for Fixed-Wing UAV" (2024) — 142.88m mean error over 17km at 1000m+ altitude, 0.83% error rate with satellite correction
- NVIDIA Jetson Orin Nano Super specs: 8GB LPDDR5, 67 TOPS, 1024 CUDA cores, 102 GB/s bandwidth
- cuda-efficient-features: Feature extraction benchmarks — 4K in ~12ms on Jetson Xavier
- SIFT+LightGlue for UAV image mosaicking (ISPRS 2025) — superior performance across diverse scenarios
- SuperPoint+LightGlue comparative analysis (2024) — best balance of robustness, accuracy, efficiency
- Google Maps satellite resolution: 0.15m-30m depending on location and source imagery
- VIO drift benchmarks: 0.82-1% of distance traveled (EuRoC, outdoor flights)
- UAVSAR cross-modality matching: 1.83-2.86m RMSE with deep learning approach (Springer 2026)
@@ -0,0 +1,88 @@
# Question Decomposition
## Original Question
Research the GPS-denied onboard navigation problem for a fixed-wing UAV and find the best solution architecture. The system must determine frame-center GPS coordinates using visual odometry, satellite image matching, and IMU fusion — all running on a Jetson Orin Nano Super (8GB shared memory, 67 TOPS).
## Active Mode
Mode A Phase 2 — Initial Research (Problem & Solution)
## Rationale
No existing solution drafts. Full problem decomposition and solution research needed.
## Problem Context Summary (from INPUT_DIR)
- **Platform**: Fixed-wing UAV, downward-pointing camera (not stabilized), typical altitude 400m, max 1km
- **Camera**: ADTi Surveyor Lite 26S v2, 26MP (6252x4168), focal length 25mm, sensor width 23.5mm
- **GSD at 400m**: ~6cm/pixel, footprint ~375x250m
- **Frame rate**: 3 fps (nominal interval ~333ms; real-world intervals may be 400-500ms)
- **Photo count**: 500-3000 per flight
- **IMU**: Available at high rate
- **Initial GPS**: Known; GPS may be denied/spoofed during flight
- **Satellite reference**: Pre-uploaded Google Maps tiles
- **Hardware**: Jetson Orin Nano Super, 8GB shared memory, 67 TOPS
- **Region**: Eastern/southern Ukraine (conflict zone)
- **Key challenge**: Reconnecting disconnected route segments after sharp turns
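The GSD and footprint figures above follow from the standard pinhole relation, using the camera parameters from this summary (helper name is illustrative):

```python
def gsd_m_per_px(sensor_w_mm: float, altitude_m: float,
                 focal_mm: float, image_w_px: int) -> float:
    """Ground sample distance via the pinhole relation: sensor_w * alt / (focal * width)."""
    return (sensor_w_mm * altitude_m) / (focal_mm * image_w_px)

# ADTi Surveyor Lite 26S v2 at 400 m altitude:
gsd = gsd_m_per_px(23.5, 400.0, 25.0, 6252)
print(round(gsd, 3))                          # ~0.06 m/px
print(round(6252 * gsd), round(4168 * gsd))   # footprint ~376 x 251 m
```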
## Question Type Classification
**Decision Support** — we need to evaluate and select the best architectural approach and component technologies for each part of the pipeline.
## Research Subject Boundary Definition
| Dimension | Boundary |
|-----------|----------|
| Population | Fixed-wing UAVs with nadir cameras at 200-1000m altitude |
| Geography | Rural/semi-urban terrain in eastern Ukraine |
| Timeframe | Current state-of-the-art (2023-2026) |
| Level | Edge computing (Jetson-class, 8GB memory), real-time processing |
## Decomposed Sub-Questions
### A. Existing/Competitor Solutions
1. What existing systems solve GPS-denied UAV visual navigation?
2. What open-source implementations exist for VO + satellite matching?
3. What commercial/military solutions address this problem?
### B. Architecture Components
4. What is the optimal pipeline architecture (sequential vs parallel, streaming)?
5. How should VO, satellite matching, and IMU fusion be combined (loosely vs tightly coupled)?
6. How to handle disconnected route segments (the core architectural challenge)?
### C. Visual Odometry Component
7. What VO algorithms work best for aerial nadir imagery on edge hardware?
8. What feature extractors/matchers are optimal for Jetson (SuperPoint, ORB, XFeat)?
9. How to handle scale estimation with known altitude and camera parameters?
10. What is the optimal image downsampling strategy for 26MP on 8GB memory?
### D. Satellite Image Matching Component
11. How to efficiently match UAV frames against pre-loaded satellite tiles?
12. What cross-view matching methods work for aerial-to-satellite registration?
13. How to preprocess and index satellite tiles for fast retrieval?
14. How to handle resolution mismatch (6cm UAV vs 50cm+ satellite)?
### E. IMU Fusion Component
15. How to fuse IMU data with visual estimates (EKF, UKF, factor graph)?
16. How to use IMU for dead-reckoning during feature-less frames?
17. How to use IMU for image orientation compensation (non-stabilized camera)?
### F. Edge Optimization
18. How to fit the full pipeline in 8GB shared memory?
19. What TensorRT optimizations are available for feature extractors?
20. How to achieve <5s per frame on Jetson Orin Nano Super?
### G. API & Streaming
21. What is the best approach for REST API + SSE on Python/Jetson?
22. How to implement progressive result refinement?
## Timeliness Sensitivity Assessment
- **Research Topic**: GPS-denied UAV visual navigation with edge processing
- **Sensitivity Level**: 🟠 High
- **Rationale**: Deep learning feature matchers (SuperPoint, LightGlue, XFeat) and edge inference frameworks (TensorRT) evolve rapidly. Jetson Orin Nano Super is a recent (Dec 2024) product. Cross-view geo-localization is an active research area.
- **Source Time Window**: 12 months (prioritize 2025-2026)
- **Priority official sources to consult**:
1. NVIDIA Jetson documentation and benchmarks
2. OpenCV / kornia / hloc official docs
3. Recent papers on cross-view geo-localization (CVPR, ECCV, ICCV 2024-2025)
- **Key version information to verify**:
- JetPack SDK: Current version ____
- SuperPoint/LightGlue: Latest available for TensorRT ____
- XFeat: Version and Jetson compatibility ____
@@ -0,0 +1,151 @@
# Source Registry
## Source #1
- **Title**: Visual Odometry in GPS-Denied Zones for Fixed-Wing UAV with Reduced Accumulative Error Based on Satellite Imagery
- **Link**: https://www.mdpi.com/2076-3417/14/16/7420
- **Tier**: L1
- **Publication Date**: 2024-08-22
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Fixed-wing UAV GPS-denied navigation
- **Research Boundary Match**: ✅ Full match
- **Summary**: VO + satellite correction pipeline for fixed-wing UAV at 1000m+ altitude. Mean error 142.88m over 17km (0.83%). Uses ORB features, centroid-based displacement, Kalman filter smoothing, quadtree for satellite keypoint indexing.
- **Related Sub-question**: A1, B5, C7, D11
## Source #2
- **Title**: SatLoc: Hierarchical Adaptive Fusion Framework for GNSS-denied UAV Localization
- **Link**: https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: UAV localization in GNSS-denied environments
- **Research Boundary Match**: ✅ Full match
- **Summary**: Three-layer fusion: DinoV2 for satellite geo-localization, XFeat for VO, optical flow for velocity. Adaptive confidence-based weighting. <15m error, >90% coverage, 2+ Hz on edge hardware.
- **Related Sub-question**: B4, B5, C8, D12
## Source #3
- **Title**: XFeat: Accelerated Features for Lightweight Image Matching (CVPR 2024)
- **Link**: https://arxiv.org/abs/2404.19174
- **Tier**: L1
- **Publication Date**: 2024-04
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Edge device feature matching
- **Research Boundary Match**: ✅ Full match
- **Summary**: 5x faster than SuperPoint, runs on CPU at VGA resolution. Sparse and semi-dense matching. TensorRT deployment available for Jetson. Comparable accuracy to SuperPoint.
- **Related Sub-question**: C8, F18, F20
## Source #4
- **Title**: XFeat TensorRT Implementation
- **Link**: https://github.com/PranavNedunghat/XFeatTensorRT
- **Tier**: L2
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Summary**: C++ TensorRT implementation of XFeat, tested on Jetson Orin NX 16GB with JetPack 6.0, CUDA 12.2, TensorRT 8.6.
- **Related Sub-question**: C8, F18, F19
## Source #5
- **Title**: SuperPoint+LightGlue TensorRT Deployment
- **Link**: https://github.com/fettahyildizz/superpoint_lightglue_tensorrt
- **Tier**: L2
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Summary**: C++ TensorRT implementation of SuperPoint+LightGlue. Production-ready deployment for Jetson platforms.
- **Related Sub-question**: C8, F19
## Source #6
- **Title**: FP8 Quantized LightGlue in TensorRT
- **Link**: https://fabio-sim.github.io/blog/fp8-quantized-lightglue-tensorrt-nvidia-model-optimizer/
- **Tier**: L2
- **Publication Date**: 2026
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Up to ~6x speedup with FP8 quantization. Requires Hopper/Ada Lovelace GPUs; the Jetson Orin Nano's Ampere architecture lacks FP8 support, so FP16 is the best available TensorRT precision there.
- **Related Sub-question**: F19
## Source #7
- **Title**: NVIDIA JetPack 6.2 Release Notes
- **Link**: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/index.html
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Summary**: CUDA 12.6.10, TensorRT 10.3.0, cuDNN 9.3. Super Mode for Orin Nano: up to 2x inference performance, 50% memory bandwidth boost. Power modes: 15W, 25W, MAXN SUPER.
- **Related Sub-question**: F18, F19, F20
## Source #8
- **Title**: cuda-efficient-features (GPU feature detection benchmarks)
- **Link**: https://github.com/fixstars/cuda-efficient-features
- **Tier**: L2
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Summary**: 4K detection: 12ms on Jetson Xavier. 8K: 27.5ms. 40K keypoints extraction: 20-25ms on Xavier. Orin Nano Super should be comparable or better.
- **Related Sub-question**: F20
## Source #9
- **Title**: Adaptive Covariance Hybrid EKF/UKF for Visual-Inertial Odometry
- **Link**: https://arxiv.org/abs/2512.17505
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Hybrid EKF/UKF achieves 49% better position accuracy, 57% better rotation accuracy than ESKF alone, at 48% lower computational cost than full UKF. Includes adaptive sensor confidence scoring.
- **Related Sub-question**: E15
## Source #10
- **Title**: SIFT+LightGlue for UAV Image Mosaicking (ISPRS 2025)
- **Link**: https://isprs-archives.copernicus.org/articles/XLVIII-2-W11-2025/169/2025/
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Summary**: SIFT+LightGlue outperforms SuperPoint+LightGlue for UAV mosaicking across diverse scenarios. Superior in both low-texture and high-texture environments.
- **Related Sub-question**: C8, D12
## Source #11
- **Title**: UAVision - GNSS-Denied UAV Visual Localization System
- **Link**: https://github.com/ArboriseRS/UAVision
- **Tier**: L4
- **Publication Date**: 2024-2025
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Open-source system using LightGlue for map matching. Includes image processing modules and visualization.
- **Related Sub-question**: A2
## Source #12
- **Title**: TerboucheHacene/visual_localization
- **Link**: https://github.com/TerboucheHacene/visual_localization
- **Tier**: L4
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Vision-based GNSS-free localization with SuperPoint/SuperGlue/GIM matching. Optimized VO + satellite image matching hybrid pipeline. Learning-based matchers for natural environments.
- **Related Sub-question**: A2, D12
## Source #13
- **Title**: GNSS-Denied Geolocalization with Terrain Constraints
- **Link**: https://github.com/yfs90/gnss-denied-uav-geolocalization
- **Tier**: L4
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Summary**: No altimeters/IMU required, uses image matching + terrain constraints. GPS-comparable accuracy for day/night across varied terrain.
- **Related Sub-question**: A2
## Source #14
- **Title**: Google Maps Tile API Documentation
- **Link**: https://developers.google.com/maps/documentation/tile/satellite
- **Tier**: L1
- **Publication Date**: Current
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Zoom levels 0-22. Satellite tiles via HTTPS. Session tokens required. Bulk download possible but subject to usage policies.
- **Related Sub-question**: D13
## Source #15
- **Title**: NaviLoc: Trajectory-Level Visual Localization for GNSS-Denied UAV Navigation
- **Link**: https://www.mdpi.com/2504-446X/10/2/97
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Summary**: Trajectory-level optimization rather than per-frame matching. Optimizes entire trajectory against satellite reference for improved accuracy.
- **Related Sub-question**: B4, D11
## Source #16
- **Title**: GSD Estimation for UAV Photogrammetry
- **Link**: https://blog.truegeometry.com/calculators/UAV_photogrammetry_workflows_calculation.html
- **Tier**: L3
- **Publication Date**: Current
- **Timeliness Status**: ✅ Currently valid
- **Summary**: GSD = (sensor_width × altitude) / (focal_length × image_width). For our case: (23.5mm × 400m) / (25mm × 6252) = 0.06 m/pixel.
- **Related Sub-question**: C9
@@ -0,0 +1,121 @@
# Fact Cards
## Fact #1
- **Statement**: XFeat achieves up to 5x faster inference than SuperPoint while maintaining comparable accuracy for pose estimation. It runs in real-time on CPU at VGA resolution.
- **Source**: Source #3 (CVPR 2024 paper)
- **Phase**: Phase 2
- **Target Audience**: Edge device deployments
- **Confidence**: ✅ High
- **Related Dimension**: Feature Extraction
## Fact #2
- **Statement**: XFeat TensorRT implementation exists and is tested on Jetson Orin NX 16GB with JetPack 6.0, CUDA 12.2, TensorRT 8.6.
- **Source**: Source #4
- **Phase**: Phase 2
- **Target Audience**: Jetson platform deployment
- **Confidence**: ✅ High
- **Related Dimension**: Feature Extraction, Edge Optimization
## Fact #3
- **Statement**: SatLoc framework achieves <15m absolute localization error with >90% trajectory coverage at 2+ Hz on edge hardware, using DinoV2 for satellite matching, XFeat for VO, and optical flow for velocity.
- **Source**: Source #2
- **Phase**: Phase 2
- **Target Audience**: GNSS-denied UAV localization
- **Confidence**: ⚠️ Medium (paper details not fully accessible)
- **Related Dimension**: Overall Architecture, Accuracy
## Fact #4
- **Statement**: Mateos-Ramirez et al. achieved 142.88m mean error over 17km (0.83% error rate) with VO + satellite correction on a fixed-wing UAV at 1000m+ altitude. Without satellite correction, error accumulated to 850m+ over 17km.
- **Source**: Source #1
- **Phase**: Phase 2
- **Target Audience**: Fixed-wing UAV at high altitude
- **Confidence**: ✅ High
- **Related Dimension**: Accuracy, Architecture
## Fact #5
- **Statement**: VIO systems typically drift 0.8-1% of distance traveled. Between satellite corrections at 1km intervals, expected drift is ~10m.
- **Source**: Multiple sources (arxiv VIO benchmarks)
- **Phase**: Phase 2
- **Target Audience**: Aerial VIO systems
- **Confidence**: ✅ High
- **Related Dimension**: VO Drift
## Fact #6
- **Statement**: Jetson Orin Nano Super: 8GB LPDDR5 shared memory, 1024 CUDA cores, 32 Tensor Cores, 102 GB/s bandwidth, 67 TOPS INT8. JetPack 6.2: CUDA 12.6.10, TensorRT 10.3.0.
- **Source**: Source #7
- **Phase**: Phase 2
- **Target Audience**: Hardware specification
- **Confidence**: ✅ High
- **Related Dimension**: Edge Optimization
## Fact #7
- **Statement**: CUDA-accelerated feature detection at 4K (3840x2160): ~12ms on Jetson Xavier. At 8K: ~27.5ms. Descriptor extraction for 40K keypoints: ~20-25ms on Xavier. Orin Nano Super has comparable or slightly better compute.
- **Source**: Source #8
- **Phase**: Phase 2
- **Target Audience**: Jetson GPU performance
- **Confidence**: ✅ High
- **Related Dimension**: Processing Time
## Fact #8
- **Statement**: Hybrid EKF/UKF achieves 49% better position accuracy than ESKF alone at 48% lower computational cost than full UKF. Includes adaptive sensor confidence scoring based on image entropy and motion blur.
- **Source**: Source #9
- **Phase**: Phase 2
- **Target Audience**: VIO fusion
- **Confidence**: ✅ High
- **Related Dimension**: Sensor Fusion
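Fact #8's hybrid EKF/UKF is beyond a note-sized sketch, but the inverse-variance weighting at the core of any such fusion is compact. An illustrative 1D version (the real system fuses full state vectors, and the function name is ours):

```python
def fuse(est_a: float, var_a: float, est_b: float, var_b: float) -> tuple[float, float]:
    """Inverse-variance weighted fusion of two independent 1D position estimates."""
    w = var_b / (var_a + var_b)            # weight on estimate A
    est = w * est_a + (1.0 - w) * est_b
    var = var_a * var_b / (var_a + var_b)  # fused variance is smaller than either input
    return est, var

# Drifty VO estimate (sigma 30 m) fused with a satellite fix (sigma 10 m):
print(fuse(1050.0, 30.0**2, 1000.0, 10.0**2))  # pulled strongly toward the satellite fix
```

This is also the mechanism behind the per-position confidence criterion: the fused variance directly yields a confidence score.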
## Fact #9
- **Statement**: SIFT+LightGlue outperforms SuperPoint+LightGlue for UAV mosaicking across diverse scenarios (low-texture agricultural and high-texture urban).
- **Source**: Source #10
- **Phase**: Phase 2
- **Target Audience**: UAV image matching
- **Confidence**: ✅ High
- **Related Dimension**: Feature Matching
## Fact #10
- **Statement**: GSD for our system at 400m: (23.5mm × 400m) / (25mm × 6252px) = 0.060 m/pixel. Image footprint: 6252 × 0.06 = 375m width, 4168 × 0.06 = 250m height.
- **Source**: Source #16 + camera parameters
- **Phase**: Phase 2
- **Target Audience**: Our specific system
- **Confidence**: ✅ High
- **Related Dimension**: Scale Estimation
## Fact #11
- **Statement**: Google Maps satellite tiles available via Tile API at zoom levels 0-22. Max zoom varies by region. For eastern Ukraine, zoom 18 (~0.6 m/px) is typically available; zoom 19 (~0.3 m/px) may not be.
- **Source**: Source #14
- **Phase**: Phase 2
- **Target Audience**: Satellite imagery
- **Confidence**: ⚠️ Medium (exact zoom availability for eastern Ukraine unverified)
- **Related Dimension**: Satellite Reference
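Fact #11's zoom-to-resolution figures can be derived from the standard Web Mercator ground-resolution formula for 256-px tiles (the function itself is a sketch):

```python
import math

def ground_resolution_m_per_px(zoom: int, lat_deg: float = 0.0) -> float:
    """Web Mercator ground resolution: earth circumference / (256 * 2^zoom), scaled by cos(lat)."""
    earth_circumference_m = 40_075_016.686
    return earth_circumference_m * math.cos(math.radians(lat_deg)) / (256 * 2 ** zoom)

print(round(ground_resolution_m_per_px(18), 3))        # ~0.597 m/px at the equator
print(round(ground_resolution_m_per_px(18, 48.5), 3))  # ~0.396 m/px at ~48.5 deg N
```

Note the latitude dependence: zoom 18 is ~0.6 m/px at the equator but ~0.4 m/px at eastern Ukraine's latitude, which works slightly in favor of the 0.5 m/px target.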
## Fact #12
- **Statement**: FP8 quantization for LightGlue requires Hopper/Ada GPUs. Jetson Orin Nano uses Ampere architecture — limited to FP16 as best TensorRT precision.
- **Source**: Source #6, Source #7
- **Phase**: Phase 2
- **Target Audience**: Jetson optimization
- **Confidence**: ✅ High
- **Related Dimension**: Edge Optimization
## Fact #13
- **Statement**: SuperPoint+LightGlue TensorRT C++ deployment is available and production-tested. ONNX Runtime path achieves 2-4x speedup over compiled PyTorch.
- **Source**: Source #5, Source #6
- **Phase**: Phase 2
- **Target Audience**: Production deployment
- **Confidence**: ✅ High
- **Related Dimension**: Feature Matching, Edge Optimization
## Fact #14
- **Statement**: Cross-view matching (UAV-to-satellite) is fundamentally harder than same-view matching due to extreme viewpoint differences. Deep learning embeddings (DinoV2, CLIP-based) are the state-of-the-art for coarse retrieval. Local features are used for fine alignment.
- **Source**: Multiple (Sources #2, #12, #15)
- **Phase**: Phase 2
- **Target Audience**: Cross-view geo-localization
- **Confidence**: ✅ High
- **Related Dimension**: Satellite Matching
## Fact #15
- **Statement**: Quadtree spatial indexing enables O(log n) nearest-neighbor lookup for satellite keypoints. Combined with GeoHash for fast region encoding, this is the standard approach for tile management.
- **Source**: Sources #1, #14
- **Phase**: Phase 2
- **Target Audience**: Spatial indexing
- **Confidence**: ✅ High
- **Related Dimension**: Satellite Tile Management
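Fact #15's quadtree lookup reduces to point insertion plus rectangle queries. A minimal sketch (class shape is illustrative; a production index would store descriptors alongside coordinates and tune node capacity against lookup depth):

```python
class QuadTree:
    """Minimal point quadtree over an axis-aligned square region."""

    def __init__(self, cx, cy, half, capacity=4):
        self.cx, self.cy, self.half = cx, cy, half
        self.capacity = capacity
        self.points = []
        self.children = None  # four sub-quadrants, created on first split

    def insert(self, x, y):
        if abs(x - self.cx) > self.half or abs(y - self.cy) > self.half:
            return False  # outside this node's square
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((x, y))
                return True
            self._split()
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        h = self.half / 2
        self.children = [QuadTree(self.cx + dx * h, self.cy + dy * h, h, self.capacity)
                         for dx in (-1, 1) for dy in (-1, 1)]
        points, self.points = self.points, []
        for x, y in points:
            self.insert(x, y)  # redistribute stored points into children

    def query_box(self, xmin, ymin, xmax, ymax, out=None):
        """Collect all points inside the query rectangle."""
        if out is None:
            out = []
        if (xmin > self.cx + self.half or xmax < self.cx - self.half or
                ymin > self.cy + self.half or ymax < self.cy - self.half):
            return out  # query box does not overlap this node
        out.extend(p for p in self.points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax)
        if self.children:
            for child in self.children:
                child.query_box(xmin, ymin, xmax, ymax, out)
        return out

qt = QuadTree(0.0, 0.0, 100.0)
for p in [(10, 10), (50, 50), (-30, 20), (90, -90)]:
    qt.insert(*p)
print(qt.query_box(0, 0, 60, 60))  # only the two points inside the query window
```

Pruning non-overlapping quadrants is what gives the O(log n) average lookup cited in the fact card.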
# Comparison Framework
## Selected Framework Type
Decision Support — evaluating solution options per component
## Architecture Components to Evaluate
1. Feature Extraction & Matching (VO frame-to-frame)
2. Satellite Image Matching (cross-view geo-registration)
3. Sensor Fusion (VO + satellite + IMU)
4. Satellite Tile Preprocessing & Indexing
5. Image Downsampling Strategy
6. Re-localization (disconnected segments)
7. API & Streaming Layer
## Component 1: Feature Extraction & Matching (VO)
| Dimension | XFeat | SuperPoint + LightGlue | ORB (OpenCV) |
|-----------|-------|----------------------|--------------|
| Speed (Jetson) | ~2-5ms per frame (VGA), 5x faster than SuperPoint | ~15-50ms per frame (VGA, TensorRT FP16) | ~5-10ms per frame (CUDA) |
| Accuracy | Comparable to SuperPoint on pose estimation | State-of-the-art for local features | Lower accuracy, not scale-invariant |
| Memory | <100MB model | ~200-400MB model+inference | Negligible |
| TensorRT support | Yes (C++ impl available for Jetson Orin NX) | Yes (C++ impl available) | N/A (native CUDA) |
| Cross-view capability | Limited (designed for same-view matching) | Better with LightGlue attention | Poor for cross-view |
| Rotation invariance | Moderate | Good with LightGlue | Good (by design) |
| Jetson validation | Tested on Orin NX (JetPack 6.0) | Tested on multiple Jetson platforms | Native OpenCV CUDA |
| **Fit for VO** | ✅ Best — fast, accurate, Jetson-proven | ⚠️ Good but heavier | ⚠️ Fast but less accurate |
| **Fit for satellite matching** | ⚠️ Moderate | ✅ Better for cross-view with attention | ❌ Poor for cross-view |
## Component 2: Satellite Image Matching (Cross-View)
| Dimension | Local Feature Matching (SIFT/SuperPoint + LightGlue) | Global Descriptor Retrieval (DinoV2/CLIP) | Template Matching (NCC) |
|-----------|-----------------------------------------------------|------------------------------------------|------------------------|
| Approach | Extract keypoints in both UAV and satellite images, match descriptors | Encode both images into global vectors, compare by distance | Slide UAV image over satellite tile, compute correlation |
| Accuracy | Sub-pixel when matches found (best for fine alignment) | Tile-level (~50-200m depending on tile size) | Pixel-level but sensitive to appearance changes |
| Speed | ~100-500ms for match+geometric verification | ~50-100ms for descriptor comparison | ~500ms-2s for large search area |
| Robustness to viewpoint | Good with LightGlue attention | Excellent (trained for cross-view) | Poor (requires similar viewpoint) |
| Memory | ~300-500MB (model + keypoints) | ~200-500MB (model) | Low |
| Failure rate | High in low-texture areas and under seasonal appearance change | Lower (semantic understanding tolerates appearance change) | High in changed scenes |
| **Recommended role** | Fine alignment (after coarse retrieval) | Coarse retrieval (select candidate tile) | Not recommended |
## Component 3: Sensor Fusion
| Dimension | EKF (Extended Kalman Filter) | Error-State EKF (ESKF) | Hybrid ESKF/UKF | Factor Graph (GTSAM) |
|-----------|-------------------------------|------------------------|------------------|---------------------|
| Accuracy | Baseline | Better for rotation | 49% better than ESKF | Best overall |
| Compute cost | Lowest | Low | 48% less than full UKF | Highest |
| Implementation complexity | Low | Medium | Medium-High | High |
| Handles non-linearity | Linearization errors | Better for small errors | Best among KF variants | Full non-linear |
| Real-time on Jetson | ✅ | ✅ | ✅ | ⚠️ Depends on graph size |
| Multi-rate sensor support | Manual | Manual | Manual | Native |
| **Fit** | ⚠️ Baseline option | ✅ Good starting point | ✅ Best KF option | ⚠️ Overkill for this system |
## Component 4: Satellite Tile Management
| Dimension | GeoHash + In-Memory | Quadtree + Memory-Mapped Files | Pre-extracted Feature DB |
|-----------|--------------------|-----------------------------|------------------------|
| Lookup speed | O(1) hash | O(log n) tree traversal | O(1) hash + feature load |
| Memory usage | All tiles in RAM | On-demand loading | Features only (smaller) |
| Preprocessing | Fast | Moderate | Slow (extract all features offline) |
| Flexibility | Fixed grid | Adaptive resolution | Fixed per-tile |
| **Fit for 8GB** | ❌ Too much RAM for large areas | ✅ Memory-efficient | ✅ Best — smallest footprint |
## Component 5: Image Downsampling Strategy
| Dimension | Fixed Resize (e.g., 1600x1066) | Pyramid (multi-scale) | ROI-based (center crop + full) |
|-----------|-------------------------------|----------------------|-------------------------------|
| Speed | Fast, single scale | Slower, multiple passes | Medium |
| Accuracy | Good if GSD ratio maintained | Best for multi-scale features | Good for center, loses edges |
| Memory | ~5MB per frame | ~7-8MB per frame | ~6MB per frame |
| **Fit** | ✅ Best tradeoff | ⚠️ Unnecessary complexity | ⚠️ Loses coverage |
# Reasoning Chain
## Dimension 1: Feature Extraction for Visual Odometry
### Fact Confirmation
XFeat is 5x faster than SuperPoint (Fact #1), has TensorRT deployment on Jetson (Fact #2), and comparable accuracy for pose estimation. SatLoc (the most relevant state-of-the-art system) uses XFeat for its VO component (Fact #3).
### Reference Comparison
SuperPoint+LightGlue is more accurate for cross-view matching but heavier. ORB is fast but less accurate and not robust to appearance changes. SIFT+LightGlue is best for mosaicking (Fact #9) but slower.
### Conclusion
**XFeat for VO (frame-to-frame)** — it's the fastest learned feature, Jetson-proven, and used by the closest state-of-the-art system (SatLoc). For satellite matching, a different approach is needed because cross-view matching requires viewpoint-invariant features.
### Confidence
✅ High — supported by SatLoc architecture and CVPR 2024 benchmarks.
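As a minimal sketch of what the VO stage must emit downstream (this is not XFeat itself), the following assumes a nadir view and a known GSD, and uses a robust median pixel shift of matched keypoints as a stand-in for full essential-matrix pose recovery; all names are illustrative:

```python
import numpy as np

def vo_displacement_m(pts_prev, pts_curr, gsd_m_per_px):
    """Frame-to-frame metric displacement under a nadir-translation
    assumption: median pixel shift of matched keypoints (robust to
    outlier matches) scaled by the ground sampling distance.
    pts_* are (N, 2) arrays of matched pixel coordinates."""
    shift_px = np.median(pts_curr - pts_prev, axis=0)
    return shift_px * gsd_m_per_px

# e.g. a uniform shift of (120, -40) px at 0.24 m/px GSD
prev = np.array([[100.0, 200.0], [300.0, 400.0], [10.0, 20.0]])
curr = prev + np.array([120.0, -40.0])
print(vo_displacement_m(prev, curr, 0.24))  # ~[28.8, -9.6] metres
```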
---
## Dimension 2: Satellite Image Matching Strategy
### Fact Confirmation
Cross-view matching is fundamentally harder than same-view (Fact #14). Deep learning embeddings (DinoV2) are state-of-the-art for coarse retrieval (Fact #3). Local features are better for fine alignment. SatLoc uses DinoV2 for satellite matching specifically.
### Reference Comparison
A two-stage coarse-to-fine approach is the dominant pattern in literature: (1) global descriptor retrieves candidate region, (2) local feature matching refines position. Pure local-feature matching has high failure rate for cross-view due to extreme viewpoint differences.
### Conclusion
**Two-stage approach**: (1) Coarse — use a lightweight global descriptor to find the best-matching satellite tile within the search area (VO-predicted position ± uncertainty radius). (2) Fine — use local feature matching (SuperPoint+LightGlue or XFeat) between UAV frame and the matched satellite tile to get precise position. The coarse stage can also serve as the re-localization mechanism for disconnected segments.
### Confidence
✅ High — consensus across multiple recent papers and the SatLoc system.
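The coarse stage of the two-stage approach can be sketched as a cosine-similarity ranking over precomputed tile descriptors; the descriptor model itself (e.g. DinoV2) is abstracted away here, and all names are illustrative:

```python
import numpy as np

def coarse_retrieve(frame_desc, tile_descs, tile_ids, top_k=3):
    """Stage 1: rank candidate satellite tiles by cosine similarity
    between the frame's global descriptor and the precomputed tile
    descriptors. Stage 2 (fine local-feature matching) then runs
    only on the returned top-k tiles."""
    t = tile_descs / np.linalg.norm(tile_descs, axis=1, keepdims=True)
    f = frame_desc / np.linalg.norm(frame_desc)
    sims = t @ f
    order = np.argsort(-sims)[:top_k]
    return [(tile_ids[i], float(sims[i])) for i in order]

tiles = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
print(coarse_retrieve(np.array([0.9, 0.1]), tiles, ["A", "B", "C"], top_k=2))
# tile "A" ranks first
```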
---
## Dimension 3: Sensor Fusion Approach
### Fact Confirmation
Hybrid ESKF/UKF achieves 49% better accuracy than ESKF alone at 48% lower cost than full UKF (Fact #8). Factor graphs (GTSAM) offer the best accuracy but are computationally expensive.
### Reference Comparison
For our system: IMU runs at 100-400Hz, VO at ~3Hz (frame rate), satellite corrections at variable rate (whenever matching succeeds). We need multi-rate fusion that handles intermittent satellite corrections and continuous IMU.
### Conclusion
**Error-State EKF (ESKF)** as the baseline fusion approach — it's well-understood, lightweight, handles multi-rate sensors naturally, and is proven for VIO on edge hardware. Upgrade to hybrid ESKF/UKF if ESKF accuracy is insufficient. Factor graphs are overkill for this real-time edge system.
The filter state: position (lat/lon), velocity, orientation (quaternion), IMU biases. Measurements: VO-derived displacement (high rate), satellite-derived absolute position (variable rate), IMU (highest rate for prediction).
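A deliberately simplified 2D linear Kalman sketch of this multi-rate structure, assuming a local metric frame; a production ESKF additionally carries quaternion attitude and IMU biases in an error state:

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class FusionState:
    """Simplified state: 2D position, 2D velocity, 4x4 covariance."""
    p: np.ndarray = field(default_factory=lambda: np.zeros(2))
    v: np.ndarray = field(default_factory=lambda: np.zeros(2))
    P: np.ndarray = field(default_factory=lambda: np.eye(4))

def predict(s, accel, dt, q=0.1):
    """IMU-rate prediction: integrate acceleration, grow uncertainty."""
    s.p = s.p + s.v * dt + 0.5 * accel * dt ** 2
    s.v = s.v + accel * dt
    s.P = s.P + q * dt * np.eye(4)

def update_position(s, z, r):
    """Position update used for both VO-derived and satellite-derived
    positions; only r (measurement noise) differs: small for satellite
    anchors, larger for VO-propagated estimates."""
    H = np.hstack([np.eye(2), np.zeros((2, 2))])
    S = H @ s.P @ H.T + r * np.eye(2)
    K = s.P @ H.T @ np.linalg.inv(S)
    dx = K @ (z - s.p)
    s.p = s.p + dx[:2]
    s.v = s.v + dx[2:]
    s.P = (np.eye(4) - K @ H) @ s.P
```

The satellite update with a small `r` pulls the state hard toward the absolute fix, which is exactly the anchoring behaviour the architecture relies on.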
### Confidence
✅ High — ESKF is the standard choice for embedded VIO systems.
---
## Dimension 4: Satellite Tile Preprocessing & Indexing
### Fact Confirmation
Quadtree enables O(log n) lookups (Fact #15). Pre-extracting features offline saves runtime compute. 8GB memory limits in-memory tile storage.
### Reference Comparison
Full tiles in memory is infeasible for large areas. Memory-mapped files allow on-demand loading. Pre-extracted feature databases have the smallest runtime footprint.
### Conclusion
**Offline preprocessing pipeline**:
1. Download Google Maps satellite tiles at max zoom (18-19) for the operational area
2. Extract features (XFeat or SuperPoint) from each tile
3. Compute global descriptors (lightweight, e.g., NetVLAD or cosine-pooled XFeat descriptors) per tile
4. Store: tile metadata (GPS bounds, zoom level), features + descriptors in a GeoHash-indexed database
5. Build spatial index (GeoHash) for fast lookup by GPS region
**Runtime**: Given VO-estimated position, query GeoHash to find nearby tiles, compare global descriptors for coarse match, then local feature matching for fine alignment.
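Steps 4-5 can be sketched with a minimal GeoHash encoder and a prefix-keyed bucket index; feature extraction and descriptors are abstracted into an opaque tile record, and a real query would also probe the eight neighbouring cells:

```python
def geohash(lat, lon, precision=6):
    """Minimal GeoHash encoder: interleave lon/lat bisection bits,
    emit base32 characters (standard alphabet, no a/i/l/o)."""
    b32 = "0123456789bcdefghjkmnpqrstuvwxyz"
    lat_r, lon_r = [-90.0, 90.0], [-180.0, 180.0]
    out, bits, ch, even = [], 0, 0, True
    while len(out) < precision:
        rng, val = (lon_r, lon) if even else (lat_r, lat)
        mid = (rng[0] + rng[1]) / 2
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even, bits = not even, bits + 1
        if bits == 5:
            out.append(b32[ch])
            bits, ch = 0, 0
    return "".join(out)

class TileIndex:
    """GeoHash-prefix index over preprocessed tiles: metadata plus
    offline-extracted features, keyed by the hash of the tile centre.
    Precision 5 gives cells of roughly 4.9 x 4.9 km."""
    def __init__(self, precision=5):
        self.precision = precision
        self.buckets = {}
    def add(self, lat, lon, tile_record):
        key = geohash(lat, lon, self.precision)
        self.buckets.setdefault(key, []).append(tile_record)
    def nearby(self, lat, lon):
        return self.buckets.get(geohash(lat, lon, self.precision), [])
```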
### Confidence
✅ High — standard approach used by all relevant systems.
---
## Dimension 5: Image Downsampling Strategy
### Fact Confirmation
26MP images need downsampling for 8GB device (Fact #6). Feature extraction at 4K takes ~12ms on Jetson Xavier (Fact #7). UAV GSD at 400m is ~6cm/px (Fact #10). Satellite GSD is ~60cm/px at zoom 18.
### Reference Comparison
For VO (frame-to-frame): features at full resolution are wasteful — consecutive frames at 6cm GSD overlap ~80%, and features at lower resolution are sufficient for displacement estimation. For satellite matching: we need to match at satellite resolution (~60cm/px), so downsampling to match satellite GSD is natural.
### Conclusion
**Downsample to ~1600x1066** (factor ~4x each dimension). This yields ~24cm/px GSD — still 2.5x finer than satellite, sufficient for feature matching. Image size: ~5MB (RGB). Feature extraction at this resolution: <10ms. This is the single resolution for both VO and satellite matching.
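The arithmetic behind this choice, assuming a 6240x4160 pixel layout for the 26MP sensor (the exact geometry depends on the camera):

```python
def downsampled_gsd(native_w, target_w, native_gsd_m):
    """GSD scales inversely with resolution: halving the pixel width
    doubles the metres covered per pixel."""
    return native_gsd_m * native_w / target_w

# 6cm/px native at 400m altitude, resized 6240 -> 1600 px wide
gsd = downsampled_gsd(6240, 1600, 0.06)
print(round(gsd, 3))  # ~0.234 m/px, ~2.5x finer than 0.6 m/px satellite
```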
### Confidence
✅ High — standard practice for edge processing of high-res imagery.
---
## Dimension 6: Disconnected Segment Handling
### Fact Confirmation
SatLoc uses satellite matching as an independent localization source that works regardless of VO state (Fact #3). The AC requires reconnecting disconnected segments as a core capability.
### Reference Comparison
Pure VO cannot handle zero-overlap transitions. IMU dead-reckoning bridges short gaps (seconds). Satellite-based re-localization provides absolute position regardless of VO state.
### Conclusion
**Independent satellite localization per frame** — every frame attempts satellite matching regardless of VO state. This naturally handles disconnected segments:
1. When VO succeeds: satellite matching refines position (high confidence)
2. When VO fails (sharp turn): satellite matching provides absolute position (sole source)
3. When both fail: IMU dead-reckoning with low confidence score
4. After 3 consecutive total failures: request user input
Segment reconnection is automatic: all positions are in the same global (WGS84) frame via satellite matching. No explicit "reconnection" needed — segments share the satellite reference.
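The per-frame source selection above can be sketched as a small decision function; the names and confidence labels are illustrative:

```python
def select_source(sat_fix, vo_pos, imu_pos, fail_streak):
    """Pick the position source for one frame, mirroring the fallback
    hierarchy: satellite fix anchors whenever available; VO propagates
    otherwise; IMU dead-reckoning is last; three consecutive total
    failures escalate to the user. Returns (position, status, streak)."""
    if sat_fix is not None:
        return sat_fix, "HIGH", 0          # absolute anchor, reset streak
    if vo_pos is not None:
        return vo_pos, "MEDIUM", 0         # relative only, drift accumulates
    fail_streak += 1
    status = "REQUEST_USER_INPUT" if fail_streak >= 3 else "LOW"
    return imu_pos, status, fail_streak
```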
### Confidence
✅ High — this is the key architectural insight.
---
## Dimension 7: Processing Pipeline Architecture
### Fact Confirmation
<5s per frame required (AC). Feature extraction ~10ms, VO matching ~20-50ms, satellite coarse retrieval ~50-100ms, satellite fine matching ~200-500ms, fusion ~1ms. Total: ~300-700ms per frame.
### Conclusion
**Pipelined parallel architecture**:
- Thread 1 (Camera): Capture frame, downsample, extract features → push to queue
- Thread 2 (VO): Match with previous frame, compute displacement → push to fusion
- Thread 3 (Satellite): Search nearby tiles, coarse retrieval, fine matching → push to fusion
- Thread 4 (Fusion): ESKF prediction (IMU), update (VO), update (satellite) → emit result via SSE
VO and satellite matching can run in parallel for each frame. Fusion integrates results as they arrive. This enables <1s per frame total latency.
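A skeleton of the producer-consumer layout, with Threads 2 and 3 collapsed into one stage for brevity and all heavy processing (downsampling, feature extraction, matching) stubbed out:

```python
import queue
import threading

frame_q = queue.Queue(maxsize=4)   # Thread 1 -> processing stage
fusion_q = queue.Queue()           # processing stage -> Thread 4

def capture(frames):
    """Thread 1: capture, downsample, extract features (stubbed)."""
    for f in frames:
        frame_q.put(f)
    frame_q.put(None)              # poison pill shuts the stage down

def process():
    """Threads 2+3 merged for the sketch: emit a VO displacement and
    an independent satellite-matching attempt for every frame."""
    prev = None
    while (f := frame_q.get()) is not None:
        fusion_q.put(("sat", f))               # per-frame satellite attempt
        if prev is not None:
            fusion_q.put(("vo", f - prev))     # displacement vs previous
        prev = f
    fusion_q.put(None)

def run(frames):
    """Thread 4 (fusion) drains the queue on the calling thread."""
    workers = [threading.Thread(target=capture, args=(frames,)),
               threading.Thread(target=process)]
    for t in workers:
        t.start()
    out = []
    while (m := fusion_q.get()) is not None:
        out.append(m)
    for t in workers:
        t.join()
    return out
```

Because fusion consumes measurements as they arrive, a slow satellite match never blocks the VO path.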
### Confidence
✅ High — standard producer-consumer pipeline.
# Validation Log
## Validation Scenario 1: Normal Flight (80% of time)
UAV flies straight, consecutive frames overlap ~70-80%. Terrain has moderate texture (agricultural + urban mix).
### Expected Based on Conclusions
- XFeat extracts features in ~5ms, VO matching in ~20ms
- Satellite matching succeeds: coarse retrieval ~50ms, fine matching ~300ms
- ESKF fuses both: position accuracy ~10-20m (satellite-anchored)
- Total processing: <500ms per frame
- Confidence: HIGH
### Actual Validation (against literature)
SatLoc reports <15m error with >90% coverage under similar conditions. Mateos-Ramirez reports 0.83% drift with satellite correction. Both align with our expected performance.
### Result: ✅ PASS
---
## Validation Scenario 2: Sharp Turn (5-10% of time)
UAV makes a 60-degree turn. Next frame has <5% overlap with previous. Heading changes rapidly.
### Expected Based on Conclusions
- VO fails (insufficient feature overlap) — detected by low match count
- IMU provides heading and approximate displacement for ~1-2 frames
- Satellite matching attempts independent localization of the new frame
- If satellite match succeeds: position recovered, segment continues
- If satellite match fails: IMU dead-reckoning with LOW confidence
### Potential Issues
- Satellite matching may also fail if the frame is heavily tilted (non-nadir view during turn)
- IMU drift during turn: at 100m/s for 1s, displacement ~100m. IMU drift over 1s: ~1-5m — acceptable
### Result: ⚠️ CONDITIONAL PASS — depends on satellite matching success during turn. Non-stabilized camera may produce tilted images that are harder to match. IMU provides reasonable bridge.
---
## Validation Scenario 3: Disconnected Route (rare, <5%)
UAV completes segment A, makes a 90+ degree turn, flies a new heading. Segment B has no overlap with segment A. Multiple such segments possible.
### Expected Based on Conclusions
- Each segment independently localizes via satellite matching
- No explicit reconnection needed — all in WGS84 frame
- Per-segment accuracy depends on satellite matching success rate
- Low-confidence gaps between segments until satellite match succeeds
### Result: ✅ PASS — architecture handles this natively via independent per-frame satellite matching.
---
## Validation Scenario 4: Memory-Constrained Operation (always)
3000 frames, 8GB shared memory. Full pipeline running.
### Expected Based on Conclusions
- Downsampled frame: ~5MB per frame. Keep 2 in memory (current + previous): ~10MB
- XFeat model (TensorRT): ~50-100MB
- Satellite tile features (loaded tiles): ~200-500MB for tiles near current position
- ESKF state: <1MB
- OS + runtime: ~1.5GB
- Total: ~2-3GB active, well within 8GB
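The budget above, tallied at its upper bounds:

```python
budget_mb = {
    "frames (current + previous)": 2 * 5,
    "XFeat model (TensorRT)": 100,
    "loaded satellite tile features": 500,   # upper bound near position
    "ESKF state": 1,
    "OS + runtime": 1500,
}
total = sum(budget_mb.values())
print(total, "MB")   # ~2.1 GB, comfortably inside the 8 GB device
```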
### Potential Issues
- Satellite feature DB for large operational areas could be large on disk (not memory — loaded on demand)
- Need careful management of tile loading/unloading
### Result: ✅ PASS — 8GB is sufficient with proper memory management.
---
## Validation Scenario 5: Degraded Satellite Imagery
Google Maps tiles at 0.5-1.0 m/px resolution. Some areas have outdated imagery. Seasonal appearance changes.
### Expected Based on Conclusions
- Coarse retrieval (global descriptors) should handle moderate appearance changes
- Fine matching may fail on outdated/seasonal tiles — confidence drops to LOW
- System falls back to VO + IMU in degraded areas
- Multiple consecutive failures → user input request
### Potential Issues
- If large areas have degraded satellite imagery, the system may operate mostly in VO+IMU mode with significant drift
- 50m accuracy target may not be achievable in these areas
### Result: ⚠️ CONDITIONAL PASS — system degrades gracefully, but accuracy targets depend on satellite quality. This is a known risk per Phase 1 assessment.
---
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable/verifiable
- [x] Sharp turn handling addressed
- [x] Memory constraints validated
- [ ] Issue: Satellite imagery quality in eastern Ukraine remains a risk
- [ ] Issue: Non-stabilized camera during turns may degrade satellite matching
## Conclusions Requiring No Revision
All major architectural decisions validated. Two known risks (satellite quality, non-stabilized camera during turns) are acknowledged and handled by the fallback hierarchy.