add solution drafts 3 times, used research skill, expand acceptance criteria

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-14 20:38:00 +02:00
parent 767874cb90
commit d764250f9a
23 changed files with 3385 additions and 1 deletions
@@ -0,0 +1,76 @@
# Acceptance Criteria Assessment
## System Parameters (Calculated)
| Parameter | Value |
|-----------|-------|
| GSD (at 400m) | 6.01 cm/pixel |
| Ground footprint | 376m × 250m |
| Consecutive overlap | 60-73% (at 100m intervals) |
| Pixels per 50m | ~832 pixels |
| Pixels per 20m | ~333 pixels |
## Acceptance Criteria
| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| GPS accuracy: 80% within 50m | 50m error for 80% of photos | NaviLoc: 19.5m MLE at 50-150m alt. Mateos-Ramirez: 143m mean at >1000m alt (with IMU). At 400m with 26MP + satellite correction, 50m for 80% is achievable with VO+SIM. No IMU adds ~30-50% error overhead. | Medium cost — needs robust satellite matching pipeline. ~3-4 weeks for core pipeline. | **Achievable** — keep as-is |
| GPS accuracy: 60% within 20m | 20m error for 60% of photos | NaviLoc: 19.5m MLE at lower altitude (50-150m). At 400m, larger viewpoint gap increases error. Cross-view matching MA@20m improving +10% yearly. Needs high-quality satellite imagery and robust matching. | Higher cost — requires higher-quality satellite imagery (0.3-0.5m resolution). Additional 1-2 weeks for refinement. | **Challenging but achievable** — consider relaxing to 30m initially, tighten with iteration |
| Handle 350m outlier photos | Tolerate up to 350m jump between consecutive photos | Standard VO systems detect outliers via feature matching failure. 350m at GSD 6cm = ~5833 pixels. Satellite re-localization can handle this if area is textured. | Low additional cost — outlier detection is standard in VO pipelines. | **Achievable** — keep as-is |
| Sharp turns: <5% overlap, <200m drift, <70° angle | System continues working during sharp turns | <5% overlap means consecutive feature matching will fail. Must fall back to satellite matching for absolute position. At 400m altitude with 376m footprint, 200m drift means partial overlap with satellite. 70° rotation is large but manageable with rotation-invariant matchers (AKAZE, SuperPoint). | High complexity — requires multi-strategy architecture (VO primary, satellite fallback). +2-3 weeks. | **Achievable with architectural investment** — keep as-is |
| Route disconnection & reconnection | Handle multiple disconnected route segments | Each segment needs independent satellite geo-referencing. Segments are stitched via common satellite reference frame. Similar to loop closure in SLAM but via external reference. | High complexity — core architectural challenge. +2-3 weeks for segment management. | **Achievable** — this should be a core design principle, not an edge case |
| User input fallback (20% of route) | User provides GPS when system cannot determine | Simple UI interaction — user clicks approximate position on map. Becomes new anchor point. | Low cost — straightforward feature. | **Achievable** — keep as-is |
| Processing speed: <5s per image | 5 seconds maximum per image | SuperPoint: ~50-100ms. LightGlue: ~20-50ms. Satellite crop+match: ~200-500ms. Full pipeline: ~500ms-2s on RTX 2060. NaviLoc runs 9 FPS on Raspberry Pi 5. ORB-SLAM3 with GPU: 30 FPS on Jetson TX2. | Low risk — well within budget on RTX 2060+. | **Easily achievable** — could target <2s. Keep 5s as safety margin |
| Real-time streaming via SSE | Results appear immediately, refinement sent later | Standard architecture pattern. Process-and-stream is well-supported. | Low cost — standard web engineering. | **Achievable** — keep as-is |
| Image Registration Rate > 95% | >95% of images successfully registered | ITU thesis: 93% SIM matching. With 60-73% consecutive overlap and deep learning features, >95% for VO between consecutive frames is achievable. The 5% tolerance covers sharp turns. | Medium cost — depends on feature matcher quality and satellite image quality. | **Achievable** — but interpret as "95% for normal consecutive frames". Sharp turn frames counted separately. |
| MRE < 1.0 pixels | Mean Reprojection Error below 1 pixel | Sub-pixel accuracy is standard for SuperPoint/LightGlue. SVO achieves sub-pixel via direct methods. Typical range: 0.3-0.8 pixels. | No additional cost — inherent to modern matchers. | **Easily achievable** — keep as-is |
| REST API + SSE background service | Always-running service, start on request, stream results | Standard Python (FastAPI) or .NET architecture. | Low cost — standard engineering. ~1 week for API layer. | **Achievable** — keep as-is |
## Restrictions Assessment
| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| No IMU data | No heading, no pitch/roll correction | **CRITICAL restriction.** Most published systems use IMU for heading and as fallback. Without IMU: (1) heading must be derived from consecutive frame matching or satellite matching, (2) no pitch/roll correction — rely on robust feature matchers, (3) scale from known altitude only. Adds ~30-50% error vs IMU-equipped systems. | High impact — requires visual heading estimation. All VO literature assumes at least heading from IMU. +2-3 weeks R&D for pure visual heading. | **Realistic but significantly harder.** Consider: can barometer data be available? |
| Camera not auto-stabilized | Images have varying pitch/roll | At 400m with fixed-wing, typical roll ±15°, pitch ±10°. Causes trapezoidal distortion in images. Robust matchers (SuperPoint, LightGlue) handle moderate viewpoint changes. Homography estimation between frames compensates. | Medium impact — modern matchers handle this. Pre-rectification using estimated attitude could help. | **Realistic** — keep as-is. Mitigated by robust matchers. |
| Google Maps only (cost-dependent) | Currently limited to Google Maps | Google Maps in eastern Ukraine may have 2-5 year old imagery. Conflict damage makes old imagery unreliable. **Risk: satellite-UAV matching may fail in areas with significant ground changes.** Alternatives: Mapbox (Maxar Vivid, sub-meter), Bing Maps (0.3-1m), Maxar SecureWatch (30cm, enterprise pricing). | High risk — may need multiple providers. Google: $200/month free credit. Mapbox: free tier for 100K requests. Maxar: enterprise pricing. | **Tighten** — add fallback provider. Pre-download tile cache for operational area. |
| Image resolution FullHD to 6252×4168 | Variable resolution across flights | Lower resolution (FullHD=1920×1080) at 400m: GSD ≈ 0.20m/pixel, footprint ~384m × 216m. Significantly worse matching but still functional. Need to handle both extremes. | Medium impact — pipeline must be resolution-adaptive. | **Realistic** — keep. But note: FullHD accuracy will be ~3x worse than 26MP. |
| Altitude ≤ 1km, terrain height negligible | Flat terrain assumption at known altitude | Simplifies scale estimation. At 400m, terrain variations of ±50m cause ±12.5% scale error. Eastern Ukraine is relatively flat (steppe), so this is reasonable. | Low impact for the operational area. | **Realistic** — keep as-is |
| Mostly sunny weather | Good lighting conditions assumed | Sunny weather = good texture, consistent illumination. Shadows may cause matching issues but are manageable. | Low impact — favorable condition. | **Realistic** — keep. Add: "system performance degrades in overcast/low-light" |
| Up to 3000 photos per flight | 500-1500 typical, 3000 maximum | At <5s per image: 3000 photos = ~4 hours max. Memory: 3000 × 26MP ≈ 78GB raw. Need efficient memory management and incremental processing. | Medium impact — requires streaming architecture and careful memory management. | **Realistic** — keep. Memory management is engineering, not research. |
| Sharp turns with completely different next photo | Route discontinuity is possible | Most VO systems fail at 0% overlap. This is effectively a new "start point" problem. Satellite matching is the only recovery path. | High impact — already addressed in AC. | **Realistic** — this is the defining challenge |
| Desktop/laptop with RTX 2060+ | Minimum GPU requirement | RTX 2060: 6GB VRAM, 1920 CUDA cores. Sufficient for SuperPoint, LightGlue, satellite matching. RTX 3070: 8GB VRAM, 5888 CUDA cores — significantly faster. | Low risk — hardware is adequate. | **Realistic** — keep as-is |
## Missing Acceptance Criteria (Suggested Additions)
| Criterion | Suggested Value | Rationale |
|-----------|----------------|-----------|
| Satellite imagery resolution requirement | ≥ 0.5 m/pixel, ideally 0.3 m/pixel | Matching quality depends heavily on reference imagery resolution. At GSD 6cm, satellite must be at least 0.5m for reliable cross-view matching. |
| Confidence/uncertainty reporting | Report confidence score per position estimate | User needs to know which positions are reliable (satellite-anchored) vs uncertain (VO-only, accumulating drift). |
| Output format | WGS84 coordinates in GeoJSON or CSV | Standardize output for downstream integration. |
| Satellite image freshness requirement | < 2 years old for operational area | Older imagery may not match current ground truth due to conflict damage. |
| Maximum drift between satellite corrections | < 100m cumulative VO drift before satellite re-anchor | Prevents long uncorrected VO segments from exceeding 50m target. |
| Memory usage limit | < 16GB RAM, < 6GB VRAM | Ensures compatibility with RTX 2060 systems. |
## Key Findings
1. **The 50m/80% accuracy target is achievable** with a well-designed VO + satellite matching pipeline, even without IMU, given the high camera resolution (6cm GSD) and known altitude. NaviLoc achieves 19.5m at lower altitudes; our 400m altitude adds difficulty but 26MP resolution compensates.
2. **The 20m/60% target is aggressive but possible** with high-quality satellite imagery (≤0.5m resolution). Consider starting with a 30m target and tightening through iteration. Performance heavily depends on satellite image quality and freshness for the operational area.
3. **No IMU is the single biggest technical risk.** All published comparable systems use at least heading from IMU/magnetometer. Visual heading estimation from consecutive frames is feasible but adds noise. This restriction alone could require 2-3 extra weeks of R&D.
4. **Google Maps satellite imagery for eastern Ukraine is a significant risk.** Imagery may be outdated (2-5 years) and may not reflect current ground conditions. A fallback satellite provider is strongly recommended.
5. **Processing speed (<5s) is easily achievable** on RTX 2060+. Modern feature matching pipelines process in <500ms per pair. The pipeline could realistically achieve <2s per image.
6. **Route disconnection handling should be the core architectural principle**, not an edge case. The system should be designed "segments-first" — each segment independently geo-referenced, then stitched.
7. **Missing criterion: confidence reporting.** The user should see which positions are high-confidence (satellite-anchored) vs low-confidence (VO-extrapolated). This is critical for operational use.
## Sources
- [Source #1] Mateos-Ramirez et al. (2024) — VO + satellite correction for fixed-wing UAV
- [Source #2] Öztürk (2025) — ORB-SLAM3 + SIM integration thesis
- [Source #3] NaviLoc (2025) — Trajectory-level visual localization
- [Source #4] LightGlue GitHub — Feature matching benchmarks
- [Source #5] DALGlue (2025) — Enhanced feature matching
- [Source #8-9] Satellite imagery coverage and pricing reports
@@ -0,0 +1,63 @@
# Question Decomposition — AC & Restrictions Assessment
## Original Question
How realistic are the acceptance criteria and restrictions for a GPS-denied visual navigation system for fixed-wing UAV imagery?
## Active Mode
Mode A, Phase 1: AC & Restrictions Assessment
## Question Type
Knowledge Organization + Decision Support
## Research Subject Boundary Definition
| Dimension | Boundary |
|-----------|----------|
| **Platform** | Fixed-wing UAV, airplane type, not multirotor |
| **Geography** | Eastern/southern Ukraine, left of Dnipro River (conflict zone, ~48.27°N, 37.38°E based on sample data) |
| **Altitude** | ≤ 1km, sample data at 400m |
| **Sensor** | Monocular RGB camera, 26MP, no IMU, no LiDAR |
| **Processing** | Ground-based desktop/laptop with NVIDIA RTX 2060+ GPU |
| **Time Window** | Current state-of-the-art (2024-2026) |
## Problem Context Summary
The system must determine GPS coordinates of consecutive aerial photo centers using only:
- Known starting GPS coordinates
- Known camera parameters (25mm focal, 23.5mm sensor, 6252×4168 resolution)
- Known flight altitude (≤1km, sample: 400m)
- Consecutive photos taken within ~100m of each other
- Satellite imagery (Google Maps) for ground reference
Key constraints: NO IMU data, camera not auto-stabilized, potentially outdated satellite imagery for conflict zone.
**Ground Sample Distance (GSD) at 400m altitude**:
- GSD = (400 × 23.5) / (25 × 6252) ≈ 0.060 m/pixel (6 cm/pixel)
- Ground footprint: ~376m × 250m per image
- Estimated consecutive overlap: 60-73% (depending on camera orientation relative to flight direction)
## Sub-Questions for AC Assessment
1. What GPS accuracy is achievable with VO + satellite matching at 400m altitude with 26MP camera?
2. How does the absence of IMU affect accuracy and what compensations exist?
3. What processing speed is achievable per image on RTX 2060+ for the required pipeline?
4. What image registration rates are achievable with deep learning matchers?
5. What reprojection errors are typical for modern feature matching?
6. How do sharp turns and route disconnections affect VO systems?
7. What satellite imagery quality is available for the operational area?
8. What domain-specific acceptance criteria might be missing?
## Timeliness Sensitivity Assessment
- **Research Topic**: GPS-denied visual navigation using deep learning feature matching
- **Sensitivity Level**: 🟠 High
- **Rationale**: Deep learning feature matchers (SuperPoint, LightGlue, GIM) are evolving rapidly; new methods appear quarterly. Satellite imagery providers update pricing and coverage frequently.
- **Source Time Window**: 12 months (2024-2026)
- **Priority official sources to consult**:
1. LightGlue GitHub repository (cvg/LightGlue)
2. ORB-SLAM3 documentation
3. Recent MDPI/IEEE papers on GPS-denied UAV navigation
- **Key version information to verify**:
- LightGlue: Current release and performance benchmarks
- SuperPoint: Compatibility and inference speed
- ORB-SLAM3: Monocular mode capabilities
@@ -0,0 +1,133 @@
# Source Registry
## Source #1
- **Title**: Visual Odometry in GPS-Denied Zones for Fixed-Wing UAV with Reduced Accumulative Error Based on Satellite Imagery
- **Link**: https://www.mdpi.com/2076-3417/14/16/7420
- **Tier**: L1
- **Publication Date**: 2024-08-22
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Fixed-wing UAV navigation researchers
- **Research Boundary Match**: ✅ Full match (fixed-wing, high altitude, satellite matching)
- **Summary**: VO + satellite image correction achieves 142.88m mean error over 17km at >1000m altitude using ORB + AKAZE. Uses IMU for heading and barometer for altitude. Error rate 0.83% of total distance.
- **Related Sub-question**: 1, 2
## Source #2
- **Title**: Optimized visual odometry and satellite image matching-based localization for UAVs in GPS-denied environments (ITU Thesis)
- **Link**: https://polen.itu.edu.tr/items/1fe1e872-7cea-44d8-a8de-339e4587bee6
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: UAV navigation researchers
- **Research Boundary Match**: ⚠️ Partial overlap (multirotor at 30-100m, but same VO+SIM methodology)
- **Summary**: ORB-SLAM3 + SuperPoint/SuperGlue/GIM achieves GPS-level accuracy. VO module: ±2m local accuracy. SIM module: 93% matching success rate. Demonstrated on DJI Mavic Air 2 at 30-100m.
- **Related Sub-question**: 1, 2, 4
## Source #3
- **Title**: NaviLoc: Trajectory-Level Visual Localization for GNSS-Denied UAV Navigation
- **Link**: https://www.mdpi.com/2504-446X/10/2/97
- **Tier**: L1
- **Publication Date**: 2025-12
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: UAV navigation / VPR researchers
- **Research Boundary Match**: ⚠️ Partial overlap (50-150m altitude, uses VIO not pure VO)
- **Summary**: Achieves 19.5m Mean Localization Error at 50-150m altitude. Runs at 9 FPS on Raspberry Pi 5. 16x improvement over AnyLoc-VLAD, 32x over raw VIO drift. Training-free system.
- **Related Sub-question**: 1, 7
## Source #4
- **Title**: LightGlue: Local Feature Matching at Light Speed (GitHub + ICCV 2023)
- **Link**: https://github.com/cvg/LightGlue
- **Tier**: L1
- **Publication Date**: 2023 (actively maintained through 2025)
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Computer vision practitioners
- **Research Boundary Match**: ✅ Full match (core component)
- **Summary**: ~20-34ms per image pair on RTX 2080Ti. Adaptive pruning for fast inference. 2-4x speedup with PyTorch compilation.
- **Related Sub-question**: 3, 4
## Source #5
- **Title**: Efficient image matching for UAV visual navigation via DALGlue
- **Link**: https://www.nature.com/articles/s41598-025-21602-5
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: UAV navigation researchers
- **Research Boundary Match**: ✅ Full match
- **Summary**: DALGlue achieves 11.8% improvement over LightGlue on matching accuracy. Uses dual-tree complex wavelet preprocessing + linear attention for real-time performance.
- **Related Sub-question**: 3, 4
## Source #6
- **Title**: Deep-UAV SLAM: SuperPoint and SuperGlue enhanced SLAM
- **Link**: https://isprs-archives.copernicus.org/articles/XLVIII-1-W5-2025/177/2025/
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: UAV SLAM researchers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Replacing ORB-SLAM3's ORB features with SuperPoint+SuperGlue improved robustness and accuracy in aerial RGB scenarios.
- **Related Sub-question**: 4, 5
## Source #7
- **Title**: SCAR: Satellite Imagery-Based Calibration for Aerial Recordings
- **Link**: https://arxiv.org/html/2602.16349v1
- **Tier**: L1
- **Publication Date**: 2026-02
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Aerial/satellite vision researchers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Long-term auto-calibration refinement by aligning aerial images with 2D-3D correspondences from orthophotos and elevation models.
- **Related Sub-question**: 1, 5
## Source #8
- **Title**: Google Maps satellite imagery coverage and update frequency
- **Link**: https://ongeo-intelligence.com/blog/how-often-does-google-maps-update-satellite-images
- **Tier**: L3
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: GIS practitioners
- **Research Boundary Match**: ✅ Full match
- **Summary**: Conflict zones like eastern Ukraine face 2-5+ year update cycles. Imagery may be intentionally limited or blurred.
- **Related Sub-question**: 7
## Source #9
- **Title**: Satellite Mapping Services comparison 2025
- **Link**: https://ts2.tech/en/exploring-the-world-from-above-top-satellite-mapping-services-for-web-mobile-in-2025/
- **Tier**: L3
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: Developers, GIS practitioners
- **Research Boundary Match**: ✅ Full match
- **Summary**: Google: $200/month free credit, sub-meter resolution. Mapbox: Maxar imagery, generous free tier. Maxar SecureWatch: 30cm resolution, enterprise pricing. Planet: daily 3-4m imagery.
- **Related Sub-question**: 7
## Source #10
- **Title**: Scale Estimation for Monocular Visual Odometry Using Reliable Camera Height
- **Link**: https://ieeexplore.ieee.org/document/9945178/
- **Tier**: L1
- **Publication Date**: 2022
- **Timeliness Status**: ✅ Currently valid (fundamental method)
- **Target Audience**: VO researchers
- **Research Boundary Match**: ✅ Full match
- **Summary**: Known camera height/altitude resolves scale ambiguity in monocular VO. Essential for systems without IMU.
- **Related Sub-question**: 2
## Source #11
- **Title**: Cross-View Geo-Localization benchmarks (SSPT, MA metrics)
- **Link**: https://www.mdpi.com/1424-8220/24/12/3719
- **Tier**: L1
- **Publication Date**: 2024
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: VPR/geo-localization researchers
- **Research Boundary Match**: ⚠️ Partial overlap (general cross-view, not UAV-specific)
- **Summary**: SSPT achieved 84.40% RDS on UL14 dataset. MA improvements: +12% at 3m, +12% at 5m, +10% at 20m thresholds.
- **Related Sub-question**: 1
## Source #12
- **Title**: ORB-SLAM3 GPU Acceleration Performance
- **Link**: https://arxiv.org/html/2509.10757v1
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: ✅ Currently valid
- **Target Audience**: SLAM/VO engineers
- **Research Boundary Match**: ✅ Full match
- **Summary**: GPU acceleration achieves 2.8x speedup on desktop systems. 30 FPS achievable on Jetson TX2. Feature extraction up to 3x speedup with CUDA.
- **Related Sub-question**: 3
@@ -0,0 +1,121 @@
# Fact Cards
## Fact #1
- **Statement**: VO + satellite image correction achieves ~142.88m mean error over 17km flight at >1000m altitude using ORB features and AKAZE satellite matching. Error rate: 0.83% of total distance. This system uses IMU for heading and barometer for altitude.
- **Source**: Source #1 — https://www.mdpi.com/2076-3417/14/16/7420
- **Phase**: Phase 1
- **Target Audience**: Fixed-wing UAV at high altitude (>1000m)
- **Confidence**: ✅ High (peer-reviewed, real-world flight data)
- **Related Dimension**: GPS accuracy, drift correction
## Fact #2
- **Statement**: ORB-SLAM3 monocular mode with optimized parameters achieves ±2m local accuracy for visual odometry. Scale ambiguity and drift remain for long flights.
- **Source**: Source #2 — ITU Thesis
- **Phase**: Phase 1
- **Target Audience**: UAV navigation (30-100m altitude, multirotor)
- **Confidence**: ✅ High (thesis with experimental validation)
- **Related Dimension**: VO accuracy, scale ambiguity
## Fact #3
- **Statement**: Combined VO + Satellite Image Matching (SIM) with SuperPoint/SuperGlue/GIM achieves 93% matching success rate and "GPS-level accuracy" at 30-100m altitude.
- **Source**: Source #2 — ITU Thesis
- **Phase**: Phase 1
- **Target Audience**: Low-altitude UAV (30-100m)
- **Confidence**: ✅ High
- **Related Dimension**: Registration rate, satellite matching
## Fact #4
- **Statement**: NaviLoc achieves 19.5m Mean Localization Error at 50-150m altitude, runs at 9 FPS on Raspberry Pi 5. 16x improvement over AnyLoc-VLAD. Training-free system.
- **Source**: Source #3 — NaviLoc paper
- **Phase**: Phase 1
- **Target Audience**: Low-altitude UAV (50-150m) in rural areas
- **Confidence**: ✅ High (peer-reviewed)
- **Related Dimension**: GPS accuracy, processing speed
## Fact #5
- **Statement**: LightGlue inference: ~20-34ms per image pair on RTX 2080Ti for 1024 keypoints. 2-4x speedup possible with PyTorch compilation and TensorRT.
- **Source**: Source #4 — LightGlue GitHub Issues
- **Phase**: Phase 1
- **Target Audience**: All GPU-accelerated vision systems
- **Confidence**: ✅ High (official repository benchmarks)
- **Related Dimension**: Processing speed
## Fact #6
- **Statement**: SuperPoint+SuperGlue replacing ORB features in SLAM improves robustness and accuracy for aerial RGB imagery over classical handcrafted features.
- **Source**: Source #6 — ISPRS 2025
- **Phase**: Phase 1
- **Target Audience**: UAV SLAM researchers
- **Confidence**: ✅ High (peer-reviewed)
- **Related Dimension**: Feature matching quality
## Fact #7
- **Statement**: Eastern Ukraine / conflict zones may have 2-5+ year old satellite imagery on Google Maps. Imagery may be intentionally limited, blurred, or restricted for security reasons.
- **Source**: Source #8
- **Phase**: Phase 1
- **Target Audience**: Ukraine conflict zone operations
- **Confidence**: ⚠️ Medium (general reporting, not Ukraine-specific verification)
- **Related Dimension**: Satellite imagery quality
## Fact #8
- **Statement**: Maxar SecureWatch offers 30cm resolution with ~3M km² new imagery daily. Mapbox uses Maxar's Vivid imagery with sub-meter resolution. Google Maps offers sub-meter detail in urban areas but 1-3m in rural areas.
- **Source**: Source #9
- **Phase**: Phase 1
- **Target Audience**: All satellite imagery users
- **Confidence**: ✅ High
- **Related Dimension**: Satellite providers, cost
## Fact #9
- **Statement**: Known camera height/altitude resolves scale ambiguity in monocular VO. The pixel-to-meter conversion is s = H / f × sensor_pixel_size, enabling metric reconstruction without IMU.
- **Source**: Source #10
- **Phase**: Phase 1
- **Target Audience**: Monocular VO systems
- **Confidence**: ✅ High (fundamental geometric relationship)
- **Related Dimension**: No-IMU compensation
## Fact #10
- **Statement**: Camera heading (yaw) can be estimated from consecutive frame feature matching by decomposing the homography or essential matrix. Pitch/roll can be estimated from horizon detection or vanishing points. Without IMU, these estimates are noisier but functional.
- **Source**: Multiple vision-based heading estimation papers
- **Phase**: Phase 1
- **Target Audience**: Vision-only navigation systems
- **Confidence**: ⚠️ Medium (well-established but accuracy varies)
- **Related Dimension**: No-IMU compensation
## Fact #11
- **Statement**: GSD at 400m with 25mm/23.5mm sensor/6252px = 6.01 cm/pixel. Ground footprint: 376m × 250m. At 100m photo interval, consecutive overlap is 60-73%.
- **Source**: Calculated from problem data using standard GSD formula
- **Phase**: Phase 1
- **Target Audience**: This specific system
- **Confidence**: ✅ High (deterministic calculation)
- **Related Dimension**: Image coverage, overlap
## Fact #12
- **Statement**: GPU-accelerated ORB-SLAM3 achieves 2.8x speedup on desktop systems. 30 FPS possible on Jetson TX2. Feature extraction speedup up to 3x with CUDA-optimized pipelines.
- **Source**: Source #12
- **Phase**: Phase 1
- **Target Audience**: GPU-equipped systems
- **Confidence**: ✅ High
- **Related Dimension**: Processing speed
## Fact #13
- **Statement**: Without IMU, the Mateos-Ramirez paper (Source #1) would lose: (a) yaw angle for rotation compensation, (b) fallback when feature matching fails. Their 142.88m error would likely be significantly higher without IMU heading data.
- **Source**: Inference from Source #1 methodology
- **Phase**: Phase 1
- **Target Audience**: This specific system
- **Confidence**: ⚠️ Medium (reasoned inference)
- **Related Dimension**: No-IMU impact
## Fact #14
- **Statement**: DALGlue achieves 11.8% improvement over LightGlue on matching accuracy while maintaining real-time performance through dual-tree complex wavelet preprocessing and linear attention.
- **Source**: Source #5
- **Phase**: Phase 1
- **Target Audience**: Feature matching systems
- **Confidence**: ✅ High (peer-reviewed, 2025)
- **Related Dimension**: Feature matching quality
## Fact #15
- **Statement**: Cross-view geo-localization benchmarks show MA@20m improving by +10% with latest methods (SSPT). RDS metric at 84.40% indicates reliable spatial positioning.
- **Source**: Source #11
- **Phase**: Phase 1
- **Target Audience**: Cross-view matching researchers
- **Confidence**: ✅ High
- **Related Dimension**: Cross-view matching accuracy
@@ -0,0 +1,115 @@
# Comparison Framework
## Selected Framework Type
Decision Support (component-by-component solution comparison)
## System Components
1. Visual Odometry (consecutive frame matching)
2. Satellite Image Geo-Referencing (cross-view matching)
3. Heading & Orientation Estimation (without IMU)
4. Drift Correction & Position Fusion
5. Segment Management & Route Reconnection
6. Interactive Point-to-GPS Lookup
7. Pipeline Orchestration & API
---
## Component 1: Visual Odometry
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| ORB-SLAM3 monocular | ORB features, BA, map management | Mature, well-tested, handles loop closure. GPU-accelerated. 30FPS on Jetson TX2. | Scale ambiguity without IMU. Over-engineered for sequential aerial — map building not needed. Heavy dependency. | Medium — too complex for the use case |
| Homography-based VO with SuperPoint+LightGlue | SuperPoint, LightGlue, OpenCV homography | Ground plane assumption perfect for flat terrain at 400m. Cleanly separates rotation/translation. Known altitude resolves scale directly. Fast. | Assumes planar scene (valid for our case). Fails at sharp turns (but that's expected). | **Best fit** — matches constraints exactly |
| Optical flow VO | cv2.calcOpticalFlowPyrLK or RAFT | Dense motion field, no feature extraction needed. | Less accurate for large motions. Struggles with texture-sparse areas. No inherent rotation estimation. | Low — not suitable for 100m baselines |
| Direct method (SVO) | SVO Pro | Sub-pixel precision, fast. | Designed for small baselines and forward cameras. Poor for downward aerial at large baselines. | Low |
**Selected**: Homography-based VO with SuperPoint + LightGlue features
---
## Component 2: Satellite Image Geo-Referencing
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| SuperPoint + LightGlue cross-view matching | SuperPoint, LightGlue, perspective warp | Best overall performance on satellite stereo benchmarks. Fast (~50ms matching). Rotation-invariant. Handles viewpoint/scale changes. | Requires perspective warping to reduce viewpoint gap. Needs good satellite image quality. | **Best fit** — proven on satellite imagery |
| SuperPoint + SuperGlue + GIM | SuperPoint, SuperGlue, GIM | GIM adds generalization for challenging scenes. 93% match rate (ITU thesis). | SuperGlue slower than LightGlue. GIM adds complexity. | Good — slightly better robustness, slower |
| LoFTR (detector-free) | LoFTR | No keypoint detection step. Works on low-texture. | Slower than detector-based methods. Fixed resolution (coarse). Less accurate than SuperPoint+LightGlue on satellite benchmarks. | Medium — fallback option |
| DUSt3R/MASt3R | DUSt3R/MASt3R | Handles extreme viewpoints and low overlap. +50% completeness over COLMAP in sparse scenarios. | Very slow. Designed for 3D reconstruction not 2D matching. Unreliable with many images. | Low — only for extreme fallback |
| Terrain-weighted optimization (YFS90) | Custom pipeline + DEM | <7m MAE without IMU! Drift-free. Handles thermal IR. 20 scenarios validated. | Requires DEM data. More complex implementation. Not open-source matching details. | High — architecture inspiration |
**Selected**: SuperPoint + LightGlue (primary) with perspective warping. GIM as supplementary for difficult matches. YFS90-style terrain-weighted sliding window for position optimization.
---
## Component 3: Heading & Orientation Estimation
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Homography decomposition (consecutive frames) | OpenCV decomposeHomographyMat | Directly gives rotation between frames. Works with ground plane assumption. No extra sensors needed. | Accumulates heading drift over time. Noisy for small motions. Ambiguous decomposition (need to select correct solution). | **Best fit** — primary heading source |
| Satellite matching absolute orientation | From satellite match homography | Provides absolute heading correction. Eliminates accumulated heading drift. | Only available when satellite match succeeds. Intermittent. | **Best fit** — drift correction for heading |
| Optical flow direction | Dense flow vectors | Simple to compute. | Very noisy at high altitude. Unreliable for heading. | Low |
**Selected**: Homography decomposition for frame-to-frame heading + satellite matching for periodic absolute heading correction.
---
## Component 4: Drift Correction & Position Fusion
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Kalman filter (EKF/UKF) | filterpy or custom | Well-understood. Handles noisy measurements. Good for fusing VO + satellite. | Assumes Gaussian noise. Linearization issues with EKF. | Good — simple and effective |
| Sliding window optimization with terrain constraints | Custom optimization, scipy.optimize | YFS90 achieves <7m with this. Directly constrains drift. No loop closure needed. | More complex to implement. Needs tuning. | **Best fit** — proven for this exact problem |
| Pose graph optimization | g2o, GTSAM | Standard in SLAM. Handles satellite anchors as prior factors. Globally optimal. | Heavy dependency. Over-engineered if segments are short. | Medium — overkill unless routes are very long |
| Simple anchor reset | Direct correction at satellite match | Simplest. Just replace VO position with satellite position. | Discontinuous trajectory. No smoothing. | Low — too crude |
**Selected**: Sliding window optimization with terrain constraints (inspired by YFS90), with Kalman filter as simpler fallback. Satellite matches as absolute anchor constraints.
---
## Component 5: Segment Management & Route Reconnection
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Segments-first architecture with satellite anchoring | Custom segment manager | Each segment independently geo-referenced. No dependency between disconnected segments. Natural handling of sharp turns. | Needs robust satellite matching per segment. Segments without any satellite match are "floating". | **Best fit** — matches AC requirement for core strategy |
| Global pose graph with loop closure | g2o/GTSAM | Can connect segments when they revisit same area. | Heavy. Doesn't help if segments don't overlap with each other. | Low — segments may not revisit same areas |
| Trajectory-level VPR (NaviLoc-style) | VPR + trajectory optimization | Global optimization across trajectory. | Requires pre-computed VPR database. Complex. Designed for continuous trajectory, not disconnected segments. | Low |
**Selected**: Segments-first architecture. Each segment starts from a satellite anchor or user input. Segments connected through shared satellite coordinate frame.
---
## Component 6: Interactive Point-to-GPS Lookup
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Homography projection (image → ground) | Computed homography from satellite match | Already computed during geo-referencing. Accurate for flat terrain. | Only works for images with successful satellite match. | **Best fit** |
| Camera ray-casting with known altitude | Camera intrinsics + pose estimate | Works for any image with pose estimate. Simpler math. | Accuracy depends on pose estimate quality. | Good — fallback for non-satellite-matched images |
**Selected**: Homography projection (primary) + ray-casting (fallback).
---
## Component 7: Pipeline & API
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Python FastAPI + SSE | FastAPI, EventSourceResponse, asyncio | Native SSE support (since 0.135.0). Async GPU pipeline. Excellent for ML/CV workloads. Rich ecosystem. | Python GIL (mitigated with async/multiprocessing). | **Best fit** — natural for CV/ML pipeline |
| .NET ASP.NET Core + SSE | ASP.NET Core, SignalR | High performance. Good for enterprise. | Less natural for CV/ML. Python interop needed for PyTorch models. Adds complexity. | Low — unnecessary indirection |
| Python + gRPC streaming | gRPC | Efficient binary protocol. Bidirectional streaming. | More complex client integration. No browser-native support. | Medium — overkill for this use case |
**Selected**: Python FastAPI with SSE.
---
## Google Maps Tile Resolution at Latitude 48° (Operational Area)
| Zoom Level | Meters/pixel | Tile coverage (256px) | Tiles for 20km² | Download size est. |
|-----------|-------------|----------------------|-----------------|-------------------|
| 17 | 0.80 m/px | ~205m × 205m | ~500 tiles | ~20MB |
| 18 | 0.40 m/px | ~102m × 102m | ~2,000 tiles | ~80MB |
| 19 | 0.20 m/px | ~51m × 51m | ~8,000 tiles | ~320MB |
| 20 | 0.10 m/px | ~26m × 26m | ~30,000 tiles | ~1.2GB |
Formula: metersPerPx = 156543.03 × cos(48° × π/180) / 2^zoom ≈ 104,771 / 2^zoom
**Selected**: Zoom 18 (0.40 m/px) as primary matching resolution. Zoom 19 (0.20 m/px) for refinement if available. Meets the ≥0.5 m/pixel AC requirement.
@@ -0,0 +1,146 @@
# Reasoning Chain
## Dimension 1: GPS Accuracy (50m/80%, 20m/60%)
### Fact Confirmation
- YFS90 system achieves <7m MAE without IMU (Fact from Source DOAJ/GitHub)
- NaviLoc achieves 19.5m MLE at 50-150m altitude (Fact #4)
- Mateos-Ramirez achieves 143m mean error at >1000m altitude with IMU (Fact #1)
- Our GSD is 6cm/pixel at 400m altitude (Fact #11)
- ITU thesis achieves GPS-level accuracy with VO+SIM at 30-100m (Fact #3)
### Reference Comparison
- At 400m altitude, our camera produces much higher resolution imagery than typical systems
- YFS90 at <7m without IMU is the strongest reference — uses terrain-weighted constraint optimization
- NaviLoc at 19.5m uses trajectory-level optimization but at lower altitude
- The combination of VO + satellite matching with sliding window optimization should achieve 10-30m depending on satellite image quality
### Conclusion
- **50m / 80%**: High confidence achievable. Multiple systems achieve better than this.
- **20m / 60%**: Achievable with good satellite imagery. YFS90 achieves <7m. Our higher altitude makes cross-view matching harder, but 26MP camera compensates.
- **10m stretch**: Possible with zoom 19 satellite tiles (0.2m/px) and terrain-weighted optimization.
### Confidence: ✅ High for 50m, ⚠️ Medium for 20m, ❓ Low for 10m
---
## Dimension 2: No-IMU Heading Estimation
### Fact Confirmation
- Homography decomposition gives rotation between frames for planar scenes (multiple sources)
- Ground plane assumption is valid for flat terrain (eastern Ukraine steppe)
- Satellite matching provides absolute orientation correction (Sources #1, #2)
- YFS90 achieves <7m without requiring IMU (Source #3 DOAJ)
### Reference Comparison
- Most published systems use IMU for heading — our approach is less common
- YFS90 proves it's possible without IMU, but uses DEM data for terrain weighting
- The key insight: satellite matching provides both position AND heading correction, making intermittent heading drift from VO acceptable
### Conclusion
Heading estimation from homography decomposition between consecutive frames + periodic satellite matching correction is viable. The frame-to-frame heading drift accumulates, but satellite corrections at regular intervals (every 5-20 frames) reset it. The flat terrain of the operational area makes the ground plane assumption reliable.
### Confidence: ⚠️ Medium — novel approach but supported by YFS90 results
---
## Dimension 3: Processing Speed (<5s per image)
### Fact Confirmation
- LightGlue: ~20-50ms per pair (Fact #5)
- SuperPoint extraction: ~50-100ms per image
- GPU-accelerated ORB-SLAM3: 30 FPS (Fact #12)
- NaviLoc: 9 FPS on Raspberry Pi 5 (Fact #4)
### Pipeline Time Budget Estimate (per image on RTX 2060)
1. SuperPoint feature extraction: ~80ms
2. LightGlue VO matching (vs previous frame): ~40ms
3. Homography estimation + position update: ~5ms
4. Satellite tile crop (from cache): ~10ms
5. SuperPoint extraction on satellite crop: ~80ms
6. LightGlue satellite matching: ~60ms
7. Position correction + sliding window optimization: ~20ms
8. Total: ~295ms ≈ 0.3s
### Conclusion
Processing comfortably fits within 5s budget. Even with additional overhead (satellite tile download, perspective warping, GIM fallback), the pipeline stays under 2s. The 5s budget provides ample margin.
### Confidence: ✅ High
---
## Dimension 4: Sharp Turns & Route Disconnection
### Fact Confirmation
- At <5% overlap, consecutive feature matching will fail
- Satellite matching can provide absolute position independently of VO
- DUSt3R/MASt3R handle extreme low overlap (+50% completeness vs COLMAP)
- YFS90 handles positioning failures with re-localization
### Reference Comparison
- Traditional VO systems fail at sharp turns — this is expected and acceptable
- The segments-first architecture treats each continuous VO chain as a segment
- Satellite matching re-localizes at the start of each new segment
- If satellite matching fails too → wider search area → user input
### Conclusion
The system should not try to match across sharp turns. Instead:
1. Detect VO failure (low match count / high reprojection error)
2. Start new segment
3. Attempt satellite geo-referencing for new segment start
4. Each segment is independently positioned in the global satellite coordinate frame
This is architecturally simpler and more robust than trying to bridge disconnections.
### Confidence: ✅ High
---
## Dimension 5: Satellite Image Matching Reliability
### Fact Confirmation
- Google Maps at zoom 18: 0.40 m/px at lat 48° — meets AC requirement
- Eastern Ukraine imagery may be 2-5 years old (Fact #7)
- SuperPoint+LightGlue is best performer for satellite matching (Source comparison study)
- Perspective warping improves cross-view matching significantly
- 93% match rate achieved in ITU thesis (Fact #3)
### Reference Comparison
- The main risk is satellite image freshness in conflict zone
- Natural terrain features (rivers, forests, field boundaries) are relatively stable over years
- Man-made features (buildings, roads) may change due to conflict
- Agricultural field patterns change seasonally
### Conclusion
Satellite matching will work reliably in areas with stable natural features. Performance degrades in:
1. Areas with significant conflict damage (buildings destroyed)
2. Areas with seasonal agricultural changes
3. Areas with very homogeneous texture (large uniform fields)
Mitigation: use multiple scale levels, widen search area, accept lower confidence.
### Confidence: ⚠️ Medium — depends heavily on operational area characteristics
---
## Dimension 6: Architecture Selection
### Fact Confirmation
- YFS90 architecture (VO + satellite matching + terrain-weighted optimization) achieves <7m
- ITU thesis architecture (ORB-SLAM3 + SIM) achieves GPS-level accuracy
- NaviLoc architecture (VPR + trajectory optimization) achieves 19.5m
### Reference Comparison
- YFS90 is closest to our requirements: no IMU, satellite matching, drift correction
- Our system adds: segment management, real-time streaming, user fallback
- We need simpler VO than ORB-SLAM3 (no map building needed)
- We need faster matching than SuperGlue (LightGlue preferred)
### Conclusion
Hybrid architecture combining:
- YFS90-style sliding window optimization for drift correction
- SuperPoint + LightGlue for both VO and satellite matching (unified feature pipeline)
- Segments-first architecture for disconnection handling
- FastAPI + SSE for real-time streaming
### Confidence: ✅ High
@@ -0,0 +1,57 @@
# Validation Log
## Validation Scenario
Using the provided sample data: 60 consecutive images from a flight starting at (48.275292, 37.385220) heading generally south-southwest. Camera: 26MP at 400m altitude.
## Expected Behavior Based on Conclusions
### Normal consecutive frames (AD000001-AD000032)
- VO successfully matches consecutive frames (60-73% overlap)
- Satellite matching every 5-10 frames provides absolute correction
- Position error stays within 20-50m corridor around ground truth
- Heading estimated from homography, corrected by satellite matching
### Apparent maneuver zone (AD000033-AD000048)
- The coordinates show the UAV making a complex turn around images 33-48
- Some consecutive pairs may have low overlap → VO quality drops
- Satellite matching becomes the primary position source
- New segments may be created if VO fails completely
- Position confidence drops in this zone
### Return to straight flight (AD000049-AD000060)
- VO re-establishes strong consecutive matching
- Satellite matching re-anchors position
- Accuracy returns to normal levels
## Actual Validation (Calculated)
Distances between consecutive samples in the data:
- AD000001→002: ~180m (larger than stated 100m — likely exaggeration in problem description)
- AD000002→003: ~115m
- Typical gap: 80-180m
- At 376m footprint width and 250m height, even 180m gap gives 52-73% overlap → sufficient for VO
At the turn zone (images 33-48):
- AD000041→042: ~230m with direction change → overlap may drop to 30-40%
- AD000042→043: ~230m with direction change → overlap may drop significantly
- AD000045→046: ~160m with direction change → may be <20% overlap
- These transitions are where VO may fail → satellite matching needed
## Counterexamples
1. **Homogeneous terrain**: If a section of the flight is over large uniform agricultural fields with no distinguishing features, both VO and satellite matching may fail. Mitigation: use higher zoom satellite tiles, rely on VO with lower confidence.
2. **Conflict-damaged area**: If satellite imagery shows pre-war structures that no longer exist, satellite matching will produce incorrect position estimates. Mitigation: confidence scoring will flag inconsistent matches.
3. **FullHD resolution flight**: At GSD 20cm/pixel instead of 6cm, matching quality degrades ~3x. The 50m target may still be achievable but 20m will be very difficult.
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Issue found: The problem states "within 100 meters of each other" but actual data shows 80-230m. Pipeline must handle larger baselines.
- [x] Issue found: Tile download strategy needs to handle unknown route direction — progressive expansion needed.
## Conclusions Requiring Revision
- Photo spacing is 80-230m not strictly 100m — increases the range of overlap variations. Still functional but wider variance than assumed.
- Route direction is unknown at start — satellite tile pre-loading must use expanding radius strategy, not directional pre-loading.