fresh start. Another try

Oleksandr Bezdieniezhnykh
2026-04-25 23:13:40 +03:00
parent 17d7730048
commit 2178737b36
101 changed files with 1 additions and 15518 deletions
@@ -1,283 +0,0 @@
# Solution Draft
## Product Solution Description
A real-time GPS-denied visual navigation system for fixed-wing UAVs, running entirely on a Jetson Orin Nano Super (8GB). The system determines frame-center GPS coordinates by fusing three information sources: (1) CUDA-accelerated visual odometry (cuVSLAM), (2) absolute position corrections from satellite image matching, and (3) IMU-based motion prediction. Results stream to clients via REST API + SSE in real time.
**Hard constraint**: The camera shoots at roughly 2.5-3fps (333-400ms between frames). The full pipeline must complete within **400ms per frame**.
**Satellite matching strategy**: Benchmark LiteSAM on actual Orin Nano Super hardware as a day-one priority. If LiteSAM cannot achieve ≤400ms at 480px resolution, **abandon it entirely** and use XFeat semi-dense matching as the primary satellite matcher. Speed is non-negotiable.
**Core architectural principles**:
1. **cuVSLAM handles VO** — NVIDIA's CUDA-accelerated library achieves 90fps on Jetson Orin Nano, giving VO essentially "for free" (~11ms/frame).
2. **Keyframe-based satellite matching** — satellite matcher runs on keyframes only (every 3-10 frames), amortizing its cost. Non-keyframes rely on cuVSLAM VO + IMU.
3. **Every keyframe independently attempts satellite-based geo-localization** — this handles disconnected segments natively.
4. **Pipeline parallelism** — satellite matching for frame N overlaps with VO processing of frame N+1 via CUDA streams.
```
┌─────────────────────────────────────────────────────────────────┐
│ OFFLINE (Before Flight) │
│ Satellite Tiles → Download & Crop → Store as tile pairs │
│ (Google Maps) (per flight plan) (disk, GeoHash indexed) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ ONLINE (During Flight) │
│ │
│ EVERY FRAME (400ms budget): │
│ ┌────────────────────────────────┐ │
│ │ Camera → Downsample (CUDA 2ms)│ │
│ │ → cuVSLAM VO+IMU (~11ms)│──→ ESKF Update → SSE Emit │
│ └────────────────────────────────┘ ↑ │
│ │ │
│ KEYFRAMES ONLY (every 3-10 frames): │ │
│ ┌────────────────────────────────────┐ │ │
│ │ Satellite match (async CUDA stream)│─────┘ │
│ │ LiteSAM or XFeat (see benchmark) │ │
│ │ (does NOT block VO output) │ │
│ └────────────────────────────────────┘ │
│ │
│ IMU: 100+Hz continuous → ESKF prediction │
└─────────────────────────────────────────────────────────────────┘
```
## Speed Optimization Techniques
### 1. cuVSLAM for Visual Odometry (~11ms/frame)
NVIDIA's CUDA-accelerated VO library (v15.0.0, March 2026) achieves 90fps on Jetson Orin Nano. Supports monocular camera + IMU natively. Features: automatic IMU fallback when visual tracking fails, loop closure, Python and C++ APIs. This eliminates custom VO entirely.
### 2. Keyframe-Based Satellite Matching
Not every frame needs satellite matching. Strategy:
- cuVSLAM provides VO at every frame (high-rate, low-latency)
- Satellite matching triggers on **keyframes** selected by:
- Fixed interval: every 3-10 frames (~1-3.3s between satellite corrections)
- Confidence drop: when ESKF covariance exceeds threshold
- VO failure: when cuVSLAM reports tracking loss (sharp turn)
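
The three triggers above can be combined into a single predicate. The following is a minimal sketch, not the project's implementation: `KeyframePolicy`, `is_keyframe`, and the covariance threshold value are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyframePolicy:
    """Hypothetical thresholds; both values would need tuning on flight data."""
    interval: int = 5                 # fixed interval, inside the 3-10 frame range
    cov_threshold_m2: float = 400.0   # assumed limit on position covariance trace (m^2)


def is_keyframe(frames_since_keyframe: int,
                position_cov_trace_m2: float,
                tracking_lost: bool,
                policy: KeyframePolicy = KeyframePolicy()) -> bool:
    """Return True if any of the three keyframe triggers fires."""
    if tracking_lost:                                      # VO failure
        return True
    if position_cov_trace_m2 > policy.cov_threshold_m2:    # confidence drop
        return True
    return frames_since_keyframe >= policy.interval        # fixed interval
```

Checking tracking loss first keeps the recovery path independent of the interval counter, so a sharp turn triggers a satellite match on the very next frame.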
### 3. Satellite Matcher Selection (Benchmark-Driven)
**Candidate A: LiteSAM (opt)** — Best accuracy for satellite-aerial matching (RMSE@30 = 17.86m on UAV-VisLoc). 6.31M params, MobileOne + TAIFormer + MinGRU. Benchmarked at 497ms on Jetson AGX Orin at 1184px. AGX Orin is 3-4x more powerful than Orin Nano Super (275 TOPS vs 67 TOPS, $2000+ vs $249).
Realistic Orin Nano Super estimates:
- At 1184px: ~1.5-2.0s (unusable)
- At 640px: ~500-800ms (borderline)
- At 480px: ~300-500ms (best case)
**Candidate B: XFeat semi-dense** — ~50-100ms on Orin Nano Super. Proven on Jetson. Not specifically designed for cross-view satellite-aerial, but fast and reliable.
**Decision rule**: Benchmark LiteSAM TensorRT FP16 at 480px on Orin Nano Super. If ≤400ms → use LiteSAM. If >400ms → **abandon LiteSAM, use XFeat as primary**. No hybrid compromises — pick one and optimize it.
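
The decision rule can be written as a small go/no-go function over measured latencies. This is a sketch under stated assumptions: the function name `choose_matcher` is hypothetical, and gating on the 95th percentile instead of the mean is an assumption (the text only says "≤400ms"), chosen because a hard deadline is sensitive to tail latency.

```python
import math

BUDGET_MS = 400.0  # hard per-frame constraint from the draft


def choose_matcher(litesam_latencies_ms: list[float]) -> str:
    """Pick the satellite matcher from LiteSAM benchmark runs at 480px.

    Uses the 95th-percentile latency (an assumption of this sketch).
    """
    if not litesam_latencies_ms:
        raise ValueError("need at least one benchmark measurement")
    ordered = sorted(litesam_latencies_ms)
    idx = math.ceil(0.95 * len(ordered)) - 1   # nearest-rank percentile
    p95 = ordered[idx]
    return "litesam" if p95 <= BUDGET_MS else "xfeat"
```

For example, twenty runs at 350ms select LiteSAM, while the same set with two 450ms outliers selects XFeat.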
### 4. TensorRT FP16 Optimization
LiteSAM's MobileOne backbone is reparameterizable — multi-branch training structure collapses to a single feed-forward path at inference. Combined with TensorRT FP16, this maximizes throughput. INT8 is possible for MobileOne backbone but ViT/transformer components may degrade with INT8.
### 5. CUDA Stream Pipelining
Overlap operations across consecutive frames:
- Stream A: cuVSLAM VO for current frame (~11ms) + ESKF fusion (~1ms)
- Stream B: Satellite matching for previous keyframe (async)
- CPU: SSE emission, tile management, keyframe selection logic
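
The overlap can be illustrated without a GPU. The sketch below uses a single worker thread as a stand-in for Stream B. It is not CUDA code, and `run_vo`, `match_satellite`, and the sleep durations are hypothetical placeholders for the real VO and matcher calls.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def run_vo(frame_id: int) -> dict:
    """Stand-in for the fast per-frame VO + ESKF step (hypothetical)."""
    return {"frame": frame_id, "source": "vo"}


def match_satellite(frame_id: int) -> dict:
    """Stand-in for the slow satellite matcher (hypothetical)."""
    time.sleep(0.05)  # pretend matching takes longer than one frame interval
    return {"frame": frame_id, "source": "satellite"}


def run_pipeline(num_frames: int, keyframe_interval: int = 3) -> list[dict]:
    events: list[dict] = []
    pending: list[Future] = []
    with ThreadPoolExecutor(max_workers=1) as sat_worker:
        for frame_id in range(num_frames):
            events.append(run_vo(frame_id))  # VO result is emitted immediately
            if frame_id % keyframe_interval == 0:
                pending.append(sat_worker.submit(match_satellite, frame_id))
            # Collect matches that have finished; never wait for them here.
            still_running = []
            for fut in pending:
                if fut.done():
                    events.append(fut.result())
                else:
                    still_running.append(fut)
            pending = still_running
            time.sleep(0.02)  # stand-in for the camera frame interval
        for fut in pending:  # drain remaining matches at end of session
            events.append(fut.result())
    return events
```

The main loop only polls `done()`, so a slow match delays the satellite correction but never the VO output for the following frames.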
### 6. Pre-cropped Satellite Tiles
Offline: for each satellite tile, store both the raw image and a pre-resized version matching the satellite matcher's input resolution. Runtime avoids resize cost.
## Existing/Competitor Solutions Analysis
| Solution | Approach | Accuracy | Hardware | Limitations |
|----------|----------|----------|----------|-------------|
| Mateos-Ramirez et al. (2024) | VO (ORB) + satellite keypoint correction + Kalman | 142m mean / 17km (0.83%) | Orange Pi class | No re-localization; ORB only; 1000m+ altitude |
| SatLoc (2025) | DinoV2 + XFeat + optical flow + adaptive fusion | <15m, >90% coverage | Edge (unspecified) | Paper not fully accessible |
| LiteSAM (2025) | MobileOne + TAIFormer + MinGRU subpixel refinement | RMSE@30 = 17.86m on UAV-VisLoc | RTX 3090 (62ms), AGX Orin (497ms) | Not tested on Orin Nano; AGX Orin is 3-4x more powerful |
| TerboucheHacene/visual_localization | SuperPoint/SuperGlue/GIM + VO + satellite | Not quantified | Desktop-class | Not edge-optimized |
| cuVSLAM (NVIDIA, 2025-2026) | CUDA-accelerated VO+SLAM, mono/stereo/IMU | <1% trajectory error (KITTI), <5cm (EuRoC) | Jetson Orin Nano (90fps) | VO only, no satellite matching |
**Key insight**: Combine cuVSLAM (best-in-class VO for Jetson) with the fastest viable satellite-aerial matcher via ESKF fusion. LiteSAM is the accuracy leader but unproven on Orin Nano Super — benchmark first, abandon for XFeat if too slow.
## Architecture
### Component: Visual Odometry
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| cuVSLAM (mono+IMU) | PyCuVSLAM / C++ API | 90fps on Orin Nano, NVIDIA-optimized, loop closure, IMU fallback | Closed-source CUDA library | ✅ Best |
| XFeat frame-to-frame | XFeatTensorRT | 5x faster than SuperPoint, open-source | ~30-50ms total, no IMU integration | ⚠️ Fallback |
| ORB-SLAM3 | OpenCV + custom | Well-understood, open-source | CPU-heavy, ~30fps on Orin | ⚠️ Slower |
**Selected**: **cuVSLAM (mono+IMU mode)** — purpose-built by NVIDIA for Jetson. ~11ms/frame leaves 389ms for everything else. Auto-fallback to IMU when visual tracking fails.
### Component: Satellite Image Matching
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| LiteSAM (opt) | TensorRT | Best satellite-aerial accuracy (RMSE@30 17.86m), 6.31M params, subpixel refinement | 497ms on AGX Orin at 1184px; AGX Orin is 3-4x more powerful than Orin Nano Super | ✅ If benchmark passes |
| XFeat semi-dense | XFeatTensorRT | ~50-100ms, lightweight, Jetson-proven | Not designed for cross-view satellite-aerial | ✅ If LiteSAM fails benchmark |
| EfficientLoFTR | TensorRT | Good accuracy, semi-dense | 15.05M params (2.4x LiteSAM), slower | ⚠️ Heavier |
| SuperPoint + LightGlue | TensorRT C++ | Good general matching | Sparse only, worse on satellite-aerial | ⚠️ Not specialized |
**Selection**: Benchmark-driven. Day-one test on Orin Nano Super:
1. Export LiteSAM (opt) to TensorRT FP16
2. Measure at 480px, 640px, 800px
3. If ≤400ms at 480px → **LiteSAM**
4. If >400ms at 480px → **XFeat semi-dense** (primary, no hybrid)
### Component: Sensor Fusion
| Solution | Tools | Advantages | Limitations | Fit |
|----------|-------|-----------|-------------|-----|
| Error-State EKF (ESKF) | Custom Python/C++ | Lightweight, multi-rate, well-understood | Linear approximation | ✅ Best |
| Hybrid ESKF/UKF | Custom | 49% better accuracy | More complex | ⚠️ Upgrade path |
| Factor Graph (GTSAM) | GTSAM | Best accuracy | Heavy compute | ❌ Too heavy |
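**Selected**: **ESKF** with adaptive measurement noise. State vector: [position(3), velocity(3), orientation_quat(4), accel_bias(3), gyro_bias(3)] = 16 nominal states. The error state that the filter propagates has 15 dimensions, because the orientation error is a 3-vector rather than a quaternion.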
Measurement sources and rates:
- IMU prediction: 100+Hz
- cuVSLAM VO update: ~3Hz (every frame)
- Satellite update: ~0.3-1Hz (keyframes only, delayed via async pipeline)
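
To show the multi-rate predict/update pattern, the sketch below reduces the problem to one axis with a plain two-state Kalman filter. This is a deliberate simplification and not the 16-state ESKF: the class name `PositionFilter1D`, the noise values, and the treatment of a satellite fix as a direct position measurement are all assumptions for illustration.

```python
class PositionFilter1D:
    """1D position/velocity Kalman filter (simplified stand-in for the ESKF)."""

    def __init__(self, pos_var: float = 100.0, vel_var: float = 1.0):
        self.p, self.v = 0.0, 0.0
        # Entries of the symmetric 2x2 covariance [[p00, p01], [p01, p11]].
        self.p00, self.p01, self.p11 = pos_var, 0.0, vel_var

    def predict(self, accel: float, dt: float, accel_noise: float = 0.5) -> None:
        """IMU step: integrate acceleration and grow the covariance."""
        self.p += self.v * dt + 0.5 * accel * dt * dt
        self.v += accel * dt
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and white-noise acceleration Q.
        q = accel_noise ** 2
        p00 = self.p00 + 2.0 * dt * self.p01 + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        self.p00 = p00 + 0.25 * dt ** 4 * q
        self.p01 = p01 + 0.5 * dt ** 3 * q
        self.p11 = self.p11 + dt * dt * q

    def update_position(self, z: float, meas_var: float) -> None:
        """Absolute position fix (e.g. a satellite match) with variance meas_var."""
        s = self.p00 + meas_var                  # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s      # Kalman gain for H = [1, 0]
        innovation = z - self.p
        self.p += k0 * innovation
        self.v += k1 * innovation
        p00, p01, p11 = self.p00, self.p01, self.p11
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p11 = p11 - k1 * p01


# 100 IMU steps at 100Hz, then one satellite fix.
f = PositionFilter1D()
for _ in range(100):
    f.predict(accel=1.0, dt=0.01)
f.update_position(z=10.0, meas_var=25.0)
```

Adaptive measurement noise would enter through `meas_var`, for instance by scaling it with the matcher's inlier count. A VO update would follow the same pattern with its own measurement model.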
### Component: Satellite Tile Preprocessing (Offline)
**Selected**: **GeoHash-indexed tile pairs on disk**.
Pipeline:
1. Define operational area from flight plan
2. Download satellite tiles from Google Maps Tile API at max zoom (18-19)
3. Pre-resize each tile to matcher input resolution
4. Store: original tile + resized tile + metadata (GPS bounds, zoom, GSD) in GeoHash-indexed directory structure
5. Copy to Jetson storage before flight
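
For the GeoHash indexing in step 4, a minimal encoder using the standard GeoHash bit-interleaving scheme could look like the sketch below. The directory layout returned by `tile_dir` (a `tiles/<geohash>` folder) and the default precision are hypothetical, since the draft does not specify them.

```python
from pathlib import Path

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a WGS-84 coordinate as a GeoHash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2.0
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2.0
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits per base-32 character
            chars.append(_BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def tile_dir(root: Path, lat: float, lon: float, precision: int = 7) -> Path:
    """Hypothetical on-disk layout: one directory per GeoHash cell."""
    return root / geohash_encode(lat, lon, precision)
```

Because nearby coordinates usually share a GeoHash prefix, widening the search radius can be done by shortening the prefix. Cells on either side of a cell boundary are the exception and need an explicit neighbor lookup.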
### Component: Re-localization (Disconnected Segments)
**Selected**: **Keyframe satellite matching is always active + expanded search on VO failure**.
When cuVSLAM reports tracking loss (sharp turn, no features):
1. Immediately flag next frame as keyframe → trigger satellite matching
2. Expand tile search radius (from ±200m to ±1km based on IMU dead-reckoning uncertainty)
3. If match found: position recovered, new segment begins
4. If 3+ consecutive keyframe failures: request user input via API
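
The recovery steps above amount to a small state machine. In this sketch the names `search_radius_m` and `Relocalizer`, the returned action strings, and the 3-sigma multiplier `k` are hypothetical. Only the ±200m and ±1km bounds and the three-failure limit come from the text.

```python
from dataclasses import dataclass

MIN_RADIUS_M = 200.0    # normal tile search radius
MAX_RADIUS_M = 1000.0   # expanded radius cap
MAX_FAILURES = 3        # consecutive keyframe failures before asking the user


def search_radius_m(position_sigma_m: float, k: float = 3.0) -> float:
    """Scale the search radius with dead-reckoning uncertainty, clamped to bounds."""
    return min(MAX_RADIUS_M, max(MIN_RADIUS_M, k * position_sigma_m))


@dataclass
class Relocalizer:
    consecutive_failures: int = 0

    def on_keyframe_result(self, matched: bool) -> str:
        """Return the next action after a satellite match attempt."""
        if matched:
            self.consecutive_failures = 0
            return "new_segment"
        self.consecutive_failures += 1
        if self.consecutive_failures >= MAX_FAILURES:
            return "request_user_input"
        return "retry_expanded"
```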
### Component: Object Center Coordinates
Geometric calculation once frame-center GPS is known:
1. Pixel offset from center: (dx_px, dy_px)
2. Convert to meters: dx_m = dx_px × GSD, dy_m = dy_px × GSD
3. Rotate by IMU yaw heading
4. Convert meter offset to lat/lon and add to frame-center GPS
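
The four steps can be sketched as one function. The conventions here are assumptions, since the draft does not state them: the camera points straight down, the top of the image is aligned with the aircraft heading, image x grows to the right and y grows downward, yaw is measured clockwise from north, and a local flat-Earth approximation is used for the final conversion.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius


def pixel_to_latlon(center_lat: float, center_lon: float,
                    dx_px: float, dy_px: float,
                    gsd_m_per_px: float, yaw_deg: float) -> tuple[float, float]:
    """Convert a pixel offset from the frame center into a lat/lon coordinate."""
    # Step 2: pixel offset to meters in the image frame.
    right_m = dx_px * gsd_m_per_px
    forward_m = -dy_px * gsd_m_per_px  # image y grows downward
    # Step 3: rotate into north/east using the heading.
    yaw = math.radians(yaw_deg)
    north_m = forward_m * math.cos(yaw) - right_m * math.sin(yaw)
    east_m = forward_m * math.sin(yaw) + right_m * math.cos(yaw)
    # Step 4: meters to degrees (small-offset approximation).
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(center_lat))))
    return center_lat + dlat, center_lon + dlon
```

With a heading of 0°, an object 100px above the frame center at 0.5 m/px lies 50m due north, so only the latitude changes.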
### Component: API & Streaming
**Selected**: **FastAPI + sse-starlette**. REST for session management, SSE for real-time position stream. OpenAPI auto-documentation.
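
For reference, each position update on the SSE stream is a small text frame. The sketch below builds that frame by hand using only the standard library, to show what a client would receive. In the actual service a library such as sse-starlette would be expected to handle this framing. The event name `position` and the payload fields are hypothetical.

```python
import json
from typing import Optional


def format_sse(event: str, payload: dict, event_id: Optional[int] = None) -> str:
    """Serialize one Server-Sent Events message (fields end with a blank line)."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(payload, separators=(',', ':'))}")
    return "\n".join(lines) + "\n\n"


# Hypothetical payload: an immediate VO-based estimate for frame 42.
message = format_sse(
    "position",
    {"frame": 42, "lat": 50.0, "lon": 30.0, "confidence": "MEDIUM"},
    event_id=42,
)
```

Sending the frame number as the event `id` would let a client match a later satellite-refined position to the earlier VO-only estimate for the same frame.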
## Processing Time Budget (per frame, 400ms budget)
### Normal Frame (non-keyframe, ~60-80% of frames)
| Step | Time | Notes |
|------|------|-------|
| Image capture + transfer | ~10ms | CSI/USB3 |
| Downsample (for cuVSLAM) | ~2ms | OpenCV CUDA |
| cuVSLAM VO+IMU | ~11ms | NVIDIA CUDA-optimized, 90fps capable |
| ESKF fusion (VO+IMU update) | ~1ms | C extension or NumPy |
| SSE emit | ~1ms | Async |
| **Total** | **~25ms** | Well within 400ms |
### Keyframe Satellite Matching (async, every 3-10 frames)
Runs asynchronously on a separate CUDA stream — does NOT block per-frame VO output.
**Path A — LiteSAM (if benchmark passes)**:
| Step | Time | Notes |
|------|------|-------|
| Downsample to ~480px | ~1ms | OpenCV CUDA |
| Load satellite tile | ~5ms | Pre-resized, from storage |
| LiteSAM (opt) matching | ~300-500ms | TensorRT FP16, 480px, Orin Nano Super estimate |
| Geometric pose (RANSAC) | ~5ms | Homography estimation |
| ESKF satellite update | ~1ms | Delayed measurement |
| **Total** | **~310-510ms** | Async, does not block VO |
**Path B — XFeat (if LiteSAM abandoned)**:
| Step | Time | Notes |
|------|------|-------|
| XFeat feature extraction (both images) | ~10-20ms | TensorRT FP16/INT8 |
| XFeat semi-dense matching | ~30-50ms | KNN + refinement |
| Geometric verification (RANSAC) | ~5ms | |
| ESKF satellite update | ~1ms | |
| **Total** | **~50-80ms** | Comfortably within budget |
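
The totals in the tables above can be checked by summing the step estimates. The sketch below copies the table values and also computes how many camera frames one asynchronous match spans at ~3fps. The helper names are hypothetical. The exact sums differ slightly from the rounded totals in the tables.

```python
import math

FRAME_INTERVAL_MS = 333  # ~3 fps camera
FRAME_BUDGET_MS = 400

# (low, high) estimates in milliseconds, copied from the tables above.
NORMAL_FRAME = {"capture": (10, 10), "downsample": (2, 2),
                "cuvslam": (11, 11), "eskf": (1, 1), "sse": (1, 1)}
PATH_A = {"downsample": (1, 1), "load_tile": (5, 5),
          "litesam": (300, 500), "ransac": (5, 5), "eskf": (1, 1)}
PATH_B = {"extract": (10, 20), "match": (30, 50),
          "ransac": (5, 5), "eskf": (1, 1)}


def total_ms(steps: dict) -> tuple[int, int]:
    """Sum the low and high estimates of all steps."""
    lows, highs = zip(*steps.values())
    return sum(lows), sum(highs)


def frames_spanned(match_ms: int) -> int:
    """Number of camera frames that elapse while one async match runs."""
    return math.ceil(match_ms / FRAME_INTERVAL_MS)


print(total_ms(NORMAL_FRAME))            # (25, 25)
print(total_ms(PATH_A))                  # (312, 512)
print(total_ms(PATH_B))                  # (46, 76)
print(frames_spanned(total_ms(PATH_A)[1]))  # 2
```

Even at the high end of Path A, a match finishes within two frame intervals, so a keyframe interval of 3-10 frames leaves the matcher idle between keyframes instead of queueing work.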
### Per-Frame Wall-Clock Latency
Every frame:
- **VO result emitted in ~25ms** (cuVSLAM + ESKF + SSE)
- Satellite correction arrives asynchronously on keyframes
- Client gets immediate position, then refined position when satellite match completes
## Memory Budget (Jetson Orin Nano Super, 8GB shared)
| Component | Memory | Notes |
|-----------|--------|-------|
| OS + runtime | ~1.5GB | JetPack 6.2 + Python |
| cuVSLAM | ~200-300MB | NVIDIA CUDA library + internal state |
| Satellite matcher TensorRT | ~50-100MB | LiteSAM FP16 or XFeat FP16 |
| Current frame (downsampled) | ~1MB | 640×480×3 bytes ≈ 0.9MB |
| Satellite tile (pre-resized) | ~1MB | Single active tile |
| ESKF state + buffers | ~10MB | |
| FastAPI + SSE runtime | ~100MB | |
| **Total** | **~1.9-2.0GB** | ~25% of 8GB, comfortable margin |
## Confidence Scoring
| Level | Condition | Expected Accuracy |
|-------|-----------|-------------------|
| HIGH | Satellite match succeeded + cuVSLAM consistent | <20m |
| MEDIUM | cuVSLAM VO only, recent satellite correction (<500m travel) | 20-50m |
| LOW | cuVSLAM VO only, no recent satellite correction | 50-100m+ |
| VERY LOW | IMU dead-reckoning only (cuVSLAM + satellite both failed) | 100m+ |
| MANUAL | User-provided position | As provided |
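
The table maps onto a short classification function. This is a sketch with hypothetical names (`Confidence`, `confidence_level`). The table does not say what to report when a satellite match succeeds while cuVSLAM has failed, so treating that case as MEDIUM is an assumption.

```python
from enum import Enum
from typing import Optional

RECENT_CORRECTION_M = 500.0  # "recent" means less than 500m travelled since the last fix


class Confidence(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"
    VERY_LOW = "VERY LOW"
    MANUAL = "MANUAL"


def confidence_level(vo_ok: bool,
                     sat_match_ok: bool,
                     travel_since_sat_m: Optional[float],
                     manual: bool = False) -> Confidence:
    """Classify the current position estimate according to the table."""
    if manual:
        return Confidence.MANUAL
    if vo_ok and sat_match_ok:
        return Confidence.HIGH
    if vo_ok:
        recent = (travel_since_sat_m is not None
                  and travel_since_sat_m < RECENT_CORRECTION_M)
        return Confidence.MEDIUM if recent else Confidence.LOW
    # VO failed. The table only covers the case where satellite matching
    # failed as well; MEDIUM for "satellite ok, VO failed" is an assumption.
    return Confidence.MEDIUM if sat_match_ok else Confidence.VERY_LOW
```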
## Key Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| LiteSAM too slow on Orin Nano Super | HIGH | Misses 400ms deadline | **Abandon LiteSAM, use XFeat**. Day-one benchmark is the go/no-go gate |
| cuVSLAM not supporting nadir-only camera well | MEDIUM | VO accuracy degrades | Fall back to XFeat frame-to-frame matching |
| Google Maps satellite quality in conflict zone | HIGH | Satellite matching fails | Accept VO+IMU with higher drift; request user input sooner; alternative satellite providers |
| XFeat cross-view accuracy insufficient | MEDIUM | Position corrections less accurate than LiteSAM | Increase keyframe frequency; multi-tile consensus voting; geometric verification with strict RANSAC |
| cuVSLAM is closed-source | LOW | Hard to debug | Fallback to XFeat VO; cuVSLAM has Python+C++ APIs |
## Testing Strategy
### Integration / Functional Tests
- End-to-end pipeline test with real flight data (60 images from input_data/)
- Compare computed positions against ground truth GPS from coordinates.csv
- Measure: percentage within 50m, percentage within 20m
- Test sharp-turn handling: introduce 90-degree heading change in sequence
- Test user-input fallback: simulate 3+ consecutive failures
- Test SSE streaming: verify client receives VO result within 50ms, satellite-corrected result within 500ms
- Test session management: start/stop/restart flight sessions via REST API
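
The accuracy metrics in the list above (percentage within 50m and within 20m of ground truth) can be computed with a great-circle distance. The sketch below assumes estimates and ground truth are already loaded as `(lat, lon)` pairs in matching order. It does not read `coordinates.csv`, because that file's column layout is not described here.

```python
import math

EARTH_MEAN_RADIUS_M = 6371000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS-84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(estimates, truth, threshold_m: float) -> float:
    """Fraction of estimates whose error is at most threshold_m."""
    errors = [haversine_m(e[0], e[1], t[0], t[1]) for e, t in zip(estimates, truth)]
    return sum(err <= threshold_m for err in errors) / len(errors)


# Illustrative data only: errors of roughly 0m, 33m, and 111m.
truth = [(50.0, 30.0)] * 3
estimates = [(50.0, 30.0), (50.0003, 30.0), (50.001, 30.0)]
within_50 = fraction_within(estimates, truth, 50.0)
within_20 = fraction_within(estimates, truth, 20.0)
```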
### Non-Functional Tests
- **Day-one benchmark**: LiteSAM TensorRT FP16 at 480/640/800px on Orin Nano Super → go/no-go for LiteSAM
- cuVSLAM benchmark: verify 90fps monocular+IMU on Orin Nano Super
- Performance: measure per-frame processing time (must be <400ms)
- Memory: monitor peak usage during 1000-frame session (must stay <8GB)
- Stress: process 3000 frames without memory leak
- Keyframe strategy: vary interval (2, 3, 5, 10) and measure accuracy vs latency tradeoff
## References
- LiteSAM (2025): https://www.mdpi.com/2072-4292/17/19/3349
- LiteSAM code: https://github.com/boyagesmile/LiteSAM
- cuVSLAM (2025-2026): https://github.com/NVlabs/PyCuVSLAM
- PyCuVSLAM API: https://nvlabs.github.io/PyCuVSLAM/api.html
- Mateos-Ramirez et al. (2024): https://www.mdpi.com/2076-3417/14/16/7420
- SatLoc (2025): https://www.scilit.com/publications/e5cafaf875a49297a62b298a89d5572f
- XFeat (CVPR 2024): https://arxiv.org/abs/2404.19174
- XFeat TensorRT for Jetson: https://github.com/PranavNedunghat/XFeatTensorRT
- EfficientLoFTR (CVPR 2024): https://github.com/zju3dv/EfficientLoFTR
- JetPack 6.2: https://docs.nvidia.com/jetson/archives/jetpack-archived/jetpack-62/release-notes/
- Hybrid ESKF/UKF: https://arxiv.org/abs/2512.17505
- Google Maps Tile API: https://developers.google.com/maps/documentation/tile/satellite
## Related Artifacts
- AC Assessment: `_docs/00_research/gps_denied_nav/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md`