mirror of
https://github.com/azaion/gps-denied-desktop.git
synced 2026-04-23 01:06:36 +00:00
add clarification to research methodology by including a step for solution comparison and user consultation
# Reasoning Chain

## WP-1: Lens Undistortion

### Fact Confirmation
According to Fact #13, lens distortion correction is crucial for UAV photogrammetry with non-metric cameras. For wide-angle lenses, distortion at the image edges can reach 5-20px. The camera parameters (intrinsic matrix K and distortion coefficients) are known.

### Current State
Draft05 mentions "rectify" in preprocessing step 2 but does not explicitly include undistortion using the camera intrinsics (K, distortion coefficients). Feature matching therefore operates on distorted images, which introduces position errors, especially at image edges.

### Conclusion
Add an explicit cv2.undistort() step after image loading and before downscaling. This corrects radial and tangential distortion across the entire image. The camera calibration matrix K and the distortion coefficients are provided as camera_params in the job request. Cost: ~5-10ms per image, negligible against the 5s budget.
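
To illustrate why the step matters most at the image edges, the sketch below evaluates the radial-tangential (Brown-Conrady) distortion model that cv2.undistort() inverts, with coefficients in the OpenCV order (k1, k2, p1, p2, k3). The function names are hypothetical, and any intrinsics passed in are illustrative assumptions rather than values from camera_params.

```python
import math


def distort_normalized(x, y, k1, k2, p1, p2, k3=0.0):
    """Apply radial + tangential distortion to a normalized image point."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd


def pixel_shift(u, v, intrinsics, dist):
    """Distance in pixels between an ideal pixel and where the lens images it.

    intrinsics = (fx, fy, cx, cy); dist = (k1, k2, p1, p2, k3).
    """
    fx, fy, cx, cy = intrinsics
    x, y = (u - cx) / fx, (v - cy) / fy      # pixel -> normalized coordinates
    xd, yd = distort_normalized(x, y, *dist)
    ud, vd = fx * xd + cx, fy * yd + cy      # normalized -> pixel coordinates
    return math.hypot(ud - u, vd - v)
```

The shift is zero at the principal point and grows toward the corners, which is where uncorrected feature matches would be displaced the most. In the pipeline itself the correction would likely be a single call such as cv2.undistort(image, K, dist_coeffs) on the full-resolution image; if it were applied after downscaling instead, fx, fy, cx and cy would have to be scaled by the same factor.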

### Confidence
✅ High: well-established photogrammetry practice

---

## WP-2: Camera Tilt GSD Compensation

### Fact Confirmation
According to Fact #1, a camera tilt of 18° produces a GSD error above 5%. During turns (10-30° bank angle), the error ranges from 1.5% to 15.5%, which matches a 1/cos(θ) scaling. According to Fact #2, homography decomposition (already in the VO pipeline) extracts the rotation matrix R, from which tilt can be derived.

### Current State
Draft05 computes GSD assuming a perfectly nadir (straight-down) camera. The restrictions state the camera is "not autostabilized," so when the UAV banks during turns the camera tilts significantly. The resulting GSD error of up to 10-15% propagates to VO displacement estimates and then to position estimates.

### Conclusion
After homography decomposition in VO step 6, extract the tilt angle θ from the rotation matrix R and apply the correction GSD_corrected = GSD_nadir / cos(θ). For the first frame in a segment (no homography yet), use GSD_nadir: tilt is unknown, so assume straight and level flight. The additional computation cost is negligible because R is already computed.
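
A minimal sketch of this correction follows. It assumes R expresses the camera orientation relative to a nadir-pointing reference frame, so that R[2][2] is the cosine of the angle between the optical axis and the vertical. If the decomposition only yields the relative rotation between consecutive frames, the tilt would have to be accumulated or taken from the recovered plane normal instead. The function names are hypothetical.

```python
import math


def tilt_from_rotation(R):
    """Tilt angle (radians) between the optical axis and the nadir direction.

    Assumes R is a 3x3 camera-to-nadir-frame rotation given as nested lists.
    """
    cos_theta = max(-1.0, min(1.0, R[2][2]))  # clamp against round-off
    return math.acos(cos_theta)


def corrected_gsd(gsd_nadir, R=None):
    """Return GSD_nadir / cos(theta); fall back to GSD_nadir when R is unknown."""
    if R is None:  # first frame in a segment: no homography yet
        return gsd_nadir
    theta = tilt_from_rotation(R)
    if math.cos(theta) <= 0.0:
        raise ValueError("tilt of 90 degrees or more: camera does not see the ground")
    return gsd_nadir / math.cos(theta)


def roll_matrix(angle_deg):
    """Rotation about the x-axis, used here only to build example inputs."""
    a = math.radians(angle_deg)
    return [[1.0, 0.0, 0.0],
            [0.0, math.cos(a), -math.sin(a)],
            [0.0, math.sin(a), math.cos(a)]]
```

Calling corrected_gsd(gsd, roll_matrix(18)) scales the nadir GSD by 1/cos(18°), a little over 5%, which agrees with the figure quoted from Fact #1.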

### Confidence
✅ High: mathematical relationship, data already available in pipeline

---

## WP-3: DINOv2 Aggregation

### Fact Confirmation
According to Fact #3, SALAD aggregation improves DINOv2 retrieval by +12.4pp R@1 on MSLS Challenge over GeM. According to Fact #5, GeM pooling itself is +20pp over VLAD-style average pooling.

### Current State
Draft05 uses "spatial average pooling" of DINOv2 patch tokens, the simplest and weakest aggregation method.

### Reasoning
Coarse retrieval quality directly impacts the satellite matching success rate. If the correct tile is not among the top-5 retrieval results, fine matching cannot succeed regardless of LiteSAM quality. A 20pp improvement in retrieval (via GeM) is substantial and costs essentially nothing at inference time. SALAD adds another +12.4pp but requires a trained adapter layer, which makes it a reasonable future enhancement.

### Conclusion
Replace average pooling with GeM pooling as the immediate upgrade (a one-line change with negligible overhead). Document SALAD as a future enhancement if retrieval recall proves insufficient.
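
For reference, GeM (generalized mean) pooling raises each feature to a power p, averages over the patch tokens, and takes the p-th root. The pure-Python sketch below shows the computation on nested lists; the exponent p=3 and the clamp value eps are commonly used defaults assumed here, not values taken from draft05, and gem_pool is a hypothetical name.

```python
def gem_pool(tokens, p=3.0, eps=1e-6):
    """Generalized-mean pooling over patch tokens.

    tokens: list of equal-length feature vectors, one per patch.
    Values are clamped to eps first because a fractional root of a
    negative mean is undefined.
    """
    if not tokens:
        raise ValueError("need at least one token")
    n, dim = len(tokens), len(tokens[0])
    pooled = []
    for d in range(dim):
        mean_pow = sum(max(t[d], eps) ** p for t in tokens) / n
        pooled.append(mean_pow ** (1.0 / p))
    return pooled
```

With p=1 this reduces to average pooling, and larger p weights strong activations more heavily, approaching max pooling. On a PyTorch tensor of shape (batch, tokens, dim) the same operation would be roughly x.clamp(min=eps).pow(p).mean(dim=1).pow(1.0 / p).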

### Confidence
✅ High for the GeM improvement; ⚠️ Medium for SALAD on UAV-satellite cross-view retrieval (not directly benchmarked)

---

## WP-4: GPU Scheduling

### Fact Confirmation
According to Fact #6, compute-bound models cannot run truly concurrently on a single GPU via CUDA streams. According to Fact #7, the recommended pattern is sequential GPU execution with async Python.

### Current State
Draft05 states "satellite matching for frame N overlaps with VO processing for frame N+1". This implies true GPU-level parallelism, which is not achievable.

### Conclusion
Clarify the pipeline model: the GPU executes VO and satellite matching sequentially for each frame. Total GPU time per frame is ~450ms (VO ~200ms + satellite ~250ms), well within the 5s budget. The async benefit is in Python-level logic: while the GPU processes satellite matching for frame N, the CPU can prepare frame N+1 data (image loading, preprocessing, GTSAM update). Satellite results for frame N are added to the factor graph when ready. The critical path per frame is ~200ms (VO only, for the position estimate); satellite correction is asynchronous at the application level, not at the GPU level.
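
The scheduling model can be sketched with asyncio: a single lock serializes everything that touches the GPU, VO is awaited on the critical path, and satellite matching runs as a background task whose result arrives later. All names are hypothetical and asyncio.sleep stands in for real inference; blocking model calls would have to be moved off the event loop (for example into a worker thread) for the CPU-side overlap to materialize.

```python
import asyncio


async def gpu_stage(gpu_lock, stage, frame, seconds, log):
    """Run one model on the GPU; the lock keeps stages strictly sequential."""
    async with gpu_lock:
        log.append(f"{stage}:{frame}:start")
        await asyncio.sleep(seconds)  # stand-in for model inference
        log.append(f"{stage}:{frame}:end")


async def process_flight(n_frames, vo_s=0.002, sat_s=0.0025):
    """Return the order in which GPU stages started and finished."""
    gpu_lock = asyncio.Lock()
    log, pending = [], []
    for frame in range(n_frames):
        await asyncio.sleep(0)  # stand-in for CPU-side loading/preprocessing
        # Critical path: the position estimate needs VO for this frame.
        await gpu_stage(gpu_lock, "vo", frame, vo_s, log)
        # Satellite correction is scheduled, not awaited; it is applied when ready.
        task = asyncio.create_task(gpu_stage(gpu_lock, "sat", frame, sat_s, log))
        pending.append(task)
    await asyncio.gather(*pending)
    return log
```

Running asyncio.run(process_flight(3)) yields a log in which every start entry is immediately followed by its own end entry, meaning no two models ever occupy the GPU at once even though the satellite stage is asynchronous from the caller's point of view.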

### Confidence
✅ High: official CUDA/PyTorch documentation

---

## WP-5 to WP-9: Security Dependency Updates

### Fact Confirmation
Facts #8-12 establish concrete CVEs with specific affected versions and fixes.

### Current State
Draft05 pins PyTorch ≥2.10.0 and Pillow ≥11.3.0. It uses python-jose for JWT, aiohttp for HTTP, and ONNX Runtime without version pinning.

### Conclusion
1. Replace python-jose with PyJWT ≥2.10.0 (actively maintained; a near drop-in replacement for JWT encoding and decoding)
2. Upgrade the Pillow pin to ≥12.1.1 (CVE-2026-25990)
3. Pin aiohttp ≥3.13.3 (7 CVEs)
4. Pin h11 ≥0.16.0 (CVE-2025-43859, CVSS 9.1)
5. Pin ONNX Runtime ≥1.24.1 (path traversal)
6. Monitor the safetensors metadata RCE report (see WP-12)
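
These minimums could also be enforced at service startup. The sketch below compares installed distributions against the pins listed above using only the standard library. The distribution names (for instance onnxruntime rather than a GPU-specific build) are assumptions, and the version parser is deliberately simplistic: it reads only the first three numeric components and ignores pre-release markers.

```python
import re
from importlib import metadata

# Minimum versions from the list above; distribution names are assumed.
MINIMUM_VERSIONS = {
    "PyJWT": (2, 10, 0),
    "Pillow": (12, 1, 1),
    "aiohttp": (3, 13, 3),
    "h11": (0, 16, 0),
    "onnxruntime": (1, 24, 1),
}


def parse_version(text):
    """Turn '3.13.3' or '1.24.1+cu12' into a comparable 3-tuple of ints."""
    numbers = [int(n) for n in re.findall(r"\d+", text.split("+")[0])[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def check_pins(installed=None):
    """Return human-readable problems; an empty list means all pins hold.

    installed: optional {name: version string} mapping, mainly for testing.
    When omitted, versions are read from the current environment.
    """
    problems = []
    for name, minimum in MINIMUM_VERSIONS.items():
        if installed is not None:
            found = installed.get(name)
        else:
            try:
                found = metadata.version(name)
            except metadata.PackageNotFoundError:
                found = None
        if found is None:
            problems.append(f"{name}: not installed")
        elif parse_version(found) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name}: {found} is older than {wanted}")
    return problems
```

A startup hook could log or abort on a non-empty result from check_pins(). This complements version pins in the dependency manifest and does not replace them.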

### Confidence
✅ High: all from NVD/official advisories

---

## WP-10: ENU vs UTM for Long Flights

### Fact Confirmation
According to Fact #14, the ENU approximation is accurate within 4km of the origin. Beyond that, errors become significant: ~0.5m at 10km and ~12.5m at 50km.

### Current State
Draft05 uses ENU centered on the starting GPS position. UAV flights can cover 30-50km or more (3000 photos at 100m spacing = 300km theoretical maximum path length).

### Reasoning
300km is well beyond ENU's 4km accuracy range. Even typical flights (500-1500 photos at 100m = 50-150km of flight path) far exceed it. UTM projection is accurate to <1m within a 360km-wide zone, which covers any realistic flight. pyproj (already mentioned in draft05 for WGS84↔ENU) supports UTM natively.

### Conclusion
Replace ENU with UTM coordinates. Use pyproj to auto-select the UTM zone from the starting GPS position. Keep all internal positions in UTM meters and convert to WGS84 for output. The factor graph operates in UTM: the math is the same as for ENU, only the projection differs. No re-centering is needed.
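
Zone selection from the starting position is simple arithmetic, sketched below without any dependency. The function names are hypothetical, the special zone boundaries around Norway and Svalbard are ignored, and the sketch assumes the zone chosen at takeoff is kept for the whole flight.

```python
def utm_zone(lon_deg):
    """Standard 6-degree UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude out of range")
    # Longitude 180 would compute as zone 61, so clamp it into zone 60.
    return min(int((lon_deg + 180.0) // 6) + 1, 60)


def utm_epsg(lon_deg, lat_deg):
    """EPSG code of the WGS84 / UTM CRS covering the given position.

    326xx codes are the northern-hemisphere zones, 327xx the southern ones.
    """
    if not -80.0 <= lat_deg <= 84.0:
        raise ValueError("UTM is only defined between 80°S and 84°N")
    base = 32600 if lat_deg >= 0.0 else 32700
    return base + utm_zone(lon_deg)
```

The resulting code can be handed to pyproj, for example Transformer.from_crs("EPSG:4326", f"EPSG:{code}", always_xy=True), to convert between WGS84 and UTM meters in both directions.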

### Confidence
✅ High: well-established geodesy

---

## WP-11: Memory Management

### Fact Confirmation
According to Fact #15, visual SLAM systems use rolling windows for feature descriptors, keeping only recent frames in active memory.

### Current State
Draft05 does not specify when SuperPoint features are freed. For 3000 images, keeping all features would use ~6GB of RAM (2000 keypoints × 256 dims × 4 bytes × 3000 frames ≈ 6.1GB).

### Reasoning
Only consecutive frame pairs need SuperPoint features for VO. After matching frame N with frame N-1, frame N-1's features are no longer needed. The factor graph stores only Pose2 variables (~24 bytes each), not features. Satellite matching uses DINOv2 + LiteSAM (separate features, not cached per frame).

### Conclusion
Explicitly specify: after VO matching between frame N and frame N-1, discard frame N-1's SuperPoint features and keep only the current frame's features for the next iteration. Descriptor memory then stays constant at ~2MB per retained frame (2000 × 256 × 4 bytes), about 4MB while a pair is being matched, regardless of flight length. Document the total memory budget per component.
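
One way to express this rule is a buffer that holds exactly one frame of features, so that older descriptors become unreachable as soon as the next frame has been matched. The class and function names below are hypothetical, and the matcher is passed in as a plain callable.

```python
def descriptor_bytes(keypoints, dims, bytes_per_value=4):
    """Memory for one frame's descriptors, assuming float32 values."""
    return keypoints * dims * bytes_per_value


class RollingFeatureBuffer:
    """Keeps only the most recent frame's features for consecutive-pair VO."""

    def __init__(self):
        self._previous = None  # (frame_id, features) or None

    @property
    def held_frame_id(self):
        return None if self._previous is None else self._previous[0]

    def step(self, frame_id, features, match_fn):
        """Match against the previous frame, then replace it with this one.

        Returns match_fn(previous_features, features), or None for the
        first frame of a segment.
        """
        result = None
        if self._previous is not None:
            _, previous_features = self._previous
            result = match_fn(previous_features, features)
        # Overwriting drops the buffer's reference to the older features.
        self._previous = (frame_id, features)
        return result
```

descriptor_bytes(2000, 256) gives 2,048,000 bytes per frame, and multiplying by 3000 frames reproduces the ~6.1GB figure above. With the buffer, only one frame is held between steps and two exist while match_fn runs.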

### Confidence
✅ High: standard SLAM practice

---

## WP-12: safetensors Security

### Fact Confirmation
According to Fact #16, a safetensors metadata RCE is under review. Polyglot and header-bomb attacks are known vectors.

### Current State
Draft05 recommends the safetensors format for DINOv2 but does not validate the header size.

### Conclusion
Add safetensors header size validation: reject files whose header exceeds 10MB (a legitimate DINOv2 header is orders of magnitude smaller). This mitigates header-bomb DoS and reduces the attack surface for the metadata RCE.
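
The check can run before the file is handed to any loader. A safetensors file starts with an 8-byte little-endian unsigned integer giving the length of a JSON header, followed by the header itself; the sketch below validates that length against the 10MB limit and against the actual file size. The function name and the exact error messages are hypothetical.

```python
import json
import os
import struct

MAX_HEADER_BYTES = 10 * 1024 * 1024  # 10MB limit from the conclusion above


def read_safetensors_header(path, max_header_bytes=MAX_HEADER_BYTES):
    """Return the parsed JSON header, or raise ValueError if it looks unsafe."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to contain a header length")
        (header_len,) = struct.unpack("<Q", prefix)  # little-endian uint64
        if header_len > max_header_bytes:
            raise ValueError(f"declared header of {header_len} bytes exceeds limit")
        if header_len > file_size - 8:
            raise ValueError("declared header is longer than the file")
        header = json.loads(f.read(header_len))
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")
    return header
```

Because the declared length is checked before anything beyond the first 8 bytes is read, an oversized header is rejected without allocating memory for it.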

### Confidence
⚠️ Medium: the vulnerability is under review and the mitigation is precautionary