Reasoning Chain
WP-1: Lens Undistortion
Fact Confirmation
According to Fact #13, lens distortion correction is crucial for UAV photogrammetry with non-metric cameras. Distortion at image edges can be 5-20px for wide-angle lenses. The camera parameters (K matrix + distortion coefficients) are known.
Current State
Draft05 mentions "rectify" in preprocessing step 2 but does not explicitly include undistortion using camera intrinsics (K, distortion coefficients). Feature matching operates on distorted images, introducing position errors especially at image edges.
Conclusion
Add explicit cv2.undistort() step after image loading, before downscaling. This corrects radial and tangential distortion across the entire image. Camera calibration matrix K and distortion coefficients are provided as camera_params in the job request. Cost: ~5-10ms per image — negligible vs 5s budget.
Confidence
✅ High — well-established photogrammetry practice
WP-2: Camera Tilt GSD Compensation
Fact Confirmation
According to Fact #1, camera tilt of 18° produces >5% GSD error. During turns (10-30° bank angle), error ranges 1.5-15.5%. According to Fact #2, homography decomposition (already in the VO pipeline) extracts rotation matrix R from which tilt can be derived.
Current State
Draft05 computes GSD assuming perfectly nadir (straight-down) camera. The restrictions state the camera is "not autostabilized." During turns, the UAV banks causing significant camera tilt. GSD error of 10-15% during turns propagates to VO displacement estimates and then to position estimates.
Conclusion
After homography decomposition in VO step 6, extract tilt angle θ from rotation matrix R. Apply correction: GSD_corrected = GSD_nadir / cos(θ). For the first frame in a segment (no homography yet), use GSD_nadir (tilt unknown, assume straight flight). Zero additional computation cost — the rotation matrix R is already computed.
Confidence
✅ High — mathematical relationship, data already available in pipeline
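A minimal sketch of the correction, assuming `R` is the rotation matrix recovered from homography decomposition in VO step 6 and that tilt is measured as the angle between the rotated optical axis and nadir, i.e. θ = arccos(R[2,2]):

```python
# Sketch: tilt-compensated GSD from the rotation matrix already
# produced by homography decomposition. Zero extra GPU cost.
import numpy as np

def gsd_with_tilt(gsd_nadir: float, R: np.ndarray) -> float:
    """Scale nadir GSD by 1/cos(theta), theta = arccos(R[2,2])."""
    cos_theta = float(np.clip(R[2, 2], -1.0, 1.0))
    if cos_theta <= 0.5:
        # >60 deg apparent tilt: decomposition likely degenerate,
        # fall back to the nadir assumption rather than blow up GSD
        return gsd_nadir
    return gsd_nadir / cos_theta
```

For the first frame in a segment, where no homography exists yet, the caller simply passes `R = np.eye(3)`, which reproduces the nadir GSD.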
WP-3: DINOv2 Aggregation
Fact Confirmation
According to Fact #3, SALAD aggregation improves DINOv2 retrieval by +12.4pp R@1 on MSLS Challenge over GeM. According to Fact #5, GeM pooling itself is +20pp over VLAD-style average pooling.
Current State
Draft05 uses "spatial average pooling" of DINOv2 patch tokens — the simplest and weakest aggregation method.
Reasoning
Coarse retrieval quality directly impacts satellite matching success rate. If the correct tile isn't in top-5 retrieval results, fine matching cannot succeed regardless of LiteSAM quality. A 20pp improvement in retrieval (via GeM) is substantial and costs nothing. SALAD adds another +12pp but requires a trained adapter layer — reasonable future enhancement.
Conclusion
Replace average pooling with GeM pooling as the immediate upgrade (one-line change, zero overhead). Document SALAD as a future enhancement if retrieval recall proves insufficient.
Confidence
✅ High for GeM improvement; ⚠️ Medium for SALAD on UAV-satellite cross-view (not directly benchmarked)
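The "one-line change" amounts to replacing the mean over patch tokens with a generalized mean. A framework-agnostic sketch (NumPy here; the real pipeline would do the same on the DINOv2 token tensor), with the common default p=3:

```python
# Sketch of GeM pooling over patch tokens (num_patches x dim).
# p=1 recovers plain average pooling; p=3 is the usual default.
import numpy as np

def gem_pool(tokens: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized-mean pooling: (mean(x^p))^(1/p) per dimension."""
    # Clamp to a small positive floor before the power, as in the
    # original GeM formulation, so fractional exponents stay defined.
    x = np.clip(tokens, eps, None)
    return np.mean(x ** p, axis=0) ** (1.0 / p)
```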
WP-4: GPU Scheduling
Fact Confirmation
According to Fact #6, compute-bound models cannot run truly concurrently on a single GPU via CUDA streams. According to Fact #7, recommended pattern is sequential GPU execution with async Python.
Current State
Draft05 states "satellite matching for frame N overlaps with VO processing for frame N+1" — this implies true GPU-level parallelism which is not achievable.
Conclusion
Clarify the pipeline model: GPU executes VO and satellite matching sequentially for each frame. Total GPU time per frame: ~450ms (VO ~200ms + satellite ~250ms). Well within 5s budget. The async benefit is in Python-level logic: while GPU processes satellite matching for frame N, the CPU can prepare frame N+1 data (image loading, preprocessing, GTSAM update). Satellite results for frame N are added to the factor graph when ready. The critical path per frame is ~200ms (VO only for position estimate); satellite correction is asynchronous at the application level, not GPU level.
Confidence
✅ High — official CUDA/PyTorch documentation
WP-5-9: Security Dependency Updates
Fact Confirmation
Facts #8-12 establish concrete CVEs with specific affected versions and fixes.
Current State
Draft05 pins PyTorch ≥2.10.0 and Pillow ≥11.3.0. It uses python-jose for JWT, aiohttp for HTTP, and ONNX Runtime without version pinning.
Conclusion
- Replace python-jose with PyJWT ≥2.10.0 (actively maintained, secure; covers the same JWT encode/decode needs with minor API changes)
- Upgrade Pillow pin to ≥12.1.1 (CVE-2026-25990)
- Pin aiohttp ≥3.13.3 (7 CVEs)
- Pin h11 ≥0.16.0 (CVE-2025-43859, CVSS 9.1)
- Pin ONNX Runtime ≥1.24.1 (path traversal)
- Monitor safetensors metadata RCE
Confidence
✅ High — all from NVD/official advisories
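Collected as a requirements fragment (version floors reflect the advisories in Facts #8-12; inline comments are annotations, not part of the pin syntax):

```text
# security-driven version floors (Facts #8-12)
pyjwt>=2.10.0        # replaces python-jose
pillow>=12.1.1       # CVE-2026-25990
aiohttp>=3.13.3      # 7 CVEs
h11>=0.16.0          # CVE-2025-43859, CVSS 9.1
onnxruntime>=1.24.1  # path traversal
```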
WP-10: ENU vs UTM for Long Flights
Fact Confirmation
According to Fact #14, ENU approximation is accurate within 4km. Beyond 4km, errors become significant. At 10km: ~0.5m error; at 50km: ~12.5m.
Current State
Draft05 uses ENU centered on starting GPS. UAV flights can cover 30-50km+ (3000 photos at 100m spacing = 300km theoretical max).
Reasoning
300km is well beyond ENU's 4km accuracy range, and even typical flights (500-1500 photos at 100m = 50-150km) far exceed it. UTM projection is accurate to <1m within its ~360km-wide zone, which covers any realistic flight, and pyproj (already mentioned in draft05 for WGS84↔ENU) supports UTM natively.
Conclusion
Replace ENU with UTM coordinates. Use pyproj to auto-select UTM zone from starting GPS. All internal positions in UTM meters. Convert to WGS84 for output. Factor graph operates in UTM — same math as ENU, just different projection. No re-centering needed.
Confidence
✅ High — well-established geodesy
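Zone auto-selection from the starting GPS fix reduces to the standard longitude formula; the resulting EPSG code (326xx north, 327xx south) is then handed to pyproj for the WGS84↔UTM transform. A minimal sketch (pure Python, ignoring the Norway/Svalbard zone exceptions, which do not change accuracy for positioning):

```python
# Sketch: pick the UTM EPSG code for the starting GPS coordinate.
def utm_epsg(lat: float, lon: float) -> int:
    """Return the EPSG code of the UTM zone containing (lat, lon)."""
    zone = int((lon + 180.0) // 6.0) + 1
    zone = min(zone, 60)  # lon == 180 exactly falls into zone 60
    return (32600 if lat >= 0 else 32700) + zone
```

With pyproj, the transformer would then be built once per job, e.g. from `"EPSG:4326"` to the returned code, and reused for every frame.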
WP-11: Memory Management
Fact Confirmation
According to Fact #15, visual SLAM systems use rolling windows for feature descriptors, keeping only recent frames in active memory.
Current State
Draft05 doesn't specify when SuperPoint features are freed. For 3000 images, keeping all features would use ~6GB RAM (2000 keypoints × 256 dims × 4 bytes × 3000 = 6.1GB).
Reasoning
Only consecutive frame pairs need SuperPoint features for VO. After matching frame N with frame N-1, frame N-1's features are no longer needed. Factor graph stores only Pose2 variables (~24 bytes each), not features. Satellite matching uses DINOv2 + LiteSAM (separate features, not cached per frame).
Conclusion
Explicitly specify: after VO matching between frame N and N-1, discard frame N-1's SuperPoint features. Keep only current frame's features for next iteration. Memory: constant ~2MB regardless of flight length. Document total memory budget per component.
Confidence
✅ High — standard SLAM practice
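The rolling one-frame window can be made explicit in the VO loop; `extract` and `match` stand in for the real SuperPoint and matcher calls:

```python
# Sketch: only the previous frame's SuperPoint features survive
# between iterations, so memory stays constant (~2MB) regardless
# of flight length.
def run_vo_loop(frames, extract, match):
    prev_feats = None
    poses = []
    for frame in frames:
        feats = extract(frame)
        if prev_feats is not None:
            poses.append(match(prev_feats, feats))
        prev_feats = feats  # frame N-1 features now unreferenced -> freed
    return poses
```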
WP-12: safetensors Security
Fact Confirmation
According to Fact #16, safetensors metadata RCE is under review. Polyglot and header-bomb attacks are known vectors.
Current State
Draft05 recommends safetensors format for DINOv2 but doesn't validate header size.
Conclusion
Add safetensors header size validation: reject files with header > 10MB (normal header is <1KB for DINOv2). This mitigates header-bomb DoS and reduces attack surface for metadata RCE.
Confidence
⚠️ Medium — vulnerability is under review, mitigation is precautionary
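The check is cheap because the safetensors format starts with an unsigned 64-bit little-endian integer giving the JSON header length, so the file can be rejected before any parsing. A precautionary sketch with the 10MB limit proposed above:

```python
# Sketch: validate the safetensors header length before loading.
# First 8 bytes = u64 little-endian JSON header size.
import struct

MAX_HEADER = 10 * 1024 * 1024  # 10 MB; DINOv2 headers are <1 KB

def check_safetensors_header(path: str) -> int:
    """Return the declared header size, or raise on suspicious files."""
    with open(path, "rb") as f:
        raw = f.read(8)
    if len(raw) != 8:
        raise ValueError("truncated safetensors file")
    (header_len,) = struct.unpack("<Q", raw)
    if header_len > MAX_HEADER:
        raise ValueError(f"suspicious safetensors header: {header_len} bytes")
    return header_len
```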