gps-denied-onboard/_docs/00_research/04_reasoning_chain.md
2026-04-29 17:03:57 +03:00

Reasoning Chain

Dimension 1: Local Matcher Product Fit

Fact Confirmation

SuperPoint-style features remain technically attractive for local geometric verification, but the official Magic Leap pretrained weights are noncommercial research-only (Fact #17). LightGlue itself is Apache-2.0, but it does not license upstream extractors (Fact #18). LightGlue supports ALIKED, DISK, SIFT, and other extractors (Fact #24), DeDoDe is MIT-licensed with deployment ports (Fact #25), and OpenCV SIFT is now a commercial-safe classical baseline (Fact #26).

Reference Comparison

solution_draft02.md fixed the SuperPoint licensing issue but left "license-cleared extractor" too abstract for planning. The architecture can keep the local-verification stage, but planning needs named candidates so benchmark and licensing tasks can be decomposed.

Conclusion

Reject official SuperPoint pretrained weights for product v1 unless a commercial license is obtained. Select ALIKED + LightGlue as the first learned-feature candidate, OpenCV SIFT/AKAZE as the legal baseline, and DeDoDe as an experimental fallback pending Jetson/model-size validation.

Confidence

High for licensing; Medium for final extractor accuracy until benchmarked.


Dimension 1.5: Real-Time Scheduling

Fact Confirmation

The camera produces frames at 3 Hz, AC-4.1 allows <400 ms p95 end-to-end latency with up to ~10% dropped frames, and AC-4.4 forbids batching or delaying output (Fact #27).

Reference Comparison

A FIFO queue can accumulate stale frames whenever a heavy VPR or local-matching event exceeds the 333 ms camera interval. That would make the system accurate on old images while violating the flight-controller output latency budget.
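The staleness argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: the 500 ms processing time is a hypothetical sustained heavy stretch (not a measured figure), processing is single-threaded, and `simulate` is an illustrative helper, not project code.

```python
from collections import deque

FRAME_INTERVAL_MS = 1000 / 3  # 3 Hz camera, as stated in the text


def simulate(process_ms, n_frames, latest_only):
    """Return (frame ages at completion in ms, number of dropped frames).

    process_ms is a hypothetical constant per-frame processing time.
    latest_only=True keeps only the newest waiting frame (queue size 1).
    """
    arrivals = [i * FRAME_INTERVAL_MS for i in range(n_frames)]
    queue = deque()
    ages, dropped = [], 0
    now, nxt = 0.0, 0
    while nxt < n_frames or queue:
        # Enqueue every frame that has arrived by `now`.
        while nxt < n_frames and arrivals[nxt] <= now:
            queue.append(arrivals[nxt])
            nxt += 1
        if not queue:
            now = arrivals[nxt]  # idle until the next frame arrives
            continue
        if latest_only:
            dropped += len(queue) - 1  # everything but the newest is discarded
            newest = queue.pop()
            queue.clear()
            queue.append(newest)
        captured = queue.popleft()
        now += process_ms
        ages.append(now - captured)  # image age when its result is ready
    return ages, dropped


fifo_ages, _ = simulate(500.0, 30, latest_only=False)
latest_ages, n_dropped = simulate(500.0, 30, latest_only=True)
print(f"FIFO max image age: {max(fifo_ages):.0f} ms")
print(f"latest-only max image age: {max(latest_ages):.0f} ms, dropped {n_dropped}")
```

Under these assumptions the FIFO image age keeps growing for as long as the overload lasts, while the latest-only policy trades dropped frames for an age that stays near one processing time plus one camera interval.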

Conclusion

Add a bounded latest-frame scheduler: camera queue size 1, explicit drop accounting, IMU propagation continues between image fixes, VPR/local matching run under deadlines, and every emitted GPS_INPUT references the freshest state timestamp.
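A minimal sketch of the size-1 camera queue with explicit drop accounting might look like the following. The names `Frame` and `LatestFrameSlot` are hypothetical, and deadlines, IMU propagation, and GPS_INPUT emission are deliberately left out.

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Frame:
    timestamp_s: float
    image: Any  # placeholder for the image buffer


class LatestFrameSlot:
    """Size-1 camera queue: a new frame overwrites an unconsumed one."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Frame] = None
        self.received = 0
        self.dropped = 0

    def put(self, frame: Frame) -> None:
        """Called by the camera thread for every captured frame."""
        with self._lock:
            self.received += 1
            if self._frame is not None:
                self.dropped += 1  # the overwritten frame counts as a drop
            self._frame = frame

    def take(self) -> Optional[Frame]:
        """Called by the processing thread; returns the newest frame or None."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame

    def drop_ratio(self) -> float:
        with self._lock:
            return self.dropped / self.received if self.received else 0.0


slot = LatestFrameSlot()
for i in range(3):
    slot.put(Frame(timestamp_s=i / 3.0, image=None))
newest = slot.take()
print(newest.timestamp_s, slot.dropped, slot.drop_ratio())
```

Keeping the counters inside the slot makes the drop ratio directly comparable with the ~10% dropped-frame allowance, although how AC-4.1 counts drops is not specified here and would need to be confirmed.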

Confidence

High.


Dimension 2: VPR Descriptor and Cache Footprint

Fact Confirmation

AnyLoc DINOv2 VLAD examples produce 49,152-dimensional descriptors (Fact #19). The operational area can be up to 400 km² with multi-scale, overlapping chunks.

Reference Comparison

Event-triggered VPR is still the right architecture, but uncompressed VLAD descriptors can quietly consume a large fraction of RAM/cache. For example, 4,000-10,000 chunks at 49,152 float32 values each is roughly 0.8-2.0 GB before multi-scale variants, indexes, metadata, and model/runtime memory.
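The footprint arithmetic can be reproduced with a few lines. This sketch counts raw descriptor values only, in decimal gigabytes; the float16 row is just a halving of storage width for comparison and says nothing about retrieval accuracy.

```python
DESCRIPTOR_DIM = 49_152  # AnyLoc DINOv2 VLAD dimension quoted in the text


def descriptor_store_gb(n_chunks: int, dim: int = DESCRIPTOR_DIM,
                        bytes_per_value: int = 4) -> float:
    """Raw descriptor storage in decimal GB (no index, metadata, or variants)."""
    return n_chunks * dim * bytes_per_value / 1e9


for n_chunks in (4_000, 10_000):
    f32 = descriptor_store_gb(n_chunks, bytes_per_value=4)
    f16 = descriptor_store_gb(n_chunks, bytes_per_value=2)
    print(f"{n_chunks} chunks: float32 {f32:.2f} GB, float16 {f16:.2f} GB")
# 4000 chunks: float32 0.79 GB, float16 0.39 GB
# 10000 chunks: float32 1.97 GB, float16 0.98 GB
```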

Conclusion

Keep AnyLoc/DINOv2-style VPR as the lead retrieval family only with a mandatory descriptor-compression gate: PCA/float16/product quantization or a smaller descriptor must be chosen before implementation freeze. CPU FAISS/HNSW remains the v1 baseline until Jetson GPU indexing is proven.

Confidence

High for the footprint risk; Medium for the best compression/index choice.


Dimension 3: Satellite Cache Storage

Fact Confirmation

COG supports tiled imagery, overviews, and multiple compression profiles, but its documentation does not provide a universal bytes-per-pixel budget for the target imagery (Fact #21). Zoom level alone does not prove physical resolution (Fact #13).

Reference Comparison

The 10 GB persistent cache budget may be plausible with lossy-compressed 0.3-0.5 m/px imagery and careful indexing, but it is not proven until representative Suite Satellite Service imagery is packaged with overviews, manifests, descriptors, and generated-tile sidecars.
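A rough way to frame the measurement is to treat compressed bytes per pixel as the unknown and see what the budget allows. The sketch below assumes the cache covers the full 400 km² operational area exactly once, that overviews follow a power-of-two pyramid (adding about one third to the base pixel count), and that the entire budget goes to imagery, which overstates what is available once descriptors and sidecars are included. The function names are illustrative.

```python
OVERVIEW_FACTOR = 4 / 3  # 1 + 1/4 + 1/16 + ... for a power-of-two pyramid


def base_pixels(area_km2: float, gsd_m: float) -> float:
    """Number of full-resolution pixels needed to cover the area once."""
    return area_km2 * 1e6 / (gsd_m ** 2)


def raster_gb(area_km2: float, gsd_m: float, bytes_per_pixel: float,
              overview_factor: float = OVERVIEW_FACTOR) -> float:
    """Imagery size in decimal GB for a given compressed bytes-per-pixel."""
    return base_pixels(area_km2, gsd_m) * bytes_per_pixel * overview_factor / 1e9


def max_bytes_per_pixel(budget_gb: float, area_km2: float, gsd_m: float,
                        overview_factor: float = OVERVIEW_FACTOR) -> float:
    """Largest average bytes-per-pixel that still fits the budget."""
    return budget_gb * 1e9 / (base_pixels(area_km2, gsd_m) * overview_factor)


for gsd in (0.3, 0.5):
    limit = max_bytes_per_pixel(10.0, 400.0, gsd)
    print(f"{gsd} m/px: at most {limit:.2f} bytes/px fits in 10 GB")
```

The bytes-per-pixel actually achieved has to come from packing representative imagery; the arithmetic only states the ceiling that the measurement must stay under.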

Conclusion

Treat cache size as a hard measurement gate. The architecture should preserve the 10 GB budget but require a cache-packing benchmark before task decomposition commits to descriptor formats or chunk overlap settings.

Confidence

Medium-high.


Dimension 4: Relative Motion and cuVSLAM

Fact Confirmation

NVIDIA describes cuVSLAM as stereo-visual-inertial SLAM/odometry, with IMU-only degraded tracking suitable only for short intervals around one second (Fact #20). The project has one fixed downward navigation camera for v1.

Reference Comparison

cuVSLAM is a strong Jetson stack, but the selected v1 camera geometry does not match its documented primary input assumptions. A custom planar VO/IMU module can exploit nadir imagery, flat terrain, camera intrinsics, altitude, and FC attitude directly.
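The reason nadir geometry simplifies relative motion is that, for a pinhole camera looking straight down over flat terrain, image displacement scales to ground displacement by altitude over focal length. The sketch below shows only that scaling; it assumes zero tilt (attitude compensation from the FC is not modelled), and the numeric values are hypothetical.

```python
def ground_sample_distance_m(altitude_agl_m: float, focal_length_px: float) -> float:
    """Metres of ground covered by one pixel for a nadir pinhole camera."""
    return altitude_agl_m / focal_length_px


def pixel_shift_to_metres(dx_px: float, dy_px: float,
                          altitude_agl_m: float, focal_length_px: float):
    """Convert an image-plane shift to a ground shift in the camera frame."""
    gsd = ground_sample_distance_m(altitude_agl_m, focal_length_px)
    return dx_px * gsd, dy_px * gsd


# Hypothetical example: 400 m above ground, 2000 px focal length.
gsd = ground_sample_distance_m(400.0, 2000.0)
dx_m, dy_m = pixel_shift_to_metres(50.0, -20.0, 400.0, 2000.0)
print(f"GSD {gsd:.2f} m/px, ground shift ({dx_m:.1f}, {dy_m:.1f}) m")
```

Rotating the camera-frame shift into north/east depends on the camera mounting and heading conventions, which are not specified here.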

Conclusion

Keep custom planar VO/IMU as the lead. Keep cuVSLAM rejected for v1 product use, but preserve it as a benchmark/reference if the hardware changes to stereo or if NVIDIA documents an exact monocular deployment path matching the project.

Confidence

High.


Dimension 5: Validation Data

Fact Confirmation

AerialVL and UAV-VisLoc provide useful public aerial localization data, but they only partially match the fixed-wing, ArduPilot, high-rate IMU, camera-timing, and Ukraine steppe deployment context (Facts #22, #23).

Reference Comparison

Public datasets can validate VPR/cross-view ideas and regression-test retrieval. They cannot prove ESKF covariance, MAVLink timing, companion reboot, or false-position budgets without representative IMU and FC traces.

Conclusion

Use public datasets for early VPR/local-matcher benchmarking, then require ArduPilot SITL-generated IMU traces and at least one real FC/camera timing capture before final acceptance.

Confidence

High.


Dimension 6: ArduPilot Output

Fact Confirmation

ArduPilot documents MAVLink GPS input with GPS1_TYPE=14 (Fact #1), MAVLink defines GPS_INPUT as raw GPS sensor input rather than the global position estimate (Fact #2), and external-nav/GPS source-fusion issues are version-specific (Fact #3).

Reference Comparison

ODOMETRY is semantically richer but increases EKF source-interaction risk. A GPS_INPUT-only v1 is narrower and forces the accuracy fields to be reported honestly, but it matches the "GPS substitute" framing and avoids dual-source overlap.
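To make the accuracy-field point concrete, the sketch below maps a filter estimate onto the position, velocity, and accuracy fields of the MAVLink GPS_INPUT message (lat/lon in degE7, altitude in metres, velocities in m/s, accuracies in metres or m/s). It is an illustration under assumptions: `PositionEstimate` is a hypothetical container, combining the north and east standard deviations by root-sum-square is one possible convention rather than a project decision, and the remaining GPS_INPUT fields (timestamps, fix_type, DOP values, ignore_flags, satellites_visible) are intentionally omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class PositionEstimate:
    """Hypothetical filter output; 1-sigma uncertainties in metres or m/s."""
    lat_deg: float
    lon_deg: float
    alt_m: float
    vn_mps: float
    ve_mps: float
    vd_mps: float
    sigma_north_m: float
    sigma_east_m: float
    sigma_down_m: float
    sigma_speed_mps: float


def gps_input_accuracy_fields(est: PositionEstimate) -> dict:
    """Partial GPS_INPUT payload: only fields derived from the estimate."""
    return {
        "lat": round(est.lat_deg * 1e7),  # degE7
        "lon": round(est.lon_deg * 1e7),  # degE7
        "alt": est.alt_m,                 # metres
        "vn": est.vn_mps,
        "ve": est.ve_mps,
        "vd": est.vd_mps,
        "speed_accuracy": est.sigma_speed_mps,
        # Assumed convention: root-sum-square of north and east sigmas.
        "horiz_accuracy": math.hypot(est.sigma_north_m, est.sigma_east_m),
        "vert_accuracy": est.sigma_down_m,
    }


est = PositionEstimate(48.5, 35.0, 350.0, 20.0, 1.5, 0.0, 3.0, 4.0, 6.0, 0.5)
fields = gps_input_accuracy_fields(est)
print(fields["lat"], fields["lon"], fields["horiz_accuracy"])
```

Deriving the accuracy fields from the filter covariance, rather than hard-coding optimistic constants, is what keeps the flight controller's weighting of the substitute GPS honest.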

Conclusion

Keep v1 GPS_INPUT only. Add a v1.1 research/testing backlog item for ODOMETRY, gated by exact ArduPilot release, params, and SITL proof.

Confidence

High.