# Reasoning Chain

## Dimension 1: Local Matcher Product Fit

### Fact Confirmation

SuperPoint-style features remain technically attractive for local geometric verification, but the official Magic Leap pretrained weights are licensed for noncommercial research use only (Fact #17). LightGlue itself is Apache-2.0, but its license does not cover upstream extractors (Fact #18). LightGlue supports ALIKED, DISK, SIFT, and other extractors (Fact #24); DeDoDe is MIT-licensed with deployment ports (Fact #25); and OpenCV SIFT is now a commercially safe classical baseline (Fact #26).

### Reference Comparison

`solution_draft02.md` fixed the SuperPoint licensing issue but left "license-cleared extractor" too abstract for planning. The architecture can keep the local-verification stage, but planning needs named candidates so that benchmarking and licensing tasks can be decomposed.

### Conclusion

Reject the official SuperPoint pretrained weights for product v1 unless a commercial license is obtained. Select ALIKED + LightGlue as the first learned-feature candidate, OpenCV SIFT/AKAZE as the license-safe classical baseline, and DeDoDe as an experimental fallback pending Jetson/model-size validation.

### Confidence

High for licensing; Medium for final extractor accuracy until benchmarked.

---

## Dimension 1.5: Real-Time Scheduling

### Fact Confirmation

The camera produces frames at 3 Hz; AC-4.1 allows a p95 end-to-end latency under 400 ms with up to ~10% dropped frames; and AC-4.4 forbids batching or delaying output (Fact #27).

### Reference Comparison

A FIFO queue can accumulate stale frames whenever a heavy VPR or local-matching event exceeds the 333 ms camera interval. The system would then produce accurate fixes for old images while violating the flight-controller output latency budget.

### Conclusion

Add a bounded latest-frame scheduler: a camera queue of size 1, explicit drop accounting, IMU propagation that continues between image fixes, VPR/local matching run under deadlines, and every emitted `GPS_INPUT` referencing the freshest state timestamp.
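
A minimal Python sketch of the size-1 camera queue with explicit drop accounting might look like the following. It assumes the camera callback and a single processing worker run on different threads (hence the lock); all names are hypothetical, and treating the 400 ms p95 budget as a hard per-frame deadline is a stricter simplification than AC-4.1. IMU propagation and the VPR/matching workers are not shown.

```python
import threading
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout; a sketch, not the project's scheduler.
LATENCY_BUDGET_S = 0.4  # AC-4.1 p95 budget, applied here as a per-frame deadline


@dataclass
class Frame:
    seq: int        # camera sequence number
    stamp_s: float  # capture timestamp, seconds


class LatestFrameSlot:
    """Camera queue of size 1: a new frame replaces any frame not yet taken."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Frame] = None
        self.received = 0
        self.dropped = 0  # frames overwritten before the worker consumed them

    def put(self, frame: Frame) -> None:
        with self._lock:
            self.received += 1
            if self._frame is not None:
                self.dropped += 1  # explicit drop accounting
            self._frame = frame

    def take(self) -> Optional[Frame]:
        """Return the freshest pending frame (or None) and empty the slot."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame

    def drop_ratio(self) -> float:
        return self.dropped / self.received if self.received else 0.0


def within_deadline(frame: Frame, now_s: float,
                    budget_s: float = LATENCY_BUDGET_S) -> bool:
    """True while processing this frame can still meet the latency budget."""
    return now_s - frame.stamp_s < budget_s


# Usage: three frames arrive at 3 Hz while the worker is busy.
slot = LatestFrameSlot()
for i in range(3):
    slot.put(Frame(seq=i, stamp_s=i / 3.0))
latest = slot.take()
print(latest.seq, slot.dropped, round(slot.drop_ratio(), 2))  # 2 2 0.67
print(within_deadline(latest, now_s=latest.stamp_s + 0.1))    # True
```

The slot trades completeness for freshness: the worker only ever sees the newest frame, and the drop counter is the quantity that would be checked against the ~10% allowance over a flight or test run.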

### Confidence

High.

---

## Dimension 2: VPR Descriptor and Cache Footprint

### Fact Confirmation

AnyLoc DINOv2 VLAD examples produce 49,152-dimensional descriptors (Fact #19). The operational area can be up to 400 km² with multi-scale, overlapping chunks.

### Reference Comparison

Event-triggered VPR is still the right architecture, but uncompressed VLAD descriptors can quietly consume a large fraction of RAM and cache. Each float32 descriptor occupies 49,152 × 4 bytes ≈ 197 KB, so 4,000-10,000 chunks need roughly 0.8-2.0 GB before multi-scale variants, indexes, metadata, and model/runtime memory are counted.

### Conclusion

Keep AnyLoc/DINOv2-style VPR as the lead retrieval family, but only with a mandatory descriptor-compression gate: PCA, float16, product quantization, or a smaller descriptor must be chosen before implementation freeze. CPU FAISS/HNSW remains the v1 baseline until Jetson GPU indexing is proven.

### Confidence

High for the footprint risk; Medium for the best compression/index choice.

---

## Dimension 3: Satellite Cache Storage

### Fact Confirmation

COG supports tiled imagery, overviews, and multiple compression profiles, but its documentation does not provide a universal bytes-per-pixel budget for the target imagery (Fact #21). Zoom level alone does not prove physical resolution (Fact #13).

### Reference Comparison

The 10 GB persistent cache budget may be achievable with lossy-compressed 0.3-0.5 m/px imagery and careful indexing, but it is unproven until representative Suite Satellite Service imagery is packaged with overviews, manifests, descriptors, and generated-tile sidecars.

### Conclusion

Treat cache size as a hard measurement gate. The architecture should preserve the 10 GB budget but require a cache-packing benchmark before task decomposition commits to descriptor formats or chunk-overlap settings.

### Confidence

Medium-high.
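
The descriptor arithmetic from Dimension 2 and the cache-packing gate above can be written down as a small budget check. This is a sketch under stated assumptions: sizes are in decimal gigabytes (matching the 0.8-2.0 GB figure), PQ codebooks and index structures are ignored, and the imagery size is supplied by the cache-packing benchmark rather than estimated. All names, and the 64-byte PQ code size, are hypothetical.

```python
from dataclasses import dataclass

GB = 1e9  # decimal gigabytes, matching the 0.8-2.0 GB estimate


def dense_descriptor_bytes(n_chunks: int, dims: int, bytes_per_value: int) -> int:
    """Raw storage for dense descriptors, excluding indexes and metadata."""
    return n_chunks * dims * bytes_per_value


def pq_code_bytes(n_chunks: int, code_bytes: int) -> int:
    """Product-quantized codes only; codebooks and index structures ignored."""
    return n_chunks * code_bytes


@dataclass
class CacheBudget:
    limit_bytes: float = 10 * GB  # persistent cache budget

    def fits(self, imagery_bytes: float, descriptor_bytes: float,
             sidecar_bytes: float = 0.0) -> bool:
        """imagery_bytes must come from packing representative imagery as COG;
        sidecar_bytes covers indexes, manifests, and generated-tile sidecars."""
        return imagery_bytes + descriptor_bytes + sidecar_bytes <= self.limit_bytes


# Usage: reproduce the uncompressed AnyLoc footprint and a float16 variant.
for n in (4_000, 10_000):
    f32 = dense_descriptor_bytes(n, 49_152, 4)
    f16 = dense_descriptor_bytes(n, 49_152, 2)
    print(n, round(f32 / GB, 2), round(f16 / GB, 2))
# 4000 0.79 0.39
# 10000 1.97 0.98

# Float16 alone still leaves ~1 GB at the upper chunk count; PCA or PQ shrinks
# it further (the 64-byte PQ code size here is illustrative only).
print(pq_code_bytes(10_000, 64))  # 640000
```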

---

## Dimension 4: Relative Motion and cuVSLAM

### Fact Confirmation

NVIDIA describes cuVSLAM as stereo visual-inertial SLAM/odometry, with IMU-only degraded tracking suitable only for short intervals of about one second (Fact #20). The project has one fixed downward-facing navigation camera for v1.

### Reference Comparison

cuVSLAM is a strong Jetson stack, but the selected v1 camera geometry does not match its documented primary input assumptions. A custom planar VO/IMU module can exploit nadir imagery, flat terrain, camera intrinsics, altitude, and FC attitude directly.

### Conclusion

Keep custom planar VO/IMU as the lead. Keep cuVSLAM rejected for v1 product use, but preserve it as a benchmark/reference in case the hardware changes to stereo or NVIDIA documents an exact monocular deployment path matching the project.

### Confidence

High.

---

## Dimension 5: Validation Data

### Fact Confirmation

AerialVL and UAV-VisLoc provide useful public aerial localization data, but they only partially match the deployment context: fixed-wing airframe, ArduPilot, high-rate IMU, camera timing, and Ukrainian steppe (Facts #22, #23).

### Reference Comparison

Public datasets can validate VPR/cross-view ideas and regression-test retrieval. They cannot prove ESKF covariance, MAVLink timing, companion-computer reboot behavior, or false-position budgets without representative IMU and FC traces.

### Conclusion

Use public datasets for early VPR/local-matcher benchmarking, then require ArduPilot SITL-generated IMU traces and at least one real FC/camera timing capture before final acceptance.

### Confidence

High.

---

## Dimension 6: ArduPilot Output

### Fact Confirmation

ArduPilot documents MAVLink GPS input with `GPS1_TYPE=14` (Fact #1), MAVLink defines `GPS_INPUT` as raw GPS sensor input rather than a global position estimate (Fact #2), and external-nav/GPS source-fusion issues are version-specific (Fact #3).

### Reference Comparison

`ODOMETRY` is semantically richer but increases EKF source-interaction risk. Sending only `GPS_INPUT` in v1 is narrower and forces honest accuracy fields, but it matches the "GPS substitute" framing and avoids dual-source overlap.

### Conclusion

Keep v1 `GPS_INPUT` only. Add a v1.1 research/testing backlog item for `ODOMETRY`, gated on the exact ArduPilot release, parameters, and SITL proof.

### Confidence

High.
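
To make the "honest accuracy fields" point concrete, here is a sketch of how a filter estimate might be mapped onto `GPS_INPUT` fields. Field names, units, and the `GPS_INPUT_IGNORE_FLAGS` bit values follow the MAVLink common message set; the `Estimate` type, the use of the larger eigenvalue of the north/east position covariance as `horiz_accuracy`, the fixed 18 s GPS-UTC leap-second offset, and the `sats_placeholder` argument are assumptions. How a given ArduPilot release treats ignored or placeholder fields still needs the SITL proof called for above. Sending the resulting dict through a MAVLink library is not shown.

```python
import math
from dataclasses import dataclass

# Bits from the MAVLink GPS_INPUT_IGNORE_FLAGS enum.
IGNORE_HDOP = 2
IGNORE_VDOP = 4
IGNORE_VEL_VERT = 16

GPS_FIX_TYPE_3D_FIX = 3
GPS_EPOCH_UNIX_S = 315_964_800  # 1980-01-06T00:00:00Z as Unix time
GPS_MINUS_UTC_S = 18            # assumption: leap-second offset valid since 2017
WEEK_S = 604_800


@dataclass
class Estimate:
    """Hypothetical filter output; not an existing API."""
    unix_time_s: float      # timestamp of the freshest state
    lat_deg: float
    lon_deg: float
    alt_m: float            # AMSL
    vn: float               # m/s
    ve: float               # m/s
    var_n: float            # m^2, north position variance
    cov_ne: float           # m^2, north/east covariance
    var_e: float            # m^2, east position variance
    alt_sigma_m: float
    speed_sigma_mps: float


def horiz_sigma(var_n: float, cov_ne: float, var_e: float) -> float:
    """Square root of the larger eigenvalue of the 2x2 N/E covariance."""
    mean = 0.5 * (var_n + var_e)
    radius = math.hypot(0.5 * (var_n - var_e), cov_ne)
    return math.sqrt(mean + radius)


def gps_input_fields(est: Estimate, sats_placeholder: int) -> dict:
    """Pack one estimate into GPS_INPUT field names (yaw extension omitted)."""
    gps_s = est.unix_time_s - GPS_EPOCH_UNIX_S + GPS_MINUS_UTC_S
    return {
        "time_usec": int(round(est.unix_time_s * 1e6)),
        "gps_id": 0,
        # No DOP or vertical velocity is estimated, so those fields are ignored.
        "ignore_flags": IGNORE_HDOP | IGNORE_VDOP | IGNORE_VEL_VERT,
        "time_week_ms": int((gps_s % WEEK_S) * 1000),
        "time_week": int(gps_s // WEEK_S),
        "fix_type": GPS_FIX_TYPE_3D_FIX,
        "lat": int(round(est.lat_deg * 1e7)),  # degE7
        "lon": int(round(est.lon_deg * 1e7)),  # degE7
        "alt": est.alt_m,
        "hdop": 0.0,
        "vdop": 0.0,
        "vn": est.vn,
        "ve": est.ve,
        "vd": 0.0,
        "speed_accuracy": est.speed_sigma_mps,
        "horiz_accuracy": horiz_sigma(est.var_n, est.cov_ne, est.var_e),
        "vert_accuracy": est.alt_sigma_m,
        "satellites_visible": sats_placeholder,  # no real satellites behind this
    }
```

For example, north/east variances of 4 m² and 1 m² with no correlation give `horiz_accuracy` = 2.0 m, the conservative major-axis 1-sigma rather than an average that would understate the error along the worse axis.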