Reasoning Chain — Mode B (Solution Assessment of solution_draft01.md)
For each Mode B finding (M-1..M-15 in 02_fact_cards.md), trace the fact → comparison → conclusion path and pin the conclusion's confidence. Conclusions feed solution_draft02.md.
M-1 — ODOMETRY vs GPS_INPUT (Component 6)
Fact. ArduPilot dev docs (S41) say "ODOMETRY (the preferred method)" for sending external-nav to EKF3. ODOMETRY: quaternion + velocity NED + 21-element pos+att covariance + quality 0..100. GPS_INPUT: lat/lon/alt + 3-D velocity + scalar h_acc/v_acc + fix_type. Both supported; both targetable from pymavlink.
Reference comparison. AC-4.3 originally states "Replacement for GPS module … via MAVLink GPS_INPUT, GPS1_TYPE=14". That's GPS-substitute framing, which suggests GPS_INPUT is the right channel. But AC-NEW-4 (false-position safety budget P[err>500m]<0.1%) requires the FC to act on calibrated covariance — and GPS_INPUT collapses our 6-DoF covariance into one scalar, which is information loss.
Conclusion. Hybrid output. Keep GPS_INPUT as the primary "GPS-substitute" channel (matches AC-4.3 framing, plays cleanly with FC operator workflows that expect a GPS_RAW_INT-shaped status). Also emit ODOMETRY when the EKF emits a fix with a full 6-DoF covariance and a non-trivial yaw observability — let the FC's EKF3 fuse the richer signal. Configure FC source priorities so GPS_INPUT is the failover in case ODOMETRY trips a parameter gate (VISO_QUAL_MIN). This is a strict superset of the draft's choice; the only cost is the extra MAVLink emit and the source-switching SITL test scope (M-11).
Confidence. ✅ High. Two L1 sources (S41 dev docs + S42 PR #19563), one L1 confirming the failure path is real (S43 PR #30080).
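The information loss that motivates the hybrid choice can be made concrete. A minimal sketch, assuming only the documented MAVLink field layouts (ODOMETRY carries a 21-element row-major upper-triangular 6×6 pose covariance in x, y, z, roll, pitch, yaw order; GPS_INPUT carries scalar horiz/vert accuracies); the helper names are ours, not pymavlink API:

```python
import math

def pose_cov_diag(cov21):
    """Extract the 6 diagonal variances from ODOMETRY's 21-element
    row-major upper-triangular 6x6 pose covariance
    (order: x, y, z, roll, pitch, yaw)."""
    diag, idx = [], 0
    for row in range(6):
        diag.append(cov21[idx])
        idx += 6 - row  # skip the remainder of this row's upper triangle
    return diag

def gps_input_accuracies(cov21):
    """Collapse the 6-DoF covariance into the only two scalars GPS_INPUT
    can carry: horiz_accuracy and vert_accuracy (1-sigma, metres).
    Cross-terms and all attitude variances are discarded."""
    var_x, var_y, var_z = pose_cov_diag(cov21)[:3]
    h_acc = math.sqrt(max(var_x, var_y))  # conservative: worst horizontal axis
    v_acc = math.sqrt(var_z)
    return h_acc, v_acc
```

Everything the second function throws away is exactly what ODOMETRY would have delivered to EKF3 intact.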
M-2 — MASt3R off the primary matcher list
Fact. mast3r-runtime Jetson support = "Planned" (S57). Speedy MASt3R = 91 ms / pair on A40 GPU.
Reference comparison. A40 ≈ 38 TFLOPS FP16 (datacentre-class GPU); Jetson Orin Nano Super 25 W ≈ 1.7 TFLOPS FP16 (~67 TOPS sparse INT8). Throughput ratio ~22× to 30× depending on operator mix. 91 ms × 22 ≈ 2 s/pair; × 30 ≈ 2.7 s/pair. Even with INT8 quantisation closing the gap by ~2× (typical for ViT-class), MASt3R lands at >1 s/pair — outside the 400 ms p95 budget by a factor of ≥2.5×.
Conclusion. MASt3R drops from the "stretch candidate" row in the draft's bench-off table to a research-track-only label. Bench-off resources should focus on SP+LG / XFeat / GIM-LightGlue / RoMa-distilled.
Confidence. ✅ High. Numbers are conservative — MASt3R has additional overhead from the depth backbone that doesn't exist in pure 2D matchers.
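The extrapolation above is just a throughput-ratio scaling; a sketch that reproduces M-2's arithmetic (all numbers are taken from the finding itself, not new measurements):

```python
def extrapolate_pair_latency(ms_on_ref_gpu, throughput_ratio, quant_speedup=1.0):
    """Scale a per-pair latency measured on a reference GPU up by the raw
    throughput ratio to a weaker target, optionally crediting an
    INT8-quantisation speedup."""
    return ms_on_ref_gpu * throughput_ratio / quant_speedup

# Speedy MASt3R: 91 ms/pair on an A40; A40 : Orin Nano Super ~22-30x.
low_ms  = extrapolate_pair_latency(91, 22, quant_speedup=2.0)
high_ms = extrapolate_pair_latency(91, 30, quant_speedup=2.0)
```

Even the optimistic end of the range sits at ≥2.5× the 400 ms p95 budget.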
M-3 — Add GIM-LightGlue to the bench-off
Fact. GIM (S48): self-trained generalist matcher, 8.4–18.1 % zero-shot improvement over LightGlue/RoMa/DKM/LoFTR baselines. Pre-trained checkpoints public.
Reference comparison. Our domain (eastern-Ukraine 1 km AGL nadir vs. service satellite tiles) has zero training data publicly available; the bench-off therefore tests zero-shot transfer. GIM's training paradigm (50 h of internet videos covering every kind of scene including aerial) is precisely the regime that maximises zero-shot transfer.
Conclusion. Add GIM-LightGlue to the matcher bench-off shortlist as a peer of vanilla SP+LG. If the published 8–18 % zero-shot gain holds on AerialVL + Mavic, GIM-LightGlue dominates the cost/quality frontier (same TRT path as SP+LG, better accuracy out of the box).
Confidence. ✅ High. ICLR 2024 spotlight; benchmark numbers reproduced by independent users in the GitHub issue tracker.
M-4 — VPR shortlist expansion: + SALAD + BoQ
Fact. SALAD (S47, CVPR 2024): DINOv2 + Sinkhorn optimal-transport VLAD; R@1 = 75 % on MSLS Challenge / 92.2 % MSLS Val / 76 % NordLand; integrated in aero-vloc. BoQ (S46, CVPR 2024): bag of learnable queries, beats NetVLAD/MixVPR/EigenPlaces/Patch-NetVLAD/TransVPR/R2Former on 14 benchmarks; DINOv2 results Nov 2024.
Reference comparison. AnyLoc (draft primary) is unsupervised VLAD over DINOv2 features; SALAD is trained DINOv2-VLAD via Sinkhorn; BoQ is learnable queries over a backbone (DINOv2 or ViT). SALAD strictly beats AnyLoc on the same backbone in published benchmarks. BoQ beats both on standard VPR benchmarks; aerial-specific numbers TBD but well-positioned.
Conclusion. The bench-off table grows from {AnyLoc, MixVPR} to {AnyLoc, SALAD, BoQ, MixVPR}. AnyLoc remains the training-free fallback; SALAD and BoQ are likely primaries.
Confidence. ✅ High on M-4 (sources are CVPR 2024 papers + GitHub repos with published weights). Aerial-domain ranking is empirical — the bench-off resolves it.
M-5 — Latency budget has more headroom than the draft assumed
Fact. Jetson AI Lab (S40): DINOv2-base-patch14 = 126 inf/s on Orin Nano Super → ~8 ms/inf at 224×224, FP16 trtexec.
Reference comparison. Draft estimated 50–80 ms / 224×224 for DINOv2 ViT-B (Component 2 row 1). Real number is ~6–10× better. At 448×448 (more typical for AnyLoc descriptor extraction), expect ~32 ms/inf via near-quadratic scaling.
Conclusion. AC-4.1 (400 ms p95) is comfortably feasible with budget left over for SP+LG / GIM-LightGlue (target ~100 ms/pair) + EKF + MAVLink emit. R2 in the draft's risk table downgraded from High to Medium — empirical confirmation needed but no longer a make-or-break risk.
Confidence. ✅ High. NVIDIA L1 source.
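The 448-px figure comes from quadratic-in-resolution scaling (ViT token count grows with the square of the side length at fixed patch size). A sketch of that extrapolation plus the resulting budget check; the EKF and MAVLink-emit costs below are placeholder assumptions, not measurements:

```python
def scale_latency_quadratic(ms_at_base, base_px, target_px):
    """Extrapolate ViT inference latency by (resolution ratio)^2:
    token count grows quadratically with side length at fixed patch size."""
    return ms_at_base * (target_px / base_px) ** 2

vpr_ms = scale_latency_quadratic(8.0, 224, 448)  # ~32 ms for AnyLoc-style 448 px
budget = {
    "vpr": vpr_ms,
    "matcher": 100.0,  # SP+LG / GIM-LightGlue target from M-5
    "ekf": 5.0,        # placeholder assumption
    "mavlink": 2.0,    # placeholder assumption
}
headroom_ms = 400.0 - sum(budget.values())  # vs. the AC-4.1 p95 budget
```

With these numbers more than half the p95 budget remains unallocated, which is what justifies downgrading R2.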
M-6 — mavlink-router CVE-class issue
Fact. S45: stack-based buffer overflow in mavlink-router config parsing, fuzzing-discovered, public, no SECURITY.md.
Reference comparison. mavlink-router is a C++ daemon running with the same privileges as our companion process; if the config file is attacker-controlled (e.g., a tampered SD card on the airframe), this becomes RCE on the companion. Even if the config file is operator-controlled, a buggy config-file parser is one bug away from the next exploitable issue.
Conclusion. Three options, choose one:
- Pin a specific patched version + sandboxed systemd unit (NoNewPrivileges, ReadOnlyPaths=/etc/mavlink-router/, MemoryDenyWriteExecute, RestrictAddressFamilies=AF_UNIX AF_INET).
- Replace with an in-process MAVLink endpoint multiplexer (Python or Go, ~150 LOC) — eliminates the dependency entirely.
- Distinct system-IDs for MAVSDK + pymavlink sharing the same serial port via ArduPilot's native MAVLink routing, no router daemon at all.
Option 3 is the simplest. Option 2 gives us the most control. Option 1 is the lowest-effort quick fix. Recommend Option 3 for v1, with Option 2 as v1.1 if MAVLink message volume saturates a single endpoint.
Confidence. ✅ High that the issue is real; choice of mitigation is implementation preference.
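To show how small Option 2 really is, here is a hypothetical sketch of its core: the fan-out rule that a router daemon otherwise provides (every frame from one endpoint goes to all the others). Transport, framing, and error handling are deliberately omitted; all names are ours:

```python
class MavlinkMux:
    """Minimal in-process endpoint multiplexer. Each registered endpoint
    supplies a send callable; frames received from one endpoint are fanned
    out to every other endpoint, never echoed back to the sender."""

    def __init__(self):
        self._endpoints = {}

    def register(self, name, send_fn):
        self._endpoints[name] = send_fn

    def on_frame(self, source, frame):
        for name, send in self._endpoints.items():
            if name != source:  # routing rule: fan out, no echo
                send(frame)
```

The real ~150 LOC version would add the serial/UDP plumbing around `on_frame`; the routing logic itself does not grow.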
M-7 — MAVLink2 signing is v1-mandatory
Fact. S44: signing supported in ArduPilot 4.5+ on telemetry links; USB bypasses; keys in FRAM.
Reference comparison. Without signing, anyone with serial-line access (companion side OR an exposed telemetry radio) can inject a GPS_INPUT (or ODOMETRY) frame and crash the vehicle. Signing makes that injection require possession of the FRAM key. The cost is one operator key-provisioning step per airframe.
Conclusion. Promote signing from "Security note (deferred to a Phase-4 security pass)" to a v1 hard configuration item. Document the key-provisioning procedure in the deploy runbook. Verify signing-on at boot and refuse to inject GPS_INPUT/ODOMETRY if the signed-frame ack from the FC indicates signing-off.
Confidence. ✅ High.
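For the runbook's key-provisioning step: MAVLink2 signing uses a 32-byte secret key, and a common convention (followed by GCS tooling) is to derive it as the SHA-256 of an operator passphrase. A stdlib-only sketch of that derivation; installing the key on the link would then go through pymavlink's `setup_signing` on the connection, which this sketch does not cover:

```python
import hashlib

def signing_key_from_passphrase(passphrase: str) -> bytes:
    """Derive the 32-byte MAVLink2 signing key from an operator passphrase.
    SHA-256's output length matches the key length the protocol expects."""
    return hashlib.sha256(passphrase.encode("utf-8")).digest()
```

The derivation is deterministic, so the same passphrase provisions the airframe FRAM and the companion identically.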
M-8 — MBTiles operational recipe
Fact. S54: WAL + connection pool + transaction batching is the established recipe for MBTiles SQLite under concurrent reader+writer load. The default rollback-journal mode causes "database is locked" failures.
Reference comparison. Our workload: many concurrent readers (matcher cache lookup at ≤3 fps × ~30 candidate tiles) + occasional writer (Component 1b ortho-tile write at ≤1–2 Hz × ~30 tiles). Without WAL, every writer commit blocks all readers. With WAL, readers and one writer proceed concurrently.
Conclusion. Update Component 1's "Tile format" row in the architecture table to specify: MBTiles SQLite + WAL + connection pool + per-Component-1b-cycle transaction batching. Add to AC-4.1 latency-budget validation: the tile-cache lookup must hit p95 ≤5 ms.
Confidence. ✅ High.
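A minimal sketch of the recipe the architecture row should encode, using only stdlib `sqlite3` against the standard MBTiles `tiles` schema (the `synchronous=NORMAL` setting is our assumption, a common pairing with WAL, not something S54 mandates):

```python
import sqlite3

def open_mbtiles(path):
    """Open an MBTiles SQLite file with WAL journaling so readers do not
    block behind the writer, plus a busy timeout for contended moments."""
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")  # durable enough under WAL, fewer fsyncs
    return conn

def write_tile_batch(conn, tiles):
    """One transaction per Component-1b cycle instead of one per tile:
    the `with` block wraps the whole batch in a single BEGIN ... COMMIT."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO tiles "
            "(zoom_level, tile_column, tile_row, tile_data) VALUES (?,?,?,?)",
            tiles,
        )
```

In production each reader thread gets its own pooled connection; WAL lets all of them proceed while one batch commit is in flight.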
M-9 — Cache-poisoning safety hazard
Fact (analytical, not a single source). Draft's dedup rule allows onboard tiles to overwrite stale service tiles when "our quality > existing". Quality = inlier count + sharpness; does not include parent-pose covariance as a hard gate. Combined with EKF over-confidence (a known failure mode — see W4.b), this lets a confidently-bad pose write a misaligned tile that becomes the next flight's anchor.
Reference comparison. Cartography literature consistently treats authoritative basemap as immutable and crowdsourced/UAV updates as voting input that requires consensus before promotion. SfM bundle-adjustment treats over-confident poses as the dominant error source.
Conclusion. Three layered mitigations:
- Service-source tiles are immutable within freshness budget. Onboard tiles overwrite only stale or other-onboard tiles.
- The Suite Service ingest applies a voting layer: an onboard tile gets promoted to "trusted basemap" only after N≥2 independent flights confirm consistent geo-alignment within X m.
- Parent-pose covariance is a hard gate in the local quality score: σ_xy must be tighter than the generation-eligibility gate (e.g., σ_xy ≤ 5 m vs. 10 m generation gate), and a tile written above the hard gate is marked "soft" in its sidecar.
Add AC-NEW-7 — Cache-poisoning safety budget: P(onboard tile mis-aligned > 30 m) per flight < 1 %; P(misaligned > 100 m) per flight < 0.1 %. Validation: replay AerialVL with synthetic over-confidence injection.
Confidence. ⚠️ Medium. Hazard is real and qualitatively well-known; specific numeric thresholds need empirical calibration during implementation.
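The local two of the three layers can be expressed as a single gate function. A hypothetical sketch; the 5 m / 10 m thresholds come from the finding's own example and the N≥2 cross-flight voting layer lives service-side, so it is omitted here:

```python
from dataclasses import dataclass

HARD_SIGMA_XY_M = 5.0     # M-9 hard gate (tighter than the generation gate)
GENERATION_GATE_M = 10.0  # generation-eligibility gate from the finding

@dataclass
class Tile:
    source: str        # "service" or "onboard"
    age_s: float
    sigma_xy_m: float  # parent-pose 1-sigma horizontal uncertainty

def dedup_verdict(existing: Tile, new: Tile, freshness_budget_s: float) -> str:
    """Local layers of the M-9 dedup gate: returns 'deny', 'write', or
    'write_soft' (written but flagged soft in the sidecar)."""
    if new.sigma_xy_m > GENERATION_GATE_M:
        return "deny"        # parent pose not eligible to generate at all
    if existing.source == "service" and existing.age_s <= freshness_budget_s:
        return "deny"        # fresh authoritative basemap is immutable
    if new.sigma_xy_m > HARD_SIGMA_XY_M:
        return "write_soft"  # above the hard gate: usable, but never an anchor
    return "write"
```

The key property versus the draft's rule: a confidently-bad pose can no longer beat a fresh service tile on quality score alone, because source and covariance are checked before quality is even consulted.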
M-10 — Free-threaded Python 3.13 not v1-ready
Fact. S55: experimental, single-threaded perf hit, GIL re-enables on non-FT-aware C extension import.
Reference comparison. Our hot-path includes: numba JIT kernels, TensorRT Python bindings, pymavlink (C extension), numpy/scipy, possibly cv2. Any one of these silently re-enabling the GIL nullifies the benefit. And the non-trivial single-threaded penalty (~10–15 % per various benchmarks) directly hits AC-NEW-1 (cold-start TTFF <30 s).
Conclusion. v1 stays on standard CPython 3.11 or 3.12 (newest stable, well-supported by JetPack / numba / TRT). Sharpen the rationale in the architecture: the choice is not "GIL is fine" but "asyncio + TRT subprocess workers + numba JIT is the production-ready combination today; revisit free-threading in v1.1."
Confidence. ✅ High.
M-11 — ODOMETRY known production gotchas → SITL coverage required
Fact. S41/S42/S43: companion-derived velocity errors, position-estimate resets on external-nav reference loss, and source-switching conflicts when running alongside GPS.
Reference comparison. AC-NEW-2 (3 s spoofing-promotion latency) is the source-switching path. Whatever output channel we pick (GPS_INPUT, ODOMETRY, or hybrid), the source switch is the high-risk transition.
Conclusion. Add an explicit testing requirement: F-T9 (SITL: full MAVLink loop) must include source-switching scenarios (jam onset → our channel → spoofed real-GPS recovery → operator-confirmed source restore). Include the EK3_SRC1_* parameter combinations being benchmarked in the test plan.
Confidence. ✅ High.
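To pin down what "source-switching scenarios" means for F-T9, here is a hypothetical scenario model of the sequence named above, written as a tiny transition table. The state and event names are ours, not ArduPilot semantics; the point is that the spoofed-GPS and recovery events must not flip the source without operator confirmation:

```python
# (event, expected active nav source after the event)
SCENARIO = [
    ("jam_onset",            "visual_nav"),  # GPS degrades -> our channel takes over
    ("spoofed_gps_detected", "visual_nav"),  # a spoofed fix must NOT win priority back
    ("gps_recovered",        "visual_nav"),  # recovery alone is not sufficient...
    ("operator_confirm",     "gps"),         # ...the operator confirms the restore
]

TRANSITIONS = {
    ("gps", "jam_onset"):                   "visual_nav",
    ("visual_nav", "spoofed_gps_detected"): "visual_nav",
    ("visual_nav", "gps_recovered"):        "visual_nav",
    ("visual_nav", "operator_confirm"):     "gps",
}

def run_scenario(scenario, start="gps"):
    """Replay the event sequence and check each expected source."""
    state = start
    for event, expected in scenario:
        state = TRANSITIONS[(state, event)]
        assert state == expected, (event, state, expected)
    return state
```

The SITL test would replay the same sequence against real EK3_SRC1_* parameter combinations and assert the EKF's active source instead of this table.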
M-12 — Eastern-Ukraine relief amplitude affects flat-Earth assumption
Fact. S56: ~24 m peak-to-trough relief in Kharkiv-region UAV survey areas, with creek/gully systems.
Reference comparison. At 1 km AGL with the frame edge roughly 35° off-nadir, a 24 m elevation offset ortho-projects to ~24 m × tan 35° ≈ 17 m of horizontal misalignment under the flat-Earth assumption. AC-1.1 budget = 50 m@80 % (comfortable); AC-1.2 = 20 m@50 % (tight).
Conclusion. Add a per-sector DEM lookup to the pre-flight tile-sync pass. Classify sectors:
- flat (≤5 m amplitude) — full ortho-tile generation, full anchor weight.
- moderate (5–15 m) — ortho-tile generation, anchor weight × 0.7.
- rugged (>15 m) — skip ortho-tile generation, anchor weight × 0.3 with explicit "rugged-sector" flag in confidence telemetry.

This is a small one-time pre-flight step (SRTM 30 m DEM is free, ~15 GB global, ~30 MB for 400 km²).
Confidence. ⚠️ Medium. Single regional sample; refine numbers when more terrain data lands.
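The classification above is a one-liner per sector once the DEM amplitude is known. A sketch using the thresholds from the list, plus the flat-Earth edge-error computation for the Kharkiv sample (assuming, as in the comparison above, a frame edge ~35° off-nadir):

```python
import math

def classify_sector(relief_amplitude_m):
    """M-12 sector classes: (label, anchor_weight, generate_ortho_tiles)."""
    if relief_amplitude_m <= 5.0:
        return ("flat", 1.0, True)
    if relief_amplitude_m <= 15.0:
        return ("moderate", 0.7, True)
    return ("rugged", 0.3, False)

# Flat-Earth ortho misalignment for the S56 sample: elevation offset
# times the tangent of the off-nadir view angle at the frame edge.
edge_error_m = 24.0 * math.tan(math.radians(35.0))  # ~16.8 m
```

The ~24 m Kharkiv-region sample lands in the rugged class, which is consistent with its ~17 m edge error exceeding the 15 m moderate ceiling.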
M-13 — TartanAir V2 reconsideration (open question)
Fact. S51: photo-realistic synthetic, native IMU + 12-cam + season variation + custom camera models.
Reference comparison. User's last-message reasoning was "Mavic-class dynamics ≠ fixed-wing dynamics → synthetic IMU is unlikely to produce a useful signal". TartanAir V2 lets us configure motion patterns, so the dynamics-mismatch argument is weaker than for MidAir-class quadcopter-only sims.
Conclusion. Open question for the user: include TartanAir V2 in the bench-off as an early-stage synthetic baseline (good for sweeping seasons / lighting / pitches), or hold to "real-data-only purism" with AerialVL + Mavic + planned-fixed-wing-flights as the only V&V?
Confidence. ⚠️ Medium. Technical viability is high; the call is product-side.
M-14 — Add AerialExtreMatch + 2chADCNN to V&V plan
Fact. AerialExtreMatch (S49) — 1.5 M synthetic image pairs, 32 difficulty levels (overlap × scale × pitch), real-world UAV localization subset. 2chADCNN (S50) — season-aware UAV↔satellite template-matching.
Reference comparison. Draft's bench-off targets are AerialVL + UAV-VisLoc + internal Mavic. None of those grade against extreme-pitch / extreme-scale / extreme-overlap separately. Without a benchmark that crosses these axes, the bench-off can pick a winner that fails silently in cornered conditions.
Conclusion. Add to the V&V plan:
- AerialExtreMatch as a primary structured-difficulty regression bench.
- 2chADCNN as a season-aware baseline either (a) included in the bench-off, or (b) used as an explicit season-robustness ceiling reference.
Confidence. ✅ High.
M-15 — Real fixed-wing VO is harder than draft implies
Fact. S52 (AFIT thesis): SVO/DSO/ORB-SLAM2 all "had significant difficulty maintaining localisation" on real fixed-wing flights. S53: high-altitude (300–1000 m AGL) VIO drift in the same band as our AC-1.3.
Reference comparison. Draft's choice ("custom 2-frame homography VO via Component-3 matcher") is correct framing — VO between satellite anchors is a much easier problem than standalone metric SLAM. But AC-1.3's drift budget (<100 m without IMU, <50 m with IMU between two satellite-anchored fixes) requires empirical confirmation against a real fixed-wing baseline.
Conclusion. Add to risks: R8 — fixed-wing VO drift under our AC-1.3 budget is unconfirmed. Mitigations:
- Borrow AerialVL's fixed-wing trajectories (70 km of real fixed-wing flight) for AC-1.3 regression in F-T1b (new).
- Plan the first internal fixed-wing flight before AC lock — not as a stretch goal.
Confidence. ✅ High.
Summary table
| Finding | Severity | Affects | Resolution |
|---|---|---|---|
| M-1 | High | C-6, AC-4.3, AC-NEW-4 | Hybrid GPS_INPUT + ODOMETRY |
| M-2 | High | C-3 bench-off | Drop MASt3R from primary list |
| M-3 | Med | C-3 bench-off | Add GIM-LightGlue |
| M-4 | High | C-2 bench-off | Add SALAD + BoQ |
| M-5 | High (positive) | AC-4.1 | Downgrade R2 risk |
| M-6 | High (security) | C-6 | Replace mavlink-router OR sandbox & pin |
| M-7 | High (security) | C-6 | MAVLink2 signing v1-mandatory |
| M-8 | Med | C-1 | MBTiles WAL + pool + batching |
| M-9 | High (safety) | C-1b, AC-NEW | New AC-NEW-7 + dedup-rule changes |
| M-10 | Med | C-9 | Stay on CPython 3.11/3.12; sharpen rationale |
| M-11 | Med | C-5/C-6, AC-NEW-2 | Add SITL source-switching tests |
| M-12 | Med | C-1b, AC-1.2 | Per-sector DEM lookup + anchor weight |
| M-13 | Open question | datasets | Surface to user |
| M-14 | Med | V&V plan | Add AerialExtreMatch + 2chADCNN |
| M-15 | Med | C-4, AC-1.3 | Risk R8 + AerialVL F-T1b |